I am preparing an occurrence dataset from an eDNA project. During species classification, however, some OTU sequences could only be identified to the domain level (Eukaryota), with no lower taxon rank available.
I have noticed that the GBIF backbone does not include the ‘domain’ taxon rank, and I am seeking suggestions on how to proceed with these occurrences. Should I exclude them and only keep those OTUs (and their corresponding occurrences) that have been identified up to at least the kingdom level?
You can include them in the dataset, and if you include sequences as well, then potential future users may be able to re-annotate using an updated database. The interpreted value for scientificName would show up as ‘Incertae sedis’ but the original (verbatim) value would still be associated with each record.
How specific was the primer you used? Could you say with reasonable certainty that these sequences belong to a single kingdom, ‘Animalia’ for example? Or do you expect that some sequences may belong to other kingdoms?
I would add this modification to Cecilie’s comment: You should preferably include all OTUs and their sequences (using the DNA-derived data extension to Darwin Core). That will make the data interoperable across datasets, with or without annotation.
We used the COI gene, which isn’t really specific to any kingdom, and the annotated OTUs in the dataset now include the kingdoms Animalia, Chromista, Fungi, Plantae and Protozoa, which is actually quite diverse.
I will proceed with including all OTUs as occurrences along with their DNA sequences.
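For anyone preparing a similar dataset, here is a minimal sketch of how the approach discussed above could look in practice: every OTU becomes an occurrence row, and its sequence goes into a matching row for the DNA-derived data extension. The OTU IDs, sequences, sample name and the `build_rows` / `to_csv` helpers are made up for illustration. `occurrenceID`, `basisOfRecord`, `scientificName` and `taxonRank` are Darwin Core terms, and `DNA_sequence` and `target_gene` are terms from the extension. Naming the linking column `occurrenceID` in the extension table is an assumption, so check how your publishing tool expects the core ID column to be named.

```python
import csv
import io

# Hypothetical OTU table: IDs, sequences and classifications are invented.
otus = [
    {"otu_id": "OTU_1", "sequence": "ACTTTATATTTTATTTTTGG",
     "taxon": "Animalia", "rank": "kingdom"},
    # An OTU identified only to domain level is kept, not dropped.
    {"otu_id": "OTU_2", "sequence": "TCTTTCTTCTATTCTAGGAG",
     "taxon": "Eukaryota", "rank": "domain"},
]


def build_rows(otus, sample_id="sample_A"):
    """Return (occurrence rows, DNA extension rows) for one sample."""
    occurrences, dna = [], []
    for otu in otus:
        occ_id = f"{sample_id}:{otu['otu_id']}"
        occurrences.append({
            "occurrenceID": occ_id,
            "basisOfRecord": "MaterialSample",
            # The value supplied here is what is kept as the verbatim name.
            "scientificName": otu["taxon"],
            "taxonRank": otu["rank"],
        })
        dna.append({
            "occurrenceID": occ_id,  # assumed name for the link to the core row
            "DNA_sequence": otu["sequence"],
            "target_gene": "COI",
        })
    return occurrences, dna


def to_csv(rows):
    """Serialise a list of dicts to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


occurrences, dna = build_rows(otus)
print(to_csv(occurrences))
print(to_csv(dna))
```

Keeping the sequence in a separate table keyed by the same ID means the domain-level OTUs stay in the dataset with their original name, and anyone can re-annotate them later from the sequence alone.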