I wanted to follow up on the Data Use Club webinar today. I have been collecting metadata on identification keys and similar resources (books, articles, websites, etc.) in a public, searchable catalog. For some of those resources I have also collected checklists, to better specify the taxonomic coverage of the identification keys (example). Sometimes these contain unusual combinations, such as Stollia venustissima (Schranck, 1776) [now Eysarcoris venustissimus (Schrank, 1776)] that do not appear in the GBIF Backbone. Would these checklists be suited for inclusion in the Backbone Taxonomy?
If so, how could I go about that? The checklists are available as Darwin Core (example), but perhaps not in the format expected by GBIF. Right now I have about 600 checklists, sometimes multiple per publication as I try to capture the coverage of individual keys (e.g. when there are separate keys for adults and larvae). Should I merge them or treat them as separate datasets?
I am also wondering about possible issues with the data quality. I have put checks in place against typos by matching the names to GBIF (and to CoL for non-major taxon ranks) with GNverifier, but of course that does not work as well when the name/combination is not in GBIF. When this is the case I double-check the name, but I cannot guarantee that there are absolutely no typos.
[Edit: not to mention mistakes from the authors of the publications themselves. The example I link to includes “Coptosoma globus Fieber, 1861, Eur. Hem, p. 380”, which actually refers to C. globus (Fabricius) and not a new species.]
Hi @larsgw, thank you for this post!
Are those checklists published on GBIF.org or Checklistbank.org?
I haven’t published anything yet, except on my own site. I was wondering whether it would be worth publishing them (given possible typographical errors of the author, and possible additional ones from me), and if so where (GBIF or Checklistbank.org) and how (as one dataset, or one dataset per publication).
@larsgw I think whether you would like to publish the data as separate datasets or one big dataset is up to you.
Everything published on GBIF makes it to ChecklistBank. With that in mind, ChecklistBank can handle more formats than Darwin Core Archive.
In either case, both systems will perform a number of automated checks, which might help catch possible issues in the datasets.
As for their inclusion in the GBIF Backbone, this is a question for @markus.
Thanks @larsgw that looks like an excellent resource we would be glad to integrate with.
I have taken the DwC csv file of your example and it pretty much works out of the box: ChecklistBank
Just the scientificNameID field had to be renamed to taxonID as the foreign keys from parentNameUsageID otherwise would have been broken.
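For anyone who wants to apply the same rename before uploading, here is a minimal sketch in Python. It assumes the checklist is a plain comma-separated file with a header row. The helper name `rename_column` and the sample rows are made up for illustration, and only the column names come from this thread.

```python
import csv
import io


def rename_column(csv_text: str, old: str, new: str) -> str:
    """Return csv_text with the header column `old` renamed to `new`.

    Hypothetical helper: only the header row changes, data rows are untouched.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    header = rows[0]
    if old not in header:
        raise ValueError(f"column {old!r} not found in header")
    if new in header:
        raise ValueError(f"column {new!r} already exists in header")
    rows[0] = [new if name == old else name for name in header]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Made-up sample rows; a real export has more columns and real identifiers.
sample = (
    "scientificNameID,parentNameUsageID,scientificName\n"
    "1,,Eysarcoris\n"
    "2,1,Eysarcoris venustissimus\n"
)
renamed = rename_column(sample, "scientificNameID", "taxonID")
print(renamed)
```

After the rename, the values in parentNameUsageID point at values in the taxonID column, so the parent links can be resolved.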
To publish this to GBIF and ChecklistBank, some metadata about the dataset would be required. The publication metadata you have here, for example, seems like a perfect fit to me. I have used it (as far as I could make sense of the Cyrillic) as metadata for the test dataset I created with your data.
The software has spotted just a single red issue, which is remarkable: a missing author bracket.
Publishing a dataset per key or per publication are both fine. As the metadata mostly seems to be about the publication, and the taxonomy of the keys is very likely the same, I would probably go for a dataset per publication if that is something you can generate. But a dataset per key is also perfectly fine.
That’s great to hear. I’ll try to figure out how/where I can actually do the imports.
Would duplicate entries for a single name/taxon matter? On my side, de-duplicating a union of multiple checklists might be a bit error-prone if names are provided with e.g. different authorship strings (with/without, abbreviated/full) in different parts of the publication.
It wouldn’t be nice to have duplicates/variants, but it would not hurt at all when integrating them into COL & GBIF. We would only add them once.
But maybe it is simpler and more accurate then to include the lists just as you have them. Is there any metadata to distinguish them, e.g. a different title like “Key to adults. In: xxx”?
Sorry, I missed this reply. Sometimes the different keys have a specified, citable title or heading, but usually not. I would have to make them, and the resulting citations would be non-standard. I think the most correct solution would be de-duplication. Perhaps I can do that based on the GBIF backbone ID that I try to link to anyway; that should work in most cases. I also still have to de-duplicate pro-parte synonyms.
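A rough sketch of what de-duplication on the backbone ID could look like in Python. It assumes each record is a dict and that the matched ID sits in a field I am calling `gbifKey` here. That field name, the function name, and the ID values are all made up for illustration. Records without a match are kept as they are, since there is nothing safe to merge them on.

```python
def dedupe_by_backbone_id(records, key_field="gbifKey"):
    """Keep the first record per backbone ID; keep every unmatched record.

    `key_field` is a hypothetical field holding the matched GBIF backbone ID
    (None or "" when the name could not be matched).
    """
    seen = set()
    unique = []
    for record in records:
        key = record.get(key_field)
        if key is None or key == "":
            # No backbone match: cannot de-duplicate safely, so keep it.
            unique.append(record)
            continue
        if key in seen:
            # Same backbone ID already kept, e.g. a variant authorship string.
            continue
        seen.add(key)
        unique.append(record)
    return unique


# Illustrative records; the ID 1001 is invented, not a real backbone key.
records = [
    {"scientificName": "Eysarcoris venustissimus (Schrank, 1776)", "gbifKey": 1001},
    {"scientificName": "Eysarcoris venustissimus (Schrank)", "gbifKey": 1001},
    {"scientificName": "Stollia venustissima (Schranck, 1776)", "gbifKey": None},
]
unique = dedupe_by_backbone_id(records)
for record in unique:
    print(record["scientificName"])
```

This only collapses records that share an ID. Pro-parte synonyms would still need separate handling, because one name can legitimately map to more than one taxon.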