Contributing checklists from identification keys

larsgw · May 17, 2023, 8:54pm

I wanted to follow up on the Data Use Club webinar today. I have been collecting metadata on identification keys and similar resources (books, articles, websites, etc.) in a public, searchable catalog. For some of those resources I have also collected checklists, to better specify the taxonomic coverage of the identification keys (example). Sometimes these contain unusual combinations, such as Stollia venustissima (Schranck, 1776) [now Eysarcoris venustissimus (Schrank, 1776)] that do not appear in the GBIF Backbone. Would these checklists be suited for inclusion in the Backbone Taxonomy?

If so, how could I go about that? The checklists are available as Darwin Core (example), but perhaps not in the format expected by GBIF. Right now I have about 600 checklists, sometimes multiple per publication as I try to capture the coverage of individual keys (e.g. when there are separate keys for adults and larvae). Should I merge them or treat them as separate datasets?

I am also wondering about possible data quality issues. I have put checks in place against typos by matching the names to GBIF (and CoL for non-major taxon ranks) with GNverifier, but of course that does not work as well when the name or combination is not in GBIF. In those cases I double-check the name manually, but I cannot guarantee that there are absolutely no typos.

[Edit: not to mention mistakes from the authors of the publications themselves. The example I link to includes “Coptosoma globus Fieber, 1861, Eur. Hem, p. 380”, which actually refers to C. globus (Fabricius) and not a new species.]

mgrosjean · May 22, 2023, 6:57am

Hi @larsgw, thank you for this post!
Are those checklists published on GBIF.org? or Checklistbank.org?

larsgw · May 22, 2023, 11:42am

I haven’t published anything yet, except on my own site. I was wondering whether it would be worth publishing them (given possible typographical errors of the author, and possible additional ones from me), and if so where (GBIF or Checklistbank.org) and how (as one dataset, or one dataset per publication).

mgrosjean · May 22, 2023, 12:12pm

@larsgw I think whether you would like to publish the data as separate datasets or one big dataset is up to you.
Everything published on GBIF makes it to ChecklistBank. With that in mind, ChecklistBank is able to handle more formats than just Darwin Core Archive.
In either case, both systems will perform a number of automated checks, which might be helpful for catching possible issues in the datasets.

As for their inclusion in the GBIF Backbone, this is a question for @markus

markus · May 22, 2023, 1:06pm

Thanks @larsgw that looks like an excellent resource we would be glad to integrate with.
I have taken the DwC csv file of your example and it pretty much works out of the box: ChecklistBank

Only the scientificNameID field had to be renamed to taxonID, as the foreign keys from parentNameUsageID would otherwise have been broken.
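For anyone wanting to apply the same fix before uploading, here is a minimal Python sketch of the rename plus a check that every parentNameUsageID points at an existing taxonID. The function names and the two-row sample are hypothetical, and it assumes a plain comma-separated file with a header row; it is not what ChecklistBank itself runs.

```python
import csv
import io


def rename_id_column(csv_text, old="scientificNameID", new="taxonID"):
    """Return CSV text with the header column `old` renamed to `new`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    rows[0] = [new if name == old else name for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def dangling_parents(csv_text, id_field="taxonID",
                     parent_field="parentNameUsageID"):
    """List parent IDs that do not match any value in the ID column."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    known = {record[id_field] for record in records}
    return sorted({
        record[parent_field]
        for record in records
        if record[parent_field] and record[parent_field] not in known
    })


# Hypothetical sample rows, not taken from the actual checklist.
sample = (
    "scientificNameID,parentNameUsageID,scientificName\n"
    "g1,,Eysarcoris\n"
    "s1,g1,Eysarcoris venustissimus\n"
)
fixed = rename_id_column(sample)
broken = fixed + "s2,g9,Hypothetical name\n"
```

An empty list from `dangling_parents(fixed)` would mean all parent references resolve; any IDs it returns are parents that are missing from the file.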

Publishing this to GBIF and ChecklistBank would require some metadata about the dataset. The publication metadata you have here, for example, seems like a perfect fit to me. I have used it (as far as I could make sense of the Cyrillic) as metadata for the test dataset I created with your data:
https://www.dev.checklistbank.org/dataset/59204/about

The software has spotted just a single red issue, which is remarkable: a missing author bracket.

Publishing a dataset per key or per publication are both fine. As the metadata mostly seems to be about the publication, and the taxonomy of the keys is very likely the same, I would probably go for a dataset per publication if that is something you can generate. But per key is also fine.

larsgw · May 23, 2023, 9:59am

That’s great to hear. I’ll try to figure out how/where I can actually do the imports.

Would duplicate entries for a single name/taxon matter? On my side, de-duplicating a union of multiple checklists might be a bit error-prone if names are provided with e.g. different authorship strings (with/without, abbreviated/full) in different parts of the publication.

markus · May 23, 2023, 10:27am

It wouldn’t be nice to have duplicates/variants, but it would not hurt at all when integrating them into COL & GBIF. We would only add them once.

But maybe it is simpler and more accurate then to include the lists just as you have them. Is there any metadata to distinguish them, e.g. a different title like “Key to adults. In: xxx”?

larsgw · May 26, 2023, 10:24pm

Sorry, I missed this reply. Sometimes the different keys have a specified, citable title or heading, but usually not. I would have to make them up, and the resulting citations would be non-standard. I think the most correct solution would be de-duplication. Perhaps I can do that based on the GBIF backbone ID that I try to link to anyway; that should work in most cases. I also still have to de-duplicate pro-parte synonyms.
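A rough Python sketch of what de-duplicating on the backbone ID could look like. The `backbone_id` field name, the function name and the ID values are hypothetical; records without a match are passed through untouched, since there is nothing safe to merge them on. It does not attempt to handle pro-parte synonyms.

```python
def deduplicate(records, key="backbone_id"):
    """Keep the first record per backbone ID; pass through records without one."""
    seen = set()
    unique = []
    for record in records:
        ident = record.get(key)
        if ident is None:
            # No backbone match: keep the record rather than guess.
            unique.append(record)
            continue
        if ident in seen:
            continue
        seen.add(ident)
        unique.append(record)
    return unique


# Hypothetical records; the IDs are made up for illustration.
merged = deduplicate([
    {"scientificName": "Eysarcoris venustissimus (Schrank, 1776)", "backbone_id": 101},
    {"scientificName": "Eysarcoris venustissimus (Schrank)", "backbone_id": 101},
    {"scientificName": "Stollia venustissima (Schranck, 1776)", "backbone_id": None},
])
```

Keeping the first occurrence means the authorship string that survives depends on the order of the input, so sorting the records (for example, fullest authorship first) before calling this might be worthwhile.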
