Investigating taxonomic issues on GBIF.org

The video recording of the presentation is available here: Investigating taxonomic issues on GBIF.org.

Links mentioned in the presentation:

Q&A transcript:

How to report several issues? For example if hundreds of records are identified with associated taxonomic issues should we log separate issues? Should we share a report?

If you have identified that many issues, perhaps you can identify a pattern, for example specific families or genera that all relate to the same taxonomic source. Sometimes issues are due to higher taxonomy (this can happen in some records coming from literature treatments). So it would be best to log one feedback message or GitHub issue. If you (or we) can identify a pattern, we will handle it as one issue; if not, we will split it accordingly.

How can scientificNameID values help unambiguous matching to the GBIF Backbone taxonomy? For example, federal employees in the US often use ITIS as their taxonomic reference, would it help if they provide such identifiers?

Only the WoRMS reference database is currently used for ID matching. In addition, to be used, the identifiers must be integrated in the GBIF Backbone taxonomy. As the backbone hasn’t been updated in more than a year, you would still encounter some challenges if you use the identifiers for the more recent species, as they might not be in the taxonomy.
We have opened a GitHub issue to investigate adding more sources for scientificNameID matching: Resolve more taxonID suppliers than WoRMS · Issue #1119 · gbif/pipelines · GitHub. You are welcome to suggest the integration of identifiers from specific taxonomic references if they aren’t used yet.
Note that sometimes the scientificName and classification on one side and the scientificNameID on the other don’t match at all. In that case, our system prioritizes the identifier match and flags the occurrences.

I reported a taxonomic GitHub issue two years ago and it was closed as solved, but the taxonomic issue remains. What happened?

It looks like it was our mistake. We have now reopened the issue. If you have suggestions for open-licence taxonomic sources that we could use for the Backbone taxonomy (especially for algae), please let us know.

What happens to the taxonKeys/taxonIDs if a name changes (for example if a misspelling was corrected)? Is the updated name associated with a new key/id?

The new name would get a new identifier in most cases (the exception would be very minor corrections). All the scientific names associated with occurrence records are reinterpreted after the GBIF Backbone taxonomy is updated. They will be linked to the new taxon keys/identifiers. That means that if you use a taxonKey to query records on GBIF, you should make sure that it is still relevant after a GBIF Backbone update.
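As a rough illustration of that last point, here is a minimal Python sketch of how you might check a stored taxonKey after a backbone update. It assumes the public Species API endpoint `https://api.gbif.org/v1/species/{key}` and the response fields `deleted`, `canonicalName` and `taxonomicStatus`. The helper names and the three checks are hypothetical, not GBIF's own logic.

```python
import json
import urllib.request

GBIF_API = "https://api.gbif.org/v1"  # public GBIF API base URL


def fetch_usage(taxon_key):
    """Fetch the name usage record for a taxonKey (needs network access)."""
    url = f"{GBIF_API}/species/{taxon_key}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def key_still_relevant(usage, expected_name):
    """Return (ok, reason) for a name usage record and the name you expect.

    `usage` is the decoded JSON of /species/{key}. The field names are
    assumptions based on Species API responses, and the checks are only
    an illustration.
    """
    if usage.get("deleted"):
        return False, "usage was deleted from the backbone"
    if usage.get("canonicalName") != expected_name:
        return False, f"key now points to {usage.get('canonicalName')!r}"
    if usage.get("taxonomicStatus") != "ACCEPTED":
        return False, f"name is now {usage.get('taxonomicStatus')}"
    return True, "key still points to the expected accepted name"
```

You would call it as `key_still_relevant(fetch_usage(my_key), "Puma concolor")`, where `my_key` is the taxonKey saved in your query or script.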

If we have an issue with (for example) a marine species, can we log it in GBIF, or should we log it with the Catalogue of Life or with WoRMS directly?

You are welcome to contact the taxonomic source directly. In most cases, we forward the feedback to these sources anyway. Since we receive our updates from these sources, contacting them directly will likely result in a faster update.
Note that sometimes, some marine species names can come from other sources than WoRMS. Don’t forget to check the source of the name.
You are welcome to contact us if the source is unresponsive, or if you aren’t sure what the source of the name is.

Which tools are worth educating data publishers on? If you are going to advise someone about publishing on GBIF, which tools would you recommend they use to make sure the taxonomic information provided is of good quality?

Publishers are very welcome to use the GBIF Species matching tool (Species name matching), which is based on the Species match API: Species API :: Technical Documentation.
There are two limitations to using the species matching tool:

  1. the names are only matched to the GBIF Backbone taxonomy (you can’t choose another reference)
  2. there is a limit to the number of names that you can match to the taxonomy

An alternative would be to use the ChecklistBank asynchronous matching tool: ChecklistBank. It has no limit to the number of names that can be matched, and you can choose any checklist available on ChecklistBank as the reference. Please also check this tutorial: ChecklistBank tutorial (other ChecklistBank tutorials are available from this page: Data Use Club Practical Session: accessing and downloading species information - #2 by mgrosjean).

You can always publish your data from a test IPT to the GBIF test website (https://www.gbif-uat.org) to see how they would be interpreted by GBIF.

Will hybrid names be integrated in the extended Catalogue of Life release (XR COL)? They are currently not included in the Catalogue of Life.

It is true that the Catalogue of Life doesn’t have hybrid names but the extended release will have them. You can learn more about this extended release here: Switching GBIF’s taxonomic backbone to the Catalogue of Life extended release (x-release). There are currently 3785 hybrid names in XR COL vs 5767 hybrids in the backbone. We need to check first where those missing names come from before we are able to assess whether they can be integrated in the XR COL.

We would like to publish our GBIF datasets to OBIS as well, and we need to provide the AphiaIDs. Is there a way for GBIF to infer the AphiaIDs based on the names?

You could use the ChecklistBank name matching function to match your names to WoRMS.

Could the name matching be automated?

You can use the API call /dataset/{key}/match/nameusage/job, where the key is the ChecklistBank key for the reference dataset you want (documented here: COL ChecklistBank API).
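To make that call a bit more concrete, here is a minimal Python sketch that only builds the request for a matching job without sending it. The path comes from the answer above. Everything else is an assumption to verify against the ChecklistBank API documentation: the base URL `https://api.checklistbank.org`, the POST method, the `text/tsv` upload and HTTP Basic authentication. `build_match_job_request` is a hypothetical helper name.

```python
import base64
import urllib.request

CLB_API = "https://api.checklistbank.org"  # assumed ChecklistBank API base URL


def build_match_job_request(dataset_key, names_tsv, user, password):
    """Build (but do not send) a request for an asynchronous matching job.

    dataset_key: ChecklistBank key of the reference checklist to match against.
    names_tsv:   bytes of a tab-separated file holding the names to match.
    Upload format and authentication scheme are assumptions.
    """
    url = f"{CLB_API}/dataset/{dataset_key}/match/nameusage/job"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=names_tsv,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "text/tsv",
        },
    )
```

Once you have confirmed the details, the request can be sent with `urllib.request.urlopen(req)`. Building it separately makes it easy to inspect the URL and headers first.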

Ideally, the system could infer the AphiaIDs directly from the names provided. I have logged the idea in the IPT GitHub repository: Would it be possible to add AphiaIDs to species records on datasets on the IPT · Issue #2649 · gbif/ipt · GitHub

Can the name matching system work for any identifiers?

Potentially, you can match your names to any checklist available in ChecklistBank.

Can the RGBIF functions give a good estimate of how the names would be interpreted by GBIF?

The RGBIF package is a wrapper for the GBIF API (GBIF API Reference :: Technical Documentation). You need to use the function that corresponds to match (name_backbone in rgbif; see Species API :: Technical Documentation), not search. The match function is what is used to match the scientific names of occurrences to the GBIF backbone taxonomy. See also this blogpost: (Almost) everything you want to know about the GBIF Species API - GBIF Data Blog.
Keep in mind that you can only query up to 100,000 records with the API.
Using the matching tool from checklistbank would help you match as many names as you want.
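For those working outside R, the same match call can be sketched in Python. This assumes the public endpoint `https://api.gbif.org/v1/species/match` with a `name` parameter and optional classification hints such as `kingdom`. The helper names and the choice of fields in the summary are hypothetical, picked for illustration.

```python
import json
import urllib.parse
import urllib.request

MATCH_URL = "https://api.gbif.org/v1/species/match"  # Species match endpoint


def build_match_url(name, **classification):
    """Build a match URL; pass hints such as kingdom="Animalia" as keywords."""
    params = {"name": name, **classification}
    return f"{MATCH_URL}?{urllib.parse.urlencode(params)}"


def match_name(name, **classification):
    """Call the match endpoint (needs network access) and decode the JSON."""
    url = build_match_url(name, **classification)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def summarize(match):
    """Keep the response fields that are most useful when reviewing a match."""
    fields = ("matchType", "usageKey", "scientificName", "status", "confidence")
    return {field: match.get(field) for field in fields}
```

For example, `summarize(match_name("Puma concolor", kingdom="Animalia"))` gives a compact view of how a name would be interpreted. It is worth reviewing by hand any name whose `matchType` is not `EXACT`.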

About hybrid names: any chance to cover cultivar names (not the names in the ICNCP code, but cultivars used by plant breeders)?

It depends if there is a checklist available for those names. If not, the challenge is to first assemble and publish such a checklist.

We would like to have a session about translation of the Darwin Core to other languages for the conference Datos Vivos 2025 in Bogota. If you are interested in participating, please contact @EstebanMH-SiB.
