Understanding GBIF taxonomic keys - usageKey, taxonKey, speciesKey

When using GBIF or rgbif, you might encounter several confusing GBIF keys.

All GBIF taxonomic keys are id numbers used in place of words to identify species or some other taxonomic group.

taxonKey:
A taxonKey is the primary id number used in GBIF to id a species (or some higher group). These are the id numbers found in the GBIF backbone taxonomy. Often you will see them in the URL of an occurrence search: https://www.gbif.org/occurrence/search?taxon_key=7412043. These are the most important keys and usually what other keys map back to (rule of thumb: “all keys lead to taxonKeys”).

usageKey:
When using rgbif or the GBIF API, you might encounter a usageKey. This key can be treated as equivalent to a GBIF taxonKey in the context of the GBIF Backbone. I have never found a usageKey in the wild that was not equivalent to a GBIF taxonKey, but one might exist.

speciesKey:
Sometimes you might find a speciesKey in a download or somewhere else. This is the key for the species and is often equivalent to the GBIF taxonKey in the context of the GBIF Backbone. If the record is identified only to a rank above species (a genus, for example), it won’t have a speciesKey. The same goes for the other GBIF ranked keys: genusKey, familyKey, orderKey, classKey, phylumKey, kingdomKey. These are also equivalent to their corresponding GBIF taxonKeys when the taxon is of that rank.

nubKey:
A nubKey is a GBIF taxonKey found in the context of another checklist. For example, the checklist WoRMS has 154946057 as its key for Animalia, but a nubKey of 1, which is the GBIF taxonKey for Animalia. It is called a “nubKey” rather than a taxonKey for legacy and compatibility reasons.

parentKey:
In the context of the GBIF backbone, the parentKey is the GBIF taxonKey of the nearest higher rank (genus, family, order, class, etc.). For example, the parentKey of Arthropoda (54) is Animalia (1). However, in a checklist that is not the GBIF backbone, these numbers will be different.
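
As a rough illustration of how the ranked keys and the parentKey relate to a taxon's own key, here is a minimal Python sketch. The dictionary is hand-written from the Arthropoda and Animalia numbers above and is only assumed to resemble a backbone record; `RANK_KEY_FIELDS` and `own_rank_key` are hypothetical helper names, not part of GBIF or rgbif.

```python
# Hypothetical lookup table: rank name -> the ranked-key field for that rank.
RANK_KEY_FIELDS = {
    "KINGDOM": "kingdomKey",
    "PHYLUM": "phylumKey",
    "CLASS": "classKey",
    "ORDER": "orderKey",
    "FAMILY": "familyKey",
    "GENUS": "genusKey",
    "SPECIES": "speciesKey",
}


def own_rank_key(record):
    """Return the ranked key that matches the record's own rank, or None."""
    field = RANK_KEY_FIELDS.get(record.get("rank"))
    return record.get(field) if field else None


# Hand-written record using the numbers quoted in this post (an assumption
# about shape, not a copy of a real API response).
arthropoda = {
    "key": 54,
    "rank": "PHYLUM",
    "kingdomKey": 1,
    "phylumKey": 54,
    "parentKey": 1,
}

# For a phylum, phylumKey is the taxon's own key ...
print(own_rank_key(arthropoda) == arthropoda["key"])
# ... and the parentKey is the key of the next rank up (Animalia).
print(arthropoda["parentKey"] == arthropoda["kingdomKey"])
```

Both comparisons hold for this hand-written record; a record from a checklist other than the backbone would carry different numbers.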

Let me know if you have encountered any other keys that you find confusing…

Thank you, this helps but one thing not mentioned above is taxonID - I have a further question, asked here:

Hi!

There are a few more keys:
sourceTaxonKey
datasetKey
constituentKey
For example, https://api.gbif.org/v1/species/2607519

Could you explain what they are and how to use them, please?
Thanks in advance!

Hi @tagezi

The other fields you mentioned are there to help track where the name comes from. As you might already know, the GBIF backbone taxonomy is built from other sources which are published on GBIF as datasets/checklists (see this blogpost: Six questions answered about the GBIF Backbone Taxonomy - GBIF Data Blog). The idea is to be as transparent as possible about the source of the names.

The datasetKey is the key of the dataset (=checklist) used as source for this taxon. In the example you mentioned, it is https://www.gbif.org/dataset/09ae0135-3572-4662-9917-49d5f73daf6a (Species Fungorum for CoL+).

The constituentKey is mostly used for sources that come to GBIF via the Catalogue of Life. That’s why the key you see here corresponds to the Catalogue of Life checklist: https://www.gbif.org/dataset/7ddf754f-d193-4cc9-b351-99906754a03b. So in the example, Cladonia P.Browne, 1756 comes from Species Fungorum via the Catalogue of Life.

The sourceTaxonKey is the key of the taxon in the source checklist. In the example you mentioned, it is: https://www.gbif.org/species/176009415 (this is what was shared on GBIF in the first place).
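
To make the three fields a bit more concrete, here is a small Python sketch that turns them into the portal links quoted above. The record is typed in by hand from the values in this reply (not fetched from the API), and `provenance_links` is a hypothetical helper name used only for illustration.

```python
GBIF_PORTAL = "https://www.gbif.org"


def provenance_links(record):
    """Return portal URLs for whichever provenance fields the record has."""
    links = {}
    if "datasetKey" in record:
        links["dataset"] = f"{GBIF_PORTAL}/dataset/{record['datasetKey']}"
    if "constituentKey" in record:
        links["constituent"] = f"{GBIF_PORTAL}/dataset/{record['constituentKey']}"
    if "sourceTaxonKey" in record:
        links["sourceTaxon"] = f"{GBIF_PORTAL}/species/{record['sourceTaxonKey']}"
    return links


# Values copied from this reply for species 2607519; the dictionary
# shape is an assumption, not a real API response.
cladonia = {
    "key": 2607519,
    "datasetKey": "09ae0135-3572-4662-9917-49d5f73daf6a",
    "constituentKey": "7ddf754f-d193-4cc9-b351-99906754a03b",
    "sourceTaxonKey": 176009415,
}

for field, url in provenance_links(cladonia).items():
    print(field, url)
```

Fields missing from a record are simply skipped, since not every name necessarily carries all three.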

Does it make sense? Please let us know if you have more questions.

Hello,

Thank you for the clarifications!

Could you also explain what acceptedTaxonKey stands for and how it is different from taxonKey?

Often, speciesKey matches the acceptedTaxonKey and not the taxonKey.

For example: https://api.gbif.org/v1/occurrence/search?taxonKey=8148822

I’m also confused about the relationship between taxonKey, speciesKey, and acceptedTaxonKey.

Using rgbif, if I search for “Conyza canadensis”:

tmp <- name_backbone(name = "Conyza canadensis")

The results include:

key value
usageKey 5404801
acceptedUsageKey 3146791
speciesKey 3146791

From your explanation, I take it usageKey and taxonKey are synonyms. However, in this case usageKey/taxonKey and speciesKey are not equivalent.

If I search for the usageKey (as taxonKey) the results include:

taxonKey 5404801
speciesKey 3146791
acceptedTaxonKey 3146791
scientificName Conyza canadensis (L.) Cronquist
acceptedScientificName Erigeron canadensis L.

Searching for the acceptedTaxonKey I get:

taxonKey 3146791
speciesKey 3146791
acceptedTaxonKey 3146791
scientificName Erigeron canadensis L.
acceptedScientificName Erigeron canadensis L.

I downloaded both taxa via

rgbif::occ_download(pred("taxonKey", 3146791), ...)

Using the acceptedTaxonKey I retrieved 357182 records. Using the original usageKey/taxonKey, I got 174860 records, and I confirmed that they were all present in the larger record set.

After all of this, I think the taxonKey/usageKey applies to the exact name you search for, but if that name is considered a synonym, you will not get all the records for the accepted species that the backbone assigns it to. To get all the records for the species, including those filed under the accepted name, the key to use is the acceptedTaxonKey, which is probably(?) the same as the speciesKey, at least when we’re looking for a species.

And as an added twist, rgbif::name_backbone(name = "Conyza canadensis") doesn’t include acceptedTaxonKey in the results, but does return an acceptedUsageKey.

I think/hope I’ve answered my own question, but it would be reassuring to have all of this explained somewhere. Is it? Have I missed it?

Hi @plantarum

You are correct.

The taxonKey corresponds indeed to the key of the name you searched. In the case where this name is not the accepted name (but a synonym), the acceptedTaxonKey will be different.

If you search occurrences with the key corresponding to an accepted name (the acceptedTaxonKey), you will get all the occurrences where the scientific name matches the accepted name, any of its synonyms, or any child taxa (this will include any subspecies in this case).
If you search occurrences with the key corresponding to a synonym, you will only get the occurrences where the scientific name matches the synonym.
This is why you get different results depending on the key you use.
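
A minimal Python sketch of that rule, assuming a name-match result shaped like the usageKey / acceptedUsageKey output shown earlier in the thread: prefer the accepted key when one is present, otherwise fall back to the usageKey. The helper names `key_for_all_records` and `occurrence_search_url` are hypothetical, and the dictionary is copied by hand from the Conyza canadensis example rather than fetched.

```python
from urllib.parse import urlencode

OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def key_for_all_records(match):
    """Pick the accepted key if the matched name is a synonym, else its own key."""
    return match.get("acceptedUsageKey", match["usageKey"])


def occurrence_search_url(taxon_key):
    """Build an occurrence search URL for the given taxonKey."""
    return OCCURRENCE_SEARCH + "?" + urlencode({"taxonKey": taxon_key})


# Keys quoted earlier in the thread for "Conyza canadensis" (a synonym).
conyza = {"usageKey": 5404801, "acceptedUsageKey": 3146791, "speciesKey": 3146791}

# Synonym only: occurrences recorded under the name Conyza canadensis.
print(occurrence_search_url(conyza["usageKey"]))
# Accepted name: also covers its synonyms and child taxa.
print(occurrence_search_url(key_for_all_records(conyza)))
```

Falling back to the usageKey keeps the helper usable for names that are already accepted, where no acceptedUsageKey is reported.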

Some explanations can be found in the GBIF API documentation (see for example the explanation for the taxonKey parameter in the occurrence download).

Other helpful resources include the Data Use Club recorded webinars. There is one about the GBIF API and one about the GBIF backbone taxonomy which could be quite helpful.

The documentation needs some improvement.
