Simple_CSV doesn't include the search key used

I’ve just downloaded a large set of occurrence records for several hundred taxa. I did some work up front to make sure I was using the taxonKeys for accepted taxa, in order to get all records for the taxa including their synonyms.

With my list of taxonKeys, I used the following rgbif function to get the data:

occ_download_prep(
    pred_in("taxonKey", keys),
    pred("hasCoordinate", TRUE),
    format = "SIMPLE_CSV"
    )

where keys is a vector of taxonKeys.

That all worked fine, but my original search set of ~800 species has returned a table with ~2000 species in it. I was expecting that, but I was also expecting SIMPLE_CSV to include the acceptedTaxonKey, or some other column I could use to group my records by the original search terms.

Am I missing something, or will I now need to do a name_usage search for each taxonKey in my data to find its acceptedTaxonKey (i.e., which of my original search terms it belongs to)?
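
For reference, the per-key lookup described above might look roughly like the sketch below. It assumes the rgbif package is installed and that name_usage(key = ...)$data carries an acceptedKey column for synonyms; lookup_accepted and map_to_accepted are hypothetical helper names, not part of rgbif. It only resolves synonyms, so records for child taxa (e.g. subspecies of a searched species) would still need separate handling.

```r
# Sketch only: assumes rgbif is installed and that the returned usage
# has an `acceptedKey` column when the key refers to a synonym.
lookup_accepted <- function(taxon_key) {
  usage <- rgbif::name_usage(key = taxon_key)$data
  # Accepted usages have no acceptedKey, so fall back to the key itself.
  if ("acceptedKey" %in% names(usage) && !is.na(usage$acceptedKey[1])) {
    usage$acceptedKey[1]
  } else {
    taxon_key
  }
}

# Query each distinct key once, then expand back to one value per record.
# `fetch` is a parameter so the lookup can be swapped out or mocked.
map_to_accepted <- function(taxon_keys, fetch = lookup_accepted) {
  unique_keys <- unique(taxon_keys)
  accepted <- vapply(unique_keys, fetch, FUN.VALUE = numeric(1))
  accepted[match(taxon_keys, unique_keys)]
}
```

With ~2000 distinct keys this means ~2000 API calls, which is why querying only the unique keys matters.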

It will be a bit awkward if the API lets you search for multiple taxa at once, but returns a set of occurrences with different names/keys, without indicating which records belong with which of the searched taxa.


Hi @plantarum,

As far as I know, the simple CSV download only includes the taxonKey (the GBIF backbone key corresponding to the scientificName, or to a higher rank if the scientificName isn’t found in the backbone) and the speciesKey, which corresponds to the accepted name for occurrences with the taxon rank species.

The values in the scientificName field correspond to the scientific name interpreted by GBIF (in other words, matched to the backbone taxonomy) while the verbatimScientificName field contains the name as provided to GBIF by the data publishers.

If you need more fields, I suggest downloading occurrences in the Darwin Core Archive download format. Amongst other fields, the occurrence.txt file contains:

  • taxonKey
  • acceptedTaxonKey
  • kingdomKey
  • phylumKey
  • classKey
  • orderKey
  • familyKey
  • genusKey
  • subgenusKey
  • speciesKey
  • species
  • acceptedScientificName
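
As a rough sketch of that suggestion, the same request can be prepared with format = "DWCA", and the resulting occurrence.txt records can then be tagged by acceptedTaxonKey. This assumes rgbif is installed with GBIF credentials configured, and that keys is the original vector of searched taxonKeys; prep_dwca_request, tag_search_key and the searchKey column are hypothetical names used only for illustration.

```r
# Same predicates as the original request, but asking for a Darwin Core Archive.
# Needs GBIF credentials to be configured for rgbif when actually called.
prep_dwca_request <- function(keys) {
  rgbif::occ_download_prep(
    rgbif::pred_in("taxonKey", keys),
    rgbif::pred("hasCoordinate", TRUE),
    format = "DWCA"
  )
}

# Tag each record with the searched key it belongs to, using acceptedTaxonKey.
# `occ` is a data frame read from occurrence.txt.
# Records whose acceptedTaxonKey is not one of the searched keys
# (e.g. child taxa) get NA and need separate handling.
tag_search_key <- function(occ, keys) {
  occ$searchKey <- ifelse(occ$acceptedTaxonKey %in% keys,
                          occ$acceptedTaxonKey, NA)
  occ
}
```

Records left with NA in searchKey could then be checked against speciesKey or the other rank keys listed above.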

Thanks,

Yes, in some cases the speciesKey is what I need, but not consistently: some of the taxa I was searching for were subspecies, and for those I’m out of luck.

I found it a little confusing trying to find out which fields are included in SIMPLE_CSV. The only documentation I could locate says it “includes the most commonly used fields” but doesn’t actually list them. I will submit a new search for the Darwin Core Archive.

