Rgbif: How to download Taxon Identifiers

Hello,
I would like to download all Taxon Identifiers (e.g. Index Fungorum ID, NCBI taxonomy ID, EPPO Code, etc.) for given occurrences (e.g. Verticillium dahliae), using rgbif.
Can you help me?
Many thanks in advance.

Giovanni

rgbif maintainer here. First, are you sure those other taxon identifiers are in GBIF’s data?

Yes. Search for a fungal species, click on the species to open its page (e.g. Verticillium dahliae Kleb.), and you will see TAXON IDENTIFIERS at the bottom of the page.

Hello giovabubi, since those taxon identifiers seem to be sourced from Wikidata, wouldn’t a better approach be to fetch them from the Wikidata SPARQL endpoint instead? You could still use the GBIF taxon identifier to link both sources.

I wrote a quick demo for this in R. It needs some work to make it more efficient: you are probably better off using WikidataR::extract_claims() to retrieve claims for a number of objects at the same time, but I opted to do it this way so it would be easier to read if you are not familiar with Wikidata.

Let me know if I can clarify anything
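
For readers following along, here is a minimal sketch of the general idea (not the demo script itself): look up the Wikidata item whose GBIF taxon ID (property P846) matches a GBIF taxon key, then read a few other identifier properties from it. The function names, the placeholder key, and the choice of the WikidataQueryServiceR package are assumptions made for illustration. The property numbers used are P1391 (Index Fungorum ID), P685 (NCBI taxonomy ID), P3031 (EPPO Code) and P3151 (iNaturalist taxon ID).

```r
# Build a SPARQL query for one GBIF taxon key (hypothetical helper).
build_taxon_query <- function(gbif_key) {
  stopifnot(length(gbif_key) == 1)
  paste(
    "SELECT ?item ?itemLabel ?indexFungorum ?ncbi ?eppo ?inaturalist WHERE {",
    # P846 = GBIF taxon ID: this links the Wikidata item to the GBIF backbone
    sprintf('  ?item wdt:P846 "%s" .', as.character(gbif_key)),
    # OPTIONAL keeps the item even when an identifier is missing
    "  OPTIONAL { ?item wdt:P1391 ?indexFungorum . }",
    "  OPTIONAL { ?item wdt:P685 ?ncbi . }",
    "  OPTIONAL { ?item wdt:P3031 ?eppo . }",
    "  OPTIONAL { ?item wdt:P3151 ?inaturalist . }",
    '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
    "}",
    sep = "\n"
  )
}

# Send the query to the Wikidata endpoint (hypothetical helper).
# Assumes the WikidataQueryServiceR package is installed; needs network access.
fetch_taxon_identifiers <- function(gbif_key) {
  if (!requireNamespace("WikidataQueryServiceR", quietly = TRUE)) {
    stop("Please install the WikidataQueryServiceR package first.")
  }
  WikidataQueryServiceR::query_wikidata(build_taxon_query(gbif_key))
}

# Placeholder key for illustration only, not the real key of any taxon.
example_key <- "1234567"
query <- build_taxon_query(example_key)
cat(query, "\n")
```

In practice the key would probably come from rgbif, for example the `usageKey` column returned by `rgbif::name_backbone(name = "Verticillium dahliae")`, and would then be passed to `fetch_taxon_identifiers()`. If an identifier has several values on Wikidata, this kind of query returns one row per value.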

Dear Pieter,
this script is amazing. It is exactly what I want. Thank you so much! I have thousands of species to search, and I will run your script in a for loop.
Thank you again.
Cheers

Giovanni

Dear Pieter,
the script gives an error when searching for species like Agriopis marginaria. Would you be so kind as to check?
Thank you in advance.

Giovanni

@giovabubi, I didn’t notice your message. If you ping me using @, I’ll notice more quickly next time.

The issue was that Agriopis marginaria has multiple values for the iNaturalist taxon ID property. I’ve made a small change so the script will warn you if this happens but still return a dataframe with multiple rows (with all identifiers repeating except the one with multiple values).

However, I encourage you to investigate taxa which have multiple values for an identifier to figure out why this happened. In this case, a subspecies taxon on iNaturalist has been mapped to the corresponding species-level object on Wikidata.
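
To make that kind of check concrete, here is a small sketch that scans a results data frame for taxa where an identifier column holds more than one distinct value. The function name, the column names and the toy data are all made up for illustration; it assumes one row per combination of values and a `gbif` column holding the taxon key.

```r
# Report taxon/identifier pairs with more than one distinct value
# (hypothetical helper; column names are assumptions).
find_multi_valued <- function(ids, key_col = "gbif") {
  id_cols <- setdiff(names(ids), key_col)
  found <- list()
  for (key in unique(ids[[key_col]])) {
    rows <- ids[ids[[key_col]] == key, , drop = FALSE]
    for (col in id_cols) {
      # count distinct non-missing values of this identifier for this taxon
      n <- length(unique(stats::na.omit(rows[[col]])))
      if (n > 1) {
        found[[length(found) + 1]] <- data.frame(
          key = key, identifier = col, n_values = n,
          stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(found) == 0) {
    return(data.frame(key = character(), identifier = character(),
                      n_values = integer(), stringsAsFactors = FALSE))
  }
  result <- do.call(rbind, found)
  warning(sprintf("%d taxon/identifier pair(s) have multiple values",
                  nrow(result)), call. = FALSE)
  result
}

# Toy data with placeholder values, not real identifiers.
toy <- data.frame(
  gbif = c("key_A", "key_A", "key_B"),
  ncbi = c("n1", "n1", "n2"),
  inaturalist = c("i1", "i2", "i3"),
  stringsAsFactors = FALSE
)
multi <- find_multi_valued(toy)
print(multi)
```

Here only `key_A` is flagged, for the `inaturalist` column, because its two rows differ in that column alone.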

Also, it’s a lot more efficient to reduce the number of queries you send to Wikidata. At the moment you are querying the database twice for every taxon. You could reduce this a lot by querying multiple taxa at the same time. I didn’t do this because I wanted my script to be easier for you to read in case you weren’t familiar with SPARQL and WikidataR. But it would save you some time and the Wikidata SPARQL endpoint some load.
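
As a rough illustration of what batching could look like (a sketch under assumptions, not the script from this thread): a SPARQL `VALUES` clause lets one query cover many GBIF taxon keys, and the pattern below returns every external identifier of each matching item in long format. The function names, the chunk size of 50, the one-second pause and the use of WikidataQueryServiceR are all assumptions.

```r
# Build one query covering several GBIF taxon keys (hypothetical helper).
build_batch_query <- function(gbif_keys) {
  stopifnot(is.character(gbif_keys), length(gbif_keys) > 0)
  values <- paste(sprintf('"%s"', unique(gbif_keys)), collapse = " ")
  paste(
    "SELECT ?gbif ?item ?propertyLabel ?value WHERE {",
    sprintf("  VALUES ?gbif { %s }", values),
    "  ?item wdt:P846 ?gbif .",            # P846 = GBIF taxon ID
    "  ?item ?directClaim ?value .",
    # keep only properties whose datatype is an external identifier
    "  ?property wikibase:directClaim ?directClaim ;",
    "            wikibase:propertyType wikibase:ExternalId .",
    '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }',
    "}",
    sep = "\n"
  )
}

# Split a long vector of keys into chunks (chunk size is an arbitrary choice).
chunk_keys <- function(gbif_keys, chunk_size = 50) {
  unname(split(gbif_keys, ceiling(seq_along(gbif_keys) / chunk_size)))
}

# Run one query per chunk and stack the results (needs network access and
# assumes the WikidataQueryServiceR package is installed).
fetch_batch <- function(gbif_keys, chunk_size = 50, pause = 1) {
  if (!requireNamespace("WikidataQueryServiceR", quietly = TRUE)) {
    stop("Please install the WikidataQueryServiceR package first.")
  }
  results <- lapply(chunk_keys(gbif_keys, chunk_size), function(chunk) {
    Sys.sleep(pause)  # small pause to go easy on the endpoint
    WikidataQueryServiceR::query_wikidata(build_batch_query(chunk))
  })
  do.call(rbind, results)
}

# Placeholder keys for illustration only; real keys are numeric GBIF taxon keys
# passed as character strings.
placeholder_keys <- sprintf("key_%03d", 1:120)
chunks <- chunk_keys(placeholder_keys, chunk_size = 50)
queries <- vapply(chunks, build_batch_query, character(1))
length(queries)
```

With 120 keys and chunks of 50, this sends three queries instead of hundreds. A long-format result (one row per taxon, property and value) also copes naturally with identifiers that have several values.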

@giovabubi

I’ve made some changes so you can now query many different taxa at the same time:

Many thanks @pieter ! :yum:
