Rgbif: How to download Taxon Identifiers

giovabubi · March 20, 2021, 5:34pm

Hello,
I would like to download all Taxon Identifiers (e.g. Index Fungorum ID, NCBI taxonomy ID, EPPO Code, etc.) for given occurrences (e.g. Verticillium dahliae), using rgbif.
Can you help me?
Many thanks in advance.

Giovanni

sckott · March 27, 2021, 6:33pm

rgbif maintainer here. First, are you sure those other taxon identifiers are in GBIF’s data?

giovabubi · March 27, 2021, 6:58pm

Yes. Search for a fungal species, click on the species to open its page (e.g. Verticillium dahliae Kleb.), and you will see TAXON IDENTIFIERS at the bottom of the page.

pieter · April 14, 2021, 10:04am

Hello giovabubi, since those taxon identifiers seem to be sourced from Wikidata, wouldn’t it be a better approach to fetch them from the Wikidata SPARQL endpoint instead? You could still use the GBIF taxon identifier to link both sources.

I wrote a quick demo for this in R; it needs some work to make it more efficient. You are probably better off using WikidataR::extract_claims() to retrieve claims for a number of objects at the same time, but I opted to do it this way so it would be easier to read if you are not familiar with Wikidata.
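
The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the gist: it assumes that Wikidata stores the GBIF taxon key under property P846 ("GBIF taxon ID"), and the helper names build_identifier_query() and get_wikidata_identifiers() are made up for this example. It needs the rgbif and WikidataQueryServiceR packages only for the part that goes online.

```r
# Build a SPARQL query that returns every external identifier Wikidata holds
# for the item whose GBIF taxon ID (assumed to be property P846) is `gbif_key`.
build_identifier_query <- function(gbif_key) {
  # Avoid scientific notation (e.g. "1e+05") when the key is numeric.
  key <- trimws(format(gbif_key, scientific = FALSE))
  sprintf(
    'SELECT ?propertyLabel ?value WHERE {
       ?taxon wdt:P846 "%s" .
       ?property wikibase:propertyType wikibase:ExternalId ;
                 wikibase:directClaim ?claim .
       ?taxon ?claim ?value .
       SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
     }',
    key
  )
}

# Hypothetical wrapper: look the name up in the GBIF backbone, then ask
# Wikidata for the identifiers attached to the matching taxon key.
get_wikidata_identifiers <- function(species_name) {
  match <- rgbif::name_backbone(name = species_name)
  if (!"usageKey" %in% names(match)) {
    stop("No GBIF backbone match for: ", species_name)
  }
  WikidataQueryServiceR::query_wikidata(build_identifier_query(match$usageKey))
}

# Needs network access, so it is left commented out here:
# get_wikidata_identifiers("Verticillium dahliae")
```

Keeping the query builder separate from the network call makes it easy to print and inspect the SPARQL before sending anything to the endpoint.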

https://gist.github.com/PietrH/bdbadd18564dedfbb5defb5c89a17160

gbif_query_to_wikidata_ext_identifiers.R

# demonstration script of how to get all wikidata identifiers for a taxon via gbif lookup

# NOTES This script is not vectorized over species, and could be a lot more
# efficient by reducing the number of queries per taxon from two to one, and by
# only generating the lut once instead of once for every column.

# load libraries ----------------------------------------------------------

library(rgbif)
library(WikidataQueryServiceR)

Let me know if I can clarify anything

giovabubi · April 14, 2021, 11:32am

Dear Pieter,
this script is amazing. It is exactly what I want. Thank you so much! I have thousands of species to search, and I will run your script inside a for loop.
Thank you again.
Cheers

Giovanni

giovabubi · April 17, 2021, 8:27pm

Dear Pieter,
the script gives an error when searching for species like Agriopis marginaria. Would you be so kind as to check?
Thank you in advance.

Giovanni

pieter · May 5, 2021, 9:49am

@giovabubi, I didn’t notice your message. If you ping me using @, I’ll notice quicker next time.

The issue was that Agriopis marginaria has multiple values for the iNaturalist taxon ID property. I’ve made a small change so the script will warn you if this happens but still return a dataframe with multiple rows (with all identifiers repeating except the one with multiple values).

https://gist.github.com/PietrH/bdbadd18564dedfbb5defb5c89a17160

However, I encourage you to investigate taxa which have multiple values for an identifier to figure out why this happened. In this case it’s a matter of a subspecies taxon on iNaturalist being mapped to the corresponding species-level object on Wikidata.
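
One way to find such cases is to count distinct values per taxon and identifier. The sketch below is only an illustration in base R: the long-format table layout (columns taxon, property, value), the placeholder values and the helper name find_multi_valued() are assumptions for this example and do not come from the gist.

```r
# Hypothetical long-format result: one row per taxon / identifier / value.
# The values are placeholders, not real identifiers.
ids <- data.frame(
  taxon    = rep("Agriopis marginaria", 3),
  property = c("iNaturalist taxon ID", "iNaturalist taxon ID", "NCBI taxonomy ID"),
  value    = c("A1", "A2", "B1"),
  stringsAsFactors = FALSE
)

# Return the taxon/identifier pairs that carry more than one distinct value,
# and warn when any are found.
find_multi_valued <- function(ids) {
  counts <- aggregate(value ~ taxon + property, data = ids,
                      FUN = function(v) length(unique(v)))
  names(counts)[names(counts) == "value"] <- "n_values"
  multi <- counts[counts$n_values > 1, , drop = FALSE]
  if (nrow(multi) > 0) {
    warning(sprintf("%d taxon/identifier pair(s) have more than one value",
                    nrow(multi)))
  }
  multi
}

multi <- find_multi_valued(ids)
print(multi)
```

The rows returned are the ones worth checking by hand on Wikidata, for example to see whether a subspecies and its species share one item.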

pieter · May 5, 2021, 9:51am

Also, it’s a lot more efficient to reduce the number of queries you send to Wikidata. At the moment you are querying the database twice for every taxon, and you could cut this down considerably by querying multiple taxa at the same time. I didn’t do this because I wanted my script to be easier for you to read in case you weren’t familiar with SPARQL and WikidataR. But it would save you some time and the Wikidata SPARQL endpoint some load.
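
To illustrate the batching idea, here is a minimal sketch that puts many GBIF keys into a single SPARQL VALUES clause. It is not the bulk script from the gist: the helper names, the chunk size of 50 and the use of property P846 ("GBIF taxon ID") as the link are assumptions made for this example.

```r
# Build one SPARQL query for several GBIF taxon keys at once.
build_bulk_query <- function(gbif_keys) {
  # Avoid scientific notation and padding when the keys are numeric.
  keys <- trimws(format(gbif_keys, scientific = FALSE))
  values <- paste(sprintf('"%s"', keys), collapse = " ")
  sprintf(
    'SELECT ?gbifKey ?propertyLabel ?value WHERE {
       VALUES ?gbifKey { %s }
       ?taxon wdt:P846 ?gbifKey .
       ?property wikibase:propertyType wikibase:ExternalId ;
                 wikibase:directClaim ?claim .
       ?taxon ?claim ?value .
       SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
     }',
    values
  )
}

# Split a long vector of keys into chunks so that no single query gets too
# large. The default of 50 is an arbitrary choice, not a documented limit.
chunk_keys <- function(gbif_keys, chunk_size = 50) {
  split(gbif_keys, ceiling(seq_along(gbif_keys) / chunk_size))
}

# Hypothetical driver: one request per chunk instead of two per taxon.
# Needs network access and the WikidataQueryServiceR package.
fetch_identifiers <- function(gbif_keys, chunk_size = 50) {
  results <- lapply(chunk_keys(gbif_keys, chunk_size), function(keys) {
    WikidataQueryServiceR::query_wikidata(build_bulk_query(keys))
  })
  do.call(rbind, results)
}
```

With thousands of species, this turns thousands of requests into a few dozen, which is kinder to the endpoint and usually faster overall.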

@giovabubi

I’ve made some changes so you can now query many different taxa at the same time:

https://gist.github.com/PietrH/bdbadd18564dedfbb5defb5c89a17160#file-gbif_query_to_wikidata_ext_identifiers_bulk-r

gbif_query_to_wikidata_ext_identifiers_bulk.R

# demonstration script of how to get all wikidata identifiers for a taxon via
# gbif lookup, in bulk this time

# This script queries GBIF once to fetch taxon id's, then Wikidata to get a list
# of properties for the corresponding objects, then it fetches the values for
# these properties in bulk

# load libraries ----------------------------------------------------------

library(rgbif)

list_of_species_to_test.txt

Xyleborinus saxesenii Ratzeburg, 1837
Trogoderma variabile Ballion, 1878
Hypothenemus birmanus Wood & Bright, 1992
Dinoderus minutus (Fabricius, 1775)
Cryptolestes ferrugineus (Stephens, 1831)
Oryzaephilus surinamensis (Linnaeus, 1758)
Xanthogaleruca luteola (O.F.Müller, 1766)
Anthrenus scrophulariae (Linnaeus, 1758)
Crossotarsus nitescens Schedl, 1979a
Crossotarsus externedentatus Wood & Bright, 1992

giovabubi · May 6, 2021, 11:23am

Many thanks, @pieter!

system · June 5, 2021, 9:23pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
