Fetching datasets (dataset keys) instead of occurrence records with rgbif functions

Hello 🙂
With the rgbif package, I am trying to collect datasets that match certain constraints, such as having media (StillImage) and being preserved specimens. I also want to filter them by their taxonKeys. For occurrence records, it could look like this:

library(rgbif)

# limit and start are paging variables set elsewhere in the script
occ_with_images <- occ_search(
  mediaType = "StillImage",
  basisOfRecord = "PRESERVED_SPECIMEN",
  taxonKey = 7289045,
  limit = limit,
  start = start
)
This returns a nested list of occurrence records that includes, among other information, the name (or better, the datasetKey) of the dataset each occurrence belongs to.

So I am wondering: is there a way to directly request the datasets that match the constraints?
Currently I have to fetch the occurrence records, extract the unique datasetKeys behind them, and only then can I start to access the datasets of interest.
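
For reference, the two-step workaround described here could be sketched roughly as follows. This is only a minimal sketch: `extract_dataset_keys` is a hypothetical helper name, and it assumes the occ_search() result carries a `data` table with a `datasetKey` column.

```r
# Pull the distinct dataset keys out of an occ_search() result.
# Assumption: the result has a `data` table with a `datasetKey` column.
extract_dataset_keys <- function(occ_result) {
  keys <- occ_result$data$datasetKey
  unique(keys[!is.na(keys)])
}

# Hypothetical usage with the search from the question (needs network access):
# occ_with_images <- rgbif::occ_search(
#   mediaType = "StillImage", basisOfRecord = "PRESERVED_SPECIMEN",
#   taxonKey = 7289045, limit = 300
# )
# extract_dataset_keys(occ_with_images)
```

The drawback is that every matching occurrence page has to be downloaded just to learn which datasets are involved.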

It might be important to note that we do not know the datasetKeys or dataset names beforehand. Rather, we are trying to identify datasets of interest that match specified parameters. I did find the datasets() function, but no workaround for applying constraints in the way that occ_search() or occ_download() allow.
dataset_search() only seems to allow a free-text query to filter for still images and preserved specimens, which is not precise enough.

I hope I was able to make clear what I am looking for. Many thanks in advance!

Hi @usersarah

It sounds like what you need is to use the API facets like this: http://api.gbif.org/v1/occurrence/search?media_type=StillImage&taxon_key=7289045&basisOfRecord=PRESERVED_SPECIMEN&limit=0&facet=dataset_key
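
If you want to call that endpoint from R without going through rgbif, a rough sketch might look like the one below. The helper names are hypothetical, it uses the camelCase parameter spellings, it adds `facetLimit` because the API otherwise returns only a limited number of facet values, and it assumes the response has the shape `facets[[1]]$counts` with `name` and `count` entries.

```r
# Build the faceted search URL; limit = 0 asks for facet counts only.
facet_url <- function(taxon_key, facet_limit = 1000) {
  paste0(
    "https://api.gbif.org/v1/occurrence/search",
    "?mediaType=StillImage",
    "&basisOfRecord=PRESERVED_SPECIMEN",
    "&taxonKey=", format(taxon_key, scientific = FALSE),
    "&limit=0&facet=datasetKey",
    "&facetLimit=", format(facet_limit, scientific = FALSE)
  )
}

# Turn the parsed JSON (nested lists) into a table of dataset keys and counts.
# Assumption: facets[[1]]$counts is a list of entries with `name` and `count`.
parse_facet_counts <- function(parsed) {
  counts <- if (length(parsed$facets) > 0) parsed$facets[[1]]$counts else list()
  data.frame(
    datasetKey = vapply(counts, function(x) x$name, character(1)),
    count = vapply(counts, function(x) as.numeric(x$count), numeric(1)),
    stringsAsFactors = FALSE
  )
}

# Fetch and parse in one go (needs the jsonlite package and network access).
fetch_dataset_facets <- function(taxon_key) {
  parsed <- jsonlite::fromJSON(facet_url(taxon_key), simplifyVector = FALSE)
  parse_facet_counts(parsed)
}

# fetch_dataset_facets(7289045)
```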

You could also try a SQL occurrence download, you can read more about it here: GBIF SQL Downloads - GBIF Data Blog.

@usersarah

You could try to use facets in rgbif.

occ_count(facet = "datasetKey", taxonKey = 7289045, facetLimit = 100000)

This will return just the dataset keys that match the search.

https://docs.ropensci.org/rgbif/articles/occ_counts.html
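
The call above only filters by taxonKey. To also apply the media and basis-of-record constraints from the question, something like the following sketch may work. It assumes occ_count() forwards `mediaType` and `basisOfRecord` to the search API in the same way as `taxonKey`. `count_by_dataset` is a hypothetical wrapper, and its `counter` argument exists only so the wrapper can be tried without network access.

```r
# Hypothetical helper: facet occurrence counts by dataset under the
# constraints from the question. `counter` defaults to rgbif::occ_count.
count_by_dataset <- function(taxon_key, counter = rgbif::occ_count) {
  counter(
    facet = "datasetKey",
    taxonKey = taxon_key,
    mediaType = "StillImage",
    basisOfRecord = "PRESERVED_SPECIMEN",
    facetLimit = 100000
  )
}

# count_by_dataset(7289045)  # needs rgbif and network access
```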


Thank you, this helped a lot!

@mgrosjean , @jwaller
Thanks to both of you. I now get a table of datasetKeys very quickly and efficiently. There is just one open question for me (not sure if this should go in a new thread): where do I find the dataset names corresponding to the datasetKeys?

If I use the taxonKey above, I find 5 distinct datasetKeys, but only 3 dataset names in "datasetName". Yet if I manually search for the 5 datasets by their datasetKeys (e.g., OAC-BIO Herbarium), I of course find 5 different datasets with different names. Can you let me know where exactly I can find the same information when using the rgbif package? Ideally without having to make it a 2-step search.

Thank you in advance!

@usersarah
This is one way to get the dataset title from a datasetKey:

library(rgbif)
dataset_get("4fa7b334-ce0d-4e88-aaae-2e0c138d049e")$title

You can loop through them using purrr::map or something similar.
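
A minimal sketch of that loop, using base R `vapply` instead of purrr::map so that no extra package is needed. `dataset_titles` is a hypothetical helper name, and the `lookup` argument is only there so the loop can be tried offline with a stand-in function; by default it makes the same dataset_get() call shown above.

```r
# Look up the title for each dataset key and return a two-column table.
dataset_titles <- function(keys,
                           lookup = function(k) rgbif::dataset_get(k)$title) {
  titles <- vapply(keys, lookup, character(1), USE.NAMES = FALSE)
  data.frame(datasetKey = keys, title = titles, stringsAsFactors = FALSE)
}

# Needs rgbif and network access; makes one request per key:
# dataset_titles(c("4fa7b334-ce0d-4e88-aaae-2e0c138d049e"))
```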


Great, thanks a lot!

@usersarah note that if you generate an occurrence download, you will get a list (with titles) of all the datasets that have data included in the download. See this example: Download. This is the type of file you then get: https://api.gbif.org/v1/occurrence/download/0006047-241007104925546/datasets/export?format=TSV. You can do this via the web interface or the API.
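
If you want to load that export into R, a rough sketch could look like this. The helper names are hypothetical, and I have not listed the column names of the TSV here, so inspect the result with `names()` before relying on any of them.

```r
# Build the datasets-export URL for a given download key.
datasets_export_url <- function(download_key) {
  paste0(
    "https://api.gbif.org/v1/occurrence/download/",
    download_key,
    "/datasets/export?format=TSV"
  )
}

# Read the tab-separated export into a data frame (needs network access).
# quote = "" keeps stray quote characters in titles from breaking the parse.
read_download_datasets <- function(download_key) {
  utils::read.delim(datasets_export_url(download_key),
                    quote = "", stringsAsFactors = FALSE)
}

# datasets <- read_download_datasets("0006047-241007104925546")
# names(datasets)
```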