Annual growth per dataset

For the annual reporting of the Dutch node - NLBIF - I need summary stats on how many records were added for each dataset from the Netherlands in 2021. Is there an easy API call to derive that information for e.g. dataset_key=15f819bd-6612-4447-854b-14d12ee1022d?

Hi Niels.

I actually don’t think we log changes in the number of records per dataset, at least not in a systematic way that can be easily retrieved. However, each dataset has an ingestion log covering every processing step, and the record count is logged there as well.

You can see the ingestion history here (example): GBIF Registry

The API call for a dataset ingestion history would be:

https://api.gbif.org/v1/ingestion/history/15f819bd-6612-4447-854b-14d12ee1022d?limit=<number of events>

In the history, for each ingestion event, look in pipelineExecutions.steps for numberRecords, e.g.

```json
"pipelineExecutions": [
  {
    "key": 1812560,
    "stepsToRun": [
      "VERBATIM_TO_INTERPRETED"
    ],
    "rerunReason": "IUCN_RELEASE",
    "created": "2022-01-18T14:10:26.038015",
    "createdBy": "nvolik",
    "steps": [
      {
        "key": 4603740,
        "type": "VERBATIM_TO_INTERPRETED",
        "runner": "DISTRIBUTED",
        "started": "2022-01-18T19:13:42.498",
        "finished": "2022-01-18T19:37:09.822",
        "state": "COMPLETED",
        "message": "{\"datasetUuid\":\"15f819bd-6612-4447-854b-14d12ee1022d\",\"attempt\":224,\"interpretTypes\":[\"TAXONOMY\",\"BASIC\"],\"pipelineSteps\":[\"HDFS_VIEW\",\"INTERPRETED_TO_INDEX\",\"VERBATIM_TO_INTERPRETED\"],\"runner\":\"DISTRIBUTED\",\"endpointType\":\"DWC_ARCHIVE\",\"extraPath\":null,\"validationResult\":{\"tripletValid\":false,\"occurrenceIdValid\":true,\"useExtendedRecordId\":null,\"numberOfRecords\":2000000},\"resetPrefix\":\"202201181405\",\"executionId\":1812560,\"routingKey\":\"occurrence.pipelines.verbatim.finished.distributed\"}",
        "numberRecords": 4972211,
        ...
```

One would then find two ingestion events roughly one year apart and simply compare their numberRecords values.

Will require a bit of scripting, but I suppose it could be done :slight_smile:
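A minimal sketch of that scripting in Python (rather than the R mentioned later in the thread), assuming the field names shown in the example response above; the synthetic `executions` list stands in for the API response, and the commented-out fetch shows how you might retrieve the real history:

```python
from datetime import datetime

def number_records_near(executions, target_date):
    """Return numberRecords from the ingestion execution whose 'created'
    timestamp is closest to target_date. Field names follow the example
    response above; steps lacking numberRecords are skipped."""
    best, best_gap = None, None
    for execution in executions:
        created = datetime.fromisoformat(execution["created"])
        gap = abs((created - target_date).total_seconds())
        for step in execution.get("steps", []):
            if "numberRecords" in step and (best_gap is None or gap < best_gap):
                best, best_gap = step["numberRecords"], gap
    return best

# In practice, fetch the history first (requires the requests package), e.g.:
# resp = requests.get("https://api.gbif.org/v1/ingestion/history/"
#                     "15f819bd-6612-4447-854b-14d12ee1022d?limit=100")
# executions = [e for r in resp.json()["results"]
#               for e in r.get("pipelineExecutions", [])]

# Synthetic stand-in data for illustration:
executions = [
    {"created": "2021-01-15T10:00:00", "steps": [{"numberRecords": 4000000}]},
    {"created": "2022-01-18T14:10:26", "steps": [{"numberRecords": 4972211}]},
]

start = number_records_near(executions, datetime(2021, 1, 1))
end = number_records_near(executions, datetime(2022, 1, 1))
print(end - start)  # records added during 2021 for this dataset
```

The exact shape of the paged history response is an assumption here; the core idea is just to pick the execution nearest each year boundary and subtract.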


Hi Daniel, thanks! That works. I’ll post some lines of R-code later. Niels


Great, do share, other people might find it useful too!

@nl-bif, this might not be useful for you this year, but we started making some numbers for datasets available over time (including the occurrence counts). However, it only starts in July 2021: Index of /registry/dataset

Something that could be useful now would be to use the metadata associated with the whole GBIF downloads each month.
For example, here is a download generated on the 1st of January 2021 with no filter: https://doi.org/10.15468/dl.djx2hq. You can access the list of datasets and the number of records associated with each of them via the API (either https://api.gbif.org/v1/occurrence/download/0147082-200613084148143/datasets, or https://api.gbif.org/v1/occurrence/download/0147082-200613084148143/datasets/export if you want to get a file out of it). You can compare it with this download generated on the 1st of January 2022: https://doi.org/10.15468/dl.c2ycac (https://api.gbif.org/v1/occurrence/download/0088430-210914110416597/datasets/export). You then just have to extract the datasets you need to compare from each file.


Thanks, @mgrosjean, this approach seems more sound than what I had suggested!

Interesting solution! One question, where can I find the whole GBIF downloads for each month?

I don’t know if we have a place where all those are listed. @MattBlissett might know.

In the meantime, here is a list I made (we have only been making this type of monthly download since 2018):
monthly_whole_GBIF_downloads_key_doi_date.csv (3.4 KB)

Edit: if you attempt to download all the occurrences on GBIF without any filter from the UI, you should be redirected to the download of the current month.
