GBIF attempts to improve identifier stability by monitoring changes of occurrenceIDs - GBIF Data Blog

dodobot · November 6, 2023, 12:53pm

Since 2022, GBIF has been monitoring changes of occurrenceIDs in datasets to improve the stability of GBIF identifiers. We pause data ingestion when we detect that more than half of the occurrence records in the latest version of a dataset have occurrenceIDs that differ from those in the previous version (the one live on GBIF.org). This identifier validation process automatically creates issues on GitHub, and the GBIF helpdesk then contacts the publishers to verify the changes of occurrenceIDs.
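As a rough illustration of the "more than half" rule described above, here is a minimal Python sketch. It is not GBIF's actual pipeline code: the function names `changed_fraction` and `should_pause_ingestion`, the `threshold` parameter, and the choice to count a record as changed when its occurrenceID does not appear anywhere in the previous version are all assumptions made for this example.

```python
def changed_fraction(previous_ids, latest_ids):
    """Fraction of records in the latest version whose occurrenceID
    is absent from the previous version (assumed definition of 'changed')."""
    previous = set(previous_ids)
    latest = list(latest_ids)
    if not latest:
        # No records in the latest version: nothing to compare.
        return 0.0
    new_count = sum(1 for occ_id in latest if occ_id not in previous)
    return new_count / len(latest)


def should_pause_ingestion(previous_ids, latest_ids, threshold=0.5):
    """Return True when strictly more than `threshold` of the latest
    records carry occurrenceIDs unseen in the previous version."""
    return changed_fraction(previous_ids, latest_ids) > threshold


if __name__ == "__main__":
    live = ["occ-1", "occ-2", "occ-3", "occ-4"]
    republished = ["occ-1", "new-2", "new-3", "new-4"]
    print(changed_fraction(live, republished))
    print(should_pause_ingestion(live, republished))
```

A publisher could run a check like this on the occurrenceID column of two exports before republishing, to see whether an update is likely to be flagged.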

This is a companion discussion topic for the original entry at https://data-blog.gbif.org/post/improve-identifier-stability

EstebanMH-SiB · November 7, 2023, 1:52am

Thank you very much @mgrosjean @Kumiko, really informative and useful. A small question: are the data blog posts available for translation in Crowdin? If not, could we translate the first part of this blog post and publish it on our web page (with full credit to Kumiko, of course)? We will try to share this information with our network in 2024 and make it more widely known.

dshorthouse · November 7, 2023, 2:17am

Great work everyone on the new routine and on the description of what actions are taken under what conditions. And thanks for the shout-out to Bionomia and its users, who stand to benefit from the increased stability in occurrenceIDs you’re able to foster. This is a ton of work with a lot of back-and-forth communication with data publishers. Let’s hope the volume of issues continues to show signs of attenuation as data publishers embrace the importance of stable identifiers for those who use their data, for repeatability in science, and for their own tracking purposes.

Kumiko · November 7, 2023, 10:25am

Thank you for your interest in sharing this! The data blog is not in Crowdin. Please feel free to use this blog post. You can translate and publish the materials on your website. Thank you for mentioning the credits.

peterdesmet · November 9, 2023, 11:30am

Excellent!

There seems to be a formatting issue in the section "Three options to deal with identifier issues". I believe the following is intended to be a table?

Number Option Who can do this What happens after 1 Resume the data ingestion by allowing changes of occurrenceIDs GBIF helpdesk GBIF identifiers under old occurrenceIDs will be deprecated and new GBIF identifiers will be given for new occurrenceIDs. 2 Change back …

peterdesmet · November 9, 2023, 11:32am

Oh, I see I can also read it in the original post, "GBIF attempts to improve identifier stability by monitoring changes of occurrenceIDs - GBIF Data Blog", which has the correct formatting.
