Determining if Occurrences Have Been Deleted

mihtmo · March 31, 2026, 8:29pm

I’m reviving an old thread of @MatDillen’s that went unanswered:

My situation is the same as in their post: I have a fairly large, periodic query that I’m running on the download request API. This sync is working well, and I’m able to filter to records that were modified after my last sync to avoid running a huge query every time. However, the case of deleted records is worrying me.

It’s my impression that deleted records do not continue to exist with a “deleted” attribute of some sort, although I would love to be wrong! Is there any way to see, from a download, that a record has disappeared? If not, does anyone have any ideas for how to avoid downloading every record in a large query, every single time, just to check for deleted records?

Thanks in advance!

MatDillen · April 1, 2026, 9:39am

The only method I found was to set up predicate queries for all the records that were present in the previous query but are no longer in the most recent one. Then you’ll get a result like https://doi.org/10.15468/dl.n6rrdm, where you’ll only get 44,765 out of the queried batch of 100,000. The records missing from this download have effectively been deleted from the index and will only show a tombstone record on their /occurrence/[GBIFID] endpoint. The 44,765 that were found can be presumed to no longer match (possibly temporarily) your recurring query’s conditions.
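A minimal sketch of that workflow in Python, for illustration only. It assumes you keep the set of gbifIDs from each sync, and that the download API accepts an `in` predicate on the `GBIF_ID` key; the function names, the `SIMPLE_CSV` format choice, and the batch size (taken from the 100,000 batch mentioned above, not from a documented limit) are all assumptions. It only builds the request bodies and classifies the results; submitting the request and reading the download are left out.

```python
def missing_ids(previous_ids, current_ids):
    """gbifIDs seen in the previous sync but absent from the latest one."""
    return sorted(set(previous_ids) - set(current_ids))


def build_download_requests(ids, creator, email, batch_size=100_000):
    """Yield one download request body per batch of gbifIDs.

    The body layout (creator, notificationAddresses, format, predicate)
    is assumed here; check it against the download API documentation.
    """
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        yield {
            "creator": creator,
            "notificationAddresses": [email],
            "format": "SIMPLE_CSV",
            "predicate": {
                "type": "in",
                "key": "GBIF_ID",  # assumed predicate key for gbifID
                "values": [str(i) for i in batch],
            },
        }


def classify(queried_ids, returned_ids):
    """Split the queried IDs by whether the ID-based download returned them.

    Returned IDs are still indexed, so they merely stopped matching the
    recurring query. IDs that were not returned are presumed deleted.
    """
    returned = set(returned_ids)
    still_indexed = [i for i in queried_ids if i in returned]
    presumed_deleted = [i for i in queried_ids if i not in returned]
    return still_indexed, presumed_deleted
```

For example, with hypothetical IDs, `missing_ids([3, 1, 2, 5], [2, 3])` gives `[1, 5]`. If the ID-based download then returns only record 5, `classify([1, 5], [5])` gives `([5], [1])`: record 5 no longer matches the query, and record 1 is presumed deleted.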

However, it is also possible for records to go missing intermittently, reappearing later with a new gbifID. When I ran my process over a year ago, I ran into the example of https://www.gbif.org/occurrence/5004364847 and https://www.gbif.org/occurrence/4910179219 . The latter is tombstoned and thus was no longer present in my most recent download. But a new record with the same data (catalog number, occurrence id, cetaf id…) had appeared in the most recent download.

My theory at the time was that this record had been omitted in a published version at the (Biocase) source, and then re-added in a later one. I think this may cause the process that preserves the link between source IDs and the integer gbifIDs to break.

It is not trivial to identify and troubleshoot these kinds of glitches because, as far as I know, GBIF does not preserve the GBIF-processed version of a deleted record, only the raw source data on the tombstone page. So you’ll have to do some mapping and converting yourself to enable pairwise comparison or clustering to flag these kinds of “duplicates”. A further complication is that source record identifiers are diverse in protocol and not always stable themselves.
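As a rough illustration of the pairwise comparison, here is a sketch in Python that pairs presumed-deleted records with records that are new in the latest download when they share a source identifier. It assumes both sides have already been mapped to plain dicts with a `gbifID` plus the Darwin Core fields `occurrenceID` and `catalogNumber`; the function names and the normalisation rule are hypothetical. A catalog number alone is not unique across institutions, so matches are candidates to review, not confirmed duplicates.

```python
def normalise(value):
    """Trim and lowercase an identifier; empty or missing values become None."""
    if not value:
        return None
    cleaned = value.strip().lower()
    return cleaned or None


def find_reappeared(deleted_records, new_records,
                    keys=("occurrenceID", "catalogNumber")):
    """Map each deleted gbifID to the new gbifIDs sharing a source identifier."""
    # Index the new records by (field name, normalised value).
    index = {}
    for rec in new_records:
        for key in keys:
            value = normalise(rec.get(key))
            if value:
                index.setdefault((key, value), []).append(rec["gbifID"])

    # Look up every identifier of each deleted record in that index.
    matches = {}
    for rec in deleted_records:
        hits = set()
        for key in keys:
            value = normalise(rec.get(key))
            if value:
                hits.update(index.get((key, value), []))
        if hits:
            matches[rec["gbifID"]] = sorted(hits)
    return matches
```

With hypothetical records, a deleted record `{"gbifID": 1, "occurrenceID": "urn:example:A "}` and a new record `{"gbifID": 2, "occurrenceID": "URN:example:a"}` would be paired as `{1: [2]}`.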

mgrosjean · April 1, 2026, 1:10pm

Deleted records aren’t indexed, so they can’t be searched. @MatDillen’s solution is probably the easiest for you.

As Mathias mentioned, some records disappear but may reappear later. Some of this phenomenon is due to changes in occurrenceID values by data providers. If you want to learn more on the topic, you can read this blogpost or watch this video.

pieter · April 7, 2026, 12:13pm

Would comparing parquet dumps be an option?

AnaMaria · April 22, 2026, 12:47pm

Yeah, there’s no delete flag in GBIF. If a record is missing, it has either been deleted or no longer matches your query. Most people handle it by comparing snapshots.
