lastInterpreted -- How often does GBIF interpret records?

Forgive me if this has already been asked:

I’m working on a project to generate preliminary conservation rankings for Texas invertebrates. As part of this project, I’m working with a local copy of GBIF’s invertebrate records for Texas from a select few datasets.
I’m also working with GBIF’s backbone as a starting point.

My hope is to occasionally sync this data as GBIF receives updates. Initially, I had planned to use the modified column to perform these syncs (only ingesting data that has been modified by the provider since my last sync). However, I realized that if GBIF revised its backbone, I could end up with orphaned observation taxa.

I then turned to the lastInterpreted column as a solution: if I ingest records that have been reinterpreted since my last sync, I should get both newly added records and records that have changed within GBIF’s system.
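For reference, here is a rough sketch of the kind of query I have in mind. It only builds the search URL and does not call the API. The function name and the all-zero dataset key are placeholders I made up, and I am assuming the occurrence search endpoint accepts a comma-separated date range for lastInterpreted.

```python
from datetime import date
from urllib.parse import urlencode, urlparse, parse_qs

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"

def reinterpreted_since_url(dataset_key, last_sync, today, limit=0):
    """Build a search URL for records with lastInterpreted in [last_sync, today].

    Hypothetical helper: the range syntax "start,end" is an assumption
    about how the occurrence search API accepts date ranges.
    """
    params = {
        "datasetKey": dataset_key,
        "stateProvince": "Texas",
        "lastInterpreted": f"{last_sync.isoformat()},{today.isoformat()}",
        "limit": limit,  # 0 asks only for the match count, no records
    }
    return f"{GBIF_SEARCH}?{urlencode(params)}"

# Placeholder dataset key, not a real dataset.
url = reinterpreted_since_url(
    "00000000-0000-0000-0000-000000000000",
    last_sync=date(2024, 1, 1),
    today=date(2024, 1, 8),
)
print(url)
```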

The problem I’ve run into is that in the last week, nearly every record in my dataset (over 2.5 million) has been reinterpreted twice.
Is this to be expected? I don’t want to be re-downloading the entire dataset if I don’t need to be, but I’m currently a bit stumped on how to ensure that any taxonomic reinterpretations are synced to my local data.

Thanks for the help,
Mitch

Hi @mihtmo

Sorry for the late reply. Unfortunately, there isn’t any easy way to achieve what you would like to do.

The modified field is a value provided by the GBIF data publishers (see the Darwin Core Quick Reference Guide). Not every publisher provides a value for that field, so you wouldn’t necessarily get the latest records.

lastInterpreted won’t tell you whether a particular record has been updated. Every time a dataset is ingested by GBIF, all the records are reinterpreted. For example, if a dataset is updated because one new record was added, all the records in the dataset will be reinterpreted. In addition to that, GBIF may reinterpret all the records in the index when some interpretation changes are deployed.
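Since lastInterpreted changes even when nothing else in a record does, one way to find out what actually changed is to compare the content of the records. Below is a minimal sketch of that idea, not something GBIF provides: it hashes a few taxonomy-related columns per gbifID and reports the records that are new or whose hash differs. The choice of columns and the function names are assumptions for illustration. Note that this still requires a fresh download to compare against; it only limits how much of your local data you need to rewrite.

```python
import hashlib

# Columns assumed to capture the taxonomic interpretation of a record.
# lastInterpreted is deliberately left out.
TAXON_FIELDS = ("taxonKey", "acceptedTaxonKey", "scientificName", "taxonomicStatus")

def taxon_fingerprint(record):
    """Hash the taxonomy-related fields of one record (a dict)."""
    joined = "\x1f".join(str(record.get(field, "")) for field in TAXON_FIELDS)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def changed_ids(local, fresh):
    """Return gbifIDs that are new or whose taxonomy fingerprint differs.

    Both arguments map gbifID -> record dict.
    """
    return {
        gbif_id
        for gbif_id, record in fresh.items()
        if gbif_id not in local
        or taxon_fingerprint(local[gbif_id]) != taxon_fingerprint(record)
    }

# Made-up records for illustration only.
local = {
    "1": {"taxonKey": 10, "scientificName": "Aus bus", "lastInterpreted": "2024-01-01"},
    "2": {"taxonKey": 20, "scientificName": "Cus dus", "lastInterpreted": "2024-01-01"},
}
fresh = {
    "1": {"taxonKey": 10, "scientificName": "Aus bus", "lastInterpreted": "2024-01-08"},
    "2": {"taxonKey": 21, "scientificName": "Cus dus", "lastInterpreted": "2024-01-08"},
    "3": {"taxonKey": 30, "scientificName": "Eus fus", "lastInterpreted": "2024-01-08"},
}
print(sorted(changed_ids(local, fresh)))
```

In this made-up example, record 1 was reinterpreted but is not reported, because only its lastInterpreted value changed.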

One option would be to check whether the number of records in your selection has increased significantly before creating a new download.
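A rough sketch of such a check could look like the following. It assumes that the occurrence search API returns a count field and that limit=0 returns the count without any records. The function names and the 1% threshold are placeholders, since what counts as a significant increase depends on your project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"

def fetch_count(**filters):
    """Ask the occurrence search API how many records match the filters.

    Makes a network request; filters are passed through as query
    parameters (for example datasetKey or stateProvince).
    """
    query = urlencode({**filters, "limit": 0})
    with urlopen(f"{GBIF_SEARCH}?{query}", timeout=30) as response:
        return json.load(response)["count"]

def should_redownload(previous_count, current_count, threshold=0.01):
    """Return True if the count grew by at least `threshold` (a fraction).

    The 1% default is an arbitrary placeholder, not a GBIF recommendation.
    """
    if previous_count <= 0:
        return current_count > 0
    growth = (current_count - previous_count) / previous_count
    return growth >= threshold
```

For example, you could store the count from your last download, call fetch_count(stateProvince="Texas", datasetKey=...) with your own dataset key before each sync, and only request a new download when should_redownload returns True. Keep in mind that a count check will not detect records whose taxonomy changed while the total stayed the same.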