Forgive me if this has already been asked:
I’m working on a project to generate preliminary conservation rankings for Texas invertebrates. As part of this project, I’m working with a local copy of GBIF’s invertebrate records for Texas from a select few datasets.
I’m also working with GBIF’s backbone as a starting point.
My hope is to occasionally sync this data as GBIF receives updates. Initially, I had planned to use the modified column to perform these syncs (only ingest data that has been modified by the provider since my last sync). However, I realized that, in the event that GBIF restructured their backbone, I could end up with observations pointing at orphaned taxa.
I then turned to the lastInterpreted column as a solution: if I ingest records that have been reinterpreted in the system since my last sync, I should get both newly added records and records that have received changes within GBIF's system.
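For context, the filter I have in mind looks roughly like this. It's a minimal sketch assuming GBIF's "simple" tab-delimited download format (which includes gbifID and lastInterpreted columns); the sample rows and species names are made up for illustration.

```python
# Sketch of an incremental sync filter: keep only rows whose
# lastInterpreted timestamp is newer than the previous sync time.
# Column names follow GBIF's simple CSV/TSV download format;
# the sample data below is fabricated for illustration.
import csv
import io
from datetime import datetime, timezone

SAMPLE_TSV = """gbifID\tscientificName\tlastInterpreted
1001\tLibellula luctuosa\t2023-01-10T04:12:00.000Z
1002\tArgia apicalis\t2023-03-02T18:30:00.000Z
1003\tEnallagma civile\t2023-03-05T09:00:00.000Z
"""

def parse_ts(value):
    # GBIF emits ISO 8601 with a trailing 'Z'; normalize it so
    # datetime.fromisoformat can parse it (pre-3.11 compatible).
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def records_since(tsv_text, last_sync):
    """Yield rows reinterpreted after last_sync (a timezone-aware datetime)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if parse_ts(row["lastInterpreted"]) > last_sync:
            yield row

last_sync = datetime(2023, 2, 1, tzinfo=timezone.utc)
changed = list(records_since(SAMPLE_TSV, last_sync))
print([r["gbifID"] for r in changed])  # only the two post-February rows
```

The idea is that the stored last_sync timestamp advances after each successful ingest, so each pass only touches rows GBIF has reinterpreted since.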
The problem I’ve run into is that in the last week, nearly every record in my dataset (2.5+ million) has been reinterpreted twice.
Is this to be expected? I don’t want to re-download the entire dataset if I don’t need to, but I’m currently a bit stumped on how to ensure that any taxonomic reinterpretations are synced to my local data.
Thanks for the help,
Mitch