How to download data uploaded after a certain date?

Hi there,

I wondered if there was a way of downloading occurrence data that has been uploaded to GBIF since a certain date?

I want to download (via rgbif) all occurrences for a particular taxonomic group (Poaceae family) that have been uploaded to GBIF since 2014 (i.e. since the last time I downloaded them).

From looking at the numbers of records, I believe that specifying the ‘year’ criterion (i.e. all records from the start of 2014) will only give me occurrences that were observed in that time frame. There must be records from before 2014 that have been uploaded since then (the number of occurrences has increased by ~10m since 2014, but only ~1.5m are accounted for by specifying the year criterion). Is there information about when the records were uploaded? I’ve tried specifying ‘last interpreted’ in the GBIF API to see how that influences the number of records, but it seems that all records have been modified since 2014 (adding that criterion does not change the number of records).

Is there a way without downloading all the data again?

Many thanks!

@sckott do you know a way of doing this in rgbif? Thanks :slight_smile:

Hi Kim,

I do think the last interpreted date would be the right way forward, but I’m unsure why you’re not getting the expected results. I’m hoping some of my colleagues can chip in here. @mgrosjean, @MattBlissett?

/Daniel

Thanks for asking here @kim1801 - I want to see what GBIF folks have to say first to make sure we know what the various date variables mean

An update to provide more info: it seems like something changed (at least for Poaceae records) after August this year with regard to ‘last interpreted’. Here are some numbers:

last interpreted before end of August 2019 = 0 records
last interpreted before end of Sept 2019 = 24 M records
last interpreted before end of October 2019 = 26.8 M records
last interpreted before end of November 2019 = 26.9 M records

Anyone know what is going on with ‘last interpreted’?

@mgrosjean @MattBlissett perhaps?

Best wishes, and thanks!

Kim

Hi Kim, Scott,

lastInterpreted, and the related fields lastParsed and lastCrawled, are not really useful for analysis. They exist so data publishers can see the state of their data in GBIF.org.

The oldest lastInterpreted date is in September because all records were reprocessed in September to link them to the updated GBIF backbone taxonomy. Records are also reprocessed if we update our geography index or change other interpretation processes.

There isn’t an easy way to omit the data you downloaded before. If a key (or GBIF id) is present in both the 2014 download and the current index, then that is the same record (we do not reuse keys), but it’s possible for publishers to delete and recreate records, so a new key is not a guarantee that the record wasn’t present in GBIF in 2014.
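For anyone who wants to try the key comparison described above, here is a minimal sketch in Python. It assumes both the old and the current download are tab-delimited files with a `gbifID` column; the column name in a 2014 download may differ, and the function names `read_keys` and `split_by_key` are made up for illustration. Note that it still needs at least the key column of a current download, and that the "new_keys" group may include records that were deleted and recreated by the publisher, so it is only an upper bound on what is genuinely new.

```python
import csv
import io


def read_keys(handle, key_column="gbifID", delimiter="\t"):
    """Collect the record keys from an open tab-delimited download file.

    key_column is an assumption: adjust it to whatever the key column
    is called in your own download.
    """
    reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return {row[key_column] for row in reader if row.get(key_column)}


def split_by_key(old_keys, current_keys):
    """Compare two sets of keys.

    unchanged: keys in both downloads (same record, since keys are not reused)
    new_keys:  keys only in the current download (new OR deleted-and-recreated)
    gone:      keys only in the old download (deleted since then)
    """
    return {
        "unchanged": old_keys & current_keys,
        "new_keys": current_keys - old_keys,
        "gone": old_keys - current_keys,
    }


# Tiny in-memory example standing in for the two download files.
old_file = io.StringIO("gbifID\tspecies\n1\tPoa annua\n2\tPoa pratensis\n")
new_file = io.StringIO("gbifID\tspecies\n2\tPoa pratensis\n3\tFestuca rubra\n")

result = split_by_key(read_keys(old_file), read_keys(new_file))
print(sorted(result["new_keys"]))  # keys to treat as candidates for new records
```

With real files you would pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` objects. For tens of millions of records the key sets should still fit in memory on most machines, but that is untested here.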

Ah ok, I understand. Many thanks for that information Matt.

Cheers,

Kim