The first of the month/year GBIF problem and the interval date GBIF problem

I call each of these a “GBIF problem” in the title because they usually arise when GBIF processes data from an IPT. This isn’t always the case, though (see below).

Here are three examples. In each case a perfectly valid ISO 8601 eventDate entry is converted by GBIF to an incorrect ISO 8601 date.

  • 2017 is processed as 2017-01-01
  • 2017-04 is processed as 2017-04-01
  • 2017-04-06/2017-06-25 is processed as 2017-04-06
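To make the loss of information concrete, here is a minimal Python sketch that reads each of the three eventDate forms above as a span of days instead of collapsing it to a single day. The function names (`event_date_range`, `_bounds`) are made up for this illustration, it covers only the year, year-month, full-date and interval forms shown above, and it is not how GBIF or the IPT actually parses dates.

```python
import calendar
import re
from datetime import date

# Matches YYYY, YYYY-MM or YYYY-MM-DD (only the forms used in the examples above).
_PART = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def _bounds(part):
    """Return the (earliest, latest) day covered by one date of reduced precision."""
    m = _PART.match(part)
    if not m:
        raise ValueError(f"unsupported eventDate part: {part!r}")
    year = int(m.group(1))
    if m.group(2) is None:
        # Year only: the whole year is possible.
        return date(year, 1, 1), date(year, 12, 31)
    month = int(m.group(2))
    if m.group(3) is None:
        # Year and month only: the whole month is possible.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    day = date(year, month, int(m.group(3)))  # raises ValueError for e.g. 2017-00-00
    return day, day


def event_date_range(event_date):
    """Interpret an eventDate string as an inclusive (start, end) pair of days."""
    parts = event_date.strip().split("/")
    if len(parts) == 1:
        return _bounds(parts[0])
    if len(parts) == 2:
        start = _bounds(parts[0])[0]
        end = _bounds(parts[1])[1]
        if end < start:
            raise ValueError(f"interval ends before it starts: {event_date!r}")
        return start, end
    raise ValueError(f"too many '/' separators: {event_date!r}")


for value in ("2017", "2017-04", "2017-04-06/2017-06-25"):
    start, end = event_date_range(value)
    print(value, "covers", start.isoformat(), "to", end.isoformat())
```

Read this way, “2017” spans the whole year and “2017-04” the whole month, so reporting only the first day of either span claims a precision the original record never had.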

These GBIF problems have been repeatedly noted and discussed on the GBIF, TDWG and related GitHub pages for years.

To save you reading through all those issues and discussions, here’s a summary: “GBIF is aware of these problems but hasn’t fixed them yet”.

To its credit, GBIF allows data users to see or download both the before-processing date (verbatim value) and the after-processing date (interpreted value). While this is great for savvy data users, it must be a continuing source of embarrassment for GBIF technical staff.

But it isn’t always GBIF’s fault. Data compilers and publishers sometimes work with software that refuses to recognise “2017” as a date or “2017-04-06/2017-06-25” as a date interval. The date error arises during data entry and stays there. If the data publisher enters “2017-01-01” when only “2017” is known (or worse, “2017-00-00”), that’s the data publisher’s mistake, not GBIF’s.

The solution before GBIF processing is to pass the data to a third-party data-checking service. First-of-month and first-of-year dates will get queried, and if verbatimEventDate is “6 April to 25 June 2017”, the data compiler will be encouraged to change eventDate from “2017-04-06” to “2017-04-06/2017-06-25”.
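The kind of query such a check might raise can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `query_event_date`, the wording of the queries and the very crude test for an interval in verbatimEventDate (the words “to”, “until” or “through”) are all hypothetical, and do not describe any particular data-checking service.

```python
import re


def query_event_date(event_date, verbatim_event_date=""):
    """Return a list of queries for a human to review; an empty list means no query."""
    queries = []
    value = event_date.strip()

    # A zero month or day (e.g. 2017-00-00) is never a valid ISO 8601 date.
    if re.search(r"-00(-|$)", value):
        queries.append("zero month or day: enter only the parts that are known")

    # First-of-year and first-of-month dates may be real, or may be padding.
    for part in value.split("/"):
        if re.fullmatch(r"\d{4}-01-01", part):
            queries.append(f"{part} is a first-of-year date: is only the year known?")
        elif re.fullmatch(r"\d{4}-\d{2}-01", part):
            queries.append(f"{part} is a first-of-month date: is only the month known?")

    # Crude assumption: these words in the verbatim date signal an interval.
    looks_like_interval = re.search(
        r"\b(to|until|through)\b", verbatim_event_date, re.IGNORECASE
    )
    if looks_like_interval and "/" not in value:
        queries.append("verbatimEventDate looks like an interval but eventDate is not")

    return queries


print(query_event_date("2017-04-06", "6 April to 25 June 2017"))
print(query_event_date("2017-01-01"))
print(query_event_date("2017-04-06/2017-06-25", "6 April to 25 June 2017"))
```

The check only raises queries and never rewrites a value, because a first-of-month date can be perfectly correct; the decision stays with the data compiler.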


Robert Mesibov (“datafixer”); robert.mesibov@gmail.com

These issues were addressed in the UK’s NBN Data Model by the authors of our biological recording desktop software Recorder (now Recorder 6) where they were implemented.
As you say Robert, a large complex issue but one useful starting point might be the model detailed in
Copp, C. J. T. (2000). The NBN data model and its implementation in Recorder 2000. Environmental Information Management, (November), 1–105. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.79.5946&rep=rep1&type=pdf
There may be further topics raised on this on the NBN Forum, which retains stuff from decades ago
The model was subsequently used in the UK’s online system iRecord - operated by our BRC (funded by JNCC)
All of this predates Darwin Core. Nothing in my reading about Darwin Core suggests that the NBN model was ever consulted during its development, but maybe I’m wrong
That’s the best response I can give to this off the top of my head. I’m always a bit pushed on these GBIF forum topics as they close for comments so quickly. To investigate the way that the UK addressed these problems I’d suggest posting enquiries on the NBN Forum, with the warning that, like many such forums, it’s very deserted these days. The desktop system Recorder 6 is still maintained if you wish to explore it. I guess there’s a more recent paper than that 2000 one to be had somewhere
Good hunting
Darwyn

There is now work underway to fix how eventDate is handled in the API: GBIF API: Supporting ranges in occurrence eventDate