I call each of these a “GBIF problem” in the title because they usually arise when GBIF processes data from an IPT (Integrated Publishing Toolkit). This isn’t always the case, though (see below).
Here are three examples. In each case, a perfectly valid ISO 8601 eventDate entry is converted by GBIF to a falsely precise, and therefore incorrect, ISO 8601 date.
- 2017 is processed as 2017-01-01
- 2017-04 is processed as 2017-04-01
- 2017-04-06/2017-06-25 is processed as 2017-04-06
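It helps to see what each of those valid entries actually covers. The Python sketch below is my own illustration, not GBIF code, and the function names are made up for it. It reads an eventDate as the range of days it spans, and it handles only year, year-month and full dates plus simple “start/end” intervals (no times, week dates or abbreviated intervals).

```python
import calendar
from datetime import date


def _bounds(value: str) -> tuple[date, date]:
    """Return the earliest and latest day covered by YYYY, YYYY-MM or YYYY-MM-DD."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:  # year only: the whole year
        (year,) = parts
        return date(year, 1, 1), date(year, 12, 31)
    if len(parts) == 2:  # year and month: the whole month
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:  # full date: a single day
        day = date(*parts)
        return day, day
    raise ValueError(f"unsupported date form: {value!r}")


def date_range(event_date: str) -> tuple[date, date]:
    """Interpret an eventDate as the (earliest, latest) days it covers."""
    if "/" in event_date:  # simple interval written as start/end
        start, end = event_date.split("/")
        return _bounds(start)[0], _bounds(end)[1]
    return _bounds(event_date)


for value in ["2017", "2017-04", "2017-04-06/2017-06-25"]:
    earliest, latest = date_range(value)
    print(f"{value}: {earliest} to {latest}")
# 2017: 2017-01-01 to 2017-12-31
# 2017-04: 2017-04-01 to 2017-04-30
# 2017-04-06/2017-06-25: 2017-04-06 to 2017-06-25
```

In all three examples, the value GBIF keeps is just the earliest day of the range; the rest of the range is lost.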
These GBIF problems have been repeatedly noted and discussed on the GBIF, TDWG and related GitHub pages for years:
- Display occurrence dates with appropriate level of precision #313 (March 2016)
- Introduce Occurrence isoEventDate field(s) #2 (September 2016)
- event date whats wrong? #217 (June 2017)
- Invalid date reported for date intervals in eventDate #60 (October 2017)
- year from, year to #652 (November 2017)
- eventDates incorrect #1307 (April 2018)
- Misleading “Identified date unlikely” flag (Remarks) #1488 (August 2018)
- dates in table list #1538 (September 2018)
- Representing imprecise dates in the Java object model #36 (December 2018)
- startDayOfYear and endDayOfYear, year, month, day #223 (May 2019)
- Additional checks for vague date ranges required? #23 (February 2020)
- January 1 peaks #2796 (May 2020)
- Can we prevent over-precise eventDates being generated in interpretation? #2816 (June 2020)
- The interpreted date seems to be wrong : “2008-06-01/2008-06-30” is interpreted to “2008-06-01” #3786 (November 2021)
- Temporal Coverage doesn’t allow for year only or year month only date ranges #1752 (March 2022)
- Preventing/Reporting the creation of false data in eventDate #811 (October 2022)
- Introduce Occurrence isoEventDate field(s) #2 (September 2016 - October 2018)
- What and how to return Occurrence eventDate? #4 (September 2016 - February 2020)
To save you reading through all those issues and discussions, here’s a summary: “GBIF is aware of these problems but hasn’t fixed them yet”.
To its credit, GBIF allows data users to see or download both the before-processing date (verbatim value) and the after-processing date (interpreted value). While this is great for savvy data users, it must be a continuing source of embarrassment for GBIF technical staff.
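If you have both values for a record, a quick check for over-precise interpretation is to compare their precision. Here’s a minimal Python sketch under my own simple assumptions: precision is read from the number of date parts (year, month, day), any time part after “T” is ignored, and an interval that comes out as a single date counts as over-precise. The function names are hypothetical, and the code doesn’t fetch anything from GBIF; you supply the two strings however you obtained them.

```python
_LEVELS = {1: "year", 2: "month", 3: "day"}
_RANK = {"year": 1, "month": 2, "day": 3}


def precision(value: str) -> str:
    """Classify an ISO 8601 eventDate as 'year', 'month', 'day' or 'interval'."""
    if "/" in value:
        return "interval"
    date_part = value.split("T")[0]  # drop any time part, e.g. T10:30
    n_parts = len(date_part.split("-"))
    if n_parts not in _LEVELS:
        raise ValueError(f"unsupported date form: {value!r}")
    return _LEVELS[n_parts]


def is_over_precise(verbatim: str, interpreted: str) -> bool:
    """True if the interpreted value claims more precision than the verbatim one."""
    v, i = precision(verbatim), precision(interpreted)
    if v == "interval":
        return i != "interval"  # an interval collapsed to a single date
    if i == "interval":
        return False  # e.g. a year expanded to a year-long interval is not over-precise
    return _RANK[i] > _RANK[v]


pairs = [
    ("2017", "2017-01-01"),
    ("2017-04", "2017-04-01"),
    ("2017-04-06/2017-06-25", "2017-04-06"),
    ("2017-04-06", "2017-04-06"),
]
for verbatim, interpreted in pairs:
    print(verbatim, "->", interpreted, "over-precise:", is_over_precise(verbatim, interpreted))
```

The first three pairs are the examples above and are flagged; the last pair is left alone.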
But it isn’t always GBIF’s fault. Data compilers and publishers sometimes work with software that refuses to recognise “2017” as a date or “2017-04-06/2017-06-25” as a date interval. The date error arises during data entry and stays there. If the data publisher enters “2017-01-01” when only “2017” is known (or worse, “2017-00-00”), that’s the data publisher’s mistake, not GBIF’s.
The solution at the before-processing stage is to pass the data to a third-party data-checking service. First-of-month and first-of-year dates will get queried, and if verbatimEventDate is “6 April to 25 June 2017”, the data compiler will be encouraged to change eventDate from “2017-04-06” to “2017-04-06/2017-06-25”.
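Here’s a minimal Python sketch of the kind of queries such a check might raise. It isn’t any particular service’s code: the function name and messages are hypothetical, the “ to ” test on verbatimEventDate is a crude heuristic I’ve assumed for illustration, and a real checker would do much more (checking day numbers against month lengths, looking inside intervals, handling other verbatim wordings).

```python
import re
from typing import Optional

# A full YYYY-MM-DD date, capturing year, month and day
FULL_DATE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")


def query_event_date(event_date: str, verbatim: str = "") -> Optional[str]:
    """Return a query message for a suspicious eventDate, or None if nothing to query."""
    # Heuristic (an assumption): " to " in the verbatim text suggests a date range
    if " to " in verbatim and "/" not in event_date:
        return "verbatimEventDate looks like a range: should eventDate be an interval?"
    match = FULL_DATE.match(event_date)
    if match is None:
        return None  # years, year-months and intervals aren't queried here
    _, month, day = match.groups()
    if month == "00" or day == "00":
        return "invalid: zero month or day"
    if month == "01" and day == "01":
        return "first of year: is only the year known?"
    if day == "01":
        return "first of month: are only the year and month known?"
    return None


print(query_event_date("2017-04-06", "6 April to 25 June 2017"))
```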
Robert Mesibov (“datafixer”); robert.mesibov@gmail.com