GBIF API: Supporting ranges in occurrence eventDate

Dear GBIF API users,

A longstanding issue with the GBIF API is the interpretation and formatting of the Darwin Core term “eventDate”.

Summary: instead of GBIF changing published eventDate values like 2009-03-18/2009-04-13 and 2010 to 2009-03-18 and 2010-01-01 respectively, we propose returning the values 2009-03-18/2009-04-13 and 2010 in the occurrence API and in downloads. Existing code/scripts that use the eventDate value may need to be updated.

The main issue tracking this is “What and how to return Occurrence eventDate?” (Issue #4 in gbif/gbif-api on GitHub).

The recommended best practice for the term is “use a date that conforms to ISO 8601-1:2019” (see the Darwin Core quick reference guide).

ISO 8601-1:2019 supports date ranges, and some publishers provide these. Examples are 2000-05 or 2007-11-13/2007-11-15. GBIF’s current interpretation changes date ranges like these to the first possible day in the range (2000-05-01 and 2007-11-13).

At least 64 million occurrences are affected.

Change to date interpretation

We propose changing the eventDate field in the GBIF API to support ISO 8601-1 date ranges. A range will be returned where one was provided by the publisher, either directly as a range in the eventDate field, or through a combination of the year, month, day, startDayOfYear and endDayOfYear fields.

The data quality checks on dates will be improved to check for consistency between these fields: eventDate, year, month, day, startDayOfYear and endDayOfYear. The year, month and day fields will only be populated if they are constant for the whole range of dates. For example, a range spanning several days in January 2020 will have year=2020, month=1 and a blank day.

startDayOfYear and endDayOfYear will also be present if the range is accurate to days.

Examples:

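| published eventDate   | interpreted eventDate | int. year | int. month | int. day | int. startDayOfYear | int. endDayOfYear |
|-----------------------|-----------------------|-----------|------------|----------|---------------------|-------------------|
| 2023-01-13            | 2023-01-13            | 2023      | 1          | 13       | 13                  | 13                |
| 2023-01               | 2023-01               | 2023      | 1          |          |                     |                   |
| 2023                  | 2023                  | 2023      |            |          |                     |                   |
| 2023-01-13/2023-01-14 | 2023-01-13/2023-01-14 | 2023      | 1          |          | 13                  | 14                |
| 2023-01-13/14         | 2023-01-13/14         | 2023      | 1          |          | 13                  | 14                |
| 2023-01/2023-02       | 2023-01/2023-02       | 2023      |            |          |                     |                   |
| 2023-01/02            | 2023-01/02            | 2023      |            |          |                     |                   |
| 2023/2024             | 2023/2024             |           |            |          |                     |                   |
| 2023-01-01/2023-12-31 | 2023-01-01/2023-12-31 | 2023      |            |          | 1                   | 365               |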
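
The rule and examples above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not GBIF's implementation: `interpret` and `_bounds` are hypothetical names, the time of day is ignored, and an abbreviated end such as `14` in `2023-01-13/14` is assumed to borrow its leading fields from the start of the range.

```python
import calendar
from datetime import date


def _bounds(fields):
    """First and last possible day for ['YYYY'], ['YYYY', 'MM'] or ['YYYY', 'MM', 'DD']."""
    nums = [int(f) for f in fields]
    year = nums[0]
    first_month = nums[1] if len(nums) > 1 else 1
    last_month = nums[1] if len(nums) > 1 else 12
    first_day = nums[2] if len(nums) > 2 else 1
    last_day = nums[2] if len(nums) > 2 else calendar.monthrange(year, last_month)[1]
    return date(year, first_month, first_day), date(year, last_month, last_day)


def interpret(event_date):
    """Derive year, month, day, startDayOfYear and endDayOfYear (hypothetical helper)."""
    start, _, end = event_date.partition("/")
    start_fields = start.split("T")[0].split("-")  # assumption: time of day is ignored
    end_fields = end.split("T")[0].split("-") if end else start_fields
    # Abbreviated end, e.g. "2023-01-13/14": borrow the leading fields from the start.
    missing = len(start_fields) - len(end_fields)
    if missing > 0:
        end_fields = start_fields[:missing] + end_fields
    first, _ = _bounds(start_fields)
    _, last = _bounds(end_fields)
    same_year = first.year == last.year
    same_month = same_year and first.month == last.month
    # Day-of-year values are only given when both ends are accurate to the day.
    to_the_day = len(start_fields) == 3 and len(end_fields) == 3
    return {
        "year": first.year if same_year else None,
        "month": first.month if same_month else None,
        "day": first.day if first == last else None,
        "startDayOfYear": first.timetuple().tm_yday if to_the_day else None,
        "endDayOfYear": last.timetuple().tm_yday if to_the_day else None,
    }
```

Blank cells in the table correspond to `None` here; for example, `interpret("2023-01/02")` leaves everything except `year` unset.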

Other cases where we can unambiguously determine a date or date range will also be handled, for example a record with a year and month but no eventDate, or non-ISO dates like January 2023.

API example:

This record (portal link) is published with eventDate=2009-03-18/2009-04-13, year=2009, month=3, day=18. We currently change the eventDate:

"year": 2009,
"month": 3,
"day": 18,
"eventDate": "2009-03-18T00:00:00",

With this proposal, we would preserve the eventDate but remove day, as the event crosses several days:

"year": 2009,
"month": 3,
"eventDate": "2009-03-18/2009-04-13",

This record (portal link) is published with eventDate=2019-04-06T20:00:00/2019-04-10T05:00:00 and no separate day, month or year values. Currently, we process it to this:

"year": 2019,
"month": 4,
"day": 6,
"eventDate": "2019-04-06T20:00:00",

Instead, we propose returning this:

"year": 2019,
"month": 4,
"eventDate": "2019-04-06T20:00:00/2019-04-10T05:00:00",
"startDayOfYear": 96,
"endDayOfYear": 100,

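Scripts that depend on the old single-value eventDate could recover something close to it from the new value. A minimal sketch, assuming the old behaviour is simply "first possible instant of the range" as in the two examples above; `legacy_event_date` is a hypothetical helper and not part of rGBIF or PyGBIF.

```python
def legacy_event_date(event_date):
    """Approximate the previous eventDate: the first possible instant of a date or range."""
    start = event_date.split("/")[0]             # keep only the start of a range
    date_part, _, time_part = start.partition("T")
    fields = date_part.split("-")
    fields += ["01"] * (3 - len(fields))         # pad a missing month and/or day with "01"
    # Assumption: a value without a time of day was previously returned as midnight.
    return "-".join(fields) + "T" + (time_part or "00:00:00")


# Usage with a fragment shaped like the proposed API response above.
record = {"year": 2009, "month": 3, "eventDate": "2009-03-18/2009-04-13"}
old_style = legacy_event_date(record["eventDate"])
```

Code that needs the whole range should instead split the value on `/` and handle both ends.
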
Searching

The search and download APIs will be affected by this change.

Occurrences will be returned if the occurrence date/date range is completely within the query date or date range.

Search: eventDate=2023-01-11
Record: eventDate=2023-01-11    -- included
Record: eventDate=2023-01       -- EXCLUDED
Record: eventDate=2023-01-11/12 -- EXCLUDED

Search: eventDate=2023-01-11,2023-01-12
Record: eventDate=2023-01-11    -- included
Record: eventDate=2023-01       -- EXCLUDED
Record: eventDate=2023-01-11/12 -- included

Search: eventDate=*,2023-01 (meaning "Before end of January 2023")
Record: eventDate=2023-01-11    -- included
Record: eventDate=2023-01       -- included
Record: eventDate=2023-01-11/12 -- included

Search: eventDate=2023-01,2023-01 (meaning "After start of January 2023 AND before end of January 2023")
Search: eventDate=2023-01 (same meaning)
Record: eventDate=2023-01-11    -- included
Record: eventDate=2023-01       -- included
Record: eventDate=2023-01-11/12 -- included

This implementation will avoid returning occurrences with eventDates like “2010/2021” in many queries. (There are millions of occurrences with large ranges like this.)
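
The containment rule illustrated by the search examples can be sketched as follows. This is an illustration only, not the search implementation: `matches`, `_span` and `_edge` are hypothetical names, comparison is done at day precision (times are ignored), and `*` is taken to mean an open bound.

```python
import calendar
from datetime import date


def _edge(text, last):
    """First (or last) calendar day covered by YYYY, YYYY-MM or YYYY-MM-DD."""
    nums = [int(f) for f in text.split("T")[0].split("-")]
    year = nums[0]
    month = nums[1] if len(nums) > 1 else (12 if last else 1)
    if len(nums) > 2:
        day = nums[2]
    else:
        day = calendar.monthrange(year, month)[1] if last else 1
    return date(year, month, day)


def _span(value):
    """(first_day, last_day) covered by a single date or a start/end range."""
    start, _, end = value.partition("/")
    if end:
        s = start.split("T")[0].split("-")
        e = end.split("T")[0].split("-")
        if len(e) < len(s):  # abbreviated end such as "2023-01-11/12"
            end = "-".join(s[:len(s) - len(e)] + e)
    else:
        end = start
    return _edge(start, last=False), _edge(end, last=True)


def matches(query, record_event_date):
    """True if the record's date or range lies completely within the query."""
    lower, sep, upper = query.partition(",")
    if not sep:
        upper = lower  # a single value is both the lower and the upper bound
    q_first = date.min if lower == "*" else _span(lower)[0]
    q_last = date.max if upper == "*" else _span(upper)[1]
    r_first, r_last = _span(record_event_date)
    return q_first <= r_first and r_last <= q_last
```

With this rule, `matches("2015", "2010/2021")` is false, which is why records with very wide ranges drop out of most queries.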

Density maps

There is a year filter for the density/pixel maps. An occurrence from 2023-01 will be included, but an occurrence with an eventDate spanning more than a single year (like 2022-12-31/2023-01-01) will no longer be included.

Quarterly analytics, global/regional trends

The quarterly analytics include calculations based on the individual dwc:year, dwc:month and dwc:day fields. The statistics will be affected where these values change or become blank.

rGBIF, PyGBIF

Both libraries will be updated as necessary to support eventDate values containing a date range.

Feedback

We have delayed addressing this issue for a long time, primarily due to concerns about changing the existing behaviour of the API. However, it’s also one of the most frequently requested improvements to GBIF’s interpretation.

If you are aware of software or systems which would have problems adapting to the proposed change, please let us know, either here on the community forum, on the API users mailing list, the GitHub issue or by email to me.

We will alert users in the same places when the change is ready to be tested on the test system at api.gbif-uat.org, where it will remain available for testing for at least 2 weeks. We will also inform users when the change is to be made live on api.gbif.org.

Thank you,

Matt

This post also addresses the issue with eventDate: The first of the month/year GBIF problem and the interval date GBIF problem

This is very timely as one of our users just brought this issue up. I would like to see this change. Thanks!

@MattBlissett what about when the year is part of a span? As in, what will you be doing with a value like this one?

  • 2022-12-29/2023-01-15

Thanks for making these date handling changes – looks very promising!

Thanks @Debbie,

2022-12-29/2023-01-15

This date would stay as it is; it’s already in the ideal format. The separate year, month and day fields would be blank. startDayOfYear would be 363, and endDayOfYear would be 15.
