A guide to date issues

A calendar day in Darwin Core records should be represented in the eventDate field as YYYY-MM-DD (“1945-06-29”), a month as YYYY-MM (“1945-06”) and a year as YYYY (“1945”).

These three formats can also be used to represent an interval, such as “1945-06-29/1945-07-03”, “1945-06/1945-07-03”, “1945/1946” or “1945-06-17/18” (same as “1945-06-17/1945-06-18”).
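As a rough illustration, the shapes described above can be checked with a regular expression. This is a minimal sketch, not an official Darwin Core validator: the names `DATE`, `SHORT_END`, `EVENT_DATE` and `has_valid_shape` are made up for this example, and the pattern only tests the shape of the string, not whether the date exists.

```python
import re

# One date at day, month or year precision (shape only, no calendar check).
DATE = r"\d{4}(?:-\d{2}(?:-\d{2})?)?"
# Abbreviated interval end, as in "1945-06-17/18" (leading parts omitted).
SHORT_END = r"\d{2}(?:-\d{2})?"
EVENT_DATE = re.compile(rf"{DATE}(?:/(?:{DATE}|{SHORT_END}))?")


def has_valid_shape(value: str) -> bool:
    """Return True if value looks like a date or interval in the formats above."""
    return EVENT_DATE.fullmatch(value) is not None


if __name__ == "__main__":
    for text in ["1945-06-29", "1945-06/1945-07-03", "1945-06-17/18", "vii.6.1924"]:
        print(text, has_valid_shape(text))
```

A string such as “1825-00-00” passes this shape test, so a check like this needs to be combined with the calendar and logic checks described in the rest of the post.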

Getting the date right can make the difference between an event or occurrence record that’s usable and trustable, and one that isn’t. But getting the date right apparently isn’t so easy, to judge from the large number of date issues I see in my work as a data auditor.

In this post I summarise the kinds of issues that publishers should look for in their Darwin Core datasets when compiling or checking data. A key idea to keep in mind is that eventDate should hold a correct and correctly formatted date, not an exact copy of the date in the source (such as a specimen label). That exact copy should appear in a verbatimEventDate field. Example: verbatimEventDate = “vii.6.1924”, eventDate = “1924-07-06”.


Non-dates. “Unknown” and “?” are not dates. If the date isn’t known, leave eventDate blank or explain any date uncertainty in eventRemarks. For example, if the date is “July or August 1968”, put “1968” in eventDate and “July or August” in eventRemarks.

Impossible dates. Examples: “3006-11-13” (a date in the future), “2011-04-31” (April has 30 days), “2001-02-29” (2001 was not a leap year) and “1995-13-67” (there are only 12 months, and no month has more than 31 days, in the Gregorian calendar used by ISO 8601).
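One way to catch impossible full dates is to let a date library try to build them. The sketch below uses Python's standard `datetime.date`, which rejects non-existent days such as 31 April or 29 February in a non-leap year. The function name `is_real_day` and its `today` parameter are hypothetical, and the check only covers full YYYY-MM-DD values.

```python
import re
from datetime import date

DAY = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def is_real_day(value, today=None):
    """True if value is a YYYY-MM-DD day that exists and is not in the future."""
    match = DAY.fullmatch(value)
    if match is None:
        return False  # not a full, zero-padded YYYY-MM-DD string
    year, month, day = (int(part) for part in match.groups())
    try:
        parsed = date(year, month, day)  # raises ValueError for impossible days
    except ValueError:
        return False
    return parsed <= (today or date.today())


if __name__ == "__main__":
    for text in ["3006-11-13", "2011-04-31", "1995-13-67", "1945-06-29"]:
        print(text, is_real_day(text))
```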

Over-exact dates. If only the year 1973 is known, the date is “1973”, not “1973-01-01”. For April 1973, the date is “1973-04”, not “1973-04-01”. For “summer 1890” (Northern Hemisphere), “1890-06-21/1890-09-21” is over-exact. It could be replaced by “1890-06/09” or some other, appropriate span of months, or by “1890” in eventDate and “summer” in eventRemarks.

Incorrectly formatted dates. “1825-00-00” should be “1825”, “1825-06-00” should be “1825-06” and “1825–” should be “1825”.
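Placeholder zeros and trailing dashes like these can be stripped mechanically. The following is a small sketch under the assumption that “-00” at the end of a value always stands for an unknown month or day; the function name `strip_placeholders` is invented for this example, and any change it makes is best checked against verbatimEventDate.

```python
import re


def strip_placeholders(value: str) -> str:
    """Drop trailing dashes and '-00' month/day placeholders from a date string."""
    trimmed = value.rstrip("-\u2013")  # trailing hyphen or en dash, as in "1825–"
    return re.sub(r"(?:-00)+\Z", "", trimmed)


if __name__ == "__main__":
    for text in ["1825-00-00", "1825-06-00", "1825\u2013", "1825-06-17"]:
        print(repr(text), "->", repr(strip_placeholders(text)))
```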

Unnecessary intervals. If only the year 1983 is known, “1983-01-01/1983-12-31” is ambiguous overkill for “1983”. If only the month-year June 1983 is known, “1983-06-01/1983-06-30” is ambiguous overkill for “1983-06”. I’ve also seen unnecessary interval repeats: “1998-09-17/1998-09-17” should be “1998-09-17”.
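Intervals of this kind can be flagged automatically for review. The sketch below suggests a shorter value when an interval spans exactly one day, one whole calendar month or one whole calendar year. It is only a suggestion tool: the function name `suggest_simpler` is hypothetical, and because a full-year interval might be a genuine year-long sampling event, the result should be confirmed against the source before eventDate is changed.

```python
import calendar
from datetime import date


def suggest_simpler(value: str):
    """Return a shorter eventDate candidate for a full-day interval, or None."""
    start_text, sep, end_text = value.partition("/")
    if not sep:
        return None  # not an interval
    try:
        start = date.fromisoformat(start_text)
        end = date.fromisoformat(end_text)
    except ValueError:
        return None  # not a pair of full YYYY-MM-DD days
    if start == end:
        return start.isoformat()  # repeated day
    if start.year != end.year:
        return None
    if (start.month, start.day, end.month, end.day) == (1, 1, 12, 31):
        return f"{start.year:04d}"  # whole calendar year
    last_day = calendar.monthrange(start.year, start.month)[1]
    if start.month == end.month and start.day == 1 and end.day == last_day:
        return f"{start.year:04d}-{start.month:02d}"  # whole calendar month
    return None


if __name__ == "__main__":
    for text in ["1983-01-01/1983-12-31", "1983-06-01/1983-06-30",
                 "1998-09-17/1998-09-17", "1945-06-29/1945-07-03"]:
        print(text, "->", suggest_simpler(text))
```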

Swapped intervals. “1952-02-23/1950-12-20” should be “1950-12-20/1952-02-23” (earlier/later).
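Because ISO-style dates sort correctly as plain text, a swapped interval can be detected with a string comparison. This is a minimal sketch with a made-up function name, `order_interval`. It assumes both halves are already correctly formatted, and it leaves abbreviated ends (such as “/18”) and mixed-precision overlaps (such as “1945/1945-06”) untouched. A reversed interval can also be a sign of a typo in one of the dates, so the source is worth checking.

```python
def order_interval(value: str) -> str:
    """Return the interval with the earlier date first; other values unchanged."""
    start, sep, end = value.partition("/")
    if not sep:
        return value  # single date, nothing to reorder
    if len(end) < 4 or not end[:4].isdigit():
        return value  # abbreviated end such as "18" or "06-18"
    if start.startswith(end) or end.startswith(start):
        return value  # overlapping precision; needs a human decision
    # Zero-padded ISO dates compare chronologically as plain strings.
    return f"{end}/{start}" if end < start else value


if __name__ == "__main__":
    print(order_interval("1952-02-23/1950-12-20"))
```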

Disappearing intervals. The original date information is “May 16th to 28th, 1935”, but eventDate is “1935-05-16” instead of the correct “1935-05-16/28”.

Incorrect transcriptions and format conversions. These errors are easy to make but can be detected if the Darwin Core dataset includes both verbatimEventDate and eventDate. Some examples:

[Image 1: examples of transcription and format conversion errors]

Carry-forward and copy-down issues. These occur when multiple records are being prepared in a spreadsheet, or in a database that allows carrying-forward during data entry:

[Image 2: examples of carry-forward and copy-down issues]

Illogical dates. eventDate cannot logically be later than dateIdentified, measurementDeterminedDate or modified. If eventDate is “2016-11-04” and dateIdentified is “2014-01-23” (or just “2014”), then one of those dates is wrong. Note, however, that eventDate might be later than georeferencedDate if the observation or collection site was selected and georeferenced in advance.
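A comparison like this has to cope with dates of different precision, such as a full eventDate and a year-only dateIdentified. The sketch below compares the two values only down to the precision they share, so it reports a problem only when eventDate is certainly later. The function name `event_is_later` is hypothetical, and the code assumes the second value is a single date (optionally with a time part), not an interval.

```python
def event_is_later(event_date: str, other_date: str) -> bool:
    """True if eventDate is certainly later than other_date."""
    start = event_date.split("/")[0][:10]  # start of an interval, date part only
    other = other_date[:10]                # drop any time part, e.g. "T08:00:00Z"
    shared = min(len(start), len(other))   # compare at the coarser precision
    return start[:shared] > other[:shared]


if __name__ == "__main__":
    print(event_is_later("2016-11-04", "2014-01-23"))
    print(event_is_later("2016-11-04", "2014"))
    print(event_is_later("2014-06-10", "2014"))
```

The third call is not flagged, because an event on “2014-06-10” could precede an identification made some time in “2014”.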

Two-places-at-once. This is not so much a date issue as a multiple-fields, multiple-records issue, but it is surprisingly common. In a Darwin Core dataset, the same collector in recordedBy appears on the same eventDate in records from places a long distance apart. Either the collector, the date or the location (or a combination of these) is likely to be wrong. This example is from an American museum dataset:

[Image 3: two-places-at-once example from an American museum dataset]
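A check for this issue can be sketched by grouping records on recordedBy and eventDate and measuring the distance between their coordinates. The code below is one possible approach, not the method used in my audits. It assumes records are dictionaries with the Darwin Core fields recordedBy, eventDate, decimalLatitude and decimalLongitude, and the function names and the 500 km default threshold are arbitrary choices for illustration. The sample records in the usage section are invented.

```python
import math
from collections import defaultdict
from itertools import combinations


def km_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine formula, Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def far_apart_same_day(records, max_km=500.0):
    """Return (recordedBy, eventDate) pairs whose records are more than max_km apart."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["recordedBy"], rec["eventDate"])].append(rec)
    flagged = []
    for key, recs in groups.items():
        for a, b in combinations(recs, 2):
            dist = km_between(a["decimalLatitude"], a["decimalLongitude"],
                              b["decimalLatitude"], b["decimalLongitude"])
            if dist > max_km:
                flagged.append(key)
                break  # one distant pair is enough to flag the group
    return flagged


if __name__ == "__main__":
    # Invented records, for illustration only.
    sample = [
        {"recordedBy": "A. Collector", "eventDate": "1990-01-05",
         "decimalLatitude": -42.88, "decimalLongitude": 147.33},
        {"recordedBy": "A. Collector", "eventDate": "1990-01-05",
         "decimalLatitude": -12.46, "decimalLongitude": 130.84},
    ]
    print(far_apart_same_day(sample))
```

A suitable threshold depends on the dataset: a collector travelling by air can legitimately cover a long distance in a day, so flagged groups are candidates for checking, not confirmed errors.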

Microsoft Excel date issues. These are hard to spot but worth remembering. One is that various Excel versions incorrectly assume that 1900 was a leap year, and include the fictitious 29 February 1900 in serial date numbering. A more serious issue is Excel date twins; see here and here for explanations. The problem is that a single date in Excel might be wrong by 4 years and a day, or the same event could appear in a database with one record having the correct date and another record having a date 4 years and a day earlier or later than the correct one.
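The arithmetic behind a shift of 4 years and a day can be sketched as follows. This assumes that the shift comes from Excel's two date systems, one counting days from 1900 and one from 1904, whose serial numbers differ by 1462 days. The function names are made up for this example, and the 1900-system conversion is only valid for serial numbers above 60, that is, after the fictitious 29 February 1900.

```python
from datetime import date, timedelta

# In the 1900 system, serial 61 is 1 March 1900, so for serials above 60
# the effective day zero is 30 December 1899.
EPOCH_1900 = date(1899, 12, 30)
# In the 1904 system, serial 0 is 1 January 1904.
EPOCH_1904 = date(1904, 1, 1)
SYSTEM_SHIFT = EPOCH_1904 - EPOCH_1900  # 1462 days: 4 years and a day


def from_serial(serial: int, system: int = 1900) -> date:
    """Convert an Excel serial day number to a date in the given date system."""
    if system == 1904:
        return EPOCH_1904 + timedelta(days=serial)
    if serial <= 60:
        raise ValueError("serials up to 60 are affected by the 1900 leap-year bug")
    return EPOCH_1900 + timedelta(days=serial)


def possible_twins(day: date):
    """Return the dates 1462 days earlier and later than the given date."""
    return day - SYSTEM_SHIFT, day + SYSTEM_SHIFT


if __name__ == "__main__":
    print(SYSTEM_SHIFT.days)
    print(possible_twins(date(2000, 1, 2)))
```

When two records for what looks like the same event differ by exactly this offset, comparing each against `possible_twins` of the other is a quick way to confirm the suspicion.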

Miscellaneous date disagreements. I audited a dataset in which there were numerous records dated well before the 1990s in which georeferenceSources had “GPS”. The explanation (from the data compiler) was that investigators had recently revisited old, published sites with GPS in hand; this information should have been in georeferenceRemarks. Another occasional disagreement I’ve found is in records based on published literature, with eventDate later than the date of publication of the record source in associatedReferences.


Robert Mesibov (“datafixer”); robert.mesibov@gmail.com
