How can you be in two places at once?

In a GBIF forum post in 2022 I described a programmatic check to find anomalous occurrence records where the same collector was recorded as being in two very widely separated places on the same day.

That 2022 check was for a particular auditing job. I’ve now generalised the check for clean Darwin Core datasets. By “clean” I mean that recordedBy is filled in and free of pseudo-duplicates, eventDate is in ISO 8601 format and both decimalLatitude and decimalLongitude are filled in and valid. The check also allows me to set a threshold distance between sites said to be visited by the same collector on the same day.
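The logic of such a check can be sketched in Python. This is only an illustration under stated assumptions, not the code actually used for the audit: the function names `haversine_km` and `find_anomalous_blocks` are hypothetical, each record is assumed to be a dictionary keyed by Darwin Core field names, eventDate is assumed to be a single-day string, and distances are great-circle distances on a sphere of radius 6371 km.

```python
import math
from collections import defaultdict
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the square root just above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def find_anomalous_blocks(records, threshold_km=1000.0):
    """Group records by (recordedBy, eventDate) and keep the blocks in
    which at least two sites are more than threshold_km apart."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[(rec["recordedBy"], rec["eventDate"])].append(rec)

    anomalous = {}
    for key, recs in blocks.items():
        # Distinct sites only, so repeated coordinates are compared once
        sites = {(float(r["decimalLatitude"]), float(r["decimalLongitude"]))
                 for r in recs}
        if any(haversine_km(*a, *b) > threshold_km
               for a, b in combinations(sites, 2)):
            anomalous[key] = recs
    return anomalous
```

Calling `find_anomalous_blocks(records, threshold_km=1000)` on a list of such dictionaries returns the whole block for each flagged collector and date, so that the suspect coordinates can be seen next to the plausible ones.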

With a threshold of 1000 km, the Smithsonian’s Extant Specimen Records, updated 2025-05-02, returned more than 3600 anomalous blocks of occurrence records. Most of these are the result of data entry errors in lat/lon and could be flagged by GBIF, for example as “presumed negative latitude” or “country coordinate mismatch”. Here are three such blocks:

| recordedBy | eventDate | decLat | decLon | catalogNumber |
|---|---|---|---|---|
| T. Mendietta & B. Taylor | 2023-08-01 | 33.5232 | -103.863 | US 3760262 |
| T. Mendietta & B. Taylor | 2023-08-01 | 33.6727 | -4.40142 | US 3760274 |
| T. Mendietta & B. Taylor | 2023-08-01 | 33.6421 | -104.357 | US 3760278 |
| D. Rodriguez | 2019-10-07 | 89.4074 | -14.3674 | US 3753480 |
| D. Rodriguez | 2019-10-07 | 14.3641 | -89.4055 | US 3760898 |
| B. Wallnöfer | 2006-05-01 | 48.2517 | 16.2325 | US 3520318 |
| B. Wallnöfer | 2006-05-01 | 48.2517 | 48.2517 | US 3520319 |
| B. Wallnöfer | 2006-05-01 | 48.2517 | 16.2325 | US 3520320 |
| B. Wallnöfer | 2006-05-01 | 48.2506 | 16.2708 | US 3520321 |

Other anomalies are more subtle and suggest that curators should re-check the accession details, as in this block:

| recordedBy | eventDate | decLat | decLon | catalogNumber |
|---|---|---|---|---|
| E. Jenkins | 2000-01-25 | 68.1912 | -135.917 | USNM 1385987 |
| E. Jenkins | 2000-01-25 | 68.1912 | -135.917 | USNM 1385988 |
| E. Jenkins | 2000-01-25 | 68.1912 | -135.917 | USNM 1400750 |
| E. Jenkins | 2000-01-25 | 51.5918 | -116.061 | USNM 1400751 |
| E. Jenkins | 2000-01-25 | 51.5918 | -116.061 | USNM 1400752 |

The collector was the same person (E.J. Jenkins) and the records are from the same country (Canada), but they’re ca 2100 km apart. Possible on the same day?
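The separation between the two E. Jenkins sites can be checked with a few lines of Python. This is a rough sketch using the spherical law of cosines and an assumed mean Earth radius of 6371 km; the variable names are illustrative only, and a geodesic calculation on an ellipsoid would give a slightly different figure.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

# The two E. Jenkins sites from the block above (decimal degrees)
lat1, lon1 = 68.1912, -135.917
lat2, lon2 = 51.5918, -116.061

phi1, phi2 = math.radians(lat1), math.radians(lat2)
dlambda = math.radians(lon2 - lon1)

# Spherical law of cosines: central angle between the two points
cos_angle = (math.sin(phi1) * math.sin(phi2)
             + math.cos(phi1) * math.cos(phi2) * math.cos(dlambda))
# Clamp to [-1, 1] in case of floating-point rounding
central_angle = math.acos(max(-1.0, min(1.0, cos_angle)))

distance_km = EARTH_RADIUS_KM * central_angle
print(f"{distance_km:.0f} km")
```

Under these assumptions the result is a little over 2100 km, well above the 1000 km threshold.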

I’ve done this particular check for a number of large Darwin Core datasets. The strangest result was for an entomology dataset where Collector X seemed to be in two places at once quite often. The collection manager told me that

(Collector X) often had others collecting for him, and he was a bit ornery and sometimes did not acknowledge those other collectors on labels (used his own name), especially several that he paid to collect and didn’t really consider to be “bona fide entomologists.” Not nice. And so (Collector X) sometimes does seem to be in multiple countries at once (when he is in fact traveling abroad and active and another one of his local USA collectors is active) or in multiple states (when he is traveling around USA). For better or worse, his specimen labels say what the labels say, and those discrepancies cannot really be placed at the feet of data entry personnel, these would need quite a bit of extra curatorial thinking/research/validation (smile).


Robert Mesibov (“datafixer”); mesibov@datafix.com.au

Always a joy to read, Robert.
Now if cloning were possible I’d send my younger, fitter model out into the field 🙂