Please don't be so certain about your uncertainty

datafixer · April 18, 2023, 12:44am

I’ll begin this post by saying something that might sound controversial: If you only provide latitude and longitude for an occurrence, your Darwin Core location data are incomplete.

You probably already know that if you leave out a geodetic datum, GBIF will assume that your coordinates are based on the WGS84 datum, even if they aren’t. The “WGS84” is just a guess, and the difference between the point in WGS84 and the point in another datum could be several hundred metres or more. Please always specify your geodeticDatum.

The other item you shouldn’t leave out is coordinateUncertaintyInMeters. Your latitude/longitude coordinates specify a point, and that point doesn’t have a size. It’s infinitesimally small. No one can safely assume that your coordinates actually mean “about here, or maybe plus or minus 50 metres”, or “this was the GPS reading somewhere on our big sampling plot”.

In other words, a Darwin Core location should have at least four entries: decimalLatitude, decimalLongitude, geodeticDatum and coordinateUncertaintyInMeters (cUIM, for short). This quartet of entries specifies a circle within which the occurrence was located. The centre of the circle is the point with the latitude and longitude you entered, given your datum. The radius of the circle is the coordinateUncertaintyInMeters.
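As a rough illustration of that four-entry rule, here is a minimal Python sketch that reports which of the four terms are missing from a record. The function name `missing_location_terms` and the dict-based record are assumptions made for this example; they are not part of Darwin Core or of any GBIF tool.

```python
# The four Darwin Core terms the post treats as the minimum for a location.
REQUIRED_TERMS = (
    "decimalLatitude",
    "decimalLongitude",
    "geodeticDatum",
    "coordinateUncertaintyInMeters",
)


def missing_location_terms(record):
    """Return the required terms that are absent or blank in `record`.

    `record` is assumed to be a plain dict keyed by Darwin Core term name
    (a hypothetical layout chosen for this sketch).
    """
    missing = []
    for term in REQUIRED_TERMS:
        value = record.get(term)
        if value is None or str(value).strip() == "":
            missing.append(term)
    return missing


# A record with coordinates only is flagged as incomplete.
point_only = {"decimalLatitude": "-41.2232", "decimalLongitude": "145.6016"}
print(missing_location_terms(point_only))
```

A check like this only tells you that an entry is present, not that its value is sensible.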

If you don’t provide a cUIM, then your location is incomplete. How close was your observation or collection to that latitude/longitude point? 10 metres? 100? 1000? 10000? 100000? You didn’t tell us! If you provide a cUIM, then your location expands from a point to a circle, and you are saying that the occurrence was definitely within that circle. Thank you!

Unfortunately, many of the datasets I audit don’t have cUIM entries attached to their coordinates. They should, because uncertainty really matters (Marcer et al. 2020, Marcer et al. 2022) when GBIF data are used for research purposes.

There’s an excellent online resource from GBIF (Chapman and Wieczorek 2020) that describes in detail how to estimate cUIM when the location information comes from a specimen label or a spot on a map. Here I want to suggest practical estimation methods for field workers who get locations from handheld GPS units or smartphones. I’ll consider three sampling scenarios:

Opportunistic sampling. You’re wandering around a moorland and you find a small colony of a particular moss. You grab a few samples, you take a GPS reading at the last grab and the unit says the “accuracy” of the reading is 5m, so you’re reasonably confident that the decimalLatitude and decimalLongitude will be close to where you’re standing. What’s the cUIM for this occurrence?

It isn’t 5m, the GPS “accuracy”. Note down (in the field) an estimate of the radius of a circle within which you observed and sampled the moss, with the GPS reading at the centre of that circle. This cUIM will probably be much larger than 5m.

Plot-based sampling. You record all the tree species on a 50x50m plot. Your GPS unit says the “accuracy” of the reading is 5m. Is that the cUIM?

No, and the uncertainty depends on where you took your GPS reading. If the GPS unit was at the centre of the plot, then a circle including the whole of the 50x50m plot has a radius of about 35m (half the plot diagonal). To allow for GPS reading uncertainty, the estimated cUIM would realistically be 45m. If the reading was taken at a corner of the plot, the circle has to reach the opposite corner, a full diagonal (about 71m) away, so a realistic cUIM would be 80m. For a reading taken elsewhere on the plot, the cUIM might be estimated between 45m and 80m.
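The plot arithmetic can be sketched in a few lines of Python. This assumes a square plot and a GPS reading taken either at the centre or at a corner. The function name `plot_cuim` and its parameters are made up for this example. It returns the bare geometric distance plus the GPS "accuracy", which is a lower bound; the 45m and 80m figures above are deliberately rounded up from that.

```python
import math


def plot_cuim(side_m, gps_accuracy_m, reading_at="centre"):
    """Lower-bound cUIM for a square plot (hypothetical helper).

    The circle must reach the plot corner farthest from the GPS reading:
    half the diagonal from the centre, the full diagonal from a corner.
    """
    diagonal = side_m * math.sqrt(2)
    if reading_at == "centre":
        reach = diagonal / 2
    elif reading_at == "corner":
        reach = diagonal
    else:
        raise ValueError("reading_at must be 'centre' or 'corner'")
    return reach + gps_accuracy_m


# 50x50m plot, 5m GPS "accuracy": round these results up before recording them.
print(round(plot_cuim(50, 5, "centre"), 1))
print(round(plot_cuim(50, 5, "corner"), 1))
```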

Transect sampling. You sample along a more or less straight line, 200m long, and you have GPS readings with 5m “accuracy” at the start point and the finish point of the transect. What’s the cUIM?

With the “point-radius” method for recording occurrences, you first need to calculate the coordinates of the midpoint of your transect. These midpoint coordinates are what you put in decimalLatitude and decimalLongitude. The cUIM is then half the length of the transect, 100m, plus a bit more to allow for GPS uncertainty, say 110m.

There’s another way to give the location of a transect or plot, namely using WKT geometry. Suppose I sample along a transect from -41.2228 145.6008 to -41.2235 145.6025, with my GPS unit set to the WGS84 datum. The WKT format has longitude first, so the WKT representation for this transect is

LINESTRING(145.6008 -41.2228,145.6025 -41.2235)

I can enter this in a Darwin Core footprintWKT field together with the footprintSRS, which is WGS84.
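Because the latitude/longitude swap is easy to get wrong by hand, here is a small Python sketch that builds the two footprint entries from start and end points given in the usual latitude, longitude order. The helper name `transect_footprint` is an assumption for this example, and the footprintSRS value is simply the "WGS84" used above.

```python
def transect_footprint(start, end, srs="WGS84"):
    """Build footprintWKT and footprintSRS for a straight transect.

    `start` and `end` are (latitude, longitude) tuples; WKT wants
    longitude first, so each pair is swapped when written out.
    """
    points = ",".join(f"{lon} {lat}" for lat, lon in (start, end))
    return {"footprintWKT": f"LINESTRING({points})", "footprintSRS": srs}


footprint = transect_footprint((-41.2228, 145.6008), (-41.2235, 145.6025))
print(footprint["footprintWKT"])
```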

I calculate the midpoint of my transect:

decimalLatitude = ((-41.2228) + (-41.2235))/2 = -41.2232

decimalLongitude = (145.6008 + 145.6025)/2 = 145.6016

The cUIM will be half the length of the transect, plus a bit more for GPS uncertainty. If I don’t know the transect length, I can calculate it with an online “distance between latitude longitude points” program, like these:

https://www.calculator.net/distance-calculator.html

http://edwilliams.org/gccalc.htm

or I can estimate it with a spatial mapping program, such as Google Earth/Google Maps or a GIS. The distance between the start and end points of my transect is roughly 170m, so a cUIM allowing for GPS uncertainty would be about 90m.
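If you would rather script the calculation, the following Python sketch does the same job with the haversine formula. It assumes a spherical Earth with a mean radius of 6371008.8m, which is only an approximation to the WGS84 ellipsoid but is more than adequate at this scale. The function names are invented for this example. Averaging the end-point coordinates to get the midpoint is reasonable only for short transects that don't cross the 180° meridian.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def transect_point_radius(start, end, gps_accuracy_m):
    """Return (midpoint latitude, midpoint longitude, lower-bound cUIM).

    `start` and `end` are (latitude, longitude) tuples. The cUIM is half
    the transect length plus the GPS "accuracy"; round it up before use.
    """
    mid_lat = (start[0] + end[0]) / 2
    mid_lon = (start[1] + end[1]) / 2
    half_length = haversine_m(start[0], start[1], end[0], end[1]) / 2
    return mid_lat, mid_lon, half_length + gps_accuracy_m


lat, lon, cuim = transect_point_radius((-41.2228, 145.6008), (-41.2235, 145.6025), 5)
print(round(lat, 4), round(lon, 4), round(cuim))
```

The result lands in the same ballpark as the figures above, which is all an uncertainty estimate needs to do.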

You can also define a polygon in which you sampled (an irregular area, or a square plot) as a POLYGON footprintWKT, but the order of the coordinates of the corners of the polygon must be anticlockwise and the start point must be the same as the finish point. (See also https://dwc.tdwg.org/terms/#dwc:footprintSpatialFit.) One advantage to representing the plot or polygon by its corners is that you can then use a spatial mapper to find its centroid (its decimalLatitude and decimalLongitude) and to estimate the radius of a circle that includes the whole of the area.
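The two polygon rules (anticlockwise order, first point repeated at the end) are easy to enforce in code. Here is a minimal Python sketch, assuming the corners are given as (longitude, latitude) pairs and the plot is small and doesn't cross the 180° meridian, so that the coordinates can be treated as flat x, y values. The function names and the example corner coordinates are invented for illustration.

```python
def signed_area(ring):
    """Shoelace formula on (lon, lat) pairs; a positive result means anticlockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2


def polygon_wkt(corners):
    """Return a POLYGON WKT string with an anticlockwise, closed ring."""
    ring = list(corners)
    if ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the closing point while checking the order
    if signed_area(ring) < 0:
        ring.reverse()  # clockwise input: flip it
    ring.append(ring[0])  # the start point must also be the finish point
    body = ",".join(f"{lon} {lat}" for lon, lat in ring)
    return f"POLYGON(({body}))"


# Hypothetical plot corners, entered clockwise and left unclosed.
corners = [(145.600, -41.223), (145.600, -41.222), (145.601, -41.222), (145.601, -41.223)]
print(polygon_wkt(corners))
```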

Please note my use of “a bit more”, “about” and “roughly” in this post. That’s deliberate, because all uncertainties should be regarded as approximate. I cringe when I see something like “3218.69” in cUIM. The number came either from a georeferencing calculator or (as in this case) from converting exactly 2 miles to metres. An uncertainty can’t possibly be that certain, so I suggest in this case rounding up to something like 4000, and putting “uncertainty approximated” in georeferenceRemarks.

(In this case, I’m assuming “2 miles” was an estimate between 1.5 miles (2414m) and 2.5 miles (4023m).)
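One possible way to apply that rounding consistently is to round every uncertainty up to a single significant figure. That rule is an assumption made for this Python sketch, not a Darwin Core requirement, and the function name is invented. For 3218.69 it gives the 4000 suggested above.

```python
import math

METRES_PER_MILE = 1609.344  # exact, by definition of the international mile


def round_up_uncertainty(metres, sig_figs=1):
    """Round an uncertainty up (never down) to `sig_figs` significant figures."""
    if metres <= 0:
        raise ValueError("uncertainty must be positive")
    magnitude = 10 ** (math.floor(math.log10(metres)) - (sig_figs - 1))
    return math.ceil(metres / magnitude) * magnitude


two_miles = 2 * METRES_PER_MILE  # the over-precise value seen in cUIM
print(round(two_miles, 2), "->", round_up_uncertainty(two_miles))
```

Rounding up, not to the nearest value, keeps the circle large enough to contain the occurrence.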

Robert Mesibov (“datafixer”); robert.mesibov@gmail.com

