We had a resource whose original coordinates were incorrect: they fell outside Colombia, which was a clear mistake. In those cases we delete the decimal coordinates and keep the original values in verbatimLatitude and verbatimLongitude.
However, we realized that GBIF still interprets those two fields and plots the incorrect coordinates with the flag "Country coordinate mismatch", as you can see in Occurrence Detail 2294394411.
So we would like to know how other data providers deal with this kind of problem: perhaps by keeping the full coordinates in the verbatimCoordinates field (we are not sure whether GBIF also interprets that field), or by adding a comment in georeferenceRemarks saying that the original coordinates are a mistake.
This is an important issue for us because, when we discover incorrect coordinates, we always put them in verbatimCoordinates and delete the decimal fields; we use the decimal values in our analyses, not the verbatim ones. But if the GBIF portal always performs the conversion, our work is not as useful as it could be for other end users.
We would also like to know whether there is a way to stop the GBIF portal from interpreting wrong coordinates; perhaps, if a coordinate has a flag, the best option is not to interpret it.
Thanks in advance for your time and have a good day, @camisilver
When verbatim coordinates are given but no decimal lat/long version is available, the GBIF interpretation does indeed attempt to convert the verbatim values. This is based on the assumption that verbatim coordinates can come in a variety of formats that are each technically correct, just not decimal. There is no mechanism to intercept coordinates that are provided but flagged as “known incorrect”: the transfer schema (Darwin Core) does not cater for this case. At GBIF.org, coordinates that are found to have issues during interpretation (like the country/coordinate mismatch) are then flagged centrally, and by default excluded from map display. The flag is also available in downloaded data, but the interpreted (wrong) information still remains available.
To fully exclude coordinates that are, already at source, known to be incorrect, we would recommend handling this at the mapping stage (source dataset to DwC-A transfer schema generation). Assuming that the intention is to not publish known incorrect coordinates at all, the verbatim data fields could either be removed from the IPT mapping altogether, or the source data could be filtered in a view that keeps the questionable records from being published through the IPT’s “verbatim” values. This way, the information is still maintained in the local database, while the published version relies on decimalLatitude and decimalLongitude values alone.
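As a rough illustration of the "filtered view" idea, here is a minimal Python sketch that blanks the verbatim coordinate fields on records marked locally as known incorrect before they are exported for publishing. The `coordinatesKnownIncorrect` column is a hypothetical local flag (it is not a Darwin Core term), the sample rows are made up, and in practice the same effect could be achieved with a database view feeding the IPT.

```python
import csv
import io

# Darwin Core verbatim coordinate terms that should not be published for flagged records.
VERBATIM_FIELDS = ("verbatimLatitude", "verbatimLongitude", "verbatimCoordinates")


def strip_known_bad_verbatim(rows, flag_column="coordinatesKnownIncorrect"):
    """Yield copies of rows for publishing.

    The local flag column (hypothetical, not a DwC term) is dropped from every
    row, and verbatim coordinates are blanked on rows where the flag is set.
    """
    for row in rows:
        published = {key: value for key, value in row.items() if key != flag_column}
        if (row.get(flag_column) or "").strip().lower() in ("true", "1", "yes"):
            for field in VERBATIM_FIELDS:
                if field in published:
                    published[field] = ""
        yield published


# Made-up source rows, standing in for the local database.
source = io.StringIO(
    "occurrenceID,verbatimLatitude,verbatimLongitude,coordinatesKnownIncorrect\n"
    "occ-1,4 35 N,74 04 W,false\n"
    "occ-2,45 10 N,74 04 W,true\n"
)
published_rows = list(strip_known_bad_verbatim(csv.DictReader(source)))
```

The local source keeps the original values and the flag; only the exported copy loses them.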
My opinion is that it may be important for users to see all the information available on specimen labels, even if we know or suspect it to be incorrect.
For example, consider a specimen with wrong coordinates that is duplicated in different datasets.
Let’s say one provider detects the error and removes the information, but the other doesn’t.
Then GBIF will still show the species as present, with no clue about the error detected by one of the providers.
If one dataset shows the specimen coordinates flagged as “doubtful”, “incorrect” or whatever, then users will have an opportunity to know it, and decide whether they want to (un)trust possible duplicate specimens which might also have incorrect coordinates.
So, is there a way to “obfuscate” the verbatim coordinates so that they are still human-readable, but GBIF stops interpreting and converting them to lat/lon?
In that case, we could do that and put an informative flag before/after the verbatim coordinates (or in any other DwC concept).
Thanks @ahahn and @sant for the replies. Your explanation of the process is clear, and with that in mind we have made an informed decision.
We share @sant's opinion: it is important for end users to see the original coordinates, so we would like to keep them.
We understand that in many cases the GBIF interpretation is valuable, and that at the moment there is no way to obfuscate the cases where there is a mistake.
For now, we will use the georeferenceRemarks element to store the verbatim coordinates, with a comment saying that they are incorrect. That way end users can still see them, but GBIF will not interpret the coordinates.
We will only use this approach for coordinates that are known to be incorrect; maybe in the future GBIF can obfuscate incorrect coordinates that carry serious flags such as country mismatch.
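For anyone wanting to do the same, the approach could be sketched roughly as follows in Python. This is only an illustration under our own assumptions: the function name, the wording of the remark and the sample record are all made up, and only the field names are Darwin Core terms.

```python
# Darwin Core verbatim coordinate terms that get moved into the remarks.
VERBATIM_FIELDS = ("verbatimLatitude", "verbatimLongitude", "verbatimCoordinates")


def move_bad_coordinates_to_remarks(record):
    """Return a copy of record with verbatim coordinates moved into georeferenceRemarks.

    Intended only for records whose coordinates are already known to be incorrect.
    The remark text is an illustrative convention, not a standard.
    """
    result = dict(record)
    found = [f"{field}: {result[field]}" for field in VERBATIM_FIELDS if result.get(field)]
    if not found:
        return result  # nothing to move
    note = "Original coordinates known to be incorrect (" + "; ".join(found) + ")"
    existing = (result.get("georeferenceRemarks") or "").strip()
    result["georeferenceRemarks"] = f"{existing} | {note}" if existing else note
    # Blank the verbatim fields so they are no longer published as coordinates.
    for field in VERBATIM_FIELDS:
        if field in result:
            result[field] = ""
    return result


# Made-up example record.
record = {
    "occurrenceID": "occ-2",
    "verbatimLatitude": "45 10 N",
    "verbatimLongitude": "74 04 W",
    "georeferenceRemarks": "",
}
cleaned = move_bad_coordinates_to_remarks(record)
print(cleaned["georeferenceRemarks"])
```

Existing remarks are preserved and the note is appended after a separator, so earlier georeferencing comments are not lost.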
Thank you for describing your use case so clearly, @EstebanMH-SiB and @sant. The current solution does indeed sound like the best way forward.
This scenario (exchanging coordinate values that are known to be incorrect) may not have been a central enough use case in the design of the transfer schema (Darwin Core, DwC), which is meant to define a limited set of terms for describing species occurrences: broad enough to be useful, but not all-inclusive.
The solution you wish for would require a data publisher (you) to be able to indicate in the original, published data that a given value is known to be incorrect, e.g. by adding a qualifying flag to specific values, in this case verbatimLatitude / verbatimLongitude / verbatimCoordinates (see the Darwin Core quick reference guide). To my knowledge, such qualifiers do not exist in the transfer schema. Explicitly transmitting the information that “there is this value, and we know it to be incorrect” is not easy to do in a structured, machine-readable form, especially not across tens of thousands of independent data sources.
In this sense, there is no sufficiently structured information available that would allow such values to be obfuscated, especially not in a generic workflow that should still highlight correctable errors in other sources’ data. Flags on the side of GBIF’s data processing are part of the interpretation process of supplied original values, including verbatim fields. Issue detection flags these cases based on the interpretation process itself, and e.g. highlights country/coordinate mismatches. These flags are available to data users, and for coordinates they are also available as data filters, so that only records without flagged issues are considered. At the same time, the original and verbatim values are always available to those users interested in the source version of the data.
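On the data user side, filtering on such a flag in downloaded data could look roughly like the Python sketch below. It assumes a tab-delimited download with an `issue` column holding semicolon-separated flag names such as `COUNTRY_COORDINATE_MISMATCH`; please check the layout of your own download, as the column name, the delimiter and the made-up sample rows here are assumptions for illustration only.

```python
import csv
import io


def without_issue(rows, issue="COUNTRY_COORDINATE_MISMATCH", issue_column="issue"):
    """Yield only the rows whose issue column does not contain the given flag."""
    for row in rows:
        flags = {
            flag.strip()
            for flag in (row.get(issue_column) or "").split(";")
            if flag.strip()
        }
        if issue not in flags:
            yield row


# Made-up rows standing in for a downloaded occurrence file.
download = io.StringIO(
    "gbifID\tdecimalLatitude\tdecimalLongitude\tissue\n"
    "1001\t4.58\t-74.07\t\n"
    "1002\t45.17\t-74.07\tCOUNTRY_COORDINATE_MISMATCH;GEODETIC_DATUM_ASSUMED_WGS84\n"
)
kept = list(without_issue(csv.DictReader(download, delimiter="\t")))
```

Records that carry other flags but not the one being filtered on are kept, so the filter can be as strict or as lenient as the analysis requires.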
I think yours is the best solution we can find that will satisfy both your needs and those of other publishers and users. Thanks again for raising this concern!