Readers may recall that several years ago, GBIF mandated the use of a few CC licenses (or a CC0 waiver). I am wondering if we might now take the same approach with occurrenceID, especially as we are getting serious about digital extended specimens that will evidently require an identifier of this nature to safeguard identity in the face of all the links that will accrue.
Although meant to be globally unique, occurrenceID across all of GBIF is anything but. Some occurrence records lack one altogether; others hold numeric auto-increments, values merely copied and pasted from catalogNumber, http URIs, https URIs, or UUIDs. Data publishers also frequently change them, for reasons that often have less to do with the identity of the occurrence record than with the administration of the dataset as a whole.
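To make that variety concrete, here is a rough sketch of how one might tally the forms of occurrenceID within a single dataset. It assumes records are plain dictionaries keyed by Darwin Core term names; the function names and category labels are mine, purely for illustration, and are not part of any GBIF tooling.

```python
import uuid
from collections import Counter
from typing import Iterable, Mapping, Optional


def classify_occurrence_id(occurrence_id: Optional[str],
                           catalog_number: Optional[str] = None) -> str:
    """Assign an occurrenceID to one of the rough categories named above.

    The category labels are hypothetical, chosen only for this sketch.
    """
    value = (occurrence_id or "").strip()
    if not value:
        return "missing"
    # Same string as catalogNumber: probably copied rather than minted.
    if catalog_number is not None and value == catalog_number.strip():
        return "copied_from_catalogNumber"
    # Digits only: likely a database auto-increment.
    if value.isascii() and value.isdigit():
        return "numeric"
    lowered = value.lower()
    if lowered.startswith("https://"):
        return "https_uri"
    if lowered.startswith("http://"):
        return "http_uri"
    try:
        # uuid.UUID accepts hyphenated, bare-hex and urn:uuid: forms.
        uuid.UUID(value)
        return "uuid"
    except ValueError:
        return "other"


def summarise(records: Iterable[Mapping[str, str]]) -> Counter:
    """Count how many records fall into each category."""
    return Counter(
        classify_occurrence_id(r.get("occurrenceID"), r.get("catalogNumber"))
        for r in records
    )
```

Running `summarise` over a dataset's records would give a quick picture of how mixed its identifiers are. It says nothing about whether they are globally unique or stable over time, which is the harder part of the problem.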
Could we / should we require that occurrenceID be populated and structured in a particular way now to help pave the way? And could we / should we enforce its handling as a persistent identifier, including its use in GBIF’s own occurrence URLs as a sign of faith, in its occurrence APIs, in BioSchemas & elsewhere? Or do we wait until digital extended specimens are operational & decide later whether their identifiers are the same as what we’d expect to use in occurrenceID?