7. Persistent identifier (PID) schemes

Moderators: Wouter Addink, Alex Hardisty and Hong Cui

Background

Persistent identifiers (PIDs) are long-lasting references that can be used to unambiguously identify any kind of object. They are foundational elements of data infrastructure, not only as identifiers but also as connectors of one thing to another. PIDs can uniquely identify anything from physical objects and digital artefacts to records of transactions and specific vocabulary terms and concepts. Different kinds of PID scheme can be used for identifying different things: DOIs for documents and datasets, for example, ORCID iDs for persons and ROR IDs for organisations. In this consultation, a PID scheme means not only the technical elements but the whole arrangement for using and operating PIDs, including the ownership, authority, governance and financial elements. A scheme that we aim to discuss in particular is the DOI adapted with characteristics specific to natural sciences, which is the scheme proposed for Digital extended Specimens.
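To make the point about different schemes for different things concrete, here is a minimal Python sketch that tells the three schemes named above apart by syntax alone. The function names and the list of resolver prefixes are assumptions made for illustration; the DOI pattern is a common heuristic rather than an official grammar, and the ROR example string is only checked for its shape, not resolved.

```python
import re

# Heuristic patterns: syntax checks only, nothing is resolved over the network.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")                  # "10." + registrant code + "/" + suffix
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")   # four groups of four, last char may be X
ROR_RE = re.compile(r"^0[a-hj-km-np-tv-z0-9]{6}[0-9]{2}$") # leading 0, 6 base32 chars, 2 check digits

# Assumed list of resolver prefixes that may precede the bare identifier.
RESOLVER_PREFIXES = ("https://doi.org/", "https://orcid.org/", "https://ror.org/", "doi:")


def strip_resolver(pid: str) -> str:
    """Remove a known resolver prefix (if any) and surrounding whitespace."""
    text = pid.strip()
    for prefix in RESOLVER_PREFIXES:
        if text.lower().startswith(prefix):
            return text[len(prefix):].strip()
    return text


def orcid_checksum_ok(orcid: str) -> bool:
    """Verify the final ORCID character with the ISO 7064 MOD 11-2 checksum."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return digits[-1] == expected


def classify_pid(pid: str) -> str:
    """Return 'DOI', 'ORCID', 'ROR' or 'unknown' based on syntax alone."""
    bare = strip_resolver(pid)
    if DOI_RE.match(bare):
        return "DOI"
    if ORCID_RE.match(bare) and orcid_checksum_ok(bare):
        return "ORCID"
    if ROR_RE.match(bare):
        return "ROR"
    return "unknown"


if __name__ == "__main__":
    for example in (
        "doi: 10.1093/gigascience/giab028",  # DOI from the resource list below
        "0000-0002-1825-0097",               # ORCID's documented example iD
        "https://ror.org/02mhbdp94",         # illustrative string, syntax check only
    ):
        print(example, "->", classify_pid(example))
```

A syntactic match says nothing about whether the identifier is registered or what it points to. That is where the non-technical parts of a scheme (authority, governance, registration services) come in.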

PIDs are a foundation for achieving the FAIR Guiding Principles of being ‘findable, accessible, interoperable and reusable’. There are many different kinds of PID and several kinds can be used in combination in any specific domain or application. What is used at the level of the institution to identify physical objects, database records and collections of those can differ from one to the next. These will often differ from what is used elsewhere in data manipulation, aggregation, archival, federation and integration, which in turn differs from what is needed for Digital extended Specimens.

The ability of machines to process Digital extended Specimen (DS) data depends on trustworthy, reliable PIDs for those DS. The challenge is not the choice of identifier scheme for DS, for which DiSSCo proposes DOIs, but that there is presently no adequate global scheme for assigning and recording DOIs for DS. This is one issue for the present consultation.

Allied to this is the question of what else needs to be persistently identified and how. DS are a special category of object, with a special status. But there is a wide range of other object types associated with the manipulation and processing of Digital Specimens, including for example transactions of loans and visits, annotations and determinations, images and other digitized artefacts, instruments, facilities, collections, provenance and attribution events, and more. Each of these objects must be identified, but each category has its own requirements regarding the information associated with the PID and the mechanisms by which PIDs are allocated. It isn’t necessary or even desirable to use the same identifier scheme as for DS when simpler schemes with different Handle prefix(es) can suffice. Establishing global PID schemes for such objects is a further challenge and also a topic for the consultation.
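As a rough illustration of what "different Handle prefix(es)" could mean in practice, the Python sketch below splits a Handle into prefix and suffix and routes object types to prefixes. It relies on two facts: a Handle has the form prefix/suffix, and DOIs are Handles whose prefix starts with "10.". Everything else is an assumption for illustration: the prefixes `10.99999` and `20.99999`, the object-type names and the function names are hypothetical placeholders, not allocations proposed in this consultation.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    prefix: str  # naming authority, e.g. "10.1093"
    suffix: str  # local name, unique within the prefix


def parse_handle(handle: str) -> Handle:
    """Split at the first '/', since the suffix may itself contain slashes."""
    prefix, sep, suffix = handle.strip().partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix Handle: {handle!r}")
    return Handle(prefix, suffix)


def is_doi(handle: Handle) -> bool:
    """DOIs are Handles whose prefix begins with '10.'."""
    return handle.prefix.startswith("10.")


# Hypothetical routing table: DS get a DOI prefix, other object
# types share a plain Handle prefix. Both prefixes are placeholders.
PREFIX_BY_OBJECT_TYPE = {
    "digital_specimen": "10.99999",
    "annotation": "20.99999",
    "loan_transaction": "20.99999",
}


def mint_handle(object_type: str, local_id: str) -> str:
    """Build a Handle string for an object; raises KeyError for unknown types."""
    return f"{PREFIX_BY_OBJECT_TYPE[object_type]}/{local_id}"


if __name__ == "__main__":
    parsed = parse_handle("10.1093/gigascience/giab028")
    print(parsed, "is DOI:", is_doi(parsed))
    print(mint_handle("annotation", "abc-123"))
```

The sketch only builds strings. Actual registration, and the metadata each category must carry, would be handled by whatever registration services the community agrees on.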

Translating proposals into actionable steps that the community can trust, align to and support through smooth, non-disruptive transitions is the key hurdle to overcome for success and widespread adoption. Steps include developing a supporting community around mechanisms for registration/resolution and the services needed to administer them, along with the associated business model to remain viable over the long term. This business model should be adapted to the circumstances of the community served. Custom PID services address the needs of the natural sciences community, with specific characteristics of metadata for describing specimens. Trust in the validity of metadata and referential integrity both imply the need for institutional commitments embodied in formalisations such as service level agreements or memoranda of understanding. Multiple stakeholders must begin a journey to frame an agreement on the necessary technical, ownership, authority, governance and financial elements. This must lead to: i) robust technical implementation; ii) stable policy, governance and funding models; iii) trust in the validity of metadata; iv) referential integrity; and v) guarantees of long-term persistence.

The goal of this topic of consultation is to discuss the shape of such a framework and to identify the significant milestones towards achieving it. A secondary but equally important objective is to allow organisations with a stakeholding interest to begin to develop trust, alignment, and eventually commitment towards a global PID service framework for the domain of Digital extended Specimens.

Questions to promote discussion

  1. If DOIs were available for Digital extended Specimens referring to the physical specimens in your collection, with links to extended information and annotations, what role could they play in your work?
  2. What added benefits/services should be provided to convince your institution to invest in using DOIs for DS?
  3. Implementing DOIs for DS can enable a transformation in how collections data are accessed and used. What transformation would you like to see and how can this be made to succeed?
  4. How should the costs of a PID scheme be paid for and who (which kind of organisations) should be responsible for that?
  5. What advantages are there from moving forward with a community-specific branding of DOIs under the name of ‘Natural Science Identifiers’ (NSId)?
  6. What challenges have you encountered in the past / do you foresee when introducing a new type of PID to your collections? If you’ve used DOIs or other PIDs for identifying things or been responsible for administering the assignment of DOIs/PIDs, for example as a member of a registration agency such as Crossref or DataCite, what’s your experience been like?
  7. What other data elements, objects and/or terms/concepts should be identified with a PID but are not yet able to be identified? What kind of scheme(s) is needed for assigning Handles to these kinds of things?

Information resources

  • Handle.Net registry. http://handle.net/.
  • International DOI Foundation. DOI Handbook. Digital Object Identifier System Handbook.
  • The DONA Foundation. https://www.dona.net/.
  • Hardisty A, Addink W, Glöckler F, Güntsch A, Islam S, Weiland C. (In review). A choice of persistent identifier schemes for the Distributed System of Scientific Collections (DiSSCo). Research Ideas and Outcomes.
  • Davies N, Deck J, Kansa EC, Whitcher S, Kunze J, Meyer C et al. (2021) Internet of Samples (iSamples): Toward an Interdisciplinary Cyberinfrastructure for Material Samples. GigaScience, Volume 10, Issue 5, May 2021, giab028. doi: 10.1093/gigascience/giab028.
  • European Commission (2020). Directorate-General for Research and Innovation. A Persistent Identifier (PID) policy for the European Open Science Cloud. Publications Office of the EU. https://doi.org/10.2777/926037.
  • European Commission (2021). Directorate-General for Research and Innovation. PID architecture for the EOSC. Publications Office of the EU. https://doi.org/10.2777/525581.
  • Madden F, Woodburn M. (2021). Persistent Identifiers at the Natural History Museum. Case study by the AHRC-funded project PIDs as IRO Infrastructure. Available on the TANC HeritagePIDs resources page, along with other case studies and outputs from the project.
  • Towards a national collection. Website of the UK’s AHRC 5-year programme taking the first steps towards opening the UK’s heritage collections to the world by creating a unified virtual ‘national collection’. https://www.nationalcollection.org.uk/.

In the context of question 4 (costs of a PID scheme), I would like to point to the outcomes of a 2.5-year-long strategic planning and road-mapping effort of the IGSN Global Sample Number, which aimed to identify ways for the PID system to scale to growing demands and to operate with a sustainable business model. The outcome of this effort is a road map toward a partnership of the IGSN e.V. with DataCite e.V. that will support the global adoption, implementation, and use of physical sample identifiers. See the blog post by DataCite CEO Matt Buys, "Bringing together communities: IGSN and DataCite".

Following the good example set by Kerstin (being the first contributor to topic 7), I am tossing in some very preliminary thoughts on Q5 (NSId). Strong branding is certainly great for incentivizing adoption, but ‘Nature Science Identifiers’ seems to be a bit too broad and a bit too limiting at the same time. It is too broad because the IDs will only be used for DES, not everything identifiable in natural science (correct me if I am wrong here). It is too limiting because it may exclude samples collected in, e.g., archaeology. I know natural history specimens are the current focus, but do we intend to always keep this focus? Could “DESID” be considered as a candidate brand?

Small correction: NSID stands for Natural Science Identifier.

I think that successful branding means the connotation of a term does not have to be spelled out in the term itself. I am very sure that NSIDs can be marketed in such a way that it is always clear that they are identifiers for Digital Specimens.

I never asked myself the archaeology question. It was actually always clear to me that we were talking about natural history.