Background and context for phase 2

We do not repeat the background and context for the whole consultation here; we carry forward only the information relevant to phase 2. We assume familiarity with phase 1, including its contributions and the main summaries for each of the first five topics on the GBIF Discourse home page for the virtual consultation. Below is a very brief summary statement for each of those topics, to provide context for phase 2:

  • Topic 1: Making FAIR data for specimens accessible: Extended and enriched open data through Digital extended Specimens (DS) will revolutionize how natural science collections data are produced and used. Assigning unambiguous persistent identifiers and maintaining the link between a DS and its corresponding physical specimen is a joint responsibility of custodial institutions and an agent responsible for assigning DS identifiers. Legal/regulatory, ethical and other sensitivity obligations must be met. Usages, modifications and enrichments are all kinds of transactions on the DS, and the provenance of each must be recorded.

  • Topic 2: Extending, enriching and integrating data: The benefits of extending and enriching natural history collection records for research and education are clear. There are many examples of the kinds of data we should be integrating, from images, vocalizations, field notes, georeferences, and annotations created within a collection to products of research created by end users (publications, GenBank sequences, annotations, images, CT scans, etc.), as well as of the technical mechanisms and cyberinfrastructure required to make these connections. Current publishing mechanisms are not sufficient, and we need to move to a more transactional mechanism that allows all of these connections and annotations to be made transparently visible in real time, along with the necessary attribution. There is a significant social element to any successful system: all actors in the data pipeline must make the elements they create discoverable and linkable through the use of unique identifiers and metadata.

  • Topic 3: Annotating specimens and other data: Developing and implementing a robust global annotation system to improve data quality and provide attribution are key value propositions for the Digital extended Specimen. However, annotations providing extra value, such as actionable data on phenology for data use in policy, are potentially more important in developing funding streams for implementation. Building trust in human and machine annotations is critical to their uptake and use. This can best be accomplished by starting slowly and simply, beginning with dataset-level annotations. There is a role for regional or global annotation data stores, as not all local collections are expected to want or require every annotation to be written back into the source system.

  • Topic 4: Attributing work done: It is important that we are able to make visible the work that has gone into the assembly of the Digital extended Specimen, from collection through identification, to later research use. This can help to uncover the sometimes hidden histories behind the collections. Correct attribution can aid data discovery, and may contribute to the research profile of institutions and individual researchers. Careful discussion with researchers, institutions and publishers is required in order to clarify roles and responsibilities and to ensure the fair application and use of any derived metrics.

  • Topic 5: Analyzing/mining specimen data for novel applications: The ability to access, manipulate, and analyze data from a plethora of sources is critical to diversity research at a global scale, yet this is currently hindered by disparate data infrastructures. In addition to expanding technical capabilities, a fully implemented Digital extended Specimen concept capable of contributing to cutting-edge research must include workforce training and support for data collectors, data managers, and data users.

Overall, the contributions to and outcomes of phase 1 have shown that there is strong consensus on, and potential for, convergence of the Digital Specimen and Extended Specimen ideas, and that this convergence is achievable. We refer to the converged concept as Digital extended Specimens, abbreviated as ‘DS’.

The figure below illustrates the separation between physical specimens, digital representations in institutional collection management systems (CMS), and Digital extended Specimens. It shows how data at the DS level can benefit from wider, external curatorial and other activities of the expert community beyond the capacities of institutionally based curators and collection managers.

Whereas phase 1 of the consultation was principally concerned with the value streams enabled by the digital specimen / extended specimen framework, the second phase is much more about the infrastructure capabilities needed to support those value streams so that the desired business outcomes can be achieved. The individual topics of phase 2 reflect this focus.