FAIR data is data that is findable, accessible, interoperable and reusable according to the FAIR Guiding Principles. Generating and providing access to FAIR data during and after the specimen digitisation process is one of several important value streams to be better supported by the digital specimen / extended specimen framework.
Note: A value stream is a sequence of activities that creates an overall result or outcome for a stakeholder (end-user). A stakeholder can be a scientist, a collection manager, a curator, an educator, etc. The result or outcome has a worth or usefulness to the stakeholder.
When it comes to mobilizing data for digital representations of specimens, everyone wants wider and better access to specimen data: data about the specimen itself, data about the digitisation process, data related to the specimen, and data derived from study and analysis of the specimen. Everyone wants that data to be findable, accessible, interoperable and reusable (FAIR).
The idea of Webster et al. of extending the scope of the specimen concept to include things other than biological or geological materials, such as audio, video and photographic recordings and a wide range of other data types (both directly derived and indirectly related), leads to the idea of the Digital Specimen, as proposed by DiSSCo, and to the notion of a network of extended specimens, as proposed by BCoN. Convergence of the two leads to the idea of the extended digital specimen. Here ‘extended’ refers to adding derived data, and ‘digital’ refers to distinguishing between the physical specimen, an identifiable object in the real/natural world, and a coupled or corresponding information representation of that object in the digital realm, i.e., an identifiable digital object on the Internet that can be manipulated independently of the physical object.
Over the past twenty years data has become more easily findable and accessible, yet it still does not fully meet all the requirements of the FAIR Guiding Principles. The focus must turn increasingly to making specimen data interoperable and reusable, not only by humans but also by machines, i.e., by software. Digital specimens, as described by DiSSCo, are FAIR by design. They are representations on the Internet (surrogates, or ‘Specimens on the Internet’) that correspond to identifiable physical specimens in a natural science collection and that can be manipulated by both humans and machines. Standardized as open Digital Specimens in a new specification (openDS), these form the basis for the next generation of collections data infrastructure.
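As a rough illustration of the extended digital specimen idea described above, the following Python sketch models a digital surrogate that carries its own identifier, points to a physical specimen, and accumulates links to derived or related data. All class names, field names and identifier values are hypothetical and are not taken from the openDS specification; this is only one way the concept could be expressed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class PhysicalSpecimen:
    """The real-world object held in a natural science collection."""
    institution_code: str  # hypothetical field name
    catalog_number: str    # hypothetical field name


@dataclass
class ExtendedDigitalSpecimen:
    """Digital surrogate that is identified independently of the physical object."""
    pid: str  # placeholder persistent identifier, not a real one
    physical_specimen: PhysicalSpecimen
    attributes: dict[str, str] = field(default_factory=dict)
    extensions: list[dict[str, str]] = field(default_factory=list)

    def extend(self, kind: str, link: str, source: str) -> None:
        """Attach a link to derived or related data ('extended')."""
        self.extensions.append({"kind": kind, "link": link, "source": source})

    def extension_kinds(self) -> list[str]:
        """Return the distinct kinds of attached data, sorted alphabetically."""
        return sorted({ext["kind"] for ext in self.extensions})


# Example values below are invented for illustration only.
specimen = ExtendedDigitalSpecimen(
    pid="example-prefix/ABC-123",
    physical_specimen=PhysicalSpecimen("XYZ", "12345"),
    attributes={"scientificName": "Turdus merula"},
)
specimen.extend("audio", "https://example.org/media/1", "third party")
specimen.extend("sequence", "https://example.org/seq/9", "external specialist")
print(specimen.extension_kinds())
```

The digital object can be extended and queried by software without touching the physical specimen, which is the separation the text describes. The `source` field hints at the fact that the added data may come from parties other than the collection holder.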
The goal of this category is to discuss the business (including scientific) outcomes that can be achieved by adopting a converged open digital and extended specimen technical framework, which we call openDS, and the opportunities this affords to various stakeholders. It is relevant to discuss the technical aspects and capabilities needed, as well as the way forward from the present day, including the new models of cooperation and the actions that are needed. This category can also consider financial, social, governance, legal and professional implications.
This category is concerned with the overarching issues on which outcomes in the other categories depend.
- Webster MS, editor. The extended specimen: emerging frontiers in collections-based ornithological research. CRC Press; 2017 Jul 20. doi: 10.1201/9781315120454; especially Chapter 1.
- DiSSCo Tech - What is a Digital Specimen? https://bit.ly/DigitalSpecimen
- BCoN, 2019 - Extending US biodiversity collections to address national challenges. https://bcon.aibs.org/wp-content/uploads/2019/01/Report-Public-Comment-draft.pdf
- ES/DS Framework, A technical explanation towards convergence. Video presentation: Archi-v0.4-16Dec2020-export.mp4 (Dropbox).
- TDWG 2020 SYM07: Standards development to support transformation of collection data into digital specimens. Recording of session: TDWG 2020: Standards development to support transformation of collection data into digital specimens (YouTube).
- TDWG 2020 PD03: Panel discussion on enabling digital specimen & extended specimen concepts in current tools & services. Recording of session: TDWG 2020 Enabling digital specimen & extended specimen concepts in current tools & services - PD03 (YouTube).
- TDWG 2020 BoF 01: Birds of a Feather session on converging Digital & Extended Specimens towards global specification. Recording of session: TDWG 2020: Converging Digital & Extended Specimens towards global specification - Working Sessions (YouTube).
- Lannom L, Koureas D, Hardisty AR (2020) FAIR Data and Services in Biodiversity Science and Geoscience. Data Intelligence 2(1): 122-130. doi: 10.1162/dint_a_00034.
- Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J (2016) The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3: 160018. doi: 10.1038/sdata.2016.18.
- De Smedt K, Koureas D, Wittenburg P (2020) FAIR Digital Objects for Science: From data pieces to actionable knowledge units. Preprints 2020, 2020030073. doi: 10.20944/preprints202003.0073.v1.
- Specification for open Digital Specimens (openDS), GitHub repository: https://github.com/DiSSCo/openDS; especially:
  - Answers to frequently asked questions about Digital Specimens and openDS;
  - Explanation of how openDS and the Extended Specimen Network are related;
  - The openDS data model relates to other important structures, standards and initiatives in the wider world, as well as to information science in different domains of scientific discourse. Positioning openDS in the landscape is one of the first and most important steps in development of the specification. Making sure that everyone likely to make use of the model agrees on this is essential to progress.
  - openDS consists of three principal and interrelated components, as follows:
    - The openDS data model. Read the introduction to the openDS data model;
    - The Ontology for open Digital Specimens (ODS). Read the introduction to the ODS ontology; and,
    - The openDS Application Programming Interface (API). Read the introduction to the openDS API.
- Lendemer J, Thiers B, Monfils AK, Zaspel J, Ellwood ER, Bentley A, LeVan K, Bates J, Jennings D, Contreras D, Lagomarsino L (2019) The Extended Specimen Network: A Strategy to Enhance US Biodiversity Collections, Promote Research and Education. BioScience, biz140. doi: 10.1093/biosci/biz140.
- Hardisty A, Saarenmaa H, Casino A, Dillen M, Gödderz K, Groom Q, Hardy H, Koureas D, Nieva de la Hidalga A, Paul DL, Runnel V, Vermeersch X, van Walsum M, Willemse L (2020) Conceptual design blueprint for the DiSSCo digitization infrastructure - DELIVERABLE D8.1. Research Ideas and Outcomes 6: e54280. doi: 10.3897/rio.6.e54280; especially sections 2 (The DiSSCo Research Infrastructure) and 4 (Architecture, tools and technologies).
- National Academies of Sciences, Engineering, and Medicine (2020) Biological Collections: Ensuring Critical Research and Education for the 21st Century. Washington, DC: The National Academies Press. doi: 10.17226/25592.
- Creating Darwin Core OWL files for OBO Foundry ontologies. GitHub repository: BiodiversityOntologies/dwcobo (code for creating Darwin Core ontology modules to import to BCO).
Questions to promote discussion
We suggest three groups of questions to promote discussion. These are concerned with new models of digitisation, curation and governance, with functions needed for new data science, and with improving engagement of participants. However, we welcome contributions on any matter related to mobilizing FAIR extended digital specimen data.
Group 1 questions: New models of digitisation, curation and governance
- What capabilities do natural science collections (NSC) need to serve comprehensive FAIR data about their specimens when that data is a combination of the collection holder’s data, collector’s data and data from external specialists and third-party sources?
- How and where should such combinations of value-added data be stored and curated and who should take the responsibility for that?
- What FAIR data needs to be generated or made available during different steps in the digitisation process?
- Extended digital specimen data is not the responsibility of a single organisation. What new models of cooperation do NSCs and other actors need to govern and serve such data?
- Where are new standards needed to make that possible?
- What investments are needed and by whom? How can we make the return on investment for serving comprehensive FAIR data concrete?
Group 2 questions: Functions needed for new data science
- What types of scientific questions do you want to be able to address with extended digital specimen data?
- What functions (services, capabilities) do you need to pursue the scientific questions you wish to address?
- What kinds of operations would you want to perform on specimen data remotely across a network (i.e., without having to bring the data to your local computer)?
Group 3 questions: Improving engagement of participants
- What metrics/measures motivate you to contribute/share your value-adding data to extended digital specimens? What do you expect in return?
- What could be done to broaden the use of extended specimen data and to make it accessible to a broader audience?
- How could the experience of users (e.g., of portals, search, etc.) be improved and their lives made easier?