Moderators: Jose Fortes, Tim Robertson, Sharif Islam, David Martin
Background
Robust access points and data infrastructure are essential for the generation, archival, dissemination and analysis of digital specimen data. An infrastructure based on the Digital Extended Specimens framework must not only be reliable and robust but also be used and adopted by the user community. At the same time, we envision this new framework within the existing data infrastructures and practices around the world. These contexts will help us appreciate local data practices (e.g., field and lab work), data sharing and collaboration issues (e.g., data sharing among multiple research groups or organizations), short- and long-term data curation and storage practices (e.g., the role of data repositories), and the role of national, regional, global and thematic aggregation. Support for institutions of varying sizes and capabilities to deliver data (in the shorter and longer term) into such infrastructure also needs to be considered. Furthermore, the success of any global initiative around digital specimen data depends on how well the infrastructure can accommodate new capabilities such as curation by the community, unambiguous attribution, and provenance management services, among others.
Beyond data practices, current contexts give us a view of how the data journey happens from production to reuse (see the data journeys described by Sabina Leonelli). This journey can range from moving individual digital objects from one repository to another to aggregating and publishing whole datasets (see also the 2017 article by Beckett Sterner and Nico Franz). With a nuanced view of the different digital artefacts involved, the data journey can provide guidelines for the larger transformation that must happen to accommodate the Digital Extended Specimens framework. Given the diverse nature of the data classes and users involved, a balance must be struck between generalized and specialized use cases. As we move toward new solutions and capabilities, we should also map current professional practices to new kinds of analysis, expertise and collaboration within roles such as metadata specialist, data manager, data scientist, research software engineer and other specialists who work with data and cyberinfrastructure. Lastly, a truly global infrastructure needs a nuanced view of the specific financial and political situations in different parts of the world regarding support and funding models for long-term sustainability.
The goal of this category is to discuss how, from our current context, we can go toward a global data infrastructure based on the Digital Extended Specimens framework. Even though the technical aspects are the primary focus, social and financial aspects are also relevant for this section.
A presentation (1.1 MB) was created to provide background information on the FAIR Digital Object concept and to highlight architectural and application layers that can materialise the vision around Digital Extended Specimens.
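To make the FAIR Digital Object idea concrete, the sketch below models a digital specimen as a persistent identifier bound to a registered type and a set of machine-readable attributes. This is an illustration only: the class name, the identifiers and the attribute keys are assumptions for the example, not part of any specification or of the presentation above.

```python
from dataclasses import dataclass, field

@dataclass
class FairDigitalObject:
    """Minimal sketch of a FAIR Digital Object: a persistent identifier
    bound to a registered type and machine-readable attributes."""
    pid: str                      # persistent identifier (illustrative)
    type_pid: str                 # PID of the object's registered type (illustrative)
    attributes: dict = field(default_factory=dict)

    def resolve(self) -> dict:
        # In a real infrastructure this would query a PID resolver over
        # the network; here we simply return the local record.
        return {"pid": self.pid, "type": self.type_pid, **self.attributes}

# Hypothetical digital specimen object (all identifiers are made up)
ds = FairDigitalObject(
    pid="20.5000.1025/ABC-123",
    type_pid="21.T11148/DigitalSpecimen",
    attributes={"scientificName": "Puma concolor", "catalogNumber": "MNH-4521"},
)
record = ds.resolve()
```

The point of the sketch is the binding between identifier, type and attributes; everything else (storage, resolution, operations) is what the surrounding infrastructure must provide.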
A note on terms
Terms such as "cyberinfrastructure", "data infrastructure" and "research infrastructure" are used in relation to digital infrastructures providing services to the scientific community. Many such terms come from funding programs such as the U.S. National Science Foundation (NSF), from European efforts focusing on research infrastructures, such as the European Strategy Forum on Research Infrastructures (ESFRI) and the European Open Science Cloud (EOSC), and from the Research Data Alliance (RDA).
Questions to promote discussion
- What are the core capabilities (such as data management, data analysis) the infrastructure should satisfy?
- What are the current pain points (e.g., storage needs, scalability, data integrity, bandwidth)?
- There are various approaches by which applications, such as collection management systems, can participate in an open Digital Extended Specimen based solution. These include full native support in a local installation (i.e., implementing and running the appropriate APIs), use of shared systems that provide the functionality (e.g., cloud-hosted CMSes), or synchronising with another party that provides the necessary data access services on your behalf (e.g., DiSSCo, iDigBio, GBIF or others). We welcome discussion around deployment aspects and the level of adoption of the Digital Object Architecture the community foresees within the tools used.
- Being able to integrate with the existing tools and data networks in use by institutions is critical for adoption. What are the constraints, the desire and capacity to adapt?
- Several emerging technologies and protocols may provide good frameworks for deploying infrastructure supporting the digital specimen vision. Notable examples include blockchain to record the "change events" in the specimen lifecycle, and the Digital Object Architecture with its associated Digital Object Interface Protocol (DOIP). We encourage open discussion about the merits of these, and others.
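As a concrete illustration of the last question, the sketch below builds a DOIP-style operation request as JSON. It is a hedged sketch only: the field names (`targetId`, `operationId`) and the built-in `0.DOIP/Op.Retrieve` operation are modelled on the DOIP v2 specification and should be verified against it, and the specimen identifier is hypothetical.

```python
import json
from typing import Optional

def build_doip_request(target_id: str, operation_id: str,
                       attributes: Optional[dict] = None) -> str:
    """Serialise a DOIP-style operation request as JSON.

    A sketch only: field names are modelled on the DOIP v2 specification
    and should be checked against it before any real use.
    """
    request = {"targetId": target_id, "operationId": operation_id}
    if attributes:
        request["attributes"] = attributes
    return json.dumps(request)

# Hypothetical retrieval of a digital specimen by its identifier
msg = build_doip_request("20.5000.1025/ABC-123", "0.DOIP/Op.Retrieve")
```

In a deployed system such a request would travel over a DOIP connection to a repository hosting the digital specimen; the sketch only shows the shape of the message.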
Information resources
- Atkins, D.E., Droegemeier, K.K., Feldman, S.I., Garcia-Molina, H., Klein, M.L., Messerschmitt, D.G., . . . Wright, M.H., 2003. Revolutionizing science and engineering through cyberinfrastructure: Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. Arlington, VA: National Science Foundation.
- Belbin, L., Wallis, E., Hobern, D. and Zerger, A., 2021. The Atlas of Living Australia: History, current state and future directions. Biodiversity Data Journal, 9. DOI: https://doi.org/10.3897/BDJ.9.e65023
- Borgman, C.L., Darch, P.T., Sands, A.E. and Golshan, M.S., 2016. The durability and fragility of knowledge infrastructures: Lessons learned from astronomy. DOI: https://doi.org/10.1002/pra2.2016.14505301057
- De Smedt, K., Koureas, D. and Wittenburg, P., 2020. FAIR digital objects for science: from data pieces to actionable knowledge units. Publications, 8(2), p.21. DOI: https://doi.org/10.3390/publications8020021
- Leonelli, S. and Tempini, N., 2020. Data Journeys in the Sciences. Springer Nature. DOI: https://doi.org/10.1007/978-3-030-37177-7
- National Academies of Sciences, Engineering, and Medicine, 2021. Biological Collections: Ensuring Critical Research and Education for the 21st Century.
- Sterner, B. and Franz, N.M., 2017. Taxonomy for humans or computers? Cognitive pragmatics for big data. Biological Theory, 12(2), pp.99-111. DOI: https://doi.org/10.1007/s13752-017-0259-5
- Wood, J., Andersson, T., Bachem, A. and Best, C., 2010. Riding the wave: How Europe can gain from the rising tide of scientific data. Final report of the High Level Expert Group on Scientific Data. Brussels, Belgium: European Commission.