Extending, enriching and integrating data

MAFleming · March 4, 2021, 6:22pm

I would like to make a contribution to the consultation relating to use-cases beyond trained practitioners of taxonomy and systematics, including collection managers. There is a global shared heritage value to knowledge of the natural world, and a wider group of users (as both contributors and consulters) of biological data in GBIF and DiSSCo means greater support for the work and infrastructures of natural history.

By way of preamble: although I am not a biologist, I have worked as a humanities researcher and research manager inside several natural history museums, and have collaborated successfully with biologists in those contexts in an interdisciplinary manner. I therefore understand the critical significance of specimen annotations in relation to species determinations and nomenclatural disputes. I also understand the problem of ‘distributed annotations’, which arises when multiple ‘duplicates’ end up in a number of different collections, where different research questions may have produced widely differing data that it would be valuable to collate and correlate.

As a humanities scholar, I cannot go into depth on information architectures or the finer points of taxonomic and/or genomic data. But I do know quite a bit about natural historical knowledge that is missing from most databases, be they natural history museum catalogues, large aggregators such as GBIF and JSTOR Plants, or initiatives such as Barcode of Life. This is not knowledge that most biologists have to date been very interested in, but many are realising that it is extremely valuable in understanding climate change, biodiversity loss, and the increasingly significant social, historical and cultural aspects of specimens and collections. This is the part of the information iceberg that is below the water, as opposed to the highly specialised and limited datasets that are currently aggregated specifically for taxonomic use.

This is knowledge that is locked up in manuscript documents, labels, records, field notebooks, colonial archives, letters and more, and it is valuable to biologists. It also includes forms of knowledge held as heritage understanding in the communities all over the world from which centralised specimen collections originate. As a meta-issue, the conditions in which these kinds of knowledge have been produced over 500 years must also be a subject of study if we are to understand why we do science in the way that we do it now, and also if we are to understand what to do in order to mitigate climate change and biodiversity loss.

In the humanities and library/archives sciences, considerable efforts are under way to make these hidden bodies of information machine readable and accessible to computation. This is often being done in partnership with digital humanities colleagues, and as @Debbie points out, there is much that biodata researchers can learn from pilot projects in these humanities areas – not just in terms of the information that is being surfaced, but also in terms of the semantic solutions that are being forged.

Some examples of such projects are:

Making Sense of Illustrated Handwritten Archives (Naturalis Leiden, University of Twente, et al.), which looks at the papers of the Dutch Natuurkundige Commissie, a rich 17,000-page account of scientific exploration of the Indonesian Archipelago (1820-1850). At their recent online conference, I organised a panel of librarians, archivists, bioinformatics researchers, social science researchers and humanities researchers: Semantics and Beyond: Modeling and enriching longue-durée biocultural data for answering interdisciplinary and epistemic research questions.
CLIR/BHL Field Book Project (begun at Smithsonian Institution Archives in 2010 but now with many partners including BHL). This digitisation project is ‘critical for any researcher interested in the full context of the scientific expeditions, discoveries, or collections described therein’.
Darwin Correspondence Project (University of Cambridge); Alfred Russel Wallace Correspondence Project (begun at NHMLondon)
Early Modern collections catalogues: Digital Editions of the catalogue of Sir Hans Sloane (Fossils 1) from Reconstructing Sloane, a project I have been involved in for a decade; as well as the Royal Society’s digital publication of the manuscript catalogue of the Royal Society Repository.
Mobile Museums Project and Miscellaneous Reports Project, Royal Botanic Gardens Kew with humanities partners such as The Linnean Society and Royal Holloway Department of Geography. It is worth noting that the Miscellaneous Reports Project ran a conference concurrently with this GBIF consultation, the opening remarks of which were given by RBGKew’s Director of Science Alexandre Antonelli who outlined the significance of Kew’s collaboration with humanities researchers: view from 7:58 to 15:56

These natural history knowledge projects are all carried out by humanities scholars and digital humanities researchers working hand in hand (and sometimes with biologists), just as is the case for collaborations between biologists, collection managers, and biodiversity informaticians. It would be valuable to both groups to turn systematically to each other in order to share bodies of knowledge, data, and methods. This should happen in long-term, well-structured exchanges and collaborations, and ultimately the fruits will be manifest in the co-design of ground-breaking data models and the pooling of highly heterogeneous knowledge that has huge interdisciplinary value.

Questions of data enrichment, DOIs, semantic alignment, and notions of extended/digital collection objects are also major drivers in historical and cultural collections management, as can be seen in the UK’s Towards a National Collection project (which also includes natural history) and research being done at the Getty Institute in Semantics and Name Authorities.

It cannot be the responsibility of GBIF alone to organise such intellectual and technical collaborations across the sciences and the humanities, but it is incumbent on scientific communities of practice and infrastructures such as GBIF, DiSSCo, and others to consider such collaborations in creating new infrastructures and data models. It would be wonderful to have a structured consultation on this. In the Letter of Intent for Collaboration in a Global and Open Process for Interoperable Enriched Specimen Information Models, we read that there is an ‘Aim to collaborate in a global process, open to participation from all stakeholders’.

‘All stakeholders’ would also include other significant holders of knowledge about the natural world, such as the communities of origin from which these collections came: a consultation worth having as well. Communities of origin in the localities from which natural historical collections have been made over some 600 years also have a keen interest in, and deep knowledge of, the biology and habitats in which they live. This knowledge can be historical as well as contemporary, and also has a place in attribution discussions (viz WIPO TKN and the ethics behind Nagoya). Noting local names of species and places can have value in determinations and disputes. (See also this discussion initiated by @sking on decolonising collections data, to which I have made a contribution, and this comment from @bsterner from the 2020 GBIF consultation, as well as this open question from @austinmast.)

Data infrastructures truly aiming to be FAIR and to CARE would incorporate both information and data models that are co-designed with, and co-populated by, biologists, communities of origin, humanities scholars and social science researchers. It would be wonderful to be able to make a contribution to such a collaborative project!
