Summaries - 1. Making FAIR data for specimens accessible

JoeMiller · February 12, 2021, 12:01pm

This is the compilation of daily summaries (most recent first) written by the topic facilitators. The goal of this page is to orient new readers of the topic. Please go to the thread to comment.

Go to Making FAIR data for specimens accessible

Summary Number 4, 6 March 2021

It becomes clear that although discussions about access and benefit sharing tend to focus on digital sequence information, that is only one part of the story. Other kinds of information, when digitally available, give rise to a whole range of sensitivity concerns. These are not technology questions but questions of regulation, ethics, commerce, sustainability, crime, law, etc. Extended open data are a game changer that turns dusty old collections into goldmines. Thus, mechanisms to deal with potential issues must be considered, and a more coordinated discussion is essential to ensure technical, legal and ethical alignment. When searching, the availability of certain metadata might have to depend on who you are.

Responsibility for creating and maintaining a digital specimen (DS) lies with the institution having custodial responsibility for the physical specimen (PS). Where a physical specimen no longer exists but a corresponding DS is still desirable, the DS may need to be adopted by an institution for maintenance. Maintaining the link between the PS and its corresponding DS is a joint responsibility of the custodial institution and an agent responsible for assigning identifiers, to be managed through contracts and service level agreements.

The logical and implementation structure of a digital extended specimen was explored in a sub-topic thread. Authoritative information about a specimen, supplementary information derived from a specimen and other information associated with the specimen are logically integrated under a single digital specimen persistent identifier, reflecting the core Webster (2017) notion of extending the scope of a specimen. The connection to the PS is immutable but the DS is mutable. It can be modified and updated. The custodial institution is responsible for authoritative information about the specimen whereas other kinds of information (supplementary, associated, etc.) can be enriched by any appropriately authorized person/organisation/machine. Such modifications and enrichments can be treated as transactions (of various kinds) on the digital specimen.
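As a rough illustration of the structure described above, the following Python sketch models a digital specimen whose link to the physical specimen is fixed, while its content changes only through recorded transactions. All class, field and identifier names are hypothetical and are not taken from any DiSSCo or BCoN specification. The rule that only the custodian may change authoritative information is an assumption drawn from the paragraph above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Transaction:
    """One recorded change to a digital specimen (hypothetical shape)."""
    agent: str    # person, organisation or machine making the change
    section: str  # "authoritative", "supplementary" or "associated"
    key: str
    value: Any


class DigitalSpecimen:
    """Mutable digital specimen with an immutable link to its physical specimen."""

    SECTIONS = ("authoritative", "supplementary", "associated")

    def __init__(self, ds_pid, physical_specimen_id, custodian, authorized_agents=()):
        self.ds_pid = ds_pid                          # persistent identifier of the DS
        self._physical_specimen_id = physical_specimen_id
        self.custodian = custodian                    # institution holding the PS
        self.authorized_agents = set(authorized_agents)
        self.sections = {name: {} for name in self.SECTIONS}
        self.history = []                             # applied transactions, in order

    @property
    def physical_specimen_id(self):
        # Read-only: no setter is defined, so the PS link cannot be reassigned.
        return self._physical_specimen_id

    def apply(self, tx):
        """Apply a transaction if the agent is allowed to touch that section."""
        if tx.section not in self.SECTIONS:
            raise ValueError(f"unknown section: {tx.section}")
        if tx.section == "authoritative":
            allowed = tx.agent == self.custodian
        else:
            allowed = tx.agent == self.custodian or tx.agent in self.authorized_agents
        if not allowed:
            raise PermissionError(f"{tx.agent} may not modify {tx.section}")
        self.sections[tx.section][tx.key] = tx.value
        self.history.append(tx)


ds = DigitalSpecimen("ds:example-1", "ps:example-1",
                     custodian="museum-a", authorized_agents={"lab-b"})
ds.apply(Transaction("museum-a", "authoritative", "scientificName", "Quercus robur"))
ds.apply(Transaction("lab-b", "supplementary", "imageCount", 3))
```

Keeping the transaction history alongside the current state is one possible way to support the chains of custody mentioned in this summary; a real implementation would also need persistence and identifier resolution, which this sketch leaves out.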

Chains of custody spanning the path of a specimen and its information from the gathering event in the field, through lab work, imaging, statistics, etc. to official reporting in (for example) conservation contexts, national planning, court evidence, commercial decisions, etc. were mentioned as being enabled by DES/identifiers.

Concerns to be taken care of during implementation, such as ensuring continuing relevance of collections once the data is ‘out there’, ensuring that collections receive and can assimilate new and changed data, the way collection management systems should interact with the federating abstracting DES layer, etc. were discussed and noted for further attention.

In the end, consensus emerged that the DiSSCo digital specimen concept and the BCoN extended specimen are not so different from one another, as the comparison diagrams show. They share more similarities than differences, and any apparent differences arise from the perspectives from which they have been considered. The DS concept has been considered more from the point of view of technical implementation in ICT, whereas the ES concept has been considered more from the point of view of the different kinds of data that are about and derive from collection specimens, and how these relate. In a converged illustration the blue circle defines a common scope.

Summary Number 3, 26 Feb 2021

Legal aspects and ethical issues

In a continued discussion on the legal aspects of making data for specimens more available, the issue was raised that focusing on the CBD alone is too limited and that permits and CITES should be intrinsically considered in the design of an infrastructure for biodiversity data. Ethical issues of data sharing and publishing should also be addressed. With regard to DSI, it was mentioned that whether or not there can be a fundamental legal distinction between a physical genetic resource and its DSI is a matter of legal and political debate.

New service needs

In response to the question of how extended/digital specimen (ES/DS) concepts help to make specimen data widely usable for novel applications, it was mentioned that if we are trying to build a system of highly interconnected data, then some mechanism to search across ES/DS data using graph-style queries would be useful, i.e. emphasising and properly utilising the links between records as much as the attributes on the records themselves.
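To make the idea of a graph-style query concrete, here is a minimal Python sketch that follows typed links between records rather than filtering on attributes alone. The record identifiers, relation names and data are invented for illustration, and an in-memory dictionary stands in for whatever graph store or API a real service would use.

```python
from collections import deque

# Hypothetical linked records; each "links" entry is (relation, target id).
RECORDS = {
    "ds:1": {"type": "specimen", "taxon": "Quercus robur",
             "links": [("hasSequence", "seq:9"), ("collectedBy", "agent:7")]},
    "ds:2": {"type": "specimen", "taxon": "Quercus petraea",
             "links": [("collectedBy", "agent:7")]},
    "seq:9": {"type": "sequence", "links": [("citedIn", "pub:3")]},
    "agent:7": {"type": "collector", "links": []},
    "pub:3": {"type": "publication", "links": []},
}


def reachable(records, start, target_type, max_hops=3):
    """Breadth-first search: ids of records of target_type within max_hops links of start."""
    found, seen = [], {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node != start and records[node]["type"] == target_type:
            found.append(node)
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for _relation, neighbour in records[node]["links"]:
            if neighbour in records and neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return found


def linked_to(records, relation, target):
    """Ids of all records that point at target through the given relation."""
    return sorted(rid for rid, rec in records.items()
                  if (relation, target) in rec["links"])


publications = reachable(RECORDS, "ds:1", "publication")   # via specimen -> sequence -> publication
same_collector = linked_to(RECORDS, "collectedBy", "agent:7")
```

The first query answers "which publications are connected to this specimen, directly or indirectly?", and the second answers "which records share this collector?". Neither can be expressed as a filter on a single record's attributes.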

Copyright on data

Some concerns were raised about international data resources operating in a variety of intellectual property law jurisdictions. It was questioned whether the adoption of Creative Commons in GBIF was the right choice, given that data are not copyrighted works and that attribution should perhaps be handled with a license that does not rely on copyright. Applying licenses to data and images that do not have any existing rights may also give the wrong impression that rights exist. It was pointed out, however, that Creative Commons adoption followed a large community consultation, that case law in the U.S. has established the valid standing and status of Creative Commons, and that there is also a GBIF data publisher agreement in place.

Metadata

With regard to “searchable metadata” it was mentioned that existing domain metadata standards such as EML (the complete version) should be considered. Besides sharing data with GBIF, this would also make it easier to share with DataONE, which focuses on standardisation through detailed metadata.

In a continued discussion about DS/ES metadata it was stated that the link between the physicalSpecimenID and the unique identifier for the DS twin should be rock solid, and a plea was made for the establishment of proxy organisations with enough financial and staffing resources to underpin this by providing landing pages for the identifiers, similar to publishers in CrossRef.

Structure of a DS/ES specimen

Following earlier questions about the nature of the digital specimen object, it was proposed to create some sketches to discuss and clarify this. For this a subthread was created:

Structure and responsibilities of a #digextspecimen. The possible sections of an ES/DS were drawn and discussed, and it was also mentioned that important operations need to be defined, as these will affect implementations. It was further mentioned that how much of the data can be synced with the Collection Management System (CMS) depends on the capabilities of the CMS.

Summary Number 2, 19 Feb 2021

Discussion today provided some clarification on kernel metadata and its role in the main types of searches performed on specimen data. It also raised the question of the nature of the digital specimen object, and whether it should be “just a bag of relationships with other objects” or whether metadata from the original specimen should be embedded in it. The suggestion was made that if globally inclusive indexing services such as GBIF sit at the heart of the global architecture of digital objects, then their role may be limited to reflecting an authoritative view of how well they reflect their associated physical object.

More thoughts were shared on the regulations governing the use of digital specimens. Although we must ensure compliance with the Access and Benefit Sharing (ABS) provisions of the Nagoya Protocol, there may be options that relieve collections institutions of the burden of responsibility for past and future use of data governed by ABS agreements, by relaxing the regulation of such specimens. In any case, social and ethical considerations must be built into any plans for implementing a global, integrated biodiversity data infrastructure. Exposure of digital specimen data may reveal instances where collections were not made in compliance with established permits or other regulations, and although responsibility for compliance lies with the original source of the data, data publishers must be more attentive to their responsibility for sharing improperly gathered data.

Summary Number 1, 17 Feb 2021

Thanks for the initial comments. Issues coming up during the first days of the consultation:

1. Clarifications of understanding about the Digital Specimen/Extended Specimen concept, design choices and the socio-technical implementation. It was discussed how design choices around the realisation of the DS/ES concept and the social contracts associated with maintaining DS/ES are linked. It’s already clear that a much fuller explanation is needed, covering several dimensions, to help improve understanding. What does a DS/ES contain? How are its components arranged in relation to one another? Who is responsible for what?
2. How several legal aspects will be addressed. Concerns were raised specifically in relation to specimen/data sovereignty, repatriation of benefits in relation to CARE principles, and emerging multi-lateral agreements such as Nagoya, GMBS. Questions were also raised about the legal implications of, and differences between, FAIR and open. This might point to the need to find and involve one or two legal experts in the consultation, either now or in a later phase, and/or to organise workshops on the topic.
3. Use of terminology. Terms such as ‘kernel information’, ‘metadata’ and ‘linked data’ are causing confusion. Besides the lack of a glossary of terms to support common understanding, this is linked to incomplete understanding due to insufficient explanations. See also (1) above.

Questions we aim to focus on in the next couple of days:

New models of digitization, curation and governance to serve FAIR data when that data is a combination of the collection holder’s data, collector’s data and data from external specialists and third-party sources.
Functions needed for new data science - what services or capabilities are currently missing?
Improving engagement of participants - what motivates you to contribute/share value-adding data to extended digital specimens? What do you wish to see?
