Summaries - 1. Making FAIR data for specimens accessible

This is the compilation of daily summaries (most recent first) written by the topic facilitators. The goal of this page is to orient new readers of the topic. Please go to the thread to comment.

Go to Making FAIR data for specimens accessible

Summary Number 3, 26 Feb 2021

  • Legal aspects and ethical issues

In a continued discussion on the legal aspects of making specimen data more available, the issue was raised that a CBD-only focus is too limited and that permits and CITES should be intrinsically considered in the design of an infrastructure for biodiversity data. Ethical issues of data sharing and publishing should also be addressed. With regard to DSI, it was mentioned that whether there can be a fundamental legal distinction between a physical genetic resource and its DSI is a matter of legal and political debate.

  • New service needs

In response to the question of how extended/digital specimen (ES/DS) concepts help to make specimen data widely usable for novel applications, it was mentioned that if we are trying to build a system of highly interconnected data, then some mechanism to search across ES/DS data using graph-style queries would be useful, i.e. emphasising and properly utilising the links between records as much as the attributes on the records themselves.
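To make the idea of graph-style queries concrete, here is a minimal sketch using a toy in-memory graph of ES/DS records. All identifiers, link names, and the `find_related` function are illustrative assumptions, not part of any real ES/DS API; the point is that the question is answered by following typed links between records rather than by filtering on record attributes.

```python
# Toy graph of ES/DS records: each record has typed links to other records.
# All identifiers and link names here are hypothetical.
from collections import deque

SPECIMENS = {
    "ds:001": {"type": "DigitalSpecimen", "links": {"sequencedAs": ["seq:9"], "citedBy": ["pub:5"]}},
    "seq:9":  {"type": "SequenceRecord",  "links": {"derivedFrom": ["ds:001"]}},
    "pub:5":  {"type": "Publication",     "links": {"cites": ["ds:001", "ds:002"]}},
    "ds:002": {"type": "DigitalSpecimen", "links": {"citedBy": ["pub:5"]}},
}

def find_related(start, max_hops=2):
    """Breadth-first traversal over links, returning every record
    reachable from `start` within `max_hops` link-following steps."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for targets in SPECIMENS[node]["links"].values():
            for target in targets:
                if target not in seen:
                    seen.add(target)
                    queue.append((target, depth + 1))
    return seen

# A graph-style question: everything connected to ds:001 within two hops.
print(sorted(find_related("ds:001")))  # ['ds:001', 'ds:002', 'pub:5', 'seq:9']
```

In a real infrastructure this traversal would be served by a graph or linked-data query layer rather than an in-memory dictionary, but the shape of the query is the same.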

  • Copyright on data

Some concerns were raised about international data resources operating across a variety of intellectual property law jurisdictions. It was questioned whether the adoption of Creative Commons licenses in GBIF was the right choice, given that data are not copyrighted works, and whether attribution should perhaps be handled with a license that does not rely on copyright. Applying licenses to data and images in which no rights exist may also give the false impression that such rights exist. It was pointed out, however, that the adoption of Creative Commons followed a large community consultation, that case law in the U.S. has established the valid standing and status of Creative Commons licenses, and that there is also a GBIF data publisher agreement in place.

  • Metadata

With regard to “searchable metadata” it was mentioned that existing domain metadata standards such as EML (the complete version) should be considered. Besides sharing data with GBIF, this would also make it easier to share with DataONE, which focuses on standardisation through detailed metadata.

In a continued discussion about DS/ES metadata, it was stated that the link between the physicalSpecimenID and the unique identifier for the DS twin should be rock solid, and a plea was made for the establishment of proxy organisations with enough financial and staffing resources to underpin this by providing landing pages for the identifiers, similar to publishers in CrossRef.
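The physicalSpecimenID-to-DS link described above can be sketched as a small registry with a resolver, analogous to how CrossRef resolves a DOI to a publisher's landing page. The registry entries, the identifier values, and the landing-page URL pattern below are all invented for illustration; a real service would be run by a resourced proxy organisation and backed by a persistent-identifier system.

```python
# Illustrative only: a registry tying a physicalSpecimenID to the persistent
# identifier (PID) of its digital specimen twin. All values are hypothetical.
REGISTRY = {
    # physicalSpecimenID         -> PID of the digital specimen twin
    "NHMUK:ecatalogue:1234567": "20.5000.1025/abc-def",
}

# Hypothetical landing-page URL pattern served by the proxy organisation.
LANDING_PAGE = "https://example.org/ds/{pid}"

def resolve(physical_specimen_id):
    """Return the DS PID and its landing-page URL, or raise if the link
    is missing. Keeping this mapping 'rock solid' is the registry's job."""
    pid = REGISTRY.get(physical_specimen_id)
    if pid is None:
        raise KeyError(f"no digital twin registered for {physical_specimen_id}")
    return pid, LANDING_PAGE.format(pid=pid)

pid, url = resolve("NHMUK:ecatalogue:1234567")
print(pid)  # 20.5000.1025/abc-def
print(url)  # https://example.org/ds/20.5000.1025/abc-def
```

The design choice this highlights is that resolution must never fail silently: a broken physical-to-digital link is an error to be surfaced and repaired, not an empty result.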

  • Structure of a DS/ES specimen

Following earlier questions about the nature of the digital specimen object, it was proposed to create some sketches to discuss and clarify this. For this, a subthread was created:

Structure and responsibilities of a #digextspecimen. The possible sections of an ES/DS were drawn and discussed, and it was also mentioned that important operations need to be defined, as these will affect implementations. It was further noted that how much of the data can be synced with the Collection Management System (CMS) depends on the capabilities of the CMS.
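As one way of framing that discussion, the sections and operations of a digital specimen can be sketched as a simple object. The section names (authoritative, supplementary, annotations) and the single operation shown are assumptions made for discussion only, not a proposed standard; the sketch illustrates why defining operations matters, since an operation determines which section it touches and therefore what must be synced back to the CMS.

```python
# Hedged sketch of a sectioned digital specimen object; all section names
# and the annotate() operation are hypothetical discussion aids.
from dataclasses import dataclass, field

@dataclass
class DigitalSpecimen:
    pid: str                       # persistent identifier of the DS
    physical_specimen_id: str      # link back to the physical twin
    authoritative: dict = field(default_factory=dict)   # data mastered in the CMS
    supplementary: dict = field(default_factory=dict)   # third-party extensions
    annotations: list = field(default_factory=list)     # community annotations

    def annotate(self, author, body):
        """Example operation: attach an annotation without touching the
        authoritative section, so nothing needs syncing back to the CMS."""
        self.annotations.append({"author": author, "body": body})

ds = DigitalSpecimen("20.5000.1025/xyz", "RMNH.INS.123",
                     authoritative={"scientificName": "Apis mellifera"})
ds.annotate("georeferencer", "coordinates verified against field notes")
print(len(ds.annotations))  # 1
```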

Summary Number 2, 19 Feb 2021

Discussion today provided some clarification on kernel metadata and its role in the main types of searches performed on specimen data. It also raised the question of the nature of the digital specimen object, and whether it should be “just a bag of relationships with other objects” or whether metadata from the original specimen should be embedded in it. The suggestion was made that if globally inclusive indexing services such as GBIF sit at the heart of the global architecture of digital objects, then their role may be limited to providing an authoritative view of how well digital objects reflect their associated physical objects.

More thoughts were shared on the regulations governing the use of digital specimens. Although we must ensure compliance with the Access and Benefit-Sharing (ABS) provisions of the Nagoya Protocol, there may be options to relieve collections institutions of the burden of responsibility for past and future use of data governed by ABS agreements by relaxing the regulation of such specimens. In any case, social and ethical considerations must be built into any plans for implementing a global, integrated biodiversity data infrastructure. Exposure of digital specimen data may reveal instances where collections were not made in compliance with established permits or other regulations, and although responsibility for compliance lies with the original source of the data, data publishers must be more attentive to their responsibility for sharing improperly gathered data.

Summary Number 1, 17 Feb 2021

Thanks for the initial comments. Issues coming up during the first days of the consultation:

  1. Clarifications of understanding about the Digital Specimen/Extended Specimen concept, its design choices and its socio-technical implementation. It was discussed how design choices around the realisation of the DS/ES concept and the social contracts associated with maintaining DS/ES are linked. It is already clear that a much fuller explanation is needed, covering several dimensions, to help improve understanding. What does a DS/ES contain? How are its components arranged in relation to one another? Who is responsible for what?

  2. How several legal aspects will be addressed. Concerns were raised specifically in relation to specimen/data sovereignty, repatriation of benefits in relation to the CARE principles, and emerging multilateral agreements such as Nagoya and GMBS, as well as the legal implications of and differences between FAIR and open. This might point to the need to find and involve one or two legal experts in the consultation, either now or in a later phase, and/or to organise workshops on the topic.

  3. Use of terminology. Terms such as ‘kernel information’, ‘metadata’ and ‘linked data’ are causing confusion. Besides the lack of a glossary of terms to support common understanding, this is linked to incomplete understanding due to insufficient explanations. See also (1) above.

Questions we aim to focus on in the next couple of days:

  • New models of digitisation, curation and governance to serve FAIR data when that data is a combination of the collection holder’s data, the collector’s data, and data from external specialists and third-party sources.

  • Functions needed for new data science - what services or capabilities are currently missing?

  • Improving engagement of participants - what motivates you to contribute/share value-adding data to extended digital specimens? What do you wish to see?