8. Meeting legal/regulatory, ethical and sensitive data obligations

hardistyar · May 17, 2021, 8:25am

Moderators: Alex Hardisty, Jutta Buschbom and Breda Zimkus

Background

The desire for open access to specimen data and for open science contrasts with issues associated with legal requirements such as intellectual property rights; regulatory constraints of specific national and international legislation; the need to prevent the exposure of sensitive information to unauthorised persons; the need of businesses to maintain data- and infrastructure-based business foundations; and the social goals of fairness and equity, which require mechanisms to prevent individuals and groups from taking undue advantage, especially of large and information-rich datasets. Against the background of a general policy of being ‘as open as possible, as closed as legally and/or ethically necessary’, the goal of this topic is to identify and explore future mechanisms for meeting legal obligations, including national and international regulations, as well as ethical and sensitive data concerns associated with biodiversity collections and their use.

The paragraphs below develop the themes of this topic in greater depth to allow several questions to be considered.

Digital solutions for meeting obligations

We want to explore under which conditions extended data infrastructure(s) are accepted and embraced by the communities to which they can provide services. Which criteria and functions must be offered to support legal/regulatory and ethical/moral obligations and sensitive data concerns? Are there existing solutions that can provide blueprints or that can be adapted? Digital solutions must provide services and advantages welcomed by the [communities of collection professionals, bio/geodiversity scientists and bio/geodiversity informaticians](link to topic 9), who develop, manage and maintain the data infrastructure(s). At the same time, bio/geodiversity infrastructures radiate out into their surrounding societies and intersect with a wide range of affected or interested [stakeholders with potential applications](link to topic 11). Thus, extended infrastructures need to appeal to specific partners and provide the functionality required in interactions with them. These partners include, for example, indigenous peoples and local communities (IPLCs), women and youth; engaged citizens and environmental activists; research and development departments in universities, government and companies; businesses (e.g., those providing environmental assessments or producing a wide range of commodities); and organizations maintaining certification systems. Furthermore, infrastructure partners include those responsible for local, subnational and national administration who contribute to (sub-)national planning and reporting processes; professionals in customs offices, in law enforcement and in the legal system responsible for case decisions; and those involved in developing forward-looking policy decisions.
Considering these and other applications, one aspect of this topic relates more directly to technical functionality that delivers effectiveness and efficiency in a versatile and elegant way through user-friendly interfaces, powerful tools and integrated workflows, made possible by FAIR Digital Objects, [persistent identifiers](link to topic 7) and relational links governed by [transactional mechanisms and provenance](link to topic 10). A second aspect considers and develops the theoretical and procedural concepts implementing a layer of legal and regulatory obligations; considerations for sensitive data, information privacy and business intelligence; and ethical and social frameworks, based on the Sustainable Development Goals, for the use of traditional knowledge in fairness, equity and justice.

ABS, BBNJ, and CARE

The collections community is one of the mediators of both the technical and the socio-ethical aspects of implementing and executing Access and Benefit-Sharing (ABS) regulations. In this context, a growing number of legal, regulatory and ethical issues are confronting biodiversity collections. The Nagoya Protocol on Access and Benefit Sharing has notable implications: although its aim is to create greater legal certainty and increase transparency for both providers and users of genetic resources, this agreement often poses a number of challenges for those collecting, managing and using collections. Several issues remain unresolved, including the inclusion of digital sequence information (DSI) under the Convention on Biological Diversity (CBD) and/or the Nagoya Protocol; the conservation and sustainable use of marine biological diversity of areas beyond national jurisdiction (BBNJ), with consideration of the development of an international instrument under the United Nations Convention on the Law of the Sea; and the interaction with traditional knowledge and data rights of indigenous peoples, local communities, women and youth (CARE principles for the governance of indigenous data).

Chains of custody

Chains of custody must already be documented for ABS. The technical implementation of, and operational compliance with, chains of custody face expanded requirements when specimens are used for specific purposes, such as certification or forensic casework. In a versatile conservation work environment, chains of custody span the path of a specimen and its associated metadata from the gathering event in the field, through transport, accessioning, preparation and digitization, to use (e.g., lab work, imaging, statistical analyses) and end products (e.g., reports, publications), including loan/gift events in the context of biodiversity collections. Chain-of-custody functionality, and the information it provides, must be available when required for official reporting (compliance) in conservation contexts, national planning, court evidence, and commercial and customs decisions. These represent some of the main use cases for which transactional mechanisms and provenance ([Topic 10](link to topic 10)) are needed, and for which a consensus on global implementation mechanisms must be reached.
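To make the idea of a documented chain of custody more concrete, here is a minimal sketch, assuming Python 3.9+, that records custody as an ordered list of events and flags breaks in the chain. The stage vocabulary is taken loosely from the path described above, and all names (`CustodyEvent`, `ChainOfCustody`, `gaps`) are hypothetical; they are not part of any existing DES, DiSSCo or CITES specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date

# Hypothetical stage vocabulary following the path described above;
# not an agreed standard.
STAGES = {"gathering", "transport", "accessioning", "preparation",
          "digitization", "use", "end_product", "loan", "gift"}


@dataclass(frozen=True)
class CustodyEvent:
    stage: str        # one of STAGES
    when: date        # date of the event
    from_holder: str  # agent passing on custody (equal to to_holder if custody does not change)
    to_holder: str    # agent holding custody after the event


@dataclass
class ChainOfCustody:
    specimen_id: str  # e.g. a persistent identifier (hypothetical value in the usage below)
    events: list[CustodyEvent] = field(default_factory=list)

    def add(self, event: CustodyEvent) -> None:
        """Append an event, rejecting stages outside the vocabulary."""
        if event.stage not in STAGES:
            raise ValueError(f"unknown stage: {event.stage}")
        self.events.append(event)

    def gaps(self) -> list[str]:
        """List breaks in the chain: events dated before their predecessor,
        or handovers that do not start from the previous holder."""
        problems = []
        for prev, cur in zip(self.events, self.events[1:]):
            if cur.when < prev.when:
                problems.append(f"{cur.stage}: dated before {prev.stage}")
            if cur.from_holder != prev.to_holder:
                problems.append(f"{cur.stage}: received from {cur.from_holder!r}, "
                                f"but custody was with {prev.to_holder!r}")
        return problems


chain = ChainOfCustody("specimen-001")
chain.add(CustodyEvent("gathering", date(2020, 5, 1), "Collector A", "Collector A"))
chain.add(CustodyEvent("transport", date(2020, 5, 3), "Collector A", "Museum X"))
print(chain.gaps())
```

Keeping each handover explicit (who passed custody to whom, and when) is what would let such a record be checked automatically before it is relied on for compliance reporting or as evidence.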

Questions to promote discussion

Which models and frameworks already exist? Have they been implemented and how? Are they in use? What are the experiences with them?
Who decides the specifics of what should be implemented?

Will this need to be an international, legal and multi-stakeholder top-down model or will it be a per-specimen/per-information and per-provider bottom-up model?
What happens if different potential rights-holders exist for one specimen or piece of information? E.g., an indigenous people and local community (IPLC), a researcher who produced derived results, a collections institution, a company holding a patent, etc.
Applicability of subnational and national regulations and international treaties might depend on use and outcomes: e.g., if information is “exported”, i.e., used in a different country/administrative unit; if it is used for non-commercial or commercial purposes; if it results in a commercial product years later; …

Who should be responsible for setting rights and maintaining them, e.g., for biological diversity of areas beyond national jurisdiction (BBNJ)? Who has the legal and/or ethical responsibility? How can this be recorded in Digital Extended Specimens?
What is the power and potential of supporting these obligations and considerations in a Digital Extended Specimen infrastructure? How can they inspire, or even “demand”, the use of DES infrastructures and spin-off applications?
Are there stakeholders that have not yet been identified that might play an important role in aspects of implementation or compliance?

Information resources

A wide range of background information resources are relevant, including those of a general nature and those related to the biodiversity and natural sciences domain more specifically.

On intellectual property rights

Carroll, M. W. (2006) Creative commons and the new intermediaries. Michigan State Law Review 45, 45–65. https://digitalcommons.wcl.american.edu/cgi/viewcontent.cgi?article=1039&context=facsch_lawrev.

On open science

Bowser, A., Wiggins, A. & Stevenson, R. (2013) Data Policies for Public Participation in Scientific Research: A Primer. DataONE, Albuquerque. http://cdn1.safmc.net/wp-content/uploads/2016/11/28101058/Bowseretal2013_DataPolicyPrimer.pdf
RDA-CODATA Legal Interoperability Interest Group (2016) Legal Interoperability of Research Data: Principles and Implementation Guidelines.
Alonso García, E. (2018) GLOBIS-B Position Paper for Policymakers on the Potential Solutions to Scientific, Technical and Legal Interoperability Issues. DOI: 10.5281/zenodo.1323495.

On collections-based experiences and points of view

BCoN (2019) Extending U.S. Biodiversity Collections to Promote Research and Education. A report by the Biodiversity Collections Network 2019 URL: https://www.aibs.org/home/assets/BCoN_March2019_FINAL.pdf.
Blasiak, R., R. Wynberg, K. Grorud-Colvert, S. Thambisetty, N.M. Bandarra, A.V.M. Canário, J. da Silva, C.M. Duarte, M. Jaspars, A. Rogers, K. Sink, and C.C.C. Wabnitz. (2020) The ocean genome and future prospects for conservation and equity. Nature Sustainability 3: 588–596. doi:10.1038/s41893-020-0522-9.
Colella, J.P., R.B. Stephens, M.L. Campbell, B.A. Kohli, D.J. Parsons, and B.S. Mclean. (2020) The Open-Specimen Movement. BioScience biaa146: 1–10. doi:10.1093/biosci/biaa146.
Fukushima, C., R. West, T. Pape, L. Penev, L. Schulman, and P. Cardoso (2020) Wildlife collection for scientific purposes. Conservation Biology. https://doi.org/10.1111/cobi.13572.
National Academies of Sciences, Engineering, and Medicine (2020) Biological Collections: Ensuring Critical Research and Education for the 21st Century. Washington, DC: The National Academies Press. doi: 10.17226/25592. See especially page 28.
Thiers, B., J. Bates, A.C. Bentley, L.S. Ford, D. Jennings, A.K. Monfils, J.M. Zaspel, J.P. Collins, M.H. Hazbón, and J. L. Pandey (2021) Viewpoint: Implementing a community vision for the future of biodiversity collections. BioScience biab036, 1–3. https://doi.org/10.1093/biosci/biab036
Zimkus, B.M., L.S. Ford, and P.M. Morris (2021) The need for permit management within biodiversity collection management systems to digitally track permits and other legal compliance documentation and increase transparency about origins and uses. Collection Forum. Accepted.

On experiences made in/by human genomics and medicine

Phillips M, et al. (2020) Genomics: data sharing needs an international code of conduct.
Maxmen A (2021) Why some researchers oppose unrestricted sharing of coronavirus genome data.
Van Noorden R (2021) Scientists call for fully open sharing of coronavirus genome data.

On Access and Benefit-Sharing

Secretariat of the Convention on Biological Diversity (2020) The Access and Benefit Sharing Clearing House (version 2020.03.17). https://absch.cbd.int/
Smyth, S.J. and T.C. Charles (2020) Impacts on International Research Collaborations from DSI/ABS Uncertainty. Trends in Biotechnology. doi:10.1016/j.tibtech.2020.10.011.
Gaffney J., R. Tibebu, R. Bart, G. Beyene, D. Girma, N.A. Kane, E.S. Mace, T. Mockler, E.E. Nickson, N. Taylor, and G. Zastrow-Hayes (2020) Open access to genetic sequence data maximizes value to scientists, farmers, and society. Global Food Security. 26: 100411. doi:10.1016/j.gfs.2020.100411.
Tsioumani, E. (2021) Fair and Equitable Benefit-Sharing in Agriculture. https://www.routledge.com/Fair-and-Equitable-Benefit-Sharing-in-Agriculture-Open-Access-Reinventing/Tsioumani/p/book/9780367181864#sup

On rights-based governance of data, access and use

Local Contexts - Grounding Indigenous Rights: Traditional Knowledge and BioCultural Labels. https://localcontexts.org/.
Mukurtu Platform: https://mukurtu.org/.
Carroll SR, et al (2020). The CARE Principles for Indigenous Data Governance. Data Science Journal, 19: 43, pp. 1–12. DOI: https://doi.org/10.5334/dsj-2020-043.
Carroll, S.R., Herczog, E., Hudson, M. et al. (2021) Operationalizing the CARE and FAIR Principles for Indigenous data futures. Scientific Data 8, 108.
GIDA - Global Indigenous Data Alliance, Promoting Indigenous Control of Indigenous Data. https://www.gida-global.org/
Bernstein J, Heinz V, Schouwink R, Meunier M, Holland E, Roe D (2021). Strengthening equity in the post-2020 Global Biodiversity Framework. IIED, London. https://pubs.iied.org/20156IIED. See also blog post by Holland E & Roe D 2021 https://www.iied.org/practical-guide-helps-negotiators-put-equity-heart-new-global-biodiversity-framework.

On environmental ethics

Van Dooren T, Kirksey E and Münster U (2016) “Multispecies Studies: Cultivating Arts of Attentiveness,” Environmental Humanities 8(1): 1-22.

On sensitive data

Chapman AD (2020) Current Best Practices for Generalizing Sensitive Species Occurrence Data. Copenhagen: GBIF Secretariat.
Figueira R, Beja P, Villaverde C, Vega M, Cezón K, Messina T, Archambeau A, Johaadien R, Endresen D & Escobar D (2020) Guidance for private companies to become data publishers through GBIF: Template document to support the internal authorization process to become a GBIF publisher. Copenhagen: GBIF Secretariat. https://doi.org/10.35035/doc-b8hq-me03.
GBIF Secretariat & IAIA (2020) Best Practices for Publishing Biodiversity Data from Environmental Impact Assessments. Copenhagen: GBIF Secretariat.

On chains of custody

For multi-stakeholder approaches to global value chains (see also rights-based approaches):
- Paranjape K, Agarwal N (2021) http://sdg.iisd.org/commentary/guest-articles/transforming-private-sector-contributions-to-the-decade-of-action-evolving-north-south-partnerships/
For forensics to combat forestry, fisheries and wildlife crime, for example:
- UNODC Wildlife and Forest Crime: Crimes that affect the environment
- CITES: Tools, Services and Resources available through ICCWC https://cites.org/eng/prog/iccwc/tools.php
- Wildlife and Forest Crime Forensic Guidelines (Ivory & Timber) https://www.unodc.org/unodc/en/wildlife-and-forest-crime/forensic-guidelines.ht
- Law Enforcement Best Practice Flow Diagram for Timber https://www.unodc.org/documents/Wildlife/Timber_Flow_Diagram.pdf
For certification:
- FSC standards and guidelines, Chain of Custody Certification | Forest Stewardship Council
- World Forest ID project infrastructure https://worldforestid.org/

JuttaBuschbom · June 16, 2021, 5:36am

Good morning and welcome to discussions focusing on the Digital Extended Specimen concept as a future implemented digital infrastructure embedded in and in exchange with society.

Starting out, open access to biodiversity and the fair and equitable sharing of benefits derived from it are fundamental for societies attaining sustainability, as well as for concrete, operational conservation applications. At the same time, the topic of access and benefit sharing (ABS) has gained a certain notoriety for having given rise to a thicket of laws and regulations that are difficult to understand and navigate, and not only for organismal biologists and collections.

How can the integration of extended physical and digital infrastructures provide orientation, actionable confidence, as well as easy and effective application for providers and users?

Andrawaag · June 16, 2021, 11:43am

When discussing the licensing of biodiversity data, could we also discuss License stacking, i.e., what happens when data under different licenses are combined?
The existence of various licenses with different degrees of openness seriously hampers the reuse of data.
A lot of data is made available under licenses that are geared more towards reports and other creative outputs, and not all open data licenses are compatible with each other, meaning that data released under them cannot be integrated.

It would be great if this consultation cycle could lead to a decision tree that can be used to pick an applicable license for biodiversity data.

JuttaBuschbom · June 18, 2021, 11:50am

Hi Andra,
welcome to the topic and thank you for your interest.
It seems that you are touching on two topics:

1) for users: advice on which license to choose for biodiversity data and knowledge, so that it can and will be reused, and thus can contribute to, e.g., conservation, benefit sharing, deepening of our knowledge, and more.
2) for infrastructure developers: which information and functions will make a user’s interaction with licenses “easier”.

Here, I will explore my view on 1), licensing biodiversity data and knowledge. In a subsequent post, I will compile a list of technical fields and functions as a proposed answer to 2), infrastructure development and implementation.

License choices for biodiversity data and knowledge

My experience with licenses is that they are really hard to deal with, both as a user of a licensed resource and as a provider who has to choose a license herself for a resource she is providing, e.g., a product, service, … And I am only speaking about decisions to be made in the universe of open/free/public/CC/copyleft/open-source/public-domain-equivalent/permissive licenses (see the corresponding pages in https://en.wikipedia.org).

Not having heard the term “license stacking” before, I followed your link with its very helpful information and animation (librarian’s cool glasses!). Afterwards, I found GBIF’s description of how their licenses were developed, and their reasoning, a good foundation.

Basically, GBIF’s strategy supports “Public money, public code”, i.e., public resources. They allow three licenses: CC0, CC-BY and CC-BY-NC.

These are my personal, partly abstract, 5 cents:
For simplicity, whenever possible choose CC0. You don’t need a lawyer for that, and you promote sharing and reuse for, hopefully, the good of society and future generations.

When your income, or that of your business, organization or institution, depends on credit returned by uses of this resource, and there is no other way of “quantifying” the importance and impact of your published resource, then you might use CC-BY. Looking ahead to a future in which the DES infrastructure is fully up and running, CC-BY may no longer be necessary. All uses of all resources will be linked to the provider or publisher, no matter how many steps and corners removed. All it needs is a dashboard for each provider/publisher that visualizes the work of some (friendly) bots collecting links in the background. [sounds somehow scary? - mmh]

Thinking about much of the data involved in the context of the DES, GBIF and biodiversity (a DNA sequence, a barcode, a single genome sequence, one physical collection specimen, an observation, a single annotation, …), this is data of interest for “Big Data” analyses. That is, these single data “points” are not that informative in themselves; their value lies in combining them with other data into smaller to larger datasets, on which more or less extensive analyses are run. I am not certain that in these cases CC-BY-NC licenses will stop an unfair and, in the longer term, unsustainably acting “big business” or start-up from getting rich by exploiting accessible resources on the back of everybody else. We (the public, providers) have no insight into what lies on companies’ servers, what might have been used in R&D, or what data might power commercial platforms providing analytical and information services.

On the other hand, societies, the SDGs, conservation, etc. depend to a large part on businesses to provide sustainable and “fair” products and services for the good of all. Restrictive licenses might become obstacles to achieving the very goals for which they were chosen.

Data, information and expertise as business intelligence can form the foundation of business enterprises. Hence, businesses can have good reasons to restrict access to (parts of) their data and information, and/or to restrict the use of their information capital, by putting in place more or less restrictive licenses and patents. In this way, they keep them behind a kind of “communication” wall: if somebody would like to use their resources, they need to contact the business and inquire about the conditions for access and use. Here, a restrictive license can form the foundation for income, but also for cooperation and collaboration.

Apart from placing your data and information resources in the public domain (e.g., CC0), all licenses require effort by, and long-term responsibility from, the licensor. Licenses need upkeep, at a minimum in the form of keeping contact information up to date, ensuring that somebody can be reached within a reasonable time frame for inquiries, and communicating a testament decision: what should happen if the licensor isn’t around anymore? Will somebody inherit the ownership of the license, and who? Licensors might also want to monitor the use of their resources and potentially enforce the license (otherwise, why add restrictions in the first place?). Thus, when choosing a restrictive license, it might be a good idea to make it time-limited and/or have it default to the public domain (CC0).
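The rule of thumb above, together with the suggestion that a restrictive license be time-limited and fall back to CC0, can be sketched as two small functions. This is a minimal illustration assuming plain license strings and hypothetical function names (`suggest_license`, `effective_license`); it encodes one poster’s personal view and is neither legal advice nor a GBIF procedure.

```python
from __future__ import annotations

from datetime import date
from typing import Optional


def suggest_license(income_depends_on_credit: bool,
                    other_impact_measure: bool) -> str:
    """Personal rule of thumb from the post above, not legal advice:
    CC0 whenever possible; CC-BY only when income depends on credit and
    there is no other way to quantify the resource's impact."""
    if income_depends_on_credit and not other_impact_measure:
        return "CC-BY"
    return "CC0"


def effective_license(license_id: str, expires: Optional[date], today: date) -> str:
    """A time-limited license that has reached its (hypothetical) expiry date
    falls back to the public domain (CC0); without an expiry date it stays in force."""
    if expires is not None and today >= expires:
        return "CC0"
    return license_id


print(suggest_license(income_depends_on_credit=True, other_impact_measure=False))
print(effective_license("CC-BY-NC", expires=date(2030, 1, 1), today=date(2031, 1, 1)))
```

Encoding the fallback as data (an expiry date attached to the license) rather than as a promise to re-license later means nobody has to be reachable for the resource to become public domain.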

JuttaBuschbom · June 18, 2021, 1:22pm

This is the second part of my answer, considering which fields and functions might make users’ interactions with licenses et al. easier. On the one hand, they can provide orientation to users. At the same time, their extent should not scare providers and users away from these issues.

Fields and functions associated with licenses, legal agreements and/or business contracts

[Owner of resource]
License
- Type
- Version
- information/description
- URL
- License holder
Contact for interaction
- Agent (might be owner, license holder, designated contact point, etc.): researcher, institution, organization, business, agency, etc.
- Contact information, e.g., email, phone, etc.
Expiration date
- Set legally by regulation
- Set by license holder (needs to be sooner than any legally set date)
Legacy (testament for license and license rights)
- Who will inherit the license and the rights arising from it?
- Information if this testament is legally binding (and why)
Acknowledgements/Attributions
- If the resource includes previous licenses (e.g., because it is a composite dataset of several to many data points): yes - no
- Links to those licenses
- (automatically compiled) list of links, resolved to license holders, who require attribution in human and machine-readable form

Legal and/or business statements (e.g., permits, contracts) concerning acquisition and/or further uses of the resource. These fields need to be repeatable, since several interactions might occur for one resource. Also, interactions might evolve over time, reflecting, e.g., a series of inquiries, outcomes, contact points, etc.

Legal and/or business agreements (e.g., permits, contracts)
- Legal and/or business agreements
  - Are required? Yes – no (e.g., associated with acquisition)
  - Exist? Yes - no
- Permit holder, signatories of agreement
- Link to legal agreement or business contract, including (scanned) document
- Contact points and information on both sides, e.g., collector and permit agency; resource provider and business partner, etc.
- Associated communication, e.g., inquiries and associated outcomes
- Terms of contract
- (Potential) Expiration date
Information about confidential legal/business agreements (e.g., Prior Informed Consent (PIC) and Mutually Agreed Terms (MAT) agreements associated with Access and Benefit Sharing might be confidential; see Nagoya Protocol - Wikipedia)
- Confidential information exists? Yes – no
- Contact for inquiries into confidential information
- Restricted access, accessible only for resource owner/provider (legal contact) and permit provider/business partner:
  - Module under 7.
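As a rough illustration of how these fields could be represented, here is a sketch in Python dataclasses under stated assumptions: every class and field name is hypothetical, and only three of the functions listed above are implemented, namely checking that a holder-set expiration date comes before any legally set one, requiring an inquiry contact for confidential agreements, and compiling the list of license holders who require attribution from nested components.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class License:
    license_type: str                  # e.g. "CC-BY"
    version: str                       # e.g. "4.0"
    url: str
    holder: str                        # license holder
    description: str = ""
    requires_attribution: bool = True  # False for public-domain dedications such as CC0


@dataclass
class Agreement:
    """One legal/business interaction (permit, contract); repeatable per resource."""
    kind: str                          # e.g. "collecting permit", "PIC", "MAT"
    signatories: list[str]
    document_url: str = ""
    confidential: bool = False         # terms visible only to the parties
    inquiry_contact: str = ""          # who to ask about confidential terms
    expires: Optional[date] = None


@dataclass
class LicenseRecord:
    owner: str
    license: License
    contact_agent: str                    # owner, license holder or designated contact point
    contact_info: str                     # e.g. an email address
    legal_expiry: Optional[date] = None   # set by regulation
    holder_expiry: Optional[date] = None  # set by the license holder
    heir: Optional[str] = None            # legacy: who inherits the rights
    heir_binding: Optional[bool] = None   # is that testament legally binding?
    components: list[LicenseRecord] = field(default_factory=list)
    agreements: list[Agreement] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return problems found in this record (empty list if none)."""
        problems = []
        if (self.legal_expiry is not None and self.holder_expiry is not None
                and self.holder_expiry >= self.legal_expiry):
            problems.append("holder-set expiry must be sooner than the legal expiry")
        for a in self.agreements:
            if a.confidential and not a.inquiry_contact:
                problems.append(f"{a.kind}: confidential agreement needs an inquiry contact")
        return problems

    def attributions(self) -> list[str]:
        """Holders of this record and all nested components that require
        attribution, in first-seen order and without duplicates."""
        result: list[str] = []

        def visit(rec: LicenseRecord) -> None:
            if rec.license.requires_attribution and rec.license.holder not in result:
                result.append(rec.license.holder)
            for comp in rec.components:
                visit(comp)

        visit(self)
        return result
```

Because a composite dataset simply lists its components, the attribution list can be compiled automatically in machine-readable form, however deeply the components are nested.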

@Andrawaag and everybody: I am curious what you think about these ideas, proposals. Do you have experience with these topics and matters?

JuttaBuschbom · June 21, 2021, 6:15am

This article from 2018 points to the importance of written agreements. At the same time, it questions if data (“facts”) can or should be copyrighted at all. Has our view changed since then?

Andrawaag · June 23, 2021, 7:56am

I would actually like to steer away from the narrative of “big business” getting wealthy. It is often the main argument in favour of a CC-BY-NC license, while the definition of what constitutes “commercial use” remains murky: Creative Commons licenses and the non-commercial condition: Implications for the re-use of biodiversity information (Q20895780). E.g., does publishing a peer-reviewed paper entail commercial use, since a publisher builds revenue on the process? TBH I am sceptical whether there are indeed canonical examples of big companies becoming rich on open data and, for those examples that do exist, whether the benefit of widespread reuse would not counteract this enrichment.

This is not to say that I am against more restrictive licenses; on the contrary. There can be compelling reasons not to share as openly as possible. Different use cases argue for restrictions (e.g., location information on threatened species). But the argument of “big companies” becoming rich is just not convincing enough, and, as said, I would like to steer away from this narrative towards a more detailed discussion about when to restrict and when not to.

Andrawaag · June 23, 2021, 8:01am

In the EU, copyright is also conflated with database rights. So while data are indeed facts and as such not copyrightable, through this link with database protection legislation (the sui generis database right of the EU Database Directive) copyright-like protection can apply to data.

JuttaBuschbom · June 24, 2021, 6:00am

@Andrawaag Thanks for highlighting the limitations of the CC-BY-NC license from an additional perspective, one that does not require legally questionable actions.

JuttaBuschbom · June 24, 2021, 6:39am

Canonical examples of businesses successfully building on open data abound in human health and medicine. The medical/pharma industry wouldn’t be what it is today without the open data provided by the Human Genome Project 20 years ago, continued by the openly shared, continually extended reference datasets of global human genetic diversity (e.g., gnomAD). One of the latest examples is the early sharing of the SARS-CoV-2 genome by Chinese researchers.

At the same time, the widespread reuse of these data doesn’t seem to have prevented some companies, business managers and investors from becoming disproportionately rich. In addition, in the case of COVID-19, open sharing of data doesn’t automatically result in vaccines being distributed and available fairly and equitably worldwide.

Nevertheless, a consensus currently seems to be forming that (open) access should be disentangled from the subsequent step of benefit sharing. Experience accumulated over the past decades has shown that restricting access to basic research data doesn’t seem to correlate well with improved benefit sharing and/or with one’s own improved gain of benefits. That is, a researcher restricting access to their biodiversity data by selecting a CC-BY-NC license won’t automatically have a positive effect towards a fairer, more just distribution of benefits.

My current view of the essence of restrictive licenses (and permits) is that they require/demand communication with the licensor. This type of communication doesn’t scale well. It thus produces obstacles for analyses/research into questions that require extensive datasets, and these obstacles can pile up into blockages once the required communication becomes unfeasible. The situation thus becomes similar to existing experiences with multilevel (international, national, subnational, local) bilateral permit systems, e.g., in phylogeographic research, which is considered basic research and capacity building. Furthermore, there seems to be general agreement that such basic biodiversity and taxonomic research is actually doing quite well regarding non-monetary benefit sharing.

Andrawaag · June 24, 2021, 4:50pm

Although I would love for this to be true, I don’t think it is accurate. Open Data/Access became, at best, mainstream in the last decade. Successful companies predate that change in data availability. Moreover, much research data remains available only on request (see tweet below), i.e., a lot of research data still remains in data silos. IMHO the opposite is true: companies thrive on data being obscure and inaccessible, or on putting substantial effort into improving the quality of the available open data. The economic advantage lies in having access to hidden data OR quality-improved open data, but not per se in the open data on its own merit.

https://twitter.com/egonwillighagen/status/1405781457604431873

Breda_Zimkus · June 25, 2021, 2:25pm

Feel free to contact the moderators of this topic (@JuttaBuschbom, @hardistyar, and myself @Breda_Zimkus) directly if you would like to participate but do not feel comfortable posting on the platform. We would be happy to share your thoughts.

JuttaBuschbom · June 29, 2021, 7:14am

@Andrawaag Thanks a lot for your comment, Andra. I don’t necessarily agree with several of your points, and so I have spent the past few days sorting out for myself why I still feel that you have highlighted a key point regarding businesses and a future DES’ interactions with them.

My short answer is data (infrastructure, service) quality. I agree with your argument that quality (across the range of criteria) is at the core of businesses’ capital, capacities and experience. Businesses can add a level of quality to data, infrastructures and services that research a) might not be interested in (since it’s not needed for the progress of knowledge and insight), and b) simply can’t provide due to contextual constraints (short-term project-based work, insufficient funding and resources, different institutional focuses and priorities, …).

Verified and high-quality data in efficient infrastructures are, therefore, something that companies specifically can contribute to capacity building (fact-based knowledge, research and expertise), to knowledge and technology transfer, and thus to tangible benefit sharing for the protection of biodiversity and the achievement of the SDGs.

The question thus becomes: Which characteristics does a DES infrastructure need to have to be acceptable and even desirable for companies? That is, what are the key traits required by companies to consider and then implement links between their own and the DES infrastructure, and to share certain information (e.g., metadata) or (sub)sets of their data with the DES network?

In the context of biodiversity conservation and ABS, for me the second phase of benefit sharing has so far been characterized by a) non-monetary benefit sharing being associated with (basic) biodiversity research, while b) businesses are predominantly associated with monetary benefit sharing. However, what if the DES provides the context and functionality that encourages businesses to share their seriously high-quality datasets with the world?

High-quality data, and especially high-quality datasets designed for purpose, are dearly and urgently needed across the range of conservation applications (I have explored and provided statistical arguments for conservation genomics, e.g., here). In addition, there is an evolving landscape of business models for free and open-source software (FOSS, including data), which provide the foundations for some very successful small to large companies.

What would be needed for this to work?

hardistyar · June 30, 2021, 8:14am

My personal perception (might be wrong) is that the choice to use CC-BY is often made because a) organisations/individuals feel uncomfortable entirely waiving rights over their data; and/or b) it serves to remind/encourage users to acknowledge their use of someone else’s data.

It is not only license stacking in the traditional sense that is an issue, but also carrying licensing conditions forward through workflows from beginning to end. This is the same problem with added complications, especially when such workflows are automated with common workflow tools such as Taverna, Kepler, Galaxy, Jupyter Notebooks, etc. Workflows often separate out specific data records, discard useless records, retain useful ones and (re-)combine records (often from several sources) into new datasets that form the basis of an analysis. No workflow I know of presently conveys the relevant licensing details for each and every record through to the end of the workflow. However, when such workflows are based on manipulating multiple Digital extended Specimen (DS) objects, this problem disappears, because every object contains a pointer to its relevant license.
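To make the last point concrete, here is a minimal Python sketch, assuming each DS object carries a pointer to its license. The `DSRecord` class, its fields and the PIDs are hypothetical illustrations, not the openDS data model. Because the license travels with every record, filtering and recombining records never loses it, and the licenses governing the final dataset can simply be read off at the end.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a Digital extended Specimen (DS) object; the field
# names are illustrative and not taken from the openDS data model.
@dataclass(frozen=True)
class DSRecord:
    pid: str          # persistent identifier of the DS object
    taxon: str
    license_url: str  # pointer to the license that governs this record


def combine(*sources):
    """(Re-)combine records from several sources, de-duplicating on PID."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record.pid, record)  # keep first occurrence
    return list(merged.values())


def select(records, keep):
    """Discard unwanted records; survivors keep their own license pointer."""
    return [r for r in records if keep(r)]


def licenses_in(dataset):
    """Distinct licenses that apply to the records of the final dataset."""
    return sorted({r.license_url for r in dataset})


CC0 = "https://creativecommons.org/publicdomain/zero/1.0/"
CC_BY = "https://creativecommons.org/licenses/by/4.0/"

source_a = [DSRecord("pid:a1", "Apis mellifera", CC0),
            DSRecord("pid:a2", "Bombus terrestris", CC0)]
source_b = [DSRecord("pid:b1", "Apis mellifera", CC_BY),
            DSRecord("pid:b2", "Vespa crabro", CC_BY)]

honeybees = select(combine(source_a, source_b),
                   lambda r: r.taxon == "Apis mellifera")
print(licenses_in(honeybees))
```

Here the final dataset mixes one CC0 and one CC BY record, so both licenses are reported; that per-record information is exactly what conventional workflows tend to drop along the way.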

For non-publicly funded data, restrictions, licensing and limitations on use are purely contractual matters between the funding organisation and the organisation producing the data.

For publicly funded data DiSSCo proposes to follow a principle of unrestrictive licensing, as open as possible (similar to what @JuttaBuschbom explained above).

DiSSCo’s open access guidelines (section 4.4 in Conceptual design blueprint for the DiSSCo digitization infrastructure - DELIVERABLE D8.1) propose a policy of “as open as possible, as closed as (legally) necessary”. DiSSCo will state that exceptions to the ‘as open as possible’ policy must be stated clearly and must be justified strictly according to objective criteria defined by national security, legislation or other regulatory compliance, sensitivity of collection information, and third-party rights (such as personal privacy). Restrictions that do not have a justification based on objective criteria in legislation are legally invalid and are not permitted.
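As an illustration of how such a rule could be checked mechanically, the following minimal Python sketch tests whether a declared exception is stated, names one of the objective criteria listed above and carries a justification. The enum, function and dictionary-key names are hypothetical; they are not part of the DiSSCo guidelines or of any DiSSCo software.

```python
from enum import Enum


# The four categories of objective criteria listed in the guidelines text;
# the enum name and member names are hypothetical.
class Criterion(Enum):
    NATIONAL_SECURITY = "national security"
    LEGISLATION = "legislation or other regulatory compliance"
    SENSITIVITY = "sensitivity of collection information"
    THIRD_PARTY_RIGHTS = "third-party rights"


def check_exception(exception):
    """Return the problems found; an empty list means the exception is acceptable."""
    problems = []
    if not exception.get("statement", "").strip():
        problems.append("exception is not stated clearly")
    if not isinstance(exception.get("criterion"), Criterion):
        problems.append("no objective criterion given")
    if not exception.get("justification", "").strip():
        problems.append("no justification given")
    return problems


withheld_coordinates = {
    "statement": "Precise coordinates are withheld",
    "criterion": Criterion.SENSITIVITY,
    "justification": "Locality of a species at risk from collectors",
}
no_commercial_use = {
    "statement": "No commercial use",
    "criterion": "institutional preference",  # not an objective criterion
    "justification": "",
}
print(check_exception(withheld_coordinates))
print(check_exception(no_commercial_use))
```

The first exception passes; the second is rejected on two counts, mirroring the rule that restrictions without an objective, justified basis are not permitted.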

hardistyar · June 30, 2021, 8:39am

A few years ago it was explained to me by a Swiss lawyer that “you cannot own data, just as you cannot own [your] children”. You can only be a guardian, caring for it/them. Below is the longer explanation of the legal basis that was given to me. I think it is useful to replay it here, and I believe the lawyer concerned would be happy that I share their explanation.

“Ownership does refer to tangible goods. Data are intangible (or immaterial) goods. Property in intangible goods is possible, if the object is a copyrightable work, a patented invention etc., within the EU (but not in the rest of the world!) also if the object is a “database” (which refers to a “database, which shows that there has been qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents” [art. 6 of directive 96/9]). A collection of data can become a copyrightable work, if it is original by the way of selection of items or the presentation. If these conditions are met, the data collection is an IPR-protected object, if not, it is not an object of legal regulation. Even in the first case, the collection is still not a material good, but an immaterial one, and it follows the property rules of copyright, database protection etc., which are very different from property rules for material goods. In these cases, you can speak of “copyright ownership”, “patent ownership”, “database ownership”, but never of “data ownership”, because the protection of copyright refers to the work, but not to the content, the protection of patents refer to the commercial re-use but not to the procedures, the EU-database protection refers to the database as a whole, but not to the single data. I’m aware of the fact, that this may sound somehow far from practice, but these are basic concepts of intellectual property rights and they have a huge impact on the operationality of IPRs. Life would be extremely complicated, when IPR would be the same as ownership in material goods.”

hardistyar · June 30, 2021, 8:47am

My Swiss lawyer colleague also explained the following, which I also think is helpful for everyone. The first explanation I posted is effectively an expansion of the second paragraph below.

"Licenses allow the use of absolute rights, which means rights that are enforceable against everybody. Such absolute rights are, for example, ownership (with respect to material goods), patents (with respect to inventions), copyright (with respect to works of art and literature [in a very broad sense]). Where there is no absolute right, there is no room for a license.

Data is an immaterial good. Ownership never applies to immaterial goods. A license would be possible, if the data is patentable or copyrightable. This is possible in some cases, but not regularly. Where data is just data there is no license.

It is possible that you keep your data secret and that nobody has access to them. Then you can allow access individually. A lawyer would call such an allowance a “use agreement”, but not a license. It is only enforceable against the contract partner, but not against anybody else. If another person finds access to these data, you cannot impede that person from using them.

Personal data about yourself is still a completely different case. There is also an absolute right, which is called “privacy” in some legislations, or “personal right” in others. It does not give you ownership over these data, but the right to prevent others from certain uses of these data. At the same time, others (for example the police) are allowed to make use of the very same data. That’s why the term of “ownership” is inappropriate for the relation between you and these data."

hardistyar · June 30, 2021, 11:04am

At first glance, all the suggested fields seem reasonable but will need further examination/definition during a design phase. In the openDS data model we can easily make a provision for any kinds of ‘license objects’, ‘legal objects’, etc. as needed. The conditions expressed via such objects can be applied to institutions as a whole, to entire collections within an institution, to specific Digital extended Specimens (DS) corresponding to specific physical specimens and/or to collated datasets of multiple DS (another kind of collection!). Objects can be repeated/multiplied as many times as is necessary e.g., to record all the different business agreements associated with a specific specimen and its data. However, see the paragraph below on scope.
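A minimal Python sketch of the idea, assuming legal objects are simply attached to nested scopes (institution, collection, DS) and inherited downwards, and that a collated dataset carries its own objects plus those of its member DS. The inheritance rule is an assumption for illustration, and all class, field and identifier names are hypothetical, not part of openDS.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class LegalObject:
    kind: str       # e.g. "license", "ABS permit", "loan agreement"
    reference: str  # identifier or URL of the governing document


@dataclass
class Scope:
    """A level to which conditions can be attached (hypothetical names)."""
    name: str
    parent: Optional["Scope"] = None
    legal_objects: List[LegalObject] = field(default_factory=list)

    def attach(self, obj: LegalObject) -> "Scope":
        self.legal_objects.append(obj)  # repeatable: any number per scope
        return self

    def applicable(self) -> List[LegalObject]:
        """Objects in force here, including those inherited from outer scopes."""
        inherited = self.parent.applicable() if self.parent else []
        return inherited + self.legal_objects


def dataset_conditions(dataset: Scope, members: List[Scope]) -> List[LegalObject]:
    """A collated dataset carries its own objects plus those of every member DS."""
    result: List[LegalObject] = []
    for obj in dataset.applicable() + [o for m in members for o in m.applicable()]:
        if obj not in result:  # de-duplicate while keeping order
            result.append(obj)
    return result


museum = Scope("institution").attach(LegalObject("license", "CC0-1.0"))
herbarium = Scope("collection", parent=museum)
ds1 = Scope("DS 1", parent=herbarium)
ds1.attach(LegalObject("ABS permit", "permit-001"))
ds1.attach(LegalObject("loan agreement", "loan-042"))
ds2 = Scope("DS 2", parent=herbarium)
study = Scope("collated dataset").attach(LegalObject("license", "CC-BY-4.0"))

print([o.kind for o in ds1.applicable()])
print([o.reference for o in dataset_conditions(study, [ds1, ds2])])
```

DS 1 ends up with the institutional license plus its own permit and loan agreement, while the collated dataset lists every distinct condition once. Whether conditions should really be inherited this way is one of the questions for the design phase.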

Robust technical infrastructure mechanisms are then used to enforce the “as open as possible, as closed as necessary” policy. DS objects must be secure by design. Such mechanisms must not place onerous and convoluted obligations on data publishers, nor must they decrease usability for users. The precise mechanism remains for further study but could, for example, be based on ciphertext-policy attribute-based encryption (CP-ABE) [Bethencourt et al. 2007], whereby suitably authorised users have a digital key that unlocks sensitive data encrypted by the data guardian/publisher. A less robust alternative is an approach based on stretched-perimeter access control, whereby ‘policy enforcement points’ are instantiated at the places where access policy is to be enforced [Burnap et al. 2012].

Neither approach, however, addresses the inherent trust issue, which is that once access has been given, there is little that can be done to prevent further (malicious) dissemination of the controlled sensitive data. At that point, disciplinary or legal recourse becomes the only viable response.
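To illustrate what an access policy looks like in the two approaches mentioned above, here is a minimal Python sketch. In CP-ABE the policy is a tree of threshold gates (AND, OR) over user attributes and is enforced cryptographically: data are encrypted under the policy, and only keys whose attributes satisfy it can decrypt. The sketch does no cryptography at all; it only evaluates such a tree in the clear and redacts sensitive fields, which is roughly what a policy enforcement point would do. The attribute names, the record and the function names are invented for illustration.

```python
from typing import Union

# A policy is either an attribute name or a tuple (k, child1, child2, ...)
# meaning "at least k children must be satisfied" (AND = n-of-n, OR = 1-of-n).
Policy = Union[str, tuple]


def all_of(*children: Policy) -> tuple:
    return (len(children), *children)


def any_of(*children: Policy) -> tuple:
    return (1, *children)


def satisfies(policy: Policy, attributes: set) -> bool:
    """Evaluate a threshold-gate policy tree against a user's attributes."""
    if isinstance(policy, str):
        return policy in attributes
    threshold, *children = policy
    return sum(satisfies(c, attributes) for c in children) >= threshold


def enforce(record: dict, user_attributes: set) -> dict:
    """Policy enforcement point: full record, or a view without sensitive fields."""
    if satisfies(record["policy"], user_attributes):
        return dict(record["data"])
    return {k: v for k, v in record["data"].items() if k not in record["sensitive"]}


# Invented example record with one sensitive field.
record = {
    "data": {"taxon": "Cypripedium calceolus", "country": "CH",
             "coordinates": "46.5, 8.0"},
    "sensitive": {"coordinates"},
    "policy": all_of("role:researcher",
                     any_of("permit:CH-access", "member:DiSSCo")),
}
print(enforce(record, {"role:researcher", "member:DiSSCo"}))
print(enforce(record, {"role:public"}))
```

An authorised researcher sees the coordinates, while a public user gets the record without them. As noted above, neither this check nor real encryption prevents onward dissemination once the full record has been released.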

JuttaBuschbom · July 2, 2021, 6:25am

@hardistyar to summarize what I understand:

1. inaccessible/secret/hidden entities (material or immaterial):
   → use agreements
2. public/(openly) accessible/visible entities:
   → granted absolute rights = licenses
   (granted by whom? the law? the fastest-talking person? …)
   - material → legal ownership (license)
     (can you own a tree or plot of land? → e.g. multispecies ethics)
   - immaterial → work (likely “action” as per Hannah Arendt)
     - patents: invention = intellectual work
     - copyright: creative work

Nagoya Protocol regulating access and benefit sharing:
I am wondering whether the PICs (Prior Informed Consent) and MATs (Mutually Agreed Terms) associated with ABS regulation are based on inaccessibility (cf. 1.; border controls, customs, enforcement) or on ownership (cf. 2.; publicly accessible but regulated by absolute legal rights)?

JuttaBuschbom · July 2, 2021, 7:53am

To transform all of the before into actionable conclusions:

With all of the work and effort that we are putting into exploring, developing and designing the DES concept, e.g. here in the forum via the consultation process, and already discussing the concrete implementation of the infrastructure, it seems to me that the DES concept and an implemented DES infrastructure will be copyrightable (-> license based on the creative work on an immaterial good/entity/object). It might even be possible to patent the DES concept or parts of it (I don’t think that is the intention; I mention it only theoretically, to get the definitions right).

Also, the development and design of the schema of a single FAIR digital object or link (FDO) has not been, and still is not, trivial. Thus, it should well be possible to copyright the schema of an FDO and place the use of the schema under a license, which defines the rights, duties and limits of others.

Finally, once the schema (there could actually be several (sub)types) is available and implemented, users have to invest work to fill the fields of the schema with data. For the sake of exploring whether the data stored in a single (DES-)FDO can be licensed (that is, not a database as a whole or certain subsets of it containing multiple records, but even a single data record, down to a single field entry), here is my argument that yes, FAIR data can be licensed:

Facts out there are reality and can’t be copyrighted. Facts stored as data in an organized, structured, accessible (= FAIR) way, however, are the creative work of a user.

First, the user is required to disentangle the data story in their mind (e.g., "oh, this rare blue-hued grasshopper here at the margin of the organic foods and H2-Harleys mall parking lot in Big City under trees on a hot, sunny day with the evening storms moving in at the side of Iliunnuut and their pet kangaroo, ..."), then split and extract the information according to the DES fields, and finally take care to enter the data in appropriate ways and formats into the correct, fitting fields, either manually (back at home) or semi-automatically (their collection app does most of it for them, though they need to push the right buttons for the correct input forms). If this were effortless and mindless, we would already be swamped with high-quality digital data. Thus, yes, data in FDOs, that is, DES data, should be copyrightable, I would say.

Breda_Zimkus · July 12, 2021, 4:23pm

Some publishers (e.g., Molecular Ecology) now require confirmation that research is compliant with the relevant national laws implementing the Nagoya Protocol. What can be done to make it easier for researchers to demonstrate this?
