8. Meeting legal/regulatory, ethical and sensitive data obligations

@Andrawaag Thanks a lot for your comment, Andra. I don’t necessarily agree with several of your points, and so over the past few days I had to sort out for myself why I still feel you have highlighted a key point regarding businesses and a future DES’s interactions with them.

My short answer is data (infrastructure, service) quality. I agree with your argument that quality (across the range of criteria) is at the core of businesses’ capital, capacities and experience. Businesses can add a level of quality to data, infrastructures and services that research a) might not be interested in (since it’s not needed for the progress of knowledge and insight), and b) simply can’t provide due to contextual constraints (short-term project-based work, insufficient funding and resources, different institutional focuses and priorities, …).

Verified, high-quality data in efficient infrastructures is therefore something that companies in particular can contribute to capacity building (fact-based knowledge, research and expertise), to knowledge and technology transfer, and thus to tangible benefit sharing for the protection of biodiversity and the achievement of the SDGs.

The question thus becomes: Which characteristics does a DES infrastructure need in order to be acceptable, or even desirable, for companies? That is, what are the key traits companies require in order to consider, and then implement, links between their own infrastructure and the DES infrastructure, and to share certain information (e.g. metadata) or (sub)sets of their data with the DES network?

In the context of biodiversity conservation and ABS, for me the second phase of benefit sharing has so far been characterized by a) non-monetary benefit sharing being associated with (basic) biodiversity research, while b) businesses are predominantly associated with monetary benefit sharing. However, what if the DES provides the context and functionality that encourage businesses to share their genuinely high-quality datasets with the world?

High-quality data, and especially high-quality designed-for-purpose datasets, are urgently needed across the range of conservation applications (I have explored and provided statistical arguments for conservation genomics, e.g. here). In addition, there is an evolving landscape of free and open-source software (FOSS; this includes data) business models, which provide the foundations for some very successful small to large companies.

What would be needed for this to work?

My personal perception (I might be wrong) is that CC-BY is often chosen because a) organisations/individuals feel uncomfortable entirely waiving rights over their data; and/or b) it serves as a reminder/encouragement to users to acknowledge use of someone else’s data.

It is not only license stacking in the traditional sense that is an issue, but also the problem of carrying licensing conditions forward through workflows from beginning to end. This is the same problem with added complications, especially when such workflows are automated with common workflow tools such as Taverna, Kepler, Galaxy, Jupyter Notebooks, etc. In workflows one is often separating out specific data records, discarding useless records, retaining useful ones and (re-)combining records (often from several sources) into new datasets to form the basis of an analysis. No workflow I know of presently conveys the relevant licensing details for each and every record through to the end of the workflow. However, when such workflows become based on the manipulation of multiple Digital extended Specimen (DS) objects, this problem disappears, because every object contains a pointer to its relevant license.
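To make the point concrete, here is a minimal Python sketch (record structure and field names are invented for illustration, not taken from any real workflow tool) of how a per-record license pointer survives the filtering and re-combining steps of a workflow, whereas a single dataset-level license statement would be lost:

```python
from dataclasses import dataclass

@dataclass
class Record:
    """One data record that carries a pointer to its own license,
    the way a DS object would."""
    occurrence_id: str
    payload: dict
    license_url: str  # pointer to the license governing this record

def merge_datasets(*sources):
    """(Re-)combine records from several sources into a new dataset;
    each record keeps its own license pointer through the merge."""
    return [rec for source in sources for rec in source]

def licenses_used(records):
    """Report every distinct license the combined dataset depends on,
    which is exactly what a dataset-level statement cannot tell you."""
    return sorted({rec.license_url for rec in records})

# Two sources under different licenses (URLs are the real CC deeds)
source_a = [Record("occ-1", {"taxon": "Apis mellifera"},
                   "https://creativecommons.org/publicdomain/zero/1.0/")]
source_b = [Record("occ-2", {"taxon": "Bombus terrestris"},
                   "https://creativecommons.org/licenses/by/4.0/")]

combined = merge_datasets(source_a, source_b)
print(licenses_used(combined))  # both licenses are still attributable per record
```

The design choice here is simply that the license travels with the record, not with the container, so any downstream subset or recombination remains correctly licensed by construction.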

For non-publicly funded data, restrictions, licensing and limitations on use are purely contractual matters between the funding organisation and the organisation producing the data.

For publicly funded data DiSSCo proposes to follow a principle of unrestrictive licensing, as open as possible (similar to what @JuttaBuschbom explained above).

DiSSCo’s open access guidelines (section 4.4 in Conceptual design blueprint for the DiSSCo digitization infrastructure - DELIVERABLE D8.1) propose a policy of “as open as possible, as closed as (legally) necessary”. DiSSCo states that exceptions to the open-as-possible policy must be stated clearly and must be justified strictly according to objective criteria defined by national security, legislation or other regulatory compliance, sensitivity of collection information, and third-party rights (such as personal privacy). Restrictions that do not have a justification based on objective criteria in legislation are legally invalid and are not permitted.

A few years ago it was explained to me by a Swiss lawyer that “you cannot own data, just as you cannot own [your] children”. You can only be a guardian, caring for it/them. Below is the longer explanation of the legal basis that was given to me; I think it is useful to replay it here, and I believe the lawyer concerned would be happy for me to share their explanation.

“Ownership does refer to tangible goods. Data are intangible (or immaterial) goods. Property in intangible goods is possible, if the object is a copyrightable work, a patented invention etc., within the EU (but not in the rest of the world!) also if the object is a “database” (which refers to a “database, which shows that there has been qualitatively and/or quantitatively a substantial investment in either the obtaining, verification or presentation of the contents” [art. 6 of directive 96/9]). A collection of data can become a copyrightable work, if it is original by the way of selection of items or the presentation. If these conditions are met, the data collection is an IPR-protected object, if not, it is not an object of legal regulation. Even in the first case, the collection is still not a material good, but an immaterial one, and it follows the property rules of copyright, database protection etc., which are very different from property rules for material goods. In these cases, you can speak of “copyright ownership”, “patent ownership”, “database ownership”, but never of “data ownership”, because the protection of copyright refers to the work, but not to the content, the protection of patents refer to the commercial re-use but not to the procedures, the EU-database protection refers to the database as a whole, but not to the single data. I’m aware of the fact, that this may sound somehow far from practice, but these are basic concepts of intellectual property rights and they have a huge impact on the operationality of IPRs. Life would be extremely complicated, when IPR would be the same as ownership in material goods.”


My Swiss lawyer colleague also explained the following, which I also think is helpful for everyone. The first explanation I posted is effectively an expansion of the second paragraph below.

"Licenses allow the use of absolute rights, which means rights that are enforceable against everybody. Such absolute rights are, for example, ownership (with respect to material goods), patents (with respect to inventions), copyright (with respect to works of art and literature [in a very broad sense]). Where there is no absolute right, there is no room for a license.

Data is an immaterial good. Ownership never applies to immaterial goods. A license would be possible, if the data is patentable or copyrightable. This is possible in some cases, but not regularly. Where data is just data there is no license.

It is possible that you keep your data secret and that nobody has access to them. Then you can allow access individually. A lawyer would call such an allowance a “use agreement”, but not a license. It is only enforceable against the contract partner, but not against anybody else. If another person finds access to these data, you cannot impede that person from using them.

Personal data about yourself is still a completely different case. There is also an absolute right, which is called “privacy” in some legislations, or “personal right” in others. It does not give you ownership over these data, but the right to prevent others from certain uses of these data. At the same time, others (for example the police) are allowed to make use of the very same data. That’s why the term of “ownership” is inappropriate for the relation between you and these data."


At first glance, all the suggested fields seem reasonable but will need further examination/definition during a design phase. In the openDS data model we can easily make provision for any kind of ‘license objects’, ‘legal objects’, etc. as needed. The conditions expressed via such objects can be applied to institutions as a whole, to entire collections within an institution, to specific Digital extended Specimens (DS) corresponding to specific physical specimens, and/or to collated datasets of multiple DS (another kind of collection!). Objects can be repeated/multiplied as many times as necessary, e.g. to record all the different business agreements associated with a specific specimen and its data. However, see the paragraph below on scope.
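As a sketch of what such repeatable objects might look like, the following uses invented field names (openDS does not yet fix them; they stand in for whatever ‘license objects’ and ‘legal objects’ the design phase settles on), showing conditions attached at both institution and specimen scope:

```python
# Illustrative only: field names below are NOT normative openDS terms.
digital_specimen = {
    "id": "20.5000.1025/ABC-123",            # persistent identifier, e.g. a Handle
    "physical_specimen_id": "MUS:Herp:4567",
    "license_objects": [                      # repeatable, per scope
        {"scope": "institution",   "license": "CC0-1.0"},
        {"scope": "this-specimen", "license": "CC-BY-4.0",
         "justification": "third-party rights"},
    ],
    "legal_objects": [                        # repeatable, e.g. ABS conditions
        {"type": "ABS-permit", "permit_id": "NP-2021-042",
         "conditions": "non-commercial use only"},
    ],
}

def attached_conditions(ds):
    """Collect every license/legal object attached to a DS, so a user
    (or a machine) can see all applicable conditions in one place."""
    return ds["license_objects"] + ds["legal_objects"]

print(len(attached_conditions(digital_specimen)))  # 3 attached objects
```

Because the objects are lists, recording several business agreements for one specimen is just appending further entries, matching the “repeated/multiplied as many times as necessary” requirement above.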

Robust technical infrastructure mechanisms are then used to enforce the “as open as possible, as closed as necessary” policy. DS objects must be secure by design. Such mechanisms must not place onerous and convoluted obligations on data publishers, nor must they decrease usability for users. The precise mechanism to be used remains for further study but could, for example, be based on ciphertext-policy attribute-based encryption (CP-ABE) [Bethencourt et al. 2007], whereby suitably authorised users have a digital key that unlocks sensitive data encrypted by the data guardian/publisher. Alternatively, but less robustly, an approach based on stretched-perimeter access control could be used, whereby ‘policy enforcement points’ are instantiated at the places where access policy is to be enforced [Burnap et al. 2012].
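The access logic that CP-ABE enforces can be illustrated without the cryptography itself. In this deliberately simplified Python sketch (attribute names invented, and no actual encryption performed), a policy attached to a sensitive record is satisfied only by a user whose attributes cover it; in real CP-ABE the data would remain ciphertext unless the key’s attributes satisfy the policy:

```python
def policy_satisfied(policy, user_attributes):
    """Check a simple AND-of-attributes policy, of the kind CP-ABE ties
    to a ciphertext.  NOTE: real CP-ABE enforces this cryptographically
    (decryption fails otherwise); this function only models the logic."""
    return policy.issubset(user_attributes)

# Policy attached to a sensitive locality record (invented attributes)
policy = {"researcher", "abs-cleared"}

curator   = {"researcher", "abs-cleared", "staff"}   # authorised key
anonymous = {"registered-user"}                      # unauthorised key

print(policy_satisfied(policy, curator))    # True
print(policy_satisfied(policy, anonymous))  # False
```

The appeal of the CP-ABE approach is that the guardian encrypts once against a policy rather than per user, so authorisation decisions travel with the object instead of living in a central gatekeeper.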

Neither approach, however, addresses the inherent trust issue: once access has been given, little can be done to prevent further (malicious) dissemination of the controlled sensitive data. At that point, disciplinary or legal recourse becomes the only viable response.


@hardistyar to summarize what I understand:

  1. inaccessible/secret/hidden entities (material or immaterial):
    → use agreements

  2. public/(openly) accessible/visible entities
    → granted absolute rights = licenses
    (granted by whom? the law? the fastest-talking person? …)

  • material → legal ownership (license)
    (can you own a tree or plot of land? → eg. multispecies ethics)
  • immaterial → work (likely “action” as per Hannah Arendt)
    • patents: invention = intellectual work
    • copyright: creative work

Nagoya Protocol regulating access and benefit sharing:
I am wondering whether the PICs (Prior Informed Consent) and MATs (Mutually Agreed Terms) associated with ABS regulation are based on inaccessibility (cp. 1.; border controls, customs, enforcement) or on ownership (cp. 2.; publicly accessible though regulated by absolute legal rights)?

To transform all of the above into actionable conclusions:

With all of the work and effort that we are putting into exploring, developing and designing the DES concept, e.g. here in the forum via the consultation process, and already discussing the concrete implementation of the infrastructure, it seems to me that the DES concept and an implemented DES infrastructure will be copyrightable (→ a license based on the creative work on an immaterial good/entity/object). It might even be possible to patent the DES concept or parts of it (I don’t think that is the intention; just theoretically, to get the definitions right).

Also, the development and design of the schema of a single FAIR digital object or link (FDO) has not been, and still is not, trivial. Thus, it should well be possible to copyright the schema of an FDO and place its use under a license that defines the rights, duties and limits of others.

Finally, once the schema (there could actually be several (sub)types) is available and implemented, users have to invest work to fill its fields with data. For the sake of exploring whether the data stored in a single (DES-)FDO can be licensed (that is, not a database as a whole or certain subsets of it containing multiple records, but even a single data record, down to a single field entry), this is my argument for yes, FAIR data can:

Facts out there are reality and can’t be copyrighted. Facts stored in an organized, structured, accessible (= FAIR) way, however, are the creative work of a user.

First, the user has to disentangle the data story in their mind
("oh, this rare blue-hued grasshopper here at the margin of the organic foods and H2-Harleys mall parking lot in Big City under trees on a hot, sunny day with the evening storms moving in, at the side of Iliunnuut and their pet kangaroo, ..."),
then split and extract the information according to the DES fields, and finally take care to enter the data (manually, back at home, or semi-automatically, with their collection app doing most of it for them, though they still need to push the right buttons for the correct input forms) in appropriate ways and formats into the correct, fitting fields. If this were effortless and mindless, we would already be swamped with high-quality digital data. Thus, yes, I would say that data in FDOs, that is, DES data, should be copyrightable.
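As a toy illustration of that structuring work (the field names below are invented stand-ins, not the real DES schema), the creative effort lies in splitting the narrative into the right fields and checking the record for completeness:

```python
# Hypothetical DES-style fields extracted from the grasshopper story above.
observation = {
    "taxon":       "blue-hued grasshopper (undetermined)",
    "locality":    "mall parking lot margin, Big City",
    "habitat":     "under trees, paved margin",
    "weather":     "hot, sunny, evening storms approaching",
    "recorded_by": "Iliunnuut",
}

# Fields a collection app might insist on before accepting the record.
REQUIRED_FIELDS = {"taxon", "locality", "recorded_by"}

def missing_fields(record):
    """Return the required fields the recorder forgot to fill in."""
    return sorted(REQUIRED_FIELDS - set(record))

print(missing_fields(observation))           # [] -> record is complete
print(missing_fields({"taxon": "unknown"}))  # the app pushes back
```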

Some publishers (e.g., Molecular Ecology) now require confirmation that research is compliant with the relevant national laws implementing the Nagoya Protocol. What can be done to make it easier for researchers to demonstrate this?

I think the paradigm shift in the legal regime pertaining to indigenous communities’ sovereign rights over their natural resources and derivative data is creating opportunities for biodiversity institutions and researchers to rethink their existing “best” practices and professional standards surrounding stewardship of physical specimens as well as governance of data. One aspect of conventional transactions that merits further review is institutional collection acquisition and accession policy. Today, institutions may be required to stop and question the very assumption that field-collected biodiversity specimens and associated data come with no strings attached. Furthermore, from a risk-management perspective, to protect institutions from legal liability and ethical challenges, it may be necessary to take an additional step of clearance and due diligence to ensure incoming specimens and associated data meet their (updated) collection policy. This is especially true if the institution’s mission and collection are international in scope, including holdings that originated from “beyond jurisdiction.” Boilerplate language in a deed-of-transfer form, which intends to accomplish an outright and absolute transfer of title to specimens and data, may no longer make legal and ethical sense in situations implicating cross-jurisdictional regulatory complexities. For example: suppose, hypothetically, that your counterpart’s affiliated tribal community, holding sovereign rights over material, drafted an MOU with terms that the collector-user may not assign title (i.e., transfer ownership) to a third party under any conditions, but may arrange a bailment (i.e., a deposition) with a biorepository to facilitate future open access and noncommercial use. Is the repository institution willing to work with that counterpart and redraft the language of a deed into something more like a license, in order to accept the deposition and long-term holding of such specimens and data with strings attached, or even willing to revamp its whole collection acquisition and data policy? Pardon me if my view is off the main topics of this discussion thread.


@apodemus This topic is definitely part of this thread’s discussion, and thank you for your thoughts. I agree with you and with Colella et al. (2020), who suggested that a paradigm shift from specimen ownership to stewardship is one step needed to adapt to the changing legal landscape. A robust global cyberinfrastructure, such as that advocated by the Extended Specimen Network, will also promote transparency regarding the origins and uses of genetic resources. Ultimately, both institutional changes and global integration of data will strengthen international collaborations as providers and users work together to conserve biodiversity. (Colella, J.P., R.B. Stephens, M.L. Campbell, B.A. Kohli, D.J. Parsons, and B.S. McLean. 2020. The Open-Specimen Movement. BioScience biaa146: 1–10. doi:10.1093/biosci/biaa146)


At the initiative of Dirk Neumann, we discussed in a group the characteristics and functions that will allow the digital and extended specimen infrastructure to be applicable in the context of access and benefit sharing under the Convention on Biological Diversity (CBD) and its Nagoya Protocol.

We are proposing the following eight guidelines and requirements:
Andrew Bentley, Jutta Buschbom, Libby Ellwood, Alex Hardisty, Chris Lyal, Dirk Neumann, Breda Zimkus

  1. Take care to use language that is CBD conformant.
  2. Show the importance of the DES for the continuing design and implementation of the post-2020 Global Biodiversity Framework, as well as the mobilisation and aggregation of data for its monitoring elements and indicators.
  3. As a general rule, we strive to openly publish as much as possible (all data and metadata) online.
  4. Have in place a powerful, strong and well-thought-out layer of user and data access management and security for ‘sensitive data’.
  5. Encrypt all data and most metadata at the level of an individual specimen or digital object. Provide access via (personal) digital keys.
  6. Link obligations and restrictions regarding use to the digital key.
  7. Implement a transactional system that records every transaction.
  8. Workforce capacity building is very much needed across the whole range of the digital realm, its work areas and workflows.

(for an extended version please follow this link)

We are very interested in your thoughts, experiences, comments and in additional considerations regarding these guidelines and requirements.
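Guideline 7 (a transactional system that records every transaction) could, for example, be backed by an append-only, hash-chained log, so that recorded transactions cannot later be altered unnoticed. A minimal Python sketch, with invented entry fields, assuming nothing about the eventual DES implementation:

```python
import hashlib
import json

def append_transaction(log, entry):
    """Append an entry whose hash also covers the previous entry's hash,
    chaining the log so no past transaction can be silently rewritten."""
    prev_hash = log[-1]["hash"] if log else "0" * 64  # genesis marker
    body = {"entry": entry, "prev_hash": prev_hash}
    body["hash"] = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append(body)
    return log

def verify(log):
    """Recompute every hash from scratch; any tampering breaks the chain."""
    prev = "0" * 64
    for rec in log:
        expected = hashlib.sha256(json.dumps(
            {"entry": rec["entry"], "prev_hash": rec["prev_hash"]},
            sort_keys=True).encode()).hexdigest()
        if rec["prev_hash"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

log = []
append_transaction(log, {"user": "alice", "action": "access",   "object": "DS-123"})
append_transaction(log, {"user": "bob",   "action": "download", "object": "DS-123"})
print(verify(log))  # True: the chain is intact
```

Such a log does not prevent misuse, but it provides the accountable record of transactions that legal/regulatory and ABS obligations call for.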

@apodemus I wanted to return to your comment and ask you and others how the paradigm shift from ownership to stewardship could be achieved? How can opening up data in the way suggested by digital extended specimens (i.e., placing in the public domain and supporting curation by the community) assist this? How can it be stimulated? When the extended specimen network is decentralized and not under the control of a single organization, what benefits or issues does that create?

Are there stakeholders that we don’t even think about because we aren’t aware that they exist and that the DES is of interest to them? What do you as a stakeholder concretely need from a DES infrastructure for it to support you in achieving your objectives?

Several of the questions asked last week (e.g. here and in Topic 11) fit well into a concept and methodology called “System redesign toward creating shared value” (SYRCS), recently described in a post on the i2insight blog.

Systems redesign in our context is a transition from the more local and disparate publishing of data to an integrated, harmonized global infrastructure focused on FAIR functionality and quality, specifically geared for data re-use and their application to efforts of solving real-world problems.

Following the concept of system redesign toward creating shared value, change happens in, and/or requires, four stages (text adapted from the post by Moein Khazaei, Mohammad Ramezani, Amin Padash and Dorien DeTombe):

  1. emancipation and critical thinking
  2. problem structuring
  3. multi-criteria and quantitative decision-making
  4. creating shared value.

Our current questions seem to fit into stage 1, in which we focus on you as stakeholders and query your motivation (sense of purpose and value), your experiences with and thoughts on power (who is in control and who is needed for success), knowledge (experience and expertise), and legitimacy (ensuring that all those affected are involved).

The approach proposed in the blog post is to use ‘is’ and ‘ought to be’ forms of questions, such as “Who is (ought to be) the beneficiary? That is, whose interests are (should be) served?”.

My personal point of view is that the community-driven development of the digital and extended specimen infrastructure and network is an incredible chance. It also shows a strength of the collections and biodiversity communities: working together, and thus having the experience needed to design, find solutions for and implement complex, shared infrastructures. We are happy that we have already reached a wide range of stakeholders, from the humanities and law to conservation and businesses. Please jump in, join the development process and add your experiences and thoughts.


From my personal point of view, a consensus model for biodiversity collection stewardship should take flexible approaches instead of trying to achieve the broadest acceptance and application across the globe under one single principle of accountability and sustainability: approaches flexible enough to allow the coexistence and balancing of competing views, paying due respect to different value systems and national policies concerning biological resources and their derived intellectual property.


These are valuable suggestions. Within GGBN (http://www.ggbn.org) we proposed a permit (and loan) vocabulary as part of the GGBN Data Standard (https://terms.tdwg.org/wiki/GGBN_Data_Standard) a few years ago to enable sharing this information. Several members have already implemented it. Since GGBN is dedicated to molecular samples, the need to come up with solutions for implementing the Nagoya Protocol is very high. The goal is to make this vocabulary mandatory for all data-providing GGBN partners, and also to add data quality checks in the harvester for these particular terms. The terms include: a controlled vocabulary for permit/document types, the status of the permit and a qualifier, a URL to the document, loan restrictions, loan dates, disposition, etc. As part of the SYNTHESYS+ project we are currently reviewing this vocabulary (focussing on preserved collections, biobanks and living collections). We are also planning to propose a TDWG Task Group on Permits and Loans to broaden the scope and include more people. Our goal is that such a data standard can also be implemented by other platforms such as GBIF, DiSSCo or INSDC. So I’d say the timing fits perfectly to share the efforts.
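To make the idea of harvester-side quality checks concrete, here is a small Python sketch. The field names and vocabulary values below are illustrative stand-ins, not the normative GGBN Data Standard terms; consult the standard itself for the actual vocabulary:

```python
# Illustrative controlled vocabularies (NOT the normative GGBN terms).
PERMIT_TYPES  = {"collecting permit", "export permit", "import permit", "other"}
PERMIT_STATUS = {"available", "not available", "not applicable", "unknown"}

permit_record = {
    "permit_type":       "collecting permit",
    "permit_status":     "available",
    "permit_url":        "https://example.org/permits/NP-2021-042.pdf",
    "loan_restrictions": "no third-party transfer",
}

def quality_check(record):
    """The kind of check a harvester could run on mandatory permit terms:
    reject records whose values are missing or outside the vocabulary."""
    errors = []
    if record.get("permit_type") not in PERMIT_TYPES:
        errors.append("permit_type not in controlled vocabulary")
    if record.get("permit_status") not in PERMIT_STATUS:
        errors.append("permit_status not in controlled vocabulary")
    return errors

print(quality_check(permit_record))  # [] -> record passes
```

Controlled vocabularies plus automated checks are what turn a recommended field list into data that downstream platforms (GBIF, DiSSCo, INSDC) can rely on.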


With regard to a future loans and permits standard, Gabi and I will organise a workshop within the COST Action MOBILISE (https://www.mobilise-action.eu/). The aim is to seek input from different collections communities (e.g. paleontology, observations, geosciences, biology, anthropology) about the legal requirements they are faced with and that should be taken into consideration in the standard development. The workshop will also deal with the implementation of the standard in digital infrastructures, and it would be perfect if we could discuss possible solutions for the digital/extended specimen concept. The workshop is planned for September 29th and 30th, 3-6 pm UTC on both days. I will send the information via different mailing lists soon, but you are also welcome to contact me directly (e.haeffner{at}bgbm{dot}org).


@JuttaBuschbom pointed out to me in an email today that decisions relating to access must be set up to accommodate, even be dominated by many fine-scale users - by which I think we are talking about fine-scale decisions on access to information.

We can either work in a context where we inherently assume everyone/everything is trusted to behave (which can be a mistake), or work as if no-one/nothing can be trusted to behave as we would wish. In the latter case, each time access is needed a user must repeat the steps to establish trust. Halfway scenarios, where some users are trusted and some are not, are increasingly problematic because of the complex interplay between the objectives/policies of multiple data producers and those of data consumers. It becomes easier to establish trust anew each time we need to do so.

Consider the analogy of flying. You need a passport/id and a boarding pass to get through the airport and board a specific flight. Just because you did it once, doesn’t make you trusted the next time. It also doesn’t let you access all areas of the airport nor embark on any flight you like. (I wish!).

This ‘zero-trust’ model is becoming the norm in information systems because it is easier to manage, especially in distributed/decentralized systems where multiple control points and control needs exist (again, think of airports and aeroplanes). With a common, general-purpose mechanism (passport/id and boarding pass), multiple variations of rules/policies can be enforced in different places and according to different needs. A model based on inherent trust only works for a few privileged users and cannot scale well (think VIP/private jets).
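As one possible sketch of the ‘boarding pass’ idea (names, token format and claims are invented for illustration), the issuing service hands out a short-lived token scoped to one user and one resource, and every ‘gate’ re-verifies it from scratch on each access, i.e. zero trust:

```python
import hashlib
import hmac
import json
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)  # held only by the issuing service

def issue_pass(user, resource, valid_seconds=300):
    """Issue a short-lived 'boarding pass' scoped to one user and one
    resource; like a real boarding pass, it expires and names the flight."""
    claims = {"user": user, "resource": resource,
              "expires": time.time() + valid_seconds}
    payload = json.dumps(claims, sort_keys=True)
    sig = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return payload, sig

def check_pass(payload, sig, resource):
    """Each gate re-verifies signature, scope and expiry from scratch;
    having passed one gate earlier confers no trust at the next."""
    expected = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # forged or altered pass
    claims = json.loads(payload)
    return claims["resource"] == resource and claims["expires"] > time.time()

payload, sig = issue_pass("alice", "DS-123")
print(check_pass(payload, sig, "DS-123"))  # True: right gate, valid pass
print(check_pass(payload, sig, "DS-999"))  # False: pass scoped to another resource
```

The point of the sketch is the shape of the mechanism: one general-purpose token format, with the actual rules (scope, lifetime, attributes) varied per enforcement point.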

A zero-trust model also sits well with the requirements raised in the remarks of @apodemus above.

My question is: What would our boarding pass look like? How can we be inspired by passenger name, destination, flight number, gate, and seat number?

Thinking of the airport analogy further, we can in fact see a clear distinction between passengers (users of the airport) with their boarding passes, and airport staff with their airport id badges and a different set of access rules. Do we also need two kinds of control?

This does not mean all data/information becomes access controlled. Again, an airport allows staff, passengers, and family members onto the concourses and into the departures and arrivals halls with no or only limited control. We more or less have that today with institutional, GBIF and other data portals. Users provide some limited indication of who they are and that they’ll abide by terms and conditions, but then they can access/obtain useful data (though generally only data deemed to be of low sensitivity). However, just ticking a box to agree with terms and conditions is no longer sufficient. A proper, accountable record of subsequent transactions must be kept in cases where legal/regulatory, ethical and/or sensitive data obligations apply.

I suspect that it may be similar to the information we as collection managers and curators request when processing a specimen loan - only for data. Who are you? What institution are you affiliated with? What do you need to use the data for? What products are going to be created? We could provide information similar to a specimen loan agreement that would outline how to cite, how to link, etc. A digital key could then be provided to any data that is sensitive and has been obscured. The biggest stumbling block I see is that most of the data will be freely available, so how do we enforce such a system?

The hope, of course, is that by making the system linkable and transparent, all such transactions would be visible and traceable by all, which would encourage compliance.