Preferred identifiers for GRSciColl entries - Should we mint DOIs for collections?

This is a thread to follow up on a topic we started discussing during the April GRSciColl community call (you can check the recording here: Global Registry of Scientific Collections (GRSciColl) community call - April 2024 on Vimeo).

A few months ago (November 2023), we ran a survey to collect input on the GRSciColl data schema in order to update it (this is in the context of our road map work).
At the time, eight respondents gave us feedback on GRSciColl identifiers.

  • Most people agree that we need to add context for the identifiers (and identifier types) available on GRSciColl. Unless you are familiar with the world of identifiers, what you see on GRSciColl might be difficult to navigate.
  • Most people also agreed that it would help to have a “how to cite” section on GRSciColl institution and collection pages so people know exactly which identifier to cite.
  • At the time, most respondents also said that GRSciColl wouldn’t need to mint DOIs for GRSciColl entries.

During our community call on Wednesday this week, the question of preferred identifiers and DOIs came up again. I would like to have a bit more input on the topic.

What do you think should be the preferred identifiers to reference institutions and collections on GRSciColl?

Here are a few ideas discussed during the call, followed by a poll:

  • Institutions should be able to choose which identifier people should use to reference their entries.
  • The preferred identifiers should be (at least by default) GRSciColl URLs. The advantage is that they are created and maintained by GBIF and don’t rely on external sources.
  • The preferred identifiers should be RORs for institutions. This is a position that has been voiced multiple times. However, it relies on two things:
    • Institutions (or someone else) making sure that the correct ROR identifiers are in GRSciColl for the correct entries (right now about 6% of the institutions have a ROR id in GRSciColl).
    • ROR maintaining those identifiers.
  • The preferred identifiers should be DOIs for collections. With the caveat that minting DOIs has a cost, perhaps this could be only for institutions that request them (maybe with a button in the GRSciColl interface?)
  • The preferred identifiers should be something else? ARK identifiers were mentioned during the call:
    • ARKs are generic and persistent. The good thing is that they are completely free. They are most often used in the heritage field but should be adaptable for natural history collections. See https://arks.org/
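
To make the first two options concrete, here is a minimal Python sketch of how a preferred identifier could be resolved: use the identifier type the institution chose if one is recorded, and otherwise fall back to a GRSciColl URL. The `Entry` class, its field names and the `BASE_URL` value are all hypothetical and only illustrate the idea; they are not the GRSciColl data model or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://example.org/grscicoll"


@dataclass
class Entry:
    """Hypothetical stand-in for a GRSciColl institution or collection."""
    key: str                      # the entry's UUID
    entity_type: str              # "institution" or "collection"
    identifiers: Dict[str, str] = field(default_factory=dict)  # type -> value
    preferred_type: Optional[str] = None  # chosen by the institution, if any


def preferred_identifier(entry: Entry) -> str:
    """Return the institution's chosen identifier, else the default URL."""
    chosen = entry.identifiers.get(entry.preferred_type or "")
    if chosen:
        return chosen
    # No choice recorded, or the chosen type is missing: use the default.
    return f"{BASE_URL}/{entry.entity_type}/{entry.key}"
```

With this kind of fallback, an entry that asks for a ROR or DOI but does not have one recorded yet would still get a citable default.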

What do you think? Feel free to vote here and/or comment. Many thanks!

  • Institutions should be able to choose for themselves
  • GRSciColl URLs/UUIDs should be the default
  • ROR for institutions
  • DOIs (minted by GRSciColl) for collections
  • Others (ARK?)

We have over 2,700,000 DOIs for downloads and 100,000 for datasets, so there would be no additional cost to make a few thousand DOIs for GRSciColl entities.


Is the concern INSTITUTIONAL identifiers or COLLECTION identifiers? I can see the need for an official institutional identifier opening up a paperwork nightmare for people in charge of small collections and institutions where there is little understanding of the needs associated with information sharing, which are the kind of people I work with. So I opt for DOIs for institutions, minted by GRSciColl (since it is the body wanting them), with the option of using an established, adequately unique, recognized identifier. I do not like using the collection code for an institution code.


I assume by DOI, you mean a DataCite DOI. Would the resourceTypeGeneral = “Collection” rather than “Dataset” as is the case for a GBIF Download? They define this as: “An aggregation of resources, which may encompass collections of one resourceType as well as those of mixed types. A collection is described as a group; its parts may also be separately described.”

In any case, you’ll need a URL and HTML landing page for a DOI, whether it’s a Dataset or not. Can you sufficiently populate (and maintain) the metadata for a DataCite DOI with resourceTypeGeneral = “Collection” if that entity is split across institutions, as is often the case? Could you also populate relationType to specify the relationship among and between parts of a collection? Does the DataCite schema support an array of “publisher” or whatever could be used to specify the potentially many host institutions for a single collection?
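
To make these questions concrete, here is a rough Python sketch of what the metadata for such a DOI might look like, using attribute names from the DataCite JSON format as I understand them. To my knowledge `publisher` takes a single value, so this sketch lists the host institutions as `contributors` with `contributorType` set to `HostingInstitution`, and links separately described parts through `relatedIdentifiers` with the `HasPart` relation type. The function name, its parameters and any values passed in are placeholders for illustration, and this is only one possible mapping, not something GBIF has decided on.

```python
from typing import Dict, List


def collection_doi_metadata(
    title: str,
    landing_page: str,
    publisher: str,
    host_institutions: List[Dict[str, str]],
    part_dois: List[str],
) -> dict:
    """Build a DataCite-style attribute dict for a collection (sketch only)."""
    return {
        "titles": [{"title": title}],
        "url": landing_page,  # the landing page the DOI would resolve to
        "types": {"resourceTypeGeneral": "Collection"},
        "publisher": publisher,  # a single value in this sketch
        # Several hosts are expressed as contributors, not as publishers.
        "contributors": [
            {
                "name": inst["name"],
                "nameType": "Organizational",
                "contributorType": "HostingInstitution",
                "nameIdentifiers": [
                    {"nameIdentifier": inst["ror"], "nameIdentifierScheme": "ROR"}
                ],
            }
            for inst in host_institutions
        ],
        # Parts of the collection that have their own DOI.
        "relatedIdentifiers": [
            {
                "relatedIdentifier": doi,
                "relatedIdentifierType": "DOI",
                "relationType": "HasPart",
            }
            for doi in part_dois
        ],
    }
```

Whether `HostingInstitution` and `HasPart` are expressive enough for collections that were split, merged or lost is exactly the open question.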


I think @dshorthouse raises some valid points worth considering. We should explore whether and how well these collections fit the DataCite metadata schema. @mgrosjean, let’s put our heads together next week? :)


Note that GRSciColl entries can refer to “inactive” collections which have since been lost (like this one) or were split and integrated into other collections.
It can be helpful to have those entries available as these collections might be referenced in publications.

@dnoesgaard and I decided that we are going to map a few GRSciColl collections to DataCite (test) and share them here in the coming week(s) so we can discuss what makes the most sense.