Collections catalogue (GRBio)

#6

This is THE most important consideration. The questions about linkability above make a big assumption that the data in GRBio will always be correct and that the very fluid, hierarchical relationship(s) between institutions, collections, and people are updated in near real-time with date stamps. It might help to think through basic use cases and the drivers that ensure metadata and relationships are created and maintained.

(1) What happens to people records in GRBio when someone changes institution? Dies? Shares multiple affiliations?
(2) What happens to collection records when an entire collection changes hands or heaven forbid is lost or destroyed?
(3) What happens to institution records when the institution name changes or is split?

I bet most of what we want can be accomplished in Wikidata, even if it is not yet populated to the same density as GRBio. As Rod says, Wikidata enthusiasts can populate and repair these in a real hurry. However, there will be other data elements unlikely to appear in Wikidata, such as people profiles, contact information, and their relationship(s) with either collections or institutions. If that is to be accomplished on GBIF proper, what is sticky enough for them to do so without question or fuss?

#7

Some thoughts

  • Obviously, collection-specimen-person linkages are very important. As seen from the Twitter part of this discussion, there is some fuzziness in collection-organization linkages, which I suspect are many-to-one. If linking GBIF publishers to collection IDs is important, we may soon realise that a GBIF publisher can correspond to a collection or to an organization.

  • 1:1 links between collection IDs and datasets could be preferred around #3 in Tim’s list, so that as long as a dataset (a digital representation of a collection in GBIF) is cited, the collection can report on digital access to it neatly, see metrics, etc. - I think David mentioned this as a needed functionality. If more than one collection contributes to a dataset, or a dataset is only a fraction of the collection (e.g. regional or taxonomic), this could become a bit messy and require some arithmetic effort.

  • Maintenance and updates of content - there is some psychology of ownership here. In systems where the ID owner is responsible for keeping content up to date, quality can be quite high, but only as long as it is cool and important to have a profile up to date. If citation of a collection through data becomes a wanted feature for collections, curators will make sure the info is accurate, but for individuals and for ID systems there are waves of importance of being up to date - you can see these waves in ResearchGate, ORCID, LinkedIn, etc. A hybrid model where centrally (automatically or manually) generated content can be edited by the ID owner may work better - is this Google Scholar’s model?

#8

Maybe we can unpack this discussion a little? I think there are several things going on:

  1. As @trobertson spells out, GBIF wants a way to consistently refer to collections, and GrBio seems the obvious candidate as it has a list of institution and collection identifiers, some of which are already in use. For most of the goals Tim lists, having a set of domain-specific identifiers is all you need. The challenge then is mapping messy data to those identifiers (see @dshorthouse’s list https://gist.github.com/dshorthouse/acb35ad544000deafb8964341071ff55 for an indication of the problem), and having data publishers use them when they upload data to GBIF. This is the argument for GBIF taking responsibility for GrBio.

  2. Collections are part of institutions, and our domain-specific identifiers are but one of many that are relevant to those institutions. @dshorthouse is looking at metrics of taxonomic activity and collection use that are tied to ORCIDs, and these in turn are linked to institutional identifiers such as GRID, Ringgold, FundRef, etc. These identifiers seem to be the ones that matter to people building institution-level metrics of activity, not the domain-specific ones that GrBio created. So, if we want to contribute to those metrics (and arguably this is going to be a key part of helping those institutions justify their investment in their collections), then we need mappings between these identifiers. This is the argument for using Wikidata as the identity broker for those cross-links.
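To illustrate the mapping problem in point 1, here is a minimal sketch of resolving messy institution strings to canonical identifiers via a hand-curated lookup of normalized name variants. The lookup table and the identifier shown are hypothetical placeholders, not real GrBio entries:

```python
import re

# Hypothetical lookup from normalized name variants to a canonical
# identifier; the GrBio-style URI below is a placeholder, not a real ID.
GRBIO_LOOKUP = {
    "nhm london": "http://grbio.org/cool/example-id",
    "natural history museum london": "http://grbio.org/cool/example-id",
}

def normalize(name):
    """Lower-case, replace punctuation with spaces, collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return re.sub(r"\s+", " ", name).strip()

def resolve(messy):
    """Return the canonical identifier for a messy string, or None if unknown."""
    return GRBIO_LOOKUP.get(normalize(messy))
```

In practice the hard part is building and curating the lookup (the variants in @dshorthouse's gist give a feel for the scale), and fuzzy matching would likely be needed on top of exact normalized matches.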

My view is we do both.

#9

Thanks @rdmpage - that does spell it out clearly, and along my lines of understanding (although 2 can of course be achieved without Wikidata).

I would still like to know how many collections are in Wikidata, and some metrics about the adoption rate, communities using it, etc. Do you know, please?

Please do not misinterpret this as negativity towards a Wikidata option. I genuinely don’t know enough to have an opinion and have found it confusing to navigate.

#10

Hi All, thanks for the efforts to list and synthesize salient topics. I would like to emphasize one or two of them and perhaps add new ones. And please enlighten me where you can - thanks.

First, there are quite a few initiatives currently going on that are trying to capture collection metrics at various levels and for various stakeholders. See https://www.idigbio.org/content/shining-new-light-world’s-collections

  • stakeholders: global funders, aggregators, institutions, collections, collection managers, administrators, taxonomists (doing identifications), collectors, journals, funders in general, policy makers, researchers
  • some groups are focused on metrics for their own institutions (e.g. The Field Museum), while others are focused on aggregating metrics across collections (e.g. ICEDIG efforts for DiSSCo to make recommendations for building a digitization status dashboard across a large group of museums), and then others are interested in related collection-level metadata metrics at the level of aggregators (GBIF, iDigBio US Collections List, ALA, etc).
  1. Engagement and Human Effort. In order to get a resource that people will use and contribute to (without arm-twisting), it must be easy to use, intuitive, and linked so as to simplify their lives. E.g., if they update Index Herbariorum, then the API must be set up so that any other resource (GRBio at GBIF) can be updated without that person having to visit another site. The human effort involved is not trivial.
  2. Visualization. Whatever the tool, the community needs to be able to see not only the profile of a given collection, but be able to compare it to others, for example, to see what is unique, or what’s missing (via maps, etc).
  3. Absence data - beyond text fields. For example, we need much better fields (beyond EML) that allow us to understand the (un-digitized) backlog (quantitatively) and what’s in it, as well as what’s been done, and still needs georeferencing.
  4. People recognition. We need to be able to track / visualize not just the specimens and efforts of collections and institutions, but the contributions of the individuals making these resources possible (taxonomists, collectors, georeferencers, etc.).
    David’s work on Bloodhound shows the value of people having unique identifiers. But we need software that supports these identifiers and social uptake of using and documenting these in our community.
  5. Credit/Attribution standards. It would help if journals publishing articles that reference specimens, institutions, organizations, and collections had agreed-upon expectations, to help drive change and adoption of identifiers and formats that facilitate tracking.
  6. Cyberinfrastructure. Meanwhile, we need an infrastructure that helps people learn and effect what they need to do to join and support this effort.
  7. Carrots. It may be very valuable to have a conversation about carrots - what can we offer our collections community (at multiple levels - institution, administration, collection manager, taxonomist, etc) so that there is tangible reward for their part in creating / supporting / engaging with any resource created.
  8. Workflow Opportunity? Perhaps, since digitization has taken hold and is growing, some thought could be given to how sharing / exporting collection-level metadata could be tied to sharing specimen-level record information (for those already publishing this information).
  9. Standards Needed. The TDWG Collection Descriptions Task Group is endeavoring to offer standards that support the data and data model required to build such an (extensible) resource. See Use Cases collected so far.

In any case, let’s build something that is beyond individual records, and beyond a rows-and-columns interface. Help us all “see” what we’ve got so we can plan our future efforts strategically.

2 Likes
#11

OK @trobertson I’ll try and answer some of these questions. And to be clear, I’m still trying to get my head around Wikidata as well. Given that many different people and communities contribute, and they often have different goals, things can get messy.

First off, to try and estimate the number of museums and herbaria in Wikidata I ran a SPARQL query:

  SELECT DISTINCT * WHERE {
    { ?repository wdt:P31 wd:Q181916. }
    UNION
    { ?repository wdt:P31 wd:Q1970365. }
    UNION
    { ?repository wdt:P31 wd:Q26959059. }
    ?repository rdfs:label ?label.
    FILTER((LANG(?label)) = "en")
  }

The result is here: http://tinyurl.com/y55sfe95 This finds 387 institutions. The query is more complex than I’d like because it looks for herbaria, natural history museums, and zoological museums (clearly not an exhaustive list of institution types). For fun, here’s a map (addressing one of @Debbie’s concerns, Wikidata makes it trivial to create maps).
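For anyone who wants to reproduce this count programmatically, here is a sketch that assembles the same UNION query and runs it against the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql). The QIDs are the ones in the query above; the User-Agent string is an arbitrary placeholder:

```python
import json
import urllib.parse
import urllib.request

# Herbarium, natural history museum, zoological museum (as in the query above)
INSTITUTION_CLASSES = ["Q181916", "Q1970365", "Q26959059"]

def build_query(classes):
    """Assemble the UNION query over a list of Wikidata class QIDs."""
    unions = " UNION ".join(
        f"{{ ?repository wdt:P31 wd:{qid}. }}" for qid in classes
    )
    return (
        f"SELECT DISTINCT * WHERE {{ {unions} "
        f'?repository rdfs:label ?label. FILTER((LANG(?label)) = "en") }}'
    )

def run_query(query, endpoint="https://query.wikidata.org/sparql"):
    """Send the query to the Wikidata Query Service and return JSON bindings."""
    url = endpoint + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    req = urllib.request.Request(
        url, headers={"User-Agent": "collections-count/0.1"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]["bindings"]

# Example (network call): len(run_query(build_query(INSTITUTION_CLASSES)))
```

Extending the count to other institution types is then just a matter of adding QIDs to the list.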

If we take GrBio’s 7000 institutions, then 387 is clearly fairly small. But this query will miss a lot of institutions (e.g., universities, botanic gardens, etc.). There are also lots of Wikidata entries that come from Wikispecies and are pretty minimal (often just the institutionCode). I scraped these from Wikispecies and looked them up in Wikidata, which gives us about 1300 institutions. Wikispecies editors are creating specimen records (e.g., type specimens) and linking those to institution pages via institutionCode; it is these pages that end up in Wikidata.

In terms of communities using Wikidata for collections, I don’t think that’s much of a thing yet, although some people are uploading specimens(!). But many museum records are quite rich, the AMNH being a great example: https://www.wikidata.org/wiki/Q217717

There’s a lot going on with Wikidata in relation to gene families, the academic literature, etc. that I haven’t gone into here, instead I’ve focussed on museums and herbaria. I think it’s fair to say that there are big gaps in Wikidata’s coverage, and it’s going to be a challenge to sort out. I’m trying to do some mapping between GrBio, NCBI, Wikispecies, Wikidata, and JSTOR to make some sense of this. The real test will be what happens if and when we ask the wider community to help out.

1 Like
#12

Thank you @rdmpage - I greatly appreciate you preparing that as a start for exploration.

#13

@trobertson I’ve just posted some notes on iPhylo: Where is the damned collection? Wikidata, GrBio, and a global list of all natural history collections. The post is partly to remind me of the issues, and to bookmark some links while I thrash around trying to figure out how best to make use of what’s already in Wikidata.

#14

Hi @rdmpage please also see the US Collections List at iDigBio, https://github.com/iDigBio/idb-us-collections (and searchable on our website). And what about the REST API at Index Herbariorum? https://github.com/nybgvh/IH-API/wiki
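To give a feel for what consuming the IH REST API might look like, here is a small sketch. The base URL and the shape of the payload (a `data` array of records with a `code` field) are assumptions drawn from the IH-API wiki linked above; check that page for the authoritative endpoint and schema:

```python
import json
import urllib.request

# Assumed endpoint; see https://github.com/nybgvh/IH-API/wiki for the
# authoritative base URL and response schema.
IH_API = "http://sweetgum.nybg.org/science/api/v1/institutions"

def fetch_institutions(url=IH_API):
    """Fetch the raw institutions payload from the IH API (network call)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def herbarium_codes(payload):
    """Extract herbarium codes, assuming {'data': [{'code': ...}, ...]}."""
    return [rec["code"] for rec in payload.get("data", []) if "code" in rec]
```

A sync job along these lines could pull IH records on a schedule and push updates to any downstream resource, so curators only ever have to edit in one place.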

#15

Hi @rdmpage, you wrote:

  “The real test will be what happens if and when we ask the wider community to help out.”

Yes. This is the critical issue if we are ever to have a robust world collections resource. It’s why I stressed the process must be simple and elegant, and as integrated as possible, and we will need carrots if we are to succeed.

#16

The maintenance issue can be solved through regional collection infrastructures like DiSSCo and iDigBio. They have the resources and the network, and they need this themselves. DiSSCo would be interested in maintaining the European part, I assume iDigBio would be too for North America.

#17

@Debbie Gack, yet another database of collections :disappointed: This is part of what I think we should try to avoid: another database complete with its own identifiers. The contents of IH are mostly replicated in GrBio (no doubt out of date since GrBio became moribund).

#18

@waddink @Debbie it might be useful to be clear on the scope of all these goals and initiatives.

From my perspective, I am interested in a single global resource with the same level of detail as in GrBio, enhanced with other identifiers that connect repositories to databases of funding, higher-level organisations, etc., as well as images, better support for names in multiple languages, links to other outputs of repositories such as publications, etc. I think this is what Wikidata is ideal for.

GBIF seems ideally placed to provide metrics and visualisations of what has currently been digitised (as @trobertson has outlined). In other words, this is what collection ‘x’ currently contributes to the global effort to digitise collections.

Measures of what remains to be done within a collection seems a third objective, and this may well be better done at the level of individual collections (if they have the resources, I’ve seen some cool visualisations at the NHM in London), or at the level of regional initiatives such as iDigBio and DiSSco.

1 Like
#19

From my perspective we need both Wikidata and GrBio, benefiting from both their unique selling points and sharing data between them. On a practical level this would require GrBio data to be CC0, is this the case?

2 Likes
#20

From Plazi’s scholarly publication point of view, we would like to see GRBIO live again. We started early on by annotating the collection codes we find in scholarly articles with GRBIO’s persistent identifiers, then continued using our own service based on the saved version of GRBIO (doi.org/10.5281/zenodo.1285615), and now hope we can refer again to a live version of GRBIO.

For us, having GBIF as maintainer of GRBIO would be helpful, because we submit all our data (treatments and material citations) to GBIF, in which the collection code is one element. Referring to the same reference would reduce the risk that the code refers to different things. It would also enable GBIF to produce statistics that include scholarly articles (in the current GBIF language, collections), treatments, and material citations, which are probably the biggest users of collection-based data.

If this is in place, it not only provides better access to data liberated from publications, it also helps us in talking to the growing number of publishers we work with, to ensure that they use the GRBIO terms for collections. The publishers’ interest is really to provide a service to their audience, allowing them to understand which collections contributed to describing the world’s taxa, next to the other obvious candidates (specimens, collectors, authors, etc.).

3 Likes
#21

From my early investigations I tend to agree (not the current GRBio, but the one we envisage).

Thanks Quentin, I was not aware of that requirement. I don’t foresee that as an issue as it is largely factual information but would like to verify specifically with regard to people.

2 Likes
#22

Hi @rdmpage understandable reaction :slight_smile: But I think the way forward now will have to involve APIs that make it possible to connect these resources, and making it very clear in which direction the data is expected to flow.

#23

You wrote

most definitely. I would add that many (including would-be funders) are very interested in knowing more about the backlog. As far as metrics and visualizations at GBIF - yay! Yes, it would be great: since many (most) are already sharing their specimen-level data with GBIF (or planning to), it makes sense that they would send their collection-level metadata there as well, hopefully through a similar (familiar, simple) mechanism to make it easy to comply. @waddink @qgroom and @agosti also raise other important points about the expectations, need for, and requirements of such a system.

Data about the backlog will be easier for some collections than others. Those that have done species inventories can start by sharing this level of data about what they have. Part of being in the DiSSCo network will mean the partners have to provide this information.

I think I get the sense of what you mean about “Measures of what remains to be done within a collection…” being “better done at the level of individual collections.” But many of the needed metrics are at the level of individual collections. We need to get beyond free-text fields (EML) to better understand what we have, who and where the experts are, and the digitization status of these collections.

For me, the harder bit seems to be how to get people to give us this information. I think that DiSSCo has a great chance of showing what can be done when, from the beginning of the effort, the expectation is in place that this information is to be provided.

To @agosti: I think @trobertson confirmed that the GRBio IDs were / are kept in the system he is building.

#24

We support the notion that collection names should be both on GRBIO/GBIF and on Wikidata. We prefer to have GRBIO on GBIF in a way that we can edit, easily adding new collection names we discover in publications, with an API that allows reuse of the data, and with a GRBIO/GBIF ID for collections in Wikidata. Collection codes are an essential building block for our biodiversity knowledge, so we should do all we can to maintain them, ideally in GBIF, with whom we all already intensively interact. We should also make an effort to convince the respective institutions to feel responsible for being present in GRBIO and maintaining the data about themselves.

The most convincing, and probably lowest-hanging, fruit is a series of dashboards like the current ones on GBIF: one for a collection or scholarly publication (https://www.gbif.org/dataset/378ebf94-4b5c-4451-90f4-4109f9b27ea9), or for persons (e.g. Bloodhound) associated with a collection.

I would like to see a time where occurrence-related data does not exist unless it shows up in GBIF. For example, the Codes (e.g. ICZN) should be revised accordingly, or iDigBio and DiSSCo should only be considered a success if their data is also in GBIF.

1 Like
#25

It might be useful to distinguish between (1) GrBio as a database/project and (2) the GrBio identifiers. Yes, it would make sense to keep the GrBio identifiers “live” in some sense, because they have been used (and we’ve already been through at least one iteration of these identifiers having to be re-routed when GrBio took over Roger Hyam’s BioCollections project).

But whether a reborn GrBio is the best way to manage the task of building a database of natural history repositories is another question. Personally I’d argue that is what Wikidata does well (especially if GBIF makes the GrBio identifiers live again, so they can then be added to the existing Wikidata records for these repositories).

1 Like