Collections catalogue (GRBio)

OK @trobertson, I’ll try to answer some of these questions. To be clear, I’m still trying to get my head around Wikidata as well. Because many different people and communities contribute, often with different goals, things can get messy.

First off, to try and estimate the number of museums and herbaria in Wikidata I ran a SPARQL query:

    SELECT DISTINCT * WHERE {
      { ?repository wdt:P31 wd:Q181916. }
      UNION
      { ?repository wdt:P31 wd:Q1970365. }
      UNION
      { ?repository wdt:P31 wd:Q26959059. }
      ?repository rdfs:label ?label.
      FILTER(LANG(?label) = "en")
    }

The result is here: http://tinyurl.com/y55sfe95, and it finds 387 institutions. The query is more complex than I’d like because it looks for herbaria, natural history museums, and zoological museums (clearly not an exhaustive list of institution types). For fun, here’s a map (addressing one of @Debbie’s concerns: Wikidata makes it trivial to create maps).
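The UNION query can be written more compactly with a VALUES clause. Below is a minimal Python sketch (standard library only) that sends that variant to the public Wikidata Query Service endpoint and counts distinct items. The endpoint URL and the SPARQL JSON results format are standard; the User-Agent string and the function names are illustrative placeholders, and any count it returns will drift from 387 as Wikidata is edited.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Same three classes as the UNION query, expressed with VALUES.
# ?type is deliberately not projected, so DISTINCT collapses items that
# are instances of more than one of these classes.
QUERY = """
SELECT DISTINCT ?repository ?label WHERE {
  VALUES ?type { wd:Q181916 wd:Q1970365 wd:Q26959059 }
  ?repository wdt:P31 ?type ;
              rdfs:label ?label .
  FILTER(LANG(?label) = "en")
}
"""


def run_query(query):
    """Send a SPARQL query to WDQS and return the decoded JSON results."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # WDQS asks clients to identify themselves; placeholder value.
            "User-Agent": "collections-count-sketch/0.1 (example)",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_institutions(results):
    """Count distinct ?repository URIs in a SPARQL JSON result set."""
    bindings = results["results"]["bindings"]
    return len({row["repository"]["value"] for row in bindings})
```

Calling `count_institutions(run_query(QUERY))` needs network access. Projecting only `?repository` and `?label` matters here: with `SELECT *`, the extra `?type` variable would produce one row per matching class.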

If we take GRBio’s 7,000 institutions as a baseline, then 387 is fairly small. But this query will miss a lot of institutions (e.g., universities and botanic gardens). There are also many Wikidata entries that come from Wikispecies and are pretty minimal (often just the institutionCode). I scraped these from Wikispecies and looked them up in Wikidata, which gives us about 1,300 institutions. Wikispecies editors are creating specimen records (e.g., for type specimens) and linking them to institution pages via the institutionCode; it is these institution pages that end up in Wikidata.
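A lookup like this can be done with the Wikidata API's `wbgetentities` action, which takes a site ID (`specieswiki` for Wikispecies) plus page titles and returns the items sitelinked to those pages. This is a sketch assuming you already have a list of scraped Wikispecies page titles; the helper names and User-Agent are hypothetical, and it is not necessarily how the 1,300 figure above was produced.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
BATCH_SIZE = 50  # wbgetentities limits ordinary clients to 50 titles per request


def build_lookup_url(titles: List[str]) -> str:
    """URL asking which Wikidata items link to these Wikispecies pages."""
    params = {
        "action": "wbgetentities",
        "sites": "specieswiki",       # Wikispecies site ID
        "titles": "|".join(titles),   # multiple titles are pipe-separated
        "props": "sitelinks",
        "sitefilter": "specieswiki",  # only return the Wikispecies sitelink
        "format": "json",
    }
    return WIKIDATA_API + "?" + urllib.parse.urlencode(params)


def parse_lookup(response: dict) -> Dict[str, str]:
    """Map each Wikispecies title that has an item to that item's QID."""
    found = {}
    for key, entity in response.get("entities", {}).items():
        if "missing" in entity:
            continue  # no Wikidata item is sitelinked to this page
        title = entity["sitelinks"]["specieswiki"]["title"]
        found[title] = key  # entities are keyed by their ID, e.g. "Q217717"
    return found


def lookup_titles(titles: List[str]) -> Dict[str, str]:
    """Look up titles in batches (needs network access)."""
    found: Dict[str, str] = {}
    for start in range(0, len(titles), BATCH_SIZE):
        request = urllib.request.Request(
            build_lookup_url(titles[start:start + BATCH_SIZE]),
            headers={"User-Agent": "wikispecies-lookup-sketch/0.1 (example)"},
        )
        with urllib.request.urlopen(request) as response:
            found.update(parse_lookup(json.load(response)))
    return found
```

Pages with no Wikidata item come back under negative keys with a `missing` flag; `parse_lookup` skips them, so comparing its output with the input titles shows which Wikispecies pages have no Wikidata counterpart.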

In terms of communities using Wikidata for collections, I don’t think that’s much of a thing yet, although some people are uploading specimens(!). But many museum records are quite rich, the AMNH being a great example: https://www.wikidata.org/wiki/Q217717
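One rough way to gauge how rich an entry like the AMNH's is: fetch its JSON from `Special:EntityData` and count statements, labels, sitelinks, and external identifiers. This is a sketch; the choice of summary fields is arbitrary and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict

ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def fetch_entity(qid: str) -> Dict[str, Any]:
    """Download one Wikidata entity as JSON (needs network access)."""
    request = urllib.request.Request(
        ENTITY_DATA_URL.format(qid=qid),
        headers={"User-Agent": "entity-summary-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    # The payload is keyed by entity ID; return the single entity it holds.
    return next(iter(data["entities"].values()))


def summarize(entity: Dict[str, Any]) -> Dict[str, int]:
    """Count a few rough indicators of how complete an entry is."""
    claims = entity.get("claims", {})
    statements = [s for group in claims.values() for s in group]
    return {
        "properties": len(claims),
        "statements": len(statements),
        # External IDs (links to other databases) matter for cross-mapping.
        "external_ids": sum(
            1 for s in statements if s["mainsnak"].get("datatype") == "external-id"
        ),
        "labels": len(entity.get("labels", {})),
        "sitelinks": len(entity.get("sitelinks", {})),
    }
```

For example, `summarize(fetch_entity("Q217717"))` would summarize the AMNH entry linked above; the numbers change as the item is edited.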

There’s a lot going on with Wikidata in relation to gene families, the academic literature, etc. that I haven’t gone into here; instead I’ve focussed on museums and herbaria. I think it’s fair to say that there are big gaps in Wikidata’s coverage, and sorting them out is going to be a challenge. I’m trying to do some mapping between GRBio, NCBI, Wikispecies, Wikidata, and JSTOR to make some sense of this. The real test will be what happens if and when we ask the wider community to help out.
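A mapping like this can be sketched as a crosswalk keyed on a normalized institution code. This is a hypothetical illustration, assuming each source has already been reduced to a list of `(code, identifier)` pairs; the normalization rule and function names are made up for the example. Institution codes are not unique in practice (the same acronym can belong to several institutions), so matching on codes alone is a starting point, not a resolution.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, List[str]]]


def normalize_code(code: str) -> str:
    """Hypothetical normalization: upper-case and drop dots and spaces."""
    return "".join(ch for ch in code.upper() if ch not in ". ")


def crosswalk(sources: Dict[str, List[Tuple[str, str]]]) -> Table:
    """Group identifiers from each source under a shared normalized code."""
    table = defaultdict(lambda: defaultdict(list))
    for source, records in sources.items():
        for code, identifier in records:
            table[normalize_code(code)][source].append(identifier)
    # Convert the nested defaultdicts to plain dicts.
    return {code: dict(by_source) for code, by_source in table.items()}


def gaps(table: Table, source_names: List[str]) -> Dict[str, List[str]]:
    """For each code, list the sources that have no record for it."""
    result = {}
    for code, by_source in table.items():
        missing = sorted(set(source_names) - set(by_source))
        if missing:
            result[code] = missing
    return result


def ambiguous(table: Table) -> List[str]:
    """Codes for which some source lists more than one identifier."""
    return sorted(
        code for code, by_source in table.items()
        if any(len(ids) > 1 for ids in by_source.values())
    )
```

Codes missing from some sources point to coverage gaps; codes with several identifiers in one source are the ambiguous cases that probably need a human (or the wider community) to resolve.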
