Google has released Dataset Search, which looks cool. GBIF data comes up, e.g. https://toolbox.google.com/datasetsearch/search?query=Coleoptera BUT there are no GBIF links or logos. Instead the links are to DataCite and the logos (if any) are for the sources of the data. I think this is potentially a problem for GBIF, as it is essentially invisible in the search results even though it is actually the reason the data appears in those results. Maybe someone should talk to Google and DataCite about fixing this?
There’s a Twitter thread here that discusses this a bit more: https://twitter.com/rdmpage/status/1037391171423744001?s=20
Indeed, GBIF-mediated datasets do not appear as they should for the moment.
Also noticed that lots of GBIF downloads appear as “datasets” with the GBIF logo (see search results).
It should not be too difficult to fix, and it’s crucial to GBIF’s visibility to get that right!
Hope the Secretariat will start investigating this soon.
@andre I think there are a couple of issues here.
Firstly, who best to give credit to for GBIF (-insert politically correct prefix here) datasets? GBIF themselves seem happy with the original provider’s logo being used:
BTW, we’re really happy with the logo-and-star-credit for data providers, and with display of GBIF download DOIs… https://twitter.com/GBIF/status/1037618845224247297
If the goal is to increase visibility of the original provider, that makes sense. However, it is at the cost of GBIF’s visibility (and not helped by the link to the dataset being to a DataCite metadata page, not the GBIF page for the dataset).
The second issue is, as you point out, the huge number of GBIF downloads that appear with the GBIF logo. These are, mostly, a waste of space in that they aren’t datasets as such. I’ve never been happy with GBIF’s decision to assign a DOI to every download, especially as those downloads are not guaranteed to be persistent (which undermines the very idea of a DOI in the first place). I guess it’s an attempt to make it easier to cite GBIF data, but a consequence is that meta search engines like Google’s get swamped with “datasets” that aren’t datasets.
So I think as things stand GBIF (a) doesn’t get any visibility for the data it mobilises and (b) gets visibility for essentially spamming the search results.
None of this may be what GBIF imagined would happen, but it’s something that deserves some attention, especially given the potential visibility of Google’s latest toy.
Point a) is basically correct as it stands now, but we do feel that it’s important for the data publishers to receive credit. What’s happening now is that DataCite is standing in for GBIF as the provider, in the absence of Schema.org-compatible metadata. Providing that is and has been on the docket, but with the release of Google Dataset Search, it takes on greater urgency. We weren’t expecting the tool, either.
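As a rough illustration of what “Schema.org-compatible metadata” could look like, here is a minimal Python sketch that builds a schema.org `Dataset` description and wraps it in a JSON-LD script tag of the kind a dataset landing page might embed. It is not GBIF’s actual implementation: the function names, the example publisher, the dataset URL and the DOI are all hypothetical placeholders, and the choice of properties is just an assumption about what would be useful to expose.

```python
import json


def dataset_jsonld(title, description, landing_page, doi, publisher_name, license_url):
    """Build a schema.org Dataset description as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": title,
        "description": description,
        # Point at the aggregator's own landing page rather than a registry page.
        "url": landing_page,
        "identifier": doi,
        "license": license_url,
        # Credit the original data publisher.
        "creator": {"@type": "Organization", "name": publisher_name},
        # Credit the aggregator that mobilised the data.
        "includedInDataCatalog": {
            "@type": "DataCatalog",
            "name": "GBIF",
            "url": "https://www.gbif.org",
        },
    }


def as_script_tag(metadata):
    """Serialise the metadata as a JSON-LD script tag for embedding in HTML."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(metadata, indent=2)
        + "\n</script>"
    )


# All values below are made-up placeholders, not a real dataset.
example = dataset_jsonld(
    title="Example Coleoptera collection",
    description="Occurrence records of beetles from a hypothetical museum collection.",
    landing_page="https://www.gbif.org/dataset/EXAMPLE-DATASET-KEY",
    doi="https://doi.org/10.0000/example",
    publisher_name="Example Natural History Museum",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

print(as_script_tag(example))
```

In this sketch the publisher is credited through `creator` and the aggregator through `includedInDataCatalog`, so both could in principle be visible to a crawler. How Google Dataset Search would actually display them is not something this example can guarantee.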
Point b)… well, legal and regulatory frameworks recognize a download as a new dataset, whether it’s derivative or not. We do assign DOIs, and we couldn’t be doing what we’re doing with literature tracking without them. And they provide a detailed, transparent and reproducible link back to the complete list of sources while reinforcing the provenance. “Spamming the search results” seems a bit extreme.
If it is, it probably won’t be for long, once Google starts tweaking the results, providing facets and filters, AND adds more data from other domains (environmental data was an easy get for them to start). In the meantime, presumably because it provides full-text search of dataset titles, it provides an interesting way to search what people are downloading by common names.