Hey y’all -
Jen Hammock of the Encyclopedia of Life hinted (see https://gitter.im/EOL/eol?at=5e1fde41a50f33623f44fa87) that this EcoEvoRxiv preprint might be of interest to the GBIF community. Note that the preprint has been submitted but has not yet been peer-reviewed. Curious to hear your take on it. (Disclaimer: I am the second author.)
Elliott, M. J., Poelen, J. H., & Fortes, J. (2020, January 3). Toward Reliable Biodiversity Dataset References. https://doi.org/10.32942/osf.io/mysfp
For those who do not like to click on DOI links, here is the abstract:
Toward Reliable Biodiversity Dataset References
No systematic approach has yet been adopted to reliably reference
and provide access to digital biodiversity datasets. Based on
accumulated evidence, we argue that location-based identifiers such
as URLs are not sufficient to ensure long-term data access. We
introduce a method that uses dedicated data observatories to
evaluate long-term URL reliability.
From March through October of 2019, we took periodic
inventories of the data served by major biodiversity aggregators,
including GBIF, iDigBio, DataONE, and BHL. Over the period of
observation, we found that, for each network, 5% to 43% of
registered URLs were intermittently or consistently unresponsive,
0% to 63% produced unstable content, and 13% to 76% became
either unresponsive or unstable.
We propose the use of cryptographic hashing to generate
content-based identifiers that can reliably reference datasets. We
show that content-based identifiers facilitate decentralized archival
and reliable distribution of biodiversity datasets to enable long-term
accessibility of the referenced datasets.
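
To make the content-based identifier idea a bit more concrete, here is a minimal sketch in Python. It is an illustration only, not the implementation from the preprint: the choice of SHA-256, the `hash://sha256/` prefix, the function name `content_id`, and the two tiny CSV snapshots are all assumptions I made up for this example.

```python
import hashlib
import io


def content_id(stream, chunk_size=65536):
    """Return an identifier derived only from the bytes read from `stream`.

    Assumptions for illustration: SHA-256 as the hash function and a
    "hash://sha256/" prefix as the identifier notation.
    """
    digest = hashlib.sha256()
    # Read in chunks so large dataset archives need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return "hash://sha256/" + digest.hexdigest()


# Two made-up snapshots of the "same" dataset, as a URL might serve them
# at two different times.
snapshot_a = b"occurrenceID,scientificName\n1,Puma concolor\n"
snapshot_b = b"occurrenceID,scientificName\n1,Puma concolor\n2,Lynx rufus\n"

id_a = content_id(io.BytesIO(snapshot_a))
id_a_again = content_id(io.BytesIO(snapshot_a))
id_b = content_id(io.BytesIO(snapshot_b))

print(id_a)
print("same bytes, same identifier:", id_a == id_a_again)
print("changed bytes, same identifier:", id_a == id_b)
```

The point of the sketch: the identifier depends only on the bytes, not on where they were downloaded from. Anyone holding a copy can recompute the hash to check that the copy is the dataset that was cited, and a changed dataset behind an unchanged URL shows up as a different identifier.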