Toward Reliable Biodiversity Dataset References

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.1016/j.ecoinf.2020.101132. This is version 4 of this Preprint.

Authors

Michael John Elliott, Jorrit H. Poelen, Jose Fortes

Abstract

No systematic approach has yet been adopted to reliably reference and provide access to digital biodiversity datasets. Based on accumulated evidence, we argue that location-based identifiers such as URLs are not sufficient to ensure long-term data access. We introduce a method that uses dedicated data observatories to evaluate long-term URL reliability.

From March 2019 through May 2020, we took periodic inventories of the data provided to major biodiversity aggregators, including GBIF, iDigBio, DataONE, and BHL, by accessing the URL-based dataset references from which the aggregators retrieve data. Over the period of observation, we found that, among the URL-based dataset references available in each aggregator's data provider registry, 5% to 70% of URLs were intermittently or consistently unresponsive, 0% to 66% produced unstable content, and 20% to 75% became either unresponsive or unstable.
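
The categories above can be illustrated with a minimal sketch. It assumes that each periodic inventory records, per URL, either a hash of the retrieved content or nothing when the request failed. The function name `classify` and the exact criteria are hypothetical and may differ from the definitions used in the study.

```python
from typing import Dict, Optional, Sequence


def classify(observations: Sequence[Optional[str]]) -> Dict[str, bool]:
    """Classify one URL from its sequence of periodic observations.

    Each observation is the content hash retrieved at that time, or None
    if the URL did not respond. These criteria are assumptions made for
    illustration, not the study's exact definitions.
    """
    # Distinct content versions seen across all successful retrievals.
    distinct_hashes = {h for h in observations if h is not None}
    # Unresponsive: at least one inventory failed to retrieve content.
    unresponsive = any(h is None for h in observations)
    # Unstable: successful retrievals did not always return the same content.
    unstable = len(distinct_hashes) > 1
    return {
        "unresponsive": unresponsive,
        "unstable": unstable,
        "unreliable": unresponsive or unstable,
    }


if __name__ == "__main__":
    # A URL that failed once and later served different content.
    print(classify(["abc", None, "def"]))
```

Under these assumptions, a URL counts toward the combined figure if it falls into either category at any point during the observation period.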

We propose the use of cryptographic hashing to generate content-based identifiers that can reliably reference datasets. We show that content-based identifiers facilitate decentralized archival and reliable distribution of biodiversity datasets to enable long-term accessibility of the referenced datasets.
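
As a minimal sketch of the idea, a content-based identifier can be derived by hashing the bytes of a dataset. The choice of SHA-256, the `hash://sha256/` prefix, and the function names `content_id` and `verify` are assumptions made for illustration, not necessarily the scheme adopted in the paper.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive an identifier from the content itself (assumed SHA-256 scheme)."""
    return "hash://sha256/" + hashlib.sha256(data).hexdigest()


def verify(data: bytes, identifier: str) -> bool:
    """Check that retrieved bytes match the identifier they were requested by."""
    return content_id(data) == identifier


if __name__ == "__main__":
    dataset = b"occurrenceID,scientificName\n1,Puma concolor\n"
    identifier = content_id(dataset)
    print(identifier)
    # Any copy, from any location, can be verified against the identifier.
    print(verify(dataset, identifier))
    # A modified copy no longer matches.
    print(verify(dataset + b"2,Lynx rufus\n", identifier))
```

Because the identifier depends only on the content, any archive holding a byte-identical copy can serve the dataset, and a recipient can confirm its integrity without trusting the location it came from.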

DOI

https://doi.org/10.32942/osf.io/mysfp

Subjects

Biodiversity, Life Sciences

Keywords

biodiversity, ecological informatics, information retrieval, information systems

Dates

Published: 2020-01-03 05:36

Last Updated: 2020-06-01 18:51

License

CC-By Attribution-ShareAlike 4.0 International