Harmonizing taxon names in biodiversity data: a review of tools, databases, and best practices

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.1111/2041-210X.13802. This is version 2 of this Preprint.

Authors

Matthias Grenié, Emilio Berti, Juan D. Carvajal-Quintero, Gala Mona Louise Dädlow, Alban Sagouis, Marten Winter

Abstract

1. Taxonomic name harmonization, the process of standardizing taxon names, is necessary to properly merge data indexed by taxon names. The many available taxonomic databases and related tools are often not well described. It is often unclear which databases are actively maintained or what the original source of their taxonomic information is. In addition, software to access these databases is developed following incompatible standards, which creates additional challenges for users. As a result, taxonomic harmonization has become a major obstacle in ecological studies that seek to combine multiple datasets.
2. Here, we review and categorize a set of major publicly available taxonomic databases, as well as a large collection of R packages to access them and to harmonize lists of taxon names. We categorized the databases according to their taxonomic breadth (e.g. taxon-specific vs multi-taxa) and spatial scope (e.g. regional vs global), highlighting strengths and caveats of each type of database. We divided R packages according to their function (e.g. syntax standardization tools, access to online databases, etc.) and highlighted overlaps among them. We present our findings (e.g. network of linkages, data and tool characteristics) in a ready-to-use Shiny web application (available at: https://mgrenie.shinyapps.io/taxharmonizexplorer/).
3. We also provide general guidelines and best practice principles for taxonomic name harmonization. As an illustrative example, we harmonized taxon names of one of the largest databases of community time series currently available. We showed how different workflows can be used for different goals, highlighting their strengths and weaknesses and providing practical solutions to avoid common pitfalls.
4. To our knowledge, our opinionated review represents the most exhaustive evaluation of taxonomic databases and related R tools, and of the links among them. Finally, based on these insights, we make recommendations for users, database managers, and package developers alike.

DOI

https://doi.org/10.32942/osf.io/e3qnz

Subjects

Biodiversity, Ecology and Evolutionary Biology, Life Sciences, Other Ecology and Evolutionary Biology

Keywords

R packages, taxonomic databases, taxonomic harmonization, taxonomic name matching, taxonomic tools, taxonomy

Dates

Published: 2021-09-03 02:33

Last Updated: 2021-12-09 00:01

License

CC-BY Attribution-No Derivatives 4.0 International