A review of the heterogeneous landscape of biodiversity databases: opportunities and challenges for a synthesized biodiversity knowledge base

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.1111/geb.13497. This is version 2 of this Preprint.

Xiao Feng, Brian Joseph Enquist, Daniel Park, Bradley Boyle, David D. Breshears, Rachael Gallagher, Aaron Lien, Erica Newman, Joseph Robert Burger, Brian Maitner


Aim: Addressing global environmental challenges requires access to biodiversity data across wide spatial, temporal and biological scales. Recent decades have witnessed an exponential increase in biodiversity information aggregated by biodiversity databases (hereafter ‘databases’). However, heterogeneous coverage, protocols, and standards among databases have hampered data integration. To stimulate the next stage of data integration, we present a synthesis of major databases and investigate (i) how database coverage varies across taxonomy, space, and record type; (ii) the degree of integration among databases; (iii) how integration of databases can increase biodiversity knowledge; and (iv) the barriers to database integration.
Location: Global
Time period: Contemporary
Major taxa studied: Plants and Vertebrates
Methods: We reviewed the scope of twelve well-established databases and assessed the status of their integration. We synthesized information from these databases to assess major knowledge gaps and barriers to full integration. We estimated how improved integration can increase the coverage and depth of biodiversity knowledge.
Results: Each reviewed database had a unique focus in its data coverage. Data flows among databases were common, though not always clearly documented. Functional trait databases were more isolated than those pertaining to species distributions. Poor compatibility between the taxonomic systems used by different databases posed a major challenge to integration. We demonstrated that integrating distribution databases can increase taxonomic coverage by an amount corresponding to 23 years’ advancement in knowledge accumulation, and that the improvement in taxonomic coverage could be as high as 22.4% for trait databases.
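The idea of a coverage gain from integration, and why taxonomic incompatibility matters for it, can be illustrated with a toy calculation. The Python sketch below assumes each database is reduced to a set of species names, that a synonym table maps alternative names to one accepted name before the sets are merged, and that improvement is measured relative to the best single database. The database names, species lists, the synonym entry and the helpers `harmonize` and `coverage_gain` are all hypothetical; this is not the procedure or data used in the study.

```python
"""Toy sketch: taxonomic coverage gained by integrating databases.

All database names, species names and synonyms are hypothetical;
this is not the method or data used in the paper.
"""

# Hypothetical synonym table mapping alternative names to one accepted name.
SYNONYMS = {
    "Quercus borealis": "Quercus rubra",
}


def harmonize(names, synonyms):
    """Map each name to its accepted name (unmatched names are kept as-is)."""
    return {synonyms.get(name, name) for name in names}


def coverage_gain(databases, synonyms):
    """Return (best single-database count, integrated count, % improvement)."""
    harmonized = {db: harmonize(names, synonyms) for db, names in databases.items()}
    best_single = max(len(names) for names in harmonized.values())
    # Integrated coverage is the union of all harmonized species sets.
    integrated = len(set().union(*harmonized.values()))
    improvement = 100.0 * (integrated - best_single) / best_single
    return best_single, integrated, improvement


if __name__ == "__main__":
    # Hypothetical species lists held by three trait databases.
    toy_databases = {
        "trait_db_A": {"Quercus rubra", "Acer saccharum", "Pinus taeda", "Betula nigra"},
        "trait_db_B": {"Quercus borealis", "Pinus taeda", "Fagus grandifolia"},
        "trait_db_C": {"Acer saccharum", "Tsuga canadensis"},
    }
    best, total, gain = coverage_gain(toy_databases, SYNONYMS)
    print(f"best single database: {best} species")  # 4 species
    print(f"integrated: {total} species (+{gain:.1f}%)")  # 6 species (+50.0%)
```

With an empty synonym table, "Quercus borealis" and "Quercus rubra" would be counted as two species and the integrated total would be inflated to 7, which is a simple instance of how incompatible taxonomies distort merged coverage estimates.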
Main conclusions: Biodiversity knowledge can be increased rapidly through the integration of databases, providing the data necessary to address critical environmental challenges. Our synthesis provides an overview of the integration status of databases. Full integration across databases will require tackling the major impediments to data integration: taxonomic incompatibility, lags in data exchange, barriers to effective data synchronization, and isolation of individual initiatives.




Ecology and Evolutionary Biology, Life Sciences, Other Ecology and Evolutionary Biology



Published: 2021-06-29 14:17

CC-BY Attribution-NonCommercial 4.0 International