Phyloreferences: Tree-Native, Reproducible, and Machine-Interpretable Taxon Concepts

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available: https://doi.org/10.3998/ptpbio.2101. This is version 4 of this Preprint.

Authors

Nico Cellinese, Stijn Conix, Hilmar Lapp

Abstract

Evolutionary and organismal biology have become inundated with data. At the same time, we are experiencing a surge in broader evolutionary and ecological syntheses for which tree-thinking is the staple for a variety of post-tree analyses. To take full advantage of this wealth of data in discovering and understanding large-scale evolutionary and ecological patterns, computational data integration, i.e., the use of machines to link data at large scale, is crucial. The most common shared entity by which evolutionary and ecological data need to be linked is the taxon to which they belong. We propose a set of requirements that a system for defining such taxa should meet for computational data science: taxon definitions should maintain conceptual consistency, be reproducible via a known algorithm, be computationally automatable, and be applicable across the tree of life. We argue that Linnaean names, the most prevalent means of linking data to taxa, fail to meet these requirements due to fundamental theoretical and practical shortfalls. We argue that for the purposes of data integration we should instead use phylogenetic definitions transformed into formal logic expressions. We call such expressions phyloreferences, and argue that, unlike Linnaean names, they meet all requirements for effective data integration.
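
As a rough illustration of what "reproducible via a known algorithm" and "computationally automatable" could mean in practice, the sketch below resolves a minimum-clade phylogenetic definition ("the smallest clade containing the given specifiers") on a toy tree. It is not the authors' implementation: the abstract describes phyloreferences as formal logic expressions, whereas this sketch swaps in a plain tree traversal. The tree, the node labels, and the function names (`ancestors`, `resolve_minimum_clade`, `clade_members`) are all hypothetical and exist only for this example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy phylogeny stored as child -> parent; the root maps to None.
# Tips are A, B, C, D; "root", "n1", and "n2" are internal nodes.
TOY_TREE: Dict[str, Optional[str]] = {
    "root": None,
    "A": "root",
    "n1": "root",
    "B": "n1",
    "n2": "n1",
    "C": "n2",
    "D": "n2",
}


def ancestors(tree: Dict[str, Optional[str]], node: str) -> List[str]:
    """Return the path from `node` up to the root, with `node` itself first."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = tree[current]
    return path


def resolve_minimum_clade(tree: Dict[str, Optional[str]], specifiers: List[str]) -> str:
    """Return the most recent common ancestor of all specifiers.

    This node roots the smallest clade that contains every specifier.
    """
    if not specifiers:
        raise ValueError("at least one specifier is required")
    # Start with the lineage of the first specifier (ordered tip -> root),
    # then keep only the nodes that also lie on every other lineage.
    shared = ancestors(tree, specifiers[0])
    for specifier in specifiers[1:]:
        lineage = set(ancestors(tree, specifier))
        shared = [node for node in shared if node in lineage]
    # The first surviving node is the one closest to the tips.
    return shared[0]


def clade_members(tree: Dict[str, Optional[str]], clade_root: str) -> Set[str]:
    """Return the tips that descend from `clade_root`."""
    internal = {parent for parent in tree.values() if parent is not None}
    return {
        node
        for node in tree
        if node not in internal and clade_root in ancestors(tree, node)
    }
```

Given the same tree and the same specifiers, the resolution always returns the same node. For instance, `resolve_minimum_clade(TOY_TREE, ["B", "D"])` yields the hypothetical node `"n1"`, and `clade_members(TOY_TREE, "n1")` yields the tips B, C, and D. Data attached to any of those tips can then be linked to the taxon by machine rather than by matching a name string.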

DOI

https://doi.org/10.32942/osf.io/57yjs

Subjects

Biodiversity, Bioinformatics, Computer Sciences, Databases and Information Systems, Ecology and Evolutionary Biology, Engineering, Life Sciences, Other Ecology and Evolutionary Biology, Physical Sciences and Mathematics, Software Engineering

Keywords

computational semantics, data integration, phylogenetic definitions, phylogenetic taxonomy, phyloreferences, taxon concepts, Tree of Life, tree thinking

Dates

Published: 2021-03-06 10:35

Last Updated: 2021-08-10 04:29

License

CC-By Attribution-ShareAlike 4.0 International