Past and future uses of text mining in ecology & evolution

Maxwell Jenner Farrell; Liam Brierley; Anna Willoughby; Andrew Yates; Nicole Mideo

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.

Abstract

Ecology and evolutionary biology, like other scientific fields, are experiencing exponential
growth in the number of academic manuscripts. As domain knowledge accumulates, scientists will need
new computational approaches for identifying relevant literature to read and include in
formal literature reviews and meta-analyses. Importantly, these approaches can also
facilitate automated, large-scale data synthesis tasks and build structured databases from
the information in the texts of primary journal articles, books, grey literature, and
websites. The increasing availability of digital text, computational resources, and
machine-learning-based language models has led to a revolution in text analysis and
Natural Language Processing (NLP) in recent years. NLP has been widely adopted across
the biomedical sciences, but is rarely used in ecology and evolutionary biology. Applying
computational tools from text mining and NLP will increase the efficiency of data synthesis,
improve the reproducibility of literature reviews, formalize analyses of research biases and
knowledge gaps, and promote data-driven discovery of patterns across ecology and
evolutionary biology. Here we present recent use cases from ecology and evolution, and
discuss future applications, limitations, and ethical issues.

DOI

https://doi.org/10.32942/osf.io/c4kvq

Subjects

Biodiversity, Bioinformatics, Ecology and Evolutionary Biology, Life Sciences

Keywords

biodiversity science, computational linguistics, database construction, document classification, information extraction, information retrieval, named entity recognition, natural language processing, NLP, relation extraction, topic model

Dates

Published: 2022-02-16 12:03

Last Updated: 2022-04-05 05:59

Older Versions

Version 1 - 2022-02-16

License

Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)