The changing landscape of text mining: a review of approaches for ecology and evolution

This is a preprint and has not been peer reviewed. This is version 1 of this preprint.

Authors

Maxwell J Farrell, Nicolas Le Guillarme, Liam Brierley, Bronwen Hunter, Daan Scheepens, Anna Willoughby, Andrew Yates, Nicole Mideo

Abstract

In ecology and evolutionary biology, synthesis and modelling of data from published literature is a common practice for generating insight and testing theories across systems. However, the tasks of searching, screening, and extracting data from literature are often arduous. Researchers may manually process hundreds to thousands of articles for systematic reviews, meta-analyses, and the compilation of synthetic datasets. As the number of relevant articles grows to tens or hundreds of thousands, computer-based approaches can increase efficiency and dramatically improve the transparency and reproducibility of literature-based research. Methods available for text mining are changing rapidly due to developments in machine learning-based language models. Here we review the growing landscape of approaches, mapping them onto three broad paradigms (frequency-based approaches, traditional natural language processing, and deep learning-based language models). This review serves as an entry point for learning foundational and cutting-edge concepts, vocabularies, and methods, and aims to foster better integration of these tools into ecological and evolutionary research. We discuss approaches for modelling ecological texts, generating training data, developing custom models, and interacting with Large Language Models, and we present challenges and possible solutions to implementing these methods in ecology and evolution.

DOI

https://doi.org/10.32942/X2VG87

Subjects

Ecology and Evolutionary Biology, Life Sciences

Keywords

natural language processing, large language models, deep learning, literature synthesis, information extraction, database construction

Dates

Published: 2024-02-20 22:13

License

Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Additional Metadata

Language:
English

Conflict of interest statement:
None

Data and Code Availability Statement:
N/A