Towards the next generation of species delimitation methods: an overview of Machine Learning applications

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.

Authors

Matheus Salles, Fabricius Domingos

Abstract

Species delimitation is the process of distinguishing between populations of the same species and distinct species within a particular group of organisms. Various methods exist for inferring species limits, based on morphological, molecular, or other types of data. Most methods based on DNA sequences are rooted in coalescent theory. However, coalescent-based models have limitations, especially with complex evolutionary scenarios, large datasets, and diverse types of genetic data. In this context, machine learning (ML) is a promising analytical tool that provides an effective way to explore dataset structure when species-level divergences are hypothesized. In this review, we examine the use of ML in species delimitation and provide an overview and critical appraisal of existing workflows. We also provide simple explanations of how the main types of ML approaches operate, which should help researchers and students new to the field. Our review suggests that while current ML methods designed to infer species limits are analytically powerful, they also present specific limitations and should not be considered definitive alternatives to coalescent methods for species delimitation. On the other hand, the variability among these methods might also represent an advantage, highlighting the flexibility of ML algorithms. Future studies should consider the constraints related to the use of simulated data, as with other model-based methods that rely on simulations. We also propose best practices for the use of ML methods in species delimitation and offer insights into potential future applications. We expect that the proposed guidelines will be useful for enhancing the accessibility, effectiveness, and objectivity of ML in species delimitation.

DOI

https://doi.org/10.32942/X2W313

Subjects

Biology, Computational Biology, Ecology and Evolutionary Biology, Genetics and Genomics

Keywords

Bioinformatics, molecular data, speciation, phylogenetics, phylogenomics, Artificial Intelligence, Deep learning

Dates

Published: 2023-12-07 12:20

Last Updated: 2024-10-09 03:47

License

CC BY Attribution 4.0 International

Additional Metadata

Language:
English

Data and Code Availability Statement:
Not applicable