Towards the next generation of species delimitation methods: an overview of Machine Learning applications

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Matheus Salles, Fabricius Domingos

Abstract

Species delimitation is the process of distinguishing between populations of the same species and distinct species of a particular group of organisms. Various methods exist for inferring species limits, most of them rooted in coalescent theory. Their primary goal is to identify independently evolving lineages that should represent separate species. Coalescent models have improved species delimitation by enabling explicit testing of hypotheses regarding evolutionary independence among lineages. However, they have some limitations, especially regarding complex evolutionary scenarios, large datasets, and varying genetic data types. In this context, machine learning (ML) is a promising analytical tool that provides an effective way to explore dataset structures when species-level divergences are hypothesised. In this review, we examine the use of ML in species delimitation and provide an overview and critical appraisal of existing workflows. We also provide simple explanations of how the main types of ML approaches operate, which should help researchers and students interested in the field. While current ML methods designed to infer species limits are analytically powerful, they also present specific limitations and should not be considered definitive alternatives to traditional coalescent methods for species delimitation. For instance, there are clear limitations regarding the utilisation of simulated data, especially in supervised and deep learning approaches, and the type of data representation used by each ML approach. We then discuss the strengths and weaknesses of existing pipelines, propose best practices for the use of ML methods in species delimitation, and offer insights into potential future applications. Generative adversarial networks and domain adaptation techniques, for instance, could be used to partially address the misspecification issue related to simulating genetic data. Moreover, integrating ML methods into the hypothesis testing process, alongside available coalescent-based methods, could enable a more comprehensive exploration of evolutionary models and parameters, improving the accuracy and biological interpretability of species delimitation analyses. Additionally, we suggest guidelines for enhancing the accessibility, effectiveness, and objectivity of ML in species delimitation processes, aiming to offer a transformative perspective on this subject.

DOI

https://doi.org/10.32942/X2W313

Subjects

Biology, Computational Biology, Ecology and Evolutionary Biology, Genetics and Genomics

Keywords

Bioinformatics, molecular data, speciation, phylogenetics, phylogenomics, artificial intelligence, deep learning

Dates

Published: 2023-12-07 11:20

Last Updated: 2023-12-07 11:20

License

CC BY Attribution 4.0 International

Additional Metadata

Language:
English

Data and Code Availability Statement:
Not applicable