Anomaly detection in metabarcoding amplicon reads using an LSTM-CNN deep neural network ensemble (MetAnoDe)

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Alexander Keller

Abstract

Metabarcoding has emerged as a critical tool in ecology and other scientific disciplines. It enables species identification in diverse samples for biodiversity monitoring, community and microbiome analysis, dietary studies, and the study of species interactions. However, errors and artifacts introduced during laboratory processes such as PCR and sequencing pose challenges. The vast number of sequences makes manual inspection impractical, so rapid algorithms are needed to clean the data. Thorough bioinformatic cleanup can reduce such errors by removing low-quality sequences or sequences that alignment classifies as non-fitting. In practice, however, some anomalous sequences evade detection, while some normal sequences are mistakenly removed. Deep neural networks (DNNs) offer a promising solution by recognizing complex patterns in DNA sequences. In this study I present new software, MetAnoDe (Metabarcoding Anomaly Detection), featuring novel deep-learning LSTM and CNN models that can be applied independently or combined as an ensemble model. MetAnoDe employs an alignment-free approach that complements existing tools and makes data cleanup more efficient. Here, the three models were trained for the bacterial 16S-V4 and plant ITS2 markers and can be readily reused in other studies. Cross-validation and tests on real-world data demonstrate high accuracy. Optimal integration into pipelines can also reduce overall runtime, working synergistically with current alignment-based methods. Thanks to the software's automated model training capability, MetAnoDe can also be adapted to other markers. In conclusion, MetAnoDe enhances metabarcoding by efficiently identifying anomalous sequences. Integrating DNNs with traditional approaches improves biodiversity estimates by reducing the inclusion of non-target sequences, yielding more accurate and comprehensive results.
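The abstract describes an alignment-free approach in which LSTM and CNN models classify reads directly and can be combined as an ensemble. The minimal Python sketch below illustrates that general idea under stated assumptions and is not MetAnoDe's implementation. It assumes that reads are one-hot encoded into fixed-length matrices, a common DNN input format for DNA that is not confirmed here as MetAnoDe's encoding. It also assumes that the ensemble flags a read as anomalous when the mean of its member models' anomaly probabilities reaches a threshold. The functions `one_hot` and `ensemble_flag`, the 300-nt input length and the 0.5 threshold are hypothetical. Trained LSTM and CNN models would replace the stand-in callables.

```python
from typing import Callable, List, Sequence

# Hypothetical encoding (assumption, not taken from MetAnoDe): each nucleotide
# becomes a 4-element one-hot vector; ambiguous bases such as N become all zeros.
ALPHABET = "ACGT"

# A "model" here is any callable mapping an encoded read to P(anomalous) in [0, 1].
Model = Callable[[List[List[int]]], float]


def one_hot(read: str, max_len: int) -> List[List[int]]:
    """Encode a read as a max_len x 4 matrix, truncating or zero-padding as needed."""
    matrix = [[1 if base == b else 0 for b in ALPHABET] for base in read.upper()[:max_len]]
    # Zero-pad shorter reads so every input has the same shape (no shared rows).
    matrix.extend([0, 0, 0, 0] for _ in range(max_len - len(matrix)))
    return matrix


def ensemble_flag(read: str, models: Sequence[Model],
                  max_len: int = 300, threshold: float = 0.5) -> bool:
    """Hypothetical ensemble: average the members' anomaly probabilities, then threshold."""
    x = one_hot(read, max_len)
    mean_score = sum(model(x) for model in models) / len(models)
    return mean_score >= threshold
```

Here is a usage example with two hypothetical stand-ins for trained models. The call `ensemble_flag("ACGT", [lambda x: 0.9, lambda x: 0.2])` averages to 0.55 and returns `True`. Averaging is only one way to combine classifiers, and MetAnoDe's actual ensemble strategy may differ. The repository listed below is the authoritative source.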

DOI

https://doi.org/10.32942/X2792N

Subjects

Life Sciences

Keywords

machine learning, microbiome, metabarcoding, 16S, ITS2, outlier detection, convolutional neural network, long short-term memory, recurrent neural networks

Dates

Published: 2025-03-20 12:06

License

CC BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International)

Additional Metadata

Language:
English

Data and Code Availability Statement:
https://github.com/chiras/MetAnoDe