A universal DNA barcode for the Tree of Life

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.



Bruno A S de Medeiros, Liming Cai, Peter J Flynn, Yujing Yan, Xiaoshan Duan, Lucas C Marinho, Christiane Anderson, Charles Davis


Species identification using DNA barcodes has revolutionized biodiversity sciences and society at large. However, conventional barcoding methods do not reflect genomic complexity, may lack sufficient variation, and rely on a limited set of genomic loci that are not universal across the Tree of Life. Here, we develop a novel barcoding method that uses exceptionally low-coverage genome skim data to create a “varKode”, a two-dimensional image representing the genomic landscape of a species. Using these varKodes, we then train neural networks for precise taxonomic identification. Using an expertly annotated genomic dataset that includes hundreds of newly sequenced genomic samples from the plant clade Malpighiales, we demonstrate >91% precision when identifying species or genera. Remarkably, accuracy remains high even with data amounts so small that alternative methods fail. We further illustrate the broad utility of varKodes across several focal clades of eukaryotes and prokaryotes. As a final test, we classify the entire NCBI eukaryote sequence-read archive to identify its 861 constituent families with >95% precision while using less than 10 Mbp of data per sample. Enhanced computational efficiency and scalability, minimal data inputs that are robust to degraded DNA, and modularity for further development make varKoding an ideal approach for biodiversity science.
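To make the idea of a two-dimensional genomic image concrete, the following is a minimal Python sketch. It is not the varKoder implementation. It assumes, purely for illustration, that an image can be built by counting canonical k-mers in raw reads and writing the scaled counts into a grid in sorted order. The function names `canonical` and `kmer_image`, the k-mer size, the grid width, and the row-major layout are all hypothetical choices; the layout and normalization used by varKoder may differ.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_image(reads, k=3, width=8):
    """Count canonical k-mers in reads and arrange them as rows of 0-255 values.

    Hypothetical illustration only: pixel positions follow sorted k-mer order,
    which is an assumption and not the layout used by varKoder.
    """
    # Fixed pixel order: every possible canonical k-mer, sorted alphabetically.
    kmers = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})

    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1

    # Scale counts so the most frequent k-mer maps to 255.
    peak = max(counts.values(), default=0)
    pixels = [round(255 * counts[km] / peak) if peak else 0 for km in kmers]

    # Pad the last row with zeros so every row has the same width.
    pixels += [0] * (-len(pixels) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


if __name__ == "__main__":
    for row in kmer_image(["ACGTACGTTTGA", "TTTTAAAACCGN"]):
        print(row)
```

With k = 3 there are 32 canonical k-mers, so the sketch yields a 4 × 8 grid. An image classifier could then be trained on such grids, one per sample, with the taxon as the label.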




Bioinformatics, Computational Biology, Genomics, Other Ecology and Evolutionary Biology


biodiversity science, computer vision, DNA barcoding, Malpighiaceae, natural history collections, neural networks, species identification, taxonomy


Published: 2024-01-18 06:01

Last Updated: 2024-04-18 17:34


Creative Commons Attribution-NonCommercial 4.0

Additional Metadata


Conflict of interest statement:

Data and Code Availability Statement:
The current version of varKoder is available at https://github.com/brunoasm/varKoder. A fastai model pre-trained on SRA data is available at https://huggingface.co/brunoasm/vit_large_patch32_224.NCBI_SRA. Open data is not available, pending manuscript peer review.