Best practices in designing, sequencing and identifying random DNA barcodes

This is a Preprint and has not been peer reviewed. The published version of this Preprint is available. This is version 1 of this Preprint.


Milo S. Johnson, Sandeep Venkataram, Sergey Kryazhimskiy


Random DNA barcodes are a versatile tool for tracking cell lineages, with applications ranging from development to cancer to evolution. Here we review and critically evaluate barcode designs as well as methods of barcode sequencing and initial processing of barcode data. We first demonstrate how various barcode design decisions affect data quality and propose a new optimal design that balances all considerations that we are currently aware of. We then discuss various options for the preparation of barcode sequencing libraries, including inline indices and Unique Molecular Identifiers (UMIs). Our main conclusion is that the utility of inline indices is high whereas that of UMIs is low. Finally, we test the performance of several established and new bioinformatic pipelines for the extraction of barcodes from raw sequencing reads and for error correction. We find that both alignment and regular expression-based approaches work well for barcode extraction, and that error correction pipelines designed specifically for barcode data are superior to generic ones. Overall, this review will help researchers approach their barcoding experiments in a deliberate and systematic way.
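As an illustration of the regular expression-based approach to barcode extraction mentioned above, the following is a minimal sketch, not the authors' pipeline. It assumes a barcode of fixed length sitting between two constant flanking sequences; the flank sequences, the barcode length, and the function names are hypothetical and chosen only for the example.

```python
import re
from collections import Counter

# Hypothetical constant flanks and barcode length (assumptions for this sketch;
# a real barcode locus will have its own sequences and length).
LEFT_FLANK = "ACGTAC"
RIGHT_FLANK = "TTGCAA"
BARCODE_LEN = 8

# Capture exactly BARCODE_LEN unambiguous bases between the two flanks.
BARCODE_RE = re.compile(
    f"{LEFT_FLANK}([ACGT]{{{BARCODE_LEN}}}){RIGHT_FLANK}"
)


def extract_barcode(read):
    """Return the barcode found between the flanks, or None if there is no match."""
    match = BARCODE_RE.search(read)
    return match.group(1) if match else None


def count_barcodes(reads):
    """Count extracted barcodes across reads, skipping reads that do not match."""
    counts = Counter()
    for read in reads:
        barcode = extract_barcode(read)
        if barcode is not None:
            counts[barcode] += 1
    return counts


example_reads = [
    "GGACGTACAAAACCCCTTGCAAGG",  # matches, barcode AAAACCCC
    "ACGTACAAAACCCCTTGCAA",      # matches, same barcode
    "ACGTACAAANCCCCTTGCAA",      # contains N inside the barcode, rejected
    "GGGGGGGGGGGGGGGGGGGG",      # no flanks, rejected
]
barcode_counts = count_barcodes(example_reads)
```

This sketch requires exact matches to the flanks and performs no error correction, so reads with sequencing errors in the flanks or the barcode are discarded or counted as distinct barcodes; handling such errors is a separate step.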



Subjects: Bioinformatics, Biotechnology, Cell and Developmental Biology, Ecology and Evolutionary Biology, Evolution, Life Sciences


Keywords: barcodes, development, evolution, lineage tracking


Published: 2022-09-28 13:21


License: CC-By Attribution-ShareAlike 4.0 International