Classification and regression trees clarify the role of epistasis and environment in genotype–phenotype maps

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Sudam Surasinghe, Swathi Nachiar Manivannan, Lorin Crawford, C. Brandon Ogbunugafor 

Abstract

Understanding how genetic variation translates into phenotypic outcomes is central to many sub-fields of genetics. This mapping from genotype to phenotype is complicated by a range of forces, including epistasis, environmental modulation of mutation effects, and ecological influences. In this study, we apply a unified decision tree approach, classification and regression trees (CART), to model genotype–phenotype relationships in protein fitness landscapes from a diversity of organisms: (i) a fluorescent protein isolated from Entacmaea quadricolor (bubble-tip anemone), (ii) antifolate resistance in Plasmodium falciparum (malaria parasite) dihydrofolate reductase (DHFR) under drug concentration gradients, (iii) allelic variants from the long-term evolution experiment (LTEE) in Escherichia coli, (iv) proteostasis-modulated drug resistance phenotypes in three bacterial orthologues of DHFR, and (v) chemotypic diversification of sesquiterpene synthases in Nicotiana tabacum (cultivated tobacco). Our results demonstrate that decision trees can effectively capture higher-order interactions between mutations and environments, uncovering nonlinear dependencies and contingencies that traditional parametric models often miss. By enabling clear visualization of interaction hierarchies, CART serves as both a predictive tool and an explanatory framework for genotype–phenotype mapping. The approach has a broad range of applications, from resolving the genomic architecture of biological traits to personalized medicine and bioengineering.
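The splitting logic behind CART can be illustrated with a minimal sketch. The toy landscape below is hypothetical (two biallelic sites, A and B, plus a binary drug environment E, with made-up phenotype values) and is not data from this study. The code is a bare-bones CART-style regression tree (greedy binary splits that minimize within-node squared error, with no pruning or cross-validation), not the authors' implementation; the names `toy_phenotype`, `grow`, `predict`, and `show` are illustrative only.

```python
from itertools import product
from statistics import mean

# Hypothetical toy landscape (NOT data from the preprint): two biallelic
# sites A and B plus a binary environment E (0 = no drug, 1 = drug).
FEATURES = ["A", "B", "E"]

def toy_phenotype(a, b, e):
    """Hypothetical genotype-by-environment rule with epistasis."""
    if e == 0:
        return 1.0                    # without drug, both mutations are neutral
    if a == 0:
        return 0.2                    # drug suppresses the A-wild-type background
    return 0.9 if b == 1 else 0.6     # B only helps on the A-mutant background

def sse(values):
    """Sum of squared errors around the mean (CART regression impurity)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, ys):
    """Return (feature_index, gain) for the binary split with the largest SSE reduction."""
    parent = sse(ys)
    best = (None, 0.0)
    for j in range(len(FEATURES)):
        left = [y for r, y in zip(rows, ys) if r[j] == 0]
        right = [y for r, y in zip(rows, ys) if r[j] == 1]
        if not left or not right:
            continue                  # feature is constant in this node
        gain = parent - sse(left) - sse(right)
        if gain > best[1] + 1e-12:
            best = (j, gain)
    return best

def grow(rows, ys, depth=0, max_depth=3):
    """Recursively grow a regression tree; each leaf stores the mean phenotype."""
    j, _ = best_split(rows, ys)
    if j is None or depth >= max_depth:
        return {"leaf": mean(ys)}
    left = [(r, y) for r, y in zip(rows, ys) if r[j] == 0]
    right = [(r, y) for r, y in zip(rows, ys) if r[j] == 1]
    return {
        "feature": FEATURES[j],
        "index": j,
        "left": grow([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow splits from the root to a leaf (0 goes left, 1 goes right)."""
    while "leaf" not in tree:
        tree = tree["right"] if row[tree["index"]] == 1 else tree["left"]
    return tree["leaf"]

def show(tree, indent=""):
    """Print the split hierarchy; earlier splits reduce more squared error."""
    if "leaf" in tree:
        print(f"{indent}-> {tree['leaf']:.2f}")
        return
    print(f"{indent}{tree['feature']} == 0:")
    show(tree["left"], indent + "  ")
    print(f"{indent}{tree['feature']} == 1:")
    show(tree["right"], indent + "  ")

rows = list(product([0, 1], repeat=3))   # all 8 (A, B, E) combinations
ys = [toy_phenotype(*r) for r in rows]
tree = grow(rows, ys)
show(tree)
```

On this toy input the root split falls on E, then on A within the drug branch, then on B within the A-mutant branch. The printed hierarchy therefore reads as "the environment gates whether A matters, and A gates whether B matters," the kind of genotype-by-environment contingency the abstract describes. For real data, an optimized CART implementation such as scikit-learn's `DecisionTreeRegressor` would typically be used instead of a hand-rolled tree.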

DOI

https://doi.org/10.32942/X2N643

Subjects

Life Sciences

Keywords

Population genetics, machine learning, epistasis, environmental epistasis

Dates

Published: 2025-09-23 17:37

Last Updated: 2025-09-23 17:37

License

CC BY-NC-ND 4.0 (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International)

Additional Metadata

Language:
English

Data and Code Availability Statement:
https://github.com/OgPlexus/Cartepistasis1