Handling Character Dependency in Phylogenetic Inference: Extensive Performance Testing of Assumptions and Solutions Using Simulated Data

Tiago R. Simões; Oksana V. Vernygora; Bruno A.S. de Medeiros; April Marie Wright

This is a Preprint and has not been peer reviewed. This is version 2 of this Preprint.

Abstract

Character dependency is a major conceptual and methodological problem in phylogenetic inference of morphological datasets, as it violates the assumption of character independence that is common to all phylogenetic methods. It is more frequently observed in higher-level phylogenies or in datasets characterizing major evolutionary transitions, as these represent parts of the tree of life where (primary) anatomical characters either originate or disappear entirely. As a result, secondary traits related to these primary characters become “inapplicable” across all sampled taxa in which the primary character is absent. Various solutions have been explored over the last three decades to handle character dependency, such as alternative character coding schemes and, more recently, new algorithmic implementations. However, neither the accuracy of the proposed solutions nor the impact of character dependency across distinct optimality criteria has been directly tested using standard performance measures. Here, we use simple and complex simulated morphological datasets, analyzed under different maximum parsimony optimization procedures and Bayesian inference, to test the accuracy of various coding and algorithmic solutions to character dependency. We find that in small simulated datasets, absent coding performs better than the other popular coding strategies available (contingent and multistate), whereas in more complex simulations (larger datasets controlled for different tree structure and character distribution models) contingent coding is favored more frequently. Under contingent coding, a recently proposed weighting algorithm produces the most accurate results for maximum parsimony. However, Bayesian inference outperforms all parsimony-based solutions to character dependency owing to fundamental differences in their optimization procedures, making it a simple alternative that has long been overlooked. Yet, we show that the more primary characters bearing secondary (dependent) traits there are in a dataset, the harder it is to estimate the true phylogenetic tree, regardless of the optimality criterion, owing to a considerable expansion of the tree parameter space.
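
To make the three coding strategies named in the abstract concrete, the sketch below scores one toy primary character (tail absent/present) and one dependent character (tail colour) under each scheme. The taxa, the character, and the function names are hypothetical, and the scheme definitions follow common usage in the morphological phylogenetics literature rather than the preprint's own text, so treat this as an illustration under those assumptions and not as the authors' exact procedure.

```python
# Hypothetical toy data: tail colour per taxon, None when the tail is absent.
TAXA = {
    "taxon_A": None,    # no tail, so tail colour is inapplicable
    "taxon_B": "red",
    "taxon_C": "blue",
}
COLOUR_STATES = ["red", "blue"]


def contingent_coding(colour):
    """Two characters: tail presence, then colour scored '-' (inapplicable)
    when the tail is absent. Assumed definition, from common usage."""
    if colour is None:
        return ["0", "-"]
    return ["1", str(COLOUR_STATES.index(colour))]


def absent_coding(colour):
    """Two characters: tail presence, then colour with an extra 'absent'
    state (0) replacing the inapplicable score. Assumed definition."""
    if colour is None:
        return ["0", "0"]
    return ["1", str(COLOUR_STATES.index(colour) + 1)]


def multistate_coding(colour):
    """One composite character: 0 = tail absent, 1 = red tail, 2 = blue tail.
    Assumed definition."""
    if colour is None:
        return ["0"]
    return [str(COLOUR_STATES.index(colour) + 1)]


def code_matrix(scheme):
    """Apply one coding scheme to every taxon and return the score matrix."""
    return {name: scheme(colour) for name, colour in TAXA.items()}


if __name__ == "__main__":
    for scheme in (contingent_coding, absent_coding, multistate_coding):
        print(scheme.__name__, code_matrix(scheme))
```

Under these assumed definitions, only contingent coding keeps an explicit inapplicable score; absent coding records the absence of the tail in two characters, and multistate coding merges presence and colour into a single character.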

DOI

https://doi.org/10.32942/osf.io/r23j8

Subjects

Animal Sciences, Bioinformatics, Ecology and Evolutionary Biology, Evolution, Life Sciences, Research Methods in Life Sciences

Dates

Published: 2022-04-07 20:16

Last Updated: 2022-04-08 00:28

Older Versions

Version 1 - 2022-04-07

License

CC-By Attribution-ShareAlike 4.0 International

Additional Metadata

Data and Code Availability Statement:
Supplementary data for peer-review purposes deposited in Dryad. Available upon request.