The Site/Group Extended Data format and tools

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Julien Yann Dutheil, Diyar Hamidi, Basile Pajot

Abstract

Comparative sequence analysis permits unravelling the molecular processes underlying gene evolution. Many statistical methods generate candidate positions within genes, such as fast- or slow-evolving sites, coevolving groups of residues, and sites undergoing positive selection or changes in evolutionary rate. Understanding the functional causes of these evolutionary patterns requires combining the results of these analyses and mapping them onto molecular structures, a complex task involving distinct coordinate reference systems. To ease this task, we introduce the site/group extended data (SGED) format, a simple text format to store annotations of sites or groups of sites. We developed a toolset, the SgedTools, which permits manipulating SGED files, creating them from the output of various software, and translating coordinates between individual sequences, alignments, and three-dimensional structures. The package also includes a Monte-Carlo procedure to generate random site samples, possibly conditioning on site-specific features. This eases the statistical testing of evolutionary hypotheses while accounting for the structural properties of the encoded molecules.
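
As an illustration of what sampling random sites "conditioning on site-specific features" can mean, the following minimal Python sketch draws random site sets whose members fall in the same feature bins as a set of candidate sites. It is not the SgedTools implementation: the function names, the equal-width binning, the exclusion of candidate sites from the random pool, and the toy feature values are all assumptions made for this example.

```python
import random
from collections import defaultdict


def make_binner(features, n_bins):
    """Return a function mapping a site to an equal-width feature bin."""
    lo = min(features.values())
    hi = max(features.values())
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width if all values are equal

    def bin_of(site):
        # The maximum value is folded into the last bin.
        return min(int((features[site] - lo) / width), n_bins - 1)

    return bin_of


def sample_matched_sites(candidates, features, n_bins=4, rng=None):
    """Draw one random site per candidate, from the candidate's feature bin.

    Sites are drawn without replacement, and candidate sites themselves
    are excluded from the pool (an assumption of this sketch).
    """
    rng = rng or random.Random()
    bin_of = make_binner(features, n_bins)
    excluded = set(candidates)

    # Group all non-candidate sites by feature bin.
    pools = defaultdict(list)
    for site in features:
        if site not in excluded:
            pools[bin_of(site)].append(site)

    sample = []
    for site in candidates:
        pool = pools[bin_of(site)]
        if not pool:
            raise ValueError(f"no site left to match candidate {site}")
        sample.append(pool.pop(rng.randrange(len(pool))))
    return sample


# Hypothetical data: 20 sites with a made-up feature scaled to [0, 1],
# standing in for something like relative solvent accessibility.
features = {site: (site % 5) / 4 for site in range(1, 21)}
candidates = [1, 2, 3]

# 100 random replicates forming a null distribution of site sets.
null_samples = [
    sample_matched_sites(candidates, features, rng=random.Random(seed))
    for seed in range(100)
]
```

A statistic computed on the candidate sites (for example, their mean pairwise distance in a structure) can then be compared with the same statistic computed on each random replicate to obtain an empirical p-value.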

DOI

https://doi.org/10.32942/X26K70

Subjects

Life Sciences

Keywords

molecular evolution, Bioinformatics, Data analysis, Three-dimensional structure, randomization

Dates

Published: 2023-11-30 10:39

Last Updated: 2023-11-30 15:39

License

CC BY Attribution 4.0 International

Additional Metadata

Language:
English

Conflict of interest statement:
None.

Data and Code Availability Statement:
Open data and code available at https://github.com/jydu/sgedtools/