Motif-weighted Structure Alignment for Classification and Evolutionary Studies of Carbonic Anhydrase

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Authors

Hongyi Shi

Abstract

Carbonic anhydrases (CAs) attract interest for their critical roles in various physiological processes and their potential application in CO2 sequestration to combat global warming. Although CAs are an important enzyme family, their classification and evolution remain elusive because of their high sequence diversity and long evolutionary history. In this paper, an in silico strategy, Motif-weighted Alignment for Structure-based Protein Classification (MASPC), was developed. It combines CA structures predicted by OmegaFold with a weighted structural motif alignment, TM-weighted, to enable more precise and robust polymorphic analysis of large enzyme datasets. MASPC was first validated on 74 ground-truth CA structures extracted from the PDB, where it showed improved performance compared to sequence-based polymorphic analysis (ClustalO-RAxML). Subsequently, MASPC was applied to a representative database of 1603 CAs from 117 model organisms, focusing on the α-, β-, and γ-CA classes, to cover organisms from across the evolutionary history of life. The results indicated that α-, β-, and γ-CAs grouped well into their own classes, with clearer clustering associated with each CA’s source organism. The structural differences among the α-, β-, and γ-CAs revealed by MASPC support the current understanding that the CA classes are the result of convergent evolution. The sub-clusters in α- and β-CAs are highly associated with organisms according to their appearance in evolutionary history, demonstrating a close correlation between CA evolution and the evolution of life. Furthermore, MASPC was applied to identify 27 potential α-CAs from the NCBI database with less than 40% sequence similarity to a template human carbonic anhydrase II (HCA-II) sequence, demonstrating possible applications in enzyme identification studies.
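
As a rough illustration of what a motif-weighted structural score could look like, the sketch below extends the standard TM-score sum with per-residue weights. This is an assumption made for illustration only: the abstract does not give the TM-weighted formula, so the function names (`tm_d0`, `weighted_tm_score`), the weighting scheme, and the normalization by total weight are hypothetical and may differ from the code in the linked repository. The sketch also scores a single fixed superposition, whereas the standard TM-score is maximized over superpositions.

```python
def tm_d0(length):
    """Standard TM-score distance scale d0 (angstroms) for a target of `length` residues."""
    # The cube-root formula is not meaningful for very short chains,
    # so a floor of 0.5 is used there (an assumption of this sketch).
    if length <= 21:
        return 0.5
    return max(0.5, 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8)


def weighted_tm_score(distances, weights):
    """Weighted TM-style score for one fixed superposition.

    distances: one entry per target residue, giving the distance (angstroms)
               to its aligned partner, or None if the residue is unaligned.
    weights:   one non-negative weight per target residue; hypothetical motif
               residues would receive larger weights than the rest.
    """
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    d0 = tm_d0(len(weights))
    score = 0.0
    for d, w in zip(distances, weights):
        if d is None:
            continue  # unaligned residues contribute nothing, as in TM-score
        score += w / (1.0 + (d / d0) ** 2)
    # Dividing by the total weight keeps the score in [0, 1]; with uniform
    # weights this reduces to the usual (1/L) * sum(1 / (1 + (d/d0)^2)).
    return score / total_weight
```

With uniform weights the function returns the ordinary TM-score term for the given superposition. Raising the weights of selected residues makes agreement at those positions count for more, which is the general idea the abstract describes for motif weighting.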

DOI

https://doi.org/10.32942/X25S7R

Subjects

Bioinformatics, Life Sciences

Keywords

Protein, alignment, evolution, Carbonic Anhydrase, carbon capture

Dates

Published: 2025-02-24 10:01

Last Updated: 2025-02-24 10:01

License

CC BY Attribution 4.0 International

Additional Metadata

Language:
English

Conflict of interest statement:
None

Data and Code Availability Statement:
Data and code are available at https://github.com/resplendentHSHI/TMweighted