(l-r) Ricky Chen, Fenix Huang, Christian Reidys, Reza Rezazadegen, Qijun He, Thomas Li, Andrei Bura
Information is data that has been organized or presented in a meaningful way. Mathematical Biocomplexity research focuses on extracting information from biological systems and networks. In particular, we are interested in the information stored at the structural level. Structural information is crucial to the functioning and behavior of a biological system. However, interpreting structural information is challenging when the biological system is large or when it has a complicated “shape.”
We formulate mathematical principles that underpin the complexity of structural information and develop efficient algorithms to tackle this complexity.
Our findings lay the foundation for quantitative analysis of neutral evolution, molecular interactions, and network dynamics.
Genetic Data Information
DNA nucleotide information is transcribed into RNA, which is stabilized by folding into specific structures. In many interactions, it is the folding configuration, not the particular nucleotide sequence, that determines biological function. On the one hand, sequences such as viral genomes and noncoding RNAs can mutate significantly and still preserve their phenotype. On the other hand, sequences such as riboswitches can switch between two phenotypes without any change in sequence. Understanding the sequence-structure relation and the "hidden" information it carries remains challenging.
Our research focuses on extracting information embedded in sequence-structure pairs that is not readily discoverable by sequence alignment. Specifically, we use the folding map, which takes RNA sequences to secondary structures, to understand how sequence mutations affect phenotype and vice versa. This line of work gives unique insight into evolutionary dynamics involving both sequences and structures.
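To make the folding map and mutational robustness concrete, here is a minimal Python sketch under loudly stated assumptions. A toy folding map stands in for a real energy-based one: the Nussinov algorithm, which only maximizes the number of canonical base pairs (AU, GC, GU) subject to a minimum hairpin loop of 3. Robustness is approximated as the fraction of single-point mutants that fold to the same structure as the wild type. The names `nussinov_fold`, `point_mutants`, and `neutrality` are hypothetical.

```python
"""Toy folding map and single-point mutational robustness (illustrative only)."""

# Canonical Watson-Crick and wobble pairs (assumed pairing rule).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_fold(seq: str, min_loop: int = 3) -> str:
    """Dot-bracket structure maximizing the number of base pairs.

    Hypothetical stand-in for an energy-based folding map.
    """
    n = len(seq)
    # dp[i][j] = maximum number of pairs in seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback to recover one optimal structure.
    struct = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    struct[k], struct[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(struct)


def point_mutants(seq: str, alphabet: str = "ACGU"):
    """Yield every sequence at Hamming distance 1 from seq."""
    for i, base in enumerate(seq):
        for b in alphabet:
            if b != base:
                yield seq[:i] + b + seq[i + 1:]


def neutrality(seq: str, fold=nussinov_fold) -> float:
    """Fraction of single-point mutants whose fold equals the wild-type fold."""
    target = fold(seq)
    mutants = list(point_mutants(seq))
    same = sum(1 for m in mutants if fold(m) == target)
    return same / len(mutants)


wild_type = "GGGAAAUCC"
print(nussinov_fold(wild_type))  # (((...)))
print(neutrality(wild_type))
```

Replacing `nussinov_fold` with a thermodynamic minimum-free-energy predictor would give a more realistic estimate, and multiple-point mutations can be handled by enumerating mutants at larger Hamming distance.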
- We developed Boltzmann samplers of sequence-structure pairs with various biologically meaningful constraints (Hamming distance filtration, genus filtration, etc.). These samplers facilitate the computational study of the shape-modulated evolution of biomolecules, as well as the design of functional noncoding genes.
- We studied the mutational robustness of let-7 miRNAs, in particular, with respect to multiple-point mutations. We showed that native let-7 genes exhibit higher mutational robustness compared to random inverse-folded sequences. We further demonstrated the connection between the robustness of let-7 miRNAs and cell differentiation across various organisms.
- We are investigating the mutational robustness of let-7 genes in cancer patients and comparing it to that of healthy individuals. Cancer cells are poorly differentiated compared to normal cells. Because let-7 genes play an important role in cell differentiation, this line of work has potential applications in predicting, preventing, and treating cancers associated with malfunctioning let-7 genes.
- We are studying the energy-weighted density of sequences that are bi-compatible with the alternative structures of a riboswitch, which reveals phenotypic transition signals useful for identifying riboswitches. These transition signals also help us understand irreversible sequence mutations in disease.
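Hamming-distance neighborhoods and bi-compatibility can be illustrated with a short Python sketch. Here a sequence is assumed to be compatible with a secondary structure when every base pair is canonical (AU, GC, or GU), and bi-compatible when this holds for two alternative structures, such as the two conformations of a riboswitch. The sketch only enumerates single-point mutants that stay bi-compatible; it does not compute the energy-weighted density. All function names are hypothetical.

```python
# Canonical Watson-Crick and wobble pairs (assumed compatibility rule).
ALLOWED = {"AU", "UA", "GC", "CG", "GU", "UG"}


def parse_pairs(db: str) -> list[tuple[int, int]]:
    """Base pairs (i, j), i < j, of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {j}")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '('")
    return sorted(pairs)


def is_compatible(seq: str, pairs) -> bool:
    """True if every base pair of the structure is canonical in seq."""
    return all(seq[i] + seq[j] in ALLOWED for i, j in pairs)


def bicompatible_neighbors(seq: str, s1: str, s2: str) -> list[str]:
    """Single-point mutants of seq that remain compatible with both s1 and s2."""
    p1, p2 = parse_pairs(s1), parse_pairs(s2)
    result = []
    for i, base in enumerate(seq):
        for b in "ACGU":
            if b == base:
                continue
            mutant = seq[:i] + b + seq[i + 1:]
            if is_compatible(mutant, p1) and is_compatible(mutant, p2):
                result.append(mutant)
    return result


seq, s1, s2 = "GGGAACCC", "((....))", ".((..))."
print(is_compatible(seq, parse_pairs(s1)), is_compatible(seq, parse_pairs(s2)))
print(bicompatible_neighbors(seq, s1, s2))
```

Positions unpaired in both structures mutate freely, while a paired position only accepts bases that still form an AU, GC, or GU pair with its partner in every structure where it is paired.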
Understanding Large-Scale Networks
We study the complexity arising from large-scale networks. One example is long non-coding RNAs (lncRNAs, typically 1-20 kb long) viewed as contact graphs. lncRNAs have recently been found to be pervasively transcribed in human and other mammalian genomes, and they are increasingly associated with networks of epigenetic and post-transcriptional control in both healthy and diseased biological systems. Understanding the secondary structure of lncRNAs is key to unveiling their diverse functional roles in gene regulation. In particular, long-range intramolecular interactions in these large molecules are poorly understood and make predicting lncRNA secondary structures a significant challenge.
We investigate the Boltzmann ensemble of secondary structures generated by statistical sampling and identify structural features by extracting information from this ensemble. We use an integrated experimental and computational approach to understand the secondary structure of lncRNAs. Our research facilitates the discovery of the regulatory roles of lncRNAs and their implications as molecular markers of diseases such as cancer.
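Extracting information from a sampled ensemble can start with counting how often each base pair occurs. The Python sketch below assumes the samples are already available as dot-bracket strings (for example, from a Boltzmann sampler, which is not shown) and estimates each base-pair probability by its sample frequency. Keeping the pairs that appear in more than half of the samples gives the sample centroid structure; these pairs never conflict, because any two of them must co-occur in at least one sampled structure. The function names are hypothetical.

```python
from collections import Counter


def parse_pairs(db: str) -> list[tuple[int, int]]:
    """Base pairs (i, j) of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return pairs


def pair_frequencies(samples: list[str]) -> dict[tuple[int, int], float]:
    """Estimate base-pair probabilities as frequencies over sampled structures."""
    counts = Counter()
    for db in samples:
        counts.update(parse_pairs(db))
    return {pair: c / len(samples) for pair, c in counts.items()}


def centroid(freqs: dict[tuple[int, int], float], length: int) -> str:
    """Dot-bracket string of all pairs with estimated probability > 0.5."""
    struct = ["."] * length
    for (i, j), f in freqs.items():
        if f > 0.5:
            struct[i], struct[j] = "(", ")"
    return "".join(struct)


samples = ["((..))", "((..))", "(....)", "......"]
freqs = pair_frequencies(samples)
print(sorted(freqs.items()))  # [((0, 5), 0.75), ((1, 4), 0.5)]
print(centroid(freqs, 6))     # (....)
```

With real lncRNA data, the same frequencies could be compared against experimental probing signals position by position.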
- We analyzed the distributions of various structural elements in lncRNAs, such as stacks, hairpin, interior, and multi-loops, and pseudoknots.
- We analyzed the length spectrum of base pairs in large RNA secondary structures. Long-range base pairs span distances on the order of the sequence length.
- We developed a computational approach to identify the key features of base-pairing interactions, by extracting information from large-scale statistical sampling of secondary structures.
- We are developing an information-theoretic framework that incorporates key structural features and experimental probing data in a hierarchical fashion to determine the secondary structure of lncRNAs.
- We further studied the relationship between the networks of base-pair interactions in the ensemble and the dynamics generated by sequentially updating information from experimental probing data.
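The length spectrum of base pairs can be computed directly from a structure. In this Python sketch the length of a base pair (i, j) is taken to be j - i, structures are given in pseudoknot-free dot-bracket notation, and a pair is called long-range when its length is at least a chosen fraction `alpha` of the sequence length; the cutoff and the function names are illustrative assumptions.

```python
from collections import Counter


def parse_pairs(db: str) -> list[tuple[int, int]]:
    """Base pairs (i, j) of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return pairs


def length_spectrum(db: str) -> Counter:
    """Histogram mapping base-pair length j - i to the number of such pairs."""
    return Counter(j - i for i, j in parse_pairs(db))


def long_range_fraction(db: str, alpha: float = 0.5) -> float:
    """Fraction of base pairs whose length is at least alpha * len(db)."""
    lengths = [j - i for i, j in parse_pairs(db)]
    if not lengths:
        return 0.0
    cutoff = alpha * len(db)
    return sum(1 for d in lengths if d >= cutoff) / len(lengths)


db = "((..))..(.....)"
print(sorted(length_spectrum(db).items()))  # [(3, 1), (5, 1), (6, 1)]
print(long_range_fraction(db, alpha=0.3))   # 0.6666666666666666
```

Aggregating `length_spectrum` over many sampled structures gives an ensemble-level spectrum rather than that of a single structure.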
Topological Complexity of Interacting Systems
The complexity of an interacting system also arises from its "shape", i.e., its topological complexity. Cross-serial interactions (pseudoknots), as well as multiple-structure interactions (riboswitches), are crucial to the function of biomolecules. The increased complexity of these interactions brings new challenges as well as hints at new methods for understanding them. We construct mathematical models to measure the topological complexity of interacting systems and tackle such complexity using topological recursion. Our research can facilitate the detection and design of functional genes whose structures rank high on this topological complexity scale.
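One standard measure of the topological complexity of an RNA diagram is its genus, the quantity behind the genus filtration mentioned earlier. Collapsing the backbone to a single vertex turns a diagram with n arcs into a fatgraph with one vertex, n edges, and r boundary components, so Euler's formula 2 - 2g = 1 - n + r gives g = (1 + n - r)/2. The Python sketch below counts boundary components as the cycles of the permutation k -> sigma(k) + 1 (mod 2n) on the relabeled arc endpoints, where sigma swaps the two ends of each arc. It illustrates this textbook construction and is not necessarily how our own tools compute genus; the names are hypothetical, and pseudoknots are written with extra bracket types.

```python
def parse_pairs(db: str, brackets: str = "()[]{}<>") -> list[tuple[int, int]]:
    """Base pairs of a dot-bracket string; extra bracket types encode pseudoknots."""
    openers = {brackets[k]: brackets[k + 1] for k in range(0, len(brackets), 2)}
    closers = {c: o for o, c in openers.items()}
    stacks = {o: [] for o in openers}
    pairs = []
    for j, ch in enumerate(db):
        if ch in openers:
            stacks[ch].append(j)
        elif ch in closers:
            pairs.append((stacks[closers[ch]].pop(), j))
    return sorted(pairs)


def genus(pairs: list[tuple[int, int]]) -> int:
    """Genus of the arc diagram formed by the given base pairs."""
    if not pairs:
        return 0
    ends = sorted(x for pair in pairs for x in pair)
    index = {pos: k for k, pos in enumerate(ends)}  # relabel endpoints 0..2n-1
    m = len(ends)
    sigma = [0] * m  # arc involution on relabeled endpoints
    for i, j in pairs:
        sigma[index[i]], sigma[index[j]] = index[j], index[i]
    # Boundary components = cycles of (backbone rotation) composed with sigma.
    seen = [False] * m
    r = 0
    for start in range(m):
        if not seen[start]:
            r += 1
            k = start
            while not seen[k]:
                seen[k] = True
                k = (sigma[k] + 1) % m
    return (1 + len(pairs) - r) // 2


print(genus(parse_pairs("((..))")))          # 0  (nested, secondary structure)
print(genus(parse_pairs("((..[[..))..]]")))  # 1  (H-type pseudoknot)
```

Every pseudoknot-free structure has genus 0, so a genus filtration can be implemented by rejecting sampled or designed structures whose genus exceeds a chosen bound.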
- We apply simplicial homology to study the topological trace between two RNA secondary structures. This trace captures the mutually exclusive substructures that enable the "switching" mechanism in riboswitches. The trace can also be used to distinguish ncRNAs from different classes.
- We derived a novel method for transforming cross-serial interactions into cross-free interactions. The method facilitates fast Boltzmann sampling and statistical analysis of RNA pseudoknot structures.
- We are developing topological approaches for identifying mutually exclusive substructures in the structure ensemble of a given sequence, locating switching sequences, and detecting potential riboswitches.
- We are designing efficient algorithms to construct riboswitch sequences with two desired stable configurations.
- We are extending the homology analysis to planar interaction structures in order to understand the interplay between ligand binding and folding.
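A much simpler, purely combinatorial check can build intuition for mutually exclusive substructures. The Python sketch below is not the simplicial-homology method described above: it flags the base pairs of one structure that cannot coexist with some base pair of an alternative structure, assuming that two pairs are incompatible when they share a nucleotide or cross (i < k < j < l), as in pseudoknot-free folding. Function names are hypothetical.

```python
def parse_pairs(db: str) -> list[tuple[int, int]]:
    """Base pairs (i, j), sorted, of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return sorted(pairs)


def conflict(p: tuple[int, int], q: tuple[int, int]) -> bool:
    """True if base pairs p and q cannot both form in a pseudoknot-free structure."""
    if p == q:
        return False
    (i, j), (k, l) = sorted([p, q])
    if {i, j} & {k, l}:
        return True  # the pairs share a nucleotide
    return i < k < j < l  # the pairs cross


def exclusive_pairs(s1: str, s2: str):
    """Pairs of s1 and of s2 that conflict with at least one pair of the other."""
    p1, p2 = parse_pairs(s1), parse_pairs(s2)
    only1 = [p for p in p1 if any(conflict(p, q) for q in p2)]
    only2 = [q for q in p2 if any(conflict(q, p) for p in p1)]
    return only1, only2


print(exclusive_pairs("((....))....", "....((....))"))
# ([(0, 7), (1, 6)], [(4, 11), (5, 10)])
```

In a riboswitch-like pair of structures, the flagged helices are candidates for the switching region, while pairs that are never flagged can be shared by both conformations.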