(l-r) Ricky Chen, Fenix Huang, Christian Reidys, Reza Rezazadegen, Qijun He, Thomas Li, Andrei Bura
Information is data that has been organized or presented in a meaningful way. Mathematical Biocomplexity research focuses on extracting information from biological systems and networks. In particular, we are interested in the information stored at the structural level. Structural information is crucial to the functioning and behavior of a biological system. However, interpreting structural information is challenging when the biological system is large or when it has a complicated “shape.”
We formulate mathematical principles that underpin the complexity of structural information and develop efficient strategies/algorithms to tackle this high complexity.
Our findings lay the foundation for quantitative analysis of neutral evolution, molecular interactions, and network dynamics.
Genetic Data Information
DNA nucleotide information is transcribed into RNA, which is stabilized by molecular folding. In a plethora of interactions, it is the specific folding configuration and not the particular sequence of nucleotides that determines biological functionality. On the one hand, genetic sequences such as viral sequences and noncoding RNAs can mutate significantly and still preserve their phenotype. On the other hand, sequences such as riboswitches can switch between two phenotypes without any sequence change. Understanding the sequence-structure relation, as well as the "hidden" information it carries, proves to be challenging.
Our research focuses on extracting information embedded in the sequence-structure pair that is not readily discoverable using sequence alignment. Specifically, we use the folding map, which takes RNA sequences to secondary structures, to understand the impact of sequence mutations on phenotypic changes and vice versa. This line of work gives unique insight into evolutionary dynamics involving both sequences and structures.
- We developed Boltzmann samplers of sequence-structure pairs with various biologically meaningful constraints (Hamming distance filtration, genus filtration, etc.). These samplers facilitate the computational study of the shape-modulated evolution of biomolecules, as well as the design of functional noncoding genes.
- We studied the mutational robustness of let-7 miRNAs, in particular, with respect to multiple-point mutations. We showed that native let-7 genes exhibit higher mutational robustness compared to random inverse-folded sequences. We further demonstrated the connection between the robustness of let-7 miRNAs and cell differentiation across various organisms.
- We are investigating the mutational robustness of let-7 genes in cancer patients, comparing it to that of healthy people. Cancer cells are poorly differentiated compared to normal cells. Because let-7 genes play an important role in cell differentiation, this line of work has potential applications in predicting, preventing, and treating cancers that correspond to malfunctioning let-7 genes.
- We are studying the energy-weighted density of the bi-compatible sequences of riboswitch alternative structures, revealing the phenotypic transition signals for the identification of riboswitches. The transitional signal also helps us to understand the irreversible sequence mutations in diseases.
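The notion of mutational robustness used in the items above can be illustrated on a toy folding map. The sketch below is not the pipeline used in these studies: it assumes a Nussinov-style base-pair maximization as a stand-in for thermodynamic folding, an assumed minimum hairpin loop of three unpaired bases, and single-point rather than multiple-point mutations. The names `fold` and `robustness` are hypothetical.

```python
# Toy folding map: maximize the number of canonical/wobble base pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a pair


def fold(seq):
    """Return one maximum-base-pair structure of seq in dot-bracket notation."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs within seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - MIN_LOOP):  # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    # Traceback: recover one optimal structure deterministically.
    struct = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    struct[k], struct[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(struct)


def robustness(seq):
    """Fraction of single-point mutants whose toy structure equals the wild type's."""
    wild = fold(seq)
    mutants = [seq[:i] + b + seq[i + 1:]
               for i in range(len(seq)) for b in "ACGU" if b != seq[i]]
    return sum(fold(m) == wild for m in mutants) / len(mutants)


print(fold("GGGAAACCC"), robustness("GGGAAACCC"))
```

A realistic study would replace `fold` with a thermodynamic folding routine and would compare native sequences against inverse-folded controls, as described above.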
The energy-spectrum of bicompatible sequences
F. Huang, C. Barrett, and C. Reidys
We develop a sampler of bicompatible sequences for two given structures, which provides sequences that are thermodynamically stable with respect to both structures. These bicompatible sequences are crucial to understanding phenotypic transitions. We employ this sampler to analyze riboswitch sequences. We show that the two alternative structures of riboswitches are highly accessible to each other when compared to random structure pairs.
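As a minimal illustration of bicompatibility, the sketch below checks whether a sequence can form every base pair of two dot-bracket structures. It covers only the combinatorial compatibility condition (Watson-Crick and G-U pairs are assumed admissible), not thermodynamic stability, and the function names are hypothetical.

```python
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Parse a pseudoknot-free dot-bracket string into a list of (i, j) pairs."""
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(seq, structure):
    """True if every base pair of the structure is an admissible pair in seq."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return all((seq[i], seq[j]) in ALLOWED for i, j in base_pairs(structure))


def is_bicompatible(seq, s1, s2):
    """True if seq is compatible with both structures."""
    return is_compatible(seq, s1) and is_compatible(seq, s2)


# Two overlapping hairpins sharing position 7 (illustrative, not a real riboswitch).
s1 = "((((...))))...."
s2 = "....((((...))))"
print(is_bicompatible("GGGGGGGCCCCGCCC", s1, s2))
```

Position 7 pairs with position 3 in the first structure and with position 11 in the second, so its nucleotide must be admissible with both partners; this is the kind of constraint a bicompatible sampler has to respect.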
Sequence-structure relations of biopolymers
Bioinformatics, 33(3): 382-389. (2017) C. Barrett, F. Huang, and C. Reidys
We develop a sequence sampler that provides sequences which are thermodynamically stable with respect to a given RNA structure. We use this sampler to present a detailed analysis of native genetic sequence-structure pairs. We show that sequences sampled from native structures exhibit signals significantly distinct from those sampled from random structures, suggesting intrinsic, relevant patterns in native sequences.
- HamSampler is a C program that samples RNA sequences from a given RNA secondary structure with a given Boltzmann distribution as well as with Hamming distance filtration. It takes a secondary structure, a reference sequence and a given distance as input, and generates RNA sequences that are Boltzmann distributed having the specified fixed Hamming distance to the reference sequence.
- Bifold is a C program that samples RNA sequences for two structures. It takes two RNA secondary structures having the same length as input, and generates sequences that are thermodynamically stable to the two input structures.
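The idea of sampling Boltzmann-distributed sequences at a fixed Hamming distance can be sketched with plain rejection sampling. This is not HamSampler's algorithm or interface: the pair energies below are invented toy values, the energy model ignores loops and stacking, and `sample_at_distance` is a hypothetical name. Under these assumptions, accepted sequences follow the Boltzmann distribution of the toy energy among compatible sequences at exactly distance `d` from the reference.

```python
import math
import random

# Toy pair energies (assumed values, for illustration only).
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0, ("A", "U"): -2.0,
               ("U", "A"): -2.0, ("G", "U"): -1.0, ("U", "G"): -1.0}


def base_pairs(structure):
    """Parse a pseudoknot-free dot-bracket string into (i, j) pairs."""
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pairs.append((stack.pop(), idx))
    return pairs


def toy_energy(seq, pairs):
    """Sum of toy pair energies, or None if some pair is not admissible."""
    total = 0.0
    for i, j in pairs:
        e = PAIR_ENERGY.get((seq[i], seq[j]))
        if e is None:
            return None
        total += e
    return total


def sample_at_distance(structure, reference, d, kT=1.0, rng=random, max_tries=100000):
    """Rejection-sample a compatible sequence at Hamming distance d from reference."""
    pairs = base_pairs(structure)
    e_min = -3.0 * len(pairs)  # lowest possible toy energy (all G-C pairs)
    for _ in range(max_tries):
        seq = list(reference)
        # Uniform proposal over all sequences at distance exactly d.
        for i in rng.sample(range(len(seq)), d):
            seq[i] = rng.choice([b for b in "ACGU" if b != seq[i]])
        e = toy_energy(seq, pairs)
        # Accept with probability proportional to the Boltzmann weight.
        if e is not None and rng.random() < math.exp(-(e - e_min) / kT):
            return "".join(seq)
    raise RuntimeError("no sample accepted; try a different distance or more tries")


print(sample_at_distance("(((...)))", "GGGAAACCC", 2, rng=random.Random(0)))
```

Rejection sampling is simple but becomes inefficient for long sequences or low temperatures; a dynamic-programming sampler avoids that cost.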
Understanding Large-Scale Networks
Researchers studied the complexity arising from large-scale networks. One particular example is long non-coding RNAs (lncRNAs, typically 1-20 kb) viewed as contact graphs. Recently, lncRNAs have been found to be pervasively transcribed in the human and other mammalian genomes, and are increasingly associated with networks of epigenetic and post-transcriptional control in healthy and diseased biological systems. Understanding the secondary structure of lncRNAs is the key to unveiling their diverse functional roles in gene regulation. In particular, long-range intramolecular interactions in these large molecules are poorly understood and present a significant challenge to predicting secondary structures for lncRNAs.
We investigate the Boltzmann ensemble of secondary structures generated by statistical sampling, and identify structural features by extracting information from this ensemble. We use an integrated experimental and computational approach to understand the secondary structure of lncRNAs. Our research facilitates the discovery of the regulatory roles of lncRNAs and has further implications for molecular markers of diseases such as cancer.
- We analyzed the distributions of various structural elements in lncRNAs, such as stacks, hairpin-, interior-, and multi-loops, and pseudoknots.
- We analyzed the length spectrum of base-pairings in large RNA secondary structures. The long-range base pairings have length on the order of the sequence length.
- We developed a computational approach to identify the key features of base-pairing interactions, by extracting information from large-scale statistical sampling of secondary structures.
- We are currently working on developing an information-theoretic framework that incorporates key structural features and experimental probing data in a hierarchical fashion to determine the secondary structure of lncRNAs.
- We further studied the relationship between the networks of base-pair interactions in the ensemble and the dynamics generated by sequential update of information from experimental probing data.
- The block spectrum of RNA pseudoknot structures
Journal of Mathematical Biology, 79: 791-822 (2019) T. J. X. Li, C. Burris, and C. Reidys
We analyze the length-spectrum of blocks in RNA pseudoknot structures. We prove that there almost surely exists a unique giant block and that with high probability any other block has finite length. We compute the probabilities of observing blocks of specific pseudoknot types, such as H-type and kissing hairpins. We show that sliding window methods for structure prediction of large RNAs are incompatible with the unique giant block.
- On an enhancement of RNA probing data using Information Theory
T. J. X. Li, and C. Reidys
We employ an information-theoretic approach to identify a target structure in a Boltzmann ensemble of structures via chemical probing data. Our framework is centered around the ensemble tree: a hierarchical bi-partition of the input ensemble that is constructed by recursively querying about whether or not a base pair of maximum information entropy is contained in the target. These queries are answered via relating local with global probing data, employing the modularity in RNA secondary structures. For a Boltzmann ensemble incorporating probing data, our framework correctly identifies the target with fidelity greater than 90%.
- The Rainbow Spectrum of RNA Secondary Structures
Society for Mathematical Biology, 80: 1514-1538 (2018) T. J. X. Li, and C. Reidys
We quantify the length spectrum of base-pairs in RNA secondary structures. We show that there always exists a unique rainbow-arc whose length is of the same order as the sequence length. This is the first theoretical proof for the almost sure existence of long-range base-pairings in large RNAs.
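For a single structure, the length spectrum and the rainbow arcs can be read off a dot-bracket string directly. In the sketch below, the length of a base pair (i, j) is taken to be j - i, and a rainbow is taken to be an arc that is not nested inside any other arc; both conventions, and the function names, are assumptions made for illustration.

```python
from collections import Counter


def base_pairs(structure):
    """Parse a pseudoknot-free dot-bracket string into (i, j) pairs."""
    stack, pairs = [], []
    for idx, ch in enumerate(structure):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pairs.append((stack.pop(), idx))
    return pairs


def length_spectrum(structure):
    """Histogram of arc lengths j - i over all base pairs."""
    return Counter(j - i for i, j in base_pairs(structure))


def rainbows(structure):
    """Arcs that are not enclosed by any other arc, in left-to-right order."""
    result, depth, start = [], 0, None
    for idx, ch in enumerate(structure):
        if ch == "(":
            if depth == 0:
                start = idx  # an outermost arc opens here
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                result.append((start, idx))  # the outermost arc closes
    return result


s = "((..((...))..))..(...)"
print(length_spectrum(s))
print(rainbows(s))
print(max(rainbows(s), key=lambda arc: arc[1] - arc[0]))  # longest rainbow
```

Applied to each structure of a Boltzmann sample, the same functions give empirical length distributions that can be compared with the asymptotic statements above.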
- RNAStructureIdentifier is a software package for identifying the target secondary structure from an RNA structure ensemble. We present a demo of the ensemble tree.
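The ensemble tree described above can be sketched as a repeated maximum-entropy query. The code below is not the RNAStructureIdentifier implementation: it assumes the ensemble is given as a dictionary from structures (frozensets of base pairs) to weights, and it replaces the probing-data analysis with a hypothetical oracle `contains(pair)` that states whether the target contains a base pair.

```python
import math


def binary_entropy(p):
    """Entropy (in bits) of a yes/no question answered 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def best_query(ensemble):
    """Return (pair, entropy) for the base pair of maximum entropy in the ensemble."""
    total = sum(ensemble.values())
    prob = {}
    for structure, weight in ensemble.items():
        for pair in structure:
            prob[pair] = prob.get(pair, 0.0) + weight / total
    scored = ((pair, binary_entropy(p)) for pair, p in prob.items())
    return max(scored, key=lambda item: item[1], default=(None, 0.0))


def identify(ensemble, contains):
    """Descend the bi-partition tree until one structure remains."""
    queries = []
    while len(ensemble) > 1:
        pair, entropy = best_query(ensemble)
        if pair is None or entropy == 0.0:
            break  # no base pair separates the remaining structures
        answer = contains(pair)
        queries.append((pair, answer))
        # Keep the half of the ensemble that agrees with the answer.
        ensemble = {s: w for s, w in ensemble.items() if (pair in s) == answer}
    return ensemble, queries


# Illustrative ensemble with made-up weights.
A = frozenset({(0, 8), (1, 7)})
B = frozenset({(0, 8)})
C = frozenset({(2, 6)})
D = frozenset()
ensemble = {A: 0.4, B: 0.3, C: 0.2, D: 0.1}
remaining, queries = identify(ensemble, lambda pair: pair in B)
print(remaining, queries)
```

Each query splits the current sub-ensemble as evenly as the weights allow, so the number of queries stays small even when the ensemble is large. In practice the oracle's answers are inferred from probing data and can be wrong, which this sketch does not model.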
Topological Complexity of Interacting Systems
The complexity of an interacting system also arises from its "shape", i.e., its topological complexity. Cross-serial interactions (pseudoknots), as well as multiple-structure interactions (riboswitches), are crucial to the function of biomolecules. The increased complexity of these interactions brings new challenges as well as hints at new methods for understanding them. We construct mathematical models to measure the topological complexity of interacting systems and tackle such complexity using topological recursion. Our research can facilitate the detection and design of functional genes whose structures rank high on this topological complexity scale.
- We apply simplicial homology and study the topological trace between two RNA secondary structures. Such a trace captures the mutually exclusive substructures that enable the "switching" mechanism in riboswitches. It can also be used to distinguish ncRNAs from different classes.
- We derived a novel method for transforming cross-serial interactions into cross-free interactions. The method facilitates fast Boltzmann sampling and statistical analysis of RNA pseudoknot structures.
- We are developing topological approaches for identifying mutually exclusive substructures in the structure ensemble of a given sequence, locating switching sequences, and detecting potential riboswitches.
- We are designing efficient algorithms to construct riboswitch sequences with two desired stable configurations.
- We are extending the homology analysis to planar interaction structures in order to understand the interplay between ligand binding and folding.
- Loop Homology of Bi-secondary Structures
A. Bura, Q. He, and C. Reidys
A riboswitch is a noncoding RNA with two stable configurations. We investigate the "shape" complexity of the transformation between the two configurations using topology. We show that known riboswitches all lie in the same complexity class and subsequently derive a novel notion of "continuity" in structural transformations.
- Topological language for RNA
Mathematical Biosciences, 282:109-120 (2016) F. Huang, and C. Reidys
We represent the base pairing interactions of an RNA structure as a fat graph where vertices are connected via ribbons. Such representations enable us to resolve cross-serial interactions in pseudoknots and efficiently generate them.
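A standard quantity attached to a fat graph is its topological genus, which can be computed by counting boundary components. The sketch below assumes the common convention in which the backbone is collapsed to a single vertex, so that Euler's formula reads 2 - 2g = 1 - n + r for n arcs and r boundary components; unpaired positions do not affect the result. It is an illustration under that assumption, not the generation algorithm of the paper, and the function name `genus` is hypothetical.

```python
def genus(arcs):
    """Topological genus of a diagram given as a list of base pairs (i, j).

    Arcs may cross (pseudoknots); every position may occur in at most one arc.
    """
    endpoints = sorted(p for arc in arcs for p in arc)
    if len(set(endpoints)) != len(endpoints):
        raise ValueError("each position may be paired at most once")
    m = len(endpoints)
    if m == 0:
        return 0  # no arcs: a disc, genus 0
    rank = {p: r for r, p in enumerate(endpoints)}  # relabel endpoints 0..m-1
    partner = [0] * m
    for i, j in arcs:
        partner[rank[i]] = rank[j]
        partner[rank[j]] = rank[i]
    # A boundary component is a cycle of: cross the arc, then step along the backbone.
    seen = [False] * m
    boundaries = 0
    for start in range(m):
        if seen[start]:
            continue
        boundaries += 1
        x = start
        while not seen[x]:
            seen[x] = True
            x = (partner[x] + 1) % m
    return (1 + len(arcs) - boundaries) // 2


print(genus([(0, 3), (1, 2)]))          # nested arcs (secondary structure)
print(genus([(0, 2), (1, 3)]))          # H-type pseudoknot
print(genus([(0, 2), (1, 4), (3, 5)]))  # kissing hairpins
```

Structures without crossings have genus 0, which is why genus serves as a filtration of pseudoknot complexity.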
- Topology of RNA-RNA interaction structures
J. Andersen, F. Huang, R. Penner, and C. Reidys
We study the interaction between two noncoding RNAs. We investigate the "building blocks" that control the complexity of such interactions using topological analysis.
- A topological framework for signed permutations
Discrete Mathematics, 340: 2161-2182 (2017) F. Huang, and C. Reidys
We develop a topological framework for signed permutations. A signed permutation is represented by a fat graph having a central vertex connected to ribbon edges. A reversal action on a signed permutation can be interpreted as vertex gluing, vertex slicing, and vertex half-flipping. We apply the framework and describe Pevzner’s theory on reversal distances for signed permutations from a topological perspective.
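The reversal action itself is easy to state in code. The sketch below applies a reversal to a signed permutation and computes the reversal distance to the identity by brute-force breadth-first search. It illustrates the objects involved only; it is neither the topological framework nor Pevzner's theory, the search is feasible only for very short permutations, and the function names are hypothetical.

```python
from collections import deque


def reversal(perm, i, j):
    """Reverse the segment perm[i..j] (inclusive) and flip the sign of each entry."""
    perm = tuple(perm)
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance(perm):
    """Minimum number of reversals turning perm into (1, 2, ..., n), by BFS."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == target:
            return dist[current]
        for i in range(len(current)):
            for j in range(i, len(current)):
                nxt = reversal(current, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
    raise ValueError("input is not a signed permutation of 1..n")


print(reversal((1, -3, -2, 4), 1, 2))
print(reversal_distance((2, 1)))
```

The search space contains 2^n · n! signed permutations, which is why structural theories of reversal distance matter for realistic genome sizes.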