The Team
(left to right) Ricky Chen, Fenix Huang, Christian Reidys, Reza Rezazadegen, Qijun He, Thomas Li, Andrei Bura
Our Research
Information is data that has been organized or presented in a meaningful way. Mathematical Biocomplexity research focuses on extracting information from biological systems and networks. In particular, we are interested in the information stored at the structural level. Structural information is crucial to the functioning and behavior of a biological system. However, interpreting structural information is challenging when the biological system is large or when it has a complicated “shape.”
We formulate mathematical principles that underpin the complexity of structural information and develop efficient strategies/algorithms to tackle this high complexity.
Our findings lay the foundation for quantitative analysis of neutral evolution, molecular interactions, and network dynamics.
Genetic Data Information
DNA nucleotide information is transcribed into RNA, which is stabilized by molecular folding. In a plethora of interactions, it is the specific folding configuration, and not the particular sequence of nucleotides, that determines biological functionality. On one hand, viral genomes as well as noncoding RNAs can mutate significantly and still preserve their phenotype. On the other hand, sequences such as riboswitches can switch between two phenotypes without any sequence change. Understanding the sequence-structure relation, as well as the "hidden" information it carries, is challenging.
Our research focuses on extracting information embedded in sequence-structure pairs that is not readily discoverable by sequence alignment. Specifically, we use the folding map, which takes RNA sequences to secondary structures, to understand how sequence mutations affect phenotypic changes and vice versa. This line of work gives unique insight into evolutionary dynamics involving both sequences and structures.
Research Highlights
 We developed Boltzmann samplers of sequence-structure pairs with various biologically meaningful constraints (Hamming distance filtration, genus filtration, etc.). These samplers facilitate the computational study of the shape-modulated evolution of biomolecules, as well as the design of functional noncoding genes.
 We studied the mutational robustness of let-7 miRNAs, in particular with respect to multiple-point mutations. We showed that native let-7 genes exhibit higher mutational robustness than random inverse-folded sequences. We further demonstrated a connection between the robustness of let-7 miRNAs and cell differentiation across various organisms.
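To make the notion of mutational robustness concrete, here is a minimal Python sketch that measures it as the fraction of single-point mutants whose folded structure is unchanged. It assumes a toy folding map: Nussinov-style base-pair maximization with a minimum hairpin size of 3 (the hypothetical `fold` function), not the thermodynamic folding used in the actual studies, and it covers only single-point rather than multiple-point mutations.

```python
# Toy folding map: Nussinov base-pair maximization (an assumption, NOT an energy model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # minimum number of unpaired bases enclosed by a hairpin


def fold(seq: str) -> str:
    """Return one maximum-base-pair secondary structure in dot-bracket notation."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: max number of pairs within seq[i..j]

    def pairings_of_j(i: int, j: int):
        """Yield (k, score) for every k in [i, j] that can pair with j."""
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                yield k, left + dp[k + 1][j - 1] + 1

    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            # Either j is unpaired, or j pairs with some k.
            dp[i][j] = max([dp[i][j - 1]] + [s for _, s in pairings_of_j(i, j)])

    # Traceback: recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k, score in pairings_of_j(i, j):
            if score == dp[i][j]:
                structure[k], structure[j] = "(", ")"
                if k > i:
                    stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return "".join(structure)


def robustness(seq: str) -> float:
    """Fraction of single-point mutants that fold into the same structure as seq."""
    target = fold(seq)
    neutral = total = 0
    for pos, base in enumerate(seq):
        for alt in "ACGU":
            if alt != base:
                mutant = seq[:pos] + alt + seq[pos + 1:]
                total += 1
                neutral += fold(mutant) == target
    return neutral / total if total else 1.0
```

For example, `fold("GGGAAAUCC")` yields the hairpin `(((...)))`, and `robustness("GGGAAAUCC")` reports how many of its 27 point mutants keep that hairpin.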
Ongoing Research
 We are investigating the mutational robustness of let-7 genes in cancer patients and comparing it to that in healthy individuals. Cancer cells are poorly differentiated compared to normal cells. Given the important role of let-7 genes in cell differentiation, this line of work has potential applications in predicting, preventing, and treating cancers associated with malfunctioning let-7 genes.
 We are studying the energy-weighted density of the bicompatible sequences of the alternative structures of riboswitches, which reveals phenotypic transition signals for identifying riboswitches. The transition signal also helps us understand irreversible sequence mutations in diseases.
Selected Papers

The energy-spectrum of bicompatible sequences
F. Huang, C. Barrett, and C. Reidys
We develop a sampler of bicompatible sequences for two given structures, i.e., sequences that are thermodynamically stable with respect to both structures. These bicompatible sequences are crucial to understanding phenotypic transitions. We employ this sampler to analyze riboswitch sequences and show that the two alternative structures of riboswitches are highly accessible to each other compared with random structure pairs.
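Whether bicompatible sequences exist, and how many there are, follows from a simple graph argument: each position has at most one partner in each structure, so the union of the two structures decomposes into paths and cycles that can be filled in independently. The Python sketch below counts bicompatible sequences that way, assuming the six canonical and wobble pairs (AU, GC, GU) and plain dot-bracket input. It ignores energies entirely, so it is a uniform count rather than the thermodynamically weighted sampler described above, and the function names are hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
BASES = "ACGU"


def partners(db: str) -> dict:
    """Map each paired position of a dot-bracket structure to its partner."""
    stack, partner = [], {}
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def count_walks(m: int, closed: bool) -> int:
    """Count base assignments to a path (closed=False) or cycle (closed=True)
    of m positions such that every edge joins a valid base pair."""
    total = 0
    for start in BASES:
        ways = {b: int(b == start) for b in BASES}
        for _ in range(m - 1):
            ways = {b: sum(ways[a] for a in BASES if (a, b) in PAIRS) for b in BASES}
        if closed:  # the last position must also pair with the first
            total += sum(ways[b] for b in BASES if (b, start) in PAIRS)
        else:
            total += sum(ways.values())
    return total


def count_bicompatible(s1: str, s2: str) -> int:
    """Number of sequences compatible with both dot-bracket structures s1 and s2."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    p1, p2 = partners(s1), partners(s2)
    seen, total = set(), 1
    for v in range(len(s1)):
        if v in seen:
            continue
        # Collect the connected component of v in the union graph of both structures.
        component, frontier = [], [v]
        seen.add(v)
        while frontier:
            u = frontier.pop()
            component.append(u)
            for w in (p1.get(u), p2.get(u)):
                if w is not None and w not in seen:
                    seen.add(w)
                    frontier.append(w)
        # Degrees are at most 2, so the component is a path or a cycle; a cycle has as
        # many edges as vertices (a pair shared by s1 and s2 counts as a 2-cycle).
        edges = sum((u in p1) + (u in p2) for u in component) // 2
        total *= count_walks(len(component), closed=(edges == len(component)))
    return total
```

For instance, `count_bicompatible("(....)", "(...).")` combines one three-position path (10 assignments) with three free positions (4 each), giving 640.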

Sequence-structure relations of biopolymers
Bioinformatics, 33(3): 382–389 (2017) C. Barrett, F. Huang, and C. Reidys
We develop a sequence sampler that provides sequences which are thermodynamically stable with respect to a given RNA structure. We use this sampler to present a detailed analysis of native genetic sequence-structure pairs. We show that sequences sampled from native structures exhibit signals significantly distinct from those of sequences sampled from random structures, suggesting intrinsic, relevant patterns in native sequences.
Software
 HamSampler is a C program that samples RNA sequences compatible with a given RNA secondary structure from a Boltzmann distribution, with Hamming distance filtration. It takes a secondary structure, a reference sequence, and a distance as input, and generates Boltzmann-distributed RNA sequences at exactly the specified Hamming distance from the reference sequence.
 Bifold is a C program that samples RNA sequences for two structures. It takes as input two RNA secondary structures of the same length and generates sequences that are thermodynamically stable with respect to both input structures.
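HamSampler's internals are not described here, but the idea of Boltzmann sampling under a Hamming distance filtration can be sketched with a dynamic program that tracks the number of mismatches against the reference, followed by a stochastic traceback. The Python sketch below assumes a deliberately simple, hypothetical energy model in which each base pair contributes a fixed energy (`PAIR_ENERGY`, illustrative values) and unpaired bases contribute nothing, so the structure splits into independent units. A realistic loop-based energy model would need a more involved recursion, and this is not HamSampler's code or energy model.

```python
import math
import random

# Hypothetical per-pair energies in kcal/mol, chosen only for illustration;
# a real sampler would use a nearest-neighbor (loop-based) energy model.
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0, ("A", "U"): -2.0,
               ("U", "A"): -2.0, ("G", "U"): -1.0, ("U", "G"): -1.0}
KT = 0.6  # roughly RT at 37 C, in kcal/mol
BASES = "ACGU"


def units_of(db):
    """Split a dot-bracket structure into independent units: (i,) or (i, j)."""
    stack, units = [], []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            units.append((stack.pop(), i))
        else:
            units.append((i,))
    return units


def options(unit, ref):
    """(bases, Boltzmann weight, mismatches against ref) for every choice of one unit."""
    if len(unit) == 1:
        i = unit[0]
        return [((b,), 1.0, int(b != ref[i])) for b in BASES]
    i, j = unit
    return [((a, b), math.exp(-e / KT), int(a != ref[i]) + int(b != ref[j]))
            for (a, b), e in PAIR_ENERGY.items()]


def sample(db, ref, d, rng=None):
    """Draw one sequence compatible with db at Hamming distance exactly d from ref,
    with probability proportional to exp(-E/kT) under the toy energy model."""
    if rng is None:
        rng = random.Random()
    units = units_of(db)
    opts = [options(u, ref) for u in units]
    # Z[k][h]: partition function of the first k units using exactly h mismatches.
    Z = [[0.0] * (d + 1) for _ in range(len(units) + 1)]
    Z[0][0] = 1.0
    for k, choices in enumerate(opts):
        for h in range(d + 1):
            for _, w, m in choices:
                if h + m <= d:
                    Z[k + 1][h + m] += Z[k][h] * w
    if Z[-1][d] == 0.0:
        raise ValueError("no compatible sequence at this Hamming distance")
    # Stochastic traceback from the last unit back to the first.
    seq, h = list(ref), d
    for k in range(len(units), 0, -1):
        choices = opts[k - 1]
        weights = [w * Z[k - 1][h - m] if m <= h else 0.0 for _, w, m in choices]
        bases, _, m = rng.choices(choices, weights=weights)[0]
        for pos, base in zip(units[k - 1], bases):
            seq[pos] = base
        h -= m
    return "".join(seq)
```

Because the mismatch count is carried as an extra DP index, every sampled sequence lands exactly at distance `d`, and within that shell sequences are drawn in proportion to their Boltzmann weights.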
Understanding Large-Scale Networks
We study the complexity arising from large-scale networks. One particular example is long noncoding RNAs (lncRNAs, typically 1–20 kb) viewed as contact graphs. lncRNAs have recently been found to be pervasively transcribed in the human and other mammalian genomes, and are increasingly associated with networks of epigenetic and post-transcriptional control in healthy and diseased biological systems. Understanding the secondary structure of lncRNAs is key to unveiling their diverse functional roles in gene regulation. In particular, long-range intramolecular interactions in these large molecules are poorly understood and present a significant challenge to secondary structure prediction for lncRNAs.
We investigate the Boltzmann ensemble of secondary structures generated by statistical sampling and identify structural features by extracting information from this ensemble. We use an integrated experimental and computational approach to understand the secondary structure of lncRNAs. Our research facilitates the discovery of the regulatory roles of lncRNAs and their implications as molecular markers of diseases such as cancer.
Research Highlights
 We analyzed the distributions of various structural elements in lncRNAs, such as stacks; hairpin, interior, and multi-loops; and pseudoknots.
 We analyzed the length spectrum of base pairs in large RNA secondary structures and found that long-range base pairs have lengths on the order of the sequence length.
 We developed a computational approach to identify the key features of base-pairing interactions by extracting information from large-scale statistical sampling of secondary structures.
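As a small illustration of a base-pair length spectrum, the sketch below extracts base pairs from a dot-bracket string and histograms the arc lengths j - i; averaging such histograms over a sampled ensemble gives an empirical spectrum. The helper names and the choice of j - i as the arc length are assumptions made for this example.

```python
from collections import Counter


def base_pairs(db):
    """Base pairs (i, j), i < j, of a dot-bracket secondary structure."""
    stack, pairs = [], []
    for i, c in enumerate(db):
        if c == "(":
            stack.append(i)
        elif c == ")":
            pairs.append((stack.pop(), i))
    return sorted(pairs)


def length_spectrum(db):
    """Histogram of arc lengths j - i over all base pairs."""
    return Counter(j - i for i, j in base_pairs(db))


def longest_arc_fraction(db):
    """Length of the longest arc relative to the sequence length."""
    lengths = [j - i for i, j in base_pairs(db)]
    return max(lengths) / len(db) if lengths else 0.0
```

For a large structure, a `longest_arc_fraction` that stays bounded away from zero as the sequence grows is the kind of long-range pairing described above.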
Ongoing Research
 We are currently developing an information-theoretic framework that incorporates key structural features and experimental probing data in a hierarchical fashion to determine the secondary structure of lncRNAs.
 We are further studying the relationship between the networks of base-pair interactions in the ensemble and the dynamics generated by sequentially updating information from experimental probing data.
Selected Papers
 The block spectrum of RNA pseudoknot structures
Journal of Mathematical Biology, 79: 791–822 (2019) T. J. X. Li, C. Burris, and C. Reidys
We analyze the length spectrum of blocks in RNA pseudoknot structures. We prove that there almost surely exists a unique giant block and that with high probability any other block has finite length. We compute the probabilities of observing blocks of specific pseudoknot types, such as H-type and kissing hairpins. We show that sliding window methods for structure prediction of large RNAs are incompatible with the unique giant block.
 On an enhancement of RNA probing data using Information Theory
T. J. X. Li, and C. Reidys
We employ an information-theoretic approach to identify a target structure in a Boltzmann ensemble of structures using chemical probing data. Our framework is centered around the ensemble tree: a hierarchical bipartition of the input ensemble constructed by recursively querying whether a base pair of maximum information entropy is contained in the target. These queries are answered by relating local to global probing data, exploiting the modularity of RNA secondary structures. For a Boltzmann ensemble incorporating probing data, our framework correctly identifies the target with fidelity greater than 90%.
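The querying loop of the ensemble tree can be sketched as follows: compute each base pair's frequency in the sampled ensemble, query the pair with maximum binary entropy, and keep the half of the bipartition consistent with the answer. In this Python sketch the answers come from a hypothetical oracle `contains`; obtaining them from probing data, which is the substantive part of the framework, is not modeled, and ties are broken arbitrarily but deterministically.

```python
import math
from collections import Counter


def entropy(p):
    """Binary entropy (bits) of a base pair present with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_entropy_pair(ensemble):
    """Base pair whose presence is most uncertain across the sampled ensemble."""
    counts = Counter(bp for s in ensemble for bp in s)
    n = len(ensemble)
    # Ties are broken by the pair itself so the result is deterministic.
    return max(counts, key=lambda bp: (entropy(counts[bp] / n), bp))


def identify(ensemble, contains):
    """Follow one root-to-leaf path of the ensemble tree.

    ensemble: list of sampled structures (duplicates allowed), each a frozenset of (i, j).
    contains: hypothetical oracle answering "is bp in the target structure?"
    """
    current, queries = list(ensemble), []
    while len(set(current)) > 1:
        # Distinct structures guarantee a pair with 0 < p < 1, so both halves are nonempty.
        bp = max_entropy_pair(current)
        answer = contains(bp)
        queries.append((bp, answer))
        # Keep the side of the bipartition that agrees with the answer.
        current = [s for s in current if (bp in s) == answer]
    return set(current), queries
```

Each query on a pair with frequency near 1/2 roughly halves the ensemble, which is why maximum-entropy pairs are the natural choice of question.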
 The Rainbow Spectrum of RNA Secondary Structures
Bulletin of Mathematical Biology, 80: 1514–1538 (2018) T. J. X. Li, and C. Reidys
We quantify the length spectrum of base pairs in RNA secondary structures. We show that there almost surely exists a unique rainbow arc whose length is on the order of the sequence length. This is the first theoretical proof of the almost sure existence of long-range base pairs in large RNAs.
Software
 RNAStructureIdentifier is a software package for identifying the target secondary structure from an RNA structure ensemble. We present a demo of the ensemble tree.
Topological Complexity of Interacting Systems
The complexity of an interacting system also arises from its "shape", i.e., its topological complexity. Cross-serial interactions (pseudoknots), as well as multiple-structure interactions (riboswitches), are crucial to the function of biomolecules. The increased complexity of these interactions brings new challenges but also hints at new methods for understanding them. We construct mathematical models to measure the topological complexity of interacting systems and tackle this complexity using topological recursion. Our research can facilitate the detection and design of functional genes whose structures rank high on this topological complexity scale.
Research Highlights
 We applied simplicial homology to study the topological trace between two RNA secondary structures. This trace captures the mutually exclusive substructures that enable the "switching" mechanism in riboswitches, and it can also be used to distinguish ncRNAs of different classes.
 We derived a novel method for transforming cross-serial interactions into cross-free interactions. The method facilitates fast Boltzmann sampling and statistical analysis of RNA pseudoknot structures.
Ongoing Research
 We are developing topological approaches for identifying mutually exclusive substructures in the structure ensemble of a given sequence, locating switching sequences, and detecting potential riboswitches.
 We are designing efficient algorithms to construct riboswitch sequences with two desired stable configurations.
 We are extending the homology analysis to planar interaction structures in order to understand the interplay between ligand binding and folding.
Selected Papers
 Loop Homology of Bisecondary Structures
A. Bura, Q. He, and C. Reidys
A riboswitch is a noncoding RNA with two stable configurations. We use topology to investigate the "shape" complexity of the transformation between the two configurations. We show that all known riboswitches lie in the same complexity class and subsequently derive a novel notion of "continuity" in structural transformations.
 Topological language for RNA
Mathematical Biosciences, 282: 109–120 (2016) F. Huang, and C. Reidys
We represent the base-pairing interactions of an RNA structure as a fat graph in which vertices are connected via ribbons. Such representations enable us to resolve cross-serial interactions in pseudoknots and to generate them efficiently.
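The topological complexity of such a fat graph is measured by its genus. A standard way to compute it, sketched below in Python, collapses the backbone to a single vertex, counts the boundary components r as cycles of the permutation obtained by composing the arc involution with the backbone rotation, and applies Euler's formula 1 - e + r = 2 - 2g for e arcs, i.e. g = (1 + e - r)/2. The multi-bracket dot-bracket input (`()`, `[]`, `{}`, `<>`) is an assumption of this sketch, not the paper's implementation.

```python
OPEN = {"(": ")", "[": "]", "{": "}", "<": ">"}
CLOSE = {v: k for k, v in OPEN.items()}


def arcs_from_dot_bracket(db):
    """Arcs (i, j) of a structure whose crossing arcs use different bracket types."""
    stacks = {o: [] for o in OPEN}
    arcs = []
    for i, c in enumerate(db):
        if c in OPEN:
            stacks[c].append(i)
        elif c in CLOSE:
            arcs.append((stacks[CLOSE[c]].pop(), i))
    return arcs


def genus(db):
    """Genus of the fat graph of an RNA structure with the backbone collapsed to one vertex."""
    arcs = arcs_from_dot_bracket(db)
    e = len(arcs)
    if e == 0:
        return 0
    # Relabel arc endpoints 0..2e-1 in backbone order.
    ends = sorted(p for arc in arcs for p in arc)
    index = {p: k for k, p in enumerate(ends)}
    alpha = [0] * (2 * e)  # involution swapping the two endpoints of each arc
    for i, j in arcs:
        alpha[index[i]], alpha[index[j]] = index[j], index[i]
    # Boundary components are the cycles of gamma o alpha, with gamma(k) = k + 1 mod 2e.
    seen = [False] * (2 * e)
    r = 0
    for k in range(2 * e):
        if not seen[k]:
            r += 1
            x = k
            while not seen[x]:
                seen[x] = True
                x = (alpha[x] + 1) % (2 * e)
    # Euler characteristic with one vertex: 1 - e + r = 2 - 2g.
    return (1 + e - r) // 2
```

For example, `genus("([)]")` returns 1 for an H-type pseudoknot, while any secondary structure written with `(` and `)` only has genus 0.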
 Topology of RNA-RNA interaction structures
J. Andersen, F. Huang, R. Penner, and C. Reidys
We study the interaction between two noncoding RNAs. We investigate the "building blocks" that control the complexity of such interactions using topological analysis.
 A topological framework for signed permutations
Discrete Mathematics, 340: 2161–2182 (2017) F. Huang, and C. Reidys
We develop a topological framework for signed permutations. A signed permutation is represented by a fat graph having a central vertex connected to ribbon edges. A reversal action on a signed permutation can be interpreted as vertex gluing, vertex slicing, and vertex half-flipping. We apply the framework and describe Pevzner’s theory on reversal distances for signed permutations from a topological perspective.
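The combinatorial objects behind this framework are easy to state in code. The Python sketch below applies a reversal (reverse a segment and flip its signs) and computes the classical breakpoint lower bound on reversal distance. It does not reproduce the fat-graph construction, the exact distance requires the full reversal-distance theory mentioned above, and the function names are hypothetical.

```python
def reverse(perm, i, j):
    """Apply a reversal to perm[i..j] (inclusive): reverse the segment and flip its signs."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


def breakpoints(perm):
    """Count breakpoints of the framed permutation 0, perm..., n+1.

    Consecutive entries (a, b) form an adjacency iff b - a == 1. A reversal maps an
    interior adjacency (a, b) to (-b, -a), which is still an adjacency, so only the
    two boundaries of the reversed segment can change.
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reversal_lower_bound(perm):
    """Each reversal removes at most two breakpoints, so d(perm) >= ceil(b / 2)."""
    return (breakpoints(perm) + 1) // 2
```

Applying `reverse([1, 2, 3], 0, 1)` gives `[-2, -1, 3]`, which has two breakpoints and a lower bound of one reversal; reversing the same segment again restores the identity.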