Genetics, Development and Cell Biology Ph.D. candidate: Carla Mann
Major: Bioinformatics and Computational Biology
Major professor: Drena Dobbs, University Professor
Title: Applications of Machine Learning to Solve Biological Puzzles
Abstract: The era of “big data” has led to the generation of more biological data than any human could hope to process. This flood of data has necessitated the development of computational methods to assist in analysis, but it has also made it possible to begin modeling complex biological systems. Machine learning methods represent one avenue for such modeling and allow for the identification of intricate and often cryptic signals underlying biological processes.
In this dissertation, I present two machine learning models, RPIDisorder and MEDJED, which were developed to predict RNA-protein interaction partners (RPIPs) and DNA double-strand break (DSB) repair by the microhomology-mediated end joining (MMEJ) pathway, respectively.
RPIDisorder uses signals from protein and RNA sequences that have been used successfully in published RNA-protein partner prediction methods, but it is superior to existing methods because it also exploits signal from disordered protein regions to predict interactions with greater specificity than was previously possible. This makes RPIDisorder more useful for modeling biologically relevant RNA-protein interaction networks, potentially leading to the identification of novel targets for clinical interventions to treat the numerous cancers and neurological and metabolic disorders associated with disruptions in RNA-protein interactions.
MEDJED (Microhomology-Evoked Deletion Judication EluciDation) uses signal within and surrounding short stretches of homologous DNA sequence (microhomologies) on either side of a DSB introduced by a gene editing nuclease (e.g., CRISPR-Cas9 or TALEN) to predict the extent to which a targeted genomic site will be repaired using the MMEJ pathway. In this way, MEDJED can dramatically reduce the unpredictability that currently limits the utility of CRISPR-mediated gene editing for functional genomics and gene therapy applications.
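To make the notion of a microhomology concrete, the following is a minimal sketch of how candidate microhomologies flanking a cut site might be enumerated. It is not MEDJED itself, which is a trained predictive model. The function name `find_microhomologies`, its parameters, and the window and minimum-length defaults are all assumptions made for illustration only.

```python
def find_microhomologies(seq, cut, min_len=3, window=20):
    """List short motifs that occur on both sides of a cut site.

    seq     : DNA sequence (string)
    cut     : index of the first base to the right of the break
    min_len : shortest motif to report (illustrative default)
    window  : how many bases to scan on each side (illustrative default)

    Returns (motif, left_start, right_start, deletion_length) tuples.
    deletion_length is the number of bases lost if the two copies are
    joined so that only one copy of the motif remains.
    Sub-motifs of longer matches are reported as well.
    """
    seq = seq.upper()
    left_lo = max(0, cut - window)
    right_hi = min(len(seq), cut + window)
    hits = []
    for k in range(min_len, window + 1):
        # The left copy must lie entirely upstream of the cut.
        for i in range(left_lo, cut - k + 1):
            motif = seq[i:i + k]
            # The right copy must lie entirely downstream of the cut.
            j = seq.find(motif, cut, right_hi)
            while j != -1:
                hits.append((motif, i, j, j - i))
                j = seq.find(motif, j + 1, right_hi)
    return hits


if __name__ == "__main__":
    # Hypothetical 16-base target with a break between positions 7 and 8.
    for hit in find_microhomologies("TTACGTAAGGACGTCC", 8, min_len=4):
        print(hit)
```

A list of this kind only describes which deletions are possible at a site. Predicting how much of the repair at that site actually proceeds through MMEJ is the task the trained model addresses.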
Taken together, the results of these studies demonstrate that machine learning models can be valuable for identifying sequence signals that mediate macromolecular recognition, with many potential applications in both basic and applied research.