
Priyanka Bhandary, Ph.D. Defense: 'Employing predictive techniques and expression-based information to functionally characterize orphan genes in Arabidopsis thaliana'

Apr 25, 2022 - 10:00 AM

Speaker: Priyanka Bhandary, GDCB graduate student (Bioinformatics and Computational Biology major) in GDCB Professor Eve Wurtele's lab

Title: "Employing predictive techniques and expression-based information to functionally characterize orphan genes in Arabidopsis thaliana"

Abstract: More than 15 petabases of raw RNA-Seq data are accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. In my first chapter, I have shown how little of the RNA-Seq data in public databases is reusable, using test cases drawn from different databases and from RNA-Seq data for different organisms. I have further presented solutions for both the repositories and data depositors.

An essential part of understanding a genome is deciphering the function of the genes that compose it. It has been hypothesized that sets of genes with similar expression patterns across multiple spatial and temporal conditions could be related in function. Such groups of genes, characterized as regulons, have been made available for Arabidopsis thaliana. Regulon information for unannotated genes can give context to a scientist, who can then validate these genes' potential functions. I have developed a machine learning framework to predict regulon information for functionally unannotated genes using the massive amount of publicly available expression data for Arabidopsis thaliana.

Finally, I have performed a meta-analysis of expression data for candidate orphan genes, deciphered using expression data in MetaOmGraph (MOG). Orphan genes are those genes that share no homology to known genes and have been slow to equip the organism with a tool set that assists in various aspects of survival. It has been 25 years since they were first described, and there is still a lot to be learned about how they originate and what their functions are. Orphan genes have been predicted in Arabidopsis thaliana using various methodologies. Understanding the context in which they are expressed could give an idea of their functionality. I have used phylostratigraphic and microsyntenic analysis to narrow down the Arabidopsis thaliana orphan genes using seven recently sequenced Arabidopsis thaliana accessions. Differential gene expression analysis under various stress conditions also reveals the orphan genes that are differentially regulated under those specific stresses. Co-expression and clustering analysis shed light on the modules in which they are expressed. I have used Gene Ontology (GO) enrichment to gain further insight into the specific functionalities of the predicted candidate orphan genes. To choose the most suitable file for downstream analysis in MOG, I have used an empirical method based on these GO clusters after performing various combinations of batch correction and normalization. This chapter also provides templates for meta-analysis to develop potential functional hypotheses, along with a first-of-its-kind MOG project that uses Arabidopsis thaliana RNA-Seq data and is built as a community resource.
