GDCB Seminar: Unique molecular identifiers for more accurate counting of reads

Tuesday, April 13, 2021 - 4:10pm to 5:00pm
Speaker: Karin Dorman, professor in the Department of Genetics, Development and Cell Biology and the Department of Statistics, Iowa State University

Title: Unique molecular identifiers for more accurate counting of reads

Abstract: A multitude of standard and not-so-standard experimental techniques now rely on high-throughput sequencing to quantify the abundance of features, such as microbes, transcripts, binding sites, and interactions, in biological samples. Most of these protocols involve PCR amplification, which distorts feature abundance and the features themselves through rare but amplified errors. Subsequent error-prone sequencing then further distorts the features. In some protocols, such as scRNA-seq and immunological profiling, it is now routine to add Unique Molecular Identifiers (UMIs) prior to amplification. Reads with the same UMI tag are amplified products of one originally sampled molecule, which allows amplification bias to be eliminated by grouping reads into clusters representing true biological sequences and counting UMIs instead of reads per cluster to quantify molecular abundance. Furthermore, reads with the same UMI can be combined to produce highly accurate estimates of the true biological sequence. There are now several UMI-aware deduplication and quantification tools offered by the bioinformatics community, but they may fail to achieve accurate estimation of molecular counts. I introduce Deduplication and accurate Abundance estimation with UMIs (DAUMI), a novel probabilistic method and software to detect and quantify true biological sequences. DAUMI recognizes UMI collision, where the same UMI is used more than once, and it can detect and correct PCR and sequencing errors in the UMI and sampled sequences. Focusing on amplicon data, I benchmark the approach against other UMI-aware clustering methods, demonstrating that DAUMI performs better on both simulated and real datasets, particularly achieving higher accuracy on datasets with UMI collision.

This is joint work with Xiyu Peng, a graduate student in the Bioinformatics and Computational Biology Program.
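
For readers new to UMIs, the basic counting idea in the abstract (count distinct UMIs per sequence rather than reads per sequence) can be illustrated with a minimal sketch. This is not DAUMI and is not taken from the talk: the function name `count_molecules` and the toy reads are hypothetical, and the sketch assumes exact matching of UMIs and sequences, with no correction of PCR or sequencing errors and no handling of UMI collision.

```python
from collections import Counter, defaultdict


def count_molecules(reads):
    """Naive UMI-based counting (illustrative only, not DAUMI).

    reads: iterable of (umi, sequence) pairs.
    Returns (read_counts, umi_counts), both keyed by sequence.
    Assumes UMIs and sequences are error-free and that each UMI
    tags only one original molecule (no UMI collision).
    """
    read_counts = Counter()
    umis_per_seq = defaultdict(set)
    for umi, seq in reads:
        read_counts[seq] += 1        # raw read count, inflated by PCR
        umis_per_seq[seq].add(umi)   # distinct UMIs observed for this sequence
    umi_counts = {seq: len(umis) for seq, umis in umis_per_seq.items()}
    return dict(read_counts), umi_counts


# Toy data: the first molecule was amplified four times, so its
# sequence shows five reads but only two distinct UMIs.
reads = [
    ("AACG", "ACGTACGT"),
    ("AACG", "ACGTACGT"),
    ("AACG", "ACGTACGT"),
    ("AACG", "ACGTACGT"),
    ("TTGC", "ACGTACGT"),
    ("GGAT", "TTTTCCCC"),
    ("GGAT", "TTTTCCCC"),
]
read_counts, umi_counts = count_molecules(reads)
print(read_counts)
print(umi_counts)
```

In this toy example the read counts (5 versus 2) reflect uneven amplification, while the UMI counts (2 versus 1) estimate the number of originally sampled molecules. With real data, errors in the UMI or the sequence would split one molecule into several apparent ones, and UMI collision would merge distinct molecules, which are the cases the abstract says DAUMI is designed to handle.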

April 13, 2021, GDCB Seminar flyer (Karin Dorman)