Improving RNA-seq transcript quantification

Yuan, Lingnan

dc.contributor.advisor	Dorman, Karin
dc.contributor.advisor	Liu, Peng
dc.contributor.advisor	Dai, Xiongtao
dc.contributor.advisor	Niemi, Jarad
dc.contributor.advisor	Espin Palazon, Raquel
dc.contributor.author	Yuan, Lingnan
dc.contributor.department	Statistics (LAS)
dc.date.accessioned	2023-01-10T20:08:10Z
dc.date.available	2023-01-10T20:08:10Z
dc.date.embargo	2025-01-10T00:00:00Z
dc.date.issued	2022-12
dc.date.updated	2023-01-10T20:08:11Z
dc.description.abstract	RNA-seq is a deep sequencing technique used to analyze the expression of messenger RNA (mRNA) molecules (transcripts) in a cell or cells. Many existing tools for transcript quantification use the EM algorithm. This dissertation proposes several methods to improve the performance of these tools. In the first part of the dissertation, we incorporate EM acceleration methods (Anderson acceleration, SQUAREM, and quasi-Newton methods) into one of the most popular transcript quantification tools, Salmon. We show that the accelerated algorithms speed up the original EM algorithm at no cost in accuracy. The performance is consistent across different initializations and data characteristics. Versions with backtracking guarantee monotone convergence and satisfaction of the boundary constraints, with limited effect on speed. In the second part of the dissertation, we focus on estimation methods that better reflect the sparsity found in bulk and especially single-cell RNA-seq data. We introduce a penalty function, designed for probabilities, into the optimization. The penalty encourages estimated transcript abundances to lie on a vertex or edge of the probability simplex, thus achieving both shrinkage and parsimony in the estimated transcript abundances. The penalized EM algorithm distinguishes truly absent transcripts from expressed ones better than the original EM, in both bulk and single-cell RNA-seq data. In the third part of the dissertation, we focus on more efficient calculation of the quantification uncertainty, or estimated standard errors, of transcript abundances. Current methods to estimate quantification uncertainty rely heavily on resampling methods, such as the bootstrap and Gibbs sampling, which require a large number of expensive replicates for good accuracy. We show that a formulation derived using Louis' method can estimate the quantification uncertainty without resampling, and we demonstrate its utility on simulated data.
All three methods should have broad utility in the quantification step of standard RNA-seq analyses.
dc.format.mimetype	PDF
dc.identifier.uri	https://dr.lib.iastate.edu/handle/20.500.12876/Nr1VX9Rz
dc.language.iso	en
dc.language.rfc3066	en
dc.subject.disciplines	Statistics	en_US
dc.subject.disciplines	Bioinformatics	en_US
dc.subject.keywords	EM algorithm	en_US
dc.subject.keywords	Penalty function	en_US
dc.subject.keywords	Quantification uncertainty	en_US
dc.subject.keywords	RNA-seq	en_US
dc.title	Improving RNA-seq transcript quantification
dc.type	dissertation	en_US
dc.type.genre	dissertation	en_US
dspace.entity.type	Publication
relation.isOrgUnitOfPublication	264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.discipline	Statistics	en_US
thesis.degree.discipline	Bioinformatics	en_US
thesis.degree.grantor	Iowa State University	en_US
thesis.degree.level	dissertation
thesis.degree.name	Doctor of Philosophy	en_US

File

Original bundle

Name: Yuan_iastate_0097E_20621.pdf
Size: 867.66 KB
Format: Adobe Portable Document Format

Collections

Theses and Dissertations