Improving RNA-seq transcript quantification
dc.contributor.advisor | Dorman, Karin | |
dc.contributor.advisor | Liu, Peng | |
dc.contributor.advisor | Dai, Xiongtao | |
dc.contributor.advisor | Niemi, Jarad | |
dc.contributor.advisor | Espin Palazon, Raquel | |
dc.contributor.author | Yuan, Lingnan | |
dc.contributor.department | Statistics (LAS) | |
dc.date.accessioned | 2023-01-10T20:08:10Z | |
dc.date.available | 2023-01-10T20:08:10Z | |
dc.date.embargo | 2025-01-10T00:00:00Z | |
dc.date.issued | 2022-12 | |
dc.date.updated | 2023-01-10T20:08:11Z | |
dc.description.abstract | RNA-seq is a deep sequencing technique used to analyze the expression of messenger RNA (mRNA) molecules (transcripts) in a cell or cells. Many existing tools for transcript quantification use the EM algorithm. This dissertation proposes several methods to improve the performance of these tools. In the first part of the dissertation, we incorporate EM acceleration methods (Anderson acceleration, SQUAREM, and quasi-Newton methods) into one of the most popular transcript quantification tools, Salmon. We show that the accelerated algorithms can speed up the original EM algorithm at no cost in accuracy. The performance is consistent across different initializations and data characteristics. Versions with backtracking guarantee monotone convergence and enforce boundary constraints, with limited effect on speed. In the second part of the dissertation, we focus on estimation methods that better reflect the sparsity found in bulk and especially single-cell RNA-seq data. We introduce a penalty function, designed for probabilities, into the optimization. The penalty encourages estimated transcript abundances to lie on a vertex or edge of the probability simplex, thus achieving both shrinkage and parsimony in the estimated transcript abundances. The penalized EM algorithm distinguishes truly absent transcripts from expressed ones better than the original EM algorithm, in both bulk and single-cell RNA-seq data. In the third part of the dissertation, we focus on more efficient calculation of the quantification uncertainty, or estimated standard errors, of transcript abundances. Current methods for estimating quantification uncertainty rely heavily on resampling methods, such as the bootstrap and Gibbs sampling, which require a large number of expensive replicates for good accuracy. We show that a formulation derived using Louis' method can estimate the quantification uncertainty without resampling, and we demonstrate its utility on simulated data. All three methods should have broad utility in the quantification step of standard RNA-seq analyses. | |
dc.format.mimetype | ||
dc.identifier.uri | https://dr.lib.iastate.edu/handle/20.500.12876/Nr1VX9Rz | |
dc.language.iso | en | |
dc.language.rfc3066 | en | |
dc.subject.disciplines | Statistics | en_US |
dc.subject.disciplines | Bioinformatics | en_US |
dc.subject.keywords | EM algorithm | en_US |
dc.subject.keywords | Penalty function | en_US |
dc.subject.keywords | Quantification uncertainty | en_US |
dc.subject.keywords | RNA-seq | en_US |
dc.title | Improving RNA-seq transcript quantification | |
dc.type | dissertation | en_US |
dc.type.genre | dissertation | en_US |
dspace.entity.type | Publication | |
relation.isOrgUnitOfPublication | 264904d9-9e66-4169-8e11-034e537ddbca | |
thesis.degree.discipline | Statistics | en_US |
thesis.degree.discipline | Bioinformatics | en_US |
thesis.degree.grantor | Iowa State University | en_US |
thesis.degree.level | dissertation | |
thesis.degree.name | Doctor of Philosophy | en_US |
File
Original bundle
- Name:
- Yuan_iastate_0097E_20621.pdf
- Size:
- 867.66 KB
- Format:
- Adobe Portable Document Format
License bundle
- Name:
- license.txt
- Size:
- 0 B
- Format:
- Item-specific license agreed to upon submission