An updated method for correcting batch effect
Huo, Yonghui
dc.contributor.department Statistics
dc.contributor.majorProfessor Dorman, Karin
2021-06-02T13:42:39.000 2021-08-14T03:33:59Z 2021-08-14T03:33:59Z Wed Jan 01 00:00:00 UTC 2020 2020-11-26 2021-01-01
dc.description.abstract I propose a novel variation (Pro-SVA) on iteratively reweighted surrogate variable analysis (IRW-SVA) for detecting and measuring batch effects in high-dimensional gene expression data. Specifically, I propose replacing the singular value decomposition (SVD) in the IRW-SVA iterations with the matrix-free high-dimensional factor analysis (HDFA) algorithm. HDFA efficiently provides maximum likelihood estimates of the error variances and batch loadings, which can then be used to estimate the batch factors. To evaluate Pro-SVA, I simulated 100 samples of 1,000 genes with batch effects and (1) no biological effects, (2) biological effects for half of the genes, or (3) biological effects for all genes. To compare the two methods, I estimated the batch-induced correlation matrix with each and computed the relative Frobenius distance from each estimate to the true correlation matrix. The results show that Pro-SVA estimates the correlation matrix better than IRW-SVA in most cases, especially when there are no biological effects or when the biological covariate affects only half of the genes. Pro-SVA therefore holds promise as a new approach for detecting and accounting for batch effects in high-dimensional gene expression datasets.
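The comparison metric named in the abstract can be illustrated with a minimal sketch. It assumes the relative Frobenius distance is the Frobenius norm of the difference divided by the Frobenius norm of the true matrix; the record does not give the exact definition. The simulated factor model, the dimensions, and the function names are hypothetical, and the plain sample correlation stands in for an estimator. It is neither Pro-SVA nor IRW-SVA.

```python
import numpy as np


def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


def relative_frobenius_distance(estimate, truth):
    """Assumed definition: ||estimate - truth||_F / ||truth||_F."""
    return np.linalg.norm(estimate - truth, "fro") / np.linalg.norm(truth, "fro")


# Hypothetical toy dimensions, much smaller than the 1,000 genes in the abstract.
rng = np.random.default_rng(0)
n_genes, n_samples, n_factors = 50, 100, 2

# Factor model: data = loadings @ factors + noise, with gene-specific error variances.
loadings = rng.normal(size=(n_genes, n_factors))
error_var = rng.uniform(0.5, 1.5, size=n_genes)
true_corr = cov_to_corr(loadings @ loadings.T + np.diag(error_var))

factors = rng.normal(size=(n_factors, n_samples))
noise = rng.normal(size=(n_genes, n_samples)) * np.sqrt(error_var)[:, None]
data = loadings @ factors + noise

# Stand-in estimator: the sample correlation between genes (rows are genes).
est_corr = np.corrcoef(data)
distance = relative_frobenius_distance(est_corr, true_corr)
```

A smaller distance means the estimate lies closer to the true correlation matrix, and a perfect estimate gives a distance of zero.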
dc.format.mimetype PDF
dc.identifier archive/
dc.identifier.articleid 1713
dc.identifier.contextkey 20315366
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath creativecomponents/748
dc.source.bitstream archive/|||Sat Jan 15 01:48:37 UTC 2022
dc.source.bitstream archive/|||Sat Jan 15 01:48:38 UTC 2022
dc.subject.disciplines Applied Statistics
dc.subject.disciplines Biostatistics
dc.subject.disciplines Microarrays
dc.subject.disciplines Multivariate Analysis
dc.subject.disciplines Statistical Methodology
dc.subject.disciplines Statistical Models
dc.subject.keywords batch effect
dc.subject.keywords Singular Value Decomposition (SVD)
dc.subject.keywords high dimensional data
dc.subject.keywords factor analysis model
dc.subject.keywords gene expression data
dc.subject.keywords RNA-seq
dc.title An updated method for correcting batch effect
dc.type article
dc.type.genre creativecomponent
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca Statistics creativecomponent
Original bundle
Adobe Portable Document Format, 48.18 KB
Microsoft Word XML, 12.54 KB