An updated method for correcting batch effect

dc.contributor.author Huo, Yonghui
dc.contributor.department Statistics
dc.contributor.majorProfessor Dorman, Karin
dc.date 2021-06-02T13:42:39.000
dc.date.accessioned 2021-08-14T03:33:59Z
dc.date.available 2021-08-14T03:33:59Z
dc.date.copyright Wed Jan 01 00:00:00 UTC 2020
dc.date.embargo 2020-11-26
dc.date.issued 2021-01-01
dc.description.abstract I propose a novel variation (Pro-SVA) on iteratively reweighted surrogate variable analysis (IRW-SVA) for detecting and measuring batch effects in high-dimensional gene expression data. Specifically, I propose to use the matrix-free high-dimensional factor analysis (HDFA) algorithm instead of singular value decomposition (SVD) in the IRW-SVA iterations. HDFA efficiently provides the maximum likelihood estimates of the error variances and batch loadings, which can subsequently be used to estimate the batch factors. To evaluate the performance of Pro-SVA, I simulated 100 samples of 1,000 genes with batch effects and (1) no biological effects, (2) biological effects for half of the genes, or (3) biological effects for all genes. To compare Pro-SVA with IRW-SVA, I estimated the batch-induced correlation matrix with each method and computed the relative Frobenius distance of each estimate to the true correlation matrix. The results show that Pro-SVA obtains better estimates of the correlation matrix than IRW-SVA in most cases, especially when there are no biological effects or when the biological covariate affects only half the genes. Therefore, Pro-SVA holds promise as a new approach to detect and account for batch effects in high-dimensional gene expression datasets.
dc.format.mimetype PDF
dc.identifier archive/lib.dr.iastate.edu/creativecomponents/748/
dc.identifier.articleid 1713
dc.identifier.contextkey 20315366
dc.identifier.doi https://doi.org/10.31274/cc-20240624-202
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath creativecomponents/748
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/9z0K39qr
dc.source.bitstream archive/lib.dr.iastate.edu/creativecomponents/748/An_updated_method_for_correcting_batch_effect.docx|||Sat Jan 15 01:48:37 UTC 2022
dc.source.bitstream archive/lib.dr.iastate.edu/creativecomponents/748/auto_convert.pdf|||Sat Jan 15 01:48:38 UTC 2022
dc.subject.disciplines Applied Statistics
dc.subject.disciplines Biostatistics
dc.subject.disciplines Microarrays
dc.subject.disciplines Multivariate Analysis
dc.subject.disciplines Statistical Methodology
dc.subject.disciplines Statistical Models
dc.subject.keywords batch effect
dc.subject.keywords Singular Value Decomposition (SVD)
dc.subject.keywords high dimensional data
dc.subject.keywords factor analysis model
dc.subject.keywords gene expression data
dc.subject.keywords RNA-seq
dc.title An updated method for correcting batch effect
dc.type article
dc.type.genre creativecomponent
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.discipline Statistics
thesis.degree.level creativecomponent
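The abstract compares methods by the relative Frobenius distance between an estimated and a true batch-induced correlation matrix. The sketch below illustrates one way such a comparison could be computed. It is not taken from the work itself: the normalisation by the norm of the true matrix, the function names, and the toy loadings and error variances are all assumptions made for illustration, and the correlation matrix is derived from the standard factor-model covariance (loadings times their transpose, plus a diagonal of error variances).

```python
import math


def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def relative_frobenius_distance(estimate, truth):
    """||estimate - truth||_F / ||truth||_F.

    Assumption: the distance is normalised by the norm of the true matrix.
    """
    diff = [[e - t for e, t in zip(e_row, t_row)]
            for e_row, t_row in zip(estimate, truth)]
    return frobenius_norm(diff) / frobenius_norm(truth)


def factor_model_correlation(loadings, error_variances):
    """Correlation matrix implied by the covariance L L^T + diag(psi).

    loadings: one row per gene, one column per batch factor.
    error_variances: one positive value per gene.
    """
    p = len(loadings)
    cov = [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
            + (error_variances[i] if i == j else 0.0)
            for j in range(p)]
           for i in range(p)]
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


# Toy example with made-up numbers: 3 genes, 1 batch factor.
true_corr = factor_model_correlation([[0.8], [0.6], [0.0]], [0.36, 0.64, 1.0])
est_corr = factor_model_correlation([[0.7], [0.6], [0.1]], [0.51, 0.64, 0.99])
distance = relative_frobenius_distance(est_corr, true_corr)
print(f"relative Frobenius distance: {distance:.4f}")
```

A smaller distance means the estimated correlation structure is closer to the truth. In a comparison like the one the abstract describes, the same function would be applied to the estimate from each method against the same true matrix.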
File
Original bundle
Name: auto_convert.pdf
Size: 48.18 KB
Format: Adobe Portable Document Format
Name: An_updated_method_for_correcting_batch_effect.docx
Size: 12.54 KB
Format: Microsoft Word XML