An updated method for correcting batch effect

Supplemental Files
Date
2021-01-01
Authors
Huo, Yonghui
Journal Title
Journal ISSN
Volume Title
Publisher
Altmetrics
Authors
Research Projects
Organizational Units
Statistics
Organizational Unit
Journal Issue
Series
Abstract

Abstract

I propose a novel variation (Pro-SVA) on iteratively reweighted surrogate variable analysis (IRW-SVA) for detecting and measuring batch effects in high dimensional gene expression data. Specifically, I propose to use the matrix-free high dimensional factor analysis (HDFA) algorithm instead of singular value decomposition (SVD) in the IRW-SVA iterations. HDFA efficiently provides the maximum likelihood estimates of the error variances and batch loadings, which can subsequently be used to estimate the batch factors. To evaluate the performance of Pro-SVA, I simulated 100 samples of 1,000 genes with batch effects and (1) no biological effects, (2) biological effects for half of the genes, or (3) biological effects for all genes. To compare the methods, I estimated the batch-induced correlation matrix using both methods and computed the relative Frobenius distance of this estimate to the true correlation matrix. The results show that Pro-SVA obtains better estimates of the correlation matrix than IRW-SVA in most cases, especially when there are no biological effects or when the biological covariate affects only half the genes. Therefore, Pro-SVA holds promise as a new approach to detect and account for batch effects in high-dimensional gene expression datasets.

Description
Keywords
batch effect, Singular Value Decomposition (SVD), high dimensional data, factor analysis model, gene expression data, RNA-seq
Citation
DOI
Source