Identifying Relevant Covariates in RNA-seq Analysis by Pseudo-Variable Augmentation

dc.contributor.author Nguyen, Yet
dc.contributor.author Nettleton, Dan
dc.contributor.department Statistics (CALS)
dc.date.accessioned 2024-11-07T14:39:31Z
dc.date.available 2024-11-07T14:39:31Z
dc.date.issued 2024-11-02
dc.description.abstract RNA-sequencing (RNA-seq) technology allows for the identification of differentially expressed genes, which are genes whose mean transcript abundance levels vary across conditions. In practice, RNA-seq datasets often include covariates that are of primary interest in addition to a set of covariates that are subject to selection. Some of these covariates may be relevant to gene expression levels, while others may be irrelevant. Ignoring relevant covariates or attempting to adjust for the effect of irrelevant covariates can compromise the identification of differentially expressed genes. To address this issue, we propose a variable selection method that uses pseudo-variables to control the expected proportion of selected covariates that are irrelevant. Our method accurately selects relevant covariates while keeping the false selection rate below a specified level. We demonstrate that our method outperforms existing methods for detecting differentially expressed genes when working with available covariates. Our method is implemented in the FSRAnalysisBS function of the R package csrnaseq, which is available at www.github.com/ntyet/csrnaseq. The analysis and simulation are available at www.github.com/ntyet/csrnaseq/tree/main/analysis.
dc.description.comments This article is published as Nguyen, Y., Nettleton, D. Identifying Relevant Covariates in RNA-seq Analysis by Pseudo-Variable Augmentation. JABES (2024). https://doi.org/10.1007/s13253-024-00665-3.
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/jw27EMDv
dc.language.iso en
dc.publisher Springer Nature
dc.rights © 2024 The Author(s). This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
dc.source.uri https://doi.org/10.1007/s13253-024-00665-3
dc.subject.disciplines DegreeDisciplines::Physical Sciences and Mathematics::Statistics and Probability
dc.subject.keywords False selection rate
dc.subject.keywords False discovery rate
dc.subject.keywords Variable selection
dc.subject.keywords Differential expression analysis
dc.title Identifying Relevant Covariates in RNA-seq Analysis by Pseudo-Variable Augmentation
dc.type article
dspace.entity.type Publication
relation.isAuthorOfPublication 7d86677d-f28f-4ab1-8cf7-70378992f75b
relation.isOrgUnitOfPublication 5a1eba07-b15d-466a-a333-65bd63a4001a
File
Original bundle
Name: 2024-Nettleton-IdentifyingRelevant.pdf
Size: 514.72 KB
Format: Adobe Portable Document Format
Name: Supplementary Materials for IdentifyingRelevant.pdf
Size: 505.03 KB
Format: Adobe Portable Document Format