New aspects of statistical methods for missing data problems, with applications in bioinformatics and genetics

Wang, Dong
Major Professor
Dan Nettleton
Song Xi Chen
Organizational Unit
Department of Statistics, Iowa State University

As missing data problems become more commonplace in biological research and other areas, a method that relies on relaxed assumptions yet is flexible enough to accommodate a wide range of situations is highly desirable. We propose a nonparametric imputation method for data with missing values. Inference on parameters defined by general estimating equations is performed using an empirical likelihood method. It is shown that the nonparametric imputation method, together with empirical likelihood, can reduce bias and improve the efficiency of the estimate relative to inference using only the complete cases of the dataset. The confidence regions obtained by empirical likelihood demonstrate good coverage properties. Because our method is valid under very weak assumptions while also possessing the flexibility inherent to estimating equations and empirical likelihood, it can be applied to a wide range of problems. An example is given using mouse eye weight and gene expression data.

Missing data methods are also highly valuable from an experimental design point of view. We propose a selective transcriptional profiling approach to improve the efficiency and affordability of genetical genomics research. The high cost of microarrays tends to limit the adoption of the standard genetical genomics approach. Our method is derived in a missing data framework, in which only a subset of objects is subjected to microarray experiments. It is shown that this approach can significantly reduce experimental cost while still achieving satisfactory power.

To address the need for a nonparametric method, we develop empirical-likelihood-based inference for multi-sample comparison problems using data with surrogate variables. By applying this result to selective transcriptional profiling, we show that the idea of using relatively inexpensive trait data on extra individuals to improve the power of the test for association between a QTL and gene transcriptional abundance also applies to the empirical-likelihood-based method.
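
For readers unfamiliar with empirical likelihood, the following is a minimal sketch of the idea in its simplest textbook form: the empirical likelihood ratio statistic for a univariate mean, which corresponds to the single estimating equation g(x, mu) = x - mu. It is not the imputation-based procedure developed in the thesis. The function name `el_log_ratio` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def el_log_ratio(x, mu):
    """Return -2 * log(empirical likelihood ratio) for the hypothesis E[X] = mu.

    Illustrative sketch only: univariate mean, fully observed data.
    """
    z = [xi - mu for xi in x]
    z_min, z_max = min(z), max(z)
    # If mu is not strictly inside the range of the data, no set of
    # positive weights can satisfy the mean constraint.
    if z_min >= 0 or z_max <= 0:
        return math.inf

    # The Lagrange multiplier lam must keep 1 + lam * z_i > 0 for every i,
    # which confines it to the open interval (-1/z_max, -1/z_min).
    lo, hi = -1.0 / z_max, -1.0 / z_min
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps

    def score(lam):
        # Strictly decreasing in lam; its root gives the optimal weights
        # w_i = 1 / (n * (1 + lam * z_i)).
        return sum(zi / (1.0 + lam * zi) for zi in z)

    # Plain bisection on the decreasing score function.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    return 2.0 * sum(math.log1p(lam * zi) for zi in z)


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 4.0, 2.8, 3.1]  # hypothetical observations
    for mu in (2.5, sum(sample) / len(sample), 3.5):
        print(mu, el_log_ratio(sample, mu))
```

Under standard regularity conditions the statistic is asymptotically chi-square with one degree of freedom, so an approximate 95% confidence interval for the mean collects every `mu` whose statistic does not exceed 3.841. Bisection is used here only to keep the sketch dependency-free; a Newton-type solver would be the more usual choice when the estimating equation is multivariate.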

Date
2006-01-01