Some Bayes methods for biclustering and vector data with binary coordinates

Date
2019-01-01
Authors
Chakraborty, Abhishek
Advisor
Stephen B. Vardeman
Department
Statistics
Abstract

We consider Bayes methods for two problems that share a need to partition index sets encoding commonalities between observations. The first is a biclustering problem. The second is inference for mixture models for $p$-vectors with binary coordinates.

Standard one-way clustering methods form homogeneous groups in a set of objects. Biclustering methods simultaneously cluster the rows and columns of a rectangular dataset so that responses are homogeneous within every row-cluster by column-cluster group. Assuming that data entries follow a normal distribution with a bicluster-specific mean and a common variance, we propose a Bayes methodology for biclustering and corresponding Markov chain Monte Carlo (MCMC) algorithms. The proposed method not only identifies homogeneous biclusters but also generates plausible predictions for missing or unobserved entries in the potential rectangular dataset, as illustrated through simulation studies and applications to real datasets.
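
As a rough illustration of the data model described above (normal entries with a bicluster-specific mean and a common variance), the following Python sketch simulates a small rectangular dataset and predicts one missing entry. It is only a sketch under stated assumptions: the row and column cluster labels are treated as known, and the prediction is a plug-in block average, whereas the thesis infers the partitions and predictions from MCMC posterior samples. All function names, labels, means, and the value of `sigma` are hypothetical and chosen for illustration.

```python
import random


def simulate_biclustered(row_labels, col_labels, means, sigma, rng):
    """Draw y[i][j] ~ Normal(means[r][c], sigma^2) with r = row_labels[i], c = col_labels[j]."""
    return [[rng.gauss(means[r][c], sigma) for c in col_labels] for r in row_labels]


def bicluster_means(y, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Average the observed (non-None) entries in each row-cluster by column-cluster block."""
    total = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    count = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            if y[i][j] is not None:
                total[r][c] += y[i][j]
                count[r][c] += 1
    return [
        [total[r][c] / count[r][c] if count[r][c] else None for c in range(n_col_clusters)]
        for r in range(n_row_clusters)
    ]


# Hypothetical setup: 6 rows in 2 row clusters, 4 columns in 2 column clusters.
rng = random.Random(0)
row_labels = [0, 0, 0, 1, 1, 1]
col_labels = [0, 0, 1, 1]
means = [[0.0, 5.0], [-5.0, 10.0]]  # one mean per bicluster (assumed values)
y = simulate_biclustered(row_labels, col_labels, means, sigma=0.5, rng=rng)

# Mark one entry as missing, then predict it from the other entries in its bicluster.
y[0][0] = None
est = bicluster_means(y, row_labels, col_labels, n_row_clusters=2, n_col_clusters=2)
prediction = est[row_labels[0]][col_labels[0]]
print("estimated bicluster means:", est)
print("plug-in prediction for the missing entry:", prediction)
```

With the labels fixed, the block average is the natural point prediction for a missing entry in that block. A full Bayesian treatment would instead average predictions over posterior draws of the partitions, means, and variance.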

In the second problem, we propose a tractable symmetric distribution for modeling $p$-dimensional vectors of 0's and 1's that allows for nontrivial amounts of variation around a central value. We then consider Bayesian analysis of mixture models whose component distributions have this form. Inferences are based on posterior samples generated by MCMC algorithms. We also extend the proposed Bayesian mixture model analysis to datasets with missing entries. Model performance is illustrated through simulation studies and applications to real datasets.
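
The abstract does not state the form of the proposed distribution, so the Python sketch below uses a stand-in that is purely an assumption: each coordinate of a central vector is flipped independently with probability `theta`, so that the probability of a vector depends only on its Hamming distance from the center. This is not claimed to be the distribution developed in the thesis. It only illustrates what a distribution on binary $p$-vectors centered at some value, and a finite mixture of such components, can look like. All names and parameter values are hypothetical.

```python
import itertools
import random


def hamming(x, center):
    """Number of coordinates in which x differs from center."""
    return sum(a != b for a, b in zip(x, center))


def pmf(x, center, theta):
    """Assumed stand-in model: P(x) = theta^d * (1 - theta)^(p - d), d = Hamming distance."""
    d = hamming(x, center)
    return theta ** d * (1.0 - theta) ** (len(x) - d)


def sample(center, theta, rng):
    """Flip each coordinate of center independently with probability theta."""
    return [1 - c if rng.random() < theta else c for c in center]


def mixture_pmf(x, weights, centers, thetas):
    """Finite mixture of the components above, with mixing weights that sum to 1."""
    return sum(w * pmf(x, c, t) for w, c, t in zip(weights, centers, thetas))


def all_binary_vectors(p):
    """Enumerate every vector in {0, 1}^p (feasible only for small p)."""
    return [list(v) for v in itertools.product([0, 1], repeat=p)]


# Hypothetical two-component mixture on p = 4 coordinates.
centers = [[1, 0, 1, 1], [0, 0, 0, 1]]
thetas = [0.2, 0.1]
weights = [0.6, 0.4]

rng = random.Random(1)
draw = sample(centers[0], thetas[0], rng)
total = sum(mixture_pmf(x, weights, centers, thetas) for x in all_binary_vectors(4))
print("one draw from the first component:", draw)
print("mixture probabilities sum to:", total)
```

For `theta` below 1/2 the center is the most probable vector and probability decays with Hamming distance, which gives one simple notion of variation around a central value.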

Copyright
2019-08-01