Some Bayes methods for biclustering and vector data with binary coordinates
Abstract
We consider Bayes methods for two problems that both require partitioning index sets that encode commonalities between observations. The first is a biclustering problem. The second is inference for mixture models for $p$-vectors with binary coordinates.
Standard one-way clustering methods form homogeneous groups in a set of objects. Biclustering methods simultaneously cluster the rows and columns of a rectangular dataset so that responses are homogeneous within each row-cluster by column-cluster group. Assuming that data entries follow a normal distribution with a bicluster-specific mean and a common variance, we propose a Bayes methodology for biclustering and corresponding Markov chain Monte Carlo (MCMC) algorithms. Our proposed method not only identifies homogeneous biclusters, but also generates plausible predictions for missing or unobserved entries in the potential rectangular dataset, as illustrated through simulation studies and applications to real datasets.
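As a rough illustration of the data model described above (not the proposed Bayes method or its MCMC algorithms), the following Python sketch simulates entries that are normal with a bicluster-specific mean and a common variance, then fills a missing entry with the average of the observed entries in its bicluster. The cluster labels are treated as known here, which the actual methodology does not assume, and all names and numeric values (`row_labels`, `col_labels`, `means`, `sigma`) are hypothetical.

```python
import random


def simulate_biclustered_data(row_labels, col_labels, means, sigma, seed=0):
    """Draw y[i][j] ~ Normal(means[r][c], sigma^2), where r and c are the
    row-cluster and column-cluster labels of entry (i, j)."""
    rng = random.Random(seed)
    return [[rng.gauss(means[r][c], sigma) for c in col_labels]
            for r in row_labels]


def block_means(data, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Average the observed (non-None) entries within each bicluster."""
    sums = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    counts = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, r in enumerate(row_labels):
        for j, c in enumerate(col_labels):
            if data[i][j] is not None:
                sums[r][c] += data[i][j]
                counts[r][c] += 1
    return [[sums[r][c] / counts[r][c] if counts[r][c] else None
             for c in range(n_col_clusters)]
            for r in range(n_row_clusters)]


# Hypothetical setup: 6 rows in 2 row clusters, 4 columns in 2 column clusters.
row_labels = [0, 0, 0, 1, 1, 1]
col_labels = [0, 0, 1, 1]
means = [[0.0, 5.0], [-5.0, 10.0]]  # assumed bicluster-specific means
data = simulate_biclustered_data(row_labels, col_labels, means, sigma=0.5)

data[0][0] = None  # mark one entry as missing
est = block_means(data, row_labels, col_labels, 2, 2)

# Plug-in prediction for the missing entry: the mean of its bicluster.
prediction = est[row_labels[0]][col_labels[0]]
print(prediction)
```

A Bayes analysis would instead place priors on the partitions, the bicluster means, and the variance, and would average predictions over posterior samples rather than condition on one fixed labeling.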
In the second problem, we propose a tractable symmetric distribution for modeling multivariate vectors of 0's and 1's in $p$ dimensions that allows for nontrivial amounts of variation around some central value. We then consider Bayesian analysis of mixture models whose component distributions have this form. Inferences are made from posterior samples generated by MCMC algorithms. We also extend our proposed Bayesian mixture model analysis to datasets with missing entries. Model performance is illustrated through simulation studies and applications to real datasets.
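The abstract does not specify the form of the proposed distribution, so the Python sketch below uses a stand-in that is only an assumption: each coordinate of a central binary vector is flipped independently with the same probability, which makes the probability of a vector depend only on its Hamming distance from the center. It shows how a finite mixture of such components generates binary $p$-vectors. The function names, centers, weights, and flip probability are all hypothetical and are not taken from the thesis.

```python
import random


def sample_binary_vector(center, flip_prob, rng):
    """Flip each bit of `center` independently with probability `flip_prob`
    (an assumed stand-in for the thesis's component distribution)."""
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in center]


def sample_mixture(centers, weights, flip_prob, n, seed=0):
    """Draw n (component index, vector) pairs from the mixture."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Pick a mixture component, then perturb its central vector.
        k = rng.choices(range(len(centers)), weights=weights)[0]
        draws.append((k, sample_binary_vector(centers[k], flip_prob, rng)))
    return draws


# Hypothetical two-component mixture on p = 5 binary coordinates.
centers = [[0, 0, 0, 0, 0], [1, 1, 1, 1, 1]]
weights = [0.3, 0.7]
draws = sample_mixture(centers, weights, flip_prob=0.1, n=10)
for k, x in draws:
    print(k, x)
```

In a Bayesian analysis the component labels, centers, and variation parameters would be unknown and sampled by MCMC; this sketch only covers the forward, data-generating direction.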