Simulating Data to Study Performance of Finite Mixture Modeling and Clustering Algorithms

Date
2010-01-01
Authors
Maitra, Ranjan
Melnykov, Volodymyr
Department
Statistics
Abstract

A new method is proposed to generate sample Gaussian mixture distributions according to prespecified overlap characteristics. Such methodology is useful for evaluating the performance of clustering algorithms. Our approach derives and calculates the exact overlap between every pair of clusters, measured as their total probability of misclassification, and then guides the simulation of Gaussian components so that they satisfy the prespecified overlap characteristics. The algorithm is illustrated in two and five dimensions using contour plots and parallel distribution plots, respectively; we introduce and develop the latter to display mixture distributions in higher dimensions. We also study properties of the algorithm and the variability in the simulated mixtures. The utility of the algorithm is demonstrated via a study of initialization strategies in Gaussian clustering. This article has supplementary material online.
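
To make the overlap measure in the abstract concrete, the following is a minimal sketch, not the article's method: the article computes the overlap exactly, whereas this sketch approximates it by Monte Carlo. It assumes diagonal covariance matrices for simplicity, and all function and parameter names (log_density, misclassification_rate, pairwise_overlap, n_draws, seed) are hypothetical choices made for this illustration. The overlap between components i and j is taken as the sum of the two misclassification probabilities, each estimated by drawing from one component and counting how often the other component has the larger weighted density.

```python
import math
import random


def log_density(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (variances in `var`)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def misclassification_rate(i, j, weights, means, variances, n_draws, rng):
    """Estimate P(a draw from component i is assigned to component j),
    considering only the two components i and j."""
    wrong = 0
    for _ in range(n_draws):
        # Draw one point from component i (independent coordinates).
        x = [rng.gauss(m, math.sqrt(v)) for m, v in zip(means[i], variances[i])]
        score_i = math.log(weights[i]) + log_density(x, means[i], variances[i])
        score_j = math.log(weights[j]) + log_density(x, means[j], variances[j])
        if score_j > score_i:
            wrong += 1
    return wrong / n_draws


def pairwise_overlap(i, j, weights, means, variances, n_draws=100_000, seed=0):
    """Monte Carlo estimate of the total probability of misclassification
    between components i and j (an approximation, not an exact calculation)."""
    rng = random.Random(seed)
    return (
        misclassification_rate(i, j, weights, means, variances, n_draws, rng)
        + misclassification_rate(j, i, weights, means, variances, n_draws, rng)
    )


# Illustrative two-component mixture in two dimensions (made-up values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [2.0, 0.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
estimate = pairwise_overlap(0, 1, weights, means, variances)
print(f"estimated overlap: {estimate:.4f}")
```

For this equal-weight, equal-covariance example the overlap has a closed form, twice the standard normal probability below minus half the distance between the means, so the estimate should land near 0.317. A generator in the spirit of the abstract would adjust the component parameters until such pairwise values match the prespecified targets.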

Comments

This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Computational and Graphical Statistics in January 2012, available online: http://www.tandf.com/10.1198/jcgs.2009.08054.

Copyright
2010