A separability index for clustering and classification problems with applications to cluster merging and systematic evaluation of clustering algorithms

Date
2011-01-01
Authors
Peterson, Anna
Advisor
Arka P. Ghosh
Ranjan Maitra
Department
Statistics
Abstract

A separability index that quantifies the degree of difficulty in a hard clustering problem is proposed, under the assumption that each group follows a multivariate Gaussian distribution. We first define a preliminary index and explore its properties both theoretically and numerically. This index is then adjusted so that its final form is also interpretable in terms of the Adjusted Rand Index between a true grouping and its hypothetical idealized clustering, which is taken as a surrogate for clustering complexity. The derived index is used to develop a data-simulation algorithm that generates samples with a prescribed value of the index. This algorithm is particularly useful for systematically generating datasets with varying degrees of clustering difficulty, which we use to evaluate the performance of different clustering algorithms. The index is also shown to be useful as a summary of the distinctiveness of classes in grouped datasets.
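
The link between separation and the Adjusted Rand Index can be illustrated with a small Python sketch. It is not the index or the simulation algorithm from the thesis, whose definitions are not given in this record. It assumes equal-sized Gaussian groups with known parameters, takes the "idealized clustering" to mean assigning each point to the group with the highest true density, and reports the Adjusted Rand Index between that assignment and the true grouping. The function names (`log_gaussian_density`, `adjusted_rand_index`, `idealized_ari`) are hypothetical and chosen for this illustration only.

```python
import numpy as np
from math import comb


def log_gaussian_density(x, mean, cov):
    """Log density of a multivariate Gaussian at each row of x."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row from the mean.
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index computed from the contingency table."""
    a_vals, a_idx = np.unique(labels_a, return_inverse=True)
    b_vals, b_idx = np.unique(labels_b, return_inverse=True)
    table = np.zeros((len(a_vals), len(b_vals)), dtype=int)
    np.add.at(table, (a_idx, b_idx), 1)

    sum_cells = sum(comb(int(n), 2) for n in table.ravel())
    sum_rows = sum(comb(int(n), 2) for n in table.sum(axis=1))
    sum_cols = sum(comb(int(n), 2) for n in table.sum(axis=0))
    total = comb(len(labels_a), 2)

    expected = sum_rows * sum_cols / total
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # degenerate case: both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def idealized_ari(means, covs, n_per_group, seed=0):
    """ARI between the true grouping and a maximum-density assignment.

    Assumption (not from the source): groups are equal-sized, so picking
    the group with the highest true density is the idealized assignment.
    """
    rng = np.random.default_rng(seed)
    samples, truth = [], []
    for k, (m, c) in enumerate(zip(means, covs)):
        samples.append(rng.multivariate_normal(m, c, size=n_per_group))
        truth.append(np.full(n_per_group, k))
    x = np.vstack(samples)
    truth = np.concatenate(truth)

    log_dens = np.column_stack(
        [log_gaussian_density(x, m, c) for m, c in zip(means, covs)]
    )
    ideal = log_dens.argmax(axis=1)
    return adjusted_rand_index(truth, ideal)


if __name__ == "__main__":
    covs = [np.eye(2), np.eye(2)]
    for gap in (1.0, 3.0, 10.0):
        means = [np.zeros(2), np.array([gap, 0.0])]
        print(gap, idealized_ari(means, covs, n_per_group=200))
```

Under these assumptions, widening the gap between the two group means should push the reported value toward 1, while heavily overlapping groups should give a value much closer to 0. That is the sense in which an Adjusted Rand Index of this kind can serve as a surrogate for clustering difficulty.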

Copyright
2011-01-01