Novel data clustering methods and applications
The need to interpret and draw inferences from high-dimensional data sets has driven the development of dimensionality reduction and data clustering techniques over the past decades. Scientific and technological applications of clustering include, among others, bioinformatics, biomedical image analysis, and biological data mining. Current research in data clustering focuses on identifying and exploiting information about dataset geometry and on developing algorithms that are robust to noisy data. Recent approaches based on spectral graph theory have been devised to efficiently handle datasets whose geometry exhibits a manifold structure, while fuzzy clustering methods assign cluster membership probabilities to data points that cannot readily be assigned to a single cluster.
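As a minimal illustration of how a fuzzy method assigns memberships, consider the standard fuzzy c-means rule. It is named here as one common example, not taken from this thesis, and the notation is introduced only for illustration. A point $x_i$ receives a membership $u_{ik}$ in each of $c$ clusters with prototypes $v_1, \dots, v_c$, controlled by a fuzzifier $m > 1$:

$$
u_{ik} = \left( \sum_{j=1}^{c} \left( \frac{\lVert x_i - v_k \rVert}{\lVert x_i - v_j \rVert} \right)^{2/(m-1)} \right)^{-1}, \qquad \sum_{k=1}^{c} u_{ik} = 1 .
$$

A point lying at similar distances from several prototypes receives comparable memberships in those clusters, and as $m \to 1^{+}$ the memberships approach hard 0/1 assignments.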
In this thesis, we develop a family of new data clustering algorithms that combine the strengths of existing spectral approaches to clustering with several desirable properties of fuzzy methods. More precisely, we consider a range of "random-walk" distances defined on several weighted graphs constructed from the data set; these distances allow us to assign "fuzzy" variables to data points in a way that respects the data geometry in many ways. The resulting methodology groups together data that are, in a certain sense, "well-connected", as in spectral clustering, while also assigning them membership values, as in commonly used fuzzy clustering approaches. The approach is well suited to image analysis applications; in particular, we use it to develop a novel facial recognition system that outperforms other well-established methods.
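The combination of random-walk distances and fuzzy memberships described above can be illustrated with a small sketch. It is not the thesis's algorithm, whose specific graphs and distances are not given in this abstract. Instead, it swaps in standard, named techniques as assumptions: a dense Gaussian-weighted graph with bandwidth `sigma`, the Coifman-Lafon diffusion distance computed from the `t`-step random-walk transition matrix, and fuzzy c-medoids (fuzzy c-means memberships with prototypes restricted to data points, initialised by farthest-first traversal). All function and parameter names are hypothetical.

```python
import numpy as np


def diffusion_distances(X, sigma=1.0, t=2):
    """Pairwise diffusion (random-walk) distances on a Gaussian-weighted graph.

    Assumption: a dense Gaussian kernel graph; uses O(n^3) memory, so small n only.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))      # edge weights between all pairs
    d = W.sum(axis=1)                          # weighted node degrees
    P = W / d[:, None]                         # row-stochastic random-walk matrix
    Pt = np.linalg.matrix_power(P, t)          # t-step transition probabilities
    pi = d / d.sum()                           # stationary distribution of the walk
    diff = Pt[:, None, :] - Pt[None, :, :]     # compare rows of P^t pairwise
    return np.sqrt(np.sum(diff ** 2 / pi, axis=-1))


def fuzzy_memberships(D, medoids, m=2.0, eps=1e-12):
    """Fuzzy c-means membership rule using distances to medoid points."""
    dist = D[:, medoids] + eps                 # (n, c); eps avoids division by zero
    ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)             # each row sums to 1


def fuzzy_c_medoids(D, c, m=2.0, n_iter=50):
    """Alternate membership and medoid updates on a precomputed distance matrix."""
    # Farthest-first initialisation: start at point 0, then add the point
    # farthest from all medoids chosen so far.
    medoids = [0]
    for _ in range(1, c):
        medoids.append(int(np.argmax(D[:, medoids].min(axis=1))))
    medoids = np.array(medoids)
    for _ in range(n_iter):
        U = fuzzy_memberships(D, medoids, m)
        # cost[k, j]: membership-weighted squared distance if point j were medoid k
        cost = (U ** m).T @ (D ** 2)
        new_medoids = np.argmin(cost, axis=1)
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return fuzzy_memberships(D, medoids, m), medoids
```

For example, calling `fuzzy_c_medoids(diffusion_distances(X), c=2)` on two well-separated point clouds should concentrate each cloud's membership on a single cluster, while every row of the returned membership matrix sums to one. Measuring distances along the random walk rather than in the ambient space is what lets points that are well connected through the graph end up close to the same prototype.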