Clustering with semi-metrics

With the advent of the Human Genome Project and other genome sequencing efforts, we face the challenge not only of developing new methods of data analysis but also of improving existing ones so that they can better exploit the data. Here we revisit clustering as a tool for large-scale analysis of gene expression (or any other) data. Distance measures are an integral part of any clustering algorithm, since they quantify how similar two objects are. Existing distance measures are not flexible enough to capture user-defined notions of similarity. We define a measure called the Universal Distance Measure (UDM) that is flexible enough to capture any abstract notion of similarity. UDM incorporates the user's view of when two or more objects should be considered the same to arrive at a well-defined distance measure. We also investigate how sub-clusters interact as groups to form new clusters. We call this interaction the Methodology of Joining (MOJ) and study several of its aspects. We believe that UDM and MOJ will help capture more complex relations in the data.
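The abstract does not define UDM or MOJ, so the following is only a generic sketch of the two ingredients it names, not the authors' method. It shows agglomerative clustering in which the distance function and the rule for joining sub-clusters are both pluggable. The distance used here, squared Euclidean distance, is a standard example of a semi-metric (symmetric, non-negative, zero only for identical points, but violating the triangle inequality). The joining rules `single_link`, `complete_link` and `average_link` are standard linkage criteria standing in for MOJ; all function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Point = Sequence[float]
Cluster = List[int]  # a cluster is a list of indices into the data
Matrix = List[List[float]]


def squared_euclidean(a: Point, b: Point) -> float:
    """A semi-metric: symmetric and non-negative, zero only when a == b,
    but it does NOT satisfy the triangle inequality."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Hypothetical stand-ins for a "methodology of joining": each rule scores
# how close two sub-clusters are from the pairwise distances of their members.
def single_link(d: Matrix, c1: Cluster, c2: Cluster) -> float:
    return min(d[i][j] for i in c1 for j in c2)


def complete_link(d: Matrix, c1: Cluster, c2: Cluster) -> float:
    return max(d[i][j] for i in c1 for j in c2)


def average_link(d: Matrix, c1: Cluster, c2: Cluster) -> float:
    return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))


def agglomerate(
    data: Sequence[Point],
    dist: Callable[[Point, Point], float],
    join: Callable[[Matrix, Cluster, Cluster], float],
    k: int,
) -> List[Cluster]:
    """Merge the closest pair of clusters (under `join`) until k remain."""
    n = len(data)
    # Precompute all pairwise distances with the user-supplied measure.
    d = [[dist(data[i], data[j]) for j in range(n)] for i in range(n)]
    clusters: List[Cluster] = [[i] for i in range(n)]
    while len(clusters) > k:
        # Pick the pair of clusters with the smallest joining score.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: join(d, clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return clusters
```

For example, `agglomerate([(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)], squared_euclidean, single_link, 3)` groups points 0 and 1, points 2 and 3, and leaves point 4 alone. Because the distance and joining rule are passed in as functions, a user-defined measure can be swapped in without changing the clustering loop.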

