Kernel-estimated Nonparametric Overlap-Based Syncytial Clustering

Date
2018-01-01
Authors
Almodóvar-Rivera, Israel
Abstract

Standard clustering algorithms usually find regular-structured clusters such as ellipsoidally or spherically dispersed groups, but are more challenged with groups lacking formal structure or definition. Syncytial clustering is the name that we introduce for methods that merge groups obtained from standard clustering algorithms in order to reveal complex group structure in the data. Here, we develop a distribution-free, fully automated syncytial clustering algorithm that can be used with k-means and other algorithms. Our approach computes the cumulative distribution function of the normed residuals from an appropriately fit k-groups model and calculates the nonparametric overlap between each pair of groups. Groups with high pairwise overlaps are merged as long as the generalized overlap decreases. Our methodology is always a top performer in identifying groups with regular and irregular structures in several datasets. The approach is also used to identify the distinct kinds of gamma ray bursts in the Burst and Transient Source Experiment 4Br catalog and the distinct kinds of activation in a functional Magnetic Resonance Imaging study.
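The merging procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made only for illustration: the Gaussian-kernel CDF estimate with a rule-of-thumb bandwidth, the particular pairwise overlap formula, the generalized overlap taken as (largest eigenvalue of the overlap matrix - 1)/(G - 1), the threshold rule with the hypothetical parameter `kappa`, and all function names. The inputs `labels` and `centers` are assumed to come from a previously fitted k-means (or similar k-groups) solution with no empty clusters.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def kernel_cdf(residual_norms, bandwidth=None):
    """Gaussian-kernel-smoothed CDF estimate of the residual norms."""
    r = np.asarray(residual_norms, dtype=float)
    if bandwidth is None:
        # Rule-of-thumb bandwidth: an assumption made for this sketch only.
        bandwidth = max(1.06 * r.std() * len(r) ** -0.2, 1e-12)
    def cdf(x):
        z = (np.asarray(x, dtype=float)[..., None] - r) / (bandwidth * math.sqrt(2.0))
        return 0.5 * (1.0 + _erf(z)).mean(axis=-1)
    return cdf

def pairwise_overlap(S, labels, groups):
    """Overlap matrix between groups; each group is a list of base-cluster ids."""
    G = len(groups)
    members = [np.isin(labels, g) for g in groups]
    omega = np.eye(G)
    for a in range(G):
        for b in range(a + 1, G):
            # Mean closeness of one group's points to the nearest centre of the other.
            w_ab = S[members[a]][:, groups[b]].max(axis=1).mean()
            w_ba = S[members[b]][:, groups[a]].max(axis=1).mean()
            omega[a, b] = omega[b, a] = w_ab + w_ba
    return omega

def generalized_overlap(omega):
    """(largest eigenvalue - 1) / (G - 1) for a G x G overlap matrix, G >= 2."""
    return (np.linalg.eigvalsh(omega)[-1] - 1.0) / (omega.shape[0] - 1)

def connected_groups(groups, edges):
    """Union the groups joined by an edge (simple union-find)."""
    parent = list(range(len(groups)))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for a, b in edges:
        parent[find(a)] = find(b)
    merged = {}
    for i, g in enumerate(groups):
        merged.setdefault(find(i), []).extend(g)
    return list(merged.values())

def syncytial_merge(X, labels, centers, kappa=1.0):
    """Merge base clusters while the generalized overlap keeps decreasing."""
    X, centers = np.asarray(X, float), np.asarray(centers, float)
    labels = np.asarray(labels)
    dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    cdf = kernel_cdf(dist[np.arange(len(X)), labels])  # residual norms to own centre
    S = 1.0 - cdf(dist)  # S[i, l]: estimated P(residual norm > distance of x_i to centre l)
    groups = [[k] for k in range(len(centers))]
    omega = pairwise_overlap(S, labels, groups)
    current = generalized_overlap(omega) if len(groups) > 1 else 0.0
    while len(groups) > 2:
        G = len(groups)
        edges = [(a, b) for a in range(G) for b in range(a + 1, G)
                 if omega[a, b] > kappa * current]
        candidate = connected_groups(groups, edges)
        if not edges or len(candidate) < 2:
            break
        cand_omega = pairwise_overlap(S, labels, candidate)
        cand_value = generalized_overlap(cand_omega)
        if cand_value >= current:
            break  # merging no longer lowers the generalized overlap
        groups, omega, current = candidate, cand_omega, cand_value
    merged = np.empty(len(labels), dtype=int)
    for gid, g in enumerate(groups):
        merged[np.isin(labels, g)] = gid
    return merged, groups, current
```

Calling `syncytial_merge(X, labels, centers)` returns the merged labels, the list of base clusters making up each merged group, and the final generalized overlap. The sketch stops at two groups because the generalized overlap as written here is undefined for a single group.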

Is Version Of
Article
Kernel-estimated Nonparametric Overlap-Based Syncytial Clustering
(Journal of Machine Learning Research, 2020-06) Almodóvar-Rivera, Israel A.; Maitra, Ranjan; Statistics (CALS)
Commonly-used clustering algorithms usually find ellipsoidal, spherical or other regular-structured clusters, but are more challenged when the underlying groups lack formal structure or definition. Syncytial clustering is the name that we introduce for methods that merge groups obtained from standard clustering algorithms in order to reveal complex group structure in the data. Here, we develop a distribution-free fully-automated syncytial clustering algorithm that can be used with k-means and other algorithms. Our approach estimates the cumulative distribution function of the normed residuals from an appropriately fit k-groups model and calculates the estimated nonparametric overlap between each pair of clusters. Groups with high pairwise overlap are merged as long as the estimated generalized overlap decreases. Our methodology is always a top performer in identifying groups with regular and irregular structures in several datasets and can be applied to datasets with scatter or incomplete records. The approach is also used to identify the distinct kinds of gamma ray bursts in the Burst and Transient Source Experiment 4Br catalog and the distinct kinds of activation in a functional Magnetic Resonance Imaging study.
Type
article
Comments

This is a pre-print of the article Almodóvar-Rivera, Israel, and Ranjan Maitra. "Kernel-estimated Nonparametric Overlap-Based Syncytial Clustering." arXiv preprint arXiv:1805.09505 (2018). Posted with permission.

Copyright
2018