An efficient k-means-type algorithm for clustering datasets with incomplete records

dc.contributor.author Lithio, Andrew
dc.contributor.author Maitra, Ranjan
dc.contributor.department Statistics
dc.date 2019-06-27T09:41:29.000
dc.date.accessioned 2020-07-02T06:56:55Z
dc.date.available 2020-07-02T06:56:55Z
dc.date.copyright Mon Jan 01 00:00:00 UTC 2018
dc.date.embargo 2019-09-19
dc.date.issued 2018-12-01
dc.description.abstract <p>The <em>k</em>‐means algorithm is arguably the most popular nonparametric clustering method, but it cannot generally be applied to datasets with incomplete records. The usual practice is then either to impute missing values under an assumed missing‐completely‐at‐random mechanism or to ignore the incomplete records, and then to apply the algorithm to the resulting dataset. We develop an efficient version of the <em>k</em>‐means algorithm that allows for clustering in the presence of incomplete records. Our extension is called <em>k</em><sub><em>m</em></sub>‐means and reduces to the <em>k</em>‐means algorithm when all records are complete. We also provide initialization strategies for our algorithm and methods to estimate the number of groups in the dataset. Illustrations and simulations demonstrate the efficacy of our approach in a variety of settings and patterns of missing data. Our methods are also applied to the analysis of activation images obtained from a functional magnetic resonance imaging experiment.</p>
dc.description.comments <p>This is the peer-reviewed version of the following article: Lithio, Andrew, and Ranjan Maitra. "An efficient k‐means‐type algorithm for clustering datasets with incomplete records." <em>Statistical Analysis and Data Mining: The ASA Data Science Journal</em> 11, no. 6 (2018): 296-311, which has been published in final form at DOI: <a href="http://dx.doi.org/10.1002/sam.11392" target="_blank">10.1002/sam.11392</a>. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving. Posted with permission.</p>
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/stat_las_pubs/167/
dc.identifier.articleid 1168
dc.identifier.contextkey 14423019
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath stat_las_pubs/167
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/90473
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/stat_las_pubs/167/2018_MaitraRanjan_EfficientKmeans.pdf|||Fri Jan 14 21:04:41 UTC 2022
dc.source.uri 10.1002/sam.11392
dc.subject.disciplines Categorical Data Analysis
dc.subject.keywords Amelia
dc.subject.keywords CARP
dc.subject.keywords fMRI
dc.subject.keywords imputation
dc.subject.keywords jump statistic
dc.subject.keywords k‐means++
dc.subject.keywords k‐POD
dc.subject.keywords mice
dc.subject.keywords SDSS
dc.subject.keywords soft constraints
dc.title An efficient k-means-type algorithm for clustering datasets with incomplete records
dc.type article
dc.type.genre article
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
File
Original bundle
Name: 2018_MaitraRanjan_EfficientKmeans.pdf
Size: 2.47 MB
Format: Adobe Portable Document Format