Data clustering using proximity matrices with missing values

dc.contributor.author Karimzadeh, Samira
dc.contributor.author Olafsson, Sigurdur
dc.contributor.department Industrial and Manufacturing Systems Engineering
dc.date 2019-03-11T21:13:12.000
dc.date.accessioned 2020-06-30T04:48:24Z
dc.date.available 2020-06-30T04:48:24Z
dc.date.copyright Tue Jan 01 00:00:00 UTC 2019
dc.date.embargo 2021-02-21
dc.date.issued 2019-07-15
dc.description.abstract In most applications of data clustering, the input data includes vectors describing the location of each data point, from which distances between data points can be calculated and a proximity matrix constructed. In some applications, however, the only available input is the proximity matrix, that is, the distances between every pair of data points. Several clustering algorithms can still be applied, but if the proximity matrix has missing values, no standard method is directly applicable. Imputation can be used to replace missing values, but most imputation methods do not apply when only the proximity matrix is available. As a partial solution to fill this gap, we propose the Proximity Matrix Completion (PMC) algorithm. This algorithm assumes that a value is missing for one of two reasons, complete dissimilarity or incomplete observations, and imputes values accordingly. To determine which case applies, the data is modeled as a graph and a set of maximum cliques in the graph is found. Overlap between cliques then determines the case, and hence the method of imputation, for each missing value. This approach is motivated by an application in plant breeding, where the goal is to cluster new experimental seed varieties into sets of varieties that interact similarly with the environment; this application is presented as a case study in the paper. The applicability, limitations, and performance of the new algorithm relative to other imputation methods are further studied by applying it to datasets derived from three well-known test datasets.
dc.description.comments This is a manuscript of an article published as Karimzadeh, Samira, and Sigurdur Olafsson. "Data Clustering using Proximity Matrices with Missing Values." Expert Systems with Applications 126 (2019): 265-276. DOI: 10.1016/j.eswa.2019.02.022 (http://dx.doi.org/10.1016/j.eswa.2019.02.022). Posted with permission.
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/imse_pubs/202/
dc.identifier.articleid 1203
dc.identifier.contextkey 13936077
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath imse_pubs/202
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/44499
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/imse_pubs/202/2019_OlafssonSigurdur_DataClustering.pdf|||Fri Jan 14 22:21:16 UTC 2022
dc.source.uri 10.1016/j.eswa.2019.02.022
dc.subject.disciplines Operational Research
dc.subject.disciplines Systems Engineering
dc.subject.keywords Clustering
dc.subject.keywords Imputation
dc.subject.keywords Missing values
dc.subject.keywords Proximity matrix
dc.title Data clustering using proximity matrices with missing values
dc.type article
dc.type.genre article
dspace.entity.type Publication
relation.isAuthorOfPublication 485e1458-0389-4fa4-bf89-a25dec27125d
relation.isOrgUnitOfPublication 51d8b1a0-5b93-4ee8-990a-a0e04d3501b1
File
Name: 2019_OlafssonSigurdur_DataClustering.pdf
Size: 7.99 MB
Format: Adobe Portable Document Format