Principal Components Analysis of Discrete Datasets
Zhu, Yifan
dc.contributor.department Statistics
dc.contributor.majorProfessor Ranjan Maitra
2019-09-22T11:14:44.000 2020-06-30T01:32:30Z 2020-06-30T01:32:30Z Mon Jan 01 00:00:00 UTC 2018 2018-01-01
dc.description.abstract <p>We propose a Gaussian copula-based method for performing principal component analysis on discrete data. By assuming that the data follow discrete distributions in the Gaussian copula family, we can regard the discrete random vectors as generated from a latent multivariate normal random vector. We therefore first estimate the correlation matrix of the latent multivariate normal distribution, and then use this estimated latent correlation matrix to estimate the principal components. We also focus on the case of categorical sequence data with multinomial marginal distributions. In this case the marginal distributions are not univariate, so the usual Gaussian copula does not apply. We propose an optimal mapping method that converts the original data with multivariate discrete marginals into mapped data with univariate marginals. The usual Gaussian copula can then be used to model the mapped data, and we apply discrete principal component analysis to the mapped data. Senators' voting data are used as an example in the experiment. Finally, we also propose a matrix Gaussian copula method for data with multivariate marginals. It can be regarded as an extension of the Gaussian copula, and we use the latent correlation matrix in the matrix Gaussian copula to obtain the principal components.</p>
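The two-step recipe in the abstract (estimate the latent correlation matrix, then eigen-decompose it) can be sketched for the simplest case of binary variables. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes each binary variable is a thresholded standard normal, estimates each latent correlation pairwise by matching the observed proportion of (0, 0) pairs (a tetrachoric correlation), and extracts components by power iteration with deflation. All helper names (`bvn_cdf`, `tetrachoric`, `latent_correlation`, `principal_components`, `simulate_binary`) are hypothetical, and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)


def bvn_cdf(a, b, rho, n=400):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation rho.

    Uses Phi2 = integral_{-inf}^{a} phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) dx,
    evaluated by composite Simpson's rule on [-8, a] (n must be even).
    """
    lo = -8.0
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        w = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += w * STD.pdf(x) * STD.cdf((b - rho * x) / s)
    return total * h / 3.0


def tetrachoric(xj, xk):
    """Latent correlation of two 0/1 columns, assuming X = 1 exactly when Z > threshold.

    Thresholds match the observed marginals; rho is found by bisection so that the
    model's P(X_j = 0, X_k = 0) equals the observed proportion (Phi2 increases in rho).
    Assumes each column contains both 0s and 1s.
    """
    n = len(xj)
    tj = STD.inv_cdf(1.0 - sum(xj) / n)
    tk = STD.inv_cdf(1.0 - sum(xk) / n)
    p00 = sum(1 for a, b in zip(xj, xk) if a == 0 and b == 0) / n
    lo, hi = -0.999, 0.999
    if bvn_cdf(tj, tk, lo) >= p00:
        return lo
    if bvn_cdf(tj, tk, hi) <= p00:
        return hi
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if bvn_cdf(tj, tk, mid) < p00:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def latent_correlation(X):
    """Pairwise estimate of the latent correlation matrix; X is a list of 0/1 columns."""
    p = len(X)
    R = [[1.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(j + 1, p):
            R[j][k] = R[k][j] = tetrachoric(X[j], X[k])
    return R


def principal_components(R, k, iters=300):
    """Top-k (eigenvalue, eigenvector) pairs of symmetric R via power iteration + deflation."""
    p = len(R)
    A = [row[:] for row in R]
    comps = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(p)]  # fixed, non-degenerate start vector
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(A[i][j] * v[j] for j in range(p)) for i in range(p))
        comps.append((lam, v))
        A = [[A[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return comps


def simulate_binary(n, p, loading, seed=0):
    """One-factor latent Gaussian Z_j = loading*F + sqrt(1 - loading^2)*E_j, X_j = 1{Z_j > 0}."""
    rng = random.Random(seed)
    e = math.sqrt(1.0 - loading ** 2)
    X = [[0] * n for _ in range(p)]
    for i in range(n):
        f = rng.gauss(0.0, 1.0)
        for j in range(p):
            X[j][i] = int(loading * f + e * rng.gauss(0.0, 1.0) > 0.0)
    return X


if __name__ == "__main__":
    X = simulate_binary(n=10000, p=4, loading=0.8)
    R = latent_correlation(X)  # off-diagonal entries should be near 0.8**2 = 0.64
    for lam, v in principal_components(R, k=2):
        print(round(lam, 3), [round(x, 3) for x in v])
```

With all pairwise latent correlations near 0.64, the first component should have an eigenvalue near 1 + 3(0.64) = 2.92 and roughly equal loadings, whereas the Pearson correlation of the observed 0/1 data would be attenuated toward (2/π)·arcsin(0.64) ≈ 0.44. Pairwise estimates are not guaranteed to form a positive semi-definite matrix, and ordinal or multi-category variables would need one threshold per category boundary; this sketch handles neither case.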
dc.format.mimetype application/pdf
dc.identifier archive/
dc.identifier.articleid 1128
dc.identifier.contextkey 13405203
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath creativecomponents/121
dc.source.bitstream archive/|||Fri Jan 14 19:13:01 UTC 2022
dc.subject.disciplines Categorical Data Analysis
dc.subject.disciplines Statistical Methodology
dc.subject.keywords Gaussian copula
dc.subject.keywords PCA
dc.subject.keywords dimension reduction
dc.subject.keywords discrete data
dc.title Principal Components Analysis of Discrete Datasets
dc.type article
dc.type.genre creativecomponent
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca Statistics creativecomponent
Original bundle
953.98 KB
Adobe Portable Document Format