Principal Components Analysis of Discrete Datasets

Date
2018-01-01
Authors
Zhu, Yifan
Major Professor
Ranjan Maitra
Abstract

We propose a Gaussian copula based method for performing principal component analysis on discrete data. By assuming that the data come from a discrete distribution in the Gaussian copula family, we can regard the discrete random vectors as generated from a latent multivariate normal random vector. We first estimate the correlation matrix of the latent multivariate normal distribution, and then use the estimated latent correlation matrix to obtain estimates of the principal components. We also consider categorical sequence data with multinomial marginal distributions. In this case the marginal distributions are not univariate, so the usual Gaussian copula does not apply directly. We propose an optimal mapping method that converts the original data with multivariate discrete marginals into mapped data with univariate marginals. The usual Gaussian copula can then be used to model the mapped data, to which we apply the discrete principal component analysis. Senators' voting data are used as an example. Finally, we propose a matrix Gaussian copula method for data with multivariate marginals. It can be viewed as an extension of the Gaussian copula, and we use the latent correlation matrix of the matrix Gaussian copula to obtain the principal components.
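
The two-step idea in the abstract (estimate the latent correlation matrix, then eigendecompose it) can be illustrated with a minimal sketch. The abstract does not say which estimator the author uses, so everything here is an assumption made for illustration: the data are taken to be binary, each variable is modeled as a thresholded latent standard normal, and each pairwise latent correlation is found by matching the observed joint frequency to the bivariate normal probability. The function names `bvn_cdf`, `latent_corr_binary`, and `discrete_pca` are hypothetical and not taken from the work itself.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def bvn_cdf(h, k, rho, n=200):
    """P(Z1 <= h, Z2 <= k) for a standard bivariate normal with correlation rho.

    Uses Plackett's identity (the derivative of the CDF in rho equals the
    bivariate density), integrated from 0 to rho with Simpson's rule.
    n must be even.
    """
    base = _STD_NORMAL.cdf(h) * _STD_NORMAL.cdf(k)
    if rho == 0.0:
        return base
    r = np.linspace(0.0, rho, n + 1)
    one_minus_r2 = 1.0 - r * r
    dens = np.exp(-(h * h - 2.0 * h * k * r + k * k) / (2.0 * one_minus_r2))
    dens /= 2.0 * math.pi * np.sqrt(one_minus_r2)
    w = np.ones(n + 1)   # Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1
    w[1:-1:2] = 4.0
    w[2:-1:2] = 2.0
    return base + (rho / n) / 3.0 * float(np.dot(w, dens))


def latent_corr_binary(x, y, tol=1e-6):
    """Latent correlation for two 0/1 columns (assumed estimator, see lead-in).

    Model assumption: X = 1 when the latent normal Z falls at or below a
    threshold t, so P(X = 1, Y = 1) = Phi2(t_x, t_y; rho). Both columns must
    contain zeros and ones, otherwise the thresholds are undefined.
    """
    p1, p2, p11 = x.mean(), y.mean(), (x * y).mean()
    t1, t2 = _STD_NORMAL.inv_cdf(p1), _STD_NORMAL.inv_cdf(p2)
    lo, hi = -0.99, 0.99  # stay away from the singular endpoints +/-1
    while hi - lo > tol:  # bisection: Phi2 is increasing in rho
        mid = 0.5 * (lo + hi)
        if bvn_cdf(t1, t2, mid) < p11:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_pca(X):
    """Eigendecompose the pairwise-estimated latent correlation matrix.

    Returns eigenvalues in decreasing order and the matching eigenvectors
    (columns), which play the role of principal component loadings.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            R[i, j] = R[j, i] = latent_corr_binary(X[:, i], X[:, j])
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]
```

Because the matrix is assembled pair by pair, it is not guaranteed to be positive semidefinite, so small negative eigenvalues can appear in finite samples. The sketch leaves them untouched rather than projecting onto the nearest valid correlation matrix.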

Type
creative component
Copyright
2018-01-01