Statistical methods for tensor-valued data

Date
2022-05
Authors
Llosa-Vite, Carlos Jonathan
Major Professor
Advisor
Maitra, Ranjan
Caragea, Petrutza
Dutta, Somak
Kaiser, Mark S
Meeker, William Q
Niemi, Jarad
Committee Member
Abstract
Tensors are multidimensional arrays of numbers. As data-collecting technologies become more sophisticated, tensor-valued data are becoming ubiquitous in many fields of science. Accounting for the tensor-variate structure of these data can aid estimation and inference, and in many cases is what makes them possible. In this dissertation we propose three statistical methodologies that accommodate tensor-variate data. First, in Chapter two we propose a regression model that accommodates tensor-valued data as both the response and the covariate by assuming CP, Tucker and tensor ring (TR) formats on the tensor-variate regression coefficient (the mean structure), together with a Kronecker-separable covariance structure. We develop algorithms for maximum likelihood estimation of the low-rank coefficients, and derive their sampling distributions and computational complexities. Moreover, we apply our methodology to extend ANOVA to the tensor-variate realm. Our methodology is motivated by two real-data applications. The first involves detecting biological markers of subjects at risk of attempting suicide, and the second involves distinguishing facial characteristics from face imaging data. In Chapter three we provide a theoretical study of tensor-variate elliptically contoured distributions that have a Kronecker-separable covariance structure. We study fundamental properties such as moments, characterizations, important sub-families, and marginal and conditional distributions. We develop procedures for maximum likelihood estimation under various assumptions, one being tensor-on-tensor regression with non-Gaussian tensor-error distributions and CP, TR and Tucker formats, and another being a robust Tyler-type estimator of the scale matrices. We apply the methodology to distinguish facial characteristics from imaging data, and to classify cat and dog images through discriminant analysis: in both cases the tensor-variate-t distribution with learned degrees of freedom outperforms its tensor-variate normal (TVN) counterpart. Finally, in Chapter four we study the simplifying tensor-variate structure that is dictated by the real discrete Fourier transform (RDFT). We first derive the structure of covariance matrices with eigenvectors given by the RDFT modes, and extend it to the tensor-variate realm by defining the tensor-variate Fourier covariance structure (FTCS). We demonstrate that while FTCS allows for significant parameter reduction of the tensor-variance (which is now explained only through its eigenvalues), it also has a deep connection with other covariance structure models in the literature. We study a flexible family of tensor-variate distributions that attain such mean and tensor-variance structures, and demonstrate that further substantial parameter reduction can be attained by considering only points that are contained within certain RDFT frequencies. We demonstrate that the Fourier-tensor-variate-t distribution outperforms several other models in terms of BIC at distinguishing facial characteristics, even when all the RDFT frequencies are considered. We also use the Fourier TVN distribution to classify mango leaves that are infected with the potentially devastating fungal disease anthracnose, achieving 90% prediction accuracy by considering only a few RDFT frequency bands that we select through cross-validation.
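The parameter savings that the abstract attributes to low-rank coefficient formats and Kronecker-separable covariance can be illustrated with a simple count. The sketch below is not taken from the dissertation: the function names and the 32 x 32 x 3 response and length-10 covariate dimensions are hypothetical, and the counts ignore identifiability constraints (such as the arbitrary scaling between Kronecker factors or between CP factor columns).

```python
from math import prod


def cov_params_unstructured(dims):
    """Free parameters in an unrestricted covariance of the vectorized tensor."""
    p = prod(dims)
    return p * (p + 1) // 2


def cov_params_kronecker(dims):
    """Free parameters with one symmetric m_k x m_k matrix per tensor mode.

    Scale identifiability constraints between the factors are ignored.
    """
    return sum(m * (m + 1) // 2 for m in dims)


def coef_params_full(covariate_dims, response_dims):
    """Entries in an unrestricted tensor-on-tensor regression coefficient."""
    return prod(covariate_dims) * prod(response_dims)


def coef_params_cp(covariate_dims, response_dims, rank):
    """Entries in the CP factor matrices: one m_k x rank matrix per mode."""
    return rank * (sum(covariate_dims) + sum(response_dims))


# Hypothetical dimensions, chosen only for illustration.
response_dims = (32, 32, 3)
covariate_dims = (10,)

print(cov_params_unstructured(response_dims))  # 4720128
print(cov_params_kronecker(response_dims))     # 1062
print(coef_params_full(covariate_dims, response_dims))   # 30720
print(coef_params_cp(covariate_dims, response_dims, 5))  # 385
```

Under these assumed dimensions the unstructured covariance has millions of free parameters while the Kronecker-separable form has about a thousand, which is one way to read the abstract's remark that tensor structure can make estimation possible at all.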
Type
dissertation