Statistical methods for tensor-valued data

Llosa-Vite, Carlos Jonathan

File

LlosaVite_iastate_0097E_19997.pdf (19.08 MB)

Date

2022-05

Authors

Llosa-Vite, Carlos Jonathan

Advisor

Maitra, Ranjan

Caragea, Petrutza

Dutta, Somak

Kaiser, Mark S

Meeker, William Q

Niemi, Jarad

Abstract

Tensors are multidimensional arrays of numbers. As data-collecting technologies become more sophisticated, tensor-valued data are becoming ubiquitous in many fields of science. Accounting for the tensor-variate structure of these data can aid estimation and inference, and in many cases is what makes them possible. In this dissertation we propose three statistical methodologies that accommodate tensor-variate data. First, in Chapter two we propose a regression model that accommodates tensor-valued data as both the response and the covariate by assuming CP, Tucker, or tensor ring (TR) formats for the tensor-variate regression coefficient (the mean structure), together with a Kronecker-separable covariance structure. We develop algorithms for maximum likelihood estimation of the low-rank coefficients, and derive their sampling distributions and computational complexities. Moreover, we apply our methodology to extend ANOVA to the tensor-variate realm. Our methodology is motivated by two real-data applications: the first involves detecting biological markers of subjects at risk of attempting suicide, and the second involves distinguishing facial characteristics from face imaging data. In Chapter three we provide a theoretical study of tensor-variate elliptically contoured distributions that have a Kronecker-separable covariance structure. We study fundamental properties such as moments, characterizations, important sub-families, and marginal and conditional distributions. We develop procedures for maximum likelihood estimation under various assumptions, one being tensor-on-tensor regression with non-Gaussian tensor-error distributions and CP, tensor ring, and Tucker formats, and another being a robust Tyler-type estimator of the scale matrices.
We apply the methodology to distinguish facial characteristics from imaging data, and also to classify cat and dog images through discriminant analysis; in both cases the tensor-variate-t distribution with learned degrees of freedom outperforms its tensor-variate normal (TVN) counterpart. Finally, in Chapter four we study the simplifying tensor-variate structure that is dictated by the real discrete Fourier transform (RDFT). We first derive the structure of covariance matrices with eigenvectors given by the RDFT modes, and extend it to the tensor-variate realm by defining the tensor-variate Fourier covariance structure (FTCS). We demonstrate that while FTCS allows for significant parameter reduction of the tensor-variance (which is now explained only through its eigenvalues), it has a deep connection with other covariance structure models in the literature. We study a flexible family of tensor-variate distributions that attain such mean and tensor-variance structures, and demonstrate that further substantial parameter reduction can be attained by considering only points that are contained within certain RDFT frequencies. We demonstrate that the Fourier-tensor-variate-t distribution outperforms several other models in terms of BIC at distinguishing facial characteristics, even when all the RDFT frequencies are considered. We also use the Fourier TVN distribution to classify mango leaves that are infected with the potentially devastating fungal disease anthracnose, achieving 90% prediction accuracy by considering only a few RDFT frequency bands that we select through cross-validation.
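
As a rough illustration of why a Kronecker-separable covariance structure reduces the number of parameters, the following minimal sketch builds the full covariance of a vectorized tensor from one scale matrix per mode and compares parameter counts. It is not taken from the dissertation: the function names, the column-major vectorization convention, and the example dimensions are assumptions made here for illustration only.

```python
import numpy as np


def kronecker_covariance(scale_matrices):
    """Return Sigma_K kron ... kron Sigma_1 for mode-wise scale matrices.

    Assumes the tensor is vectorized in column-major order, so the
    first mode varies fastest (an assumption of this sketch).
    """
    full = np.ones((1, 1))
    for sigma in scale_matrices:
        # Each later mode is placed on the left of the Kronecker product.
        full = np.kron(sigma, full)
    return full


def count_free_parameters(dims):
    """Compare unstructured and Kronecker-separable parameter counts.

    The separable count ignores the scale non-identifiability between
    factors, so it is a slight overcount.
    """
    p = int(np.prod(dims))
    unstructured = p * (p + 1) // 2
    separable = sum(m * (m + 1) // 2 for m in dims)
    return unstructured, separable
```

For a hypothetical 3 x 4 x 5 tensor, `count_free_parameters((3, 4, 5))` returns `(1830, 31)`: the unstructured 60 x 60 covariance has 1830 free entries, while the three mode-wise scale matrices together have 31.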

Academic or Administrative Unit

Statistics (LAS)

Type

dissertation