Graph-based estimation and inference for high-dimensional data

dc.contributor.advisor Chu, Lynna
dc.contributor.advisor Dorman, Karin
dc.contributor.advisor Li, Chunlin
dc.contributor.advisor Liu, Peng
dc.contributor.advisor Nordman, Daniel
dc.contributor.author Bai, Yichuan
dc.contributor.department Statistics (LAS)
dc.date.accessioned 2025-02-11T17:26:22Z
dc.date.available 2025-02-11T17:26:22Z
dc.date.embargo 2027-02-11T00:00:00Z
dc.date.issued 2024-12
dc.date.updated 2025-02-11T17:26:23Z
dc.description.abstract High-dimensional data are becoming increasingly influential and are of considerable interest within the statistical community. Existing tools may be inadequate when confronted with high-dimensional data, or may lack theoretical support as the dimension increases. The graph-based approach is a framework that takes a similarity graph as input to address various statistical problems. We develop graph-based statistics that target fundamental statistical problems. Owing to their flexibility and power, these graph-theoretic statistics are well suited to high-dimensional settings across a range of problems. This dissertation comprises three studies. The first study focuses on the two-sample testing problem. A robust graph-based test is introduced that overcomes the effect of hubness, the presence in the similarity graph of nodes with a large degree, which arises in high-dimensional data. The second study explores the estimation of the number of clusters. Choosing the number of clusters can be challenging for high-dimensional data because existing methods usually rely on within-cluster or between-cluster dispersion, which may be inefficient in high dimensions. We propose a graph-based method that chooses the number of clusters based on the true densities. The consistency of the estimated number of clusters is established. The third study investigates testing random effects in the mixed-effects model. A graph-based method is proposed that, without estimating any parameters, tests whether the random effect is zero in a mixed model with high-dimensional fixed effects.
dc.format.mimetype PDF
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/arY4AG4v
dc.language.iso en
dc.language.rfc3066 en
dc.subject.disciplines Statistics en_US
dc.subject.keywords Graph-based en_US
dc.subject.keywords Inference en_US
dc.subject.keywords Unsupervised learning en_US
dc.title Graph-based estimation and inference for high-dimensional data
dc.type dissertation en_US
dc.type.genre dissertation en_US
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.discipline Statistics en_US
thesis.degree.grantor Iowa State University en_US
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy en_US