Two sample inference for high dimensional data and nonparametric variable selection for census data

dc.contributor.advisor Song Xi Chen
dc.contributor.author Li, Jun
dc.contributor.department Statistics
dc.date 2018-08-11T12:14:17.000
dc.date.accessioned 2020-06-30T02:48:38Z
dc.date.available 2020-06-30T02:48:38Z
dc.date.copyright Tue Jan 01 00:00:00 UTC 2013
dc.date.embargo 2015-07-30
dc.date.issued 2013-01-01
dc.description.abstract In the first part of this thesis, we address the question of how new testing methods can be developed for two-sample inference for high dimensional data. In particular, chapter 2 focuses on testing the equality of two high dimensional covariance matrices, which can be directly applied to evaluating the difference in genetic correlation for different populations subject to various biological conditions. As we demonstrate in chapter 2, the test we propose requires no normality assumption and allows the dimension to be much larger than the sample sizes. In both respects it goes beyond the capacity of classical tests such as the likelihood ratio test. Testing the equality of high dimensional mean vectors is another important two-sample testing problem. Most tests for the equality of two mean vectors are not powerful against sparse alternatives, that is, alternatives under which the two population mean vectors differ in only a small number of coordinates. In chapter 3, we propose two tests designed to achieve better power against sparse alternatives by combining variance reduction and signal enhancement, through thresholding and transformation, respectively.
The second part of this thesis is on variable selection for census data. Human populations are heterogeneous in that the probability of enumerating an individual depends on the characteristics of the individual. For the US Census, a group of variables is chosen to reflect much of this heterogeneity, and the relevance of these variables to the enumeration function needs to be investigated. In chapter 4, we introduce a nonparametric variable selection method based on the optimal bandwidths obtained by minimizing the cross-validation function. The relevance of each variable to the enumeration function is reflected by the asymptotic convergence of the associated optimal bandwidth. In addition, a bootstrap procedure is introduced to formally test the significance of each variable.
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/13283/
dc.identifier.articleid 4290
dc.identifier.contextkey 4615778
dc.identifier.doi https://doi.org/10.31274/etd-180810-3479
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/13283
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/27472
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/13283/Li_iastate_0097E_13624.pdf|||Fri Jan 14 19:48:43 UTC 2022
dc.subject.disciplines Statistics and Probability
dc.subject.keywords High dimensional data
dc.subject.keywords Kernel Smoothing
dc.subject.keywords Large p small n
dc.title Two sample inference for high dimensional data and nonparametric variable selection for census data
dc.type article
dc.type.genre dissertation
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy
File
Original bundle
Name:
Li_iastate_0097E_13624.pdf
Size:
2.71 MB
Format:
Adobe Portable Document Format