Two sample inference for high dimensional data and nonparametric variable selection for census data

dc.contributor.advisor Song Xi Chen
dc.contributor.author Li, Jun
dc.contributor.department Statistics 2018-08-11T12:14:17.000 2020-06-30T02:48:38Z 2020-06-30T02:48:38Z Tue Jan 01 00:00:00 UTC 2013 2015-07-30 2013-01-01
dc.description.abstract <p>In the first part of this thesis, we address the question of how new testing methods can be developed for two-sample inference with high dimensional data. In particular, chapter 2 focuses on testing the equality of two high dimensional covariance matrices, which can be directly applied to evaluating the difference in genetic correlation for different populations subject to various biological conditions. As we demonstrate in chapter 2, the proposed test requires no normality assumption and allows the dimension to be much larger than the sample sizes. In both respects it goes beyond the capacity of classical tests such as the likelihood ratio test. Testing the equality of high dimensional mean vectors is another important two-sample testing problem. Most tests for the equality of two mean vectors are not powerful against sparse alternatives, in which the difference between the two population mean vectors is spread over only a small number of coordinates. In chapter 3, we propose two tests designed to achieve better power against sparse alternatives by carrying out variance reduction through thresholding and signal enhancement through transformation.</p> <p>The second part of this thesis is on variable selection for census data. Human populations are heterogeneous in that the probability of enumerating an individual depends on the characteristics of that individual. For the US Census, a group of variables is chosen to reflect much of this heterogeneity, and the relevance of these variables to the enumeration function needs to be investigated. In chapter 4, we introduce a nonparametric variable selection method based on the optimal bandwidths obtained by minimizing the cross-validation function. The relevance of each variable to the enumeration function is reflected by the asymptotic convergence of its associated optimal bandwidth. A bootstrap procedure is also introduced to formally test the significance of each variable.</p>
dc.format.mimetype application/pdf
dc.identifier archive/
dc.identifier.articleid 4290
dc.identifier.contextkey 4615778
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/13283
dc.language.iso en
dc.source.bitstream archive/|||Fri Jan 14 19:48:43 UTC 2022
dc.subject.disciplines Statistics and Probability
dc.subject.keywords High dimensional data
dc.subject.keywords Kernel Smoothing
dc.subject.keywords Large p small n
dc.title Two sample inference for high dimensional data and nonparametric variable selection for census data
dc.type article
dc.type.genre dissertation
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca dissertation Doctor of Philosophy