Random forest robustness, variable importance, and tree aggregation

dc.contributor.advisor Ulrike Genschel
dc.contributor.advisor Dan Nettleton
dc.contributor.author Sage, Andrew
dc.contributor.department Department of Statistics (LAS)
dc.date 2018-08-11T08:02:52.000
dc.date.accessioned 2020-06-30T03:11:12Z
dc.date.available 2020-06-30T03:11:12Z
dc.date.copyright Sun Apr 01 00:00:00 UTC 2018
dc.date.embargo 2001-01-01
dc.date.issued 2018-01-01
dc.description.abstract Random forest methodology is a nonparametric, machine learning approach capable of strong performance in regression and classification problems involving complex datasets. In addition to making predictions, random forests can be used to assess the relative importance of explanatory variables. In this dissertation, we explore three topics related to random forests: tree aggregation, variable importance, and robustness. In Chapter 2, we show that the method of tree aggregation used in one popular random forest implementation can lead to biased class probability estimates and that it is often beneficial to combine the tree partitioning algorithm used in one implementation with the aggregation scheme used in another. In Chapter 3, we show that imputing missing values prior to assessing variable importance often leads to inaccurate variable importance measures. Using simulation studies, we investigate the impact of six random-forest-based imputation techniques on variable importance and find that some techniques are prone to overestimating the importance of variables whose values have been imputed, while other techniques tend to underestimate the importance of such variables. In Chapter 4, we propose a new robust approach for random forest regression. Adapted from a popular approach used in polynomial regression, our method uses residual analysis to modify the weights associated with training cases in random forest predictions, so that outlying training cases have less impact. We show, using simulation studies, that this approach outperforms existing robust techniques on noisy, contaminated datasets.
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/16453/
dc.identifier.articleid 7460
dc.identifier.contextkey 12331508
dc.identifier.doi https://doi.org/10.31274/etd-180810-6083
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/16453
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/30636
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/16453/Sage_iastate_0097E_17242.pdf|||Fri Jan 14 21:00:34 UTC 2022
dc.subject.disciplines Statistics and Probability
dc.title Random forest robustness, variable importance, and tree aggregation
dc.type dissertation
dc.type.genre dissertation
dspace.entity.type Publication
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
thesis.degree.discipline Statistics
thesis.degree.level dissertation
thesis.degree.name Doctor of Philosophy
File
Original bundle
Name: Sage_iastate_0097E_17242.pdf
Size: 940.76 KB
Format: Adobe Portable Document Format