A residual-based approach for robust random forest regression
Date
2021-07-08
Publisher
International Press
Abstract
We introduce a novel robust approach for random forest regression that is useful when the conditional distribution of the response variable, given predictor values, is contaminated. Residual analysis is used to identify unusual response values in training data, and the contributions of these values are down-weighted accordingly. This approach is motivated by a robust fitting procedure first proposed in the context of locally weighted polynomial regression and scatterplot smoothing. We demonstrate that tuning the parameter in the robustness algorithm using a weighted cross-validation approach is advantageous when contamination is suspected in training data responses. We conduct extensive simulations, comparing our method to existing robust approaches, some of which have not been compared to one another in prior studies. Our approach outperforms existing techniques on noisy training datasets with response contamination. While no approach is uniformly optimal, ours is consistently competitive with the best existing approaches for robust random forest regression.
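The best-known robust fitting procedure from locally weighted regression is Cleveland's LOWESS robustness step, which turns residuals into bisquare weights: each residual e_i is scaled by c times the median absolute residual (c = 6 in LOWESS), and the scaled value u gets weight (1 - u^2)^2 when |u| < 1 and 0 otherwise. The Python sketch below illustrates only that weighting idea, assuming this form. It is not the authors' random forest implementation. The function names, the handling of a zero median, and the weighted squared-error criterion (one hypothetical way to score held-out predictions so that flagged responses count less during tuning) are illustrative assumptions.

```python
import statistics


def bisquare_weights(residuals, c=6.0):
    """LOWESS-style robustness weights (illustrative sketch).

    Residuals are scaled by c times the median absolute residual; the
    bisquare function gives weight (1 - u^2)^2 for |u| < 1 and 0 otherwise,
    so moderately large residuals are down-weighted and extreme ones ignored.
    """
    s = statistics.median([abs(r) for r in residuals])
    if s == 0:
        # Assumption: with a zero median absolute residual, keep exact fits
        # at full weight and drop every nonzero residual.
        return [1.0 if r == 0 else 0.0 for r in residuals]
    weights = []
    for r in residuals:
        u = r / (c * s)
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return weights


def weighted_squared_error(y, y_hat, weights):
    """Hypothetical weighted CV criterion: a weighted mean squared error,
    so responses flagged as unusual contribute less to the score."""
    total = sum(weights)
    return sum(w * (a - b) ** 2 for a, b, w in zip(y, y_hat, weights)) / total


if __name__ == "__main__":
    residuals = [0.1, -0.2, 0.15, 5.0]  # last response looks contaminated
    print(bisquare_weights(residuals))  # final weight is 0.0
```

In a reweighting scheme of this kind, the weights would be recomputed from fresh residuals after each fit, and a criterion such as the weighted error above could be used to compare candidate values of c.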
Type
article
Comments
This article is published as Sage, A. J., Genschel, U., Nettleton, D. (2021). A residual-based approach for robust random forest regression. Statistics and Its Interface. 14(4), 389–402. doi:https://dx.doi.org/10.4310/20-SII660.
Rights Statement
Posted with permission