Case-Specific Random Forests for Big Data Prediction

dc.contributor.author Zimmerman, Joshua
dc.contributor.author Nettleton, Dan
dc.contributor.department Statistics
dc.date 2019-09-14T09:42:47.000
dc.date.accessioned 2020-07-02T06:55:46Z
dc.date.available 2020-07-02T06:55:46Z
dc.date.copyright Thu Jan 01 00:00:00 UTC 2015
dc.date.embargo 2019-07-17
dc.date.issued 2015-01-01
dc.description.abstract Some training datasets may be too large for storage on a single computer. Such datasets may be partitioned and stored on separate computers connected in a parallel computing environment. To predict the response associated with a specific target case when training data are partitioned, we propose a method for finding the training cases within each partition that are most relevant for predicting the response of a target case of interest. These most relevant training cases from each partition can be combined into a single dataset, which can be a subset of the entire training dataset that is small enough for storage and analysis in memory on a single computer. To generate a prediction from this selected subset, we use Case-Specific Random Forests, a variation of random forests that replaces the uniform bootstrap sampling used to build a tree in a random forest with unequal weighted bootstrap sampling, where training cases more similar to the target case are given greater weight. We demonstrate our method with an example concrete dataset. Our results show that predictions generated from a small selected subset of a partitioned training dataset can be as accurate as predictions generated in a traditional manner from the entire training dataset.
dc.description.comments This proceeding is published as Zimmerman, J., Nettleton, D. (2015). Case-specific random forests for big data prediction. In JSM Proceedings, General Methodology. Alexandria, VA: American Statistical Association, pp. 2537–2543. Posted with permission.
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/stat_las_conf/8/
dc.identifier.articleid 1007
dc.identifier.contextkey 14941939
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath stat_las_conf/8
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/90254
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/stat_las_conf/8/0-Permission_from_ASA.pdf|||Sat Jan 15 02:03:25 UTC 2022
dc.source.bitstream archive/lib.dr.iastate.edu/stat_las_conf/8/2015_Nettleton_CaseSpecific.pdf|||Sat Jan 15 02:03:26 UTC 2022
dc.subject.disciplines Categorical Data Analysis
dc.subject.disciplines Computer Sciences
dc.subject.disciplines Statistical Methodology
dc.subject.disciplines Statistical Models
dc.subject.keywords bootstrap
dc.subject.keywords machine learning
dc.subject.keywords parallel computing
dc.title Case-Specific Random Forests for Big Data Prediction
dc.type article
dc.type.genre conference
dspace.entity.type Publication
relation.isAuthorOfPublication 7d86677d-f28f-4ab1-8cf7-70378992f75b
relation.isOrgUnitOfPublication 264904d9-9e66-4169-8e11-034e537ddbca
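The two steps named in the abstract (selecting the most relevant cases from each partition, then drawing an unequal weighted bootstrap sample from the pooled subset) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how relevance or similarity is measured, so Euclidean distance and a Gaussian kernel are assumed here, and the names `select_relevant`, `similarity_weights`, `weighted_bootstrap`, `bandwidth`, and `k` are hypothetical. Tree fitting on each bootstrap sample is omitted.

```python
import math
import random


def select_relevant(partitions, target, k):
    """Keep the k cases nearest the target from each partition and pool them.

    Assumption: relevance is Euclidean distance in feature space.
    """
    pooled = []
    for part in partitions:
        ranked = sorted(part, key=lambda case: math.dist(case[0], target))
        pooled.extend(ranked[:k])
    return pooled


def similarity_weights(cases, target, bandwidth=1.0):
    """Normalized Gaussian-kernel weights: closer cases get larger weight.

    Assumption: this kernel is an illustrative stand-in for the
    similarity measure, which the abstract does not specify.
    """
    raw = [
        math.exp(-math.dist(x, target) ** 2 / (2 * bandwidth ** 2))
        for x, _y in cases
    ]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_bootstrap(cases, weights, rng):
    """Draw len(cases) cases with replacement using unequal probabilities."""
    return rng.choices(cases, weights=weights, k=len(cases))


# Toy data: three partitions of 50 cases each, one feature, response y = 2x.
rng = random.Random(0)
partitions = []
for _ in range(3):
    xs = [rng.uniform(0, 10) for _ in range(50)]
    partitions.append([((x,), 2 * x) for x in xs])

target = (5.0,)
subset = select_relevant(partitions, target, k=10)   # 30 pooled cases
weights = similarity_weights(subset, target)
sample = weighted_bootstrap(subset, weights, rng)    # input for one tree
```

In a full forest, `weighted_bootstrap` would be called once per tree, replacing the uniform bootstrap draw of a standard random forest.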
File
Original bundle
Name: 2015_Nettleton_CaseSpecific.pdf
Size: 83.82 KB
Format: Adobe Portable Document Format
Name: 0-Permission_from_ASA.pdf
Size: 131.01 KB
Format: Adobe Portable Document Format