Evaluating imputation in a two-way table of means for training data construction
dc.contributor.advisor | Olafsson, Sigurdur | |
dc.contributor.advisor | Mirka, Gary | |
dc.contributor.advisor | MacKenzie, Cameron | |
dc.contributor.advisor | Davarnia, Danial | |
dc.contributor.author | Arzanipour, Atousa | |
dc.contributor.department | Department of Industrial and Manufacturing Systems Engineering | |
dc.date.accessioned | 2025-02-11T17:35:23Z | |
dc.date.available | 2025-02-11T17:35:23Z | |
dc.date.issued | 2024-12 | |
dc.date.updated | 2025-02-11T17:35:25Z | |
dc.description.abstract | Predictive machine learning starts with a training dataset, but when this dataset is small, the quality of the models may suffer. When additional observed data is unavailable, training data construction methods may help by augmenting the data, so that better models can be trained on a combination of observed and synthetic data. We consider using imputation into a two-way table of means for such training data construction and evaluate different imputation methods for this purpose. To construct synthetic data, we first build a two-way table by splitting the explanatory variables into two subsets that define the dimensions of the table, with the table itself populated with the values of the response variable to be predicted. This two-way table will in general have missing values. The key to the training data construction is to interpret each missing value in the table as a potential observation, that is, a data point that does not yet exist in the original data. Imputation in two-way tables is a well-studied subject, so these missing values can be imputed using existing methods. Finally, the table is converted back to the original data format, and each imputed value in the table becomes a new synthetic training data point in the original format. We evaluate different imputation methods in combination with different predictive models and different amounts of synthetic data. The results show that the effectiveness of the approach depends on both the imputation method and the predictive model, and that in general this approach can be used effectively to construct up to 30-40% of the training data. | |
dc.format.mimetype | | |
dc.identifier.doi | https://doi.org/10.31274/td-20250502-96 | |
dc.identifier.uri | https://dr.lib.iastate.edu/handle/20.500.12876/KrZJ8pXr | |
dc.language.iso | en | |
dc.language.rfc3066 | en | |
dc.subject.disciplines | Operations research | en_US |
dc.subject.keywords | Data Construction | en_US |
dc.subject.keywords | Data Imputation | en_US |
dc.subject.keywords | Learning Algorithms | en_US |
dc.subject.keywords | Machine Learning | en_US |
dc.title | Evaluating imputation in a two-way table of means for training data construction | |
dc.type | thesis | en_US |
dc.type.genre | thesis | en_US |
dspace.entity.type | Publication | |
relation.isOrgUnitOfPublication | 51d8b1a0-5b93-4ee8-990a-a0e04d3501b1 | |
thesis.degree.discipline | Operations research | en_US |
thesis.degree.grantor | Iowa State University | en_US |
thesis.degree.level | thesis | |
thesis.degree.name | Master of Science | en_US |
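The abstract describes a three-step pipeline: build a two-way table of means from two subsets of the explanatory variables, impute its empty cells, and turn each imputed cell back into a synthetic training record. The following plain-Python sketch illustrates that pipeline under stated assumptions: the explanatory variables are categorical (or already binned), each table dimension is keyed by a tuple of values from one subset, and the empty cells are filled with an additive model (grand mean + row effect + column effect) fitted to the observed cell means by simple backfitting. That imputation rule and all names here (`build_table`, `impute_additive`, `to_records`, `n_iter`) are hypothetical illustrations, not the thesis's own code or any of the specific imputation methods it evaluates.

```python
from collections import defaultdict
from itertools import product
from statistics import mean


def build_table(records, row_vars, col_vars, target):
    """Two-way table of means: cell (r, c) holds the mean response of the
    observations whose row_vars values equal r and col_vars values equal c.
    Cells with no observations are simply absent (the missing values)."""
    groups = defaultdict(list)
    for rec in records:
        r = tuple(rec[v] for v in row_vars)
        c = tuple(rec[v] for v in col_vars)
        groups[(r, c)].append(rec[target])
    return {key: mean(ys) for key, ys in groups.items()}


def impute_additive(table, n_iter=100):
    """Hypothetical imputation rule: fit y ~ mu + a[r] + b[c] to the observed
    cell means by backfitting, then predict every missing (r, c) cell.
    Assumes the observed cells link all rows and columns together."""
    row_keys = sorted({r for r, _ in table})
    col_keys = sorted({c for _, c in table})
    mu = mean(table.values())
    a = {r: 0.0 for r in row_keys}
    b = {c: 0.0 for c in col_keys}
    for _ in range(n_iter):
        # Each update is the least-squares row (column) effect given the others.
        for r in row_keys:
            a[r] = mean(v - mu - b[c] for (rr, c), v in table.items() if rr == r)
        for c in col_keys:
            b[c] = mean(v - mu - a[r] for (r, cc), v in table.items() if cc == c)
    return {
        (r, c): mu + a[r] + b[c]
        for r, c in product(row_keys, col_keys)
        if (r, c) not in table
    }


def to_records(imputed, row_vars, col_vars, target):
    """Convert each imputed cell back into one synthetic training record."""
    synthetic = []
    for (r, c), y in sorted(imputed.items()):
        rec = dict(zip(row_vars, r))
        rec.update(zip(col_vars, c))
        rec[target] = y
        synthetic.append(rec)
    return synthetic


# Toy usage: variable "a" defines the rows, variable "b" the columns.
observed = [
    {"a": 0, "b": 0, "y": 1.0},
    {"a": 0, "b": 0, "y": 3.0},
    {"a": 0, "b": 1, "y": 3.0},
    {"a": 1, "b": 0, "y": 5.0},
]
table = build_table(observed, ["a"], ["b"], "y")
synthetic = to_records(impute_additive(table), ["a"], ["b"], "y")
training = observed + synthetic  # synthetic: one record, a=1, b=1, y close to 6.0
```

In the toy example, the only empty cell is (a=1, b=0... b=1 paired with a=1), and the additive fit predicts it as 5 + 3 - 2 = 6. The additive rule is only one of many two-way-table imputation methods; since the abstract reports that results depend on both the imputation method and the predictive model, a practical version would swap `impute_additive` for the method under evaluation and limit the share of synthetic rows (the abstract reports effective use for up to 30-40% of the training data).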
File
Original bundle
- Name: Arzanipour_iastate_0097M_21911.pdf
- Size: 464.1 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 0 B
- Format: Item-specific license agreed upon to submission