Evaluating imputation in a two-way table of means for training data construction

dc.contributor.advisor Olafsson, Sigurdur
dc.contributor.advisor Mirka, Gary
dc.contributor.advisor MacKenzie, Cameron
dc.contributor.advisor Davarnia, Danial
dc.contributor.author Arzanipour, Atousa
dc.contributor.department Department of Industrial and Manufacturing Systems Engineering
dc.date.accessioned 2025-02-11T17:35:23Z
dc.date.available 2025-02-11T17:35:23Z
dc.date.issued 2024-12
dc.date.updated 2025-02-11T17:35:25Z
dc.description.abstract Predictive machine learning starts with a training dataset, but when this dataset is small, the quality of the resulting models may suffer. When additional observed data is unavailable, training data construction methods can help by augmenting the observed data with synthetic data and training on the combination. We consider imputation into a two-way table of means for such training data construction and evaluate different imputation methods for this purpose. To construct synthetic data, we first build a two-way table by splitting the explanatory variables into two subsets that define the dimensions of the table, with the table itself populated with values of the response variable to be predicted. In general, this two-way table will have missing values. The key to the training data construction is to interpret each missing value in the table as a potential observation, that is, a data point that does not yet exist in the original data. Imputation in two-way tables is a well-studied subject, so these missing values can be imputed using existing methods. Finally, the table is converted back to the original data format, and each imputed value in the table becomes a new synthetic training data point. We evaluate different imputation methods in combination with different predictive models and for different amounts of synthetic data. The results show that the effectiveness of the approach depends on both the imputation method and the predictive model, and that, in general, the approach can be used effectively to construct up to 30-40% of the training data.
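The construction steps described in the abstract (split the explanatory variables into two subsets, tabulate the response as a two-way table, impute the empty cells, convert imputed cells back into rows) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names are hypothetical, each cell holds the mean response of the observed rows falling in it, and the empty cells are filled with a naive additive fill (grand mean plus row effect plus column effect, computed from observed marginal means), which is only a simple stand-in and not necessarily one of the imputation methods evaluated in the thesis.

```python
from collections import defaultdict
from statistics import mean


def build_table_of_means(rows, a_cols, b_cols, y_col):
    """Map (row_key, col_key) -> mean response of the observed rows in that cell."""
    cells = defaultdict(list)
    for r in rows:
        row_key = tuple(r[c] for c in a_cols)  # table row = value combination of subset A
        col_key = tuple(r[c] for c in b_cols)  # table column = value combination of subset B
        cells[(row_key, col_key)].append(r[y_col])
    table = {key: mean(vals) for key, vals in cells.items()}
    # Keep row and column labels in first-seen order.
    row_keys = list(dict.fromkeys(rk for rk, _ in table))
    col_keys = list(dict.fromkeys(ck for _, ck in table))
    return table, row_keys, col_keys


def additive_impute(table, row_keys, col_keys):
    """Fill every empty cell with grand mean + row effect + column effect.

    Assumption: effects are taken naively from the observed marginal means.
    This is a simple placeholder imputation, chosen for illustration only.
    """
    grand = mean(table.values())
    row_eff = {rk: mean(v for (r, _), v in table.items() if r == rk) - grand
               for rk in row_keys}
    col_eff = {ck: mean(v for (_, c), v in table.items() if c == ck) - grand
               for ck in col_keys}
    return {(rk, ck): grand + row_eff[rk] + col_eff[ck]
            for rk in row_keys for ck in col_keys
            if (rk, ck) not in table}


def construct_synthetic_rows(rows, a_cols, b_cols, y_col):
    """Convert each imputed (previously missing) cell into a synthetic training row."""
    table, row_keys, col_keys = build_table_of_means(rows, a_cols, b_cols, y_col)
    synthetic = []
    for (rk, ck), value in additive_impute(table, row_keys, col_keys).items():
        new_row = dict(zip(a_cols, rk))
        new_row.update(zip(b_cols, ck))
        new_row[y_col] = value
        synthetic.append(new_row)
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy data: cell (x1="b", x2=1) is unobserved.
    data = [
        {"x1": "a", "x2": 0, "y": 1.0},
        {"x1": "a", "x2": 1, "y": 3.0},
        {"x1": "b", "x2": 0, "y": 2.0},
    ]
    synthetic = construct_synthetic_rows(data, ["x1"], ["x2"], "y")
    print(synthetic)  # [{'x1': 'b', 'x2': 1, 'y': 3.0}]
    training_data = data + synthetic  # observed + synthetic rows
```

In this toy example the single empty cell is filled with 2.0 (grand mean) + 0.0 (row effect of "b") + 1.0 (column effect of x2=1) = 3.0. Swapping `additive_impute` for another two-way-table imputation method, and limiting how many synthetic rows are added, would be the natural places to vary the approach.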
dc.format.mimetype PDF
dc.identifier.doi https://doi.org/10.31274/td-20250502-96
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/KrZJ8pXr
dc.language.iso en
dc.language.rfc3066 en
dc.subject.disciplines Operations research en_US
dc.subject.keywords Data Construction en_US
dc.subject.keywords Data Imputation en_US
dc.subject.keywords Learning Algorithms en_US
dc.subject.keywords Machine Learning en_US
dc.title Evaluating imputation in a two-way table of means for training data construction
dc.type thesis en_US
dc.type.genre thesis en_US
dspace.entity.type Publication
relation.isOrgUnitOfPublication 51d8b1a0-5b93-4ee8-990a-a0e04d3501b1
thesis.degree.discipline Operations research en_US
thesis.degree.grantor Iowa State University en_US
thesis.degree.level thesis
thesis.degree.name Master of Science en_US
File
Original bundle
Name: Arzanipour_iastate_0097M_21911.pdf
Size: 464.1 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 0 B
Format: Item-specific license agreed to upon submission