Evaluating imputation in a two-way table of means for training data construction

Arzanipour, Atousa

dc.contributor.advisor	Olafsson, Sigurdur
dc.contributor.advisor	Mirka, Gary
dc.contributor.advisor	MacKenzie, Cameron
dc.contributor.advisor	Davarnia, Danial
dc.contributor.author	Arzanipour, Atousa
dc.contributor.department	Department of Industrial and Manufacturing Systems Engineering
dc.date.accessioned	2025-02-11T17:35:23Z
dc.date.available	2025-02-11T17:35:23Z
dc.date.issued	2024-12
dc.date.updated	2025-02-11T17:35:25Z
dc.description.abstract	Predictive machine learning starts with a training dataset, but when this dataset is small the quality of the resulting models may suffer. When additional observed data is unavailable, training data construction methods can help by augmenting the data, so that models are trained on a combination of observed and synthetic data. We consider imputation into a two-way table of means as a method for such training data construction and evaluate different imputation methods for this purpose. To construct synthetic data, we first build a two-way table by splitting the explanatory variables into two subsets that define the dimensions of the table, with the table itself populated with values of the response variable to be predicted. This two-way table will in general have missing values. The key to the training data construction is to interpret each missing value in the table as a potential observation, that is, a data point that does not yet exist in the original data. Imputation in two-way tables is a well-studied subject, so these missing values can be imputed using existing methods. Finally, the table is converted back to the original data format, and each imputed value becomes a new synthetic training data point. We evaluate different imputation methods in combination with different predictive models and for different amounts of synthetic data. The results show that the effectiveness of the approach depends on both the imputation method and the predictive model, and that in general the approach can be used effectively to construct up to 30-40% of the training data.
dc.format.mimetype	PDF
dc.identifier.doi	https://doi.org/10.31274/td-20250502-96
dc.identifier.uri	https://dr.lib.iastate.edu/handle/20.500.12876/KrZJ8pXr
dc.language.iso	en
dc.language.rfc3066	en
dc.subject.disciplines	Operations research	en_US
dc.subject.keywords	Data Construction	en_US
dc.subject.keywords	Data Imputation	en_US
dc.subject.keywords	Learning Algorithms	en_US
dc.subject.keywords	Machine Learning	en_US
dc.title	Evaluating imputation in a two-way table of means for training data construction
dc.type	thesis	en_US
dc.type.genre	thesis	en_US
dspace.entity.type	Publication
relation.isOrgUnitOfPublication	51d8b1a0-5b93-4ee8-990a-a0e04d3501b1
thesis.degree.discipline	Operations research	en_US
thesis.degree.grantor	Iowa State University	en_US
thesis.degree.level	thesis
thesis.degree.name	Master of Science	en_US
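
The construction described in the abstract (build a two-way table of means, impute the empty cells, convert the imputed cells back into data points) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the variable names `machine`, `shift`, and `y`, the toy records, and the helper function names are all hypothetical. The abstract does not name the imputation methods that were evaluated, so the additive rule used below (row mean + column mean - grand mean) is a simple stand-in, not the method of the thesis.

```python
from collections import defaultdict
from statistics import mean


def build_table(records, row_vars, col_vars, response):
    """Group records into a two-way table; each cell holds the mean response."""
    cells = defaultdict(list)
    for rec in records:
        row = tuple(rec[v] for v in row_vars)   # first subset of explanatory variables
        col = tuple(rec[v] for v in col_vars)   # second subset of explanatory variables
        cells[(row, col)].append(rec[response])
    return {key: mean(values) for key, values in cells.items()}


def impute_additive(table):
    """Fill empty cells with row mean + column mean - grand mean.

    Assumption: this additive rule is only a placeholder for whichever
    two-way-table imputation method is actually being evaluated.
    """
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    grand = mean(table.values())
    row_mean = {r: mean(v for (rr, _), v in table.items() if rr == r) for r in rows}
    col_mean = {c: mean(v for (_, cc), v in table.items() if cc == c) for c in cols}
    return {
        (r, c): row_mean[r] + col_mean[c] - grand
        for r in rows
        for c in cols
        if (r, c) not in table          # only cells with no observed data
    }


def to_records(imputed, row_vars, col_vars, response):
    """Convert imputed cells back into the original record format."""
    synthetic = []
    for (row, col), value in imputed.items():
        rec = dict(zip(row_vars, row))
        rec.update(zip(col_vars, col))
        rec[response] = value
        synthetic.append(rec)
    return synthetic


# Hypothetical toy data: two explanatory variables and one response.
observed = [
    {"machine": "A", "shift": "day", "y": 10.0},
    {"machine": "A", "shift": "day", "y": 12.0},
    {"machine": "A", "shift": "night", "y": 8.0},
    {"machine": "B", "shift": "day", "y": 14.0},
]

table = build_table(observed, ["machine"], ["shift"], "y")
imputed = impute_additive(table)
synthetic = to_records(imputed, ["machine"], ["shift"], "y")
training_data = observed + synthetic    # observed plus synthetic points
```

In this toy case the only empty cell is (B, night), which receives the value 14.0 + 8.0 - 11.0 = 11.0 and becomes one synthetic record appended to the four observed ones.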

File

Original bundle

Name: Arzanipour_iastate_0097M_21911.pdf
Size: 464.1 KB
Format: Adobe Portable Document Format

Collections

Theses and Dissertations