Parallel Fractional Hot Deck Imputation and Variance Estimation for Big Incomplete Data Curing
Date
2020-10-06
Authors
Yang, Yicheng
Kim, Jae Kwang
Cho, In-Ho
Publisher
IEEE
Department
Statistics
Abstract
Fractional hot-deck imputation (FHDI) is a general-purpose, assumption-free imputation method for handling multivariate missing data: it fills each missing item with multiple observed values rather than artificially created ones. The corresponding R package FHDI \cite{Im:2018} is general and efficient, but it is not adequate for big incomplete data because of its excessive memory requirements and long running time. As a first step toward tackling big incomplete data with FHDI, we developed a parallel version of fractional hot-deck imputation (named P-FHDI) suitable for curing large incomplete datasets. Results show a favorable speedup when P-FHDI is applied to big datasets with up to millions of instances or up to 10,000 variables. This paper explains the detailed parallel algorithms of P-FHDI for datasets with many instances (big-n) or high dimensionality (big-p) and confirms their favorable scalability. The proposed program inherits all the advantages of the serial FHDI and enables parallel variance estimation, which will benefit a broad audience in science and engineering.
Comments
This is a manuscript of an article published as Yang, Yicheng, Jaekwang Kim, and In-Ho Cho. "Parallel Fractional Hot Deck Imputation and Variance Estimation for Big Incomplete Data Curing." IEEE Transactions on Knowledge and Data Engineering (2020).
DOI: 10.1109/TKDE.2020.3029146.
Copyright 2020 IEEE.
Attribution 4.0 International (CC BY 4.0).
Posted with permission.