Parallel Fractional Hot Deck Imputation and Variance Estimation for Big Incomplete Data Curing

dc.contributor.author Yang, Yicheng
dc.contributor.author Kim, Jae Kwang
dc.contributor.author Cho, In-Ho
dc.contributor.department Statistics (CALS)
dc.date.accessioned 2022-06-03T15:16:08Z
dc.date.available 2022-06-03T15:16:08Z
dc.date.issued 2020-10-06
dc.description.abstract Fractional hot-deck imputation (FHDI) is a general-purpose, assumption-free imputation method for handling multivariate missing data: it fills each missing item with multiple observed values rather than artificially created ones. The corresponding R package, FHDI (Im et al., 2018), is general and efficient, but it is not adequate for big incomplete data because of its excessive memory requirements and long running time. As a first step toward tackling big incomplete data with FHDI, we developed a parallel version of fractional hot-deck imputation (named P-FHDI) suitable for curing large incomplete datasets. Results show a favorable speedup when P-FHDI is applied to big datasets with up to millions of instances or 10,000 variables. This paper explains the detailed parallel algorithms of P-FHDI for datasets with many instances (big-n) or high dimensionality (big-p) and confirms its favorable scalability. The proposed program inherits all the advantages of the serial FHDI and enables parallel variance estimation, which will benefit a broad audience in science and engineering.
dc.description.comments This is a manuscript of an article published as Yang, Yicheng, Jaekwang Kim, and In-Ho Cho. "Parallel Fractional Hot Deck Imputation and Variance Estimation for Big Incomplete Data Curing." IEEE Transactions on Knowledge and Data Engineering (2020). DOI: 10.1109/TKDE.2020.3029146. Copyright 2020 IEEE. Attribution 4.0 International (CC BY 4.0). Posted with permission.
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/7rKoXAar
dc.language.iso en
dc.publisher IEEE
dc.source.uri https://doi.org/10.1109/TKDE.2020.3029146
dc.subject.keywords Parallel fractional hot-deck imputation
dc.subject.keywords incomplete big data
dc.subject.keywords multivariate missing data curing
dc.subject.keywords parallel Jackknife variance estimation
dc.title Parallel Fractional Hot Deck Imputation and Variance Estimation for Big Incomplete Data Curing
dc.type article
dspace.entity.type Publication
relation.isAuthorOfPublication fdf914ae-e48d-4f4e-bfa2-df7a755320f4
relation.isAuthorOfPublication be09bc99-6d52-4838-973b-47f629edd366
relation.isOrgUnitOfPublication 5a1eba07-b15d-466a-a333-65bd63a4001a
File
Original bundle
Name:
2020-KimJaiKwang-ParallelFractional.pdf
Size:
3.42 MB
Format:
Adobe Portable Document Format