Parallel Fractional Hot Deck Imputation and Variance Estimation for Big Incomplete Data Curing
dc.contributor.author | Yang, Yicheng | |
dc.contributor.author | Kim, Jae Kwang | |
dc.contributor.author | Cho, In-Ho | |
dc.contributor.department | Statistics (CALS) | |
dc.date.accessioned | 2022-06-03T15:16:08Z | |
dc.date.available | 2022-06-03T15:16:08Z | |
dc.date.issued | 2020-10-06 | |
dc.description.abstract | Fractional hot-deck imputation (FHDI) is a general-purpose, assumption-free imputation method for handling multivariate missing data: it fills each missing item with multiple observed values rather than artificially created ones. The corresponding R package FHDI (Im et al., 2018) is general and efficient, but it is inadequate for big incomplete data because of its excessive memory requirements and long running times. As a first step toward tackling big incomplete data with FHDI, we developed a parallel version of fractional hot-deck imputation (named P-FHDI) suitable for curing large incomplete datasets. Results show a favorable speedup when P-FHDI is applied to big datasets with up to millions of instances or 10,000 variables. This paper explains the detailed parallel algorithms of P-FHDI for datasets with many instances (big-n) or high dimensionality (big-p) and confirms their favorable scalability. The proposed program inherits all the advantages of the serial FHDI and enables parallel variance estimation, which will benefit a broad audience in science and engineering. | |
dc.description.comments | This is a manuscript of an article published as Yang, Yicheng, Jaekwang Kim, and In-Ho Cho. "Parallel Fractional Hot Deck Imputation and Variance Estimation for Big Incomplete Data Curing." IEEE Transactions on Knowledge and Data Engineering (2020). DOI: 10.1109/TKDE.2020.3029146. Copyright 2020 IEEE. Attribution 4.0 International (CC BY 4.0). Posted with permission. | |
dc.identifier.uri | https://dr.lib.iastate.edu/handle/20.500.12876/7rKoXAar | |
dc.language.iso | en | |
dc.publisher | IEEE | |
dc.source.uri | https://doi.org/10.1109/TKDE.2020.3029146 | * |
dc.subject.keywords | Parallel fractional hot-deck imputation | |
dc.subject.keywords | incomplete big data | |
dc.subject.keywords | multivariate missing data curing | |
dc.subject.keywords | parallel Jackknife variance estimation | |
dc.title | Parallel Fractional Hot Deck Imputation and Variance Estimation for Big Incomplete Data Curing | |
dc.type | article | |
dspace.entity.type | Publication | |
relation.isAuthorOfPublication | fdf914ae-e48d-4f4e-bfa2-df7a755320f4 | |
relation.isAuthorOfPublication | be09bc99-6d52-4838-973b-47f629edd366 | |
relation.isOrgUnitOfPublication | 5a1eba07-b15d-466a-a333-65bd63a4001a |
File
Original bundle
- Name: 2020-KimJaiKwang-ParallelFractional.pdf
- Size: 3.42 MB
- Format: Adobe Portable Document Format