An investigation of the effects of missing data technique selection on student performance results
This study investigated whether the selection of a missing data technique affected indicators of student achievement. The first part of the study consisted of applying five different missing data techniques (MDTs) to the same set of criterion-referenced student performance data and comparing the results. The missing data techniques studied were listwise deletion, pairwise deletion, grand-mean substitution, cell-mean substitution, and simple regression.

Differences were found in some results, for both student posttest and gain scores. The reduction in data accompanying the use of listwise deletion was tied to an inability to detect statistically significant differences that were found in data sets treated with other missing data techniques.

The second part of the study consisted of conducting a simulation to monitor the performance of the five missing data techniques. Ten proportionally equivalent data sets (PEDs) were created using missing data ratios in an original, totally complete data set. Each PED was treated with the five MDTs used in the first part of the study. Mean deviation measures were calculated to monitor bias and closeness to actual values. Deviation measures were calculated for test means, standard deviations, and correlations. Results were averaged across the ten PEDs.

Results showed that no single missing data technique performed best on all deviation measures, though simple regression appeared to perform the best overall. Listwise deletion, the most often used missing data technique, performed worst in estimating test means. Grand-mean substitution was consistently outperformed by cell-mean substitution. It was recommended that multiple missing data techniques be used in analyzing data sets.
The recommended techniques were listwise deletion, pairwise deletion, cell-mean substitution, and simple regression.

This study demonstrated two main points: (1) selection of a missing data technique can impact student performance results, and (2) it is possible to investigate the performance of various MDTs. The researcher suggests that decisions regarding treatment of missing data should move from an unconscious default within statistical packages to a conscious decision based on an understanding of the possible consequences.
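
As an illustration only, the simulation logic described above can be sketched in a few lines of Python: start from a complete data set, remove some values, treat the result with each MDT, and measure how far the estimated posttest mean falls from the actual mean. The records, the grouping variable used as the "cell", the choice of pretest as the regression predictor, and all function names are hypothetical assumptions for this sketch; they are not the study's data, design, or procedure.

```python
from statistics import mean

# Hypothetical complete records: (group, pretest, posttest).
COMPLETE = [("A", 10.0, 14.0), ("A", 12.0, 16.0), ("A", 14.0, 18.0),
            ("B", 20.0, 26.0), ("B", 22.0, 28.0), ("B", 24.0, 30.0)]

# Simulated missingness (assumed pattern): drop the last two posttest scores.
MASKED = [(g, pre, None if i >= 4 else post)
          for i, (g, pre, post) in enumerate(COMPLETE)]

def observed(rows):
    """Records whose posttest score is present."""
    return [r for r in rows if r[2] is not None]

def listwise_deletion(rows):
    """Drop every record that has any missing score."""
    return [r for r in rows if None not in r]

def pairwise_mean(rows, col):
    """Available-case mean: use every record that has a value in this column."""
    return mean(r[col] for r in rows if r[col] is not None)

def grand_mean_substitution(rows):
    """Replace a missing posttest with the mean of all observed posttests."""
    fill = mean(r[2] for r in observed(rows))
    return [(g, pre, fill if post is None else post) for g, pre, post in rows]

def cell_mean_substitution(rows):
    """Replace a missing posttest with the observed mean of the same group."""
    fills = {g: mean(r[2] for r in observed(rows) if r[0] == g)
             for g in {r[0] for r in rows}}
    return [(g, pre, fills[g] if post is None else post) for g, pre, post in rows]

def simple_regression(rows):
    """Replace a missing posttest with a least-squares prediction from pretest."""
    done = observed(rows)
    mx, my = mean(r[1] for r in done), mean(r[2] for r in done)
    slope = (sum((r[1] - mx) * (r[2] - my) for r in done)
             / sum((r[1] - mx) ** 2 for r in done))
    intercept = my - slope * mx
    return [(g, pre, intercept + slope * pre if post is None else post)
            for g, pre, post in rows]

def posttest_mean(rows):
    return mean(r[2] for r in rows)

# Signed deviation of each estimated posttest mean from the actual mean.
actual = posttest_mean(COMPLETE)
results = {
    "listwise": posttest_mean(listwise_deletion(MASKED)) - actual,
    "pairwise": pairwise_mean(MASKED, 2) - actual,
    "grand_mean": posttest_mean(grand_mean_substitution(MASKED)) - actual,
    "cell_mean": posttest_mean(cell_mean_substitution(MASKED)) - actual,
    "regression": posttest_mean(simple_regression(MASKED)) - actual,
}

for name, deviation in results.items():
    print(f"{name:>10}: {deviation:+.3f}")
```

In the study's design, deviations of this kind would be computed for each of the ten PEDs and then averaged, and the same comparison would be repeated for standard deviations and correlations. The toy numbers here say nothing about which technique performs best in general.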