The leave-worst-k-out criterion for cross validation

Wang, Lizhi

File

2022-WangLizhi-LeaveWorst.pdf (1.07 MB)

Date

2022-06-17

Authors

Wang, Lizhi

Publisher

Springer-Verlag GmbH Germany

Abstract

Cross validation is widely used to assess the performance of prediction models on unseen data. Leave-k-out and m-fold are among the most popular cross validation criteria, and they have complementary strengths and limitations. Leave-k-out (with leave-1-out being the most common special case) is exhaustive and more reliable but computationally prohibitive when k>2, whereas m-fold is much more tractable at the cost of uncertain performance due to non-exhaustive random sampling. We propose a new cross validation criterion, leave-worst-k-out, which attempts to combine the strengths and avoid the limitations of leave-k-out and m-fold. The leave-worst-k-out criterion is defined as the largest validation error out of the $C_n^k$ possible ways to partition n data points into a subset of (n−k) for training a prediction model and the remaining k for validation. In contrast, the leave-k-out criterion takes the average of the $C_n^k$ validation errors from the aforementioned partitions, and m-fold samples m random (but non-independent) such validation errors. We prove that, for the special case of a multiple linear regression model under the L1 norm, the leave-worst-k-out criterion can be computed by solving a mixed integer linear program. We also present a random sampling algorithm for approximately computing the criterion for general prediction models under general norms. Results of two computational experiments suggested that the leave-worst-k-out criterion clearly outperformed leave-k-out and m-fold in assessing the generalizability of prediction models; moreover, leave-worst-k-out can be approximately computed using the random sampling algorithm almost as efficiently as leave-1-out and m-fold, and the effectiveness of the approximated criterion may be as high as, or even higher than, that of the exactly computed criterion.
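To make the definitions in the abstract concrete, the following is a minimal sketch on a toy data set, not the article's implementation. It enumerates every size-k validation subset, takes the maximum error (leave-worst-k-out) and the mean error (leave-k-out), and then approximates the maximum from a handful of uniformly sampled subsets. Several choices here are assumptions made only for illustration: the model is fitted by ordinary least squares, the validation error is the mean absolute error, and the sampler is plain uniform sampling, which is not necessarily the random sampling algorithm presented in the article. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def validation_error(X, y, val_idx):
    """Fit least squares on the training rows and return the mean
    absolute error on the validation rows (an assumed error measure)."""
    train = np.ones(len(y), dtype=bool)
    train[list(val_idx)] = False
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return float(np.mean(np.abs(X[~train] @ coef - y[~train])))


def leave_k_out_errors(X, y, k):
    """Validation errors for all C(n, k) ways to hold out k points."""
    return [validation_error(X, y, idx)
            for idx in combinations(range(len(y)), k)]


def sampled_worst_k_out(X, y, k, n_samples, seed=0):
    """Approximate the worst case from uniformly sampled subsets
    (an illustrative stand-in, not the article's sampling algorithm)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    return max(
        validation_error(X, y, rng.choice(n, size=k, replace=False))
        for _ in range(n_samples)
    )


# Toy data: an intercept column plus one feature, with noisy linear response.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(8), rng.normal(size=8)])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=8)

errors = leave_k_out_errors(X, y, k=2)   # C(8, 2) = 28 partitions
worst = max(errors)                      # leave-worst-k-out
average = sum(errors) / len(errors)      # leave-k-out
approx = sampled_worst_k_out(X, y, k=2, n_samples=10)

print(f"leave-worst-2-out: {worst:.3f}")
print(f"leave-2-out:       {average:.3f}")
print(f"sampled worst:     {approx:.3f}")
```

Because the sampled subsets are drawn from the same collection that the exhaustive version enumerates, the sampled value can never exceed the exact worst case; it is a lower bound that tightens as more subsets are drawn. Exhaustive enumeration is only feasible here because n and k are tiny.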

Academic or Administrative Unit

Department of Industrial and Manufacturing Systems Engineering

Department of Electrical and Computer Engineering

Type

article

Comments

This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at DOI: 10.1007/s11590-022-01894-6. Copyright 2022 The Author(s). Posted with permission.