Out-of-sample comparisons of overfit models
This paper uses dimension asymptotics to study why overfit linear regression models should be compared out-of-sample; we let the number of predictors used by the larger model increase with the number of observations so that their ratio remains uniformly positive. Our analysis gives a theoretical motivation for using out-of-sample (OOS) comparisons: the Diebold-Mariano-West (DMW) OOS test allows a forecaster to conduct inference about the expected future accuracy of his or her models when one or both are overfit. We show analytically and through Monte Carlo simulations that standard full-sample test statistics cannot test hypotheses about this performance. Our paper also shows that popular test and training sample sizes may give misleading results if researchers are concerned about overfit. We show that P^2/T must converge to zero for the DMW test to give valid inference about the expected forecast accuracy; otherwise the test measures the accuracy of the estimates constructed using only the training sample. In empirical research, P is typically much larger than this condition allows. Our simulations indicate that using large values of P with the DMW test gives undersized tests with low power, so this practice may unduly favor simple benchmark models.
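
As a rough illustration of the kind of OOS comparison discussed above, the following is a minimal sketch of a DMW-type statistic, not the paper's exact procedure. It assumes squared-error loss, a fixed estimation window (both models are fit once by OLS on the first T - P observations), and that P denotes the test-sample size and T the total number of observations. The function name `dmw_statistic` and its arguments are hypothetical and chosen only for this example.

```python
import numpy as np


def dmw_statistic(y, X_small, X_large, P):
    """Sketch of a DMW-type OOS statistic (assumptions: squared-error loss,
    fixed estimation window, no serial-correlation correction).

    Both models are estimated by OLS on the first T - P observations and
    used to forecast the last P observations. A positive value means the
    larger model had the smaller average squared forecast error.
    """
    y = np.asarray(y, dtype=float)
    X_small = np.asarray(X_small, dtype=float)
    X_large = np.asarray(X_large, dtype=float)
    T = y.shape[0]
    R = T - P  # training-sample size

    def oos_errors(X):
        # OLS coefficients from the training sample only.
        beta = np.linalg.lstsq(X[:R], y[:R], rcond=None)[0]
        # Forecast errors over the test sample.
        return y[R:] - X[R:] @ beta

    e_small = oos_errors(X_small)
    e_large = oos_errors(X_large)

    # Loss differential: benchmark loss minus alternative loss.
    d = e_small**2 - e_large**2

    # Studentized mean of the loss differential.
    return np.sqrt(P) * d.mean() / d.std(ddof=1)


# Hypothetical usage with simulated data (all values are illustrative).
rng = np.random.default_rng(0)
T, P = 200, 10
x = rng.normal(size=T)
y = 3.0 * x + rng.normal(size=T)
X_small = np.ones((T, 1))                   # intercept-only benchmark
X_large = np.column_stack([np.ones(T), x])  # benchmark plus one predictor
stat = dmw_statistic(y, X_small, X_large, P)
```

In this sketch the statistic would be compared with standard normal critical values. Choosing P small relative to T, as in the usage lines, is meant to echo the condition that P^2/T be small.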