A mathematical programming approach for integrated multiple linear regression subset selection and validation

Chung, Seokhyun; Park, Young-Woong; Cheong, Taesu

Date

2020-12-01

Authors

Chung, Seokhyun

Park, Young-Woong

Cheong, Taesu

Affiliated Author

Park, Young-Woong (Assistant Professor)

Organizational Unit

Information Systems and Business Analytics

Abstract

Subset selection for multiple linear regression aims to construct a regression model that minimizes errors by selecting a small number of explanatory variables. Once a model is built, various statistical tests and diagnostics are conducted to validate the model and to determine whether the regression assumptions are met. Most traditional approaches require human decisions at this step. For example, the user may repeatedly add or remove variables until a satisfactory model is obtained. However, this trial-and-error strategy cannot guarantee finding a subset that minimizes the errors while satisfying all regression assumptions. In this paper, we propose a fully automated model-building procedure for multiple linear regression subset selection that integrates model building and validation based on mathematical programming. The proposed model minimizes the mean squared error while ensuring that the majority of the important regression assumptions are met. We also propose an efficient constraint that approximates the constraint for the coefficient t-test. When no subset satisfies all of the considered regression assumptions, our model provides an alternative subset that satisfies most of these assumptions. Computational results show that our model yields better solutions (i.e., solutions satisfying more regression assumptions) than state-of-the-art benchmark models while maintaining similar explanatory power.
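The abstract describes two rules. The first is to choose the subset that minimizes mean squared error subject to validation checks. The second is to fall back to the subset that passes the most checks when none passes all of them. The sketch below illustrates that selection priority, but it rests on explicit assumptions. It replaces the paper's mathematical program with brute-force enumeration of all size-k subsets, which is feasible only for a handful of variables. It checks only one assumption, coefficient significance, using |t| ≥ a fixed threshold `t_crit` (2.0 here, a rough stand-in for a 5% two-sided critical value). The helper names `fit_ols` and `best_subset` are hypothetical. This is not the authors' formulation or their t-test approximation.

```python
from itertools import combinations

import numpy as np


def fit_ols(X, y):
    """Ordinary least squares with an intercept column.

    Returns (coefficients, standard errors, mean squared error).
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])          # prepend intercept
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - A.shape[1]                          # requires n > k + 1
    sigma2 = resid @ resid / dof                  # unbiased error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    mse = resid @ resid / n
    return beta, se, mse


def best_subset(X, y, k, t_crit=2.0):
    """Brute-force stand-in for the paper's mathematical program (hypothetical).

    Among all subsets of k columns, pick the one with the fewest slope
    coefficients failing |t| >= t_crit; break ties by the lowest MSE.
    If some subset passes every t-test, the result is the lowest-MSE
    such subset; otherwise the subset with the most passing tests wins.
    """
    best_key, best = None, None
    for cols in combinations(range(X.shape[1]), k):
        beta, se, mse = fit_ols(X[:, list(cols)], y)
        t_stats = beta[1:] / se[1:]               # skip the intercept
        n_fail = int(np.sum(np.abs(t_stats) < t_crit))
        key = (n_fail, mse)                       # lexicographic priority
        if best_key is None or key < best_key:
            best_key, best = key, (cols, beta, mse, n_fail)
    return best


# Usage on synthetic data: y depends only on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.5 * rng.standard_normal(100)
cols, beta, mse, n_fail = best_subset(X, y, k=2)
print(cols, n_fail)  # expected: (0, 2) 0
```

Ranking candidates by the tuple `(n_fail, mse)` encodes the priority stated in the abstract: fewer violated checks come first, and lower error comes second. Explicit enumeration grows combinatorially with the number of candidate variables. A mathematical programming formulation such as the one the paper proposes avoids listing every subset.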

Comments

This accepted article is published as S. Chung, Y.W. Park*, and T. Cheong (2020), “A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation,” Pattern Recognition 108:107565. doi: 10.1016/j.patcog.2020.107565. Posted with permission.

Copyright

2020

Collections

Publications