Adaptive energy-based gradient methods for large-scale optimization and data-driven discovery of dynamical systems via neural networks

Tian, Xuping

Major Professor: Liu, Hailiang
Committee Members: Vaswani, Namrata; Luo, Songting; Wu, Zhijun; Wu, Ruoyu
Machine learning and data science have revolutionized numerous scientific and engineering domains. This thesis addresses two challenges at the forefront of these fields: (1) developing efficient optimization methods for training large-scale machine learning models, and (2) discovering dynamical systems from observational data.

To tackle the first challenge, we introduce a new family of gradient-based optimization methods. These methods employ an adaptive energy-based strategy that guarantees unconditional energy stability regardless of the step size (learning rate). We provide convergence analyses in both deterministic and stochastic settings, with particular emphasis on SGEM (Stochastic Gradient with Energy and Momentum), which incorporates momentum acceleration. Experiments on benchmark deep learning problems demonstrate SGEM's rapid convergence and strong generalization. Furthermore, we investigate the dynamic behavior of a deterministic variant of SGEM through the lens of its limiting ordinary differential equations (ODEs); the results illuminate how momentum and step size affect the stability and convergence of the discrete schemes.

To address the second challenge, we propose a data-driven optimal control approach for learning system parameters, and extend it to learn the entire governing function by incorporating neural network approximation into the framework. Specifically, we exemplify the optimal control approach by learning the parameters of the Susceptible-Exposed-Infectious-Recovered (SEIR) model from reported COVID-19 data, and we demonstrate the Optimal Control Neural Networks (OCN) framework through its application to a gradient flow system.
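To give a flavor of the energy-based idea, the following is a minimal sketch of an energy-adaptive gradient update (without momentum) on a toy quadratic. The objective, step size, and variable names are illustrative assumptions, not the thesis's exact formulation; the key property shown is that the auxiliary per-parameter energy `r` is nonincreasing by construction for any step size, which is what "unconditional energy stability" refers to.

```python
import numpy as np

def f(theta):
    # Toy objective: a simple quadratic (illustrative choice)
    return float(theta @ theta)

def grad_f(theta):
    return 2.0 * theta

# Energy-adaptive update: write F(theta) = f(theta) + c > 0, v = sqrt(F),
# and evolve a per-parameter energy r alongside theta.
c, eta, steps = 1.0, 0.1, 500
theta = np.array([2.0, -3.0])
r = np.full_like(theta, np.sqrt(f(theta) + c))  # initialize energy at v(theta_0)

energies, losses = [], []
for _ in range(steps):
    g = grad_f(theta) / (2.0 * np.sqrt(f(theta) + c))  # gradient of v
    r = r / (1.0 + 2.0 * eta * g * g)   # energy shrinks monotonically for any eta
    theta = theta - 2.0 * eta * r * g   # energy-scaled gradient step
    energies.append(r.copy())
    losses.append(f(theta))

print(losses[-1])  # loss after 500 steps
```

Note that the monotone decay of `r` holds for every step size, while the decay of the loss itself depends on the problem; on this smooth quadratic both decrease.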
The training of the neural networks is carefully designed using the adjoint method together with symplectic ODE solvers, and numerical experiments on several canonical systems validate the OCN framework. In summary, this research advances both the theoretical understanding and the practical application of large-scale optimization in machine learning, as well as the data-driven discovery of dynamical systems.
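To make the SEIR parameter-learning idea concrete, here is a minimal sketch that recovers the transmission rate from synthetic, noise-free incidence data by direct least-squares search. The thesis's actual approach is an optimal control formulation solved with adjoint-based gradients; the rate constants, initial conditions, and search grid below are illustrative assumptions.

```python
import numpy as np

def seir_rhs(state, beta, sigma=0.2, gamma=0.1):
    # Classic SEIR in population fractions: S' = -beta*S*I,
    # E' = beta*S*I - sigma*E, I' = sigma*E - gamma*I, R' = gamma*I
    S, E, I, R = state
    return np.array([-beta*S*I, beta*S*I - sigma*E, sigma*E - gamma*I, gamma*I])

def simulate(beta, days=100, dt=0.5):
    # Fourth-order Runge-Kutta integration; returns the I(t) trajectory
    state = np.array([0.99, 0.0, 0.01, 0.0])
    traj = [state[2]]
    for _ in range(int(days / dt)):
        k1 = seir_rhs(state, beta)
        k2 = seir_rhs(state + 0.5*dt*k1, beta)
        k3 = seir_rhs(state + 0.5*dt*k2, beta)
        k4 = seir_rhs(state + dt*k3, beta)
        state = state + (dt/6.0)*(k1 + 2*k2 + 2*k3 + k4)
        traj.append(state[2])
    return np.array(traj)

beta_true = 0.4
data = simulate(beta_true)  # synthetic "reported infections"

# Least-squares recovery of beta over a coarse grid, a stand-in for the
# gradient-based optimal control solve described above
grid = np.linspace(0.1, 0.8, 71)
fit_errors = [np.sum((simulate(b) - data)**2) for b in grid]
beta_hat = grid[int(np.argmin(fit_errors))]
print(beta_hat)
```

In the thesis the minimization is instead driven by gradients obtained from the adjoint method, which scales to many parameters (and, in the OCN extension, to a neural network parameterization of the whole right-hand side) where grid search does not.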