First-order methods of solving nonconvex optimization problems: Algorithms, convergence, and optimality
Abstract
First-order methods for solving large-scale nonconvex problems have been applied in many areas of machine learning, such as matrix factorization, dictionary learning, matrix sensing/completion, and training deep neural networks. Matrix factorization problems, for example, have many important applications in document clustering, community detection, and image segmentation. In this dissertation, we first study novel nonconvex variable splitting methods for solving matrix factorization problems, focusing on symmetric non-negative matrix factorization (SymNMF) and stochastic SymNMF.
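For concreteness, SymNMF is commonly posed as the following nonconvex problem (a standard formulation; the notation $X$, $Z$, and $r$ here is illustrative rather than taken from the dissertation): given a symmetric nonnegative similarity matrix $X \in \mathbb{R}^{n \times n}$ and a target rank $r$,
\[
\min_{Z \in \mathbb{R}^{n \times r},\; Z \ge 0} \; \|X - ZZ^{\mathsf{T}}\|_F^2,
\]
where the nonnegativity constraint is entrywise.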
For the SymNMF problem, the proposed algorithm, called nonconvex splitting SymNMF (NS-SymNMF), is guaranteed to converge to the set of Karush-Kuhn-Tucker (KKT) points of the nonconvex SymNMF problem, and it achieves a global sublinear convergence rate. We also show that the algorithm can be implemented efficiently in a distributed manner. Further, we provide sufficient conditions that guarantee the global and local optimality of the obtained solutions. Extensive numerical results on both synthetic and real data sets suggest that the proposed algorithm converges quickly to a local minimum solution.
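The variable-splitting idea behind such methods can be sketched as follows (a common reformulation stated here as an assumption, not necessarily the exact one used by NS-SymNMF): introduce a copy $Y$ of the factor $Z$ and enforce consensus between the two,
\[
\min_{Z,\; Y \ge 0} \; \|X - ZY^{\mathsf{T}}\|_F^2 \quad \text{s.t.} \quad Y = Z,
\]
so that each block subproblem becomes a convex nonnegative least-squares problem and the coupling constraint can be handled, e.g., via an augmented Lagrangian.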
We then consider a stochastic SymNMF problem in which the observation matrix is generated in a random and sequential manner. The proposed stochastic nonconvex splitting method not only guarantees convergence to the set of stationary points of the problem (in the mean-square sense) but also achieves a sublinear convergence rate. Numerical results show that, for clustering problems over both synthetic and real-world datasets, the proposed algorithm converges quickly to the set of stationary points.
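One way to view this stochastic setting (an illustrative formulation, not necessarily the dissertation's exact model) is as minimizing an expected factorization error over the randomly arriving observations:
\[
\min_{Z \ge 0} \; \mathbb{E}_{\xi}\!\left[\|X(\xi) - ZZ^{\mathsf{T}}\|_F^2\right],
\]
where $X(\xi)$ denotes the observation matrix revealed at a given step.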
When the objective function is nonconvex, it is well known that most first-order algorithms converge to first-order stationary solutions (SS1) at a global sublinear rate. Whether a first-order algorithm can converge to second-order stationary points (SS2) at a provable rate has recently attracted considerable attention. In particular, we study the alternating gradient descent (AGD) algorithm, a simple yet popular method that has been applied to problems in optimization, machine learning, data mining, and signal processing. The algorithm updates two blocks of variables in an alternating manner: a gradient step is taken on one block while the remaining block is kept fixed.
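As a concrete illustration of this update pattern, here is a minimal AGD sketch (the function names, step size, and iteration count are placeholder choices, not the dissertation's implementation):

```python
import numpy as np

def alternating_gradient_descent(grad_x, grad_y, x0, y0, step=1e-2, iters=1000):
    """Minimal AGD sketch: take a gradient step on one block while
    holding the other block fixed, then alternate."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        x = x - step * grad_x(x, y)   # update block x with y held fixed
        y = y - step * grad_y(x, y)   # update block y with the new x held fixed
    return x, y

# Example: two-block objective f(x, y) = ||A - x @ y.T||_F^2.
A = np.random.default_rng(0).standard_normal((20, 20))
gx = lambda x, y: -2 * (A - x @ y.T) @ y      # gradient w.r.t. block x
gy = lambda x, y: -2 * (A - x @ y.T).T @ x    # gradient w.r.t. block y
x_hat, y_hat = alternating_gradient_descent(gx, gy,
                                            0.1 * np.ones((20, 3)),
                                            0.1 * np.ones((20, 3)))
```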
In this work, we show that a variant of AGD-type algorithms will not be trapped by ``bad'' stationary solutions such as saddle points and local maxima. In particular, we consider a smooth unconstrained nonconvex optimization problem and propose a perturbed AGD (PA-GD) algorithm which converges (with high probability) to the set of SS2 at a global sublinear rate. To the best of our knowledge, this is the first alternating-type algorithm that is guaranteed to reach SS2 points with high probability, with a corresponding convergence rate of $\mathcal{O}(\text{polylog}(d)/\epsilon^{7/3})$, where $\text{polylog}(d)$ denotes a polynomial of the logarithm of the problem dimension $d$.
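The following sketch shows, schematically, how a perturbation step can be combined with the alternating updates (an illustrative sketch only; the thresholds, perturbation radius, and cooldown below are hypothetical placeholder choices and do not reproduce the exact PA-GD procedure or its guarantees):

```python
import numpy as np

def perturbed_agd(grad_x, grad_y, x0, y0, step=1e-3, iters=5000,
                  grad_tol=1e-2, radius=1e-2, cooldown=100, seed=0):
    """Sketch of a perturbed alternating gradient method: when the overall
    gradient is small (a candidate saddle point), inject a small random
    perturbation so the iterates can escape strict saddles, then resume
    the usual alternating gradient steps."""
    rng = np.random.default_rng(seed)
    x, y = x0.copy(), y0.copy()
    last_perturb = -cooldown
    for t in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        grad_norm = np.sqrt(np.sum(gx ** 2) + np.sum(gy ** 2))
        if grad_norm < grad_tol and t - last_perturb >= cooldown:
            # small random perturbation around the near-stationary point
            x = x + radius * rng.standard_normal(x.shape)
            y = y + radius * rng.standard_normal(y.shape)
            last_perturb = t
        x = x - step * grad_x(x, y)   # gradient step on block x, y fixed
        y = y - step * grad_y(x, y)   # gradient step on block y, new x fixed
    return x, y
```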