Provable surrogate gradient-based optimization for unsupervised learning

dc.contributor.advisor Chinmay Hegde
dc.contributor.author Nguyen, Thanh
dc.contributor.department Electrical and Computer Engineering
dc.date 2020-06-26T19:54:08.000
dc.date.accessioned 2020-06-30T03:21:50Z
dc.date.available 2020-06-30T03:21:50Z
dc.date.copyright Fri May 01 00:00:00 UTC 2020
dc.date.embargo 2020-12-16
dc.date.issued 2020-01-01
dc.description.abstract <p>Gradient-based optimization lies at the core of modern machine learning and deep learning, with (stochastic) gradient descent algorithms being employed as the main workhorse. The unprecedented success of deep learning over the last decade has arguably been tied to the popularity and mysterious success of these algorithms --- particularly for supervised learning.</p> <p>This success is despite the fact that objective functions in deep learning are extremely non-linear and non-convex.</p> <p>The past few years have witnessed great theoretical advances in analyzing the optimization and inductive biases of gradient descent for supervised learning. However, the majority of existing work applies only to settings such as classification and regression. In contrast, the role of gradient descent in the unsupervised setting has received far less attention. In this work, we make concrete contributions to the understanding of gradient-based optimization in unsupervised learning.</p> <p>We start with dictionary learning, an unsupervised feature learning mechanism widely used in signal processing and machine learning. The primary goal of dictionary learning is to learn sparse, linear representations of the data by minimizing the reconstruction loss. In this problem, the objective function is coupled with an intractable sparse coding step due to the latent representations. Therefore, the gradient of the loss with respect to the model parameters cannot be obtained exactly; only a noisy estimate of the true gradient is available. Nevertheless, gradient-based alternating minimization for dictionary learning works surprisingly well in practice, while a theoretical understanding of this success has lagged behind. We refer to this method as "surrogate" gradient-based optimization.</p> <p>In Chapters 1 and 2, we introduce two surrogate gradient descent algorithms for sparse coding. 
The first algorithm learns a double-sparsity model in which the dictionary is the product of a fixed, known basis and a learnable sparse component. The second algorithm provably learns a dictionary from samples with missing entries. In each case, we provide a spectral initialization subroutine that gives a coarse estimate of the true dictionary. Starting from this estimate, we prove that the surrogate descent algorithm converges linearly to the true dictionary. We analyze each algorithm and establish sample- and computational-complexity bounds superior to those of existing provable approaches.</p> <p>While sparse coding is still widely used, its computational cost is prohibitive for high-dimensional data. Autoencoders have instead emerged as an efficient and flexible alternative for feature learning using neural networks. In Chapter 3, we build upon the surrogate-gradient theory developed in the previous chapters to provide a series of results for autoencoder learning. For several generative models of data, we prove that, when trained with gradient descent, two-layer weight-tied autoencoders can successfully recover the ground-truth parameters of the corresponding models. Our analysis establishes theoretical evidence that shallow autoencoder modules can indeed be powerful feature learning mechanisms for a variety of data models. In Chapter 4, we go beyond the local analysis of Chapter 3 and analyze the gradient dynamics of over-parameterized autoencoders. Under a few mild assumptions on the training dataset, we rigorously prove the linear convergence of gradient descent for randomly initialized autoencoder networks. Our analysis mirrors recent advances in the emerging theory of neural tangent kernels.</p> <p>Chapter 5 considers a black-box optimization problem in which the objective and constraints are specified as solutions to expensive PDE solvers. 
We pose this optimization as sampling from a Gibbs distribution with a black-box energy function, and perform Langevin sampling using surrogate gradients of the black-box functions learned by deep neural networks. We prove the convergence of the surrogate Langevin dynamics when the target distribution is log-concave and smooth. Finally, in Chapter 6, we lay out several potential directions that merge two lines of research, gradient-flow analysis and Langevin dynamics, as well as connections to inverse problems with generative priors.</p>
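As a rough illustration of the surrogate-gradient idea described in the abstract, the following toy sketch alternates a crude sparse-coding step (hard thresholding, standing in for the intractable exact sparse coding) with a gradient step on the reconstruction loss evaluated at the estimated codes. All dimensions, the step size, the noise level of the coarse initialization, and the thresholding estimator are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: samples y = A* x with k-sparse codes x and an unknown
# unit-norm dictionary A*. All sizes here are illustrative.
n, m, k, T = 32, 16, 3, 2000
A_true = rng.standard_normal((n, m))
A_true /= np.linalg.norm(A_true, axis=0)
X = np.zeros((m, T))
for t in range(T):
    supp = rng.choice(m, size=k, replace=False)
    X[supp, t] = rng.standard_normal(k)
Y = A_true @ X

def hard_threshold(V, k):
    """Keep the k largest-magnitude entries of each column (a crude sparse-coding step)."""
    out = np.zeros_like(V)
    idx = np.argsort(-np.abs(V), axis=0)[:k]
    np.put_along_axis(out, idx, np.take_along_axis(V, idx, axis=0), axis=0)
    return out

# Coarse initial estimate (standing in for a spectral initialization).
A = A_true + 0.05 * rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)
err0 = np.linalg.norm(A - A_true) / np.linalg.norm(A_true)

eta = 0.5 / T
for _ in range(50):
    X_hat = hard_threshold(A.T @ Y, k)   # estimated (not exact) codes
    grad = (A @ X_hat - Y) @ X_hat.T     # surrogate gradient of 0.5 * ||Y - A X_hat||_F^2
    A -= eta * grad
    A /= np.linalg.norm(A, axis=0)       # keep columns on the unit sphere

err = np.linalg.norm(A - A_true) / np.linalg.norm(A_true)
```

Because the codes `X_hat` are only estimates, `grad` is a biased, noisy surrogate for the true gradient; the point of the analysis sketched in the abstract is that, from a sufficiently good initialization, such surrogate steps still contract toward the true dictionary.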
dc.format.mimetype application/pdf
dc.identifier archive/lib.dr.iastate.edu/etd/17923/
dc.identifier.articleid 8930
dc.identifier.contextkey 18242497
dc.identifier.doi https://doi.org/10.31274/etd-20200624-102
dc.identifier.s3bucket isulib-bepress-aws-west
dc.identifier.submissionpath etd/17923
dc.identifier.uri https://dr.lib.iastate.edu/handle/20.500.12876/32106
dc.language.iso en
dc.source.bitstream archive/lib.dr.iastate.edu/etd/17923/Nguyen_iastate_0097E_18698.pdf|||Fri Jan 14 21:31:09 UTC 2022
dc.subject.keywords Gradient Descent
dc.subject.keywords Machine Learning
dc.subject.keywords Optimization
dc.subject.keywords Provable Methods
dc.subject.keywords Unsupervised Learning
dc.title Provable surrogate gradient-based optimization for unsupervised learning
dc.type article
dc.type.genre thesis
dspace.entity.type Publication
relation.isOrgUnitOfPublication a75a044c-d11e-44cd-af4f-dab1d83339ff
thesis.degree.discipline Electrical Engineering (Communications and Signal Processing)
thesis.degree.level thesis
thesis.degree.name Doctor of Philosophy
File
Original bundle
Name: Nguyen_iastate_0097E_18698.pdf
Size: 1.85 MB
Format: Adobe Portable Document Format