EE Colloquium: Convergence of First-Order Algorithms Under Overparametrized Settings
Abstract: We discuss convergence results for first-order algorithms in overparametrized settings. First, we consider adversarial training. Adversarial training is a principled approach for training robust neural networks. Despite its tremendous success in practice, its theoretical properties remain largely unexplored. We provide new theoretical insights into gradient descent based adversarial training by studying its computational properties, specifically its inductive bias. We take the binary classification task on linearly separable data as an illustrative example, where the loss asymptotically attains its infimum as the parameter diverges to infinity along certain directions. Specifically, we show that when the adversarial perturbation during training has bounded $\ell_2$-norm, the classifier learned by gradient descent based adversarial training converges in direction to the maximum $\ell_2$-norm margin classifier at the rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$, significantly faster than the rate $\mathcal{O}(1/\log T)$ of training with clean data. In addition, when the adversarial perturbation during training has bounded $\ell_q$-norm for some $q\ge 1$, the resulting classifier converges in direction to a maximum mixed-norm margin classifier, which has a natural robustness interpretation: it is the maximum $\ell_2$-norm margin classifier under worst-case $\ell_q$-norm perturbation to the data. Our findings provide theoretical support that adversarial training indeed promotes robustness against adversarial perturbation. We will also discuss new theoretical guarantees of Actor-Critic algorithms for risk-sensitive reinforcement learning.
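For readers unfamiliar with the setup, the following is a minimal illustrative sketch (not taken from the talk) of gradient descent based adversarial training for a linear classifier with $\ell_2$-bounded perturbations on synthetic linearly separable data. The logistic loss, step size, perturbation radius, and data distribution are all assumptions made for illustration; for a linear model $w$, the worst-case $\ell_2$ perturbation of radius $\epsilon$ on example $(x, y)$ has the closed form $x - \epsilon\, y\, w/\|w\|$, which the sketch exploits.

```python
# Hypothetical sketch: gradient-descent adversarial training of a linear
# classifier on linearly separable data, with l2-bounded perturbations.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linearly separable data: two Gaussian clusters (assumed setup).
n, d = 200, 2
X_pos = rng.normal(loc=[+2.0, +2.0], scale=0.5, size=(n // 2, d))
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(n // 2, d))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

eps = 0.3    # l2 radius of the adversarial perturbation (assumed)
lr = 0.1     # gradient-descent step size (assumed)
T = 5000     # number of iterations
w = np.zeros(d)

for t in range(1, T + 1):
    norm_w = np.linalg.norm(w) + 1e-12
    # Worst-case l2 perturbation for a linear model (closed form): x - eps*y*w/||w||.
    X_adv = X - eps * (y[:, None] * w[None, :] / norm_w)
    margins = y * (X_adv @ w)
    # Gradient of the logistic loss evaluated on the adversarial examples.
    sig = 1.0 / (1.0 + np.exp(np.clip(margins, -50, 50)))
    grad = -(y[:, None] * X_adv * sig[:, None]).mean(axis=0)
    w -= lr * grad

    if t % 1000 == 0:
        direction = w / np.linalg.norm(w)
        # As the direction converges to the max l2-margin classifier, this
        # normalized margin on the clean data approaches the maximum margin.
        clean_margin = (y * (X @ direction)).min()
        print(f"iter {t}: normalized l2 margin on clean data = {clean_margin:.4f}")
```

Printing the normalized margin along the way gives a rough empirical view of the directional convergence described in the abstract; the talk's results quantify the rate of this convergence.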
Bio: Ethan is an assistant professor at Penn State University. Before joining Penn State, he received his PhD from Princeton University in 2016 under the guidance of Bob Vanderbei and Han Liu, and his bachelor's degree from the National University of Singapore in 2010 under the guidance of Kim-Chuan Toh. His research spans problems such as statistical learning, high-dimensional inference, and adaptive trial design, approached from both statistical and computational perspectives. He has won numerous awards in statistics and optimization, including the Best Paper Prize for Young Researchers in Continuous Optimization (jointly with Mengdi Wang and Han Liu), which is awarded once every three years.
Event Contact: Minghui Zhu