Techniques for enhancing adversarial robustness and robust fairness in deep neural networks
Date
2024-12
Authors
Fakorede, Olukorede Joseph
Major Professor
Tian, Jin
Committee Member
Quinn, Christopher
Lutz, Robyn
Li, Qi
Jannesari, Ali
Abstract
Deep Neural Networks (DNNs) are a class of Artificial Neural Networks that have shown
remarkable success in various fields, such as Computer Vision, Reinforcement Learning, and
Natural Language Processing. However, the applicability of DNNs in safety-critical domains has
often been questioned due to observations that DNNs are brittle and fail to perform well when fed
imperceptibly perturbed input examples. These imperceptibly perturbed inputs are referred to as
adversarial examples.
To address these vulnerabilities, it is imperative to improve the robustness of DNNs. Among the
methods proposed to improve the adversarial robustness of DNNs, Adversarial Training (AT),
which involves training DNNs on adversarial examples, has proven the most effective.
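To make the idea of training on adversarial examples concrete, the sketch below shows a minimal PGD-based adversarial training loop in PyTorch. The model, data loader, and hyperparameters (eps, alpha, steps) are illustrative assumptions, not the exact configurations used in this thesis.

# Minimal sketch of PGD-based adversarial training; `model`, `loader`, and the
# hyperparameters below are illustrative assumptions, not the thesis' settings.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft L-inf bounded adversarial examples with projected gradient descent."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()                  # gradient ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # project into the eps-ball
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer, device="cpu"):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(model, x, y)                    # inner maximization
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)            # outer minimization
        loss.backward()
        optimizer.step()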
While AT has proven effective in improving the robustness of DNNs, the generalization
performance of AT methods remains inferior to that of natural training methods. Further, the
fairness of AT methods has been questioned, as the robustness gains of adversarially trained
DNN models exhibit substantial variation across classes.
In this thesis, we explore novel approaches for formulating Adversarial Training of DNNs,
aiming to achieve better robustness. First, we show that incorporating Hypersphere Embedding,
together with regularization terms that adequately capture angularly discriminative information,
into Adversarial Training helps improve the robustness of DNNs.
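As a rough illustration of how angular information can be exposed to the loss, the following sketch normalizes features and classifier weights onto a hypersphere so that the logits become scaled cosine similarities, and adds a simple angular regularizer. The scale s and the specific regularizer are illustrative assumptions, not the exact formulation developed in this thesis.

# Hedged sketch of a hypersphere-embedding classification head: features and class
# weights are L2-normalized so the logits are (scaled) cosine similarities, making
# the loss depend only on angular information. The scale `s` and the regularizer
# below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphereHead(nn.Module):
    def __init__(self, feat_dim, num_classes, s=15.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))
        self.s = s

    def forward(self, feats):
        # cosine similarity between normalized features and normalized class weights
        cos = F.linear(F.normalize(feats, dim=1), F.normalize(self.weight, dim=1))
        return self.s * cos

def angular_regularizer(logits, y, s=15.0):
    # Encourage a larger angular margin: penalize a small cosine to the true class.
    cos_true = logits.gather(1, y.unsqueeze(1)).squeeze(1) / s
    return (1.0 - cos_true).mean()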
Despite the effectiveness of AT methods in improving the adversarial robustness of DNNs,
recent works have shown that the robustness obtained by AT methods exhibits significant
inter-class variance. Motivated by this observation, we design an instance-wise re-weighting
method that uniquely assigns importance to the losses of individual adversarial training examples.
Our approach achieves the dual purpose of reducing the inter-class variance inherent in robust
models obtained through adversarial training and improving overall robustness.
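The sketch below illustrates one plausible form of instance-wise loss re-weighting: per-example adversarial losses are scaled by weights derived from the model's confidence on each example. The particular weighting function is an illustrative assumption; the thesis develops its own weighting scheme.

# Hedged sketch of instance-wise loss re-weighting in adversarial training:
# per-example adversarial losses are scaled by weights derived from how poorly
# the model handles each example (here, one minus the true-class probability).
# The specific weighting function is an illustrative assumption.
import torch
import torch.nn.functional as F

def reweighted_adv_loss(model, x_adv, y):
    logits = model(x_adv)
    per_example = F.cross_entropy(logits, y, reduction="none")
    with torch.no_grad():
        p_true = F.softmax(logits, dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
        weights = 1.0 - p_true                          # up-weight vulnerable examples
        weights = weights / (weights.mean() + 1e-8)     # keep the overall loss scale stable
    return (weights * per_example).mean()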
We further question the use of a uniform perturbation budget for crafting adversarial training
examples. We theoretically show that, under a uniform perturbation budget, different adversarial
training examples incur varying increases in loss. Based on this result, we argue that AT
methods may generally benefit from customizing adversarial perturbations to individual
training examples, and we propose two instance-wise methods for assigning perturbation
budgets to each training example.
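The following sketch illustrates the idea of per-example perturbation budgets: rather than a single uniform eps, each example receives its own budget, which a PGD routine then broadcasts across the image dimensions. The assignment rule shown (scaling by clean-input confidence) is an illustrative assumption, not either of the thesis' proposed methods.

# Hedged sketch of per-example perturbation budgets for adversarial training.
# How budgets are assigned is an illustrative assumption; the PGD routine mirrors
# the earlier sketch but broadcasts a per-example eps.
import torch
import torch.nn.functional as F

def per_example_budgets(model, x, y, base_eps=8/255, lo=0.5, hi=1.5):
    with torch.no_grad():
        p_true = F.softmax(model(x), dim=1).gather(1, y.unsqueeze(1)).squeeze(1)
    scale = lo + (hi - lo) * p_true                  # confident examples get larger budgets
    return (base_eps * scale).view(-1, 1, 1, 1)      # broadcast over channels and pixels

def pgd_with_budgets(model, x, y, eps, alpha=2/255, steps=10):
    noise = (torch.rand_like(x) * 2 - 1) * eps       # random start within each example's ball
    x_adv = (x + noise).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()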
Finally, drawing inspiration from the traditional standard deviation statistic, we propose a
measure that may be utilized in any supervised machine learning setting. We show that the
proposed standard-deviation-inspired measure, when used as a regularization term, effectively
improves the robustness of AT methods to strong white-box attacks, in addition to improving the
generalization of AT methods to adversarial attacks not seen during training. Furthermore, we
show that the proposed measure may be used as a metric for generating adversarial attacks on
DNN models.
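As a rough illustration, the sketch below computes a standard-deviation-based measure over the model's per-class probabilities and adds it alongside the adversarial cross-entropy loss. The exact definition, sign, and weighting (lam) of the measure in the thesis may differ; this formulation is an assumption.

# Hedged sketch of a standard-deviation-inspired regularization term: for each
# example, the standard deviation of the softmax probabilities across classes is
# combined with the adversarial cross-entropy loss. The sign and weight (lam) are
# illustrative assumptions, not the thesis' exact objective.
import torch
import torch.nn.functional as F

def sdi_measure(logits):
    """Per-example standard deviation of the predicted class probabilities."""
    probs = F.softmax(logits, dim=1)
    return probs.std(dim=1)

def sdi_regularized_loss(model, x_adv, y, lam=1.0):
    logits = model(x_adv)
    ce = F.cross_entropy(logits, y)
    return ce + lam * sdi_measure(logits).mean()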
Type
dissertation