NATTACK: Improved Black-Box Adversarial Attacks by Learning the Distributions of Adversarial Examples
Boqing Gong, joint work with Yandong Li, Lijun Li, Liqiang Wang, & Tong Zhang
Published in ICML 2019
Intriguing properties of deep neural networks (DNNs)
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. ICLR.
Goodfellow, I. J., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. ICLR.
Projected gradient descent (PGD) attack
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv:1706.06083.
Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial machine learning at scale. arXiv:1611.01236.
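For concreteness, a minimal sketch of the $\ell_\infty$ PGD attack in PyTorch-style Python; the budget `eps`, step size `alpha`, and step count below are illustrative defaults, not the talk's exact settings:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: repeat signed-gradient ascent on the loss,
    projecting back into the eps-ball around the clean input x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # ascend the loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project into the eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)              # stay a valid image
    return x_adv.detach()
```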
Intriguing results (1)
~100% attack success rates on CIFAR10 & ImageNet
[Figure: a clean image vs. its adversarial counterpart]
Intriguing results (2)
Adversarial examples generalize between different DNNs
E.g., adversarial examples crafted against AlexNet also fool InceptionV3
Intriguing results (3)
A universal adversarial perturbation
Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.
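Stated compactly (a paraphrase of Moosavi-Dezfooli et al.'s formulation), a universal perturbation is a single vector $v$ that fools the classifier $\hat{k}$ on most inputs drawn from the data distribution $\mu$:

$$
\text{find } v \;\; \text{s.t.} \;\; \|v\|_p \le \xi \quad \text{and} \quad \Pr_{x \sim \mu}\!\big[\hat{k}(x + v) \neq \hat{k}(x)\big] \ge 1 - \delta,
$$

where $\xi$ bounds the perturbation norm and $1 - \delta$ is the desired fooling rate.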
In a nutshell, white-box adversarial attacks can:
- Fool different DNNs for almost all test examples
  → Most data points lie near the classification boundaries.
- Fool different DNNs with the same adversarial examples
  → The classification boundaries of various DNNs are close to one another.
- Fool different DNNs with a single universal perturbation
  → We can turn most examples into adversarial ones by moving them along the same direction by the same amount.
However, white-box adversarial attacks do not apply to most real-world scenarios. They fail when:
- the network architecture is unknown,
- the weights are unknown, or
- querying the network is prohibitive (e.g., due to cost).
Black-box attacks
[Figure: a panda image queried against a black-box DNN, which returns only output scores (Panda: 0.88493, Indri: 0.00878, Red Panda: 0.00317)]
Substitute attack (Papernot et al., 2017)
Decision-based (Brendel et al., 2017)
Boundary-tracing (Cheng et al., 2019)
Zeroth-order optimization (Chen et al., 2017)
Natural evolution strategies (Ilyas et al., 2018)
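For intuition on the zeroth-order entry above: when only output scores are observable, partial derivatives can be approximated coordinate-wise by symmetric finite differences, roughly as in Chen et al. (2017):

$$
\frac{\partial f(x)}{\partial x_i} \;\approx\; \frac{f(x + h\,e_i) - f(x - h\,e_i)}{2h},
$$

where $e_i$ is the $i$-th standard basis vector and $h$ a small constant. Each coordinate costs two queries, so a full gradient at image resolution takes hundreds of thousands of queries; this is the curse of dimensionality mentioned below.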
Problems with existing black-box attacks: bad local optima, non-smooth optimization, the curse of dimensionality, defense-specific gradient estimation, etc.
They all search for a single adversarial perturbation (for an input).
Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. ICML.
Our work
Learns the distribution of adversarial examples (for an input): a sample from the distribution fools the DNN with high probability.
- Reduces the “attack dimension” → fewer queries into the network.
- Smooths the optimization → higher attack success rates.
- Characterizes the risk of the input example → new defense methods.
Which family of distributions?
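In NATTACK the answer is, roughly, a Gaussian pushed through a deterministic map: sample a lower-dimensional $z \sim \mathcal{N}(\mu, \sigma^2 I)$ and deterministically transform it into a valid adversarial candidate. A hedged sketch of that parametrization (in the paper, the map $g_0$ also involves bilinear upscaling to the image resolution, and only $\mu$ is learned per input):

$$
z \sim \mathcal{N}(\mu, \sigma^2 I), \qquad x' = \operatorname{proj}_{S}\!\Big(\tfrac{1}{2}\big(\tanh(g_0(z)) + 1\big)\Big),
$$

where $\operatorname{proj}_S$ projects onto the allowed set $S$ (the $\ell_p$-ball of radius $\epsilon$ around the clean input, intersected with the valid pixel range).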
Natural evolution strategies (NES)
Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2014). Natural evolution strategies. JMLR.
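NES needs no gradients from the network: it estimates the gradient of the expected attack loss w.r.t. the distribution parameters from loss values alone,

$$
\nabla_\mu\, \mathbb{E}_{z \sim \mathcal{N}(\mu, \sigma^2 I)}\big[f(z)\big] \;=\; \tfrac{1}{\sigma}\, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}\big[f(\mu + \sigma\epsilon)\,\epsilon\big].
$$

A minimal NumPy sketch of one such update; the helper `attack_loss`, sample size `n`, and learning rate are illustrative placeholders, not the paper's exact settings:

```python
import numpy as np

def nes_step(mu, sigma, attack_loss, n=50, lr=0.02):
    """One NES update of the Gaussian mean `mu`. `attack_loss(z)` queries the
    black-box model on the candidate built from z and returns a scalar loss
    that is low when the candidate is adversarial."""
    eps = np.random.randn(n, *mu.shape)                        # N(0, I) samples
    losses = np.array([attack_loss(mu + sigma * e) for e in eps])
    losses = (losses - losses.mean()) / (losses.std() + 1e-8)  # variance reduction
    grad = (losses.reshape(n, *([1] * mu.ndim)) * eps).mean(axis=0) / sigma
    return mu - lr * grad                                      # descend expected loss
```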
The resulting attack is fully black-box: it needs only the network's output scores.
Experiment setup
Attack 13 defended DNNs & 2 vanilla DNNs
Consider both $\ell_\infty$ and $\ell_2$ perturbation constraints
Examine all test examples of CIFAR10 & 1000 of ImageNet
Excluding those misclassified by the targeted DNN
Evaluate by attack success rates
Attack success rates, ImageNet
Attack success rates, CIFAR10
Attack success rate vs. optimization steps
Transferability of the adversarial examples
A universally effective defense technique?
Adversarial training / defensive learning:

$$
\min_{\theta}\; \mathbb{E}_{(x,y)} \Big[ \max_{\|\delta\|_\infty \le \epsilon} L(\theta,\, x + \delta,\, y) \Big]
$$

The inner maximization is carried out by the PGD attack; the outer minimization is over the DNN weights $\theta$.
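A hedged sketch of the resulting training loop in PyTorch-style Python, reusing the `pgd_attack` sketch from earlier; the `model`, `loader`, and `optimizer` arguments are assumed standard PyTorch objects:

```python
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer):
    """One epoch of PGD adversarial training (Madry et al., 2017):
    the inner max is approximated by pgd_attack (sketched earlier),
    the outer min updates the DNN weights."""
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)   # inner maximization over perturbations
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()                   # outer minimization over weights
        optimizer.step()
```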
In a nutshell, NATTACK:
- Is a powerful black-box attack, matching or exceeding white-box attacks
- Is universal: it broke a variety of defenses with one and the same algorithm
- Characterizes the distributions of adversarial examples
- Reduces the “attack dimension”
- Speeds up defensive learning (ongoing work)
Physical adversarial attack
Boqing Gong, joint work with Yang Zhang, Hassan Foroosh, & Philip David
Published in ICLR 2019
Recall the following result
A universal adversarial perturbation
Moosavi-Dezfooli, S. M., Fawzi, A., Fawzi, O., & Frossard, P. (2017). Universal adversarial perturbations. CVPR.
Physical attack: universal perturbation → 2D mask
Eykholt, K., Evtimov, I., Fernandes, E., Li, B., Rahmati, A., Xiao, C., Prakash, A., Kohno, T., & Song, D. (2018). Robust physical-world attacks on deep learning models. CVPR.
Physical attack: 2D mask → 3D camouflage
Gradient descent w.r.t. the camouflage c, in order to minimize the detection scores for the vehicle under all feasible locations.
Problem: this objective is non-differentiable in c.
Physical attack: 2D mask → 3D camouflage
Repeat until done:
1. Camouflage a vehicle
2. Drive it around and take many pictures of it
3. Detect it by Faster-RCNN & save the detection scores
→ Dataset: {(camouflage, vehicle, background, detection score)}
Physical attack: 2D mask → 3D camouflage
Fit a DNN to predict the detection scores corresponding to any camouflage.
Physical attack: 2D mask → 3D camouflage
Gradient descent w.r.t. the camouflage c, in order to minimize the detection scores for the vehicle under all feasible locations.
The objective is non-differentiable, but it is now approximated by the fitted DNN, which is differentiable.
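A toy, self-contained sketch of this alternating scheme in PyTorch; the quadratic `render_and_detect` below merely stands in for the real simulate-photograph-detect pipeline, and all names, sizes, and learning rates are illustrative assumptions rather than the paper's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

PATTERN_DIM = 16   # illustrative camouflage dimensionality

# Stand-in for the NON-differentiable black box (paint the vehicle, photograph
# it, run Faster-RCNN, return the vehicle's detection score). Faked here with
# a fixed random quadratic so the sketch runs end to end.
_A = torch.randn(PATTERN_DIM, PATTERN_DIM)
def render_and_detect(camo):
    return torch.sigmoid(camo @ _A @ camo / PATTERN_DIM).item()

# The differentiable surrogate that learns to imitate the black box.
clone = nn.Sequential(nn.Linear(PATTERN_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
clone_opt = torch.optim.Adam(clone.parameters(), lr=1e-3)

camo = torch.rand(PATTERN_DIM, requires_grad=True)   # camouflage to optimize
camo_opt = torch.optim.Adam([camo], lr=1e-2)

for step in range(200):
    # 1) Query the black box near the current camouflage, growing the
    #    {(camouflage, detection score)} dataset.
    with torch.no_grad():
        sample = camo + 0.05 * torch.randn_like(camo)
    target = torch.tensor(render_and_detect(sample))
    # 2) Refit the surrogate so it imitates the black box near `camo`.
    clone_opt.zero_grad()
    F.mse_loss(clone(sample).squeeze(), target).backward()
    clone_opt.step()
    # 3) Descend the surrogate's prediction, which IS differentiable in camo.
    camo_opt.zero_grad()
    clone(camo).squeeze().backward()
    camo_opt.step()
```

In the real system the random quadratic is replaced by the simulator plus Faster-RCNN, and the same alternation between data collection, surrogate fitting, and camouflage updates applies.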
Why do we care?
Observation, re-observation, & future work
- Defended DNNs are still vulnerable to transfer attacks (though only to a moderate degree)
- Adversarial examples from black-box attacks are less transferable than those from white-box attacks
- All future work on defenses will adopt adversarial training
- Adversarial training will become faster (we are working on it)
- We should certify DNNs' expected robustness
New works to watch
- Stateful DNNs: Goodfellow (2019). A Research Agenda: Dynamic Models to Defend Against Correlated Attacks. arXiv:1903.06293.
- Explaining adversarial examples: Ilyas et al. (2019). Adversarial Examples Are Not Bugs, They Are Features. arXiv:1905.02175.
- Faster adversarial training: Zhang et al. (2019). You Only Propagate Once: Painless Adversarial Training Using Maximal Principle. arXiv:1905.00877; Shafahi et al. (2019). Adversarial Training for Free! arXiv:1904.12843.
- Certifying DNNs' expected robustness: Webb et al. (2019). A Statistical Approach to Assessing Neural Network Robustness. ICLR; Cohen et al. (2019). Certified Adversarial Robustness via Randomized Smoothing. arXiv:1902.02918.