Adversarial Examples and Adversarial Training
Ian Goodfellow, OpenAI Research Scientist Presentation at San Francisco AI Meetup, 2016-08-18
(Goodfellow 2016)
In this presentation
• “Intriguing Properties of Neural Networks” Szegedy et al, 2013
• “Explaining and Harnessing Adversarial Examples” Goodfellow et al 2014
• “Adversarial Perturbations of Deep Neural Networks” Warde-Farley and Goodfellow, 2016
In this presentation
• “Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples” Papernot et al 2016
• “Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples” Papernot et al 2016
• “Adversarial Perturbations Against Deep Neural Networks for Malware Classification” Grosse et al 2016 (not my own work)
In this presentation
• “Distributional Smoothing with Virtual Adversarial Training” Miyato et al 2015 (not my own work)
• “Virtual Adversarial Training for Semi-Supervised Text Classification” Miyato et al 2016
• “Adversarial Examples in the Physical World” Kurakin et al 2016
Overview
• What are adversarial examples?
• Why do they happen?
• How can they be used to compromise machine learning systems?
• What are the defenses?
• How to use adversarial examples to improve machine learning, even when there is no adversary
Adversarial Examples
Timeline:
• “Adversarial Classification” Dalvi et al 2004: fool spam filter
• “Evasion Attacks Against Machine Learning at Test Time” Biggio 2013: fool neural nets
• Szegedy et al 2013: fool ImageNet classifiers imperceptibly
• Goodfellow et al 2014: cheap, closed-form attack
Turning Objects into “Airplanes”
Attacking a Linear Model
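A tiny numerical sketch of why linear models are so easy to attack: with a per-feature (max-norm) budget eps, perturbing every feature by eps · sign(w) shifts the logit w·x by exactly eps · ‖w‖₁, which grows with the input dimension. The weights, inputs, and dimensions below are illustrative assumptions, not values from the talk.

```python
import numpy as np

# A max-norm perturbation of size eps, aligned with sign(w), shifts
# the logit of a linear classifier by eps * ||w||_1. In high
# dimensions that shift dwarfs the per-feature change.
rng = np.random.default_rng(0)
shifts = {}
for n in (10, 1000):
    w = rng.normal(size=n)        # weights of a toy linear classifier
    x = rng.normal(size=n)        # a clean input
    eps = 0.01                    # tiny per-feature budget
    x_adv = x + eps * np.sign(w)  # push every feature toward class 1
    shifts[n] = w @ x_adv - w @ x # equals eps * ||w||_1
print(shifts)
```

Each feature moves by only 0.01, yet the 1000-dimensional logit shift is orders of magnitude larger than the 10-dimensional one.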
Not just for neural nets
• Linear models
• Logistic regression
• Softmax regression
• SVMs
• Decision trees
• Nearest neighbors
Adversarial Examples from Overfitting
[Figure: 2-D plot of x and O training points, illustrating adversarial examples arising from overfitting.]
Adversarial Examples from Excessive Linearity
[Figure: 2-D plot of x and O training points, illustrating adversarial examples arising from excessive linearity.]
Modern deep nets are very piecewise linear
Rectified linear unit
Carefully tuned sigmoid
Maxout
LSTM
Nearly Linear Responses in Practice
Maps of Adversarial and Random Cross-Sections
(collaboration with David Warde-Farley and Nicolas Papernot)
Maps of Adversarial Cross-Sections
Maps of Random Cross-Sections
Adversarial examples are not noise
Clever Hans (“Clever Hans, Clever Algorithms,” Bob Sturm)
Small inter-class distances
Clean example + perturbation = corrupted example
All three perturbations have L2 norm 3.96. This is actually small; we typically use 7!
Perturbation changes the true class
Random perturbation does not change the class
Perturbation changes the input to “rubbish class”
The Fast Gradient Sign Method
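A minimal sketch of the attack: x_adv = x + eps · sign(∇ₓ J(θ, x, y)). It is shown here for logistic regression, where the input gradient of the cross-entropy loss has the closed form (sigmoid(w·x + b) − y) · w; the weights and input are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fgsm(x, w, b, y, eps):
    """Fast Gradient Sign Method for a logistic regression model."""
    grad_x = (sigmoid(w @ x + b) - y) * w  # dJ/dx in closed form
    return x + eps * np.sign(grad_x)

w = np.array([2.0, -3.0, 1.0])  # toy model weights
b = 0.5
x = np.array([0.1, 0.2, -0.1])  # clean input
y = 1.0                         # true label

x_adv = fgsm(x, w, b, y, eps=0.25)
# Every coordinate moves by exactly eps (small in max-norm), yet the
# model's confidence in the true class drops sharply.
print(sigmoid(w @ x + b), sigmoid(w @ x_adv + b))
```

The method is cheap because it needs only one gradient evaluation, which is why the timeline calls it a "cheap, closed-form attack."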
Wrong almost everywhere
Cross-model, cross-dataset generalization
Cross-technique transferability
(Papernot 2016)
Transferability Attack
• Target model with unknown weights, machine learning algorithm, training set; maybe non-differentiable
• Train your own substitute model mimicking the target, with a known, differentiable function
• Craft adversarial examples against the substitute
• Deploy the adversarial examples against the target; the transferability property results in them succeeding
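The attack flow can be sketched end to end on a toy problem. The "target" below is a hidden linear rule standing in for a remote model we can only query; the substitute, data, and epsilon are all illustrative assumptions, not the setup from Papernot et al.

```python
import numpy as np

rng = np.random.default_rng(1)

def target_predict(X):
    """Black-box target: unknown weights, query access only."""
    w_secret = np.array([1.5, -2.0])
    return (X @ w_secret > 0).astype(float)

# 1. Synthesize inputs and label them by querying the target.
X = rng.normal(size=(500, 2))
y = target_predict(X)

# 2. Train a known, differentiable substitute (logistic regression
#    fit by plain gradient descent) to mimic the target.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.1 * (X.T @ (p - y)) / len(X)
    b -= 0.1 * np.mean(p - y)

# 3. Craft adversarial examples with FGSM against the substitute.
p = 1 / (1 + np.exp(-(X @ w + b)))
X_adv = X + 1.0 * np.sign((p - y)[:, None] * w)

# 4. Deploy against the target: transferability makes many succeed
#    even though the attacker never saw the target's weights.
transfer_rate = np.mean(target_predict(X_adv) != y)
print(transfer_rate)
```

The attacker never differentiates through the target; the substitute's gradients are good enough because both models learn similar decision boundaries.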
Adversarial Examples in the Human Brain
(Pinna and Gregory, 2002)
These are concentric circles, not intertwined spirals.
Practical Attacks
• Fool real classifiers trained by remotely hosted APIs (MetaMind, Amazon, Google)
• Fool malware detector networks
• Display adversarial examples in the physical world and fool machine learning systems that perceive them through a camera
Adversarial Examples in the Physical World
Failed defenses
Weight decay
Adding noise at test time
Adding noise at train time
Dropout
Ensembles
Multiple glimpses
Generative pretraining
Removing perturbation with an autoencoder
Error correcting codes
Confidence-reducing perturbation at test time
Various non-linear units
Double backprop
Training on Adversarial Examples
Adversarial Training
Labeled as bird
Decrease probability of bird class
Still has same label (bird)
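The slide's idea, mix the loss on the clean labeled example with the loss on its FGSM perturbation (which keeps the same label), can be sketched for a toy logistic regression model. The data, model, and hyperparameters are illustrative assumptions; as in standard practice, x_adv is treated as a constant when differentiating.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # "bird" vs. "not bird"

w, b = np.zeros(2), 0.0
eps, alpha, lr = 0.1, 0.5, 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    # FGSM examples against the current model; the label stays the same.
    X_adv = X + eps * np.sign((p - y)[:, None] * w)
    p_adv = 1 / (1 + np.exp(-(X_adv @ w + b)))
    # Gradient of alpha * J(x, y) + (1 - alpha) * J(x_adv, y).
    gw = (alpha * X.T @ (p - y) + (1 - alpha) * X_adv.T @ (p_adv - y)) / len(X)
    gb = np.mean(alpha * (p - y) + (1 - alpha) * (p_adv - y))
    w, b = w - lr * gw, b - lr * gb

acc = np.mean(((X @ w + b) > 0) == (y == 1))
print(acc)
```

Because the perturbed examples keep their labels, the model is explicitly penalized for decreasing the probability of the bird class on them.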
Virtual Adversarial Training
Unlabeled; model guesses it’s probably a bird, maybe a plane
Adversarial perturbation intended to change the guess
New guess should match old guess (probably bird, maybe plane)
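The virtual adversarial penalty needs no labels: the model's own prediction on x is the "old guess," and training penalizes how much a small adversarial perturbation can change it. This is a hedged sketch for a toy binary linear model, where the most guess-changing direction is simply sign(w); real VAT (Miyato et al 2015) estimates that direction with power iteration, and everything here is an illustrative assumption.

```python
import numpy as np

def predict(w, b, X):
    return 1 / (1 + np.exp(-(X @ w + b)))

def vat_penalty(w, b, X, eps=0.1):
    p_old = predict(w, b, X)          # old guess; no labels used
    r_adv = eps * np.sign(w)          # guess-changing perturbation
    p_new = predict(w, b, X + r_adv)  # new guess
    p_old = np.clip(p_old, 1e-7, 1 - 1e-7)
    p_new = np.clip(p_new, 1e-7, 1 - 1e-7)
    # KL(old || new): train so the new guess matches the old guess.
    return np.mean(p_old * np.log(p_old / p_new)
                   + (1 - p_old) * np.log((1 - p_old) / (1 - p_new)))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))          # unlabeled data
w, b = np.array([1.0, -1.0]), 0.0
print(vat_penalty(w, b, X))           # positive; exactly 0 when eps = 0
```

Adding this penalty to the supervised loss turns unlabeled data into a smoothness regularizer, which is what enables the semi-supervised results on the next slide.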
Text Classification with VAT
[Bar chart, zoomed in for legibility: RCV1 misclassification rate (%) for earlier SOTA, SOTA, our baseline, adversarial, virtual adversarial, both, and both + bidirectional model; values range from 7.70 down to 6.68.]
Conclusion
• Attacking is easy
• Defending is difficult
• Benchmarking vulnerability is tricky
• Adversarial training provides regularization and semi-supervised learning