Defending Against Adversarial Examples (transcript)

Ian Goodfellow, Staff Research Scientist, Google Brain
NIPS 2017 Workshop on Machine Learning and Security
(Goodfellow 2017)
Adversarial Examples
Timeline:
• “Adversarial Classification”, Dalvi et al., 2004: fool a spam filter
• “Evasion Attacks Against Machine Learning at Test Time”, Biggio et al., 2013: fool neural nets
• Szegedy et al., 2013: fool ImageNet classifiers imperceptibly
• Goodfellow et al., 2014: cheap, closed-form attack
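The “cheap, closed form attack” from 2014 is the fast gradient sign method (FGSM): take one step of size epsilon along the sign of the loss gradient. A minimal sketch for a toy logistic-regression model (the weights, input, and epsilon below are illustrative, not from the talk):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, w, y, eps=0.1):
    # Fast gradient sign method for a logistic model p = sigmoid(w @ x):
    # one closed-form step along the sign of the cross-entropy gradient.
    p = sigmoid(w @ x)
    grad = (p - y) * w          # d(loss)/dx for this model
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

w = np.array([2.0, -3.0, 1.0])   # hypothetical trained weights
x = np.array([0.5, 0.2, 0.8])    # classified as class 1 by this model
x_adv = fgsm(x, w, y=1.0, eps=0.2)
```

A single sign step is what makes the attack “cheap”: one gradient evaluation, no iterative optimization.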
Cross-model, cross-dataset generalization
Cross-technique transferability
(Papernot 2016)
Enhancing Transfer With Ensembles
(Liu et al, 2016)
Transferability Attack
• Target model with unknown weights, machine learning algorithm, and training set; maybe non-differentiable.
• Train your own substitute model mimicking the target, with a known, differentiable function.
• Craft adversarial examples against the substitute.
• Deploy the adversarial examples against the target; the transferability property results in them succeeding. (Szegedy 2013, Papernot 2016)
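The substitute-model pipeline can be sketched end to end on a toy problem; the hidden linear target, data sizes, and step size below are all illustrative assumptions, not details from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

# Black-box target: we can only query its labels (here, a hidden linear rule).
w_target = np.array([1.5, -2.0])
def target_label(X):
    return (X @ w_target > 0).astype(float)

# Step 1: train a known, differentiable substitute on the target's labels.
X = rng.normal(size=(500, 2))
y = target_label(X)
w_sub = np.zeros(2)
for _ in range(200):                       # plain logistic-regression gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w_sub)))
    w_sub -= 0.1 * X.T @ (p - y) / len(X)

# Step 2: craft an adversarial example against the substitute (FGSM).
x = np.array([1.0, -1.0])                  # the target labels this point 1
p = 1.0 / (1.0 + np.exp(-(x @ w_sub)))
grad = (p - 1.0) * w_sub                   # gradient of the loss for label 1 w.r.t. x
x_adv = x + 1.5 * np.sign(grad)

# Step 3: deploy against the target; the perturbation transfers and flips its label.
flipped = target_label(x_adv[None])[0] != target_label(x[None])[0]
```

The attacker never needs the target's weights or gradients, only its label outputs, which is why transferability makes black-box attacks practical.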
Thermometer Encoding: One Hot Way to Resist Adversarial Examples
Jacob Buckman*
Aurko Roy* Colin Raffel Ian Goodfellow
*joint first author
Linear Extrapolation
[Plot: a model's output keeps growing linearly far outside the range of the training data, creating vulnerabilities]
Neural nets are “too linear”
[Plot: argument to the softmax vs. perturbation along an adversarial direction]
Plot from “Explaining and Harnessing Adversarial Examples”, Goodfellow et al., 2014
Difficult to train extremely nonlinear hidden layers
To train: changing this weight needs to have a large, predictable effect
To defend: changing this input needs to have a small or unpredictable effect
Idea: edit only the input layer
DEFENSE
Train only this part
Observation: PixelRNN shows one-hot codes work
Plot from “Pixel Recurrent Neural Networks”, van den Oord et al, 2016
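Thermometer encoding replaces each real-valued pixel with a cumulative discrete code, so a small input change either leaves the code untouched or flips a whole bit. A minimal sketch of one common variant (the threshold spacing and level count here are illustrative):

```python
import numpy as np

def thermometer_encode(x, levels=4):
    # x: intensities in [0, 1]; output gains a trailing axis of size `levels`.
    # Entry i is 1.0 iff x >= i / levels, so 0.3 at 4 levels -> [1, 1, 0, 0]:
    # a cumulative ("thermometer") code rather than a plain one-hot.
    thresholds = np.arange(levels) / levels
    return (x[..., None] >= thresholds).astype(np.float32)

pixels = np.array([0.0, 0.3, 1.0])
codes = thermometer_encode(pixels, levels=4)
```

Because the encoding is a step function, the input layer is extremely nonlinear and non-differentiable without having to be trained, which is exactly the property the preceding slides argue is hard to obtain through training alone.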
Fast Improvement Early in Learning
Large improvements on SVHN direct (“white box”) attacks
5 years ago, this would have been SOTA on clean data
Large Improvements against CIFAR-10 direct (“white box”) attacks
6 years ago, this would have been SOTA on clean data
Other results
• Improvement on CIFAR-100
• (Still very broken)
• Improvement on MNIST
• Please quit caring about MNIST
Caveats
• Slight drop in accuracy on clean examples
• Only small improvement on black-box transfer-based adversarial examples
Ensemble Adversarial Training
Florian Tramèr
Alexey Kurakin
Nicolas Papernot
Ian Goodfellow
Dan Boneh Patrick McDaniel
Estimating the Subspace Dimensionality
(Tramèr et al, 2017)
Transfer Attacks Against Inception ResNet v2 on ImageNet
Competition
Best defense so far on ImageNet: ensemble adversarial training. Used as at least part of all top-10 entries in dev round 3.
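Ensemble adversarial training augments each training batch with adversarial examples crafted against fixed, pre-trained models rather than the model currently being trained. A toy sketch with logistic-regression “models” (the ensemble construction, epsilon, and learning rate are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(1)

def fgsm_linear(X, w, y, eps):
    # FGSM for a logistic model with weights w: sign step along the loss gradient.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X + eps * np.sign((p - y)[:, None] * w)

# Toy binary task.
X = rng.normal(size=(400, 3))
w_true = np.array([1.0, -1.0, 0.5])
y = (X @ w_true > 0).astype(float)

# Static, pre-trained "ensemble" (here: noisy copies of the true linear rule).
ensemble = [w_true + rng.normal(scale=0.3, size=3) for _ in range(3)]

w = np.zeros(3)
for step in range(300):
    src = ensemble[step % len(ensemble)]   # adversary model != model being trained
    X_adv = fgsm_linear(X, src, y, eps=0.2)
    X_batch = np.vstack([X, X_adv])        # mix clean and adversarial examples
    y_batch = np.concatenate([y, y])
    p = 1.0 / (1.0 + np.exp(-(X_batch @ w)))
    w -= 0.1 * X_batch.T @ (p - y_batch) / len(X_batch)

clean_acc = np.mean((X @ w > 0).astype(float) == y)
```

Because the adversarial examples come from models other than the one being trained, the defense targets the black-box transfer setting rather than overfitting to the model's own gradients.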
Get involved! https://github.com/tensorflow/cleverhans