Ian Goodfellow
SECURITY AND PRIVACY OF MACHINE LEARNING
Staff Research Scientist, Google Brain, @goodfellow_ian
Machine Learning and Security
(Figure: neural network schematic with inputs x1, x2, hidden units h1, h2, weights W and w, output y)
Machine Learning for Security
Malware detection, intrusion detection, …
Security against Machine Learning
Password guessing, fake reviews, …
Security of Machine Learning
(Figure: neural network schematic)
An overview of a field
This presentation summarizes the work of many people, not just my own or my collaborators'.
Download the slides for links to extensive references.
The presentation focuses on the concepts, not the history or the inventors
Machine Learning Pipeline
(Diagram: training data X → learning algorithm → learned parameters θ; the learned parameters map a test input x to a test output y)
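A minimal sketch of this pipeline (scikit-learn, the toy dataset, and the model choice are illustrative assumptions, not part of the talk):

# Pipeline sketch: training data X -> learning algorithm -> learned
# parameters theta; theta applied to test inputs x -> test outputs y.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                         # training data X
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)   # training labels

model = LogisticRegression().fit(X_train, y_train)          # learning algorithm -> theta
x_test = rng.normal(size=(5, 2))                            # test inputs x
print(model.predict(x_test))                                # test outputs y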
Privacy of Training Data
(Diagram: an attacker inspects the learned parameters θ to recover information about the training data X)
Defining (ε, δ)-Differential Privacy
(Abadi 2017)
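The definition itself did not survive extraction; for reference, the standard statement is: a randomized mechanism M satisfies (ε, δ)-differential privacy if, for every pair of datasets D and D′ differing in a single training example and every set S of possible outputs,

    Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ

so ε bounds how much any one example can shift the output distribution, and δ is the probability mass allowed to escape that bound.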
Private Aggregation of Teacher Ensembles
(Papernot et al 2016)
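A rough sketch of PATE's core mechanism, noisy aggregation of teacher votes (the partitioning, teacher model, and Laplace scale below are illustrative; the full method also trains a student model on the resulting labels):

# PATE sketch: train teachers on disjoint partitions of the sensitive data,
# then label public inputs by a Laplace-noised vote over the teachers.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 2))
y = (X[:, 0] > 0).astype(int)

n_teachers, n_classes = 10, 2
parts = np.array_split(np.arange(len(X)), n_teachers)             # disjoint partitions
teachers = [LogisticRegression().fit(X[p], y[p]) for p in parts]

def noisy_label(x, gamma=0.1):
    # Count teacher votes for one public input and add Laplace noise.
    votes = np.bincount([t.predict(x[None])[0] for t in teachers],
                        minlength=n_classes).astype(float)
    votes += rng.laplace(scale=1.0 / gamma, size=n_classes)
    return int(np.argmax(votes))

x_public = rng.normal(size=2)
print(noisy_label(x_public))                                      # label for the student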
Training Set Poisoning
(Diagram: the attacker modifies the training data X, and thereby the learned parameters θ and the outputs y produced for test inputs x)
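For illustration only, a simple label-flipping attack (a far cruder poisoning strategy than the influence-function attack cited on the next slide; data and model are toy choices):

# Poisoning sketch: flip a fraction of training labels and compare the
# resulting model against one trained on clean data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_test = rng.normal(size=(500, 2))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)

clean = LogisticRegression().fit(X, y)

y_poisoned = y.copy()
flip = rng.choice(len(y), size=len(y) // 5, replace=False)   # poison 20% of labels
y_poisoned[flip] = 1 - y_poisoned[flip]
poisoned = LogisticRegression().fit(X, y_poisoned)

print("clean model accuracy:   ", clean.score(X_test, y_test))
print("poisoned model accuracy:", poisoned.score(X_test, y_test))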
ImageNet Poisoning
(Koh and Liang 2017)
Adversarial Examples
(Diagram: the training data X and learned parameters θ are untouched; the attacker perturbs the test input x to control the output y)
Model Theft
(Diagram: by querying test inputs x and observing outputs y, the attacker reconstructs a copy of the learned parameters θ)
Model Theft++
(Diagram: after reconstructing the model from query access, the attacker can go further and recover information about the training data X)
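A minimal sketch of the theft step (the target here is a local stand-in for a black-box prediction API; all model and data choices are illustrative):

# Model theft sketch: query the target on attacker-chosen inputs, then fit
# a substitute model on the labels the prediction API returns.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
target = RandomForestClassifier(random_state=0).fit(X, y)   # victim model behind the API

X_query = rng.normal(size=(2000, 2))                        # attacker-chosen queries
y_query = target.predict(X_query)                           # labels returned by the API

substitute = LogisticRegression().fit(X_query, y_query)     # stolen approximation
X_check = rng.normal(size=(1000, 2))
print("agreement with target:",
      (substitute.predict(X_check) == target.predict(X_check)).mean())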
Deep Dive on Adversarial Examples
Since 2013, deep neural networks have matched human performance at...
...recognizing objects and faces... (Szegedy et al, 2014) (Taigman et al, 2013)
...solving CAPTCHAs and reading addresses... (Goodfellow et al, 2013)
...and other tasks...
Adversarial Examples
Turning objects into airplanes
Attacking a linear model
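The slide's figure did not survive extraction; the usual linear-model argument is that for a score w·x, the worst-case perturbation within an ℓ∞ ball of radius ε is η = ε·sign(w), which changes the score by

    w·(x + η) − w·x = w·η = ε·‖w‖₁

so even a tiny per-feature change ε produces a change in the output that grows with the number of input dimensions.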
Wrong almost everywhere
Cross-model, cross-dataset transfer
Transfer across learning algorithms
(Papernot 2016)
Transfer attack
Target model with unknown weights, machine learning algorithm, and training set; maybe non-differentiable
Train your own substitute model mimicking the target, with a known, differentiable function
Craft adversarial examples against the substitute
Deploy the adversarial examples against the target; the transferability property results in them succeeding
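A compact sketch of the whole procedure (the models, data, and FGSM step size are illustrative stand-ins for the image classifiers discussed in the talk):

# Transfer attack sketch: train a differentiable substitute on the target's
# labels, craft FGSM adversarial examples against the substitute, and then
# deploy them against the (non-differentiable) target.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X.sum(axis=1) > 0).astype(int)
target = RandomForestClassifier(random_state=0).fit(X, y)      # unknown, non-differentiable

X_q = rng.normal(size=(2000, 10))                              # attacker queries
substitute = LogisticRegression().fit(X_q, target.predict(X_q))

# FGSM against the substitute: for logistic regression the input gradient of
# the cross-entropy loss is (sigmoid(w.x + b) - y) * w.
X_test = rng.normal(size=(500, 10))
y_test = (X_test.sum(axis=1) > 0).astype(int)
w, b = substitute.coef_[0], substitute.intercept_[0]
p = 1.0 / (1.0 + np.exp(-(X_test @ w + b)))
X_adv = X_test + 0.5 * np.sign((p - y_test)[:, None] * w[None, :])   # eps = 0.5, illustrative

print("target accuracy on clean inputs:       ", target.score(X_test, y_test))
print("target accuracy on transferred attacks:", target.score(X_adv, y_test))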
Enhancing Transfer with Ensembles
(Liu et al, 2016)
Transfer to the Human Brain
(Elsayed et al, 2018)
Transfer to the Physical World
(Kurakin et al, 2016)
Adversarial Training
(Plot: test misclassification rate, log scale from 10^-2 to 10^0, versus training time in epochs from 0 to 300, for Train=Clean/Test=Clean, Train=Clean/Test=Adv, Train=Adv/Test=Clean, and Train=Adv/Test=Adv)
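A minimal adversarial-training loop (logistic regression plus FGSM on a toy problem; ε, the learning rate, and the data are illustrative, and the plot above comes from a neural network, not from this sketch):

# Adversarial training sketch: at each step, craft FGSM examples against the
# current parameters and take the gradient step on those examples instead of
# the clean ones.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X.sum(axis=1) > 0).astype(float)
w, b, eps, lr = np.zeros(10), 0.0, 0.3, 0.1

def grads(Xb, yb, w, b):
    # Cross-entropy gradients w.r.t. the inputs (for FGSM) and the parameters.
    p = 1.0 / (1.0 + np.exp(-(Xb @ w + b)))
    gx = (p - yb)[:, None] * w[None, :]
    return gx, Xb.T @ (p - yb) / len(yb), (p - yb).mean()

for step in range(500):
    gx, _, _ = grads(X, y, w, b)
    X_adv = X + eps * np.sign(gx)          # FGSM against the current model
    _, gw, gb = grads(X_adv, y, w, b)      # train on the adversarial batch
    w, b = w - lr * gw, b - lr * gb

gx, _, _ = grads(X, y, w, b)
acc_clean = (((X @ w + b) > 0) == y).mean()
acc_adv = ((((X + eps * np.sign(gx)) @ w + b) > 0) == y).mean()
print("clean accuracy:", acc_clean, " adversarial accuracy:", acc_adv)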
Adversarial Training vs Certified Defenses
Adversarial training: train on adversarial examples. This minimizes a lower bound on the true worst-case error. It achieves a high amount of (empirically tested) robustness on small to medium datasets.
Certified defenses: minimize an upper bound on the true worst-case error. Robustness is guaranteed, but the amount of robustness is small. Verification of models that weren't trained to be easy to verify is hard.
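To make the bound language concrete (this framing is mine, not the slide's): both approaches target the robust objective

    min over θ of E[ max over ‖δ‖ ≤ ε of J(θ, x + δ, y) ]

Adversarial training evaluates the inner maximum only at the perturbations an attack actually finds, so it optimizes a lower bound on the worst-case loss; certified defenses replace the inner maximum with a tractable upper bound, so whatever robustness they report is guaranteed.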
Limitations of defenses
Even certified defenses so far assume an unrealistic threat model
Typical model: the attacker can change the input within some norm ball
Real attacks will be stranger and hard to characterize ahead of time (Brown et al., 2017)
Clever Hans
(“Clever Hans, Clever Algorithms,” Bob Sturm)
Get involved!
https://github.com/tensorflow/cleverhans
Apply What You Have Learned
Publishing an ML model or a prediction API? If the training data is sensitive, train with differential privacy (a DP-SGD sketch follows below).
Consider how an attacker could cause damage by fooling your model. Current defenses are not practical, so rely on situations where there is no incentive to cause harm or the amount of potential harm is limited.
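A rough sketch of the core DP-SGD update (per-example gradient clipping plus Gaussian noise; the clip norm, noise multiplier, and model are illustrative, and a real deployment should use a maintained library with a proper privacy accountant to compute the actual (ε, δ) guarantee):

# DP-SGD sketch for one minibatch of a logistic-regression model: clip each
# per-example gradient to L2 norm C, sum, add Gaussian noise, then step.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))              # one minibatch of sensitive data
y = (X[:, 0] > 0).astype(float)
w = np.zeros(5)
clip_norm, noise_multiplier, lr = 1.0, 1.1, 0.5

p = 1.0 / (1.0 + np.exp(-(X @ w)))
per_example_grads = (p - y)[:, None] * X   # cross-entropy gradient per example

norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
clipped = per_example_grads / np.maximum(1.0, norms / clip_norm)   # clip to L2 <= C

noise = rng.normal(scale=noise_multiplier * clip_norm, size=w.shape)
w = w - lr * (clipped.sum(axis=0) + noise) / len(X)
print(w)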