CS573 Data Privacy and Security: Adversarial Machine Learning (Li Xiong)


Page 1:

CS573 Data Privacy and Security

Adversarial Machine Learning

Li Xiong

Page 2:

Machine Learning Under Adversarial Settings

• Data privacy/confidentiality attacks

• membership attacks, model inversion attacks

• Model integrity attacks

• Training time: data poisoning attacks

• Inference time: evasion attacks and adversarial examples

Page 3:

Adversarial Machine Learning

• Inference time: evasion attacks and adversarial examples

• Background

• Attacks

• Defenses

• Training time: data poisoning attacks

• Attacks

• Defenses

• Crowdsourcing applications

Page 4:

Adversarial Attacks and Defenses Competition

NIPS 2017

Fereshteh Razmi

Spring 2018

Page 5:

Yevgeniy (Eugene) Vorobeychik¹

Bo Li²

ADVERSARIAL MACHINE LEARNING (TUTORIAL)

¹ Assistant Professor, Computer Science & Biomedical Informatics; Director, Computational Economics Research Laboratory, Vanderbilt University

² Postdoctoral Research Associate, UC Berkeley


Page 7:

Adversarial ML applications

● Machine learning for adversarial applications

○ Fraud detection

○ Malware detection

○ Intrusion detection

○ Spam detection

● What do all of these have in common?

○ Detect bad “things” (actors, actions, objects)

Page 8:

Bad actors

● Bad actors (who do bad things) have objectives

○ the main one is not getting detected

○ they can change their behavior to avoid detection

● This gives rise to evasion attacks

○ Attacks on ML, where malicious objects are deliberately transformed to evade detection

(prediction by ML that these are malicious)

Page 9:

EVASION ATTACKS

• An adversary who previously chose an instance x (which would be classified as malicious) now chooses another instance x' which is classified as benign

(Figure: classifier decision boundary separating the benign and malicious regions)

Page 10:

EXAMPLE OF EVASION

Email: "From: [email protected]  Cheap mortgage now!!!"

Feature weights: cheap = 1.0, mortgage = 1.5

Total score = 1.0 + 1.5 = 2.5 > 1.0 (threshold) → Spam

Page 11:

EXAMPLE OF EVASION

Email: "From: [email protected]  Cheap mortgage now!!! Joy Oregon"

Feature weights: cheap = 1.0, mortgage = 1.5, Joy = -1.0, Oregon = -1.0

Total score = 1.0 + 1.5 - 1.0 - 1.0 = 0.5 < 1.0 (threshold) → OK
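A minimal sketch of the linear spam filter in this example, using the weights and threshold from the slide; the helper names and test strings are illustrative only:

```python
# Sketch of the slide's linear spam score and of evasion by appending
# benign, negative-weight words. Weights and threshold come from the slide.
weights = {"cheap": 1.0, "mortgage": 1.5, "joy": -1.0, "oregon": -1.0}
THRESHOLD = 1.0

def score(text: str) -> float:
    """Sum the weights of the known words that appear in the text."""
    words = set(text.lower().split())
    return sum(w for token, w in weights.items() if token in words)

def is_spam(text: str) -> bool:
    return score(text) > THRESHOLD

original = "cheap mortgage now!!!"
evasive = "cheap mortgage now!!! joy oregon"   # attacker appends benign words

print(score(original), is_spam(original))  # 2.5 True  -> flagged as spam
print(score(evasive), is_spam(evasive))    # 0.5 False -> evades the filter
```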

Page 12:

ADVERSARIAL EXAMPLES

(Figure: an image classified as panda, plus small adversarial noise, is classified as gibbon)

Page 13:

ADVERSARIAL EXAMPLES

(Figure: small adversarial noise added to an image)

Page 14:

Adversarial Examples

Clean example:
● Naturally occurring example (e.g., from the ImageNet dataset)

Adversarial example:
● Modified example
● Fools the classifier into misclassifying it (can be targeted or untargeted)
● Unnoticeable to a human

Evasion attacks:
● Malicious input
● Fools a (binary) classifier into misclassifying it as benign (evades detection)

Page 15:

ADVERSARIAL EXAMPLES

Figure by Qiuchen Zhang

Page 16:

Common Attack Scenarios

Type of outcome:
1. Non-targeted: predict ANY incorrect label
2. Targeted: change the prediction to some SPECIFIC TARGET class

Adversary knowledge of the model:
1. White box
2. Black box with probing
3. Black box without probing

How the data is fed:
1. Digital attack:
   a. Direct access to the digital representation
   b. Precise control
2. Physical attack:
   a. Physical world
   b. Camera angle and ambient light can change
   c. Input obtained by a sensor (e.g., camera or microphone)

Page 17:

Adversarial Machine Learning

• Inference time: evasion attacks and adversarial examples

• Background

• Attacks

• White box attacks

• Optimization based methods: L-BFGS, C&W

• Fast/approximate methods: FGSM, I-FGSM

• Black/gray box attacks

• Defenses

• Competition methods

• Training time: data poisoning attacks

Page 18:

L-BFGS

Szegedy et al. (2014b)

● First method to find adversarial examples for neural networks
● x_adv: the closest image to x that is classified as y' by f
● Find δ with box-constrained L-BFGS
● Smallest possible attack perturbation
● Drawbacks:
  a. Can be defeated merely by degrading the image quality (e.g., rounding to an 8-bit representation)
  b. Quite slow
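For reference, a sketch of the box-constrained formulation from Szegedy et al. that these bullets summarize (the constant c is tuned by line search so that x_adv = x + δ is the closest image classified as the target label y'):

```latex
% Targeted formulation from Szegedy et al. (2014): find the smallest
% perturbation \delta that makes the classifier f output the target label y'.
\begin{aligned}
  \min_{\delta} \quad & c\,\lVert \delta \rVert_{2} \;+\; \operatorname{loss}_{f}(x + \delta,\; y') \\
  \text{s.t.} \quad   & x + \delta \in [0, 1]^{n}
\end{aligned}
% The box constraint keeps x + \delta a valid image and is handled by the
% box-constrained L-BFGS optimizer.
```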

Page 19:

Carlini and Wagner (C&W) (2017)

● Followed the L-BFGS work
● Dealt with box constraints by a change of variables: x_adv = 0.5 (tanh(w) + 1)
● κ: determines the confidence level
● Used the Adam optimizer
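For reference, a sketch of the C&W L2 objective these bullets refer to, with the change of variables and the confidence parameter κ (Z denotes the logits and t the target class):

```latex
% C&W L2 attack (Carlini and Wagner, 2017). The change of variables keeps
% x_adv inside [0,1]^n without an explicit box constraint, so the
% unconstrained problem can be solved with Adam.
\begin{aligned}
  x_{\mathrm{adv}} &= \tfrac{1}{2}\bigl(\tanh(w) + 1\bigr) \\
  \min_{w} \quad & \lVert x_{\mathrm{adv}} - x \rVert_{2}^{2} \;+\; c \cdot f(x_{\mathrm{adv}}) \\
  f(x') &= \max\!\Bigl( \max_{i \neq t} Z(x')_{i} - Z(x')_{t},\; -\kappa \Bigr)
\end{aligned}
% The larger \kappa is, the more confidently x_adv is classified as t.
```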


Page 21:

Explaining and Harnessing Adversarial Examples

Ian J. Goodfellow, Jonathon Shlens and Christian Szegedy

Google Inc., Mountain View, CA


Linear explanation of adversarial examples

Page 26:

Fast Gradient Sign Method (FGSM)

● Linear perturbation of non-linear models
● Fast (one step) but not too precise
● Uses the infinity norm
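The FGSM update is x_adv = x + ε · sign(∇_x J(θ, x, y)). A minimal PyTorch sketch, assuming `model` returns logits and inputs lie in [0, 1]; ε is an illustrative value:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=8 / 255):
    """One-step FGSM: x_adv = x + epsilon * sign(grad_x of the loss).

    `model` is any differentiable classifier returning logits; `x` is a
    batch of images in [0, 1]; `y` holds the true labels.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + epsilon * grad.sign()             # infinity-norm step
    return torch.clamp(x_adv, 0.0, 1.0).detach()  # stay a valid image
```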

Page 27:

Linear perturbation of non-linear models

Image from reference paper

Page 28:

Iterative Attacks (I-FGSM)

● L-BFGS: high success rate, high computational cost
● FGSM: low success rate, low computational cost (rapid progress in one step)
● Solution: an iterative method with a small number of iterations
● Targeted attacks
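A minimal sketch of a targeted I-FGSM, taking several small signed steps toward y_target and clipping back into the ε-ball; `model`, the step size, and the iteration count are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def i_fgsm(model, x, y_target, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Targeted iterative FGSM: repeatedly take small signed-gradient steps
    toward y_target, clipping back into the epsilon-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_target)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Descend the loss w.r.t. the target class (targeted attack).
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv
```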


Page 30:

Other White-box Attacks

● Madry et al.'s attack
  ○ Start I-FGSM from a random point inside the ε-ball
● Adversarial Transformation Networks
● Non-differentiable systems
  ○ Cannot calculate the gradient
  ○ Use transferability for training (black box)

Black-box Attacks

● Transferability: an x_adv that can fool one model is often able to fool other models
● 0% < fraction of transferable x_adv < 100%, depending on:
  ○ Source model
  ○ Target model
  ○ Dataset
● Luck or a high transfer rate ...?
● Probes: build a copy of the model (a substitute)
● Fully black box
  ○ Ensemble: if x_adv fools every model, it is more likely to generalize

Page 31:

Adversarial Machine Learning

• Inference time: evasion attacks and adversarial examples

• Background

• Attacks

• White box attacks

• Optimization based methods: L-BFGS, C&W

• Fast/approximate methods: FGSM, I-FGSM

• Black/gray box attacks

• Defenses

• Adversarial training

• Detector/reformer based

• Competition methods

• Training time: data poisoning attacks

Page 32:

Defenses

● Image preprocessing/denoising
  ○ Compression
  ○ Median filter (reduces precision)
  ○ Fails under white box attacks

● Gradient masking
  ○ Most white box attacks use the gradients of the model
  ○ Defender: makes the gradients useless
    a. Non-differentiable
    b. Zero gradients in most places
  ○ Vulnerable to black box attacks: substitute models have similar decision boundaries

● Detection-based
  ○ Refuse to classify adversarial examples
  ○ May decrease accuracy on clean data (shallow RBF)
  ○ Automated reforming/denoising

● Adversarial training
  ○ Train on both clean and adversarial examples
  ○ Drawbacks:
    ○ Tends to overfit to the specific attack used to add the noise
    ○ If trained on examples generated under some max-norm constraint, it cannot resist larger perturbations

Page 33:

Adversarial Training

Page 34:

Regular Training

Clean image → Classifier → logits1 → cross-entropy loss with the true label

Page 35:

Adversarial Training

Clean image → Classifier → logits1 → cross entropy with the true label
Adversarial image → Classifier → logits2 → cross entropy with the true label
The training loss combines the two cross-entropy terms
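A sketch of one adversarial-training step over both branches of this diagram; `make_adv` stands for whatever attack crafts the adversarial image (e.g., the FGSM sketch above), and the equal weighting of the two losses is just one common choice:

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x_clean, y, make_adv):
    """One training step on both clean and adversarial examples."""
    x_adv = make_adv(model, x_clean, y)

    logits1 = model(x_clean)   # clean branch
    logits2 = model(x_adv)     # adversarial branch
    loss = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```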

Page 36:

Ensemble Adversarial Training

Adversarial images are generated with FGSM against pre-trained source models (vgg16+fgsm, resnet18+fgsm, vgg19+fgsm); one source is picked by random choice.
Clean image → Classifier → logits1 → cross entropy with the true label
Adversarial image → Classifier → logits2 → cross entropy with the true label

Page 37:

Ensemble Adversarial Training + Adversarial Logits Pairing

Adversarial images again come from a random choice among vgg16+fgsm, resnet18+fgsm, and vgg19+fgsm (ensemble adversarial training).
Clean image → Classifier → logits1 → cross entropy with the true label
Adversarial image → Classifier → logits2
Logits pairing: an ALP loss (MSE) between logits1 and logits2 is added to the cross-entropy loss
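A sketch of one training step combining ensemble adversarial training with logits pairing as in this diagram; `source_models` and `fgsm` stand for the pre-trained attack models and attack routine, and the 0.5 weight on the ALP term is an illustrative choice:

```python
import random
import torch.nn.functional as F

def ensemble_alp_step(model, optimizer, x_clean, y, source_models, fgsm):
    """Ensemble adversarial training with adversarial logits pairing (ALP)."""
    src = random.choice(source_models)   # random choice of attack source model
    x_adv = fgsm(src, x_clean, y)

    logits1 = model(x_clean)
    logits2 = model(x_adv)

    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
    alp = F.mse_loss(logits1, logits2)   # pair the two logit vectors
    loss = ce + 0.5 * alp

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```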


Page 39:

MagNet: a Two-Pronged Defense against Adversarial Examples*

*Dongyu Meng (ShanghaiTech University), Hao Chen (University of California, Davis)

ACM CCS 2017

Page 40:

Adversarial vs. Normal Examples

Manifold of normal examples

Misclassification on adversarial examples:
1. Far from the boundary: no option to reject
2. Close to the boundary: poorly generalized classifier


Page 42:

Existing Defense Methods

1. Adversarial Training
   ○ Build a robust classifier
   ○ Train on both adversarial and normal examples
   ○ Which attack to train on?

2. Detecting Adversarial Examples
   ○ Separate classification network (detector)
   ○ Train on both adversarial and normal examples
   ○ Which attack to train on?

3. Defensive Distillation
   ○ Train the classifier in a specific way that makes attacks hard
   ○ Complex to retrain; not protected against the Carlini attack

MagNet
● Does not retrain the classifier
● Uses only normal examples (can generalize across attacks)

Page 43:

MagNet Design

(Figure: MagNet architecture, with detectors and a reformer placed in front of the target classifier)

Page 44:

Detector

Decides whether the input is adversarial.

Detector based on reconstruction error:
▪ Trains an autoencoder on normal examples
▪ Reconstruction error E(x) is high on adversarial examples
▪ Defines a threshold on E
▪ Not effective when E is small

Detector based on probability divergence:
▪ Uses the autoencoder and the classifier's softmax layer
▪ f(x) = f(ae(x)) for normal x, but f(x') != f(ae(x')) for adversarial x'
▪ Softmax may saturate → add a temperature T
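A sketch of the two detector types, assuming `ae` is the trained autoencoder and `classifier` returns logits; MagNet measures the probability divergence with the Jensen-Shannon divergence, and the thresholds and temperature used here are illustrative:

```python
import torch.nn.functional as F

def reconstruction_error(ae, x, p=1):
    """E(x): sum of |x - ae(x)|^p per example."""
    return (ae(x) - x).abs().pow(p).flatten(1).sum(dim=1)

def error_based_detector(ae, x, threshold):
    """Flag inputs whose reconstruction error exceeds a threshold chosen
    on normal (clean) validation data."""
    return reconstruction_error(ae, x) > threshold

def divergence_based_detector(classifier, ae, x, T=10.0, threshold=0.01):
    """Flag inputs where the temperature-T softmax outputs of f(x) and of
    f(ae(x)) diverge (Jensen-Shannon divergence)."""
    p = F.softmax(classifier(x) / T, dim=1)
    q = F.softmax(classifier(ae(x)) / T, dim=1)
    m_log = (0.5 * (p + q)).log()
    jsd = 0.5 * (F.kl_div(m_log, p, reduction="none").sum(1)
                 + F.kl_div(m_log, q, reduction="none").sum(1))
    return jsd > threshold
```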

Page 45:

Reformer

Autoencoder-based

Train:
▪ Use normal examples
▪ Minimize the reconstruction error

Test:
▪ Normal input: comes from the same data-generating process as the training set, so it passes through almost unchanged
▪ Adversarial input: the autoencoder approximates it and moves it closer to the normal manifold
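Putting the pieces together, a sketch of the two-pronged MagNet pipeline at test time; the detector functions are as sketched above and all names are illustrative:

```python
import torch

def magnet_predict(classifier, ae, detectors, x):
    """Two-pronged MagNet pipeline (sketch): drop inputs that any detector
    flags, reform the survivors with the autoencoder, then classify."""
    keep = ~torch.stack([d(x) for d in detectors]).any(dim=0)
    x_kept = x[keep]
    x_reformed = ae(x_kept)          # push back toward the normal manifold
    return classifier(x_reformed).argmax(dim=1), keep
```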

Page 46:

Blackbox Attack

Setup:
● Reformers: 1 for MNIST, 1 for CIFAR-10
● Detectors: 2 reconstruction-error-based for MNIST; 1 error-based and 2 probability-divergence-based (T = 10, 40) for CIFAR-10

Accuracy:
● Target classifier alone: 99.4% (MNIST), 90.6% (CIFAR-10)
● MagNet on normal examples: 99.1% (MNIST), 86.8% (CIFAR-10)
● MagNet on adversarial examples: > 99% on MNIST (except Carlini L0), > 75% on CIFAR-10 (> 99% on half of the attacks)

Page 47:

Blackbox Attacks

Page 48:

Blackbox Attack

* MNIST, CIFAR-10
* Impact of the reformer and the detector?
* Blackbox attack
* Carlini's L2 attack with different confidence levels
* The higher the confidence, the harder the attack

Page 49:

Adversarial Machine Learning

• Inference time: evasion attacks and adversarial examples

• Background

• Attacks

• White box attacks

• Optimization based methods: L-BFGS, C&W

• Fast/approximate methods: FGSM, I-FGSM

• Black/gray box attacks

• Defenses

• Competition

• Training time: data poisoning attacks

Page 50:

NIPS Competition

Evaluation:
● ImageNet
● Time limit: a batch of 100 images in less than 500 s
● Development dataset vs. secret final dataset
● 4 rounds (3 optional, just for test purposes)
● All defense methods are tested against all attacks
● Attack score: number of times the attack fools the defense methods
● Defense score: number of correctly classified examples

Tasks:
1. Non-targeted Adversarial Attack (blackbox)
2. Targeted Adversarial Attack (blackbox)
3. Defense against Adversarial Attacks: robust classifier

Page 51:

NIPS Results

Submissions: 91 non-targeted attacks, 65 targeted attacks, 107 defenses

Track        | Baseline score | Top-scoring submission (TsAIL) | Worst score
Defense      | 0.772          | 0.9531                         | 0.1184
Non-targeted | 0.346          | 0.781                          | 0.136
Targeted     | 0.199          | 0.402                          | 0

Page 52:

1st place in both attack tracks (TsAIL)

● Momentum iterative attack
● Attack strength vs. transferability (FGSM vs. I-FGSM)
● Use momentum to stabilize update directions and escape from poor local optima (avoid overfitting)
● Use an ensemble (fuse logits) to be more transferable (for black box attacks)
● For targeted attacks:
  ○ Standard deviation instead of the L1 norm
  ○ Set {-2, -1, 0, 1, 2}, a larger search space
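A sketch of the momentum iterative method (MI-FGSM) in its untargeted form, assuming NCHW image batches in [0, 1]; the step size, iteration count, and decay factor μ are illustrative, and the competition entry additionally fused the logits of an ensemble of models:

```python
import torch
import torch.nn.functional as F

def mi_fgsm(model, x, y, epsilon=16 / 255, steps=10, mu=1.0):
    """Momentum iterative FGSM: accumulate the L1-normalized gradient into
    a momentum buffer, then take signed steps inside the epsilon-ball."""
    alpha = epsilon / steps
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Momentum stabilizes the update direction and helps escape poor optima.
        norm = grad.abs().flatten(1).sum(dim=1).view(-1, 1, 1, 1) + 1e-12
        g = mu * g + grad / norm
        x_adv = x_adv.detach() + alpha * g.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv
```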

Page 53:

2nd place in both attack tracks (Sangxia)

● I-FGSM based
● Ensemble models for transferability: f_i
● Image augmentation (rotate, scale, ... by parameter θ) for transferability: T_θ
● Only include adversarially trained models in later iterations
● Targeted attacks:
  ○ Minimize the loss w.r.t. y_target
  ○ No augmentation

Page 54:

3rd place in targeted attack (FatFingers)

● Dynamic iterative ensemble attack
● Use many NNs in the ensemble (with too few, the hit rate is zero)
● Dynamically exclude some NNs in later iterations
  ○ Pre-assign the number of iterations
  ○ Remove a network once its loss is small enough

Page 55:

1st place in Defense (TsAIL)

● Denoiser based
● Input: adversarial example; output: adversarial noise
● Loss function: distance between the representations of the original and the adversarial image at the l-th layer of the target model
  ● Topmost convolutional layer
  ● Logits layer
● Submitted: DUNET (denoising U-Net) + CNN
● Trained on FGSM and I-FGSM

Page 56:

2nd place in Defense (iyswim)

● Randomization
  1. Random resizing layer
  2. Random padding layer
  3. Adversarially trained model
● Low-level transformations may destroy the perturbation
● Advantages:
  ○ No additional training
  ○ Few additional computations
  ○ Compatible with other defense methods
● Randomization can also be applied against whitebox attacks
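A sketch of the randomization defense at inference time: random resizing followed by random zero-padding before the adversarially trained model; the output size is an illustrative value and the model is assumed to accept variable spatial input sizes:

```python
import random
import torch.nn.functional as F

def randomized_inference(model, x, out_size=331):
    """Randomization defense (sketch): randomly resize the NCHW input batch,
    randomly zero-pad it to out_size x out_size, then classify.
    Assumes the input height/width is smaller than out_size."""
    b, c, h, w = x.shape
    new_size = random.randint(h, out_size - 1)           # random resizing layer
    x = F.interpolate(x, size=(new_size, new_size), mode="nearest")

    pad_total = out_size - new_size                       # random padding layer
    left = random.randint(0, pad_total)
    top = random.randint(0, pad_total)
    x = F.pad(x, (left, pad_total - left, top, pad_total - top), value=0.0)

    return model(x)                                       # adversarially trained model
```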

Page 57:

Adversarial Machine Learning

• Inference time: evasion attacks and adversarial examples

• Background

• Attacks

• Defenses

• Competition

• Training time: data poisoning attacks