malware packing heat enigma v6 - usenix.org · ange albertini 2009-2010 creative commons...

34
1 When Malware Is Packing Heat Davide Balzarotti and Giovanni Vigna USENIX Enigma 2018

Upload: duongque

Post on 28-Aug-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

1

When Malware Is Packing Heat

Davide Balzarotti and Giovanni Vigna

USENIX Enigma 2018

Page 2: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

2

Packing

Page 3: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

3

Packing

Page 4: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

4

Researchers often have a limited understanding of the complexity of runtime packers

Page 5: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

5

Researchers often have a limited understanding of the complexity of runtime packers

AV software often mis-classify benign packed samples as malicious

Page 6: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

6

Researchers often have a limited understanding of the complexity of runtime packers

AV software often mis-classify benign packed samples as malicious

We all love ML, but in the presence of packing it just learns the wrong thing

Page 7: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

7

Page 8: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

8

Layer 1

Page 9: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

9

Layer 1 Layer 2

Page 10: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

10

Layer 1 Layer 2 Layer 3

Page 11: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

11

Layer 1 Layer 2 Layer n

packer code application code

Page 12: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

12

Page 13: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

13

Page 14: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

14

Complexity Classes [Class I] a single unpacking routine is executed before transferring the control to the unpacked program

[Class II] multiple unpacking layers are executed sequentially and lead to the original code at the end

[Class III] intermediate layers are executed in loops

[Class IV] the packer code is interleaved with the execution of the unpacked program

[Class V] pieces of the original program are unpacked on-demand

[Class VI] only a single fragment of the original program (as little as a single instruction) is unpacked in memory at any moment in time

Page 15: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

15

Off-The-Shelf Packers Custom Malware Packers

Page 16: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

16

Ange Albertini 2009-2010 Creative Commons Attribution http://corkami.blogspot.com

Page 17: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

17

Why Does Packing Matter?

§  Dynamic analysis techniques (e.g., sandboxes) have been introduced to deal with packing…

§  …but static analysis techniques are more efficient!

Page 18: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

18

An Experiment

§  Benign programs from Windows OSs (XP, Vista, 7, NT) §  7983 samples

§  Packed with 4 different packers §  16663 samples

§  Submitted to VirusTotal §  Looking for 10+ detections

§  See: http://sarvamblog.blogspot.com/2013/05/nearly-70-of-packed-windows-system.html

Page 19: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

19

Results

§  UPX: 0% False Positives

§  BEP: 72.78% False Positives

§  NsPack: 98.72% False Positives

§  Upack: 99.88% False Positives

Page 20: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

20

Packing = Malware?

§  False Positives §  Dataset Pollution

Page 21: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

21

How Did We Get Here?

§  Machine Learning has been increasingly used to perform malware detection

§  The misclassification of packed binaries is the result of learning the wrong thing…

§  Let’s take a step back!

Page 22: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

22

What Is Machine Learning?

§  “Machine learning explores the study and construction of algorithms that can learn from and perform predictive analysis on data” https://en.wikipedia.org/wiki/Machine_learning

Page 23: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

23

Why Machine Learning?

§  Supports data analysis

§  Supports characterization

§  Supports classification

Page 24: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

24

?Round?

Has>3sides?

Machine Learning

Page 25: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

25

Machine Learning

Page 26: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

26

?Redsarebad

Blues,greens,orangesaregood

Whataboutgreys?

Machine Learning

Page 27: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

27

Machine Learning

Page 28: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

28

Pitfalls in Machine Learning

Page 29: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

29

Pitfalls in Machine Learning

Page 30: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

30

Pitfalls in Machine Learning

Page 31: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

31

Pitfalls in Machine Learning

Page 32: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

32

Another Experiment

Insight: When most of malware is packed, packing is what is actually learned

ratio % # packed malicious # unpacked malicious # unpacked benign accuracy % (mean) variance % % false positive for benign packed0 0 5173 5173 94.607 0.007 7.0335 258 4914 5173 93.766 0.039 20.038

10 517 4655 5173 93.283 0.107 20.57715 775 4397 5173 94.452 0.003 26.67320 1034 4138 5173 94.19 0.006 27.81825 1293 3879 5173 94.065 0.01 29.8530 1551 3621 5173 94.326 0.009 28.11635 1810 3362 5173 94.412 0.009 31.88240 2069 3103 5173 94.712 0.015 32.80345 2327 2845 5173 94.751 0.018 35.02550 2586 2586 5173 94.48 0.018 35.37455 2845 2327 5173 94.75 0.02 35.91360 3103 2069 5173 94.712 0.032 36.66765 3362 1810 5173 94.982 0.052 36.4170 3621 1551 5173 94.934 0.052 38.25275 3879 1293 5173 95.466 0.071 34.60280 4138 1034 5173 95.533 0.074 35.82285 4397 775 5173 95.756 0.055 35.65690 4655 517 5173 95.407 0.234 36.12895 4914 258 5173 96.123 0.076 49.711

100 5173 0 5173 96.839 0.008 52.451

Page 33: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

33

Conclusions

§  Applying machine learning to packed malware might lead to the detection of packing (and not the detection of malicious behavior) resulting in false positives §  De-sensitization caused by false positives §  Pollution of datasets

§  Sophisticated dynamic unpacking and analysis is necessary

Page 34: malware packing heat enigma v6 - usenix.org · Ange Albertini 2009-2010 Creative Commons Attribution . 17 Why Does Packing Matter?

34

Questions?

process by Roman from the Noun Project Machine learning picture: https://xkcd.com/1838/