
McAfee/RIM Confidential

Exploring in the Wild:

A Big Data Approach to Application Security Research (and Exploit Detection)

Haifei Li, [email protected]

Chong Xu, [email protected]

CanSecWest, March 2014, Vancouver

About Us: Haifei

• Security Researcher at McAfee Labs
• Previously: Microsoft, Fortinet
• Work on two questions:
  1) How to find vulnerabilities?
  2) How to exploit them?
• At McAfee my interests have been extended to a third:
  3) How to detect the effect by answering the first and second?

Let’s help the world!

• Presented at BlackHat Europe 2010, CanSecWest 2011, REcon 2012, Syscan360 2012

• Living around here.

About Us: Chong

• Ph.D. from Duke University

• Director @ McAfee Labs IPS team

• Focus:
  • Advanced (0-day) exploit and malware defense
  • APT detection
  • Computer networking
  • Network and host security

• NIPS, HIPS, NGFW

Agenda

1 The Situation We Are In

2 The Idea

3 Exploit Detection and CVE-2013-3906

4 Exploring in the Wild

5 Summary

Some Background

• This is a practice we have been working on for more than one year

• Not a particularly deep technical dive
• But the idea behind it is cool
• And, most important, of long-term benefit
• Extend our view on applications and threat landscape

Motive

I am so tired of doing security research!

The Situation We Are In

• In essence, security research is:
  • Understanding “how it works”
  • Not just finding bugs or exploiting bugs

• The most popular client-side apps today

The Situation We Are In

• Applications have become so “rich”
  • So many features, so many “things” probably no one knows yet
• Interoperability:
  • IE can run Flash, Java, Reader, Office code
  • Chrome can run Flash, Java, Reader code
  • Office can run Flash, perhaps Java
  • Some OS-delivered features can also be triggered by applications (e.g., .NET, DirectX)
  • …
• This is a complex “system” rather than a 90s server-side application

Vendors vs. Researchers

• Apps are developed by software giants
  • Microsoft, Google, Oracle, Adobe
  • Thousands of engineers fully committed to delivering new features day by day
  • And don’t forget: most of the apps are closed-source
• What do security researchers have?
  • A small, inattentive community
  • Researchers in different orgs, with different purposes and different interests
  • Most of us also have daily, nonresearch duties

It’s Never Been a Fair Game


Usual Research Approaches

• Usually, a nice vulnerability or an innovative exploitation technology is inspired by:
  • A crash found by fuzzing
    • A large number of methodologies/frameworks discussed in recent years, too many to name
  • Digging into new features (brings new vulns/ideas)
    • Flash JIT => JIT spraying bypasses ASLR+DEP [1]
    • Custom heap management in Reader and Flash => reliable heap-spray & the Flash “Vector” exploitation [2]
    • HTML5 => Canvas object “ImageData” heap-spray [3]
    • PDF XFA feature => the PDF 0day (CVE-2013-0640) [4]
    • Office ActiveX interoperability => non-scriptable heap-spray in Office (CVE-2013-3906) [5]

The Limitations

• Fuzzing is cool
  • Maximize testing the code for specific features
• But it won’t help a lot on understanding the application
  • Looks only into a point, not a surface, from the application level
  • Need to provide a “template” that triggers the features first
  • Can’t find interoperability-related, logic, or info-leak bugs, etc.
• Currently, recognizing “new features” is done manually or randomly
  • Adam: “Hey dude, I just found something weird when surfing on YouTube.”
  • Bob: “I smell fresh, let’s go deep.”

A Conclusion

• Researchers find it really hard to catch all the “features” delivered by vendors

• Also for application behaviors

• Researchers need an interesting entry point so they can perform future research
  • An entry point could be anything, not just a crash

• The question: How to get those “entry points”?




Our Approach

• At McAfee Labs, we own hundreds of millions of samples
  • “Unlimited” PDF, Flash, Java, Office, HTML (URL) samples
  • New ones arrive every minute
• So, we thought we might be able to leverage the huge number of samples for something new
  • For industry interests, we need to know if there are “unclassified” exploits or, even worse, zero-day exploits
  • For research, we can leverage the resource to better understand “how it works”

A Simple Idea

• We "execute" every sample in a single environment
  • Sandboxing
  • By appropriate applications
• We record basically everything during the execution of the sample
  • File Access
  • Registry Access
  • Process Activities
  • Network Activities
  • Process Memory Status
  • And more

A “Big Data” Plan

• We “sign” all the information we collect for a single sample in a single environment
  • We call it “DNA”
• We store those DNAs in our DNA database
  • Rather than dropping them after execution
• We “data mine” the DNA database
  • Usually “DNA comparison”
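The plan above can be pictured with a minimal Python sketch. It is an illustration only, not the system described in this talk: the `Event` fields, the `normalize` scheme, the SHA-256 digest, and the in-memory `DnaDatabase` are all assumptions made for the example.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One recorded behavior of a sample run (hypothetical record layout)."""
    category: str  # e.g. "file", "registry", "process", "network"
    action: str    # e.g. "read", "write", "load", "connect"
    target: str    # path, registry key, command line, or host


def normalize(event):
    """Reduce an event to a comparable feature string (assumed scheme)."""
    return f"{event.category}:{event.action}:{event.target.lower()}"


def make_dna(events):
    """In this sketch a DNA is the set of normalized features of one run."""
    return frozenset(normalize(e) for e in events)


def dna_digest(dna):
    """Order-independent digest of a DNA, usable as a short signature."""
    payload = json.dumps(sorted(dna)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class DnaDatabase:
    """In-memory stand-in for a DNA store, keyed by (sample, environment)."""

    def __init__(self):
        self._rows = {}

    def store(self, sample_id, environment, dna):
        self._rows[(sample_id, environment)] = dna

    def get(self, sample_id, environment):
        return self._rows[(sample_id, environment)]

    def __len__(self):
        return len(self._rows)
```

Keeping the DNA as a plain set of feature strings is a design choice for the sketch: it makes the later comparison step a matter of counting how many stored DNAs contain each feature.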

How It Looks

[Figure: system overview showing the Sample Feeder, the Core, and the DNA Database]

“DNA” Comparison

• We compare the similarity of the DNAs

• Fact: Most samples we test are normal

• Example:
  • 1 million samples have behavior A
  • A few samples don’t
  OR
  • 1 million samples do NOT have behavior B
  • A few samples do

• So, the unusual ones attract some “interest”
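The comparison described on this slide can be sketched as a frequency count over stored DNAs. This is a hedged illustration, not our production logic: the function name `unusual_samples`, the representation of a DNA as a set of feature strings, and both threshold defaults are assumptions.

```python
from collections import Counter


def unusual_samples(dnas, max_fraction=0.001, min_fraction=0.999):
    """Map sample id -> (rare features it has, near-universal features it lacks).

    dnas: dict of sample id -> set of feature strings.
    A feature is "rare" if at most max_fraction of samples show it, and
    "near-universal" if at least min_fraction of samples show it.
    """
    counts = Counter(f for dna in dnas.values() for f in set(dna))
    total = len(dnas)
    rare = {f for f, n in counts.items() if n <= max_fraction * total}
    common = {f for f, n in counts.items() if n >= min_fraction * total}

    report = {}
    for sample_id, dna in dnas.items():
        has_rare = set(dna) & rare
        lacks_common = common - set(dna)
        if has_rare or lacks_common:
            report[sample_id] = (has_rare, lacks_common)
    return report
```

Both directions from the example are covered: a sample that shows behavior B when almost nobody does, and a sample that lacks behavior A when almost everybody has it. The output is only a list of candidates for manual research, not a verdict.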

The Goal: Finding the Interest

• This is the great mission!

• We collect DNAs

• We compare the DNAs

• We get something interesting (unusual)

• We analyze/research the sample

• We learn things and find new stuff!
• We use the knowledge to improve our “real-time” rules for zero-day detections! (Will discuss later)

• A learning-then-improving cycle

Case Study 1

• We don’t know whether loading C:\Windows\System32\MSCOMCTL.OCX into an Office process is malicious or interesting.

• But we found that only a few samples made that happen.
  • ~100 in a half-million Office samples

• Manual research showed all the samples are malicious exploits!
  • CVE-2012-0158
  • CVE-2012-1856
  • CVE-2013-3906 (the TIFF zero day we discovered)

• What we learned: If we see MSCOMCTL.OCX is loaded into an Office process, it’s likely an exploit.
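Once learned, a finding like this one can be turned into a simple check. The sketch below is an assumption-laden illustration: the set of Office process names and the function name are hypothetical, and a real rule would sit on whatever module-load telemetry the sandbox provides.

```python
import ntpath

# Assumption: process names chosen for illustration; extend as needed.
OFFICE_PROCESSES = {"winword.exe", "excel.exe", "powerpnt.exe"}

# Modules whose load into an Office process was rare in the sample set.
RARE_MODULES = {"mscomctl.ocx"}


def is_suspicious_load(process_path, module_path):
    """True if a rarely loaded module is loaded into an Office process."""
    process = ntpath.basename(process_path).lower()
    module = ntpath.basename(module_path).lower()
    return process in OFFICE_PROCESSES and module in RARE_MODULES
```

The rule says "likely an exploit", not "certainly": it is a cheap trigger for deeper analysis of the sample.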

Case Study 2

• Assume we find only a few HTML/URL samples triggering the IE process to access an unusual location: C:\Windows\AppPatch\EMET.DLL
  • While most others didn’t
• Prehacking trick to check if Microsoft EMET is installed, in the IE10 CVE-2014-0322 exploit (credit: FireEye*)

*http://www.fireeye.com/blog/technical/cyber-exploits/2014/02/operation-snowman-deputydog-actor-compromises-us-veterans-of-foreign-wars-website.html

Case Study 2 (cont.)

• What we learned:
  • Even if a component of the ITW exploit is missing (say, the SWF in the CVE-2014-0322 exploit), by strictly examining (comparing) the behaviors we can still find the point, which may lead to the discovery of the whole exploit.
• The author believes this trick should be considered a security vulnerability, though it currently is not.
  • It allows bad guys to check for the existence of a local file from the Internet: not just the EMET DLL, but also AV products, as well as detecting a VM environment
  • It works on almost every IE, including IE11.

The Benefit of DNA Comparison

• We don’t have to know which application behavior is suspicious, malicious, or even interesting
  • We just need to find the unusual (interesting) ones

• We can find most hidden exploits because we can find out unknown malicious behaviors through this approach




Exploit Detection Isn’t Hard

• Zero-day detection has become a hot topic in the industry

• However, it’s never been a “technology problem”
  • How hard is it to build a VM and hook “CreateProcess”?
• Exploit behaviors, including post-exploitation behaviors, are usually very clear
  • Not much an exploit can do compared with malware, especially on modern systems
  • Little room during/after bypassing all mitigations
  • Having not seen one in the wild is a real challenge

The Devil Is in the Details

• Lack of sample source (no sample)

• Poor management (have sample, didn’t test)

• Not the right environment (will discuss later)

• VM-detecting exploit
  • This would be cool, but we haven’t seen one in the real world
  • Our prediction for 2014

• It’s more like an intelligence or management problem!

The Devil Is in the Details (cont.)

• “Watering hole” attacks, or many online attacks, are targeting one or more specific environments
  • Running Win7? I exploit only XP (so many to name)
  • Running Office 2010 on Win7? I exploit Office 2007/2010 on XP (CVE-2013-3906)
  • Running IE8 and IE11? I exploit only IE10 (CVE-2014-0322)
  • Using English? Sorry, I work only on CN/JP/KR markets
  • Running latest Flash? No, I exploit only old versions, even as a zero day! (CVE-2014-0497) [6]
  • …

• This is the most challenging one

Possible Mitigations

• Run as many environments as possible
  • Good: easy to do
  • Bad: applies only to lab projects; will still miss some because you can’t install every version
• Hooking on the “version-checking” code
  • Good: powerful
  • Bad: deep research (RE) required, not easy to do
• Static and Sandboxing
  • Static scanning can find those “suspicious” ones
  • Mark them, and do deep multienvironment tests
  • I guess this would be the most practical way

• Our DNA-comparison approach can help with this!

Our System & Zero-Day Detection

• Initially, we insert a set of “strict” rules in the middle to detect zero-day exploits
  • PE dropped? Something downloaded? Process created? And more
• As the system helps us understand application behaviors at a deep level, we continue to add and improve the rules

• A “learning-then-improving” cycle
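The three “strict” rules named above can be sketched as small predicates over a recorded event list. This is a hypothetical illustration: the event dictionaries, their keys (`type`, `head`), and the rule names are assumptions, not the format our system uses.

```python
def pe_dropped(events):
    """True if any written file starts with the 'MZ' magic of a PE file."""
    return any(
        e["type"] == "file_write" and e.get("head", b"")[:2] == b"MZ"
        for e in events
    )


def something_downloaded(events):
    """True if the run fetched anything over HTTP."""
    return any(e["type"] == "http_download" for e in events)


def process_created(events):
    """True if the run started a new process."""
    return any(e["type"] == "process_create" for e in events)


# Rules are kept in a dict so that new ones can be added as we learn more.
STRICT_RULES = {
    "pe_dropped": pe_dropped,
    "something_downloaded": something_downloaded,
    "process_created": process_created,
}


def evaluate(events, rules=STRICT_RULES):
    """Return the names of all rules that fire for one sample run."""
    return [name for name, rule in rules.items() if rule(events)]
```

Keeping the rules in a plain mapping mirrors the learning-then-improving cycle: each lesson from manual research becomes one more entry.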

Our System & Zero-Day Detection

[Figure: the Sample Feeder, Core, and DNA Database, with “Rules for exploit detection” and a “Learn and improve” loop]

Next, let’s talk about a real example: the Office (TIFF) zero-day exploit (CVE-2013-3906)

What Happened on Oct. 31?

• Halloween :P

• Our beta-running project for Office documents came online just the previous night
  • Got lucky?
• The suspicious sample we detected:
  • MD5: 1FD4F3F063D641F84C5776C2C15E4621
• Strong malicious behaviors:
  • http://flatnet.com/bruce/winword.exe
  • C:\Documents and Settings\<username>\Local Settings\Temp\winword.exe

The Timeline

• 09:21 AM: Detected the issue
• 11:06 AM: I noted the log, started manual tests
  • Beta running, no alerting component :p
• 12:44 PM: Call for researchers to confirm and analyze
  • Kudos to Bing Sun, Chong Xu, Xiaoning Li, Lijun Chen, and Vinay Karecha
• 01:20 PM: Reported to MSRC
  • Might not be a bad idea to let the vendor know early
• Next days: Phone calls from MSRC, internal cooperation, email exchanges, conf calls, etc.
• Nov 5: Coordinated announcements
  • http://blogs.mcafee.com/mcafee-labs/mcafee-labs-detects-zero-day-exploit-targeting-microsoft-office-2
  • http://technet.microsoft.com/en-us/security/advisory/2896666
• Nov 6: Follow-up technical post discussing the DEP mystery in Office
  • http://blogs.mcafee.com/mcafee-labs/solving-the-mystery-of-the-office-zero-day-exploit-and-dep


Some Points: New Heap-Spray

• First exploit seen using ActiveX in Office to do heap-spray
  • Non-scriptable, Office security mitigations won’t block it
  • Embedded Control Persistence Binary Data
• Other methods, such as leveraging Flash ActionScript, are already blocked, e.g. “Flash Click-to-Play”
  • http://blogs.adobe.com/security/2013/02/click-to-play-for-office-is-here.html

Some Points: OpenXML != Safe

• First exploit seen organized in Office OpenXML format (.docx)

• Even though this is a TIFF-parsing vulnerability

• Seen a lot of doc/ppt/xls/rtf exploits, but no docx/pptx/xlsx before

• Previously, OpenXML format was usually considered “safe”

Some Points: DEP Mystery

• After our announcement of the discovery, several parties analyzed the threat as well, but reported inconsistent DEP/ROP results
  • Many claimed it uses ROP gadgets to bypass DEP
  • While our analysis (on the sample we detected) showed it didn’t do any DEP bypass, and DEP is not even enabled on Win7 with Office 2007

Other analyses: Microsoft [7], FireEye [8], CrowdStrike [9]
























DEP Mystery: What’s Going On?

• Our later analysis showed that it’s actually because of the environment that the sample aims to exploit
• Office 2007 running on XP
  • Affected, no DEP, so simple heap spray works
• Office 2007 running on Windows 7
  • Affected, still no DEP, simple heap spray still works
• Office 2010 running on Windows XP
  • Affected, DEP on, heap spray doesn’t work!
• The sample that runs ROP gadgets is for Office 2010 on XP!

The Attack

• A massive attack
  • We have identified more than 60 unique samples since the first discovery
  • We were not “just lucky” to detect this
• First sample can be tracked to early July 2013 on VirusTotal
  • In the wild for at least four months
• Someone provided some good insights on Twitter
  • http://pastebin.com/64pBCgbw
• A carefully orchestrated attack
  • Exploits prepared for every vulnerable environment (as seen in the DEP mystery)
• A nice exploit template
  • Bad guys need to modify only a little to make a future Office zero day




Next, we’re going to review some “odd” things that we have found in the wild

I: Document Tracking

• Network activities are important behaviors

• Previously, I thought a document (PDF, Office) wouldn’t connect to a third-party address. You?

• But I was pretty surprised:
  • Office documents connecting to third parties are *usual*
    • Found too many samples
    • Mostly embedding remote images
  • Not too many for PDFs, in two categories:
    • A patched vuln in PDF JavaScript API implementation
    • A feature in “rights-managed” PDFs

Office Document Tracking

• Affected all Office file formats, and RTF

• Mostly because of embedded pictures via HTTP or UNC protocols

• Even if Office sometimes warns users, the traffic has already happened

PDF Document Tracking

• In April 2013, we reported a vuln in the PDF JavaScript implementation that enables PDF tracking
  • http://blogs.mcafee.com/mcafee-labs/tracking-pdf-usage-poses-a-security-problem
  • This was considered a security vuln and was patched
• We also found a more complicated one
  • A PDF signed by Adobe LiveCycle Rights Management
  • When a user opens the PDF, it connects to the “server” (defined in the PDF) to check out “policies”
  • Anyone can define the server in the PDF
  • This is a PDF feature! - “rights management”
















And More..

• Is this a big deal?
  • It depends on you.. Your Privacy!
• Who is exploiting it?
  • A number of “services” are offered online
• Note: All samples were found in the wild, some *may* have malicious intentions (domains registered under services companies)
• More discussion: http://justhaifei1.blogspot.com/2013/10/document-tracking-what-you-should-know.html













II: Unusual Crashes

• If you are exploring in the wild, anything could happen

• A crash is a strongly suspicious behavior

• Various crashes we have seen:
  • Crash “on purpose”: Office!
  • Unconfirmed crashes in Office
  • Stack overflow in IE
  • ..

Crash on Purpose

• We have detected a couple hundred samples that crash a fully patched Office
  • mscomctl.ocx 6.1.98.34, offset 0x00054f86 => CVE-2012-0158
  • kernel32.dll, offset 0x00012fd3 => CVE-2013-3906 (integer overflow exception)

Crash on Purpose

• They really are “on purpose”!
  • Believe it or not, this is how MS Office patches vulns :P
  • I personally have no idea about this
• I got a novel idea to detect previous Office exploits with updated systems
  • Just check where it crashed (offset and stacktrace)
  • Highly accurate detection*

*During my analysis, I saw a lack of VT detections for many previous Office exploits
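The “check where it crashed” idea can be sketched as a lookup from crash location to the patched vulnerability. The two entries come from the previous slide; everything else is an assumption for illustration, including the function name and the choice to key only on module name and offset. Offsets are specific to a module build, so a real table would also key on the module version and would compare the stack trace.

```python
# (module name, crash offset) -> CVE, taken from the crashes listed above.
# Assumption: offsets apply to the specific module builds that were observed.
KNOWN_PATCH_CRASHES = {
    ("mscomctl.ocx", 0x00054F86): "CVE-2012-0158",
    ("kernel32.dll", 0x00012FD3): "CVE-2013-3906",
}


def classify_crash(module, offset):
    """Return the CVE whose patch produces this crash, or None if unknown."""
    return KNOWN_PATCH_CRASHES.get((module.lower(), offset))
```

A crash that maps to a known entry points to an old exploit hitting patched code; a crash that returns `None` is the kind that needs manual research.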

Crash Not on Purpose

• Sometimes, even clean samples will deliver crashes!

• Need tough, manual research
  • Our strategy is to first rule out the possibility of a zero-day exploit
  • Though, sometimes you can’t be 100% sure! Especially for Office binary formats

• No time to do deep/full research for every one

• “Free zero-day time”!

Excel Zero-Day Crash

• Crashes on Office 2007/2013, but not Office 2010

• .XLS

• No malicious content found

• No detection on VT

Excel Zero-Day Crash 2

• Crashes all versions of Office

• .XLS

• No malicious content found

• No detection on VT

IE11 Stack Overflow

• Not a stack buffer overflow

• Not malicious

• Just an unfortunate endless loop

• A sample window for a sense of IE11’s QA

III: Identifying Unknown Attacks & Tech

• A real example showing how this approach can detect unknown potential attacks & provide “entry points” for future research

• During our project, we saw that some web pages trigger the IE process to access the following locations

• While most others didn’t

• A typical Windows searching order when the full path is not provided

Some Backgrounds Found

• Further manual research shows that it’s triggered by the following JavaScript code

• The code we detected is hosted at:
  • http://cpro.baidustatic.com/cpro/ui/ci.js
  • A website owned by Baidu.com, a well-known Chinese search giant
  • Not a malicious exploit
  • Used to detect if the user is using the 360 Browser, a browser developed by another well-known Chinese company, 360.cn
• Not a new trick, discussed here:
  • http://segmentfault.com/q/1010000000117437


Some More Thoughts

• An entry point for future research

• What did we learn?
  • Leveraging the “Image()” object & the “res://” protocol allows you to access a local file via IE
  • Providing the full path allows determining the existence of a specific local file (mostly for PE files)
  • Pretty much like the “EMET checking” trick in the recent CVE-2014-0322 IE10 exploit (discussed before)
• Thoughts: should this also be considered a security vulnerability?

IV: Example: Identifying a New Feature

• As we’ve discussed, “DNA comparison” helps identify some interesting features
• Just an example, in PDFs
  • We noted some little-known DLL(s) are loaded
  • MD5: 98D3249FE81732805685F538EB57A518
  • Indicates that Adobe Reader can play multimedia natively in PDFs
  • An entry point: Fuzzing? REing? More?




Summary and Conclusion

• This “Big Data” approach not only helps (zero-day) exploit detection, but also benefits advanced security research:
  • Provides various “entry points” for future research
  • Have a “surface” view of the application’s behaviors as well as features
  • Open our eyes!
• Have a sense of the exploit threat landscape
  • Zero-day detection!
  • A way to find most hidden attacks

Challenges

• Manpower
  • Need teamwork to build it (architecting, automation, coding, etc.)
  • Need a talented security research team to analyze various samples that attract interest, especially in early stages
• Still, the devil is in the details
• A research-oriented project
  • Careful automation to collect meaningful data
  • Requires strong security knowledge of applications and OS

Major References

[1] Dion Blazakis, "Interpreter Exploitation: Pointer Inference and Spraying" [Online]. Available: http://www.semantiscope.com/research/BHDC2010/BHDC-2010-Paper.pdf
[2] Haifei Li, "Smashing the Heap with Vector: Advanced Exploitation Technique in Recent Flash Zero-day Attack" [Online]. Available: https://sites.google.com/site/zerodayresearch/smashing_the_heap_with_vector_Li.pdf
[3] Federico Muttis, "HTML5 Heap Sprays: Pwn all the things" [Online]. Available: http://exploiting.files.wordpress.com/2012/10/html5-heap-spray.pdf
[4] Matthieu Bonetti, "CVE-2013-0640: Adobe Reader XFA oneOfChild Un-initialized memory vulnerability" [Online]. Available: https://labs.portcullis.co.uk/blog/cve-2013-0640-adobe-reader-xfa-oneofchild-un-initialized-memory-vulnerability-part-1
[5] Haifei Li, "McAfee Labs Detects Zero-Day Exploit Targeting Microsoft Office" [Online]. Available: http://blogs.mcafee.com/mcafee-labs/mcafee-labs-detects-zero-day-exploit-targeting-microsoft-office-2
[6] Vyacheslav Zakorzhevsky, "CVE-2014-0497 – a 0-day vulnerability" [Online]. Available: https://www.securelist.com/en/blog/8177/CVE_2014_0497_a_0_day_vulnerability
[7] Elia Florio, "CVE-2013-3906: a graphics vulnerability exploited through Word documents" [Online]. Available: http://blogs.technet.com/b/srd/archive/2013/11/05/cve-2013-3906-a-graphics-vulnerability-exploited-through-word-documents.aspx
[8] Xiaobo Chen, Dan Caselden and Mike Scott, "The Dual Use Exploit: CVE-2013-3906 Used in Both Targeted Attacks and Crimeware Campaigns" [Online]. Available: http://www.fireeye.com/blog/technical/cyber-exploits/2013/11/the-dual-use-exploit-cve-2013-3906-used-in-both-targeted-attacks-and-crimeware-campaigns.html
[9] Jason Geffner, "Analysis of a CVE-2013-3906 Exploit" [Online]. Available: http://www.crowdstrike.com/blog/analysis-cve-2013-3906-exploit/index.html


Questions?

Bing Sun, Xiaoning Li (Intel Labs), Dan Sommer, and James Walter also contributed to this presentation