
June 2014

Payload Security UG (haftungsbeschränkt) [email protected], www.payload-security.com

Hybrid Analysis - NextGen Technology for Advanced Malware

As malware evolves, the era of pure dynamic analysis systems is coming to an end.

What potential does Hybrid Analysis have?

by Jan Miller ([email protected])

What you will learn…
About automated malware analysis challenges
What Hybrid Analysis is about
Why Hybrid Analysis is part of a successful strategy

What you should know…
Basic knowledge of x86 Assembly
Basic knowledge of Malware Analysis Systems

Table of Contents

Introduction
Terminology
    Static Analysis
    Dynamic Analysis
    Dormant Code
    Hybrid Analysis
Hybrid Analysis in Action
    Tools
        VirtualBox
        StaticStream
        Dynamic Analysis Tools
    Hybrid Analysis vs. Matsnu Trojan
    Conclusion
Summary
    About the Tools
    On the Web
    About the author
Table of Figures
Bibliography


Introduction

The Internet connects a wide range of personal computers used for private and business purposes. Many of them run Microsoft Windows on x86-compatible architectures, with Windows holding about 90% of the desktop market (NetMarketShare, 2014). Such monocultures are an extremely attractive environment for malware attacks. Today, malware often appears in the form of highly complex Trojan systems that come with exploit kits and very sophisticated anti-detection measures. Both the number of infections and the awareness in the industry are larger than ever: there are currently about 4 million new infections per month (SecureList, 2014). The worm MyDoom.X alone caused damages of about $38.5 billion, and that was in 2006 (Borglund, 2014). Lately, partly due to the NSA scandal, awareness of IT security has grown considerably and IT security is attracting heavy investment.

Classical malware detection methods were based on pure static code analysis, such as finding a specific byte pattern and matching it against a known database of “malicious signatures”. Static analysis can be described (in the most general sense) as code analysis without execution of the target payload. In response, malware authors started releasing packed, encrypted or even polymorphic software that rendered these classical methods largely ineffective. Consequently, anti-virus (AV) vendors, CERTs/CIRTs and malware researchers started developing and using dynamic analysis systems. Dynamic analysis can be described (in the most general sense) as code analysis during execution or emulation of the target payload. This was a huge evolutionary step, because an appropriately instrumented execution environment allows the observer to see the behavior of the target software after the malware has unpacked its protective layers.
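As a minimal sketch of the classical signature approach described above, the following Python snippet scans a byte buffer for known patterns and shows how even a trivial one-byte XOR “packer” defeats the match. The signature name, the byte pattern and the XOR scheme are illustrative assumptions and are not taken from any real signature database.

```python
# Hypothetical signature database: name -> byte pattern (None = wildcard byte).
SIGNATURES = {
    "demo_stub": [0x55, 0x8B, 0xEC, None, 0x90],
}


def matches_at(data: bytes, offset: int, pattern) -> bool:
    """Check whether the pattern matches data starting at offset."""
    if offset + len(pattern) > len(data):
        return False
    return all(p is None or data[offset + i] == p for i, p in enumerate(pattern))


def scan(data: bytes, signatures=SIGNATURES):
    """Return a list of (signature name, offset) for every hit in data."""
    hits = []
    for name, pattern in signatures.items():
        for offset in range(len(data) - len(pattern) + 1):
            if matches_at(data, offset, pattern):
                hits.append((name, offset))
    return hits


def xor_pack(data: bytes, key: int) -> bytes:
    """Toy stand-in for a packer: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)


sample = b"\x00\x55\x8b\xec\x10\x90\x00"
plain_hits = scan(sample)                   # the pattern is found in the clear
packed_hits = scan(xor_pack(sample, 0x5A))  # the same bytes, "packed", no longer match
```

The scan only sees the bytes on disk, which is why the signature stops matching as soon as the payload is transformed, while the unpacked code would still be visible in memory at run-time.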

Today, dynamic analysis systems run the target software in virtual environments with hardware acceleration support (such as VMware or VirtualBox) in order to observe the malware’s behavior at runtime. These systems, which are often automated, are called “Sandbox” analysis systems, as they provide an isolated execution environment for malware that simulates a real victim’s machine [1]. With systems such as VirtualBox, the virtual machine (VM) can be reset to a clean state by loading predefined snapshot files, which allows numerous malware samples to be executed in sequence without having to rebuild the infected machine. Of course, malware authors have adapted to the growth of Sandbox systems and introduced a variety of VM detection methods. If a VM environment is detected, the malware may behave differently than it would in the wild and not show its true behavior. The malicious functionality that is not observed is what we call dormant code. Evasion techniques range from delayed execution (so-called “time bombs”) to complex system and hardware state detection methods. For example, if the real payload is not executed within a reasonable amount of time, the analysis system will give up on the analysis and potentially miss valuable information. Thus, dormant code detection is a vital requirement for Sandbox systems. Analysis results get even better when dormant code is analyzed in depth using runtime context information.
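The snapshot-based sandbox cycle described above could be driven roughly as follows. This is only a sketch: the VM name `winxp-analysis` and the snapshot name `clean` are hypothetical, and the step that copies the sample into the guest and starts it is deliberately left out. It relies on the `VBoxManage` subcommands `snapshot … restore`, `startvm` and `controlvm … poweroff`.

```python
import subprocess
import time


def sandbox_commands(vm: str, snapshot: str):
    """Return the VBoxManage invocations for one analysis cycle."""
    return {
        "restore": ["VBoxManage", "snapshot", vm, "restore", snapshot],
        "start": ["VBoxManage", "startvm", vm, "--type", "headless"],
        "stop": ["VBoxManage", "controlvm", vm, "poweroff"],
    }


def run_cycle(vm, snapshot, seconds, runner=subprocess.run, sleep=time.sleep):
    """Reset the VM to a clean state, let it run for the time limit, power it off.

    runner and sleep are injectable so the sequence can be checked without
    a VirtualBox installation.
    """
    cmds = sandbox_commands(vm, snapshot)
    runner(cmds["restore"], check=True)  # back to the clean snapshot
    runner(cmds["start"], check=True)    # boot the guest
    sleep(seconds)                       # analysis time limit
    runner(cmds["stop"], check=True)     # discard the infected state


# Example (requires VirtualBox and an existing VM/snapshot, names are hypothetical):
# run_cycle("winxp-analysis", "clean", seconds=300)
```

A sample with a “time bomb” that sleeps longer than `seconds` would leave nothing to observe in such a cycle, which is exactly the dormant code problem.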

Combining static and dynamic analysis (commonly called Hybrid Analysis) in a fully automated, scalable and performant analysis environment is the next generation of malware forensics and detection. In this article, we will look at which dynamic analysis data is necessary to understand dormant code and how we can combine it with static analysis to extract in-depth behavior information.

[1] Executing malware on a prepared physical machine is possible as well, of course.






Terminology

This chapter outlines the most important terms, so that all readers share the same understanding when the terms are used later in the article.

Static Analysis

Static analysis can be described in the most general sense as code analysis without execution of the

target payload. The target code (the analysis input data) may be a compiled binary file or a human-

readable format, such as program source code, scripting language files or any other type of machine

code representation. N. Ayewah et al. define static analysis as a method that “(…) examines code in the

absence of input data and without running the code, and can detect potential security violations (…),

runtime errors (…) and logical inconsistencies (…).” (Nathaniel Ayewah, David Hovemeyer, J. David

Morgenthaler, John Penix and William Pugh, 2008).

Dynamic Analysis

Dynamic analysis can be described in the most general sense as code analysis during execution or emulation of the target payload. The techniques involved are usually implemented by tools such as execution visualizers, system observation tools (e.g. malicious behavior detection, intrusion detection, performance observation, etc.), profilers or other types of behavior analysis tools (e.g. sandbox systems). The technique generally used to perform dynamic analysis is instrumentation of the target code or its host (i.e. instrumenting the operating system to enable system-level profiling of the suspect application) in order to profile the target code’s behavior (Kendall, 2007). Instrumentation refers to techniques that insert additional code for analysis purposes (instrumentation code) into the target code in order to measure performance, detect bugs or intercept the code flow to analyze certain behavior patterns. In malware analysis, behavior patterns are often the most interesting.
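To illustrate the idea of instrumentation, here is a minimal Python analogy of a detour: a function is replaced by a wrapper that logs the call and then forwards it to the original. Real monitors patch native API entry points inside the target process; the `fake_api` object and its `create_file` function are hypothetical stand-ins for a system API.

```python
import functools
import types

call_log = []  # collected (function name, args, kwargs) tuples


def hook(module, name):
    """Replace module.name with a wrapper that logs each call, then forwards it.

    Returns the original function so that the hook can be removed again.
    """
    original = getattr(module, name)

    @functools.wraps(original)
    def detour(*args, **kwargs):
        call_log.append((name, args, kwargs))  # the inserted instrumentation code
        return original(*args, **kwargs)       # continue with the original behavior

    setattr(module, name, detour)
    return original


# Hypothetical stand-in for a system API module.
fake_api = types.SimpleNamespace(create_file=lambda path: f"handle:{path}")

original_create_file = hook(fake_api, "create_file")
result = fake_api.create_file("C:\\Matsnu.exe")  # the caller does not notice the hook
```

After the call, `call_log` holds the observed behavior while `result` is unchanged, which is the property a monitor needs: the target keeps working while its calls are recorded.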

Dormant Code

Dormant code, or dormant functionality, in malicious programs is payload or code that is not observed during dynamic analysis. In the context of malware, dormant code (not to be confused with “software rot”) may hide very interesting behavior that was not executed during analysis for whatever reason (e.g. due to virtual machine detection, a command and control server not being available, a long initial sleeping delay, etc.). We can say that every pure dynamic analysis that reports “no malicious behavior” always leaves some dormant code unexamined (as the executed code coverage is never 100%), and sometimes that dormant code is malicious. As the “false negative” case (i.e. considering something clean that is not) is to be avoided at all cost, it makes sense to invest resources into detecting dormant code. This can be achieved, for example, by adding a static analysis layer that operates on memory snapshots.
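Conceptually, dormant code is the difference between what static analysis finds in a snapshot and what was actually executed. The sketch below makes this concrete under simplifying assumptions: functions are given as half-open address ranges, the execution trace is a plain list of instruction addresses, and all names and addresses are invented for illustration.

```python
def dormant_functions(static_functions, executed_addresses):
    """Return the names of functions that were never entered during execution.

    static_functions:   dict name -> (start, end), half-open address range
                        found by static analysis of a memory snapshot
    executed_addresses: iterable of instruction addresses seen at run-time
    """
    executed = set(executed_addresses)
    dormant = []
    for name, (start, end) in static_functions.items():
        if not any(start <= addr < end for addr in executed):
            dormant.append(name)
    return sorted(dormant)


# Hypothetical snapshot contents and run-time trace.
functions = {
    "unpack": (0x401000, 0x401080),
    "encrypt_files": (0x401080, 0x401200),
    "contact_c2": (0x401200, 0x401300),
}
trace = [0x401000, 0x401010, 0x401204]

never_executed = dormant_functions(functions, trace)
```

In this toy trace only `encrypt_files` was never reached, so a purely dynamic report would not mention the most interesting behavior of the sample.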

On a side note: process memory context constantly changes. Thus, it is necessary to take memory snapshots at an intelligently chosen point in time or at a high frequency in order to “catch” unpacked code, injected shellcode and the like. In a “perfect” world with quantum processors, an analysis system would be able to observe any memory change and instantly analyze the entire process address space for all potentially executable code locations without any impact on performance. Unfortunately, we do not have quantum computers and therefore need to rely on heuristics and shortcuts, which leaves room for mistakes.






For example, analysis systems that run through thousands of files per day have an analysis time limit that they have to abide by. If nothing happens within the first ~5-10 minutes, the system moves on to the next file and heuristics have to do the job. Thus, the better and more intelligent the underlying algorithms and the overall performance of the system are, the more files can be analyzed completely and with fewer errors. Of course, scalable systems and a lot of hardware can compensate for a bad implementation to some degree, but real-world hardware is always limited and other bottlenecks surface on large parallel systems. In other words, quality starts at the lowest level and requires a flexible architecture.

Hybrid Analysis

Hybrid Analysis (HA) is what we call the intelligent combination of static and dynamic analysis. It is a technology or method that integrates run-time data extracted from dynamic analysis into a static analysis algorithm in order to detect behavior or malicious functionality that would otherwise not be as easily detectable. Often, the dynamic “helper data” consists of memory snapshots and runtime API symbol data (memory reference address values), which are fed as input to a sophisticated static analysis engine (possibly including data flow analysis). For example, if a dormant code sequence executes an indirect call, it is not possible to resolve the called function address without knowing the value read from the memory location at the point in time of execution [2]. Even if we knew the value, it would not be possible to associate the called function address with a system call unless a mapping of memory references to symbol information is available for the specific execution environment [3].

[2] Using a memory snapshot from a later point in time is possible as well, if the value remains unchanged.

[3] The reference to a “specific execution environment” is important, because techniques such as ASLR (address space layout randomization) make system API function addresses unpredictable. As such, we always need to understand detected dormant code in the process context of a specific execution environment.
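The indirect call example can be sketched in a few lines. Assume a static disassembler has found `call dword ptr [0x400010]` in dormant code: the pointer value comes from the memory snapshot, and the name comes from the symbol map logged for that execution environment. The base address, the pointer location and the API address below are invented for illustration; only the two-step lookup is the point.

```python
import struct


def read_dword(snapshot: bytes, base: int, address: int) -> int:
    """Read a little-endian 32-bit value at a virtual address from a dump mapped at base."""
    offset = address - base
    if offset < 0 or offset + 4 > len(snapshot):
        raise ValueError("address outside snapshot")
    return struct.unpack_from("<I", snapshot, offset)[0]


def resolve_indirect_call(snapshot, base, pointer_address, symbols):
    """Resolve `call dword ptr [pointer_address]` to (target address, symbol or None)."""
    target = read_dword(snapshot, base, pointer_address)  # step 1: value from the snapshot
    return target, symbols.get(target)                    # step 2: name from the symbol map


# Hypothetical snapshot: 32 bytes mapped at 0x400000 with a pointer stored at 0x400010.
BASE = 0x400000
dump = bytearray(0x20)
struct.pack_into("<I", dump, 0x10, 0x77DD1234)

# Hypothetical symbol map captured in the same execution environment.
symbol_map = {0x77DD1234: "ADVAPI32.dll!RegCreateKeyExW"}

resolved = resolve_indirect_call(bytes(dump), BASE, 0x400010, symbol_map)
```

Without the snapshot the target address is unknown, and without the symbol map it is just a number; a map from a different machine or boot would point to the wrong function because of ASLR.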






Hybrid Analysis in Action

The previous chapter briefly outlined Hybrid Analysis and its associated terms. In this chapter we apply Hybrid Analysis techniques to a sample malware and evaluate the results in order to look at the practical side of the topic.

Tools

Before we get to the experimental results, the involved tools will be outlined briefly.

VirtualBox

For our example malware analysis, we will be using VirtualBox as our preferred virtual machine

environment. On its main page, Oracle states that “VirtualBox is a powerful x86 and AMD64/Intel64

virtualization product for enterprise as well as home use. Not only is VirtualBox an extremely feature

rich, high performance product for enterprise customers, it is also the only professional solution that is

freely available as Open Source Software under the terms of the GNU General Public License (GPL)

version 2.” (VirtualBox) Sounds good? It is good. Definitely good enough to show what HA is about.

StaticStream

StaticStream is our preferred static analysis engine, as it can take dynamic data (such as memory

snapshots, symbol data) and put it together using HA technology. The webpage describes it as

follows: “StaticStream is a high-performance static analysis engine that is written in C++ and can

analyze x86 PE files, memory dumps or shellcode. It uses a novel approach of combining dynamic data

with state of the art static analysis techniques in order to detect and understand dormant code. It offers

a wide range of configuration options and regular updates.” (Payload Security)

Dynamic Analysis Tools

For run-time data capturing we are going to use the AREE (Automatic Reverse Engineering Engine) Manager and Monitor binaries. These are two in-house tools used at Payload Security to generate dynamic data when running malware. They work similarly to the Cuckoo Sandbox monitor library “CuckooMon” in the sense that they detour calls at the application level. The Manager is used to load configuration data and start the analysis. The Monitor is a DLL file that is injected into the initial malware process, where user-level hooks are applied to catch system API calls. Whenever the malware tries to inject itself into another process (e.g. using a remote thread or other techniques), the Monitor is applied to the new target process as well. In order for our experiment to be successful, injected shellcode, memory dumps, process context (loaded modules, registry accesses, mutants, etc.) and symbol information (module exports) are logged before the malware is able to modify or taint the data. Why did we use our own tools? We chose them only because the dynamic data they generate has a format that StaticStream understands directly, which makes it easier to show how HA works. If you want to replicate our experiment and try out the tools, feel free to contact us.






Hybrid Analysis vs. Matsnu Trojan

Now that we know about the tools involved, let us take a look at real malware and see HA in action. For our “experiment”, we decided to use a Trojan called Matsnu [4] that encrypts files on the target drive in order to hold the data for ransom. These are the steps we will be taking:

1. Install a VirtualBox instance with a typical OS, such as Windows XP
2. Load the Matsnu sample onto the virtual machine drive
3. Run the Matsnu sample using AREEv2Mgr and inject the AREEv2Mon monitor library
4. Let the analysis run for a couple of seconds (it is enough) and grab the generated run-time data
5. Take the grabbed run-time data and use it to analyze memory snapshots using HA technology
6. Evaluate the results and draw a conclusion

First, let us install Windows XP and load Matsnu on the main drive. The following screenshot shows the

system after setup shortly before an analysis.

Figure 1: Start Screen after Installing Windows XP and loading “matsnu” on the main drive

[4] SHA-256: e008e161cce090242262fc977b6fe707d3058cdaa3b5d5c3bab24c8c6b05ce9e






As we can see, there is a “shared folder” (release) open with the Manager ready to start the Matsnu application. We also notice that Matsnu uses a PDF icon in order to mislead the Windows user into thinking that the file is a document and not an executable. As Windows hides known file extensions by default, the user cannot tell at first sight that it is an executable.

In the next screenshot, the Manager is open and we use the command “.run C:/Matsnu” to start the analysis manually. There is also a command-line interface, but it is not covered here.

Figure 2: Running “matsnu” from the Manager using the interactive mode

At this point we can already observe an output folder “AREE” that has been created on the C: drive. It will contain all the dynamic analysis information. Also, the Matsnu file is missing. Checking the captured files in the “AREE” folder, we find that this is implemented using a dynamically created batch file, which deletes itself after deleting the original file “Matsnu.exe” on the C: drive. The batch file is executed from a duplicated process, so that the original file is no longer in use by the OS. This is the batch file content:

:l

if not exist "C:\Matsnu.exe" goto e

del /Q /F "C:\Matsnu.exe"

goto l

:e

del /Q /F "C:\DOCUME~1\mjkdmjmj\APPLIC~1\5176313.bat"






All in all, the malicious process duplicates itself upon startup and deletes the original file, but continues to exist. From the user’s point of view the PDF file has simply disappeared, and the malware authors probably assume that the user will continue with daily business without giving further thought to what happened.

After running the sample for a couple of seconds, we abort the analysis, quit the VM and take a look at the captured dynamic data. This is what the dynamic data folder looks like.

Figure 3: Dynamic Data Folder

The “api” folder contains system calls and parameters, the “bin” folder contains captured files (e.g. the

*.bat file mentioned above), the “ctx” folder contains environment data (such as loaded modules, their

symbols, registry accesses, etc.), the “dmp” folder contains memory snapshots of multiple frames and

the “shc” folder contains extracted shellcodes. The “monprocs.csv” file contains an overview of all

monitored processes. In this case, the contents are similar to the following (reduced version):

15539444-00013192,"INJECT_NEW","c:\Matsnu.exe","\Device\HarddiskVolume1\Matsnu.exe","<date>"

15540015-00013280,"INJECT_EXISTING","C:\WINDOWS\system32\cmd.exe","\Device\HarddiskVolume1\WINDOWS\system32\cmd.exe","<date>"

15540115-00001528,"INJECT_EXISTING","C:\WINDOWS\Explorer.EXE","\Device\HarddiskVolume1\WINDOWS\explorer.exe","<date>"
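A log like this is easy to post-process. The following sketch parses the three records shown above with Python’s standard `csv` module and lists the processes the monitor followed into. The column names are assumptions made for this example, since the file itself carries no header row.

```python
import csv
import io

# Assumed column names; the original file has no header row.
FIELDS = ["process_id", "action", "image_path", "device_path", "timestamp"]

SAMPLE = r'''15539444-00013192,"INJECT_NEW","c:\Matsnu.exe","\Device\HarddiskVolume1\Matsnu.exe","<date>"
15540015-00013280,"INJECT_EXISTING","C:\WINDOWS\system32\cmd.exe","\Device\HarddiskVolume1\WINDOWS\system32\cmd.exe","<date>"
15540115-00001528,"INJECT_EXISTING","C:\WINDOWS\Explorer.EXE","\Device\HarddiskVolume1\WINDOWS\explorer.exe","<date>"
'''


def load_monitored_processes(text):
    """Parse monprocs.csv content into a list of dicts keyed by FIELDS."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return list(reader)


def injected_targets(rows):
    """Return the image paths of already running processes that were monitored."""
    return [row["image_path"] for row in rows if row["action"] == "INJECT_EXISTING"]


rows = load_monitored_processes(SAMPLE)
targets = injected_targets(rows)  # cmd.exe first, then Explorer.EXE
```

The order of the rows mirrors the order of events, so the sequence Matsnu.exe, cmd.exe, Explorer.EXE can be read directly from the parsed list.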

We quickly see that Matsnu first runs the batch file and then injects itself into “explorer.exe” where it

remains to execute most of its payload. This makes manual debugging with e.g. OllyDbg more difficult.

Consequently, we first try to analyze the memory dump files (ignoring all system files) from the

explorer.exe process using symbol memory references and module information as “context

information”, which is one of the ideas of Hybrid Analysis. Specifically, we start StaticStream letting it

analyze the last frame of the process (i.e. the last “dump” we logged before quitting the VM), because it

often contains already unpacked code sequences. The following is StaticStream’s output in shortened

form (it passed about 1.66 million instructions, including data flow analysis, in roughly 3.3 seconds):

Welcome to AREE v2.1

Starting analysis ...

Adding undefined memory file 15540115-00001528.00000002.15561486.2B90000.00000040.mdmp (POI: 0, Executable: 1) for later analysis

…

Found a hidden PE file in memory file 15540115-00001528.00000002.15561486.3730000.00000002.mdmp at 3730000

…

Analyzing in-memory binary file 15540115-00001528.00000002.15561486.3730000.00000002.mdmp

Analyzing 1 exports

1 of 1 exports accepted

No packed files could be detected

…

Running heuristic scan on binary file 15540115-00001528.00000002.15561486.3730000.00000002.mdmp






…

Generating final analysis report

Number of passed instructions: 1660669

Finished analysis in 3276 ms with a throughput of 445 KB/s
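The output above reports a hidden PE file inside a memory dump. How StaticStream detects such files is not described here; a common heuristic, sketched below, is to look for an `MZ` header whose `e_lfanew` field (at offset 0x3C) points to a `PE\0\0` signature. The helper `make_fake_pe` only builds a minimal test image for the example and is not a valid executable.

```python
import struct


def find_pe_headers(dump: bytes):
    """Return the offsets where an MZ header points to a valid PE signature."""
    hits = []
    pos = dump.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(dump):  # a full DOS header must fit
            e_lfanew = struct.unpack_from("<I", dump, pos + 0x3C)[0]
            sig = pos + e_lfanew
            if dump[sig:sig + 4] == b"PE\x00\x00":
                hits.append(pos)
        pos = dump.find(b"MZ", pos + 1)
    return hits


def make_fake_pe(e_lfanew: int = 0x80) -> bytes:
    """Build a minimal byte sequence with MZ header, e_lfanew and PE signature."""
    image = bytearray(e_lfanew + 4)
    image[0:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, e_lfanew)
    image[e_lfanew:e_lfanew + 4] = b"PE\x00\x00"
    return bytes(image)


# Hypothetical dump: padding, an embedded image, then a stray "MZ" that is not a header.
memory_dump = b"\x00" * 0x100 + make_fake_pe() + b"MZ junk"
pe_offsets = find_pe_headers(memory_dump)
```

Checking the `PE` signature as well as the `MZ` magic keeps stray two-byte matches, such as the trailing `MZ junk`, out of the result.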

This is an excerpt of what one output folder with stream files containing disassembly listings looked like

(human-readable output is the default behavior):

Figure 4: Streams Folder File Listing

Browsing some of the stream files by hand quickly reveals that one portion of the streams contains encrypted payload and another portion contains unencrypted payload. Here are some of the more interesting functions that could be used for post-processing to generate behavior signatures or as an entry point for additional manual analysis:

Figure 5: Persistence using RegCreateKeyEx






The above “code sequence” (or “Stream”) shows the call to RegCreateKeyExW in ADVAPI32.dll, which would not be detected using pure static analysis, as the memory reference of the indirect call could not be resolved. In this case, the registry key was created and its value was set during execution, as indicated by the registry logfile of the dynamic analysis (i.e. the associated code sequence is not dormant code):

Figure 6: Persistence using Registry

Converting the hex values to ASCII reveals the following path:

C:\Documents and Settings\mjkdmjmj\Application Data\Microsoft\qfpvideo.exe
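The hex-to-text conversion mentioned above takes only a few lines of Python. Since the logged bytes from Figure 6 are not reproduced in the text, the example below encodes the path itself to obtain test input; the exact log layout and the encoding (plain ASCII versus UTF-16, as wide-character registry APIs would store it) are assumptions.

```python
def hex_to_text(hex_string: str, encoding: str = "ascii") -> str:
    """Decode a hex dump (whitespace allowed) into text, dropping trailing NULs."""
    cleaned = "".join(hex_string.split())
    return bytes.fromhex(cleaned).decode(encoding).rstrip("\x00")


path = r"C:\Documents and Settings\mjkdmjmj\Application Data\Microsoft\qfpvideo.exe"

# Stand-ins for the logged registry data: the same path as ASCII and as UTF-16LE hex.
ascii_hex = path.encode("ascii").hex()
wide_hex = (path + "\x00").encode("utf-16-le").hex()

decoded_ascii = hex_to_text(ascii_hex)
decoded_wide = hex_to_text(wide_hex, encoding="utf-16-le")
drive_prefix = hex_to_text("43 3a 5c")  # the bytes 0x43 0x3A 0x5C
```

If the decoded text shows a NUL character after every letter, the data was most likely stored as UTF-16 and should be decoded with `utf-16-le` instead of `ascii`.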

Matsnu obviously tries to survive a reboot by adding itself to the auto-start registry, which is a very

common technique. Checking more streams, another interesting entrypoint was found quickly. It is the

function that encrypts the Command & Control server requests before sending the data over an

alternate HTTP connection.

Figure 7: Encrypting Payload before C&C request

The code location above is a good starting point to check cross-references and intercept the encrypted

key creation (of course, this requires a flexible monitor system). Also, please note that a run-time

capturing mechanism located at the kernel level would not be able to capture the

unencrypted data without hooking into user mode and becoming detectable again.






Today, more and more malware is using encrypted traffic (not only HTTPS, but the payload itself being

encrypted as well), making it necessary to move closer to the malware code itself, as

encryption/decryption of important system data happens at the application level.

On a side note, the HA technology also revealed the following C&C server IP addresses using the

alternate HTTP port 8080:

50.31.146.134:8080
204.197.254.94:8080
78.129.181.191:8080
27.124.127.10:8080
173.203.112.215:8080
50.97.99.2:8080
103.25.59.120:8080
5.135.208.53:8080
50.31.146.109:8080
204.93.183.196:8080

… and a lot more interesting dormant code sequences, which are not outlined here.
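How the HA engine recovered these addresses is not detailed here. As one simple illustration of post-processing, the sketch below pulls `ip:port` strings out of raw bytes such as an unpacked memory dump, using a regular expression plus range checks. The sample blob is made up for the example and reuses two of the addresses listed above.

```python
import re

# Dotted quad followed by ":port"; lookarounds avoid matching inside longer digit runs.
ENDPOINT = re.compile(rb"(?<![\d.])((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})(?!\d)")


def extract_endpoints(blob: bytes):
    """Return unique (host, port) pairs found in blob, in order of appearance."""
    found = []
    for match in ENDPOINT.finditer(blob):
        host = match.group(1).decode("ascii")
        port = int(match.group(2))
        valid_host = all(int(octet) <= 255 for octet in host.split("."))
        if valid_host and port <= 65535 and (host, port) not in found:
            found.append((host, port))
    return found


# Hypothetical dump fragment with NUL-separated strings and some noise.
blob = (
    b"\x00\x01GET /\x00"
    + b"50.31.146.134:8080" + b"\x00"
    + b"junk999.1.1.1:80" + b"\x00"
    + b"204.197.254.94:8080" + b"\x00"
    + b"50.31.146.134:8080"
)
endpoints = extract_endpoints(blob)
```

This only works on data that is already decrypted in memory, which is why the snapshot has to be taken after the malware has unpacked itself.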

Conclusion

Although the Matsnu Trojan is not the most sophisticated malware available today, it is a good example, because it reflects typical, state-of-the-art aspects. Its network communication uses encrypted payloads, it tries to hide its payload by injecting itself into a variety of processes, it decrypts its payload inside explorer.exe, which makes manual debugging difficult, and so forth. Using run-time data capturing tools, we were able to extract a lot of information, including dormant code and complete symbol information. Of course, the dynamic analysis tool was required to follow the malware into explorer.exe and remain undetected. As a next step, the static analysis engine StaticStream associated the run-time data with the code and quickly generated code sequences for post-processing, allowing us to find valuable analysis entry points and behavior data that would remain unseen by a pure dynamic analysis engine.

In general we can say that static analysis is good, if the to-be-analyzed data is not encrypted, not

obfuscated and available in a more or less complete manner, etc. Sadly, this is not often the case with

malware today. Furthermore, we can say that dynamic analysis is good as well, but it misses dormant

code and potentially malicious functionality. As we cannot make any qualified statements about the

unknown, it is impossible for a pure dynamic analysis system to safely make a statement about a file

being benign/clean, because maybe the real payload was never executed. Thus, new Hybrid Analysis

(HA) technologies are not only a necessity, but part of a future solution in the battle against malware. Due to

the additional overhead imposed by hybrid technologies, very efficient and performance-oriented

algorithms are necessary, especially if viewed on a large scale.






Summary

In this article we outlined that today’s malware development is opening up new challenges for malware

analysis systems. In the early days, simple static analysis byte patterns were enough to detect and

classify malware. Then, as malware became more sophisticated, dynamic analysis systems that observed

run-time behavior surfaced. The dynamic analysis systems have evolved and are a powerful tool today,

but their impact is becoming more and more limited. Today, neither static nor dynamic analysis alone is

an effective weapon against modern malware. Dynamic analysis environments are being detected,

and/or malicious dormant code is not being analyzed due to time constraints or unpredictable code

flow behavior. Using intelligent algorithms and Hybrid Analysis (HA) technologies, the best of both

worlds can be put together: first-pass checks, analyzing/logging run-time behavior, as well as detecting

and understanding dormant code functionality. In this article we showed that Hybrid Analysis is an

answer, if the run-time data captured has a sufficient quality and the static analysis engine is flexible

enough to produce usable analysis results that can be post-processed to generate signatures or

indicators.

About the Tools

In this article we focused on a static analysis engine called StaticStream. It is a product of Payload

Security and makes automatic and efficient Hybrid Analysis available to dynamic analysis systems and

analysts. Its easy interface, high configurability and flexible data stream processing architecture make it

an interesting option to upgrade any dynamic analysis system for challenges today and tomorrow.

On the Web

More information on StaticStream is available on the web at www.payload-security.com.

About the author

Jan Miller is a specialist in static binary analysis algorithms, reverse engineering and malware signatures. He is the CEO and founder of Payload Security UG (haftungsbeschränkt). Over the past two years, he has focused on Android-based malware, as well as on implementing Hybrid Analysis technologies for a leading dynamic analysis system.







Table of Figures

Figure 1: Start Screen after Installing Windows XP and loading “matsnu” on the main drive
Figure 2: Running “matsnu” from the Manager using the interactive mode
Figure 3: Dynamic Data Folder
Figure 4: Streams Folder File Listing
Figure 5: Persistence using RegCreateKeyEx
Figure 6: Persistence using Registry
Figure 7: Encrypting Payload before C&C request






Bibliography

Borglund, J. (2014, April). Top 5 Most Costly Viruses of All Time. Retrieved April 2014, from TopTen Reviews: http://anti-virus-software-review.toptenreviews.com/top-5-most-costly-viruses-of-all-time-pg5.html

Cuckoo Sandbox. (n.d.). Malwr - Malware Analysis by Cuckoo Sandbox. Retrieved June 24, 2014, from https://malwr.com/analysis/YjQzNzExNjcwMDQyNDBhMmJmOTFhN2Y4ODk5ZmQ0NGM/

Kendall, K. (2007). Practical Malware Analysis. Mandiant, Intelligent Information Security.

Nathaniel Ayewah, David Hovemeyer, J. David Morgenthaler, John Penix and William Pugh. (2008). Experiences Using Static Analysis to Find Bugs.

NetMarketShare. (2014, April). Desktop Operating System Market Share. Retrieved April 2014, from http://www.netmarketshare.com/

Payload Security. (n.d.). Payload-Security.com - Combining Static and Dynamic Analysis Intelligently. Retrieved June 24, 2014, from http://www.payload-security.com/

SecureList. (2014, April). Internet threats statistics. Retrieved April 2014, from SecureList: http://www.securelist.com/en/statistics#/en/map/oas/month

VirtualBox. (n.d.). Oracle VM VirtualBox. Retrieved June 24, 2014, from https://www.virtualbox.org/


