behavioural malware analysis using sandnets

Introduction

The proliferation of malware continues to grow at a staggering rate. An estimated 250 new malware variants are introduced into the wild every day [1]. The vast majority of these threats are not new creations, but simply a rehash of a known subset of malware. Unfortunately, it is not always easy to determine whether a new malware sample has been seen before or is, in fact, an original development.

The primary reason for the constant flood is the underground economy that has formed around malware over the past several years. Most other online crime, such as phishing, spam and extortion, has some link to malware. Malware acts as a facilitator: by allowing computer criminals to channel their traffic through infected end-user systems, it gives them the means to carry out their illicit operations without fear of being caught or put out of business. This simple concept has caught on with almost every cybercrime group in existence, and is fueling the creation of new malware every day.

The need for malware analysis

At some point most organizations recognize the need to detect and classify the malware that is infecting their computing base. Most of the time, this means relying on an antivirus company to provide detection, analysis and remediation. Even though the detection is often delayed and the analysis terse, for most companies this is sufficient. However, some might feel an increased risk from malware, especially targeted attacks, where signature-based antivirus is least effective. For these situations, an organization might choose to deploy an in-house malware analysis team.

Two approaches generally govern the cost of performing routine malware analysis. First, there is the reverse engineering approach, where malware is painstakingly analyzed using disassemblers and decompilers. This method requires a skillset that is not widely found in the general IT population. Hiring a full-time malware analyst for what is often a sporadic need is therefore rarely the most economical solution.

Behavioural analysis

There is an easier approach to finding out what malware does without needing a high level of reverse engineering skill. This method is what we call “behavioural analysis”: simply watching what changes are made to an infected system, along with the network traffic sent by the malware. Although this can lead to incomplete results (as the malware may never execute certain branches of code under the tested set of conditions), in most cases it is good enough to understand the scope of the threat and deploy countermeasures and remediation.

Behavioural analysis has varying degrees of success depending on the ability of the malware to interact with its environment. Solutions which simply “sandbox” the malware and record its local activity are liable to miss a large part of the picture. Modern malware often uses the Internet as a way to communicate status and to download additional components “on-the-fly.” Many of the malware samples we see every day are simply “downloaders,” whose only task is to obtain another executable from the Internet and execute it. Without an easy way to watch the exchange between a piece of malware and its controller, we are often left in a state in which the malware is waiting for instructions and no more activity occurs, giving us nothing else to record and analyze.

At the same time, we can’t simply allow malware to have unfettered access to the Internet. There is always the danger that a sample under analysis may be a self-spreading worm, which may propagate to hosts inside or outside our network. This is an unacceptable risk. Selectively firewalling communication to and from the infected host reduces the chance of the malware spreading out of our control, but it may also interfere with our analysis, and it does not completely remove the risk of an outbreak. This is, therefore, also not an acceptable solution.

Computer Fraud & Security, December 2006

Behavioural malware analysis using Sandnets

Joe Stewart

Malware analysis has long been an arcane art, left to those who have advanced low-level reverse engineering skills and dedicated research labs. For years this has been solely the domain of the professional antivirus company and its researchers. However, as more organizations see the need for employing additional defensive mechanisms above and beyond antivirus scanning, the ability to observe the behavior of malware as it interacts with the network and the operating system becomes imperative. This article introduces a system for quick and nearly-automated behavioural analysis of malware.

MALWARE ANALYSIS


“250 malware variants get into the wild every day”


The Sandnet concept

The best approach to behavioural analysis, one which protects against proliferation of hostile code while also allowing as broad an analysis as possible, is through what we have termed a “Sandnet”. A Sandnet can be thought of as a sandbox that encompasses an entire network. In fact, it can be used to simulate the entire Internet for the benefit of our malware. In order to provide a fake Internet to our infected hosts, we merely need to answer on every possible IP address, for every possible IP-based protocol and the services provided by those protocols. In most cases, emulating every IP protocol is not necessary, as most of these protocols are not in common use by malware. Providing TCP, UDP and ICMP services on all possible IP addresses is typically enough.

Luckily, modern Unix-like operating systems such as Linux or one of the BSD flavours make this task easy. Using the built-in firewall capabilities, it is possible to route and mangle packets to our heart’s content. One network interface on one machine can listen for and provide answers to traffic destined for any IP address and service. The only thing required is that the machine is configured as the default route for our infected machine. This entire arrangement can be orchestrated with the simplest of network configurations, consisting of a crossover ethernet cable between the infected system and our “faux Internet” host.
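As a rough illustration of the firewall step on Linux, the Python sketch below only builds (and does not execute) iptables commands that use the `REDIRECT` target in the `nat` table's `PREROUTING` chain, which rewrites the destination of incoming packets to the faux Internet host itself. The function name `redirect_rules`, the interface name `eth1` and the choice of one rule per protocol are assumptions made for this example, not a description of any particular Sandnet; the rules should be reviewed and tested on an isolated machine before use.

```python
from typing import List, Sequence


def redirect_rules(iface: str = "eth1",
                   protocols: Sequence[str] = ("tcp", "udp", "icmp")) -> List[List[str]]:
    """Build iptables argument lists that redirect all traffic arriving on
    `iface` to the local machine, whatever destination IP it was sent to.

    `iface` is a hypothetical name for the interface facing the infected host.
    Nothing is executed here; the caller decides whether to run the commands.
    """
    rules = []
    for proto in protocols:
        rules.append([
            "iptables", "-t", "nat", "-A", "PREROUTING",
            "-i", iface,      # only packets coming from the infected host
            "-p", proto,      # one rule per protocol we want to answer
            "-j", "REDIRECT"  # rewrite the destination to this machine
        ])
    return rules


def as_shell(rules: List[List[str]]) -> str:
    """Render the rules as shell lines for manual review."""
    return "\n".join(" ".join(rule) for rule in rules)
```

Printing `as_shell(redirect_rules("eth1"))` gives three lines that can be inspected by hand; keeping generation separate from execution makes it harder to alter a production firewall by accident.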

It is also possible to create an entire sandnet on a single host, using virtual machine software such as VMware to run the infected operating system. Communication with the host operating system is done using a private “host-only” virtual network device instead of a crossover cable.

Becoming the Internet

Once we have configured the network, we need to provide services for the malware to talk to. Typical malware will most often need to communicate via HTTP, SMTP, DNS, and sometimes FTP. These services can be provided by standard servers listening on the local host interface of our faux Internet host. For the more paranoid, this may be an unacceptable risk, since there is always the possibility that the malware has exploit code for one or more of these servers. This could allow the malware to breach the containment of the Sandnet under certain conditions. A lower-risk approach is to emulate these protocols using custom-built minimal servers written in a language which is resistant to buffer overflow attacks. In many cases we can emulate “just enough” of a protocol to make the malware function normally, without a great deal of programming. The trade-off is that it becomes a little easier to detect the presence of a Sandnet when servers don’t behave exactly as expected.
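As an illustration of emulating “just enough” of a protocol, the following Python sketch builds a DNS reply that points every queried name at a single address. It is not the code used in any particular Sandnet: the function name `build_dns_reply`, the default address `10.0.0.1` and the 60-second TTL are assumptions made for the example, and only a single, uncompressed question is handled.

```python
import socket
import struct


def build_dns_reply(query: bytes, fake_ip: str = "10.0.0.1") -> bytes:
    """Answer any DNS question with one A record pointing at `fake_ip`."""
    if len(query) < 12:
        raise ValueError("query shorter than a DNS header")

    # Walk the QNAME labels to find where the question section ends.
    pos = 12
    while True:
        if pos >= len(query):
            raise ValueError("truncated question name")
        length = query[pos]
        if length & 0xC0:
            raise ValueError("compressed names are not handled in this sketch")
        pos += 1
        if length == 0:
            break
        pos += length
    end = pos + 4  # QTYPE and QCLASS follow the name
    if end > len(query):
        raise ValueError("truncated question")
    question = query[12:end]

    # Header: same transaction ID, flags 0x8180 (response, recursion
    # desired and available, no error), one question, one answer.
    header = query[:2] + struct.pack("!HHHHH", 0x8180, 1, 1, 0, 0)

    # Answer: pointer to the name at offset 12, type A, class IN,
    # TTL 60 seconds, 4 bytes of address data.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, 60, 4) + socket.inet_aton(fake_ip)
    return header + question + answer
```

A UDP socket bound to port 53 could pass each received datagram through this function and send the result back to the sender. Replying with an A record regardless of the query type is a deliberate simplification, and one of the behaviours that could give the emulation away.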

Of course, there are times that we simply don’t know in advance what protocol a piece of malware is going to use. Occasionally a malware author may choose to write a custom protocol for communication between an infected host and the control server. In these cases, a generic protocol may be simulated in order to try and elicit as much conversation from the malware as possible. This can be as simple as listening on a TCP port, waiting for data, and simply sending a new line or two for each packet of data received.
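The generic responder described above can be sketched in a few lines of Python. The names `chatter` and `serve_once`, the two-newline reply and the in-memory log are assumptions chosen for the example; a real deployment would more likely write the conversation to disk and handle many ports at once.

```python
import socket
from typing import List


def chatter(conn: socket.socket, log: List[bytes], reply: bytes = b"\n\n") -> None:
    """Record everything the peer sends and answer each chunk with `reply`."""
    while True:
        data = conn.recv(4096)
        if not data:          # peer closed its side of the connection
            break
        log.append(data)      # keep the raw bytes for later analysis
        conn.sendall(reply)   # say "something" to keep the malware talking


def serve_once(port: int, log: List[bytes], host: str = "0.0.0.0") -> None:
    """Accept a single connection on `port` and run `chatter` on it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen(1)
        conn, _addr = listener.accept()
        with conn:
            chatter(conn, log)
```

Whatever ends up in `log` is the malware's half of the conversation, which is often enough to decide whether the protocol deserves a dedicated emulator.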


Figure 1: Sandnet mechanism

Often, especially in the case of exploit code, programmers are not particular about the received data: they might send the payload as long as they receive “some” data from the target. In the case of custom communication protocols, however, it may take some time and effort to simulate the protocol effectively. In these cases, behavioural analysis may fail to provide enough data on its own, so reverse engineering may have to fill in the gaps.

Post-mortem

After the malware has run, we often want to perform forensics on the infected machine in order to learn what files have been changed or added, and what other configuration changes may have taken place. This post-mortem analysis is achieved by copying the raw disk image of the infected machine onto the Sandnet host, where it can be mounted and compared against an image saved before the infection.
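One simple way to do the comparison, assuming both images have already been mounted read-only on the Sandnet host, is to hash every file in each tree and diff the results. The sketch below takes that approach; the function names `hash_tree` and `diff_trees` and the use of SHA-256 are choices made for this example. It looks only at regular files, so registry changes would still have to be extracted from the changed hive files by other means.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def hash_tree(root: str) -> Dict[str, str]:
    """Map each regular file under `root` (by relative path) to its SHA-256."""
    digests: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue  # skip symlinks and special files
            h = hashlib.sha256()
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(full, root)] = h.hexdigest()
    return digests


def diff_trees(before: Dict[str, str],
               after: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) relative paths between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return added, removed, changed
```

Given hypothetical mount points, `diff_trees(hash_tree("/mnt/clean"), hash_tree("/mnt/infected"))` returns the three lists. The digest map for the pristine image only needs to be computed once and can be reused for every sample.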

This process can be automated by alternating the boot of our “target” (the operating system to be infected) and a maintenance operating system designed to do the network transfer of the raw partition data. A minimal Linux distribution booted over the network using PXE is ideal for this. Using PXE boot also gives us the ability to instruct the system to simply boot from the local hard drive image. So, by scripting the PXE boot server we can automate the booting of two different operating systems without the need for manual intervention on the console.
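The alternation could be scripted roughly as follows, assuming the boot server uses PXELINUX, whose configuration files support the `LOCALBOOT`, `KERNEL` and `APPEND` directives. The function names and the file names `vmlinuz-maint` and `initrd-maint.img` are hypothetical placeholders, and the article does not state which boot loader is actually used.

```python
def pxe_config(mode: str) -> str:
    """Return PXELINUX configuration text for the requested boot mode.

    "target" boots the operating system on the local hard drive;
    "maintenance" boots a network-loaded Linux used to copy disk images.
    Kernel and initrd names are placeholders for this sketch.
    """
    if mode == "target":
        return (
            "DEFAULT target\n"
            "LABEL target\n"
            "  LOCALBOOT 0\n"
        )
    if mode == "maintenance":
        return (
            "DEFAULT maint\n"
            "LABEL maint\n"
            "  KERNEL vmlinuz-maint\n"
            "  APPEND initrd=initrd-maint.img\n"
        )
    raise ValueError("unknown boot mode: %r" % mode)


def next_mode(current: str) -> str:
    """Alternate between the two boot modes."""
    return "maintenance" if current == "target" else "target"
```

A controlling script would write `pxe_config(mode)` into the configuration file served to the analysis machine before each reboot, then call `next_mode` to decide what the following boot should be.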

After the forensic image has been copied, the pristine image can be written back to the hard drive of the infected computer. Using VMware saves significant time in this step, as it provides a one-step procedure to revert to a previous snapshot of the virtual machine, which takes only a few seconds. Copying even a small Windows system partition over a crossover cable can take several minutes in each direction. This would seem to provide a compelling reason to use VMware exclusively instead of a two-system Sandnet. However, one major drawback of VMware-based analysis is that it is very easy for a program to detect that it is running under VMware. In fact, many malware families already include code to detect whether they are running in VMware and, if so, refuse to run. Some executable packers also include this detection and will not even unpack the program if VMware is detected, thus thwarting attempts at behavioural analysis.

Attacks

Like a VMware-based setup, however, a two-system, hardware-only Sandnet can also be detected in a number of ways. For instance, a malware author could attempt to resolve a host name to see whether the answer matched the known IP address of that host name or was simply a fake response generated by an emulated DNS server. Of course, there is always the option to run a recursive DNS resolver on the Sandnet host and allow the malware to receive authentic replies, but this also introduces the possibility that the malware could compromise the integrity of the Sandnet. This could be done by opening a covert communication channel over DNS, or possibly even by introducing a DNS-based exploit into the resolver that might propagate outside of the Sandnet.

Another attack could be to test ping response times or IP TTLs to different sites known to be very different in terms of latency. If the ping times were extraordinarily short, or the TTLs were always the same no matter what site was requested, it could indicate a falsified Internet environment. This attack could be mitigated by writing a kernel module which could change the TTL and introduce artificial latency to better simulate real-world network timing.
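The kernel module itself is beyond a short example, but the policy it would need, namely a plausible and stable TTL and delay for each destination, can be sketched in Python. Everything here is an assumption made for illustration: the function name `simulated_path`, the hop and latency ranges, the initial TTL of 64, and the idea of deriving the values from a hash of the address so that repeated probes of the same site see consistent results.

```python
import hashlib
import ipaddress
from typing import Tuple


def simulated_path(dest_ip: str,
                   min_hops: int = 5, max_hops: int = 25,
                   min_ms: int = 10, max_ms: int = 300,
                   initial_ttl: int = 64) -> Tuple[int, int]:
    """Return a (ttl, delay_ms) pair to present for replies from `dest_ip`.

    The values are derived from a hash of the address, so each destination
    gets its own stable "distance" instead of every site looking identical.
    """
    ipaddress.ip_address(dest_ip)  # raises ValueError for malformed input
    digest = hashlib.sha256(dest_ip.encode("ascii")).digest()
    hops = min_hops + digest[0] % (max_hops - min_hops + 1)
    delay_ms = min_ms + int.from_bytes(digest[1:3], "big") % (max_ms - min_ms + 1)
    return initial_ttl - hops, delay_ms
```

The packet-handling layer would look up `simulated_path(address)` before answering, set the reply's TTL accordingly, and hold the reply for the given delay. The values are consistent but not realistic: nearby addresses get unrelated distances, which a determined attacker could still notice.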

There are scores of other techniques an attacker could theoretically use to distinguish a real Internet from an artificial one. However, unlike VMware, which only requires a few assembly instructions to detect, detecting Sandnets as a rule would require more code, and that code would have to adapt to all the possible setups with which a Sandnet could be deployed. Since a Sandnet is not a “black-box” piece of software, it is difficult to predict exactly how any one entity will design and configure it.

Conclusion

Sandnet analysis is not a turn-key solution to malware analysis, but it can help ease the burden of finding out what a piece of malware does within a reasonable time frame. As malware continues to evolve into an ever larger threat to the enterprise, the need for understanding it grows. These days almost all network intrusions involve malware in one way or another. As such it is the primary facilitator of online crime, and is something we can no longer continue to ignore. Sandnets can help shed some light into the underworld of malware and possibly help stem the flow.

Resources

Truman: http://www.lurhq.com/truman
VMware scripting API: http://www.vmware.com/support/developer/scripting-API
Honeynet project/honeyd: http://www.honeyd.org

About the author

Joe Stewart is a senior security researcher with SecureWorks, a leading managed security services provider. Since the year 2000, he has published over 50 articles, mostly related to dissection of advanced malware such as Sobig, Phatbot, Bobax, Sinit, Sober, Blackworm and SpamThru. Joe is a regular speaker at several security conferences, and has been recognized by the FBI’s National Cyber Forensics Training Alliance for his efforts in assisting the worldwide fight against cybercrime.

References

[1] Jürgen Schmidt, Thou Shalt Not Create New Viruses, 2006, www.heise-security.co.uk/articles/77440
