
Università degli Studi Roma Tre
Laurea Magistrale in Matematica

Modeling Mobile Resource Security

Supervisor: Dr. Roberto Di Pietro

Assistant Supervisor: Dr. Flavio Lombardi

Candidate: Sara Rossicone

Academic Year 2012/2013


Mobile devices are rapidly becoming the dominant computing platforms. The mobility and connectivity these devices afford provide immense utility. As such, recent years have seen a growth in the number of security-sensitive applications that run on commercially available mobile devices.

In particular, most popular mobile platforms, such as Android, Symbian, iOS, BlackBerry, and WinCE, provide access to application markets that allow third-party applications to be downloaded and installed onto the device. In addition, some devices permit the installation of apps from unknown sources.

Several solutions aiming to solve this problem have been proposed and some products are commercially available. We have chosen Android, since it is open source and allows greater experimental activity.

Android is a popular mobile operating system that is installed on millions of devices and accounted for more than 50% of all smartphone sales in the third quarter of 2011. The popularity of Android and the open nature of its application marketplace make it a prime target for attackers and malware.

The mobile threat model includes three types of threats: malware, grayware, and personal spyware. Malware gains access to a device for the purpose of stealing data, damaging the device, or annoying the user. The attacker tricks the user into installing the malicious application or gains unauthorized remote access by taking advantage of a device’s vulnerability. Malware provides no legal notice to the affected user. This threat includes Trojans, worms, botnets, and viruses. Malware is illegal in many countries, including the United States, and its distribution may be punishable by jail time. Spyware collects personal information such as location or text message history over a period of time. Personal spyware sends the victim’s information to the person who installed the application onto the victim’s device, rather than to the author of the application. Grayware spies on users, but the companies that distribute grayware do not aim to harm users.

The rapid growth of mobile malware calls for effective malware detection on mobile devices. Each application specifies which resources of the device it requires. Users can grant or deny its installation and the permissions it needs. Even though a user can be warned about the risk of accepting suspicious permissions, the spread of real malware has demonstrated that users tend to trust any application request and install such applications on their phones.

Different approaches have been proposed to contain security risks. Many researchers have identified areas of concern and proposed solutions to problems on smartphone operating systems.

As we will better illustrate, several tools were developed to identify information leaks on smartphone platforms using dynamic analysis techniques. Another line of research deals with the confused-deputy problem on Android, where inter-process communication channels can inadvertently expose privileged functionality to unprivileged callers. Other systems attempt to use or extend the Android permission system to defend against malware. Besides the above defenses, some work has been proposed to apply common security techniques from the desktop to mobile devices. In particular, we will turn our attention to dynamic and static analysis, underlining their complementary role in detecting malicious apps’ behavior and combining them with the tainting technique.

The principal aim of our work is to model mobile resource security, evaluating both static and dynamic analysis techniques. Static analysis, mostly used by antivirus companies, is based on inspecting source code or binaries for suspicious patterns. Using some tools, we will show how to decompile APK files and retrieve the source code. To this end, we have analyzed two important malicious apps which have plagued Android users for several months: DroidKungFu and Mobile Zeus.

On the other hand, dynamic analysis, or behavior-based detection, involves running the sample in a controlled and isolated environment in order to analyze its execution traces.

To perform our experimental activity we will examine DECAF/DroidScope, a fine-grained dynamic binary instrumentation tool for Android that rebuilds two levels of semantic information: operating system and Java. In particular, we will make use of the DalvikInstructionTracer plugin, which is responsible for tracking all events and printing them to a file.

Due to the obfuscation and encryption of real malicious apps’ code, we have written an application which follows their same behavior, but which allows us to monitor it in real time. The application in question steals all kinds of sensitive data, e.g. the IMEI number, contacts, and GPS coordinates. By giving it numerous permissions in the Manifest File, we provide a complete example of what malicious programmers aim to gain from users. Through the tainting technique, we will prove that the plugin really traces events. In doing so, we will establish which variables can be considered notable.

Furthermore, we will show how to launch an attack by flooding the system with a huge number of requests, causing a Denial of Service. We recall that a Denial of Service attack is an effort to make one or more computer systems unavailable; it is typically targeted at web servers, but it can also be used on mail servers, name servers, and any other type of computer system. Denial of service (DoS) attacks may be initiated from a single machine, but they typically use many computers to carry out an attack. Therefore, distributed denial of service (DDoS) attacks are often used to coordinate multiple systems in a simultaneous attack. Our objective involves the creation of hundreds of threads which steal the IMEI number and, by making many suboptimal calculations, cause the plugin and the entire system to halt. Once the attack has been performed, we will look for a solution to the flooding that blocks the requests.

The solution will be pursued with the string matching technique. String searching algorithms, sometimes called string matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) occur within a larger string. Our anti-flooding strategy consists in monitoring the writing of the Dalvik file during the activity execution and in stopping its malicious behavior as soon as a particular string is found.
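As a minimal sketch of the string matching idea, the following Python fragment scans trace lines for a pattern using the naive algorithm. The function names and the sample trace lines are hypothetical; the actual format of the tracer output is not described here, so the lines below are only placeholders.

```python
from typing import Iterable, Optional


def find_pattern(text: str, pattern: str) -> int:
    """Naive string matching: first index where pattern occurs in text, or -1."""
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        # Compare the pattern with the window of text beginning at `start`.
        if text[start:start + m] == pattern:
            return start
    return -1


def first_suspicious_line(trace_lines: Iterable[str], pattern: str) -> Optional[int]:
    """Return the 0-based index of the first line containing pattern, else None."""
    for index, line in enumerate(trace_lines):
        if find_pattern(line, pattern) != -1:
            return index
    return None


# Hypothetical trace content, used only for illustration.
trace = [
    "invoke-virtual {v0}, Lcom/example/Foo;->bar()V",
    "invoke-virtual {v1}, Landroid/telephony/TelephonyManager;->getDeviceId()Ljava/lang/String;",
    "return-void",
]
hit = first_suspicious_line(trace, "getDeviceId")
```

A monitor built on this idea would react (for example, by stopping the offending activity) as soon as `first_suspicious_line` reports a match, rather than waiting for the whole trace to be written.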

Android Security

Android security [12] is based on different mechanisms. It protects applications and data through a combination of two enforcement mechanisms, one at the system level and the other at the Inter Component Communication level (ICC level). ICC mediation defines the core security framework, but it builds on the guarantees provided by the underlying Linux system. In the general case, each application runs as a unique user identity, which enables Android to limit the potential damage of programming flaws.

ICC isn’t limited by user and process boundaries. Because the device file through which ICC takes place must be world-readable and writable for proper operation, the Linux system has no way of mediating ICC. Although user separation is straightforward and easily understood, controlling ICC is much more subtle and warrants careful consideration.

As the central point of security enforcement, the Android middleware mediates all ICC establishment by reasoning about labels assigned to applications and components. In its simplest form, access to each component is restricted by assigning it an access permission label; this text string need not be unique. When a component initiates ICC, the reference monitor looks at the permission labels assigned to its containing application, and if the target component’s access permission label is in that collection, it allows ICC establishment to proceed. If the label isn’t in the collection, establishment is denied even if the components are in the same application. The developer assigns permission labels via the XML Manifest File that accompanies every application package. In doing so, the developer defines the application’s security policy: assigning permission labels to an application specifies its protection domain, whereas assigning permissions to the components in an application specifies an access policy to protect its resources.

Because applications often contain components that other applications should never have access to, another way to strengthen Android security is to declare a component private in the Manifest File. If a component is private, the only components that can interact with it are those from the same app or another app that runs with the same Unique Identification Number (UID). By making a component private, the developer doesn’t need to worry about which permission label to assign it, or how another application might acquire that label.
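The label check and the private-component rule described above can be sketched as follows. This is an illustrative model, not Android code: the class names, the `allow_icc` function and the example labels are hypothetical, and the treatment of a public component without any label (access allowed) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class App:
    name: str
    uid: int
    permissions: Set[str] = field(default_factory=set)  # labels held by the app


@dataclass
class Component:
    name: str
    app: App
    access_permission: Optional[str] = None  # label required of callers
    private: bool = False                     # declared private in the manifest


def allow_icc(caller: Component, target: Component) -> bool:
    """Decide whether `caller` may establish ICC with `target`."""
    if target.private:
        # Private components are reachable only from the same app or the same UID.
        return caller.app is target.app or caller.app.uid == target.app.uid
    if target.access_permission is None:
        return True  # assumption: an unlabeled public component is open
    # The label must be among those assigned to the caller's containing application.
    return target.access_permission in caller.app.permissions


mail = App("mail", uid=10001, permissions={"com.example.READ_NOTES"})
notes = App("notes", uid=10002)
viewer = Component("MailViewer", mail)
provider = Component("NotesProvider", notes, access_permission="com.example.READ_NOTES")
helper = Component("NotesHelper", notes, private=True)
sync = Component("NotesSync", notes)
```

In this toy setup `viewer` may reach `provider` because its application holds the label, while `sync` may not, even though it lives in the same application as `provider`; this mirrors the rule that establishment is denied whenever the label is missing from the caller's collection.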

Components aren’t the only resources that require protection. In fact, unprotected intent broadcasts can unintentionally leak information to explicitly listening attackers. To counter this, the Android Application Programming Interfaces (APIs) for broadcasting intents optionally allow the developer to specify a permission label to restrict access to the intent object. The access permission label assigned to a broadcast intent restricts the set of applications that can receive it. The Manifest File therefore doesn’t give the entire picture of the application’s security. Let’s now introduce the concept of a “pending intent”.


A developer can define an intent object as is normally done to perform an action (to start an activity, for example), but instead of performing the action, the developer can pass the intent to a special method that creates a Pending Intent object corresponding to the desired action. The Pending Intent object is simply a reference pointer that allows applications included with the framework to integrate better with third-party applications. Pending intents let applications direct the broadcast to a specific private broadcast receiver. This prevents forging without the need to coordinate permissions with system applications. Used correctly, they can improve an application’s security. In fact, several Android APIs require pending intents, such as the Location Manager, which has a “proximity update” feature that notifies an application via intent broadcast when a geographic area is entered or exited.

Not all system resources are accessed through components; for example, Android provides direct API access. In fact, the services that provide indirect access to hardware often use APIs available to third-party applications. Android protects sensitive APIs with additional permission label checks: an application must declare a corresponding permission label in its Manifest File to use them. By protecting sensitive APIs with permissions, Android forces an application developer to declare the desire to interface with the system in a specific way. Consequently, vulnerable applications can’t gain unknown access if exploited. The most commonly encountered protected API is for network connections.

Early versions of the Android Software Development Kit (SDK) let developers mark a permission as “application” or “system”; this model was later extended into four protection levels for permission labels:

• Normal permissions, like the old application permissions, are granted to any application that requests them in its Manifest;

• Dangerous permissions are given only after user confirmation;

• Signature permissions are granted only to applications signed by the same developer key as the package defining the permission;

• Signature or system permissions act like signature permissions but exist for legacy compatibility with the older system permission type; for example, only Google applications can directly interface with the telephony API.
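The four protection levels can be read as a small decision procedure. The sketch below is a simplified model under stated assumptions: the `grant` function and its parameters are hypothetical, and the `is_system_app` flag stands in for whatever criterion the platform uses to recognise system applications.

```python
from enum import Enum


class Level(Enum):
    NORMAL = "normal"
    DANGEROUS = "dangerous"
    SIGNATURE = "signature"
    SIGNATURE_OR_SYSTEM = "signatureOrSystem"


def grant(level: Level, requester_key: str, definer_key: str,
          user_confirms: bool = False, is_system_app: bool = False) -> bool:
    """Return True if a permission at `level` would be granted to the requester."""
    if level is Level.NORMAL:
        return True                      # granted to any application that asks
    if level is Level.DANGEROUS:
        return user_confirms             # granted only after user confirmation
    same_key = requester_key == definer_key
    if level is Level.SIGNATURE:
        return same_key                  # same developer key as the defining package
    # SIGNATURE_OR_SYSTEM: signature rule, or (assumption) a system application.
    return same_key or is_system_app
```

For instance, `grant(Level.SIGNATURE, "keyA", "keyB")` is refused because the keys differ, while the same request at the normal level is granted unconditionally.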

The standard permission system described so far is often not sufficient when used with content providers. A content provider may want to protect itself with read and write permissions, while its direct clients also need to hand specific Uniform Resource Identifiers (URIs) to other applications for them to operate on. Recall that Android uses a special content URI to address content providers, optionally specifying a record within a table. The developer can pass such a URI in an intent’s data field; for example, an intent can specify the VIEW action and a content URI identifying an image file. If used to start an activity, the system will choose a component in a different application to view the image. If the target application doesn’t have read permission to the content provider containing the image file, the developer can use a URI permission instead. In this case, the developer sets a read flag in the intent that grants the target application access to the specific record identified by the intent. URI permissions are essentially capabilities for database records. Although they provide least-privilege access to content providers, the addition of new delegation mechanisms further diverges from the original Mandatory Access Control (MAC) model.

Android Architecture

Android is an open source software stack for mobile devices. The architecture of Android is organized in different levels, or layers, where each layer provides services to the one above it.

As shown in Figure 1, these levels are:

• the operating system (OS);

• libraries with the Dalvik Virtual Machine (DVM);

• the Application Framework;

• applications.

[Figure 1: Android architecture. Layers from top to bottom: Applications (native Android apps, third-party apps); Application Framework (Activity Management, Window Management, Notification Management, Package Management, Resource Management, Content Providers, View System); Libraries (SQLite, WebKit, OpenGL ES, FreeType, Surface Manager, Media Framework, SSL, SGL, libc) alongside the Android Runtime (Core Libraries, Dalvik Virtual Machine); Linux Kernel (Display Driver, WiFi Driver, Audio Driver, Binder (IPC) Driver, Power Management, Process Management, Memory Management).]

From the bottom, the Linux kernel provides basic services such as memory management, process scheduling and the file system. At a higher level there are the native libraries developed in C and C++. Together, these constitute the core of Android.

At the third layer, we have the Android Application Framework, consisting of a series of components and APIs. At the top of the software stack lies the application layer, which contains a set of built-in core applications and third-party applications installed by users.

Android defines four component types:

• Activity components define an application’s user interface. Typically, an application developer defines one activity per “screen”. Activities start each other, possibly passing and returning values. Only one activity on the system has keyboard and processing focus at a time; all others are suspended.

• Service components perform background processing. When an activity needs to perform some operation that must continue after the user interface has disappeared (such as downloading a file or playing music), it commonly starts a service specifically designed for that action. Services often define an interface for Remote Procedure Call (RPC) that other system components can use to send commands and retrieve data, as well as register callbacks.

• Content provider components store and share data using a relational database interface. Each content provider has an associated “authority” describing the content it contains.

• Broadcast receiver components act as mailboxes for messages from other applications. Messages are commonly broadcast to an implicit destination, and broadcast receivers subscribe to such destinations to receive the messages sent to them. Application code can also address a broadcast receiver explicitly by including the namespace assigned to its containing application.

Android provides several means for applications to communicate. The primary mechanism for component interaction is an Intent, which is simply a message object containing a destination component address and data. The APIs define methods that accept intents and use this information to start each of the four component types described above. As these methods are invoked, the Android framework begins executing code in the target application. This process of ICC is known as an action. ICC is analogous to inter-process communication (IPC) in Unix-based systems.

Every Android application runs in its own process, with its own instance of the Dalvik virtual machine. Dalvik has been written so that a device can run multiple VMs efficiently. The Dalvik VM executes files in the Dalvik Executable (.dex) format, which is optimized for minimal memory footprint. The VM is register-based, and runs classes compiled by a Java language compiler that have been transformed into the .dex format by the included “dx” tool.

The Android application package file (APK) is the file format used to distribute and install application software and middleware onto Google’s Android operating system. To make an APK file, a program for Android is first compiled, and then all of its parts are packaged into one file. This file holds all of that program’s code (such as .dex files), resources, assets, certificates, and the Manifest File.
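Since an APK is a ZIP-based archive, its parts can be listed with standard tooling. The following sketch builds a toy archive in memory and groups its entries by role; the helper names and the placeholder entries are hypothetical, and a real APK would contain compiled binary files rather than the dummy bytes used here.

```python
import io
import zipfile
from typing import Dict, List


def summarize_package(data: bytes) -> Dict[str, List[str]]:
    """Group the entries of an APK-like ZIP archive by their role."""
    groups: Dict[str, List[str]] = {
        "manifest": [], "code": [], "resources": [],
        "assets": [], "certificates": [], "other": [],
    }
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name == "AndroidManifest.xml":
                groups["manifest"].append(name)
            elif name.endswith(".dex"):
                groups["code"].append(name)
            elif name.startswith("res/") or name == "resources.arsc":
                groups["resources"].append(name)
            elif name.startswith("assets/"):
                groups["assets"].append(name)
            elif name.startswith("META-INF/"):
                groups["certificates"].append(name)
            else:
                groups["other"].append(name)
    return groups


def build_toy_package() -> bytes:
    """Create a placeholder archive with the typical layout of an APK."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in ("AndroidManifest.xml", "classes.dex", "res/layout/main.xml",
                     "assets/data.bin", "META-INF/CERT.RSA"):
            archive.writestr(name, b"placeholder")
    return buffer.getvalue()


summary = summarize_package(build_toy_package())
```

The same `summarize_package` function could be pointed at the bytes of a real package as a first step of static inspection, before any decompilation of the .dex files.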

Android Malware Detection

Rootkits are a class of malware that infects the code and data of the OS kernel. By infecting the kernel itself, they gain control over the layer that is traditionally considered the Trusted Computing Base (TCB) on most systems.

Bickford et al. [3] have focused on security versus energy tradeoffs for host-based rootkit detection. Some emerging proposals for malware detection have examined how to sidestep the energy constraints using offloaded architectures in which the malware detector itself executes on a well-provisioned server and monitors mobile devices. Unfortunately, malware detection offload either incurs significant power expenditures, due to data upload, or has limited effectiveness, because it is best suited to traditional signature-based scanning. Such signature scanning is easily defeated with encryption, polymorphism and other stealth techniques. For this reason, there is growing consensus that signature-based scanning must be supplemented with powerful host-based agents that, for example, employ behavior-based detection algorithms. Bickford et al. have presented a framework to quantify the degree of security being traded off when prolonging battery life, and ways in which such tradeoffs can be implemented. Specifically, they have studied security tradeoffs along two axes: the surface of attacks that the malware detector will cover, and the frequency with which the malware detector will be invoked. They consider two detection techniques: the first, based on Patagonix, detects rootkits by monitoring code integrity; the second, based on Gibraltar, monitors kernel data integrity.

Rastogi et al. [24] have developed a systematic framework called DroidChameleon with several common transformation techniques that may be used to transform Android applications automatically. Some of these transformations are highly specific to the Android platform. Based on the framework, they have passed known malware samples (from different families) through these transformations to generate new variants of malware, which are verified to possess the originals’ malicious functionality.

In this scenario, Burguera et al. [6] have proposed an approach to analyze the behavior of Android applications, providing a framework to distinguish between applications that, having the same name and version, behave differently. The main feature of their work is the use of a crowd-sourced system to obtain traces of applications’ behavior, which helps researchers to collect different samples of application execution traces. The whole analysis process is performed on a dedicated remote server, which is used exclusively to collect information and detect malicious and suspicious applications on the Android platform. A lightweight client, called Crowdroid, is in charge of monitoring Linux kernel system calls and sending them, preprocessed, to the centralized server.

One way to extend the control of users and trusted third parties over smartphones is to use context-related policies. Conti et al. [8] have presented CRePE, a system that is able both to enforce policies at run-time and to allow trusted third parties to define them. With its elaborate architecture, CRePE is able to define contexts and rules over them without reducing Android security.

Static vs. Dynamic Analysis

So far two approaches have been proposed for the analysis and detection of malware: static analysis and dynamic analysis. Static analysis, mostly used by antivirus companies, is based on inspecting source code or binaries for suspicious patterns. Dynamic analysis, or behavior-based detection, on the other hand, involves running the sample in a controlled and isolated environment in order to analyze its execution traces.

Static analysis has also been proposed for malware detection on individual smartphones. Antivirus companies have adapted their signature-based detection systems to smartphones, but considering the level of resources needed by antivirus techniques and the power and memory constraints of mobile devices, in-phone analysis is not a preferred solution for smartphones. Static analysis is known to be vulnerable to code obfuscation techniques, which are commonplace for desktop malware and are expected for Android malware. In fact, the Android SDK includes a tool named ProGuard for obfuscating apps. Researchers have also demonstrated that bytecode randomization techniques can be used to completely hide the internal logic of a Dalvik bytecode program. Static analysis also falls short for exploit diagnosis, because a vulnerable runtime execution environment is needed to observe and analyze an exploit attack and pinpoint the vulnerability.

Dynamic analysis is immune to code obfuscation and is able to see the malicious behavior on an actual execution path. Its downside is a lack of code coverage, although this can be ameliorated by exploring multiple execution paths.

Sohr et al. [27] have employed the Java Modeling Language (JML) to specify security requirements for Java 2 Micro Edition (J2ME) APIs, and check at compile-time whether the implementation satisfies the requirements.

Another approach is presented in RiskRanker [18], a tool able to assess risks from existing (untrusted) apps for zero-day malware detection. Grace et al. [18] have analyzed factory stock apps to identify permission leakage, a threat that also spurred studies on its runtime mitigations. Although RiskRanker is effective in achieving its own goals, it targets vulnerabilities that represent only a subset of component hijacking (i.e. hijacks seeking to access non-permission-protected sensitive resources are not covered). Moreover, it does not intend to provide any in-depth detection method suited for scalable app vetting.

The work of Lu et al. [22] has therefore aimed to bridge this gap. Their tool, CHEX, follows a static program analysis approach, featuring a novel data-flow analyzer specially designed to accommodate Android’s special app programming paradigms. Static analysis makes sense for vetting benign apps in that the anti-analysis techniques commonly used in adversarial scenarios are out of scope, and the advantages of static analysis, such as its completeness and bounded time complexity, are well suited to addressing the vulnerability discovery problem. To test CHEX, they have built a generic Android app analysis framework named Dalysis, which stands for Dalvik bytecode analysis. As suggested by its name, Dalysis works directly on off-the-shelf app packages (or Dalvik bytecode) without requiring source code access or any decompilation assistance.

Virtualization and Android

It is widely accepted that dynamic analysis is indispensable, because malware is often heavily obfuscated to thwart static analysis. Furthermore, runtime information is often needed for exploit diagnosis. Virtualization-based analysis has proven effective against evasion, because all of the analysis components are out of the box and are more privileged than the runtime environment being analyzed, including the malware. Based on dynamic binary translation and hardware virtualization techniques, several analysis platforms have been built for analyzing desktop malware. These platforms are able to bridge the semantic gap between the hardware-level view from the VMM and the OS-level view within the virtual machine using virtual machine introspection techniques.

The advantages of virtualization-based analysis approaches are two-fold:

1. As the analysis runs underneath the entire virtual machine, it is able to analyze even the most privileged attacks in the kernel;

2. As the analysis is performed externally, it becomes very difficult for an attack within the virtual machine to disrupt the analysis.

The downside, however, is the loss of semantic contextual information when the analysis component is moved out of the box. To reconstruct the semantic knowledge, Virtual Machine Introspection (VMI), a family of techniques that rebuilds a guest context from the VMM, is needed to intercept certain kernel events and parse kernel data structures. The security benefits of virtualization have been rigorously and repeatedly established. Traditionally the use of virtualization as a tool for building secure systems has been the purview of the desktop and server environments. While the recent interest in mobile virtualization is promising, it is still unclear how best to architect secure systems with this technology.

The system design presented by Gudeth et al. [19] is based on bare metal virtualization, a design choice specifically selected to minimize the TCB. They have recommended the use of a bare metal hypervisor, which typically consists of orders of magnitude fewer lines of code than a full OS. A bare metal hypervisor runs directly on the hardware, with all guest OSs and, optionally, individual applications running in their own virtual machines. Hence, any attack exploiting vulnerabilities in the OS or drivers is thwarted by the bare metal hypervisor.

Despite the fact that Android is based on Linux, it is not straightforward to take the same desktop analysis approach for Android malware. Yan et al., the authors of DroidScope [34], have therefore aimed to reconstruct semantic knowledge at two levels in a unified analysis platform. These two levels are:

1. OS-level semantics (information about processes, threads, memory mappings and system calls, rebuilt at runtime), used to understand the activities of the malware process and its native components;

2. Java-level semantics, used to comprehend the behaviors of the Java components.

Yan et al. [33] have recorded malware execution using hardware virtualization for transparency, and then replayed and analyzed the malware’s execution using dynamic binary translation for flexibility and efficiency of in-depth analysis. Their platform, V2E, must work in an adversarial context: the emulator should exactly replay the execution recorded from the hardware virtualization platform in spite of the fact that malware is trying to detect every possible heterogeneous property in these two systems.

Android Domain Isolation

Although virtualization provides strong isolation, it duplicates the entire Android software stack, which renders those approaches quite heavyweight in consideration of the scarce battery life of smartphones. A possible approach to mitigate this problem could be the automatic hibernation of VMs currently not displayed to users, even though currently available mobile virtualization technology does not provide this feature. Default Android, in fact, has no means to group applications and data into domains, where a domain comprises a set of applications and data belonging to one trust level.

Bugiel et al. [4] have presented the design and implementation of XManDroid (eXtended Monitoring on Android), a security framework that extends the monitoring mechanism of Android to detect and prevent application-level privilege escalation attacks at runtime, based on a system-centric policy. Their implementation dynamically analyzes applications’ transitive permission usage while inducing a minimal performance overhead that is unnoticeable for users. In contrast to existing solutions, Bugiel et al. [5] have presented TrustDroid, a lightweight solution which doesn’t require duplication of Android’s middleware and kernel. It enables isolation at different layers of the Android software stack:

• at the middleware layer, to prevent inter-domain application communication and data access;

• at the kernel layer, to enforce mandatory access control on the file system and on Inter-Process Communication (IPC) channels;

• and at the network layer, to mediate network traffic.

In particular, TrustDroid exploits coloring of separate and distinguishable components. The assignment of colors to applications and user data is based on a certification scheme which can be easily integrated into Android. Based on the applications’ colors, TrustDroid organizes applications along with their data into logical domains. At runtime, TrustDroid monitors all application communications, access to common shared databases, as well as file-system and network access. It also denies any data exchange or application communication between different domains.

Data Tainting

Definition 1 (Flow). An operation, or series of operations, that uses the value of some object, say x, to derive a value for another, say y, causes a flow from x to y.

Two types of flows are defined: explicit flows, such as y = x, where we observe an explicit transfer of a value from x to y, and implicit flows (control flows), where there is no direct transfer of a value from x to y, but when the code is executed, y obtains the value of x.
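The difference between the two kinds of flow can be illustrated with a small sketch. The function names are hypothetical; the second function never assigns its input to the result, yet the result always equals the input because the branch taken depends on it.

```python
def explicit_flow(secret: int) -> int:
    """Explicit flow: the value is transferred by a direct assignment."""
    copy = secret
    return copy


def implicit_flow(secret: bool) -> bool:
    """Implicit flow: no assignment from `secret`, only a branch that depends on it."""
    copy = False
    if secret:
        copy = True  # assigned a constant, but only when `secret` is True
    return copy
```

An analysis that follows only assignments would see the flow in `explicit_flow` but would consider `copy` in `implicit_flow` to be derived from constants alone.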

Definition 2 (Tainted). If the source of the value of the object X is untrustworthy, we say that X is tainted.

Definition 3 (To taint). To “taint” user data is to insert some kind of tag or label for each object of the user data. The tag allows the influence of the tainted object to be tracked along the execution of the program.

Definition 4 (Taint propagation). If an operation uses the value of some tainted object, say X, to derive a value for another, say Y, then object Y becomes tainted. Object X taints the object Y through the taint operator t: X → t(Y). The taint operator is transitive:

X → t(Y), t(Y) → t(Z) ⇒ X → t(Z)
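Definitions 3 and 4 can be made concrete with a minimal sketch in which every value carries a boolean tag. The `Value` class and the `source` helper are hypothetical and model only addition; a real taint tracker instruments every operation of the program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    data: int
    tainted: bool = False  # the tag attached to the object

    def __add__(self, other: "Value") -> "Value":
        # Taint propagation: the result is tainted if either operand is tainted.
        return Value(self.data + other.data, self.tainted or other.tainted)


def source(data: int) -> Value:
    """Model an untrustworthy or sensitive source: its values start out tainted."""
    return Value(data, tainted=True)


x = source(42)          # X is tainted
y = x + Value(1)        # Y is derived from X, so Y becomes tainted
z = y + Value(10)       # Z is derived from Y, so Z becomes tainted (transitivity)
clean = Value(1) + Value(2)
```

Here `z` ends up tainted although it was never computed from `x` directly, which is the transitivity property stated above, while `clean` stays untainted because none of its inputs carries a tag.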

Two of the most commonly employed dynamic analysis techniques in security research are dynamic taint analysis and forward symbolic execution. Dynamic taint analysis runs a program and observes which computations are affected by predefined taint sources such as user input. Dynamic forward symbolic execution automatically builds a logical formula describing a program execution path, which reduces the problem of reasoning about the execution to the domain of logic. The two analyses can be used in conjunction to build formulas representing only the parts of an execution that depend upon tainted values.

The principle of dynamic taint analysis is to taint some of the data in a system and then propagate the taint in order to track the information flow in the program. The dynamic taint analysis mechanism is used primarily for vulnerability detection and protection of sensitive data. To detect the exploitation of vulnerabilities, the sensitive transactions must be monitored to ensure that they are not tainted by outside data. However, this technique does not track control flows, which can cause an under-tainting problem, i.e. some values that should be marked as tainted are not. An attacker can take advantage of an indirect control dependency to exploit a vulnerability.

Enck et al. have presented TaintDroid [11], a sophisticated framework which detects unauthorized leakage of sensitive data. TaintDroid exploits dynamic taint analysis in order to label private data with a taint mark, track tainted data as it propagates through the system, and alert users if tainted data is about to leave the system. TaintDroid mainly addresses data flows, whereas privilege escalation attacks also involve control flows.

A precise definition of dynamic taint analysis or forward symbolic execution must target a specific language. Schwartz et al. [26] have used SimpIL, a Simple Intermediate Language. Although the language is simple, it is powerful enough to express typical languages as varied as Java and assembly code. Indeed, the language is representative of internal representations used by compilers for a variety of programming languages. A program in the SimpIL language consists of a sequence of numbered statements.

In recent years, symbolic execution has advanced considerably. As has already been said, it is usually combined with dynamic taint analysis and theorem proving, and is becoming a powerful technique in the security analysis of software programs. In particular, symbolic execution has been shown to be useful in discovering trigger-based code (malicious in many cases, although not necessarily) and finding the corresponding trigger condition.

Wang et al. [29] have challenged the requirement of using cryptographic functions in obfuscation to make symbolic execution difficult, and proposed a novel automatic obfuscation technique that makes use of linear unsolved conjectures. There are a few advantages of using only linear operations in the obfuscation without any cryptographic ones. First, the obfuscated code becomes less suspicious in malware detection. The obfuscated code produced by their technique only adds a simple loop to the code, making the resulting obfuscated code similar to legitimate programs, e.g. simple number sorting algorithms. Second, such simple obfuscated code makes it possible for their technique to be combined with other obfuscation and polymorphism techniques to achieve stronger protection. Third, the obfuscated code is less than one hundred bytes longer than the original program. Many unsolved conjectures, e.g. the Collatz conjecture, involve some simple linear operations on integers that loop for an unknown number of times. Such operations are usually fast and commonly used in basic algorithms in computer science. They are perfect candidates to be used in obfuscations to make symbolic execution difficult.

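\[
a_i =
\begin{cases}
n & \text{for } i = 0 \\
f(a_{i-1}) & \text{for } i > 0
\end{cases}
\qquad \text{where} \qquad
f(n) =
\begin{cases}
n/2 & \text{if } n \equiv 0 \pmod{2} \\
3n + 1 & \text{if } n \equiv 1 \pmod{2}
\end{cases}
\]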

Figure 2: Collatz conjecture: ai will eventually reach 1 regardless of the value of n.
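To make the idea concrete, the following sketch iterates the function f of Figure 2 and shows how a simple equality trigger could be hidden behind a Collatz loop. It is only an illustration under stated assumptions: the trigger value 30, the seed offset 1000 and the function names are hypothetical, and the construction is a simplified variant, not the exact transformation of [29]. The two triggers agree only as long as the Collatz sequence of the seed reaches 1, which is precisely what a symbolic executor cannot easily establish.

```python
def collatz_step(n):
    """One application of f from Figure 2."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_steps(n):
    """Number of applications of f needed to reach 1 from a positive integer n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        n = collatz_step(n)
        steps += 1
    return steps


def plain_trigger(x):
    """Original trigger condition (hypothetical value)."""
    return x == 30


def obfuscated_trigger(x):
    """Same condition, hidden behind a loop whose exit value is hard to reason about."""
    y = abs(x) + 1000      # positive seed derived from the input (hypothetical choice)
    while y > 1:           # terminates with y == 1 if the Collatz conjecture holds for the seed
        y = collatz_step(y)
    return x - y == 29     # equivalent to x == 30 once y == 1
```

The loop adds only a few instructions, yet the number of iterations depends on the input in a way that has no known closed form.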

Another advantage of using these unsolved conjectures is that they can be used to obfuscate inequality conditions, a case the previous proposal is unable to handle. Although some inequality conditions could be transformed into (a set of) equality conditions, this might become impractical when the inequality range is large. Wang et al. have proposed and implemented an automatic obfuscator that incorporates unsolved conjectures into trigger conditions in program source code. Extensive evaluations show that symbolic execution would take hundreds of hours to figure out the trigger condition.

Haldar et al. [20] have presented a technique and an implementation for dynamically tracing tainted user input in the Java Virtual Machine. Their technique tracks the taintedness of untrusted input throughout the lifetime of the application. Taintedness is propagated in the obvious way: strings derived from tainted strings are also considered tainted. The technique is completely transparent, and the application is unaware of it. It can be applied to an existing Java classfile and does not need source code.

Static analysis is the analysis of computer software that is performed without actually executing programs. The goal of static analysis is, given a program and a set of initial states, to compute the set of states that arise during the execution of the program. A program is specified by its control flow graph:

Definition 5 (Control flow graph). A control flow graph (CFG) is a pair G = (V, E), where:

• V is a set of program locations;

• E ⊆ V × V is a set of edges that represent the flow of control.

The graph is examined to identify the ramifications of the control flow and check the existence of anyanomalies such as unreachable code.

Definition 6 (Path). Let start, end : E → V be two functions that associate a start node and an end node, respectively, with each edge. A path d is a finite sequence of edges e1, e2, . . . , ek such that end(ei) = start(ei+1) ∀i = 1, . . . , k − 1.

Definition 7 (Path Condition). A Path Condition (PC), for a given statement, indicates the conditionsthat the input must satisfy for an execution to cover a path along which the statement is executed.
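Definitions 5 and 6 can be made concrete with a small sketch. The graph below, its location names and the helper `unreachable_locations` are hypothetical and chosen only to illustrate the definitions: edges are pairs of locations, `start` and `end` are the two functions of Definition 6, and a breadth-first search detects the kind of anomaly mentioned above (unreachable code).

```python
from collections import deque


def start(edge):
    """Start node of an edge (Definition 6)."""
    return edge[0]


def end(edge):
    """End node of an edge (Definition 6)."""
    return edge[1]


def is_path(edges, candidate):
    """True if `candidate` is a non-empty sequence of CFG edges with end(e_i) == start(e_{i+1})."""
    edge_set = set(edges)
    if not candidate or any(e not in edge_set for e in candidate):
        return False
    return all(end(a) == start(b) for a, b in zip(candidate, candidate[1:]))


def unreachable_locations(locations, edges, entry):
    """Locations that no path starting at `entry` can reach."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for e in edges:
            if start(e) == node and end(e) not in seen:
                seen.add(end(e))
                queue.append(end(e))
    return set(locations) - seen


# A toy CFG G = (V, E) for an if/else statement plus one dead location.
V = ["entry", "test", "then", "else", "exit", "dead"]
E = [("entry", "test"), ("test", "then"), ("test", "else"),
     ("then", "exit"), ("else", "exit"), ("dead", "exit")]
```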


Dynamic                                          | Static
-------------------------------------------------+---------------------------------------------------------------------
Looks at a single path                           | Looks at multiple paths
Determines exact taint values for run            | Must either over or under approximate taint at confluence of paths
Must be run on each execution to detect attacks  | Can be used to add monitoring code for only vulnerable paths

Table 1: Differences between static and dynamic analysis

We call a path an “executable path” if there exists a set of input data that satisfies its path condition.

From a program one can generate an execution tree:

Definition 8 (Execution Tree). An execution tree has a node for each statement executed (labeled with the statement number) and, for each transition between statements, a directed arc connecting the associated nodes. For each forking IF statement, the associated node has two outgoing arcs, labeled “T” and “F” for the true (THEN) and false (ELSE) branches, respectively.
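As an illustration of Definitions 7 and 8, the sketch below builds the execution tree of a hypothetical three-way program over a single integer input x, collects the path condition of each leaf, and decides whether the path is executable. The feasibility check is a brute-force search over a small input domain, which is an assumption made to keep the example self-contained; real symbolic execution tools hand the path condition to a constraint solver instead.

```python
def leaf(stmt):
    """A non-branching statement that ends a path."""
    return {"stmt": stmt}


def branch(stmt, text, pred, true_child, false_child):
    """A forking IF statement with its 'T' and 'F' subtrees."""
    return {"stmt": stmt, "text": text, "pred": pred, "T": true_child, "F": false_child}


# Hypothetical program:  1: if x > 10:   2: if x < 5:   3: ...   else 4: ...   else 5: ...
tree = branch(1, "x > 10", lambda x: x > 10,
              branch(2, "x < 5", lambda x: x < 5, leaf(3), leaf(4)),
              leaf(5))


def paths(node, cond=()):
    """Yield (leaf statement, path condition) for every root-to-leaf path."""
    if "pred" not in node:
        yield node["stmt"], cond
        return
    yield from paths(node["T"], cond + ((node["text"], node["pred"], True),))
    yield from paths(node["F"], cond + ((node["text"], node["pred"], False),))


def describe(cond):
    """Render a path condition as a conjunction of branch outcomes."""
    return " and ".join(text if want else f"not ({text})" for text, _, want in cond)


def is_executable(cond, domain=range(-100, 101)):
    """True if some input in the (finite, assumed) domain satisfies the path condition."""
    return any(all(pred(x) == want for _, pred, want in cond) for x in domain)


report = {stmt: (describe(cond), is_executable(cond)) for stmt, cond in paths(tree)}
```

In this toy program, statement 3 lies on a path whose condition is contradictory, so that path is not executable.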

Gibler et al. [13] have presented AndroidLeaks, a static analysis framework for automatically finding potential leaks of sensitive information in Android applications on a massive scale. AndroidLeaks drastically reduces the number of applications and the number of traces that a security auditor has to verify manually. Leveraging WALA [30], a program analysis framework for Java source and byte code, they create a call graph of an application's code and then perform a reachability analysis to determine whether sensitive information may be sent over the network. If there is a potential path, they use dataflow analysis to determine whether private data reaches a network sink.

An interesting tool which provides static analysis is Androguard [10]. Androguard is mainly a tool written in Python to work with Dex/Odex files (Dalvik virtual machine bytecode, .dex; disassembly and decompilation), APK files (Android applications, .apk), Android's binary XML (.xml) and Android resources (.arsc).

Among its most important features, it can map and manipulate the DEX/ODEX/APK/AXML/ARSC formats as full Python objects, disassemble, decompile and modify the DEX/ODEX/APK formats, and decompile with the first native Dalvik decompiler (DAD), which translates directly from Dalvik bytecode to Java source code. Androguard has also been used for static analysis by other research projects. One example is Androwarn [9], a tool whose main aim is to detect potentially malicious behaviors and warn the user about them. The detection is performed through static analysis of the application's Dalvik bytecode, represented as Smali. This analysis leads to the generation of a report, at a level of technical detail chosen by the user.

APKInspector [7] likewise aims to aid analysts and reverse engineers in visualizing compiled Android packages and their corresponding DEX code. APKInspector provides both analysis functions and graphic features that give users deep insight into malicious apps.

Finally, Andrubis [21] is a tool which analyzes unknown apps for the Android platform (APKs), just as Anubis does for Windows executables. The report provided by Andrubis gives the human analyst insight into various behavioral aspects and properties of a submitted app. To achieve comprehensive results, Andrubis employs both static and dynamic analysis approaches.

Static analysis is useful at the time of application development, when potential vulnerabilities found by the analysis can be fixed by the programmer in the source code. Some human intervention is also needed because static approaches, in order to be conservative, typically also report a number of false positives. The programmer must then manually examine the reported errors to determine which are actual vulnerabilities and which are not. There are two problems that need to be dealt with. Firstly, the problem must be specified correctly. This means getting all the rules and corner cases for validating user input right. Secondly, this specification must be implemented faithfully. Static approaches can catch implementation errors, but not errors in the specification. If a dynamic approach also performs its own independent checks, it may be able to catch more errors than static checking alone. However, static approaches do provide more accurate reports than runtime approaches, make it possible to fix vulnerabilities before an application is deployed, and impose no runtime performance overhead.

Because of the serious limitations of TaintDroid explained above, Graa et al. [17] have proposed a hybrid approach that combines the advantages of static and dynamic analysis. To solve the under-tainting problem in the Android system, their approach improves the functionality of TaintDroid by integrating the concepts introduced by Trishul. Trishul is an information flow control system. It is implemented in a Java virtual machine to secure the execution of Java applications by tracking data flow within the environment. It does not require a change to the operating system kernel because it analyzes the bytecode of the application being executed. Trishul is based on the hybrid approach in order to handle implicit flows correctly, using the compiled program rather than the source code at load time.

Discussing Static Analysis: DroidKungFu and Zeus

DroidKungFu

This malware is included in repackaged apps made available through a number of alternative app markets and forums targeting Chinese-speaking users. The malware adds a new service and a new receiver to the infected app. The receiver is notified when the system finishes booting so that it can automatically launch the service without user interaction. Once the service has started, DroidKungFu collects a variety of information on the infected mobile phone, including the IMEI number, the phone model and the Android OS version. With the collected information, the malware phones home by making an HTTP POST to a hard-coded remote server.

Specifically, instead of including plaintext remote server URLs, the malware encrypts them and has three C&C servers for additional redundancy and robustness. Inside the infected app there is an (encrypted) embedded apk that the malware will attempt to install after obtaining root privileges. The embedded apk, once decrypted, appears to be a fake Google Update app. If installed, this embedded apk does not show any icon on the home screen. Our analysis shows that this app is actually a backdoor, which connects back to a remote server for instructions. In essence, it effectively converts the compromised phone into a bot.

The service's onCreate() method attempts to get root access on the phone using two separate exploits. One of them, which is related to an embedded file named “ratc” (short for “RageAgainstTheCage”), is encrypted but is decrypted at runtime (with the copyAssets method) and then executed to exploit the adb resource exhaustion bug, which affects Android 2.2 and below. If successful, the malware can elevate its privilege to root. More recent Android versions (2.3+) have patched this bug, so this exploit will not succeed on them. In this case, the malware attempts to detect whether the phone has already been rooted and, if so, requests the root privilege. In either case, the malware still phones home with the collected phone information (e.g., IMEI and phone model). After obtaining the root privilege, the DroidKungFu malware can essentially access arbitrary files on the phone and is able to install or remove any package. One built-in payload of DroidKungFu installs a hidden app named legacy after getting the root privilege. The app is embedded as part of the infected host app and pretends to be the legitimate Google Search app, bearing the same icon. It turns out that the fake app is a backdoor.

Within a short two-month period from June to August 2011, three major versions of the DroidKungFu malware were identified. Clearly, while the anti-virus companies diligently push out signatures to detect malware in the wild, the malware authors are also working hard to evolve malware at a rapid pace to avoid detection. DroidKungFu now comes in different flavors (five so far), discovered by Prof. Xuxian Jiang (and his research team) and by Lookout. A brief overview of their differences can be obtained with Androguard's androsim.py tool.


    sara@Sara-Compaq-8510w-KU288ES-ABZ:~/Scrivania/androguard$ ./androsim.py -i /home/sara/Scrivania/Nuovo/droidKungFu.apk /home/sara/Scrivania/Nuovo/droidkungfu2.apk -d
    Elements:
        IDENTICAL: 45
        SIMILAR: 34
        NEW: 356
        DELETED: 209
        SKIPPED: 0
    --> methods: 13.900310% of similarities

Listing 1: androsim.py tool.

As we can see from [28], the package name of the malware is “com.tutusw.phonespeedup”:

    <manifest android:versionCode="14" android:versionName="1.3.1" android:installLocation="auto" package="com.tutusw.phonespeedup">
        <uses-sdk android:minSdkVersion="3" android:targetSdkVersion="8"/>
        <uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED"/>
        <uses-permission android:name="android.permission.WAKE_LOCK"/>
        <uses-permission android:name="android.permission.VIBRATE"/>
        <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE"/>
        <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE"/> <!-- Check connectivity to remote server -->
        <uses-permission android:name="android.permission.ACCESS_WIFI_STATE"/> <!-- Uses wifi for connectivity -->
        <uses-permission android:name="android.permission.CHANGE_WIFI_STATE"/>
        <uses-permission android:name="android.permission.INTERNET"/> <!-- Communicate with remote server -->
        <uses-permission android:name="android.permission.READ_PHONE_STATE"/> <!-- Get information from phone -->

Listing 2: Android Manifest.
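A first static check on such a manifest is to list the requested permissions and flag those that are sensitive. The following is a minimal sketch using only the Python standard library. It assumes the manifest has already been decoded to textual XML (inside an APK it is stored in Android's binary XML format, which a tool such as Androguard can decode); the shortened sample manifest and the `SENSITIVE` watch-list are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "android:" prefix in Android manifests.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Shortened, decoded sample manifest (illustrative).
MANIFEST = """
<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.tutusw.phonespeedup">
    <uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED"/>
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.READ_PHONE_STATE"/>
    <uses-permission android:name="android.permission.VIBRATE"/>
</manifest>
""".strip()

# Hypothetical watch-list: permissions the analyst wants highlighted.
SENSITIVE = {
    "android.permission.RECEIVE_BOOT_COMPLETED",
    "android.permission.INTERNET",
    "android.permission.READ_PHONE_STATE",
}


def requested_permissions(manifest_xml):
    """Return the permission names declared in <uses-permission> elements, in order."""
    root = ET.fromstring(manifest_xml)
    key = "{%s}name" % ANDROID_NS
    return [element.get(key) for element in root.iter("uses-permission")]


permissions = requested_permissions(MANIFEST)
flagged = sorted(p for p in permissions if p in SENSITIVE)
```

A combination such as boot notification, Internet access and phone state is not malicious by itself, but it matches the behavior described for DroidKungFu and is worth a closer look.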

The Android system requires that all installed applications be digitally signed with a certificate whose private key is held by the application's developer. The Android system uses the certificate as a means of identifying the author of an application and establishing trust relationships between applications. The certificate is not used to control which applications the user can install. The certificate does not need to be signed by a certificate authority: it is perfectly allowable, and typical, for Android applications to use self-signed certificates. The system will not install or run an application on an emulator or a device if it is not signed appropriately; thus, all applications must be signed. To test and debug an application, the build tools sign it with a special debug key that is created by the Android SDK build tools. Through Androguard [10] we have been able to retrieve the signature that identifies this malware in Androguard's malware database:

    [
      { "SAMPLE": "apks/malwares/kungfu/droidkungfu.apk" },
      { "BASE": "AndroidOS", "NAME": "DroidKungfu",
        "SIGNATURE": [
          { "TYPE": "METHSIM", "CN": "Lcom/google/ssearch/SearchService;", "MN": "getPermission1", "D": "()Z" },
          { "TYPE": "METHSIM", "CN": "Lcom/google/ssearch/SearchService;", "MN": "getPermission2", "D": "()V" },
          { "TYPE": "METHSIM", "CN": "Lcom/google/ssearch/SearchService;", "MN": "getPermission3", "D": "()V" }
        ],
        "BF": "a && b && c"
      }
    ]

Listing 3: droidkungfu.sign from Androguard’s malware database.
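The "BF" field of Listing 3 is a boolean formula over the sub-signatures. The sketch below shows one way such a formula could be evaluated once each sub-signature has been checked against a sample. It rests on assumptions that are not confirmed by the source: that the letters a, b, c refer to the entries of "SIGNATURE" in order, and that formulas use only `&&` and `||` without parentheses. It is not Androguard's implementation, and the function `evaluate_bf` is hypothetical.

```python
import json
import string

# Detection rule from Listing 3, reduced to the fields used here.
RULE = json.loads("""
{
  "NAME": "DroidKungfu",
  "SIGNATURE": [
    {"TYPE": "METHSIM", "CN": "Lcom/google/ssearch/SearchService;", "MN": "getPermission1", "D": "()Z"},
    {"TYPE": "METHSIM", "CN": "Lcom/google/ssearch/SearchService;", "MN": "getPermission2", "D": "()V"},
    {"TYPE": "METHSIM", "CN": "Lcom/google/ssearch/SearchService;", "MN": "getPermission3", "D": "()V"}
  ],
  "BF": "a && b && c"
}
""")


def evaluate_bf(formula, matches):
    """Evaluate a parenthesis-free formula of '&&' and '||' over per-signature match results.

    `matches[i]` tells whether the i-th sub-signature matched; it is bound to the
    i-th lowercase letter (assumed convention). '&&' binds tighter than '||'.
    """
    env = dict(zip(string.ascii_lowercase, matches))
    return any(
        all(env[term.strip()] for term in clause.split("&&"))
        for clause in formula.split("||")
    )


# Suppose a similarity engine reported these results for the three methods.
all_matched = evaluate_bf(RULE["BF"], [True, True, True])
one_missing = evaluate_bf(RULE["BF"], [True, False, True])
```

With the formula "a && b && c", a sample is reported as DroidKungfu only if all three methods of SearchService are found.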

When installed, the application checks whether the malicious service named "com.google.ssearch.SearchService" is already running. If the service is not found among the running services, the application starts it. We can see from the activities listed in the Android Manifest file that, inside the "com.google.ssearch.GoogleSsearch" activity, the malware starts its own service and then launches the application's primary activity. In the Android Manifest file we can also see that a new receiver is declared for the malware. The receiver is notified when the system has completed the boot process, so that it can start the service declared for the malware automatically, without user interaction. At first, the malware checks the shared preferences and then checks the connectivity using the network information of the device. Later it collects information on the device, for example the IMEI, operating system type, model and more.

The malware tries to connect to a remote server. To get information about it, we have used Wireshark [32], which is a free and open-source packet analyzer. It is used for network troubleshooting, analysis, software and communications protocol development, and education. Wireshark is cross-platform, using the GTK+ widget toolkit to implement its user interface and pcap to capture packets; it runs on various Unix-like operating systems including Linux, OS X, BSD and Solaris, and on Microsoft Windows. There is also a terminal-based (non-GUI) version called TShark. Wireshark, and the other programs distributed with it such as TShark, are free software, released under the terms of the GNU General Public License. Wireshark is similar to tcpdump, but has a graphical front-end plus some integrated sorting and filtering options. It allows the user to put network interface controllers that support promiscuous mode into that mode, in order to see all traffic visible on that interface, not just traffic addressed to one of the interface's configured addresses and broadcast/multicast traffic. However, when capturing with a packet analyzer in promiscuous mode on a port of a network switch, not all of the traffic traveling through the switch will necessarily be sent to the port on which the capture is being done, so capturing in promiscuous mode will not necessarily be sufficient to see all traffic on the network. Port mirroring or various network taps extend capture to any point on the network; simple passive taps are extremely resistant to malware tampering.

This malware encrypts two well-known exploits, named “exploid” (udev exploit) and “rage against the cage”. When the malware runs, it decrypts those two exploits and tries to gain root access on the device.
In the assets folder we can see three binary files encrypted with the AES algorithm. The two exploits used by the malware, i.e. “exploid” and “rage against the cage”, are well known. The malware tries to obtain root permissions in several steps: first, it checks its current permissions; second, it checks the Android version and attempts the exploits. If the malware cannot obtain root, it asks the user to grant it. The exploit requires USB debugging (adb) to be enabled in order to run successfully. If USB debugging is not enabled, it must be turned on, which requires the victim's approval.

Mobile Zeus

The Zeus malware (also known as Zbot) first appeared in 2006, when a security firm released a full reverse engineering analysis of an unknown trojan named PRG. Since then, it has been modified and customized to suit specific needs and released in different variants, each one offering innovative features to steal sensitive information. Like most banking Trojans, Zeus aims to steal sensitive information that allows the attacker to carry out a financial fraud against the victim. The Zeus environment is usually composed of three different entities: the bot, i.e. the machine that has been infected; the Command and Control server (C&C or dropzone), i.e. the main server where the control panel is hosted and where the bots send the stolen information; and the configuration server, i.e. the server where the configuration file is hosted, ready to be downloaded by the bots.

Mobile ZeuS [23], or Trojan-Spy.*.Zitmo, was designed for one sole purpose: to quickly steal mobile Transaction Authentication Numbers (mTAN codes) without mobile users noticing. The first important thing to point out is that ZitMo works in close collaboration with the regular ZeuS Trojan, a modification of the Trojan that targets the Win32 platform. To defend users from this malware, Riccardi et al. [25] have proposed a technique to extract the keystream used by Zeus to encipher its payload. In 2010, malicious users added a new function to the PC-based ZeuS. The way it worked remained more or less the same, except that a modified authentication page would now also ask users to enter data about their mobile device (the make, model and telephone number) in addition to their username and password.

Discussing Dynamic Analysis: DroidScope and DECAF

DroidScope [34] is an Android analysis platform for virtualization-based malware analysis. DroidScope reconstructs both the OS-level and Java-level semantics simultaneously and seamlessly.

Figure 3 illustrates the architecture of the Android system from the perspective of a system programmer. To demonstrate the capabilities of DroidScope, Yan et al. have developed several analysis tools on it.

[Figure 3 diagram. Android components: Linux Kernel, Zygote, System Services, Native Component, System libraries, Java Components, Java Libraries, DVM, JNI. DroidScope tools: API Tracer, Native insn. Tracer, Dalvik insn. Tracer, Taint Tracer, built on an Instrumentation Interface that exposes the Java level view and the OS level view.]

Figure 3: Overview of Android System in [34].

The API tracer monitors the malware's activities at the API level to reason about how the malware interacts with the Android runtime environment. This tool monitors how the malware's Java components communicate with the Android Java framework, how the native components interact with the Linux system, and how Java components and native components communicate through the JNI interface. The native instruction tracer and the Dalvik instruction tracer look into how a malicious app behaves internally by recording detailed instruction traces. The Dalvik instruction tracer records Dalvik bytecode instructions for the malware's Java components, and the native instruction tracer records machine-level instructions for the native components (if they exist). The taint tracker observes how the malware obtains and leaks sensitive information (e.g., GPS location, IMEI and IMSI) by leveraging the taint analysis component in DroidScope. Dynamic taint analysis has been proposed as a key technique for analyzing desktop malware, particularly with respect to information leakage behavior.

To reconstruct the OS-level view for DroidScope, they employed techniques similar to those used for x86 platforms, generally known as virtual machine introspection. The OS-level view is essential for analyzing native components. It also serves as a basis for obtaining the Java-level view for analyzing Java components. With basic instrumentation support, they extract the following OS-level semantic knowledge: system calls, running processes (including threads) and the memory map. To obtain the system call information, the relevant special instructions are instrumented, i.e. additional TCG instructions are inserted. As a result, a callback function is invoked, which is responsible for retrieving additional information from memory.
For important system calls (e.g. open, close, read, write, connect), the system call parameters and return values are retrieved as well. As a result, it is possible to understand how a user-level process accesses the file system and the network, communicates with another process, and so on.

With the OS-level view and knowledge of how the DVM operates internally, it is possible to reconstruct the Java or Dalvik view, including Dalvik instructions, the current machine state, and Java objects. The DVM executes Dalvik bytecode in two ways: interpretation and Just-In-Time compilation (JIT). The interpreter, named mterp, uses an offset-addressing method to map Dalvik opcodes to machine code blocks. The Just-In-Time compiler was introduced to improve performance by compiling heavily used, or hot, Dalvik instruction traces (consisting of multiple code blocks) directly into native machine code. Overall, JIT provides an excellent performance boost for programs that contain many hot code regions, although it makes fine-grained instrumentation more difficult. This is because JIT performs optimization on one or more Dalvik code blocks and thus blurs the Dalvik instruction boundaries. Since completely disabling JIT at build time may incur a heavy performance penalty, the authors have chosen to disable JIT selectively at runtime.

Java objects are described using two data structures. Firstly, ClassObject describes a class type and contains important information about that class: the class name, where it is defined in a dex file, the size of the object, the methods, and the location of the member fields within the object instances. To standardize class representations, Dalvik creates a ClassObject for each defined class type and each implicit class type, e.g. arrays. Secondly, as an abstract type, Object describes a runtime object instance, i.e. its member fields. Each Object has a pointer to the ClassObject that it is an instance of, plus a tail accumulator array for storing all member fields.

Symbols (such as function names, class names and field names) provide valuable information that helps human analysts understand program execution. Thus, DroidScope seeks to make the symbols readily available by maintaining a symbol database. For portability, one database of offsets to symbols per module is used. At runtime, finding a symbol by a virtual address requires first identifying the containing module using the shadow memory map, and then calculating the offset to search the database. Native library symbols are retrieved statically through objdump and are usually limited to Android libraries, since malware libraries are often stripped of all symbol information. Dalvik or Java symbols, on the other hand, are retrieved dynamically, and static symbol information obtained through dexdump is used as a fallback. This has the advantage of ensuring the best symbol coverage for optimized dex files and even dynamically generated Dalvik bytecode.
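The two-step symbol lookup described above (find the containing module in the memory map, then search that module's offset database) can be sketched as follows. This is an illustration only, not DroidScope's code: the class `SymbolResolver`, the module name, the base address and the symbol offsets are all hypothetical.

```python
import bisect


class SymbolResolver:
    """Maps a virtual address to (module, symbol, distance from symbol start)."""

    def __init__(self):
        self.modules = []   # sorted list of (base, end, module name): a toy "shadow memory map"
        self.symbols = {}   # module name -> sorted list of (offset, symbol name)

    def add_module(self, name, base, size, symbols):
        """Register a loaded module and its per-module offset database."""
        self.modules.append((base, base + size, name))
        self.modules.sort()
        self.symbols[name] = sorted((offset, symbol) for symbol, offset in symbols.items())

    def resolve(self, address):
        # Step 1: identify the containing module using the memory map.
        bases = [module[0] for module in self.modules]
        i = bisect.bisect_right(bases, address) - 1
        if i < 0:
            return None
        base, end, name = self.modules[i]
        if not base <= address < end:
            return None
        # Step 2: compute the module-relative offset and search the database
        # for the closest symbol at or below that offset.
        offset = address - base
        table = self.symbols[name]
        offsets = [entry[0] for entry in table]
        j = bisect.bisect_right(offsets, offset) - 1
        if j < 0:
            return None
        symbol_offset, symbol = table[j]
        return name, symbol, offset - symbol_offset


resolver = SymbolResolver()
resolver.add_module("libexample.so", 0x40000000, 0x1000,
                    {"func_a": 0x100, "func_b": 0x400})
```

Storing offsets rather than absolute addresses is what makes one database per module reusable regardless of where the module is loaded.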

DECAF is a multi-target binary analysis platform. The core idea is to abstract away the details of different targets so that the analyst can focus on the analysis itself. For example, the x86 program counter register EIP contains the virtual address of the instruction currently being executed, while the corresponding ARM register PC points to the next instruction; DECAF provides a single function, DECAF_getCurPC, that returns the address of the instruction being executed, alongside target-specific functions for obtaining register values. In this way, all the analyst has to do is register for different events, such as “block begin”, “instruction begin” or “system call”. Following a similar philosophy, DECAF also provides multiple virtual machine introspection facilities, so that regardless of whether the guest machine is Windows or Linux, the analyst can readily obtain a shadow process list, among other things.

The current version of DroidScope is built for Android Gingerbread. Since the authors mainly dealt with the 32-bit ARM architecture and included some files from the Android source code as part of DroidScope, it requires the host machine to be a 32-bit machine. DECAF does not have this limitation. Since the original paper, the authors of DroidScope have been porting it to run on top of the DECAF [35] binary analysis platform. The immediate advantages are:

• Seamless ARM and x86 Native API support;

• Dynamic loading of plugins;

• More refined NativeAPI - more callbacks, and some bug fixes;

• Better Virtual Machine Introspection support.

Let us now introduce the core of our research: the DalvikInstructionTracer plugin. Recalling that the DalvikInstructionTracer records Dalvik bytecode instructions for the malware's Java components, let us show the most important pieces of its source code:

    /* Monitor commands exported by the plugin; the list itself lives in plugin_cmds.h. */
    static mon_cmd_t DIT_term_cmds[] = {
    #include "plugin_cmds.h"
        { NULL, NULL, },
    };

    /* Called when the plugin is unloaded: stop tracing and unregister the callback. */
    void DIT_cleanup()
    {
        if (gTracingPID != -1)
        {
            mterp_clear(gTracingPID);
            gTracingPID = -1;
        }

        if (DIT_handle != DECAF_NULL_HANDLE)
        {
            DS_Dalvik_unregister_callback(DS_DALVIK_INSN_BEGIN_CB, DIT_handle);
            DIT_handle = DECAF_NULL_HANDLE;
        }
    }

    plugin_interface_t DIT_interface;

    /* Plugin entry point: fill in the interface and open the trace output file. */
    plugin_interface_t *init_plugin(void)
    {
        DIT_interface.mon_cmds = DIT_term_cmds;
        DIT_interface.plugin_cleanup = &DIT_cleanup;

        // initialize the plugin
        miofile = fopen("dalvikfile.txt", "a");  /* trace log, opened in append mode */

        DIT_init();
        return (&DIT_interface);
    }

Listing 4: The DalvikInstructionTracer plugin source code.

Our Solution: Enhanced Dynamic Analysis

Because of the limitations of emulators, we could not perform tainting activities, so in this chapter we illustrate a simple malicious application that we have built to carry out our research.

Thief Application

Following our study of the Android malware DroidKungFu, our application aims to steal sensitive information, i.e. the IMEI, contacts and accounts, and to perform background actions without the users' consent. Once the application is installed, users see the screen shown in Figure 4.

Figure 4: Main Screen of Thief Activity.

The core idea of the application is to send sensitive data stolen from devices to a listening server. Each time users press one of the buttons, a connection is set up and data are transmitted. To establish a connection, we have created a Client and a Server, and the stolen data are encapsulated in a Message.

Before explaining which services are related to each button, let us have a look at the AndroidManifest.xml file to see the requested permissions.

    <uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
    <uses-permission android:name="android.permission.CHANGE_WIFI_STATE" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
    <uses-permission android:name="android.permission.ACCESS_FINE_LOCATION" />
    <uses-permission android:name="android.permission.ACCESS_COARSE_LOCATION" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.READ_PHONE_STATE" />
    <uses-permission android:name="android.permission.ACCESS_MOCK_LOCATION" />
    <uses-permission android:name="android.permission.READ_INTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
    <uses-permission android:name="android.permission.GET_ACCOUNTS"/>
    <uses-permission android:name="android.permission.AUTHENTICATE_ACCOUNTS"/>

Listing 5: Android Manifest of Thief Application.

As already said, the IMEI number is a fundamental device identifier. Consumer IMEIs have value to black market phone vendors. When a phone is reported stolen, its IMEI is blacklisted, which prevents it from connecting to cellular networks. This is supposed to render stolen phones useless.

In practice, thieves can alter phone IMEIs to replace blacklisted IMEIs with valid ones. This motivates a market for valid consumer IMEIs.

By default, an emulator's IMEI should be “000-000-000-000-000”, but to make the activity as credible as possible, we have changed it in the emulator-arm file.

A user’s contact list includes contacts’ names, phone numbers, and e-mail addresses. This contact information could be sold to scammers, spammers, or phishers.

As with the IMEI, an emulated device cannot provide a real contact list, so we wrote a vcf file containing fake names and telephone numbers. This file was pushed onto the simulated SD card and then imported into the emulator.

Once the contact list had been stolen, we were able to send e-mails in the background to each of the contacts.

Another well-known malicious service we implemented retrieves information about the user’s geographical position.

Since our experiments were performed on an emulator, we could not get real GPS coordinates. In any case, as we did for the IMEI number, we were able to emulate a pair of geographical coordinates through adb and through the DDMS perspective in Eclipse.

The last service we implemented floods the system with hundreds of IMEI requests.

This causes a Denial of Service (DoS), which is an attempt to make a machine or network resource unavailable to its intended users.

Generally speaking, the motives for and targets of a DoS attack may vary; such attacks generally consist of efforts to temporarily or indefinitely interrupt or suspend the services of a host connected to the Internet.

One common method of attack involves saturating the target machine with external communication requests, so much so that it cannot respond to legitimate traffic, or responds so slowly as to be rendered essentially unavailable. Such attacks usually lead to a server overload. In general terms, DoS attacks are implemented by forcing the targeted computer to reset, by consuming its resources so that it can no longer provide its intended service, or by obstructing the communication media between the intended users and the victim so that they can no longer communicate adequately.

A denial-of-service attack is characterized by an explicit attempt by attackers to prevent legitimate users of a service from using that service. There are two general forms of DoS attacks: those that crash services and those that flood services.

A DoS attack can be perpetrated in a number of ways. The five basic types of attack are:

1. Consumption of computational resources, such as bandwidth, disk space, or processor time.

2. Disruption of configuration information, such as routing information.

3. Disruption of state information, such as unsolicited resetting of TCP sessions.

4. Disruption of physical network components.

5. Obstructing the communication media between the intended users and the victim so that they can no longer communicate adequately.


In most cases DoS attacks involve forging IP sender addresses (IP address spoofing) so that the location of the attacking machines cannot easily be identified.

Our aim is to cause a Denial of Service by making the underlying DroidScope plugin unavailable. We allocated a large number of threads, each of which was to steal the IMEI number. Once created, these threads did not run immediately: they had to wait for a kind of green light, granted by the Semaforo object. The Semaforo is shared by all the threads and is initialized in the “red” state; as soon as it switches to the “green” state, all the threads try to read the IMEI number simultaneously. Once launched, the flooding attack does cause a Denial of Service, halting not only the application but the entire emulator.
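The red/green mechanism can be illustrated with a small shared gate on which all worker threads block until it is opened. The sketch below is an assumption about how such an object might look, not the actual Semaforo class: the name `GateSketch`, its methods, and the placeholder action that stands in for the IMEI request are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a shared "red/green" gate; not the thesis implementation.
final class GateSketch {
    private boolean green = false; // the gate starts in the "red" state

    // Blocks the calling thread until the gate turns green.
    synchronized void awaitGreen() throws InterruptedException {
        while (!green) {
            wait();
        }
    }

    // Switches to green and wakes every waiting thread at once.
    synchronized void turnGreen() {
        green = true;
        notifyAll();
    }

    // Starts `workers` threads that wait on one gate, opens it, and
    // returns how many of them ran `action` to completion.
    static int releaseTogether(int workers, Runnable action) {
        GateSketch gate = new GateSketch();
        AtomicInteger completed = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    gate.awaitGreen();
                    action.run(); // placeholder for the IMEI request
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        gate.turnGreen();
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        int released = releaseTogether(100, () -> { });
        System.out.println(released + " workers released");
    }
}
```

The same effect could be obtained with `java.util.concurrent.CountDownLatch` initialized to 1; the hand-written gate is used here only because it mirrors the red/green description more directly.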

Running Thief Activity under DECAF’s control

Since the plugin is merely a printer of Dalvik instructions, our first task was to verify that different inputs would produce different outputs. Simply stated, the first step was to “taint” noteworthy variables. For example, by giving the emulator two different IMEI numbers, we were able to compare the corresponding logs.

A DoS Attack on the Monitoring System

As already stated, our aim was to study Thief Activity’s behavior under DECAF’s control. The application was so intrusive that it halted the entire system, and DECAF fared no better. As we expected, the attack succeeded: it hung the emulator, and the DalvikInstructionTracer could not trace all events, always leaving the last lines of the log incomplete.

As a next step, we turned our attention to the flooding problem, trying to detect this kind of attack through the log file and to protect the system from the flood of calls.

Pattern Matching for Suspicious Activity

Pattern matching [31] is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact.

The patterns generally have the form of either sequences or tree structures. Uses of pattern matching include outputting the locations (if any) of a pattern within a token sequence, outputting some component of the matched pattern, and substituting the matching pattern with some other token sequence (i.e., search and replace).

Sequence patterns (e.g., a text string) are often described using regular expressions and matched using techniques such as backtracking.
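The three uses of sequence patterns listed above (locating matches, extracting a component, search and replace) can be sketched with the standard `java.util.regex` classes. The class name `SequencePatternSketch`, its helper methods, and the sample trace text in `main` are hypothetical and serve only as an illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class illustrating three uses of sequence patterns.
final class SequencePatternSketch {

    // Use 1: report the start offset of every match of `regex` in `text`.
    static List<Integer> locations(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    // Use 2: return the first capture group of the first match, or null.
    // The regex is assumed to contain at least one capturing group.
    static String firstGroup(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // Use 3: search and replace every match.
    static String replaceAll(String regex, String text, String replacement) {
        return Pattern.compile(regex).matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Made-up text; it does not reproduce the tracer's real output format.
        String text = "call getColor; call getDeviceId; call getColor";
        System.out.println(locations("getColor", text));
        System.out.println(firstGroup("get(\\w+)", text));
        System.out.println(replaceAll("getColor", text, "X"));
    }
}
```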

Tree patterns are used in some programming languages as a general tool to process data based on its structure; e.g., Haskell [14], ML [16] and the symbolic mathematics language Mathematica [15] have special syntax for expressing tree patterns and a language construct for conditional execution and value retrieval based on them. For simplicity and efficiency reasons, these tree patterns lack some features that are available in regular expressions.

Searching for a string in Java is very simple: it can be performed with a Pattern object and a Matcher object, using the latter’s find method.

import java.io.*;
import java.util.regex.Pattern;

// fstream and br are fields of the enclosing class, as in the original listing.
private boolean leggiFile() {
    try {
        // "path of the file" is a placeholder for the location of the log file.
        File directory = new File("path of the file");
        fstream = new FileInputStream(directory);
        br = new BufferedReader(new InputStreamReader(fstream));
        Pattern p = Pattern.compile("getColor");
        String stringApp;
        // Scan line by line; stop at the first match or at end of file.
        // Checking for null here avoids passing a null line to the matcher.
        while ((stringApp = br.readLine()) != null) {
            if (p.matcher(stringApp).find()) {
                return true;
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return false;
}

Listing 6: Matching the string “getColor”.

The Anti-flooding Button

In this section we describe our idea for safeguarding the system from the attack described above.

Leaving the Flooding Button untouched, we duplicated it as the Anti-Flooding Button, adding code that could limit the damage. The main idea was to implement a service that, while the activity and the DalvikInstructionTracer were running, could read the file being written and, as soon as it tracked down suspicious entries, kill the current activity. The first obstacle is that the plugin writes its file outside the emulator’s SD card, which is the only location from which the activity can read files. In other words, the activity cannot be made aware of the Dalvik log file produced by the DalvikInstructionTracer. To get around this problem, we implemented a kind of server, named “reader”.

public Reader(int port) {
    try {
        letSocket = new ServerSocket(port);
        readData();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

Listing 7: The server Reader.

Like a real server, the reader establishes a connection with the application running on the emulator and, as the name suggests, reads the Dalvik file that the activity did not have access to. While reading, it searches for the method “getColor”, which reveals beyond doubt that the attack is about to start. When the pattern is matched, the method returns a flag to the activity. If the flag is true, the string has been found and the receiving activity immediately shows a dialog. This dialog has a number field, so the user can enter the PID of the application he wants to stop and interrupt the flooding.

Figure 5: Returning a flag set to true and showing the dialog to insert the PID and stop the activity.
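The behavior of the reader described above (accept a connection, scan the trace file for the marker, send a flag back) can be sketched as follows. This is a minimal illustration, not the actual Reader class: the name `ReaderSketch`, the method names, the single-boolean reply, and the file encoding are assumptions.

```java
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical host-side sketch; names and protocol are assumptions, not the thesis code.
final class ReaderSketch {

    // Returns true as soon as one line of the log contains the literal marker.
    static boolean containsMarker(BufferedReader log, String marker) {
        Pattern p = Pattern.compile(Pattern.quote(marker));
        try {
            String line;
            while ((line = log.readLine()) != null) {
                if (p.matcher(line).find()) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience wrapper for scanning text already held in memory.
    static boolean scanText(String text, String marker) {
        return containsMarker(new BufferedReader(new StringReader(text)), marker);
    }

    // Accepts one connection from the activity, scans the trace file on the
    // host, and replies with a single boolean flag.
    static void serveOnce(ServerSocket server, Path traceFile, String marker) throws IOException {
        try (Socket client = server.accept();
             // ISO-8859-1 is an assumption; it never fails on arbitrary bytes.
             BufferedReader log = Files.newBufferedReader(traceFile, StandardCharsets.ISO_8859_1);
             DataOutputStream out = new DataOutputStream(client.getOutputStream())) {
            out.writeBoolean(containsMarker(log, marker));
            out.flush();
        }
    }

    public static void main(String[] args) {
        // Made-up lines; they do not reproduce the tracer's real output format.
        System.out.println(scanText("invoke a\ninvoke getColor\n", "getColor"));
    }
}
```

Keeping the scan separate from the socket handling makes the matching logic checkable on its own, without running the emulator or the plugin.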


Discussion and Conclusions

Today’s smartphone operating systems frequently fail to give users adequate control over, and visibility into, how third-party applications use their private data.

Their popularity also encourages malware authors to penetrate the various mobile marketplaces with malicious applications (or apps).

These malicious apps hide in the sheer number of other normal apps, which makes their detection challenging. Unofficial repositories also exist, where developers can upload applications, including cracked applications or trojan horses. This has allowed malicious attackers to upload malware to the Google Market and also to spread malware through unofficial repositories.

Existing mobile anti-virus software is inadequate because of its reactive nature: it relies on known malware samples for signature extraction.

Contributions

The most important contribution of this work is the mechanism we propose for obtaining and analyzing real traces of application behavior.

With the help of several tools [2, 1, 21, 7, 10] (see §), we were able to study the code of real malware that had affected Android users in the past.

Furthermore, using the DroidScope/DECAF analysis platform by Yan et al. [34], we focused on detecting anomalous applications at runtime (§). In particular, we tested its DalvikInstructionTracer plugin. By deploying our own app, we provided evidence of its real effectiveness in tainting data.

This analysis technique has been widely used in the literature. We have seen that there are many different approaches to detecting malware. We consider monitoring Dalvik system calls to be one of the most accurate techniques for determining the behavior of Android applications, since these calls provide detailed low-level information.

We also considered the benefits provided by a virtualization-based analysis platform: the ability to analyze even the most privileged attacks in the kernel, and an analysis performed entirely externally.

The next step was to launch a flooding attack that could block the plugin’s activity and also the entire system. Once we had launched the attack, we set about finding a solution.

We proposed a solution based on the string matching technique. During the execution of the activity, we were able to observe its behavior, and we could even alert the user when the attack was about to start.

Approach Limitations

First of all, because our approach is based on DroidScope/DECAF, we have to consider its limitations, which are described in more detail in [34], i.e. limited code coverage and the detection and evasion of DroidScope.

In fact, emulation-resistant malware detects whether it is running within an emulated environment and evades analysis by staying dormant or simply crashing itself.

Furthermore, our anti-flooding system is penalized by performance overhead. In fact, our experiments were run in a virtual machine, which adds a further virtualization layer and slows down the entire system.

Future Work

We could improve our approach in two directions.

First, the DalvikInstructionTracer plugin can be improved in terms of running time: it prints a large number of strings at a time, considerably slowing down the whole experimental environment. For example, the logs could be filtered according to fixed parameters, improving performance.
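One possible reading of filtering the logs according to fixed parameters is to keep only the trace lines that mention a fixed set of keywords. The sketch below illustrates that idea on lines already held in memory; the class name `TraceFilterSketch`, the keyword list, and the sample lines are hypothetical, and the plugin itself would need an equivalent filter applied before it writes its output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of keyword-based log filtering; not part of DroidScope/DECAF.
final class TraceFilterSketch {

    // Keeps the lines that contain at least one of the given keywords,
    // preserving their original order.
    static List<String> keepRelevant(List<String> lines, List<String> keywords) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            for (String keyword : keywords) {
                if (line.contains(keyword)) {
                    kept.add(line);
                    break; // one matching keyword is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up lines; they do not reproduce the tracer's real output format.
        List<String> lines = Arrays.asList(
                "move-result v0",
                "invoke getDeviceId",
                "const v1",
                "invoke getColor");
        System.out.println(keepRelevant(lines, Arrays.asList("getDeviceId", "getColor")));
    }
}
```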

Second, the plugins available so far only for the x86 platform, i.e. the APITracer, the NativeInstructionTracer and the TaintTracker, could be ported to ARM.

Bibliography

[1] IP2Location, bringing location to the internet. [Online; accessed 20-July-2013].

[2] ProGuard, 2013. [Online; accessed 9-August-2013].

[3] Jeffrey Bickford, H. Andrés Lagar-Cavilla, Alexander Varshavsky, Vinod Ganapathy, and Liviu Iftode. Security versus energy tradeoffs in host-based mobile malware detection. In Proceedings of the 9th international conference on Mobile systems, applications, and services, MobiSys ’11, pages 225–238, New York, NY, USA, 2011. ACM.

[4] Sven Bugiel, Lucas Davi, Alexandra Dmitrienko, Thomas Fischer, and Ahmad-Reza Sadeghi. Xmandroid: A new android evolution to mitigate privilege escalation attacks. Technical Report TR-2011-04, Technische Universität Darmstadt, Apr 2011.

[5] Sven Bugiel, Lucas Davi, Alexandra Dmitrienko, Stephan Heuser, Ahmad-Reza Sadeghi, and Bhargava Shastry. Practical and lightweight domain isolation on android. In Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices, SPSM ’11, pages 51–62, New York, NY, USA, 2011. ACM.

[6] Iker Burguera, Urko Zurutuza, and Simin Nadjm-Tehrani. Crowdroid: behavior-based malware detection system for android. In Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices, SPSM ’11, pages 15–26, New York, NY, USA, 2011. ACM.

[7] Yuan Tian Cong Zheng, Ryan W. Smith. Apkinspector, 2012. [Online; accessed 19-May-2013].

[8] Mauro Conti, Vu Thien Nga Nguyen, and Bruno Crispo. Crepe: context-related policy enforcement for android. In Proceedings of the 13th international conference on Information security, ISC’10, pages 331–345, Berlin, Heidelberg, 2011. Springer-Verlag.

[9] Thomas D. Androwarn, yet another static code analyzer for malicious android applications, 2012. [Online; accessed 19-May-2013].

[10] Anthony Desnos. Androguard, reverse engineering, malware and goodware analysis of android applications ... and more (ninja !), 2012. [Online; accessed 19-May-2013].

[11] William Enck, Peter Gilbert, Byung-Gon Chun, Landon P. Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N. Sheth. Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. In Proceedings of the 9th USENIX conference on Operating systems design and implementation, OSDI’10, pages 1–6, Berkeley, CA, USA, 2010. USENIX Association.

[12] William Enck, Machigar Ongtang, and Patrick McDaniel. Understanding android security. IEEE Security and Privacy, 7(1):50–57, Jan 2009.

[13] Clint Gibler, Jonathan Crussell, Jeremy Erickson, and Hao Chen. Androidleaks: automatically detecting potential privacy leaks in android applications on a large scale. In Proceedings of the 5th international conference on Trust and Trustworthy Computing, TRUST’12, pages 291–307, Berlin, Heidelberg, 2012. Springer-Verlag.


[14] Google. Haskell, 2013. [Online; accessed 06-August-2013].

[15] Google. Mathematica, 2013. [Online; accessed 06-August-2013].

[16] Google. Ml, 2013. [Online; accessed 06-August-2013].

[17] Mariem Graa, Nora Cuppens-Boulahia, Frédéric Cuppens, and Ana Cavalli. Detecting control flow in smartphones: combining static and dynamic analyses. In Proceedings of the 4th international conference on Cyberspace Safety and Security, CSS’12, pages 33–47, Berlin, Heidelberg, 2012. Springer-Verlag.

[18] Michael Grace, Yajin Zhou, Qiang Zhang, Shihong Zou, and Xuxian Jiang. Riskranker: scalable and accurate zero-day android malware detection. In Proceedings of the 10th international conference on Mobile systems, applications, and services, MobiSys ’12, pages 281–294, New York, NY, USA, 2012. ACM.

[19] Kevin Gudeth, Matthew Pirretti, Katrin Hoeper, and Ron Buskey. Delivering secure applications on commercial mobile devices: the case for bare metal hypervisors. In Proceedings of the 1st ACM workshop on Security and privacy in smartphones and mobile devices, SPSM ’11, pages 33–38, New York, NY, USA, 2011. ACM.

[20] Vivek Haldar, Deepak Chandra, and Michael Franz. Dynamic taint propagation for java. In Proceedings of the 21st Annual Computer Security Applications Conference, ACSAC ’05, pages 303–311, Washington, DC, USA, 2005. IEEE Computer Society.

[21] iSecLab. Andrubis: A tool for analyzing unknown android applications, 2012. [Online; accessed 19-May-2013].

[22] Long Lu, Zhichun Li, Zhenyu Wu, Wenke Lee, and Guofei Jiang. Chex: statically vetting android apps for component hijacking vulnerabilities. In Proceedings of the 2012 ACM conference on Computer and communications security, CCS ’12, pages 229–240, New York, NY, USA, 2012. ACM.

[23] Denis Maslennikov. Update: Security alert: Hacked websites serve suspicious android apps (notcompatible), October 6, 2012. [Online; accessed 29-November-2012].

[24] Vaibhav Rastogi, Yan Chen, and Xuxian Jiang. Droidchameleon: evaluating android anti-malware against transformation attacks. In Proceedings of the 8th ACM SIGSAC symposium on Information, computer and communications security, ASIA CCS ’13, pages 329–334, New York, NY, USA, 2013. ACM.

[25] Marco Riccardi, Roberto Di Pietro, Marta Palanques, and Jorge Aguilí Vila. Titans’ revenge: Detecting zeus via its own flaws. Comput. Netw., 57(2):422–435, Feb 2013.

[26] Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In Proceedings of the 2010 IEEE Symposium on Security and Privacy, SP ’10, pages 317–331, Washington, DC, USA, 2010. IEEE Computer Society.

[27] Karsten Sohr, Tanveer Mustafa, and Adrian Nowak. Software security aspects of java-based mobile phones. In Proceedings of the 2011 ACM Symposium on Applied Computing, SAC ’11, pages 1494–1501, New York, NY, USA, 2011. ACM.

[28] 2012 AVG Technologies. [Online; accessed 14-April-2013].

[29] Zhi Wang, Jiang Ming, Chunfu Jia, and Debin Gao. Linear obfuscation to combat symbolic execution. In Proceedings of the 16th European conference on Research in computer security, ESORICS’11, pages 210–226, Berlin, Heidelberg, 2011. Springer-Verlag.

[30] T.J. Watson. Welcome to the t.j. watson libraries for analysis (wala), 2006. [Online; accessed 19-July-2013].


[31] Wikipedia. Pattern matching — wikipedia, the free encyclopedia, 2013. [Online; accessed 6-August-2013].

[32] Wikipedia. Wireshark — wikipedia, the free encyclopedia, 2013. [Online; accessed 21-April-2013].

[33] Lok-Kwong Yan, Manjukumar Jayachandra, Mu Zhang, and Heng Yin. V2e: combining hardware virtualization and software emulation for transparent and extensible malware analysis. SIGPLAN Not., 47(7):227–238, March 2012.

[34] Lok Kwong Yan and Heng Yin. Droidscope: seamlessly reconstructing the os and dalvik semantic views for dynamic android malware analysis. In Proceedings of the 21st USENIX conference on Security symposium, Security’12, pages 29–29, Berkeley, CA, USA, 2012. USENIX Association.

[35] Lok Kwong Yan and Heng Yin, 2013. [Online; accessed 12-January-2013].
