triggerscope: towards detecting logic bombs in android applications

TriggerScope: Towards Detecting Logic Bombsin Android Applications

Yanick Fratantonio, Antonio Bianchi, William Robertson, EnginKirda, Christopher Kruegel, Giovanni Vigna

Pietro De Nicolao

Politecnico di Milano

June 21, 2016

1 / 26

Introduction to logic bombs

TriggerScope

Experiment

Critique

Related and future work

2 / 26

The paper

TriggerScope: Towards Detecting Logic Bombsin Android Applications

Yanick Fratantonio∗, Antonio Bianchi∗, William Robertson†, Engin Kirda†, Christopher Kruegel∗, Giovanni Vigna∗∗UC Santa Barbara

{yanick,antoniob,chris,vigna}@cs.ucsb.edu†Northeastern University{wkr,ek}@ccs.neu.edu

Abstract—Android is the most popular mobile platform today,and it is also the mobile operating system that is most heavilytargeted by malware. Existing static analyses are effective indetecting the presence of most malicious code and unwantedinformation flows. However, certain types of malice are very dif-ficult to capture explicitly by modeling permission sets, suspiciousAPI calls, or unwanted information flows.

One important type of such malice is malicious applicationlogic, where a program (often subtly) modifies its outputs or per-forms actions that violate the expectations of the user. Maliciousapplication logic is very hard to identify without a specification ofthe “normal,” expected functionality of the application. We referto malicious application logic that is executed, or triggered, onlyunder certain (often narrow) circumstances as a logic bomb. Thisis a powerful mechanism that is commonly employed by targetedmalware, often used as part of APTs and state-sponsored attacks:in fact, in this scenario, the malware is designed to target specificvictims and to only activate under certain circumstances.

In this paper, we make a first step towards detecting logicbombs. In particular, we propose trigger analysis, a new staticanalysis technique that seeks to automatically identify triggersin Android applications. Our analysis combines symbolic execu-tion, path predicate reconstruction and minimization, and inter-procedural control-dependency analysis to enable the precisedetection and characterization of triggers, and it overcomesseveral limitations of existing approaches.

We implemented a prototype of our analysis, called TRIG-GERSCOPE, and we evaluated it over a large corpus of 9,582benign apps from the Google Play Store and a set of trigger-based malware, including the recently-discovered HackingTeam’sRCSAndroid advanced malware. Our system is capable ofautomatically identify several interesting time-, location-, andSMS-related triggers, is affected by a low false positive rate(0.38%), and it achieves 100% detection rate on the malwareset. We also show how existing approaches, specifically whentasked to detect logic bombs, are affected by either a very highfalse positive rate or false negative rate. Finally, we discuss thelogic bombs identified by our analysis, including two previously-unknown backdoors in benign apps.

I. INTRODUCTION

Android is currently the most popular mobile platform. 78%of all smartphones sold in Q1 2015 [39] were shipped withAndroid installed, and the Google Play Store now hosts morethan two million applications [17]. Unfortunately, Androidhas also become the most widely-attacked mobile platform;according to a recent report, it is the target of 79% of knownmobile malware instances [59].

App store providers invest significant resources to protecttheir users and keep their platforms clean from maliciousapps. To prevent malicious apps from entering the market(and to detect malice in already-accepted applications), theseproviders typically use a combination of automated programanalysis (e.g., Google Bouncer [41]) and manual app reviews.These automated approaches leverage static and/or dynamiccode analysis techniques, and they aim to detect potentially-malicious behaviors – e.g., exfiltrating personal private infor-mation, stealing second-factor authentication codes, sendingtext messages to premium numbers, or creating mobile bot-nets. These techniques are similar in nature to the numer-ous approaches to detecting Android malware proposed inacademia [29], [31], [40], [36], [64], [14], [18], [19], [32].These approaches proved to be effective when detecting tra-ditional malware [63], and recent reports show that the officialGoogle Play app store is reasonably free from maliciousapplications [15].

Nonetheless, there are certain types of malice that are stillvery difficult to capture explicitly by modeling permissionsets, suspicious API calls, or unwanted information flows(i.e., all those features used by existing analysis approaches).One important type of such malice is malicious applicationlogic. We consider a program to contain malicious applicationlogic when it (often subtly) modifies its outputs, providingresults that violate the expectations that a user can reasonablyhave when interacting with this app. In particular, we refer tomalicious application logic that is executed, or triggered, onlyunder certain (often narrow) circumstances as a logic bomb.

As an example of a logic bomb, consider a navigationapplication (similarly to Google Maps) that is meant to assist asoldier in the battlefield when determining the shortest route toa given location. As a legitimate part of its intended behavior,this application would collect GPS-related information, sendthe information over the network to the application’s back-end for processing, retrieve the results, and display to theuser some helpful information (such as the route to follow).Assume further that this app contains a functionality thatchecks whether the current day is past a specific, hard-codeddate: If the current day is indeed past this date, the app subtlyqueries the network back-end for a long route, and not forthe shortest one as the user would expect. Thus, after the

2016 IEEE Symposium on Security and Privacy

2375-1207/16 $31.00 © 2016 IEEEDOI 10.1109/SP.2016.30

377

2016 IEEE Symposium on Security and Privacy

© 2016, Yanick Fratantonio. Under license to IEEE.DOI 10.1109/SP.2016.30

377

3 / 26

Introduction to logic bombs

4 / 26

Logic bombs

Logic bomb: malicious application logic that is triggered onlyunder certain (narrow) conditions.

I Malicious application logic: violation of user’s reasonableexpectations

I Malware is designed to target specific victims, under certaincircumstances

Example: take a navigation application, supposed to help a soldierin a war zone find the shortest route to a location.

I after a given (hardcoded) date, it gives to him longer, moredangerous routes

5 / 26

Another (real) example

RemoteLock: Android app that allows the user to remotely lockand unlock the device with a SMS containing an user-definedkeyword

I The app code also contains the following check:(!= (#sms/#body equals "adfbdfgfsgyhytdfsw")) 0)

I The predicate is triggered when an incoming SMS contains thathardcoded string

I By sending a SMS containing that string, the device unlocksI That’s a backdoor implemented with a logic bomb!

6 / 26

Problems with traditional defenses

App Stores employ some defenses, but they are not sufficient.I Static analysis: malicious application logic doesn’t require

additional privileges or make “strange” API callsI Malicious behavior is deeply hidden within the app logicI Example: the malicious navigation app. . . behaved like any

navigation app!

I Dynamic analysis: likely won’t execute code triggered only ona future date or in a certain location

I Code coverage problemsI Can be detected and evadedI Even if covered, how to discern malicious behavior from benign?

I Manual audit: if source code is not available, no guaranteesI Code can be obfuscated

7 / 26

TriggerScope

8 / 26

Key observation

TriggerScope detects logic bombs by precisely analyzing andcharacterizing the checks (conditionals, predicates) that guarda given behavior.

I It gives less importance to the (malicious) behavior itself.

if(sms.getBody().equals("adfbdf...")) // Look here!{

myObject.doSomething(); // ...not there.}

9 / 26

Trigger analysis

I Predicate: logic formula used in a conditional statementI (&& (!= (#sms/#body contains "MPS:") 0) (!=

(#sms/#body contains "gps") 0))I Suspicious predicate: a predicate satisfied only under very

specific, narrow conditions

I Functionality: a set of basic blocks in a programI Sensitive functionality: a functionality performing, directly or

indirectly a sensitive operationI In practice: all calls to Android APIs protected by permissions,

and operations involving the filesystem

I Trigger: suspicious predicate controlling the execution of asensitive functionality

10 / 26

Analysis overview (1)

1. Static analysis of bytecode; building of Control Flow Graph2. Symbolic Values Modeling for integer, string, time, location

and SMS-based objects3. Expression Trees are built and appended to each symbolic

object referenced in a checkI Reconstruction of the semantics of the check, often lost in

bytecode

!!!"#$%&#'$%()"#$%*

+,-. /01231/3//Figure 4: Example of an expression tree.

against sensitive values, in terms of both false positives andfalse negatives. The details about this step are provided inSection III-E.

III. ANALYSIS STEPS

While the previous section provided a high-level overviewof the analysis steps, in this section we elaborate upon thedetails.

A. Symbolic Execution

The analysis begins by first unpacking the Android APK andextracting the DEX file that contains the application’s Dalvikbytecode, the encoded application manifest, and encoded re-sources such as string values and GUI layouts. The bytecodeis then lifted into a custom intermediate representation (IR)that all our analysis passes operate on. The analysis thenperforms a class hierarchy analysis and control flow analysisover the IR to construct the intra- and inter-procedural control-flow graphs. After these preliminary steps, the application’sbytecode is subjected to forward static symbolic execution,applying a flow-, path-, and context-sensitive analysis, wherethe particular context used ranges from full insensitivity to2Type1Heap object sensitivity [51], which is known to providea good trade-off between precision and performance whenperforming symbolic execution on object-based programs.Android Framework Modeling. A notable feature of ouranalysis that bears mention is its awareness of i) the An-droid application lifecycle, ii) control flows that traverse theAndroid application framework due to the pervasive use ofasynchronous callbacks used in Android applications, andiii) inter-component communication using the Android intentframework. The precise modeling of these aspects has beenwidely studied in the literature and, for our design andimplementation, we mainly reused ideas from previous works.In particular, we follow the approach described in FlowDroidto model Android application components’ lifecycle [19]; weintegrate EdgeMiner’s results to model the control flow trans-fers through the Android framework [24]; and we reused ideaspresented in Epicc [45] to precisely model inter-componentcommunications among the applications components.Symbolic Values Modeling. Our symbolic execution enginemodels the sets of possible values that local and field lo-cations can contain. In particular, the analysis focuses oninteger, string, time, location, and SMS-related objects, andit records operations performed over concrete and symbolicvalues in these classes. For example, the analysis faithfullymodels symbolic string values produced and manipulated bymany important classes (e.g., String, StringBuffer,StringBuilder, and related classes) and their respective

methods (e.g., append, substring, and similar APIs).Moreover, our system models symbolic integer values resultingfrom APIs such as equals, startswith, and contains,which are particularly important when detecting suspiciouschecks on String-based objects (such as the body and thesender retrieved from the SmsMessage object). The accuratemodeling of String objects is also useful to precisely modelIntents and Bundles objects (in essence, a key-valuestore).

Similarly, our prototype also models time-related objects.For example, time values can be introduced by constructionfrom a constant, in which case the analysis computes thecorresponding concrete value. Values representing the currenttime are also specially handled by modeling the Android APIsknown to return such values (which are represented by the#now tag in the examples mentioned in this paper). Oursystem also annotates time-related objects with special tags toencode which time component is stored in a given object. Forexample, an application can invoke the Date.getMonth()method to access the month component of a given time object,or the Date.getSeconds() method to access the seconds.In this case, our analysis annotates the return value of thesemethods invocations with the special tags #now/#monthand #now/#seconds. All other components are similarlymodeled. This information can be useful, for example, todetermine how narrow is the condition represented by a giventime-related check.

Note also that it is not sufficient to simply record time valuesrepresenting the current time as a singular “now” value, asthe notion of “current time” monotonically increases duringprogram execution. Therefore, in addition to recording thata time object corresponds to the current time at the pointwhere it is returned from the Android framework, the analysisadditionally annotates such value with an integer identifierthat is incremented for each new current time value observedduring symbolic execution. This additional information is usedto reconstruct the semantics of a given check (e.g., hard-codedvs. recurring check) more precisely.

In addition, our analysis also models symbolic location-and SMS-related values and operations. Sources of location-related values include invocation of Android APIs related tothe GPS and cellular radio devices, operations on Locationobjects, and transformations on raw integer and double valuesextracted from Location objects. Our analysis also keepstrack whether a given Location object represents the currentlocation (in which case the value is annotated with the special#here tag), and, when an application accesses a specific lo-cation component, such as longitude and latitude, our analysisencodes this information with the special tags #longitudeand #latitude.

Finally, sources of SMS-related values are modeled sim-ilarly, and include operations performed on SmsMessageobjects, such as createFromPdu, getMessageBody,and getOriginatingAddress, to which our analysisassociates, respectively, the tags #sms, #sms/#body, and#sms/#sender.

382382

Figure 1: Example of expression tree.

11 / 26


4. Block Predicate Extraction: edges of Control Flow Graphare annotated with simple predicates

I Simple predicate: P in if P then X else Y

5. Path Predicate Recovery and MinimizationI Combine simple predicates to get the full path predicate that

reaches each basic blockI Minimization: elimination of redundant terms in predicates

I important to reduce false dependencies

12 / 26

Path Predicate Recovery and Minimization

f

b0

b1

b2 b3

b4

q

g() h()p ⋀ q ¬p ⋀ q

(p ⋀ q) ⋁ (¬p ⋀ q) = q

b5

Figure 5: Path predicate reconstruction from block predicates.

Expression Trees. During the symbolic execution step, eachsymbolic object is also annotated with an expression tree thatencodes the operations that influence its value. As an example,for the code snippet in Figure 1, our system would annotate theobject involved in the check with the expression tree depictedin Figure 4, whose text representation is “(#now afterDate(2016/12/22)).” As we will show, these annotationscontain all the necessary information to reconstruct the seman-tics of checks that operate on these inputs (and later classifythem). Of course, to avoid performance issues, we keep trackof precise information only for objects that are relevant to ouranalysis.

B. Block Predicate ExtractionWe already mentioned how, during the symbolic execution

step, the sCFG is also annotated with simple block predicates.To better explain this step, consider, once again, the programsnippet in Figure 1 and its schematic representation in Fig-ure 5. The analyzer would first recover the block predicate pas a path condition introduced by b1, and it would annotatethe b1 → b2 edge (where b2 is the block that contains theinvocation of method g) with predicate p, representing that,once the execution reaches b1, b2 is executed if and only if pholds. For this example, our system would determine that p isa comparison of the value of an object x against the constantzero. The expression tree of x (i.e., Figure 4) is then retrieved,and the predicate p is determined to be “(!= (#now afterDate(2016/12/22)) 0).” Similarly, the b1 → b3 edge(where b3 is the block that contains the invocation of methodh) would be annotated with the predicate ¬p.

C. Path Predicate Recovery and MinimizationIn the following step, simple block predicates are combined

together to recover, for each basic block, the full intra-

procedural path predicate. Given the block predicates for eachbasic block, path predicate recovery is conceptually straight-forward. For each block, the analysis performs a backwardstraversal of the enclosing method’s CFG and builds a complexBoolean formula that represents all paths from the currentblock to the entry point of the method. Predicates for se-quences of basic blocks are combined using conjunction (i.e.,logical AND), while blocks with multiple predecessors (dueto branch joins) take the disjunction (i.e., logical OR) of thecorresponding incoming path predicates. These predicates notonly capture the control-dependency relation between blocks,but also precisely encode the condition that must be satisfiedfor a block to be executed.

Note that, without any further steps, this simple path pred-icate reconstruction algorithm might produce path predicatesthat contain redundant terms. These terms could, if not re-moved, result in a large number of blocks with false depen-dencies on values derived from trigger inputs. For instance,if a basic block is only executed when a certain conditionevaluates to true over a value derived from a trigger input, thenthe path predicate recovery algorithm will correctly identifythat the basic block depends on that trigger input. But, it willalso erroneously state that successor blocks – including blocksexecuted after a join with the else path – depend on the triggerinput.

To illustrate, consider once again the graph shown inFigure 5, extracted from the trigger example introduced inFigure 1. After the analysis has computed block predicatesover the entire sCFG, the path predicate from block b0 to b2 –where the incoming path predicate to f is q – is represented bythe conjunction p∧q. Since p involves symbolic time values,blocks b2 and b3 have a control dependency on a time-basedinput. However, the path predicate associated with the joinof the two branches, represented by b4, would result in thefollowing expression: (p∧q)∨(¬p∧q). This will also be thepath predicate associated with the exit from f, represented byb5. Note how this path predicate has a dependency on a time-based input (through the predicate p), even if the execution ofthe basic block b5 is clearly not guarded by any time-relatedcheck.

For this reason, after path predicates are recovered for eachbasic block, the analysis minimizes each path predicate. Thisis accomplished by recursively simplifying the full formulasfor each basic block (using standard Boolean laws such asthe distributive law) until no further simplifications can beperformed. To conclude our example, the minimization ofthe path predicate associated with b5 would result in theexpression (p ∧ q) ∨ (¬p ∧ q) = (p ∨ ¬p) ∧ q = q, which,as expected, does not contain a spurious dependency on timeinput. We note that minimization of Boolean formulas is NP-hard in the general case. However, we found this techniqueto be fast in practice in our system. Finally, to compute pathpredicates for those cases that include loops, we make use ofthe techniques described in [33].

383383

Full predicate, minimization

Simple predicates

Figure 2: Path Predicate Reconstruction13 / 26


6. Predicate Classification: a check is suspicious if it’sequivalent to:

I Comparison between current time value and constantI Bounds check on GPS locationI Hard-coded patterns on body or sender of SMS

7. Control-Dependency Analysis: control dependency betweensuspicious predicates and sensitive functionalities.

I sensitive = privileged Android APIs + fileystem opsI Suspiciousness propagates with data flows and callbacksI Problem: data flows through files

I When in doubt: suspicious!

8. Post-processing: whitelisting for some edge cases

14 / 26

Experiment

15 / 26

Data sets

I Benign applications: 9582 apps from Google Play StoreI They all use time-, location- or SMS-related APIsI Actually, TriggerScope identified backdoors in two “benign”

apps, confirmed by manual inspection!

I Malicious applications: 14 apps from several sourcesI Stealthy malware developed for previous researchesI Real-world malware samplesI HackingTeam RCSAndroid

16 / 26

Results of analysis

Analysis step TP FP TN FN FPR FNRPredicate detection 14 1386 7927 0 14.88% 0%Suspicious Predicate A. 14 462 8851 0 4.96% 0%Control-Dependency A. 14 117 9196 0 1.26% 0%TriggerScope (all) 14 35 9278 0 0.38% 0%

Table 1: Results of analysis after each step. Note how each step is usefulto refine the analysis.FPR = FP

FP+TN , FNR = FNFN+TP

17 / 26

False Positives decreasing step after step

TriggerScope False Positives ChartFile Modifica Visualizza Inserisci Formato Dati Strumenti Componenti aggiuntivi Guida Tutte le modifiche sono state salvate in Drive

€ % 123

Calibri

12

False Positives after each analysis step

0 400 800 1200 1600

Predicate detection

Suspicious predicate a.

Control-dependency a.

TriggerScope (all)

Ana

lysi

s st

ep

TriggerScope False Positives ChartCommenti Condividi

[email protected]

Foglio1Figure 3: Each analysis stage is useful, because it reduces False Positives.

18 / 26

Critique

19 / 26

Strengths

I TriggerScope provides rich semantics on predicatesI Great help for manual auditsI This makes the tool extensible, open for future research

I Novel approach: focus on checks, not malicious behaviorsI Fewer FPs, FNs than other tools

20 / 26

Issues: limits of analysis

I Definition of suspicious predicate is too narrowI Only checks against hardcoded values are consideredI Triggers can come from the network or elsewhere

I Authors claim 0% FNs, but the evaluation isn’t conclusiveI we manually inspected a random subset of 20 applications for

which our analysis did not identify any suspicious check. Wespent about 10 minutes per application, and we did not find anyfalse negatives.

I Difficult to assess FNs if no tool finds anything and source codeis unavailable

I This analysis is still blacklisting: listing things we don’t likeI We’re competing against attackers’ creativity

21 / 26

Issues: evasion techniquesI Reflection, dynamic code loading, polymorphism and

conditional code obfuscation [Sharif, 2008] can defeat staticanalysis.

I Authors say that these techniques are themselves suspicious, butthey also have legitimate uses

I Predicate minimization is NP-completeI Is it possible to design “pathological” code to slow down and

defeat analysis?I Or result in very complex, meaningless predicates?

I Exceptions were not cited as a control flow subversion methodI Statically reasoning in the presence of exceptions and about the

effects of exceptions is challenging [Liang, 2014]I Unclear how the static analysis engine handles exceptionsI Unchecked exceptions (e.g. division by zero) could be exploited

as stealthy triggers

22 / 26

Related and future work

23 / 26

Related work: AppContext

AppContext [Yang, 2015]: supervised machine learning method toclassify malicious behavior statically

1. Starts identifying suspicious actions2. Context: which category of input controls the execution of

those actions?

Similar idea: just looking at the action isn’t enough. Differences:I AppContext only classifies triggers as suspicious or not;

TriggerScope also provides semantics about the predicates,helping manual inspection

I AppContext does not consider the typology of the predicate,only the type of its inputs

I Higher FP rate than TriggerScope

24 / 26

Future evolutions

I Extend trigger analysis not only to time, location, SMSsI The trigger could come e.g. from the networkI The framework is easily extensible to other types of triggers

with more work to model other symbolic valuesI Is there a more general approach?

I Quantitative analysis of predicate “suspiciousness”I Currently, it’s defined in a qualitative, ad-hoc wayI Could be combined with classification methods

25 / 26

References

Sharif M., Lanzi A., Giffin J., Lee W.Impeding Malware Analysis Using Conditional Code Obfuscation

In: Network and Distributed System Security Symposium, SanDiego, CA (2008)

Yang W., Xiao X., Andow B., Li S., Xie T., Enck W.AppContext: Differentiating Malicious and Benign Mobile AppBehaviors Using ContextIn Proceedings of the International International Conference onSoftware Engineering (ICSE), 2015.

Liang S., Sun W., Might M., Keep A. Van Horn D.Pruning, Pushdown Exception-Flow Analysis14th IEEE International Working Conference on Source CodeAnalysis and Manipulation (2014)

26 / 26

triggerscope: towards detecting logic bombs in android applications

Science