
Target Directed Event Sequence Generation for Android Applications

Jiwei Yan1,3, Tianyong Wu1,3, Jun Yan2,3, Jian Zhang1,3

1 State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences
2 Technology Center of Software Engineering, Institute of Software, Chinese Academy of Sciences

3 University of Chinese Academy of Sciences
Email: {yanjw, wuty, yanjun, zj}@ios.ac.cn

ABSTRACT
Testing is a commonly used approach to ensure the quality of software, and model-based testing is a popular technique for testing GUI programs such as Android applications (apps). Existing approaches mainly either dynamically construct a model that contains only GUI information, or build a model from the viewpoint of the code, which may fail to describe the changes of GUI widgets at runtime. Besides, most of these models do not support the back stack, a mechanism particular to Android. Therefore, this paper proposes a model, LATTE, that is constructed dynamically with consideration of the view information in the widgets as well as the back stack, to describe the transitions between GUI widgets. We also propose a label set to link the elements of the LATTE model to program snippets. The user can define a subset of the label set as a target for testing requirements that need to cover specific parts of the code. To avoid the state explosion problem during model construction, we introduce the notion of "state similarity" to balance model accuracy against analysis cost. Based on this model, a target directed test generation method is presented to generate event sequences that effectively cover the target. Experiments on several real-world apps indicate that the test cases generated based on LATTE reach high coverage, and that with the model we can cover a given target with short event sequences.

CCS Concepts
• Software and its engineering → Software testing and debugging;

Keywords
Android GUI Model; Targeted Test Generation; Back Stack; Dynamic Modeling

ACM ISBN 978-1-4503-2138-9.

DOI: 10.1145/1235

1. INTRODUCTION
With the success of the smart mobile device market, the mobile application market, and especially the Android application market, has entered a period of rapid development. Android apps, like other software, need to be adequately tested to eliminate potential bugs and improve quality. In testing, an essential step is test case generation, which focuses on how to automatically generate test suites with high code coverage and strong fault detection ability.

Android apps are event-driven GUI programs. An Android app can be regarded as a collection of widgets, each of which is defined in an Activity class that is provided by the Android system to interact with the user. User operations on the components (corresponding to the View class in Android) on the screen trigger the corresponding events, which drive the app to execute the corresponding code and transfer from one widget to another. Thus, for an Android app under test (AUT), a test input is a sequence of events associated with its widgets.

There are a number of test generation techniques for Android apps, among which model-based testing is an attractive approach. The general steps of model-based test generation are as follows: (1) design a formal, abstract model that can concisely describe the software behavior; (2) translate the software into the model; (3) generate test cases based on the model. The key point in model-based testing is how to design and construct a proper model that accurately and comprehensively describes the behavior of the AUT. For Android apps, the model is often used to describe the Activity transitions of the AUT.

In recent years, several model construction approaches have been proposed for test generation of Android apps, which can be categorized into two kinds: static construction and dynamic construction. The former leverages static analysis techniques on the code of the AUT to extract the GUI components in each Activity and the transitions between Activities. However, this kind of approach may fail to describe the changes in the screen of an Activity at runtime, e.g., some views are instantiated under conditions that are determined dynamically. The latter regards the AUT as a black box and makes use of dynamic analysis techniques to rip the GUI information and trace the transitions between Activities while the AUT is running. As a result, the model constructed by these approaches does not contain code information. In addition, we find that the models proposed in existing works are not intricate enough to describe the AUT behavior.


The existing works often omit particular mechanisms of Android, such as the back stack with its complex launch modes, which has a big influence on both model states and transitions.

In this paper, we refine the existing models by comprehensively considering the GUI information and Android mechanisms, and propose a new model called LATTE to describe the different states of Activities and the transitions between these states during the execution of an Android app. Most existing models distinguish states according only to the basic information (like the view type and position) of the views in the widgets. However, based on our observation, the program behaviour of the same widget differs when the status of its views or of the back stack differs. Therefore, our LATTE model defines a more intricate state that contains the view information as well as the status of the views and the back stack to address this issue. On the other hand, a too fine-grained model state may lead to the state explosion problem. We introduce "state similarity" to merge similar states and thereby avoid exploring too many states during model construction.

Besides the GUI information, we also link the transitions in the model to the code that is executed when the corresponding GUI events are triggered. We first define a label set, map code snippets to labels, and then mark the transitions of the LATTE model with them. We define the labels that correspond to the user's concern as a target and find a set of paths based on LATTE to cover it. This model can thus be used to generate test cases that cover the specific code snippets the user is concerned about.

We adopt the dynamic construction technique to build our LATTE model from an AUT. We first insert necessary probes into the AUT to record the runtime information related to the back stack and labels. The instrumented AUT is driven to run on a testing framework to explore the GUI components and construct the model on-the-fly. At last, we traverse the model to generate feasible test cases that cover the user-given target. We implemented the proposed techniques in a model-based testing tool called AppTag and compared it with two state-of-the-art tools, Monkey [5] and Dynodroid [20] (these tools outperform other existing test generation tools [16]).

The main contributions of this work are summarized as follows.

• Propose the LATTE model to describe the GUI characteristics in detail for generating test cases.

• Provide a dynamic construction approach for the LATTE model.

• Propose an approach to generate test sequences to cover the user-given target.

• Implement a model-based test generation tool, AppTag, and evaluate it on real-world instances.

The remainder of this paper is organized as follows. In Section 2, we discuss the necessary background knowledge about Android Activities and event sequence generation. The LATTE model we propose is described in Section 3. In the following section, we introduce our model construction and target directed test generation approaches in detail. Then we evaluate our approach via experiments on real-world Android apps in Section 5. Section 6 surveys the related work, and Section 7 gives the conclusion and discusses future work.

2. BACKGROUND
The execution of an Android app is composed of a sequence of Activities. Accordingly, a test case of an app is also a series of operations on Activities. In this section, we present some background knowledge on Activities, and the techniques for event sequence generation, including the GUI ripping technique for extracting the views in a widget and the instrumentation technique for monitoring the operations.

2.1 Android Activity
When an Activity is launched, it displays a widget that is made up of a series of UI views, such as buttons and text views, that facilitate user interaction. The screen can be occupied by only one Activity; therefore, the Android system introduces a back stack [3] to store all the launched Activities. If a new Activity is activated, the former Activity is either destroyed or pushed into the back stack. In the rest of this subsection, we briefly introduce the UI views and the back stack.

2.1.1 UI Views
Android provides various kinds of built-in UI views for user interaction, all of which extend the class View and correspond to different UI events. Besides, Android also allows users to customize UI views by extending a view and overriding the standard methods. A customized view is usually a slight modification or a composition of basic views.

The View class represents the basic building block for user interface components. Besides the type of the component and its position on the widget, some views also have several extra attributes at runtime, like enabled, focused or checked. We define these extra attributes as the status of the view. In practice, different statuses of a view may represent different program execution states and may lead to different program behavior (see Section 5.2). For example, a checkbox that is used for changing a setting of an app behaves totally differently in its checked and unchecked statuses. Therefore, in this paper, we consider two views to be the same only if all their attributes, including the statuses, are the same.
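To make this identity criterion concrete, the following sketch (ours, not the paper's implementation; the class name and field set are illustrative) encodes a view signature whose equality covers the basic attributes together with the runtime statuses:

import java.util.Objects;

// A minimal sketch of view identity as described above: two views
// are equal only if ALL attributes, including runtime statuses, match.
// Field and class names are illustrative, not AppTag's actual code.
public final class ViewSignature {
    private final String type;   // e.g. "CheckBox"
    private final int x, y;      // position on the widget
    private final boolean enabled, focused, checked; // runtime status

    public ViewSignature(String type, int x, int y,
                         boolean enabled, boolean focused, boolean checked) {
        this.type = type; this.x = x; this.y = y;
        this.enabled = enabled; this.focused = focused; this.checked = checked;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ViewSignature)) return false;
        ViewSignature v = (ViewSignature) o;
        return type.equals(v.type) && x == v.x && y == v.y
                && enabled == v.enabled && focused == v.focused
                && checked == v.checked;
    }

    @Override
    public int hashCode() {
        return Objects.hash(type, x, y, enabled, focused, checked);
    }
}

Under this definition, a checked and an unchecked instance of the same CheckBox at the same position compare as two different views.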

A view has several UI events corresponding to different user operations. For example, the possible events for the Button view are Click, LongClick and Press. In addition, some events are triggered by a combination of several user operations; for instance, the typing event on an EditText view needs a text clearing operation followed by a text typing operation. Table 1 shows the commonly used events of UI views according to the Android references [2], where Scroll indicates the scroll operation of a ListView, and setValue denotes the operation of setting the value of a ProgressBar.

Table 1: UI views and events

Type          Click   LongClick   Press   Scroll   ClearText   TypeText   setValue
Button          √         √         √
RadioButton     √
CheckBox        √
ImageView       √         √
TextView        √         √
EditText                                               √           √
ListView        √                            √
ProgressBar                                                                    √

Besides, there are some global events that can be executed at any program state and are not attached to UI views, such as Rotate and clicks on the Home, Back and Power keys. We consider all these events when analyzing the UI views of an Activity.

2.1.2 Back Stack
During the execution of the app, the user interacts with a collection of Activities that are stored in the back stack. When the current Activity starts another one, the new Activity is pushed onto the top of the stack and takes the focus. A special feature of the Android system is that when the user presses the hardware back key, the current Activity is popped and destroyed, and the previous Activity resumes. Activities in the stack can only be rearranged by push and pop operations. Note that the same Activity with different back stacks may lead to different program behavior (see Section 5.2).

The Android system defines the Launch Mode [4] of an Activity to determine the evolution of the back stack when the Activity is launched. There are four launch modes: Standard, SingleTop, SingleTask and SingleInstance. The launch modes of the Activities are declared in the Manifest file of the app or specified using intent flags in the code. The diversity of launch modes makes the evolution of the back stack complex, so we need to take care to model the back stack correctly. The details of these launch modes are discussed in Section 4.3.2.

2.1.3 An Example of Views and Back Stack
TomDroid is a popular note-taking application on the Android platform. Fig. 1 shows six snapshots (a-f) of the two Activities TomDroidActivity and PreferencesActivity in this app, where the eight numbers (1-8) denote a sequence of events that delete a note (1-2), adjust the settings to show all the deleted notes (3-6) and recover the deleted note (7-8).

Figure 1: TomDroid Application

There are several types of views in the widget, including Button, TextView, ImageView and CheckBox, where we can click a Button to jump into another Activity or click the checkbox to change its status. These views correspond to the events described in Table 1. Besides, at the bottom of the widget, there are three keys that are provided by the Android system. The "◁" key is the hardware back key, which by default drives the app back to the previous Activity, or destroys the app, when the key is clicked.

When the app starts, it directly jumps to the Activity TomDroidActivity (snapshot a) and pushes this Activity into the back stack. After clicking the last menu item "Settings" (event 4), the app starts a new instance of Activity PreferencesActivity (snapshot d) and pushes it into the back stack. A click operation can be executed on the checkbox (event 5) to change its status, while this does not affect the back stack. After that, if we press the back key (event 6), the current Activity PreferencesActivity is destroyed and popped from the back stack, and the previous Activity TomDroidActivity is resumed. Figure 2 visualizes the evolution of the back stack in the above procedure.

Figure 2: Change of Back Stack ([TomDroidActivity] → Click Settings (4) → [TomDroidActivity, PreferencesActivity] → Press Back Key (6) → [TomDroidActivity])

2.2 Event Sequence Generation
For the TomDroid example in Section 2.1.3, the event sequence from 1 to 8 forms a test case that checks the note recovery functionality. Our goal is to generate a set of event sequences that cover such specific code corresponding to the functionalities the user is concerned about. A simple way, like Monkey, is to just randomly send events to drive the AUT until the functionality is triggered. Dynodroid improves this idea with some heuristics. However, this type of random approach is not model-based, i.e., it does not construct a global model to record the differences in the status of the same Activity when it is traversed multiple times. As a result, the randomly generated test sequence usually contains a large number of events, which burdens the debugging workload.

To generate compact event sequences, we should construct a model that represents the transitions, along with their events, between the Activities while exploring the GUI, to guide the future traversal. Furthermore, this model can carry some deep connections between the status of the GUI views and the program logic, such as the back stack, to make it more accurate.

The information of this model can be obtained via a combination of existing dynamic program analysis techniques. The UI views can be extracted by the GUI ripping technique. For the information that cannot be captured from the GUI, we employ the instrumentation technique, which inserts probes at specific points of the program and collects the information of concern during execution. Here we briefly introduce these two techniques; the model is discussed in the next section.

2.2.1 GUI Ripping
GUI ripping is a commonly used technique to identify the GUI elements in GUI programs. There are several frameworks that can obtain the information of the views of an Android app at runtime, such as Robotium [7] and HierarchyViewer [8]. In this paper, we leverage Robotium for GUI ripping.

Robotium is an open-source Android test framework that drives the AUT to execute under manually designed test cases. It is based on the test instrumentation mechanism [1] provided by the Android system, which can monitor the interaction between the application and the Android system and control the execution of the AUT. Robotium also provides a series of APIs to capture and access the views and to send instructions that simulate user events.
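As a brief illustration of what such ripping looks like in practice, the following sketch (ours; MainActivity and the "Settings" text are hypothetical placeholders, and the exact Robotium package name may vary between versions) captures the current views and fires one event:

import android.test.ActivityInstrumentationTestCase2;
import android.view.View;
import com.robotium.solo.Solo;

// A minimal GUI ripping sketch with Robotium's Solo API.
public class RippingExample extends ActivityInstrumentationTestCase2<MainActivity> {
    private Solo solo;

    public RippingExample() {
        super(MainActivity.class); // MainActivity: hypothetical AUT launcher Activity
    }

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        // Solo wraps the Android instrumentation and drives the AUT.
        solo = new Solo(getInstrumentation(), getActivity());
    }

    public void testRipCurrentState() {
        // Capture all views currently on the screen, with their statuses.
        for (View v : solo.getCurrentViews()) {
            System.out.println(v.getClass().getSimpleName()
                    + " enabled=" + v.isEnabled()
                    + " focused=" + v.isFocused());
        }
        // Simulate a user event to trigger a transition.
        solo.clickOnText("Settings");
    }

    @Override
    protected void tearDown() throws Exception {
        solo.finishOpenedActivities();
        super.tearDown();
    }
}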

2.2.2 Instrumentation Technique
Besides the information of the GUI views, we also need to obtain some code information to construct our model. Therefore, we make use of the instrumentation technique to monitor the code snippets that we are interested in. This approach modifies the application's original code by adding a few statements to it. The inserted statements are called program probes. They are usually added for examining the execution of program statements and the runtime change of variables. These probes work while the instrumented application runs, and provide runtime information about the program statements.

The instrumentation technique is often implemented on the source code, but the source code is not always available, especially for apps from Android markets. Hence, we perform instrumentation on the "smali" byte-code, a readable format of the byte-code of the app. The smali format provides multiple keywords, e.g., the .method keyword denotes the beginning of a method body, and the invoke keyword is used to invoke the indicated method. We should place the probes properly according to the grammar and semantic context of the original program. In addition, instrumentation on byte-code must take good care of registers to avoid conflicts and to ensure that the application still runs normally after modification.
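For instance, a probe inserted at the smali level typically boils down to an invoke-static call on a small logging helper placed right after the monitored statement. A sketch of the runtime side of such a helper (ours; class, tag and method names are hypothetical, not AppTag's actual code) is:

import android.util.Log;

// The smali instrumentation would insert a const-string for the label
// followed by an invoke-static on hit(), using registers that are
// verified to be free at the insertion point.
public final class Probe {
    private static final String TAG = "AppTagProbe"; // hypothetical log tag

    private Probe() {}

    // Logs that the code snippet identified by `label` was executed,
    // e.g. "TomDroidActivity.deleteNote".
    public static void hit(String label) {
        Log.i(TAG, label);
    }
}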

3. LABELLED ACTIVITY TRANSITION MODEL

Many model-based test generation works make use of an abstract model to describe the AUT; for example, the finite state machine (FSM) is widely used in previous works [10, 14, 15, 17, 24]. However, these works ignore either the back stack or the statuses of UI views, which leads to an imprecise UI model. On the one hand, a model without the back stack has difficulty generating test cases containing back operations, as the destination of a back operation is determined by the information in the back stack. On the other hand, the change of view status may influence the app's running state. Without considering view status, some features of apps will be omitted.

In this section, we describe the Labelled Activity Transition graph with sTack and Events (LATTE) model and give a similarity detection approach for constructing accurate app models.

3.1 Model Formalization

As discussed above, an Activity may contain multiple suites of views and events at runtime, which may lead to the loss of some program states if we regard an Activity as a single state. Therefore, we propose the LATTE model, whose main idea is to split each Activity into one or more states, depending on the current views and events on the screen and the back stack information. The LATTE model can be formally described as a 5-tuple M = 〈S, La, T, s0, q〉, where

• S is a finite set of the application's runtime states. An element s ∈ S is a triple 〈a, Vs, Ls〉 where a denotes the Activity that s corresponds to, Vs indicates the set of views that belong to the state, and Ls is the list of Activities in the back stack. (We do not address the events here; the reason is that for a concrete state split from an Activity, the events are bound to the set of views.)

• La is the set of labels.

• T denotes the set of transitions. Each element t ∈ T is a 4-tuple 〈src, e, la, des〉 representing the transition from the source state src to the destination state des caused by the event e bound to the state src, and the label set la ⊂ La denotes the labels assigned to this transition.

• s0 ∈ S is the entry state that represents the initial state of the app.

• q is a terminal state denoting that the app quits or jumps to another app.

Note that the denotations with subscript s, including the view set Vs and the back stack Ls, describe attributes that are affiliated to the state s. Ideally, we would regard states that have the same Activity but even a minor difference in the view set or the back stack as different states. However, for real-world Android apps, this may cause a state explosion problem during model construction. In order to balance model accuracy against the analysis cost of the AUT, we propose a "state similarity" measurement to merge similar states in the next subsection.
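As a concrete reading of this definition, a minimal sketch of the model's data structures (ours; names are illustrative, and view signatures are encoded as strings for brevity) could look like:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

// A sketch of the LATTE 5-tuple M = <S, La, T, s0, q>.
public class LatteModel {

    // A state s = <a, Vs, Ls>.
    public static class State {
        String activity;        // a: the Activity this state corresponds to
        Set<String> views;      // Vs: view signatures (type, position, status)
        List<String> backStack; // Ls: Activities currently in the back stack
    }

    // A transition t = <src, e, la, des>.
    public static class Transition {
        State src;              // source state
        String event;           // e: the UI event bound to src
        Set<String> labels;     // la ⊂ La: labels assigned to this transition
        State des;              // destination state
    }

    Set<State> states = new HashSet<>();           // S
    Set<String> labelSet = new HashSet<>();        // La
    Set<Transition> transitions = new HashSet<>(); // T
    State entry;                                   // s0: entry state
    State terminal;                                // q: app quits or leaves
}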

3.2 State Similarity
According to the definition of the LATTE model, a state is a triple 〈a, Vs, Ls〉. We measure the similarity of two states derived from the same Activity according to their views and back stacks, of which the views are characterized by two attributes: the list of GUI components and the view status of each view (see Section 2.1.1).

Let s1 and s2 be two states, and let SimV and SimL be two functions that calculate the similarity of the views and of the back stacks, respectively. We have the following formula to compute the state similarity of s1 and s2:

Sim(s1, s2) = ω · SimV(s1, s2) + (1.0 − ω) · SimL(s1, s2)

where the function SimV returns the percentage of common views among all the views of the two states, i.e., SimV(s1, s2) = |V1 ∩ V2| / |V1 ∪ V2|, where V1 and V2 denote the sets of views of s1 and s2, respectively. The function SimL returns 1 if the back stacks of the two states are the same, and 0 otherwise. We define the configurable parameter ω (0 ≤ ω ≤ 1.0) to represent the weights of these two functions.

Two states with high similarity can almost be regarded as the same. Therefore, in our approach, we introduce a similarity threshold ST to avoid splitting too many states from an Activity. When building the LATTE model for an app, if the similarity of a newly explored state to an existing state in the model is higher than ST, we do not introduce a new state into the model. Instead, we merge the new state into the existing state that is most similar to it.
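A minimal sketch of this measure (ours; it assumes views and back stacks are already encoded as strings, as in the model sketch above) is:

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class Similarity {

    // SimV: Jaccard similarity of the two view sets, |V1 ∩ V2| / |V1 ∪ V2|.
    static double simV(Set<String> v1, Set<String> v2) {
        Set<String> inter = new HashSet<>(v1);
        inter.retainAll(v2);
        Set<String> union = new HashSet<>(v1);
        union.addAll(v2);
        return union.isEmpty() ? 1.0 : (double) inter.size() / union.size();
    }

    // SimL: 1 if the back stacks are identical, 0 otherwise.
    static double simL(List<String> l1, List<String> l2) {
        return l1.equals(l2) ? 1.0 : 0.0;
    }

    // Sim(s1, s2) = ω · SimV + (1 − ω) · SimL.
    static double sim(Set<String> v1, List<String> l1,
                      Set<String> v2, List<String> l2, double omega) {
        return omega * simV(v1, v2) + (1.0 - omega) * simL(l1, l2);
    }
}

Two states derived from the same Activity are then merged whenever this value exceeds the threshold ST (0.7 in our experiments; see Section 5.4).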

3.3 Label and Target
We introduce a label set La to embed code information into our model. In general, each element of La corresponds to a specific part of the code. For example, we can set each label to represent a distinct method of the AUT, or all the methods in the same class. The mapping rule between the labels and the code snippets is designed according to the actual testing requirements. A fine-grained rule makes the model contain more accurate code information, but it increases the model scale and the cost of model construction. Given the mapping rule, the labelling procedure is implemented via code instrumentation in our approach.

The motivation of this work is mainly inspired by the observation that testers are often concerned about specific parts of the code in the AUT. For example, when testers want to test one functionality of the AUT, they only focus on the several methods related to this functionality. We call the set of such specific code the "target". Formally speaking, in this work, we define a target as a set of labels from the label set La. For example, if we have a label set La = {la1, la2, la3}, we can set the target as Ta = {la1, la3}, representing that the test cases should cover all the transitions labeled with la1 and la3.

In the testing procedure, many kinds of subjects can be regarded as targets. In this paper, we consider two types of targets: covering a set of specific user-developed methods, and covering a set of system APIs related to resources and privacy. The former makes sense in the above situation, where the testers want to test a specific functionality implemented by the developers. The latter focuses on two kinds of important system APIs: misuse of the APIs for resources (Camera, Media Player, Sensors, etc.) leads to performance decrease [22], and the privacy-related APIs are essential to application security [13]. By labelling these targets, we can guide our approach to generate test cases that reach these labels or combinations of them. Executing these test cases can trigger the bugs caused by specific parts of the program, or reveal insecure operations of the app.

3.4 LATTE Model of TomDroid
Fig. 3 shows part of the LATTE model of the TomDroid app from Fig. 1. Each entity indicates a state that is marked by its state id as well as its back stack. For simplicity, we do not give the view information of each state in the figure. The states in the same dashed box correspond to the same Activity. Each edge from one state to another is a transition corresponding to an event in Fig. 1, and a red solid edge indicates that the label set of the transition is not empty. Here we let the label set be La = {la1, la2}, where la1 and la2 represent the invocation of the methods deleteNote and undeleteNote, respectively.

Figure 3: LATTE Model of TomDroid (states s1 to s8 with their back stacks, grouped by TomdroidActivity and PreferencesActivity; transitions include Click Delete, Click Menu, Click Settings, Click Back, LongClick Note, LongClick Deleted Note, Click Undelete and Click Checkbox; the transitions labeled deleteNote and undeleteNote carry non-empty label sets)

Note that in state s5 (snapshot d), the status of the checkbox "Show Deleted Notes" of Activity PreferencesActivity affects the layout of Activity TomDroidActivity: if it is not checked, the deleted notes do not appear and cannot be recovered. Therefore, it is necessary to split the Activity PreferencesActivity into two states (s5 and s6).

4. PROPOSED APPROACH
Given an AUT and a target, our goal is to generate a small set of executable event sequences covering the target. In this section, we first give an overview of the approach, then discuss the construction of the LATTE model, and finally propose an adaptive test generation approach.

4.1 Overview
Fig. 4 shows the overview of our approach, which is composed of three parts: Instrumentation, Model Construction and Adaptive Test Generation.

Our approach takes an Android apk file and a label set as input. Firstly, we instrument probes into the AUT to monitor the runtime information for target labelling and model generation. Secondly, we drive the instrumented AUT to run on an Android device and explore the AUT to construct the LATTE model. Then we generate executable test cases for the user-given target based on the model, using a modified traversal algorithm with some heuristics. In this step, a feedback approach is used to check the feasibility of the test cases by monitoring their execution.

4.2 Instrumentation
We adopt a light-weight instrumentation method on byte-code to insert probes monitoring the execution of the code related to Activity transitions, the back stack and the labels. First of all, we scan the files decompiled from the apk file to locate the statements related to the targets and to the creation and destroy operations on Activities.

Figure 4: Approach Overview (Instrumentation turns the apk and label file into an instrumented APK; Model Construction builds and analyzes the LATTE model from the GUI views and runtime information; Test Generation sends instructions, executes tests and detects results, producing executable test cases)

Next, we add probes after these statements to log their runtime information.

4.3 Model Construction
After obtaining an instrumented app, we make use of an existing Android testing framework and develop a script to drive the app to run on a device and construct the LATTE model on-the-fly according to the log information. The script performs an iterative interleaving of app exploration and model construction. During each iteration, our script obtains the views and status information of the Activity at the top of the back stack, collects the candidate set of possible events according to the views (refer to Table 1), then selects the next event and sends it to the testing framework. The iteration terminates when all events have been triggered. In the rest of this subsection, we describe our algorithm and show how we handle the back stack and launch modes.

4.3.1 Traversal Algorithm
Algorithm 1 shows the details of the model construction, which is based on the BFS traversal algorithm.

Algorithm 1 LATTE Model Construction

Input: Instrumented Apk File, Label Set
Output: LATTE model M
 1: var: Queue<State> q
 2: initialize q with the entry state s0
 3: while q is not empty do
 4:   get the first state sh from q
 5:   drive the app to the state sh
 6:   perform an unvisited event e of sh
 7:   obtain the label set la
 8:   create a new state sn with logging information
 9:   create a new transition from sh to sn with e and la
10:   find the state sm that has maximum similarity with sn
11:   if Sim(sm, sn) > ST then
12:     merge sn into sm
13:   else
14:     append sn to queue q
15:   end if
16:   if all events of sh have been visited then
17:     remove sh from q
18:   end if
19: end while

The algorithm maintains the model M of the explored part of the AUT and a queue q storing unvisited states. The exploration starts from the entry state s0 and ends when all states have been visited. In each iteration, we first get the front state sh from the queue q and drive the app to the state sh according to the event sequence from the entry state to sh, recorded when sh was detected. Next, we select an unvisited event e of sh and execute it, which drives the app to a new state sn. Then we collect the runtime information of sn and create a transition from sh to sn. The event e and the labels la related to the target are also assigned to this transition. Then we calculate the similarity values between sn and the existing states in the model M. If the maximum similarity value exceeds the threshold ST, we merge the new state into the existing state that is most similar to it; otherwise, we append sn to the queue q. If all events of sh have been visited, we remove sh from q.

4.3.2 Launch Mode Analysis
In Algorithm 1, we need to create a new state sn with logging information in Line 8. According to the definition of LATTE, a state is determined by its Activity, its views and the back stack, of which the back stack cannot be obtained directly from the existing Android testing framework. In our approach, we maintain the back stack information for each state by monitoring the APIs onStartActivity() and finish(), which are invoked when a new Activity is started and destroyed, respectively. The default operation of onStartActivity() is to push the new Activity instance onto the back stack, but it may be more complex when the launch mode is considered. We addressed the influence of launch modes on the back stack in Section 2.1.2. Our work targets test generation for a single app, so we focus on the Standard, SingleTop and SingleTask modes; the remaining SingleInstance mode is mostly used in cross-app transitions.

Table 2: Rules for Back Stack Changing

Launch Mode   Event    Stack Before   Stack After
Standard      Open a   (...,a)        (...,a,a)
Standard      Open b   (...,a)        (...,a,b)
SingleTop     Open a   (...,a)        (...,a)
SingleTop     Open a   (...,a,b)      (...,a,b,a)
SingleTask    Open a   (...,a,b)      (...,a)
SingleTask    Open c   (...,a,b)      (...,a,b,c)

Here we discuss the rules for handling launch modes shown in Table 2, where the letters a, b and c denote instances of different Activities. The Standard mode simply pushes the newly launched Activity without considering whether the same Activity is already in the stack, i.e., one Activity can be instantiated multiple times (the first row). The difference between the SingleTop mode and Standard lies in that when the Activity is already on the top of the back stack, the SingleTop mode refuses to create a new instance of the Activity (the third row). The SingleTask mode does not allow multiple instances of one Activity in a task; in other words, if an instance of the Activity already exists, all Activities above it are popped and the existing instance ends up on the top of the back stack (the fifth row).

During model construction, we maintain a global stack to simulate the Android back stack of the app. When a new state is created (Line 8 in Algorithm 1), we determine whether a new instance of an Activity has been started. If so, we obtain its launch mode and update the global stack according to the rules in Table 2. Finally, we assign the content of the global stack to the back stack of the new state.

4.4 Adaptive Test Generation
In this subsection, we introduce our adaptive target directed test generation procedure, shown in Algorithm 2. We take the LATTE model M and a target label set Ta, whose elements are the labels to be covered, as input. We first construct the set LT of transitions whose label set contains at least one element of Ta. Then we run several graph algorithms on the model M to extract the dependency relationships between states and transitions. Combining this information with the transition set LT, we obtain reachability information about states and transitions, such as which states can directly or indirectly reach certain transitions in LT, to assist the subsequent adaptive test generation. For a transition ℓ ∈ LT, we try to find an event sequence es ∉ TES that covers it, using a modified DFS algorithm with some heuristics based on the above information. For example, the next transition that could lead to covering the most labels in the target has the highest priority to be selected. The sequence es is converted into a test script that can be deployed and run on the device. After the execution, a result is returned that determines whether es is a feasible path or not. If the execution of the current path fails, we attempt to find another executable path. For each transition in LT, we set a MAXTRY limit to restrict the number of attempts.

Algorithm 2 Adaptive Test Generation

Input: LATTE model M, Target Label Set Ta
Output: Targeted Event Sequence Set TES
 1: TES ← ∅
 2: construct the set LT of transitions labeled with Ta
 3: for each ℓ ∈ LT do
 4:   try ← 0
 5:   while try < MAXTRY do
 6:     try ← try + 1
 7:     if there exists an event sequence es ∉ TES that covers ℓ then
 8:       if es is executable then
 9:         add es to TES
10:       end if
11:     else
12:       break
13:     end if
14:   end while
15: end for

5. EVALUATION
To evaluate the effectiveness of our approach, we implement the tool Target directed Automatic test Generation (AppTag) on top of the testing framework Robotium, to construct the model and generate target directed test sequences for the AUT. All of our experiments are done on a Samsung I9300 cellphone, with a 1.4GHz CPU, 1GB RAM and 16GB ROM. The AUT is reinstalled after each test to ensure the same initial environment for each testing process.

5.1 Environment Setup

We have three research questions as follows.

• RQ1. Can the LATTE model effectively depict the GUI behaviour of the AUT?

• RQ2. How does the similarity impact the model scale and coverage?

• RQ3. What is the effectiveness of our approach to generate event sequences for covering the given target?

To answer these questions, we collect a number of applications from the F-Droid Repository [6] as well as an Android market in China [9], and conduct a series of experiments on them. Due to space limitations, Table 3 lists the detailed information of only a part of the experimental apps. The first column denotes the origin of the app, which can be open-source or commercial, and the third column gives the size of the app. The last three columns show the numbers of classes and methods according to the smali files, and the number of Activities declared under the package in the Manifest file of each app.

Table 3: Experimental Applications

Origin        App             Size    #C    #M   #A
open-source   aGrep            343    46   174    6
              AnyCut           452    11    66    6
              BookCatalogue   2736   877  4361   34
              Budget           189    63   272    8
              HotDeath        7926    28   355    3
              Jamendo         1341    50   203    2
              Nectroid         194   104   672    6
              PasswordMaker   1659    89   452    8
              TomDroid        1069   154   834    8
              Voicesmith       362    63   356    9
              Websearch       1898    45   176    3
              WhoHasMyStuff    788    24   139    2
commercial    BubeiListen     3838   902  4637   78
              Compass         1381    29   316    2
              Cradio          1568    43   486    4
              MaMa            1053     5    39    4
              SougouSearch    8417   272  1083   48
              Terminal       11669     6    24    1

To answer RQ1, we measure the effectiveness of the LATTE model by the code coverage of the test cases generated based on the model. Method and class coverage are chosen as our coverage metrics, since it is not easy to obtain statement coverage information for commercial apps, whose source code is usually not available. We collect the information of executed classes and methods via the instrumentation technique to calculate the class and method coverage. We label all the methods of the AUT and set the target as all the methods, then generate test sequences and calculate the method and class coverage. We also pick two popular automatic testing tools, Monkey and Dynodroid, for comparison. A recent study [16] shows that Monkey and Dynodroid achieve higher coverage than other existing tools.

For RQ2, we design experiments to show how the similarity setting of states influences the scale of the LATTE model. In these experiments, we set the value of the parameter ω to 0.8 when calculating similarity, and vary the threshold ST from 0 to 0.9. We compare the number of transitions in the generated LATTE model and the coverage obtained by traversing this model under different values of ST.

For RQ3, we define two types of targets, namely user-developed methods as well as resource- and privacy-related system APIs, and generate event sequences to cover them. Experiments are run with both AppTag and Monkey to compare the minimal sequence length they need to cover the given target. To get the minimal sequence length of Monkey, we implement a script that repeatedly runs Monkey with the event limit increased by 1000 in each iteration, until the generated sequence covers the target. We also leverage the instrumentation technique to determine whether the generated test cases cover the targets.
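A sketch of such a measurement loop (ours; the package name, probe tag and label are hypothetical placeholders, and reinstallation of the AUT between iterations is elided) is:

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Rerun Monkey with the event limit increased by 1000 per iteration
// until the instrumented target label appears in the probe log.
public class MonkeyLoop {
    public static void main(String[] args) throws Exception {
        String pkg = "org.tomdroid";    // hypothetical AUT package
        String label = "deleteNote";    // hypothetical target label
        for (int events = 1000; ; events += 1000) {
            // A fixed seed (-s) keeps runs comparable; -p restricts to the AUT.
            new ProcessBuilder("adb", "shell", "monkey",
                    "-p", pkg, "-s", "42", String.valueOf(events))
                    .inheritIO().start().waitFor();
            // Dump the probe log and check for the target label.
            Process logcat = new ProcessBuilder(
                    "adb", "logcat", "-d", "-s", "AppTagProbe").start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(logcat.getInputStream()))) {
                if (r.lines().anyMatch(line -> line.contains(label))) {
                    System.out.println("target covered within " + events + " events");
                    return;
                }
            }
        }
    }
}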

5.2 Case Study
In the following, we present two simple case studies to show how the view status and the back stack influence the behavior of the AUT.

The app WhoHasMyStuff is a management tool for tracking the things that the user has lent out. When adding a new entry for an item lent to others, the user can define a description, a lent date and a contact person for it. Besides, this app provides an option to add the entry to the calendar. The user can choose this functionality by clicking a checkbox on the Activity AddObject. If the status of this checkbox is checked, some calendar-related views show up on the current Activity. These new views are used to determine the expected reminder date. In this case, the status change of a view influences the Activity layout, and furthermore it influences the corresponding events of the current state. Without considering the view status, we would miss the transitions triggered by the newly instantiated views.

Another app, Voicesmith, is used for processing audio signals in real-time. On the entry state of the Activity HomeActivity, it provides a TextView item that directs to the Activity PreferencesActivity; besides, another Activity, DafxActivity, provides an ImageButton view that also leads to the Activity PreferencesActivity. Even though the two resulting states are nearly the same, there are still some differences between them: if we send a "Back" event to the AUT in these two states, their behaviour will differ. Therefore, they should not be merged into the same state.

To sum up, both the view status and the back stack impact the behaviour of the AUT, and taking them into consideration helps us improve the accuracy of our LATTE model.

Figure 5: Coverage of Monkey & Dynodroid (average method and class coverage under increasing numbers of events)

5.3 The Effectiveness of LATTE Model
In this section, we evaluate the effectiveness of the LATTE model by comparing the method and class coverage of the test cases generated based on the model with the two testing tools Monkey and Dynodroid. These tools require the user to provide the number of events to generate, so we set up an experiment on 10 apps from our benchmark to obtain a suitable value for this number. Fig. 5 shows the average coverage of these two tools under different numbers of events. From this figure, we can see that the coverage becomes stable as the number of events increases. Specifically, the coverage of Monkey becomes stable after executing 15000 events, and the number for Dynodroid is about 4000 (Machiry et al. ran Monkey for 10,000 events and Dynodroid for 2,000 events in their work [20]), so we set these two values as the event limits for the two tools in our experiments.

Next, we give the detailed coverage results in Table 4, comparing the method and class coverage under the different testing tools. The similarity threshold ST is set to the experimental value 0.7 (refer to Section 5.4). We also provide the total number of events generated by AppTag in the sixth column.

As shown in Table 4, the test cases generated by AppTag reach the same level of class and method coverage as Monkey and Dynodroid. Our tool and Monkey reach lower coverage than Dynodroid in several cases (Voicesmith and MaMa); the reason lies in that we have not implemented the simulation of system events and cross-app testing in our prototype, which is left as future work.

Figure 6: Impact of Similarity Threshold. (a) Similarity Threshold and Model Scale (number of transitions); (b) Similarity Threshold and Coverage (method coverage, %). Apps shown: Terminal, HotDeath, TomDroid, Websearch, MaMa, Budget, Nectroid, Cradio, SougouSearch.

Table 4: Class and Method Coverage Comparison

                   Monkey         Dynodroid             AppTag
Application     CC(%)  MC(%)    CC(%)  MC(%)     #E   CC(%)  MC(%)
aGrep            56.5   30.5     77.0   58.0    112   63.0   37.9
AnyCut           63.6   60.6     63.6   62.1     55   63.6   51.5
BookCatalogue    19.8   13.5     16.5   11.1    149   22.3   14.5
HotDeath         85.7   75.3     67.8   58.3    241   85.7   78.8
Jamendo          64.0   32.5     37.0   22.0     89   72.0   44.8
Nectroid         85.6   56.7     82.3   63.9    196   85.6   59.5
PasswordMaker    17.9   17.0     49.0   44.0    305   43.8   39.4
TomDroid         36.4   25.2     57.6   40.0    255   56.5   39.2
Voicesmith       23.8    5.6     46.0   36.2    135   22.2    5.6
Websearch        64.4   49.2     57.8   45.4    108   68.9   55.7
BubeiListen      32.8   32.0     17.1    8.9    540   53.9   50.4
Compass          48.3   14.2     48.2   13.9     62   43.3   22.0
MaMa             60.0   25.6     80.0   66.6     32   60.0   28.2
SougouSearch     36.0   28.7     28.3   11.8    477   35.7   31.7
Terminal        100.0   80.0    100.0   76.0     28  100.0   80.0

Besides, the total number of events generated by our tool is small. For most of the experimental apps, except for several complex commercial apps, AppTag constructs the model within half an hour. Compared with the other two tools, AppTag only needs to build the model once and can reuse the model to generate test suites for multiple targets within several minutes.

5.4 State Similarity and Model Scale
We can control the size of the model by setting different similarity thresholds ST. A higher threshold ST may cause more state splitting and lead to a more accurate model, and further to a test set with higher coverage. In this subsection, we discuss the impact of ST on the scale of the model and the benefit of a high ST to coverage. Fig. 6 shows the tendency of the number of transitions in the model and of the method coverage of the generated test cases under different values of ST. As mentioned before, a high threshold ST may cause an extremely large or even infinite model scale; therefore, we set 2 hours as the upper bound of the execution time, and do not show results whose model construction did not finish within this bound.

As we can see, with the increase of the similarity threshold ST, the scale of the model increases dramatically. Obviously, the cost of model construction increases accordingly. However, the coverage of the test suites generated from the model does not increase significantly once ST reaches a certain level. Therefore, ST is a proper control variable for making a trade-off between accuracy and efficiency. We find that 0.7 is a reasonable choice according to Fig. 6.

5.5 Test Generation for Covering Targets
In this part, we give some targets, including specific methods and Android APIs, and compare the event sequences generated by AppTag and Monkey.

User-Developed Method Target. The experimental results for user-developed method targets are shown in Table 5. The second column lists the target, which can be a single label or a set of several labels. The third column gives the minimal number of events Monkey needs to cover the target. Note that a number of events n denotes that Monkey can cover the target with an event sequence whose length is in the range (n − 1000, n]. The last column presents the number of events that AppTag needs to cover the target. It is composed of two numbers: the number of events used for model construction and the length of the sequence generated to cover the target. The table shows that Monkey triggers the target only with a long sequence, especially when the AUT is complex or the target has a complicated execution logic, while AppTag can reach the target using an extremely short event sequence.

Table 5: User-Developed Method Target

                                                      Number of Events
Application     Target                                 Monkey   AppTag
HotDeath        showCardHelp                             3000    241/3
PasswordMaker   onExportClick                           12000    305/2
TomDroid        deleteNote, undeleteNote                50000   255/12
WhoHasMyStuff   updateDate, updateReturnDate             9000    102/8
Cradio          searchRadio, addFavorite, addDelFavo     4000   394/14
SougouSearch    addCard                                 21000    477/4

System API Target. Table 6 shows the experimental results for system API targets. The second column gives the system APIs we picked, including resource-related (VelocityTracker, AudioRecord, URL) and privacy-related (ContactsContract) APIs. These experiments also demonstrate that AppTag can cover the target with a short sequence.

6. RELATED WORK
There are many kinds of test generation approaches for Android apps, including random testing, model-based testing and systematic testing. In this section, we introduce several representative works based on these approaches and highlight the differences between our work and theirs.

Random Testing. Monkey [5] is a widely used black-box testing tool that can send sequences of random events to Android apps. It is simple and fully automatic, and can generate a great deal of test events within a short time.

Table 6: System API Target

                                                  Number of Events
Application     Target                             Monkey   AppTag
AnyCut          ContactsContract (CONTENT_URI)       1000     55/2
Budget          VelocityTracker (obtain)             4000    246/2
Voicesmith      AudioRecord (release)                3000    135/2
WhoHasMyStuff   ContactsContract (CONTENT_URI)       2000    102/2
BubeiListen     URL (openConnection)                 1000    540/5

There are works based on Monkey for detecting GUI bugs [18] and security bugs [21]. However, Monkey is not suitable for generating highly specific event sequences.

Dynodroid [20], proposed by Machiry et al., provides a more efficient random GUI exploration approach compared with Monkey. They define several strategies for selecting events to guide the test generation procedure, and support system event generation by instrumenting the Android framework.

Model-based Testing. Model-based testing has been widely studied and applied in testing Android apps recently. A key point of model-based testing is to construct a model that can accurately depict the software behaviour.

Several works construct the model by static analysis. W. Yang et al. [24] model the GUI behavior of an application as an FSM; they proposed an approach that uses static analysis on the Java source code of an Android app to extract the actions associated with the view components of a GUI state, and implemented a tool called ORBIT. S. Yang et al. [23] provided a similar model called the Window Transition Graph (WTG) with a more accurate static callback analysis; it models the stack of currently-active windows and the window transitions more carefully. There are some differences between their work and ours. The first is that their model construction relies on the source code of the AUT, while we can handle the apk file directly. The second is that they build the model statically, which misses the changes of the GUI screen at runtime.

Some researchers leverage dynamic analysis techniques to construct the model of the AUT. Amalfitano et al. [11] implemented a tool called AndroidRipper to explore the GUI views of the AUT. However, the model produced by this tool does not distinguish the different statuses of views in the same Activity. As a result, each GUI object may be included multiple times, so that the size of the model becomes too large. Azim et al. [14] proposed the Activity Transition Graph (ATG) as their exploration model. They design a static analysis algorithm for the AUT to extract the Static Activity Transition Graph (SATG), and use dynamic GUI exploration to handle dynamic Activity layouts to complement the SATG. However, they regard the Activity as the minimum unit in the ATG, and also do not consider the different statuses of views and back stacks of the same Activity.

Model-based testing is also used to detect specific kinds of bugs in Android apps. For example, Zhang et al. [25] proposed a model-based test generation approach to expose resource leak defects in Android apps. They construct the WTG model proposed by Yang et al. [23] to describe the AUT, and try to generate test cases for two important categories of neutral sequences, based on common leak patterns specific to Android. The main idea of their work is somewhat similar to ours in that the common resource leak patterns can be regarded as the target. However, as we mentioned above, the WTG model is constructed by static analysis, which may fail to describe the changes of the GUI screen at runtime. In addition, their targets are only related to the GUI views, rather than to the code snippets the user is concerned about, as in our work.

Systematic Testing. ACTEve [12] is a concolic-testing tool that generates sequences of events automatically and systematically. It symbolically tracks events from the point where they are generated in the framework to the point where they are handled in the app; thus both the framework and the AUT need to be instrumented. Besides, ACTEve can handle system events as well as UI events. The limitation is that it only alleviates, but does not avoid, the path explosion problem, and it currently handles only tap events.

Jensen et al. [19] provide another concolic-testing approach that aims at automatically finding event sequences that reach a given target line in the application code. This approach improves automated testing for Android applications that are not computationally heavy but may have complex user interaction patterns. However, their symbolic execution can only process integers, not Strings or other complex data types. In addition, this approach also needs a model for test case generation, and they build it manually.

Choudhary et al. [16] conduct an empirical study on existing testing tools for Android apps. An interesting discovery is that Monkey and Dynodroid, which are based on the random exploration strategy, can reach higher coverage than other tools with more sophisticated strategies. The major reason for this phenomenon is that most app behaviors can be exercised by generating only UI events. Our LATTE model can accurately and comprehensively describe these UI events, and the experimental results show that the test cases generated from it achieve similar coverage to Monkey and Dynodroid using shorter event sequences.

7. CONCLUSION
We proposed a dynamic way to model the AUT and then generate test cases to cover given targets. To describe the AUT accurately, we presented the LATTE model with back stack, view and event information for model-based testing of Android apps. Different from other dynamic approaches, our model also represents some of the code information via the label mechanism, which guides the test generation procedure to quickly cover the code snippets the user is concerned about. We have evaluated the effectiveness of our approach on several real-world apps; the results show that it achieves the same coverage as state-of-the-art tools, with shorter event sequences.

We believe that our approach can greatly improve the effectiveness of Android testers and help developers with target directed testing. There are several possible ways to improve our approach. Cross-application invocation is commonly used by developers, and it influences the change of the back stack according to its launch mode. Another potential improvement lies in the target set. Currently we regard a target as an unordered set of labels. If we introduce temporal logic into the target set, our work can be extended to dynamically check for bugs with temporal properties. All of these are left as future work.

8. REFERENCES

[1] Android Developers. ActivityInstrumentationTestCase2. http://developer.android.com/reference/android/test/ActivityInstrumentationTestCase2.html.

[2] Android Developers. Android View. http://developer.android.com/reference/android/view/View.html.

[3] Android Developers. Back Stack. http://developer.android.com/guide/components/tasks-and-back-stack.html.

[4] Android Developers. Launch Mode. http://developer.android.com/guide/topics/manifest/activity-element.html#lmode.

[5] Android Developers. UI/Application Exerciser Monkey. http://developer.android.com/tools/help/monkey.html.

[6] F-Droid. https://f-droid.org.

[7] Google Code. Robotium. http://code.google.com/p/robotium/.

[8] Hierarchy Viewer | Android Developers. developer.android.com/tools/help/hierarchy-viewer.html.

[9] Wandoujia. http://www.wandoujia.com/apps/.

[10] D. Amalfitano, A. R. Fasolino, P. Tramontana, S. D. Carmine, and A. M. Memon. Using GUI ripping for automated testing of Android applications. In IEEE/ACM International Conference on Automated Software Engineering, ASE'12, pages 258–261, 2012.

[11] D. Amalfitano, A. R. Fasolino, P. Tramontana, S. D. Carmine, and A. M. Memon. Using GUI ripping for automated testing of Android applications. In Proceedings of the 2012 IEEE/ACM International Conference on Automated Software Engineering, pages 258–261, 2012.

[12] S. Anand, M. Naik, M. J. Harrold, and H. Yang. Automated concolic testing of smartphone apps. In 20th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-20), SIGSOFT/FSE'12, page 59, 2012.

[13] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. L. Traon, D. Octeau, and P. McDaniel. FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. In Proceedings of the 2014 ACM SIGPLAN Conference on Programming Language Design and Implementation, page 29, 2014.

[14] T. Azim and I. Neamtiu. Targeted and depth-first exploration for systematic testing of Android apps. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA 2013, part of SPLASH 2013, pages 641–660, 2013.

[15] W. Choi, G. C. Necula, and K. Sen. Guided GUI testing of Android apps with minimal restart and approximate learning. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA 2013, part of SPLASH 2013, pages 623–640, 2013.

[16] S. R. Choudhary, A. Gorla, and A. Orso. Automated test input generation for Android: Are we there yet? In 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, pages 429–440, 2015.

[17] S. Hao, B. Liu, S. Nath, W. G. J. Halfond, and R. Govindan. PUMA: programmable UI-automation for large-scale dynamic analysis of mobile apps. In The 12th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys'14, pages 204–217, 2014.

[18] C. Hu and I. Neamtiu. Automating GUI testing for Android applications. In Proceedings of the 6th International Workshop on Automation of Software Test, AST 2011, pages 77–83, 2011.

[19] C. S. Jensen, M. R. Prasad, and A. Møller. Automated testing with targeted event sequence generation. In International Symposium on Software Testing and Analysis, ISSTA '13, pages 67–77, 2013.

[20] A. Machiry, R. Tahiliani, and M. Naik. Dynodroid: an input generation system for Android apps. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE'13, pages 224–234, 2013.

[21] R. Mahmood, N. Esfahani, T. Kacem, N. Mirzaei, S. Malek, and A. Stavrou. A whitebox approach for automated security testing of Android applications on the cloud. In 7th International Workshop on Automation of Software Test, AST 2012, pages 22–28, 2012.

[22] T. Wu, J. Liu, Z. Xu, C. Guo, Y. Zhang, J. Yan, and J. Zhang. Light-weight, inter-procedural and callback-aware resource leak detection for Android apps. IEEE Transactions on Software Engineering, 2016. Accepted, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7442579.

[23] S. Yang, H. Zhang, H. Wu, Y. Wang, D. Yan, and A. Rountev. Static window transition graphs for Android. In 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, pages 658–668, 2015.

[24] W. Yang, M. R. Prasad, and T. Xie. A grey-box approach for automated GUI-model generation of mobile applications. In Fundamental Approaches to Software Engineering, 16th International Conference, FASE 2013, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2013, pages 250–265, 2013.

[25] H. Zhang, H. Wu, and A. Rountev. Automated test generation for detection of leaks in Android applications. In AST 2016. Accepted, http://web.cse.ohio-state.edu/presto/pubs/ast16.pdf.