

A Game-Theoretic Approach for Dynamic Information Flow Tracking to Detect Multi-Stage Advanced Persistent Threats

Shana Moothedath, Member, IEEE, Dinuka Sahabandu, Joey Allen, Andrew Clark, Member, IEEE, Linda Bushnell, Fellow, IEEE, Wenke Lee, Member, IEEE, and Radha Poovendran, Fellow, IEEE

Abstract—Advanced Persistent Threats (APTs) infiltrate cyber systems and compromise specifically targeted data and/or resources through a sequence of stealthy attacks consisting of multiple stages. Dynamic information flow tracking has been proposed to detect APTs. In this paper, we develop a dynamic information flow tracking game for resource-efficient detection of APTs via multi-stage dynamic games. The game evolves on an information flow graph, whose nodes are processes and objects (e.g., files, network endpoints) in the system and whose edges capture the interactions between different processes and objects. Each stage of the game has pre-specified targets that are characterized by a set of nodes of the graph. The goal of the APT is to evade detection and reach a target node of each stage. The goal of the defender is to maximize the detection probability while minimizing performance overhead on the system. The resource costs of the players are different and the information structure is asymmetric, resulting in a nonzero-sum imperfect information game. We first calculate the best responses of the players and then compute Nash equilibria for single-stage attacks. We then provide a polynomial-time algorithm to compute a correlated equilibrium for the multi-stage attack case. Finally, we simulate our model and algorithm on real-world nation state attack data obtained from the Refinable Attack INvestigation (RAIN) system.

Index Terms—Multi-stage attacks, Multi-stage dynamic game, Advanced Persistent Threats (APTs), Information flow tracking

I. INTRODUCTION

Advanced Persistent Threats (APTs) are long-term stealthy attacks mounted by intelligent and resourceful adversaries with the goal of sabotaging critical infrastructures and/or exfiltrating critical information. Typically, APTs target companies and organizations that deal with high-value information and intellectual property. APTs monitor the system for a long time and perform tailored attacks that consist of multiple stages [1]. In the first stage of the attack, APTs start with an initial reconnaissance step followed by an initial compromise. Once the attacker establishes a foothold in the system, it tries to elevate its privileges in the subsequent stages and to proceed toward the target through further internal compromises. The attacker then performs data exfiltration at an ultra-low rate.

S. Moothedath, D. Sahabandu, L. Bushnell, and R. Poovendran are with the Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98195, USA. {sm15, sdinuka, lb2, rp3}@uw.edu.

A. Clark is with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA 01609, USA. [email protected].

J. Allen and W. Lee are with the College of Computing, Georgia Institute of Technology, Atlanta, GA 30332, USA. [email protected], [email protected].

This work was supported by ONR grant N00014-16-1-2710 P00002 and DARPA TC grant DARPA FA8650-15-C-7556.

Detecting APTs is a challenging task as these attacks are stealthy and customized. However, APTs introduce information flows, in the form of data-flow and control-flow commands, while interacting with the system. Dynamic Information Flow Tracking (DIFT) is a promising detection mechanism against APTs, as DIFT detects adversaries in a system by tracking the traces of the information flows they introduce [2]. DIFT tags sensitive information flows across the system as suspicious, tracks the propagation of tagged flows through the system, and performs security analysis at locations referred to as traps, based on certain pre-specified security rules, to catch any unauthorized use of tagged data [3]. There is an inherent trade-off between the effectiveness of DIFT and the resource costs incurred due to the memory overhead of tagging and tracking non-adversarial information flows.

Our objective in this paper is to develop a resource-efficient analytical model of DIFT to detect multi-stage APTs by an optimal tagging and trapping procedure. The adversarial nature of the interaction makes game theory a promising framework to characterize the trade-off between detection efficiency and detection cost and to develop an optimal DIFT defense, which is the contribution of this paper. Each stage of the APT attack is a stage in our multi-stage game model and is characterized by a unique set of critical locations and critical infrastructures of the system, referred to as destinations. The contributions of this paper are the following:

• We model the interaction of APTs and DIFT with the system as a two-player multi-stage nonzero-sum game with an imperfect information structure. The adversary strategizes to reach a destination node and the defender strategizes to detect the APT in a resource-efficient manner. A solution to this game gives an optimal defense policy for DIFT that maximizes the probability of APT detection while minimizing memory and performance overhead on the system.
• We develop algorithms to compute best responses of the players. The best response of the adversary is obtained by reducing it to a shortest path problem on a directed graph such that a shortest path corresponds to a path with maximum probability of reaching the final target. The best response of the defender, which is a subset of nodes, is approximated by exploiting the submodularity of its payoff function.
• We analyze a special case of the problem in which the attack is a single-stage attack. For this case, we characterize the set of Nash equilibria of the game. This characterization is obtained by proving the equivalence of the dynamic game to a suitably defined bimatrix-game formulation.
• We provide a polynomial-time iterative algorithm to compute a local correlated equilibrium of the game for the multi-stage attack. Our algorithm provides locally optimal equilibrium strategies for both players by transforming the two-player game into an (N(M+2)+|Λ|+1)-player game, where N denotes the number of processes and objects in the system, M denotes the number of stages of the APT attack, and |Λ| denotes the size of the set of security rules.
• We perform an experimental analysis of our model on real-world multi-stage attack data obtained using the Refinable Attack INvestigation (RAIN) framework [4], [5] for a three-day nation state attack.


Table I: Key Notations

G = (V_G, E_G)   Information flow graph (IFG)
N                Number of nodes in IFG
M                Number of attack stages
λ                Attacker entry points of IFG
D                Attacker destinations of IFG
D_j              Attacker destinations of stage j
s_i^j            Node s_i of IFG in stage j
N(s_i)           Neighbor set of node s_i
S                State space of game
P_A              Adversarial player
P_D              Defender player
A_A              Action set of adversary
A_D              Action set of defender
p_A              Strategy of adversary
p_D              Strategy of defender
U_A              Payoff of adversary
U_D              Payoff of defender
p_T(j)           Probability of adversarial flow detected at stage j
p_R(j)           Probability that flow reaches destination set D_j
C_D(s_i)         Cost of tagging flow through s_i
W_D(s_i)         Cost of trapping flow through s_i
Ω_D              Set of all paths from s_0 to D
π(ω)             Probability of adversary choosing path ω ∈ Ω_D
p(ω)             Probability of detection along path ω ∈ Ω_D
F = (V_F, E_F)   Flow network
S                Source-sink cut node set of F
S*               Source-sink min-cut node set of F

Related Work

There are different architectures for DIFT available in the literature to prevent a wide range of attacks [6]. The fundamental concepts in these architectures remain the same; however, they differ in the choice of tagging units, tag propagation rules, rules based on data- and control-flow dependencies, and the set of security rules used for verifying the authenticity of the information flows [7]. While software modeling of DIFT architectures is available, we provide an analytical model of DIFT. Specifically, we model DIFT to detect APTs by tracking data-flow-based information flows.

Game theory has been widely used in the literature to analyze and design security in cyber systems against different types of adversaries [8], [9]. For instance, the FlipIt game in [10] captures the interaction between APTs and the defender when both players are trying to take control of a cyber system. In [10], the APT and the defender both take actions periodically and pay a cost for each of their actions. Lee et al. in [11] introduced a control-theoretic approach to model competing malwares in the FlipIt game. Game models are available for APT attacks in cloud storage [12] and cyber systems [13]. The interaction between an APT and a defender that allocates Central Processing Units (CPUs) over multiple storage devices in a cloud storage system is formulated as a Colonel Blotto (zero-sum) game in [12]. Another zero-sum game model is given in [13] to model the competition between an APT and a defender in a cyber system.

Often in practice, the resource costs for the defender and the adversary are not the same, and hence the game model is nonzero-sum. In this direction, a nonzero-sum game model is given in [14] to capture the interplay between the defender, the APT attacker, and the insiders for joint attacks. The approach in [14] models the incursion stage of the APT attack, while our model in this paper captures the lateral propagation stages of an APT attack. More precisely, we provide a multi-stage game model that detects APTs by implementing a data-flow-based DIFT detection mechanism while minimizing resource costs.

The conference versions of this paper establish a DIFT-based game model for single-stage attacks [15] and multi-stage attacks [16]. The approaches in [15] and [16] considered a DIFT architecture in which the locations in the system at which security analysis is performed, called traps or tag sinks, are pre-specified and known to both players, and the defender selects only the data channels that are to be tagged. In this paper, we provide an analytical model for a data-flow-based DIFT architecture that selects not only the data channels to be tagged, but also the locations at which to conduct the security analysis, i.e., the traps or tag sinks, and the security rules to be verified, for both single- and multi-stage attacks.

Organization of the Paper

The rest of the paper is organized as follows. Section II describes the preliminaries of DIFT and the graphical representation of the system. Section III introduces the notation used in the paper and then presents the game formulation. Section IV discusses the solution concept of the game. Section V computes the best responses of the players. Section VI presents a solution approach to the game for the single-stage attack. Section VII presents a solution to the game for the multi-stage attack. Section VIII explains the numerical simulation results of the model and the results on real-world data. Finally, Section IX gives concluding remarks.

II. PRELIMINARIES

In this section, we discuss the detection mechanism DIFT and the graphical representation of the system, referred to as the information flow graph. Key notations are summarized in Table I.

A. Dynamic Information Flow Tracking (DIFT)

DIFT consists of a software module (execution monitor) and two hardware mechanisms (dynamic information tracking and security tag checking) added to the processor core of the system [3]. A DIFT detection system has three major components: 1) tag sources, 2) tag propagation rules, and 3) tag sinks or traps. A tag is a bit marking that denotes the sensitivity of a data-flow. Input from untrusted sources is tagged as potentially malicious by DIFT [17]. Not all input channels are suspicious, as there are some secured sources that are protected from external channels. The tag status of an information flow propagates through the system based on pre-specified propagation rules, which are either data-flow-based or data- and control-flow-based.

Conventional DIFT tags all the sensitive channels in the system. This, however, results in tagging numerous authentic flows, referred to as overtagging [7], which leads to false alarms and performance overhead resulting in system slowdown. On the other hand, untagged spurious flows due to undertagging are security threats to the system. Moreover, conventional DIFT only adds tags and never removes them [7]. To reduce overtagging, the notion of tag sanitization was introduced in [7]. The developer is given an option to test the tag status of the data and sanitize it, i.e., untaint it, whenever this seems appropriate [17]. For example, the output of constant operations (where the output is independent of the source data) and a tagged flow that successfully passes all security rules can be untagged. An efficient tagging policy must incorporate tag sanitization and perform selective tag propagation in such a way that both overtagging and undertagging are minimized.

Tagged flows are inspected by DIFT at locations called tag sinks, also referred to as traps, in order to determine the runtime behavior of the system [6]. Tag sinks are generated in the system when an unusual use of a tagged flow is detected. At tag sinks, DIFT performs analysis using security rules that are based on the details of the associated flow, such as the terminal points of the flow and the path traversed, and decides whether the flow is spurious. If DIFT concludes that the flow is spurious, the victimized process is terminated by the operating system [3]. On the other hand, if the flow is found not to be spurious, then the system continues its operation. The selection of the security rules and the locations of the tag sinks must be optimized to reduce performance and memory overhead.
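The following is a minimal conceptual sketch of the tag source / tag sink mechanism described above, assuming a flow is represented by the ordered list of nodes it visits and that the security policy depends only on the flow's terminal points (as in Section III); the function and variable names are illustrative and do not correspond to any real DIFT implementation.

```python
# Conceptual sketch only: a flow becomes tagged at a tag source, the tag propagates
# along the flow, and a tagged flow is analyzed when it reaches a tag sink.
def track_flow(flow_path, tag_sources, tag_sinks, violates_policy):
    """flow_path: ordered IFG nodes visited by one flow.
    violates_policy(entry, node): abstracts the pre-specified security rules."""
    entry, tagged = flow_path[0], False
    for node in flow_path:
        if node in tag_sources:
            tagged = True                     # tag the flow at a tag source
        if tagged and node in tag_sinks and violates_policy(entry, node):
            return ("trapped", node)          # security analysis at the tag sink fires
    return ("undetected", flow_path[-1])      # flow completed without being trapped
```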

B. Information Flow Graph (IFG)

An information flow graph G = (V_G, E_G) is a graphical representation of the system. The node set V_G = {s_1, ..., s_N} corresponds to the processes, objects, and files in the system, and the edge set E_G ⊆ V_G × V_G represents interactions between different nodes, obtained from the system log data for the whole-system execution and workflow during the entire period of logging. The node set D ⊂ V_G denotes the subset of nodes that correspond to critical data centers and critical infrastructure sites of the system, known as destinations. We assume that the destination nodes are at least one node away from the entry points. We consider multi-stage attacks consisting of M stages, where each stage is characterized by a set of destination nodes. The set D_j := {d_1^j, ..., d_{n_j}^j} denotes the set of destinations in the jth stage of the attack, and hence D := ∪_{j=1}^{M} D_j. The interaction of DIFT and APTs, which we formally model in Section III, evolves through G.
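As a concrete illustration, the sketch below builds a toy IFG with the networkx library; the node names, entry points, and per-stage destination sets are made up for illustration and are not taken from the paper's dataset.

```python
import networkx as nx

# Toy information flow graph (IFG); all names below are illustrative.
G = nx.DiGraph()
G.add_edges_from([
    ("browser", "tmp_file"),      # a process writes an object
    ("tmp_file", "shell"),        # the object is read by another process
    ("shell", "ssh_client"),
    ("ssh_client", "db_server"),
])

entry_points = {"browser"}                          # the vulnerable set lambda
destinations = {1: {"shell"}, 2: {"db_server"}}     # D_1 and D_2 for a two-stage attack
D = set().union(*destinations.values())             # D = union over stages of D_j
```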

III. PROBLEM FORMULATION: GAME MODEL

In this section, we model a two-player multi-stage game between APTs and DIFT. We model the different stages of the game in such a way that each stage of the APT attack translates to a stage in the game.

A. System Model

We denote the adversarial player of the game by P_A and the defender player by P_D. In the jth stage of the attack, the objective of P_A is to evade detection and reach a destination node of stage j, i.e., a node in D_j. P_A can also abandon the attack at any stage by dropping the flow. The objective of P_D is to detect P_A before P_A reaches a node in D_j. In order to detect P_A, P_D identifies a set of processes Y := {y_1, ..., y_h} ⊆ V_G as the tag sources, such that any information flow passing through a process y_i ∈ Y is tagged. P_D tracks the traversal of a tagged flow through the system and generates tag sinks, denoted T := {t_1, ..., t_{h′}} ⊂ V_G, using pre-specified security rules.

Let Λ be the set of security rules. We consider a security policy that is based on the terminal points of the flow, i.e., the entry point of the flow and the location at which the flow is analyzed. Therefore, Λ : V_G × V_G → {0,1}, where 1 indicates that the pair of terminal points of the flow violates the security policy of the system and 0 otherwise. Here, |Λ| ≤ N^2, since not all node pairs in V_G have a directed path between them. Hence the number of security rules that are relevant to a node is at most N. Without loss of generality, we assume that each node in G is associated with N security rules. As N is large, applying all N security rules at every tag sink may not be required. The security rules are pre-specified depending on the application running on the system and are known to the defender. In our game model, DIFT selects a subset of rules at every tag sink to perform security analysis.

B. State Space of the Game

Let λ ⊂ V_G denote the subset of nodes in the IFG that are susceptible (vulnerable) to attacks. In order to characterize the entry point of the attack by a unique node, we introduce a pseudo-process s_0 such that s_0 is connected to all processes in the set λ. Let S := V_G ∪ {s_0}, E_λ := {s_0} × λ, and E := E_G ∪ E_λ. Note that s_0 is the root node of the modified graph; transitions are allowed from s_0 and no transition is allowed into s_0.

Now we define the state space of the game. Each decision point in the game is a state of the state space and is defined by the source of the flow in set λ, the stage of the attack, the current location in the IFG, s_i, along with its tag status, trap status, and the status of the N security rules applicable at s_i. We use s_i^j to denote the process s_i at the jth stage of the attack. Then the state space of the game is

S := (V_G × {1, ..., M} × λ × {0,1}^{2+N}) ∪ {(s_0^1, 0, ..., 0)},

where the state (s_0^1, 0, ..., 0) has 2+N zero entries following s_0^1. We write S = {s_1, ..., s_T} with T = 2^{(2+N)} N M |λ| + 1. Here s_1 = (s_0^1, 0, ..., 0) is the state in S corresponding to the pseudo-node s_0. The remaining states are given by s_r = (s_i^j, λ_r, k_r^1, ..., k_r^{2+N}), for r = 2, ..., T, where s_i ∈ V_G, j ∈ {1, ..., M}, λ_r ∈ {1, ..., |λ|}, and k_r^1, ..., k_r^{2+N} ∈ {0,1}. Here, k_r^1 = 1 if the flow at s_i is tagged and k_r^1 = 0 otherwise. Similarly, k_r^2 = 1 if s_i is a tag sink and k_r^2 = 0 otherwise, and k_r^3, ..., k_r^{2+N} denote the selection of security rules (bit 1 denotes that a rule is selected and bit 0 otherwise).

Let N(s_i) denote the set of out-neighbors of a node s_i ∈ V_G, defined as N(s_i) := {s_{i′} : (s_i, s_{i′}) ∈ E} ∪ {φ}. Here φ corresponds to the adversary dropping the flow. Consider two states s_r = (s_i^j, λ_r, k_r^1, ..., k_r^{2+N}) and s_{r′} = (s_{i′}^{j′}, λ_{r′}, k_{r′}^1, ..., k_{r′}^{2+N}) in S. Then s_{r′} is an out-neighbor of s_r in the state space graph if one of the following cases holds: 1) j = j′ and s_{i′} ∈ N(s_i), or 2) j′ = j+1 and s_i = s_{i′} ∈ D_j. Case 1) corresponds to a transition within the same stage to an out-neighbor node or to dropping out of the game, and case 2) corresponds to a transition at a destination from one stage to the next stage. Note that in case 2) (i.e., j′ = j+1 and s_i = s_{i′} ∈ D_j), the remaining components λ_r and k_r^1, ..., k_r^{2+N} of the state carry over unchanged across the stage transition (see Section III-C).
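To make the size of S concrete, the sketch below enumerates the state space for a toy instance; the node list and the choice of a single entry point are illustrative, and, as in the text, each node is assumed to have N applicable security rules.

```python
from itertools import product

# Enumerate the state space S of Section III-B for a toy instance (illustrative names).
nodes = ["browser", "tmp_file", "shell", "ssh_client", "db_server"]   # V_G
N, M, num_entries = len(nodes), 2, 1        # N rules per node, M stages, |lambda| entry points

root = ("s0", 1, 0, (0,) * (2 + N))         # the state (s_0^1, 0, ..., 0)
states = [root] + [
    (v, stage, lam, bits)
    for v, stage, lam, bits in product(
        nodes, range(1, M + 1), range(num_entries), product((0, 1), repeat=2 + N)
    )
]
assert len(states) == 2 ** (2 + N) * N * M * num_entries + 1   # T from Section III-B
```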

Tagging s_0 means tagging all sensitive flows, which is not desirable on account of the performance overhead. Therefore, s_0 is neither a tag source nor a tag sink, and it is always in stage 1 with origin at s_0 itself, as denoted by state s_1. We give the following definition for an adversarial flow in the state space S originating at the state (s_0^1, 0, ..., 0).

Definition III.1. An information flow in S that originates at state (s_0^1, 0, ..., 0) and terminates at state (s_i^j, λ_r, k_r^1, ..., k_r^{2+N}) is said to satisfy the stage-constraint if the flow passes through some destinations in D_1, D_2, ..., D_{j−1}, in order.

Definition III.1 holds in applications in which the attacker must compromise internal processes in a specific order to achieve the desired goal, e.g., obtaining a login ID, followed by a password, to access a bank account.

C. Actions of the Players

The players P_A and P_D have finite action sets over the state space S, denoted by A_A and A_D, respectively. The action set of P_A is the set of all possible paths the attacker can traverse in an attack. Let Ω denote the set of all possible paths^1 in the graph S = (V_S, E_S) that originate at node s_0^1. Here V_S := (V_G × {1, ..., M}) ∪ {s_0^1, φ} and E_S consists of: i) {(s_0^1, s_i^1) : s_i ∈ λ}, ii) {(s_i^j, s_{i′}^j) : (s_i, s_{i′}) ∈ E_G}, iii) {(s_i^j, s_i^{j+1}) : s_i ∈ D_j}, and iv) {(s_i^j, φ) : s_i ∈ V_G, j ∈ {1, ..., M}}. Then the action set of P_A is Ω, i.e., A_A = Ω. Let s_r = (s_i^j, λ_r, k_r^1, ..., k_r^{2+N}) be a state such that s_i ∈ D_j. At this state, A_D = ∅ and the next state of s_r is (s_i^{j+1}, λ_r, k_r^1, ..., k_r^{2+N}). This construction of S captures the transitions of the attacker from stage j to stage j+1. Note that P_A can also end the game by dropping the information flow at any point in time, i.e., when the end state of a path is the null state φ. For a state (s_i^j, λ_r, k_r^1, ..., k_r^{2+N}) in S, λ_r is decided by the process in λ to which the adversary transitions from s_0, i.e., by the transition from (s_0^1, 0, ..., 0) in the state space. Further, λ_r for a particular adversarial flow remains fixed for all states in S that the flow traverses. As the tag propagation rules are pre-specified by the user, the action set of P_D consists of selecting the tag status of a flow, the tag sinks, and the security check rules. Hence the action of the defender at s_i^j is a binary tuple (k_r^1, ..., k_r^{2+N}), and A_D = {0,1}^{NM(2+N)}. While the objective of P_A is to exploit the vulnerable processes λ of the system to successfully launch an attack, the objective of P_D is to select an optimal set of tagged nodes, say Y* ⊂ Y, an optimal set of tag sinks, say T* ⊂ T, and a set of security rules such that any spurious information flow in the system is detected at some tag sink before reaching a destination.

^1 A path is a finite sequence of edges that join a sequence of vertices of a graph, without repetitions.

D. Information of the Game

Both the adversary and the defender know the graph G. At any state s_r in the game, the defender has information about the tag source status of s_r, the tag sink status of s_r, and the set of security rules chosen at s_r. However, the adversary is unaware of the tag source status, the tag sink status, and the security rules chosen at that state. On the other hand, while the adversary knows the stage of the attack, the defender does not know the stage of the attack and hence does not know the unique set of destinations targeted by P_A in that particular stage. Thus, the players P_A and P_D have asymmetric knowledge, resulting in an imperfect information game.

E. Strategies of the Players

A strategy is a rule that a player uses to select actions at every step of the game. We consider mixed strategies, and hence probability distributions over the action sets A_A and A_D. The defender's strategy, p_D, consists of selecting the tag status, trap status, and security rules for flows passing through each state. The defender strategy at a state corresponding to process s_i, p_D(s_i), is a tuple of length 2+N, (p_D^1(s_i), ..., p_D^{2+N}(s_i)). Here, p_D^1(s_i) denotes the probability that the flow passing through s_i is tagged, p_D^2(s_i) the probability that s_i is a tag sink, and (p_D^3(s_i), ..., p_D^{2+N}(s_i)) the probabilities of selecting each rule in Λ corresponding to s_i. The pseudo-process s_0 has p_D^{i′}(s_0) = 0 for i′ ∈ {1, ..., 2+N}. Note that the defender strategy does not depend on the stage, as the defender is unaware of the stage of the attack. The adversary, on the other hand, knows the stage of the attack, and hence the strategy of P_A, i.e., the probability distribution on Ω, p_A : Ω → [0,1]^{|A_A|}, depends on the attack stage.

Taken together, the strategies of P_A and P_D are given by the vectors p_D = {(p_D^1(s_i), ..., p_D^{2+N}(s_i)) : s_i ∈ S} and p_A = {p_A(ω) : ω ∈ Ω}, respectively. Note that p_A is a vector whose length equals the number of paths in Ω, while p_D is a vector of length |S| with each entry of length 2+N. Notice that p_A is defined in such a way that a flow that originates at (s_0^1, 0, ..., 0) in the state space S reaches a state (s_i^j, λ_r, k_r^1, ..., k_r^{2+N}), for some λ_r ∈ {1, ..., |λ|} and some k_r^1, ..., k_r^{2+N} ∈ {0,1}, only after passing through some destinations of stages 1, ..., j−1. By this definition of the state space and strategies of the game, all information flows in S satisfy the stage-constraint given in Definition III.1 and can affect the performance of the system and even result in system breakdown, if malicious.

F. Payoffs to the Players

Now we define the payoffs of players P_A and P_D, denoted by U_A and U_D, respectively. The payoffs of the players include penalties and rewards at every stage of the attack. U_A consists of: (i) a reward β_j^A > 0 for the adversary successfully reaching a destination in the jth stage satisfying the stage-constraint, and (ii) a cost α^A < 0 if the adversary is detected by the defender. Similarly, U_D consists of: (a) a memory cost C_D(s_i) < 0 for tagging node s_i ∈ V_G, (b) a memory cost W_D(s_i) < 0 for setting a tag sink at node s_i ∈ V_G, (c) a cost γ_i, for i ∈ {1, ..., N}, for selecting the ith security check rule at a tag sink, (d) a cost β_j^D < 0 if the adversary reaches a destination in the jth stage satisfying the stage-constraint, and (e) a reward α^D > 0 for detecting the adversary. We assume that the cost of tagging a node and the cost of setting a tag sink at a node, C_D(s_i) and W_D(s_i), respectively, are independent of the attack stage. However, C_D(s_i) and W_D(s_i) depend on the average traffic at process s_i, and hence C_D(s_i) := c_1 B(s_i) and W_D(s_i) := c_2 B(s_i). Here, c_1 ∈ R_− is a fixed tagging cost and c_2 ∈ R_− is a fixed cost for setting a tag sink, where R_− is the set of negative real numbers, and B(s_i) denotes the average traffic at node s_i. At a state s_r = (s_i^j, λ_r, k_r^1, ..., k_r^{2+N}), the costs C_D(s_i) and W_D(s_i) are incurred if the corresponding bits denoting the tag status and trap status, i.e., k_r^1 and k_r^2, respectively, are 1.

Recall that the origin of any adversarial information flow in the state space S is (s_0^1, 0, ..., 0). For a flow originating at state (s_0^1, 0, ..., 0) in S, let p_T(j) denote the probability that the flow is detected at stage j and p_R(j) denote the probability that the flow reaches some destination in set D_j. The values of p_T(j) and p_R(j) depend on the tag status, the tag sink status, and the set of security rules selected. For a given strategy pair (p_D, p_A), the payoffs U_D and U_A are given by

U_D(p_D, p_A) = ∑_{s_i∈V_G} ( p_D^1(s_i) C_D(s_i) + p_D^2(s_i) W_D(s_i) + ∑_{g=1}^{N} p_D^{2+g}(s_i) γ_g ) + ∑_{j=1}^{M} ( p_T(j) α^D + p_R(j) β_j^D ),   (1)

U_A(p_D, p_A) = ∑_{j=1}^{M} ( p_T(j) α^A + p_R(j) β_j^A ).   (2)

Note that if the adversarial flow is detected in stage j, then p_T(j′) = 0 for j′ > j, j′ ∈ {1, ..., M}.
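The sketch below evaluates Eqs. (1) and (2) for given strategies, assuming the stage-wise probabilities p_T(j) and p_R(j) induced by (p_D, p_A) have already been computed; the dictionary-based encoding of the strategies and parameters is an illustrative choice, not the paper's implementation.

```python
# Sketch of the payoff expressions in Eqs. (1)-(2). Assumed inputs:
#   p_D[s]   = (p^1_D(s), p^2_D(s), p^3_D(s), ..., p^{2+N}_D(s)) for each node s,
#   C_D, W_D = tagging/trapping costs (negative), gamma = list of rule costs,
#   p_T, p_R = dicts mapping stage j to the detection/reach probabilities,
#   alpha_*, beta_* = detection reward/penalty and per-stage destination payoffs.
def defender_payoff(p_D, C_D, W_D, gamma, p_T, p_R, alpha_D, beta_D):
    resource_cost = sum(
        p_D[s][0] * C_D[s] + p_D[s][1] * W_D[s]
        + sum(p_D[s][2 + g] * gamma[g] for g in range(len(gamma)))
        for s in p_D
    )
    detection_terms = sum(p_T[j] * alpha_D + p_R[j] * beta_D[j] for j in p_T)
    return resource_cost + detection_terms

def adversary_payoff(p_T, p_R, alpha_A, beta_A):
    return sum(p_T[j] * alpha_A + p_R[j] * beta_A[j] for j in p_T)
```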

Assumption III.2. If the game parameters α^D, α^A, β^D, β^A, C_D, W_D are such that the cost of tagging a flow and performing security analysis across a path is so high that some paths are not worth the tagging cost, then the adversary can always achieve its goal irrespective of the defender's actions. To avoid this trivial case, we assume that the game parameters are such that, at equilibrium, there exists a defender strategy with a nonzero probability of detection.

G. Preliminary Analysis of the Model

In this subsection, we perform an initial analysis of our model. A multi-stage attack consisting of M stages belongs to one of the following M+2 scenarios.
1) The adversary drops out of the game before reaching some destination in D_1.
2) The adversary reaches some destination in each of D_1, ..., D_j and then drops out of the game, for j = 1, ..., M−1 (M−1 possibilities).
3) The adversary reaches some destination in each of D_1, ..., D_M.
4) The defender detects the adversary at some stage.
The payoff of the game is different for each of the cases listed above. In scenario 1), P_A and P_D incur zero payoff. In scenario 2), P_A earns the rewards for reaching stages 1, ..., j, P_D incurs the penalties for not detecting the adversary at stages 1, ..., j, and the game terminates. In scenario 3), P_A earns the rewards for reaching destinations in all stages and wins the game, and P_D incurs the total penalty for not detecting the adversary at any stage. In scenario 4), P_A incurs the penalty for getting detected, and P_D earns the reward for detecting the adversary and wins the game.

For calculating the payoffs of P_D and P_A at a decision point in the game (i.e., at a state in S), for given player strategies (p_D, p_A), we define state payoffs L^A_{(p_D,p_A)} : S → R and L^D_{(p_D,p_A)} : S → R for the adversary and the defender, respectively, at every state in the state space S. Let the current state of the game be (s_i^{j′}, λ_r, k_r^1, ..., k_r^{2+N}), where λ_r ∈ {1, ..., |λ|} and k_r^1, ..., k_r^{2+N} ∈ {0,1}, and let q(s_i^{j′}) denote the probability that the next state of the game is φ. Further, consider the set of paths Ω_j ⊂ Ω that originate at s_0^1, reach a destination node in D_j, and then drop out before reaching a destination in D_{j+1}, without getting detected by the defender. Let P_{R,j}(s_i^{j′}, λ_r, k_r^1, ..., k_r^{2+N}) denote the probability that the adversary reaches some node in D_j and drops out before reaching D_{j+1}, provided the current state of the game is (s_i^{j′}, λ_r, k_r^1, ..., k_r^{2+N}). Also let P_T(s_i^{j′}, λ_r, k_r^1, ..., k_r^{2+N}) denote the probability that the information flow is detected by the defender when the current state is (s_i^{j′}, λ_r, k_r^1, ..., k_r^{2+N}). To characterize the payoffs of the players at a state in S, we now introduce a few notations. For notational brevity, we denote k_r^1, ..., k_r^{2+N} by k̄_r. For a state (s_i^{j′}, λ_r, k̄_r) and t ∈ {1, ..., M}, define

Q_t(s_i^{j′}) := ∑_{ω∈Ω_j} ∑_{s_ℓ∈N(s_i)} ∑_{k_ℓ^g∈{0,1}, g∈{1,...,2+N}} p_A(ω) [ ∏_{g=1}^{2+N} (p_D^g(s_ℓ))^{k_ℓ^g} (1 − p_D^g(s_ℓ))^{(1−k_ℓ^g)} ] P_{R,t}(s_ℓ^t, λ_r, k̄_ℓ),

Q̄_t(s_i^{j′}) := ∑_{ω∈Ω_j} ∑_{s_ℓ∈N(s_i)} ∑_{k_ℓ^g∈{0,1}, g∈{1,...,2+N}} p_A(ω) [ ∏_{g=1}^{2+N} (p_D^g(s_ℓ))^{k_ℓ^g} (1 − p_D^g(s_ℓ))^{(1−k_ℓ^g)} ] P_T(s_ℓ^t, λ_r, k̄_ℓ).

Then,

P_{R,j}(s_i^{j′}, λ_r, k̄_r) =
  0,                                if k_r^1 = ··· = k_r^{2+N} = 1,
  q(s_i^{j′}) + Q_{j′+1}(s_i^{j′}),  if s_i ∈ D_j, j′ = j,
  0,                                if s_i ∈ D_{j′}, j′ = j+1,
  0,                                if j′ > j+1,
  Q_{j′}(s_i^{j′}),                  if j′ ≤ j,
  q(s_i^{j′}) + Q_{j′}(s_i^{j′}),    if j′ = j+1,

P_T(s_i^{j′}, λ_r, k̄_r) =
  1,                 if k_r^1 = ··· = k_r^{2+N} = 1,
  0,                 if j′ = M, s_i ∈ D_M,
  Q̄_{j′}(s_i^{j′}),   otherwise.

Using the definitions of P_{R,j}(·) and P_T(·) at a state in S, the payoffs of P_D and P_A at a state (s_i^{j′}, λ_r, k_r^1, ..., k_r^{2+N}) are given by Eqs. (3) and (4), respectively:

L^D_{(p_D,p_A)}(s_i^{j′}, λ_r, k̄_r) = ∑_{s_b∈V_G} ( p_{F,b}^1(s_i^{j′}, λ_r, k̄_r) C_D(s_b) + p_{F,b}^2(s_i^{j′}, λ_r, k̄_r) W_D(s_b) + ∑_{g=1}^{N} p_{F,b}^{2+g}(s_i^{j′}, λ_r, k̄_r) γ_g ) + ∑_{j=1}^{M} ( P_{R,j}(s_i^{j′}, λ_r, k̄_r) (∑_{v=1}^{j} β_v^D) + P_T(s_i^{j′}, λ_r, k̄_r) α^D ),   (3)

L^A_{(p_D,p_A)}(s_i^{j′}, λ_r, k̄_r) = ∑_{j=1}^{M} ( P_{R,j}(s_i^{j′}, λ_r, k̄_r) (∑_{v=1}^{j} β_v^A) + P_T(s_i^{j′}, λ_r, k̄_r) α^A ).   (4)

In Eqs. (3) and (4), p_{F,b}^1(s_i^{j′}, λ_r, k̄_r) denotes the probability that a flow passing through s_b ∈ V_G is tagged given that the current state is (s_i^{j′}, λ_r, k̄_r), and p_{F,b}^2(s_i^{j′}, λ_r, k̄_r) denotes the probability that node s_b ∈ V_G is a tag sink in a flow whose current state is (s_i^{j′}, λ_r, k̄_r). Similarly, p_{F,b}^{2+g}(s_i^{j′}, λ_r, k̄_r) denotes the probability that the gth security rule is selected for inspecting the authenticity of a flow whose current state is (s_i^{j′}, λ_r, k̄_r). Eqs. (3) and (4) give a system of 2^{(2+N)} N M |λ| + 1 linear equations each for the payoff functions L^D_{(p_D,p_A)} and L^A_{(p_D,p_A)}, where L^D_{(p_D,p_A)}(b) and L^A_{(p_D,p_A)}(b) denote the payoffs at the bth state in S. Lemma III.3 relates the payoffs of the game, U_D(p_D, p_A) and U_A(p_D, p_A), to the state payoffs L^D_{(p_D,p_A)} and L^A_{(p_D,p_A)}, respectively. We use Lemma III.3 to compute a local correlated equilibrium of the game in Section VII (Algorithm VII.1, Step 15).

Lemma III.3. Consider the defender and adversary strategies p_D and p_A, respectively. Then the following hold: (i) U_A(p_D, p_A) = L^A_{(p_D,p_A)}(s_0^1, 0, ..., 0), and (ii) U_D(p_D, p_A) = L^D_{(p_D,p_A)}(s_0^1, 0, ..., 0).

Proof. (i): By definition, L^A_{(p_D,p_A)}(s_0^1, 0, ..., 0) = ∑_{j=1}^{M} ( P_{R,j}(s_0^1, 0, ..., 0) (∑_{v=1}^{j} β_v^A) + P_T(s_0^1, 0, ..., 0) α^A ). Here,

∑_{j=1}^{M} P_{R,j}(s_0^1, 0, ..., 0) (∑_{v=1}^{j} β_v^A) = β_1^A ∑_{j=1}^{M} P_{R,j}(s_0^1, 0, ..., 0) + β_2^A ∑_{j=2}^{M} P_{R,j}(s_0^1, 0, ..., 0) + ··· + β_M^A P_{R,M}(s_0^1, 0, ..., 0),   (5)

where ∑_{j=1}^{M} P_{R,j}(s_0^1, 0, ..., 0) is the total probability that a flow originating at (s_0^1, 0, ..., 0) reaches some destination in D_1. Similarly, ∑_{j=2}^{M} P_{R,j}(s_0^1, 0, ..., 0) is the total probability that a flow originating at (s_0^1, 0, ..., 0) reaches some destination in D_2. Thus,

∑_{j=1}^{M} P_{R,j}(s_0^1, 0, ..., 0) = p_R(1),  ∑_{j=2}^{M} P_{R,j}(s_0^1, 0, ..., 0) = p_R(2),  ...,  P_{R,M}(s_0^1, 0, ..., 0) = p_R(M).   (6)

From Eqs. (5) and (6), we get

∑_{j=1}^{M} ( P_{R,j}(s_0^1, 0, ..., 0) (∑_{v=1}^{j} β_v^A) ) = ∑_{j=1}^{M} p_R(j) β_j^A.   (7)

Since P_T(s_0^1, 0, ..., 0) = ∑_{j=1}^{M} p_T(j),

P_T(s_0^1, 0, ..., 0) α^A = ∑_{j=1}^{M} p_T(j) α^A.   (8)

From Eqs. (7) and (8), we get L^A_{(p_D,p_A)}(s_0^1, 0, ..., 0) = ∑_{j=1}^{M} ( p_R(j) β_j^A + p_T(j) α^A ) = U_A(p_D, p_A).

(ii): Notice that p_{F,i}^1(s_0^1, 0, ..., 0) is the probability that the process s_i is a tag source in a flow originating at (s_0^1, 0, ..., 0). Thus p_{F,i}^1(s_0^1, 0, ..., 0) = p_D^1(s_i). Similarly, we get p_{F,i}^2(s_0^1, 0, ..., 0) = p_D^2(s_i) and p_{F,i}^{2+g}(s_0^1, 0, ..., 0) = p_D^{2+g}(s_i), for g = 1, ..., N. This, along with Eqs. (7) and (8), implies that L^D_{(p_D,p_A)}(s_0^1, 0, ..., 0) = ∑_{s_i∈V_G} ( p_D^1(s_i) C_D(s_i) + p_D^2(s_i) W_D(s_i) + ∑_{g=1}^{N} p_D^{2+g}(s_i) γ_g ) + ∑_{j=1}^{M} ( p_R(j) β_j^D + p_T(j) α^D ) = U_D(p_D, p_A).

This completes the proof of (i) and (ii).

IV. GAME MODEL: SOLUTION CONCEPT

This section presents an overview of the notions of equilibrium considered in this work. We first describe the concept of a player's best response to a given mixed policy of the opponent.

Definition IV.1. Let p_A : Ω → [0,1]^{|Ω|} denote an adversary strategy (probabilities of selecting paths) and p_D : S → [0,1]^{(2+N)|S|} denote a defender strategy (probabilities of tagging, tag sink selection, and security rule selection at every node in the graph). The set of best responses of the defender is given by

BR(p_A) = argmax_{p_D} { U_D(p_D, p_A) : p_D ∈ [0,1]^{(2+N)|S|} }.

Similarly, the set of best responses of the adversary is given by

BR(p_D) = argmax_{p_A} { U_A(p_D, p_A) : p_A ∈ [0,1]^{|Ω|} }.

Intuitively, the best responses of the defender are the tagging, tag sink selection, and security rule selection strategies that jointly maximize the defender's payoff for a given adversary strategy. Likewise, the best responses of the adversary are the probability distributions over paths that maximize the adversary's payoff for a given defender (tagging, tag sink selection, and security rule selection) strategy. A mixed policy profile is a Nash equilibrium (NE) if the mixed policy of each player is a best response to the fixed mixed policies of the other players. The formal definition of a Nash equilibrium is as follows.

Definition IV.2. A pair of mixed policies (p_D, p_A) is a Nash equilibrium if

p_D ∈ BR(p_A) and p_A ∈ BR(p_D).

A Nash equilibrium (NE) captures the notion of a stable solution, as it occurs when neither player can improve its payoff by unilaterally changing its strategy. A pure strategy NE of the APT vs. DIFT game corresponds to the adversary deterministically choosing a path from an entry point to a destination node. However, in that case the defender can always improve its payoff by performing security analysis at only one node on that path with probability 1. Hence there exists no pure strategy NE for the game. Nash's result in [18], which proved the existence of an NE in mixed strategies for any finite game, guarantees the existence of an NE for the game we consider in this paper. While an NE exists for games with rational, noncooperative players, it is PPAD-complete to compute one in general [19], especially for nonzero-sum dynamic games of the type considered in this paper. Also note that, for the game considered in this paper, the payoff functions of the players are nonlinear in the probabilities. A weaker solution concept, which is a relaxation of the Nash equilibrium, is the correlated equilibrium, defined as follows.

Definition IV.3. Let P denote a joint probability distribution over the set of defender and adversary strategies. Then P is a correlated equilibrium if, for all strategies p_A, p′_A and p_D, p′_D, conditioned on the strategy drawn from P being (p_D, p_A),

U_D(p_D, p_A) ≥ U_D(p′_D, p_A),
U_A(p_D, p_A) ≥ U_A(p_D, p′_A).

We next consider a simpler version of the correlated equilibrium that models the local policies at each process.

Definition IV.4. Let P denote a joint probability distribution over the set of defender and adversary actions. The distribution P is a local correlated equilibrium if, for all states s_i ∈ S, j ∈ {1, ..., M}, and strategies p_D(s_i) and p_A(ω), conditioned on the strategy drawn from P being (p_D, p_A), we have

U_D(p_D, p_A) ≥ U_D(p′_D, p_A),
U_A(p_D, p_A) ≥ U_A(p_D, p′_A),

where p′_D denotes a strategy with p′^x_D(s_i) ≠ p^x_D(s_i) for some x ∈ {1, ..., 2+N}, p′^y_D(s_i) = p^y_D(s_i) for y ∈ {1, ..., 2+N}, y ≠ x, and p′_D(s_{i′}) = p_D(s_{i′}) for i′ ≠ i; and p′_A denotes a strategy with p′_A(ω) ≠ p_A(ω) and p′_A(ω′) = p_A(ω′) for ω′ ≠ ω.

V. BEST RESPONSE OF THE PLAYERS

In this section, we calculate the best responses of the players P_A and P_D.

A. Best Response for the Adversary

In this subsection, we describe the best response of the adversary to a given defender strategy. We first present the following preliminary lemma.

Lemma V.1. Consider a defender policy p_D. For each destination d_b^j ∈ D_j, let Ω_{d_b^j} denote the set of paths in S that originate at s_0^1 and terminate at some state that corresponds to node d_b^j. For any path ω, let p(ω) denote the probability that a flow reaches the destination without getting detected by the defender. For every d_b^j, choose a path ω*_{d_b^j} ∈ argmax{ p(ω) : ω ∈ Ω_{d_b^j} }. Let ω* ∈ argmax{ p(ω*_{d_b^j}) : d_b^j ∈ D_j, j = 1, ..., M }. Finally, define the policy p*_A by

p*_A(s_i^j, s_{i′}^{j′}) = 1 if (s_i^j, s_{i′}^{j′}) ∈ ω*, and 0 otherwise.

Then, ω* ∈ BR(p_D).

Proof. Let p_A be any adversary policy, and let Ω denote the set of all possible paths in S that are chosen by the adversary with nonzero probability and that terminate at some destination in D = ∪_{j=1}^{M} D_j. The payoff of the adversary can be written as

U_A(p_D, p_A) = ∑_{ω∈Ω} π(ω) ( p(ω) β^A_{j(ω)} + (1 − p(ω)) α^A ) = ∑_{j=1}^{M} ∑_{d_b^j∈D_j} ∑_{ω∈Ω_{d_b^j}} π(ω) ( p(ω) β_j^A + (1 − p(ω)) α^A ),

where j(ω) is the stage at which the path terminates and π(ω) is the probability that the path is chosen under this policy. The payoff U_A(p_D, p_A) is bounded above by the payoff of the path that maximizes p(ω)(β_j^A − α^A), which is exactly the path ω*.

Using Lemma V.1, we present the following approach to select a best response of the adversary for a given defender policy. For each destination in ∪_{j=1}^{M} D_j, we first choose a path ω to that destination such that the probability of reaching that destination, p(ω), is maximized while traversing destinations of all intermediate stages. From those paths, we then select a path that maximizes p(ω)(β_j^A − α^A).

Proposition V.2. The path ω* returned by a shortest path algorithm on the state space graph, with the weight of each edge incoming to a state that corresponds to node s_i equal to −log(1 − p_D^1(s_i) p_D^2(s_i) ∏_{r=3}^{2+N} p_D^r(s_i)), is a best response to the defender strategy p_D.

Proof. Consider a path ω ∈ Ω_{d_b^j}, i.e., a path that originates at s_0^1 and terminates at some state that corresponds to node d_b^j. Let the first node belonging to the vulnerable set λ through which ω traverses be denoted by λ_ω. For all nodes of G that lie in ω, i.e., s_i ∈ ω, let Λ_ω(s_i) denote the set of indices of the security rules in Λ that are based on s_i and λ_ω.

By Lemma V.1, it suffices to show that a shortest path in the state space with a suitably defined weight function returns a path in S with the maximum probability of reaching some state in S corresponding to node d_b^j of G without getting detected by the defender. For any path ω ∈ Ω_{d_b^j}, the probability that the flow reaches d_b^j without getting detected by the defender is equal to ∏_{s_i∈ω} (1 − p_D^1(s_i) p_D^2(s_i) ∏_{r=3}^{2+N} p_D^r(s_i)). Moreover,

argmax_{ω∈Ω_{d_b^j}} ∏_{s_i∈ω} (1 − p_D^1(s_i) p_D^2(s_i) ∏_{r=3}^{2+N} p_D^r(s_i))
 = argmax_{ω∈Ω_{d_b^j}} ∑_{s_i∈ω} log(1 − p_D^1(s_i) p_D^2(s_i) ∏_{r=3}^{2+N} p_D^r(s_i))
 = argmin_{ω∈Ω_{d_b^j}} − ∑_{s_i∈ω} log(1 − p_D^1(s_i) p_D^2(s_i) ∏_{r=3}^{2+N} p_D^r(s_i)).

The problem of finding a best response of the adversary is therefore equivalent to finding a shortest path from s_0^1 to d_b^j in a graph whose edge weights, which are non-negative since (1 − p_D^1(s_i) p_D^2(s_i) ∏_{r=3}^{2+N} p_D^r(s_i)) ≤ 1, are equal to −log(1 − p_D^1(s_i) p_D^2(s_i) ∏_{r=3}^{2+N} p_D^r(s_i)) for each edge incoming to s_i^{j′}, for j′ ∈ {1, ..., M}.
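A minimal sketch of this reduction is given below, assuming the state-space graph S_graph is available as a networkx DiGraph whose states are tuples with the IFG node as their first entry, and that p_D maps each node to its tuple (p^1_D, p^2_D, p^3_D, ..., p^{2+N}_D); the function names and the data representation are illustrative, not the paper's implementation.

```python
import math
import networkx as nx

# Sketch of the adversary best response in Proposition V.2: weight each edge entering
# a state for node s_i by -log(1 - p^1_D p^2_D * prod_{r>=3} p^r_D) and take a shortest path.
def detection_prob(p_D, node):
    p = p_D[node]                       # (p^1_D, p^2_D, p^3_D, ..., p^{2+N}_D)
    return p[0] * p[1] * math.prod(p[2:])   # tagged AND trapped AND all rules applied

def adversary_best_response(S_graph, p_D, root, dest_states):
    H = nx.DiGraph()
    for u, v in S_graph.edges():
        node_v = v[0]                   # IFG node that the head state v corresponds to
        q = 1.0 - detection_prob(p_D, node_v)
        H.add_edge(u, v, weight=-math.log(q) if q > 0 else float("inf"))
    # Shortest weighted path to any destination state = maximum evasion probability path.
    return min(
        (nx.shortest_path(H, root, d, weight="weight")
         for d in dest_states if nx.has_path(H, root, d)),
        key=lambda path: sum(H[u][v]["weight"] for u, v in zip(path, path[1:])),
        default=None,
    )
```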

B. Best Response for the Defender

We now present an approach for approximating the best response of the defender. In this approach, the set of possible responses at s_i ∈ V_G is discretized. Define

V_r = { s_i^{z_r} : s_i ∈ V_G, z_r = 1, ..., Z_r }

for integers Z_r > 0, where r = 1, ..., 2+N. For any V′_r ⊆ V_r, r = 1, ..., 2+N, define

p_D(s_i; V′_r) = (1/Z_r) |{ s_i^{z_r} : z_r = 1, ..., Z_r } ∩ V′_r|,   (9)

and define p_D(V′_1) to be the resulting vector of probabilities for tag selection, p_D(V′_2) to be the resulting vector of probabilities for tag sink selection, and p_D(V′_3), ..., p_D(V′_{2+N}) to be the resulting vectors of probabilities for security rule selection. Then p_D(V′) = {p_D(V′_r)}_{r=1}^{2+N} is the resulting defender strategy vector, where V′ = {V′_1, ..., V′_{2+N}}. For a given adversary strategy, say p_A, let f(V′) = U_D(p_D(V′), p_A).
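The discretized strategy map of Eq. (9) can be read as "the probability p^r_D(s_i) equals the fraction of the Z_r copies of s_i included in V′_r", as in the minimal sketch below; the tuple encoding (node, z_r) of the copies s_i^{z_r} is an illustrative choice.

```python
# Sketch of Eq. (9): each element of V_r is a copy (node, z_r) of a node, and the
# defender probability for component r at a node is the fraction of its copies in V'_r.
def p_D_component(V_prime_r, node, Z_r):
    return sum(1 for (s, z) in V_prime_r if s == node) / Z_r
```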

Proposition V.3. The function f(V′) is submodular as a function of V′; that is, for any V′_r, V′′_r with V′_r ⊆ V′′_r and any s_i^{z_r} ∉ V′′_r, for r ∈ {1, ..., 2+N},

f({V′_r ∪ {s_i^{z_r}}}_{r=1}^{2+N}) − f(V′) ≥ f({V′′_r ∪ {s_i^{z_r}}}_{r=1}^{2+N}) − f(V′′).

Proof. Consider U_D(p_D, p_A) as defined in Eq. (1). The first and second terms of U_D(p_D, p_A) are equal to

∑_{s_i∈V_G} (C_D(s_i)/Z_1) |{s_i^{z_1} : z_1 = 1, ..., Z_1} ∩ V′_1|  and  ∑_{s_i∈V_G} (W_D(s_i)/Z_2) |{s_i^{z_2} : z_2 = 1, ..., Z_2} ∩ V′_2|,

respectively, both of which are modular as functions of V′. The third term of U_D(p_D, p_A) equals

∑_{s_i∈V_G} ∑_{r=1}^{N} (γ_r/Z_{2+r}) |{s_i^{z_{2+r}} : z_{2+r} = 1, ..., Z_{2+r}} ∩ V′_{2+r}|,

which is also modular as a function of V′. Let Ω denote the set of all possible paths in the state space that are chosen by the adversary with nonzero probability and that terminate at some destination in D = ∪_{j=1}^{M} D_j. The last term can be written as

∑_{ω∈Ω} π(ω) ∑_{j=1}^{M} ( p_T(j;ω) α^D + p_R(j;ω) β_j^D ),

where p_T(j;ω) (resp. p_R(j;ω)) denotes the probability that the adversarial flow is detected by the defender at the jth stage (resp. reaches some destination in stage j) when the sample path is ω and the defender strategy is p_D(V′) (the V′ is omitted from the notation for simplicity), and π(ω) denotes the probability of selecting the path ω. Let g(ω;V′) denote the probability with which the defender detects the adversarial flow along path ω when the defender's strategy is p_D(V′). Since the last destination that is reached before dropping out is determined by the choice of path (denote the stage of this destination by j(ω)), we have

∑_{j=1}^{M} ( p_T(j;ω) α^D + p_R(j;ω) β_j^D ) = g(ω;V′) α^D + (1 − g(ω;V′)) β_{j(ω)}^D = g(ω;V′)(α^D − β_{j(ω)}^D) + β_{j(ω)}^D.

Since α^D − β_{j(ω)}^D ≥ 0 and β_{j(ω)}^D is independent of p_D(V′), it suffices to show that g(ω;V′) is submodular as a function of V′. Let V′ ⊆ V′′ with V′_r ⊆ V′′_r and s_i^{z_r} ∉ V′′_r, for any r ∈ {1, ..., 2+N}. We can write

g(ω;V′′) = 1 − ∏_{s_{i_k}∈ω: i_k=i} (1 − ∏_{r=1}^{2+N} p_D^r(s_{i_k})) ∏_{s_{i_k}∈ω: i_k≠i} (1 − ∏_{r=1}^{2+N} p_D^r(s_{i_k})) = 1 − δ(V′′) (1 − ∏_{r=1}^{2+N} p_D^r(s_i))^{c(s_i;ω)},

where p_D^r(s_{i_k}), for r = 1, ..., 2+N, denote the probabilities of selecting node s_{i_k} as a tag source, as a tag sink, and of selecting the security rules under the policy p_D(V′′), respectively, and

δ(V′′) = ∏_{s_{i_k}∈ω: i_k≠i} (1 − ∏_{r=1}^{2+N} p_D^r(s_{i_k})),  c(s_i;ω) = |{s_{i_k} ∈ ω : i_k = i}|.

Hence,

g(ω; {V′′_r ∪ {s_i^{z_r}}}_{r=1}^{2+N}) − g(ω;V′′)
 = [1 − δ(V′′)(1 − ∏_{r=1}^{2+N} (p_D^r(s_i;V′′) + 1/Z_r))^{c(s_i;ω)}] − [1 − δ(V′′)(1 − ∏_{r=1}^{2+N} p_D^r(s_i;V′′))^{c(s_i;ω)}]
 = δ(V′′)[ (1 − ∏_{r=1}^{2+N} p_D^r(s_i;V′′))^{c(s_i;ω)} − (1 − ∏_{r=1}^{2+N} (p_D^r(s_i;V′′) + 1/Z_r))^{c(s_i;ω)} ].

From Eq. (9) and the definition of p_D(V′), when V′_r ⊆ V′′_r we have p_D^r(s_i;V′) ≤ p_D^r(s_i;V′′), and hence

(1 − ∏_{r=1}^{2+N} p_D^r(s_i;V′′))^{c(s_i;ω)} − (1 − ∏_{r=1}^{2+N} (p_D^r(s_i;V′′) + 1/Z_r))^{c(s_i;ω)} ≤ (1 − ∏_{r=1}^{2+N} p_D^r(s_i;V′))^{c(s_i;ω)} − (1 − ∏_{r=1}^{2+N} (p_D^r(s_i;V′) + 1/Z_r))^{c(s_i;ω)}.

Furthermore, V′_r ⊆ V′′_r for r = 1, ..., 2+N implies δ(V′) ≥ δ(V′′). Hence

g(ω; {V′_r ∪ {s_i^{z_r}}}_{r=1}^{2+N}) − g(ω;V′) ≥ g(ω; {V′′_r ∪ {s_i^{z_r}}}_{r=1}^{2+N}) − g(ω;V′′),

completing the proof of submodularity.

Submodularity of $f(V')$ implies the following.

Proposition V.4. There exists an algorithm that is guaranteed to select a set $V^\star$ satisfying $f(V^\star) \ge \frac{1}{2}\max\{f(V') : V' \subseteq V\}$ within $O(NZ)$ evaluations of $U_D$, where $Z = \sum_{r=1}^{2+N} Z_r$.

Proof. The proof follows from the submodularity of $f(V')$ (Proposition V.3) and [20].
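Proposition V.4 relies on the linear-time algorithm of [20] for unconstrained submodular maximization. The following is a minimal sketch of the randomized double-greedy routine from [20], assuming the objective $f$ is available as a black-box set-function oracle (here a stand-in for $V' \mapsto U_D(p_D(V'),p_A)$); the element names and the oracle itself are placeholders, not part of the paper's implementation.

    import random

    def double_greedy(ground_set, f):
        """Randomized double greedy of Buchbinder et al. [20] for unconstrained
        submodular maximization; the returned set achieves at least half of the
        optimum in expectation. f is a set-function oracle (hypothetical)."""
        X, Y = set(), set(ground_set)      # X grows from the empty set, Y shrinks from the full set
        for u in ground_set:
            a = f(X | {u}) - f(X)          # marginal gain of adding u to X
            b = f(Y - {u}) - f(Y)          # marginal gain of removing u from Y
            a, b = max(a, 0.0), max(b, 0.0)
            if a + b == 0 or random.random() < a / (a + b):
                X.add(u)                   # keep u
            else:
                Y.remove(u)                # discard u
        return X                           # X == Y when the loop ends

In this sketch each candidate element costs a constant number of oracle calls; the ground set would be the collection of tag, trap, and rule variables indexed by the $Z_r$'s.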

VI. RESULTS: SINGLE-STAGE ATTACKS

In this section, we focus on the case where there is only a single attack stage and provide a solution for the DIFT game. Recall that in a single-stage attack, the attacker's objective is to choose transitions in the IFG to reach a target node.


Our approach to solving the game is based on a minimum capacity cut-set formulation on a flow network constructed from the IFG, followed by solving a bimatrix game. Here $M = 1$, and hence we drop the notation for the stage in this section.

A. Min-Cut Formulation

For a flow-network $F$, a cut is defined below.

Definition VI.1. In a flow-network $F$ with vertex and directed edge sets $V_F$ and $E_F$, respectively, for a subset $S \subset V_F$ the cut induced by $S$ is the subset of edges $\kappa(S) \subset E_F$ such that for every $(u,v) \in \kappa(S)$, $|\{u,v\} \cap S| = 1$.

The set $\kappa(S)$ consists of all edges with exactly one end point in $S$. Given a flow-network $F = (V_F,E_F)$ with source-sink pair $(s_F,t_F)$ and edge capacity vector $c_F : E_F \to \mathbb{R}_+$, the cost of a cut $\kappa(S)$ is defined as the sum of the costs of the edges in the cut,
\[
c_F(\kappa(S)) = \sum_{e\in\kappa(S)} c_F(e). \tag{10}
\]
The objective of the (source-sink)-min-cut problem is to find a cut $\kappa(S^\star)$ such that $c_F(\kappa(S^\star)) \le c_F(\kappa(S))$ for any cut $\kappa(S)$ induced by a set $S$ satisfying $s_F \in S$ and $t_F \notin S$. The (source-sink)-min-cut problem is well studied, and there exist algorithms that find the maximum flow $f^\star$, and hence a minimum cut, in time polynomial in $|V_F|$ and $|E_F|$ [21]. Given an information flow graph $G$, we first construct the flow-network $F$.

Pseudocode describing the construction of $F = (V_F,E_F)$ is given in Algorithm VI.1. The vertex set of $F$ consists of two nodes $s_i$ and $s'_i$ for each node $s_i$ of the information flow graph $G$, together with additional vertices $s_F$, $t_F$ (Step 2). Thus $|V_F| = 2N+2$. The directed edge set of $F$ consists of the edges of the information flow graph rerouted through the split nodes ($\bar{E}_G$), one node edge per node of the information flow graph ($\tilde{E}_G$), edges connecting the source node $s_F$ to all nodes in the set $\lambda$ ($E_\lambda$), and edges connecting all destination nodes to the sink node $t_F$ ($E_D$) (Step 3). The capacity vector $c_F$ is defined such that all edges except the node edges in $\tilde{E}_G$ have infinite capacity. The capacity of a node edge $(s_i,s'_i) \in \tilde{E}_G$ is the sum of the costs of selecting $s_i$ as tag source and as tag sink, since the costs of selecting the security rules do not depend on the node (Step 4). Hence, a minimum-capacity edge in $F$ corresponds to a node of $G$ that has the minimum cost of tagging and trapping. Let $\kappa(S^\star)$ denote an optimal solution to the (source-sink)-min-cut problem on $F$. Since $\tilde{E}_G$ is a cut and $\sum_{e\in\tilde{E}_G} c_F(e) < \infty$, we have $\kappa(S^\star) \subset \tilde{E}_G$. The set of min-cut nodes is then given by
\[
S^\star := \{s_i : (s_i,s'_i) \in \kappa(S^\star)\}. \tag{11}
\]

Algorithm VI.1 Pseudocode for constructing the flow-network $F$ and defender payoff function $U_D(\cdot)$

Input: Information flow graph $G$, costs $C_D, W_D, \gamma_1,\ldots,\gamma_N$
Output: Flow-network $F$, source and sink nodes $s_F, t_F$, capacity vector $c_F$
1: Construct flow-network $F$ with vertex set $V_F$ and edge set $E_F$ as follows:
2: $V_F \leftarrow V_G \cup V'_G \cup \{s_F,t_F\}$, where $V_G = \{s_1,\ldots,s_N\}$, $V'_G = \{s'_1,\ldots,s'_N\}$, and $s_F = s_0$
3: $E_F \leftarrow \bar{E}_G \cup \tilde{E}_G \cup E_\lambda \cup E_D$, where $\bar{E}_G = \{(s'_i,s_j) : (s_i,s_j) \in E_G\}$, $\tilde{E}_G = \{(s_i,s'_i) : i = 1,\ldots,N\}$, $E_\lambda = \{(s_F,s_i) : s_i \in \lambda\}$, and $E_D = \{(s'_i,t_F) : s_i \in D\}$
4: $c_F(e) \leftarrow \infty$ for $e \in \bar{E}_G \cup E_\lambda \cup E_D$, and $c_F(e) \leftarrow C_D(s_i) + W_D(s_i)$ for $e = (s_i,s'_i) \in \tilde{E}_G$

The objective of the defender is to optimally select a defense policy such that no adversarial flow reaches from $s_0$ to some node in $D$. Assumption III.2 ensures that there is no path where the cost of tagging exceeds the damage of the attack, so that the defender always has an incentive to perform security analysis in at least one location along each path. In other words, the defender ensures that no flow from $s_F$ reaches $t_F$ without getting detected. To achieve this, the defender's policy must have a nonzero probability of tag and trap at at least one node in every possible path from $s_F$ to $t_F$. For any adversary policy, the best possible choice for the defender is to tag and trap at a node that has the minimum total cost $C_D(\cdot)+W_D(\cdot)$. An attack path is a directed path from $s_0$ to some node in $D$ formed by a sequence of transitions of the adversary. The probability of an attack path under an adversary strategy is the product of the probabilities of all transitions along that path. The adversary plans its transitions to obtain an attack path with the least probability of detection. The result below characterizes Nash equilibria of the single-stage game.
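For concreteness, the node-splitting construction of Algorithm VI.1 and the extraction of $S^\star$ in Eq. (11) can be sketched with networkx as below; the graph, entry set, destination set, and cost dictionaries are hypothetical inputs, and edges added without a capacity attribute are treated by networkx as having infinite capacity.

    import networkx as nx

    def min_cut_nodes(G, entry, dests, C_D, W_D):
        """Sketch of Algorithm VI.1: split each node s_i into ('in', s_i) -> ('out', s_i)
        with capacity C_D(s_i) + W_D(s_i), reroute the original edges, attach the
        super-source sF and super-sink tF, and return S* of Eq. (11)."""
        F = nx.DiGraph()
        for s in G.nodes:
            F.add_edge(('in', s), ('out', s), capacity=C_D[s] + W_D[s])   # node edge, finite capacity
        for u, v in G.edges:
            F.add_edge(('out', u), ('in', v))        # rerouted IFG edge, infinite capacity
        for s in entry:
            F.add_edge('sF', ('in', s))              # E_lambda
        for s in dests:
            F.add_edge(('out', s), 'tF')             # E_D
        _, (S, T) = nx.minimum_cut(F, 'sF', 'tF')    # max-flow / min-cut duality
        return {s for s in G.nodes if ('in', s) in S and ('out', s) in T}

Only the node edges carry finite capacity, so the returned cut crosses node edges exclusively, matching Eq. (11).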

Theorem VI.2. Let $S^\star$ be a min-cut node set of the flow-network $F = (V_F,E_F)$ constructed in Algorithm VI.1. Then, at a Nash equilibrium for the single-stage attack case, the defender's policy is to tag and trap, with equal probability, all the nodes in $S^\star$. Further, the adversary's strategy is such that each attack path passes through exactly one node in $S^\star$.

Before giving the proof of Theorem VI.2, we present the following lemma that establishes the first main argument in proving the theorem.

Lemma VI.3. Let $\Omega_D$ be the set of all paths in $G$ from $s_0$ to some node in $D$ under any adversary policy. Then, for a defender policy that assigns tag and trap at all nodes in the min-cut $S^\star$ and does not tag and trap any other node, the best response of the adversary is a sequence of transitions such that any attack path (or set of paths, if using a mixed policy) passes through exactly one node which is a tag source and a trap.

Proof. Consider a policy of the defender where all the nodes in the min-cut, i.e., $S^\star$, are tagged and assigned as traps. Note that all paths in $\Omega_D$ pass through some node in $S^\star$. We prove the claim by contradiction. Suppose that there exists a path $\omega \in \Omega_D$ such that there are two nodes, say $s_i, s_r$, in path $\omega$ with nonzero probability of tag and trap. Without loss of generality, assume that in $\omega$ there exists a directed path from $s_i$ to $s_r$. Now we show that $\pi(\omega) = 0$, where $\pi(\omega)$ is the probability with which the adversary chooses path $\omega$. Note that $s_i, s_r \in S^\star$. Since $S^\star$ corresponds to a min-cut, there exist paths in $\Omega_D$ that contain node $s_i$ but not $s_r$, and vice versa. Hence, for an adversary whose current state is $s_i$, there exists a path from $s_i$ to some node in $D$ that guarantees the win of the adversary.


The transition probability from $s_i$ to a node in $\omega$ that leads to some node in $D$ through $s_r$ is therefore zero, as such a path has a lower adversary payoff. This gives $\pi(\omega) = 0$ and completes the proof.

The following result proves that, for any adversary policy, the best response of the defender is to tag and trap at one node in every attack path under that adversary policy.

Lemma VI.4. Let $\Omega_D$ denote the set of all paths from $s_0$ to some node in $D$ under any adversary policy. If the defender's policy is such that the probability of detecting the adversary is the same for all $\omega \in \Omega_D$, then the best response of the defender is always to tag at exactly one node in every $\omega \in \Omega_D$.

Proof. Suppose the detection probabilities $(1-p(\omega))$ are the same for all $\omega \in \Omega_D$. Consider a defender policy $p_D$ in which exactly one node in every $\omega \in \Omega_D$ is chosen as the tag source and tag sink. Assume that the defender policy is modified to $p'_D$ such that more than one node in some path has nonzero tag and trap probability. Note that such a $p'_D$ exists, since every path $\omega \in \Omega_D$ has at least two states at which the defender can perform security analysis, as destinations are assumed to be at least one node away from the entry points. This variation updates the probabilities of nodes in a set of paths in $\Omega_D$. For $p'_D$ to be a best response, $U_D(p'_D,p_A) > U_D(p_D,p_A)$. The defender's payoff is given by
\[
U_D(p_D,p_A) = \sum_{\omega\in\Omega_D} \pi(\omega)\Big[(1-p(\omega))\alpha^D + p(\omega)\beta^D + \sum_{s_i\in\omega}\Big(p^1_D(s_i)C_D(s_i) + p^2_D(s_i)W_D(s_i) + \sum_{r=1}^{N} p^{2+r}_D(s_i)\gamma_r\Big)\Big].
\]
The terms in $U_D$ that correspond to $\alpha^D$ and $\beta^D$ are the same in both cases, as the $p(\omega)$'s are equal for all possible paths. Hence $U_D$ differs only in the terms corresponding to $C_D$, $W_D$, and the $\gamma_r$'s. Note that the defender's probabilities (policy) at two nodes in a path are dependent due to the constraint on $p(\omega)$. Hence, for every path whose probabilities are modified,
\[
\sum_{s_i\in\omega}\Big(p'^1_D(s_i)C_D(s_i) + p'^2_D(s_i)W_D(s_i) + \sum_{r=1}^{N} p'^{2+r}_D(s_i)\gamma_r\Big) < \sum_{s_i\in\omega}\Big(p^1_D(s_i)C_D(s_i) + p^2_D(s_i)W_D(s_i) + \sum_{r=1}^{N} p^{2+r}_D(s_i)\gamma_r\Big),
\]
since the probability in the single-node case is less than the sum of the probabilities in the multi-node case (the events are dependent), and the $C_D$ and $W_D$ values are also the least possible (the $\gamma_r$'s are equal at all nodes in the information flow graph). This implies $U_D(p'_D,p_A) < U_D(p_D,p_A)$. Therefore, there exists no best response of the defender that has more than one node with nonzero tag and trap probability in a path, if the $p(\omega)$'s are equal for all $\omega \in \Omega_D$.

The result below deduces a property of the best response of the defender which, along with Lemma VI.4, establishes the final main argument needed to prove Theorem VI.2.

Lemma VI.5. Let $\Omega_D$ denote the set of all paths from $s_0$ to some node in $D$ under any adversarial policy. Let the defender's policy be such that the probability of detecting the adversary is the same for all $\omega \in \Omega_D$. Then the best response of the defender is to tag and trap the flows at the min-cut of the flow-network $F$ constructed in Algorithm VI.1.

Proof. By Lemma VI.4, the best response of the defender is to tag and trap at one node in every attack path. Note that all attack paths chosen under $p_A$ pass through some node in the min-cut. By assigning nonzero probability of tag and trap at the nodes in $S^\star$, all possible attack paths have some nonzero probability of getting detected. We prove the result using a contradiction argument. Suppose that it is not the best response of the defender to tag and trap the nodes in $S^\star$. Then there exists a subset of nodes $\tilde{S} \subset V_G$ such that $\sum_{s_i\in\tilde{S}} (C_D(s_i)+W_D(s_i)) < \sum_{s_r\in S^\star} (C_D(s_r)+W_D(s_r))$ and all possible paths from $s_0$ to nodes in $D$ pass through some node in $\tilde{S}$. Let $\kappa(\tilde{S}) := \{(s_i,s'_i) : s_i \in \tilde{S}\}$. Then $\kappa(\tilde{S})$ is a (source-sink)-cut-set and $c_F(\kappa(\tilde{S})) < c_F(\kappa(S^\star))$. This contradicts the fact that $\kappa(S^\star)$ is an optimal solution to the (source-sink)-min-cut problem. Hence the best response of the defender is to tag and trap only the nodes in $S^\star$.

Now we present the proof of Theorem VI.2.

Proof of Theorem VI.2: Lemma VI.3 proves that, if the defender's policy is to tag at the min-cut nodes, the best response of the adversary is any sequence of transitions that gives a path (or set of paths, if a mixed policy) that passes through exactly one node that is a tag source and a trap. Lemma VI.5 concludes that the best response of the defender is to tag and trap the adversary at the nodes in the min-cut of $F$, provided the probabilities of detecting the adversary are equal for all $\omega \in \Omega_D$. This implies that, if the detection probabilities (the $(1-p(\omega))$'s) are equal at NE, then the defender's policy at NE will tag and trap the nodes in the min-cut and the adversary will choose attack paths that pass through exactly one node that is tagged and is also a trap. Now we show that the detection probabilities are indeed equal at NE.

Consider any unilateral deviation in the policy of the adversary. Let the $\pi(\omega)$'s, for $\omega \in \Omega$, be modified due to a change in the transition probabilities of the adversary, such that the updated probabilities of the attack paths are $\pi(\omega_i) + \varepsilon_i$, for $i = 1,\ldots,|\Omega|$. Here, the $\varepsilon_i$'s can take positive values, negative values, or zero, subject to $\sum_{i=1}^{|\Omega|}\varepsilon_i = 0$. Consider two arbitrary paths, say $\omega_1$ and $\omega_2$, such that a unilateral change in the adversary policy changes $\pi(\omega_1)$ and $\pi(\omega_2)$ while the probabilities of the other paths remain unchanged. Without loss of generality, assume that $\pi(\omega_1)$ increases by $\varepsilon$ while $\pi(\omega_2)$ decreases by $\varepsilon$, and all other $\pi(\omega)$'s remain the same. As $(p_D,p_A)$ is a Nash equilibrium,
\[
(\pi(\omega_1)+\varepsilon)\big(p(\omega_1)(\beta^A-\alpha^A)+\alpha^A\big) + (\pi(\omega_2)-\varepsilon)\big(p(\omega_2)(\beta^A-\alpha^A)+\alpha^A\big) \le \pi(\omega_1)\big(p(\omega_1)(\beta^A-\alpha^A)+\alpha^A\big) + \pi(\omega_2)\big(p(\omega_2)(\beta^A-\alpha^A)+\alpha^A\big).
\]
This implies $(p(\omega_1)-p(\omega_2))(\beta^A-\alpha^A) \le 0$. As $(\beta^A-\alpha^A) > 0$, this implies $(p(\omega_1)-p(\omega_2)) \le 0$. By exchanging the roles of $\omega_1$ and $\omega_2$ and using the same argument, one can also show that $(p(\omega_1)-p(\omega_2)) \ge 0$. This implies $p(\omega_1) - p(\omega_2) = 0$. Since $\omega_1$ and $\omega_2$ are arbitrary, one can show for the general case that
\[
\sum_{i=1}^{|\Omega|} \varepsilon_i\, p(\omega_i) = 0. \tag{12}
\]
Eq. (12) must hold for all possible values of the $\varepsilon_i$'s satisfying $\sum_{i=1}^{|\Omega|}\varepsilon_i = 0$. This gives $p(\omega_i) = p(\omega_j)$ for all $i,j \in \{1,\ldots,|\Omega|\}$ at Nash equilibrium. This completes the proof.


The NE of the game is characterized by a solution of the min-cut problem together with a set of transitions of the adversary such that all attack paths have exactly one node which is tagged and is also a trap. Moreover, the tag and trap probabilities of these nodes are equal. Note that the solution to the min-cut problem is not unique in a general flow graph, and hence the NE of the game may not be unique. The results in this subsection show that at any NE the defender's policy tags and traps the nodes in the min-cut with equal detection probability, and the adversary chooses its transitions such that in every attack path exactly one node is a tag source and a tag sink.

B. Matrix Game Analysis for Nash Equilibrium

In this subsection, we discuss the matrix-game formulation of the single-stage case given in Table II. We first solve the (source-sink)-min-cut problem on $F$. Let an optimal solution be $\kappa(S^\star)$, and let the corresponding vertex set $S^\star := \{s_i : (s_i,s'_i) \in \kappa(S^\star)\}$ be enumerated as $S^\star = \{\tilde{s}_1,\ldots,\tilde{s}_a\}$. By Theorem VI.2, at NE the defender only tags and traps the nodes in $S^\star$, and the adversary chooses transitions such that each attack path passes through only one tagged and trapped node. The attack paths chosen by the adversary are therefore characterized by $\{\tilde{s}_1,\ldots,\tilde{s}_a\}$. We denote the probability of selecting an attack path corresponding to the node $\tilde{s}_i$ as $\pi(\tilde{s}_i)$. Further, let $\mathrm{cost}(\tilde{s}_i)$ denote the total cost of selecting node $\tilde{s}_i$ for conducting security analysis.

Remark VI.6. At NE, since any attack path passes through exactly one node in $S^\star = \{\tilde{s}_1,\ldots,\tilde{s}_a\}$, each of which has equal probability of tag and trap, one can without loss of generality consider the action space of the adversary as the set of disjoint paths through $S^\star$, over which the adversary strategizes. Thus, in the bimatrix formulation, the strategy of the adversary is to select a path, which is uniquely defined by a node in $S^\star$.

Now, we present a result that characterizes the set of NE of the single-stage attack case of the game given in Section III.

Theorem VI.7. Consider the dynamic game between $P_D$ and $P_A$ where the attack consists of a single stage. Let $S^\star = \{\tilde{s}_1,\ldots,\tilde{s}_a\}$ be a min-cut node set of the flow-network $F$. Then, a solution to the bimatrix game given in Table II gives a Nash equilibrium for the single-stage flow tracking game.

Proof. The defender's payoffs for the two cases in Table II are
\[
U_D(\text{Not detected}) = \sum_{i=1}^{a} \pi(\tilde{s}_i)\,\beta^D, \tag{13}
\]
\[
U_D(\text{Detected}) = \sum_{i=1}^{a} \pi(\tilde{s}_i)\big(\alpha^D + \mathrm{cost}(\tilde{s}_i)\big). \tag{14}
\]
The defender randomizes between detecting the adversary and not detecting it only if Eqs. (13) and (14) are equal. This gives
\[
\sum_{i=1}^{a} \pi(\tilde{s}_i)\big(\beta^D - \alpha^D - \mathrm{cost}(\tilde{s}_i)\big) = 0. \tag{15}
\]
There are many possible values of the $\pi(\tilde{s}_i)$'s, for $i = 1,\ldots,a$, that satisfy Eq. (15). Each such solution gives a probability mixture, i.e., the $\pi(\tilde{s}_i)$'s, for the adversary at a Nash equilibrium.

In order to obtain the probability mixture of the defender, we consider the following in Table II. For every $\tilde{s}_i \in S^\star$, one can find the set of nodes in $\lambda$ that have a directed path to the node $\tilde{s}_i$ using a depth-first search (DFS) algorithm [22]. Let this set be denoted by $\lambda(\tilde{s}_i)$. Then $\mathrm{cost}(\tilde{s}_i) = C_D(\tilde{s}_i) + W_D(\tilde{s}_i) + \sum_{r\in\lambda(\tilde{s}_i)} \gamma_r$ for all $\tilde{s}_i \in S^\star$. The probability of not detecting the adversary on an attack path containing the min-cut node $\tilde{s}_i$ is $\big(1-p^1_D(\tilde{s}_i)\,p^2_D(\tilde{s}_i)\prod_{r\in\lambda(\tilde{s}_i)} p^{2+r}_D(\tilde{s}_i)\big)$. Hence
\[
U_A(\tilde{s}_i) = \Big(1-p^1_D(\tilde{s}_i)\,p^2_D(\tilde{s}_i)\prod_{r\in\lambda(\tilde{s}_i)} p^{2+r}_D(\tilde{s}_i)\Big)\beta^A + p^1_D(\tilde{s}_i)\,p^2_D(\tilde{s}_i)\prod_{r\in\lambda(\tilde{s}_i)} p^{2+r}_D(\tilde{s}_i)\,\alpha^A, \quad\text{for } i = 1,\ldots,a. \tag{16}
\]

The adversary will randomize between the attack paths that correspond to the nodes $\tilde{s}_1,\tilde{s}_2,\ldots,\tilde{s}_a$ only when $U_A(\tilde{s}_1) = U_A(\tilde{s}_2) = \ldots = U_A(\tilde{s}_a)$. Theorem VI.2 gives $p(\omega) = \big(1-p^1_D(\tilde{s}_1)p^2_D(\tilde{s}_1)\prod_{r\in\lambda(\tilde{s}_1)} p^{2+r}_D(\tilde{s}_1)\big) = \ldots = \big(1-p^1_D(\tilde{s}_a)p^2_D(\tilde{s}_a)\prod_{r\in\lambda(\tilde{s}_a)} p^{2+r}_D(\tilde{s}_a)\big)$. This implies $U_A(\tilde{s}_1) = U_A(\tilde{s}_2) = \ldots = U_A(\tilde{s}_a)$.

The defender's payoff is given by
\[
U_D(p_D,p_A) = \sum_{\omega\in\Omega} \pi(\omega)\Big[(1-p(\omega))\alpha^D + p(\omega)\beta^D + \sum_{s_i\in\omega}\Big(p^1_D(s_i)C_D(s_i) + p^2_D(s_i)W_D(s_i) + \sum_{r=1}^{N} p^{2+r}_D(s_i)\gamma_r\Big)\Big].
\]
At a Nash equilibrium with respect to the solution $S^\star$ of the (source-sink)-min-cut problem, changing $p^r_D(\tilde{s}_i)$ for any $r \in \{1,\ldots,2+N\}$ at one node, say $\tilde{s}_i \in S^\star$, will not improve the payoff $U_D$. First, assume that the tagging probability at $\tilde{s}_i$ changes from $p^1_D(\tilde{s}_i)$ to $p'^1_D(\tilde{s}_i)$. By the equilibrium condition, $U_D(p_D,p_A) \ge U_D(p'_D,p_A)$. Note that, by Lemma VI.3, each node with nonzero defender probability lies in exactly one path chosen by the adversary. Hence
\[
\pi(\tilde{s}_i)\big[p(\omega)(\beta^D-\alpha^D)+\alpha^D + p^1_D(\tilde{s}_i)C_D(\tilde{s}_i)\big] \ge \pi(\tilde{s}_i)\big[p'(\omega)(\beta^D-\alpha^D)+\alpha^D + p'^1_D(\tilde{s}_i)C_D(\tilde{s}_i)\big],
\]
and therefore
\[
\pi(\tilde{s}_i)\big[(p(\omega)-p'(\omega))(\beta^D-\alpha^D) + (p^1_D(\tilde{s}_i)-p'^1_D(\tilde{s}_i))C_D(\tilde{s}_i)\big] \ge 0.
\]
Here,
\[
p(\omega)-p'(\omega) = \underbrace{\Big(p^2_D(\tilde{s}_i)\prod_{r\in\lambda(\tilde{s}_i)} p^{2+r}_D(\tilde{s}_i)\Big)}_{\varphi_1(\tilde{s}_i)}\big(p'^1_D(\tilde{s}_i)-p^1_D(\tilde{s}_i)\big).
\]
This gives
\[
\Big[\pi(\tilde{s}_i)\big(\varphi_1(\tilde{s}_i)(\beta^D-\alpha^D) - C_D(\tilde{s}_i)\big)\Big]\big(p'^1_D(\tilde{s}_i)-p^1_D(\tilde{s}_i)\big) \ge 0.
\]
The term $\pi(\tilde{s}_i)\big[\varphi_1(\tilde{s}_i)(\beta^D-\alpha^D) - C_D(\tilde{s}_i)\big]$ is independent of the change in the tagging probability at $\tilde{s}_i$, and its value is either positive or negative. By the equilibrium assumption, $\pi(\tilde{s}_i)\big[\varphi_1(\tilde{s}_i)(\beta^D-\alpha^D) - C_D(\tilde{s}_i)\big] = 0$, since $\big(p'^1_D(\tilde{s}_i)-p^1_D(\tilde{s}_i)\big)$ can be made positive or negative and the inequality must hold in both cases. As $\pi(\tilde{s}_i) \neq 0$, this implies
\[
\varphi_1(\tilde{s}_i) = \frac{C_D(\tilde{s}_i)}{\beta^D-\alpha^D}.
\]


Table II: Matrix game for the single-stage case with disjoint attack paths

                                          Adversary
  Defender          $\tilde{s}_1$                                   $\tilde{s}_2$                                   $\cdots$   $\tilde{s}_a$
  Not detected      $(\beta^D,\ \beta^A)$                           $(\beta^D,\ \beta^A)$                           $\cdots$   $(\beta^D,\ \beta^A)$
  Detected          $(\alpha^D+\mathrm{cost}(\tilde{s}_1),\ \alpha^A)$   $(\alpha^D+\mathrm{cost}(\tilde{s}_2),\ \alpha^A)$   $\cdots$   $(\alpha^D+\mathrm{cost}(\tilde{s}_a),\ \alpha^A)$

By varying the tag sink selection probability, i.e., changing $p^2_D(\tilde{s}_i)$ to $p'^2_D(\tilde{s}_i)$, we get
\[
\varphi_2(\tilde{s}_i) = \frac{W_D(\tilde{s}_i)}{\beta^D-\alpha^D}, \quad\text{where } \varphi_2(\tilde{s}_i) := p^1_D(\tilde{s}_i)\prod_{r\in\lambda(\tilde{s}_i)} p^{2+r}_D(\tilde{s}_i).
\]
Similarly, by varying the probability of selecting each of the rules at $\tilde{s}_i$, for $r \in \{1,\ldots,N\}$,
\[
\varphi_{r+2}(\tilde{s}_i) = \frac{\gamma_r}{\beta^D-\alpha^D}.
\]
Taking logarithms of $\varphi_1(\tilde{s}_i),\ldots,\varphi_{2+N}(\tilde{s}_i)$ for a node $\tilde{s}_i \in S^\star$, we obtain $2+N$ independent linear equations in $2+N$ unknowns. Thus there exists a unique solution to this set of equations, which gives the defender's policy at Nash equilibrium. Hence a solution to the matrix game in Table II gives an NE of the single-stage case.

This completes the discussion of the NE of the flow tracking game when the attack consists of a single stage.
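To illustrate how the NE policies can be computed from Table II and the proof of Theorem VI.7, the sketch below determines $\lambda(\tilde{s}_i)$, $\mathrm{cost}(\tilde{s}_i)$, and the defender's probabilities by solving the log-linear system obtained from $\varphi_1,\ldots,\varphi_{2+N}$. The graph, entry set, and cost dictionaries are hypothetical placeholders, and the sketch assumes each ratio such as $C_D(\tilde{s}_i)/(\beta^D-\alpha^D)$ lies in $(0,1]$ so that its logarithm is a valid log-probability.

    import math
    import numpy as np
    import networkx as nx

    def defender_ne_probabilities(G, entry, s_i, C_D, W_D, gamma, alpha_D, beta_D):
        """Sketch of the log-linear system in the proof of Theorem VI.7 for one
        min-cut node s_i: phi_1 = C_D/(beta_D - alpha_D), phi_2 = W_D/(...),
        phi_{2+r} = gamma_r/(...), where each phi omits exactly one probability."""
        # lambda(s_i): entry nodes with a directed path to s_i (found here by reachability)
        lam = [r for r in entry if nx.has_path(G, r, s_i)]
        cost = C_D[s_i] + W_D[s_i] + sum(gamma[r] for r in lam)   # cost(s_i) used in Table II
        denom = beta_D - alpha_D
        rhs = [math.log(C_D[s_i] / denom), math.log(W_D[s_i] / denom)]
        rhs += [math.log(gamma[r] / denom) for r in lam]
        k = len(rhs)                     # unknowns: log p^1_D, log p^2_D, log p^{2+r}_D for r in lambda(s_i)
        A = np.ones((k, k)) - np.eye(k)  # equation j sums every log-probability except the j-th
        x = np.linalg.solve(A, np.array(rhs))
        probs = np.exp(x)
        labels = ['tag', 'trap'] + ['rule_%s' % r for r in lam]
        return dict(zip(labels, probs)), cost

The adversary-side mixture, i.e., the $\pi(\tilde{s}_i)$'s, is then any nonnegative solution of Eq. (15).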

VII. RESULTS: MULTI-STAGE ATTACKS

Solving for a Nash equilibrium in nonzero-sum, imperfect information game settings is generally known to be computationally difficult. In this section, we present an efficient algorithm to compute a locally optimal correlated equilibrium [23], [24] of the game introduced in Section III.

Formal definitions of the correlated equilibrium and the local correlated equilibrium are given in Definitions IV.3 and IV.4, respectively. Intuitively, a correlated equilibrium can be viewed as a general distribution over a set of strategy profiles such that, if an impartial mediator privately recommends actions to each player from this distribution, then no player has an incentive to choose a different strategy. The correlated equilibrium has two main advantages [23], [24]: (1) it is guaranteed to always exist, and (2) it can be found in polynomial time (computing a correlated equilibrium only requires solving a linear program, whereas computing a Nash equilibrium requires finding a fixed point).

In order to find locally optimal correlated equilibrium solutions, we map our two-player game model into a game with $((M+2)N+\Lambda+1)$ players, where $\Lambda = \sum_{i=1}^{N}\Lambda(s_i)$ with $s_i \in V_G$, and $\Lambda(s_i)$ denotes the total number of security rules associated with the node $s_i \in V_G$ in the information flow graph. The adversary's strategy is represented by $(MN+1)$ players: $MN$ of them represent the adversary's actions at every node $s_i \in V_G$ for each stage $j \in \{1,\ldots,M\}$, and one adversarial player acts on the pseudo node $s_0$, whose strategy decides the entry point chosen by the adversary into the system. Similarly, the defender's strategy is represented by $\sum_{i=1}^{N}\Lambda(s_i)+2N$ players, where the $\Lambda(s_i)+2$ defender players associated with a single node $s_i$ represent the defender's strategy (tag selection, trap selection, and selection of security rules) at that node.

Formally, we consider a set of players $\{P_{A_{ij}} : i = 1,\ldots,N,\ j = 1,\ldots,M\} \cup \{P_{D_i} : i = 1,\ldots,2N+\Lambda\} \cup \{P_{s_0}\}$. Each of the players $P_{A_{ij}}$ has action space $\mathcal{A}_{A_{ij}} = \mathcal{N}(s_i)$, and each player $P_{D_i}$ has action space $\{0,1\}$, which, depending on the type of defender player, represents whether or not to tag, whether or not to trap, or whether or not to select a specific tag check rule. The player $P_{s_0}$ has action space $\lambda$. We let $a_D$ denote the set of actions chosen by the players $\{P_{D_i} : i = 1,\ldots,2N+\Lambda\}$ and $a_A$ the set of actions chosen by the players $\{P_{A_{ij}} : i = 1,\ldots,N,\ j = 1,\ldots,M\} \cup \{P_{s_0}\}$.

The payoffs of the players from a particular action set are given by
\[
U_{A_{ij}}(a_A,a_D) = U_A(a_A,a_D), \qquad U_{D_i}(a_A,a_D) = U_D(a_A,a_D),
\]
where $U_A$ and $U_D$ are as defined in Section III. Hence, all adversarial players receive the same payoff $U_A$, while all defender players receive the payoff $U_D$. Equivalently, the adversarial players $P_{A_{ij}}$ cooperate in order to maximize the adversary's payoff, while the defender players $P_{D_i}$ attempt to maximize the defender's payoff.
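A minimal sketch of this player expansion, assuming the information flow graph is held as a networkx DiGraph, $\mathcal{N}(s_i)$ is taken as the out-neighbours of $s_i$, and the number of security rules $\Lambda(s_i)$ is supplied by a hypothetical rules dictionary:

    import networkx as nx

    def expand_players(G, entry, num_stages, rules):
        """Enumerate the ((M+2)N + Lambda + 1) players of the expanded game and
        their action spaces.  G: IFG as nx.DiGraph; entry: entry-point set lambda;
        num_stages: M; rules[s]: number of security rules Lambda(s) at node s
        (all hypothetical placeholders)."""
        players = {}
        for s in G.nodes:
            for j in range(1, num_stages + 1):           # MN adversary players
                players[('A', s, j)] = list(G.successors(s))
            players[('D', 'tag', s)] = [0, 1]             # tag / no tag at s
            players[('D', 'trap', s)] = [0, 1]            # trap / no trap at s
            for r in range(rules[s]):                     # Lambda(s) rule-selection players
                players[('D', 'rule', s, r)] = [0, 1]
        players[('A', 's0')] = list(entry)                # entry-point player on pseudo node s0
        return players

For the day-1 graph of Section VIII ($N = 30$, $M = 4$) this yields $(M+2)N+\Lambda+1 = 181+\Lambda$ players.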

Under the solution algorithm, the game is played repeatedly, with each player choosing its action from a probability distribution (mixed strategy) over the set of possible actions. After observing their payoffs, the players update their strategies according to an internal regret minimization learning algorithm [25]. Pseudocode of the proposed algorithm for computing correlated equilibrium strategies for both defender and adversary players is given in Algorithm VII.1.

The algorithm initializes the strategies at each node of the information flow graph to be uniformly random. At each iteration $t$, an action is chosen for each player $n$ according to the probability distribution $p_{t,n}$ of player $n$. After observing the actions of the other players, the probability distribution $p_{t,n}$ is updated as follows. For each pair of actions $r$ and $s$, a new probability distribution $p^{r\to s}_{t,n}$ is generated, in which all of the probability mass allocated to action $r$ is instead allocated to action $s$. The expected payoff arising from $p^{r\to s}_{t,n}$ can be interpreted as the expected benefit from playing action $s$ instead of $r$ at previous iterations of the algorithm.

For each pair $(r,s)$, a weight $\Delta_{(r,s),t,n}$ is computed that captures the relative benefit of the distribution $p^{r\to s}_{t,n}$; i.e., pairs $(r,s)$ for which allocating probability mass from $r$ to $s$ produces a larger expected payoff receive a higher weight. A new distribution $p_{t,n}$ is then computed based on the weights $\Delta_{(r,s),t,n}$, so that actions that produced a higher payoff for the player are chosen with increased probability. The algorithm continues until the distributions converge.


Algorithm VII.1 Pseudocode of the algorithm for computing a correlated equilibrium

1: Initialize $t \leftarrow 0$
2: for $n = 1,\ldots,(M+2)N+\Lambda+1$ do
3:    $p_{t,n} \leftarrow$ uniform distribution over the set of actions of player $n$
4: end for
5: while $\|p_t - p_{t-1}\| > \epsilon$ do
6:    for $n = 1,\ldots,(M+2)N+\Lambda+1$ do
7:       $a_{t,n} \leftarrow$ action chosen from distribution $p_{t,n}$
8:    end for
9:    for $n = 1,\ldots,(M+2)N+\Lambda+1$ do
10:      $a_{t,-n} \leftarrow (a_{t,l} : l \neq n)$
11:      for all pairs $(r,s)$ of actions of player $n$ do
12:         $p^{r\to s}_{t,n} \leftarrow p_{t,n}$
13:         $p^{r\to s}_{t,n}(r) \leftarrow 0$
14:         $p^{r\to s}_{t,n}(s) \leftarrow p_{t,n}(r) + p_{t,n}(s)$
15:         $\Delta_{(r,s),t,n} \leftarrow \exp\big(\eta\sum_{u=1}^{t-1}\mathbb{E}(U_n(p^{r\to s}_{u,n},a_{u,-n}))\big) \Big/ \sum_{(x,y):x\neq y}\exp\big(\eta\sum_{u=1}^{t-1}\mathbb{E}(U_n(p^{x\to y}_{u,n},a_{u,-n}))\big)$
16:         $p_{t,n} \leftarrow$ fixed point of the equation $p_{t,n} = \sum_{(r,s):r\neq s}\Delta_{(r,s),t,n}\,p^{r\to s}_{t,n}$
17:      end for
18:   end for
19:   $t \leftarrow t+1$
20: end while
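The pair-swap update in lines 11-16 can be sketched for a single player as below. The bookkeeping (storing past distributions and counterfactual action payoffs) is our own choice, and computing the fixed point in line 16 as the stationary distribution of the induced row-stochastic matrix is one way to read that step; none of this is prescribed by the paper.

    import numpy as np

    def strategy_update(past_dists, past_payoffs, eta):
        """One internal-regret update of a single player's mixed strategy, in the
        spirit of lines 11-16 of Algorithm VII.1.  past_dists: (T, K) array of the
        player's past distributions p_{u,n}; past_payoffs: (T, K) array whose (u, a)
        entry is U_n(a, a_{u,-n}), the payoff of action a against the opponents'
        realized actions at iteration u."""
        _, K = past_dists.shape
        base = (past_dists * past_payoffs).sum(axis=1)      # E U_n(p_{u,n}, a_{u,-n}) for each u
        score = np.full((K, K), -np.inf)
        for r in range(K):
            for s in range(K):
                if r == s:
                    continue
                # moving the mass of r onto s changes iteration u's expected payoff
                # by p_{u,n}(r) * (U_n(s, a_{u,-n}) - U_n(r, a_{u,-n}))
                shifted = base + past_dists[:, r] * (past_payoffs[:, s] - past_payoffs[:, r])
                score[r, s] = eta * shifted.sum()
        finite = np.isfinite(score)
        delta = np.zeros_like(score)
        delta[finite] = np.exp(score[finite] - score[finite].max())   # stabilized softmax numerator
        delta /= delta.sum()                                          # normalize over ordered pairs (r, s)
        # fixed point of p = sum_{(r,s)} delta_{(r,s)} p^{r->s}: stationary distribution
        # of the row-stochastic matrix Q with Q[r, s] = delta_{(r, s)} off the diagonal
        Q = delta.copy()
        np.fill_diagonal(Q, 1.0 - delta.sum(axis=1))
        vals, vecs = np.linalg.eig(Q.T)
        p = np.real(vecs[:, np.argmax(np.real(vals))])
        return np.abs(p) / np.abs(p).sum()

In Algorithm VII.1 this update is applied to every player at every iteration; here the expectation of line 15 is taken over the player's own shifted distribution against the opponents' realized actions.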

The convergence of the algorithm is described by the following proposition.

Proposition VII.1. Algorithm VII.1 converges to a local correlated equilibrium of the game introduced in Section III.

Proof. By [25], Algorithm VII.1 converges to a correlated equilibrium of the $((M+2)N+\Lambda+1)$-player game. Equivalently, by Definition IV.3, for any $s_i$ and $p'_{D_i} \in [0,1]$, the joint distribution $P$ returned by the algorithm satisfies
\[
\mathbb{E}\big(U_{D_i}(p_{D_i},p_{D_{-i}},p_A)\big) \ge \mathbb{E}\big(U_{D_i}(p'_{D_i},p_{D_{-i}},p_A)\big). \tag{17}
\]
Since the payoff $U_{D_i}$ is equal to $U_D$ for all $i \in \{1,\ldots,N\}$, Eq. (17) is equivalent to
\[
\mathbb{E}\big(U_D(p_{D_i},p_{D_{-i}},p_A)\big) \ge \mathbb{E}\big(U_D(p'_{D_i},p_{D_{-i}},p_A)\big). \tag{18}
\]
Similarly, for any $s^j_i \in S\times\{1,\ldots,M\} \cup \{s^1_0\}$ and any $p'_{A_{ij}}$, we have
\[
\mathbb{E}\big(U_{A_{ij}}(p_D,p_{A_{-ij}},p_{A_{ij}})\big) \ge \mathbb{E}\big(U_{A_{ij}}(p_D,p_{A_{-ij}},p'_{A_{ij}})\big), \tag{19}
\]
which is equivalent to
\[
\mathbb{E}\big(U_A(p_D,p_{A_{-ij}},p_{A_{ij}})\big) \ge \mathbb{E}\big(U_A(p_D,p_{A_{-ij}},p'_{A_{ij}})\big). \tag{20}
\]
Equations (18) and (20) imply that the output of Algorithm VII.1 satisfies the conditions of Definition IV.4 and hence is a local correlated equilibrium.

The following proposition provides the complexity analysis of the proposed algorithm.

Proposition VII.2. With probability $(1-\zeta)$, Algorithm VII.1 returns an $\epsilon$-correlated equilibrium$^2$ using
\[
O\!\left(\frac{N^2(M+2)+N(\Lambda+1)}{\epsilon^2}\,\ln\!\left(\frac{N^2(M+1)+N(\Lambda+1)}{\zeta}\right)\right)
\]
evaluations of the payoff function.

$^2$An $\epsilon$-correlated equilibrium is a correlated equilibrium at which no player can improve the expected payoff by more than $\epsilon$ by unilaterally deviating from the strategy.

Proof. By [25, Chapter 7, Section 7.4], learning-based algorithms return an $\epsilon$-correlated equilibrium with probability $(1-\zeta)$ within $\max_n \frac{16}{\epsilon^2}\ln\frac{N_n K}{\zeta}$ iterations, where $N_n$ is the number of actions of player $n$ and $K$ is the number of players, incurring a total of $\frac{16 N_n K}{\epsilon^2}\ln\frac{N_n K}{\zeta}$ evaluations of the payoff function. In this case, $N_n \le N$ and $K = (M+2)N+\Lambda+1$, resulting in the desired complexity bound.

Proposition VII.2 shows that convergence of the algorithm is sublinear in the number of nodes, with a total complexity that is quadratic in the number of nodes and linear in the number of stages.
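For a rough sense of scale, plugging in the day-one experiment of Section VIII ($N = 30$, $M = 4$) together with purely illustrative values $\Lambda = 30$, $\epsilon = 0.1$, and $\zeta = 0.05$ (these three are assumptions of this example, not values reported in the paper) gives
\[
\frac{N^2(M+2)+N(\Lambda+1)}{\epsilon^2}\,\ln\!\left(\frac{N^2(M+1)+N(\Lambda+1)}{\zeta}\right)
= \frac{5400+930}{0.01}\,\ln\!\left(\frac{4500+930}{0.05}\right)
\approx 6.3\times 10^{5}\times 11.6
\approx 7.3\times 10^{6}
\]
payoff-function evaluations, up to the constant hidden in the $O(\cdot)$ notation.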

VIII. EXPERIMENTAL ANALYSIS

In this section, we provide experimental validation of our model and simulation results to complement our theoretical results, using real-world attack data obtained with the Refinable Attack INvestigation (RAIN) system [4], [5] for a three-day nation state attack. We implement our model and run Algorithm VII.1 on the information flow graph generated from the system log data obtained using the RAIN system for day one of the nation state attack. The output of the algorithm gives the defender's policy, i.e., tagging locations, trapping locations, and selection of appropriate security rules, at a local equilibrium of the multi-stage game. We also perform a sensitivity analysis by varying the cost of defense (i.e., tagging costs, trapping costs, and individual security rule selection costs at each node in the underlying information flow graph) for the defender. This analysis enables us to infer the optimal strategies of the players and the sensitivity of the model with respect to the cost parameters for a given attack data set (an information flow graph with specified destinations for each attack stage).

We present below the details of the attack we consider and the steps involved in the construction of the information flow graph for that attack.

A. Attack Description

The evaluation was completed on a nation state attack (i.e., a state-of-the-art APT attack) orchestrated by a red team during an adversarial engagement. The engagement was organized by a US government agency (US DARPA). During the engagement, we leveraged RAIN [4] to record the whole-system log. At a high level, the goal of the adversaries' campaign was to steal sensitive proprietary and personal information from the targeted company. The attack is designed to run through three days. We only consider the day 1 log data for our evaluation purposes. Through our extensive analysis, we partitioned the attack in day 1 into four key stages: initial compromise, internal reconnaissance, foothold establishment, and data exfiltration. The initial compromise leveraged a spear-phishing attack, which led the victim to a website that was hosting ads from a malicious web server.



[Figure 1 plots: (a) defender utility vs. iteration number; (b) adversary utility vs. iteration number; (c) defender utility vs. scaling factor for the cost of defense (log scale).]

Figure 1: (a) Average payoff of the defender and (b) average payoff of the adversary at each iteration of Algorithm VII.1, with the cost parameters of the game set as follows: $\beta^A_1 = 100$, $\beta^A_2 = 200$, $\beta^A_3 = 500$, $\beta^A_4 = 1200$, $\alpha^A = -2000$, $\alpha^D = 2000$, $\beta^D_1 = -100$, $\beta^D_2 = -200$, $\beta^D_3 = -500$, $\beta^D_4 = -1200$. The costs of tagging and trapping at each node are set with fixed cost values of $c_1 = -50$ and $c_2 = -50$, respectively, multiplied by the fraction of flows through each node in the information flow graph. The costs of selecting the security rules are set as $\gamma_1 = \ldots = \gamma_N = -50$. For simulation purposes we assume that the fraction of the flows through each node in the information flow graph is uniform. (c) Average payoff of the defender as a function of the cost of defense. In each realization of the experiment we scale the components of the cost of defense (i.e., the costs for tag, trap, and tag check rule selection) by increasing scale factors 0.01, 0.1, 0.5, 1, 3, 6, and 10. All other game parameters are fixed to the constant values used in Case Study VIII-B.

The victim navigated to the website, which exploited a vulnerability in the Firefox browser. Once the attackers compromised the machine, the next stage of the APT leveraged common payloads to perform internal reconnaissance. The goal of this stage was to fingerprint the compromised system to detect running processes and network information. The attackers then established a foothold by writing a malicious program to disk. The malicious program was eventually executed and established a backdoor, which was used to continuously exfiltrate the company's sensitive data.

The system log data for day 1 of the nation state attack is collected with the annotated entry points of the attack and the attack destinations corresponding to each stage of the attack. Initial conversion of the attack data into an information flow graph resulted in a coarse-grained graph with approximately 132,000 nodes and approximately 2 million edges. The coarse graph captures the whole-system data during the recording period, which includes the attack-related data and a large amount of data related to the system's background processes (noise). Hence the coarse graph provides very little security-sensitive (attack-related) information about the underlying system, and it is computationally intensive to run our algorithm on such a coarse graph. Without loss of any relevant information, we pruned the coarse graph to extract the security-sensitive information about the system from the data [4]. The resulting information flow graph is called the security-sensitive information sub-graph [4], and we run our experimental analysis on this graph. The pruning includes the following two major steps (a rough sketch is given after this list):

1) Starting from the provided attack destinations, perform the upstream, downstream, and point-to-point stream techniques discussed in [4] to prune the coarse information flow graph.

2) Prune the resulting subgraph from Step 1) by combining object nodes (e.g., files, net-flow objects) that belong to the same directories or that use the same network socket.
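A rough sketch of the two pruning steps, assuming the coarse graph is an annotated networkx DiGraph; the point-to-point stream technique of [4] is omitted, and the attribute names (dir, socket) are hypothetical stand-ins for the annotations RAIN actually records:

    import networkx as nx

    def prune_coarse_graph(G, destinations, entry_points):
        """Step 1: keep only nodes that can reach a destination (upstream) and are
        reachable from an entry point (downstream); Step 2: merge object nodes that
        share a directory or a network socket.  Attribute names are hypothetical."""
        # Step 1: upstream/downstream reachability pruning
        upstream = set()
        for d in destinations:
            upstream |= nx.ancestors(G, d) | {d}
        downstream = set()
        for e in entry_points:
            downstream |= nx.descendants(G, e) | {e}
        H = G.subgraph(upstream & downstream).copy()
        # Step 2: merge object nodes that belong to the same directory or socket
        def group_key(n):
            attrs = H.nodes[n]
            return attrs.get('dir') or attrs.get('socket') or n   # fall back to the node itself
        groups = {}
        for n in list(H.nodes):
            groups.setdefault(group_key(n), []).append(n)
        for key, members in groups.items():
            keep, rest = members[0], members[1:]
            for m in rest:
                H = nx.contracted_nodes(H, keep, m, self_loops=False)
        return H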

Figure 2: Security-sensitive information flow sub-graph for the nation state attack.

The resulting pruned information flow graph (see Figure 2) consists of 30 nodes ($N = 30$), out of which 8 nodes are identified as attack destination nodes corresponding to the 4 stages ($M = 4$) of the day 1 nation state attack. One node related to a net-flow object has been identified as the entry point used for the attack ($|\lambda| = 1$). Note that, even when the sensitive locations in the system are known, it may not be feasible to perform tagging, trapping, and security analysis at such a location (e.g., the entry point of the attack); this is captured in our model by the costs for tagging, trapping, and performing authenticity analysis using a security rule.

B. Case Study 1: Convergence of the algorithm

In this section, we provide a case study that validates the convergence of the proposed algorithm. For simulation purposes, we assume the fraction of the flows through each node is uniformly distributed. From the system log, one can estimate the distribution of the fraction of flows through each node by averaging the number of events associated with each node with respect to the total number of events that occurred during the whole-system recording. Figures 1(a) and (b) plot the payoff values of both players at each iteration of Algorithm VII.1 for the parameters given in the caption of Figure 1. Both the defender and the adversary payoffs converge within a finite number of iterations.
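The event-count estimate of the flow fractions described above can be sketched as follows; the record layout of the log (an iterable whose first field is the node touched by the event) is a hypothetical assumption of this sketch.

    from collections import Counter

    def flow_fractions(events):
        """Estimate the fraction of flows through each node: the number of log
        events touching a node divided by the total number of events."""
        counts = Counter(node for node, *_ in events)
        total = sum(counts.values())
        return {node: c / total for node, c in counts.items()}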


C. Case Study 2: Payoff of the defender vs. defense cost

This case study analyzes the effect of the cost of defense on the defender's payoff. We use the same game parameters as in Case Study 1. Figure 1(c) shows that the expected payoff of the defender starts decreasing exponentially as the cost of defense increases. When the defense cost increases, the defender either incurs more resource cost to maintain the same level of security in the system (i.e., to keep the attack success probability low by setting up tag sources and traps) or is discouraged from frequently deploying tag sources and traps; in the latter case the attack success rate increases and the defender's payoff decreases. This result implies that an optimal selection of the locations for tagging and trapping is critical for a resource-efficient implementation of a detection system against APTs.

IX. CONCLUSION

This paper proposed a game-theoretic framework for cost-effective, real-time detection of Advanced Persistent Threats (APTs) that perform multi-stage attacks. We used an information flow tracking-based detection mechanism, as APTs continuously introduce information flows into the system. As the defense mechanism is unaware of the stage of the attack and cannot distinguish whether an information flow is malicious or not, the game considered in this paper has an asymmetric information structure. Further, the resource costs of the attacker and the defender are not the same, resulting in a multi-stage nonzero-sum imperfect information game. We first computed the best responses of both players. For the adversary, we showed that the best response can be obtained using a shortest path algorithm in polynomial time. For the defender, we showed that the payoff function is submodular for a given adversary strategy and hence a 1/2-optimal solution to the best response can be found in polynomial time. To solve the game and obtain an optimal policy for the defender, we first considered the single-stage attack case and characterized the set of Nash equilibria by proving the equivalence of the dynamic game to a bimatrix game formulation. We showed that at equilibrium an optimal defense strategy is to choose a min-cut node set of the information flow graph at which to conduct the security analysis. Then we considered the multi-stage case and provided a polynomial-time algorithm to find an ε-correlated equilibrium, for any given ε. We performed experimental analysis of our model on real data for a three-day nation state attack obtained using the Refinable Attack INvestigation (RAIN) system. Extending the model to address cases where the targets of the adversary are unknown is part of future work.

REFERENCES

[1] G. Brogi and V. V. T. Tong, "TerminAPTor: Highlighting advanced persistent threats through information flow tracking," IFIP International Conference on New Technologies, Mobility and Security, 2016.

[2] J. Newsome and D. Song, "Dynamic taint analysis: Automatic detection, analysis, and signature generation of exploit attacks on commodity software," Network and Distributed Systems Security Symposium, 2005.

[3] G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas, "Secure program execution via dynamic information flow tracking," ACM SIGPLAN Notices, vol. 39, no. 11, pp. 85-96, 2004.

[4] Y. Ji, S. Lee, E. Downing, W. Wang, M. Fazzini, T. Kim, A. Orso, and W. Lee, "RAIN: Refinable attack investigation with on-demand inter-process information flow tracking," ACM SIGSAC Conference on Computer and Communications Security, pp. 377-390, 2017.

[5] Y. Ji, S. Lee, M. Fazzini, J. Allen, E. Downing, T. Kim, A. Orso, and W. Lee, "Enabling refinable cross-host attack investigation with efficient data flow tagging and tracking," USENIX Security Symposium, pp. 1705-1722, 2018.

[6] J. Clause, W. Li, and A. Orso, "Dytan: A generic dynamic taint analysis framework," International Symposium on Software Testing and Analysis, pp. 196-206, 2007.

[7] E. J. Schwartz, T. Avgerinos, and D. Brumley, "All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask)," IEEE Symposium on Security and Privacy, pp. 317-331, 2010.

[8] M. Tambe, Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press, 2011.

[9] T. Alpcan and T. Basar, Network Security: A Decision and Game-Theoretic Approach. Cambridge University Press, 2010.

[10] M. Van Dijk, A. Juels, A. Oprea, and R. L. Rivest, "FlipIt: The game of 'stealthy takeover'," Journal of Cryptology, vol. 26, no. 4, pp. 655-713, 2013.

[11] P. Lee, A. Clark, B. Alomair, L. Bushnell, and R. Poovendran, "A host takeover game model for competing malware," IEEE Conference on Decision and Control, pp. 4523-4530, 2015.

[12] M. Min, L. Xiao, C. Xie, M. Hajimirsadeghi, and N. B. Mandayam, "Defense against advanced persistent threats in dynamic cloud storage: A colonel blotto game approach," arXiv preprint arXiv:1801.06270, 2018.

[13] S. Rass, S. Konig, and S. Schauer, "Defending against advanced persistent threats using game-theory," PLoS ONE, vol. 12, no. 1, pp. e0168675: 1-43, 2017.

[14] P. Hu, H. Li, H. Fu, D. Cansever, and P. Mohapatra, "Dynamic defense strategy against advanced persistent threat with insiders," IEEE Conference on Computer Communications, pp. 747-755, 2015.

[15] D. Sahabandu, B. Xiao, A. Clark, S. Lee, W. Lee, and R. Poovendran, "DIFT games: Dynamic information flow tracking games for advanced persistent threats," IEEE Conference on Decision and Control, 2018.

[16] S. Moothedath, D. Sahabandu, A. Clark, S. Lee, W. Lee, and R. Poovendran, "Multi-stage dynamic information flow tracking game," Conference on Decision and Game Theory for Security, Lecture Notes in Computer Science, vol. 11199, pp. 80-101, 2018.

[17] P. Vogt, F. Nentwich, N. Jovanovic, E. Kirda, C. Kruegel, and G. Vigna, "Cross site scripting prevention with dynamic data tainting and static analysis," Network & Distributed System Security Symposium, 2007.

[18] J. F. Nash, "Equilibrium points in n-person games," Proceedings of the National Academy of Sciences, vol. 36, no. 1, pp. 48-49, 1950.

[19] X. Chen, X. Deng, and S.-H. Teng, "Settling the complexity of computing two-player Nash equilibria," Journal of the ACM, vol. 56, no. 3, p. 14, 2009.

[20] N. Buchbinder, M. Feldman, J. Seffi, and R. Schwartz, "A tight linear time (1/2)-approximation for unconstrained submodular maximization," SIAM Journal on Computing, vol. 44, no. 5, pp. 1384-1402, 2015.

[21] J. B. Orlin, "A faster strongly polynomial minimum cost flow algorithm," Operations Research, vol. 41, no. 2, pp. 338-350, 1993.

[22] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press: Cambridge, 2001.

[23] C. H. Papadimitriou and T. Roughgarden, "Computing correlated equilibria in multi-player games," Journal of the ACM, vol. 55, no. 3, pp. 14:2-29, 2008.

[24] R. J. Aumann, "Correlated equilibrium as an expression of Bayesian rationality," Econometrica: Journal of the Econometric Society, pp. 1-18, 1987.

[25] N. Cesa-Bianchi and G. Lugosi, Prediction, Learning, and Games. Cambridge University Press, 2006.


Shana Moothedath is a Postdoctoral Research Scholar in the Department of Electrical and Computer Engineering at the University of Washington - Seattle. She received her B.Tech. and M.Tech. degrees in Electrical and Electronics Engineering from Kerala University, India, in 2011 and 2014, respectively, and her Ph.D. degree in Electrical Engineering from the Indian Institute of Technology Bombay, India, in 2018. Her research interests include network security analysis, structural analysis of large-scale control systems, and applications of systems theory to complex networks.

Dinuka Sahabandu is a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of Washington - Seattle. He received the B.S. degree and M.S. degree in Electrical Engineering from Washington State University - Pullman in 2013 and 2016, respectively. His research interests include game theory for network security and control of multi-agent systems.

Joey Allen is a Ph.D. candidate in the College of Computing at the Georgia Institute of Technology, where he is studying information security under Dr. Wenke Lee. He received the B.S. degree and M.S. degree in Computer Engineering from the University of Tennessee - Knoxville in 2014 and 2016, respectively. His research interests include forensic analysis, mobile security, and information tracking.

Andrew Clark (M'15) is an Assistant Professor in the Department of Electrical and Computer Engineering at Worcester Polytechnic Institute. He received the B.S. degree in Electrical Engineering and the M.S. degree in Mathematics from the University of Michigan - Ann Arbor in 2007 and 2008, respectively. He received the Ph.D. degree from the Network Security Lab (NSL), Department of Electrical and Computer Engineering, at the University of Washington - Seattle in 2014. He is author or co-author of the IEEE/IFIP William C. Carter award-winning paper (2010), the WiOpt Best Paper (2012), the WiOpt Student Best Paper (2014), and an IEEE GameSec Outstanding Paper (2018), and was a finalist for the IEEE CDC 2012 Best Student-Paper Award and the IEEE/ACM ICCPS Best Paper (2016, 2018). He received the University of Washington Center for Information Assurance and Cybersecurity (CIAC) Distinguished Research Award (2012) and Distinguished Dissertation Award (2014), and an NSF CRII award (2016). He holds a patent in privacy-preserving constant-time identification of RFID. His research interests include control and security of complex networks, submodular optimization, control-theoretic modeling of network security threats, and deception-based network defense mechanisms.

Linda Bushnell (F'17) is a Research Professor in the Department of Electrical and Computer Engineering at the University of Washington - Seattle. She received her Ph.D. in Electrical Engineering from the University of California - Berkeley in 1994, her M.A. in Mathematics from the University of California - Berkeley in 1989, her M.S. in Electrical Engineering from the University of Connecticut - Storrs in 1987, and her B.S. in Electrical Engineering from the University of Connecticut - Storrs in 1985. She also received her MBA from the University of Washington Foster School of Business in 2010. Her research interests include networked control systems and secure control. She is a Fellow of the IEEE for contributions to networked control systems. She is a Fellow of IFAC for contributions to the analysis and design of networked control systems. She is a recipient of the US Army Superior Civilian Service Award, the NSF ADVANCE Fellowship, and the IEEE Control Systems Society Distinguished Member Award. She has been a member of the IEEE since 1985, and a member of the IEEE CSS since 1990. She is currently the Treasurer of the American Automatic Control Council, a Member of the Technical Board of the International Federation of Automatic Control, an Associate Editor for Automatica, and an Associate Editor for the IEEE Transactions on Control of Network Systems.

Wenke Lee is a Professor of Computer Science in the College of Computing at the Georgia Institute of Technology. He also serves as the Director of the Georgia Tech Information Security Center (GTISC). He received his Ph.D. in Computer Science from Columbia University in 1999. Dr. Lee works in systems and network security. His current research projects are in the areas of botnet detection and attribution, malware analysis, virtual machine monitoring, mobile phone security, and detection and mitigation of information manipulation on the Internet, with funding from NSF, DHS, DoD, and industry. Dr. Lee has published over 100 articles, with more than 40 of them cited more than 100 times. In 2006, Dr. Lee co-founded Damballa, Inc., a spin-off from his lab that focuses on botnet detection and mitigation.

Radha Poovendran (F'15) is a Professor in the Department of Electrical and Computer Engineering at the University of Washington (UW) - Seattle. He served as the Chair of the Electrical and Computer Engineering Department at UW for five years starting January 2015. He is the Director of the Network Security Lab (NSL) at UW. He is the Associate Director of Research of the UW Center for Excellence in Information Assurance Research and Education. He received the B.S. degree in Electrical Engineering and the M.S. degree in Electrical and Computer Engineering from the Indian Institute of Technology - Bombay and the University of Michigan - Ann Arbor in 1988 and 1992, respectively. He received the Ph.D. degree in Electrical and Computer Engineering from the University of Maryland - College Park in 1999. His research interests are in the areas of wireless and sensor network security, control and security of cyber-physical systems, adversarial modeling, smart connected communities, control-security, games-security, and information theoretic security in the context of wireless mobile networks. He is a Fellow of the IEEE for his contributions to security in cyber-physical systems. He is a recipient of the NSA LUCITE Rising Star Award (1999), the National Science Foundation CAREER (2001), ARO YIP (2002), ONR YIP (2004), and PECASE (2005) for his research contributions to multi-user wireless security. He is also a recipient of the Outstanding Teaching Award and Outstanding Research Advisor Award from UW EE (2002), the Graduate Mentor Award from the Office of the Chancellor at the University of California - San Diego (2006), and the University of Maryland ECE Distinguished Alumni Award (2016). He was co-author of award-winning papers including the IEEE/IFIP William C. Carter Award Paper (2010) and the WiOpt Best Paper Award (2012).
