TRANSCRIPT

Optimal Defense Policies for Partially Observable Spreading Processes on Bayesian Attack Graphs

Erik Miehling, Mohammad Rasouli, Demosthenis Teneketzis
University of Michigan - Ann Arbor

Second ACM Workshop on Moving Target Defense (MTD 2015), Denver, Colorado, October 12-15, 2015
Motivation

❖ Factors from information security (the CIA triad):
   Confidentiality (C): ensuring data does not get into the wrong hands
   Integrity (I): maintaining the accuracy and trustworthiness of information
   Availability (A): ensuring that data is always available to trusted users

[Figure: CIA triad]

❖ Factors from MTD:
   Reactiveness: should respond to observed changes in the system condition
   Predictiveness: forecast, and prepare for, where the system will be in the future

❖ Modern systems are becoming increasingly connected to improve operational efficiency and flexibility
❖ This convenience comes at the cost of introducing many vulnerabilities
The Conflict Environment

❖ We consider a dynamic setting in which a network is continually subjected to attacks whose objective is to compromise some target resources through exploits.
❖ The defender can control services in the network to prevent the attacker from reaching the target resources.
❖ Philosophy: an automated approach to defending the network.
❖ Defense actions are observation-driven.
Contribution

❖ Contribution: We introduce a formal model which allows us to study dependencies between vulnerabilities in a system and to define (and compute) optimal defense policies.

❖ Aspects of our defense model:
   ❖ Progressive attacks: recent exploits build upon previous exploits, progressively degrading the system
   ❖ Dynamic defense: the defender chooses the best action based on new information
   ❖ Partial knowledge: the defender is uncertain about the security of the network at any given time
   ❖ It is both reactive and predictive
Attack Graphs

❖ It is insufficient to look at single vulnerabilities when protecting a network
❖ Attackers combine vulnerabilities to penetrate the network
❖ Attack graphs model how multiple vulnerabilities can be combined and exploited by an attacker
❖ They explicitly take into account the paths that the attacker can take to reach the critical resources
❖ Tools such as CAULDRON (Jajodia et al. 2010-11) can be used to generate attack graphs
Bayesian Attack Graphs

❖ Bayesian attack graph: G = {N, θ, E, P}

❖ Attributes: nodes N represent attributes
   N_L ⊆ N: leaf nodes;  N_C ⊆ N_R ⊆ N: critical (root) nodes
❖ Types θ:
   N_∧ ⊆ N \ N_L: AND attributes
   N_∨ ⊆ N \ N_L: OR attributes
❖ Exploits: edges E = (i, j)_{i,j ∈ N} represent exploits
❖ Probabilities: P = (α_ij)_{(i,j) ∈ E} are the exploit probabilities

[Figure: example attack graph on nodes 1-20, with edge labels α_12, α_13, α_14, ...]

❖ In the example:
   N_L = {1, 5, 7, 8, 11, 12, 16, 17, 20}
   N_C = {9, 14} ⊆ N_R = {2, 9, 14, 18}
   N_∧ = {2, 3, 6, 9, 10, 13, 18, 19}
   N_∨ = {4, 14, 15}
   E = {(1, 2), (1, 3), ..., (20, 19)}
   P = {α_{1,2}, α_{1,3}, ..., α_{20,19}}
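As a concrete illustration, the tuple G = {N, θ, E, P} can be encoded directly as a dictionary. The 4-node graph below is a toy sketch for illustration only, not the 20-node example from the slide, and the field names are our own.

```python
# Minimal dict-based encoding of a Bayesian attack graph G = {N, theta, E, P}.
# The 4-node graph and all probabilities here are illustrative.
attack_graph = {
    "N": [1, 2, 3, 4],
    "N_L": [1, 2],                     # leaf nodes
    "N_C": [4],                        # critical nodes
    "theta": {3: "and", 4: "or"},      # types of the non-leaf attributes
    "E": [(1, 3), (2, 3), (3, 4)],     # exploits (edges)
    "P": {(1, 3): 0.6, (2, 3): 0.8, (3, 4): 0.5},  # exploit probabilities
}
```

Note that every non-leaf attribute carries a type, and every edge carries an exploit probability, mirroring the definitions above.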
Spreading Process

❖ The attacker's behavior is assumed to follow a probabilistic spreading process
❖ Each attribute (node) i ∈ N can be in one of two states:
   X^i_t = 0 (disabled) or X^i_t = 1 (enabled)
❖ Contagion seed and spread: at each time t,
   1. Each leaf attribute is enabled with probability α_i ∈ [0, 1]
   2. Contagion spreads according to predecessor rules
❖ The exploit probabilities (public knowledge) give the probability that an exploit will be discovered/taken by the attacker

[Figure: at time t = τ, node i with X^i_τ = 1, node l with X^l_τ = 0, and edges labeled α_il, α_jl, α_kl]
Spreading Process — Predecessor Rules

❖ The type of the attribute dictates the nature of the spreading process (D_l denotes the set of direct predecessors of node l)

❖ For AND attributes, e.g. node l:

   P(X^l_{t+1} = 1 | X^l_t = 0, X_t) = ∏_{p ∈ D_l} α_{pl}   if ∧_{p ∈ D_l} X^p_t = 1,  and 0 otherwise

❖ For OR attributes, e.g. node k:

   P(X^k_{t+1} = 1 | X^k_t = 0, X_t) = 1 − ∏_{p ∈ D_k} (1 − α_{pk})   if ∨_{p ∈ D_k} X^p_t = 1,  and 0 otherwise

[Figure: at time t = τ, nodes i, j, k, m, n with X^i_τ = 1, X^l_τ = 0, and edges labeled α_il, α_jl, α_kl, α_mk, α_nk]
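The seeding step and the two predecessor rules can be sketched as one simulation step. This is our own illustrative encoding, assuming dictionaries `preds`, `alpha`, and `leaf_prob` for the graph structure; none of these names come from the paper.

```python
import random

AND, OR = "and", "or"

def spread_step(x, node_type, preds, alpha, leaf_prob, rng=random.random):
    """Advance the attack state x (dict: node -> 0/1) by one time step.

    Leaf attributes are seeded independently with probability alpha_i;
    internal attributes follow the AND/OR predecessor rules.
    """
    x_next = dict(x)
    for i, kind in node_type.items():
        if x[i] == 1:
            continue  # enabled attributes stay enabled (only defense actions disable them)
        if not preds.get(i):
            # leaf attribute: seeded with probability alpha_i
            if rng() < leaf_prob[i]:
                x_next[i] = 1
        elif kind == AND:
            # enabled with prob. prod_{p in D_i} alpha_pi, if ALL predecessors are enabled
            if all(x[p] == 1 for p in preds[i]):
                p_enable = 1.0
                for p in preds[i]:
                    p_enable *= alpha[(p, i)]
                if rng() < p_enable:
                    x_next[i] = 1
        else:
            # OR: enabled with prob. 1 - prod_{p in D_i} (1 - alpha_pi),
            # if AT LEAST ONE predecessor is enabled
            if any(x[p] == 1 for p in preds[i]):
                stay_disabled = 1.0
                for p in preds[i]:
                    stay_disabled *= (1.0 - alpha[(p, i)])
                if rng() < 1.0 - stay_disabled:
                    x_next[i] = 1
    return x_next
```

With all α set to 1 the step becomes deterministic, which makes the AND/OR distinction easy to check by hand.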
Defender's Observations

❖ Defender only partially observes this process
❖ Rationale: the defender may not know the full capability of the attacker at any given time
❖ Defender thus observes a subset of enabled nodes that have been discovered at each time-step: Y_t ∈ {0, 1}^N
   β_i = P(Y^i_t = 1 | X^i_t = 1): probability of detection
   P(Y^i_t = 1 | X^i_t = 0) = 0
❖ Assumption: No false positives can occur.

[Figure: legend — disabled attribute; enabled & undetected; enabled & detected]
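The observation model above can be sketched in a few lines; the function name and dictionary encoding are ours, and `beta` stands in for the per-node detection probabilities β_i.

```python
import random

def observe(x, beta, rng=random.random):
    """Sample the observation Y_t given the true state X_t.

    An enabled attribute i is detected with probability beta[i];
    a disabled attribute never triggers an alert (no false positives).
    """
    return {i: (1 if x[i] == 1 and rng() < beta[i] else 0) for i in x}
```

Because false positives are ruled out, any 1 in the observation is a certain compromise, while a 0 is ambiguous: the attribute may be disabled or merely undetected.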
Defender's Countermeasures

❖ The defender uses network services as countermeasures
❖ The existence of exploits depends on services, for example:
   ❖ Secure Shell (SSH)
   ❖ File Transfer Protocol (FTP)
   ❖ Port scanning
   ❖ etc.
❖ The defender can thus temporarily block or disable these services to stop the attacker from progressing through the network
❖ We don't consider the typical hardening strategy of permanently removing all vulnerabilities
Defender's Countermeasures

❖ Suppose there is a set of M services {u_1, ..., u_M}
❖ Taking action u_m corresponds to disabling service m
❖ u_m disables a subset W_{u_m} of the nodes: X^i = 0, i ∈ W_{u_m}
❖ Action at time t: u_t ∈ U = ℘({u_1, ..., u_M})
❖ Examples (on the 20-node graph): W_{u_1} = {1}, W_{u_2} = {5, 17}, W_{u_3} = {13}

[Figure: the 20-node attack graph with services u_1, ..., u_6 covering the leaf nodes]

❖ Assumption: All leaf nodes are covered by at least one service.
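Applying a defense action is just forcing the covered attributes to 0. A minimal sketch, assuming the coverage sets W_{u_m} are given as a dictionary `W` (our own encoding):

```python
def apply_action(x, action, W):
    """Apply action (a set of services to disable) to state x.

    Disabling service u forces X^i = 0 for every node i in W[u].
    """
    x_next = dict(x)
    for service in action:
        for i in W[service]:
            x_next[i] = 0
    return x_next
```

Since an action is any element of the power set ℘({u_1, ..., u_M}), passing a set of several services disables the union of their coverage sets.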
Cost Function

❖ Cost of taking action u ∈ U in state x ∈ X: C(x, u)
❖ Confidentiality & Integrity factor: a less compromised state x is less costly than a more compromised state x′: C(x, ·) < C(x′, ·)
❖ Availability factor: an action u′ that has a higher negative impact on availability than another action u should satisfy: C(·, u) < C(·, u′)
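One hypothetical cost function consistent with both factors sums a compromise (C&I) penalty over enabled attributes and an availability penalty over disabled services. The additive form and all weights are our illustrative assumption, not the paper's definition.

```python
def cost(x, action, node_cost, service_cost):
    """Illustrative C(x, u): compromise cost plus availability cost.

    node_cost[i]    - penalty while attribute i is enabled (C&I factor)
    service_cost[u] - penalty while service u is disabled (availability factor)
    """
    ci_cost = sum(node_cost[i] for i in x if x[i] == 1)
    availability_cost = sum(service_cost[u] for u in action)
    return ci_cost + availability_cost
```

Enabling more attributes or disabling more services can only increase this cost, so both monotonicity requirements above hold by construction.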
Monotone States

❖ Assumption: The only feasible states are monotone.

[Figure: monotone states on a 6-node graph under a general topology]

❖ Simplifying assumption: The attack graph only contains AND nodes.

[Figure: monotone states on the same 6-node graph under an AND topology]
Defender's Information States

❖ Define the history up to time t as H_t = (π_0, U_1, Y_1, U_2, Y_2, ..., U_{t−1}, Y_t)
❖ We capture H_t by an information state:
   Π^i_t = P(X_t = x^i | H_t),   Π_t = (Π^1_t, ..., Π^K_t) ∈ Δ(X)
   where x^1, x^2, ..., x^K enumerate X, the space of monotone states
❖ The information state obeys the update rule T : Δ(X) × Y × U → Δ(X),
   π_{t+1} = T(π_t, y_{t+1}, u_t)
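For an enumerated state space, the update T is a standard Bayes step: predict the belief through the controlled dynamics, then condition on the observation. The function below is a generic sketch; `trans_prob(x, x_next, u)` and `obs_prob(y, x)` are placeholders standing in for the model defined on the earlier slides.

```python
def belief_update(pi, y, u, states, trans_prob, obs_prob):
    """One Bayes step of the information-state update T(pi, y, u)."""
    K = len(states)
    # prediction: push the belief through P(x' | x, u)
    pred = [sum(pi[i] * trans_prob(states[i], states[j], u) for i in range(K))
            for j in range(K)]
    # correction: reweight by the observation likelihood P(y | x')
    post = [obs_prob(y, states[j]) * pred[j] for j in range(K)]
    z = sum(post)
    return [p / z for p in post] if z > 0 else pred
```

Restricting `states` to the monotone states (rather than all 2^|N| bit vectors) is exactly what keeps K manageable here.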
Defender's Optimization Problem

❖ Choose a control policy g : Δ(X) → U, g ∈ G, that solves

   min_{g ∈ G}  E^g { Σ_{t=0}^∞ ρ^t C(Π_t, U_t) | Π_0 = π_0 }

   subject to  U_t = g(Π_t)
               Π_{t+1} = T(Π_t, Y_{t+1}, U_t)

❖ The above is a Partially Observable Markov Decision Process (POMDP)
Dynamic Programming Solution

❖ The optimal policy g* achieves the minimum expected discounted total cost.
❖ The Bellman equation for the corresponding value function V* is:

   V*(π) = min_{u ∈ U} { C(π, u) + ρ Σ_{y ∈ Y} P^{π,u}_y V*(T(π, y, u)) }   for all π ∈ Δ(X)

❖ The dimensionality of the continuation-cost term is greatly reduced due to the monotonicity assumption.
Dynamic Programming Solution

❖ The Bellman equation is both reactive and predictive:
   Reactive: the belief π captures the current system state based on previous states, actions, and observations.
   Predictive: the continuation cost incorporates the dynamics, taking into account the effect of possible future attacks and defense policies on the specification of the current defense decision.
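A single Bellman backup for a fixed belief π can be sketched directly from the equation. `C`, `T`, `P_y`, and `V` are placeholders for the cost, belief update, observation probability P^{π,u}_y, and (current estimate of the) value function; the function name and signature are ours.

```python
def bellman_backup(pi, actions, observations, C, T, P_y, V, rho=0.95):
    """Return (minimizing action, backed-up value) for belief pi.

    Evaluates, for each action u, the immediate cost C(pi, u) plus the
    discounted expected continuation value over observations y.
    """
    best_u, best_q = None, float("inf")
    for u in actions:
        q = C(pi, u) + rho * sum(
            P_y(y, pi, u) * V(T(pi, y, u)) for y in observations)
        if q < best_q:
            best_u, best_q = u, q
    return best_u, best_q
```

Exact solvers such as pomdp-solve iterate backups like this over the whole belief simplex, exploiting the piecewise-linear structure of V*; the sketch above only evaluates one belief point.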
Example

Attributes:
1. Vulnerability in WebDAV on machine 1
2. User access on machine 1
3. Heap corruption via SSH on machine 1
4. Root access on machine 1
5. Buffer overflow on machine 2
6. Root access on machine 2
7. Portscan on machine 2
8. Network topology leakage from machine 2
9. Buffer overflow on machine 3
10. Root access on machine 3
11. Buffer overflow on machine 4
12. Root access on machine 4

[Figure: 12-node attack graph with edges α_{1,2}, α_{2,3}, α_{3,4}, α_{4,9}, α_{5,6}, α_{6,7}, α_{7,8}, α_{8,9}, α_{8,11}, α_{9,10}, α_{10,11}, α_{11,12}]
Example - Countermeasures

Countermeasures:
   u_1: block WebDAV service
   u_2: disconnect machine 2

[Figure: the same 12-node attack graph, with u_1 covering the WebDAV attribute and u_2 covering machine 2's attributes]
Example

❖ The resulting POMDP contains:
   ❖ 29 states/observations
   ❖ 4 actions
❖ Solved using pomdp-solve (Cassandra 2003-15, online)
❖ We show three different information states, π^(1)_τ, π^(2)_τ, π^(3)_τ, with different corresponding optimal actions
❖ The optimal policy is intuitive: disable the service that corresponds to the attribute that has a high probability of being enabled.

[Figure: three belief histograms over the 29 states, panels (a)-(c), with optimal actions u* = u_1, u* = u_2, and u* = ? respectively]
Conclusion and Future Work

❖ Summary
   ❖ Introduced a formal model (built upon Bayesian attack graphs) for defending a network
   ❖ Formulated the problem as a POMDP
   ❖ Solved the POMDP for a small example
❖ Future Work
   ❖ Scaling the problem
      ❖ Exact POMDP solvers are only capable of handling small examples
      ❖ Realistic attack graphs are big!
      ❖ Complexity analysis?
   ❖ Structural results
      ❖ Directed acyclic graphs give rise to a natural partial order
      ❖ Can we use this to show threshold properties of the optimal policy?
Thank You!
Questions?
Funding Acknowledgments

❖ Special thanks to the following funding sources:
   ❖ NSF — Foundations Of Resilient CybEr-physical Systems (FORCES), Grant CNS-1238962
   ❖ ARO MURI — Adversarial and Uncertain Reasoning for Adaptive Cyber Defense: Building the Scientific Foundations, Grant W911NF-13-1-0421
Information State Update

❖ Decompose the information state update into four steps:
   π_{t+} = T1(π_t, u_t)        (effect of action)
   π_{t++} = T2(π_{t+})         (effect of attack)
   π_{t+++} = T3(π_{t++})       (effect of spread)
   π_{t+1} = T4(π_{t+++}, y_{t+1})   (effect of observation)
❖ The update can then be written as the composition
   π_{t+1} = T(π_t, y_{t+1}, u_t) = T4(T3(T2(T1(π_t, u_t))), y_{t+1}).

[Figure: timeline from t to t+1 showing π_t, u_t, the intermediate beliefs π_{t+}, π_{t++}, π_{t+++}, the observation y_{t+1}, and π_{t+1}]
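The four-step decomposition above is literally a function composition, which can be written as a one-line sketch (T1 through T4 are placeholders for the maps defined on this slide):

```python
def info_state_update(pi, u, y, T1, T2, T3, T4):
    """pi_{t+1} = T4(T3(T2(T1(pi_t, u_t))), y_{t+1})."""
    return T4(T3(T2(T1(pi, u))), y)
```

Keeping the four maps separate makes each modeling choice (action, attack, spread, observation) independently replaceable.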