TRANSCRIPT

Optimal Defense Policies for Partially Observable Spreading Processes on Bayesian Attack Graphs

Erik Miehling, Mohammad Rasouli, Demosthenis Teneketzis
University of Michigan - Ann Arbor

Second ACM Workshop on Moving Target Defense (MTD 2015), Denver, Colorado, October 12-15, 2015
Motivation

❖ Factors from information security (the CIA triad):
   Confidentiality (C): ensuring data does not get into the wrong hands
   Integrity (I): maintaining the accuracy and trustworthiness of information
   Availability (A): ensuring that data is always available to trusted users

[Figure: CIA triad]

❖ Factors from MTD:
   Reactiveness: should respond to observed changes in the system condition
   Predictiveness: forecast, and prepare for, where the system will be in the future

❖ Modern systems are becoming increasingly connected to improve operational efficiency and flexibility
❖ This convenience comes at the cost of introducing many vulnerabilities
The Conflict Environment

❖ We consider a dynamic setting in which a network is continually subjected to attacks whose objective is to compromise some target resources through exploits.
❖ The defender can control services in the network to prevent the attacker from reaching the target resources.
❖ Philosophy: an automated approach to defending the network.
❖ Defense actions are observation-driven.
Contribution

❖ Contribution: We introduce a formal model which allows us to study dependencies between vulnerabilities in a system and to define (and compute) optimal defense policies.

❖ Aspects of our defense model:
   ❖ Progressive attacks: recent exploits build upon previous exploits, progressively degrading the system
   ❖ Dynamic defense: the defender chooses the best action based on new information
   ❖ Partial knowledge: the defender is uncertain about the security of the network at any given time
   ❖ It is both reactive and predictive
Attack Graphs

❖ It is insufficient to look at single vulnerabilities when protecting a network
❖ Attackers combine vulnerabilities to penetrate the network
❖ Attack graphs model how multiple vulnerabilities can be combined and exploited by an attacker
❖ They explicitly take into account the paths that the attacker can take to reach the critical resources
❖ Tools such as CAULDRON (Jajodia et al. 2010-11) can be used to generate attack graphs
Bayesian Attack Graphs

❖ Bayesian attack graph: G = {N, θ, E, P}

❖ Attributes: nodes N represent attributes
   N_L ⊆ N: leaf nodes;  N_C ⊆ N_R ⊆ N: critical (root) nodes
❖ Types θ:
   N_∧ ⊆ N \ N_L: AND attributes
   N_∨ ⊆ N \ N_L: OR attributes
❖ Exploits: edges E = (i, j)_{i,j ∈ N} represent exploits
❖ Probabilities: P = (α_ij)_{(i,j) ∈ E} are the exploit probabilities

[Figure: example attack graph on nodes 1-20, with edge labels α_12, α_13, α_14, ...]

❖ In the example:
   N_L = {1, 5, 7, 8, 11, 12, 16, 17, 20}
   N_C = {9, 14} ⊆ N_R = {2, 9, 14, 18}
   N_∧ = {2, 3, 6, 9, 10, 13, 18, 19}
   N_∨ = {4, 14, 15}
   E = {(1, 2), (1, 3), ..., (20, 19)}
   P = {α_{1,2}, α_{1,3}, ..., α_{20,19}}
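As a concrete illustration, the tuple G = {N, θ, E, P} can be encoded directly as a dictionary. The 4-node graph below is a toy sketch for illustration only, not the 20-node example from the slide, and the field names are our own.

```python
# Minimal dict-based encoding of a Bayesian attack graph G = {N, theta, E, P}.
# The 4-node graph and all probabilities here are illustrative.
attack_graph = {
    "N": [1, 2, 3, 4],
    "N_L": [1, 2],                     # leaf nodes
    "N_C": [4],                        # critical nodes
    "theta": {3: "and", 4: "or"},      # types of the non-leaf attributes
    "E": [(1, 3), (2, 3), (3, 4)],     # exploits (edges)
    "P": {(1, 3): 0.6, (2, 3): 0.8, (3, 4): 0.5},  # exploit probabilities
}
```

Note that every non-leaf attribute carries a type, and every edge carries an exploit probability, mirroring the definitions above.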
Spreading Process

❖ The attacker's behavior is assumed to follow a probabilistic spreading process
❖ Each attribute (node) i ∈ N can be in one of two states:
   X^i_t = 0 (disabled) or X^i_t = 1 (enabled)
❖ Contagion seed and spread: at each time t,
   1. Each leaf attribute is enabled with probability α_i ∈ [0, 1]
   2. Contagion spreads according to predecessor rules
❖ The exploit probabilities (public knowledge) give the probability that an exploit will be discovered/taken by the attacker

[Figure: at time t = τ, node i with X^i_τ = 1, node l with X^l_τ = 0, and edges labeled α_il, α_jl, α_kl]
Spreading Process — Predecessor Rules

❖ The type of the attribute dictates the nature of the spreading process (D_l denotes the set of direct predecessors of node l)

❖ For AND attributes, e.g. node l:

   P(X^l_{t+1} = 1 | X^l_t = 0, X_t) = ∏_{p ∈ D_l} α_{pl}   if ∧_{p ∈ D_l} X^p_t = 1,  and 0 otherwise

❖ For OR attributes, e.g. node k:

   P(X^k_{t+1} = 1 | X^k_t = 0, X_t) = 1 − ∏_{p ∈ D_k} (1 − α_{pk})   if ∨_{p ∈ D_k} X^p_t = 1,  and 0 otherwise

[Figure: at time t = τ, nodes i, j, k, m, n with X^i_τ = 1, X^l_τ = 0, and edges labeled α_il, α_jl, α_kl, α_mk, α_nk]
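The seeding step and the two predecessor rules can be sketched as one simulation step. This is our own illustrative encoding, assuming dictionaries `preds`, `alpha`, and `leaf_prob` for the graph structure; none of these names come from the paper.

```python
import random

AND, OR = "and", "or"

def spread_step(x, node_type, preds, alpha, leaf_prob, rng=random.random):
    """Advance the attack state x (dict: node -> 0/1) by one time step.

    Leaf attributes are seeded independently with probability alpha_i;
    internal attributes follow the AND/OR predecessor rules.
    """
    x_next = dict(x)
    for i, kind in node_type.items():
        if x[i] == 1:
            continue  # enabled attributes stay enabled (only defense actions disable them)
        if not preds.get(i):
            # leaf attribute: seeded with probability alpha_i
            if rng() < leaf_prob[i]:
                x_next[i] = 1
        elif kind == AND:
            # enabled with prob. prod_{p in D_i} alpha_pi, if ALL predecessors are enabled
            if all(x[p] == 1 for p in preds[i]):
                p_enable = 1.0
                for p in preds[i]:
                    p_enable *= alpha[(p, i)]
                if rng() < p_enable:
                    x_next[i] = 1
        else:
            # OR: enabled with prob. 1 - prod_{p in D_i} (1 - alpha_pi),
            # if AT LEAST ONE predecessor is enabled
            if any(x[p] == 1 for p in preds[i]):
                stay_disabled = 1.0
                for p in preds[i]:
                    stay_disabled *= (1.0 - alpha[(p, i)])
                if rng() < 1.0 - stay_disabled:
                    x_next[i] = 1
    return x_next
```

With all α set to 1 the step becomes deterministic, which makes the AND/OR distinction easy to check by hand.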
Defender's Observations

❖ Defender only partially observes this process
❖ Rationale: the defender may not know the full capability of the attacker at any given time
❖ Defender thus observes a subset of enabled nodes that have been discovered at each time-step: Y_t ∈ {0, 1}^N
   β_i = P(Y^i_t = 1 | X^i_t = 1): probability of detection
   P(Y^i_t = 1 | X^i_t = 0) = 0
❖ Assumption: No false positives can occur.

[Figure: legend — disabled attribute; enabled & undetected; enabled & detected]
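The observation model above can be sketched in a few lines; the function name and dictionary encoding are ours, and `beta` stands in for the per-node detection probabilities β_i.

```python
import random

def observe(x, beta, rng=random.random):
    """Sample the observation Y_t given the true state X_t.

    An enabled attribute i is detected with probability beta[i];
    a disabled attribute never triggers an alert (no false positives).
    """
    return {i: (1 if x[i] == 1 and rng() < beta[i] else 0) for i in x}
```

Because false positives are ruled out, any 1 in the observation is a certain compromise, while a 0 is ambiguous: the attribute may be disabled or merely undetected.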
Defender's Countermeasures

❖ The defender uses network services as countermeasures
❖ The existence of exploits depends on services, for example:
   ❖ Secure Shell (SSH)
   ❖ File Transfer Protocol (FTP)
   ❖ Port scanning
   ❖ etc.
❖ The defender can thus temporarily block or disable these services to stop the attacker from progressing through the network
❖ We don't consider the typical hardening strategy of permanently removing all vulnerabilities
Defender's Countermeasures

❖ Suppose there is a set of M services {u_1, ..., u_M}
❖ Taking action u_m corresponds to disabling service m
❖ u_m disables a subset W_{u_m} of the nodes: X^i = 0, i ∈ W_{u_m}
❖ Action at time t: u_t ∈ U = ℘({u_1, ..., u_M})
❖ Examples (on the 20-node graph): W_{u_1} = {1}, W_{u_2} = {5, 17}, W_{u_3} = {13}

[Figure: the 20-node attack graph with services u_1, ..., u_6 covering the leaf nodes]

❖ Assumption: All leaf nodes are covered by at least one service.
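Applying a defense action is just forcing the covered attributes to 0. A minimal sketch, assuming the coverage sets W_{u_m} are given as a dictionary `W` (our own encoding):

```python
def apply_action(x, action, W):
    """Apply action (a set of services to disable) to state x.

    Disabling service u forces X^i = 0 for every node i in W[u].
    """
    x_next = dict(x)
    for service in action:
        for i in W[service]:
            x_next[i] = 0
    return x_next
```

Since an action is any element of the power set ℘({u_1, ..., u_M}), passing a set of several services disables the union of their coverage sets.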
Cost Function

❖ Cost of taking action u ∈ U in state x ∈ X: C(x, u)
❖ Confidentiality & Integrity factor: a less compromised state x is less costly than a more compromised state x′: C(x, ·) < C(x′, ·)
❖ Availability factor: an action u′ that has a higher negative impact on availability than another action u should satisfy: C(·, u) < C(·, u′)
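One hypothetical cost function consistent with both factors sums a compromise (C&I) penalty over enabled attributes and an availability penalty over disabled services. The additive form and all weights are our illustrative assumption, not the paper's definition.

```python
def cost(x, action, node_cost, service_cost):
    """Illustrative C(x, u): compromise cost plus availability cost.

    node_cost[i]    - penalty while attribute i is enabled (C&I factor)
    service_cost[u] - penalty while service u is disabled (availability factor)
    """
    ci_cost = sum(node_cost[i] for i in x if x[i] == 1)
    availability_cost = sum(service_cost[u] for u in action)
    return ci_cost + availability_cost
```

Enabling more attributes or disabling more services can only increase this cost, so both monotonicity requirements above hold by construction.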
Monotone States

❖ Assumption: The only feasible states are monotone.

[Figure: monotone states on a 6-node graph under a general topology]

❖ Simplifying assumption: The attack graph only contains AND nodes.

[Figure: monotone states on the same 6-node graph under an AND topology]
Defender's Information States

❖ Define the history up to time t as H_t = (π_0, U_1, Y_1, U_2, Y_2, ..., U_{t−1}, Y_t)
❖ We capture H_t by an information state:
   Π^i_t = P(X_t = x^i | H_t),   Π_t = (Π^1_t, ..., Π^K_t) ∈ Δ(X)
   where x^1, x^2, ..., x^K enumerate X, the space of monotone states
❖ The information state obeys the update rule T : Δ(X) × Y × U → Δ(X),
   π_{t+1} = T(π_t, y_{t+1}, u_t)
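For an enumerated state space, the update T is a standard Bayes step: predict the belief through the controlled dynamics, then condition on the observation. The function below is a generic sketch; `trans_prob(x, x_next, u)` and `obs_prob(y, x)` are placeholders standing in for the model defined on the earlier slides.

```python
def belief_update(pi, y, u, states, trans_prob, obs_prob):
    """One Bayes step of the information-state update T(pi, y, u)."""
    K = len(states)
    # prediction: push the belief through P(x' | x, u)
    pred = [sum(pi[i] * trans_prob(states[i], states[j], u) for i in range(K))
            for j in range(K)]
    # correction: reweight by the observation likelihood P(y | x')
    post = [obs_prob(y, states[j]) * pred[j] for j in range(K)]
    z = sum(post)
    return [p / z for p in post] if z > 0 else pred
```

Restricting `states` to the monotone states (rather than all 2^|N| bit vectors) is exactly what keeps K manageable here.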
Defender's Optimization Problem

❖ Choose a control policy g : Δ(X) → U, g ∈ G, that solves

   min_{g ∈ G}  E^g { Σ_{t=0}^∞ ρ^t C(Π_t, U_t) | Π_0 = π_0 }

   subject to  U_t = g(Π_t)
               Π_{t+1} = T(Π_t, Y_{t+1}, U_t)

❖ The above is a Partially Observable Markov Decision Process (POMDP)
Dynamic Programming Solution

❖ The optimal policy g* achieves the minimum expected discounted total cost.
❖ The Bellman equation for the corresponding value function V* is:

   V*(π) = min_{u ∈ U} { C(π, u) + ρ Σ_{y ∈ Y} P^{π,u}_y V*(T(π, y, u)) }   for all π ∈ Δ(X)

❖ The dimensionality of the continuation-cost term is greatly reduced due to the monotonicity assumption.
Dynamic Programming Solution

❖ The Bellman equation is both reactive and predictive:
   Reactive: the belief π captures the current system state based on previous states, actions, and observations.
   Predictive: the continuation cost incorporates the dynamics, taking into account the effect of possible future attacks and defense policies on the specification of the current defense decision.
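A single Bellman backup for a fixed belief π can be sketched directly from the equation. `C`, `T`, `P_y`, and `V` are placeholders for the cost, belief update, observation probability P^{π,u}_y, and (current estimate of the) value function; the function name and signature are ours.

```python
def bellman_backup(pi, actions, observations, C, T, P_y, V, rho=0.95):
    """Return (minimizing action, backed-up value) for belief pi.

    Evaluates, for each action u, the immediate cost C(pi, u) plus the
    discounted expected continuation value over observations y.
    """
    best_u, best_q = None, float("inf")
    for u in actions:
        q = C(pi, u) + rho * sum(
            P_y(y, pi, u) * V(T(pi, y, u)) for y in observations)
        if q < best_q:
            best_u, best_q = u, q
    return best_u, best_q
```

Exact solvers such as pomdp-solve iterate backups like this over the whole belief simplex, exploiting the piecewise-linear structure of V*; the sketch above only evaluates one belief point.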
Example

Attributes:
1. Vulnerability in WebDAV on machine 1
2. User access on machine 1
3. Heap corruption via SSH on machine 1
4. Root access on machine 1
5. Buffer overflow on machine 2
6. Root access on machine 2
7. Portscan on machine 2
8. Network topology leakage from machine 2
9. Buffer overflow on machine 3
10. Root access on machine 3
11. Buffer overflow on machine 4
12. Root access on machine 4

[Figure: 12-node attack graph with edges α_{1,2}, α_{2,3}, α_{3,4}, α_{4,9}, α_{5,6}, α_{6,7}, α_{7,8}, α_{8,9}, α_{8,11}, α_{9,10}, α_{10,11}, α_{11,12}]
Example - Countermeasures

Countermeasures:
   u_1: block WebDAV service
   u_2: disconnect machine 2

[Figure: the same 12-node attack graph, with u_1 covering the WebDAV attribute and u_2 covering machine 2's attributes]
Example

❖ The resulting POMDP contains:
   ❖ 29 states/observations
   ❖ 4 actions
❖ Solved using pomdp-solve (Cassandra 2003-15, online)
❖ We show three different information states, π^(1)_τ, π^(2)_τ, π^(3)_τ, with different corresponding optimal actions
❖ The optimal policy is intuitive: disable the service that corresponds to the attribute that has a high probability of being enabled.

[Figure: three belief histograms over the 29 states, panels (a)-(c), with optimal actions u* = u_1, u* = u_2, and u* = ? respectively]
Conclusion and Future Work

❖ Summary
   ❖ Introduced a formal model (built upon Bayesian attack graphs) for defending a network
   ❖ Formulated the problem as a POMDP
   ❖ Solved the POMDP for a small example
❖ Future Work
   ❖ Scaling the problem
      ❖ Exact POMDP solvers are only capable of handling small examples
      ❖ Realistic attack graphs are big!
      ❖ Complexity analysis?
   ❖ Structural results
      ❖ Directed acyclic graphs give rise to a natural partial order
      ❖ Can we use this to show threshold properties of the optimal policy?
Thank You!
Questions?
Funding Acknowledgments

❖ Special thanks to the following funding sources:
   ❖ NSF — Foundations Of Resilient CybEr-physical Systems (FORCES), Grant CNS-1238962
   ❖ ARO MURI — Adversarial and Uncertain Reasoning for Adaptive Cyber Defense: Building the Scientific Foundations, Grant W911NF-13-1-0421
Information State Update

❖ Decompose the information state update into four steps:
   π_{t+} = T1(π_t, u_t)        (effect of action)
   π_{t++} = T2(π_{t+})         (effect of attack)
   π_{t+++} = T3(π_{t++})       (effect of spread)
   π_{t+1} = T4(π_{t+++}, y_{t+1})   (effect of observation)
❖ The update can then be written as the composition
   π_{t+1} = T(π_t, y_{t+1}, u_t) = T4(T3(T2(T1(π_t, u_t))), y_{t+1}).

[Figure: timeline from t to t+1 showing π_t, u_t, the intermediate beliefs π_{t+}, π_{t++}, π_{t+++}, the observation y_{t+1}, and π_{t+1}]
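The four-step decomposition above is literally a function composition, which can be written as a one-line sketch (T1 through T4 are placeholders for the maps defined on this slide):

```python
def info_state_update(pi, u, y, T1, T2, T3, T4):
    """pi_{t+1} = T4(T3(T2(T1(pi_t, u_t))), y_{t+1})."""
    return T4(T3(T2(T1(pi, u))), y)
```

Keeping the four maps separate makes each modeling choice (action, attack, spread, observation) independently replaceable.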