
Page 1: Worms, Viruses, and Cascading Failures in networks

Worms, Viruses, and Cascading Failures in networks

D. Towsley, U. Massachusetts

Collaborators: W. Gong, C. Zou (UMass); A. Ganesh, L. Massoulie (Microsoft)

Page 2: Worms, Viruses, and Cascading Failures in networks

Internet as enabler of terrific apps

Page 3: Worms, Viruses, and Cascading Failures in networks

Internet as enabler of terrific apps

… but also of malicious behavior: worms, viruses

Internet as a complex system: critical infrastructures (DNS, BGP)

Page 4: Worms, Viruses, and Cascading Failures in networks

Worms and failures

Code Red worm
  more than 360,000 hosts infected in less than one day
  disrupted parts of BGP infrastructure

SQL Slammer
  less than 15 minutes to infect 75,000 hosts
  congested parts of Internet

BGP errors in one network → cascade of faults in BGP in another network

Page 5: Worms, Viruses, and Cascading Failures in networks

Goals

what are appropriate models? deterministic, stochastic

what makes worm/virus/failure virulent?

how does topology affect virulence?

Page 6: Worms, Viruses, and Cascading Failures in networks

Outline

worms, deterministic models

cascading failures, stochastic models

summary

Page 7: Worms, Viruses, and Cascading Failures in networks

Worm spreading behavior

scan for vulnerable hosts
  sequential, random, topological
  uniform, local preference

virulence sensitive to
  scanning strategy
  host speed, bandwidth
  protocol
  …

Page 8: Worms, Viruses, and Cascading Failures in networks

Worm spreading model

address space, size Ω

N vulnerable hosts

scan rate (per host), η

[Figure: address space of size Ω containing the N vulnerable hosts]

Page 9: Worms, Viruses, and Cascading Failures in networks

Simple worm spreading model

I(t) - number of infected hosts at time t

Epidemic model:

$$\frac{dI(t)}{dt} = \frac{\eta}{\Omega}\, I(t)\,\bigl(N - I(t)\bigr)$$

with initial condition I(0)

[Figure: number of infected hosts vs. time]

Page 10: Worms, Viruses, and Cascading Failures in networks

Code Red: model

measurements from two Class A networks: scan rate ∝ I(t)

epidemic model matches increasing part of observed Code Red data (Staniford)

What about decrease?
  human countermeasures
  congestion

(Zou et al., 2002)

[Figure: # of scans vs. time (04:00 to 00:00), observed scan rates (data from D. Goldsmith and K. Eichman) compared with the model]

Page 11: Worms, Viruses, and Cascading Failures in networks

Assumptions

classic epidemic model
  ignore countermeasures
  ignore congestion

Code Red parameters
  η = 358/min
  N = 360,000
  uniform scan, Ω = 2^32
  I(0) = 10

hundreds of minutes to spread

[Figure: no. infected vs. time (minutes), 0 to 600]
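The "hundreds of minutes" figure can be checked from the closed-form solution of the epidemic model dI/dt = (η/Ω) I (N − I). The following is a minimal sketch, assuming the Code Red parameters listed on this slide; the function names are illustrative and not taken from the talk.

```python
import math

def logistic_infected(t_min, eta=358.0, omega=2.0 ** 32, n_vuln=360_000, i0=10):
    """Closed-form solution I(t) of dI/dt = (eta/omega) * I * (N - I), t in minutes."""
    growth = eta / omega * n_vuln  # early exponential growth rate, per minute
    return n_vuln / (1.0 + (n_vuln - i0) / i0 * math.exp(-growth * t_min))

def minutes_to_fraction(frac, eta=358.0, omega=2.0 ** 32, n_vuln=360_000, i0=10):
    """Time (minutes) at which I(t) reaches frac * N, obtained by inverting I(t)."""
    growth = eta / omega * n_vuln
    target = frac * n_vuln
    return math.log(target * (n_vuln - i0) / (i0 * (n_vuln - target))) / growth

if __name__ == "__main__":
    for frac in (0.1, 0.5, 0.9):
        print(f"{frac:.0%} infected after {minutes_to_fraction(frac):.0f} min")
```

Under these assumptions half of the vulnerable population is infected after roughly 350 minutes, in line with the time scale of the plot.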

Page 12: Worms, Viruses, and Cascading Failures in networks

Worm virulence

increase η

increase I(0)

decrease Ω

Page 13: Worms, Viruses, and Cascading Failures in networks

Worm virulence

increase η

increase I(0)

decrease Ω

smarter scanning

Page 14: Worms, Viruses, and Cascading Failures in networks

The perfect worm

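perfect worm: scans vulnerable nodes exactly once

$$\frac{dI(t)}{dt} = \eta\, I(t) \ \text{ while } I(t) < N, \qquad I(t) = \min\bigl(I(0)\,e^{\eta t},\ N\bigr)$$

flash worm (Staniford, …): uniform scan of vulnerable nodes (Ω = N)

$$\frac{dI(t)}{dt} = \frac{\eta}{N}\, I(t)\,\bigl(N - I(t)\bigr)$$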

Page 15: Worms, Viruses, and Cascading Failures in networks

Perfect Code Red worm

I(0) = 10, η = 358/min, N = 360,000

all hosts infected within 2 sec.

add 2 sec. infection delay -> six-fold slowdown

random scan almost perfect!

[Figure: no. infected vs. time (seconds), 0 to 14; curves: perfect worm, perfect worm with delay, flash worm, flash worm with delay]
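The "within 2 sec." claim can be reproduced with a few lines of arithmetic. This is a sketch under two assumptions that are not spelled out in the transcript: the perfect worm grows as I(t) = I(0)·exp(ηt) until it reaches N, and the flash worm follows the logistic model with Ω = N. Infection delay is not modelled here.

```python
import math

ETA_PER_SEC = 358.0 / 60.0  # Code Red scan rate converted to scans per second
N_VULN = 360_000
I0 = 10

def perfect_worm_seconds(n=N_VULN, i0=I0, eta=ETA_PER_SEC):
    """Seconds until I(0) * exp(eta * t) reaches n (every scan hits a new victim)."""
    return math.log(n / i0) / eta

def flash_worm_seconds(frac=0.99, n=N_VULN, i0=I0, eta=ETA_PER_SEC):
    """Seconds until the logistic model dI/dt = (eta/n) I (n - I) reaches frac * n."""
    target = frac * n
    return math.log(target * (n - i0) / (i0 * (n - target))) / eta

if __name__ == "__main__":
    print(f"perfect worm, all hosts: {perfect_worm_seconds():.2f} s")
    print(f"flash worm, 99% of hosts: {flash_worm_seconds():.2f} s")
```

The flash worm only loses time in the final phase, when most random scans land on already infected hosts, which is one way to read the "random scan almost perfect" remark.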

Page 16: Worms, Viruses, and Cascading Failures in networks

I(0) = 10, η = 358/min, N = 360,000

all hosts infected within 2 sec.

add 2 sec. infection delay -> six-fold slowdown

random scan almost perfect!

[Figure: Perfect Code Red worm, no. infected vs. time (seconds), 0 to 14; curves: perfect worm, perfect worm with delay, flash worm, flash worm with delay]

[Figure: no. infected vs. time (minutes), 0 to 600]

Page 17: Worms, Viruses, and Cascading Failures in networks

Hitlist, routing worms

hitlist worm increases I(0)

routing worm decreases Ω

BGP table information: Ω = 0.29 × 2^32 (29% of IP address space)

Page 18: Worms, Viruses, and Cascading Failures in networks

Hitlist, routing worms

Code Red style worm
  η = 358/min
  N = 360,000
  hitlist, I(0) = 10,000

routing worm as effective as hitlist worm

hitlist/routing worm extremely virulent

[Figure: no. infected vs. time (minutes), 0 to 600; curves: Code Red worm, Hit-list worm, Routing worm, Hitlist routing worm]
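The relative speed of the four variants in the plot can be estimated from the same logistic model by changing only I(0) and Ω. The sketch below assumes that the plain and routing worms start from I(0) = 10 (the slide only states I(0) = 10,000 for the hitlist) and that the routing worm scans 29% of the 2^32 address space.

```python
import math

def minutes_to_fraction(frac, i0, omega, eta=358.0, n=360_000):
    """Minutes for dI/dt = (eta/omega) I (n - I) to reach frac * n from I(0) = i0."""
    target = frac * n
    return math.log(target * (n - i0) / (i0 * (n - target))) / (eta / omega * n)

FULL_SPACE = 2.0 ** 32
ROUTED_SPACE = 0.29 * FULL_SPACE  # BGP-routable share quoted in the talk

# (initial infected, scanned address space) per worm variant; I(0) = 10 is assumed
SCENARIOS = {
    "Code Red worm": (10, FULL_SPACE),
    "Hit-list worm": (10_000, FULL_SPACE),
    "Routing worm": (10, ROUTED_SPACE),
    "Hitlist routing worm": (10_000, ROUTED_SPACE),
}

def times_to_90_percent():
    """Map each variant to the minutes it needs to infect 90% of vulnerable hosts."""
    return {name: minutes_to_fraction(0.9, i0, omega)
            for name, (i0, omega) in SCENARIOS.items()}

if __name__ == "__main__":
    for name, minutes in times_to_90_percent().items():
        print(f"{name:22s} {minutes:6.1f} min")
```

Both changes cut the spreading time by a comparable factor: the hitlist skips the slow start, while the smaller Ω raises the growth rate throughout.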

Page 19: Worms, Viruses, and Cascading Failures in networks

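Local preference worm

K subnetworks

p: probability of scanning the local subnet

(1-p): probability of scanning outside the local subnet

[Figure: address space of size 2^32 split into subnets 1, 2, …, K; a host scans its own subnet with probability p and the rest with probability 1-p]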

Page 20: Worms, Viruses, and Cascading Failures in networks

Local preference worm

N_k: no. of vulnerable hosts in subnet k

I_k(t): no. of infected hosts in subnet k

fits epidemic model for interacting groups

set of coupled ODEs

Page 21: Worms, Viruses, and Cascading Failures in networks

Local preference worm

K = 116

N_k = 360,000/K

I_1(0) = 10; I_k(0) = 0, k > 1

η = 358/min

provides some of the locality of a routing worm

[Figure: no. infected vs. time (minutes), 0 to 600; curves: Uniform worm, Routing worm, Preference p=0.1, Preference p=0.5, Preference p=0.99]
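The coupled ODEs can be integrated numerically to see how p changes the spread. The sketch below uses the interacting-groups form dI_k/dt = (β I_k + β' Σ_{j≠k} I_j)(N_k − I_k). The mapping from p to the two rates is an assumption made for illustration: each subnet is taken to be one /8 block of 2^24 addresses, and non-local scans are spread uniformly over the other 255 blocks.

```python
def local_preference_totals(p, k_subnets=116, n_total=360_000, eta=358.0,
                            i0=10.0, t_end=600.0, dt=0.5):
    """Forward-Euler integration; returns total infected hosts at each time step."""
    subnet_size = 2.0 ** 24   # ASSUMPTION: every subnet is a /8 block
    other_blocks = 255        # ASSUMPTION: non-local scans hit the other /8 blocks uniformly
    beta_local = p * eta / subnet_size
    beta_global = (1.0 - p) * eta / (other_blocks * subnet_size)
    n_k = n_total / k_subnets          # vulnerable hosts per subnet
    infected = [0.0] * k_subnets
    infected[0] = i0                   # the worm starts in subnet 1 only
    totals = [i0]
    for _ in range(int(round(t_end / dt))):
        total = sum(infected)
        # Each subnet feels local pressure from itself and global pressure from the rest.
        infected = [
            min(n_k, i_k + dt * (beta_local * i_k + beta_global * (total - i_k)) * (n_k - i_k))
            for i_k in infected
        ]
        totals.append(sum(infected))
    return totals

if __name__ == "__main__":
    for p in (1.0 / 256, 0.1, 0.5, 0.99):
        curve = local_preference_totals(p)
        print(f"p = {p:.3f}: {curve[600]:9.0f} infected after 300 min")
```

With p = 1/256 the two rates coincide and the model reduces to uniform scanning of the full 2^32 space, which gives a convenient baseline for the other values of p.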

Page 22: Worms, Viruses, and Cascading Failures in networks

Questions

topological worms

sequential scan

bandwidth constraints

Page 23: Worms, Viruses, and Cascading Failures in networks

topology?

failure recovery?

Page 24: Worms, Viruses, and Cascading Failures in networks

Topology and fast/slow recovery

model description

general network topologies
  conditions for fast-slow recovery

specific network topologies
  complete graphs (BGP routers)
  hypercubes (peer-to-peer networks)
  power-law graphs (Internet AS graph; E-mail address book graph)

Page 25: Worms, Viruses, and Cascading Failures in networks

Susceptible-Infective-Susceptible (SIS) epidemic model

Also known as contact process; see [Liggett]

topology: undirected, finite, connected graph G = (V,E)

X_v = 1 if node v is down (infected)

X_v = 0 if node v is up (healthy)

Page 26: Worms, Viruses, and Cascading Failures in networks

Model

{X_v}, v ∈ V: Markov process on {0,1}^V with jump rates:

X_v → 1 with rate β Σ_{w~v} X_w

X_v → 0 with rate δ

unique absorbing state at 0

all other states communicate, 0 is reachable
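The jump rates above translate directly into an event-driven (Gillespie-style) simulation. The following is a minimal sketch, assuming the graph is given as adjacency lists and δ > 0; the function name and the `t_max` cut-off are illustrative choices, not part of the talk.

```python
import random

def simulate_sis(adj, beta, delta, infected0, rng, t_max=1e6):
    """Simulate the SIS contact process; return the absorption time (capped near t_max).

    adj[v] lists the neighbours of node v; requires delta > 0.
    """
    infected = set(infected0)
    t = 0.0
    while infected and t < t_max:
        # pressure[v] = number of infected neighbours of a healthy node v
        pressure = {}
        for w in infected:
            for v in adj[w]:
                if v not in infected:
                    pressure[v] = pressure.get(v, 0) + 1
        rate_infect = beta * sum(pressure.values())
        rate_recover = delta * len(infected)
        total = rate_infect + rate_recover
        t += rng.expovariate(total)            # waiting time to the next jump
        if rng.random() * total < rate_recover:
            infected.remove(rng.choice(sorted(infected)))   # X_v -> 0 at rate delta
        else:
            nodes = list(pressure)
            weights = [pressure[v] for v in nodes]
            infected.add(rng.choices(nodes, weights=weights)[0])  # X_v -> 1
    return t

if __name__ == "__main__":
    n = 30
    complete = [[w for w in range(n) if w != v] for v in range(n)]
    for beta in (0.01, 0.2):
        t = simulate_sis(complete, beta, 1.0, range(n), random.Random(0), t_max=50.0)
        print(f"beta = {beta}: process stopped at t = {t:.2f}")
```

On a small complete graph the two regimes are already visible: for small β the process is absorbed almost immediately, while for larger β it is still alive when the cut-off is reached.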

Page 27: Worms, Viruses, and Cascading Failures in networks

Time to absorption

system eventually recovers

how long does this take?

T = time to hit 0 (from a given initial condition)

how does E[T] depend on G?

Page 28: Worms, Viruses, and Cascading Failures in networks

Example

G = line segment or ring with n nodes

Fix δ.

Theorem (Durrett and Liu): There is a critical β_c > 0 such that,

if β < β_c, then E[T] = O(log n)

if β > β_c, then log E[T] ≈ n^a

signature of phase transition in infinite 1-D lattice.

Page 29: Worms, Viruses, and Cascading Failures in networks

Fast recovery, spectral radius

ρ - spectral radius of graph adjacency matrix A; n = |V|.

Then, P(X(t) ≠ 0) ≤ c n^(1/2) exp([βρ − δ]t)

Hence, when βρ < δ, survival time T satisfies:

E[T] ≤ [log(n)+1]/[δ − βρ]
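The fast-recovery condition βρ < δ is easy to evaluate on a concrete graph. The sketch below estimates ρ by power iteration and then evaluates the bound [log(n)+1]/[δ − βρ]. It assumes a connected undirected graph given as adjacency lists; iterating on A + I instead of A is a choice made here to avoid oscillation on bipartite graphs such as the star.

```python
import math

def spectral_radius(adj, iters=500):
    """Estimate the largest adjacency eigenvalue by power iteration on A + I."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[v] + sum(x[w] for w in adj[v]) for v in range(n)]  # (A + I) x
        norm = math.sqrt(sum(c * c for c in y))
        x = [c / norm for c in y]
    ax = [sum(x[w] for w in adj[v]) for v in range(n)]
    return sum(a * b for a, b in zip(ax, x))  # Rayleigh quotient x^T A x, |x| = 1

def recovery_time_bound(adj, beta, delta):
    """Upper bound on E[T] when beta * rho < delta; math.inf when the condition fails."""
    rho = spectral_radius(adj)
    if beta * rho >= delta:
        return math.inf
    return (math.log(len(adj)) + 1.0) / (delta - beta * rho)

def complete_graph(n):
    return [[w for w in range(n) if w != v] for v in range(n)]

def star_graph(n):
    """Node 0 is the hub, nodes 1..n-1 are leaves."""
    return [list(range(1, n))] + [[0] for _ in range(n - 1)]

if __name__ == "__main__":
    print(spectral_radius(complete_graph(8)), spectral_radius(star_graph(10)))
    print(recovery_time_bound(complete_graph(8), beta=0.1, delta=1.0))
```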

Page 30: Worms, Viruses, and Cascading Failures in networks

Coupling proof

Consider “Branching Random Walk”, i.e. Markov process {Y_v}, v ∈ V:

Y_v → Y_v + 1 with rate β Σ_{w~v} Y_w = β (AY)_v

Y_v → Y_v − 1 with rate δ Y_v

Can couple processes so that, for all t,

X(t) ≤ Y(t).

Page 31: Worms, Viruses, and Cascading Failures in networks

Branching random walk bound

By “linearity” of Y, dE[Y(t)]/dt = (βA − δI) E[Y(t)], so E[Y(t)] = exp((βA − δI)t) Y(0).

Use P(X(t) ≠ 0) ≤ Σ_{v∈V} E[Y_v(t)]
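The remaining steps from this bound to the E[T] estimate can be filled in as follows. This is a sketch of one standard route (Markov's inequality, Cauchy-Schwarz, and the symmetry of A), not necessarily the argument used in the talk; it takes Y(0) = X(0), so that the Euclidean norm of Y(0) is at most n^(1/2).

```latex
\begin{align*}
P\bigl(X(t) \neq 0\bigr)
  &\le \sum_{v \in V} E[Y_v(t)]
   = \mathbf{1}^{T} e^{(\beta A - \delta I)t}\, Y(0) \\
  &\le \|\mathbf{1}\|_2 \, \bigl\| e^{(\beta A - \delta I)t} \bigr\|_2 \, \|Y(0)\|_2
   = \sqrt{n}\; e^{(\beta\rho - \delta)t}\, \|Y(0)\|_2
   \le n\, e^{-(\delta - \beta\rho)t}, \\[4pt]
E[T] &= \int_0^\infty P(T > t)\, dt
   \le \int_0^\infty \min\bigl(1,\; n\, e^{-(\delta - \beta\rho)t}\bigr)\, dt
   = \frac{\log n}{\delta - \beta\rho} + \frac{1}{\delta - \beta\rho}
   = \frac{\log n + 1}{\delta - \beta\rho}.
\end{align*}
```

The integral splits at the time where n exp(−(δ − βρ)t) equals 1: before that point the integrand is bounded by 1, and the exponential tail after it contributes exactly 1/(δ − βρ).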

Page 32: Worms, Viruses, and Cascading Failures in networks

Slow recovery

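Graph isoperimetric constant:

$$\eta = \min_{S \subset V:\ |S| \le n/2} \frac{|E(S, \bar S)|}{|S|}$$

|E(S, S̄)|, the number of edges between S and its complement S̄, is the “perimeter”; |S| is the “area”.

[Figure: node set S and its complement S̄]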

Page 33: Worms, Viruses, and Cascading Failures in networks

Generalized isoperimetric constant

$$\eta_m = \inf_{S \subset V:\ |S| \le m} \frac{|E(S, \bar S)|}{|S|}$$
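For small graphs the generalized isoperimetric constant can be computed by brute force, which is a useful check on closed-form values such as n − m for the complete graph. The sketch below enumerates every non-empty node set of size at most m, so it is only practical for a handful of nodes; the function names are illustrative.

```python
from itertools import combinations

def isoperimetric_constant(adj, m):
    """Minimum of |E(S, complement of S)| / |S| over non-empty S with |S| <= m."""
    n = len(adj)
    best = float("inf")
    for size in range(1, m + 1):
        for subset in combinations(range(n), size):
            members = set(subset)
            # Count edges with exactly one endpoint inside S (the "perimeter").
            boundary = sum(1 for v in members for w in adj[v] if w not in members)
            best = min(best, boundary / size)
    return best

def complete_graph(n):
    return [[w for w in range(n) if w != v] for v in range(n)]

def hypercube(d):
    """Nodes are the integers 0..2^d - 1; neighbours differ in exactly one bit."""
    return [[v ^ (1 << b) for b in range(d)] for v in range(2 ** d)]

if __name__ == "__main__":
    print(isoperimetric_constant(complete_graph(6), 3))  # compare with n - m
    print(isoperimetric_constant(hypercube(3), 2))       # compare with d - k, m = 2^k
```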

Page 34: Worms, Viruses, and Cascading Failures in networks

Slow die-out and isoperimetric constant

Suppose for some m ≤ n/2, r := β η_m / δ > 1

Then, with positive probability, epidemics survive for time at least r^m/[2m]

Hence, if m = n^a, survival time T satisfies

log(E[T]) = Ω(n^a)

Page 35: Worms, Viruses, and Cascading Failures in networks

Coupling proof

Let |X| = Σ_v X_v.

Then |X| dominates process Z on {0,…,m} with transition rates:

z → z+1 at rate β η_m z, z → z−1 at rate δ z.

Then study absorption time for Z

Page 36: Worms, Viruses, and Cascading Failures in networks

Complete graph

Here, ρ = n−1, η_m = n−m

By picking m = n^a, a < 1,

Thresholds: fast recovery if β/δ < 1/(n−1)

slow recovery if β/δ > 1/(n−n^a)

Page 37: Worms, Viruses, and Cascading Failures in networks

Hypercube {0,1}^d

Here, d = log2(n) and ρ = d

For m = 2^k, k < d, η_m = d−k

Hence, for k = αd, Thresholds:

fast recovery if β/δ < 1/d

slow recovery if β/δ > 1/[d(1−α)]

Page 38: Worms, Viruses, and Cascading Failures in networks

Erdős-Rényi random graph

edge between each pair of nodes present with probability p_n, independent of others

dense: d_n := n p_n = Ω(log n)

then ρ ~ η ~ d_n with high probability

Page 39: Worms, Viruses, and Cascading Failures in networks

Star network

spectral radius: ρ = n^(1/2)

isoperimetric constant: η_m = 1 for all m < n/2

general results not useful

Specialized analysis yields: for arbitrary constant c > 0, if β/δ < c/n^(1/2),

fast recovery, E[T] = O(log(n))

if β/δ > n^(a−1/2), for a > 0, slow recovery, log(E[T]) = Ω(n^a)

Page 40: Worms, Viruses, and Cascading Failures in networks

Power-law random graph

Power-law graph with exponent γ: number of degree-k vertices ∝ k^(−γ)

E.g. Internet AS graph with γ = 2.1

Expected degree PLRG [Chung et al]: expected degrees w_1 > ··· > w_n:

edge (i,j) present w.p. w_i w_j / Σ_k w_k

particular choice: w_i = c_1 (i + c_2)^(−1/(γ−1))
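The expected-degree construction can be sampled directly from its definition. The sketch below is illustrative only: it omits self-loops, caps each edge probability at 1, and uses arbitrary example values for c_1 and c_2; none of these choices comes from the talk.

```python
import random

def expected_degrees(n, gamma, c1, c2):
    """Weights w_i = c1 * (i + c2)^(-1/(gamma - 1)) for i = 1..n (decreasing in i)."""
    return [c1 * (i + c2) ** (-1.0 / (gamma - 1.0)) for i in range(1, n + 1)]

def sample_expected_degree_graph(weights, rng):
    """Include edge (i, j), i < j, independently with probability w_i * w_j / sum(w)."""
    total = sum(weights)
    n = len(weights)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = min(1.0, weights[i] * weights[j] / total)  # cap is an assumption
            if rng.random() < prob:
                adj[i].append(j)
                adj[j].append(i)
    return adj

if __name__ == "__main__":
    w = expected_degrees(n=200, gamma=2.1, c1=20.0, c2=5.0)  # example constants
    graph = sample_expected_degree_graph(w, random.Random(0))
    degrees = sorted((len(nb) for nb in graph), reverse=True)
    print("largest degrees:", degrees[:5])
```

The adjacency lists produced here have the same shape as those used by a simulator or spectral-radius routine that takes `adj[v]` as the neighbour list of node v.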

Page 41: Worms, Viruses, and Cascading Failures in networks

Power-law random graph (2)

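Spectral radius of PLRG [Chung et al., 03]:

Denote by m the max. expected degree (m = w_1), and by d the average of expected degrees.

Then:

$$\rho(A) = \begin{cases} (1+o(1))\, c(\gamma, d)\, m^{3-\gamma}, & 2 < \gamma < 2.5, \\ (1+o(1))\, \sqrt{m}, & \gamma > 2.5, \end{cases}$$

where c(γ, d) depends only on γ and d.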

Page 42: Worms, Viruses, and Cascading Failures in networks

PLRG, γ > 2.5

Epidemics on full graph live longer than on sub-graph.

Look at star induced by node 1: slow die-out for β/δ > m^(α−1/2)

Compare to spectral radius condition:

Fast die-out for β/δ < m^(−1/2)

Two thresholds differ by m^α; same gap as for star

Page 43: Worms, Viruses, and Cascading Failures in networks

PLRG, 2 < γ < 2.5

Consider top N nodes, for suitable N;

Erdős-Rényi core, with isoperimetric constant:

η = F(γ) ρ

Gap between thresholds (η and ρ): constant factor, F(γ)

Page 44: Worms, Viruses, and Cascading Failures in networks

Open problems

gap between upper and lower bounds in
  sparse ER graphs
  power-law random graphs for γ < 2.5

spectral radius bound tight in examples, always true?

conditioned on slow recovery, how many nodes are down at intermediate times?

extensions to other graphs and to SIR epidemics

Page 45: Worms, Viruses, and Cascading Failures in networks

Observations

neither parameter tight

gap for topologies with diverse degrees

spectral radius “seems” to be right

nothing between log n and exp(n^α)?

Page 46: Worms, Viruses, and Cascading Failures in networks

Hitlist, routing worms

hitlist worm: increase I(0)

routing worm: decrease Ω

BGP table information: Ω = 0.29 × 2^32 (29% of IP address space)

/8 aggregation: Ω = 0.45 × 2^32 (116 out of 256 possible 8-bit prefixes)

[Figure: IP address 0110…0xxx with its leading 8 bits marked as the prefix]

Page 47: Worms, Viruses, and Cascading Failures in networks

The appearance of phase transitions

N = 200, ks = 1, kl = 0.01

Mean time to absorption goes down from 10^47 to about 0 within a few states

Page 48: Worms, Viruses, and Cascading Failures in networks

Accuracy of fluid model

population: 360,000

scan rate η ~ N(358/min, 100^2), normal distr.

scanning space: Ω = 2^32

I(0) = 1

100 simulations

[Figure: no. infected vs. time (minutes); curves: mean, 95% and 5% quantiles over the simulations]

Page 49: Worms, Viruses, and Cascading Failures in networks

Accuracy of fluid model

population: 360,000

scan rate η ~ N(358/min, 100^2), normal distr.

scanning space: Ω = 2^32

I(0) = 10

100 simulations

[Figure: no. infected vs. time (minutes), 1 to 700; curves: Code Red worm model, simulation mean]

Page 50: Worms, Viruses, and Cascading Failures in networks

Accuracy of fluid model

population: 360,000

scan rate η ~ N(358/min, 100^2), normal distr.

scanning space: Ω = 2^32

I(0) = 10

100 simulations

[Figure: no. infected vs. time (minutes), 1 to 700; curves: mean, 95% and 5% quantiles over the simulations]

Page 51: Worms, Viruses, and Cascading Failures in networks

Local preference worm

β - local scan rate

β′ - global scan rate

initial conditions I_k(0)

$$\frac{dI_k(t)}{dt} = \Bigl[\beta\, I_k(t) + \beta' \sum_{j \neq k} I_j(t)\Bigr]\bigl(N_k - I_k(t)\bigr)$$

Page 52: Worms, Viruses, and Cascading Failures in networks

Erdős-Rényi random graph

edge between each pair of nodes present with probability p_n, independent of others

sparse: p_n = c log(n)/n, c > 1.

then ρ ≤ c′ log(n), η ≥ c″ log(n) with high probability, for some c″ < c < c′

dense: d_n := n p_n = Ω(log n)

then ρ ~ η ~ d_n with high probability