percolation thresholds of updated posteriors for tracking...

8
Percolation Thresholds of Updated Posteriors for Tracking Causal Markov Processes in Complex Networks Patrick L. Harrington Jr. * Bioinformatics Graduate Program Department of Statistics University of Michigan Ann Arbor, MI 48109 Alfred O. Hero III. Department of EECS Department of Statistics University of Michigan Ann Arbor, MI, 48109 Abstract Percolation on complex networks has been used to study computer viruses, epidemics, and other casual processes. Here, we present conditions for the existence of a network spe- cific, observation dependent, phase transi- tion in the updated posterior of node states resulting from actively monitoring the net- work. Since traditional percolation thresh- olds are derived using observation indepen- dent Markov chains, the threshold of the pos- terior should more accurately model the true phase transition of a network, as the up- dated posterior more accurately tracks the process. These conditions should provide in- sight into modeling the dynamic response of the updated posterior to active intervention and control policies while monitoring large complex networks. 1 INTRODUCTION Increasingly often, researchers are confronted with monitoring the states of nodes in large computer, so- cial, or power networks where these states dynamically change due to viruses, rumors, or failures that prop- agate according to the graph topology [2, 5, 8]. This class of network dynamics has been extensively mod- eled as a percolation phenomenon, where nodes on a graph can randomly “infect” their neighbors. Percolation across networks has a rich history in the field of statistical physics, computer science, and mathematical epidemiology. Here, researchers are typ- ically confronted with a network, or a distribution over the network topology, and extract fixed point attrac- tors of node configurations, thresholds for phase tran- sitions in node states, or distributions of node state * [email protected] [email protected] configurations [4, 1, 9]. In the field of fault detection, the nodes or edges can “fail”, and the goal is to ac- tivate a subset of sensors in the network which yield high quality measurements that identify these failures [12]. While the former field of research concerns it- self with extracting offline statistics about properties of the percolation phenomenon on networks, devoid of any measurements, the latter field addresses online measurement selection tasks. Here, we propose a methodology that actively tracks a causal Markov process across a complex network (such as the one in Figure 3), where measurements are adap- tively selected. We extract conditions such that the updated posterior probability of all nodes “infected” is driven to one in the limit of large observation time. In other words, we derive conditions for the existence of an epidemic threshold on the updated posterior dis- tribution over the states. The proposed percolation threshold should more ac- curately reflect the true conditions that cause a phase transition in a network, e.g., node status changing from healthy/normal to infected/failed, than tradi- tional thresholds derived from conditions on predic- tive distributions which are devoid of observations or controls. Since most practical networks of interest are large, such as electrical grids, it is usually infeasible to sam- ple all nodes continuously, as such measurements are either expensive or bandwidth is limited. Given these, or other resource constraints, we present an informa- tion theoretic sampling strategy that selectively tar- gets specific nodes that will yield the largest informa- tion gain, and thus, better detection performance. The proposed sampling strategy balances the trade- off between trusting the predictions from the known model dynamics (from percolation theory) and ex- pending precious resources to select a set of nodes for measurement. We present the adaptive measurement selection prob- arXiv:0905.2236v1 [stat.ML] 14 May 2009

Upload: others

Post on 16-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Percolation Thresholds of Updated Posteriors for Tracking ...web.eecs.umich.edu/~hero/Preprints/PercolationThresholds.pdf · graphical model representation). Zj k! 1 Zi k! 1 Yj k!

Percolation Thresholds of Updated Posteriorsfor Tracking Causal Markov Processes in Complex Networks

Patrick L. Harrington Jr. ∗

Bioinformatics Graduate ProgramDepartment of StatisticsUniversity of MichiganAnn Arbor, MI 48109

Alfred O. Hero III. †

Department of EECSDepartment of StatisticsUniversity of MichiganAnn Arbor, MI, 48109

Abstract

Percolation on complex networks has beenused to study computer viruses, epidemics,and other casual processes. Here, we presentconditions for the existence of a network spe-cific, observation dependent, phase transi-tion in the updated posterior of node statesresulting from actively monitoring the net-work. Since traditional percolation thresh-olds are derived using observation indepen-dent Markov chains, the threshold of the pos-terior should more accurately model the truephase transition of a network, as the up-dated posterior more accurately tracks theprocess. These conditions should provide in-sight into modeling the dynamic response ofthe updated posterior to active interventionand control policies while monitoring largecomplex networks.

1 INTRODUCTION

Increasingly often, researchers are confronted withmonitoring the states of nodes in large computer, so-cial, or power networks where these states dynamicallychange due to viruses, rumors, or failures that prop-agate according to the graph topology [2, 5, 8]. Thisclass of network dynamics has been extensively mod-eled as a percolation phenomenon, where nodes on agraph can randomly “infect” their neighbors.

Percolation across networks has a rich history inthe field of statistical physics, computer science, andmathematical epidemiology. Here, researchers are typ-ically confronted with a network, or a distribution overthe network topology, and extract fixed point attrac-tors of node configurations, thresholds for phase tran-sitions in node states, or distributions of node state

[email protected][email protected]

configurations [4, 1, 9]. In the field of fault detection,the nodes or edges can “fail”, and the goal is to ac-tivate a subset of sensors in the network which yieldhigh quality measurements that identify these failures[12]. While the former field of research concerns it-self with extracting offline statistics about propertiesof the percolation phenomenon on networks, devoidof any measurements, the latter field addresses onlinemeasurement selection tasks.

Here, we propose a methodology that actively tracks acausal Markov process across a complex network (suchas the one in Figure 3), where measurements are adap-tively selected. We extract conditions such that theupdated posterior probability of all nodes “infected”is driven to one in the limit of large observation time.In other words, we derive conditions for the existenceof an epidemic threshold on the updated posterior dis-tribution over the states.

The proposed percolation threshold should more ac-curately reflect the true conditions that cause a phasetransition in a network, e.g., node status changingfrom healthy/normal to infected/failed, than tradi-tional thresholds derived from conditions on predic-tive distributions which are devoid of observations orcontrols.

Since most practical networks of interest are large,such as electrical grids, it is usually infeasible to sam-ple all nodes continuously, as such measurements areeither expensive or bandwidth is limited. Given these,or other resource constraints, we present an informa-tion theoretic sampling strategy that selectively tar-gets specific nodes that will yield the largest informa-tion gain, and thus, better detection performance.

The proposed sampling strategy balances the trade-off between trusting the predictions from the knownmodel dynamics (from percolation theory) and ex-pending precious resources to select a set of nodes formeasurement.

We present the adaptive measurement selection prob-

arX

iv:0

905.

2236

v1 [

stat

.ML

] 1

4 M

ay 2

009

Page 2: Percolation Thresholds of Updated Posteriors for Tracking ...web.eecs.umich.edu/~hero/Preprints/PercolationThresholds.pdf · graphical model representation). Zj k! 1 Zi k! 1 Yj k!

lem and give two tractable approximations to thissubset selection problem based upon the joint andmarginal posterior distribution, respectively. A setof decomposable Bayesian filtering equations are pre-sented for this adaptive sampling framework and thenecessary tractable inference algorithms for complexnetworks are discussed. We present analytical worstcase performance bounds for our adaptive samplingperformance, which can serve as sampling heuristicsfor the activation of sensors or trusting predictionsgenerated from previous measurements.

To the author’s knowledge, this is the first attemptto extract a percolation threshold of an actively moni-tored network using the updated posterior distributioninstead of the observation independent predictive dis-tributions.

2 PROBLEM FORMULATION

The objective of actively monitoring the n node net-work is to recursively update the posterior distributionof each hidden node state given various measurements.Specifically, the next set of m measurement actions(nodes to sample), m � n, at next discrete time arechosen such that they yield the highest quality of in-formation about the n hidden states. The conditionon m � n simulates the reality of fixed resource con-straints, where typically only a small subset of nodesin a large network can be observed at any one time.

Here, the hidden states are discrete random variablesthat correspond to the states encoded by the percola-tion process on the graph. Here, the graph G = (V, E),with V representing the set of nodes and E corre-sponding to the set of edges. Formally, we will as-sume a state-space representation of a discrete time,finite state, partially observed Markov decision process(POMDP). Here,

Zk = {Z1k , . . . , Z

nk } (1)

represents the joint hidden states, e.g., healthy or in-fected

Yk = {Y(1)k , . . . ,Y(m)

k } (2)

represents the m observed measurements obtained attime k, e.g., biological assays or PINGing an IP address,and

ak = {a1k, . . . , a

mk } (3)

represents the m actions taken at time k, i.e., whichnodes to sample. Here, Y(j)

k , continuous/categoricalvalued vector of measurements, which is induced byaction ajk, ajk ∈ A, with A = {1, . . . , n} confinedto be the set of all n individuals in the graph, andZik ∈ {0, 1, . . . , r}. Since the topology of G encodes thedirection of ”flow” for the process, the state equations

may be modeled as a decomposable partially observedMarkov process:

Yik = f(Zik) + wi

k (4)

Zik = h(Zik−1, {Zjk−1}j∈η(i)

). (5)

Here, η(i) = {j : E (Vi,Vj) /∈ ∅} is the neighborhoodof i, f(Zik) is a non-random vector-valued function, wi

k

is measurement noise, and h(Zik−1, {Zjk−1}j∈η(i)

)is a

stochastic equation encoding the transition dynamicsof the Markov process (see Figure 1 for a two nodegraphical model representation).

Zjk!1Zi

k!1

Yjk!1Yi

k!1

Yik Yj

k

Zik Zj

k

Figure 1: Partially Observed Markov Structure for iand j for E (Vi,Vj) /∈ ∅

2.1 BAYESIAN FILTERING

In our proposed framework for actively monitoring thehidden node states in the network, the posterior dis-tribution is the sufficient statistic for inferring thesestates. The general recursion for updating the jointposterior probability given all past and present obser-vations is given by the standard Bayes update formula:

p(Zk|Yk0) =

f(Yk|Zk)g(Yk|Yk−1

0 )p(Zk|Yk−1

0 ) (6)

with

p(Zk|Yk−10 ) =

∑z∈{0,1,...,r}n

p(Zk|Zk−1 = z)p(Zk−1 = z|Yk−10 ).

(7)The Chapman-Kolmogorov equations provide the con-nection between the posterior update (7) and thedistribution resulting from the standard percolationequations. In the former, the updates are conditionalprobabilities that are conditional on past observations,while in the latter, the updates are not dependent onobservations.

The local interactions in the graph G imply the follow-ing conditional independence assumptions:

f(Yk|Zk) =n∏i=1

f(Yik|Zik). (8)

Page 3: Percolation Thresholds of Updated Posteriors for Tracking ...web.eecs.umich.edu/~hero/Preprints/PercolationThresholds.pdf · graphical model representation). Zj k! 1 Zi k! 1 Yj k!

p(Zk|Zk−1) =n∏i=1

p(Zik|Zik−1, {Zjk−1}j∈η(i)

)(9)

where the likelihood term is defined in (4) and thetransition dynamics are defined in (5). This decom-posable structure allows the belief state (posterior ex-cluding time k observations) update, for the ith nodein G, to be written as:

p(Zk|Yk−10 ) =

∑z∈{0,1,...,r}‖pa‖

p(Zk|Zpak−1 = z)p(Zpa

k−1 = z|Yk−10 )

(10)with the parent set, pa = {η(i), i}. Unfortunately,for highly connected nodes in G, this marginal updatebecomes intractable. It thus must be approximated[3, 10, 7].

2.2 INFORMATION THEORETICADAPTIVE SAMPLING

In most real world situations, acquiring measurementsfrom all n nodes at any time k is unrealistic, andthus, a sampling policy must be exploited for mea-suring a subset of nodes [6, 12]. Since we are con-cerned with monitoring the states of the nodes in thenetwork, an appropriate reward is the expected in-formation gain between the updated posterior, pk =p(Zk|{Yi

k}i∈ak,Yk−1

0 ), and the belief state, pk|k−1 =p(Zk|Yk−1

0 ):

ak = argmaxa⊂AE[Dα({Yi

k}i∈a)|Yk−1

0

](11)

Dα({Yi

k}i∈a)

= Dα(pk||pk|k−1

), 0 < α < 1 (12)

with α-Divergence

Dα(p||q) =1

α− 1log (Eq [(p/q)α]) (13)

for distributions p and q with identical support.

The reward in (11) has been widely applied tomulti-target, multi-sensor tracking for many prob-lems including, sensor management and surveillance[6, 11]. Note that limα→1Dα(p||q) → DKL(p||q),where DKL(p||q) is the Kullback-Leibler divergencebetween p and q. The expectation in (11) is taken withrespect to the conditional distribution g(Yk|Yk−1

0 )given the previous measurements Yk−1

0 and actionsak. In practice, the expected information divergencein (11) must be evaluated via Monte-Carlo methods.Also, the maximization in (11) requires enumerationover all

(nm

)actions (for subsets of size m), and there-

fore, we must resort to Greedy approximations. Wepropose incrementally constructing the set of actionsat time k, ak, for j = 1, . . . ,m, according to:

ajk = argmaxi∈A\akE[Dα(Yik, {Yj

k}j∈ak

)|Yk−1

0

].

(14)

Both (11) and (14) are selecting the nodes to sam-ple which yield maximal divergence between the per-colation prediction distribution (belief state) and theupdated posterior distribution, averaged over all possi-ble observations. Thus (11) provides a metric to assesswhether to trust the predictor and defer actions untila future time or choose to take action, sample a node,and update the posterior.

2.2.1 Lower Bound on Expectedα-Divergence

Since the expected α-Divergence in (11) is not closedform, we could resort to numerical methods for esti-mating this quantity. Alternatively, one could specifyan analytical lower-bound that could be used in-lieu ofnumerically computing the expected information gainin (11) or (14).

We begin by noting that the expected divergence be-tween the updated posterior and the predictive dis-tribution (conditioned on previous observations) dif-fer only through the measurement update factor,fk/gk|k−1 ((11) re-written):

Egk|k−1

[Dα(pk||pk|k−1

)]]

= Egk|k−1

[1

α− 1log Epk|k−1

[(fk

gk|k−1

)α]]. (15)

So, if there is significant overlap between the likeli-hood distributions of the observations, the expecteddivergence will tend to zero, implying that there isnot much value-added in taking measurements, andthus, it is sufficient to use the percolation predictionsfor inferring the states.

It would be convenient to interchange the order of theconditional expectations in (15). It is easily seen thatJensen’s inequality yields the following lower boundfor the expected information gain

Egk|k−1

[Dα(pk||pk|k−1

)]≥ 1α− 1

log Epk|k−1

[Egk|k−1

[(fk

gk|k−1

)α]]. (16)

Here, the inner conditional expectation can be ob-tained from Dα

(fk||gk|k−1

), which has a closed form

for common distributions (e.g., multivariate Gaus-sians) [6].

3 ASYMPTOTIC ANALYSIS OFMARGINAL POSTERIOR

For tracking the percolation process across G, we havediscussed recursive updating of the belief state. How-ever, computing these updates exactly is in general in-tractable. For the remainder of the paper, we will use

Page 4: Percolation Thresholds of Updated Posteriors for Tracking ...web.eecs.umich.edu/~hero/Preprints/PercolationThresholds.pdf · graphical model representation). Zj k! 1 Zi k! 1 Yj k!

(4) and (5) to directly update the marginal posteriordistribution using the following matrix representation:

pk(z) = Dk(z)pk|k−1(z) (17)

with updated marginal posterior pk(z) =[p1,k(z), . . . , pn,k(z)]T with pi,k(z) = p(Zik =

z|Yik,Y

k−10 ), Dk(z) = diag

(f

(z)i,k /gi,k|k−1

),

and marginal belief state pk|k−1(z) =[p1,k|k−1(z), . . . , pn,k|k−1(z)]T with pi,k|k−1(z) =p(Zik = z|Yk−1

0 ).

Note that for i /∈ ak, (Dk(z))i,i = 1, and pi,k(z) =pi,k|k−1(z). Given that we can find an efficient wayof updating pk|k−1(z), according to the transition dy-namics (5), we can solve a modified version of (14), forj = 1, . . . ,m:

ajk = argmaxi∈A\akE[Dα(Yik

)|Yk−1

0

](18)

Dα(Yik

)= Dα

(pi,k(z)||pi,k|k−1(z)

), 0 < α < 1.

(19)

3.1 TOTAL DIVERGENCE OF UPDATEDPOSTERIOR

One interesting property of the Bayesian filtering equa-tions is that the updated posterior can be written asa perturbation of the predictive percolation distribu-tion through the following relationship (z omitted forclarity):

pk = Dkpk|k−1 = pk|k−1 + (Dk − I) pk|k−1. (20)

Hence, when the sensors do a poor job in discriminat-ing the observations, Dk ≈ I, we have pk ≈ pk|k−1. Itis of interest to determine when there is significant dif-ference between the posterior update and the prior up-date specified by the standard percolation equations.Recall that the updated posterior is, in the mean,equal to the predictive distribution, E

[pk|Yk−1

0

]=

pk|k−1. The total deviation of the updated posteriorfrom the percolation distribution can be summarizedby computing the trace of the following conditionalcovariance:

tr“Rhpk|Y

k−10

i”= (21)

tr

„E»“

pk − Ehpk|Y

k−10

i”“pk − E

hpk|Y

k−10

i”T|Yk−1

0

–«.

Using (20) and properties of the trace operator, weobtain the following measure of total deviation of theupdated posterior from the predictive distribution interms of fk and gk|k−1:

tr(R[pk|Yk−1

0

])= tr

(E[(Dk − I)2 |Yk−1

0

]Pk|k−1

)(22)

with Pk|k−1 = pk|k−1pTk|k−1. The conditional expec-

tation in (22) is the Pearson χ2 divergence betweendistributions fi,k and gi,k|k−1, for all i. This joint mea-sure of deviation is analytical for particular families ofdistributions and thus can be used as an alternativemeasure of divergence for activation of sensors [6].

3.2 PERCOLATION THRESHOLD OFUPDATED POSTERIOR

There has recently been significant interest in deriv-ing the conditions of a percolation/epidemic thresholdin terms of transition parameters and the graph ad-jacency matrix spectra for two state causal Markovprocesses [1, 4, 9]. Such thresholds yield conditionsnecessary for an epidemic to arise from a small numberof “infections”. Knowledge of these conditions are par-ticularly useful for designing “robust” networks, wherethe probability of epidemics is minimized.

Percolation thresholds are typically obtained by ex-tracting the sufficient conditions of the network andmodel parameters for the node states to be drivento their stationary point, with high probability. Theprobability of these events are usually computed usingthe observation independent percolation distribution[1, 4, 9].

We use the results in [1, 4] to derive a percolationthreshold based upon the updated posterior distribu-tion assuming a restricted class of two-state Markovprocesses. These conditions should more accuratelymodel the current network threshold since the poste-rior distribution tracks a particular “disease” trajec-tory better than the observation independent percola-tion distribution.

Formally, Zik ∈ {0, 1}, f(z)i,k = f(Yi

k|Zik = z) isthe conditional likelihood for node i, pi,k = p(Zik =1|Yi

k,Yk−10 ), and pi,k = p(Zik = 1|Yk−1

0 ). Here, wewill assume that Zk = 0 is the unique absorbing stateof the system.

The Bayes update for pi,k can be written as (i sub-script omitted for clarity):

pk =f

(1)k

f(1)k pk|k−1 + f

(0)k (1− pk|k−1)

pk|k−1

=f

(1)k /f

(0)k

1 + f(1)k −f

(0)k

f(0)k

pk|k−1

pk|k−1

=f

(1)k /f

(0)k

1 + ∆fk

f(0)k

pk|k−1

pk|k−1. (23)

There are three different sampling/observation depen-dent possibilities for each individual at time k: case

Page 5: Percolation Thresholds of Updated Posteriors for Tracking ...web.eecs.umich.edu/~hero/Preprints/PercolationThresholds.pdf · graphical model representation). Zj k! 1 Zi k! 1 Yj k!

(1), i is not sampled and therefore, pk = pk|k−1, case(2), ∆fk > 0, and case (3), ∆fk < 0.

We first derive a tight-upper bound for cases (2) and(3) of the form pk ≤ ck pk|k−1. For the remainder ofthe analysis we will assume that |∆fk

f(0)k

pk|k−1| < 1 for

cases (2) and (3) (see Appendix).

Using the upper-bounds derived in the Appendix, andafter gathering all n nodes, we have the followingelement-wise upper-bound on the updated belief state:

pk ≤ Ckpk|k−1 = (Bk +Ok) pk|k−1. (24)

with Bk = diag (bi,k) and Ok =

diag(

I{∆fi,k<0}O(|∆fi,k|f(0)i,k

pi,k|k−1

))where I{∆fi,k<0}

is the indicator function for the event ∆fi,k < 0.

Thus far, we have established, under the assumptionsof |∆fk

f(0)k

pk|k−1| < 1, an upper-bound for the updated

posterior in terms of observation likelihoods and thebelief state (24).

Next, consider the restricted class of two-state Markovprocesses on G, for which we can produce a bound ofthe form

pk|k−1 ≤ Spk−1 (25)

where S contains information about the transition pa-rameters and the topology of the network.

It turns out that the SIS model of mathematical epi-demiology falls within this restricted class of percola-tion problems [1].

The SIS model on a graph G, assumes that each of then individuals are in states 0 or 1, where 0 correspondsto susceptible and 1 corresponds to infected. At anytime k, an individual can receive the infection fromtheir neighbors, η(i), based upon their states at k− 1.

Under this SIS model in [1]

S = (1− γ)I + βA (26)

where the Markov transition parameters γ is the prob-ability of i transitioning from 1 to 0, β is the proba-bility of transmission between neighbors i and j, andA is the graph adjacency matrix (see Figure 2).

Returning to the derivation, using the bound (25), wehave, by induction, the following recursion:

pk ≤ Ckpk|k−1 ≤ CkSpk−1 ≤ (CkS · · ·C1S) p0

= (BkS · · ·B1S) p0 +OCkS (27)

where we have lumped the higher order modes andhigher order cross-terms into OCkS.

The dominant mode of decay of the updated poste-rior may be found by investigating the following eigen-decomposition:

BkS =

n∑j=1

bj,kejeTj

n∑j=1

λjujuTj

(28)

with ej = [0, . . . , 0, 1, 0, . . . , 0]T (1 at jth element).Without loss of generality, we can assume the eigen-values of S are listed in decreasing order, |λ1| ≥ · · · ≥|λn|. Now rewriting (28), we have

BkS =(bjkejke

Tjk

+OB) (λ1u1uT1 +OS

)=

(λ1bjkejke

Tjk

u1uT1 +OBS)

(29)

where bjk = maxj∈{1,...,n}bj,k and the OB ,OS ,OBSvariables corresponds to the higher order terms. In-serting (29) into (27), and matching the largest eigen-values of Bk with λ1 we obtain

pk ≤ (BkS . . .B1S) p0 +OCkS

=λk1

k∏l=1

bjl

(k∏l=1

(ejle

Tjlu1uT1

))p0 +O(ϕk).(30)

Thus, at large k, the dominant mode of the posteriorgoes as λk1

∏kl=1 bjl (the modes in O(ϕk) decay faster

than the dominant mode presented above).

We can see that if the spectral radius of S is less thanone, |λ1| < 1, then for large k, pk → 0, which is theunique absorbing state of the system.

This epidemic threshold condition on λ1 has been pre-viously established for unforced SIS-percolation pro-cesses [1]. However, in the tracking framework, therate at which the posterior decays to the susceptiblestate is perturbed by an additional measurement de-pendent factor,

∏kl=1 bjl .

This measurement-dependent dominant mode of theposterior should more accurately model the true dy-namic response of the node states better than thatin [1] since the posterior better tracks the truth thanthe unforced predictive distribution. Additionally, thisdominant mode of the updated posterior distributionallows one to simulate the response of the percolationthreshold to intervention and control actions which aredesigned to increase the threshold, such that the prob-ability of epidemics is minimized.

4 NUMERICAL EXAMPLE

Here, we present results of simulations of our adaptivesampling for the active tracking of a causal Markovground truth process across a random 200 node, scale-free network (Figure 3). Since the goal in tracking

Page 6: Percolation Thresholds of Updated Posteriors for Tracking ...web.eecs.umich.edu/~hero/Preprints/PercolationThresholds.pdf · graphical model representation). Zj k! 1 Zi k! 1 Yj k!

IS

!

1! !q1! q

I(1) I(|!|)

Figure 2: SIS Markov Chain for Node i Interactingwith the Infected States of its Neighbors

is to accurately classify the states of each node, weare interested in exploring the detection performanceas the likelihood of an epidemic increases through thepercolation threshold for this graph.

One would expect different phase transitions (thresh-olds) in detection performance for various samplingstrategies, ranging from the lowest threshold for un-forced percolation distributions to highest for a con-tinuous monitoring of all n nodes. We will present afew of these detection surfaces that depict these phasetransitions for the unforced percolation distribution,random m = 40 node sampling, and our proposed in-formation theoretic adaptive sampling of m = 40.

Here, we will restrict our simulations to the two-stateSIS model of mathematical epidemiology describedabove.

Figure 3: 200 Node Scale-Free Graph G = (V, E)

The sensor models (4), are of the form of two-dimensional multivariate Guassians with common co-variance and shifted mean vector. The transition dy-namics of the ith individual (5), for the SIS model isgiven by:

Zik|Z{i,η(i)}k−1 ∼ (1−γ)Zik−1+(1−Zik−1)

241−Yj∈η(i)

(1− βZjk−1)

35 .(31)

where Zik−1 ∈ {0, 1} is the indicator function of i be-ing infected at time k − 1. The transmission termbetween i and η(i) is known the Reed-Frost model

[1, 4, 8]. Since the tail of the degree distribution ofour synthetic scale-free graph contains nodes with de-gree greater than 10, updating (10) exactly is unreal-istic and we must resort to approximate algorithms.Here, we will assume the mean field approximationused by [1] for this SIS model, resulting in the fol-lowing marginal belief state update for the ith node ofinfected (Zik = 1):

pi,k|k−1 = (1−γ)pi,k−1+(1−pi,k−1)

"1−

Yj∈η

(1− βpj,k−1)

#.

(32)

Equation (32) allows us to efficiently update themarginal belief state directly for all n nodes which arethen used for estimating the best m measurements us-ing (18).

As we are interested in detection performance as afunction of time and epidemic intensity, the Area Un-der the ROC Curve (AUR) is a natural statistic toquantify the detection power (detection of the infectedstate). The AUR is evaluated at each time k, each SISpercolation intensity parameter

τ = β/γ (33)

and over 500 random initial states of the network. Forthe SIS model, τ is the single parameter (aside fromthe topology of the graph) that characterizes the in-tensity of the percolation/epidemic. It is useful tounderstand how the detection performance varies asa function of epidemic intensity, as it indicates howwell the updated posteriors are playing “catch-up” intracking the true dynamics on the network.

For this SIS model, the percolation threshold is de-fined as τc = 1/λ1(A) where λ1(A) = maxi∈{1,...,n}|λi|is the spectral radius of the graph adjacency matrix,A [1]. Values of τ greater than τc imply that any infec-tion tend to become an epidemic, whereas those valuesless than τc imply that small epidemics tend to die out.

For the network under investigation (Figure 3), τc =0.1819. We see from Figure 4(a) that a phase transi-tion in detection power (AUR) for the unforced per-colation distribution does indeed coincide with theepidemic threshold τc. While the epidemic thresholdfor the random and adaptive sampling policies is stillτc = 0.1819, the measurements acquired allow the pos-terior to better track the truth, but only up to their re-spective phase transitions in detection power (see Fig-ures 4(b) and 4(c)).

Figure 4(c) confirms that the adaptive sampling bet-ter tracks the truth than randomly sampling nodes,while pushing the phase transition in detection per-formance to higher percolation intensities, τ . We seethat the major benefit of the adaptive sampling is ap-parent when conditions of the network are changing

Page 7: Percolation Thresholds of Updated Posteriors for Tracking ...web.eecs.umich.edu/~hero/Preprints/PercolationThresholds.pdf · graphical model representation). Zj k! 1 Zi k! 1 Yj k!

010

2030

4050 0.0200

0.08330.1429

0.20000.2500

0.33330.6000

0.9000

0.50.60.70.80.9

TauTime

AUR

(a) AUR Surface for Unforced Prediction Distribution (no ev-idence acquired throughout the monitoring)

010

2030

4050

00.0200

0.08330.1429

0.20000.2500

0.33330.6000

0.9000

0.7

0.8

0.9

1

TauTime

AUR

(b) AUR Surface for Updated Posterior Distribution with m =40 Random Measurements at Each Time k

010

2030

4050 0.0200

0.08330.1429

0.20000.2500

0.33330.6000

0.9000

0.5

0.6

0.7

0.8

0.9

1

TauTime

AUR

(c) AUR Surface for Updated Posterior Distribution withm = 40 Information Theoretic Adaptive Measurements atEach Time k

Figure 4: Area under the ROC curve surface as a func-tion of percolation parameter τ = β/γ and time

moderately, at medium epidemic conditions. Beyonda certain level of percolation intensity, more resourceswill need to be allocated to sampling to maintain ahigh level of detection performance.

A heuristic sampling strategy based on the topologyof G was also explored (results not shown) by sam-pling the ”hubs” (highly-connected nodes). However,detection performance was only slightly better thanrandom sampling and poorer than our adaptive sam-

pling method.

5

10

15

20

25

30

35

40

45

50

1 2 3 4 5 6 7 8 9 101112

0

200

400

600

Degree

Time

Rel

ativ

e F

requen

cy

5

10

15

20

25

30

35

40

45

50

1 2 3 4 5 6 7 8 9 101112

0

200

400

600

Degree

Time

Rel

ativ

e F

req

un

ecy

5

10

15

20

25

30

35

40

45

50

1 2 3 4 5 6 7 8 9 101112

0

200

400

600

Degree

Time

Rel

ativ

e F

requen

cy

a.)

b.)

c.)

Figure 5: Relative Frequency of Nodes Sampled (z-axis) of a Given Degree (x-axis) Over Time (y-axis) form = 40 Adaptive Sampling Strategy: a.) τ = 0.125,b.) τ = 0.2143, and c.) τ = 0.5

It is often useful for developing sampling heuristics andoffline control/intervention policies to inspect whattype of nodes, topologically speaking, is the adaptivesampling strategy targeting, under various networkconditions (different values of τ). In Figure 5, therelative frequency of nodes sampled with a particulardegree is plotted against time (under the m = 40 adap-tive sampling strategy) for three different values of τ(over 500 random initial conditions of the network).

For the larger of the three values explored (τ = 0.5 >τc) we see that the sampling is approximately uniformacross the nodes of each degree on the graph (Figure5(c)). Therefore, under extremely intense epidemicconditions, the adaptive sampling strategy is target-ing all nodes of each degree equally, and therefore, itis sufficient to perform random sampling. For the twolower values of τ , Figure 5(a) and Figure 5(b) (nearτc), we see that adaptive policy targets highly con-nected nodes more frequently than those of lesser de-gree and thus, it is more advantageous to exploit sucha strategy, as compared to random sampling (see AURsurface in Figure 4(c)).

5 DISCUSSION

In this paper, we have derived the conditions for a net-work specific percolation threshold using expressionsfor the updated posterior distribution resulting fromactively tracking the process. These conditions recoverthe unforced percolation threshold derived in [1] butwith an additional factor involving sensor likelihoodterms due to measurements obtained throughout the

Page 8: Percolation Thresholds of Updated Posteriors for Tracking ...web.eecs.umich.edu/~hero/Preprints/PercolationThresholds.pdf · graphical model representation). Zj k! 1 Zi k! 1 Yj k!

monitoring. A term of the form λk1∏kl=1 bjl (derived

in (30)) was shown to be the dominant mode of theupdated posterior dynamic response to active inter-vention of immunizing the nodes (holding node statesconstant). The conditions of the percolation using theupdated posterior should more accurately model thephase transition corresponding to a particular diseasetrajectory and therefore, enable a better assessmentof immunization strategies and any subsequent obser-vations resulting from such actions. The frameworkpresented above, along with the new posterior perco-lation threshold, should provide additional insight intoactive monitoring of large complex networks under re-source constraints.

6 APPENDIX

In case (2), when ∆fk > 0, we can re-write (23) interms of an alternating geometric series:

pk =f

(1)k

f(0)k

∞∑l=0

(−1)l(|∆fk|f

(0)k

pk|k−1

)l pk|k−1

≤ f(1)k

f(0)k

[1 +|∆fk|f

(0)k

pk|k−1

]pk|k−1 (34)

where we have used the fact that 1/(1 + |a|) ≤ 1 + |a|.Recalling that p ≥ p2 for 0 ≤ p ≤ 1, we have

pk ≤f

(1)k

f(0)k

[1 +|∆fk|f

(0)k

]pk|k−1. (35)

In case (3), when ∆fk < 0, and (23) can be representedas a geometric series:

pk =f

(1)k

f(0)k

24 ∞Xl=0

|∆fk|f

(0)k

pk|k−1

!l35 pk|k−1

=f

(1)k

f(0)k

241 +|∆fk|f

(0)k

pk|k−1 +

∞Xl=2

|∆fk|f

(0)k

pk|k−1

!l35 pk|k−1.

(36)

Once again, using the p ≥ p2 bound, we obtain:

pk ≤f

(1)k

f(0)k

[1 +|∆fk|f

(0)k

]pk|k−1 +O

(|∆fk|f

(0)k

pk|k−1

).

(37)

A general inequality, that is equality for case (1), is ofthe form pk ≤ ck pk|k−1 with

bk =

1 , i /∈ akf(1)k

f(0)k

[1 + |∆fk|

f(0)k

],∆fk > 0 or ∆fk < 0

with ck = bk for cases (1) and (2) and ck = bk +

O(|∆fk|f(0)k

pk|k−1

)for case (3).

References

[1] D. Chakrabarti, Y. Wang, C. Wang,J. Leskovec, and C. Faloutsos, Epidemicthresholds in real networks, ACM Trans. Inf. Syst.Secur., 10 (2008), pp. 1094–9224.

[2] R. Cohen, S. Havlin, and D. ben Avra-ham, Efficient immunization strategies for com-puter networks and populations, Physical ReviewLetters, 91 (2003), p. 247901.

[3] A. Doucet, S. Godsill, and C. Andrieu,On sequential monte carlo sampling methods forbayesian filtering, Statistics and Computing, 10(2000), pp. 197–208.

[4] M. Draief, A. J. Ganesh, and L. Massoulie,Thresholds for virus spread on networks, Ann.Appl. Probab., 18 (2008), pp. 359–378.

[5] S. Eubank, H. Guclu, Anil, M. V. Marathe,A. Srinivasan, Z. Toroczkai, and N. Wang,Modelling disease outbreaks in realistic urban so-cial networks, Nature, 429 (2004), pp. 180–184.

[6] A. O. Hero, D. Castenon, D. Cochran, andK. D. Kastella, Foundations and applicationsof sensor management, Springer, NY, 2007.

[7] D. J. C. Mackay, Information Theory, Infer-ence & Learning Algorithms, Cambridge Univer-sity Press, 2002.

[8] M. E. J. Newman, The spread of epidemic dis-ease on networks, Physical Review E, 66 (2002),p. 016128.

[9] M. E. J. Newman, Threshold effects for twopathogens spreading on a network, Physical Re-view Letters, 95 (2005), p. 108701.

[10] B. Ng, L. Peshkin, and A. Pfeffer, Factoredparticles for scalable monitoring, in In Proceed-ings of the Eighteenth Conference on Uncertaintyin Artificial Intelligence, 2002, pp. 370–377.

[11] Z. Rabinovich and J. S. Rosenschein, Ex-tended markov tracking with an application tocontrol, in The Third International Joint Confer-ence on Autonomous Agents and Multiagent Sys-tems, 2004, pp. 95–100.

[12] A. X. Zheng, I. Rish, and A. Beygelzimer,Efficient test selection in active diagnosis via en-tropy approximation, in Proceedings of the 21thAnnual Conference on Uncertainty in ArtificialIntelligence, 2005, p. 675.