
Distributed Fusion in Sensor Networks

[A graphical models perspective]

[Mujdat Çetin, Lei Chen, John W. Fisher III, Alexander T. Ihler, Randolph L. Moses, Martin J. Wainwright, and Alan S. Willsky]

Distributed inference methods developed for graphical models comprise a principled approach for data fusion in sensor networks. The application of these methods, however, requires some care due to a number of issues that are particular to sensor networks. Chief among these are the distributed nature of computation and deployment, coupled with the communications bandwidth and energy constraints typical of many sensor networks. Additionally, information sharing in a sensor network necessarily involves approximation. Traditional measures of distortion are not sufficient to characterize the quality of approximation, as they do not address in an explicit manner the resulting impact on inference, which is at the core of many data fusion problems. While both graphical models and a distributed sensor network have network structures associated with them, the mapping is not one to one. All of these issues complicate the mapping of a particular inference problem to a given sensor network structure. Indeed, there may be a variety of mappings with very different characteristics with regard to computational complexity and utilization of resources. Nevertheless, many of the powerful distributed inference methods have a role in information fusion for sensor networks. In this article we present an overview of research conducted by the authors that has


sought to clarify many of the important issues at the intersection of these domains. We discuss both theoretical issues and prototypical applications, in addition to suggesting new lines of reasoning.

INTRODUCTION

Fusion of information in interconnected sensor networks and the design of inference algorithms for graphical models are far from synonymous lines of inquiry. That said, the evocative "message-passing" structure of many graphical model inference algorithms has motivated a number of research groups to examine how methods for such models might be applied or adapted to challenges that arise in fusion for sensor networks. A variety of questions arise that fall outside the standard domain of graphical models. The result is a rich, new area of research that has already shown its promise and offers more for the future.

Several reasons underlie the choice of graphical models for addressing distributed fusion problems in sensor networks. First, a graphical model is well suited to capture the structure of a sensor network, which consists of nodes (for sensing, communication, and computation) as well as connections between the nodes (for modeling statistical dependencies and/or communication links). Second, there has recently been significant progress in the development and analysis of scalable inference algorithms on graphs. This includes recent work on analyzing and understanding the behavior of well-known classes of local message-passing algorithms on graphs that contain loops [14], [32]. There has also been considerable progress in developing new message-passing algorithms with superior convergence and accuracy properties; in some cases these algorithms have provable guarantees of convergence to optimal answers [18], [31], [33]. Third, the algorithms commonly used for performing estimation in graphical models involve parallel message-passing operations that are not only efficient and scalable but also well suited to parallel realization via physically distributed processors, a crucial requirement in sensor network applications. Finally, graphical models provide a suitable framework for the development and analysis of communication-constrained versions of message-passing algorithms, which are of considerable interest in the context of sensor networks given their severe power and energy limitations [15]; see also [7]. Distributed inference in sensor networks under communication constraints is a topic of much current interest. Aldosari and Moura [1] address the use of these algorithms for distributed decision making as well as the effect of graph topology on convergence rates. The message-passing algorithms studied in this article, while closely related, are designed for statistical estimation and fusion. Xiao et al. [37] consider random and nonrandom parameter estimation in networks in which sensors transmit quantized measurements.

We describe our ongoing line of inquiry on this subject. We use two applications—self-localization in sensor networks and distributed data association in multiobject tracking—first to show how sensor network fusion problems can be cast as problems of inference in graphical models and then to look more deeply at how conservation of power through judicious use of communications resources provides new insights not previously found in the graphical model literature.

The need to conserve communications resources requires that the so-called "messages" found in inference algorithms for graphical models be compressed and/or "censored" (i.e., not transmitted). We describe results of such censoring methods for the data association problem and of message approximation methods for the self-localization problem. The method we describe for message censoring—one that corresponds to a very simple and local fusion protocol—provides a sensor network with an adaptive fusion/communication mechanism that yields excellent fusion performance with significantly reduced expenditure of communications resources. This framework leads to a principled means for trading off communication load with the accuracy of the resulting fused inference results. In addition, this research provided the motivation and basis for examining the effects of message "errors" (due to approximation, censoring, or transmission loss) on overall inference accuracy, an inquiry that had the surprising and important side benefit of yielding the best known results on convergence of the most widely used inference algorithm for graphical models. These topics are developed in far greater detail in a number of publications that are referenced throughout this article.

GRAPHICAL MODELS

We begin with a brief discussion of graphical models, focusing on aspects related to inference and linking these to distributed inference in sensor networks in later sections. A graphical model on an (undirected) graph G = (V, E) (consisting of a vertex or node set V and edge set E ⊂ V × V) consists of a collection of random variables or vectors, X = {X_v, v ∈ V}, that collectively satisfy a Markov property with respect to G; specifically, for any subset U of V, let X_U = {X_v, v ∈ U}. The random vector X is Markov with respect to G if, for any partition of V into disjoint sets A, B, C in which B separates A and C (i.e., all paths in G from A to C include vertices in B), the random vectors X_A and X_C are conditionally independent given X_B. For the "graph" associated with time series, i.e., consecutive points in time with each point connected to its immediate predecessor and successor, this corresponds to the usual notion of temporal Markovianity (i.e., that the past and future are conditionally independent given the present). For general graphs, however, the Markov property requires a far richer set of conditional independencies and associated challenges in both specifying such distributions and in performing inference using them. By way of example, consider Figure 7(a) (used for subsequent analysis), in which the variables x and y are conditionally independent given the variables w and z. Due to the edge between w and z, however, w and z are not conditionally independent given x and y.

The celebrated Hammersley-Clifford Theorem [2] provides a sufficient condition (necessary for strictly positive probability distributions) for the form that the joint distribution must take to be Markov with respect to G. Specifically, let C denote the set of all cliques in G, where a subset of nodes C is a clique if it is fully connected (i.e., an edge exists between each pair of nodes in C). The random vector X is Markov with respect to G if (and, for strictly positive probability distributions, only if) its distribution admits a factorization as a product of functions of variables restricted to cliques:

p(x) = \frac{\prod_{C \in \mathcal{C}} \psi_C(x_C)}{Z}; \qquad Z \triangleq \sum_{x} \prod_{C \in \mathcal{C}} \psi_C(x_C),   (1)

where Z is the partition function and the ψ_C(x_C) are so-called compatibility functions. The logarithms of these compatibility functions are commonly referred to as potentials or potential functions.

For simplicity we will assume for the remainder of this article that each of the nonzero potentials (or, equivalently, each compatibility function in (1) that is not constant) is a function either of the variable at a single node of the graph (node potentials) or of the variables at a pair of nodes corresponding to an edge in E (edge potentials). In this case, (1) takes the form

p(x) = \frac{\left( \prod_{s \in V} \psi_s(x_s) \right) \left( \prod_{(s,t) \in E} \psi_{s,t}(x_s, x_t) \right)}{Z}.   (2)

Note that any graphical model can be put into this form by appropriate node aggregation [2]. While all of the ideas presented here can be extended to the more general case, pairwise potentials are sufficient for the specific applications considered in this article. Moreover, the communication interpretation of the so-called message-passing algorithms used herein is more easily explained in this context.

As long as G is a relatively sparse graph, the factorizations (1) or (2) represent parsimonious means to describe the joint distribution of a large number of random variables, in the same way that specifying an initial (or final) distribution and a set of one-step transition distributions is a compact way in which to specify a Markov chain. Moreover, for many inference and estimation problems (including those described in this article), such a specification is readily available. The challenge, however, is that unless the graph has very special properties, such as in the case of Markov chains, the compatibility functions do not readily describe the quantities of most interest, such as the marginal distributions of the variables at individual (or small sets of) nodes or the overall peak of the distribution jointly optimized over all nodes. Indeed, for discrete-valued random variables the computation of such quantities for general graphs is NP-hard.
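To make the factorization (2) and its computational cost concrete, here is a minimal sketch (a hypothetical four-node cycle of our own; none of the numbers come from the article) that evaluates the product of node and edge compatibility functions, computes the partition function Z by exhaustive enumeration, and recovers one node marginal. The loop over all 2^|V| joint configurations is exactly the exponential cost that makes general-graph inference NP-hard and motivates message passing on sparse graphs.

```python
import itertools
import numpy as np

# A hypothetical four-node cycle with binary variables (illustrative only).
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

rng = np.random.default_rng(0)
node_pot = {s: rng.uniform(0.5, 1.5, size=2) for s in nodes}        # psi_s(x_s)
edge_pot = {e: rng.uniform(0.5, 1.5, size=(2, 2)) for e in edges}   # psi_st(x_s, x_t)

def unnormalized_p(x):
    """Product of node and edge compatibility functions for one configuration, as in (2)."""
    p = 1.0
    for s in nodes:
        p *= node_pot[s][x[s]]
    for (s, t) in edges:
        p *= edge_pot[(s, t)][x[s], x[t]]
    return p

# Brute force over all 2^|V| configurations: feasible here, hopeless at scale.
configs = list(itertools.product([0, 1], repeat=len(nodes)))
Z = sum(unnormalized_p(x) for x in configs)                         # partition function
marginal_0 = np.zeros(2)
for x in configs:
    marginal_0[x[0]] += unnormalized_p(x) / Z
print("Z =", Z, " p_0 =", marginal_0)
```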

For graphs without loops (Markov chains and, more generally, graphical models on trees), computation of the marginal distributions is relatively straightforward. In this case, the node and pairwise potentials of the joint distribution in (2) for any cycle-free graph can be expressed in terms of the marginal probabilities at individual nodes and joint probabilities of pairs of nodes connected by edges [6], [32]:

p(x) = \prod_{s \in V} p_s(x_s) \prod_{(s,t) \in E} \frac{p_{st}(x_s, x_t)}{p_s(x_s)\, p_t(x_t)}.   (3)

That is, ψ_s(x_s) = p_s(x_s) [or ψ_s(x_s) = p_s(x_s) p(y_s | x_s) when there is a measurement y_s associated with x_s] and ψ_{st}(x_s, x_t) = p_{st}(x_s, x_t) / (p_s(x_s) p_t(x_t)). Marginal probabilities can be efficiently calculated in a distributed fashion by so-called sum-product algorithms. Specifically, as shown in [26], the marginal probabilities at any node s in the graph can be expressed in terms of the local potential ψ_s at node s, along with a set of so-called messages from each of its neighbors in the set N(s) = {t ∈ V | (s, t) ∈ E}. The message from node t to node s is a function M_{ts}(x_s) that (up to normalization) represents the likelihood function of x_s based on the subtree rooted at t and extending away from s. In particular, the marginal distribution p_s takes the form

p_s(x_s) \propto \psi_s(x_s) \prod_{t \in N(s)} M_{ts}(x_s).   (4)

Furthermore, in the absence of loops, these messages are related to each other via a sum-product formula:

M_{ts}(x_s) \propto \sum_{x_t} \psi_{st}(x_s, x_t)\, \psi_t(x_t) \prod_{u \in N(t) \setminus s} M_{ut}(x_t).   (5)

The product operation embedded in the message computation from node t to node s combines the information in the subtree rooted at node t, by combining the likelihood information from all neighbors of node t other than s with the local potential at node t. This yields a likelihood function for the random variable X_t at node t. This is then converted to a likelihood for the random variable X_s at node s by multiplying by the compatibility function between these two nodes and then "summing" or integrating out the variable at node t, in a fashion analogous to the Chapman-Kolmogorov equation for a Markov chain.

Together, (4) and (5), relating messages throughout the loop-free graph, represent a set of fixed-point equations that can be solved in a variety of ways corresponding to different message-passing algorithms. For example, one can solve these equations explicitly, much as in Gaussian elimination, by starting at leaf nodes, working inward toward a "root" node, and then propagating back toward the leaves; this is a generalization of two-pass smoothing algorithms for Markov chains. An alternative is to solve these equations iteratively: we begin with guesses (often taken simply to be constant) of all of the messages and iteratively update messages by substitution into the fixed-point equations. Each step of this procedure involves passing the current guess of messages among neighboring nodes. While there is great flexibility in how one schedules these messages, the happy fact remains that after a sufficient number of iterations (enough so that information propagates from every node to every other), the correct messages are obtained, from which the desired probabilities can then be computed.
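Equations (4) and (5) translate almost line for line into code. The sketch below is an illustrative construction of ours (the small tree, the random potentials, and the simple "flooding" schedule are all assumptions, not specified in the article): it iterates (5) to a fixed point and then reads off the marginals via (4). On a tree, a number of sweeps comparable to the graph diameter suffices.

```python
import numpy as np

# A small tree (a star around node 1) with binary states; potentials are random.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (1, 3)]
neighbors = {s: [t for (a, b) in edges for t in (a, b) if s in (a, b) and t != s]
             for s in nodes}

rng = np.random.default_rng(1)
node_pot = {s: rng.uniform(0.5, 1.5, size=2) for s in nodes}   # psi_s(x_s)
edge_pot = {}
for (s, t) in edges:
    psi = rng.uniform(0.5, 1.5, size=(2, 2))                   # psi_st indexed [x_s, x_t]
    edge_pot[(s, t)] = psi
    edge_pot[(t, s)] = psi.T                                   # same potential, other direction

# Start from uninformative messages and iterate the fixed-point equations (5).
msgs = {key: np.ones(2) for key in edge_pot}                   # msgs[(t, s)] = M_ts(x_s)
for _ in range(10):                                            # enough sweeps for a small tree
    new = {}
    for (t, s) in msgs:
        prod = node_pot[t].copy()
        for u in neighbors[t]:
            if u != s:
                prod = prod * msgs[(u, t)]
        m = edge_pot[(s, t)] @ prod        # sum over x_t of psi_st(x_s, x_t) * (...)
        new[(t, s)] = m / m.sum()          # normalization for numerical stability
    msgs = new

# Marginals via (4): local potential times the product of incoming messages.
for s in nodes:
    b = node_pot[s].copy()
    for t in neighbors[s]:
        b = b * msgs[(t, s)]
    print(s, b / b.sum())
```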

Interestingly, for loop-free graphs, a variant of this approach also yields the solution to the problem of computing the overall maximum a posteriori (MAP) configuration for the entire graphical model. For such graphical models, there is an alternative factorization of p(x) in terms of so-called max-marginals. As their name would suggest, these quantities are defined by eliminating variables through maximization (as opposed to summation); in particular, we define

q_s(x_s) := \max_{x_u,\, u \in V \setminus s} p(x_1, \ldots, x_n)   (6a)

q_{st}(x_s, x_t) := \max_{x_u,\, u \in V \setminus \{s,t\}} p(x_1, \ldots, x_n).   (6b)

It is a remarkable fact that for a tree-structured graph, the distribution (1) can also be factorized in terms of these max-marginals, viz.,

p(x) \propto \prod_{s \in V} q_s(x_s) \prod_{(s,t) \in E} \frac{q_{st}(x_s, x_t)}{q_s(x_s)\, q_t(x_t)}.   (7)

Furthermore, there are equations analogous to those for the sum-product algorithm that show how these quantities can be computed in terms of node potentials and messages, where the fixed-point equations involve maximization rather than summation (yielding what are known as max-product algorithms). The solution of these fixed-point equations can be computed via leaf-root-leaf message passing (corresponding to dynamic programming/Viterbi algorithms) or by iterative message passing with more general message scheduling.
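In code, the max-product update differs from the sum-product update (5) only in replacing the sum over x_t with a maximization. A minimal sketch of one message update, using the same array conventions as the previous listing (the function name is ours):

```python
import numpy as np

def max_product_message(psi_st, psi_t, incoming):
    """Max-product analogue of (5): replace the sum over x_t with a max.

    psi_st   -- array [n_s, n_t], pairwise potential psi_st(x_s, x_t)
    psi_t    -- array [n_t], node potential at t
    incoming -- list of arrays [n_t], messages M_ut for u in N(t) \\ s
    """
    prod = psi_t.copy()
    for m in incoming:
        prod = prod * m
    M = (psi_st * prod[None, :]).max(axis=1)   # max over x_t instead of sum
    return M / M.sum()
```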

While the representation of marginal distributions or max-marginals at individual nodes in terms of messages holds only if the graph is loop free, (4) and (5) are well defined for any graph, and one can consider applying the sum-product algorithm to arbitrary graphs that contain loops (often referred to as loopy belief propagation); this corresponds to fusing information based on assumptions that are not precisely valid. For example, the product operation in (4), as well as in (5), corresponds to the fusion of information from the different neighbors of a node t, assuming that the information contained in the messages from these different neighbors is conditionally independent given the value of x_t, something that is valid for trees but is decidedly not true if there are loops.

Despite this evident suboptimality, this loopy form of the sum-product algorithm has been extremely successful in certain applications, most notably in decoding of low-density parity-check codes [20], [28], which can be described by graphs with long cycles. In contrast, sensor network applications (as well as others) involve graphs with relatively short cycles. This has led to a considerable and still growing body of literature on the analysis of these algorithms on arbitrary graphs, as well as the development of new ones that yield superior performance. For arbitrary loopy graphs, the reparameterization perspective on these algorithms [32], in conjunction with a new class of efficiently computable bounds [34] on the partition function Z, provides computable bounds on the error incurred via application of sum-product to loopy graphs. Similar analysis is also applicable to the max-product updates [9], [31]. The fact that the max-product algorithm may yield incorrect (i.e., non-MAP) configurations motivates the development of a new class of tree-reweighted max-product algorithms (TRMP) [33] for which, in sharp contrast with the ordinary max-product updates, there is a set of testable conditions for determining if the solution is indeed the MAP configuration. Tight performance guarantees can be provided for TRMP for specific classes of graphs [8], [18]. More broadly, we refer the reader to various research and survey papers, e.g., [23], [35], [38], as well as citations at the end of this article, which provide only a sampling of this rapidly growing literature.

NONPARAMETRIC BELIEF PROPAGATION

We next provide a brief discussion of one specific contribution that plays a role in what follows. Specifically, whether applied to loop-free or loopy graphs, message-passing algorithms corresponding to the iterative solution of (5) require the transmission of full likelihood functions, each of which is a function of the variable at the receiving node. In the case of discrete random variables, this corresponds to the transmission of a vector of numbers, while for Gaussian models, these messages can be reduced to the transmission of means and covariances. However, for non-Gaussian continuous random variables, as commonly occur in many sensor network applications, one must transmit a representation of an entire function M_{ts}(x_s). A common approach is simply to discretize the underlying continuous variables, often leading to unwarranted computational complexity (and communication overhead for sensor networks). An alternative, developed recently, is nonparametric belief propagation (NBP), representing a significant generalization of particle filtering methods for Markov chains [29].

Particle filtering for Markov chains also involves an equation similar to (5), with the important distinction that there is no "product," as there is only one neighbor of node t. The set of particles, representing samples from the single message corresponding to the product term in (5), are weighted by the local node compatibility function (and perhaps resampled from the weighted particle representation). This step is followed by a "sum" operation by simulating the transition dynamics from node t to s; i.e., for each sample at node t we sample from the transition distribution to generate a sample at node s. All of these steps, with one significant exception, apply equally well to message passing in general graphical models. The additional complexity is that we have the product of particle-based messages from several neighbors of node s, which requires that we make sense of this product and then find an efficient method to generate particles corresponding to this product.

As developed in [29], each particle-based message can be interpreted as a nonparametric estimate of the likelihood function or probability density function corresponding to the exact message; e.g., if we use Gaussian kernels for these nonparametric densities, the set of particles corresponds to a Gaussian mixture. Consequently, the problem of generating samples from the product of messages, with each represented by a set of particles, reduces to drawing samples from a density that is the product of a set of Gaussian mixtures. Unless we are careful, however, we may have a geometrically growing number of terms in these resulting Gaussian mixtures, making sampling intractable. The key to overcoming this problem is that of finding ways of sampling from this product without explicitly constructing the product. Generating such a sample can be accomplished in two steps: choosing one of the product Gaussian kernels from which to sample (i.e., the set of labels corresponding to specific terms in each of the Gaussian mixtures in the product) and then drawing a sample from the Gaussian corresponding to that set of labels. Since sampling from a Gaussian is straightforward, the real challenge is in sampling from the sets of labels of components, something that can be solved via importance and Gibbs sampling methods.
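The following sketch illustrates this two-step sampler for one-dimensional messages; the mixture sizes, the number of Gibbs sweeps, and all function names are our own illustrative assumptions, and a practical NBP implementation (see [29]) would be considerably more careful. Each Gibbs step resamples the label of one message conditioned on the currently selected components of the others, for which the conditional weight is the mixture weight times a simple Gaussian convolution term.

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_product(mus, vars_):
    """Mean/variance of the (unnormalized) product of 1-D Gaussians."""
    prec = np.sum(1.0 / vars_)
    mean = np.sum(mus / vars_) / prec
    return mean, 1.0 / prec

def gibbs_product_sample(messages, sweeps=20):
    """Draw one sample from the product of 1-D Gaussian-mixture messages.

    messages: list of (weights, means, variances) triples, one per incoming message.
    Gibbs-samples the label vector (one mixture component per message), then
    samples x from the single Gaussian selected by those labels.
    """
    labels = [rng.integers(len(w)) for (w, m, v) in messages]
    for _ in range(sweeps):
        for j, (w, m, v) in enumerate(messages):
            # Product of the currently selected components of all other messages:
            other_m = np.array([messages[i][1][labels[i]]
                                for i in range(len(messages)) if i != j])
            other_v = np.array([messages[i][2][labels[i]]
                                for i in range(len(messages)) if i != j])
            mean, var = gaussian_product(other_m, other_v)
            # Conditional weight of each candidate label: mixture weight times the
            # label-marginal likelihood N(m_jl; mean, var + v_jl).
            logw = np.log(w) - 0.5 * (m - mean) ** 2 / (var + v) - 0.5 * np.log(var + v)
            p = np.exp(logw - logw.max())
            labels[j] = rng.choice(len(w), p=p / p.sum())
    # Final draw: sample from the Gaussian given by the full set of labels.
    mus = np.array([m[l] for (w, m, v), l in zip(messages, labels)])
    vs = np.array([v[l] for (w, m, v), l in zip(messages, labels)])
    mean, var = gaussian_product(mus, vs)
    return rng.normal(mean, np.sqrt(var))

# Hypothetical example: the product of three 2-component mixture messages.
msgs = [(np.full(2, 0.5), rng.normal(0, 2, 2), np.full(2, 0.3)) for _ in range(3)]
print(gibbs_product_sample(msgs))
```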

Moreover, dramatic speedups can be achieved through a multiresolution representation utilizing k-dimensional (KD) trees, a data structure that plays a very important role in efficient coding and communication of messages. Specifically, suppose that the variables at each node are k-dimensional real-valued vectors. Then, given a set of particles representing one of the messages in (5), we aggregate these into a multiresolution tree, starting with all of the particles clustered together at the root of this tree. Proceeding down the tree, at each node we take whichever particles are clustered at that node and divide them into two subclusters (which correspond to the children of this node) along one of the components of the vector (so that we cycle through these components as we proceed down the tree). At the finest level we have individual particles and hence individual "labels" for a term in the Gaussian mixture representation of this message. Sampling of labels can then be accomplished very efficiently using this coarse-to-fine representation. We refer the reader to [11] for details. Note that here, nonparametric refers to the sample-based representation of the messages, while the graphical model itself remains defined in terms of the potential functions ψ (which are parametric in our applications). In contrast, the paper by Predd et al. [27] discusses nonparametric, data-based models in sensor networks.
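A minimal sketch of the KD-tree construction just described, for particles that are real-valued vectors; the per-node Gaussian summaries and the median split are our assumptions about one reasonable realization (see [11] for the actual data structure):

```python
import numpy as np

class KDNode:
    """One node of a KD-tree over particles, with a Gaussian summary (mean, var)
    that serves as a coarse single-component approximation of its cluster."""
    def __init__(self, pts, depth=0):
        self.mean = pts.mean(axis=0)
        self.var = pts.var(axis=0) + 1e-9          # per-dimension summary statistics
        self.n = len(pts)
        self.left = self.right = None
        if len(pts) > 1:
            dim = depth % pts.shape[1]             # cycle through vector components
            order = np.argsort(pts[:, dim])
            half = len(pts) // 2                   # split the cluster at the median
            self.left = KDNode(pts[order[:half]], depth + 1)
            self.right = KDNode(pts[order[half:]], depth + 1)

# Hypothetical 2-D particle message: any cut through this tree is a Gaussian-mixture
# approximation, with finer cuts costing more bits but giving lower error.
rng = np.random.default_rng(3)
particles = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
root = KDNode(particles)
for child in (root.left, root.right):
    print(child.n, child.mean.round(2))
```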

MAPPING TWO PROTOTYPICAL APPLICATIONS TO GRAPHICAL MODELS

Mapping fusion problems in sensor networks to problems on graphical models might at first seem obvious, as there is a natural graph that exists, defined by the sensor nodes and the intersensor communication structure. However, it is the informational structure of the inference problem, involving the relationships between sensed information and the variables about which we wish to perform estimation, that is just as critical as the communication structure of the problem. We describe two sensor network applications here, focusing on the informational structure.

SELF-LOCALIZATION IN SENSOR NETWORKS

A well-recognized problem for many sensor network applications is that of sensor localization. Figure 1 illustrates such a problem. Here each node corresponds to a sensor, and the random vector at that node describes sensor calibration information, such as its location and also perhaps its orientation (e.g., if the node provides directional-sensing capability) and time offset (e.g., if intersensor time-of-flight measurements are used). While the framework we describe extends immediately to a general setting, we assume for simplicity that only the location variables are of interest. We consider a case in which the available information for estimating sensor locations consists of: i) uncertain prior information about the locations of a subset of the sensors (e.g., if any of the sensors are provided with GPS); ii) the ability of sensors to "hear" one another and attempt to measure their intersensor distance (typically only possible for nearby sensors); and iii) any distance measurements so obtained.

Specifically, let us denote by ρ_s(x_s) the prior location probability distribution for sensor s, if any; let Pr(x_s, x_t) be the probability of obtaining a distance measurement between two sensors s, t located at x_s and x_t; and let ρ_L(l_{st} | x_s, x_t) be the probability distribution of measuring a distance l_{st} given that the true sensor positions are x_s and x_t. Notice that all three sources of information involve only the variables at single sensors or pairs of sensors; thus we may use a pairwise graphical model to describe the joint distribution of sensor locations, with each node in the graph associated with one of the sensors and its variables of interest. One may immediately write that the joint distribution has the form of (2), with

\psi_s(x_s) = \rho_s(x_s)   (8a)

\psi_{st}(x_s, x_t) = \begin{cases} \Pr(x_s, x_t)\, \rho_L(l_{st} \mid x_s, x_t), & l_{st} \text{ is observed} \\ 1 - \Pr(x_s, x_t), & \text{otherwise.} \end{cases}   (8b)

[FIG1] Sensor localization: (a) the physical locations of a collection of sensors may be represented as (b) a graphical model in which nodes correspond to the sensor node location variables and edges correspond to observed information, such as intersensor distance measurements from a subset of node pairs.

The sensor localization problem is then precisely one of computing the best estimates of all sensor locations given all of this information. As an optimization problem this has been well studied by others (e.g., [24], [25], [30]), often under the assumption that the distributions (ρ_s, ρ_L) involved are Gaussian, so that its solution entails the nonlinear optimization of a quadratic cost function. However, by formulating this problem as one of inference for a graphical model (in which localization corresponds to computing the marginal distributions of variables at each node), we directly obtain message-passing algorithms (such as sum-product) that distribute the computations across the network. Moreover, the graphical model formulation allows us easily to include features which bring additional realism, accuracy, and well-posedness [12], [13]. In particular, anomalous range measurements can occur with nontrivial probability; this phenomenon is easily included through a simple modification of the edge potential likelihood functions to capture such a noise model. Also, location distributions for sensors can be multimodal (e.g., receiving perfect range measurements from two neighboring sensors whose locations are known perfectly yields two possible locations). The NBP formulation finds an estimate of the node location probability distributions (as opposed to a point estimate such as the distribution maximum), from which multimodal, non-Gaussian, or other characteristics are readily seen. Representing such multimodality at the individual node level is one of the strengths of the NBP algorithm and is far easier than dealing with this at a centralized level.
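To make (8b) concrete, here is a hedged sketch of an edge potential for the localization problem, including the outlier-robust modification just discussed. The exponential detection model, the Gaussian range-noise level, and the outlier mixture weight are hypothetical choices of ours, not values from the article:

```python
import numpy as np

def detect_prob(xs, xt, r0=1.0):
    """Hypothetical distance-dependent probability Pr(x_s, x_t) that sensors
    s and t obtain a range measurement of each other."""
    return np.exp(-np.sum((xs - xt) ** 2) / (2 * r0 ** 2))

def edge_potential(xs, xt, lst=None, sigma=0.1, p_outlier=0.05, outlier_scale=10.0):
    """Edge potential (8b): if a range l_st was observed, weight by the detection
    probability times a measurement likelihood (here a Gaussian on the range error,
    mixed with a broad outlier component to model anomalous measurements);
    otherwise use 1 - Pr(x_s, x_t), so a *missing* measurement is informative too."""
    pd = detect_prob(xs, xt)
    if lst is None:
        return 1.0 - pd
    err = lst - np.linalg.norm(xs - xt)
    like = ((1 - p_outlier) * np.exp(-err ** 2 / (2 * sigma ** 2)) / sigma
            + p_outlier * np.exp(-err ** 2 / (2 * (outlier_scale * sigma) ** 2))
              / (outlier_scale * sigma))
    return pd * like

# Example: a pair of candidate positions, with and without an observed range.
xs, xt = np.array([0.0, 0.0]), np.array([1.0, 0.0])
print(edge_potential(xs, xt, lst=1.05), edge_potential(xs, xt))
```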

The graphical model formulation of the localization problem also provides a natural mechanism for including information about which pairs of sensors are able to measure distance. This information can improve the well-posedness and reduce estimation error in the location estimates. For example, the topmost sensor node in Figure 1 forms a location estimate using distance measurements from two very closely located sensors. Thus, the distribution for the location estimate will have large values in an entire circle centered at these two other sensors. However, if we also account for the fact that the node on the other side of these two sensors cannot hear (or can hear only with small probability) the topmost sensor, this circular ambiguity is considerably reduced. Dealing with the absence of a measurement as an indicator of sensors being more distant from each other can be difficult to accommodate in the traditional optimization framework (for example, it is certainly not a simple quadratic cost). In our graphical model formulation, however, this information can be included by adding edges to the graph. Specifically, we include 2-step (or, more generally, n-step) edges, where a 2-step edge is one between sensors that are both heard by a common sensor but cannot hear each other; see [13] for details. Including these additional edges can increase the complexity of inference slightly, although experimentally it appears that a few edges are often sufficient to resolve most of the ambiguities that may arise.

The fact that the variables in our model are continuous—leading to our use of the particle-based NBP algorithm—raises questions unique to sensor networks, namely how best to make use of scarce communication resources. For example, how many particles do we need to send, and how can we encode them efficiently? Moreover, how do we adjust this as estimates evolve dynamically (e.g., as ambiguities due to multimodality in distributions resolve themselves)? These issues are considered later in this article.

MULTIOBJECT DATA ASSOCIATION IN SENSOR NETWORKS

A second application for sensor networks is that of multisensor, multiobject tracking. This is a challenging problem even for centralized algorithms, in large part because of the embedded problem of data association, i.e., of determining which measurements from different sensors correspond to the same object. For sensor networks there are additional challenges, due to the need for distributed implementation, but typical networks also have structure; e.g., sensors have limited sensing range overlapping the range of a limited number of other sensors. This suggests new approaches for solving data association problems that are computationally feasible and fully distributed.

Figure 2 depicts a notional example of the problem. Here, a number of sensors cover a region of interest with overlapping areas of regard. A number of targets are located within the region, each sensed by one or more sensors. In this case the mapping of the inference problem to a graphical model is not unique, and different approaches present tradeoffs that lead to very different solutions than in the well-studied case of centralized multitarget tracking. In particular, as is well documented [22], the preferred centralized way in which to organize the various data association hypotheses is that based on so-called track hypotheses, which leads to data structures—and corresponding graphical models—in which, roughly speaking, the nodes correspond to targets.

In a sensor network, however, it is advantageous to organize the representation around sensors rather than targets. For centralized processing, such measurement-oriented approaches have been discarded for the same basic reason that purely sensor-based representations do not work here. In particular, consider a simple situation in which we know how many targets are present and we know which sensors see which targets. If we wish to use a model in which the nodes are in one-to-one correspondence with the sensors, the variable to be estimated at each node is simply the association vector that describes which measurement from that sensor goes with which target, which measurements are false alarms, and which targets in its area of regard it fails to detect. The problem with such a graphical model is that if multiple targets are seen by the same set of sensors, the likelihoods of these sets of associations (across sensors and targets) are coupled, implying that in the representation in (1), we must include cliques of size larger than two. In principle, these cliques can be quite large, and it is precisely for this reason that the association problem is NP-hard.

By taking advantage of the sparse structure of sensor networks—i.e., the fact that each sensor has only a limited field of view and thus has only a modest number of other sensors with which it interacts and a small number of targets within its measurement range—one can readily construct a hybrid representation comprised of two types of nodes. Multitarget nodes correspond to sets of targets seen by the same set of three or more sensors. In such a model, the variable at each sensor node captures the assignment of groups of measurements to each of these multitarget nodes, as well as any assignments that do not have such a level of multisensor/target contention. Moreover, the resulting graphical model yields a representation as in (2) with only pairwise potentials [5]. Furthermore, while we have described the idea for the case in which we already know which targets are seen by which sets of sensors, it is also possible to formulate graphical models that deal with the problem of also determining which targets are seen by which subsets of sensors. We do so by introducing virtual nodes representing regions of space corresponding to overlaps in areas of regard of multiple sensors, as shown in Figure 3. In addition, although we have discussed the data association problem at a single time point here for simplicity, the tracking problem is dynamic, and our framework can be generalized to incorporate data from multiple time slices by using a multiple hypothesis tracking-like approach [4].

For the data association problem, both the computation of marginal probabilities and of the overall MAP estimate are of interest. The MAP estimate is of importance because it captures consistency of association across multiple sensors and targets (for example, capturing the fact that one measurement cannot correspond to multiple targets). As a result, algorithms such as sum-product and max-product are both of interest, as is the issue of communications-sensitive message passing, a topic to which we turn in the next section.


[FIG2] A snapshot of a typical data association scenario in a sensor network. Twenty-five sensors (circle nodes) and the bearing-only measurements (line segments) are shown. Each cluster of samples represents the prior position distribution of one target.


SOLVING INFERENCE PROBLEMS IN COMMUNICATION-CONSTRAINED NETWORKS

The sensor network applications in the previous section are two of many that can be naturally cast as problems of inference in graphical models—at least in part, as there are other issues, including power conservation and careful use of scarce communication resources, that must be considered. Using the two previous applications as examples, we describe approaches to dealing with such power and communication issues.

MESSAGE CENSORING

As described in the preceding section, multiobject data association in sensor networks can be formulated as a problem either of computing the marginal probabilities or the overall MAP estimate for a graphical model whose variables are discrete and represent various assignments of measurements to objects or spatial regions. In our work we have applied a variety of different algorithms to solve this problem, including the sum-product algorithm [21] for the computation of approximate marginals, the max-product algorithm [31] for the computation of approximate MAP estimates, and the TRMP algorithm [31] for the computation of the true MAP estimate.

One issue with all of these algorithms is that messages are, in principle, continually transmitted among all of the nodes in the graphical model. In typical graphical model applications some type of global stopping rule is applied to decide when the inference iterations should be terminated. Also, it is often the case that convergence behavior can depend strongly on the message schedule, i.e., the order in which messages are created, transmitted, and processed. For sensor network applications, convergence criteria or message schedules that require centralized coordination are out of the question. Rather, what is needed are local rules by which an individual node can decide, at each iteration, whether it has sufficient new information to warrant the transmission of a new message, with the understanding that the receiving node will simply use the preceding message if it does not get a new one and will use a default value corresponding to a noninformative message if it has not received any prior message. This formulation also allows for transmission erasures.

A simple local rule for message censoring is the following. First, we interpret each message as a probability distribution on the state of the node to which the message is to be sent, easily accomplished by normalizing the message. We then compute the Kullback-Leibler divergence (KLD) between each message and its predecessor,

D\left( M^k_{ts} \,\|\, M^{k-1}_{ts} \right) = \sum_{x_s} M^k_{ts}(x_s) \log \frac{M^k_{ts}(x_s)}{M^{k-1}_{ts}(x_s)},   (9)

as a measure of novel information and send M^k_{ts} only if D(M^k_{ts} || M^{k-1}_{ts}) exceeds a threshold ε. This procedure is completely local to each node and provides for network adaptivity, as these rules lead to data-dependent message scheduling; indeed, it is quite common for a node to become silent for one or more iterations and then to restart sending messages as sufficiently new information reaches it from elsewhere in the network.
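A sketch of this censoring rule for discrete messages follows; the threshold value and the function names are our own illustrative choices:

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """Kullback-Leibler divergence between two normalized discrete messages, as in (9)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def maybe_send(new_msg, last_sent, threshold=1e-2):
    """Local censoring rule: transmit only if the new message differs from the last
    transmitted one by more than `threshold` nats; otherwise the receiver simply
    keeps reusing `last_sent` (or a flat default if nothing was ever sent)."""
    if last_sent is None or kld(new_msg, last_sent) > threshold:
        return True, new_msg          # send, and update the record of what was sent
    return False, last_sent

# Example: a small change is censored, a large one is transmitted.
prev = np.array([0.5, 0.5])
print(maybe_send(np.array([0.52, 0.48]), prev))   # censored
print(maybe_send(np.array([0.9, 0.1]), prev))     # sent
```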

When applying the above method to algorithms such as sum-product, we observe that major savings in communication (hence power) can be achieved without significant performance loss as compared to standard message-passing algorithms. An example is shown in Figure 4, where the data are obtained by simulating tracking of 50 targets in a 25-sensor network and censored versions of max-product are readily compared to max-product and TRMP. In the figure, the censoring threshold is varied. It can be seen that with certain thresholds the performance deviations from the max-product and the TRMP algorithms are very small, even though the amount of communication is dramatically reduced. This shows that censored message passing can provide significant communication savings together with near-optimal performance. In addition, we have found examples in which this algorithm yields better performance than one without message censoring. We conjecture this is related to the so-called "rumor propagation" behavior of algorithms such as sum-product, in which the repeated propagation of messages around loops leads to incorrect corroboration of hypotheses; by censoring messages, this corroboration is attenuated. The communication-fusion performance tradeoff is examined more thoroughly in the next section.

[FIG3] (a) A piece of a partially organized sensor network, where the sensor-target coverage relationship is ambiguous. Two sensors with their surveillance regions (s1 and s2) and nonparametric representations of two target distributions (T1 and T2) are shown. The surveillance area is divided into three nonoverlapping subregions (r1, r2, r3), each of which is covered by a distinct subset of sensors. (b) The graphical model for the scenario in (a); circles, squares, and triangles correspond to sensors, subregions, and targets, respectively.

[FIG4] Performance-communication tradeoff with varying thresholds for message censoring. Inference performance is evaluated by the association error rate, which is defined as the ratio of the number of measurements that are assigned to wrong targets to the total number of measurements. The amount of communication is defined as the number of messages sent by each node on average. Max-product (cyan) and TRMP (red) are plotted for comparison.

TRADING OFF ACCURACY FOR BITS IN PARTICLE-BASED MESSAGING

The NBP approach to inference in graphical models involves particle-based representations of messages sent from node to node. In the context of sensor networks this raises two questions: i) how does one efficiently transmit such messages, and ii) how many particles does one need; i.e., how accurate a representation of the true, continuous message is required, and what transmitted particles provide that level of accuracy?

These questions can be posed as follows. We are trying to transmit a probability distribution q(x) from one node to another, where the representation is specified by a set of particles {x_i} that can be viewed as a set of independent, identically distributed samples from that distribution. While this seems like a standard problem in transmission of information (see, for example, [10]), an important distinction here is that neither the order in which samples are transmitted nor the accuracy of the transmission of those individual particles is important, as we are ultimately interested in the accuracy with which the receiver can reconstruct q(x). The flexibility in ordering samples immediately suggests protocols for transmission that can reduce the required numbers of bits. For example, for scalar particles, one can order the particles from smallest to largest and encode only the differences x_{i+1} − x_i. A significant savings in bits can thus be achieved [15].
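A minimal sketch of this order-free delta encoding for scalar particles (the grid step and the function names are ours; a real encoder would additionally entropy-code the small integer gaps):

```python
import numpy as np

def delta_encode(samples, step=1e-3):
    """Sort the particles, quantize them on a uniform grid, and keep only the
    first grid index plus the successive gaps; after sorting, the gaps are small
    nonnegative integers that take far fewer bits than the raw values."""
    xs = np.sort(np.asarray(samples))
    q = np.round(xs / step).astype(int)
    return q[0], np.diff(q)

def delta_decode(first, deltas, step=1e-3):
    """Rebuild the (sorted) particle set from the first index and the gaps."""
    return (first + np.concatenate([[0], np.cumsum(deltas)])) * step

rng = np.random.default_rng(4)
x = rng.normal(0, 1, 8)
first, deltas = delta_encode(x)
print(deltas)                                            # small integers to transmit
print(np.allclose(np.sort(x), delta_decode(first, deltas), atol=1e-3))
```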

Alternatively, either for scalar or vector variables, we can employ the KD tree structure used in NBP to provide a multiresolution protocol, conceptually illustrated in Figure 5. The idea of the KD tree is to group particles in a binary tree structure, starting from a root node in which all of the particles are grouped together and refining the clustering of points at each level of the tree by splitting each set into two subsets. By creating simple density approximations at each node of the KD tree, such as a Gaussian with mean and variances obtained from the samples associated with that node, we create a multiresolution representation of the distribution. This representation provides a direct means of trading off message accuracy with total bits transmitted. In particular, any cut through this tree corresponds to an approximation of the full, finest-scale distribution q(x). The tree structure allows us easily to estimate the KL divergence between q(x) and any such approximation. Furthermore, the transmit protocol can take advantage of the structure between the means of the children and their parent in the tree, and the fact that the covariances of the children are typically smaller than that of the parent, to efficiently encode these means and covariances. Using a simple predictive encoder which captures this information, we can readily compute the bit cost of transmitting any particular approximation.

This structure allows one to adapt message transmission in a variety of ways. For example, specifying a desired message accuracy leads directly to a specific cut through the tree and a corresponding message approximation, which in turn leads to a specific protocol for efficient transmission of that approximation. Alternatively, one can specify an upper bound on bits to be communicated and then determine the most accurate approximation whose transmission satisfies that bound.

Figure 6 illustrates the type of result that such an approach yields for the sensor localization problem, in this case for a network consisting of 25 sensors. Figure 6(a) depicts the tradeoff between message approximation error (as measured by the KLD) and localization error (where, of course, there is a nonzero irreducible error even if messages are sent exactly). A curve such as this allows the system designer to determine the threshold on message approximation error to be used to achieve an acceptable level of localization performance. This threshold, in turn, is used as the basis for the adaptive message transmission procedure just described. Figure 6(b) illustrates how the entropy in successive messages behaves as localization iterations proceed. Initially, distributions and messages may be multimodal; however, as sum-product iterations proceed and individual sensor locations come into focus, this multimodality disappears. As a result, an adaptive algorithm as we have described may start with the need to transmit multiple particles but over time will use coarser, single-mode distributions, thereby saving bits. Combining this with message censoring, as described previously, provides a communication resource-sensitive solution to the sensor localization problem that trades off total resources used for message errors incurred.

[FIG5] Approximating a message (or density estimate) q(x) using a hierarchical, KD-tree structure. The same hierarchical structure can also be used to encode the approximation, providing a quantified tradeoff between message bits and approximation error.

[FIG6] Illustrating the basic ideas behind the adaptive message approximation and transmission procedure for network sensor localization. (a) depicts localization error as a function of message approximation error (in terms of the KLD); such a curve allows the designer to set an approximation threshold as a function of the target localization accuracy desired. (b) illustrates the behavior of message entropy as the sum-product localization iterations proceed; as sensors become better localized, the message entropy decreases. Using whatever approximation threshold has been specified, our method automatically adapts by sending multiple Gaussian components early on (to capture multimodality and higher entropy) but then sending simple and eventually single Gaussian components as location ambiguities are resolved over time. [Axes: (a) allowed message error (KLD) versus median localization error; (b) BP iteration versus message entropy.]

THE EFFECTS OF MESSAGE APPROXIMATION

The methods described in the preceding section deal explicitly with the issue of conserving communication resources by incurring message approximations and errors due to censoring or to particle-based approximation. Of course, what one really would like is to relate the expenditure of communication resources to the accuracy of the fused estimates produced as a result of having used approximate messages. The missing piece required to yield this tradeoff is to relate message error size to the ultimate errors in the resulting fused outputs. Ihler et al. [14] examine this relationship in detail. Additionally, Xiao et al. [37] analyze the effect of quantized sensor measurements (messages in their context) on estimation performance.

The analysis in [14] involves the use of two alternate metrics to quantify the difference between an exact message M_{ts} and an approximate message \hat{M}_{ts}. One of these is the KLD; the other is a measure of the dynamic range of this difference. In particular, the dynamic range is

d\left( M_{ts}, \hat{M}_{ts} \right) = \sup_{x, x'} \left( \frac{M_{ts}(x)}{\hat{M}_{ts}(x)} \cdot \frac{\hat{M}_{ts}(x')}{M_{ts}(x')} \right)^{1/2},   (10)

which can be shown to be equivalent [14], in the log-domain, to

\log d\left( M_{ts}, \hat{M}_{ts} \right) = \inf_{\alpha} \sup_{x} \left| \log \alpha + \log M_{ts}(x) - \log \hat{M}_{ts}(x) \right|.   (11)
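For discrete messages, the infimum over α in (11) is available in closed form: the optimal α centers the log-error, so log d(·) is half the spread of log M − log M̂. A small sketch of ours with made-up messages, which also exhibits the scale insensitivity noted in the following paragraph:

```python
import numpy as np

def log_dynamic_range(M, M_hat):
    """log d(M, M_hat) per (10)-(11) for discrete messages. The minimizing alpha
    in (11) centers the log-error, so the result is half the spread of
    log M - log M_hat; rescaling either message leaves it unchanged."""
    e = np.log(M) - np.log(M_hat)
    return 0.5 * (e.max() - e.min())

M = np.array([0.6, 0.3, 0.1])
M_hat = np.array([0.5, 0.35, 0.15])
print(log_dynamic_range(M, M_hat))
print(log_dynamic_range(M, 7.0 * M_hat))   # identical: scaling is irrelevant
```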

The measure d(·) has several very important properties. First, as with the KLD, it is insensitive to the (irrelevant) scaling of entire messages. Second, the sup-norm of the error in the log-messages is bounded below by log d(·) and above by 2 log d(·), enabling one to compute simple bounds on the effects of message approximation. Most importantly, d(·) satisfies conditions that bound how this error propagates through the two steps of message generation contained in (5). Specifically, log d(·) satisfies a subadditivity condition with respect to the "product" operation in the sum-product algorithm. Furthermore, one can compute a bound on the mixing that results from the "sum" part of message generation. If there is sufficient mixing associated with this transition, the error is attenuated—the sum operation acts as a contraction. It is straightforward to specify a bound on how the log of the dynamic range contracts. A particular measure of the strength of a potential ψ_{ts} that turns out to be analytically convenient (i.e., one can derive bounds) is

S(\psi_{ts}) = \sup_{a,b,c,d}\; \sup_{\psi_s, \psi_t} \frac{\psi_{ts}(a,b)\, \psi_t(a)\, \psi_s(b)}{\psi_t(c)\, \psi_s(d)\, \psi_{ts}(c,d)},   (12)



and if one denotes the product of incoming messages and local potential as M, one may show that

d\left( \sum \psi_{ts} M, \sum \psi_{ts} \hat{M} \right) \le \frac{S(\psi_{ts})\, d(M, \hat{M}) + 1}{S(\psi_{ts}) + d(M, \hat{M})}.   (13)

These two conditions provide the basis for bounding how errors propagate through multiple iterations of the sum-product algorithm. This propagation is most easily visualized through a computation tree, found by "unwrapping" a loopy graph into a tree as shown in Figure 7. Combining the subadditivity and contraction bounds, it is then possible to compute a bound on the log of the dynamic range after any number of iterations of sum-product. Interestingly, this procedure also provides important new results for sum-product without message approximations, namely the best conditions known to date for algorithm convergence and bounds on the distance between fixed points of the algorithm [14].
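The flavor of the resulting bound is easy to reproduce numerically. The sketch below is a simplification of ours, assuming a homogeneous graph in which every node has the same degree and every message incurs the same fresh approximation error per iteration (rather than the general computation-tree analysis of [14]); it alternates the log-subadditive product step with the contraction (13):

```python
import numpy as np

def contract(S, d):
    """Bound (13): dynamic range of the outgoing error after the 'sum' step,
    given potential strength S and incoming (product) error d."""
    return (S * d + 1.0) / (S + d)

def error_bound(log_S, log_delta, degree, iters):
    """Iterate subadditivity (product step: log-errors add) plus contraction (13),
    with a fresh quantization error of log-dynamic-range `log_delta` per message
    per iteration, on a hypothetical graph of uniform node degree."""
    log_d = log_delta                       # error on each message after iteration 1
    S = np.exp(log_S)
    for _ in range(iters - 1):
        log_in = (degree - 1) * log_d       # product of (degree-1) erroneous messages
        log_d = np.log(contract(S, np.exp(log_in))) + log_delta
    return log_d

# Weak potentials: the bound settles to a finite fixed point instead of growing.
for it in (1, 2, 5, 20):
    print(it, error_bound(log_S=0.25, log_delta=0.05, degree=3, iters=it))
```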

Since the analysis yields bounds, there are cases in which the results it provides can be conservative. As an alternative, we have also developed an approximation (not a bound) built on the same principle used in roundoff noise analysis of digital filters, namely that at each point at which messages are approximated, the log dynamic range error can be modeled as a white noise perturbation. This then leads to easily computed approximations to the variance in this measure.

Figure 8 shows two illustrations of these analyses for a case in which errors are introduced by message quantization, one for relatively weak edge potentials (ones for which convergence of sum-product is guaranteed) and one for stronger edge potentials (typical of cases in which one would expect our strict bounds to be conservative). These curves depict the resulting error measures as functions of the quantization error introduced at each stage. Note that even in the difficult case of strong potentials the approximation provides an accurate (if somewhat conservative) estimate of the resulting error. Furthermore, taken together with the relationship between communication costs and message approximation error, we now have the elements needed for a complete "audit trail" from bits used to message errors incurred and finally to resulting errors in the desired fused outputs.


[FIG8] Maximum errors incurred as a function of the quantization error δ in a quantized implementation of sum-product. The scatterplot indicates the maximum error measured in the graph for each of 200 Monte Carlo runs (for both positive-correlation and mixed-correlation potentials), as compared to our strict upper bound (solid) and stochastic approximation (dashed). (a) log S(ψ) = 0.25 and (b) log S(ψ) = 1. [Axes: δ versus max log d(M, M̂), on logarithmic scales.]

[FIG7] A (a) loopy graphical model and (b) its associated 3-levelcomputation tree with root node w. The messages received atnode w after three iterations of loopy sum-product in theoriginal graph is equivalent to the messages received at the rootnode of the tree after three (or more) iterations.

x y

z

w

(a)

y

z

z y

x y z

xwww

w

x

w

(b)



OPEN QUESTIONS AND FUTURE DIRECTIONS

We have described a line of research that provides a bridge from the rich field of graphical models to the emerging field of data fusion for sensor networks. We have i) demonstrated how two prototypical sensor network problems can be mapped to problems of inference on graphical models, ii) discussed issues that arise when one attempts to solve these inference problems, iii) developed a particle-based method for message passing as well as communications-sensitive messaging strategies, and iv) presented an approach to analyzing the impact of messaging errors (due to approximation, communication limitations, or simply message dropouts) on the resulting fusion results. While these results and lines of inquiry provide substantive results of importance for sensor network applications, they are far from the complete story in moving from the confines of graphical model inference to the domain of sensor networks. Indeed, there are a number of questions, each of which is the subject of current research and raises new issues not seen in the graphical models literature but that have strong conceptual ties to that literature and that offer considerable promise for the future.

One such issue deals with how one maps inference responsibility to sensor nodes. As we have already seen in the data-association example, there is considerable flexibility in how one maps a sensor network fusion problem to a graphical model. The models we have chosen here have some nodes explicitly identified with individual sensor nodes and other, hidden or "virtual," nodes that correspond to groups of objects or regions in space. Where do the inference computations associated with these parts of the model reside? Moreover, in a dynamic problem, as objects move, one would expect these hidden variables to "migrate," which raises a new resource allocation problem, namely that of sensor handoff, i.e., deciding when and to whom the responsibility for such nodes should be handed off. Such a handoff involves communication (e.g., of particle-based representations of the object location and velocity), and the analysis in this article provides tools to quantify the communication cost/accuracy tradeoff of different handoff strategies, as the sketch below illustrates in miniature. Having such costs allows one to develop dynamic strategies not only for querying sensors (since taking measurements and transmitting messages use power) but also for handing off fusion responsibility [36].
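As a toy illustration of this cost/accuracy accounting (ours, not the sensor-management formulation of [36]), the following sketch hands off a belief by transmitting a subsampled particle set and tallies the bits spent against a simple moment-matching error; the particle model, the subsample sizes, and the 32-bit-per-value transmission cost are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
particles = rng.normal(loc=2.0, scale=0.5, size=5000)  # node's belief over target position

def handoff(particles, k, bits_per_value=32):
    """Subsample k particles to transmit; return (cost in bits, error proxy)."""
    sample = rng.choice(particles, size=k, replace=False)
    # crude error proxy: discrepancy between full and transmitted first two moments
    err = abs(particles.mean() - sample.mean()) + abs(particles.std() - sample.std())
    return k * bits_per_value, err

for k in (10, 100, 1000):
    cost, err = handoff(particles, k)
    print(f"k = {k:4d}: {cost:6d} bits, moment error ≈ {err:.4f}")
```

Larger particle sets cost proportionally more bits but shrink the approximation error, which is precisely the tradeoff a handoff strategy must negotiate.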

In addition, there is considerable motivation to consider alternative fusion algorithms to the ones based on sum-product or max-product. For example, one can imagine operational structures in which "seed" nodes initiate messaging, propagating information radially outward, fusing information around these radially expanding regions as they meet, and then propagating information back inward toward the seed nodes. Such an algorithmic structure allows great flexibility (e.g., one can imagine allowing any sensor to act as a seed if it measures something of interest) and also leads to new algorithms with great promise for inference on graphical models more generally. We refer the reader to [17] for a first treatment of this approach.
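To make the seed-initiated structure concrete, here is a minimal sketch of an outward/inward two-pass message schedule on a small loop-free graph; the graph, the seed choice, and the schedule representation are hypothetical, and this is not the algorithm developed in [17].

```python
from collections import deque

# a hypothetical adjacency list for a small loop-free sensor graph
graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}

def two_pass_schedule(graph, seed):
    """Outward (seed -> leaves) then inward (leaves -> seed) message order."""
    order, seen, q = [], {seed}, deque([seed])
    while q:                           # BFS outward from the seed
        u = q.popleft()
        for v in graph[u]:
            if v not in seen:
                seen.add(v)
                order.append((u, v))   # outward message u -> v
                q.append(v)
    # inward pass: reverse every outward edge, deepest nodes first
    return order + [(v, u) for (u, v) in reversed(order)]

print(two_pass_schedule(graph, seed=0))
```

On a tree, one such outward/inward sweep suffices for exact fusion; the interest in the radial schemes described above lies in letting multiple seeds expand simultaneously and reconcile where their regions meet.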

Also, the computation tree interpretation of the sum-product algorithm allows one to see clearly the computations that sum-product fails to make that a truly optimal algorithm would: computations that in essence take into account the dependencies between messages that sum-product neglects [16]. This suggests another line of research, one that focuses on a significant difference between standard graphical model inference problems and sensor networks. In particular, when viewed as a sensor network fusion algorithm, sum-product makes very little use of local node memory and computational power (all that is remembered from step to step are the most recent messages, and all that is computed are essentially the sum-product computations). Can we develop algorithms that use more memory and perform more local computation and that, as a result, reduce the number of messages that need to be sent? Several ideas along these lines are currently under investigation. Also, a standard feature in wireless networks is the inclusion of header bits that provide information on the path a message has taken from initiator to receiver. Can we take advantage of such header bits to capture dependencies between messages, so that messages can be fused in a manner less naïve than assuming conditional independence? Of course, using header bits for what we term informational pedigree means that fewer bits are available for the actual message, so that the message quantization error will be larger. How does the error incurred by this increased quantization compare to the additional fusion accuracy provided by the pedigree bits? Current research, building on what we have presented here, is addressing this question.
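The bit-budget tension raised by the last question can be made concrete with a small back-of-the-envelope sketch: under an assumed fixed per-message budget, every pedigree bit spent recording the path coarsens the payload quantization. The budget size and per-hop identifier cost below are hypothetical.

```python
TOTAL_BITS = 32  # hypothetical fixed per-message bit budget

def split_budget(path_len, id_bits=4):
    """Spend id_bits per hop on pedigree; the remainder quantizes the payload."""
    pedigree = id_bits * path_len
    payload = max(TOTAL_BITS - pedigree, 0)
    delta = 2.0 ** (-payload)          # quantization step for a unit-range value
    return pedigree, payload, delta

for hops in (0, 2, 4, 6):
    p, q, d = split_budget(hops)
    print(f"{hops} hops: pedigree = {p:2d} b, payload = {q:2d} b, quant step ≈ {d:.2e}")
```

Whether the exponentially coarser quantization is worth the dependency information the pedigree captures is exactly the open question posed above.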

Finally, there is the introduction of ideas from team theory and decentralized decision making. In particular, while we have described message approximation and censoring methods, they are all based solely on the perspective of the sending node, without explicit regard for the objectives (e.g., in terms of desired fusion results) of the receiver. Taking such objectives into account makes the problem of network fusion system design one of team decision making, a notoriously difficult problem if one seeks truly optimal solutions. In [3], the authors describe a set of results for a widely studied problem of distributed detection in which the sensors in a network must provide limited numbers of bits, through a possibly uncertain channel, to a "fusion center," which is then responsible for making an overall decision. As discussed in [3], the fact that these sensors act as a "team" has a significant effect on the signal processing strategy employed by each individual sensor.
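For a flavor of the fusion-center setting, the following toy sketch (our illustration, not the formulation of [3]) has each sensor send one thresholded bit over a binary symmetric channel, and the fusion center combine the received bits in a likelihood ratio test; the Gaussian signal model, local threshold, and channel flip probability are all assumptions.

```python
import math
import random

random.seed(0)
N, MU, FLIP = 25, 0.5, 0.1            # sensors, signal mean under H1, channel flip prob.

def Phi(z):                            # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sensor_bit(h1_true):
    y = random.gauss(MU if h1_true else 0.0, 1.0)
    b = 1 if y > MU / 2 else 0         # local one-bit quantization rule
    return b ^ (random.random() < FLIP)  # binary symmetric channel

# probability each received bit equals 1 under each hypothesis
q = 1.0 - Phi(MU / 2)                  # P(bit = 1 | H0) before the channel
p = 1.0 - Phi(-MU / 2)                 # P(bit = 1 | H1) before the channel
q, p = q * (1 - FLIP) + (1 - q) * FLIP, p * (1 - FLIP) + (1 - p) * FLIP

bits = [sensor_bit(h1_true=True) for _ in range(N)]
llr = sum(math.log(p / q) if b else math.log((1 - p) / (1 - q)) for b in bits)
print("decide", "H1" if llr > 0 else "H0", f"(LLR = {llr:.2f})")
```

Even in this toy version, the team aspect is visible: the local threshold each sensor should use depends on what the channel and the fusion rule will do with its bit, not on its own error rate alone.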

In [19] we consider a problem in which both the sensing and the decision making are distributed throughout the network, i.e., each node acts as a provider of information to other nodes, a receiver of information from others, and potentially also a decision maker for part of the overall inference problem.


We examine all of this in the context of a loop-free directed network and in the process expose several key questions that we believe will fuel considerable research. The first is that, in the presence of communication limitations, and even if the fusion responsibilities of each node have been specified (e.g., we have specified to whom responsibility for inference on hidden variables will fall), the nodes must organize to specify what we term a fusion protocol: not only the rules by which a sending node decides when and what bits to transmit, based on the impact of those bits on the fusion decisions made by other nodes subsequently affected by them, but also the specification of enough information at the receiving node to know how to interpret the received information. Roughly speaking, the specification of such a protocol corresponds to the design of the free parts of a composite graphical model, involving i) the variables the network senses and about which we wish to make inferences; ii) the variables that are transmitted from node to node; and iii) the decisions made by each node and their relationship to the external phenomenon, i.e., the overall objective function represented in distributed fashion. Only the second of these is generally under our control (assuming that the fusion responsibilities in iii) have been determined). Problems of this type can capture very subtle effects, ones that can be important but that also make clear the intractability of full optimization and the need for new methods of approximate solution, much as algorithms such as sum-product represent principled suboptimal algorithms that are in fact optimal for particularly nice (namely loop-free) graphs. For example, suppose that each node in a network can either not communicate with its neighbor or send one bit, at a cost of some power. Once a network organizes, i.e., once the neighbor knows something about the rule that the transmitting node will be using, then no news is indeed news: not receiving a bit from its neighbor tells a node something! This idea has already shown benefit in the sensor localization problem.
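The "no news is news" observation can be made concrete with a minimal sketch under an assumed scalar Gaussian model: a node transmits a single bit only when its local log-likelihood ratio exceeds an agreed threshold, and a receiver that knows this rule extracts a likelihood ratio even from silence. This is purely illustrative and is not the protocol of [19].

```python
import math

MU, SIGMA, THRESH = 1.0, 1.0, 2.0      # toy Gaussian model and agreed threshold

def Phi(z):                             # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sender(y):
    """One-sided censoring: spend a bit only when evidence for H1 is strong."""
    llr = 2.0 * MU * y / SIGMA**2       # log p(y|H1)/p(y|H0) for means +MU vs -MU
    return 1 if llr > THRESH else None  # stay silent otherwise

T = THRESH * SIGMA**2 / (2.0 * MU)      # censoring boundary in y-space

def receiver_update(logodds, msg):
    """The receiver knows the rule, so silence itself carries a likelihood ratio."""
    if msg is None:                     # P(y <= T | H1) vs P(y <= T | H0)
        return logodds + math.log(Phi((T - MU) / SIGMA) / Phi((T + MU) / SIGMA))
    return logodds + math.log((1 - Phi((T - MU) / SIGMA)) /
                              (1 - Phi((T + MU) / SIGMA)))

print(receiver_update(0.0, None))       # silence nudges belief toward H0
print(receiver_update(0.0, 1))          # a received bit pushes strongly toward H1
```

Designing the threshold, and the receiver's interpretation of silence, jointly across the whole network is exactly what turns this into a team decision problem.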

These questions and issues form just the tip of the iceberg and make it clear that sensor network problems are truly ones of information science in the large: they are not problems that separate nicely into signal processing, communications, and optimization problems; they are all of those at the same time. We are all in for a long, enjoyable, and educational ride.

AUTHORS
Mujdat Çetin ([email protected]) received the Ph.D. degree from Boston University in 2001 in electrical engineering. Since 2001, he has been a member of the Laboratory for Information and Decision Systems (LIDS), Massachusetts Institute of Technology, Cambridge, where he is a research scientist. Since September 2005 he has also been an assistant professor at Sabancı University, Istanbul, Turkey. His research interests include statistical signal and image processing, inverse problems, computer vision, data fusion, wireless sensor networks, biomedical information processing, radar imaging, and sensor array signal processing. He is a Member of the IEEE.

Lei Chen ([email protected]) received the B.S. degree from Tsinghua University, Beijing, in 1997 and the M.S. degree from the Chinese Academy of Sciences, Beijing, in 2000. Since then he has been a Ph.D. student at the Massachusetts Institute of Technology, where he is currently a research assistant in the Stochastic Systems Group, LIDS. His research interests include statistical inference, multiple target tracking, and sensor networks.

John W. Fisher III ([email protected]) received the Ph.D. degree in electrical and computer engineering from the University of Florida, Gainesville, in 1997. He is currently a principal research scientist in the Computer Science and Artificial Intelligence Laboratory and is affiliated with the Laboratory for Information and Decision Systems, both at the Massachusetts Institute of Technology. His research interests include information-theoretic approaches to signal processing, multimodal data fusion, machine learning, and sensor networks. He is a Member of the IEEE.

Alexander T. Ihler ([email protected]) received the B.S. degree from the California Institute of Technology in 1998 and the S.M. and Ph.D. degrees from the Massachusetts Institute of Technology in 2000 and 2005, respectively, as a member of the Stochastic Systems Group. He is currently a postdoctoral scholar at the University of California, Irvine. His research interests include statistical signal processing, machine learning, nonparametric statistics, distributed systems, and sensor networks. He is a Member of the IEEE.

Randolph L. Moses ([email protected]) received the B.S., M.S., and Ph.D. degrees in electrical engineering from Virginia Polytechnic Institute and State University in 1979, 1980, and 1984, respectively. Since 1985 he has been with the Department of Electrical Engineering, Ohio State University, where he is currently a professor. He was a summer faculty research fellow at Rome Air Development Center, New York, and a NATO postdoctoral fellow at Eindhoven University of Technology, The Netherlands. He has spent sabbaticals at Uppsala University, Sweden; the Air Force Research Laboratory in Ohio; and the Massachusetts Institute of Technology. He was an associate editor for IEEE Transactions on Signal Processing and served on the IEEE Signal Processing Society Technical Committee on Statistical Signal and Array Processing. He is coauthor, with P. Stoica, of Spectral Analysis of Signals (Prentice Hall, 2005). He is a member of Eta Kappa Nu, Tau Beta Pi, Phi Kappa Phi, and Sigma Xi. His research interests are in stochastic signal processing and include parametric estimation, array signal processing, sensor networks, and radar imaging. He is a Senior Member of the IEEE.


Martin J. Wainwright ([email protected]) is an assistant professor at the University of California, Berkeley, with a joint appointment between the Department of Statistics and the Department of Electrical Engineering and Computer Sciences. He received the Ph.D. degree in electrical engineering and computer science from the Massachusetts Institute of Technology. His research interests include statistical signal and image processing, coding and information theory, and machine learning. He has been awarded an Alfred P. Sloan Foundation Fellowship, the George M. Sprowls Prize for his dissertation research, a Natural Sciences and Engineering Research Council of Canada 1967 Fellowship, and several outstanding conference paper awards.

Alan S. Willsky ([email protected]) joined MIT in 1973 and is the Edwin Sibley Webster Professor of Electrical Engineering. He was a founder and chief scientific consultant of Alphatech, Inc., a role in which he continues at BAE Systems Advanced Information Technologies. From 1998 to 2002 he served on the U.S. Air Force Scientific Advisory Board. He has received several awards, including the 1975 American Automatic Control Council Donald P. Eckman Award, the 1979 ASCE Alfred Noble Prize, the 1980 IEEE Browder J. Thompson Memorial Award, the IEEE Control Systems Society Distinguished Member Award in 1988, the 2004 IEEE Donald G. Fink Prize Paper Award, and a Doctorat Honoris Causa from the Université de Rennes in 2005. He has delivered numerous keynote addresses and is coauthor of the book Signals and Systems. His research interests are in the development and application of advanced methods of estimation and statistical signal and image processing. He is a Fellow of the IEEE.

REFERENCES
[1] S.A. Aldosari and J.M.F. Moura, "Decision making in sensor networks: Topology issues," submitted for publication.

[2] P. Brémaud, Markov Chains, Gibbs Fields, Monte Carlo Simulation, and Queues. New York: Springer-Verlag, 1991.

[3] B. Chen, L. Tong, and P. Varshney, "Channel aware distributed detection in wireless sensor networks," IEEE Signal Processing Mag., vol. 23, no. 4, pp. 16–25, 2006.

[4] L. Chen, M. Çetin, and A.S. Willsky, "Distributed data association for multi-target tracking in sensor networks," in Proc. Information Fusion 2005, Philadelphia, PA, 2005 [CD-ROM].

[5] L. Chen, M.J. Wainwright, M. Çetin, and A.S. Willsky, "Data association based on optimization in graphical models with applications to sensor networks," Math. Comput. Model. (special issue on optimization and control for military applications), vol. 43, no. 9–10, pp. 1114–1135, May 2006.

[6] R.G. Cowell, A.P. Dawid, S.L. Lauritzen, and D.J. Spiegelhalter, Probabilistic Networks and Expert Systems, Statistics for Engineering and Information Science. New York: Springer-Verlag, 1999.

[7] E.B. Ermis, M. Alanyali, and V. Saligrama, "Search and discovery in an uncertain networked world," IEEE Signal Processing Mag., vol. 23, no. 4, pp. 107–118, 2006.

[8] J. Feldman, M.J. Wainwright, and D.R. Karger, "Using linear programming to decode binary linear codes," IEEE Trans. Inform. Theory, vol. 51, pp. 954–972, Mar. 2005.

[9] W.T. Freeman and Y. Weiss, "On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 736–744, 2001.

[10] M. Gastpar, M. Vetterli, and P.L. Dragotti, "Sensing reality and communicating bits: A dangerous liaison," IEEE Signal Processing Mag., vol. 23, no. 4, pp. 70–83, 2006.

[11] A.T. Ihler, E.B. Sudderth, W.T. Freeman, and A.S. Willsky, "Efficient multiscale sampling from products of Gaussian mixtures," in Proc. Neural Inform. Processing Syst., 2003, vol. 17 [CD-ROM].

[12] A.T. Ihler, "Inference in sensor networks: Graphical models and particle methods," Ph.D. dissertation, MIT, June 2005.

[13] A.T. Ihler, J.W. Fisher III, R.L. Moses, and A.S. Willsky, "Nonparametric belief propagation for self-localization of sensor networks," IEEE J. Select. Areas Commun., vol. 23, no. 4, pp. 809–819, Apr. 2005.

[14] A.T. Ihler, J.W. Fisher III, and A.S. Willsky, "Loopy belief propagation: Convergence and effects of message errors," J. Mach. Learning Res., vol. 6, pp. 905–936, May 2005.

[15] A.T. Ihler, J.W. Fisher III, and A.S. Willsky, "Using sample-based representations under communications constraints," submitted for publication.

[16] J.K. Johnson, D.M. Malioutov, and A.S. Willsky, "Walk-sum interpretation and analysis of Gaussian belief propagation," Adv. Neural Inform. Processing Syst., vol. 18, 2006.

[17] J.K. Johnson and A.S. Willsky, "A recursive model-reduction method for approximate inference in Gaussian Markov random fields," submitted for publication.

[18] V. Kolmogorov and M.J. Wainwright, "On optimality properties of tree-reweighted message-passing," in Proc. Uncertainty Artificial Intell., July 2005 [CD-ROM].

[19] O.P. Kreidl and A.S. Willsky, "Inference with minimal communication: A decision-theoretic variational approach," in Proc. Advances Neural Inform. Processing Syst., 2006, vol. 18 [CD-ROM].

[20] F. Kschischang, "Codes defined on graphs," IEEE Signal Processing Mag., vol. 41, no. 4, pp. 118–125, Aug. 2003.

[21] F. Kschischang, B. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.

[22] T. Kurien, "Issues in the design of practical multitarget tracking algorithms," in Multitarget-Multisensor Tracking: Advanced Applications, vol. 1, Y. Bar-Shalom, Ed. Norwood, MA: Artech House, 1990, pp. 43–83.

[23] H.-A. Loeliger, "An introduction to factor graphs," IEEE Signal Processing Mag., vol. 21, no. 1, pp. 28–41, 2004.

[24] R. Moses, D. Krishnamurthy, and R. Patterson, "Self-localization for wireless networks," Eurasip J. Appl. Signal Processing, vol. 2003, no. 4, pp. 348–358, 2003.

[25] N. Patwari, A.O. Hero III, M. Perkins, N.S. Correal, and R.J. O'Dea, "Relative location estimation in wireless sensor networks," IEEE Trans. Signal Processing, vol. 51, no. 8, pp. 2137–2148, Aug. 2003.

[26] J. Pearl, Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann, 1988.

[27] J.B. Predd, S.R. Kulkarni, and H.V. Poor, "Distributed learning in wireless sensor networks," IEEE Signal Processing Mag., vol. 23, no. 4, pp. 56–69, 2006.

[28] T. Richardson and R. Urbanke, "The capacity of low-density parity check codes under message-passing decoding," IEEE Trans. Inform. Theory, vol. 47, pp. 599–618, Feb. 2001.

[29] E.B. Sudderth, A.T. Ihler, W.T. Freeman, and A.S. Willsky, "Nonparametric belief propagation," Comput. Vis. Pattern Recognit., vol. 1, pp. 609–612, 2003.

[30] S. Thrun, "Affine structure from sound," in Advances in Neural Information Processing Systems 18, Y. Weiss, B. Schölkopf, and J. Platt, Eds. Cambridge, MA: MIT Press, 2006, pp. 1355–1362.

[31] M.J. Wainwright, T. Jaakkola, and A.S. Willsky, "Tree consistency and bounds on the performance of the max-product algorithm and its generalizations," Statist. Computing, vol. 14, pp. 143–166, Apr. 2004.

[32] M.J. Wainwright, T.S. Jaakkola, and A.S. Willsky, "Tree-based reparameterization analysis of sum-product and its generalizations," IEEE Trans. Inform. Theory, vol. 49, no. 5, pp. 1120–1146, May 2003.

[33] M.J. Wainwright, T.S. Jaakkola, and A.S. Willsky, "MAP estimation via agreement on (hyper)trees: Message-passing and linear-programming approaches," IEEE Trans. Inform. Theory, vol. 51, no. 11, pp. 3697–3717, Nov. 2005.

[34] M.J. Wainwright, T.S. Jaakkola, and A.S. Willsky, "A new class of upper bounds on the log partition function," IEEE Trans. Inform. Theory, vol. 51, no. 7, pp. 2313–2335, July 2005.

[35] M.J. Wainwright and M.I. Jordan, "A variational principle for graphical models," in New Directions in Statistical Signal Processing, S. Haykin, J. Principe, T. Sejnowski, and J. McWhirter, Eds. Cambridge, MA: MIT Press, 2005.

[36] J.L. Williams, J.W. Fisher III, and A.S. Willsky, "An approximate dynamic programming approach to a communication constrained sensor management problem," in Proc. 8th Int. Conf. Information Fusion, July 2005 [CD-ROM].

[37] J.-J. Xiao, A. Ribeiro, Z.-Q. Luo, and G.B. Giannakis, "Distributed compression-estimation using wireless sensor networks," IEEE Signal Processing Mag., vol. 23, no. 4, pp. 26–41, 2006.

[38] J.S. Yedidia, W.T. Freeman, and Y. Weiss, "Constructing free energy approximations and generalized belief propagation algorithms," IEEE Trans. Inform. Theory, vol. 51, no. 7, pp. 2282–2312, July 2005. [SP]