On Distributing Probabilistic Inference
Post on 05-Jan-2016
1
On Distributing Probabilistic Inference
Metron, Inc.
Thor Whalen
2
Outline
• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
3
Outline
• Inference and distributed inference
  – Probabilistic inference
  – Distributed probabilistic inference
  – Use of distributed probabilistic inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
4
Probabilistic Inference
• “(Probabilistic) inference, or model evaluation, is the process of updating probabilities of outcomes based upon the relationships in the model and the evidence known about the situation at hand.” (research.microsoft.com)
• Let V = {X1, ..., Xn} be the set of variables of interest
• We are given a (prior) probability space P(V) on V
• Bayesian Inference: Given some evidence e, we compute the posterior probability space P′ = P(V|e)
5
Probabilistic Inference
• A probability space on a finite set V of discrete variables can be represented as a table containing the probability of every combination of states of V
• Such a table is commonly called a “potential”
• A receives evidence → have P(e|A) → want P(A,B,C|e)
[Table: the joint potential P(A,B,C) with entries 0.2240, 0.0960, 0.0120, 0.0080, 0.0540, 0.1260, 0.2880, 0.1920, and the likelihood P(e|A) with values 0.9000 and 0.3000]
6
Probabilistic Inference
[Table: the joint potential P(A,B,C) (0.2240, 0.0960, 0.0120, 0.0080, 0.0540, 0.1260, 0.2880, 0.1920) is multiplied entrywise by the likelihood P(e|A) expanded to one value per row (0.9000, 0.9000, 0.3000, 0.3000, 0.3000, 0.3000, 0.9000, 0.9000), giving the unnormalized product (0.2016, 0.0864, 0.0036, 0.0024, 0.0162, 0.0378, 0.2592, 0.1728), which normalizes to the posterior P(A,B,C|e) (0.2585, 0.1108, 0.0046, 0.0031, 0.0208, 0.0485, 0.3323, 0.2215)]
• A probability space on a finite set V of discrete variables can be represented as a table containing the probability of every combination of states of V
• Such a table is commonly called a “potential”
• A receives evidence → have P(e|A) → want P(A,B,C|e)
• Assuming evidence e only depends on variable A, we have P(A,B,C|e) ∝ P(A,B,C)P(e|A) (equal after normalization)
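The update sketched on these slides can be reproduced in a few lines of plain Python. The row ordering of the table is kept exactly as printed (the state labels of the rows are not recoverable from the transcript): the potential is multiplied entrywise by the per-row likelihood column, then normalized.

```python
# Joint potential P(A,B,C), rows in the order printed on the slide.
joint = [0.2240, 0.0960, 0.0120, 0.0080, 0.0540, 0.1260, 0.2880, 0.1920]

# Likelihood P(e|A) expanded to one value per row of the table
# (0.9000 and 0.3000 are the two per-state likelihood values).
likelihood = [0.9, 0.9, 0.3, 0.3, 0.3, 0.3, 0.9, 0.9]

# Unnormalized posterior: entrywise product P(A,B,C) * P(e|A).
unnorm = [p * l for p, l in zip(joint, likelihood)]

# Normalize to obtain P(A,B,C|e).
total = sum(unnorm)
posterior = [round(u / total, 4) for u in unnorm]

print(posterior)
```

Running this reproduces the normalized column of the slide, 0.2585 through 0.2215.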
7
Distributed Probabilistic Inference
• Several components, each inferring on a given subspace
• Two components may communicate information about variables they have in common
• Wish to be able to fuse evidence received throughout the system
[Figure: two components with their own probability spaces; the variables they share (e.g. B) are the same in both; evidence enters each component separately]
8
Use of Distributed Probabilistic Inference
[Chart: inference run time versus the number of variables n, assuming one million operations per second]
• Be able to implement probabilistic inference
• Implement multi-agent systems:
  – agents sense their environment and take actions intelligently
  – they can observe given variables of the probability space
  – they need to infer on other variables in order to take action
  – cooperate in exchanging information about the probability space
9
Use of Distributed Probabilistic Inference
• Be able to implement probabilistic inference
• Implement multi-agent systems:
  – agents sense their environment and take actions intelligently
  – they can observe given variables of the probability space
  – they need to infer on other variables in order to take action
  – cooperate in exchanging information about the probability space
[Figure: Agent 1 and Agent 2 exchanging information]
10
Outline
• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
11
Conditional independence
• Marginalize out “C” from P(A,B,C) to get P(A,B) (e.g. 0.0120 + 0.0080 gives one entry of P(A,B))
• P(C|A,B) = P(A,B,C)/P(A,B)
• P(C|A,B) = P(C|B)
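A hedged sketch of how such a conditional independence can be checked numerically. The distribution below is constructed as a chain P(A)P(B|A)P(C|B) with made-up numbers (it is not the table from the slides), so A and C are conditionally independent given B by construction; the code forms P(C|A,B) by the division above and confirms it is insensitive to A.

```python
# Construct P(A,B,C) = P(A) P(B|A) P(C|B) over binary variables.
# All numbers are illustrative, not the table from the slides.
pA = [0.6, 0.4]
pB_given_A = [[0.7, 0.3], [0.2, 0.8]]   # pB_given_A[a][b]
pC_given_B = [[0.9, 0.1], [0.5, 0.5]]   # pC_given_B[b][c]

joint = {(a, b, c): pA[a] * pB_given_A[a][b] * pC_given_B[b][c]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}

# P(A,B): marginalize out C.
pAB = {(a, b): joint[(a, b, 0)] + joint[(a, b, 1)]
       for a in (0, 1) for b in (0, 1)}

# P(C|A,B) = P(A,B,C) / P(A,B): eight entries, but only four distinct
# values, because the conditional is insensitive to A.
pC_given_AB = {(a, b, c): joint[(a, b, c)] / pAB[(a, b)]
               for (a, b, c) in joint}

for b in (0, 1):
    for c in (0, 1):
        assert abs(pC_given_AB[(0, b, c)] - pC_given_AB[(1, b, c)]) < 1e-12
```

By construction P(C|A,B) here recovers exactly the rows of P(C|B), which is what the slide's identity asserts.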
12
Conditional independence
• Marginalize out “C” from P(A,B,C) to get P(A,B)
• P(C|A,B) = P(A,B,C)/P(A,B)
• P(C|A,B) = P(C|B)
[Comic dialogue]
“Wait a minute Dr. Watson! P(C|A,B) is a table of size 8, whereas P(C|B) is of size 4! This is entropicaly impossible!”
“Eh, but I see only four distinct numbers here, doc... and ‘entropicaly’ is not a word!”
“Why you little!!”
13
Conditional independence
• Marginalize out “C” from P(A,B,C) to get P(A,B)
• P(C|A,B) = P(A,B,C)/P(A,B)
• P(C|A,B) = P(C|B), i.e.
  P(C = c | A = a, B = b) = P(C = c | B = b) for all (a, b, c) ∈ A × B × C
We say that A and C are conditionally independent given B
[The table of P(C|A,B) is insensitive to A]
14
Outline
• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
  – Bayes nets
  – Markov networks
• Using CI: Sufficient information
15
Bayes Nets
Note that
P(A,B,C,D,E) = P(B|A) P(C|A,B) P(D|A,B,C) P(E|A,B,C,D) P(A)
             = P(B) P(C|A) P(D|B,C) P(E|C) P(A)
[Figure: DAG with nodes A, B, C, D, E]
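The collapse of the chain rule into the shorter factored form can be checked mechanically. In the sketch below the CPD numbers are made up, and the edges A→C, B→D, C→D, C→E are the parent sets the factored form suggests; the joint built from the factors sums to 1, and the chain-rule term P(B|A) indeed collapses to P(B).

```python
from itertools import product

# Illustrative CPDs over binary variables, assuming the edges
# A->C, B->D, C->D, C->E implied by the factored form on the slide.
pA = [0.3, 0.7]
pB = [0.6, 0.4]
pC_A = [[0.8, 0.2], [0.1, 0.9]]        # pC_A[a][c]
pD_BC = {(0, 0): [0.5, 0.5], (0, 1): [0.3, 0.7],
         (1, 0): [0.6, 0.4], (1, 1): [0.4, 0.6]}   # pD_BC[(b, c)][d]
pE_C = [[0.25, 0.75], [0.9, 0.1]]      # pE_C[c][e]

# Joint from the factored form P(B)P(C|A)P(D|B,C)P(E|C)P(A).
joint = {(a, b, c, d, e):
         pB[b] * pC_A[a][c] * pD_BC[(b, c)][d] * pE_C[c][e] * pA[a]
         for a, b, c, d, e in product((0, 1), repeat=5)}
total = sum(joint.values())            # a valid joint: sums to 1

# The chain-rule term P(B|A) collapses to P(B): B is independent of A here.
pAB = {}
for (a, b, c, d, e), v in joint.items():
    pAB[(a, b)] = pAB.get((a, b), 0.0) + v
pB_given_a0 = pAB[(0, 1)] / (pAB[(0, 0)] + pAB[(0, 1)])
```

Here `pB_given_a0` equals `pB[1]` up to floating-point error, confirming that the conditional P(B|A) carries no information about A in this factorization.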
16
Bayes Nets
• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables
17
Bayes Nets
• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables
• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: Variables of V
    • Edges: Causal relations
A DAG is a directed graph with no directed cycles
[Figure: a directed graph over X1, ..., X13 that is a DAG, and a second graph that IS NOT a DAG because it has a directed cycle]
18
Bayes Net
• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables
• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: Variables of V
    • Edges: Causal relations
  – A list of conditional probability distributions (CPDs); one for every node of the DAG
[Figure: the example DAG over X1, ..., X13, with a CPD table attached to each node]
19
Bayes Nets
• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables
• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: variables of V
    • Edges: Causal relations
  – A list of conditional probability distributions (CPDs); one for every node of the DAG
• The DAG exhibits particular (in)dependencies of P(V)
[Figure: the example DAG over X1, ..., X13, with nodes A, B, C highlighted]
A and C are independent given B
  – i.e. P(A, C | B) = P(A | B) P(C | B)
  – i.e. P(C | A, B) = P(C | B)
We say that B separates A and C
20
Bayes Nets
• A Bayes Net is a representation of a probability distribution P(V) on a set V = {X1, ..., Xn} of variables
• A BN consists of
  – A Directed Acyclic Graph (DAG)
    • Nodes: variables of V
    • Edges: Causal relations
  – A list of conditional probability distributions (CPDs); one for every node of the DAG
• The DAG characterizes the (in)dependency structure of P(V)
• The CPDs characterize the probabilistic and/or deterministic relations between parent states and children states
[Figure: the example DAG over X1, ..., X13]
21
Bayes Nets
• The prior distributions on the variables of parentless nodes, along with the CPDs of the BN, induce prior distributions—called “beliefs” in the literature—on all the variables
• If the system receives evidence on a variable:
  – this evidence impacts its belief,
  – along with the beliefs of all other variables
[Figure: the example DAG; evidence enters at X12; X9 and X10 are parentless nodes]
22
Markov networks
• The edges of a Markov network exhibit direct dependencies between variables
• The absence of an edge means absence of direct dependency
• If a set B of nodes separates the graph into several components, then these components are independent given B
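Separation in a Markov network is ordinary graph connectivity after deleting the separating set. A minimal sketch in plain Python, with an illustrative adjacency structure in which the set {B1, B2} plays the role of the separator B:

```python
# A small illustrative Markov network as an adjacency dict:
# A - B1 - B2 - C, so {B1, B2} cuts the graph between A and C.
edges = {
    "A": {"B1"},
    "B1": {"A", "B2"},
    "B2": {"B1", "C"},
    "C": {"B2"},
}

def separated(graph, xs, ys, sep):
    """True if removing `sep` disconnects every node of xs from every node of ys."""
    blocked = set(sep)
    seen = set()
    frontier = [x for x in xs if x not in blocked]
    while frontier:
        node = frontier.pop()
        if node in seen or node in blocked:
            continue
        seen.add(node)
        frontier.extend(graph[node] - blocked)
    return seen.isdisjoint(ys)

# {B1, B2} separates A from C, so A and C are independent given {B1, B2}.
print(separated(edges, {"A"}, {"C"}, {"B1", "B2"}))   # prints True
print(separated(edges, {"A"}, {"C"}, set()))          # prints False
```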
23
Outline
• Inference and distributed inference
• Conditional independence (CI)
• Graphical models, CI, and separability
• Using CI: Sufficient information
  – Specifications for a distributed inference system
  – A naive solution
  – Using separation
24
Using CI: Sufficient information
[Figure: a network over variables A–M, partitioned among Agents 1–4; evidence variables and query variables are marked]
Specifications
• A probability space
• A number of agents, each having
  – query variables
  – evidence variables
What variables must agent 1 represent so that it may fuse the evidence received by other agents?
25
Using CI: Sufficient information
[Figure: the same network; under the naïve solution, Agent 1 represents A and B together with E, F, G, H, I, J, K, L, M]
Specifications
• A probability space
• A number of agents, each having
  – query variables
  – evidence variables
A naïve solution
• Agents contain their own query and evidence variables
• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M
26
Using CI: Sufficient information
[Figure: the same network; Agent 1 represents A and B together with E, F, G, H, I, J, K, L, M]
Specifications
• A probability space
• A number of agents, each having
  – query variables
  – evidence variables
A naïve solution
• Agents contain their own query and evidence variables
• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M
Note that Y separates Z and X, whether Y is equal to:
• {K,L,M},
• {H,I,J}, or
• {E,F,G}.
“Agent 1 must represent many variables! How else could the other agents communicate their evidence?”
27
Using CI: Sufficient information
[Figure: X = {A,B}, Z = {C,D}, Y = {E,F,G}; Agent 1 holds X and Z, Agent 2 holds Z and Y]
Y separates Z and X
→ P(Y|X,Z) = P(Y|Z)
→ P(eY|X,Z) = P(eY|Z), i.e. the likelihood given X and Z of evidence on Y equals the likelihood given Z of evidence on Y
Agent 2 receives evidence eY and computes P(Y,Z|eY); summing out Y (ΣY) gives P(Z|eY)
Agent 1 computes P(X,Z|eY) = P(Z|eY) × P(X,Z) P(Z)^-1
→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X
28
Using CI: Sufficient information
[Figure: X = {A,B}, Z = {C,D}, Y = {E,F,G}; Y separates Z and X]
Y separates Z and X → P(Y|X,Z) = P(Y|Z)
→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X
Because:
P(X,Z|eY) = P(X,Z,eY) P(eY)^-1
          = P(X,Z) P(eY|X,Z) P(eY)^-1
          = P(X,Z) P(eY|Z) P(eY)^-1      (since P(eY|X,Z) = P(eY|Z))
          = P(X,Z) P(Z,eY) P(Z)^-1 P(eY)^-1
          = P(X,Z) P(Z|eY) P(Z)^-1
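The derivation can be verified numerically. The sketch below uses a made-up chain X → Z → Y, so that P(eY|X,Z) = P(eY|Z) holds by construction; "agent 2" computes its posterior on Z from soft evidence on Y, and "agent 1" recovers the exact posterior P(X,Z|eY) from P(X,Z) and that message alone.

```python
# Chain X -> Z -> Y over binary variables; all numbers are illustrative.
pX = [0.4, 0.6]
pZ_X = [[0.7, 0.3], [0.2, 0.8]]     # pZ_X[x][z]
pY_Z = [[0.9, 0.1], [0.3, 0.7]]     # pY_Z[z][y]: Y depends on X only through Z

joint = {(x, z, y): pX[x] * pZ_X[x][z] * pY_Z[z][y]
         for x in (0, 1) for z in (0, 1) for y in (0, 1)}
lik = [0.8, 0.1]                    # P(eY|Y): illustrative soft evidence on Y

def normalized(d):
    s = sum(d.values())
    return {k: v / s for k, v in d.items()}

# Centralized reference: P(X,Z|eY) computed from the full joint.
direct = normalized({(x, z): sum(joint[(x, z, y)] * lik[y] for y in (0, 1))
                     for x in (0, 1) for z in (0, 1)})

# Agent 2 (holds Y and Z): computes and sends its posterior on Z only.
pZ = normalized({z: sum(joint[(x, z, y)] for x in (0, 1) for y in (0, 1))
                 for z in (0, 1)})
pZ_post = normalized({z: sum(joint[(x, z, y)] * lik[y]
                             for x in (0, 1) for y in (0, 1))
                      for z in (0, 1)})

# Agent 1 (holds X and Z): fuses using P(X,Z|eY) = P(X,Z) P(Z|eY) P(Z)^-1.
pXZ = {(x, z): sum(joint[(x, z, y)] for y in (0, 1))
       for x in (0, 1) for z in (0, 1)}
fused = normalized({(x, z): pXZ[(x, z)] * pZ_post[z] / pZ[z]
                    for (x, z) in pXZ})

assert all(abs(fused[k] - direct[k]) < 1e-9 for k in direct)
```

The final assertion shows that the message P(Z|eY) is indeed sufficient: agent 1's fused table matches the posterior a centralized system would have computed.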
29
Using CI: Sufficient information
[Figure: the network with Agent 1 representing A and B plus C and D; C and D are the variables carried on the communication lines]
Specifications
• A Bayes net
• A number of agents, each having
  – query variables
  – evidence variables
A naïve solution
• Agents contain their own query and evidence variables
• In order to receive evidence from the other agents, agent 1 must represent variables E, F, G, H, I, J, K, L, and M
Using separation
• Agent 1 only needs to represent two extra variables (C and D)
• Agent 1 may compute its posterior queries faster from CD than from EFGHIJKLM
• Communication lines need to transmit two variables instead of three
30
Using CI: Sufficient information
[Legend: query variables, evidence variables, other variables]
Let E1 and Q2 be respectively the evidence and query sets of two different agents.
In order for the agent querying Q2 to fuse evidence received on E1 by the other agent, we wish to find a sequence S1, S2, S3, ..., Sk such that
P(Q2 | Sk, E1) = P(Q2 | Sk) and P(Sj+1 | Sj, E1) = P(Sj+1 | Sj) for j = 1, ..., k−1
31
32
MatLab Tool
i + [Enter] to initialize,
v + [Enter] to view all variables (even those containing no information),
e + [Enter] to enter evidence,
c + [Enter] to perform an inter-agent communication,
p + [Enter] to go to the previous step,
n + [Enter] to go to the next step,
a + [Enter] to add a sensor,
r + [Enter] to remove a sensor,
t + [Enter] to turn true marginals view ON,
m + [Enter] to turn discrepancy marking OFF,
s + [Enter] to save to a file,
q + [Enter] to quit.
Enter Command:
Main functionality
• Insert evidence into given agents and propagate its impact inside the subnet
• Initiate communication between agents, followed by the propagation of new information
• View the marginal distributions of the different agents at every step
• Step forward and backward
• Save eye-friendly logs to a file
33
MatLab Tool: Display
* Configuration 2: After evidence L(e|C) = (2,5) has been entered into subnet number 2
The TRUTH (True/False marginals from inference on the entire Bayes Net):
  A: 0.2005/0.7995, C: 0.1434/0.8566, B: 0.4403/0.5597, E: 0.5426/0.4574, D: 0.2780/0.7220, F: 0.2901/0.7099
SUBNET 1 (adjacent to subnets 2): Err(ACB) = 0.0527.
  A: 0.3000/0.7000 (AD = 0.0704), C: 0.2950/0.7050 (AD = 0.1072), B: 0.5100/0.4900 (AD = 0.0493); E, D, F not represented
SUBNET 2 (adjacent to subnets 1, 3):
  C: 0.1434/0.8566, B: 0.4403/0.5597, E: 0.5426/0.4574, D: 0.2780/0.7220; A, F not represented
SUBNET 3 (adjacent to subnets 2): Err(EDF) = 0.0169.
  E: 0.4820/0.5180 (AD = 0.0429), D: 0.2745/0.7255 (AD = 0.0025), F: 0.2969/0.7031 (AD = 0.0048); A, C, B not represented
Enter a command (enter h + [Enter] for help):
• Indicates step number and last action that was taken
• Shows the marginal distributions that would have been obtained by inferring on the entire Bayes Net
• Shows the marginal distributions of the variables represented in each subnet
• Prompts for new action
34
Communication Graph Considerations
[Figure: a communication graph over agents 1–6]
Agent 6 receives info from agent 1 through both agents 4 and 5.
How should subnet 6 deal with possible redundancy?
One solution (often adopted) would be to impose a tree structure on the communication graph
35
36
Communication Graph Considerations
• When choosing the communication graph, one should take into consideration
  – The quality of the possible communication lines
  – The processing speed of the agents
  – The importance of given queries
[Figure: two candidate communication graphs; if a given agent is the key decision-making agent, then one communication graph is more appropriate than the other]
37
Problem Specification
Given:
• A prob. space on V = {X1, ..., Xn}
• A number of agents, each having:
  – Qi: a set of query variables
  – Ei: a set of evidence variables
38
Problem Specification
Given:
• A BN on V = {X1, ..., Xn}
• A number of agents, each having:
  – Qi: a set of query variables
  – Ei: a set of evidence variables
Determine:
• An agent communication graph
• A subset Si of V for each agent
• An inference protocol that specifies
  – How to fuse evidence and messages received from other agents
  – The content of messages between agents
39
Distributed Inference Design
• A communication graph:
  – Nodes represent agents
  – Edges represent communication lines
• Each agent i has:
  – Qi: a set of query variables
  – Ei: a set of evidence variables
  – Pi(Si): a probability space on a subset Si of V
  – An inference protocol. This includes a specification of
    • What to do with received evidence or messages?
    • What messages must be sent to other agents?
40
Distributed Bayesian Inference Problem
Given a set of pairs (Q1, E1), ..., (Qk, Ek), we wish to find an inference scheme so that, given any evidence e = e1, ..., ek (where ei is the set of evidence received by subnet i), the agents may compute the correct posterior on their query variables,
i.e. for all i, Pi (the probability space of agent i) must become consistent with P on its query variables,
i.e. agent i must be able to compute, for every query variable Q of Qi, the probability Pi(Q|e) = P(Q|e)
41
More Definitions
Let X, Y and Z be subsets of V:
• If P is a prob. space on V, I(X,Y|Z)P is the statement “X is independent of Y given Z,” i.e. P(X|Y,Z) = P(X|Z)
• If D is a DAG, I(X,Y|Z)D is the statement “X and Y are d-separated by Z”
• If G is a graph, I(X,Y|Z)G is the statement “X and Y are disconnected by Z”
• Theorem: If D is a Bayes Net for P and G is the moral graph of the ancestor hull of X ∪ Y ∪ Z, then
  I(X,Y|Z)G ↔ I(X,Y|Z)D → I(X,Y|Z)P
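The theorem yields a concrete test, sketched below for a toy DAG (the graph and the sets are made up for illustration): restrict attention to the ancestor hull of X ∪ Y ∪ Z, moralize it (connect co-parents, drop edge directions), and check ordinary graph separation.

```python
# Toy DAG given as child -> set of parents.
parents = {"A": set(), "B": set(), "C": {"A"}, "D": {"B", "C"}, "E": {"C"}}

def ancestral_hull(nodes):
    """All the given nodes together with their ancestors."""
    hull, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in hull:
            hull.add(n)
            stack.extend(parents[n])
    return hull

def d_separated(xs, ys, zs):
    hull = ancestral_hull(xs | ys | zs)
    # Moralize: undirected parent-child edges, plus edges between co-parents.
    adj = {n: set() for n in hull}
    for child in hull:
        ps = parents[child] & hull
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p in ps:
            for q in ps:
                if p != q:
                    adj[p].add(q)
    # Ordinary separation: remove zs and test connectivity.
    seen, stack = set(), [n for n in xs if n not in zs]
    while stack:
        n = stack.pop()
        if n in seen or n in zs:
            continue
        seen.add(n)
        stack.extend(adj[n])
    return seen.isdisjoint(ys)

print(d_separated({"A"}, {"E"}, {"C"}))   # prints True: C blocks A from E
print(d_separated({"A"}, {"E"}, set()))   # prints False
```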
42
Use of Conditional Independence in the Distributed Inference Problem
[Figure: subnet S1 holds evidence variables E1; subnet S2 holds query variables Q2; variables A and B lie between them]
• What should S1 send to S2 so that Q2 may “feel” the effect of evidence received by S1 on E1?
• We want S2 to be able to update its probability space so that P2(Q2 | e1) = P(Q2 | e1)
• Claim: If I(E1,Q2|A,B)P then P1(A,B|e1) is sufficient information for S2 to update its probability space
• “Proof”: P(Q2 | E1,A,B) = P(Q2 | A,B)
43
Using CI: Sufficient information
44
Distributed Bayesian Inference Problem
Given a set of pairs (Q1, E1), ..., (Qk, Ek), we wish to find an inference scheme so that, given any evidence e = e1, ..., ek (where ei is the set of evidence received by subnet i), the subnets may compute the correct posterior on their query variables,
i.e. the Pi must become consistent with P on their query variables,
i.e. subnet i must be able to compute, for all Q of Qi, the probability Pi(Q|e) = P(Q|e)
45
Distributed Bayesian Inference: Inference Protocol
• A message between two subnets is a joint distribution on a common subset of variables, computed from the probability space of the sender
• Subnets remember the last message that each neighboring subnet sent to them
• A subnet divides the new message by the old one and absorbs the result into its probability space
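A minimal sketch of this division/absorption step over a single shared variable, with made-up tables: the receiver divides the incoming message by the last message stored for that neighbor and multiplies its own potential by the ratio, which makes repeated messages idempotent.

```python
# Receiver's potential over (X, Z); messages are distributions over Z alone.
# All numbers are illustrative.
potential = {(0, 0): 0.28, (0, 1): 0.12, (1, 0): 0.12, (1, 1): 0.48}
last_message = {0: 0.4, 1: 0.6}    # prior marginal on Z, as the first "old" message

def absorb(potential, old_msg, new_msg):
    """Divide the new message by the old one and absorb the ratio."""
    updated = {(x, z): p * new_msg[z] / old_msg[z]
               for (x, z), p in potential.items()}
    s = sum(updated.values())
    return {k: v / s for k, v in updated.items()}

new_msg = {0: 0.7, 1: 0.3}         # posterior on Z received from a neighbor
potential = absorb(potential, last_message, new_msg)
last_message = new_msg

# Absorbing the same message again changes nothing (the ratio is 1 everywhere).
again = absorb(potential, last_message, new_msg)
assert all(abs(again[k] - potential[k]) < 1e-12 for k in potential)
```

After absorption the potential's marginal on Z matches the received message, and resending the same message leaves the receiver's table unchanged.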
46
Sufficient information
[Figure: X = {A,B}, Z = {C,D}, Y = {E,F,G}; Agent 1 holds X and Z, Agent 2 holds Z and Y]
Y separates Z and X
→ P(Y|X,Z) = P(Y|Z)
→ P(eY|X,Z) = P(eY|Z), i.e. the likelihood given X and Z of evidence on Y equals the likelihood given Z of evidence on Y
Agent 2 receives evidence eY and computes P(Y,Z|eY); summing out Y (ΣY) gives P(Z|eY)
Agent 1 computes P(X,Z|eY) = P(Z|eY) × P(X,Z) P(Z)^-1
→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X
47
Sufficient information
[Figure: X = {A,B}, Z = {C,D}, Y = {E,F,G}; Y separates Z and X]
Y separates Z and X → P(Y|X,Z) = P(Y|Z)
→ It is sufficient for agent 2 to send its posterior on Z to agent 1 for the latter to compute its posterior on X
Because:
P(X,Z|eY) = P(X,Z,eY) P(eY)^-1
          = P(X,Z) P(eY|X,Z) P(eY)^-1
          = P(X,Z) P(eY|Z) P(eY)^-1      (since P(eY|X,Z) = P(eY|Z))
          = P(X,Z) P(Z,eY) P(Z)^-1 P(eY)^-1
          = P(X,Z) P(Z|eY) P(Z)^-1
48
Communication Graph Considerations
• In a tree communication graph every edge is the only communication line between two parts of the network
• Hence it must deliver enough information so that the evidence received in one part may convey its impact to the query variables of the other part
• We restrict ourselves to the case where every node represented by an agent can be queried or receive evidence
• In this case it is sufficient that the set of variables Z, that will be represented in any communication line, separates the set X of variables of one side of the network from the set Y of variables of the other side
[Figure: Z separating X and Y]