Cognitive Computer Vision
Kingsley Sage
Hilary Buxton
Prepared under ECVision Specific Action 8-3 (http://www.ecvision.org)
So why is Bayes rule relevant to Cognitive CV?
Provides a well-founded methodology for reasoning with uncertainty
These methods are the basis for our model of perception guided by expectation
We can develop well-founded methods of learning rather than just being stuck with hand-coded models
Bayes rule: dealing with uncertainty
Rev. Thomas Bayes, 1702-1761
Sources of uncertainty, e.g.:
– ignorance
– complexity
– physical randomness
– vagueness
Use probability theory to reason about uncertainty
Be careful to understand what you mean by probability and use it consistently:
– frequency analysis
– belief
Probability theory - reminder
p(x): a single value in the range [0,1]. Think of it either as “x is true in 0.7 of cases” (frequentist) or as “I believe x = true with probability 0.7” (belief)
P(X): often (but not always) used to denote a distribution over a set of values, e.g. if X is discrete {x=true, x=false} then P(X) encompasses knowledge of both values. p(x=true) is then a single value.
Probability theory - reminder
Joint probability
$$P(X, Y) \quad \text{also written as} \quad P(X \wedge Y)$$
$$P(X, Y) = P(X \mid Y)\,P(Y)$$
Conditional probability
$$P(X \mid Y) \quad \text{i.e. "X given Y"}$$
Probability theory - reminder
Conditional independence
$$X \perp Y \iff P(X \mid Y) = P(X), \quad \text{and then} \quad P(X, Y) = P(X)\,P(Y)$$
Marginalising
$$P(X, Y) = P(X \mid Y)\,P(Y)$$
$$P(X) = \sum_{Y} P(X \mid Y)\,P(Y)$$
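The marginalising step can be sketched in a few lines of Python. This is a minimal illustration only: the variable names and all the probability values below are hypothetical and do not come from the slides.

```python
# Hypothetical numbers, chosen only for illustration (not from the slides).
p_y = {"y0": 0.3, "y1": 0.7}                  # P(Y)
p_x_given_y = {                               # P(X | Y), indexed as [y][x]
    "y0": {"x0": 0.9, "x1": 0.1},
    "y1": {"x0": 0.2, "x1": 0.8},
}


def joint(x, y):
    """P(X=x, Y=y) = P(X=x | Y=y) P(Y=y)."""
    return p_x_given_y[y][x] * p_y[y]


def marginal_x(x):
    """P(X=x) = sum over all y of P(X=x | Y=y) P(Y=y)."""
    return sum(joint(x, y) for y in p_y)


p_x = {x: marginal_x(x) for x in ("x0", "x1")}
print(p_x)
```

Summing the joint over every value of Y removes Y from the picture, and the resulting values of P(X) still sum to 1.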
Bayes rule – the basics
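[Figure: two-node graph with nodes Y and X]
$$P(X, Y) = P(Y \mid X)\,P(X)$$
$$P(Y, X) = P(X \mid Y)\,P(Y)$$
$$P(X, Y) = P(Y, X)$$
$$P(Y \mid X)\,P(X) = P(X \mid Y)\,P(Y)$$
BAYES RULE
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$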
Bayes rule – the basics
As an illustration, let’s look at the conditional probability of a hypothesis H based on some evidence E
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$
$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{probability of evidence}}$$
Bayes rule – example
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$
Consider a vision system used to detect zebra in static images
It has a “stripey area” operator to help it do this (the evidence E)
Let p(h=zebra present) = 0.02 (prior established during training)
Assume the “stripey area” operator is discrete valued (true/false)
Let p(e=true|h=true)=0.8 (it’s a fairly good detector)
Let p(e=true|h=false)=0.1 (there are non-zebra items with stripes in the data set – like the gate)
Given e, we can establish p(h=true|e=true) …
Bayes rule – example
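$$p(h{=}\text{true} \mid e{=}\text{true}) = \frac{p(e{=}\text{true} \mid h{=}\text{true})\,p(h{=}\text{true})}{p(e{=}\text{true})}$$
Writing $h$ for h=true, $\neg h$ for h=false and $e$ for e=true:
$$p(h \mid e) = \frac{p(e \mid h)\,p(h)}{p(e \mid h)\,p(h) + p(e \mid \neg h)\,p(\neg h)}$$
$$p(h \mid e) = \frac{0.8 \times 0.02}{0.8 \times 0.02 + 0.1 \times 0.98}$$
$$p(h \mid e) = \frac{0.016}{0.016 + 0.098}$$
$$p(h \mid e) = 0.1404$$
Note that this is an increase over the prior = 0.02 due to the evidence e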
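The zebra calculation can be checked numerically. The sketch below uses the three numbers given in the example (prior 0.02, p(e=true|h=true) = 0.8, p(e=true|h=false) = 0.1). The function name `posterior` and its parameter names are illustrative choices, not part of the original material.

```python
def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Bayes rule for a binary hypothesis h and a positive observation e.

    The evidence term p(e) is obtained by marginalising over h and not-h.
    """
    evidence = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / evidence


# Values taken from the zebra example.
p_zebra_given_stripes = posterior(prior=0.02, p_e_given_h=0.8, p_e_given_not_h=0.1)
print(round(p_zebra_given_stripes, 4))  # 0.1404
```

Changing `prior` shows how strongly the result depends on how rare zebras are in the data set.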
Interpretation
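Despite our intuition, our detector does not seem very “good”: even after a positive response, a zebra is present only about 14% of the time
Remember, only 1 in 50 images had a zebra. That means that 49 out of 50 do not contain a zebra, and the detector is not 100% reliable, so some of these images will be incorrectly determined as having a zebra
In 1000 images we would expect 20 zebra images giving 16 true detections, and 980 non-zebra images giving 98 false detections, so only 16 of the 114 detections are zebras
Failing to account for “negative” evidence properly is a typical failing of human intuitive reasoning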
Moving on …
Human intuition is not very Bayesian (e.g. Kahneman et al., 1982).
Be sure to apply Bayes theory correctly
Bayesian networks help us to organise our thinking clearly
Causality and Bayesian networks are related
Bayesian networks
[Figure: example graph with nodes A, B, C, D and E]
Compact representation of the joint probability over a set of variables
Each variable is represented as a node. Each variable can be discrete or continuous
Conditional independence assumptions are encoded using a set of arcs
Set of nodes and arcs is referred to as a graph
The absence of an arc between two nodes implies that they are conditionally independent of each other
Different types of graph exist. The one shown is a Directed Acyclic Graph (DAG)
Bayesian networks - terminology
[Figure: example graph with nodes A, B, C, D and E]
A is called a root node and has a prior only
B, D, and E are called leaf nodes
A “causes” B and “causes” C, so the value of A influences the values of B and C
A is the parent node of B and C
B and C are child nodes of A
To determine E, you need only to know C. E is conditionally independent of A given C
Encoding conditional independence
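[Figure: chain graph A → B → C]
$$P(A, B, C) = P(C \mid A, B)\,P(A, B)$$
$$P(A, B) = P(B \mid A)\,P(A)$$
But $C \perp A$ given $B$ (conditional independence):
$$P(C \mid A, B) = P(C \mid B)$$
$$P(A, B, C) = P(C \mid B)\,P(B \mid A)\,P(A)$$
FACTORED REPRESENTATION
$$P(X_1, X_2, \ldots, X_N) = \prod_{i=1}^{N} P(X_i \mid \text{parents}(X_i))$$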
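As a rough sketch of the factored representation, the chain A → B → C can be written as three small tables whose product gives the joint. All variables are assumed binary here, and every number is hypothetical, chosen only so the example runs.

```python
from itertools import product

# Hypothetical tables for a binary chain A -> B -> C (not from the slides).
p_a = {True: 0.3, False: 0.7}                       # P(A)
p_b_given_a = {True: {True: 0.9, False: 0.1},       # P(B | A), indexed as [a][b]
               False: {True: 0.2, False: 0.8}}
p_c_given_b = {True: {True: 0.6, False: 0.4},       # P(C | B), indexed as [b][c]
               False: {True: 0.05, False: 0.95}}


def joint(a, b, c):
    """P(A, B, C) = P(C | B) P(B | A) P(A): each node conditioned on its parents only."""
    return p_c_given_b[b][c] * p_b_given_a[a][b] * p_a[a]


# A valid joint distribution sums to 1 over all assignments.
total = sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3))
print(total)
```

For three binary variables the full joint table needs 7 independent numbers, whereas this factored form needs only 5 (1 for A, 2 for B given A, 2 for C given B). The saving grows quickly as more nodes are added.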
Specifying the Conditional Probability Terms (1)
For a discrete node C with discrete parents A and B, the conditional probability term P(C|A,B) can be represented as a value table
[Figure: graph with A and B as parents of C. A takes values {red, green, blue}; B and C take values {true, false}]
a=      b=    p(c=T|A,B)
red     T     0.2
red     F     0.1
green   T     0.6
green   F     0.3
blue    T     0.99
blue    F     0.05
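One simple way to hold such a value table in code is a dictionary keyed by the parent values. The sketch below uses the table values from this slide. The names `cpt_c_true` and `p_c_given_ab` are illustrative, and the only thing added is that p(c=F|A,B) is taken as 1 minus the tabulated p(c=T|A,B), since C is binary.

```python
# p(c=T | A, B), keyed by (a, b), using the values from the table.
cpt_c_true = {
    ("red", True): 0.2,
    ("red", False): 0.1,
    ("green", True): 0.6,
    ("green", False): 0.3,
    ("blue", True): 0.99,
    ("blue", False): 0.05,
}


def p_c_given_ab(c, a, b):
    """Look up p(C=c | A=a, B=b); the c=False case is the complement of the table entry."""
    p_true = cpt_c_true[(a, b)]
    return p_true if c else 1.0 - p_true


print(p_c_given_ab(True, "green", True))   # 0.6
```

Only the c=T column needs to be stored because each row of the full table must sum to 1.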
Specifying the Conditional Probability Terms (2)
For a continuous node C with continuous parents A and B, the conditional probability term P(C|A,B) can be represented as a function
[Figure: graph with A and B as parents of C, and a plot of p(c|A,B) over axes A and B]
Specifying the Conditional Probability Terms (3)
For a continuous node C with 1 continuous parent A and 1 discrete parent B, the conditional probability term P(C|A,B) can be represented as a set of functions (the continuous function is selected according to a “context” determined by B)
[Figure: graph with A and B as parents of C, where B takes values {true, false}, and a plot of p(c|A,B) against A]
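To make the idea of a “context” concrete, here is one possible sketch. It assumes, purely for illustration, that for each value of B the density of C is Gaussian with a mean that is linear in A. The slides do not specify the functional form, and every parameter value below is hypothetical.

```python
import math

# Hypothetical parameters: one Gaussian-in-C model per value of the discrete parent B.
params = {
    True: {"slope": 2.0, "intercept": 0.0, "sigma": 1.0},
    False: {"slope": -1.0, "intercept": 3.0, "sigma": 0.5},
}


def p_c_given_ab(c, a, b):
    """Density p(c | A=a, B=b) under the assumed linear Gaussian form.

    B selects which parameter set (which function) is used; A shifts the mean.
    """
    q = params[b]
    mean = q["slope"] * a + q["intercept"]
    z = (c - mean) / q["sigma"]
    return math.exp(-0.5 * z * z) / (q["sigma"] * math.sqrt(2.0 * math.pi))


print(p_c_given_ab(2.0, 1.0, True), p_c_given_ab(2.0, 1.0, False))
```

The same call with a different value of B evaluates a different function of A and c, which is the selection behaviour described above.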
Directed Acyclic Graph (DAG)
[Figure: example graph with nodes A, B, C, D and E, plus a dotted red arc]
Arcs encode “causal” relationships between nodes
No more than 1 path (regardless of arc direction) between any node and any other node, i.e. the graph is singly connected
If we added the dotted red arc, we would have a loopy graph
Loopy graphs can be approximated by acyclic ones for inference, but this is outside the scope of this course
Inference and Learning
Inference
– Calculating a probability over a set of nodes given the values of other nodes
– The two most useful modes of inference are PREDICTIVE (from root to leaf) and DIAGNOSTIC (from leaf to root)
Exact and approximate methods
– Exact methods exist for Directed Acyclic Graphs (DAGs)
– Approximations exist for other graph types
$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{probability of evidence}}$$
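The two modes of inference can be sketched on a small chain A → B → C by brute-force enumeration. This is only an illustration: the network, the binary variables and all numbers are hypothetical, and enumeration is used because it is the simplest exact method to write down, not because the slides prescribe it.

```python
# Hypothetical binary chain A -> B -> C (numbers are illustrative only).
p_a = {True: 0.3, False: 0.7}                       # P(A)
p_b_given_a = {True: {True: 0.9, False: 0.1},       # P(B | A), indexed as [a][b]
               False: {True: 0.2, False: 0.8}}
p_c_given_b = {True: {True: 0.6, False: 0.4},       # P(C | B), indexed as [b][c]
               False: {True: 0.05, False: 0.95}}
VALUES = (True, False)


def joint(a, b, c):
    """Factored joint P(A, B, C) = P(A) P(B | A) P(C | B)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def predictive(c, a):
    """PREDICTIVE (root to leaf): P(C=c | A=a), summing out the hidden node B."""
    return sum(p_b_given_a[a][b] * p_c_given_b[b][c] for b in VALUES)


def diagnostic(a, c):
    """DIAGNOSTIC (leaf to root): P(A=a | C=c) via Bayes rule."""
    numerator = sum(joint(a, b, c) for b in VALUES)
    evidence = sum(joint(a2, b, c) for a2 in VALUES for b in VALUES)
    return numerator / evidence


print(predictive(True, True), diagnostic(True, True))
```

The diagnostic case is the posterior = likelihood × prior / probability of evidence pattern again, with the hidden node B summed out of each term.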
Summary
Bayes rule allows us to deal with uncertain data
Bayesian networks encode conditional independence. Simple DAGs can be used in causal and diagnostic modes