TEMPORAL PROBABILISTIC MODELS
MOTIVATION
Observing a stream of data:
- Monitoring (of people, computer systems, etc.)
- Surveillance, tracking
- Finance & economics
- Science
Questions: modeling & forecasting, unobserved variables
TIME SERIES MODELING
Time occurs in steps t = 0, 1, 2, ...
- A time step can be seconds, days, years, etc.
- State variable Xt, t = 0, 1, 2, ...
- For partially observed problems, we see observations Ot, t = 1, 2, ... and do not see the X's
- The X's are hidden variables (aka latent variables)
MODELING TIME
The arrow of time: causality? Bayesian networks to the rescue.
Causes -> Effects
PROBABILISTIC MODELING
For now, assume the fully observable case.
What parents should each variable have?
[Diagram: candidate networks over X0, X1, X2, X3]
MARKOV ASSUMPTION
Assume Xt+k is independent of all Xi for i < t:
P(Xt+k | X0,...,Xt+k-1) = P(Xt+k | Xt,...,Xt+k-1)
This defines a k-th order Markov chain.
[Diagrams: chains over X0...X3 of order 0, 1, 2, and 3]
1ST ORDER MARKOV CHAIN
Markov chains of order k > 1 can be converted into 1st order MCs [left as an exercise].
So, without loss of generality, "MC" refers to a 1st order MC.
[Diagram: X0 -> X1 -> X2 -> X3]
INFERENCE IN MC
What independence relationships can we read from the BN?
[Diagram: X0 -> X1 -> X2 -> X3]
Observing X1 makes X0 independent of X2, X3, ...
P(Xt|Xt-1) is known as the transition model.
INFERENCE IN MC
Prediction: what is the probability of a future state?
P(Xt) = Σ_{x0,...,xt-1} P(x0,...,xt-1, Xt)
      = Σ_{x0,...,xt-1} P(x0) Π_{i=1..t} P(xi|xi-1)
      = Σ_{xt-1} P(Xt|xt-1) P(xt-1)
The distribution "blurs" over time and approaches a stationary distribution as t grows, which limits prediction power. The rate of blurring is known as the mixing time.
The last line gives an incremental approach: update P(Xt) from P(Xt-1) one step at a time.
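The incremental prediction step can be sketched in a few lines of Python. Everything here (the two-state chain, the function name `predict`, the numbers) is illustrative, not from the slides:

```python
# Incremental Markov-chain prediction: repeatedly apply
# P(X_t) = sum_{x_{t-1}} P(X_t | x_{t-1}) P(x_{t-1}).

def predict(prior, transition, steps):
    """prior[i] = P(X_0 = i); transition[i][j] = P(X_{t+1} = j | X_t = i)."""
    p = list(prior)
    for _ in range(steps):
        p = [sum(p[i] * transition[i][j] for i in range(len(p)))
             for j in range(len(transition[0]))]
    return p

# Two-state chain: the distribution "blurs" toward the stationary
# distribution (here [0.5, 0.5]) as t grows.
T = [[0.7, 0.3],
     [0.3, 0.7]]
print(predict([1.0, 0.0], T, 1))    # [0.7, 0.3]
print(predict([1.0, 0.0], T, 50))   # ~[0.5, 0.5]: mixing has washed out the start state
```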
HOW DOES THE MARKOV ASSUMPTION AFFECT THE CHOICE OF STATE?
Suppose we're tracking a point (x,y) in 2D. What if the point is...
- A momentumless particle subject to thermal vibration?
- A particle with velocity?
- A particle with intent, like a person?
HOW DOES THE MARKOV ASSUMPTION AFFECT THE CHOICE OF STATE?
Suppose the point is the position of our robot, and we observe velocity and intent. What if:
- Terrain conditions affect speed?
- Battery level affects speed?
- Position is noisy, e.g. from GPS?
IS THE MARKOV ASSUMPTION APPROPRIATE FOR:
- A car on a slippery road?
- Sales of toothpaste?
- The stock market?
HISTORY DEPENDENCE
In Markov models, the state must be chosen so that the future is independent of history given the current state.
Often this requires adding variables that cannot be directly observed.
PARTIAL OBSERVABILITY
Hidden Markov Model (HMM)
[Diagram: hidden state variables X0 -> X1 -> X2 -> X3, each Xt emitting an observed variable Ot]
P(Ot|Xt) is called the observation model (or sensor model).
INFERENCE IN HMMS
Four queries: filtering, prediction, smoothing (aka hindsight), and most likely explanation.
[Diagram: HMM X0...X3 with observations O1...O3]
FILTERING
The name comes from signal processing.
P(Xt|o1:t) = Σ_{xt-1} P(xt-1|o1:t-1) P(Xt|xt-1, ot)
where, by Bayes' rule and the Markov property,
P(Xt|xt-1, ot) = P(ot|xt-1, Xt) P(Xt|xt-1) / P(ot|xt-1)
              = α P(ot|Xt) P(Xt|xt-1)
[Diagram: HMM with observations; query variable Xt]
FILTERING
Putting it together:
P(Xt|o1:t) = α Σ_{xt-1} P(xt-1|o1:t-1) P(ot|Xt) P(Xt|xt-1)
This is the forward recursion: if we keep track of P(Xt|o1:t), we get O(1) updates for all t!
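The forward recursion can be sketched as follows. The two-state transition and observation tables are assumed for illustration (umbrella-world-style numbers, not anything from the slides):

```python
# Forward recursion for a discrete HMM: maintain P(X_t | o_{1:t})
# and update it in constant time per step.

def normalize(p):
    s = sum(p)
    return [x / s for x in p]

def forward_step(belief, transition, obs_model, obs):
    """belief[i] = P(x_{t-1} = i | o_{1:t-1});
    transition[i][j] = P(X_t = j | x_{t-1} = i);
    obs_model[j][obs] = P(o_t = obs | X_t = j)."""
    n = len(transition[0])
    predicted = [sum(belief[i] * transition[i][j] for i in range(len(belief)))
                 for j in range(n)]
    return normalize([obs_model[j][obs] * predicted[j] for j in range(n)])

# state 0 = rain, state 1 = no rain; observation 0 = umbrella seen.
T = [[0.7, 0.3], [0.3, 0.7]]
O = [[0.9, 0.1], [0.2, 0.8]]     # O[state][observation]
belief = [0.5, 0.5]
for o in [0, 0]:                  # umbrella seen on two consecutive days
    belief = forward_step(belief, T, O, o)
print(belief)                     # P(rain) rises toward ~0.88
```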
INFERENCE IN HMMS Filtering Prediction Smoothing, aka hindsight Most likely explanation
X0 X1 X2 X3
O1 O2 O3
Query
PREDICTION
P(Xt+k|o1:t) in 2 steps: compute P(Xt|o1:t), then propagate through P(Xt+k|Xt).
Filter, then predict as with a standard MC.
SMOOTHING
P(Xk|o1:t) for k < t:
P(Xk|o1:k, ok+1:t) = P(ok+1:t|Xk, o1:k) P(Xk|o1:k) / P(ok+1:t|o1:k)
                   = α P(ok+1:t|Xk) P(Xk|o1:k)
[Diagram: HMM X0...X3 with observations O1...O3; query Xk]
The second factor is obtained by standard filtering to time k.
SMOOTHING
Computing the backward term P(ok+1:t|Xk):
P(ok+1:t|Xk) = Σ_{xk+1} P(ok+1:t|Xk, xk+1) P(xk+1|Xk)
             = Σ_{xk+1} P(ok+1:t|xk+1) P(xk+1|Xk)
             = Σ_{xk+1} P(ok+2:t|xk+1) P(ok+1|xk+1) P(xk+1|Xk)
[Diagram: HMM X0...X3 with observations O1...O3]
Given the next state, what's the probability of the remaining observation sequence? This is a backward recursion.
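Combining the forward and backward recursions gives a simple smoother. The model numbers below are assumed for illustration (same toy two-state model as the filtering example; `smooth` is a made-up name):

```python
# Forward-backward smoothing for a discrete HMM:
# P(X_k | o_{1:t}) = alpha * P(o_{k+1:t} | X_k) * P(X_k | o_{1:k}).

def normalize(p):
    s = sum(p)
    return [x / s for x in p]

def smooth(prior, transition, obs_model, observations):
    n = len(prior)
    # Forward pass: f[k] = P(X_k | o_{1:k}); f[0] is the prior.
    f = [list(prior)]
    for o in observations:
        pred = [sum(f[-1][i] * transition[i][j] for i in range(n)) for j in range(n)]
        f.append(normalize([obs_model[j][o] * pred[j] for j in range(n)]))
    # Backward pass: b holds P(o_{k+1:t} | X_k), initially all ones.
    b = [1.0] * n
    smoothed = [None] * len(observations)
    for k in range(len(observations) - 1, -1, -1):
        smoothed[k] = normalize([f[k + 1][j] * b[j] for j in range(n)])
        b = [sum(transition[i][j] * obs_model[j][observations[k]] * b[j]
                 for j in range(n)) for i in range(n)]
    return smoothed

T = [[0.7, 0.3], [0.3, 0.7]]
O = [[0.9, 0.1], [0.2, 0.8]]
print(smooth([0.5, 0.5], T, O, [0, 0]))
```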
INFERENCE IN HMMS
Most likely explanation: the query returns a path through state space x0, ..., x3.
MLE: VITERBI ALGORITHM
Recursive computation of the maximum likelihood over paths to each xt in Val(Xt):
mt(Xt) = max_{x1:t-1} P(x1,...,xt-1, Xt | o1:t)
       = α P(ot|Xt) max_{xt-1} P(Xt|xt-1) mt-1(xt-1)
The previous ML state is argmax_{xt-1} P(Xt|xt-1) mt-1(xt-1).
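The recursion, plus back-pointers to recover the path, might look like this. The model numbers are the same illustrative two-state tables as above; `viterbi` and its arguments are invented names:

```python
# Viterbi: m_t(x) = max over x_{1:t-1} of P(x_{1:t-1}, x | o_{1:t}),
# computed left to right, with back-pointers for path recovery.

def viterbi(prior, transition, obs_model, observations):
    n = len(prior)
    # Initialize with the first observation.
    m = [prior[j] * obs_model[j][observations[0]] for j in range(n)]
    back = []
    for o in observations[1:]:
        prev, nxt = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: m[i] * transition[i][j])
            prev.append(best)                  # previous ML state for j
            nxt.append(m[best] * transition[best][j] * obs_model[j][o])
        back.append(prev)
        m = nxt
    # Trace back from the best final state.
    path = [max(range(n), key=lambda j: m[j])]
    for prev in reversed(back):
        path.append(prev[path[-1]])
    return list(reversed(path))

# state 0 = rain, state 1 = no rain; observation 0 = umbrella seen.
T = [[0.7, 0.3], [0.3, 0.7]]
O = [[0.9, 0.1], [0.2, 0.8]]
print(viterbi([0.5, 0.5], T, O, [0, 0, 1, 0, 0]))  # [0, 0, 1, 0, 0]
```

For the observation sequence umbrella, umbrella, no umbrella, umbrella, umbrella, the most likely path switches to "no rain" exactly on the middle day.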
APPLICATIONS OF HMMS IN NLP
Speech recognition:
- Hidden: phones (e.g., ah, eh, ee, th, r)
- Observed: noisy acoustic features (produced by signal processing)
PHONE OBSERVATION MODELS
[Diagram: Phone_t -> signal processing -> acoustic features Features_t, e.g. (24, 13, 3, 59)]
The model is defined to be robust over variations in accent, speed, pitch, and noise.
PHONE TRANSITION MODELS
[Diagram: Phone_t -> Phone_t+1, with Phone_t emitting Features_t]
Good models will capture (among other things): pronunciation of words, subphone structure, and coarticulation effects.
Triphone models = order-3 Markov chain.
WORD SEGMENTATION
Words run together when pronounced.
N-gram models: unigrams P(wi), bigrams P(wi|wi-1), trigrams P(wi|wi-1, wi-2)
Random 20-word samples from R&N using n-gram models:
- "Logical are as confusion a may right tries agent goal the was diesel more object then information-gathering search is"
- "Planning purely diagnostic expert systems are very similar computational approach would be represented compactly using tic tac toe a predicate"
- "Planning and scheduling are integrated the success of naive bayes model is just a possible prior source by that time"
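A toy bigram sampler along these lines can be written in a few lines; the corpus and every name here are invented for illustration:

```python
# Train a bigram model P(w_i | w_{i-1}) by counting, then sample a
# word sequence from it.

import random
from collections import defaultdict

def train_bigrams(words):
    counts = defaultdict(lambda: defaultdict(int))
    for prev, cur in zip(words, words[1:]):
        counts[prev][cur] += 1
    return counts

def sample_next(counts, prev, rng):
    choices = list(counts[prev].items())
    return rng.choices([w for w, _ in choices],
                       weights=[c for _, c in choices])[0]

corpus = "the agent plans the search and the agent acts".split()
model = train_bigrams(corpus)
rng = random.Random(0)
w = "the"
out = [w]
for _ in range(5):
    if not model[w]:          # dead end: no observed successor
        break
    w = sample_next(model, w, rng)
    out.append(w)
print(" ".join(out))          # locally plausible, globally meaningless
```

Like the R&N samples above, every adjacent pair is plausible but the whole sequence need not make sense.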
TRICKS TO IMPROVE RECOGNITION
- Narrow the number of variables: digits, yes/no, phone tree
- Train with real user data
Real story: "Yes ma'am"
KALMAN FILTERING
In a nutshell:
- Efficient filtering in continuous state spaces
- Gaussian transition and observation models
- Ubiquitous for tracking with noisy sensors, e.g. radar, GPS, cameras
HIDDEN MARKOV MODEL FOR ROBOT LOCALIZATION
Use observations to get a better idea of where the robot is at time t.
[Diagram: hidden state variables X0 -> X1 -> X2 -> X3 with observed variables z1, z2, z3]
Predict - observe - predict - observe...
LINEAR GAUSSIAN TRANSITION MODEL
Consider position and velocity xt, vt and time step h.
Without noise:
xt+1 = xt + h vt
vt+1 = vt
With Gaussian noise of standard deviation σ1:
P(xt+1|xt) ∝ exp(-(xt+1 - (xt + h vt))² / (2 σ1²))
i.e., xt+1 ~ N(xt + h vt, σ1)
LINEAR GAUSSIAN TRANSITION MODEL
If the prior on position is Gaussian, the distribution after the transition is also Gaussian: the mean shifts by vh and the variances add.
N(μ, σ²) becomes N(μ + vh, σ² + σ1²)
LINEAR GAUSSIAN OBSERVATION MODEL
Position observation zt with Gaussian noise of standard deviation σ2:
zt ~ N(xt, σ2)
LINEAR GAUSSIAN OBSERVATION MODEL
If the prior on position is Gaussian, then the posterior is also Gaussian:
μ ← (σ² z + σ2² μ) / (σ² + σ2²)
σ² ← σ² σ2² / (σ² + σ2²)
[Figure: position prior, observation probability, and posterior probability]
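The 1D updates above can be sketched directly. The helper names are hypothetical, and note that variances (not standard deviations) are passed around:

```python
# 1D Kalman updates: a transition that shifts the mean by v*h and adds
# variance var1, then an observation update that combines prior
# N(mu, var) with a measurement z of variance var2.

def predict_1d(mu, var, v, h, var1):
    return mu + v * h, var + var1

def update_1d(mu, var, z, var2):
    new_mu = (var * z + var2 * mu) / (var + var2)   # precision-weighted mean
    new_var = var * var2 / (var + var2)             # always shrinks
    return new_mu, new_var

mu, var = 0.0, 1.0
mu, var = predict_1d(mu, var, v=1.0, h=1.0, var1=0.5)   # -> N(1.0, 1.5)
mu, var = update_1d(mu, var, z=2.0, var2=1.5)           # equal variances: mean halfway
print(mu, var)                                          # 1.5 0.75
```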
MULTIVARIATE CASE
Transition matrix F with covariance Σx; observation matrix H with covariance Σz.
μt+1 = F μt + Kt+1 (zt+1 - H F μt)
Σt+1 = (I - Kt+1 H)(F Σt Fᵀ + Σx)
where
Kt+1 = (F Σt Fᵀ + Σx) Hᵀ (H (F Σt Fᵀ + Σx) Hᵀ + Σz)⁻¹
Got that memorized?
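One multivariate step can be written out with small pure-Python matrix helpers. The 2D position-velocity example and all numbers are hypothetical, and the hand-coded 2x2 inverse restricts this sketch to 2D observations:

```python
# One multivariate Kalman filter step: predict, compute the gain K,
# then update the mean and covariance.

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    return [list(r) for r in zip(*A)]

def inv_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def kalman_step(mu, Sigma, z, F, Sx, H, Sz):
    # Predict: mu_pred = F mu, P = F Sigma F^T + Sx
    mu_pred = mat_mul(F, mu)
    P = mat_add(mat_mul(mat_mul(F, Sigma), transpose(F)), Sx)
    # Gain: K = P H^T (H P H^T + Sz)^-1
    S = mat_add(mat_mul(mat_mul(H, P), transpose(H)), Sz)
    K = mat_mul(mat_mul(P, transpose(H)), inv_2x2(S))
    # Update: mu' = mu_pred + K (z - H mu_pred),  Sigma' = (I - K H) P
    innov = mat_sub(z, mat_mul(H, mu_pred))
    mu_new = mat_add(mu_pred, mat_mul(K, innov))
    KH = mat_mul(K, H)
    I = [[1.0 if i == j else 0.0 for j in range(len(KH))] for i in range(len(KH))]
    return mu_new, mat_mul(mat_sub(I, KH), P)

F = [[1.0, 1.0], [0.0, 1.0]]          # position-velocity model, h = 1
Sx = [[0.1, 0.0], [0.0, 0.1]]
H = [[1.0, 0.0], [0.0, 1.0]]          # observe both components
Sz = [[0.5, 0.0], [0.0, 0.5]]
mu, Sigma = [[0.0], [1.0]], [[1.0, 0.0], [0.0, 1.0]]
mu, Sigma = kalman_step(mu, Sigma, [[1.2], [0.9]], F, Sx, H, Sz)
print(mu)   # pulled from the prediction (1, 1) toward the measurement (1.2, 0.9)
```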
PROPERTIES OF KALMAN FILTER
- Optimal Bayesian estimate for linear Gaussian transition/observation models
- Needs estimates of the covariances... model identification is necessary
- Extends to nonlinear systems:
  - Extended Kalman Filter: linearize the models
  - Unscented Kalman Filter: pass points through the nonlinear model to reconstruct a Gaussian
  Both work as long as the systems aren't too nonlinear.
NON-GAUSSIAN DISTRIBUTIONS
Gaussian distributions are a single "lump".
[Figure: a multimodal distribution vs. the unimodal Kalman filter estimate]
NON-GAUSSIAN DISTRIBUTIONS
Integrating continuous and discrete states: splitting on a binary choice.
[Figure: a distribution splitting into "up" and "down" branches]
EXAMPLE: FAILURE DETECTION
Consider a battery meter sensor:
- Battery = true level of battery
- BMeter = sensor reading
Failure modes:
- Transient failure: the sensor sends garbage at time t, e.g. 5555500555...
- Persistent failure: the sensor is broken and sends garbage forever, e.g. 5555500000...
DYNAMIC BAYESIAN NETWORK
[Diagram: Battery_t-1 -> Battery_t, with Battery_t -> BMeter_t]
BMeter_t ~ N(Battery_t, σ)
(Think of this structure "unrolled" forever...)
DYNAMIC BAYESIAN NETWORK
[Diagram: Battery_t-1 -> Battery_t -> BMeter_t]
Transient failure model:
BMeter_t ~ N(Battery_t, σ)
P(BMeter_t = 0 | Battery_t = 5) = 0.03
RESULTS ON TRANSIENT FAILURE
[Plot: E(Battery_t) over time as a transient failure occurs; curves without and with the transient failure model; the meter reads 55555005555...]
RESULTS ON PERSISTENT FAILURE
[Plot: E(Battery_t) over time as a persistent failure occurs, using only the transient failure model; the meter reads 5555500000...]
PERSISTENT FAILURE MODEL
[Diagram: Battery_t-1 -> Battery_t -> BMeter_t, plus Broken_t-1 -> Broken_t with Broken_t -> BMeter_t]
BMeter_t ~ N(Battery_t, σ)
P(BMeter_t = 0 | Battery_t = 5) = 0.03
P(BMeter_t = 0 | Broken_t) = 1
This is an example of a Dynamic Bayesian Network (DBN).
RESULTS ON PERSISTENT FAILURE
[Plot: E(Battery_t) over time as a persistent failure occurs; curves with the transient model and with the persistent failure model; the meter reads 5555500000...]
HOW TO PERFORM INFERENCE ON A DBN?
Exact inference on the "unrolled" BN:
- Variable elimination: eliminate old time steps
- But after a few time steps, all variables in the state space become dependent! The sparsity structure is lost.
Approximate inference: particle filtering
PARTICLE FILTERING (AKA SEQUENTIAL MONTE CARLO)
- Represent distributions as a set of particles
- Applicable to non-Gaussian, high-dimensional distributions
- Convenient implementations
- Widely used in vision and robotics
PARTICLE REPRESENTATION
Bel(xt) = {(wk, xk)}
- wk are weights, xk are state hypotheses
- Weights sum to 1
- Approximates the underlying distribution
[Figure: a distribution approximated by particles, with a weighted resampling step]
PARTICLE FILTERING
Represent the distribution at time t as a set of N "particles" St^1, ..., St^N.
Repeat for t = 0, 1, 2, ...:
- Sample S[i] from P(Xt+1 | Xt = St^i) for all i
- Compute weights w[i] = P(e | Xt+1 = S[i]) for all i
- Sample St+1^i from S[.] according to the weights w[.]
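The loop above can be sketched on a toy version of the failure-detection example. Only the 0.03 transient-failure probability comes from the slides; the other numbers and all names are assumed for illustration:

```python
# Particle filter on a one-bit state: Broken in {0, 1}.
# Sample-weight-resample, exactly as in the loop above.

import random

def particle_filter_step(particles, transition_sample, obs_weight, obs, rng):
    # 1. Sampling step: propagate each particle through the transition model.
    proposed = [transition_sample(p, rng) for p in particles]
    # 2. Compute weights from the observation likelihood.
    weights = [obs_weight(p, obs) for p in proposed]
    # 3. Weighted resampling.
    return rng.choices(proposed, weights=weights, k=len(particles))

def transition_sample(broken, rng):
    if broken:
        return 1                                  # once broken, stays broken
    return 1 if rng.random() < 0.01 else 0        # small chance of breaking

def obs_weight(broken, meter):
    if broken:
        return 1.0 if meter == 0 else 0.0         # broken sensor always reads 0
    return 0.03 if meter == 0 else 0.97           # transient-failure model

rng = random.Random(0)
particles = [0] * 1000
for meter in [5, 5, 0, 0, 0]:     # persistent failure after two good readings
    particles = particle_filter_step(particles, transition_sample,
                                     obs_weight, meter, rng)
print(sum(particles) / len(particles))   # estimated P(Broken): close to 1
```

After three consecutive zero readings, nearly all surviving particles have Broken = 1, mirroring the persistent-failure result above.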
BATTERY EXAMPLE
[Diagram sequence: the DBN Battery_t-1 -> Battery_t -> BMeter_t with Broken_t-1 -> Broken_t, particles drawn at each step]
Walking through the particle filter:
1. Sampling step: propagate each particle through the transition model.
2. Suppose we now observe BMeter = 0. For each sample, P(BMeter = 0 | sample) is 0.03 if the sensor is working and 1 if it is broken.
3. Compute weights (drawn as particle size).
4. Weighted resampling.
5. Sampling step again.
6. Now observe BMeter_t = 5 and compute weights; broken-sensor samples get weight 0.
7. Weighted resample.
APPLICATIONS OF PARTICLE FILTERING IN ROBOTICS
Simultaneous Localization and Mapping (SLAM):
- Observations: laser rangefinder
- State variables: position, walls
SIMULTANEOUS LOCALIZATION AND MAPPING (SLAM)
Mobile robots:
- Odometry: locally accurate, but drifts significantly over time
- Vision / ladar / sonar: inaccurate locally, but tied to a global reference frame
Combine the two.
State: (robot pose, map)
Observations: (sensor input)
COUPLE OF PLUGS
- CSCI B553
- CSCI B659: Principles of Intelligent Robot Motion, http://cs.indiana.edu/classes/b659-hauserk
- CSCI B657: Computer Vision, David Crandall / Chen Yu
NEXT TIME
Learning distributions from data. Read R&N 20.1-3.
MLE: VITERBI ALGORITHM
Recursive computation of the maximum likelihood over paths to each xt in Val(Xt):
mt(Xt) = max_{x1:t-1} P(x1,...,xt-1, Xt | o1:t)
       = α P(ot|Xt) max_{xt-1} P(Xt|xt-1) mt-1(xt-1)
The previous ML state is argmax_{xt-1} P(Xt|xt-1) mt-1(xt-1).
Does this sound familiar?
MLE: VITERBI ALGORITHM
Do the "logarithm trick":
log mt(Xt) = log α P(ot|Xt) + max_{xt-1} [ log P(Xt|xt-1) + log mt-1(xt-1) ]
View:
- log α P(ot|Xt) as a reward
- log P(Xt|xt-1) as a cost
- log mt(Xt) as a value function
This is a Bellman equation.