Robot Vision SS 2005 Matthias Rüther 1
ROBOT VISION Lesson 10: Object Tracking and Visual Servoing
Matthias Rüther
Contents
Object Tracking
– Appearance based tracking
  • Kalman filtering
  • Condensation algorithm
– Model based tracking
  • Model fitting and tracking
Visual Servoing
– Principle
– Servoing Types
Tracking
Definition of Tracking
Tracking:
– Draw conclusions about the motion of the scene, the objects, or the camera, given a sequence of images.
– Knowing this motion, predict where things will project in the next image, so that we spend less effort searching for them.
Why Track?
Tracking a Silhouette by Measuring Edge Positions
Observations are positions of edges along normals to tracked contour
Why not Wait and Process the Set of Images as a Batch?
E.g. in a car system, detecting and tracking pedestrians in real time is important.
Recursive methods require less computing
Implicit Assumptions of Tracking
Physical cameras do not move instantly from one viewpoint to another.
Objects do not teleport between places around the scene.
The relative position between camera and scene changes incrementally.
We can model this motion.
Related Fields
Signal Detection and Estimation
Radar technology
The Problem: Signal Estimation
We have a system with parameters
– Scene structure, camera motion, automatic zoom
– The system state is unknown (“hidden”)
We have measurements
– Components of stable “feature points” in the images.
– “Observations”: projections of the state.
We want to recover the state components from the observations.
Necessary Models
A Simple Example of Estimation by Least Square Method
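The worked numbers on this slide are not preserved in the transcript; a minimal sketch of the idea with NumPy (all values illustrative): a least-squares estimate solves an overdetermined linear system, using more measurements than unknown state components.

```python
import numpy as np

# Overdetermined linear system A x = b: more measurements (rows)
# than unknown state components (columns).
A = np.array([[1.0,  0.0],
              [0.0,  1.0],
              [1.0,  1.0],
              [1.0, -1.0]])
b = np.array([1.1, 1.9, 3.2, -0.8])

# x minimizes the sum of squared residuals ||A x - b||^2.
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
```

The same `x` solves the normal equations AᵀA x = Aᵀb, which is the form the recursive update on the following slides incrementalizes.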
Recursive Least Square Estimation
We don’t want to wait until all data have been collected to get an estimate of the depth.
We don’t want to reprocess old data when we make a new measurement.
Recursive method: data at step i are obtained from data at step i-1
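The simplest instance of this recursion is the running mean of a constant scalar state (a sketch of the principle, not the lecture's depth example):

```python
def recursive_mean(measurements):
    """Running least-squares estimate of a constant scalar state:
    x_hat_i = x_hat_{i-1} + (1/i) * (z_i - x_hat_{i-1}).
    Equivalent to the batch mean, but each measurement is folded in
    as it arrives and never reprocessed."""
    x_hat = 0.0
    for i, z in enumerate(measurements, start=1):
        x_hat += (z - x_hat) / i
    return x_hat
```

Each step reuses only the previous estimate, so old data never has to be stored or revisited.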
Recursive Least Square Estimation 2
Recursive Least Square Estimation 3
Least Square Estimation of the State Vector of a Static System
Least Square Estimation of the State Vector of a Static System 2
Dynamic System
Recursive Least Square Estimation for a Dynamic System (Kalman Filter)
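The filter equations on this slide are not preserved in the transcript; a minimal scalar Kalman filter for a random-walk state model (noise values q and r are illustrative assumptions) might look like:

```python
def kalman_1d(measurements, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state.
    Predict: the state stays put, uncertainty grows by process noise q.
    Update:  blend prediction and measurement z via the gain K."""
    x, P = x0, p0
    for z in measurements:
        P = P + q            # predict (state transition is identity)
        K = P / (P + r)      # Kalman gain: predicted vs. measurement noise
        x = x + K * (z - x)  # correct the state with the innovation
        P = (1.0 - K) * P    # shrink the uncertainty
    return x, P
```

With a vector state the same predict/update structure holds, with matrices in place of the scalars.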
Estimation when System Model is Nonlinear (Extended Kalman Filter)
Tracking Steps
Recursive Least Square Estimation for a Dynamic System (Kalman Filter)
Tracking as a Probabilistic Inference Problem
Find distributions for the state vector a_i and the measurement vector x_i. Then we can compute the expectations â_i and x̂_i.
• Simplifying assumptions (the same as for HMMs)
Tracking as Inference
Model based tracking
IDEA: if the motion is caused by a known 3-D object, we can track the 3-D motion parameters, not just individual features!
ADVANTAGES:
– low dimensionality (3 rotations, 3 translations, independent of the number of features tracked)
– mutually constrained motion instead of independently moving points
LIMITATIONS:
– 6 parameters only with rigid objects! Not articulated, not deformable.
– assumes the 3-D model is known a priori
MODEL-BASED 3-D TRACKING
[Wunsch,Hirzinger IEEE RA 1997]
SKETCH OF ALGORITHM:
0. Initialize the 3-D pose R_0, t_0 (rotation, translation)
1. Extract features from image I_t
2. Match image features with features of the 3-D model positioned at R_{t-1}, t_{t-1}
3. Evaluate a global error metric in 3-D space (notice: not in image space)
4. Estimate R_t, t_t by aligning image and model features
5. Advance to the next frame and go to 1.
Example Algorithm
FEATURES: for instance, using image edges with orientation θ and offset d (and s_x, s_y the camera scale factors),

    n = (s_x cos θ, s_y sin θ, −d)

is the normal of the 3-D plane through the image edge.
(Figure: the corresponding model edge, with endpoints p and q, and the 3-D plane through the image edge.)
ERROR METRIC: in 3-D space for efficiency (no back-projection): orthogonality of n and the model edge,

    E = [nᵀ(Rp + t)]² + [nᵀ(Rq + t)]²
Some Details
MINIMISATION: using, say, 3 types of features:

    min_{R,t} { w_1 Σ_j E_j^{f1} + w_2 Σ_j E_j^{f2} + w_3 Σ_j E_j^{f3} }
Trick 1: Approximating R by a differential rotation d = (dx, dy, dz):

    R x ≈ x + d × x = (I + [d]_×) x,   where

              [  0   −dz   dy ]
      [d]_× = [  dz   0   −dx ]
              [ −dy   dx   0  ]
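As a quick numerical check of the differential-rotation trick (NumPy used for convenience; `skew` is a helper defined here, not part of the lecture code):

```python
import numpy as np

def skew(d):
    """Cross-product matrix [d]_x, so that skew(d) @ x == d x x."""
    dx, dy, dz = d
    return np.array([[0.0, -dz,  dy],
                     [ dz, 0.0, -dx],
                     [-dy,  dx, 0.0]])

d = np.array([0.01, -0.02, 0.03])   # small differential rotation
x = np.array([1.0, 2.0, 3.0])
# the skew matrix reproduces the cross product exactly
assert np.allclose(skew(d) @ x, np.cross(d, x))
```

Because R x is now linear in (dx, dy, dz), the quadratic error terms below admit a closed-form minimizer.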
All E terms can be linearized, a linear system obtained from the quadratic minimization, and a solution computed in closed form: e.g., for edges, minimizing Σ_k (n_kᵀ p_k + n_kᵀ t + v_kᵀ d)² gives the normal equations

    [ Σ_k n_k n_kᵀ   Σ_k n_k v_kᵀ ] [ t ]       [ Σ_k (n_kᵀ p_k) n_k ]
    [ Σ_k v_k n_kᵀ   Σ_k v_k v_kᵀ ] [ d ]  =  − [ Σ_k (n_kᵀ p_k) v_k ]
Some Details
... where v_k = p_k × n_k.
The resulting linear system A [t d]ᵀ = b is (trick 2) applied iteratively at each time instant to reduce the linearization error; a few iterations suffice for small frame-to-frame displacements.
NOTICE THE ASSUMPTIONS MADE:
– rigid object
– model known a priori
– small frame-to-frame displacements
– image-model feature correspondences known (if displacements are small, by minimum distance)
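The linearized point-on-plane alignment above can be sketched in a few lines of NumPy (synthetic data; `align_points_to_planes` and all values are illustrative, not the authors' code):

```python
import numpy as np

def align_points_to_planes(normals, points):
    """Linearized point-on-plane alignment: solve
    n_k^T (p_k + d x p_k + t) = 0 for translation t and
    differential rotation d in the least-squares sense."""
    v = np.cross(points, normals)                 # v_k = p_k x n_k
    A = np.hstack([normals, v])                   # rows [n_k^T, v_k^T]
    b = -np.einsum('ij,ij->i', normals, points)   # -n_k^T p_k
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:3], x[3:]                           # t, d

# Synthetic check: move model points by a known (t, d), then build
# planes through the moved points, as the image edges would induce.
rng = np.random.default_rng(0)
t_true = np.array([0.05, -0.02, 0.01])
d_true = np.array([0.01, 0.02, -0.01])
p = rng.normal(size=(12, 3))               # model edge points
P = p + np.cross(d_true, p) + t_true       # moved points
n = np.cross(P, rng.normal(size=P.shape))  # normals of planes through P
n /= np.linalg.norm(n, axis=1, keepdims=True)
t_est, d_est = align_points_to_planes(n, p)
```

Since the synthetic planes pass exactly through the moved points, the linear solve recovers (t, d) exactly; with real edge measurements the iteration of trick 2 takes over.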
Some Details
Problems with Tracking
– Initial detection
  • If it is too slow, we will never catch up.
  • If it is fast, why not run detection at every frame?
– Even if raw detection can be done in real time, tracking saves processing cycles compared to raw detection; the CPU has other things to do.
– Detection is needed again if you lose tracking.
– Most vision tracking prototypes use initial detection done by hand.
Visual Servoing
Vision System operates in a closed control loop.
Better accuracy than “look and move” systems
Figures from S.Hutchinson: A Tutorial on Visual Servo Control
Visual Servoing
Example: Maintaining relative Object Position
Figures from P. Wunsch and G. Hirzinger. Real-Time Visual Tracking of 3-D Objects with Dynamic Handling of Occlusion
Visual Servoing
Camera Configurations:
End-Effector Mounted Fixed
Figures from S.Hutchinson: A Tutorial on Visual Servo Control
Visual Servoing
Servoing Architectures
Figures from S.Hutchinson: A Tutorial on Visual Servo Control
Visual Servoing
Position-based and Image Based control
– Position based:
  • Alignment in the target coordinate system
  • The 3D structure of the target is reconstructed
  • The end-effector is tracked
  • Sensitive to calibration errors
  • Sensitive to reconstruction errors
– Image based:
  • Alignment in image coordinates
  • No explicit reconstruction necessary
  • Insensitive to calibration errors
  • Only special problems solvable
  • Depends on the initial pose
  • Depends on the selected features
(Figure: the target and the end-effector, and their images in the camera.)
Visual Servoing
EOL and ECL control
– EOL: endpoint open-loop; only the target is observed by the camera
– ECL: endpoint closed-loop; target as well as end-effector are observed by the camera
EOL ECL
Visual Servoing
Position Based Algorithm:1. Estimation of relative pose
2. Computation of error between current pose and target pose
3. Movement of robot
Example: point alignment
Visual Servoing
Position based point alignment
Goal: bring e to 0 by moving p1
e = |p_2m − p_1m|
u = k · (p_2m − p_1m)
p_xm is subject to the following measurement errors: sensor position, sensor calibration, sensor measurement error
p_xm is independent of the following errors: end-effector position, target position
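The proportional law u = k · (p2m − p1m) drives e to zero geometrically; a toy 2-D simulation (gain and points are assumed values, not from the lecture):

```python
import numpy as np

p1 = np.array([0.0, 0.0])     # controlled point (end effector)
p2 = np.array([3.0, -1.0])    # target point
k = 0.3                       # proportional gain, 0 < k < 1 for convergence

for _ in range(50):
    u = k * (p2 - p1)         # control signal u = k * (p2m - p1m)
    p1 = p1 + u               # move the end effector by u
e = np.linalg.norm(p2 - p1)   # residual alignment error
```

Each step shrinks the error by the factor (1 − k), so after 50 steps it is negligible.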
Visual Servoing
Image based point alignment
Goal: bring e to 0 by moving p1
e = |u1m – v1m| + |u2m – v2m|
u_xm, v_xm are subject only to sensor measurement error
u_xm, v_xm are independent of the following errors: sensor position, end-effector position, sensor calibration, target position
Visual Servoing
Example Laparoscopy
Figures from A.Krupa: Autonomous 3-D Positioning of Surgical Instruments in Robotized Laparoscopic Surgery Using Visual Servoing
Visual Servoing
Example Laparoscopy
Figures from A.Krupa: Autonomous 3-D Positioning of Surgical Instruments in Robotized Laparoscopic Surgery Using Visual Servoing
Tracking using CONDENSATION
CONditional DENSity PropagATION
M. Isard and A. Blake, CONDENSATION – Conditional density propagation for visual tracking, Int. J. Computer Vision 29(1), 1998, pp. 4-28.
Goal
Model-based visual tracking in dense clutter at near video frame rates
Example
Approach
Probabilistic framework for tracking objects such as curves in clutter using an iterative sampling algorithm.
Model motion and shape of target
Top-down approach
Simulation instead of analytic solution
Probabilistic Framework
Object dynamics form a temporal Markov chain
Observations, zt , are independent (mutually and w.r.t process)
Use Bayes’ rule
Notation
X State vector, e.g., curve’s position and orientation
Z Measurement vector, e.g., image edge locations
p(X) Prior probability of state vector; summarizes prior domain knowledge, e.g., by independent measurements
p(Z) Probability of measuring Z; fixed for any given image
p(Z | X) Probability of measuring Z given that the state is X; compares image to expectation based on state
p(X | Z) Probability of X given that measurement Z has occurred; called state posterior
Tracking as Estimation
Compute state posterior, p(X|Z), and select next state to be the one that maximizes this (Maximum a Posteriori (MAP) estimate)
Measurements are complex and noisy, so posterior cannot be evaluated in closed form
Particle filter (iterative sampling) idea:
– Stochastically approximate the state posterior with a set of N weighted particles (s, π), where s is a sample state and π is its weight.
Use Bayes’ rule to compute p(X|Z)
Factored Sampling
Generate a set of samples that approximates the posterior p(X|Z)
Sample set s = {s^(1), …, s^(N)} is generated from the prior p(X); each sample is assigned a weight (“probability”) proportional to its observation likelihood.
Factored Sampling
• CONDENSATION for one image (N = 15)
Estimating Target State
(Figure: state samples, and the mean of the weighted state samples.)
Bayes’ Rule
    p(X | Z) = p(Z | X) p(X) / p(Z)

– p(Z | X): this is what you can evaluate.
– p(X): this is what you may know a priori, or what you can predict.
– p(X | Z): this is what you want; knowing p(X | Z) tells us the most likely state X.
– p(Z): this is a constant for a given image.
CONDENSATION Algorithm
1. Select: randomly select N particles from {s_{t-1}^{(n)}} based on the weights π_{t-1}^{(n)}; the same particle may be picked multiple times (factored sampling)
2. Predict: move the particles according to the deterministic dynamics (drift), then perturb them individually (diffuse)
3. Measure: get a likelihood for each new sample by comparing it with the image’s local appearance, i.e., based on p(z_t | x_t); then update each weight accordingly to obtain {(s_t^{(n)}, π_t^{(n)})}
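The three steps above can be sketched as a minimal 1-D particle filter (all parameters are illustrative; the real tracker measures along contour normals rather than observing the state directly):

```python
import numpy as np

rng = np.random.default_rng(1)

def condensation_step(samples, weights, z, drift=0.0, diffuse=0.2, sigma=0.5):
    n = len(samples)
    # 1. Select: factored sampling, draw particles by weight (repeats allowed)
    s = samples[rng.choice(n, size=n, p=weights / weights.sum())]
    # 2. Predict: deterministic drift, then individual Gaussian diffusion
    s = s + drift + rng.normal(0.0, diffuse, size=n)
    # 3. Measure: reweight each particle by the likelihood p(z | x)
    w = np.exp(-0.5 * ((z - s) / sigma) ** 2)
    return s, w

# Track a stationary 1-D state observed in noise.
true_x = 3.0
s = rng.uniform(-5.0, 5.0, size=500)    # initial particles
w = np.ones(500)
for _ in range(30):
    z = true_x + rng.normal(0.0, 0.3)   # noisy observation
    s, w = condensation_step(s, w, z)
estimate = np.average(s, weights=w)
```

The weighted sample mean serves as the state estimate, as on the “Estimating Target State” slide.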
CONDENSATION Scheme
Notes on Updating
Enforcing plausibility: Particles that represent impossible configurations are discarded
Diffusion modeled with a Gaussian
Likelihood function: Convert “goodness of prediction” score to pseudo-probability
– More markings closer to predicted markings -> higher likelihood
State Posterior
State Posterior Animation
Object Motion Model
For video tracking we need a way to propagate probability densities, so we need a “motion model” such as X_{t+1} = A X_t + B W_t, where W_t is a noise term and A and B are state transition matrices that can be learned from training sequences.
The state X of an object, e.g., a B-spline curve, can be represented as a point in a 6D state space of possible 2D affine transformations of the object.
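A constant-velocity instance of X_{t+1} = A X_t + B W_t for one coordinate (the matrices here are illustrative assumptions, not learned from training sequences):

```python
import numpy as np

# Constant-velocity model for one coordinate: state X = [position, velocity].
dt = 1.0
A = np.array([[1.0, dt],
              [0.0, 1.0]])   # deterministic state transition
B = np.array([[0.5 * dt**2],
              [dt]])         # noise enters as a random acceleration

rng = np.random.default_rng(2)
X = np.array([0.0, 1.0])     # start at position 0 with unit velocity
for _ in range(10):
    W = rng.normal(0.0, 0.1, size=1)   # scalar noise term W_t
    X = A @ X + B @ W                  # propagate the state
```

With W_t = 0 the model moves the position by the velocity each step; the noise term is what lets CONDENSATION’s diffusion explore around that prediction.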
Evaluating p(Z | X)
where ν_m = {the true measurement is z_m} for m = 1, …, M, and q = 1 − Σ_m p(ν_m) is the probability that the target is not visible:

    p(z | x) = q p(z | clutter) + Σ_{m=1}^{M} p(z | x, ν_m) p(ν_m)

with each visible hypothesis scored by a truncated Gaussian,

    p(z | x, ν_m) ∝ exp(−(x − z_m)² / (2σ²))  if |x − z_m| < δ,  constant otherwise
Dancing Example
Hand Example
Pointing Hand Example
3D Model-based Example
3D state space: image position + angle
Polyhedral model of object
Advantages of Particle Filtering
Nonlinear dynamics, measurement model easily incorporated
Copes with lots of false positives
Multi-modal posterior okay (unlike Kalman filter)
Multiple samples provide multiple hypotheses
Fast and simple to implement