Lectureship: A proposal for advancing computer graphics, imaging and multimedia design at RGU
Robert Gordon University
Aberdeen, 20/6/2008
Fabio Cuzzolin, INRIA Rhone-Alpes
Career path
Master’s thesis on gesture recognition at the University of Padova
Visiting student, ESSRL, Washington University in St. Louis, and at the University of California at Los Angeles (2000)
Ph.D. thesis on belief functions and uncertainty theory (2001)
Researcher at Politecnico di Milano with the Image and Sound Processing group (2003-2004)
Post-doc at the University of California at Los Angeles, UCLA Vision Lab (2004-2006)
Marie Curie fellow at INRIA Rhone-Alpes
collaborations with several groups
Scientific production and collaborations
collaborations with journals: IEEE PAMI, IEEE SMC-B, CVIU, Information Fusion, Int. J. Approximate Reasoning
PC member for VISAPP, FLAIRS, IMMERSCOM, ISAIM
currently 4+10 journal papers and 31+8 conference papers
collaborations with groups: SIPTA, Setubal, CMU, Pompeu Fabra, EPFL-IDIAP, UBoston
My background
research
Discrete math
linear independence on lattices and matroids
Uncertainty theory
geometric approach
algebraic analysis
generalized total probability
Machine learning
Manifold learning for dynamical models
Computer vision
gesture and action recognition
3D shape analysis and matching
Gait ID
pose estimation
action recognition
action segmentation
A multi-layer framework for human motion analysis
different tasks, integrated in a series of layers
feedbacks act between different layers
multiple views
3D reconstruction
unsupervised body-part segmentation
image data fusion
model fitting (stick-articulated)
motion capture
identity recognition
surveillance
HMI
A multi-layer framework for human motion analysis
Action and gesture recognition
Laplacian unsupervised segmentation
Matching of 3D shapes by embedded orthogonal alignment
Bilinear models for invariant gait ID
Manifold learning for dynamical models
The role of uncertainty measures
Information fusion for model-free pose estimation
HMMs for gesture recognition
transition matrix A -> gesture dynamics
state-output matrix C -> collection of hand poses
Hand poses were represented by “size functions” (BMVC'97)
Gesture classification
HMM 1, HMM 2, …, HMM n
EM to learn HMM parameters from an input sequence
the new sequence is fed to the learnt gesture models
they produce a likelihood
the most likely model is chosen (if above a threshold)
OR the model of the new sequence is attributed the label of the closest learnt one (using K-L divergence or other distances)
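The likelihood-based classification step can be sketched as follows. This is a minimal illustration, not the system described in the talk: it assumes discrete output symbols instead of size-function hand poses, and the names `forward_log_likelihood`, `classify_gesture`, `min_log_lik` and the two toy models are hypothetical.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under one HMM,
    computed with the scaled forward algorithm."""
    n = len(start)
    log_lik = 0.0
    alpha = None
    for t, symbol in enumerate(obs):
        if t == 0:
            alpha = [start[i] * emit[i][symbol] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def classify_gesture(obs, models, min_log_lik):
    """Feed the sequence to every learnt model and keep the most likely one,
    or return None when even the best score is below the threshold."""
    scores = {name: forward_log_likelihood(obs, *params)
              for name, params in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= min_log_lik else None), scores

# Toy models (assumed values): (start probs, transition matrix A, emission table)
models = {
    "wave":  ([1.0, 0.0], [[0.5, 0.5], [0.5, 0.5]], [[0.9, 0.1], [0.1, 0.9]]),
    "point": ([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]),
}
label, scores = classify_gesture([0, 0, 0, 0, 0, 0], models, min_log_lik=-10.0)
rejected, _ = classify_gesture([0, 0, 0, 0, 0, 0], models, min_log_lik=0.0)
```

In this sketch the model parameters are fixed by hand; in the talk they are learnt by EM from training sequences.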
Volumetric action recognition
• 2D approaches: features are extracted from single views -> viewpoint dependence
• volumetric approach: features are extracted from a volumetric reconstruction of the moving body (ICIP'04)
Unsupervised coherent 3D segmentation
to recognize actions we need to extract features
segmenting moving articulated 3D bodies into parts
along sequences, in a consistent way
in an unsupervised fashion
robustly, with respect to changes of the topology of the moving body
as a building block of a wider motion analysis and capture framework
ICCV-HM'07, CVPR'08, to submit to IJCV
Clustering after Laplacian embedding
generates a lower-dim, widely separated embedded cloud
less sensitive to topology changes than other methods
less computationally expensive than ISOMAP
[Figure: two rigid parts joined by a moving joint area; neighborhoods inside the rigid parts are unaffected, neighborhoods around the joint are affected]
local neighborhoods -> stable under articulated motion
Algorithm
K-wise clustering in the embedding space
Seed propagation along time
To ensure time consistency clusters’ seeds have to be propagated along time
Old positions of clusters in 3D are added to new cloud and embedded
Result: new seeds
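A rough sketch of one segmentation step with seed propagation, under stated assumptions: a k-nearest-neighbour graph with binary weights and the unnormalised graph Laplacian stand in for whatever Laplacian embedding the talk used, plain k-means initialised at the embedded seeds stands in for the k-wise clustering, and all function names and parameters are hypothetical.

```python
import numpy as np

def laplacian_embedding(points, n_neighbors=4, dim=2):
    """Embed a point cloud with the first non-trivial eigenvectors of the
    k-NN graph Laplacian (assumes distinct points and a connected graph)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nearest = np.argsort(d2, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nearest[i]] = 1.0
    W = np.maximum(W, W.T)                 # symmetrise the adjacency matrix
    L = np.diag(W.sum(axis=1)) - W         # unnormalised graph Laplacian
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]              # skip the constant eigenvector

def kmeans_from_seeds(X, seeds, n_iter=20):
    """Plain k-means started from the given seed positions."""
    centres = np.array(seeds, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dist, axis=1)
        for k in range(len(centres)):
            if np.any(labels == k):
                centres[k] = X[labels == k].mean(axis=0)
    return labels, centres

def segment_frame(cloud, old_seeds_3d, n_neighbors=4, dim=2):
    """Add the old 3D cluster positions to the new cloud, embed everything,
    cluster from the embedded seeds, and return labels plus new 3D seeds."""
    cloud = np.asarray(cloud, dtype=float)
    old_seeds_3d = np.asarray(old_seeds_3d, dtype=float)
    emb = laplacian_embedding(np.vstack([cloud, old_seeds_3d]), n_neighbors, dim)
    emb_cloud, emb_seeds = emb[:len(cloud)], emb[len(cloud):]
    labels, _ = kmeans_from_seeds(emb_cloud, emb_seeds)
    new_seeds = np.array([cloud[labels == k].mean(axis=0) if np.any(labels == k)
                          else old_seeds_3d[k] for k in range(len(old_seeds_3d))])
    return labels, new_seeds

# Toy "articulated" cloud: an L-shaped chain of 40 points with one seed per limb.
chain = np.array([[i, 0.0, 0.0] for i in range(20)] +
                 [[19.0, j, 0.0] for j in range(1, 21)])
seeds = np.array([[4.5, 0.0, 0.0], [19.0, 15.5, 0.0]])
labels, new_seeds = segment_frame(chain, seeds, n_neighbors=4, dim=1)
```

Calling `segment_frame` on the next frame with `new_seeds` is what carries the cluster identities along the sequence in this sketch.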
Results
Coherent clustering along a sequence
Handling of topology changes
Laplacian matching of dense meshes or voxelsets
as embeddings are pose-invariant (for articulated bodies)
they can then be used to match dense shapes by simply aligning their images after embedding
ICCV '07 – NTRL, ICCV '07 – 3dRR, CVPR '08, submitted to ECCV'08, to submit to PAMI
Eigenfunction Histogram assignment
Algorithm:
compute Laplacian embedding of the two shapes
find assignment between eigenfunctions of the two shapes
this selects a section of the embedding space
embeddings are orthogonally aligned there by EM
Results
Applications: graph matching, protein analysis, motion capture
To propagate body-part segmentation in time
Motion field estimation, action segmentation
Application: spatio-temporal action segmentation
problem: segmenting parts of the video(s) containing “interesting” motions
• global approach: working on the entire sequence (multidimensional volume)
• previous works: object segmentation on the spatio-temporal volume for single frames
idea: in a multi-camera setup, working on 3D clouds (hulls) + motion fields + time = 7D volume
• outline of an approach: smoothing using message passing + shape detection on the obtained manifold
Bilinear models for gait-ID
To recognize the identity of humans from their gait (CVPR '06, book chapter in progress)
nuisance factors: emotional state, illumination, appearance, view invariance ... (literature: randomized trees)
each motion possesses several labels: action, identity, viewpoint, emotional state, etc.
• bilinear models (Tenenbaum) can be used to separate the influence of “style” and “content” (the label to classify)
$y^{SC} = A^{S} b^{C}$
Content classification of unknown style
given a training set in which persons (content=ID) are seen walking from different viewpoints (style=viewpoint)
an asymmetric bilinear model can be learned from it through SVD
when new motions are acquired in which a known person is being seen walking from a different viewpoint (unknown style)…
an iterative EM procedure can be set up to classify the content
E step -> estimation of p(c|s), the prob. of the content given the current estimate s of the style
M step -> estimation of the linear map for unknown style s
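The SVD fit of an asymmetric bilinear model can be sketched as below, following the usual Tenenbaum-style construction: the mean observation of every (style, content) pair is stacked into one matrix and factorised into style-specific linear maps and content vectors. The function name, the array layout and the synthetic data are assumptions made for illustration; the EM classification for an unknown style is not shown.

```python
import numpy as np

def fit_asymmetric_bilinear(Y, n_styles, model_dim):
    """Y has shape (n_styles * K, n_contents): block s holds the K-dim mean
    observation of each content seen in style s. Returns style maps A with
    shape (n_styles, K, J) and content vectors B with shape (J, n_contents),
    so that block s of Y is approximated by A[s] @ B."""
    U, sing, Vt = np.linalg.svd(Y, full_matrices=False)
    A = (U[:, :model_dim] * sing[:model_dim]).reshape(n_styles, -1, model_dim)
    B = Vt[:model_dim]
    return A, B

# Synthetic data (assumed): 3 styles (viewpoints), 5 contents (identities),
# 4-dim observations generated by a 2-dim bilinear model.
rng = np.random.default_rng(0)
A_true = rng.normal(size=(3, 4, 2))
B_true = rng.normal(size=(2, 5))
Y = np.einsum("skj,jc->skc", A_true, B_true).reshape(12, 5)

A, B = fit_asymmetric_bilinear(Y, n_styles=3, model_dim=2)
Y_rebuilt = np.einsum("skj,jc->skc", A, B).reshape(12, 5)
```

The recovered factors match the generating ones only up to an invertible linear change of basis, which is why the check is made on the reconstruction rather than on `A` and `B` themselves.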
Three-layer model
each sequence is encoded as an HMM
its C matrix is stacked in a single observation vector
a bilinear model is learnt from those vectors
Three-layer model
Features: projections of silhouette's contours onto a line through the center
Results on CMU database
Mobo database: 25 people performing 4 different walking actions, from 6 cameras. Three labels: action, id, view
Compared performances with “baseline” algorithm and straight k-NN on sequence HMMs
Learning manifolds of dynamical models
Classify movements represented as dynamical models
for instance, each image sequence can be mapped to an ARMA, or AR linear model
Motion classification then reduces to finding a suitable distance function in the space of dynamical models
when some a-priori info is available (training set)..
.. we can learn in a supervised fashion the “best” metric for the classification problem!
To submit to ECCV'08 – MLVMA Workshop
Learning pullback metrics
many unsupervised algorithms take an input dataset and map it to an embedded space, but fail to learn a full metric
consider then a family of diffeomorphisms F between the original space M and a metric space N
the diffeomorphism F induces on M a pullback metric
maximizing inverse volume finds the manifold which better interpolates the data (geodesics pass through “crowded” regions)
$$O(D) = \frac{\prod_{k=1}^{N} \left(\det g(m_k)\right)^{-1/2}}{\int_M \left(\det g(m)\right)^{-1/2} \, dm}$$
Space of AR(2) models
given an input sequence, we can identify the parameters of the linear model which better describes it
autoregressive models of order 2, AR(2)
Fisher metric on AR(2):
$$g(a_1, a_2) = \frac{1}{(1 + a_1 + a_2)(1 - a_1 + a_2)(1 - a_2)} \begin{pmatrix} 1 + a_2 & -a_1 \\ -a_1 & 1 + a_2 \end{pmatrix}$$
Compute the geodesics of the pullback metric on M
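Identifying the AR(2) parameters of an input sequence can be sketched with ordinary least squares. This is only an illustration: it assumes the convention x[t] = a1*x[t-1] + a2*x[t-2] + e[t] (texts that move the coefficients to the left-hand side get the opposite signs), and the function name `fit_ar2` and the toy sequence are hypothetical.

```python
def fit_ar2(x):
    """Least-squares estimate of (a1, a2) in
    x[t] = a1 * x[t-1] + a2 * x[t-2] + e[t], via the 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(x)):
        u, v, y = x[t - 1], x[t - 2], x[t]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        r1 += u * y
        r2 += v * y
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("sequence too short or degenerate for an AR(2) fit")
    a1 = (r1 * s22 - r2 * s12) / det
    a2 = (r2 * s11 - r1 * s12) / det
    return a1, a2

# Noise-free toy sequence generated with a1 = 0.5, a2 = -0.3 (assumed values).
x = [1.0, 0.5]
for _ in range(30):
    x.append(0.5 * x[-1] - 0.3 * x[-2])
a1, a2 = fit_ar2(x)
```

Each image sequence reduced to a scalar feature would map to one point (a1, a2) in this way, and classification then operates on those points.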
Results on action and ID rec
scalar feature, AR(2) and ARMA models
Uncertainty measures: Intervals, credal sets
assumption: not enough evidence to determine the actual probability describing the problem
a number of formalisms have been proposed to extend or replace classical probability
second-order distributions (Dirichlet), interval probabilities
credal sets
Belief functions (Shafer 76): special case of credal sets
Belief functions as random sets
• Probability on a finite set: function $p: 2^\Theta \to [0,1]$ with $p(A) = \sum_{x \in A} m(x)$, where $m: \Theta \to [0,1]$ is a mass function
• Probabilities are additive: if $A \cap B = \emptyset$ then $p(A \cup B) = p(A) + p(B)$
• belief function $b: 2^\Theta \to [0,1]$ with $b(A) = \sum_{B \subseteq A} m(B)$, where m is a mass function on $2^\Theta$ s.t. $\sum_{B \subseteq \Theta} m(B) = 1$
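The definition of a belief function from a mass assignment on subsets can be made concrete with a short sketch. The function name `belief` and the example masses are illustrative assumptions; focal sets are represented as `frozenset` keys.

```python
def belief(mass, event):
    """b(A): total mass of the focal sets B that are contained in A."""
    event = frozenset(event)
    return sum(m for focal, m in mass.items() if focal <= event)

# Illustrative mass assignment on the frame {a1, a2, a3, a4}.
mass = {
    frozenset({"a1"}): 0.7,
    frozenset({"a1", "a2"}): 0.3,
}
b_a1 = belief(mass, {"a1"})            # only {a1} is contained in {a1}
b_a1a2 = belief(mass, {"a1", "a2"})    # both focal sets are contained
b_a2 = belief(mass, {"a2"})            # no focal set is contained in {a2}
```

Unlike a probability, the belief in {a1} and the belief in {a2} here do not add up to the belief in {a1, a2}, because the mass 0.3 is committed to the pair without being split between its elements.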
Information fusion by Dempster’s rule
several aggregation or elicitation operators proposed
original proposal: Dempster’s rule
Frame: Θ = {a1, a2, a3, a4}
• b1: m({a1})=0.7, m({a1 ,a2})=0.3
• b2: m(Θ)=0.1, m({a2 ,a3 ,a4})=0.9
• b1 ⊕ b2 :
m({a1}) = 0.7*0.1/0.37 = 0.19
m({a2}) = 0.3*0.9/0.37 = 0.73
m({a1 ,a2}) = 0.3*0.1/0.37 = 0.08
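The combination on this slide can be reproduced with a small sketch of Dempster's rule: products of masses are assigned to the intersections of focal sets, and the mass falling on the empty set is normalised away. The function name `dempster_combine` is an assumption, and b2 is taken to assign mass 0.1 to the whole frame Θ = {a1, a2, a3, a4}, which is the reading consistent with the normalisation constant 0.37.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule: intersect focal sets, multiply masses, then
    renormalise by the mass that did not fall on the empty set."""
    combined = {}
    conflict = 0.0
    for (b, x), (c, y) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {focal: v / (1.0 - conflict) for focal, v in combined.items()}

theta = frozenset({"a1", "a2", "a3", "a4"})
b1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
b2 = {theta: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}
fused = dempster_combine(b1, b2)
```

Here the only conflicting pair is {a1} with {a2, a3, a4}, carrying mass 0.7*0.9 = 0.63, which leaves the normaliser 1 - 0.63 = 0.37 used on the slide.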
Imprecise classifiers and credal networks
“imprecise” classifiers
class estimate is a belief function
exploit only available evidence, represent ignorance
Belief networks or credal networks
at each node a belief function or a convex set of probs
robust version of bayesian networks
Model-free pose estimation
estimating the “pose” (internal configuration) of a moving body from the available images
[Figure: image sequence from t=0 to t=T, with pose estimates $\hat{q}_k \in Q$]
if you do not have an a-priori model of the object ..
Learning feature-pose maps
... learn a map between features and poses directly from the data
given pose and feature sequences acquired by motion capture ..
$\tilde{Q} = \{q_1, \dots, q_T\}$, $\{y_1, \dots, y_T\}$
a Gaussian density for each state is set up on the feature space -> approximate feature space
• maps each cluster to the set of training poses $q_k$ with feature $y_k$ inside it
Evidential model
.. and approximate parameter space ..
.. form the “evidential model”
MTNS'00, ISIPTA'05, to submit to Information Fusion
Results on human body tracking
comparison of three models: left view only, right view only, both views
[Plot legend: pose estimate yielded by the overall model; estimate associated with the “right” model; estimate associated with the “left” model; ground truth]
Conclusions - Research
Hot topic in computer vision and machine learning: human motion analysis
Applications: motion capture, surveillance, human machine interaction, biometric identification
Different tools from machine learning, robust statistics, differential geometry can be useful
Several tasks are involved in a hierarchical fashion
Tasks are not isolated, but interact and generate feedbacks to help the solution of the others
Conclusions - Teaching plans
machine vision involves notions coming from different branches of pure and applied mathematics: robust statistics, differential geometry, discrete math
all of them are considered as useful tools to solve real-world problems
students then have the chance to improve their mathematical background ...
... and learn at the same time how to develop real products on the ground
integrated courses can be designed along this line
Conclusions - Commercial partnerships
several opportunities to develop technology transfer activities involving companies
biometrics: in particular, behavioral (non-controlled) identification
surveillance: multi-camera human motion detection and classification
image and video browsing: internet-based content retrieval
personal links with companies like Honeywell Labs (surveillance), Riya (image googling), MS Research