lectureship a proposal for advancing computer graphics, imaging and multimedia design at rgu robert...

Lectureship

A proposal for advancing computer graphics, imaging and multimedia design at RGU

Robert Gordon University

Aberdeen, 20/6/2008

Fabio Cuzzolin

INRIA Rhone-Alpes

Career path

Master’s thesis on gesture recognition at the University of Padova

Visiting student, ESSRL, Washington University in St. Louis, and at the University of California at Los Angeles (2000)

Ph.D. thesis on belief functions and uncertainty theory (2001)

Researcher at Politecnico di Milano with the Image and Sound Processing group (2003-2004)

Post-doc at the University of California at Los Angeles, UCLA Vision Lab (2004-2006)

Marie Curie fellow at INRIA Rhone-Alpes

collaborations with several groups

Scientific production and collaborations

collaborations with journals:

IEEE PAMI, IEEE SMC-B, CVIU, Information Fusion, Int. J. Approximate Reasoning

PC member for VISAPP, FLAIRS, IMMERSCOM, ISAIM

currently 4+10 journal papers and 31+8 conference papers

SIPTA

Setubal

CMU

Pompeu Fabra

EPFL-IDIAP

UBoston

My background

research

Discrete math

linear independence on lattices and matroids

Uncertainty theory

geometric approach

algebraic analysis

generalized total probability

Machine learning

Manifold learning for dynamical models

Computer vision

gesture and action recognition

3D shape analysis and matching

Gait ID

pose estimation

action recognition

action segmentation

A multi-layer framework for human motion analysis

different tasks, integrated in a series of layers

feedbacks act between different layers

multiple views

3D reconstruction

unsupervised body-part segmentation

image data fusion

model fitting (stick-articulated)

motion capture

identity recognition

surveillance

HMI


Action and gesture recognition

Laplacian unsupervised segmentation

Matching of 3D shapes by embedded orthogonal alignment

Bilinear models for invariant gaitID


The role of uncertainty measures

Information fusion for model-free pose estimation

HMMs for gesture recognition

transition matrix A -> gesture dynamics

state-output matrix C -> collection of hand poses

Hand poses were represented by “size functions” (BMVC'97)

Gesture classification

HMM 1, HMM 2, …, HMM n

EM to learn HMM parameters from an input sequence

the new sequence is fed to the learnt gesture models

they produce a likelihood

the most likely model is chosen (if above a threshold)

OR new model is attributed the label of the closest one (using K-L divergence or other distances)
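The likelihood-based classification step above can be sketched as follows. This is only an illustration under stated assumptions: the HMMs here emit discrete symbols (the slides do not specify the output model), the names `log_likelihood`, `classify` and `MODELS` are hypothetical, and the toy parameters are made up rather than learnt by EM.

```python
import math

def log_likelihood(obs, pi, A, C):
    """Log-likelihood of a symbol sequence under a discrete-output HMM
    (scaled forward algorithm). pi: initial state probabilities,
    A: transition matrix, C: state-output matrix (C[state][symbol])."""
    n = len(pi)
    alpha = [pi[i] * C[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # propagate through the dynamics, then weight by the output probability
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * C[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik

def classify(obs, models, threshold):
    """Return the label of the most likely model, or None if its
    log-likelihood falls below the threshold."""
    scores = {label: log_likelihood(obs, *params)
              for label, params in models.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Toy gesture models (assumed values): (pi, A, C) for two 2-state HMMs.
MODELS = {
    "gesture_1": ([1.0, 0.0], [[0.7, 0.3], [0.3, 0.7]], [[0.9, 0.1], [0.1, 0.9]]),
    "gesture_2": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]),
}

label = classify([0, 0, 0, 1, 1, 1], MODELS, threshold=-10.0)
```

The threshold is expressed in log-likelihood units; raising it makes the classifier reject sequences that no learnt gesture explains well.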

Volumetric action recognition

• 2D approaches: features are extracted from single views -> viewpoint dependence

• volumetric approach: features are extracted from a volumetric reconstruction of the moving body (ICIP'04)

Unsupervised coherent 3D segmentation

to recognize actions we need to extract features

segmenting moving articulated 3D bodies into parts

along sequences, in a consistent way

in an unsupervised fashion

robustly, with respect to changes of the topology of the moving body

as a building block of a wider motion analysis and capture framework

ICCV-HM'07, CVPR'08, to submit to IJCV

Clustering after Laplacian embedding

generates a lower-dim, widely separated embedded cloud

less sensitive to topology changes than other methods

less computationally expensive than ISOMAP

[Figure: two rigid parts joined by a moving joint area; neighborhoods inside the rigid parts are unaffected, neighborhoods around the joint are affected]

local neighborhoods -> stable under articulated motion

Algorithm

K-wise clustering in the embedding space

Seed propagation along time

To ensure time consistency clusters’ seeds have to be propagated along time

Old positions of clusters in 3D are added to new cloud and embedded

Result: new seeds
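The seed propagation idea can be illustrated with a small sketch. It assumes the points of each frame have already been embedded (the Laplacian embedding step is omitted) and uses plain k-means started from the previous frame's cluster centres as a stand-in for the clustering used in the talk; `kmeans_from_seeds` and the toy frames are hypothetical.

```python
def kmeans_from_seeds(points, seeds, iters=20):
    """Lloyd's k-means started from the given seeds.
    Returns (labels, centroids); cluster k keeps the identity of seed k."""
    centroids = [list(s) for s in seeds]
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point goes to its closest centroid
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(pt, centroids[k])))
            for pt in points
        ]
        # update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Two toy frames (assumed data): the same two parts, slightly displaced.
frame1 = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
frame2 = [(6.0, 5.0), (1.0, 0.1), (1.2, 0.0), (6.1, 5.2)]

labels1, seeds1 = kmeans_from_seeds(frame1, [(0.0, 0.0), (5.0, 5.0)])
# the centroids of frame 1 act as seeds for frame 2, keeping labels consistent
labels2, seeds2 = kmeans_from_seeds(frame2, seeds1)
```

Because cluster k is always initialised from the previous position of cluster k, a body part keeps the same label from one frame to the next even if the points are listed in a different order.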

Results

Coherent clustering along a sequence

Handling of topology changes

Laplacian matching of dense meshes or voxelsets

as embeddings are pose-invariant (for articulated bodies)

they can then be used to match dense shapes by simply aligning their images after embedding

ICCV '07 – NTRL, ICCV '07 – 3dRR, CVPR '08, submitted to ECCV'08, to submit to PAMI

Eigenfunction Histogram assignment

Algorithm:

compute Laplacian embedding of the two shapes

find assignment between eigenfunctions of the two shapes

this selects a section of the embedding space

embeddings are orthogonally aligned there by EM

Results

Applications: graph matching, protein analysis, motion capture

to propagate body-part segmentation in time

motion field estimation, action segmentation

Application: spatio-temporal action segmentation

problem: segmenting parts of the video(s) containing “interesting” motions

• global approach: working on the entire sequence (multidimensional volume)

• previous works: object segmentation on the spatio-temporal volume for single frames

idea: in a multi-camera setup, working on 3D clouds (hulls) + motion fields + time = 7D volume

• outline of an approach: smoothing using message passing + shape detection on the obtained manifold

Bilinear models for gait-ID

$$y^{SC} = A^{S} b^{C}$$

To recognize the identity of humans from their gait (CVPR '06, book chapter in progress)

nuisance factors: emotional state, illumination, appearance, view invariance ... (literature: randomized trees)

each motion possesses several labels: action, identity, viewpoint, emotional state, etc.

• bilinear models (Tenenbaum) can be used to separate the influence of “style” and “content” (the label to classify)

Content classification of unknown style

given a training set in which persons (content=ID) are seen walking from different viewpoints (style=viewpoint)

an asymmetric bilinear model can be learned from it through SVD

when new motions are acquired in which a known person is being seen walking from a different viewpoint (unknown style)…

an iterative EM procedure can be set up to classify the content

E step -> estimation of p(c|s), the prob. of the content given the current estimate s of the style

M step -> estimation of the linear map for unknown style s

Three-layer model

each sequence is encoded as an HMM

its C matrix is stacked in a single observation vector

a bilinear model is learnt from those vectors

Three-layer model

Features: projections of silhouette's contours onto a line through the center

Results on CMU database

Mobo database: 25 people performing 4 different walking actions, from 6 cameras. Three labels: action, id, view

Compared performances with “baseline” algorithm and straight k-NN on sequence HMMs

Learning manifolds of dynamical models

Classify movements represented as dynamical models

for instance, each image sequence can be mapped to an ARMA, or AR linear model

Motion classification then reduces to finding a suitable distance function in the space of dynamical models

when some a-priori info is available (training set)..

.. we can learn in a supervised fashion the “best” metric for the classification problem!

To submit to ECCV'08 – MLVMA Workshop

Learning pullback metrics

many unsupervised algorithms take an input dataset and map it to an embedded space, but fail to learn a full metric

consider then a family of diffeomorphisms F between the original space M and a metric space N

the diffeomorphism F induces on M a pullback metric

maximizing inverse volume finds the manifold which best interpolates the data (geodesics pass through “crowded” regions)

$$O(D) = \prod_{k=1}^{N} \frac{\left(\det g(m_k)\right)^{-1/2}}{\int_{M} \left(\det g(m)\right)^{-1/2}\, dm}$$

Space of AR(2) models

given an input sequence, we can identify the parameters of the linear model which best describes it

autoregressive models of order 2: AR(2)

Fisher metric on AR(2)

Compute the geodesics of the pullback metric on M

$$g(a_1, a_2) = \frac{1}{(1 + a_1 + a_2)(1 - a_1 + a_2)(1 - a_2)} \begin{pmatrix} 1 + a_2 & a_1 \\ a_1 & 1 + a_2 \end{pmatrix}$$
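The identification step mentioned above (mapping an input sequence to the parameters of an AR(2) model) can be sketched with ordinary least squares. This is a minimal illustration, not the identification procedure used in the talk: the sign convention y[t] = a1*y[t-1] + a2*y[t-2] + e[t] is an assumption and may differ from the one behind the Fisher metric, and `fit_ar2` and `simulate_ar2` are hypothetical helper names.

```python
def fit_ar2(y):
    """Least-squares estimate of (a1, a2) in
    y[t] = a1*y[t-1] + a2*y[t-2] + e[t] (assumed sign convention)."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for t in range(2, len(y)):
        u, v, target = y[t - 1], y[t - 2], y[t]
        s11 += u * u
        s12 += u * v
        s22 += v * v
        r1 += u * target
        r2 += v * target
    # solve the 2x2 normal equations [[s11, s12], [s12, s22]] [a1, a2] = [r1, r2]
    det = s11 * s22 - s12 * s12
    if det == 0.0:
        raise ValueError("degenerate sequence: AR(2) parameters not identifiable")
    a1 = (r1 * s22 - r2 * s12) / det
    a2 = (s11 * r2 - s12 * r1) / det
    return a1, a2

def simulate_ar2(a1, a2, y0, y1, n):
    """Noise-free AR(2) sequence of length n, used here only as test data."""
    y = [y0, y1]
    while len(y) < n:
        y.append(a1 * y[-1] + a2 * y[-2])
    return y

sequence = simulate_ar2(0.5, -0.3, 1.0, 0.5, 30)
a1_hat, a2_hat = fit_ar2(sequence)
```

Each image sequence (reduced to a scalar feature) then becomes a single point (a1, a2) in the space of AR(2) models, which is where the metric above is defined.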

Results on action and ID rec

scalar feature, AR(2) and ARMA models

assumption: not enough evidence to determine the actual probability describing the problem

second-order distributions (Dirichlet), interval probabilities

credal sets

Uncertainty measures: Intervals, credal sets

Belief functions (Shafer 76): special case of credal sets

a number of formalisms have been proposed to extend or replace classical probability

• Probability on a finite set: function p: 2^Θ -> [0,1] with $p(A) = \sum_{x \in A} m(x)$, where m: Θ -> [0,1] is a mass function

• Probabilities are additive: if $A \cap B = \emptyset$ then $p(A \cup B) = p(A) + p(B)$

• if m is a mass function on 2^Θ s.t. $\sum_{B \subseteq \Theta} m(B) = 1$

Belief functions as random sets

• belief function b: 2^Θ -> [0,1], $b(A) = \sum_{B \subseteq A} m(B)$
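A belief value can be computed directly from a mass function by summing the masses of all focal sets contained in the event. The sketch below assumes a mass function stored as a dictionary from `frozenset` to mass; the representation and the name `belief` are illustrative choices.

```python
def belief(mass, event):
    """b(A): total mass of the focal sets B that are subsets of A."""
    event = frozenset(event)
    return sum(m for focal, m in mass.items() if focal <= event)

# Mass function with two focal sets: m({a1}) = 0.7, m({a1, a2}) = 0.3
mass = {
    frozenset({"a1"}): 0.7,
    frozenset({"a1", "a2"}): 0.3,
}

b_a1 = belief(mass, {"a1"})          # only {a1} is contained in {a1}
b_a2 = belief(mass, {"a2"})          # no focal set is contained in {a2}
b_a1a2 = belief(mass, {"a1", "a2"})  # both focal sets are contained
```

Unlike a probability, the belief is not additive here: b({a1}) + b({a2}) is smaller than b({a1, a2}), and the gap is the mass left on the non-singleton set {a1, a2}.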

Information fusion by Dempster’s rule

several aggregation or elicitation operators proposed

original proposal: Dempster’s rule

• frame: Θ = {a1, a2, a3, a4}

• b1:

m({a1})=0.7, m({a1,a2})=0.3

• b2:

m(Θ)=0.1, m({a2,a3,a4})=0.9

• b1 ⊕ b2:

m({a1}) = 0.7*0.1/0.37 = 0.19

m({a2}) = 0.3*0.9/0.37 = 0.73

m({a1,a2}) = 0.3*0.1/0.37 = 0.08
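The numbers in this example can be checked with a short sketch of Dempster's rule: multiply the masses of every pair of focal sets, assign the product to their intersection, discard the mass falling on the empty set, and renormalise. The dictionary representation and the function name `dempster` are assumptions made for illustration.

```python
def dempster(m1, m2):
    """Combine two mass functions (dicts frozenset -> mass) by Dempster's rule."""
    combined = {}
    conflict = 0.0
    for focal1, x in m1.items():
        for focal2, y in m2.items():
            inter = focal1 & focal2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + x * y
            else:
                conflict += x * y  # mass that would go to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the combination is undefined")
    norm = 1.0 - conflict
    return {focal: v / norm for focal, v in combined.items()}

theta = frozenset({"a1", "a2", "a3", "a4"})
b1 = {frozenset({"a1"}): 0.7, frozenset({"a1", "a2"}): 0.3}
b2 = {theta: 0.1, frozenset({"a2", "a3", "a4"}): 0.9}

result = dempster(b1, b2)
```

Here the only conflicting pair is {a1} with {a2, a3, a4}, with product 0.7*0.9 = 0.63, which gives the normalisation factor 1 - 0.63 = 0.37 that appears in the denominators above.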

Imprecise classifiers and credal networks

“imprecise” classifiers

class estimate is a belief function

exploit only available evidence, represent ignorance

Belief networks or credal networks

at each node a belief function or a convex set of probs

robust version of Bayesian networks

Model-free pose estimation

estimating the “pose” (internal configuration) of a moving body from the available images

[Figure: pose estimates $\hat{q}$ in the pose space Q along the sequence, from t=0 to t=T]

if you do not have an a-priori model of the object ..

Learning feature-pose maps

... learn a map between features and poses directly from the data

given pose and feature sequences acquired by motion capture ..

$q_1, \dots, q_T$ (poses), $y_1, \dots, y_T$ (features)

$\tilde{Q}$

a Gaussian density for each state is set up on the feature space -> approximate feature space

• maps each cluster to the set of training poses qk with feature yk inside it

Evidential model


.. and approximate parameter space ..

.. form the “evidential model”

MTNS'00, ISIPTA'05, to submit to Information Fusion

Results on human body tracking

comparison of three models: left view only, right view only, both views

pose estimation yielded by the overall model

estimate associated with the “right” model

ground truth

• “left” model

Conclusions - Research

Hot topic in computer vision and machine learning: human motion analysis

Applications: motion capture, surveillance, human machine interaction, biometric identification

Different tools from machine learning, robust statistics, differential geometry can be useful

Several tasks are involved in a hierarchical fashion

Tasks are not isolated, but interact and generate feedbacks to help the solution of the others

Conclusions - Teaching plans

machine vision involves notions coming from different branches of pure and applied mathematics: robust statistics, differential geometry, discrete math

all of them are considered as useful tools to solve real-world problems

students then have the chance to improve their mathematical background ...

... and learn at the same time how to develop real products on the ground

integrated courses can be designed along this line

Conclusions – Commercial partnerships

several opportunities to develop technology transfer activities involving companies

biometrics: in particular, behavioral (non-controlled) identification

surveillance: multi-camera human motion detection and classification

image and video browsing: internet-based content retrieval

personal links with companies like Honeywell Labs (surveillance), Riya (image googling), MS Research
