Knowledge-based event recognition from salient regions of activity

Nicolas Moënne-Loccoz
Viper group
Computer vision & multimedia laboratory
University of Geneva

January 23 2003 / [email protected]

M4 – Meeting – January 2004

NML - CVML - UniGe

Outline

• Context

• Salient Regions of Activity (SRA)

• Learning the semantics of SRA

• Visual Event Query language

• Conclusion


Context

• Retrieval of visual events based on user queries
– Abstract representation of the visual content
– Query language to express visual events

• Approach
– Region-based description of the content
– Classification of the regions
– Events queried as spatio-temporal constraints on the regions


Overview

[Diagram: Videos database → Region extraction → Salient regions of activity → Classification → Labelled regions; Domain knowledge and User queries complete the diagram]


Salient regions of activity

• Regions of the image space
– Moving in the scene
– Having a homogeneous colour distribution

Moving objects or meaningful parts of moving objects

• Extraction :
– From moving salient points
– By an adaptive mean-shift algorithm


Salient points extraction

• Scale invariant interest points (Mikolajczyk, Schmid 2001)

– Extracted in the linear scale-space

– Local maxima of the scale normalized Harris function (image space)

– Local maxima of the scale normalized Laplacian (scale space)

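Harris function at position $v$ and scale $s$ (with $\alpha$ a constant):

$$h(v, s) = \det\big(H(v, s)\big) - \alpha\,\mathrm{Trace}^2\big(H(v, s)\big)$$

$$H(v, s) = \begin{pmatrix} L_x^2(v, s) & L_x L_y(v, s) \\ L_x L_y(v, s) & L_y^2(v, s) \end{pmatrix}$$

Scale normalized Laplacian:

$$l(v, s) = s^2\,\big| L_{xx}(v, s) + L_{yy}(v, s) \big|$$

Scale-space derivatives of the image $I$ ($G$ is the Gaussian kernel of scale $s$):

$$L_{v_i}(v, s) = I(v) * G_{v_i}(s)$$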
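As a rough illustration of the Harris measure, the sketch below evaluates det(H) − α·Trace²(H) on a toy image. It is a simplification under stated assumptions, not the multi-scale detector of the slides: it works at a single scale, uses central differences instead of Gaussian derivatives, and sums the derivative products over a 3x3 box window (at a single pixel the matrix H has rank one, so some integration window is needed for the determinant to be informative). The function name `harris_response` and the value `alpha=0.06` are illustrative choices.

```python
def harris_response(img, alpha=0.06):
    """Harris measure det(H) - alpha * trace(H)^2 at every interior pixel.

    Assumptions: single scale, central-difference derivatives, and a 3x3
    box window standing in for the Gaussian integration window.
    """
    rows, cols = len(img), len(img[0])
    # First derivatives by central differences (left at zero on the border).
    lx = [[0.0] * cols for _ in range(rows)]
    ly = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            lx[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            ly[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0

    resp = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            a = b = c = 0.0  # window sums of Lx^2, Lx*Ly, Ly^2
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = lx[y + dy][x + dx], ly[y + dy][x + dx]
                    a += gx * gx
                    b += gx * gy
                    c += gy * gy
            resp[y][x] = (a * c - b * b) - alpha * (a + c) ** 2
    return resp


# Toy image: a bright block in the lower-right quadrant, so (4, 4) is a corner.
image = [[1.0 if (y >= 4 and x >= 4) else 0.0 for x in range(8)]
         for y in range(8)]
response = harris_response(image)
corner, edge, flat = response[4][4], response[6][4], response[1][1]
```

On this toy image the response is positive at the corner, negative along the straight edge and zero in the flat area, which is the behaviour that makes local maxima of the measure usable as interest points.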


Salient points extraction

• Example : [figure; label: scale]


Salient points trajectories

• Trajectories used to :
– Find salient points moving in the scene
– Track salient points over time

• Points matching using Local grayvalue invariants (Schmid)

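$$
g(w) = \begin{pmatrix}
L \\
L_i L_i \\
L_i L_{ij} L_j \\
L_{ii} \\
L_{ij} L_{ji} \\
\varepsilon_{ij}\,(L_{jkl} L_i L_k L_l - L_{jkk} L_i L_l L_l) \\
L_{iij} L_j L_k L_k - L_{ijk} L_i L_j L_k \\
-\varepsilon_{ij}\, L_{jkl} L_i L_k L_l \\
L_{ijk} L_i L_j L_k
\end{pmatrix}
$$

with summation over repeated indices $i, j, k, l \in \{x, y\}$, e.g.

$$L_i L_i = \sum_{i \in \{x, y\}} L_i L_i, \qquad L_i L_{ij} L_j = \sum_{i, j \in \{x, y\}} L_i L_{ij} L_j$$

and $\varepsilon_{ij}$ the antisymmetric tensor ($\varepsilon_{xy} = -\varepsilon_{yx} = 1$, $\varepsilon_{xx} = \varepsilon_{yy} = 0$).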
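As a worked expansion, assuming the usual convention that a repeated index is summed over the two image coordinates $x$ and $y$, the components of the local grayvalue invariants that involve only first and second derivatives read:

$$
\begin{aligned}
L_i L_i &= L_x^2 + L_y^2 \\
L_i L_{ij} L_j &= L_{xx} L_x^2 + 2\, L_{xy} L_x L_y + L_{yy} L_y^2 \\
L_{ii} &= L_{xx} + L_{yy} \\
L_{ij} L_{ji} &= L_{xx}^2 + 2\, L_{xy}^2 + L_{yy}^2
\end{aligned}
$$

The first is the squared gradient magnitude and the third is the Laplacian; all four are unchanged by a rotation of the image.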


Salient points trajectories

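• Mahalanobis distance (with $\Lambda$ the covariance matrix of the invariants) :

$$d(w_i, w_j) = \sqrt{\big(g(w_i) - g(w_j)\big)^T \Lambda^{-1} \big(g(w_i) - g(w_j)\big)}$$

• Set of matching points minimize

$$\sum_{w_i \in W_t,\; w_j \in W_{t+1}} d(w_i, w_j)$$

– Greedy Winner-Takes-All algorithm

Set of points trajectories :

$$T_{w_i} = \{(w_i, w_j) \mid w_i \in W_t,\; w_j \in W_{t+1}\}$$

Moving salient points : selected from their trajectories $T_w$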
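A greedy winner-takes-all matching between the salient points of two consecutive frames could look like the sketch below. It is only an illustration: the covariance is assumed diagonal (one variance per descriptor component), the descriptors are short made-up tuples rather than the nine invariants, and the rejection threshold `max_dist` is a hypothetical parameter not taken from the slides.

```python
import math


def mahalanobis(a, b, variances):
    """Mahalanobis distance between two descriptors, diagonal covariance assumed."""
    return math.sqrt(sum((x - y) ** 2 / v for x, y, v in zip(a, b, variances)))


def greedy_match(prev, curr, variances, max_dist=float("inf")):
    """Winner-takes-all matching: repeatedly accept the closest pair whose
    two points are both still unmatched."""
    pairs = sorted(
        (mahalanobis(p, c, variances), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
    )
    used_prev, used_curr, matches = set(), set(), []
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # pairs are sorted, so every remaining pair is too far apart
        if i not in used_prev and j not in used_curr:
            matches.append((i, j))
            used_prev.add(i)
            used_curr.add(j)
    return matches


# Made-up descriptors for the points of frame t and frame t+1.
prev_desc = [(1.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
curr_desc = [(5.2, 4.9), (1.1, 0.2), (20.0, 20.0)]
matches = greedy_match(prev_desc, curr_desc, variances=(1.0, 1.0), max_dist=3.0)
```

Here the first two points of frame t find their counterparts in frame t+1, while the third stays unmatched because no candidate lies within the threshold. The greedy strategy does not guarantee the global minimum of the summed distance; it trades optimality for simplicity.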


Salient regions estimation

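• Estimate characteristic regions of the moving salient points

• Mean-Shift algorithm : estimate the position $v_r$ of $r$

$$v_r \leftarrow \frac{\sum_{v \in r} K_r(v - v_r)\, P_w(v)\, v}{\sum_{v \in r} K_r(v - v_r)\, P_w(v)}$$

Likelihood of pixels (RGB colour distribution around the salient point $w$)

$$P_w(v) = P\big(v \mid \mathcal{N}(\mu_w, \Sigma_w)\big)$$

Ellipsoidal Epanechnikov Kernel (0 outside the ellipsoid)

$$K_r(v - v_r) = \frac{3}{4}\Big(1 - (v - v_r)^T\, \Sigma_r^{-1}\, (v - v_r)\Big)$$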
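The position update can be sketched on a small likelihood map as follows. This is a minimal illustration under assumptions: the likelihood is given directly as a grid of numbers rather than computed from an RGB colour model, the kernel shape is fixed during the iterations, and the names `mean_shift`, `max_iter` and `tol` are illustrative.

```python
def epanechnikov(dx, dy, inv_cov):
    """Ellipsoidal Epanechnikov weight 3/4 * (1 - d^T S^-1 d), 0 outside the ellipse."""
    (a, b), (c, d) = inv_cov
    q = dx * (a * dx + b * dy) + dy * (c * dx + d * dy)
    return 0.75 * (1.0 - q) if q < 1.0 else 0.0


def mean_shift(likelihood, start, inv_cov, max_iter=50, tol=1e-3):
    """Move the kernel centre to the kernel- and likelihood-weighted mean of
    the pixel positions until it stops moving."""
    vx, vy = start
    for _ in range(max_iter):
        sw = sx = sy = 0.0
        for y, row in enumerate(likelihood):
            for x, p in enumerate(row):
                w = epanechnikov(x - vx, y - vy, inv_cov) * p
                sw += w
                sx += w * x
                sy += w * y
        if sw == 0.0:
            break  # no likelihood mass under the kernel; keep the position
        nx, ny = sx / sw, sy / sw
        converged = abs(nx - vx) < tol and abs(ny - vy) < tol
        vx, vy = nx, ny
        if converged:
            break
    return vx, vy


# Toy likelihood map: a 3x3 blob of likely pixels centred at (x=6, y=5).
grid = [[1.0 if (5 <= x <= 7 and 4 <= y <= 6) else 0.0 for x in range(12)]
        for y in range(10)]
inv_cov = ((1 / 16.0, 0.0), (0.0, 1 / 16.0))  # circular kernel of radius 4
centre = mean_shift(grid, start=(4.0, 4.0), inv_cov=inv_cov)
```

Starting two pixels away from the blob, the centre moves onto it within a few iterations.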


Salient regions estimation

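• Kernel adaptation step : estimate shape and size $\Sigma_r$ of $r$

$$\Sigma_r \leftarrow \operatorname{cov}_{v \in r}\big(P_w(v)\big)$$

• Algorithm :

    for each moving salient point $w \in W$
        $v_r \leftarrow v_w$, $\Sigma_r \leftarrow \operatorname{diag}(3 s_w, 3 s_w)$
        repeat
            $v_r \leftarrow$ Mean Shift
            $\Sigma_r \leftarrow$ Kernel Adaptation
        until $v_r$, $\Sigma_r$ converge
        $W_r \leftarrow$ salient points of $W$ lying in $r$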
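One plausible reading of the kernel adaptation step is a likelihood-weighted covariance of the pixel positions inside the current ellipse; the exact rule used by the author is not recoverable from the slide, so the sketch below is an assumption. The function name `weighted_covariance` is hypothetical, and no rescaling of the covariance into a kernel bandwidth is applied.

```python
def weighted_covariance(likelihood, centre, inv_cov):
    """Covariance of the pixel positions around `centre`, weighted by the
    pixel likelihood and restricted to the ellipse d^T S^-1 d < 1."""
    vx, vy = centre
    (a, b), (c, d) = inv_cov
    sw = sxx = sxy = syy = 0.0
    for y, row in enumerate(likelihood):
        for x, p in enumerate(row):
            dx, dy = x - vx, y - vy
            if dx * (a * dx + b * dy) + dy * (c * dx + d * dy) >= 1.0:
                continue  # pixel lies outside the current ellipse
            sw += p
            sxx += p * dx * dx
            sxy += p * dx * dy
            syy += p * dy * dy
    if sw == 0.0:
        raise ValueError("no likelihood mass inside the ellipse")
    return ((sxx / sw, sxy / sw), (sxy / sw, syy / sw))


# Toy likelihood map: an elongated blob, 5 pixels wide and 1 pixel high.
grid = [[1.0 if (y == 5 and 4 <= x <= 8) else 0.0 for x in range(12)]
        for y in range(10)]
cov = weighted_covariance(grid, centre=(6.0, 5.0),
                          inv_cov=((1 / 25.0, 0.0), (0.0, 1 / 25.0)))
```

For the horizontal blob the estimated covariance has all its variance along x, so the adapted ellipse would stretch horizontally. In a full loop this step would alternate with the mean-shift position update until both stabilise.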


Salient regions representation

• Set of salient regions of activity represented by :
– Position $v_r$
– Ellipsoid $\Sigma_r$
– Colour distribution $(\mu_{rgb}, \Sigma_{rgb})$
– Set of salient points $W_r$

• Salient regions tracking
– Regions are matched by a majority vote of their salient points
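The majority vote used to track regions can be sketched as below, assuming each salient point has already been tracked from the previous frame and remembers which region it belonged to there. The data layout (dictionaries keyed by point and region identifiers) and all identifiers are hypothetical.

```python
from collections import Counter


def match_regions(curr_regions, point_to_prev_region):
    """For each current region, let its tracked salient points vote for the
    previous-frame region they belonged to; the most voted region wins."""
    matches = {}
    for region_id, point_ids in curr_regions.items():
        votes = Counter(
            point_to_prev_region[p]
            for p in point_ids
            if p in point_to_prev_region
        )
        matches[region_id] = votes.most_common(1)[0][0] if votes else None
    return matches


# Hypothetical data: salient point id -> region id in the previous frame.
prev_membership = {"w1": "A", "w2": "A", "w3": "B", "w4": "B", "w5": "B"}
# Regions of the current frame with the tracked points they contain.
current = {"r1": ["w1", "w2", "w3"], "r2": ["w4", "w5"], "r3": ["w9"]}
result = match_regions(current, prev_membership)
```

Region `r1` is matched to `A` by two votes against one, `r2` to `B`, and `r3` has no tracked point and is treated as a new region.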


Salient regions of activity


Regions classification

• To obtain an abstract description :
– Map regions to a domain-specific basic vocabulary

Meetings : {Arm, Head, Body, Noise}

• SVM classifier :

– Set of 500 annotated salient regions of activity (~200 frames)


Regions classification

• Confusion Matrix :

            Arm     Head    Body    Noise
    Arm     1.000   0       0       0
    Head    0       0.909   0.091   0
    Body    0       0       1.000   0
    Noise   0       0.052   0       0.946

• Discussion :
– Noise class is ill-defined
– Good results explained by the limited number of classes


Visual event language

• To express visual event queries
– Spatio-temporal constraints on labelled regions (LR)

• To integrate domain knowledge
– As a specification of the layout (L)
– As a set of basic events

A formula of the language is a conjunction of :

– Temporal relations {after, just-after} between 2 LR
– Spatial relations {above, left} between 2 LR, {in} between a LR and a L
– Identity relations {is} between 2 LR, {is-a} between a LR and a label


Knowledge - Meetings

• Scene layout : L = {SEATS, DOOR, BOARD}


Knowledge - Meetings

• Basic events : {Meeting-participant, sitting, standing}

Meeting-participant : actor : LR
    constraints : is-a(head, LR).

Sitting : actor : LR
    constraints : Meeting-participant(LR),
                  in(SEATS, LR).

Standing : actor : LR
    constraints : Meeting-participant(LR),
                  ~in(SEATS, LR).


Events queries

• Example of user queries :

Sitting-down : actors LR1, LR2

constraints is(LR1, LR2),

sitting(LR1),

standing(LR2),

just-after(LR1, LR2).

Go-to-board : actors LR1, LR2

constraints is(LR1, LR2),

standing(LR1),

~in(BOARD, LR1),

standing(LR2),

in(BOARD, LR2),

just-after(LR2, LR1).
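To make the semantics of such queries concrete, the sketch below evaluates the Sitting-down query over a handful of labelled regions. It is an illustration under assumptions, not the author's engine: a labelled region is modelled as one observation of a tracked region in one frame, `is` is taken to mean "same track", `just-after(a, b)` is taken to mean that `a` is observed in the frame immediately following `b`, and the class `LabelledRegion` with its fields is hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class LabelledRegion:
    track: int        # identity of the tracked region
    frame: int        # time index of the observation
    label: str        # class assigned by the region classifier
    zones: frozenset  # layout elements the region lies in


# Relations of the language (argument order follows the slides).
def is_a(label, lr):
    return lr.label == label

def in_(zone, lr):
    return zone in lr.zones

def is_(a, b):
    return a.track == b.track

def just_after(a, b):
    return a.frame == b.frame + 1  # assumption: a directly follows b


# Basic events from the domain knowledge.
def meeting_participant(lr):
    return is_a("head", lr)

def sitting(lr):
    return meeting_participant(lr) and in_("SEATS", lr)

def standing(lr):
    return meeting_participant(lr) and not in_("SEATS", lr)


# User query: a conjunction of constraints over two labelled regions.
def sitting_down(lr1, lr2):
    return (is_(lr1, lr2) and sitting(lr1) and standing(lr2)
            and just_after(lr1, lr2))


def query(event, regions):
    """Return every ordered pair of labelled regions satisfying the event."""
    return [(a, b) for a, b in permutations(regions, 2) if event(a, b)]


observations = [
    LabelledRegion(1, 10, "head", frozenset()),
    LabelledRegion(1, 11, "head", frozenset({"SEATS"})),
    LabelledRegion(2, 11, "arm", frozenset({"SEATS"})),
]
hits = query(sitting_down, observations)
```

Only the pair formed by the two observations of track 1 satisfies the query: the head region is outside SEATS at frame 10 and inside at frame 11. The exhaustive pair enumeration is quadratic in the number of labelled regions, which is acceptable for a sketch but would need indexing on real video.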


Events queries - Results

• Results :

                   Precision   Recall
    Sit-down       0.43        1.00
    Stand-up       0.50        1.00
    Go-to-board    1.00        1.00
    Enter          0.20        1.00
    Leave          0.25        0.50

• Discussion :
– Recall validates the retrieval capability
– False alarms occur because of the hard decision


Conclusion

• Contributions
– Well-suited framework for constrained domains
– Generic representation of the visual content
– Paradigm to retrieve visual events from videos

• Limitations
– Cannot retrieve all visual events (e.g. emotion)

• Ongoing work
– Uncertainty handling and fuzziness
– Integration of other modalities (e.g. transcripts)
