Multi-View Super Vector for Action Recognition

Shenzhen Institutes of Advanced Technology, CAS; Chinese University of Hong Kong

Zhuowei Cai, Yu Qiao, Xiaojiang Peng, Limin Wang

Content

• Motivation

• M-PCCA model & MVSV representation

• Experimental Results

Actions in Video Clips can be captured by ...

Static feature: HOG

Dynamic features: HOF, MBHx/MBHy

*Video from the ChaLearn Looking at People challenge

Feature Fusion - Concatenation

HOG HOF

Concatenation before GMM: HOG + HOF

Defect: features are presumed to be strongly correlated

Feature Fusion - Kernel Average

HOG HOF

Concatenation after GMM: CodeHOG + CodeHOF

Defect: features are presumed to be mutually independent

CodeHOG CodeHOF
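A minimal sketch of why kernel averaging amounts to concatenation after encoding, assuming linear kernels: averaging the per-view linear kernels gives the same Gram matrix as a linear kernel on the concatenated codes scaled by 1/sqrt(2). The arrays `code_hog` and `code_hof` are hypothetical stand-ins for the encoded features, filled with random values for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoded features for 5 videos, one code per view.
code_hog = rng.normal(size=(5, 8))  # stand-in for CodeHOG
code_hof = rng.normal(size=(5, 6))  # stand-in for CodeHOF

# Kernel-level fusion: average the two per-view linear kernels.
k_avg = 0.5 * (code_hog @ code_hog.T + code_hof @ code_hof.T)

# Concatenation after encoding, with every code scaled by 1/sqrt(2).
fused = np.hstack([code_hog, code_hof]) / np.sqrt(2.0)
k_cat = fused @ fused.T

# The two Gram matrices agree up to floating-point error.
kernels_match = bool(np.allclose(k_avg, k_cat))
print(kernels_match)
```

Because the fused kernel is a plain sum over views, no cross terms between HOG and HOF codes appear, which is one way to read the independence assumption noted on the slide.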

Decomposition

HOG HOF

HOG-Specific, HOF-Specific, HOG/HOF-Shared

Merit: features are decomposed into relatively independent components


M-PCCA: Mixture of Probabilistic Canonical Correlation Analyzers

Latent Variable Models

V = W Z + Zv

Z ~ N(0, I), Zv ~ N(μ, Φ)

Probabilistic Principal Component Analysis: Φ = σ²I. *M. Tipping, C. Bishop

Probabilistic Factor Analysis: Φ is diagonal.

Probabilistic Canonical Correlation Analyzer. *F. Bach, M. Jordan; A. Klami, S. Kaski

X = Wx Z + Zx

Y = Wy Z + Zy

Z ~ N(0, I), Zx ~ N(μx, Φx), Zy ~ N(μy, Φy).

Mixture of Probabilistic Models

Mixture Version: M-PPCA *M. Tipping, C. Bishop; M-FA *Z. Ghahramani, G. Hinton
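The two-view model can be illustrated by sampling from it. The sketch below draws X = Wx Z + Zx and Y = Wy Z + Zy for a single component and checks that the empirical cross-covariance approaches Wx Wyᵀ, so all correlation between the views flows through the shared Z. All sizes and parameter values are illustrative assumptions, and the noise covariances are taken to be diagonal only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, dx, dy = 200_000, 2, 4, 3  # illustrative sizes (assumptions)

# Randomly chosen model parameters for one component.
w_x = rng.uniform(-1.0, 1.0, size=(dx, d))
w_y = rng.uniform(-1.0, 1.0, size=(dy, d))
mu_x, mu_y = rng.normal(size=dx), rng.normal(size=dy)
phi_x = np.diag(rng.uniform(0.5, 1.0, size=dx))  # diagonal noise (assumption)
phi_y = np.diag(rng.uniform(0.5, 1.0, size=dy))

# Generative process: shared latent Z plus view-specific terms Zx and Zy.
z = rng.normal(size=(n, d))
z_x = rng.multivariate_normal(mu_x, phi_x, size=n)
z_y = rng.multivariate_normal(mu_y, phi_y, size=n)
x = z @ w_x.T + z_x
y = z @ w_y.T + z_y

# Empirical second moments of the sampled views.
xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
cross_cov = xc.T @ yc / (n - 1)
cov_x = xc.T @ xc / (n - 1)

# Model predictions: Cov(X, Y) = Wx Wy^T and Cov(X) = Wx Wx^T + Phi_x.
print(np.abs(cross_cov - w_x @ w_y.T).max())
print(np.abs(cov_x - (w_x @ w_x.T + phi_x)).max())
```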

M-PCCA

EM Learning Algorithm

M-PCCA

X = Wx Z + Zx

Y = Wy Z + Zy

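M-PCCA

Z1, Z2, Z3: latent vectors of the individual mixture components

Shared Part:

\[ Z = \{ z_k \}_{k=1}^{K} = \Big\{ \big[ W_x^{(k)\top}, W_y^{(k)\top} \big] \, \Sigma_k^{-1} \sum_i \gamma_{i,k} \, (v_i - \mu_k) \Big\}_{k=1}^{K} \]

(v_i: the i-th stacked sample [x_i; y_i]; γ_{i,k}: posterior weight of component k for v_i; μ_k, Σ_k: mean and covariance of component k)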
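The shared part rests on the posterior mean of the latent variable given a stacked sample v = [x; y]. A minimal single-component sketch, assuming the standard Gaussian latent variable result E[z | v] = Wᵀ (W Wᵀ + Φ)⁻¹ (v − μ) with W = [Wx; Wy] and block-diagonal Φ, is given below. The function name `posterior_mean_z` and all parameter values are hypothetical; in the mixture, each component would have its own parameters and the per-sample results would be weighted by the component posteriors.

```python
import numpy as np

rng = np.random.default_rng(1)
d, dx, dy = 2, 4, 3  # illustrative sizes (assumptions)

w_x = rng.uniform(-1.0, 1.0, size=(dx, d))
w_y = rng.uniform(-1.0, 1.0, size=(dy, d))
w = np.vstack([w_x, w_y])               # stacked loading matrix [Wx; Wy]
mu = rng.normal(size=dx + dy)           # stacked mean [mu_x; mu_y]
phi = np.zeros((dx + dy, dx + dy))      # block-diagonal noise covariance
phi[:dx, :dx] = np.diag(rng.uniform(0.5, 1.0, size=dx))
phi[dx:, dx:] = np.diag(rng.uniform(0.5, 1.0, size=dy))


def posterior_mean_z(v):
    """Return E[z | v] = W^T (W W^T + Phi)^{-1} (v - mu) for one component."""
    sigma = w @ w.T + phi               # marginal covariance of v
    return w.T @ np.linalg.solve(sigma, v - mu)


v = rng.normal(size=dx + dy)            # one stacked sample [x; y]
z_hat = posterior_mean_z(v)

# Cross-check against the algebraically equivalent form
# (I + W^T Phi^{-1} W)^{-1} W^T Phi^{-1} (v - mu).
phi_inv = np.linalg.inv(phi)
z_alt = np.linalg.solve(np.eye(d) + w.T @ phi_inv @ w,
                        w.T @ phi_inv @ (v - mu))
print(z_hat, z_alt)
```

The second form inverts only a d × d matrix, which is cheaper when the latent dimension is much smaller than the descriptor dimension.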

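M-PCCA

Private Part:

\[ g_x = \Big\{ \frac{\partial E(L)}{\partial \mu_x^{(k)}}, \; \frac{\partial E(L)}{\partial \Phi_x^{(k)}} \Big\}_{k=1}^{K} \]

\[ g_y = \Big\{ \frac{\partial E(L)}{\partial \mu_y^{(k)}}, \; \frac{\partial E(L)}{\partial \Phi_y^{(k)}} \Big\}_{k=1}^{K} \]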

M-PCCA

Shared Part (Z) + Private Part (gx, gy) = Multi-View Super Vector

MVSV with SVM classifier on HMDB51 with various configurations

[Figure: two panels, "Performance w.r.t. number of components" and "Performance w.r.t. latent dimension"]

Results

#components = 256, dimension = 45

HMDB51

Fusion       FV d-level   FV k-level   VLAD d-level   VLAD k-level   MVSV k-level
HOG+MBH      50.9%        50.4%        47.0%          48.5%          52.1%
HOG+HOF      47.0%        48.3%        44.4%          47.7%          48.9%
MBHx+MBHy    49.2%        49.1%        45.2%          47.0%          51.1%
Combine      52.4%        53.2%        51.5%          52.6%          55.9%

UCF101

Fusion       FV d-level   FV k-level   VLAD d-level   VLAD k-level   MVSV k-level
HOG+HOF      76.1%        77.7%        75.7%          77.5%          78.9%
MBHx+MBHy    78.9%        78.7%        75.6%          76.3%          80.9%
Combine      81.1%        81.9%        80.6%          81.0%          83.5%

Descriptor-level: X, Y → Fusion → GMM → SVM → Score

(Linear) Kernel-level: X → GMM, Y → GMM; both → Fusion → SVM → Score

MVSV: X, Y → M-PCCA → Fusion → SVM → Score

Score-level: X → GMM → SVM → Score, Y → GMM → SVM → Score; both → Fusion → Score
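Score-level fusion combines the outputs of classifiers trained separately on each view. A minimal sketch, assuming equal-weight averaging of per-class SVM decision scores (the slides do not specify the fusion rule), is shown below. The score matrices are made-up numbers for three videos and four classes.

```python
import numpy as np

# Hypothetical per-class SVM decision scores (rows: videos, columns: classes).
scores_x = np.array([[0.2, 1.1, -0.3, 0.0],
                     [0.9, 0.1, 0.4, -0.5],
                     [-0.2, 0.3, 0.1, 0.6]])
scores_y = np.array([[0.1, 0.4, 0.8, -0.1],
                     [0.3, 0.2, 1.2, 0.0],
                     [0.0, -0.1, 0.2, 0.9]])

# Equal-weight average of the two score matrices (an assumption).
fused_scores = 0.5 * (scores_x + scores_y)

# Predicted class per video: the highest fused score wins.
pred = fused_scores.argmax(axis=1)
print(pred.tolist())
```

In this toy example the two views disagree on the first two videos, and the fused prediction follows whichever view is more confident.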
