Omkar M Parkhi, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

A Compact and Discriminative Face Track Descriptor

Recognising and verifying faces in videos

[Figure: Recognition and Verification]

VF²: a new compact face track descriptor

▶ Discriminative
▶ Useful for different tasks (Recognition, Verification)
▶ Extremely compact

Face track: sequence of face detections in consecutive frames.

face track descriptor

Large scale face retrieval

▶ Example of a typical target dataset
▶ 5 years of evening news programs
▶ 10,000 hrs of broadcast
▶ 20 Million frames
▶ 2.1 Million face tracks
▶ Real time performance

http://www.robots.ox.ac.uk/~vgg/research/on-the-fly/

▶ 30 frames per track on average
▶ Typical 4000D descriptor → 1 TB
▶ Our descriptor → 270 MB
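The two storage figures can be checked with a few lines of arithmetic. The sketch below assumes (this is not stated on the slide) that the typical 4000-D descriptor is stored per frame as 4-byte floats, and it uses the 128 bytes per track quoted for the binary descriptor. The constant and function names are illustrative only.

```python
# Back-of-the-envelope storage estimate for the dataset described above.
# Assumptions (not from the slides): float32 storage and one baseline
# descriptor per frame. The compact descriptor is 128 bytes per track.

NUM_TRACKS = 2_100_000      # face tracks in the dataset
FRAMES_PER_TRACK = 30       # average track length
BASELINE_DIMS = 4000        # "typical" descriptor dimensionality
BYTES_PER_FLOAT = 4         # assumed float32 storage
COMPACT_BYTES = 128         # one 1024-bit descriptor per track


def baseline_bytes() -> int:
    """Total bytes when every frame keeps its own 4000-D float descriptor."""
    return NUM_TRACKS * FRAMES_PER_TRACK * BASELINE_DIMS * BYTES_PER_FLOAT


def compact_bytes() -> int:
    """Total bytes when every track keeps a single 128-byte descriptor."""
    return NUM_TRACKS * COMPACT_BYTES


if __name__ == "__main__":
    print(f"baseline: {baseline_bytes() / 1e12:.2f} TB")
    print(f"compact:  {compact_bytes() / 1e6:.1f} MB")
```

Under these assumptions the baseline comes to about 1 TB and the compact descriptor to about 270 MB, in line with the figures on the slide.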

Outline

1. Dense feature computation

2. Fisher Vector encoding

3. Video and jittered pooling

4. Compression by metric learning

5. Binarisation

6. Results

1. Dense feature computation

▶ Input: a face track
▶ Aligned or unaligned
▶ No facial landmarks required (eyes, nose, etc.)

▶ Output: a set of local features
▶ Extracted from all frames
▶ Dense RootSIFT at multiple scales
▶ 64-D PCA
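RootSIFT is a simple transform of a standard SIFT histogram: L1-normalise it, then take the element-wise square root. A minimal sketch follows. The function name and the `eps` guard are illustrative, and the dense multi-scale extraction and PCA steps are not shown.

```python
import math
from typing import List, Sequence


def root_sift(descriptor: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Map a non-negative SIFT histogram to RootSIFT.

    Steps: L1-normalise the histogram, then take the square root of each bin.
    The result has (approximately) unit L2 norm.
    """
    total = sum(descriptor) + eps  # eps avoids division by zero for empty patches
    return [math.sqrt(value / total) for value in descriptor]
```

For example, `root_sift([4.0, 0.0, 12.0])` has a first component of about 0.5, and the squares of its components sum to about 1.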

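2. Fisher Vector encoding

[Perronnin et al. ECCV 2012]

▶ Dense SIFT features $x_i$ are assigned to Gaussians $(\mu_k, \Sigma_k)$; $\gamma_k(x_i)$ is the assignment of $x_i$ to Gaussian $k$ (hard assignment)
▶ First and second order statistics for each Gaussian:

$$v_k = \frac{1}{M\sqrt{\pi_k}} \sum_{i=1}^{M} \gamma_k(x_i)\,\frac{x_i - \mu_k}{\sigma_k}, \qquad u_k = \frac{1}{M\sqrt{2\pi_k}} \sum_{i=1}^{M} \gamma_k(x_i)\left[\left(\frac{x_i - \mu_k}{\sigma_k}\right)^2 - 1\right]$$

▶ FV encoding: stack $[v_1, u_1, v_2, u_2, \dots, v_K, u_K]$, then apply sqrt-L2 normalisation
▶ Gaussian components as part detectors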
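The encoding can be sketched in plain Python as follows: each feature is hard-assigned to one Gaussian, the per-Gaussian first and second order statistics are accumulated and scaled by the mixture weights, and the stacked vector is passed through a signed square root and L2 normalisation. This is an illustration under stated assumptions (diagonal covariances given as per-dimension standard deviations, a GMM that is already trained), not the authors' implementation. All function and parameter names are hypothetical, and the stacking order is a free choice.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def hard_assign(x: Vector, means: Sequence[Vector], sigmas: Sequence[Vector],
                priors: Sequence[float]) -> int:
    """Index of the Gaussian with the highest weighted log-density for x."""
    def log_density(k: int) -> float:
        score = math.log(priors[k])
        for xd, md, sd in zip(x, means[k], sigmas[k]):
            score -= math.log(sd) + 0.5 * ((xd - md) / sd) ** 2
        return score
    return max(range(len(priors)), key=log_density)


def fisher_vector(features: Sequence[Vector], means: Sequence[Vector],
                  sigmas: Sequence[Vector], priors: Sequence[float]) -> List[float]:
    """Hard-assignment Fisher Vector with sqrt and L2 normalisation."""
    M, K, D = len(features), len(priors), len(means[0])
    first = [[0.0] * D for _ in range(K)]    # first order statistics
    second = [[0.0] * D for _ in range(K)]   # second order statistics
    for x in features:
        k = hard_assign(x, means, sigmas, priors)  # gamma is 1 for k, 0 otherwise
        for d in range(D):
            z = (x[d] - means[k][d]) / sigmas[k][d]
            first[k][d] += z
            second[k][d] += z * z - 1.0
    fv: List[float] = []
    for k in range(K):
        fv += [s / (M * math.sqrt(priors[k])) for s in first[k]]
        fv += [s / (M * math.sqrt(2.0 * priors[k])) for s in second[k]]
    # Signed square root followed by L2 normalisation.
    fv = [math.copysign(math.sqrt(abs(t)), t) for t in fv]
    norm = math.sqrt(sum(t * t for t in fv)) or 1.0
    return [t / norm for t in fv]
```

The output has 2·K·D dimensions, so with many Gaussians the descriptor is high-dimensional, which is what motivates the compression steps later in the talk.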

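2. Fisher Vector encoding

Spatial (x, y) augmentation: each local feature extracted at location $(x, y)$ in a face image of width $W$ and height $H$ is extended with its normalised coordinates:

$$\left[\ \text{feature};\ \frac{x}{W} - \frac{1}{2};\ \frac{y}{H} - \frac{1}{2}\ \right]$$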
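The spatial augmentation amounts to appending two normalised coordinates to each local feature before encoding. A minimal sketch, assuming that W and H denote the width and height of the face image and that (x, y) is the feature location in pixels; the function name is illustrative.

```python
from typing import List, Sequence


def augment_with_location(descriptor: Sequence[float], x: float, y: float,
                          width: float, height: float) -> List[float]:
    """Append normalised coordinates in [-0.5, 0.5] to a local descriptor."""
    return list(descriptor) + [x / width - 0.5, y / height - 0.5]
```

A 64-D PCA-reduced feature becomes 66-D after this step, and a feature at the image centre receives the coordinates (0, 0).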

3. Video and jittered pooling

▶ Typically each frame is pooled independently
▶ Complex inference procedures combining multiple descriptors
▶ Large memory footprint

[Sivic et al. CVPR 2009, Everingham et al. IVC 2009, Wolf et al. CVPR 2011]

▶ Single descriptor per track
▶ Smaller memory footprint
▶ Easy to use
▶ Improved performance

[Application to Action Recognition: Oneata, Verbeek, Schmid ICCV 2013]

▶ Data augmentation
▶ Data augmentation without training set increase
▶ Improved performance

[Paulin et al. CVPR 2014]
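The difference from per-frame pooling is that all local features of a track, optionally together with features from jittered copies of the frames, are gathered into one set and encoded once. The sketch below shows only that pooling structure. `mean_encoder` is a stand-in for a Fisher Vector encoder, the kind of jitter is left to the caller because the slides do not specify it, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Feature = Sequence[float]
Encoder = Callable[[List[Feature]], List[float]]


def pool_track(frames: Sequence[Sequence[Feature]], encode: Encoder,
               jittered_frames: Sequence[Sequence[Feature]] = ()) -> List[float]:
    """Encode every feature of a track (plus jittered copies) as ONE descriptor."""
    pooled = [feature for frame in frames for feature in frame]
    pooled += [feature for frame in jittered_frames for feature in frame]
    return encode(pooled)


def mean_encoder(features: List[Feature]) -> List[float]:
    """Stand-in encoder (per-dimension mean); a Fisher Vector would go here."""
    dims = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dims)]
```

Because the jittered features are pooled into the same descriptor, the number of stored descriptors stays at one per track, regardless of how many frames or jittered copies contribute.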

4. Metric Learning

Learn to discriminate faces

▶ Distance between Fisher Vectors $x$ and $y$ under a learnt projection $W$: $d_W^2(x, y) = \|Wx - Wy\|^2$
▶ Same person $(x, y)$: $d_W^2(x, y) = \|Wx - Wy\|^2 < b$
▶ Different people $(u, v)$: $d_W^2(u, v) = \|Wu - Wv\|^2 > b$

[Simonyan, Parkhi, Vedaldi, Zisserman BMVC 2013]
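Once a projection W and threshold b are available, verification reduces to projecting both descriptors, computing the squared Euclidean distance, and comparing it with b. The sketch below covers only this decision rule. How W and b are learnt is not shown, and the function names are illustrative.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def project(W: Matrix, x: Sequence[float]) -> List[float]:
    """Compute W x for a matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def squared_distance(W: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """d_W^2(x, y) = ||W x - W y||^2."""
    px, py = project(W, x), project(W, y)
    return sum((a - b) ** 2 for a, b in zip(px, py))


def same_person(W: Matrix, x: Sequence[float], y: Sequence[float], b: float) -> bool:
    """Declare a match when the projected squared distance is below b."""
    return squared_distance(W, x, y) < b
```

Since W has far fewer rows than columns, only the projected vectors need to be stored, which is where the compression comes from.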

5. Binarisation

▶ Low-dimensional real-valued descriptor → high-dimensional binary
▶ 4x decrease in memory footprint (128D real → 1024D binary)
▶ Fast distance computation
▶ Alternative binarisation methods could be used

Parseval Tight Frame: the real-valued descriptor $z$ (dimension $m$) is mapped to $q$ bits by $\mathrm{sign}(Uz)$, where $U$ is a $q \times m$ matrix made of columns from a random rotation matrix.

[Jégou et al. ICASSP 2012, Simonyan et al. PAMI 2014]
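The binarisation can be sketched as follows: build a q × m matrix U whose columns are orthonormal (here obtained by Gram-Schmidt on random Gaussian vectors, which is one way to get columns of a random rotation), then keep only the sign of each entry of Uz. This is a dependency-free illustration and not the authors' code. The function names and the seed are assumptions, and positive projections are mapped to bit 1 by convention.

```python
import random
from typing import List, Sequence


def random_tight_frame(q: int, m: int, seed: int = 0) -> List[List[float]]:
    """Return a q x m matrix U (list of rows) with orthonormal columns."""
    rng = random.Random(seed)
    columns: List[List[float]] = []
    while len(columns) < m:
        c = [rng.gauss(0.0, 1.0) for _ in range(q)]
        for prev in columns:  # modified Gram-Schmidt against earlier columns
            dot = sum(a * b for a, b in zip(c, prev))
            c = [a - dot * b for a, b in zip(c, prev)]
        norm = sum(a * a for a in c) ** 0.5
        if norm > 1e-9:       # skip (extremely unlikely) degenerate draws
            columns.append([a / norm for a in c])
    return [[columns[j][i] for j in range(m)] for i in range(q)]


def binarise(U: Sequence[Sequence[float]], z: Sequence[float]) -> List[int]:
    """q-bit code: bit i is 1 when row i of U has a positive projection on z."""
    return [1 if sum(u * zi for u, zi in zip(row, z)) > 0 else 0 for row in U]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of differing bits between two codes of equal length."""
    return sum(x != y for x, y in zip(a, b))
```

With m = 128 and q = 1024 as on the slide, the code occupies 1024 bits (128 bytes) instead of 128 floats, and descriptors are compared with the Hamming distance. In practice the bits would be packed into machine words so the distance can be computed with XOR and popcount.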

Face Verification

YouTube Faces Dataset

▶ Face verification in videos
▶ 3,425 videos of 1,595 celebrities
▶ Videos collected from the internet
▶ Wide pose, expression and illumination variation
▶ 10 splits of 600 pairs of videos

▶ Restricted setting: use provided pairs
▶ Unrestricted setting: free to form own pairs

[Figure: example video pairs, same / different]

[Wolf, Hassner, Maoz CVPR 2011]

[Bar chart; x-axis: Error, 0 to 18]

▶ Image Pool (Soft assignment FV)
▶ Video Pool (Soft assignment FV)
▶ Video Pool (Hard assignment FV)
▶ Video Pool + Jittered Pool
▶ Video Pool + Binar. 1024 bit + jitt.
▶ Video Pool + Joint sim. + jitt.

Face Verification

[Bar chart; x-axis: Error, 0 to 22]

▶ MBGS & SVM-: 21.2
▶ APEM FUSION: 21.4
▶ STFRD & PMML: 19.9
▶ VSOF & OSS (Adaboost): 20
▶ DDML (Combined): 18.5
▶ VF 1024D (binary): 13.4
▶ VF 256D: 12.3
▶ Deep Face (facebook.com): 8.6 (requires additional training data)

Weakly supervised face classification

Oxford Buffy Dataset

▶ “Buffy The Vampire Slayer”
▶ Face tracks from 7 episodes of season 5
▶ Both frontal and profile detections
▶ Weak supervision from transcript and subtitles
▶ Multi-class classification for every episode

[Everingham et al. IVC 2009, Sivic et al. CVPR 2009]

Weakly supervised classification

Oxford Buffy Dataset

[Bar chart; x-axis: Avg. AP, 0.79 to 0.86]

▶ Sivic et al. (HOG RBF MKL)
▶ VF (GMMs trained on Buffy)
▶ VF (GMMs trained on YTF)
▶ VF (GMMs trained on YTF) + Jitt. Pool 1024D
▶ VF (GMMs trained on YTF 2048b)

Recap

Very simple yet powerful face track descriptor

▶ Track descriptor in 128 bytes
▶ Face landmarks and alignment not required
▶ One descriptor per track

▶ State of the art/comparable results on multiple tasks
  ▶ YouTube Faces Dataset
  ▶ Oxford Buffy Dataset

▶ Can be trained with a very small amount of data
▶ Extremely easy to compute

▶ Code online soon.

Questions?
