ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10. Prof. Petros Maragos, National Technical University of Athens, School of Electrical and Computer Engineering. URL: http://cvsp.cs.ntua.gr

Page 1: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

ICCS-NTUA Contributions to E-teams of

MUSCLE WP6 and WP10

Prof. Petros Maragos

National Technical University of Athens

School of Electrical and Computer Engineering

URL: http://cvsp.cs.ntua.gr/projects/muscle

Page 2: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

WP6 E-teams: 8-12-2005, MUSCLE, ICCS-NTUA

ICCS-NTUA: E-team Researchers & Directions

Researchers:

• P. Maragos, S. Kollias (Faculty members)

• G. Papandreou, K. Rapantzikos, G. Evangelopoulos, A. Katsamanis, I. Kokkinos (PhD GRA)

• G. Stamou, I. Avrithis (Post-Doc)

(WP6) E-team 1: Audio-Visual (AV) Speech Analysis & Recognition

• Face Detection, Modeling & Tracking

• AV Feature Extraction, Fusion, Dynamic Models for AV-ASR

• AV to Articulatory Speech Inversion

(WP6) E-team 2: Audio-Visual Understanding

• Audio-Visual Salient Event Detection

• Integrated Multimedia Content Analysis

Page 3: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

AV-ASR Front-End

[Block diagram; the labelled blocks are listed below]

• Speech signal $s_i(t)$

• M-Array Processing

• VAD

• Speaker Normalization

• MFCC

• Modulations – Energy: Multiband Filtering, Nonlinear Processing, Demodulation

• Dynamics - Fractals: Embedding, Geometrical Filtering, Fractal Dimensions

• Visual: Active Appearance Model, Face Detection/Tracking, Mouth R.O.I. Features

• Fusion

• Feature Transform./Selection

• Feature Stream
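The "Modulations – Energy" block names nonlinear processing and demodulation without saying which operators are used. The following is a minimal sketch under the assumption that the nonlinear step is the Teager-Kaiser energy operator and that demodulation is done with the DESA-2 energy separation algorithm. Neither is stated on the slide, and the function names `teager_energy` and `desa2` are hypothetical. It estimates the amplitude and frequency of a single band-limited component, such as the output of one filter in a multiband filterbank.

```python
import math

def teager_energy(x):
    """Discrete Teager-Kaiser energy psi[n] = x[n]^2 - x[n-1]*x[n+1], n = 1..N-2."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def desa2(x):
    """DESA-2 estimates of instantaneous amplitude and frequency (rad/sample).

    Assumption: x is a narrowband AM-FM signal; samples where an energy
    value is not positive are skipped.
    """
    # Symmetric difference y[n] = x[n+1] - x[n-1], defined for n = 1..N-2.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # index k corresponds to sample n = k + 1
    psi_y = teager_energy(y)  # index k corresponds to sample n = k + 2
    amps, freqs = [], []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]     # align both energies at sample n = k + 2
        if px <= 0.0 or py <= 0.0:
            continue
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(c))
        amps.append(2.0 * px / math.sqrt(py))
    return amps, freqs

# Toy input: a pure tone with amplitude 0.8 and frequency 0.3 rad/sample.
tone = [0.8 * math.cos(0.3 * n + 0.2) for n in range(64)]
amps, freqs = desa2(tone)
```

For a pure tone the estimates are constant over time. On real speech the operator would be applied per band, and the resulting amplitude and frequency tracks would need smoothing before being used as features.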

Page 4: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

Audiovisual ASR: Face Modeling

● A well studied problem in Computer Vision:

   ● Active Appearance Models, Morphable Models, Active Blobs

● Both Shape & Appearance can enhance lipreading

● The shape and appearance of human faces “live” in low dimensional manifolds

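[Figure: a face shape written as a mean shape plus p1 and p2 times two modes of variation; a second row shows the analogous two-term decomposition for appearance]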
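The low dimensional manifold idea can be illustrated with a linear model in which a shape is the mean shape plus a weighted sum of a few modes of variation. This is a minimal sketch with made-up numbers. The function name `synthesize`, the three-landmark layout and the two modes are assumptions for illustration only and are not taken from the slides.

```python
def synthesize(mean, modes, params):
    """Return mean + sum_i params[i] * modes[i], computed element-wise."""
    if len(modes) != len(params):
        raise ValueError("one parameter per mode is required")
    out = list(mean)
    for p, mode in zip(params, modes):
        if len(mode) != len(mean):
            raise ValueError("every mode must have the length of the mean")
        for j, m in enumerate(mode):
            out[j] += p * m
    return out

# Toy shape: 3 landmarks flattened as (x1, y1, x2, y2, x3, y3).
mean_shape = [0.0, 0.0, 1.0, 0.0, 0.5, 1.0]
mode_1 = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]    # moves the third landmark up
mode_2 = [-1.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # widens the base
shape = synthesize(mean_shape, [mode_1, mode_2], [0.5, 0.25])
```

Two parameters are enough to describe every shape this toy model can produce. The same idea, applied to real faces, is why a small parameter vector can serve as a compact visual feature for lipreading.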

Page 5: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

Image Fitting Example

[Figure: model fit shown at iteration steps 2, 6, 10, 14 and 18]

Page 6: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

Example: Face Interpretation Using AAM

[Video panels: original video; shape track superimposed on original video; reconstructed face]

Reconstructed face: this is what the visual-only speech recognizer “sees”!

Generative models like AAM allow us to evaluate the output of the visual front-end
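One way to make this evaluation concrete is to encode an observation with the model, decode it again, and measure how far the reconstruction is from the input. The sketch below assumes a linear model with orthonormal modes, so that the parameters are obtained by simple projection. The helper names and the toy vectors are hypothetical, and a real AAM fit is iterative and also involves image warping.

```python
import math

def project(x, mean, modes):
    """Coefficients p_i = <modes[i], x - mean>; assumes orthonormal modes."""
    centered = [a - b for a, b in zip(x, mean)]
    return [sum(m * c for m, c in zip(mode, centered)) for mode in modes]

def reconstruct(mean, modes, params):
    """Decode parameters back into the observation space."""
    out = list(mean)
    for p, mode in zip(params, modes):
        out = [o + p * m for o, m in zip(out, mode)]
    return out

def rms_error(a, b):
    """Root-mean-square difference between two equally long vectors."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

# Toy 4-pixel "appearance" with two orthonormal modes (illustrative values).
mean = [1.0, 1.0, 1.0, 1.0]
modes = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
observed = [3.0, 1.0, 2.0, 2.0]

params = project(observed, mean, modes)   # what the recognizer receives
seen = reconstruct(mean, modes, params)   # what those parameters encode
error = rms_error(observed, seen)         # information lost by the model
```

A small error suggests the parameters capture the input well. A large one points to a poor fit or to content outside the span of the model.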

Page 7: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

Joint Image Segmentation and Object Detection via the Expectation Maximization algorithm

• Generative models ‘compete’ for image observations

• Segmentation translates into the assignment of image observations into one of K models (image labelling)

• Segmentation labels are treated like hidden data

• EM algorithm:

   • E-step: use current parameter estimates to assign micro-segments to objects

   • M-step: use assignment probabilities to derive optimal model parameters

• Active Appearance Models used as generative models for the object categories of cars and faces
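The E-step and M-step described above can be sketched on a toy problem. The example assumes scalar observations and K one-dimensional Gaussian models standing in for the micro-segments and Active Appearance Models of the slide. That substitution, the function names and the data are all illustrative assumptions.

```python
import math

def gaussian(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def e_step(data, priors, means, variances):
    """Posterior probability of each model for each observation."""
    resp = []
    for x in data:
        joint = [w * gaussian(x, m, v) for w, m, v in zip(priors, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    return resp

def m_step(data, resp, min_var=1e-6):
    """Re-estimate priors, means and variances from the soft assignments."""
    priors, means, variances = [], [], []
    for j in range(len(resp[0])):
        weight = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / weight
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / weight
        priors.append(weight / len(data))
        means.append(mean)
        variances.append(max(var, min_var))  # guard against collapse
    return priors, means, variances

def run_em(data, means, iterations=25):
    """Alternate E and M steps from the given initial means."""
    k = len(means)
    priors, variances = [1.0 / k] * k, [1.0] * k
    for _ in range(iterations):
        resp = e_step(data, priors, means, variances)
        priors, means, variances = m_step(data, resp)
    return priors, means, variances, e_step(data, priors, means, variances)

# Two well separated groups of observations (made-up values).
data = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
priors, means, variances, resp = run_em(data, means=[0.0, 6.0])
```

Each row of `resp` holds the assignment probabilities of one observation, which play the role of the hidden segmentation labels. The models compete through the normalization in `e_step`.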

Page 8: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

Top-Down Segmentation Results

• Thresholding the E-step, we get a hard figure-ground segmentation

• No ‘shape-prior’ knowledge is necessary for the segmentation

   • generative model contains information about shape variation

Combination of bottom-up & top-down detection

• On false alarm locations the object model manages to reconstruct the image appearance only by chance, thereby typically getting a small image support for the object.
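A minimal sketch of both points follows: thresholding per-pixel E-step probabilities gives a hard mask, and the fraction of pixels claimed by the object can be used to reject false alarms. The threshold of 0.5, the minimum support of 0.3 and the probability maps are hypothetical values chosen for illustration.

```python
def hard_segmentation(figure_prob, threshold=0.5):
    """Label a pixel as figure (1) when its posterior exceeds the threshold."""
    return [[1 if p > threshold else 0 for p in row] for row in figure_prob]

def support(mask):
    """Fraction of pixels labelled as figure."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total

def accept_detection(figure_prob, threshold=0.5, min_support=0.3):
    """Keep a candidate only if the object explains enough of the window."""
    return support(hard_segmentation(figure_prob, threshold)) >= min_support

# Made-up E-step outputs for two candidate windows.
true_hit = [[0.1, 0.9, 0.8], [0.2, 0.95, 0.7], [0.1, 0.6, 0.3]]
false_alarm = [[0.1, 0.2, 0.1], [0.3, 0.7, 0.2], [0.1, 0.1, 0.2]]

hit_mask = hard_segmentation(true_hit)
hit_ok = accept_detection(true_hit)
alarm_ok = accept_detection(false_alarm)
```

In the second window the model explains a single pixel, so its support falls below the assumed minimum and the candidate is dropped.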

Page 9: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

Spatio-Temporal Visual Attention I: Video Analysis

• Create video volume

• Feature extraction from spatiotemporal data

• Fusion & saliency generation
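The three stages can be sketched end to end on a tiny grey-level clip. The slide does not name the features or the fusion rule, so the two features used here (frame-to-frame change and deviation from the frame mean) and the fusion by averaging are assumptions. All function names are hypothetical.

```python
def video_volume(frames):
    """Stack equally sized grey-level frames into a volume indexed [t][y][x]."""
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same size")
    return [[list(row) for row in frame] for frame in frames]

def temporal_change(vol):
    """Assumed feature 1: absolute difference to the previous frame (0 at t = 0)."""
    return [[[abs(v - vol[t - 1][y][x]) if t > 0 else 0.0
              for x, v in enumerate(row)]
             for y, row in enumerate(frame)]
            for t, frame in enumerate(vol)]

def spatial_contrast(vol):
    """Assumed feature 2: absolute deviation from the mean of the same frame."""
    out = []
    for frame in vol:
        mean = sum(map(sum, frame)) / (len(frame) * len(frame[0]))
        out.append([[abs(v - mean) for v in row] for row in frame])
    return out

def normalize(feature):
    """Scale a feature volume to [0, 1]; an all-zero volume is left unchanged."""
    peak = max(v for frame in feature for row in frame for v in row)
    if peak == 0:
        return feature
    return [[[v / peak for v in row] for row in frame] for frame in feature]

def saliency(frames):
    """Fuse the normalized feature volumes by voxel-wise averaging."""
    vol = video_volume(frames)
    feats = [normalize(temporal_change(vol)), normalize(spatial_contrast(vol))]
    return [[[sum(f[t][y][x] for f in feats) / len(feats)
              for x in range(len(vol[0][0]))]
             for y in range(len(vol[0]))]
            for t in range(len(vol))]

# A bright pixel appears in the second frame and then stays put.
frames = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
]
sal = saliency(frames)
```

The voxel where the pixel first appears is salient in both features and receives the highest score. Once the pixel stops changing, only the spatial feature responds and the score drops.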

Page 10: ICCS-NTUA  Contributions to E-teams of MUSCLE WP6 and WP10

Spatio-Temporal Visual Attention II: Classification & segmentation

• Use spatiotemporal VA for efficient global classification of videos

   • Claim: features extracted only from low or high saliency regions are more representative of the input video

• Foreground/Background segmentation

   • Claim: most salient regions are related to foreground areas of the video
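Both uses of the saliency map can be illustrated with a threshold that splits a frame into high and low saliency regions, followed by a feature computed separately on each region. The threshold of 0.5, the choice of mean intensity as the feature, and the toy frame are assumptions made for this sketch. The slides do not specify them.

```python
def foreground_mask(saliency_map, threshold):
    """Mark pixels whose saliency reaches the threshold as foreground."""
    return [[s >= threshold for s in row] for row in saliency_map]

def region_mean(frame, mask, keep):
    """Mean intensity over pixels whose mask value equals `keep`.

    Returns None when the region is empty.
    """
    values = [v for frow, mrow in zip(frame, mask)
              for v, m in zip(frow, mrow) if m == keep]
    return sum(values) / len(values) if values else None

# Made-up frame and saliency map of the same size.
frame = [[0.2, 0.2, 0.9], [0.2, 0.8, 0.9], [0.2, 0.2, 0.2]]
saliency_map = [[0.1, 0.1, 0.8], [0.1, 0.7, 0.9], [0.0, 0.1, 0.2]]

mask = foreground_mask(saliency_map, threshold=0.5)
fg_feature = region_mean(frame, mask, True)    # from high-saliency pixels
bg_feature = region_mean(frame, mask, False)   # from low-saliency pixels
```

A classifier would then be fed `fg_feature` or `bg_feature` instead of a feature computed over the whole frame. Whether that is more representative is the claim the slide puts forward, not something this toy example demonstrates.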