ICCS-NTUA Contributions to E-teams of MUSCLE WP6 and WP10
Prof. Petros Maragos
National Technical University of Athens
School of Electrical and Computer Engineering
URL: http://cvsp.cs.ntua.gr/projects/muscle
WP6 E-teams: 8-12-2005 | MUSCLE | ICCS-NTUA
ICCS-NTUA: E-team Researchers & Directions
Researchers:
• P. Maragos, S. Kollias (Faculty members)
• G. Papandreou, K. Rapantzikos, G. Evangelopoulos, A. Katsamanis, I. Kokkinos (PhD GRA)
• G. Stamou, I. Avrithis (Post-Doc)
(WP6) E-team 1: Audio-Visual (AV) Speech Analysis & Recognition
• Face Detection, Modeling & Tracking
• AV Feature Extraction, Fusion, Dynamic Models for AV-ASR
• AV to Articulatory Speech Inversion
(WP6) E-team 2: Audio-Visual Understanding
• Audio-Visual Salient Event Detection
• Integrated Multimedia Content Analysis
AV-ASR Front-End
[Block diagram; block labels as they appear on the slide]
• Speech s_i(t)
• M-Array Processing
• VAD
• Speaker Normalization
• Modulations – Energy: Multiband Filtering, Nonlinear Processing, Demodulation
• Dynamics – Fractals: Embedding, Geometrical Filtering, Fractal Dimensions
• MFCC
• Visual: Active Appearance Model, Face Detection/Tracking, Mouth R.O.I. Features
• Feature Transform./Selection
• Fusion
• Feature Stream
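The slide lists multiband filtering, nonlinear processing and demodulation for the modulation-energy features but does not name the operator. As a minimal sketch, assuming the nonlinear step is the Teager-Kaiser energy operator followed by DESA-1a energy separation (an assumption, not stated on the slide), a single band could be demodulated as follows. The function names are illustrative.

```python
import math

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The result is two samples shorter than x because the endpoints are dropped.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def desa1a(x):
    """Estimate instantaneous frequency (rad/sample) and amplitude of one band.

    Assumes x is already band-pass filtered (one output of the multiband
    filterbank); the filterbank itself is outside this sketch.
    """
    y = [x[n] - x[n - 1] for n in range(1, len(x))]  # backward difference
    psi_x = teager_kaiser(x)
    psi_y = teager_kaiser(y)
    freqs, amps = [], []
    for k in range(len(psi_y)):
        # psi_y[k] and psi_x[k + 1] refer to the same time index of x.
        px, py = psi_x[k + 1], psi_y[k]
        if px <= 0.0:
            freqs.append(float("nan"))
            amps.append(float("nan"))
            continue
        cos_w = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        sin2_w = 1.0 - cos_w * cos_w
        freqs.append(math.acos(cos_w))
        amps.append(math.sqrt(px / sin2_w) if sin2_w > 0.0 else float("nan"))
    return freqs, amps

# Pure tone with amplitude 2 and frequency 0.3 rad/sample.
tone = [2.0 * math.cos(0.3 * n + 0.1) for n in range(50)]
tone_freqs, tone_amps = desa1a(tone)
```

For a pure sinusoid the estimates are exact up to rounding, which makes the tone above a convenient sanity check before applying the operator to real filterbank outputs.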
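Audiovisual ASR: Face Modeling
● A well-studied problem in Computer Vision:
  ● Active Appearance Models, Morphable Models, Active Blobs
● Both Shape & Appearance can enhance lipreading
● The shape and appearance of human faces “live” in low-dimensional manifolds
[Figure: a face synthesized as a mean plus two weighted modes of variation (weights p1, p2 for the first row; the symbols of the second row are lost)]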
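The low-dimensional claim can be made concrete with a linear model: a face shape is the mean shape plus a weighted sum of a few modes of variation. The sketch below is illustrative only. The four-landmark "shape", the two modes and the function names are made up for the example, and the modes are assumed to be orthonormal (as they would be if obtained by PCA).

```python
def synthesize(mean, modes, params):
    """Return mean + sum_i params[i] * modes[i], with vectors as flat lists."""
    out = list(mean)
    for p, mode in zip(params, modes):
        for j, m in enumerate(mode):
            out[j] += p * m
    return out

def project(mean, modes, sample):
    """Recover the parameters of `sample`, assuming orthonormal modes."""
    diff = [s - m for s, m in zip(sample, mean)]
    return [sum(d * m for d, m in zip(diff, mode)) for mode in modes]

# Toy shape: four landmarks of a unit square, stored as x0, y0, x1, y1, ...
mean_shape = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
# Hypothetical modes: horizontal stretch and vertical stretch (unit norm).
shape_modes = [
    [-0.5, 0.0, 0.5, 0.0, 0.5, 0.0, -0.5, 0.0],
    [0.0, -0.5, 0.0, -0.5, 0.0, 0.5, 0.0, 0.5],
]

new_shape = synthesize(mean_shape, shape_modes, [0.4, -0.2])
recovered = project(mean_shape, shape_modes, new_shape)
```

The same two functions apply to appearance if pixel intensities of the shape-normalized face are flattened into a vector. Two parameters here stand in for the handful of coefficients a real model would use.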
Image Fitting Example
[Figure panels: step 2, step 6, step 10, step 14, step 18]
Example: Face Interpretation Using AAM
[Video panels: original video; shape track superimposed on original video; reconstructed face]
• Reconstructed face: this is what the visual-only speech recognizer “sees”!
• Generative models like AAM allow us to evaluate the output of the visual front-end
Joint Image Segmentation and Object Detection via the Expectation Maximization algorithm
• Generative models ‘compete’ for image observations
• Segmentation translates into the assignment of image observations to one of K models (image labelling)
• Segmentation labels are treated as hidden data
• EM algorithm:
  • E-step: use current parameter estimates to assign micro-segments to objects
  • M-step: use assignment probabilities to derive optimal model parameters
• Active Appearance Models are used as generative models for the object categories of cars and faces
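The E-step/M-step alternation can be sketched on a toy problem. The example below is not the authors' implementation: it replaces the Active Appearance Models with one-dimensional Gaussians (an assumption made purely to keep the sketch short) and treats each number in `obs` as the feature of one micro-segment. What it shows is K models competing for observations through soft assignment probabilities.

```python
import math

def gaussian(x, mean, var):
    """Likelihood of x under a 1-D Gaussian (stand-in for an object model)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_segment(obs, means, variances, weights, iters=50):
    """Run EM; return assignment probabilities and the fitted parameters."""
    means, variances, weights = list(means), list(variances), list(weights)
    K = len(means)
    resp = []
    for _ in range(iters):
        # E-step: assign each observation to the models, softly,
        # using the current parameter estimates.
        resp = []
        for x in obs:
            scores = [weights[k] * gaussian(x, means[k], variances[k])
                      for k in range(K)]
            total = sum(scores) or 1e-300  # guard against underflow
            resp.append([s / total for s in scores])
        # M-step: re-estimate each model from its weighted observations.
        for k in range(K):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # model lost all support; leave it unchanged
            means[k] = sum(r[k] * x for r, x in zip(resp, obs)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, obs)) / nk
            variances[k] = max(var, 1e-6)  # keep the variance positive
            weights[k] = nk / len(obs)
    return resp, means, variances, weights

# Two groups of micro-segment features, two competing models.
obs = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
resp, means, variances, weights = em_segment(obs, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
labels = [max(range(len(r)), key=lambda k: r[k]) for r in resp]
```

Taking the most probable model per observation, as in the last line, turns the soft assignments into an image labelling.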
Top-Down Segmentation Results
• Thresholding the E-step assignment probabilities gives a hard figure-ground segmentation
• No ‘shape-prior’ knowledge is necessary for the segmentation: the generative model contains information about shape variation
Combination of bottom-up & top-down detection
• At false alarm locations the object model manages to reconstruct the image appearance only by chance, so the object typically gets a small image support.
Spatio-Temporal Visual Attention I: Video Analysis
• Create video volume
• Feature extraction from spatiotemporal data
• Fusion & saliency generation
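The three stages can be sketched end to end on a tiny synthetic clip. The slide does not say which features or fusion rule are used, so the choices below are assumptions for illustration: a 3×3 spatial contrast and a frame-to-frame difference as the two features, each normalized by its peak and fused by averaging. All function names are hypothetical.

```python
def zeros(T, H, W):
    return [[[0.0] * W for _ in range(H)] for _ in range(T)]

def spatial_contrast(vol):
    """Feature 1: |pixel - mean of its 3x3 neighbourhood| within each frame."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    out = zeros(T, H, W)
    for t in range(T):
        for y in range(H):
            for x in range(W):
                neigh = [vol[t][j][i]
                         for j in range(max(0, y - 1), min(H, y + 2))
                         for i in range(max(0, x - 1), min(W, x + 2))]
                out[t][y][x] = abs(vol[t][y][x] - sum(neigh) / len(neigh))
    return out

def temporal_change(vol):
    """Feature 2: |pixel - same pixel in the previous frame| (0 for frame 0)."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    out = zeros(T, H, W)
    for t in range(1, T):
        for y in range(H):
            for x in range(W):
                out[t][y][x] = abs(vol[t][y][x] - vol[t - 1][y][x])
    return out

def normalize(vol):
    """Scale a feature volume so that its peak value is 1."""
    peak = max(v for frame in vol for row in frame for v in row)
    if peak == 0:
        return vol
    return [[[v / peak for v in row] for row in frame] for frame in vol]

def saliency_volume(frames):
    """Stack frames into a T x H x W volume, extract features, fuse by mean."""
    vol = [[[float(v) for v in row] for row in frame] for frame in frames]
    maps = [normalize(spatial_contrast(vol)), normalize(temporal_change(vol))]
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    return [[[sum(m[t][y][x] for m in maps) / len(maps) for x in range(W)]
             for y in range(H)] for t in range(T)]

# A bright dot moving one pixel to the right per frame on a dark background.
frames = []
for t in range(3):
    frame = [[0] * 5 for _ in range(5)]
    frame[2][t + 1] = 1
    frames.append(frame)
sal = saliency_volume(frames)
```

In the last frame the dot's current position receives the highest saliency, because it is both spatially distinct and newly changed.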
Spatio-Temporal Visual Attention II: Classification & Segmentation
• Use spatiotemporal VA for efficient global classification of videos
  • Claim: features extracted only from low or high saliency regions are more representative of the input video
• Foreground/Background segmentation
  • Claim: most salient regions are related to foreground areas of the video
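Both uses reduce to selecting pixels by their saliency value. The sketch below is a minimal illustration under assumed conventions: saliency and features are flat lists with one entry per pixel, the most and least salient 25% are used for the descriptors, and 0.5 is the foreground threshold. None of these values or function names come from the slides.

```python
def rank_regions(saliency, fraction=0.25):
    """Return (high, low): indices of the most and least salient pixels."""
    order = sorted(range(len(saliency)), key=lambda i: saliency[i])
    k = max(1, int(round(fraction * len(saliency))))
    return order[-k:], order[:k]

def region_descriptor(features, indices):
    """Mean feature value over a region; a stand-in for a richer descriptor."""
    return sum(features[i] for i in indices) / len(indices)

def foreground_mask(saliency, threshold=0.5):
    """Hard foreground/background labels from a saliency threshold."""
    return [s >= threshold for s in saliency]

# Toy per-pixel saliency and intensity features.
saliency = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.7, 0.15]
intensity = [10, 200, 20, 180, 5, 40, 160, 15]

high, low = rank_regions(saliency)
video_descriptor = (region_descriptor(intensity, high),
                    region_descriptor(intensity, low))
mask = foreground_mask(saliency)
```

A classifier would then be trained on `video_descriptor` instead of features pooled over every pixel, while `mask` is the candidate foreground segmentation.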