Some activities on Non-linear Speech Processing at ENST/CNRS-LTCI Gérard CHOLLET chollet @ tsi.enst.fr ENST/CNRS-LTCI 46 rue Barrault 75634 PARIS cedex 13 http://www.tsi.enst.fr/~chollet


Page 1

Some activities on Non-linear Speech Processing

at ENST/CNRS-LTCI

Gérard CHOLLET

chollet@tsi.enst.fr

ENST/CNRS-LTCI

46 rue Barrault

75634 PARIS cedex 13

http://www.tsi.enst.fr/~chollet

Page 2

Outline

What is ENST/CNRS-LTCI?

Research and application topics related to COST-277:

• Speech production and perception

• Speech analysis and synthesis

• Speech coding: the SYMPATEX project

• Automatic speech recognition: the SIROCCO project

• Speaker characterisation and verification

Perspectives within COST-277

Page 3

Our affiliations

ENST: Ecole Nationale Supérieure des Télécommunications, http://www.enst.fr

CNRS: Centre National de la Recherche Scientifique, http://www.cnrs.fr

LTCI: Laboratoire de Traitement et Communication de l’Information, http://www.enst.fr/externe/ura.html

Page 4

What is ENST?

Ecole Nationale Supérieure des Télécommunications

• classed among the ‘Grandes Ecoles d'Ingénieurs’

• 250 state-certified engineers each year

• part of the ‘Groupement des Ecoles de Télécommunications’

Page 5

GET: Groupement des Ecoles de Télécommunications

• ENST-Paris

• ENST-Bretagne in Brest

• Institut National des Télécommunications in Evry

• EURECOM in Sophia-Antipolis

• ENIC (Ecole Nouvelle d’Ingénieurs en Télécoms) in Lille

• Internet school in Marseille

Page 6

Speech Production and Perception

Parametric Vocal Tract model (Shinji Maeda)

Non-linear Production model using Distinctive Regions and Modes (René Carré)

Quantal nature of speech (R. Carré and S. Maeda)

Perceptual filter (Nicolas Moreau)

Auditory prosthesis (Alain Goyé and Jacques Prado)

Page 7

Speech analysis and synthesis

Time-Frequency representations, Wavelets

Time-dependent spectral models (Yves Grenier)

HNM (Harmonics + Noise Model) (Olivier Cappé, Eric Moulines, Maurice Charbit)

Glottal Excited LPC

Page 8

Time-dependent Spectral Models

Temporal Decomposition (B. Atal, 1983)

Vector autoregressive models with detection of abrupt model changes (A. DeLima, Y. Grenier)

Segmental parameterisation using a time-dependent polynomial expansion (Y. Grenier)
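
As a hedged illustration of a time-dependent polynomial expansion, an autoregressive model can be given coefficients that vary with time and are expanded on a polynomial basis. The notation below (model order $p$, polynomial degree $m$, expansion coefficients $a_{kj}$, excitation $e(n)$) is an assumption chosen for this sketch and does not come from the slides:

```latex
x(n) = -\sum_{k=1}^{p} a_k(n)\, x(n-k) + e(n),
\qquad
a_k(n) = \sum_{j=0}^{m} a_{kj}\, n^{j}
```

With $m = 0$ the coefficients are constant and the usual stationary AR model is recovered.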

Page 9

Temporal Decomposition
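
Temporal decomposition approximates a sequence of spectral parameter vectors as a sum of target vectors, each weighted by an overlapping, time-limited interpolation function. The sketch below only illustrates the reconstruction step. The triangular interpolation functions, the function names and the toy targets are all assumptions made for this example; in the actual method the interpolation functions are estimated from the data.

```python
def triangular(center, width, n):
    """Hypothetical interpolation function: 1 at `center`, 0 beyond `width` frames."""
    return max(0.0, 1.0 - abs(n - center) / width)


def reconstruct(targets, centers, width, num_frames):
    """Approximate every frame as a weighted sum of the target vectors."""
    dim = len(targets[0])
    frames = []
    for n in range(num_frames):
        # One weight per target: how strongly that target is active at frame n.
        weights = [triangular(c, width, n) for c in centers]
        frame = [sum(w * t[d] for w, t in zip(weights, targets)) for d in range(dim)]
        frames.append(frame)
    return frames


# Two toy 2-D targets located at frames 0 and 4 (illustrative values only).
targets = [[1.0, 0.0], [0.0, 1.0]]
centers = [0, 4]
frames = reconstruct(targets, centers, width=4, num_frames=5)
```

Halfway between the two targets (frame 2) both interpolation functions equal 0.5, so the reconstructed frame is the average of the two targets.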

Page 10

HNM: Harmonics + Noise Model

[Block diagram of the HNM analysis, labels translated from French. Blocks: input signal; pitch and energy detection; voicing decision (voiced / unvoiced); harmonic estimation; harmonic envelope estimation; AR estimation of the residual; AR estimation; a summing node (+/−). Output: H+N (harmonics + noise) parameters. Spectrum sketches with axes f (frequency) and A (amplitude).]
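
To make the harmonics + noise idea concrete, here is a minimal synthesis sketch: a frame is built as a sum of harmonics of the pitch plus white noise shaped by an all-pole (AR) filter. It is not the analysis chain of the slide. The function name, the 8 kHz sampling rate, the zero initial phases and all parameter values are assumptions made for illustration.

```python
import math
import random


def synthesize_hnm_frame(f0, amplitudes, noise_gain, ar_coeffs, num_samples,
                         fs=8000, seed=0):
    """Return one frame: harmonic part + AR-filtered white noise (illustrative)."""
    rng = random.Random(seed)
    # Harmonic part: harmonic k+1 has frequency (k+1)*f0 and amplitude amplitudes[k].
    harmonic = [
        sum(a * math.cos(2.0 * math.pi * (k + 1) * f0 * n / fs)
            for k, a in enumerate(amplitudes))
        for n in range(num_samples)
    ]
    # Noise part: v(n) = g*e(n) - sum_i c_i * v(n-i), with e(n) white Gaussian noise.
    noise = []
    for n in range(num_samples):
        sample = noise_gain * rng.gauss(0.0, 1.0)
        for i, c in enumerate(ar_coeffs):
            if n - 1 - i >= 0:
                sample -= c * noise[n - 1 - i]
        noise.append(sample)
    return [h + v for h, v in zip(harmonic, noise)]


# Voiced frame: two harmonics of a 100 Hz pitch, noise switched off.
voiced = synthesize_hnm_frame(100.0, [1.0, 0.5], 0.0, [-0.5], 160)
# Unvoiced frame: no harmonics, only AR-shaped noise.
unvoiced = synthesize_hnm_frame(0.0, [], 0.1, [-0.5], 160, seed=1)
```

An empty amplitude list gives a purely noise-like frame, which mirrors the voiced / unvoiced split of the diagram.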

Page 11

ALISP

Automatic Language Independent Speech Processing

Automatic discovery of segmental units for speech coding, synthesis, recognition, language identification and speaker verification.

Page 12

Speech Coding by indexing

SYMPATEX

SYstème de Messagerie unifiée avec présentation vocale des messages (PArole et TEXte): a unified messaging system with voice presentation of messages (speech and text)

Thomson-CSF, ELAN TTS, Irius

GET, ESIEE

Page 13

Coding principle

[Block diagram of the coder, labels translated from French. Blocks: speech input; spectral analysis; prosodic analysis; HMM recognition, using a dictionary of HMM models of the ALISP units; determination of the synthesis units (representatives A1 … A8 for HMM A); choice of the synthesis unit by DTW; prosody coding. Outputs: ALISP unit index; synthesis unit index; pitch, energy and timing.]
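
The DTW-based choice of a synthesis unit can be sketched as follows: the incoming segment is compared with each stored representative of the recognised unit using dynamic time warping, and the index of the closest one is kept. This is a minimal sketch under stated assumptions: the function names, the Euclidean local distance, the unconstrained warping path and the toy one-dimensional features are illustrative and not taken from the slides.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            # Euclidean distance between frame i of `a` and frame j of `b`.
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def choose_representative(segment, representatives):
    """Index of the representative closest to `segment` under DTW."""
    distances = [dtw_distance(segment, rep) for rep in representatives]
    return min(range(len(representatives)), key=distances.__getitem__)


# Toy example with two representatives and 1-D features (illustrative values).
representatives = [
    [[0.0], [0.0], [1.0]],
    [[1.0], [1.0], [0.0]],
]
segment = [[0.0], [1.0], [1.0]]
best = choose_representative(segment, representatives)
```

With eight representatives per unit, as in the diagram (A1 … A8), the chosen index fits in 3 bits.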

Page 14

Decoding

[Block diagram of the decoder, labels translated from French. Inputs: ALISP index; synthesis representative number; prosody parameters. Blocks: choice of the synthesis unit among the representatives A1 … A8; concatenative synthesis. Output: synthetic speech.]

Page 15

Automatic Speech Recognition

Recognition of proper names and spellings

Keyword spotting, noise robustness, adaptation

Large Vocabulary Speech Recognition (SIROCCO)

http://perso.enst.fr/~sirocco/index-en.html

Markov Random Fields, Bayesian Networks and Graphical Models

Page 16

Markov Random Fields, Bayesian Networks and Graphical Models

• Speech modelling with state constrained Markov Random Field over Frequency bands (Guillaume Gravier and Marc Sigelle) http://perso.enst.fr/~ggravier/recherche.html#these

• Comparative framework to study MRF, Bayesian Networks and Graphical Models. http://www.cs.berkeley.edu/~murphyk/Bayes/bayes.html

Page 17

Speaker Verification

Typology of approaches (EAGLES Handbook):

• Text dependent: public password, private password, customized password

• Text prompted

• Text independent

Incremental enrolment

Evaluation

Page 18

Speaker Verification (text independent)

The ELISA consortium: ENST, LIA, IRISA, ... http://www.lia.univ-avignon.fr/equipes/RAL/elisa/index_en.html

NIST evaluations: http://www.nist.gov/speech/tests/spk/index.htm

Page 19

Support Vector Machines and Speaker Verification

A hybrid GMM-SVM system is proposed.

The SVM scoring model is trained on development data to separate true-target speaker accesses from impostor accesses, using a new feature representation based on GMMs.

Modeling: GMM

Scoring: SVM
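
The slides do not specify the GMM-based feature representation, so the following is only one plausible reading, sketched under explicit assumptions: each access is summarised by its average frame log-likelihood under a target-speaker GMM and under a world (background) GMM, and a linear SVM decision function scores that two-dimensional vector. All names, the diagonal-covariance GMM form and the toy parameters are hypothetical; the weights and bias would normally come from SVM training on development data, which is not shown.

```python
import math


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one frame under a diagonal-covariance GMM.

    `gmm` is a list of (weight, means, variances) tuples, one per component.
    """
    log_terms = []
    for weight, means, variances in gmm:
        log_gauss = -0.5 * sum(
            math.log(2.0 * math.pi * v) + (x - m) ** 2 / v
            for x, m, v in zip(frame, means, variances)
        )
        log_terms.append(math.log(weight) + log_gauss)
    # Log-sum-exp over components for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def access_features(frames, target_gmm, world_gmm):
    """Hypothetical 2-D feature vector: mean frame log-likelihood under each GMM."""
    n = len(frames)
    target = sum(gmm_log_likelihood(f, target_gmm) for f in frames) / n
    world = sum(gmm_log_likelihood(f, world_gmm) for f in frames) / n
    return [target, world]


def svm_score(features, weights, bias):
    """Linear SVM decision value; positive means 'accept' in this sketch."""
    return sum(w * f for w, f in zip(weights, features)) + bias


# Toy 1-D models (illustrative values only).
target_gmm = [(1.0, [0.0], [1.0])]
world_gmm = [(0.5, [-3.0], [1.0]), (0.5, [3.0], [1.0])]
features = access_features([[0.1], [-0.2]], target_gmm, world_gmm)
score = svm_score(features, weights=[1.0, -1.0], bias=0.0)
```

With weights (1, −1) and zero bias the score reduces to the usual GMM log-likelihood ratio; a trained SVM is free to weight the two terms differently.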

Page 20

SVM principles

[Figure: an input vector X in the input space is mapped to Φ(X) in the feature space. Among the separating hyperplanes H in the feature space, Ho is the optimal hyperplane. The output is Class(X).]
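
The decision rule behind this picture can be sketched with a kernel, which computes inner products in the feature space without building the mapping explicitly. The snippet below is a minimal sketch: the RBF kernel choice, the function names and the hand-picked support vectors, multipliers and bias are assumptions for illustration, not values from a trained model.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2), an inner product in an implicit feature space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def svm_classify(x, support_vectors, labels, alphas, bias, kernel=rbf_kernel):
    """Class(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    value = sum(
        a * y * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + bias
    return 1 if value >= 0 else -1


# Hand-picked toy model: one support vector per class (illustrative values only).
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
near_positive = svm_classify([0.2, 0.1], support_vectors, labels, alphas, bias=0.0)
near_negative = svm_classify([1.9, 2.2], support_vectors, labels, alphas, bias=0.0)
```

The hyperplane is linear in the feature space, yet the resulting boundary in the input space is non-linear, which is what makes the approach relevant to non-linear speech processing.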

Page 21

Results

Page 22

Voice technology in Majordome

Server-side background tasks:

• continuous speech recognition applied to voice messages upon reception

• detection of sender’s name and subject

User interaction:

• speaker identification and verification

• speech recognition (receiving user commands through voice interaction)

• text-to-speech synthesis (reading text summaries, e-mails or faxes)

Page 23

Perspectives within COST-277

Text-book on Speech Processing

Evaluation of parametric representations of speech for diverse applications

Fundamental work on voice transformations with applications in coding, synthesis, recognition and speaker characterisation

Fundamental work on noise robustness with applications in coding, recognition and speaker verification