Centro per la Ricerca Scientifica e Tecnologica Spoken language technologies: recent advances and future challenges Gianni Lazzari VIENNA July 26


Page 1: Centro per la Ricerca Scientifica e Tecnologica

Centro per la Ricerca Scientifica e Tecnologica

Spoken language technologies: recent advances and future challenges

Gianni Lazzari

VIENNA July 26

Page 2: Centro per la Ricerca Scientifica e Tecnologica

Centro per la Ricerca Scientifica e Tecnologica

Focus on the use of Spoken Language Technologies for multilingual transcription and reporting tasks

SUMMARY

- Short introduction on SLT
- Where are we today?
- TC-STAR and RAI projects
- Outlook for the future

Page 3: Centro per la Ricerca Scientifica e Tecnologica

Spoken Language Technologies: recent advances and future challenges

Typical tasks in Human Language Technologies (HLT)

- speech recognition (voice commands & speech transcription)
- character recognition
- object and gesture recognition
- (spoken and written) language understanding
- spoken dialog systems
- speech synthesis
- text summarization
- document classification and information retrieval
- syntactic analysis of natural language
- speech and text translation
- ...

Page 4: Centro per la Ricerca Scientifica e Tecnologica

General Spoken Language System Architecture

[Diagram] input → Recognition → Understanding and dialog → Generation and Synthesis → answer

MODELS: acoustic, language, semantic, dialog, synthesis

Page 5: Centro per la Ricerca Scientifica e Tecnologica

Speech Transcription System Architecture

[Diagram] Input audio (noise, speech, music, ...) → Recognition → results: enriched text

MODELS: acoustic, language, speakers, speech/music/noise

Page 6: Centro per la Ricerca Scientifica e Tecnologica

Typical Transcription System

Page 7: Centro per la Ricerca Scientifica e Tecnologica

Standard Automatic Speech Recognition Architecture

Page 8: Centro per la Ricerca Scientifica e Tecnologica

Word error rate of different speech recognition tasks

Task            WER      Type of speech  Target    Bandwidth
Dictation       7%       well formed     computer  FBW
Broadcast news  12%      various         audience  FBW
Switchboard     20-30%   spontaneous     person    TBW
Voicemail       30%      spontaneous     person    TBW
Meetings        50-60%   spontaneous     person    FF

The features characterizing these tasks are:

- type of speech: well formed vs spontaneous
- target of communication: computer, audience, person
- bandwidth: FBW, full bandwidth; TBW, telephone bandwidth; FF, far field
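
The slides quote word error rates without defining the measure. A minimal sketch of the usual definition follows: the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function name and the whitespace tokenization are assumptions made for illustration; they are not taken from the presentation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.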

Page 9: Centro per la Ricerca Scientifica e Tecnologica

RAI Italian Broadcast news Transcription

Page 10: Centro per la Ricerca Scientifica e Tecnologica

Evaluation of the Italian broadcast news transcription task.

Acoustic models are trained through a speaker-adaptive acoustic modelling procedure.

Two sets of acoustic models were trained, one for wideband and one for narrowband speech, each on about 140 hours of speech.

The language model (LM) was estimated on a 226M-word corpus consisting mostly of newspaper articles, plus broadcast news (BN) transcripts.

The LM is compiled into a static network with a shared-tail topology.

Page 11: Centro per la Ricerca Scientifica e Tecnologica

Word error rate on the Italian broadcast news transcription task.

                    Wideband                  Narrowband                Overall
                    First Pass  Second Pass   First Pass  Second Pass   First Pass  Second Pass
Old                 15.5        14.2          25.2        22.4          17.6        16.0
New                 14.6        11.7          21.0        17.1          16.0        12.9
Relative reduction  5.8%        17.6%         16.7%       23.7%         9.1%        19.4%
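
The "relative reduction" figures can be reproduced from the Old and New word error rates as (old - new) / old. The short sketch below recomputes them from the values on the slide; the dictionary keys and the function name are illustrative labels, not names from the presentation.

```python
def relative_reduction(old: float, new: float) -> float:
    """Relative WER reduction in percent: 100 * (old - new) / old."""
    return 100.0 * (old - new) / old


# (old WER, new WER) pairs copied from the slide, in percent.
wer = {
    ("wideband", "first pass"): (15.5, 14.6),
    ("wideband", "second pass"): (14.2, 11.7),
    ("narrowband", "first pass"): (25.2, 21.0),
    ("narrowband", "second pass"): (22.4, 17.1),
    ("overall", "first pass"): (17.6, 16.0),
    ("overall", "second pass"): (16.0, 12.9),
}

# Rounded to one decimal place, as on the slide.
reductions = {
    condition: round(relative_reduction(old, new), 1)
    for condition, (old, new) in wer.items()
}

for (band, decoding_pass), value in reductions.items():
    print(f"{band:10s} {decoding_pass:11s} {value:.1f}%")
```

The recomputed values agree with the slide's last row.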

Page 12: Centro per la Ricerca Scientifica e Tecnologica

STATISTICAL TRANSLATION BASED ON BAYESIAN DECISION RULE

[Diagram] Speech recognition → source language text → Transformation → Global Search → Transformation → target language text → Speech synthesis

Models used by the global search: Lexicon model, Alignment model, Language model

Example: "Vorrei prenotare un albergo a Francoforte" → "I want to reserve a hotel room in Frankfurt"
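
The slide names the Bayesian decision rule without writing it out. In the standard formulation of statistical translation it reads as follows. The notation is an assumption: $f_1^J$ denotes the source sentence and $e_1^I$ a candidate target sentence; these symbols do not appear on the slide.

```latex
\[
\begin{aligned}
\hat{e}_1^I &= \operatorname*{arg\,max}_{e_1^I} \; \Pr(e_1^I \mid f_1^J) \\
            &= \operatorname*{arg\,max}_{e_1^I} \; \Pr(e_1^I) \cdot \Pr(f_1^J \mid e_1^I)
\end{aligned}
\]
```

Under this reading, $\Pr(e_1^I)$ is supplied by the language model, $\Pr(f_1^J \mid e_1^I)$ is the translation model built from the lexicon and alignment models, and the "Global Search" box carries out the maximization over target sentences.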

Page 13: Centro per la Ricerca Scientifica e Tecnologica

Statistical Translation System

Page 14: Centro per la Ricerca Scientifica e Tecnologica

Experimental findings in HLT research (1973-2004)

- statistical methods most successful, in particular: speech recognition, language translation, parsing, dialog systems, ...
- scientific foundations: methods of computer science, statistical modelling, information theory
- handling huge amounts of data: 200 hours of speech recordings, 100 million running words, ...
- learning from data: fully automatic procedures; more data than can be processed by human experts
- efficient algorithms: search/decision algorithms for heuristic search
- ...

Page 15: Centro per la Ricerca Scientifica e Tecnologica

Research on HLT: 1973-2004

- speech recognition (1973-2004)
  - most of the progress: by pure statistical modelling
  - some progress: by weak acoustic-phonetic-linguistic knowledge, i.e. domain-specific knowledge
  - virtually no progress: by classical rule-based and AI methods
- similar recent experience (1993-2004): machine translation, information extraction, dialog systems, ...
- expectation for future progress in HLT
  - most important: methodology: computer science, statistical modelling, information theory
  - domain-specific knowledge: acoustics, phonetics, linguistics, ...

Page 16: Centro per la Ricerca Scientifica e Tecnologica

Spoken language translation: joint projects (national, European, international): ATR, C-Star, Verbmobil, Eutrans, Nespole!, Fame, LC-Star, PF-Star, TC-STAR

- restricted domains: appointment scheduling, conference registration, travelling, tourism information, ...
- vocabulary size: 3 000 – 10 000 words
- best performing systems and approaches: data-driven
  - example-based methods
  - finite-state transducers
  - statistical approaches
- e.g.: Verbmobil evaluation [June 2000]: better by a factor of 2

Written language translation: US Tides project 2001-2004

- unrestricted domain: press news, vocab. size ≥ 50 000 words
- language pairs: Chinese→English, Arabic→English
- performance [July 2003]: best statistical systems are better than conventional/commercial systems

Page 17: Centro per la Ricerca Scientifica e Tecnologica

TC-STAR: Technology and Corpora for Speech to Speech Translation

Contract Nr. FP6 506738

VI FRAMEWORK PROGRAM PRIORITY Multimodal Interfaces

IST-2002-2.3.1.6

Page 18: Centro per la Ricerca Scientifica e Tecnologica

PARTNERS

Page 19: Centro per la Ricerca Scientifica e Tecnologica

TC-STAR

The TC-STAR project focuses on advanced research in key technologies for speech to speech translation:

- speech recognition (ASR)
- spoken language translation (SLT)
- speech synthesis (TTS)

Project data:

- Start: April 2004
- End: March 2007
- Grant: 11 M Euro

METHODOLOGY:

- COMPETITIVE EVALUATION
- COOPERATION

Page 20: Centro per la Ricerca Scientifica e Tecnologica

Vision

Transcription and Translation of broadcast news, speeches, lectures and interviews

Vocal access

Web access

Simultaneous Translation

Hi, What do you think about

Page 21: Centro per la Ricerca Scientifica e Tecnologica

Application Scenario

A selection of unconstrained conversational speech domains:

- Broadcast news
- European Parliament Plenary Session

A few languages important for European society and the European economy:

- European Accented English
- European Spanish
- Chinese

Page 22: Centro per la Ricerca Scientifica e Tecnologica

2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK

The Evaluation Tasks and Databases

Translation tasks:

- English to Spanish: EPPS (European Parliament Plenary Sessions)
- Spanish to English: EPPS (European Parliament Plenary Sessions)

Three types of input to SLT:

- output of automatic speech recognition
- verbatim manual transcriptions
- final text editions (with punctuation marks)

Page 23: Centro per la Ricerca Scientifica e Tecnologica

2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK

Training data

- Sentence-aligned speeches and their translations
- Final text editions: from April 1996 to Oct. 4th, 2004
- Verbatim transcriptions: from May 2004 to Oct. 4th, 2004

Development data: Oct. 26, 2004

Evaluation data: Nov. 14, 2004

Page 24: Centro per la Ricerca Scientifica e Tecnologica

2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK

Page 25: Centro per la Ricerca Scientifica e Tecnologica

2005 FIRST EVALUATION RESULTS ON THE EUROPEAN PARLIAMENT PLENARY SESSION TASK

ASR, EPPS data, word error rate (WER):

- European accented English: 9.5% (best TC-STAR)
- European Spanish: 10.1% (best TC-STAR)

SLT, EPPS data, position-independent word error rate:

- English to Spanish: 49% (best partner result)
- Spanish to English: 46% (best partner result)
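
The slides do not say how the position-independent word error rate is computed. One common formulation treats both sentences as bags of words, so that word order is ignored. The sketch below follows that formulation; both the exact formula and the function name are assumptions, and the project's official scoring may have differed in detail.

```python
from collections import Counter


def position_independent_wer(reference: str, hypothesis: str) -> float:
    """Bag-of-words error rate: words are matched regardless of position."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Multiset intersection: how many words the two sentences share,
    # counting repeated words as often as they occur in both.
    matches = sum((Counter(ref) & Counter(hyp)).values())

    # Every unmatched word in the longer of the two sentences is an error.
    errors = max(len(ref), len(hyp)) - matches
    return errors / len(ref)
```

A pure reordering of the reference scores 0 under this measure, which is why it is never higher than the ordinary word error rate for the same sentence pair.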

Page 26: Centro per la Ricerca Scientifica e Tecnologica

“The spoken translation problem ... is still a significant challenge:

Good text translation was hard enough to pull off. Speech to speech MT was beyond going to the Moon – it was Mars…” [Steve Silberman, Wired Magazine].