December 19, 2005


Page 1: December 19, 2005

December 19, 2005

FPMS

Page 2: December 19, 2005

Acapela’s corporate profile

Page 3: December 19, 2005

Group Background

Babel Technologies
> Created in 1995 in Mons (Belgium)
> Spin-off of Mons Polytechnical University
> In-house TTS & ASR technologies
> TTS and ASR leader in embedded environments

Infovox
> Created in 1983 in Stockholm (Sweden)
> Spin-off of KTH (Royal Institute of Technology)
> Integrated into Telia Promotor in 1993
> Acquired by Babel Technologies in 2001
> TTS leader in the Nordic countries, Germany and the Netherlands
> Accessibility and Telecom expertise

Elan Speech
> Created in 1980 in Toulouse (France)
> Focused on TTS since 1996
> Launch of in-house high quality TTS in 2002 (Elan Sayso)
> TTS leader in Telecom and Automotive

Page 4: December 19, 2005

Acapela’s locations

France, Toulouse

Belgium, Mons

Sweden, Stockholm

3 sites, 50 people

International team

Local support in each site

Merged organization

Page 5: December 19, 2005

Acapela’s multilingual offer

ASR & TTS components in 23 languages

Page 6: December 19, 2005

Acapela’s technologies

Page 7: December 19, 2005

Technologies (TTS)

Architecture

> Text Preprocessor: set of rules
> Tagger: dictionary based
> Phonetizer: phonetic tree + dictionary
> Prosody: prosodic patterns
> Synthesizer: database (voice)

Page 8: December 19, 2005

Text Preprocessor

> Function
– Generation of standard text
> Examples
– Numbers: 100 → one hundred
– Currencies: $20 → twenty dollars
– Abbreviations: tel. → telephone
> Implementation
– Rules are defined in a standard format (BNF format)
> Size of data
– 20 Kbytes
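The three kinds of expansion listed on this slide (numbers, currencies, abbreviations) can be illustrated with a minimal Python sketch. The rule tables, the function names and the 0..999 range are assumptions made for illustration. The slide only says that the real rules are written in a BNF format, and this sketch substitutes regular expressions for that.

```python
import re

# Hypothetical abbreviation table; a real system would load per-language rules.
ABBREVIATIONS = {"tel.": "telephone"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer in the range 0..999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def spell_currency(match):
    """Turn a '$<amount>' match into words, with a singular/plural unit."""
    amount = int(match.group(1))
    unit = "dollar" if amount == 1 else "dollars"
    return number_to_words(amount) + " " + unit


def normalize(text):
    """Expand currencies, then plain numbers, then abbreviations."""
    # Currencies go first so "$20" is not consumed by the plain-number rule.
    text = re.sub(r"\$(\d{1,3})\b", spell_currency, text)
    text = re.sub(r"\b\d{1,3}\b",
                  lambda m: number_to_words(int(m.group(0))), text)
    for abbreviation, expansion in ABBREVIATIONS.items():
        text = text.replace(abbreviation, expansion)
    return text


print(normalize("100"))   # one hundred
print(normalize("$20"))   # twenty dollars
print(normalize("tel."))  # telephone
```

Rule order matters here: applying the currency rule before the number rule keeps the currency symbol attached to its amount.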

[Pipeline diagram: Text Prepro. → Tagger → Phonetizer → Prosody → Speech Synth]

Page 9: December 19, 2005

Tagger (optional)

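> Function
– Generation of the grammatical function of each word
– Optional: not necessary for all languages
> Examples
– To read / I have read (same spelling, different pronunciation)
– Les poules du couvent couvent (French: "the convent's hens are brooding"; "couvent" is first a noun, then a verb)
> Implementation
– Dictionary based + set of rules
> Size of data
– 0 to 20 Kbytes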
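A dictionary plus a set of contextual rules, as described on this slide, can be sketched as follows. The tag names, the mini-lexicon and the single rule are hypothetical; they only show how the "to read" / "I have read" ambiguity could be resolved before phonetization.

```python
# Hypothetical mini-lexicon: each word maps to its possible tags.
LEXICON = {
    "i": ["PRON"],
    "to": ["TO"],
    "have": ["AUX"],
    "read": ["VERB_BASE", "VERB_PAST"],  # the two tags are pronounced differently
}


def tag(words):
    """Assign one tag per word: dictionary lookup, then a contextual rule."""
    tags = []
    for word in words:
        candidates = LEXICON.get(word.lower(), ["UNKNOWN"])
        choice = candidates[0]  # default: first dictionary entry
        # Rule (assumed for illustration): after an auxiliary,
        # prefer the past form of an ambiguous verb.
        if len(candidates) > 1 and tags and tags[-1] == "AUX":
            if "VERB_PAST" in candidates:
                choice = "VERB_PAST"
        tags.append(choice)
    return list(zip(words, tags))


print(tag(["to", "read"]))         # [('to', 'TO'), ('read', 'VERB_BASE')]
print(tag(["I", "have", "read"]))  # [('I', 'PRON'), ('have', 'AUX'), ('read', 'VERB_PAST')]
```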


Page 10: December 19, 2005

Phonetizer

> Function
– Generation of a phonetic transcription for each word
> Examples
– Babel: b a b E l
> Implementation
– Decision tree + exception dictionary
> Size of data (language dependent)
– 5 to 350 Kbytes
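The "decision tree + exception dictionary" combination can be sketched as a two-step lookup: exceptions first, then general letter-to-sound decisions. In this Python sketch the hand-written branches stand in for a trained decision tree, and every rule and name other than the "Babel" entry is an assumption made for illustration.

```python
# Exception dictionary: words whose transcription is stored explicitly.
EXCEPTIONS = {"babel": ["b", "a", "b", "E", "l"]}


def letter_to_phoneme(word, i):
    """Hand-written stand-in for a decision tree over letter context."""
    letter = word[i]
    next_letter = word[i + 1] if i + 1 < len(word) else ""
    if letter == "c":
        # Hypothetical rule: "c" is soft before e, i, y and hard otherwise.
        return "s" if next_letter in ("e", "i", "y") else "k"
    if letter == "e" and next_letter == "":
        return ""  # hypothetical rule: word-final "e" is silent
    return letter  # fallback: the letter maps to itself


def phonetize(word):
    """Return a list of phoneme symbols for one word."""
    key = word.lower()
    if key in EXCEPTIONS:
        return list(EXCEPTIONS[key])
    phonemes = [letter_to_phoneme(key, i) for i in range(len(key))]
    return [p for p in phonemes if p]


print(phonetize("Babel"))  # ['b', 'a', 'b', 'E', 'l']
print(phonetize("cat"))    # ['k', 'a', 't']
```

Checking the exception dictionary first lets the general rules stay small, which is consistent with the language-dependent data sizes quoted on the slide.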


Page 11: December 19, 2005

Prosodic module

> Function
– Generation of intonation:
• Phoneme duration
• Pitch markers
> Examples
– See MBROLI application
> Implementation
– Prosodic patterns extracted from speech corpus
> Size of data (language dependent)
– 30 to 300 Kbytes
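The output of a prosodic module, a duration and a set of pitch markers per phoneme, can be represented with a small data structure. The Python sketch below is an assumption about one possible representation: the class name, the field names and the one-line-per-phoneme text layout (symbol, duration in ms, then position/pitch pairs) are illustrative and not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ProsodicPhoneme:
    symbol: str
    duration_ms: int
    # Each marker is (position in % of the phoneme duration, pitch in Hz).
    pitch_markers: list = field(default_factory=list)

    def to_line(self):
        """Serialize as 'symbol duration pos1 hz1 pos2 hz2 ...'."""
        parts = [self.symbol, str(self.duration_ms)]
        for position, pitch_hz in self.pitch_markers:
            parts += [str(position), str(pitch_hz)]
        return " ".join(parts)


def total_duration_ms(phonemes):
    """Length of the utterance implied by the phoneme durations."""
    return sum(p.duration_ms for p in phonemes)


utterance = [
    ProsodicPhoneme("b", 60, [(0, 120)]),
    ProsodicPhoneme("a", 90, [(50, 130), (100, 110)]),
]
print(utterance[1].to_line())        # a 90 50 130 100 110
print(total_duration_ms(utterance))  # 150
```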


Page 12: December 19, 2005

Synthesizer

> Function
– Generation of speech samples from phoneme sequence + intonation
> Implementation: 3 technologies
– Formant-based = rules
– Diphone concatenation
– Unit Selection
> Size of data: depends on
– Technology
– Sampling frequency
– Compression rate
– From 50 Kbytes to 50 Mb
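For the diphone concatenation option, the synthesizer first has to turn the phoneme sequence into a sequence of diphone units (each unit spans the transition from one phoneme to the next). A minimal Python sketch of that step follows. The silence symbol "_" and the "a-b" unit naming are assumptions; looking the units up in a voice database and joining the waveforms is not shown.

```python
def to_diphones(phonemes, silence="_"):
    """Map a phoneme sequence to the diphone units that cover it.

    The sequence is padded with silence on both sides so the first and
    last phonemes also get a transition unit.
    """
    padded = [silence] + list(phonemes) + [silence]
    return [left + "-" + right for left, right in zip(padded, padded[1:])]


print(to_diphones(["b", "a", "b", "E", "l"]))
# ['_-b', 'b-a', 'a-b', 'b-E', 'E-l', 'l-_']
```

A sequence of n phonemes yields n + 1 diphones, and the inventory of distinct units is bounded by the square of the phoneme set size, which is one reason this approach has a smaller footprint than unit selection.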


Page 13: December 19, 2005

Technologies (ASR)

Speech Recognition

Hybrid Models: Hidden Markov Models / Neural Networks

Acoustic analysis

Neural network

HMM

Discrimination

Dynamic programming (decoder)
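The dynamic programming decoder in a hybrid HMM / neural network recognizer is commonly a Viterbi search: the network scores each acoustic frame against each HMM state, and the search finds the best state path. The Python sketch below is a generic log-domain Viterbi under that assumption; it is not Acapela's decoder, and the function name and argument layout are illustrative.

```python
NEG_INF = float("-inf")


def viterbi(frame_scores, log_trans, log_init):
    """Return the most likely state path.

    frame_scores[t][s]: log-score of state s at frame t (for example,
                        derived from a neural network output).
    log_trans[p][s]:    log-probability of moving from state p to state s.
    log_init[s]:        log-probability of starting in state s.
    """
    n_states = len(log_init)
    best = [log_init[s] + frame_scores[0][s] for s in range(n_states)]
    back_pointers = []
    for scores in frame_scores[1:]:
        new_best, pointers = [], []
        for s in range(n_states):
            # Best predecessor for state s at this frame.
            prev = max(range(n_states),
                       key=lambda p: best[p] + log_trans[p][s])
            new_best.append(best[prev] + log_trans[prev][s] + scores[s])
            pointers.append(prev)
        best = new_best
        back_pointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With a left-to-right two-state model, frames that favor state 0 and then state 1 decode to a path that stays in state 0 and then switches once to state 1.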

Page 14: December 19, 2005

Recognition

Vocabulary
– Phonetic transcription
  Ex: reconnaissance: R [@] k O n E s a~ s
– Consider all transcriptions!
  Ex: 10 (French "dix") = dis – diz – di
– Consider synonyms!
  Ex: Oui, ouais, ok, c’est cela, … (yes, yeah, ok, that's right, …)
  Ex: Télévision, TV, poste de télévision (television, TV, television set)
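The two vocabulary recommendations on this slide (list every transcription, list every synonym) amount to a many-to-one mapping from pronunciations to a recognized concept. A minimal Python sketch of such a lexicon follows. The concept labels, the data layout and the transcriptions given for "oui", "ouais" and "ok" are illustrative assumptions; only the three variants of "dix" come from the slide.

```python
# Hypothetical lexicon: concept -> surface word -> list of transcriptions.
LEXICON = {
    "TEN": {"dix": ["d i s", "d i z", "d i"]},
    "YES": {"oui": ["w i"], "ouais": ["w E"], "ok": ["o k E"]},
}


def build_decoder_index(lexicon):
    """Flatten the lexicon into transcription -> (concept, word)."""
    index = {}
    for concept, forms in lexicon.items():
        for word, transcriptions in forms.items():
            for transcription in transcriptions:
                index[transcription] = (concept, word)
    return index


index = build_decoder_index(LEXICON)
print(index["d i z"])  # ('TEN', 'dix')
print(index["w E"])    # ('YES', 'ouais')
```

The application then reacts to the concept ("YES", "TEN") rather than to the particular word or pronunciation the speaker happened to use.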

Page 15: December 19, 2005

Recognition (continued): difficulties

• Noise
• Accents
• Hesitations
• Users
• Incorrect syntax
• Out-of-vocabulary words

Page 16: December 19, 2005

ASR : advantage of NN

Page 17: December 19, 2005

Acapela’s product overview

Page 18: December 19, 2005

Acapela’s Technologies Overview

3 Technologies

> High-Quality TTS: the pleasant and natural sounding voice
– Voice enabled by Sayso and BrightSpeech, based on Unit Selection technology
> High-Density TTS: the right choice for high density and small footprints
– Voice enabled by Tempo and Babil, based on Diphone technology
> ASR: the robust speech recognizer
– Voice enabled by Babear Speaker Independent ASR, based on Hidden Markov Models and Artificial Neural Networks

Page 19: December 19, 2005

Two TTS technologies

Diphone based concatenative TTS
High Density TTS: voice enabled by Tempo & Babil

Advantages:
• Small footprint (2 to 6 Mb)
• Flexibility (pitch, speed adjustment, prosody copying)
• High intelligibility
• 21 languages supported

Disadvantage:
• Less natural sounding

Markets/Applications targeted:
• Automotive & consumer electronics (low footprint)
• High density, short ROI server based TTS (telephony)
• Multimedia software products

Page 20: December 19, 2005

High Density TTS language availability

Language                  Gender
French                    Female, Male
US English                Female, Male
UK English                Female, Male
German                    Female, Male
Spanish (Castilian)       Female, Male
Italian                   Female, Male
Polish                    Male
Russian                   Male
Dutch (NL)                Female, Male
Dutch (B)                 Female
Continental Portuguese    Female
Danish                    Female, Male
Swedish                   Female, Male
Norwegian                 Male
Finnish                   Male
Icelandic                 Male
Czech                     Female
Turkish                   Male
Arabic                    Male
South American Spanish    Female
Brazilian Portuguese      Female, Male

Page 21: December 19, 2005

Unit selection concatenative TTS
High Quality TTS: voice enabled by Sayso & BrightSpeech

Advantages:
• Very high quality
• Highly natural
• Flexibility (pitch, speed adjustment, timbre alteration, whispering feature)
• Support for custom voices (“SpeechBrand” Program)

Disadvantage:
• Larger footprint (16 to 70 Mb)

Markets/Applications targeted:
• High end telephony applications (voice portal, news)
• New generation of navigation terminals
• Public address

Page 22: December 19, 2005

High Quality TTS language availability

Language                  Gender             Status
French                    Female, Male       Available
US English                Female, Male       Available
UK English                Female, Male       Available
German                    Female, Q1-2006    Available
Spanish (Castilian)       Female             Available
Italian                   Female             Available
Polish                    Female             Available
Swedish                   Female, Male       Available
Arabic                    Female, Male       Available
Dutch (NL)                Female             Available
Dutch (B)                 Female             Available
Norwegian                 Female             Available
Continental Portuguese    Female
Danish                    ****
Mexican Spanish           ***
Finnish                   **
Canadian French           *

Page 23: December 19, 2005

Hybrid technology of Hidden Markov Models and Artificial Neural Networks
ASR: voice enabled by Babear

Advantages:
• Very high accuracy in difficult contexts
• High dialog flexibility
• Lip-sync and language learning capabilities through phoneme level discrimination
• Speaker independent
• Accurate voice activation for noisy environments

Markets/Applications targeted:
• Industrial data collection: inventories, picking…
• Automotive
• Name dialing
• Multimedia Command & Control / language learning

Page 24: December 19, 2005

ASR language availability

Language      Robustness    Status
US English    +++           Available
UK English    +++           Available
Spanish       +             Available
French        +++           Available
German        +++           Available
Italian       ++            Available
Dutch         +             Available
Greek         +             Available
Arabic        ++            Available

Page 25: December 19, 2005

Acapela’s market coverage

Page 26: December 19, 2005

Acapela’s Markets

Solutions for Telecom, Automotive, Accessibility, Mobility, Industry, Multimedia, Consumer Electronics.

Page 27: December 19, 2005

Acapela’s Markets

Leading 3 major and mature markets: Telecom, Automotive, Accessibility

Page 28: December 19, 2005

Acapela’s main Markets

Telecom
Server based vocalization of contents for multiple users over the phone
• for Companies: Unified messaging, Auto attendant, CRM
• for Telcos: Unified messaging, Voice portal, SMS2Voice, directory and reverse directory
• for Contact centers: call automation, FAQ

Page 29: December 19, 2005

Acapela’s main Markets

Automotive
On board and off-board speech solutions
• On board & off board car navigation systems
• Traffic information
• PDA based applications
• Telematics

Page 30: December 19, 2005

Acapela’s main Markets

Accessibility
Assistive technologies
• Screen readers
• Reading machines
• Voice-controlled mobile phones

Page 31: December 19, 2005

Acapela’s Markets

Creating new speech market opportunities in
>> Mobility
• Cell phones
• Navigation on PDAs

Page 32: December 19, 2005

Acapela’s Markets

Creating new speech market opportunities in
>> Industry
• Public Address
• Alarm & Supervision
• Warehousing, Production Line

Page 33: December 19, 2005

Acapela’s Markets

Creating new speech market opportunities in
>> Multimedia
• Edutainment
• Education
• Language learning
• E-learning

Page 34: December 19, 2005

Acapela’s Markets

Creating new speech market opportunities in
>> Consumer Electronics, …
• Talking dictionary devices
• Toys

Page 35: December 19, 2005

giving you the say