10 surmei sped2009 pres


SpeD 2009

June 18-21, 2009, Constanta, ROMANIA

Real-Time Architecture For A Network-Based Text-To-Speech Service Implementation

Mihai Surmei *, Dragos Burileanu **, Cristian Negrescu **, Catalin Ungurean **, Aurelian Dervis **

* ERICSSON Telecommunications Romania S.R.L.
** Faculty of Electronics, Telecommunications and IT, University Politehnica of Bucharest, ROMANIA



OUTLINE

Goals
Multi-service network architectures
TTS network services
A real-time processing environment for speech synthesis in Romanian
TTS in IMS context - reference architecture
TTS engine
The proposed service: PoC-Chat
Conclusions



Goals

Build up a reference network-based media processing environment for Romanian TTS
Propose a particular service on the reference environment that combines several network capabilities



Multi-service network architecture

From peer-to-peer voice and shoot-and-forget messaging to session-oriented real-time multi-service communication

Mobile networks

Fixed networks

Nomadic networks

Internet

The common factor: IMS



TTS network services

Existing TTS services
  Not real-time
  Proprietary client-server approach

Next generation services
  Real-time
  Service mix
  Open architecture and protocols



A real-time processing environment for speech synthesis in Romanian

Leveraging open protocols (MRCP)
Following the latest developments in the telecom field
Modular design
Expandable to close the loop on speech recognition

Figure: Generic real-time TTS-based telecom service (labels: Supporting network, TEXT, SPEECH)



TTS in IMS context - reference architecture

IMS overlay network: supplementary control
Access agnostic: GPRS/HSPA, ADSL, WiFi
TTS closely related to MRFC/MRFP pair due to the hybrid functionality:
  Signaling
  Payload
MRCP protocol

Figure: reference architecture (labels: Any 3G mobile network, GGSN, SBC; IMS overlay with CSCF, HSS, AS, MRFC, MRFP; MRCP Client, MRCPv2, MRCP Server, TTS Engine; SIP session 1, SIP session 2, RTP)
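As a rough illustration of the MRCPv2 signaling between the MRCP client and the MRCP server, the sketch below builds a SPEAK request in the text framing defined by RFC 6787. The function name `build_mrcp_speak`, the channel identifier and the sample text are hypothetical; the slides do not show actual message traces. The one subtle point is that the message length in the start line counts the whole message, including the start line itself.

```python
CRLF = "\r\n"


def build_mrcp_speak(channel_id, request_id, text):
    """Build an MRCPv2 SPEAK request carrying plain text (illustrative only)."""
    body = text.encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}{CRLF}"
        f"Content-Type: text/plain{CRLF}"
        f"Content-Length: {len(body)}{CRLF}"
        f"{CRLF}"
    ).encode("ascii")

    # message-length covers the entire message, including the start line
    # that contains it, so recompute until the value stops changing.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} SPEAK {request_id}{CRLF}".encode("ascii")
        total = len(start_line) + len(headers) + len(body)
        if total == length:
            break
        length = total
    return start_line + headers + body


if __name__ == "__main__":
    # Hypothetical channel identifier for a speech synthesizer resource.
    message = build_mrcp_speak("32AECB23433802@speechsynth", 1, "Bună ziua")
    print(message.decode("utf-8"))
```

In the architecture on this slide, a request of this kind would travel on the MRCPv2 channel set up through SIP, while the synthesized audio returns separately over RTP.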



TTS engine

Figure: block diagram of the TTS engine. Recoverable block labels:
  Input text
  Text with diacritics? (YES / NO)
  Automatic diacritic restoration
  Text analysis (text normalization, morphologic, syntactic and contextual analysis)
  Linguistic resources (dictionaries)
  Letter-to-phone conversion
  Conversion rules (decision tree)
  Exception dictionary
  Prosody estimation
  Phonetic transcription; contextual, phonetic and prosodic information
  Prosody generation
  Prosodic model (intonation, duration)
  Two-stage acoustic segment selection algorithm
  Database (HNM parameters; auxiliary information at segment level)
  Synthesis algorithm (HNM)
  Speech signal generation
  Speech
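The order of the blocks in the diagram can be sketched as a simple pipeline. This is only a structural outline under stated assumptions: every stage is a placeholder that does no real processing, the stage order is inferred from the block labels, and `has_diacritics` is a hypothetical heuristic standing in for the "Text with diacritics?" decision.

```python
# Characters treated as Romanian diacritics (comma-below and cedilla forms).
ROMANIAN_DIACRITICS = set("ăâîșțĂÂÎȘȚşţŞŢ")


def has_diacritics(text):
    """Hypothetical stand-in for the 'Text with diacritics?' decision."""
    return any(ch in ROMANIAN_DIACRITICS for ch in text)


def placeholder_stage(data):
    """Does nothing; a real stage would transform its input."""
    return data


# Stage order assumed from the block labels of the diagram.
STAGES = [
    ("text analysis", placeholder_stage),
    ("letter-to-phone conversion", placeholder_stage),
    ("prosody generation", placeholder_stage),
    ("two-stage acoustic segment selection", placeholder_stage),
    ("HNM synthesis", placeholder_stage),
    ("speech signal generation", placeholder_stage),
]


def run_pipeline(text):
    """Return the names of the stages applied to the input, in order."""
    trace = []
    data = text
    if not has_diacritics(data):
        # NO branch: restore diacritics before text analysis.
        data = placeholder_stage(data)
        trace.append("automatic diacritic restoration")
    for name, stage in STAGES:
        data = stage(data)
        trace.append(name)
    return trace


if __name__ == "__main__":
    for sample in ("Buna ziua", "Bună ziua"):
        print(sample, "->", run_pipeline(sample))
```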

The latest version of our concatenative TTS system is based on non-uniform acoustic units (diphones and polyphones). The synthesis technique makes use of the Harmonic plus Noise Model of speech.

The TTS system implementation has been enhanced in order to be integrated in a multi-thread/multi-process environment.
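One way to picture the multi-thread integration is a pool of workers that each serve one conversion request. The sketch below assumes a hypothetical `synthesize` function standing in for the engine and says nothing about how the actual engine is threaded.

```python
from concurrent.futures import ThreadPoolExecutor


def synthesize(text):
    """Placeholder for one TTS conversion; returns fake audio bytes."""
    return text.encode("utf-8")


def synthesize_all(requests, workers=4):
    """Convert several texts concurrently, keeping the request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(synthesize, requests))


if __name__ == "__main__":
    audio = synthesize_all(["Buna ziua", "La revedere"])
    print([len(chunk) for chunk in audio])
```

`ProcessPoolExecutor` from the same module offers the same `map` interface, which mirrors the multi-process variant mentioned above.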



The proposed service: PoC-Chat (1)

Paradigm shift towards multi-session convergent experience
IMS network exposes services
Our proposal - a service mix:
  Text (chat)
  Voice
  Presence
  TTS conversion
PoC: Push to talk over cellular




Users A and B are chatting
A changes state; the network reacts by switching on TTS conversion
A hears the chat conversation

Figure: PoC-Chat scenario in two panels, each with TTS Engine, Instant Messaging and Presence Information. Panel 1 labels: TEXT, TEXT, A, B. Panel 2 labels: VOICE, TEXT, A, B, PRESENCE.
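The scenario above can be sketched as a small routing rule: messages go out as text until the recipient's presence state changes, after which they pass through TTS conversion. The class `ChatRouter`, the state name `"hands_free"` and the `fake_tts` function are all hypothetical; the slides do not name the presence states or the service logic.

```python
def fake_tts(text):
    """Placeholder for the TTS engine; returns a tagged string, not audio."""
    return f"<audio for: {text}>"


class ChatRouter:
    """Chooses TEXT or VOICE delivery based on the recipient's presence."""

    # Hypothetical presence state in which the user cannot read the screen.
    VOICE_STATE = "hands_free"

    def __init__(self, tts):
        self.tts = tts
        self.presence = {}

    def set_presence(self, user, state):
        self.presence[user] = state

    def deliver(self, recipient, text):
        if self.presence.get(recipient) == self.VOICE_STATE:
            return ("VOICE", self.tts(text))
        return ("TEXT", text)


if __name__ == "__main__":
    router = ChatRouter(fake_tts)
    print(router.deliver("A", "Salut"))   # A still reads the chat as text
    router.set_presence("A", "hands_free")
    print(router.deliver("A", "Salut"))   # A now hears the message
    print(router.deliver("B", "Salut"))   # B is unaffected
```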




IMS test network using existing functional nodes: CSCF, HSS, AS
ACE framework for MRCP server:
  TTS engine
  RTP stack
SIP servlet technology for MRCP stack
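To make the role of the RTP stack concrete, the sketch below packs the fixed 12-byte RTP header from RFC 3550 in front of one audio frame. It is not the ACE-based implementation mentioned above. The payload type (8, PCMA) and the 160-byte frame size are assumptions, since the slides do not state the codec.

```python
import struct


def rtp_packet(seq, timestamp, ssrc, payload, payload_type=8, marker=False):
    """Prefix a payload with a minimal RTP header (no CSRC, no extension)."""
    byte0 = 2 << 6  # version 2, padding 0, extension 0, CSRC count 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


if __name__ == "__main__":
    # Assumed framing: 20 ms of 8 kHz audio, one byte per sample.
    frame = bytes(160)
    packet = rtp_packet(seq=1, timestamp=160, ssrc=0x1234, payload=frame)
    print(len(packet), packet[:12].hex())
```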



Conclusions

To emphasize the importance of TTS, we add it to the larger set of existing telecom services
We presented a convergent service example combining TTS, IM and presence
IMS overlay network will allow mixing the needed capabilities on a single session
Service realization is based on a modified TTS engine, new MRCP server and client developments
