10 surmei sped2009 pres


  • 7/23/2019 10 Surmei SpeD2009 Pres


    SpeD 2009

    June 18-21, 2009, Constanta, ROMANIA

    Real-Time Architecture For A Network-Based Text-To-Speech Service Implementation

    Mihai Surmei *, Dragos Burileanu **, Cristian Negrescu **,

    Catalin Ungurean **, Aurelian Dervis **

    * ERICSSON Telecommunications Romania S.R.L.

    ** Faculty of Electronics, Telecommunications and IT, University Politehnica of Bucharest, ROMANIA


    OUTLINE

    Goals

    Multi-service network architectures

    TTS network services

    A real-time processing environment for speech synthesis in Romanian

    TTS in IMS context - reference architecture

    TTS engine

    The proposed service: PoC-Chat

    Conclusions


    Goals

    Build up a reference network-based media processing environment for Romanian TTS

    A particular service on the proposed reference environment, combining several network capabilities


    Multi-service network architecture

    From peer-to-peer voice and shoot-and-forget messaging to session-oriented real-time multi-service communication

    Mobile networks

    Fixed networks

    Nomadic networks

    Internet

    The common factor: IMS


    TTS network services

    Existing TTS services

        Not real-time

        Proprietary client-server approach

    Next generation services

        Real-time

        Service mix

        Open architecture and protocols


    A real-time processing environment for speech synthesis in Romanian

    Leveraging open protocols (MRCP)

    Following the latest developments in the telecom field

    Modular design

    Expandable to close the loop on speech recognition

    [Figure: Generic real-time TTS-based telecom service. Labels: Supporting network, TEXT, SPEECH]


    TTS in IMS context - reference architecture

    IMS overlay network: supplementary control

    Access agnostic: GPRS/HSPA, ADSL, WiFi

    TTS closely related to MRFC/MRFP pair due to the hybrid functionality:

        Signaling

        Payload

    MRCP protocol

    [Figure: reference architecture. Labels: Any 3G mobile network, GGSN, SBC, IMS overlay, CSCF, HSS, AS, MRFC, MRFP, MRCP Client, MRCPv2, MRCP Server, TTS Engine, SIP session 1, SIP session 2, RTP]
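As a rough illustration of the signaling side, the sketch below assembles the kind of MRCPv2 SPEAK request an MRCP client could send to the MRCP server in front of the TTS engine. It is not taken from the presentation: the function name `build_speak_request`, the channel identifier and the sample text are assumptions made for the example, and only the general MRCPv2 message layout (start line with total message length, `Channel-Identifier`, `Content-Type`, `Content-Length`) is relied on.

```python
CRLF = "\r\n"


def build_speak_request(channel_id: str, request_id: int, text: str) -> bytes:
    """Build a plain-text MRCPv2 SPEAK request (illustrative helper, not from the slides)."""
    body = text.encode("utf-8")
    headers = (
        f"Channel-Identifier: {channel_id}{CRLF}"
        f"Content-Type: text/plain{CRLF}"
        f"Content-Length: {len(body)}{CRLF}"
        f"{CRLF}"
    ).encode("ascii")

    # The message-length field counts the whole message, start line included,
    # so it depends on its own number of digits. Iterate until it is stable.
    length = 0
    while True:
        start_line = f"MRCP/2.0 {length} SPEAK {request_id}{CRLF}".encode("ascii")
        total = len(start_line) + len(headers) + len(body)
        if total == length:
            break
        length = total

    return start_line + headers + body


if __name__ == "__main__":
    # The channel identifier below is a made-up example value.
    request = build_speak_request("32AECB23433802@speechsynth", 1, "Bună ziua")
    print(request.decode("utf-8"))
```

In a deployment such as the one on this slide, the request would travel on the MRCPv2 control channel negotiated through SIP, while the synthesized audio returns as RTP; neither of those legs is modelled here.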


    TTS engine

    [Figure: block diagram of the TTS engine. Labels: Input text; Text with diacritics? (YES / NO); Automatic diacritic restoration; Text analysis (text normalization, morphologic, syntactic and contextual analysis); Linguistic resources (dictionaries); Letter-to-phone conversion; Exception dictionary; Conversion rules (decision tree); Prosody estimation; Prosody generation; Prosodic model (intonation, duration); Phonetic transcription; contextual, phonetic and prosodic information; Two-stage acoustic segment selection algorithm; Database (HNM parameters; auxiliary information at segment level); Synthesis algorithm (HNM); Speech signal generation; Speech]

    The latest version of our concatenative TTS system is based on non-uniform acoustic units (diphones and polyphones). The synthesis technique makes use of the Harmonic plus Noise Model of speech.

    The TTS system implementation has been enhanced in order to be integrated in a multi-thread/multi-process environment.
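The control flow of the engine, including the "Text with diacritics?" branch and its use from several threads, can be sketched as follows. This is only an outline under stated assumptions: the stage order is read off the block diagram, the diacritic check is a crude character test, `restore_diacritics` is a hypothetical placeholder that leaves the text unchanged, and no acoustic processing is performed. All function names are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Romanian letters with diacritics (comma-below forms and legacy cedilla forms).
ROMANIAN_DIACRITICS = set("ăâîșțĂÂÎȘȚşţŞŢ")

# Stage names taken from the block diagram; their exact order is an assumption.
STAGES = [
    "text analysis",
    "letter-to-phone conversion",
    "prosody generation",
    "acoustic segment selection",
    "speech signal generation (HNM)",
]


def has_diacritics(text: str) -> bool:
    """Crude stand-in for the 'Text with diacritics?' decision."""
    return any(ch in ROMANIAN_DIACRITICS for ch in text)


def restore_diacritics(text: str) -> str:
    """Hypothetical placeholder: a real system would restore diacritics here."""
    return text


def run_pipeline(text: str) -> Tuple[str, List[str]]:
    """Return the text and the ordered list of stages that would be applied."""
    applied: List[str] = []
    if not has_diacritics(text):
        text = restore_diacritics(text)
        applied.append("automatic diacritic restoration")
    applied.extend(STAGES)
    return text, applied


def synthesize_many(texts: List[str], workers: int = 4) -> List[Tuple[str, List[str]]]:
    """Process several requests concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_pipeline, texts))


if __name__ == "__main__":
    for text, stages in synthesize_many(["Bună ziua", "Buna ziua"]):
        print(text, "->", stages)
```

Because `run_pipeline` keeps no shared state, each request can run in its own thread, which is the property a multi-thread/multi-process integration needs from the engine.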


    The proposed service: PoC-Chat (1)

    Paradigm shift towards multi-session convergent experience

    IMS network exposes services

    Our proposal - a service mix:

        Text (chat)

        Voice

        Presence

        TTS conversion

    PoC: Push to talk over cellular


    The proposed service: PoC-Chat (2)

    A and B users are chatting

    A changes state, network reacts by switching on TTS conversion

    A hears the chat conversation

    [Figure: two panels, each showing users A and B with Instant Messaging, Presence Information and TTS Engine. First panel labels: TEXT, TEXT. Second panel labels: PRESENCE, VOICE, TEXT]
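The scenario on this slide amounts to a delivery decision driven by presence: while a user can read, chat messages arrive as text; once the user's state changes, the same messages pass through TTS and arrive as voice. A minimal sketch of that decision follows. The presence states, the `ChatRouter` class and the `fake_tts` stand-in are all hypothetical names chosen for illustration; the slides do not specify which presence states trigger the switch.

```python
from enum import Enum
from typing import Callable, Dict, Tuple, Union


class Presence(Enum):
    READING = "reading"      # hypothetical state: user can look at the screen
    LISTENING = "listening"  # hypothetical state: user cannot read, e.g. driving


class Delivery(Enum):
    TEXT = "text"
    VOICE = "voice"


class ChatRouter:
    """Chooses text or synthesized voice per recipient, based on presence."""

    def __init__(self, tts: Callable[[str], bytes]) -> None:
        self._tts = tts
        self._presence: Dict[str, Presence] = {}

    def set_presence(self, user: str, state: Presence) -> None:
        self._presence[user] = state

    def deliver(self, recipient: str, text: str) -> Tuple[Delivery, Union[str, bytes]]:
        # Users with no published presence are assumed to be reading.
        state = self._presence.get(recipient, Presence.READING)
        if state is Presence.LISTENING:
            return Delivery.VOICE, self._tts(text)
        return Delivery.TEXT, text


def fake_tts(text: str) -> bytes:
    """Stand-in for the TTS engine; returns tagged bytes instead of audio."""
    return b"AUDIO:" + text.encode("utf-8")


if __name__ == "__main__":
    router = ChatRouter(fake_tts)
    print(router.deliver("A", "salut"))          # text while A is reading
    router.set_presence("A", Presence.LISTENING)
    print(router.deliver("A", "salut"))          # voice after A changes state
    print(router.deliver("B", "salut"))          # B still receives text
```

In the architecture described earlier, the presence update and the chat messages would be carried by IMS services and the voice leg by RTP; the sketch only captures the switching logic.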


    The proposed service: PoC-Chat (3)

    IMS test network using existing functional nodes: CSCF, HSS, AS

    ACE framework for MRCP server:

        TTS engine

        RTP stack

    SIP servlet technology for MRCP stack


    Conclusions

    To emphasize the TTS importance, we add it into the larger set of existing telecom services

    We presented a convergent service example combining TTS, IM and presence

    IMS overlay network will allow mixing the needed capabilities on a single session

    Service realization is based on a modified TTS engine, new MRCP server and client developments