Speech to Speech Real Time Translations, Aigars Mačiņš, Skype


Post on 15-Apr-2017


TRANSCRIPT

Skype Translator: Goal
• To support open-domain conversations between Skype users in different parts of the world, speaking different languages


Speech to Speech real time translations
Aigars Mačiņš
Technical Program Manager
Skype Localization Team
Microsoft

TAUS Roundtable 2016, 1 June, Riga

Skype Translator: What is it?
• Translate voice calls and video calls in 7 languages and instant messages in over 50!

Skype Translator process
Components: Automatic Speech Recognition (ASR), Microsoft Translator, Skype Infrastructure, Skype Translator Client app

The Challenges
• The gulf between speech and text
  • It’s not enough to just chain a really good ASR system with a really good MT system
  • How people talk to each other is not how they write
• Building really good conversational ASR and MT systems
  • Significant changes in the data we use to train the ASR and MT systems
• The gap between technology demo and consumer product
  • Producing models with shippable latency
  • Interesting problems one encounters with real consumers

How people really speak
What person thought they said:
Yeah. I guess it was worth it.
Ja. Ich denke, es hat sich gelohnt.
はい。私はそれの価値があったと思います。

What they actually said:
Yeah, but um, but it was you know, it was, I guess, it was worth it.
Ja, aber ähm, aber es war, weißt du, es war, ich denke, es hat sich gelohnt.
はい、ええと、あなたが知っている、だったが、推測すると、それはそれの価値があったけど。

Disfluency removal
More than just removing “um” and “ah”

Disfluencies in Conversational Speech
um no i mean yes but you know i am i've never done it myself have you done that uh yes

Disfluency types:
• Pause Fillers
• Discourse Markers
• Repetition
• Corrections (“speech repairs”)

Yes.
But, I’ve never done it myself.
Have you done that?
Yes?


Need to:
1. Segment
2. Remove disfluencies
3. Punctuate
4. Add case
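The steps above can be pictured with a small rule-based sketch. This is only an illustration, not the method used in Skype Translator: the word lists and the function names `remove_disfluencies` and `punctuate_and_case` are assumptions made for the example, and segmentation is skipped by treating the input as a single segment.

```python
import re

# Illustrative word lists only (assumptions); a production system would
# learn what counts as a disfluency from transcribed conversational data.
FILLERS = {"um", "uh", "ah", "er"}
DISCOURSE_MARKERS = ["you know", "i mean"]


def remove_disfluencies(utterance: str) -> str:
    """Strip pause fillers, discourse markers and immediate word repetitions."""
    text = utterance.lower()
    for marker in DISCOURSE_MARKERS:
        text = re.sub(rf"\b{re.escape(marker)}\b", " ", text)
    words = [w for w in text.split() if w not in FILLERS]
    deduped = []
    for word in words:
        # Drop a word that merely repeats the previous one ("i i think").
        if not deduped or deduped[-1] != word:
            deduped.append(word)
    return " ".join(deduped)


def punctuate_and_case(segment: str) -> str:
    """Capitalise the first letter and close the segment with a full stop."""
    if not segment:
        return segment
    return segment[0].upper() + segment[1:] + "."


cleaned = remove_disfluencies("um i i think you know it was worth it uh")
print(punctuate_and_case(cleaned))
```

Run on the slide's own utterance, this sketch strips "um" and "you know" but leaves the speech repairs ("no ... yes", "i am i've") in place, which is why disfluency removal is more than filtering out filler words.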

Missing punc → Catastrophic Effects

Questions
¿vas ahora? → are you going now?
vas ahora → go now

Negation
no es mi segundo → it is not my second
no. es mi segundo → no. it’s my second

Seriously embarrassing
tienes una hija ¿no? es muy preciosa → you have a daughter right? is very beautiful
tienes una hija no es muy preciosa → you have a daughter is not very beautiful

Accents/Wrong chars → Changes in meaning

Accented words (sound-alikes)
• Written with different forms → different meanings
• But pronounced the same

Si los vinos mendocinos son muy famosos → If the wines from Mendoza are very famous
Sí los vinos mendocinos son muy famosos → Yes the wines from Mendoza are very famous

Misrecognized words/characters (sound-alikes)
你经常在没有听完的时候就睡着了吗 → Do you often fall asleep without listening to it?
你经常在没有听完的时候就睡着了嘛 → You often fall asleep without listening to it.
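The "Si" / "Sí" pair can be made concrete with a short sketch. It uses Python's standard `unicodedata` module to show that the two Spanish sentences become identical once accents are lost, so any stage that drops or fails to produce the accent erases the difference in meaning. The helper name `strip_accents` is an assumption for illustration and is not part of any Skype component.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining accent marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


conditional = "Si los vinos mendocinos son muy famosos"  # "If the wines ..."
affirmative = "Sí los vinos mendocinos son muy famosos"  # "Yes the wines ..."

# The written forms differ, but without the accent they are the same string.
print(conditional == affirmative)
print(strip_accents(conditional) == strip_accents(affirmative))
```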

How people say things
Here’s what we need to recognize and translate
• He ain't my choice. But, hey, we hated the last guy.
• We're going to hit it and quit it.
• Boy, that story gets better every time you hear it.
• I swear to God I am done with guys like that.

Unfortunately a lot of our MT training data looks like this
• Mr President, Commissioner, Mr Sacconi, ladies and gentlemen, as the PPE-DE's coordinator for regional policy, I want to stress that some very important points are made in this resolution.
• I am therefore calling for integrated policies, all-encompassing policies that we can adapt to society, which must listen to our recommendations and comply with them.

Data mismatch & scarcity

Training data mismatch
• MT training is clearly mismatched
• ASR training data is a mixed bag

Data scarcity
• Traditional data sources (govt, news, web) not well matched
• Not a lot of parallel conversational data (for MT)
• Not a lot of transcribed conversational data (for ASR)

ASR: word errors, missing vocab

ASR vocab issues – e.g. names
Hi Arul → Hi Aaron
I went skiing at Snoqualmie pass → I went skiing at snow call me pass

ASR errors
How do we minimize the impact of misrecognized words?
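One way to picture personalization for names is to compare short word spans in the recognized text against a user's contact list. The sketch below does this after the fact with `difflib.SequenceMatcher` from the Python standard library. It is an assumption-laden illustration, not how Skype Translator works: the contact list, the `restore_names` function, the three-word span limit and the 0.75 cutoff are all hypothetical, and a real system would more plausibly bias the recognizer itself than patch its output.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def restore_names(transcript, contacts, max_words=3, cutoff=0.75):
    """Replace word spans that closely resemble a contact name (hypothetical)."""
    words = transcript.split()
    output = []
    i = 0
    while i < len(words):
        best = None  # (score, span length in words, contact name)
        for n in range(1, max_words + 1):
            if i + n > len(words):
                break
            # Join the span without spaces so "go deep" is compared as "godeep".
            candidate = "".join(words[i:i + n]).lower()
            for name in contacts:
                score = similarity(candidate, name.lower().replace(" ", ""))
                if score >= cutoff and (best is None or score > best[0]):
                    best = (score, n, name)
        if best is not None:
            output.append(best[2])
            i += best[1]
        else:
            output.append(words[i])
            i += 1
    return " ".join(output)


print(restore_names("I will ask go deep to help me", ["Gurdeep", "Arul"]))
```

String similarity alone is a blunt tool: a short name such as "Arul" is too far from "Aaron" at this cutoff to be recovered, and lowering the cutoff starts to rewrite ordinary words.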

TrueText: Speech Correction

ASR → Speech Correction → Translation → Text to Speech

um no i mean yes but i am i've never done it myself did users before uh I will ask go deep to help me

um no i mean yes but i am i've never done it myself did users before uh I will ask gurdeep to help me

um no i mean yes but i am i've never done it myself did you use yours before uh I will ask gurdeep to help me

no i mean yes but I am i've never done it myself did you use yours before uh I will ask gurdeep to help me

Yes.
But I’ve never done it myself.
Did you use yours before?
I will ask Gurdeep to help me.

Raw ASR Output

Customization and Personalization

Lattice Rescoring

Disfluency Removal

Segmentation, Punctuation and True Casing

Oui.
Mais je ne l'ai jamais fait moi-même.
Avez-vous utilisé le vôtre avant ?
Gurdeep va demander de l'aide.

Euh non je veux dire oui mais je suis je l'ai jamais fait moi-même fait utilisateurs avant euh je vais demander à aller profond pour m'aider

Translator

Microsoft Confidential


Client app Translated text

WEB API


(Speech category)

Partial Transcripts, Final Transcripts

Partial Translations, Final Translations
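The difference between partial and final results can be sketched from the client's side: a partial result replaces whatever is currently shown for the utterance in progress, and a final result commits it. The message shape (`{"type": ..., "text": ...}`) and the `CaptionBuffer` class are hypothetical, chosen only for illustration; they are not the format of the Translator web API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CaptionBuffer:
    """Holds committed lines plus the still-changing partial line."""
    committed: List[str] = field(default_factory=list)
    pending: str = ""

    def handle(self, message: dict) -> None:
        # Hypothetical message shape: {"type": "partial" | "final", "text": str}
        if message["type"] == "partial":
            # A partial result overwrites the previous guess for this utterance.
            self.pending = message["text"]
        elif message["type"] == "final":
            # A final result is committed and the pending line is cleared.
            self.committed.append(message["text"])
            self.pending = ""
        else:
            raise ValueError(f"unknown message type: {message['type']!r}")

    def render(self) -> str:
        lines = list(self.committed)
        if self.pending:
            lines.append(self.pending + " ...")
        return "\n".join(lines)


captions = CaptionBuffer()
for msg in [
    {"type": "partial", "text": "Did you"},
    {"type": "partial", "text": "Did you use yours"},
    {"type": "final", "text": "Did you use yours before?"},
]:
    captions.handle(msg)
print(captions.render())
```

Showing partial results keeps perceived latency low, at the cost of text that may change until the final result arrives.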


Personal Translation Communication

Presentations Gatherings

Business Intelligence AI Interactions