Speech to Speech Real Time Translations, Aigars Mačiņš, Skype
Skype Translator: Goal
• To support open-domain conversations between Skype users in different parts of the world, speaking different languages
Speech to Speech real time translations
Aigars Mačiņš
Technical Program Manager
Skype Localization Team
Microsoft
TAUS Roundtable 2016, 1 June, Riga
Skype Translator: What is it?
• Translate voice calls and video calls in 7 languages and instant messages in over 50!
Skype Translator process
Components: Automatic Speech Recognition (ASR), Microsoft Translator, Skype Infrastructure, Skype Translator Client app
The Challenges
• The gulf between speech and text
  • It’s not enough to just chain a really good ASR system with a really good MT system
  • How people talk to each other is not how they write
• Building really good conversational ASR and MT systems
  • Significant changes in the data we use to train the ASR and MT systems
• The gap between technology demo and consumer product
  • Producing models with shippable latency
  • Interesting problems one encounters with real consumers
How people really speak
What the person thought they said:
• Yeah. I guess it was worth it.
• Ja. Ich denke, es hat sich gelohnt.
• はい。私はそれの価値があったと思います。
What they actually said:
• Yeah, but um, but it was you know, it was, I guess, it was worth it.
• Ja, aber ähm, aber es war, weißt du, es war, ich denke, es hat sich gelohnt.
• はい、ええと、あなたが知っている、だったが、推測すると、それはそれの価値があったけど。
Disfluency removal
More than just removing “um” and “ah”
Disfluencies in Conversational Speech
Raw transcript:
um no i mean yes but you know i am i've never done it myself have you done that uh yes
Disfluency types:
• Pause Fillers
• Discourse Markers
• Repetition
• Corrections (“speech repairs”)
After cleanup:
Yes.
But, I’ve never done it myself.
Have you done that?
Yes?
Need to:
1. Segment
2. Remove disfluencies
3. Punctuate
4. Add case
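The last three steps can be illustrated with a toy, rule-based sketch. This is not how Skype Translator does it; the word lists and function names below are assumptions made up for the example. Segmentation is skipped by assuming the input is already a single segment, and corrections ("speech repairs" such as "no i mean yes") are not handled.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
PAUSE_FILLERS = {"um", "uh", "ah"}
DISCOURSE_MARKERS = ["you know", "i mean"]


def remove_disfluencies(segment: str) -> str:
    """Drop discourse markers, pause fillers and immediate word repetitions."""
    for marker in DISCOURSE_MARKERS:
        segment = re.sub(rf"\b{re.escape(marker)}\b", " ", segment,
                         flags=re.IGNORECASE)
    words = [w for w in segment.split() if w.lower() not in PAUSE_FILLERS]
    cleaned = []
    for word in words:
        # Keep a word only if it does not repeat the previous kept word.
        if not cleaned or cleaned[-1].lower() != word.lower():
            cleaned.append(word)
    return " ".join(cleaned)


def punctuate_and_case(segment: str) -> str:
    """Capitalize the pronoun "i" and the first letter, then add a full stop."""
    words = []
    for w in segment.split():
        if w.lower() == "i" or w.lower().startswith("i'"):
            w = "I" + w[1:]
        words.append(w)
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."


raw = "um but you know i've never never done it myself uh"
print(punctuate_and_case(remove_disfluencies(raw)))
```

A real system has to decide where segments end and whether a segment is a question, which fixed rules like these cannot do.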
Missing punctuation: catastrophic effects
Questions
• ¿vas ahora? → are you going now?
• vas ahora → go now
Negation
• no es mi segundo → it is not my second
• no. es mi segundo → no. it’s my second
Seriously embarrassing
• tienes una hija ¿no? es muy preciosa → you have a daughter right? is very beautiful
• tienes una hija no es muy preciosa → you have a daughter is not very beautiful
Accents/wrong characters: changes in meaning
Accented words (sound-alikes)
• Written with different forms, with different meanings
• But pronounced the same
• Si los vinos mendocinos son muy famosos → If the wines from Mendoza are very famous
• Sí los vinos mendocinos son muy famosos → Yes the wines from Mendoza are very famous
Misrecognized words/characters (sound-alikes)
• 你经常在没有听完的时候就睡着了吗 → Do you often fall asleep without listening to it?
• 你经常在没有听完的时候就睡着了嘛 → You often fall asleep without listening to it.
How people say things
Here’s what we need to recognize and translate
• He ain't my choice. But, hey, we hated the last guy.
• We're going to hit it and quit it.
• Boy, that story gets better every time you hear it.
• I swear to God I am done with guys like that.
Unfortunately a lot of our MT training data looks like this
• Mr President, Commissioner, Mr Sacconi, ladies and gentlemen, as the PPE-DE's coordinator for regional policy, I want to stress that some very important points are made in this resolution.
• I am therefore calling for integrated policies, all-encompassing policies that we can adapt to society, which must listen to our recommendations and comply with them.
Data mismatch & scarcity
Training data mismatch
• MT training is clearly mismatched
• ASR training data is a mixed bag
Data scarcity
• Traditional data sources (govt, news, web) not well matched
• Not a lot of parallel conversational data (for MT)
• Not a lot of transcribed conversational data (for ASR)
ASR: word errors, missing vocab
ASR vocab issues – e.g. names (spoken → recognized)
• Hi Arul → Hi Aaron
• I went skiing at Snoqualmie pass → I went skiing at snow call me pass
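One common way to quantify errors like these is word error rate (WER): the word-level edit distance between what was said and what was recognized, divided by the number of words actually said. The slides do not mention WER or how Skype measures errors, so the function below is only an illustrative sketch with a made-up name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("I went skiing at Snoqualmie pass",
                      "I went skiing at snow call me pass"))  # 0.5
```

Here one substitution ("Snoqualmie" to "snow") plus two insertions ("call", "me") against six reference words gives 3/6 = 0.5, even though only one word was actually misheard.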
ASR errors
How do we minimize the impact of misrecognized words?
TrueText: Speech Correction
Pipeline: ASR → Speech Correction → Translation → Text to Speech
Speech Correction stages: Raw ASR Output → Customization and Personalization → Lattice Rescoring → Disfluency Removal → Segmentation, Punctuation and True Casing
Example transcripts shown on the slide:
• um no i mean yes but i am i've never done it myself did users before uh I will ask go deep to help me
• um no i mean yes but i am i've never done it myself did users before uh I will ask gurdeep to help me
• um no i mean yes but i am i've never done it myself did you use yours before uh I will ask gurdeep to help me
• no i mean yes but I am i've never done it myself did you use yours before uh I will ask gurdeep to help me
• Yes. But I’ve never done it myself. Did you use yours before? I will ask Gurdeep to help me.
French translation of the raw ASR output:
Euh non je veux dire oui mais je suis je l'ai jamais fait moi-même fait utilisateurs avant euh je vais demander à aller profond pour m'aider
French translation of the corrected text:
Oui. Mais je ne l'ai jamais fait moi-même. Avez-vous utilisé le vôtre avant ? Gurdeep va demander de l'aide.
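The idea of running text through an ordered chain of correction stages can be sketched as plain function composition. Everything here is hypothetical: the stage functions, the vocabulary tables and `run_pipeline` are stand-ins for illustration, not the TrueText implementation. Lattice rescoring is left out because it works on recognizer hypotheses rather than on a single string.

```python
from functools import reduce
from typing import Callable, Dict, List

Stage = Callable[[str], str]

# Hypothetical tables, invented for this example.
USER_VOCAB: Dict[str, str] = {"go deep": "gurdeep"}    # misrecognized -> intended
PROPER_NOUNS: Dict[str, str] = {"gurdeep": "Gurdeep"}  # lowercase -> true case
FILLERS = {"um", "uh"}


def customize(text: str) -> str:
    """Replace known misrecognitions (naive substring replacement)."""
    for wrong, right in USER_VOCAB.items():
        text = text.replace(wrong, right)
    return text


def drop_fillers(text: str) -> str:
    """Remove pause fillers."""
    return " ".join(w for w in text.split() if w.lower() not in FILLERS)


def true_case(text: str) -> str:
    """Restore case for known names, capitalize the start, add a full stop."""
    words = [PROPER_NOUNS.get(w.lower(), w) for w in text.split()]
    sentence = " ".join(words)
    return sentence[:1].upper() + sentence[1:] + "."


def run_pipeline(text: str, stages: List[Stage]) -> str:
    """Feed the output of each stage into the next, in order."""
    return reduce(lambda acc, stage: stage(acc), stages, text)


stages = [customize, drop_fillers, true_case]
print(run_pipeline("uh I will ask go deep to help me", stages))
# I will ask Gurdeep to help me.
```

Keeping each stage as a separate text-to-text function means stages can be added, swapped or reordered independently of the others.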
Microsoft Confidential
(Speech category)
Partial Transcripts
Final Transcripts
Partial Translations
Final Translations
• Personal Translation
• Communication
• Presentations
• Gatherings
• Business Intelligence
• AI Interactions