m&l webinar "video in a multilingual context"

Video in a multilingual context Jorge Civera [email protected] Thursday 11th June, 2015

Presentation

• Machine Learning and Language Processing (MLLP) group (mllp.upv.es)

• Automatic Speech Recognition:

– Already supported: English (En), Spanish (Es), Italian (It), Dutch (Nl), Estonian (Et), Portuguese (Pt), French (Fr) and Catalan (Ca)

– In progress: German (De) and Slovene (Sl)

• Machine Translation:

– Language pairs available: En → {Es, It, Fr, Ca} and {Es, It, Nl, Et, Pt, Fr, Ca} → En

• Speech Synthesis:

– Already supported: English (En) and Spanish (Es)

• Experience on EU projects with videos in a multilingual context: transLectures and EMMA

J. Civera - Video in a multilingual context 2 / 16

Introduction

• transLectures (Nov 2011 - Oct 2014)

– Lowering the language barrier to accessing video repositories by providing multilingual subtitles

– Improving subtitles by massive adaptation and intelligent interaction

– VideoLectures.NET (VL) and poliMedia (pM) video repositories with thousands of hours

– Source languages: English and Slovene in VL and Spanish in pM

– Target languages: Spanish, French, German, Slovene and English

• EMMA (Feb 2014 - Jul 2016)

– Providing multilingual access to videos (and documents) in MOOC courses

– Few hours of video in 7 languages: En, Es, It, Nl, Et, Pt and Fr

– Source language is the national language of the MOOC provider

– Target languages: English, Spanish and Italian

Cost of multilingual access to video

• Ideally, videos should be automatically dubbed into the desired language

• Alternatively, subtitles in multiple languages should be available

• Generating subtitles for a video:

1. Transcription of speech (10 RTF)

2. Translation of transcription (30 RTF)

3. Dubbing translation (Optional)

• 1 hour of video takes 10 hours to transcribe plus 30 hours to translate

• The cost of providing multilingual access to video is not negligible

• Solutions to lower costs:

– Translate only most popular videos to achieve higher impact

– Crowdsourcing (TED talks)

– Speech Recognition and Machine Translation to generate draft subtitles
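
The cost figures above can be turned into a small calculation. The following is a minimal sketch using the two RTF values quoted on this slide. The function name and the `n_target_languages` parameter are illustrative, and the sketch assumes that transcription is done once while translation is repeated for every target language.

```python
# Manual effort per hour of video, using the RTF figures quoted on the slide.
TRANSCRIPTION_RTF = 10  # hours of work per hour of video
TRANSLATION_RTF = 30    # hours of work per hour of video, per target language


def manual_subtitling_hours(video_hours, n_target_languages=1):
    """Estimate the hours of manual work needed to subtitle a video.

    Assumption (not stated on the slide): the transcription is produced
    once and then translated separately into each target language.
    """
    transcription = TRANSCRIPTION_RTF * video_hours
    translation = TRANSLATION_RTF * video_hours * n_target_languages
    return transcription + translation


print(manual_subtitling_hours(1))     # 40
print(manual_subtitling_hours(1, 3))  # 100
```

With a single target language, one hour of video costs 40 hours of manual work, which is the 40 RTF figure that reappears in the conclusions.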

Overview of automatic video subtitling

• State-of-the-art technology cannot provide perfect automatic subtitles

• However, it significantly reduces the effort to generate multilingual subtitles

• Step-by-step process:

1. Generation of automatic transcriptions from video

2. Manual review of automatic transcriptions to correct transcription errors

3. Generation of automatic translations from manually reviewed transcription

4. Manual review of automatic translations to generate final subtitles
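
The four steps above can be sketched as a small pipeline. This is only an illustration of the order of operations, not the TTP implementation: the function `subtitle_video` and its four callable parameters are hypothetical placeholders for a speech recogniser, a translation system and the two human review stages.

```python
def subtitle_video(video, target_languages, recognise, review_transcript,
                   translate, review_translation):
    """Run the four-step subtitling process for one video.

    All four callables are hypothetical stand-ins supplied by the caller.
    """
    draft_transcript = recognise(video)                # step 1: automatic
    transcript = review_transcript(draft_transcript)   # step 2: manual review

    subtitles = {}
    for language in target_languages:
        # Step 3 starts from the reviewed transcript, so transcription
        # errors are fixed once instead of being carried into every language.
        draft_translation = translate(transcript, language)
        # Step 4: manual review of each automatic translation.
        subtitles[language] = review_translation(draft_translation, language)
    return transcript, subtitles


# Toy stand-ins, only to show the data flow.
transcript, subtitles = subtitle_video(
    video="lecture.mp4",
    target_languages=["es", "it"],
    recognise=lambda video: "helo and welcome",
    review_transcript=lambda text: text.replace("helo", "hello"),
    translate=lambda text, language: f"[{language}] {text}",
    review_translation=lambda text, language: text,
)
print(transcript)
print(subtitles)
```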

Reviewing automatic transcriptions

• Speech Recognition technology is in a more mature stage than Machine Translation

• Quality of automatic transcription can be impressive, but it greatly depends on:

– Sound quality of video

– Availability of similar videos manually transcribed

– Availability of text resources related to the video in question

– Complexity of language involved (phonetics and grammar)

• Adaptation of speech technology to the specific videos is a key aspect for high accuracy

Evaluating transcription generation

• Review of automatic transcriptions is evaluated from two viewpoints:

– Transcription accuracy

– Time spent to review automatic transcriptions measured as Real Time Factor (RTF)

Language     Accuracy    RTF (manual: 10)
Spanish      Excellent   3
Estonian     Good        3
Portuguese   Average     5
Italian      Good        5
English      Good        6
Catalan      Good        6
Dutch        Good        6
French       Good        7
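
To read the RTF column as a saving, each review RTF can be compared with the manual transcription baseline of 10 RTF. The sketch below does this with the figures from the table above. The helper `time_saved` is illustrative and assumes the baseline is the 10 RTF quoted earlier for manual transcription.

```python
MANUAL_TRANSCRIPTION_RTF = 10  # manual baseline quoted earlier in the slides

# Review RTF per language, copied from the table above.
REVIEW_RTF = {
    "Spanish": 3, "Estonian": 3, "Portuguese": 5, "Italian": 5,
    "English": 6, "Catalan": 6, "Dutch": 6, "French": 7,
}


def time_saved(review_rtf, manual_rtf=MANUAL_TRANSCRIPTION_RTF):
    """Fraction of manual transcription time saved by reviewing a draft."""
    return 1 - review_rtf / manual_rtf


for language, rtf in REVIEW_RTF.items():
    print(f"{language:<11}{rtf:>2} RTF  {time_saved(rtf):.0%} saved")
```

Under this reading, reviewing an automatic Spanish transcription takes 3 hours per hour of video instead of 10, a 70% saving, while French at 7 RTF saves 30%.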

Reviewing automatic translations

• Machine Translation has improved in recent years, but it is still far from perfect

• Quality of automatic translation depends on:

– Proximity between source and target languages

– Complexity of grammar structures used by the speaker

– How specific the vocabulary employed in the video is

– Availability of parallel texts in the same field as the video

Evaluating translation generation

• Review of translations is evaluated from two viewpoints:

– Translation accuracy

– Time spent to review automatic translations (in RTF)

Language pairs         Accuracy    RTF (manual: 30)
Portuguese → English   Good        6¹
Spanish → English      Good        7
Spanish → Catalan      Excellent   9
English → Italian      Good        10
Dutch → English        Good        13
Italian → English      Good        14
Estonian → English     Poor        16
English → Spanish      Average     17
French → English       Average     26

¹ Preliminary translations

TTP demo

• Transcription and Translation Platform (TTP)

– Available at http://ttp.mllp.upv.es

– Free registration to try our platform

• Video-related functionality available at TTP²:

– Uploading video (media)

– Automatic transcription and translation

– Reviewing transcription and translation

– Text-To-Speech demo

² Translation of HTML documents is also available.

Conclusions

• Multilingual access to video undoubtedly boosts content visibility

• The cost of manually generating subtitles in a second language is high (40 RTF)

• Automatic subtitling can reduce the time cost by up to 30-60%

• Accuracy of automatic subtitles depends on several factors:

– Languages involved

– Availability of annotated data resources related to the video in question

– Specificity of the video content

• Automatic subtitling reduces the effort to provide multilingual access to video, but subtitles still need to be reviewed for educational purposes.

Try TTP at ttp.mllp.upv.es

Thank you for your attention!

Evaluating generation of transcriptions

• Automatic transcription systems are evaluated from two viewpoints:

– Transcription error measured in terms of Word Error Rate (WER)

– Time spent to review automatic transcriptions measured as Real Time Factor (RTF)

Language     WER     RTF (manual: 10)
English      34.6    5.7
Spanish      13.5    2.9
Italian      28.9    5.3
Dutch        25.7    6.0
Portuguese   68.1    4.8
Estonian     29.5    3.1
French       23.8    7.2
Catalan      39.6    5.7
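
For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcription and the automatic one, divided by the number of reference words. The following is a minimal sketch of that standard definition, with whitespace tokenisation as a simplifying assumption. It is not the scoring tool used to produce the figures above, and the function name and example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word Error Rate in percent, via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing
            insertion = current[j - 1] + 1  # extra hypothesis word
            current.append(min(substitution, deletion, insertion))
        previous = current
    return 100.0 * previous[-1] / len(ref)


reference = "the cost of subtitling is high"
hypothesis = "the cost subtitling is very high"
print(f"{word_error_rate(reference, hypothesis):.1f}")  # 33.3
```

The example has one deleted word ("of") and one inserted word ("very") against six reference words, hence 2 / 6 = 33.3%.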

Evaluating generation of translations

• Machine translation systems are evaluated from two viewpoints:

– Translation quality is automatically measured as Translation Error Rate (TER)

– Time spent to review automatic translations from video transcriptions (in RTF)

Language pairs         TER     RTF (manual: 30)
Spanish → English      35.6    6.9
Italian → English      46.7    13.9
Dutch → English        48.1    12.5
Portuguese → English   38.2    5.8³
Estonian → English     86.8    15.5
French → English       77.9    25.9
English → Spanish      58.4    16.5
English → Italian      40.9    10.1
Spanish → Catalan      22.9    9.1

³ Preliminary translations

Comparative results in transcription

• Comparison with YouTube in terms of Word Error Rate

Language     MLLP WER   YouTube WER
Dutch        25.7       38.6
English      39.2       70.8
Italian      28.9       31.6
Portuguese   49.8       62.3
Spanish      14.4       34.3

Comparative results in translation

• Comparison with Google Translate in terms of BLEU

Language pairs         MLLP BLEU   Google BLEU
Dutch → English        41.6        33.4
English → Spanish      42.5        39.0
Italian → English      46.9        27.9
Portuguese → English   47.6        45.4
Spanish → English      28.2        27.6

³ Preliminary translation
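
BLEU, unlike WER and TER, is a quality score: higher is better. As a rough illustration of what the figures above measure, the sketch below computes corpus-level BLEU from clipped n-gram precisions (n = 1 to 4) and a brevity penalty. It assumes one reference per sentence, whitespace tokenisation and no smoothing, so it will not reproduce the scores of standard evaluation tools exactly. The function names and example sentences are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for reference, hypothesis in zip(references, hypotheses):
        ref_tokens, hyp_tokens = reference.split(), hypothesis.split()
        ref_len += len(ref_tokens)
        hyp_len += len(hyp_tokens)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(ref_tokens, n)
            hyp_counts = ngram_counts(hyp_tokens, n)
            # Clipping: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum(min(count, ref_counts[gram])
                                  for gram, count in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes hypotheses shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


score = bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(f"{score:.1f}")  # 53.7
```

In the example the n-gram precisions are 5/6, 3/5, 2/4 and 1/3, whose geometric mean is about 0.537, and the brevity penalty is 1 because both sentences have six words.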