Applying advanced language technologies for multilingual subtitling and text translation in education Alfons Juan Ciscar Machine Learning and Language Processing research group www.mllp.upv.es 28 October 2016


Page 1: Applying advanced language technologies for multilingual subtitling

Applying advanced language technologies for multilingual subtitling and text translation in education

Alfons Juan Ciscar
Machine Learning and Language Processing research group

www.mllp.upv.es

28 October 2016

Page 2: Applying advanced language technologies for multilingual subtitling

Contents

The MLLP research group: recent projects 3

EMMA 4

Transcription and translation of videos 5

Translation of course texts 6

Integration of the tools: Web Service + API 8

Demo: The MLLP Platform for Transcription and Translation 9

Conclusions and future work 10

MLLP - The MLLP Platform 2 / 11

Page 3: Applying advanced language technologies for multilingual subtitling

• Research group at the Univ. Politècnica de València (Spain)

• Recent projects

– transLectures (EU FP7): Transcription and translation of video lectures (2011–2014)
– EMMA (EU CIP): European Multiple MOOC Aggregator (2014–2016)
– MORE: Multilingual Open Resources for Education

• Language technologies at the MLLP

– Automatic Speech Recognition (automatic subtitling):
∗ EN, ES, CA, DE, FR, ET, IT, NL, PT, SL

– Machine Translation (for subtitles, documents and websites):
∗ {ES, CA, FR, ET, IT, NL, PT} → EN
∗ EN → {ES, CA, FR, IT, PT, SL}

– Text-to-Speech Synthesis:
∗ EN, ES, CA
∗ E.g., EN → {ES, CA} voice-to-voice translation


Page 4: Applying advanced language technologies for multilingual subtitling

EMMA (2014 – 2016)

European Multiple MOOC Aggregator (EMMA)

• Main goal: To provide multilingual access to European MOOCs

• Motivation:

– The language barrier keeps many learners from taking MOOCs
– EMMA uses transLectures tools to translate videos and texts

• Cost of manually translating MOOCs:

– Videos:
∗ Before translating, videos are manually transcribed (10 RTF, i.e., a real-time factor of 10 hours of work per hour of video)
∗ Then, transcriptions are translated (30 RTF)
∗ A course including 2 hours of video takes 0.5 PM (person-months)

– Texts:
∗ Manual translation rate is approximately 2,500 words per day
∗ A 6-week course with 75,000 words takes 1.5 PM

– Solutions to lower costs:
∗ Crowdsourcing (e.g., TED talks)
∗ ASR and MT: user effort is reduced to 30% (2 PM → 0.6 PM)
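
The person-month figures on this slide can be reproduced with a short calculation. The sketch below assumes a person-month of 20 working days of 8 hours (160 hours); the slides do not state this, and it is chosen here only because it matches the reported 0.5 PM and 1.5 PM. RTF is read as hours of work per hour of video. Function and constant names are illustrative.

```python
# Assumptions (not stated in the slides): 8-hour days, 20 working days per PM.
HOURS_PER_DAY = 8
DAYS_PER_PM = 20
HOURS_PER_PM = HOURS_PER_DAY * DAYS_PER_PM  # 160 hours


def video_cost_pm(video_hours, transcribe_rtf=10, translate_rtf=30):
    """Person-months needed to manually transcribe and then translate video."""
    work_hours = video_hours * (transcribe_rtf + translate_rtf)
    return work_hours / HOURS_PER_PM


def text_cost_pm(words, words_per_day=2500):
    """Person-months needed to manually translate course text."""
    working_days = words / words_per_day
    return working_days / DAYS_PER_PM


video_pm = video_cost_pm(2)        # 2 h of video * 40 RTF = 80 h
text_pm = text_cost_pm(75_000)     # 75,000 words / 2,500 per day = 30 days
total_pm = video_pm + text_pm
assisted_pm = 0.30 * total_pm      # effort reduced to 30% with ASR and MT

print(f"video: {video_pm:.1f} PM, text: {text_pm:.1f} PM")
print(f"manual total: {total_pm:.1f} PM, with ASR and MT: {assisted_pm:.1f} PM")
```

Under these assumptions the video part comes to 0.5 PM, the text part to 1.5 PM, and the assisted total to 0.6 PM, in line with the slide.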


Page 5: Applying advanced language technologies for multilingual subtitling

Transcription and translation of videos

1. Generation of automatic transcriptions (subtitles) from video

2. Manual review of automatic transcriptions

3. Generation of automatic translations from transcriptions

4. Manual review of automatic translations
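
The four steps can be sketched as a small pipeline in which the automatic stages and the manual review are passed in as functions. This is only an illustration of the ordering of the steps: `Lecture`, `process`, and the three stub functions are hypothetical names, and none of them correspond to the MLLP platform's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Lecture:
    video_id: str
    source_lang: str
    transcript: str = ""
    translations: Dict[str, str] = field(default_factory=dict)


def process(lecture: Lecture, target_lang: str,
            asr: Callable[[str, str], str],
            mt: Callable[[str, str, str], str],
            review: Callable[[str], str]) -> Lecture:
    """Run the four steps in order; `review` stands in for a human editor."""
    draft_transcript = asr(lecture.video_id, lecture.source_lang)    # step 1
    lecture.transcript = review(draft_transcript)                    # step 2
    draft_translation = mt(lecture.transcript,
                           lecture.source_lang, target_lang)         # step 3
    lecture.translations[target_lang] = review(draft_translation)    # step 4
    return lecture


# Hypothetical stand-ins so the sketch runs without any external service.
def stub_asr(video_id: str, lang: str) -> str:
    return "helo world"  # deliberately imperfect automatic transcript


def stub_mt(text: str, src: str, tgt: str) -> str:
    return {"hello world": "hola mundo."}.get(text, text)


def stub_review(text: str) -> str:
    corrections = {"helo world": "hello world", "hola mundo.": "hola mundo"}
    return corrections.get(text, text)


result = process(Lecture("lecture-001", "en"), "es",
                 stub_asr, stub_mt, stub_review)
print(result.transcript, "|", result.translations["es"])
```

In this ordering the translation step receives the reviewed transcript rather than the raw automatic one, so transcription errors that the reviewer fixes are not carried into the translation.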


Page 6: Applying advanced language technologies for multilingual subtitling

Translation of course texts

• Text included in the course is ingested into the translation system

• An advanced web interface allows you to review source and target texts side-by-side

• Preview of source and target documents also available


Page 7: Applying advanced language technologies for multilingual subtitling

Quality and time saving measurements

Video transcription

Language     WER: EMMA   WER: YouTube ∆%   ASR time saving w.r.t. 10 RTF (%)
Spanish      15          +52               –69
English      39          +41               –47
Estonian     27          N/A               N/A
French       21          +55               –34
Italian      17          +85               –61
Dutch        25          +86               –42
Portuguese   43          +31               N/A
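
For reference, Word Error Rate is conventionally the word-level edit distance (substitutions, deletions, insertions) between the automatic transcript and a reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used for the figures on this slide; the example sentences are invented.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent, using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution ("curse") and one deletion ("next") over 5 reference words.
wer = word_error_rate("the course starts next week", "the curse starts week")
print(f"WER = {wer:.0f}%")
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.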

Video translation

Language pair          BLEU: EMMA   BLEU: Google ∆%   MT time saving w.r.t. 30 RTF (%)
English → Spanish      43           –10               –70
English → Italian      53           –16               –68
Spanish → English      41           –15               –74
Estonian → English     28           –5                N/A
French → English       33           –7                –23
Italian → English      48           –22               –58
Dutch → English        42           –18               –68
Portuguese → English   56           –18               N/A
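
The time-saving columns can be turned back into an effective real-time factor for the review work. The sketch below reads a saving of, say, 69% as a 69% reduction relative to the manual baseline (10 RTF for transcription, 30 RTF for translation); that reading is an interpretation of the column headers, and `effective_rtf` is an illustrative name.

```python
def effective_rtf(baseline_rtf: float, saving_percent: float) -> float:
    """RTF left after a given percentage saving on the manual baseline."""
    if not 0 <= saving_percent <= 100:
        raise ValueError("saving_percent must be between 0 and 100")
    return baseline_rtf * (100 - saving_percent) / 100


# Spanish transcription: 69% saving on the 10 RTF manual baseline.
asr_es = effective_rtf(10, 69)
# English -> Spanish translation: 70% saving on the 30 RTF manual baseline.
mt_en_es = effective_rtf(30, 70)

print(f"Spanish ASR review: {asr_es:.1f} RTF")
print(f"English->Spanish MT review: {mt_en_es:.1f} RTF")
```

Under this reading, reviewing automatic Spanish transcriptions takes about 3.1 hours per hour of video instead of 10, and reviewing English to Spanish translations about 9 hours instead of 30.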


Page 8: Applying advanced language technologies for multilingual subtitling

Integration of the tools: Web Service + API

ttp.mllp.upv.es/doc/


Page 9: Applying advanced language technologies for multilingual subtitling

Demo: The MLLP Platform for Transcription and Translation

ttp.mllp.upv.es


Page 10: Applying advanced language technologies for multilingual subtitling

Conclusions and future work

• Multilingual access to courses boosts visibility (in EMMA: +76% enrolled students)

• The cost of manually translating a course is high

• Automatic transcription and translation can reduce the time cost to 30% of the manual effort

• Accuracy of automatic transcription and translation depends on several factors:

– Languages involved, recording conditions, availability of annotated data resources related to the course, specificity of the course

• Designing multilingual courses and contents should also take into account:

– Slides, images, application interfaces (demos), bibliography...

• Future work:

– New languages for transcription and language pairs for translation

– Improving automatic transcription accuracy with noisy recordings (e.g., classroom recordings)

– Multilingual website translation

– Text-to-speech technology to “dub” videos and video lectures

– Live translation in forums

– Virtual multilingual teaching assistant

