Subtitling & translation of weblectures by Carlos Turró Ribalta

Rec: All Lecture Capture Workshop 11 December 2013 Carlos Turró Universitat Politècnica de València EC FP7 ICT project #287755


DESCRIPTION

This presentation was given by Carlos Turró Ribalta, Head of Media Services at Universitat Politecnica de Valencia, Spain on 11 December at the REC:all workshop 2013 "Lecture Capture: Moving beyond the pilot stage: large-scale implementation of lecture capture in European Higher Education" in Leuven, Belgium.

TRANSCRIPT

Page 1: Subtitling & translation of weblectures by Carlos Turró Ribalta

Rec: All Lecture Capture Workshop, 11 December 2013

Carlos Turró, Universitat Politècnica de València

EC FP7 ICT project #287755

Page 2: Subtitling & translation of weblectures by Carlos Turró Ribalta

Motivation

12 Nov 2013 2

• Video lecture repositories and MOOCs
  • Thousands of hours of video lectures available
  • Hundreds of hours of video lectures recorded every week
• Most video lectures only available in their original language
  • No subtitles

Page 3: Subtitling & translation of weblectures by Carlos Turró Ribalta

Motivation

• Transcriptions and translations are needed
  • Accessibility for people with disabilities
  • Accessibility for speakers of different languages
  • Search and analysis functions
  • Automated topic finding
  • …

Page 4: Subtitling & translation of weblectures by Carlos Turró Ribalta

Motivation


• How do we get there?

Page 5: Subtitling & translation of weblectures by Carlos Turró Ribalta

The transLectures approach

1. Automatic Speech Recognition (ASR) and Machine Translation (MT)
  • Adaptation: taking advantage of the characteristics of video lecture repositories
  • High-quality automatic transcriptions and translations

2. Interactive postediting: intelligent interaction for reduced effort

Page 7: Subtitling & translation of weblectures by Carlos Turró Ribalta

The transLectures partners

No.  Name                                   Country
1    Universitat Politècnica de València    Spain
2    Xerox SAS                              France
3    Institut Jožef Stefan                  Slovenia
3+   Knowledge for All Foundation           UK
4    RWTH Aachen University                 Germany
5    EML – European Media Laboratory        Germany
6    DDS – Deluxe Digital Studios           UK

Project duration: 36 months

Now we are in M25 (month 25)

Page 8: Subtitling & translation of weblectures by Carlos Turró Ribalta

Statistical Transcription (and translation)

[Diagram: Sound feeds the ASR Engine, which uses the Acoustic Model and the Language Model to produce the TRANSCRIPTION]
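A minimal sketch of how a statistical ASR engine can combine the two models: each candidate transcription gets an acoustic score, log P(sound | words), and a language score, log P(words), and the engine picks the candidate with the highest sum. The candidate sentences, the probabilities and the `lm_weight` parameter are made up for illustration; they are not taken from the transLectures system.

```python
import math

# Hypothetical scores for two candidate transcriptions of one utterance.
# "acoustic" is log P(sound | words); "language" is log P(words).
candidates = {
    "recognize speech": {"acoustic": math.log(0.30), "language": math.log(0.020)},
    "wreck a nice beach": {"acoustic": math.log(0.35), "language": math.log(0.001)},
}


def decode(cands, lm_weight=1.0):
    """Return the candidate maximizing acoustic + lm_weight * language log-score."""
    return max(
        cands,
        key=lambda words: cands[words]["acoustic"] + lm_weight * cands[words]["language"],
    )


best = decode(candidates)
acoustic_only = decode(candidates, lm_weight=0.0)
```

In this toy case the acoustic model alone slightly prefers the wrong sentence, and the language model corrects the choice.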

Page 9: Subtitling & translation of weblectures by Carlos Turró Ribalta

Statistical transcription(and translation)

[Diagram: manually transcribed voice feeds a Modeling Engine, which produces the Acoustic Model and the Language Model]

Page 10: Subtitling & translation of weblectures by Carlos Turró Ribalta

Architecture of TransLectures

[Diagram labels: Lecture, Slides, Extra content, Language Model, Transcription, Translation, Intelligent interaction, Result]

Page 11: Subtitling & translation of weblectures by Carlos Turró Ribalta

Languages

• Transcription (ASR)
  • EN
  • SL
  • ES

• Translation (MT)
  • EN>SL, SL>EN
  • EN>ES, ES>EN
  • EN>FR
  • EN>DE

Page 12: Subtitling & translation of weblectures by Carlos Turró Ribalta

Case study: VideoLectures.NET

15000 lectures

Page 13: Subtitling & translation of weblectures by Carlos Turró Ribalta

Case study: Polimedia

10000 Learning Objects

Page 14: Subtitling & translation of weblectures by Carlos Turró Ribalta

Demo

http://translectures.videolectures.net

http://polimedia.upv.es/catalogo

http://translectures.eu/player/

Page 15: Subtitling & translation of weblectures by Carlos Turró Ribalta

Scientific evaluations

• Transcription results
  • WER: Word Error Rate (%)
  • Goal: WER < 20%
  • EN, SL, ES

[Results chart not preserved; on its axis, higher WER is worse and lower is better]
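WER is usually computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcription and the automatic one, divided by the number of reference words. A minimal sketch, assuming whitespace tokenization and no text normalization; the function name is illustrative and this is not the project's scoring tool:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent between a reference and a hypothesis string."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


wer = word_error_rate("the lecture starts at nine", "the lecture start at nine")
```

One wrong word out of five reference words gives a WER of 20%, which is exactly the goal threshold quoted on the slide.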

Page 16: Subtitling & translation of weblectures by Carlos Turró Ribalta

Scientific evaluations

• Translation results
  • BLEU
  • Goal: BLEU > 30
  • EN>SL, SL>EN
  • EN>ES, ES>EN
  • EN>FR
  • EN>DE

[Results chart not preserved; on its axis, higher BLEU is better and lower is worse]
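BLEU compares automatic translations with reference translations through clipped n-gram precisions (typically up to 4-grams) and a brevity penalty, and is commonly reported on a 0 to 100 scale. A simplified sketch, assuming one reference per sentence, whitespace tokenization and no smoothing; real evaluations normally use an established scoring tool rather than code like this:

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(references, hypotheses, max_n=4):
    """Corpus-level BLEU (0-100) with a single reference per sentence."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(references, hypotheses):
        r, h = ref.split(), hyp.split()
        ref_len += len(r)
        hyp_len += len(h)
        for n in range(1, max_n + 1):
            ref_counts, hyp_counts = ngrams(r, n), ngrams(h, n)
            # Clipping: an n-gram is credited at most as often as it
            # appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


score = bleu(["the cat sat on the mat today"], ["the cat sat on the mat now"])
```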

Page 17: Subtitling & translation of weblectures by Carlos Turró Ribalta

Y2 results and comparison


Page 18: Subtitling & translation of weblectures by Carlos Turró Ribalta

Y2 results and comparison


Page 19: Subtitling & translation of weblectures by Carlos Turró Ribalta

Y2 results and comparison


Page 20: Subtitling & translation of weblectures by Carlos Turró Ribalta

Massive adaptation

• Characteristics of video lectures
  • Just one person
  • Known speaker
  • Clear talking
  • No interruptions
  • Focused on a topic
  • Slides

Page 21: Subtitling & translation of weblectures by Carlos Turró Ribalta

Massive adaptation

• Known speaker and topic
• Slides
• Related documents
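One common way to exploit slides and related documents is to interpolate a general language model with a small model estimated from the lecture's own text, so that topic-specific words become more likely. The sketch below uses unigram models and a fixed weight purely for illustration; the slides do not state which adaptation technique transLectures uses, and the example texts are invented.

```python
from collections import Counter


def unigram_model(text):
    """Estimate word probabilities from raw text by relative frequency."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(general, in_domain, weight=0.5):
    """P(w) = weight * P_in_domain(w) + (1 - weight) * P_general(w)."""
    vocab = set(general) | set(in_domain)
    return {
        w: weight * in_domain.get(w, 0.0) + (1 - weight) * general.get(w, 0.0)
        for w in vocab
    }


# Hypothetical texts: a general corpus and the text extracted from the slides.
general_lm = unigram_model("the students read the book and the notes")
slides_lm = unigram_model("fourier transform of the signal")
adapted_lm = interpolate(general_lm, slides_lm, weight=0.5)
```

After interpolation, a word such as "fourier", unseen in the general text, receives a non-zero probability while the adapted model still sums to one.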

Page 22: Subtitling & translation of weblectures by Carlos Turró Ribalta

Intelligent interaction

• Postediting automatic transcriptions/translations
  • The user invests the least possible effort
  • The system learns the most from it

• Confidence measures
• Fast constrained search
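Confidence measures can direct the user's limited effort to the words the recognizer is least sure about. A minimal sketch, assuming the recognizer returns one confidence value in [0, 1] per word; the function name, the example words and the scores are hypothetical:

```python
def words_to_supervise(words, confidences, budget):
    """Return the indices of the `budget` least confident words, in reading order."""
    if len(words) != len(confidences):
        raise ValueError("words and confidences must have the same length")
    ranked = sorted(range(len(words)), key=lambda i: confidences[i])
    return sorted(ranked[:budget])


words = ["the", "furrier", "transform", "of", "a", "signal"]
confidences = [0.98, 0.41, 0.87, 0.95, 0.62, 0.90]
to_check = words_to_supervise(words, confidences, budget=2)
flagged = [words[i] for i in to_check]
```

Here the user would be asked to check only "furrier" and "a". The corrections could then constrain a new recognition pass over the surrounding words, which is one way to read the "fast constrained search" bullet.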

Page 23: Subtitling & translation of weblectures by Carlos Turró Ribalta

Intelligent interaction


Page 24: Subtitling & translation of weblectures by Carlos Turró Ribalta

Intelligent interaction


Page 25: Subtitling & translation of weblectures by Carlos Turró Ribalta

Implementation and integration

• Videolectures.NET
• Polimedia
• Opencast Matterhorn

Page 26: Subtitling & translation of weblectures by Carlos Turró Ribalta

The tL player

Online HTML5 video player with editing capabilities. The user interface has three different editing layouts and full keyboard support. User interaction statistics are analyzed to improve the user experience and develop a user model.

Page 27: Subtitling & translation of weblectures by Carlos Turró Ribalta

tL player

Page 28: Subtitling & translation of weblectures by Carlos Turró Ribalta

Manual upload of lectures

Page 29: Subtitling & translation of weblectures by Carlos Turró Ribalta

transLectures: tools available

• The transLectures-UPV Toolkit (TLK) for ASR
  • www.translectures.eu/tlk

• RWTH Aachen: rASR, Jane (MT)
  • http://www-i6.informatik.rwth-aachen.de/web/Software/

Note that you need an acoustic model and a language model

Page 30: Subtitling & translation of weblectures by Carlos Turró Ribalta

transLectures: tools at M30

• The tL player (& editor)
• tL Opencast Matterhorn module
• Cloud service for testing
• Coming soon at M30 (www.translectures.eu)

More info at the OCWC conference (Ljubljana) in April 2014

Page 31: Subtitling & translation of weblectures by Carlos Turró Ribalta

Next steps for transLectures

• Keep improving ASR and MT results
• Keep improving tL open source tools (TLK, tL player)
• External user evaluations (VL.NET and Polimedia)
• External trials: implementation in other universities

Page 32: Subtitling & translation of weblectures by Carlos Turró Ribalta

Next EU project: EMMA

• MOOC related project

• transLectures will work on adding 7 new transcription systems (English, Italian, Spanish, French, Dutch, Portuguese and Estonian)

• … and 8 translation systems (from Italian, Spanish, French, Dutch, Portuguese and Estonian into English; and from English into Italian and Spanish)

• Beginning in 2014