Subtitling & Translation of Weblectures by Carlos Turró Ribalta

Posted on 05-Dec-2014


Category: Education


DESCRIPTION

This presentation was given by Carlos Turró Ribalta, Head of Media Services at Universitat Politècnica de València, Spain, on 11 December at the REC:all workshop 2013 "Lecture Capture: Moving beyond the pilot stage: large-scale implementation of lecture capture in European Higher Education" in Leuven, Belgium.

TRANSCRIPT

REC:all Lecture Capture Workshop, 11 December 2013

Carlos Turró, Universitat Politècnica de València
EC FP7 ICT project #287755

Motivation

12 Nov 2013 2

• Video lecture repositories and MOOCs
  • Thousands of hours of video lectures available
  • Hundreds of hours of video lectures recorded every week

• Most video lectures are only available in their original language
  • No subtitles

Motivation


• Transcriptions and translations are needed
  • Accessibility for people with disabilities
  • Accessibility for speakers of different languages
  • Search and analysis functions
  • Automated topic finding
  • …


• How do we get there?

The transLectures approach


1. Automatic Speech Recognition (ASR) and Machine Translation (MT)
   • Adaptation: taking advantage of the characteristics of video lecture repositories
   • High-quality automatic transcriptions and translations

2. Interactive postediting: intelligent interaction for reduced effort

The transLectures partners


#    Name                                   Country
1    Universitat Politècnica de València    Spain
2    Xerox SAS                              France
3    Institut Jožef Stefan                  Slovenia
3+   Knowledge for All Foundation           UK
4    RWTH Aachen University                 Germany
5    EML – European Media Laboratory        Germany
6    DDS – Deluxe Digital Studios           UK

Duration: 36 months. We are now in M25 (month 25).

Statistical transcription (and translation)

[Diagram: sound → ASR engine, using an acoustic model and a language model → transcription]

[Diagram: manually transcribed voice → modeling engine → acoustic model and language model]
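The diagrams name the two models without giving the decision rule that links them. For reference, the textbook formulation of statistical ASR (standard background, not taken from the slides) chooses the word sequence w that is most probable given the acoustic signal x; by Bayes' rule this factors into the acoustic model P(x | w) and the language model P(w):

```latex
\hat{w} = \operatorname*{arg\,max}_{w} P(w \mid x)
        = \operatorname*{arg\,max}_{w} \frac{P(x \mid w)\,P(w)}{P(x)}
        = \operatorname*{arg\,max}_{w} P(x \mid w)\,P(w)
```

P(x) is dropped in the last step because it does not depend on w. Statistical MT is commonly written in the same form, with a translation model in place of the acoustic model.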

Architecture of transLectures

[Diagram components: lecture, slides and extra content, language model, result (transcription and translation), intelligent interaction]

Languages


• Transcription (ASR): EN, SL, ES

• Translation (MT): EN>SL, SL>EN, EN>ES, ES>EN, EN>FR, EN>DE

Case study: VideoLectures.NET

15000 lectures

Case study: Polimedia

10000 Learning Objects

Demo

http://translectures.videolectures.net
http://polimedia.upv.es/catalogo

http://translectures.eu/player/

Scientific evaluations

• Transcription results
  • WER: Word Error Rate (%)
  • Goal: WER < 20%
  • EN, SL, ES

[Chart: transcription results per language, on a scale from worse to better]
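The slides set a WER target without defining the metric. Under the usual definition, WER is the minimum number of word substitutions, deletions and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that definition; whitespace tokenization, the absence of case or punctuation normalization, and the function name `word_error_rate` are assumptions for illustration, not the project's evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so far
    # and the first j hypothesis words (rolling single-row DP table).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # 0 if the words match
            deletion = prev[j] + 1                 # reference word missing
            insertion = curr[j - 1] + 1            # extra hypothesis word
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return 100.0 * prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the lecture covers speech recognition"
    hyp = "the lecture cover speech recognition today"
    # One substitution (covers -> cover) and one insertion (today) over 5 words.
    print(f"WER = {word_error_rate(ref, hyp):.1f}%  (goal: < 20%)")
```

Because insertions count as errors, WER can exceed 100% when the output is much longer than the reference.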

Scientific evaluations

• Translation results
  • BLEU
  • Goal: BLEU > 30
  • EN>SL, SL>EN, EN>ES, ES>EN, EN>FR, EN>DE

[Chart: translation results per language pair, on a scale from worse to better]
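BLEU compares a system translation with a human reference by counting shared n-grams. The sketch below follows the usual corpus-level definition (clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty) and reports it on the 0 to 100 scale used by the goal above. One reference per sentence, whitespace tokenization, no smoothing, and the function names are assumptions; published scores are normally computed with a standard tool such as sacreBLEU so that tokenization is consistent across systems.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(references: list[str], hypotheses: list[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-grams, per order
    ref_len = hyp_len = 0
    for ref_text, hyp_text in zip(references, hypotheses):
        ref, hyp = ref_text.split(), hyp_text.split()
        ref_len += len(ref)
        hyp_len += len(hyp)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any n-gram order has no match
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the mat"]
    hyps = ["the cat sat on the"]
    # Every hypothesis n-gram matches; only the brevity penalty lowers the score.
    print(f"BLEU = {corpus_bleu(refs, hyps):.1f}")
```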

Y2 results and comparison


Massive adaptation

• Characteristics of video lectures
  • Just one person
  • Known speaker
  • Clear talking
  • No interruptions
  • Focused on a topic
  • Slides

Massive adaptation

• Known speaker and topic
• Slides
• Related documents
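The slides list speaker, topic, slides and related documents as adaptation sources without saying how they are used. One common way to exploit lecture-specific text (an assumption here, not necessarily the transLectures method) is to interpolate a general language model with a small in-domain model estimated from the slides: P(w) = λ·P_slides(w) + (1 - λ)·P_general(w). The sketch uses unigram models to stay short; the function names, the example texts and the weight `lam` are hypothetical, and a real system would interpolate smoothed n-gram models with λ tuned on held-out data.

```python
from collections import Counter


def unigram_model(text: str) -> dict[str, float]:
    """Maximum-likelihood unigram probabilities from lowercased, whitespace-split text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(general: dict[str, float], in_domain: dict[str, float],
                lam: float) -> dict[str, float]:
    """Linear interpolation: P(w) = lam * P_in_domain(w) + (1 - lam) * P_general(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    vocabulary = general.keys() | in_domain.keys()
    return {w: lam * in_domain.get(w, 0.0) + (1 - lam) * general.get(w, 0.0)
            for w in vocabulary}


if __name__ == "__main__":
    # Hypothetical texts: generic background text and one lecture's slide text.
    general = unigram_model("the model is trained on the data")
    slides = unigram_model("hidden markov model hidden states")
    adapted = interpolate(general, slides, lam=0.3)
    # "markov" never occurs in the general text but gets probability mass here.
    print(f"P(markov) = {adapted['markov']:.3f}")
```

Because both input models sum to one, the interpolated model does too, so it remains a valid probability distribution for any λ in [0, 1].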

Intelligent interaction

• Postediting automatic transcriptions/translations
  • The user invests the least possible effort
  • The system learns the most from it

• Confidence measures
• Fast constrained search
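Confidence measures let the interface direct the user only to the words the recognizer is least sure about, so a fixed amount of user effort covers the most likely errors. The sketch below shows that selection step only, assuming each recognized word carries a confidence score in [0, 1]; the `Word` type, the `budget` parameter and the function name are hypothetical, and the fast constrained search that re-decodes the rest of the transcript around the user's corrections is not shown.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # assumed recognizer confidence in [0, 1]


def words_to_review(words: list[Word], budget: int) -> list[int]:
    """Indices of the `budget` least-confident words, returned in transcript order."""
    by_confidence = sorted(range(len(words)), key=lambda i: words[i].confidence)
    return sorted(by_confidence[:budget])


if __name__ == "__main__":
    transcript = [Word("the", 0.98), Word("markov", 0.41),
                  Word("chain", 0.77), Word("converges", 0.35)]
    print(words_to_review(transcript, budget=2))  # prints [1, 3]
```

Returning the indices in transcript order lets the player step through the flagged words in the order they are spoken.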


Implementation and integration


• Videolectures.NET
• Polimedia
• Opencast Matterhorn

Online HTML5 video player with editing capabilities. The user interface has three different editing layouts and full keyboard support. User interaction statistics are analyzed to improve the user experience and to develop a user model.

The tL player

tL player

Manual upload of lectures

transLectures: tools available


• The transLectures-UPV Toolkit (TLK) for ASR
  • www.translectures.eu/tlk

• RWTH Aachen: rASR, Jane (MT)
  • http://www-i6.informatik.rwth-aachen.de/web/Software/

Note that you need an acoustic model and a language model.

transLectures: tools at M30

• The tL player (& editor)
• tL Opencast Matterhorn module
• Cloud service for testing
• Coming soon at M30 (www.translectures.eu)

More info at the OCWC conference (Ljubljana) in April 2014

Next steps for transLectures


• Keep improving ASR and MT results
• Keep improving tL open source tools (TLK, tL player)
• External user evaluations (VL.NET and Polimedia)
• External trials: implementation in other universities

Next EU project: EMMA

• MOOC related project

• Extending the transLectures work with 7 new transcription systems (English, Italian, Spanish, French, Dutch, Portuguese and Estonian)

• … and 8 translation systems (from Italian, Spanish, French, Dutch, Portuguese and Estonian into English; and from English into Italian and Spanish)

• Beginning in 2014
