Scalable Understanding of Multilingual Media

Steve Renals, University of Edinburgh

Funded by the EU H2020 ICT Programme under Grant Agreement 688139

http://summa-project.eu

SUMMA in a nutshell

• Significantly improve media monitoring, by the automatic
  ● analysis of media streams across many languages
  ● aggregation and distillation of stream content
  ● construction of knowledge bases from reported facts
  ● supply of media data visualisations at scale

BBC Monitoring

• 300 journalists, each monitoring up to 4 TV channels and several online text sources
• 30 languages; the most important include Russian, Arabic and Farsi

Big Data

• 250 video channels
  ● 2.5Tb/day, 19Tb/week, 1Pb/year
• BBC Monitoring has access to
  ● 1,500 TV channels
  ● 1,350 radio sources
• But… ~700 free-to-air Arabic satellite channels, increasing by ~100/year
• Current monitoring processes are largely manual and cannot keep up with the scale of the task
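
As a rough sanity check on these figures, the arithmetic can be sketched in Python. This is only an illustration under stated assumptions: the slide's "Tb" and "Pb" are kept as written (it does not say whether bits or bytes are meant), 1 Pb is taken as 1000 Tb, volume is assumed to be spread evenly across channels, and each journalist is assumed to watch distinct channels. The function name `volume_estimates` and its dictionary keys are hypothetical.

```python
def volume_estimates(channels=250, tb_per_day=2.5,
                     journalists=300, max_channels_each=4,
                     tv_channels_available=1500):
    """Back-of-envelope figures derived from the slide's numbers."""
    # Assumption: every channel contributes the same volume.
    per_channel = tb_per_day / channels
    return {
        "tb_per_channel_per_day": per_channel,
        "tb_per_week": tb_per_day * 7,
        # Assumption: 1 Pb = 1000 Tb.
        "pb_per_year": tb_per_day * 365 / 1000,
        # Hypothetical scale-up to every accessible TV channel.
        "tb_per_day_all_tv": per_channel * tv_channels_available,
        # Upper bound, assuming no two journalists share a channel.
        "max_tv_channels_watched": journalists * max_channels_each,
    }


if __name__ == "__main__":
    for name, value in volume_estimates().items():
        print(f"{name}: {value:g}")
```

Under these assumptions the weekly extrapolation is 17.5Tb and the yearly one about 0.91Pb, so the slide's 19Tb/week and 1Pb/year look like rounded figures or figures based on a slightly higher daily rate. The manual ceiling of 1,200 TV channels (300 journalists at up to 4 channels each) also sits below the 1,500 TV channels available, before radio is counted.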

Use cases

1. External Media Monitoring
  ● identify emerging trends
  ● track people in the news
  ● monitor the evolution of storylines
2. Internal Media Monitoring
  ● manage multilingual content creation
  ● reuse content efficiently across languages
3. Data Journalism
  ● use the SUMMA platform for data-driven journalism

SUMMA Prototypes

UI Concept 1 (callouts on the annotated screenshot):
• Channel ID & native language
• Semantic tag word cloud: size indicates current frequency across region/group
• Segment unique timestamp
• Player (SD? HD?)
• Player controller: tag instances marked
• Tools: screen grab, snip video, save, attach
• Translated transcript
• “Now playing” text highlighted
• Tags shown underlined
• Add new tag: click pencil to ‘underline’, and enter text
• Segment machine analysis confidence (possibly better represented graphically?)
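
The word-cloud callout says only that tag size indicates current frequency. One simple way to realise that is sketched below, with font size interpolated linearly between a minimum and a maximum. The linear scaling, the point sizes, the function name `tag_font_sizes` and the example tags are all assumptions for illustration; the slides do not specify how the prototype sizes tags.

```python
def tag_font_sizes(freqs, min_pt=10.0, max_pt=36.0):
    """Map each tag's frequency to a font size by linear interpolation."""
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    if hi == lo:
        # All tags equally frequent: use the midpoint size for each.
        mid = (min_pt + max_pt) / 2
        return {tag: mid for tag in freqs}
    scale = (max_pt - min_pt) / (hi - lo)
    return {tag: min_pt + (n - lo) * scale for tag, n in freqs.items()}


if __name__ == "__main__":
    # Hypothetical tag counts for one region/group.
    counts = {"election": 40, "sanctions": 10, "summit": 25}
    for tag, size in tag_font_sizes(counts).items():
        print(f"{tag}: {size:.1f}pt")
```

The least frequent tag gets `min_pt` and the most frequent gets `max_pt`. A logarithmic scale might suit heavily skewed tag counts better, but that is a design choice the slides leave open.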

Platform & Technologies

• Speech recognition
• Machine translation
• Segmentation & clustering
• Ingest audio, video, text
• Identify entities & relations
• Summarisation & distillation
• Sentiment detection
• Visualisation & prototypes

Multilingual technologies

SUMMA system v0.1
