Post on 09-Oct-2020
Funded by the EU H2020 ICT Programme under Grant Agreement 688139
http://summa-project.eu
Scalable Understanding of Multilingual Media
Steve Renals, University of Edinburgh
SUMMA in a nutshell
• Significantly improve media monitoring, by the automatic
  ● analysis of media streams across many languages
  ● aggregation and distillation of stream content
  ● construction of knowledge bases from reported facts
  ● supply of media data visualisations at scale
BBC Monitoring
• 300 journalists, each monitoring up to 4 TV channels and several online text sources
• 30 languages: the most important include Russian, Arabic, and Farsi
Big Data
• 250 video channels
  ● 2.5Tb/day, 19Tb/week, 1Pb/year
• BBC Monitoring has access to
  ● 1,500 TV channels
  ● 1,350 radio sources
• But there are ~700 free-to-air Arabic satellite channels, increasing by ~100/year
• Current monitoring processes are largely manual and cannot keep up with the scale of the task
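The quoted volumes can be sanity-checked with a little arithmetic. The sketch below scales the slide's daily figure to weekly and yearly totals; the function name and the decimal conversion (1 Pb = 1000 Tb) are assumptions for illustration, and the unit labels are kept as written on the slide.

```python
# Back-of-envelope check of the ingest volumes quoted on the slide.
DAILY_VOLUME_TB = 2.5   # from the slide: 2.5Tb/day
CHANNELS = 250          # from the slide: 250 video channels


def scale_volume(daily_tb: float, channels: int) -> dict:
    """Scale a daily volume to weekly, yearly, and per-channel figures."""
    return {
        "per_week_tb": daily_tb * 7,
        # Assumption: decimal prefixes, i.e. 1 Pb = 1000 Tb.
        "per_year_pb": daily_tb * 365 / 1000,
        "per_channel_per_day_gb": daily_tb * 1000 / channels,
    }


totals = scale_volume(DAILY_VOLUME_TB, CHANNELS)
for name, value in totals.items():
    print(f"{name}: {value:.2f}")
```

Scaling 2.5 per day gives 17.5 per week and roughly 0.91 per year, so the slide's 19 per week and 1 per year are presumably rounded, or derived from a slightly higher daily rate of about 2.7. That reading is an inference, not something the slide states.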
Use cases
1. External Media Monitoring
   ● identify emerging trends
   ● track people in the news
   ● monitor the evolution of storylines
2. Internal Media Monitoring
   ● manage multilingual content creation
   ● efficient reuse of content across languages
3. Data Journalism
   ● use the SUMMA platform for data-driven journalism
SUMMA Prototypes
UI Concept 1 (annotated mock-up; callouts from the figure):
• Channel ID & native language
• Semantic tag word cloud: size indicates current frequency across region/group
• Segment unique timestamp
• Player (SD? HD?)
• Player controller: tag instances marked
• Tools: screen grab, snip video, save, attach
• Translated transcript
• “Now playing” text highlighted
• Tags shown underlined
• Add new tag: click pencil to ‘underline’, and enter text
• Segment machine analysis confidence (possibly better represented graphically?)
Platform & Technologies
• Speech recognition
• Machine translation
• Segmentation & clustering
• Ingest audio, video, text
• Identify entities & relations
• Summarisation & distillation
• Sentiment detection
• Visualisation & prototypes
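One way to picture how these components might fit together is as a chain of stages applied to each ingested item. The sketch below is purely illustrative: the stage names come from the slide, but the ordering, the `make_stage` and `run_pipeline` helpers, and the item fields are all hypothetical, and each stage is a placeholder that only records that it ran. It is not a description of the actual SUMMA platform.

```python
from typing import Callable, Dict, List

Item = Dict[str, object]
Stage = Callable[[Item], Item]


def make_stage(name: str) -> Stage:
    """Return a placeholder stage that only records that it ran."""
    def stage(item: Item) -> Item:
        done = list(item.get("stages_run", []))
        done.append(name)
        # Return a new dict so earlier items are never mutated.
        return {**item, "stages_run": done}
    return stage


# Stage names are taken from the slide; this ordering is an assumption.
STAGE_NAMES = [
    "ingest",
    "speech recognition",
    "machine translation",
    "segmentation & clustering",
    "entities & relations",
    "summarisation & distillation",
    "sentiment detection",
    "visualisation",
]
PIPELINE: List[Stage] = [make_stage(name) for name in STAGE_NAMES]


def run_pipeline(item: Item, stages: List[Stage]) -> Item:
    """Apply each stage in turn, passing the result to the next one."""
    for stage in stages:
        item = stage(item)
    return item


# Hypothetical input item; the field names are illustrative only.
result = run_pipeline({"source": "example-channel", "language": "ru"}, PIPELINE)
print(result["stages_run"])
```

Keeping every stage behind the same item-in, item-out signature would let components be swapped or reordered per language, which is one plausible reason to structure a multilingual pipeline this way.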
Multilingual technologies
SUMMA system v0.1