"Live Topic Generation from Event Streams", talk given at the Demo session of the 22nd World Wide Web Conference (WWW), Rio de Janeiro, Brazil
Live Topic Generation from Event Streams
Vuk Milicic, José Luis Redondo Garcia,
Giuseppe Rizzo, Raphaël Troncy, Thomas Steiner
@rtroncy
Media Finder (www2013)
15/05/2013 22nd World Wide Web Conference (WWW) - Rio de Janeiro - 2
Media Finder (zooming on media items)
Media Finder (timeline view)
Media Server
Composition of media item extractors (12 social networks)
Relies on search APIs + a fixed 30 s timeout window to provide results
Falls back on screen scraping when necessary (Twitter ecosystem)
Implemented as a NodeJS server
Serialize results in a common schema (JSON)
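The fan-out described above (many extractors, one fixed time window, one shared schema) can be sketched as follows. This is a minimal illustration in Python rather than the actual NodeJS server; the extractor functions, the `normalize` helper and the schema field names are all hypothetical.

```python
import asyncio
import json

TIMEOUT_SECONDS = 30  # the fixed window mentioned on the slide


def normalize(network, raw):
    """Map one raw result onto a shared schema (field names are illustrative)."""
    return {
        "network": network,
        "url": raw.get("url"),
        "text": raw.get("text", ""),
    }


async def gather_media(extractors, term, timeout=TIMEOUT_SECONDS):
    """Query every extractor concurrently; keep only what finishes in time.

    `extractors` maps a (hypothetical) network name to an async function
    that takes the search term and returns a list of dicts. It must not
    be empty.
    """
    tasks = {name: asyncio.create_task(fn(term)) for name, fn in extractors.items()}
    done, pending = await asyncio.wait(list(tasks.values()), timeout=timeout)
    for task in pending:
        task.cancel()  # too slow for the window: drop this extractor

    results = {}
    for name, task in tasks.items():
        ok = task in done and not task.cancelled() and task.exception() is None
        # Slow or failing extractors contribute an empty list of the same shape.
        results[name] = [normalize(name, raw) for raw in task.result()] if ok else []
    return results


if __name__ == "__main__":
    async def demo_extractor(term):  # stand-in for a real search API call
        return [{"url": "http://example.org/item", "text": f"photo about {term}"}]

    found = asyncio.run(gather_media({"demo": demo_extractor}, "www2013"))
    print(json.dumps(found, indent=2))
```

Returning an empty list for a slow or failing extractor keeps the output shape stable, so a client can iterate over every network without special cases.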
Deep link Permalink
Clean text for NLP processing
Aggregate view of ALL social interactions
12 Social Networks
Media Finder Architecture
Media items harvesting using the Media Server
http://eventmedia.eurecom.fr/media-server/search/{combined}/{term}
https://github.com/vuknje/media-server (@tomayac fork)
Image near-duplicate removal
DCT signature on images and video frames, Hamming distance between image pairs
Clustering and disambiguation
Named Entity Extraction using NERD
Topic Generation using LDA
Density-based clustering using OPTICS
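The near-duplicate step listed in this architecture (DCT signature, then Hamming distance between pairs) can be illustrated with a small perceptual-hash style sketch. The slides do not give the signature size or the threshold rule, so everything here is an assumption: a square grayscale input already downscaled, an 8x8 block of low-frequency coefficients, the DC term skipped, and bits set by comparison with the median.

```python
import math
from statistics import median


def dct_signature(gray, keep=8):
    """Return an integer bit signature for a square grayscale image.

    `gray` is a list of n rows of n numbers (n >= keep). Only the
    keep x keep low-frequency DCT-II coefficients are computed.
    """
    n = len(gray)
    # cos_table[k][i] = cos((2i + 1) * k * pi / (2n)), the DCT-II basis
    cos_table = [
        [math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
        for k in range(keep)
    ]
    coeffs = []
    for u in range(keep):
        for v in range(keep):
            if u == 0 and v == 0:
                continue  # skip the DC term so uniform brightness shifts are ignored
            total = 0.0
            for y in range(n):
                row, cy = gray[y], cos_table[u][y]
                for x in range(n):
                    total += row[x] * cy * cos_table[v][x]
            coeffs.append(total)
    threshold = median(coeffs)
    signature = 0
    for c in coeffs:
        signature = (signature << 1) | (c > threshold)  # one bit per coefficient
    return signature


def hamming(a, b):
    """Number of bit positions where two signatures differ."""
    return bin(a ^ b).count("1")
```

Two images would then count as near-duplicates when `hamming(sig_a, sig_b)` falls below some cut-off. The cut-off is a tuning choice that the slides do not specify.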
Named Entities are Pivotal
http://nerd.eurecom.fr/
REST API Ontology
Dashboard UI
Media Finder (named entities clustering)
Media Finder (zooming in a cluster)
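The entity clusters in these views come from density-based clustering. As a rough illustration only, the sketch below uses DBSCAN, a simpler relative of the OPTICS algorithm named in the architecture, and is not the Media Finder implementation. Representing each media item as a set of entity labels, the Jaccard distance, and the `eps` / `min_pts` values are all assumptions made for this example.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of entity labels."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(points, eps, min_pts, dist):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):
        # Includes the point itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned, do not expand again
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep growing
    return labels


if __name__ == "__main__":
    # Toy entity sets, invented for the example.
    items = [
        {"WWW2013", "Rio de Janeiro"},
        {"WWW2013", "Rio de Janeiro", "Brazil"},
        {"NERD", "DBpedia"},
        {"NERD", "DBpedia", "LDA"},
    ]
    print(dbscan(items, eps=0.5, min_pts=2, dist=jaccard_distance))
```

OPTICS differs in that it orders points by reachability instead of fixing a single `eps`, which lets clusters of different densities be extracted from one run.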
Summary
Pick an event identified by a hashtag
Use the Media Server to get media items aggregated over multiple social networks
Use NERD to get entities aggregated over multiple extractors
Cluster the items and identify meaningful topics (i.e., entities), each with a meaningful label that is often disambiguated with a DBpedia URI, giving access to more encyclopedic knowledge
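The last step of this summary, turning extracted entities into labelled topics, can be sketched in a deliberately simplified form. The grouping below is by shared entity label only and stands in for the LDA and OPTICS stages. The input shape (`id`, `entities`, `label`, `uri`) and the `min_items` cut-off are hypothetical and are not the NERD response format.

```python
from collections import defaultdict


def topics_from_entities(items, min_items=2):
    """Group media items by the entities found in them.

    `items` is a list of dicts such as
        {"id": "a", "entities": [{"label": "Brazil", "uri": "http://..."}]}
    where "uri" is optional (an entity may not be disambiguated).
    Returns topics sorted by number of items, largest first.
    """
    members = defaultdict(set)
    uris = {}
    for item in items:
        for entity in item["entities"]:
            key = entity["label"].lower()  # naive label normalization
            members[key].add(item["id"])
            if entity.get("uri"):
                uris.setdefault(key, entity["uri"])  # keep the first URI seen

    topics = [
        {"label": key, "uri": uris.get(key), "items": sorted(ids)}
        for key, ids in members.items()
        if len(ids) >= min_items
    ]
    topics.sort(key=lambda topic: (-len(topic["items"]), topic["label"]))
    return topics
```

A topic whose `uri` is `None` still has a usable label; the URI only adds a link to encyclopedic knowledge when disambiguation succeeded.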
Live Topic Generation from Event Streams
Meet us at the WWW 2013 Demo Session, Booth 14
http://www.youtube.com/watch?v=8iRiwz7cDYY
http://www.slideshare.net/troncy