Bringing parliamentary debates to the Semantic Web
DESCRIPTION
Presentation of the paper 'Bringing parliamentary debates to the Semantic Web' by Damir Juric, Laura Hollink and Geert-Jan Houben at the workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE 2012), held in conjunction with the 11th International Semantic Web Conference 2012 in Boston, USA. See also the homepage of the PoliMedia project: http://polimedia.nl/
TRANSCRIPT
Bringing parliamentary debates to the Semantic Web
Damir Juric (1,3), Laura Hollink (2), Geert-Jan Houben (1)
(1) Delft University of Technology, (2) VU University Amsterdam, (3) FER University of Zagreb
DeRiVE 2012, Boston, 12.11.2012
Motivation
Cross-media comparison:
• What choices do different media make in the coverage of people and topics while reporting on political events?
• Does the representation of topics and people change over time and how do the various media types differ?
(Diagram labels: Political events; Media)
Background: the PoliMedia project
• Funded by CLARIN-NL
• May 2012 - May 2013
• 3 phases:
I. Modeling phase: creating a semantic model (this presentation)
II. Data production phase: creating links between political events and media
III. Application phase: searching and navigating linked datasets
• www.polimedia.nl
Research questions
• How to represent political events on the Semantic Web?
• How to represent links between media and political events on the Semantic Web?
Political events data set
• Some provenance:
1. Transcripts are made of the complete debates of the Dutch parliament.
2. Published online by the government at http://www.statengeneraaldigitaal.nl/ (1818-1995) and http://officielebekendmakingen.nl/ (from 1995)
3. The PoliticalMashup project has converted the government PDF and TXT files into XML, including URIs as identifiers; see http://politicalmashup.nl/
4. We build on that.
• Events: Dutch parliamentary debates (Handelingen der Staten-Generaal, or Dutch Hansard)
Media data sets
• newspaper articles and radio bulletins
• at the National Library of the Netherlands
• Many, mostly regional, newspapers, 1950-1995
• Text + images of newspaper layout
• newscasts
• at the Netherlands Institute for Sound and Vision
• evening news and current affairs programs
• metadata in Dublin Core and CDMI format
• enriched with thesaurus terms from the Gemeenschappelijke Thesaurus Audiovisuele Archieven (GTAA)
Semantic model: what do we need to represent? 1/2
• Important information for every parliamentary debate is:
• When the debate was held
• What is being said in the debate (topics)
• Who is giving the speeches in the debate and in which role (persons)
• Additional information about actors involved in the event (names of the politicians, their party, age, etc.)
• Structure: Subparts of the debate have their own identifiers (part of the debate where only one speaker can be identified as actor)
• Chronological order (the order in which the subparts occurred within the parliamentary debate)
• Named entities apart from politicians (persons, locations, etc.)
Debate Metadata
Topic 1
Topic 2
Speaker 1 / Content
Speaker 2 / Content
Speaker 3 / Content
Speaker 1 / Content
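The requirements listed on this slide can be read as a small data structure. The following is a minimal sketch in Python, not the PoliMedia model itself: the class and field names (`Politician`, `Speech`, `DebatePart`, `Debate`, `order`, `role`) are assumptions made for illustration, and the date and speaker assignments are placeholders.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Politician:
    # Additional actor information; only name and party are sketched here.
    name: str
    party: Optional[str] = None


@dataclass
class Speech:
    # A subpart of the debate with exactly one identifiable speaker.
    order: int                 # position within the debate part
    speaker: Politician
    role: str                  # hypothetical role label, e.g. "member"
    text: str
    named_entities: List[str] = field(default_factory=list)


@dataclass
class DebatePart:
    topic: str
    speeches: List[Speech] = field(default_factory=list)

    def speakers_in_order(self) -> List[str]:
        """Return speaker names in the chronological order of their speeches."""
        ordered = sorted(self.speeches, key=lambda s: s.order)
        return [s.speaker.name for s in ordered]


@dataclass
class Debate:
    held_on: date
    parts: List[DebatePart] = field(default_factory=list)


# Placeholder data: four speeches spread over two topics.
s1, s2, s3 = Politician("Speaker 1"), Politician("Speaker 2"), Politician("Speaker 3")
debate = Debate(
    held_on=date(2000, 1, 1),  # placeholder date, not from the data set
    parts=[
        DebatePart("Topic 1", [Speech(2, s2, "member", "..."),
                               Speech(1, s1, "member", "...")]),
        DebatePart("Topic 2", [Speech(1, s3, "member", "..."),
                               Speech(2, s1, "member", "...")]),
    ],
)
ordered_first_part = debate.parts[0].speakers_in_order()
```

Storing an explicit `order` on each speech keeps the chronology recoverable even when speeches are loaded out of sequence, which is one way to meet the "chronological order" requirement.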
Semantic model: what do we need to represent? 2/2
• Various information about media items linked to the debate
• Links between subparts of the debate and news articles, radio bulletins and television newscasts
URIs
• PoliMedia vocabulary: http://purl.org/linkedpolitics/nl/polivoc#Speech
• Politicians, parties: http://purl.org/linkedpolitics/nl/poli#Beel
• Debates and parts of debates: http://purl.org/linkedpolitics/nl/nl.proc.sgd.d.198219830000846.2.11.12
• Media articles, bulletins and newscasts: http://resolver.kb.nl/resolve?urn=ddd:010069811:mpeg21:pdf
Semantic model
(Diagrams of the semantic model, shown over several slides.)
W.R. van Hage, V. Malaisé, R. Segers, L. Hollink and A.Th. Schreiber. Design and use of the Simple Event Model (SEM)
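To give a feel for what an event-based description of one speech might look like, here is a hedged sketch that emits N-Triples using plain Python strings. It assumes, without confirmation from the slides, that a speech is typed with the `polivoc#Speech` class shown earlier and described with the SEM properties `sem:subEventOf`, `sem:hasActor` and `sem:hasTimeStamp`. The parent URI, the pairing of this speaker with this debate part, and the date are all placeholders chosen only to show the shape of the data.

```python
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
POLIVOC = "http://purl.org/linkedpolitics/nl/polivoc#"
BASE = "http://purl.org/linkedpolitics/nl/"


def speech_triples(speech_uri, parent_uri, speaker_uri, iso_date):
    """Describe one speech as (subject, predicate, object) triples.

    Objects are URI strings, except the date, which is a
    ("literal", value, datatype) tuple.
    """
    return [
        (speech_uri, RDF_TYPE, POLIVOC + "Speech"),
        (speech_uri, SEM + "subEventOf", parent_uri),      # assumed property choice
        (speech_uri, SEM + "hasActor", speaker_uri),       # assumed property choice
        (speech_uri, SEM + "hasTimeStamp", ("literal", iso_date, XSD_DATE)),
    ]


def to_ntriples(triples):
    """Serialize triples as N-Triples lines (no literal escaping; dates only)."""
    lines = []
    for subject, predicate, obj in triples:
        if isinstance(obj, tuple):
            _, value, datatype = obj
            rendered = f'"{value}"^^<{datatype}>'
        else:
            rendered = f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


speech = BASE + "nl.proc.sgd.d.198219830000846.2.11.12"   # URI from the slide
parent = BASE + "nl.proc.sgd.d.198219830000846.2.11"      # assumed parent URI
speaker = BASE + "poli#Beel"   # URI from the slide; pairing is illustrative only
triples = speech_triples(speech, parent, speaker, "2000-01-01")  # placeholder date
ntriples = to_ntriples(triples)
print(ntriples)
```

A production pipeline would more likely use an RDF library for serialization and escaping; plain strings are used here only to keep the sketch dependency-free.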
Current work: finding links
• Queries: speaker name + named entities + topics (created using topic modeling methods), all extracted from the political events data set
• used for retrieval of media articles
(Diagram labels: ActorFromSpeech (Speaker X); TimeFrame; NamedEntitiesVectorSpeech; NamedEntitiesVectorPartOfDebate; TopicWordSetVectorSpeech; TopicWordSetVectorPartOfDebate; TopicList)
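One way to picture the query construction described on this slide is sketched below. It is an illustration under assumptions, not the project's retrieval code: the function names, the case-insensitive de-duplication, the symmetric time window and its default of 7 days are all hypothetical, as are the example terms and date.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class MediaQuery:
    terms: List[str]   # speaker name, named entities and topic words
    start: date        # start of the time frame searched
    end: date          # end of the time frame searched


def merge_unique(*term_lists):
    """Concatenate term lists, dropping case-insensitive duplicates, keeping order."""
    seen = set()
    merged = []
    for terms in term_lists:
        for term in terms:
            key = term.lower()
            if key not in seen:
                seen.add(key)
                merged.append(term)
    return merged


def build_query(speaker, speech_entities, part_entities,
                speech_topics, part_topics, debate_date, window_days=7):
    """Combine speech-level and debate-part-level vectors into one query.

    window_days is an assumed parameter: media items are searched within
    this many days before and after the debate date.
    """
    terms = merge_unique([speaker], speech_entities, part_entities,
                         speech_topics, part_topics)
    window = timedelta(days=window_days)
    return MediaQuery(terms, debate_date - window, debate_date + window)


# Placeholder inputs, not taken from the data set.
query = build_query(
    speaker="Speaker X",
    speech_entities=["Amsterdam"],
    part_entities=["amsterdam", "Rotterdam"],
    speech_topics=["housing"],
    part_topics=["housing", "rent"],
    debate_date=date(2000, 1, 15),
)
```

Merging the speech-level and debate-part-level vectors lets a short speech borrow context from the surrounding discussion, at the cost of a broader query.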
Finally
• SPARQL endpoint with the PoliMedia vocabulary + RDF of Dutch Hansard data will be available soon.
• Feel free to use it!
• Links to media + search/browse app are expected early next year.
Henri Beunders (EUR)
Jaap Blom (NISV)
Laura Hollink (VU)
Geert-Jan Houben (TU Delft)
Damir Juric (TU Delft)
Max Kemman (EUR)
Martijn Kleppe (EUR)
Johan Oomen (NISV)
Thank you for your attention!