Bringing parliamentary debates to the Semantic Web Damir Juric 1,3 , Laura Hollink 2 , Geert-Jan Houben 1 1 Delft University of Technology, 2 VU University Amsterdam, 3 FER University of Zagreb DERIVE 2012 Boston, 12.11.2012.

DESCRIPTION

Presentation of the paper 'Bringing parliamentary debates to the Semantic Web' by Damir Juric, Laura Hollink and Geert-Jan Houben at the workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE2012) in conjunction with the 11th International Semantic Web Conference 2012 in Boston, USA. See also the homepage of the PoliMedia project: http://polimedia.nl/

TRANSCRIPT

Page 1: Bringing parliamentary debates to the Semantic Web

Bringing parliamentary debates to the Semantic Web

Damir Juric1,3, Laura Hollink2, Geert-Jan Houben1

 1 Delft University of Technology, 2 VU University Amsterdam, 3 FER University of Zagreb

DERIVE 2012 Boston, 12.11.2012.

Page 2: Bringing parliamentary debates to the Semantic Web

Motivation

Cross-media comparison:

• What choices do different media make in the coverage of people and topics while reporting on political events?

• Does the representation of topics and people change over time and how do the various media types differ?

Page 3: Bringing parliamentary debates to the Semantic Web

Motivation

[Figure: cross-media comparison; labels: Political events, Media]

Page 4: Bringing parliamentary debates to the Semantic Web

Background: the PoliMedia project

• Funded by CLARIN-NL

• May 2012 - May 2013

• 3 phases:

I. modeling phase: creating a semantic model (this presentation)

II. data production phase: creating links between political events and media

III. application phase: searching and navigating linked datasets

• www.polimedia.nl

Page 5: Bringing parliamentary debates to the Semantic Web

Research questions

• How to represent political events on the Semantic Web?

• How to represent links between media and political events on the Semantic Web?

Page 7: Bringing parliamentary debates to the Semantic Web

Political events data set

• Some provenance:

1. Transcripts are made of the complete debates of the Dutch parliament.

2. Published online by the government at http://www.statengeneraaldigitaal.nl/ (1818-1995) and http://officielebekendmakingen.nl/ (from 1995)

3. The PoliticalMashup project has translated the government PDF and TXT files into XML, including URIs as identifiers; see http://politicalmashup.nl/

4. We build on that.

• Events: Dutch parliamentary debates

Handelingen der Staten-Generaal, or Dutch Hansard

Page 8: Bringing parliamentary debates to the Semantic Web

Media data sets

• newspaper articles and radio bulletins

• at the National Library of the Netherlands

• Many, mostly regional newspapers, 1950-1995

• Text + images of newspaper layout

• newscasts

• at the Netherlands Institute for Sound and Vision

• evening news and current affairs programs

• metadata in Dublin Core and CDMI format

• enriched with thesaurus terms from the Gemeenschappelijke Thesaurus Audiovisuele Archieven (GTAA)

Page 9: Bringing parliamentary debates to the Semantic Web

Semantic model: what do we need to represent? 1/2

• Important information for every parliamentary debate is:

• When the debate was held

• What is being said in the debate (topics)

• Who is giving the speeches in the debate and in which role (persons)

• Additional information about actors involved in the event (names of the politicians, their party, age, etc.)

• Structure: Subparts of the debate have their own identifiers (part of the debate where only one speaker can be identified as actor)

• Chronological order (the order in which the subparts occurred within the parliamentary debate)

• Named entities apart from politicians (persons, locations, etc.)

[Figure: structure of a debate. Debate metadata at the top; Topic 1 and Topic 2; a chronological sequence of speeches: Speaker 1 / Content, Speaker 2 / Content, Speaker 3 / Content, Speaker 1 / Content]
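The requirements listed on this slide can be read as a small nested data structure. The Python sketch below is one possible reading, not the PoliMedia model itself: all class and field names (`Debate`, `DebatePart`, `Speech`, `order`, `role`) are illustrative assumptions, and the example data is made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Speech:
    """A subpart of a debate with exactly one identifiable speaker."""
    identifier: str   # each subpart has its own identifier
    speaker: str      # who is giving the speech
    role: str         # role of the speaker (values here are hypothetical)
    order: int        # position in the chronological order of the debate
    content: str = ""


@dataclass
class DebatePart:
    """A part of the debate devoted to one topic."""
    identifier: str
    topic: str
    speeches: List[Speech] = field(default_factory=list)


@dataclass
class Debate:
    identifier: str
    held_on: date     # when the debate was held
    parts: List[DebatePart] = field(default_factory=list)

    def speeches_in_order(self) -> List[Speech]:
        """Return all speeches of the debate, sorted chronologically."""
        all_speeches = [s for part in self.parts for s in part.speeches]
        return sorted(all_speeches, key=lambda s: s.order)


# Made-up example; the assignment of speeches to topics is an assumption.
debate = Debate(
    identifier="debate-1",
    held_on=date(1983, 1, 1),
    parts=[
        DebatePart("debate-1.1", "Topic 1", [
            Speech("debate-1.1.1", "Speaker 1", "chair", 1),
            Speech("debate-1.1.2", "Speaker 2", "member", 2),
        ]),
        DebatePart("debate-1.2", "Topic 2", [
            Speech("debate-1.2.1", "Speaker 3", "member", 3),
            Speech("debate-1.2.2", "Speaker 1", "chair", 4),
        ]),
    ],
)
speaker_sequence = [s.speaker for s in debate.speeches_in_order()]
```

Keeping an explicit `order` field on each speech, rather than relying on list position, lets the chronological sequence survive even when speeches are grouped by topic.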

Page 10: Bringing parliamentary debates to the Semantic Web

Semantic model: what do we need to represent? 2/2

• Various information about media items linked to the debate

• Links between subparts of the debate and news articles, radio bulletins and television newscasts

Page 11: Bringing parliamentary debates to the Semantic Web

URIs

• PoliMedia vocabulary: http://purl.org/linkedpolitics/nl/polivoc#Speech

• Politicians, parties: http://purl.org/linkedpolitics/nl/poli#Beel

• Debates and parts of debates: http://purl.org/linkedpolitics/nl/nl.proc.sgd.d.198219830000846.2.11.12

• Media articles, bulletins and newscasts: http://resolver.kb.nl/resolve?urn=ddd:010069811:mpeg21:pdf
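To show how these URIs might appear in RDF, here is a minimal Python sketch that writes two N-Triples lines using only string formatting. The namespaces and example URIs are taken from this slide. Typing the debate part as `polivoc#Speech` is an assumption, the property name `coveredIn` is hypothetical, and pairing this debate part with this media item is purely illustrative.

```python
# Namespaces as shown on the slide.
POLIVOC = "http://purl.org/linkedpolitics/nl/polivoc#"  # PoliMedia vocabulary
DEBATES = "http://purl.org/linkedpolitics/nl/"          # debates and parts of debates
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple of three IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


speech = DEBATES + "nl.proc.sgd.d.198219830000846.2.11.12"
media_item = "http://resolver.kb.nl/resolve?urn=ddd:010069811:mpeg21:pdf"

lines = [
    # Assumption: this part of a debate is typed as a polivoc Speech.
    triple(speech, RDF_TYPE, POLIVOC + "Speech"),
    # Hypothetical property name; the link itself is illustrative only.
    triple(speech, POLIVOC + "coveredIn", media_item),
]
print("\n".join(lines))
```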

Page 12: Bringing parliamentary debates to the Semantic Web

Semantic model

Page 17: Bringing parliamentary debates to the Semantic Web

Semantic model

W.R. van Hage, V. Malaisé, R. Segers, L. Hollink and A.Th. Schreiber. Design and use of the Simple Event Model (SEM)

Page 19: Bringing parliamentary debates to the Semantic Web

Current work: finding links

• Queries: speaker name + named entities + topics (created using topic modeling methods) extracted from political events dataset

• used for retrieval of media articles

[Figure: query construction. Recoverable labels: NamedEntitiesVector (Speech), TopicWordSetVector (Speech), NamedEntitiesVector (PartOfDebate), TopicWordSetVector (PartOfDebate), TopicList, ActorFromSpeech (Speaker X), TimeFrame]
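A minimal sketch of how such a query might be assembled, assuming the ingredients are simply concatenated into a bag of terms. The slides do not specify term weighting, the retrieval engine, or how the time frame is chosen, so the symmetric window around the debate date, the names `MediaQuery` and `build_query`, and the example data are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class MediaQuery:
    terms: List[str]               # speaker name + named entities + topic words
    time_frame: Tuple[date, date]  # period in which to search for media items


def build_query(speaker: str,
                named_entities: List[str],
                topic_words: List[str],
                debate_date: date,
                window_days: int = 7) -> MediaQuery:
    """Combine the ingredients named on the slide into one query.

    The symmetric window around the debate date is an assumption.
    """
    terms: List[str] = []
    for term in [speaker, *named_entities, *topic_words]:
        if term not in terms:      # drop duplicates, keep first-seen order
            terms.append(term)
    delta = timedelta(days=window_days)
    return MediaQuery(terms, (debate_date - delta, debate_date + delta))


# Made-up example input.
query = build_query(
    speaker="Speaker X",
    named_entities=["Den Haag", "Speaker X"],
    topic_words=["begroting", "onderwijs"],
    debate_date=date(1983, 3, 10),
)
```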

Page 20: Bringing parliamentary debates to the Semantic Web

Finally

• SPARQL endpoint with the PoliMedia vocabulary + RDF of Dutch Hansard data will be available soon.

• Feel free to use it!

• Links to media + search/browse app are expected early next year.

Page 21: Bringing parliamentary debates to the Semantic Web

Henri Beunders (EUR)

Jaap Blom (NISV)

Laura Hollink (VU)

Geert-Jan Houben (TU Delft)

Damir Juric (TU Delft)

Max Kemman (EUR)

Martijn Kleppe (EUR)

Johan Oomen (NISV)

Thank you for your attention!