![Page 1: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/1.jpg)
Connecting political data to media data
Laura Hollink
VU University Amsterdam, Web & Media group
ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’, February 18, 2014
![Page 2: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/2.jpg)
Laura Hollink, Damir Juric, Geert-Jan Houben
Martijn Kleppe, Max Kemman, Henri Beunders
Johan Oomen, Jaap Blom
Funded by Clarin-NL
![Page 3: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/3.jpg)
![Page 4: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/4.jpg)
![Page 5: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/5.jpg)
Questions we want to answer
• Which events have attracted a lot of media attention?
• What are the differences between different media? E.g. in different newspapers, or newspapers vs. radio bulletins?
• Has the coverage changed over time?
• How are the events visualized (photos, layout of newspaper, etc.)?
![Page 6: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/6.jpg)
![Page 7: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/7.jpg)
Transcriptions of all 9,294 meetings of the Dutch parliament from 1945 to 1995, comprising 1,208,903 speeches.
![Page 8: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/8.jpg)
Archives of hundreds of newspapers, comprising a large number of issues and tens of millions of articles from 1618 to 1995.
(We only use 1945-1995)
![Page 9: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/9.jpg)
Roughly 1.8 million radio news bulletins from 1937 to 1984.
(We only use 1945-1995)
![Page 10: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/10.jpg)
PoliMedia methods
![Page 11: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/11.jpg)
Step 1: Translate the Dutch parliamentary debates to the standard structured web format RDF
[Figure: RDF graph of the debate nl.proc.sgd.d.194519460000002 (source XML by the War in Parliament project). The Debate node carries dc:id, dc:source, dc:publisher, dc:language, dc:date and dc:title metadata. hasPart links it to PartOfDebate nodes, ordered by hasSubsequentPartOfDebate, which in turn contain DebateContext nodes (hasText) and Speech nodes (hasSpokenText), ordered by hasSubsequentSpeech. A Speech has a coveredIn link to a newspaper article and a sem:hasActor link to a speaker node (Speaker_00064) with hasRole member_of_parliament, hasParty Party_kvp (KVP, Katholieke Volkspartij) and hasSpeaker. The Politician node (Joannes Antonius James Barge) has foaf:firstName, foaf:lastName, rdfs:label and dc:source http://resolver.politicalmashup.nl/nl.m.00064.]
![Page 12: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/12.jpg)
Modeling the debates as events
• An event has a date, a location, actors, and possibly sub-events.
• We build on the Simple Event Model (SEM).
• Links to the original sources
• Reusing existing vocabularies
[Figure: RDF description of the debate nl.proc.sgd.d.194519460000002 (rdf:type Debate). Properties shown: dc:id, dc:source (twice), dc:publisher, dc:language, dc:date, dc:title. Values shown: nl.proc.sgd.d.19720000002; http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002; http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf; http://statengeneraaldigitaal.nl/; Dutch; 1945-11-20; Handelingen Verenigde Vergadering...]
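The event view described on this slide can be sketched in a few lines of Python. This is only an illustration of "an event has a date, a location, actors, and possibly sub-events"; the class name `Event` and its field names are assumptions made for this example and are not the SEM or PoliMedia vocabulary terms.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Event:
    """Hypothetical in-memory stand-in for an event with optional sub-events."""
    identifier: str
    event_type: str
    when: date | None = None
    place: str | None = None
    actors: list[str] = field(default_factory=list)
    sub_events: list[Event] = field(default_factory=list)

    def walk(self):
        """Yield this event and all nested sub-events, depth first."""
        yield self
        for sub in self.sub_events:
            yield from sub.walk()


# Identifiers and the date are taken from the slides; the nesting is illustrative.
base = "nl.proc.sgd.d.194519460000002"
debate = Event(
    base, "Debate", when=date(1945, 11, 20),
    sub_events=[
        Event(base + ".1", "PartOfDebate", sub_events=[
            Event(base + ".1.1", "DebateContext"),
            Event(base + ".1.2", "Speech", actors=["Speaker_00064"]),
        ]),
    ],
)

event_types = [e.event_type for e in debate.walk()]
```

Treating a debate, its parts and its speeches as the same kind of object keeps the traversal code uniform, which is the practical benefit of modeling all of them as events.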
![Page 13: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/13.jpg)
• The part-of structure and chronological order of the debates.
[Figure: the debate nl.proc.sgd.d.194519460000002 (dc:title "Handelingen Verenigde Vergadering...") hasPart nl.proc.sgd.d.194519460000002.1 (rdf:type PartOfDebate), which is followed via hasSubsequentPartOfDebate by nl.proc.sgd.d.194519460000002.2. The part hasPart nl.proc.sgd.d.194519460000002.1.1 (rdf:type DebateContext, hasText "De voorzitter opent de vergadering…") and nl.proc.sgd.d.194519460000002.1.2 (rdf:type Speech, hasSpokenText "Mijnheer de Voorzitter, de Commissie van …"), which is followed via hasSubsequentSpeech by nl.proc.sgd.d.194519460000002.1.3.]
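As a rough sketch of how the part-of structure and the chronological order can be read back from such a graph, the example below stores a handful of triples as plain Python tuples. The identifiers and property names come from the slide; the short property strings stand in for full URIs, and the helper names `objects` and `follow_chain` are hypothetical.

```python
DEBATE = "nl.proc.sgd.d.194519460000002"

# (subject, predicate, object) tuples; predicates are abbreviated labels, not full URIs.
TRIPLES = [
    (DEBATE, "hasPart", DEBATE + ".1"),
    (DEBATE + ".1", "hasSubsequentPartOfDebate", DEBATE + ".2"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.1"),
    (DEBATE + ".1", "hasPart", DEBATE + ".1.2"),
    (DEBATE + ".1.2", "hasSubsequentSpeech", DEBATE + ".1.3"),
]


def objects(triples, subject, predicate):
    """Return all objects linked from `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def follow_chain(triples, start, predicate):
    """Follow a 'subsequent' property from `start` until the chain ends."""
    chain, seen = [start], {start}
    while True:
        nxt = objects(triples, chain[-1], predicate)
        if not nxt or nxt[0] in seen:  # stop at the end of the chain or on a cycle
            return chain
        chain.append(nxt[0])
        seen.add(nxt[0])


parts_of_first_part = objects(TRIPLES, DEBATE + ".1", "hasPart")
speech_order = follow_chain(TRIPLES, DEBATE + ".1.2", "hasSubsequentSpeech")
```

Here `hasPart` gives the containment hierarchy, while the `hasSubsequent…` properties give the order, so the two can be queried independently.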
![Page 14: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/14.jpg)
• The different roles and parties that a speaker can have in his/her career.
[Figure: the speech nl.proc.sgd.d.194519460000002.1.2 (rdf:type Speech, hasSpokenText "Mijnheer de Voorzitter, de Commissie van …", coveredIn http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr) is linked by sem:hasActor to Speaker_00064. Further nodes and labels shown: hasRole member_of_parliament; hasParty Party_kvp (rdf:type Party, hasAcronym KVP, hasFullName Katholieke Volkspartij); hasSpeaker; Politician (rdf:type) with foaf:firstName, foaf:lastName and rdfs:label values Joannes Antonius James, Barge, Barge.]
![Page 15: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/15.jpg)
Step 2: Linking speeches in the debate to the newspaper articles that cover them
We created a linking method to deal with our two challenges:
1. How to link documents that are so different in nature?
2. Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?
[Figure: linking pipeline. Debates → Detect topics in speeches → Topics; Debates → Detect Named Entities in speeches → Named Entities; Topics, Named Entities, Name of speaker and Date of debate → Create queries → Queries → Search newspaper archive → Candidate articles → Rank candidate articles → Links between speeches and articles.]
![Page 16: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/16.jpg)
Intuition 1: The name of the speaker should appear in the article and the article should be published within a week of the debate
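Intuition 1 amounts to a filter on candidate articles. The sketch below is one possible reading, not the project's implementation: the `Article` class, the function name `is_candidate`, and the example articles are hypothetical, and "within a week" is assumed to mean at most 7 days before or after the debate.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    """Hypothetical minimal record for a newspaper article."""
    url: str
    published: date
    text: str


def is_candidate(article, speaker_last_name, debate_date, window_days=7):
    """Keep an article if it names the speaker and is close in time to the debate.

    Assumption: a plain case-insensitive substring match on the last name,
    which can over-match (e.g. a name that is also part of a longer word).
    """
    close_in_time = abs((article.published - debate_date).days) <= window_days
    mentions_speaker = speaker_last_name.lower() in article.text.lower()
    return close_in_time and mentions_speaker


debate_date = date(1945, 11, 20)
articles = [
    Article("a1", date(1945, 11, 21), "De heer Barge sprak in de Kamer."),
    Article("a2", date(1945, 12, 15), "De heer Barge sprak in de Kamer."),
    Article("a3", date(1945, 11, 22), "Een bericht over iets anders."),
]
candidates = [a.url for a in articles if is_candidate(a, "Barge", debate_date)]
```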
![Page 17: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/17.jpg)
Intuition 2: The more the article and the speech overlap in terms of topics and named entities, the more they are related.
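Intuition 2 can be illustrated with a simple overlap score. The slides do not say which similarity measure was used, so the measure below (the fraction of speech terms that occur in the article) is an assumption chosen for brevity. The function names and the example data are hypothetical, and only single-word terms are handled.

```python
import re


def tokenize(text):
    """Lower-case a text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def overlap_score(speech_terms, article_text):
    """Fraction of the speech's topics/named entities found in the article."""
    terms = {t.lower() for t in speech_terms}
    if not terms:
        return 0.0
    return len(terms & tokenize(article_text)) / len(terms)


def rank(speech_terms, articles):
    """Sort (url, text) pairs by decreasing overlap with the speech terms."""
    return sorted(articles, key=lambda a: overlap_score(speech_terms, a[1]), reverse=True)


speech_terms = ["KVP", "budget", "Indonesia"]  # illustrative topics and named entities
articles = [
    ("a2", "The weather stays cold this week."),
    ("a3", "The budget was adopted yesterday."),
    ("a1", "The KVP discussed the budget for Indonesia."),
]
ranking = [url for url, _ in rank(speech_terms, articles)]
```

Adding terms from the larger part-of-debate, as in Setting 3 of the evaluation, would in this sketch simply mean passing a longer `speech_terms` list.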
![Page 18: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/18.jpg)
Evaluation: what do we use to rank the candidate articles?
• Experiment on 150 <newspaper article, speech in debate> pairs, 2 raters, K = 0.5
• Compare text of candidate articles to:
  • Setting 1: Named Entities in speech
  • Setting 2: Named Entities + Topics in speech
  • Setting 3: Named Entities + Topics in speech and larger part-of-debate
| Score | Setting 1 | Setting 2 | Setting 3 |
|---|---|---|---|
| I don’t know | 0.14 | 0.15 | 0.08 |
| 0 - unrelated | 0.38 | 0.23 | 0.12 |
| 1 - related | 0.29 | 0.36 | 0.36 |
| 2 - explicit mention of the debate | 0.19 | 0.26 | 0.44 |
| 1+2 | 0.48 | 0.62 | 0.80 |
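The "1+2" row is the share of pairs rated as related or as explicitly mentioning the debate. A short check of that arithmetic, using only the figures from the table (the dictionary keys are shorthand labels introduced for this example):

```python
SCORES = {
    "Setting 1": {"unknown": 0.14, "unrelated": 0.38, "related": 0.29, "explicit": 0.19},
    "Setting 2": {"unknown": 0.15, "unrelated": 0.23, "related": 0.36, "explicit": 0.26},
    "Setting 3": {"unknown": 0.08, "unrelated": 0.12, "related": 0.36, "explicit": 0.44},
}


def relevant_fraction(row):
    """Share of pairs scored 1 (related) or 2 (explicit mention), rounded to 2 decimals."""
    return round(row["related"] + row["explicit"], 2)


relevant = {name: relevant_fraction(row) for name, row in SCORES.items()}
totals = {name: round(sum(row.values()), 2) for name, row in SCORES.items()}
```

Each column sums to 1.00, and the relevant share rises from 0.48 to 0.62 to 0.80 as topics and the larger part-of-debate are added.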
![Page 19: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/19.jpg)
Results
• An open data set of Dutch parliamentary debates,
• with almost 3 million links between 450,000 speeches and URLs of 1.5 million newspaper articles and radio bulletins at the National Library,
• accessible through a Web demonstrator and through a SPARQL endpoint.
![Page 20: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/20.jpg)
Demo
![Page 21: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/21.jpg)
![Page 22: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/22.jpg)
![Page 23: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/23.jpg)
![Page 24: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/24.jpg)
![Page 25: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/25.jpg)
![Page 26: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/26.jpg)
![Page 27: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/27.jpg)
SPARQL endpoint
• A service to query a knowledge base using the SPARQL query language.
“All speeches with more than 60 associated news items.”
SELECT ?speech ?no_news_items {
  {
    SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE { ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news . }
    GROUP BY ?speech
  }
  FILTER (?no_news_items > 60)
}
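To make the logic of the query concrete, the sketch below performs the same steps (group by speech, count linked news items, keep counts above 60) on an in-memory list of (speech, news item) pairs. It does not contact the endpoint; the function name and the example identifiers are hypothetical.

```python
from collections import Counter


def speeches_with_many_news_items(links, threshold=60):
    """Mirror of the query: GROUP BY speech, COUNT news items, FILTER count > threshold."""
    counts = Counter(speech for speech, _news in links)
    return {speech: n for speech, n in counts.items() if n > threshold}


# Illustrative data: speech_A has 61 linked items, speech_B exactly 60.
links = [("speech_A", f"news_{i}") for i in range(61)]
links += [("speech_B", f"news_{i}") for i in range(60)]

result = speeches_with_many_news_items(links)
```

As in the query, the comparison is strict, so a speech with exactly 60 linked items is not returned.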
![Page 28: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/28.jpg)
![Page 29: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/29.jpg)
![Page 30: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/30.jpg)
![Page 31: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/31.jpg)
![Page 32: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/32.jpg)
Reflection: to what extent can we answer these questions?
• Which events have attracted a lot of media attention?
• What are the differences between different media? E.g. in different newspapers, or newspapers vs. radio bulletins?
• Has the coverage changed over time?
• How are the events visualized (photos, layout of newspaper, etc.)?
![Page 33: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/33.jpg)
Future work
• More types of links
  • From just “coveredIn” to “quotedIn”, “coveredIn”, “backgroundOf”, “talksAbout”
• More types of media
• More types of (political) events.
![Page 34: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/34.jpg)
Project ‘Talk of Europe / Traveling Clarin Campus’, 2014-2015
Funded by CLARIN-ERIC
From left to right: Max Kemman, Marnix van Berchum, Laura Hollink, Astrid van Aggelen, Steven Krauwer, Henri Beunders. (Unfortunately, Martijn Kleppe and Johan Oomen were not present to join the group pic.)
![Page 35: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/35.jpg)
Plans of ‘ToE/TTC’
1. Publish proceedings of the EU parliamentary debates in RDF
  • hosted by DANS
2. Organize 3 workshops/hackathons/‘Traveling Clarin Campuses’ in which we invite international partners to work with the data.
3. In collaboration with international partners:
  • enrich with annotations, e.g. topics, structured data about people, parties, etc.
  • link to national datasets, e.g. media or national parliaments
![Page 36: Connecting political data to media data](https://reader034.vdocuments.us/reader034/viewer/2022042716/55c11935bb61ebc5328b465d/html5/thumbnails/36.jpg)