j.kleinnijenhuis@fsw.vu.nl

Department of Communication Science, VU University Amsterdam

Semantic NETwork analysis

Manual and automatic content analysis of source: agent / predicate / target relationships

Jan Kleinnijenhuis / Wouter van Atteveldt


The Network Institute, VU University Amsterdam

Topics

1. Introduction Semantic NETwork Content Analysis
2. Human coding, using CETA, iNET
3. Automatic coding
   1. Extraction of source: agent / (predicate) / target quadruples
   2. Sentiment analysis: (predicate) association .. dissociation
4. Discussion: extraction of issue positions


1. Introduction Semantic NETwork Content Analysis


Background
o subject / predicate / object
  o Early Greeks, both semantic and syntactic, agent / predicate / target
o "Names are like points; propositions like arrows" (1921)
  o xRy propositions, Ludwig Wittgenstein (Tractatus 3.144)
o Evaluative assertion analysis (1956)
  o Heider (1946) balance theory => cognitive consistency theory
  o Charles Osgood, Nunnally, Saporta (1956)
o Automatic content analysis, (co-)occurrence of keywords (1960s - ..)
  o Stone et al. (1965), The General Inquirer
  o Efficient indexing algorithms, e.g. Google, Lucene
o Semantic Network Analysis, relational content analysis (1980s - ..)
  o Van Cuilenburg et al. 1986; de Ridder 1994; Kleinnijenhuis et al. 1997, 2001; Van Atteveldt, forthcoming; also labeled as the NET method
o Semantic Web (1990s - …), xRy + logic => inferences


Definitions key concepts 1
o (meaning) object: entity
  o actor, issue, Ideal or unspecifiedReality
    o Actor: animated, e.g. person, group, organization
    o Issue: non-animated, e.g. employment, health care
    o Ideal: value, criterion for the evaluation of an actor or issue (e.g. the referent of entrancing in “Obama’s smile is entrancing”)
    o unspecifiedReality: e.g. the referent of it in “it’s lucky for Bush”
  o appearing as subjects (agents) and/or objects (targets, recipients) in texts
o ontology
  o A priori knowledge of relationships between meaning objects
    Politician -> Person -> Actor; Politician[period] -> Party -> Actor; Politician[period] -> PolFunction
    BarackObama -> Democrats [1994..?]; BarackObama -> PresCandidate [2007..?]
  o operationalized with an ontology dictionary: a set of (linguistically or statistically enhanced) queries to search for occurrences of separate meaning objects in natural language
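As a rough illustration of the two notions on this slide, the sketch below stores a tiny ontology as parent links with optional validity periods, and an ontology dictionary as one regular-expression query per meaning object. The names `ONTOLOGY`, `DICTIONARY`, `ancestors` and `find_objects`, the exact link structure and the search patterns are all assumptions made for this example; they are not taken from the authors' tooling.

```python
import re

# Hypothetical ontology: child -> list of (parent, period) links.
# A period is (start_year, end_year); end_year None mirrors the open "[1994..?]".
ONTOLOGY = {
    "BarackObama": [("Democrats", (1994, None)), ("PresCandidate", (2007, None))],
    "Democrats": [("Party", None)],
    "PresCandidate": [("PolFunction", None)],
    "Party": [("Actor", None)],
}

# Hypothetical ontology dictionary: one search query (regex) per meaning object.
DICTIONARY = {
    "BarackObama": re.compile(r"\b(barack\s+)?obama\b", re.IGNORECASE),
    "Democrats": re.compile(r"\bdemocrat(s|ic party)?\b", re.IGNORECASE),
}


def ancestors(obj, year):
    """Return every category that obj belongs to in the given year."""
    found = []
    for parent, period in ONTOLOGY.get(obj, []):
        if period is not None:
            start, end = period
            if year < start or (end is not None and year > end):
                continue  # link not valid in this year
        found.append(parent)
        found.extend(ancestors(parent, year))
    return found


def find_objects(text):
    """Return the meaning objects whose query matches the text."""
    return sorted(obj for obj, query in DICTIONARY.items() if query.search(text))
```

In this sketch, `ancestors("BarackObama", 2000)` yields the party membership but not the candidacy, because the candidacy link only starts in 2007.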


Definition Semantic NETwork Analysis

Extraction from texts of source: agent / predicate / target quadruples so as to infer conclusions from their network representation

o subject = agent {actors, issues; default = unspecifiedReality} = meaning object directing action or energy as described by the
o predicate {dissociations .. associations}
  Thesaurus: (possibly context-specific) synsets of words / multi-word units (mwu's) whose conjugations and combinations amount predictably to a value on the dissociation..association scale (e.g. cooperate +1; bomb -1)
o towards the object = target = recipient {actors, issues; default = Ideal}
o according to a (quoted or paraphrased) source {actors; default = author}

(cf. R.M.W. Dixon, 1992, A new approach to English grammar, on semantic principles)
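The quadruple and its defaults can be written down as a small data structure. The following is a minimal sketch, not the authors' implementation: the class `Quadruple`, the function `code_statement` and the treatment of "not" as a plain sign flip are assumptions for illustration, and only the thesaurus values for cooperate and bomb come from the slide.

```python
from dataclasses import dataclass

# Hypothetical mini-thesaurus: predicate lemma -> value on the
# dissociation (-1) .. association (+1) scale. Only these two entries
# appear on the slide.
THESAURUS = {"cooperate": 1.0, "bomb": -1.0}


@dataclass(frozen=True)
class Quadruple:
    """One source: agent / predicate / target statement with the slide's defaults."""
    predicate: str
    quality: float
    agent: str = "unspecifiedReality"
    target: str = "Ideal"
    source: str = "author"


def code_statement(lemmas, agent=None, target=None, source=None):
    """Build a Quadruple from predicate lemmas; missing roles fall back to defaults."""
    quality = sum(THESAURUS.get(lemma, 0.0) for lemma in lemmas)
    if "not" in lemmas:  # assumption: negation simply flips the sign
        quality = -quality
    quality = max(-1.0, min(1.0, quality))  # clamp to the scale
    roles = {"agent": agent, "target": target, "source": source}
    given = {name: value for name, value in roles.items() if value is not None}
    return Quadruple(predicate=" ".join(lemmas), quality=quality, **given)
```

Keeping the defaults in the data structure means that a statement without an explicit source is attributed to the author, and one without an explicit target is read as an evaluation against an Ideal.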


NET relation types with prototype examples


Issue positions: often ends, means and causal expectation in 1 sentence

Het CDA gaat door met ingrepen in de zorg om de overheidsuitgaven te beperken
(The CDA continues with interventions in health care in order to limit government spending)

o Issue position, means: CDA / continues with (+) / interventions in health care
o Issue position, end: CDA / wants to limit (-) / government spending
o Causal relationship: CDA: interventions in health care / in order to limit (-) / government spending
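To show how such codings turn into a network, the sketch below stores the three statements from the CDA example (with English glosses of the Dutch phrases) and averages predicate values per agent/target edge. The function names and the sign multiplication along a two-step path are assumptions for illustration; the slide itself does not specify how paths are combined.

```python
from collections import defaultdict

# The three statements coded on the slide, as
# (source, agent, predicate, quality, target). The glosses are translations
# made for this example; "author" is the default source.
STATEMENTS = [
    ("author", "CDA", "continues with", +1, "interventions in health care"),
    ("author", "CDA", "wants to limit", -1, "government spending"),
    ("CDA", "interventions in health care", "in order to limit", -1, "government spending"),
]


def build_network(statements):
    """Average predicate quality per (agent, target) edge."""
    totals = defaultdict(list)
    for _source, agent, _predicate, quality, target in statements:
        totals[(agent, target)].append(quality)
    return {edge: sum(values) / len(values) for edge, values in totals.items()}


def indirect_link(network, start, via, end):
    """Multiply the signs of two consecutive edges (illustrative path inference)."""
    return network[(start, via)] * network[(via, end)]
```

Here the indirect path from the CDA via health care interventions to government spending comes out negative, which agrees with the directly stated issue position on spending.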


2. Human coding, using CETA / iNET


NET by human coders using CETA (Jan A. de Ridder), iNET (Wouter van Atteveldt)


SNA by human coders using iNET, ontology lookup


SNA by human coders using iNET, network lookup


SNA by human coders using iNET, 3 more networks


3.1 Automatic coding. Source: subject / pred / object extraction


Tools ontology construction: co-occurrence analysis

Amos Tversky (1977): features of similarity

Islam*, terror* and immig* in NRC, AD 2004-2006
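One possible reading of this slide is that wildcard queries select the documents in which each concept occurs, and that a Tversky-style feature-matching measure is then computed on the resulting document sets. The sketch below follows that reading with Tversky's ratio model; the toy corpus, the whitespace tokenization and the names `matching_docs`, `tversky` and `HITS` are assumptions and do not reproduce the original analysis.

```python
import fnmatch

# Hypothetical toy corpus; the slide's real data are NRC and AD articles.
DOCS = [
    "islam debate after terror attack",
    "immigration policy and islamic schools",
    "terrorism suspects arrested",
    "immigrants and terror fears",
]
QUERIES = {"islam": "islam*", "terror": "terror*", "immig": "immig*"}


def matching_docs(pattern, docs):
    """Indices of documents containing at least one word matching the wildcard."""
    return {
        i for i, doc in enumerate(docs)
        if any(fnmatch.fnmatchcase(word, pattern) for word in doc.split())
    }


def tversky(a, b, alpha=1.0, beta=1.0):
    """Tversky ratio model on two sets; alpha = beta = 1 reduces to Jaccard."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 0.0


HITS = {name: matching_docs(pattern, DOCS) for name, pattern in QUERIES.items()}
```

With this toy corpus, `tversky(HITS["islam"], HITS["terror"])` is 0.25, while `alpha=1, beta=0` gives 0.5, the share of islam documents that also mention terror. The asymmetric weights are what distinguish Tversky's measure from a plain symmetric co-occurrence count.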


Tools ontology 2: syntactic profiling http://www.let.rug.nl/gosse/bin/verwant.py

BOLKESTEIN 1994-1995

Verbs (Dutch lemmas) with which Bolkestein is associated as direct object: kapittel, beticht, haal uit naar, zet af tegen, besta tussen, beweer, sla aan, citeer, typeer, kritiseer, beschuldig, waarschuw, bedien, verwijt, vergelijk, val aan, leg voor aan, roep, verras, overtuig

Verbs (Dutch lemmas) with which Bolkestein is associated as subject: schop-in de war, moraliseer, zoek aan, matig, scherts, overspeel, zwengel aan, verkwansel, nuanceer, snoer-de mond, herroep, kom-in botsing, zwalk, neem terug, krab, trek-van leer, vier feest, maak-korte metten, opteer, belijd

Adjectives (Dutch lemmas) with which Bolkestein is associated: negentiende-eeuws

BRINKMAN 1994-1995

Verbs (Dutch lemmas) with which Brinkman is associated as direct object: licht-beentje, sta-terzijde, tik-op de vingers, fluit terug, reken aan, stem op, ondervraag, interview, adviseer, wijs aan, schrijf af, prijs aan, eer, corrigeer, sta bij, kritiseer, houd-in de gaten, confronteer, schuif, spreek aan

Verbs (Dutch lemmas) with which Brinkman is associated as subject: diskwalificeer, paai, bijt vast, nuanceer, volhard, draag mee, sta-te woord, bezin, baal, profileer, blijf aan, leun, herzie, poseer, speculeer, beraad, leg neer, bid, heb-de tijd, betreur

Adjectives (Dutch lemmas) with which Brinkman is associated: gereformeerd, kil, arm, ander

© Gosse Bouma, RUG
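A syntactic profile of this kind can be approximated by counting, per name, the verbs that take it as subject or as direct object in parsed text. The sketch below uses raw counts on invented dependency triples; the function `profile`, the role labels and the data are hypothetical, and the association measure used by the original tool is not documented here.

```python
from collections import Counter

# Hypothetical dependency triples (verb lemma, role, name); a real profile
# would be computed from parsed newspaper text.
TRIPLES = [
    ("kritiseer", "obj", "Bolkestein"),
    ("kritiseer", "obj", "Brinkman"),
    ("beschuldig", "obj", "Bolkestein"),
    ("nuanceer", "subj", "Bolkestein"),
    ("nuanceer", "subj", "Brinkman"),
    ("moraliseer", "subj", "Bolkestein"),
    ("kritiseer", "obj", "Bolkestein"),
]


def profile(triples, name, role):
    """Verbs that take `name` in `role`, most frequent first (ties alphabetical)."""
    counts = Counter(verb for verb, r, n in triples if r == role and n == name)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Raw frequency favors common verbs, so a production version would probably rank by an association statistic instead; that choice is left open here.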


Automation NET


Alpino-tree ==> source: subject / pred / object


Concurrent validity of extraction, source: subject / (pred) / object


Predictive validity, Sources and Subjects (acting actors)


3.2 Automatic coding. Sentiment analysis assoc .. dissoc


Sentiment analysis: decomposition F1 performance


F1 per relation type, elections 2006 manual corpus
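For readers unfamiliar with the measure, F1 is the harmonic mean of precision and recall, computed here per label with the manual codings as the gold standard. The sketch below is a generic implementation; the function name and any label values passed to it are hypothetical, and the relation types of the 2006 corpus are not reproduced.

```python
def f1_per_label(manual, automatic):
    """Precision, recall and F1 for each label, treating manual codes as gold."""
    scores = {}
    pairs = list(zip(manual, automatic))
    for label in sorted(set(manual) | set(automatic)):
        tp = sum(m == label and a == label for m, a in pairs)  # true positives
        fp = sum(m != label and a == label for m, a in pairs)  # false positives
        fn = sum(m == label and a != label for m, a in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores
```

Reporting the three numbers per label, rather than one overall accuracy, makes it visible when a classifier does well on frequent labels and poorly on rare ones.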


Sentiment analysis: aggregate performance 2006 campaign


Extraction aggregate issue positions, 2006 campaign


Relative performance 2006 campaign full ‘grammar’ model

Cell entries represent correlation coefficients with manual content analysis
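A cell entry of this kind can be computed by correlating automatically extracted aggregate scores with the manually coded ones. The sketch below assumes the Pearson coefficient, which the slide does not state explicitly; the function name is ours and any input numbers would be the analyst's own aggregates.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) when either list is constant.
    return cov / math.sqrt(var_x * var_y)
```

Each pair of values would typically be one party/issue combination, with one score from the automatic method and one from manual content analysis.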


4. Discussion: Content Analysis of Issue Positions


Automatic extraction of Issue Positions presupposes
o Manual codings (machine learning; validity tests)
o Ontology of meaning objects (actors, issues, values, reality)
  o Ontology dictionary
o Linguistic preprocessing:
  o tokenization, lemmatizing, parsing => syntax graph
  o e.g. Van Noord, Bouma: ALPINO
o Identification
  o Syntax graph + rules => semantic roles of source, agent, target
  o Semantic roles + ontology dictionary + anaphora resolution + posthoc extraction => source: agent / pred(assoc..dissoc) / target
o Sentiment analysis => pred(assoc..dissoc)
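The identification step can be pictured as pattern rules applied to a syntax graph. The sketch below is a deliberately small illustration under several assumptions: the example sentence is invented, the edge labels `subj`, `obj` and `comp` are not Alpino's actual label set, and the two rules, the speech-verb list and the predicate lexicon are hypothetical stand-ins for the much richer rule set such a system needs.

```python
# Hypothetical parse of "Bolkestein says that the CDA supports Brinkman",
# written as (head, relation, dependent) edges.
EDGES = [
    ("says", "subj", "Bolkestein"),
    ("says", "comp", "supports"),
    ("supports", "subj", "CDA"),
    ("supports", "obj", "Brinkman"),
]
SPEECH_VERBS = {"says", "claims"}            # assumed source-introducing verbs
SENTIMENT = {"attacks": -1, "supports": 1}   # assumed predicate lexicon


def dependents(edges, head, relation):
    """All dependents attached to head with the given relation."""
    return [dep for h, rel, dep in edges if h == head and rel == relation]


def extract(edges):
    """Rule 1: subject of a governing speech verb => source.
    Rule 2: subject and object of any other verb => agent and target."""
    quads = []
    verbs = {head for head, _rel, _dep in edges}
    for verb in sorted(verbs - SPEECH_VERBS):
        source = "author"  # default when no speech verb governs this verb
        for speech in SPEECH_VERBS:
            if verb in dependents(edges, speech, "comp"):
                source = (dependents(edges, speech, "subj") or ["author"])[0]
        for agent in dependents(edges, verb, "subj"):
            for target in dependents(edges, verb, "obj"):
                quads.append((source, agent, verb, SENTIMENT.get(verb, 0), target))
    return quads
```

Anaphora resolution and the ontology dictionary lookup are left out; in a fuller pipeline they would map the extracted word forms to meaning objects before the quadruples are stored.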


Discussion: prospects for advance
o Ontology, ontology dictionary
  o e.g. more subissues, context-specific synonyms
o Rules to transform syntax graph => semantic roles
  o e.g. rules dealing with different modifier types
o Sentiment analysis
  o e.g. multi-word unit recognition; informed features
o Combining rule-based with statistical approaches (e.g. LSA) starting from high-order linguistic features
o Error analysis
  o e.g. more specific validity tests starting from manual coding
o Other languages .. new language domains ..


Literature

Antoniou, G., & van Harmelen, F. (2004). A semantic web primer. Cambridge: MIT Press.

Bouma, G. (2005). Zoek verwante woorden, uitgaande van Algemeen Dagblad en het NRC Handelsblad van 1994 en 1995 (80 miljoen woorden). Rijksuniversiteit Groningen / NWO: http://www.let.rug.nl/gosse/bin/verwant.py.

Bouma, G., & van Noord, G. (2005). ALPINO: automatisch ontleden van het Nederlands. RUG Alfa Informatica / NWO: http://www.let.rug.nl/~kleiweg/alpino/index1.html; http://www.let.rug.nl/~vannoord/alp/.

de Ridder, J. A. (1994). Van Tekst naar informatie: ontwikkeling en toetsing van een inhoudsanalyse-instrument. Amsterdam: Universiteit van Amsterdam (PhD thesis).

Dixon, R. M. W. (1992). A new approach to English grammar, based on semantic principles. Oxford: Clarendon.

Dixon, R. M. W. (2005). A semantic approach to English grammar. Oxford: Clarendon.

Kleinnijenhuis, J., de Ridder, J. A., & Rietberg, E. M. (1997). Reasoning in economic discourse: an application of the network approach to the Dutch press. In C. W. Roberts (Ed.), Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts (pp. 191-207). New York: Erlbaum.

Kleinnijenhuis, J., Scholten, O., van Atteveldt, W. H., van Hoof, A. M. J., Krouwel, A. P., Oegema, D., et al. (2007). Nederland vijfstromenland: de rol van media en stemwijzers bij de verkiezingen in 2006. Amsterdam: Bert Bakker.

Kleinnijenhuis, J., & van Atteveldt, W. H. (2006). Geautomatiseerde inhoudsanalyse, met de berichtgeving over het EU-referendum als voorbeeld. In F. Wester (Ed.), Inhoudsanalyse: theorie en praktijk. Dordrecht: Kluwer.

Kleinnijenhuis, J., van Hoof, A. M. J., Oegema, D., & de Ridder, J. A. (2007). A test of rivaling hypotheses to explain news effects: news on issue positions of parties, real world developments, support and criticism, and success and failure. Journal of Communication, 57(2), 366-384.

Krippendorff, K. (2004). Content Analysis. Thousand Oaks: Sage.

Osgood, C. E., Saporta, S., & Nunnally, J. C. (1956). Evaluative assertion analysis. Litera, 3, 47-102.

van Atteveldt, W. H. (2008, forthcoming). Extracting and Representing Semantic Networks from Textual Sources (working title). Amsterdam: PhD thesis, VU University Amsterdam.

van Atteveldt, W. H., Kleinnijenhuis, J., & Ruigrok, P. C. (submitted for publication 2007). Parsing, Semantic Networks and Political Authority: using syntactic analysis to extract semantic relations from Dutch newspaper articles.

van Atteveldt, W. H., Kleinnijenhuis, J., Ruigrok, P. C., & Schlobach, S. (2008, forthcoming). Good News or Bad News? Conducting sentiment analysis on Dutch text to distinguish between positive and negative relations. Resubmitted for publication in Journal of Information Technology and Politics; presented as a paper at Etmaal van de Communicatiewetenschap, Amsterdam, February 2008.

van Atteveldt, W. H., Schlobach, S., & van Harmelen, F. (2007). Media, Politics and the Semantic Web. In Proceedings of the European Semantic Web Conference 2007 (pp. 205-219). Berlin: Springer.

van Cuilenburg, J. J., Kleinnijenhuis, J., & de Ridder, J. A. (1986). Towards a graph theory of journalistic texts. European Journal of Communication, 1, 65-96.

van der Beek, L., Bouma, G., & van Noord, G. (2002). Een brede computationele grammatica voor het Nederlands. Nederlandse Taalkunde; downloaded from http://www.let.rug.nl/~vannoord/papers/taalkunde.pdf.