Anti-system web portals and their network of meaning: a corpus-based approach in Czech Masako Fidler (Brown University) Václav Cvrček (Charles University) February 7, 2020 AATSEEL, San Diego


Anti-system web portals and their network of meaning:

a corpus-based approach in Czech

Masako Fidler (Brown University)
Václav Cvrček (Charles University)

February 7, 2020 AATSEEL, San Diego


Objectives

Important features of anti-system web portals
• ideological partiality (selection of topics, aspects of events-participants)
• discourse spins (evaluation, sentiment)
• manipulation (+ potential effects on the audience)

..., which are manifested by (explained more later):
• Keywords (KWs) = pointers to topics
• “Associative links” = likely co-habitation of two topics in one text
• Repetition (cloning whole texts, repeating parts of texts taken from somewhere else)

• Highlights of the current pilot study
• Further research: contrasting the results with other media classes.


Part of a larger project

On methodology:

• Keyword analysis using Difference Index, time-sensitive perception of political texts (Fidler & Cvrček 2015)

• Inflection as a source of information on discourse strategies – the idea of keymorph analysis (Fidler & Cvrček 2019)

On anti-system web portals:

• Multi-prominence analysis, keylemmas, keymorphs, clauses containing keywords showing political stance. (Fidler & Cvrček 2018, Cvrček & Fidler 2019)

The current presentation:

• new method of exploring a larger network of meanings

• prospect of constantly updated and searchable web portal texts with diverse political orientations


Data – WebMedia corpus (a “start-up” corpus)

• Data from Czech news portals (10/2017–10/2018)
• Total: 94 M tokens, 44 K texts
• Classification of media types based on Josef Šlerka’s mapamedii.cz:
• Similarity of audience (based on Facebook posts, Alexa Rank)

• lemmatized, tagged (CNC)


Audience-based classification

Source: J. Šlerka, http://www.mapamedii.cz/mapa/typologie/index.php

Class                                   Tokens       Docs
Alternativní ‘alternative’              3,761,057    2,464
Antisystémové ‘anti-system’             14,666,404   7,401
Bulvár ‘tabloids’                       13,435,010   3,816
Levý střed ‘left-wing mainstream’       14,511,164   5,695
Politický bulvár ‘political tabloid’    29,617,511   13,088
Pravý střed ‘right-wing mainstream’     18,121,533   12,110


Corpus-based methods for interpreting discourse

What’s on the menu:

1. Collocation analysis

2. Keyword analysis (KWA):
• Corpus-linguistics method to identify prominent units (keywords, KWs) (Scott 1997, 2006, 2010)
• KWs = words which appear statistically significantly more often in the target text(s) than in the reference corpus (SYN2015)

3. Cloned texts (KWs and qualitative analysis)

4. Market Basket analysis (MBA):
• Data-mining technique for identifying interrelated choices among sets of items (shopping items, keywords in texts…), extracts associative links
• Output: rules (associative links) = patterns of behavior (antecedent → consequent) allowing for anticipation (A → B: “if a text contains the word A, it will most probably also contain the word B somewhere in the text”)

Theoretical motivations for using the methods:
• KWs point to major topics (what the text is about)
• Associative links point to associations shared among individual texts.
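A minimal sketch of how a single word could be tested for keyness in one text against a reference corpus. It assumes the common two-term log-likelihood formulation and takes DIN (Fidler & Cvrček 2015) to be 100 × (relFq_target − relFq_ref) / (relFq_target + relFq_ref); both are assumptions about the formulas, not a reproduction of the authors' tooling. The default thresholds mirror the values reported elsewhere in these slides (min. frequency 3, significance level 0.001, DIN 70); all function names are hypothetical.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood (G2) for one word in a target text vs. a reference corpus."""
    combined = freq_target + freq_ref
    total = size_target + size_ref
    g2 = 0.0
    for observed, size in ((freq_target, size_target), (freq_ref, size_ref)):
        expected = size * combined / total
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def din(freq_target, size_target, freq_ref, size_ref):
    """Difference Index as assumed here: ranges from -100 to 100.

    The word must occur at least once in the target or the reference corpus.
    """
    rel_t = freq_target / size_target
    rel_r = freq_ref / size_ref
    return 100 * (rel_t - rel_r) / (rel_t + rel_r)


def is_keyword(freq_target, size_target, freq_ref, size_ref,
               min_freq=3, critical_ll=10.83, min_din=70):
    """Apply frequency, significance and effect-size thresholds in turn.

    10.83 is the chi-square critical value for p = 0.001 with 1 degree of freedom.
    """
    if freq_target < min_freq:
        return False
    return (log_likelihood(freq_target, size_target, freq_ref, size_ref) >= critical_ll
            and din(freq_target, size_target, freq_ref, size_ref) >= min_din)


# Hypothetical counts: 30 hits in a 1,000-token text, 100 hits in a 1M-token reference corpus.
print(is_keyword(30, 1000, 100, 1_000_000))
```

A word that never occurs in the reference corpus gets DIN = 100 under this formula, which would be consistent with the maximal values shown for some anti-system keywords.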


What we actually did so far

1. Compilation and annotation of WebMedia corpus

2. KW identification within each text (circa 7,400 texts, each containing on average 34 KWs)

• KWA based on lemmas
• Min. frequency in a text: 3
• Significance level (based on log-likelihood): 0.001
• Min. effect size (based on DIN; Fidler and Cvrček 2015): 70

3. Establishing associative links (via MBA) in each media class

• Min. support (proportion of texts containing both A and B) = 0.1% (the link appears in at least 7 texts out of 7,000)
• Min. confidence (how often B appears in texts that contain A) = 75%
• Lift = how much more likely KW B is to be present in a text once KW A is known to be in it, relative to B's overall rate
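The three measures in step 3 can be sketched for single-keyword rules A → B. This is a simplified illustration, assuming each text is represented by its set of keywords and that only one-item antecedents are of interest; a full MBA implementation (e.g. the Apriori algorithm) also handles larger itemsets. The function name and the toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations


def mine_pair_rules(texts_keywords, min_support=0.001, min_confidence=0.75):
    """Return rules A -> B with support, confidence and lift, strongest lift first."""
    n_texts = len(texts_keywords)
    item_counts = Counter()
    pair_counts = Counter()
    for keywords in texts_keywords:
        unique = set(keywords)
        item_counts.update(unique)
        # Ordered pairs, so that A -> B and B -> A are evaluated separately.
        pair_counts.update(permutations(sorted(unique), 2))

    rules = []
    for (lhs, rhs), both in pair_counts.items():
        support = both / n_texts                       # share of texts with both A and B
        confidence = both / item_counts[lhs]           # share of A-texts that also have B
        lift = confidence / (item_counts[rhs] / n_texts)  # confidence relative to B's base rate
        if support >= min_support and confidence >= min_confidence:
            rules.append({"lhs": lhs, "rhs": rhs, "support": support,
                          "confidence": confidence, "lift": lift})
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)


# Toy data (hypothetical): eight texts, each reduced to its keyword set.
toy_texts = [
    {"FSB", "ruský"}, {"FSB", "ruský"}, {"FSB", "ruský", "USA"}, {"ruský"},
    {"USA"}, {"USA", "válka"}, {"válka"}, {"EU"},
]
for rule in mine_pair_rules(toy_texts, min_support=0.1):
    print(rule["lhs"], "->", rule["rhs"], round(rule["lift"], 3))
```

In the toy data, FSB occurs in three texts and always together with ruský, which itself occurs in half of all texts, so the rule FSB → ruský has confidence 1.0 and lift 2.0.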


Keywords


Anti-system unique keywords (samples)

• neocon (DIN = 100)
• anglosionistická ‘Anglo-Zionist’ (100)
• apostata ‘apostate’ (100)
• Russiagate (99.153)
• amík ‘Ami’ (98.0)
• havloidní ‘Havel-ish (pejor.)’ (97.522)
• dolarizace ‘dollarization’ (97.397)
• židozednářský ‘Judeo-Masonic’ (97.146)
• Armageddon (92.731)
• antirusismus ‘anti-Russianism’ (95.98)
• vazalství ‘vassalage’ (93.899)

References to the US, neoconservatives, negative evaluative term for Americans/US and US economic expansion

Heresy of the Western church (Pope Francis)

Words pointing to antisemitism and conflation with freemasonry and Anglo-Saxonness.

References to a catastrophe

Concern for anti-Russian actions (situations, entities)

Concern for subjugation


1. Key keywords (top 10) > motivation for focusing on Russia and USA (and EU for comparison)

Anti-system media
Keyword (lemma)             # of texts
Rusko ‘Russia’              1931
USA                         1696
prezident ‘president’       1680
ruský ‘Russian’             1563
americký ‘American’         1472
vláda ‘government’          1300
politický ‘political’       1212
válka ‘war’                 1105
země ‘country’              1085
EU                          1025

(Center-right) mainstream media
Keyword (lemma)                                   # of texts
podle ‘according to’                              2730
prezident ‘president’                             2312
vláda ‘government’                                2262
uvést ‘say, state’                                1797
Babiš [Prime Minister of the Czech Republic]      1587
volba ‘election’                                  1540
soud ‘court, trial’                               1382
strana ‘party’                                    1324
Zeman [President of the Czech Republic]           1200
předseda ‘chairman (of the government)’           1155

Key keywords = words which appear as keywords in many texts in the corpus

Anti-system media tend to deal with specific aspects of international events, while center-right mainstream media are more interested in domestic politics.
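As an illustration of how such a ranking can be obtained, the sketch below counts, for each lemma, the number of texts in which it was identified as a keyword. It assumes the per-text keyword lists are already available; the function name and the toy input are hypothetical.

```python
from collections import Counter


def key_keywords(texts_keywords, top_n=10):
    """Rank lemmas by the number of texts in which they are a keyword."""
    counts = Counter()
    for keywords in texts_keywords:
        counts.update(set(keywords))  # each text contributes at most once per lemma
    return counts.most_common(top_n)


# Toy input (hypothetical): keyword lists of three texts.
toy = [["Rusko", "USA"], ["Rusko", "válka"], ["Rusko", "USA", "USA"]]
print(key_keywords(toy, top_n=2))
```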


2. KW co-occurrence EU (sample) (1st approx)


Cloned texts


3. Text cloning in anti-system web portals

Class               Texts in total   Duplicated texts   % dupl.
Alternative         2464             27                 1.10%
Anti-system         7401             1923               25.98%
Tabloid             3816             5871               153.85%
Center-left         5695             71                 1.25%
Political tabloid   13088            933                7.13%
Center-right        12110            626                5.17%

NOTE

• Text duplication in tabloid portals differs from that in anti-system portals: the former occurs for technical reasons

⇒ Need to look at the texts themselves qualitatively
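One possible way to flag exact clones before the qualitative reading is to hash each text after light normalization and group identical hashes. This is only a sketch under that assumption: the slides do not state how duplicates were detected or counted, so numbers produced this way need not match the table. The function names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def normalize(text):
    """Collapse whitespace and lowercase so trivial layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def duplicated_share(texts):
    """Return (number of texts that have at least one exact clone, their share in %)."""
    groups = defaultdict(list)
    for index, text in enumerate(texts):
        digest = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
        groups[digest].append(index)
    duplicated = sum(len(members) for members in groups.values() if len(members) > 1)
    return duplicated, 100 * duplicated / len(texts)


# Toy input (hypothetical): the first two texts differ only in case and spacing.
toy = ["Válka  a mír", "válka a mír", "Jiný text", "Ještě jiný text"]
print(duplicated_share(toy))
```

Exact hashing misses clones that were modified in transit; near-duplicate detection (for example, shingling) would be needed for those.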


Some samples of KWs from the top 5 texts and recurrent KWs (highlighted)
• military conflicts, threat, and domination
• political conflict and press + evaluation (čistka ‘purge’, presstitutky ‘presstitutes’)
• Russia and the US

Cloned texts (with notes) and their KWs

1 (selected, translated): americký, prezident, válka, akt, Trump, Kreml, Russiagate, institut, Rusko, nejen, volba, ruský, tvrzení, útok, čili, demokrat

2 (selected, translated): Rusko, ruský, Washington, evropský, americký, válka, západ, pochopit, vojenský, proti, nařídit, přesvědčit, zájem, otevřeně, vláda, prezident, napadení, nadvláda, provincie, britský, existovat

3 (selected, translated): Korea, severní, válka, americký, USA, proti, korejský, světový, ztratit, lid, generál, Trump, hrozba, představovat, obyvatel, krize, nukleární, oheň, populace, oběť, KLDR, válečný, vojenský, zběsilost

4 (Czech author): Praha, pak, náš, škola, pražský, vláda, republika, názor, NATO, EU, politický, Brusel, jinak, přijmout, žák, obyvatel, Drahoš, lid, prezident, území, Pražák, včetně

5 (selected, translated): Trump, presstitutky, zbavit, americký, USA, fakt, jestliže, Amerika, prostě, věřit, prezident, Mueller, demokratický, čistka, demokrat, Rusko


Properties of cloned texts (illustration)
• Texts from North American websites
• Modification suggesting interest specifically in military threats
• Encouraging further cloning, including request for financial contribution to the source site
• Speed suggests priority: fast cloning of texts within one to two days
• Repetition: casting a wider net to reach more readers
• Repetition of whole texts → the image of the “united front” of the group of web portals (in contrast to the mainstream media)


Associative links among KWs


Associative links

Higher level of abstraction; which topics are interrelated in discourse

Based on:
• KWs of individual texts
• Co-occurrence of KWs in texts within media classes


4. Associative links – general info

Sample for closer examination
• Reasonably strong (lift > 4)
• Containing US, Russia (and EU)

• Rusko, ruský, proruský, protiruský, RF, Putin, Putinův, Moskva, moskevský, Kreml, kremelský, Rus, FSB, GRU, Lavrov, Medveděv, Lavrovův, Medveděvův

• Amerika, americký, proamerický, protiamerický, USA, Trump, Trumpův, Washington, washingtonský, Obama, Obamův, Američan, Fed, CIA, FBI, Tillerson, Tillersonův, Pompeo, Pompeův

• EU, EP, unie, unijní, prounijní, evropský, protiunijní, Brusel

Total number of all links
• 11,427 (all classes)
• Anti-system: 2027 (cf. out of 7,400 texts)
• Center-right mainstream: 1205 (cf. out of 12,100 texts!)

⇒ anti-system portals have a wider network of associations
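The selection criteria and the comparison of link counts can be sketched as follows. The rule records and the helper names are hypothetical; the link and text counts are the ones given on this slide. Normalizing by the number of texts is an added illustration of why 2027 links from fewer texts indicate a denser network.

```python
def select_links(rules, targets, min_lift=4.0):
    """Keep rules stronger than min_lift that involve at least one target lemma."""
    return [rule for rule in rules
            if rule["lift"] > min_lift
            and (rule["lhs"] in targets or rule["rhs"] in targets)]


def links_per_1000_texts(n_links, n_texts):
    """Number of associative links normalized to 1,000 texts."""
    return 1000 * n_links / n_texts


US_LEMMAS = {"USA", "americký", "Trump"}  # subset of the US lemma list above

toy_rules = [
    {"lhs": "Nixon", "rhs": "americký", "lift": 4.190},   # from the slides
    {"lhs": "volba", "rhs": "strana", "lift": 6.0},        # hypothetical, no target lemma
    {"lhs": "Trump", "rhs": "prezident", "lift": 2.5},     # hypothetical, too weak
]
selected = select_links(toy_rules, US_LEMMAS)
print([rule["lhs"] for rule in selected])

anti_system = links_per_1000_texts(2027, 7400)
center_right = links_per_1000_texts(1205, 12100)
print(round(anti_system, 1), round(center_right, 1))
```

With the slide's figures this gives roughly 274 links per 1,000 texts for anti-system portals against roughly 100 for center-right mainstream portals.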


Associative links leading both to Russia and US

Suggest points of common interest/contention between Russia and US
• Intelligence activities involving Russia and US: FSB, department
• Ukraine conflicts: boeing (MH17), ATO, luganský, neonacista ‘neo-Nazi’
o Evaluation (neonacista) – high in anti-system and alternative media, but KW links point to Ukraine conflicts for anti-system and to Germany for alternative media
o Use of luganský (closer to the Russian equivalent) rather than luhanský (former nearly non-existent in other portals, latter more dispersed over portals)
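Keywords that lead to both countries can be found by intersecting the antecedents of the two rule sets. The sketch below uses a handful of rules copied from the two lift tables on the following slides; the grouping of right-hand sides into a Russia set and a US set, and the function name, are assumptions made for illustration.

```python
RUSSIA_RHS = {"ruský", "Putin"}
US_RHS = {"USA", "americký"}

# (lhs, rhs, lift): a small subset of the rules shown in the lift tables.
rules = [
    ("FSB", "USA", 4.364), ("FSB", "ruský", 4.735),
    ("boeing", "USA", 4.364), ("boeing", "ruský", 4.735),
    ("luganský", "USA", 4.103), ("luganský", "ruský", 4.523),
    ("Nixon", "americký", 4.190),
    ("Osetie", "Putin", 9.834),
]


def shared_antecedents(rules, group_a, group_b):
    """Left-hand sides that have at least one rule into each of the two groups."""
    into_a = {lhs for lhs, rhs, _ in rules if rhs in group_a}
    into_b = {lhs for lhs, rhs, _ in rules if rhs in group_b}
    return sorted(into_a & into_b)


print(shared_antecedents(rules, RUSSIA_RHS, US_RHS))
```

For this subset the shared antecedents are FSB, boeing and luganský, while Nixon and Osetie each lead to only one side.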


Associative links → specifically US

lhs                                rhs        lift
Obamův ‘Obama’s’                   americký   5.028
FSB                                USA        4.364
boeing                             USA        4.364
department                         USA        4.364
Ato                                USA        4.364
automobil                          USA        4.196
Nixon                              americký   4.190
ozbrojenec ‘armed person’          USA        4.170
Porošenkův ‘Porošenko’s’           USA        4.121
Arm ‘(US) Army’                    americký   4.114
uskutečnit ‘to make it happen’     USA        4.107
luganský                           USA        4.103
neonacista                         USA        4.052

• Associative links to the US (lift 4.1–5) are in general weaker than those connecting to Russia (text is more densely connected to Russia than to the US)

• US appears in discourse dealing with armed forces, leaders not very forthcoming to Russia (US as anti-Russia, associated with force, possibly aggression)


Associative links → specifically to Russia

lhs                                         rhs     lift
Osetie                                      Putin   9.834
GP                                          Putin   9.219
Sobčak                                      Putin   8.692
Osetie                                      ruský   4.735
FSB                                         ruský   4.735
Lotyšsko                                    ruský   4.735
Sevastopol                                  ruský   4.735
boeing                                      ruský   4.735
department                                  ruský   4.735
Ato                                         ruský   4.735
Savčenková                                  ruský   4.735
FIFA                                        ruský   4.735
Kemerovo                                    ruský   4.735
Sobčak                                      ruský   4.735
luganský                                    ruský   4.523
mužstvo ‘team’                              ruský   4.520
Bělorusko                                   ruský   4.439
neonacista                                  ruský   4.397
komentovat                                  ruský   4.375
Babčenko                                    ruský   4.371
MOV ‘International Olympic Committee’       ruský   4.262
Dúmě                                        ruský   4.262
Petrov                                      ruský   4.209
Dagestán                                    ruský   4.209
dopingový                                   ruský   4.209
uskutečnit ‘to make it happen’              ruský   4.178
azovský                                     ruský   4.143
flotila                                     ruský   4.077
Zacharčenko                                 ruský   4.059
ostřelování                                 ruský   4.007
automobil                                   ruský   4.007

Various threats to Russia
• Pro-Ukraine Savčenková, Babčenko, Zacharčenko as victim
• Doping (FIFA, mužstvo, MOV, dopingový)
• Border states (Ukraine, Belarus, Latvia)
• Internal threats (Osetie, Dagestán)
• Diplomatic threat (Unfair action by the US and European states, after Kemerovo tragedy)
• no explicit reference to specifically Russian armed forces


Observations

1. Concepts unique to the portals – key keywords unique to the antisystem portals – point to unique topics: evaluative words for US, antisemitism (more to be examined)

2. Cloned texts – point to the “ready-to-serve” ideas by recurring in many portals: catastrophic future, danger of provoking Russia (more to be examined)

3. Associative links – likely co-habitation of two KWs in one text
• threats to Russia – consistent with the results of the Multi-level Discourse Prominence Analysis (Fidler & Cvrček 2018, Cvrček & Fidler 2019 on Sputnik Czech Republic)
• US associated with force
• Russia and US share associative links suggesting common interests/points of contention


References

Baker, P. 2004. ‘Querying keywords: questions in difference, frequency, and sense in keyword analysis’, Journal of English Linguistics 32 (4), pp. 346–59.

Fidler, Masako, and Václav Cvrček. 2015. A data-driven analysis of reader viewpoint: Reconstructing the historical reader using keyword analysis. Journal of Slavic Linguistics 23(2): 197–239.

Fidler, Masako, and Václav Cvrček. 2018. Going Beyond “Aboutness”: A Quantitative Analysis of Sputnik Czech Republic. In: Fidler, M. and V. Cvrček eds, Taming the Corpus. From Inflection and Lexis to Interpretation. Springer, 195–225.

Cvrček, Václav, and Masako Fidler. 2019. More than keywords: Discourse prominence analysis of the Russian Web portal Sputnik Czech Republic. In: Berrocal, M. and A. Salamurović eds, Political Discourse in Central, Eastern and Balkan Europe. Amsterdam/Philadelphia: John Benjamins. 93–117.

Fidler, Masako, and Václav Cvrček. 2017 [2019]. Keymorph analysis, or how morphosyntax informs discourse. Corpus Linguistics and Linguistic Theory. 15/1, 39–70.

Scott, Mike. 1997. “PC analysis of key words—and key key words”. System 25(2): 233–45.

Scott, Mike. 1999. WordSmith tools help manual, Version 3.0. Oxford: Oxford University Press.

Scott, Mike. 2010. “Problems in investigating keyness, or cleansing the undergrowth and marking out tails…”. Marina Bondi and Mike Scott, eds. Keyness in texts. Amsterdam: Benjamins, 43–57.

Scott, Mike and Christopher Tribble. 2006. Textual patterns: Keyword and corpus analysis in language education. Amsterdam: Benjamins.

Šlerka, Josef. mapamedii.cz


Thank you!

Acknowledgements: This research was supported in part by the programme Progres Q08 Czech National Corpus implemented at the Faculty of Arts, Charles University, and the Humanities Research Grant (Brown University).