the language of politics - denise milizia website · 1 and the language of politics (updated may...


and

the language of politics

(updated May 2011)

1. What is Corpus Linguistics? Corpus Linguistics is the study of language based on examples of real-life language use.

2. What is a corpus? A corpus is a computerized collection of texts amenable to automatic or semi-automatic analysis. The texts are selected according to explicit criteria in order to capture the regularities of a language, a language variety or a sub-language. The data may be spoken, written or intermediate (written and spoken merged), and can be used as a starting point for linguistic description or as a means of verifying hypotheses about a language. Corpus is a Latin word; the plural is corpora.

3. What is the difference between a parallel corpus and a comparable corpus? A parallel corpus consists of a set of texts in one language and their translations into another language: its texts stand in a translational relationship to each other. Examples of parallel corpora are the documents of the European Union, for instance the Lisbon Treaty (2007), written in 23 original languages (although we do not know in which language the texts originated). Article I-8 below (in English and Italian), for example, is taken from the European Constitution (2004), written in 21 original languages:

Article I-8

The symbols of the Union

The flag of the Union shall be a circle of twelve golden stars on a blue background. The anthem of the Union shall be based on the “Ode to Joy” from the Ninth Symphony by Ludwig van Beethoven. The motto of the Union shall be “United in Diversity”. The currency of the Union shall be the euro. Europe day shall be celebrated on 9 May throughout the Union.

Articolo I-8

I simboli dell’Unione

La bandiera dell’Unione rappresenta un cerchio di dodici stelle dorate su sfondo blu. L'inno dell'Unione è tratto dall'«Inno alla gioia» della Nona sinfonia di Ludwig van Beethoven. Il motto dell'Unione è: «Unita nella diversità». La moneta dell'Unione è l'euro. La giornata dell'Europa è celebrata il 9 maggio in tutta l'Unione.

As we read in Article 448 of the failed European Constitution, all the language versions are equally original. Of course, this is hard to believe: we think that one is the original language (very likely French or English, or both) and all the others are translations.


Article IV-448

Authentic texts and translations

1. This Treaty, drawn up in a single original in the Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Slovak, Slovenian, Spanish and Swedish languages, the texts in each of these languages being equally authentic, shall be deposited in the archives of the Government of the Italian Republic, which will transmit a certified copy to each of the governments of the other signatory States. […]

Articolo IV-448

Testi autentici e traduzioni

1. Il presente trattato, redatto in unico esemplare in lingua ceca, danese, estone, finlandese, francese, greca, inglese, irlandese, italiana, lettone, lituana, maltese, olandese, polacca, portoghese, slovacca, slovena, spagnola, svedese, tedesca e ungherese, il testo in ciascuna di queste lingue facente ugualmente fede, sarà depositato negli archivi del governo della Repubblica italiana, che provvederà a trasmetterne copia certificata conforme a ciascuno dei governi degli altri Stati firmatari. […]

Other examples of parallel texts are the in-flight magazines that we find on airplanes. If we fly from Rome to New York, for example, the magazine will almost certainly be written in English and Italian, one of the two being the original and the other its translation. Yet another example is the articles of The Economist translated weekly by the Italian magazine Economy (published with Panorama).

The world’s leading banks decided some years ago that lending is a mug’s game. They began to get rid of their loans, repackaging them and selling them off as securities, or getting others to re-insure their risk. And the policy has been bearing fruit. The glut of corporate bankruptcies in 2001 and 2002 – including the two biggest of all time, Enron and WorldCom – have not had the devastating effect on the big banks’ balance sheets that might have been expected. The two biggest banks in America, for instance, have hardly registered a tremor. Citigroup’s profits for the second quarter of this year were $4.3 billion (12% up on a year earlier), and those of J.P. Morgan Chase were $1.8 billion for the same …

Alcuni anni fa, le principali banche internazionali hanno capito la scarsa redditività dei prestiti. Così, hanno cominciato a disfarsi dei propri mutui, riconfezionandoli e rivendendoli sotto forma di titoli, oppure ottenendo che il rischio collegato ai prestiti venisse preso in carico da altri. Questa politica ha dato buoni frutti. I numerosi fallimenti societari avvenuti nel 2001 e 2002 – compresi quelli di Enron e WorldCom, i maggiori di tutti i tempi in USA – infatti non hanno avuto l’effetto devastante sui bilanci delle grandi banche che molti prevedevano. I due principali istituti americani ne hanno risentito appena. Nel secondo trimestre di quest’anno, Citigroup ha realizzato 4,3 miliardi di dollari di utili (+12% rispetto all’anno precedente)…

The Economist


It is not easy to find parallel corpora; the most obvious examples are probably literary works and their translations. Think, for instance, of all the translations of Shakespeare’s works into many foreign languages. A comparable corpus, by contrast, includes texts which are all original (that is, not translated), for example the speeches of George W. Bush, Tony Blair, Gordon Brown, Barack Obama, David Cameron, Nick Clegg, Silvio Berlusconi and Romano Prodi. These texts are comparable in terms of topic, communicative function, size and time span (though not always).

4. What is the difference between a spoken corpus and a written corpus? A spoken corpus is a collection of spoken linguistic data, namely transcriptions of recorded speech, which may include speeches, interviews, statements and press conferences delivered by politicians or anyone else. The speeches are transcribed by expert transcribers. An example of a spoken corpus is ABC (American British Corpus), which includes speeches by American and British politicians from 1997 to the present. A written corpus is a collection of written texts, for example ECCO (Economic Comparable Corpus), a corpus of articles on finance, economics and marketing assembled by students of the Faculty of Economics. Written corpora usually outnumber spoken corpora, and even in mixed corpora written material predominates: the British National Corpus (BNC), for instance, consists of 90% written texts and only 10% spoken texts.

5. What are corpora used for? Corpora are used for translation, teaching and study. Studying the language of a corpus assembled in 2011, for example, allows researchers and students alike to analyse fresh and authentic language, namely the language that is actually written and spoken, rather than the language sometimes found in textbooks, which native speakers do not really use. It is very important for learners and other users to examine only real instances of language. There is no justification for inventing examples, although many people seem doomed to work with invented material. To illustrate a simple subject-verb clause, something like Birds sing is not good enough, not least because students will very rarely find themselves using this sentence. Likewise, the example used to explain the passive voice, The apple is eaten by me (the active being I eat the apple), although grammatically correct, is never actually used, so there is no point in learning this type of sentence, which will surely never appear in a corpus. Conversely, On January 20th Barack Obama was sworn in as the 44th President of the United States, or British Prime Minister Gordon Brown was criticized for not taking part in the main ceremony, are surely more interesting and effective examples for students. It is essential to learn from actual examples, which can be trusted because they have been used in real communication.

6. How many general English corpora are you aware of? Are they freely available? The British National Corpus (BNC), the Bank of English (BoE), MICASE (Michigan Corpus of Academic Spoken English), the Brown Corpus and the LOB (Lancaster-Oslo-Bergen) Corpus, among many others. The Bank of English was launched in 1991 by COBUILD (a division of HarperCollins Publishers) and the University of Birmingham. This huge collection is composed of a wide range of different types of writing and speech, with samples of the English language from hundreds of different sources. Written texts come from newspapers, magazines, fiction and non-fiction books, brochures, leaflets, reports, letters and so on. Spoken texts are represented by transcriptions of everyday casual conversation, radio broadcasts, meetings, interviews, discussions, etc. The material is up to date, with the majority of texts originating after 1990. Taken together, the Bank of English provides objective evidence about the English that most people read, write, speak and hear every day of their lives. At the time of writing, this corpus stands at 450 million words. The BNC and the BoE are not freely available: a subscription to the BoE, for example, costs GBP 500 a year.

7. How many general Italian corpora are you aware of? Are they freely available?


CORIS/CODIS has been available since September 2001. It consists of 100 million words and is updated every two years. It is a collection of authentic texts in electronic format, designed to be representative of a wide cross-section of current Italian. CORIS/CODIS is free of charge: all you need to do is write to the University of Bologna, tell them who you are and why you want to use the corpus, and they will provide you with a password within 48 hours at the latest. The first spoken Italian corpus is called CLIPS (Corpora e Lessici dell’Italiano Parlato e Scritto, “Corpora and Lexicons of Spoken and Written Italian”); it was completed in 2004 and presented to the Italian community in May 2007 at the University of Naples. This corpus is also free of charge, and access can be obtained by contacting the University of Naples, which will provide a password. CORPS (Corpus of Political Speeches) was released in January 2011 and is also freely available for research purposes: it is a corpus of political speeches tagged with specific audience reactions, such as applause and laughter. We found this very interesting because, whereas the American part of ABC includes both applause and laughter, in the British part (and in the Italian) transcribers have chosen not to keep applause and laughter in their transcriptions, even though they occur. The new release of CORPS contains more than 3,600 speeches, about 7.9 million words and more than 67,000 tags marking audience reactions. We believe that laughter plays an important role in interaction, even more than applause, so deleting such signals entails a considerable loss of information. In ABC, other markers typical of spontaneous speech, such as hesitations (erm) and backchannelling (mhm), have also been removed.

8. What are the advantages of studying through corpora? And the disadvantages? The main advantage is studying authentic language without learning old and obsolete expressions. The disadvantage is that, when speaking, people may make mistakes, which transcribers almost always decide to keep in the transcription. Sometimes transcribers write [sic] to indicate that the mistake was the speaker’s. A student who does not realize that a certain form is wrong might learn it thinking it is correct. In ABC, for example, we found a few instances of I am looking forward to continue working with you and this is an historical moment, rather than I am looking forward to continuing working with you and this is a historical moment. These forms are so frequently used today that they are almost accepted as correct.

9. What is the most frequent word in all the speeches George W. Bush delivered from 2001 to 2008? And in the speeches Tony Blair delivered from 1997 to 2007? And in the speeches Barack Obama delivered in the first two years of his presidency? And in the speeches David Cameron and Nick Clegg have delivered since May 2010? Is this first word a function word or a content word? The most frequent word for all these politicians is always THE. It is a function word (grammar word), and it is also the most frequent word in the English language in general, usually accounting for about 6% of all the words in a corpus.
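A frequency-sorted word list of the kind WordList produces can be sketched in a few lines of Python. This is a minimal illustration, not WordSmith’s own procedure: the tokenizer (lower-casing and keeping runs of letters and apostrophes) is a simplifying assumption, and the sample sentence is invented and deliberately repetitive, so THE’s share in it is far higher than the roughly 6% found in a real corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and keep runs of letters and apostrophes as word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(tokens):
    """Return (word, count, percent) tuples, most frequent first, ties in alphabetical order."""
    counts = Counter(tokens)
    total = len(tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, n, 100 * n / total) for word, n in ranked]


sample = "The people of the world look to the future, and the future belongs to the people."
for word, n, pct in word_list(tokenize(sample))[:4]:
    print(f"{word:<8}{n:>3}{pct:>7.2f}%")
```

The percentage column is each word’s count divided by the total number of tokens, which is how a figure such as “about 6% of all the words” is obtained.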

10. What is the first content word for ALL politicians in “normal” (not election) times? In all the politicians we have looked at (Tony Blair, Gordon Brown, David Cameron, Nick Clegg, George W. Bush, Condoleezza Rice, Bill Clinton, Hillary Clinton, Barack Obama, Joe Biden), the first content word is always PEOPLE. In George W. Bush’s word list PEOPLE ranks 23rd, in Tony Blair’s 25th and in Barack Obama’s 33rd (see Figure 2). The words ranking above PEOPLE are all grammar (or function) words. In election times, however, the most frequent content word is very often different. For example, in the speech Barack Obama delivered in Denver, Colorado, on 28 August 2008, when he accepted the nomination, the most frequent content word was not PEOPLE (ranking 88th) but PROMISE (ranking 23rd), as Figure 3 shows. Another important content word was, not surprisingly, CHANGE, ranking 49th.
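The ranks quoted above are positions in a frequency-sorted word list, and the first content word is simply the highest-ranked word that is not a function word. A minimal sketch, assuming a small hand-written stoplist (the FUNCTION_WORDS set is hypothetical and far from complete) and alphabetical tie-breaking, which may not match WordSmith’s ordering:

```python
from collections import Counter

# Hypothetical, deliberately short stoplist of function words (a real study needs a fuller one).
FUNCTION_WORDS = {
    "the", "and", "to", "of", "a", "in", "that", "we", "i", "is", "our", "it",
    "for", "be", "this", "will", "you", "have", "are", "not", "on", "with", "they",
}


def first_content_word(tokens, function_words=FUNCTION_WORDS):
    """Return (rank, word) for the most frequent token not in the stoplist; rank is 1-based."""
    counts = Counter(tokens)
    # Most frequent first; ties broken alphabetically (an assumption of this sketch).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    for rank, (word, _count) in enumerate(ranked, start=1):
        if word not in function_words:
            return rank, word
    return None  # every token was a function word


tokens = "the people and the future of the people and the promise of change".split()
print(first_content_word(tokens))
```

In this toy sentence three function words outrank PEOPLE, so it surfaces at rank 4; in a full speech corpus the same logic yields ranks like the 23rd, 25th and 33rd reported above.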


Figure 1. WordList in Barack Obama’s speeches (first 30 words)


Figure 2. WordList in Barack Obama’s speeches (first 60 words)


Figure 3. WordList in Barack Obama’s speech delivered in Denver in August 2008

11. How do most American politicians end their speeches?

Most of them end their speeches by saying “God bless you”, “God bless America” or “God bless you, and may God bless the United States of America”. Bush sometimes ended by simply saying “God bless”; he also used to say “May God bless America and protect our troops”. Barack Obama’s speeches also end with “God bless you, and may God bless the United States of America”.

12. Can you give a definition of phrase? And of cluster? A phrase is a multi-word unit. Different labels have been given to the “phrase”: clusters, n-grams, concgrams, lexical bundles, prefabs (prefabricated language). Mike Scott, in WordSmith Tools, speaks of clusters rather than phrases. In a phrase, the meaning of the whole does not correspond to the sum of the single words. Whatever designation is preferred, the common thread is that words are not chosen freely but are placed on a cline between the open choice principle and the idiom principle. The latter governs prefabs, whose meaning is not carried by the individual items but attaches to the unit as a whole, in accordance with phraseological conventions.

13. Can you give an example of a 2-word cluster? And of a 3-word cluster? And of a 4-word cluster? The list below includes only some examples. You can mention any examples you like.

2-word units: as well, in that, out there, in hindsight, so far = as yet, how come, at least, at stake, by far, for good, right away, to date, go broke, let alone

3-word units: as well as, a great deal, by the way, come into force, connect the dots, cut and run, deliver the goods, food for thought, foot the bill, give the floor, in order to, in spite of, ins and outs, into harm’s way, just like that, make ends meet, on behalf of, on my watch, on this ticket, time and again, to my mind, cast one’s ballot, take for granted, pay lip service, you name it

4-word units: all of a sudden, as soon as possible, for the time being, go into the red, in the long run, on the brink of, see eye to eye, so far so good, when it comes to

5-word units: see eye to eye on, from all walks of life, in a matter of months, stand shoulder to shoulder with, turn a blind eye to

6-word units: at the end of the day, and all the rest of it

Some expressions, like at the end of the day, and all the rest of it, just like that and so far, can be interpreted according to either the open choice principle or the idiom principle.
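Clusters of this kind can be approximated as recurrent n-grams: every contiguous sequence of n words, counted and kept if it reaches a minimum frequency. The sketch below is a simplification under stated assumptions (whitespace tokenization, no sentence boundaries, a hypothetical min_freq threshold), not WordSmith’s own clusters procedure. Frequency alone cannot tell whether a sequence such as at the end of the day is used idiomatically or literally; that remains the analyst’s judgment.

```python
from collections import Counter


def clusters(tokens, n, min_freq=2):
    """Return (cluster, frequency) pairs for n-word sequences occurring at least min_freq times."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    frequent = [(" ".join(g), c) for g, c in grams.items() if c >= min_freq]
    # Most frequent first, then alphabetical, as in a frequency-sorted cluster list.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))


text = ("at the end of the day we will deliver the goods and "
        "at the end of the day the people will judge us")
tokens = text.split()
print(clusters(tokens, 6))
print(clusters(tokens, 3))
```

Note that the 3-word run also returns fragments such as end of the, which are frequent but not meaningful units; this is why cluster lists need human inspection.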


The expression just like that did not appear in dictionaries until a few years ago, although it is a frequent cluster, especially in spoken language, as Figure 4 shows; it has now finally been included. The table shows that the semantic prosody (the positive or negative connotation that a word or phrase carries) of just like that in a political context is negative (which does not mean that just like that has a negative semantic prosody in English in general): verbs like kill, behead, chop somebody’s head off and cut your right hand off occur in the proximity of the node, on both the left and the right side.

Figure 4. Just like that in George W. Bush

Figure 5 below shows that the phrase just like that can also have a positive semantic prosody in English. The data below is taken from a general corpus of written and spoken English (BNC).


Figure 5. Just like that in PIE (Phrases In English)
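The company that reveals a semantic prosody is simply the set of words occurring within a window around the node. A minimal sketch for a multi-word node such as just like that, counting words up to a chosen number of positions to the left and right; the token list is a toy example, not data from ABC or the BNC, and judging whether the resulting company is positive or negative remains an interpretive step the code does not perform.

```python
from collections import Counter


def span_collocates(tokens, node, left=5, right=5):
    """Count the words within `left` positions before and `right` positions after each node hit."""
    node_tokens = node.split()
    n = len(node_tokens)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == node_tokens:
            counts.update(tokens[max(0, i - left):i])    # left context (L1 to L<left>)
            counts.update(tokens[i + n:i + n + right])   # right context (R1 to R<right>)
    return counts


# Toy token list for illustration only.
tokens = "they would kill you just like that and behead hostages just like that".split()
print(span_collocates(tokens, "just like that", left=3, right=3).most_common(3))
```

The default span of five words on each side mirrors the N-5/N+5 window commonly used for concordances.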

14. Give a definition of phrasal verbs and provide some examples in context.

A phrasal verb is a verb followed by one or more particles (an adverb or a preposition), and, just as with phrases, its meaning is not the sum of the individual words in it. Some phrasal verbs are very opaque, others more transparent. Give up, for example, has nothing to do with give, and call off has nothing to do with call; we should therefore never translate them word for word. Phrasal verbs are very common in English, and when choosing between a phrasal verb and a single-word verb, a native speaker of English will very likely prefer the phrasal verb (give up smoking vs. stop smoking). The list below includes only some examples. You can mention any examples you like.

CALL AFTER, LOOK AFTER, NAME AFTER, TAKE AFTER
PASS AWAY, GET AWAY WITH, PUT ASIDE
BE BACK, COME BACK, GIVE BACK, LOOK BACK
GET BY
BREAK DOWN, CALM DOWN, CLOSE DOWN, HAND DOWN, SHOOT DOWN, STEP DOWN, TURN DOWN, VOTE DOWN, CUT DOWN ON
CALL FOR, LOOK FOR, RUN FOR, STAND FOR, LOOK FORWARD
BREAK IN, CHIP IN, FILL IN, GIVE IN, HAND IN, OPT IN, SWEAR IN
LOOK LIKE
CALL OFF, PUT OFF, SHOW OFF, SWITCH OFF, TAKE OFF, TURN OFF
COME ON, GET ON, GO ON, HANG ON, HOLD ON, PASS ON, HOLD ON TO
BAIL OUT, BREAK OUT, CARRY OUT, DROP OUT, FALL OUT, FIGURE OUT, FILL OUT, LIVE OUT, LOOK OUT, PASS OUT, POINT OUT, PUT OUT, RUN OUT, SPEAK OUT, SPELL OUT, SORT OUT, TURN OUT, RULE OUT, WATCH OUT, WORK OUT
BREAK OUT OF, OPT OUT OF, RUN OUT OF
TAKE OVER, CARRY THROUGH
BREAK UP, CALL UP, GET UP, GIVE UP, GIVE UP ON, GROW UP, HOLD UP, LOOK UP, MAKE UP, PICK UP, RUN UP, SIGN UP, SPEAK UP, SPEED UP, SUM UP, TURN UP, WAKE UP
STAND UP FOR, CATCH UP WITH, DEAL WITH, KEEP UP WITH, PUT UP WITH

15. What is the most frequent verb in the speeches of the politicians under study in ABC? And the most frequent 2-word phrasal verb? And the most frequent 3-word phrasal verb? The most frequent verb in ABC is want, followed by know, make, think, get, work, thank, like, need and say, which complete the top ten. Bearing in mind that “some words are lonelier than others”, it soon becomes evident that, with the exception of a few verbs that also make meaning on their own and are typical of spoken language, most of the others need a particle or some other word to account for such a high ranking. The lexical verb make, for example, the third most common verb in ABC, almost certainly does not rank so high in its sense of “create or produce something by working”: it ranks so high because it lends itself to creating several phrases, such as I want to make sure. The verb make collocates with many words, and in ABC we found several instances of make progress, make great strides, make sacrifices, make a mistake, make a judgment, make a decision, make a choice, make sense, make up one’s mind and make your voice heard. Using the clusters facility provided by WordSmith Tools, we found that the most frequent 2-word phrasal verb in ABC is deal with, followed by provide with, set up, go back, look for, look forward, move forward, figure out, end up, go ahead and stand up. The most frequent 3-word phrasal verb is look forward to, followed by get out of, come up with, live up to and stand up for. The verb occurring most frequently in company with look forward to is work: I look forward to working with you.
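A rough way to surface 2-word multi-word verb candidates from clusters is to keep the bigrams whose first word is a verb of interest and whose second word is a particle. This sketch assumes a small, hypothetical particle list and performs no part-of-speech tagging or lemmatization, so inflected forms (deals with, dealt with) and split forms (set it up) are missed; it illustrates the idea rather than reproducing the WordSmith search described above.

```python
from collections import Counter

# Hypothetical, deliberately small particle list; no part-of-speech tagging is performed.
PARTICLES = {"up", "out", "with", "for", "back", "forward", "down", "off", "on", "ahead"}


def verb_particle_candidates(tokens, verbs):
    """Count adjacent (verb, particle) bigrams where the verb is in `verbs`."""
    return Counter(
        f"{first} {second}"
        for first, second in zip(tokens, tokens[1:])
        if first in verbs and second in PARTICLES
    )


tokens = "we must deal with this and set up a plan and deal with it and move forward".split()
print(verb_particle_candidates(tokens, {"deal", "set", "move"}).most_common())
```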


Figure 6. Concordance lines of look forward to working in ABC

The 3-word verbs listed above are in fact phrasal-prepositional verbs rather than phrasal verbs. Just as it has various kinds of multi-word units, English has at its disposal various kinds of multi-word verbs: phrasal verbs, prepositional verbs and phrasal-prepositional verbs. For the sake of convenience, however, we have listed all three types together in order of frequency, without distinguishing among them, so that we find phrasal verbs like find out, give up, figure out and take off, prepositional verbs like look for, talk about, look at, think about and depend on, and phrasal-prepositional verbs like look forward to, come up with, hold on to, put up with, reach out to and stand up for. It is interesting to notice that, among these verbs, which range from two to five words, some lose their original meaning completely in company with the particle: it turns out that, for example, has no semantic relationship with turn, nor do give up and give up on with give. Other verbs are more transparent, and the particle extends the usual meaning of the verb, as is the case with go away, come up and sit down. Others still make meaning only with the particle and are not found independently as verbs: fend for, sum up, zero in on and tamper with, for example, have no existence on their own. Phrasal verbs are said to be extremely difficult to learn: native speakers not only manage them with aplomb but seem to prefer them to single-word alternatives. Learners, conversely, tend to avoid them and rely instead on longer, rarer and clumsier words, which make their language sound stilted and awkward. Thus, if learners have to choose between carry out and perform or undertake, or between put up with and tolerate, they will almost certainly opt for the single-word verb.
Yet the whole drift of the historical development of English has been towards the replacement of words by phrases; in ABC, for example, we found 32 instances of turned down the Treaty versus 8 of rejected the Treaty. Once we learn that phrases are handled as single units, we will not expect, for instance, turn and down to have a meaning on their own, because the two words occurring together have been stored in the mind as a holistic unit. We can conclude by quoting Searle, who argues that there is a conversational maxim that reads as follows: “Speak idiomatically unless there is some special reason not to”.

16. Is language idiomatic? And to what extent? Justify your answer. Yes, language is idiomatic. John Sinclair argued that about 80% of language is governed by the idiom principle and 20% by the open choice principle. Words are attracted to other words and tend to occur in company with each other more often than chance would predict, for no apparent reason other than convention and habit.

17. What is the meaning of “Words cluster just as people do”? Words attract each other beyond idioms too (when we speak of the idiom principle, we do not have to think only of idiomatic expressions like to rain cats and dogs). Words are like people: just as people who like other people tend to spend time together and go out together, some words enjoy the company of other words. Thus, we speak of attraction, indifference and repulsion. A clear example is the phrase Merry Christmas: native speakers of English routinely say Merry Christmas, Happy Christmas and Happy Birthday, but not Merry Birthday. Christmas typically occurs in company with Merry, so it is attracted to Merry; it is indifferent to Happy (Happy Christmas is not wrong, and some people say it, but less often than Merry Christmas). By contrast, Merry Birthday is wrong: Birthday is not attracted to Merry, and the two repel each other. This is what we call repulsion.
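Attraction, indifference and repulsion can be quantified by comparing how often two words actually co-occur with how often they would co-occur by chance. The text does not name a measure; the sketch below swaps in pointwise mutual information (MI), one common choice, where a positive score suggests attraction, a score near zero indifference, and a negative score repulsion. All counts in the example are invented for illustration and are not corpus data.

```python
import math


def mi_score(pair_count, count_a, count_b, total):
    """Pointwise mutual information, log2(observed / expected), for word a followed by word b.

    The expected count assumes independence: count_a * count_b / total.
    Positive scores suggest attraction, scores near 0 indifference, negative scores repulsion.
    """
    if pair_count == 0:
        return float("-inf")  # never co-occur: the strongest possible repulsion under MI
    expected = count_a * count_b / total
    return math.log2(pair_count / expected)


# Invented counts for a hypothetical 1,000,000-word corpus (not real corpus data).
TOTAL = 1_000_000
examples = [
    ("merry christmas", (80, 100, 500)),
    ("happy christmas", (1, 2_000, 500)),
    ("merry birthday", (0, 100, 300)),
]
for label, counts in examples:
    print(f"{label:<16}{mi_score(*counts, TOTAL):.2f}")
```

MI is known to overrate very rare pairs, so in practice it is usually combined with a minimum frequency or a second statistic.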

18. How many features does WordSmith Tools have? WordSmith Tools has three main features: Concord, Keywords and WordList.

Figure 7. WordSmith Tools 5.0

19. What is the function of Concord? What is the function of WordList? What is the function of Keywords? As its name indicates, WordList creates word lists, ordering words by frequency (Figures 1 and 2) and alphabetically. Word frequency information is very useful for identifying the characteristics of a text or a genre. Concord is a tool which locates all occurrences of any given word or phrase in a corpus, showing them in standard concordance lines with the search word (also called the node word) centred and a variable amount of context on either side (usually N-5 and N+5: five words to the left and five words to the right). This tool also allows the company a word keeps (its collocates) to be studied. The figures below show the concordance lines of the word TIME in Barack Obama’s corpus:


Figure 8. Concordance to time without any sort in ABC

Figure 9. Concordance to time sorted to the left (L1-L2-L3) in ABC


Figure 10. Concordance to time sorted to the right (R1-R2-R3) in ABC

Figure 11. Concordance to time sorted to the right and to the left (R1-R2-L1) in ABC
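The concordances and sorts in Figures 8 to 11 can be sketched as a keyword-in-context (KWIC) routine with a span of five words on each side, plus a sort on chosen context positions (L1 being the word immediately left of the node, R1 the word immediately right). This is a minimal illustration assuming whitespace tokenization and exact-match node search; the function names and the toy sentence are hypothetical, and WordSmith’s own sorting options may behave differently.

```python
def concordance(tokens, node, span=5):
    """Return KWIC lines as (left_context, node, right_context) for every exact match of node."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            lines.append((left, token, right))
    return lines


def word_at(line, position):
    """Return the context word at a position like "L1" or "R3", or "" if the context is too short."""
    left, _node, right = line
    side, distance = position[0], int(position[1:])
    context = right if side == "R" else left[::-1]  # reverse left so L1 is nearest the node
    return context[distance - 1] if distance <= len(context) else ""


def sort_by(lines, positions):
    """Sort KWIC lines alphabetically on the given positions, e.g. ["R1", "R2", "R3"]."""
    return sorted(lines, key=lambda line: [word_at(line, p) for p in positions])


def show(lines, width=32):
    """Print lines with right-aligned left context, the node in capitals, then the right context."""
    for left, node, right in lines:
        print(f"{' '.join(left):>{width}}  {node.upper()}  {' '.join(right)}")


tokens = ("it is time to act and this time we will not wait because "
          "the time for change has come and now is the time").split()
lines = concordance(tokens, "time")
show(sort_by(lines, ["L1", "L2", "L3"]))
```

Passing ["R1", "R2", "L1"] to sort_by gives the kind of mixed sort shown in Figure 11; an unsorted view, as in Figure 8, is simply show(lines).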

The Keyword list is built by taking the word lists described above and comparing them. The idea is quite simple: if a word is found to be much more frequent in one corpus than in another, it is a “keyword”. The notion underlying this is therefore “outstandingness” based on comparison.


We create a Keyword list by referencing a study corpus (usually smaller) against a reference corpus (usually larger, ideally five times larger) (Figure 12).

Figure 12. Word lists in Barack Obama and George W. Bush (30-60)
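The comparison has to be proportional (the two corpora differ in size) and statistical (to separate real differences from chance). The sketch below uses log-likelihood, a widely used keyness statistic; it is an assumption here, not a description of how WordSmith’s Keywords tool computes its lists. The 3.84 cut-off (p < 0.05 with one degree of freedom), the minimum frequency and the toy corpora are illustrative choices.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word with frequency a in a study corpus of
    c tokens and frequency b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, min_freq=2, threshold=3.84):
    """Positive keywords: words proportionally more frequent in the study corpus.
    3.84 is the chi-square critical value for p < 0.05 with one degree of freedom."""
    study, reference = Counter(study_tokens), Counter(reference_tokens)
    c, d = sum(study.values()), sum(reference.values())
    results = []
    for word, a in study.items():
        b = reference[word]  # Counter returns 0 for unseen words
        if a < min_freq or a / c <= b / d:
            continue  # too rare, or not relatively more frequent in the study corpus
        score = log_likelihood(a, b, c, d)
        if score >= threshold:
            results.append((word, a, b, round(score, 2)))
    return sorted(results, key=lambda r: -r[3])


# Invented toy corpora; the reference corpus is ten times the size of the study corpus.
study = ["war"] * 10 + ["the"] * 90
reference = ["war"] * 1 + ["the"] * 999
print(keywords(study, reference))  # [('war', 10, 1, 41.45)]
```

Here war is 10% of the study corpus but only 0.1% of the reference corpus, so it stands out; the, although far more frequent in raw terms, is proportionally less frequent in the study corpus and is not a keyword.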

20. What keywords have emerged by referencing George Bush’s speeches against Barack Obama’s speeches? And by referencing Barack Obama’s speeches against George Bush’s speeches?

By comparing Barack Obama’s word list with George W. Bush’s, the aim is to unveil the words and phrases that the current President tends to employ, and in particular those that can be regarded as Obama’s “signature”, as it were, distinguishing him from the former president. This might also help us understand what persuaded people, even those who disagreed with him, to vote for him, and hence what made the swing states¹ turn blue by voting for Obama (Figures 13 and 14).

¹ In United States presidential politics, a swing state, also known as a purple state (purple being the combination of the colors red and blue, which are used to represent Republican- and Democratic-majority states respectively), is a state where no single candidate or party has overwhelming support in securing that state’s electoral college votes.

Figure 13. Presidential electoral landscape of the United States before the 2008 election

Figure 14. Presidential electoral landscape of the United States after the 2008 election

Figure 15 shows the keywords that emerged from referencing Bush against Obama: Iraq, freedom, terror, terrorists and war rank at the top of the list.

Figure 15. Keywords obtained by referencing Bush against Obama

One word in Bush’s language that may arouse interest is appreciate (ranked 9th in Figure 15), which the former president uses with the function of thank you, as the table below illustrates:

Figure 16. Appreciate in George W. Bush

Sometimes the verb is used without a subject, as in appreciate it (lines 1-2 and 5 below) or even appreciate you. A few instances of appreciate on its own were also found.

Figure 17. Appreciate it and Appreciate you in George W. Bush
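Observations like these come from looking at fixed slots around the node: a sentence boundary in L1 suggests that appreciate opens the sentence without a subject, while R1 shows whether it is followed by it, you or something else. A minimal sketch, assuming a tokenizer that keeps sentence-final punctuation as tokens; `position_counts` is a hypothetical helper and the sentences are invented, not taken from the Bush corpus.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: keep sentence-final punctuation as tokens so that a
    # sentence boundary can be seen in the L1 slot.
    return re.findall(r"[a-z']+|[.!?]", text.lower())


def position_counts(tokens, node, offset):
    """Count the tokens found at a fixed offset from the node (-1 = L1, +1 = R1).
    '<edge>' marks the start or end of the text."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            j = i + offset
            counts[tokens[j] if 0 <= j < len(tokens) else "<edge>"] += 1
    return counts


# Invented sentences, for illustration only.
sample = "I appreciate you coming. Appreciate it. We appreciate the support. Appreciate you."
tokens = tokenize(sample)
left = position_counts(tokens, "appreciate", -1)
right = position_counts(tokens, "appreciate", +1)
# A sentence boundary (or the start of the text) in L1 signals a subjectless use.
subjectless = sum(n for word, n in left.items() if word in {".", "!", "?", "<edge>"})
print(right.most_common())  # [('you', 2), ('it', 1), ('the', 1)]
print(subjectless)          # 2
```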

Figure 18 shows the keywords that emerged from referencing Obama against Bush:

Figure 18. Keywords obtained by referencing Obama against Bush

It is not surprising that words like insurance, recovery, crisis, health and clean figure at the top of the list. It is interesting to note that in both lists the names of the former and the current Presidents’ wives figure among the top-ranking words²: Laura ranks 27th and Michelle 31st. When the keywords obtained by referencing Obama against Bush are grouped by semantic field, three main areas surface:

1. economic crisis and recovery
2. clean energy and climate change
3. health care reform

As Scott points out, “a lot depends on the corpus”: our results would certainly be different if we referenced the British Prime Minister’s speeches against the President of the United States’ speeches.

21. What phrases have emerged in Barack Obama’s speeches?

The tables below show the 3-word and 4-word clusters that emerged in Barack Obama’s speeches.

² The software also shows that, unlike George W. Bush, Barack Obama very frequently mentions his two daughters’ names, Sasha and Malia.

Among the 3-word clusters, the following are worth mentioning: to make sure, as well as, men and women, around the world, across the country, in terms of, first of all, the recovery act, on behalf of, health care system, health care reform, in order to.

Figure 19a. Three-word clusters in Obama’s speeches

Figure 19b. Three-word clusters in Obama’s speeches

Figure 19c. Three-word clusters in Obama’s speeches

Since language is phraseological, learning a mere list of words does not help much: most words have meaning only when embedded in phrases, or take on a different meaning when embedded in phrases. Knowing a language means learning not only a list of words but also, and above all, how words combine with one another. For example, the three words just, like and that, combined in just like that, create a meaning different from that of the words used individually. Similarly, learning the word behalf alone does not help much, since behalf is always found in the company of on and of, in the phrase on behalf of, as shown in Figure 19b (ranked 67th).
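Clusters such as on behalf of are contiguous n-word sequences (n-grams), so a basic cluster list is a frequency count of every run of n tokens. The sketch below is a simplification under stated assumptions: `clusters` is a hypothetical helper, it ignores sentence boundaries and any other settings a tool such as WordSmith may apply, and the example phrase is invented.

```python
from collections import Counter


def clusters(tokens, n, min_freq=2):
    """Return n-word clusters (contiguous n-grams) with their frequencies,
    most frequent first. This naive version ignores sentence boundaries."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return [(" ".join(g), f) for g, f in grams.most_common() if f >= min_freq]


# Invented example: 'behalf' never occurs outside the frame 'on behalf of'.
tokens = "on behalf of the people and on behalf of the country".split()
print(clusters(tokens, 3))  # [('on behalf of', 2), ('behalf of the', 2)]
print(clusters(tokens, 4))  # [('on behalf of the', 2)]
```

Setting `min_freq` filters out the many one-off sequences that any text produces; only recurrent combinations survive as clusters.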

Among the 4-word clusters, the following are worth mentioning: to make sure that, thank you very much, United States of America, thank you so much, when it comes to, I just want to, all across the country, a lot of people.

Figure 20. Four-word clusters in Obama’s speeches

22. What key-phrases have emerged by referencing Barack Obama’s speeches against George Bush’s speeches?

WordSmith Tools allows us to generate not only lists of words and keywords but also lists of phrases and key-phrases (or key-clusters). The key-clusters in Figure 21 emerged from comparing Barack Obama’s phrases with George Bush’s phrases, and indicate the phrases that Obama utters much more frequently than Bush. They reveal Obama’s main concerns that were not priorities for Bush’s administration: the recovery act (ranked 7th) and health care reform (ranked 21st). Have a seat (ranked 33rd), which Obama usually utters with please (Please have a seat in fact appears among the 4-word clusters), is hardly ever used by Bush, who preferred to say Please be seated.

Figure 21. Three-word key-clusters obtained by referencing Obama’s speeches against Bush’s speeches
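Key-clusters combine cluster extraction with the keyword comparison described earlier: count n-word clusters in each corpus, then test which ones are proportionally more frequent in the study corpus. The sketch assumes log-likelihood as the keyness statistic (an assumption, not a statement about WordSmith’s internals); `key_clusters` is a hypothetical name, and the toy “corpora” are invented to echo have a seat and be seated. At this size nothing reaches the conventional 3.84 threshold, so the usage line lowers it to 0 purely to show the output.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    # Contiguous n-word clusters and their frequencies.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def log_likelihood(a, b, c, d):
    # a, b: cluster frequency in study/reference; c, d: total clusters in each.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    return 2 * sum(o * math.log(o / e) for o, e in ((a, e1), (b, e2)) if o > 0)


def key_clusters(study_tokens, reference_tokens, n=3, min_freq=2, threshold=3.84):
    """Clusters proportionally more frequent in the study corpus than in the reference."""
    study, reference = ngram_counts(study_tokens, n), ngram_counts(reference_tokens, n)
    c, d = sum(study.values()), sum(reference.values())
    results = []
    for gram, a in study.items():
        b = reference[gram]
        if a >= min_freq and a / c > b / d:
            score = log_likelihood(a, b, c, d)
            if score >= threshold:
                results.append((" ".join(gram), a, b, round(score, 2)))
    return sorted(results, key=lambda r: -r[3])


# Invented toy corpora; far too small for the default significance threshold.
study = "please have a seat thank you please have a seat".split()
reference = "please be seated thank you please be seated thank you".split()
print(key_clusters(study, reference, threshold=0.0))
# [('please have a', 2, 0, 2.77), ('have a seat', 2, 0, 2.77)]
```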

With the opposite procedure, referencing Bush’s speeches against Obama’s, the following clusters emerged:

Figure 22. Four-word key-clusters obtained by referencing Bush’s speeches against Obama’s speeches

Bearing in mind that meaning is an unstable entity, created not by single words but by their interaction, we can see that the clusters that emerged show, even better than individual words, the main concerns of the former American President compared with the current one: the war on terror, weapons of mass destruction, for the sake of, September the 11th, in the Middle East, as a matter of fact, no child left behind act.

To conclude, phraseology plays a prominent role in discourse; hence clusters are certainly much better than individual words at revealing the ‘aboutness’ of the text (and the context). Furthermore, if we assume that frequency is a guide to importance, key-clusters are even better at unveiling the main concerns of one politician with respect to another.