SketchEngine Workshop 2011 [email protected]

A. I want to know how to see a word in its context

This is the most basic corpus function and can be very useful for looking for patterns, for example if

you want to show how a particular word tends to collocate with a particular preposition.

1. As an example, open ukWaC and type higgledy in the box, as in the screenshot

2. You should now see something similar to this:

3. If you look to the right, you can see what it tends to describe

4. You could now try this for any other word you want to look at
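For readers who would like to see what a concordance does behind the scenes, here is a minimal Python sketch. It is not how Sketch Engine itself is implemented; the function name kwic, the toy sentence and the window size are all assumptions made purely for illustration.

```python
def kwic(tokens, node, window=4):
    """Return (left, node, right) context tuples for every hit of `node`."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Toy text invented for this example; a real corpus would be far larger.
text = "the books were piled higgledy piggledy on the shelf in higgledy piggledy heaps"
for left, hit, right in kwic(text.split(), "higgledy", window=3):
    print(f"{left:>25}  [{hit}]  {right}")
```

Lining up the node word in a central column is what makes the patterns to its left and right easy to scan.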

B. I want to know how to see a list of word forms for a particular lemma.

1. Select the British National Corpus.

2. Click on ‘Word List’ from the top left-hand bar and you will see something similar

to this:

3. In order to see all the word forms for the lemma EMPLOY you should

a. Select ‘lemma’ from the drop-down box next to ‘search attribute’

b. write employ in the white box

c. tick ‘Use multilevel wordlist’

d. look at the drop-down boxes next to ‘use multilevel wordlist’

e. from the first drop-down box select ‘word’

f. from the second drop-down box select ‘tag’

g. click on ‘make word list’

This function can also be useful for investigating whether a word is used most

frequently in its verb or noun form, for instance. It can also be useful for comparing the

frequency of two near-synonyms or two spelling variants.
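The idea behind the multilevel word list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the (word, lemma, tag) triples below are invented, the tags merely imitate BNC-style codes, and forms_for_lemma is a hypothetical helper rather than anything provided by Sketch Engine.

```python
from collections import Counter

# Hypothetical (word, lemma, tag) triples; in a real corpus these come from the tagger.
corpus = [
    ("employs", "employ", "VVZ"),
    ("employed", "employ", "VVN"),
    ("employed", "employ", "VVD"),
    ("employ", "employ", "VVI"),
    ("employed", "employ", "VVN"),
    ("employer", "employer", "NN1"),
]


def forms_for_lemma(triples, lemma):
    """Count each (word form, tag) pair attested for the given lemma."""
    counts = Counter((word, tag) for word, lem, tag in triples if lem == lemma)
    return counts.most_common()


for (word, tag), freq in forms_for_lemma(corpus, "employ"):
    print(word, tag, freq)
```

Grouping by both word and tag is what lets you see, for example, whether a form is more frequent as a past participle or as a past tense.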

C. I want to know how to find the differences between two near synonyms

1. Select SketchDiff from the left-hand menu and add your two search items like this:

You should then see something like this:

2. If you focus on the ‘modifies’ column on your screen, what differences do you notice

between the ‘large only’ patterns and the ‘big only’ patterns?

3. Using the Sketch Diff function in Sketch Engine, compare the following:

• little / small

• boy / girl (use ukWaC). Do girl and boy seem identical except for the gender difference?

4. Think of your own pair to investigate and write your findings below:

This function is also very useful for comparing/deciding between two possible translations of

an item.

D. I want to know how to find out how a particular word behaves, for

example, whether it usually collocates with positive or negative things

Collocation refers to the tendency for words to go together and is an extremely important

concept in corpus linguistics and the study of lexis. Words which are habitually found together

are referred to as collocates.

1. Investigating bent on

The modern conference resembles the pilgrimage of medieval Christendom in that it allows the

participants to indulge themselves in all the pleasures and diversions of travel while appearing

to be austerely bent on self-improvement. (David Lodge, Small World)1

In this case, we can identify something unusual in the combination of bent on + self-

improvement. But why?

a) Select the BNC

b) Type bent on in the search box

c) Look at the words to the right of bent on – what do they have in common?

d) To see a summary list of all the terms that come immediately after bent on:

a. Click on ‘collocations’ from the left-hand menu

b. Set the range at 0 to 1

e) What can we say about the semantic prosody of bent on? Does it tend to collocate with

good or bad things?

2. What lexical items do you think are frequently found in the company/co-text of COMMIT?

3. To check this, you can also use the Word Sketch function. Select ‘Word Sketch’ from the top

left-hand bar and you should see something like this:

1 This is a well-known example from Hoey, M 2005. Lexical Priming. London & New York: Routledge.

4. Type commit into the search box and select the correct part of speech from the drop

down box. You should then see something similar to this:

5. Did you find the items that you wrote down in the box above?

6. The Word Sketch function doesn’t just tell you what words are commonly found in the

company of your search word, but also tells you what their grammatical relationship is

to the search word.

What subjects most commonly collocate with COMMIT?

What are the most common objects of COMMIT?

7. Extension: Use Word Sketch to look at the collocates of KISS. Do you notice anything about the gender distribution of the collocates?

E. I want to know how to find out which verb is usually used with a

particular noun

Once again, we are investigating collocation, the tendency for certain words to go together.

There are various ways of investigating this in SketchEngine, but one of the easiest is using

Word Sketch. This is the same procedure as in section D.

1. Select ‘Word Sketch’ from the left-hand menu.

2. Type meeting into the box and select the correct part of speech

3. You should then see something similar to this:

If you focus on the first column, you can see the verbs for which meeting is the object.

4. Now try another example, for instance a word that learners often struggle with in terms

of collocation.

F. I want to know how to find out which adverbs are usually used with a

particular adjective

Once again, we are investigating collocation, the tendency for certain words to go together.

There are various ways of investigating this in SketchEngine, but one of the easiest is using

Word Sketch. This is the same procedure as in section D.

1. Select ‘Word Sketch’ from the left-hand menu.

2. Type enormous into the box and select the correct part of speech

3. You should then see something similar to this:

If you look at the second column, you can see the words which are used to modify enormous. If you are working with students, for example, this would be one way of drawing their attention to the relatively infrequent use of very with this type of adjective.

4. Try it again with another term. You could use another corpus.

G. I want to know how to compare two (potential) translation equivalents

For instance, you have identified a possible translation for a particular lexical item, but you

want to check whether the SL and TL items really function in similar ways or have similar

evaluative meanings (in the corpora we have available). There are many ways you could

compare the items, but we will start with a simple one…

1. For this exercise I recommend that you use the WaC corpora because they have been

compiled in similar ways and are available for nearly all the languages. The WaC corpora

contain texts which were collected from the internet. The first two letters refer to the

domain where they were collected e.g. ukWaC is the British English corpus, itWaC is the

Italian corpus etc.

2. So, from the homepage select the WaC for your source language (if you are currently in

another corpus, first click ‘home’ from the top-right corner).

3. Now select ‘Word Sketch’ from the left-hand column and type in the word that you are

interested in. You will need to specify the part of speech (noun, verb etc).

4. You can save the resulting Word Sketch by selecting ‘save’ from the left-hand menu.

5. Now return to the Sketch homepage and choose the WaC corpus for your target

language and repeat the process, inserting your potential translation into the Word

Sketch.

6. Then you can (manually) compare your Word Sketches to see if they seem similar. (The

similarity/difference may of course be affected by a range of factors)

7. Did your items occur with similar collocates in the two languages?

This manual comparison is also useful for any kind of cross-cultural analysis.

8. Extension task: Create a Word Sketch for university in the ukWaC corpus. Now create a

Word Sketch for the equivalent word in another language. What similarities/differences

do you notice?

H. I want to know how to find out how other people have translated a

particular word

This is currently only possible for the following language pairs: English-German, English-Spanish, English-Finnish, English-French, English-Italian, English-Dutch.

To look at translation, you need to select one of the parallel corpora included in Sketch. These are all EUROPARL corpora and contain data from the European Parliament. They appear something like this on the home page (these are the corpora for which English is the source language; you can also look at corpora for which English is the target language):

1. Open your chosen corpus (start with one in which English is the source language)

2. Type straightforward into the search box. This will give you a list of concordances

containing straightforward.

3. In order to see how it was translated you need to change the view by clicking on the name of

the corpus to the right of the hits summary

4. You may then need to change the view to make it easier to read. You can do this simply by

clicking on ‘KWIC/sentence’ from the left-hand menu. You should then see something that

looks like this:

5. Now try with another term (and of course you could use another corpus).

I. I want to know how to find out which adjectives are used most

frequently in a particular discourse type, e.g. Academic Spoken English

In this case, you need to use part-of-speech tags. All the corpora on Sketch have been tagged for part of speech (for instance, whether a word is a singular noun), but not all corpora use the same system, so you will need to check.

When you use the concordance tool, it is very easy to find your tagset because there is a link

next to the search box, as shown below:

You should also note that the ‘Query type’ has been changed from ‘simple’ to ‘CQL’, which stands for ‘Corpus Query Language’. It’s a good idea to open the tagset in another window (or print it out if you will be using it frequently).

1. For this search you should select the ‘British Academic Spoken English Corpus’ (click on

‘home’ from the top right to return to Sketch homepage with the list of corpora).

2. First, open the tagset as described above.

3. Then click on ‘wordlist’ from the left-hand menu.

4. Change the search attribute to ‘tag’ and insert your tag into the box marked ‘pattern’,

then click on ‘multilevel’ and ask for ‘word’ as an output, as in the screenshot:

5. This should generate a list of the most frequent adjectives in that corpus. You could then

save the list and repeat the process for the ‘British Academic Written English Corpus’

for instance.

J. I want to know how to look for patterns (e.g. what else can be used in the form a couple of sandwiches short of a picnic)

You can also create more complex searches for the concordance and collocate functions. For instance, we could look at how creative the lexical template an x short of a y is.

1. Click on ‘home’ (top right) and switch to the ukWaC corpus (because it is larger and

more recent)

2. Click on ‘query type’ and select ‘CQL’ (Corpus Query Language)

3. Type the following into the query box:

"a|an" []{0,3} "short" "of" "a|an"

4. You should see something a bit like this:

5. Now sort your concordances by ‘node’ (left-hand menu). What examples did you find? What is the function of this template?

6. Think of another pattern that you would like to investigate and see if you can create the

query. If you want to get some ideas, you could look up ‘snowclones’ on the Language

Log http://languagelog.ldc.upenn.edu/nll/

K. I want to know how to find out what words occur in similar lexical

patterns to my search word

The ‘Thesaurus’ function in Sketch Engine is not like a manually created thesaurus. It simply

provides a list of words which occur in similar grammatical and lexical contexts to your search

word.

1. Click on ‘Thesaurus’ from the top left-hand bar and you should see something similar to

this:

2. Type similar into the search box and select the correct part of speech from the drop

down box.

3. You will notice that different appears in the thesaurus list. This is because the thesaurus

function just looks at the contexts of use, not the meaning, and antonyms are usually

used in extremely similar contexts.

4. You could also look for words where you are more interested in the cultural patterns,

for instance you could try university or baby. For this, it is probably better to use the

ukWaC corpus because it is more recent.
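To make the point about antonyms concrete, here is a toy Python sketch of similarity based on shared contexts. It is an assumption-laden illustration: it compares immediately adjacent words using Jaccard overlap, whereas the real Thesaurus works from grammatical relations and its own scoring. The names context_sets and similar_words and the example text are hypothetical.

```python
from collections import defaultdict


def context_sets(tokens, window=1):
    """Map each word to the set of words seen within `window` positions of it."""
    contexts = defaultdict(set)
    for i, tok in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        contexts[tok].update(tokens[j] for j in range(lo, hi) if j != i)
    return contexts


def similar_words(tokens, target, window=1):
    """Rank other words by Jaccard overlap between their contexts and the target's."""
    contexts = context_sets(tokens, window)
    target_ctx = contexts.get(target, set())
    scores = {}
    for word, ctx in contexts.items():
        if word == target:
            continue
        union = target_ctx | ctx
        scores[word] = len(target_ctx & ctx) / len(union) if union else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented text in which similar and different share exactly the same neighbours.
tokens = ("a similar result and a different result or "
          "a similar case but a different case").split()
print(similar_words(tokens, "similar")[0])
```

Because the score only looks at neighbouring words, a word with the opposite meaning can come out as the closest match, which is the behaviour described in step 3.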

L. How can I find which words are typical of one corpus compared to

another? (e.g. spoken language ‘v’ written language)

You can also calculate keywords in Sketch Engine. Keywords here refers to items which occur statistically significantly more frequently in one corpus than in another.

For instance, if we wanted to compare spoken and written academic language:

1. Open the British Academic Spoken English Corpus and click on ‘Wordlist’ from the left-

hand menu.

2. At the top where it says ‘subcorpus’ we don’t change anything as we want to use all of

the corpus. At the bottom we need to specify that we are interested in ‘keywords’ and

then select the corpus that we want to compare it to i.e. the ‘British Academic Written

English Corpus’.

3. Click on ‘Make Word List’ and you should see something rather like this:

To compare sub-corpora, you have to specify which parts of the corpus you are interested in. For

instance, to compare spoken and written language in the BNC, you need to ‘create’ the sub-corpora.

1. To do this, click on ‘Word list’ then ‘create new’. This will open a page with the various

sections of the BNC. You should give your sub-corpus a name, e.g. BNC spoken and then

select the two spoken divisions, as shown below, and click on ‘create subcorpus’ at the bottom of the page:

2. Now we need to repeat the process for the written components, so click on ‘create subcorpus’ and this time select the three written text types from the top-left box and give it an appropriate name, e.g. BNC written

3. To compare the two sub-corpora, go to ‘Word list’ and select the new spoken subcorpus at

the top and the written subcorpus at the bottom, as shown below:

It will take a bit longer to process this type of query… but in the end you should see

something a bit like this:

This type of search is particularly useful when you are trying to find out which words characterise a particular corpus. For example, if you were preparing materials for teaching vocabulary relating to a specialised field, or preparing to translate a text from a specialised field, this would be one way of getting to know the key items that differentiate that field from general language. It is also interesting as a starting point for discourse studies.
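The general idea of a keyword comparison can be sketched in Python as follows. This is a simplified illustration, not the statistic Sketch Engine applies: the score used here, a ratio of smoothed per-million frequencies, is an assumption chosen for brevity, and the function name keywords, the smoothing value and the two tiny word lists are invented.

```python
from collections import Counter


def keywords(focus_tokens, ref_tokens, smoothing=1.0, top=5):
    """Rank focus-corpus words by the ratio of smoothed per-million frequencies."""
    focus, ref = Counter(focus_tokens), Counter(ref_tokens)
    n_focus, n_ref = len(focus_tokens), len(ref_tokens)
    scores = {}
    for word, freq in focus.items():
        fpm_focus = freq * 1_000_000 / n_focus
        fpm_ref = ref[word] * 1_000_000 / n_ref
        # Smoothing avoids division by zero for words absent from the reference corpus.
        scores[word] = (fpm_focus + smoothing) / (fpm_ref + smoothing)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Invented miniature "spoken" and "written" samples.
spoken = "okay so um you know okay so the data um okay".split()
written = "the data suggest that the analysis of the data is robust".split()
print(keywords(spoken, written, top=3))
```

Frequencies are normalised per million words so that corpora of different sizes can be compared fairly.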

1. How can I re-order the concordance lines?

When you have the concordances on the screen you can click on ‘left’ or ‘right’ from the left-hand menu. If you click on ‘right’ it will sort the concordance lines in alphabetical order according to the first word to the right of your ‘node’ (your search word).

For instance, these concordance lines have been sorted to the right:

Hint: It will always list punctuation first, so you often have to click on ‘next’ at the bottom to

get past the punctuation.
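Sorting to the right amounts to ordering the lines by the first item after the node. The Python sketch below illustrates this under assumptions: the concordance lines are invented, sort_right is a hypothetical helper, and plain character ordering stands in for whatever collation Sketch Engine uses.

```python
def sort_right(lines):
    """Sort (left, node, right) lines by the first token to the right of the node."""
    def first_right(line):
        right_tokens = line[2].split()
        return right_tokens[0].lower() if right_tokens else ""
    return sorted(lines, key=first_right)


# Invented concordance lines for the node "bent".
lines = [
    ("she was", "bent", "on revenge"),
    ("he seemed", "bent", ", almost broken"),
    ("they are", "bent", "double with laughter"),
]
for left, node, right in sort_right(lines):
    print(f"{left:>12} {node} {right}")
```

The line whose right-hand context starts with a comma comes out first, which mirrors the hint about punctuation being listed before words.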

2. How can I see more context for my concordance lines?

When you are looking at your concordance lines, you can click on the line that you want

to see in more detail and it will appear at the bottom of your screen, like this:

Alternatively, you can click on ‘view options’ from the left-hand menu and increase the

number next to ‘KWIC Context size (number of characters)’.

3. How can I search for more than one thing at once?

Use the pipe | (usually on the bottom-left on a UK keyboard), e.g.

"emphasise|emphasize" will search for both spelling variations
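The pipe works like the alternation operator in ordinary regular expressions. As a rough illustration in Python (the sample sentence is invented, and matching each token in full is an assumption about how a quoted item is compared against words in the corpus):

```python
import re

# The same alternation as in the query, compiled as a regular expression.
pattern = re.compile(r"emphasise|emphasize")

tokens = "They emphasise clarity , while we emphasize speed and emphasis".split()

# fullmatch requires the whole token to match, so "emphasis" is not a hit.
hits = [tok for tok in tokens if pattern.fullmatch(tok)]
print(hits)
```

Both spellings are returned, while the shorter word emphasis is left out because it matches neither alternative completely.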

4. How can I specify words that I don’t want in the co-text?

There are two ways that you could do this.

1. From your concordance view, you could select ‘filter’ from the left-hand menu. This

will show you something similar to this:

Now select ‘negative’ to say that you want to exclude certain items and type the

item that you wish to exclude in the box.

2. Alternatively, you could exclude the unwanted items when you enter your search

term at the beginning of the process. For example, let’s say I did a search for corpus

and there were lots of irrelevant results because several hits referred to the Corpus

Vasorum Antiquorum database. I could then exclude one or more words like this:

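"corpus" [word!="Vasorum"]{3}

This says that I want corpus only where none of the next three words is Vasorum. (A range that starts at zero, such as {0,3}, would not exclude anything, because a match with no following words is always possible.)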

This function can also be useful for excluding punctuation to ensure that your complex searches are not split across sentences. For instance, to exclude a full stop, you would insert [word!="\."]. In this case we have added the \ to indicate that the dot is actually in the text and not part of a wildcard.
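The effect of a negative filter can be sketched in Python as follows. This is only an illustration: negative_filter is a hypothetical helper, the concordance lines are invented, and checking the first few words to the right of the node is an assumption about how far the filter should look.

```python
def negative_filter(lines, unwanted, span=3):
    """Keep only lines where `unwanted` is absent from the first `span` right-hand tokens."""
    kept = []
    for left, node, right in lines:
        nearby = [tok.lower() for tok in right.split()[:span]]
        if unwanted.lower() not in nearby:
            kept.append((left, node, right))
    return kept


# Invented concordance lines for the node "corpus".
lines = [
    ("published in the", "Corpus", "Vasorum Antiquorum series"),
    ("we built a", "corpus", "of spoken English"),
]
print(negative_filter(lines, "Vasorum"))
```

Only the second line survives, since the first has the unwanted word directly after the node.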

5. How can I use wildcards in searches?

1. Any sequence of characters .*

"system.*" will retrieve: system

systems

systematic etc.

2. Any word []

"confus.*" [] "by" will retrieve: confusion generated by

confusion heightened by

confusion introduced by etc

3. Any group of words []{}

"a|an" []{0,3} "short" "of" "a|an" will retrieve:

this man who is obviously a few co-ordinates short of a bearing

Jason, thou truly art a few Brontoburgers short of a picnic!

this guy's a couple of sandwiches short of a picnic etc.

NB remember to put quotation marks around your items and to set your ‘query type’ to ‘CQL’ (Corpus Query Language).
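If you are familiar with regular expressions, the third pattern can be approximated in Python as shown below. This is a rough equivalent under assumptions, not the way the query is evaluated in Sketch Engine: the text is assumed to be already split into words separated by single spaces (so guy's appears as guy 's), \S+ stands in for [] (any one word), and find_template is a hypothetical helper.

```python
import re

# Rough regex counterpart of: "a|an" []{0,3} "short" "of" "a|an"
# (?<!\S) and (?!\S) make sure "a"/"an" are whole words, not parts of longer ones.
TEMPLATE = re.compile(r"(?<!\S)(?:a|an)(?: \S+){0,3} short of (?:a|an)(?!\S)")


def find_template(sentence):
    """Return every stretch of text matching the 'a X short of a' template."""
    return [m.group(0) for m in TEMPLATE.finditer(sentence)]


print(find_template("this guy 's a couple of sandwiches short of a picnic"))
```

The {0,3} part allows anything from no words to three words between the article and short, which is why both a sandwich short of and a couple of sandwiches short of fit the same template.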

Wildcard tasks (use the BNC)

a) Find examples of words ending in ious

__________________________________________________________________

b) How many different words can you form with bloody? (Use ‘wordlist’. You don’t need to use quotation marks around your item in ‘wordlist’. Put the wildcard at the beginning and the end.)

__________________________________________________________________

__________________________________________________________________

__________________________________________________________________

c) Which are the most common words beginning with the prefix anti-? (hint: use

wordlist)

__________________________________________________________________

d) Create a query to identify occurrences of better followed by worse with no full

stop in between

__________________________________________________________________

6. How can I use part-of-speech tags in searches?

All the corpora in Sketch have been tagged for part of speech, but not all the corpora

use the same tags. To find the tagset (the list of codes) for your corpus you can just

click on ‘tagset summary’ from the concordance page. Open it in another window so

that you can keep referring back to it.

1. As an example, in a text about the new funding regime the expression that

‘Bursaries are non-refundable’ struck me as being rather unusual. To find out

why that might be, we could concordance the occurrences of the verb BE

followed by non….able. The query syntax is:

[tag="VB.*"]+ "non.*able"

2. Copy this query into the concordance tool in the BNC. You need to set the query type to ‘CQL’ (Corpus Query Language).

You should get something similar to this:

The concordances show that the items in the node (the central part) of this

pattern are frequently desirable; we would like to get a refund, return items etc.

Therefore, we can hypothesise that the ‘unusuality’ I perceived in the original

phrase comes from a point of view difference – the author (representing the

university) presumably would like bursaries to be reimbursed, whereas I, the

reader, didn’t see that as being desirable.

Hint: When using CQL, you can’t search for a phrase as a single item, so, for instance, to find all examples of nouns followed by set in, you would insert the following (the items set and in are quoted separately):

[tag="N.*"]+ "set" "in"

More information on using the POS tags is also available here:

http://trac.sketchengine.co.uk/wiki/SkE/CorpusQuerying#1.
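To show how a tag-based query such as the BE + non…able search operates on tagged text, here is a small Python sketch. It is an illustration under assumptions: the (word, tag) pairs are invented, the tags only imitate BNC-style codes, and be_plus_non_able is a hypothetical helper that reports the longest run of BE forms before each hit rather than reproducing exactly what the concordancer returns.

```python
import re


def be_plus_non_able(tagged):
    """Find runs of BE-verb tokens followed directly by a non...able word."""
    hits = []
    for i, (word, _tag) in enumerate(tagged):
        if not re.fullmatch(r"non.*able", word):
            continue
        # Walk backwards over consecutive tokens whose tag starts with VB.
        start = i
        while start > 0 and re.fullmatch(r"VB.*", tagged[start - 1][1]):
            start -= 1
        if start < i:  # at least one BE form must precede the adjective
            hits.append(" ".join(w for w, _ in tagged[start:i + 1]))
    return hits


# Invented tagged text; tags are illustrative only.
tagged = [("Bursaries", "NN2"), ("are", "VBB"), ("non-refundable", "AJ0"),
          ("and", "CJC"), ("tickets", "NN2"), ("remain", "VVB"),
          ("non-transferable", "AJ0")]
print(be_plus_non_able(tagged))
```

The second adjective is not reported because remain carries a lexical-verb tag rather than one beginning with VB.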

7. How many different ways are there for investigating

collocation in Sketch Engine?

1. Using the concordance tool.

This is useful if you have a small amount of data or there is a very clear pattern you

want to show

2. Using collocates.

a. To calculate collocates you first create the concordance.

b. Then click on ‘collocations’ from the bottom of the left-hand menu and you

will see something similar to this:

c. To specify which collocates you are interested in, you can alter the ‘range’. So, for instance, if I only wanted to know which words come immediately to the right of happily, I would set the ‘range’ to 0 in the left-hand box, because this refers to the items that come before the node (hence the minus sign), and to 1 in the right-hand box, which means 1 place to the right.

d. And this would give something like this:

3. Using ‘Word Sketch’

4. Using ‘Sketch-Diff’ (if you are comparing collocational patterns for 2 items)
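The ‘range’ setting described under method 2 can be pictured with a short Python sketch. It is a simplification under assumptions: it counts raw frequencies only, with no association score; the function collocates takes the number of positions to the left and right as plain counts rather than signed offsets; and the example text is invented.

```python
from collections import Counter


def collocates(tokens, node, left=0, right=1):
    """Count words from `left` positions before to `right` positions after each hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(window)
    return counts


# Invented text containing three occurrences of the node "happily".
tokens = ("they lived happily ever after and she happily agreed "
          "and he happily agreed").split()
print(collocates(tokens, "happily").most_common())
```

With left=0 and right=1 only the word immediately after happily is counted, which corresponds to setting the range from 0 to 1.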