
Page 1: Lexical Patterns: from Hornby to Hunston and beyond Patrick Hanks Faculty of Informatics, Masaryk University, Brno, Czech Republic hanks@fi.muni.cz Afrilex,

Lexical Patterns: from Hornby to Hunston and beyond

Patrick Hanks Faculty of Informatics, Masaryk University,

Brno, Czech Republic

hanks@fi.muni.cz

Afrilex, Stellenbosch June 30, 2008

1


Outline of the talk

• Palmer and Hornby (1942)

– the recognition of lexical patterns

– the dictionary as an aid to idiomatic and productive use of language.

• Refinements to OALD (Hornby 1964; Cowie 1989; Crowther 1995; Wehmeier 2000)

• The impact of corpus linguistics on lexicography

– The role of collocations (Sinclair)

• Verb patterns and sentence meaning; implicatures

– Pattern grammar contrasted with pattern dictionary

• Where do we go from here?

2


A. S. Hornby

1923: started as a teacher of English in Japan

1931: joined H. E. Palmer at the Tokyo Institute for Research into English Teaching

1936: became head of research at the Institute

1941: Hornby repatriated in World War II; joined the British Council

1942: Idiomatic and Syntactic English Dictionary (ISED), by Hornby, Gatenby, and Wakefield, published in Japan by Kaitakusha

1948: ISED re-published (unchanged) by OUP as A Learner’s Dictionary of English (LDE)

1954: A Guide to Patterns and Usage in English (lexically based)

1960: title of LDE changed to Advanced Learner’s Dictionary of Current English

1963: Second edition of ALD (Hornby alone).

1974: Third edition of OALD (edited with A. P. Cowie)

1978: Death of A. S. Hornby.

3


Some insights of Palmer and Hornby

• Learners need idiomatic phraseology, not etymology

• English grammar is not Latin grammar

– E.g. Palmer identified ‘determinatives’ (determiners), adverbial particles, and ‘anomalous finite verbs’ (auxiliaries, modals, and verbal pro-forms), as well as classes inherited from Latin grammar (nouns, verbs, adjectives, adverbs)

– Palmer and Hornby also identified clause structure elements – for example subject complements and object complements (He is mad & linguistics drove him mad; he is the editor & they appointed him editor)

• Language in use is structured around the lexical patterns of verbs

4


Hornby’s second edition (1963)

• Huge increase in the number of lexical items

– Many of them more useful for receptive (reading) than productive (writing) use.

• Less user-friendly than ISED – influenced by the Concise Oxford Dictionary, 4th edition, 1951

• ‘Nested’ subentries with swung dashes

– E.g. blackbird became “~bird” under the main entry black

• See Cowie (1999), English Dictionaries for Foreign Learners: A History, for more details

• Some of these editorial policies were reversed in OALD6 (2000).

5


Hornby’s original verb patterns (1942)

• 25 patterns were identified.

– Succinct, but still hard for a learner to use easily.

– Only two subdivisions: transitive vs. intransitive.

– 14 of the 25 verb patterns take clausal complements:

• E.g. Pattern 5 (S V O Inf): I made him do it.

• Separating out the clausals makes things a lot simpler.

– Some very subtle (unnecessary?) distinctions.

– For details, see the version of this paper in the Proceedings.

• In the 4th edition (1989), A. P. Cowie introduced a more user-friendly organization of patterns.

6


Verb patterns in OALD now (1)

Intransitive verbs
• [V] A large dog appeared.
• [V + adv/prep] A group of swans floated by.

Transitive verbs
• [VN] Jill’s behaviour annoyed me.
• [VN + adv/prep] He kicked the ball into the net.

Transitive verbs + two objects
• [VNN] I gave Sue a book for Christmas.

Linking verbs
• [V-ADJ] His voice sounds hoarse.
• [V-N] Elena became a doctor.
• [VN-ADJ] She considered herself lucky.
• [VN-N] They elected him president.

7


Verb patterns in OALD now (2)

Verbs used with clauses or phrases
• [V (that)] He said that he would prefer to walk.
• [VN (that)] Can you remind me that I need to buy some milk?
• [V wh-] I wonder what the job will be like.
• [VN wh-] I asked him where the hall was.
• [V to] The goldfish need to be fed.
• [VN to] He was forced to leave the keys.
• [VN inf] Did you hear the phone ring?
• [V -ing] She never stops talking.
• [VN -ing] His comments set me thinking.

Verbs + direct speech
• [V speech] “It’s snowing,” she said.
• [VN speech] “Tom’s coming to lunch,” she told him.

8


Word classes vs. clause structure

• OALD now, like the Pattern Grammar of Hunston and Francis (2000) and other works, expresses patterns in terms of word classes, not clause roles.

– E.g. “VN”, not “subject – verb – object”.

• I will argue that patterns of verb use need to be analysed in terms of clause roles – a subtle but important distinction.

– Failure to observe this distinction has led to some confusion.

– Lexicographers defining verbs need to know about clause structure.

9


Clause roles

• S Subject

• P Predicator: the verb group.

• O Object: an English clause has 0, 1, or 2 objects.

• C Complement: co-referential with either the subject or the object of the clause.

• A Adverbial (sometimes called Adjunct): a clause may have any number of optional adverbials, but at most one obligatory adverbial.

– It is necessary to distinguish obligatory adverbials (e.g. on the table in He put the cup on the table) from optional adverbials (as in He died in 1974).

10
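The counting constraints on clause roles can be made concrete. The following minimal Python sketch is an illustration, not part of the author's apparatus: the names Role, Slot, Clause, and check are hypothetical, and the sketch checks only the two constraints stated on the slide (no more than two objects, at most one obligatory adverbial).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    """The five clause roles (S P O C A)."""
    S = "Subject"
    P = "Predicator"
    O = "Object"
    C = "Complement"
    A = "Adverbial"


@dataclass
class Slot:
    role: Role
    text: str
    obligatory: bool = True  # in this sketch, only meaningful for adverbials


@dataclass
class Clause:
    slots: List[Slot] = field(default_factory=list)

    def check(self) -> List[str]:
        """Return violations of the two counting constraints on clause roles."""
        problems = []
        if sum(1 for s in self.slots if s.role is Role.O) > 2:
            problems.append("more than two objects")
        obligatory_adverbials = sum(
            1 for s in self.slots if s.role is Role.A and s.obligatory
        )
        if obligatory_adverbials > 1:
            problems.append("more than one obligatory adverbial")
        return problems


# "He put the cup on the table": the adverbial is obligatory.
put = Clause([Slot(Role.S, "He"), Slot(Role.P, "put"),
              Slot(Role.O, "the cup"), Slot(Role.A, "on the table")])

# "He died in 1974": the adverbial is optional.
died = Clause([Slot(Role.S, "He"), Slot(Role.P, "died"),
               Slot(Role.A, "in 1974", obligatory=False)])
```

Note that the sketch must be told which adverbials are obligatory; deciding that from corpus evidence remains the lexicographer's task.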


Pattern Grammar

• Hunston and Francis (2000): Pattern Grammar: a corpus-driven approach to the lexical grammar of English

• “One of the most important observations in a corpus-driven description of English is that patterns and meanings are connected.”

• PG is founded on real texts and is a genuine attempt at empirically valid (i.e. corpus-driven) generalizations.

11


How is a Pattern Dictionary different from a Pattern Grammar?

• Pattern Grammar seeks similarities – words with similar meanings – and groups them together according to syntactic as well as semantic similarity.

• By contrast, the Pattern Dictionary seeks systematic differences:

– In particular, the differences in pattern that pick out different meanings of a polysemous word.

– To do this, it needs to introduce semantic values of arguments

– At this point, all hell breaks loose!

12


Semantic values of arguments (valencies)

• Not just “V n” but (e.g.):

[[Human]] polish {([[Surface]] of) [[Physical Object]]}

– Not “all possible values” but “all normal values”

• Lumping vs. splitting:

– Is polishing one’s nails the same pattern as polishing one’s boots | the furniture | etc.?

• Semantic values shimmer – see Hanks and Jezek, this conference, e.g.:

– calm a person, calm an animal

– calm someone’s fears, anxieties, nerves ...

13


Focusing of arguments

– repair the house

– repair the roof of the house

– repair the damage to the roof

• Same event, different semantic types

14


Meanings are not just concatenations

• Constructions (construction grammar), e.g. resultatives:

– He shook the rain off his umbrella

– What did he shake? (Not the rain!)

• He belched his way out of the room [example from Goldberg and Jackendoff 2004]

– A noise-emission event AND a motion event

• Compare he belched out of the window.

– Only certain classes of verbs participate in the way construction

– Corpus analysis (not navel-gazing) is needed to determine which verbs participate in which constructions

– See Fillmore’s ‘Constructicon’ (plenary, this conference)

15


Apparatus of Pattern Analysis

• The Pattern Grammar has an admirably simple apparatus:

– target word; part-of-speech categories; word order; and certain function words (mainly prepositions).

• Simplicity can be overdone.


• To represent the distinctive features of meaning in use, we also need at least:

– Systematic analysis and categorization of ‘colligations’

– Lexical items grouped as collocates or by semantic type

– Valencies – a.k.a. clause roles (we use S P O C A)

16


execute

Verb: execute

Pattern Grammar: V n (passive: n be V-ed)

Pattern Dictionary: [[Human 1]] execute [[Human 2]]

17

Example sentence: Private Joseph Byers was the first Kitchener volunteer to be executed.

In the Pattern Dictionary (but not in PG) semantic types distinguish this sense from other “V n” patterns of the same verb, e.g. ‘execute an order’.
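The contrast can be sketched in a few lines of Python. This is a hypothetical illustration, not the Pattern Dictionary's implementation: the noun-to-type table and the function names are assumptions; the two patterns are the ones given in this talk ([[Human 1]] execute [[Human 2]] here, and [[Human]] execute [[Command]] on the slide on semantic values in patterns).

```python
from typing import Optional

# Hypothetical mini-lexicon assigning nouns to semantic types (illustrative only).
SEMANTIC_TYPE = {
    "volunteer": "Human",
    "prisoner": "Human",
    "order": "Command",
    "instruction": "Command",
}

# Two "V n" patterns of execute, told apart only by the semantic type of the object.
EXECUTE_PATTERNS = {
    "Human": "[[Human 1]] execute [[Human 2]]",
    "Command": "[[Human]] execute [[Command]]",
}


def pattern_grammar_code(obj: str) -> str:
    """Pattern Grammar records word class and order only: any noun object gives 'V n'."""
    return "V n"


def pattern_dictionary_match(obj: str) -> Optional[str]:
    """Select the Pattern Dictionary pattern from the object's semantic type."""
    obj_type = SEMANTIC_TYPE.get(obj)
    return EXECUTE_PATTERNS.get(obj_type) if obj_type else None
```

Both a prisoner and an order yield the same code "V n", whereas the semantic-type lookup separates them; a noun missing from the table gets no pattern at all.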


enlist

Verb: enlist

Pattern Grammar: V in n

Pattern Dictionary: [[Human]] enlist [NO OBJ] {in [[Human Group = Military]]}

18

Example sentence: He was 17 and under age when he enlisted in the 1st Royal Scots Fusiliers.

Pattern Grammar and Pattern Dictionary agree in contrasting this sense with other patterns such as “[[Human]] enlist [[Assistance]]” (V n).


go

Verb: go

Pattern Grammar: V adj

Pattern Dictionary: [[Human]] go [NO OBJ] {absent | AWOL}

19

Example sentence: His inexperience and the horrors he witnessed caused him to go absent without leave.

This is a light verb (“delexical verb” in Sinclair’s terminology), with many patterns.

The “adj” in PG is a Subject Complement.

The small lexical set, {absent | AWOL}, in the Pattern Dictionary activates a particular meaning of go, contrasting with other patterns of go having a Subject Complement, e.g. go {mad | bananas}.


plead

Verb: plead

Pattern Grammar: V adj

Pattern Dictionary: [[Human]] plead [NO OBJ] {guilty | {not guilty}}

20

Example sentence: Byers pleaded guilty.

The adj in this pattern is an Object Complement.

The Object Complement is populated by a lexical set of just two possible (normal) items. (“plead innocent” is plausible but not idiomatic.)


fire

Verb: fire

Pattern Grammar: V adj

Pattern Dictionary: [[Human]] fire [NO OBJ] ([Advl[Direction]])

21

Example sentence: … the firing squad had fired wide to avoid killing the youth.

The “adj” in this sentence has the clause role of Adjunct or (in my terminology) Adverbial of Direction, as in:

The police fired into the crowd.

They fired over their heads.

It’s not really an adjective at all.
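The three slides for go, plead, and fire share a single Pattern Grammar code, “V adj”, while assigning the word after the verb three different clause roles. A small Python sketch makes the collision explicit; the data are just the three analyses given on the slides, and the function name roles_by_code is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

# The three analyses from the slides: verb, Pattern Grammar code,
# clause role of the word after the verb, and the example phrase.
ANALYSES = [
    ("go", "V adj", "Subject Complement", "go absent"),
    ("plead", "V adj", "Object Complement", "pleaded guilty"),
    ("fire", "V adj", "Adverbial of Direction", "fired wide"),
]


def roles_by_code(analyses) -> Dict[str, Set[str]]:
    """Collect the distinct clause roles hidden behind each Pattern Grammar code."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for _verb, code, role, _example in analyses:
        grouped[code].add(role)
    return dict(grouped)


for code, roles in roles_by_code(ANALYSES).items():
    print(code, "->", sorted(roles))
# prints: V adj -> ['Adverbial of Direction', 'Object Complement', 'Subject Complement']
```

One code maps to three roles, which is the sense in which word-class notation alone under-describes these patterns.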


Semantic values in patterns

• PATTERN: [[Human]] execute [[Command]]

• [[Command]] is a semantic type with many lexical realizations: command, order, instruction, wish, ...

• IMPLICATURE:

[[Human 1]] acts in accordance with [[Human 2]]’s [[Wish]]

• A look-up table (an ontology) is needed, in which users can find all the normal words (the ‘population’) of each semantic type in each normal context

– and/or a procedure for recognizing type membership, e.g. ‘named entity recognition’, which recognizes all and only members of the set [[Human]] and distinguishes them from names of places, businesses, products, dogs, etc.

22
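A minimal Python sketch of such a look-up table follows, assuming a shallow ontology in which each type names a single parent. Only the [[Command]] population (command, order, instruction, wish) comes from the slide; the other types, words, and the function names types_of, is_a, and belongs_to are hypothetical. The sketch does no named entity recognition: a word missing from every population is simply not recognized.

```python
from typing import Dict, Optional, Set

# Hypothetical fragment of a shallow ontology: each type points to its parent.
PARENT: Dict[str, Optional[str]] = {
    "Entity": None,
    "Animate": "Entity",
    "Human": "Animate",
    "Animal": "Animate",
    "Command": "Entity",
}

# Populations: the normal lexical realizations of each type. The [[Command]]
# list comes from the slide; the others are invented for illustration.
POPULATION: Dict[str, Set[str]] = {
    "Command": {"command", "order", "instruction", "wish"},
    "Human": {"soldier", "officer", "doctor"},
    "Animal": {"goat", "dog", "swan"},
}


def types_of(word: str) -> Set[str]:
    """Look the word up in every population."""
    return {t for t, words in POPULATION.items() if word in words}


def is_a(type_name: str, ancestor: str) -> bool:
    """Walk up the parent chain to test whether type_name falls under ancestor."""
    current: Optional[str] = type_name
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def belongs_to(word: str, target: str) -> bool:
    """True if some type of the word is the target type or one of its subtypes."""
    return any(is_a(t, target) for t in types_of(word))
```

For example, goat belongs to [[Animate]] but not to [[Human]], and wish is found in the [[Command]] population.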


Collocations; lexical sets; semantic types

• Pustejovsky (1995; Pustejovsky and Rumshisky 2008): meaning is expressed in terms of verbs and their argument structures (with the semantic types of the arguments).

• Sinclair (passim; followed by Kilgarriff and others): this is unnecessary: just list the collocates found in a corpus; don’t try to group them in terms of semantic types.

• Hanks (1996, 2007, and elsewhere), Jezek: lexical sets typically share semantic values, but with much variation.

– Grouping them by semantic type increases the power of the dictionary to predict the meaning of a sentence correctly.

• Either way, sentence meaning depends on the sets of words that normally, typically occur in particular clause roles in relation to a particular verb (not just the word classes – i.e. not just “V n”).

23
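The difference between listing collocates and grouping them by semantic type shows up as soon as a noun appears that was not in the list. The Python sketch below is hypothetical throughout: the collocates echo the earlier example “calm someone’s fears, anxieties, nerves”, but the type label “Emotion”, the extra nouns, and the function names are assumptions, not corpus findings.

```python
from typing import Dict, Set

# Hypothetical object collocates of "calm" (illustrative, not corpus counts).
OBSERVED_COLLOCATES: Set[str] = {"fears", "anxieties", "nerves"}

# Hypothetical semantic typing of nouns; the type labels are assumptions.
TYPE_OF: Dict[str, str] = {
    "fears": "Emotion",
    "anxieties": "Emotion",
    "nerves": "Emotion",
    "worries": "Emotion",
    "table": "Physical Object",
}


def collocate_match(noun: str) -> bool:
    """Collocate-list approach: accept only nouns already observed."""
    return noun in OBSERVED_COLLOCATES


def type_match(noun: str) -> bool:
    """Semantic-type approach: accept any noun sharing a type with an observed collocate."""
    observed_types = {TYPE_OF[n] for n in OBSERVED_COLLOCATES if n in TYPE_OF}
    return TYPE_OF.get(noun) in observed_types
```

Here worries is rejected by the collocate list but accepted by type, which is the extra predictive power claimed above. The same mechanism can overgeneralize, because semantic values shimmer, so the type assignments themselves have to be checked against corpus evidence.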


Patterns, not senses

• Meanings taken from WordNet or a dictionary do not yield reliable data for disambiguating senses (Ide and Wilks 2005).

– WordNet lists synonym sets and other semantic relations – but not senses.

– WordNet did not do contrastive analysis of word senses.

– In standard dictionaries, word senses are not mutually exclusive.

– There is much fuzzy overlap between senses – which may be OK for sophisticated human users, but not for learners or computers.

• The patterns of all and only the normal uses of a lexical item are (normally) mutually exclusive.

– However, teasing them out from corpus data is hard.

24


Norms and exploitations

• The Pattern Dictionary records all and only the normal uses of each verb.

– Exploitation of norms is a subject for separate analysis.

– Types of ‘exploitation’ include creative metaphor, ellipsis, and anomalous arguments. Consider:

• The goat ate the newspaper.

• The verb eat has a preference for nouns of semantic type [[Food]] in the direct object clause role.

• ‘[[Animate]] eat [[Document]]’ is not a normal pattern of English.

• Compare John devoured the newspaper.

• ‘[[Human]] devour [[Document]]’ is a normal pattern of English. It is a conventional metaphor.

25
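The norm/exploitation distinction can be sketched as a check of argument types against recorded patterns. In this hypothetical Python sketch the pattern table holds only what the slide discusses: “[[Animate]] eat [[Food]]” is an assumption inferred from the stated preference of eat for [[Food]] objects, and “[[Human]] devour [[Document]]” is given on the slide. The type hierarchy and function names are likewise assumptions. A use that fits no recorded pattern is only a candidate exploitation, since it might equally be an unrecorded norm or an error.

```python
from typing import Dict, Optional, Set, Tuple

# Normal patterns: (verb, subject type, object type). Deliberately incomplete:
# it holds only the patterns discussed on the slide.
NORMAL_PATTERNS: Set[Tuple[str, str, str]] = {
    ("eat", "Animate", "Food"),
    ("devour", "Human", "Document"),
}

# Hypothetical type hierarchy letting [[Animate]] cover [[Human]] and [[Animal]].
PARENT: Dict[str, Optional[str]] = {"Human": "Animate", "Animal": "Animate"}


def subsumes(general: str, specific: str) -> bool:
    """True if specific is general or one of its subtypes."""
    current: Optional[str] = specific
    while current is not None:
        if current == general:
            return True
        current = PARENT.get(current)
    return False


def classify(verb: str, subj_type: str, obj_type: str) -> str:
    """Label a use as a norm if it fits a recorded pattern, else as a candidate exploitation."""
    for v, s, o in NORMAL_PATTERNS:
        if v == verb and subsumes(s, subj_type) and subsumes(o, obj_type):
            return "norm"
    return "candidate exploitation"
```

Under these assumptions, The goat ate the newspaper (eat with an [[Animal]] subject and a [[Document]] object) comes out as a candidate exploitation, while John devoured the newspaper is a norm.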


Specifically, ...

The Pattern Dictionary of English Verbs will:

• list all normal patterns of each verb lemma in the British National Corpus (BNC).

• provide a benchmark for comparison and identification of norms in other corpora, e.g.

– by time period: patterns in historical corpora, future corpora.

– by region: e.g. patterns in American English.

– by domain, e.g.:

• ‘[[Human]] abate [[Problem = Nuisance]]’ is a domain-specific norm in the domain of legal jargon

• abate is not normally a transitive verb.

26


Details, details, details ....

• amble

PATTERN: [[Human | Animal]] amble [A[Direction]]

IMPLICATURE: [[Human | Animal]] walks slowly and in a relaxed manner in the stated [[Direction]]

• Notes: Even though this is a manner-of-motion verb (so the path and destination are of little importance), normal, idiomatic phraseology requires that the adverbial of direction be explicit (out of the house, down the hill, along, into the restaurant, ...).

In the pattern dictionary, implicatures are ‘anchored’ to each pattern by repetition of at least some of the arguments.

The metalanguage of implicatures here is English (the same as the object language), but it could easily be translated into Russian, Wierzbickian primitives, or any other formalism.

27
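The anchoring principle (an implicature repeats at least some of its pattern's arguments) can be checked mechanically. A minimal Python sketch, assuming only the bracket notation used in this talk: [[Type]] for arguments and [A[Type]] or [Advl[Type]] for adverbials. The regular expression and the function names are hypothetical, and the check is purely string-based, so it says nothing about whether the implicature is a correct paraphrase.

```python
import re
from typing import List, Set

# Matches [[Type]] and adverbial slots written [A[Type]] or [Advl[Type]].
SLOT = re.compile(r"\[(?:A|Advl)?\[([^\]]+)\]\]")

PATTERN = "[[Human | Animal]] amble [A[Direction]]"
IMPLICATURE = ("[[Human | Animal]] walks slowly and in a relaxed manner "
               "in the stated [[Direction]]")


def slots(text: str) -> List[str]:
    """Return the semantic-type labels of all argument slots in a string."""
    return [label.strip() for label in SLOT.findall(text)]


def shared_arguments(pattern: str, implicature: str) -> Set[str]:
    """Arguments that the implicature repeats from the pattern."""
    return set(slots(pattern)) & set(slots(implicature))


def is_anchored(pattern: str, implicature: str) -> bool:
    """An implicature counts as anchored if it repeats at least one argument."""
    return bool(shared_arguments(pattern, implicature))
```

For amble, both [[Human | Animal]] and [[Direction]] recur in the implicature, so it is anchored; a gloss such as “Someone walks slowly.” would not be.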


Who did what to whom?

• Look at the numbered patterns for scratch, v., on pp. 119-122 of the Proceedings.

• They help users get started on addressing such questions as:

– Was the action more probably intentional (2, 3, 5, 10), accidental (6), or neither (1)?

– Was the intention benign (3) or hostile (5)?

– Was the result beneficial (3, 7, 8, 9, 10) or damaging (1, 5)?

– What pragmatic implicatures are activated? E.g.

• puzzlement (4), poverty (8, 9), reciprocity (10), superficiality (11), ...

• Compare entries for this verb in existing, traditional dictionaries

28
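Questions of this kind become simple queries once each numbered pattern carries feature labels. The Python sketch below transcribes only the feature-to-number assignments listed on this slide; the patterns themselves (Proceedings, pp. 119-122) are not reproduced, and the table layout and function names are hypothetical.

```python
from typing import Dict, List, Set

# Feature labels for the numbered patterns of "scratch", transcribed from the slide.
FEATURES: Dict[str, Set[int]] = {
    "intentional": {2, 3, 5, 10},
    "accidental": {6},
    "neither intentional nor accidental": {1},
    "benign": {3},
    "hostile": {5},
    "beneficial": {3, 7, 8, 9, 10},
    "damaging": {1, 5},
    "puzzlement": {4},
    "poverty": {8, 9},
    "reciprocity": {10},
    "superficiality": {11},
}


def features_of(pattern_no: int) -> List[str]:
    """All feature labels attached to one numbered pattern, sorted alphabetically."""
    return sorted(f for f, numbers in FEATURES.items() if pattern_no in numbers)


def patterns_with(*features: str) -> List[int]:
    """Pattern numbers that carry every one of the given features."""
    if not features:
        return []
    return sorted(set.intersection(*(FEATURES[f] for f in features)))
```

For instance, pattern 5 is intentional, hostile, and damaging, and the patterns that are both intentional and beneficial are 3 and 10.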


Patterns for ‘urge’

29


So, what is a pattern dictionary?

• A pattern dictionary explains all normal uses of the words of a language.

– Not all possible uses!

– It associates meanings with patterns of normal use, rather than with words in isolation.

• A pattern dictionary is driven by Corpus Pattern Analysis (CPA),

– as described in Hanks (Euralex Proceedings, 2004).

30


Purposes of a pattern dictionary

• A basic infrastructure resource:

• Showing how meaning maps onto use

• Showing which patterns of usage for each word are important and which are rare.

• With more predictive power about what words mean in context than any other available resource

• For use by course-book writers, language teachers, advanced learners, computational linguists, and, of course, lexicographers.

31


Why focus on word uses, not word meanings?

• Traditionally, lexicographers ask, “What is the meaning of this or that word?”

• This assumes that words (in isolation) have meanings.

• I argue (Hanks 1994 and elsewhere) that, strictly speaking, this assumption is wrong. Words in isolation don’t have meanings; they have meaning potentials.

• For example, what does fire mean?

– It has the potential to mean lots of things.

• By studying the patterns in which it is normally used, we can work out what it means in normal contexts.

32


Why only normal uses? Why not all possible uses?

• The possibilities of usage of each word are infinite.

• Language in use is tremendously creative:

– People like to play with language

– They like to use words in new and interesting ways

– They need to be able to talk about new and unfamiliar things

• To do all this, people exploit the norms of word usage

– So lexicographers must say what these norms are.

– But we don’t know what they are – not even for English!

• Corpus Pattern Analysis enables us to discover the norms.

• Then we can associate meanings with norms – patterns – and go on to say how the norms are exploited creatively.

33


Interpretations: probabilities, not certainties

• ‘Google fired John’ = Google dismissed John from employment

BUT

• ‘Google fired John with enthusiasm’ most probably means ‘John began to feel enthusiasm for Google’s products or services.’

• Of course it could mean that there are enthusiastic sadists in the Human Resources department at Google ...

– Innumerable interpretations of texts are possible but unlikely.

34
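One crude way to model “most probably” is to prefer the reading licensed by the most specific matching pattern. The Python sketch below is hypothetical throughout: the two patterns for transitive fire, the cue labels such as "with:Emotion", and the function name are assumptions, not the Pattern Dictionary's entries, and a simple specificity rule stands in for real probabilities. The choice it makes is a default reading, not a certainty.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, List, Optional


@dataclass(frozen=True)
class Pattern:
    requires: FrozenSet[str]  # cues that must be present in the clause
    gloss: str


# Hypothetical, simplified patterns for transitive "fire".
PATTERNS: List[Pattern] = [
    Pattern(frozenset({"object:Human"}), "dismiss from employment"),
    Pattern(frozenset({"object:Human", "with:Emotion"}),
            "cause to feel the stated emotion"),
]


def interpret(cues: AbstractSet[str]) -> Optional[str]:
    """Prefer the most specific pattern whose required cues are all present."""
    candidates = [p for p in PATTERNS if p.requires <= cues]
    if not candidates:
        return None
    return max(candidates, key=lambda p: len(p.requires)).gloss
```

With only a human object (Google fired John) the dismissal reading is chosen; adding a with-phrase headed by an emotion noun (with enthusiasm) switches the default to the ‘cause to feel’ reading.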


Nouns and verbs

• The apparatus required for analysing nouns is different from that required for verbs.

– Nouns are grouped into lexical sets in relation to the verbs that they normally colligate with.

– Typically, the lexical sets are united by a semantic type.

– A shallow ontology of nouns (grouped by their semantic type) is therefore part of the apparatus of a pattern dictionary.

– Semantic typing in real texts is more complex than might be expected from invented examples.

– Lexical sets include alternations, parts, and attributes of types.

35


Conclusions

• Corpus data now enables lexicography to go beyond Hornby and Hunston in at least two directions:

– Adding semantic values to arguments, enabling the mapping of meanings onto use

– Probable, statistical, normal – no absolute certainties

– Identification of constructions, going beyond the lexical item

• The lexicographical task here is to find out which lexical items participate in which constructions

• So far, lexicography has been slow to respond to these opportunities.

• Funding agencies have been even slower!

– They seem to be stuck in a conservative time warp.

36


Thanks

• To you, for listening,

• To the Hornby Trust, for inviting me,

• To U. Pompeu Fabra, as wonderfully efficient hosts,

• To fellow lexicographers, the late John Sinclair, and the (still extant) James Pustejovsky, who have inspired this approach,

• To Karel Pala, Pavel Rychlý, Adam Rambousek, and Adam Kilgarriff, who created tools that make this analysis possible,

• and to the Academy of Sciences of the Czech Republic (project T100300419) and the Czech Ministry of Education (National Research Program II project 2C06009), who are funding this research.

37