
How many words?

DAVID CRYSTAL

It is often said that the English language is particularly rich in vocabulary, but to make such a statement we need to know what words to count and what counts as a word.

Varying estimates

How many words are there in English? And how many of these words does a native speaker know? These apparently simple little questions turn out to be surprisingly complicated. In answer to the first, estimates have been given ranging from half a million to over 2 million. In answer to the second, the estimates have ranged from as low as 10,000 to over ten times that number.

People are, it seems, quite happy to drop all kinds of figures into their lectures and publications (see Panel 1). The figures give the impression of great precision - though it should be noted that they are usually accompanied by such emptying expressions as 'approximately', 'on average', or 'it is thought'. Nonetheless, the vagueness does not stop organizations offering courses and exercises (at a price) that will enable readers to 'increase their word power' - without ever providing these readers with the opportunity of discovering what their current word power actually is.

How can we throw light on this apparently confusing area? Let us begin with the question of how many words there are in English - a topic which has attracted almost as many estimates as estimators. The question is complex for two reasons. It partly depends on what you count as an English word, and partly on where you go looking for them.

What counts as a word?

Consider the problems if someone asked you to count the number of words in English. You would immediately find thousands of cases where you would not be sure whether to count one word or two. In writing, it is often not clear whether something should be written as a single word, as two words, or hyphenated. Is it washing machine or washing-machine? school children or school-children? flower pot, flower-pot or flowerpot? Would you count all the items beginning with foster as new words: foster brother, foster care, foster child, foster father, foster home, etc.? Or would you treat them as combinations of old words: foster + brother, care, and so on? This is a big problem for the dictionary-makers, who often reach different conclusions about what should be done.

What would you do with get at, get by, get in, get off, get over, and the dozens of other cases where get is used with an additional word? Would you count get once, for all of these, or would you say that, because these items have different meanings (get at, for example, can mean 'nag'), they should be counted separately? In which case, what about get it?, get your own back, get your act together, and all the other 'idioms'? Would you say that these had to be counted separately too? Would you count kick the bucket (meaning 'die') as three familiar words or as a single idiom? It hardly seems sensible to count the words separately, for kick has nothing to do with moving the foot, nor is bucket a container.

If you let the meaning influence you (as it should), then you will find your word count growing very rapidly indeed. But as soon as you do this, you will start to worry about other meanings, even in single words. Is there a single meaning for high in high tea, high priest and high season? Is the lock on a door the same basic meaning as the lock on a canal? Should ring (the shape) be kept separate from ring (the sound)? Are such cases 'the same word with different meanings' or 'different words'? These are the daily decisions that any word-counter (or dictionary compiler) must make.

Whose English are we counting?

Sooner or later, the question would arise about the kind of vocabulary to include in your count. There wouldn't be a difficulty if the words were part of standard English - used by educated people throughout the English-speaking world. Obviously these have to be counted. But what about the vast numbers of words which are not found everywhere - words which are restricted to a particular country (such as Canada, Britain, India, or Australia), or to a particular part of a country (such as Wales, Yorkshire or Liverpool)?

They will include words like stroller (= push-chair) and station (= stock farm) from Australia, bach (= holiday cottage) and pakeha (= white person) from New Zealand, dorp (= village) and indaba (= conference) from South Africa, cwm (= valley) and eisteddfod (= competitive arts festival) from Wales, faucet (= tap) and fall (= autumn) from North America, fortnight (= two weeks) and nappy (= baby wear) from Britain, loch (= lake) and wee (= small) from Scotland, dunny (= money) and duppy (= ghost) from Jamaica, lakh (= a hundred thousand) and crore (= ten million) from India, and many more.

Panel 1

Shakespeare had one of the largest vocabularies of any English writer, some 30,000 words. (Estimates of an educated person's vocabulary today vary, but it is probably about half this, 15,000.) (Robert McCrum, et al., The Story of English, 1986, p. 102)

He [Shakespeare] has the largest vocabulary of any writer in English, approximately 34,000 words, which is about double what an educated person uses today in their lifetime. (John Barton, in The Story of English episode 3)

At two years old the average vocabulary is about three hundred words. By the age of five it is about five thousand. By twelve it is about 12,000. And there for most people it rests - at the same size repertoire employed by a popular daily newspaper. (Jane Bouttell, The Guardian, 12 August 1986)

Graduates have an average vocabulary of about 23,000 words, fostered, I would contend, by intensive tutoring. (Jane Bouttel, also The Guardian)

ENGLISH TODAY No. 12 - OCTOBER 1987 11

Regional dialect words have every right to be included in an English vocabulary count. They are English words, after all - even if they are used only in a single locality. But no one knows how many there are. Several big dictionary projects exist, cataloguing the local words used in some of these areas, but in many parts of the world where English is a mother-tongue or second language, there has been little or no research.

And the smaller the locality, the greater the problem. Everyone knows that 'local' words exist: 'we have our own word for such-and-such round here'. Local dialect societies sometimes print lists of them, and dialect surveys try to keep records of them. But surveys are lengthy and expensive enterprises, and not many have been completed. As a result, most regional vocabulary - especially that used in cities - is never recorded. There must be thousands of distinctive words inhabiting such areas as Brooklyn, the East End of London, San Francisco, Edinburgh and Liverpool, none of which has ever appeared in any dictionary.

The more colloquial varieties of English - and slang, in particular - also tend to be given inadequate treatment. In dictionary-writing, the tradition has been to take material only from the written language, and this has led to the compilers concentrating on educated, standard forms. They commonly leave out non-standard expressions, such as everyday slang and obscenities, as well as the slang of specific social groups, such as the army, sport, thieves, public school, banking, or medicine. Eric Partridge once devoted a whole dictionary to this world of 'slang and unconventional English'. Some of the words it contained were thought to be so shocking that for several years many libraries banned it from their open shelves!

Keeping track of slang, though, is one of the most difficult tasks in vocabulary study, because it can be so shifting and short-lived. The life-span of a word or phrase may be only a few years - or even months. The expression might fall out of use in one social group, and reappear some time later in another. Who knows exactly how much use is still made today of such early jazz-world words as groovy, hip, square, solid, cat, and have a ball? Or how much use is made of the new slang terms derived from computers, such as he's integrated (= organised) or she's high res (= very alert, from 'high resolution')? Which words for 'being drunk' are now still current: canned, blotto, squiffy, jagged, paralytic, smashed ... ? And how do we get at the vast special vocabulary which has now grown up in the drugs world? Word-lovers from time to time make collections, but the feeling always exists that the items listed are only the tip of a huge lexical iceberg.

DAVID CRYSTAL read English at University College London, and has since held posts in linguistics at the University College of North Wales, Bangor, and at the University of Reading, where he taught for twenty years. He works currently as a writer, lecturer, and broadcaster on language and linguistics, maintaining his academic links through an honorary professorship in linguistics at Bangor. He is the editor of Linguistics Abstracts and Child Language Teaching and Therapy. Among his recent publications are Listen to Your Child, Who Cares About English Usage?, and Linguistic Encounters with Language Handicap. His most recent book is the Cambridge Encyclopedia of Language.

Some marginal cases

Estimating the vocabulary size of English is further complicated by the existence of hundreds of thousands of uncertain cases - words which you wouldn't feel were part of the 'central' vocabulary of the language. On the other hand, you might well feel unhappy about leaving them out.

What would you do with all the abbreviations that exist, for example? A recent dictionary of abbreviated words (the impressive Acronyms, Initialisms & Abbreviations Dictionary published by the Gale Research Company, 11th edition, 1987) lists over 400,000 entries. It includes old and familiar forms such as flu, hi-fi, deb, FBI, UFO, NATO and BA. There are large numbers of new technical terms, such as VHS (the video system), AIDS, and all the terms from computerspeak (PC, RAM, ROM, BASIC, bit) and space travel (SRB - solid rocket boosters, OMS - orbital manoeuvring system, etc.). And there are thousands of coinages which have a restricted regional currency, such as RAC (= Royal Automobile Club), AAA (= Automobile Association of America), or reflect local organisations and attitudes - with varying levels of seriousness - such as MADD (= Mothers Against Drunk Driving) and DAMM (= Drinkers Against Mad Mothers).

Because these forms are dependent on 'bigger' words for their existence, you might well decide not to include them in your count. On the other hand, you could argue that they are often more important than the original words - and that the original words may not even be remembered or known (as many people find with such forms as AIDS). Personally, I would include them in my word count - but some dictionaries do not.

There are other marginal cases. What would you do with the names of people, places and things in the world? Should London, Whitehall, Paris, Munich, and Spain be included in your word count? You might think they should - especially knowing that many of these words are different in other languages (such as München and España). However, it isn't usual to include them as part of the vocabulary of English, because the vast majority can appear in any language. Whichever language you speak, if you walk down Pall Mall, you can refer to where you are by using the words Pall Mall in your own language. The old music hall repartee relied on this point:

A: I say, I say, I say. I can speak French.
B: You can speak French? I didn't know that. Let me hear you speak French.
A: Paris, Marseilles, Nice, Calais, Jean-Paul Sartre.

The same applies to the names of people, animals, objects (such as trains and boats), and so on. Proper names aren't part of any one language: they are universal. However, it's important to note the usages where these words do take on special meanings - as in Has Whitehall said anything about this? Here, Whitehall means 'the government'; it isn't just a place name. Dictionaries would usually include this kind of usage in their list. But it's not at all clear how many uses of this kind there are.

Fauna and flora present a further type of difficulty. Around a million species of insects have already been described, for example. Which means that there must be around a million designations available to enable English-speaking entomologists to talk about their subject. How much of this can be included in our word count?

The largest dictionaries already include hundreds of thousands of technical and scientific terms, but none of them includes more than a fraction of the insect names - usually just the most important species. Add this total to that required for birds, fish, and other animals, and the theoretical size of English vocabulary increases enormously.

In the light of these problems, it may not be possible to arrive at a satisfactory total for English vocabulary. But one thing is plain: the core vocabulary, as reflected in the entry totals cited for such works as the unabridged Oxford English Dictionary or Webster's Third New International, is a considerable underestimate (see Panel 2). These totals focus on a figure of about half a million. However, if we allow in some of the above categories, this figure must be increased by a factor of three or four. I would never want to go below one million, for an estimate of English vocabulary, and with very little persuasion I would readily accept two.

Panel 2: Lexical coverage of three unabridged US dictionaries

A hint of the extent to which any given dictionary underestimates the total word-stock of English can be obtained from the table below, which lists the bold-face words found as initial items in the entries of three unabridged American dictionaries (variants later in the entry's opening line have been excluded). Of the 48 possible items listed, coverage ranges from 70% to 35%. Only nine words appear in all three dictionaries - less than 20% overlap. This figure is not much increased even if RH's proper names are excluded from consideration.

The same story emerges if pairs of dictionaries are compared. There is an overlap of 13 between WIII and RH, of 11 between RH and WBE, and of 10 between WIII and WBE, suggesting that, if this sample is representative, the average overlapping coverage (as defined by headwords) between any two dictionaries might be as low as 25%.

Columns: Webster III (WIII) / Random House (RH) / World Book Encyclopedia (WBE)

Rows from saba to Sabatini (entries as they appear across the three columns; column positions not preserved):
saba, saba
Sabadell
sabadilla, sabadilla, sabadilla
sabadine
sabadinine
sabaean1, Sabaean, Sabaean
sabaean2
Sabah
sabai grass
Sabaism
Sabaist
sabakha
sabal
sabalo, sabalo
sabalote
sabal palmetto
sabana
Sabaoth, Sabaoth
Sabata
Sabatier
Sabatini

Remaining entries, by column:
Webster III: sabathe's cycle, sabaton, sabayon, sabbat, sabbatarian1, sabbatarian2, sabbatarianism, sabbath, sabbath day1, sabbath day2, sabbatharian, sabbath-day house, sabbath-day's journey, sabbathless, sabbathly, sabbath school, sabbatia, sabbatian1, sabbatian2, sabbatical1, sabbatical2
Random House: sabaton, sabayon, Sabbat, Sabbatarian, Sabbatarianism, Sabbath, Sabbathless, Sabbathlike, Sabbath School, Sabbatical, Sabbatically, Sabbaticalness
World Book Encyclopedia: Sabbat, Sabbatarian, Sabbatarianism, Sabbath, sabbath, Sabbath-day's journey, Sabbathless, Sabbath School, sabbatic, sabbatical, sabbaticals, sabbatically

Totals: Webster III 34, Random House 22, World Book Encyclopedia 17

How large is your vocabulary?

There seems to be no more agreement about the size of an adult's vocabulary than there is about the total number of words in English. Estimates do indeed vary, as we have seen. Part of the problem, I imagine, is what is meant by 'educated'. But whether we are educated or not, how can we find out the truth of the matter?

We might tape record everything we said and heard for a month, or a year, and keep a record of everything we read and wrote. Then we could tabulate all the words, mark which ones we understood and which we failed to understand, and count up. But life is too short.

An alternative, which can be carried out in a couple of hours, gives a fairly good idea. You take a medium-sized dictionary - one which contains about 100,000 entries - and test your knowledge of a sample of the words it contains. A sample of about 2% of the whole, taken from various sections of the alphabet, gives a reasonable result. In other words, if such a dictionary were 2000 pages long, you would have a sample of 40 pages. Use the following procedure.


• It's wise to break this sample down into a series of selections, say of 5 pages each, from different parts of the dictionary. It wouldn't be sensible to take all 40 pages from the letter U, for instance, as a large number of these words would begin with un-, and this would hardly be typical. On the other hand, prefixes are an important aspect of English word formation, so we mustn't exclude them entirely. Similarly, it would be silly to include a section containing a large number of scientific words (such as the section containing electro-), or rare words (such as those beginning with X).

• One possible sample, which tries to balance various factors of this kind, would take sections of 5 complete pages from each of the following parts of the dictionary: C-, EX-, J-, O-, PL-, SC-, TO- and UN-. Begin with the first full page in each case - in other words, don't include the very first page of the C section, if the heading takes up a large part of the page; ignore the first few EX- entries, if they start towards the bottom of a page; and so on.

• Draw up a table of words like the one in Panel 3. On the left-hand side write in the headwords from the dictionary, as they appear. Do not include any parts of words which the dictionary might list, such as cac- or -caine, but do include words with affixes, such as cadetship alongside cadet, even if the former is listed only as -ship within the entry on cadet. In short, include all items in bold face within an entry. Include phrases or idioms (e.g. call the tune). Ignore alternative spellings (e.g. caesarian/cesarian).

• The table has two columns: the first asks you to say whether you think you know the word, from having heard or seen it used; the second whether you think you actually use it yourself in your speech or writing. This is the difference between passive and active vocabulary. Within each column, there are three judgments to be made. For passive vocabulary, you ask: 'Do I know the word well? vaguely? or not at all?'. For active vocabulary, you ask: 'Do I use the word often? occasionally? or not at all?'. Place a tick in the appropriate column. If you are uncertain, use the final column. You may need to look at the definition or examples given next to the word, before you can decide. Ignore the number of meanings the word has: if you know or use the word in any of its meanings, that will do. (Deciding how many meanings of a word you know or use would be another - much vaster - project!)

• When you've finished, add up the ticks in each column, and multiply the total by 50 (if the sample was 2% of the whole). The total in the first column is probably an underestimate of your vocabulary size. And if you take the first two columns together, the total will probably be an overestimate.

This procedure of course doesn't allow for people who happen to know a large number of non-standard words that may not be in the dictionary (such as local dialect words). If you are such a person, the figures will have to be adjusted again - but that will be pure guesswork.

Here are the estimates for the first two columns, as filled in by a female office secretary in her 50s:

KNOWN    Well            30,050
         Vaguely          8,250
         WORDS KNOWN     38,300

USED     Often           16,300
         Occasionally    15,200
         WORDS USED      31,500

The results are interesting. Note that passive vocabulary is much larger than active. This will always be the case. You will also find that it's easier to make up your mind about the words you definitely know than the words you frequently use.

Even allowing for wishful thinking, sampling bias, and other such factors, it would seem that some of the widely quoted estimates of our vocabulary size are a long way from reality. Comparisons with Shakespeare or other past writers are meaningless, given the enormous increase in English vocabulary since his day. What I would now very much like to know is (a) whether this procedure can be tightened up in some way, or whether a better procedure can be suggested, and (b) what range of totals emerges from people of varying backgrounds and ages. ET will publish in due course a range of vocabulary estimates from readers who have tried out the procedure for themselves (or, if they prefer, have tried it out on a 'friend'). If you do send in these details, please make sure you include data on age, educational background, and occupation, as well as the dictionary you used. The results will always be interesting, and may be surprising. If nothing else, it can provide you with a good topic for parties. There really isn't a way of capping such observations as 'I have an active vocabulary of approximately 38,600 words'. It will be a safe conversation-stopper - unless, that is, you encounter another ET reader at the same party.

Panel 3: Part of one person's vocabulary judgements [...] Dictionary of the English Language (90,000+ headwords). + = known/used.

Columns: KNOWN (Well / Vaguely / No), USED (Often / Occasionally / Never). One mark per word under KNOWN and one under USED; the column positions of the marks are not preserved.

cablese             + +
cable stitch        ++
cable television    + +
cable vision        + +
cableway            ++
cabman              + +
cabob               ++
Caboc               ++
cabochon (noun)     ++
cabochon (adverb)   ++
caboodle            + +
caboose             + +
cabotage            ++
cab-rank            ++
cabriole            ++
cabriolet           ++
cab stand           + +
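The arithmetic of the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the article's method: the function name scale_ticks and the tick counts are hypothetical, the counts having been back-calculated by dividing the secretary's published estimates by 50, since the article does not give her raw ticks.

```python
from fractions import Fraction

# The eight dictionary sections suggested in the article, 5 full pages each.
SECTIONS = ["C-", "EX-", "J-", "O-", "PL-", "SC-", "TO-", "UN-"]
PAGES_PER_SECTION = 5
sample_pages = len(SECTIONS) * PAGES_PER_SECTION  # 2% of a 2000-page dictionary


def scale_ticks(ticks, sample_fraction=Fraction(2, 100)):
    """Scale per-column tick counts from the sample up to the whole dictionary."""
    multiplier = 1 / sample_fraction  # a 2% sample gives a multiplier of 50
    return {column: int(count * multiplier) for column, count in ticks.items()}


# Hypothetical tick counts (assumption): the published estimates divided by 50.
sample_ticks = {"well": 601, "vaguely": 165, "often": 326, "occasionally": 304}

estimates = scale_ticks(sample_ticks)

# The first two columns together give the upper figure for each kind of vocabulary.
words_known = estimates["well"] + estimates["vaguely"]      # passive vocabulary
words_used = estimates["often"] + estimates["occasionally"]  # active vocabulary

print("sample pages:", sample_pages)
print("estimates:", estimates)
print("words known:", words_known)
print("words used:", words_used)
```

With these assumed counts the totals come out at 38,300 words known and 31,500 words used, matching the figures reported above. If a different sample size is taken, only sample_fraction needs to change.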