How many words?
DAVID CRYSTAL
It is often said that the English language is particularly rich in vocabulary, but to make such a statement we need to know what words to count and what counts as a word.
Varying estimates
How many words are there in English? And how many of these words does a native speaker know? These apparently simple little questions turn out to be surprisingly complicated. In answer to the first, estimates have been given ranging from half a million to over 2 million. In answer to the second, the estimates have been as low as 10,000 and over ten times that number.
People are, it seems, quite happy to drop all kinds of figures into their lectures and publications (see Panel 1). The figures give the impression of great precision - though it should be noted that they are usually accompanied by such emptying expressions as 'approximately', 'on average', or 'it is thought'. Nonetheless, the vagueness does not stop organizations offering courses and exercises (at a price) that will enable readers to 'increase their word power' - without ever providing these readers with the opportunity of discovering what their current word power actually is.
How can we throw light on this apparently confusing area? Let us begin with the question of how many words there are in English - a topic which has attracted almost as many estimates as estimators. The question is complex for two reasons. It partly depends on what you count as an English word, and partly on where you go looking for them.
What counts as a word?
Consider the problems, if someone asked you to count the number of words in English. You would immediately find thousands of cases where you would not be sure whether to count one word or two. In writing, it is often not clear whether something should be written as a single word, as two words, or hyphenated. Is it washing machine or washing-machine? school children or schoolchildren? flower pot, flower-pot or flowerpot? Would you count all the items beginning with foster as new words: foster brother, foster care, foster child, foster father, foster home, etc? Or would you treat them as combinations of old words: foster + brother, care, and so on? This is a big problem for the dictionary-makers, who often reach different conclusions about what should be done.
What would you do with get at, get by, get in, get off, get over, and the dozens of other cases where get is used with an additional word? Would you count get once, for all of these, or would you say that, because these items have different meanings (get at, for example, can mean 'nag'), they should be counted separately? In which case, what about get it?, get your own back, get your act together, and all the other 'idioms'? Would you say that these had to be counted separately too? Would you count kick the bucket (meaning 'die') as three familiar words or as a single idiom? It hardly seems sensible to count the words separately, for kick has nothing to do with moving the foot, nor is bucket a container.
If you let the meaning influence you (as it should), then you will find your word count growing very rapidly indeed. But as soon as you do this, you will start to worry about other meanings, even in single words. Is there a single meaning for high in high tea, high priest and high season? Is the lock on a door the same basic meaning as the lock on a canal? Should ring (the shape) be kept separate from ring (the sound)? Are such cases 'the same word with different meanings' or 'different words'? These are the daily decisions that any word-counter (or dictionary compiler) must make.
Panel 1
Shakespeare had one of the largest vocabularies of any English writer, some 30,000 words. (Estimates of an educated person's vocabulary today vary, but it is probably about half this, 15,000.) (Robert McCrum, et al., The Story of English, 1986, p. 102)
He [Shakespeare] has the largest vocabulary of any writer in English, approximately 34,000 words, which is about double what an educated person uses today in their lifetime. (John Barton, in The Story of English, episode 3)
Whose English are we counting?
Sooner or later, the question would arise about the kind of vocabulary to include in your count. There wouldn't be a difficulty if the words were part of standard English - used by educated people throughout the English-speaking world. Obviously these have to be counted. But what about the vast numbers of words which are not found everywhere - words which are restricted to a particular country (such as Canada, Britain, India, or Australia), or to a particular part of a country (such as Wales, Yorkshire or Liverpool)?
They will include words like stroller (= push-chair) and station (= stock farm) from Australia, bach (= holiday cottage) and pakeha (= white person) from New Zealand, dorp (= village) and indaba (= conference) from South Africa, cwm (= valley) and eisteddfod (= competitive arts festival) from Wales, faucet (= tap) and fall (= autumn) from North America, fortnight (= two weeks) and nappy (= baby wear) from Britain, loch (= lake) and wee (= small) from Scotland, dunny (= money) and duppy (= ghost) from Jamaica, lakh (= a hundred thousand) and crore (= ten million) from India, and many more.
Panel 1 (continued)
At two years old the average vocabulary is about three hundred words. By the age of five it is about five thousand. By twelve it is about 12,000. And there for most people it rests - at the same size repertoire employed by a popular daily newspaper. (Jane Bouttell, The Guardian, 12 August 1986)
Graduates have an average vocabulary of about 23,000 words, fostered, I would contend, by intensive tutoring. (Jane Bouttel, also The Guardian)
ENGLISH TODAY No. 12 - OCTOBER 1987 11
Regional dialect words have every right to be included in an English vocabulary count. They are English words, after all - even if they are used only in a single locality. But no one knows how many there are. Several big dictionary projects exist, cataloguing the local words used in some of these areas, but in many parts of the world where English is a mother-tongue or second language, there has been little or no research.
And the smaller the locality, the greater the problem. Everyone knows that 'local' words exist: 'we have our own word for such-and-such round here'. Local dialect societies sometimes print lists of them, and dialect surveys try to keep records of them. But surveys are lengthy and expensive enterprises, and not many have been completed. As a result, most regional vocabulary - especially that used in cities - is never recorded. There must be thousands of distinctive words inhabiting such areas as Brooklyn, the East End of London, San Francisco, Edinburgh and Liverpool, none of which has ever appeared in any dictionary.
The more colloquial varieties of English - and slang, in particular - also tend to be given inadequate treatment. In dictionary-writing, the tradition has been to take material only from the written language, and this has led to the compilers concentrating on educated, standard forms. They commonly leave out non-standard expressions, such as everyday slang and obscenities, as well as the slang of specific social groups, such as the army, sport, thieves, public school, banking, or medicine. Eric Partridge once devoted a whole dictionary to this world of 'slang and unconventional English'. Some of the words it contained were thought to be so shocking that for several years many libraries banned it from their open shelves!
Keeping track of slang, though, is one of the most difficult tasks in vocabulary study, because it can be so shifting and short-lived. The life-span of a word or phrase may be only a few years - or even months. The expression might fall out of use in one social group, and reappear some time later in another. Who knows exactly how much use is still made today of such early jazz-world words as groovy, hip, square, solid, cat, and have a ball? Or how much use is made of the new slang terms derived from computers, such as he's integrated (= organised) or she's high res (= very alert, from 'high resolution')? Which words for 'being drunk' are now still current: canned, blotto, squiffy, jagged, paralytic, smashed ... ? And how do we get at the vast special vocabulary which has now grown up in the drugs world? Word-lovers from time to time make collections, but the feeling always exists that the items listed are only the tip of a huge lexical iceberg.
DAVID CRYSTAL read English at University College London, and has since held posts in linguistics at the University College of North Wales, Bangor, and at the University of Reading, where he taught for twenty years. He works currently as a writer, lecturer, and broadcaster on language and linguistics, maintaining his academic links through an honorary professorship in linguistics at Bangor. He is the editor of Linguistics Abstracts and Child Language Teaching and Therapy. Among his recent publications are Listen to Your Child, Who Cares About English Usage?, and Linguistic Encounters with Language Handicap. His most recent book is the Cambridge Encyclopedia of Language.
Some marginal cases
Estimating the vocabulary size of English is further complicated by the existence of hundreds of thousands of uncertain cases - words which you wouldn't feel were part of the 'central' vocabulary of the language. On the other hand, you might well feel unhappy about leaving them out.
What would you do with all the abbreviations that exist, for example? A recent dictionary of abbreviated words (the impressive Acronyms, Initialisms & Abbreviations Dictionary published by the Gale Research Company, 11th edition, 1987) lists over 400,000 entries. It includes old and familiar forms such as flu, hi-fi, deb, FBI, UFO, NATO and BA. There are large numbers of new technical terms, such as VHS (the video system), AIDS, and all the terms from computerspeak (PC, RAM, ROM, BASIC, bit) and space travel (SRB - solid rocket boosters, OMS - orbital manoeuvring system, etc.). And there are thousands of coinages which have a restricted regional currency, such as RAC (= Royal Automobile Club), AAA (= Automobile Association of America), or reflect local organisations and attitudes - with varying levels of seriousness - such as MADD (= Mothers Against Drunk Driving) and DAMM (= Drinkers Against Mad Mothers).
Because these forms are dependent on 'bigger' words for their existence, you might well decide not to include them in your count. On the other hand, you could argue that they are often more important than the original words - and that the original words may not even be remembered or known (as many people find with such forms as AIDS). Personally, I would include them in my word count - but some dictionaries do not.
There are other marginal cases. What would you do with the names of people, places and things in the world? Should London, Whitehall, Paris, Munich, and Spain be included in your word count? You might think they should - especially knowing that many of these words are different in other languages (such as München and España). However, it isn't usual to include them as part of the vocabulary of English, because the vast majority can appear in any language. Whichever language you speak, if you walk down Pall Mall, you can refer to where you are by using the words Pall Mall in your own language. The old music hall repartee relied on this point:
A: I say, I say, I say. I can speak French.
B: You can speak French? I didn't know that. Let me hear you speak French.
A: Paris, Marseilles, Nice, Calais, Jean-Paul Sartre.
The same applies to the names of people, animals, objects (such as trains and boats), and so on. Proper names aren't part of any one language: they are universal. However, it's important to note the usages where these words do take on special meanings - as in Has Whitehall said anything about this? Here, Whitehall means 'the government'; it isn't just a place name. Dictionaries would usually include this kind of usage in their list. But it's not at all clear how many uses of this kind there are.
Fauna and flora present a further type of difficulty. Around a million species of insects have already been described, for example. Which means that there must be around a million designations available to enable English-speaking entomologists to talk about their subject. How much of this can be included in our word count?
The largest dictionaries already include hundreds of thousands of technical and scientific terms, but none of them includes more than a fraction of the insect names - usually just the most important species. Add this total to that required for birds, fish, and other animals, and the theoretical size of English vocabulary increases enormously.
In the light of these problems, it may not be possible to arrive at a satisfactory total for English vocabulary. But one thing is plain: the core vocabulary, as reflected in the entry totals cited for such works as the unabridged Oxford English Dictionary or Webster's Third New International, is a considerable underestimate (see Panel 2). These totals focus on a figure of about half a million. However, if we allow in some of the above categories, this figure must be increased by a factor of three or four. I would never want to go below one million, for an estimate of English vocabulary, and with very little persuasion I would readily accept two.
Panel 2: Lexical coverage of three unabridged US dictionaries
A hint of the extent to which any given dictionary underestimates the total word-stock of English can be obtained from the table below, which lists the bold-face words found as initial items in the entries of three unabridged American dictionaries (variants later in the entry's opening line have been excluded). Of the 48 possible items listed, coverage ranges from 70% to 35%. Only nine words appear in all three dictionaries - less than 20% overlap. This figure is not much increased even if RH's proper names are excluded from consideration.
The same story emerges if pairs of dictionaries are compared. There is an overlap of 13 between W III and RH, of 11 between RH and WBE, and of 10 between W III and WBE, suggesting that, if this sample is representative, the average overlapping coverage (as defined by headwords) between any two dictionaries might be as low as 25%.
Webster III (W III), total 34: saba, sabadilla, sabadine, sabadinine, sabaean (1), sabaean (2), sabai grass, sabakha, sabal, sabalo, sabalote, sabal palmetto, sabana, sabathe's cycle, sabaton, sabayon, sabbat, sabbatarian (1), sabbatarian (2), sabbatarianism, sabbath, sabbath day (1), sabbath day (2), sabbatharian, sabbath-day house, sabbath-day's journey, sabbathless, sabbathly, sabbath school, sabbatia, sabbatian (1), sabbatian (2), sabbatical (1), sabbatical (2)
Random House (RH), total 22: saba, Sabadell, sabadilla, Sabaean, Sabah, sabalo, Sabaoth, Sabata, Sabatier, Sabatini, sabaton, sabayon, Sabbat, Sabbatarian, Sabbatarianism, Sabbath, Sabbathless, Sabbathlike, Sabbath School, Sabbatical, Sabbatically, Sabbaticalness
World Book Encyclopedia (WBE), total 17: sabadilla, Sabaean, Sabaism, Sabaist, Sabaoth, Sabbat, Sabbatarian, Sabbatarianism, Sabbath, sabbath, Sabbath-day's journey, Sabbathless, Sabbath School, sabbatic, sabbatical, sabbaticals, sabbatically
How large is your vocabulary?
There seems to be no more agreement about the size of an adult's vocabulary than there is about the total number of words in English. Estimates do indeed vary, as we have seen. Part of the problem, I imagine, is what is meant by 'educated'. But whether we are educated or not, how can we find out the truth of the matter?
We might tape record everything we said and heard for a month, or a year, and keep a record of everything we read and wrote. Then we could tabulate all the words, mark which ones we understood and which we failed to understand, and count up. But life is too short.
An alternative, which can be carried out in a couple of hours, gives a fairly good idea. You take a medium-sized dictionary - one which contains about 100,000 entries - and test your knowledge of a sample of the words it contains. A sample of about 2% of the whole, taken from various sections of the alphabet, gives a reasonable result. In other words, if such a dictionary were 2000 pages long, you would have a sample of 40 pages. Use the following procedure.
• It's wise to break this sample down into a series of selections, say of 5 pages each, from different parts of the dictionary. It wouldn't be sensible to take all 40 pages from the letter U, for instance, as a large number of these words would begin with un-, and this would hardly be typical. On the other hand, prefixes are an important aspect of English word formation, so we mustn't exclude them entirely. Similarly, it would be silly to include a section containing a large number of scientific words (such as the section containing electro-), or rare words (such as those beginning with X).
• One possible sample, which tries to balance various factors of this kind, would take sections of 5 complete pages from each of the following parts of the dictionary: C-, EX-, J-, O-, PL-, SC-, TO- and UN-. Begin with the first full page in each case - in other words, don't include the very first page of the C section, if the heading takes up a large part of the page; ignore the first few EX- entries, if they start towards the bottom of a page; and so on.
• Draw up a table of words like the one in Panel 3. On the left-hand side write in the headwords from the dictionary, as they appear. Do not include any parts of words which the dictionary might list, such as cac- or -caine, but do include words with affixes, such as cadetship alongside cadet, even if the former is listed only as -ship within the entry on cadet. In short, include all items in bold face within an entry. Include phrases or idioms (e.g. call the tune). Ignore alternative spellings (e.g. caesarian/cesarian).
• The table has two columns: the first asks you to say whether you think you know the word, from having heard or seen it used; the second whether you think you actually use it yourself in your speech or writing. This is the difference between passive and active vocabulary. Within each column, there are three judgments to be made. For passive vocabulary, you ask: 'Do I know the word well? vaguely? or not at all?'. For active vocabulary, you ask: 'Do I use the word often? occasionally? or not at all?'. Place a tick in the appropriate column. If you are uncertain, use the final column. You may need to look at the definition or examples given next to the word, before you can decide. Ignore the number of meanings the word has: if you know or use the word in any of its meanings, that will do. (Deciding how many meanings of a word you know or use would be another - much vaster - project!)
• When you've finished, add up the ticks in each column, and multiply the total by 50 (if the sample was 2% of the whole). The total in the first column is probably an underestimate of your vocabulary size. And if you take the first two columns together, the total will probably be an overestimate.
Panel 3: Part of one [...] Dictionary of the English Language (90,000+ headwords). + = known/used.
Columns: KNOWN (Well, Vaguely, No); USED (Often, Occasionally, Never).
Headwords, each marked + once under KNOWN and once under USED: cablese, cable stitch, cable television, cable vision, cableway, cabman, cabob, Caboc, cabochon (noun), cabochon (adverb), caboodle, caboose, cabotage, cab-rank, cabriole, cabriolet, cab stand.
This procedure of course doesn't allow for people who happen to know a large number of non-standard words that may not be in the dictionary (such as local dialect words). If you are such a person, the figures will have to be adjusted again - but that will be pure guesswork.
Here are the estimates for the first two columns, as filled in by a female office secretary in her 50s:
KNOWN: Well 30,050; Vaguely 8,250; WORDS KNOWN 38,300
USED: Often 16,300; Occasionally 15,200; WORDS USED 31,500
The results are interesting. Note that passive vocabulary is much larger than active. This will always be the case. You will also find that it's easier to make up your mind about the words you definitely know than the words you frequently use.
Even allowing for wishful thinking, sampling bias, and other such factors, it would seem that some of the widely quoted estimates of our vocabulary size are a long way from reality. Comparisons with Shakespeare or other past writers are meaningless, given the enormous increase in English vocabulary since his day. What I would now very much like to know is (a) whether this procedure can be tightened up in some way, or whether a better procedure can be suggested, and (b) what range of totals emerges from people of varying backgrounds and ages. ET will publish in due course a range of vocabulary estimates from readers who have tried out the procedure for themselves (or, if they prefer, have tried it out on a 'friend'). If you do send in these details, please make sure you include data on age, educational background, and occupation, as well as the dictionary you used. The results will always be interesting, and may be surprising. If nothing else, it can provide you with a good topic for parties. There really isn't a way of capping such observations as 'I have an active vocabulary of approximately 38,600 words'. It will be a safe conversation-stopper - unless, that is, you encounter another ET reader at the same party.
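
The arithmetic of the sampling procedure described above can be sketched in a few lines of Python. This is only an illustration, not part of the article's method as published: the function names, the dictionary keys and the tick counts are assumptions. In particular, the tick counts are back-calculated from the secretary's published totals by dividing by 50, since the article gives only the scaled figures.

```python
from fractions import Fraction

# Assumption: a 2% sample, as the article suggests.
DEFAULT_SAMPLE = Fraction(2, 100)


def sample_pages(total_pages, sample_fraction=DEFAULT_SAMPLE):
    """Number of dictionary pages to sample (2% of 2000 pages is 40)."""
    return round(total_pages * sample_fraction)


def scale_ticks(ticks, sample_fraction=DEFAULT_SAMPLE):
    """Scale each column's tick count up to the whole dictionary."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be greater than 0 and at most 1")
    factor = 1 / Fraction(sample_fraction)  # a 2% sample means multiplying by 50
    return {column: round(count * factor) for column, count in ticks.items()}


def vocabulary_range(ticks, sample_fraction=DEFAULT_SAMPLE):
    """Return (probable underestimate, probable overestimate) for each kind.

    The first column alone gives the lower figure; the first two columns
    together give the upper figure.
    """
    scaled = scale_ticks(ticks, sample_fraction)
    return {
        "known": (scaled["well"], scaled["well"] + scaled["vaguely"]),
        "used": (scaled["often"], scaled["often"] + scaled["occasionally"]),
    }


# Hypothetical tick counts, back-calculated from the published totals
# (30,050 / 8,250 / 16,300 / 15,200, each divided by 50).
secretary_ticks = {"well": 601, "vaguely": 165, "often": 326, "occasionally": 304}

print(sample_pages(2000))                 # 40
print(vocabulary_range(secretary_ticks))  # {'known': (30050, 38300), 'used': (16300, 31500)}
```

Exact fractions are used so that a 2% sample scales by exactly 50. With a different sample size, only `sample_fraction` needs to change.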