Context-sensitive regularities in English vowel spelling
MARK ARONOFF and ERIC KOCH
Department of Linguistics, The University at Stony Brook, Stony Brook, New York, USA

ABSTRACT. The predictive value of rime spellings in English was compared directly to other types of regularities beyond the level of the single letter. A computer-assisted analysis of a list of twenty-four thousand written words, each paired with its corresponding pronunciation, reveals that only a small number of rime spellings are highly regular in their pronunciations. The conventional division of vowel letter pronunciations into short and long in closed and open written syllables is the most reliable key to English pronunciation. Our findings support the notion that English spelling is based at least in part on syllable structure. In addition, prefixes and suffixes provide very reliable clues to pronunciation, which suggests that their regularity should be exploited in the teaching of reading.
KEY WORDS: Decoding, Linguistic regularity, Pronunciation, Rhyme, Rime, Spelling, Syllable

Types of linguistic regularities in English spelling
Most linguists have long believed that the optimal writing system is an alphabet in which each letter stands for one speech sound and vice versa. In fact, no commonly used alphabet is phonemically optimal in this way, although a few come close, like those of Italian and Polish. Of all alphabetic writing systems, though, that of English is farthest from this ideal: no English letter corresponds without exception to one sound, and some letters correspond to almost twenty sounds. In addition, the system of correspondences is not confined to individual letters but ranges from one extreme to the other: from single letters to whole words. Written English words sometimes can be sounded out one letter at a time. So, up is /ʌp/ because ⟨u⟩ = /ʌ/ in a closed syllable (a notion that we discuss in detail below) and ⟨p⟩ = /p/. (In Table 1 we have given all the phonetic symbols that we use in this paper.) Sometimes,

Table 1. Phonetic symbols and their pronunciations




Reading and Writing: An Interdisciplinary Journal 8: 251-265 (June 1996). © 1996 Kluwer Academic Publishers. Printed in the Netherlands.

is always pronounced as in luck; and ⟨dg⟩, which is always pronounced as in judge. English vowel letters are especially irregular in their pronunciation if we look at them only out of context. Unlike consonant letters, no English vowel letter approaches context-free letter-to-sound correspondence. The letter ⟨o⟩ alone corresponds to seventeen different sounds (Venezky 1970). However, once we allow ourselves to look at contexts (sequences of vowel and consonant letters), greater regularity emerges. For starters, sequences of vowel letters without consonant letters, vowel digraphs, are much more regular in their pronunciation than are individual vowel letters. So, while ⟨o⟩ has seventeen different pronunciations and ⟨a⟩ has ten, ⟨oa⟩ has only three: /ow/ (as in loaf), /ɔ/, and /owej/. But the second occurs only in the words broad and abroad, while the last occurs only in one word, oasis. In the great majority of cases, ⟨oa⟩ is pronounced /ow/. In Tables 2 and 3 we list vowel digraphs, as well as vowel digraphs followed by ⟨r⟩, along with their most common pronunciations.
Sequences consisting of a vowel letter followed by one or more consonant letters are also often quite regular. So, if we were to demand regularity at the level of the individual letter, then we would say that tight is pronounced

Table 2. The pronunciation of vowel digraphs
Vowel digraph: ⟨ae⟩, ⟨ai⟩, ⟨au⟩, ⟨aw⟩, ⟨ay⟩, ⟨ea⟩, ⟨ee⟩, ⟨ei⟩, ⟨eo⟩, ⟨eu⟩, ⟨ew⟩, ⟨ey⟩, ⟨ie⟩, ⟨oa⟩




Table 3. The pronunciation of vowel digraph + r
Vowel digraph + r: ⟨aer⟩, ⟨air⟩, ⟨aur⟩, ⟨ear⟩, ⟨eer⟩




irregularly, but tight, light, sight, fight and (except for eight) all of the over one hundred words in English that end in ⟨ight⟩ are pronounced with /ajt/; that is, the letter sequence ⟨Cight⟩ has a regular pronunciation, or, equivalently, the letter ⟨i⟩ is regularly pronounced /aj/ when it is followed by ⟨ght⟩. Interestingly, a regular letter sequence like this one, consisting of a vowel followed by one or more consonants, is not linguistically arbitrary, but rather corresponds to a well-known unit: the syllable rime. We will therefore call a sequence of a vowel letter followed by one or more consonant letters that corresponds to a syllable rime a rime spelling. By contrast, sequences that consist of a vowel letter preceded by one or more consonant letters, which do not correspond to a linguistically important subpart of a syllable, do not have any special role in English spelling.

Other linguistic units besides the rime are important in understanding English spelling. One major morphological characteristic of English spelling that sets it apart from most other spelling systems is that it systematically distinguishes homonyms.³ Thus, we have examples like pair, pare and pear, which are spelled differently, though all of them are pronounced /per/. In the case of homonyms, different spellings are used to distinguish lexical items that have the same pronunciation. Similarly, an affix will sometimes have a constant spelling even though it is pronounced quite differently in different contexts. For example, the spelling of the regular plural marker for nouns is almost always ⟨s⟩ and that of the regular past tense marker always ⟨ed⟩, even though the pronunciation of each varies quite systematically. The plural marker ⟨s⟩ may be pronounced /s/ (as in cuts), /z/ (as in hills), or /ɪz/ (as in horses), depending on the last sound of the word that it is attached to, but it is almost always spelled ⟨s⟩, regardless of its pronunciation. Similarly, the past tense marker ⟨ed⟩ is pronounced /t/ (stopped), /d/ (stubbed), or /ɪd/ (butted), again depending on the last sound of the word it attaches to, yet it is always spelled ⟨ed⟩. There is no good reason why one affix (⟨ed⟩) is always spelled with ⟨e⟩, regardless of whether the ⟨e⟩ is pronounced, while the other one (⟨s⟩) is usually not spelled with ⟨e⟩, but nonetheless, the spelling of these individual morphemes is very regular.
The regularities in the pronunciation of written English words are thus found at several well-understood linguistic levels beyond the individual letter/sound correspondence that linguists expect to find in a simple ideal alphabet: lexical (whole words), morphological (individual affixes), vowel digraphs (vowel letter sequences), and rimes (vowel-consonant letter sequences).
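The affix regularity described above, one spelling with several systematically conditioned pronunciations, can be illustrated with a small rule-based sketch that chooses the spoken form of plural ⟨s⟩ and past tense ⟨ed⟩ from the stem-final sound. The phoneme classes and function names below are hypothetical simplifications for illustration, not the authors' feature system.

```python
# Hypothetical, simplified phoneme classes (broad transcription).
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}


def plural_allomorph(final_sound: str) -> str:
    """Spoken form of plural ⟨s⟩ after a stem ending in final_sound."""
    if final_sound in SIBILANTS:
        return "ɪz"  # horse + s -> horses
    if final_sound in VOICELESS:
        return "s"   # cut + s -> cuts
    return "z"       # hill + s -> hills (voiced consonants and vowels)


def past_allomorph(final_sound: str) -> str:
    """Spoken form of past tense ⟨ed⟩ after a stem ending in final_sound."""
    if final_sound in {"t", "d"}:
        return "ɪd"  # butt + ed -> butted
    if final_sound in VOICELESS:
        return "t"   # stop + ed -> stopped
    return "d"       # stub + ed -> stubbed


# One constant spelling, three pronunciations each.
for final, word in [("t", "cuts"), ("l", "hills"), ("s", "horses")]:
    print(f"{word}: ⟨s⟩ = /{plural_allomorph(final)}/")
for final, word in [("p", "stopped"), ("b", "stubbed"), ("t", "butted")]:
    print(f"{word}: ⟨ed⟩ = /{past_allomorph(final)}/")
```

The sibilant check comes before the voicing check because /s/ and /ʃ/ belong to both classes.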
This system of simultaneous regularities is both what makes English spelling so difficult to understand and what makes it interesting to linguists.
Long and short vowels in relation to syllable rimes
Traditional spelling instruction already depends on the notion of syllable rimes, but only indirectly and by another name: the classification of vowel letter pronunciations as long and short. Old English had pairs of truly long and short vowel sounds that were identical except that one took up more time than the other. By the Middle English period, long and short vowel sounds occurred largely in separate environments: short vowel sounds mostly in closed syllables or closed rimes, in which the vowel sound is followed by one or two consonant sounds in the same syllable; long vowel sounds largely in open syllables or open rimes, in which there is no closing consonant sound at the end of the syllable, so that the syllable is said to be open. We will use the abbreviation CV for open syllables and CVC for closed syllables.⁴
In the period that divides Middle English from Modern English (1400-1600), the phonetic value of the long vowel sounds was altered quite dramatically in the Great Vowel Shift, but their spelling did not change. The result is the spelling system that we have today, in which the same letter stands for two quite distinct vowel sounds, which we call long and short, although they are only historically related to pairs of long and short vowel sounds and are otherwise quite different from one another phonetically. These are distributed more or less as their precursors were: short pronunciations in closed syllables and long pronunciations in open syllables. This system is laid out in Table 4, which also gives the pronunciations of vowel letters in two other types of syllables that we discuss in more detail below. Henceforth, we will refer to the system of long and short pronunciation of vowels in open and closed syllables, as well as the system of vowel digraphs and vowel digraphs followed by ⟨r⟩ (laid out in Tables 2 and 3), as syllable rime-type analysis, since it depends on the type of syllable rime that a vowel letter occurs in, without reference to individual consonants.
One other sound change determined the spelling of historically long and short vowel sounds: the loss of unstressed final /e/.
In a word like tale, the final ⟨e⟩ was originally fully pronounced and the word contained two syllables. The first syllable was open, and so the vowel sound in that syllable was long. At about the time of Chaucer, word-final unstressed /e/ was dropped. The consonant sound that preceded this vowel sound then became the closing consonant sound of the preceding syllable, so that the word tale was now pronounced as a single closed syllable, CVC. One might have expected the vowel sound of this closed syllable to become short, in accordance with the general pattern of short vowel sounds in closed syllables and long vowel sounds in open syllables, but it didn't. Furthermore, the ⟨e⟩, while it was no longer pronounced, remained in the writing system as what we now call silent e, a kind of virtual vowel: although it is not pronounced, the ⟨e⟩ still serves as a placeholder, so that the ⟨l⟩ of tale can be said to form a purely written CV syllable with ⟨e⟩, and the first syllable is still effectively open orthographically, though not in its spoken form. Silent ⟨e⟩ thus functions in Modern English as a way of indicating preceding orthographically open syllables and long vowel sounds, even in cases where the spoken word consists of a closed syllable. This allows the writing system to preserve the pattern of so-called long vowels in open (written) syllables and so-called short vowel sounds in closed (written) syllables. We refer to this word-final silent ⟨e⟩ spelling pattern as VCe.
Table 4. Default pronunciations of single vowel letters by syllable type
Letter   Open syllable   Closed syllable   Vowel + r syllable   VCe syllable
The opposite of silent ⟨e⟩ is consonant doubling. In the same way that silent ⟨e⟩ is used to indicate that the vowel sound of the previous written syllable is long, consonant doubling is used to indicate that it is short. There are no true phonetically double consonants in spoken English, except when the first word of a compound ends in a certain consonant sound and the second word begins with the same one (e.g., crabbait or fiallike). Otherwise, when we find a double consonant letter, its purpose is usually to ensure that the preceding syllable is orthographically closed, so that the vowel sound will be short. Consider the words hop and hopping. We double the ⟨p⟩ when we add ⟨ing⟩ to hop in hopping, because without the double consonant letter, we would have hoping, with an orthographically open first syllable, and the vowel sound would be long.
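The silent ⟨e⟩ and consonant-doubling conventions just described can be sketched as a spelling rule for attaching ⟨ing⟩ to a one-syllable stem. The function is a hypothetical, deliberately simplified illustration: it handles only CVC stems like hop and VCe stems like hope, assumes ⟨w⟩, ⟨x⟩ and ⟨y⟩ are never doubled, and ignores stress and longer words.

```python
VOWELS = set("aeiou")
NEVER_DOUBLED = {"w", "x", "y"}  # assumption for this sketch


def add_ing(stem: str) -> str:
    """Attach ⟨ing⟩ to a one-syllable stem, keeping its written syllable type."""
    if len(stem) >= 3 and stem.endswith("e") and stem[-2] not in VOWELS and stem[-3] in VOWELS:
        # VCe (hope): drop the silent e; the lone consonant starts the next
        # written syllable, so the first syllable stays open and the vowel long.
        return stem[:-1] + "ing"
    if (len(stem) >= 3 and stem[-1] not in VOWELS | NEVER_DOUBLED
            and stem[-2] in VOWELS and stem[-3] not in VOWELS):
        # CVC (hop): double the final consonant so the first written
        # syllable stays closed and the vowel short.
        return stem + stem[-1] + "ing"
    # Anything else (help, read, see): just add the suffix.
    return stem + "ing"


for stem in ["hop", "hope", "help", "read"]:
    print(stem, "->", add_ing(stem))
```

Without the doubling branch, hop would yield hoping, which is exactly the ambiguity the convention avoids.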
Doubling the ⟨p⟩ ensures that the first syllable will remain closed orthographically, so that the vowel sound will be short.
The distinction between open and closed syllable rimes is therefore pervasive in English spelling. By and large, a vowel sound in an orthographically closed syllable rime is short, while a vowel sound in an orthographically open syllable rime is long. The system thus depends on syllable structure and in particular on the distinction between orthographically open and closed syllable rimes. But this pattern of orthographically open and closed syllable rimes is quite abstract and sometimes difficult to grasp. It is also distant from phonetic reality, since it depends on written syllables that are distinct from the actual spoken syllables that a child can learn to perceive with minimal training.
Rime spelling
As we noted above, traditional spelling instruction emphasized only the two extremes of individual letters and whole words. Other levels of regularity were largely ignored by educators. Current spelling instruction programs take advantage of letter/sound, whole-word and morphological regularities together. The question that provoked this study is whether individual rime spellings like the ⟨ight⟩ of light show their own special regularity, distinct from the general patterns of long and short vowel pronunciations that we find in open and closed written syllables. Since individual rime spellings are much more concrete than the open and closed syllable rime types that determine the distribution of long and short pronunciations of vowel letters, using individual rimes might be beneficial in teaching. But the first question that has to be answered is whether these individual rime spellings play a role in the system. The primary purpose of our study was, therefore, to investigate the predictive value of individual rime spellings compared directly to the general pattern of long and short vowel letter pronunciations in open and closed written syllable rimes, as well as compared to the other types of regularities that we have identified beyond the level of the single letter (morphemes and whole words). If rime spellings are indeed predictive on their own, then it makes sense to add them to the repertoire of teaching tools.
We began this study because we had noticed that certain rime spellings do indeed have predictive value. Compare the vowel digraph ⟨oo⟩ and the rime spelling ⟨ook⟩. ⟨oo⟩ is normally pronounced /uw/ (e.g., boot, broom, cool), but ⟨ook⟩ is nearly always pronounced /ʊ/: book, cook, hook, shook, look, nook, snook, rook, brook, crook, forsook, and took are consistent with the /ʊ/ pronunciation, while only spook has the predicted /uw/ pronunciation. It would seem much easier for a child to learn one fact (⟨ook⟩ at the end of a syllable is pronounced /ʊ/) than to have to memorize separately each word that ends in ⟨ook⟩.
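The ⟨ook⟩ observation can be phrased as a rule that consults a rime spelling before falling back on the vowel digraph's default. The sketch below is hypothetical: its two tiny tables contain only the entries discussed in the text, and it scores the rule against the ⟨ook⟩ words cited there, treating spook as the single /uw/ exception.

```python
# Hypothetical rule tables containing only the entries discussed in the text.
RIME_RULES = {"ook": "ʊ"}        # rime spelling -> vowel sound
DIGRAPH_DEFAULTS = {"oo": "uw"}  # vowel digraph -> default vowel sound


def predict_vowel(word: str) -> str:
    """Predict the vowel of a one-syllable word: rime rule first, then digraph default."""
    for rime, sound in RIME_RULES.items():
        if word.endswith(rime):
            return sound
    for digraph, sound in DIGRAPH_DEFAULTS.items():
        if digraph in word:
            return sound
    raise ValueError(f"no rule covers {word!r}")


# Actual vowels for the ⟨ook⟩ words cited in the text.
OOK_WORDS = {w: "ʊ" for w in ["book", "cook", "hook", "shook", "look", "nook",
                              "snook", "rook", "brook", "crook", "forsook"]}
OOK_WORDS["spook"] = "uw"

correct = sum(predict_vowel(w) == v for w, v in OOK_WORDS.items())
print(f"rime rule: {correct}/{len(OOK_WORDS)} correct")
print("boot ->", predict_vowel("boot"))  # falls back to the ⟨oo⟩ default
```

One rule plus one memorized exception covers the whole set, whereas the digraph default alone would get only spook right.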


In order to investigate the differential impact of the types of linguistic regularity in English spelling that we have identified, we undertook a computer-assisted analysis of the spelling patterns found in a list of twenty-four thousand written words, each paired with its corresponding pronunciation. The list contained both very common simple words such as bat and less frequent complex words such as abnegation. We programmed a computer so that each written word on this list could be transformed automatically, through a series of steps, from spelling into as much pronunciation transcription as possible, using the four types of spelling regularity that we have identified: whole-word, morphological, rime, and syllable rime-type. Since the consonant letters of English do not present as great a problem to learners, only the vowel letters were examined. The amount of transformation from written word to pronunciation naturally depends upon the impact that each type of regularity has. For each type of spelling regularity, the number of affected syllables and the number of affected words in the entire list were recorded. Most importantly, the hypothetical pronunciation arrived at by using each type of regularity individually was then compared against the actual pronunciation, and an accuracy percentage for each type of regularity was established. After the types were examined individually, the four types were examined collectively and an accuracy percentage was established for all four types when used together. The priority order for the types used in succession was whole-word, morphological, rime and syllable rime-type. This order is based upon the projected size of the domain that each type of regularity should affect. Thus, whole-word analysis replaces an entire word with its pronunciation, whereas syllable rime-type analysis replaces only one syllable of a word at a time.
Written syllables and rime spellings
Both the traditional method of distinguishing between long and short vowel sounds and the rime-based method depend on being able to break a written word into written syllables. Most people are quite adept at breaking spoken words up into syllables, and young children generally find it much easier to manipulate spoken syllables than to deal with individual speech sounds, as first shown by Savin (1972). However, this ability to break spoken words up into syllables, although useful for developing language awareness, cannot be translated directly into reading, for the simple but often overlooked reason that a person cannot divide a word into spoken syllables without first knowing how the entire word sounds. A child who is looking at a written word on paper and is trying to read it does not know how it sounds. Indeed, figuring out how the word sounds is precisely what we are trying to teach the child to do. The ability to break a spoken word up into syllables is therefore not directly transferable at first to the act of reading an unfamiliar word, although it is useful in spelling unfamiliar words, which involves the inverse task of turning spoken words into written words (Goswami and Bryant 1990). If we are trying to measure the value of written syllables and rimes in decoding, then we must look not at spoken syllables but at written syllables.
If we can break a written word up into written syllables without relying on how the word sounds, and if the pronunciation of these written syllables is regular, then it may be useful to use written syllable structure directly in teaching children to read. But a method of written syllable division that depends on spoken words, although it might have some indirect value, is not what we are after. Stanback (1991, 1992), in her pioneering study of individual rime spellings, employs such a speech-dependent method of syllable division, based on Kenyon (1934). Kenyon's rules, however, rely on the spoken form of a word as well as its written form. For example, Kenyon divides decade syllabically as [dec]σ[ade]σ, but parade as [pa]σ[rade]σ. The first begins with a closed syllable and the second with an open syllable, even though the two words have exactly the same written structure: CVCVCe. Closer inspection of Kenyon's rules for syllable division explains the discrepancy: decade and parade differ in their spoken form; decade is stressed on the first vowel, while parade is stressed on the second vowel and the first is reduced. This difference in stress, which is not detectable from spelling, determines the difference in syllable structure.

Since Kenyon's rules require knowledge of stress within a spoken word, as well as finding particular speech sounds within a word, they do not meet our needs. A child who did not know how to pronounce a particular written word could not apply Kenyon's rules to that word, whereas if the child knew how to pronounce the word, then she or he could apply Kenyon's rules, but would have no need to learn the pronunciation. In dividing a written word up into open and closed written syllables to which we could apply our analysis, we therefore employed an algorithm that depended only on the written distinction between consonant letters and vowel letters: if a vowel letter is followed by at least one consonant letter at the end of a word, or by at least two consonant letters anywhere within the word (allowing for digraphs like ⟨th⟩ to count as one letter), the written syllable containing that vowel is defined as closed; in all other cases, the written syllable is defined as open.⁶ Following Stanback, we do not limit the categorization of written syllables to open and closed. A sequence of vowel letters, or a vowel letter followed by ⟨w⟩ or ⟨y⟩, is classified as a vowel digraph syllable, and a vowel followed by ⟨r⟩ is classified as a vowel-r syllable. A minor syllable type is consonant-le or -re, in which a vowel letter is followed by a single consonant letter, which is in turn followed by ⟨le⟩ or ⟨re⟩ (as in ruble or fibre). Syllables of this type pattern like open syllables, in that the pronunciation of the vowel letter is normally long. The category silent e, whose name implies knowledge of silent letters in words, has been reinterpreted as one in which words end in a ⟨VCe⟩ letter pattern. This gives us the six types of written syllables shown in Table 5, which are essentially identical to Stanback's.

Table 5. Written syllable types with examples
Syllable type            Example (using the letter o)
open                     hello
closed                   spot
vowel digraph            road
vowel + r                for
VCe                      note
consonant-le or -re      noble, ogre
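Because the written-syllable algorithm described above depends only on letters, it can be sketched directly in code. This is a hypothetical reconstruction, not the authors' program: it treats ⟨a e i o u⟩ as the vowel letters, counts only ⟨th⟩, ⟨ph⟩, ⟨sh⟩ and ⟨ch⟩ as single consonant units (an assumption; the text names ⟨th⟩ only as an example), checks the special types before the open/closed test, and skips a word-final silent ⟨e⟩.

```python
VOWEL_LETTERS = set("aeiou")
# Assumption: consonant digraphs counted as one letter (the text cites ⟨th⟩).
CONSONANT_DIGRAPHS = {"th", "ph", "sh", "ch"}


def tokenize(word: str) -> list[str]:
    """Split a word into letters, keeping consonant digraphs as single units."""
    units, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in CONSONANT_DIGRAPHS else 1
        units.append(word[i:i + step])
        i += step
    return units


def syllable_type(vowel: str, rest: list[str]) -> str:
    """Classify a vowel spelling from the letter units that follow it."""
    if len(vowel) > 1 or (rest and rest[0] in {"w", "y"}):
        return "vowel digraph"
    if rest and rest[0] == "r":
        return "vowel + r"
    if len(rest) == 2 and rest[0] not in VOWEL_LETTERS and rest[1] == "e":
        return "VCe"
    if (len(rest) == 3 and rest[0] not in VOWEL_LETTERS
            and rest[1] in {"l", "r"} and rest[2] == "e"):
        return "consonant-le or -re"
    # Count the consonant units immediately after the vowel.
    k = 0
    while k < len(rest) and rest[k] not in VOWEL_LETTERS:
        k += 1
    # Closed: at least one consonant ending the word, or at least two inside it.
    if (k >= 1 and k == len(rest)) or k >= 2:
        return "closed"
    return "open"


def classify(word: str) -> list[tuple[str, str]]:
    """Return (vowel spelling, written syllable type) for each written syllable."""
    units = tokenize(word.lower())
    result, i = [], 0
    while i < len(units):
        if units[i] not in VOWEL_LETTERS:
            i += 1
            continue
        j = i
        while j < len(units) and units[j] in VOWEL_LETTERS:
            j += 1
        vowel = "".join(units[i:j])
        if vowel == "e" and j == len(units) and result:
            break  # word-final silent e: already used by VCe / -le / -re
        result.append((vowel, syllable_type(vowel, units[j:])))
        i = j
    return result


for w in ["hello", "spot", "road", "for", "note", "noble", "candidate"]:
    print(w, classify(w))
```

On the Table 5 examples this sketch returns the listed types, and for candidate it gives closed, open and VCe, matching the transformation shown in the analysis.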

ANALYSIS
To permit the analysis, the spellings in our list of twenty-four thousand written words and their corresponding pronunciations were first altered so that all consonant letters were treated simply as vowel separators. The written words were then broken down into syllable types, using the simple algorithm discussed above. So, for example, the written word and pronunciation of the entry for candidate would undergo the following transformation:
written word: candidate → [closed with ⟨a⟩]σ [open with ⟨i⟩]σ [⟨a-e⟩]σ
spoken word: /kændɪdejt/ → æ-ɪ-ej

In this example, ⟨a⟩ in a closed written syllable corresponds to /æ/, ⟨i⟩ in an open written syllable corresponds to /ɪ/, and ⟨a⟩ before a word-final ⟨e⟩ corresponds to /ej/.
The predictive value of rime spellings
By analyzing every written syllable of all twenty-four thousand written words, we established a correlation between the written syllable types and their actual pronunciation. Table 6 shows, for example, how ⟨a⟩ in a closed written syllable is pronounced (from highest percentage to lowest).

Table 6. The pronunciation of the letter ⟨a⟩ in closed syllables
Pronunciation    Percentage
/æ/              61.40
/ə/              25.90
/ɑ/              05.00
/aw/             00.02
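A table like Table 6 amounts to a frequency count over aligned (letter, written syllable type, sound) triples, like those in the candidate example. The minimal sketch below assumes such an alignment is already available; the function name is hypothetical, and the toy triples are illustrative only, not drawn from the authors' word list.

```python
from collections import Counter, defaultdict


def pronunciation_table(aligned):
    """Map (letter, syllable type) -> [(sound, percentage)], most frequent first."""
    counts = defaultdict(Counter)
    for letter, syl_type, sound in aligned:
        counts[(letter, syl_type)][sound] += 1
    table = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        table[key] = [(sound, 100 * n / total) for sound, n in counter.most_common()]
    return table


# Hypothetical toy alignment, in the format of the candidate example.
aligned = [
    ("a", "closed", "æ"), ("i", "open", "ɪ"), ("a", "VCe", "ej"),  # candidate
    ("a", "closed", "æ"), ("a", "closed", "æ"), ("a", "closed", "ə"),
]
table = pronunciation_table(aligned)
modal = table[("a", "closed")][0][0]  # the default for ⟨a⟩ in a closed syllable
print(table[("a", "closed")], "modal:", modal)
```

The first entry of each list is the modal pronunciation, which serves as the default for that letter and syllable type.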

Since /æ/ is the most common (modal) pronunciation for ⟨a⟩ in a closed written syllable, it is the default pronunciation of ⟨a⟩ in a closed written syllable (the pronunciation used in the absence of information about which consonant letter follows ⟨a⟩). To assess the predictive value of individual rime spellings in determining pronunciation, we examined all non-modal pronunciations with a frequency of five percent or higher for a given vowel letter in closed written syllables and compared them against the default. We then looked for rime-spelling patterns within each non-modal pronunciation for each vowel letter. Altogether, hundreds of candidate rime-spelling pronunciations can be found using this method. However, because our goal is simultaneously scientific and pedagogical (we want to know both how the system works and how learners can better take advantage of it), we focus on instances where the rime-spelling pronunciation is both distinct from the modal pronunciation and valid within the particular rime spelling for at least half the cases.⁷ This permits us to cast the rime-spelling pronunciation as a rule within its domain. After discarding all rime spellings that are either identical to the modal value for the vowel letter or inconsistent by these criteria, only twenty-seven independently regular rime spellings remain. However, even many of these are suspect. For example, the ⟨oot⟩ rime spelling is pronounced 60.7% of the time as /ʊ/ and 39.3% of the time as /uw/, which is the modal pronunciation for ⟨oo⟩; however, except for soot, every instance in which ⟨oot⟩ is pronounced /ʊ/ is either a derivative of foot (e.g., footsie) or a compound word in which the word foot is included (e.g., footpad). In addition to the word foot, the following words create similar illusory rime spellings: child, mild, wild, post, most, come, some, work, wood, good, hood, and give. Additionally, certain commonly occurring morphemes have irregular pronunciations that should not be attributed to rime-spelling pronunciation: -some, -ceive, -hood, -ous, -age, -ard, -ive. Furthermore, it seems that -age, -are, -ard and -ive are not irregular pronunciations per se, but the results of vowel reduction in unstressed environments. After eliminating inconsistent rime spellings and the rime spellings that can be explained by one or two common words, the rime spellings shown in Table 7 remain.
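The selection criteria described above (a non-modal pronunciation reaching at least five percent overall, and a rime spelling that takes that pronunciation in at least half of its cases) can be sketched as a filter over per-rime counts. The function and its inputs are hypothetical; apart from the ⟨ook⟩ tally, which reuses the eleven /ʊ/ words and one /uw/ word cited earlier in the text, the counts are toy values for illustration.

```python
from collections import Counter


def regular_rimes(rime_counts, min_share=0.05, min_consistency=0.5):
    """Return the modal sound and the rime spellings that qualify as rules.

    rime_counts maps each rime spelling (for one vowel spelling in one
    syllable type) to a Counter of its observed sounds.
    """
    overall = Counter()
    for counter in rime_counts.values():
        overall.update(counter)
    total = sum(overall.values())
    modal = overall.most_common(1)[0][0]
    # Non-modal pronunciations frequent enough to be worth examining.
    candidates = {s for s, n in overall.items() if s != modal and n / total >= min_share}
    rules = {}
    for rime, counter in rime_counts.items():
        sound, n = counter.most_common(1)[0]
        share = n / sum(counter.values())
        if sound in candidates and share >= min_consistency:
            rules[rime] = (sound, share)
    return modal, rules


rime_counts = {
    "ook": Counter({"ʊ": 11, "uw": 1}),  # tally of the ⟨ook⟩ words cited in the text
    "oom": Counter({"uw": 10}),          # toy counts
    "ool": Counter({"uw": 8}),           # toy counts
    "ood": Counter({"uw": 5, "ʊ": 3}),   # toy counts
}
modal, rules = regular_rimes(rime_counts)
print("modal:", modal, "rules:", rules)
```

A rime whose majority pronunciation equals the modal value (like the toy ⟨ood⟩ here) is discarded, since it adds nothing beyond the default.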

As there are approximately twenty-four thousand words in the list examined, the six hundred eighteen words in Table 7 account for only 2.6% of them. This contrasts with Stanback's conclusion, which is based on spoken syllables and does not factor out the contribution of individual high-frequency words and default pronunciations of vowel letters.
Table 7. Consistent rime spellings
Rime     Consistency (%)    Total consistent
ai+C     96                 124 words
ind*     (illegible)        61 words
igh      100                39 words
ight     100                108 words
oll      77                 17 words
igue     100                2 words
ique     100                10 words
ogue     75                 16 words
ead      88                 14 words
ook      98                 60 words
owl      72                 14 words
own      95                 53 words
Total = 618 words
* ind is recalculated so that all compound words containing wind are counted as one word; otherwise, ind would be 44.3% consistent.
Whole-word spelling of high-frequency words
In the most extreme case of an ineffective alphabetic spelling system, one would have to memorize the pronunciation of every word and could derive no phonetic value from the letters. While this is certainly not the case where English consonant letters are concerned, English vowel letters seem extremely unpredictable: in the list of twenty-four thousand words, ⟨a⟩, ⟨e⟩, ⟨i⟩ and ⟨o⟩ are pronounced more than a dozen different ways each; ⟨u⟩ is pronounced ten different ways and ⟨y⟩ six different ways. This does not mean, however, that the pronunciations are evenly distributed. As is well known, high-frequency words are more likely to have irregular pronunciations. They are also the elements from which compound words are most often formed. In order to assess the actual impact of high-frequency words, we compiled a list of those written words that both have irregular pronunciations and appear within the first 1,068 most frequent written words (according to Francis & Kučera 1982: 465-476). Next, we searched for these irregular, high-frequency written words within the twenty-four-thousand-word list, either as words standing on their own or as parts of words. Once encountered, these words were replaced with their actual pronunciation. A total of 338 words out of the list of 1,068 (31.7%) were found to have irregular pronunciations. Through search and replace, 3,127 syllables out of 64,847 total syllables in the word list (4.8%) were changed. This involved syllables in 1,933 words out of the 23,854 total words (8.1%). By definition, the pronunciation was accurate at a level of 100.0% (as the words were replaced with their actual pronunciation). Thus, while memorizing whole words is by definition a perfect guide to pronunciation, its overall impact seems to be small, even for the most frequent words. Since irregularity is closely related to frequency, enlarging the word list to include a greater number of less frequent words should only decrease the percentage of irregular pronunciations. We conclude that irregular whole words are less important in English spelling than has sometimes been thought.
Morpheme spelling
The English language contains a great number of Latinate prefixes and suffixes. We sought in our analysis to gauge the regularity of the spelling of these affixes. Because the word list is in alphabetical order, examining the effect of prefixes was quite easy; however, in order to examine the suffixes, the word list had to be arranged in reverse alphabetical order (alphabetized from the last letter backward). After this was done, a comprehensive list of prefixes and suffixes was compiled. The word list was then broken down into several smaller lists based solely upon the initial letters for prefixes or the final letters for suffixes.
Thus, any word that began with ⟨ab⟩ was treated as a word that began with the ab- prefix, despite the existence of words that begin with ⟨ab⟩ but not with the ab- prefix (e.g., abacus). Still, in 116 out of 138 cases (84.1%), the prefixes and suffixes were more consistently pronounced than they would have been under syllable-type analysis. When written prefixes and suffixes were replaced with pronunciation, they yielded an accuracy of 84.7%. The accuracy was disproportionately slanted towards the suffixes, which were 92.0% accurate, over the prefixes, which were only 64.4% accurate. Not only was the accuracy of the affixes quite high, but the overall scope of the affixes was also very impressive: 20,960 out of 64,847 syllables (32.3%) in the entire list and 14,522 out of 23,854 words (60.9%) were affected by this crude method of morphological analysis. Again, the suffixes were more impressive in their scope than the prefixes: 15,421 of the affected syllables (73.6%) were affected by suffixes, while only 5,539 syllables (26.4%) were affected by prefixes. The consistent pronunciation of both prefixes and suffixes seems to contribute greatly to the readability of English; with 92.0% accuracy and a scope of 23.8% of all syllables in the word list, suffixes seem especially important tools in reading English.
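The crude letter-based affix matching described above, including the reverse-alphabetical sort used to group words by their endings, can be sketched as follows. The function names and affix lists are hypothetical samples; the abacus case shows the kind of false positive that matching on initial letters alone admits.

```python
def reverse_alphabetical(words):
    """Sort words by their spelling read backwards, so shared endings cluster."""
    return sorted(words, key=lambda w: w[::-1])


def match_affixes(word, prefixes, suffixes):
    """Return the longest prefix and suffix whose letters match the word's edges."""
    prefix = max((p for p in prefixes if word.startswith(p)), key=len, default=None)
    suffix = max((s for s in suffixes if word.endswith(s)), key=len, default=None)
    return prefix, suffix


PREFIXES = ["ab", "re"]              # hypothetical sample
SUFFIXES = ["ion", "tion", "ness"]   # hypothetical sample

print(reverse_alphabetical(["nation", "kindness", "motion"]))
print(match_affixes("abnegation", PREFIXES, SUFFIXES))  # genuine ab- and -tion
print(match_affixes("abacus", PREFIXES, SUFFIXES))      # ⟨ab⟩ matched, but no ab- prefix
```

Taking the longest match keeps -tion from being split as -ion; nothing in the sketch checks whether the matched letters really form a morpheme, which is what makes the method crude.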

Syllable-type analysis
The criteria used in identifying written syllable types are easy to apply; together they yield sixty-two types, each with its own pronunciation value, as shown in Tables 2 through 4 (28 digraph syllable types, 10 digraph + r syllable types, and 24 single vowel syllable types). The clear advantage of the syllable-type method of analysis is that all of the vowel letters in the 23,854-word list were encompassed by this one method alone. When compared to actual pronunciation, syllable-type analysis produced 42,079 correct syllables out of 64,854 (64.9%).
All four methods of analysis in succession
Instead of returning to the original word list after each method of analysis, we applied the methods in succession, starting with the method that affected the largest units and proceeding in order of the diminishing size and specificity of the affected units. Whole-word analysis of frequent words was done first; morphological analysis was applied to the results of the whole-word analysis; rime analysis was applied to the combined results of the whole-word and morphological analyses; and syllable-type analysis was applied to the results of the three others.

The first interesting observation is that, after the whole-word and morphological analyses were completed, individual rime spelling affected only 469 syllables (0.7% of all syllables), as opposed to the 2,254 syllables (3.5% of all syllables) that were affected when individual rime spelling analysis was done on its own. Morphological analysis still accounted for 32.3% of all syllables, and whole-word analysis of frequent words still accounted for 4.8% of all syllables. After the morphological, whole-word and rime spelling analyses were done, syllable-type analysis transformed the remaining 40,289 syllables (62.1%). The aggregate of all four analytical methods generated 43,455 correct vowel pronunciations out of 64,845 total vowels or vowel digraphs (70.5%). This is not a large improvement (8.6% in relative terms) over using the syllable-type method of analysis by itself.
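The succession of analyses can be sketched as a priority cascade: for each written syllable, the first analysis that makes a prediction claims it, and accuracy is computed over all syllables. Everything here is hypothetical and toy-sized: each syllable is pre-annotated as (vowel spelling, written syllable type, rime spelling), the -ness rule and its /ə/ value are illustrative assumptions, and the output says nothing about the authors' actual results.

```python
# Hypothetical toy rules for each analysis, in the priority order of the text.
WHOLE_WORDS = {"give": ["ɪ"]}           # irregular high-frequency word
RIME_RULES = {"ook": "ʊ", "ind": "aj"}  # two rime spellings from Table 7
TYPE_DEFAULTS = {("i", "VCe"): "aj", ("i", "closed"): "ɪ",
                 ("oo", "vowel digraph"): "uw", ("e", "closed"): "ɛ"}


def whole_word(word, k, sylls):
    return WHOLE_WORDS[word][k] if word in WHOLE_WORDS else None


def morphological(word, k, sylls):
    # Illustrative suffix rule: the vowel of final -ness is reduced.
    return "ə" if word.endswith("ness") and k == len(sylls) - 1 else None


def rime(word, k, sylls):
    return RIME_RULES.get(sylls[k][2])


def syllable_type(word, k, sylls):
    vowel, stype, _ = sylls[k]
    return TYPE_DEFAULTS.get((vowel, stype))


STAGES = [("whole-word", whole_word), ("morphological", morphological),
          ("rime", rime), ("syllable-type", syllable_type)]


def run_pipeline(lexicon, stages=STAGES):
    """Return syllables claimed per stage and overall accuracy."""
    claimed = {name: 0 for name, _ in stages}
    correct = total = 0
    for word, (sylls, actual) in lexicon.items():
        for k in range(len(sylls)):
            total += 1
            for name, stage in stages:
                guess = stage(word, k, sylls)
                if guess is not None:  # the first stage with a prediction wins
                    claimed[name] += 1
                    correct += int(guess == actual[k])
                    break
    return claimed, correct / total


# Toy lexicon: word -> (annotated written syllables, actual vowel sounds).
LEXICON = {
    "give": ([("i", "VCe", "ive")], ["ɪ"]),
    "five": ([("i", "VCe", "ive")], ["aj"]),
    "book": ([("oo", "vowel digraph", "ook")], ["ʊ"]),
    "boot": ([("oo", "vowel digraph", "oot")], ["uw"]),
    "spook": ([("oo", "vowel digraph", "ook")], ["uw"]),
    "kindness": ([("i", "closed", "ind"), ("e", "closed", "ess")], ["aj", "ə"]),
}
claimed, accuracy = run_pipeline(LEXICON)
print(claimed, f"accuracy = {accuracy:.1%}")
```

Because earlier stages claim syllables first, a later stage such as rime analysis only sees what is left over, which mirrors why its share shrinks when it is run after the whole-word and morphological analyses.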


    We set out to determine whether individual rime spellings in English are regular in their pronunciation by performing a computational analysis of a list of 23,854 written words paired with their pronunciations. This analysis revealed that only a small number of individual rime spellings are indeed regular in their pronunciations, where regular means pronounced, more than half the time, in a way that differs from the otherwise expected modal value of the vowel in syllables of that general type. By contrast, analysis by written syllable type proved to be the most reliable key to English pronunciation. This suggests that methods of reading instruction that depend on this kind of analysis reflect the basic structure of English spelling well. But we must emphasize that the dichotomy of long and short vowel letter pronunciations (the traditional analogue of what we are calling syllable type) is based on the division of words into written syllables, a fact that is often overlooked. Our findings thus support the general notion that English spelling is based at least in part on syllable structure (Treiman 1992). In addition, our analysis showed that prefixes and suffixes provide reliable clues to pronunciation, which suggests that their regularity should be exploited in the teaching of reading.
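The regularity criterion can be stated as a small decision procedure. The Python sketch below assumes one reading of the definition: a rime spelling counts as regular when its single most frequent pronunciation differs from the modal syllable-type value and accounts for more than half of its occurrences. The modal table, the function name, and the occurrence counts are hypothetical illustrations, not figures from the word list analyzed here.

```python
from collections import Counter
from typing import Optional

# Hypothetical modal values for single vowel letters by written syllable type
# (short in closed syllables, long in open ones); illustrative IPA only.
MODAL = {
    "closed": {"a": "æ", "e": "ɛ", "i": "ɪ", "o": "ɑ", "u": "ʌ"},
    "open": {"a": "eɪ", "e": "i", "i": "aɪ", "o": "oʊ", "u": "u"},
}

def regular_pronunciation(vowel: str, syl_type: str, observed: Counter,
                          threshold: float = 0.5) -> Optional[str]:
    """Return the rime-specific pronunciation if the rime spelling counts as
    regular (its most frequent pronunciation is non-modal and occurs in more
    than `threshold` of all cases); otherwise return None."""
    total = sum(observed.values())
    if total == 0:
        return None
    pron, count = observed.most_common(1)[0]
    if pron != MODAL[syl_type][vowel] and count / total > threshold:
        return pron
    return None

# Hypothetical pronunciation counts for three rime spellings (not study data).
OBSERVED = {
    "ild": ("i", "closed", Counter({"aɪ": 9, "ɪ": 2})),
    "ost": ("o", "closed", Counter({"oʊ": 5, "ɔ": 4, "ɑ": 1})),
    "at": ("a", "closed", Counter({"æ": 20, "ɑ": 1})),
}

results = {rime: regular_pronunciation(v, t, obs)
           for rime, (v, t, obs) in OBSERVED.items()}
print(results)  # {'ild': 'aɪ', 'ost': None, 'at': None}
```

With these toy counts, ⟨ost⟩ sits exactly at one half and so does not qualify under the strict "more than half" reading, while ⟨at⟩ follows the modal closed-syllable value and needs no rime-specific rule.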

    ACKNOWLEDGEMENTS

    This work was supported by a grant from the Spencer Foundation to The Research Foundation of the State University of New York. Thanks to Frank Anshen, Judith Klavans, and Mark Liberman for assistance. Thanks to Rebecca Treiman for comments on an earlier version.



    NOTES

    1. We distinguish letters, written words, and sounds as follows: we write letters or sequences of letters within angled brackets; we italicize written words; and we put sounds between the slash marks that linguists traditionally use for phonemes. So, the written word love ends in ⟨e⟩ and corresponds to the spoken word /lav/. We use the symbol to stand for the beginning or end of a word. For example, words ending in ⟨e⟩ are represented by means of the formula ⟨e⟩ .

    2. The only exceptions are compounds: blackguard (in which ⟨ck⟩ is silent) and words like handgun and headgear, where the ⟨dg⟩ sequence spans two members of the compound.

    3. French is the only other language whose writing system so systematically distinguishes homonyms (Aronoff 1994).

    4. In this abbreviation, C stands for one or more consonants.

    5. The original database contained standard British pronunciations. We have modified these in the direction of American English.

    6. This algorithm will lead to anomalies in a small number of cases involving ⟨ph⟩ and ⟨th⟩. These are usually digraphs, but sometimes, as in shepherd or rathole, they are not. There are too few of the latter type to disturb our analysis. Similarly, we treat all sequences of vowel letters as digraphs, but again, there are cases like rearm, where they are not pronounced as such.

    7. We set the threshold at half for several reasons. First, from a pedagogical point of view, if we were to count lesser generalizations, we would be faced with disjunctive statements like the following: in most of the words of the form ⟨V₁C₁⟩, ⟨V₁⟩ corresponds to /V₁/, but in a smaller fraction of such words, ⟨V₁⟩ corresponds to /V₂/. In some cases, there would be more than two correspondences; furthermore, we would have to select a particular threshold value for these lesser generalizations. The pedagogical difficulties are obvious: learners would be asked to master several patterns for a given rime spelling.
    For example, Stanback shows that ⟨o⟩ in ⟨oth⟩ is pronounced thirteen times as /o/ in her data; however, in the same data it is pronounced fourteen times as /a/, the modal pronunciation for ⟨o⟩ in closed syllables, and three times as low/. The putative rime spelling pronunciation in this example would thus be beneficial less than thirty percent of the time. If we wanted to apply this observation about the pronunciation of ⟨oth⟩, we would have to tell learners that there are two common ways to pronounce this sequence. By contrast, the fifty percent threshold focuses on those rime spelling pronunciations that are true most of the time. Our second rationale is scientific: we want to know how English spelling works, in particular whether individual rime spellings are part of the system. By setting the threshold so high, we greatly reduce the risk of finding something that is not really there. Pedagogy and science reinforce each other in this case, which is a rare occurrence.

    8. This method does not constitute real morphological analysis, but it is not realistic to expect the average reader, especially a child, to have enough conscious comprehension of English morphology to be able to parse words into their Latinate prefixes and suffixes, so this crude method may be more realistic for our purposes than a more sophisticated one. The method can in fact only diminish the results of our analysis, because it catches in its net a fairly large number of invalid cases.
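The digraph simplification described in the notes above (every sequence of vowel letters, and every ⟨ph⟩ or ⟨th⟩, treated as a single unit) can be sketched as a greedy left-to-right tokenizer. This is a hypothetical Python illustration, not the study's algorithm: the vowel set and the consonant-digraph list are assumptions chosen for the example.

```python
# Assumption: only a, e, i, o, u count as vowel letters; <y> and <w> are
# treated as consonant letters in this sketch.
VOWELS = set("aeiou")
# Hypothetical consonant-digraph inventory; not the study's actual list.
CONSONANT_DIGRAPHS = {"ch", "ck", "ph", "sh", "th"}

def spelling_units(word: str) -> list[str]:
    """Greedy left-to-right split of a written word into spelling units.

    Every maximal run of vowel letters becomes one unit (a digraph when it is
    two letters or more), and every listed two-letter consonant sequence
    becomes one unit, whether or not it is actually pronounced as a digraph."""
    word = word.lower()
    units = []
    i = 0
    while i < len(word):
        if word[i] in VOWELS:
            j = i
            while j < len(word) and word[j] in VOWELS:
                j += 1
            units.append(word[i:j])
            i = j
        elif word[i:i + 2] in CONSONANT_DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units

for w in ("shepherd", "rathole", "rearm", "boat"):
    print(w, spelling_units(w))
```

The sketch reproduces the anomalies the notes mention: ⟨ph⟩ in shepherd, ⟨th⟩ in rathole, and ⟨ea⟩ in rearm each come out as one unit even though none of them is pronounced as a digraph in those words.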

    REFERENCES

    Aronoff, M. (1994). Spelling as culture. In: W. C. Watt (ed.), Writing systems and cognition (pp. 67-86). Dordrecht: Kluwer Academic Publishers.
    Francis, W. N. & Kucera, H. (1982). Frequency analysis of English usage: Lexicon and grammar. Boston: Houghton Mifflin.
    Goswami, U. & Bryant, P. (1990). Phonological skills and learning to read. Hillsdale, NJ: Erlbaum.
    Huey, E. B. (1908/1968). The psychology and pedagogy of reading. Cambridge, MA: MIT Press.
    Kenyon, J. S. (1934). Rules for the syllabic division of words in writing or print. In: Webster's New International Dictionary, 2nd ed. (pp. lviii-lix). Springfield, MA: Merriam-Webster.
    Savin, H. B. (1972). What the child knows about speech when he begins to read. In: J. F. Kavanagh & I. Mattingly (eds.), Language by eye and by ear (pp. 319-328). Cambridge, MA: MIT Press.
    Stanback, M. L. (1991). Syllabic and rime patterns for teaching reading: Analysis of a frequency-based vocabulary of 17,602 words. Dissertation. Teachers College, Columbia University.
    Stanback, M. L. (1992). Syllable and rime patterns for teaching reading: Analysis of a frequency-based vocabulary of 17,602 words. Annals of Dyslexia 42: 196-221.
    Treiman, R. (1992). The role of intrasyllabic units in learning to read and spell. In: P. Gough, L. Ehri & R. Treiman (eds.), Reading acquisition (pp. 65-106). Hillsdale, NJ: Erlbaum.
    Venezky, R. L. (1970). The structure of English orthography. The Hague: Mouton.

    Address for correspondence: Dr Mark Aronoff, Department of Linguistics, State University of New York at Stony Brook, Stony Brook, NY 11794-4376, USA
    Phone: (516) 632 7777; Fax: (516) 632 9789