production of the word-final english /t/-/d/ contrast by ....../t/-/d/contrast. native...

16
Production of the word-final English/t/-/d/contrast by native speakers of English, Mandarin, and Spanish JamesEmil Flege,MurrayJ. Munro, and Laurie Skelton Department ofBiocommunication, University ofAlabama at Birmingham, Birmingham, Alabama 35294-0019 (Received 18 April 1991; accepted forpublication 1April 1992) The primaryaim of this study wasto determine if adults whose native language permits neither voiced nor voiceless stops to occur in word-final position canmaster the English word-final /t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers in fivegroups: native speakers of English, experienced and inexperienced nativeSpanish speakers of English, and experienced and inexperienced native Mandarin speakers of English. Contrary to hypothesis, the experienced second language (L2) learners' stops werenot identified significantly betterthanstops produced by the inexperienced L2 learners; andtheir stops werecorrectly identified significantly less often than stops produced by thenative English speakers. Acoustic analyses revealed that thenative English speakers made vowels significantly longer before/d/than/t/, produced/t/-final words with a higher F 1 offset frequency than/d/-final words, produced moreclosure voicing in/d/than /t/, andsustained closure longer for/t/than/d/. The L2 learners produced the same kinds of acoustic differences between/t/and/d/, but theirs wereusually of significantly smaller magnitude. Taken together, the results suggest that only a few of the 40 L2 learners examined in the present study hadmastered the English word-final/t/-/d/contrast. Several possible explanations for thisnegative finding are presented. Multiple regression analyses revealed that the native English listeners made perceptual use of the small, albeitsignificant, vowel duration differences produced in minimalpairs by the nonnative speakers. A significantly stronger correlation existed between vowel duration differences and the listeners' identifications of final stops in minimalpairs whenthe perceptual judgments wereobtained in an "edited"condition (wherepost-vocalic cues wereremoved) than in a "full cue"condition. This suggested that listeners may modifytheir identification of stops based on the availability of acoustic cues. PACS numbers: 43.70.Kv, 43.70.Fq, 43.71.Hw INTRODUCTION Adults who learn a second language (L2) often have difficultyproducing consonants that are found in the L2 but not in their native language (L I ). For example, Flegeand Davidian (1984) transcribed the final stops in minimally paired English words endingin/b d g/and/p t k/that had been spoken by native speakers of English,Spanish, and Taiwanese. • Most ofthe native speakers' stops were heard as intended. However, a few of the nonnative speakers' stops were perceived to havebeen omitted; and overone-thirdof their/b,d,g/tokens were perceived to havebeendevoiced. The segmentalproduction errors that were noted appear to havearisen from phonological differences between English and the nonnativesubjects' LI. Taiwanese permits/p t k/ but not/b d g/in word-finalposition(Cheng, 1968). Span- ishissaid to have a phonological rulewhichdevoices phono- logically voicedstops in utterance-final position. The few word-finalstops that do occurin Spanish wordstend to be spirantized or omitted (Harris, 1969). According to the speech learning model (SLM) devel- opedby Flege,adult learners of an L2 who have received sufficient native-speaker inputcanmaster "new" sounds in the L2 (Flege, 1988a, 1991, 1992a,b). By definition, new sounds are those L2 vowels and consonants that learners of an L2 regardas falling outside the L1 phonetic inventory. According to the SLM, sounds classified as"similar" cannot bemastered because theycontinue to beidentified perceptu- ally in terms of a phonetically distinct counterpart in the L1 (e.g., the aspirated/t/of English vis-•-vis the unaspirated /t/of Spanish). Equivalence classification is said to prevent L2 learners fromestablishing phonetic categories for similar L2 vowelsand consonants, which limits the accuracy with which these L2 sounds can beproduced (Flege,1987).The primary purpose ofthis study was to test the SLM'shypothe- sis regarding the production of new L2 consonants. More specifically, we wanted to learn if it is possible for adults whose L 1 lacks a contrast between word-final/t/and/d/to accurately produce/t/and/d/in the final position of Eng- lish words. The two groups of native Spanish subjects examined in thepresent study differed according to length of residence in the US. All of these subjects were living in a predominantly English-speaking environment at thetimeof thestudy; most wereaffiliated with a largeurbanmedical center. If subjects in the more experienced group had received sufficient L2 128 J. Acoust. Soc. Am. 92 (1), July 1992 0001-4966/92/070128-16500.80 @ 1992 Acoustical Society ofAmerica 128

Upload: others

Post on 09-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

Production of the word-final English/t/-/d/contrast by native speakers of English, Mandarin, and Spanish

James Emil Flege, MurrayJ. Munro, and Laurie Skelton Department of Biocommunication, University of Alabama at Birmingham, Birmingham, Alabama 35294-0019

(Received 18 April 1991; accepted for publication 1 April 1992)

The primary aim of this study was to determine if adults whose native language permits neither voiced nor voiceless stops to occur in word-final position can master the English word-final /t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers in five groups: native speakers of English, experienced and inexperienced native Spanish speakers of English, and experienced and inexperienced native Mandarin speakers of English. Contrary to hypothesis, the experienced second language (L2) learners' stops were not identified significantly better than stops produced by the inexperienced L2 learners; and their stops were correctly identified significantly less often than stops produced by the native English speakers. Acoustic analyses revealed that the native English speakers made vowels significantly longer before/d/than/t/, produced/t/-final words with a higher F 1 offset frequency than/d/-final words, produced more closure voicing in/d/than /t/, and sustained closure longer for/t/than/d/. The L2 learners produced the same kinds of acoustic differences between/t/and/d/, but theirs were usually of significantly smaller magnitude. Taken together, the results suggest that only a few of the 40 L2 learners examined in the present study had mastered the English word-final/t/-/d/contrast. Several possible explanations for this negative finding are presented. Multiple regression analyses revealed that the native English listeners made perceptual use of the small, albeit significant, vowel duration differences produced in minimal pairs by the nonnative speakers. A significantly stronger correlation existed between vowel duration differences and the listeners' identifications of final

stops in minimal pairs when the perceptual judgments were obtained in an "edited" condition (where post-vocalic cues were removed) than in a "full cue" condition. This suggested that listeners may modify their identification of stops based on the availability of acoustic cues.

PACS numbers: 43.70.Kv, 43.70.Fq, 43.71.Hw

INTRODUCTION

Adults who learn a second language (L2) often have difficulty producing consonants that are found in the L2 but not in their native language (L I ). For example, Flege and Davidian (1984) transcribed the final stops in minimally paired English words ending in/b d g/and/p t k/that had been spoken by native speakers of English, Spanish, and Taiwanese. • Most of the native speakers' stops were heard as intended. However, a few of the nonnative speakers' stops were perceived to have been omitted; and over one-third of their/b,d,g/tokens were perceived to have been devoiced. The segmental production errors that were noted appear to have arisen from phonological differences between English and the nonnative subjects' LI. Taiwanese permits/p t k/ but not/b d g/in word-final position (Cheng, 1968). Span- ish is said to have a phonological rule which devoices phono- logically voiced stops in utterance-final position. The few word-final stops that do occur in Spanish words tend to be spirantized or omitted (Harris, 1969).

According to the speech learning model (SLM) devel- oped by Flege, adult learners of an L2 who have received sufficient native-speaker input can master "new" sounds in

the L2 (Flege, 1988a, 1991, 1992a,b). By definition, new sounds are those L2 vowels and consonants that learners of

an L2 regard as falling outside the L1 phonetic inventory. According to the SLM, sounds classified as "similar" cannot be mastered because they continue to be identified perceptu- ally in terms of a phonetically distinct counterpart in the L1 (e.g., the aspirated/t/of English vis-•-vis the unaspirated /t/of Spanish). Equivalence classification is said to prevent L2 learners from establishing phonetic categories for similar L2 vowels and consonants, which limits the accuracy with which these L2 sounds can be produced (Flege, 1987). The primary purpose of this study was to test the SLM's hypothe- sis regarding the production of new L2 consonants. More specifically, we wanted to learn if it is possible for adults whose L 1 lacks a contrast between word-final/t/and/d/to

accurately produce/t/and/d/in the final position of Eng- lish words.

The two groups of native Spanish subjects examined in the present study differed according to length of residence in the US. All of these subjects were living in a predominantly English-speaking environment at the time of the study; most were affiliated with a large urban medical center. If subjects in the more experienced group had received sufficient L2

128 J. Acoust. Soc. Am. 92 (1), July 1992 0001-4966/92/070128-16500.80 @ 1992 Acoustical Society of America 128

Page 2: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

input, the SLM would predict their mastery of English/t/ and/d/. However, given continuing uncertainty as to what constitutes "sufficient" L2 input, a mbre conservative pre- diction that one might derive from the SLM is that only some of the experienced Spanish subjects would master English /t/and/d/, although all of the experienced subjects should resemble the native English speakers to a greater extent than subjects in the inexperienced Spanish group.

Both predictions were based on an important assump- tion about how the Spanish subjects would perceive word- final English stops, viz., that the experienced Spanish speak- ers of English would not identify word-final English/t/and /d/tokens in terms of a pre-existing Spanish stop category. Such an assumption might be questioned, however. The pho- nemes /p/, /t/, /k/, /b/, and/g/do not occur in the final position of native Spanish words, but they do occur in some loanwords and proper names of Catalonian origin. More- over, some native Spanish words such as hovedad ("no- velty") or ciudad ("city") end in/d/, although this phon- eme is normally realized as a voiced or voiceless fricative. It is conceivable, therefore, that experienced Spanish speakers of English might persist in identifying word-final realiza- tions of English/d/( and possibly/t/) in terms of a Spanish stop category. If so, the SLM would predict inaccurate, Spanish-like realizations of English/d/and/t/by even the experienced Spanish subjects.

We did not assess the interlingual identification of LI and L2 stops in the present study, which means that differ- ences between the native English and experienced Spanish subjects, should any be observed, would be difficult to inter- pret. As a safeguard, therefore, we examined two groups of native Mandarin subjects. These subjects would not be ex- pected to identify word-final tokens of either English/t/or /d/with word-final stops in their LI because Mandarin does not permit any obstruent consonants--voiced or voiceless-- to occur in word-final position (Cheng, 1973; Howie, 1976; Norman, 1988). If new L2 consonants can indeed be learned

by adults as hypothesized by the SLM, one would expect the experienced Mandarin subjects to outperform relatively in- experienced Mandarin subjects, and also to closely match native speakers of English in producing word-final/t/and /d/.

Based on previous research, we expected the native speakers of English to make vowels substantially longer be- fore/d/than/t/, to sustain closure longer for/t/than/d/, to sustain glottal pulsing longer in the closure interval of/d/ than/t/, and to produce words ending in/t/with a higher first formant (F 1 ) offset frequency than words ending in/d/ (Peterson and Lehiste, 1960; House, 1961; Suen and Bed- does, 1974; Wolf, 1978; Repp, 1978; Port, 1979, 1981; Ho- gan and Rozsypal, 1980; Raphael, 1981; Hillenbrand et aL, 1984; Lisker, 1986). The predictions stated above were test- ed by comparing acoustic measurements of stops produced by the native speakers of English and the two experienced L2 groups, and by comparing stops produced by the relatively •xp•riene•ed and in•np•ri•n•d L2 learners.

Predictions of the SLM were also tested by having na- tive English-speaking listeners identify the voicing feature in final stops produced by the native and nonnative speakers.

Previous research suggested that all four acoustic dimen- sions mentioned above would influence the native English listeners' judgments of stops produced by fellow native speakers of English. This was not tested, however, because there was little variability in the data obtained from the na- tive English speakers, whose/t/'s and/d/'s were identified correctly in most instances. An aim of the study was to assess the perceptual importance of the four acoustic phonetic dis- tinctions produced by the nonnative speakers, whose stops were frequently misidentified. Vowel duration, which serves as a perceptual cue to the voicing feature in word-final stops produced by native speakers of English in utterance-final or prepausal position, was of special interest (e.g., Raphael, 1972; Repp, 1978; Wardrip-Fruin, 1982; Hogan and Rozsy- pal, 1980; Port and Dalby, 1982). Crowther and Mann (1990) recently measured vowel duration and F 1 offset fre- quency in English words spoken by native speakers of Man- darin and Japanese. These subjects produced a substantially smaller vowel duration distinction in an English minimal pair (pod versus pot} than did native speakers of English. (The nonnative speakers also produced a smaller F 1 offset frequency difference than did the native English speakers.) The relatively small "voicing effect" produced by the nonna- tive speakers was expected from previous studies showing that the effect of stop voicing on preceding vowel duration is larger in English than in many other languages (e.g., Chen, 1970; Flege and Port, 1981; Stevens et al., 1986) and from studies showing that adult learners of English tend to pro- duce smaller voicing effects than native speakers of English (Suomi, 1976; Elsendoorn, 1980, 1982; F}ege and Port, 1981; Mitleb, 1981a,b; Mack, 1982; Port and Mitleb, 1983; Lowie, 1988; Hiramatsu, 1990). The small voicing effects produced by Crowther and Mann's nonnative subjects might account, at least in part, for why roughly half of their final/d/tokens were heard as/t/. This is by no means cer- tain, however, because the misidentifications of/d/ may have been due to acoustic dimensions other than vowel dura-

tion, such as F 1 offset frequency, closure voicing, or closure duration.

The perceptual efficacy of acoustic cues to the/t/-/d/ contrasts produced by the normarive speakers was assessed in multiple correlation analyses. These analyses examined acoustic measurements of minimally paired CVC words pro- duced by the normarive speakers and native English listen- ers' identifications of final stops in those CVCs. It was hoped that these analyses would provide insight into how listeners process phonetic contrasts that do not fully conform to the phonetic norms of their native language. It is known that foreign-accented speech becomes more intelligible as listen- ers gain familiarity with it (Wingstedt and Schulman, 1984), perhaps because the listeners develop "correspon- dence rules" relating incorrectly implemented sounds onto the intended target-language phonetic categories. Here, we tested the hypothesis that listeners are capable of modifying their use of acoustic cues in phonetic segments produced by nonlmtivc speakera. CVC• were presented in two listening conditions. In the "full cue" condition, the native English- speaking listeners heard unmodified versions of the eve words. In an "edited" stimulus condition, any energy pres-

129 d. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Fiego et•/.: Production of the word-final Itl-ldl contrast 129

Page 3: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

ent in the final stop closure interval was removed along with the final release burst (if one was present). We expected lower rates of correct identification for stops in the edited condition than in the full-cue condition (Mal•cot, 1958; Ohde and Sharf, 1977; Picheny et al., 1986; Flege, 1989; Flege and Wang, 1990). If listeners are able to modify seg- mental perception, the correlation between the size of vowel duration differences in minimal pairs produced by the non- native speakers and the native English listeners' identifica- tion of final stops in those minimal pairs should be signifi- cantly stronger in the edited than full-cue condition because fewer acoustic cues to the/t/-/d/contrast were available in the edited condition.

TABLE I. Characteristics of the talkers in five groups who participated in the study. Each group had five male and five female subjects. Standard de- viations are in parentheses.

Subject group AGE • YRS b AOA c LOR d PER e

Native 27.9 ............

English (6.5)

Inexperienced 26.7 6.1 25.7 0.4 68.0 Spanish (2.9) (2.8) (3.9) (0.1) (23.8)

Experienced 28.2 7.1 19.9 9.0 75.2 Spanish (5.7) (3.7) (5.2) (5.3) (10.8)

Inexperienced 28.4 7.4 27.2 0.9 32.0 Mandarin (2.9) (2.4) (2.5) (0.6) (15.5)

I. METHOD

Native and nonnative talkers produced minimally paired CVC words. The CVCs were low-pass filtered at 4.8 kHz and digitized at 10.0 kHz with 12-bit amplitude resolu- tion. After preparation (see below), half of the CVCs were presented to native English-speaking listeners who identified word-final stops as/t/or/d/. All of the CVCs were exam- ined acoustically; then the acoustic measures were related in correlational analyses to the listeners' judgments of CVCs for which both acoustic and perceptual data were available.

A. Talkers

Five males and five females in each of five groups, all of whom were recruited through advertisements in the univer- sity newspaper or personal contacts, participated as talkers. All of these subjects were living in Birmingham, AL, at the time of the study. Subjects in the native English group were monolingual native speakers of American English with a mean age of 28 years. Five were from Alabama, and one each was from South Dakota, Missouri, Washington, DC, Ohio, and Tennessee. The native Spanish subjects came from sev- eral countries in Central and South America (Colombia--9, Nicaragua--2, Puerto Ricoh2, Chiles2, Bolivia--l, Hon- duras-I, Venezuela--l, Guatemala--l, Argentina--l). Those in the "experienced" group had lived in the US longer than those in a relatively "inexperienced" group (9.0 vs 0.4 years on the average). Half of the native Mandarin speakers were from mainland China, the rest from Taiwan. The "ex- perienced" Mandarin subjects had lived in the US longer than those in the "inexperienced" group (5.5 vs 0.9 years). A native Mandarin research assistant verified that all of the

Chinese subjects, including those from Taiwan, spoke Man- darin as their native language.

All of the nonnative subjects had learned English as an L2 in adulthood. The relatively experienced and inexper- ienced L2 learners differed little in chronological age and amount of formal education in English. However, as sum- marized in Table I they differed somewhat according to oth- er factors that could potentially have influenced their ability to pronounce English. Compared to the inexperienced Span- ish subjects, the experienced native Spanish subjects had ar- rived in the US at a slightly earlier age (20 vs 26 years) and estimated speaking English slightly more often on a daily

Experienced 28.1 6.0 22.6 5.5 51.0 Mandarin (3.0) (1.8) (3.6) (2.5) (22.3)

AGE--chronological age, in years. YRS•years of classroom instruction in English. AOA--age upon arrival in the United States, in years. LOR--length of residence in the United States, in years. PER--self-estimates of percentage daily use of English.

basis (75% vs 68%). Compared to the relatively inexper- ienced Mandarin subjects, the experienced Mandarin sub- jects had arrived in the US at a slightly earlier age (23 vs 27 years) and estimated speaking English slightly more often (51% vs 32%). We assumed, however, that differences be- tween the experienced and inexperienced L2 learners, should any be observed, could be attributed primarily to dif- ferences in overall amount of English-language experience.

B. Speech stimuli

Each talker read four lists in which 34 randomized CVC

words occurred at the end of the carrier phrase "I will say__." All C¾Cs on each list contained either/i/,/I/,/e/, or/•e/. (We blocked on vowel to minimizing spelling-in- duced mispronunciations.) Each list contained minimal pairs of the form /bVt/-/bVd/ and /sVt/-/sVd/ (e.g., beat-bead, sat-sad), which will be referred to as the/bVC/ and/sVC/words, respectively. The members of minimal pairs were never immediately juxtaposed on the lists; and no words selected for analyses came from the very beginning or end of the lists. After being digitized, the 16 words spoken by each talker were edited from the carrier phrase and normal- ized for peak intensity.

C. Acoustic analyses

Four acoustic dimensions were measured in the/bVC/ and/sVC/words. Vowel duration was measured from the

first positive peak in the periodic portion of each waveform to constriction of the word-final/t/or/d/. Constriction was

defined as occurring when amplitude decreased abruptly and the waveforms became more nearly sinusoidal in shape. The changes in waveform shape indicated that frequencies in the glottal source above the region off 1 had been filtered out by closure of the vocal tract (Hillenbrand et al., 1984). Clo- sure voicing was measured from the point of constriction, as

130 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Flege et aL: Production of the word-final/t/-/d/contrast 130

Page 4: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

just defined, to the last positive peak in any low-amplitude, sinusoidal voicing in the stop closure interval. There was no visible acoustic evidence of release of lingual constriction in 29 (7.3 % ) of the words ending in/t/and 85 (21.2% ) of the words ending in ?d/. Stop closure duration could not be mea- sured in these stops. In all others, duration of the closure interval was measured from constriction to the beginning of the release burst.

F 1 of J•etfrequency was measured from formant tracks such as those for three minimally paired English words shown in Fig. 1. These words were spoken by one of the native English subjects. As expected from previous research, the offset frequency off I was much the same in beat-bead and bit-bid, but it was higher in bat than bad. The formant tracks in each word were obtained by performing a series of LPC analyses using 14 coefficients for male talkers and 12 coefficients for female talkers. A 12.8-ms Hamming window was moved in 5.0-ms steps through each CVC waveform. A peak-picking algorithm identified the first five peaks in the smoothed LPC spectra. Peaks were then assigned by algo- rithm to formants. This last step was fully automatic for most tokens, but user intervention was needed for some to- kens. For example, the formant tracks had to be hand edited when a peak that should have been assigned to F2 was incor- rectly assigned to F 1 because no peak in the F 1 region was identified by the peak-picking algorithm. In other instances, a few spurious data points located between the F 1 and F2 tracks, or between the F 2 and F 3, had to be removed by hand before analysis.

Three mean F 1 values were obtained for each CVC, each representing a 15-ms segment containing frequencies that were obtained, in most instances, from three successive LPC analyses. The last of the three segments, which occurred at the end of the formant transitions leading into/t/and/d/, was segmented from the following stop closure interval on the basis of frequency changes in the F2 and F3 tracks and large increases in formant bandwidths, which are not shown in Fig. 1. (Segmentation was based on changes in the shape and amplitude of time domain waveforms in a few tokens in which constriction location was not clear.) The frequency offset analysis, unlike the other acoustic analysis described thus far, was confined to the native speakers of English and the two experienced L2 groups because it was so time con- suming.

D. Listening test

Copies of the/bVC/words were edited as described by Flege (1989). Any energy present in the stop closure inter- val was removed along with the final release burst, if there was one. The edited stimuli therefore contained no acoustic

information pertaining to lingual constriction or release. As noted earlier, not all stops had release bursts. We reasoned that the effect of editing would be larger for the stops with release bursts than for those without release bursts because

more phonetically relevant information was removed from the CVCs that contained release bursts.

To provide an estimate of how many stops contained audible (as opposed to simply visible) release bursts, the first

3.0

-s o-

0.0

beat bead

3.0

0.07

bit

bid .......

3.0

bat

bad .............

200 300 400

Time (ms)

FIG. !. First anti second formant frequencies (Hz) at 5-ms intervals in the vowels of three minimal pairs spoken by a native speaker of English. The two tracks for each minimal pair are aligned at the point of constriction of the word-final slops (see text).

author judged the final stop in the 100 available tokens of beat and bea,t. These words were presented randomly over headphones to avoid biases. As summarized in Table II, the native English speakers were usually judged to have pro- duced final/t/'s with a voiceless release burst and to have

produced final/d/'s with a voiced release burst. Of the/t/'s produced by the nonnative speakers, two (5%) were per- ceived to have no burst and 38 (95%) to have a voiceless release burst. Of the nonnative speakers'/d/'s, six (15%)

131 J. Acoust. Soc. Am., VoL 92, No. 1, July 1992 Flege etaL: • roduction of the word-final/t/-/d/contrast 131

Page 5: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

TABLE II. Perceptual characteristics of final release bursts in beat and bead tokens as spoken by subjects in five groups. A listener judged final release bursts to be inaudible (none), voiced ( + voi ), or voiceless ( - voi ).

/t/ /d/

None --Voi +Vol None --Vol +Voi

Native English speakers

male 0 5 0 0 0 5 female 0 5 0 0 I 4

Inexperienced Spanish subjects

male 0 5 0 3 I 1 female 0 5 0 0 3 2

Experienced Spanish subjects

male 0 5 0 0 2 3

female 0 5 0 I 3 I

Inexperienced Mandarin subjects

male I 4 0 1 3 1 female 0 5 0 0 4 I

Experienced Mandarin subjects

male I 4 0 0 4 1 female 0 5 0 I I 3

were judged to have no release burst, 13 (33%) to have a voiced release burst, and 21 (52%) to have a ooiceless release burst. This suggested, paradoxically, that some of the non- native speakers' final /d/'s might be identified at higher rates in the edited than in the full-cue condition.

The/bVC/words were presented to three male and sev- en female monolingual speakers of American English with a mean age of 28 years. Four were from Alabama and one each was originally from Tennessee, Mississippi, North Carolina, South Carolina, Massachusetts, and Maryland. Each listen- er passed a pure-tone hearing screening at octave frequencies ranging from 500 to 4000 Hz at 20 dB HL. They heard the CVC words over headphones at a comfortable level (about 74 dB SPL-A) while seated in a sound booth. The listeners were told to identify the final stop in each CVC as/t/or/d/, and to guess if uncertain. Each CVC was presented I s after a response was received for the preceding CVC. The edited condition followed the full-cue condition after a short break.

Eight blocks of 50 stimuli were presented in counterba- lanced order within each half of the experiment. The blocks varied according to the vowel in the CVCs (/i/, /l/,/•/, /a:/) and the talkers' gender. For example, one block con- sisted of the beat and bead tokens produced by the 25 female talkers (5 groups X 5 talkers).

We determined how many of the ten native English lis- teners correctly identified the final stop in each word. Per- cent correct scores are reported in Sec. II. To correct for guessing and to avoid potential problems arising from heter- ogeneity of variance, statistical analyses were performed on A' values, a bias-free measure of perceptual sensitivity (Grier, 1971 ). Four A' scores were obtained for each talker,

one for each minimal pair. The A' scores were based on the number of"hits" ( correct identifications of intended/t/'s as voiceless) and "false alarms" (incorrect identifications of intended final/d/'s as voiceless). Each A' score was based

on 20 forced-choice judgments ( 10 listeners X 2 members of each minimal pair).

II. RESULTS

A. Listening test

The overall rate of correct identifications ranged from 68% for the final stops in bit-bid to 71% correct for the stops at the end of bet-bed and bat-bad. Rates were higher for/t/ than/d/in both the full-cue (82% vs 65% ) and edited con-

ditions (72% vs 61% ). This means that the/d/'s were per- ceived to have been alevoiced more often than the/t/'s were

heard as voiced stops. The average rate of correct identifica- tions dropped by 7% when closure voicing and release burst cues were removed in the edited condition. All listeners par- ticipated in the edited condition after the full-cue (unedited) condition, which may have helped them to identify stops in the edited condition. If so, the fixed order of the two condi-

tions may account for why the removal of closure voicing and release burst cues had a somewhat smaller effect in the

present study than in some previous studies (e.g., Wolf, 1978; Revoile et al., 1982; but see Malrcot, 1958). The effect of editing was somewhat greater for/t/than/d/( 10% vs 4%). It was roughly the same for the native speakers of English (7%) as it was for the experienced and inexper- ienced Spanish subjects (8%, 11% ) and for the experienced and inexperienced Mandarin subjects (5 %, 6% ). The native English speakers' stops were identified correctly more often in the full-cue condition (95% correct) than the inexper- ienced Spanish and Mandarin subjects' stops (71%, 65 % ). The native speakers' stops were also identified correctly more often than those of the experienced Spanish and Man- darin subjects (73%, 62%).

The A' scores were submitted to a mixed-design group- žminimal pair Xcondition ANOVA with repeated mea- sures on the last two factors. The mean A' scores shown in

Fig. 2 have been averaged over the four minimal pairs be- cause the minimal pair factor was nonsignificant [F(3,135) = 0.249, p>0.10]. As expected, the listeners were significantly more sensitive to the voicing feature of stops presented in the full-cue than edited condition (0.803 vs 0.706) IF(1,45) = 41.2, p <0.001 ]. The group X condi- tion interaction was nonsignificant [F(4,45) ----1.34, p>0.10], suggesting that the talkers in all five

groups produced perceptually useful acoustic cues to the /t/-/d/contrast in the closure interval and/or the final re- lease burst.

The significant main effect of group [F(4,45) = 7.05, p < 0.001 ] was followed up by post-hoc comparisons using Tukey's HSD test. These tests revealed that the native Eng- lish speakers' stops had significantly higher A' scores (0.953) than the experienced and inexperienced Spanish subjects' stops (0.751, 0.720). The native English speakers also received higher A' scores than the experienced and inex- perienced Mandarin subjects (0.668, 0.679). Neither the ex-

132 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Flege eta/.: Production of the word-final/t/-/d/contrast 132

Page 6: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

1.0

0.9

0.8

0.7

0.6

0.5

Full-Cue Cond. • Edited Cond.•

Exper. Spanish

Inexper. Exper. Inexper. Spanish Mandarin Mandari•

FIG. 2. Mean perceptual sensitivity (A') of native English-speaking listen- ers to/t/-/d/contrasts in minimal pairs spoken by native speakers of Eng- lish and four groups of adult L2 learners in two listening conditions. The data have been averaged over four minimal pairs. The brackets enclose _+ one standard deviation for the ten subjects per group.

perienced and inexperienced Spanish subjects, nor the expe- rienced and inexperienced Mandarin subjects, differed significantly (p > 0.10). The two Spanish groups had some- what higher A' scores than the corresponding Mandarin groups, but these cross-language differences were nonsignifi- cant (p> 0.10).

Figure 3 shows the average A' scores obtained for each of the 50 talkers whose final stop production was assessed. The mean scores represent an average of the scores obtained in four minimal pairs. The scores obtained for the native English subjects ranged from 0.906 to 1.000 (perfect sensi- tivity by the listeners) in the full-cue condition, and from 822 to 0.986 in the edited condition. There was less variabil-

ity among the ten native English speakers than among the ten subjects in the nonnative groups. For stops presented in the full-cue condition, only six (15%) of the nonnative sub- jects had A' scores falling within the range of values ob- served for the native English group. Three of these nonnative subjects were experienced native Spanish speakers of Eng- lish, two were inexperienced Mandarin speakers of English, and one was an inexperienced Spanish subject. Six (15%) nonnative subjects also had scores falling within the native English range for stops presented in the edited condition: three inexperienced Spanish subjects, two experienced Span- ish subjects, and one inexperienced Mandarin subject. One might conclude from this that at least some of the adult L2 learners were able to master the English word-final/t/-/d/ con trast.

Despite the success of certain individuals, many nonna- tive subjects--including the so-called "experienced" sub- jects--produced final stops that were very difficult for the native English listeners to identify correctly. The native speakers' stops were correctly identified in 91% of instances in the two conditions whereas the nonnative subjects' stops were identified in only 64% of instances on the average. One might wonder, therefore, if the normatires' stops were pro- duced at above chance rates. Twenty binomial probability tests were carried out to answer this question. An alpha rate of 0.002 for each test resulted in an experiment-wise error

1.0

08

0.6

04

O2

lO

08

0.6

o 4

o2!

o

ø ø o o o

o o

o

Full-cue Condition

o o o o

o o

o

0 •aitea Condztion

Hot rye Exper. Xnexper. Exper. Inexper. E[ 9 Ilsh $p&nlsh Spanl sh Mander lO Mandarin

FIG. 3. Mean A' scores obtained in two listening conditions for stops pro- duced in minimal pairs produced by individual subjects in five groups.

rate of 0.10. For stops in the full-cue condition, all four mini- mal pairs produced by the native English speakers were identified at rates that significantly exceeded the guessing rate of 50% (i.e., correct identifications in more than 123 of 200 instances, p < 0.002). The same was true for all four minimal pairs produced by both the experienced and inex- perienced Spanish subjects. Above-chance rates were ob- served for only three of four minimal pairs produced by the inexperienced Mandarin subjects, and just two minimal pairs produced by the experienced Mandarin subjects.

The final stops in all four minimal pairs produced by the native English speakers were identified at an above-chance rate in the edited condition (p<0.002), confirming that there were sufficient acoustic cues to the word-final/t/vs

/d/ distinction in the portion of the eve waveforms that preceded the stop closure interval (that is, in the preceding vowel and fi)rmant transitions; see Flege, 1989). Above- chance rates were obtained for three minimal pairs produced by the expertenced Spanish subjects, just one minimal pair produced by the inexperienced Spanish and Mandarin sub- jects, and no minimal pairs produced by the experienced Mandarin subjects. It appears, then, that three of the four nonnative groups did not produce sufficient preconsonantal cues to the Ill-/d/distinction.

B. Acoustic analyses

The listening test results failed to support the predic- tions generated by the SLM (see the Introduction). Both experienced L2 groups differed from the native English speakers; and the final stops produced by the experienced L2 learners were identified no better by the native English Ils-

133 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Flege ot at: Production of the word-final Ill-/d/contrast 133

Page 7: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

teners than those produced by relatively inexperienced L2 learners. L1 acquisition research has shown that children sometimes produce statistically reliable phonetic differences that go unnoticed by listeners (Maxwell, 1981; Weismer et al., 1981 ). The purpose of the acoustical analyses presented in this section was to test for between-group differences that might not have been evident to the listeners.

1. Vowel duration

Averaged over the words in all eight minimal pairs, vowels produced by the native speakers of English were longer than those produced by the experienced and inexper- ienced Spanish speakers (224 vs 180, 154 ms) as well as the experienced and inexperienced Mandarin speakers (169, 154 ms). As shown in Fig. 4, the difference between the native and nonnative speakers was due largely to the dura- tion of vowels preceding/d/. The native English speakers made vowels 87 ms (48%) longer on average before/d/ than/t/. Smaller voicing effects were observed for the expe- rienced Spanish subjects (55 ms or 36%), the experienced Mandarin subjects (37 ms or 25% ), the inexperienced SI•an- ish subjects ( 30 ms or 22% ), and the inexperienced Manda- rin subjects (32 ms or 23%).

The mean durations of vowels in the/bVt/, /bVd/, /sVt/, and/sVd/words were submitted to a group X initial consonant (/b/vs/s/) X final stop (/t/vs/d/) ANOVA, which yielded a significant group Xfinal stop interaction [F(4,45) = 7.62,œ < 0.001 ]. Tests of simple main effects re- vealed that the talkers in all five subject groups made vowels

5OO

250

200

•50

•00

50

0 Native

EnGlish

1 Exper. Spanish

• bVt Words • bVd Words

Inexper. Exper. Inexper. Spanish Mandarin Mandarin

500

25O

2OO

150

100

50

o Native English

Exper. Spanish

• sVt Words B• sVd Words

Inexper. Exper. Inexper. Spanish Mandarin Mandarin

FIG. 4. Mean duration of vowels (ms) in minimally-paired words ending in /d/ and /t/ spoken by subjects in five groups. The brackets enclose ñ one standard deviation.

significantly longer before/d/than/t/. The Fratios yielded by these tests ranged from 33.3 to 56.7 (p < 0.001 ). The sim- ple main effect of group was nonsignificant for words ending in/t/ [F(4,45) = 2.01, p>0.10]. It was significant, how- ever, for the words ending in /d/ [F(4,45) =7.86, p < 0.001 ]. Post-hoc comparisons (Tukey's HSD) revealed that the native English speakers made vowels significantly longer before /d/ than the talkers in all four nonnative groups (p < 0.05). No other difference, including those be- tween the experienced and inexperienced nonnative sub- jects, reached significance at the 0.05 level.

The mean duration of vowels produced by each talker in eight/t/-final words was subtracted from the mean duration of vowels in eight/d/-final words. As shown in Fig. 5, a few subjects in each nonnative group produced a mean "voicing effect" that fell within the range of mean values observed for the ten native English speakers. If we define the English "norm" as the range of values falling within one standard deviation of the native English speakers' overall mean ( 50- 123 ms in the present instance), then we could credit 11 (28%) of the nonnative speakers with having mastered the "English" voicing effect. Of these, eight were experienced L2 learners and three were relatively inexperienced L2 learners.

The larger number of experienced than inexperienced L2 learners who mastered the English voicing effect is con- sistent with the hypothesis that the phonetic contrast be- tween English/t/and/d/is learnable. An analysis of the ratio of vowel durations preceding/d/vs/t/also supported this hypothesis. It is known that the size of voicing effects produced by native speakers of English vary as a function of overall vowel duration, which may depend on factors such as degree of stress and speaking rate (Crystal and House, 1982, 1988). The use of ratios minimizes the potentially confound- ing effects of such factors. A ratio was therefore computed for each talker by dividing the mean duration of vowels in eight/d/-final words by the mean duration of vowels in eight/t/-final words. The effect of group was marginally significant in a one-way ANOVA examining the vowel dura- tion ratios [F(4,45) = 3.53, p = 0.014]. Post-hoc compari-

170

15o

110

BO

70

•0 •0

lO

o

o

Native

English

o o o @ o o

8 o o

¸ , Exper. Inexper. Exper. Inexper. Spanish Spanish Mandarin Mandarin

FIG. 5. The mean difference in the duration of vowels in eight words ending in/d/and eight words ending in/t/produced by individual subjects in five groups.

134 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Flege eta/.: Production of the word-final/t/-/d/contrast 134

Page 8: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

sons (Tukey's HSD) revealed that the native English sub- jects' ratio (1.52) was not significantly greater than the ratios obtained for the experienced Spanish and Mandarin subjects ( 1.36, 1.26) (p > 0.10). The native speakers' ratios did differ significantly, however, from the inexperi- enced Spanish and Mandarin subjects' ratios (1.23, 1.25) (p <0.05).

2. Closure vo•ing

The duration of closure voicing produced by each sub- ject in/bVt/,/bVd/,/sVt/, and/sVd/words was examined in a group X initial consonant Xfinal stop ANOVA. As shown in Fig. 6, there was very little closure voicing in the /t/'s produced by the subjects in any group. All five groups of talkers produced more closure voicing in/d/than/t/. The native English speakers produced a/d/vs/t/difference that averaged 54 ms. The differences produced by the experi- enced and inexperienced Spanish subjects averaged 40 and 18 ms, respectively; those produced by the experienced and inexperienced Mandarin subjects averaged just 10 and 11 ms, respectively. The significant group5< final stop interac- tion [F(4,45)= 6.65, p<0.001] yielded by the ANOVA was explored by tests of simple main effects. All five groups of talkers were found to have produced significantly more closure voicing in/d/than/t/[Fratios ranging from 9.77 to 34.1; p < 0.01 ]. The effect of group on closure voicing was nonsignificant for words ending in /t/ [F(4,45) = 1.46; p>0.10), but it was significant for /d/-final words [F(4,45) = 6.53, p < 0.001 ]. Post-hoc tests (Tukey's HSD) revealed that the native English speakers sustained voicing

90

80

70

60

50

40

30

20

0

90

80

7O

60

50

40

30 I 20

10

0

• bVt Words • bVd Words

• sVt Words •] sVd Words

FIG. 6. The mean duration (ms) of closure voicing in Ihe final stops in /bVt/and/bVd/words (top) and/sVt/and/sVd/words (boltore) spok- en by subjects in five groups. The brackets enclose 4- one standard devi~ ation.

significantly longer in/d/than the subjects in both inexper- ienced L2 groups. Although they produced more closure voicing in /d/ than the experienced Mandarin subjects (p <0.05), the native speakers did not differ significantly from the experienced Spanish subjects (p > 0.10), who pro- duced significantly more closure voicing than the inexper- ienced Spanish subjects (p < 0.05).

The mean closure voicing difference produced by each of the 50 talkers was calculated. One native speaker of Eng- lish did not produce a closure voicing distinction between /d/and/t/, a phenomenon that has been noted occasionally before (e.g., Higgs and Hodson, 1978). Of the 40 nonnative subjects, ten (25 % ) produced a mean closure voicing differ- ence between/t/and/d/that fell within one standard devi-

ation of the native English speakers' mean/t/-/d/differ- ence (viz. 24-85 ms). Of these, five were experienced Spanish subjects, four were inexperienced Spanish subjects, and just two were native speakers of Mandarin.

3. Stop closure duration

Due to the missing values discussed in Sec. I, just two mean stop closure durations were calculated for each sub- ject, one for words ending in/t/and one for words ending in /d/. As shown in Fig. 7, the subjects in all five groups made /t/ closures longer than/d/closures. The native speakers of English produced a larger average distinction (39 ms) than the experienced Spanish subjects ( 18 ms), the inexperienced Spanish subjects (7 ms), the experienced Mandarin subjects (27 ms), and the inexperienced Mandarin subjects ( ! 6 ms ). The mean stop closure duration values were submitted to a groupžfinal stop (/t/ vs /d/) ANOVA. The ANOVA yielded a significant main effect of stop voicing [F(1,44) = 22.9, p<0.001 ], a nonsignificant group main effect IF(1,44) = 1.27, p > 0.10], and a nonsignificant two- way interaction F(4,44) = 1.54, p > 0. I0].

4. FI offset frequency

The native English subjects were expected to produce words ending in/t/with a higher F 1 offset frequency than

F--• bVt & sVt Words • bVd & sVd Words

200

175

150 125 Too

50

25

0

FIG. 7. Mean duration (ms) of closure for the final stops in words ending in /t/( untilled bar.';) and/d/( filled bars) spoken by subjects in five groups. The brackels enclose 4-_ one standard deviation.

135 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Flege eta/.: Production ol the word-final/t/-/d/contrast 135

Page 9: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

6O0

55O

5OO

45O

4OO

•23 123 ]23

Native Exper. Exper. English Spanish Mandarin

6O0

55O

50O

450

400

123 125 123

Nati• •r. Exper. •glish S•nish Mandarin

FIG. 8. Mean frequency (Hz) of the first formant (FI) in three 15-ms acoustic segments at the end of/bVt/and/bVd/words (top) and/sVt/ and/sVd/words (bottom) as spoken by monolingual native speakers of English and experienced native Spanish and Mandarin speakers of English. Segment 3 was aligned with the end of formant transitions leading into the final stop (see text and Fig. I ).

words ending in/d/, at least for the two minimal pairs con- taining non-high vowels (Wolf, 1978; Hillenbrand et al. 1984, Summers, 1987; see, also, Summers, 1988; Fischer and Ohde, 1990). Figure 8 shows the mean frequency of F 1 in three 15-ms segments found at the end of the vowel in the /bVC/and/sVC/words. (As used here, the term "vowel" includes the formant transitions leading into/t/and/d/; see Sec. I.) F 1 offset frequency was 68 Hz higher, on the aver- age, in/bVt/than/bVd/words produced by the English subjects. F 1 averaged just 3 Hz higher in/bVt/than/bVd/ words for the experienced Spanish subjects; and it was actu- ally slightly lower (by 3 Hz) in the/bVt/than/bVd/words spoken by the experienced Mandarin subjects.

The frequencies obtained in the final three segments in four/bVt/and /bVd/ words produced by talkers in the three groups were submitted to separate vowel X final stop- X segment ANOVAs. None of the three-way interactions reached significance [F values ranging from 0.82 to 1.83, p > 0.10], suggesting that the offset frequency differences ex- tended into the preceding vowel (see Summers, 1987, 1988). Frequencies in the' three segments in each word were there- fore averaged and submitted to a (3) groupX(2) final stop X (4) preceding vowel ANOVA. The significant three- way interaction it yielded [F(6,81 ) = 3.64, p < 0.01 ] was explored by tests of simple main effects. The/t/vs/d/dif- ference proved to be significant in all four minimal pairs produced by the native speakers of English [ F values rang- ing from 11.9 to 20.8, p < 0.01 ). F 1 offset frequency was sig- nificantly higher before/t/than/d/in bat-bad (739, 610 Hz), bet-bed (592, 485 Hz) and bit-bid (485, 425 Hz); it was significantly lower before/t/than /d/in beat-bead

(303, 329 Hz). For the experienced Spanish subjects, the F 1 offset frequency differences in bat-bad (604, 587 Hz), bet- bed (493, 479 Hz), bit-bid (383, 377 Hz) and beat-bead (365,380 Hz) were all nonsignificant [Fratios ranging from 0.05 to 2.67, p > 0.10]. The differences in F 1 offset frequency produced by the experienced Mandarin subjects in bat-bad (759, 747 Hz), bet-bed (624, 632 Hz), bit-bid (438, 452 Hz) and beat-bead (322, 326 Hz) were also all nonsignifi- cant [F-ratios ranging from 0.096 to 1.367, p > 0.10].

F 1 offset frequencies for the/sVd/and/sVt/words are also shown in Fig. 8. The group X vowel X final stop interac- tion missed reaching significance in the analysis of the aver- age frequencies obtained in the final 45 ms of each word [F(6,81 =2.24, p=0.05]. The significant groupXfinai stop interaction that was obtained [F(2,27)----14.0, p < 0.001 ] was explored by tests of simple main effects. The F 1 offset frequency difference between/t/and/d/produced by the native speakers of English (534 vs 461 ) proved to be significant [F(1,9) = 76.7, p < 0.01 ]. The difference pro- dueed by experienced Mandarin subjects (543 vs 518 Hz) was also significant [F(1,9) = 11.6, p < 0.01 ]. The F 1 dif- ference produced by the experienced Spanish subjects (471 vs 453 Hz), on the other hand, was nonsignificant [F(1,9) = 4.72, p = 0.058].

A consideration of the frequency values presented ear- lier reveals that the size oftheF ! offset frequency differences in/sVC/and/bVC/words was inversely related to vowel height. Not surprisingly, a significant vowel X final stop in- teraction was obtained in the analyses of both/bVC/and /sVC/words [F(3,81) = 6.56, F(3,81) = 13.3;p <0.001]. Averaged across the three groups, the F 1 offset frequency differences decreased steadily from bat-bad (129 Hz) to bet-bed (108 Hz) to bit-bid (59 Hz) to beat-bead ( -- 27 Hz); and they decreased steadily from sat-sad ( 142 Hz) to set-said ( 116 Hz) to sit-Sid (50 Hz) to seat-seed ( -- 167 Hz). This finding agrees with previous studies of F I offset frequency (e.g., Hillenbrand et al., 1984).

5. Discussion

As expected, the native speakers of English made vowels longer before/d/than/t/, produced more closure voicing in /d/ than /t/, produced words ending in/t/with a higher F 1 offset frequency than/d/-final words, and sustained/t/clo- sures longer than/d/closures. The nonnative speakers pro- duced these acoustic distinctions also, but their acoustic dif- ferences were usually smaller than the native English speakers'. One exception to this was for stop closure dura- tion. The lack of a significant group X voicing interaction suggested that the nonnative speakers produced as large a closure duration difference between/t/and/d/as the na-

tive speakers. Also, the experienced Spanish subjects did not differ from the native English speakers in production of clo- sure voicing in/d/. Finally, when the ratios of vowels in words ending in/d/and/t/were examined, it was found that the two experienced L2 groups did not differ significant- ly from the native speakers of English.

Some of the nonnative subjects produced more English- like acoustic distinctions between/t/and/d/than others.

136 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Flege et al.: Production of the word-final/t/-/d/contrast 136

Page 10: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

Of the 40 who were included in the present study, 73% pro- duced a vowel duration difference in words ending in/t/vs /d/that fell within one standard deviation of the mean dif-

ference observed for the native English speakers, and 26% did so for the closure voicing difference between/t/and/d/. To provide a similar estimate for F 1 offset frequency, we computed the average F 1 frequency in the final 45 ms of the vowels in eight words (beat, bead, bit, bid, bet, bed, bat, bad) produced by the 20 inexperienced nonnative speakers. (Re- call that the original F 1 analysis examined only experienced normatire speakers.) The average /t/-/d/ difference in /bVC/words produced by all of the nonnatives was then computed. Of the 40 nonnative subjects, 36% produced F 1 frequency differences that fell within one standard deviation of the mean difference produced by the ten native English subjects.

If the percentage of normarive subjects who achieved the English "norm" is a good indication, one might conclude that it is easier for adult L2 learners to learn the vowel dura-

tion correlate of the English/t/-/d/contrast than the clo- sure voicing or F 1 offset frequency correlates. The former involves the timing of lingual gestures; the latter two involve the coordination ofglottal and supraglottal gestures, includ- ing gestures that actively enlarge the oral cavity (Flege et al., 1988). Previous research also suggests that children learning English as an LI succeed in producing the vowel duration cue to final stop voicing before learning to produce other cues such as closure voicing (e.g., Naeser, 1970; Higgs and Hodson, 1978; Smith, 1979; Greenlee, 1980; Weismer et al., 1981; Krause, 1982a,b).

One finding obtained in the present study suggested in- directly that phonetic learning may be influenced by lexical frequency. Generally speaking, we found that the nonnative subjects differed more from the native English speakers in producing/d/than/t/. For example, they closely resem- bled the native English speakers in producing/t/with little or no closure voicing, whereas they typically produced/d/ with less closure voicing than the native speakers of English. The nonnative speakers' vowel durations more closely re- sembled the durations of vowels produced by the native Eng- lish speakers in the context of/t/than/d/, something that has been observed in previous L2 production studies (see Flege, 1988a, for a review). These two findings agree with the linguistic description of/d/as more "marked" in word- final position than/t/. It is probably the result of differences in ease of articulation that more human languages permit- ting just one stop series in word-final position have/p t k/ rather than /bdg/ (Maddieson, 1984). From this, we might conclude that if the L2 learners had difficulty in pro- ducing a/t/-/d/contrast, it could be ascribed to difficulty in producing/d/rather than/t/.

Of the eight words ending in/d/that we examined, said was the most common. According to the tabulation of Fran- cis and Ku•era (1982), said occurs seven times more often in written English texts than the next two most frequent/d/- final words (viz. bad and bed). The only significant F 1 offset frequency difference observed in a minimal pair produced by the nonnative subjects was for set-said. This finding suggests the possibility that adults must hear a particular word many

TABLE IIl. 5urnmary of analyses examining acoustic measures of four minimal pairs,, spoken by 40 normatire speakers of English and listeners' perceptual sensitivity (A') to the/t/-/d/contrast in t hose minimal pairs in two listening conditions. The adjusted R 2 values were obtained at steps 1-3 of forward stepwise multiple regression analyses.

R • Beta F p

Full-cue condition

VOWDIFF • 0.152 0.343 22.28 0.000'

VOIDIFF • 0.186 0.183 6.36 0.013* F 1DIFF • 0.210 0.155 4.67 0.032*

Edited condition VOWDIFF • 0.280 0.501 53.62 0.000'

VOIDIFF •' 0.300 0.087 1.62 0.205 FI DIFF' 0.293 0.120 3.18 0.076

VOWDIFF--the difference in the duration of vowels preceding/t/vs/d/ in each of four minimal pairs produced by the 40 nonnative talkers.

b VOIDIFF•the difference in the duration of closure voicing in/t/vs/d/. FIDIFF--the difference in offset frequency of F I in the 45-ms interval preceding constriction of word-final/t/'s and/d/'s.

thousands of times before they can accurately produce a new sound in it.

C. Multiple correlation analyses

Forward stepwise multiple regression analyses were carried out I o examine the relationship between the A' scores obtained for minimal pairs produced by the normatire speakers (Sec. II A) and acoustic measures of those minimal pairs (Sec. IIB). Three acoustic predictor variables were regressed onto the 160 A' scores obtained for the normatire speakers: differences in the duration of vowels in the two members of' each minimal pair; closure voicing differences; and the F 1 offset frequency differences. (Stop closure dura- tion values were not entered into the model because of miss-

ing data; see above.) Results of the analyses of A' scores obtained in the edited and full-cue conditions are summar-

ized in Table lII.

Variations in vowel duration accounted for 15.2% of

the variance in the full-cue condition at step I. Closure voic- ing accounted for an additional 3.4% of the variance at step 2; and F 1 offset frequency accounted for an additional 1.5% of the variance at step 3. The seemingly greater importance of vowel duration than closure voicing or F 1 offset frequency was not an artifact of the order in which the variables were

entered into the forward stepwise regression model. When voicing was forced as the first variable, it accounted for only 6% of the variance. Vowel duration accounted for 9% more

variance at step 2. When F 1 offset frequency was forced as the first variable, it accounted for 4% of the variance, with vowel duration accounting for an additional 14% of the vari- ance at step 2.

It would probably be unwise to assume that the order in which the three variables were entered into the regression model, or their beta weights, indicates the variables' relative perceptual importance to native English-speaking listeners. The order of entry may simply reflect the size of the acoustic contrasts pn•uced by the normatire speakers. Recall that

137 J. Acoust. Soc. Am.. Vol. 92. No. 1, July 1992 Flege otaL: Production of the word-final It/-/d/contrast 137

Page 11: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

73 % of the nonnative speakers produced vowel duration dif- ferences that fell within one standard deviation of the mean

difference produced by the native English speakers, whereas this was true for only 26% of the nonnative speakers for closure voicing, and 36% of the nonnative speakers for F 1 offset frequency.

We can think of two explanations for why the regression model accounted for only 21% of the variance in A' scores obtained in the full-cue condition. First, acoustic measures of the spectrum and intensity of final release bursts were not included in the model. Second, the nonnative speakers sim- ply did not produce very large acoustic distinctions between /t/and ?d/. For example, the perceptual effect of vowel duration may have accounted for relatively little of the vari- ance because the stop voicing effects produced by the nonna- tive speakers was far smaller than the effects produced by the native English speakers (37 vs 83 ms on the average). In fact, when the acoustic data for the ten native English sub- jects were included in a new regression analysis, the variance that was accounted for (largely by vowel duration) in- creased from 21% to 37%. Another factor that must be con- sidered is that vowel duration is a relative rather than an

absolute cue. The listeners heard five native and 20 nonna-

tive speakers in each block. The voicing effect produced by a nonnative speaker with a characteristically fast speaking rate might have gone unnoticed if the vowels he/she pro- duced in the context of/d/were shorter than those pro- duced in the context of/t/by subjects with a characteristi- cally slow speaking rate. Perhaps vowel duration would have accounted for more variance had we presented the CVCs in their original carrier phrases.

We hypothesized that vowel duration would account for more variance in the edited than in the full-cue condition.

This hypothesis, which will be referred to as the "attentional allocation hypothesis," supposed that the listeners would al- locate greater attention to vowel duration in the edited con- dition because other cues to the/t/-/d/distinction, such

closure voicing and release bursts cues, would be unavailable in the edited condition. In fact, vowel duration accounted for more variance when it was entered at step 1 in a stepwise multiple regression analysis examining A' scores obtained in the edited condition than it did in the analysis of full-cue condition data (28% vs 15 % ). Neither closure voicing nor F 1 offset frequency were identified as significant predictors of edited-condition A' scores.

To provide a further test of the attentional allocation hypothesis, we tested the strength of the correlation between vowel duration differences and the A' scores that were ob-

tained for the minimal pairs in both the edited and the une- dited conditions. One variable in the two correlational analy- ses (i.e., vowel duration differences in the 160 minimal pairs) was the same, so we carried out a test of the equality of two related correlations (Kleinbaum et al., 1988). As pre- dicted, the correlation involving A' scores obtained in the edited condition was significantly stronger than the correla- tion involving A' scores obtained in the full-cue condition (Z = 2.54, p <0.01 ). Recall that the same vowel duration differences Were available to the listeners in both conditions. Only cues that occurred after constriction of the final stop

were removed in the edited condition. This finding therefore suggests that the listeners made greater use of the small, al- beit significant, vowel duration differences produced by the nonnative speakers in the edited condition than they did in the full-cue condition.

The finding just presented is very striking when one con- siders that some of the nonnative speakers produced very small vowel duration differences between words ending in /t/vs/d/. We reasoned that the difference in the strength of correlations obtained in the two listening conditions would have been even greater had we chosen to examine only mini- mal pairs with relatively large vowel duration differences. To test this, we performed separate analyses of the 80 mini- mal pairs having the highest A' scores in the full-cue condi- tion and the 80 pairs with the lowest A' scores. These might be called the relatively "intelligible" and "unintelligible" minimal pairs. Vowel duration differences were significantly greater in the intelligible than in the relatively unintelligible minimal pairs (52 vs 22 ms) [F( 1,158) = 18.5, p < 0.01 ]. There was not a significant difference in the size of the F 1 offset frequency differences in the intelligible and unintelligi- ble minimal pairs (15 Hz vs - 1 Hz) [F(1,158) = 2.00, p > 0.10]. Closure voicing differences between/t/and/d/ were significantly larger for the intelligible than the unintel- ligible minimal pairs (26 vs 12 ms) [F(1,158)= 6.90, œ < 0.01 ] but, of course, this would not have been evident to listeners in the edited condition.

We computed the simple correlations between vowel duration differences in the two sets of 80 minimal pairs and the A' scores obtained for those two sets of pairs in both the edited and full-cue conditions. If the native English listeners allocated more attention to vowel duration in the edited than

in the full-cue condition, we might expect to find a signifi- cant difference in the strength of correlations in the first but not the second analysis because much larger vowel duration differences were available in the intelligible than in the unin- telligible minimal pairs. In fact, the correlation obtained for the intelligible minimal pairs was significantly stronger in the edited than full-cue condition [r = 0.600 vs r = 0.349;

Z = 2.43,p < 0.01 ] but there was not a significant difference in strength of correlation for A' scores obtained for the unin- telligible minimal pairs in the edited and full-cue conditions Jr= 0.218 vs r= 0.078; Z= 1.12, p>0.10]. This implies that a vowel duration difference that averages just 22 ms cannot be exploited perceptually by native speakers of Eng- lish whereas a difference averaging 52 ms can be used.

III. GENERAL DISCUSSION

A. L2 learners' stop production

Previous studies have shown that adults who learn Eng- lish as a second language in adulthood often do not produce a fully English-like distinction between voiced and voiceless stops in the final position of English words (e.g., Suomi, 1976; Elsendoorn, 1980, 1982; Flege and Port, 1981; Eck- man, 1981; Mitleb, 1981a,b; Port and Mitleb, 1983; Ander- son, 1983, 1987; Heyer, 1986; Weinberger, 1987; Lowie, 1988; Flege and Davidian, 1984; Flege et al., 1988; Flege,

138 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Flege et aL: Production of the word-final It/-/dl contrast 138

Page 12: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

1989; Flege and Wang, 1990). For many subjects examined in previous studies, this might be attributed to the lack of sufficient L2 experience. For others, it might be attributed to the interlingual identification of L 1 and L2 stops. For exam- ple, one might expect French learners of English to show a smaller effect of stop voicing on the duration of preceding vowels than native speakers of English if they identified word-final tokens of English/b d g/and/p t k/in terms of the corresponding stop consonants of French. This is be- cause the stop voicing effect is smaller in French than Eng- lish (Mack, 1982). So too, one might expect adults whose L 1 permitted only/p t k/to occur in word-final position to con- tinue "devoicing" English /b,d,g/if they identified both English/b d g/and/p t k/in terms of their L1/p t k/cate- gories.

Based on the nonoccurrence of word-final stop conson- ants in Mandarin and Spanish (see the Introduction), we assumed that native speakers of those languages would not identify word-final English/t/and/d/tokens in terms of word-final L1 phonetic categories. Given this assumption, the speech learning model under evaluation (e.g., Flege, 1988a, 1992a,b) generated the prediction that adult native speakers of Spanish and Mandarin, at least those who were experienced in English, would produce/t/and/d/accu- rately in the final position of English words. The results of the present study did not support the model's prediction. The word-final stops produced by the native speakers of English were identified correctly at an average rate of 95% whereas the native Spanish and Mandarin subjects' stops were identified correctly in only 72% and 63% of instances, respectively. A' scores obtained for the experienced L2 learners were significantly lower than those obtained for the native speakers of English. Even more damaging, the A' scores obtained for relatively experienced and inexperienced Mandarin subjects, and for experienced and inexperienced Spanish subjects, did not differ significantly.

Acoustic analyses revealed that the native speakers dis- tinguished /t/ from /d/ on the basis of vowel duration, F 1 offset frequency, the presence/absence of closure voicing, and stop closure duration. The normarive speakers produced the same acoustic differences, but their differences were gen- erally smaller in magnitude than the native English speak- ers'. This might account for why the native English-speaking listeners were able to identify the voicing feature in stops produced by normatires at levels that only barely exceeded statistical significance in many instances. On the other hand, the fact that the nonnatives produced a number of statistical- ly reliable acoustic differences between/t/and/d/suggests that most if not all or them were at least partially aware of the nature of the English/t/-/d/contrast (see, also, Flege, 1988b; Flege et al., 1988). The acoustic analyses revealed instances in which the experienced L2 learners more closely resembled the native speakers than the inexperienced L2 learners. The overall pattern of results obtained in the pres- ent experiment suggests a pattern of partial phonetic approx- imation rather than the complete mastery by experienced L2 learners of the/t/-/d/contrast that was predicted by the model.

We can think of several possible explanations for why

the model's predictions were not confirmed by the present study, although the present data will not permit us to choose among the alternate explanations offered. First, we may have underestimated the normatires' ability to produce word-final stops in English if alveolar stops are more diffi- cult to learn than bilabial and velar stops. English/t/and /d/are variable in production, sometimes being realized as flaps (intervocalic position) or glottal stops (word-final po- sition). Takata and Nabelek (1990) examined the effect of noise and reverberation on the identification of English CVC words spoken by native speakers of English and Japanese. Listeners' confusions could be readily understood on the ba- sis of cross-language phonelogical differences for conson- ants found in word-initial position (e.g., the Japanese sub- jects' confi•sion of/r/ and /1/). In word-final position, however, this was not always true. There, the most common misidentification of stops produced by both the native and nonnative speakers was/t/for/d/.

Second, we may have underestimated the nonnatives' ability to produce word-final stops if we did not observe their optimal L2 speech production. Our subjects read isolated CVC words from a list. It was assumed that this would opti- mize their pronunciation of English by allowing them to "monitor" their speech. This assumption should not be al- lowed to go unchallenged, however. There is evidence, for example, that nonnatives' spontaneous conversations may sound less foreign accented than their reading of paragraph- length texts (Oyama, 1976).

Third, the study may have yielded a negative finding because even the experienced L2 learners had not received sufficient phonetic input to master the English/t/-/d/con- trust. The experienced L2 learners had, of course, lived far longer in the US than the inexperienced subjects (7.3 vs 0.7 years on the average). However, we believe that L1 acquisi- tion must be considered in attempting to define what consti- tutes "sufficient" L2 input. Most English-learning children produce a perceptually effective contrast between/t/and /d/in word-final position by the age of five years (e.g., Naeser, 1970; Higgs and Hodson, 1978; Smith, 1979; Green- lee, 1980; Weismet et ai., 1981; Krause, 1982a,b; Smit et al., 1990). One might therefore suppose that five or more years of residence in the US would surely have provided the oppor- tunity for enough phonetic input for mastery of the/t/-/d/ contrast if adults, as claimed, are able to master new L2 consonants.

There is no guarantee, however, that our subjects did receive sufficient phonetic input. Consider, for example, the nonnative subjects' report of using English on average 57% of the time on a daily basis. Were their self-reports accurate? If so, would .an adult who had lived for 8 years in the US but spoke English 57% of the time have received as much Eng- lish input as the average 5-year-old English monolingual child? And would the English input received by such an adult be as useful as the input received by young children learning English as an LI? It may be that young children learning an t. 1 hear more isolated words than adults learning an L2, and that this may be helpful in learning to pronounce an L2. For example, Brown (1973) suggested that young children are aided in learning to identify word units in the

139 J. Acousl. Soc. Am., Vol. 92, No. 1, July 1992 Flege otaL: Production of the word-final/t/-/d/contrast 139

Page 13: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

speech stream by the fact that their parents often speak in isolated words to them. Vogel and Winitz (1989) provided evidence that having been exposed to isolated Chinese words helped native English adults identify the boundaries between words in Chinese sentences.

Consider also the effect of stop voicing on preceding vowel duration. It is generally agreed that the relatively large voicing effect seen in English is an exaggerated version of an effect seen in other languages. Vowel duration is just one of several cues to the voicing distinction in word-final English stops. Perhaps because of this, the English voicing effect is not always maintained. It is far smaller in connected speech than in isolated words, especially in words not found in ut- terance-final position or in pre-pausal positions within utter- ances (e.g., Luce and Charles-Luce, 1985; Crystal and House, 1988). Children learning English as an L1 may re- ceive more useful phonetic input than adults learning an L2 as the result of an enhancement of the vowel duration corre-

late of final stop voicing. Bernstein Ratner(1986) observed a greater magnitude of prepausal lengthening in the speech mothers addressed to children just beginning to talk than in the speech those mothers addressed to other adults. Such an increase was less evident in the speech mothers addressed to children who were at a slightly more advanced stage of lan- guage development. One might suppose that the voicing ef- fect would be augmented in words whose overall durations have been increased by prepausal lengthening. In fact, Bern- stein Ratner and Luberoff (1984) reported that the voicing effect was larger in the speech mothers addressed to children just learning to talk than in speech directed to other adults (roughly 80 vs 40 ms). We suspect that adults do not exag- gerate voicing effects when speaking to nonnative adults, perhaps as an indirect result of producing fewer words in prepausal positions owing to the use of longer, more compli- cated sentences.

Still another factor that must be considered is the role of

foreign-accented input. We suspect that our subjects spoke English with at least some nonnative speakers who failed to distinguish/t/from/d/in word-final position. Perhaps our nonnative subjects would have produced better/t/-/d/con- trasts had they received better (i.e., more authentic) Eng- lish-language speech input. Some of the nonnative subjects may have failed to produce native-like/t/-/d/contrasts be- cause they established phonetic categories defined both by native and foreign-accented/t/and/d/tokens. An effect of foreign-accented input could be ruled out if results compara- ble to those obtained here were obtained from experienced late L2 learners who had little opportunity to speak their L 1 and no opportunity to speak English with other nonnatives.

We must, of course, face the possibility that the predict- ed effects were not observed because the model is incorrect in

one or more respects. For example, the SLM may be wrong in claiming that L2 production difficulties have a perceptual origin. The L2 learners examined here may not have identi- fied word-final English/t/and/d! tokens in terms of exist- ing LI categories, thereby blocking the establishment of phonetic categories for word-final English/t/and/d/. They may have established central perceptual representations for English/t/and/d/but were unable to motorically output

the sounds specified in those representations. If so, the data at hand would support the hypothesis of a critical period for human speech learning (Scovel, 1988; Patkowski, 1989 ) and could be interpreted to mean that adults are unable to master L2 sounds not present in their LI phonetic inventory.

Alternatively, the SLM may be correct in claiming that production difficulties have a perceptual origin but incorrect in positing that interlingual identification occurs at a pho- netic level. That is, contrary to the model, the experienced L2 learners may have identified word-final/t/and/d/to- kens in terms of phonetic categories used for word-initial /t/'s and/d/'s in the L1. Indirect support of this was pro- vided recently by Flege (1989; see, also, Flege and Wang, 1990). If it could be shown directly that word-final L2 stops are identified with word-initial LI stops despite substantial acoustic differences between word-initial and word-final

stops, it would support the hypothesis ofTrubetzkoy (1939) that L2 learners "filter out" acoustic differences between

corresponding LI and L2 sounds that are not phonemically relevant for contrasts in the L1. (See Jaeger, 1980, 1986, for experiments pertaining to listeners' perceptual grouping of word-initial and word-final allophones into phonemic units.) Such evidence, should it be forthcoming, would sug- gest that the interlingual identification of L 1 and L2 sounds occurs at a phonemic level rather than at the phonetic level stipulated by the SLM. If so, one would expect native Man- darin and Spanish learners of English to be unable to estab- lish phonetic categories for word-final English/p/,/t/, and /k/and, accordingly, to produce these stops inaccurately in English. Additional research is needed to determine if Span- ish and Mandarin learners of English L2 identify word-final English stops in terms of word-initial L 1 stop categories, and if they (but not native English speakers) use the same acous- tic features in identifying the distinction between/t/and /d/in both the word-initial and word-final positions.

Finally the SLM may be wrong in claiming that all adult learners who have received sufficient L2 phonetic input will master new consonants in an L2. Perhaps new consonants can be mastered by only a smallproportion of adult L2 learn- ers. Our subjects were not selected on the basis of a special ability to pronounce English. The first 40 adult L2 learners who met very broad selection requirements (see Sec. I) were enrolled in the study. Of these, 15% produced a perceptually effective/t/-/d/contrast (that is, showed A' values falling within one standard deviation of the mean A' value obtained

for ten native English speakers). In an average of 45% of instances, the nonnatives produced acoustic differences be- tween /t/ and /d/ (in vowel duration, duration of closure voicing, F 1 offset frequency) that fell within one standard deviation of the mean differences produced by the native English subjects. Although it is clear that not all subjects mastered the English/t/-/d/contrast, it seems reasonable to conclude that at least some of them did so.

B. Listeners' perception of foreign-accented stops

As noted earlier, the nonnative speakers produced a much smaller difference in the duration of vowels preceding /d/ vs /t/ than the native speakers of English ( 37 vs 83 ms).

140 J. Acoust. Soc. Am., VoL 92, No. 1, July 1992 Flege otal.: Production of the word-final/t/-Id/contrast 140

Page 14: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

Despite this, a multiple regression analysis revealed that vowel duration accounted for a small but significant amount of variance in the listeners' identification of the nonnative

speakers' stops. We hypothesized that the native English- speaking listeners would make greater use of vowel duration differences when CVCs produced by the nonnative speakers were presented in an "edited" condition (where closure voicing and release burst cues were absent) than in the origi- nal "full-cue" condition. This hypothesis was motivated by three considerations. First, a study by Flege and Hillenbrand (1987) suggested that native speakers of a language in which word-final stops are consistently released (viz., French) make less use of closure voicing and F 1 offset cues than na- tive speakers of a language in which final stops are not con- sistently released ( viz., English; see, e.g., Crystal and House, 1982; Bernstein Ratner, 1984). Results of that study sug- gested that listeners may tailor their perceptual processing of stops to the range of cues that are reliably available. Second, listeners seem to reorganize perceptual processes when ad- justing t,o foreign-accented speech. After a period of acclima- tion, the foreign-accented speech becomes more intelligible (Wingstedt and Schulman, 1984). Finally, speech percep- tion research has shown that listeners' ability to attend selec- tively to portions of a word is very flexible and precise (see Pitt and Samuel, 1990, and references therein).

The attentional allocation hypothesis was supported. We noted a significantly stronger correlation between the size of vowel duration differences and A' scores obtained in

the edited condition than between the same vowel duration

differences and A' scores obtained in the full-cue condition.

This suggests that the listeners allocated greater attention to preconsonantal cues such as vowel duration and F 1 offset frequency when information pertaining to the constriction and release of final stops was never available in a listening condition. Such flexibility in the face of a varying mix of acoustic cues to a phonetic contrast is just what one would expect given the great variety in speech that arises from dif- ferences in dialect, foreign accents, and speaking styles.

The conclusion concerning attentional allocation must be tempered somewhat for two reasons. First, listeners in the present study all participated in the edited condition after participating in the full-cue condition. Perhaps they made greater use of vowel duration in the second condition be- cause they had grown better able to detect small vowel dura- tion differences as the result of their experience identifying stops in the full-cue condition. Second, we did not directly observe a shift in attentional allocation. A more direct test of

the hypothesis might be carried out by comparing the perfor- mance of native speakers of English and French. English/t/ and/d/are released inconsistently, and/d/is often pro- duced with little or no closure voicing at the end of English words (Smith, 1979; Hogan and Rozsypal, 1980; Guy, 1980; Crystal and House, 1982; Bernstein Rather, 1984). Perhaps as the result of such variability, the removal of closure voic- ing and final bursts seldom affected greatly native English listeners' identification of the voicing feature in stops (e.g., Wolf, 1978; Revoile et al., 1982; Wardrip-Fruin, 1982; Hil- lenbrand eta!., 1984; Flege, 1989; but see Mal6cot, 1958). In French, on the other hand, word-final stops are usually re-

leased, and/b d g/are often produced with voicing through the entire period of constriction (see Flege and Hillenbrand, 1987). As noted earlier, the effect of stop voicing on preced- ing vowel duration appears to be greater in English than French (Mack, 1982). One might therefore expect that na- tive English subjects, as the result of their experience with a highly variable/t/-/d/contrast, would be more flexible than French speakers to varying mixes of stop voicing cues. More specifically, one might expect to observe a greater dif- ference in the strength of correlations between vowel dura- tion and perceptual judgments in edited versus full-cue con- ditions for native English than native French subjects.

In summary, the present study failed to confirm the pre- diction (Flege, 1992a, b) that experienced native Mandarin and Spanish speakers of English would master the produc- tion of/t/and/d/in the final position of English words. Several alternative explanations for this negative finding were offered, but the data at hand did not permit choosing among them. Native English-speaking listeners were shown to make perceptual use of the small but highly significant vowel duration produced by the normarive speakers. A stronger cnrrelation was noted between vowel duration dif- ferences p•esent in minimally paired words ending in/d/vs /t/and A' scores obtained for the identification of stops in those words when the words were presented in an edited than in a full-cue listening condition. This was interpreted to mean that '.he listeners made greater perceptual use of vowel duration when acoustic cues pertaining to constriction and release were systematically absent than when those cues were present in the signal.

ACKNOWLEDGMENTS

This research was funded by NIH grant DC00257. The authors thank C. Wang for help with data collection; C. Crowther, V. Mann, R. Port, A. Walley, and an anonymous reviewer for comments on a previous draft of the article; and M. Beckman and I. Maddieson for providing cross-language phonetic information.

Two Chinese subjects in the study were native speakers of Mandarin, which permits no word-final stops. The remaining subject was a native speaker of tlakka (Kejia), a language which, like Taiwanese, permits /p t k/in word-final position.

Anderson, J. (1983). "The difficulties of English syllable structure for Chinese ESL learners," Lang. Learn. Commun. 2, 5341.

Andersou, J. ½ 1987 I. "The markedness differential hypothesis and syllable structure difficully," in Interlanguage Phonology, edited by G. Ioup and S. Weinberger (Newbury House, RowIcy, MA), pp. 279-291.

Bernstein Ralner, A. (1984). "Phonological rule u•ge in mother-child speech," J. Phon. 12, 245-254.

Bernstein Rat aer, A., and Luberoff, A. (1984). "Cues to post-vocalic voic- ing in mother-child speech," J. Phon. 12, 285-289.

Bernstein Rather, A. (1986). "Durational cues which mark clause boun-

daries in mother-child speech," i. Phon. 14, 303-309. Brown, R. ( 1973)..4 First Language: The Early Stages (Cambridge U.P.,

Cambridge;,. Chen, M. (19'10). "Vowel length variation as a function of the voicing of the

consonant environment," Phonetica 22, 129-159.

Cheng, C.-C. ½ 1973). A Sy,tchronic Phonology of Mamtarin Chinese (Mon- ton, The Hague).

141 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Fiego otal.: Production of the word-final/t/-/d/contrast 141

Page 15: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

Cheng, R. L. (1968). "Tone Sandhi in Taiwanese," Linguistics 41, 19-42. Crowther, C., and Mann, V. (1990). "Vocalic duration as a cue to conso-

nant voicing," J. Acoust. Soc. Am. Suppl. 1 88, S80. Crystal, T., and House, A. (1982). "Segmental durations in connected

speech: Preliminary results," J. Acoust. Soc. Am. 72, 705-716. Crystal, T., and House, A. (1988). "Segmental durations in connected

speech: Current results," J. Acoust. Soc. Am. 83, 1553-1557. Eckman, F. (1981). "On the naturalncss of interlanguage phonological

rules," Lang. Learn. 31, 195-216. Elsendoorn, B. (1980). "Vowel duration as a cue to a Dutch foreign accent

in English CVC words," PRIPU 5, 64-86 (Institute of Phonetics, Uni- versity of Utrecht).

Elsendoorn, B. (1982). "Vowel duration in English sentences by native speakers and Dutch speakers of English," PRIPU 7, 42-57 (Institute of Phonetics, University of Utrecht).

Fischer, R., and Ohde, R. (1990). "Spectral and duration properties of front vowels as cues to final stop voicing," J. Acoust. Soc. Am. 88, 1250- 1259.

Flege, J. E. (1987). "The production of 'new' and 'similar' phones in a for- eign language: Evidence for the effect of equivalence classification," J. Phon. 15, 47-65.

Flege, J. E. (1988a). "The production and perception of speech sounds in foreign languages," in Human Communication and Its Disorders; d Re- view 1988, edited by H. Winitz (Ablex, Norwood, NJ).

Flege, J. E. (1988b). "The development of skill in producing word-final English stops: Kinematic parameters," J. Acoust. Soc. Am. 84, 1639- 1652.

Flege, J. E. (1989). "Chinese subjects' perception of the word-final English /t/-/d/contrast: Performance before and after training," J. Acoust. Soc. Am. 86, 1684-1697.

Flege, J. E. (1991). "Perception and production: The relevance of phonetic input to L2 phonological learning," in Crosscurrents in Second Language `4cquisition and Linguistic Theory, edited by T. Hcubner and C. Ferguson (Benjamins, Amsterdam), pp. 249-289.

Flege, J. E. (1992a). "The Intelligibility of English vowels spoken by Brit- ish and Dutch talkers," in [ntelligibility in Speech Disorders: Theory, Measurement, and Management, edited by R. Kent (Benjamins, Phila- delphia).

Flege, J. E. (1992õ). "Speech learning in a second language," in Phonologi- cal Development: Models, Research, and ̀4pplication, edited by C. Fergu- son, L. Menn, and C. StocI-Gammon (York, Parkton, MD), in press.

Flege, J. E., and Davidian, R. (1984). "Transfer and developmental pro- ccsses in adult foreign language speech production," Appl. Psycholing. $, 323-347.

Flege, J. E., and Hillenbrand, J. (1987). "Differential use of closure voicing and release burst as cues to stop voicing by native speakers of French and English," J. Phon. 15, 203-208.

Flege, J. E., McCutchcon, M., and Smith, S. (1988). "The development of skill in producing word-final English stops," J. Acoust. Soc. Am. 82, 433-447.

Flege, J. E., and Port, R. (I981). "Cross-language phonetic interference: Arabic to English," Lang. Speech 24, 125-146.

Flege, J. E., and Wang, C. (1990). "Native-language phonotactic con- straints affect how well Chinese subjects perceive the word-final English /t/-/d/contrast," J. Phon. 17, 299-315.

Francis, W., and Kueera, H. (1982). Frequency,4nalysis of English Usage: Lexicon and Grammar (Houghton Mifflin, Boston).

Greenlee, M. (1980). "Learning the phonetic cues to the voiced-voiceless distinction: A comparison of child and adult speech perception," J. Child Lang. 7, 459-468.

Grier, J. (1971}. "Nonparametric indexes for sensitivity and bias: Comput- ing formulae," Psych. Bull. 75, 424-429.

Guy, G. (1980). "Variation in the group and the individual: The case of final stop deletion," in Locating Language in Time and Space, edited by W. Labor (Academic, New York}, pp. 1-36.

Harris, J. (1969}. Spanish Phonology (MIT, Cambridge, MA). Heyer, S. (1986}. "English final consonants and the Chinese learner," Un-

published M.A. thesis, Department of Linguistics, So. Illinois Univ, Car- bondale, 1L.

Higgs, J., and Hodson, B. (1978}. "Phonologieal perception of word-final obstruent cognates," J. Phon. 6, 25-35.

Hillenbrand, J., Ingrisano, D., Smith, B., and Flege, J. E. (1984). "Percep- tion of the voiced-voiceless contrast in syllable-final stops," J. Acoust. Soc. Am. 76, 18-26.

Hiramatsu, K. (1990). "Timing acquisition by Japanese speakers of Eng- lish," Speech Research Laboratory, Work in Progress 6, 49-76 (Depart- ment of Linguistics, Reading University).

Hogan, J., and Rozsypal, A. (1980). "Evaluation of vowel duration as a cue for the voicing distinction in the following word-final consonant," J. Acoust. Soc. Am. 67, 1764-1771.

House, A. (1961). "On vowel duration in English," J. Acoust. Soc. Am. 33, 1174-1178.

HoMe, J. (1976). ,4coustic Studies of Mandarin Vowels and Tones (Cam- bridge U.P., Cambridge).

Jaeger, J. (19g0). "Testing the psychological reality of phoneroes," Lang. Speech 23, 233-253.

Jaeger, J. (1986). "Category formation as a tool for linguistic research," in Experimental Phonology, edited by J. Ohala and J. Jaeger (Academic, Orlando), pp. 211-238.

Kleinbaum, D., Kupper, L., and Muller, K. (1988). ,4pplied Regression ,4nalyses and Other Multivariable Methods ( PWS-Kent, Boston).

Krause, S. (1982a). "Vowel duration as a perceptual cue to postvoealic consonant voicing in young children and adults," J. Acoust. Soc. Am. 71, 990-995.

Krause, S. (1982b). "Developmental use of vowel duration as a cue to past- vocalic stop consonant voicing," J. Speech Hear. Res. 25, 388-393.

Lisker, L. (1986). "Voicing" in English: A catalogue of acoustic features signaling/b/versus/p/in trochees," Lang. Speech 29, 3-11.

Lowie, W. (1988). "Age and foreign language pronunciation in the class- room," Unpublished M.A. thesis, Dept. of English, Univ. of Amsterdam.

Luce, P., and Charles-Luce, J. (1985). "Contextual effects on vowel dura- tion, closure duration, and the consonant/vowel ratio in speech produc- tion," J. Acoust. Soc. Am. 78, 1949-1957.

Mack, M. (1982). "Voicing dependent vowel duration in English and French: Monolingual and bilingual production," J. Acoust. Soc. Am. 71, 173-178.

Maddicson, 1. (1984). Patterns of Sound (Cambridge U.P., Cambridge). Mal6eot, A. (1958). "The role of releases in the identification of released

final stops," Lang. 34, 370-380. Maxwell, E. ( 1981 }. "A study of misarticulation from a linguistic perspec-

tive," Ph.D. dissertation, lndiana University. Mitleb, F. (1981a). "Timing of English vowels spoken with an Arabic Ac-

cent," Research in Phonetics 2, 193-225 (Department of Linguistics, In- diana University).

Mitleb, F. (1981b}. "Segmental and non-segmental structure in phonetics: Evidence from foreign accent," Ph.D. dissertation, Indiana University.

Naeser, M. {1970). "The American child's acquisition of differential vowel duration," Wisconsin Research and Development Center for Cognitive Learning, University of Wisconsin, Madison, Tech. Rep. 144.

Norman, Jerry. (1988}. Chinese (Cambridge U.P., Cambridge). Ohde, R., and Sharf, D. (1977). "Order effect of acoustic segments of VC

and CV syllables on stop and vowel identification," J. Speech Hear. Res. 20, 543-554.

Oyama, S. (1976). "A sensitive period for the acquisition of a nonnative phonologieal system," J. Psycboling. Res. 5, 261-285.

Patkowski, M. (1989). "Age and accent in a second language: A reply to James Emil Flege," Appl. Ling. 11, 73-89.

Peterson, G., and Lehiste, I. (1960}. "The duration of syllable nuclei in English," J. Acoust. Soc. Am. 32, 693-703.

Picheny, M., Durlach, N., and Braida, L. (1986). "Speaking clearly for the hard of hearing l l: Acoustic characteristics of clear and conversational speech," J. Speech Hear. Res. 29, 43

Pitt, M., and Samuel, A. (1990). "Attentional allocation during speech per- ception: How fine is the focus?" J. Mem. Lang. 29, 611-632.

Port, R. (1979}. "The influence of tempo on stop closure duration as a cue for voicing and place," J. Phon. 7, 45-56.

Port, R. (1981). "Linguistic timing factors in combination," J. Acoust. Soc. Am. 69, 262-274.

Port, R., and Dalby, J. (1982). "Consonant/vowel ratio as a cue for voicing in English," Percep. Psychophys. 3•-, 141-152.

Port, R., and Mitleb, F. (1983}. "Segmental features and implementation in acquisition of English by Arabic speakers," J. Phon. 11, 219-231.

Raphael, L. (1972). "Preceding vowel duration as a cue to the perception of the voicing characteristic of word-final consonants in American Eng- lish," J. Acoust. Soc. Am. 51, 1296-1303.

Raphael, L. { 1981). "Durations and context as cues to word-final cognate oppositions in English," Phonetica 38, 126-147.

Repp, B. (1978). "Perceptual integration and differentiation of spectral

142 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Flege ot at: Production of the word-final/t/-Idl contrast 142

Page 16: Production of the Word-final English /t/-/d/ Contrast by ....../t/-/d/contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers

cues for intervocalic stop consonants," Percep. Psyehophys. 14, 471-485. Revoile, S., Picket, J., Holden, L., and Talkin, D. (1982). "Acoustic cues to

final stop voicing for impaired and normal-hearing listeners," J. Acoust. Soc. Am. 72, 1145-1154.

Scovcl, T. ( 1988 }. •4 Time to Speak: •4 Psycholinguistic Inquiry into the Criti- cal Period for Human Speech (Newbury House/Harper & Row, New York).

Stair, A., Hand, L., Freilinger, J., Bernthai, J., and Bird, A. (1990). "The Iowa articulation norms project and its Nebraska replication," J. Speech Hear. Disor& $$, 779-798.

Smith, B. (1979). "A phonetic analysis of consonantal devoieing in chil- dren's speech," J. Child Lang. 6, 19-28.

Stevens, K. Keyset, S., and Kawasaki, H. (1986). "Toward a phonetic and phonological theory of redundant features," in Symposium on ln•ariance and Variability of Speech Processes, edited by J. Perkell and D. Klatt (Erl- baum, Hillsdale, NJ), pp. 432-469.

Such, C., and Beddoes, M. (1974). "The silent interval of stop consonants," Lang. Speech 17, 126-134.

Suomi, K. (1976). "English voiceless and voiced stops as produced by na- tive and Finnish speakers," Jyvaskyla Contrastire Studies, 2 (Depart- ment of English, University of Jyvaskyla, Finland ).

Summers, W. V. (1987). "Effects of stress and final consonant voicing on vowel production: Articulatory and acoustic analyses," J. Acoust. Soc.

Am. 82, 847-863.

Summers, W. V. (1988). "F I structure provides infi•rmation for final-con- sonant voicing," J. Acoust. Soc. Am. 84, 485--492.

Takata, Y., and Nabelek, A. (1990). "English consonant recognition in noise and in reverberation by Japanese and American listeners," J. Aeoust. Soc. Am. 88, 663-666.

Trubetzkoy, N. (1939). Grundzuge der Phonologie (Travaux du cercle lin- guistique de Prague), p. 7.

Vogel, D., and Winitz, H (1989). "Word isolation and meaning in segmen- tation," I. Psycholing. Res. 18, 473484.

Wardrip-Frnin, C. (1982). "On the status of temporal cues to phonetic cat- egoties: Preceding vowel duration as a cue to voicing in final conson- ants," I. Acoust. Soc. Am. 71, 187-195.

Weinberger, S. (1987). "The influence of linguistic context on syllable structure simplificalion," in Interlanguage Phonology, edited by G. Ioup and S. Weinberger (Newbury House, RowIcy, MA), pp. 401-4 18.

Weismet, G., Dinnsen, D., and Elbert, M. (1981). "A study of the voicing distinctionassociated with omitted, word-final stops," J. Speech Hear. Disord. 46, 320-328.

Wingstedt, M., and Schulman, R. (1984). "Comprehension of foreign ac- cents," in Pho,ologica 1984, edited by W. Dressier (Adam Mickiewicz U. P., Warsaw), pp. 339-344.

Wolf, C. (1978). "Voicing cues in English final stops," J. Phon. 6, 299-309.

143 J. Acoust. Soc. Am., Vol. 92, No. 1, July 1992 Flege eta,'.: Production of the word-final Itl-/d/contrast 143