
How linguistic chickens help spot spoken-eggs: phonological constraints on speech identification

Iris Berent1*, Evan Balaban2,3 and Vered Vaknin-Nusbaum4,5

1 Department of Psychology, Northeastern University, Boston, MA, USA
2 Department of Psychology, McGill University, Montreal, QC, Canada
3 Scuola Internazionale Superiore di Studi Avanzati, Trieste, Italy
4 Department of Education, University of Haifa, Haifa, Israel
5 Western Galilee College, Akko, Israel

Original research article published: 13 September 2011
doi: 10.3389/fpsyg.2011.00182
Frontiers in Psychology | Language Sciences, September 2011, Volume 2, Article 182

Edited by: Charles Clifton Jr., University of Massachusetts Amherst, USA
Reviewed by: Chen Yu, Indiana University, USA; Bob McMurray, University of Iowa, USA
*Correspondence: Iris Berent, Department of Psychology, Northeastern University, 125 Nightingale Hall, 360 Huntington Avenue, Boston, MA 02115-5000, USA. e-mail: [email protected]

It has long been known that the identification of aural stimuli as speech is context-dependent (Remez et al., 1981). Here, we demonstrate that the discrimination of speech stimuli from their non-speech transforms is further modulated by their linguistic structure. We gauge the effect of phonological structure on discrimination across different manifestations of well-formedness in two distinct languages. One case examines the restrictions on English syllables (e.g., the well-formed melif vs. the ill-formed mlif); another investigates the constraints on Hebrew stems by comparing ill-formed AAB stems (e.g., TiTuG) with well-formed ABB and ABC controls (e.g., GiTuT, MiGuS). In both cases, non-speech stimuli that conform to well-formed structures are harder to discriminate from speech than stimuli that conform to ill-formed structures. Auxiliary experiments rule out alternative acoustic explanations for this phenomenon. In English, we show that acoustic manipulations that mimic the mlif-melif contrast do not impair the classification of non-speech stimuli whose structure is well-formed (i.e., disyllables with phonetically short vs. long tonic vowels). Similarly, non-speech stimuli that are ill-formed in Hebrew present no difficulties to English speakers. Thus, non-speech stimuli are harder to classify only when they are well-formed in the participants' native language. We conclude that the classification of non-speech stimuli is modulated by their linguistic structure: inputs that support well-formed outputs are more readily classified as speech.

Keywords: phonology, speech, non-speech, well-formedness, phonological theory, modularity, encapsulation

INTRODUCTION

Speech is the preferred carrier of linguistic messages. All hearing communities use oral sound as the principal medium of linguistic communication (Maddieson, 2006); from early infancy, people favor speech stimuli over various aural controls (e.g., Vouloumanos and Werker, 2007; Shultz and Vouloumanos, 2010; Vouloumanos et al., 2010); and speech stimuli may engage many so-called language areas in the brain to a greater extent than non-speech inputs (Molfese and Molfese, 1980; Vouloumanos et al., 2001; Liebenthal et al., 2003; Meyer et al., 2005; Telkemeyer et al., 2009; but see Abrams et al., 2010; Rogalsky et al., 2011).

The strong human preference for speech suggests that the language system is highly tuned to speech. This is indeed expected by the view of the language system as an adaptive processor, designed to ensure rapid, automatic processing of linguistic messages (Liberman et al., 1967; Fodor, 1983; Liberman and Mattingly, 1989; Trout, 2003; Pinker and Jackendoff, 2005). But surprisingly, the preferential tuning to speech is highly flexible. Indeed, linguistic phonological computations apply not only to aural language but also to printed stimuli read silently (e.g., Van Orden et al., 1990; Lukatela et al., 2001, 2004; Berent and Lennertz, 2010). Moreover, many natural languages take manual signs as their inputs, and such inputs spontaneously give rise to phonological systems that mirror several aspects of spoken language phonology (Sandler and Lillo-Martin, 2006; Sandler et al., 2011; Brentari et al., 2011). Finally, the classification of linguistic stimuli as speech can be strategically altered by instructions and practice (e.g., Remez et al., 1981, 2001; Liebenthal et al., 2003; Dehaene-Lambertz et al., 2005).

Such results make it clear that the identification of speech (the linguistic messenger) is logically distinct from the message. Not only can we use speech to transmit messages that lack both meaning and grammatical structure (e.g., o-i-a), but it is also possible to convey meaningful, well-formed linguistic messages using acoustic information that can be perceived as non-speech-like (e.g., sine wave analogs). In practice, however, the message and messenger interact, as the classification of an input as linguistic can be flexibly altered by numerous factors in a top-down fashion. This flexibility raises a conundrum. An adaptive language system should rapidly hone in on its input and readily discern linguistic messengers from non-linguistic ones. But if the status of an auditory input is not determined solely by its internal acoustic properties, then how do we classify it as linguistic? Specifically, what external factors constrain the classification of an input as speech?

Most existing research has addressed this question by pitting internal, bottom-up factors against top-down effects that are external to the stimulus (e.g., task demands, training). Top-down effects, however, might also originate from the language system itself. The language system is a computational device that generates detailed structural descriptions for inputs and evaluates their well-formedness (Chomsky, 1980; Prince and Smolensky,


1993/2004; Pinker, 1994). Existing research has shown that such computations might apply to a wide range of inputs, both inputs that are perceived as speech and those classified as non-speech-like. To the extent the system is interactive, it is thus conceivable that the classification of an input as linguistic might be constrained by its output, namely, its structural well-formedness.

Such top-down effects could take several forms. On a weaker, attention-based explanation, ill-formed stimuli are less likely to engage attentional resources, so they allow for a more rapid and accurate classification of the stimulus, be it speech or non-speech. A stronger interactive view asserts that the output of the computational system can inform the interpretation of its input: the stronger the well-formedness of the output (e.g., harmony; Prince and Smolensky, 1997), the more likely the input is to be interpreted as linguistic. Accordingly, ill-formedness should facilitate the rapid classification of non-speech inputs, but impair the classification of speech stimuli. While these two versions differ in their accounts of the classification of speech inputs, they converge in their predictions for non-speech stimuli: ill-formed inputs will be more readily classified as non-speech compared to well-formed structures.

Past research has shown that the identification of non-speech stimuli is constrained by several aspects of linguistic knowledge. Azadpour and Balaban (2008) observed that people's ability to discriminate non-speech syllables from each other depends on their phonetic distance: the larger the phonetic distance, the more accurate the discrimination. Moreover, this sensitivity to the phonetic similarity of non-speech stimuli remains significant even after statistically controlling for their acoustic similarity (determined by the Euclidean distance among formants).

Subsequent research has shown that the identification of non-speech stimuli is constrained by phonological knowledge as well (Berent et al., 2010). Participants in these experiments were presented with various types of auditory continua (either natural speech stimuli, non-speech stimuli, or speech-like controls) ranging from a monosyllable (e.g., mlif) to a disyllable (e.g., melif), and they were instructed to identify the number of their beats (a proxy for syllables). Results showed that syllable-count responses were modulated by the phonological well-formedness of the stimulus, and the effect of well-formedness obtained regardless of whether the stimulus was perceived as speech or non-speech.

These results demonstrate that people can compute phonological structure (a property of linguistic messages) for messengers that they classify as non-speech. But other aspects of the findings suggest that the structure of the message can further shape the classification of the messenger. The critical evidence comes from the comparison of speech and non-speech stimuli. As expected, responses to speech and non-speech stimuli differed, a difference we dub the "speechiness" effect. But remarkably, the speechiness effect was stronger for well-formed stimuli compared to ill-formed ones. Well-formedness, here, specifically concerned the contrast between monosyllables (e.g., mlif) and their disyllabic counterparts (e.g., melif) in two languages: English vs. Russian. While English allows melif-type disyllables, but not their monosyllabic mlif-type counterparts, Russian phonotactics are the opposite: Russian allows sequences like mlif, but bans disyllables such as melif (/məlif/). The experimental results showed that the Russian and English groups were each sensitive to the status of the stimuli as speech or non-speech, but the speechiness effect depended on the well-formedness of the stimuli in the participants' language. English speakers manifested a stronger speechiness effect for melif-type inputs, whereas for Russian speakers, this effect was more robust for monosyllables, structures that are well-formed in their language. Interestingly, this well-formedness effect obtained irrespective of familiarity, for both mlif-type items (which are both well-formed and attested in Russian) and the mdif-type items that are structurally well-formed (Russian exhibits a wide range of sonorant-obstruent onsets) but happen to be unattested in this language. These results suggest that the classification of auditory stimuli as speech depends on their linguistic well-formedness: well-formed stimuli are more speech-like than ill-formed controls. Put differently, structural properties of the linguistic message inform the classification of the messenger.

The following research directly tests this prediction. Participants in these experiments were presented with a mixture of speech and non-speech stimuli, and they were asked simply to determine whether or not the stimulus sounds like speech. The critical manipulation concerns the well-formedness of those stimuli. Specifically, we compare responses to stimuli generated from inputs that are either phonologically well-formed or ill-formed. Our question here is whether the ease of discriminating speech from non-speech might depend on the phonological well-formedness of these non-speech stimuli. The precise source of this well-formedness effect (whether it is due to a weaker effect of attention-grabbing, or a strong top-down interaction) is a question that we defer to the Section "General Discussion." For this reason, we make no a priori predictions regarding the effect of well-formedness on speech stimuli. Our goal here is to first establish that well-formedness modulates the classification of non-speech inputs. To the extent that well-formed stimuli are identified as speech-like, we expect that participants should exhibit consistent difficulty in the classification of non-speech stimuli whose structure is well-formed.

Well-formed stimuli, however, might also raise difficulties for a host of acoustic reasons that are unrelated to phonological structure. Our investigation attempts to distinguish phonological well-formedness from its acoustic correlates in two ways. First, we examine the effect of well-formedness across two different languages, using two manifestations that differ in their phonetic properties: Experiment 1 examines the restrictions on syllable structure in English, whereas Experiment 3 explores the constraints on stem structure in Hebrew. Second, we demonstrate that the effect of well-formedness is dissociable from the acoustic properties of the input. While Experiments 1 and 3 compare well-formed stimuli to ill-formed counterparts, Experiments 2 and 4 each apply the same phonetic manipulations to stimuli that are phonologically well-formed. Specifically, Experiment 2 shows that a phonetic manipulation comparable to the one used in Experiment 1 fails to produce the same results for stimuli that are well-formed, whereas Experiment 4 demonstrates that the difficulties with non-speech stimuli that are well-formed in Hebrew are eliminated once the same stimuli are presented to English speakers.


Materials. The materials included the three pairs of nasal C1C2VC3-C1əC2VC3 non-speech and speech-control continua used in Berent et al. (2010). Members of each pair were matched for their rhyme and the initial consonant (always an m), and contrasted on the second consonant, either l or d (/mlI/-/mdI/, /ml/-/md/, /mlb/-/mdb/). To generate those continua, we first had an English talker naturally produce the disyllabic counterparts of each pair member (e.g., /məlI/, /mədI/) and selected disyllables that were matched for length, intensity, and the duration of the pretonic schwa. We next continuously extracted the pretonic vowel at zero crossings in five steady increments, moving from its center outwards. This procedure yielded a continuum of six steps, ranging from the original disyllabic form (e.g., /məlI/) to an onset cluster, in which the pretonic vowel was fully removed (e.g., /mlI/). The number of pitch periods in Stimuli 1-5 was 0, 2, 4, 6, and 8, respectively; Stimulus 6 (the original disyllable) ranged from 12 to 15 pitch periods.

These natural speech continua were used to generate non-speech stimuli and speech-like stimuli using the procedure detailed in Berent et al. (2010). Briefly, non-speech materials were generated by deriving the first-formant contours from spectrograms of the original speech stimuli (256-point DFT, 0.5-ms time increment, Hanning window) using a peak-picking algorithm, which also extracted the corresponding amplitude values. A voltage-controlled oscillator modulated by the amplitude contour was used to resynthesize these contours back into sounds, and the amplitude of the output was adjusted to approximate the original stimulus. The more speech-like controls were generated using a digital low-pass filter with a slope of -85 dB per octave above a cutoff frequency that was stimulus-dependent (1216 Hz for /məlI/- and /mədI/-type items, 1270 Hz for /məl/, 1110 Hz for /məlb/, 1347 Hz for /məd/, and 1250 Hz for /mədb/-type items), designed to reduce but not eliminate the speech information available at frequencies higher than the cutoff. This control manipulation acoustically altered the stimuli in a manner similar to the non-speech stimuli, while preserving enough speech information for these items to be identified as (degraded) speech.

Previous testing using these materials confirmed that they were indeed identified as intended (speech or non-speech) by native English participants (Berent et al., 2010). Figure 1 provides an illustration of the non-speech materials and controls; a sample of the materials is available at http://www.psych.neu.edu/faculty/i.berent/publications.htm.
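The two synthesis steps above, peak-picking a first-formant value from a Hanning-windowed short-time spectrum and resynthesizing the contour with an amplitude-modulated sine "voltage-controlled oscillator", can be sketched as follows. The sampling rate, F1 search band, and test signal are our own illustrative assumptions; the paper specifies only the 256-point DFT, 0.5-ms increment, and Hanning window:

```python
import math

def hann_dft_mag(frame):
    """Magnitude spectrum of one Hanning-windowed frame (naive DFT)."""
    n = len(frame)
    w = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
         for i, s in enumerate(frame)]
    mags = []
    for k in range(n // 2):
        re = sum(w[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = sum(w[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        mags.append(math.hypot(re, im))
    return mags

def pick_f1(frame, fs, band=(200.0, 1000.0)):
    """Peak-pick the strongest spectral peak inside an assumed F1 band,
    returning its frequency (Hz) and amplitude."""
    mags = hann_dft_mag(frame)
    bin_hz = fs / len(frame)
    k_lo, k_hi = int(band[0] / bin_hz), int(band[1] / bin_hz) + 1
    k = max(range(k_lo, k_hi), key=mags.__getitem__)
    return k * bin_hz, mags[k]

def vco(freqs, amps, fs):
    """Sine 'VCO': one frequency/amplitude value per output sample,
    with accumulated phase so the frequency can vary smoothly."""
    out, phase = [], 0.0
    for f, a in zip(freqs, amps):
        phase += 2.0 * math.pi * f / fs
        out.append(a * math.sin(phase))
    return out

fs = 8000
frame = [math.sin(2 * math.pi * 400 * i / fs) for i in range(256)]
f1, amp = pick_f1(frame, fs)            # recovers roughly 400 Hz
tone = vco([f1] * fs, [0.5] * fs, fs)   # one second at the picked F1
```

In the actual pipeline the picked frequency and amplitude would be updated every 0.5 ms, tracing the F1 contour of the utterance rather than a constant tone.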

The six-step continuum for each of the three pairs was presented in all six durations for both non-speech stimuli and speech controls, resulting in a block of 72 trials. Each such block was repeated three times, yielding a total of 216 trials. The order of trials within each block was randomized.
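The block composition stated above can be checked with a line of arithmetic: 3 continuum pairs, each with 2 members, 6 continuum steps, and 2 speech statuses give 72 trials per block, and three block repetitions give 216 trials:

```python
pairs, members, steps, speech_status = 3, 2, 6, 2
trials_per_block = pairs * members * steps * speech_status
total_trials = trials_per_block * 3   # three block repetitions
print(trials_per_block, total_trials)  # 72 216
```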

Procedure. Participants wore headphones and were seated in front of the computer screen. Each trial began with a message indicating the trial number. Participants initiated the trial by pressing the spacebar, which, in turn, triggered the presentation of a fixation point (+, presented for 500 ms) followed by an auditory stimulus. Participants were asked to determine as quickly and accurately as possible whether or not the stimulus corresponded to speech, and to indicate their response by pressing

PART 1: ENGLISH SYLLABLE STRUCTURE CONSTRAINS THE CLASSIFICATION OF NON-SPEECH STIMULI

Experiments 1-2 examine whether the classification of acoustic stimuli as speech depends on their well-formedness as English syllables. Participants in this experiment were presented with a mixture of non-speech stimuli and matched speech-like controls, and they were simply asked to determine whether or not each stimulus sounds like speech. Of interest is whether the classification of the stimulus as speech depends on its well-formedness.

To examine this question, we simultaneously manipulated both the phonological structure of the stimuli and their speech status. Phonological well-formedness was manipulated along nasal-initial continua ranging from well-formed disyllables (e.g., /məlI/, /mədI/) to ill-formed monosyllables (e.g., /mlI/, /mdI/). To generate these continua, we first had a native English talker naturally produce a disyllable that began with a nasal-schwa sequence, either one followed by a liquid (e.g., /məlI/) or one followed by a stop (e.g., /mədI/). We then gradually excised the schwa in five steps until, at the last step, the schwa was entirely removed, resulting in a CCVC monosyllable, either /mlI/ or /mdI/ (we use C and V for consonants and vowels, respectively). The CCVC-CəCVC continuum thus presents a gradual contrast between well-formed inputs (in step 6, CəCVC) and ill-formed ones (in step 1, CCVC). Among these two CCVC monosyllables, mdI-type stimuli are worse formed than their mlI-type counterparts (Berent et al., 2009). Although past research has shown that English speakers are sensitive to the ml-md distinction given these same materials, both speech and non-speech (Berent et al., 2009, 2010, in press), it is unclear whether this subtle contrast can modulate performance in a secondary speech-detection task.

Our main interest, however, concerns the contrast between CCVC monosyllables (either mlif or mdif) and their CəCVC counterparts. Across languages, complex onsets (in the monosyllable mlif) are worse formed than simple onsets (in the disyllable melif; Prince and Smolensky, 1993/2004). Nasal-initial complex onsets, moreover, are utterly unattested in English. Accordingly, the monosyllables at step 1 of our continuum are clearly ill-formed relative to the disyllabic endpoints. Our question here is whether the ill-formedness of those monosyllables would facilitate their classification as non-speech.

To address this question, we next modified those continua to generate non-speech inputs and speech controls. Non-speech stimuli were produced by resynthesizing the first formant of the natural speech stimuli. To assure that differences between non-speech and speech-like stimuli are not artifacts of the re-synthesis process, we compared those non-speech stimuli to more speech-like inputs that were similarly filtered. If well-formed stimuli are more speech-like, then non-speech responses should be harder to make for well-formed non-speech stimuli (those corresponding to the disyllabic items) compared to ill-formed monosyllables. Accordingly, the identification of non-speech stimuli should be modulated by vowel duration. Experiment 1 verifies this prediction; Experiment 2 rules out alternative non-linguistic explanations for these findings.

EXPERIMENT 1

Method

Participants. Ten native English speakers, students at Northeastern University, took part in the experiment in partial fulfillment of course requirements.


To determine whether speech and non-speech inputs were affected by syllable structure, we first compared speech and non-speech inputs by means of a 2 speech status × 6 vowel duration × 2 continuum type (ml vs. md) ANOVA. Because each condition in this experiment includes only three items, these analyses were conducted using only participants as a random variable. The analysis of response accuracy produced only a significant main effect of speech status [F(1, 9) = 7.12, MSE = 0.0052, p < 0.03], indicating that people responded more accurately to speech-like stimuli compared to their non-speech counterparts. No other effect was significant (all p > 0.11).

The analysis of response time, however, yielded a reliable effect of vowel duration [F(5, 45) = 5.20, MSE = 978, p < 0.0008] as well as a significant three-way interaction [F(5, 45) = 2.42, MSE = 1089, p = 0.050]. No other effect was significant (all p > 0.13). We thus

one of two keys on the computer's numeric keypad (1 = speech, 2 = non-speech). Their response was timed relative to the onset of the stimulus. Slow (slower than 1000 ms) and inaccurate responses triggered a warning message from the computer.

Prior to the experiment, participants received a short practice session with similar items that did not appear in the experimental session.

    Results and Discussion

Outliers (correct responses falling 2.5 SD beyond the mean, or faster than 200 ms; less than 3% of the total correct responses) were removed from the analyses of response time. Mean response time and response accuracy are provided in Table 1. An inspection of those means confirmed that participants indeed classified the speech and non-speech stimuli as intended (M = 97%).
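A screening rule of this kind, dropping responses faster than 200 ms and then responses more than 2.5 SD from the mean, can be sketched as follows. This is an illustration with made-up response times; whether the authors trimmed against per-participant or per-cell means is not specified here, so the sketch uses a single pooled mean:

```python
from statistics import mean, stdev

def trim_rts(rts, floor_ms=200.0, criterion=2.5):
    """Drop RTs below floor_ms, then drop RTs lying more than
    `criterion` standard deviations from the mean of the rest."""
    kept = [rt for rt in rts if rt >= floor_ms]
    m, sd = mean(kept), stdev(kept)
    return [rt for rt in kept if abs(rt - m) <= criterion * sd]

# Twenty plausible RTs plus one anticipatory (150 ms) and one slow (3000 ms) outlier.
rts = [600] * 10 + [580, 620] * 5 + [3000, 150]
clean = trim_rts(rts)   # both outliers removed, 20 RTs remain
```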

Figure 1 | Spectrograms of mdif (in step 1). The top panel shows the non-speech stimulus; the bottom panel depicts the speech control.


The ANOVAs (6 vowel duration × 2 continuum) conducted on speech-like stimuli produced no significant effects in either response time (all p > 0.15) or accuracy (all F < 1). The linguistic structure of the stimuli also did not reliably affect response accuracy to non-speech inputs (all p > 0.15). In contrast, response time to non-speech stimuli was reliably modulated by their linguistic structure. The 6 vowel duration × 2 continuum ANOVA on non-speech stimuli yielded a significant effect of vowel duration [F(5, 45) = 4.12, MSE = 979, p


Method

Participants. Ten native English speakers, students at Northeastern University, took part in this experiment in partial fulfillment of a course requirement.

Materials and Procedure. The materials corresponded to the same three pairs of naturally produced disyllables used in Experiment 1. For each such disyllable, we gradually decreased the duration of the tonic vowel using a procedure identical to the one applied to the splicing of the pretonic vowel in Experiment 1. We first determined the portion of the tonic vowel slated for removal by identifying a segment of 70 ms (12-14 pitch periods), matched to the duration of the pretonic vowel (M = 68 ms, ranging from 12 to 15 pitch periods). This segment was measured from the center of the vowel outwards, and it included some coarticulatory cues. The entire tonic vowel (unspliced) was presented in step 6. We next proceeded to excise this segment in steady increments, such that steps 5-1 had 8, 6, 4, 2, and 0 pitch periods remaining out of the pitch periods slated for removal. Despite removing a chunk of the tonic vowel, the items in step 1 were clearly identified as disyllabic, and their remaining tonic vowel averaged 38 ms (7.16 pitch periods).

The resulting speech continua were next used to form non-speech and speech-control stimuli following the same method used in Experiment 1. The procedure was the same as in Experiment 1.

Results and Discussion

Outliers (responses faster than 2.5 SD from the means; less than 3% of all correct observations) were excluded from the analyses of response time. Mean response time for speech and non-speech stimuli as a function of the duration of the tonic vowel is presented in Figure 3 (the accuracy means are provided in Table 2).

    An inspection o means showed no evidence that responses to

    non-speech stimuli were monotonically linked to the duration othe tonic vowel. A 2 speech status 2 continuum type 6 vowelANOVA yielded a reliable eect o continuum type [response

    accuracy: F(1, 9) = 5.23, MSE = 0.007, p< 0.05; response time:F(1, 8) = 10.69, MSE = 1413, p< 0.02], indicating that md-typestimuli were identied more slowly and less accurately than theirml-type counterparts. Because this eect did not depend on vowelduration, the diculty with md-type stimuli is most likely due tothe acoustic properties o those stimuli, rather than their phono-

    logical structure. The only other eect to approach signicancewas that o speech status (speech vs. non-speech) on responsetime [F(1, 8) = 4.97, MSE = 4757,p< 0.06]. No other eects weresignicant (p> 0.17).

Unlike Experiment 1, where speech stimuli were identified more readily than non-speech, in the present experiment speech stimuli produced slower responses than their non-speech counterparts. This finding is consistent with the possibility that well-formed non-speech stimuli tend to be identified as speech, and consequently, they are harder to discriminate from non-speech-like inputs. Indeed, the discrimination (d′) of speech from non-speech was lower in Experiment 2 (d′ = 2.69) relative to Experiment 1 (d′ = 3.82).

Crucially, however, unlike Experiment 1, responses to non-speech stimuli in the present experiment were not modulated by vowel duration. A separate 2 continuum type × 6 vowel duration

EXPERIMENT 2

The difficulties in responding to well-formed non-speech stimuli could indicate that the classification of non-speech is modulated by phonological well-formedness. Such difficulties, however, could also result from non-linguistic causes. One concern is that the longer responses to the well-formed disyllables are an artifact of their longer acoustic duration. This explanation, however, is countered by the finding that the very same duration manipulation had no measurable effect on the identification of speech stimuli, so it is clear that response time does not simply mirror the acoustic duration of the stimuli.

Our vowel-duration manipulation, however, could have nonetheless affected other attributes of these stimuli that are unrelated to well-formedness. One possibility is that the acoustic cues associated with vowels are more readily identified as speech, so stimuli with longer vowels are inherently more speech-like than short-vowel stimuli. Another explanation attributes the speechiness of the disyllabic endpoints to splicing artifacts. Recall that, unlike the other five steps, the sixth endpoint was produced naturally, unspliced. Its greater resemblance to speech could thus result from the absence of splicing.

Experiment 2 addresses these possibilities by dissociating these two acoustic attributes from linguistic well-formedness. To this end, Experiment 2 employs the same vowel manipulation used in Experiment 1, except that the excised vowel was now the tonic (e.g., /I/ in mədI), rather than the pretonic vowel (i.e., the schwa). We applied this manipulation to the same naturally produced disyllables used in Experiment 1, and we gradually decreased the vowel duration along the same six continuum steps employed in Experiment 1, such that the difference between steps 1 and 6 in the two experiments was closely matched. This manipulation thus replicates the two acoustic characteristics of the pretonic-vowel continua: it gradually decreases the acoustic energy of a vowel, and it contrasts spliced vowels (in steps 1-5) with unspliced ones (in step 6). Unlike Experiment 1, however, this manipulation did not fully eliminate the vowel but only reduced its length, such that both the short- and long-vowel endpoints were clearly identified as disyllables. Since the vowel endpoints do not contrast phonologically in English, the increase in the duration of the tonic vowel (in Experiment 2) does not alter the phonological structure of these stimuli.

If the difficulty responding to non-speech stimuli with longer pretonic vowels (in Experiment 1) is due to the acoustic properties of vowels, then non-speech stimuli with longer tonic vowels (in Experiment 2) should likewise be difficult to classify. Similarly, if the advantage of non-speech stimuli in steps 1-5 (relative to the unspliced sixth step) results from their splicing, then these spliced steps should show a similar advantage in the present experiment. In contrast, if the speechiness of disyllables is due to their phonological well-formedness, then responses to non-speech stimuli in Experiment 2 should be unaffected by vowel duration. Such well-formedness effects, moreover, should also be evident in the overall pattern of responses to non-speech stimuli. Because the non-speech stimuli used in this experiment are all well-formed disyllables, we expect their structure to inhibit the non-speech response. Consequently, participants in Experiment 2 should experience greater difficulty in distinguishing non-speech stimuli from their speech-like counterparts.

Berent et al.: Phonological constraints on speech identification

Frontiers in Psychology | Language Sciences    September 2011 | Volume 2 | Article 182 | 6


To further bolster this conclusion, we next compared the responses to non-speech across the two experiments using a 2 Experiment × 2 continuum type × 6 vowel duration ANOVA (see Figure 4)¹. The analysis of response time yielded a significant effect of continuum type [F(1, 17) = 6.08, MSE = 977, p < 0.03] and a reliable three-way interaction [F(5, 85) = 2.42, MSE = 1142, p < 0.05]. No other effects were significant (all p > 0.19). To interpret this interaction, we next examined responses to the two continuum types (ml vs. md) separately, using 2 Experiment × 6 vowel duration ANOVAs. The analysis of the ml-continuum yielded no reliable effects (all p > 0.14). In contrast, the md-continuum yielded a marginally significant interaction [F(5, 85) = 2.23, MSE = 1392, p < 0.06]. No other effect was significant (all p > 0.32). We further interpreted the effect of vowel duration by testing for the simple main effect of vowel duration for the tonic vs. pretonic vowels separately (in Experiment 2 vs. 1). Vowel duration was significant only in the pretonic vowel condition [F(5, 45) = 5.47, MSE = 746, p < 0.0006], but not in the tonic vowel condition [F(5, 40) < 1, MSE = 2118].

The attenuation of the tonic-pretonic contrast for the ml-continuum is likely due to phonetic factors. As noted earlier, the md-items exhibit a phonetic bifurcation due to the silence associated with the stop, and for this reason, disyllabicity might be more salient for md-items. The absence of a vowel effect in the ml-continuum indicates that merely increasing the duration of the vowel, whether it is tonic or pretonic, is insufficient to impair the identification of non-speech stimuli. Results with the md-continuum, however, clearly show that vowel duration had

of the non-speech stimuli confirmed that responses to non-speech inputs were unaffected by vowel duration (F < 1, in response time and accuracy); the interaction also did not approach significance (F < 1, in response time and accuracy).

Given that the tonic vowel manipulation (in Experiment 2) closely matched the pretonic vowel manipulation (in Experiment 1), the confinement of the vowel effect to non-speech stimuli in Experiment 1 suggests that this effect specifically concerns the well-formedness of non-speech inputs, rather than vowel duration per se.
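The simple-main-effect tests reported above can be reproduced, in schematic form, with a one-way repeated-measures ANOVA computed directly from a subjects-by-conditions matrix (a Python/numpy sketch; the response times below are hypothetical, generated only to exercise the function, not the paper's data):

```python
import numpy as np

def rm_anova_F(data):
    """One-way repeated-measures F for a subjects x conditions matrix,
    using the subject-by-condition interaction as the error term."""
    s, c = data.shape
    grand = data.mean()
    ss_cond = s * ((data.mean(axis=0) - grand) ** 2).sum()
    ss_subj = c * ((data.mean(axis=1) - grand) ** 2).sum()
    ss_err = ((data - grand) ** 2).sum() - ss_cond - ss_subj
    df_cond, df_err = c - 1, (s - 1) * (c - 1)
    return (ss_cond / df_cond) / (ss_err / df_err), df_cond, df_err

# Hypothetical RTs (ms): 10 subjects x 6 vowel-duration steps
rng = np.random.default_rng(1)
rts = 620 + rng.normal(0.0, 15.0, size=(10, 6))
F, df1, df2 = rm_anova_F(rts)
```

With 10 subjects and 6 levels this yields degrees of freedom of (5, 45), the same df as the pretonic simple-main-effect test reported above.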

Figure 3 | Response time to speech and non-speech stimuli as a function of vowel duration (in Experiment 2). Error bars reflect confidence intervals, constructed for the difference among means along each of the vowel duration continua (i.e., speech and non-speech).

Table 2 | Mean response accuracy and response time to speech and non-speech stimuli in Experiment 2.

                              Speech            Non-speech
Vowel duration                ml      md        ml      md
Response accuracy       1     0.90    0.92      0.98    0.84
                        2     0.91    0.90      0.91    0.84
                        3     0.94    0.90      0.91    0.88
                        4     0.96    0.92      0.90    0.90
                        5     0.96    0.94      0.93    0.87
                        6     0.91    0.90      0.92    0.87
Response time (ms)      1     631     628       610     624
                        2     630     649       603     624
                        3     603     657       621     633
                        4     633     651       628     607
                        5     630     641       602     622
                        6     633     676       612     627
Mean (accuracy)               0.93    0.91      0.93    0.87
Mean (RT)                     627     651       613     623

¹Similar analyses conducted on the speech stimuli yielded only a significant effect of vowel duration [F(5, 90) = 2.48, MSE = 1361, p < 0.04] and a marginally significant effect of experiment [F(1, 18) = 3.81, MSE = 28538, p < 0.07]; no other effects were significant (p > 0.13). A comparison of the speech and non-speech stimuli across the two experiments (2 Experiment × 2 speech status × 2 continuum type × 6 vowel duration) yielded a reliable four-way interaction [F(5, 85) = 2.84, MSE = 1089, p < 0.03].


Experiments 3-4 further test this hypothesis by seeking converging evidence from an unrelated phenomenon in another language, Hebrew. To demonstrate that the effect of well-formedness on non-speech is not specific to conditions that require its comparison to edited speech stimuli (resynthesized or spliced), we compared non-speech stimuli with naturally produced speech. As in the case of English, we compared the ease of speech/non-speech discrimination for stimuli that were either phonologically well-formed or ill-formed. In the case of Hebrew, well-formedness is defined by the location of identical consonants in the stem: either initially (e.g., titug), where identical consonants are ill-formed in Semitic languages, or finally (e.g., gitut), where they are well-formed. Accordingly, the phonetic characteristics of

distinct effects on tonic and pretonic stimuli. While increasing the duration of the pretonic vowel impaired the identification of non-speech, the same increase in vowel length had no measurable effect when it concerned the tonic vowel, a phonetic contrast that does not affect well-formedness. The finding that identical vowel manipulations affected the identification of non-speech stimuli in a selective manner, only when they concerned the pretonic vowel and only with the md-continuum, confirms that this effect is inexplicable by vowel duration per se. Merely increasing the duration of a vowel is insufficient to impair the classification of non-speech stimuli as such. Together, the findings of Experiments 1-2 suggest that well-formed structures are perceived as speech-like.

Figure 4 | The effect of the vowel manipulation (tonic vs. pretonic) on the identification of non-speech stimuli derived from the ml- vs. md-continua. Error bars reflect confidence intervals constructed for the difference between the means.


identical consonants (e.g., titug, gitut), and the ABB and ABC forms were further matched for the co-occurrence of their consonants in Hebrew roots. The speech stimuli were recorded naturally by a native Hebrew speaker; these materials were previously used in Berent et al. (2007b; Experiment 6), and they are described there in detail (see Berent et al., 2007b, Appendix A, for the list of stimuli).

As noted there, the three types of stimuli did not differ reliably in their acoustic durations [F < 1; for AAB items: M = 1191 ms (SD = 108 ms); for ABB items: M = 1195 ms (SD = 103 ms); for ABC items: M = 1171 ms (SD = 101 ms)].

We next generated non-speech stimuli by adding together three synthetic sound components derived from the original stimulus waveforms. The first, low-frequency component was produced by lowpass filtering the stimulus waveforms at 400 Hz (slope of 85 dB per octave) to isolate the first formant, and deriving a spectral contour of the first formant frequency values from spectrograms of the filtered speech stimuli (256 point DFT, 0.5 ms time increment, Hanning window) using a peak-picking algorithm, which also extracted the corresponding amplitude values to produce an amplitude contour. Next, this low-frequency spectral contour was shifted up in frequency by multiplying it by 1.47, and then resynthesized into a sound component using a voltage-controlled oscillator modulated by the amplitude contour. The second, intermediate-frequency sound component was produced by bandpass filtering the original stimulus waveforms between 2000 and 4000 Hz (slope of 85 dB per octave), and deriving a single spectral contour of the frequency values in this intermediate range from spectrograms of the filtered speech stimuli (256 point DFT, 0.5 ms time increment, Hanning window) using a peak-picking algorithm, which also extracted the corresponding amplitude values to produce an amplitude contour. Next, this intermediate spectral contour was shifted down in frequency by multiplying it by 0.79, and then resynthesized into a sound component using a voltage-controlled oscillator modulated by the amplitude contour. The third, high-frequency sound component was produced by bandpass filtering the original stimulus waveforms between 4000 and 6000 Hz (slope of 85 dB per octave), and deriving a single spectral contour of the frequency values in this high range from spectrograms of the filtered speech stimuli (256 point DFT, 0.5 ms time increment, Hanning window) using a peak-picking algorithm, which also extracted the corresponding amplitude values to produce an amplitude contour. These three components were then summed together with relative amplitude ratios of 1.0:0.05:2.0 (low-frequency component : intermediate-frequency component : high-frequency component) to produce the non-speech version of each stimulus. The structure of these non-speech stimuli and their natural speech counterparts is illustrated in Figure 5 (a sample of the materials is available at http://www.psych.neu.edu/faculty/i.berent/publications.htm). The experimental procedure was the same as in the previous experiments.
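The three-component resynthesis can be sketched in outline as follows (Python/numpy; a schematic illustration, not the authors' implementation: the 85 dB-per-octave filters are omitted, the peak-picker is a simple per-frame argmax, and the 50 Hz lower edge of the first band, the hop size, and the sampling rate are assumptions):

```python
import numpy as np

def spectral_contour(x, sr, band, n_fft=256, hop=8):
    """Track the peak frequency and its amplitude within `band` (Hz),
    frame by frame, over a Hanning-windowed DFT spectrogram."""
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    sel = (freqs >= band[0]) & (freqs <= band[1])
    win = np.hanning(n_fft)
    f_track, a_track = [], []
    for start in range(0, len(x) - n_fft, hop):
        spec = np.abs(np.fft.rfft(win * x[start:start + n_fft]))
        k = int(np.argmax(spec[sel]))  # crude stand-in for peak-picking
        f_track.append(freqs[sel][k])
        a_track.append(spec[sel][k])
    return np.array(f_track), np.array(a_track)

def oscillator(f_track, a_track, sr, hop=8):
    """Resynthesize a sinusoid whose instantaneous frequency and
    amplitude follow the extracted contours."""
    f = np.repeat(f_track, hop)
    a = np.repeat(a_track, hop)
    phase = 2 * np.pi * np.cumsum(f) / sr
    return a * np.sin(phase)

def nonspeech(x, sr):
    """Sum three contour-following components: low band shifted up by
    1.47, mid band shifted down by 0.79, high band unshifted, mixed at
    the 1.0 : 0.05 : 2.0 amplitude ratios given in the text."""
    plan = [((50, 400), 1.47, 1.0),
            ((2000, 4000), 0.79, 0.05),
            ((4000, 6000), 1.0, 2.0)]
    comps = []
    for band, shift, gain in plan:
        f, a = spectral_contour(x, sr, band)
        comps.append(gain * oscillator(f * shift, a, sr))
    n = min(len(c) for c in comps)
    return sum(c[:n] for c in comps)

# Demo input: a 0.2 s, 300 Hz tone standing in for a first formant
sr = 16000
t = np.arange(int(0.2 * sr)) / sr
tone = np.sin(2 * np.pi * 300 * t)
out = nonspeech(tone, sr)
```

The key design point is that only slowly varying frequency and amplitude contours survive the transformation, so the output preserves the temporal envelope of the utterance while discarding its fine spectral detail.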

Results and Discussion

Figure 6 plots mean response time for speech and non-speech stimuli as a function of stem structure (the corresponding accuracy means are provided in Table 3). An inspection of the means suggests that speech and non-speech stimuli were readily identified

well-formedness differ markedly from the ones considered for English. To the extent that stimuli corresponding to well-formed structures are consistently harder to classify as non-speech (across different phonetic manifestations and languages), such a convergence would strongly implicate phonological structure as the source of this phenomenon.

PART 2: IDENTITY RESTRICTIONS ON HEBREW STEMS EXTEND TO NON-SPEECH STIMULI

Like many Semitic languages, Hebrew restricts the location of identical consonants in the stem: AAB stems (e.g., titug), where identical consonants occur at the left edge, are ill-formed, whereas their ABB counterparts (e.g., with identical consonants at the right edge, gitut) are well-formed (Greenberg, 1950). A large body of literature shows that Hebrew speakers are highly sensitive to this restriction and freely generalize it to novel forms. Specifically, novel AAB forms are rated as less acceptable than ABB counterparts (Berent and Shimron, 1997; Berent et al., 2001a), and because novel AAB stems (e.g., titug) are ill-formed, people classify them as non-words more rapidly than ABB/ABC controls (e.g., gitut, migus) in the lexical decision task (Berent et al., 2001b, 2002, 2007b) and ignore them more readily in Stroop-like conditions (Berent et al., 2005). Given that AAB Hebrew stems are clearly ill-formed, we can now examine whether their structure might affect the classification of non-speech stimuli. If phonologically ill-formed stimuli are, in fact, more readily identifiable as non-speech, then ill-formed AAB Hebrew stems should be classified as non-speech more easily than their well-formed (ABB and ABC) counterparts.
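Stated procedurally, the restriction is a simple structural predicate over the stem's three consonants; a minimal sketch (the function name is ours):

```python
def stem_structure(c1, c2, c3):
    """Classify a triconsonantal stem by where its identical consonants fall."""
    if c1 == c2:
        return "AAB"  # identity at the left edge: ill-formed in Hebrew
    if c2 == c3:
        return "ABB"  # identity at the right edge: well-formed
    return "ABC"      # no identity: well-formed
```

For the examples above, the consonants of titug ('t', 't', 'g') come out as AAB, while those of gitut ('g', 't', 't') come out as ABB.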

To examine this prediction, Experiment 3 compares the classification of three types of novel stems. Members of all three stem types are unattested in Hebrew, but they differ in their well-formedness. One group of stimuli, with an AAB (e.g., titug) structure, is ill-formed, whereas the two controls, ABB (e.g., gitut) and ABC (e.g., migus), are well-formed. These items were recorded by a native Hebrew talker, and they were presented to participants in two formats: either unedited, as natural speech, or edited, such that they were identified as non-speech. Participants were asked to rapidly classify each stimulus as either speech or non-speech. If ill-formed stimuli are less speech-like, then non-speech stimuli with an AAB structure should elicit faster responses than their well-formed counterparts, ABB or ABC stimuli.

EXPERIMENT 3

Method

Participants. Twenty-four native Hebrew speakers, students at the University of Haifa, Israel, took part in the experiment for payment.

Materials. The materials corresponded to 30 triplets of speech stimuli along with 30 triplets of non-speech counterparts. All materials were non-words, generated by inserting novel consonantal roots (e.g., ttg) in the vocalic nominal template C1iC2uC3, the template of mishkal Pi'ul (e.g., ttg + C1iC2uC3 → titug). In each such triplet, one stem had identical consonants at its left edge (AAB), another had identical consonants at the right edge (ABB), and a third member (ABC) had no identical consonants (e.g., titug, gitut, migus). Within a triplet, AAB and ABB forms were matched for their
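The root-plus-template combination used to build these triplets can be sketched as a one-line interleaving (Python; the function name is ours):

```python
def apply_piul(root):
    """Insert a triconsonantal root into the nominal template C1iC2uC3
    (mishkal Pi'ul), e.g., ttg -> titug."""
    c1, c2, c3 = root
    return f"{c1}i{c2}u{c3}"

# One triplet of novel stems: AAB, ABB, and ABC roots
triplet = [apply_piul(r) for r in ("ttg", "gtt", "mgs")]
```

This yields the triplet titug, gitut, migus cited above: the template fixes the vowels, so the three stems differ only in the arrangement of their consonants.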


was significant (all p > 0.14). We next proceeded to interpret the interaction by testing for the simple main effects of stem structure and of speechiness, followed by planned orthogonal contrasts (Kirk, 1982).

Consider first the effect of stem structure. An analysis of non-speech stimuli yielded a reliable effect of stem structure in the analyses of response time [F1(2, 46) = 7.14, MSE = 2312, p < 0.003; F2(2, 58) = 7.36, MSE = 3257, p < 0.002]. Planned comparisons showed that responses to ill-formed AAB stems were faster than their well-formed counterparts, either ABB [t1(46) = 2.31, p < 0.03; t2(58) = 2.12, p < 0.04] or ABC [t1(46) = 3.74, p < 0.0006; t2(58) = 3.83, p < 0.0004] stems, which, in turn, did not differ [t1(46) = 1.43, p > 0.15, n.s.; t2(58) = 1.71, p > 0.09]. Similar analyses conducted on speech materials produced no reliable effect of well-formedness (all F < 1). Thus, the advantage of ill-formed AAB stimuli was specific to non-speech inputs.

as intended. Moreover, well-formedness selectively modulated responses to non-speech stimuli: ill-formed AAB structures were identified more readily than their well-formed controls, ABB and ABC, given non-speech stimuli, but no such effect emerged with speech inputs.

These conclusions are borne out by the outcomes of the 2 speech status (speech/non-speech) × 3 stem type (AAB/ABB/ABC) ANOVA. Since in this experiment the conditions of interest are each represented by 30 different items, these analyses were conducted using both participants and items as random variables. The ANOVAs yielded a reliable interaction [in response time: F1(2, 46) = 6.30, MSE = 1967, p < 0.004; F2(2, 58) = 7.78, MSE = 2452, p < 0.002; in response accuracy: both F < 1] as well as a marginally significant effect of speech status [in response accuracy: F1(1, 23) = 3.03, MSE = 0.036, p < 0.10; F2(1, 29) = 82.90, MSE = 0.001, p < 0.0001; in response time: F1 < 1; F2(1, 29) = 21.25, MSE = 3235, p < 0.00008]. No other effect

Figure 5 | Spectrograms of the novel Hebrew word titug. The top panel is a non-speech stimulus; the bottom one depicts natural speech.


phonological structure from the acoustic properties of ill-formed stimuli using a complementary approach. Here, we maintained the acoustic properties by using the same stimuli as Experiment 3, but we altered their phonological well-formedness by presenting these items to a group of English speakers. English does not systematically restrict the location of identical consonants in stems, and our past research suggested that, to the extent English speakers constrain the location of identical consonants, their preference is opposite to that of Hebrew speakers, showing a slight preference for AAB forms (Berent et al., 2002, footnote 7). Clearly, English speakers should not consider AAB items ill-formed. If the tendency of Hebrew speakers to classify AAB stems as non-speech-like is due to the acoustic properties of these items, then the results of English speakers should mirror those of the Hebrew participants. If, in contrast, the easier classification of AAB stimuli as non-speech is due to their phonological structure, then the findings from English speakers should diverge from those of the Hebrew participants.

Method

Participants. Twenty-four English speakers, students at Northeastern University, took part in this study in partial fulfillment of a course requirement.

The materials and procedure were identical to Experiment 3.

Results

Mean response time and response accuracy to speech and non-speech stimuli are provided in Figure 7 (the accuracy means are listed in Table 4). An inspection of the means suggests that, unlike Hebrew speakers, English participants' responses to non-speech

Tests of the simple main effect of speech status further indicated that non-speech stimuli promoted faster responses than speech stimuli given ill-formed AAB structures [F1(1, 23) = 5.56, MSE = 5296, p < 0.03; F2(1, 29) = 31.03, MSE = 2966, p < 0.0001], but not reliably so with their well-formed counterparts, either ABB [F1(1, 23) < 1; F2(1, 29) = 4.10, MSE = 3350, p < 0.06] or ABC (both F < 1) stems. These results demonstrate that ill-formedness facilitated the classification of non-speech stimuli.

EXPERIMENT 4

The persistent advantage of non-speech stimuli that are phonologically ill-formed, across different structural manifestations and languages, is clearly in line with our hypothesis that speechiness depends, inter alia, on phonological well-formedness. The fact that similar acoustic manipulations failed to produce the effect given well-formed stimuli (in Experiment 2) offers further evidence that the advantage concerns phonological structure, rather than acoustic attributes. Experiment 4 seeks to further dissociate

Figure 6 | Mean response time of Hebrew speakers to speech and non-speech inputs as a function of their phonological well-formedness in Hebrew. Error bars reflect confidence intervals constructed for the difference between the three types of stem structures, constructed separately for speech and non-speech stimuli.

Table 3 | Mean response accuracy (proportion correct) in Experiment 3.

Structure    Speech    Non-speech
AAB          0.99      0.93
ABB          0.99      0.93
ABC          0.99      0.94


non-speech AAB stems, stems that are ill-formed in their language, English participants in the present experiment were utterly insensitive to the structure of the same non-speech stimuli. And indeed, English does not systematically constrain the location of identical consonants in the stem. The selective sensitivity of Hebrew, but not English, speakers to the structure of non-speech stimuli demonstrates that this effect reflects linguistic knowledge, rather than the acoustic properties of those stimuli.

While stem structure did not affect the responses of English participants to non-speech inputs, it did modulate their responses to speech stimuli: speech stimuli with identical consonants, either AAB or ABB, were identified faster than ABC controls. Since English speakers did not differentiate AAB and ABB stems, this effect must be due to reduplication per se, rather than to its location. Indeed, many phonological systems produce identical consonants by a productive grammatical operation of reduplication (McCarthy, 1986; Yip, 1988; Suzuki, 1998). It is thus conceivable that English speakers encode AAB and ABB stems as phonologically structured, and consequently, they consider such stems better formed than no-identity controls. The sensitivity of English speak-

stimuli were utterly unaffected by stem structure. Stem structure, however, did modulate responses to speech stimuli, such that stems with identical consonants produced faster responses than no-identity controls.

A 2 speech status (speech/non-speech) × 3 stem type (AAB/ABB/ABC) ANOVA indeed yielded a marginally significant interaction in the analyses of response time [F1(2, 46) = 3.14, MSE = 772, p


unfamiliar, as these particular items (mdif-type monosyllables), while structurally well-formed, happened to be unattested in the participants' language (Russian; Berent et al., 2010). Likewise, the restriction on identical Hebrew consonants generalizes across the board, to novel segments and phonemes (e.g., Berent et al., 2002), and computational simulations have shown that such generalizations fall beyond the scope of several non-algebraic mechanisms (Marcus, 2001; Berent et al., in press). The algebraic properties of phonological generalizations, on the one hand, and their demonstrable dissociation from acoustic familiarity, on the other, suggest that the knowledge available to participants specifically concerns grammatical well-formedness². Whether such algebraic knowledge modulates the identification of non-speech stimuli, specifically, remains to be seen. But regardless of what linguistic knowledge is consulted in this case, it is clear that some structural attributes of the linguistic message inform the classification of auditory stimuli as speech.

Why are well-formed non-speech stimuli harder to classify? Earlier, we proposed two possible loci for the effect of phonological well-formedness. One possibility is that well-formedness affects the allocation of attention to acoustic stimuli. On this account, well-formed stimuli engage attentional resources that are necessary for the speech-discrimination task, and consequently, the classification of non-speech stimuli suffers compared to ill-formed structures. On a stronger, interactive account, well-formedness informs the evaluation of acoustic inputs by the language system itself. In this view, the output of the phonological grammar feeds back into the evaluation of the input, such that better-formed inputs are interpreted as more speech-like. While the weak attention view and the strong interactive account both predict difficulties with well-formed non-speech stimuli, they differ with respect to their predictions for speech inputs. The strong interactive account predicts that well-formed speech stimuli should be easier to recognize as speech, whereas the attention-grabbing explanation predicts that well-formed speech stimuli should likewise engage attentional resources; hence, they should be harder to classify than ill-formed counterparts.

While our results are not entirely conclusive on this question, two observations favor the stronger interactive perspective. First, well-formedness impaired the identification of non-speech stimuli in Experiment 1, but it had no such effect on speech stimuli. The relevant (speech status × vowel duration) interaction, however, was not significant, so the interpretation of this finding requires some caution. Stronger differential effects of well-formedness obtained in Experiment 3. Here, well-formedness selectively impaired the classification of non-speech stimuli. Moreover, well-formedness produced the opposite effect on speech inputs (in Experiment 4): well-formed inputs with reduplication were identified more readily as speech. Although these conclusions are limited inasmuch as the contrasting findings for speech and non-speech stimuli come from different experiments (Experiments 3 vs. 4), these

ers to consonant-reduplication is remarkable for two reasons. First, reduplication is not systematically used in English, so the sensitivity of English speakers to reduplication might reflect the encoding of a well-formedness constraint that is not directly evident in their own language (for other evidence consistent with this possibility, see Berent et al., 2007a). Second, the finding that reduplicated (AAB and ABB) stems are more readily recognized as speech suggests that well-formedness affects not only non-speech stimuli but also the processing of speech inputs.

GENERAL DISCUSSION

Much research suggests that people can extract linguistic messages from auditory carriers that they classify as non-linguistic (e.g., Remez et al., 1981, 2001). Here, we examine whether structural aspects of linguistic messages can inform the classification of these messengers as linguistic. Four experiments gauged the effect of phonological well-formedness on the discrimination of non-speech stimuli from various speech controls. In Experiment 1, we showed that English speakers experience difficulty in the classification of non-speech stimuli generated from syllables that are phonologically well-formed (e.g., melif) compared to ill-formed counterparts (e.g., mlif). Experiment 3 replicated this effect using a second manifestation of well-formedness in a different language: the restrictions on identical consonants in Hebrew stems. Once again, participants (Hebrew speakers) experienced difficulties responding to non-speech stimuli that are well-formed in their language (e.g., gitut) compared to ill-formed controls (e.g., titug). The converging difficulties across diverse manifestations of well-formedness suggest that these effects are likely due to phonological structure, rather than the acoustic properties of these stimuli.

Experiments 2 and 4 further support this conclusion by showing that acoustic manipulations similar to the ones in Experiments 1 and 3, respectively, fail to produce such difficulties once well-formedness is held constant. Specifically, Experiment 2 showed that merely increasing the duration of a vowel (a manipulation that mimics the mlif-melif contrast from Experiment 1) is insufficient to impair the classification of non-speech stimuli once long- and short-vowel items are both well-formed in participants' language (English). Similarly, Experiment 4 showed that non-speech items that are well-formed in Hebrew present no difficulties for English participants. The convergence across two different manipulations of well-formedness, on the one hand, and its divergence from the outcomes of similar (or even identical) acoustic manipulations, on the other, suggest that the observed difficulties with well-formed non-speech stimuli are due to productive linguistic knowledge. As such, these results suggest that structural properties of linguistic messages inform the classification of acoustic messengers as linguistic.

While our present results do not directly speak to the nature of the knowledge consulted by participants, previous findings suggest that it is inexplicable by familiarity, either statistical knowledge or familiarity with the coarse acoustic properties of the language (e.g., familiarity accounts such as Rumelhart and McClelland, 1986; Goldinger and Azuma, 2003; Iverson et al., 2003; Iverson and Patel, 2008; Yoshida et al., 2010). Recall, for example, that well-formed nasal-initial sequences exhibited a stronger speechiness effect even when the stimuli were

²Note that our conclusions concern cognitive architecture, not its neural instantiation. The account outlined here is perfectly consistent with the possibility that phonological knowledge reshapes auditory brain areas, including low-level substrates. At the functional level, however, such changes support generalizations that are discrete and algebraic.


Berent, I., Steriade, D., Lennertz, T., and Vaknin, V. (2007a). What we know about what we have never heard: evidence from perceptual illusions. Cognition 104, 591-630.

Berent, I., Vaknin, V., and Marcus, G. (2007b). Roots, stems, and the universality of lexical representations: evidence from Hebrew. Cognition 104, 254-286.

Berent, I., Wilson, C., Marcus, G., and Bemis, D. (in press). On the role of variables in phonology: remarks on Hayes and Wilson. Linguist. Inq. 43.

Brentari, D., Coppola, M., Mazzoni, L., and Goldin-Meadow, S. (2011). When does a system become pho-

Liberman, A. M., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). Perception of the speech code. Psychol. Rev. 74, 431-461.

Liberman, A. M., and Mattingly, I. G. (1989). A specialization for speech perception. Science 243, 489-494.

Liebenthal, E., Binder, J. R., Piorkowski, R. L., and Remez, R. E. (2003). Short-term reorganization of auditory analysis induced by phonetic experience. J. Cogn. Neurosci. 15, 549-558.

Lukatela, G., Eaton, T., Sabadini, L., and Turvey, M. T. (2004). Vowel duration affects visual word identification: evidence that the mediating phonology is phonetically informed. J. Exp. Psychol. Hum. Percept. Perform. 30, 151-162.

Lukatela, G., Eaton, T., and Turvey, M. T. (2001). Does visual word identification involve a sub-phonemic level? Cognition 78, B41-B52.

Maddieson, I. (2006). "In search of universals," in Linguistic Universals, eds R. Mairal and J. Gil (Cambridge: Cambridge University Press), 80-100.

Marcus, G. (2001). The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, MA: MIT Press.

McCarthy, J. J. (1986). OCP effects: gemination and antigemination. Linguist. Inq. 17, 207-263.

Meyer, M., Zaehle, T., Gountouna, V.-E., Barron, A., Jancke, L., and Turk, A. (2005). Spectro-temporal processing during speech perception involves left posterior auditory cortex. Neuroreport 16, 1985-1989.

Molfese, D. L., and Molfese, V. J. (1980). Cortical responses of preterm infants to phonetic and nonphonetic speech stimuli. Dev. Psychol. 16, 574-581.

Pinker, S. (1994). The Language Instinct. New York: Morrow.

Pinker, S., and Jackendoff, R. (2005). The faculty of language: what's special about it? Cognition 95, 201-236.

Prince, A., and Smolensky, P. (1993/2004). Optimality Theory: Constraint Interaction in Generative Grammar. Malden, MA: Blackwell.

Prince, A., and Smolensky, P. (1997). Optimality: from neural networks to universal grammar. Science 275, 1604-1610.

Remez, R. E., Pardo, J. S., Piorkowski, R. L., and Rubin, P. E. (2001). On the bistability of sine wave analogues of speech. Psychol. Sci. 12, 24-29.

Remez, R. E., Rubin, P. E., Pisoni, D. B., and Carrell, T. D. (1981). Speech perception without traditional speech cues. Science 212, 947-949.

Rogalsky, C., Rong, F., Saberi, K., and Hickok, G. (2011). Functional anatomy of language and music perception: temporal and structural factors investigated using functional magnetic resonance imaging. J. Neurosci. 31, 3843-3852.

Rumelhart, D. E., and McClelland, J. L. (1986). "On learning the past tenses of English verbs: implicit rules or parallel distributed processing?" in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 2, eds D. E. Rumelhart, J. L. McClelland, and the PDP Research Group (Cambridge, MA: MIT Press), 216-271.

Sandler, W., Aronoff, M., Meir, I., and Padden, C. (2011). The gradual emergence of phonological form in a new language. Nat. Lang. Linguist. Theory 29, 505-543.

Sandler, W., and Lillo-Martin, D. C. (2006). Sign Language and Linguistic Universals. Cambridge: Cambridge University Press.

Shultz, S., and Vouloumanos, A. (2010). Three-month-olds prefer speech to other naturally occurring signals. Lang. Learn. Dev. 6, 241-257.

Suzuki, K. (1998). A Typological Investigation of Dissimilation. Tucson, AZ: University of Arizona.

Telkemeyer, S., Rossi, S., Koch, S. P., Nierhaus, T., Steinbrink, J., Poeppel, D., Obrig, H., and Wartenburger, I. (2009). Sensitivity of the newborn auditory cortex to the temporal structure of sounds. J. Neurosci. 29, 14726-14733.

Trout, J. (2003). Biological specialization for speech: what can the animals tell us? Curr. Dir. Psychol. Sci. 12, 155-159.

Van Orden, G. C., Pennington, B. F., and Stone, G. O. (1990). Word

    nological? Handshape production

    in gestures, signers and homesign-

    ers.Nat. Lang. Linguist. Theory. doi:

    10.1007/s11049-011-9145-1. [Epub

    ahead o print].

    Chomsky, N. (1980). Rules and

    Representations. New York: Columbia

    University Press.Dehaene-Lambertz, G., Pallier, C.,

    Semiclaes, W., Sprenger-Charolles,

    L., Jobert, A., and Dehaene, S. (2005).

    Neural correlates o switching rom

    auditory to speech perception.

    Neuroimage24, 2133.

    Fodor, J. (1983). The Modularity of Mind.

    Cambridge, MA: MIT Press.

    Goldinger, S. D., and Azuma, T. (2003).

    Puzzle-solving science: the quixotic

    quest or units in speech perception.

    J. Phon. 31, 305320.

    Greenberg, J. H. (1950). The patterning

    o morphemes in semitic. Word 6,

    162181.

    Iverson, J. R., and Patel, A. D. (2008).Perception o rhythmic grouping

    depends on auditory experience. J.

    Acoust. Soc. Am. 124, 22632271.

    Iverson, P., Kuhl, P. K., Akahane-Yamada,

    R., Diesch, E., Tohkura, Y., Kettermann,

    A., and Siebert, C. (2003). A perceptual

    intererence account o acquisition

    diculties or non-native phonemes.

    Cognition 87, B47B57.

    Kirk, R. (1982). Experimental Design:

    Procedures For the Behavioral Sciences.

    Pacic grove: Brooks/Cole.

    rEFErEncESAbrams, D. A., Bhatara, A., Ryali, S.,

    Balaban, E., Levitin, D. J., and Menon,

    V. (2010). Decoding temporal struc-

    ture in music and speech relies on

    shared brain resources but elicits di-

    erent ne-scale spatial patterns.Cereb.

    Cortex21, 15071518.

    Azadpour, M., and Balaban, E. (2008).

    Phonological representations are

    unconsciously used when processing

    complex, non-speech signals. PLoS

    ONE3, e1966. doi: 10.1371/journal.

    pone.0001966

    Berent, I., Balaban, E., Lennertz, T.,

    and Vaknin-Nusbaum, V. (2010).

    Phonological universals constrain

    the processing o nonspeech.J. Exp.

    Psychol. Gen. 139, 418435.

    Berent, I., Bibi, U., and Tzelgov, J. (2005).

    The autonomous computation o lin-

    guistic structure in reading: evidence

    rom the Stroop task.Mental Lex. 1,

    201230.

    Berent, I., Everett, D. L., and Shimron, J.(2001a). Do phonological represen-

    tations speciy variables? Evidence

    rom the obligatory contour principle.

    Cogn. Psychol. 42, 160.

    Berent, I., Shimron, J., and Vaknin, V.

    (2001b). Phonological constraints on

    reading: evidence rom the obligatory

    contour principle.J. Mem. Lang. 44,

    644665.

    Berent, I., and Lennertz, T. (2010).

    Universal constraints on the sound

    structure o language: phonological or

    acoustic?J. Exp. Psychol. Hum. Percept.

    Perform. 36, 212223.

    Berent, I., Lennertz, T., Smolensky, P.,

    and Vaknin-Nusbaum, V. (2009).Listeners knowledge o phonological

    universals: evidence rom nasal clus-

    ters. Phonology26, 75108.

    Berent, I., Marcus, G. F., Shimron, J.,

    and Gaos, A. I. (2002). The scope

    o linguistic generalizations: evi-

    dence rom Hebrew word ormation.

    Cognition 83, 113139.

    Berent, I., and Shimron, J. (1997). The

    representation o Hebrew words: evi-

    dence rom the obligatory contour

    principle. Cognition 64, 3972.

    these eects can originate rom structural descriptions computedby the language system itsel. Although these ndings leave openthe possibility that the language system is specialized with respectto structures that it can compute, they do suggest that the selectiono its inputs is not encapsulated.

    AcknoWLEdGMEnt

    This research was supported by NIDCD grant DC003277 to IrisBerent. We wish to thank Tracy Lennertz and Katherine Harder ortheir help with the preparation o the auditory stimuli, and MonicaBennett, or technical assistance.

    results are nonetheless more consistent with the proposal thatwell-ormedness acilitates the classication o acoustic stimuli asspeech. Accordingly, the status o a stimulus as linguistic appearsto depend not only on its inherent acoustic attributes, but also onthe structure o the outputs it aords.

    The nding that better-ormed linguistic structures are morereadily classied as speech suggests that the outputs o the language

    system (i.e., structural description) can penetrate the processingo its inputs (i.e., the classication o acoustic stimuli as speech).Top-down eects on speech classication are not new (e.g., Remezet al., 1981). Our results, however, are the rst to demonstrate that

    Berent et al. Phonological constraints on speech identication

    Fonts n Psycholoy | Language Sciences September 2011 | Volume 2 | Article 182 | 14

    http://www.frontiersin.org/language_sciences/http://www.frontiersin.org/language_sciences/http://www.frontiersin.org/language_sciences/http://www.frontiersin.org/language_sciences/archivehttp://www.frontiersin.org/language_sciences/archivehttp://www.frontiersin.org/language_sciences/
  • 7/29/2019 Berent Balaban Phonological constraints on speech identification

    15/15

    tion. Front. Psychology2:182. doi: 10.3389/

    fpsyg.2011.00182

    This article was submitted to Frontiers in

    Language Sciences, a specialty of Frontiers

    in Psychology.

    Copyright 2011 Berent, Balaban and

    Vaknin-Nusbaum. This is an open-access

    article subject to a non-exclusive license

    between the authors and Frontiers Media

    SA, which permits use, distribution andreproduction in other forums, provided the

    original authors and source are credited

    and other Frontiers conditions are com-

    plied with.

    Conflict of Interest Statement: The

    authors declare that the research was

    conducted in the absence o any com-

    mercial or inancial relationships that

    could be construed as a potential confict

    o interest.

    Received: 29 March 2011; paper pending

    published: 11 May 2011; accepted: 19 July

    2011; published online: 13 September 2011.Citation: Berent I, Balaban E and Vaknin-

    Nusbaum V (20 11) How lin gui st ic

    chickens help spot spoken-eggs: phono-

    logical constraints on speech identifica-

    Vouloumanos, A., and Werker, J. F. (2007).

    Listening to language at birth: evi-

    dence or a bias or speech in neonates.

    Dev. Sci. 10, 159164.

    Yip, M. (1988). The obligatory contour

    principle and phonological rules:

    a loss o identity. Linguist. Inq. 19,

    65100.

    Yoshida, K. A., Iversen, J. R., Patel, A. D.,

    Mazuka, R., Nito, H., Gervain, J., andWerker, J. F. (2010). The develop-

    ment o perceptual grouping biases in

    inancy: a Japanese-English cross-lin-

    guistic study. Cognition 115, 356361.

    identiication in reading and the

    promise o subsymbolic psycholin-

    guistics. Psychol. Rev. 97, 488522.

    Vouloumanos, A., Hauser, M. D., Werker,

    J. F., and Martin, A. (2010). The tun-

    ing o human neonates preerence or

    speech. Child Dev. 81, 517527.

    Vouloumanos, A., Kiehl, K. A., Werker, J.

    F., and Liddle, P. F. (2001). Detection

    o sounds in the auditory stream:event-related MRI evidence or di-

    erential activation to speech and

    nonspeech. J. Cogn. Neurosci. 13,

    9941005.

    Berent et al. Phonological constraints on speech identication

    S b | V l | A i l |

    http://www.frontiersin.org/http://www.frontiersin.org/language_sciences/archivehttp://www.frontiersin.org/language_sciences/archivehttp://www.frontiersin.org/