Individual differences in phoneme categorization

Effie Kapnoula, Bob McMurray, Eunjong Kong, Matthew Winn, & Jan Edwards

19th Mid-Continental Phonetics & Phonology Conference

The problem of lack of invariance

• There is no one-to-one relation between a sound (i.e. formant frequencies) and the perceived phoneme

Hillenbrand, Getty, Clark & Wheeler, 1995

• One solution: categorical perception
+ Simple solution
+ Fast commitment

[Figure: bar/par responses across continuum steps 1-7]

Two alternative forced choice (2AFC)

Werker & Tees, 1987; Joanisse et al., 2000; López-Zamora et al., 2010

Presenter

Presentation Notes

In these studies, “gradient perception” is usually viewed as a bad thing and is also referred to as “weak categorization”. This is mostly because gradiency is usually quantified by the steepness of the slope in a 2AFC task. But does a shallow slope really measure gradiency, and how can it be told apart from noise? We may need a different way of measuring gradiency.

• Alternative: gradient perception
+ Flexibility
+ Late commitment
+ Keep useful within-category information

[Figure: two panels of bar/par responses across continuum steps 1-7, one for categorical and one for gradient perception]

Gradiency in speech perception

• Evidence for gradiency from eye-movements

[Figure: looks to the competitor (fixations, 0.02-0.08) as a function of VOT (0-40 ms), shown on either side of the category boundary. McMurray, Tanenhaus & Aslin (2002)]

Presenter

Presentation Notes

However, strict categorical perception (CP) does not hold. McMurray et al. (2002) found that, when the VOT was higher, listeners still looked to the picture of the pear even though they clicked on the bear. This shows that listeners are sensitive to within-category differences in acoustic cues and are therefore capable of gradiency in phoneme categorization.

Two alternative forced choice (2AFC)

• Is gradiency good or bad for speech perception?

• Measuring gradiency: Visual analog scaling (VAS) task

[Figure: VAS response line with the endpoints "bull" and "pull"]

Presenter

Presentation Notes

Kong and Edwards developed a method that gives participants the opportunity to display their gradient perception of phonemes: participants drag a box along a line to report how close a stimulus is to either endpoint. In a way, the VAS task is everything the 2AFC task is (since it has the two endpoints), but it also allows gradient responses. Participants who want to use only the endpoints can do so, and those who want to use the space in between can do that too.

Kong, E. J., & Edwards, J. (2011)

Presenter

Presentation Notes

Interestingly, some participants showed more step-like functions than others. In order to quantify this effect Kong & Edwards used the response histogram (3rd column).
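The response-histogram idea in the note above can be sketched as follows. This is only an illustration: the bin width, the 0-100 response scale, and the "endpoint proportion" summary are assumptions, and the slides do not state how Kong & Edwards (2011) actually computed their measure. The two listeners are made-up data.

```python
from collections import Counter


def response_histogram(responses, bin_width=10, scale_max=100):
    """Count VAS responses (0..scale_max) in equal-width bins."""
    n_bins = scale_max // bin_width
    # A response exactly at scale_max is placed in the last bin.
    counts = Counter(min(int(r // bin_width), n_bins - 1) for r in responses)
    return [counts.get(i, 0) for i in range(n_bins)]


def endpoint_proportion(responses, bin_width=10, scale_max=100):
    """Share of responses in the two outermost bins (hypothetical summary)."""
    hist = response_histogram(responses, bin_width, scale_max)
    return (hist[0] + hist[-1]) / sum(hist)


# Made-up listeners: one uses only the endpoints, one uses the whole line.
categorical_listener = [2, 5, 3, 97, 95, 99, 4, 96]
gradient_listener = [5, 20, 35, 50, 65, 80, 95, 45]

print(endpoint_proportion(categorical_listener))
print(endpoint_proportion(gradient_listener))
```

A listener with a step-like function piles responses into the outer bins, so the summary approaches 1; a listener who uses the in-between space gets a lower value.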

Summary and aims

• Summary points:
Listeners are capable of gradient categorization of phonemes
The VAS task allows this gradiency to be expressed in participants’ responses

• Where does gradiency come from? Is it good or bad for speech perception?

Establish a way of quantifying gradiency via the VAS task

1. Investigate possible sources of gradiency (e.g. executive function)

2. Link gradiency to multiple cue use

3. Examine whether gradiency is good or bad for speech perception

Method

• Stimuli:

• Seven (7) VOT steps (primary cue) and five (5) F0 steps (secondary cue)

             labial      alveolar
Real words   bull-pull   den-ten
Nonwords     buv-puv     dev-tev
CVs          buh-puh     deh-teh

[Figure: stimulus grid of VOT steps (1-7) by F0 steps (1-5)]

• Tasks:
• Visual analog scaling (VAS) task
• Two alternative forced choice (2AFC)

• Additional tasks (non-speech cognitive processes):

Trail making task (cognitive flexibility)

N-Back task (working memory)

Flanker task (inhibition)

AZ-bio (sentences in babbling noise - 1:1 STN ratio)

• Participants: 130 undergraduates at the U of Iowa
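The stimulus set described in the Method slides can be enumerated with a short sketch. It assumes that the VOT and F0 steps are fully crossed within each of the six continua, as the stimulus grid suggests; the dictionary keys and field names are hypothetical, and the number of repetitions per stimulus is not given in the slides.

```python
from itertools import product

VOT_STEPS = range(1, 8)  # seven steps, primary cue
F0_STEPS = range(1, 6)   # five steps, secondary cue

# Continua as listed on the Method slide: (stimulus type, place) -> pair.
CONTINUA = {
    ("real word", "labial"): "bull-pull",
    ("real word", "alveolar"): "den-ten",
    ("nonword", "labial"): "buv-puv",
    ("nonword", "alveolar"): "dev-tev",
    ("CV", "labial"): "buh-puh",
    ("CV", "alveolar"): "deh-teh",
}


def build_design():
    """Return one record per (continuum, VOT step, F0 step) combination."""
    return [
        {"type": stim_type, "place": place, "continuum": name,
         "vot_step": vot, "f0_step": f0}
        for ((stim_type, place), name), vot, f0
        in product(CONTINUA.items(), VOT_STEPS, F0_STEPS)
    ]


design = build_design()
print(len(design))  # 6 continua x 7 VOT steps x 5 F0 steps
```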

Results

[Figure: response histograms (counts 0-40 over a 0-100 response scale) for four individual participants: Sub 8, Sub 9, Sub 7, Sub 68]

Results: Quantifying gradiency

Presenter

Presentation Notes

To accomplish this, we developed a new equation (see Eq. 1 in Appendix), which assumes a diagonal boundary in two-dimensional space that can be described as a line with some cross-over point (along the primary cue) and an angle, θ (Figure 7). Here, a θ of 90° would indicate that the listener only used the primary cue, while a θ of 45° would indicate equal use of both. Once this boundary is identified, we can conceptually rotate the coordinate space to be orthogonal to this boundary, allowing us to model the gradiency of the function with a single parameter that indicates the derivative of the function orthogonal to the boundary; the steeper the slope, the more categorical the response pattern.
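The idea in the note above can be illustrated with a minimal sketch. The exact equation is in the authors' appendix and is not reproduced in these slides, so the logistic form, the sign of the F0 term, the centering of F0 at step 3, and all parameter names below are assumptions. The function projects a stimulus onto the axis orthogonal to a boundary line with angle θ and passes that distance through a logistic with slope parameter s.

```python
import math


def rotated_logistic(vot, f0, x0, theta_deg, s, f0_center=3.0, lo=0.0, hi=100.0):
    """Hypothetical VAS response for a stimulus at (vot, f0).

    x0        : cross-over point along the primary cue (VOT step)
    theta_deg : boundary angle; 90 means only VOT is used, 45 means equal use
    s         : slope orthogonal to the boundary (larger means more categorical)
    """
    theta = math.radians(theta_deg)
    # Signed distance from the boundary, measured orthogonally to it.
    # The sign convention for the F0 term is an assumption.
    d = (vot - x0) * math.sin(theta) + (f0 - f0_center) * math.cos(theta)
    return lo + (hi - lo) / (1.0 + math.exp(-s * d))


# At theta = 90 the F0 step has (numerically) no effect on the response.
print(rotated_logistic(5, 1, x0=4, theta_deg=90, s=1.0))
print(rotated_logistic(5, 5, x0=4, theta_deg=90, s=1.0))

# A steeper s pushes the same stimulus closer to the endpoint.
print(rotated_logistic(5, 3, x0=4, theta_deg=90, s=3.0))
```

In this form the derivative at the boundary is proportional to s, so a single fitted value of s per listener can stand in for how categorical or gradient that listener's responses are.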

• Extracting gradiency from VAS data

[Figure: VOT steps (1-7) by F0 steps (1-5) stimulus space with a diagonal category boundary at angle θ and a slope s orthogonal to the boundary. Steep s slope: categorical. Shallow s slope: gradient.]

Results: Quantifying secondary cue use

• Extracting F0 use from 2AFC data

[Figure: 2AFC response (b to p, 0-1) by VOT step (1-7), plotted separately for F0 = 90 Hz and F0 = 125 Hz]

Presenter

Presentation Notes

F0 use = difference between cross-over points for two F0 values
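The definition in the note above can be sketched in a few lines. This version finds each cross-over by linear interpolation between adjacent VOT steps, which is a simplification swapped in for illustration; the slides do not say how the cross-over points were estimated. It also assumes that the 2AFC curve is coded as the proportion of /p/ responses. The two curves are made-up data.

```python
def crossover(proportions, criterion=0.5):
    """Interpolated VOT step where the response curve first reaches the criterion.

    proportions[i] is the proportion of /p/ responses at VOT step i + 1.
    """
    for i in range(len(proportions) - 1):
        lo, hi = proportions[i], proportions[i + 1]
        if lo < criterion <= hi:
            return (i + 1) + (criterion - lo) / (hi - lo)
    raise ValueError("curve does not cross the criterion")


def f0_use(p_low_f0, p_high_f0):
    """Difference between the cross-over points for two F0 values."""
    return crossover(p_low_f0) - crossover(p_high_f0)


# Made-up 2AFC curves over seven VOT steps for the two F0 values.
low_f0 = [0.0, 0.0, 0.1, 0.3, 0.7, 1.0, 1.0]
high_f0 = [0.0, 0.1, 0.3, 0.7, 1.0, 1.0, 1.0]

print(f0_use(low_f0, high_f0))
```

A listener who ignores F0 would produce two overlapping curves and an F0 use near zero; the larger the shift between the curves, the larger the value.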

Results

Presenter

Presentation Notes

We first check whether responses differ across stimulus types.

Results: Stimulus and place effects

[Figure: VAS response (b to p, 0-100) by VOT step (1-7) for each stimulus type (legend: NP, NW, RW) and for each place of articulation (legend: alv, lab); 2AFC response (b to p, 0-1) by VOT step (1-7) for real words, nonwords, and CVs, each at F0 = 90 Hz and 125 Hz. Annotation on the final build of the slide: F<1]

Results: Place differences in F0 use

F(1,250) = 27.8, p < 0.001

[Figure: 2AFC response (b to p, 0-1) by VOT step (1-7) for labials and alveolars, each at F0 = 90 Hz and 125 Hz]

Results

1. Do individual differences in gradiency derive from differences in general cognitive function?

Presenter

Presentation Notes

We entered stimulus type in the first step, EF measures in the second, and stimulus type by EF interactions in the third step. Given the statistically significant correlations reported in the Preliminary analyses section, we also included collinearity diagnostics. All VIF values across all steps were <1.7. Crucially, entering the EF measures did not account for a statistically significant amount of variance in VAS slope, F(3,108)=1.75, p=ns, nor did the interaction terms, F(6,102)=1.32, p=ns.
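The two statistics mentioned in the note above follow standard formulas, sketched here for reference. The R² values in the example are hypothetical, since the slides report only the F values and degrees of freedom; the function names are also made up for this sketch.

```python
def f_change(r2_reduced, r2_full, df_added, df_residual):
    """F test for the variance added when predictors enter a regression step.

    r2_reduced  : R^2 before the step
    r2_full     : R^2 after the step
    df_added    : number of predictors entered in the step
    df_residual : residual degrees of freedom of the fuller model
    """
    return ((r2_full - r2_reduced) / df_added) / ((1.0 - r2_full) / df_residual)


def vif(r2_predictor):
    """Variance inflation factor for one predictor.

    r2_predictor is the R^2 from regressing that predictor on the others.
    """
    return 1.0 / (1.0 - r2_predictor)


# Hypothetical step: three EF measures raise R^2 from .10 to .14,
# tested against 108 residual degrees of freedom.
print(round(f_change(0.10, 0.14, df_added=3, df_residual=108), 2))

# A predictor sharing 40% of its variance with the others.
print(round(vif(0.40), 2))
```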

EF measures did not account for a statistically significant amount of variance in VAS slope, F(3,108)=1.75, p=.162, or F0 use, F<1

[Figure: three panels, each with an axis running from gradient to categorical]

Speech perception processes may play out at a different level of processing than higher cognitive processes, such as working memory

2. Are individual differences in gradiency linked to multiple cue use?

Positive relationship: Better encoding of fine-grained detail (more gradiency) enables access to multiple cues

Negative relationship: Listeners who use more cues have more accurate, sharper boundaries

Results

β=-0.305, t=-3.4, p < 0.01

[Figure: scatterplot with an axis running from gradient to categorical]

Presenter

Presentation Notes

Gradiency in phoneme categorization is reliably predicted by the extent to which listeners use one versus two cues: use of only one cue is related to a more categorical pattern of responses, whereas use of two cues is related to a more gradient response pattern.

Yes, more gradient listeners tend to rely more on the secondary cue (F0)

3. In what way are these differences important for speech perception?

Results

• Gradiency and perception of speech-in-noise

r = .164, p=.068

[Figure: scatterplot with an axis running from gradient to categorical]

Presenter

Presentation Notes

The degree to which a listener shows a gradient pattern of phoneme perception positively predicts how well that listener perceives speech in a noisy background, although the relationship is only marginal (r = .164, p = .068). However, this relationship seems to be mediated by working memory.
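One simple way to see how an association between two measures can shrink once a third variable is taken into account is the first-order partial correlation, sketched below. This is an illustration only, not the authors' analysis: the slides report a regression that includes working memory, and the correlation values used here are hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# Hypothetical values: x = gradiency, y = speech-in-noise, z = working memory.
r_xy, r_xz, r_yz = 0.25, 0.50, 0.50

# Here the x-y association is fully accounted for by z.
print(partial_correlation(r_xy, r_xz, r_yz))

# If z is unrelated to both measures, the correlation is unchanged.
print(partial_correlation(0.25, 0.0, 0.0))
```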

r = .243, p=.007

r = .164, p=.068

[Figure: scatterplot with an axis running from gradient to categorical]

[Diagram: Gradiency, Speech-in-noise, and Working Memory, with two candidate paths labeled 1) and 2)]

R² = 0.019

β=-0.14, t=-1.48, p = .143

[Figure: scatterplot with an axis running from categorical to gradient]

More gradient listeners tend to better perceive speech in noise

Summary and conclusions

1. Do individual differences in gradiency derive from differences in general cognitive function?
Probably not. Maybe speech perception operates on a different level than higher cognitive processes.

2. Are individual differences in gradiency linked to multiple cue use?
Yes, more gradient listeners tend to rely more on the secondary cue (F0). Better encoding of fine-grained detail (more gradiency) enables access to multiple cues. And/or more gradient listeners commit later to a category.

3. In what way are these differences important for speech perception?
More gradient listeners do a bit better (marginally) in perceiving speech in noise. Gradiency is not all that bad - maybe good for some things.

Take home messages

1. Gradiency indicates more accurate, true-to-the-signal perception.

Presenter

Presentation Notes

And this gradiency is related to multiple cue integration. We are not sure what the source of these differences is, but it looks like it is not related to higher cognitive functions.

2. Some listeners are more gradient than others in categorizing phonemes.

3. This gradiency may be a good thing.

Presenter

Presentation Notes

Gradiency has traditionally been considered a bad thing (“weak” categorization), but this may not be entirely true.

Thank you!
