gareth-walker.staff.shef.ac.uk/pubs/walker-lboro-2017.pdf
Visual representations of acoustic data in CA research: a survey and suggestions
Gareth Walker
Introduction
http://www.incredibleart.org/lessons/elem/elem2.html
Aims of this talk
• to encourage thought about how visual representations of acoustic data are prepared and used
• to give some suggestions on good practice in constructing visual representations of acoustic data
Visual representations of acoustic data in ROLSI
Cumulative proportion of ROLSI papers including phonetics terms that contain visual representations of acoustic data
[Figure: cumulative proportion (%) by year, 2001–2016; y-axis 0–30%]
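The figure plots a running (cumulative) proportion rather than a per-year one. A minimal sketch of that arithmetic, assuming the value for a given year is the number of papers up to and including that year that contain visual representations of acoustic data, divided by the number of papers up to that year that include phonetics terms. The function name and the per-year counts are hypothetical, not the survey's data.

```python
from itertools import accumulate


def cumulative_proportion(with_visuals, with_phonetics_terms):
    """Running percentage of phonetics-term papers that contain visual representations.

    Both arguments are per-year counts listed in the same year order (hypothetical inputs).
    """
    visuals_so_far = list(accumulate(with_visuals))
    phonetics_so_far = list(accumulate(with_phonetics_terms))
    # Guard against years before any phonetics-term paper has appeared (0 / 0).
    return [100 * v / p if p else 0.0 for v, p in zip(visuals_so_far, phonetics_so_far)]


# Hypothetical per-year counts for four consecutive years (NOT the survey's data).
visuals = [0, 1, 0, 2]
phonetics_papers = [4, 5, 6, 5]
print(cumulative_proportion(visuals, phonetics_papers))
```

A running proportion smooths out year-to-year fluctuation, but once many papers have accumulated it can only move slowly, so a flat stretch does not necessarily mean no new papers with visual representations appeared.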
Visual representations of acoustic data in ROLSI
from Clayman, S. E. & Raymond, C. W. (2015). Modular pivots: A resource for extending turns at talk. ROLSI, 48(4), 388-405.
In this case, the speaker raises her pitch toward the end of the first sentential completion, but then begins to drop to her previous level on the second syllable of “turkey.” This downward trajectory is sustained across the pivot’s onset boundary. As evident in the spectrogram, there is also a merging of the final consonant of “turkey” with the initial consonant of “yihknow” across
(10) [NB.IV.13.R, Page 3]
1     Emm: °°Ril cute °° But uh (0.7) .t.hhh They left early Lottie 'n
2          then we decideh we j'z we were goin ho:::me 'n then we
3  ->      deci:ded it wz so nice'n quiet dow-.hhhh HEY I B'N EAT'N A
4  ->      LO:TTA TURKEY YIHKNOW I DON'T HAVE ONE: BITTA ITCHI:NGk?
5          (1.2)
6     Emm: .t.hhhh YIHKNOW AH HEARD THET T(h)URKEY WZ GOO::D FOR YUH
7          with this thi:ng?
8          (0.3)
9     Lot: Is that ri::ght?
10    Emm: eeYah a girl'n the apartm'n tol'me tha:t. Thet the doctor
11         cured it? An' I'm tellin yuh yin- I've never had s'ch a
12         healing. I have no(h)o pro(h)oblems:.
[Figure (Clayman & Raymond, 2015, p. 399): spectrogram (0–5000 Hz) and pitch trace (75–500 Hz) of “A LO:TTA TURKEY YIHKNOW I DON’T HAVE ONE:”, time 0–1.875 s]
Visual representations of acoustic data in ROLSI
from Clayman & Raymond (2015)
‘As evident in the spectrogram, there is . . . a merging of the final consonant of “turkey” with the initial consonant of “yihknow” across this juncture, with no break in voicing and a single palatal place of articulation.’ (p. 400)
Visual representations of acoustic data in ROLSI
[Figure: spectrogram of the juncture “KEY YIH”, frequency 0–5 kHz, time 0–0.25 s]
i) provides support for the claim
ii) the reader can independently verify that claim
Pitch traces
GREETING: DISPLAYING STANCE THROUGH PROSODY 379
FIGURE 1 F0 trace of Paula’s greeting to Amanda (Excerpt 1, line 2).
FIGURE 2 F0 trace of Paula’s greeting to Derik (Excerpt 2, line 2). To facilitate comparison, the window sizes of Figures 1 and 2 are about the same.
different interactional function. Although these multiples are produced exactly twice and in immediate succession by the same speaker, these multiples contain a pitch change such that the second saying of the token is produced with higher pitch than the first token. Tokens in this category take the shape shown in Figure 2. In terms of its interactional placement, this type of double, that is, ja^ja., is always positioned in interactional environments in which the interactants’ intersubjectivity or common world view is fractured. The basic sequence unfolds as follows: A produces an utterance, and B responds to it. B’s response displays B’s misalignment with the previous turn. It is in response to this misalignment that speaker A produces a turn containing a turn-initial ja^ja. of the prosodic shape described previously. With the production of the ja^ja. turn, speaker A acknowledges speaker B’s utterance while simultaneously realigning the interactants. Put differently, with the ja^ja., its speaker treats the action/content of the previous speaker’s utterance as either unwarranted or self-evident and takes issue with it. All instances of ja^ja. turns or ja^ja.-fronted turns in the collection fall into the following types of misalignment⁵: (a) the prior speaker tells the jaja speaker something over which the jaja speaker has epistemic authority (i.e., B-event statements; Labov & Fanshel, 1977), (b) the prior speaker asks for clarification or comments on something that the jaja speaker already said or implied in the preceding turn(s), or (c) the prior speaker is responding to something that was not the main point of the jaja speaker (sequential misalignment). In all three categories, the “fault” for the misalignment is attributable to the prior speaker (i.e., the non-ja^ja. speaker). Moreover, the data convey the sense that the prior speaker should have known better (i.e., “I already know, and you should have known that I already know, what you just said”). In the
252 GOLATO AND FAGYAL
FIGURE 2 ja^ja. token with pitch peak in the second syllable.
All confirmation sequences were analyzed auditorily and subsequently in PRAAT 5.3.77.² The symbol ʔ is used in the transcripts to represent glottalization, while the = symbol indicates vowel linking. At the relevant word boundary, pitch accents are recorded as capitals where they occur, with an indication of the pitch movement. Throughout the transcript syllable lengthening and pausing are also represented (see the appendix for transcription symbols). None of these prosodic parameters accounts for the contrast described; as a result their analysis has been kept to a minimum. Similarly, finer phonetic detail has not been transcribed and is not referenced here, since the primary explanation for the contrast between glottalization and linking in the context of turn extension after initial confirmations is an action-based rather than a phonological one.
Transcript lines are translated into English in a separate line. The translations aim to strike a balance between an appropriate gloss and a sufficiently strong sense of the original lexical choices. We draw attention to the fact that translation of all the nuances of the original is not possible. Ashmore and Reed (2000) note that transcripts of natural data are twice removed from the original event through recording and subsequent notation. Translation adds another layer to this process, and neither the transcripts in the original language nor their translations should therefore be considered “data.”
[Figure: waveform and pitch trace (75–600 Hz) of “ja a- -ber”, time 0–0.5646 s]
Figure 2. Glottalized ja aber, Extract 7, line 473.
² http://www.fon.hum.uva.nl/praat/
RESEARCH ON LANGUAGE AND SOCIAL INTERACTION 133
176 BARTH-WEINGARTEN
[Figure: pitch trace (Hz; axis marked from 50 to 500) of “jA JA” and “s o”, time 0–0.3093 s]
FIGURE 3 Pitch trace of joke-aligning JAJA in excerpt 7, line 2315. (The end of the second JA is not properly taken up by PRAAT as it is overlapped.)
On the (narrow-sense) phonetic side, these JAJAs are regularly accompanied by smile voice—i.e., lip spreading, which results in a raised first formant, an audibly “broader” pronunciation of vowels, and possibly the perception of raised pitch, for instance (see Ford & Fox, 2010)—and often they shade off into, or are followed by, laughter (for the relevance of this cluster of features for joke-aligning JAJAs, see Barth-Weingarten, in press). Interactionally, these JAJAs are affiliating with the jocular mode and accomplish alignment in that they are neither sequence closing nor are they followed by any topicalization of a misalignment or the like, but instead their speakers seem to be content with a continuation of the sequence in the same mode as before.
Consider Excerpt 7. It is taken from another edition of the TV talk show “Die Woche” with audience present. The talk show host Gerd Müller-Gerbes (MG) invited, among others, the pop singer Howard Carpendale (HC) and the politician Heiner Geißler (HG). HC has just jokingly complimented HG on the way he presents himself in this show, upon which HG points out that Biermann, another famous German political singer and songwriter, had already said the same thing to him on some earlier occasion.
Excerpt 7 Biermann (Fasch_2304, Fasch_2309, Fasch_2314 (1:00:00-1:00:35)
2299 HG: der BIER]mann hat des AUCH [schon mal zu mir gesagt;=nIcht?=]
         {name: Biermann} said this to me too once you know
2301 HC: [((smiles till cut to HG)) ]
2302 MG: =WER hat das-
         =who has
         |_____________|((HG visible with smile))
2303     der BIERma[nn. ]
         {name: Biermann}
         _____________|((HG visible with smile))
Pitch traces
Callhome EN4093: 1207s
1 B: [ has ] Kim been here before
2 A: [and the-]
3    (0.2)
4 A: → no
Pitch traces
[Figure: pitch trace on a pitch (Hz) axis marked 0, 150, 300, 450; time 0–0.4 s]
linear; not scaled to range
Linear vs. non-linear scales
we perceive differences in frequency better at lower frequencies: the same difference in Hz sounds larger low down than high up
130.8 Hz vs. 138.6 Hz (7.8 Hz apart)
261.6 Hz vs. 277.2 Hz (15.6 Hz apart)
(each pair is one semitone apart)
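The frequency pairs on this slide (130.8/138.6 Hz and 261.6/277.2 Hz) make the point numerically. Interval size in semitones is conventionally computed as 12 · log2(f2 / f1); a minimal sketch, with a function name chosen for illustration:

```python
import math


def semitones(f_from_hz, f_to_hz):
    """Interval between two frequencies in semitones (12 semitones per octave)."""
    return 12 * math.log2(f_to_hz / f_from_hz)


pairs = [(130.8, 138.6), (261.6, 277.2)]
for low, high in pairs:
    # The Hz gap doubles from the lower pair to the higher pair; the semitone interval does not.
    print(f"{low} -> {high} Hz: {high - low:.1f} Hz apart, {semitones(low, high):.2f} ST")
```

The higher pair is 15.6 Hz apart and the lower pair only 7.8 Hz apart, but both come out at about 1.00 semitone.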
Linear vs. non-linear scales
[Figure: two panels with the same pitch (Hz) axis values, 100–300, plotted against time; one on a linear scale, one on a non-linear scale]
a non-linear scale makes changes at lower frequencies look bigger; higher frequencies are ‘squished’
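The ‘squishing’ can be quantified by asking how much of the axis height a 50 Hz step occupies. This sketch assumes a logarithmic axis as the non-linear scale (one common choice; the figure does not say which non-linear scale it uses) and the 100–300 Hz limits shown on the slide. The function names are illustrative.

```python
import math


def linear_position(f_hz, lo_hz, hi_hz):
    """Fraction of the axis height at which f_hz sits on a linear (Hz) axis."""
    return (f_hz - lo_hz) / (hi_hz - lo_hz)


def log_position(f_hz, lo_hz, hi_hz):
    """Fraction of the axis height at which f_hz sits on a logarithmic axis."""
    return math.log(f_hz / lo_hz) / math.log(hi_hz / lo_hz)


LO, HI = 100, 300  # axis limits as on the slide
for a, b in [(100, 150), (250, 300)]:
    lin_span = linear_position(b, LO, HI) - linear_position(a, LO, HI)
    log_span = log_position(b, LO, HI) - log_position(a, LO, HI)
    print(f"{a}-{b} Hz: {lin_span:.0%} of a linear axis, {log_span:.0%} of a log axis")
```

On the linear axis both steps take up 25% of the height; on the logarithmic axis 100–150 Hz takes up roughly 37% and 250–300 Hz roughly 17%.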
Linear vs. non-linear scales
[Figure: pitch trace on a pitch (Hz) axis marked 50, 150, 300, 450; time 0–0.4 s]
[Annotation on the trace: -5.1 ST (semitones)]
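A movement measured in semitones corresponds to a fixed frequency ratio, not a fixed number of Hz. The sketch below applies a -5.1 ST fall to two hypothetical starting pitches (the actual start and end values behind the annotation are not given), using the standard relation f2 = f1 · 2^(ST/12). The function name is illustrative.

```python
def shift_by_semitones(f_hz, st):
    """Frequency reached after moving st semitones from f_hz (negative st = fall)."""
    return f_hz * 2 ** (st / 12)


for start in (300, 150):  # hypothetical starting points, not taken from the figure
    end = shift_by_semitones(start, -5.1)
    print(f"{start} Hz -> {end:.1f} Hz (a drop of {start - end:.1f} Hz)")
```

The fall from 300 Hz is twice as large in Hz as the fall from 150 Hz, yet both are the same -5.1 ST movement.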
Scaling to a speaker’s range
• what is high (or low) for one speaker might not be high (or low) for another
• relative placement in a speaker’s range has interactional relevance (Couper-Kuhlen, 1996; Local, 2005)
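One way to make relative placement in a speaker’s range concrete is to express a pitch value as a percentage of the way up that range, measured in semitones. The function name and the two speakers’ range limits below are hypothetical illustrations, not values from the talk.

```python
import math


def placement_in_range(f_hz, range_lo_hz, range_hi_hz):
    """Percentage of the way up a speaker's range, measured in semitones."""
    span = math.log2(range_hi_hz / range_lo_hz)
    return 100 * math.log2(f_hz / range_lo_hz) / span


# The same 150 Hz value for two hypothetical speakers with different ranges.
print(placement_in_range(150, 75, 150))   # → 100.0 (top of speaker A's range)
print(placement_in_range(150, 150, 300))  # → 0.0 (bottom of speaker B's range)
```

Measuring the position in semitones rather than in Hz keeps it in line with the non-linear way pitch differences are perceived; a linear-Hz percentage would place mid-range values differently.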
Scaling to a speaker’s range
[Figure: pitch (Hz, axis 0–700) by speaker; legend: male, female, median, mean]
• what is high (or low) for A may not be high (low) for B
• some speakers seem to have huge ranges, others much smaller ones
• it looks like women have wider ranges than men
Scaling to a speaker’s range
[Figure: range (semitones re. baseline, axis 0–35) by speaker; legend: male, female, median, mean]
• semitones, relative to the speaker’s baseline
• men and women are mixed
• ranges look more similar
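The switch from Hz to semitones relative to a baseline can be illustrated with two hypothetical speakers whose F0 values differ by exactly an octave. This is a sketch, not the method behind the figure: it assumes the baseline is the speaker’s lowest F0 value, whereas a low percentile would be less sensitive to pitch-tracking errors. Function names and F0 values are illustrative.

```python
import math


def range_hz(f0_values):
    """Pitch range in Hz: highest minus lowest F0 value."""
    return max(f0_values) - min(f0_values)


def range_semitones(f0_values, baseline_hz=None):
    """Range in semitones above a baseline (assumed default: the speaker's lowest F0)."""
    base = min(f0_values) if baseline_hz is None else baseline_hz
    return 12 * math.log2(max(f0_values) / base)


# Hypothetical F0 samples (Hz) for two speakers; not data from the talk.
speaker_low = [90, 100, 120, 150, 180]
speaker_high = [180, 200, 240, 300, 360]
for name, f0 in [("low-pitched", speaker_low), ("high-pitched", speaker_high)]:
    print(name, range_hz(f0), "Hz", range_semitones(f0), "ST")
```

The higher-pitched speaker’s range is twice as wide in Hz (180 vs. 90 Hz) but identical in semitones (12.0 ST each), mirroring the way ranges look more similar once expressed relative to the baseline.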
Scaling to a speaker’s range
variation in pitch heights and ranges means it is often worth drawing pitch traces relative to the speaker’s range
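Drawing a trace relative to the speaker’s range can mean labelling the pitch axis in semitones above the speaker’s baseline. A minimal sketch that converts semitone ticks back to Hz, assuming (hypothetically) a baseline of 86 Hz, the lowest value marked on the Hz axis in the figures that follow; the talk does not say how its baseline was chosen, and the function name is illustrative.

```python
def semitone_ticks_in_hz(baseline_hz, steps_st):
    """Hz values corresponding to semitone ticks measured upward from baseline_hz."""
    return [baseline_hz * 2 ** (st / 12) for st in steps_st]


# Hypothetical baseline of 86 Hz; ticks every 6 semitones.
ticks_st = [0, 6, 12, 18]
for st, hz in zip(ticks_st, semitone_ticks_in_hz(86, ticks_st)):
    print(f"{st:>2} ST = {hz:.1f} Hz")
```

With an 86 Hz baseline the ticks fall at 86.0, 121.6, 172.0 and 243.2 Hz, so each equal 6 ST step spans more Hz than the one before.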
Scaling to a speaker’s range
[Figure: pitch trace on a pitch (Hz) axis marked 86, 150, 200, 250; time 0–0.4 s]
[Figure: panels with a pitch (semitones) axis marked 0, 6, 12, 18 and a pitch (Hz) axis marked 0, 150, 300, 450; time 0–0.4 s]
Summing up
• trying to share some advice on preparing robust visual representations
• trying to prompt more careful thought about how visual representations are prepared and used
• visual representations, along with descriptions and transcriptions, are how readers usually ‘get at’ the data
scripts: tinyurl.com/visreps
slides: tinyurl.com/gwshef