Workshop on

Prosody and Meaning: Information Structure and Beyond

ProMAix

Abstracts

8 November 2018

Laboratoire Parole et Langage Aix-Marseille Université – CNRS

Aix-en-Provence, France

Committees

Organisers

Cristel Portes, Laboratoire Parole et Langage (LPL), Aix-Marseille Université (AMU),

Arndt Riester and Uwe Reyle, Institut für Maschinelle Sprachverarbeitung (IMS), Universität

Stuttgart.

Scientific committee

Stefan Baumann (Universität Köln),

Roxane Bertrand (CNRS, Aix-Marseille Université),

Daniel Büring (Universität Wien),

Sasha Calhoun (University of Wellington),

Elisabeth Delais-Roussarie (CNRS, Université de Nantes),

Kordula De Kuthy (Universität Tübingen),

Mariapaola D’Imperio (Aix-Marseille Université),

James German (Aix-Marseille Université),

Daniel Hole (Universität Stuttgart),

Frank Kügler (Universität Köln),

Amandine Michelas (CNRS, Aix-Marseille Université),

Caterina Petrone (CNRS, Aix-Marseille Université),

Giuseppina Turco (CNRS, Université Paris Diderot),

Pauline Welby (CNRS, Aix-Marseille Université),

Margaret Zellers (Universität Kiel).

Local committee

Carine André, LPL-CNRS

Axel Barrault, ILCB-AMU

Sébastien Bermond, LPL-AMU

Brigitte Bigi, LPL-CNRS

Giusy Cirillo, LPL-AMU

Cyril Deniaud, LPL-AMU

Stéphanie Desous, LPL-CNRS

Lydia Dorokhova, LPL-AMU

Simone Fuscone, ILCB-AMU

Aurélie Goujon, LPL-AMU

Rémi Lamarque, LPL-AMU

Joëlle Lavaud, LPL-CNRS

Frédéric Lefèvre, LPL-AMU

Nadia Monségu, LPL-CNRS

Claudia Pichon-Starke, LPL-CNRS

Elora Rivière, LPL-AMU

Acknowledgements

The Organisation Committee would like to thank all their financial supporters.

Program

8:30-9:00 Registration + poster installation

9:00-9:10 Welcome Prosody & Meaning

9:10-10:10 Invited talk: Pilar Prieto. Intonational encoding of epistemic operations across

speech acts: Commitment and Agreement operators

10:10-11:30 Poster session 1 + Coffee

11:30-12:00 Sophie Egger and Bettina Braun. What does it take to make a question biased? – Evidence from perception data.

12:00-12:30 Ramona Wallner. Givenness and prosody, how French wh-in-situ questions are not linked to givenness.

12:30-13:00 Elisa Sneed German, Caterina Petrone, Kiwako Ito and James Sneed German. Effects of tune choice on the multi-dimensional interpretation of requests and offers.

13:00-14:30 Lunch

14:30-15:00 George Christodoulides. Prosody plus syntax does not equal discourse structure, and why should it?

15:00-15:30 Muriel Assmann, Daniel Büring, Izabela Jordanoska and Max Prüller. Focus interpretation is relational (but not stochastic).

15:30-16:50 Poster session 2 + Coffee

16:50-17:00 Welcome SemDial

17:00-18:00 Invited talk: Michael Wagner (joint work with Dan Goodhue). Toward a

Bestiary of the Intonational Tunes of English

19:00 Welcome drink

Poster session 1

Muriel Assmann, Daniel Büring, Izabela Jordanoska and Max Prüller. A formal account of focus in French.

Stefan Baumann and Jane Mertens. Do prenuclear accents reflect meaning differences in German?

Piera Filippi. Prosody as a biological anchor of meaning effects: An evolutionary perspective.

Manon Lelandais and Gaëlle Ferré. The prosodic realisation of subordinate constructions: peaks or troughs?

Simon Wehrle, Francesco Cangemi and Martine Grice. Why so quiet? The nature and significance of silent gaps in

second language communication.

Poster session 2

Emilie Destruel and Caroline Féry. Prosodic realization of dual focus in French declarative sentences.

Branislav Gerazov. Embedding context-dependent variations of prosodic contours using variational encoding for

decomposing the structure of speech prosody.

Amandine Michelas and Maud Champagne-Lavau. Does the addressee matter when producing French prosodic

focus marking?

Riccardo Orrico, Renata Savy and Mariapaola D'Imperio. Individual variability in Salerno Italian question tune

production and epistemic attitude.

Marvin Schmitt, Alexandre Cremers and Jakub Dotlačil. CRISP: a semantics for focus-sensitive particles in

questions.

Matthijs Westera. Rise-fall-rise: A prosodic window on secondary QUDs.

Table of contents

Committees

Acknowledgements

Program

Invited Talks

P. Prieto. Intonational encoding of epistemic operations across speech acts: Commitment and Agreement operators

M. Wagner. Toward a Bestiary of the Intonational Tunes of English

Oral Presentations

M. Assmann, D. Büring, I. Jordanoska, M. Prüller. Focus interpretation is relational (but not stochastic)

G. Christodoulides. Prosody Plus Syntax Does Not Equal Discourse Structure, And Why Should It? Corpus-Based Research on the Interaction between Prosody, Syntax and Discourse Structure

S. Egger, B. Braun. What does it take to make a question biased? – Evidence from perception data

E. Sneed German, C. Petrone, K. Ito, J. Sneed German. Effects of tune choice on the multi-dimensional interpretation of requests and offers

R. Wallner. Givenness and prosody, how French in-situ questions are not linked to givenness

Poster Sessions

M. Assmann, D. Büring, I. Jordanoska, M. Prüller. A formal account of focus in French

S. Baumann, J. Mertens. Do prenuclear accents reflect meaning differences in German?

E. Destruel, C. Féry. Prosodic realization of dual focus in French declarative sentences

P. Filippi. Prosody as a biological anchor of meaning effects: An evolutionary perspective

B. Gerazov, G. Bailly, O. Mohammed, Y. Xu, P. N. Garner. Embedding Context-Dependent Variations of Prosodic Contours using Variational Encoding for Decomposing the Structure of Speech Prosody

M. Lelandais, G. Ferré. The prosodic realisation of subordinate constructions: peaks or troughs?

A. Michelas, M. Champagne-Lavau. Does the addressee matter when producing French prosodic focus marking?

R. Orrico, R. Savy, M. D'Imperio. Individual variability in Salerno Italian question tune production and epistemic attitude

M. Schmitt, A. Cremers, J. Dotlačil. CRISP: a semantics for focus-sensitive particles in questions

S. Wehrle, F. Cangemi, M. Grice. Why so quiet? The nature and significance of silent gaps in second language communication

M. Westera. Rise-fall-rise: A prosodic window on secondary QUDs

Invited Talks

Intonational encoding of epistemic operations across speech acts:

Commitment and Agreement operators

Pilar Prieto ICREA-Universitat Pompeu Fabra, Barcelona, Spain

Even though intonation has been traditionally claimed to be an indicator of the epistemic commitments

of the participants in a discourse, very few empirical investigations have addressed specific semantic

hypotheses related to the precise semantic contribution of intonation to utterance interpretation. In this

talk, I will provide a set of empirical arguments showing that different types of statement and question

intonation contours across languages encode different levels of ASSERT (commitment) and REJECT

((dis)agreement) epistemic operators. First, I will show crosslinguistic data from typologically diverse

languages as supporting evidence that sentence-final discourse particles across languages (a) encode

similar meanings to those intonation encodes; and (b) encode the specification of dynamic epistemic

commitments in two complementary directions, i.e., speaker commitments to the speaker’s own

proposition and speaker agreement with the addressee’s propositions (e.g., different degrees of the

ASSERT and REJECT operators). Second, the results of two empirical studies will be presented to

further support this view. The first study will show results from a recent perception experiment

showing that different types of biased QUESTION intonation in Catalan encode fine-grained

information about the epistemic stance of the speaker, not only in relation to the speaker’s own

propositions but also in relation to the addressee’s propositions or to contextual information. A total of

119 Central Catalan listeners participated in an acceptability judgment task and were asked to rate the

perceived degree of acceptability between a set of interrogative utterances (variously produced with

one of four intonational contours) and their previous discourse context (which was controlled for

epistemic bias). We found that participants preferred some question intonation contours over others in

the six types of epistemic contexts (i.e., three degrees of speaker commitment and three degrees of

speaker agreement), revealing an epistemic specialization of intonation contours in this language. The

second study will show the results of a recent production experiment comparing two languages within

the Romance group (Catalan and Friulian) which have been reported to use intonation and sentence

particles to different extents to mark epistemic meanings. A total of 15 speakers per language were

asked to participate in a Discourse Completion Task designed to elicit statements with two degrees of

speaker commitment and agreement properties. The results of the two experiments show that (a)

intonation in Romance encodes speaker commitment and speaker agreement operations in statements

and questions through a different set of intonation contours; and (b) Catalan and Friulian display an

asymmetry in the marking of epistemically-biased statements: while Catalan uses a greater variety of

stance-marking intonation contours, Friulian uses a more varied set of stance modal particles and a

more restricted set of intonation contours. Overall, the results of the two experiments show that

intonation encodes commitment and agreement operators across two different speech acts, namely

questions and statements. Following up on recent proposals by Portes and colleagues, we claim that

intonation contours across languages encode multidimensionally a set of operators that refer to speech

act and epistemic information, as well as information structure, politeness and affective information.

We claim that dynamic semantic models enable us to integrate the study of compositional intonational

meaning with other parts of the grammar into a unified approach.

Toward a Bestiary of the Intonational Tunes of English

Michael Wagner McGill University

Reporting on joint work with Dan Goodhue, University of Maryland

What is the inventory of tunes of North American English? What do particular tunes contribute to the

pragmatic and semantic import of an utterance? How reliably are certain conversational goals and

intentions associated with the use of particular tunes? While English intonation is well-studied, the

answers to these questions still remain preliminary. We present the results of scripted experiments that

complement existing knowledge by providing some data on what tunes speakers use to accomplish

particular conversational goals, and how likely particular choices are. This research complements

studies of the meaning and form of individual contours, which often do not explore the alternative

prosodic means to achieve a certain conversational goal; it also complements more exploratory

research based on speech corpora, which offer a rich field for exploring which contours are generally

out there, but since the context often underdetermines the real intentions of the speaker, they make it

hard to come to firm conclusions with respect to the contribution of particular tunes.

Our studies focus on three types of conversational goals: the goal to contradict (‘Intended

Contradiction’), to imply something indirectly (‘Intended Implication’), or to express incredulity

(‘Intended Incredulity’). We looked at these three intents since their expression has been linked in the

prior literature with the use of three particular rising contours: the Contradiction Contour (Liberman &

Sag, 1974; Ladd, 1980; Ward & Hirschberg, 1985; Goodhue & Wagner 2018), the Rise-Fall-Rise

Contour (Ward & Hirschberg, 1985; Constant, 2012; Wagner, 2012), and the Incredulity Contour

(Hirschberg & Ward, 1992).

Our results show that participants indeed use the expected contours more frequently than others to

achieve the respective conversational goals, except that they almost never used the Incredulity

Contour. To convey incredulity, speakers almost always chose the Polar Question Rise (Pierrehumbert

& Hirschberg, 1990; Bartels, 1999; Truckenbrodt 2012). In Contradictions, there was more variability

in the choice of intonational tune than with the other two intents. When speakers did not use the

Contradiction Contour, they often contradicted the interlocutor using a Declarative Fall with Polarity

Focus, or a hitherto undescribed falling contour, which we label the Presumption Contour. Our results

also show an interesting interaction between choice of tune and focus prominence (Goodhue &

Wagner 2016; cf. Schlöder 2018). We discuss the challenge such interactions pose for Rooth's

alternatives theory of focus, and how one might go about addressing it.

Oral Presentations

Focus interpretation is relational (but not stochastic)

Muriel Assmann, Daniel Büring, Izabela Jordanoska and Max Prüller University of Vienna

Overview:

In this paper we present a formally explicit metrical and relational, but non-stochastic, theory of focus

realization —i.e. the relation between prosodic structure and the generation of Roothian focus

alternatives— in English, cashing out several advantages of a stochastic relational metrical system

suggested in [5] and works reported therein. While consonant with [5]’s general line of argument, we

argue a number of points pace that work, among them: ① Apart from the ‘marked-unmarked’

opposition, there is an inherent asymmetry between weak and strong sisters, even under default stress

assignment; this makes unnecessary any ‘constraint . . . requiring F-marked elements to align with

nuclear accents’ ([5], p.12). ② Non-default metrical structures obligatorily lead to focal

interpretations (i.e. non-trivial alternatives), while default structures are literally completely neutral.

③ Non-default strong–weak assignment indeed marks the weak sister

as background, but the strong sister as containing a focus (‘focal’), rather than being one.

In the spirit of [5]: We explicate the prosodic defaults by the principles in (1), which are strictly

ranked, and determine weak-strong markings on syntactic trees, which in turn relate to stress and

accent patterns by rather standard, non-functional mappings, given in (2) and (3); following [3] the

annotated tree is the input for focus interpretation, and no intermediate diacritics such as F-markers are

used.

We show how this set-up solves a number of problems known to haunt accent-based approaches,

including some also tackled in [4, 5]. How prenuclear accents on non-focal elements (wrongly

excluded by e.g. [10, 9, 7, 8]) are possible, but also why pre-nuclear accents corresponding to a non-

default stress pattern lead —obligatorily, we argue— to a focal interpretation. Why non-narrow foci

are realized according to default prosody, whether they are all-new, or all-given ([10, 9] predict

complex all-given foci like (5) to be ineffable; [8] allows for exceptional F-marking on a given

element in such cases, but thereby predicts that complex given foci incur additional violations of

AvoidF, and, in turn, that F-markings that are otherwise correctly ruled out, e.g. (6-b), become

available in such cases, which they don’t, (7-b)). Why Second Occurrence Foci are realized by stress-

only if post-nuclear, but by stress plus pitch accent if prenuclear, cf. [6, 1], and why the scope of SOFi

is limited to the background of the ‘primary focus’ [2].

Pace [5]: Re ①: Under default stress assignment, e.g. (4-a), a strong sister is focally neutral: it allows

for alternatives, including its literal meaning, i.e. it may or may not be focal. In addition, the weak

sister is conditionally focal: it may introduce non-trivial (NT) alternatives, but only if the strong sister

does, too; see (4-c). This has the effect that if a node is to be interpreted as focal within a (sub)tree, it

will dominate the metrically strongest (pre)terminal in that (sub)tree, without the need for a focus

alignment constraint.

Re ②: A constituent that is metrically strong against the default (and hence receives ‘extra’ stress, e.g.

(4-b)) is focal: it has (NT) alternatives, excluding, however its literal meaning; therefore, such

structures must be interpreted as signalling focus. Additionally, its sister is non-focal (i.e. has only its

literal meaning as an alternative), see (4-d).

Re ③: Being focal is not the same as being F-marked: For example, in (8), both S and DP are

prosodically reversed (and hence stronger than normal), yet DP-alternatives are restricted to ‘someone

else’s mother’, i.e. DP cannot be treated as if F-marked.

(1) defaults (highest to lowest):

    WEAK               STRONG
    functional         lexical
    head               complement
    left projection    right projection

(2) METRICAL TREE TO STRESS GRID: An assignment of degrees of stress to the terminals of a metrically annotated phrase marker T is legitimate iff for any branching node N in T, N’s s(trong) daughter dominates a terminal with a higher degree of stress than that of any terminal dominated by a w(eak) daughter of N.

(3) STRESS–ACCENT ASSOCIATION: An association of pitch accents (PAs) to a metrical grid G is legitimate only if (a) no PA is associated with a column to the right of the highest column of G, and, as far as compatible with that (b) if a column of height n is associated with a PA, every column of height n or higher is associated with a PA.

(4) [examples (4-a) to (4-d) not recovered]

(5) (Has anyone seen the young king’s murderer? – Well, I suppose) [NP the young KING]F has seen his murderer

(6) John’s mother praised Bill. ([8]’s (47))

a. No, John’s mother praised [JOHN]F

b. #No, John’s mother [PRAISEDF John]F (2 Fs, loses to (6-a)’s 1)

(7) The young king’s mother praised Bill. (2Fs in both, predicted to be on a par)

a. No, the young king’s mother praised [the young KINGF]F

b. #No, the young king’s mother [PRAISEDF the young king]F.

(8) [example not recovered]
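The conditions in (2) and (3) can be checked mechanically. The following is a minimal Python sketch, not the authors' implementation. It assumes a tree encoded as nested lists of (label, daughter) pairs with exactly one "s" daughter per branching node, uniquely named terminals, and a unique highest grid column; all function names are hypothetical. Since (3) is stated as an "only if" condition, the second function tests necessary conditions only, and it applies clause (b) just to columns at or to the left of the highest column, which is one possible reading of "as far as compatible with that".

```python
# Sketch of (2) and (3). Encoding and names are illustrative assumptions.
# A node is a terminal (str) or a list of (label, daughter) pairs,
# where label is "s" (strong) or "w" (weak).

def terminals(node):
    """Return the terminals dominated by node, left to right."""
    if isinstance(node, str):
        return [node]
    return [t for _, daughter in node for t in terminals(daughter)]


def grid_is_legitimate(node, stress):
    """(2): at every branching node, the strong daughter dominates a
    terminal with more stress than any terminal under a weak daughter."""
    if isinstance(node, str):
        return True
    strong = [d for label, d in node if label == "s"]
    weak = [d for label, d in node if label == "w"]
    if len(strong) != 1:  # assumption: exactly one strong daughter
        return False
    strong_peak = max(stress[t] for t in terminals(strong[0]))
    for daughter in weak:
        if max(stress[t] for t in terminals(daughter)) >= strong_peak:
            return False
    return all(grid_is_legitimate(d, stress) for _, d in node)


def accents_meet_conditions(heights, accented):
    """(3): heights lists column heights left to right; accented is the
    set of column indices carrying a pitch accent."""
    peak = heights.index(max(heights))  # assumption: unique highest column
    # (a) no pitch accent to the right of the highest column
    if any(i > peak for i in accented):
        return False
    if not accented:
        return True
    # (b) up to the peak, every column at least as high as the lowest
    # accented column must itself be accented
    lowest = min(heights[i] for i in accented)
    return all(i in accented for i in range(peak + 1) if heights[i] >= lowest)


# Abstract example with placeholder terminals: [[a_w b_s]_w c_s]
tree = [("w", [("w", "a"), ("s", "b")]), ("s", "c")]
print(grid_is_legitimate(tree, {"a": 1, "b": 2, "c": 3}))  # True
print(grid_is_legitimate(tree, {"a": 1, "b": 3, "c": 3}))  # False
print(accents_meet_conditions([1, 2, 3], {1, 2}))          # True
print(accents_meet_conditions([1, 2, 3], {0, 2}))          # False
```

In the last call the accent on the lowest column forces an accent on the intermediate column as well, so skipping it violates clause (b).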

Prosody Plus Syntax Does Not Equal Discourse Structure,

And Why Should It?

Corpus-Based Research on the Interaction between Prosody, Syntax and

Discourse Structure

George Christodoulides Service de Métrologie et des Sciences du langage, Université de Mons, Belgium

In this presentation we will focus on methodological aspects of using corpus data to study the

interaction between prosody, syntax and discourse structure in spoken language production and

comprehension.

Over the past decade, a substantial body of research has focused on the prosody-syntax interface, and

especially in the case of French, a particular emphasis has been put on defining a “basic unit” of

speech. Degand & Simon (2005, 2009, 2016) postulate that a “basic discourse unit” is delimited by

coinciding major prosodic and syntactic boundaries and that these units “allow the hearer to start

drawing inferences and seeking for coherence”. However, research in psycholinguistics has

consistently shown that during language understanding, both syntactic analysis and discourse

comprehension are continuous in nature: the listener integrates information as soon as possible, re-

evaluates previous analyses in light of new contradicting prosodic, syntactic or semantic data, and

makes anticipatory predictions (e.g. Trueswell et al., 1994; Tanenhaus et al., 1995; Waters & Caplan,

2004; Dahan & Tanenhaus, 2004). Therefore, the idea that the listener is waiting for a complete

“basic” unit of speech to begin processing would be incompatible with the results of several

psycholinguistic experiments. Furthermore, it has been claimed that these units reflect “a cognitive

reality”, based on the existence of differences in the distribution of “basic unit” types across speaking

styles. However, these differences in distribution of BDU types are an artefact of their definition:

speaking styles are characterised by differences in speech planning, which in turn affect the length of

syntactic structures, and the number and length of silent pauses (the main acoustic correlate of major

prosodic boundaries).

Based on the analysis of corpus data that was recently enhanced with additional annotations (e.g. the

Rhapsodie corpus (Lacheret et al., 2014) and the C-PhonoGenre (Goldman et al., 2014) corpus), we

will show that the prosodic, syntactic and discourse structure levels can be described with varying

levels of granularity. There are congruences and mismatches between each combination of 2 out of the

3 levels (prosody-syntax, syntax-discourse, prosody-discourse). The discourse structure level is a

“first-class” annotation level, equally important to describe as prosodic and syntactic structure. No two

levels are sufficient to predict the third; combining an annotation in prosodic units and an annotation in

syntactical units will not magically result in an annotation in discursive units. A given discursive

relation can be realised with different prosodic patterns combining with different syntactic structures.

While the three levels are not totally independent (hence the importance of studying their interactions),

their relationship is not one of total redundancy either.

We therefore postulate that it will be beneficial to abandon the search for an elusive “basic unit”,

despite its initial apparent simplicity, and that three independent levels of annotation and analysis are

needed for corpus-based research on the relationship between prosody and meaning in spoken

language. We suggest modelling the phenomenon as a time series of cues that are received by the

listener sequentially; congruences and mismatches are events of interest that may be used by the

listener to update a continuously constructed representation.

Finally, we will attempt to estimate by means of mathematical simulation the size of a corpus that

would be needed for a meaningful statistical analysis of the relationships between prosodic units,

syntactic units and discourse relations, given the individual variability in the realisation of these

structures.

References

Dahan, D., Tanenhaus, M.K. (2004) Continuous mapping from sound to meaning in spoken-language

comprehension: evidence from immediate effects of verb-based constraints. Journal of Experimental

Psychology: Learning, Memory, Cognition, 30: 498–513.

Degand, L., Simon, A.C. (2005). Minimal Discourse Units: Can we define them, and why should we?

Proceedings of SEM-05. Connectors, Discourse Framing and Discourse Structure: From Corpus-

based and Experimental Analyses to Discourse Theories.

Goldman, J. P., Prsir, T., Christodoulides, G., Auchlin, A. (2014) Speaking Style Prosodic Variation:

An 8-hour 9-style Corpus Study. In: Campbell, N., Gibbon, D., Hirst, D. (eds.) Proceedings of Speech

Prosody 2014, 105–109.

Lacheret, A., Kahane, S., Beliao, J., Dister, A., Gerdes, K., Goldman, J.P., Obin, N., Pietrandrea, P.,

Tchobanov, A. (2014) Rhapsodie: a prosodic-syntactic treebank for spoken French. In: Proceedings of

the 9th International Conference on Language Resources and Evaluation (LREC), May 26–31,

Reykjavik, Iceland.

Tanenhaus, M.K., Spivey-Knowlton, M.J., Eberhard, K.M., Sedivy, J.C. (1995) Integration of visual

and linguistic information in spoken language comprehension. Science, 268: 1632–4.

Trueswell, J.C., Tanenhaus, M.K., Garnsey, S.M. (1994) Semantic influences on parsing: use of

thematic role information in syntactic ambiguity resolution. Journal of Memory and Language, 33:

285–318.

Waters, G.S., Caplan, D. (2004) Verbal working memory and on-line syntactic processing: evidence

from self-paced listening. Quarterly Journal of Experimental Psychology A, 57: 129–63.

What does it take to make a question biased? – Evidence from perception

data

Sophie Egger & Bettina Braun University of Konstanz

In everyday life we use subtle ways to communicate desires, often without explicitly expressing them.

Asking questions is one way to indirectly utter such desires. Questions with an additional non-truth-

conditional aspect are referred to as biased [1]: they are not plainly information seeking but additionally

express an attitude towards one of the possible answers, e.g., a wish or desire in questions with a bouletic

bias [1-5]. We hypothesize that speakers successfully convey their desires when expressing them in a

biased question, given that interlocutors seem to be aware of it. In order to better understand what leads to

this success in communication, we investigate which prosodic realizations of polar questions (PolQs) lead

to the impression of a bouletic bias.

In a prior production experiment, 15 German native speakers (2 male, ø=23.3 years, SD=3.0) produced

string-identical PolQs either with a bouletic bias or as neutral information-seeking questions, as a verbal

response to written scenarios (see Table 1 for an example context). We selected 144 of these original

recordings as auditory stimuli in a decision task. The productions were split into 6 experimental lists.

Twenty-four German native speakers (9 male, ø=21.7 years, SD=3.1) participated in the perception

experiment and judged the questions they heard as “biased”, “neutral” or “I don’t know” by pressing a

button. Each participant was randomly assigned to two of the experimental lists, resulting in 8 responses

for each of the 144 PolQs.

Based on the responses we calculated the bias proportion (i.e., the probability of a question to be judged

as biased) as a continuous factor. We also included identification (biased vs. neutral) as a binary factor (if

identified as such in at least 5 out of 8 cases, i.e., 62.5%). Files that were equally often judged as biased

and neutral (bias proportion=0.5) were excluded from the analysis (15% of the data).
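The design figures and the two response measures described above can be reproduced in a few lines. This is a minimal Python sketch under stated assumptions, not the authors' analysis script: the function names are hypothetical, list assignment is assumed to be balanced, the bias proportion is taken over all responses to an item (the text does not say how "I don't know" responses enter the calculation), and items that reach the 5-out-of-8 threshold for neither label are returned as unclassified.

```python
from collections import Counter
from fractions import Fraction


def responses_per_item(n_participants, lists_per_participant, n_lists):
    """Responses per item, assuming every list is heard equally often."""
    return Fraction(n_participants * lists_per_participant, n_lists)


def classify(responses, min_share=Fraction(5, 8)):
    """Return (bias proportion, label) for one item.

    Assumption: the proportion is computed over all responses, and the
    label is None when neither 'biased' nor 'neutral' reaches min_share.
    """
    counts = Counter(responses)
    n = len(responses)
    bias_proportion = Fraction(counts["biased"], n)
    label = None
    for candidate in ("biased", "neutral"):
        if Fraction(counts[candidate], n) >= min_share:
            label = candidate
    return bias_proportion, label


# 24 participants, 2 of 6 lists each
print(responses_per_item(24, 2, 6))                      # 8
print(classify(["biased"] * 5 + ["neutral"] * 3))        # 5/8, labelled biased
print(classify(["biased"] * 4 + ["neutral"] * 4))        # 1/2, unclassified
```

Exact fractions are used so that the threshold comparison at 5/8 does not depend on floating-point rounding.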

Our prosodic analysis partly follows the analysis in previous work on the realization of epistemic bias in

PolQs by [6]. While we did not find major differences in the accent placement throughout the utterance in

PolQs judged as biased versus neutral, there are phonetic differences in the realization of the rising

accents (L*+H and L+H*): in biased PolQs the pitch range in these accents is on average 1.6st (SD=2.6)

larger than in neutral PolQs (see Table 2 for detailed values split by accent type), although this difference

was not significant. As for the final boundary tone, we find significantly more low-rises (L-H%) in biased

than in neutral PolQs (46% vs. 15%, p=0.0006), while a high final rise (H-^H%) more often leads to the

percept of a neutral PolQ (78% vs. 39%, p=0.0001), see Fig. 1. In utterances with an H-^H% boundary

tone, PolQs judged as biased showed a smaller pitch range in the final rise (12.2st, SD=5.9 vs. 13.1st,

SD=4.1); however, this difference was not significant. Also, the probability of the PolQ being judged as

biased increases significantly with a longer duration of the sentence final object (p=0.004).

Our data suggests that prosodic variation can be sufficient to evoke the impression of a bouletic bias in

string-identical PolQs. Listeners integrate intonation and duration cues in their interpretation. However, in

a number of cases participants were not able to unambiguously judge the question as biased or

information seeking (15%) or a question intended to be produced with a bouletic bias was perceived as

neutral (61%) and vice versa (22%). Future work will, amongst other things, address the question whether

trained speakers (e.g., actors) are overall more successful in conveying the additional non-truth-

conditional aspect of a biased question solely via prosodic means, thus leading to clearer results in the

interpretation by hearers in a perception study.

References

[1] Sudo, Y. 2013. Biased polar questions in English and Japanese. Beyond expressives: Explorations in

use-conditional meaning, 275-296.

[2] Reese, B. 2007. Bias in Questions (PhD Dissertation). University of Texas, Austin, TX.

[3] Huddleston, R., & Pullum, G.K. 2002. The Cambridge Grammar of the English Language.

Cambridge: Cambridge University Press.

[4] Reese, B. 2006. The meaning and use of negative polar interrogatives. Empirical Issues in Syntax

and Semantics, 331-354.

[5] van Rooij, R., & Šafárová, M. 2003. On Polar Questions. Talk at the Semantics and Linguistic

Theory, 13 (Seattle, WA).

[6] Domaneschi, F., Romero, M., & Braun, B. 2017. Bias in polar questions: Evidence from English

and German production experiments. Glossa: a journal of general linguistics, 2(1): 26. 1–28.

Table 1: Example of written scenario and string-identical PolQ to be produced in the prior production experiment (productions used as auditory stimuli).

Neutral condition

Scenario: You and one of your friends are going on vacation and traveling with an intercity-bus. You manage to get two seats next to each other. It doesn’t matter to you where you sit, but you don’t know which seat your friend prefers. Therefore you ask him…

Speaker intention (not on display): I want to know whether you want the window seat or the aisle seat.

Biased condition

Scenario: You and one of your friends are going on vacation and traveling with an intercity-bus. You manage to get two seats next to each other. You would like to have the aisle seat and you hope that your friend wants to sit by the window. You ask him…

Speaker intention (not on display): I want you to take the window seat.

Target question (both conditions): Would you like to sit by the window?

Table 2: Pitch range in semitones (and standard deviation) of rising accents in PolQs predominantly judged as biased or neutral, split by accent type.

               L+H*           L*+H
    biased     6.5st (2.7)    7.4st (2.7)
    neutral    4.5st (1.0)    6.1st (3.2)

Fig. 1: Percentages of produced boundary tones in PolQs predominantly judged as biased or neutral. [Bar chart; x-axis: boundary tones (H-%, H-^H%, L-H%); y-axis: percentages (0 to 100); series: biased, neutral.]


Effects of tune choice on the multi-dimensional interpretation of requests and

offers

Elisa Sneed German1, Caterina Petrone2, Kiwako Ito3, and James Sneed German2

1Université Paul Valéry Montpellier 3, EMMA, Montpellier, France

2Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France

3Department of Linguistics, The Ohio State University, Columbus, USA

Traditional accounts of the semantics of intonational contours assume compositionality, such that the

meaning of a given contour depends on the combined functions of pitch accents and boundary tones

[1]. This framework, however, has yet to incorporate recent research showing that affective meaning

may influence the judgement of speech act (e.g., statement vs. question [2]), that the speaker may

choose different tunes (e.g., for requests and offers) according to their familiarity with the listener [3],

or that perlocutionary meaning is a function of both sentence type and tune [4].

The present research explores how perlocutionary meaning is influenced by tune (rising vs. falling) for

two distinct, yet comparable illocutionary acts: requests and offers (e.g., Can [you/I] bring [me/you]

some water?). A perceptual rating task elicited participants’ responses along three scales: speaker

MOOD, SINCERITY, and AUTHORITY (cf. [5]). In line with [4], we expected the combination of falling

contour and polar question to evoke negative judgments of speaker mood and a perception of higher

speaker authority. We were particularly interested in a possible asymmetry between requests and offers

with respect to the effects of falling tune on perceived speaker sincerity: speakers can utter offers (Can

I bring you some water?) without really intending/desiring to carry out the offered act, while by

contrast, requests (Can you bring me some water?) are unlikely to be produced with no intention/desire

of receiving a favor.

Two female native speakers of American English (AE) recorded 96 request-offer pairs with both rising (L* L-H%) and

falling (H* L-L%) contours. Acoustic analyses of the stimuli showed similar speech rates for the two

speakers, while Speaker 2 had generally larger f0 movements than Speaker 1. In particular, the nuclear

pitch accent was higher before falling contours and lower before rising contours, and both the rising

and falling contours had larger f0 movements for Speaker 2 than for Speaker 1.

A total of 22677 responses from 237 participants were elicited using a Mechanical Turk online survey.

Each participant rated 96 items (6 blocks of 16 items sorted by utterance type (request/offer) and

question type (MOOD/AUTHORITY/SINCERITY)). Each trial presented an audio file, then a question with a

sliding scale (Figure 1). For each item, each participant received only one of the three question types (see (1) for an

example).
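The block structure described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the constant and function names are hypothetical, and the abstract does not say how blocks were ordered or why the reported total falls short of the maximum.

```python
from itertools import product

# Figures taken from the abstract; the names are illustrative only.
UTTERANCE_TYPES = ("request", "offer")
QUESTION_TYPES = ("MOOD", "AUTHORITY", "SINCERITY")
ITEMS_PER_BLOCK = 16
N_PARTICIPANTS = 237
REPORTED_RESPONSES = 22677


def build_blocks():
    """Return one block per (utterance type, question type) combination."""
    return [
        {"utterance": u, "question": q, "n_items": ITEMS_PER_BLOCK}
        for u, q in product(UTTERANCE_TYPES, QUESTION_TYPES)
    ]


blocks = build_blocks()
items_per_participant = sum(b["n_items"] for b in blocks)
max_responses = N_PARTICIPANTS * items_per_participant
# Difference between the theoretical maximum and the reported total.
shortfall = max_responses - REPORTED_RESPONSES

print(len(blocks), items_per_participant, max_responses, shortfall)
```

Two utterance types crossed with three question types give the six blocks, and 6 x 16 gives the 96 items per participant. The reported 22677 responses are 75 fewer than the 22752 that 237 complete participants would produce; the abstract does not state the reason for the difference.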

Analyses of the responses confirmed the main effect of falling tune, irrespective of speaker

differences: it evoked the perception of a less happy speaker MOOD (t=-16.45, p<.001: Figure 2),

higher speaker AUTHORITY (t=8.82, p<.001: Figure 3), and less SINCERITY (t=-11.44, p<.001: Figure 4).

In addition, we found that the falling tune raises speaker AUTHORITY to a greater degree for requests

than for offers (t=3.6, p<.001: Figure 3) and lowers SINCERITY to a greater degree for offers than for

requests (t=-4.67, p<.001: Figure 4). These results reinforce findings that intonational tune is a

fundamental cue for perlocutionary/affective meaning. Moreover, they reveal that the different social

ramifications of different illocutionary acts can influence how tune maps onto such meaning. We aim

to further investigate the correlations among these interpretational dimensions, and test how the

presence of the discourse background or knowledge of speaker-listener power relationships influences

utterance assessments.


Figure 1. Example Display

Figure 2. Mood rating. Figure 3. Authority rating. Figure 4. Sincerity rating.

(1) Example target sentence and question set:

Target request/offer sentence:

Can [you/I] bring [me/you] some water?

MOOD question (for both request and offer):

What is the speaker's mood?

Very unhappy/happy -------------------------------------Very happy/unhappy

AUTHORITY question (for both request and offer):

Who does the speaker think has more authority in this situation?

The speaker/listener ------------------------------------- The listener/speaker

SINCERITY question:

(For request)

Does the speaker want the listener to bring her some water?

(For offer)

Does the speaker want to bring the listener some water?

Not at all/Very much ------------------------------------- Very much/Not at all

References

[1] Pierrehumbert, J., & Hirschberg, J. 1990. The meaning of intonational contours in the interpretation of discourse. In P. Cohen, J. Morgan, & M. Pollack (Eds.), Intentions in communication (pp. 342-365). Cambridge: MIT Press.

[2] Pihan, H., Tabert, M., Assuras, S., & Borod, J. 2008. Unattended emotional intonations modulate linguistic prosody processing. Brain and Language, 105, 141-147.

[3] Astruc, L., Vanrell, M., & Prieto, P. 2016. Cost of the action and social distance affect the selection of question intonation in Catalan. In Intonational Grammar in Ibero-Romance: Approaches across linguistic subfields (pp. 91-114). John Benjamins Publishing Company.

[4] Jeong, S., & Potts, C. 2016. Intonational sentence-type conventions for perlocutionary effects: An experimental investigation. Proceedings of SALT 26, 1-22.

[5] Roberts, F., Francis, A.L., & Morgan, M. 2006. The interaction of inter-turn silence with prosodic cues in listener perceptions of 'trouble' in conversation. Speech Communication 48(9), 1079–1093.


Givenness and prosody, how French in-situ questions

are not linked to givenness

Ramona Wallner

University of Konstanz

Spoken Continental French can employ two different strategies to form information-seeking

questions: the wh-word can be fronted (1a) or it can appear in-situ (1b):

1a) Qu’ est-ce que tu fais ce soir ? 1b) Tu fais quoi ce soir ?

what (-esk) you do this-evening ? you do what this-evening ?

‘What are you doing tonight?’

There is a substantial body of literature explaining speakers’ choice to use an interrogative with a non-fronted wh-word (WiQ) in French, all of which claims that WiQs are in some way more restricted than their fronted counterparts. The most recent accounts, by Hamlaoui (2010, 2011) and Déprez et al. (2012), are based on the idea that WiQs are linked to givenness, namely that the non-wh-part of the question has to be given (in a broad sense, i.e. evoked; Schwarzschild 1999). These accounts build on the fact that French cannot realize focus stress to the left and on the need to de-accent given phrases. This is closely linked to analyses of echo questions (see Bartels 1999), where "exempting all constituents except the wh-expression from the focus has the effect of linking the utterance to a prior commitment the addressee has made to the presupposed proposition".

Outline: This poster takes a fresh approach to WiQs, claiming that the givenness account may be right for echo questions in French but is not the right explanation for WiQs. WiQs are not restricted by givenness: the inference that the non-wh part of a WiQ has to be given does not hold, as out-of-the-blue WiQs can be found. However, certain surface structures are not attested in WiQs, which suggests that WiQs are restricted by prosodic constraints rather than by de-accenting due to givenness. The poster will show how a prosodic analysis can capture the peculiarities of WiQs. This is supported by new experimental data.

Hypotheses: Speakers rate WiQ prosody as much more natural if the WiQ uses dislocation of the subject. Fully spelled-out DP subjects disturb WiQ prosody regardless of their givenness status, showing that dislocation of full phrases in WiQs is not linked to givenness but instead reflects a prosodic strategy.

New experimental findings: In this pilot experimental study, 23 contexts for non-discourse-given subjects were created (plus 19 fillers), and the target sentence was presented auditorily in two conditions: full phrase (2a) and (right) dislocation (2b):

(2) Context: You are helping your friend to move into a new apartment. You are looking through her stuff in the boxes. She is in another room but she can hear you. You ask:

a) La vaisselle va où ?          b) Elle va où, la vaisselle ?
   the dishes go where              she₁ goes where the dishes₁

“Where do the dishes go?”

The subject of the question was not given in the context but evoked, and in some cases out-of-the-blue. 50 participants were asked to rate the target sentences on a 7-point Likert scale for how natural they sounded. The ratings were analyzed using generalized additive mixed models (GAMMs) with the ocat link function for ordered categorical data. Phrase condition was entered as a fixed factor, and subjects and items as random smooths. Results showed a significant effect of phrase condition (β = 0.9, SE = 0.02, z = 40, p < 0.0001). Dislocations were rated significantly higher than full phrases, with ratings of 6-7 (very good to excellent) for dislocations and mostly 3-6 (a bit bad to good) for full phrases. This can be seen in Table 3. The results show that even though the dislocation was not provoked by givenness, it was the preferred strategy.

Proposal: WiQs in French are not tied to givenness. WiQs have to adhere to a particular prosodic structure that gives focus marking to the wh-word as the first sentence stress. To ensure this, the wh-phrase has to sit at the right edge of the first Accentual Phrase (AP) (Jun & Fougeron 2002). Subject clitics have to replace full-phrase subjects, as the latter would create their own AP. Any other intervener that forms its own AP should automatically be ruled out as well.
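The constraint in the proposal can be made concrete with a toy check in Python. This is a minimal sketch, not the author's analysis: the AP parses are assigned by hand for illustration, and the function name and the small wh-word list are hypothetical.

```python
# Illustrative, incomplete set of French wh-words (an assumption of this sketch).
WH_WORDS = {"où", "quoi", "qui", "quand", "comment"}


def wh_closes_first_ap(accentual_phrases):
    """Return True if the last word of the first AP is a wh-word.

    accentual_phrases: list of APs, each a list of word strings.
    """
    if not accentual_phrases or not accentual_phrases[0]:
        return False
    last_word = accentual_phrases[0][-1].lower().strip(",?")
    return last_word in WH_WORDS


# Hand-assigned, hypothetical AP parses of the two versions of example (2).
full_phrase = [["La", "vaisselle"], ["va", "où"]]       # full DP subject forms its own AP
dislocated = [["Elle", "va", "où"], ["la", "vaisselle"]]  # clitic subject joins the first AP

print(wh_closes_first_ap(full_phrase))
print(wh_closes_first_ap(dislocated))
```

Under these assumed parses, only the dislocated version places the wh-word at the right edge of the first AP, which matches the direction of the rating difference reported above.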

References

Bartels (1999) ‘The Intonation of English Statements and Questions’

Déprez et al. (2012) ‘Interfacing Information and Prosody: French wh-in-situ Questions’

Hamblin (1973) ‘Questions in Montague English’

Hamlaoui (2010) ‘A prosodic study of wh-questions in French natural discourse’

Hamlaoui (2011) ‘On the role of phonology and discourse in Francilian French wh-questions’

Jun & Fougeron (2002) ‘The Realizations of the Accentual Phrase in French Intonation’

Schwarzschild (1999) ‘GIVENness, AvoidF and other constraints on the placement of accent’

Table 3: Ratings on the 7-point Likert scale

Poster Sessions


A formal account of focus in French

Muriel Assmann, Daniel Büring, Izabela Jordanoska and Max Prüller

University of Vienna

Introduction

In this paper we propose a formal account of how to calculate focus alternatives in French using the Unalternative Semantics (UAS) framework [2]. We base our model primarily on the data and findings in [4], and submit that the possible focus configurations in French can be modeled using two simple relational constraints: the weak restriction and the strong restriction.

Background

In languages like English, focus is marked mainly by pitch accenting. The prosody of English can be represented in metrical trees, where each node is labelled as (w)eak or (s)trong according to default stress assignment rules (see [2, p.561] for more details). UAS’s weak and strong restrictions directly derive the possible (un)alternatives between sister nodes, without the need for a mediator such as an F-marker. French, on the other hand, does not have word stress and does not use pitch accenting to mark focus, which is a challenge for a theory that calculates focus alternatives from stress and accent patterns. We will show how focus realization can be defined for French if we take phrasing to be the primary prosodic effect of focus in French. Our analysis is based on the data put forward in [4], which can be generalized as follows: i) focused elements always form phrases to the exclusion of unfocused material; ii) postfocal material forms prosodically weak phrases. We write weak phrases as [ ]W, in contrast to fully accented phrases ([ ]S). Generalization ii) above is a departure from Féry’s [4] claim that postfocal material is dephrased; it is motivated by the findings in, among others, [3] and [1], who describe various kinds of post-focal prosody, such as (complete or relative) deaccenting or iterative downstep. Differences between the realization of prefocal and focal phrases are assumed to be strictly phonetic.

Data

A phrase never contains focused material and unfocused material at the same time (with the

exception of functional elements); this is illustrated in (1), where the focused PP was phrased

separately, regardless of pre-focus phrasing. Furthermore, focused elements can be split over

several phrases as long as none of them contains unfocused material, as shown in (2). Where not

sentence-final, the focused material will be followed by one or more weak phrases, as in (3).

Proposal

In order to capture these facts, we propose the rules in (7) – (9), consisting of the strong and

weak restriction of Unalternative Semantics as defined in (4) – (6).

Appendix

Examples of possible focus–phrasing patterns:

(1) Q: Comment Daniel promène-t-il son chien?

‘How does Daniel walk his dog?’

A: a. [Daniel promène son chien]S [en laisse]S

b. [Daniel promène]S [son chien]S [en laisse]S

c. [Daniel]S [promène son chien]S [en laisse]S

‘Daniel walks his dog ON A LEASH’. (ex. (31) in [4])


(2) Q: Qu’ont fait les marins?

‘What did the seamen do?’

A: [Les marins]S [ont réparé]S [le grand mât]S

‘The seamen FIXED THE MAST’. (VP-focus) (ex. (20) in [4])

(3) Q: Qui peint le garage en noir?

‘Who is painting the garage black?’

A: [Le garçon]S [peint le garage en noir]W

‘THE BOY is painting the garage black’. (Subject focus) (ex. (21) in [4])

Definitions:

(4) Focal elements:

a. A terminal element is focal if it introduces alternatives.

b. A constituent is focal if at least one of its daughters is focal.

(5) Weak restriction A B

The sister at the tail of the arrow can only be focal if the sister at the tip of the arrow is.

(6) Strong restriction A B

The sister at the tail of the arrow cannot be focal. The sister at the tip of the arrow is focal.

Rules:

(7) A B

Weak restriction applies in both directions if no phrase boundary intervenes.

Within a phrase, everything is focal or nothing is.

(8) A] [S B

The left sister of a full phrase can only be focal if its right sister is focal.

Phrases to the left of a focal phrase may always also be focal.

(9) A]S [W B

The strong left sister of a weak phrase is focal.

The rightmost strong phrase is always focal.
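One possible reading of rules (7) to (9) can be explored with a short Python enumeration. This is a simplified sketch under several assumptions that are not part of the abstract: the sentence is treated as a flat, right-branching sequence of phrases; each phrase is atomic, so rule (7) holds trivially; a weak phrase following another weak phrase is taken to be non-focal; and the stated consequence "the rightmost strong phrase is always focal" is imposed directly. The function name is hypothetical.

```python
from itertools import product


def focus_patterns(strengths):
    """Enumerate focal/non-focal assignments licensed under this sketch.

    strengths: string over {"S", "W"}, one symbol per prosodic phrase,
               left to right (e.g. "SSS" or "SW").
    Returns a list of tuples of booleans (True = the phrase is focal).
    """
    n = len(strengths)
    licensed = []
    for focal in product([False, True], repeat=n):
        ok = True
        for i in range(n - 1):
            # Right sister = all material to the right (right-branching assumption);
            # a constituent counts as focal if it contains a focal phrase, cf. (4b).
            right_focal = any(focal[i + 1:])
            if strengths[i + 1] == "S":
                # Rule (8): the left sister may be focal only if the right sister is.
                if focal[i] and not right_focal:
                    ok = False
            elif strengths[i] == "S":
                # Rule (9): a strong phrase before a weak one is focal; the weak side is not.
                if not focal[i] or right_focal:
                    ok = False
            else:
                # Weak followed by weak: not covered by (7)-(9); assumed non-focal here.
                if focal[i] or right_focal:
                    ok = False
        # Stated consequence of the rules: the rightmost strong phrase is focal.
        if "S" in strengths and not focal[strengths.rindex("S")]:
            ok = False
        if ok:
            licensed.append(focal)
    return licensed


print(focus_patterns("SSS"))  # phrasing as in (1c) and (2)
print(focus_patterns("SW"))   # phrasing as in (3)
```

Under these assumptions, a three-phrase all-strong sequence admits both narrow focus on the final phrase, as in (1), and focus on the last two phrases, as in (2), while a strong phrase followed by a weak one admits only focus on the strong phrase, as in (3). A different tree structure would change which left-hand phrases can be focal together.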

[1] C. Beyssade, E. Delais-Roussarie, J. Doetjes, J.-M. Marandin, A. Rialland. Prosody and

Information Structure in French. In F. Corblin and H. de Swart, eds, Handbook of French

Semantics, pages 477–499. CSLI publications, 2004.

[2] D. Büring. Unalternative semantics. In S. D’Antonio et al., eds, Proceedings of SALT 25,

pages 550–575. Linguistic Society of America, 2015.

[3] A. Di Cristo and L. Jankowski. Prosodic organisation and phrasing after focus in French. In Proceedings of the XIVth ICPhS, pages 1565–1568, 1999.

[4] C. Féry. Focus and Phrasing in French. In C. Féry and W. Sternefeld, eds., Audiatur Vox

Sapientiae. A Festschrift for Arnim von Stechow, pages 153-181. Berlin, Akademie-Verlag,

2001.


Do prenuclear accents reflect meaning differences in German?

Stefan Baumann & Jane Mertens

IfL Phonetik, University of Cologne, Germany

Background and Motivation: The majority of studies on the relation between prosody and meaning restrict themselves to the form and function of nuclear accents, commonly defined as the last pitch accent in an intonation unit. The status of prenuclear accents, i.e. pitch accents that occur before the nucleus within the same intonation unit, is less clear, however. It has been claimed that prenuclear accents do not contribute much to the meaning of an utterance and that they are optional in many cases (cf. Büring's [1] ornamental accents on prefocal elements). Other studies found that prenuclear accents were placed consistently, even on textually given information [2,3]. The aim of the present study is to find out whether differences in the information status of a sentence-initial referent and in the type of focus domain the referent is part of influence its prosodic realisation.

Methods: We collected data from 29 native German speakers (21f, 8m; age: 19-30), as part of a large-scale comparison with American English and Spanish speakers. They were presented with 20 different mini-stories on a screen. Subjects were asked to read out the stories, which consisted of two context sentences and a target sentence, at a natural but swift speech rate. By varying the second context sentence we designed four conditions rendering the subject argument in the target sentence either given, accessible, new or contrastive (see (1) for the target word Nonne 'nun'; expected prenuclear and nuclear accents are underlined). In the first three conditions the target words are in broad focus, while in the last condition the target word is a contrastive topic. Each participant was presented with only one condition per story, resulting in five realisations of each condition per speaker. The classification of phrase breaks and accent types that entered our analysis was based on a consensus judgment of two trained phoneticians.

In addition, we measured a number of phonetic parameters such as F0 slope and range as well as duration and RMS intensity of the target word and its stressed syllable. Furthermore, we investigated the Tonal Center of Gravity (TCoG) [4], a holistic measure that incorporates the contributions of contour shape and the alignment and scaling of turning points. The measures themselves reflect either a temporal value (TCoG alignment) or a pitch level (TCoG scaling) within the sampled F0 region that represents the balancing point of the area under the curve.

Results and Discussion: We had to exclude 15% of the target sentence realisations, mostly because subjects produced a phrase break after the target word, turning potentially prenuclear accents into nuclear accents. All but five of the remaining 493 utterances carried a prenuclear accent on the target word. L*+H was the most frequent accent type in all conditions (74%, Fig. 1). A linear regression analysis revealed significant main effects for the slope (p<0.001) and range (p<0.001) of the prenuclear pitch rise. Figure 2 shows the results for all target words, indicating an increase in range (and thus in prominence) from given to new referents, but also a less pronounced rise in pitch in contrastive topics (unlike [3]). TCoG scaling (as well as RMS intensity) showed a significantly lower value for contrast than for the three information status categories under broad focus. This result was surprising but turned out to be stable: most subjects produced a rather flat hat pattern in the (contrastive) double focus condition. Presumably, speakers do not feel the need to make the contrasted items prosodically prominent since the contrast is already expressed by the parallel syntactic structure. Moreover, the prosodic makeup of the prenuclear accent also seems to depend on the nuclear accent.

An investigation of our full dataset including both the shape and meaning of the whole contour and the contribution of the prenuclear area will be provided. In any case, although we only see a subtle influence of the subject argument’s informativeness on its prosodic realisation, some small but systematic phonetic effects suggest that prenuclear accents in German are to some extent affected by the information structure of an utterance, challenging a strict view of prenuclear accents as being merely 'ornamental'.
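The idea of a "balancing point of the area under the curve" can be illustrated with a generic centroid computation in Python. This is a sketch of the geometric centroid of a sampled, piecewise-linear F0 curve, not necessarily the exact TCoG definition in [4]: the function name, the trapezoidal integration, and the `floor` baseline from which the area is measured are all assumptions of this example.

```python
def tonal_center_of_gravity(times, f0, floor=0.0):
    """Centroid of the area between a sampled F0 curve and a baseline.

    times: sample times in seconds, strictly increasing.
    f0:    F0 values at those times (e.g. in Hz), all above `floor`.
    floor: baseline from which the area is measured (an assumption of this sketch).
    Returns (alignment, scaling): time and pitch coordinates of the centroid.
    """
    if len(times) != len(f0) or len(times) < 2:
        raise ValueError("need at least two (time, f0) samples")
    area = 0.0
    moment_t = 0.0  # integral of t * y over the region
    moment_y = 0.0  # integral of y**2 / 2 over the region
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        y0, y1 = f0[i] - floor, f0[i + 1] - floor
        dt = t1 - t0
        # Exact integrals for a straight line segment between two samples.
        area += dt * (y0 + y1) / 2.0
        moment_t += dt * (t0 * (2 * y0 + y1) + t1 * (y0 + 2 * y1)) / 6.0
        moment_y += dt * (y0 * y0 + y0 * y1 + y1 * y1) / 6.0
    if area <= 0:
        raise ValueError("area under the curve must be positive")
    return moment_t / area, floor + moment_y / area


# A flat contour balances at its temporal midpoint; a rising one balances later.
print(tonal_center_of_gravity([0.0, 1.0], [100.0, 100.0]))
print(tonal_center_of_gravity([0.0, 1.0], [100.0, 200.0]))
```

The comparison of the two calls shows the property that makes such a measure attractive for accent shape: moving more of the F0 area towards the end of the region shifts the alignment value later, without any need to locate individual turning points.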


(1)

Context 1: Nach dem langen Winter freuten sich alle auf ein paar sonnige Stunden im Freien.

(After the long winter everybody was looking forward to a couple of sunny hours in the open.)

Context 2a ('given'): Die Nonne kümmerte sich um den Klostergarten.

(The nun was looking after the cloister garden.)

Context 2b ('accessible'): Im Klostergarten blühten die ersten Pflanzen.

(The first plants bloomed in the cloister garden.)

Context 2c ('new'): Die Sonne schien schon den ganzen Tag und der Schnee war endlich geschmolzen.

(The sun had been shining all day and the snow had finally melted.)

Context 2d ('contrast'): Der Mönch hat einen Brombeerstrauch gegossen.

(The monk watered a blackberry bush.)

Target: Die Nonne hat einen Mandelbaum gegossen. (The nun watered an almond tree.)

Figure 1. Distribution of accent types as a function of a sentence-initial referent's information status and focus

condition. Perceptual prominence of accent types increases from left (0 = deaccented) to right (L+H*).

Figure 2. F0 range of prenuclear rises on all test words and for all accent types as a function of their information

status and focus condition.

References

[1] Büring, D. 2007. Intonation, Semantics and Information Structure. In Ramchand, G. & Reiss, C. (Eds.), The Oxford Handbook of Linguistic Interfaces. Oxford University Press, 445-474.

[2] Féry, C., & Kügler, F. 2008. Pitch accent scaling on given, new and focused constituents in German. Journal of Phonetics 36(4), 680-703.

[3] Braun, B. 2006. Phonetics and phonology of thematic contrast in German. Language and Speech 49(4), 451-493.

[4] Barnes, J., Veilleux, N., Brugos, A., & Shattuck-Hufnagel, S. 2012. Tonal Center of Gravity: A global approach to tonal implementation in a level-based intonational phonology. Laboratory Phonology 3(2), 337-383.


Prosodic realization of dual focus in French declarative sentences

Emilie Destruel1, Caroline Féry2

1University of Iowa

2Goethe University Frankfurt

Introduction: Within the past literature on prosody, the realization of dual focus (i.e. sentences that answer

interrogatives containing two ‘wh’-phrases) has received little attention. Yet, this is an important issue

given that mainstream prosodic theories typically disallow two main prosodic heads within one prosodic

domain (Selkirk, 1995; Truckenbrodt, 1995). Indeed, dual focus elicits a conflict between the need to include the entire sentence in a single intonation phrase (i.e. with one prominent accent) and the need to realize two foci (i.e. each with its own prominent accent) and thus to divide the sentence into two intonation phrases (Kabagema-Bilan, López-Jiménez & Truckenbrodt, 2011). Crosslinguistically, a few studies do exist, all converging on the result that, when the sentence containing dual focus is sufficiently long, its intonation amounts to more than just concatenating two single foci (see for instance Eady et al., 1986 for English; Rump & Collier, 1996 for Dutch). Nevertheless, different strategies are reported for different languages, and to date no study has examined how French deals with this prosodic conflict, although results for single (narrow) focus show that focus has different effects according to the prosodic phrasing of the focused constituent and the following given portion of the sentence (Jun & Fougeron, 2000; Delais-Roussarie et al., 2002). Given this necessarily short backdrop, the main goal of this paper is to

examine how French signals prosodic prominence in post-verbal sequences of dual focus sentences that

include objects (OO) or adjuncts (AA). More specifically, the following questions will be addressed: (RQ1)

Do objects and adjuncts differ in their phonetic correlates?; (RQ2) how does prosodic prominence vary

depending on the prosodic length of the post-verbal constituents?; and (RQ3) how does dual focus compare

to the realization of other foci, specifically single focus and all-new (or broad focus)?

Methods: The paper reports on a production experiment with 16 female native speakers of Standard

French who were asked to read aloud a series of target sentences after hearing questions triggering different

focus-background structures. The experiment was controlled for three factors: the POST-VERBAL SEQUENCE

(i.e. obj/obj or adj/adj), the PROSODIC LENGTH of the focus (i.e. short, 3-4 syllables, or long, 7-8 syllables),

and the TYPE OF FOCUS in the post-verbal sequence (i.e. initial, final, dual or all-focus). Examples (1) and

(2) illustrate target sentences. A total of 4 lexicalizations per condition was created—the analysis is based

on a total of 481 sentences. For each sentence, syllable boundaries were manually inserted with the help of

spectrograms in PRAAT. These labels provided the basis for duration measurements. A PRAAT script was

then used to get F0max and duration on each single post-verbal constituent as well as on the verb itself.

Mixed-effects linear regression models were used predicting F0max and duration from the fixed-effect

factors of interest (post-verbal constituent, focus type and length), and their interaction when relevant.
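The measurement step described above (F0max and duration per labelled constituent) can be sketched in Python as follows. This is not the authors' PRAAT script: the function name and data layout are hypothetical, and the sample values are invented purely to show the computation.

```python
def measure_constituents(pitch_track, intervals):
    """Return {label: (f0_max, duration)} for each labelled interval.

    pitch_track: list of (time_s, f0_hz) samples; unvoiced frames have f0_hz = None.
    intervals:   list of (label, start_s, end_s) tuples from hand-labelled boundaries.
    """
    results = {}
    for label, start, end in intervals:
        # Keep only voiced samples that fall inside the interval.
        voiced = [f0 for t, f0 in pitch_track if start <= t < end and f0 is not None]
        f0_max = max(voiced) if voiced else None
        results[label] = (f0_max, end - start)
    return results


# Invented example values, for illustration only.
track = [(0.1, 210.0), (0.3, 230.0), (0.4, None), (0.6, 250.0), (0.9, 220.0)]
labels = [("verb", 0.0, 0.5), ("object1", 0.5, 1.0)]
measures = measure_constituents(track, labels)
print(measures)
```

Each constituent then contributes one F0max value and one duration to the regression models. Unvoiced frames are skipped here so that they do not affect the maximum; how the original script treated them is not stated in the abstract.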

Results: Regarding RQ1, there was a main effect of Post-Verbal Sequence (β = -6.985, SE = 2.077, t = -3.36), suggesting that V F0max and V duration were significantly lower and shorter when the verb was followed by an object than by an adjunct. Regarding RQ2, correlates of phrasing were clearly affected by the length of the prosodic constituents. Indeed, long constituents showed higher boundary tones, more additional high tones, less downstep, fewer occurrences of deaccenting, and more breaks separating the two constituents. Finally, regarding RQ3, no clear correlate of dual focus was found as compared to all-focus in the statistical data for F0 and duration: the F0max value of DF did not differ from that of AF in either constituent. Nor was there more F0 lowering after the first focus in the DF than in the IF condition, because there is no room to realize any compression due to the final high tone in this language (see Figure 1). A further very interesting finding in the data concerns the amount of individual variation observed: the number and position of high tones, as well as the particular scaling relationship between them, provide a powerful tool for the expression of (dual) focus. In sum, dual focus in French does not trigger any special prosodic feature. It resembles all-focus more than a concatenation of an initial and a final focus, and as such, differs


drastically from the other languages investigated so far, which gives some insight into why French prosody

can be so difficult to pin down.

(1) Object + object/short

Ségolène a caché [un trésor]object1 [à sa mère]object2

‘Ségolène hid a treasure from her mother.’

a. Initial focus (IF): Qu’est-ce que Ségolène a caché à sa mère? ‘What did S. hide from her mother?’

b. Dual focus (DF): Qu’est-ce que Ségolène a caché et à qui? ‘What did S. hide and from whom?’

c. Final focus (FF): À qui est-ce que Ségolène a caché un trésor? ‘From whom did S. hide a treasure?’

d. All focus (AF): Qu’est-ce qu’il s’est passé? ‘What happened?’

(2) Adjunct + adjunct/ long

Ségolène l’a caché [dans un placard abîmé]adjunct1 [au milieu du mois d’avril]adjunct2

‘S. hid it in an old cupboard during the month of April.’

Fig.1 Pooled normalized means for F0max per focus condition for sentences with OO (top panels)

and AA (bottom panels) sequences, per length (short on the left panels and long on the right panels,

respectively).

Selected references. Delais-Roussarie, E., A. Rialland, J. Doetjes & J-M. Marandin (2002) “The

prosody of post-focus sequences in French.” Proceedings of Speech Prosody • Jun, S.-A. & C.

Fougeron (2000) “A Phonological model of French intonation”. In Intonation: Analysis, modeling

and technology • Eady, S. J., W. E. Cooper, G. V. Klouda, P. R. Mueller, & D. W. Lotts (1986)

Acoustical characterization of sentential focus: narrow vs. broad and single vs. dual focus

environments. Language and Speech • Rump, H. H., R. Collier (1996). Focus Conditions and the

Prominence of Pitch-Accented Syllables. Language and Speech.


Prosody as a biological anchor of meaning effects: An evolutionary

perspective

Piera Filippi

Aix Marseille Univ, CNRS, Brain and Language Research Institute, France

Prosodic modulation of the voice is a core component of language, which guides the perception of words within the spoken signal, of syntactic connections between phrases, and the discrimination of sentence types, for instance questions and statements (3). Despite its central role in language, the biological anchors of prosody remain, to date, largely unexplored. To fill this gap, recent studies have adopted a comparative approach to human and nonhuman animal species. In this presentation, I will review these studies and present empirical data which may shed light on the evolutionary origins of the mechanisms underlying the interplay between prosody and segmental information in conveying meanings and in triggering reactions in listeners.

The comparative study of emotional communication and of size expression in nonhuman animals is particularly informative in this research frame. Specifically, the general hypothesis underlying my work is that the ability to express and identify emotional or size-related information, which is shared among all biological classes of vocalizing animals and across human cultures (8; 9; 15), is an innate mechanism that boosts the learning of arbitrary sound-meaning associations and the development of vocal communication.

Indeed, recent research suggests that the signaler’s emotional states and body size are each expressed through prosodic correlates that are shared across animal vocal communication systems (1; 8; 13; 14). Humans across different cultures use information related to fundamental frequency to judge the emotional content of vocalizations across amphibia, reptilia, and mammalia (8; 9), as well as the body size of the signaler (14). These results suggest that fundamental mechanisms of vocal emotional expression are widely shared among vocalizing vertebrates and could represent an ancient signaling system. The combination of these data with evidence on the coexistence of prosodic modulation and segmental information in modern human language suggests that the ability to express emotional and size-related content through prosodic modulation of the voice is evolutionarily older than the ability to process segmental information and may have boosted the emergence of the ability to articulate segmental information within prosodic contours (2; 4; 5; 7). Accordingly, multiple studies suggest that prosody drives word segmentation and the ability to map sounds to meanings in human adults (6; 11) and preverbal children (12). Finally, in line with these studies, research conducted on humans shows that prosodic modulation of the voice is dominant over verbal content and faces in meaning identification tasks (10). To conclude, implications for the debate on the uniqueness of humans’ ability to express meaning by modulating prosodic information in the vocal signal will be discussed.


References

1. Brown, S. (2017). A joint prosodic origin of language and music. Frontiers in Psychology, 8, 1894.

2. Cutler, A., Dahan, D., & Van Donselaar, W. (1997). Prosody in the comprehension of spoken

language: A literature review. Language and speech, 40(2), 141-201.

3. Darwin, C. (1871). The descent of man and selection in relation to sex. London: Murray.

4. Fitch, W. T. (2010). The evolution of language. Cambridge: Cambridge University Press.

5. Filippi, P. (2016). Emotional and interactional prosody across animal communication systems: A

comparative approach to the emergence of language. Frontiers in Psychology, 7, 1393.

6. Filippi, P., Congdon, J. V., Hoang, J., Bowling, D. L., Reber, S. A., Pašukonis, A., … Güntürkün, O.

(2017a). Humans recognize emotional arousal in vocalizations across all classes of terrestrial

vertebrates: Evidence for acoustic universals. Proceedings of the Royal Society B: Biological

Sciences, 284, 20170990.

7. Filippi, P., Gingras, B., & Fitch, W. T. (2014). Pitch enhancement facilitates word learning across

visual contexts. Frontiers in Psychology, 5, 1468.

8. Filippi, P., Gogoleva, S. S., Volodina, E. V., Volodin, I. A., & de Boer, B. (2017b). Humans identify

negative (but not positive) arousal in silver fox vocalizations: Implications for the adaptive value of

interspecific eavesdropping. Current Zoology, 63, 445-456.

9. Filippi, P., Ocklenburg, S., Bowling, D. L., Heege, L., Güntürkün, O., Newen, A., & de Boer, B.

(2017c). More than words (and faces): Evidence for a Stroop effect of prosody in emotion word

processing. Cognition and Emotion, 31, 879-891.

10. Johnson, E. K., & Jusczyk, P. W. (2001). Word segmentation by 8-month-olds: When speech cues

count more than statistics. Journal of Memory and Language, 44, 548-567.

11. Gout, A., Christophe, A., & Morgan, J. L. (2004). Phonological phrase boundaries constrain lexical

access II. Infant data. Journal of Memory and Language, 51, 548-567.

12. Ohala, J. J. (1983). Cross-language use of pitch: an ethological view. Phonetica, 40(1), 1-18.

13. Rendall, D., Kollias, S., Ney, C., & Lloyd, P. (2005). Pitch (F0) and formant profiles of human

vowels and vowel-like baboon grunts: the role of vocalizer body size and voice-acoustic

allometry. The Journal of the Acoustical Society of America, 117(2), 944-955.

Embedding Context-Dependent Variations of Prosodic Contours using

Variational Encoding for Decomposing the Structure of Speech Prosody

Branislav Gerazov1,2, Gérard Bailly2, Omar Mohammed2, Yi Xu3, Philip N. Garner4

1 FEEIT, UCMS, Skopje, Macedonia,

2 GIPSA-Lab, Grenoble, France,

3 UCL, London, UK

4 Idiap, Martigny, Switzerland

Prosody in speech is used to communicate a variety of linguistic, paralinguistic and non-linguistic

information via multiparametric contours. The Superposition of Functional Contours (SFC) model

is capable of extracting the average shape of these elementary contours through iterative analysis-

by-synthesis training of neural network contour generators (CGs) (Bailly and Holm, 2005).

grammatical dependencies, cliticisation, focus, as well as tones in Mandarin. An example prosodic

decomposition of the intonation contour for the French utterance “Son bagou pourrait faciliter la

communauté.” based on the annotated linguistic functions is shown in Fig. 1.

The Weighted SFC (WSFC) model is an extension to the SFC that can capture the prominence of

each functional contour in the final prosody (Gerazov et al., 2018b). It does so through expanding

the CGs with a weighting module that outputs a scaling factor based on their linguistic context.

The WSFC has been shown to be able to successfully capture the impact of attitude and emphasis

on prominence.
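The superposition idea described above can be sketched roughly as follows: each elementary contour is added over its scope, scaled by a context-dependent weight (a weight of 1.0 for every contour corresponds to the unweighted SFC case). This is only an illustration. The function name `superpose`, the contour shapes, the scopes and the weights are all invented placeholders; in the models themselves these values come from trained contour generators.

```python
from typing import List, Tuple

# One functional contour: (start index, pitch offsets per rhythmic unit, weight).
# All values used with this type below are made-up placeholders, not model outputs.
Contour = Tuple[int, List[float], float]


def superpose(n_units: int, contours: List[Contour]) -> List[float]:
    """Sum weighted elementary contours over their scopes."""
    total = [0.0] * n_units
    for start, shape, weight in contours:
        for offset, value in enumerate(shape):
            unit = start + offset
            # Ignore any part of a contour that falls outside the utterance.
            if 0 <= unit < n_units:
                total[unit] += weight * value
    return total


if __name__ == "__main__":
    contours = [
        (0, [2.0, 1.0, 0.0, -1.0, -2.0], 1.0),  # utterance-level contour
        (1, [1.0, 3.0], 0.5),                   # local contour at half prominence
    ]
    print(superpose(5, contours))
```

Keeping the weight separate from the contour shape mirrors the description of the weighting module as an add-on to the contour generators: the shape stays the same and only its contribution to the final prosody is scaled.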

While the WSFC successfully captures gradience, the true spatio-temporal variance of these

prosodic contours is multidimensional. To this end, we recently proposed a Variational Prosody

Model (VPM) that is able to capture a part of this variance (Gerazov et al., 2018a). Its variational

CGs (VCGs) use the linguistic context input to map out a prosodic latent space for each contour.

This two-dimensional latent space can then be used to visualise the captured context-specific

variation. Since the VCGs still synthesise the contours from rhythmic unit

position input, the mapped prosodic latent space is amenable to exploration only for short

contours, such as Chinese tones or clitics, shown in Fig. 2.

Here we propose an extension of the VPM based on variance embedding and recurrent neural

network contour generators (VRCGs). In our new approach, we use a variational encoder to

embed the context-dependent variance in a latent space that is used to initialise a long short-term

memory (LSTM). The LSTM then uses rhythmic unit positions to generate the prosodic contour.

This approach decouples the prosodic latent space from the length of the contour’s scope, thus it

can now be readily explored even for longer contours. Fig. 3 shows the embedded variance in the

prosodic latent space of the left-dependency contour elicited in six different attitudes. We can

clearly see that the declaration and especially exclamation attitudes give a full contour realisation,

while the others induce its suppression.
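To make the decoupling concrete, here is a minimal sketch of the general pattern: a latent vector is sampled from a context-dependent Gaussian and sets the initial state of a recurrent generator, which is then driven by rhythmic unit positions for a contour of any length. Everything in it is an assumption made for illustration. The abstract does not give the loss or the architecture details, the reparameterised sampling and KL term are the generic ones used in variational encoders, and a plain tanh recurrence with arbitrary fixed weights stands in for the trained LSTM.

```python
import math
import random
from typing import List


def sample_latent(mu: List[float], log_var: List[float],
                  rng: random.Random) -> List[float]:
    """Reparameterised sample z = mu + sigma * eps, with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: List[float], log_var: List[float]) -> float:
    """KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def generate_contour(z: List[float], positions: List[float]) -> List[float]:
    """Toy recurrence: the latent sets the initial state, positions drive it."""
    state = math.tanh(sum(z))  # hypothetical stand-in for an LSTM initial state
    contour = []
    for pos in positions:
        state = math.tanh(0.5 * state + pos)  # arbitrary fixed weights
        contour.append(state)
    return contour


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_latent([0.0, 0.0], [0.0, 0.0], rng)  # two-dimensional latent
    print(generate_contour(z, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

Because the latent vector only enters through the initial state, the same two-dimensional point can be decoded over three rhythmic units or thirty, which is the property that makes the latent space explorable for longer contours.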

References

[Bailly and Holm, 2005] Gérard Bailly and Bleicke Holm. 2005. SFC: a trainable prosodic model. Speech

Communication, 46(3):348–364.

[Gerazov and Bailly, 2018] Branislav Gerazov and Gérard Bailly. 2018. PySFC – a system for prosody

analysis based on the superposition of functional contours prosody model. In Speech Prosody, June.

[Gerazov et al., 2018a] Branislav Gerazov, Gérard Bailly, Omar Mohammed, Yi Xu, and Philip N. Garner.

2018a. A variational prosody model for the decomposition and synthesis of speech prosody. In ArXiv

e-prints, https://arxiv.org/abs/1806.08685, June.

[Gerazov et al., 2018b] Branislav Gerazov, Gérard Bailly, and Yi Xu. 2018b. A weighted superposition of

functional contours model for modelling contextual prominence of elementary prosodic contours. In

INTERSPEECH, September.

Figure 1: Example Praat annotation (left) and SFC decomposition (right) of the intonation of the French utterance: “Son

bagou pourrait faciliter la communauté.” The example shows the extracted elementary contours for the annotated

linguistic functions: declaration (DC), dependency to the left/right (DG/DD), and cliticisation (DV, XX).

Decomposition was done using the PySFC system, and the figures are taken from (Gerazov and Bailly, 2018).

Figure 2: Structure of the prosodic latent space for the French clitic function contour XX dependent on the attitude

context (left): declaration (DC), question (QS), incredulous question (DI), evidence (EV), suspicion (SC), and

exclamation (EX); only DC and EX elicit a full-blown contour. Prosodic latent space of Chinese tone 3 dependent on

the emphasis context (right): no (none), pre- (EMp), on- (EM), and post-emphasis (EMc); on-emphasis the tone has

pronounced prominence. Figures taken from (Gerazov et al., 2018a).

Figure 3: Prosodic latent space of left-dependency function contour (DG) structured based on attitude context with

attitude codes same as in Fig. 2; again DC and EX elicit full-blown contours, with EX inducing larger contour

prominence.

The prosodic realisation of subordinate constructions: peaks or troughs?

Manon Lelandais & Gaëlle Ferré

Université de Nantes, UMR6310 CNRS LLING

Based on video recordings of conversational British English, this study tests whether several

different subordinate syntactic structures all vocally provide background information. The

analysis focuses on the three most widespread types of finite subordinate constructions working

as syntactic modifiers in our oral corpus of spontaneous interaction: adverbial clauses, restrictive

relative clauses, and appositive relative clauses. Modifiers are described in linguistics as

dependent elements specifying or elaborating upon other content in the host structure (e.g.

Tomlin 1985; Lambrecht 1996; Huddleston & Pullum 2002). However, the literature shows little

consensus in weighing their informational input: while the information conveyed in subordinate

structures is seen as serving grounding functions in discourse (Fleischman 1985), Cristofaro

(2003) and Langacker (2008) signal that semantic and/or illocutionary subordination need not

align with syntactic subordination, and that the notion of subordination is best understood in

terms of dynamic conceptualisation. This study therefore questions whether subordinate

constructions all express the same absence of prominence in terms of prosody. Although vocal

characteristics have been defined for subordination in general (Bolinger 1984; Couper-Kuhlen

1986; Ward and Hirschberg 1985; Hirschberg and Grosz 1992; Wichmann 2000), few studies

have provided a qualified picture of their vocal input. Beyond showing that subordinate

constructions do not show the same absence of prosodic prominence, the results suggest that

prosodic emphasis is mostly expressed with tonal resources in subordinate constructions.

Appositive relative clauses do not show any prosodic cue for prominence. They are the shortest

and fastest forms in terms of rhythm, and they show a majority of falling-rising contours. These

specificities serve the expression of modality rather than that of informational emphasis.

While adverbial clauses show significantly more variation in pitch height and feature the highest

distribution of high rising contours among their embedding sequence, restrictive relative clauses

stand out from their co-text with distinctive rising-falling contours. Restrictive relative clauses

also show more syllabic lengthening than the co-text and the other syntactic types, and they

stand out as the longest segment. The vocal cues in restrictive relative clauses fully participate in

the construction of the foreground. Prosody then creates very distinct differences between the

types, contradicting their traditionally unified picture.

Keywords: subordination, background information, information structure, focus.

References

Bolinger, D. (1984). "Intonational signals of subordination." Proceedings of the Annual Meeting

of the Berkeley Linguistics Society. Berkeley, CA: eLanguage, pp. 401–413.

Couper-Kuhlen, E. (1986). "Intonation and grammar." In An Introduction to English Prosody.

Tübingen, Germany: Max Niemeyer Verlag, pp. 139–157.

Cristofaro, S. (2003). Subordination. Oxford, UK: Oxford University Press.

Fleischman, S. (1985). "Discourse functions of tense-aspect distinctions in narrative: Toward a

theory of grounding." Linguistics 23: 851–882.

Hirschberg, J. and Grosz, B. (1992). "Intonational features of local and global discourse

structure." Proceedings of the Workshop on Speech and Natural Language. Morristown,

NJ: Association for Computational Linguistics, pp. 441–446.

Huddleston, R. and Pullum, G. K. (2002). The Cambridge Grammar of the English Language.

Cambridge, UK: Cambridge University Press.

Lambrecht, K. (1996). Information Structure and Sentence Form: Topic, focus, and the mental

representations of discourse referents. New York: Cambridge University Press.

Langacker, R. W. (2008). "Complex sentences." In Cognitive Grammar. A Basic Introduction.

Oxford, UK: Oxford University Press, pp. 406–453.

Tomlin, R. S. (1985). "Foreground-background information and the syntax of subordination."

Text 5(1–2): 85–122.

Ward, G. and Hirschberg, J. (1985). "Implicating uncertainty: The pragmatics of fall-rise

intonation." Language 61(4): 747–776.

Wichmann, A. (2000). Intonation in Text and Discourse. London, UK: Longman.

Does the addressee matter when producing French prosodic focus marking?

Amandine Michelas, Maud Champagne-Lavau

Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France

A common assumption about the prosody-pragmatics interface is that prosodic phrasing is one of

the main cues to encode contrastive focus in French (Féry, 2001; Dohen & Loevenbruck, 2004;

Chen & Destruel, 2010). For instance, in noun-adjective pairs such as bougies violettes vs.

bonbons violets ‘purple candles’ vs. ‘purple candies’, French speakers parse the noun in the 2nd

fragment in a separate prosodic phrase from the adjective when this noun contrasts with the 1st

noun in the pair (e.g., bougies violettes followed by [BONBONS] [violets]). By contrast, they

produce it in the same prosodic phrase when it refers to the same noun but with a different

modifier (bonbons marron followed by [bonbons violets]; Michelas et al., 2014). However, a

debate remains regarding to what extent speakers prosodically encode contrastive focus to serve

the needs of their addressee during spoken interaction. Two theoretical positions are proposed.

According to the first approach (the audience design hypothesis) language production is mainly

addressee-oriented and speakers formulate utterances by consulting information that is mutually

shared with their addressee (e.g., Clark, 1996). According to this view, speakers would

prosodically encode the contrastive part of the information on the basis of shared knowledge

with the addressee. By contrast, the speaker-internal view assumes that most linguistic choices

are motivated by the speaker’s own experience and rely primarily on his/her private knowledge

(e.g., Kahn & Arnold, 2012). Within this framework speakers would consider the pragmatic

status of a referent from their own perspective, independently of the addressee’s view. The aim of

the present study was to disentangle these two positions by investigating whether the

prosodic encoding of focus is affected by the presence of an addressee.

To test this account, we conducted an experiment in which 30 native speakers of French played

an interactive game developed by Michelas et al. (2014). During this game, participants had to

indicate a given route from a departure point to an arrival point by producing noun-adjective

pairs in which the noun in the 2nd noun-adjective fragment (the target noun) was either identical

to the noun in the 1st fragment (e.g., bonbons marron ‘brown candies’ vs. bonbons violets ‘purple

candies’) or contrasted with it (e.g., bougies violettes ‘purple candles’ vs. BONBONS violets

‘purple candies’). We also manipulated the presence vs. absence of an addressee, meaning that

one group of participants performed the task with an addressee whereas another group described

the route in the absence of an addressee.

We analyzed prosodic phrasing produced by participants in terms of whether the target noun was

phrased within the same Accentual Phrase as the following adjective (1-AP phrasing) or whether

it was phrased in a separate AP (2-AP phrasing). Results confirmed those of Michelas et al.

(2014) showing that speakers produced more 2-AP phrasing when the target noun was

contrastive in the presence of an addressee. By contrast, in the absence of an addressee, speakers

did not produce more 2-AP phrasing than 1-AP phrasing, meaning they did not use prosodic

phrasing to encode the pragmatic status of target nouns. In other words, when doing the task with an

addressee, French speakers seemed to take their partner into account by phrasing the target noun

in a separate AP from the adjective to warn him/her that this noun constituted a contrastive entity

relative to the noun of the 1st fragment. More generally, our findings are more easily

reconcilable with the audience design hypothesis.

References

Clark, H.H. (1996). Using Language. New York, NY: Cambridge University Press.

Chen, A., and Destruel, E. (2010). Intonational encoding of focus in Toulousian French, in

Proceedings of Speech Prosody 2010.

Dohen, M., & Lœvenbruck, H. (2004). Pre-focal rephrasing, focal enhancement and postfocal

deaccentuation in French. in Proceedings of Interspeech 2004.

Féry, C. (2001). “Intonation of focus in French”. In Audiatur Vox Sapientiae: A Festschrift for

Arnim von Stechow, ed. C. Féry, W. Sternefeld (Berlin: Akademie Verlag), 153-181.

Kahn, J. M., & Arnold, J. E. (2012). A processing-centered look at the contribution of givenness

to durational reduction. Journal of Memory and Language, 67(3), 311-325.

Michelas, A., Faget, C., Portes, C., Lienhart, A.-C., Boyer, L., Lançon, C., & Champagne-

Lavau, M. (2014). Do patients with schizophrenia use prosody to encode contrastive

discourse status? Frontiers in Psychology, 5:755.

Individual variability in Salerno Italian question tune production and

epistemic attitude

Riccardo Orrico1, Renata Savy2 & Mariapaola D’Imperio3

1University of Salerno, Italy & Aix Marseille Univ, LPL, CNRS, France

2University of Salerno, Italy

3Aix Marseille Univ, LPL, CNRS, France

The use of intonation as an epistemic operator has been widely investigated across languages,

and researchers agree on the fact that intonation plays a crucial role in the conveyance of

information concerning discourse commitment, agreement between speaker and addressee, and

degree of certainty of the proposition expressed (Pierrehumbert and Hirschberg, 1990; Bartels,

2014; Prieto, 2015). A key account of the dialogical status of intonation is proposed by

Gunlogson (2004), who argues that intonation alone can be used to attribute commitment to one

of the participants in a dialogue. The epistemic value of American English questions has also

been investigated by Nilsenová (2006), who found that listeners associate an L* L-L% contour

with the expectation of a negative answer, while the expectation of a positive answer is

associated with high terminals (H-H%). As for Romance languages, evidence on the tonal

encoding of information about commitment and agreement between discourse participants has

also been found in Catalan (Vanrell et al., 2014; Prieto & Borràs-Comes, 2018) and in Bari

Italian (Grice and Savino 1997; Savino and Grice, 2011).

Our study intends to investigate the intonational encoding of epistemic disposition in Salerno

Italian (SI), with the aim of testing whether intonation can be used in questions to convey

speaker certainty about the proposition expressed. The intonational realization of yes-no

questions in SI has been investigated in a production study, using a Discourse Completion Task

(4 speakers) and analyzed through a ToBI-like annotation system (cf. Grice et al. 2005, Gili

Fivela et al. 2015). Three main phonological contours have been found for polar questions:

L*+H L-L%, L+H* L-H% and L+H* L-L% (Figure 1), whose distribution is highly dependent

on the speaker uttering the question.

The epistemic value of SI question tunes has been tested in perception, with the hypothesis that

the variability depends not only on the speaker, but also on the degree of certainty of the

expected answer: in a web-based survey, 45 listeners were asked to judge questions according to

the degree of speaker certainty about the answer expected. The experimental stimuli were

produced by a female native speaker of SI (24 items x 3 tunes). The listeners were asked to give

a response using a slide-bar ranging from 0 (‘She expects no’) to 100 (‘She expects yes’). The

data were analyzed using a linear mixed-effects regression model, with the response as dependent

variable, while the fixed independent variable was the tonal configuration of the utterance.

Listeners and items were included as random effects. The effect of pitch accents and boundary

tones on the response was also tested in two other separate models.
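One plausible reading of the main model described above is sketched below. The notation is assumed for illustration and is not taken from the study. It shows random intercepts for listeners and items only; the by-listener random slopes mentioned with the results are omitted.

```latex
\[
  y_{li} = \beta_0 + \beta_{\mathrm{tune}(l,i)} + u_l + w_i + \varepsilon_{li},
  \qquad
  u_l \sim \mathcal{N}(0, \sigma_u^2),\quad
  w_i \sim \mathcal{N}(0, \sigma_w^2),\quad
  \varepsilon_{li} \sim \mathcal{N}(0, \sigma^2)
\]
```

Here $y_{li}$ is the slider response of listener $l$ to item $i$, $\beta_{\mathrm{tune}(l,i)}$ is the fixed effect of the tune heard on that trial relative to a reference tune, and $u_l$ and $w_i$ are the listener and item intercepts.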

The results show that tune has an effect on the response: listeners rated the tunes with falling

terminals lower than the rising one (see Figure 2), though a significant effect was found only for

L*+H L-L% (β=-0.21, t=-3.31, p<.01). The models for pitch accents and boundary tones show

that both variables have an effect on the response: listeners gave a higher response to contours

with an L+H* (β=0.17, t=3.00, p<.01) than those with L*+H and a lower response to L% (β=-0.14,

t=-2.45, p<.05) than H%. In addition, all the models show a high value of variance for

listeners, showing that there are differences in the way speaker certainty is perceived. However,

the random slopes for listeners show that such variance is related to how the same tune was rated

by different listeners, while the difference among the tunes is rather constant across listeners.

Although further experiments are needed to better understand the dialogical functions of

intonation and how it interacts with individual variability, the study proves that epistemic

attitude can indeed be encoded in intonational contours in SI.

Figure 1: Examples of the question tunes in SI. F0 contours for the utterance Sono le nove? “Is it nine o’clock?” with L*+H L-L%

(left), L+H* L-H% (middle), and L+H* L-L% (right).

Figure 2: Listeners’ responses by type of question tune.

References

Bartels, C. (2014). The intonation of English statements and questions: A compositional interpretation. Routledge.

Gili Fivela, B., Avesani, C., Barone, M., Bocci, G., Crocco, C., D'Imperio, M., Giordano, R., Marotta, G., Savino,

M. & Sorianello, P. (2015). Intonational phonology of the regional varieties of Italian. In Intonation in Romance

(pp. 140-197). Oxford University Press.

Grice, M., D’Imperio, M., Savino, M., & Avesani, C. (2005). Strategies for intonation labelling across varieties of

Italian. Prosodic typology: The phonology of intonation and phrasing, (pp. 362-389) Oxford, Oxford University

Press.

Grice, M., & Savino, M. (1997). Can pitch accent type convey information status in yes-no questions?. Proceedings

Workshop Concept to Speech Generation Systems (pp. 29-38) AECL/EACL, Madrid.

Gunlogson, C. (2004). True to form: Rising and falling declaratives as questions in English. Routledge.

Nilsenová, M. (2006). Rises and Falls: Studies in the Semantics and Pragmatics of Intonation. Unpublished PhD

thesis, Universiteit van Amsterdam, Institute for Logic, Language and Computation.

Pierrehumbert, J., & Hirschberg, J. B. (1990). The meaning of intonational contours in the interpretation of

discourse. Intentions in communication, 271-311.

Prieto, P. (2015). Intonational meaning. Wiley Interdisciplinary Reviews: Cognitive Science, 6(4), 371-381.

Prieto, P. & Borràs-Comes, J. (2018). Question intonation contours as dynamic epistemic operators. Natural

Language and Linguistic Theory, 36 (2), 563-586.

Savino, M., & Grice, M. (2011). The perception of negative bias in Bari Italian questions. In S. Frota, P. Prieto & G.

Elordieta (Eds.) Prosodic categories: Production, perception and comprehension (pp. 187-206). Springer,

Dordrecht.

Vanrell, M. D. M., Armstrong, M. E., & Prieto Vives, P. (2014). The role of prosody in the encoding of

evidentiality. Proceedings of Speech Prosody 7, Dublin, Ireland.

Tunes: A: L+H* L-H%; B: L*+H L-L%; C: L+H* L-L%

http://prosodia.upf.edu/home/arxiu/publicacions/Prieto&Borras-Comes_2018.pdf

CRISP: a semantics for focus-sensitive particles in questions

Marvin Schmitt1, Alexandre Cremers2, Jakub Dotlačil2

1HU Berlin

2ILLC/UvA

Why so quiet?

The nature and significance of silent gaps in second language communication

Simon Wehrle, Francesco Cangemi, Martine Grice

IfL Phonetik, University of Cologne

Data from a wide range of languages have shown that native speakers consistently achieve the

timing of turn transitions with an almost surgical degree of precision. This is achieved with gaps

between turns that typically measure only around 200ms [1;2]. Deviations from this in the form

of either longer silences or overlapping speech are rare, conspicuous and considered undesirable

[3]. This timing is all the more impressive since research has shown that the planning of an

utterance, from formulation to actual speech production, takes at least 900ms when the utterance

consists of more than two words [4], extending to around 1500ms for simple sentences [5;6].

This means that a speaker must plan their next utterance before the interlocutor has finished their

turn in order to avoid long gaps, while at the same time predicting the end of the interlocutor’s

turn in order to avoid overlaps.

A clear indicator of the considerable difficulty involved is found in observations of first language

acquisition: Although infants already engage in “proto-conversational” turn-taking with their

caregivers [7], gaps between turns remain twice as long as those of adults until well into middle

childhood [8;9], showing that the complexities of speech production in a developing language

system cause longer silent intervals.

Combining such evidence from first language (L1) acquisition with evidence that word

production is slower in second language (L2) speech, even for highly proficient, bilingual

speakers [10], we can hypothesise that adult L2 speakers will produce more, and longer, gaps at

turn transitions. To our knowledge, no studies have investigated this issue to date.

We collected Map Task [11] data from 4 L1 German speakers and 4 matched L2 speakers with

L1 Vietnamese, analysing a total of 78 minutes of dialogue. The L2 subjects spoke German at a

high level of proficiency (CEF B2) and were living and studying in Germany at the time of

recording.

Looking at the timing of turn transitions using the measure of Floor Transfer Offset (see Fig. 1),

we found that the L2 dyads produced almost twice as many gaps of over 700ms (21.7% of all

turn transitions) compared to L1 controls (11.9%). Investigating the entire speech signal, divided

into a) speech from one speaker, b) silence and c) overlaps, reveals even greater differences (see

Fig. 2). For the L2 speakers, a striking 42.6% of the interaction consists of silence, with a

corresponding decrease in speech both by one speaker and by multiple speakers compared to the L1

group (where silence makes up only 18.8%). Note also that in the L2 group, a smaller proportion

of overlaps contains backchannels (BCs), meaning that, conversely, a larger proportion of

overlaps constitutes conversationally “unprincipled” interruptions (i.e. they were not

backchannels).
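The Floor Transfer Offset measure used above can be sketched as follows: at each change of speaker, the offset is the start of the incoming turn minus the end of the outgoing one, so positive values are gaps and negative values are overlaps. This is an illustrative sketch only. The turn representation, the function names and the timings in the usage example are invented, and the turns are assumed to be sorted by start time.

```python
from typing import List, Tuple

# A turn is (speaker, start_ms, end_ms); the values used below are invented.
Turn = Tuple[str, int, int]


def floor_transfer_offsets(turns: List[Turn]) -> List[int]:
    """FTO at each speaker change: next turn start minus previous turn end.

    Positive values are gaps, negative values are overlaps.
    """
    offsets = []
    for prev, nxt in zip(turns, turns[1:]):
        if prev[0] != nxt[0]:  # only count transitions between speakers
            offsets.append(nxt[1] - prev[2])
    return offsets


def share_of_long_gaps(offsets: List[int], threshold_ms: int = 700) -> float:
    """Proportion of turn transitions with a gap longer than the threshold."""
    if not offsets:
        return 0.0
    return sum(1 for fto in offsets if fto > threshold_ms) / len(offsets)


if __name__ == "__main__":
    turns = [("A", 0, 1200), ("B", 1400, 2500),
             ("A", 2400, 3000), ("B", 3900, 4500)]
    ftos = floor_transfer_offsets(turns)
    print(ftos, share_of_long_gaps(ftos))
```

The 700 ms default matches the threshold reported in the text; any other cut-off can be passed as `threshold_ms`.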

Looking at the duration of silent intervals in more detail, we can see, again, that L2 interactions

had not only more, but also considerably longer silent intervals (see Fig. 3), with some lasting

more than 10 seconds.

Gaps in conversation are not just empty space, but always carry semiotic weight and are

interpreted as communicative signals [12]. Any silences longer than ~700ms have been

consistently shown to signal a negative attitude or a lack of understanding [13]. Even for smaller

gaps in the range of a few hundred milliseconds, listeners are extremely sensitive to very small

differences. Moreover, it has been suggested that such differences in turn-taking styles form the

basis for a wide range of culture and character attributions and stereotypes [1]. As a result, it is

imperative that further detailed analyses are carried out in this understudied area and that the

findings are carried into the second language classroom and beyond.

References

[1] T. Stivers, N.J. Enfield, P. Brown, C. Englert, M. Hayashi, T. Heinemann, ... & S. C. Levinson, "Universals

and cultural variation in turn-taking in conversation", Proceedings of the National Academy of Sciences,

106(26), pp. 10587-10592, 2009.

[2] S. C. Levinson & F. Torreira, "Timing in turn-taking and its implications for processing models of language",

Frontiers in Psychology, 6, pp. 10-26, 2015.

[3] Sacks, H., E. Schegloff & G. Jefferson, "A simplest systematics for the organization of turn-taking in

conversation", Language, 50, 696-735, 1974.

[4] Schnur, T.T., A. Costa & A. Caramazza, "Planning at the phonological level during sentence production",

J. Psycholinguist. Res., 35, pp. 189-213, 2006.

[5] Griffin, Z.M. & K. Bock, "What the eyes say about speaking", Psychol. Sci., 4, pp. 274-279, 2000.

[6] Gleitman, L.R., D. January, R. Nappa & J.C. Trueswell, "On the give and take between event apprehension and

utterance formulation", J. Mem. Lang., 57, pp. 544-596, 2007.

[7] Gratier, M., E. Devouche, B. Guellai, R. Infanti, E. Yilmaz & E. Parlato-Oliveira, "Early development of turn-

taking in vocal interaction between mothers and infants", Frontiers in Psychology, 6, pp. 67-76, 2015.

[8] Garvey, C. & G. Berninger, "Timing and turn-taking in children's conversations", Discourse Process., 4, pp. 7-

57, 1981.

[9] Casillas, M., S. C. Bobb & E.V. Clark, "Turn-taking, timing, and planning in early language acquisition", Journal

of Child Language, 1310-1337, 2016.

[10] Hanulová, J., D.J. Davidson & P. Indefrey, "Where does the delay in L2 picture naming come from?

Psycholinguistic and neurocognitive evidence on second language word production", Language and Cognitive

Processes, 26(7), 902-934, 2011.

[11] A.H. Anderson et al., "The HCRC map task corpus", Language and speech, 34(4), 351-366, 1991.

[12] K. Vogeley, "Two social brains: neural mechanisms of intersubjectivity", Philosophical

Transactions of the Royal Society B, 372(1727), 2017.

[13] Kendrick, K. & F. Torreira, "The timing and construction of preference: a quantitative study", Discourse

Process., 52, pp. 255-289, 2015.

Figure 1: Floor Transfer Offset (FTO) values of gap and overlap transitions between turns by group (L1, L2). Negative values represent overlaps, positive values represent gaps.

Figure 2: Proportion of the speech signal for each group (L1, L2), according to whether there was Overlap (speech by two talkers at once), Overlap with BC (backchannels), Silence, or One Speaker speaking at a time.

Figure 3: Duration of all silent intervals, by group (L1, L2). Note the logarithmic scale.

Rise-fall-rise: A prosodic window on secondary QUDs

Matthijs Westera

Universitat Pompeu Fabra

There are several long-standing views of rise-fall-rise intonation (RFR), in English and related languages.

One is that RFR is a marker of secondary information (e.g., Gussenhoven 1984; Potts 2005) – see

examples (1) to (3). Another is that RFR marks the (contrastive) topic of the utterance, e.g., in (4) Fred

would be the topic and the beans the focus, and the other way around in (5) (examples from Jackendoff

1972; for formal accounts see Roberts 1996; Büring 2003). A third view on RFR is that it marks

uncertain relevance as in (6) (Ward and Hirschberg 1985) or, closely related, partial answerhood (e.g.,

Wagner et al. 2013). Can these views be reconciled? Is there a common denominator that ties the many

different uses of RFR together?

I suggest a positive answer to these questions, building on my theory of Intonational Compliance

Marking (ICM; Westera 2013, 2017, in press). The ICM theory says that boundary tones indicate

whether the speaker intends to comply (L%) with the conversation maxims relative to the main Question

Under Discussion (QUD) or not (H%). Trailing tones (L, H) of accents indicate the same, but relative to a

focus-congruent QUD (which may, but need not, be the same as the main QUD). RFR intonation (L*HL

H%) has a low trailing tone followed by a high boundary tone, which entails the presence of two distinct

QUDs: H% indicates potential non-compliance relative to the main QUD; L indicates compliance relative

to a focus-congruent QUD – and since a single utterance cannot both comply and not comply relative to

the same QUD, the two QUDs must be distinct. RFR, therefore, is predicted to be a marker of secondary

QUDs.
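The inference in this paragraph can be written out as a small check. The sketch below is a simplification for illustration, not part of the ICM theory's formal statement: tones are plain strings, the function names are hypothetical, and only the trailing tone of the last accent and the boundary tone are considered.

```python
def complies(tone: str) -> bool:
    """Low tones (L, L%) signal compliance; high tones (H, H%) suspend it."""
    return tone.rstrip("%") == "L"


def needs_secondary_qud(trailing_tone: str, boundary_tone: str) -> bool:
    """True when the two tones disagree, so they cannot target the same QUD."""
    return complies(trailing_tone) != complies(boundary_tone)


if __name__ == "__main__":
    # Rise-fall-rise, L*HL H%: trailing L, boundary H%.
    print(needs_secondary_qud("L", "H%"))
    # Plain fall, H*L L%: both tones indicate compliance.
    print(needs_secondary_qud("L", "L%"))
```

When the two tones agree, the focus-congruent QUD may simply be the main QUD; when they disagree, as in RFR, a single QUD would have to be both complied with and not complied with, so two distinct QUDs are required.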

The foregoing predicts that, to understand a certain usage of RFR, we should try to understand what the

secondary QUD is, i.e., which question is being completely resolved while some other, main QUD is left

open. In the poster presentation I want to walk through a number of examples, including the ones on the

next page, that are representative of the aforementioned main strands in the literature on RFR. I hope to

show that each of these examples snaps into place once an analysis in terms of a secondary QUD is

considered.

Acknowledgment

This project has received funding from the European Research Council (ERC) under the European

Union’s Horizon 2020 research and innovation programme (grant agreement No 715154). This paper

reflects the author’s view only, and the EU is not responsible for any use that may be made of the

information it contains.

Examples

(1) B: John, who is a vegetarian, envies Fred.

L*H H% L*HL H% H*L L%

(2) B: John – he’s a vegetarian – envies Fred.

L*H H% L*HL H% H*L L%

(3) B: On an unrelated note, Fred ate the beans.

L*HL H% H*L H*L L%

(4) A: What about Fred, what did he eat?

B: Fred, ate the beans.

L*HL H% H*L L%

(5) A: What about the beans, who had those?

B: Fred ate the beans…

H*L L*HL H%

(6) A: Have you ever been West of the Mississippi?

B: I’ve been to Missouri…

L*HL H%

(7) A: So I guess you like [æ]pricots then?

B: I don’t like [æ]pricots – I like [ei]pricots!

L*HL H% H*L L%

(8) B: As for Fred, he ate the beans.

L*HL H% H*L L%

References

Büring, D. (2003). On d-trees, beans, and b-accents. Linguistics and Philosophy 26, 511–545.

Gussenhoven, C. (1984). On the grammar and semantics of sentence accents. Number 16 in Publications in

Language Sciences. Walter de Gruyter.

Jackendoff, R. S. (1972). Semantic interpretation in generative grammar. Number 2 in Current Studies in

Linguistics. Cambridge, MA: MIT Press.

Potts, C. (2005). The logic of conventional implicatures. Oxford University Press.

Roberts, C. (1996). Information structure in discourse. In J. Yoon and A. Kathol (Eds.), OSU Working

Papers in Linguistics, Volume 49, pp. 91–136. Ohio State University.

Wagner, M., E. McClay, and L. Mak (2013). Incomplete answers and the rise-fall-rise contour. In R.

Fernández and A. Isard (Eds.), Proceedings of the Seventeenth Workshop on the Semantics and Pragmatics

of Dialogue (SemDial 17), Volume 17.

Ward, G. and J. Hirschberg (1985). Implicating uncertainty: the pragmatics of fall-rise intonation.

Language 61.4, 747–776.

Westera, M. (2013). ‘Attention, I’m violating a maxim!’ A unifying account of the final rise. In R.

Fernández and A. Isard (Eds.), Proceedings of SemDial 17.

Westera, M. (2017). Exhaustivity and intonation: a unified theory. PhD thesis, submitted to ILLC,

University of Amsterdam.

Westera, M. (in press). Rising declaratives of the Quality-suspending kind. To appear in Glossa.
