Computational Models of Discourse Analysis

Carolyn Penstein Rosé

Language Technologies Institute / Human-Computer Interaction Institute

Warm-Up

Read the posts and be ready to discuss what you see as the takeaways for computationalization of discourse analysis from today’s readings…

What are the computational implications of the debate between DA and CA?

Note: These preparatory activities were rated as least beneficial to students, so…

We will start the lecture/discussion at exactly 12:05pm.

Please be on time and ready to discuss!

Early Course Evaluation

Good news: everyone rated the lectures/discussions as valuable and engaging

Things to improve: Decrease preparation time

Change: only one discussion thread per week, but continue to use it throughout the week, include different options for response that require different amounts of time

Change: frontload readings for Monday, further divide readings into required, extra, and supplemental

More focus on fewer concepts for the remainder of the semester

This week:

* Not required!!!

Sections 7.2-7.7 are most important

Chicken and Egg…

Operationalization Computationalization

Main issue for this week: Exploring sequencing and linking between speech acts in conversation

* Where do the ordering constraints come from? Is it the language? Or is it what is behind the language (e.g., intentions, task structure)? If the latter, how do we computationalize that?

Reminder from last time RE Constraint from Ordering

Inform is the most common class (37.4%)

Next most frequent is Assess (18.5%)

With bigrams, if we look for conditional probabilities above 25%:

The only case where the most likely next class is not Inform is Elicit-Assessment, which is followed by Assessment 36% of the time

It is followed by Inform 33% of the time

It only occurs about 1% of the time

Trigrams might be better, but this makes ordering information look pretty useless
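The bigram check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label sequences below are toy data invented for the example (not the corpus behind the percentages above), and the function names `bigram_conditionals` and `informative_contexts` are hypothetical helpers.

```python
from collections import Counter, defaultdict

def bigram_conditionals(sequences):
    """Estimate P(next_label | current_label) from sequences of act labels."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        # Pair each label with the label that immediately follows it.
        for cur, nxt in zip(seq, seq[1:]):
            pair_counts[cur][nxt] += 1
    probs = {}
    for cur, nxt_counts in pair_counts.items():
        total = sum(nxt_counts.values())
        probs[cur] = {nxt: n / total for nxt, n in nxt_counts.items()}
    return probs

def informative_contexts(probs, majority_label):
    """Return contexts whose most likely successor is NOT the majority class."""
    result = {}
    for cur, dist in probs.items():
        best = max(dist, key=dist.get)
        if best != majority_label:
            result[cur] = (best, dist[best])
    return result

# Toy label sequences (assumption: invented for illustration only).
toy = [
    ["Inform", "Inform", "Elicit-Assessment", "Assess", "Inform"],
    ["Inform", "Assess", "Inform", "Inform"],
    ["Elicit-Assessment", "Assess", "Inform", "Inform"],
]

probs = bigram_conditionals(toy)
# Majority class over all labels, i.e. the unigram baseline prediction.
majority = Counter(label for seq in toy for label in seq).most_common(1)[0][0]
print(majority)
print(informative_contexts(probs, majority))
```

If the only contexts returned are rare ones, then always predicting the majority class is hard to beat with bigrams alone, which is the point made on this slide.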

More on what was least valuable (student quotes)

The forum prompt mini-assignments seem unbalanced in proportion to the homework - by the time the "real" homework came along, I felt I had done ten times more work on my posts already.

The Homework

• Nice job on the homeworks!!!

• I saw SO much improvement over the several posts and finally the assignment.

Assignment 2 (not due until Feb 23)

Look at the Maptask dataset and Negotiation coding that is provided

Think about what distinguishes the codes at a linguistic level

Do an error analysis on the dataset using a simple unigram baseline, and from that propose one or a few new types of features motivated by your linguistic understanding of the Negotiation framework

Due on Week 7 lecture 2

Turn in data, your feature extractors (documented code), and a formal write-up of your experimentation

Have a 5 minute PowerPoint presentation ready for class on Week 7 lecture 2
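One possible shape for the unigram baseline and error analysis is sketched below. Everything here is an assumption made for illustration: the utterances are invented, the labels `request` and `give` are placeholders rather than the Negotiation codes provided with the assignment, and the classifier is a plain multinomial Naive Bayes with add-one smoothing, which is just one reasonable choice of baseline.

```python
import math
from collections import Counter, defaultdict

def train_unigram_nb(examples):
    """Train a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest smoothed log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def error_analysis(model, examples):
    """Count (gold, predicted) pairs and collect misclassified examples."""
    confusion = Counter()
    errors = []
    for text, gold in examples:
        pred = predict(model, text)
        confusion[(gold, pred)] += 1
        if pred != gold:
            errors.append((text, gold, pred))
    return confusion, errors

# Toy data (assumption: invented utterances with placeholder labels).
train = [
    ("where is the mill", "request"),
    ("do you have a bridge", "request"),
    ("the mill is on the left", "give"),
    ("go past the bridge", "give"),
]
heldout = [
    ("where do you go", "request"),
    ("the bridge is on the left", "give"),
    ("is the mill on the left", "request"),
]

model = train_unigram_nb(train)
confusion, errors = error_analysis(model, heldout)
for text, gold, pred in errors:
    print(f"gold={gold} pred={pred} text={text!r}")
```

Reading through the misclassified examples is the step that should suggest new features. In this toy case the error is a question that shares all its words with a statement and differs only in word order, which a bag of unigrams cannot see.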

Interesting Observation!

Responses can address either illocutions or perlocutions

Perlocutions are much less constrained

Accounts for some of the difficulty in imposing ordering constraints

Argues in favor of thinking about conversation as organized around intentions and tasks rather than linguistic categories

Wednesday’s readings will argue just the opposite!!

Are illocutions just the wrong categories??

Discourse Analysis vs Conversation Analysis (according to Levinson)

Rules, formulas, more typical of linguistics and philosophers

Categories, contingencies, grammars

Use of a small but strategic amount of data

Accused of “premature” theory construction

Martin & Rose, Levinson

More rigorously empirical and inductive

Focus on what is found in data, not on what is expected to be found or would sound odd

Hesitant to make generalizations

Accused of being atheoretical

Questions about whether the rules “work” on real data

* Is it a question about the nature of language (is there a fundamental segmentation difference between utterances and acts?), or is it a question about research methodology? Are these linked?

Do you see a connection with semisupervised learning?

The nature of what we are modeling

How we learn what we know

What we can know about it and how certain we can be

Rules, like speech acts

Qualitative observations, anthropology style

… And now for Elijah’s SIDE presentation

Questions?
