Computational Models of Discourse Analysis
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
Warm-Up
Read the posts and be ready to discuss what you see as the takeaways for computationalization of discourse analysis from today’s readings…
What are the computational implications of the debate between DA and CA?
Note: These preparatory activities were rated as least beneficial to students, so…
We will start the lecture/discussion at exactly 12:05pm.
Please be on time and ready to discuss!
Early Course Evaluation
Good news: everyone rated the lectures/discussions as valuable and engaging
Things to improve: Decrease preparation time
Change: only one discussion thread per week, used throughout the week, with different response options that require different amounts of time
Change: frontload readings for Monday, further divide readings into required, extra, and supplemental
More focus on fewer concepts for the remainder of the semester
This week:
* Not required!!!
Sections 7.2-7.7 are most important
Chicken and Egg…
Operationalization Computationalization
Main issue for this week: Exploring sequencing and linking between speech acts in conversation
* Where do the ordering constraints come from? Is it the language? Or is it what is behind the language (e.g., intentions, task structure)? If the latter, how do we computationalize that?
Reminder from last time re: Constraint from Ordering
Inform is the most common class (37.4%)
Next most frequent is Assess (18.5%)
With bigrams, if we look for conditional probabilities above 25%:
The only case where the most likely next class is not Inform is Elicit-Assessment, which is followed by Assessment 36% of the time
It is followed by Inform 33% of the time
It only occurs about 1% of the time
Trigrams might be better, but this makes ordering information look pretty useless
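The bigram check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`bigram_conditional_probs`, `most_likely_next`) and the toy label sequences are invented here and are not the course data, so the numbers it produces will not match the percentages on the slide.

```python
from collections import Counter, defaultdict

def bigram_conditional_probs(sequences):
    """Estimate P(next_label | current_label) from sequences of speech act labels."""
    pair_counts = defaultdict(Counter)
    for labels in sequences:
        # Pair each label with the label that immediately follows it.
        for current, following in zip(labels, labels[1:]):
            pair_counts[current][following] += 1
    probs = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs

def most_likely_next(probs, threshold=0.25):
    """Keep, per label, the most probable next label if its probability exceeds the threshold."""
    result = {}
    for current, followers in probs.items():
        nxt, p = max(followers.items(), key=lambda item: item[1])
        if p > threshold:
            result[current] = (nxt, p)
    return result

# Toy, made-up conversations (assumption: one label per contribution).
toy_conversations = [
    ["Inform", "Inform", "Elicit-Assessment", "Assessment", "Inform"],
    ["Inform", "Assessment", "Inform", "Inform"],
]

probs = bigram_conditional_probs(toy_conversations)
best = most_likely_next(probs)
for current, (nxt, p) in sorted(best.items()):
    print(f"{current} -> {nxt}: {p:.2f}")
```

If most rows of such a table point back to the majority class, as the slide reports for Inform, the bigram context adds little beyond the class prior, which is the sense in which ordering information "looks pretty useless" here.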
More on what was least valuable (student quotes)
The forum prompt mini-assignments seem unbalanced in proportion to the homework - by the time the "real" homework came along, I felt I had done ten times more work on my posts already.
The Homework
•Nice job on the homeworks!!!
•I saw SO much improvement over the several posts and finally the assignment.
Assignment 2 (not due until Feb 23)
Look at the Maptask dataset and Negotiation coding that is provided
Think about what distinguishes the codes at a linguistic level
Do an error analysis on the dataset using a simple unigram baseline, and from that propose one or a few new types of features motivated by your linguistic understanding of the Negotiation framework
Due on Week 7 lecture 2
Turn in data, your feature extractors (documented code), and a formal write-up of your experimentation
Have a 5-minute PowerPoint presentation ready for class on Week 7 lecture 2
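As a rough picture of what "a simple unigram baseline" plus an error analysis could look like, here is a minimal sketch. Everything in it is an assumption made for illustration: the classifier (a small multinomial Naive Bayes with add-one smoothing) is one possible baseline and not a method prescribed by the assignment, the helper names are invented, and the utterances and labels (`A2`, `K1`, `K2`) are made-up placeholders rather than rows from the provided Maptask coding.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercased whitespace tokens serve as the unigram features."""
    return text.lower().split()

def train_unigram_nb(examples):
    """examples: list of (utterance, label) pairs. Returns the counts the model needs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the label with the highest add-one-smoothed log probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def error_analysis(model, examples):
    """Count (gold, predicted) pairs and collect the misclassified utterances."""
    confusion = Counter()
    errors = []
    for text, gold in examples:
        pred = predict(model, text)
        confusion[(gold, pred)] += 1
        if pred != gold:
            errors.append((text, gold, pred))
    return confusion, errors

# Made-up training and held-out utterances with placeholder labels.
train = [
    ("go left of the mill", "A2"),
    ("turn right at the bridge", "A2"),
    ("the mill is on my map", "K1"),
    ("i have a bridge here", "K1"),
    ("where is the mill", "K2"),
    ("do you have a bridge", "K2"),
]
held_out = [
    ("go right at the mill", "A2"),
    ("where is the bridge", "K2"),
    ("is the mill left of the bridge", "K2"),
]

model = train_unigram_nb(train)
confusion, errors = error_analysis(model, held_out)
for text, gold, pred in errors:
    print(f"gold={gold} pred={pred}: {text}")
```

In this toy run the question "is the mill left of the bridge" is pulled toward the instruction label because it shares direction words with the instructions. Reading through errors of this kind is what would motivate a new feature type, for example one that captures word order or the position of the verb rather than word identity alone.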
Interesting Observation!
Responses can address either illocutions or perlocutions
Perlocutions are much less constrained
Accounts for some of the difficulty in imposing ordering constraints
Argues in favor of thinking about conversation as organized around intentions and tasks rather than linguistic categories
Wednesday’s readings will argue just the opposite!!
Are illocutions just the wrong categories??
Discourse Analysis vs Conversation Analysis (according to Levinson)
Rules, formulas, more typical of linguistics and philosophers
Categories, contingencies, grammars
Use of a small but strategic amount of data
Accused of “premature” theory construction
Martin & Rose, Levinson
More rigorously empirical and inductive
Focus on what is found in data, not on what is expected to be found or would sound odd
Hesitant to make generalizations / Accused of being atheoretical
Questions about whether the rules “work” on real data
* Is it a question about the nature of language (is there a fundamental segmentation difference between utterances and acts?), or is it a question about research methodology? Are these linked?
Do you see a connection with semisupervised learning?
The nature of what we are modeling
How we learn what we know
What we can know about it and how certain we can be
Rules, like speech acts
Qualitative observations, anthropology style
… And now for Elijah’s SIDE presentation
Questions?