

LightSIDE

Carolyn Penstein Rosé
Language Technologies Institute
Human-Computer Interaction Institute
School of Computer Science

With funding from the National Science Foundation and the Office of Naval Research


lightsidelabs.com/research/


Click here to load a file


Select Heteroglossia as the predicted category


Make sure the text field is selected as the field to extract text features from
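Outside the LightSIDE interface, the equivalent setup might look like the sketch below; the file name and column names ("discussion_data.csv", "Heteroglossia", "text") are placeholders for illustration, not part of the original slides:

# Hypothetical analogue of the loading step: read a CSV, then pick out the
# predicted category column and the text column.  All names are made up.
import pandas as pd

data = pd.read_csv("discussion_data.csv")    # one row per conversational contribution
y = data["Heteroglossia"]                    # the category to be predicted
texts = data["text"]                         # the field to extract text features from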

Punctuation can be a “stand-in” for mood: “you think the answer is 9?” versus “you think the answer is 9.”

Bigrams capture simple lexical patterns: “common denominator” versus “common multiple”

Trigrams are just like bigrams, but with three words next to each other: “Carnegie Mellon University”

POS bigrams capture syntactic or stylistic information: “the answer which is …” versus “which is the answer”

Line length can be a proxy for explanation depth
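A rough sketch of how similar surface features could be computed outside LightSIDE, using scikit-learn (an assumed analogue of LightSIDE's extractor, not its implementation; the example texts are made up):

# Unigram/bigram/trigram features plus two simple surface features.
from sklearn.feature_extraction.text import CountVectorizer

texts = ["you think the answer is 9?",
         "the common denominator of 4 and 6 is 12."]

# ngram_range=(1, 3) gives unigrams, bigrams, and trigrams in one vocabulary.
# The crude whitespace token pattern keeps "9?" and "9." distinct, so final
# punctuation can act as a stand-in for mood.
vectorizer = CountVectorizer(ngram_range=(1, 3), token_pattern=r"\S+")
X = vectorizer.fit_transform(texts)

# Line length as a proxy for explanation depth.
line_lengths = [len(t.split()) for t in texts]

# POS bigrams would additionally require a part-of-speech tagger
# (e.g., nltk.pos_tag) before counting adjacent tag pairs.
print(sorted(vectorizer.vocabulary_)[:10])
print(line_lengths)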

Feature Space Customizations

“Contains non-stop word” can be a predictor of whether a conversational contribution is contentful: “ok sure” versus “the common denominator”

“Remove stop words” removes some distracting features. Stemming allows some generalization: multiple, multiply, multiplication.

Removing rare features is a cheap form of feature selection: features that only occur once or twice in the corpus won’t generalize, so they are a waste of time to include in the vector space.
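A sketch of the same customizations with common Python tools (assumed analogues of LightSIDE's checkboxes, not its implementation):

# Stop word removal, stemming, and rare-feature removal in one vectorizer.
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

stemmer = PorterStemmer()

def stemmed_tokens(text):
    # Stemming strips suffixes so related forms (multiple, multiply,
    # multiplication) move toward a shared stem and can pool their evidence.
    return [stemmer.stem(tok) for tok in text.split()]

vectorizer = CountVectorizer(
    tokenizer=stemmed_tokens,
    stop_words="english",  # drop distracting stop-word features
    min_df=3,              # drop features seen in fewer than 3 documents,
)                          # a cheap form of feature selection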

Feature Space Customizations

Think like a computer! Machine learning algorithms look for features that are good predictors, not features that are necessarily meaningful.

Look for approximations: if you want to find questions, you don’t need to do a complete syntactic analysis. Look for question marks, or look for wh-terms that occur immediately before an auxiliary verb.
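A rough question detector along these lines might look like the sketch below: no full syntactic analysis, just question marks and wh-word + auxiliary patterns (an illustration of the approximation idea, not code shipped with LightSIDE):

import re

WH_AUX = re.compile(
    r"\b(who|what|when|where|why|which|how)\s+"
    r"(is|are|was|were|do|does|did|can|could|will|would|should)\b",
    re.IGNORECASE,
)

def looks_like_question(text: str) -> bool:
    # Question mark at the end, or a wh-term immediately before an auxiliary verb.
    return text.strip().endswith("?") or bool(WH_AUX.search(text))

print(looks_like_question("you think the answer is 9?"))             # True
print(looks_like_question("how do we find the common denominator"))  # True
print(looks_like_question("the common denominator is 12."))          # False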


Click to extract text features


Select Logistic Regression as the Learner


Evaluate result by cross validation over sessions
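A sketch of the analogous evaluation outside LightSIDE: a logistic regression learner scored by cross-validation that keeps each session's instances together in one fold. The feature matrix, labels, and session IDs below are random placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((20, 5))                   # placeholder feature matrix (e.g., n-gram counts)
y = rng.integers(0, 2, 20)                # placeholder binary labels (e.g., Heteroglossia)
session_ids = np.repeat(np.arange(5), 4)  # 5 sessions, 4 contributions each

learner = LogisticRegression(max_iter=1000)
scores = cross_val_score(learner, X, y,
                         groups=session_ids,        # fold by session, not by row
                         cv=GroupKFold(n_splits=5))
print(scores.mean())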


Run the experiment


Stretchy Patterns (Gianfortoni, Adamson, & Rosé, 2011)

A stretchy pattern is a sequence of 1 to 6 categories

It may include GAPs: a GAP can cover any symbol, and GAP+ may cover any number of symbols

A pattern must not begin or end with a GAP
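To make the definition concrete, here is a minimal sketch of matching a stretchy pattern against a token sequence. Literal words stand in for categories, GAP covers exactly one token, and GAP+ covers one or more; this illustrates the idea, not the LightSIDE plugin's code.

def matches(pattern, tokens):
    def helper(p, t):
        if not p:
            return True          # whole pattern consumed: match
        if not t:
            return False         # pattern left over but no tokens remain
        head, rest = p[0], p[1:]
        if head == "GAP":
            return helper(rest, t[1:])                     # cover exactly one token
        if head == "GAP+":
            return any(helper(rest, t[i:])                 # cover one or more tokens
                       for i in range(1, len(t) + 1))
        return head == t[0] and helper(rest, t[1:])        # literal category match
    # a pattern may match starting at any position in the token sequence
    return any(helper(pattern, tokens[i:]) for i in range(len(tokens)))

print(matches(["would", "GAP+", "interesting"],
              "it would have been more interesting".split()))  # True
print(matches(["would", "GAP+", "interesting"],
              "it is an interesting scene".split()))            # False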


Now it’s your turn! We’ll explore some advanced features and error analysis after the break!

Error Analysis Process

Identify large error cells

Make comparisons:

Ask yourself how an error instance is similar to the instances that were correctly classified with the same class (vertical comparison)

Ask how it is different from the instances of the class it was incorrectly not classified as (horizontal comparison)

[Confusion matrix figure with Positive and Negative cells]
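A small sketch of the first step, using invented toy labels: build the confusion matrix and locate the largest off-diagonal (error) cell, which tells you which mistakes to examine first.

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos", "neg", "neg", "neg"]

labels = ["pos", "neg"]
cm = confusion_matrix(y_true, y_pred, labels=labels)

# Zero out the diagonal (correct predictions) and find the largest error cell.
errors = cm.copy()
np.fill_diagonal(errors, 0)
i, j = np.unravel_index(errors.argmax(), errors.shape)
print(f"Largest error cell: true={labels[i]}, predicted={labels[j]}, count={errors[i, j]}")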

Error Analysis on Development Set

[Slides 20–25: figures only]


Positive: is interesting, an interesting scene

Negative: would have been more interesting, potentially interesting, etc.

What’s different?
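As a toy way to see what is different: the unigram “interesting” fires for every one of these phrases, so it cannot separate them, whereas features that include the surrounding context (bigrams here, or the stretchy patterns sketched earlier) can. The phrases below are taken from the slide; everything else is illustrative.

from sklearn.feature_extraction.text import CountVectorizer

positive = ["is interesting", "an interesting scene"]
negative = ["would have been more interesting", "potentially interesting"]
phrases = positive + negative

# Unigram view: "interesting" is present in every phrase.
uni = CountVectorizer(ngram_range=(1, 1))
X_uni = uni.fit_transform(phrases)
print(X_uni[:, uni.vocabulary_["interesting"]].toarray().ravel())   # [1 1 1 1]

# Bigram view: "is interesting" vs "more interesting" / "potentially interesting"
# are distinct features, so the context becomes visible to the learner.
bi = CountVectorizer(ngram_range=(2, 2))
bi.fit(phrases)
print(bi.get_feature_names_out())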


* Note that in this case we get no benefit if we use feature selection over the original feature space.

Feature Splitting (Daumé III, 2007)


[Figure: each feature is duplicated into a General copy plus Domain A and Domain B copies]

Why is this nonlinear?

It represents the interaction between each feature and the Domain variable

Now that the feature space represents the nonlinearity, the algorithm to train the weights can be linear.
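A compact sketch of the feature splitting idea from Daumé III (2007): each feature vector is expanded into a general copy plus a copy for its own domain (zeros for the other domain), so a linear learner can fit feature-by-domain interactions. The domain names and example vector below are placeholders.

import numpy as np

def augment(x, domain, domains=("A", "B")):
    """Return the concatenation of a general copy of x and one copy per domain."""
    parts = [x]                                   # general (shared) copy
    for d in domains:
        parts.append(x if d == domain else np.zeros_like(x))
    return np.concatenate(parts)

x = np.array([1.0, 0.0, 2.0])                     # some original feature vector
print(augment(x, "A"))   # [1. 0. 2.  1. 0. 2.  0. 0. 0.]
print(augment(x, "B"))   # [1. 0. 2.  0. 0. 0.  1. 0. 2.]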

Healthcare Bill Dataset

[Slides 35–43: figures only]
