Empirical Research Methods in Computer Science Lecture 7 November 30, 2005 Noah Smith


TRANSCRIPT

Page 1:

Empirical Research Methods in Computer Science

Lecture 7
November 30, 2005
Noah Smith

Page 2:

Using Data

Data → Model: estimation; regression; learning; training
Model → Action: classification; decision

Also known as: pattern classification, machine learning, statistical inference, ...

Page 3:

Probabilistic Models

Let X and Y be random variables.(continuous, discrete, structured, ...)

Goal: predict Y from X.

A model defines P(Y = y | X = x).

1. Where do models come from?
2. If we have a model, how do we use it?

Page 4:

Using a Model

We want to classify a message, x, as spam or mail: y ∈ {spam, mail}.

x → Model → P(spam | x), P(mail | x)

ŷ = spam if P(spam | x) ≥ P(mail | x); otherwise ŷ = mail.
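The decision rule on this slide can be sketched in Python; the posterior probabilities below are hypothetical placeholders, not outputs of any real model:

```python
def classify(p_spam_given_x, p_mail_given_x):
    """Return the label with the larger posterior probability."""
    return "spam" if p_spam_given_x >= p_mail_given_x else "mail"

# Hypothetical posteriors the model might assign to one message x
print(classify(0.7, 0.3))  # spam
print(classify(0.2, 0.8))  # mail
```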

Page 5:

Bayes’ Rule

P(y | x) = P(x | y) P(y) / P(x)

P(y | x): what we said the model must define
P(x | y): the likelihood, one distribution over complex observations per y
P(y): the prior
P(x): normalizes into a distribution: P(x) = Σy' P(y') P(x | y')
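As a minimal numeric sketch of Bayes' rule (the priors echo the multinomial example later in the lecture; the likelihood values are made up):

```python
def posterior(prior, likelihood, x):
    """Bayes' rule: P(y | x) = P(x | y) P(y) / P(x),
    where P(x) = sum over y' of P(y') P(x | y')."""
    evidence = sum(prior[y] * likelihood[y][x] for y in prior)
    return {y: prior[y] * likelihood[y][x] / evidence for y in prior}

prior = {"mail": 0.545, "spam": 0.455}          # P(y)
likelihood = {"mail": {"x1": 0.001},            # P(x | y), made-up values
              "spam": {"x1": 0.010}}
post = posterior(prior, likelihood, "x1")       # P(y | x1), sums to 1
```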

Page 6:

Naive Bayes Models

Suppose X = (X1, X2, X3, ..., Xm).

Let P(x | y) = ∏i=1..m P(xi | y)
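Under the naive Bayes assumption the likelihood factors over the components of x; working in log space avoids numerical underflow. The conditional probability table here is invented for illustration:

```python
import math

# Hypothetical P(x_i | y) tables for two classes
cond_prob = {
    "spam": {"cheap": 0.20, "meds": 0.10},
    "mail": {"cheap": 0.01, "meds": 0.01},
}

def log_likelihood(x, y):
    """log P(x | y) = sum over i of log P(x_i | y)."""
    return sum(math.log(cond_prob[y][xi]) for xi in x)

msg = ["cheap", "meds"]
# log P(msg | spam) = log(0.20) + log(0.10)
```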

Page 7:

Naive Bayes: Graphical Model

Y
↓
X1   X2   X3   ...   Xm

(Y is the parent of each Xi.)

Page 8:

Part II

Where do the model parameters come from?

Page 9:

Using Data

Action

Model

Data → Model: estimation; regression; learning; training

Page 10:

Warning

This is a HUGE topic. We will barely scratch the surface.

Page 11:

Forms of Models

Recall that a model defines P(x | y) and P(y).

These can have a simple multinomial form, like

P(mail) = 0.545, P(spam) = 0.455

Or they can take on some other form, like a binomial, Gaussian, etc.

Page 12:

Example: Gaussian

Suppose y is {male, female}, and one observed variable is H, height.

P(H | male) ~ N(μm, σm²)

P(H | female) ~ N(μf, σf²)

How to estimate μm, σm², μf, σf²?

Page 13:

Maximum Likelihood

Pick the model that makes the data as likely as possible

max P(data | model)

Page 14:

Maximum Likelihood (Gaussian)

Estimating the parameters μm, σm², μf, σf² can be seen as:
fitting the data
estimating an underlying statistic (point estimate)

μ̂m = (1 / #males) Σi=1..n 1[yi = male] hi

σ̂m² = (1 / #males) Σi=1..n 1[yi = male] (hi − μ̂m)²
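A sketch of these point estimates in Python, assuming the MLE convention of dividing the variance by the class count (some texts divide by the count minus one instead); the heights and labels are toy data:

```python
def gaussian_mle(heights, labels, target):
    """MLE for one class-conditional Gaussian: the class sample mean
    and the (1/N)-normalized sample variance."""
    vals = [h for h, y in zip(heights, labels) if y == target]
    n = len(vals)
    mu = sum(vals) / n
    var = sum((h - mu) ** 2 for h in vals) / n
    return mu, var

# Toy labeled heights
mu_m, var_m = gaussian_mle([5.0, 7.0, 6.0, 4.0],
                           ["male", "male", "female", "male"], "male")
```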

Page 15:

Using the model

[Figure: the fitted densities p(H | male) and p(H | female) plotted against H (x-axis 0–12, y-axis 0–1.2).]

Page 16:

Using the model

[Figure: the posteriors P(male | H) and P(female | H) plotted against H (x-axis 0–12, y-axis 0–1).]

Page 17:

Example: Regression

Suppose y is actual runtime, and x is input length.

Regression tries to predict some continuous variables from others.

Page 18:

Regression

Linear: assume linear relationship, fit a line.

We can turn this into a model!

Page 19:

Linear Model

Given x, predict y.

y = β1x + β0 + ε,  ε ~ N(0, σ²)

β1x + β0 is the true regression line; ε is the random deviation.

Page 20:

Principle of Least Squares

Minimize the sum of squared vertical deviations.

Unique, closed form solution!

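The closed-form least-squares solution can be written directly; this is the standard textbook formula, shown here on toy data:

```python
def least_squares(xs, ys):
    """Fit y = b1 * x + b0 by minimizing the sum of squared
    vertical deviations: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = least_squares([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])  # exactly y = 2x + 1
```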

Page 21:

Other kinds of regression

transform one or both variables (e.g., take a log)
polynomial regression (least squares → linear system)
multivariate regression
logistic regression

Page 22:

Example: text categorization

Bag-of-words model:
x is a histogram of counts for all words
y is a topic

P(x | y) = ∏w puni(w | y)^count(w; x)

Page 23:

MLE for Multinomials

“Count and Normalize”

p̂uni(w | y) = count(w; training) / count(*; training)
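"Count and normalize" is nearly a one-liner; the word list below stands in for all the words of the training documents labeled y:

```python
from collections import Counter

def mle_unigram(words):
    """MLE for a unigram multinomial: p(w | y) = count(w) / count(*)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

p = mle_unigram(["cheap", "cheap", "meds", "now"])  # p["cheap"] == 0.5
```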

Page 24:

The Truth about MLE

You will never see all the words.

For many models, MLE isn’t safe.

To understand why, consider a typical evaluation scenario.

Page 25:

Evaluation

Train your model on some data.

How good is the model?

Test on different data that the system never saw before. Why?

Page 26:

Tradeoff

A model that overfits the training data doesn't generalize; a model constrained to have low variance may achieve only low accuracy.

Page 27:

Text categorization again

Suppose ‘v1@gra’ never appeared in any document in training, ever.

What is this probability for a new document containing ‘v1@gra’ at test time? (Zero: the product includes a factor puni(‘v1@gra’ | y) = 0.)

P(x | y) = ∏w puni(w | y)^count(w; x)

Page 28:

Solutions

Regularization: prefer less extreme parameters.
Smoothing: “flatten out” the distribution.
Bayesian estimation: construct a prior over model parameters, then train to maximize P(data | model) × P(model).
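As one concrete instance of smoothing, add-α (Laplace) smoothing gives every vocabulary word nonzero probability; the slide does not commit to this particular scheme, and the words below are toy data:

```python
from collections import Counter

def smoothed_unigram(words, vocab, alpha=1.0):
    """Add-alpha smoothing: p(w | y) = (count(w) + alpha) / (count(*) + alpha * |V|).
    Unseen words get probability alpha / (count(*) + alpha * |V|) > 0."""
    counts = Counter(words)
    total = sum(counts.values())
    denom = total + alpha * len(vocab)
    return {w: (counts[w] + alpha) / denom for w in vocab}

p = smoothed_unigram(["cheap", "cheap", "meds"], {"cheap", "meds", "v1@gra"})
# p["v1@gra"] > 0 even though it never appeared in training
```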

Page 29:

One More Point

Building models is not the only way to be empirical:
neural networks, SVMs, instance-based learning.
MLE and smoothed/Bayesian estimation are not the only ways to estimate:
minimize error, for example (“discriminative” estimation).

Page 30:

Assignment 3

Spam detection.
We provide a few thousand examples.
Perform EDA and pick features.
Estimate probabilities.
Build a Naive Bayes classifier.