
Bayesian Analysis for Natural Language Processing
Lecture 1

Shay Cohen

January 28, 2013


Overview

- Bayesian analysis for NLP has been catching on over the last decade
- Before that, Bayesian analysis in NLP amounted to "MAP estimation"
- Bayesian Statistics, in general, is an approach to doing Statistics
  - As opposed to classical Statistics


This class

- Discuss some of the recent advances in Bayesian NLP
- Some of the topics to touch on:
  - Bayesian analysis in general
  - Priors
  - Inference (sampling, variational, etc.)
  - Bayesian NLP models (generative models, nonparametric models, etc.)
  - Other things you request or want to read
- Prerequisites: probability, basic statistical principles, and some general knowledge of NLP.


Administrativia

- Class schedule: Mondays, 10:10-12:00
- Things to do in the seminar:
  - Read papers / other material
  - Lead paper discussions
  - Participate in discussions
  - White paper (maybe? probably not.)
- Office hours: right after class (or email me)


Administrativia

- Please look around for three papers that you want to read here
- Send them to me by email
- You will lead a discussion on one or more of these papers
- You can use slides if you feel better supported this way


Homework for next class

- I will give you a manuscript about Bayesian priors in NLP
- You should read it and email me by Saturday, 10pm:
  - At least two or three questions that you have about the material (more are welcome); or
  - Points that you noticed about the topic and that you think others should be aware of.
- We will discuss these in class


Topics for today

- What is the main idea behind the Bayesian approach?
- Bayes' theorem and its use in Bayesian inference
- Bayesian updating
- Bayesian decision theory
- Hidden variables
- Maximum likelihood and maximum a posteriori estimation

In general, today’s goal is to play with Bayes’ theorem in many ways!

Feel free to interrupt to ask questions!
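As a concrete warm-up for the coin-toss plots that follow, here is a minimal worked version of Bayes' theorem, Bayesian updating, and the MLE/MAP estimates for a coin with heads probability θ, assuming a Beta(α, β) prior (the Beta prior is my illustrative choice; the slides only say "uniform" and "non-uniform"). With k heads observed in n tosses:

\[
p(\theta \mid x) \;=\; \frac{p(x \mid \theta)\, p(\theta)}{p(x)}
\;\propto\; \theta^{k}(1-\theta)^{\,n-k}\cdot \theta^{\alpha-1}(1-\theta)^{\beta-1},
\qquad\text{so}\qquad
\theta \mid x \;\sim\; \mathrm{Beta}(\alpha+k,\ \beta+n-k).
\]
\[
\hat{\theta}_{\mathrm{MLE}} = \frac{k}{n},
\qquad
\hat{\theta}_{\mathrm{MAP}} = \frac{k+\alpha-1}{n+\alpha+\beta-2}
\quad(\text{the two coincide when } \alpha=\beta=1).
\]

Bayesian updating is repeated application of this rule: the posterior after one batch of tosses becomes the prior for the next, which is what the following plots show as the number of tosses grows from 10 to 1000.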


Uniform prior, coin with 0.7 prob. for heads

[Plot: prior density vs. prob. of heads on [0, 1]; the uniform prior is flat at density 1.]


Posterior after 10 tosses, coin with 0.7 prob. for heads

[Plot: posterior density vs. prob. of heads.]


Posterior after 100 tosses, coin with 0.7 prob. for heads

[Plot: posterior density vs. prob. of heads.]


Posterior after 1000 tosses, coin with 0.7 prob. for heads

[Plot: posterior density vs. prob. of heads.]


Non-uniform prior, coin with 0.7 prob. for heads

[Plot: prior density vs. prob. of heads.]


Posterior after 10 tosses, coin with 0.7 prob. for heads

[Plot: posterior density vs. prob. of heads.]


Posterior after 100 tosses, coin with 0.7 prob. for heads

[Plot: posterior density vs. prob. of heads.]


Posterior after 1000 tosses, coin with 0.7 prob. for heads

[Plot: posterior density vs. prob. of heads.]
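The eight plots above can be reproduced with a conjugate Beta-Binomial model. Below is a minimal sketch (my reconstruction, not the original plotting code): Beta(1, 1) is the uniform prior of the first sequence, and Beta(2, 8) stands in for the non-uniform prior of the second sequence, whose exact parameters are not recoverable from the slides.

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
TRUE_P = 0.7                                   # coin with 0.7 prob. for heads

def plot_posterior(alpha, beta, n_tosses, ax):
    # Simulate n_tosses flips of the biased coin and count heads.
    k = int((rng.random(n_tosses) < TRUE_P).sum())
    # Conjugate update: Beta(alpha, beta) prior -> Beta(alpha + k, beta + n - k) posterior.
    posterior = stats.beta(alpha + k, beta + n_tosses - k)
    theta = np.linspace(0.0, 1.0, 500)
    ax.plot(theta, posterior.pdf(theta))
    ax.set_xlabel("prob. of heads")
    ax.set_ylabel("density")
    ax.set_title(f"{n_tosses} tosses, Beta({alpha}, {beta}) prior")

fig, axes = plt.subplots(2, 3, figsize=(12, 6))
for row, (alpha, beta) in enumerate([(1, 1), (2, 8)]):   # uniform vs. illustrative non-uniform prior
    for col, n in enumerate([10, 100, 1000]):
        plot_posterior(alpha, beta, n, axes[row][col])
fig.tight_layout()
plt.show()

With only 10 tosses the two priors give visibly different posteriors; by 1000 tosses both concentrate sharply around 0.7, which is the qualitative message of the slide sequence.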


Modeling with latent variables

Why is Bayesian statistics now often used with incomplete data in NLP?

- Discriminative models do best in the supervised case
- Priors play a much more important role in the unsupervised case
- Parameters are latent variables; it is easy to add more latent variables (a small sketch follows)
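As an illustration of that last point (a hypothetical toy model of mine, not one from the lecture), here is the generative story of a Bayesian unigram mixture written as sampling code: the parameters are themselves drawn from Dirichlet priors, so they sit alongside the per-document cluster labels as latent variables, and adding more latent structure is just another sampling statement.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and hyperparameters (illustrative values only).
VOCAB = ["the", "cat", "dog", "runs", "sleeps"]
K, ALPHA, BETA = 2, 1.0, 0.5

# Parameters are latent variables: they are sampled from priors, not fixed.
pi = rng.dirichlet(ALPHA * np.ones(K))                  # cluster proportions
phi = rng.dirichlet(BETA * np.ones(len(VOCAB)), K)      # per-cluster word distributions

def generate_document(length=5):
    z = rng.choice(K, p=pi)                             # latent cluster label for the document
    words = rng.choice(VOCAB, size=length, p=phi[z])    # observed words
    return z, list(words)

print([generate_document() for _ in range(3)])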


Summary

Advantages of the Bayesian approach:

- Managing uncertainty over the parameters as a distribution (diversity)
- Simple theory, elegant inference (in theory!)
- Incorporate prior knowledge through the prior distribution

"Disadvantage": you always need to pick a prior
- Almost...