Bayesian Analysis for Natural Language Processing, Lecture 1 (transcript)
Source: homepages.inf.ed.ac.uk/scohen/bayesian/lecture1.pdf
Bayesian Analysis for Natural Language Processing, Lecture 1
Shay Cohen
January 28, 2013
Overview
- Bayesian analysis for NLP has been catching on over the last decade
- Before that, Bayesian analysis in NLP amounted to "MAP estimation"
- Bayesian statistics, in general, is an approach to doing statistics
  - As opposed to classical statistics
This class
- Discuss some of the recent advances in Bayesian NLP
- Some of the topics to touch on:
  - Bayesian analysis in general
  - Priors
  - Inference (sampling, variational, etc.)
  - Bayesian NLP models (generative models, nonparametric models, etc.)
  - Other things you request or want to read
- Prerequisites: probability, basic statistical principles, and some general knowledge of NLP.
Administrivia
- Class schedule: Mondays, 10:10-12:00
- Things to do in the seminar:
  - Read papers / other material
  - Lead paper discussions
  - Participate in discussions
  - White paper (maybe? probably not.)
- Office hours: right after class (or email me)
Administrivia
- Please look around for three papers that you want to read here
- Send them to me by email
- You will lead a discussion on one or more of these papers
- You can use slides if you feel better supported that way
Homework for next class
- I will give out a manuscript about Bayesian priors in NLP
- You should read it and send me an email by Saturday, 10pm, with:
  - At least two or three questions that you have about the material (more are welcome); or
  - Points that you noticed about the topic that you think others should be aware of.
- We will discuss these in class
Topics for today
- What is the main idea behind the Bayesian approach?
- Bayes' theorem and its use in Bayesian inference
- Bayesian updating
- Bayesian decision theory
- Hidden variables
- Maximum likelihood and maximum a posteriori estimation
In general, today’s goal is to play with Bayes’ theorem in many ways!
Feel free to interrupt to ask questions!
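In the spirit of "playing with Bayes' theorem", here is a minimal worked instance. The two-coin setup is an illustrative example, not from the slides: one fair coin, one with P(heads) = 0.7, chosen uniformly at random; after observing a single heads, what is the posterior probability that the biased coin was chosen?

```python
# Illustrative example (not from the slides): two coins, one fair and one
# with P(heads) = 0.7, chosen uniformly at random. We observe one heads.

prior_biased = 0.5        # P(biased coin chosen)
prior_fair = 0.5          # P(fair coin chosen)
lik_heads_biased = 0.7    # P(heads | biased)
lik_heads_fair = 0.5      # P(heads | fair)

# Bayes' theorem: P(biased | heads) = P(heads | biased) P(biased) / P(heads)
evidence = lik_heads_biased * prior_biased + lik_heads_fair * prior_fair
posterior_biased = lik_heads_biased * prior_biased / evidence

print(round(posterior_biased, 3))  # 7/12 ≈ 0.583
```

The single observation shifts the belief from 0.5 to 7/12: evidence moves the prior, but only modestly after one toss.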
Uniform prior, coin with 0.7 prob. for heads
[Plot: density over prob. of heads]
Posterior after 10 tosses, coin with 0.7 prob. for heads
[Plot: density over prob. of heads]
Posterior after 100 tosses, coin with 0.7 prob. for heads

[Plot: density over prob. of heads]
Posterior after 1000 tosses, coin with 0.7 prob. for heads

[Plot: density over prob. of heads]
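The progression in these plots can be sketched in code: a uniform prior over the heads probability is a Beta(1, 1) distribution, and after observing tosses of the 0.7-heads coin the posterior is Beta(1 + heads, 1 + tails). The true heads probability of 0.7 is from the slides; the seed and the choice to summarise by mean and standard deviation are illustrative.

```python
import math
import random

def beta_mean_std(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)

random.seed(0)
p_true = 0.7      # true heads probability, as in the slides
a, b = 1.0, 1.0   # uniform prior = Beta(1, 1)

checkpoints = {}
for n in range(1, 1001):
    if random.random() < p_true:  # simulate one coin toss
        a += 1                    # conjugate update for heads
    else:
        b += 1                    # conjugate update for tails
    if n in (10, 100, 1000):
        checkpoints[n] = beta_mean_std(a, b)

for n, (mean, std) in sorted(checkpoints.items()):
    print(f"after {n:4d} tosses: posterior mean {mean:.3f}, std {std:.3f}")
```

As in the plots, the posterior mean drifts toward 0.7 and the standard deviation shrinks as tosses accumulate: the densities narrow around the true value.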
Non-uniform prior, coin with 0.7 prob. for heads
[Plot: density over prob. of heads]
Posterior after 10 tosses, coin with 0.7 prob. for heads
[Plot: density over prob. of heads]
Posterior after 100 tosses, coin with 0.7 prob. for heads

[Plot: density over prob. of heads]
Posterior after 1000 tosses, coin with 0.7 prob. for heads

[Plot: density over prob. of heads]
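These non-uniform-prior plots connect to the maximum likelihood and MAP estimation listed in today's topics: with a non-uniform Beta prior, the MAP estimate (the posterior mode) is pulled toward the prior when data is scarce and approaches the MLE as data accumulates. The Beta(2, 5) prior and the idealised head counts below are assumptions for illustration; the slides do not specify which prior was plotted.

```python
# Illustrative comparison of MLE and MAP for the coin. The Beta(2, 5) prior
# is an assumed example (favouring tails); head counts are idealised at the
# true rate of 0.7.

def mle(heads, n):
    """Maximum-likelihood estimate of the heads probability: heads / n."""
    return heads / n

def map_estimate(heads, n, a, b):
    """MAP estimate under a Beta(a, b) prior: the mode of the posterior
    Beta(a + heads, b + n - heads), i.e. (a + heads - 1) / (a + b + n - 2)."""
    return (a + heads - 1) / (a + b + n - 2)

a, b = 2.0, 5.0  # assumed prior, favouring low heads probabilities
for heads, n in [(7, 10), (700, 1000)]:
    print(f"n={n:4d}: MLE {mle(heads, n):.3f}, MAP {map_estimate(heads, n, a, b):.3f}")
```

With 10 tosses the MAP estimate (8/15 ≈ 0.533) sits well below the MLE of 0.7; with 1000 tosses it is 701/1005 ≈ 0.698, and the prior has essentially washed out, mirroring how the posterior densities converge to the uniform-prior case.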
Modeling with latent variables
Why is Bayesian statistics now often used with incomplete data in NLP?

- Discriminative models do best in the supervised case
- Priors play a much more important role in the unsupervised case
- Parameters are latent variables, so it is easy to add more latent variables
Summary
Advantages of the Bayesian approach:

- Managing uncertainty over the parameters as a distribution (diversity)
- Simple theory, elegant inference (in theory!)
- Incorporating prior knowledge through the prior distribution

"Disadvantage": you always need to pick a prior

- Almost...