A Hierarchical Nonparametric Bayesian Approach to Statistical Language Model
Domain Adaptation
Frank Wood and Yee Whye Teh AISTATS 2009
Presented by: Mingyuan Zhou
Duke University, ECE
December 18, 2009
Outline
• Background
• Pitman-Yor Process
• Hierarchical Pitman-Yor Process Language Models
• Doubly Hierarchical Pitman-Yor Process Language Model
• Inference
• Experimental results
• Summary
Background: Language modeling and n-Gram models
• “A language model is usually formulated as a probability distribution p(s) over strings s that attempts to reflect how frequently a string s occurs as a sentence”.
• n-Gram (n=2: bigram, n=3: trigram)
• Smoothing:
Reference: S.F. Chen and J.T Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Computer Science Group, Harvard University.
• Example
• Smoothing
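As a rough illustration of the n-gram and smoothing bullets, here is a minimal bigram model with add-delta (additive) smoothing. The smoothing method, the toy corpus, and the names `train_bigram`, `delta`, `<s>` and `</s>` are assumptions chosen for brevity; the slide's own example and formula did not survive, and Chen and Goodman survey several stronger methods.

```python
from collections import Counter

def train_bigram(sentences, delta=1.0):
    """Return (prob, vocab), where prob(word, prev) is an add-delta smoothed bigram estimate."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        vocab.update(tokens[1:])  # "<s>" is only ever a context, never predicted
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab_size = len(vocab)

    def prob(word, prev):
        # (c(prev, word) + delta) / (c(prev) + delta * |V|)
        return (bigram_counts[(prev, word)] + delta) / (
            context_counts[prev] + delta * vocab_size
        )

    return prob, vocab

# Toy corpus, made up for illustration only.
corpus = ["john read a book", "mary read a paper", "john wrote a book"]
prob, vocab = train_bigram(corpus)
print(prob("book", "a"), prob("mary", "a"))
```

With smoothing, a bigram never seen in training (such as "a mary") still receives non-zero probability, and the estimates for each context sum to one over the vocabulary.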
• Evaluation
• Train the n-Gram model:
• Calculate:
• Cross-entropy:
• Perplexity:
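The evaluation steps can be sketched as follows, assuming the definitions used by Chen and Goodman: the test-set probability p(T) is the product of the sentence probabilities, the cross-entropy is H_p(T) = -(1/W_T) log2 p(T) with W_T the number of words in T, and the perplexity is 2^{H_p(T)}. The function names and the example probabilities are illustrative, not taken from the slides.

```python
import math

def cross_entropy(sentence_probs, num_words):
    """H_p(T) = -(1 / W_T) * log2 p(T), with p(T) the product of sentence probabilities."""
    log2_p_test = sum(math.log2(p) for p in sentence_probs)  # log2 of the product
    return -log2_p_test / num_words

def perplexity(sentence_probs, num_words):
    """PP_p(T) = 2 ** H_p(T); lower is better."""
    return 2.0 ** cross_entropy(sentence_probs, num_words)

# Two hypothetical test sentences with model probabilities 1/8 and 1/32,
# containing 4 words in total.
h = cross_entropy([1 / 8, 1 / 32], num_words=4)
pp = perplexity([1 / 8, 1 / 32], num_words=4)
print(h, pp)
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on realistic test sets.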
Dirichlet Process and Pitman-Yor Process
• Dirichlet Process
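Number of unique words grows as $O(\alpha \log n)$, for $n$ words drawn and concentration parameter $\alpha$
• Pitman-Yor Process
Number of unique words grows as $O(\alpha n^{d})$, with discount parameter $0 \le d < 1$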
• When d=0, Pitman-Yor Process reduces to DP
• Both can be understood through the Chinese Restaurant process
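$G \sim \mathrm{DP}(\alpha, G_0)$

Seating probabilities for a new customer, given $t$ occupied tables with $c_k$ customers at table $k$:

• Sitting at table $k$
  DP: $c_k / (\alpha + \sum_{j=1}^{t} c_j)$
  Pitman-Yor: $(c_k - d) / (\alpha + \sum_{j=1}^{t} c_j)$
• Sitting at a new table
  DP: $\alpha / (\alpha + \sum_{j=1}^{t} c_j)$
  Pitman-Yor: $(\alpha + d\,t) / (\alpha + \sum_{j=1}^{t} c_j)$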
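The Chinese restaurant seating rule can be written as a small function. This is a sketch in which `alpha` names the concentration parameter and `d` the discount (both names are assumptions); setting `d = 0` gives the Dirichlet process case.

```python
def seating_probs(counts, alpha, d=0.0):
    """Return ([P(sit at table k) for each k], P(sit at a new table)).

    counts: customers per occupied table, c_1..c_t
    alpha:  concentration parameter (alpha > -d)
    d:      discount parameter, 0 <= d < 1 (d = 0 is the Dirichlet process)
    """
    t = len(counts)
    total = alpha + sum(counts)
    existing = [(c - d) / total for c in counts]  # (c_k - d) / (alpha + sum_j c_j)
    new = (alpha + d * t) / total                 # (alpha + d t) / (alpha + sum_j c_j)
    return existing, new

# Two tables with 3 and 1 customers.
dp_existing, dp_new = seating_probs([3, 1], alpha=1.0)          # Dirichlet process
py_existing, py_new = seating_probs([3, 1], alpha=1.0, d=0.5)   # Pitman-Yor
print(dp_existing, dp_new)
print(py_existing, py_new)
```

The discount takes probability mass away from every occupied table and gives it to the new table, which is why the Pitman-Yor process opens new tables more readily as the number of tables grows.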
Power-law properties of the Pitman-Yor Process
[Figure, two panels. Left: number of unique words vs. number of words, for d = 0, d = 0.5, d = 0.9. Right: proportion of words appearing once vs. number of words, for d = 0, d = 0.5, d = 0.9.]
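The qualitative behaviour in these plots can be reproduced with a short Chinese restaurant simulation, where each table stands for a unique word. This is only a sketch: `simulate_tables`, the seed, `alpha = 1.0` and `n = 2000` are arbitrary assumptions, and the second printed quantity is the fraction of tables holding a single customer.

```python
import random

def simulate_tables(n, alpha, d, seed=0):
    """Seat n customers by the Pitman-Yor Chinese restaurant process; return table counts."""
    rng = random.Random(seed)
    counts = []
    for seated in range(n):
        total = alpha + seated  # alpha + number of customers already seated
        p_new = (alpha + d * len(counts)) / total
        if rng.random() < p_new:
            counts.append(1)  # open a new table (a new unique word)
        else:
            # join table k with probability proportional to (c_k - d)
            weights = [c - d for c in counts]
            k = rng.choices(range(len(counts)), weights=weights)[0]
            counts[k] += 1
    return counts

results = {}
for d in (0.0, 0.5, 0.9):
    counts = simulate_tables(2000, alpha=1.0, d=d)
    singletons = sum(1 for c in counts if c == 1) / len(counts)
    results[d] = (len(counts), singletons)
    print(d, len(counts), round(singletons, 3))
```

Larger discounts should produce many more tables, most of them with a single customer, which is the power-law behaviour that motivates using the Pitman-Yor process for word counts.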
Hierarchical Pitman-Yor Process Language Models
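For orientation, the usual formulation of a hierarchical Pitman-Yor language model places a Pitman-Yor prior on the word distribution $G_{\mathbf{u}}$ of each context $\mathbf{u}$, with the distribution of the shortened context $\pi(\mathbf{u})$ as its base distribution. The notation below is an assumption, not taken from the slide: $c_{\mathbf{u}w}$ and $t_{\mathbf{u}w}$ are the customer and table counts for word $w$ in restaurant $\mathbf{u}$, a dot denotes summation over words, and $d_{|\mathbf{u}|}$, $\alpha_{|\mathbf{u}|}$ are the discount and concentration shared by contexts of the same length.

```latex
G_{\mathbf{u}} \mid d_{|\mathbf{u}|}, \alpha_{|\mathbf{u}|}, G_{\pi(\mathbf{u})}
  \sim \mathrm{PY}\!\left(d_{|\mathbf{u}|}, \alpha_{|\mathbf{u}|}, G_{\pi(\mathbf{u})}\right)

P(w \mid \mathbf{u}) =
  \frac{c_{\mathbf{u}w} - d_{|\mathbf{u}|}\, t_{\mathbf{u}w}}
       {\alpha_{|\mathbf{u}|} + c_{\mathbf{u}\cdot}}
  + \frac{\alpha_{|\mathbf{u}|} + d_{|\mathbf{u}|}\, t_{\mathbf{u}\cdot}}
         {\alpha_{|\mathbf{u}|} + c_{\mathbf{u}\cdot}}\,
    P(w \mid \pi(\mathbf{u}))
```

The first term is a discounted count estimate and the second backs off to the shorter context, which mirrors the two Pitman-Yor seating probabilities (join an existing table, or open a new one).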
Doubly Hierarchical Pitman-Yor Process Language Model
Inference
• Dirichlet Process, Chinese Restaurant Process
• Hierarchical Dirichlet Process, Chinese Restaurant Franchise
• Pitman-Yor Process, Chinese Restaurant Process
• Hierarchical Pitman-Yor Process, Chinese Restaurant Franchise
• Doubly Hierarchical Pitman-Yor Language Model, Graphical Pitman-Yor Process, Multi-floor Chinese Restaurant Process, Multi-floor Chinese Restaurant Franchise
Experimental results (HPYLM)
Experimental results (DHPYLM)
Summary
• DHPYLM achieves encouraging domain adaptation results.
• A graphical Pitman-Yor process is constructed, and a multi-floor Chinese restaurant representation is proposed for sampling.
• DHPYLM may be integrated into topic models to eliminate “bag-of-words” assumptions.