latent lstm allocation · 2018-12-30 · zaheer, m., ahmed, a., and smola, a. j. (2017). latent...
Latent LSTM Allocation
Manzil Zaheer, Amr Ahmed and Alexander J Smola
Presented by Akshay Budhkar & Krishnapriya Vishnubhotla
March 3, 2018
Manzil Zaheer, Amr Ahmed and Alexander J Smola (Presented by Akshay Budhkar & Krishnapriya Vishnubhotla)Latent LSTM Allocation March 3, 2018 1 / 22
Outline
1 Introduction
    Latent Dirichlet Allocation
    LSTMs
2 Latent LSTM Allocation
    Algorithm
    Inference
    Different Models
3 Results
4 Conclusion
Latent Dirichlet Allocation
Probabilistic graphical model
Not sequential, but easily interpretable.
LSTMs
Good for modeling sequential data, preserves temporal aspect
Too many parameters
Hard to interpret
Latent LSTM Allocation (LLA) - Algorithm
Graphical model for LLA
Marginal probability of observing a document is

$$
p(w_d \mid \mathrm{LSTM}, \phi) = \sum_{z_d} p(w_d, z_d \mid \mathrm{LSTM}, \phi) = \sum_{z_d} \prod_t p(w_{d,t} \mid z_{d,t}; \phi)\, p(z_{d,t} \mid z_{d,1:t-1}; \mathrm{LSTM}) \tag{1}
$$
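Uses a K × H dense matrix (topic embeddings fed to the LSTM) and a V × K sparse matrix (the topic-word distributions φ), where K is the number of topics, H the LSTM hidden size, and V the vocabulary size.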
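To make the factorization in Eq. (1) concrete, here is a minimal sketch that evaluates the marginal by brute force on a toy model. Everything in it is an assumption for illustration: the sizes, the `phi` values, and especially `next_topic_probs`, a hand-written, history-dependent function that merely stands in for the LSTM. Enumerating all K^T topic paths is only feasible at toy scale.

```python
import itertools

K, V, T = 2, 3, 3  # topics, vocabulary size, document length (toy values)

# phi[k][w] = p(word w | topic k); each row sums to 1 (made-up numbers).
phi = [[0.7, 0.2, 0.1],
       [0.1, 0.3, 0.6]]


def next_topic_probs(history):
    """Hypothetical stand-in for the LSTM: p(z_t | z_{1:t-1})."""
    if not history:
        return [1.0 / K] * K
    # Count how long the most recent topic has been repeated.
    run = 1
    while run < len(history) and history[-run - 1] == history[-1]:
        run += 1
    # The longer the run, the more likely the topic is to persist.
    stay = 1.0 - 0.4 / run
    probs = [(1.0 - stay) / (K - 1)] * K
    probs[history[-1]] = stay
    return probs


def marginal(words):
    """Eq. (1): sum over all topic paths of the per-step products."""
    total = 0.0
    for z in itertools.product(range(K), repeat=len(words)):
        joint = 1.0
        for t, (w, k) in enumerate(zip(words, z)):
            joint *= phi[k][w] * next_topic_probs(z[:t])[k]
        total += joint
    return total


print(marginal([0, 0, 2]))
```

Because each factor is a proper conditional distribution, the marginals of all V^T possible documents of length T sum to one, which is a quick sanity check on the factorization.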
Inference
Stochastic Expectation Maximization is used to compute the posterior.
The Evidence Lower Bound (ELBO) can be written as:

$$
\sum_d \log p(w_d \mid \mathrm{LSTM}, \phi) \ge \sum_d \sum_{z_d} q(z_d) \log \frac{p(z_d; \mathrm{LSTM}) \prod_t p(w_{d,t} \mid z_{d,t}; \phi)}{q(z_d)} \tag{2}
$$
Conditional probability of the topic at time step t is:

$$
p(z_{d,t} = k \mid w_{d,t}, z_{d,1:t-1}; \mathrm{LSTM}, \phi) \propto p(z_{d,t} = k \mid z_{d,1:t-1}; \mathrm{LSTM})\, p(w_{d,t} \mid z_{d,t} = k; \phi) \tag{3}
$$
And

$$
p(w_{d,t} \mid z_{d,t} = k; \phi) = \phi_{w,k} = \frac{n_{w,k} + \beta}{n_k + V\beta} \tag{4}
$$
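The topic conditional and the smoothed word likelihood above can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: the counts in `n_wk`, the value of `beta`, and the `prior` vector (which stands in for the LSTM's predictive distribution over the next topic) are all made up.

```python
import random

V, K = 4, 2
beta = 0.1  # symmetric smoothing parameter (toy value)

# n_wk[w][k]: how often word w is currently assigned to topic k (made-up counts).
n_wk = [[5, 0],
        [3, 1],
        [0, 4],
        [1, 6]]
# n_k[k]: total number of tokens assigned to topic k.
n_k = [sum(n_wk[w][k] for w in range(V)) for k in range(K)]


def phi(w, k):
    """Smoothed estimate of p(word w | topic k) from the counts."""
    return (n_wk[w][k] + beta) / (n_k[k] + V * beta)


def topic_posterior(w, prior):
    """Normalize prior[k] * phi(w, k) over topics.

    `prior` is a placeholder for the LSTM output p(z_t = k | z_{1:t-1}).
    """
    weights = [prior[k] * phi(w, k) for k in range(K)]
    total = sum(weights)
    return [x / total for x in weights]


def sample_topic(w, prior, rng=random):
    """Draw one topic index from the normalized posterior."""
    return rng.choices(range(K), weights=topic_posterior(w, prior), k=1)[0]


post = topic_posterior(w=2, prior=[0.5, 0.5])
z = sample_topic(2, [0.5, 0.5], random.Random(0))
print(post, z)
```

With a flat prior, word 2 is pulled strongly toward topic 1 because all of its observed counts sit there; a peaked prior from the sequence model could override that.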
Mathematical Intuition
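LDA

$$
\log p(w) = \sum_t \log p(w_t \mid \text{model}) = \sum_t \log \sum_{z_t} p(w_t \mid z_t)\, p(z_t \mid \text{doc}) \tag{5}
$$

LSTM

$$
\log p(w) = \sum_t \log p(w_t \mid w_{t-1}, w_{t-2}, \ldots, w_1) \tag{6}
$$

LLA

$$
\log p(w) = \log \sum_{z_{1:T}} \prod_t p(w_t \mid z_t)\, p(z_t \mid z_{t-1}, z_{t-2}, \ldots, z_1) \tag{7}
$$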
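One consequence of the LDA form in Eq. (5) is that the likelihood ignores word order, since each token contributes an independent term. A small sketch with assumed toy parameters (`theta` for p(z | doc) and `phi` for p(w | z) are invented values) shows this:

```python
import math

# Toy parameters, assumed for illustration only.
theta = [0.6, 0.4]        # p(z | doc) for two topics
phi = [[0.7, 0.2, 0.1],   # p(w | z = 0)
       [0.1, 0.3, 0.6]]   # p(w | z = 1)


def lda_log_likelihood(words):
    """Eq. (5): sum over tokens of log sum_z p(w_t | z) p(z | doc)."""
    return sum(
        math.log(sum(phi[k][w] * theta[k] for k in range(len(theta))))
        for w in words
    )


doc = [0, 0, 2, 1]
shuffled = [2, 1, 0, 0]  # same words, different order
print(lda_log_likelihood(doc), lda_log_likelihood(shuffled))
```

The two values agree up to floating-point rounding. In the LSTM and LLA forms, each term is conditioned on the preceding words or topics, so reordering a document generally changes its likelihood.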
Different Models
Perplexity vs. Number of topics (Wikipedia)
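For reference, perplexity is conventionally the exponential of the negative mean per-token log-likelihood, so lower is better. The sketch below assumes natural-log token probabilities are already available; the function name and the uniform-model example are illustrative and not taken from the slides.

```python
import math


def perplexity(token_log_probs):
    """exp of the negative mean per-token natural-log probability."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that spreads probability uniformly over V words has perplexity V
# (up to floating-point rounding).
V = 50
uniform = [math.log(1.0 / V)] * 10
print(perplexity(uniform))
```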
Perplexity vs. Number of topics (User Search)
Cannot use Char LLA, since URLs lack morphological structure
LDA Ablation Study
Interpreting Cleaner Topics
Interpreting Factored Topics
LSTM Topic Embedding (Wikipedia)
Convergence Speed
Effect of Joint vs. Independent Training
Final Thoughts
Pros
Provides a knob to trade off interpretability against accuracy
Fewer parameters for a reasonable perplexity
Cleaner factored topics
Cons
Did not compare to something like hierarchical LDA
Can’t use Char LLA for every problem
Perplexity is not a good measure of text generation accuracy
Bibliography
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022.
Galley, M., Brockett, C., Sordoni, A., Ji, Y., Auli, M., Quirk, C., Mitchell, M., Gao, J., and Dolan, B. (2015). deltaBLEU: A discriminative metric for generation tasks with intrinsically diverse targets. arXiv preprint arXiv:1506.06863.
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8):1735–1780.
Zaheer, M., Ahmed, A., and Smola, A. J. (2017). Latent LSTM allocation: Joint clustering and non-linear dynamic modeling of sequence data. In International Conference on Machine Learning, pages 3967–3976.