Selective Encoding for Abstractive Sentence Summarization
TRANSCRIPT
Selective Encoding for Abstractive Sentence Summarization
Qingyu Zhou, Nan Yang, Furu Wei and Ming Zhou
ACL 2017
Presenter: Kodaira Tomonori
Task: Abstractive sentence summarization
Input: a sentence. Output: a (shorter) sentence.
Figure 1.
Introduction
• This task differs from MT in two ways: 1. there is no explicit alignment relationship between input and output; 2. the task needs to keep the highlights and remove the unnecessary information.
Improvement
• Problem: In previous frameworks there is no explicit alignment relationship between the input sentence and the summary, except for extracted common words.
• Solution: Their method does not try to infer the alignment; instead, it selects the highlights while filtering out secondary information in the input.
Problem Formulation
• Input sentence: x = (x1, x2, …, xn), where xi ∈ Vs (the source vocabulary)
• Output summary: y = (y1, y2, …, yl), where l ≤ n
Model
Sentence Encoder
• bidirectional GRU
• The initial states are set to zero vectors.
• After reading the sentence, hidden states are concatenated.
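The encoder above can be sketched in NumPy. This is a minimal illustration under my own assumptions (parameter names like `Wz`/`Uz` and the `init_params` helper are not from the paper), not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, p):
    # Standard GRU cell: update gate z, reset gate r, candidate state h~.
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h + p["bz"])
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h + p["br"])
    h_tilde = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h) + p["bh"])
    return (1.0 - z) * h + z * h_tilde

def init_params(d_in, d_hid, rng):
    # Hypothetical random initialization, just to make the sketch runnable.
    p = {}
    for g in ("z", "r", "h"):
        p["W" + g] = rng.standard_normal((d_hid, d_in)) * 0.1
        p["U" + g] = rng.standard_normal((d_hid, d_hid)) * 0.1
        p["b" + g] = np.zeros(d_hid)
    return p

def bigru_encode(xs, fwd, bwd, d_hid):
    # Initial states are zero vectors; read the sentence in both
    # directions and concatenate forward/backward states per position.
    n = len(xs)
    hf, hb = np.zeros(d_hid), np.zeros(d_hid)
    fwd_states, bwd_states = [], [None] * n
    for x in xs:
        hf = gru_step(x, hf, fwd)
        fwd_states.append(hf)
    for i in range(n - 1, -1, -1):
        hb = gru_step(xs[i], hb, bwd)
        bwd_states[i] = hb
    return [np.concatenate([f, b]) for f, b in zip(fwd_states, bwd_states)]
```

Each returned vector h_i has twice the hidden size, since it stacks the forward and backward reading of the same position.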
Selective Mechanism
• Sentence representation: s = [h_backward_1 ; h_forward_n]
• sGate_i = σ(W_s h_i + U_s s + b)
• h'_i = h_i ⊙ sGate_i
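The selective gate can be sketched as follows (a NumPy illustration, assuming each encoder state row stacks the forward half before the backward half; shapes and names are my assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def selective_encode(H, Ws, Us, b):
    # H: (n, 2d) concatenated BiGRU states, one row per source word,
    # laid out as [forward_i ; backward_i].
    n, two_d = H.shape
    d = two_d // 2
    # Sentence representation s = [h_backward_1 ; h_forward_n]:
    # the backward state at position 1 and the forward state at position n
    # have each read the whole sentence.
    s = np.concatenate([H[0, d:], H[-1, :d]])
    # sGate_i = sigma(Ws h_i + Us s + b), applied row-wise.
    gates = sigmoid(H @ Ws.T + (Us @ s) + b)
    # h'_i = h_i (elementwise) sGate_i
    return H * gates, gates
```

The gate lies in (0, 1) per dimension, so it softly scales down the parts of each word's representation that the sentence-level vector deems secondary.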
Summary Decoder
• GRU: s_t = GRU(w_{t-1}, c_{t-1}, s_{t-1}), with s_0 = tanh(W_d h_backward_1 + b)
• Attention: e_{t,i} = v_a^T tanh(W_a s_{t-1} + U_a h'_i); a_{t,i} = exp(e_{t,i}) / Σ_{i=1}^n exp(e_{t,i}); c_t = Σ_{i=1}^n a_{t,i} h'_i
• Predict: r_t = W_r w_{t-1} + U_r c_t + V_r s_t; maxout m_t = [max{r_{t,2j-1}, r_{t,2j}}]^T for j = 1, …, d; p(y_t | y_1, …, y_{t-1}) = softmax(W_o m_t)
• Notation: w_t: word embedding; c_t: context vector; s_t: decoder hidden state; h'_i: selected encoder state; r_t: readout state
Objective Function
• J(θ) = −(1/|D|) Σ_{(x,y)∈D} log p(y|x), where D is a set of parallel sentence-summary pairs
• Optimizer: stochastic gradient descent
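Concretely, the objective averages the negative sequence log-likelihoods over the training pairs; each sequence log-likelihood is the sum of per-token log-probabilities. A small sketch (my own helper names, for illustration only):

```python
import numpy as np

def sequence_log_prob(token_dists, target_ids):
    # log p(y|x) = sum_t log p(y_t | y_<t, x), where token_dists[t] is the
    # decoder's output distribution at step t and target_ids[t] the gold token.
    return float(sum(np.log(dist[t]) for dist, t in zip(token_dists, target_ids)))

def objective(pairs):
    # J(theta) = -(1/|D|) * sum_{(x,y) in D} log p(y|x)
    return -np.mean([sequence_log_prob(dists, y) for dists, y in pairs])
```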
Dataset
• Training set: English Gigaword dataset (Napoles et al., 2012); training: 3.8M sentence-summary pairs; development: 189K
• Test sets: 1. English Gigaword; 2. DUC 2004; 3. MSR Abstractive Text Compression
Data Statistics
Table 2
Evaluation Metric
• ROUGE (Lin, 2004): ROUGE-1, ROUGE-2, ROUGE-L
Implementation Details
• Parameters: embedding size 300; GRU hidden state size 512; dropout (Srivastava et al., 2014) with p = 0.5
• Training: Adam (α = 0.001, β1 = 0.9, β2 = 0.999); gradient clipping to [-5, 5]
• Beam search: beam size 12
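Decoding with beam size 12 keeps the 12 highest-scoring partial summaries at each step. A generic beam-search sketch (the `step_fn` interface and toy setup are my assumptions, not the authors' code):

```python
import numpy as np

def beam_search(step_fn, start_id, eos_id, beam_size=12, max_len=20):
    # step_fn(prefix) -> log-prob vector over the vocabulary for the
    # next token. Returns the best (sequence, total log-prob) found.
    beams = [([start_id], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            logp = step_fn(seq)
            for t in np.argsort(logp)[-beam_size:]:
                candidates.append((seq + [int(t)], score + float(logp[t])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            # Sequences that emitted EOS are set aside as complete.
            (finished if seq[-1] == eos_id else beams).append((seq, score))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])
```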
Baselines
• ABS (Rush et al., 2015)
• ABS+ (Rush et al., 2015)
• CAs2s (Chopra et al., 2016)
• Feats2s (Nallapati et al., 2016)
• Luong-NMT (Luong et al., 2015)
• s2s+att: they also implement a sequence-to-sequence model with attention
English Gigaword
Table 3
DUC 2004
Table 4
MSR-ATC
Figure 5
Saliency Heat Map of Selective Gate
• They use the method of Li et al. (2016) to visualize the contribution of the selective gate to the final output.
• They approximate S_y(g) by its first-order Taylor expansion.
• They draw the Euclidean norm of the first derivative of the output y with respect to the selective gate g associated with each input word.
Figure 3
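The steps above can be sketched numerically: approximate the Jacobian dy/dg by finite differences and report the per-gate Euclidean norm as the saliency score (the real method uses backpropagated gradients; this finite-difference stand-in is my own simplification):

```python
import numpy as np

def saliency(output_fn, g, eps=1e-5):
    # First-order Taylor: S_y(g) is approximated by |dy/dg|.
    # Perturb each gate coordinate, measure the change in the output y,
    # and take the Euclidean norm of that change per coordinate.
    y0 = output_fn(g)
    scores = np.zeros(len(g))
    for i in range(len(g)):
        gp = g.copy()
        gp[i] += eps
        scores[i] = np.linalg.norm((output_fn(gp) - y0) / eps)
    return scores
```

Words whose gates barely move the output get low scores, which is what the heat map in Figure 3 visualizes.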
Conclusion
• The paper proposes a selective encoding model for abstractive sentence summarization.
• It greatly improves summarization quality on all three test sets.