A progressive sentence selection strategy for document summarization
Presenter: Bo-Sheng Wang
Authors: You Ouyang, Wenjie Li, Renxian Zhang, Qin Lu
IPM, 2013
Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• Because the input documents contain many overlapping concepts, mentioning the same concept repeatedly in the summary is unnecessary and redundant.
Objectives
• They mainly consider the problem of how to construct summaries with good saliency and coverage.
Methodology
• In this paper, they propose a novel sentence selection strategy that selects the summary sentences progressively.
Methodology
• Step 1: Define the subsuming relationship between two sentences.
① The relationship between two sentences is determined by the relations between the concepts.
Methodology
• Step 1-1: The goal is to study the subsuming relations between the words in the input documents.
① Linguistic relation database (WordNet)
② Frequency-based statistics (co-occurrence)
Methodology
• They expect the relations to have the characteristics listed below.
1. Sentence-level coverage
2. Set-based coverage
3. Transitive reduction
Methodology-Sentence-level coverage
• In document summarization, a document set sometimes consists of only a few documents (for example, 10 documents per set in the DUC 2004 data).
• They therefore study sentence-level co-occurrence statistics instead of document-level co-occurrence.
Methodology-Set-based coverage
• Sentence-level co-occurrence is sparser than document-level co-occurrence because sentences are much shorter.
• Therefore, the sentence-level coverage of a word with respect to another word is usually much smaller.
• They examine the coverage not only between two words, but also between a word and a word set.
(For example: there are two common phrases "King Norodom" and "Prince Norodom". In the input documents, the coverage of "Norodom" with respect to either "King" or "Prince" alone is not large enough, so "Norodom" is not recognized as subsumed by either of the two. On the other hand, "Norodom" is almost entirely covered by the set {"King", "Prince"}. Therefore, with a set-based coverage, more relations can be discovered.)
Methodology-Transitive reduction
• They also conduct a transitive reduction on the relations.
That is, for three words a, b, c that satisfy a > b, b > c and a > c (where a > b denotes that a subsumes b), the long-range relation a > c is ignored, since we prefer to include the subsuming word b in the summary before including the subsumed word c.
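The reduction described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `transitive_reduction` and the representation of relations as `(a, b)` pairs meaning a > b are assumptions, and the relation graph is assumed to be acyclic.

```python
def transitive_reduction(edges):
    """Drop every relation a > c that is implied by a longer chain a > ... > c.

    `edges` is a set of (a, b) pairs meaning "a subsumes b" (hypothetical
    representation). The graph is assumed to contain no cycles.
    """
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    def reachable_without(start, goal, skipped_edge):
        # Depth-first search from `start` to `goal` that never uses `skipped_edge`.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if (node, nxt) == skipped_edge or nxt in seen:
                    continue
                if nxt == goal:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    # Keep an edge only if no alternative (longer) path connects its endpoints.
    return {(a, b) for a, b in edges if not reachable_without(a, b, (a, b))}
```

For the three-word case on the slide, passing `{("a", "b"), ("b", "c"), ("a", "c")}` leaves only the two short relations a > b and b > c.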
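Methodology-Necessary measures
• Spanned Sentence Set (SPAN): the spanned sentence set of a word $w$ in a document set $D$ with sentence set $S_D$ is defined as the set of sentences that contain $w$:
$$\mathrm{SPAN}(w) = \{\, s \mid s \in S_D \wedge w \in s \,\}$$
• Concept Coverage (COV): the coverage of a word $w$ by a word set $W = \{w_1, \dots, w_l\}$ is defined as the proportion of the sentences in $\mathrm{SPAN}(w)$ that also appear in $\mathrm{SPAN}(W) = \bigcup_i \mathrm{SPAN}(w_i)$:
$$\mathrm{COV}(w \mid W) = \frac{\left|\mathrm{SPAN}(w) \cap \bigcup_i \mathrm{SPAN}(w_i)\right|}{\left|\mathrm{SPAN}(w)\right|}$$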
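A minimal Python sketch of the two measures follows, assuming each sentence is already reduced to a set of pre-processed words. The function names `span` and `cov` and the toy sentences are illustrative assumptions. Coverage is computed as the proportion of SPAN(w) that falls inside the union of the spans of the covering words, following the verbal definition above.

```python
from typing import Iterable, List, Set


def span(word: str, sentences: List[Set[str]]) -> Set[int]:
    """SPAN(w): indices of the sentences that contain `word`."""
    return {i for i, sentence in enumerate(sentences) if word in sentence}


def cov(word: str, word_set: Iterable[str], sentences: List[Set[str]]) -> float:
    """COV(w | W): share of SPAN(w) that also lies in the union of SPAN(w_i)."""
    span_w = span(word, sentences)
    if not span_w:
        return 0.0  # a word that never occurs is covered by nothing
    span_set = set().union(*(span(w, sentences) for w in word_set))
    return len(span_w & span_set) / len(span_w)


# Toy data in the spirit of the "King Norodom" / "Prince Norodom" example.
toy_sentences = [
    {"king", "norodom"},
    {"prince", "norodom"},
    {"king", "norodom"},
    {"prince", "norodom"},
    {"norodom"},
    {"king"},
]
```

On this toy data, "norodom" is only partly covered by "king" or by "prince" alone, but mostly covered by the set {"king", "prince"}, which is the effect that set-based coverage is meant to capture.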
Methodology
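• Step 1-2:
(1) They define the concept of a "connected word". Let $W = \{w_1, \dots, w_l\}$ and $W' = \{w'_1, \dots, w'_m\}$ be the word sets of two sentences $s$ and $s'$. A word $w_i \in W$ is connected to a word $w'_j \in W'$ under the condition:
$$\exists\, w_{l_1}, \dots, w_{l_k} \in W \cup W' \ \text{s.t.}\ w_i < w_{l_1} \wedge w_{l_1} < w_{l_2} \wedge \dots \wedge w_{l_{k-1}} < w_{l_k} \wedge w_{l_k} < w'_j$$
• (2) The Conditional Saliency of $s$ given $s'$ is calculated as a weighted sum of the importance of all the "connected words":
$$\mathrm{CS}(s \mid s') = \sum_{w_i \in s} \log\Bigl(\max_{w'_j \in s'} \mathrm{CON}(w_i \mid w'_j) \cdot \mathrm{score}(w_i)\Bigr)$$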
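The "connected word" condition amounts to a reachability check over the subsuming relations, restricted to words of the two sentences. The sketch below is an illustration under stated assumptions: the name `is_connected`, the `subsumed_by` mapping (word to the set of words that directly subsume it), and the decision to also count a direct relation with no intermediate word as connected are not taken from the slides.

```python
from typing import Dict, Set


def is_connected(word: str, target: str,
                 subsumed_by: Dict[str, Set[str]],
                 allowed: Set[str]) -> bool:
    """True if a chain word < x1 < ... < xk < target exists.

    Every intermediate word x1..xk must belong to `allowed`, which plays
    the role of the union of the two sentences' word sets.
    """
    stack, seen = [word], {word}
    while stack:
        node = stack.pop()
        for parent in subsumed_by.get(node, ()):
            if parent == target:
                return True
            if parent in allowed and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False
```

For example, with relations c < b and b < a, the word c is connected to a only when b is among the allowed words.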
Methodology
• Step 2:
Methodology
• Steps:
1. Every word that is not subsumed by any other word is regarded as a general word and attached to ROOT-W.
2. The score of each unselected sentence is calculated from its conditional saliency with respect to each selected sentence.
Formula:
$$\mathrm{Score}(s \mid S_{old}) = \max_{s_t \in S_{old}} \{\mathrm{CS}(s \mid s_t)\} \cdot \frac{1}{\mathrm{len}(s)} \cdot (1 - \mathrm{pos}(s))$$
Penalizing:
$$\mathrm{score}(w_i) \leftarrow \alpha \cdot \mathrm{score}(w_i)$$
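The scoring formula and the penalty can be combined into a greedy selection loop, sketched below. Several points are assumptions made for illustration only: the `Sentence` type, the `cs` callable standing in for the conditional saliency, seeding the selected set with a pseudo-sentence that holds ROOT-W, applying the penalty to the words of each newly selected sentence, and stopping after `k` sentences. Sentences are assumed to be non-empty.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Sentence:
    words: Tuple[str, ...]  # content words after pre-processing (non-empty)
    pos: float              # relative position in its document, in [0, 1)


def sentence_score(s: Sentence, selected: List[Sentence],
                   cs: Callable[[Sentence, Sentence], float]) -> float:
    """Score(s | S_old) = max_t CS(s | s_t) * 1/len(s) * (1 - pos(s))."""
    best_cs = max(cs(s, t) for t in selected)
    return best_cs * (1.0 / len(s.words)) * (1.0 - s.pos)


def progressive_select(candidates: List[Sentence], root: Sentence,
                       cs: Callable[[Sentence, Sentence], float],
                       word_score: Dict[str, float],
                       alpha: float, k: int) -> List[Sentence]:
    selected = [root]            # assumption: S_old starts from a ROOT pseudo-sentence
    remaining = list(candidates)
    while remaining and len(selected) - 1 < k:
        best = max(remaining, key=lambda s: sentence_score(s, selected, cs))
        remaining.remove(best)
        selected.append(best)
        # Assumption: penalize the words of the sentence that was just selected.
        for w in set(best.words):
            word_score[w] = alpha * word_score.get(w, 0.0)
    return selected[1:]          # drop the pseudo-sentence
```

Because `cs` is expected to read the current `word_score` values, damping them with alpha < 1 lowers the score of later sentences that repeat words already in the summary.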
Experiments
• Steps:
① Evaluated on a generic multi-document summarization data set.
② Evaluated on a query-focused multi-document summarization data set.
• Pre-processing:
– Removing the stop-words and stemming the remaining words
Experiments-Evaluation metrics
• ROUGE
– State-of-the-art automatic summarization evaluation
– It mainly makes use of N-gram comparison.
• DUC
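To make the N-gram comparison behind ROUGE concrete, here is a simplified sketch of ROUGE-N recall against a single reference summary. It is not the official ROUGE toolkit, which also handles multiple references and further options. The function names and the whitespace-tokenized inputs are assumptions.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: List[str], reference: List[str], n: int = 1) -> float:
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    ref_counts = ngrams(reference, n)
    cand_counts = ngrams(candidate, n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total
```

For the candidate "the cat sat" and the reference "the cat sat on the mat", three of the six reference unigrams are matched, so the unigram recall is 0.5.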
Experiments-Generic summarization
Experiments-Query-focused summarization
Conclusions
• The progressive system consistently performs better than the sequential system on every data set.
• The method is competitive with the best submitted systems.
• The results demonstrate the advantages of the progressive sentence selection strategy in constructing summaries with better saliency and coverage.
Comments
• Advantages
– The method achieves better saliency and coverage.
– In the unsupervised case, finding the number of categories can save some time.
• Applications
– Object Discovery
Comments
• Advantages
– The method achieves better saliency and coverage.
• Disadvantage
– The method takes more time than traditional methods.
• Applications
– Sentence selection