Summarizing Email Conversations with Clue Words
Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou
Department of Computer Science, Univ. of British Columbia
Motivations of Email Summarization
Email overload
– 40~60 emails per day or even more…
Personal information repository
Email summarization can be helpful
– Two examples: meetings, and accessing emails from mobile devices.
Outline
Characteristics of email
Related work
Our summarization approach
Experimental results
Conclusions and future work
Characteristics of Emails
Conversation structure
– Context related: reply to the previous messages. (>60%)
Hidden email
– A hidden email is an email quoted by at least one email in a folder but is not present itself in the same folder.
Writing style
– Short length, informal writing, multiple authors, etc.
[Figure: four example emails, m1 to m4, composed of new and quoted fragments A to G; quotation depth is marked with ">" and "> >".]
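The hidden-email definition above can be illustrated with a small sketch. It assumes quoted lines start with ">" and that quoted text matches its source line exactly; both are simplifications, and the function name `find_hidden_fragments` is hypothetical rather than taken from the talk.

```python
def find_hidden_fragments(folder):
    """Return lines that are quoted somewhere in the folder but never
    appear as new (unquoted) text in any message of the same folder.

    folder: dict mapping a message id to its body text.
    Assumption: a quoted line starts with one or more '>' markers.
    """
    new_lines, quoted_lines = set(), set()
    for body in folder.values():
        for line in body.splitlines():
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith(">"):
                # Drop every leading '>' and space to recover the quoted text.
                text = stripped.lstrip("> ").strip()
                if text:
                    quoted_lines.add(text)
            else:
                new_lines.add(stripped)
    return sorted(quoted_lines - new_lines)


folder = {
    "m1": "A\nB",
    "m2": "> A\n> B\nC",
    "m3": "> D\n> > A\nE",
}
hidden = find_hidden_fragments(folder)
```

Here "D" is quoted in one message but never written as new text in the folder, so it is reported as coming from a hidden email. Real folders would likely need fuzzier text matching than exact line equality.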
Requirements for Email Summarization
Conversation structure
– Context information is provided.
Information completeness
– Include hidden emails as well as existing messages.
Informative summarization
– Cover the core points of the email discussion.
– Can serve as a replacement for the original emails.
Related Work
Multi-Document Summarization (MDS)
– Extractive: MEAD, MMR-MD.
– Abstractive/Generative: MultiGen, SEA
Email summarization
– Single email summarization (Muresan et al.)
– Summarizing email threads by sentence selection (Rambow et al. and Wan et al.)
Related Work
[Table: feature comparison. Column groups: MDS methods, Email summarization, Our method. Columns: MEAD & MMR-MD, MultiGen, SEA, Muresan et al., Rambow et al., Wan et al., Hidden Email.]
Hidden email: x
Conv. structure
– Thread: x x x
– Quotation analysis: x
Informative summary
– Sentence selection: x x x x x
– Lang. gen.: x x
Outline
Characteristics of email
Related work
Our summarization approach
– Fragment quotation graph
– ClueWordSummarizer (CWS)
Result
Conclusions and future work
Framework
Input: a set of emails
Output: email summaries
Process:
– Discover and represent email conversations as fragment quotation graphs.
– ClueWordSummarizer generates email summaries.
Conversation Structure - Fragment Quotation Graph
Complications of email conversation:
– Header information
  E.g., subject, in-reply-to, and references.
  Not accurate enough.
– Quotation
  A good indication of email conversation (Yeh et al.).
  Selective quotations reflect the conversation in detail.
– Assumption: quotation → conversation
Build a fragment quotation graph to represent the email conversation.
Fragment Quotation Graph
Create nodes
– Compare quotations and new messages
– a, b, c, d, e, f, g, h, i, j.
Create edges
– Neighbouring quotations
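The two steps above can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it assumes each email has already been split into fragments labelled with a quotation depth (0 for new text, 1 for ">", 2 for "> >"), and it links two neighbouring fragments only when their depths differ by exactly one. The function name and input format are hypothetical.

```python
from collections import defaultdict


def build_fragment_quotation_graph(emails):
    """Build a toy fragment quotation graph.

    emails: list of emails; each email is a list of (depth, fragment_id)
            blocks in the order they appear in the message text.
    Returns (nodes, children), where children[p] lists the fragments
    that reply to fragment p.
    Assumption: adjacent blocks whose depths differ by one are linked,
    with the more deeply quoted (older) block as the parent.
    """
    nodes = set()
    children = defaultdict(set)
    for blocks in emails:
        for _depth, frag in blocks:
            nodes.add(frag)
        # Walk over neighbouring blocks inside one email.
        for (d1, f1), (d2, f2) in zip(blocks, blocks[1:]):
            if d1 == d2 + 1:
                children[f1].add(f2)
            elif d2 == d1 + 1:
                children[f2].add(f1)
    return nodes, {p: sorted(c) for p, c in children.items()}


emails = [
    [(0, "a")],
    [(1, "a"), (0, "b")],
    [(2, "a"), (1, "b"), (0, "c")],
]
nodes, children = build_fragment_quotation_graph(emails)
```

In this toy input, fragment "b" replies to "a" and "c" replies to "b", so the graph is the chain a, b, c. Identifying the fragments themselves (the "compare quotations and new messages" step) is taken as given here.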
ClueWordSummarizer
Clue words in the fragment quotation graph
– A clue word in node (fragment) F is a word which also appears in a semantically similar form in a parent or a child node of F in the fragment quotation graph.
– E.g., [example figure]
ClueWordSummarizer
Three types of clue words
– Root/stem: settle vs. settlement
– Synonym/antonym: war vs. peace
– Loose semantic meaning: Friday vs. deadline
ClueWordSummarizer
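1. ClueScore(CW, F)
– A word CW is in a sentence S of a fragment F:

$$\mathrm{ClueScore}(CW, F) = \sum_{\mathrm{parent}(F)} \mathrm{freq}(CW, \mathrm{parent}(F)) + \sum_{\mathrm{child}(F)} \mathrm{freq}(CW, \mathrm{child}(F))$$

– ClueScore(discussed, a) = 1
– ClueScore(settle, b) = 2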
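A minimal sketch of ClueScore(CW, F), read as the number of times CW occurs in the parent and child fragments of F. The graph layout (`parents`, `children`, `words` dictionaries) and the toy stemmer `toy_stem` are assumptions made for illustration; a real system would use a proper stemmer.

```python
from collections import Counter


def toy_stem(word):
    """Hypothetical stand-in for a stemmer: lowercase and drop a '-ment' suffix."""
    word = word.lower()
    return word[:-4] if word.endswith("ment") else word


def clue_score(word, fragment, parents, children, words, normalize=toy_stem):
    """Count occurrences of `word` (after normalization) in the parent and
    child fragments of `fragment`.

    parents, children: dicts mapping a fragment id to a list of fragment ids.
    words: dict mapping a fragment id to its list of tokens.
    """
    target = normalize(word)
    neighbours = list(parents.get(fragment, [])) + list(children.get(fragment, []))
    total = 0
    for n in neighbours:
        counts = Counter(normalize(w) for w in words[n])
        total += counts[target]
    return total


words = {
    "a": ["we", "discussed", "the", "settlement"],
    "b": ["settle", "it", "they", "discussed"],
    "c": ["settle", "now"],
}
parents = {"b": ["a"], "c": ["b"]}
children = {"a": ["b"], "b": ["c"]}

score_discussed_a = clue_score("discussed", "a", parents, children, words)
score_settle_b = clue_score("settle", "b", parents, children, words)
```

With this toy data, "discussed" in fragment a matches one occurrence in its child b, and "settle" in fragment b matches "settlement" in its parent a plus "settle" in its child c. Only the root/stem type of clue word is covered; synonyms and looser semantic matches would need an external lexical resource.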
ClueWordSummarizer
2. ClueScore of a sentence S in fragment F:

$$\mathrm{ClueScore}(S) = \sum_{CW \in S} \mathrm{ClueScore}(CW, F)$$

3. For each conversation, rank all of the sentences based on their ClueScores.
4. Select the top-k sentences as the summary.
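Steps 2 to 4 can be sketched as below. The per-word score is passed in as a function so the block stands alone; summing over every token of the sentence (rather than over distinct words) and breaking ties by original position are assumptions, and the names are hypothetical.

```python
def summarize(sentences, word_score, k):
    """Rank sentences by the sum of their word scores and keep the top k.

    sentences: list of (sentence_id, fragment_id, tokens) tuples.
    word_score: callable (word, fragment_id) -> number, e.g. a ClueScore.
    k: number of sentences to keep.
    Ties are broken by original position (assumption).
    """
    scored = []
    for position, (sid, frag, tokens) in enumerate(sentences):
        total = sum(word_score(w, frag) for w in tokens)
        # Negate the score so a plain ascending sort puts the best first.
        scored.append((-total, position, sid))
    scored.sort()
    return [sid for _, _, sid in scored[:k]]


# Toy per-word scores, standing in for real ClueScore values.
toy_scores = {("settle", "b"): 2, ("discussed", "a"): 1}


def toy_word_score(word, fragment):
    return toy_scores.get((word, fragment), 0)


sentences = [
    ("s1", "a", ["we", "discussed", "it"]),
    ("s2", "b", ["please", "settle"]),
    ("s3", "b", ["thanks"]),
]
summary = summarize(sentences, toy_word_score, k=2)
```

Here the sentence scores are 1, 2 and 0, so the two-sentence summary is s2 followed by s1.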
Outline
Characteristics of email
Related work
Our summarization approach
Result
– User study
– Empirical experiments
Conclusions and future work
Result 1: User Study
Objective:
– Gold standard
– How humans summarize email conversations
Setup
– Dataset: 20 conversations from the Enron dataset
– Human reviewers: 25 graduate/undergraduate students at UBC
– Each sentence is evaluated by 5 different human reviewers.
– Reviewers select important sentences and mark the crucially important ones.
Gold standard
– A gold sentence needs 4 selections, at least 2 of which mark it as essentially important.
– 88 “gold” sentences out of the 20 conversations (12%).
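The gold-standard rule can be written as a small predicate. This sketch reads "4 selections" as "at least 4 of the 5 reviewers selected the sentence", which is an interpretation; the label names and thresholds are hypothetical parameters.

```python
def is_gold(votes, min_selected=4, min_essential=2):
    """Decide whether a sentence is a gold sentence.

    votes: one entry per reviewer, either "essential", "important",
           or None when the reviewer did not select the sentence.
    Assumption: the thresholds are lower bounds.
    """
    selected = sum(v is not None for v in votes)
    essential = sum(v == "essential" for v in votes)
    return selected >= min_selected and essential >= min_essential


gold_example = is_gold(["essential", "essential", "important", "important", None])
too_few_essential = is_gold(["essential", "important", "important", "important", None])
too_few_selected = is_gold(["essential", "essential", "important", None, None])
```

The first sentence qualifies; the second has only one "essential" mark, and the third is selected by only three reviewers.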
Result 1: User Study
Information completeness
– 18% of gold sentences come from hidden emails.
– Hidden emails carry crucial information as well.
Significance of clue words
– Clue words appear more frequently in the 88 gold sentences.
– Average ratio of ClueScore in gold sentences to ClueScore in non-gold sentences: 3.9
Result 2: Empirical Experiments
RIPPER
– A machine learning classifier
– Classifies each sentence as in the summary or not.
– 14 features (Rambow et al.): linguistic and email specific.
– Sentence/conversation level training
– 10-fold cross validation
CWS & MEAD
– The same summary length (2%) as that of RIPPER.
Result 2: Empirical Experiments (CWS vs. MEAD)
sumLen = 15%
CWS has a higher accuracy.
P-value:
– 0.077 (precision)
– 0.049 (recall)
– 0.053 (F-measure)
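For reference, precision, recall and F-measure of a summary against the gold sentences can be computed as in this sketch. It uses the standard set-based definitions with the balanced F-measure; whether the study used exactly this variant is an assumption, and the sentence ids are made up.

```python
def precision_recall_f1(selected, gold):
    """Compare a set of selected sentence ids with the gold sentence ids.

    Returns (precision, recall, f1); each value is 0.0 when undefined.
    """
    selected, gold = set(selected), set(gold)
    hits = len(selected & gold)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f = precision_recall_f1(["s1", "s2", "s3", "s4"], ["s2", "s4", "s9"])
```

In this example two of the four selected sentences are gold, and two of the three gold sentences are found, giving precision 1/2, recall 2/3 and F-measure 4/7.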
Result 2: Empirical Experiments (CWS vs. MEAD)
CWS has a higher accuracy when sumLen <= 30%.
MEAD is more accurate when sumLen = 40% and higher.
Clue words are significant in important sentences.
Result 2: Empirical Experiments (Fragment quotation graph)
Conclusions and Future Work
Conclusions
– The conversation structure is important and deserves more attention.
– Fragment quotation graph
– Clue words and ClueWordSummarizer
– Empirical evaluation
  Clue words frequently appear in important sentences.
  CWS is accurate.
Future Work
Refine the fragment quotation graph
User studies on different datasets
Try other ML classifiers
Integrate CWS and other methods
…
Thank you!
Questions?