Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries
Lokesh Shrestha
Reasons for Summarizing Email
Email has become a primary means of business and personal communication.
Conversations take place and decisions are made entirely through email.
Given the high volume of email each individual accumulates, how can we efficiently retrieve information from our email archives?
Summarizing Email vs. Summarizing Newswire
Email has an interactive structure.
Email can contain informal language.
Email does not consist of different, independent documents about the same topic (so this is not “multi-document summarization”).
Contributions
Email-specific features can be used for machine-learning-based extractive summarization of email threads
A novel approach to question-answer pair detection
Integrating QA pair sentences with extractive sentences improves summaries
Overview: Related Work; Corpus; Approach 1: Sentence Extraction; Approach 2: Question-Answer Pair Detection; Approach 3: Integration; Outlook Email Client; Conclusion
Related Work
Summarizing individual emails: Derek Lam, Steven L. Rohall, Chris Schmandt, and Mia K. Stern (2002): sentence extraction; Smaranda Muresan, Evelyne Tzoukermann, and Judith Klavans (2001): key phrase extraction
Summarizing discussion lists: Ani Nenkova and Amit Bagga (2003): sentence extraction; Paula Newman and John Blitzer (2003): thread topic clustering and sentence extraction
Summarizing speech dialogues: Klaus Zechner (2002): sentence extraction and QA pairs
Corpus
Columbia ACM chapter executive board mailing list
Approximately 10 regular participants
~300 threads, ~1000 messages
Threads include scheduling and planning of meetings and events, question and answer, and general discussion and chat.
Annotated by human annotators: hand-written summaries; categorization of threads and messages; highlighting of important information (such as question-answer pairs)
Sample Hand-Written Summary for Thread
Annotator 1 Summary: Alexander McCaughly asks the group if he can reschedule his C-session for Wednesday night. Raju Gupta tells McCaughly that he is able to reschedule his C-session. Reema Ramachandran reminds McCaughly that he scheduled an MS Office Session for November 14, and she asks McCaughly to confirm that he can be at that session.
Sentence Extraction
Machine learning approach to extractive summarization of email threads
Creating Training Data
Learn extractive rules
Use rules to generate summary
Sentence Extraction: Creating Training Data
Use the human-generated summaries to create a model extractive summary
Compare thread sentences with human summary sentences using SimFinder
Given a summary size, select the highest-ranked sentences
Represent each sentence by a feature vector and its class (in summary or not)
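The labeling steps above can be sketched in Python. SimFinder itself is not reproduced here: `bow_cosine` is a plain bag-of-words cosine similarity used as a stand-in, and scoring each thread sentence by its best match against any human summary sentence is an assumption about how the ranking is formed. Both function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors (a stand-in for SimFinder)."""
    ca = Counter(re.findall(r"[a-z0-9']+", a.lower()))
    cb = Counter(re.findall(r"[a-z0-9']+", b.lower()))
    dot = sum(n * cb[w] for w, n in ca.items())
    norm = math.sqrt(sum(n * n for n in ca.values())) * math.sqrt(sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


def label_thread_sentences(
    thread: List[str],
    summary: List[str],
    summary_size: float,
    sim: Callable[[str, str], float] = bow_cosine,
) -> List[Tuple[str, str]]:
    """Label each thread sentence Y (in the model extractive summary) or N."""
    # Score each thread sentence by its best match against any human summary sentence.
    scores = [max(sim(s, h) for h in summary) for s in thread]
    # Keep the top `summary_size` fraction of thread sentences, at least one.
    k = max(1, round(summary_size * len(thread)))
    ranked = sorted(range(len(thread)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:k])
    return [(s, "Y" if i in selected else "N") for i, s in enumerate(thread)]
```

The summary size is the free parameter here; the next slides describe how it was calibrated against gold-standard data.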
SimFinder in Action
Thread:
Guys, I can't come tonight. Can I reschedule my C session for Wednesday night, 11/8, at 8:00? If that's cool with you guys, please reserve me a room.
Sure we can, but that's the day after Election Day. Are you sure you want to do it then?
alex, a reminder that your scheduled to do an MSOffice session on Nov. 14, at 7pm in 252Mudd. --please confirm that you can do that session/posters
Confirmed. Intro to MS Office, then there will be three more where we'll work on the individual programs for full sessions
Human summary:
Alexander McCaughly asks the group if he can reschedule his C-session for Wednesday night.
Raju Gupta tells McCaughly that he is able to reschedule his C-session.
Reema Ramachandran reminds McCaughly that he scheduled an MS Office Session for November 14, and she asks McCaughly to confirm that he can be at that session.
SimFinder similarity scores shown for successive thread-sentence/summary-sentence comparisons: 0.0038, 0.0028, 0.0028, 0.0028, 0.983, 0.563
Scores shown on the final build of the slide: 0.0038, 0.983, 0.0038, 0.0038, 0.0038, 0.752, 0.221, 0.368
Determining Summary Size
Goal: determine the summary size the human summarizers used
Create gold-standard data manually: select about 10% of the ACM threads as gold-standard threads
Manually classify each sentence in the gold-standard threads: positive if its content is reflected in the human summary, negative otherwise
Compare SimFinder-derived classifications at various summary sizes with the gold-standard classifications
Determining Summary Size: Results
Use a summary size of 45%; the results verify the use of SimFinder.

| Summary size | Recall | Precision | F-score |
|---|---|---|---|
| 20% | 0.268 | 0.750 | 0.394 |
| 30% | 0.500 | 0.824 | 0.622 |
| 40% | 0.625 | 0.833 | 0.714 |
| 45% | 0.768 | 0.827 | 0.796 |
| 50% | 0.803 | 0.803 | 0.803 |
| 55% | 0.821 | 0.780 | 0.80 |
| 60% | 0.857 | 0.750 | 0.80 |
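The comparison step amounts to scoring SimFinder-derived labels against gold-standard labels at each candidate summary size. A minimal sketch, assuming Y/N label strings for the positive and negative classes; the function names are illustrative. As a check, recomputing F from the reported precision and recall reproduces the F-score values in the table above to within rounding.

```python
from typing import Sequence, Tuple


def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def precision_recall_f(predicted: Sequence[str], gold: Sequence[str],
                       positive: str = "Y") -> Tuple[float, float, float]:
    """Compare predicted labels with gold-standard labels for the positive class."""
    tp = sum(p == positive and g == positive for p, g in zip(predicted, gold))
    pred_pos = sum(p == positive for p in predicted)
    gold_pos = sum(g == positive for g in gold)
    precision = tp / pred_pos if pred_pos else 0.0
    recall = tp / gold_pos if gold_pos else 0.0
    return precision, recall, f_score(precision, recall)


# Recompute the F-score column from the reported (size, recall, precision) values.
for size, recall, precision in [(20, 0.268, 0.750), (30, 0.500, 0.824), (40, 0.625, 0.833),
                                (45, 0.768, 0.827), (50, 0.803, 0.803), (55, 0.821, 0.780),
                                (60, 0.857, 0.750)]:
    print(f"{size}%: F = {f_score(precision, recall):.3f}")
```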
Result: Sentences Marked as in Summary/not in Summary
Guys, I can't come tonight. Can I reschedule my C session for Wednesday night, 11/8, at 8:00? If that's cool with you guys, please reserve me a room.
Sure we can, but that's the day after Election Day. Are you sure you want to do it then?
alex, a reminder that your scheduled to do an MSOffice session on Nov. 14, at 7pm in 252Mudd. --please confirm that you can do that session/posters
Confirmed. Intro to MS Office, then there will be three more where we'll work on the individual programs for full sessions
Labels in slide order (Y = in summary, N = not in summary): N, Y, N, N, N, Y, N, Y
Sentence Features: Thread as a Document
Length: number of words in the sentence
TF-IDF scores: highest, sum, and mean
Centroid similarity
Subject similarity
Relative position in thread
Is question?
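A minimal sketch of the thread-as-document features. It assumes TF-IDF is computed with the thread's own sentences as the document collection, that the centroid is the mean TF-IDF vector of those sentences, and that a trailing question mark approximates "Is question?". These are assumptions, not the original system's definitions; the dictionary keys borrow names from the sample ruleset shown later in the deck.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def thread_features(sentences: List[str], subject: str) -> List[Dict[str, float]]:
    """Per-sentence features, treating the whole thread as one document."""
    counts = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Assumption: document frequency is counted over the thread's sentences.
    df = Counter(w for c in counts for w in c)
    vecs = [{w: tf * math.log(n / df[w]) for w, tf in c.items()} for c in counts]
    # Centroid: mean TF-IDF vector of all sentences in the thread.
    centroid: Dict[str, float] = {}
    for v in vecs:
        for w, x in v.items():
            centroid[w] = centroid.get(w, 0.0) + x / n
    subject_vec = Counter(tokenize(subject))
    rows = []
    for i, (s, c, v) in enumerate(zip(sentences, counts, vecs)):
        weights = list(v.values()) or [0.0]
        rows.append({
            "length": len(s.split()),
            "tfidfmax": max(weights),
            "tfidfsum": sum(weights),
            "tfidfavg": sum(weights) / len(weights),
            "centroid_sim": cosine(v, centroid),
            "subject_sim": cosine(c, subject_vec),
            "t_rel_pos": i / max(1, n - 1),
            # Simplification: a trailing "?" stands in for question detection.
            "isQuestion": float(s.rstrip().endswith("?")),
        })
    return rows
```

The deck later replaces the question-mark shortcut with a learned question detector, for reasons it explains (question marks are used informally in email).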
Sentence Features: Email-Specific Features
Number of responses to the email
Number of recipients of the email
Has sender names: does the sentence contain the name of any sender of messages in the thread?
Does the email contain a forwarded message?
Features derived from quoted material
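Several of the email-specific features can be read directly off message metadata. The sketch below assumes a hypothetical `Message` record holding the sender, the recipients, the index of the parent message, and a forwarded-message flag; features derived from quoted material are omitted because the deck does not define them. The recipient lists in `example_thread` are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Message:
    sender: str
    recipients: List[str]
    parent: Optional[int] = None  # index of the message this one replies to
    has_forward: bool = False     # does the email contain a forwarded message?


def email_features(sentence: str, msg_index: int, thread: List[Message]) -> Dict[str, int]:
    """Email-specific features for a sentence taken from thread[msg_index]."""
    msg = thread[msg_index]
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    # Every token of every sender name in the thread, lowercased.
    sender_names = {part.lower() for m in thread for part in m.sender.split()}
    return {
        "num_responses": sum(1 for m in thread if m.parent == msg_index),
        "num_recipients": len(msg.recipients),
        "has_sender_name": int(bool(words & sender_names)),
        "has_forward": int(msg.has_forward),
    }


example_thread = [
    Message("Alexander McCaughly", ["board"]),
    Message("Raju Gupta", ["board", "alex"], parent=0),
    Message("Reema Ramachandran", ["board"], parent=0),
]
```

Exact token matching is a deliberate simplification: it misses nicknames, so "alex" does not match the sender "Alexander McCaughly".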
Learn extractive rules: Results
Using the full feature set, 5-fold cross-validation with Ripper
Baseline scores are obtained with random classification

| Data set | Precision | Recall | F1-score | Baseline F1-score |
|---|---|---|---|---|
| Annotator 1 | 0.550 | 0.516 | 0.532 | 0.422 |
| Annotator 2 | 0.514 | 0.468 | 0.490 | 0.392 |
Sample Ruleset: Nice Rules
1. IF centroid_sim_local 0.32 AND thread_line_num 4 AND isQuestion = 1 AND tfidfavg 0.21 AND tfidfavg 0.30 THEN Y.
2. IF centroid_sim 0.72 AND numOfRecipients 8 THEN Y.
3. IF centroid_sim_local 0.31 AND thread_line_num 4 AND tfidfmax 0.61 AND m_rel_pos 0.36 AND t_rel_pos 0.18 THEN Y.
4. IF centroid_sim_local 0.31 AND centroid_sim 0.76 AND centroid_sim 0.79 AND tfidfavg 0.19 THEN Y.
5. IF subject_sim 0.33 AND tfidfsum 2.84 AND tfidfsum 2.64 AND tfidfmax 0.68 THEN Y.
6. ELSE N
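Ripper outputs an ordered rule list: the first rule whose conditions all hold assigns its class, and the final ELSE supplies the default. A sketch of applying such a list follows. The rules in it are illustrative only: the comparison operators do not survive in the slide text, so the directions used below are assumptions, and these are not the learned rules.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]
Rule = Tuple[List[Callable[[Features], bool]], str]


def apply_ruleset(features: Features, rules: List[Rule], default: str = "N") -> str:
    """Return the class of the first rule whose conditions all hold, else the default."""
    for conditions, label in rules:
        if all(cond(features) for cond in conditions):
            return label
    return default


# Illustrative rules only: thresholds echo rules 1 and 2 above, but the comparison
# directions are assumptions because the slide text does not show the operators.
example_rules: List[Rule] = [
    ([lambda f: f["centroid_sim"] >= 0.72, lambda f: f["numOfRecipients"] >= 8], "Y"),
    ([lambda f: f["isQuestion"] == 1, lambda f: 0.21 <= f["tfidfavg"] <= 0.30], "Y"),
]
```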
Automatically Generated Sample Summary
Regarding "meeting tonight...", on Oct 30, 2000, Alexander Max McCaughly wrote: Can I reschedule my C session for Wednesday night, 11/8, at 8:00?
Responding to this on Oct 30, 2000, Raju J Gupta wrote: Are you sure you want to do it then?
Responding to this on Oct 30, 2000, Reema Ramachandran wrote: alex, a reminder that your scheduled to do an MSOffice session on Nov. 14, at 7pm in 252Mudd.
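The generated summary introduces each extracted sentence with a template naming the thread subject, date, and author. The sketch below reproduces the phrasings seen in the deck's examples; the `Extracted` record, the rule for choosing among templates (a reply to the previously summarized message versus any other message), and the reply structure in the example are all assumptions. The phrasing "In a subsequent message in the same thread" also appears in a later example, but its trigger is not described, so it is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extracted:
    msg_id: int
    parent_id: Optional[int]  # message this one replies to (None for the first message)
    author: str
    date: str
    sentence: str


def render_summary(subject: str, items: List[Extracted]) -> List[str]:
    """Render extracted sentences (in thread order) with introductory templates."""
    lines: List[str] = []
    prev: Optional[Extracted] = None
    for it in items:
        if prev is None:
            intro = f'Regarding "{subject}", on {it.date}, {it.author} wrote:'
        elif it.msg_id == prev.msg_id:
            lines[-1] += " " + it.sentence  # same message: extend its line
            continue
        elif it.parent_id == prev.msg_id:
            intro = f"Responding to this on {it.date}, {it.author} wrote:"
        else:
            intro = f"In another subthread, on {it.date}, {it.author} wrote:"
        lines.append(f"{intro} {it.sentence}")
        prev = it
    return lines


# Hypothetical reply structure for the thread summarized above.
example = render_summary("meeting tonight...", [
    Extracted(1, None, "Alexander Max McCaughly", "Oct 30, 2000",
              "Can I reschedule my C session for Wednesday night, 11/8, at 8:00?"),
    Extracted(2, 1, "Raju J Gupta", "Oct 30, 2000", "Are you sure you want to do it then?"),
    Extracted(3, 2, "Reema Ramachandran", "Oct 30, 2000",
              "alex, a reminder that your scheduled to do an MSOffice session on Nov. 14, at 7pm in 252Mudd."),
])
```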
The Problem
Question-answer exchanges are common in email
A thread, or even a single message, can contain multiple questions
A single question can receive multiple, possibly contradictory, answers
If a summary includes a question and the answer is in the thread, the summary should include the answer
Questions in Email Summaries
Complete summary from our rule-based sentence extractor:
Regarding "acm home/bjarney", on Apr 9, 2001, Muriel Danslop wrote: Two things: Can someone be responsible for the press releases for Stroustrup?
Responding to this on Apr 10, 2001, Theresa Feng wrote: I think Phil, who is probably a better writer than most of us, is writing up something for dang and Dave to send out to various ACM chapters. Phil, we can just use that as our "press release", right?
In another subthread, on Apr 12, 2001, Kevin Danquoit wrote: Are you sending out upcoming events for this week?
Approach
Same machine learning as before: supervised rule induction with Ripper (Cohen, 1996)
Same email corpus as before: the ACM corpus
Detection of Questions
Detecting questions is non-trivial because question marks are used informally:
Question marks appear in cases other than questions, e.g. to denote uncertainty or to make a suggestion: "I am on with Monday - perhaps some time in the afternoon or evening?", "I suggest 7pm?", "If it's better for ppl we could also have shorter lunch meetings (mon,tues,thurs)?"
Question marks are omitted after genuine questions: "Who can we get in touch with at your organization regarding these services."
The work we present here is based on the detection of interrogative questions, i.e., questions with inverted subject-verb order.
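To illustrate what "inverted subject-verb order" means, here is a simplified rule-based heuristic. It is not the classifier used in this work (which, per the following slides, is learned with Ripper from POS-based features), and its word lists are hand-picked assumptions.

```python
import re

# Hand-picked word lists (assumptions, not from the original work).
AUXILIARIES = {"am", "is", "are", "was", "were", "do", "does", "did", "have", "has",
               "had", "can", "could", "will", "would", "shall", "should", "may",
               "might", "must"}
WH_WORDS = {"who", "what", "when", "where", "why", "which", "how", "whom", "whose"}
SUBJECTS = {"i", "you", "we", "he", "she", "it", "they", "there", "someone", "anyone"}


def looks_interrogative(utterance: str) -> bool:
    """Heuristic test for inverted subject-verb order at the start of an utterance."""
    words = re.findall(r"[a-z']+", utterance.lower())
    if len(words) < 2:
        return False
    first, second = words[0], words[1]
    # Yes-no question: auxiliary or modal before the subject ("Can I ...", "Are you ...").
    if first in AUXILIARIES and second in SUBJECTS:
        return True
    # Wh-question: wh-word followed by an auxiliary ("Who can we ...").
    return first in WH_WORDS and second in AUXILIARIES
```

With this check, "Who can we get in touch with..." counts as a question despite its missing question mark, while "I suggest 7pm?" does not. It also misses "So, if you're available, do you want to come?", the kind of leading declarative clause that the results slide blames for low recall.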
Detection of Questions
Training corpus (speech): Switchboard corpus annotated with DAMSL tags
5000 positive examples, 5000 negative examples
Positive examples: "yes-no-question", "Wh-question", and "rhetorical-question"
Negative examples: "statement-opinion" and "statement-non-opinion"
Test corpus (email): manually extracted from the ACM corpus; 300 positive examples, 300 negative examples
Detection of Questions
Features: POS tags of the first five terms; POS tags of the last five terms; length of the utterance; most discriminating POS bigrams
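A sketch of turning a POS-tagged utterance into the feature set listed above. It assumes the utterance is already tagged (Penn Treebank tags are used for illustration) and that the list of most discriminating POS bigrams was selected beforehand; how those bigrams were chosen is not stated, so the list passed in is hypothetical. Short utterances are padded with a `NONE` placeholder tag.

```python
from typing import Dict, List, Sequence, Tuple, Union

PAD = "NONE"  # placeholder tag for positions beyond the utterance


def question_features(
    tagged: Sequence[Tuple[str, str]],
    bigrams: Sequence[Tuple[str, str]],
) -> Dict[str, Union[str, int]]:
    """Features for one utterance given (word, POS tag) pairs."""
    tags: List[str] = [tag for _, tag in tagged]
    first5 = (tags[:5] + [PAD] * 5)[:5]
    last5 = ([PAD] * 5 + tags[-5:])[-5:]
    present = set(zip(tags, tags[1:]))
    feats: Dict[str, Union[str, int]] = {f"first_{i}": t for i, t in enumerate(first5)}
    feats.update({f"last_{i}": t for i, t in enumerate(last5)})
    feats["length"] = len(tagged)
    # Binary indicator for each pre-selected discriminating bigram.
    for a, b in bigrams:
        feats[f"bigram_{a}_{b}"] = int((a, b) in present)
    return feats


example = question_features(
    [("Can", "MD"), ("I", "PRP"), ("reschedule", "VB"), ("my", "PRP$"),
     ("C", "NNP"), ("session", "NN"), ("?", ".")],
    [("MD", "PRP"), ("PRP", "MD")],  # hypothetical discriminating bigrams
)
```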
Detection of Questions
Results

| Test set | Recall | Precision | F-measure |
|---|---|---|---|
| Original | 0.56 | 0.96 | 0.70 |
| Without declarative clause | 0.72 | 0.96 | 0.82 |

Recall is low because many questions in the ACM corpus begin with a declarative clause, for example "So, if you're available, do you want to come?" and "if you don't mind, could you post this to the class bboard?"
Detection of Answers
Detection is difficult:
Multiple topics are discussed in parallel
Threads that begin with a single topic may spin off different ones
The reply function is used to answer a question asked earlier in the thread
We show how various features derived from the structure of email threads can improve upon lexical similarity between message segments.
Detection of Answers
ACM corpus: annotators were asked to highlight and link question and answer pairs
Annotator 1: 200 threads, 81 QA threads
Annotator 2: 138 threads, 62 QA threads
Inter-annotator agreement (kappa statistic): question detection 0.68; answer detection (given the question) 0.81
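The slide reports a kappa statistic without naming the variant. For two annotators, Cohen's kappa is the usual choice and is assumed here: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each annotator's label distribution.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two annotators' labels for the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of the two annotators' marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```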
Detection of Answers
Methods:
Use human-annotated data to generate training data
Textual unit: message segments rather than individual sentences, to reduce the lexical gap between questions and candidate answers
Learn a classifier that predicts whether a segment following a question segment answers it
Represent each question and candidate answer segment by a feature vector
Detection of Answers
Features used
Standard: word counts; word overlap (cosine, Euclidean)
Based on thread structure: is the candidate answer the first; number of emails between the question and answer segments; number of emails in the thread before the question segment
Based on other candidate answer segments: is the candidate the most similar; relative position of the candidate among the other candidates; number of other candidates
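A sketch computing these features for one question segment and its candidate answer segments. Assumptions: each `Segment` records the position of its email in the thread; candidates are the segments following the question, listed in thread order; "is the first" is read as "is the first candidate after the question"; and word counts are taken from the candidate. These readings, and all names here, are illustrative.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    email_index: int  # 0-based position of the containing email in the thread
    text: str


def bag(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(n * v[w] for w, n in u.items())
    norm = math.sqrt(sum(n * n for n in u.values())) * math.sqrt(sum(n * n for n in v.values()))
    return dot / norm if norm else 0.0


def euclidean(u: Counter, v: Counter) -> float:
    return math.sqrt(sum((u[w] - v[w]) ** 2 for w in set(u) | set(v)))


def answer_features(question: Segment, candidates: List[Segment]) -> List[Dict[str, float]]:
    """One feature row per candidate answer segment, in thread order after the question."""
    q = bag(question.text)
    bags = [bag(c.text) for c in candidates]
    sims = [cosine(q, b) for b in bags]
    best = max(range(len(candidates)), key=sims.__getitem__, default=-1)
    rows = []
    for i, (c, b) in enumerate(zip(candidates, bags)):
        rows.append({
            "word_count": sum(b.values()),
            "cosine": sims[i],
            "euclidean": euclidean(q, b),
            "is_first": int(i == 0),
            "emails_between": c.email_index - question.email_index - 1,
            "emails_before_question": question.email_index,
            "is_most_similar": int(i == best),
            "relative_position": i / max(1, len(candidates) - 1),
            "num_other_candidates": len(candidates) - 1,
        })
    return rows


example = answer_features(
    Segment(1, "Can I reschedule my C session for Wednesday?"),
    [Segment(2, "Sure we can, but that's the day after Election Day."),
     Segment(3, "Please reserve a room for the C session on Wednesday.")],
)
```

In the example, lexical overlap favors the second candidate even though the first is the actual reply, which is why the structural features (such as `is_first` and `emails_between`) are useful alongside similarity.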
Detection of Answers
Experiments and results: 5-fold cross-validation using Ripper (Cohen, 1996)

| Data set | Precision | Recall | F1-score |
|---|---|---|---|
| Union | 0.698 | 0.619 | 0.656 |
| Union <= 2 | 0.879 | 0.921 | 0.899 |
| Union > 2 | 0.631 | 0.619 | 0.625 |
| Composite | 0.728 | 0.732 | 0.730 |
Integrating extractive summaries with QA pairs: Approaches
1. Use QA pairs as features
2. Add corresponding answers to extracted questions and corresponding questions to extracted answers
3. Add extractive sentences to QA pairs: (a) use all detected QA pairs as the basis for the summary; (b) use a machine learning technique to identify the QA pairs to include in the summary
Integrating extractive summaries with QA pairs: First Approach
Use QA pairs as features: each sentence in the thread is represented by a feature vector:
Relative position of the sentence in the email and in the thread
TF-IDF weights
Is question?
. . .
Is answer?
Integrating extractive summaries with QA pairs: First Approach
Use QA pairs as features
Number of rules learned with this augmented feature set: 1397
Number of rules that include the answer feature: 54
Maximum number of rules in which any single feature appears: 160
Integrating extractive summaries with QA pairs: Second Approach
Add corresponding answers to extracted questions:
Extracted question: "Alex -- since you're in OS, what do you think? Do you think students will be working on the 15th?"
Added answer: "I'm in OS, and yeah, I'm pretty sure people will be working on the weekend of a week before."
Add corresponding questions to extracted answers:
Extracted answer: "Sure we can, but that's the day after Election Day."
Added question: "Can I reschedule my C session for Wednesday night, 11/8, at 8:00?"
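The second approach can be sketched as a one-hop expansion over the detected QA pairs. Assumptions: sentences are identified by their position in the thread, and QA pairs are given as (question id, answer id) tuples. Every answer linked to an extracted question is added, since a question may have several (possibly contradictory) answers.

```python
from typing import Dict, Iterable, List, Set, Tuple


def add_qa_partners(extracted: Iterable[int], qa_pairs: Iterable[Tuple[int, int]]) -> List[int]:
    """Extracted sentence ids plus the QA partners of each, in thread order."""
    partners: Dict[int, Set[int]] = {}
    for q, a in qa_pairs:
        partners.setdefault(q, set()).add(a)  # question -> its answers
        partners.setdefault(a, set()).add(q)  # answer -> its question
    result = set(extracted)
    # One hop only: partners of partners are not pulled in.
    for s in list(result):
        result |= partners.get(s, set())
    return sorted(result)
```

Restricting the expansion to one hop keeps the summary from growing through chains of linked pairs; whether the original system did the same is not stated.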
Integrating extractive summaries with QA pairs: Third Approach
Augment QA pair sentences with extractive sentences:
Automatically detect QA segment pairs in a thread
Select the question sentence from each question segment
Select an answer sentence from each answer segment
Add extractive sentences only if they are not in any automatically detected QA segment pair
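A sketch of the third approach, assuming segments are non-empty lists of sentence ids in thread order and that a question detector is available as a predicate. The deck does not say how the answer sentence is picked from its segment; taking the first sentence is a placeholder assumption, as is falling back to the segment's first sentence when no question is detected.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple


def qa_based_summary(
    qa_segment_pairs: Iterable[Tuple[Sequence[int], Sequence[int]]],
    extractive: Iterable[int],
    is_question: Callable[[int], bool],
) -> List[int]:
    """Sentence ids for a summary built from QA pairs plus uncovered extractive sentences."""
    chosen: Set[int] = set()
    covered: Set[int] = set()
    for q_seg, a_seg in qa_segment_pairs:
        covered.update(q_seg)
        covered.update(a_seg)
        # Question sentence: first sentence in the segment detected as a question.
        chosen.add(next((s for s in q_seg if is_question(s)), q_seg[0]))
        # Answer sentence: placeholder choice of the answer segment's first sentence.
        chosen.add(a_seg[0])
    # Extractive sentences are added only if they lie outside every detected QA segment.
    chosen.update(s for s in extractive if s not in covered)
    return sorted(chosen)
```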
Integrating extractive summaries with QA pairs: Third Approach
Example Summary: Adding questions
Regarding "ACM / CUSFS Film Cosponsorship (fwd)", on Wed Aug 16 10:01:56 EDT 2000, Raju J Gupta wrote: Are you all around before September?
In a subsequent message in the same thread, on Thu Aug 17 14:22:11 EDT 2000, Raju J Gupta wrote: Well, shall we do this the weekend before classes? How about Monday, the labor day before class?
Responding to this on Thu Aug 17 20:55:24 EDT 2000, Justin Liu wrote: I am on with Monday - perhaps some time in the afternoon or evening?
![Page 51: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/51.jpg)
51
Integrating extractive summaries with QA pairs: Third Approach
Example Summary: Adding answers
Regarding "ACM / CUSFS Film Cosponsorship (fwd)", on Wed Aug 16 10:01:56 EDT 2000, Raju J Gupta wrote: Are you all around before September?
Responding to this on Wed Aug 16 12:05:41 EDT 2000, Manij Ali wrote: however, i will be around the following week and i'll be able to make any meeting that does not conflict with any orientation event
In another subthread, on Thu Aug 17 14:22:11 EDT 2000, Raju J Gupta wrote: Well, shall we do this the weekend before classes? How about Monday, the labor day before class?
Responding to this on Thu Aug 17 20:55:24 EDT 2000, Justin Liu wrote: I am on with Monday - perhaps some time in the afternoon or evening?
Responding to this on Fri Aug 18 11:31:25 EDT 2000, Manij Ali wrote: so only under the condition that the time does not conflict with anything that i might have been scheduled for will monday afternoon be okay.
![Page 52: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/52.jpg)
52
Integrating extractive summaries with QA pairs: Third Approach
Example Summary: Adding extractive sentences
Regarding "ACM / CUSFS Film Cosponsorship (fwd)", on Wed Aug 16 10:01:56 EDT 2000, Raju J Gupta wrote: Are you all around before September? You guys realize that this means it's time for the 1st meeting.
Responding to this on Wed Aug 16 12:05:41 EDT 2000, Manij Ali wrote: however, i will be around the following week and i'll be able to make any meeting that does not conflict with any orientation event. i won't be around next week.
In another subthread, on Thu Aug 17 04:01:49 EDT 2000, Ritu Shetty wrote: I won't be back on campus till Sept. 3
In another subthread, on Thu Aug 17 09:30:40 EDT 2000, Daniel Max Kestin wrote: I am back on campus on the 27th.
Responding to this on Thu Aug 17 14:22:11 EDT 2000, Raju J Gupta wrote: Well, shall we do this the weekend before classes? How about Monday, the labor day before class? ...Alex (Markov), when you get back from wherever you are it should be your responsibility to organize these :)
Responding to this on Thu Aug 17 20:55:24 EDT 2000, Justin Liu wrote: I am on with Monday - perhaps some time in the afternoon or evening?
Responding to this on Fri Aug 18 11:31:25 EDT 2000, Manij Ali wrote: so only under the condition that the time does not conflict with anything that i might have been scheduled for will monday afternoon be okay.
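The example summaries introduce each sentence with a phrase that reflects where its message sits in the reply tree. The slides do not state the rule. The sketch below is one hypothetical rule that fits all three examples:

- "Responding to this" when the message replies directly to the previously shown one.
- "In a subsequent message in the same thread" when the previously shown message is a more distant ancestor.
- "In another subthread" otherwise.

`Message`, `ancestors` and `connective` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    mid: str
    parent: Optional[str]  # id of the message this one replies to; None for the root
    author: str
    date: str


def ancestors(mid, messages):
    """Ids of all messages above `mid` in the reply tree, nearest first."""
    chain, parent = [], messages[mid].parent
    while parent is not None:
        chain.append(parent)
        parent = messages[parent].parent
    return chain


def connective(current, previous, messages, subject):
    """Hypothetical rule for the phrase that introduces a summary sentence."""
    msg = messages[current]
    if previous is None:
        return f'Regarding "{subject}", on {msg.date}, {msg.author} wrote:'
    if msg.parent == previous:
        return f"Responding to this on {msg.date}, {msg.author} wrote:"
    if previous in ancestors(current, messages):
        return f"In a subsequent message in the same thread, on {msg.date}, {msg.author} wrote:"
    return f"In another subthread, on {msg.date}, {msg.author} wrote:"
```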
![Page 53: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/53.jpg)
53
Integrating extractive summaries with QA pairs: Results
| Metric | Baseline |
|---|---|
| Precision | 0.55 |
| Recall | 0.52 |
| F-score | 0.53 |
![Page 54: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/54.jpg)
54
Integrating extractive summaries with QA pairs: Results
| Metric | Baseline | QA features |
|---|---|---|
| Precision | 0.55 | 0.591 |
| Recall | 0.52 | 0.506 |
| F-score | 0.53 | 0.545 |
![Page 55: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/55.jpg)
55
Integrating extractive summaries with QA pairs: Results
| Metric | Baseline | QA features | Add answers and questions to extractive sentences |
|---|---|---|---|
| Precision | 0.55 | 0.591 | 0.561 |
| Recall | 0.52 | 0.506 | 0.571 |
| F-score | 0.53 | 0.545 | 0.566 |
![Page 56: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/56.jpg)
56
Integrating extractive summaries with QA pairs: Results
| Metric | Baseline | QA features | Add answers and questions to extractive sentences | Add extractive sentences to QA pair sentences |
|---|---|---|---|---|
| Precision | 0.55 | 0.591 | 0.561 | 0.534 |
| Recall | 0.52 | 0.506 | 0.571 | 0.617 |
| F-score | 0.53 | 0.545 | 0.566 | 0.573 |
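The F-score row is consistent with the balanced F-measure, the harmonic mean F = 2PR / (P + R) of the precision and recall rows. The slides do not state this, so treat it as an assumption. The check below recomputes the F-score for each approach, and the results match the table to the number of decimals shown. It also makes the trade-off visible. The last approach loses some precision relative to the baseline (0.534 vs. 0.55) but gains more in recall (0.617 vs. 0.52). The slides do not say how precision and recall themselves were computed, and this sketch does not try to reconstruct that.

```python
def f_score(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the results table above.
results = {
    "Baseline": (0.55, 0.52),
    "QA features": (0.591, 0.506),
    "Add answers and questions to extractive sentences": (0.561, 0.571),
    "Add extractive sentences to QA pair sentences": (0.534, 0.617),
}

for approach, (p, r) in results.items():
    print(f"{approach}: F = {f_score(p, r):.3f}")
```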
![Page 58: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/58.jpg)
58
Overview
Summarizing Email
Corpus Development
Approach 1: Sentence Extraction
Approach 2: Question-Answer Pairs Detection
Approach 3: Integration
Outlook Email Client
Conclusion
![Page 59: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/59.jpg)
59
What is SUMUI?
A user interface that exposes Natural Language Processing functionalities through an email client such as MS Outlook.
NLP functionalities:
Summarization of email
Categorization of email
Summarization of email thread
Categorization of email thread
Email clustering and topic detection
Summarization of mailbox
Functionalities in italics are work in progress.
![Page 60: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/60.jpg)
60
Components
![Page 61: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/61.jpg)
61
MS Outlook Client Add-On
![Page 62: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/62.jpg)
62
Conclusion
Email-specific features can be used for machine-learning-based extractive summarization of email threads.
We presented our novel approach to question-answer pair detection with high accuracy.
We showed that integrating QA pair sentences with extractive sentences improves summaries, raising the F-score from 0.53 (baseline) to 0.573.
![Page 63: Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha](https://reader035.vdocuments.us/reader035/viewer/2022081501/56649d6b5503460f94a4a93b/html5/thumbnails/63.jpg)
63
Questions?