Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries Lokesh Shrestha

Post on 21-Dec-2015

Summarizing Threads of Email Conversations: Using QA Pairs Detection to Improve Extractive Summaries

Lokesh Shrestha

Reasons for Summarizing Email

Email has become a primary means of business and personal communication.

Conversations take place and decisions are made entirely through email.

Given the high volume of email each individual accumulates, how can we efficiently retrieve information from our email archives?

Summarizing Email vs. Summarizing Newswire

Email has interactive structure
Email can have informal language
Email does not have different, independent documents about the same topic (not “multi-document summarization”)

Contributions

Email specific features can be used for machine learning based extractive summarization of email threads

A novel approach to question-answer pair detection

Integration of QA pair sentences with extractive sentences improves summaries.

Overview

Related Work
Corpus
Approach 1: Sentence Extraction
Approach 2: Question-Answer Pairs Detection
Approach 3: Integration
Outlook Email Client
Conclusion

Related Work

Summarizing individual emails
  Derek Lam, Steven L. Rohall, Chris Schmandt, and Mia K. Stern. 2002. Sentence extraction
  Smaranda Muresan, Evelyne Tzoukermann, and Judith Klavans. 2001. Key phrase extraction
Summarizing discussion lists
  Ani Nenkova and Amit Bagga. 2003. Sentence extraction
  Paula Newman and John Blitzer. 2003. Thread topic clustering and sentence extraction
Summarizing speech dialogues
  Klaus Zechner. 2002. Sentence extraction and QA pairs

Corpus

Columbia ACM chapter executive board mailing list
  Approximately 10 regular participants
  ~300 threads, ~1000 messages
Threads include: scheduling and planning of meetings and events, question and answer, general discussion and chat.
Annotated by human annotators:
  Hand-written summary
  Categorization of threads and messages
  Highlighting important information (such as question-answer pairs)

Sample Hand-Written Summary for Thread

Annotator 1 Summary: Alexander McCaughly asks the group if he can reschedule his C-session for Wednesday night. Raju Gupta tells McCaughly that he is able to reschedule his C-session. Reema Ramachandran reminds McCaughly that he scheduled an MS Office Session for November 14, and she asks McCaughly to confirm that he can be at that session.

Sentence Extraction

Machine learning approach to extractive summarization of email threads

Creating Training Data

Learn extractive rules

Use rules to generate summary

Sentence Extraction: Creating Training Data

Using human generated summaries to create a model extractive summary

Compare thread sentences with human summary sentences using SimFinder

Given a summary size, select highly ranked sentences

Represent each sentence with a vector of features and the class
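
The labelling steps above can be sketched roughly as follows. SimFinder itself is not reproduced here: a plain bag-of-words cosine stands in for it as an assumption, and the names `cosine`, `label_sentences`, and `summary_size` are illustrative rather than taken from the original system.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Bag-of-words cosine similarity (a stand-in for SimFinder, not SimFinder)."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def label_sentences(thread_sentences, summary_sentences, summary_size=0.45):
    """Label the top `summary_size` fraction of thread sentences 'Y', the rest 'N'."""
    # Score each thread sentence by its best match among the human summary sentences.
    scores = [max(cosine(s, h) for h in summary_sentences) for s in thread_sentences]
    keep = int(len(thread_sentences) * summary_size)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:keep])
    return ["Y" if i in chosen else "N" for i in range(len(thread_sentences))]
```

Each labelled sentence would then be paired with its feature vector to form one training example for the rule learner.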

SimFinder in Action

Thread sentences:
  Guys, I can't come tonight.
  Can I reschedule my C session for Wednesday night, 11/8, at 8:00?
  If that's cool with you guys, please reserve me a room.
  Sure we can, but that's the day after Election Day.
  Are you sure you want to do it then?
  alex, a reminder that your scheduled to do an MSOffice session on Nov. 14, at 7pm in 252Mudd.
  --please confirm that you can do that session/posters
  Confirmed. Intro to MS Office, then there will be three more where we'll work on the individual programs for full sessions

Human summary sentences:
  Alexander McCaughly asks the group if he can reschedule his C-session for Wednesday night.
  Raju Gupta tells McCaughly that he is able to reschedule his C-session.
  Reema Ramachandran reminds McCaughly that he scheduled an MS Office Session for November 14, and she asks McCaughly to confirm that he can be at that session.

SimFinder: 0.0038

SimFinder: 0.0028

SimFinder: 0.0028

SimFinder: 0.0028

SimFinder: 0.983

SimFinder: 0.563

SimFinder: 0.0038

SimFinder: 0.983

SimFinder: 0.0038

SimFinder: 0.0038

SimFinder: 0.0038

SimFinder: 0.752

SimFinder: 0.221

SimFinder: 0.368

Determining Summary Size

Determine the summary size the human summarizers used
Create gold-standard data manually
  Select about 10% of ACM threads as gold-standard threads
  Manually classify sentences in gold-standard threads
    positive if content reflected in human summary
    negative otherwise
Compare SimFinder-derived classifications at various summary sizes with gold-standard classifications

Determining Summary Size Results

Use 45%
Verifies the use of SimFinder

Summary size  20%    30%    40%    45%    50%    55%    60%
Recall        0.268  0.500  0.625  0.768  0.803  0.821  0.857
Precision     0.750  0.824  0.833  0.827  0.803  0.780  0.750
F-score       0.394  0.622  0.714  0.796  0.803  0.80   0.80
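
The F-score row agrees, up to rounding, with the balanced harmonic mean of precision and recall. The slide does not state the formula, so that definition is an assumption here; a short check under it:

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision) per summary size, copied from the results above.
table = {
    "20%": (0.268, 0.750),
    "30%": (0.500, 0.824),
    "40%": (0.625, 0.833),
    "45%": (0.768, 0.827),
    "50%": (0.803, 0.803),
    "55%": (0.821, 0.780),
    "60%": (0.857, 0.750),
}

for size, (recall, precision) in table.items():
    print(size, round(f_score(precision, recall), 3))
```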

Result: Sentences Marked as in Summary/not in Summary

Thread sentences:
  Guys, I can't come tonight.
  Can I reschedule my C session for Wednesday night, 11/8, at 8:00?
  If that's cool with you guys, please reserve me a room.
  Sure we can, but that's the day after Election Day.
  Are you sure you want to do it then?
  alex, a reminder that your scheduled to do an MSOffice session on Nov. 14, at 7pm in 252Mudd.
  --please confirm that you can do that session/posters
  Confirmed. Intro to MS Office, then there will be three more where we'll work on the individual programs for full sessions

Human summary sentences:
  Alexander McCaughly asks the group if he can reschedule his C-session for Wednesday night.
  Raju Gupta tells McCaughly that he is able to reschedule his C-session.
  Reema Ramachandran reminds McCaughly that he scheduled an MS Office Session for November 14, and she asks McCaughly to confirm that he can be at that session.

Labels (Y = in summary, N = not in summary): N Y N N N Y N Y

Sentence Features: Thread as a Document

Length: number of words in sentence
TF-IDF scores: highest, sum and mean
Centroid similarity
Subject similarity
Relative position in thread
Is question?
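
A minimal sketch of how some of these features might be computed for one thread. It assumes each sentence counts as a "document" for the IDF term and uses a trailing question mark as a crude stand-in for the question feature; centroid and subject similarity are left out for brevity. All names are illustrative.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def sentence_features(sentences):
    """Return one feature dict per sentence of a single thread."""
    tokenized = [tokens(s) for s in sentences]
    n = len(sentences)
    # Document frequency, counting each sentence as a "document" (an assumption).
    df = Counter(w for toks in tokenized for w in set(toks))
    rows = []
    for i, (sentence, toks) in enumerate(zip(sentences, tokenized)):
        tf = Counter(toks)
        # TF-IDF weight per distinct word; the +1 keeps ubiquitous words above zero.
        weights = [tf[w] * (math.log(n / df[w]) + 1.0) for w in tf]
        rows.append({
            "length": len(toks),
            "tfidf_max": max(weights, default=0.0),
            "tfidf_sum": sum(weights),
            "tfidf_mean": sum(weights) / len(weights) if weights else 0.0,
            "rel_pos": i / (n - 1) if n > 1 else 0.0,
            # Crude proxy: the sentence ends with a question mark.
            "is_question": sentence.rstrip().endswith("?"),
        })
    return rows
```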

Sentence Features: Email-Specific Features

Number of responses to the email
Number of recipients of email
Has sender names: does the sentence contain the name of the senders of messages in the thread?
Email contains forwarded message?
Features derived from quoted material

Learn extractive rules: Results

Using full feature set, 5-fold cross-validation with Ripper
Baseline scores are obtained with random classification

Data Set     Precision  Recall  F1-score  Baseline F1-score
Annotator 1  0.550      0.516   0.532     0.422
Annotator 2  0.514      0.468   0.490     0.392

Sample Ruleset: Nice Rules

1. IF centroid_sim_local 0.32 AND thread_line_num 4 AND isQuestion = 1 AND tfidfavg 0.21 AND tfidfavg 0.30 THEN Y.
2. IF centroid_sim 0.72 AND numOfRecipients 8 THEN Y.
3. IF centroid_sim_local 0.31 AND thread_line_num 4 AND tfidfmax 0.61 AND m_rel_pos 0.36 AND t_rel_pos 0.18 THEN Y.
4. IF centroid_sim_local 0.31 AND centroid_sim 0.76 AND centroid_sim 0.79 AND tfidfavg 0.19 THEN Y.
5. IF subject_sim 0.33 AND tfidfsum 2.84 AND tfidfsum 2.64 AND tfidfmax 0.68 THEN Y.
6. ELSE N
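
A ruleset of this shape is applied as an ordered list: the first rule whose conditions all hold assigns Y, and the final default assigns N. The sketch below shows that evaluation only. The slide text does not preserve the comparison operators, so the directions used here, and the two rules themselves, are assumptions made for illustration.

```python
import operator

# Each rule is a list of (feature, comparison, threshold) conditions.
# Feature names echo the slide; the comparison directions are assumed.
RULES = [
    [("centroid_sim", operator.ge, 0.72), ("numOfRecipients", operator.ge, 8)],
    [
        ("isQuestion", operator.eq, 1),
        ("tfidfavg", operator.ge, 0.21),
        ("tfidfavg", operator.le, 0.30),
    ],
]


def classify(sentence, rules=RULES):
    """Return 'Y' if all conditions of some rule hold, otherwise 'N'."""
    for conditions in rules:
        if all(compare(sentence[name], value) for name, compare, value in conditions):
            return "Y"
    return "N"  # default rule: ELSE N
```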

Automatically Generated Sample Summary

Regarding "meeting tonight...", on Oct 30, 2000, Alexander Max McCaughly wrote: Can I reschedule my C session for Wednesday night, 11/8, at 8:00?

Responding to this on Oct 30, 2000, Raju J Gupta wrote: Are you sure you want to do it then?

Responding to this on Oct 30, 2000, Reema Ramachandran wrote: alex, a reminder that your scheduled to do an MSOffice session on Nov. 14, at 7pm in 252Mudd.

The Problem

Question-answer exchanges common in email
Multiple questions in one thread; in one message
Multiple, possibly contradictory, answers to a single question
If a summary has question, and answer is in thread, summary should have the answer

Questions in Email Summaries

Complete summary from our rule-based sentence extractor:

Regarding "acm home/bjarney", on Apr 9, 2001, Muriel Danslop wrote: Two things: Can someone be responsible for the press releases for Stroustrup?

Responding to this on Apr 10, 2001, Theresa Feng wrote: I think Phil, who is probably a better writer than most of us, is writing up something for dang and Dave to send out to various ACM chapters. Phil, we can just use that as our "press release", right?

In another subthread, on Apr 12, 2001, Kevin Danquoit wrote: Are you sending out upcoming events for this week?

Approach

Same machine learning as before: supervised rule induction with Ripper (Cohen, ’96)
Same email corpus as before: ACM Corpus

Detection of Questions

Detecting questions is non-trivial
Informal use of question mark
  Question mark used in cases other than questions: to denote uncertainty, to make a suggestion.
    I am on with Monday - perhaps some time in the afternoon or evening?
    I suggest 7pm?
    If it's better for ppl we could also have shorter lunch meetings (mon,tues,thurs)?
  Question mark omitted after posing a question
    Who can we get in touch with at your organization regarding these services.
The work we present here is based on the detection of interrogative questions – inverted subject-verb order.

Detection of Questions

Training Corpus - Speech
  Switchboard corpus annotated with DAMSL tags.
  5000 positive examples, 5000 negative examples
  negative examples - "statement-opinion" and "statement-non-opinion".
  positive examples - "yes-no-question", "Wh-question", and "rhetorical-question"
Test Corpus - Email
  manually extracted from the ACM corpus
  300 positive examples, 300 negative examples.

Detection of Questions

Features
  POS tags for the first five terms
  POS tags for the last five terms
  length of the utterance
  most discriminating POS-bigrams
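
The feature list above could be turned into a flat feature vector along these lines. The input is assumed to be already POS-tagged with Penn Treebank-style tags, and the bigram list is hypothetical: in practice the most discriminating bigrams would be selected from the training data.

```python
PAD = "NONE"  # filler tag for utterances shorter than k terms

# Hypothetical "discriminating" POS bigrams, chosen only for illustration.
BIGRAMS = [("MD", "PRP"), ("VBP", "PRP"), ("WRB", "VBP")]


def question_features(tagged, k=5, bigrams=BIGRAMS):
    """Build a flat feature dict from a list of (word, pos_tag) pairs."""
    tags = [tag for _, tag in tagged]
    first = (tags[:k] + [PAD] * k)[:k]    # first k tags, padded on the right
    last = ([PAD] * k + tags[-k:])[-k:]   # last k tags, padded on the left
    seen = set(zip(tags, tags[1:]))       # adjacent tag pairs in the utterance
    feats = {f"first_{i}": t for i, t in enumerate(first)}
    feats.update({f"last_{i}": t for i, t in enumerate(last)})
    feats["length"] = len(tags)
    for a, b in bigrams:
        feats[f"has_{a}_{b}"] = (a, b) in seen
    return feats
```

A modal followed by a pronoun ("Can I ...") is one pattern that reflects the inverted subject-verb order mentioned earlier, which is why a bigram such as `("MD", "PRP")` appears in the illustrative list.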

Detection of Questions

Results
  Recall 0.56, Precision 0.96, F-measure 0.70

Recall low because: questions in ACM corpus start with a declarative clause
  So, if you're available, do you want to come?
  if you don't mind, could you post this to the class bboard?

Results without declarative clause:
  Recall 0.72, Precision 0.96, F-measure 0.82

Detection of Answers

Detection difficult
  Multiple topics discussed in parallel
  Those that begin with a single topic may spin off different ones
  Use of reply back function to answer a question asked earlier in the thread.
We show how various features derived from the structure of email threads can improve upon lexical similarity between message segments

Detection of Answers

ACM Corpus
Annotators were asked to highlight and link question and answer pairs.
  Annotator 1: 200 threads, 81 QA threads
  Annotator 2: 138 threads, 62 QA threads
Inter-annotator agreement (Kappa statistic)
  Question detection: 0.68
  Answer detection (given question): 0.81
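
The slide does not say which kappa variant was used. Assuming Cohen's kappa for two annotators labelling the same items, agreement could be computed as in this sketch:

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for the agreement expected by chance, so a value of 0 means chance-level agreement and 1 means perfect agreement.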

Detection of Answers

Methods
  Use human annotated data to generate training data
  Textual unit: use message segments rather than individual sentences to reduce lexical gap between questions and candidate answers
  Learn a classifier that predicts if a subsequent segment to a question segment answers it
  Represent each question and candidate answer segment by a feature vector

Detection of Answers

Features Used
  Standard: word counts, word overlap (Cosine, Euclidean)
  Based on thread structure:
    is candidate answer the first
    number of emails between the question and the answer segments
    the number of emails in the thread before the question segment
  Based on other candidate answer segments
    is candidate the most similar
    relative position of the candidate among other candidates
    number of other candidates
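
One way the features above might be assembled for a question segment and its candidate answer segments. This is a sketch under assumptions: each segment is given as an `(email_index, text)` pair with emails numbered in thread order, and the feature names are invented for illustration.

```python
import math
import re
from collections import Counter


def bag(text):
    """Word-count vector for a segment."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def euclidean(a, b):
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in set(a) | set(b)))


def answer_features(question, candidates):
    """question and candidates are (email_index, text) pairs, candidates in thread order."""
    q_idx, q_text = question
    q_bag = bag(q_text)
    sims = [cosine(q_bag, bag(text)) for _, text in candidates]
    best = max(range(len(sims)), key=sims.__getitem__) if sims else None
    rows = []
    for pos, (c_idx, c_text) in enumerate(candidates):
        c_bag = bag(c_text)
        rows.append({
            # Standard lexical features
            "question_words": sum(q_bag.values()),
            "candidate_words": sum(c_bag.values()),
            "cosine": sims[pos],
            "euclidean": euclidean(q_bag, c_bag),
            # Thread-structure features
            "is_first_candidate": pos == 0,
            "emails_between": c_idx - q_idx - 1,
            "emails_before_question": q_idx,
            # Features relative to the other candidates
            "is_most_similar": pos == best,
            "relative_position": pos / len(candidates),
            "num_other_candidates": len(candidates) - 1,
        })
    return rows
```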

Detection of Answers

Experiments and Results
5-fold cross-validation using Ripper (Cohen, 96)

Data Set    Precision  Recall  F1-score
Union       0.698      0.619   0.656
Union <= 2  0.879      0.921   0.899
Union > 2   0.631      0.619   0.625
Composite   0.728      0.732   0.730

Integrating extractive summaries with QA pairs: Approaches

Use QA pairs as features
Add corresponding answers to extracted questions and corresponding questions to extracted answers
Add extractive sentences to QA pairs
  Use all QA pairs detected as basis for summary
  Use machine learning technique to identify QA pairs to be included in summary

Integrating extractive summaries with QA pairs: First Approach

Use QA pairs as features
  Each sentence in the thread is represented by a feature vector
    Relative position of the sentence in email and thread
    TFIDF weights
    Is question?
    . . .
    Is answer?

Integrating extractive summaries with QA pairs: First Approach

Use QA pairs as features
  Number of rules learned with this augmented set of features: 1397
  Number of rules that include the answer feature: 54
  Maximum number of rules that any feature is included in: 160

Integrating extractive summaries with QA pairs: Second Approach

Add corresponding answers to extracted questions
  Alex -- since you're in OS, what do you think? Do you think students will be working on the 15th?
  I'm in OS, and yeah, I'm pretty sure people will be working on the weekend of a week before.
Add corresponding questions to extracted answers
  Sure we can, but that's the day after Election Day.
  Can I reschedule my C session for Wednesday night, 11/8, at 8:00?
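
A minimal sketch of this second approach, assuming the detected QA pairs are available as `(question, answer)` sentence pairs. The function name and the placement of the added sentence next to its partner are illustrative choices, not details from the slides.

```python
def add_partners(extracted, qa_pairs):
    """Expand an extractive summary with the missing half of each QA pair.

    extracted: sentences chosen by the sentence extractor, in summary order.
    qa_pairs: list of (question, answer) sentence pairs.
    """
    result = list(extracted)
    for question, answer in qa_pairs:
        if question in extracted and answer not in result:
            # Extracted question: place its answer right after it.
            result.insert(result.index(question) + 1, answer)
        elif answer in extracted and question not in result:
            # Extracted answer: place its question right before it.
            result.insert(result.index(answer), question)
    return result
```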

Integrating extractive summaries with QA pairs: Third Approach

Augment QA pair sentences with extractive sentences
  Automatically detect QA segment pairs in a thread
  Select the question sentence from each question segment
  Select an answer sentence from each answer segment
  Add extractive sentences if they are not in any automatically detected QA segment pairs
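
The steps of this third approach can be sketched as below. Segments are assumed to be lists of sentences; the rule for picking the question sentence, the use of the first sentence as the answer sentence, and the final ordering by thread position are assumptions for illustration, since the slides do not specify them.

```python
def pick_question(segment):
    """First sentence ending in '?', else the first sentence (assumed heuristic)."""
    return next((s for s in segment if s.rstrip().endswith("?")), segment[0])


def integrate(qa_segment_pairs, extractive, thread_order):
    """Combine QA pair sentences with extractive sentences.

    qa_segment_pairs: list of (question_segment, answer_segment), each a list of sentences.
    extractive: sentences chosen by the sentence extractor.
    thread_order: all thread sentences, used only to order the output.
    """
    covered = set()   # every sentence inside a detected QA segment
    selected = []
    for q_seg, a_seg in qa_segment_pairs:
        covered.update(q_seg, a_seg)
        # One question sentence and one answer sentence per pair.
        for sentence in (pick_question(q_seg), a_seg[0]):
            if sentence not in selected:
                selected.append(sentence)
    # Add extractive sentences only if no QA segment already contains them.
    for sentence in extractive:
        if sentence not in covered and sentence not in selected:
            selected.append(sentence)
    position = {s: i for i, s in enumerate(thread_order)}
    return sorted(selected, key=lambda s: position.get(s, len(position)))
```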

Integrating extractive summaries with QA pairs: Third Approach

Example Summary: Adding questions

Regarding "ACM / CUSFS Film Cosponsorship (fwd)", on Wed Aug 16 10:01:56 EDT 2000, Raju J Gupta wrote: Are you all around before September?

In a subsequent message in the same thread, on Thu Aug 17 14:22:11 EDT 2000, Raju J Gupta wrote: Well, shall we do this the weekend before classes? How about Monday, the labor day before class?

Responding to this on Thu Aug 17 20:55:24 EDT 2000, Justin Liu wrote: I am on with Monday - perhaps some time in the afternoon or evening?

Integrating extractive summaries with QA pairs: Third Approach

Example Summary: Adding answers

Regarding "ACM / CUSFS Film Cosponsorship (fwd)", on Wed Aug 16 10:01:56 EDT 2000, Raju J Gupta wrote: Are you all around before September?

Responding to this on Wed Aug 16 12:05:41 EDT 2000, Manij Ali wrote: however, i will be around the following week and i'll be able to make any meeting that does not conflict with any orientation event

In another subthread, on Thu Aug 17 14:22:11 EDT 2000, Raju J Gupta wrote: Well, shall we do this the weekend before classes? How about Monday, the labor day before class?

Responding to this on Thu Aug 17 20:55:24 EDT 2000, Justin Liu wrote: I am on with Monday - perhaps some time in the afternoon or evening?

Responding to this on Fri Aug 18 11:31:25 EDT 2000, Manij Ali wrote: so only under the condition that the time does not conflict with anything that i might have been scheduled for will monday afternoon be okay.

Integrating extractive summaries with QA pairs: Third Approach

Example Summary: Adding extractive sentences

Regarding "ACM / CUSFS Film Cosponsorship (fwd)", on Wed Aug 16 10:01:56 EDT 2000, Raju J Gupta wrote: Are you all around before September? You guys realize that this means it's time for the 1st meeting.

Responding to this on Wed Aug 16 12:05:41 EDT 2000, Manij Ali wrote: however, i will be around the following week and i'll be able to make any meeting that does not conflict with any orientation event i won't be around next week.

In another subthread, on Thu Aug 17 04:01:49 EDT 2000, Ritu Shetty wrote: I won't be back on campus till Sept. 3

In another subthread, on Thu Aug 17 09:30:40 EDT 2000, Daniel Max Kestin wrote: I am back on campus on the 27th.

Responding to this on Thu Aug 17 14:22:11 EDT 2000, Raju J Gupta wrote: Well, shall we do this the weekend before classes? How about Monday, the labor day before class? ...Alex (Markov), when you get back from wherever you are it should be your responsibility to organize these :)

Responding to this on Thu Aug 17 20:55:24 EDT 2000, Justin Liu wrote: I am on with Monday - perhaps some time in the afternoon or evening?

Responding to this on Fri Aug 18 11:31:25 EDT 2000, Manij Ali wrote: so only under the condition that the time does not conflict with anything that i might have been scheduled for will monday afternoon be okay.

Integrating extractive summaries with QA pairs: Results

Approach                                           Precision  Recall  F-score
Baseline                                           0.55       0.52    0.53
QA features                                        0.591      0.506   0.545
Add answers and questions to extractive sentences  0.561      0.571   0.566
Add extractive sentences to QA pair sentences      0.534      0.617   0.573

What is SUMUI?

User interface that exposes Natural Language Processing functionalities through an email client such as MS Outlook.
NLP functionalities:
  Summarization of email
  Categorization of email
  Summarization of email thread
  Categorization of email thread
  Email clustering and topic detection
  Summarization of mailbox
Functionalities in italics are work in progress.

Components

MS Outlook Client Add-On

Conclusion

Email specific features can be used for machine learning based extractive summarization of email threads.
We presented our novel approach to question-answer pair detection with high accuracy.
We showed how integration of QA pair sentences with extractive sentences improves summaries.

Questions?