Progress Update Lin Ziheng 06/14/22

Page 1: Progress Update

Progress Update

Lin Ziheng

04/20/23

Page 2: Progress Update

Outline
- Update Summarization
- Opinion Summarization
- Discourse Analysis


Page 3: Progress Update

Update Summarization
- The TAC 2008 update summarization task differs slightly from the DUC 2007 update task
  - The documents will be from the AQUAINT-2 collection rather than the AQUAINT collection
  - Cluster format:
    - There will only be two sets per cluster (Set A and Set B)
    - Each document set will have exactly 10 documents
  - The summary for document Set A should be a regular topic-focused summary
  - The summary for Set B should be written under the assumption that the user has already read all the documents in Set A
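The Set B requirement above (assume the reader already knows Set A) can be illustrated with a simple novelty filter. This is only a minimal sketch and not the method used in this work: the function names, the Jaccard word-overlap measure, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re


def tokens(sentence):
    """Return the set of lowercased word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def novel_sentences(set_b, set_a, threshold=0.5):
    """Keep Set B sentences that do not closely repeat any Set A sentence.

    The threshold is a hypothetical value, not one taken from the task
    guidelines or from the slides.
    """
    seen = [tokens(s) for s in set_a]
    kept = []
    for sentence in set_b:
        current = tokens(sentence)
        if all(jaccard(current, old) < threshold for old in seen):
            kept.append(sentence)
    return kept
```

A summarizer for Set B could then rank only the sentences returned by `novel_sentences`, so that content already covered in Set A is less likely to be repeated.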


Page 4: Progress Update

Tarsqi: a tool for event/time anchoring/ordering
- Recognizes events and times
- Creates event/event, event/time, time/time temporal links

John fell after Mary pushed him.

They heard an explosion on Monday, but not in 2007.

This reminded them of the 1968 war, which ravaged the countryside in 1969.

He slept on Friday night.

She hopes to succeed before noon.

Gonzalez said he would resign on Tuesday.

He thought it was a great deal.

John leaves today.

John does not leave today.

Page 5: Progress Update

[Figure: sentences 1-9 of documents D1 and D2, with stages labeled Tarsqi, Graph, and Layering. Node rows as extracted from the figure: 1 / 3 4 / 2 / 6 / 5 7 / 8 / 9.]
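The step from a graph of temporal links to a layering can be sketched as follows. The slides do not state which layering algorithm was used, so this sketch swaps in standard longest-path layering on a directed acyclic graph; the function name and the example links are hypothetical.

```python
from collections import defaultdict, deque


def longest_path_layers(nodes, before_links):
    """Group nodes into layers from (earlier, later) precedence links.

    A node with no predecessor goes to layer 0; every other node goes to
    one layer below its deepest predecessor. Every node mentioned in a
    link must also appear in `nodes`.
    """
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for earlier, later in before_links:
        successors[earlier].append(later)
        indegree[later] += 1

    layer = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            layer[nxt] = max(layer[nxt], layer[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if visited != len(nodes):
        raise ValueError("temporal links contain a cycle")

    grouped = defaultdict(list)
    for n in nodes:
        grouped[layer[n]].append(n)
    return [grouped[i] for i in sorted(grouped)]


# Hypothetical links: sentence 1 precedes 3 and 4, which both precede 2.
print(longest_path_layers([1, 2, 3, 4], [(1, 3), (1, 4), (3, 2), (4, 2)]))
# [[1], [3, 4], [2]]
```

Contradictory links (for example, A before B and B before A) form a cycle, which this sketch reports as an error rather than trying to resolve.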


Page 6: Progress Update

D0703A-A


Page 7: Progress Update

BFS


Page 8: Progress Update

Topmost layering


Page 9: Progress Update

Optimal layering


Page 10: Progress Update

Opinion Summarization
Input:

<target id = "9902" text = "Time Magazine 2005 Person of the Year">
  <q id = "9902.1" type= "SquishyList">
    Why did readers support Time's inclusion of Bono for Person of the Year?
  </q>
  <q id = "9902.2" type= "SquishyList">
    Why did readers not support the inclusion of Bill Gates as Person of the Year?
  </q>
  <q id = "9902.3" type= "SquishyList">
    Why did readers not support the inclusion of Melinda Gates as Person of the Year?
  </q>
  <doc id = "BLOG06-20051222-014-0013437834" />
  <doc id = "BLOG06-20051224-070-0016186787" />
  <doc id = "BLOG06-20051225-087-0014047570" />
  <doc id = "BLOG06-20051225-022-0003271778" />
  <doc id = "BLOG06-20051223-002-0006769403" />
  <doc id = "BLOG06-20051222-023-0003513393" />
  <doc id = "BLOG06-20051228-009-0011259747" />
  <doc id = "BLOG06-20051221-029-0028769327" />
</target>

Output: a summary for each target that summarizes the answers to the questions
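A target in the format shown above can be read with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch assuming each target is available as a standalone XML string; the function name `parse_target` and the shape of the returned dictionary are illustrative choices, not part of the task definition.

```python
import xml.etree.ElementTree as ET


def parse_target(xml_text):
    """Parse one <target> element into a plain dictionary."""
    target = ET.fromstring(xml_text)
    questions = [
        {
            "id": q.get("id"),
            "type": q.get("type"),
            # Collapse the line breaks and indentation around the question.
            "text": " ".join((q.text or "").split()),
        }
        for q in target.findall("q")
    ]
    doc_ids = [d.get("id") for d in target.findall("doc")]
    return {
        "id": target.get("id"),
        "text": target.get("text"),
        "questions": questions,
        "docs": doc_ids,
    }
```

The summarizer can then loop over `questions` and restrict retrieval to the blog posts listed under `docs`.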


Page 11: Progress Update

Existing opinion corpus: Movie Review corpus
- Document level: 1000 +ve documents and 1000 -ve documents
  - Problem: too coarse-grained
- Sentence level: 5331 +ve sentences and 5331 -ve sentences
  - Problem: not enough data

We collected data from productreview.com.au and rateitall.com
- Fine-grained:
  - Productreview.com.au: each review has pros, cons, overall, and a rating
  - Rateitall.com: each review has a rating
- Large datasets:
  - Productreview.com.au: 2.4G
  - Rateitall.com: 2.0G
- http://wing.comp.nus.edu.sg/~hung/productreview/
- http://wing.comp.nus.edu.sg/~hung/rateitall/


Page 12: Progress Update

Discourse Analysis
- Penn Discourse Treebank 2.0
  - Based on PTB 2
  - 18459 Explicit relations, 16053 Implicit relations

TEMPORAL (950::3696)
  Asynchronous (697::2090)
    precedence
    succession
  Synchronous (251::1594)
CONTINGENCY (4255::3417)
  Cause (4172::2240)
    reason
    result
  Pragmatic Cause (83::13)
    justification
  Condition (1::1416)
    hypothetical
    general
    unreal present
    unreal past
    factual present
    factual past
  Pragmatic Condition (1::67)
    relevance
    implicit assertion
COMPARISON (2503::5589)
  Contrast (2120::3928)
    juxtaposition
    opposition
  Pragmatic Contrast (4::32)
  Concession (223::1213)
    expectation
    contra-expectation
  Pragmatic Concession (1::15)
EXPANSION (8861::6423)
  Conjunction (3534::5320)
  Instantiation (1445::302)
  Restatement (3206::162)
    specification
    equivalence
    generalization
  Alternative (185::351)
    conjunctive
    disjunctive
    chosen alternative
  Exception (2::14)
  List (400::250)


Page 13: Progress Update

- Marcu and Echihabi baseline
  - Used word pairs in a Naive Bayes model
- Wellner et al. baseline
  - Used 7 feature classes in total
  - Claimed that proximity and connective are the most useful feature classes
    - prox: 0.60
    - prox + conn: 0.7677
- I only implemented prox and conn in the baseline system

Accuracy

          accuracy
exp       0.3466
imp       0.5474
exp+imp   0.4119

          prox     conn     prox+conn
exp       0.3488   0.9404   0.9414
imp       0.5435   0.5435   0.5435
exp+imp   0.4373   0.76     0.76
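The word-pair idea behind the Marcu and Echihabi baseline on this slide can be sketched as a small Naive Bayes classifier over the cross product of the words in the two arguments. This is a simplified illustration, not the baseline implementation evaluated above: the class name, the add-one smoothing, the choice to skip unseen pairs at prediction time, and the toy training examples are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import product


def word_pairs(arg1, arg2):
    """All (w1, w2) pairs with w1 from the first argument and w2 from the second."""
    left = re.findall(r"[a-z']+", arg1.lower())
    right = re.findall(r"[a-z']+", arg2.lower())
    return list(product(left, right))


class WordPairNaiveBayes:
    """Multinomial Naive Bayes over word-pair features (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed, not from the slides)
        self.pair_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, examples):
        """Count word pairs per label from (arg1, arg2, label) triples."""
        for arg1, arg2, label in examples:
            self.label_counts[label] += 1
            for pair in word_pairs(arg1, arg2):
                self.pair_counts[label][pair] += 1
                self.vocab.add(pair)
        return self

    def predict(self, arg1, arg2):
        """Return the label with the highest log posterior score."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            denom = sum(self.pair_counts[label].values()) + self.alpha * len(self.vocab)
            for pair in word_pairs(arg1, arg2):
                if pair not in self.vocab:
                    continue  # simplification: ignore pairs never seen in training
                score += math.log((self.pair_counts[label][pair] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training data; real training would use PDTB argument pairs.
model = WordPairNaiveBayes().fit([
    ("it rained", "the game was cancelled", "CONTINGENCY"),
    ("she likes tea", "he prefers coffee", "COMPARISON"),
])
print(model.predict("it rained heavily", "the match was cancelled"))
# CONTINGENCY
```

Because the feature space is the cross product of two vocabularies, the pair counts become very sparse on realistically sized data, which is one reason this kind of model needs a large training set.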