Progress Update
Lin Ziheng
04/20/23
Outline
  Update Summarization
  Opinion Summarization
  Discourse Analysis
Update Summarization
The TAC 2008 update summarization task differs slightly from the DUC 2007 update task:
  The documents will be from the AQUAINT-2 collection rather than the AQUAINT collection
  Cluster format:
    There will be only two sets per cluster (Set A and Set B)
    Each document set will have exactly 10 documents
  The summary for document Set A should be a regular topic-focused summary
  The summary for Set B should be written under the assumption that the user has already read all the documents in Set A
Tarsqi: a tool for event/time anchoring/ordering
  Recognizes events and times
  Creates event/event, event/time, and time/time temporal links
Example sentences:
  John fell after Mary pushed him.
  They heard an explosion on Monday, but not in 2007.
  This reminded them of the 1968 war, which ravaged the countryside in 1969.
  He slept on Friday night.
  She hopes to succeed before noon.
  Gonzalez said he would resign on Tuesday.
  He thought it was a great deal.
  John leaves today.
  John does not leave today.
[Figure: sentences 1 to 9 from documents D1 and D2 are processed by Tarsqi into a graph, which is then layered]
[Figures for D0703A-A: BFS, topmost layering, optimal layering]
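The layering step named on these slides can be illustrated with a small sketch. It contrasts a plain BFS layering with a longest-path layering of a directed acyclic graph whose edges mean "happens before". The edge list is hypothetical, and whether longest-path layering corresponds to the "topmost" or "optimal" layering shown in the slides is an assumption; the slides do not define those terms.

```python
from collections import defaultdict, deque

def _build(nodes, edges):
    """Return successor lists and in-degrees for a directed graph."""
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    return succ, indeg

def bfs_layers(nodes, edges):
    """Layer = BFS depth from the source nodes (nodes with no incoming edge)."""
    succ, indeg = _build(nodes, edges)
    layer = {n: 0 for n in nodes if indeg[n] == 0}
    queue = deque(sorted(layer))
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if v not in layer:          # first discovery fixes the layer
                layer[v] = layer[u] + 1
                queue.append(v)
    return layer

def longest_path_layers(nodes, edges):
    """Layer = length of the longest path from any source (Kahn's algorithm)."""
    succ, indeg = _build(nodes, edges)
    layer = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            layer[v] = max(layer[v], layer[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(nodes):
        raise ValueError("graph has a cycle; temporal links are inconsistent")
    return layer

# Hypothetical temporal links: sentence 1 before 2, 2 before 3, 1 before 3.
NODES = [1, 2, 3]
EDGES = [(1, 2), (2, 3), (1, 3)]
print(bfs_layers(NODES, EDGES))
print(longest_path_layers(NODES, EDGES))
```

In this toy graph BFS places sentences 2 and 3 in the same layer even though 2 precedes 3, while the longest-path variant keeps every edge pointing to a strictly later layer.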
Opinion Summarization
Input:
<target id = "9902" text = "Time Magazine 2005 Person of the Year">
  <q id = "9902.1" type= "SquishyList">
    Why did readers support Time's inclusion of Bono for Person of the Year?
  </q>
  <q id = "9902.2" type= "SquishyList">
    Why did readers not support the inclusion of Bill Gates as Person of the Year?
  </q>
  <q id = "9902.3" type= "SquishyList">
    Why did readers not support the inclusion of Melinda Gates as Person of the Year?
  </q>
  <doc id = "BLOG06-20051222-014-0013437834" />
  <doc id = "BLOG06-20051224-070-0016186787" />
  <doc id = "BLOG06-20051225-087-0014047570" />
  <doc id = "BLOG06-20051225-022-0003271778" />
  <doc id = "BLOG06-20051223-002-0006769403" />
  <doc id = "BLOG06-20051222-023-0003513393" />
  <doc id = "BLOG06-20051228-009-0011259747" />
  <doc id = "BLOG06-20051221-029-0028769327" />
</target>
Output: a summary for each target that summarizes the answers to the questions
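A minimal sketch of reading one such target with Python's standard `xml.etree.ElementTree` module. The `parse_target` helper and the returned dictionary layout are illustrative assumptions, not part of the task definition; the sample is a shortened copy of the target above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the target shown on the slide.
SAMPLE = """<target id = "9902" text = "Time Magazine 2005 Person of the Year">
<q id = "9902.1" type= "SquishyList">
Why did readers support Time's inclusion of Bono for Person of the Year?
</q>
<doc id = "BLOG06-20051222-014-0013437834" />
<doc id = "BLOG06-20051224-070-0016186787" />
</target>"""

def parse_target(xml_text):
    """Hypothetical helper: collect the questions and document ids of one target."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "text": root.get("text"),
        # Each question becomes (id, type, question text).
        "questions": [
            (q.get("id"), q.get("type"), (q.text or "").strip())
            for q in root.findall("q")
        ],
        "docs": [d.get("id") for d in root.findall("doc")],
    }

target = parse_target(SAMPLE)
print(target["text"])
for qid, qtype, question in target["questions"]:
    print(qid, qtype, question)
print(len(target["docs"]), "documents")
```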
Existing opinion corpus: Movie Review corpus
  Document level: 1000 positive documents and 1000 negative documents
    Problem: too coarse-grained
  Sentence level: 5331 positive sentences and 5331 negative sentences
    Problem: not enough data
We collected data from productreview.com.au and rateitall.com
  Fine-grained:
    Productreview.com.au: each review has pros, cons, overall, and a rating
    Rateitall.com: each review has a rating
  Large datasets:
    Productreview.com.au: 2.4 GB
    Rateitall.com: 2.0 GB
  http://wing.comp.nus.edu.sg/~hung/productreview/
  http://wing.comp.nus.edu.sg/~hung/rateitall/
Discourse Analysis
Penn Discourse Treebank 2.0
  Based on PTB 2
  18459 Explicit relations, 16053 Implicit relations

TEMPORAL (950::3696)
  Asynchronous (697::2090)
    precedence
    succession
  Synchronous (251::1594)
CONTINGENCY (4255::3417)
  Cause (4172::2240)
    reason
    result
  Pragmatic Cause (83::13)
    justification
  Condition (1::1416)
    hypothetical
    general
    unreal present
    unreal past
    factual present
    factual past
  Pragmatic Condition (1::67)
    relevance
    implicit assertion
COMPARISON (2503::5589)
  Contrast (2120::3928)
    juxtaposition
    opposition
  Pragmatic Contrast (4::32)
  Concession (223::1213)
    expectation
    contra-expectation
  Pragmatic Concession (1::15)
EXPANSION (8861::6423)
  Conjunction (3534::5320)
  Instantiation (1445::302)
  Restatement (3206::162)
    specification
    equivalence
    generalization
  Alternative (185::351)
    conjunctive
    disjunctive
    chosen alternative
  Exception (2::14)
  List (400::250)
Marcu and Echihabi baseline
  Used word pairs in a Naive Bayes model
Wellner et al. baseline
  Used 7 feature classes in total
  Claimed that proximity and connective are the most useful feature classes
    prox: 0.60
    prox + conn: 0.7677
I implemented only prox and conn in the baseline system
            Accuracy
exp         0.3466
imp         0.5474
exp+imp     0.4119

            prox     conn     prox+conn
exp         0.3488   0.9404   0.9414
imp         0.5435   0.5435   0.5435
exp+imp     0.4373   0.76     0.76
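To make the word-pair idea behind the Marcu and Echihabi baseline concrete, here is a minimal sketch of a Naive Bayes classifier whose features are all (word from first argument, word from second argument) pairs. It is not the authors' implementation or the system evaluated above: the whitespace tokenization, add-one smoothing, the `WordPairNB` class, and the four training examples are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict
from itertools import product

def word_pairs(arg1, arg2):
    """Cross product of the words in the two arguments (whitespace tokens)."""
    return list(product(arg1.lower().split(), arg2.lower().split()))

class WordPairNB:
    """Hypothetical Naive Bayes over word pairs with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()           # relation -> number of examples
        self.pair_counts = defaultdict(Counter)  # relation -> pair -> count
        self.vocab = set()                       # all pairs seen in training

    def fit(self, examples):
        for arg1, arg2, label in examples:
            self.class_counts[label] += 1
            for pair in word_pairs(arg1, arg2):
                self.pair_counts[label][pair] += 1
                self.vocab.add(pair)
        return self

    def predict(self, arg1, arg2):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label, n in self.class_counts.items():
            # log P(label) + sum of log P(pair | label)
            score = math.log(n / total)
            denom = sum(self.pair_counts[label].values()) + self.alpha * vocab_size
            for pair in word_pairs(arg1, arg2):
                score += math.log((self.pair_counts[label][pair] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, invented for this sketch only.
TRAIN = [
    ("the plan was good", "the result was bad", "Contrast"),
    ("prices rose", "demand fell", "Contrast"),
    ("it rained heavily", "the streets flooded", "Cause"),
    ("the engine failed", "the car stopped", "Cause"),
]

model = WordPairNB().fit(TRAIN)
print(model.predict("it rained", "the roads flooded"))
print(model.predict("prices rose", "demand fell"))
```

Scores are accumulated in log space so that products of many small pair probabilities do not underflow.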