Probabilistic Content Models
TRANSCRIPT
Probabilistic Content Models, with Applications to Generation and Summarization. Presented by Bryan Zhang Hang.
Outline:
Goal: Modeling Topic Structures of Text
We will use:
Hidden Markov Model
Bigrams
Clustering
Application:
Sentence Ordering
Extractive Summarization
Review: Hidden Markov Model:
[Diagram: hidden STATES S1, S2, S3 connected by TRANSITIONS; each state produces an OBSERVATION (O1, O2, O3) via EMISSIONS.]
Review: Hidden Markov Model:
Imagine: you call your friend who lives in a foreign country from time to time. Every time, you ask: "What are you up to?"
The possible answers are:
"walk", "ice cream", "shopping", "reading", "programming", "kayaking"
Review: Hidden Markov Model:
Possible answers over a month: "kayaking", "walk", "shopping", "kayaking", "programming", …
Underlying weather: sunny, sunny, probably sunny, sunny?, probably rainy
The weather is the latent class (the hidden part).
Review: Hidden Markov Model:
[Diagram: the same HMM, now annotated with TRANSITION probabilities between states and EMISSION probabilities from states to observations.]
Review: Hidden Markov Model:
[Diagram: state sequence * → R → S → S with observations Programming, Walking, Reading; the arcs are labeled P(R|*), P(S|R), P(S|S) and the emissions are labeled P(Programming|R), P(Walking|S), P(Reading|S).]
Review: Hidden Markov Model:
[Diagram repeated: * → R → S → S emitting Programming, Walking, Reading.]
The probability of the sequence Programming, Walking, Reading given the weather is:
P(R|*) × P(S|R) × P(S|S) × P(Programming|R) × P(Walking|S) × P(Reading|S)
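As a concrete illustration, here is a minimal Python sketch that evaluates exactly this product. The numeric probabilities are placeholders invented for the example; the slide gives no numbers for this diagram.

```python
# Toy transition and emission tables; '*' is the distinguished start state.
# The numeric values are illustrative placeholders, not from the slides.
trans = {('*', 'R'): 0.6, ('R', 'S'): 0.3, ('S', 'S'): 0.6}
emit  = {('R', 'Programming'): 0.5, ('S', 'Walking'): 0.4, ('S', 'Reading'): 0.3}

def joint_probability(states, observations):
    """Product of transition and emission probabilities along one path."""
    p, prev = 1.0, '*'
    for s, o in zip(states, observations):
        p *= trans[(prev, s)] * emit[(s, o)]
        prev = s
    return p

print(joint_probability(['R', 'S', 'S'], ['Programming', 'Walking', 'Reading']))
```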
Exercise:
[Diagram: a two-state HMM with hidden states Rainy and Sunny, a START state, and observations Walk, Clean, Go shopping; the transition and emission probabilities are spelled out on the next slide.]
What state sequence (START-S1-S2) maximizes the probability of the observation sequence "clean, shopping"?
Transition probabilities:
P(R|START) = 0.6
P(S|START) = 0.4
P(S|R) = 0.3
P(S|S) = 0.6
P(R|R) = 0.7
P(R|S) = 0.4
Emission probabilities:
P(CLEAN|R) = 0.5
P(CLEAN|S) = 0.1
P(SHOPPING|R) = 0.4
P(SHOPPING|S) = 0.3
STATES: {R, S}; OBSERVATIONS: {CLEAN, SHOPPING}
Using the transition and emission probabilities above, the four candidate paths START-S1-S2 for the observations CLEAN, SHOPPING are:
P(S|START) × P(CLEAN|S) × P(S|S) × P(SHOPPING|S) = 0.4 × 0.1 × 0.6 × 0.3 = 0.0072
P(S|START) × P(CLEAN|S) × P(R|S) × P(SHOPPING|R) = 0.4 × 0.1 × 0.4 × 0.4 = 0.0064
P(R|START) × P(CLEAN|R) × P(S|R) × P(SHOPPING|S) = 0.6 × 0.5 × 0.3 × 0.3 = 0.027
P(R|START) × P(CLEAN|R) × P(R|R) × P(SHOPPING|R) = 0.6 × 0.5 × 0.7 × 0.4 = 0.084
THE ANSWER IS START-RAIN-RAIN.
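The same answer can be checked by brute force. A minimal Python sketch (written for this transcript, not from the slides) that enumerates all state sequences:

```python
from itertools import product

# Transition and emission probabilities from the exercise.
trans = {('START', 'R'): 0.6, ('START', 'S'): 0.4,
         ('R', 'R'): 0.7, ('R', 'S'): 0.3,
         ('S', 'R'): 0.4, ('S', 'S'): 0.6}
emit  = {('R', 'CLEAN'): 0.5, ('S', 'CLEAN'): 0.1,
         ('R', 'SHOPPING'): 0.4, ('S', 'SHOPPING'): 0.3}

def path_probability(states, observations):
    p, prev = 1.0, 'START'
    for s, o in zip(states, observations):
        p *= trans[(prev, s)] * emit[(s, o)]
        prev = s
    return p

obs = ('CLEAN', 'SHOPPING')
for states in product('RS', repeat=len(obs)):
    print(states, path_probability(states, obs))
# ('R', 'R') wins with 0.084, i.e. START-RAIN-RAIN.
```

For longer sequences, the Viterbi algorithm finds the best path without this exponential enumeration.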
Probabilistic Content Model
[Diagram: the same HMM structure, where the hidden states are TOPICS and the observations are SENTENCES; TRANSITIONS connect topics and EMISSIONS generate sentences.]
Sentences are Bigram Sequences
The probability of an n-word sentence w1 … wn generated from a state s is the product of that state's bigram probabilities:
$P(w_1 \ldots w_n \mid s) = \prod_{i=1}^{n} p_s(w_i \mid w_{i-1})$, with $w_0$ a sentence-start token.
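A minimal sketch of evaluating this product, assuming a per-state bigram table `bigram_prob` mapping (previous word, word) pairs to probabilities; the table and the smoothing floor are illustrative, not specified on the slide.

```python
def sentence_probability(words, bigram_prob):
    """P(w1..wn | s) as a product of the state's bigram probabilities."""
    p, prev = 1.0, '<s>'                       # '<s>' marks the start of the sentence
    for w in words:
        p *= bigram_prob.get((prev, w), 1e-9)  # tiny floor stands in for real smoothing
        prev = w
    return p

# Example with a toy topic-specific bigram table:
bigram_prob = {('<s>', 'the'): 0.3, ('the', 'earthquake'): 0.2, ('earthquake', 'struck'): 0.1}
print(sentence_probability(['the', 'earthquake', 'struck'], bigram_prob))
```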
Probabilistic Content Model
[Diagram repeated: hidden TOPICS with TRANSITIONS between them; each topic EMITS SENTENCES.]
TOPICS: Derived from the Content (Step 1)
Partition the sentences of the documents in a domain-specific collection into k clusters (the initial clusters).
Use bigram vectors as features.
Sentence similarity is the cosine of the bigram vectors, as in the sketch below.
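A minimal sketch of the similarity measure; the function names are mine, and whitespace tokenization is an assumption.

```python
from collections import Counter
from math import sqrt

def bigram_vector(sentence):
    """Sparse vector of bigram counts for one sentence."""
    words = sentence.lower().split()
    return Counter(zip(words, words[1:]))

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[b] * v[b] for b in u if b in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

print(cosine(bigram_vector("the earthquake struck the city"),
             bigram_vector("the earthquake struck at dawn")))
```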
An example of the output: LOCATION INFORMATION
TOPICS: Derived from the Content
D(C, C′): the number of documents in which a sentence from cluster C immediately precedes one from cluster C′.
D(C): the number of documents containing sentences from C.
For two states C, C′, the smoothed estimate of the state-transition probability is:
$P(C' \mid C) = \dfrac{D(C, C') + \delta}{D(C) + \delta \cdot m}$
where m is the number of states and δ is a smoothing constant.
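A one-function sketch of this estimate; the value of δ is a free smoothing parameter, not given on the slide.

```python
def transition_probability(c, c2, D_pair, D, m, delta=0.1):
    """Smoothed P(c2 | c): D_pair counts documents where a sentence from c
    immediately precedes one from c2, D counts documents containing each
    cluster, m is the number of states."""
    return (D_pair.get((c, c2), 0) + delta) / (D[c] + delta * m)
```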
TOPICS: Derived from the Content (Step 2)
EM-like Viterbi re-estimation (sketched below):
From the initial sentence clusters (topic clusters), we can compute the transition probabilities.
The Hidden Markov Model then estimates the topic of each sentence (Viterbi decoding).
Each sentence s is reassigned to the cluster of its estimated topic.
The cluster/estimate cycle is repeated until the clusters stabilize.
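A structural sketch of that loop. The helpers `build_hmm` and `decode` are assumptions standing in for the estimation and Viterbi steps; the slides do not specify their implementation.

```python
def estimate_topics(sentences, initial_clusters, build_hmm, decode, max_iters=50):
    """Alternate between fitting an HMM to the current clusters and
    reassigning each sentence to its Viterbi-decoded topic."""
    clusters = initial_clusters
    for _ in range(max_iters):
        hmm = build_hmm(clusters)              # emission models + smoothed transitions
        new_clusters = decode(hmm, sentences)  # topic assignment per sentence
        if new_clusters == clusters:           # clusters have stabilized
            break
        clusters = new_clusters
    return clusters
```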
Evaluation Task 1
Information Ordering
The information ordering task is essential to many text-synthesis applications, e.g. concept-to-text generation and multi-document summarization.
Evaluation Task 1
Information Ordering
[Figure: corpus statistics; axis label "Num. of Sentences".]
Evaluation Task 1
Information Ordering
Number of possible sentence orders:
3 sentences: 3 × 2 × 1 = 6 different orders
4 sentences: 4 × 3 × 2 × 1 = 24 different orders
More than 10 sentences means over 3 million different orders (10! = 3,628,800).
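The blow-up is just the factorial function; a two-line check:

```python
from math import factorial

for n in (3, 4, 10):
    print(n, factorial(n))   # 3 -> 6, 4 -> 24, 10 -> 3628800
```

This is why exhaustively ranking all permutations is only feasible for short texts.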
Evaluation Task 1
Information Ordering
Generate all the possible sentence orders.
Compute the probability of each order under the model.
Rank the orders by probability (see the sketch below).
Metric:
OSO (Original Sentence Order): the position of the original order in the ranked list.
Baseline:
A word bigram model.
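A minimal sketch of the ranking procedure. `log_prob` is an assumed callable that scores a sentence sequence under the content model; it is not defined on the slides.

```python
from itertools import permutations

def oso_rank(sentences, log_prob):
    """1-based rank of the original order among all permutations,
    sorted by model score (rank 1 = model prefers the original order)."""
    original = tuple(sentences)
    ranked = sorted(permutations(sentences), key=log_prob, reverse=True)
    return ranked.index(original) + 1
```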
Evaluation Task 1
Information Ordering
Rank is the rank the model assigns to the original sentence order (OSO).
OSO prediction rate is the percentage of test cases in which the model gives the highest probability to the OSO among all possible permutations.
Evaluation Task 1
Information Ordering
Kendall's τ measures how much an ordering differs from the OSO; it is an indicator of the number of pairwise swaps needed to restore the original order.
• Lapata's technique is a feature-rich method (in this experiment it uses linguistic features such as noun-verb dependencies).
• It aggravates the data-sparseness problem on a smaller corpus.
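A short sketch of the statistic, using the standard definition τ = 1 - 2 × (number of inversions) / (n(n-1)/2):

```python
def kendall_tau(ordering, oso):
    """+1 when `ordering` equals the OSO, -1 when it is completely reversed."""
    pos = {s: i for i, s in enumerate(oso)}
    ranks = [pos[s] for s in ordering]
    n = len(ranks)
    inversions = sum(1 for i in range(n)
                       for j in range(i + 1, n) if ranks[i] > ranks[j])
    return 1 - 2 * inversions / (n * (n - 1) / 2)

print(kendall_tau(['a', 'b', 'c'], ['a', 'b', 'c']))   # 1.0
print(kendall_tau(['c', 'b', 'a'], ['a', 'b', 'c']))   # -1.0
```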
Evaluation Task 2
Summarization
Baseline 1: the "lead" baseline picks the first L sentences.
Baseline 2: a sentence classifier:
1. Each sentence is labelled "in" or "out" of the summary.
2. The features for each sentence are its unigrams and its location, i.e., we look at the words a sentence contains and where it appears in the document.
Evaluation Task 2
Summarization with the Probabilistic Content Model:
All sentences in the documents are assigned topics.
All sentences in the summaries are assigned topics.
P(topic A appears in a summary) = (number of documents whose summary contains topic A) / (number of documents in which topic A appears)
Sentences whose topics have a high probability of appearing in summaries are extracted, as in the sketch below.
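A minimal sketch of this extraction rule. The per-document topic sets and the selection of the top L sentences are illustrative data structures of my choosing, not dictated by the slides.

```python
def summary_topic_probabilities(doc_topics, summary_topics):
    """doc_topics[i] / summary_topics[i]: sets of topics assigned to the sentences
    of document i and of its summary. Returns P(topic appears in a summary)."""
    prob = {}
    for t in set().union(*doc_topics):
        appears = sum(1 for d in doc_topics if t in d)
        in_summary = sum(1 for s in summary_topics if t in s)
        prob[t] = in_summary / appears if appears else 0.0
    return prob

def extract_summary(sentence_topics, prob, L):
    """Pick the L sentences whose topics are most likely to appear in
    summaries, then restore document order."""
    ranked = sorted(range(len(sentence_topics)),
                    key=lambda i: prob.get(sentence_topics[i], 0.0), reverse=True)
    return sorted(ranked[:L])
```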
Evaluation Task 2
Summarization
The content model outperforms the sentence-level, locally-focused method (the word + location classifier) and the "lead" baseline.
[Results chart comparing the content model, the word + location classifier, and the baseline.]
Relation Between the Two Tasks
Single domain: Earthquakes
Ordering: OSO prediction rate
Summarization: extractive accuracy
Optimizing parameters on one task promises to yield good performance on the other.
The content model serves as an effective representation of text structure in general.
Conclusions:
This unsupervised, knowledge-lean method validates the paper's hypothesis:
Word distribution patterns strongly correlate with discourse patterns within a text (at least in specific domains).
Future direction:
The content model is domain-dependent; a natural next step is to incorporate domain-independent relations into its transition structure.