2005/09/13
A Probabilistic Model for Retrospective News Event Detection
Zhiwei Li, Bin Wang*, Mingjing Li, Wei-Ying Ma
University of Science and Technology of China / Microsoft Research Asia
SIGIR 2005
Abstract
Retrospective news event detection (RED): the discovery of previously unidentified events in a historical news corpus. Both the contents and the time information of news articles are helpful to RED, but most research focuses on utilizing the contents; few works have explored better usages of time information.
Proposal:
• A probabilistic model that incorporates both content and time information in a unified framework.
• An interactive RED system, HISCOVERY, which provides additional functions to present events: Photo Story and Chronicle.
Introduction
News Event: a specific thing that happens at a specific time and place.
RED: the discovery of previously unidentified events in a historical news corpus. Application: detect earthquakes that happened in the last ten years from historical news articles.
Exploration:
• Better representations of news articles and events, which should effectively model both the contents and the time information.
• Model events in a probabilistic manner.
Introduction (cont.)
Main contributions:
• Proposing a multi-model RED algorithm, in which both the contents and the time information of news articles are modeled explicitly and effectively.
• Proposing an approach to determine the approximate number of events from the article count-time distribution.
Related Work
RED: first proposed and defined by Yang et al. (SIGIR 1998), who also proposed an agglomerative clustering algorithm (Group Average Clustering, GAC). Few directly targeted research works have been reported since.
New Event Detection (NED): a similar topic that has been studied extensively. The most prevailing approach to NED was proposed by Allan et al. (SIGIR 1998) and Yang et al. (SIGIR 1998). Modifications: better representation of contents and better utilization of time information.
Related Work (cont.)
From the aspect of utilizing the contents:
• TF-IDF and cosine similarity
• New distance metrics, such as the Hellinger distance metric (SIGIR 2003)
• Better representation of documents, i.e. feature selection, Yang et al. (SIGKDD 2002)
• The usage of named entities has been studied, e.g. in Allan et al. (1999), Yang et al. (2002) and Lam et al. (2001)
• Re-weighting of terms, first proposed by Allan et al. (1999)
• Kumaran et al. (SIGIR 2004) explored using both text classification and named entities to improve the performance of NED.
Related Work (cont.)
From the aspect of utilizing time information, there are two kinds of usages:
• Some approaches only use the chronological order of documents.
• Others use decaying functions to modify the similarity metrics of the contents (Brants et al. SIGIR 2003).
Characteristics of News Articles and Events
“Halloween” is a topic, which in turn includes many events.
Characteristics of News Articles and Events (cont.)
Two most important characteristics of news articles and events:
• News articles are always aroused by news events, and the article count of an event changes with time. Events appear as peaks; however, in some situations the observed peaks do not correspond exactly to events.
• Both the contents and the timestamps of articles reporting the same event are similar across different news sites: the start and end times of reports on an event are very similar on different websites.
Method:
• The first characteristic leads the RED algorithm to be modeled as a latent variable model, where events are latent variables and articles are observations.
• The second characteristic makes it possible to gather many news stories on the same event by mixing articles from different sources.
Multi-model Retrospective News Event Detection Method
Multi-model approach: since contents and timestamps have different characteristics, a multi-model approach is proposed to incorporate them in a unified probabilistic framework.
Representations: according to knowledge about news, a news article can be represented by four kinds of information: who (persons), when (time), where (locations), and what (keywords).
article = {persons, locations, keywords, time}
event = {persons, locations, keywords, time}
p(article) = p(persons) p(locations) p(keywords) p(time)
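As a minimal numeric sketch of this factorization (all per-field probabilities are invented for illustration):

```python
# Hypothetical per-field probabilities for one article (invented numbers).
p_persons, p_locations, p_keywords, p_time = 0.02, 0.05, 0.01, 0.1

# Under the independence assumption, the article probability factorizes
# into the product of the four field probabilities.
p_article = p_persons * p_locations * p_keywords * p_time
```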
The Generative Model of News Articles
Generative Model
Contents:
• Use mixtures of unigram models to model contents.
• Since persons and locations are important, persons, locations and keywords are modeled by three separate mixtures of unigram models.
Timestamps:
• The article count-time distribution is a mixture of many per-event distributions.
• A peak is usually modeled by a Gaussian function, so a Gaussian Mixture Model (GMM) is chosen to model timestamps.
The whole model combines the four mixture models:
• Three mixtures of unigram models and one GMM.
The Generative Model of News Articles (cont.)
The two-step generating process of a news article:
1. Choose an event: e_j ~ Multinomial(θ)
2. Generate a news article x_i ~ p(x_i | e_j):
   For each entity, according to the type of the current entity:
     Choose a person ~ Multinomial(θ_j^p)
     Choose a location ~ Multinomial(θ_j^l)
     Choose a keyword ~ Multinomial(θ_j^k)
   For its timestamp:
     Draw a timestamp ~ N(μ_j, σ_j)
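A toy sampler following these two steps (the events, vocabularies, and all parameters below are invented for illustration):

```python
import random

random.seed(0)

# Invented parameters: two events, tiny vocabularies, Gaussian timestamps.
p_event = [0.6, 0.4]                              # Multinomial prior over events
persons = [["clinton", "gore"], ["arafat"]]       # per-event person vocabularies
locations = [["florida"], ["gaza", "jerusalem"]]
keywords = [["recount", "ballot"], ["talks"]]
time_params = [(10.0, 2.0), (40.0, 3.0)]          # (mean day, std dev) per event

def generate_article():
    # Step 1: choose an event e_j ~ Multinomial(p_event)
    e = random.choices(range(len(p_event)), weights=p_event)[0]
    # Step 2: generate one entity of each type, then draw a timestamp ~ N(mu_j, sigma_j)
    return e, {
        "person": random.choice(persons[e]),
        "location": random.choice(locations[e]),
        "keyword": random.choice(keywords[e]),
        "time": random.gauss(*time_params[e]),
    }

e, article = generate_article()
```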
The Generative Model of News Articles (cont.)
A graphical representation of this model.
N: the term space sizes of the three kinds of entities (Np, Nl and Nn).
Learning Model Parameters
Model parameters can be estimated by the Maximum Likelihood method:
l(X; θ) = log p(X | θ) = Σ_{i=1}^{M} log p(x_i | θ) = Σ_{i=1}^{M} log [ Σ_{j=1}^{k} p(e_j) p(x_i | e_j, θ) ]
Given an event e_j, the four kinds of information of the i-th article are conditionally independent:
p(x_i | e_j) = p(time_i | e_j) p(persons_i | e_j) p(locations_i | e_j) p(keywords_i | e_j)
Learning Model Parameters (cont.)
Expectation Maximization algorithm
EM is generally applied to maximize the log-likelihood. By using the independence assumptions, the parameters of the four mixture models can be estimated independently. In the E-step, compute the posterior probabilities:
p^{(t+1)}(e_j | x_i) = p^{(t)}(e_j) p^{(t)}(x_i | e_j) / p^{(t)}(x_i),  where p^{(t)}(x_i) = Σ_{j} p^{(t)}(e_j) p^{(t)}(x_i | e_j)
Learning Model Parameters (cont.)
Expectation Maximization algorithm
In the M-step, update the parameters of the four models. For the three mixtures of unigram models, parameters are updated by:
p^{(t+1)}(w_n | e_j) = [ 1 + Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i) tf(i, n) ] / [ N + Σ_{s=1}^{N} Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i) tf(i, s) ]
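A sketch of this update for one unigram mixture component, with a toy corpus (term frequencies and posteriors invented); the added 1 and N implement the Laplace smoothing in the formula:

```python
# tf[i][n]: frequency of word n in article i; post[i]: p(e_j | x_i) from the E-step.
tf = [[2, 0, 1], [0, 3, 1]]   # two articles, vocabulary of N = 3 words (invented)
post = [0.9, 0.1]
N = 3

def p_word_given_event(n):
    # Numerator: 1 + sum_i p(e_j | x_i) * tf(i, n)   (Laplace-smoothed count)
    num = 1 + sum(post[i] * tf[i][n] for i in range(len(tf)))
    # Denominator: N + sum_s sum_i p(e_j | x_i) * tf(i, s)
    den = N + sum(post[i] * tf[i][s] for i in range(len(tf)) for s in range(N))
    return num / den

probs = [p_word_given_event(n) for n in range(N)]
# The smoothed distribution over the vocabulary sums to 1.
```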
Learning Model Parameters (cont.)
Expectation Maximization algorithm
In the M-step, the parameters of the GMM are updated by:
μ_j^{(t+1)} = Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i) time_i / Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i)
σ_j^{2(t+1)} = Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i) (time_i − μ_j^{(t+1)})^2 / Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i)
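The same weighted mean/variance update with toy posteriors and timestamps (all numbers invented):

```python
# Posterior weights p(e_j | x_i) for one event j, and article timestamps (invented).
post = [0.5, 1.0, 0.5]
times = [1.0, 2.0, 3.0]

w = sum(post)
# Weighted mean and variance, exactly as in the update formulas above.
mu = sum(p * t for p, t in zip(post, times)) / w
var = sum(p * (t - mu) ** 2 for p, t in zip(post, times)) / w
print(mu, var)  # 2.0 0.5
```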
Since the mean and variance of the GMM are updated consistently with the whole model, the Gaussian functions work like sliding windows on the time line.
Learning Model Parameters (cont.)
Expectation Maximization algorithm
In the M-step, the mixture proportions are updated by:
p^{(t+1)}(e_j) = (1 / M) Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i)
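A one-line sketch of this update (posteriors invented):

```python
# post[i][j] = p(e_j | x_i) from the E-step: M = 3 articles, k = 2 events (invented).
post = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
M, k = len(post), len(post[0])

# p(e_j) is the average posterior responsibility over all articles.
p_event = [sum(row[j] for row in post) / M for j in range(k)]
```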
The EM algorithm increases the log-likelihood monotonically, but it may stop at a local maximum.
How Many Events?
Basic idea: the initial estimate of the number of events can be set to the number of peaks, but noise distorts the distribution.
Salient peak: define a salient score for each peak as:
score(peak) = left(peak) × right(peak)
How Many Events? (cont.)
Salient peak: use hill-climbing to detect all peaks and calculate their salient scores; the number of the top 20% of peaks is the initial estimate of k.
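A sketch of peak detection and scoring over a daily article-count series. The counts are invented, and left/right are read here as the drops from the peak to the lowest point on each side, which is one plausible interpretation of the salient-score definition:

```python
# Daily article counts (invented). A peak is a point higher than both neighbors.
counts = [1, 3, 9, 4, 2, 6, 3, 1]

def find_peaks(series):
    # Simple hill-climbing-style local-maximum test.
    return [i for i in range(1, len(series) - 1)
            if series[i] > series[i - 1] and series[i] > series[i + 1]]

def salient_score(series, i):
    # score(peak) = left(peak) * right(peak): drops to the lowest point on each side.
    left = series[i] - min(series[:i])
    right = series[i] - min(series[i + 1:])
    return left * right

peaks = find_peaks(counts)
scores = {i: salient_score(counts, i) for i in peaks}
print(peaks)  # [2, 5]
```

The higher-scoring peaks would seed the initial events.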
Alternative: the user can specify the initial value of k, then use split/merge.
Model selection: apply the Minimum Description Length (MDL) principle to select among values of k:
k = argmax_k [ log p(X; θ̂_k) − (m_k / 2) log M ]
m_k = 3k − 1 + k(N_p − 1) + k(N_l − 1) + k(N_n − 1)
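A sketch of the MDL selection; the corpus size, vocabulary sizes, and fitted log-likelihoods below are all invented for illustration:

```python
import math

M = 1000                  # number of articles (invented)
Np, Nl, Nn = 50, 30, 200  # vocabulary sizes of persons/locations/keywords (invented)

def mdl_score(log_likelihood, k):
    # m_k = 3k - 1 + k(Np - 1) + k(Nl - 1) + k(Nn - 1) free parameters
    m_k = 3 * k - 1 + k * (Np - 1) + k * (Nl - 1) + k * (Nn - 1)
    return log_likelihood - (m_k / 2) * math.log(M)

# Hypothetical log-likelihoods after fitting each candidate k (invented numbers):
lls = {2: -9000.0, 3: -7500.0, 4: -7400.0}
best_k = max(lls, key=lambda k: mdl_score(lls[k], k))
print(best_k)  # 3
```

Here k = 4 fits slightly better but pays a larger penalty, so k = 3 wins.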
Event Summarization
Two ways to summarize news events:
Choose some features with the maximum probabilities to represent the event:
• For event j, the ‘protagonist’ is the person with the maximum p(person_p | e_j)
• However, the readability is poor.
Choose one news article as the representative for each news event:
• The article with the maximum p(x_i | e_j)
• The first article of each event is also a good representative.
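Both strategies reduce to an argmax over learned probabilities (the names and values below are invented):

```python
# Learned probabilities for one event e_j (invented numbers).
p_person_given_event = {"clinton": 0.6, "gore": 0.3, "bush": 0.1}
p_article_given_event = {"doc_17": 0.4, "doc_3": 0.35, "doc_9": 0.25}

# Feature summary: the 'protagonist' is the most probable person.
protagonist = max(p_person_given_event, key=p_person_given_event.get)
# Article summary: the representative is the most probable article.
representative = max(p_article_given_event, key=p_article_given_event.get)
print(protagonist, representative)  # clinton doc_17
```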
Algorithm Summary
1. Multi-model RED algorithm:
   a. Use the hill-climbing algorithm to find all peaks.
   b. Use salient scores to determine the top 20% of peaks, and initialize events correspondingly.
2. Learn model parameters:
   a. E-step: compute posteriors.
   b. M-step: update parameters.
3. Increase/decrease the initial number of events until the minimum/maximum number of events is reached:
   a. Split/merge the current big/small peaks, and re-initialize events correspondingly.
   b. Go to step 2.
4. Perform model selection by MDL.
5. Summarize.
Application: HISCOVERY System
HISCOVERY (HIStory disCOVERY): provides Photo Story and Chronicle. News articles come from 12 news sites.
Photo Story
Application: HISCOVERY System (cont.)
HISCOVERY Chronicle:
• The user enters a topic.
• HISCOVERY searches the news corpus to gather related articles.
• The proposed RED approach is applied to detect events belonging to this topic, and summaries of the events are then sorted in chronological order.
Experimental Methods
Data Preparation
• The first is the TDT4 dataset.
• The second is built by choosing three representative topics from the TDT4 dataset and downloading articles from some news websites.
Experimental Methods (cont.)
Experimental Design
• In the first two experiments, the cluster number is set to the number of events; in practice, the number of events must be determined automatically.
• For comparison, Yang et al.'s augmented Group Average Clustering (GAC) and a kNN algorithm are chosen as baselines.
Evaluation Measures
• Contingency tables are built, and the corresponding measures (precision, recall, and F1) are calculated.
Results
Overall Performance on Dataset 1
• The better performance of the full probabilistic model indicates the benefits of modeling named entities by separate models.
• Named entities are very important for news articles.
Results (cont.)
Overall Performance on Dataset 2
Results (cont.)
How many events?
• Salient peak
• Mutual information is used to measure the fitness of a partition against the ground truth.
Conclusions and Future Work
Contribution: a multi-model RED algorithm that models two characteristics of news articles and events.
Future Work:
• Find better representations of the contents of news articles.
• Study how to use dynamic models to model news events, such as Hidden Markov Models (HMM) and Independent Component Analysis (ICA).