2005/09/13
A Probabilistic Model for Retrospective News Event Detection
Zhiwei Li, Bin Wang*, Mingjing Li, Wei-Ying Ma
University of Science and Technology of China / Microsoft Research Asia
SIGIR 2005
Abstract
Retrospective news event detection (RED): the discovery of previously unidentified events in a historical news corpus. Both the contents and the time information of news articles are helpful to RED, but most research focuses on utilizing the contents; few works have explored better usages of time information.
Proposal:
• A probabilistic model that incorporates both content and time information in a unified framework.
• An interactive RED system, HISCOVERY, which provides additional functions to present events: Photo Story and Chronicle.
Introduction
News Event: a specific thing that happens at a specific time and place.
RED: the discovery of previously unidentified events in a historical news corpus. Application: detect earthquakes that happened in the last ten years from historical news articles.
Exploration:
• Better representations of news articles and events, which should effectively model both the contents and the time information.
• Model events in a probabilistic manner.
Introduction (cont.)
Main contributions:
• Proposing a multi-model RED algorithm, in which both the contents and the time information of news articles are modeled explicitly and effectively.
• Proposing an approach to determine the approximate number of events from the article count-time distribution.
Related Work
RED: first proposed and defined by Yang et al. (SIGIR 1998), who also proposed an agglomerative clustering algorithm (Group Average Clustering, GAC). Few directly targeted research works have been reported since.
New Event Detection (NED): a similar topic that has been studied extensively. The most prevailing approach to NED was proposed by Allan et al. (SIGIR 1998) and Yang et al. (SIGIR 1998). Modifications: better representation of contents and better utilization of time information.
Related Work (cont.)
From the aspect of utilizing the contents:
• TF-IDF and cosine similarity
• New distance metrics, such as the Hellinger distance metric (SIGIR 2003)
• Better representation of documents, i.e. feature selection, Yang et al. (SIGKDD 2002)
• The usage of named entities has been studied, e.g. in Allan et al. (1999), Yang et al. (2002) and Lam et al. (2001)
• Re-weighting of terms, first proposed by Allan et al. (1999)
• Kumaran et al. (SIGIR 2004) explored using both text classification and named entities to improve the performance of NED.
Related Work (cont.)
From the aspect of utilizing time information, there are two kinds of usages:
• Some approaches only use the chronological order of documents.
• Others use decaying functions to modify the similarity metrics of the contents (Brants et al. SIGIR 2003).
Characteristics of News Articles and Events
“Halloween” is a topic, which in turn includes many events.
Characteristics of News Articles and Events (cont.)
Two most important characteristics of news articles and events:
• News articles are always aroused by news events, and the article count of an event changes with time. Events appear as peaks; however, in some situations the observed peaks do not correspond exactly to events.
• Both the contents and the timestamps of articles reporting the same event are similar across different news sites: the start and end times of reports on an event are very similar on different websites.
Method:
• The first characteristic leads the RED algorithm to be modeled as a latent variable model, where events are latent variables and articles are observations.
• The second characteristic makes it possible to gather many news stories on the same event by mixing articles from different sources.
Multi-model Retrospective News Event Detection Method
Multi-model approach: since contents and timestamps have different characteristics, a multi-model approach is proposed to incorporate them in a unified probabilistic framework.
Representations: according to knowledge about news, a news article can be represented by four kinds of information: who (persons), when (time), where (locations), and what (keywords).
article = {persons, locations, keywords, time}
event = {persons, locations, keywords, time}
p(article) = p(persons) p(locations) p(keywords) p(time)
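As a minimal numeric sketch of this factorization (all per-field probabilities are invented for illustration):

```python
# Hypothetical per-field probabilities for one article (invented numbers).
p_persons, p_locations, p_keywords, p_time = 0.02, 0.05, 0.01, 0.1

# Under the independence assumption, the article probability factorizes
# into the product of the four field probabilities.
p_article = p_persons * p_locations * p_keywords * p_time
```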
The Generative Model of News Articles
Generative Model
Contents:
• Use mixtures of unigram models to model contents.
• Since persons and locations are important, persons, locations and keywords are modeled by three separate mixtures of unigram models.
Timestamps:
• The article count-time distribution is a mixture of many per-event distributions.
• A peak is usually modeled by a Gaussian function, so a Gaussian Mixture Model (GMM) is chosen to model timestamps.
The whole model combines the four mixture models:
• Three mixtures of unigram models and one GMM.
The Generative Model of News Articles (cont.)
The two-step generating process of a news article:
1. Choose an event: e_j ~ Multinomial(θ)
2. Generate a news article x_i ~ p(x_i | e_j):
   For each entity, according to the type of the current entity:
     Choose a person ~ Multinomial(θ_j^p)
     Choose a location ~ Multinomial(θ_j^l)
     Choose a keyword ~ Multinomial(θ_j^k)
   For its timestamp:
     Draw a timestamp ~ N(μ_j, σ_j)
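A toy sampler following these two steps (the events, vocabularies, and all parameters below are invented for illustration):

```python
import random

random.seed(0)

# Invented parameters: two events, tiny vocabularies, Gaussian timestamps.
p_event = [0.6, 0.4]                              # Multinomial prior over events
persons = [["clinton", "gore"], ["arafat"]]       # per-event person vocabularies
locations = [["florida"], ["gaza", "jerusalem"]]
keywords = [["recount", "ballot"], ["talks"]]
time_params = [(10.0, 2.0), (40.0, 3.0)]          # (mean day, std dev) per event

def generate_article():
    # Step 1: choose an event e_j ~ Multinomial(p_event)
    e = random.choices(range(len(p_event)), weights=p_event)[0]
    # Step 2: generate one entity of each type, then draw a timestamp ~ N(mu_j, sigma_j)
    return e, {
        "person": random.choice(persons[e]),
        "location": random.choice(locations[e]),
        "keyword": random.choice(keywords[e]),
        "time": random.gauss(*time_params[e]),
    }

e, article = generate_article()
```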
The Generative Model of News Articles (cont.)
A graphical representation of this model.
N: the term space sizes of the three kinds of entities (Np, Nl and Nn).
Learning Model Parameters
Model parameters can be estimated by the Maximum Likelihood method:
l(X; θ) = log p(X | θ) = Σ_{i=1}^{M} log p(x_i | θ) = Σ_{i=1}^{M} log [ Σ_{j=1}^{k} p(e_j) p(x_i | e_j, θ) ]
Given an event e_j, the four kinds of information of the i-th article are conditionally independent:
p(x_i | e_j) = p(time_i | e_j) p(persons_i | e_j) p(locations_i | e_j) p(keywords_i | e_j)
Learning Model Parameters (cont.)
Expectation Maximization algorithm
EM is generally applied to maximize the log-likelihood. By using the independence assumptions, the parameters of the four mixture models can be estimated independently. In the E-step, compute the posterior probabilities:
p^{(t+1)}(e_j | x_i) = p^{(t)}(e_j) p^{(t)}(x_i | e_j) / p^{(t)}(x_i),  where p^{(t)}(x_i) = Σ_{j} p^{(t)}(e_j) p^{(t)}(x_i | e_j)
Learning Model Parameters (cont.)
Expectation Maximization algorithm
In the M-step, update the parameters of the four models. For the three mixtures of unigram models, parameters are updated by:
p^{(t+1)}(w_n | e_j) = [ 1 + Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i) tf(i, n) ] / [ N + Σ_{s=1}^{N} Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i) tf(i, s) ]
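A sketch of this update for one unigram mixture component, with a toy corpus (term frequencies and posteriors invented); the added 1 and N implement the Laplace smoothing in the formula:

```python
# tf[i][n]: frequency of word n in article i; post[i]: p(e_j | x_i) from the E-step.
tf = [[2, 0, 1], [0, 3, 1]]   # two articles, vocabulary of N = 3 words (invented)
post = [0.9, 0.1]
N = 3

def p_word_given_event(n):
    # Numerator: 1 + sum_i p(e_j | x_i) * tf(i, n)   (Laplace-smoothed count)
    num = 1 + sum(post[i] * tf[i][n] for i in range(len(tf)))
    # Denominator: N + sum_s sum_i p(e_j | x_i) * tf(i, s)
    den = N + sum(post[i] * tf[i][s] for i in range(len(tf)) for s in range(N))
    return num / den

probs = [p_word_given_event(n) for n in range(N)]
# The smoothed distribution over the vocabulary sums to 1.
```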
Learning Model Parameters (cont.)
Expectation Maximization algorithm
In the M-step, the parameters of the GMM are updated by:
μ_j^{(t+1)} = Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i) time_i / Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i)
σ_j^{2(t+1)} = Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i) (time_i − μ_j^{(t+1)})^2 / Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i)
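The same weighted mean/variance update with toy posteriors and timestamps (all numbers invented):

```python
# Posterior weights p(e_j | x_i) for one event j, and article timestamps (invented).
post = [0.5, 1.0, 0.5]
times = [1.0, 2.0, 3.0]

w = sum(post)
# Weighted mean and variance, exactly as in the update formulas above.
mu = sum(p * t for p, t in zip(post, times)) / w
var = sum(p * (t - mu) ** 2 for p, t in zip(post, times)) / w
print(mu, var)  # 2.0 0.5
```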
Since the mean and variance of the GMM are updated consistently with the whole model, the Gaussian functions work like sliding windows on the time line.
Learning Model Parameters (cont.)
Expectation Maximization algorithm
In the M-step, the mixture proportions are updated by:
p^{(t+1)}(e_j) = (1 / M) Σ_{i=1}^{M} p^{(t+1)}(e_j | x_i)
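A one-line sketch of this update (posteriors invented):

```python
# post[i][j] = p(e_j | x_i) from the E-step: M = 3 articles, k = 2 events (invented).
post = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
M, k = len(post), len(post[0])

# p(e_j) is the average posterior responsibility over all articles.
p_event = [sum(row[j] for row in post) / M for j in range(k)]
```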
The EM algorithm increases the log-likelihood monotonically, but it may stop at a local maximum.
How Many Events?
Basic idea: the initial estimate of the number of events can be set to the number of peaks, but noise distorts the distribution.
Salient peak: define a salient score for each peak as:
score(peak) = left(peak) × right(peak)
How Many Events? (cont.)
Salient peak: use hill-climbing to detect all peaks and calculate their salient scores; the number of the top 20% of peaks is the initial estimate of k.
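A sketch of peak detection and scoring over a daily article-count series. The counts are invented, and left/right are read here as the drops from the peak to the lowest point on each side, which is one plausible interpretation of the salient-score definition:

```python
# Daily article counts (invented). A peak is a point higher than both neighbors.
counts = [1, 3, 9, 4, 2, 6, 3, 1]

def find_peaks(series):
    # Simple hill-climbing-style local-maximum test.
    return [i for i in range(1, len(series) - 1)
            if series[i] > series[i - 1] and series[i] > series[i + 1]]

def salient_score(series, i):
    # score(peak) = left(peak) * right(peak): drops to the lowest point on each side.
    left = series[i] - min(series[:i])
    right = series[i] - min(series[i + 1:])
    return left * right

peaks = find_peaks(counts)
scores = {i: salient_score(counts, i) for i in peaks}
print(peaks)  # [2, 5]
```

The higher-scoring peaks would seed the initial events.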
Alternative: the user can specify the initial value of k, then use split/merge.
Model selection: apply the Minimum Description Length (MDL) principle to select among values of k:
k = argmax_k [ log p(X; θ̂_k) − (m_k / 2) log M ]
m_k = 3k − 1 + k(N_p − 1) + k(N_l − 1) + k(N_n − 1)
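A sketch of the MDL selection; the corpus size, vocabulary sizes, and fitted log-likelihoods below are all invented for illustration:

```python
import math

M = 1000                  # number of articles (invented)
Np, Nl, Nn = 50, 30, 200  # vocabulary sizes of persons/locations/keywords (invented)

def mdl_score(log_likelihood, k):
    # m_k = 3k - 1 + k(Np - 1) + k(Nl - 1) + k(Nn - 1) free parameters
    m_k = 3 * k - 1 + k * (Np - 1) + k * (Nl - 1) + k * (Nn - 1)
    return log_likelihood - (m_k / 2) * math.log(M)

# Hypothetical log-likelihoods after fitting each candidate k (invented numbers):
lls = {2: -9000.0, 3: -7500.0, 4: -7400.0}
best_k = max(lls, key=lambda k: mdl_score(lls[k], k))
print(best_k)  # 3
```

Here k = 4 fits slightly better but pays a larger penalty, so k = 3 wins.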
Event Summarization
Two ways to summarize news events:
Choose some features with the maximum probabilities to represent the event:
• For event j, the ‘protagonist’ is the person with the maximum p(person_p | e_j)
• However, the readability is poor.
Choose one news article as the representative for each news event:
• The article with the maximum p(x_i | e_j)
• The first article of each event is also a good representative.
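Both strategies reduce to an argmax over learned probabilities (the names and values below are invented):

```python
# Learned probabilities for one event e_j (invented numbers).
p_person_given_event = {"clinton": 0.6, "gore": 0.3, "bush": 0.1}
p_article_given_event = {"doc_17": 0.4, "doc_3": 0.35, "doc_9": 0.25}

# Feature summary: the 'protagonist' is the most probable person.
protagonist = max(p_person_given_event, key=p_person_given_event.get)
# Article summary: the representative is the most probable article.
representative = max(p_article_given_event, key=p_article_given_event.get)
print(protagonist, representative)  # clinton doc_17
```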
Algorithm Summary
1. Multi-model RED algorithm:
   a. Use the hill-climbing algorithm to find all peaks.
   b. Use salient scores to determine the top 20% of peaks, and initialize events correspondingly.
2. Learn model parameters:
   a. E-step: compute posteriors.
   b. M-step: update parameters.
3. Increase/decrease the initial number of events until the minimum/maximum number of events is reached:
   a. Split/merge the current big/small peaks, and re-initialize events correspondingly.
   b. Go to step 2.
4. Perform model selection by MDL.
5. Summarize.
Application: HISCOVERY System
HISCOVERY (HIStory disCOVERY): provides Photo Story and Chronicle. News articles come from 12 news sites.
Photo Story
Application: HISCOVERY System (cont.)
HISCOVERY Chronicle:
• The user enters a topic.
• HISCOVERY searches the news corpus to gather related articles.
• The proposed RED approach is applied to detect events belonging to this topic, and summaries of the events are then sorted in chronological order.
Experimental Methods
Data Preparation
• The first is the TDT4 dataset.
• The second is built by choosing three representative topics from the TDT4 dataset and downloading articles from some news websites.
Experimental Methods (cont.)
Experimental Design
• In the first two experiments, the cluster number is set to the number of events; in practice, the number of events must be determined automatically.
• For comparison, Yang et al.'s augmented Group Average Clustering (GAC) and a kNN algorithm are chosen as baselines.
Evaluation Measures
• Contingency tables are built, and the corresponding measures (precision, recall, and F1) are calculated.
Results
Overall Performance on Dataset 1
• The better performance of the full probabilistic model indicates the benefits of modeling named entities by separate models.
• Named entities are very important for news articles.
Results (cont.)
Overall Performance on Dataset 2
Results (cont.)
How many events?
• Salient peak
• Mutual information is used to measure the fitness of a partition against the ground truth.
Conclusions and Future Work
Contribution: a multi-model RED algorithm that models two characteristics of news articles and events.
Future Work:
• Find better representations of the contents of news articles.
• Study how to use dynamic models to model news events, such as Hidden Markov Models (HMM) and Independent Component Analysis (ICA).