Posted on 22-Dec-2015


Automatically obtain a description for a larger cluster of relevant documents

Identify terms related to the query terms: synonyms, stemming variations, and terms close to the query terms in the text

Local analysis: use correlated terms from the retrieved documents for query expansion

Three types of clusters: association, metric, and scalar clusters

Association clusters

Stems that co-occur frequently inside documents have a synonymity association

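Un-normalized correlation factor: $s_{u,v} = c_{u,v}$, where $c_{u,v} = \sum_{d_j \in D_l} f_{s_u,j} \times f_{s_v,j}$, $f_{s_u,j}$ is the frequency of stem $s_u$ in document $d_j$, and $D_l$ is the set of locally retrieved documents

Normalized correlation factor: $s_{u,v} = \dfrac{c_{u,v}}{c_{u,u} + c_{v,v} - c_{u,v}}$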
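The two association factors can be sketched in Python as follows. This is a minimal illustration that assumes each retrieved document has already been reduced to a list of stems. The function names `association_correlations` and `normalize_association` are hypothetical. The sketch accumulates $c_{u,v}$ as a sum of frequency products over the local documents and then applies the normalization.

```python
from collections import Counter


def association_correlations(local_docs):
    """Un-normalized factor: c[u][v] = sum over documents of f(u, d) * f(v, d).

    local_docs is assumed to be a list of documents, each a list of stems.
    """
    freqs = [Counter(doc) for doc in local_docs]
    stems = sorted({stem for doc in local_docs for stem in doc})
    c = {u: {v: 0 for v in stems} for u in stems}
    for f in freqs:
        # Only stems present in this document contribute a non-zero product.
        for u, fu in f.items():
            for v, fv in f.items():
                c[u][v] += fu * fv
    return c


def normalize_association(c):
    """Normalized factor: c[u][v] / (c[u][u] + c[v][v] - c[u][v]).

    The denominator is positive because c[u][v] <= (c[u][u] + c[v][v]) / 2.
    """
    return {
        u: {v: c[u][v] / (c[u][u] + c[v][v] - c[u][v]) for v in c[u]}
        for u in c
    }


docs = [["a", "b", "a"], ["b", "c"]]
c = association_correlations(docs)
s = normalize_association(c)
print(c["a"]["b"], s["a"]["b"])  # 2 0.5
```

In the toy data, "a" occurs twice and "b" once in the first document, so $c_{a,b} = 2 \times 1 = 2$. With $c_{a,a} = 4$ and $c_{b,b} = 2$, the normalized value is $2 / (4 + 2 - 2) = 0.5$.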

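Build local association clusters as follows: for each stem $s_u$, let $S_u(n)$ be the set of the $n$ stems $s_v$ ($v \neq u$) with the largest values $s_{u,v}$; $S_u(n)$ is the local association cluster around $s_u$

Find the clusters for the query terms and use their stems to expand the query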
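The following Python sketch builds clusters and expands a query, assuming a correlation matrix is already available as a dict of dicts. The cluster for stem $u$ is taken to be the $n$ stems $v \neq u$ with the largest $s_{u,v}$. Three details are assumptions for illustration: skipping zero-correlation stems, breaking ties alphabetically, and the helper names `local_cluster` and `expand_with_clusters`. The toy matrix is also invented.

```python
def local_cluster(s, u, n):
    """Return the n stems v != u with the largest s[u][v].

    Assumption: stems with zero correlation are skipped, and ties are broken
    alphabetically so the result is deterministic.
    """
    candidates = [(v, score) for v, score in s[u].items() if v != u and score > 0]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [v for v, _ in candidates[:n]]


def expand_with_clusters(s, query_stems, n):
    """Append the stems of each query stem's local cluster to the query."""
    expanded = list(query_stems)
    for u in query_stems:
        if u not in s:  # stem absent from the retrieved documents
            continue
        for v in local_cluster(s, u, n):
            if v not in expanded:
                expanded.append(v)
    return expanded


# Toy correlation matrix (for example, normalized association factors).
s = {
    "car": {"car": 1.0, "auto": 0.6, "engine": 0.4, "road": 0.1},
    "auto": {"car": 0.6, "auto": 1.0, "engine": 0.5, "road": 0.0},
    "engine": {"car": 0.4, "auto": 0.5, "engine": 1.0, "road": 0.2},
    "road": {"car": 0.1, "auto": 0.0, "engine": 0.2, "road": 1.0},
}
print(local_cluster(s, "car", 2))           # ['auto', 'engine']
print(expand_with_clusters(s, ["car"], 2))  # ['car', 'auto', 'engine']
```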

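Metric clusters

Consider the distance between two terms to compute their correlation factor: $c_{u,v} = \sum_{k_i \in V(s_u)} \sum_{k_j \in V(s_v)} \frac{1}{r(k_i, k_j)}$, where $V(s_u)$ is the set of keywords that have stem $s_u$ and $r(k_i, k_j)$ is the distance in words between $k_i$ and $k_j$ in the same document ($r(k_i, k_j) = \infty$ when they occur in different documents)

Un-normalized correlation factor: $s_{u,v} = c_{u,v}$

Normalized correlation factor: $s_{u,v} = \dfrac{c_{u,v}}{|V(s_u)| \times |V(s_v)|}$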

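Build local metric clusters as follows: for each stem $s_u$, the local metric cluster $S_u(n)$ is the set of the $n$ stems $s_v$ ($v \neq u$) with the largest metric correlation $s_{u,v}$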
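Here is a minimal Python sketch of the metric correlation factor and a metric cluster. It assumes each document is a list of stems and takes the distance $r$ to be the difference between word positions, so adjacent words have $r = 1$. Pairs of occurrences in different documents contribute nothing, which corresponds to $r = \infty$. The name `metric_correlations` and the toy documents are hypothetical.

```python
from collections import defaultdict


def metric_correlations(local_docs, normalize=False):
    """Metric correlation: c[u][v] = sum of 1 / r(k_i, k_j) over occurrence pairs.

    Assumptions: each document is a list of stems, and r is the difference
    between word positions; occurrences in different documents contribute 0
    (r = infinity).
    """
    occurrences = defaultdict(list)  # stem -> [(doc_id, position), ...]
    for doc_id, doc in enumerate(local_docs):
        for pos, stem in enumerate(doc):
            occurrences[stem].append((doc_id, pos))
    stems = sorted(occurrences)
    s = {u: {} for u in stems}
    for u in stems:
        for v in stems:
            if u == v:
                continue
            # Distinct stems never share a position, so abs(pi - pj) >= 1.
            c = sum(
                1 / abs(pi - pj)
                for di, pi in occurrences[u]
                for dj, pj in occurrences[v]
                if di == dj
            )
            if normalize:
                # Divide by |V(s_u)| * |V(s_v)|, the number of occurrence pairs.
                c /= len(occurrences[u]) * len(occurrences[v])
            s[u][v] = c
    return s


docs = [["a", "x", "b"], ["b", "a"]]
s = metric_correlations(docs)
# Metric cluster of size 1 around "a": the stem with the largest s["a"][v].
cluster_a = sorted(s["a"], key=lambda v: (-s["a"][v], v))[:1]
print(s["a"]["b"], cluster_a)  # 1.5 ['b']
```

In this data, "a" and "b" are two words apart in the first document and adjacent in the second, so $c_{a,b} = 1/2 + 1/1 = 1.5$. The cluster itself is selected exactly as for association clusters; only the correlation matrix changes.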

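Scalar clusters

Two stems with similar neighborhoods have some synonymity relationship: their correlation is the cosine of their correlation vectors, $s_{u,v} = \dfrac{\vec{s}_u \cdot \vec{s}_v}{|\vec{s}_u| \times |\vec{s}_v|}$, where $\vec{s}_u = (s_{u,1}, s_{u,2}, \dots)$ is the row of stem $s_u$ in an association or metric correlation matrix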

A stem $s_u$ is a neighbor of $s_v$ if $s_u$ belongs to a cluster of size $n$ associated with $s_v$

Neighbor stems having a synonymity relationship are not necessarily synonyms in the grammatical sense
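The scalar correlation and the neighbor test can be sketched in Python as shown below. Similarity of neighborhoods is taken to be the cosine between the two stems' rows of a correlation matrix. The sketch assumes that matrix is given as a dict of dicts, with missing entries treated as 0. The helper names `scalar_correlations` and `is_neighbor` are hypothetical. The toy matrix is symmetric and chosen so that the rows of "a" and "b" are proportional.

```python
import math


def scalar_correlations(s):
    """Scalar factor: cosine between the correlation rows of two stems.

    s is assumed to be a square association or metric correlation matrix given
    as a dict of dicts; missing entries count as 0.
    """
    stems = sorted(s)
    rows = {u: [s[u].get(v, 0.0) for v in stems] for u in stems}

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    return {u: {v: cosine(rows[u], rows[v]) for v in stems if v != u} for u in stems}


def is_neighbor(scalar, u, v, n):
    """s_u is a neighbor of s_v if s_u is among the n stems most similar to s_v."""
    ranked = sorted(scalar[v], key=lambda w: (-scalar[v][w], w))
    return u in ranked[:n]


s = {
    "a": {"a": 1.0, "b": 2.0, "c": 0.0},
    "b": {"a": 2.0, "b": 4.0, "c": 0.0},
    "c": {"a": 0.0, "b": 0.0, "c": 1.0},
}
scalar = scalar_correlations(s)
print(round(scalar["a"]["b"], 6), scalar["a"]["c"])  # 1.0 0.0
print(is_neighbor(scalar, "b", "a", 1))              # True
```

Stems "a" and "b" have identical neighborhoods up to scale, so their scalar correlation is 1 even though their direct correlation $s_{a,b} = 2$ says nothing about synonymy by itself.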

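The un-normalized factor tends to group frequent stems and the normalized factor tends to group rarer ones, so taking the union of the un-normalized and normalized clusters provides a better representation of possible correlations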
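A small Python sketch of a union cluster for one stem follows. It assumes each factor's cluster is the top $n$ stems of that factor's row, with ties broken alphabetically. The helper names `top_n` and `union_cluster` are hypothetical, and the numbers are toy values. The un-normalized row gives self-correlations $c_{q,q} = 4$ and assumes $c_{common,common} = 100$ and $c_{rare,rare} = 2$. The normalized row follows from those values.

```python
def top_n(s, u, n):
    """The n stems v != u with the largest s[u][v] (ties broken alphabetically)."""
    ranked = sorted((v for v in s[u] if v != u), key=lambda v: (-s[u][v], v))
    return ranked[:n]


def union_cluster(unnormalized, normalized, u, n):
    """Union of the size-n clusters of u under both correlation factors."""
    return set(top_n(unnormalized, u, n)) | set(top_n(normalized, u, n))


# Toy rows for stem "q": 12 / (4 + 100 - 12) and 2 / (4 + 2 - 2).
unnormalized = {"q": {"q": 4, "common": 12, "rare": 2}}
normalized = {"q": {"q": 1.0, "common": 12 / 92, "rare": 0.5}}
print(sorted(union_cluster(unnormalized, normalized, "q", 1)))  # ['common', 'rare']
```

The un-normalized row alone would pick only the frequent stem "common", and the normalized row alone only "rare". The union keeps both.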

Metric clusters seem to perform better than pure association clusters

Global analysis

Expand the query using information from the whole set of documents in the collection

Build a thesaurus-like structure

Select terms for expansion based on their similarity to the whole query

Previous approaches, which considered individual query terms, failed to yield good results
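A thesaurus-like structure over the whole collection can be sketched in Python as follows. Each term is represented as a vector indexed by the documents, and the correlation $c_{u,v}$ between two terms is the dot product of their vectors. The weighting scheme is an assumption chosen for simplicity: raw term frequencies scaled to unit length. It is not presented as the weighting used by the original method. The name `similarity_thesaurus` and the toy collection are hypothetical.

```python
import math
from collections import Counter


def similarity_thesaurus(collection):
    """Term-term correlations c[u][v] computed over the whole collection.

    Each term is a vector indexed by documents. Assumption: the weights are
    raw term frequencies scaled to unit length, a simplification chosen for
    illustration.
    """
    counts = [Counter(doc) for doc in collection]
    terms = sorted({t for doc in collection for t in doc})
    vectors = {}
    for t in terms:
        vec = [cnt[t] for cnt in counts]  # Counter returns 0 for absent terms
        length = math.sqrt(sum(x * x for x in vec))
        vectors[t] = [x / length for x in vec]
    return {
        u: {v: sum(a * b for a, b in zip(vectors[u], vectors[v])) for v in terms}
        for u in terms
    }


collection = [["a", "b"], ["a"], ["c"]]
c = similarity_thesaurus(collection)
print(round(c["a"]["b"], 4), c["a"]["c"])  # 0.7071 0.0
```

Because the vectors are unit length, $c_{u,u} = 1$, and two terms that never share a document get $c_{u,v} = 0$.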

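Query expansion is done in three steps:

1. Represent the query in the concept space of the thesaurus: $\vec{q} = \sum_{k_i \in q} w_{i,q} \, \vec{k}_i$, where $w_{i,q}$ is the weight of term $k_i$ in the query

2. Compute the similarity between each term $k_v$ correlated to the query terms and the whole query: $sim(q, k_v) = \vec{q} \cdot \vec{k}_v = \sum_{k_u \in q} w_{u,q} \, c_{u,v}$, where $c_{u,v}$ is the correlation between terms $k_u$ and $k_v$ in the thesaurus

3. Expand the query with the top $r$ ranked terms according to $sim(q, k_v)$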
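The three expansion steps can be sketched in Python as shown here. The sketch assumes the global correlations are given as a dict of dicts and the query as a map from terms to weights. Step 1 stays implicit, because $\vec{q} \cdot \vec{k}_v$ can be computed from the correlations alone. The weight given to each added term, $sim(q, k_v) / \sum_{k_u \in q} w_{u,q}$, is an assumption, since the steps above only say to add the top $r$ terms. The name `expand_query` and the toy matrix are hypothetical.

```python
def expand_query(query_weights, c, r):
    """Three-step global expansion over a term-term correlation matrix c.

    query_weights maps each query term k_u to its weight w_{u,q}.
    """
    # Step 1 is implicit: q is the weighted sum of its term vectors, so
    # q . k_v can be computed from the correlations alone.
    # Step 2: sim(q, k_v) = sum over query terms of w_{u,q} * c[u][v].
    sims = {}
    for u, w in query_weights.items():
        for v, cuv in c.get(u, {}).items():
            sims[v] = sims.get(v, 0.0) + w * cuv
    # Step 3: keep the top r terms that are not already in the query.
    candidates = sorted(
        (v for v in sims if v not in query_weights),
        key=lambda v: (-sims[v], v),
    )
    total = sum(query_weights.values())
    expanded = dict(query_weights)
    for v in candidates[:r]:
        # Assumed weight for an added term: sim(q, k_v) / sum of query weights.
        expanded[v] = sims[v] / total
    return expanded


c = {
    "a": {"a": 1.0, "b": 0.5, "c": 0.8, "d": 0.1},
    "b": {"a": 0.5, "b": 1.0, "c": 0.2, "d": 0.4},
}
print(expand_query({"a": 1.0, "b": 1.0}, c, 1))  # {'a': 1.0, 'b': 1.0, 'c': 0.5}
```

Ranking against the whole query is what distinguishes this from per-term expansion. Here "c" (total similarity 1.0) beats "d" (0.5). Its correlation with "a" alone is high, and it also gets support from "b".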

This approach yields improved retrieval performance, in the range of 20%