Post on 22-Dec-2015
Automatically obtain a description for a larger cluster of relevant documents
Identify terms related to query terms
Synonyms, stemming variations, terms close to query terms
Local analysis
Use correlated terms from retrieved documents for query expansion
Three types of clusters
Association clusters
Stems co-occurring frequently inside documents have a synonymity association
Un-normalized correlation factor
$S_{u,v} = C_{u,v}$
Normalized correlation factor
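The transcript does not preserve the formula for the normalized factor. A minimal sketch of both factors follows, assuming the usual definitions from the local-analysis literature: $C_{u,v}$ is the sum over retrieved documents of the product of the two stems' frequencies, and the normalized factor is $C_{u,v} / (C_{u,u} + C_{v,v} - C_{u,v})$. The function name and the `freq` layout (one row per stem, one column per retrieved document) are hypothetical.

```python
def association_matrices(freq):
    """Return (unnormalized, normalized) stem-stem correlation matrices.

    freq[u][j] is the frequency of stem u in retrieved document j
    (hypothetical input layout, assumed for illustration).
    """
    n = len(freq)
    # Un-normalized factor: C[u][v] = sum_j freq[u][j] * freq[v][j]
    c = [
        [sum(fu * fv for fu, fv in zip(freq[u], freq[v])) for v in range(n)]
        for u in range(n)
    ]
    # Normalized factor (assumed definition): C[u][v] / (C[u][u] + C[v][v] - C[u][v])
    s = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            denom = c[u][u] + c[v][v] - c[u][v]
            s[u][v] = c[u][v] / denom if denom else 0.0
    return c, s


# Two stems over three retrieved documents.
c, s = association_matrices([[1, 0, 2], [1, 1, 0]])
```

With this input the two stems co-occur only in the first document, so the off-diagonal un-normalized value is 1 while the diagonal values are 5 and 2.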
Build local metric clusters as follows
A term $S_u$ is a neighbor of $S_v$ if $S_u$ belongs to a cluster (of size n) associated with $S_v$
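The neighbor relation can be sketched as a top-n lookup in a correlation matrix. This assumes the cluster associated with a stem is the set of the n other stems with the largest correlation values in its row; `local_cluster` and `is_neighbor` are hypothetical names, and `corr` is any square stem-stem correlation matrix.

```python
def local_cluster(corr, v, n):
    """Indices of the n stems most correlated with stem v (v itself excluded)."""
    others = [u for u in range(len(corr[v])) if u != v]
    # Sort by correlation with v, largest first; ties keep index order.
    others.sort(key=lambda u: corr[v][u], reverse=True)
    return others[:n]


def is_neighbor(corr, u, v, n):
    """True if stem u belongs to the size-n cluster associated with stem v."""
    return u in local_cluster(corr, v, n)


corr = [
    [9, 3, 1],
    [3, 9, 5],
    [1, 5, 9],
]
print(is_neighbor(corr, 1, 0, 1))  # stem 1 vs. the size-1 cluster of stem 0
```

Note that the relation is not symmetric in general: stem 1 is in the size-1 cluster of stem 0 here, but the size-1 cluster of stem 1 contains stem 2 instead.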
Neighbor stems having a synonymity relationship are not necessarily synonyms in the grammatical sense
Union of un-normalized and normalized clusters provides a better representation of possible correlations
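A small sketch of taking the union of the two clusters for one stem. The matrices below are made-up illustrative values, not data from the slides, and the helper names are hypothetical. The example is chosen so that the un-normalized cluster picks the high-frequency stem while the normalized cluster picks a rarer one.

```python
def top_n(row, u, n):
    """Set of the n indices with the largest values in row, excluding u."""
    idx = [v for v in range(len(row)) if v != u]
    idx.sort(key=lambda v: row[v], reverse=True)
    return set(idx[:n])


def union_cluster(unnorm, norm, u, n):
    """Union of the size-n un-normalized and normalized clusters of stem u."""
    return top_n(unnorm[u], u, n) | top_n(norm[u], u, n)


# Illustrative values only: stem 1 is very frequent, stem 2 is rare.
unnorm = [[10, 6, 2], [6, 50, 1], [2, 1, 3]]
norm = [[1.0, 0.11, 0.18], [0.11, 1.0, 0.02], [0.18, 0.02, 1.0]]
expansion_candidates = union_cluster(unnorm, norm, 0, 1)
```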
Metric clusters seem to perform better than purely association clusters
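The transcript does not preserve how the metric correlation is computed. A rough sketch, assuming the common choice of summing the inverse distance between occurrences of two stems within the same document, so that stems appearing close together correlate more strongly. Distance is taken here as the difference in token positions, which is an assumption made to avoid division by zero for adjacent words; `metric_correlation` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def metric_correlation(docs):
    """docs: list of token lists (assumed already stemmed).

    Returns a dict mapping (stem_a, stem_b) to the summed inverse distance
    over all pairs of occurrences in the same document.
    """
    corr = defaultdict(float)
    for tokens in docs:
        # combinations yields position pairs with i < j, so j - i >= 1.
        for (i, a), (j, b) in combinations(enumerate(tokens), 2):
            if a != b:
                weight = 1.0 / (j - i)
                corr[(a, b)] += weight
                corr[(b, a)] += weight  # keep the measure symmetric
    return corr


corr = metric_correlation([["query", "expansion", "term"]])
```

Stems that never share a document get no entry, which corresponds to an infinite distance contributing nothing.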
Global analysis
Expand the query using information from the whole set of documents in the collection
Build a thesaurus-like structure
Select terms for expansion based on their similarity to the whole query
Previous approaches, which considered individual query terms, failed to yield good results
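Selecting expansion terms by similarity to the whole query can be sketched as follows. This is a simplified illustration, not the slides' exact method: it assumes each term is represented as a vector of weights over the documents of the collection, combines the query terms into a single weighted sum, and ranks every other term by its dot product with that sum. `expand_query`, `term_vectors`, and `query_weights` are hypothetical names.

```python
def expand_query(term_vectors, query_weights, k):
    """Return the k non-query terms most similar to the query as a whole.

    term_vectors: {term: [weight in doc 0, weight in doc 1, ...]}
    query_weights: {query term: weight in the query}
    """
    dims = len(next(iter(term_vectors.values())))
    # Represent the whole query as the weighted sum of its term vectors.
    centroid = [0.0] * dims
    for term, w in query_weights.items():
        for d, x in enumerate(term_vectors[term]):
            centroid[d] += w * x
    # Score candidates against the whole query, not against single query terms.
    scores = {
        t: sum(a * b for a, b in zip(vec, centroid))
        for t, vec in term_vectors.items()
        if t not in query_weights
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


term_vectors = {
    "car": [1, 0, 1],
    "auto": [1, 0, 1],
    "engine": [0, 1, 1],
    "banana": [0, 1, 0],
}
added = expand_query(term_vectors, {"car": 1.0, "engine": 1.0}, 1)
```

Scoring against the combined query vector means a candidate has to relate to the query's overall concept, which is the contrast the slide draws with approaches that expand each query term in isolation.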