Diversifying Query Results on Semi-Structured Data
Md. Mahbub Hasan
University of California, Riverside
XML Document
[Figure: XML tree rooted at Bib with three records. PhDThesis: School = UToronto; Author (First Name, Last Name) = Michalis Faloutsos. PhDThesis: Author (First Name, Last Name) = Christos Faloutsos; School = UToronto. Paper: Author (First Name, Last Name) = Michalis Faloutsos; Title = Networking.]
Query
Find all Bibliography records related to Faloutsos
Twig Pattern: Bib with descendant Faloutsos
XPath Expression: //Bib//Faloutsos
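As a rough illustration of the query, the sketch below rebuilds the example document with Python's xml.etree.ElementTree and collects every record under Bib that contains the value Faloutsos. The tag names (FirstName, LastName) and the helper matching_records are assumptions made for this example. The slides draw Faloutsos as a node of the twig pattern, so matching on text values is only one possible reading.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of the example document; tag names are assumed.
XML = """
<Bib>
  <PhDThesis>
    <School>UToronto</School>
    <Author><FirstName>Michalis</FirstName><LastName>Faloutsos</LastName></Author>
  </PhDThesis>
  <PhDThesis>
    <Author><FirstName>Christos</FirstName><LastName>Faloutsos</LastName></Author>
    <School>UToronto</School>
  </PhDThesis>
  <Paper>
    <Author><FirstName>Michalis</FirstName><LastName>Faloutsos</LastName></Author>
    <Title>Networking</Title>
  </Paper>
</Bib>
"""


def matching_records(root, keyword):
    """Return the children of root that have a descendant whose text equals keyword."""
    return [
        record
        for record in root
        if any((node.text or "").strip() == keyword for node in record.iter())
    ]


root = ET.fromstring(XML.strip())
results = matching_records(root, "Faloutsos")
print([record.tag for record in results])  # ['PhDThesis', 'PhDThesis', 'Paper']
```

All three records match, which is why the choice of which k to return becomes the interesting question.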
Results
[Figure: the XML tree from the XML Document slide, with the three records (two PhDThesis, one Paper) that contain Faloutsos.]
Problem
Suppose we can return only two results to the user (k = 2).
Which two results should we return?
Which Two Results Should We Return?
[Figure: the three candidate results: PhDThesis (UToronto, Michalis Faloutsos), PhDThesis (Christos Faloutsos, UToronto), and Paper (Michalis Faloutsos, Networking).]
Solution
Suppose we can return only two results to the user (k = 2). Which two results should we return?
Return the results that are most diverse from each other.
The idea is to help the user better understand and explore the result set.
Diversity Problem
Can be divided into two subproblems:
How to compute the distance between two results?
How to find k most diverse results efficiently from the set of candidate answers?
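One common way to state the second subproblem, assumed here because the slides do not fix the objective, is as a max-sum selection. The symbols R (the set of candidate results), d (the distance from the first subproblem), and S (the returned subset) are notation introduced for this sketch.

```latex
S^{*} \;=\; \operatorname*{arg\,max}_{S \subseteq R,\ |S| = k} \;\; \sum_{\{r_i, r_j\} \subseteq S,\ i < j} d(r_i, r_j)
```

With k = 2 this reduces to returning the single pair of results with the largest distance.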
How to Compute the Distance between Two Results?
Two types of differences between results:
Structural difference
Content difference
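Structural Differences
[Figure: the three results shown as separate trees rooted at Bib. Result 1: PhDThesis with School (UToronto) and Author (First Name, Last Name: Michalis Faloutsos). Result 2: PhDThesis with Author (First Name, Last Name: Christos Faloutsos) and School (UToronto). Result 3: Paper with Author (First Name, Last Name: Michalis Faloutsos) and Title (Networking).]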
Content Differences
[Figure: the same three result trees, compared by the values at their leaves: UToronto and Michalis Faloutsos; Christos Faloutsos and UToronto; Michalis Faloutsos and Networking.]
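The slides do not spell out the distance measure, so the following is only a hypothetical stand-in showing how the two kinds of difference could be combined. Each result is flattened into root-to-leaf label paths with their values. Structural distance is taken as the Jaccard distance over the paths, content distance as the Jaccard distance over the values, and the weight alpha is an assumed parameter. None of these choices comes from the presentation.

```python
# Each result maps a root-to-leaf label path to its value.
# The flattened encoding and the tag names are assumptions for illustration.
THESIS_MICHALIS = {
    "PhDThesis/School": "UToronto",
    "PhDThesis/Author/FirstName": "Michalis",
    "PhDThesis/Author/LastName": "Faloutsos",
}
THESIS_CHRISTOS = {
    "PhDThesis/Author/FirstName": "Christos",
    "PhDThesis/Author/LastName": "Faloutsos",
    "PhDThesis/School": "UToronto",
}
PAPER_MICHALIS = {
    "Paper/Author/FirstName": "Michalis",
    "Paper/Author/LastName": "Faloutsos",
    "Paper/Title": "Networking",
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets are at distance 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def structural_distance(r1, r2):
    """Compare only the label paths, ignoring values and sibling order."""
    return jaccard_distance(set(r1), set(r2))


def content_distance(r1, r2):
    """Compare only the leaf values, ignoring where they occur."""
    return jaccard_distance(set(r1.values()), set(r2.values()))


def distance(r1, r2, alpha=0.5):
    """Weighted mix of structural and content distance (alpha is assumed)."""
    return alpha * structural_distance(r1, r2) + (1 - alpha) * content_distance(r1, r2)


print(structural_distance(THESIS_MICHALIS, THESIS_CHRISTOS))  # 0.0
print(structural_distance(THESIS_MICHALIS, PAPER_MICHALIS))   # 1.0
print(content_distance(THESIS_MICHALIS, THESIS_CHRISTOS))     # 0.5
```

Under this stand-in the two theses are structurally identical and differ only in content, while the paper differs from both theses in structure.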
Finding Diverse Results
Naïve Approach:
Compute all pair-wise distances of the results
Find the k-result subset with maximum diversity
Challenges to improve the naïve approach:
Reduce the number of distance computations
Prune a large fraction of the k-result subsets
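The two steps of the naïve approach can be sketched as follows. Diversity is taken here to be the sum of pairwise distances within the subset, which is an assumption, since the slides do not define it. The function names and the toy input (numbers standing in for results, absolute difference standing in for the distance) are hypothetical.

```python
from itertools import combinations


def pairwise_distances(results, dist):
    """Step 1: compute the distance for every unordered pair of results."""
    return {
        (i, j): dist(results[i], results[j])
        for i, j in combinations(range(len(results)), 2)
    }


def naive_diverse_subset(results, k, dist):
    """Step 2: enumerate every k-subset and keep the one with the largest
    total pairwise distance (the diversity objective assumed here)."""
    table = pairwise_distances(results, dist)

    def diversity(indices):
        # indices is an increasing tuple, so each pair (i, j) has i < j
        # and matches a key of the precomputed table.
        return sum(table[pair] for pair in combinations(indices, 2))

    best = max(combinations(range(len(results)), k), key=diversity)
    return [results[i] for i in best], diversity(best)


# Toy usage: numbers stand in for results, absolute difference for the distance.
subset, score = naive_diverse_subset([0, 1, 5, 6, 10], 2, lambda a, b: abs(a - b))
print(subset, score)  # [0, 10] 10
```

For n results this computes n(n-1)/2 distances and scores every one of the C(n, k) subsets, which is the cost the two challenges above aim to reduce.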
Conclusion
Distance Measure for Structural Query Results:
Novel and Efficient
Considers both Structural and Content Information
Diversification Algorithm:
Heuristic approach to improve the naïve algorithm
Future Work
Consider approximate matches:
Approximation in structure
Approximation in value
Thank You!