
Page 1: Identifying Meaningful Return Information for XML Keyword Search

Ziyang Liu, Yi Chen
Arizona State University

Page 2: Searching XML Data

SIGMOD 2007

Find the position of the player with name "Mutombo"

XQuery

for $x in doc("DB.xml")//player,
    $y in $x/name
where $y = "Mutombo"
return $x/position

Keyword Search

Mutombo, position

[Figure: sample XML tree. A league element contains several team elements (others elided as "…"). The team shown has name "Rockets", founded "1967", division "southwest", stadium "Toyota", and a players element. Each player has name, position, and nationality: ("Mutombo", "center", "Congo") and ("Wells", "guard", "U.S").]
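For comparison with the two query styles on this slide, the structured lookup can be sketched in Python with the standard library's `xml.etree.ElementTree`. The `SAMPLE` document is a hand-written rendering of the slide's tree (an assumption, since the slides show only a drawing), and `positions_of` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written rendering of the slide's sample tree (assumed layout).
SAMPLE = """
<league>
  <team>
    <name>Rockets</name>
    <founded>1967</founded>
    <division>southwest</division>
    <stadium>Toyota</stadium>
    <players>
      <player>
        <name>Mutombo</name>
        <position>center</position>
        <nationality>Congo</nationality>
      </player>
      <player>
        <name>Wells</name>
        <position>guard</position>
        <nationality>U.S</nationality>
      </player>
    </players>
  </team>
</league>
"""


def positions_of(root, player_name):
    """Mirror the XQuery: for each player whose name matches, return the position."""
    return [
        player.findtext("position")
        for player in root.iter("player")
        if player.findtext("name") == player_name
    ]


root = ET.fromstring(SAMPLE)
print(positions_of(root, "Mutombo"))  # ['center']
```

The structured version needs to know the element names `player`, `name`, and `position` in advance, which is exactly the knowledge a keyword query such as "Mutombo, position" does not assume.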

Page 3: Challenges in XML Keyword Search

How to select relevant keyword matches and connect them?
• Inferring for clauses (with variable bindings) and where clauses in XQuery
• Has been much studied: XRank [Guo et al 03], XSEarch [Cohen et al 03], Meaningful LCA [Li et al 04], Smallest LCA [Xu et al 05]

How to identify meaningful return information?
• Inferring return clauses in XQuery
• Limited research has been done:
  Users or system administrators specify [Hristidis et al 03, Li et al 04]
  Whole document [Carmel et al 02]
  Subtree Return [Cohen et al 03, Guo et al 03, Xu et al 05]
  Path Return variants [Hristidis et al 06]

XSeek: automatically and intelligently identifies return information

Page 4: Selecting and Connecting Keyword Matches

Identify relevant matches using variants of LCA (lowest common ancestor) concepts [Cohen et al 03, Li et al 04, Xu et al 05]

Q1: Mutombo, position

[Figure: sample XML tree (league, team, players, player)]
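The idea of connecting keyword matches through their lowest common ancestor can be sketched as follows. This is a brute-force illustration in the spirit of the "smallest LCA" idea, not the algorithm of any of the cited papers. The sample document, the match rule (a keyword matches an element name or an element's text, ignoring case), and all function names are assumptions made for the example.

```python
import itertools
import xml.etree.ElementTree as ET

# Hand-written rendering of the slide's sample tree (assumed layout).
SAMPLE = """
<league>
  <team>
    <name>Rockets</name>
    <founded>1967</founded>
    <division>southwest</division>
    <stadium>Toyota</stadium>
    <players>
      <player><name>Mutombo</name><position>center</position><nationality>Congo</nationality></player>
      <player><name>Wells</name><position>guard</position><nationality>U.S</nationality></player>
    </players>
  </team>
</league>
"""


def index_paths(root):
    """Map every element to its path from the root, as a tuple of child indexes."""
    paths = {root: ()}
    stack = [root]
    while stack:
        node = stack.pop()
        for i, child in enumerate(node):
            paths[child] = paths[node] + (i,)
            stack.append(child)
    return paths


def matches(paths, keyword):
    """Paths of elements whose name or text equals the keyword (case-insensitive)."""
    kw = keyword.lower()
    return [
        path
        for node, path in paths.items()
        if node.tag.lower() == kw or (node.text or "").strip().lower() == kw
    ]


def lca(path_list):
    """Lowest common ancestor of several nodes = longest common prefix of their paths."""
    prefix = []
    for parts in zip(*path_list):
        if len(set(parts)) != 1:
            break
        prefix.append(parts[0])
    return tuple(prefix)


def smallest_lcas(paths, keywords):
    """LCAs of one match per keyword, keeping only those with no other LCA below them."""
    combos = itertools.product(*(matches(paths, k) for k in keywords))
    candidates = {lca(combo) for combo in combos}
    return sorted(
        c for c in candidates
        if not any(o != c and o[:len(c)] == c for o in candidates)
    )


def node_at(root, path):
    """Follow a path of child indexes down from the root."""
    node = root
    for i in path:
        node = node[i]
    return node


root = ET.fromstring(SAMPLE)
paths = index_paths(root)
hit = node_at(root, smallest_lcas(paths, ["Mutombo", "position"])[0])
print(hit.tag, hit.findtext("name"))  # player Mutombo
```

For Q1 the pairing of Mutombo's name with Wells's position yields the higher `players` node, which is discarded because the `player` node for Mutombo lies beneath it.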

Page 5: Selecting and Connecting Keyword Matches

Q1: Mutombo, position

[Figure: sample XML tree (league, team, players, player)]

Given relevant matches, what should be returned?

Page 6: Example I: Subtree Return

Q1: Mutombo, position

[Figure: sample XML tree (league, team, players, player)]

Q2: Mutombo, center

Page 7: Example I: Path Return

Q1: Mutombo, position

[Figure: sample XML tree (league, team, players, player)]

Q2: Mutombo, center

Page 8: Example I: XSeek

Q1: Mutombo, position

[Figure: sample XML tree (league, team, players, player)]

Q2: Mutombo, center

Page 9: Example II: Subtree Return, Path Return

Q3: Rockets

[Figure: sample XML tree (league, team, players, player)]

Page 10: Example II: XSeek

Q3: Rockets

[Figure: sample XML tree (league, team, players, player)]

Page 11: Contributions

XSeek: automatically infers meaningful return information for XML keyword search
• No elicitation from users or system administrators is required
• No schema information is required

Inferring search semantics
• Analyzing XML data structure
• Analyzing keyword match patterns
• Determining search results based on node types and match types

Efficient implementation of the search semantics

Experimental verification of effectiveness and efficiency

Page 12: Roadmap

Motivation

Inferring search semantics
• Analyzing keyword match patterns
• Analyzing XML data structure
• Identifying search results

XSeek architecture

Experiments

Conclusions

Page 13: Analyzing Keyword Match Patterns

Identifying search predicates and return nodes in keywords

Examples of structured queries
SQL: select position from Player where name = "Mutombo"
XQuery: for $x in doc("DB.xml")//player where $x/name = "Mutombo" return $x/position
In both queries, position is the return node and name = "Mutombo" is the search predicate.

Examples of keyword searches
Q1: Mutombo, position
Q2: Mutombo, center
Q3: Rockets
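The slides do not state the exact rule XSeek uses to split keywords into return nodes and search predicates. The sketch below assumes the simplest reading: a keyword that matches an element name is treated as a return node, and a keyword that matches a text value is treated as a search predicate. The sample document and the name `classify_keywords` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written rendering of part of the slide's sample tree (assumed layout).
SAMPLE = """
<league>
  <team>
    <name>Rockets</name>
    <players>
      <player><name>Mutombo</name><position>center</position><nationality>Congo</nationality></player>
      <player><name>Wells</name><position>guard</position><nationality>U.S</nationality></player>
    </players>
  </team>
</league>
"""


def classify_keywords(root, keywords):
    """Assumed rule: name match -> return node, value match -> search predicate."""
    names = {node.tag.lower() for node in root.iter()}
    values = {(node.text or "").strip().lower() for node in root.iter()}
    result = {"return_nodes": [], "search_predicates": [], "unmatched": []}
    for keyword in keywords:
        key = keyword.lower()
        if key in names:
            result["return_nodes"].append(keyword)
        elif key in values:
            result["search_predicates"].append(keyword)
        else:
            result["unmatched"].append(keyword)
    return result


root = ET.fromstring(SAMPLE)
q1 = classify_keywords(root, ["Mutombo", "position"])
q2 = classify_keywords(root, ["Mutombo", "center"])
print(q1["return_nodes"], q1["search_predicates"])  # ['position'] ['Mutombo']
print(q2["return_nodes"], q2["search_predicates"])  # [] ['Mutombo', 'center']
```

Under this reading Q1 names its return node explicitly, while Q2 and Q3 contain only search predicates.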

Page 14: Analyzing XML Data Structure

Three types of data nodes
• Entity nodes
• Attribute nodes
• Connection nodes

Related work on identifying node types [Xu et al 06]

[Figure: sample XML tree (league, team, players, player)]
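The slide names the three node types but not how they are told apart. One schema-free heuristic, assumed here purely for illustration, is: an element name is an entity if some parent holds it more than once, an attribute if it is not repeated and only carries a value, and a connection node otherwise. The second team in the sample is hypothetical and stands in for the teams the slide elides.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Assumed rendering of the slide's tree; "ExampleTeam" is a hypothetical
# placeholder for the teams the slide elides with "…".
SAMPLE = """
<league>
  <team>
    <name>Rockets</name>
    <founded>1967</founded>
    <players>
      <player><name>Mutombo</name><position>center</position></player>
      <player><name>Wells</name><position>guard</position></player>
    </players>
  </team>
  <team>
    <name>ExampleTeam</name>
  </team>
</league>
"""


def classify_tags(root):
    """Label each element name as entity, attribute, or connection (assumed heuristic)."""
    all_tags, repeated, has_children = set(), set(), set()
    for parent in root.iter():
        all_tags.add(parent.tag)
        if len(parent):
            has_children.add(parent.tag)
        # A name that occurs more than once under one parent is repeatable.
        counts = Counter(child.tag for child in parent)
        repeated.update(tag for tag, n in counts.items() if n > 1)
    kinds = {}
    for tag in all_tags:
        if tag in repeated:
            kinds[tag] = "entity"
        elif tag in has_children:
            kinds[tag] = "connection"
        else:
            kinds[tag] = "attribute"
    return kinds


kinds = classify_tags(ET.fromstring(SAMPLE))
print(kinds["team"], kinds["players"], kinds["position"])  # entity connection attribute
```

Classifying by element name rather than by individual node keeps the result stable even when one particular team happens to have a single player.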

Page 15: Identifying Search Results

Search results consist of

Matches to search predicates
• This allows users to verify the relevance of search results

Matches to return nodes
• This is what the user is searching for
• Matches are output according to node types:
  Attribute node: display name, value
  Entity node: display name, attributes, optionally entity and connection descendants
  Connection node: display name, optionally entity and connection descendants

Nodes that connect these matches
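The per-type output rules can be illustrated with a small formatter. It covers only the mandatory part of each rule and leaves out the optional entity and connection descendants. The node-type table `KINDS` is filled in by hand as an assumption, and `display` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Hand-assigned node types for the sample (assumed, not computed).
KINDS = {
    "players": "connection",
    "player": "entity",
    "name": "attribute",
    "position": "attribute",
    "nationality": "attribute",
}


def text_of(node):
    """Text value of an element, or an empty string if it has none."""
    return (node.text or "").strip()


def display(node):
    """Render a matched node according to its node type."""
    kind = KINDS[node.tag]
    if kind == "attribute":
        # Attribute node: name and value.
        return f"{node.tag}: {text_of(node)}"
    if kind == "entity":
        # Entity node: name plus its attribute children.
        attrs = ", ".join(
            f"{child.tag}: {text_of(child)}"
            for child in node
            if KINDS[child.tag] == "attribute"
        )
        return f"{node.tag} ({attrs})"
    # Connection node: name only.
    return node.tag


player = ET.fromstring(
    "<player><name>Mutombo</name><position>center</position>"
    "<nationality>Congo</nationality></player>"
)
print(display(player.find("position")))  # position: center
print(display(player))  # player (name: Mutombo, position: center, nationality: Congo)
```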

Page 16: A Search Result Example

Q1: Mutombo, position

[Figure: sample XML tree (league, team, players, player)]

Page 17: What if Return Nodes Are Absent?

Explicit return nodes: nodes that are explicitly identified in input keywords

If the input keywords contain no explicit return nodes, implicit return nodes are inferred
• Users may be interested in general information about the entities that are relevant to the search
• Master entity: the lowest ancestor-or-self entity of the LCA node, or the XML tree root if there is no such entity
• Relevant entity: an entity on the path from a master entity to a relevant keyword match, inclusive
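The two definitions above translate into a walk up the tree from the LCA node. In this minimal sketch the set of entity names is supplied by hand (an assumption), the sample document is a reduced rendering of the slide's tree, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Reduced, hand-written rendering of the slide's sample tree (assumed layout).
SAMPLE = """
<league>
  <team>
    <name>Rockets</name>
    <players>
      <player><name>Mutombo</name><position>center</position></player>
    </players>
  </team>
</league>
"""
ENTITY_TAGS = {"team", "player"}  # assumed node-type labels

root = ET.fromstring(SAMPLE)
# ElementTree has no parent pointers, so build a child -> parent map.
parents = {child: parent for parent in root.iter() for child in parent}


def master_entity(lca_node):
    """Lowest ancestor-or-self entity of the LCA node, else the tree root."""
    node = lca_node
    while node is not None:
        if node.tag in ENTITY_TAGS:
            return node
        node = parents.get(node)
    return root


def relevant_entities(master, match):
    """Entities on the path from the master entity down to a match, inclusive.

    Assumes `match` lies in the subtree of `master`.
    """
    path = []
    node = match
    while node is not None:
        path.append(node)
        if node is master:
            break
        node = parents.get(node)
    return [n for n in reversed(path) if n.tag in ENTITY_TAGS]


team = root.find("team")
player = root.find("team/players/player")
print(master_entity(root.find("team/name")).tag)  # team
print(master_entity(player).tag)  # player
print([n.tag for n in relevant_entities(team, player.find("name"))])  # ['team', 'player']
```

For the single-keyword query Q3 the LCA is the matching `name` node itself, so the master entity is the enclosing `team`; for Q2 the LCA is already a `player` entity.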

Page 18: Search with Implicit Return Nodes (I)

[Figure: sample XML tree (league, team, players, player)]

Q2: Mutombo, center

Page 19: Search with Implicit Return Nodes (II)

Q3: Rockets

[Figure: sample XML tree (league, team, players, player)]

Page 20: Roadmap

Motivation

Inferring search semantics
• Analyzing keyword match patterns
• Analyzing XML data structure
• Identifying search results

XSeek architecture

Experiments

Conclusions

Page 21: Architecture of XSeek

[Figure: architecture diagram. Inputs: XML, Keywords. Components: Data Analyzer, Index Builder, Keyword Matcher, Match Grouper, Keyword Analyzer, Return Node Recognizer, Result Generator. Intermediate data: Indexes. Output: Search Result.]

Annotations on the diagram:

• Entities

• Attributes

• Connection nodes

• Search predicates

• Return nodes

• Explicit return nodes

• Implicit return nodes

Page 22: Experimental Setup

Compare the performance of
• XSeek
• Subtree Return
• Path Return

Measurements
• Search quality
• Speed
• Scalability

Data sets: Mondial, WSU, XMark benchmark

Query sets: eight queries for each data set

Page 23: Search Quality: Precision

Precision: measures the soundness of search results

$\text{precision} = \frac{|\text{relevant} \cap \text{return}|}{|\text{return}|}$

XSeek in general has a precision as good as Path Return

[Figure: bar chart of precision (axis 0 to 100) for queries QA1 to QA8, comparing Subtree Return, Path Return, and XSeek. Queries called out on the chart: "open auction, person257" and "seller, person179, buyer, price, date".]

Page 24: Search Quality: Recall

Recall: measures the completeness of search results

$\text{recall} = \frac{|\text{relevant} \cap \text{return}|}{|\text{relevant}|}$

XSeek in general has a recall as good as Subtree Return

[Figure: bar chart of recall (axis 0 to 100) for queries QA1 to QA8, comparing Subtree Return, Path Return, and XSeek.]

Page 25: Search Quality: F-Measure

F-Measure is a weighted harmonic mean of precision and recall

$F = \frac{(1 + \alpha) \cdot \text{precision} \cdot \text{recall}}{\alpha \cdot \text{precision} + \text{recall}}$

XSeek has the best F-Measure

[Figure: bar chart of F-Measure (axis 0 to 100) for α=0.5, α=1.0, and α=2.0, comparing Subtree Return, Path Return, and XSeek.]
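The three quality measures can be computed from sets of relevant and returned nodes as sketched below. Precision and recall follow their standard set-based definitions. The α-weighted form of the F-Measure, F = (1 + α) · precision · recall / (α · precision + recall), is an assumption reconstructed from the slide's α values, and the node labels in the example are made up.

```python
def precision(relevant, returned):
    """Fraction of returned nodes that are relevant (soundness)."""
    return len(relevant & returned) / len(returned) if returned else 0.0


def recall(relevant, returned):
    """Fraction of relevant nodes that are returned (completeness)."""
    return len(relevant & returned) / len(relevant) if relevant else 0.0


def f_measure(p, r, alpha=1.0):
    """Weighted harmonic mean of precision and recall (assumed alpha-weighted form)."""
    denominator = alpha * p + r
    return (1 + alpha) * p * r / denominator if denominator else 0.0


# Hypothetical node labels, for illustration only.
relevant = {"a", "b", "c", "d"}
returned = {"a", "b", "x"}

p = precision(relevant, returned)  # 2 of 3 returned nodes are relevant
r = recall(relevant, returned)     # 2 of 4 relevant nodes are returned
for alpha in (0.5, 1.0, 2.0):
    print(alpha, round(f_measure(p, r, alpha), 3))
```

With α = 1.0 the expression reduces to the familiar 2 · precision · recall / (precision + recall); under this form, larger α pulls F toward recall and smaller α pulls it toward precision.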

Page 26: Speed: Benchmark Data

[Figure: bar chart of query time in seconds (axis 0 to 1.5) for queries QA1 to QA8, comparing Subtree Return, Path Return, and XSeek. Bars that exceed the axis are labeled 2.0, 4.2, and 3.7. Queries called out on the chart: "seller, person179, buyer, price, date" and "person257, person133".]

Page 27: Conclusions

The first work that automatically infers meaningful return information for XML keyword search
• No elicitation from users or system administrators and no schema information is required

Analyzing keyword match patterns
• Search predicates
• Return nodes

Analyzing XML node types
• Entities
• Attributes
• Connection nodes

Identifying two types of return information
• Explicit return nodes
• Implicit return nodes

Outputting an XML node based on its match type and node type

Experiments verify XSeek's effectiveness and efficiency

Page 28: Thank You!

Questions?

Welcome to visit the XSeek demo at VLDB 07