Keyword Searching and Browsing in Databases using BANKS Seoyoung Ahn Mar 3, 2005 The University of Texas at Arlington




Outline

  Introduction
  Database and Query Model
  Searching for the best answers
  Browsing features of BANKS
  Experiment
  Conclusion


Introduction

Search engines on the Web have popularized unstructured querying and browsing
  Simple and user-friendly
  Users just type in keywords and follow hyperlinks

Relational databases are commonly searched using structured query languages
  Users need to know the schema

Web-style keyword searching techniques cannot be applied directly to data stored in databases
  The information relevant to a query is often split across tables/tuples due to normalization


Introduction (contd.)

BANKS (Browsing And Keyword Searching)
  A system which enables keyword-based search on relational databases, together with data and schema browsing

Architecture: User --HTTP-- BANKS system --JDBC-- Database


Introduction (contd.)

BANKS (Browsing And Keyword Searching)
  A framework for keyword querying of relational databases
  A novel and efficient heuristic algorithm for executing keyword queries
  Key features of the BANKS system


Outline

  Introduction
  Database and Query Model
    Informal Model
    Formal Model
    Query and Answer Model
  Searching for the best answers
  Browsing features of BANKS
  Experiment
  Conclusion


Database and Query Model

Informal Model

Model Description

  Database                    Graph
  each tuple in the database  node in the graph
  fk-pk link                  directed edge
  database                    directed graph
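
The mapping above can be sketched in a few lines of Python. This is only an illustration, not the BANKS implementation: the function name `build_graph`, the tuple identifiers and the sample rows are hypothetical, loosely modelled on a bibliography schema with paper, author and writes relations.

```python
from collections import defaultdict


def build_graph(tuples, fk_links):
    """Build a directed graph: one node per tuple, one edge per fk-pk link.

    tuples:   iterable of hashable tuple identifiers (hypothetical format)
    fk_links: iterable of (referencing_tuple, referenced_tuple) pairs
    """
    nodes = set(tuples)
    out_edges = defaultdict(set)   # node -> nodes it references
    in_edges = defaultdict(set)    # node -> nodes that reference it
    for src, dst in fk_links:
        out_edges[src].add(dst)
        in_edges[dst].add(src)
    return nodes, out_edges, in_edges


# Hypothetical sample data: two authors, one paper, two "writes" tuples.
tuples = ["author:a1", "author:a2", "paper:p1", "writes:w1", "writes:w2"]
fk_links = [
    ("writes:w1", "author:a1"), ("writes:w1", "paper:p1"),
    ("writes:w2", "author:a2"), ("writes:w2", "paper:p1"),
]
nodes, out_edges, in_edges = build_graph(tuples, fk_links)
```

Keeping both outgoing and incoming adjacency makes it cheap to follow links in either direction, which matters later when edges are traversed backwards.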


Database and Query Model

The Schema


Database and Query Model

A Fragment of the Database


Database and Query Model

Informal Model (contd.)
  An answer to a query should be a subgraph connecting nodes matching the keywords
  The importance of a link depends upon the type of the link, i.e. what relations it connects, and on its semantics
  Ignoring directionality would cause problems because of “hubs”, which are connected to a large number of nodes


Database and Query Model

Formal Database Model: Nodes and edges

Node weight: N(u)
  Depends on the prestige of the node
  Set the node prestige = the indegree of the node
  Nodes that have multiple pointers to them get a higher prestige

Node score N = root node weight + ∑ leaf node weights
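
As a rough illustration of the node weighting described on this slide, the sketch below sets each node's weight to its indegree and scores a tree as the root weight plus the sum of the leaf weights. The function names and the sample fk-pk links are hypothetical, and normalization is left out.

```python
def node_weights(nodes, fk_links):
    """Node prestige = indegree: count the fk-pk links pointing at each node."""
    weights = {node: 0 for node in nodes}
    for _src, dst in fk_links:
        weights[dst] = weights.get(dst, 0) + 1
    return weights


def node_score(weights, root, leaves):
    """Node score of an answer tree: root weight plus the sum of leaf weights."""
    return weights[root] + sum(weights[leaf] for leaf in leaves)


# Hypothetical sample: two "writes" tuples link two authors to one paper.
fk_links = [
    ("writes:w1", "author:a1"), ("writes:w1", "paper:p1"),
    ("writes:w2", "author:a2"), ("writes:w2", "paper:p1"),
]
nodes = {n for link in fk_links for n in link}
weights = node_weights(nodes, fk_links)
score = node_score(weights, root="paper:p1", leaves=["author:a1", "author:a2"])
```

In this sample the paper is referenced twice and each author once, so the tree rooted at the paper scores 2 + 1 + 1 = 4.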


Database and Query Model

Formal Database Model (contd.): Edge weights

Some popular tuples can be connected to many other tuples
Each edge has a forward and a backward edge weight
  Weight of a forward link = the strength of the proximity relationship between the two tuples (set to 1 by default)
  Weight of a backward link = indegree of the node, i.e. the number of edges pointing to it

Total edge weight = ∑ edge weights
Edge score E = 1 / total edge weight
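
The edge weighting on this slide can be sketched as follows. The names and sample data are hypothetical, and returning 1.0 for a tree with no edges is an assumption added here to avoid dividing by zero; the slide does not say how that case is handled.

```python
from collections import Counter


def edge_weights(fk_links, forward_weight=1.0):
    """Give every fk-pk link a forward edge and a backward edge.

    The forward edge u -> v gets the default weight; the backward edge
    v -> u is weighted by the indegree of v, so hubs are costly to cross.
    """
    indegree = Counter(dst for _src, dst in fk_links)
    weights = {}
    for src, dst in fk_links:
        weights[(src, dst)] = forward_weight
        weights[(dst, src)] = float(indegree[dst])
    return weights


def edge_score(weights, tree_edges):
    """Edge score E = 1 / total edge weight of the tree."""
    total = sum(weights[edge] for edge in tree_edges)
    if total == 0:
        return 1.0  # assumption: a single-node tree gets the best score
    return 1.0 / total


# Hypothetical sample: two "writes" tuples link two authors to one paper.
fk_links = [
    ("writes:w1", "author:a1"), ("writes:w1", "paper:p1"),
    ("writes:w2", "author:a2"), ("writes:w2", "paper:p1"),
]
weights = edge_weights(fk_links)

# Answer tree rooted at the paper, reaching both authors via "writes".
tree_edges = [
    ("paper:p1", "writes:w1"), ("writes:w1", "author:a1"),
    ("paper:p1", "writes:w2"), ("writes:w2", "author:a2"),
]
score = edge_score(weights, tree_edges)
```

Here each backward edge out of the paper costs 2 (its indegree) and each forward edge costs 1, giving a total of 6 and an edge score of 1/6.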


Database and Query Model

Formal Database Model (contd.): Overall relevance score

Overall relevance score = node weights + edge weights
  Normalize the node score N and the edge score E to the range [0,1]
  Combine using a weighting factor λ
    Additive: (1 - λ) E + λ N
    Multiplicative: E × N^λ
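
The two ways of combining the scores can be written out as below. This sketch assumes the weighting factor is a single parameter (called `lam` here, for λ) that weights N in the additive form and appears as an exponent on N in the multiplicative form, and that E and N have already been normalized to [0,1]. The function names are hypothetical.

```python
import math


def additive_score(e, n, lam=0.2):
    """Additive combination: (1 - lam) * E + lam * N."""
    return (1.0 - lam) * e + lam * n


def multiplicative_score(e, n, lam=0.2):
    """Multiplicative combination: E * N ** lam."""
    return e * n ** lam


# Example with already-normalized scores E = 0.5 and N = 1.0.
add = additive_score(0.5, 1.0)          # 0.8 * 0.5 + 0.2 * 1.0
mul = multiplicative_score(0.5, 1.0)    # 0.5 * 1.0 ** 0.2
```

With a small `lam`, both forms let proximity (the edge score) dominate while prestige (the node score) mostly reorders answers of similar proximity.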


Database and Query Model

Query and Answer Model

Query
  A set of keywords, e.g. {k1, k2, …, kn}
  Locate the nodes matching the search terms t1, t2, …, tn
  This gives a set of node sets {S1, S2, …, Sn}, where Si is the set of nodes matching term ti

Answer Model
  A rooted directed tree connecting keyword nodes
  Relevance score of an answer tree
    Combines the relevance scores of its nodes and its edge weights
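
Locating the node sets Si can be illustrated with a simple in-memory lookup. This is a sketch under stated assumptions: each node is represented by a text string, matching is case-insensitive on whitespace-separated tokens, and the function name and sample data are hypothetical.

```python
def match_keywords(node_text, terms):
    """Return [S1, ..., Sn], where Si is the set of nodes matching term ti.

    node_text: dict mapping a node identifier to the text stored in its tuple
    terms:     list of search terms
    """
    node_sets = []
    for term in terms:
        wanted = term.lower()
        matches = {
            node for node, text in node_text.items()
            if wanted in text.lower().split()
        }
        node_sets.append(matches)
    return node_sets


# Hypothetical sample data.
node_text = {
    "author:a1": "Sudarshan",
    "author:a2": "Soumen",
    "paper:p1": "Keyword search in databases",
}
node_sets = match_keywords(node_text, ["sudarshan", "soumen"])
```

Each term yields its own set, so a query with n terms produces n sets, and an answer tree has to connect at least one node from every set.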


Database and Query Model

Answer Model
  A rooted directed tree connecting keyword nodes
  Multiple answers: ranked by proximity + prestige
    Proximity: edge weights
    Prestige: indegree of nodes
  Relevance score of an answer tree: combines the relevance scores of its nodes and its edge weights


Database and Query Model

Result of query “sudarshan soumen”


Outline

  Introduction
  Database and Query Model
  Searching for the best answers
    Backward expanding search algorithm
  Browsing features of BANKS
  Experiment
  Conclusion


Searching for the best answers

Backward expanding search algorithm
  Offers a heuristic solution for incrementally computing query results
  Assumes that the graph fits in memory
  Starts at leaf nodes, each containing a query keyword
  Runs concurrent single-source shortest-path algorithms, one from each such node
  Traverses the graph edges backwards
  Confluence of backward paths identifies answer tree roots
    Output a node whenever it is in the intersection of the sets of nodes reached from each keyword
  Answer trees may not be generated in relevance order
    Insert answers into a small buffer (heap)
    Output the highest-ranked answer from the buffer to the user when the buffer is full
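
The steps above can be sketched as a small generator. This is a simplified illustration rather than the published algorithm: it ranks answers by the summed path cost from the root to one node per keyword (lower is better) instead of the full relevance score, it does not prune redundant trees, and the function name, `buffer_size` parameter and sample graph are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import count, product


def backward_expanding_search(edges, keyword_sets, buffer_size=2):
    """Yield (cost, root, leaves) answers, roughly in order of increasing cost.

    edges:        dict mapping (u, v) to the weight of the directed edge u -> v
    keyword_sets: list of sets; keyword_sets[i] holds the nodes matching term i
    """
    # Reverse adjacency: from v we step backwards to every u with u -> v.
    incoming = defaultdict(list)
    for (u, v), weight in edges.items():
        incoming[v].append((u, weight))

    n_terms = len(keyword_sets)
    tie = count()   # tie-breaker so heap entries never compare node objects
    # One shortest-path search per keyword node; a shared queue always
    # advances whichever search has the cheapest next step.
    frontier = []
    for i, matches in enumerate(keyword_sets):
        for origin in matches:
            heapq.heappush(frontier, (0.0, next(tie), i, origin, origin))

    settled = set()   # (term index, origin, node) triples already finalized
    reached = defaultdict(lambda: [dict() for _ in range(n_terms)])
    buffer = []       # small heap of answers waiting to be output

    while frontier:
        dist, _, i, origin, node = heapq.heappop(frontier)
        if (i, origin, node) in settled:
            continue
        settled.add((i, origin, node))
        reached[node][i][origin] = dist

        # The node is an answer root once every keyword has reached it.
        other_terms = [j for j in range(n_terms) if j != i]
        others = [reached[node][j] for j in other_terms]
        if all(others):
            for combo in product(*(d.items() for d in others)):
                leaves = [None] * n_terms
                leaves[i] = origin
                cost = dist
                for j, (leaf, leaf_dist) in zip(other_terms, combo):
                    leaves[j] = leaf
                    cost += leaf_dist
                heapq.heappush(buffer, (cost, next(tie), node, tuple(leaves)))
                if len(buffer) > buffer_size:
                    best_cost, _, root, best = heapq.heappop(buffer)
                    yield best_cost, root, best

        for prev, weight in incoming[node]:
            if (i, origin, prev) not in settled:
                heapq.heappush(
                    frontier, (dist + weight, next(tie), i, origin, prev))

    while buffer:   # flush the remaining answers, best first
        best_cost, _, root, best = heapq.heappop(buffer)
        yield best_cost, root, best


# Hypothetical graph: two papers point at the same two author nodes.
edges = {
    ("paper_x", "sudarshan"): 1.0, ("paper_x", "soumen"): 1.0,
    ("paper_y", "sudarshan"): 2.0, ("paper_y", "soumen"): 2.0,
}
answers = list(backward_expanding_search(edges, [{"sudarshan"}, {"soumen"}]))
```

Because answers leave the buffer only when it overflows or the search ends, a larger `buffer_size` gives output closer to true rank order at the price of a longer wait for the first result.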


Searching for the best answers

Model (Query: Charuta Sudarshan Roy)

[Figure: answer tree for the query. Relation labels: paper, writes, author. Tuple labels: "BANKS: Keyword search…", Charuta, S. Sudarshan, Prasan Roy]




Browsing

The BANKS system provides
  A rich interface to browse data stored in a relational database
  Automatically generated browsable views of database relations and query results
  Schema browsing and data browsing
  A hyperlink to the referenced tuple
  Templates for several predefined ways of displaying data


Browsing

Data browsing


Browsing

Schema browsing




Experiment: Error scores vs parameter choices

The rankings are relatively stable across different choices of parameter values
λ = 0.2 coupled with log scaling of edge weights does best




Conclusion

The BANKS system
  provides an integrated browsing and keyword querying system for relational databases
  allows users with no knowledge of database systems or schema to query and browse relational databases with ease