harikrishnan karunakaran sulabha balan cse 6339. introduction database and query model ◦...
TRANSCRIPT
![Page 1: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/1.jpg)
Harikrishnan Karunakaran Sulabha Balan
Keyword Searching and Browsing in Databases using BANKS
Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, S. Sudarshan
CSE 6339
![Page 2: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/2.jpg)
Introduction
Database and Query Model◦ Informal Model◦ Formal Model◦ Query and Answer Model
Searching for the Best Answers◦ Backward Expanding Search Algorithm
Browsing through BANKS
Experience and Performance
Related Work
Conclusion
Outline
![Page 3: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/3.jpg)
With the onset of the web, number of users needing to access online databases have increased
Search engines have popularized use of unstructured querying which just needs the user to type in the keyword and follow links
Same methodology cannot be used in querying databases as knowledge of schema and querying language like SQL is needed
Keyword searching will not also work on datatbases because the data is usually spread across tables/tuples due to normalization
Introduction
![Page 4: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/4.jpg)
Browsing ANd Keyword Searching
Enables Keyword-Based search on Relational Databases along with Data and Schema Browsing
User interacts with data through typing keywords, following hyperlinks and using controls made available.
Absolutely no knowledge of querying or programming languages required of the user
BANKS
![Page 5: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/5.jpg)
BANKS contd…
![Page 6: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/6.jpg)
BANKS contd… Makes Joins Implicit and Transparent
Incorporates Notions of Proximity and Prestige
Methods to publish relational data which would otherwise remain invisible on the web are provided
Creates hierarchical and graphical views of data with hyperlinks to navigate through them
![Page 7: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/7.jpg)
Database and Query Model
![Page 8: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/8.jpg)
Schema
![Page 9: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/9.jpg)
Database Fragment Snapshot
![Page 10: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/10.jpg)
Answer to query should be sub-graph connecting nodes matching the keywords
Central node that connects all keyword nodes is Information Node and tree is Connection Tree. Foreign Key Dependencies constitute edges in the graph
Edges (References) are given weights according to its type
Weight of the tree is proportional to total of its edge weights and relevance is inversely proportional to its weight
Informal Model
![Page 11: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/11.jpg)
To obtain model with edges directed away from information node and preserve directionality we make use of backward edges
Backward Edge assigned weight proportional to in-degree
Information Node is selected from certain sets of nodes in the graph
Backward Edges ensure that tree is rooted at the Information Node
To avoid problem of “hubs”, edges connecting popular nodes are given a higher weight thus lowering proximity
Informal Model contd…
![Page 12: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/12.jpg)
Informal Model contd…
Concept of Node weights introduced to include Prestige Rankings
Nodes having more pointers given higher prestige
In BANKS node prestige is assigned based on in-degree of the node
Node weights and Tree Weights combined to obtain relevance score
![Page 13: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/13.jpg)
Formal Database Model Each tuple Τ has a corresponding node uτ
Each node u has a node weight N(u) depending upon the prestige of the node
Between each pair of related tuples T1 & T2 , graph contains edge between uτ1 to uτ2 and back edge from uτ2 to uτ1
Similarity between two relations R1 and R2 depends upon the type of link from R1 to R2 and is set to infinity if R1 does not refer to R2
![Page 14: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/14.jpg)
Edge weights
Depending upon importance of the link we set a value to the edge. Default value is 1
The weight of the directed edge(u, v) depends on factors: If (u, v) exists but (v, u) does not, assign the weight s(R(u),R(v)) to
(u, v) If (u, v) does not exist and (v, u) does, assign the weight INv (u)
s(R(v), R(u)) to (u, v) where INv is the indegree of u contributed by the tuples belonging to relation R(v)
If both (u, v) and (v, u) exist in the graph , assign the weight as the minimum of two values
min{s(R(u),R(v)) , INv (u) s(R(v), R(u))
Formal Database Model
![Page 15: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/15.jpg)
QueryA set of keywords e.g.{k1,k2,…kn}A set of nodes Si = {S1,S2,…Sn}Locate nodes matching search terms t1,t2,…tn
Answer Model A rooted directed tree connecting keyword nodes (at
least one node from Si).Note: Tree may also contain nodes not in any Si , Steiner
Tree
Relevance score of an answer treeCombination of its nodes and its edge weight presented
in decreasing order
Query and Answer Model
![Page 16: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/16.jpg)
Calculating Relevance Score involves adjustment of both node weights and edge weights along with a factor to control individual weight variations
Node weights Scaled to Nmax and depressed using log Nscore(v) = N(v)/ Nmax or log(1+N(v)/Nmax
Overall Nscore taken to be average of node scores
Edge Weights Normalized Escore(e) obtained by diving edge weight by minimum edge weight Escore(e) = log(1+w(e)/wmin) Overall Edge Score = 1/(1 + Σe Escore(e))
Combination of Overall Edge Score and Node Score Additive : (1-λ)Escore + λNscore Multiplicative : Escore * Nscoreλ
Query and Answer Model
![Page 17: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/17.jpg)
We have to use not just the tree with the highest relevance score but also those with high scores
Answers have to be generated incrementally so that the user are provided with the ‘best’ answers at the beginning
Resultant Graph is assumed to fit in memory since only Row IDs and index to map RowIDs to nodes in the graph need to be stored by us.
Searching for the Best Answers
![Page 18: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/18.jpg)
Incrementally computes search results
Start at leaf nodes each containing a query keyword
Run concurrent single source shortest path algorithm from each such node
Traverses the graph edges backwards Confluence of backward paths identify answer tree roots
Output a node whenever it is on the intersection of the sets of nodes reached from each keyword
Answer trees may not be generated in relevance order
Insert answers to a small buffer (heap) Output highest ranked answer from buffer to user when buffer is full
Backward Expanding Search
![Page 19: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/19.jpg)
Backward Expanding Search
![Page 20: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/20.jpg)
Backward Expanding Search
![Page 21: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/21.jpg)
Searching for Best Answers Model (Query : Roy Sudarshan)
![Page 22: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/22.jpg)
Backward Expanding Search
Due to the graphs being Steiner Trees lot of time is spent doing wasteful exploration of the graph
As keyword nodes increase, the feasibility of the algorithm decreases
Connection Trees are only approximately sorted in their increasing order of weights
Node weights are not considered, hence trees may not be produced in exact decreasing order of relevance
![Page 23: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/23.jpg)
BANKS system providesA rich interface to browse data stored in a relational
database
Automatically generates browsable views of database relations and query results
Schema browsing and data browsing
A hyperlink to the referenced tuple
Browsing through BANKS
![Page 24: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/24.jpg)
Browsing through BANKS contd…
Functionalities
Columns can be projected away
Selections can be imposed on columns
Joins can be performed with foreign key columns by joining them with referencing tables
Results can be grouped by on columns which returns only distinct values in column being displayed
Sorting can be done on columns
![Page 25: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/25.jpg)
Sample Browsing
![Page 26: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/26.jpg)
Cross Tabs
Group By Template to view Data hierarchially
Folder Views modeled after the folder view supported by Windows Explorer etc.
In the form of bar chart, line chart or pie chart with HTML image maps to embed hyperlinks in the graphics
Templates
![Page 27: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/27.jpg)
Datasets of varying sizes have been tested
No agreed upon benchmarks for Ranking Algorithms in this domain
System was found to return the most intuitive answers
Performance Evaluation
![Page 28: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/28.jpg)
Ideal answers were obtained using different queries
Compute absolute value of rank difference of the ideal answers with rank in the answers for given parameter setting
Sum of rank differences gives the raw error score for that parameter
We map error scores against λ and log-scaling of edge weights
Evaluating Error Score
![Page 29: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/29.jpg)
Error Scores
![Page 30: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/30.jpg)
Setting λ = 0.2 produced best results while λ = 1 produced worst with error scores of around 15
Log scaling of edge weight is important as otherwise back-edges from popular nodes would result in correct answers getting low relevance scores
Additive or Multiplicative combination has no effect on ranking
Node weights were abandoned as log-scaling and no log scaling produced same ranking
Observations
![Page 31: Harikrishnan Karunakaran Sulabha Balan CSE 6339. Introduction Database and Query Model ◦ Informal Model ◦ Formal Model ◦ Query and Answer Model](https://reader030.vdocuments.us/reader030/viewer/2022032612/56649ec65503460f94bd1714/html5/thumbnails/31.jpg)
Effective when using queries matching non-metadata keywords
Brings to light data that might not be readily available on the web to the non technical user
Higher the no. of keywords, the less useful backward-expanding search algorithm becomes
Over reliance on Java can at times cause slow down of application
Conclusions