1
Querying Infinite Databases
Safety of Datalog Queries over Infinite Databases (Sagiv and Vardi ’90)
Queries and Computation on the Web (Abiteboul and Vianu ’97)
Itay Maman
049011 Student Symposium, 5 July 2006
2/19
Simple Technion Queries…
(Domain: The Technion’s students database)
• Q1: Which courses did Gidi attend?
  SELECT course FROM students WHERE name='Gidi'
• Q2: Which students took 234218?
  SELECT name FROM students WHERE course='234218'
courses
course name
234218 Gidi
236703 Gidi
234218 Dina
… …
3/19
Simple Web Queries…
• Q3: Which pages does my home page link to?
  SELECT target FROM links WHERE source='www.geocities.com/mysite'
• Q4: Which pages link to my home page?
  SELECT source FROM links WHERE target='www.geocities.com/mysite'
• Q4 is challenging: no matter how long my web crawler runs, I can never find all incoming links of a page! This is an infinite query
• The more you crawl, the more answers you get (in Q3 the size of the result set is bounded)
links
source target
www.google.com www.google.co.il
www.geocities.com/mysite www.ynet.co.il
www.cnn.com www.geocities.com/mysite
… …
4/19
Leading questions
• What does an infinite DB look like?
• Can we evaluate a query over an infinite DB?
• Can we determine the finiteness of a query?
• But first, some Datalog…
5/19
Datalog
• Why Datalog?
  Supports recursion/transitive closure (unlike SQL without recursive extensions)
  • Recursion is essential in large data-sets
  Terminates if the DB is finite
  Very simple
• program = a collection of rules
• rule = a sequence of terms
• In our program:
  Three rules
  Two queries (AKA: IDB): g(X), small(X,Y)
  One table (AKA: EDB): before(X,Y)
  A goal predicate from which execution starts
  • We choose g(X) as the goal
g(X) :- small(X,2).
small(X,Y) :- before(X,Y).
small(X,Y) :- small(X,Z), before(Z,Y).
6/19
Finiteness
• A DB is finite if every table is a finite set
  before(X,Y) = { (0,1), (1,2), (2,3) }
• Possible evaluation schemes:
  Brute force
  Bottom up
  • Optimizations
• The requirement: finiteness of tables
• The guarantee: termination of the Datalog program
7/19
Infinity
• Here is another definition for our table
  before(X,Y) = { (X,X+1) | X ≥ 0 }
• We now have an infinite DB
  The problem: we cannot iterate over the tuples in the set
  The solution: a top-down algorithm
• Such tables are quite common
  The internet links relation
    links(X,Y) = { (X,Y) | page X links to page Y }
  Java’s subclassing relation
    extends(X,Y) = { (X,Y) | class X extends Y }
Leading question: What does an infinite DB look like?
8/19
Example: Top-down evaluation
g(W) = s(W,2)
= b(W,2) ∪ s(W,Z) ⋈ b(Z,2)
= {(1,2)} ∪ s(W,1) ⋈ {(1,2)}
= {(1,2)} ∪ [b(W,1) ∪ s(W,Z) ⋈ b(Z,1)] ⋈ {(1,2)}
= {(1,2)} ∪ [{(0,1)} ∪ s(W,0) ⋈ {(0,1)}] ⋈ {(1,2)}
= {(1,2)} ∪ [{(0,1)} ∪ [b(W,0) ∪ s(W,Z) ⋈ b(Z,0)] ⋈ {(0,1)}] ⋈ {(1,2)}
= {(1,2)} ∪ [{(0,1)} ∪ [∅ ∪ s(W,Z) ⋈ ∅] ⋈ {(0,1)}] ⋈ {(1,2)}
= {(1,2)} ∪ [{(0,1)} ∪ ∅ ⋈ {(0,1)}] ⋈ {(1,2)}
= {(1,2)} ∪ {(0,1)} ⋈ {(1,2)}
= {(1,2)} ∪ {(0,2)}
= {(1,2), (0,2)}
g(W) :- small(W,2).
small(A,B) :- before(A,B).
small(X,Y) :- small(X,Z), before(Z,Y).
before(X,Y) = { (X,X+1) | X ≥ 0 }
• b : before
• s : small
• ⋈ : Join
s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)
9/19
Top-down evaluation
• The top-down algorithm
  Init: assign r ← body of the goal
  Loop:
  • (Intelligently) pick a term, t, from r
  • If t is a query term:
    Replace it with the union of the rules indicated by t
  • If t is a table term:
    Replace it with the set generated by the table
  • Replace s ⋈ ∅ expressions (in r) with ∅
  • Replace s ∪ ∅ expressions (in r) with s
  • Evaluate relational algebra expressions (if both sides are known)
  Stop if no further replacements can be made
Leading question: Can we evaluate a query over an infinite DB?
Yes
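The top-down idea can be sketched in Python for the small(W,2) program. This is only an illustration under stated assumptions, not the algorithm as the slides define it: the infinite table is modeled by a hypothetical lookup function `before_into(y)` that returns the tuples whose second column equals `y`, and `small_into` and `g` are illustrative names for the expanded rules.

```python
def before_into(y):
    """Tuples of before = {(x, x+1) | x >= 0} whose second column is y (assumed lookup)."""
    return {(y - 1, y)} if y >= 1 else set()


def small_into(y):
    """Top-down evaluation of small(X, y): expand both rules with Y bound to y."""
    direct = before_into(y)                  # small(X,Y) :- before(X,Y).
    result = set(direct)
    for (z, _) in direct:                    # small(X,Y) :- small(X,Z), before(Z,Y).
        result |= {(x, y) for (x, _) in small_into(z)}
    return result


def g():
    """g(W) :- small(W,2)."""
    return {w for (w, _) in small_into(2)}


print(sorted(small_into(2)))  # [(0, 2), (1, 2)]
print(sorted(g()))            # [0, 1]
```

The recursion stops because `before_into(0)` is empty, which mirrors the step in the worked example where b(W,0) evaluates to ∅.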
10/19
Infinite Queries
• Can the top-down algorithm run forever?
Yes
• Case 1: A table that returns an infinite result
  evenProduct(X,Y) = { (X,Y) | X*Y mod 2 = 0 }
  divides(X,Y) = { (X,Y) | X mod Y = 0 }
  links(X,Y) = { (X,Y) | page X links to page Y }
• Weak-safety: all intermediate results are finite
• Result #1 (Sagiv and Vardi ’90): Weak-safety is decidable given F/C (finiteness constraints) of tables
  • F/C of evenProduct: None
  • F/C of divides: X => Y
  • F/C of links: X => Y
  Algorithm: tracking the flow of values from assigned variables
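The value-flow idea can be illustrated with a small Python sketch. It is a simplification, not the Sagiv and Vardi decision procedure: it only closes a set of bound variables under finiteness constraints within one rule body. The function name `bounded_variables` and the encoding of atoms and constraints as tuples of argument positions are assumptions made for this example.

```python
def bounded_variables(bound, atoms):
    """Close a set of bound variables under finiteness constraints.

    atoms: list of (args, constraints). Each constraint (lhs, rhs) lists
    argument positions and reads: finitely many rhs values per lhs value.
    """
    bounded = set(bound)
    changed = True
    while changed:
        changed = False
        for args, constraints in atoms:
            for lhs, rhs in constraints:
                if all(args[i] in bounded for i in lhs):
                    new = {args[i] for i in rhs} - bounded
                    if new:
                        bounded |= new
                        changed = True
    return bounded


# F/C of links: X => Y (position 0 determines finitely many values at position 1).
LINKS_FC = [((0,), (1,))]
body = [(("S", "T"), LINKS_FC)]          # links(S, T)

print(sorted(bounded_variables({"S"}, body)))  # ['S', 'T']  source is given (like Q3)
print(sorted(bounded_variables({"T"}, body)))  # ['T']       target is given (like Q4)
```

With the source bound, the target becomes bounded as well. With only the target bound, the source stays unbounded, which matches the intuition that incoming links cannot be enumerated.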
11/19
g(W) = s(2,W)
= b(2,W) ∪ s(2,Z) ⋈ b(Z,W)
= {(2,3)} ∪ s(2,Z) ⋈ b(Z,W)
= {(2,3)} ∪ [b(2,Z) ∪ s(2,Z’) ⋈ b(Z’,Z)] ⋈ b(Z,W)
…
Infinite Queries (cont.)
• Can the top-down algorithm run forever?
Yes
• Case 2: The algorithm’s recursion never stops
  A query/table is used in its “unbounded” direction
g(W) :- small(2,W).
small(A,B) :- before(A,B).
small(X,Y) :- small(X,Z), before(Z,Y).
before(X,Y) = { (X,X+1) | X ≥ 0 }
s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)
• Results #2-3 (Sagiv and Vardi ’90):
  Termination is undecidable in the general case
  Termination is decidable if all queries are unary
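To see why small(2,W) has no finite answer, the following Python sketch enumerates its tuples lazily. It is an assumption-laden illustration: it walks forward from the bound value instead of reproducing the left-recursive expansion shown on the slide, and `before_from` and `small_from` are hypothetical names. The generator never finishes on its own, so the example takes only the first five answers.

```python
from itertools import islice


def before_from(x):
    """Tuples of before = {(x, x+1) | x >= 0} whose first column is x (assumed lookup)."""
    return [(x, x + 1)] if x >= 0 else []


def small_from(x):
    """Lazily enumerate small(x, W); for x >= 0 this generator never terminates."""
    frontier = [x]
    while frontier:
        next_frontier = []
        for z in frontier:
            for (_, w) in before_from(z):
                yield (x, w)
                next_frontier.append(w)
        frontier = next_frontier


first_five = list(islice(small_from(2), 5))
print(first_five)  # [(2, 3), (2, 4), (2, 5), (2, 6), (2, 7)]
```

Every step produces a new successor, so the frontier never becomes empty. This is the “unbounded” direction of before.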
12/19
Infinite Queries (summary)
• We can automatically determine weak-safety
• We cannot (automatically) determine termination
• But, one can analytically prove that a given query over a given DB is finite
  E.g., our small(W,2) program
Leading question: Can we determine the finiteness of a query?
No (not in the general case)
13/19
The Web as a DB
• The web data model (WDM):
  A schema of a DB that can represent the web graph
  Just three tables:
    urls = { u | u is a URL of a web page }
    links = { (u1,u2) | u1 links to u2; u1, u2 ∈ urls }
    Words = { (u,w) | w appears in page u; u ∈ urls }
• Result #4 (Abiteboul and Vianu ’97):
  If a Datalog program with no literals halts over an infinite DB, its result is ∅
  • => A non-trivial query (over an infinite DB) must have a literal
14/19
Web - Machines
• Browsing Machine
  A weakly safe Datalog program (over WDM)
  At least one URL literal
• Searching/Browsing Machine
  An unsafe Datalog program (over WDM)
  • Evaluates queries in parallel
  Allowed literal types: URLs, Words
• Claims #1-2 (Abiteboul and Vianu ’97):
  Browsing machine:
  • Represents a user following static links from a page
  Searching/Browsing machine:
  • Also allows the user to access a search engine
15/19
Discussion: Finite approximation
• Relational database servers are very popular
  Such DBs are finite
• Also, computing a table on demand may be slow
  Better performance at batch processing
The challenge:
  Build a finite replacement for an infinite DB
• Formally:
  Given a finite query, q, over an infinite DB
  • (Finiteness of q proved analytically)
  Build a finite database such that q over it yields the same result as q over the infinite DB
16/19
Discussion: Finite approximation
• Example: Our small(W,2) program
  A finite, sound table: before(X,Y) = { (0,1), (1,2) }
  A finite, unsound table: before(X,Y) = { (0,1) }
• The process:
  Compute the transitive closure of the before relation
  Start from the literal ‘2’ at the right-hand side position
• Condition: the table graph must end with a sink
  In before the sink is the vertex ‘0’
• => We can build a finite DB
Sadly, in the web graph no such sink exists
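The process can be sketched in Python for the before table. This is a minimal illustration, assuming the infinite table is reachable through a hypothetical lookup `before_into(y)` on its second column; `finite_before` is an illustrative name for the backward closure that starts from the literal.

```python
def before_into(y):
    """Tuples of before = {(x, x+1) | x >= 0} whose second column is y (assumed lookup)."""
    return {(y - 1, y)} if y >= 1 else set()


def finite_before(literal):
    """Collect the before tuples reachable backward from `literal`."""
    table, todo, seen = set(), [literal], set()
    while todo:
        y = todo.pop()
        if y in seen:
            continue
        seen.add(y)
        for (x, _) in before_into(y):
            table.add((x, y))
            todo.append(x)
    return table


print(sorted(finite_before(2)))  # [(0, 1), (1, 2)]
```

The walk ends at the vertex 0 because nothing precedes it, which is the sink condition stated above. For a relation without such an endpoint the `todo` list would never drain.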
17/19
Discussion: Temporality
• Crawling takes time
• The subject may change while crawling
  The DB is a snapshot which never happened
• (Open question): Can we decide whether a result was really “true” at some point?
18/19
More issues
• Relational algebra over large relations
  BDD
• Negation
  Stratified Datalog
19/19
- Questions ? -
20/19
21/19
Datalog
• Semantics: ???
• Straightforward mapping to relational algebra??
g(X) :- small(X,2).
small(X,Y) :- before(X,Y).
small(X,Y) :- small(X,Z), before(Z,Y).
22/19
Example: Bottom-up evaluation
Initialization: Translate the EDBs into relations
before
X Y
0 1
1 2
2 3
23/19
Example: Bottom-up evaluation
apply small(X,Y) :- before(X,Y).
before
X Y
0 1
1 2
2 3
small
X Y
0 1
1 2
2 3
24/19
Example: Bottom-up evaluation
apply small(X,Y) :- small(X,Z), before(Z,Y).
Join small(X,Z) with before(Z,Y):
before
Z Y
0 1
1 2
2 3
small (after the first join)
X Y
0 1
1 2
2 3
0 2
1 3
small (after the second join)
X Y
0 1
1 2
2 3
0 2
1 3
0 3
25/19
Example: Bottom-up evaluation
apply g(X) :- small(X,2).
small
X Y
0 1
1 2
2 3
0 2
1 3
0 3
g
X
1
0
26/19
Finiteness
before(X,Y) = { (0,1), (1,2), (2,3) }
• The bottom-up algorithm:
  Init:
  • For each EDB, p, assign r(p) ← relation of all tuples satisfying p
  • For each IDB, p, assign r(p) ← ∅
  Loop:
  • Choose a rule p(…) :- t1(…), t2(…), … tn(…)
  • t ← join of all r(ti), where 1 ≤ i ≤ n
  • r(p) ← r(p) ∪ t
  Continue until a fix-point is reached
• Requires: finiteness of EDBs
• Ensures: termination
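The bottom-up loop can be sketched in Python for the three-rule program used throughout. This is a naive fix-point iteration under simple assumptions: relations are Python sets of tuples, and the function name `bottom_up` is illustrative. It applies every rule in each round rather than choosing one rule at a time.

```python
def bottom_up(before):
    """Naive bottom-up evaluation of the small/g program over a finite before table."""
    small, g = set(), set()              # r(p) starts empty for each IDB
    while True:
        # small(X,Y) :- before(X,Y).
        new_small = small | set(before)
        # small(X,Y) :- small(X,Z), before(Z,Y).   (join on Z)
        new_small |= {(x, y) for (x, z) in small for (z2, y) in before if z == z2}
        # g(X) :- small(X,2).
        new_g = g | {x for (x, y) in new_small if y == 2}
        if new_small == small and new_g == g:
            return small, g              # fix-point reached
        small, g = new_small, new_g


small, g = bottom_up({(0, 1), (1, 2), (2, 3)})
print(sorted(small))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(sorted(g))      # [0, 1]
```

The loop terminates because the relations only grow and, with a finite before table, only finitely many tuples can ever be derived.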