l8-s1 datalog queries 2003 sjsu -- cmpe database design dr. m.e. fayad, professor computer...

26
2003 SJSU -- CmpE L8-S1 Datalog Queries Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San José State University One Washington Square San José, CA 95192-0180 http://www.engr.sjsu.edu/~fayad

Upload: bryan-arnold

Post on 16-Dec-2015

214 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU -- CmpE L8-S1 Datalog Queries

Database Design

Dr. M.E. Fayad, Professor

Computer Engineering Department, Room #283I

College of Engineering

San José State University

One Washington Square

San José, CA 95192-0180

http://www.engr.sjsu.edu/~fayad

Page 2: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S2 Datalog Queries

2

Lesson 08:Datalog Queries

Page 3: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S3 Datalog Queries

Lesson Objectives

Objectives

3

Understand datalog langauge Learn about:

Syntax of Datalog Rules

Datalog with Sets

Datalog with Abstract Data Types

Semantics

Recursive Datalog Queries

Page 4: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S4 Datalog Queries

Datalog is a rule-based language that is related to Prolog

Each rule is a statement has some points that belong to some relations and other points must belong to a defined relation.

Each Datalog query contains a Datalog program and an input database.

4

Datalog

Page 5: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S5 Datalog Queries

Datalog query – a finite set of rules of the form:

R0(x1,…,xk) :– R1(x1,1,…, x1,k1),..., Rn(xn,1,…, xn,kn

)

where each Ri is either an input or a defined relation name. including built-in relations such as +(x,y,z) which meansx + y = z. (We normally use the latter syntax.)

The preceding rule is read “R0 is true if R1 and .. and Rn are all true.

head of the rule – R0

body of the rule – R1,…,Rn 5

Datalog Queries: Syntax

Page 6: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S6 Datalog Queries

[4.1.1] See Taxrecord and Taxtable (Page 8)

Find the SSN and the tax.

Tax_Due(s, t) :– Taxrecord(s, w, i, c), Taxtable(inc, t),

w+i+c = inc.

Requires one rule where Taxrecord and Taxtable are input database relations and Tax_Due is the only defined relation. 6

Datalog Queries: Syntax – Example (1)

Page 7: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S7 Datalog Queries

[4.1.2] Suppose that relation Street (n, x, y) contains all combinations of street name n and locations (x, y) such that a location belongs to the street.

Find the streets that can be reached from (x0,y0).

Reach(n) :– Street(n, x0, y0). -- Rule 1

This rule says that street n is reachable if it contains the initial point.

Reach(n) :– Reach(m), Street(m, x, y), Street(n, x, y). – Rule 2

This rule says that if m is reachable and m and n intersect on some point, than n is also reachable. 7

Datalog Queries: Syntax – Example (2)

Page 8: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S8 Datalog Queries

[4.1.3] Find the time to travel from x to y.

Travel(x, y, t) :– Go(x, 0, y, t).

Travel(x, y, t) :– Travel(x, z, t2), Go(z, t2, y, t).

The head defines the travel(x, y, t) relation, which is true if it is possible to travel from city x to city y in time t.

8

Datalog Queries: Syntax – Example (3)

Page 9: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S9 Datalog Queries

[4.1.8] Find town points covered by a radio station or find every place in town that can e reached by at least one broadcast station.

A kind of “map overlay problem” In this kind of problems, the scales and the

points of the maps to be overlayed are not the same.

Suppose that a relation Parameters(ID, Scale, X0, Y0) records for each map it scale and point of origin.

9

Datalog Queries: Syntax – Example (4)

Page 10: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S10 Datalog Queries

Suppose a town map where each point corresponds to a half kilometer, and in the broadcast map each point corresponds to a kilometer and the points of origin are the same for both maps.

Then (1, 1, 0, 0), (2, 1, 0, 0), (3, 1, 0, 0), and (San Jose, 0.5, 0, 0) would be parameters relation.

10

Datalog Queries: Syntax – Example (4)

Page 11: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S11 Datalog Queries

We can find the points in town that are covered using the following query

Covered(x2, y2) :– Broadcast(n, x, y), Town(t, x2, y2), Parameters(n, s, blat, blong),

Parameters(t, s2, tlat, tlong),

x2 = s/ s2 x + (tlat – blat),

y2 = s/ s2 y + (tlong – blong). This query at first scales up and shift every point (x, y) in

the broadcast map n to match the scale and point of the town map.

If it corresponds to a point (x2, y2) in the town map, then it is added to relation Covered.

11

Datalog Queries: Syntax – Example (4)

Page 12: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S12 Datalog Queries

12

Datalog Queries: Syntax – Example (4)

Page 13: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S13 Datalog Queries

We think of the domain of each attribute is a set. We can represent an undirected graph in two

relations: Vertices and Edge (X1, X2) that contains a pair of singleton sets of city names iff there is an edge between them. For example:

Edge ({San Jose}, {San Francisco}) would be one tuple in the Edge relation.

13

Datalog with Sets (1)

Page 14: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S14 Datalog Queries

Hamiltonian Cycle inAn Undirected graph is a path that starts and ends with the same vertex and goes through each vertex exactly once.

Assume Start (X) is an input relation where X is a singleton set containing the name of starting vertex of the cycle.

Find a Hamiltonian Cycle where a set of vertices A not yet visited.

14

Datalog with Sets (2)

Page 15: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S15 Datalog Queries

Example: Hamiltonian CycleInput:

• Vertices(S) where S is a set of vertices

• Edge ({c1}, {c2}) if there is an edge from c1 to c2

• Start({c}) where c is start city name

Output:• Path ({c}, B) if there is a path from c that

uses all vertices except those in B.

• Hamiltonian ({c}) if there is a Hamiltonian path. 15

Datalog with Sets (3)

Page 16: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S16 Datalog Queries

Base case – Path is a single vertex. All vertices except the start vertex is unvisited.

Path(X1, B) :– Vertices(A), Start(X1), B = A \ X1.

Recursion –

a path to X1 with B unvisited

exists if there is

Path(X1, B) : – Path(X2, A), a path to X2 with A unvisited

Edge(X2, X1), and an edge from X2 to X1,

X1 A, which is unvisited, and

B = A \ X1. B is A minus X1 16

Datalog with Sets (4)

Page 17: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S17 Datalog Queries

If there is a path from start to X2 that visits all

vertices and an edge from X2 to start, then there

is a Hamiltonian cycle.

Hamiltonian(X1) :– Path(X2, ),

Edge(X2, X1),

Start(X1).

17

Datalog with Sets (5)

Page 18: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S18 Datalog Queries

The domain of each abstract data type ranges over some set S.

For example, in the Street example, the Extend attribute was an abstract data type that ranged over sets of points in the plane over R2 where R is the set of real numbers.

With each abstract data type only certain operations are allowed and can be used with the query. 18

Datalog with Abstract Data Types (1)

Page 19: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S19 Datalog Queries

For an abstract data type that ranges over points in the plane, all the basic set operations can be allowed:

– equality (=)– containment – intersection – union U

19

Datalog with Abstract Data Types (2)

Page 20: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S20 Datalog Queries

Example:

Streets(Name, Extent) where extent is a set of 2D points. Let (x0, y0) be a start location.

Express the reach relation:

Reach(n) :– Street(n, Extent), {(x0,y0)} Extent.

Reach(n) :– Reach(m), Street(m, S1), Street(n, S2),

S1 S2 20

Datalog with Abstract Data Types (3)

Page 21: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S21 Datalog Queries

Rule instantiation – substitution of variables by constants

⊢Q,I R(a1,…..ak) – R(a1,….ak) has a proof using query Q

and input database I, iff

• R represents input relation r and (a1,….ak) r , or

• There is some rule and instantiation

R(a1,…,ak):–R1(a1,1,…,a1,k1),…, Rn(an,1,…, an, kn

).

where ⊢Q,I Ri(ai,1,…,ai,ki) for each 1 i n .

21

Semantics (1)

Page 22: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S22 Datalog Queries

Reach(Vine)

Reach(Vine) :– Street(Vine, 5, 2). -- from point (5, 2)

Reach(Bear)

Reach(Bear) :– Reach(Vine), Street(Vine, 5, 12), Street(Bear, 5, 12). -- (5, 12) is the intersection point

Reach(Hare) Reach(Hare) :– Reach(Bear), Street(Bear, 8, 13),

Street(Hare, 8, 13). 22

Semantics: Example (1)

Page 23: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S23 Datalog Queries

Example: By the input database (Figure 1.2): Go(Omaha, 0, Lincoln, 60) Go(Lincoln, 60, Kansas_City, 210)

Go(Kansas_City, 210, Des_Moines, 390) Go(Des_Moines, 390, Chicago, 990)We also have:

Travel(Omaha, Lincoln, 60) Travel(Omaha, Lincoln, 60):- Go(Omaha, 0, Lincoln, 60) Travel(Omaha, Kansas_City, 210) Travel(Omaha,Kansas_City,210):- Travel(Omaha,Lincoln,60),

Go(Lincoln,60,Kansas_City,210). 23

Semantics: Example (2)

Page 24: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S24 Datalog Queries

24

Proof Trees (1)

Page 25: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S25 Datalog Queries

Proof-based semantics – derived relations are the set

of tuples that can be proven.

Fixed point semantics – an interpretation of the derived

relations such that nothing new

can be proven.

Least fixed point semantics – smallest possible FP semantics.

Proof-based semantics = Least fixed point semantics 25

Proof Trees (2)

Page 26: L8-S1 Datalog Queries 2003 SJSU -- CmpE Database Design Dr. M.E. Fayad, Professor Computer Engineering Department, Room #283I College of Engineering San

2003 SJSU – CmpE --- M.E. Fayad L8-S26 Datalog Queries

T/F

1. Datalog is an extension of SQL.

2. Datalog is a rule-based language and it is related to Prolog.

3. A fact is the same thing as a tuple in a relation or a row in an SQL table.

4. A rule is a way to derive new facts, that is, a rule is (part of) a query.

5. Datalog is more powerful than SQL3 since it permits a more general form of recursion.

Discuss: Since we can do selection, projection, and join in Datalog just like in SQL and relational algebra, is there any difference among the three? 26

Discussion Questions