1 data models and query languages cse 590db, winter 1999 theory of databases zack ives january 10,...

21
1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

Upload: janel-matthews

Post on 05-Jan-2016

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

1

Data Models and Query Languages

CSE 590DB, Winter 1999Theory of Databases

Zack IvesJanuary 10, 1999

Page 2: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

2

The Relational Model

Database consists of relations, where columns are attributes and rows are tuples

Declarative queries using an algebraic or logic language (or SQL, which we shall ignore)

attribute

tuple

arity = 3, cardinality = 3NewMovies(Name, Rating, Director)

NewMoviesName Rating DirectorThe Phantom Menace PG George LucasPatch Adams PG-13 Tom ShadyacStepmom PG-13 Chris Columbus

Page 3: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

3

Schema

Each attribute in a relation has a type:NewMovies(Name: String, Rating: RatingDomain, Director:

String), where RatingDomain = {G, PG, PG-13, R, NC-17}(In many cases, we will omit the type, and assume the type

most appropriate to the expression)

Relation schema: relation name, attributes, and types Relation instance: a set of tuples for a given schema Database schema: set of relation schemas Database instance: relation instance for every

relation schema

Page 4: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

4

More on Tuples

Formally, a tuple is a mapping from attribute names to (correctly typed) values: Director -> “George Lucas” Name -> “Stepmom”

May specify a tuple using values & schema notation:NewMovies(“The Phantom Menace”, “PG”, “George Lucas”)

Can have constraints on attribute values: Integrity constraints Keys Foreign keys Functional dependencies

Page 5: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

5

Relational Algebra

All operators take one or more sets of tuples as input, and produce a new set as output Basic set operators (, , —, but normally no

complement) Selection () Projection () Cartesian Product () Join () Division ()

Page 6: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

6

Set Operations

Sets must be compatible! Same number of attributes Same types/domains

R1 R2 = {all tuples in either R1 or R2}R1 R2 = {all tuples in both R1 and R2}Complement???Negation: R1 — R2 = {all tuples in R1 not

in R2}Query without unions is conjunctive

Page 7: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

7

Selection

Chooses a subset of tuples satisfying a predicate

Rating = “PG-13” (NewMovies)

Name Rating DirectorPatch Adams PG-13 Tom ShadyacStepmom PG-13 Chris Columbus

NewMoviesName Rating DirectorThe Phantom Menace PG George LucasPatch Adams PG-13 Tom ShadyacStepmom PG-13 Chris Columbus

Page 8: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

8

Projection

Chooses a subset of attributes, removes duplicates

NewMoviesName Rating DirectorThe Phantom Menace PG George LucasPatch Adams PG-13 Tom ShadyacStepmom PG-13 Chris Columbus

Name DirectorThe Phantom Menace George LucasPatch Adams Tom ShadyacStepmom Chris Columbus

Name, Director (NewMovies)

Page 9: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

9

Cartesian Product

Combine two relations - all pairings of tuples

NewMovieName DirectorPhantom Menace George LucasPatch Adams Tom ShadyacStepmom Chris Columbus

MetroShowsName TimePatch Adams 7:00Patch Adams 9:40Stepmom 6:50Stepmom 9:00

NewMovie MetroShowsName Director Name TimePhantom Menace George Lucas Patch Adams 7:00Phantom Menace George Lucas Patch Adams 9:40Phantom Menace George Lucas Stepmom 6:50Phantom Menace George Lucas Stepmom 9:00Patch Adams Tom Shadyac Patch Adams 7:00… … … ...

Page 10: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

10

Join

Combine tuples from two relations by predicate. If predicate is equality, remove duplicate attribute.

NewMovieName DirectorPhantom Menace George LucasPatch Adams Tom ShadyacStepmom Chris Columbus

MetroShowsName TimePatch Adams 7:00Patch Adams 9:40Stepmom 6:50Stepmom 9:00

NewMovie NewMovie.Name = MetroShows.Name MetroShows

Name Director TimePatch Adams Tom Shadyac 7:00Patch Adams Tom Shadyac 9:40Stepmom Chris Columbus 6:50Stepmom Chris Columbus 9:00

Equivalent to:Name, Director, Time ( NewMovie.Name = MetroShows.Name

(NewMovie MetroShows))

Page 11: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

11

Division

Only returns results from dividend which match attributes of all of tuples in divisor (“for all”)Customersname branchJohnson DowntownSmith BrightonLindsay RedwoodGreen BrightonGreen Downtown

BranchesbranchBrightonDowntown

Customers BranchesnameGreen

Page 12: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

12

Logic-Based Query Languages

Tuple-relational calculus Based on first-order predicate calculus, but

imposes restrictions (e.g., no function symbols)

Equivalent in power to relational algebra

Datalog More powerful than tuple-relational calculus

Supports recursion and thus transitive closure

Equivalent to set of Horn clauses

Page 13: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

13

Predicates & Atoms

Relations represented by predicatesTuples represented by atoms:

Flight(“Alon”, “British Airways”, 1234, “Seattle”, “Israel”, “1/9/1999”, “10:00”)

Arithmetic atoms:X < 100, X + Y + 5 > Z * 2

In certain cases, negated atoms:not Flight(“Zack”, “British Airways”, 1234, “Seattle”,

“Israel”, “1/9/1999”, “10:00”)

Page 14: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

14

Datalog Rules & Programs

“Pure” datalog rules:head :- atom1, atom2, atom3

where all atoms are non-negated and relational

Datalog program is set of datalog rules Conjunctive query has single rule:

HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M)

Disjunctive query:HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M)HanksDirector(D) :- ProducedBy(“Hanks”, M) & DirectedBy(D, M)

Page 15: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

15

Intension & Extension

Distinction between EDB & IDB: Extensional DB

what’s stored in the databasecan only be in the body of a Datalog rule

Intensional DBresult of a datalog rulecan be in head or body of a Datalog rule

Page 16: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

16

Evaluating Rules

Evaluating rules: Start with EDB, iteratively derive facts for IDB’s Repeat until cannot derive any new facts:

Consider every possible assignment from database to variables in body

If each atom in body is made true by assignment, add tuple for the head into the head’s relation

Conjunctive queries are inexpensive Disjunctive are more expensive (even though

they’re Horn clauses)

Page 17: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

17

Transitive Closure

Suppose we to know all possible destination airports reachable from SeaTac, given an EDB Flight(From, To):

SEA

SFO

LAX

JFK

LGA DEN

We need to express the query: Find all airports reachable from SEA

Page 18: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

18

Recursion in Datalog

Program:Dest(X, Y) :- Flight(X, Y)Dest(X, Y) :- Dest(X, Z), Flight(Z, Y)

Evaluate unknown # of times until fixpoint Flight: {(SEA,SFO), (SEA,LAX), (SEA,JFK), (LAX,LGA),

(JFK,LGA),(LGA,DEN)}Dest0: {}

Dest1: Dest0 {(SEA,SFO), (SEA,LAX), (SEA,JFK)} Dest2: Dest1 {(SEA,LGA),(LAX,DEN),(JFK,DEN)} Dest3: Dest2 {(SEA,DEN)} No more changes, so stop (minimum fixpoint)

Page 19: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

19

Built-in Predicates

Rules may include atoms with built-in predicates:Expensive(X) :- Product(X, Y, P) & P > 100

But need to restrict use of built-in atoms:P(X) :- R(X) & X < YWhen is X < Y?

We require that every variable from a built-in atom appear in a relational atom

Page 20: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

20

Negated Subgoals

Restrict forms:P(X, Y) :- Between(X, Y, Z) & NOT Direct(X,Z)

Bad:Q(X, Y) :- From(X) & NOT To(Y)

Bad but fixable:P(X) :- From(X) & NOT Connects(X, Y)

Rewrite as:Connects’(X) :- Connects(X,Y)P(X) :- End(X) & NOT Connects’(X)

Page 21: 1 Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

21

Stratified/Stratisfiable Rules

Predicate P depends on predicate Q if Q appears negated in a rule defining P

If cycle in dependency graph, program is not stratifiable:p(X) :- r(X) & NOT q(X)

q(X) :- r(X) & NOT p(X)

Suppose r has tuple {1}

Intuitively, stratification provides a way of executing program P as sequence of subprograms P1 … Pn defining IDB relations w/o “forward references”. Results independent of stratification.