Lecture 7: Foundations of Query Languages

Tuesday, January 23, 2001





A History of DB Theory: In the Beginning...

• Up to 1970, a “database” was a file of records
– COBOL/CODASYL
– Network model, with low-level navigational interface

• Codd proposed the relational model in 1970
– Database = a first-order structure

• This was a great vision; it took 10 years for the community to adopt it

• Today: relational databases are heralded as a major success of theory


The Golden Years

• The 80s: rich research work on foundations

• Relational model and algebra:
– Theory of functional dependencies
– Transaction processing

• Study other data models:
– Complex objects, object-oriented

• Study other query languages:
– Query complexity, descriptive complexity

• Study other applications:
– Distributed query processing, semijoin reduction
– Partial information


• But practical database systems were interested only in:
– one particular language (SQL)
– one particular application (OLTP queries)
– one particular architecture (client-server)

• Transaction processing = useful

• Functional dependencies = somewhat useful

• The rest = great but useless


Database Theory in the Web Age

• Sudden interest in changing everything
– Web data is not relational: what is it?
  • The XML-Schema has a few hundred pages; how to understand it?
– New query languages are not relational algebra: what are they?
  • W3C is designing a new XML query language; how to proceed?
– New architectures that are not client-server:
  • Distributed data, incomplete information, etc.


Our Goal

• Talk about fundamental concepts in the theory of the relational model and relational query languages

• Use AHV’s book (Abiteboul, Hull, Vianu, “Foundations of Databases”) liberally


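First Order Logic

• Given:
– a vocabulary, R1, …, Rk
– an arity, ar(Ri), for each i=1,…,k
– an infinite supply of variables x1, x2, x3, …

• FO formulas, φ, are:

$$R_i(x_1, \ldots, x_{ar(R_i)}), \qquad x_i = x_j$$

$$\varphi \wedge \varphi', \qquad \varphi \vee \varphi', \qquad \neg\varphi$$

$$\exists x.\varphi, \qquad \forall x.\varphi$$

Sometimes we also allow constants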


Examples of FO Formulas

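$$\forall x.\forall y.(R(x,y) \rightarrow \exists z.(R(x,z) \wedge R(z,y)))$$

$$\exists x.R(x,x)$$

$$\forall y.(R(x,y) \rightarrow R(y,x))$$

In the last formula, x is a free variable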

“a → b” abbreviates, as usual, “¬a ∨ b”

Bound and free variables defined the usual way


Models for FO

• Given a vocabulary R1, …, Rk

• A model is D = (D, R1, …, Rk)

– D = a set, called domain, or universe

– Ri ⊆ D × D × ... × D (ar(Ri) times), for i = 1, ..., k

• The model is finite if R1, ..., Rk are finite

• E.g. D = int, while R1,...,Rk are finite sets


Remarks

• Vocabulary R1, …, Rk = database schema

• Model = database instance

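• Abuse of notation: Ri denotes both the relation name (in the vocabulary) and the relation itself (in the model)

• Abuse of notation: D denotes both the model and its domain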

• We are interested in finite models, but we will consider infinite models too, for a while


Meaning (Semantics) of FO formulas

• Given:
– A formula φ, with free variables x1, ..., xn (we write φ(x1, ..., xn))
– A model D = (D, R1, ..., Rk)

• We say that φ is true on a1, ..., an ∈ D:
– In notation: D |= φ[a1, ..., an]
– Defined inductively (next)


Meaning of FO formulas

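$$\mathbf{D} \models R_i(a_1, \ldots, a_n) \ \text{ if } \ (a_1, \ldots, a_n) \in R_i$$

$$\mathbf{D} \models (x_i = x_j) \ \text{ if } \ a_i = a_j$$

$$\mathbf{D} \models (\varphi \wedge \varphi')[a_1, \ldots, a_n] \ \text{ if } \ \mathbf{D} \models \varphi[a_1, \ldots, a_n] \ \text{ and } \ \mathbf{D} \models \varphi'[a_1, \ldots, a_n]$$

$$\mathbf{D} \models (\exists x.\varphi)[a_1, \ldots, a_k] \ \text{ if there exists } b \in D \text{ s.t. } \mathbf{D} \models \varphi[a_1, \ldots, a_k, b]$$

$$\mathbf{D} \models (\forall x.\varphi)[a_1, \ldots, a_k] \ \text{ if for all } b \in D, \ \mathbf{D} \models \varphi[a_1, \ldots, a_k, b]$$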

(similarly for OR and NOT)
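
The inductive definition of truth can be sketched as a small recursive evaluator over a finite model. This is only an illustration: the tuple encoding of formulas and the names `holds`, `DOMAIN`, `RELATIONS` and `HAS_SUCCESSOR` are assumptions made for this sketch, not notation from the lecture.

```python
# Formulas are nested tuples (a hypothetical encoding chosen for this sketch):
#   ("rel", "R", ("x", "y"))            atom R(x, y)
#   ("eq", "x", "y")                    x = y
#   ("and", f, g), ("or", f, g), ("not", f)
#   ("exists", "x", f), ("forall", "x", f)

def holds(formula, domain, relations, env):
    """Return True if the model (domain, relations) satisfies formula.

    env maps each free variable name to an element of the domain.
    """
    tag = formula[0]
    if tag == "rel":
        _, name, args = formula
        return tuple(env[v] for v in args) in relations[name]
    if tag == "eq":
        return env[formula[1]] == env[formula[2]]
    if tag == "and":
        return (holds(formula[1], domain, relations, env)
                and holds(formula[2], domain, relations, env))
    if tag == "or":
        return (holds(formula[1], domain, relations, env)
                or holds(formula[2], domain, relations, env))
    if tag == "not":
        return not holds(formula[1], domain, relations, env)
    if tag == "exists":
        _, var, body = formula
        return any(holds(body, domain, relations, {**env, var: b})
                   for b in domain)
    if tag == "forall":
        _, var, body = formula
        return all(holds(body, domain, relations, {**env, var: b})
                   for b in domain)
    raise ValueError(f"unknown formula tag: {tag!r}")


# A small finite model: a directed graph with a single binary relation R.
DOMAIN = {1, 2, 3, 4}
RELATIONS = {"R": {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}}

# "x has a successor": exists y. R(x, y)
HAS_SUCCESSOR = ("exists", "y", ("rel", "R", ("x", "y")))

print(holds(HAS_SUCCESSOR, DOMAIN, RELATIONS, {"x": 1}))  # True
print(holds(HAS_SUCCESSOR, DOMAIN, RELATIONS, {"x": 4}))  # False
```

Each quantifier case simply tries every element b of the domain, which is why this direct approach only works when the domain is finite.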


FO Formulas as Queries

• Given:
– An FO formula φ(x1, ..., xn)
– A (finite) model D = (D, R1, ..., Rk)

• The answer of evaluating φ on D is:

$$\varphi(\mathbf{D}) = \{(a_1, \ldots, a_n) \mid \mathbf{D} \models \varphi[a_1, \ldots, a_n]\}$$

• Hence: an FO formula defines a function mapping a database to a relation


Examples of Formulas = Queries

Vocabulary: single relation R

D = [figure: directed graph on nodes 1, 2, 3, 4, with one edge per row of R]

R =
1 2
2 1
2 3
1 4
3 4

Graphs are the most “common” models


Examples of Formulas = Queries

$$q_1(x) = R(1, x)$$

Notice: uses a constant, 1. Looks for successors of 1. Answer: q1(D) = {2, 4}

$$q_2(x, y) = \exists z.(R(x,z) \wedge R(z,y))$$

Looks for pairs (x,y) connected by paths of length 2. Answer: q2(D) = {(1,1), (2,2), (1,3), (2,4)}

$$q_3(x) = [\ldots]\,y.(R(x,y)\ [\ldots]\ [\ldots]\,z.(R(y,z)))$$

Answer: q3(D) = {1}
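
The first two queries (successors of 1, and pairs connected by a path of length 2) can be checked against the example graph with plain set comprehensions. This is a minimal sketch: the relation `R` is copied from the example instance, and the variable names are chosen here for illustration.

```python
# Edge relation R of the example graph D.
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}

# q1(x) = R(1, x): successors of the constant 1.
q1 = {x for (a, x) in R if a == 1}

# q2(x, y) = exists z. (R(x, z) and R(z, y)): endpoints of paths of length 2.
q2 = {(x, y) for (x, z1) in R for (z2, y) in R if z1 == z2}

print(sorted(q1))  # [2, 4]
print(sorted(q2))  # [(1, 1), (1, 3), (2, 2), (2, 4)]
```

Both results agree with the answers stated on the slide.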


Boolean Queries

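A boolean query is one without free variables. Its answer is true or false.

$$q_4 = \forall x.\forall y.\Big(\big(\exists u.(R(x,u) \vee R(u,x)) \wedge \exists v.(R(y,v) \vee R(v,y))\big) \rightarrow R(x,y)\Big)$$

The left-hand side of the implication says that x and y are nodes in the graph

Tests for a clique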
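
A boolean query returns a single truth value. The sketch below assumes the clique test reads "for all x and y that occur in some edge, R(x, y) holds", which also requires self-loops since x and y may be equal. The function name `is_clique` and the second test graph are hypothetical.

```python
def is_clique(R):
    """Boolean query: every pair of nodes (x, y) in the graph is an edge."""
    # x is "a node in the graph" iff it occurs in some edge, on either side.
    nodes = {a for edge in R for a in edge}
    return all((x, y) in R for x in nodes for y in nodes)


# The example graph from the earlier slide.
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}

# A made-up graph with every possible edge (including self-loops) on 3 nodes.
complete = {(x, y) for x in (1, 2, 3) for y in (1, 2, 3)}

print(is_clique(R))         # False
print(is_clique(complete))  # True
```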


More Examples

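• Vocabulary (= schema):
– Employee(name, office, mgr), Manager(name, office)

• Queries (E and M abbreviate Employee and Manager):
– Find offices:

$$q_1(x) = \exists y.\exists z.(E(y,x,z))$$

– Find offices with at least two employees:

$$q_2(x) = \exists y_1.\exists z_1.\exists y_2.\exists z_2.(E(y_1,x,z_1) \wedge E(y_2,x,z_2) \wedge y_1 \neq y_2)$$

– Find managers that share office with all their employees:

$$q_3(x) = \exists y.(M(x,y) \wedge \forall u.\forall v.(E(u,v,x) \rightarrow y = v))$$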
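
The three queries over the Employee/Manager schema can be tried out on a toy instance. Everything in the sketch below is hypothetical: the names, offices and variable names are made up for illustration and do not come from the lecture.

```python
# Hypothetical sample instance.
Employee = {  # (name, office, mgr)
    ("ann", "o1", "mary"),
    ("bob", "o1", "mary"),
    ("carl", "o2", "joe"),
}
Manager = {  # (name, office)
    ("mary", "o1"),
    ("joe", "o3"),
}

# Find offices: an office is any value in the second column of Employee.
offices = {office for (_, office, _) in Employee}

# Find offices with at least two employees: two Employee rows with the
# same office but different names.
shared_offices = {
    o1
    for (n1, o1, _) in Employee
    for (n2, o2, _) in Employee
    if o1 == o2 and n1 != n2
}

# Find managers that share office with all their employees: the manager's
# office equals the office of every employee they manage.
co_located_managers = {
    m
    for (m, m_office) in Manager
    if all(office == m_office
           for (_, office, mgr) in Employee if mgr == m)
}

print(sorted(offices))              # ['o1', 'o2']
print(sorted(shared_offices))       # ['o1']
print(sorted(co_located_managers))  # ['mary']
```

Note that the universal quantifier makes a manager with no employees at all qualify vacuously.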


Properties of Queries

• Decidable

• Generic

• Domain-independent

• They make more sense if we think of queries in general, not just FO queries

• Define next general queries


Queries

• A query, q, is a function from models to relations, s.t. for every model (D, R1, ..., Rk):

– q(D, R1, ..., Rk) = R, s.t. R ⊆ D^n

• Here n is called the arity of q; when n=0, q is called a boolean query


Property 1: Decidable Queries

• q is decidable if there exists a Turing Machine that, for some encoding of D, given R1, ..., Rk on its input tape, computes q(D, R1, ..., Rk)


Property 2: Domain Independence

• In English:
– q only depends on R1, ..., Rk, not on D!
– Intuition: a database consists only of R1, ..., Rk, not of D.

• Formally: a query q is domain independent if
– for any model (D, R1, ..., Rk)
– for any set D’ s.t. R1 ⊆ (D’)^ar(R1), ..., Rk ⊆ (D’)^ar(Rk)
– the following holds: q(D, R1, ..., Rk) = q(D’, R1, ..., Rk)


Property 2: Domain Independence

Examples:

• Queries that are domain independent:
– “Find pairs of nodes connected by a path of length 2”
– “Find the manager of Smith”
– “Find the largest salary in the database”

• Queries that are not domain independent:
– “Find all nodes that are not in the graph”
– “Find the average salary”
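
Domain independence can be observed by evaluating a query over the same relation with two different domains. In the sketch below the graph relation `R` is the example from earlier in the lecture; the function names and the two domains `D1` and `D2` are assumptions made for illustration.

```python
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}

def path2(domain, R):
    """Pairs connected by a path of length 2; all variables range over domain."""
    return {
        (x, y)
        for x in domain
        for y in domain
        if any((x, z) in R and (z, y) in R for z in domain)
    }

def not_in_graph(domain, R):
    """Elements of the domain that occur in no edge of R."""
    return {
        x
        for x in domain
        if not any((x, y) in R or (y, x) in R for y in domain)
    }

D1 = {1, 2, 3, 4}
D2 = D1 | {5, 6}  # same relation R, larger domain

print(path2(D1, R) == path2(D2, R))  # True: the answer ignores the domain
print(sorted(not_in_graph(D1, R)))   # []
print(sorted(not_in_graph(D2, R)))   # [5, 6]: the answer changed with the domain
```

Checking two domains can only refute domain independence, never prove it; the sketch is meant to show the definition at work, not to serve as a decision procedure.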


Property 3: Genericity

• In English:
– q does not depend on the particular encoding of the database

• Formally:
– for every h: (D, R1, ..., Rk) → (D’, R’1, ..., R’k)
– s.t. h is injective, h(D) = D’, h(R1) = R’1, ..., h(Rk) = R’k
– it follows that h(q(D, R1, ..., Rk)) = q(D’, R’1, ..., R’k)


Property 3: Genericity

Example:

D = [figure: graph on nodes 1, 2, 3, 4]

D’ = [figure: graph on nodes 10, 20, 30, 40]

q(D) = {1, 3}

q(D’) = ??
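
Genericity says that renaming the database and then querying gives the same result as querying and then renaming. The sketch below rests on two assumptions that the slide does not state: D is taken to be the example graph from earlier in the lecture with D’ obtained by the renaming h(x) = 10x, and q is a hypothetical query ("nodes with a successor that has no successors") picked only because it returns {1, 3} on that graph.

```python
# Assumed edge relation for D (the example graph used earlier in the lecture).
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}

def q(R):
    """Hypothetical query: nodes with a successor that has no successors."""
    has_successor = {x for (x, _) in R}
    return {x for (x, y) in R if y not in has_successor}

def h(x):
    """Assumed injective renaming from D to D'."""
    return 10 * x

# Apply h to every tuple of the relation to obtain the renamed database.
R_prime = {(h(x), h(y)) for (x, y) in R}

renamed_answer = {h(x) for x in q(R)}  # h(q(D))
direct_answer = q(R_prime)             # q(D')

print(sorted(q(R)))                     # [1, 3]
print(sorted(direct_answer))            # [10, 30]
print(renamed_answer == direct_answer)  # True
```

Because q is written using only membership tests on R and never inspects the values themselves, any injective renaming commutes with it.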


Property 3: Genericity

Examples:

• Queries that are generic:
– “Find pairs of nodes connected by a path of length 2”
– “Find all employees having the same office as their manager”
– “Find all nodes that are not in the graph”

• Queries that are not generic:
– “Find the manager of Smith”
  • we often relax the definition to allow this to be generic
  • C-genericity, for a set of constants C
– “Find the largest salary in the database”


Property 3: Genericity

Another example:

D = [figure: graph on nodes 1, 2, 3, 4]

q(D) = {4}

This query cannot be generic (why?)


Back to FO Queries

1. All FO queries are computable

2. Not all FO queries are domain independent
– Why? Next...

3. All FO queries are generic
– In particular, the query on the previous slide is not expressible in FO


FO Queries and Domain Independence

• Find all nodes that are not in the graph:

$$q(x) = \neg\exists y.R(x,y) \wedge \neg\exists z.R(z,x)$$

• Find all nodes that are connected to “everything”:

$$q(x) = \forall y.R(x,y)$$

• Find all pairs of employees or offices:

$$q(x,y) = Emp(x) \vee Office(y)$$

• We don’t want such queries!


FO Queries and Domain Independence

• Domain independent FO queries are also called safe queries

• Definition. The active domain of (D, R1, ..., Rk) is Da = the set of all constants in R1, ..., Rk

• E.g. for graphs, $D_a = \{x \mid \exists y.R(x,y) \vee \exists z.R(z,x)\}$

• Very important:
– If a query is safe, it suffices to range quantifiers only over the active domain (why?)
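
The active domain is easy to compute directly from the relations. The helper name `active_domain` and the second relation `S` below are hypothetical, used only for this sketch.

```python
def active_domain(*relations):
    """Set of all constants occurring in any tuple of the given relations."""
    return {value for rel in relations for row in rel for value in row}


# The example graph from earlier in the lecture.
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}

# A made-up unary relation, to show that several relations can be combined.
S = {(7,)}

print(sorted(active_domain(R)))     # [1, 2, 3, 4]
print(sorted(active_domain(R, S)))  # [1, 2, 3, 4, 7]
```

For a graph this coincides with the set of nodes that have an outgoing or an incoming edge.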


FO Queries and Domain Independence

• The bad news:
– Theorem. It is undecidable whether a given FO query is safe.

• The good news:
– no big deal
– can define a subset of FO queries that we know are safe = range-restricted queries (rr-queries)
– Any safe query is equivalent to some rr-query


Range-restriction

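• Syntactic, rather ad-hoc definition (several exist):

• $R(x,x)$ OK, $\neg R(x,x)$ not OK

• $\exists y.(S(y) \wedge \neg R(y,y))$ OK, $\exists y.(\neg R(y,y))$ not OK

• $\forall y.(S(y) \rightarrow R(y,y))$ OK, $\forall y.(R(y,y))$ not OK

• If a query q is safe, it is equivalent to a rr-query:

$$\exists y.\varphi \ \text{ to } \ \exists y.(y \in D_a \wedge \varphi), \qquad \forall y.\varphi \ \text{ to } \ \forall y.(y \in D_a \rightarrow \varphi)$$

$$\neg R(x,y,\ldots) \ \text{ to } \ (x \in D_a \wedge y \in D_a \wedge \ldots \wedge \neg R(x,y,\ldots))$$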


FO = Relational Algebra

• Recall the 5 operators in the relational algebra:
– ∪, −, ×, σ, π

• Theorem. A domain independent query is expressible in FO iff it is expressible in the relational algebra
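
To make the correspondence concrete, the five operators can be sketched over Python sets of tuples, and the "path of length 2" query can then be written purely algebraically. The function names are chosen for this sketch, and columns are addressed by position (0-based) rather than by attribute name.

```python
from itertools import product

def union(r, s):
    return r | s

def difference(r, s):
    return r - s

def cartesian(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return {a + b for a, b in product(r, s)}

def select(r, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in r if predicate(t)}

def project(r, columns):
    """Projection: keep only the listed column positions."""
    return {tuple(t[c] for c in columns) for t in r}


# The example graph from earlier in the lecture.
R = {(1, 2), (2, 1), (2, 3), (1, 4), (3, 4)}

# exists z. (R(x, z) and R(z, y)) as an algebra expression:
# take R x R, keep rows whose middle columns agree, project the endpoints.
paths_of_length_2 = project(
    select(cartesian(R, R), lambda t: t[1] == t[2]),
    (0, 3),
)

print(sorted(paths_of_length_2))  # [(1, 1), (1, 3), (2, 2), (2, 4)]
```

The existential quantifier becomes a projection, the conjunction with a shared variable becomes a product followed by a selection, and negation would be expressed with difference.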