lecture 9: query complexity

Lecture 9: Query Complexity Tuesday, January 30, 2001

Upload: selena

Post on 25-Feb-2016



Page 1: Lecture 9: Query Complexity

Lecture 9: Query Complexity

Tuesday, January 30, 2001

Outline

• Properties of queries
• Relational Algebra vs. First Order Logic
• Classical Logic vs. Logic on Finite Models
• Query Complexity
  – start today, finish Thursday

• Reading assignment:
  – Sections 1-3 from the paper

A Note on Notation

• We used to denote models by $D = (D, R_1, \ldots, R_k)$

• New notation: $\mathbf{D} = (D, R_1, \ldots, R_k)$
  – the model is in boldface, the domain is in normal font

Properties of Queries

• Decidable
• Generic
• Domain-independent

• These properties make more sense if we think of queries in general, not just FO queries

• Next, we define general queries

Queries

• A query q is a function from models to relations such that, for every model $\mathbf{D} = (D, R_1, \ldots, R_k)$:
  – $q(\mathbf{D}) = R$, where $R \subseteq D^n$

• Here n is called the arity of q; when n = 0, q is called a boolean query
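As a minimal sketch of this definition, assume a model with a single binary relation (a directed graph), with relations stored as Python frozensets of tuples; a query is then just a function from a (domain, relation) pair to a relation. The names `path2` and `has_self_loop` are hypothetical, and the boolean query uses the usual convention that the arity-0 relation {()} means true and the empty relation means false.

```python
from typing import Callable, FrozenSet, Tuple

# Hypothetical encoding: a model (D, R) with a single binary relation R.
Domain = FrozenSet[int]
Relation = FrozenSet[Tuple[int, ...]]
Model = Tuple[Domain, Relation]
Query = Callable[[Model], Relation]  # q: models -> relations


def path2(model: Model) -> Relation:
    """Arity-2 query: pairs (x, z) connected by a path of length 2."""
    _, r = model
    return frozenset((x, z) for (x, y1) in r for (y2, z) in r if y1 == y2)


def has_self_loop(model: Model) -> Relation:
    """Boolean (arity-0) query: {()} encodes true, the empty set encodes false."""
    _, r = model
    return frozenset({()}) if any(x == y for (x, y) in r) else frozenset()


D: Model = (frozenset({1, 2, 3, 4}), frozenset({(1, 2), (2, 3), (3, 4)}))
print(sorted(path2(D)))   # [(1, 3), (2, 4)]
print(has_self_loop(D))   # frozenset()
```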

Property 1: Decidable Queries

• q is decidable if there exists a Turing Machine that, for some encoding of D, given R1, ..., Rk on its input tape, computes q(D)

Property 2: Domain Independence

• In English:
  – q depends only on R1, ..., Rk, not on D!
  – Intuition: a database consists only of R1, ..., Rk, not of D.

• Formally: a query q is domain independent if
  – for any model $(D, R_1, \ldots, R_k)$
  – for any set $D'$ such that $R_1 \subseteq (D')^{\mathrm{ar}(R_1)}, \ldots, R_k \subseteq (D')^{\mathrm{ar}(R_k)}$
  – the following holds: $q(D, R_1, \ldots, R_k) = q(D', R_1, \ldots, R_k)$
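The definition quantifies over all admissible domains D', so no finite test can prove domain independence; but a single pair of domains can refute it. A small sketch, assuming a graph stored as a frozenset of edge pairs; `not_in_graph`, `path2` and `same_answer` are hypothetical names.

```python
from typing import FrozenSet, Tuple

Relation = FrozenSet[Tuple[int, ...]]


def not_in_graph(domain: FrozenSet[int], r: Relation) -> Relation:
    """Nodes of the domain that occur in no edge of R."""
    used = {v for edge in r for v in edge}
    return frozenset((x,) for x in domain if x not in used)


def path2(domain: FrozenSet[int], r: Relation) -> Relation:
    """Pairs (x, z) connected by a path of length 2; never looks at the domain."""
    return frozenset((x, z) for (x, y1) in r for (y2, z) in r if y1 == y2)


def same_answer(q, d1: FrozenSet[int], d2: FrozenSet[int], r: Relation) -> bool:
    """Compare q(D, R) with q(D', R) for two admissible domains.

    Both domains must contain every value occurring in R. A False result
    proves that q is not domain independent; True only says that this
    particular pair of domains is not a counterexample.
    """
    values = {v for t in r for v in t}
    assert values <= d1 and values <= d2, "R must be a relation over both domains"
    return q(d1, r) == q(d2, r)


R = frozenset({(1, 2), (2, 3)})
D, D_prime = frozenset({1, 2, 3}), frozenset({1, 2, 3, 99})
print(same_answer(not_in_graph, D, D_prime, R))  # False: 99 is returned only over D'
print(same_answer(path2, D, D_prime, R))         # True
```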

Property 2: Domain Independence

Examples:

• Queries that are domain independent:
  – “Find pairs of nodes connected by a path of length 2”
  – “Find the manager of Smith”
  – “Find the largest salary in the database”

• Queries that are not domain independent:
  – “Find all nodes that are not in the graph”
  – “Find the average salary”

Property 3: Genericity

• In English:
  – q does not depend on the particular encoding of the database

• Formally:
  – for every $h : (D, R_1, \ldots, R_k) \rightarrow (D', R'_1, \ldots, R'_k)$
  – such that h is bijective, $h(D) = D'$, $h(R_1) = R'_1, \ldots, h(R_k) = R'_k$
  – the following holds: $h(q(D, R_1, \ldots, R_k)) = q(D', R'_1, \ldots, R'_k)$
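The genericity condition can be checked mechanically for one bijection h at a time. A minimal sketch, assuming graphs stored as frozensets of edges and h given as a Python dict; `rename`, `largest_node` and `commutes_with` are hypothetical names. The non-generic query mirrors “find the largest salary”: it depends on the order of the encoding values.

```python
from typing import Dict, FrozenSet, Tuple

Relation = FrozenSet[Tuple[int, ...]]


def rename(h: Dict[int, int], r: Relation) -> Relation:
    """Apply h to every value in every tuple of r."""
    return frozenset(tuple(h[v] for v in t) for t in r)


def path2(domain, r: Relation) -> Relation:
    """Generic: defined only in terms of the structure of R."""
    return frozenset((x, z) for (x, y1) in r for (y2, z) in r if y1 == y2)


def largest_node(domain, r: Relation) -> Relation:
    """Not generic: uses the numeric order of the encoding values."""
    used = [v for t in r for v in t]
    return frozenset({(max(used),)}) if used else frozenset()


def commutes_with(q, domain: FrozenSet[int], r: Relation, h: Dict[int, int]) -> bool:
    """Check h(q(D, R)) == q(h(D), h(R)) for one bijection h on D."""
    assert set(h) == set(domain) and len(set(h.values())) == len(h), "h must be a bijection on D"
    image_domain = frozenset(h[v] for v in domain)
    return rename(h, q(domain, r)) == q(image_domain, rename(h, r))


D = frozenset({1, 2, 3, 4})
R = frozenset({(1, 2), (2, 3)})
reverse = {1: 4, 2: 3, 3: 2, 4: 1}
print(commutes_with(path2, D, R, reverse))         # True
print(commutes_with(largest_node, D, R, reverse))  # False
```

Note that an order-preserving renaming such as h(i) = 10·i would not expose `largest_node`; a single passing bijection proves nothing, while a single failing one shows the query is not generic.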

Property 3: Genericity

Example:

[Figure: a graph D on nodes 1, 2, 3, 4, and a graph D' drawn identically on nodes 10, 20, 30, 40]

q(D) = {1, 3}

q(D') = ??

Property 3: Genericity

Examples:

• Queries that are generic:
  – “Find pairs of nodes connected by a path of length 2”
  – “Find all employees having the same office as their manager”
  – “Find all nodes that are not in the graph”

• Queries that are not generic:
  – “Find the manager of Smith”
    • we often relax the definition to allow this to be generic
    • C-genericity, for a set of constants C
  – “Find the largest salary in the database”

Property 3: Genericity

Another example:

[Figure: a graph D on nodes 1, 2, 3, 4]

q(D) = {4}

This query cannot be generic (why?)

Back to FO Queries

1. All FO queries are computable
2. Not all FO queries are domain independent
  – Why? Next...
3. All FO queries are generic
  – In particular, the query on the previous slide is not expressible in FO

FO Queries and Domain Independence

• Find all nodes that are not in the graph:
  $q(x) = \neg\exists y.R(x,y) \wedge \neg\exists z.R(z,x)$

• Find all nodes that are connected to “everything”:
  $q(x) = \forall y.R(x,y)$

• Find all pairs of employees or offices:
  $q(x,y) = \mathit{Emp}(x) \vee \mathit{Office}(y)$

• We don’t want such queries!

FO Queries and Domain Independence

• Domain independent FO queries are also called safe queries

• Definition. The active domain of $(D, R_1, \ldots, R_k)$ is $D_a$ = the set of all constants in $R_1, \ldots, R_k$

• E.g. for graphs, $D_a = \{x \mid \exists y.R(x,y) \vee \exists z.R(z,x)\}$

• Very important:
  – If a query is safe, it suffices to range quantifiers only over the active domain (why?)
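A small sketch of the active domain for graphs and of a safe query evaluated with its quantifier ranging over D_a instead of the full domain D. It assumes a graph stored as a frozenset of edge pairs; `active_domain` and `path2_over` are hypothetical names. Free variables are also drawn from D_a, since the answers of a safe query lie in D_a^n (take D' = D_a in the definition of domain independence).

```python
from typing import FrozenSet, Iterable, Tuple

Relation = FrozenSet[Tuple[int, int]]


def active_domain(r: Relation) -> FrozenSet[int]:
    """D_a = {x | ∃y.R(x,y) ∨ ∃z.R(z,x)}: every constant occurring in R."""
    return frozenset(v for edge in r for v in edge)


def path2_over(r: Relation, range_of_y: Iterable[int]) -> FrozenSet[Tuple[int, int]]:
    """q(x, z) = ∃y.(R(x,y) ∧ R(y,z)), with ∃y ranging over range_of_y.

    x and z are taken from D_a.
    """
    da = active_domain(r)
    ys = list(range_of_y)
    return frozenset((x, z) for x in da for z in da
                     if any((x, y) in r and (y, z) in r for y in ys))


R = frozenset({(1, 2), (2, 3), (3, 1)})
full_domain = range(1, 11)  # hypothetical D = {1, ..., 10}, with 7 isolated nodes
print(path2_over(R, full_domain) == path2_over(R, active_domain(R)))  # True
```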

FO Queries and Domain Independence

• The bad news:
  – Theorem. It is undecidable whether a given FO query is safe.

• The good news:
  – no big deal
  – we can define a subset of FO queries that we know are safe: the range-restricted queries (rr-queries)
  – any safe query is equivalent to some rr-query

Range-restriction

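• Syntactic, rather ad-hoc definition (several exist):
  – OK: $S(x) \wedge \neg R(x,x)$; not OK: $\neg R(x,x)$
  – OK: $\exists y.(S(y) \wedge \neg R(y,y))$; not OK: $\exists y.\neg R(y,y)$
  – OK: $\forall y.(S(y) \rightarrow \neg R(y,y))$; not OK: $\forall y.\neg R(y,y)$

• If a query q is safe, it is equivalent to an rr-query, obtained by relativizing it to the active domain:
  – replace $\exists y.\ldots$ by $\exists y.(y \in D_a \wedge \ldots)$
  – replace $\forall y.\ldots$ by $\forall y.(y \in D_a \rightarrow \ldots)$
  – replace $\neg R(x, y, \ldots)$ by $(x \in D_a \wedge y \in D_a \wedge \ldots \wedge \neg R(x, y, \ldots))$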

Safe-FO = Relational Algebra

• Recall the 5 operators of the relational algebra:
  – $\cup$, $-$, $\times$, $\sigma$, $\Pi$

• Theorem. A query is expressible in safe-FO iff it is expressible in the relational algebra

Proof

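RA query $E$ $\Rightarrow$ safe FO query $\varphi$

• $R \Rightarrow R(x_1, \ldots, x_n)$
• $E \cup E' \Rightarrow \varphi(x_1, \ldots, x_n) \vee \varphi'(x_1, \ldots, x_n)$
• $E - E' \Rightarrow \varphi(x_1, \ldots, x_n) \wedge \neg\varphi'(x_1, \ldots, x_n)$
• $E \times E' \Rightarrow \varphi(x_1, \ldots, x_n) \wedge \varphi'(y_1, \ldots, y_m)$
• $\sigma_{x=a}(E) \Rightarrow \varphi(\ldots) \wedge x = a$
• $\Pi_{x_1, \ldots, x_n}(E) \Rightarrow \exists x_{n+1}, \ldots, x_m.\varphi(x_1, \ldots, x_n, x_{n+1}, \ldots, x_m)$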

Proof

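Define the active domain as an RA expression:
$D_a = \Pi_1(R) \cup \Pi_2(R) \cup \ldots \cup \Pi_1(S) \cup \Pi_2(S) \cup \ldots$

safe FO query $\varphi$ $\Rightarrow$ RA query $E$

• $R(x_1, \ldots, x_n) \Rightarrow R$
• $\varphi(x_1, \ldots, x_n, y_1, \ldots, y_m) \wedge \varphi'(x_1, \ldots, x_n, z_1, \ldots, z_p) \Rightarrow E \bowtie E'$
• $\varphi(x_1, \ldots, x_n, y_1, \ldots, y_m) \vee \varphi'(x_1, \ldots, x_n, z_1, \ldots, z_p) \Rightarrow (E \times (D_a)^p) \cup (E' \times (D_a)^m)$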

• $\neg\varphi(x_1, \ldots, x_n) \Rightarrow (D_a)^n - E$

• $\exists x_1.\varphi(x_1, \ldots, x_n) \Rightarrow \Pi_{2,3,\ldots,n}(E)$

• No need for $\forall$ (why?)
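A minimal Python sketch of the five operators, the active-domain expression D_a, and the translation rules for ¬ and ∃, assuming relations are frozensets of tuples with columns numbered from 1 (as in Π_{2,3,...,n}). All function names are hypothetical; this is an illustration of the rules, not a full FO-to-RA compiler.

```python
from typing import FrozenSet, Tuple

Relation = FrozenSet[Tuple]


# The five RA operators on relations stored as sets of tuples (columns numbered from 1).
def union(e1: Relation, e2: Relation) -> Relation:
    return e1 | e2


def difference(e1: Relation, e2: Relation) -> Relation:
    return e1 - e2


def cross(e1: Relation, e2: Relation) -> Relation:
    return frozenset(t1 + t2 for t1 in e1 for t2 in e2)


def select_eq(e: Relation, col: int, a) -> Relation:
    """σ_{col = a}(E)"""
    return frozenset(t for t in e if t[col - 1] == a)


def project(e: Relation, *cols: int) -> Relation:
    """Π_{cols}(E)"""
    return frozenset(tuple(t[c - 1] for c in cols) for t in e)


def active_domain(*rels: Relation) -> Relation:
    """D_a = Π_1(R) ∪ Π_2(R) ∪ ... ∪ Π_1(S) ∪ ..., as a unary relation."""
    da: Relation = frozenset()
    for r in rels:
        arity = len(next(iter(r))) if r else 0
        for i in range(1, arity + 1):
            da = union(da, project(r, i))
    return da


def power(da: Relation, n: int) -> Relation:
    """(D_a)^n, built with n cross products starting from the arity-0 relation {()}."""
    result: Relation = frozenset({()})
    for _ in range(n):
        result = cross(result, da)
    return result


R = frozenset({(1, 2), (2, 3)})
Da = active_domain(R)
not_R = difference(power(Da, 2), R)  # ¬R(x1, x2)      =>  (D_a)^2 - R
exists_x1 = project(R, 2)            # ∃x1.R(x1, x2)   =>  Π_2(R)
print(sorted(Da), len(not_R), sorted(exists_x1))  # [(1,), (2,), (3,)] 7 [(2,), (3,)]
```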

Examples

• Vocabulary (= schema):
  – Employee(name, office, mgr), Manager(name, office), abbreviated E and M

• Find offices:
  $q_1(x) = \exists y.\exists z.E(y,x,z) \Rightarrow \Pi_2(\Pi_{1,2}(E)) = \Pi_2(E)$

Factoid: existential quantifiers ARE projections, and vice versa
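To see the factoid on data, here is a sketch that evaluates $q_1$ directly as a formula (quantifiers over the active domain) and compares it with the projection $\Pi_2(E)$. The Employee instance and the names `offices_fo` and `project` are hypothetical.

```python
from typing import FrozenSet, Tuple

# Hypothetical Employee(name, office, mgr) instance.
E: FrozenSet[Tuple[str, str, str]] = frozenset({
    ("alice", "101", "carol"),
    ("bob", "102", "carol"),
    ("carol", "101", "dave"),
})


def offices_fo(e) -> FrozenSet[Tuple[str]]:
    """q1(x) = ∃y.∃z.E(y, x, z), with x, y, z ranging over the active domain."""
    da = {v for t in e for v in t}
    return frozenset((x,) for x in da
                     if any((y, x, z) in e for y in da for z in da))


def project(e, *cols: int):
    """Π_cols, with columns numbered from 1."""
    return frozenset(tuple(t[c - 1] for c in cols) for t in e)


print(sorted(offices_fo(E)))                                           # [('101',), ('102',)]
print(project(project(E, 1, 2), 2) == project(E, 2) == offices_fo(E))  # True
```

The nested loops over y and z are exactly what a single projection avoids, which is why RA is the natural language for query plans.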

Examples (cont’d)

• Find the manager of all employees:
  $q_2(x) = \exists y.M(x,y) \wedge \forall u.(\exists v,w.E(u,v,w) \rightarrow \exists v.E(u,v,x))$
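A direct evaluation of $q_2$, with every quantifier ranging over the active domain of E and M, makes the role of the universal quantifier visible. This is a sketch over a hypothetical instance; `manager_of_all` is an illustrative name, and the implication $A \rightarrow B$ is written as `not A or B`.

```python
from typing import FrozenSet, Tuple


def manager_of_all(E: FrozenSet[Tuple[str, str, str]],
                   M: FrozenSet[Tuple[str, str]]) -> FrozenSet[Tuple[str]]:
    """q2(x) = ∃y.M(x,y) ∧ ∀u.(∃v,w.E(u,v,w) → ∃v.E(u,v,x)).

    All quantifiers range over the active domain of E and M.
    """
    da = {v for t in E for v in t} | {v for t in M for v in t}
    return frozenset(
        (x,) for x in da
        if any((x, y) in M for y in da)                              # ∃y.M(x, y)
        and all(not any((u, v, w) in E for v in da for w in da)      # ∀u.(∃v,w.E(u,v,w) →
                or any((u, v, x) in E for v in da)                   #      ∃v.E(u,v,x))
                for u in da)
    )


# Hypothetical instance: Employee(name, office, mgr), Manager(name, office).
E = frozenset({("alice", "101", "carol"), ("bob", "102", "carol")})
M = frozenset({("carol", "101"), ("dave", "102")})
print(manager_of_all(E, M))  # frozenset({('carol',)})
```

In RA, this “for all employees” pattern is the one usually expressed with relational division, which itself is derivable from the five basic operators.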

Discussion

• (safe)-FO and RA:
  – (safe)-FO: for declarative queries
  – RA: for query plans
  – The theorem says: we can translate (safe)-FO to RA
  – In practice: we need to consider the “best” RA expression

• Query languages:
  – (safe)-FO is just one instance; we will discuss smaller and larger languages
  – All of them will express only computable, generic, and domain independent queries

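Classical Logic vs. Logic on Finite Models

• Recall:
  – given a model $\mathbf{D} = (D, R_1, \ldots, R_k)$
  – and given a closed FO formula $\varphi$
  – we have defined what $\mathbf{D} \models \varphi$ means

• A formula $\varphi$ is valid if, for every $\mathbf{D}$, $\mathbf{D} \models \varphi$
  – It is finitely valid if, for every finite $\mathbf{D}$, $\mathbf{D} \models \varphi$

• A formula $\varphi$ is satisfiable if there exists $\mathbf{D}$ such that $\mathbf{D} \models \varphi$
  – It is finitely satisfiable if there exists a finite $\mathbf{D}$ such that $\mathbf{D} \models \varphi$

• Obviously: $\varphi$ is valid iff $\neg\varphi$ is not satisfiable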

Classical Logic

• Notation: $\models \varphi$ means $\varphi$ is valid
• Notation: $\vdash \varphi$ means $\varphi$ is “provable”

Gödel’s Completeness Theorem: $\models \varphi$ iff $\vdash \varphi$

Corollary. The set of valid formulas is r.e.
  – Idea: enumerate all proofs

Church’s Theorem: if ar(Ri) > 1 for some i, then the set of valid formulas is not decidable.

Corollary. The set of satisfiable formulas is not r.e. (since $\varphi$ is satisfiable iff $\neg\varphi$ is not valid)

Logic on Finite Models

Simple Fact: the set of finitely satisfiable formulas is r.e.
  – Idea: enumerate all finite models $\mathbf{D}$, and all formulas $\varphi$ such that $\mathbf{D} \models \varphi$

Trakhtenbrot’s Theorem: if ar(Ri) > 1 for some i, then the set of finitely satisfiable formulas is not decidable

Corollary: the set of finitely valid formulas is not r.e. (since $\varphi$ is finitely valid iff $\neg\varphi$ is not finitely satisfiable)
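The “Simple Fact” can be sketched for a single fixed formula: enumerate finite models by increasing size and stop at the first one that satisfies it. The sketch below assumes a vocabulary with one binary relation, represents a formula as a Python predicate over (domain, relation), and adds a size cut-off so it always returns; all names are hypothetical. Without the cut-off the search halts exactly when the formula is finitely satisfiable, which is what makes that set r.e.; Trakhtenbrot’s theorem says no cut-off can be computed in general.

```python
from itertools import product
from typing import Callable, FrozenSet, Optional, Tuple

Pair = Tuple[int, int]
Formula = Callable[[range, FrozenSet[Pair]], bool]  # phi(domain, lt) means D |= phi


def models_of_size(n: int):
    """Enumerate every binary relation `lt` on the domain {0, ..., n-1}."""
    pairs = [(a, b) for a in range(n) for b in range(n)]
    for bits in product([False, True], repeat=len(pairs)):
        yield frozenset(p for p, keep in zip(pairs, bits) if keep)


def find_finite_model(phi: Formula, max_size: int) -> Optional[Tuple[int, FrozenSet[Pair]]]:
    """Search all finite models of size 1..max_size; return the first model of phi."""
    for n in range(1, max_size + 1):
        for lt in models_of_size(n):
            if phi(range(n), lt):
                return n, lt
    return None


def strict_total_order(dom: range, lt: FrozenSet[Pair]) -> bool:
    irreflexive = all((x, x) not in lt for x in dom)
    transitive = all((x, z) in lt for x in dom for y in dom for z in dom
                     if (x, y) in lt and (y, z) in lt)
    total = all(x == y or (x, y) in lt or (y, x) in lt for x in dom for y in dom)
    return irreflexive and transitive and total


def no_maximal_element(dom: range, lt: FrozenSet[Pair]) -> bool:
    return all(any((x, y) in lt for y in dom) for x in dom)


print(find_finite_model(strict_total_order, 3))  # (1, frozenset())
print(find_finite_model(lambda d, lt: strict_total_order(d, lt) and no_maximal_element(d, lt), 3))  # None
```

The second search anticipates the next slide: a strict total order with no maximal element has no finite model, so the unbounded search would never stop on it.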

An Example Where Finite/Infinite Differ

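A formula that is satisfiable but not finitely satisfiable:
  – “< is a total order and has no maximal element”

$\forall x.\neg(x < x) \wedge \forall x.\forall y.\forall z.(x < y \wedge y < z \rightarrow x < z) \wedge \forall x.\forall y.(x < y \vee x = y \vee y < x) \wedge \forall x.\exists y.(x < y)$

• It has an infinite model, but no finite one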

Applications of Trakhtenbrot’s Theorem

• Given a FO query $\varphi$, it is undecidable whether $\varphi$ is safe
  – Proof: the query $\varphi \wedge \neg R(x)$ is unsafe iff $\varphi$ is finitely satisfiable

• Given two FO queries $\varphi$, $\varphi'$, it is undecidable whether they are equivalent, i.e. $\varphi \equiv \varphi'$
  – Proof: the queries $\varphi$ and $R(x) \wedge \neg R(x)$ are equivalent iff $\varphi$ is not finitely satisfiable

• Trakhtenbrot’s theorem for FO queries = like Rice’s theorem for programs

More of That

• Definition. A query q is monotone if, for any two finite models $\mathbf{D} = (D, R_1, \ldots, R_k)$ and $\mathbf{D}' = (D', R'_1, \ldots, R'_k)$ such that $D \subseteq D'$, $R_1 \subseteq R'_1, \ldots, R_k \subseteq R'_k$, we have $q(\mathbf{D}) \subseteq q(\mathbf{D}')$.

• Proposition. It is undecidable whether a query q in FO is monotone.

• Proof: why?
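Consistent with the proposition, no program can decide monotonicity for all FO queries, but the definition can be tested on one pair of models at a time: a failing pair is a counterexample, a passing pair proves nothing. A sketch, assuming models with a single binary relation; `no_outgoing_edge`, `path2` and `monotone_on` are hypothetical names.

```python
from typing import FrozenSet, Tuple

Model = Tuple[FrozenSet[int], FrozenSet[Tuple[int, int]]]


def no_outgoing_edge(m: Model) -> FrozenSet[Tuple[int]]:
    """q(x) = ¬∃y.R(x, y), evaluated over the domain D."""
    d, r = m
    return frozenset((x,) for x in d if not any((x, y) in r for y in d))


def path2(m: Model) -> FrozenSet[Tuple[int, int]]:
    """q(x, z) = ∃y.(R(x, y) ∧ R(y, z))."""
    _, r = m
    return frozenset((x, z) for (x, y1) in r for (y2, z) in r if y1 == y2)


def monotone_on(q, small: Model, big: Model) -> bool:
    """Check q(D) ⊆ q(D') for one pair with D ⊆ D' and R ⊆ R'."""
    (d1, r1), (d2, r2) = small, big
    assert d1 <= d2 and r1 <= r2, "small must be contained in big"
    return q(small) <= q(big)


small: Model = (frozenset({1, 2}), frozenset({(1, 2)}))
big: Model = (frozenset({1, 2}), frozenset({(1, 2), (2, 1)}))
print(monotone_on(no_outgoing_edge, small, big))  # False: {(2,)} is not a subset of {}
print(monotone_on(path2, small, big))             # True
```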

Complexity of Query Languages

• All queries in a query language L are computable

• The converse is false: usually L does not express all computable queries (limited expressive power)

• Why do we care about such languages?
  – Typically, queries always terminate (e.g. FO)
  – Typically, queries have a low complexity (next)

Complexity of Query Languages

For a query language L, define:

• Data complexity: fix a query q, how complex is it to evaluate q(D), for finite models D.

• Expression complexity: fix a finite model D, how complex is it to evaluate q(D), for queries q in L

• Combined complexity: how complex is it to evaluate q(D), for finite models D and queries q in L

Complexity of Query Languages

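Formally:

• Data complexity of L is the complexity of deciding the set
  $S_q = \{(\mathbf{D}, a) \mid \mathbf{D} \text{ finite model, and } a \in q(\mathbf{D})\}$
  for some q in L

• Combined complexity of L is the complexity of deciding the set
  $S_L = \{(\mathbf{D}, a, q) \mid \mathbf{D} \text{ finite model, } a \in q(\mathbf{D}), \text{ and } q \in L\}$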
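A naive evaluator makes the distinction concrete. Assuming formulas encoded as nested tuples (a hypothetical syntax covering only relation atoms, ¬, ∧ and ∃), `in_S_L` below decides membership in $S_L$; fixing the formula turns it into a decider for $S_q$. Each ∃ adds a loop over the domain, so a formula with k nested quantifiers costs roughly $|D|^k$ atom checks: polynomial in |D| when q is fixed, while the exponent grows with the query when q is part of the input.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical formula syntax, as nested tuples:
#   ("rel", "R", "x", "y")   R(x, y)
#   ("not", f)               ¬f
#   ("and", f, g)            f ∧ g
#   ("exists", "y", f)       ∃y.f
Relations = Dict[str, FrozenSet[Tuple]]


def holds(phi, domain, rels: Relations, env: Dict[str, object]) -> bool:
    """Naive evaluation of D |= phi[env]; each ∃ loops over the whole domain D."""
    op = phi[0]
    if op == "rel":
        _, name, *variables = phi
        return tuple(env[v] for v in variables) in rels[name]
    if op == "not":
        return not holds(phi[1], domain, rels, env)
    if op == "and":
        return holds(phi[1], domain, rels, env) and holds(phi[2], domain, rels, env)
    if op == "exists":
        _, var, body = phi
        return any(holds(body, domain, rels, {**env, var: d}) for d in domain)
    raise ValueError(f"unknown connective: {op!r}")


def in_S_L(domain, rels: Relations, a: Tuple, free_vars: Tuple[str, ...], phi) -> bool:
    """Decide (D, a, q) ∈ S_L, where q is given by phi and its free variables."""
    return holds(phi, domain, rels, dict(zip(free_vars, a)))


# Fixing q(x, z) = ∃y.(R(x, y) ∧ R(y, z)) turns in_S_L into a decider for S_q.
PATH2 = ("exists", "y", ("and", ("rel", "R", "x", "y"), ("rel", "R", "y", "z")))
D = {1, 2, 3}
rels = {"R": frozenset({(1, 2), (2, 3)})}
print(in_S_L(D, rels, (1, 3), ("x", "z"), PATH2))  # True
print(in_S_L(D, rels, (1, 2), ("x", "z"), PATH2))  # False
```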

Who Cares About What

• Users: care about data complexity
  – the query q is fixed; the database D is variable

• Database systems: care about combined complexity
  – both the query q and the database D are variable

• Database theoreticians:
  – care about expression complexity, when they need to publish more papers

Crash Course in Complexity Classes

• Fix a problem, i.e. a set S. Given a value x, how difficult is it for a Turing Machine to decide whether $x \in S$?

[Figure: a Turing machine with a finite control reading a tape (a b c b c d ...); the tape initially holds an encoding of x]

• Let n = |x|

• Definition. S is in PTIME if there exists a Turing machine for S that on every input x takes $n^{O(1)}$ steps (i.e. $O(n^k)$, for some k > 0).

• Definition. S is in PSPACE if there exists a Turing machine for S that on every input x takes $n^{O(1)}$ space. Note: it may take A LOT of time.

• Definition. S is in LOGSPACE if there exists a Turing machine for S that on every input takes O(log n) space. OOPS!?!