relational model - homepages.inf.ed.ac.ukhomepages.inf.ed.ac.uk/apieris/courses/atfd2020/relational...

Post on 17-May-2020

2 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

RelationalModel

AdvancedTopicsinFoundationsofDatabases,UniversityofEdinburgh,2019/20

RelationalModel

• Manyadhocmodelsbefore1970

− Hardtoworkwith

− Hardtoreasonabout

• 1970:RelationalModelbyEdgarFrankCodd

− Dataarestoredinrelations (ortables)

− Queriedusingadeclarativelanguage

− DBMSconvertsdeclarativequeriesintoproceduralqueries thatare

optimizedandexecuted

• KeyAdvantages

− Simpleandcleanmathematicalmodel(basedonlogic)

− Separationofdeclarativeandprocedural

RelationalDatabases

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

Constants

VIE,LHR,…

BA,U2,…

Vienna,London,…

RelationalDatabases

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

Relations

Constants

VIE,LHR,…

BA,U2,…

Vienna,London,…

RelationalDatabases

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

Constants

VIE,LHR,…

BA,U2,…

Vienna,London,…

Relations

Tuples

Relationalatoms

Flight(LHR,EDI,BA)

Airport(LGW,London)

Querying:RelationalAlgebra

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

Listalltheairlines

Querying:RelationalAlgebra

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

Listalltheairlines

πairline Flight

{BA,U2,OS}

Querying:RelationalAlgebra

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

ListthecodesoftheairportsinLondon

Querying:RelationalAlgebra

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

ListthecodesoftheairportsinLondon

πcode (σcity=‘London’Airport)

{LHR,LGW}

Querying:RelationalAlgebra

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

ListtheairlinesthatflydirectlyfromLondontoGlasgow

Querying:RelationalAlgebra

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

ListtheairlinesthatflydirectlyfromLondontoGlasgow

πairline ((Flight⋈origin=code(σcity=‘London’ Airport))⋈destination=code(σcity=‘Glasgow’Airport))

Querying:RelationalAlgebra

Aux origin destination airline code city code city

LGW GLA U2 LGW London GLA Glasgow

ListtheairlinesthatflydirectlyfromLondontoGlasgow

πairline ((Flight⋈origin=code(σcity=‘London’ Airport))⋈destination=code(σcity=‘Glasgow’Airport))

{U2}

definestheauxiliaryrelationAux

• Selection:σ

• Projection:π

• Crossproduct:×

• Naturaljoin:⋈

• Rename:ρ

• Difference:∖

• Union:∪

• Intersection:∩

RelationalAlgebra

inboldaretheprimitiveoperators

Formaldefinitionscanbefoundinanydatabasetextbook

Querying:DomainRelationalCalculus

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

Listalltheairlines

{z|∃x∃yFlight(x,y,z)}

Querying:DomainRelationalCalculus

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

Listalltheairlines

{BA,U2,OS}

Querying:DomainRelationalCalculus

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

ListthecodesoftheairportsinLondon

{x|∃yAirport(x,y) ∧ y=London}

Querying:DomainRelationalCalculus

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

ListthecodesoftheairportsinLondon

{LHR,LGW}

Querying:DomainRelationalCalculus

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

ListtheairlinesthatflydirectlyfromLondontoGlasgow

{z|∃x∃y∃u∃v Airport(x,u) ∧ u=London∧ Airport(x,u) ∧ u=London}∧ Flight(x,y,z)}

Querying:DomainRelationalCalculus

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

ListtheairlinesthatflydirectlyfromLondontoGlasgow

{U2}

{x|∀yR(x,y)}{x|¬R(x)}{x,y |R(x)∨ R(y)}

DomainRelationalCalculus

{x1,…,xk |φ}

first-orderformulawith

freevariables{x1,…,xk}

But,wecanexpress“problematic”queries,i.e.,dependonthedomain

DomainRelationalCalculus

domain ={1,2,3}

D ={R(1,1),R(1,2)}Ans={}

{x1,…,xk |φ}

first-orderformulawith

freevariables{x1,…,xk}

But,wecanexpress“problematic”queries,i.e.,dependonthedomain

{x|∀yR(x,y)}{x|¬R(x)}{x,y |R(x)∨ R(y)}

DomainRelationalCalculus

domain ={1,2}

D ={R(1,1),R(1,2)}Ans={1}

{x1,…,xk |φ}

first-orderformulawith

freevariables{x1,…,xk}

But,wecanexpress“problematic”queries,i.e.,dependonthedomain

{x|∀yR(x,y)}{x|¬R(x)}{x,y |R(x)∨ R(y)}

DomainRelationalCalculus

…thus,weadopttheactivedomainsemantics - quantifiedvariablesrangeover

theactivedomain,i.e.,theconstantsoccurringintheinputdatabase

{x1,…,xk |φ}

first-orderformulawith

freevariables{x1,…,xk}

But,wecanexpress“problematic”queries,i.e.,dependonthedomain

{x|∀yR(x,y)}{x|¬R(x)}{x,y |R(x)∨ R(y)}

Algebra=Calculus

Afundamentaltheorem(assumingtheactivedomainsemantics):

Theorem: Thefollowingquerylangaugesareequallyexpressive

• RelationalAlgebra(RA)

• DomainRelationalCalculus(DRC)

• TupleRelationalCalculus(TRC)

Note:Tuplerelationalcalculus isthedeclarativelanguageintroducebyCodd.Domainrelational

calculushasbeenintroducedlaterasaformalismclosertofirst-orderlogic

{|∃x∃y∃z∃w∃vAirport(x,Vienna) ∧ Airport(y,Glasgow) ∧

Flight(x,z,w)∧ Flight(z,y,v)

Quiz!

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

IsGlasgowreachablefromVienna?

London

Vienna

Larnaca

Glasgow

Edinburgh

YES

Quiz!

London

Vienna

Larnaca

Glasgow

Edinburgh

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

IsGlasgowreachablefromVienna?

NO

{|∃x∃y∃z∃w∃vAirport(x,Vienna) ∧ Airport(y,Glasgow) ∧

Flight(x,z,w)∧ Flight(z,y,v)

{|∃x∃y∃z∃w∃vAirport(x,Vienna) ∧ Airport(y,Glasgow) ∧

Flight(x,z,w)∧ Flight(z,y,v)

Quiz!

London

Vienna

Larnaca

Glasgow

Edinburgh

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

IsGlasgowreachablefromVienna?

YES∧ Flight(z,z1,w1) ∧

∃z1∃w1

Quiz!

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

IsGlasgowreachablefromVienna?

NO

London

Vienna

Larnaca

Glasgow

Edinburgh

{|∃x∃y∃z∃w∃vAirport(x,Vienna) ∧ Airport(y,Glasgow) ∧

Flight(x,z,w)∧ Flight(z,y,v)

∧ Flight(z,z1,w1) ∧

∃z1∃w1

Quiz!

Flight origin destination airline

VIE LHR BA

LHR EDI BA

LGW GLA U2

LCA VIE OS

Airport code city

VIE Vienna

LHR London

LGW London

LCA Larnaca

GLA Glasgow

EDI Edinburgh

IsGlasgowreachablefromVienna?

London

Vienna

Larnaca

Glasgow

Edinburgh

Recursivequery- notexpressibleinRA/DRC/TRC

(unlessweboundthenumberofintermediatestops)

ComplexityofQueryLanguages

• Thegoalistounderstandthecomplexityofevaluatingaqueryoveradatabase

• Ourmaintechnicaltooliscomplexitytheory

• Whattomeasure?Queriesmayhavealargeoutput,anditwouldbeunfairto

counttheoutputas“complexity”

• Wethereforeconsiderthefollowingdecisionproblems:

− QueryOutputTuple(QOT)

− BooleanQueryEvaluation(BQE)

AFewWordsonComplexityTheory

detailscanbefoundinthestandardtextbooks

seealsonotesonthewebpageofthecourse

ComplexityClasses

TIME(f(n)) = {Π |ΠisdecidedbysomeDTMintimeO(f(n))}

NTIME(f(n)) = {Π |ΠisdecidedbysomeNTMintimeO(f(n))}

SPACE(f(n)) = {Π |ΠisdecidedbysomeDTMusingspaceO(f(n))}

NSPACE(f(n)) = {Π |ΠisdecidedbysomeNTMusingspaceO(f(n))}

PTIME = ⋃k>0 TIME(nk)

NP = ⋃k>0 NTIME(nk)

EXPTIME = ⋃k>0 TIME(2nk)

NEXPTIME = ⋃k>0 NTIME(2nk)

LOGSPACE = SPACE(logn)

NLOGSPACE = NSPACE(logn)

PSPACE = ⋃k>0 SPACE(nk)

EXPSPACE = ⋃k>0 SPACE(2nk)

ComplexityClasses

• Wecannowrecallthestandardtimeandspacecomplexityclasses:

thesedefinitionsarerelyingontwo-

tapeTuringmachineswitharead-

onlyandaread/writetape

• ForeverycomplexityclassCwecandefineitscomplementaryclasscoC

RelationshipAmongComplexityClasses

LOGSPACE⊆ NLOGSPACE⊆ PTIME⊆ NP,coNP ⊆

PSPACE⊆ EXPTIME⊆ NEXPTIME,coNEXPTIME ⊆⋯

Someusefulnotes:

• ForadeterministiccomplexityclassC,coC =C

• coNLOGSPACE =NLOGSPACE

• ItisgenerallybelievedthatPTIME≠ NP,butwedon’tknow

• PTIME⊂ EXPTIME⇒ atleastonecontainmentbetweenthemisstrict

• PSPACE=NPSPACE,EXPSPACE=NEXPSPACE,etc.

• But,wedon’tknowwhetherLOGSPACE=NLOGSPACE

CompleteProblems

• Thesearethehardestproblemsinacomplexityclass

• AproblemthatiscompleteforaclassC,itisunlikelytobelonginalowerclass

• AproblemΠiscomplete foracomplexityclassC,orsimplyC-complete,if:

1. Π ∈ C

2. ΠisC-hard,i.e.,everyproblemΠ’∈ Ccanbeefficientlyreduced toΠ

• ToshowthatΠisC-harditsufficestoreducesomeC-hardproblemΠ’toit

thereexistsalogspace algorithmthatcomputesafunctionfsuchthat

w ∈ Π’ iff f(w) ∈ Π - inthiscasewewriteΠ’≤L Π

SomeCompleteProblems

• NP-complete

‒ SAT(satisfiabilityofpropositionalformulas)

‒ Manygraph-theoreticproblems(e.g.,3-colorability)

‒ Travelingsalesman

‒ etc.

• PSPACE-complete

‒ QuantifiedSAT(orsimplyQSAT)

‒ Equivalenceoftworegularexpressions

‒ Manygames(e.g.,Geography)

‒ etc.

BacktoQueryLanguages

ComplexityofQueryLanguages

• Thegoalistounderstandthecomplexityofevaluatingaqueryoveradatabase

• Ourmaintechnicaltooliscomplexitytheory

• Whattomeasure?Queriesmayhavealargeoutput,anditwouldbeunfairto

counttheoutputas“complexity”

• Wethereforeconsiderthefollowingdecisionproblems:

− QueryOutputTuple(QOT)

− BooleanQueryEvaluation(BQE)

ComplexityofQueryLanguages

QOT(L)

Input:adatabaseD,aqueryQ/k ∈ L,atupleofconstantst∈ adom(D)k

Question: t ∈ Q(D)?

Someusefulnotation:

• GivenadatabaseD,andaqueryQ,Q(D)istheanswer toQ overD

• adom(D)istheactivedomain ofD - theconstantsoccurringinD

• WewriteQ/k forthefactthatthearity ofQ isk≥ 0

L issomequerylanguage;forexample,RA,DRC,etc. - wewillseeseveralquerylanguages

ComplexityofQueryLanguages

BQE(L)

Input:adatabaseD,aBooleanqueryQ ∈ L

Question: isQ(D)non-empty?

Someusefulnotation:

• GivenadatabaseD,andaqueryQ,Q(D)istheanswer toQ overD

• adom(D)istheactivedomain ofD - theconstantsoccurringinD

• WewriteQ/k forthefactthatthearity ofQ isk≥ 0

L issomequerylanguage;forexample,RA,DRC,etc. - wewillseeseveralquerylanguages

(≡Lmeanslogspace-equivalent)

Theorem: QOT(L)≡L BQE(L),whereL ∈ {RA,DRC,TRC}

BQE(L)

Input:adatabaseD,aBooleanqueryQ ∈ L

Question: isQ(D)non-empty?

QOT(L)

Input:adatabaseD,aqueryQ/k ∈ L,atupleofconstantst∈ adom(D)k

Question: t ∈ Q(D)?

ComplexityofQueryLanguages

Theorem: QOT(DRC)≡L BQE(DRC)

Proof: (≤L)ConsideradatabaseD,ak-ary queryQ ={x1,…,xk |φ},andatuple(t1,…,tk)

LetQbool ={|∃x1⋯∃xk (φ ∧ x1=t1 ∧ x2=t2 ∧⋯ ∧ xk=tk)}

Clearly,(t1,…,tk) ∈ Q(D) iff Qbool (D)isnon-empty

(≥L)Trivial- aBooleandomainRCqueryisadomainRCquery

ComplexityofQueryLanguages

(letusshowthisfordomainrelationalcalculus)

BQE[D](L)

Input:aBooleanqueryQ ∈ L

Question: isQ(D)non-empty?

ComplexityMeasures

• Combinedcomplexity - bothD andQ arepartoftheinput

• Querycomplexity - fixedD,inputQ

• Datacomplexity - inputD,fixedQ

BQE[Q](L)

Input:adatabaseD

Question: isQ(D)non-empty?

Theorem: ForL ∈ {RA,DRC,TRC}thefollowinghold:

• BQE(L)isPSPACE-complete(combinedcomplexity)

• BQE[D](L)isPSPACE-complete,forafixeddatabaseD(querycomplexity)

• BQE[Q](L)isinLOGSPACE,forafixedqueryQ ∈ L (datacomplexity)

ComplexityofRA,DRC,TRC

Proof hints:

• Recursive algorithm that uses polynomial space in Q and logarithmic space in D

• Reduction from QSAT (a standardPSPACE-hardproblem)

Evaluating(Boolean)DRCQueries

Eval(D,φ) - forbrevitywewriteφ insteadof{|φ}

• Ifφ= R(t1,…,tk), then YESiff R(t1,…,tk) ∈ D

• Ifφ= ψ1∧ψ2,then YESiff Eval(D,ψ1) =YESandEval(D,ψ2) =YES

• Ifφ= ¬ψ,thenNOiff Eval(D,ψ) =YES

• Ifφ= ∃xψ(x),then YESiff forsomet∈ adom(D),Eval(D,ψ(t)) =YES

Lemma: It holds that

• Eval(D,φ) always terminates - this is trivial

• Eval(D,φ) = YES iff Q(D) is non-empty, whereQ = { | φ}

• Eval(D,φ) uses O(|φ| ⋅ log |φ| + |φ|2 ⋅ log |D|) space

Theorem: ForL ∈ {RA,DRC,TRC}thefollowinghold:

• BQE(L)isPSPACE-complete(combinedcomplexity)

• BQE[D](L)isPSPACE-complete,forafixeddatabaseD(querycomplexity)

• BQE[Q](L)isinLOGSPACE,forafixedqueryQ ∈ L (datacomplexity)

ComplexityofRA,DRC,TRC

Proof hints:

• Recursive algorithm that uses polynomial space in Q and logarithmic space in D

• Reduction from QSAT (a standardPSPACE-hardproblem)

EQUIV(L)

Input:twoqueriesQ1 ∈ Land Q2 ∈ L

Question:Q1≡Q2? orQ1(D)=Q2(D)forevery(finite)databaseD?

SAT(L)

Input:aqueryQ ∈ L

Question: istherea(finite)databaseD suchthatQ(D)isnon-empty?

OtherImportantAlgorithmicProblems

CONT(L)

Input:twoqueriesQ1 ∈ Land Q2 ∈ L

Question:Q1⊆Q2? orQ1(D)⊆Q2(D)forevery(finite)databaseD?

EQUIV(L)

Input:twoqueriesQ1 ∈ Land Q2 ∈ L

Question:Q1≡Q2? orQ1(D)=Q2(D)forevery(finite)databaseD?

SAT(L)

Input:aqueryQ∈ L

Question: istherea(finite)databaseDsuchthatQ(D)isnon-empty?

OtherImportantAlgorithmicProblems

CONT(L)

Input:twoqueriesQ1 ∈ Land Q2 ∈ L

Question:Q1⊆Q2? orQ1(D)⊆Q2(D)forevery(finite)databaseD?

theseproblemsareimportant

foroptimizationpurposes

SAT(L)

Input:aqueryQ ∈ L

Question: istherea(finite)databaseD suchthatQ(D)isnon-empty?

OtherImportantAlgorithmicProblems

• If the answer is no, then the input query Q makes no sense

• Query evaluation becomes trivial - the answer is always NO!

EQUIV(L)

Input:twoqueriesQ1 ∈ Land Q2 ∈ L

Question:Q1≡Q2? orQ1(D)=Q2(D)forevery(finite)databaseD?

OtherImportantAlgorithmicProblems

• Replace a query Q1 with a query Q2 that is easier to evaluate

• But, we have to be sure that Q1(D) = Q2(D) for every database D

• Approximate a query Q1 with a query Q2 that is easier to evaluate

• But, we have to be sure that Q1(D)⊆Q2(D) for every database D

CONT(L)

Input:twoqueriesQ1 ∈ Land Q2 ∈ L

Question:Q1⊆Q2? orQ1(D)⊆Q2(D)forevery(finite)databaseD?

OtherImportantAlgorithmicProblems

Theorem: ForL ∈ {RA,DRC,TRC},SAT(L)isundecidable

Proof hint: By reduction from the halting problem.

Given a Turing machine M, we can construct a query QM ∈ L such that:

Mhaltsontheemptystringiff thereexistsadatabaseD suchthatQ(D)isnon-empty

SATisUndecidable

Note:Actually,thisresultgoesbacktothe1950when

BorisA.Trakhtenbrotprovedthattheproblemofdeciding

whetherafirst-ordersentencehasafinitemodelisundecidable

AneasyconsequenceofthefactthatSATisundecidableisthat:

Theorem: ForL ∈ {RA,DRC,TRC},EQUIV(L)andCONT(L)areundecidable

Proof: By reduction from the complement of SAT(L)

• Consider a query Q ∈ L - i.e., an instance of SAT(L)

• LetQ’ beaquerythatisunsatisfiable,i.e.,Q’(D)isemptyforeveryD

• Forexample,whenL =DRC,Q’ canbethequery{|∃xR(x)∧¬R(x)}

• Clearly,Qisunsatisfiable iff Q≡Q’(orevenQ⊆Q’)

EQUIVandCONTareUndecidable

Recap

• The main languages for querying relational databases are:

− Relational Algebra (RA)

− Domain Relational Calcuclus (DRC)

− Tuple Relational Calculus (TRC)

• Evaluation is decidable, and highly tractable indata complexity

− Foundations of the database industry

− The core of SQL is equally expressive to RA/DRC/TRC

RA =DRC =TRC

(undertheactivedomainsemantics)

• Satisfiability, equivalence and containment are undecidable

− Perfect query optimization is impossible

top related