compsci516 data intensive computing systems - cs.duke.edu · duke cs, fall 2016 compsci 516: data...

69
CompSci 516 Data Intensive Computing Systems Lecture 21 Datalog Instructor: Sudeepa Roy 1 Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems

Upload: others

Post on 17-Jun-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

CompSci 516DataIntensiveComputingSystems

Lecture21Datalog

Instructor:Sudeepa Roy

1DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

Page 2: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Announcement

• HW3duenextWednesday:11/16

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 2

Page 3: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Today

• Datalog– forrecursion indatabasequeries

• AquicklookatIncrementalViewMaintenance(IVM)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 3

Page 4: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

ReadingMaterial:DatalogOptional:1. Thedatalog chaptersinthe“AliceBook”FoundationsofDatabasesAbiteboul-Hull-VianuAvailableonline:http://webdam.inria.fr/Alice/

2.Datalog tutorialSIGMOD2011“Datalog andEmergingApplications:AnInteractiveTutorial”

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 4

Page 5: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

BriefHistoryofDatalog

• MotivatedbyProlog– startedbackin1970-80’s– thenquietforalongtime

• AlongargumentintheDatabasecommunitywhetherrecursionshouldbesupportedinquerylanguages– “Nopracticalapplicationsofrecursivequerytheory...havebeenfoundto

date”—MichaelStonebraker,1998ReadingsinDatabaseSystems,3rdEditionStonebraker andHellerstein,eds.

– RecentworkbyHellerstein etal.onDatalog-extensionstobuildnetworkingprotocolsanddistributedsystems.[Link]

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 5

Page 6: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Datalog isresurging!

• NumberofpapersandtutorialsinDBconferences

• Applicationsin– dataintegration,declarativenetworking,programanalysis,information

extraction,networkmonitoring,security,andcloudcomputing

• Systemssupportingdatalog inbothacademiaandindustry:– Lixto (informationextraction)– LogicBlox (enterprisedecisionautomation)– Semmle (programanalysis)– BOOM/Dedalus (Berlekey)– Coral– LDL++

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 6

Page 7: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

RecallourdrinkerexampleinRC(Lecture4)

Find drinkers that frequent some bar that serves some beer they like.

Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

7CompSci516:DataIntensiveComputingSystems

DukeCS,Fall2016

DrinkerexampleisfromslidesbyProfs.Balazinska andSuciuandthe[GUW]book

Page 8: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

WriteitasaDatalog RuleFind drinkers that frequent some bar that serves some beer they like.

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

8CompSci516:DataIntensiveComputingSystems

DukeCS,Fall2016

RC:Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

Datalog:Q(x) :- Frequents(x, y), Serves(y,z), Likes(x,z)

Page 9: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

WriteitasaDatalog RuleFind drinkers that frequent some bar that serves some beer they like.

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

9CompSci516:DataIntensiveComputingSystems

DukeCS,Fall2016

Datalog:Q(x) :- Frequents(x, y), Serves(y,z), Likes(x,z)

RC:Q(x) = $y. $z. Frequents(x, y)∧Serves(y,z)∧Likes(x,z)

• Quickdifferences:– Uses“:-”not=– noneedfor$ (assumedbydefault)– Use“,”ontherighthandside(RHS)– AnythingonRHStheof:- isassumedtobecombinedwith∧ bydefault– ",Þ,notallowed– theyneedtousenegation¬– Standard“Datalog”doesnotallownegation– Negationallowedindatalog withnegation

• Howtospecifydisjunction(OR/⋁)?

Page 10: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example:ORinDatalogFind drinkers that (a) either frequent some bar that serves some beer they like, (b) or like beer “BestBeer”

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

10CompSci516:DataIntensiveComputingSystems

DukeCS,Fall2016

Datalog:Q(x) :- Frequents(x, y), Serves(y,z), Likes(x,z)Q(x) :- Likes(x, “BestBeer”)

RC:Q(x)=[$y.$z.Frequents(x,y)∧Serves(y,z)∧Likes(x,z)]⋁ [Likes(x,“BestBeer”)]

Page 11: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example:ORinDatalogFind drinkers that (a) either frequent some bar that serves some beer they like, (b) or like beer “BestBeer”, (c) or, frequent bars that “Joe” frequents

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

11CompSci516:DataIntensiveComputingSystems

DukeCS,Fall2016

Datalog:JoeFrequents(w):- Frequents(“Joe”,w)Q(x):- Frequents(x,y),Serves(y,z),Likes(x,z)Q(x):- Likes(x,“BestBeer”)Q(x):- Frequents(x,w),JoeFrequents(w)

RC:Q(x)=[$y.$z.Frequents(x,y)∧Serves(y,z)∧Likes(x,z)]⋁ [Likes(x,“BestBeer”)]

⋁ [$w Frequents(x,w)∧ Frequents(“Joe”,w)]

• Tospecify“OR”,writemultipleruleswiththesame“Head”• Next:terminologyforDatalog

Page 12: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

• Each rule is of the form Head :- Body

• Each variable in the head of each rule must appear in the body of the rule

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

12CompSci516:DataIntensiveComputingSystems

DukeCS,Fall2016

JoeFrequents(w):- Frequents(“Joe”,w)Q(x):- Frequents(x,y),Serves(y,z),Likes(x,z)Q(x):- Likes(x,“BestBeer”)Q(x):- Frequents(x,w),JoeFrequents(w)

Fourrules

BodyHead

Datalog Rules

Atom

Variable

Page 13: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

EDBsandIDBs

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

13

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

• Intensional DataBases (IDBs)– Relationsthatarederived– Canbeintermediateorfinaloutputtables– e.g.JoeFrequents,Q– CanbeontheLHSorRHS(e.g.JoeFrequents)

• ExtensionalDataBases (EDBs)– Inputrelationnames– e.g.Likes,Frequents,Serves– canonlybeontheRHSofarule

JoeFrequents(w):- Frequents(“Joe”,w)Q(x):- Frequents(x,y),Serves(y,z),Likes(x,z)Q(x):- Likes(x,“BestBeer”)Q(x):- Frequents(x,w),JoeFrequents(w)

TupleinanEDBoranIDB:aFACT

eitherbelongstoagivenEDBrelation,orisderivedinanIDB relation

Page 14: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

GraphExample

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

14

a

b

c

d e

V1 V2

a c

b a

b d

c d

d a

d e

E(edgerelation)

Page 15: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example1

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

15

E(edgerelation)

WriteaDatalog programtofindpathsoflengthtwo(outputstartandfinishvertices)

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e

Page 16: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example1

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

16

E(edgerelation)

WriteaDatalog programtofindpathsoflengthtwo(outputstartandfinishvertices)

P2(x,y):- E(x,z),E(z,y)

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e

Page 17: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example1:Execution

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

17

E(edgerelation)

WriteaDatalog programtofindpathsoflengthtwo(outputstartandfinishvertices)

P2(x,y):- E(x,z),E(z,y)

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e

V1 V2

a db cb ec ac ed c

P2

sameasE⨝E.V2=E.V1 E

Page 18: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example2

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

18

E(edgerelation)

WriteaDatalog programtofindallpairsofvertices(u,v)suchthatvisreachablefromu

• CanyouwriteaSQL/RA/RCqueryforreachability?

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e

Page 19: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example2

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

19

E(edgerelation)

• CanyouwriteaSQL/RA/RCqueryforreachability?• NO- SQL/RA/RCcannotexpressreachability

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e WriteaDatalog programtofindallpairsofvertices(u,v)suchthatvisreachablefromu

Page 20: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example2

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

20

E(edgerelation)

R(x,y):- E(x,y)R(x,y):- E(x,z),R(z,y)

Option1

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e WriteaDatalog programtofindallpairsofvertices(u,v)suchthatvisreachablefromu

Page 21: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example2

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

21

E(edgerelation)

R(x,y):- E(x,y)R(x,y):- E(x,z),R(z,y)

Option1 R(x,y):- E(x,y)R(x,y):- R(x,z),E(z,y)

Option2

R(x,y):- E(x,y)R(x,y):- R(x,z),R(z,y)

Option3

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e WriteaDatalog programtofindallpairsofvertices(u,v)suchthatvisreachablefromu

linear

non-linear

Page 22: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

LinearDatalog

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

22

• Linearrule– atmostoneatominthebodythatisrecursivewiththeheadoftherule

– e.g.R(x,y):- E(x,z),R(z,y)• Lineardatalog program

– ifallrulesarelinear– likelinearrecursion

• Top-downandbottom-upevaluationarepossible– wewillfocusonbottom-up

Page 23: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example2:Execution

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

23

E

R(x,y):- E(x,y)R(x,y):- E(x,z),R(z,y)

Option1

V1 V2

a cb ab dc dd ad e

Iteration1 R=E

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e

(verticesreachablein1-hopbyadirectedge)

Page 24: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example2:Execution

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

24

E

R(x,y):- E(x,y)R(x,y):- E(x,z),R(z,y)

Option1

V1 V2

a cb ab dc dd ad ea db cb ec ac ed c

RIteration2

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e

(verticesreachablein2-hops)

Page 25: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example2:Execution

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

25

E

R(x,y):- E(x,y)R(x,y):- E(x,z),R(z,y)

Option1

V1 V2

a cb ab dc dd ad ea db cb ec ac ed ca ea ac cd d

RIteration3

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e

(verticesreachablein3-hops)

Page 26: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Example2:Execution

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

26

E

R(x,y):- E(x,y)R(x,y):- E(x,z),R(z,y)

Option1

V1 V2

a cb ab dc dd ad ea db cb ec ac ed ca ea ac cd d

RIteration4

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e

Runchanged- stop

Page 27: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Examples3and4

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

27

E(edgerelation)

WriteaDatalog programtofindallverticesreachablefromb

R(x,y):- E(x,y)R(x,y):- E(x,z),R(z,y)QB(y):- R(b,y)

V1 V2

a c

b a

b d

c d

d a

d e

a

b

c

d e

WriteaDatalog programtofindallverticesureachablefromthemselvesR(u,u)

R(x,y):- E(x,y)R(x,y):- E(x,z),R(z,y)Q(x):- R(x,x)

Page 28: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

TerminationofaDatalog Program

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

28

Q. ADatalog programalwaysterminates– why?

Page 29: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

TerminationofaDatalog Program

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

29

Q. ADatalog programalwaysterminates– why?

• Becausethevaluesofthevariablesarecomingfromthe“activedomain”intheinputrelations(EDBs)

• Activedomain=(finite)valuesfromthe(possiblyinfinite)domainappearingintheinstanceofadatabase

– e.g.agecanbeanyinteger(infinite),butactivedomainisonlyfinitelymanyinR(id,name,age)

• ThereforethenumberofpossiblevaluesineachoftheIDBsisfinite

• e.g.inthereachabilityexampleR(x,y),thevaluesofxandycomefrom{a,b,c,d,e}

– atmost5x5=25tuplespossibleintheIDBR(x,y)– inanyiteration,atleastonenewtupleisaddedinatleastoneIDB– Muststopafterfinitesteps– e.g.themaximumnumberofiterationinthereachabilityexampleforanygraphwith

fiveverticesis25(itwasonly4inourexample)

Page 30: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Bottom-upEvaluationofaDatalog Program

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

30

• Naïveevaluation

• Semi-naïveevaluation

Page 31: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Naïveevaluation- 1

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

31

V1 V2

a cb ab dc dd ad e

V1 V2

a c

b a

b d

c d

d a

d e

Iteration1:R=E=R1(say)

E

a

b

c

d e

Inallsubsequentiteration,checkifanyoftherulescanbeapplied

DounionofalltheruleswiththesameheadIDB

Page 32: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Naïveevaluation- 2

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

32

V1 V2

a cb ab dc dd ad e

V1 V2

a c

b a

b d

c d

d a

d e

V1 V2

a cb ab dc dd ad ea db cb ec ac ed c

Iteration1:R=E=R1(say)

E

a

b

c

d e

Iteration2:R=E∪

E⨝ R1=R2(say)

R1≠R2socontinue

Page 33: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Naïveevaluation- 3

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

33

V1 V2

a cb ab dc dd ad e

V1 V2

a c

b a

b d

c d

d a

d e

V1 V2

a cb ab dc dd ad ea db cb ec ac ed c

V1 V2

a cb ab dc dd ad ea db cb ec ac ed ca ea ac cd d

Iteration1:R=E=R1(say)

E

a

b

c

d e

Iteration2:R=E∪

E⨝ R1=R2(say)

R1≠R2socontinue

Iteration3:R=E∪

E⨝ R2=R3(say)

R2≠R3socontinue

Page 34: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Naïveevaluation- 4

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

34

V1 V2

a cb ab dc dd ad e

V1 V2

a c

b a

b d

c d

d a

d e

V1 V2

a cb ab dc dd ad ea db cb ec ac ed c

V1 V2

a cb ab dc dd ad ea db cb ec ac ed ca ea ac cd d

Iteration1:R=E=R1(say)

E

a

b

c

d e

Iteration2:R=E∪

E⨝ R1=R2(say)

R1≠R2socontinue

Iteration3:R=E∪

E⨝ R2=R3(say)

R2≠R3socontinue

Iteration4:R=E∪

E⨝ R3=R4(say)

R3=R4soSTOP

Page 35: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

ProblemwithNaïveEvaluation

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

35

• ThesameIDBfactsarediscoveredagainandagain– e.g.ineachiterationalledgesinEareincludedinR– Inthe2nd-4th iterations,thefirstsixtuplesinRarecomputed

repeatedly

• Solution:Semi-NaïveEvaluation

• Workonlywiththenewtuplesgeneratedinthepreviousiteration

Page 36: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Semi-Naïveevaluation- 1

DukeCS,Fall2016CompSci 516:DataIntensiveComputing

Systems 36

V1 V2

a cb ab dc dd ad e

V1 V2

a c

b a

b d

c d

d a

d e

Iteration1:R=E=R1(say)ΔR1=R1

E

a

b

c

d e

Initially:R=Φ

Page 37: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Semi-Naïveevaluation- 2

DukeCS,Fall2016CompSci 516:DataIntensiveComputing

Systems 37

V1 V2

a cb ab dc dd ad e

V1 V2

a c

b a

b d

c d

d a

d e

V1 V2

a cb ab dc dd ad ea db cb ec ac ed c

Iteration1:R=E=R1(say)ΔR1=R1

E

a

b

c

d e

Iteration2:R=R1∪

E⨝ ΔR1=R2(say)

ΔR2=R2– R1

ΔR2≠Φsocontinue

Initially:R=Φ

Page 38: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Semi-Naïveevaluation- 3

DukeCS,Fall2016CompSci 516:DataIntensiveComputing

Systems 38

V1 V2

a cb ab dc dd ad e

V1 V2

a c

b a

b d

c d

d a

d e

V1 V2

a cb ab dc dd ad ea db cb ec ac ed c

V1 V2

a cb ab dc dd ad ea db cb ec ac ed ca ea ac cd d

Iteration1:R=E=R1(say)ΔR1=R1

E

a

b

c

d e

Iteration2:R=R1∪

E⨝ ΔR1=R2(say)

ΔR2=R2– R1

ΔR2≠Φsocontinue

Iteration3:R=R2∪

E⨝ ΔR2=R3(say)

ΔR3=R3– R2

ΔR3≠Φsocontinue

Initially:R=Φ

Page 39: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Semi-Naïveevaluation- 4

DukeCS,Fall2016CompSci 516:DataIntensiveComputing

Systems 39

V1 V2

a cb ab dc dd ad e

V1 V2

a c

b a

b d

c d

d a

d e

V1 V2

a cb ab dc dd ad ea db cb ec ac ed c

V1 V2

a cb ab dc dd ad ea db cb ec ac ed ca ea ac cd d

Iteration1:R=E=R1(say)ΔR1=R1

E

a

b

c

d e

Iteration2:R=R1∪

E⨝ ΔR1=R2(say)

ΔR2=R2– R1

ΔR2≠Φsocontinue

Iteration3:R=R2∪

E⨝ ΔR2=R3(say)

ΔR3=R3– R2

ΔR3≠Φsocontinue

Iteration4:R=R3∪E⨝ ΔR3=R4(say)

ΔR4=R4– R3ΔR=Φ(CHECKJ)soSTOP

Initially:R=Φ

Page 40: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

IncrementalViewMaintenance(IVM)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

40

• Whydidthesemi-naïvealgorithmwork?• BecauseofthegenerictechniqueofIncrementalView

Maintenance(IVM)

• Supposeyouhave– adatabaseD=(R1,R2,R3)– aqueryQthatgivesanswerQ(D)– D=(R1,R2,R3)getsupdatedtoD’=(R1’,R2’,R3’)– e.g.R1’=R1∪ ΔR1(insertion),R2’=R2- ΔR1(deletion)etc.

Page 41: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

IncrementalViewMaintenance(IVM)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

41

• Whydidthesemi-naïvealgorithmwork?• BecauseofthegenerictechniqueofIncrementalView

Maintenance(IVM)

• Supposeyouhave– adatabaseD=(R1,R2,R3)– aqueryQthatgivesanswerQ(D)– D=(R1,R2,R3)getsupdatedtoD’=(R1’,R2’,R3’)– e.g.R1’=R1∪ ΔR1(insertion),R2’=R2- ΔR1(deletion)etc.

• IVM: CanyoucomputeQ(D’)usingQ(D)andΔR1,ΔR2,ΔR3withoutcomputingitfromscratch(i.e.donotrerunthequeryQ)?

Page 42: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

IVM Example:Selection

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

42

V1 V2

a cb ad ac d

V1 V2

a cb ad ac db dd e

σV1=bR

R

V1 V2

b a

R’=R∪ ΔR

ΔR

σV1=bR’

V1 V2

b ab d

V1 V2

b a

V1 V2

b d

σV1=bR σV1=bΔR

• σV1=b(R∪ ΔR)=σV1=bR∪ σV1=bΔR• Itsufficestoapplytheselectionconditiononly onΔR

– andincludewiththeoriginalsolution

Page 43: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

IVM Example:Projection

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

43

• πV1(R∪ ΔR)=πV1R∪ πV1ΔR• Itsufficestoapplytheprojectionconditiononly onΔR

– andincludewiththeoriginalsolution

V1 V2

a cb ad ad e

V1 V2

a cb ad ad eb dc d

πV1R

R

V1

abd

R’=R∪ ΔR

ΔR

πV1R’

V1

bc

πV1R

πV1ΔR

V1

abdc

V1

abd

Page 44: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

IVM Example:Join

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

44

(R∪ ΔR)⨝ (S∪ ΔS)=(R⨝ S)∪ (R⨝ ΔS)∪ (ΔR⨝ S)∪ (ΔR⨝ ΔS)

A B

a1 b1a2 b2a3 b1

B C

b1 c1b2 c2

S’=S∪ ΔSR’=R∪ ΔR

ΔRΔS

A B

a1 b1B C

b1 c1A B C

a1 b1 c1⨝

=

A B

a1 b1B C

b1 c1⨝

A B

a1 b1⨝

B C

b2 c2

A B

a2 b2a3 b1

⨝B C

b1 c1

=

A B

a2 b2a3 b1

B C

b2 c2⨝

A B C

a1 b1 c1a3 b1 c1a2 b2 c2

= =

Page 45: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

IVM forLinearDatalog Rule

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

45

(R∪ ΔR)⨝ (S∪ ΔS)=(R⨝ S)∪ (R⨝ ΔS)∪ (ΔR⨝ S)∪ (ΔR⨝ ΔS)

A B

a1 b1a2 b2a3 b1

B C

b1 c1

S’=S

R’=R∪ ΔR

ΔR

A B

a1 b1B C

b1 c1A B C

a1 b1 c1⨝

=

A B C

a1 b1 c1a3 b1 c1

=

• R(x,y):- E(x,z),R(z,y)– i.e.Rnew =E⨝ R

• ButEisEDB– ΔE=Φ

• Therefore,E⨝ (R∪ ΔR)=(E⨝ R)∪ (E⨝ ΔR)• Itsufficestojoinwiththedifference

ΔRandincludeintheresultinthepreviousroundE⨝ R

• Advantageofhaving“linearrule”

Page 46: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

(Non-recursive)Datalog withNegation

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

46

• RecursionandnegationtogethermakeDatalog executioncomplicated– thereisanotioncalled“stratifiedsemantic”forthispurpose– computeIDBrelationsinstrata/layersbeforetakinganegation– notcoveredinthisclass

• Wewillonlydonegationfornon-recursiveDatalog

Page 47: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Unsafe/SafeDatalog Rules

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

47

• Whatistheproblemwiththisrule?• Whatshouldthisrulereturn?

– namesofalldrinkersintheworld?– namesofalldrinkersintheUSA?– namesofalldrinkersinDurham?

Find drinkers who like beer “BestBeer” Q(x) :- Likes(x, “BestBeer”)

Find drinkers who DO NOT like beer “BestBeer”

Q(x) :- ¬Likes(x, “BestBeer”)

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

Page 48: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

48

Find drinkers who like beer “BestBeer” Q(x) :- Likes(x, “BestBeer”)

Find drinkers who DO NOT like beer “BestBeer”

Q(x) :- ¬Likes(x, “BestBeer”)

ProblemwithNegationinDatalog Rules

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

• Whatistheproblemwiththisrule?• Dependenton“domain”ofdrinkers

– domain-dependent– infiniteanswerspossibletoo..

• keepgenerating“names”

Page 49: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

49

• Solution:• Restrictto“activedomain”ofdrinkersfromtheinputLikes (orFrequents)relation– “domain-independence”– samefiniteansweralways

• Becomesa“saferule”

Find drinkers who like beer “BestBeer” Q(x) :- Likes(x, “BestBeer”)

Find drinkers who DO NOT like beer “BestBeer”

Q(x) :- ¬Likes(x, “BestBeer”)

ProblemwithNegationinDatalog Rules

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

Q(x) :- Likes(x, y), ¬Likes(x, “BestBeer”)

Page 50: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))

Query: Find drinkers that like some beer (so much) that they frequent all bars that serve it

CompSci516:DataIntensiveComputingSystems

50

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Fall2016

RC→Datalog withnegation→ SQL(1/8)

Ack:slidesbyProfs.Balazinska andSuciu

RevisitexamplefromLecture4

Page 51: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))

Query: Find drinkers that like some beer so much that they frequent all bars that serve it

Step 1: Replace " with $ using de Morgan’s Laws

Q(x) = $y. Likes(x, y)∧ ¬$z.(Serves(z,y) ∧ ¬Frequents(x,z))

CompSci516:DataIntensiveComputingSystems

51

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

"x P(x) same as¬$x ¬P(x)

¬(¬P∨Q) same asP∧ ¬ Q

º Q(x) = $y. Likes(x, y)∧"z.(¬ Serves(z,y) ∨ Frequents(x,z))

DukeCS,Fall2016

RC→Datalog withnegation→SQL(2/8)

Ack:slidesbyProfs.Balazinska andSuciu

RevisitexamplefromLecture4

P => Q same as ¬P∨Q

Page 52: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Q(x) = $y. Likes(x, y)∧"z.(Serves(z,y) Þ Frequents(x,z))

Query: Find drinkers that like some beer so much that they frequent all bars that serve it

Step 1: Replace " with $ using de Morgan’s Laws

Q(x) = $y. Likes(x, y)∧ ¬$z.(Serves(z,y) ∧ ¬Frequents(x,z))

(new) Step 2: Make all subqueries domain independent

Q(x) = $y. Likes(x, y) ∧ ¬$z.(Likes(x,y)∧Serves(z,y)∧¬Frequents(x,z))

CompSci516:DataIntensiveComputingSystems

52

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Fall2016

RC→Datalog withnegation→ SQL(3/8)

Ack:slidesbyProfs.Balazinska andSuciu

RevisitexamplefromLecture4

Page 53: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

(new) Step 3: Create a datalog rule for some subexpressions of the form$x $y…. R(….)∧ S(….)∧ T(….)∧….

Q(x) = $y. Likes(x, y) ∧¬ $z.(Likes(x,y)∧Serves(z,y)∧¬Frequents(x,z))

H(x,y) :- Likes(x,y),Serves(z,y), not Frequents(x,z)Q(x) :- Likes(x,y), not H(x,y)

H(x,y)

CompSci516:DataIntensiveComputingSystems

53

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Fall2016

RC→Datalog withnegation→ SQL(4/8)

Ack:slidesbyProfs.Balazinska andSuciu

RevisitexamplefromLecture4

Page 54: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Step 4: Write it in SQL

SELECT DISTINCT L.drinker FROM Likes LWHERE ……

H(x,y) :- Likes(x,y),Serves(z,y), not Frequents(x,z)Q(x) :- Likes(x,y), not H(x,y)

54

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

RC→Datalog withnegation→ SQL(5/8)

Ack:slidesbyProfs.Balazinska andSuciu

RevisitexamplefromLecture4

Page 55: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Step 4: Write it in SQL

SELECT DISTINCT L.drinker FROM Likes LWHERE not exists

(SELECT * FROM Likes L2, Serves SWHERE … …)

H(x,y) :- Likes(x,y),Serves(z,y), not Frequents(x,z)Q(x) :- Likes(x,y), not H(x,y)

55

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

CompSci516:DataIntensiveComputingSystems

DukeCS,Fall2016

RC→Datalog withnegation→ SQL(6/8)

Ack:slidesbyProfs.Balazinska andSuciu

RevisitexamplefromLecture4

Page 56: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Step 4: Write it in SQL

SELECT DISTINCT L.drinker FROM Likes LWHERE not exists

(SELECT * FROM Likes L2, Serves SWHERE L2.drinker=L.drinker and L2.beer=L.beer

and L2.beer=S.beerand not exists (SELECT * FROM Frequents F

WHERE F.drinker=L2.drinkerand F.bar=S.bar))

H(x,y) :- Likes(x,y),Serves(z,y), not Frequents(x,z)Q(x) :- Likes(x,y), not H(x,y)

56

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

RC→Datalog withnegation→ SQL(7/8)

Ack:slidesbyProfs.Balazinska andSuciu

RevisitexamplefromLecture4

Page 57: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Sometimes can simplify the SQL query by using an unsafe datalog ruleCorrectness ensured by safe outermost rule

SELECT DISTINCT L.drinker FROM Likes LWHERE not exists

(SELECT * FROM Serves SWHERE L.beer=S.beer

and not exists (SELECT * FROM Frequents FWHERE F.drinker=L.drinker

and F.bar=S.bar))

H(x,y) :- Likes(x,y),Serves(z,y), not Frequents(x,z)Q(x) :- Likes(x,y), not H(x,y) Unsafe rule

CompSci516:DataIntensiveComputingSystems

57

Likes(drinker, beer)Frequents(drinker, bar)Serves(bar, beer)

DukeCS,Fall2016

RC→Datalog withnegation→SQL(8/8)

Ack:slidesbyProfs.Balazinska andSuciu

RevisitexamplefromLecture4

Page 58: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

AnOverviewofDataProvenancewithAnnotations

Selected/adaptedslidesfromthekeynotebyProf.ValTannen,EDBT2010

(optionalmaterial:fullslidedeckisavailableonVal’swebpage)

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems

58

Optional/additionalslides

Page 59: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Lineage

• [Cui-Widom-Wiener’00]• Lineage:

– Givenadataitemintheview– Determinethesourcedatathatproducedit– Theprocessbywhichitwasproduced

59

optionalslide

Page 60: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Applications

• OLAP/OLAM(mining)– originofanomalousdatatoverifyreliability

• ScientificDatabases– howanswerwasproducedfromrawdata

• OnlineNetworkmonitoringandDiagnosissystem– identifyfaultysensorfromnetworkmonitors

60

optionalslide

Page 61: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

Applications

• Cleanseddatafeedback– Cleanrawdataandsendreporttosources

• Materializedviewschemaevolution– ifviewschemaischanged(newcolumnadded),recomputation maynotbenecessary

• Viewupdate– translateviewupdatestobasedataupdates

61

optionalslide

Page 62: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 62

SlidebyValTannen,EDBT2010

optionalslide

Page 63: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 63

SlidebyValTannen,EDBT2010

optionalslide

Page 64: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 64

SlidebyValTannen,EDBT2010

optionalslide

Page 65: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 65

SlidebyValTannen,EDBT2010

optionalslide

Page 66: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

DukeCS,Fall2016 CompSci516:DataIntensiveComputingSystems 66

SlidebyValTannen,EDBT2010

optionalslide

Page 67: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

HasAsthma

Ann

Bob

Friend

Ann Joe

Ann Tom

Bob Tom

Smoker

Joe

Tom

Booleanquery Q():- HasAsthma (x),Friend(x,y),Smoker(y)

• x,y,z∈ {0,1}• 1=present• 0 =absent• Therearethreealternativewaystoderivethe“true”answer

x1x2

z1z2

y1y2y3

67

ProvenanceExample

ProvenanceFQ,D =x1y1z1 +x1y2z2 +x2y3z2

optionalslide

Page 68: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

HasAsthma

Ann

Bob

Friend

Ann Joe

Ann Tom

Bob Tom

Smoker

Joe

Tom

Booleanquery Q:$ x$ yHasAsthma (x)Ù Friend(x,y)Ù Smoker(y)

• x,y,z∈ {0,1}

• WhathappensifAnnisdeleted?– Doestheanswerchangetofalsefromtrue?

x1x2

z1z2

y1y2y3

68

Applicationto“DeletionPropagation”

ProvenanceFQ,D =x1y1z1 +x1y2z2 +x2y3z2 [Greenetal.’07]

optionalslide

Page 69: CompSci516 Data Intensive Computing Systems - cs.duke.edu · Duke CS, Fall 2016 CompSci 516: Data Intensive Computing Systems 4. Brief History of Datalog • Motivated by Prolog –

HasAsthma

Ann

Bob

Friend

Ann Joe

Ann Tom

Bob Tom

Smoker

Joe

Tom

Booleanquery Q:$ x$ yHasAsthma (x)Ù Friend(x,y)Ù Smoker(y)

• x,y,z∈ {0,1}

• WhathappensifAnnisdeleted?– Doestheanswerchangetofalsefromtrue?

• Noneedtore-evaluatethequery– justpluginx1=0andevaluate

x1x2

z1z2

y1y2y3

69

Applicationto“DeletionPropagation”

ProvenanceFQ,D =x1y1z1 +x1y2z2 +x2y3z2

0

[Greenetal.’07]

optionalslide