cs 4604: introducon to database management...

89
CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #10: Query Processing

Upload: others

Post on 11-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

CS4604:Introduc0ontoDatabaseManagementSystems

B.AdityaPrakashLecture#10:QueryProcessing

Page 2: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 2

Outline

§  introduc?on§  selec?on§  projec?on§  join§  set&aggregateopera?ons

Page 3: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 3

Introduc0on

§  Today’stopic:QUERYPROCESSING§  Somedatabaseopera?onsareEXPENSIVE§  Cangreatlyimproveperformancebybeing“smart”–  e.g.,canspeedup1,000,000xovernaïveapproach

Page 4: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 4

Introduc0on(cnt’d)

§ Mainweaponsare:– cleverimplementa?ontechniquesforoperators– exploi?ng“equivalencies”ofrela?onaloperators– usingsta?s?csandcostmodelstochooseamongthese.

Page 5: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 5

AReallyBadQueryOp0mizer

§  ForeachSelect-From-Wherequeryblock– docartesianproductsfirst–  thendoselec?ons–  etc,ie.:

•  GROUPBY;HAVING•  projec?ons•  ORDERBY

§  Incrediblyinefficient– Hugeintermediateresults!

× σpredicates

tables …

Page 6: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 6

Cost-basedQuerySub-System

Query Parser

Query Optimizer

Plan Generator Plan Cost Estimator

Query Plan Evaluator

Catalog Manager

Usuallythereisaheuris?cs-basedrewri?ngstepbeforethecost-basedsteps.

Schema Sta?s?cs

Select * From Blah B Where B.blah = blah

Queries

Page 7: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 7

TheQueryOp0miza0onGame§  “Op?mizer”isabitofamisnomer…§  Goalistopicka“good”(i.e.,lowexpectedcost)plan.–  Involveschoosingaccessmethods,physicaloperators,operatororders,…

– No?onofcostisbasedonanabstract“costmodel”

Page 8: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 8

Rela0onalOpera0ons§  Wewillconsiderhowtoimplement:–  Selec3on(σ)Selectsasubsetofrowsfromrela?on.–  Projec3on(π)Deletesunwantedcolumnsfromrela?on.–  Join()Allowsustocombinetworela?ons.–  Set-difference(-)Tuplesinreln.1,butnotinreln.2.–  Union(∪)Tuplesinreln.1andinreln.2.–  Aggrega3on(SUM,MIN,etc.)andGROUPBY

§  Recall:opscanbecomposed!§  Later(akerspringbreak),we’llseehowtoop3mizequerieswithmanyops

▹◃

Page 9: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 9

SchemaforExamples

§  Similartooldschema;rnameaddedforvaria?ons.§  Sailors:–  Eachtupleis50byteslong,80tuplesperpage,500pages.–  N=500,pS=80.

§  Reserves:–  Eachtupleis40byteslong,100tuplesperpage,1000pages.– M=1000,pR=100.

Sailors(sid:integer,sname:string,ra3ng:integer,age:real)Reserves(sid:integer,bid:integer,day:dates,rname:string)

Page 10: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 10

SimpleSelec0ons

§  Oftheform§  Ques?on:howbesttoperform?

SELECT*FROMReservesRWHERER.rname<‘C%’

σ R attr valueop R. ( )

Page 11: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 11

SimpleSelec0ons

§  A:Dependson:– whatindexes/accesspathsareavailable– whatistheexpectedsizeoftheresult(intermsofnumberoftuplesand/ornumberofpages)

Page 12: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 12

SimpleSelec0ons

§  Sizeofresultapproximatedas sizeofR*reduc3onfactor– “reduc?onfactor”isalsocalledselec3vity.– es?mateofreduc?onfactorsisbasedonsta?s?cs–wewilldiscussshortly.

Page 13: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 13

Alterna0vesforSimpleSelec0ons

§ Withnoindex,unsorted:– Mustessen?allyscanthewholerela?on– costisM(#pagesinR).For“reserves”=1000I/Os.

Page 14: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 14

SimpleSelec0ons(cnt’d)

§ Withnoindex,sorted:– costofbinarysearch+numberofpagescontainingresults.

– Forreserves=10I/Os+⎡selec?vity*#pages⎤

Page 15: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 15

SimpleSelec0ons(cnt’d)

§ Withanindexonselec?onauribute:– Useindextofindqualifyingdataentries,–  thenretrievecorrespondingdatarecords.–  (Hashindexusefulonlyforequalityselec?ons.)

Page 16: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 16

UsinganIndexforSelec0ons

§  Costdependson#qualifyingtuples,andclustering.– Cost:• findingqualifyingdataentries(typicallysmall)• pluscostofretrievingrecords(couldbelargew/oclustering).

Page 17: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 17

Selec0onsusingIndex(cnt’d)

Index entries

Data entries

direct search for

(Index File) (Data file)

Data Records

data entries

Data entries

Data Records

CLUSTEREDUNCLUSTERED

Page 18: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 18

Selec0onsusingIndex(cnt’d)–  Inexample“reserves”rela?on,if10%oftuplesqualify(100pages,10,000tuples).• Withaclusteredindex,costisliulemorethan100I/Os;•  ifunclustered,couldbeupto10,000I/Os!unless…

Page 19: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 19

Selec0onsusingIndex(cnt’d)§  Importantrefinementforunclusteredindexes:

1.Findqualifyingdataentries.2.Sorttherid’softhedatarecordstoberetrieved.3.Fetchridsinorder.Thisensuresthateachdatapageislookedatjustonce(though#ofsuchpageslikelytobehigherthanwithclustering).

Page 20: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 20

GeneralSelec0onCondi0ons

§  Q:Whatwouldyoudo?

(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3 SKIP

Page 21: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 21

GeneralSelec0onCondi0ons

§  Q:Whatwouldyoudo?§  A:trytofindaselec?ve(clustering)index.Specifically:

(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3 SKIP

Page 22: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 22

GeneralSelec0onCondi0ons

§  Converttoconjunc3venormalform(CNF):–  (day<8/9/94ORbid=5ORsid=3)AND(rname=‘Paul’ ORbid=5ORsid=3)

§  WeonlydiscussthecasewithnoORs(aconjunc?onoftermsoftheformaUropvalue).

(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3 SKIP

Page 23: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 23

GeneralSelec0onCondi0ons

§  AB-treeindexmatches(aconjunc?onof)termsthatinvolveonlyauributesinaprefixofthesearchkey.–  Indexon<a,b,c>matchesa=5ANDb=3,butnotb=3.

§  ForHashindex,musthaveallauributesinsearchkey

(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3 SKIP

Page 24: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 24

TwoApproachestoGeneralSelec0ons

§  Firstapproach:Findthecheapestaccesspath,retrievetuplesusingit,andapplyanyremainingtermsthatdon’tmatchtheindex

§  Secondapproach:getridsfromfirstindex;ridsfromsecondindex;intersectandfetch.

SKIP

Page 25: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 25

TwoApproachestoGeneralSelec0ons

§  Firstapproach:Findthecheapestaccesspath,retrievetuplesusingit,andapplyanyremainingtermsthatdon’tmatchtheindex:– Cheapestaccesspath:AnindexorfilescanwithfewestI/Os.

– Termsthatmatchthisindexreducethenumberoftuplesretrieved;othertermshelpdiscardsomeretrievedtuples,butdonotaffectnumberoftuples/pagesfetched.

SKIP

Page 26: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 26

CheapestAccessPath-Example§  Considerday<8/9/94ANDbid=5ANDsid=3.

§  AB+treeindexondaycanbeused;– then,bid=5andsid=3mustbecheckedforeachretrievedtuple.

§  Similarly,ahashindexon<bid,sid>couldbeused;– Then,day<8/9/94mustbechecked.

SKIP

Page 27: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 27

CheapestAccessPath-cnt’d

§  Considerday<8/9/94ANDbid=5ANDsid=3.

§  HowaboutaB+treeon<rname,day>?§  HowaboutaB+treeon<day,rname>?§  HowaboutaHashindexon<day,rname>?

SKIP

Page 28: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 28

Intersec0onofRIDs

§  Secondapproach:ifwehave2ormorematchingindexes(w/Alterna?ves(2)or(3)fordataentries):– Getsetsofridsofdatarecordsusingeachmatchingindex.

– Thenintersectthesesetsofrids.– Retrievetherecordsandapplyanyremainingterms.

SKIP

Page 29: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 29

Intersec0onofRIDs(cnt’d)

§  EXAMPLE:Considerday<8/9/94ANDbid=5ANDsid=3.

§  WithaB+treeindexondayandanindexonsid,§  wecanretrieveridsofrecordssa?sfyingday<8/9/94usingthefirst,

§  ridsofrecssa?sfyingsid=3usingthesecond,§  intersect,§  retrieverecordsandcheckbid=5.

SKIP

Page 30: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 30

TheProjec0onOpera0on

§  Issueisremovingduplicates.§  Basicapproach:sor?ng– 1.ScanR,extractonlytheneededaurs(why?)– 2.Sorttheresul?ngset– 3.RemoveadjacentduplicatesCost:Reserveswithsizera?o0.25=250pages.With20bufferpagescansortin2passes,so1000+250+2*2*250+250=2500I/Os

SELECTDISTINCTR.sid,R.bidFROMReservesR

Page 31: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 31

Projec0on

§  Canimprovebymodifyingexternalsortalgorithm(seechapter13):– ModifyPass0ofexternalsorttoeliminateunwantedfields.

– Modifymergingpassestoeliminateduplicates.Cost:forabovecase:read1000pages,writeout250inrunsof40pages,mergeruns=1000+250+250=1500.

SKIP

Page 32: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 32

DiscussionofProjec0on

§  Ifanindexontherela?oncontainsallwantedauributesinitssearchkey,candoindex-onlyscan.– Applyprojec?ontechniquestodataentries(muchsmaller!)

Page 33: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 33

DiscussionofProjec0on

§  Ifanordered(i.e.,tree)indexcontainsallwantedauributesasprefixofsearchkey,candoevenbeuer:– Retrievedataentriesinorder(index-onlyscan),discardunwantedfields,compareadjacenttuplestocheckforduplicates.

AB-treeindexmatches(aconjunc?onof)termsthatinvolveonlyauributesinaprefixofthesearchkey.–  Indexon<a,b,c>matchesa=5ANDb=3,butnotb=3.

ForHashindex,musthaveallauributesinsearchkey

Page 34: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 34

Joins

§  Joinsareverycommon.§  Joinscanbeveryexpensive(crossproductin

worstcase).§  Manyapproachestoreducejoincost.

Page 35: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 35

Joins

§  Jointechniqueswewillcover:– Nested-loopsjoin–  Index-nestedloopsjoin– Sort-mergejoin– Hashjoin

Page 36: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 36

EqualityJoinsWithOneJoinColumn

§  Inalgebra:RS.Common!Mustbecarefullyop?mized.R×Sislarge;so,R×Sfollowedbyaselec?onisinefficient.

§  Remember,joinisassocia?veandcommuta?ve.

SELECT*FROMReservesR1,SailorsS1WHERER1.sid=S1.sid

▹◃

Page 37: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 37

EqualityJoins

§  Assume:– MpagesinR,pRtuplesperpage,mtuplestotal– NpagesinS,pStuplesperpage,ntuplestotal–  Inourexamples,RisReservesandSisSailors.

§ Wewillconsidermorecomplexjoincondi?onslater.

§  Costmetric:#ofI/Os.Wewillignoreoutputcosts.

Page 38: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 38

Nestedloops

§  Algorithm#0:(naive)nestedloop(SLOW!)

R(A,..)

S(A, ......) m

n

Page 39: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 39

Nestedloops

§  Algorithm#0:(naive)nestedloop(SLOW!)foreachtuplerofR

foreachtuplesofSprint,iftheymatch

R(A,..)

S(A, ......) m

n

Page 40: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 40

Nestedloops

§  Algorithm#0:(naive)nestedloop(SLOW!)foreachtuplerofR

foreachtuplesofSprint,iftheymatch

R(A,..)

S(A, ......) m

n

outer relation

inner relation

Page 41: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 41

Nestedloops

§  Algorithm#0:whyisitbad?§  howmanydiskaccesses(‘M’and‘N’arethenumberofblocksfor‘R’and‘S’)?

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

Page 42: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 42

Nestedloops

§  Algorithm#0:whyisitbad?§  howmanydiskaccesses(‘M’and‘N’arethenumberofblocksfor‘R’and‘S’)?M+m*N

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

Page 43: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 43

SimpleNestedLoopsJoin

§  Actualnumber(pR*M)*N+M=100*1000*500+1000I/Os.– At10ms/IO,Total:???

§ Whatifsmallerrela?on(S)wasouter?

§ Whatassump?onsarebeingmadehere?

Page 44: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 44

SimpleNestedLoopsJoin

§  Actualnumber§  (pR*M)*N+M=100*1000*500+1000I/Os.– At10ms/IO,Total:~6days(!)

§ Whatifsmallerrela?on(S)wasouter?– slightlybeuer

§ Whatassump?onsarebeingmadehere?– 1bufferforeachtable(and1foroutput)

Page 45: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 45

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  Algorithm #1: Blocked nested-loop join –  read in a block of R

•  read in a block of S –  print matching tuples COST?

Page 46: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 46

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  Algorithm #1: Blocked nested-loop join –  read in a block of R

•  read in a block of S –  print matching tuples COST= M+M*N

Page 47: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 47

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  Which one should be the outer relation?

COST= M+M*N

Page 48: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 48

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  Which one should be the outer relation? •  A: the smallest (page-wise)

COST= M+M*N

Page 49: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 49

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  M=1000, N=500 •  Cost = 1000 + 1000*500 = 501,000 •  = 5010 sec ~ 1.4h COST= M+M*N

Page 50: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 50

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  M=1000, N=500 - if smaller is outer: •  Cost = 500 + 1000*500 = 500,500 •  = 5005 sec ~ 1.4h COST= N+M*N

Page 51: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 51

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  What if we have B buffers available?

Page 52: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 52

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  What if we have B buffers available? •  A: give B-2 buffers to outer, 1 to inner, 1 for

output

Page 53: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 53

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  Algorithm #1: Blocked nested-loop join –  read in B-2 blocks of R

•  read in a block of S –  print matching tuples COST= ?

Page 54: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 54

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  Algorithm #1: Blocked nested-loop join –  read in B-2 blocks of R

•  read in a block of S –  print matching tuples COST= M+M/(B-2)*N

Page 55: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 55

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  and, actually: •  Cost = M + ceiling(M/(B-2)) * N

COST= M+M/(B-2)*N

Page 56: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 56

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  If smallest (outer) fits in memory •  (ie., B= N+2), •  Cost =? COST= N+N/(B-2)*M

Page 57: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 57

Nestedloops

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  If smallest (outer) fits in memory •  (ie., B= N+2), •  Cost =N+M (minimum!) COST= N+N/(B-2)*M

Page 58: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 58

Nestedloops-guidelines

§  pickasouterthesmallesttable(=fewestpages)

§  fitasmuchofitinmemoryaspossible§  loopovertheinner

Page 59: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 59

§  useanexis?ngindex,orevenbuildoneonthefly

§  cost:M+m*c(c:look-upcost)

IndexNLjoin

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

Page 60: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 60

§  cost:M+m*c(c:look-upcost)§  ‘c’dependswhethertheindexisclusteredornot.

IndexNLjoin

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

Page 61: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 61

Joins

§  Jointechniqueswewillcover:– Nested-loopsjoin–  Index-nestedloopsjoin– Sort-mergejoin– Hashjoin

Page 62: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 62

Sort-mergejoin

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  sort both on joining attributed •  scan each and merge •  Cost, given B buffers?

Page 63: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 63

Sort-mergejoin

R(A,..)

S(A, ......) M pages,

m tuples N pages,

n tuples

•  Cost, given B buffers? •  ~ 2*M*logM/logB + 2*N* logN/logB + M + N

Page 64: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 64

Sort-MergeJoin§  Usefulif

Page 65: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 65

Sort-MergeJoin§  Usefulif– oneorbothinputsarealreadysortedonjoinauribute(s)

– outputisrequiredtobesortedonjoinauributes(s)

§  “Merge”phasecanrequiresomebacktrackingifduplicatevaluesappearinjoincolumn

Page 66: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 66

ExampleofSort-MergeJoin

sid sname rating age22 dustin 7 45.028 yuppy 9 35.031 lubber 8 55.544 guppy 5 35.058 rusty 10 35.0

sid bid day rname28 103 12/4/96 guppy28 103 11/3/96 yuppy31 101 10/10/96 dustin31 102 10/12/96 lubber31 101 10/11/96 lubber58 103 11/12/96 dustin

Page 67: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 67

ExampleofSort-MergeJoin

§  With35,100or300bufferpages,bothReservesandSailorscanbesortedin2passes;totaljoincost:7500.

§  (while Block Nested Loop (BNL) cost: 2,500 to 15,000 I/Os)

Page 68: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 68

Sort-mergejoin

§ Worstcaseformergingphase?

§  Cost?

Page 69: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 69

Refinements

§  Alltherefinementsofexternalsor?ng§  plusoverlappingofthemergingofsor?ngwiththemergingofjoining.

Page 70: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 70

Joins

§  Jointechniqueswewillcover:– Nested-loopsjoin–  Index-nestedloopsjoin– Sort-mergejoin– Hashjoin

Page 71: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 71

§  hashjoin:usehashingfunc?onh()– hash‘R’into(0,1,...,‘max’)buckets– hash‘S’intobuckets(samehashfunc?on)–  joineachpairofmatchingbuckets

Hashjoins

R(A, ...) S(A, ......) 0

1

max

Page 72: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 72

– howtojoineachpairofpar??onsHr-i,Hs-i?– A:buildanotherhashtableforHs-i,andprobeitwitheachtupleofHr-i

Hashjoin-details

R(A, ...) S(A, ......)

Hr-0

0

1

max

Hs-0

Page 73: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 73

Hashjoin-details

§  Inmoredetail:§  Choosethe(page-wise)smallest-ifitfitsinmemory,do~NL– and,actually,buildahashtable(withh2()!=h())– andprobeit,witheachtupleoftheother

Page 74: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 74

§  whatifHs-iistoolargetofitinmain-memory?

§  A:recursivepar??oning§  moredetails(overflows,hybridhashjoins):inbook

§  costofhashjoin?(ifwehaveenoughbuffers:)3(M+N)(why?Seenextslide)

Hashjoindetails

Page 75: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 75

CostofHash-Join

§  Inpar??oningphase,read+writebothrelns;2(M+N).Inmatchingphase,readbothrelns;M+NI/Os.

§  Inourrunningexample,thisisatotalof4500I/Os.

Page 76: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 76

§  [costofhashjoin?(ifwehaveenoughbuffers:)3(M+N)]

§ Whatis‘enough’?sqrt(N),orsqrt(M)?

Hashjoindetails

Page 77: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 77

§  [costofhashjoin?(ifwehaveenoughbuffers:)3(M+N)]

§  Whatis‘enough’?sqrt(N),orsqrt(M)?§  A:sqrt(smallest)(why?)–  Becauseyouonlyneedenoughmemorytoholdthehashtablepar??onsofthesmallertableinmemorysoB>sizeofsmaller/B-1èB~sqrt(size-of-smaller)

Hashjoindetails Details

Page 78: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 78

Sort-MergeJoinvs.HashJoin

§  Givenaminimumamountofmemorybothhaveacostof3(M+N)I/Os.

(min.memoryforsort-merge=sqrt(largertable)usingaggressiverefinements---intextbook)(min.memoryforhash=sqrt(smallertable)---seepreviousslides)

Page 79: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 79

Sort-MergevsHashjoin

§  HashJoinPros:– ??– ??– ??

§  Sort-MergeJoinPros:– ??

Page 80: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 80

Sort-MergevsHashjoin

§  HashJoinPros:– Superiorifrela?onsizesdiffergreatly– Showntobehighlyparallelizable(beyondscopeofclass)

§  Sort-MergeJoinPros:– ??

Page 81: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 81

Sort-MergevsHashjoin

§  HashJoinPros:– Superiorifrela?onsizesdiffergreatly– Showntobehighlyparallelizable(beyondscopeofclass)

§  Sort-MergeJoinPros:– Lesssensi?vetodataskew– Resultissorted(mayhelp“upstream”operators)– goesfasterifoneorbothinputsalreadysorted

Page 82: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 82

GeneralJoinCondi0ons

§  Equali?esoverseveralauributes(e.g.,R.sid=S.sidANDR.rname=S.sname):– allpreviousmethodsapply,usingthecompositekey

Page 83: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 83

GeneralJoinCondi0ons

§  Inequalitycondi?ons(e.g.,R.rname<S.sname):§  whichmethodss?llapply?– NL–  indexNL– Sortmerge– Hashjoin

Page 84: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 84

GeneralJoinCondi0ons

§  Inequalitycondi?ons(e.g.,R.rname<S.sname):§  whichmethodss?llapply?– NL (probably,thebest!)–  indexNL (onlyifclusteredindex)– Sortmerge (doesnotapply!)(why?)– Hashjoin (doesnotapply!)(why?)

Page 85: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 85

SetOpera0ons

§  Intersec?onandcross-product:specialcasesofjoin

§  Union(Dis?nct)andExcept:similar;we’lldounion:

§  Effec?vely:concatenate;usesor?ngorhashing§  Sor?ngbasedapproachtounion:– Sortbothrela?ons(oncombina?onofallauributes).– Scansortedrela?onsandmergethem.– Alterna3ve:MergerunsfromPass0forbothrela?ons.

SKIP

Page 86: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 86

SetOpera0ons,cont’d

§  Hashbasedapproachtounion:– Par??onRandSusinghashfunc?onh.– ForeachS-par??on,buildin-memoryhashtable(usingh2),scancorrespondingR-par??onandaddtuplestotablewhilediscardingduplicates.

SKIP

Page 87: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 87

AggregateOpera0ons(AVG,MIN,etc.)

§ Withoutgrouping:–  Ingeneral,requiresscanningtherela?on.– GivenindexwhosesearchkeyincludesallauributesintheSELECTorWHEREclauses,candoindex-onlyscan.

SKIP

Page 88: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 88

Summary§  Avirtueofrela?onalDBMSs:–  queriesarecomposedofafewbasicoperators–  Theimplementa?onoftheseoperatorscanbecarefullytuned

–  Importanttodothis!

§  Manyalterna?veimplementa?ontechniquesforeachoperator–  Nouniversallysuperiortechniqueformostoperators.

“it depends” [Guy Lohman (IBM)]

Page 89: CS 4604: Introducon to Database Management Systemscourses.cs.vt.edu/~cs4604/Spring16/lectures/lecture-10.pdfTwo Approaches to General Selec0ons § First approach: Find the cheapest

Prakash2016 VTCS4604 89

Summarycont’d

§  Mustconsideravailablealterna?vesforeachopera?oninaqueryandchoosebestonebasedonsystemsta?s?cs,etc.–  Partofthebroadertaskofop?mizingaquerycomposedofseveralops.