ch10 tree index

9
Da ta bas e Ma n age me nt S yst ems 3e d, R. Ra ma kr is hn an and J . Ge hrke 1 Tree-StructuredIndexes Chapter9 Da ta bas e Ma n age me nt S yst ems 3e d, R. Ra ma k ri sh n an and J . Ge hrk e 2 Introduction  Asforanyindex,3alternativesfordataentries k*: Datarecordwithkeyvalue k <k,ridofdatarecordwithsearchkeyvalue k> <k,listofridsofdatarecordswithsearchkey k> Choiceisorthogonaltothe indexingtechnique usedtolocatedataentries k*. Tree-structuredindexingtechniquessupport  bothrangesearchesandequalitysearches . ISAM:staticstructure; B+tree:dynamic, adjustsgracefullyunderinsertsanddeletes. Da ta bas e Ma n age me nt S yst ems 3e d, R. Ra ma kr is hn an and J . Ge hrke 3 RangeSearches ``Findallstudentswith gpa >3.0’’ Ifdataisinsortedfile,dobinarysearchtofindfirst suchstudent,thenscantofindothers. Costofbinarysearchcanbequitehigh. Simpleidea:Createan`index’file. * Candobinarysearchon(smaller)indexfile! Page1 Page2 PageN Page3 DataFile k2 kN k1 IndexFile

Upload: yigit-guelan

Post on 06-Apr-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ch10 Tree Index

8/3/2019 Ch10 Tree Index

http://slidepdf.com/reader/full/ch10-tree-index 1/9

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1

Tree-StructuredIndexes

Chapter9

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2

Introduction

 

 Asforanyindex,3alternativesfordataentriesk*:¡

Datarecordwithkeyvalue k¡

<k,ridofdatarecordwithsearchkeyvaluek>¡

<k,listofridsofdatarecordswithsearchkeyk> 

Choiceisorthogonaltotheindexingtechniqueusedtolocatedataentries k*.

 

Tree-structuredindexingtechniquessupport bothrangesearchesandequalitysearches.

 

ISAM:staticstructure; B+tree:dynamic,

adjustsgracefullyunderinsertsanddeletes.

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 3

RangeSearches 

``Findallstudentswith gpa >3.0’’¢ Ifdataisinsortedfile,dobinarysearchtofindfirst

suchstudent,thenscantofindothers.¢ Costofbinarysearchcanbequitehigh.

 

Simpleidea:Createan`index’file.

* Candobinarysearchon(smaller)indexfile!

Page1 Page2 PageNPage3 DataFile

k2 kNk1 IndexFile

Page 2: Ch10 Tree Index

8/3/2019 Ch10 Tree Index

http://slidepdf.com/reader/full/ch10-tree-index 2/9

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 4

ISAM

 

Indexfilemaystillbequitelarge.Butwecanapplytheidearepeatedly!

* Leafpagescontaindataentries.

P0

K1 P

1K 2 P

2K

mP m

indexentry

Non-leaf

Pages

Pages

Overflowpage

Primarypages

Leaf

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 5

CommentsonISAM

 

Filecreation:Leaf(data)pagesallocatedsequentially,sortedbysearchkey;thenindexpagesallocated,thenspaceforoverflowpages.

 

Indexentries:<searchkeyvalue,pageid>;they`direct’searchfordataentries,whichareinleafpages.

 

Search:Startatroot;usekeycomparisonstogotoleaf.CostlogFN;F=#entries/indexpg,N=#leafpgs

 

Insert:Findleafdataentrybelongsto,andputitthere. 

Delete:Findandremovefromleaf;ifemptyoverflowpage,de-allocate.

* Statictreestructure:inserts/deletesaffectonlyleafpages.

Data

Pages

IndexPages

Overflowpages

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 6

ExampleISAMTree

 

Eachnodecanhold2entries;noneedfor`next-leaf-page’pointers.(Why?)

10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*

20 33 51 63

40

Root

Page 3: Ch10 Tree Index

8/3/2019 Ch10 Tree Index

http://slidepdf.com/reader/full/ch10-tree-index 3/9

Page 4: Ch10 Tree Index

8/3/2019 Ch10 Tree Index

http://slidepdf.com/reader/full/ch10-tree-index 4/9

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 0

ExampleB+Tree

 

Searchbeginsatroot,andkeycomparisonsdirectittoaleaf(asinISAM).

 

Searchfor5*,15*,alldataentries>=24*...

* Basedonthesearchfor15*,weknow itisnotinthetree!

Root

17 24 30

2* 3 * 5 * 7 * 14 * 1 6* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*

13

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 1

B+TreesinPractice

 

Typicalorder:100.Typicalfill-factor:67%.  averagefanout=133

 

Typicalcapacities:  Height4:1334 =312,900,700records  Height3:1333 =2,352,637records

 

Canoftenholdtoplevelsinbufferpool:  Level1=1page=8Kbytes  Level2=133pages=1Mbyte 

Level3=17,689pages=133MBytes

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 2

InsertingaDataEntryintoaB+Tree 

FindcorrectleafL. 

PutdataentryontoL.  IfLhasenoughspace,done!  Else,mustsplit L(intoLandanewnodeL2)

• Redistributeentriesevenly,copyup middlekey.

• InsertindexentrypointingtoL2intoparentofL. 

Thiscanhappenrecursively  Tosplitindexnode,redistributeentriesevenly,butpushup middlekey.(Contrastwithleafsplits.)

 

Splits“grow”tree;rootsplitincreasesheight.  Treegrowth:getswider oroneleveltallerattop.

Page 5: Ch10 Tree Index

8/3/2019 Ch10 Tree Index

http://slidepdf.com/reader/full/ch10-tree-index 5/9

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 3

Inserting8*intoExampleB+Tree

 

Observehowminimumoccupancyisguaranteedin

 bothleafandindexpgsplits.

 

Notedifference betweencopy-up and push-up;

 besureyouunderstandthereasonsforthis.

2* 3* 5* 7* 8*

5

Entrytobeinsertedinparentnode.(Notethat5iscontinuestoappearintheleaf.)scopiedupand

appearsonceintheindex.Contrast

5 24 30

17

13

Entrytobeinsertedinparentnode.(Notethat17ispushedupandonly

thiswithaleafsplit.)

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 4

ExampleB+TreeAfterInserting8*

v Noticethatrootwassplit,leadingtoincreaseinheight.

v Inthisexample,wecanavoidsplitbyre-distributing

entries;however,thisisusuallynotdoneinpractice.

2 * 3 *

Root

17

24 30

14* 16* 1 9* 2 0* 2 2* 2 4* 27 * 29* 33* 34* 38* 39*

135

7*5* 8*

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 5

DeletingaDataEntryfromaB+Tree

 

Startatroot,findleafL whereentrybelongs. 

Removetheentry.  IfLisatleasthalf-full,done! 

IfLhasonlyd-1entries,• Trytore-distribute,borrowingfromsibling (adjacent

nodewithsameparentasL).

• Ifre-distributionfails,merge Landsibling. 

Ifmergeoccurred,mustdeleteentry(pointingtoLorsibling)fromparentofL.

 

Mergecouldpropagatetoroot,decreasingheight.

Page 6: Ch10 Tree Index

8/3/2019 Ch10 Tree Index

http://slidepdf.com/reader/full/ch10-tree-index 6/9

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 6

ExampleTreeAfter(Inserting8*,Then)Deleting19*and20*...

 

Deleting19*iseasy. 

Deleting20*isdonewithre-distribution.Noticehowmiddlekeyiscopiedup.

2 * 3 *

Root

17

30

14* 16* 33* 34* 38* 39*

135

7*5* 8* 22* 24*

27

27* 29*

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 7

...AndThenDeleting24*

 

Mustmerge. 

Observe`toss’ofindexentry(onright),and  pulldown’ofindexentry(below).

30

2 2* 2 7* 29* 33* 34* 38* 39*

2* 3* 7* 14* 16* 22* 27* 29* 33* 34* 38* 39*5* 8*

Root

30135 17

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 8

ExampleofNon-leafRe-distribution

 

Treeisshownbelowduringdeletionof24*.(Whatcouldbeapossibleinitialtree?)

 

Incontrasttopreviousexample,canre-distribute

entryfromleftchildofroottorightchild.Root

135 17 20

22

30

14* 16* 17* 18* 20* 33* 34* 38* 39*22* 27* 29*21*7*5* 8*3*2*

Page 7: Ch10 Tree Index

8/3/2019 Ch10 Tree Index

http://slidepdf.com/reader/full/ch10-tree-index 7/9

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 1 9

 AfterRe-distribution

 

Intuitively,entriesarere-distributedby pushingthrough’thesplittingentryintheparentnode.

 

Itsufficestore-distributeindexentrywithkey20;we’vere-distributed17aswellforillustration.

14* 16* 33* 34* 38* 39*22* 27* 29*17* 18* 20* 21*7*5* 8*2 * 3 *

Root

135

17

3020 22

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 0

PrefixKeyCompression

 

Importanttoincreasefan-out.(Why?) 

Keyvaluesinindexentriesonly`directtraffic’;canoftencompressthem.

  E.g.,IfwehaveadjacentindexentrieswithsearchkeyvaluesDannonYogurt,DavidSmith andDevarakondaMurthy,wecanabbreviateDavid SmithtoDav.(Theotherkeyscanbecompressedtoo...)

• Isthiscorrect?Notquite!WhatifthereisadataentryDaveyJones?(CanonlycompressDavidSmithtoDavi)

• Ingeneral,whilecompressing,mustleaveeachindexentrygreaterthaneverykeyvalue(inanysubtree)toitsleft.

 

Insert/deletemustbesuitablymodified.

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 1

BulkLoadingofaB+Tree 

Ifwehavealargecollectionofrecords,andwewanttocreateaB+treeonsomefield,doingso

 byrepeatedlyinsertingrecordsisveryslow. 

BulkLoading canbedonemuchmoreefficiently. 

Initialization:Sortalldataentries,insertpointertofirst(leaf)pageinanew(root)page.

3* 4* 6 * 9* 1 0* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*

Sortedpagesofdataentries;notyetinB+treeRoot

Page 8: Ch10 Tree Index

8/3/2019 Ch10 Tree Index

http://slidepdf.com/reader/full/ch10-tree-index 8/9

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 2

BulkLoading(Contd.)

 

Indexentriesforleaf

pagesalwaysenteredintoright-mostindexpagejustaboveleaflevel.Whenthisfillsup,itsplits.(Splitmaygoupright-mostpathtotheroot.)

 

Muchfasterthanrepeatedinserts,especiallywhenoneconsiderslocking!

3* 4* 6* 9* 10*11* 12*13* 20*22* 23*31* 35*36* 38*41* 44*

Root

DataentrypagesnotyetinB+tree

3523126

1 0 2 0

3* 4* 6 * 9* 1 0* 11* 12*13* 20*22* 23* 31* 35*36* 38*41* 44*

6

Root

10

12 23

20

35

38

notyetinB+tree

Dataentrypages

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 3

SummaryofBulkLoading

 

Option1:multipleinserts.  Slow.  Doesnotgivesequentialstorageofleaves.

 

Option2: BulkLoading  Hasadvantagesforconcurrencycontrol.  FewerI/Osduringbuild.  Leaveswillbestoredsequentially(andlinked,of

course).  Cancontrol“fillfactor”onpages.

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 4

 ANoteon`Order’

 

Order (d)conceptreplacedbyphysicalspacecriterioninpractice( atleasthalf-full’).

  Indexpagescantypicallyholdmanymoreentriesthanleafpages.

  Variablesizedrecordsandsearchkeysmeandifferntnodeswillcontaindifferentnumbersofentries.

  Evenwithfixedlengthfields,multiplerecordswiththesamesearchkeyvalue(duplicates)canleadtovariable-sizeddataentries(ifweuseAlternative(3)).

Page 9: Ch10 Tree Index

8/3/2019 Ch10 Tree Index

http://slidepdf.com/reader/full/ch10-tree-index 9/9

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 5

Summary

 

Tree-structuredindexesareidealforrange-searches,alsogoodforequalitysearches.

 

ISAMisastaticstructure.  Onlyleafpagesmodified;overflowpagesneeded.  Overflowchainscandegradeperformanceunlesssize

ofdatasetanddatadistributionstayconstant. 

B+treeisadynamicstructure.  Inserts/deletesleavetreeheight-balanced;logF Ncost.  Highfanout(F)meansdepthrarelymorethan3or4.  Almostalwaysbetterthanmaintainingasortedfile.

D at aba se Ma nag em ent Sys tems 3 ed, R. R am ak ri sh na n an d J. G ehrk e 2 6

Summary(Contd.)

  Typically,67% occupancyonaverage.  UsuallypreferabletoISAM,modulo locking

considerations;adjuststogrowthgracefully.  Ifdataentriesaredatarecords,splitscanchangerids!

 

Keycompressionincreasesfanout,reducesheight. 

BulkloadingcanbemuchfasterthanrepeatedinsertsforcreatingaB+treeonalargedataset.

 

Mostwidelyusedindexindatabasemanagementsystemsbecauseofitsversatility.Oneofthemost

optimizedcomponentsofaDBMS.