lecture 8 - stanford...

Post on 12-Oct-2020


Lecture 8: HASHING!!!!!

Announcements

• HW3 due Friday!

• HW4 posted Friday!

Today: hashing

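[Figure: a hash table with n = 9 buckets labeled 1, 2, 3, …, 9. Some buckets hold chains of keys (13, 22, 43, 9, …); the empty buckets point to NIL.]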

Outline

• Hash tables are another sort of data structure that allows fast INSERT/DELETE/SEARCH.

• like self-balancing binary trees

• The difference is we can get better performance in expectation by using randomness.

• Like QuickSort vs. MergeSort

• Hash families are the magic behind hash tables.

• Universal hash families are even more magic.

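Goal: Just like on Monday

• We are interested in putting nodes with keys into a data structure that supports fast INSERT/DELETE/SEARCH.

• INSERT

• DELETE

• SEARCH

[Figure: INSERT, DELETE, and SEARCH operations on a box labeled "data structure"; the SEARCH answers "HERE IT IS" and returns the node with key "2".]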

Today:

• Hash tables:

• O(1) expected time INSERT/DELETE/SEARCH (#even sweeter in practice)

• Worse worst-case performance, but often great in practice.

• E.g., Python's dict, Java's HashSet/HashMap, C++'s unordered_map.

• Hash tables are used for databases, caching, object representation, …

On Monday:

• Self-balancing trees:

• O(log(n)) deterministic INSERT/DELETE/SEARCH (#pretty sweet)

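One way to get O(1) time

• Say all keys are in the set {1,2,3,4,5,6,7,8,9}.

• INSERT: 9 6 3 5

• DELETE: 6

• SEARCH: 3 2

[Figure: an array with one slot for each possible key 1 through 9; each inserted key sits in the slot with its own index. SEARCH 3 reports "3 is here."]

This is called "direct addressing".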
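
The direct-addressing idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `DirectAddressTable` and its method names are made up here, and the table stores a present/absent flag per key rather than a full node.

```python
class DirectAddressTable:
    """Direct addressing for keys known to lie in {1, ..., universe_size}."""

    def __init__(self, universe_size):
        # One slot per possible key; slot k - 1 records whether key k is present.
        self.present = [False] * universe_size

    def insert(self, key):
        self.present[key - 1] = True

    def delete(self, key):
        self.present[key - 1] = False

    def search(self, key):
        return self.present[key - 1]


table = DirectAddressTable(9)
for key in (9, 6, 3, 5):
    table.insert(key)
table.delete(6)
print(table.search(3))  # True
print(table.search(2))  # False
```

Every operation touches exactly one slot, so each takes O(1) time, but the array needs one slot per element of the universe.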

That should look familiar

• Kind of like BUCKETSORT from Lecture 6.

• Same problem: if the keys may come from a universe U = {1,2,…,10000000000}…

The solution then was…

• Put things in buckets based on one digit.

INSERT: 21 345 13 101 50 234 1

[Figure: ten buckets labeled 0 through 9, filled by least significant digit: bucket 0 holds 50; bucket 1 holds 21, 101, 1; bucket 3 holds 13; bucket 4 holds 234; bucket 5 holds 345.]

Now SEARCH 21: It's in this bucket somewhere… go through until we find it.

Problem…

INSERT: 22 342 12 102 52 232 2

[Figure: the same ten buckets; all seven items land in bucket 2.]

Now SEARCH 22… this hasn't made our lives easier…
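
To see the problem concretely, here is a small sketch that buckets both insert sequences from the slides by least significant digit. The helper names `last_digit` and `fill_buckets` are hypothetical, chosen only for this illustration.

```python
def last_digit(x):
    """The bucketing rule from the slide: least significant decimal digit."""
    return x % 10


def fill_buckets(keys):
    buckets = [[] for _ in range(10)]  # buckets labeled 0 through 9
    for key in keys:
        buckets[last_digit(key)].append(key)
    return buckets


friendly = fill_buckets([21, 345, 13, 101, 50, 234, 1])
unfriendly = fill_buckets([22, 342, 12, 102, 52, 232, 2])

print(max(len(b) for b in friendly))    # 3
print(max(len(b) for b in unfriendly))  # 7
```

In the second sequence every key ends in 2, so SEARCH degenerates into scanning one long list.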

Hash tables

• That was an example of a hash table.

• not a very good one, though.

• We will be more clever (and less deterministic) about our bucketing.

• This will result in fast (expected time) INSERT/DELETE/SEARCH.

But first! Terminology.

• We have a universe U, of size M.

• M is really big.

• But only a few (say at most n for today's lecture) elements of U are ever going to show up.

• M is waaaayyyyyyy bigger than n.

• But we don't know which ones will show up in advance.

[Figure: a blob labeled "Universe U". All of the keys in the universe live in this blob. A few elements are special and will actually show up.]

Example: U is the set of all strings of at most 140 ASCII characters (roughly 128^140 of them). The only ones which I care about are those which appear as trending hashtags on Twitter. #hashhashtags

There are way fewer than 128^140 of these.

Examples aside, I'm going to draw elements like I always do, as blue boxes with integers in them…

The previous example with this terminology

• We have a universe U, of size M.

• at most n of which will show up.

• M is waaaayyyyyy bigger than n.

• We will put items of U into n buckets.

• There is a hash function h: U → {1,…,n} which says what element goes in what bucket.

[Figure: the blob "Universe U" (all of the keys in the universe live in this blob) with arrows into n buckets labeled 1, 2, 3, …; here h(x) = least significant digit of x.]

For this lecture, I'm assuming that the number of things is the same as the number of buckets; both are n. This doesn't have to be the case, although we do want:

#buckets = O(#things which show up)

This is a hash table (with chaining)

• Array of n buckets.

• Each bucket stores a linked list.

• We can insert into a linked list in time O(1).

• To find something in the linked list takes time O(length(list)).

• h: U → {1,…,n} can be any function:

• but for concreteness let's stick with h(x) = least significant digit of x.

For demonstration purposes only! This is a terrible hash function! Don't use this!

INSERT: 13, 22, 43, 9

[Figure: n buckets (say n = 9) labeled 1, 2, 3, …, 9. Bucket 2 holds 22, bucket 3 holds 13 and 43, and bucket 9 holds 9.]

SEARCH 43: Scan through all the elements in bucket h(43) = 3.
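
A minimal sketch of chaining, under a few assumptions that are not in the slides: the class name `ChainedHashTable` is made up, each chain is a Python list rather than a linked list, and there are ten buckets indexed 0 through 9 so that the least-significant-digit function always has somewhere to put a key.

```python
class ChainedHashTable:
    """Hash table with chaining; the hash function is supplied by the caller."""

    def __init__(self, num_buckets, hash_function):
        self.h = hash_function
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key):
        # Add to the chain without scanning it (duplicates are not checked).
        self.buckets[self.h(key)].append(key)

    def search(self, key):
        # Scan only the one bucket the key could be in: O(length of that chain).
        return key in self.buckets[self.h(key)]

    def delete(self, key):
        chain = self.buckets[self.h(key)]
        if key in chain:
            chain.remove(key)


# The "demonstration purposes only" hash function: least significant digit.
table = ChainedHashTable(10, lambda x: x % 10)
for key in (13, 22, 43, 9):
    table.insert(key)
print(table.buckets[3])  # [13, 43]
print(table.search(43))  # True
```

SEARCH 43 only looks at bucket 3, so its cost is governed by how many keys share that bucket.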

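Aside: Hash tables with open addressing

• The previous slide is about hash tables with chaining.

• There's also something called "open addressing".

• You'll see it on your homework :)

[Figure: two tables with n = 9 buckets. In the first, bucket 3 holds the list 13, 43 ("This is a 'chain'"). In the second, 13 and 43 each occupy their own slot.]

\end{Aside}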


Scanning through a bucket is a good idea as long as there are not too many elements in that bucket!

The main question

• How do we pick that function so that this is a good idea?

1. We want there to be not many buckets (say, n).

• This means we don't use too much space.

2. We want the items to be pretty spread out in the buckets.

• This means it will be fast to SEARCH/INSERT/DELETE.

[Figure: two ways the items could land in n = 9 buckets: spread out across the buckets vs. several items chained together in the same bucket.]

Worst-case analysis

• Design a function h: U → {1,…,n} so that:

• No matter what input (at most n items of U) Darth Vader chooses, the buckets will be balanced.

• Here, balanced means O(1) entries per bucket.

• If we had this, then we'd achieve our dream of O(1) INSERT/DELETE/SEARCH.

Take a minute to talk to the person next to you. Can you come up with such a function?

We really can't beat Darth Vader here.

[Figure: h(x) maps the blob "Universe U" into n buckets; a highlighted region of U marks all the things that hash to the first bucket.]

• The universe U has M items.

• They get hashed into n buckets.

• At least one bucket receives at least M/n items.

• M is WAAYYYYY bigger than n, so M/n is bigger than n.

• Darth Vader chooses n of the items that landed in this very full bucket.
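
Darth Vader's strategy can be sketched directly: given any fixed, known h, group the universe by bucket and take n items from the fullest one. The function name `adversarial_items`, the universe {1,…,1000}, and the choice h(x) = x mod n are assumptions made for illustration; any deterministic h into n buckets would do.

```python
from collections import defaultdict


def adversarial_items(h, universe, n):
    """Return up to n items of the universe that all land in the same bucket under h."""
    by_bucket = defaultdict(list)
    for x in universe:
        by_bucket[h(x)].append(x)
    # Pigeonhole: the fullest bucket holds at least M/n items.
    fullest = max(by_bucket.values(), key=len)
    return fullest[:n]


n = 9
universe = range(1, 1001)  # M = 1000, much bigger than n


def h(x):
    return x % n  # a stand-in for whatever fixed hash function was chosen


bad = adversarial_items(h, universe, n)
print(bad)
print({h(x) for x in bad})  # a single bucket index
```

Because h is fixed before the items are chosen, the adversary can always run this kind of search, which is why no deterministic h gives a worst-case guarantee.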

Solution: Randomness

The game

1. An adversary chooses any n items $u_1, u_2, \ldots, u_n \in U$, and any sequence of INSERT/DELETE/SEARCH operations on those items.

For example, the items 13 22 43 92 7 and the sequence INSERT 13, INSERT 22, INSERT 43, INSERT 92, INSERT 7, SEARCH 43, DELETE 92, SEARCH 7, INSERT 92.

2. You, the algorithm, choose a random hash function $h: U \to \{1, \ldots, n\}$.

3. HASH IT OUT

[Figure: the items 13, 22, 43, 92, 7 hashed into buckets 1, 2, 3, …, n.]

Plucky the pedantic penguin: What does random mean here? Uniformly random?

Why should this help?

• Say that h is uniformly random.

• That means that h(1) is a uniformly random number between 1 and n.

• h(2) is also a uniformly random number between 1 and n, independent of h(1).

• h(3) is also a uniformly random number between 1 and n, independent of h(1), h(2).

• …

• h(M) is also a uniformly random number between 1 and n, independent of h(1), h(2), …, h(M-1).

[Figure: h maps the Universe U into n buckets.]

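What do we want?

[Figure: buckets 1, 2, 3, …, n holding items such as 14, 22, 92, 43, 8; one bucket holds u_i together with 7, 32, 5, 15.]

It's bad if lots of items land in u_i's bucket. So we want not that.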

More precisely

[Figure: buckets 1, 2, 3, …, n holding the items 14, 22, 92, 43, 8, and u_i.]

• Suppose that for all u_i that the bad guy chose:

• E[number of items in u_i's bucket] ≤ 2.

• Then for each operation involving u_i:

• E[time of operation] = O(1)

• By linearity of expectation,

• $E[\text{time to do a bunch of operations}]$

• $= E\left[\sum_{\text{operations}} \text{time of operation}\right]$

• $= \sum_{\text{operations}} E[\text{time of operation}]$

• $= \sum_{\text{operations}} O(1)$

• = O(number of operations)

aka, O(1) per operation!

So we want:

• For all i = 1,…,n,

E[number of items in u_i's bucket] ≤ 2.

Aside: why not just:

• For all i = 1,…,n:

E[number of items in bucket i] ≤ 2?

Suppose:

[Figure: all of the items (14, 22, 92, 43, 8, …) land together in one bucket; this happens with probability 1/n.]

[Figure: all of the items land together in a different bucket; and this happens with probability 1/n, etc.]

Then E[number of items in bucket i] = 1 for all i.

But P{the buckets get big} = 1.

So we want:

• For all i = 1,…,n,

E[number of items in u_i's bucket] ≤ 2.
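
The distribution in the aside can be checked with exact arithmetic. The sketch below assumes n = 5 items and n = 5 buckets (indexed from 0 for convenience), with every item landing in the same uniformly random bucket; the variable names are illustrative only.

```python
from fractions import Fraction

n = 5  # number of items and number of buckets

# Outcome k (probability 1/n each): all n items land in bucket k.
outcomes = [[n if bucket == k else 0 for bucket in range(n)] for k in range(n)]
prob = Fraction(1, n)

# Expected load of each fixed bucket i.
expected_load = [sum(prob * sizes[i] for sizes in outcomes) for i in range(n)]

# Expected size of the bucket holding a fixed item u_i:
# in outcome k the item sits in bucket k, which holds sizes[k] items.
expected_own_bucket = sum(prob * sizes[k] for k, sizes in enumerate(outcomes))

print(expected_load)        # every entry equals 1
print(expected_own_bucket)  # 5
```

Each fixed bucket looks fine on average, yet the bucket that u_i actually lives in always holds all n items, which is why the condition is stated in terms of u_i's bucket.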

Expected number of items in u_i's bucket?

[Figure: h maps the Universe U into n buckets; u_i and u_j land in the same bucket. COLLISION!]

• $E[\text{number of items in } u_i\text{'s bucket}] = \sum_{j=1}^{n} P[h(u_i) = h(u_j)]$

• $= 1 + \sum_{j \neq i} P[h(u_i) = h(u_j)]$

• $= 1 + \sum_{j \neq i} 1/n$ (you will verify this on HW)

• $= 1 + \frac{n-1}{n} \leq 2.$

That's what we wanted.

That's great!

• For all i = 1,…,n,

• E[number of items in u_i's bucket] ≤ 2

This implies (as we saw before):

For any sequence of L INSERT/DELETE/SEARCH operations on any n elements of U (aka, anything Darth Vader might pick in Step 1 of the game), the expected runtime (over the random choice of h) is O(L). Aka, O(1) per operation.
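
For a small case the expectation can be computed exactly by brute force. This sketch assumes n = 4 and enumerates every way a uniformly random h could assign the n items to n buckets (indexed from 0); it is a sanity check on the formula, not part of the lecture's argument.

```python
from fractions import Fraction
from itertools import product

n = 4  # n items u_1, ..., u_n hashed into n buckets

total = 0
count = 0
# Each tuple lists the bucket of every item; all n**n tuples are equally likely
# when h is uniformly random.
for assignment in product(range(n), repeat=n):
    # Size of the bucket that holds the first item (the item counts itself).
    total += sum(1 for bucket in assignment if bucket == assignment[0])
    count += 1

expected = Fraction(total, count)
print(expected)  # 7/4, which is 1 + (n - 1)/n
```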

The elephant in the room

h(1)=2

h(2)=7

h(3)=9

h(4)=1

h(5)=0

h(6)=7

h(7)=2

h(8)=3

h(9)=7

h(10)=3

h(11)=4

h(12)=5

h(13)=7

h(14)=3

h(15)=2

h(16)=9

h(17)=3

h(18)=2

h(19)=1

h(20)=5

h(4511)=3

h(4512)=7

h(4513)=2

h(4514)=6

h(4515)=3

h(4516)=1

h(4517)=0

h(4518)=0

h(4519)=3

h(4520)=1

h(264511)=3

h(264512)=1

h(264513)=0

h(264514)=0

h(264515)=7

h(264516)=8

h(264517)=9

h(264518)=2

h(264519)=6

h(264520)=3

... ….

Randomization is fine… but we need to be able to store our choice of h!

• Say that this elephant-shaped blob represents the set of all hash functions.

• How big is this set?

• $n^{|U|} = n^M$ = REALLY BIG.

• In order to write down an arbitrary element of a set of size A, we need log(A) bits.

• So we'd need about M log(n) bits to remember one of these hash functions. That's enough to do direct addressing!!!!
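
The count can be confirmed by enumeration for a tiny case. The values M = 4 and n = 3 below are assumptions picked only so that listing every function is feasible.

```python
from itertools import product
from math import ceil, log2

M, n = 4, 3  # tiny universe size and bucket count

# A function from U to the buckets is a tuple of M bucket choices.
all_functions = list(product(range(n), repeat=M))
print(len(all_functions))  # 81, which is n ** M

# Bits needed to name one arbitrary function from this set.
bits_needed = ceil(log2(len(all_functions)))
print(bits_needed)  # 7, which is M * log2(n) rounded up
```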

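Another thought…

• Just remember h on the relevant values.

[Figure: "Algorithm now" sees the items 13, 22, 43, 92, 7 and records h(13) = 6, h(22) = 3, h(92) = 3; "Algorithm later" needs to look up h(13) = 6 again.]

But that's what we wanted to begin with… (remembering h on just the keys that show up is itself the INSERT/SEARCH problem we are trying to solve).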

Solution

• Pick from a smaller set of functions.

[Figure: a small region H inside the blob of all hash functions. A cleverly chosen subset of functions. We call such a subset a hash family.]

We need only log|H| bits to store an element of H.

How to pick the hash family?

• Let's go back to that computation from earlier…

[Figure: h maps the Universe U into n buckets; u_i and u_j land in the same bucket. COLLISION!]

• $E[\text{number of things in bucket } h(u_i)]$

• $= \sum_{j=1}^{n} P[h(u_i) = h(u_j)]$

• $= 1 + \sum_{j \neq i} P[h(u_i) = h(u_j)]$

• $\leq 1 + \sum_{j \neq i} 1/n$

• $= 1 + \frac{n-1}{n} \leq 2.$

So the expected number of items in u_i's bucket is O(1).

• All we needed was that $P[h(u_i) = h(u_j)] \leq 1/n$ for $j \neq i$.

Strategy

• Pick a small hash family H, so that when I choose h randomly from H,

for all $u_i, u_j \in U$ with $u_i \neq u_j$,

$P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

• Then we still get O(1)-sized buckets in expectation.

• But now the space we need is log(|H|) bits.

• Hopefully pretty small!

So the whole scheme will be

[Figure: h maps u_i from the Universe U into n buckets.]

Choose h randomly from a universal hash family H.

We can store h in small space since H is so small.

Probably these buckets will be pretty balanced.

What is this universal hash family?

• Here's one:

• Pick a prime p ≥ M.

• Define $f_{a,b}(x) = ax + b \mod p$

$h_{a,b}(x) = f_{a,b}(x) \mod n$

• Claim:

$H = \{ h_{a,b}(x) : a \in \{1, \ldots, p-1\},\ b \in \{0, \ldots, p-1\} \}$

is a universal hash family.

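Say what?

• Example: M = p = 5, n = 3

• To draw h from H:

• Pick a random a in {1,…,4}, b in {0,…,4}

• As per the definition (here with a = 2, b = 1):

• $f_{2,1}(x) = 2x + 1 \mod 5$

• $h_{2,1}(x) = f_{2,1}(x) \mod 3$

[Figure: the elements of U are drawn around a circle; $f_{2,1}$ sends them to $f_{2,1}(0), \ldots, f_{2,1}(4)$ around a second circle labeled 0 through 4. This step just scrambles stuff up. No collisions here! Then "mod 3" sends the second circle into the buckets. This step is the one where two different elements might collide.]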
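
The example can be tabulated directly. This sketch assumes the universe is written as {0, 1, 2, 3, 4} and that buckets are indexed 0, 1, 2 (the raw output of "mod 3"); the function names `f` and `h` mirror the slide's notation.

```python
p, n = 5, 3
a, b = 2, 1


def f(x):
    # First step: scramble within {0, ..., p-1}.
    return (a * x + b) % p


def h(x):
    # Second step: fold the scrambled value into n buckets.
    return f(x) % n


for x in range(p):
    print(x, f(x), h(x))
# 0 1 1
# 1 3 0
# 2 0 0
# 3 2 2
# 4 4 1
```

The middle column is a permutation of 0 through 4, so the first step never causes a collision; the collisions (0 with 4, and 1 with 2) appear only after reducing mod 3.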

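Ignoring why this is a good idea… how big is H?

• We have p-1 choices for a, and p choices for b.

• So |H| = p(p-1) = O(M^2) (taking p = O(M); there is always a prime between M and 2M).

• This is much better than n^M!!!!

• Space needed to store h: O(log(M)).

[Figure: an arbitrary function from the set of all hash functions needs O(M log(n)) bits; a function from the small subset H needs O(log(M)) bits.]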
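
To get a feel for the gap, the sketch below plugs in hypothetical sizes: a universe of M = 10^9 keys, n = 1000 buckets, and the prime p = 2^31 - 1 (which is at least M). None of these numbers come from the lecture.

```python
from math import log2

M = 10 ** 9        # hypothetical universe size
n = 1000           # hypothetical number of buckets
p = 2 ** 31 - 1    # a prime with p >= M

# Naming an arbitrary function U -> {1, ..., n}: about M * log2(n) bits.
bits_arbitrary = M * log2(n)

# Naming a member of H: just store a and b, each smaller than p.
bits_family = 2 * p.bit_length()

print(f"{bits_arbitrary:.3g} bits for an arbitrary function")
print(bits_family, "bits for h_{a,b}")  # 62
```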

Why does this work?

• This is actually a little complicated.

• I'll go over the argument now, because it's a good example of how to reason about hash functions.

• Fancy counting!

• BUT! Don't worry if you don't follow all the calculations right now.

• You can always take a look back at the slides or lecture notes later.

• The important part is the structure of the argument.

• Want to show:

• for all $u_i, u_j \in U$ with $u_i \neq u_j$, $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

• aka, the probability of any two elements colliding is small.

• Let's just fix two elements and see an example.

• Let's consider $u_i = 0$, $u_j = 1$.

Convince yourself that it will be the same for any pair!

[Figure: as in the example, $f_{a,b}(x) = ax + b \mod p$ maps the circle of elements of U to a second circle labeled 0 through 4, and then "mod 3" maps that circle into the buckets.]

The probability that 0 and 1 collide is small

• Want to show:

• $P_{h \in H}[h(0) = h(1)] \leq \frac{1}{n}$

• For any $y_0 \neq y_1 \in \{0,1,2,3,4\}$, how many a, b are there so that $f_{a,b}(0) = y_0$ and $f_{a,b}(1) = y_1$?

• Claim: it's exactly one.

• Proof: solve the system of eqs. for a and b:

$a \cdot 0 + b = y_0 \mod p$

$a \cdot 1 + b = y_1 \mod p$

• E.g., $y_0 = 3$, $y_1 = 1$ (with p = 5 this gives b = 3 and a = 3).

[Figure: $f_{a,b}(x) = ax + b \mod p$ maps the circle of elements of U to a second circle labeled 0 through 4, and then "mod 3" maps that circle into the buckets.]

• If 0 and 1 collide it's b/c there's some $y_0 \neq y_1$ so that:

• $f_{a,b}(0) = y_0$ and $f_{a,b}(1) = y_1$ (these differ because a ≠ 0).

• $y_0 = y_1 \mod n$.

• The number of a, b so that 0, 1 collide under $h_{a,b}$ is at most the number of $y_0 \neq y_1$ so that $y_0 = y_1 \mod n$.

• How many is that?

• We have p choices for $y_0$, then at most 1/n of the remaining p-1 are valid choices for $y_1$…

• So at most $p \cdot \frac{p-1}{n}$.

• The probability (over a, b) that 0, 1 collide under $h_{a,b}$ is:

• $P_{h \in H}[h(0) = h(1)] \leq \frac{p \cdot \frac{p-1}{n}}{|H|}$

• $= \frac{p \cdot \frac{p-1}{n}}{p(p-1)}$

• $= \frac{1}{n}.$

The same argument goes for any pair

for all $u_i, u_j \in U$ with $u_i \neq u_j$,

$P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

That's the definition of a universal hash family.

So this family H indeed does the trick.
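
For small parameters the universal property can also be checked exhaustively. The sketch below assumes the universe is {0, …, p-1} (so M = p) and tries every pair of distinct keys against every (a, b); the function name `max_collision_probability` is made up for this illustration.

```python
from fractions import Fraction
from itertools import combinations


def max_collision_probability(p, n):
    """Largest collision probability over all pairs of distinct keys in {0, ..., p-1}."""
    family = [(a, b) for a in range(1, p) for b in range(p)]
    worst = Fraction(0)
    for x, y in combinations(range(p), 2):
        collisions = sum(
            1
            for a, b in family
            if ((a * x + b) % p) % n == ((a * y + b) % p) % n
        )
        worst = max(worst, Fraction(collisions, len(family)))
    return worst


print(max_collision_probability(5, 3))                    # 1/5
print(max_collision_probability(5, 3) <= Fraction(1, 3))  # True
```

For p = 5 and n = 3, exactly 4 of the 20 functions make any given pair collide, comfortably within the 1/n bound.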

So the whole scheme will be

[Figure: h maps u_i from the Universe U of size M into n buckets.]

Choose h randomly from H.

We can store h in space O(log(M)).

The expected time to do any L operations on these n elements is O(L).
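
Putting the pieces together, a hash table along these lines might look like the sketch below. It is one possible rendering, not the lecture's reference implementation: the class name `UniversalHashTable` is hypothetical, chains are Python lists, buckets are indexed from 0, and the caller is trusted to pass a prime at least as large as every key.

```python
import random


class UniversalHashTable:
    """Chaining hash table whose hash function h_{a,b} is drawn at random from H."""

    def __init__(self, num_buckets, prime):
        self.n = num_buckets
        self.p = prime  # assumed to be prime and at least the universe size M
        # Storing h only takes the two numbers a and b: O(log M) bits.
        self.a = random.randrange(1, prime)
        self.b = random.randrange(0, prime)
        self.buckets = [[] for _ in range(num_buckets)]

    def _h(self, key):
        return ((self.a * key + self.b) % self.p) % self.n

    def insert(self, key):
        self.buckets[self._h(key)].append(key)

    def search(self, key):
        return key in self.buckets[self._h(key)]

    def delete(self, key):
        chain = self.buckets[self._h(key)]
        if key in chain:
            chain.remove(key)


# The operation sequence from the game, with n = 5 items and the prime 101.
table = UniversalHashTable(num_buckets=5, prime=101)
for key in (13, 22, 43, 92, 7):
    table.insert(key)
found_43 = table.search(43)
table.delete(92)
found_7 = table.search(7)
found_92 = table.search(92)
table.insert(92)
print(found_43, found_7, found_92)  # True True False
```

The answers to the operations never depend on which a and b were drawn; the random choice only affects how the keys are spread across buckets, and hence the running time.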

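Recap

Want O(1) INSERT/DELETE/SEARCH

• We are interested in putting nodes with keys into a data structure that supports fast INSERT/DELETE/SEARCH.

• INSERT

• DELETE

• SEARCH

[Figure: INSERT, DELETE, and SEARCH operations on a box labeled "data structure"; the SEARCH answers "HERE IT IS".]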

We studied this game

1. An adversary chooses any n items $u_1, u_2, \ldots, u_n \in U$, and any sequence of L INSERT/DELETE/SEARCH operations on those items.

For example, the items 13 22 43 92 7 and the sequence INSERT 13, INSERT 22, INSERT 43, INSERT 92, INSERT 7, SEARCH 43, DELETE 92, SEARCH 7, INSERT 92.

2. You, the algorithm, choose a random hash function $h: U \to \{1, \ldots, n\}$.

3. HASH IT OUT

[Figure: the items 13, 22, 43, 92, 7 hashed into buckets 1, 2, 3, …, n.]

Uniformly random h was good

• If we choose h uniformly at random, for all $u_i, u_j \in U$ with $u_i \neq u_j$,

$P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

• That was enough to ensure that, in expectation, a bucket isn't too full.

A bit more formally:

For any sequence of L INSERT/DELETE/SEARCH operations on any n elements of U, the expected runtime (over the random choice of h) is O(L). Aka, O(1) per operation.

Uniformly random h was bad

• If we actually want to implement this, we have to store the hash function h!

• That takes a lot of space!

• We may as well have just initialized a bucket for every single item in U.

• Instead, we chose a function randomly from a smaller set.

We needed a smaller set that still has this property

• If we choose h uniformly at random, for all $u_i, u_j \in U$ with $u_i \neq u_j$,

$P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

This was all we needed to make sure that the buckets were balanced in expectation!

• We call any set with that property a universal hash family.

• We were able to come up with a really small one!

Conclusion:

• We can build a hash table that supports INSERT/DELETE/SEARCH in O(1) expected time,

• if we know that only n items are ever going to show up, where n is waaaayyyyyy less than the size M of the universe.

• The space to implement this hash table is O(n log(M)).

• M is waaayyyyyy bigger than n, but log(M) probably isn't.

Next Week

• Graph algorithms!
