lecture 8 - web.stanford.edu

Lecture 8 HASHING!!!!!


Page 1: Lecture 8 - web.stanford.edu

Lecture 8 HASHING!!!!!

Page 2: Lecture 8 - web.stanford.edu

Announcements

• HW3 due Friday!

• HW4 posted Friday!

• Q: Where can I see examples of proofs?
  • Lecture Notes
  • CLRS
  • HW Solutions

• Office hours: lines are long :(

• Solutions:
  • We will be (more) mindful of throughput.
  • Get more TAs
  • Stop assigning homework
  • Use Piazza!
  • Start early. (There are no lines on Monday!)

Page 3: Lecture 8 - web.stanford.edu

Today: hashing

[Figure: a hash table with n=9 buckets (labeled 1, 2, 3, …, 9); the keys 13, 22, 43 and 9 are stored in chains, and empty buckets point to NIL.]

Page 4: Lecture 8 - web.stanford.edu

Outline

• Hash tables are another sort of data structure that allows fast INSERT/DELETE/SEARCH.

  • like self-balancing binary trees

  • The difference is we can get better performance in expectation by using randomness.

• Hash families are the magic behind hash tables.

• Universal hash families are even more magic.

Page 5: Lecture 8 - web.stanford.edu

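Goal: Just like on Monday

• We are interested in putting nodes with keys into a data structure that supports fast INSERT/DELETE/SEARCH.

• INSERT

• DELETE

• SEARCH

[Figure: a "data structure" box supporting INSERT, DELETE and SEARCH; a search answers "HERE IT IS" and returns the node with key "2".]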

Page 6: Lecture 8 - web.stanford.edu

Today:

• Hash tables:

  • O(1) expected time INSERT/DELETE/SEARCH

  • Worse worst-case performance, but often great in practice.

On Monday:

• Self-balancing trees:

  • O(log(n)) deterministic INSERT/DELETE/SEARCH

#prettysweet

#evensweeterinpractice

e.g., Python's dict, Java's HashSet/HashMap, C++'s unordered_map

Hash tables are used for databases, caching, object representation, …

Page 7: Lecture 8 - web.stanford.edu

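One way to get O(1) time

• Say all keys are in the set {1,2,3,4,5,6,7,8,9}.

• INSERT:

• DELETE:

• SEARCH:

[Figure: an array with one slot for each key 1–9. INSERT 9, 6, 3, 5 places each key in its own slot; DELETE 6 empties slot 6; SEARCH 3 and SEARCH 2 each look in a single slot: "3 is here."]

This is called "direct addressing".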
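To make the picture concrete, here is a minimal sketch of direct addressing in Python. The class name `DirectAddressTable` and its method names are made up for illustration; the only thing taken from the slide is the idea of one slot per possible key.

```python
class DirectAddressTable:
    """Direct addressing for keys drawn from {1, ..., universe_size}."""

    def __init__(self, universe_size):
        # One slot per possible key; slot k-1 records whether key k is present.
        self.present = [False] * universe_size

    def insert(self, key):
        self.present[key - 1] = True

    def delete(self, key):
        self.present[key - 1] = False

    def search(self, key):
        return self.present[key - 1]


table = DirectAddressTable(9)
for key in (9, 6, 3, 5):
    table.insert(key)
table.delete(6)
print(table.search(3))  # True
print(table.search(6))  # False
```

Every operation touches exactly one slot, which is why it is O(1); the catch is that the array has one slot for every key in the universe.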

Page 8: Lecture 8 - web.stanford.edu

That should look familiar

• Kind of like BUCKETSORT from Lecture 6.

• Same problem: if the keys may come from a universe U = {1, 2, …, 10000000000}…

Page 9: Lecture 8 - web.stanford.edu

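The solution then was…

• Put things in buckets based on one digit.

INSERT: 21 345 13 101 50 234 1

[Figure: ten buckets labeled 0–9, keyed by least significant digit: bucket 0 holds 50; bucket 1 holds 21, 101, 1; bucket 3 holds 13; bucket 4 holds 234; bucket 5 holds 345.]

Now SEARCH 21

It's in this bucket somewhere… go through until we find it.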

Page 10: Lecture 8 - web.stanford.edu

INSERT: 22 342 12 102 52 232 2

Problem…

[Figure: ten buckets labeled 0–9; all of 22, 342, 12, 102, 52, 232, 2 land in bucket 2.]

Now SEARCH 22… this hasn't made our lives easier…

Page 11: Lecture 8 - web.stanford.edu

Hash tables

• That was an example of a hash table.

  • not a very good one, though.

• We will be more clever (and less deterministic) about our bucketing.

• This will result in fast (expected time) INSERT/DELETE/SEARCH.

Page 12: Lecture 8 - web.stanford.edu

But first! Terminology.

• We have a universe U, of size M.

  • M is really big.

• But only a few (say at most n for today's lecture) elements of U are ever going to show up.

  • M is waaaayyyyyyy bigger than n.

• But we don't know which ones will show up in advance.

[Figure: a blob labeled "Universe U". All of the keys in the universe live in this blob. A few elements are special and will actually show up.]

Example: U is the set of all strings of at most 140 ASCII characters (roughly $128^{140}$ of them). The only ones which I care about are those which appear as trending hashtags on Twitter. #hashinghashtags
There are way fewer than $128^{140}$ of these.

Examples aside, I'm going to draw elements like I always do, as blue boxes with integers in them…

Page 13: Lecture 8 - web.stanford.edu

The previous example with this terminology

• We have a universe U, of size M.
  • at most n of which will show up.

• M is waaaayyyyyy bigger than n.

• We will put items of U into n buckets.

• There is a hash function h: U → {1,…,n} which says what element goes in what bucket.

[Figure: the "Universe U" blob (all of the keys in the universe live in this blob) mapped into n buckets labeled 1, 2, 3, …, with h(x) = least significant digit of x.]

For this lecture, I'm assuming that the number of things is the same as the number of buckets; both are n. This doesn't have to be the case, although we do want:

#buckets = O(#things which show up)

Page 14: Lecture 8 - web.stanford.edu

This is a hash table (with chaining)

• Array of n buckets.

• Each bucket stores a linked list.
  • We can insert into a linked list in time O(1).
  • To find something in the linked list takes time O(length(list)).

• h: U → {1,…,n} can be any function:
  • but for concreteness let's stick with h(x) = least significant digit of x.

For demonstration purposes only! This is a terrible hash function! Don't use this!

INSERT: 13, 22, 43, 9

[Figure: n buckets (say n=9) labeled 1, 2, 3, …, 9; bucket 2 holds 22, bucket 3 holds the chain 13, 43, and bucket 9 holds 9.]

SEARCH 43: Scan through all the elements in bucket h(43) = 3.
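A minimal sketch of a chained hash table, under a few assumptions that are not on the slide: Python lists stand in for the linked lists, the class name `ChainedHashTable` is hypothetical, and there are 10 buckets indexed 0–9 so that every least significant digit has a bucket.

```python
class ChainedHashTable:
    """Hash table with chaining; each bucket is a Python list standing in for a linked list."""

    def __init__(self, n_buckets, hash_fn):
        self.buckets = [[] for _ in range(n_buckets)]
        self.hash_fn = hash_fn  # maps a key to a bucket index in range(n_buckets)

    def insert(self, key):
        # Add to the end of the chain (no duplicate check, to keep the sketch short).
        self.buckets[self.hash_fn(key)].append(key)

    def search(self, key):
        # Scan only the chain that the key hashes to: O(length of that chain).
        return key in self.buckets[self.hash_fn(key)]

    def delete(self, key):
        chain = self.buckets[self.hash_fn(key)]
        if key in chain:
            chain.remove(key)


def least_significant_digit(x):
    return x % 10  # the "terrible" demonstration hash function


table = ChainedHashTable(10, least_significant_digit)
for key in (13, 22, 43, 9):
    table.insert(key)
print(table.buckets[3])   # [13, 43]
print(table.search(43))   # True
```

SEARCH 43 only ever looks at bucket 3, so its cost is the length of that one chain.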

Page 15: Lecture 8 - web.stanford.edu

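Aside: Hash tables with open addressing

• The previous slide is about hash tables with chaining.

• There's also something called "open addressing".

• Read in CLRS if you are interested!

[Figure: two tables with n=9 buckets. With chaining, 13 and 43 share bucket 3 in a list ("This is a 'chain'"); with open addressing, 13 and 43 occupy separate slots of the array.]

\end{Aside}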

Page 16: Lecture 8 - web.stanford.edu


Page 17: Lecture 8 - web.stanford.edu

IPython notebook time

• (Seems to work!)

• (Will this example be a good idea?)

Page 18: Lecture 8 - web.stanford.edu

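Sometimes this is a good idea. Sometimes this is a bad idea.

• How do we pick that function so that this is a good idea?

1. We want there to be not many buckets (say, n).
  • This means we don't use too much space.

2. We want the items to be pretty spread out in the buckets.
  • This means it will be fast to SEARCH/INSERT/DELETE.

[Figure: two tables with n=9 buckets. In one, the keys 13, 22, 43, 9 are spread across the buckets; vs. the other, where 13, 43 and 93 pile up in one bucket while 21 sits in another.]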

Page 19: Lecture 8 - web.stanford.edu

Worst-case analysis

• Design a function h: U → {1,…,n} so that:
  • No matter what input (fewer than n items of U) a bad guy chooses, the buckets will be balanced.
  • Here, balanced means O(1) entries per bucket.

• If we had this, then we'd achieve our dream of O(1) INSERT/DELETE/SEARCH.

Can you come up with such a function?

Page 20: Lecture 8 - web.stanford.edu
Page 21: Lecture 8 - web.stanford.edu

We really can't beat the bad guy here.

[Figure: the "Universe U" blob mapped by h(x) into n buckets; one region of U is marked "These are all the things that hash to the first bucket."]

• The universe U has M items.

• They get hashed into n buckets.

• At least one bucket has at least M/n items hashed to it.

• M is WAAYYYYY bigger than n, so M/n is bigger than n.

• Bad guy chooses n of the items that landed in this very full bucket.
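The pigeonhole argument above can be played out in a few lines. This is only a sketch: the helper name `adversarial_keys`, the universe size M = 1000, and the choice of `x % n` as the fixed hash function are all assumptions made for illustration.

```python
from collections import defaultdict


def adversarial_keys(hash_fn, universe, n):
    """Return n keys from `universe` that all land in the same bucket of hash_fn."""
    buckets = defaultdict(list)
    for key in universe:
        buckets[hash_fn(key)].append(key)
    # By pigeonhole, the fullest bucket holds at least M/n keys.
    fullest = max(buckets.values(), key=len)
    return fullest[:n]


n = 10
universe = range(1, 1001)  # M = 1000, so M/n = 100 >= n


def fixed_hash(x):
    return x % n  # any fixed (deterministic) hash function would do


bad_keys = adversarial_keys(fixed_hash, universe, n)
print(bad_keys)                            # n keys chosen by the bad guy
print({fixed_hash(k) for k in bad_keys})   # a single bucket index
```

Whatever deterministic function replaces `fixed_hash`, the same helper finds n keys that all collide, as long as M/n is at least n.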

Page 22: Lecture 8 - web.stanford.edu

Solution:

Randomness

Page 23: Lecture 8 - web.stanford.edu

The game

1. An adversary chooses any n items $u_1, u_2, \dots, u_n \in U$, and any sequence of INSERT/DELETE/SEARCH operations on those items.

   e.g., the items 13, 22, 43, 92, 7 and the sequence: INSERT 13, INSERT 22, INSERT 43, INSERT 92, INSERT 7, SEARCH 43, DELETE 92, SEARCH 7, INSERT 92

2. You, the algorithm, choose a random hash function $h: U \to \{1, \dots, n\}$.

3. HASH IT OUT  #hashpuns

[Figure: buckets 1, 2, 3, …, n holding the keys 13, 22, 92, 43, 7.]

Plucky the pedantic penguin: What does random mean here? Uniformly random?

Page 24: Lecture 8 - web.stanford.edu

Example

• Say that h is uniformly random.

• That means that h(1) is a uniformly random number between 1 and n.

• h(2) is also a uniformly random number between 1 and n, independent of h(1).

• h(3) is also a uniformly random number between 1 and n, independent of h(1), h(2).

• …

• h(M) is also a uniformly random number between 1 and n, independent of h(1), h(2), …, h(M-1).

[Figure: h maps Universe U to n buckets.]

Page 25: Lecture 8 - web.stanford.edu

Why should that help?

Intuitively: The bad guy can't foil a hash function that he doesn't yet know.

Why not? What if there's some strategy that foils a random function with high probability?

We'll need to do some analysis…

Page 26: Lecture 8 - web.stanford.edu

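What do we want?

[Figure: buckets 1, 2, 3, …, n holding keys such as 14, 22, 92, 43, 8; one bucket holds u_i together with 7, 32, 5, 15.]

It's bad if lots of items land in u_i's bucket.

So we want not that.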

Page 27: Lecture 8 - web.stanford.edu

More precisely

[Figure: buckets 1, 2, 3, …, n holding 14, 22, 92, 43, 8 and u_i.]

• We want:
  • For all u_i that the bad guy chose,
  • E[number of items in u_i's bucket] ≤ 2.

• If that were the case,
  • for each operation involving u_i,
  • E[time of operation] = O(1).

So, in expectation, it would take O(1) time per INSERT/DELETE/SEARCH operation.

Page 28: Lecture 8 - web.stanford.edu

So we want:

• For all i=1,…,n,

E[number of items in u_i's bucket] ≤ 2.

Page 29: Lecture 8 - web.stanford.edu

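Aside: why not:

• For all i=1,…,n:

E[number of items in bucket i] ≤ ___?

Suppose that:

[Figure: all of the items (14, 22, 92, 43, 8) land together in a single bucket; this happens with probability 1/n.]

[Figure: all of the items land together in a different single bucket; and this happens with probability 1/n, etc.]

Then E[number of items in bucket i] = 1 for all i.

But P{the buckets get big} = 1.

(This slide skipped in class.)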

Page 30: Lecture 8 - web.stanford.edu

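Expected number of items in u_i's bucket?

(h is uniformly random.)

[Figure: h maps Universe U to n buckets; u_i and u_j land in the same bucket: COLLISION!]

• $E[\text{number of items in } u_i\text{'s bucket}] = \sum_{j=1}^{n} P[h(u_i) = h(u_j)]$  (you will verify this on HW)

• $= 1 + \sum_{j \neq i} P[h(u_i) = h(u_j)]$

• $= 1 + \sum_{j \neq i} 1/n$

• $= 1 + \frac{n-1}{n} \leq 2.$

That's what we wanted.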
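As a sanity check on the bound 1 + (n-1)/n, here is a small Monte Carlo sketch. It assumes n = 50 items and 20000 trials (arbitrary choices), and it only samples h on the n items that show up, which is all the calculation depends on. The function name `average_bucket_load` is made up.

```python
import random


def average_bucket_load(n, trials, seed=0):
    """Estimate E[number of items in u_1's bucket] for a uniformly random h."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # h(u_1), ..., h(u_n): independent, uniform over the n buckets.
        buckets = [rng.randrange(n) for _ in range(n)]
        # Count every item sharing u_1's bucket (this includes u_1 itself).
        total += buckets.count(buckets[0])
    return total / trials


n = 50
estimate = average_bucket_load(n=n, trials=20000)
print(estimate)          # empirical average
print(1 + (n - 1) / n)   # the value from the calculation: 1.98
```

The two printed numbers should be close to each other, and both stay below 2.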

Page 31: Lecture 8 - web.stanford.edu

That's great!

• For all i=1,…,n,
  • E[number of items in u_i's bucket] ≤ 2

• This implies (as we saw before):
  • For any sequence of INSERT/DELETE/SEARCH operations on any n elements of U, the expected runtime (over the random choice of h) is O(1) per operation.

So, the solution is:

pick a uniformly random hash function.

Page 32: Lecture 8 - web.stanford.edu

The elephant in the room

Page 33: Lecture 8 - web.stanford.edu

The elephant in the room

How do we do that?

Page 34: Lecture 8 - web.stanford.edu

Let's implement this!

• IPython Notebook for Lecture 8

Page 35: Lecture 8 - web.stanford.edu

Let's NOT implement this!

Issues:

• Suppose U = {all of the possible hashtags}.

• If we completely choose the random function up front, we have to iterate through all of U.
  • $128^{140}$ possible ASCII strings of length 140.
  • (More than the number of particles in the universe)

• And even ignoring the time considerations:
  • We have to store h(x) for every x.

Page 36: Lecture 8 - web.stanford.edu

Another thought…

• Just remember h on the relevant values.

[Figure: "Algorithm now" sees the keys 13, 22, 43, 92, 7 and records h(13)=6, h(22)=3, h(92)=3; "Algorithm later" has to recall that h(13)=6.]

Page 37: Lecture 8 - web.stanford.edu

How much space does it take to store h?

• For each element x of U:
  • store h(x)
  • (which is a random number in {1,…,n}).

• Storing a number in {1,…,n} takes log(n) bits.

• So storing M of them takes M log(n) bits.

• In contrast, direct addressing would require M bits.

Page 38: Lecture 8 - web.stanford.edu

Hang on now

• Sure, that way of storing the function h won't work.

• But maybe there's another way?

Page 39: Lecture 8 - web.stanford.edu

Aside: description length

• Say I have a set S with s things in it.

• I get to write down the elements of S however I like.
  • (in binary)

• How many bits do I need?

[Figure: the set S. "I'll call this one 'Fido'." Or, 01101011. "This one is named 'Hercules'." Or, 101.]

On board: the answer is log(s).

Page 40: Lecture 8 - web.stanford.edu

Space needed to store a random fn h?

• Say that this elephant-shaped blob represents the set of all hash functions.

• It has size $n^M$. (Really big!)

• To write down a random hash function, we need $\log(n^M) = M \log(n)$ bits. :(

Page 41: Lecture 8 - web.stanford.edu

Solution

• Pick from a smaller set of functions.

[Figure: a small region H inside the blob of all hash functions.]

A cleverly chosen subset of functions. We call such a subset a hash family.

We need only log|H| bits to store an element of H.

Page 42: Lecture 8 - web.stanford.edu

Outline

• Hash tables are another sort of data structure that allows fast INSERT/DELETE/SEARCH.

  • like self-balancing binary trees

  • The difference is we can get better performance in expectation by using randomness.

• Hash families are the magic behind hash tables.

• Universal hash families are even more magic.

Page 43: Lecture 8 - web.stanford.edu

Hash families

• A hash family is a collection of hash functions.

"All of the hash functions" is an example of a hash family.

Page 44: Lecture 8 - web.stanford.edu

Example: a smaller hash family

• H = {function which returns the least sig. digit, function which returns the most sig. digit}

• Pick h in H at random.

• Store just one bit to remember which we picked.

This is still a terrible idea! Don't use this example! For pedagogical purposes only!

Page 45: Lecture 8 - web.stanford.edu

The game

h0 = Most_significant_digit
h1 = Least_significant_digit
H = {h0, h1}

1. An adversary (who knows H) chooses any n items $u_1, u_2, \dots, u_n \in U$, and any sequence of INSERT/DELETE/SEARCH operations on those items.

   e.g., the items 19, 22, 42, 92, 0 and the sequence: INSERT 19, INSERT 22, INSERT 42, INSERT 92, INSERT 0, SEARCH 42, DELETE 92, SEARCH 0, INSERT 92

2. You, the algorithm, choose a random hash function $h: U \to \{0, \dots, 9\}$. Choose it randomly from H. ("I picked h1.")

3. HASH IT OUT  #hashpuns

[Figure: buckets 0, 1, 2, …, 9 under h1: bucket 0 holds 0; bucket 2 holds 22, 92, 42; bucket 9 holds 19.]

Page 46: Lecture 8 - web.stanford.edu

The game

h0 = Most_significant_digit
h1 = Least_significant_digit
H = {h0, h1}

1. An adversary (who knows H) chooses any n items $u_1, u_2, \dots, u_n \in U$, and any sequence of INSERT/DELETE/SEARCH operations on those items.

   e.g., the items 11, 101, 111, 121, 131, 141

2. You, the algorithm, choose a random hash function $h: U \to \{0, \dots, 9\}$. Choose it randomly from H. ("I picked h1.")

3. HASH IT OUT  #hashpuns

[Figure: buckets 0, 1, 2, …, 9: all of 11, 101, 111, 121, 131, 141 land in bucket 1.]

This adversary could have been more adversarial!
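Here is a short sketch of why this two-function family gives the adversary an easy time. The function names mirror h0 and h1 from the slide; the key list is the one shown in the figure.

```python
def most_significant_digit(x):
    return int(str(x)[0])   # h0


def least_significant_digit(x):
    return x % 10           # h1


H = [most_significant_digit, least_significant_digit]
keys = [11, 101, 111, 121, 131, 141]

for h in H:
    # The set of buckets these keys occupy under h.
    print(h.__name__, {h(k) for k in keys})
```

Both lines print the single bucket {1}: these keys start and end with the digit 1, so it does not matter which function gets picked from H. Knowing H is enough for the adversary, even without knowing the random choice.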

Page 47: Lecture 8 - web.stanford.edu

Outline

• Hash tables are another sort of data structure that allows fast INSERT/DELETE/SEARCH.

  • like self-balancing binary trees

  • The difference is we can get better performance in expectation by using randomness.

• Hash families are the magic behind hash tables.

• Universal hash families are even more magic.

Page 48: Lecture 8 - web.stanford.edu

How to pick the hash family?

• Definitely not like in that example.

• Let's go back to that computation from earlier….

Page 49: Lecture 8 - web.stanford.edu

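Expected number of items in u_i's bucket?

[Figure: h maps Universe U to n buckets; u_i and u_j land in the same bucket: COLLISION!]

• $E[\text{number of items in } u_i\text{'s bucket}] = \sum_{j=1}^{n} P[h(u_i) = h(u_j)]$  (you will verify this on HW)

• $= 1 + \sum_{j \neq i} P[h(u_i) = h(u_j)]$

• $= 1 + \sum_{j \neq i} 1/n$

• $= 1 + \frac{n-1}{n} \leq 2.$

So the number of items in u_i's bucket is O(1) in expectation.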

Page 50: Lecture 8 - web.stanford.edu

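How to pick the hash family?

• Let's go back to that computation from earlier….

• $E[\text{number of things in bucket } h(u_i)]$

• $= \sum_{j=1}^{n} P[h(u_i) = h(u_j)]$

• $= 1 + \sum_{j \neq i} P[h(u_i) = h(u_j)]$

• $\leq 1 + \sum_{j \neq i} 1/n$

• $= 1 + \frac{n-1}{n} \leq 2.$

• All we needed was that this collision probability, $P[h(u_i) = h(u_j)]$ for $j \neq i$, is ≤ 1/n.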

Page 51: Lecture 8 - web.stanford.edu

Strategy

• Pick a small hash family H, so that when I choose h randomly from H,

  for all $u_i, u_j \in U$ with $u_i \neq u_j$,

  $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

  In English: fix any two elements of U. The probability that they collide under a random h in H is small.

• A hash family H that satisfies this is called a universal hash family.

• Then we still get O(1)-sized buckets in expectation.

• But now the space we need is log(|H|) bits.
  • Hopefully pretty small!

Page 52: Lecture 8 - web.stanford.edu

So the whole scheme will be

[Figure: h maps Universe U, containing u_i, to n buckets.]

Choose h randomly from a universal hash family H.

We can store h in small space since H is so small.

Probably these buckets will be pretty balanced.

Page 53: Lecture 8 - web.stanford.edu

Universal hash family

Let's stare at this definition

• H is a universal hash family if:
  • When h is chosen uniformly at random from H,

    for all $u_i, u_j \in U$ with $u_i \neq u_j$,

    $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

You actually saw this in your pre-lecture exercise!

Toads = hash fns

Ice cream = items

"Like" and "Dislike" = buckets

Page 54: Lecture 8 - web.stanford.edu

Check our understanding…

• H is a universal hash family if:
  • When h is chosen uniformly at random from H,

    for all $u_i, u_j \in U$ with $u_i \neq u_j$,

    $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

• H is [something else] if:
  • When h is chosen uniformly at random from H,

    for all $u \in U$, for all $x \in \{0, \dots, n-1\}$,

    $P_{h \in H}[h(u) = x] \leq \frac{1}{n}$

Are these different?

(Slide (probably) skipped in class.)

Page 55: Lecture 8 - web.stanford.edu

Pre-lecture exercise

Universe = {vanilla, chocolate}

Buckets = {like, dislike}

Toads = different possible ways of distributing items

Statement 1: P[random toad likes vanilla] = ½, P[random toad likes chocolate] = ½

  P["vanilla" lands in the bucket "like"] = ½

Statement 2: P[random toad feels the same about chocolate and vanilla] = ½

  P[vanilla and chocolate land in the same bucket] = ½

(Slide skipped in class.)

Page 56: Lecture 8 - web.stanford.edu

Pre-lecture exercise

Universe = {vanilla, chocolate}

Buckets = {like, dislike}

Toads = different possible ways of distributing items

Seem like they might be the same…?

Statement 1: P[random toad likes vanilla] = ½, P[random toad likes chocolate] = ½

  P["vanilla" lands in the bucket "like"] = ½

Statement 2: P[random toad feels the same about chocolate and vanilla] = ½

  P[vanilla and chocolate land in the same bucket] = ½

(Slide skipped in class.)

Page 57: Lecture 8 - web.stanford.edu

Pre-lecture exercise

Universe = {vanilla, chocolate}

Buckets = {like, dislike}

Toads = different possible ways of distributing items

But no! 1 is true but 2 is not.

Statement 1: P[random toad likes vanilla] = ½, P[random toad likes chocolate] = ½

  P["vanilla" lands in the bucket "like"] = ½

Statement 2: P[random toad feels the same about chocolate and vanilla] = ½

  P[vanilla and chocolate land in the same bucket] = ½

(Slide skipped in class.)

Page 58: Lecture 8 - web.stanford.edu

Check our understanding…

• H is a universal hash family if:
  • When h is chosen uniformly at random from H,

    for all $u_i, u_j \in U$ with $u_i \neq u_j$,

    $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

• H is [something else] if:
  • When h is chosen uniformly at random from H,

    for all $u \in U$, for all $x \in \{0, \dots, n-1\}$,

    $P_{h \in H}[h(u) = x] \leq \frac{1}{n}$

These are different!

(Slide skipped in class.)
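One way to see that the two properties differ is to enumerate small toad families directly. The sketch below is an illustration, not the actual pre-lecture exercise: the family `constant_toads` (one toad likes everything, one dislikes everything) is a hypothetical example chosen because each flavor lands in "like" with probability ½, yet the two flavors always collide.

```python
from fractions import Fraction
from itertools import product

items = ["vanilla", "chocolate"]
buckets = ["like", "dislike"]


def prob(family, event):
    """Probability of `event` when a toad (hash function) is drawn uniformly from `family`."""
    return Fraction(sum(1 for h in family if event(h)), len(family))


# All four possible toads: this is the uniformly random hash function.
all_toads = [dict(zip(items, choice)) for choice in product(buckets, repeat=2)]
# Hypothetical family of two toads that each give both flavors the same verdict.
constant_toads = [{"vanilla": b, "chocolate": b} for b in buckets]

for name, family in [("all four toads", all_toads), ("constant toads", constant_toads)]:
    p_like = prob(family, lambda h: h["vanilla"] == "like")
    p_same = prob(family, lambda h: h["vanilla"] == h["chocolate"])
    print(name, p_like, p_same)
# all four toads 1/2 1/2
# constant toads 1/2 1
```

Both families spread each single item evenly over the buckets, but only the first keeps the collision probability at 1/n = ½, which is what universality asks for.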

Page 59: Lecture 8 - web.stanford.edu

Example

• Uniformly random hash function h
  • [We just saw this]
  • [Of course, this one has other downsides…]

• Pick a small hash family H, so that when I choose h randomly from H,

  for all $u_i, u_j \in U$ with $u_i \neq u_j$,

  $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

Page 60: Lecture 8 - web.stanford.edu

Non-example

• h0 = Most_significant_digit

• h1 = Least_significant_digit

• H = {h0, h1}

• [discussion on board]

• Pick a small hash family H, so that when I choose h randomly from H,

  for all $u_i, u_j \in U$ with $u_i \neq u_j$,

  $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

Page 61: Lecture 8 - web.stanford.edu

A small universal hash family??

• Here's one:
  • Pick a prime $p \geq M$.
  • Define

    $f_{a,b}(x) = ax + b \bmod p$

    $h_{a,b}(x) = f_{a,b}(x) \bmod n$

• Claim:

  $H = \{h_{a,b}(x) : a \in \{1, \dots, p-1\},\ b \in \{0, \dots, p-1\}\}$

  is a universal hash family.
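Drawing a function from this family takes only a couple of lines. The sketch below assumes buckets are indexed 0 to n-1 (the natural output of "mod n"), and the values p = 211 and n = 10 are example choices; the helper name `draw_hash` is made up.

```python
import random


def draw_hash(p, n, rng=random):
    """Draw h_{a,b} from H: a uniform in {1, ..., p-1}, b uniform in {0, ..., p-1}."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)

    def h(x):
        # f_{a,b}(x) = (a*x + b) mod p, then reduce mod n to pick a bucket.
        return ((a * x + b) % p) % n

    return a, b, h


# Example: universe {0, ..., 199}, so M = 200; 211 is a prime with p >= M.
a, b, h = draw_hash(p=211, n=10, rng=random.Random(0))
print(a, b)
print([h(x) for x in range(5)])
```

Only the pair (a, b) has to be remembered to evaluate h again later.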

Page 62: Lecture 8 - web.stanford.edu

Say what?

• Example: M = p = 5, n = 3

• To draw h from H:
  • Pick a random a in {1,…,4}, b in {0,…,4}  (say a = 2, b = 1)

• As per the definition:
  • $f_{2,1}(x) = 2x + 1 \bmod 5$
  • $h_{2,1}(x) = f_{2,1}(x) \bmod 3$

[Figure: U = {0, 1, 2, 3, 4} is first mapped by $f_{2,1}$ back onto {0, 1, 2, 3, 4}. This step just scrambles stuff up. No collisions here! The result is then reduced mod 3 into the n = 3 buckets. This step is the one where two different elements might collide.]
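The example with a = 2, b = 1 is small enough to tabulate. This sketch assumes U = {0, 1, 2, 3, 4}, as in the figure.

```python
p, n = 5, 3
a, b = 2, 1

# Step 1: f_{2,1}(x) = (2x + 1) mod 5 scrambles U without collisions.
f = [(a * x + b) % p for x in range(p)]
# Step 2: reducing mod 3 is where collisions can appear.
h = [y % n for y in f]

print(f)  # [1, 3, 0, 2, 4]
print(h)  # [1, 0, 0, 2, 1]
```

The first list is a permutation of U, while in the second list the keys 1 and 2 share a bucket, as do the keys 0 and 4.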

Page 63: Lecture 8 - web.stanford.edu

Ignoring why this is a good idea

• Can we store h with small space?

• Just need to store two numbers:  (e.g., a = 2, b = 1)
  • a is in {1,…,p-1}
  • b is in {0,…,p-1}

• So about 2 log(p) bits

• By our choice of p (taking p not much bigger than M), that's O(log(M)) bits.

Compare: direct addressing was M bits!

Twitter example: $\log(M) = 140 \log(128) = 980$ vs. $M = 128^{140}$
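The Twitter numbers are easy to double-check. This is just arithmetic; the factor of 2 for storing both a and b assumes p is close to M, so it is a rough figure rather than an exact count.

```python
import math

M = 128 ** 140                       # size of the universe of 140-character ASCII strings
bits_per_number = 140 * math.log2(128)

print(bits_per_number)               # 980.0
print(M == 2 ** 980)                 # True
print(2 * bits_per_number)           # roughly the bits needed for the pair (a, b)
```

So remembering h costs on the order of a couple of thousand bits, while M itself is a number with 980 binary zeros after its leading 1.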

Page 64: Lecture 8 - web.stanford.edu

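Another way to see this using only the size of H

• We have p-1 choices for a, and p choices for b.

• So $|H| = p(p-1) = O(M^2)$

• Space needed to store an element h:
  • $\log(M^2) = O(\log(M))$.

[Figure: the set of all hash functions costs O(M log(n)) bits per function; the family H costs O(log(M)) bits per function.]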

Page 65: Lecture 8 - web.stanford.edu

Why does this work?

• This is actually a little complicated.
  • There are some hidden slides here about why.
  • Also see the lecture notes.

• The thing we have to show is that the collision probability is not very large.

• Intuitively, this is because:
  • for any (fixed, not random) pair $x \neq y$ in {0,…,p-1},
  • if a and b are random,
  • ax+b and ay+b are independent random variables. (why?)

Page 66: Lecture 8 - web.stanford.edu

Why does this work?

• Want to show:
  • for all $u_i, u_j \in U$ with $u_i \neq u_j$, $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

• aka, the probability of any two elements colliding is small.

• Let's just fix two elements and see an example.
  • Let's consider $u_i = 0$, $u_j = 1$.

Convince yourself that it will be the same for any pair!

[Figure: U = {0, 1, 2, 3, 4} mapped by $f_{a,b}(x) = ax + b \bmod p$ onto {0, 1, 2, 3, 4}, then reduced mod 3 into the buckets.]

(This slide skipped in class; here for reference!)

Page 67: Lecture 8 - web.stanford.edu

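The probability that 0 and 1 collide is small

• Want to show:
  • $P_{h \in H}[h(0) = h(1)] \leq \frac{1}{n}$

• For any $y_0 \neq y_1 \in \{0,1,2,3,4\}$, how many a, b are there so that $f_{a,b}(0) = y_0$ and $f_{a,b}(1) = y_1$?

• Claim: it's exactly one.

• Proof: solve the system of eqs. for a and b:

  $a \cdot 0 + b = y_0 \bmod p$

  $a \cdot 1 + b = y_1 \bmod p$

  e.g., $y_0 = 3$, $y_1 = 1$ gives b = 3 and a = 1 - 3 = 3 mod 5.

[Figure: U = {0, 1, 2, 3, 4} mapped by $f_{a,b}(x) = ax + b \bmod p$ onto {0, 1, 2, 3, 4}, then reduced mod 3 into the buckets.]

(This slide skipped in class; here for reference!)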

Page 68: Lecture 8 - web.stanford.edu

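The probability that 0 and 1 collide is small

• Want to show:
  • $P_{h \in H}[h(0) = h(1)] \leq \frac{1}{n}$

• For any $y_0 \neq y_1 \in \{0,1,2,3,4\}$, exactly one pair a, b has $f_{a,b}(0) = y_0$ and $f_{a,b}(1) = y_1$.

• If 0 and 1 collide it's b/c there's some $y_0 \neq y_1$ so that:
  • $f_{a,b}(0) = y_0$ and $f_{a,b}(1) = y_1$.
  • $y_0 = y_1 \bmod n$.

[Figure: U = {0, 1, 2, 3, 4} mapped by $f_{a,b}(x) = ax + b \bmod p$ onto {0, 1, 2, 3, 4}, then reduced mod 3 into the buckets; e.g., $y_0 = 3$, $y_1 = 1$.]

(This slide skipped in class; here for reference!)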

Page 69: Lecture 8 - web.stanford.edu

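The probability that 0 and 1 collide is small

• Want to show:
  • $P_{h \in H}[h(0) = h(1)] \leq \frac{1}{n}$

• The number of a, b so that 0, 1 collide under $h_{a,b}$ is at most the number of $y_0 \neq y_1$ so that $y_0 = y_1 \bmod n$.

• How many is that?
  • We have p choices for $y_0$, then at most 1/n of the remaining p-1 are valid choices for $y_1$…
  • So at most $p \cdot \frac{p-1}{n}$.

[Figure: U = {0, 1, 2, 3, 4} mapped by $f_{a,b}(x) = ax + b \bmod p$ onto {0, 1, 2, 3, 4}, then reduced mod 3 into the buckets; e.g., $y_0 = 3$, $y_1 = 1$.]

(This slide skipped in class; here for reference!)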

Page 70: Lecture 8 - web.stanford.edu

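The probability that 0 and 1 collide is small

• Want to show:
  • $P_{h \in H}[h(0) = h(1)] \leq \frac{1}{n}$

• The # of (a,b) so that 0, 1 collide under $h_{a,b}$ is $\leq p \cdot \frac{p-1}{n}$.

• The probability (over a, b) that 0, 1 collide under $h_{a,b}$ is:

  • $P_{h \in H}[h(0) = h(1)] \leq \dfrac{p \cdot \frac{p-1}{n}}{|H|}$

  • $= \dfrac{p \cdot \frac{p-1}{n}}{p(p-1)}$

  • $= \frac{1}{n}.$

(This slide skipped in class; here for reference!)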
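For the small example (p = 5, n = 3) the claim can also be checked exhaustively rather than by counting. The sketch below enumerates every (a, b) and every pair of distinct keys in {0, …, 4}; the function name `max_collision_probability` is made up for illustration.

```python
from fractions import Fraction
from itertools import combinations


def max_collision_probability(p, n):
    """Largest collision probability over all pairs x != y in {0, ..., p-1}."""
    family = [(a, b) for a in range(1, p) for b in range(p)]  # |H| = p(p-1)
    worst = Fraction(0)
    for x, y in combinations(range(p), 2):
        collisions = sum(
            1 for a, b in family
            if ((a * x + b) % p) % n == ((a * y + b) % p) % n
        )
        worst = max(worst, Fraction(collisions, len(family)))
    return worst


worst = max_collision_probability(p=5, n=3)
print(worst, worst <= Fraction(1, 3))  # 1/5 True
```

Here every pair collides under exactly 4 of the 20 functions, so the collision probability is 1/5, comfortably below 1/n = 1/3.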

Page 71: Lecture 8 - web.stanford.edu

The same argument goes for any pair

for all $u_i, u_j \in U$ with $u_i \neq u_j$,

$P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

That's the definition of a universal hash family.

So this family H indeed does the trick.

(This slide skipped in class; here for reference!)

Page 72: Lecture 8 - web.stanford.edu

But let's check that it does work

• Back to IPython Notebook for Lecture 8…

[Figure: histogram for M=200, n=10. x-axis: empirical probability of collision out of 100 trials; y-axis: number of pairs (x,y), out of $\binom{200}{2} = 19900$ pairs.]
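The notebook itself is not reproduced in this transcript, so the following is only a rough stand-in for that experiment. It assumes U = {0, …, 199}, the prime p = 211, and 100 random draws of (a, b); instead of a per-pair histogram it reports the fraction of the 19900 pairs that collide, averaged over the draws.

```python
import random
from math import comb


def average_collision_rate(M, n, p, trials, seed=0):
    """Average, over random h_{a,b}, of the fraction of pairs in {0..M-1} that collide."""
    rng = random.Random(seed)
    total_pairs = comb(M, 2)
    rate_sum = 0.0
    for _ in range(trials):
        a, b = rng.randrange(1, p), rng.randrange(p)
        sizes = [0] * n
        for x in range(M):
            sizes[((a * x + b) % p) % n] += 1
        # A bucket holding s keys contributes C(s, 2) colliding pairs.
        rate_sum += sum(comb(s, 2) for s in sizes) / total_pairs
    return rate_sum / trials


rate = average_collision_rate(M=200, n=10, p=211, trials=100)
print(comb(200, 2))  # 19900
print(rate)          # average fraction of colliding pairs; at most 1/n = 0.1
```

Counting bucket sizes avoids looping over all 19900 pairs for every draw, which keeps the check fast.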

Page 73: Lecture 8 - web.stanford.edu

So the whole scheme will be

[Figure: $h_{a,b}$ maps Universe U, containing u_i, to n buckets.]

Choose a and b at random and form the function $h_{a,b}$.

We can store h in space O(log(M)) since we just need to store a and b.

Probably these buckets will be pretty balanced.

Page 74: Lecture 8 - web.stanford.edu

Outline

• Hash tables are another sort of data structure that allows fast INSERT/DELETE/SEARCH.

  • like self-balancing binary trees

  • The difference is we can get better performance in expectation by using randomness.

• Hash families are the magic behind hash tables.

• Universal hash families are even more magic.

Recap

Page 75: Lecture 8 - web.stanford.edu

Want O(1) INSERT/DELETE/SEARCH

• We are interested in putting nodes with keys into a data structure that supports fast INSERT/DELETE/SEARCH.

• INSERT

• DELETE

• SEARCH

[Figure: a "data structure" box supporting INSERT, DELETE and SEARCH; a search answers "HERE IT IS".]

Page 76: Lecture 8 - web.stanford.edu

We studied this game

1. An adversary chooses any n items $u_1, u_2, \dots, u_n \in U$, and any sequence of INSERT/DELETE/SEARCH operations on those items.

   e.g., the items 13, 22, 43, 92, 7 and the sequence: INSERT 13, INSERT 22, INSERT 43, INSERT 92, INSERT 7, SEARCH 43, DELETE 92, SEARCH 7, INSERT 92

2. You, the algorithm, choose a random hash function $h: U \to \{1, \dots, n\}$.

3. HASH IT OUT

[Figure: buckets 1, 2, 3, …, n holding the keys 13, 22, 92, 43, 7.]

Page 77: Lecture 8 - web.stanford.edu

Uniformly random h was good

• If we choose h uniformly at random, for all $u_i, u_j \in U$ with $u_i \neq u_j$,

  $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

• That was enough to ensure that, in expectation, a bucket isn't too full.

A bit more formally:

For any sequence of INSERT/DELETE/SEARCH operations on any n elements of U, the expected runtime (over the random choice of h) is O(1) per operation.

Page 78: Lecture 8 - web.stanford.edu

Uniformly random h was bad

• If we actually want to implement this, we have to store the hash function h.

• That takes a lot of space!
  • We may as well have just initialized a bucket for every single item in U.

• Instead, we chose a function randomly from a smaller set.

Page 79: Lecture 8 - web.stanford.edu

We needed a smaller set that still has this property

• If we choose h uniformly at random, for all $u_i, u_j \in U$ with $u_i \neq u_j$,

  $P_{h \in H}[h(u_i) = h(u_j)] \leq \frac{1}{n}$

This was all we needed to make sure that the buckets were balanced in expectation!

• We call any set with that property a universal hash family.

• We gave an example of a really small one :)

Page 80: Lecture 8 - web.stanford.edu

Conclusion:

• We can build a hash table that supports INSERT/DELETE/SEARCH in O(1) expected time,

  • if we know that only n items are ever going to show up, where n is waaaayyyyyy less than the size M of the universe.

• The space to implement this hash table is O(n log(M)) bits.
  • O(n) buckets
  • O(n) items with log(M) bits per item
  • O(log(M)) to store the hash fn.

• M is waaayyyyyy bigger than n, but log(M) probably isn't.

Page 81: Lecture 8 - web.stanford.edu

That's it for data structures (for now)

Data Structure: RBTrees and Hash Tables

Now we can use these going forward!

Page 82: Lecture 8 - web.stanford.edu

Next Time

• Graph algorithms!

Before Next Time

• Pre-lecture exercise for Lecture 9
  • Intro to graphs