Randomized Algorithms and Probabilistic Analysis of Algorithms
TRANSCRIPT
-
Lecture: Randomized Algorithms
TU/e 5MD20 Design Automation
Randomized Algorithms and Probabilistic Analysis of Algorithms
Phillip Stanley-Marbell
-
Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
-
What are Randomized Algorithms and Analyses?
Randomized algorithms
Algorithms that make random decisions during their execution. Example: Quicksort with a random pivot
Probabilistic analysis of algorithms
Using probability theory to analyze the behavior of (randomized or deterministic) algorithms
Example: determining the probability of a collision of a hash function
[Diagram: Probability and Computation splits into Randomized Algorithms and Probabilistic Analysis of Algorithms. Randomized algorithms are either Monte Carlo algorithms (may fail or return an incorrect answer) or Las Vegas algorithms (always return the right answer).]
-
Why Randomized Algorithms and Analyses?
Why randomized algorithms? Many NP-hard problems may be easy to solve for typical inputs
One approach is to use heuristics to deal with pathological inputs
Another approach is to use randomization (of the inputs, or of the algorithm) to reduce the chance of worst-case behavior
-
Why Randomized Algorithms and Analyses?
Why probabilistic analysis of algorithms? Naturally, if an algorithm makes random decisions, its performance is not deterministic
Also, deterministic algorithm behavior may vary with the inputs
Probabilistic analysis also lets us estimate bounds on behavior; we'll talk about such bounds today
-
Theoretical Foundations
Probability theory (things you covered in 2S610, 2nd year)
Probability spaces
Events
Random variables
Characteristics of random variables
Combinatorics & number theory (some things you might have seen in 2D…); many relations come in handy in simplifying analysis
Algorithm analysis
We will review relevant material in the next half hour
-
Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
-
Probability Theory Refresher
Probability space, (Ω, F, Pr), defines
the possible occurrences (simple events), sets of occurrences (subsets of Ω), and the likelihood of occurrences
Sample space, Ω: composed of all the basic events we are concerned with
Example: for a coin toss, Ω = {H, T}
Sigma algebra, F
Possible occurrences we can build out of Ω. Example: for a coin toss, F = {∅, Ω, {H}, {T}}. Events are members of F
Probability measure, Pr: a mapping from F to [0, 1]
Assigns a probability (a real number p ∈ [0, 1]) to events
One example of a probability measure is a probability mass function
-
Notation
Event sets: will start today by representing events with sets, using letters early in the alphabet, e.g., A, B, ...
Events may be unitary elements or subsets of Ω
Probability: the probability of event A will be written as Pr{A}
[Figure: a sample space Ω drawn as a region containing simple events e1, e2, ..., e8.]
-
Independence, Disjoint Events, and Union
Two events, A and B, are said to be independent iff
the occurrence of A does not influence the outcome of B: Pr{A ∩ B} = Pr{A}·Pr{B}
Note that this is different from the events being mutually exclusive. If two events A and B are mutually exclusive, then Pr{A ∩ B} = 0
For any two events E1 and E2:
Pr{E1 ∪ E2} = Pr{E1} + Pr{E2} − Pr{E1 ∩ E2}
Union bound (often comes in handy in probabilistic analysis):
Pr{∪_{i≥1} Ei} ≤ Σ_{i≥1} Pr{Ei}
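The union bound is easiest to internalize numerically. Below is a minimal simulation sketch (not from the lecture; the three overlapping die-roll events are made up for illustration) showing that the probability of the union never exceeds the sum of the individual probabilities:

```python
import random

# Three overlapping events on a fair die roll: even, <= 2, equals 6.
trials = 100_000
union_count, sum_counts = 0, [0, 0, 0]
for _ in range(trials):
    roll = random.randint(1, 6)
    events = [roll % 2 == 0, roll <= 2, roll == 6]
    union_count += any(events)
    for i, occurred in enumerate(events):
        sum_counts[i] += occurred

print("Pr{E1 u E2 u E3} ~", union_count / trials)      # about 4/6
print("sum of Pr{Ei}    ~", sum(sum_counts) / trials)  # about 6/6
```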
-
Conditional Probability
Probability of event B occurring, given that A has occurred: Pr{B | A}
Pr{B | A} = Pr{B ∩ A} / Pr{A}
If events A and B are independent, Pr{B ∩ A} = Pr{B}·Pr{A}, so
Pr{B | A} = Pr{B ∩ A} / Pr{A} = Pr{B}·Pr{A} / Pr{A} = Pr{B}
-
Events and Random Variables
So far, we have talked about probability and independence of events
Rather than work with sets, we can map events to real values
Random variables: a random variable is a function on the elements of the sample space, Ω, used to identify elements of Ω.
Definition: A random variable, X, on a sample space Ω is a real-valued function on Ω; i.e., X: Ω → ℝ.
We will only deal with discrete random variables, which take on a finite or countably infinite number of values
Random variables define events: the occurrence of a random variable taking on a specific value defines an event
Example: Coin toss. Let X be a random variable defining the number of heads resulting from a coin toss
Sample space Ω = {H, T}, sigma algebra of subsets of Ω, F = {∅, Ω, {H}, {T}}; X: Ω → {0, 1}
Events: {X = 0}, {X = 1}
In general, an event defined on a random variable X is of the form {s ∈ Ω | X(s) = x}
-
Notation
Will represent random variables with uppercase letters, late in the alphabet
Example: X, Y, Z. Will use the abbreviation rvar for random variable
Events correspond to a random variable, say, X (uppercase), taking on a specific value, say, x (lowercase)
Probability of rvar X taking on the specific value x is written as Pr{X = x} or fX(x)
Example: Coin toss. Let X be an rvar representing the number of heads; Pr{X = 0} = fX(0) = 1/2 (for a fair coin)
-
Random Variables Intuition
So far, we've presented a lot of notation; can we gain more intuition?
Imagine a phenomenon that can be represented with real values. Example: the result of rolling a die
Let X and Y be functions mapping the result of rolling the die to a number
e.g., X = die result: Ω → {1, 2, 3, 4, 5, 6}, or Y = 2·(die result) + 1: Ω → {3, 5, 7, 9, 11, 13}
X and Y are two different functions (random variables) defined on the same set of events
Each time X takes on a specific value is an event. For the above die-rolling example, with rvars X and Y: Pr{X = 1} = Pr{Y = 3}, Pr{X = 4} = Pr{Y = 9}, and so on
-
Characteristics of Random Variables
Random variables and events:
1. We first talked about random phenomenon events in terms of sets
2. We then introduced rvars, to let us represent events with real numbers
3. When representing events with rvars, we can then look at some measures or characteristics of event phenomena
Link to randomized algorithms and analyses; we will reason about:
Randomized algorithms in terms of rvars characterizing actions of the algorithm
Probabilistic analysis of algorithms in terms of rvars characterizing properties of the algorithm's behavior given inputs
-
Characteristics of Random Variables
Expectation or Expected Value, E[X], of an rvar X:
E[X] = Σ_x x·fX(x), or equivalently E[X] = Σ_i i·Pr{X = i}
Properties of E[X]:
Linearity: E[Σ_{i=1}^{n} Xi] = Σ_{i=1}^{n} E[Xi]
Constant multiplier: E[c·X] = c·E[X]
Question: What is E[X − E[X]]?
-
Common Discrete Distributions
Uniform discrete: all values in a range equally likely
Ω = {a, ..., b}, F = 2^Ω; Pr{X = x} = 1/|Ω|
Bernoulli or indicator random variable: success or failure in a single trial
Ω = {0, 1}, F = 2^{{0,1}} = {∅, {0}, {1}, {0, 1}}; Pr{X = 1} = p, Pr{X = 0} = 1 − p
E[X] = p, Var[X] = p(1 − p)
Binomial: number of successes in n trials
Ω = {0, 1, 2, ..., n} (|Ω| = n + 1), F = 2^Ω; fX(k) = C(n, k)·p^k·(1 − p)^(n−k)
E[X] = np, Var[X] = np(1 − p)
Geometric: number of trials until the first success, for success probability p
Ω = {1, 2, 3, ...}, F = 2^Ω; fX(k) = p(1 − p)^(k−1)
E[X] = 1/p, Var[X] = (1 − p)/p²
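As a quick sanity check on the E[X] formulas above, here is a small simulation sketch (not part of the lecture; the parameter values are arbitrary) that estimates each mean empirically:

```python
import random

# Empirical means for the four distributions on this slide, checked
# against the E[X] formulas.
T = 50_000
p, n, a, b = 0.3, 20, 1, 6

unif = [random.randint(a, b) for _ in range(T)]
bern = [1 if random.random() < p else 0 for _ in range(T)]
binom = [sum(random.random() < p for _ in range(n)) for _ in range(T)]

def geometric(p):
    """Number of trials until the first success, success probability p."""
    k = 1
    while random.random() >= p:
        k += 1
    return k

geom = [geometric(p) for _ in range(T)]

for name, xs, ex in [("uniform", unif, (a + b) / 2), ("bernoulli", bern, p),
                     ("binomial", binom, n * p), ("geometric", geom, 1 / p)]:
    print(name, sum(xs) / T, "vs E[X] =", ex)
```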
-
Useful Mathematical Results
Some useful results from number theory and combinatorics we'll use later:
Σ_{i≥0} r^i = 1/(1 − r), for |r| < 1
Σ_{i≥1} r^i = r/(1 − r), for |r| < 1
Σ_{i=0}^{m} r^i = (1 − r^(m+1))/(1 − r)
(1 − k/n) ≈ e^(−k/n), when k is small compared to n
For any y, 1 + y ≤ e^y
Σ_{i=1}^{n} 1/i = ln(n) + O(1)
-
Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
-
Quicksort
Input: A list S = {x1, ..., xn} of n distinct elements over a totally ordered universe.
Output: The elements of S in sorted order.
1. If S has one or zero elements, return S. Otherwise continue.
2. Choose an element of S as a pivot; call it x.
3. Compare every other element of S to x in order to divide the other elements into two sublists:
   a. S1 has all the elements of S that are less than x;
   b. S2 has all those that are greater than x.
4. Apply Quicksort to S1 and S2.
5. Return the list S1, x, S2.
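A minimal Python sketch of the listing above (the comparison counter is an addition, so that the analysis on the following slides can be checked empirically; it is not part of the lecture's pseudocode):

```python
import math
import random

def quicksort(s, counter):
    """Quicksort with the "pick the first element" pivot rule; counter is a
    one-element list tallying comparisons (step 3 compares every other
    element of s to the pivot)."""
    if len(s) <= 1:
        return s
    pivot, rest = s[0], s[1:]
    counter[0] += len(rest)
    s1 = [e for e in rest if e < pivot]
    s2 = [e for e in rest if e > pivot]
    return quicksort(s1, counter) + [pivot] + quicksort(s2, counter)

# Inputs chosen uniformly at random over permutations, as in the theorem
# on the next slides; the mean comparison count tracks 2n ln n + O(n),
# where the O(n) term accounts for the remaining gap.
n, trials, total = 1000, 50, 0
for _ in range(trials):
    s = list(range(n))
    random.shuffle(s)
    c = [0]
    assert quicksort(s, c) == sorted(s)
    total += c[0]
print(total / trials, "vs 2n ln n =", 2 * n * math.log(n))
```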
Probabilistic Analysis of Quicksort
Worst-case performance is Θ(n²), e.g., if the input list is in decreasing order and the pivot choice rule is "pick the first element"
On the other hand, if the pivot always splits S into lists of approximately equal size, performance is O(n log n)
Question: Assuming we use the "pick the first element" pivot choice, and the input elements are chosen from a uniform discrete distribution on a range of values, what is the expected number of comparisons?
i.e., let X be an rvar denoting the number of comparisons; what is E[X]?
-
Probabilistic Analysis of Quicksort
Theorem.
If the first list element is always chosen as the pivot, and the input is chosen uniformly at random from all possible permutations of the values in the input support set, then the expected number of comparisons made by Quicksort is 2n ln n + O(n).
Proof.
Given an input set x1, x2, ..., xn chosen uniformly at random from the possible permutations, let y1, y2, ..., yn be the same values sorted in increasing order.
Let Xij be an indicator rvar that takes on value 1 if yi and yj are compared at any point in the algorithm, 0 otherwise, for some i < j. The total number of comparisons is the total number of times Xij = 1.
Let X be an rvar denoting the total number of comparisons of Quicksort. Then
X = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Xij, and
E[X] = E[Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Xij] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} E[Xij],
where we've used the linearity property introduced on slide 16.
-
Probabilistic Analysis of Quicksort
Theorem.
If the first list element is always chosen as the pivot, and the input is chosen uniformly at random from all possible permutations of the values in the input support set, then the expected number of comparisons made by Quicksort is 2n ln n + O(n).
Proof. (contd)
Since Xij is an indicator rvar, E[Xij] is the probability that Xij = 1 (from slide 17). But recall that Xij = 1 is the event that the two elements yi and yj are compared.
Two elements yi and yj are compared iff either of them is the first pivot selected by Quicksort from the set Yij = {yi, yi+1, ..., yj}. This is because if any other item in Yij were chosen as a pivot, since that item would lie between yi and yj, it would place yi and yj in different sublists (and they would never be compared to each other).
Now, the order in the sublists is the same as in the original list (we are in the process of sorting). From the theorem statement, we always choose the first element as the pivot; since the input is chosen uniformly at random from all possible permutations, any element of the ordering Yij is equally likely to be first in the (randomly ordered) input sublist.
Thus the probability that yi or yj is selected as pivot, which is the probability that yi and yj are compared, which is the probability that Xij = 1, which is E[Xij], is (from the definition of the discrete uniform distribution on slide 17) 2/(j − i + 1).
-
Probabilistic Analysis of Quicksort
Theorem.
If the first list element is always chosen as the pivot, and the input is chosen uniformly at random from all possible permutations of the values in the input support set, then the expected number of comparisons made by Quicksort is 2n ln n + O(n).
Proof. (contd)
Substituting E[Xij] = 2/(j − i + 1) into the expression for E[X] from slide 21:
E[X] = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} 2/(j − i + 1)
     = Σ_{i=1}^{n−1} Σ_{k=2}^{n−i+1} 2/k          (substituting k = j − i + 1)
     = Σ_{k=2}^{n} Σ_{i=1}^{n+1−k} 2/k            (swapping the order of summation)
     = Σ_{k=2}^{n} (n + 1 − k)·(2/k)
     = (n + 1)·Σ_{k=2}^{n} 2/k − 2(n − 1)
     = 2(n + 1)·Σ_{k=1}^{n} 1/k − 4n
     = 2n ln n + O(n)                             (using Σ_{k=1}^{n} 1/k = ln(n) + O(1) from slide 18)
-
Randomized Quicksort
What if the inputs are not uniformly random selections of permutations?
How to avoid pathological inputs? Pick a random pivot!
Analysis of the number of comparisons is similar to the foregoing analysis.
Theorem.
Suppose that, whenever a pivot is chosen for Randomized Quicksort, it is chosen independently and uniformly at random over all possible choices. Then, for any input, the expected number of comparisons made by Randomized Quicksort is 2n ln n + O(n).
The proof is almost identical to the proof of the expected number of comparisons for deterministic Quicksort with randomized inputs. Try doing this proof yourself as an exercise; a code sketch of the algorithm follows.
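A sketch of Randomized Quicksort under the theorem's pivot rule (assumes distinct elements, as in the original problem statement):

```python
import random

def randomized_quicksort(s):
    """Randomized Quicksort: the pivot is chosen independently and
    uniformly at random from the current (distinct-element) sublist."""
    if len(s) <= 1:
        return s
    pivot = s[random.randrange(len(s))]
    s1 = [e for e in s if e < pivot]
    s2 = [e for e in s if e > pivot]
    return randomized_quicksort(s1) + [pivot] + randomized_quicksort(s2)

assert randomized_quicksort([3, 1, 4, 1.5, 9, 2, 6]) == [1, 1.5, 2, 3, 4, 6, 9]
```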
-
Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
-
Tail Distribution Bounds
We've seen one example of a measure for characterizing a distribution: the expectation, E[X], gives us an idea of the average value taken on by an rvar
Another important characteristic is the tail distribution: the probability that an rvar takes on values far from its expectation
Useful in estimating the probability of failure of randomized algorithms
Intuitively, one may think of it as Pr{|X − E[X]| ≥ a}
We will now look at a few different bounds on the tail distribution. Loose bounds don't tell us much; they are, however, often easier to calculate
Tight(er) bounds give us a narrower range of values, but often require more information
[Figure: a pmf Pr{X = x} plotted over x, with the tail mass Pr{X ≥ a} to the right of a shaded.]
-
Markov's Inequality
A loose bound that is easy to calculate is Markov's inequality. We can easily bound Pr{X ≥ a} knowing only the expectation of X
This, however, often doesn't tell us much!
We will use a similar argument in the Probabilistic Method later today
Theorem [Markov's Inequality].
Let X be a random variable that assumes only nonnegative values. Then, for all a > 0, Pr{X ≥ a} ≤ E[X]/a.
Proof.
For a > 0, let I be a Bernoulli/indicator random variable, with I = 1 if X ≥ a, 0 otherwise. Since X is nonnegative, I ≤ X/a. From slide 17, E[I] = Pr{I = 1} = Pr{X ≥ a}, thus
Pr{X ≥ a} = E[I] ≤ E[X/a] = E[X]/a (from slide 16).
-
Moments
To derive tighter bounds, we will need the idea of moments of an rvar.
Definition: kth moment. The kth moment of an rvar X is E[X^k];
k = 1 is termed the first moment, and so on
Definition: variance. The variance of an rvar X is defined as Var[X] = E[(X − E[X])²]
Exercise: Show that Var[X] = E[X²] − (E[X])²
Definition: standard deviation. The standard deviation of an rvar X is σ[X] = √Var[X]
-
Chebyshev's Inequality
Now that we know about Var[X], we can introduce a tighter bound on tails.
Theorem [Chebyshev's Inequality].
For any a > 0, Pr{|X − E[X]| ≥ a} ≤ Var[X]/a².
Proof.
Pr{|X − E[X]| ≥ a} = Pr{(X − E[X])² ≥ a²}. Since (X − E[X])² is a nonnegative rvar, we can apply Markov's inequality to yield:
Pr{(X − E[X])² ≥ a²} ≤ E[(X − E[X])²]/a² = Var[X]/a².
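A small simulation sketch (not lecture material; the Binomial parameters are arbitrary) comparing the two bounds against an empirical tail probability. Chebyshev is applied via Pr{X ≥ a} ≤ Pr{|X − E[X]| ≥ a − E[X]}:

```python
import random

# Compare the Markov and Chebyshev bounds against an empirical tail
# probability for X ~ Binomial(n, p); the threshold a is arbitrary.
n, p, a = 100, 0.5, 65
mean, var = n * p, n * p * (1 - p)

trials = 20_000
hits = sum(
    sum(random.random() < p for _ in range(n)) >= a for _ in range(trials)
)
print("empirical Pr{X >= a}            :", hits / trials)
print("Markov:    E[X]/a               :", mean / a)
# Chebyshev, via Pr{X >= a} <= Pr{|X - E[X]| >= a - E[X]}:
print("Chebyshev: Var[X]/(a - E[X])**2 :", var / (a - mean) ** 2)
```

Both bounds hold but are loose here, illustrating the slide's point that tighter bounds require more information about the distribution.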
-
Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
-
Randomized Algorithm for Median, RM
Idea: find two nearby elements d and u, spanning a small set C, by sampling S
Since |C| is o(n/log n), we can sort it in o(n) time using an algorithm that is O(k log k) for k elements
The check in step 7 is to validate that the set C is indeed small, so that the above assumption holds
Randomized Median Algorithm
Input: A set S of n elements over a totally ordered universe.
Output: The median element of S, denoted m.
1. Pick a (multi-)set R of ⌈n^(3/4)⌉ elements in S, chosen independently and uniformly at random, with replacement.
2. Sort the set R.
3. Let d be the (⌊½n^(3/4) − √n⌋)th smallest element in the sorted set R.
4. Let u be the (⌈½n^(3/4) + √n⌉)th smallest element in the sorted set R.
5. By comparing every element in S to d and u, compute the set C = {x ∈ S: d ≤ x ≤ u} and the numbers ld = |{x ∈ S: x < d}| and lu = |{x ∈ S: x > u}|.
6. If ld > n/2 or lu > n/2 then FAIL.
7. If |C| ≤ 4n^(3/4) then sort the set C, otherwise FAIL.
8. Output the (⌊n/2⌋ − ld + 1)th element in the sorted order of C.
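A Python sketch of the listing, under the assumptions that n is odd and the elements are distinct (the clamping of the sample indices is a defensive addition for small n, not part of the listing):

```python
import math
import random

def randomized_median(s):
    """Sketch of RM; assumes len(s) odd with distinct elements. Returns the
    median or None on the (low-probability) FAIL paths."""
    n = len(s)
    r = sorted(random.choice(s) for _ in range(int(n ** 0.75)))
    sqrt_n = math.sqrt(n)
    d = r[max(0, int(len(r) / 2 - sqrt_n) - 1)]
    u = r[min(len(r) - 1, int(len(r) / 2 + sqrt_n) - 1)]
    c = [x for x in s if d <= x <= u]
    ld = sum(1 for x in s if x < d)
    lu = sum(1 for x in s if x > u)
    if ld > n / 2 or lu > n / 2:
        return None                    # FAIL in step 6
    if len(c) > 4 * n ** 0.75:
        return None                    # FAIL in step 7
    c.sort()
    return c[n // 2 - ld]              # the (n/2 - ld + 1)th element of C

vals = random.sample(range(10 ** 6), 10001)
assert randomized_median(vals) in (None, sorted(vals)[len(vals) // 2])
```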
-
What is the probability that RM Fails?
What can go wrong? The sample might not be representative in terms of the median:
e1: Y1 = |{r ∈ R | r ≤ m}| < ½n^(3/4) − √n; too few elements in the sample smaller than m
e2: Y2 = |{r ∈ R | r ≥ m}| < ½n^(3/4) − √n; too few elements in the sample larger than m
e3: |C| > 4n^(3/4); the sample picked from S has d and u too far apart
Pr{RM fails} = Pr{e1 ∪ e2 ∪ e3} ≤ Pr{e1} + Pr{e2} + Pr{e3}, by the union bound (slide 10)
Let's look at determining the probability of event e1
-
Reminder: Bernoulli/Indicator and Binomial
Bernoulli or indicator rvar: success or failure in a single trial
Example: coin toss, with rvar X = 1 when heads, X = 0 when tails
Ω = {0, 1}
Pr{X = 1} = p, Pr{X = 0} = 1 − p
E[X] = p
Var[X] = p(1 − p)
Binomial rvar: number of successes in n Bernoulli trials of parameter p
The sum of n Bernoulli(p) rvars is a Binomial(n, p) rvar
Ω = {0, 1, 2, ..., n} (|Ω| = n + 1)
fX(k) = C(n, k)·p^k·(1 − p)^(n−k)
E[X] = np
Var[X] = np(1 − p)
-
Determining Pr{e1}
Let's define an indicator random variable Xi:
Xi = 1 if the ith sample is ≤ m, 0 otherwise.
The Xi are independent, since, by the definition of RM, sampling is with replacement.
By definition, (n − 1)/2 + 1 elements in the input set S to RM are less than or equal to the median m (for n odd; the median itself is counted).
So, the probability that a random sample is at most the median is
Pr{Xi = 1} = ((n − 1)/2 + 1)/n = 1/2 + 1/(2n).
Y1 is an rvar representing the number of items (in the sample R, of size n^(3/4)) that are at most the median m.
We can therefore write Y1 in terms of the Xi as
Y1 = Σ_{i=1}^{n^(3/4)} Xi.
-
Determining the Distribution of Y1
Recall (slide 33) that the sum of n Bernoulli(p) rvars is Binomial(n, p), so
Y1 = Σ_{i=1}^{n^(3/4)} Xi is Binomial(n^(3/4), 1/2 + 1/(2n)):
fY1(y) = C(n^(3/4), y)·(1/2 + 1/(2n))^y·(1/2 − 1/(2n))^(n^(3/4)−y)
and
E[Y1] = n^(3/4)·(1/2 + 1/(2n))
Var[Y1] = n^(3/4)·(1/2 + 1/(2n))·(1/2 − 1/(2n))
-
Determining Pr{e1}
Back to determining Pr{e1} (recall: it's one of the events in which RM fails): Pr{e1} = Pr{Y1 < ½n^(3/4) − √n}
Even though we can write down the distribution of the rvar Y1 exactly, determining Pr{Y1 < ½n^(3/4) − √n} directly is unwieldy. Instead, note that E[Y1] = ½n^(3/4) + ½n^(−1/4), so the event e1 implies |Y1 − E[Y1]| > √n; applying Chebyshev's inequality,
Pr{e1} ≤ Pr{|Y1 − E[Y1]| ≥ √n} ≤ Var[Y1]/n ≤ (¼n^(3/4))/n = ¼n^(−1/4).
-
Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
-
Chernoff Bounds
Bounds are useful!
We saw in the previous example how knowing about the Chebyshev inequality helped us to quickly answer questions about the probability of failure of a randomized algorithm
But how tight are the bounds?
Not all bounds tell us something useful
Example: Pr{X = x} ≤ 1 is always true for any rvar X and value x, but it tells you nothing
Chernoff bounds give us tighter bounds on Pr{|X − E[X]| ≥ a}
[Figure: tail probability plotted against a, comparing the trivial bound 1.0, a loose bound, a tighter bound, and the true Pr{X ≥ a} (if X is a discrete rvar).]
-
Chernoff Bounds
Unlike the Markov and Chebyshev inequalities, these are a class of bounds. There are Chernoff bounds for different specific distributions
Chernoff bounds are, however, all formulated in terms of moment generating functions
Moment generating function for an rvar X: MX(t) = E[e^(tX)]. MX(t) uniquely characterizes the distribution
We will be most interested in the property that E[X^n] = MX^(n)(0),
i.e., the nth derivative of MX(t) at t = 0 yields E[X^n]
Example: moment generating function for a Bernoulli rvar
(Recall: coin toss, heads or 1 with probability p, tails or 0 with probability 1 − p):
MX(t) = E[e^(tX)] = p·e^(t·1) + (1 − p)·e^(t·0) = p·e^t + (1 − p)
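A quick symbolic check of the E[X^n] = MX^(n)(0) property for the Bernoulli MGF just derived (a sketch assuming the sympy library is available):

```python
import sympy as sp

# Recover moments of a Bernoulli(p) rvar from M_X(t) = p*e^t + (1 - p).
t, p = sp.symbols('t p')
M = p * sp.exp(t) + (1 - p)

EX = sp.diff(M, t, 1).subs(t, 0)            # first moment: p
EX2 = sp.diff(M, t, 2).subs(t, 0)           # second moment: p
print(EX, EX2, sp.expand(EX2 - EX ** 2))    # variance: p - p**2 = p(1 - p)
```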
-
Chernoff Bounds
Chernoff bounds generally make use of the following (from Markov's inequality, slide 27):
Pr{X ≥ a} = Pr{e^(tX) ≥ e^(ta)} ≤ E[e^(tX)]/e^(ta), for t > 0
Pr{X ≤ a} = Pr{e^(tX) ≥ e^(ta)} ≤ E[e^(tX)]/e^(ta), for t < 0
For a sequence of independent (but not necessarily i.i.d.) indicator rvars X1, ..., Xn, with X = Σ_i Xi and μ = E[X], the following Chernoff bounds (which can be derived from the above) exist:
For 0 < δ ≤ 1: Pr{X ≥ (1 + δ)μ} ≤ e^(−μδ²/3)
For 0 < δ < 1: Pr{X ≤ (1 − δ)μ} ≤ e^(−μδ²/2)
-
Problem: You have been asked to create a model of errors on a real communication interconnect
At high communication speeds, transmitted data may be subject to bit errors
You want to estimate the probability of a bit error by measurement (e.g., eye diagrams):
How many measurement samples do you need? Can you state a precise tradeoff between the accuracy of the estimate and the number of samples?
[Figure: superposed bit streams measured on the interconnect yield an "eye diagram"; jitter and noise determine how often transmitted "0"s and "1"s are misread. Photo: processing element (MSP430F2274) and interconnect, with the majority of the interconnect routed on the bottom layer of a 53 mm × 102 mm PCB.]
-
Chernoff Bounds: Estimating a Parameter
Estimating the probability of a bit error from n measurements: let p be the probability we are trying to estimate, taking n measurements
Let X = p̃·n be the number of measurements in which we observe bit errors; p̃ = X/n is our estimate of p
If n is sufficiently large, we expect p̃ to be close to p
Confidence interval: a 1 − γ confidence interval for a parameter p is an interval [p̃ − ε, p̃ + ε] such that
Pr{p ∈ [p̃ − ε, p̃ + ε]} ≥ 1 − γ, i.e., Pr{np ∈ [n(p̃ − ε), n(p̃ + ε)]} ≥ 1 − γ
If the actual p does not lie in the interval, i.e., p ∉ [p̃ − ε, p̃ + ε], then:
If p < p̃ − ε, then X > n(p + ε) (since X = np̃)
If p > p̃ + ε, then X < n(p − ε)
We can apply the Chernoff bounds for the Binomial we showed earlier: X = np̃, the number of observed errors in n measurements, is Binomial(n, p) distributed
-
Chernoff Bounds: Estimating a Parameter
Applying Chernoff bounds:
Pr{p ∉ [p̃ − ε, p̃ + ε]} = Pr{X < np(1 − ε/p)} + Pr{X > np(1 + ε/p)}
 ≤ e^(−nε²/(2p)) + e^(−nε²/(3p))   (applying the Binomial Chernoff bounds)
 ≤ e^(−nε²/2) + e^(−nε²/3)         (since p ≤ 1, by the definition of probability)
So, the probability that the real p is more than ε away from the estimated p̃ can be set by performing an appropriate minimum number of measurements, n: choose n such that e^(−nε²/2) + e^(−nε²/3) ≤ γ.
Example: 1 − γ = 0.95, ε = 0.01 ⇒ n ≥ 95,430 measurements
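The minimum n can be found numerically. A sketch (min_samples is a hypothetical helper, not lecture code) that searches for the smallest n satisfying e^(−nε²/2) + e^(−nε²/3) ≤ γ:

```python
import math

def min_samples(eps, gamma):
    """Smallest n with exp(-n*eps**2/2) + exp(-n*eps**2/3) <= gamma,
    via doubling followed by bisection."""
    def miss(n):
        return math.exp(-n * eps ** 2 / 2) + math.exp(-n * eps ** 2 / 3)
    lo, hi = 1, 1
    while miss(hi) > gamma:
        hi *= 2
    while lo < hi:
        mid = (lo + hi) // 2
        if miss(mid) <= gamma:
            hi = mid
        else:
            lo = mid + 1
    return lo

# About 95,430 for eps = 0.01, gamma = 0.05, matching the slide's example
# (the exact integer may differ slightly with rounding).
print(min_samples(0.01, 0.05))
```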
-
Other Applications of Parameter Estimation
Derive Chernoff bounds for the distribution at hand
You can't always assume the underlying distribution is Gaussian/normal
Semiconductor process / device models: an important part of the modern IC design flow
Diminishing device feature sizes (~100s of atoms per transistor at 45 nm) require statistical models
Semiconductor fabrication companies (fab houses) use test chips to characterize processes
How many test structures does one need to get a certain confidence in a parameter estimate?
More applications: characterizing the probability of device failures: how many measurements do you need?
-
Characterizing Probability of Device Failures
[Figure: possible interaction paths leading to circuit state disturbance. Radioactive decay of 238U and 232Th from device packaging mold resin, and of 210Po from PbSn solder (and Al wire), yields α-particles and γ-rays; cosmic rays yield thermal neutrons and high-energy neutrons (the latter can penetrate up to 5 ft of concrete); neutron capture within Si and B in integrated circuits produces unstable isotopes (e.g., Lithium, Magnesium). Secondary ions and energetic particles may generate electron-hole pairs in silicon; these may migrate through the device and aggregate, creating current pulses that lead to changes of logic state (e.g., corrupting a program instruction such as LD @(R4),R2), alongside temperature fluctuations.]
-
More Applications of Randomized Algs.
Hashing: we can use the basic tools introduced in the last two lectures to
Determine the expected number of items in a bin
Bound the maximum number of items in a bin
Determine the probability of false positives when using hash functions with fingerprints
Applicable to many areas of design automation (you will see an example later in this course)
Approximate set membership: Bloom filters. Use probabilistic analysis to determine the tradeoff between space and false positive probability
Hamiltonian cycles: Monte Carlo algorithms (will return a Hamiltonian cycle or failure)
-
Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
-
The Probabilistic Method
A method for proving the existence of objects
Why is it relevant? The proofs are of a form that enables them to guide the creation of a randomized algorithm for finding the desired object
Basic idea: construct a sample space such that the probability of selecting the desired object is > 0 (if the probability of picking the desired element is > 0, then the element must exist)
Alternatively: an rvar X must take on at least one value ≥ E[X], and at least one value ≤ E[X]
Other approaches: the second moment method, the Lovász local lemma
-
The Probabilistic Method: Example
A multiprocessor module (left) and its logical topology (right). We want a grouping of the hardware into two sets, with a maximum number of connecting links.
[Figure: a multiprocessor module (processing elements: MSP430F2274; the majority of the interconnect routed on the bottom layer of a 53 mm × 102 mm PCB) and its logical topology: a graph with nodes cpu 0 through cpu 23 joined by communication links.]
-
The Probabilistic Method: Example
There may also be restrictions on valid topologies due to layout constraints
We can reformulate this as finding the Maxcut of the topology graph
Maxcut: a cut of the graph of maximum weight; an NP-hard problem
We'll use the probabilistic method to prove that a cut with certain properties exists
We'll then turn the proof into a randomized algorithm for finding the desired topology
[Figure: the topology split into Partition A and Partition B; this partitioning does not yield the largest number of links for a cut of the topology.]
-
The Probabilistic Method: Example
How we will approach this problem:
1. Problem: topology partitioning for fault-tolerance
2. Restate as a Maxcut problem
3. Existence proof for a Maxcut of value at least m/2
4. Conversion of the proof into a simple randomized algorithm
-
Probabilistic Method: Problem → Proof
Theorem [Maxcut].
Given any undirected graph G = (V, E), with n vertices and m edges, there is a partition of V into two disjoint sets A and B such that at least m/2 edges connect a vertex in A to a vertex in B, i.e., there is a cut with value at least m/2.
Proof.
Construct sets A and B by randomly and independently assigning each vertex to one of the two sets. Let e1, ..., em be an arbitrary enumeration of the edges in G. For i = 1, ..., m, define Xi such that
Xi = 1 if edge ei connects a vertex in A to a vertex in B, 0 otherwise.
Pr{edge ei connects a vertex in A to a vertex in B} = 1/2 (since we split the vertices into the two sets randomly). Xi is therefore a Bernoulli/indicator rvar with p = 1/2 and E[Xi] = p = 1/2.
Let C(A, B) be an rvar denoting the value of the cut between A and B. Then
E[C(A, B)] = E[Σ_{i=1}^{m} Xi] = Σ_{i=1}^{m} E[Xi] = m/2.
Since E[C(A, B)] = m/2, there must be at least one value of C(A, B) ≥ m/2.
-
Probabilistic Method: Proof → Algorithm
Basic procedure → Monte Carlo or Las Vegas algorithm:
Repeat the basic procedure a fixed number of times; return the best ≥ m/2 cut or FAIL (Monte Carlo)
Or, repeat the procedure until we find a ≥ m/2 cut (Las Vegas)
What is the expected number of tries before we find a cut with value ≥ m/2? We can use this as a guide for the number of times to repeat the basic steps until we find a Maxcut or FAIL (i.e., to direct a Monte Carlo algorithm). A code sketch follows the algorithm below.
Randomized Maxcut
Input: A graph G with n vertices and m edges.
Output: A partition of G into two sets A and B such that at least m/2 edges connect A and B.
1. Randomly choose a partition. This can be done in linear time by scanning through the vertices and flipping a fair coin to pick the destination set as A or B.
2. Check whether the selected cut is at least m/2, by counting the edges crossing the cut (polynomial time).
-
Probabilistic Method: Algorithm Performance
Expected number of tries before we find a cut with value ≥ m/2:
Let p = Pr{C(A, B) ≥ m/2}
The value of a cut cannot be more than the number of edges, i.e., C(A, B) ≤ m
The previous proof showed that E[C(A, B)] = m/2, so
m/2 = E[C(A, B)] = Σ_{i < m/2} i·Pr{C(A, B) = i} + Σ_{i ≥ m/2} i·Pr{C(A, B) = i}
    ≤ (1 − p)·(m/2 − 1) + p·m,
which rearranges to p ≥ 1/(m/2 + 1).
Recall the geometric probability distribution:
number of trials until the first success; Ω = {1, 2, ...}, fX(k) = p(1 − p)^(k−1), E[X] = 1/p
The expected number of tries before we find such a cut is 1/p, i.e., at most m/2 + 1
-
The Probabilistic Method: Example Recap
A method for proving the existence of objects
Why is it relevant? The proofs can be used to guide the construction of a randomized algorithm
There are also techniques to turn such proofs into deterministic algorithms: "derandomization"
What we just saw:
1. A problem: topology partitioning for fault-tolerance
2. Restated as a Maxcut problem
3. Existence proof for a Maxcut of value at least m/2
4. Constructed a simple randomized algorithm based on the proof
5. Analysis of the expected running time of the randomized algorithm
Question: was the algorithm Monte Carlo or Las Vegas?
-
Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
-
Hashing
Hash tables: a data structure that enables, on average, O(1) insertions and lookups
Useful when one would like to maintain a set of items, with fast lookup
Notation:
Top-level table/array, T[]
Element for insertion in the hash table, x, from a set U of possible elements
Key, k, is an identifier for x; assume we can easily map elements to integer keys
Hash function h(key[x]) specifies the index in T[] where element x should be stored
Assumption: simple uniform hashing; any element is equally likely to hash to any slot
That is, h(key[x]) distributes the x elements uniformly at random over the slots in T[]
-
Populating the Hash Table
Simplest approach: direct addressing. One element in T[] for each hash key, when we can afford the space cost
May make sense when the number of keys to be stored is approximately the number of possible keys, |U|
Collisions
Want T[] to have about as many elements as we'll insert, n (not as many as exist, |U|)
Want h() to map the larger set with |U| elements to m slots
Since m < |U|, it is possible to have multiple elements hash to the same slot
Can resolve collisions with two different approaches: chain hashing or open addressing
Chain hashing: keep items that hash to the same slot in a linked list or chain
Will now need to search through the chain for insert/delete/lookup
The ratio α = n/m is called the load
[Figure: elements x1, x2, ..., x6 = {2, 0, 3, 1, 9, 5}, drawn from U = {0, ..., 9}, hashed into the bins or slots 0-9 of T[].]
-
Expected Search Time in Chain Hashing
Expected number of comparisons (assume new elements are added to the head of the chain, and simple uniform hashing):
If the element is not already in the hash table (compare to all elements in bin h(key(x))): Θ(1 + α)
If the element is in the hash table (stop when we find the element in bin h(key(x))): Θ(1 + α)
Proof.
Assume the element we seek is equally likely to be any of the n elements in the table. The number of elements examined in a lookup for element x is Lx = 1 + the number of elements in bin h(key(x)) added after x (elements seen in the chain before x were added after x was).
Now, we can find the average Lx by calculating the expected value over the n possible elements in the table.
Let xi denote the ith element inserted into the table, i = 1, ..., n, and ki = key(xi). Define an indicator rvar
Xij = 1 if h(ki) = h(kj) (which happens with probability 1/m), 0 otherwise; so E[Xij] = 1/m. Thus
E[(1/n)·Σ_{i=1}^{n} Lxi] = E[(1/n)·Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} Xij)]
 = (1/n)·Σ_{i=1}^{n} (1 + Σ_{j=i+1}^{n} E[Xij])
 = 1 + (1/(nm))·Σ_{i=1}^{n} (n − i)
 = 1 + (1/(nm))·(n² − n(n+1)/2)
 = 1 + (n − 1)/(2m)
 = 1 + α/2 − α/(2n). Not a constant.
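An empirical sketch of the 1 + (n − 1)/(2m) result (Python's built-in hash stands in for a simple-uniform hash function; the parameters are arbitrary):

```python
import random
import statistics

# Chain hashing with insertion at the head of each chain.
m, n = 64, 256                        # slots and elements: load alpha = 4
keys = random.sample(range(10 ** 9), n)
table = [[] for _ in range(m)]
for k in keys:
    table[hash(k) % m].insert(0, k)   # new elements go to the chain head

def successful_search_cost(k):
    return table[hash(k) % m].index(k) + 1   # comparisons until k is found

print(statistics.mean(successful_search_cost(k) for k in keys))
print(1 + (n - 1) / (2 * m))          # = 1 + alpha/2 - alpha/(2n)
```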
-
Hash Functions and Universal Hashing
Universal hashing: at runtime, pick the hash function that will be used at random...
... from a family of universal hash functions
Universal hashing gives good average-case behavior: if key k is in the table, the expected length of the chain containing k is at most 1 + α
Definition [Universal Hash Function].
A finite collection, H, of hash functions that map a given universe U of keys into the range {0, 1, ..., m − 1} is said to be universal if, for each pair of distinct keys k, l ∈ U, the number of hash functions h ∈ H for which h(k) = h(l) is at most |H|/m.
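For concreteness, a sketch of the classic Carter-Wegman construction h_{a,b}(k) = ((a·k + b) mod p) mod m, one standard universal family (the particular prime p below is an assumption for illustration; it must exceed the largest key):

```python
import random

# Carter-Wegman family, universal for integer keys smaller than the prime P.
P = 2 ** 61 - 1

def make_universal_hash(m):
    """Pick one member of the family uniformly at random, at runtime."""
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m

h = make_universal_hash(64)
print(h(12345), h(67890))
```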
-
Other Forms: Perfect Hashing, Bloom Filters
Perfect hashing: uses two levels of hashing with universal hash functions; second-level hashing upon collision
Can guarantee no collisions at the second level
Unlike other forms of hashing, worst-case lookup performance is O(1)
Bloom filters: a tradeoff between space and false positive probability
Insertion: for each element xi to be inserted, calculate k hashes and set
T[h1(xi)] ← 1, ..., T[hk(xi)] ← 1
Checking: calculate the k hashes of element x;
if T[h1(x)] = 1, and ... and T[hk(x)] = 1, then answer "yes"
[Figure: the bit array T, e.g., T: 0 1 1 0 1 0 0 1 ...]
After inserting n elements with k hashes each into an m-bit table T[], the probability of a given element of T[] being zero is (1 − 1/m)^(kn).
The probability of a false positive is then (1 − (1 − 1/m)^(kn))^k.
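A minimal Bloom filter sketch (seeding Python's built-in hash with k random values stands in for k independent hash functions; this is an assumption for illustration, not the lecture's construction), together with the false-positive formula above:

```python
import random

class BloomFilter:
    """m-bit array with k hash functions."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m
        self.seeds = [random.randrange(1 << 30) for _ in range(k)]

    def _indexes(self, x):
        return [hash((seed, x)) % self.m for seed in self.seeds]

    def insert(self, x):
        for i in self._indexes(x):
            self.bits[i] = 1

    def may_contain(self, x):
        return all(self.bits[i] for i in self._indexes(x))

n, m, k = 100, 1024, 5
bf = BloomFilter(m, k)
for x in range(n):
    bf.insert(x)
print(bf.may_contain(42), bf.may_contain(4242))   # True, (probably) False
print((1 - (1 - 1 / m) ** (k * n)) ** k)          # predicted false-positive rate
```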
-
Other Forms: Open Addressing
All elements stored in the top-level table T[] itself: no chaining
α ≤ 1, since the hash table can get full once its m slots are taken by elements
Upon a collision, the hash function defines the next slot to probe, until an empty slot is found
Advantages: no need for the pointers used in chaining; may have more slots for the same memory usage
Disadvantages:
Entry deletion is complicated: can't simply remove an entry, as doing so would affect probe sequences
Probe sequence strategies: linear probing, quadratic probing, double hashing
-
Lecture Outline
Motivation
Probability Theory Refresher
Example Randomized Algorithm and Analysis
Tail Distribution Bounds
Example Application of Tail Bounds
Chernoff Bounds
The Probabilistic Method
Hashing
Summary of Key Ideas
-
Summary
Why randomized algorithms and analyses?
Analysis of algorithms that make use of randomness
Analysis of algorithms in the presence of random input
Designing algorithms that avoid pathological behavior using random decisions
Probability review: probability spaces, events, random variables
Characteristics of random variables: expectation, moments
Randomized algorithms and probabilistic analysis
Tail distribution bounds: Markov inequality, Chebyshev inequality, Chernoff bounds
The Probabilistic Method: proofs → algorithms
Hashing example and analysis
-
Probing Further...
Books:
Algorithm Design (Kleinberg and Tardos), chapter 13
Randomized Algorithms (Motwani and Raghavan)
Probability and Computing (Mitzenmacher and Upfal)