cis435 week03

Summations, Probability, and Randomized Algorithms Advanced Data Structures & Algorithm Design

Upload: bansal-ashish

Post on 20-Nov-2014


Page 1: Cis435 week03

Summations, Probability, and Randomized Algorithms

Advanced Data Structures & Algorithm Design

Page 2: Cis435 week03

Introduction

Often the result of analyzing an algorithm is a summation.

Loops translate directly to summations, and recursion can often be reduced to a summation.

Section 3.1 discusses a number of useful summation formulas and properties; refer to it as needed throughout the semester.

Page 3: Cis435 week03

Bounding Summations

Being able to determine an upper bound on a summation is important if we want to actually use the summation in our analysis.

We will investigate four methods of bounding summations: using induction, bounding the terms, splitting the summations, and approximation by integrals.

Page 4: Cis435 week03

Bounding Summations Using Induction

We’ve already discussed induction as a method for bounding recurrences and for solving summations.

Induction can also be used to show a bound on a summation, rather than the exact value of the summation.

As an example, we will show the bound for the following summation:

$$\sum_{k=0}^{n} 3^k = O(3^n)$$

Page 5: Cis435 week03

Bounding Summations Using Induction

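We must show that $\sum_{k=0}^{n} 3^k \le c \cdot 3^n$ for some constant $c$.

For $n = 0$: $\sum_{k=0}^{0} 3^k = 1 \le c \cdot 1$ for all $c \ge 1$.

We assume the bound holds for $n$, and must prove that it holds for $n + 1$:

$$\sum_{k=0}^{n+1} 3^k = \sum_{k=0}^{n} 3^k + 3^{n+1} \le c \cdot 3^n + 3^{n+1} = \left(\frac{1}{3} + \frac{1}{c}\right) c \cdot 3^{n+1} \le c \cdot 3^{n+1}$$

as long as $(1/3 + 1/c) \le 1$, or $c \ge 3/2$.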
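The bound can also be spot-checked numerically. The sketch below is only an illustration (the names `geometricSum3` and `boundHolds` are not from the slides); it computes the sum exactly in 64-bit integers and compares it with c·3^n for c = 3/2.

```cpp
#include <cstdint>

// Returns sum_{k=0}^{n} 3^k, computed exactly (safe for n <= 38 in 64 bits).
std::uint64_t geometricSum3(unsigned n)
{
    std::uint64_t sum = 0, power = 1;   // power holds 3^k
    for (unsigned k = 0; k <= n; ++k) {
        sum += power;
        power *= 3;
    }
    return sum;
}

// Checks sum <= (3/2) * 3^n, rewritten as 2 * sum <= 3^(n+1) to stay in integers.
bool boundHolds(unsigned n)
{
    std::uint64_t powerNext = 1;        // becomes 3^(n+1)
    for (unsigned k = 0; k <= n; ++k) powerNext *= 3;
    return 2 * geometricSum3(n) <= powerNext;
}
```

A numeric check like this is no substitute for the induction proof; it only confirms the constant on the values tried.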

Page 6: Cis435 week03

Bounding Summations by Bounding the Terms

Sometimes a series can be bounded by bounding the individual terms in the series.

We can quickly bound a series using the largest term of the series, then derive a series bound from it:

$$\sum_{k=1}^{n} k \le \sum_{k=1}^{n} n = n^2$$

Page 7: Cis435 week03

Bounding Summations by Bounding the Terms

In general, for a series $\sum_{k=1}^{n} a_k$, if we let $a_{\max} = \max_{1 \le k \le n} a_k$, then

$$\sum_{k=1}^{n} a_k \le n \cdot a_{\max}$$

This technique gives a weak bound when the series can instead be bounded by a geometric series.

Page 8: Cis435 week03

Bounding Summations by Bounding the Terms

Suppose we have a series $\sum_{k=0}^{n} a_k$ such that $a_{k+1}/a_k \le r$ for some constant $r < 1$ and all $k \ge 0$.

In other words, the ratio of consecutive terms in the series is bounded by a constant less than 1.

If this property holds for all $k$, then every term in the series satisfies $a_k \le a_0 r^k$.

In this case, we can bound the series using an infinite decreasing geometric series:

Page 9: Cis435 week03

Bounding Summations by Bounding the Terms

$$\sum_{k=0}^{n} a_k \le \sum_{k=0}^{\infty} a_0 r^k = a_0 \sum_{k=0}^{\infty} r^k = a_0 \frac{1}{1 - r}$$

Page 10: Cis435 week03

Bounding Summations by Bounding the Terms

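For example, suppose we want to bound the summation

$$\sum_{k=1}^{\infty} \frac{k}{3^k}$$

The first term is just 1/3, and the ratio of consecutive terms is:

$$\frac{(k+1)/3^{k+1}}{k/3^k} = \frac{1}{3} \cdot \frac{k+1}{k} \le \frac{2}{3}$$

for all $k \ge 1$. We can now use this value of $r = 2/3$ to create our bound:

$$\sum_{k=1}^{\infty} \frac{k}{3^k} \le \sum_{k=1}^{\infty} \frac{1}{3} \left(\frac{2}{3}\right)^{k-1} = \frac{1}{3} \cdot \frac{1}{1 - 2/3} = 1$$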
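As a rough sanity check on a bound of 1 for the series k/3^k, the partial sums can be evaluated directly. This is a minimal sketch in double precision; `partialSum` is an illustrative name, not something from the slides.

```cpp
// Partial sum of k / 3^k for k = 1..n, in double precision.
double partialSum(int n)
{
    double sum = 0.0, power = 1.0;
    for (int k = 1; k <= n; ++k) {
        power *= 3.0;          // power == 3^k
        sum += k / power;
    }
    return sum;
}
```

The partial sums settle near 3/4 (the exact value of the infinite series), comfortably below the geometric-series bound of 1, which shows the bound is valid but not tight.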

Page 11: Cis435 week03

Bounding Summations by Splitting Summations

Difficult summations can sometimes be “split apart” into pieces that are easier to solve individually.

In these situations, the summation range is split and the summation is expressed as the sum over each partition.

This technique can be used to ignore a small number of initial terms, when each term in the summation is independent of n.
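A standard worked example of splitting (not taken from these slides) is a lower bound on the arithmetic series. Assuming for convenience that n is even, split the range at n/2, bound every term in the first half below by 0, and every term in the second half below by n/2:

```latex
\sum_{k=1}^{n} k
  = \sum_{k=1}^{n/2} k + \sum_{k=n/2+1}^{n} k
  \ge \sum_{k=1}^{n/2} 0 + \sum_{k=n/2+1}^{n} \frac{n}{2}
  = \left(\frac{n}{2}\right)^2
  = \Omega(n^2)
```

Bounding all n terms below by the smallest term would only give a linear lower bound; splitting the range recovers the quadratic one.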

Page 12: Cis435 week03

Bounding Summations Using Approximation by Integrals

Approximating the summation through the use of integration provides a convenient means of obtaining a bound.

This technique can be used when the summation can be expressed as the sum of some f(k), where f(k) is monotonically increasing or decreasing.

In other words, any x >= y implies f(x) >= f(y) (monotonically increasing) or f(x) <= f(y) (monotonically decreasing).

Page 13: Cis435 week03

Bounding Summations Using Approximation by Integrals

If f(k) is monotonically increasing, the summation can be approximated by the integrals:

$$\int_{m-1}^{n} f(x)\,dx \le \sum_{k=m}^{n} f(k) \le \int_{m}^{n+1} f(x)\,dx$$

Similarly, if f(k) is monotonically decreasing, the summation can be approximated by the integrals:

$$\int_{m}^{n+1} f(x)\,dx \le \sum_{k=m}^{n} f(k) \le \int_{m-1}^{n} f(x)\,dx$$

Page 14: Cis435 week03

Bounding Summations Using Approximation by Integrals

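As an example, consider the harmonic function 1/k:

For a lower bound, we obtain:

$$\sum_{k=1}^{n} \frac{1}{k} \ge \int_{1}^{n+1} \frac{dx}{x} = \ln(n + 1)$$

The upper bound is similar:

$$\sum_{k=2}^{n} \frac{1}{k} \le \int_{1}^{n} \frac{dx}{x} = \ln n, \quad \text{so} \quad \sum_{k=1}^{n} \frac{1}{k} \le \ln n + 1$$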
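The integral bounds on the harmonic sum, ln(n+1) <= H_n <= ln(n) + 1, are easy to check numerically. The following is a small sketch; the function names are illustrative only.

```cpp
#include <cmath>

// H_n = sum_{k=1}^{n} 1/k
double harmonic(int n)
{
    double sum = 0.0;
    for (int k = 1; k <= n; ++k) sum += 1.0 / k;
    return sum;
}

// True when ln(n+1) <= H_n <= ln(n) + 1, for n >= 1.
bool withinIntegralBounds(int n)
{
    const double h = harmonic(n);
    return std::log(n + 1.0) <= h && h <= std::log(static_cast<double>(n)) + 1.0;
}
```

For n = 1 the upper bound is met with equality (H_1 = 1 = ln 1 + 1); for larger n both inequalities are strict.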

Page 15: Cis435 week03

Counting Theory

Counting theory attempts to answer the question of “how many” without enumerating all the possibilities.

E.g., how many permutations of the string “discovery” are there?

One way to find out is to write ‘em all down.

Counting theory lets us calculate the answer without having to.

Page 16: Cis435 week03

Rules of sum and product

Given a set of items, we can sometimes count the items using one of these rules:

Rule of sum: the number of ways to choose an element from one of two disjoint sets is the sum of the sizes of the sets.

Rule of product: the number of ways to choose an ordered pair is the number of ways to choose the first element times the number of ways to choose the second element.

Page 17: Cis435 week03

Strings

String = a sequence of elements from the same set.

If the string is of length k, we sometimes call it a k-string.

Given a string s, a substring s’ is an ordered sequence of consecutive elements of s.

A k-substring is therefore a substring of length k.

Given a set S of size n, how many k-strings can be formed from the elements of S?

Page 18: Cis435 week03

How many k-strings are in a set?

If we have n elements, and can pick any element for any position in the string, then there are n choices for the first position, n for the second, etc.

The rule of product applies over the entire string.

The answer is therefore $n^k$.
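The n^k count can be confirmed by brute-force enumeration for small cases. The sketch below is illustrative only (the helper names and the use of `std::string` as the alphabet are assumptions, not part of the slides); it builds every k-string by choosing one of the n elements for each position in turn.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends every k-string over `alphabet` that starts with `prefix` to `out`.
void enumerateKStrings(const std::string& alphabet, std::size_t k,
                       std::string& prefix, std::vector<std::string>& out)
{
    if (prefix.size() == k) { out.push_back(prefix); return; }
    for (char ch : alphabet) {          // n choices for the next position
        prefix.push_back(ch);
        enumerateKStrings(alphabet, k, prefix, out);
        prefix.pop_back();
    }
}

// Returns all k-strings over `alphabet`; there are n^k of them.
std::vector<std::string> kStrings(const std::string& alphabet, std::size_t k)
{
    std::vector<std::string> out;
    std::string prefix;
    enumerateKStrings(alphabet, k, prefix, out);
    return out;
}
```

For instance, `kStrings("ab", 3)` yields 2^3 = 8 strings, starting with "aaa".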

Page 19: Cis435 week03

Permutations

A permutation is an ordered sequence of all the elements of a set S, with each element appearing exactly once.

For example, if S = { c, a, t }, there are 6 permutations of S: cat, cta, act, atc, tca, tac

For the entire set S consisting of n elements, there are n! permutations.
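For small sets, the n! count can be checked by actually enumerating the permutations. This sketch uses the standard library's `std::next_permutation`; the function name `countPermutations` is illustrative.

```cpp
#include <algorithm>
#include <string>

// Counts the permutations of the characters of s by enumeration.
// Assumes all characters are distinct, as in "cat" or "discovery".
unsigned long countPermutations(std::string s)
{
    std::sort(s.begin(), s.end());      // start from the smallest arrangement
    unsigned long count = 0;
    do { ++count; } while (std::next_permutation(s.begin(), s.end()));
    return count;
}
```

`countPermutations("cat")` gives 6 = 3!, and `countPermutations("discovery")` gives 362880 = 9!, since the nine letters are all distinct. Enumeration quickly becomes impractical as n grows, which is the point of having the formula.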

Page 20: Cis435 week03

K-Permutations

A k-permutation is an ordered sequence of k elements from S, with no element appearing more than once.

E.g., if S = { a, b, c, d }, then there are 12 2-permutations: ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc

If S has n elements, then the number of k-permutations of S is n!/(n-k)!

Page 21: Cis435 week03

Combinations

A k-combination of an n-set S is simply a k-subset of S.

Combinations must be distinct, but elements in the combination are unordered (unlike permutations).

For every k-combination, there are k! permutations of its elements.

Each such permutation is a distinct k-permutation of S.

The number of k-combinations is therefore just the number of k-permutations divided by k!: n!/(k!(n-k)!)
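Both counting formulas, n!/(n-k)! for k-permutations and n!/(k!(n-k)!) for k-combinations, can be computed without forming the full factorials. The following is a minimal sketch with illustrative function names; it assumes the results fit in 64 bits.

```cpp
#include <cstdint>

// n! / (n-k)!: the number of k-permutations of an n-set (0 if k > n).
std::uint64_t kPermutations(unsigned n, unsigned k)
{
    if (k > n) return 0;
    std::uint64_t result = 1;
    for (unsigned i = 0; i < k; ++i) result *= (n - i);
    return result;
}

// n! / (k! (n-k)!): the number of k-combinations of an n-set (0 if k > n).
std::uint64_t kCombinations(unsigned n, unsigned k)
{
    if (k > n) return 0;
    std::uint64_t result = 1;
    // After step i, result equals C(n-k+i, i), so each division is exact.
    for (unsigned i = 1; i <= k; ++i) result = result * (n - k + i) / i;
    return result;
}
```

With S = { a, b, c, d }, `kPermutations(4, 2)` returns 12 and `kCombinations(4, 2)` returns 12 / 2! = 6.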

Page 22: Cis435 week03

Probability

Probability is defined in terms of a sample space S, a set whose elements are called elementary events.

Each elementary event is a possible outcome of an experiment.

E.g., flipping two coins can result in one of 4 elementary events, which make up the sample space: S = { HH, HT, TH, TT }

Page 23: Cis435 week03

Probability

An event is a subset of the sample space S.

E.g., the event of obtaining one head and one tail is the subset { HT, TH }.

The event S is called the certain event.

The event {} is called the null event.

Two events A and B are mutually exclusive if they cannot occur simultaneously, i.e., A ∩ B = {}

Page 24: Cis435 week03

Probability

The probability of an event A is written Pr{A}.

A probability distribution is a way to map the events of S to real numbers, such that these axioms are met:

1. Pr{A} >= 0 for any event A
2. Pr{S} = 1
3. Pr{A ∪ B} = Pr{A} + Pr{B} for any two mutually exclusive events A and B

Page 25: Cis435 week03

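Discrete Probability Distributions

A distribution is discrete if it is defined over a finite or countably infinite sample space. The probability of an event A is then the sum of the probabilities of its elementary events:

$$\Pr\{A\} = \sum_{s \in A} \Pr\{s\}$$

A uniform distribution (over a finite sample space S) is a distribution such that all elementary events are equally likely, i.e., picking an element at random:

$$\Pr\{s\} = 1 / |S|$$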

Page 26: Cis435 week03

Continuous Uniform Probability Distributions

A probability distribution in which not all subsets of the sample space are considered to be events.

It is defined over a closed interval [a, b], with each point in the interval being equally likely.

Since the number of points is uncountable, we cannot give each point the same positive probability and still satisfy axioms 2 & 3; the probability of each “point” is effectively 0.

Page 27: Cis435 week03

Continuous Uniform Probability Distributions

Given a closed range [a, b] with a < b, and any closed interval [c, d] on that range such that a <= c <= d <= b, the continuous uniform probability distribution defines the probability of the event [c, d] to be

$$\Pr\{[c, d]\} = \frac{d - c}{b - a}$$

Page 28: Cis435 week03

Discrete Random Variables

A discrete random variable X is a function from a finite or countably infinite sample space S to the real numbers.

It associates a real number with each possible outcome of an experiment.

This lets us work with the probability distribution induced on the resulting set of numbers.

X is random in the sense that its value depends on the outcome of some experiment, and cannot be predicted with certainty before the experiment is run.

Page 29: Cis435 week03

Discrete Random Variables

To use discrete random variables, we must define a probability density function:

$$\Pr\{X = x\} = \sum_{s \in S : X(s) = x} \Pr\{s\}$$

This is the probability that X takes on some particular value x.

It is simply the sum of the probabilities of all the elementary events that the random variable maps to x.

Page 30: Cis435 week03

Discrete Random Variables

Let’s look at rolling two 6-sided dice:

X is a random variable defining the maximum of the two values shown on the dice.

There are 36 possible elementary events (2 dice, each with 6 faces).

If we consider the event X = 3, meaning the highest value on either die is three, then Pr{X=3} = 5/36 (five possible outcomes out of 36 total: (1,3), (2,3), (3,3), (3,2), (3,1)).
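The 5/36 figure can be reproduced by enumerating the sample space. The sketch below (function names are illustrative) loops over all 36 outcomes of two fair dice and counts those whose maximum equals x.

```cpp
#include <algorithm>

// Number of outcomes (out of 36) of two six-sided dice whose maximum is x.
int outcomesWithMax(int x)
{
    int count = 0;
    for (int d1 = 1; d1 <= 6; ++d1)
        for (int d2 = 1; d2 <= 6; ++d2)
            if (std::max(d1, d2) == x) ++count;
    return count;
}

// Pr{X = x} under the uniform distribution on the 36 outcomes.
double probabilityOfMax(int x) { return outcomesWithMax(x) / 36.0; }
```

`outcomesWithMax(3)` returns 5, and in general the count for a value x between 1 and 6 is 2x - 1, so the six counts add up to 36.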

Page 31: Cis435 week03

Expected Value of a Random Variable

This is the “average” of the values it takes on:

$$E[X] = \sum_{x} x \Pr\{X = x\}$$

Expected value defines the center of the distribution of the variable, i.e., if we were to repeat the experiment many times, the mean value of X over all experiments would approach the expected value.
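To make the definition concrete, here is a sketch that applies it to the maximum of two fair six-sided dice: for each value x it counts the outcomes with that maximum, converts the count to a probability, and accumulates x·Pr{X = x}. The function name is illustrative.

```cpp
#include <algorithm>

// E[X] = sum over x of x * Pr{X = x}, for X = max of two fair six-sided dice.
double expectedMaxOfTwoDice()
{
    double expected = 0.0;
    for (int x = 1; x <= 6; ++x) {
        int count = 0;                       // outcomes (d1, d2) with max == x
        for (int d1 = 1; d1 <= 6; ++d1)
            for (int d2 = 1; d2 <= 6; ++d2)
                if (std::max(d1, d2) == x) ++count;
        expected += x * (count / 36.0);      // x * Pr{X = x}
    }
    return expected;
}
```

The result is 161/36, roughly 4.47, which is not a value the variable can actually take; the expected value is a center of mass, not a most likely outcome.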

Page 32: Cis435 week03

Variance and Standard Deviation

Variance: $\mathrm{Var}[X] = E[X^2] - E^2[X]$

This is a measure of how much the values of X spread out around the expected value.

Standard Deviation: sqrt(Var[X])
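The variance formula translates directly into code once a distribution is written down as values with probabilities. The `Distribution` struct and function names below are hypothetical, chosen only for this sketch.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// A discrete distribution given as parallel lists of values and probabilities.
struct Distribution {
    std::vector<double> values;
    std::vector<double> probs;
};

// E[X] = sum of x * Pr{X = x}
double expectation(const Distribution& d)
{
    double e = 0.0;
    for (std::size_t i = 0; i < d.values.size(); ++i) e += d.values[i] * d.probs[i];
    return e;
}

// Var[X] = E[X^2] - E^2[X]
double variance(const Distribution& d)
{
    double e2 = 0.0;
    for (std::size_t i = 0; i < d.values.size(); ++i)
        e2 += d.values[i] * d.values[i] * d.probs[i];
    const double e = expectation(d);
    return e2 - e * e;
}

double standardDeviation(const Distribution& d) { return std::sqrt(variance(d)); }
```

For a fair coin encoded as values {0, 1} with probabilities {0.5, 0.5}, this gives E[X] = 0.5, Var[X] = 0.25, and a standard deviation of 0.5.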

Page 33: Cis435 week03

QuickSort

    void QuickSort(ArrayType &A, int begin, int end)
    {
        if ( begin < end )
        {
            int q = Partition(A, begin, end);
            QuickSort(A, begin, q-1);
            QuickSort(A, q+1, end);
        }
    }

This is a great algorithm if all inputs are equally likely.

That’s not always the case!

We can overcome the problem of worst case input by introducing some randomness into the algorithm.
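The slide does not show Partition or define ArrayType. One common choice that fits the recursion on q-1 and q+1 is a Lomuto-style partition that pivots on the last element. The sketch below is written under that assumption, with `ArrayType` taken to be `std::vector<int>` purely for illustration.

```cpp
#include <utility>
#include <vector>

using ArrayType = std::vector<int>;   // assumption: the slides leave ArrayType unspecified

// Lomuto-style partition around the pivot A[end] (an assumed implementation).
// Returns the pivot's final index q, with A[begin..q-1] <= A[q] < A[q+1..end].
int Partition(ArrayType& A, int begin, int end)
{
    const int pivot = A[end];
    int i = begin - 1;                       // last index of the "<= pivot" region
    for (int j = begin; j < end; ++j) {
        if (A[j] <= pivot) std::swap(A[++i], A[j]);
    }
    std::swap(A[i + 1], A[end]);             // place the pivot between the two regions
    return i + 1;
}

void QuickSort(ArrayType& A, int begin, int end)
{
    if (begin < end) {
        const int q = Partition(A, begin, end);
        QuickSort(A, begin, q - 1);
        QuickSort(A, q + 1, end);
    }
}
```

A call such as `QuickSort(v, 0, static_cast<int>(v.size()) - 1)` sorts the whole vector; both indices are inclusive, matching the slide's signature.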

Page 34: Cis435 week03

QuickSort

When does the worst case occur in QuickSort? Why?

It occurs because the partition algorithm does not partition the array evenly.

One way to improve it might be to partition around the middle element, but even so, there is still a worst case input for which the array is anti-optimally partitioned.
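The uneven-partition effect can be made visible by counting comparisons. The sketch below assumes a quicksort that always pivots on the last element (an assumption; the slides do not show their Partition) and returns how many element comparisons it performs.

```cpp
#include <utility>
#include <vector>

// Sorts A[begin..end] with a last-element-pivot quicksort and returns the
// number of element comparisons performed.
long long countComparisons(std::vector<int>& A, int begin, int end)
{
    if (begin >= end) return 0;
    const int pivot = A[end];
    int i = begin - 1;
    long long comparisons = 0;
    for (int j = begin; j < end; ++j) {
        ++comparisons;
        if (A[j] <= pivot) std::swap(A[++i], A[j]);
    }
    std::swap(A[i + 1], A[end]);
    const int q = i + 1;
    return comparisons + countComparisons(A, begin, q - 1)
                       + countComparisons(A, q + 1, end);
}
```

On an already sorted array of n distinct elements, every partition puts all remaining elements on one side, so the count is (n-1) + (n-2) + ... + 1 = n(n-1)/2; for n = 100 that is 4950 comparisons.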

Page 35: Cis435 week03

Randomized Algorithms

An algorithm is randomized if its behavior is determined not only by the input, but also by the output of a random number generator

Let’s assume a random number generator with a function Random(a, b) that returns a random integer in the range a to b, inclusive

We will use a random number generator to impose a distribution such that no particular input elicits its worst case behavior

Page 36: Cis435 week03

Randomized Algorithms

A randomized strategy is useful when there are many ways an algorithm can proceed, but no good way to know which way is “good”; if many alternatives are good, you pick one randomly

The benefits of good choices must outweigh the cost of bad choices

Page 37: Cis435 week03

Randomized QuickSort

How do we randomize QuickSort?

Create a RandomizedPartition function, then use that in our main QuickSort function:

    void RandomizedQuickSort(ArrayType &A, int begin, int end)
    {
        if ( begin < end )
        {
            int q = RandomizedPartition(A, begin, end);
            RandomizedQuickSort(A, begin, q-1);
            RandomizedQuickSort(A, q+1, end);
        }
    }

Page 38: Cis435 week03

Randomized QuickSort

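    int RandomizedPartition(ArrayType &A, int begin, int end)
    {
        // Pick a random index in [begin, end].
        int i = Random(begin, end);
        // Move the chosen element to A[end] (requires #include <utility>).
        // The temporary must have the element type, not ArrayType.
        std::swap(A[end], A[i]);
        return Partition(A, begin, end);
    }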
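Putting the pieces together, a complete version might look like the sketch below. Several parts are assumptions rather than the slides' own code: `ArrayType` is taken to be `std::vector<int>`, `Random(a, b)` is implemented with the standard `<random>` facilities, and `Partition` is a Lomuto-style partition that pivots on `A[end]`.

```cpp
#include <random>
#include <utility>
#include <vector>

using ArrayType = std::vector<int>;   // assumption: the slides do not define ArrayType

// Stand-in for the slides' Random(a, b): a uniform integer in [a, b].
int Random(int a, int b)
{
    static std::mt19937 engine{std::random_device{}()};
    return std::uniform_int_distribution<int>(a, b)(engine);
}

// Assumed Lomuto-style Partition with pivot A[end]; returns the pivot's final index.
int Partition(ArrayType& A, int begin, int end)
{
    const int pivot = A[end];
    int i = begin - 1;
    for (int j = begin; j < end; ++j)
        if (A[j] <= pivot) std::swap(A[++i], A[j]);
    std::swap(A[i + 1], A[end]);
    return i + 1;
}

int RandomizedPartition(ArrayType& A, int begin, int end)
{
    std::swap(A[end], A[Random(begin, end)]);   // a random element becomes the pivot
    return Partition(A, begin, end);
}

void RandomizedQuickSort(ArrayType& A, int begin, int end)
{
    if (begin < end) {
        const int q = RandomizedPartition(A, begin, end);
        RandomizedQuickSort(A, begin, q - 1);
        RandomizedQuickSort(A, q + 1, end);
    }
}
```

Because the pivot is chosen at random, an already sorted or reverse-sorted input no longer forces the lopsided partitions that a fixed last-element pivot would produce.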

Page 39: Cis435 week03

Analysis of Randomized QuickSort

How does this change our previous analysis?

We have added a constant amount of work to each call to Partition, which can be ignored.

However, we have made worst-case behavior very unlikely: no particular input can create it, only an unlucky sequence of random pivot choices.

So the worst-case and average-case bounds don’t actually change; however, the expected running time now holds for every input, rather than depending on how the inputs are distributed.

Page 40: Cis435 week03

Random Numbers

Adding randomness to an algorithm implies an ability to generate random numbers

Computers are unfortunately not directly capable of generating truly random sequences

The approach that is generally taken is therefore to generate a sequence of “pseudo-random” numbers that exhibits good random behavior

Ref: Numerical Recipes in C, Chapter 7, available online at http://www.ulib.org/webRoot/Books/Numerical_Recipes/

Page 41: Cis435 week03

Random Numbers

Most languages have a set of library functions for generating pseudo-random numbers.

System-supplied generators typically suffer from a number of problems due to poor specification and implementation:

The sequence generally repeats with a period no greater than 32767.

The randomness of the sequence is highly dependent on the implementor’s choice of constants used by the algorithm, and in many standard implementations the choice is poor.

Page 42: Cis435 week03

Random Numbers

One fast method of choosing random numbers is the linear congruential method.

Each number in the sequence is determined by a mathematical operation performed on the previous number:

$$I_{j+1} = (a I_j + c) \bmod m$$

m is the modulus, and bounds the period of the generated sequence (the period is at most m).

a is called the multiplier.

c is called the increment.
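A tiny modulus makes the role of a, c, and m easy to see. The sketch below (illustrative names; it assumes a·m fits in 64 bits) implements one step of the recurrence and measures how many steps it takes for the sequence to return to its seed.

```cpp
#include <cstdint>

// One step of the linear congruential recurrence: (a * current + c) mod m.
std::uint64_t lcgNext(std::uint64_t current, std::uint64_t a,
                      std::uint64_t c, std::uint64_t m)
{
    return (a * current + c) % m;
}

// Steps until the sequence first returns to `seed`, or 0 if it has not
// returned within m steps (the seed is then not on a cycle).
std::uint64_t lcgPeriod(std::uint64_t seed, std::uint64_t a,
                        std::uint64_t c, std::uint64_t m)
{
    std::uint64_t value = seed % m;
    const std::uint64_t start = value;
    for (std::uint64_t steps = 1; steps <= m; ++steps) {
        value = lcgNext(value, a, c, m);
        if (value == start) return steps;
    }
    return 0;
}
```

With m = 16, the choice a = 5, c = 3 visits all 16 values before repeating, while a = 3, c = 2 starting from 0 cycles through only 0, 2, 8, 10. Same modulus, very different period.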

Page 43: Cis435 week03

Random Numbers

The quality of the generator here is highly dependent on the choice of m, a, and c.

Poor choices will limit the period and, more importantly, significantly impact the randomness of the sequence.

We can eliminate the explicit modulus operation by using 32-bit unsigned integers and choosing m = $2^{32}$.

The product can be up to 64 bits wide, but since we’re using 32-bit variables, the high-order bits are truncated, which is the same as reducing modulo $2^{32}$.

Some good choices for a and c are: a = 1664525, c = 1013904223
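The implicit modulus can be sketched with a fixed-width type. The function below (the name `nextValue` is illustrative) uses `std::uint32_t`, whose arithmetic is defined to wrap modulo 2^32 on the usual platforms where it is the same type as `unsigned int`.

```cpp
#include <cstdint>

// One step of the generator with a = 1664525, c = 1013904223, m = 2^32.
// Unsigned 32-bit arithmetic wraps around, so no explicit % is needed.
std::uint32_t nextValue(std::uint32_t previous)
{
    return 1664525u * previous + 1013904223u;
}
```

Starting from 0, the first value is simply c = 1013904223, and the next is 1196435762; the second product exceeds 32 bits and is silently reduced.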

Page 44: Cis435 week03

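A Random Number Generator Implementation

    #include <cstdint>

    // Linear congruential generator with m = 2^32. A fixed 32-bit type is used
    // because unsigned long is 64 bits wide on many platforms.
    class Random
    {
    public:
        static constexpr std::uint32_t RANDMAX    = 0xFFFFFFFFu;   // 2^32 - 1
        static constexpr std::uint32_t MULTIPLIER = 1664525u;      // a
        static constexpr std::uint32_t INCREMENT  = 1013904223u;   // c

        explicit Random(std::uint32_t seed = 0) : m_seed(seed) {}

        // Advance the sequence and return the next value.
        std::uint32_t operator()() { return (m_seed = MULTIPLIER * m_seed + INCREMENT); }

        std::uint32_t seed() const { return m_seed; }
        void seed(std::uint32_t value) { m_seed = value; }

    private:
        std::uint32_t m_seed;
    };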

Page 45: Cis435 week03

Using the Random Number Generator

This generator will produce a number between 0 and $2^{32}-1$.

Note that prior to generating numbers, the generator should be seeded with a non-zero value.

If repeatability isn’t required, time() is often used to generate the seed.

To produce a number in an arbitrary range LO to HI (inclusive), use the following:

    j = LO + (int) ((double)(HI - LO + 1) * rand() / (RAND_MAX + 1.0));

This forces the use of the high-order bits, which are much more random than the low-order bits.
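The same scaling idea can be applied to a raw 32-bit generator output instead of rand(). The helper below is a sketch with an illustrative name; it assumes the input is uniform on 0 to 2^32 - 1 and that lo <= hi.

```cpp
#include <cstdint>

// Maps a 32-bit generator output r to an integer in [lo, hi] by scaling,
// so the result is determined mainly by the high-order bits of r.
int scaleToRange(std::uint32_t r, int lo, int hi)
{
    const double unit = r / 4294967296.0;    // r / 2^32, in [0, 1)
    return lo + static_cast<int>(unit * (static_cast<double>(hi) - lo + 1.0));
}
```

For a die roll, `scaleToRange(r, 1, 6)` maps the smallest input to 1 and the largest to 6. The alternative `lo + r % (hi - lo + 1)` depends only on the low-order bits, which is what the scaling form is meant to avoid.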