randomly permuting arrayselomaa/teach/aads-2013-4.pdf · randomly permuting arrays • we assume...

9/19/2013

1

Randomly permuting arrays

• We assume that we are given an arraywhich, w.l.g., contains the elements 1, 2, … ,

• Produce a random permutation of the array• Assign element ] a random priority ],

and sort the elements according to priorities• E.g., if our initial array is = 1,2,3,4 and we

choose random priorities = 36,3,62,19 , wewould produce array = 2,4,1,3

• We call this procedure PERMUTE-BY-SORTING19-Sep-13MAT-72006 AA+DS, Fall 2013 157

Permute-By-Sorting(A)

1. . length2. let [1. . ] be a new array3. for = 1 to4. ] = RA N D O M (1, )5. sort , using as sort keys• We use a range of 1 to for random

numbers to make it likely that all the prioritiesin are unique

19-Sep-13MAT-72006 AA+DS, Fall 2013 158

9/19/2013

2

• It remains to prove that the procedureproduces a uniform random permutation,– It is equally likely to produce every

permutation of the numbers 1 through

Lemma 5.4 PERMUTE-BY-SORTING produces auniform random permutation of the input,assuming that all priorities are distinct

Proof See the book.19-Sep-13MAT-72006 AA+DS, Fall 2013 159

• It is better to permute the given array in place• RANDOMIZE-IN-PLACE does so in ) time• In its th iteration, it chooses the element ]

randomly from among elements through]

• After the th iteration, is never altered

RANDOMIZE-IN-PLACE )12 for = 1 to3 swap with [RANDOM( , )]

19-Sep-13MAT-72006 AA+DS, Fall 2013 160

9/19/2013

3

• A -permutation on a set of elements is asequence containing of the elements,with no repetitions

• There are ! ! -permutations

loop invariant:• Just prior to the th iteration of the for loop of

lines 2–3, for each possible 1)-permutation of the elements, the subarray

[1. . 1] contains this 1)-permutationwith probability + 1 ! !

19-Sep-13MAT-72006 AA+DS, Fall 2013 161

Initialization: loop invariant trivially holdsMaintenance:• Consider a particular -permutation, and

denote the elements in it by , … ,• This permutation consists of an 1)-

permutation , … , followed by thevalue that the algorithm places in ]

• Let denote the event in which the rst1) iterations have created the particular1)-permutation , … , in

[1. . 1]

19-Sep-13MAT-72006 AA+DS, Fall 2013 162

9/19/2013

4

• By the loop invariant, Pr = + 1 ! !• Let be the event that th iteration puts in

position ]• The -permutation , … , appears in

1. . precisely when both and occurPr Pr Pr{

• Pr = 1 + 1 because in line 3the algorithm chooses randomly from the

+ 1 values in positions . .

Pr =1

+ 1+ 1 !! =

!!

19-Sep-13MAT-72006 AA+DS, Fall 2013 163

Termination:• At termination, + 1, and we have that

the subarray 1. . is a given -permutationwith probability

+ 1 + 1 ! ! = 0! ! = 1 !

• Thus, RANDOMIZE-IN-PLACE produces auniform random permutation

19-Sep-13MAT-72006 AA+DS, Fall 2013 164

9/19/2013

5

5.4.1 The birthday paradox

• How many people must there be in a roombefore there is a 50% chance that two ofthem were born on the same day of the year?

• The answer is surprisingly few• The paradox is that it is in fact far fewer

– than the number of days in a year, or– even half the number of days in a year

19-Sep-13MAT-72006 AA+DS, Fall 2013 165

An analysis using indicatorrandom variables

• We use indicator random variables to providea simple but approximate analysis of thebirthday paradox

• For each pair of the people in theroom, de ne the indicator random variable

, for , by• and have the same birthday}

= and have the same birthday0 otherwise

19-Sep-13MAT-72006 AA+DS, Fall 2013 166

9/19/2013

6

• Once birthday for is chosen, theprobability that is chosen to be the sameday is , where = 365

• E = Pr and have the same birthday =

• Let be a random variable counting thenumber of pairs of individuals having thesame birthday

=

19-Sep-13MAT-72006 AA+DS, Fall 2013 167

• Taking expectations of both sides andapplying linearity of expectation, we obtain

E = E

= ] = 21

=1)

• When 1) , the expected number ofpairs of people with the same birthday is atleast 1.

19-Sep-13MAT-72006 AA+DS, Fall 2013 168

9/19/2013

7

• Thus, if we have at least + 1 individualsin a room, we can expect at least two to havethe same birthday

• For = 365, if = 28, the expected numberof pairs with the same birthday is(28 27) (2 365) 1.0356

• With at least 28 people, we expect to nd atleast one matching pair of birthdays

• Analysis using only probabilities gives adifferent exact number of people, but sameasymptotically: ( )

19-Sep-13MAT-72006 AA+DS, Fall 2013 169

5.4.2 Balls and bins

• Consider tossing identical balls randomly intobins, numbered 1,2, … ,

• Tosses are independent, and on each tossthe ball is equally likely to end up in any bin

• The probability that a tossed ball lands in anygiven bin is

• The ball-tossing process is a sequence ofBernoulli trials with a probability ofsuccess the ball falls in the given bin

19-Sep-13MAT-72006 AA+DS, Fall 2013 170

9/19/2013

8

• How many balls fall in a given bin?– The number of balls that fall in a given bin

follows the binomial distribution , 1 )– If we toss balls, the expected number of

balls that fall in the given bin is• How many balls must we toss, on the

average, until a given bin contains a ball?– The number of tosses until the given bin

receives a ball follows the geometricdistribution with probability and

– the expected number of tosses until successis 1 (1 ) =

19-Sep-13MAT-72006 AA+DS, Fall 2013 171

• How many balls must we toss until every bincontains at least one ball?– Call a toss in which a ball falls into an empty

bin a “hit”– We want to know the expected number of

tosses required to get hits– We can partition the tosses into stages– The th stage consists of the tosses after the

1)st hit until the th hit– The rst stage consists of the rst toss, since

we are guaranteed to have a hit when all binsare empty

19-Sep-13MAT-72006 AA+DS, Fall 2013 172

9/19/2013

9

– During the th stage, 1 bins contain ballsand + 1 bins are empty

– For each toss in the th stage, the probabilityof obtaining a hit is + 1)/

– is the number of tosses in the th stage– The number of tosses required to get hits is

=– Each has a geometric distribution with

probability of success + 1)/

E = + 1

19-Sep-13MAT-72006 AA+DS, Fall 2013 173

E = E = ]

= + 1

1

(ln (1))• By harmonic series

19-Sep-13MAT-72006 AA+DS, Fall 2013 174

9/19/2013

10

• It therefore takes approximately ln tossesbefore we can expect that every bin has aball

• This problem is also known as the couponcollector’s problem, which says that aperson trying to collect each of differentcoupons expects to acquire approximately

ln randomly obtained coupons in order tosucceed

19-Sep-13MAT-72006 AA+DS, Fall 2013 175

II Sorting (and OrderStatistics)

HeapsortQuicksort

Sorting in Linear Time

9/19/2013

11

6 Heapsort

• Heapsort’s running time is lg )• It sorts in place

– only a constant number of array elements arestored outside the input array at any time

• Heapsort combines the better attributes ofthe sorting algorithms we have alreadydiscussed

• Heapsort also introduces another algorithmdesign technique: using a data structure

19-Sep-13MAT-72006 AA+DS, Fall 2013 177

6.1 Heaps

• The (binary) heap data structure is an arrayobject that we can view as a nearly completebinary tree

• Each node of the tree corresponds to anelement of the array

• The tree is completely lled on all levelsexcept possibly the lowest, which is lledfrom the left up to a point

19-Sep-13MAT-72006 AA+DS, Fall 2013 178

9/19/2013

12

• An array that represents a heap is anobject with two attributes:– . length gives the number of elements in the

array, and– . heap size represents how many elements

in the heap are stored within array

19-Sep-13MAT-72006 AA+DS, Fall 2013 179

• Although [1. . . length may containnumbers, only the elements in

[1. . . heap size], where . heap size. length, are valid elements of the heap

• The root of the tree is [1], and given theindex of a node, we can easily compute theindices of its parent, left child, and right child:

PARENT return /2LEFT returnRIGHT return + 1

19-Sep-13MAT-72006 AA+DS, Fall 2013 180

9/19/2013

13

• There are two kinds of binary heaps:– max-heaps and min-heaps

• In both, the values in the nodes satisfy aheap property

• The max-heap property is that for every nodeother than the root

[PARENT )] [i]• The largest element in a max-heap is stored

at the root• The subtree rooted at node contains values

no larger than that contained at itself19-Sep-13MAT-72006 AA+DS, Fall 2013 181

• The height of a node in a heap is the numberof edges on the longest simple downwardpath from the node to a leaf, and

• We de ne the height of the heap to be theheight of its root

• Since a heap of elements is based on acomplete binary tree, its height is (lg

• The basic operations on heaps run in time atmost proportional to the height of the tree andthus take (lg ) time

19-Sep-13MAT-72006 AA+DS, Fall 2013 182

9/19/2013

14

6.2 Maintaining the heap property

• MAX-HEAPIFY maintains the heap property• Its inputs are an array and index into it• It assumes that the binary trees rooted at

LEFT ) and RIGHT ) are max-heaps, but that] might be smaller than its children

• MAX-HEAPIFY lets the value at ] oatdown” in the max-heap so that the subtreerooted at index obeys the max-heapproperty

19-Sep-13MAT-72006 AA+DS, Fall 2013 183

MAX-HEAPIFY(A,i)

1. =LEFT )2. =RIGHT )3. if . heap size and ] > ]4. largest5. else largest6. if . heap size and ] > [largest]7. largest8. if largest9. exchange ] with largest10. MAX-HEAPIFY ( , largest)

19-Sep-13MAT-72006 AA+DS, Fall 2013 184

9/19/2013

15

19-Sep-13MAT-72006 AA+DS, Fall 2013 185

• At each step, the largest of the elements [ ],[LE F T )], and [RIG H T )] is determined,

and its index is stored in largest• If ] is largest, then the subtree rooted at

node is already a max-heap• Otherwise, one child has the largest element,

and ] is swapped with largest• The node indexed by largest now has the

original value ], and thus the subtreerooted at largest might violate the max-heapproperty => call MAX-HEAPIFY recursively

19-Sep-13MAT-72006 AA+DS, Fall 2013 186

9/19/2013

16

• Running time of MAX-HEAPIFY on a subtree ofsize rooted at a given node is– the (1) time to x up the relationships of

], [LE FT )], and [RIG H T )]– plus the time to run MAX-HEAPIFY on a subtree

rooted at one of the children of node

• The children’s subtrees have size at most• the worst case occurs when the bottom level

of the tree is exactly half full• We have the recurrence

= ( 3) + (1) = (lg )19-Sep-13MAT-72006 AA+DS, Fall 2013 187

6.3 Building a heap

• We can use the procedure MAX-HEAPIFYbottom-up to convert an array [1. . ], where

, into a max-heap• Elements in the subarray [ 2 + 1 . . ]

are all leaves of the tree, and so each is a 1-element heap to begin with

• BUILD-MAX-HEAP goes through the remainingnodes of the tree and runs MAX-HEAPIFY oneach one

19-Sep-13MAT-72006 AA+DS, Fall 2013 188

9/19/2013

17

BUILD-MAX-HEAP( )1. . heap-size . length2. for = . length 2 downto 13. MAX-HEAPIFY )

• A simple upper bound on the running time:– Each call to MAX-HEAPIFY costs (lg ) time– BUILD-MAX-HEAP makes ) such calls– Thus, the running time is lg )

• This upper bound is not asymptotically tight19-Sep-13MAT-72006 AA+DS, Fall 2013 189

19-Sep-13MAT-72006 AA+DS, Fall 2013 190

9/19/2013

18

• Observe that the time for MAX-HEAPIFY to runat a node varies with the height of the node inthe tree

• The heights of most nodes are small• The time required by MAX-HEAPIFY when

called on a node of height is )• The total cost of BUILD-MAX-HEAP is bounded

by• Hence, we can build a max-heap from an

unordered array in linear time19-Sep-13MAT-72006 AA+DS, Fall 2013 191

6.4 The heapsort algorithm

• The heapsort algorithm starts by using BUILD-MAX-HEAP to build a max-heap on the inputarray [1. . ]

• The maximum element of the array is storedat the root [1], we can put it into its correctnal position by exchanging it with ]

• We can now discard node from the heap bysimply decrementing . heap-size

19-Sep-13MAT-72006 AA+DS, Fall 2013 192

9/19/2013

19

• The children of the root remain max-heaps,but the new root element might violate themax-heap property

• To restore the max-heap property call MAX-HEAPIFY , 1), which leaves a max-heap in

[1. . 1]• The heapsort algorithm then repeats this

process for the max-heap of size downto a heap of size 2

19-Sep-13MAT-72006 AA+DS, Fall 2013 193

HEAPSORT( )1. BUILD-MAX-HEAP )2. for downto 23. exchange [1] with ]4. . heap-size . heap-size 15. MAX-HEAPIFY , 1)

• The procedure takes time lg ), since thecall to BUILD-MAX-HEAP takes time ) andeach of the calls to MAX-HEAPIFY takestime (lg )

19-Sep-13MAT-72006 AA+DS, Fall 2013 194

9/19/2013

20

19-Sep-13MAT-72006 AA+DS, Fall 2013 195

19-Sep-13MAT-72006 AA+DS, Fall 2013 196

9/19/2013

21

6.5 Priority queues

• Maintain a set of elements, each with anassociated value called a key

• INSERT ) insert into• MAXIMUM ) return element with largest key• EXTRACT-MAX ) removes and returns the

element of with the largest key• INCREASE-KEY ) increases the value of

element ’s key to the new value , which isat least as large as ’s current key value

19-Sep-13MAT-72006 AA+DS, Fall 2013 197

• We can use max-priority queues, e.g., toschedule jobs on a shared computer

• The max-priority queue keeps track of thejobs to be performed and their relativepriorities

• When a job is nished or interrupted, thescheduler selects the highest-priority job fromamong pending ones by calling EXTRACT-MAX

• The scheduler can add a new job to thequeue at any time by calling INSERT

19-Sep-13MAT-72006 AA+DS, Fall 2013 198

9/19/2013

22

• It is easy to implement MAXIMUM operation in1 time using a heap

HEAP-MAXIMUM return [1]HEAP-EXTRACT-MAX

1. if . heap-size < 12. error “heap under ow”3. max = [1]4. 1 = . heap-size]5. . heap-size . heap-size 16. MAX-HEAPIFY , 1)7. return max

19-Sep-13MAT-72006 AA+DS, Fall 2013 199

• Running time of HEAP-EXTRACT-MAX is(lg ); it performs a constant amount of

work on top of the (lg ) for MAX-HEAPIFY

• In HEAP-INCREASE-KEY index identi es theelement whose key we wish to increase

• Increasing the key of ] might violate themax-heap property

• The procedure traverses a simple path fromthis node toward the root to nd a properplace for the newly increased key

19-Sep-13MAT-72006 AA+DS, Fall 2013 200

9/19/2013

23

• The procedure repeatedly compares anelement to its parent, exchanging their keysand continuing if the element’s key is larger,and terminating otherwise

HEAP-INCREASE-KEY )1. if ]2. error “new key is smaller than current key”3.4. while > 1 and [PARENT ] < ]5. exchange ] with [PARENT ]6. = PARENT

19-Sep-13MAT-72006 AA+DS, Fall 2013 201

19-Sep-13MAT-72006 AA+DS, Fall 2013 202

9/19/2013

24

• The running time of HEAP-INCREASE-KEY onan -element heap is (lg ), since the pathtraced to the root has length (lg )

MAX-HEAP-INSERT )

1. . heap-size . heap-size + 12. . heap-size] =3. HEAP-INCREASE-KEY . heap-size )

• Also this procedure has running time (lg )

19-Sep-13MAT-72006 AA+DS, Fall 2013 203

7 Quicksort

• Quicksort has a worst-case running time ofon an array of numbers

• Despite this, quicksort is often the bestpractical choice for sorting because it isremarkably ef cient on the average:– Its expected running time is lg , and the

constant factors are quite small• It also sorts in place and it works well even in

virtual-memory environments19-Sep-13MAT-72006 AA+DS, Fall 2013 204

9/19/2013

25

7.1 Description of quicksort

• Divide: Rearrange the array . . ] into two(possibly empty) subarrays . . 1] and

+ 1. . ]– such that each element of . . 1] ],

which in turn each element of + 1. . ]Compute the index as part of thispartitioning procedure

• Conquer: Sort the two subarrays byrecursive calls to quicksort

19-Sep-13MAT-72006 AA+DS, Fall 2013 205

• Combine: Because the subarrays arealready sorted, no work is needed to combinethem: the entire array . . ] is now sorted

QUICKSORT )1. if2. PARTITION )3. QUICKSORT 1)4. QUICKSORT + 1,• To sort an entire array , the initial call is

QUICKSORT( , 1, . )19-Sep-13MAT-72006 AA+DS, Fall 2013 206

9/19/2013

26

Partitioning the arrayPARTITION )1. ]2. 13. for to 14. if5. + 16. exchange ] with ]7. exchange + 1] with ]8. return + 1

19-Sep-13MAT-72006 AA+DS, Fall 2013 207

19-Sep-13MAT-72006 AA+DS, Fall 2013 208

9/19/2013

27

• PARTITION always selects an element] as a pivot element around which to

partition the subarray . . ]• As the procedure runs, it partitions the array

into four (possibly empty) regions• At the start of each iteration of the for loop in

lines 3–6, the regions satisfy properties,shown above

19-Sep-13MAT-72006 AA+DS, Fall 2013 209

• At the beginning of each iteration of the loopof lines 3–6, for any array index ,1. If , then2. If + 1 1, then3. If , then

• Indices between and are not coveredby any case, and the values in these entrieshave no particular relationship to the pivot

• The running time of PARTITION on thesubarray [ . . ] is , = + 1

19-Sep-13MAT-72006 AA+DS, Fall 2013 210

9/19/2013

28

7.2 Performance of quicksort

• Running time of quicksort depends onwhether the partitioning is balanced orunbalanced– which in turn depends on which elements are

used for partitioning.– Balanced: the algorithm runs asymptotically

as fast as merge sort– Unbalanced: it can run asymptotically as

slowly as insertion sort

19-Sep-13MAT-72006 AA+DS, Fall 2013 211

Worst-case partitioning

• The worst-case behavior occurs when thepartitioning routine produces one subproblemwith elements and one with 0 elements

• Let us assume that this unbalancedpartitioning arises in each recursive call

• Partitioning costs , time• The recursive call on an array of size 0 just

returns, (0) = 1

19-Sep-13MAT-72006 AA+DS, Fall 2013 212

9/19/2013

29

• The recurrence for the running time is1 + 0 +1 +

• The substitution method proves that thisrecurrence has the solution

• The worst-case running time of quicksort isno better than that of insertion sort

• The running time occurs when theinput array is already completely sorted– a common situation in which insertion sort

runs in ( ) time19-Sep-13MAT-72006 AA+DS, Fall 2013 213

Best-case partitioning

• In the most even possible split, PARTITIONproduces two subproblems, each of size atmost /2: of sizes 2 and 2 1

• The recurrence for the running time is then= 2 2) )

• By case 2 of the master theorem, thisrecurrence has the solution lg )

• By balancing the two sides of the partition,we get an asymptotically faster algorithm

19-Sep-13MAT-72006 AA+DS, Fall 2013 214

9/19/2013

30

Balanced partitioning

• Average-case running time of quicksort iscloser to the best case than to the worst case

• Suppose, that the partitioning algorithmalways produces a 9-to-1 proportional split– at rst blush this seems quite unbalanced

• We then obtain the recurrence10 10) ,

• where we have explicitly included theconstant hidden in the ( ) term

19-Sep-13MAT-72006 AA+DS, Fall 2013 215

19-Sep-13MAT-72006 AA+DS, Fall 2013 216

9/19/2013

31

• In the recursion tree every level of the treehas cost , until the recursion reaches aboundary condition at depth log (lg ),and then the levels have cost

• Recursion terminates at log (lg )• Total cost of quicksort is therefore lg• Thus, with this intuitively quite unbalanced

seeming split quicksort runs asymptotically asif the split were right down the middle

• In fact, any split of constant proportionalityleads to lg running time

19-Sep-13MAT-72006 AA+DS, Fall 2013 217

randomly permuting arrayselomaa/teach/aads-2013-4.pdf · randomly permuting arrays • we assume...

Documents