Upload: margaretmargaret-gibson

Post on 19-Jan-2016

Algorithms

CSCI 235, Fall 2015
Lecture 19

Order Statistics II

Finding the Median

Last time, we showed that we can find the kth order statistic (i.e. the kth smallest element) in Θ(n) time for a fixed k, by repeatedly finding the minimum and discarding it.

How long will it take to find the median using this strategy? Note that the position of the median (n/2) increases as n increases.

T(n) = ?

Conclusion:
• This method does not work as well for finding the median.
• Larger values of k take longer to find (although, for any fixed k, the order of growth is the same).

Can we do better?
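The repeated-minimum strategy can be sketched in Python as follows. This is only an illustration: the function name `select_by_repeated_min` and the comparison counter are assumptions made for this sketch, not part of the lecture.

```python
def select_by_repeated_min(values, k):
    """Return (k-th smallest element, comparisons made while finding minima).

    k is 1-based. The minimum is found and discarded k - 1 times, and the
    minimum of what remains is the answer.
    """
    remaining = list(values)  # work on a copy so the input is not modified
    comparisons = 0
    for _ in range(k - 1):
        comparisons += len(remaining) - 1   # finding a minimum of m items takes m - 1 comparisons
        remaining.remove(min(remaining))    # discard the current minimum
    comparisons += len(remaining) - 1       # final pass finds the k-th smallest
    return min(remaining), comparisons


A = [17, 6, 34, 18, 9, 5, 11, 22, 28, 2]
value, comparisons = select_by_repeated_min(A, 3)
```

The counter adds (n-1) + (n-2) + ... + (n-k) comparisons. For a fixed k this grows linearly in n, but for the median (k = n/2) the sum is about 3n²/8, so the strategy becomes quadratic.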

Randomized-Select

Randomized-Select(A, lo, hi, i)    {Find the ith order statistic between lo and hi}
    if lo = hi then
        return A[lo]
    split ← Randomized-Partition(A, lo, hi)
    length ← (split - lo) + 1
    if i <= length then
        return Randomized-Select(A, lo, split, i)
    else
        return Randomized-Select(A, split+1, hi, i-length)

Idea: Partition the array as in Quick-sort. Recursively search the appropriate partition for the ith order statistic.
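A minimal Python sketch of this idea is given below. It assumes a Hoare-style partition with a randomly chosen pivot, which matches the convention that position split belongs to the low part. The array indices lo and hi are 0-based here, while the rank i stays 1-based. The function bodies are an illustration, not the course's reference code.

```python
import random


def randomized_partition(a, lo, hi):
    """Hoare-style partition of a[lo..hi] around a random pivot.

    Returns split with lo <= split < hi such that every element of
    a[lo..split] is <= every element of a[split+1..hi].
    """
    p = random.randint(lo, hi)      # random pivot position (inclusive bounds)
    a[lo], a[p] = a[p], a[lo]       # move the pivot to the front
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        j -= 1
        while a[j] > pivot:         # scan left for an element <= pivot
            j -= 1
        i += 1
        while a[i] < pivot:         # scan right for an element >= pivot
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def randomized_select(a, lo, hi, i):
    """Return the i-th smallest element (1-based) of a[lo..hi]; reorders a."""
    if lo == hi:
        return a[lo]
    split = randomized_partition(a, lo, hi)
    length = split - lo + 1         # size of the low part
    if i <= length:
        return randomized_select(a, lo, split, i)
    return randomized_select(a, split + 1, hi, i - length)


A = [17, 6, 34, 18, 9, 5, 11, 22, 28, 2]
third = randomized_select(list(A), 0, len(A) - 1, 3)
```

Because the partition always returns split < hi, both parts are non-empty and strictly smaller than the input, so the recursion terminates whatever pivots are drawn.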

Example

A = [17, 6, 34, 18, 9, 5, 11, 22, 28, 2]    (positions 1 through 10)

Find the 3rd order statistic: Randomized-Select(A, 1, 10, 3)

Running time of Randomized-Select

Worst Case: As with QuickSort, we can get unlucky and partition the array into two pieces of size 1 and n-1, with the ith statistic in the larger side.

T(n) = T(n-1) + n = Θ(n²), where the n term is the cost of the partition.

A good case: Partition into two equal parts:

T(n) = T(n/2) + n

(We will work this one out in class).
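One way to work out this recurrence is to unroll it. The sketch below assumes that n is a power of 2 and that T(1) is a constant; both are simplifying assumptions added here.

```latex
\begin{align*}
T(n) &= n + T(n/2) \\
     &= n + \frac{n}{2} + T(n/4) \\
     &= n \sum_{j=0}^{\lg n - 1} \frac{1}{2^{j}} + T(1) \\
     &< 2n + T(1),
\end{align*}
% The geometric series is bounded by 2, and T(n) >= n, so T(n) = \Theta(n).
```

Under these assumptions the good case is linear, since the work halves at every level of the recursion.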

Average case: We can show that T(n) <= cn for some constant c, so T(n) = O(n).

Selection in Worst case linear time

To make a selection in worst case linear time, we want to use an algorithm that guarantees a good split when we partition. To do this, we use the "median of median of c" algorithm.

• To start, we pick c, an integer constant >= 1.

• We write our input array, A, as a 2-D array with c rows, n/c columns. (If n/c is not an integer, we can pad the array with large numbers that won't change the result).

• Sort the columns of this new 2D array.

Example

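A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51, 73, 6, 30, 62, 10, 24, 26, 25, 28, 29], n = 25

Choose c = 5 and write A as B[1..c, 1..n/c] = B[1..5, 1..5], filling one column at a time:

43 42  7 73 24
 5 19 15  6 26
17 72  0 30 25
91 37 63 62 28
 2  3 51 10 29

Sort each column:

 2  3  0  6 24
 5 19  7 10 25
17 37 15 30 26    (median row)
43 42 51 62 28
91 72 63 73 29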

After sorting, the median row contains the median of each column.

Sorting the columns takes Θ(c²(n/c)) = Θ(n) time, since c is a constant.
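The column step can be sketched in Python as below. The helper name `column_medians` is hypothetical, and the sketch assumes that n is a multiple of c (no padding) and that each column is a run of c consecutive elements of A.

```python
def column_medians(a, c):
    """Return the median row: the middle element of each sorted column.

    Column j holds the c consecutive elements a[j*c : (j+1)*c].
    Assumption for this sketch: len(a) is a multiple of c, so no padding is needed.
    """
    if len(a) % c != 0:
        raise ValueError("this sketch assumes len(a) is a multiple of c")
    medians = []
    for start in range(0, len(a), c):
        column = sorted(a[start:start + c])   # sort one column of c elements
        medians.append(column[(c - 1) // 2])  # middle row of the sorted column
    return medians


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
median_row = column_medians(A, 5)
```

Each column has a constant number of elements, so sorting all n/c columns stays linear in n regardless of the sorting method used per column.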

Median-of-median-of-c continued

We now call the Median-of-median-of-c algorithm again, on the single median row of B, with the same value of c as before.

Write median row as B' = [17, 37, 15, 30, 26]

Write B' as a 2D array with c = 5 rows and n/c = 1 column, then sort the column: 15, 17, 26, 30, 37.

The value at the middle row is mm, the median of medians. Here mm = 26.

We use this as our pivot for the partition.

Showing that it gives a good split

We can show that at least 1/4 of the elements are less than mm and at least 1/4 of the elements are greater than mm by imagining that the columns of B are sorted by the value of each median. (Note: we only imagine it, we don't actually do it).

In the example (mm = 26): at least 1/4 of the elements are less than 26, and at least 1/4 are greater than 26.
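The imagined rearrangement can be checked on the example with a short Python sketch. The helper `columns_by_median` is hypothetical, and taking the middle column as the one that holds mm relies on the example having an odd number of columns.

```python
def columns_by_median(a, c):
    """Sort each column of c consecutive elements, then order the columns by their medians."""
    columns = [sorted(a[s:s + c]) for s in range(0, len(a), c)]
    mid = (c - 1) // 2
    return sorted(columns, key=lambda col: col[mid])


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
c = 5
mid = (c - 1) // 2
cols = columns_by_median(A, c)
middle_col = len(cols) // 2           # column whose median is mm (odd column count assumed)
mm = cols[middle_col][mid]

# Block at or above the median row, in columns whose median is at most mm.
at_most_mm = [x for col in cols[:middle_col + 1] for x in col[:mid + 1]]
# Block at or below the median row, in columns whose median is at least mm.
at_least_mm = [x for col in cols[middle_col:] for x in col[mid:]]
```

Each block holds 9 of the 25 elements and includes mm itself, so 8 elements are guaranteed to be smaller than mm and 8 are guaranteed to be larger, which is more than a quarter of the input on each side.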

Partitioning

Partition A using mm = 26 as the pivot. Use a partition that keeps mm in the high part of the partition:

"low" = 2, 5, 17, 3, 19, 0, 7, 15, 6, 10, 24, 25 (12 items)

"high" = 26, 43, 91, 37, 42, 72, 51, 63, 30, 62, 73, 28, 29 (13 items)

If the number of items in the low part of the partition is k, and the order statistic we are looking for is i, then:

• if i <= k, repeat the entire procedure on the low partition;
• if i > k, repeat it on the high partition, looking for the (i - k)th element.
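The whole procedure can be sketched in Python as below. This is a sketch under stated assumptions, not the lecture's exact algorithm: the name `mm_select` is hypothetical, a short last column uses its own middle element instead of padding, the median of the median row is found by a recursive call to the same function, and the partition is three-way (less than, equal to, greater than mm) rather than keeping mm in the high part, so that inputs with repeated values still make progress.

```python
def mm_select(a, i, c=5):
    """Return the i-th smallest element of a (i is 1-based) using a median-of-medians pivot."""
    if c < 2:
        raise ValueError("this sketch needs c >= 2 so that the median row shrinks")
    a = list(a)
    if len(a) <= c:
        return sorted(a)[i - 1]          # small input: sort it directly

    # Median of each column of c consecutive elements (the median row of B).
    medians = []
    for start in range(0, len(a), c):
        column = sorted(a[start:start + c])
        medians.append(column[(len(column) - 1) // 2])

    # Pivot: the median of the median row, found recursively.
    mm = mm_select(medians, (len(medians) + 1) // 2, c)

    low = [x for x in a if x < mm]
    equal = [x for x in a if x == mm]
    high = [x for x in a if x > mm]

    if i <= len(low):
        return mm_select(low, i, c)
    if i <= len(low) + len(equal):
        return mm                        # the pivot itself is the answer
    return mm_select(high, i - len(low) - len(equal), c)


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
median = mm_select(A, 13)
```

On the example the first pivot is 26, which gives 12 smaller elements. Since 26 is the 13th smallest element, the median is returned without further recursion.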

Running time

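T(n) = Θ(n) + T(n/c) + T(3n/4) + Θ(n)

The terms are, in order:

• Θ(n): cost of sorting the columns
• T(n/c): cost of finding m-of-m-of-c on the median row of B
• T(3n/4): worst case split
• Θ(n): cost of the partition

Combining the two Θ(n) terms:

T(n) = T(n/c) + T(3n/4) + Θ(n)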

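We can show that T(n) = Θ(n) for c >= 5. The key is that the two recursive calls together work on less than n elements: 1/c + 3/4 < 1 exactly when c > 4.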
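A substitution argument for the upper bound can be sketched as follows. It assumes the Θ(n) term is at most an for some constant a, guesses T(n) <= dn, and ignores floors, ceilings and the base case; the constants a and d are introduced here for illustration.

```latex
\begin{align*}
T(n) &\le T(n/c) + T(3n/4) + an \\
     &\le \frac{dn}{c} + \frac{3dn}{4} + an \\
     &= dn\left(\frac{1}{c} + \frac{3}{4}\right) + an \\
     &\le dn
     \quad\text{whenever}\quad d\left(\frac{1}{4} - \frac{1}{c}\right) \ge a,
     \ \text{i.e. } d \ge \frac{4ac}{c-4}.
\end{align*}
% Such a d exists only when c > 4. For c = 5 the condition is d >= 20a.
```

Together with the obvious lower bound T(n) >= an from the partition alone, this gives T(n) = Θ(n) for c >= 5.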

Benefits of M-of-M-of-c

• Good order statistic algorithm

• Can use this with other algorithms. For example, we can use it with QuickSort to guarantee a good split and a worst-case n lg n order of growth.

• The linear time is not the result of constraining the problem (as we did with counting-sort). It is a comparison-based method!