Sorting Data Structures and Algorithms (60-254)

Upload: scot-holt

Post on 17-Jan-2016

Page 1: Sorting Data Structures and Algorithms (60-254). Sorting Sorting is one of the most well-studied problems in Computer Science The ultimate reference on

Sorting
Data Structures and Algorithms (60-254)

Sorting

• Sorting is one of the most well-studied problems in Computer Science

• The ultimate reference on the subject is:

“The Art of Computer Programming, Vol. 3: Sorting and Searching”, by D. E. Knuth

Formal Statement

Given a sequence of n numbers:

$a_1, a_2, \dots, a_n$

find a permutation π of the numbers 1, 2, …, n such that

$a_{\pi(1)} \le a_{\pi(2)} \le \dots \le a_{\pi(n)}$

Permutations:
3, 2, 1: π(1) = 3, π(2) = 2, π(3) = 1
2, 1, 3: π(1) = 2, π(2) = 1, π(3) = 3
1, 3, 2: π(1) = 1, π(2) = 3, π(3) = 2
…

are all permutations of 1, 2, 3
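
To make the role of π concrete, here is a minimal Python sketch (Python is our choice for illustration; the slides use pseudocode). The hypothetical helper `sorting_permutation` returns π as a list of 1-based positions by sorting the positions according to the values stored there:

```python
def sorting_permutation(a):
    """Return pi as a list of 1-based positions such that
    a[pi(1)] <= a[pi(2)] <= ... <= a[pi(n)]."""
    # Sort the positions 1..n by the value stored at each position.
    return sorted(range(1, len(a) + 1), key=lambda i: a[i - 1])


a = [3, 1, 5, 4, 2]
pi = sorting_permutation(a)
print(pi)                        # [2, 5, 1, 4, 3]
print([a[i - 1] for i in pi])    # [1, 2, 3, 4, 5]
```

Because Python's `sorted` is stable, positions holding equal values come out in increasing order, so the π returned is one particular choice when the input has duplicates.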

Comparison Sorts

• A comparison sort sorts by comparing elements pairwise.

• We study these comparison sorts:
  • Insertion Sort
  • Shellsort
  • Mergesort
  • Quicksort

Insertion Sort

Sort the sequence 3, 1, 5, 4, 2

Sort 3 → 3
Sort 3, 1 → 1, 3
Sort 1, 3, 5 → 1, 3, 5
Sort 1, 3, 5, 4 → 1, 3, 4, 5
Sort 1, 3, 4, 5, 2 → 1, 2, 3, 4, 5
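
The step-by-step table above can be reproduced with a short Python sketch of insertion sort. The function name `insertion_sort` and the returned trace of sorted prefixes are illustrative choices, not part of the slides:

```python
def insertion_sort(a):
    """Return (sorted copy of a, list of the sorted prefix after each step)."""
    a = list(a)
    trace = []
    for i in range(len(a)):
        x = a[i]                      # element to insert into the sorted prefix a[0..i-1]
        j = i - 1
        while j >= 0 and a[j] > x:    # shift larger elements one position right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x                  # drop x into the gap
        trace.append(a[: i + 1])
    return a, trace


result, trace = insertion_sort([3, 1, 5, 4, 2])
for prefix in trace:
    print(prefix)
```

Each printed prefix corresponds to one "Sort … → …" line of the table.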

Incremental sorting

In general, after the ith step,

$a_1, a_2, a_3, \dots, a_{i-1}, a_i$

are already sorted:

$a_{\pi(1)} \le a_{\pi(2)} \le \dots \le a_{\pi(i)}$

for some permutation π of 1, 2, …, i.

In the next step, $a_{i+1}$ has to be inserted in its correct position.

Analysis of Insertion Sort

What is the worst-case input? Elements in decreasing order! Example: 5, 4, 3, 2, 1

Sort              Result            # of comparisons
5                 5                 0
5, 4              4, 5              1
4, 5, 3           3, 4, 5           2
3, 4, 5, 2        2, 3, 4, 5        3
2, 3, 4, 5, 1     1, 2, 3, 4, 5     4

Worst case

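In general, to insert $a_{i+1}$ in its proper place with respect to the sorted preceding i numbers $a_1, a_2, \dots, a_i$, we may need i comparisons in the worst case.

Thus, summing over i = 1, …, n − 1:

$T(n) = \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = O(n^2)$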

Shellsort

Due to Donald Shell. For example:

Shellsort the sequence: 81, 94, 11, 96, 12, 35, 17, 95 (1)

Step 1: Sort all subsequences whose elements are four positions apart:

81, 12 → 12, 81
94, 35 → 35, 94
11, 17 → 11, 17
96, 95 → 95, 96

Results in: 12, 35, 11, 95, 81, 94, 17, 96 (2)

Shellsort

Step 2: 12, 35, 11, 95, 81, 94, 17, 96 (2)

Sort all subsequences of (2) whose elements are two positions apart:

12, 11, 81, 17 → 11, 12, 17, 81
35, 95, 94, 96 → 35, 94, 95, 96

Results in: 11, 35, 12, 94, 17, 95, 81, 96 (3)

Step 3:

Sort all subsequences of (3) whose elements are one position apart, i.e., the entire sequence:

11, 35, 12, 94, 17, 95, 81, 96 → 11, 12, 17, 35, 81, 94, 95, 96 (4)

Sequence (4) is sorted!
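
The three passes can be sketched in Python as gapped insertion sorts. This sketch assumes Shell's halving gaps (⌊n/2⌋, ⌊n/4⌋, …, 1), which for n = 8 are exactly 4, 2, 1; the function name `shellsort` and its returned list of per-pass snapshots are illustrative:

```python
def shellsort(a):
    """Sort a copy of a using the gaps n//2, n//4, ..., 1.

    Returns (sorted list, snapshot of the list after each gap pass).
    """
    a = list(a)
    n = len(a)
    snapshots = []
    gap = n // 2
    while gap >= 1:
        # Gapped insertion sort: this sorts every subsequence of
        # elements that are `gap` positions apart.
        for i in range(gap, n):
            x = a[i]
            j = i
            while j >= gap and a[j - gap] > x:
                a[j] = a[j - gap]     # shift the larger element one gap right
                j -= gap
            a[j] = x
        snapshots.append(list(a))
        gap //= 2
    return a, snapshots


result, snapshots = shellsort([81, 94, 11, 96, 12, 35, 17, 95])
for s in snapshots:
    print(s)
```

The three printed snapshots should match sequences (2), (3) and (4) above.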

Observations

• $h_1, h_2, h_3 = 4, 2, 1$ is called a gap sequence.
• Different gap sequences are possible; every one of them must end with 1.
• Shell's gap sequence:
  $h_1 = \lfloor n/2 \rfloor$
  $h_i = \lfloor h_{i-1}/2 \rfloor$ (down to $h_k = 1$)
• All subsequences were sorted using insertion sort. In Step 3, we sorted the entire sequence using insertion sort!

Advantage over straightforward insertion sort?

Example

Insertion sort on: 81, 94, 11, 96, 12, 35, 17, 95

Sorted prefix                        # of comparisons
81                                   0
81, 94                               1
11, 81, 94                           2
11, 81, 94, 96                       1
11, 12, 81, 94, 96                   4
11, 12, 35, 81, 94, 96               4
11, 12, 17, 35, 81, 94, 96           5
11, 12, 17, 35, 81, 94, 95, 96       2

Total # of comparisons: 19

Example

Insertion sort on: 11, 35, 12, 94, 17, 95, 81, 96

Sorted prefix                        # of comparisons
11                                   0
11, 35                               1
11, 12, 35                           2
11, 12, 35, 94                       1
11, 12, 17, 35, 94                   3
11, 12, 17, 35, 94, 95               1
11, 12, 17, 35, 81, 94, 95           3
11, 12, 17, 35, 81, 94, 95, 96       1

Total # of comparisons: 12
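
The comparison counts in the last two tables, and in the worst-case table for 5, 4, 3, 2, 1, can be checked with an instrumented insertion sort. This Python sketch counts one comparison each time the element being inserted is compared with an element of the sorted prefix; the name `insertion_sort_counts` is hypothetical:

```python
def insertion_sort_counts(a):
    """Insertion sort a copy of a; return (sorted list, comparisons per step)."""
    a = list(a)
    counts = []
    for i in range(len(a)):
        x = a[i]
        j = i - 1
        c = 0
        while j >= 0:
            c += 1                    # compare x with a[j]
            if a[j] <= x:             # x belongs right after a[j]
                break
            a[j + 1] = a[j]           # a[j] > x: shift it right and keep scanning
            j -= 1
        a[j + 1] = x
        counts.append(c)
    return a, counts


for seq in ([81, 94, 11, 96, 12, 35, 17, 95],   # sequence (1)
            [11, 35, 12, 94, 17, 95, 81, 96],   # sequence (3)
            [5, 4, 3, 2, 1]):                   # worst case
    _, counts = insertion_sort_counts(seq)
    print(counts, sum(counts))
# [0, 1, 2, 1, 4, 4, 5, 2] 19
# [0, 1, 2, 1, 3, 1, 3, 1] 12
# [0, 1, 2, 3, 4] 10
```

The drop from 19 to 12 comparisons is the payoff of the earlier gap passes; the last total, 10, equals n(n − 1)/2 for n = 5.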

Analysis of Shellsort

A clever choice of gap sequence leads to a subquadratic algorithm. That is, for an n-element sequence, the # of comparisons is

$T(n) = O(n^{3/2})$

when using the Hibbard sequence: $1, 3, 7, \dots, 2^k - 1$

Mergesort

Sort: 81, 94, 11, 96 | 12, 35, 17, 95

Mergesort(81, 94, 11, 96)
Mergesort(12, 35, 17, 95)
Merge the two sorted lists from the above two lines.

This is a divide-and-conquer algorithm.

Divide

MS (81, 94, 11, 96, 12, 35, 17, 95)
    MS (81, 94, 11, 96)
        MS (81, 94)
        MS (11, 96)
    MS (12, 35, 17, 95)
        MS (12, 35)
        MS (17, 95)

Conquer

Merge two sorted lists:

MS (81, 94, 11, 96) = 11, 81, 94, 96 (1)
MS (12, 35, 17, 95) = 12, 17, 35, 95 (2)

Compare 11 and 12: output 11, move index in list (1)
Compare 12 and 81: output 12, move index in list (2)
Compare 17 and 81: output 17, move index in list (2)

Number of Comparisons

A total of seven comparisons is needed to generate the sorted list 11, 12, 17, 35, 81, 94, 95, 96.

This is the maximum! For if the lists were

81, 94, 95, 96 and 11, 12, 17, 35

we would need only four comparisons. The algorithm follows…

Procedure Mergesort(A)

    n ← size of A
    if (n > 1)
        A1 ← A[1 ... ⌊n/2⌋]      // Create a new array A1
        A2 ← A[⌊n/2⌋+1 ... n]    // Create a new array A2
        Mergesort(A1)
        Mergesort(A2)
        Merge(A, A1, A2)
    else
        // A has only one element: do nothing!

Procedure Merge(A, A1, A2)

    n1 ← size of A1
    n2 ← size of A2
    i ← 1; j ← 1; k ← 1
    while (i <= n1 and j <= n2)
        if (A1[i] < A2[j])
            A[k] ← A1[i]; i ← i + 1
        else
            A[k] ← A2[j]; j ← j + 1
        k ← k + 1
    for m ← i to n1
        A[k] ← A1[m]; k ← k + 1
    for m ← j to n2
        A[k] ← A2[m]; k ← k + 1
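
A runnable Python counterpart of Mergesort and Merge might look as follows. It is a sketch rather than a line-by-line translation: it returns new lists instead of writing into A, uses 0-based indices, and (an assumption on our part) compares with <= instead of < so that ties are taken from A1 first, which keeps the sort stable. The hypothetical `merge` also reports how many comparisons it made, which lets us check the seven-versus-four example:

```python
def merge(A1, A2):
    """Merge sorted lists A1 and A2; return (merged list, # of comparisons)."""
    i = j = 0
    out = []
    comparisons = 0
    while i < len(A1) and j < len(A2):
        comparisons += 1
        if A1[i] <= A2[j]:          # <= takes ties from A1 first (keeps the sort stable)
            out.append(A1[i])
            i += 1
        else:
            out.append(A2[j])
            j += 1
    out.extend(A1[i:])              # copy whatever remains; at most one
    out.extend(A2[j:])              # of these two slices is non-empty
    return out, comparisons


def merge_sort(A):
    """Return a sorted copy of A (divide, recurse, merge)."""
    if len(A) <= 1:                 # zero or one element: already sorted
        return list(A)
    mid = len(A) // 2               # A1 = A[0..mid-1], A2 = A[mid..n-1]
    merged, _ = merge(merge_sort(A[:mid]), merge_sort(A[mid:]))
    return merged


print(merge([11, 81, 94, 96], [12, 17, 35, 95]))
# ([11, 12, 17, 35, 81, 94, 95, 96], 7)
print(merge([81, 94, 95, 96], [11, 12, 17, 35]))
# ([11, 12, 17, 35, 81, 94, 95, 96], 4)
print(merge_sort([81, 94, 11, 96, 12, 35, 17, 95]))
# [11, 12, 17, 35, 81, 94, 95, 96]
```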

Theorem

To merge two sorted lists, each of length n, we need at most 2n – 1 comparisons

Complexity of Mergesort

$T(n) = 2T(n/2) + O(n)$ for $n > 1$
$T(1) = 1$

Solution: $T(n) = O(n \log n)$
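
A worked unrolling of this recurrence, assuming for simplicity that n is a power of 2 and writing the O(n) term as cn for some constant c > 0:

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^{j}\,T(n/2^{j}) + j\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (j = \log_2 n) \\
     &= n + cn\log_2 n = O(n \log n).
\end{aligned}
```

Each level of the recursion contributes cn (the merges at that level touch all n elements once), and there are log₂ n levels.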

A Partitioning Game

Given L = 5, 3, 2, 6, 4, 1, 3, 7, partition L into L1 and L2 such that every element in list L1 is less than or equal to every element in list L2.

How?

Split

a = first element of L

Make every element of list L1 less than or equal to a, and a less than or equal to every element of list L2.

How? Using two indices:
lx = left index
rx = right index

Initial configuration (lx at the left end, rx at the right end):

5, 3, 2, 6, 4, 1, 3, 7

Rules: lx moves right until it meets an element ≥ 5; rx moves left until it meets an element ≤ 5; exchange the two elements and continue until the indices meet or cross.

Intermediate configurations

5, 3, 2, 6, 4, 1, 3, 7   (lx stops at 5, rx stops at the second 3)

Exchange and continue: 3, 3, 2, 6, 4, 1, 5, 7

Exchange and continue: 3, 3, 2, 1, 4, 6, 5, 7

lx and rx have crossed!

Intermediate configurations

L1 = 3, 3, 2, 1, 4
L2 = 6, 5, 7

Now, do the same with the lists L1 and L2.

Initial configuration for L1 (a = 3, the first element):

3, 3, 2, 1, 4

Intermediate configuration for L1

3, 3, 2, 1, 4

Exchange and continue: 1, 3, 2, 3, 4

Exchange and continue: 1, 2, 3, 3, 4

Left and right indices have crossed!

Quicksort

We have new lists

L11 = 1, 2
L12 = 3, 3, 4

with which we continue to do the same. Partitioning stops once we have a list with only one element. All this, done in place, gives us the following sorted list:

Lsorted = 1, 2, 3, 3, 4, 5, 6, 7

This is Quicksort!

Partition – Formal Description

Procedure Partition(L, p, q)
    a ← L[p]
    lx ← p - 1
    rx ← q + 1
    while true
        repeat rx ← rx - 1       // Move right index
        until L[rx] ≤ a
        repeat lx ← lx + 1       // Move left index
        until L[lx] ≥ a
        if (lx < rx)
            exchange(L[lx], L[rx])
        else
            return rx            // Indices have crossed

Quicksort

Procedure Quicksort(L, p, q)
    if (p < q)
        r ← Partition(L, p, q)
        Quicksort(L, p, r)
        Quicksort(L, r+1, q)

To sort the entire array, the initial call is:

Quicksort(L, 1, n), where n is the length of L
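
The two procedures translate to Python roughly as follows. This sketch uses 0-based indices, so the initial call becomes `quicksort(L, 0, len(L) - 1)` (or just `quicksort(L)`); the function names and the default arguments are ours:

```python
def partition(L, p, q):
    """Partition L[p..q] (0-based, inclusive) around a = L[p].

    Returns r such that every element of L[p..r] is <= every element of L[r+1..q].
    """
    a = L[p]
    lx, rx = p - 1, q + 1
    while True:
        rx -= 1
        while L[rx] > a:          # move right index until L[rx] <= a
            rx -= 1
        lx += 1
        while L[lx] < a:          # move left index until L[lx] >= a
            lx += 1
        if lx < rx:
            L[lx], L[rx] = L[rx], L[lx]
        else:
            return rx             # indices have met or crossed


def quicksort(L, p=0, q=None):
    """Sort L[p..q] in place."""
    if q is None:
        q = len(L) - 1
    if p < q:
        r = partition(L, p, q)
        quicksort(L, p, r)
        quicksort(L, r + 1, q)


L = [5, 3, 2, 6, 4, 1, 3, 7]
r = partition(L, 0, len(L) - 1)
print(r, L)      # 4 [3, 3, 2, 1, 4, 6, 5, 7]
quicksort(L)
print(L)         # [1, 2, 3, 3, 4, 5, 6, 7]
```

The first partition reproduces the intermediate configurations above: r = 4 is the 0-based form of the slide's rx = 5, splitting off L1 = 3, 3, 2, 1, 4 and L2 = 6, 5, 7.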

Observations

• Choice of the partitioning element a is important

• Determines how the lists are split

• Desirable: To split the list evenly

• How?...

Undesirable Partitioning

[Figure: lopsided splits, from a list of size n to a list of size n-1, and so on, down to a list of size 2]

Example

Such an undesirable partitioning is possible if we take the following sorted sequence

3, 4, 5, 6, 7, 8, 9, 10

and partition it as described above, with the first element as the partitioning element.

Desirable Partitioning

[Figure: every list is split into two lists of equal size]

Choosing the pivot

Steering between the two extremes: can we choose the partitioning element to steer between the two extremes? Heuristics:

• Median-of-three: find the median of the first, middle and last elements,
or
• find the median of three randomly chosen elements.
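
One way to plug the median-of-three heuristic into Partition, which always uses L[p] as the partitioning element, is to swap the chosen median into position p first. A minimal Python sketch; the function name and the optional `rng` argument for the randomized variant are assumptions for illustration:

```python
import random


def median_of_three(L, p, q, rng=None):
    """Move the median of three candidates from L[p..q] to position p and return it.

    With rng=None the candidates are the first, middle and last elements;
    otherwise three positions are drawn at random with rng (they may repeat).
    """
    if rng is None:
        candidates = (p, (p + q) // 2, q)
    else:
        candidates = tuple(rng.randint(p, q) for _ in range(3))
    # Order the candidate positions by their values and take the middle one.
    m = sorted(candidates, key=lambda i: L[i])[1]
    L[p], L[m] = L[m], L[p]
    return L[p]


L = [3, 4, 5, 6, 7, 8, 9, 10]
print(median_of_three(L, 0, len(L) - 1))   # 6
print(L)                                   # [6, 4, 5, 3, 7, 8, 9, 10]
```

On the sorted sequence 3, …, 10 the partitioning element becomes 6 instead of 3; running Partition on the result then splits off 3 elements versus 5, rather than 1 versus 7.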

Analysis

Worst-case behavior:

$T(n) = n + (n-1) + \dots + 2 = \frac{n(n+1)}{2} - 1 = O(n^2)$

since to partition a list of size n-i (i ≥ 0) into two lists of size 1 and n-i-1, we need to look at all n-i elements:

[Figure: a list of size n-i split into lists of size 1 and n-i-1]

$T(n) = T(n-1) + O(n)$

Best-case Behavior

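$T(n) = 2T(n/2) + O(n) \Rightarrow T(n) = O(n \log n)$

where T(n) is the time to sort a list of size n when every partition splits a list of size n into two lists of size n/2 each; the O(n) term is the time to partition.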

Average-case Behavior

$T_{avg}(n) = O(n \log n)$

$T(n) = T(\alpha n) + T(\beta n) + O(n)$

where $\alpha + \beta = 1$ (α and β constants with $0 < \alpha, \beta < 1$)

Sorting – in Linear Time??

Yes… but only under certain assumptions on the input data

Linear-time sorting techniques:
• Counting sort
• Radix sort
• Bucket sort

Counting Sort

Assumption: input elements are integers in the range 1 to k, where k = O(n).

Example: sort the list L = 3, 6, 4, 1, 3, 4, 1, 4 using counting sort.

Example

index: 1 2 3 4 5 6 7 8
A:     3 6 4 1 3 4 1 4

index: 1 2 3 4 5 6
C:     _ _ _ _ _ _

index: 1 2 3 4 5 6 7 8
B:     _ _ _ _ _ _ _ _

Input is in array A. At first, C[i] counts the # of times i occurs in the input. The sorted array is stored in B.

Example - Continued

Count # of times i occurs in A:

index: 1 2 3 4 5 6
C:     2 0 2 3 0 1

Cumulative counters:

index: 1 2 3 4 5 6
C:     2 2 4 7 7 8

Now C[i] contains the number of elements in the input that are ≤ i.

Example - Continued

• Go through array A
• First element is 3
• From C[3] we know that 4 elements are ≤ 3
• So B[4] = 3
• Decrease C[3] by 1

index: 1 2 3 4 5 6 7 8
B:     _ _ _ 3 _ _ _ _

index: 1 2 3 4 5 6
C:     2 2 3 7 7 8

Example - Continued

• Next element in A is 6
• C[6] = 8, so eight elements are ≤ 6
• B[8] = 6
• Decrease C[6] by 1

index: 1 2 3 4 5 6 7 8
B:     _ _ _ 3 _ _ _ 6

index: 1 2 3 4 5 6
C:     2 2 3 7 7 7

and so on…

Example - Continued

A[3] = 4, C[4] = 7, B[7] = 4

index: 1 2 3 4 5 6 7 8
B:     _ _ _ 3 _ _ 4 6

index: 1 2 3 4 5 6
C:     2 2 3 6 7 7

Example - Continued

A[4] = 1, C[1] = 2, B[2] = 1

index: 1 2 3 4 5 6 7 8
B:     _ 1 _ 3 _ _ 4 6

index: 1 2 3 4 5 6
C:     1 2 3 6 7 7

Example - Continued

A[5] = 3, C[3] = 3, B[3] = 3

index: 1 2 3 4 5 6 7 8
B:     _ 1 3 3 _ _ 4 6

index: 1 2 3 4 5 6
C:     1 2 2 6 7 7

Example - Continued

A[6] = 4, C[4] = 6, B[6] = 4

index: 1 2 3 4 5 6 7 8
B:     _ 1 3 3 _ 4 4 6

index: 1 2 3 4 5 6
C:     1 2 2 5 7 7

Example - Continued

A[7] = 1, C[1] = 1, B[1] = 1

index: 1 2 3 4 5 6 7 8
B:     1 1 3 3 _ 4 4 6

index: 1 2 3 4 5 6
C:     0 2 2 5 7 7

Example - Continued

A[8] = 4, C[4] = 5, B[5] = 4

index: 1 2 3 4 5 6 7 8
B:     1 1 3 3 4 4 4 6

index: 1 2 3 4 5 6
C:     0 2 2 4 7 7

Formal Algorithm

Procedure CountingSort(A, B, k, n)
    for i ← 1 to k
        C[i] ← 0
    for i ← 1 to n
        C[A[i]] ← C[A[i]] + 1
    // C[i] now contains a counter of how often i occurs
    for i ← 2 to k
        C[i] ← C[i] + C[i-1]
    // C[i] now contains # of elements ≤ i
    for i ← n downto 1
        B[C[A[i]]] ← A[i]
        C[A[i]] ← C[A[i]] - 1
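
A Python sketch of CountingSort, assuming integer keys in 1..k as on the slides. The optional `key` argument is our addition so that stability can be observed on records; B is 0-based, so the slide's position C[A[i]] becomes index C[A[i]] - 1:

```python
def counting_sort(A, k, key=lambda x: x):
    """Stable counting sort; key(x) must be an integer in 1..k.

    Returns a new 0-based list B.
    """
    C = [0] * (k + 1)               # C[1..k]; C[0] is unused
    for x in A:
        C[key(x)] += 1              # C[i] = how often i occurs
    for i in range(2, k + 1):
        C[i] += C[i - 1]            # C[i] = number of elements <= i
    B = [None] * len(A)
    for x in reversed(A):           # right-to-left scan, as in "for i <- n downto 1"
        c = key(x)
        B[C[c] - 1] = x             # 1-based position C[c] is 0-based index C[c] - 1
        C[c] -= 1
    return B


print(counting_sort([3, 6, 4, 1, 3, 4, 1, 4], 6))
# [1, 1, 3, 3, 4, 4, 4, 6]

records = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
print(counting_sort(records, 3, key=lambda r: r[0]))
# [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]
```

The records with equal keys come out in their input order ("b" before "d", "a" before "c"); scanning A from left to right in the last loop instead would reverse each group of equal keys.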

Without using a second array B…

Procedure Single_Array_CountingSort(A, k, n)
    for i ← 1 to k
        C[i] ← 0
    for i ← 1 to n
        C[A[i]] ← C[A[i]] + 1
    // C[i] now contains a counter of how often i occurs
    pos ← 1
    for i ← 1 to k
        for j ← 1 to C[i]
            A[pos] ← i
            pos ← pos + 1
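
The single-array variant is even shorter in Python. A minimal sketch, with 0-based `pos` and a hypothetical function name:

```python
def single_array_counting_sort(A, k):
    """Sort A (integers in 1..k) in place by overwriting it from the counts."""
    C = [0] * (k + 1)               # C[1..k]; C[0] is unused
    for x in A:
        C[x] += 1                   # C[i] = how often i occurs
    pos = 0
    for i in range(1, k + 1):
        for _ in range(C[i]):       # write value i exactly C[i] times
            A[pos] = i
            pos += 1


A = [3, 6, 4, 1, 3, 4, 1, 4]
single_array_counting_sort(A, 6)
print(A)   # [1, 1, 3, 3, 4, 4, 4, 6]
```

Because it writes the key values back rather than moving the original elements, this variant only makes sense for bare integers; records carrying other data need the two-array version.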

Analysis of Single-Array Counting Sort

The first and second for loops take k and n steps, respectively, i.e., O(n) steps since k = O(n).

Then, two nested for loops… O(n²)???

A more accurate upper bound?? Yes… For each i, the inner for loop executes C[i] times. So the body of the inner loop executes

$\sum_{i=1}^{k} C[i]$

times in total.

Theorem: $\sum_{i=1}^{k} C[i] = n$

Proof (sketch): The second for loop is executed n times. In each step, one element of C is increased by 1. q.e.d.

Discussion

Complexity: If the list is of size n and k = O(n), then T(n) = O(n).

Stability: A sorting method is stable if equal elements are output in the same order they had in the input.

Theorem: Counting Sort (procedure CountingSort, whose last loop scans A from n down to 1) is stable.

Lower Bounds on Sorting

Theorem: For any comparison sort of n elements, the worst-case number of comparisons is

$T(n) = \Omega(n \log n)$

Remark: $T(n) = \Omega(g(n))$ means that T(n) grows at least as fast as g(n).
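
One standard way to see why the bound holds, sketched under the usual decision-tree model (an assumption here; the slide states only the theorem): every comparison sort must distinguish all n! possible input orders, each order ends at its own leaf, and a binary tree of height T(n) has at most 2^{T(n)} leaves. For even n:

```latex
% n! has n/2 factors that are each at least n/2, so n! \ge (n/2)^{n/2}.
2^{T(n)} \ge n!
\;\Longrightarrow\;
T(n) \ge \log_2 n! \ge \log_2 \left(\tfrac{n}{2}\right)^{n/2}
     = \tfrac{n}{2}\,\log_2 \tfrac{n}{2} = \Omega(n \log n)
```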