G64ADS Advanced Data Structures: Sorting


TRANSCRIPT


1

G64ADS Advanced Data Structures

Sorting

2

Insertion sort

1) Initially p = 1

2) Let the first p elements be sorted

3) Insert the (p+1)th element properly in the list so that now p+1 elements are sorted

4) Increment p and go to step (3)

3

Insertion sort

4

Insertion sort

o Consists of N - 1 passes
o Pass p (for p = 1 through N - 1) ensures that the elements in positions 0 through p are in sorted order
   o elements in positions 0 through p - 1 are already sorted
   o move the element in position p left until its correct place is found among the first p + 1 elements
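The passes above translate almost directly into code. A minimal Java sketch, assuming an int array (this code is not from the slides):

```java
class InsertionSortSketch {
    // Pass p takes the element at position p and moves it left until
    // a[0..p] is sorted, exactly as described above.
    static void insertionSort(int[] a) {
        for (int p = 1; p < a.length; p++) {
            int tmp = a[p];                    // element to insert
            int j = p;
            while (j > 0 && a[j - 1] > tmp) {  // shift larger elements one step right
                a[j] = a[j - 1];
                j--;
            }
            a[j] = tmp;                        // drop tmp into its correct place
        }
    }
}
```

Running insertionSort on {34, 8, 64, 51, 32, 21} reproduces the passes walked through on the next slides.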

5

Insertion sort

To sort the following numbers in increasing order:

34 8 64 51 32 21

p = 1, tmp = 8

34 > tmp, so the second element a[1] is set to 34.

We have reached the front of the list. Thus the 1st position a[0] = tmp = 8, giving 8 34 ...

After 1st pass: 8 34 64 51 32 21

(first 2 elements are sorted)

6

p = 2, tmp = 64

34 < 64, so stop at the 3rd position and set the 3rd position = 64.

After 2nd pass: 8 34 64 51 32 21

(first 3 elements are sorted)

p = 3, tmp = 51

51 < 64, so we have 8 34 64 64 32 21

34 < 51, so stop at the 2nd position and set the 3rd position = tmp.

After 3rd pass: 8 34 51 64 32 21

(first 4 elements are sorted)

p = 4, tmp = 32

32 < 64, so 8 34 51 64 64 21

32 < 51, so 8 34 51 51 64 21

next, 32 < 34, so 8 34 34 51 64 21

next, 32 > 8, so stop at the 1st position and set the 2nd position = 32.

After 4th pass: 8 32 34 51 64 21

p = 5, tmp = 21

After 5th pass: 8 21 32 34 51 64

7

Insertion sort worst-case running time

o Inner loop is executed p times, for each p = 1 .. N-1
o Overall: 1 + 2 + 3 + ... + (N-1) = N(N-1)/2 = O(N^2)
o Space requirement is O(1) (the sort works in place)

8

Heapsort

(1) Build a binary heap of N elements
   o the minimum element is at the top of the heap

(2) Perform N DeleteMin operations
   o the elements are extracted in sorted order

(3) Record these elements in a second array and then copy the array back

9

Heapsort - Analysis

(1) Build a binary heap of N elements
   o repeatedly insert N elements: O(N log N) time

(2) Perform N DeleteMin operations
   o each DeleteMin operation takes O(log N): O(N log N) total

(3) Record these elements in a second array and then copy the array back
   o O(N)

o Total time complexity: O(N log N)
o Memory requirement: uses an extra array, O(N)

10

Heapsort – No Extra Memory

o Observation: after each deleteMin, the size of the heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted
   o after the last deleteMin, the array will contain the elements in decreasing sorted order

o To sort the elements in decreasing order, use a min heap

o To sort the elements in increasing order, use a max heap
   o the parent has a larger element than the child
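A minimal in-place sketch of this idea, assuming an int array; note that it builds the heap bottom-up rather than by the repeated insertions used in the analysis slide, and the helper name percolateDown is illustrative, not from the slides:

```java
class HeapSortSketch {
    // Build a max-heap, then repeatedly swap the maximum (the root) into the
    // cell freed at the end of the array, so the array ends up in increasing order.
    static void heapSort(int[] a) {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i--)          // build the max-heap bottom-up
            percolateDown(a, i, n);
        for (int end = n - 1; end > 0; end--) {
            int t = a[0]; a[0] = a[end]; a[end] = t;  // "deleteMax" into the freed cell
            percolateDown(a, 0, end);                 // restore the heap on a[0..end-1]
        }
    }

    static void percolateDown(int[] a, int i, int size) {
        int tmp = a[i];
        for (int child; 2 * i + 1 < size; i = child) {
            child = 2 * i + 1;                                         // left child
            if (child + 1 < size && a[child + 1] > a[child]) child++;  // pick larger child
            if (a[child] > tmp) a[i] = a[child];                       // hole moves down
            else break;
        }
        a[i] = tmp;
    }
}
```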

11

Heapsort – No Extra Memory

Sort in increasing order: use a max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursively
o Merge the two sorted lists to get one sorted list

o How to divide the list? Running time?
o How to merge the two sorted lists? Running time?

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursively
o Merge the two sorted lists to get one sorted list

o How to divide the list? Running time?
o How to merge the two sorted lists? Running time?

14

Mergesort Divide

o If the input list is a linked list, dividing takes Θ(N) time
   o we scan the linked list, stop at the N/2-th entry, and cut the link

o If the input list is an array A[0..N-1], dividing takes O(1) time
   o we can represent a sublist by two integers, left and right: to divide A[left..right], we compute center = (left + right)/2 and obtain A[left..center] and A[center+1..right]

o Try left = 0, right = 50: center = 25

15

Mergesort

o Divide-and-conquer strategy:
   o recursively mergesort the first half and the second half
   o merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input: two sorted arrays A and B
o Output: a sorted output array C
o Three counters: Actr, Bctr, and Cctr
   o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced

(2) When either input list is exhausted, the remainder of the other list is copied to C
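A compact Java sketch of the recursive driver plus the merge routine just described; int arrays and a caller-supplied temporary array are assumed, and the counter names follow Actr/Bctr/Cctr (this code is not from the slides):

```java
class MergeSortSketch {
    // Usage: mergeSort(a, new int[a.length], 0, a.length - 1);
    static void mergeSort(int[] a, int[] tmp, int left, int right) {
        if (left >= right) return;               // size 0 or 1: already sorted
        int center = (left + right) / 2;
        mergeSort(a, tmp, left, center);         // sort the first half
        mergeSort(a, tmp, center + 1, right);    // sort the second half
        merge(a, tmp, left, center + 1, right);  // merge the two sorted halves
    }

    static void merge(int[] a, int[] tmp, int aCtr, int bCtr, int rightEnd) {
        int leftEnd = bCtr - 1, cCtr = aCtr, start = aCtr;
        while (aCtr <= leftEnd && bCtr <= rightEnd)            // (1) copy the smaller front element
            tmp[cCtr++] = (a[aCtr] <= a[bCtr]) ? a[aCtr++] : a[bCtr++];
        while (aCtr <= leftEnd)  tmp[cCtr++] = a[aCtr++];      // (2) copy the remainder ...
        while (bCtr <= rightEnd) tmp[cCtr++] = a[bCtr++];      //     ... of the non-empty list
        for (int i = start; i <= rightEnd; i++) a[i] = tmp[i]; // copy back into a
    }
}
```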

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) time, where m1 and m2 are the sizes of the two sublists

o Space requirement:
   o merging two sorted lists requires linear extra memory
   o additional work to copy to the temporary array and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step: O(1) time
o Conquer step: 2 T(N/2) time
o Combine step: O(N) time

o Recurrence equation:
   o T(1) = 1
   o T(N) = 2T(N/2) + N

22

Mergesort Analysis

Expanding the recurrence by repeated substitution:

T(N) = 2T(N/2) + N
     = 2(2T(N/4) + N/2) + N = 4T(N/4) + 2N
     = 4(2T(N/8) + N/4) + 2N = 8T(N/8) + 3N
     = ...
     = 2^k T(N/2^k) + kN

Since N = 2^k, we have k = log2 N, so T(N) = N T(1) + N log2 N = O(N log N).

23

Quicksort

o Divide-and-conquer approach to sorting
o Like MergeSort, except:
   o don't divide the array in half
   o partition the array based on elements being less than or greater than some element of the array (the pivot)

o Worst-case running time: O(N^2)
o Average-case running time: O(N log N)
o Fastest generic sorting algorithm in practice
o Even faster if we use a simple sort (e.g. InsertionSort) when the array is small

24

Quicksort Algorithm

o Given array S
o Modify S so its elements are in increasing order

1. If the size of S is 0 or 1, return
2. Pick any element v in S as the pivot
3. Partition S - {v} into two disjoint groups:
   o S1 = {x ∈ (S - {v}) | x ≤ v}
   o S2 = {x ∈ (S - {v}) | x ≥ v}
4. Return QuickSort(S1), followed by v, followed by QuickSort(S2)
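A direct, not-in-place rendering of these four steps as a Java sketch; integer lists are assumed, duplicates of the pivot go into S1 here, and the in-place partitioning version appears with the later slides:

```java
import java.util.*;

class QuickSortSketch {
    static List<Integer> quickSort(List<Integer> s) {
        if (s.size() <= 1) return s;                             // step 1
        int v = s.get(0);                                        // step 2: pick a pivot
        List<Integer> s1 = new ArrayList<>(), s2 = new ArrayList<>();
        for (int i = 1; i < s.size(); i++) {                     // step 3: partition S - {v}
            int x = s.get(i);
            if (x <= v) s1.add(x); else s2.add(x);
        }
        List<Integer> result = new ArrayList<>(quickSort(s1));  // step 4: QuickSort(S1)
        result.add(v);                                           //         then v
        result.addAll(quickSort(s2));                            //         then QuickSort(S2)
        return result;
    }
}
```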

25

Quicksort Example

26

Why so fast

o MergeSort always divides the array in half
o QuickSort might divide the array into subproblems of size 1 and N-1
   o when?
   o leading to O(N^2) performance
   o need to choose the pivot wisely (but efficiently)

o MergeSort requires a temporary array for the merge step
o QuickSort can partition the array in place
   o this more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first element
   o what if the array is already or nearly sorted?
   o good for a random array

o Choosing a random pivot
   o good in practice if truly random
   o still possible to get some bad choices
   o requires execution of a random number generator

28

Picking the Pivot

o Best choice of pivot:
   o the median of the array
   o the median is expensive to calculate

o Estimate the median as the median of three elements
   o choose the first, middle, and last elements
   o e.g. <8 1 4 9 6 3 5 2 7 0>

o Has been shown to reduce running time (comparisons) by 14%

29

Partitioning Strategy

o Partitioning is conceptually straightforward, but easy to do inefficiently

o Good strategy:
   o swap the pivot with the last element S[right]
   o set i = left
   o set j = right - 1
   o while (i < j):
      o increment i until S[i] > pivot
      o decrement j until S[j] < pivot
      o if (i < j), then swap S[i] and S[j]
   o swap the pivot and S[i]
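A defensively-bounded Java sketch of this strategy (not from the slides): the pivot is assumed to have been swapped into S[right] by the caller, and, like the slide, the scans skip over elements equal to the pivot, which the following slides call the "current approach".

```java
class PartitionSketch {
    // Returns the final index of the pivot; a QuickSort driver would then
    // recurse on s[left..p-1] and s[p+1..right].
    static int partition(int[] s, int left, int right) {
        int pivot = s[right];                      // pivot parked at the last position
        int i = left, j = right - 1;
        while (i < j) {
            while (i < j && s[i] <= pivot) i++;    // increment i until s[i] > pivot
            while (i < j && s[j] >= pivot) j--;    // decrement j until s[j] < pivot
            if (i < j) { int t = s[i]; s[i] = s[j]; s[j] = t; }
        }
        if (s[i] > pivot) {                        // swap the pivot into its place
            int t = s[i]; s[i] = s[right]; s[right] = t;
            return i;
        }
        return right;                              // every element was <= pivot
    }
}
```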

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicates?
   o consider the case where all elements are equal

o Current approach: skip over elements equal to the pivot
   o no swaps (good)
   o but then i = right - 1, and the array is partitioned into N-1 and 1 elements
   o worst-case O(N^2) performance

33

Partitioning Strategy

o How to handle duplicates?
o Alternative approach:
   o don't skip elements equal to the pivot
   o increment i while S[i] < pivot
   o decrement j while S[j] > pivot

o Adds some unnecessary swaps
o But results in perfect partitioning for an array of identical elements
   o unlikely for the input array, but more likely for recursive calls to QuickSort

34

Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive

o General strategy:
   o when N < threshold, use a sort more efficient for small arrays (e.g. InsertionSort)
   o good thresholds range from 5 to 20
   o also avoids the issue of finding a median-of-three pivot for an array of size 2 or less
   o has been shown to reduce running time by 15%

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N - i - 1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?
o Can we prove a lower bound on the sorting problem?

o Preview:
   o for comparison sorting, no, we can't do better
   o we can show a lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
o Each node represents a set of possible orderings of the array elements
o Each branch represents the outcome of a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements
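A sketch of the counting argument behind the bound (not spelled out in the transcript): a binary tree with L leaves has depth at least log2 L, and a correct comparison sort needs at least N! leaves, so some input forces at least log2(N!) ≥ (N/2) log2(N/2) = Ω(N log N) comparisons.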

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given an array A of N integer elements, each less than M
o Create an array C of size M, where C[i] is the number of i's in A
o Use C to place the elements into a new sorted array B
o Running time: Θ(N+M) = Θ(N) if M = Θ(N)
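A minimal Java sketch matching this description, assuming the values lie in 0..M-1 (not from the slides):

```java
class CountingSortSketch {
    static int[] countingSort(int[] a, int m) {
        int[] c = new int[m];
        for (int x : a) c[x]++;                      // C[i] = number of i's in A
        int[] b = new int[a.length];
        int idx = 0;
        for (int i = 0; i < m; i++)                  // place the elements into sorted B
            for (int k = 0; k < c[i]; k++) b[idx++] = i;
        return b;                                    // Θ(N + M) time overall
    }
}
```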

50

Linear Sorting

o BucketSort

o Assume the N elements of A are uniformly distributed over the range [0,1)
o Create N equal-sized buckets over [0,1)
o Add each element of A into the appropriate bucket
o Sort each bucket (e.g. with InsertionSort)
o Return the concatenation of the buckets
o Average-case running time: Θ(N)
   o assumes each bucket will contain Θ(1) elements
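A minimal BucketSort sketch under the same assumptions: N doubles uniformly distributed in [0,1), with each bucket sorted by the library sort here rather than InsertionSort (this code is not from the slides):

```java
import java.util.*;

class BucketSortSketch {
    static void bucketSort(double[] a) {
        int n = a.length;
        List<List<Double>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());
        for (double x : a) buckets.get((int) (x * n)).add(x);  // bucket index = floor(x * N)
        int idx = 0;
        for (List<Double> bucket : buckets) {
            Collections.sort(bucket);               // each bucket holds Θ(1) elements on average
            for (double x : bucket) a[idx++] = x;   // concatenate the buckets back into a
        }
    }
}
```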

51

External Sorting

o What if the number of elements N we wish to sort does not fit in memory?

o Obviously, our existing sorting algorithms are inefficient
   o each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉

o Approach:
   o read M elements of A at a time, sort them using QuickSort, and write them back to disk: O(M log M)
   o repeat the above K times until all of A is processed
   o create K input buffers and 1 output buffer, each of size M/(K+1)
   o perform a K-way merge: O(N)
      o update the input buffers one disk page at a time
      o write the output buffer one disk page at a time
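As an illustration (the numbers are assumed, not from the slides): with N = 10,000,000 elements and room for M = 1,000,000 in memory, K = ⌈N/M⌉ = 10 sorted runs are written to disk, and the final pass performs a 10-way merge through 11 buffers of roughly M/(K+1) ≈ 90,909 elements each.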

53

External Mergesort

o T(N, M) = O(K·M log M + N)
o T(N, M) = O((N/M)·M log M + N)
o T(N, M) = O(N log M + N)
o T(N, M) = O(N log M)

o Disk accesses (all sequential):
   o P = page size
   o accesses = 4N/P (read all and write all, twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort - Analysis
  • Heapsort – No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 2: G64ADS Advanced Data Structures

2

Insertion sort

1) Initially p = 1

2) Let the first p elements be sorted

3) Insert the (p+1)th element properly in the list so that now p+1 elements are sorted

4) increment p and go to step (3)

3

Insertion sort

4

Insertion sort

o Consists of N - 1 passeso For pass p = 1 through N - 1 ensures that the elements

in positions 0 through p are in sorted ordero elements in positions 0 through p - 1 are already sortedo move the element in position p left until its correct place is found

among the first p + 1 elements

5

Insertion sort

To sort the following numbers in increasing order

34 8 64 51 32 21

p = 1 tmp = 8

34 gt tmp so second element a[1] is set to 34 8 34hellip

We have reached the front of the list Thus 1st position a[0] = tmp=8

After 1st pass 8 34 64 51 32 21

(first 2 elements are sorted)

6

p = 3 tmp = 51

51 lt 64 so we have 8 34 64 64 32 21

34 lt 51 so stop at 2nd position set 3rd position = tmp

After 3rd pass 8 34 51 64 32 21

(first 4 elements are sorted)p = 4 tmp = 32

32 lt 64 so 8 34 51 64 64 21

32 lt 51 so 8 34 51 51 64 21

next 32 lt 34 so 8 34 34 51 64 21

next 32 gt 8 so stop at 1st position and set 2nd position = 32

After 4th pass 8 32 34 51 64 21

p = 5 tmp = 21

After 5th pass 8 21 32 34 51 64

p = 2 tmp = 64

34 lt 64 so stop at 3rd position and set 3rd position = 64

After 2nd pass 8 34 64 51 32 21

(first 3 elements are sorted)

7

Insertion sort worst-case running time

o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

8

Heapsort

(1) Build a binary heap of N elements o the minimum element is at the top of the heap

(2) Perform N DeleteMin operationso the elements are extracted in sorted order

(3) Record these elements in a second array and then copy the array back

9

Heapsort -Analysis

(1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

(2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

(3) Record these elements in a second array and then copy the array backo O(N)

o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

10

Heapsort ndash No Extra Memory

o Observation after each deleteMin the size of heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted

after the last deleteMin the array will contain the elements in decreasing sorted order

o To sort the elements in the decreasing order use a min heap

o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

11

Heapsort ndash No Extra Memory

Sort in increasing order use max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 3: G64ADS Advanced Data Structures

3

Insertion sort

4

Insertion sort

o Consists of N - 1 passeso For pass p = 1 through N - 1 ensures that the elements

in positions 0 through p are in sorted ordero elements in positions 0 through p - 1 are already sortedo move the element in position p left until its correct place is found

among the first p + 1 elements

5

Insertion sort

To sort the following numbers in increasing order

34 8 64 51 32 21

p = 1 tmp = 8

34 gt tmp so second element a[1] is set to 34 8 34hellip

We have reached the front of the list Thus 1st position a[0] = tmp=8

After 1st pass 8 34 64 51 32 21

(first 2 elements are sorted)

6

p = 3 tmp = 51

51 lt 64 so we have 8 34 64 64 32 21

34 lt 51 so stop at 2nd position set 3rd position = tmp

After 3rd pass 8 34 51 64 32 21

(first 4 elements are sorted)p = 4 tmp = 32

32 lt 64 so 8 34 51 64 64 21

32 lt 51 so 8 34 51 51 64 21

next 32 lt 34 so 8 34 34 51 64 21

next 32 gt 8 so stop at 1st position and set 2nd position = 32

After 4th pass 8 32 34 51 64 21

p = 5 tmp = 21

After 5th pass 8 21 32 34 51 64

p = 2 tmp = 64

34 lt 64 so stop at 3rd position and set 3rd position = 64

After 2nd pass 8 34 64 51 32 21

(first 3 elements are sorted)

7

Insertion sort worst-case running time

o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

8

Heapsort

(1) Build a binary heap of N elements o the minimum element is at the top of the heap

(2) Perform N DeleteMin operationso the elements are extracted in sorted order

(3) Record these elements in a second array and then copy the array back

9

Heapsort -Analysis

(1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

(2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

(3) Record these elements in a second array and then copy the array backo O(N)

o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

10

Heapsort ndash No Extra Memory

o Observation after each deleteMin the size of heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted

after the last deleteMin the array will contain the elements in decreasing sorted order

o To sort the elements in the decreasing order use a min heap

o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

11

Heapsort ndash No Extra Memory

Sort in increasing order use max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 4: G64ADS Advanced Data Structures

4

Insertion sort

o Consists of N - 1 passeso For pass p = 1 through N - 1 ensures that the elements

in positions 0 through p are in sorted ordero elements in positions 0 through p - 1 are already sortedo move the element in position p left until its correct place is found

among the first p + 1 elements

5

Insertion sort

To sort the following numbers in increasing order

34 8 64 51 32 21

p = 1 tmp = 8

34 gt tmp so second element a[1] is set to 34 8 34hellip

We have reached the front of the list Thus 1st position a[0] = tmp=8

After 1st pass 8 34 64 51 32 21

(first 2 elements are sorted)

6

p = 3 tmp = 51

51 lt 64 so we have 8 34 64 64 32 21

34 lt 51 so stop at 2nd position set 3rd position = tmp

After 3rd pass 8 34 51 64 32 21

(first 4 elements are sorted)p = 4 tmp = 32

32 lt 64 so 8 34 51 64 64 21

32 lt 51 so 8 34 51 51 64 21

next 32 lt 34 so 8 34 34 51 64 21

next 32 gt 8 so stop at 1st position and set 2nd position = 32

After 4th pass 8 32 34 51 64 21

p = 5 tmp = 21

After 5th pass 8 21 32 34 51 64

p = 2 tmp = 64

34 lt 64 so stop at 3rd position and set 3rd position = 64

After 2nd pass 8 34 64 51 32 21

(first 3 elements are sorted)

7

Insertion sort worst-case running time

o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

8

Heapsort

(1) Build a binary heap of N elements o the minimum element is at the top of the heap

(2) Perform N DeleteMin operationso the elements are extracted in sorted order

(3) Record these elements in a second array and then copy the array back

9

Heapsort -Analysis

(1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

(2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

(3) Record these elements in a second array and then copy the array backo O(N)

o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

10

Heapsort ndash No Extra Memory

o Observation after each deleteMin the size of heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted

after the last deleteMin the array will contain the elements in decreasing sorted order

o To sort the elements in the decreasing order use a min heap

o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

11

Heapsort ndash No Extra Memory

Sort in increasing order use max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 5: G64ADS Advanced Data Structures

5

Insertion sort

To sort the following numbers in increasing order

34 8 64 51 32 21

p = 1 tmp = 8

34 gt tmp so second element a[1] is set to 34 8 34hellip

We have reached the front of the list Thus 1st position a[0] = tmp=8

After 1st pass 8 34 64 51 32 21

(first 2 elements are sorted)

6

p = 3 tmp = 51

51 lt 64 so we have 8 34 64 64 32 21

34 lt 51 so stop at 2nd position set 3rd position = tmp

After 3rd pass 8 34 51 64 32 21

(first 4 elements are sorted)p = 4 tmp = 32

32 lt 64 so 8 34 51 64 64 21

32 lt 51 so 8 34 51 51 64 21

next 32 lt 34 so 8 34 34 51 64 21

next 32 gt 8 so stop at 1st position and set 2nd position = 32

After 4th pass 8 32 34 51 64 21

p = 5 tmp = 21

After 5th pass 8 21 32 34 51 64

p = 2 tmp = 64

34 lt 64 so stop at 3rd position and set 3rd position = 64

After 2nd pass 8 34 64 51 32 21

(first 3 elements are sorted)

7

Insertion sort worst-case running time

o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

8

Heapsort

(1) Build a binary heap of N elements o the minimum element is at the top of the heap

(2) Perform N DeleteMin operationso the elements are extracted in sorted order

(3) Record these elements in a second array and then copy the array back

9

Heapsort -Analysis

(1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

(2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

(3) Record these elements in a second array and then copy the array backo O(N)

o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

10

Heapsort ndash No Extra Memory

o Observation after each deleteMin the size of heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted

after the last deleteMin the array will contain the elements in decreasing sorted order

o To sort the elements in the decreasing order use a min heap

o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

11

Heapsort ndash No Extra Memory

Sort in increasing order use max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview:
o For comparison sorting, no, we can't do better
o Can show a lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
o Each node represents a set of possible orderings of the array elements
o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements
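Filling in the argument that the cited lemmas and theorems make precise: a binary tree of depth d has at most 2^d leaves, and the decision tree must have at least N! leaves (one per possible ordering), so its depth satisfies d >= log2(N!). Since N! >= (N/2)^(N/2), we get log2(N!) >= (N/2) log2(N/2) = Ω(N log N), so any comparison sort needs Ω(N log N) comparisons in the worst case.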

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given an array A of N integer elements, each less than M
o Create an array C of size M, where C[i] is the number of i's in A
o Use C to place elements into the new sorted array B
o Running time Θ(N + M) = Θ(N) if M = Θ(N) (see the sketch below)
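A minimal CountingSort sketch along these lines, assuming the inputs are non-negative ints strictly less than m; the names are illustrative.

```java
final class CountingSortSketch {
    /** a holds N integers in the range [0, m); returns a new sorted array. Runs in Θ(N + M) time and space. */
    static int[] countingSort(int[] a, int m) {
        int[] c = new int[m];                          // c[i] = number of occurrences of i in a
        for (int x : a) c[x]++;
        int[] b = new int[a.length];
        int next = 0;
        for (int i = 0; i < m; i++) {                  // write each value i exactly c[i] times
            for (int k = 0; k < c[i]; k++) b[next++] = i;
        }
        return b;
    }
}
```

This simple form works for plain integers; a stable variant that carries whole records uses prefix sums over c to compute output positions instead.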

50

Linear Sorting

o BucketSort

o Assume the N elements of A are uniformly distributed over the range [0, 1)
o Create N equal-sized buckets over [0, 1)
o Add each element of A into the appropriate bucket
o Sort each bucket (e.g. with InsertionSort)
o Return the concatenation of the buckets (see the sketch below)
o Average-case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements
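A BucketSort sketch under the stated uniformity assumption; for brevity each bucket is sorted with the library sort rather than the InsertionSort the slide suggests, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class BucketSortSketch {
    /** Assumes the values of a lie in [0, 1) and are roughly uniform; average case Θ(N). */
    static double[] bucketSort(double[] a) {
        int n = a.length;
        List<List<Double>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());
        for (double x : a) buckets.get((int) (x * n)).add(x);    // bucket i covers [i/N, (i+1)/N)
        double[] out = new double[n];
        int next = 0;
        for (List<Double> bucket : buckets) {
            Collections.sort(bucket);                             // expected Θ(1) elements per bucket
            for (double x : bucket) out[next++] = x;              // concatenation of the buckets
        }
        return out;
    }
}
```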

51

External Sorting

o What if the N elements we wish to sort do not fit in memory?
o Obviously, our existing sort algorithms are inefficient
o Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ceil(N / M)
o Approach:
o Read in M elements of A, sort them with QuickSort, and write the sorted run back to disk: O(M log M)
o Repeat the above K times until all of A has been processed
o Create K input buffers and 1 output buffer, each of size M / (K + 1)
o Perform a K-way merge: O(N) (a sketch of the merge step follows below)
o Update input buffers one disk page at a time
o Write the output buffer one disk page at a time
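The heart of the approach is the K-way merge. The sketch below (illustrative names, not from the slides) merges K already-sorted runs held in memory, standing in for the K input buffers; in the external version each run would be refilled from disk one page at a time and the output flushed one page at a time. With a heap over the K run heads the merge costs O(N log K) comparisons, while the slide's O(N) counts the elements moved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

final class KWayMergeSketch {
    /** Merges K sorted runs into one sorted list. */
    static List<Integer> kWayMerge(List<List<Integer>> runs) {
        // heap entries are {value, run index, position within that run}
        PriorityQueue<int[]> heap = new PriorityQueue<>((p, q) -> Integer.compare(p[0], q[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) heap.add(new int[]{runs.get(r).get(0), r, 0});
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();                   // smallest remaining head among the K runs
            out.add(top[0]);                           // external version: append to the output buffer
            int r = top[1], pos = top[2] + 1;
            if (pos < runs.get(r).size()) {            // external version: refill this input buffer from disk
                heap.add(new int[]{runs.get(r).get(pos), r, pos});
            }
        }
        return out;
    }
}
```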

53

External Mergesort

o T(N, M) = O((K·M log M) + N)
o T(N, M) = O(((N/M)·M log M) + N)
o T(N, M) = O((N log M) + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential):
o P = page size
o Accesses = 4N/P (read all / write all, twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 6: G64ADS Advanced Data Structures

6

p = 3 tmp = 51

51 lt 64 so we have 8 34 64 64 32 21

34 lt 51 so stop at 2nd position set 3rd position = tmp

After 3rd pass 8 34 51 64 32 21

(first 4 elements are sorted)p = 4 tmp = 32

32 lt 64 so 8 34 51 64 64 21

32 lt 51 so 8 34 51 51 64 21

next 32 lt 34 so 8 34 34 51 64 21

next 32 gt 8 so stop at 1st position and set 2nd position = 32

After 4th pass 8 32 34 51 64 21

p = 5 tmp = 21

After 5th pass 8 21 32 34 51 64

p = 2 tmp = 64

34 lt 64 so stop at 3rd position and set 3rd position = 64

After 2nd pass 8 34 64 51 32 21

(first 3 elements are sorted)

7

Insertion sort worst-case running time

o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

8

Heapsort

(1) Build a binary heap of N elements o the minimum element is at the top of the heap

(2) Perform N DeleteMin operationso the elements are extracted in sorted order

(3) Record these elements in a second array and then copy the array back

9

Heapsort -Analysis

(1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

(2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

(3) Record these elements in a second array and then copy the array backo O(N)

o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

10

Heapsort ndash No Extra Memory

o Observation after each deleteMin the size of heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted

after the last deleteMin the array will contain the elements in decreasing sorted order

o To sort the elements in the decreasing order use a min heap

o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

11

Heapsort ndash No Extra Memory

Sort in increasing order use max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 7: G64ADS Advanced Data Structures

7

Insertion sort worst-case running time

o Inner loop is executed p times for each p=1N-1 Overall 1 + 2 + 3 + + N-1 = hellip= O(N2)o Space requirement is O()

8

Heapsort

(1) Build a binary heap of N elements o the minimum element is at the top of the heap

(2) Perform N DeleteMin operationso the elements are extracted in sorted order

(3) Record these elements in a second array and then copy the array back

9

Heapsort -Analysis

(1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

(2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

(3) Record these elements in a second array and then copy the array backo O(N)

o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

10

Heapsort ndash No Extra Memory

o Observation after each deleteMin the size of heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted

after the last deleteMin the array will contain the elements in decreasing sorted order

o To sort the elements in the decreasing order use a min heap

o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

11

Heapsort ndash No Extra Memory

Sort in increasing order use max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 8: G64ADS Advanced Data Structures

8

Heapsort

(1) Build a binary heap of N elements o the minimum element is at the top of the heap

(2) Perform N DeleteMin operationso the elements are extracted in sorted order

(3) Record these elements in a second array and then copy the array back

9

Heapsort -Analysis

(1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

(2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

(3) Record these elements in a second array and then copy the array backo O(N)

o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

10

Heapsort ndash No Extra Memory

o Observation after each deleteMin the size of heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted

after the last deleteMin the array will contain the elements in decreasing sorted order

o To sort the elements in the decreasing order use a min heap

o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

11

Heapsort ndash No Extra Memory

Sort in increasing order use max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 9: G64ADS Advanced Data Structures

9

Heapsort -Analysis

(1) Build a binary heap of N elements o repeatedly insert N elements O(N log N) time

(2) Perform N DeleteMin operationso Each DeleteMin operation takes O(log N) O(N log N)

(3) Record these elements in a second array and then copy the array backo O(N)

o Total time complexity O(N log N)o Memory requirement uses an extra array O(N)

10

Heapsort ndash No Extra Memory

o Observation after each deleteMin the size of heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted

after the last deleteMin the array will contain the elements in decreasing sorted order

o To sort the elements in the decreasing order use a min heap

o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

11

Heapsort ndash No Extra Memory

Sort in increasing order use max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 10: G64ADS Advanced Data Structures

10

Heapsort ndash No Extra Memory

o Observation after each deleteMin the size of heap shrinks by 1

o We can use the last cell just freed up to store the element that was just deleted

after the last deleteMin the array will contain the elements in decreasing sorted order

o To sort the elements in the decreasing order use a min heap

o To sort the elements in the increasing order use a max heapo the parent has a larger element than the child

11

Heapsort ndash No Extra Memory

Sort in increasing order use max heap

Delete 97

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 12: G64ADS Advanced Data Structures

12

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 13: G64ADS Advanced Data Structures

13

Mergesort

Based on divide-and-conquer strategy

o Divide the list into two smaller lists of about equal sizes

o Sort each smaller list recursivelyo Merge the two sorted lists to get one sorted list

o How to divide the list o Running timeo How to merge the two sorted lists o Running time

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 14: G64ADS Advanced Data Structures

14

Mergesort Divide

o If the input list is a linked list dividing takes (N) timeo We scan the linked list stop at the N2 th entry and

cut the link

o If the input list is an array A[0N-1] dividing takes O(1) timeo we can represent a sublist by two integers left and right to divide A[leftright] we compute center=(left+right)2 and obtain A[leftcenter] and A[center+1right]

o Try left=0 right = 50 center=

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 15: G64ADS Advanced Data Structures

15

Mergesort

o Divide-and-conquer strategyo recursively mergesort the first half and the

second halfo merge the two sorted halves together

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 16: G64ADS Advanced Data Structures

16

Mergesort

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 17: G64ADS Advanced Data Structures

17

Mergesort Merge

o Input two sorted array A and Bo Output an output sorted array Co Three counters Actr Bctr and Cctr

o initially set to the beginning of their respective arrays

(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C and the appropriate counters are advanced

(2) When either input list is exhausted the remainder of the other list is copied to C

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivot
  o Median of the array
o Median is expensive to calculate
o Estimate the median as the median of three elements
  o Choose the first, middle, and last elements
  o e.g. <8 1 4 9 6 3 5 2 7 0>: the sample is 8, 6, 0 and their median is 6
o Has been shown to reduce running time (comparisons) by 14%
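A Java sketch of the median-of-three estimate (the name median3 is mine; this version also leaves the three sampled elements in sorted order, which is a common convenience):

// Sort S[left], S[center], S[right] among themselves and return the middle value.
static int median3(int[] s, int left, int right) {
    int center = (left + right) / 2;
    if (s[center] < s[left])   { int t = s[center]; s[center] = s[left];   s[left] = t; }
    if (s[right]  < s[left])   { int t = s[right];  s[right]  = s[left];   s[left] = t; }
    if (s[right]  < s[center]) { int t = s[right];  s[right]  = s[center]; s[center] = t; }
    return s[center];   // e.g. 6 for the sample 8, 6, 0 above
}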

29

Partitioning Strategy

o Partitioning is conceptually straightforward, but easy to do inefficiently

o Good strategy:
  o Swap the pivot with the last element S[right]
  o Set i = left
  o Set j = (right - 1)
  o While (i < j):
    o Increment i until S[i] > pivot
    o Decrement j until S[j] < pivot
    o If (i < j), then swap S[i] and S[j]
  o Swap the pivot and S[i]
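The strategy above written out in Java (a sketch that assumes the pivot value is already stored at S[right]; the i < j guards are added so the scans cannot run past each other):

// Partition s[left..right] around the pivot stored at s[right].
// Returns the final index of the pivot.
static int partition(int[] s, int left, int right) {
    int pivot = s[right];
    int i = left, j = right - 1;
    while (i < j) {
        while (i < j && s[i] <= pivot) i++;   // increment i until s[i] > pivot
        while (i < j && s[j] >= pivot) j--;   // decrement j until s[j] < pivot
        if (i < j) { int t = s[i]; s[i] = s[j]; s[j] = t; }
    }
    if (s[i] > pivot) {                        // swap pivot and s[i]
        int t = s[i]; s[i] = s[right]; s[right] = t;
        return i;
    }
    return right;                              // every element was <= pivot
}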

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicates?
  o Consider the case where all elements are equal
o Current approach: skip over elements equal to the pivot
  o No swaps (good)
  o But then i = (right - 1), and the array is partitioned into N-1 and 1 elements
  o Worst case: O(N^2) performance

33

Partitioning Strategy

o How to handle duplicates?
o Alternative approach:
  o Don't skip elements equal to the pivot
  o Increment i while S[i] < pivot
  o Decrement j while S[j] > pivot
o Adds some unnecessary swaps
o But results in perfect partitioning for an array of identical elements
  o Unlikely for the input array, but more likely for recursive calls to QuickSort

34

Small Arrays

o When S is small, generating lots of recursive calls on small sub-arrays is expensive

o General strategy:
  o When N < threshold, use a sort that is more efficient for small arrays (e.g. InsertionSort)
  o Good thresholds range from 5 to 20
  o Also avoids the issue of finding a median-of-three pivot for an array of size 2 or less
  o Has been shown to reduce running time by 15%

35

QuickSort Implementation
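The implementation shown on these slides is an image that did not survive this transcript. A rough Java sketch that combines the earlier pieces (median3 and partition from the sketches above, plus the small-array cutoff) might look like this; it is illustrative, not the slides' own code:

static final int CUTOFF = 10;   // threshold below which InsertionSort is used

static void quickSort(int[] s, int left, int right) {
    if (right - left + 1 < CUTOFF) {           // small sub-array: use a simple sort
        insertionSort(s, left, right);
        return;
    }
    median3(s, left, right);                   // orders s[left], s[center], s[right]
    int center = (left + right) / 2;
    int t = s[center]; s[center] = s[right]; s[right] = t;   // move pivot to the end
    int p = partition(s, left, right);         // pivot ends up at index p
    quickSort(s, left, p - 1);
    quickSort(s, p + 1, right);
}

static void insertionSort(int[] s, int left, int right) {
    for (int p = left + 1; p <= right; p++) {
        int tmp = s[p];
        int j = p;
        while (j > left && s[j - 1] > tmp) { s[j] = s[j - 1]; j--; }
        s[j] = tmp;
    }
}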

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N - i - 1) + O(N)
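For example (standard cases, not spelled out on the slide): if the pivot is always the smallest element, then i = 0 on every call and T(N) = T(N-1) + O(N), which telescopes to O(N^2); if the partition is always balanced, i ≈ N/2 and T(N) = 2T(N/2) + O(N) = O(N log N), the same recurrence as mergesort.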

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do better?
o Can we prove a lower bound on the sorting problem?
o Preview:
  o For comparison sorting, no, we can't do better
  o Can show a lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary tree
o Each node represents a set of possible orderings of the array elements
o Each branch represents an outcome of a particular comparison
o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N! different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 7.1
o Lemma 7.2
o Theorem 7.6
o Theorem 7.7
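The standard argument behind these results, sketched briefly: a binary tree with L leaves has depth at least log2 L, and a decision tree for sorting N elements needs at least N! leaves (one per possible ordering), so some input forces at least log2(N!) comparisons; since log2(N!) >= (N/2) log2(N/2), this is Ω(N log N).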

49

Linear Sorting

o Some constraints on the input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort
  o Given array A of N integer elements, each less than M
  o Create array C of size M, where C[i] is the number of i's in A
  o Use C to place the elements into a new sorted array B
  o Running time Θ(N+M) = Θ(N) if M = Θ(N)
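A Java sketch of CountingSort as described (array names follow the slide; this simple version does not preserve stability):

// Sort N integers in a, each in the range 0..M-1, in Θ(N + M) time.
static int[] countingSort(int[] a, int m) {
    int[] c = new int[m];                // c[i] = number of i's in a
    for (int x : a) c[x]++;
    int[] b = new int[a.length];         // sorted output
    int idx = 0;
    for (int i = 0; i < m; i++)
        for (int k = 0; k < c[i]; k++) b[idx++] = i;
    return b;
}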

50

Linear Sorting

o BucketSort
  o Assume the N elements of A are uniformly distributed over the range [0,1)
  o Create N equal-sized buckets over [0,1)
  o Add each element of A into the appropriate bucket
  o Sort each bucket (e.g. with InsertionSort)
  o Return the concatenation of the buckets
  o Average-case running time Θ(N)
    o Assumes each bucket will contain Θ(1) elements
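A Java sketch of BucketSort under the slide's assumption of values uniformly distributed in [0,1) (Collections.sort stands in for the per-bucket InsertionSort):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

static void bucketSort(double[] a) {
    int n = a.length;
    List<List<Double>> buckets = new ArrayList<>();
    for (int i = 0; i < n; i++) buckets.add(new ArrayList<>());
    for (double x : a) buckets.get((int) (x * n)).add(x);   // bucket i covers [i/n, (i+1)/n)
    int idx = 0;
    for (List<Double> bucket : buckets) {
        Collections.sort(bucket);                            // sort each bucket
        for (double x : bucket) a[idx++] = x;                // concatenate back into a
    }
}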

51

External Sorting

o What if the number of elements N we wish to sort does not fit in memory?

o Obviously, our existing sort algorithms are inefficient
  o Each comparison potentially requires a disk access

o We want to minimize the number of disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ceil(N/M)
o Approach:
  o Read in M elements of A, sort them using QuickSort, and write them back to disk: O(M log M)
  o Repeat the above K times until all of A has been processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge: O(N)
    o Update the input buffers one disk page at a time
    o Write the output buffer one disk page at a time
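A hedged Java sketch of the K-way merge step, using a min-heap of size K over in-memory arrays that stand in for the K sorted runs (disk buffering is omitted); it moves O(N) elements and performs O(N log K) comparisons:

import java.util.PriorityQueue;

// Each heap entry is {value, runIndex, positionInRun}.
static int[] kWayMerge(int[][] runs, int total) {
    PriorityQueue<int[]> heap = new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
    for (int r = 0; r < runs.length; r++)
        if (runs[r].length > 0) heap.add(new int[]{runs[r][0], r, 0});
    int[] out = new int[total];
    int idx = 0;
    while (!heap.isEmpty()) {
        int[] top = heap.poll();               // smallest current element among the runs
        out[idx++] = top[0];
        int r = top[1], pos = top[2] + 1;
        if (pos < runs[r].length) heap.add(new int[]{runs[r][pos], r, pos});
    }
    return out;
}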

53

External Mergesort

o T(N,M) = O(K(M log M) + N)
o T(N,M) = O((N/M)(M log M) + N)
o T(N,M) = O(N log M + N)
o T(N,M) = O(N log M)
o Disk accesses (all sequential):
  o P = page size
  o Accesses = 4N/P (read-all/write-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 18: G64ADS Advanced Data Structures

18

Mergesort Merge

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 19: G64ADS Advanced Data Structures

19

Mergesort Merge

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 20: G64ADS Advanced Data Structures

20

Mergesort Analysis

o Merge takes O(m1 + m2) where m1 and m2 are the sizes of the two sublists

o Space requiremento merging two sorted lists requires linear extra

memoryo additional work to copy to the temporary array

and back

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 21: G64ADS Advanced Data Structures

21

Mergesort Analysis

o Let T(N) denote the worst-case running time of mergesort to sort N numbers

o Assume that N is a power of 2

o Divide step O(1) timeo Conquer step 2 T(N2) timeo Combine step O(N) time o Recurrence equation

o T(1) = 1o T(N) = 2T(N2) + N

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 22: G64ADS Advanced Data Structures

22

Mergesort Analysis

kNN

T

NN

T

NNN

T

NN

T

NNN

T

NN

TNT

kk

)2(2

3)8(8

2)4

)8(2(4

2)4(4

)2

)4(2(2

)2(2)(

)log(

log

)2(2)(

NNO

NNN

kNN

TNTk

k

Since N=2k we have k=log2 n

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 23: G64ADS Advanced Data Structures

23

Quicksort

o Divide-and-conquer approach to sortingo Like MergeSort except

o Donrsquot divide the array in halfo Partition the array based elements being less than or

greater than some element of the array (the pivot)

o Worst case running time O(N2)o Average case running time O(N log N)o Fastest generic sorting algorithm in practiceo Even faster if use simple sort (eg

InsertionSort) when array is small

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 24: G64ADS Advanced Data Structures

24

Quicksort Algorithm

o Given array S

o Modify S so elements in increasing order

1 If size of S is 0 or 1 return

2 Pick any element v in S as the pivot

3 Partition S ndash v into two disjoint groupso S1 = x Є(S ndashv) | x le vo S2 = x Є(S ndashv) | x ge v

4 Return QuickSort(S1) followed by v followed by QuickSort(S2)

25

Quicksort Example

26

Why so fast

o MergeSort always divides array in halfo QuickSort might divide array into

subproblems of size 1 and N-1o Wheno Leading to O(N2) performance

o Need to choose pivot wisely (but efficiently)o MergeSort requires temporary array for

merge stepo QuickSort can partition the array in place

o This more than makes up for bad pivot choices

27

Picking the Pivot

o Choosing the first elemento What if array already or nearly sortedo Good for random arrayo Choose random pivoto Good in practice if truly randomo Still possible to get some bad choiceso Requires execution of random number

generator

28

Picking the Pivot

o Best choice of pivoto Median of array

o Median is expensive to calculateo Estimate median as the median of

three elementso Choose first middle and last elementso eg lt8 1 4 9 6 3 5 2 7 0gt

o Has been shown to reduce running time (comparisons) by 14

29

Partitioning Strategy

o Partitioning is conceptually straightforward but easy to do inefficiently

o 1048708Good strategyo Swap pivot with last element S[right]o Set i = lefto Set j = (right ndash1)o While (i lt j)

o Increment i until S[i] gt pivot

o Decrement j until S[j] lt pivot

o If (i lt j) then swap S[i] and S[j]

o Swap pivot and S[i]

30

Partitioning Example

31

Partitioning Example

32

Partitioning Strategy

o How to handle duplicateso Consider the case where all elements

are equalo Current approach Skip over elements

equal to pivoto No swaps (good)

o But then i = (right ndash1) and array partitioned into N-1 and 1 elements

o Worst case O(N2) performance

33

Partitioning Strategy

o How to handle duplicateso Alternative approach

o Donrsquot skip elements equal to pivoto Increment i while S[i] lt pivoto Decrement j while S[j] gt pivot

o Adds some unnecessary swapso But results in perfect partitioning for array

of identical elementso Unlikely for input array but more likely for

recursive calls to QuickSort

34

Small Arrays

o When S is small generating lots of recursive calls on small sub-arrays is expensive

o General strategyo When N lt threshold use a sort more efficient for

small arrays (eg InsertionSort)o Good thresholds range from 5 to 20o Also avoids issue with finding median-of-three pivot

for array of size 2 or less

o Has been shown to reduce running time by 15

35

QuickSort Implementation

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach:
o Read in M elements of A, sort them using QuickSort, and write them back to disk: O(M log M)
o Repeat the above K times until all of A is processed
o Create K input buffers and 1 output buffer, each of size M/(K+1)
o Perform a K-way merge: O(N)
o Update input buffers one disk page at a time
o Write the output buffer one disk page at a time
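
A compact sketch of the two passes, with in-memory lists standing in for the disk runs and heapq.merge playing the role of the K-way merge (the page-at-a-time buffering is omitted):

import heapq

def external_merge_sort(A, M):
    # Pass 1: cut A into K = ceil(N/M) runs and sort each run "in memory"
    runs = [sorted(A[i:i + M]) for i in range(0, len(A), M)]
    # Pass 2: K-way merge of the sorted runs
    return list(heapq.merge(*runs))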

53

External Mergesort

o T(N, M) = O((K · M log M) + N)
o T(N, M) = O(((N/M) · M log M) + N)
o T(N, M) = O((N log M) + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential):
o P = page size
o Accesses = 4N/P (read all and write all, twice)

54

Summary

o Insertion sort
o Heapsort
o Mergesort
o Quicksort: pivot selection, partitioning, small-array cutoff, analysis
o Lower bound for comparison sorting: decision trees, Ω(N log N)
o Linear sorting: CountingSort, BucketSort
o External sorting: external Mergesort
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 36: G64ADS Advanced Data Structures

36

QuickSort Implementation

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 37: G64ADS Advanced Data Structures

37

QuickSort Implementation

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 38: G64ADS Advanced Data Structures

38

Analysis of QuickSort

o Let i be the number of elements sent to the left partition

o Compute running time T(N) for array of size N

o T(0) = T(1) = O(1)

o T(N) = T(i) + T(N ndashi ndash1) + O(N)

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 39: G64ADS Advanced Data Structures

39

Analysis of QuickSort

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 40: G64ADS Advanced Data Structures

40

Analysis of QuickSort

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 41: G64ADS Advanced Data Structures

41

Comparison Sorting

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 42: G64ADS Advanced Data Structures

42

Comparison Sorting

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 43: G64ADS Advanced Data Structures

43

Comparison Sorting

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 44: G64ADS Advanced Data Structures

44

Lower Bound on Sorting

o Best worst-case sorting algorithm (so far) is O(N log N)

o Can we do bettero Can we prove a lower bound on the

sorting problemo Preview

o For comparison sorting no we canrsquot do better

o Can show lower bound of Ω(N log N)

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 45: G64ADS Advanced Data Structures

45

Decision Trees

o A decision tree is a binary treeo Each node represents a set of possible

orderings of the array elementso Each branch represents an outcome of

a particular comparison

o Each leaf of the decision tree represents a particular ordering of the original array elements

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 46: G64ADS Advanced Data Structures

46

Decision Trees

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 47: G64ADS Advanced Data Structures

47

Decision Tree for Sorting

o The logic of every sorting algorithm that uses comparisons can be represented by a decision tree

o In the worst case the number of comparisons used by the algorithm equals the depth of the deepest leaf

o In the average case the number of comparisons is the average of the depths of all leaves

o There are N different orderings of N elements

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 48: G64ADS Advanced Data Structures

48

Lower Bound for Comparison Sorting

o Lemma 71o Lemma 72o Theorem 76o Theorem 77

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 49: G64ADS Advanced Data Structures

49

Linear Sorting

o Some constraints on input array allow faster than Θ(N log N) sorting (no comparisons)

o CountingSort

o Given array A of N integer elements each less than Mo Create array C of size M where C[i] is the number of

irsquos in Ao Use C to place elements into new sorted array Bo Running time Θ(N+M) = Θ(N) if M = Θ(N)

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 50: G64ADS Advanced Data Structures

50

Linear Sorting

o BucketSort

o Assume N elements of A uniformly distributed over the range [01)

o Create N equal-sized buckets over [01)o Add each element of A into appropriate bucketo Sort each bucket (eg with InsertionSort)o Return concatentation of bucketso Average case running time Θ(N)

o Assumes each bucket will contain Θ(1) elements

51

External Sorting

o What is the number of elements N we wish to sort do not fit in memory

o Obviously our existing sort algorithms are inefficiento Each comparison potentially requires a disk access

o We want to minimize disk accesses

52

External Mergesorting

o N = number of elements in array A to be sortedo M = number of elements that fit in memoryo K = roof [NM]o Approach

o Read in M amount of A sort it using QuickSort and write it back to disk O(M log M)

o Repeat above K times until all of A processedo Create K input buffers and 1 output buffer each of

size M(K+1)o Perform a K-way merge O(N)

o Update input buffers one disk-page at a timeo Write output buffer one disk-page at a time

53

External Mergesort

o T(NM) = O(KM log M) + N)o T(NM) = O((NM)M log M) + N)o T(NM) = O((N log M) + N)o T(NM) = O(N log M)o Disk accesses (all sequential)

o P = page size

o Accesses = 4NP (read-allwrite-all twice)

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort -Analysis
  • Heapsort ndash No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary
Page 51: G64ADS Advanced Data Structures

51

External Sorting

o What if the number of elements N we wish to sort does not fit in memory?
o Our existing sort algorithms then become inefficient: each comparison potentially requires a disk access
o We want to minimize the number of disk accesses

Page 52: G64ADS Advanced Data Structures

52

External Mergesorting

o N = number of elements in array A to be sorted
o M = number of elements that fit in memory
o K = ⌈N/M⌉
o Approach:
  o Read M elements of A at a time, sort them using QuickSort, and write the sorted run back to disk: O(M log M)
  o Repeat the above K times until all of A has been processed
  o Create K input buffers and 1 output buffer, each of size M/(K+1)
  o Perform a K-way merge of the runs: O(N) (see the sketch below)
    o Update the input buffers one disk page at a time
    o Write the output buffer one disk page at a time
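As an illustration of the merge phase, here is a minimal Java sketch of a K-way merge driven by a priority queue (not from the original slides; plain in-memory lists stand in for the disk-backed input buffers, and all names are illustrative).

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class KWayMergeSketch {

    // Merges K sorted runs into one sorted output list.
    // In a real external mergesort each "run" would be a disk-backed input
    // buffer refilled one page at a time; here in-memory lists stand in.
    public static List<Integer> kWayMerge(List<List<Integer>> runs) {
        // Heap entries: {current value, index of the run it came from}
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x[0], y[0]));
        List<Iterator<Integer>> its = new ArrayList<>();
        for (int i = 0; i < runs.size(); i++) {
            Iterator<Integer> it = runs.get(i).iterator();
            its.add(it);
            if (it.hasNext()) {
                heap.add(new int[] {it.next(), i});
            }
        }
        List<Integer> out = new ArrayList<>();   // stands in for the output buffer
        while (!heap.isEmpty()) {
            int[] top = heap.poll();             // smallest head among the K runs
            out.add(top[0]);
            Iterator<Integer> it = its.get(top[1]);
            if (it.hasNext()) {                  // refill from the same run
                heap.add(new int[] {it.next(), top[1]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> runs = List.of(
                List.of(1, 5, 9), List.of(2, 6, 7), List.of(3, 4, 8));
        System.out.println(kWayMerge(runs));     // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}

Each output element requires one deleteMin and at most one insert on a heap holding the K current run heads; in the external setting the input buffers would be refilled from disk one page at a time and the output buffer flushed one page at a time, exactly as the bullets above describe.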

Page 53: G64ADS Advanced Data Structures

53

External Mergesort

o T(N, M) = O(K · M log M + N)
o T(N, M) = O((N/M) · M log M + N)
o T(N, M) = O(N log M + N)
o T(N, M) = O(N log M)
o Disk accesses (all sequential):
  o P = page size
  o Accesses = 4N/P (read all / write all, twice)
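As a purely illustrative calculation with assumed numbers (not from the slides): take N = 10^8 elements, M = 10^6 elements of memory, and P = 10^4 elements per disk page. Then K = N/M = 100 runs are created and merged, and since every element is read and written once during run creation and once more during the merge, the total is 4N/P = 4 · 10^8 / 10^4 = 40,000 sequential page accesses.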

Page 54: G64ADS Advanced Data Structures

54

Summary

  • G64ADS Advanced Data Structures
  • Insertion sort
  • Slide 3
  • Slide 4
  • Slide 5
  • Slide 6
  • Insertion sort worst-case running time
  • Heapsort
  • Heapsort - Analysis
  • Heapsort – No Extra Memory
  • Slide 11
  • Mergesort
  • Slide 13
  • Mergesort Divide
  • Slide 15
  • Slide 16
  • Mergesort Merge
  • Slide 18
  • Slide 19
  • Mergesort Analysis
  • Slide 21
  • Slide 22
  • Quicksort
  • Quicksort Algorithm
  • Quicksort Example
  • Why so fast
  • Picking the Pivot
  • Slide 28
  • Partitioning Strategy
  • Partitioning Example
  • Slide 31
  • Slide 32
  • Slide 33
  • Small Arrays
  • QuickSort Implementation
  • Slide 36
  • Slide 37
  • Analysis of QuickSort
  • Slide 39
  • Slide 40
  • Comparison Sorting
  • Slide 42
  • Slide 43
  • Lower Bound on Sorting
  • Decision Trees
  • Slide 46
  • Decision Tree for Sorting
  • Lower Bound for Comparison Sorting
  • Linear Sorting
  • Slide 50
  • External Sorting
  • External Mergesorting
  • External Mergesort
  • Summary