6 -1 chapter 6 sorting. 6 -2 sorting a file of size n is a sequence of n items. each item in the...

35
6 -1 Chapter 6 Sorting

Post on 21-Dec-2015

223 views

Category:

Documents


3 download

TRANSCRIPT

6 -1

Chapter 6

Sorting

6 -2

Sorting A file of size n is a sequence of n

items. Each item in the file is called a record.Key Other fields

Record 1 4 DDD

Record 2 2 BBB

Record 3 1 AAA

Record 4 5 EEE

Record 5 3 CCC

1 AAA

2 BBB

3 CCC

4 DDD

5 EEE original file sorted file

6 -3

Original pointer table

4 DDD

2 BBB

1 AAA

5 EEE

3 CCC

Record 1Record 2

Record 3

Record 4

Record 5

FileSorted pointer table

It is easier to search a particular element after sorting. (e.g. binary search)

6 -4

internal sorting: data stored in main memory ( more than 20 algorithms )

external sorting: data stored in auxiliary storage.

stable sorting : the records with the same key have the same relative order as they have before sorting.

Types of sorting

6 -5

n1050

100500

1,0005,000

10,00050,000100,000500,000

a = 0.01n2

125100

2,50010,000

250,0001,000,000

25,000,000100,000,000

2,5000,000,000

b = 10n100500

1,0005,000

10,00050,000100,000500,000

1,000,0005,000,000

a+b101525

1,1007,50020,000300,000

1,100,00025,500,000101,000,000

2,505,000,000

(a+b) n2 1.010.210.110.030.020.010.010.010.010.01

Time and space efficiency

6 -6

O notation f(n) is O(g(n)) if there exist positive integers a and b such that f(n) ≦a . g(n) for all n ≧b e.g. 4n2 + 100n = O(n2) ∵ n ≧100, 4n2+100n ≦5n2

4n2 + 100n = O(n3) ∵ n ≧10, 4n2+100n ≦2n3

f(n)= c1nk + c2nk-1 +…+ ckn + ck+1

= O(nk+j), for any j ≧0 f(n)= c = O(1), c is a constantlogmn = logmk . logkn ,for some constants m and k logmn = O(logkn) = O(log n)

6 -7

Time complexity

polynomial order: O(nk), for some constant k.

exponential order: O(dn), for some d >1. NP-complete(intractable) problem: requiring exponential time algorithms.

best sorting algorithm with comparisons: O(nlogn)

6 -8

nlog10n and n2

n1 × 101

5 × 101

1 × 102

5 × 102

1 × 103

5 × 103

1 × 104

5 × 104

1 × 105

5 × 105

1 × 106

5 × 106

1 × 107

nlog10n1.0 × 101

8.5 × 101

2.0 × 102

1.3 × 103

3.0 × 103

1.8 × 104

4.0 × 104

2.3 × 105

5.0 × 105

2.8 × 106

6.0 × 106

3.3 × 107

7.0 × 107

n2

1.0 × 102

2.5 × 103

1.0 × 104

2.5 × 105

1.0 × 106

2.5 × 107

1.0 × 108

2.5 × 109

1.0 × 1010

2.5 × 1011

1.0 × 1012

2.5 × 1013

1.0 × 1014

6 -9

Bubble sort 相鄰兩個資料相比 , 若未符合順序 , 則對調 (exchan

ge) 之 .e.g. 8 2 9 5 6 ( 由大而小 sort)

8 2 9 5 68 9 2 5 6

8 9 5 2 6

8 9 5 6 2

9 8 5 6 2

9 8 5 6 2

9 8 6 5 2

9 8 6 5 2

9 8 6 5 2

pass 1

pass 2

pass 3

decreasing order

nonincreasing order

6 -10

如果在某一個 pass 中,沒有任何相鄰兩項資料對調,表示已經 sort 完畢

best case : 未 sort 之前 , 已按順序排好 , 需 1 pass

worst case: 需 n-1 個 pass ( n 為資料量 )

比較 (comparison) 次數最多為 :

(n-1)+(n-2)+...+1 = = O(n2)

Time complexity: O(n2)

n(n-1)2

Time complexity of bubble sort

6 -11

void bubble(int x[], int n){ int hold, j, pass; int switched = TRUE;

for (pass=0; pass < n-1 && switched == TRUE; pass++){ /*outer loop controls the number of passes */ switched = FALSE; /* initially no interchanges have */ /* been made on this pass */ for (j = 0; j < n-pass-1; j++) /* inner loop governs each individual pass */ if (x[j] > x[j+1]){ /* elements out of order */ /* an interchange is necessary */ switched = TRUE; hold = x[j]; x[j] = x[j+1]; x[j+1] = hold; } /* end if */ } /* end for */} /* end bubble */

6 -12

Quicksort (partition exchange sort)

e.g. 由小而大 sort (nondecreasing order)

[26 5 37 1 61 11 59 15 48 19][26 5 19 1 61 11 59 15 48 37][26 5 19 1 15 11 59 61 48 37][11 5 19 1 15] 26 [59 61 48 37][11 5 1 19 15] 26 [59 61 48 37][ 1 5] 11 [19 15] 26 [59 61 48 37] 1 5 11 15 19 26 [59 61 48 37] 1 5 11 15 19 26 [59 37 48 61] 1 5 11 15 19 26 [48 37] 59 [61] 1 5 11 15 19 26 37 48 59 61

6 -13

Quicksort 方法 : 每組的第一個資料為基準 (pivot), 比它小

的資料放在左邊 , 比它大的資料放在右邊 , 然後以 pivot 中心 , 將這組資料分成兩部份 .

worst case: 每次的基準資料恰為最大 , 或最小 比較次數 :

)(O2

)1(12)2()1( 2n

nnnn

6 -14

... ...

...log2n

n

×2 = nn2

×4 = nn4

Best case of quicksort best case:

每次分割 (partition) 時 , 均分成大約相同個數的兩部份 .

6 -15

T(n): n 個資料所需時間T(n)≦ cn+2T(n/2), for some constant c.

≦ cn+2(c . n/2 + 2T(n/4)) ≦ 2cn + 4T(n/4)

≦ cnlog2n + nT(1) = O(nlogn)

...

Mathematical analysis of best case

6 -16

void partition(int x[], int lb, int ub, int *pj) { int a, down, temp, up; a = x[lb]; /* a is the element whose final position */ /* is sought */ up = ub; down = lb; while (down < up){ while (x[down] <= a && down < ub) down++; /* move up the array */ while (x[up] > a) up--; /* move down the array */ if (down < up){ /* interchange x[down] and x[up] */ temp = x[down]; x[down] = x[up]; x[up] = temp; } /* end if */ } /* end while */ x[lb] = x[up]; x[up] = a; *pj = up; } /* end partition */

6 -17

main program:

if (lb >= ub) return; // array is sorted partition(x, lb, ub, j);

// partition the elements of the // subarray such that one of the // elements(possibly x[lb]) is // now at x[j] (j is an output // parameter) and: // 1. x[i] <= x[j] for lb <= i < j // 2. x[i] >= x[j] for j < i <= ub // x[j] is now at its final position

quick(x, lb, j-1); // recursively sort the subarray // between posiitons lb and j-1

quick(x, j+1, ub); // recursively sort the subarray // between positions j+1 and ub

6 -18

Selection sorte.g. 由大而小 sort

8 2 9 5 6

9 2 8 5 6

9 8 2 5 6

9 8 6 5 2

9 8 6 5 2

pass 1

pass 2

pass 4

pass 3

方法 : 每次均從剩餘未 sort 部份之資料 , 找出最大者 ( 或最小者 ), 然後對調至其位置

比較次數 :(n-1)+(n-2)+...+1 = =O(n2)

Time complexity: O(n2)

n(n-1)2

6 -19

Binary tree sorte.g. input data: 8, 2, 9, 5, 6 建立 binary search tree:

8

92

5

6

inorder traversal: output: 2 5 6 8 9

worst case:input data: 2, 5, 6, 8, 9

比較次數 :

best case: time complexity:

2

6

5

8

9

d

i 1

i*2i = O(nlogn), d log2n

)(O2

)1()1(321 2n

nnn

6 -20

Heapsorte.g. input data: 25 57 48 37 12 92 86 33

將 input data 存入 almost complete binary treeStep 1: Construct a heap

25

25

57

37

57

4825

57

48

25

37

57

48

25 12

37

92

57

25 12 48

37

92

86

25 12 48 57

6 -21

37

92

86

33 12 48 57

25

0

37

86

57

33 12 48 25

92

1 2

3 4 5 6

7

Step 2: Adjust the heap.

37

57

48

33 12 25 86

92

(a) x[7] = 最大值 (b) x[6] = 第二大

6 -22

37

48

25

33 12 57 86

92

33

37

25

12 48 57 86

92

(c) x[5] = 第三大 (d) x[4] = 第四大

12

33

25

37 48 57 86

92

12

25

33

37 48 57 86

92

(e) x[3] = 第五大 (f) x[2] = 第六大

6 -23

Final:

25

12

33

37 48 57 86

92

x[0]

x[2]

x[6]

x[5]x[4]x[7]

x[3]

x[1]

The heapsort should be implemented by an array, not by a binary tree

time complexity: O(nlogn) (in the worst case)

(g) x[1] = 第七大

6 -24

Insertion sort

e.g. ( 由大而小 sort) 8 2 9 5 6

8 2 9 5 6

8 9 2 5 6

9 8 2 5 6

9 8 5 2 6

9 8 5 2 6

9 8 5 6 2

9 8 6 5 2

9 8 6 5 2

pass 1

pass 2

pass 3

pass 4

6 -25

方法 : 每次處理一個新的資料時 , 一定 insert 至適當的位置才停止 .

需要 n-1 個 pass

best case: 未 sort 前 , 已按順序排好 , 每個 pass 僅需一次比較 , 共需 (n-1) 次比較 . worst case: 未 sort 前 , 按相反順序排好 , 比較次

數為 :

Time complexity: O(n2)

Insertion sort

)(O2

)1()1(321 2n

nnn

6 -26

void insertsort(int x[], int n){ int i, k, y;//initially x[0] may be thought of as a sorted //file of one element. After each repetition of//the following loop, the elements x[0] through//x[k] are in order for (k = 1; k < n; k++){ /* Insert x[k] into the sorted file */ y = x[k];/* Move down 1 position all elements greater*//* than y */ for (i = k-1; i >= 0 && y < x[i]; i--) x[i+1] = x[i]; /* Insert y at proper position */ x[i+1] = y; } /* end for */} /* end insertsort */

6 -27

Shell sort (diminishing increment sort)

方法 : insertion sort 是相鄰兩個資料做比較 , 再決定是 否互換 . Shell sort 則是相距為 d 的兩個 " 比較與互換 "(compare and exchange). d 為任意大於 1 的整數 , 但在最後一個 pass, d 必須為 1.

6 -28

e.g. 由大到小 sort21 11 09 02 16 31 26 01 27 05 13 19

pass 1: d1 = = 612+1

2d1

26 11 27 05 16 31 21 01 09 02 13 19

pass 2: d2 = = = 3d1+12

6+12

26 16 31 21 13 27 05 11 19 02 01 09

d2

pass 3: d3 = = = 2d2+1

23+12

31 27 26 21 19 16 13 11 05 09 01 02

d3

pass 4: d4 = = = 1d3+1

22+1

2

31 27 26 21 19 16 13 11 09 05 02 01d4

6 -29

每個 pass 均進行多組的 insertion sort. 若一開始 d=1, 則與 insertion sort 完全一樣Knuth 證明 :

di-1 = 3di+1, 即 di= 為最好di-1-1

3 time complexity: O(nlog2n)~O(n3/2)

適合數百個資料之 sorting

6 -30

void shellsort (int x[], int n, int incrmnts[], int numinc) { int incr, j, k, span, y; for (incr = 0;incr < numic; incr++){ /* span is the size of the increment */ span = incrmnts[incr]; for (j = span; j< n;j++){ /* Insert element x[j] into its proper */ /* position within its subfile */ y = x[j]; for (k = j-span; k >= 0 && y < x[k]; k -= span) x[k+span] = x[k]; x[k+span] = y; } /* end for */ } /* end for */ } /* end shellsort */

6 -31

address caculation sort (sorting by hashing)

e.g. 由小到大 sort input data: 19 13 05 27 01 26 31 16 02 09 11 21

分成 10 個 subfile, 每個 subfile 是一個 linkedlist, 其資料由小而大排列

01 02 05 0911 13 16 1921 26 2731 ...

假設有 n 個資料 , m 個 subfile best case: 1, 且 uniform distribution time complexity: O(n) n

m

nm

worst case: >>1, 或 not uniform distribution time complexity: O(n2)

~~

6 -32

Two-way merge

Merge two sorted sequences into a single one. e.g.

[25 37 48 57][12 33 86 92]

[12 25 33 37 48 57 86 92]merge

設兩個 sorted lists 長度各為 m, ntime complexity: O(m+n)

6 -33

Merge sort e.g. ( 由小而大 )

[25] [57] [48] [37] [12] [92] [86] [33]

[25 57] [37 48] [12 92] [33 86]

[25 37 48 57] [12 33 86 92]

[12 25 33 37 48 57 86 92]

pass 1

pass 2

pass 3

需要 log2n 個 pass time complexity: O(nlogn) It can be implemented by a recursive function.

6 -34

Radix sorte.g. 由小到大的 sort

191305270126311602091121

0)1) 01,31,11,212) 023) 134)5) 056) 26,167) 278)9) 19,09

013111210213052616271909

0) 01,02,05,091) 11,13,16,192) 21,26,273) 314)5)6)7)8)9)

010205091113161921262731

input data pass 1 merge pass 2 merge

6 -35

方法 :(1) 每個資料不與其它資料做比較 , 只看自己放在何處(2)pass 1: 從個位數開始處理 , 若是個位數為 1, 則放在 bucket 1, 以此類推(3)pass 2: 處理十位數

好處 : 速度快 , time complexity: O(nlogpk) k: input data 之最大數 p: 以 p 為基底 logpk: 位數之長度 缺點 : 需要額外的 memory( 可使用 linked list, 將所需 memory 減至最少 , 但會增加時

間 ).