6 -1 chapter 6 sorting. 6 -2 sorting a file of size n is a sequence of n items. each item in the...
Post on 21-Dec-2015
223 views
TRANSCRIPT
6 -2
Sorting A file of size n is a sequence of n
items. Each item in the file is called a record.Key Other fields
Record 1 4 DDD
Record 2 2 BBB
Record 3 1 AAA
Record 4 5 EEE
Record 5 3 CCC
1 AAA
2 BBB
3 CCC
4 DDD
5 EEE original file sorted file
6 -3
Original pointer table
4 DDD
2 BBB
1 AAA
5 EEE
3 CCC
Record 1Record 2
Record 3
Record 4
Record 5
FileSorted pointer table
It is easier to search a particular element after sorting. (e.g. binary search)
6 -4
internal sorting: data stored in main memory ( more than 20 algorithms )
external sorting: data stored in auxiliary storage.
stable sorting : the records with the same key have the same relative order as they have before sorting.
Types of sorting
6 -5
n1050
100500
1,0005,000
10,00050,000100,000500,000
a = 0.01n2
125100
2,50010,000
250,0001,000,000
25,000,000100,000,000
2,5000,000,000
b = 10n100500
1,0005,000
10,00050,000100,000500,000
1,000,0005,000,000
a+b101525
1,1007,50020,000300,000
1,100,00025,500,000101,000,000
2,505,000,000
(a+b) n2 1.010.210.110.030.020.010.010.010.010.01
Time and space efficiency
6 -6
O notation f(n) is O(g(n)) if there exist positive integers a and b such that f(n) ≦a . g(n) for all n ≧b e.g. 4n2 + 100n = O(n2) ∵ n ≧100, 4n2+100n ≦5n2
4n2 + 100n = O(n3) ∵ n ≧10, 4n2+100n ≦2n3
f(n)= c1nk + c2nk-1 +…+ ckn + ck+1
= O(nk+j), for any j ≧0 f(n)= c = O(1), c is a constantlogmn = logmk . logkn ,for some constants m and k logmn = O(logkn) = O(log n)
6 -7
Time complexity
polynomial order: O(nk), for some constant k.
exponential order: O(dn), for some d >1. NP-complete(intractable) problem: requiring exponential time algorithms.
best sorting algorithm with comparisons: O(nlogn)
6 -8
nlog10n and n2
n1 × 101
5 × 101
1 × 102
5 × 102
1 × 103
5 × 103
1 × 104
5 × 104
1 × 105
5 × 105
1 × 106
5 × 106
1 × 107
nlog10n1.0 × 101
8.5 × 101
2.0 × 102
1.3 × 103
3.0 × 103
1.8 × 104
4.0 × 104
2.3 × 105
5.0 × 105
2.8 × 106
6.0 × 106
3.3 × 107
7.0 × 107
n2
1.0 × 102
2.5 × 103
1.0 × 104
2.5 × 105
1.0 × 106
2.5 × 107
1.0 × 108
2.5 × 109
1.0 × 1010
2.5 × 1011
1.0 × 1012
2.5 × 1013
1.0 × 1014
6 -9
Bubble sort 相鄰兩個資料相比 , 若未符合順序 , 則對調 (exchan
ge) 之 .e.g. 8 2 9 5 6 ( 由大而小 sort)
8 2 9 5 68 9 2 5 6
8 9 5 2 6
8 9 5 6 2
9 8 5 6 2
9 8 5 6 2
9 8 6 5 2
9 8 6 5 2
9 8 6 5 2
pass 1
pass 2
pass 3
decreasing order
nonincreasing order
6 -10
如果在某一個 pass 中,沒有任何相鄰兩項資料對調,表示已經 sort 完畢
best case : 未 sort 之前 , 已按順序排好 , 需 1 pass
worst case: 需 n-1 個 pass ( n 為資料量 )
比較 (comparison) 次數最多為 :
(n-1)+(n-2)+...+1 = = O(n2)
Time complexity: O(n2)
n(n-1)2
Time complexity of bubble sort
6 -11
void bubble(int x[], int n){ int hold, j, pass; int switched = TRUE;
for (pass=0; pass < n-1 && switched == TRUE; pass++){ /*outer loop controls the number of passes */ switched = FALSE; /* initially no interchanges have */ /* been made on this pass */ for (j = 0; j < n-pass-1; j++) /* inner loop governs each individual pass */ if (x[j] > x[j+1]){ /* elements out of order */ /* an interchange is necessary */ switched = TRUE; hold = x[j]; x[j] = x[j+1]; x[j+1] = hold; } /* end if */ } /* end for */} /* end bubble */
6 -12
Quicksort (partition exchange sort)
e.g. 由小而大 sort (nondecreasing order)
[26 5 37 1 61 11 59 15 48 19][26 5 19 1 61 11 59 15 48 37][26 5 19 1 15 11 59 61 48 37][11 5 19 1 15] 26 [59 61 48 37][11 5 1 19 15] 26 [59 61 48 37][ 1 5] 11 [19 15] 26 [59 61 48 37] 1 5 11 15 19 26 [59 61 48 37] 1 5 11 15 19 26 [59 37 48 61] 1 5 11 15 19 26 [48 37] 59 [61] 1 5 11 15 19 26 37 48 59 61
6 -13
Quicksort 方法 : 每組的第一個資料為基準 (pivot), 比它小
的資料放在左邊 , 比它大的資料放在右邊 , 然後以 pivot 中心 , 將這組資料分成兩部份 .
worst case: 每次的基準資料恰為最大 , 或最小 比較次數 :
)(O2
)1(12)2()1( 2n
nnnn
6 -14
... ...
...log2n
n
×2 = nn2
×4 = nn4
Best case of quicksort best case:
每次分割 (partition) 時 , 均分成大約相同個數的兩部份 .
6 -15
T(n): n 個資料所需時間T(n)≦ cn+2T(n/2), for some constant c.
≦ cn+2(c . n/2 + 2T(n/4)) ≦ 2cn + 4T(n/4)
≦ cnlog2n + nT(1) = O(nlogn)
...
Mathematical analysis of best case
6 -16
void partition(int x[], int lb, int ub, int *pj) { int a, down, temp, up; a = x[lb]; /* a is the element whose final position */ /* is sought */ up = ub; down = lb; while (down < up){ while (x[down] <= a && down < ub) down++; /* move up the array */ while (x[up] > a) up--; /* move down the array */ if (down < up){ /* interchange x[down] and x[up] */ temp = x[down]; x[down] = x[up]; x[up] = temp; } /* end if */ } /* end while */ x[lb] = x[up]; x[up] = a; *pj = up; } /* end partition */
6 -17
main program:
if (lb >= ub) return; // array is sorted partition(x, lb, ub, j);
// partition the elements of the // subarray such that one of the // elements(possibly x[lb]) is // now at x[j] (j is an output // parameter) and: // 1. x[i] <= x[j] for lb <= i < j // 2. x[i] >= x[j] for j < i <= ub // x[j] is now at its final position
quick(x, lb, j-1); // recursively sort the subarray // between posiitons lb and j-1
quick(x, j+1, ub); // recursively sort the subarray // between positions j+1 and ub
6 -18
Selection sorte.g. 由大而小 sort
8 2 9 5 6
9 2 8 5 6
9 8 2 5 6
9 8 6 5 2
9 8 6 5 2
pass 1
pass 2
pass 4
pass 3
方法 : 每次均從剩餘未 sort 部份之資料 , 找出最大者 ( 或最小者 ), 然後對調至其位置
比較次數 :(n-1)+(n-2)+...+1 = =O(n2)
Time complexity: O(n2)
n(n-1)2
6 -19
Binary tree sorte.g. input data: 8, 2, 9, 5, 6 建立 binary search tree:
8
92
5
6
inorder traversal: output: 2 5 6 8 9
worst case:input data: 2, 5, 6, 8, 9
比較次數 :
best case: time complexity:
2
6
5
8
9
d
i 1
i*2i = O(nlogn), d log2n
)(O2
)1()1(321 2n
nnn
6 -20
Heapsorte.g. input data: 25 57 48 37 12 92 86 33
將 input data 存入 almost complete binary treeStep 1: Construct a heap
25
25
57
37
57
4825
57
48
25
37
57
48
25 12
37
92
57
25 12 48
37
92
86
25 12 48 57
6 -21
37
92
86
33 12 48 57
25
0
37
86
57
33 12 48 25
92
1 2
3 4 5 6
7
Step 2: Adjust the heap.
37
57
48
33 12 25 86
92
(a) x[7] = 最大值 (b) x[6] = 第二大
6 -22
37
48
25
33 12 57 86
92
33
37
25
12 48 57 86
92
(c) x[5] = 第三大 (d) x[4] = 第四大
12
33
25
37 48 57 86
92
12
25
33
37 48 57 86
92
(e) x[3] = 第五大 (f) x[2] = 第六大
6 -23
Final:
25
12
33
37 48 57 86
92
x[0]
x[2]
x[6]
x[5]x[4]x[7]
x[3]
x[1]
The heapsort should be implemented by an array, not by a binary tree
time complexity: O(nlogn) (in the worst case)
(g) x[1] = 第七大
6 -24
Insertion sort
e.g. ( 由大而小 sort) 8 2 9 5 6
8 2 9 5 6
8 9 2 5 6
9 8 2 5 6
9 8 5 2 6
9 8 5 2 6
9 8 5 6 2
9 8 6 5 2
9 8 6 5 2
pass 1
pass 2
pass 3
pass 4
6 -25
方法 : 每次處理一個新的資料時 , 一定 insert 至適當的位置才停止 .
需要 n-1 個 pass
best case: 未 sort 前 , 已按順序排好 , 每個 pass 僅需一次比較 , 共需 (n-1) 次比較 . worst case: 未 sort 前 , 按相反順序排好 , 比較次
數為 :
Time complexity: O(n2)
Insertion sort
)(O2
)1()1(321 2n
nnn
6 -26
void insertsort(int x[], int n){ int i, k, y;//initially x[0] may be thought of as a sorted //file of one element. After each repetition of//the following loop, the elements x[0] through//x[k] are in order for (k = 1; k < n; k++){ /* Insert x[k] into the sorted file */ y = x[k];/* Move down 1 position all elements greater*//* than y */ for (i = k-1; i >= 0 && y < x[i]; i--) x[i+1] = x[i]; /* Insert y at proper position */ x[i+1] = y; } /* end for */} /* end insertsort */
6 -27
Shell sort (diminishing increment sort)
方法 : insertion sort 是相鄰兩個資料做比較 , 再決定是 否互換 . Shell sort 則是相距為 d 的兩個 " 比較與互換 "(compare and exchange). d 為任意大於 1 的整數 , 但在最後一個 pass, d 必須為 1.
6 -28
e.g. 由大到小 sort21 11 09 02 16 31 26 01 27 05 13 19
pass 1: d1 = = 612+1
2d1
26 11 27 05 16 31 21 01 09 02 13 19
pass 2: d2 = = = 3d1+12
6+12
26 16 31 21 13 27 05 11 19 02 01 09
d2
pass 3: d3 = = = 2d2+1
23+12
31 27 26 21 19 16 13 11 05 09 01 02
d3
pass 4: d4 = = = 1d3+1
22+1
2
31 27 26 21 19 16 13 11 09 05 02 01d4
6 -29
每個 pass 均進行多組的 insertion sort. 若一開始 d=1, 則與 insertion sort 完全一樣Knuth 證明 :
di-1 = 3di+1, 即 di= 為最好di-1-1
3 time complexity: O(nlog2n)~O(n3/2)
適合數百個資料之 sorting
6 -30
void shellsort (int x[], int n, int incrmnts[], int numinc) { int incr, j, k, span, y; for (incr = 0;incr < numic; incr++){ /* span is the size of the increment */ span = incrmnts[incr]; for (j = span; j< n;j++){ /* Insert element x[j] into its proper */ /* position within its subfile */ y = x[j]; for (k = j-span; k >= 0 && y < x[k]; k -= span) x[k+span] = x[k]; x[k+span] = y; } /* end for */ } /* end for */ } /* end shellsort */
6 -31
address caculation sort (sorting by hashing)
e.g. 由小到大 sort input data: 19 13 05 27 01 26 31 16 02 09 11 21
分成 10 個 subfile, 每個 subfile 是一個 linkedlist, 其資料由小而大排列
01 02 05 0911 13 16 1921 26 2731 ...
假設有 n 個資料 , m 個 subfile best case: 1, 且 uniform distribution time complexity: O(n) n
m
nm
worst case: >>1, 或 not uniform distribution time complexity: O(n2)
~~
6 -32
Two-way merge
Merge two sorted sequences into a single one. e.g.
[25 37 48 57][12 33 86 92]
[12 25 33 37 48 57 86 92]merge
設兩個 sorted lists 長度各為 m, ntime complexity: O(m+n)
6 -33
Merge sort e.g. ( 由小而大 )
[25] [57] [48] [37] [12] [92] [86] [33]
[25 57] [37 48] [12 92] [33 86]
[25 37 48 57] [12 33 86 92]
[12 25 33 37 48 57 86 92]
pass 1
pass 2
pass 3
需要 log2n 個 pass time complexity: O(nlogn) It can be implemented by a recursive function.
6 -34
Radix sorte.g. 由小到大的 sort
191305270126311602091121
0)1) 01,31,11,212) 023) 134)5) 056) 26,167) 278)9) 19,09
013111210213052616271909
0) 01,02,05,091) 11,13,16,192) 21,26,273) 314)5)6)7)8)9)
010205091113161921262731
input data pass 1 merge pass 2 merge