chapter 05


Upload: nguyen-phuong

Post on 23-Jun-2015


Page 1: Chapter 05

Chapter 5 - 1


Chapter 5

External Sorting


TABLE OF CONTENTS

● Introduction

● External Sort/Merge Algorithms

● 2 Phase Multiway Merge Sort

● Optimization Strategy


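1. Introduction

● The role of sorting in file processing

✔ Needed for ORDER BY, GROUP BY, and join operations

✔ Efficient sequential update: new master = old master + transaction

● Limitations of internal sorting algorithms

✔ Internal sorting assumes all the data to be sorted is in main memory

✔ Files are often larger than main memory

– The whole file cannot be loaded into memory at once

– Processing the file as one in-memory array → is that feasible?

– Solution: External Sorting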


2. External Sort/Merge Algorithm

● Basic Idea

✔ Divide the file into pieces (runs) small enough to sort in memory

– Run: a sorted subfile

✔ Sort each run with an internal sorting algorithm

✔ Merge the sorted runs into longer runs

✔ Repeat until only 1 run remains

● There are several sort/merge algorithms

✔ Binary Sort/Merge

✔ Balanced Binary Sort/Merge

✔ Balanced K-way Sort/Merge

✔ Polyphase Sort/Merge


Basic Idea of External Sorting

(Figure: a 4500-record file sorted by repeated merging)

● Initial runs: run 1 to run 6, 750 records each

● After merge pass 1: run 1 to run 3, 1500 records each

● After merge pass 2: run 1 with 3000 records (runs 1 and 2 merged; the 1500-record run 3 waits)

● After merge pass 3: run 1 with 4500 records
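The passes in this figure can be reproduced with a small in-memory simulation. This is a minimal sketch that assumes the whole file fits in an array (a real external sort reads and writes the runs on disk); `sort_merge_sim` and `merge_adjacent` are hypothetical names. Each pass merges runs 1+2, 3+4, and so on, and an unpaired last run is carried into the next pass, just as the 1500-record run 3 is carried in pass 2.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator for ints (avoids the overflow of x - y) */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merge the adjacent sorted runs src[lo..mid) and src[mid..hi) into dst[lo..hi). */
static void merge_adjacent(const int *src, int *dst, int lo, int mid, int hi)
{
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi) dst[k++] = src[j++];
}

/* Sort a[0..n) with runs of run_len records (run_len > 0); tmp must hold
 * n ints. Returns the number of merge passes performed. */
int sort_merge_sim(int *a, int n, int run_len, int *tmp)
{
    int *src = a, *dst = tmp, passes = 0;

    /* Sorting phase: sort each run with an internal sort. */
    for (int lo = 0; lo < n; lo += run_len) {
        int len = (n - lo < run_len) ? n - lo : run_len;
        qsort(a + lo, (size_t)len, sizeof(int), cmp_int);
    }

    /* Merging phase: each pass merges runs 1+2, 3+4, ...; an unpaired
     * last run is copied through unchanged. */
    for (int width = run_len; width < n; width *= 2) {
        for (int lo = 0; lo < n; lo += 2 * width) {
            int mid = (lo + width < n) ? lo + width : n;
            int hi = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge_adjacent(src, dst, lo, mid, hi);
        }
        int *t = src; src = dst; dst = t;   /* this pass's output feeds the next */
        passes++;
    }
    if (src != a)
        memcpy(a, src, (size_t)n * sizeof(int));
    return passes;
}
```

With 4500 records and `run_len = 750` the function performs 3 merge passes, matching the figure.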


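Binary Sort/Merge

● Sorting Phase

✔ Sort each run, then write the runs alternately to 2 files

● Merging Phase

✔ Merge the corresponding runs of the 2 files, writing each merged (longer) run to a third file

✔ Redistribute the merged runs alternately onto the 2 files and repeat until one run remains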


Binary Sort/Merge: Example

● Input File (run size = 3 records)

✔ 50 110 95 10 100 36 153 40 120 60 70 130 22 140 80

● After the Sorting Phase

✔ File 1: (50 95 110) (40 120 153) (22 80 140)

✔ File 2: (10 36 100) (60 70 130)

● After the first merge

✔ File 3: (10 36 50 95 100 110) (40 60 70 120 130 153) (22 80 140)

● After redistribution to 2 files

✔ File 1: (10 36 50 95 100 110) (22 80 140)

✔ File 2: (40 60 70 120 130 153)

✔ File 3: (empty)


Binary Sort/Merge: Example (continued)

● After the second merge

✔ File 3: (10 36 40 50 60 70 95 100 110 120 130 153) (22 80 140)

● After redistribution to 2 files

✔ File 1: (10 36 40 50 60 70 95 100 110 120 130 153)

✔ File 2: (22 80 140)

✔ File 3: (empty)

● After the third merge

✔ File 3: (10 22 36 40 50 60 70 80 95 100 110 120 130 140 153)
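The example can be replayed with the sketch below, which models each file as an array of runs held in memory. The `Run` and `File` types, the size limits, and all function names are hypothetical; a real implementation would stream runs from disk files. `merge_pass` writes the merged runs to file 3, and `distribute` copies them back alternately to files 1 and 2.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_REC  64   /* hypothetical limits: total records, runs per file */
#define MAX_RUNS 32

typedef struct { int rec[MAX_REC]; int len; } Run;
typedef struct { Run runs[MAX_RUNS]; int nruns; } File;   /* a file of runs */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static void merge_into(const Run *a, const Run *b, Run *out)
{
    int i = 0, j = 0, k = 0;
    while (i < a->len && j < b->len)
        out->rec[k++] = (a->rec[i] <= b->rec[j]) ? a->rec[i++] : b->rec[j++];
    while (i < a->len) out->rec[k++] = a->rec[i++];
    while (j < b->len) out->rec[k++] = b->rec[j++];
    out->len = k;
}

/* Sorting phase: cut the input into runs, sort each, write them alternately. */
static void sorting_phase(const int *in, int n, int run_size, File *f1, File *f2)
{
    f1->nruns = f2->nruns = 0;
    for (int start = 0, r = 0; start < n; start += run_size, r++) {
        File *dst = (r % 2 == 0) ? f1 : f2;
        Run *run = &dst->runs[dst->nruns++];
        run->len = (n - start < run_size) ? n - start : run_size;
        memcpy(run->rec, in + start, (size_t)run->len * sizeof(int));
        qsort(run->rec, (size_t)run->len, sizeof(int), cmp_int);
    }
}

/* One merge: run i of f1 with run i of f2 into f3; an unmatched run is copied. */
static void merge_pass(const File *f1, const File *f2, File *f3)
{
    f3->nruns = 0;
    for (int i = 0; i < f1->nruns || i < f2->nruns; i++) {
        Run *out = &f3->runs[f3->nruns++];
        if (i < f1->nruns && i < f2->nruns)
            merge_into(&f1->runs[i], &f2->runs[i], out);
        else
            *out = (i < f1->nruns) ? f1->runs[i] : f2->runs[i];
    }
}

/* Redistribution: copy the runs of f3 alternately onto f1 and f2. */
static void distribute(const File *f3, File *f1, File *f2)
{
    f1->nruns = f2->nruns = 0;
    for (int i = 0; i < f3->nruns; i++) {
        File *dst = (i % 2 == 0) ? f1 : f2;
        dst->runs[dst->nruns++] = f3->runs[i];
    }
}

/* Sorts in[0..n) into out[0..n); returns the number of merges performed. */
int binary_sort_merge(const int *in, int n, int run_size, int *out)
{
    static File f1, f2, f3;      /* static: the three "files" are large */
    int passes = 0;
    if (n <= 0) return 0;
    sorting_phase(in, n, run_size, &f1, &f2);
    const Run *result = &f1.runs[0];   /* a single run needs no merge */
    while (f1.nruns + f2.nruns > 1) {
        merge_pass(&f1, &f2, &f3);
        passes++;
        result = &f3.runs[0];
        if (f3.nruns == 1) break;
        distribute(&f3, &f1, &f2);
    }
    memcpy(out, result->rec, (size_t)result->len * sizeof(int));
    return passes;
}
```

On the 15-record input with run size 3, `binary_sort_merge` performs 3 merges, the same three merges shown in the example.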


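Other Sort/Merge Algorithms

● Balanced Binary Sort/Merge

✔ Number of input files = number of output files = 2

✔ No redistribution step is needed, but 4 files are required

● Balanced k-way Sort/Merge

✔ The balanced version of k-way Sort/Merge

✔ Merges runs from k input files at the same time, writing to k output files

✔ How large should k be?

● Polyphase Sort/Merge

✔ Number of input files ≠ number of output files (e.g. 2 input files, 1 output file)

✔ Each merge phase empties one input file, which becomes the output file of the next phase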


Ex: Balanced Binary Sort/Merge

● Input File: 50 110 95 10 100 36 153 40 120 60 70 130 22 140 80

● After the Sorting Phase

✔ File 1: (50 95 110) (40 120 153) (22 80 140)

✔ File 2: (10 36 100) (60 70 130)

● After the first merge

✔ File 3: (10 36 50 95 100 110) (22 80 140)

✔ File 4: (40 60 70 120 130 153)

● After the second merge

✔ File 1: (10 36 40 50 60 70 95 100 110 120 130 153)

✔ File 2: (22 80 140)

● After the third merge

✔ File 3: (10 22 36 40 50 60 70 80 95 100 110 120 130 140 153)


Ex: Balanced k-way Sort/Merge (k = 3)

● Input File: 50 110 95 10 100 36 153 40 120 60 70 130 22 140 80

● After the Sorting Phase

✔ File 1: (50 95 110) (60 70 130)

✔ File 2: (10 36 100) (22 80 140)

✔ File 3: (40 120 153)

● After the first merge

✔ File 4: (10 36 40 50 95 100 110 120 153)

✔ File 5: (22 60 70 80 130 140)

✔ File 6: (empty)

● After the second merge

✔ File 1: (10 22 36 40 50 60 70 80 95 100 110 120 130 140 153)
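The core of each k-way merge is choosing the smallest head among k runs. Below is a minimal sketch that does this by linear search (the strategy whose cost n(k − 1) per pass appears in the cost analysis of this chapter), with a counter for key comparisons; `kway_merge_linear` and the `MAX_K` limit are hypothetical.

```c
#include <stddef.h>

#define MAX_K 16   /* hypothetical upper bound on k for this sketch */

/* Merge k sorted runs (runs[i] has lens[i] records) into out, finding the
 * smallest head by linear search. *cmps receives the number of key
 * comparisons. Returns the number of records written, or 0 if k > MAX_K. */
size_t kway_merge_linear(const int *runs[], const size_t lens[], int k,
                         int *out, long *cmps)
{
    size_t pos[MAX_K] = {0};   /* next unread record of each run */
    size_t n = 0;

    *cmps = 0;
    if (k > MAX_K) return 0;
    for (;;) {
        int best = -1;
        for (int i = 0; i < k; i++) {
            if (pos[i] == lens[i]) continue;           /* run exhausted */
            if (best < 0) { best = i; continue; }
            (*cmps)++;
            if (runs[i][pos[i]] < runs[best][pos[best]]) best = i;
        }
        if (best < 0) break;                           /* all runs exhausted */
        out[n++] = runs[best][pos[best]++];
    }
    return n;
}
```

Merging the first runs of files 1 to 3 produces File 4 above with 13 comparisons, within the bound n(k − 1) = 9 × 2 = 18 for one pass; fewer are needed once some runs are exhausted.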


Ex: Polyphase Sort/Merge

● After the Sorting Phase

✔ File 1: (50 95 110) (40 120 153) (22 80 140)

✔ File 2: (10 36 100) (60 70 130)

● After the first merge

✔ File 1: (22 80 140)

✔ File 2: (empty)

✔ File 3: (10 36 50 95 100 110) (40 60 70 120 130 153)

● After the second merge

✔ File 1: (empty)

✔ File 2: (10 22 36 50 80 95 100 110 140)

✔ File 3: (40 60 70 120 130 153)

● After the third merge

✔ File 1: (10 22 36 40 50 60 70 80 95 100 110 120 130 140 153)
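Only run counts matter when tracing how a polyphase merge proceeds. The sketch below, with a hypothetical `polyphase_phases` function, tracks how many runs each of the 3 files holds; each phase merges as many runs as the smaller non-empty input file holds into the empty file. It assumes no dummy runs are used, so some starting distributions stall with several runs left on a single file.

```c
/* Count the merge phases of a 3-file polyphase merge that starts with
 * a runs on file 1, b runs on file 2 and file 3 empty. Returns -1 if the
 * merge stalls (all remaining runs on one file), since this sketch does
 * not add dummy runs. */
int polyphase_phases(int a, int b)
{
    int t[3] = {a, b, 0};
    int phases = 0;

    while (t[0] + t[1] + t[2] > 1) {
        int out = -1, in1 = -1, in2 = -1;
        for (int i = 0; i < 3; i++) {
            if (t[i] == 0) { if (out < 0) out = i; }
            else if (in1 < 0) in1 = i;
            else in2 = i;
        }
        if (out < 0 || in2 < 0)
            return -1;                  /* stalled: no pair of inputs */
        int m = (t[in1] < t[in2]) ? t[in1] : t[in2];
        t[in1] -= m;                    /* m runs consumed from each input */
        t[in2] -= m;
        t[out] += m;                    /* ... producing m merged runs */
        phases++;
    }
    return phases;
}
```

Starting from 3 and 2 runs, as in the example, it reports 3 merge phases; starting from 2 and 2 it returns -1, because both inputs empty at the same time and two runs are left on one file.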


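Cost Analysis

● Binary Sort/Merge with R initial runs

✔ Number of levels = $\lceil \log_2 R \rceil + 1$

✔ Number of merge passes = $\lceil \log_2 R \rceil$

● k-way Sort/Merge with R initial runs

✔ Number of merge passes = $\lceil \log_k R \rceil$

✔ For every output record, the run with the smallest key among the k runs must be found (n = total number of records):

– Linear search: comparisons = $n (k - 1) \lceil \log_k R \rceil$

– Selection tree: comparisons = $n \log_2 k \lceil \log_k R \rceil \approx n \lceil \log_2 R \rceil$

– Selection Tree? ⇒ See Section 5.8 of HSF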
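The formulas can be checked numerically. In this sketch `merge_passes` computes $\lceil \log_k R \rceil$ with integer arithmetic (the smallest p with k^p ≥ R) to avoid floating-point rounding; the function names are hypothetical, and the selection-tree count assumes a tree depth of $\lceil \log_2 k \rceil$.

```c
/* ceil(log_k R) with integers: the smallest p such that k^p >= R.
 * Assumes k >= 2 and runs >= 1. */
int merge_passes(long runs, int k)
{
    int p = 0;
    for (long reach = 1; reach < runs; reach *= k)
        p++;
    return p;
}

/* Linear search: k - 1 comparisons per record in every pass. */
long cmp_linear(long n, int k, long runs)
{
    return n * (k - 1) * merge_passes(runs, k);
}

/* Selection tree: ceil(log2 k) comparisons per record in every pass. */
long cmp_tree(long n, int k, long runs)
{
    return n * merge_passes(k, 2) * merge_passes(runs, k);
}
```

For n = 1000 records, k = 8 and R = 64 runs, linear search needs 14,000 comparisons and a selection tree 6,000, which equals n⌈log₂ R⌉ here because k and R are powers of 2. `merge_passes(5, 2)` returns 3, matching the three merges of the Binary Sort/Merge example with 5 initial runs.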


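3. 2 Phase Multiway Merge/Sort

● Characteristics of 2PMM

✔ 2 Phase

– Sorting Phase + a single Merging Phase

– Each phase reads and writes every block of the data exactly once

✔ Multiway

– The Merging Phase merges all the sorted sub-lists at the same time

✔ The size of the file 2PMM can sort is limited by the memory size M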


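2PMM Algorithm

● Phase 1

✔ Fill main memory with records.

✔ Sort them using your favorite internal sort (e.g. Quick Sort).

✔ Write the sorted sub-list to a file.

✔ Repeat until every record has been put into one of the sorted sub-lists.

● Phase 2

✔ Read the first block of each sorted sub-list into its own input buffer.

✔ Find the smallest key among the first remaining records of all the input buffers, and move that record to the output buffer.

✔ When the output buffer is full, write it to disk and empty it.

✔ When an input buffer is exhausted, refill it with the next block of the same sub-list; if that sub-list has no blocks left, leave the buffer empty.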
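The two phases can be simulated in memory to observe the block traffic of Phase 2. This is a sketch under stated assumptions: sizes are counted in records, `mem` records fit in memory and a block holds `blk` records, the "disk" is a static array, and `two_phase_mms` and its limits are hypothetical. It refuses inputs that produce more sub-lists than the mem / blk − 1 input buffers Phase 2 can hold.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_N    1024   /* hypothetical limits for this sketch */
#define MAX_LIST 64

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Writes the sorted data to out and returns the number of block reads and
 * writes in Phase 2, or -1 if Phase 2 lacks buffers for every sub-list. */
long two_phase_mms(const int *data, int n, int mem, int blk, int *out)
{
    static int disk[MAX_N];           /* the sorted sub-lists "on disk" */
    int pos[MAX_LIST];                /* next record of each sub-list */
    int buf_end[MAX_LIST];            /* end of the block in its buffer */
    int end[MAX_LIST];                /* end of each sub-list */
    int k = (n + mem - 1) / mem;      /* number of sorted sub-lists */
    int out_fill = 0;                 /* records in the output buffer */
    long ios = 0;

    /* Phase 2 needs one input block per sub-list plus one output block. */
    if (n > MAX_N || k > MAX_LIST || k > mem / blk - 1)
        return -1;

    /* Phase 1: fill memory, sort it, write the sorted sub-list back. */
    memcpy(disk, data, (size_t)n * sizeof(int));
    for (int s = 0; s < k; s++) {
        pos[s] = buf_end[s] = s * mem;
        end[s] = (s == k - 1) ? n : s * mem + mem;
        qsort(disk + pos[s], (size_t)(end[s] - pos[s]), sizeof(int), cmp_int);
    }

    /* Phase 2: repeatedly move the smallest buffered record to output. */
    for (int written = 0; written < n; written++) {
        int best = -1;
        for (int s = 0; s < k; s++) {
            if (pos[s] == buf_end[s] && pos[s] < end[s]) {
                /* input buffer empty: read the next block of this sub-list */
                buf_end[s] = (pos[s] + blk < end[s]) ? pos[s] + blk : end[s];
                ios++;
            }
            if (pos[s] < buf_end[s] && (best < 0 || disk[pos[s]] < disk[pos[best]]))
                best = s;
        }
        out[written] = disk[pos[best]++];
        if (++out_fill == blk) { ios++; out_fill = 0; }   /* flush full block */
    }
    if (out_fill > 0) ios++;          /* flush the last, partial block */
    return ios;
}
```

With 15 records, `mem = 8` and `blk = 2`, Phase 2 reads 8 blocks and writes 8 blocks (16 I/Os), so every block is read and written once; with `mem = 6` the 3 sub-lists exceed the 2 available input buffers and the call returns -1.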


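Discussion of 2PMM

● Analysis of Naive Implementation

✔ Assume blocks are stored at random, so the average access time is about 15 ms.

✔ The file is stored on 250,000 blocks, and each block is read and written once in each phase.

✔ 250,000 blocks × 2 (read and write) × 2 phases = 1,000,000 disk I/Os; 1,000,000 × 15 ms = 15,000 sec, a little over 4 hours.

● How many records can you sort with 2PMMS?

✔ (M / R)((M / B) − 1), where M is the main-memory size, R the record size, and B the block size

– Phase 1 produces sorted sub-lists of M / R records each; Phase 2 can merge at most M / B − 1 of them (one input block each, plus one output block)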
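Both numbers on this slide are simple arithmetic. A sketch with hypothetical function names; M, R and B are in bytes, and integer division models whole records and whole blocks.

```c
/* Records 2PMMS can sort: sub-lists of M/R records, at most M/B - 1 of them. */
long long twopmms_capacity(long long M, long long R, long long B)
{
    return (M / R) * (M / B - 1);
}

/* Naive I/O time in seconds: each block is read and written once in each
 * of the two phases, i.e. 4 I/Os per block at ms_per_io milliseconds each. */
double twopmms_io_seconds(long long blocks, double ms_per_io)
{
    return 4.0 * (double)blocks * ms_per_io / 1000.0;
}
```

`twopmms_io_seconds(250000, 15.0)` reproduces the 15,000 s figure above.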


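4. Optimization

● Parallel I/O in k-way Sort/Merge

✔ Approach

– Number of buffers: 2k + 2 (double buffering for I/O: 2 for each of the k input runs plus 2 for output)

– Overlap input, output, and merging

✔ Fixed Buffer Allocation

– Assign 2 buffers to each run

– May use the buffers inefficiently

✔ Dynamic Buffer Allocation

– Assign buffers to runs as they are needed (using a selection tree)

– Algorithm: see HSF Section 7.11.3

✔ Can parallel I/O be applied to 2PMM?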


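Optimization (continued)

● Run Generation

✔ Goals

– Generate runs longer than the memory size

– Reduce the number of merge passes

✔ Algorithm

– Double buffering for input and output (2 pages each)

– Organize the records in memory as a selection tree

– Selection tree size = (M – 4) * rec_per_page, where M is the number of memory pages

– Once the tree is full, output the record with the smallest key and replace it with the next input record

– A new record whose key is smaller than the last key output cannot join the current run, so it is tagged with the next run number

– See HSF Section 7.11.4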
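A minimal sketch of this run-generation scheme (commonly called replacement selection), using a binary min-heap in place of the selection tree and ignoring the I/O buffers; `cap` stands in for (M – 4) * rec_per_page, and all names are hypothetical. Heap entries are ordered by (run number, key), so records tagged for the next run stay behind every record of the current run.

```c
#include <stddef.h>

#define MAX_HEAP 64   /* hypothetical memory capacity limit, in records */

typedef struct { int run; int key; } Slot;

/* Heap order: lower run number first, then smaller key. */
static int slot_less(Slot a, Slot b)
{
    return (a.run != b.run) ? (a.run < b.run) : (a.key < b.key);
}

static void sift_down(Slot *h, size_t n, size_t i)
{
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < n && slot_less(h[l], h[m])) m = l;
        if (r < n && slot_less(h[r], h[m])) m = r;
        if (m == i) return;
        Slot t = h[i]; h[i] = h[m]; h[m] = t;
        i = m;
    }
}

/* Emits every input key in output order with its run number; returns the
 * number of runs produced. cap = records that fit in memory. */
int replacement_selection(const int *in, size_t n, size_t cap,
                          int *out_key, int *out_run)
{
    Slot heap[MAX_HEAP];
    size_t hn = 0, next = 0, k = 0;
    int runs = 0;

    if (cap > MAX_HEAP) cap = MAX_HEAP;
    while (hn < cap && next < n) {          /* fill memory: all in run 1 */
        heap[hn].run = 1;
        heap[hn].key = in[next++];
        hn++;
    }
    for (size_t i = hn / 2; i-- > 0; )
        sift_down(heap, hn, i);

    while (hn > 0) {
        Slot top = heap[0];                 /* smallest (run, key) in memory */
        out_key[k] = top.key;
        out_run[k] = top.run;
        k++;
        runs = top.run;
        if (next < n) {
            int key = in[next++];
            /* a key below the one just written must wait for the next run */
            heap[0].run = (key >= top.key) ? top.run : top.run + 1;
            heap[0].key = key;
        } else {
            heap[0] = heap[--hn];           /* input exhausted: shrink heap */
        }
        sift_down(heap, hn, 0);
    }
    return runs;
}
```

With room for 3 records, the 15-record input used in the earlier examples yields 3 runs of 5, 8 and 2 records, whereas cutting it into fixed 3-record runs gives 5 runs.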


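Optimization (continued)

● Optimal Merging of Runs

✔ Run generation produces runs of different lengths

✔ The total merge cost depends on the order in which the runs are merged

✔ Choose the merge tree with the minimum weighted external path length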
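To see why the merge order matters, the sketch below computes the cost (records written) of merging runs one after another in a given order; this equals the weighted external path length of that list-shaped merge tree. The function name is hypothetical.

```c
/* Cost of ((len[0] + len[1]) + len[2]) + ...: every merge writes all the
 * records of the run it produces. */
long sequential_merge_cost(const int *len, int n)
{
    long acc = (n > 0) ? len[0] : 0, cost = 0;
    for (int i = 1; i < n; i++) {
        acc += len[i];    /* merge the accumulated run with run i */
        cost += acc;      /* records written by this merge */
    }
    return cost;
}
```

For runs of lengths 2, 3, 5, 7, 9, 13 the cost is 97 in ascending order and 161 in descending order.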


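Optimal Merging: Huffman Tree

#include <stdio.h>    /* fprintf */
#include <stdlib.h>   /* malloc, exit */

typedef struct tree_node *tree_pointer;
typedef struct tree_node {
    int weight;                            /* run length in records */
    tree_pointer left_child, right_child;
} tree_node;

/* Min-heap helpers from the text (HSF); signatures assumed from the calls */
void initialize(tree_pointer heap[], int n);
tree_pointer least(tree_pointer heap[], int size);   /* remove and return min */
void insert(tree_pointer heap[], int size, tree_pointer item);

void huffman(tree_pointer heap[], int n)
{
    /* heap is a list of n single node binary trees */
    int i;
    tree_pointer tree;

    initialize(heap, n);                   /* initialize min heap */
    for (i = 1; i < n; i++) {              /* n - 1 merges */
        tree = (tree_pointer) malloc(sizeof(tree_node));
        if (tree == NULL) { fprintf(stderr, "out of memory\n"); exit(1); }
        tree->left_child = least(heap, n - i + 1);   /* smallest run */
        tree->right_child = least(heap, n - i);      /* next smallest run */
        tree->weight = tree->left_child->weight + tree->right_child->weight;
        insert(heap, n - i - 1, tree);     /* put the merged run back */
    }
}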


Construction of a Huffman Tree

● Runs: 2, 3, 5, 7, 9, 13

✔ (a) Merge 2 and 3 → 5

✔ (b) Merge 5 and 5 → 10

✔ (c) Merge 7 and 9 → 16

✔ (d) Merge 10 and 13 → 23

✔ (e) Merge 16 and 23 → 39
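The cost of a Huffman merge tree can be computed without building the tree. This sketch repeatedly merges the two shortest runs (Huffman's greedy rule) using a linear scan instead of the min heap used by `huffman()`; `huffman_merge_cost` and `MAX_RUNS` are hypothetical.

```c
#define MAX_RUNS 64   /* hypothetical limit for this sketch */

/* Total records written when the two shortest runs are always merged first;
 * equals the weighted external path length of the Huffman tree.
 * Returns -1 if n exceeds MAX_RUNS. */
long huffman_merge_cost(const int *len, int n)
{
    long w[MAX_RUNS];
    long cost = 0;

    if (n > MAX_RUNS) return -1;
    for (int i = 0; i < n; i++) w[i] = len[i];
    while (n > 1) {
        /* find the indices a and b of the two smallest weights */
        int a = 0, b = 1;
        if (w[b] < w[a]) { a = 1; b = 0; }
        for (int i = 2; i < n; i++) {
            if (w[i] < w[a]) { b = a; a = i; }
            else if (w[i] < w[b]) b = i;
        }
        long merged = w[a] + w[b];
        cost += merged;      /* records written by this merge */
        w[a] = merged;       /* the merged run takes slot a */
        w[b] = w[--n];       /* slot b is refilled from the end */
    }
    return cost;
}
```

For runs 2, 3, 5, 7, 9, 13 it returns 93 = 5 + 10 + 16 + 23 + 39, the sum of the internal nodes built in steps (a) to (e).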