algorithmsesslab.hanyang.ac.kr/uploads/algorithm_2018_2/lecture... · 2018-11-05 · greedy...
Content
• 16.1 An activity selection problem
• 16.3 Huffman Codes
Greedy Algorithms
• Greedy Algorithms
– A greedy algorithm always makes the choice that looks best at the moment.
– It makes a locally optimal choice in the hope that this choice will lead to a globally optimal solution.
An activity selection problem
• An activity selection problem
– To select a maximum-size subset of mutually compatible activities.
• For example
– n classes, 1 lecture room
– To select a maximum number of classes
– Set of activities S = {a1, a2, ..., an}
– Each activity ai has a start time si and a finish time fi, where 0 ≤ si < fi < ∞
– ai takes place during the half-open interval [si, fi)
– ai and aj are compatible if the intervals [si, fi) and [sj, fj) do not overlap, that is, if si ≥ fj or sj ≥ fi.
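The compatibility condition can be sketched in Python as follows. This is only an illustration: the function names `compatible` and `mutually_compatible` and the representation of an activity as a `(start, finish)` pair are assumptions, not part of the slides.

```python
from itertools import combinations


def compatible(a, b):
    """Return True if the half-open intervals a = (s, f) and b = (s, f) do not overlap."""
    (si, fi), (sj, fj) = a, b
    return si >= fj or sj >= fi


def mutually_compatible(activities):
    """Return True if every pair of activities in the collection is compatible."""
    return all(compatible(p, q) for p, q in combinations(activities, 2))


# Because the intervals are half-open, an activity may start exactly
# when another one finishes.
print(compatible((1, 4), (4, 6)))   # True
print(compatible((1, 4), (3, 5)))   # False
```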
– {a3, a9, a11} : consists of mutually compatible activities, but it is not a maximum-size subset.
– {a1, a4, a8, a11} : a largest subset of mutually compatible activities
– {a2, a4, a9, a11} : another largest subset
i 1 2 3 4 5 6 7 8 9 10 11
si 1 3 0 5 3 5 6 8 8 2 12
fi 4 5 6 7 8 9 10 11 12 13 14
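• Subproblem (optimal substructure)
• Sij is the subset of activities in S that can start after ai finishes and finish before aj starts:
$$S_{ij} = \{a_k \in S : f_i \le s_k < f_k \le s_j\}$$
• Add fictitious activities a0 and an+1, and adopt the conventions that f0 = 0 and sn+1 = ∞.
• Then S = S0,n+1, and the ranges for i and j are 0 ≤ i, j ≤ n + 1.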
• Let c[i, j] be the number of activities in a maximum-size subset of mutually compatible activities in Sij.
– We have c[i, j] = 0 whenever Sij = Ø; in particular, c[i, j] = 0 for i ≥ j.
– If ak is used in a maximum-size subset of mutually compatible activities of Sij, we also use maximum-size subsets of mutually compatible activities for the subproblems Sik and Skj.
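• Recurrence for c[i, j]:
$$c[i,j] = \begin{cases} 0 & \text{if } S_{ij} = \emptyset \\ \max\limits_{i<k<j,\ a_k \in S_{ij}} \{\, c[i,k] + c[k,j] + 1 \,\} & \text{if } S_{ij} \ne \emptyset \end{cases}$$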
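The recurrence for c[i, j] can be evaluated directly with memoization. The sketch below is one possible reading, not the slides' own code: the name `max_compatible_dp` is hypothetical, and it assumes the activities are already sorted by finish time, as in the example table of these slides, with the sentinels a0 and an+1 added as described.

```python
from functools import lru_cache

# Start and finish times of a1..a11 from the example table in these slides.
S = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
F = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]


def max_compatible_dp(s, f):
    """Evaluate c[0, n+1]; assumes activities are sorted by finish time."""
    n = len(s)
    inf = float("inf")
    # Sentinels: a0 with f0 = 0 and a(n+1) with s(n+1) = infinity.
    start = [0] + list(s) + [inf]
    finish = [0] + list(f) + [inf]

    @lru_cache(maxsize=None)
    def c(i, j):
        best = 0  # stays 0 when S_ij is empty
        for k in range(i + 1, j):
            # a_k is in S_ij when f_i <= s_k and f_k <= s_j.
            if finish[i] <= start[k] and finish[k] <= start[j]:
                best = max(best, c(i, k) + c(k, j) + 1)
        return best

    return c(0, n + 1)


print(max_compatible_dp(S, F))  # 4
```

The result agrees with the size of the largest subsets listed in the slides, such as {a1, a4, a8, a11}.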
Theorem 16.1
Consider any nonempty subproblem Sij, and let am be the activity in Sij with the earliest finish time:
fm = min {fk : ak ∈ Sij}.
Then
1. Activity am is used in some maximum-size subset of mutually compatible activities of Sij.
2. The subproblem Sim is empty, so that choosing am leaves the subproblem Smj as the only one that may be nonempty.
Proof.
We prove the second part first.
2. Suppose that Sim is nonempty, so that there is some activity ak such that fi ≤ sk < fk ≤ sm < fm.
Then ak is also in Sij and it has an earlier finish time than am, which contradicts our choice of am.
We conclude that Sim is empty.
1. Let Aij be a maximum-size subset of mutually compatible activities of Sij, and order the activities in Aij in monotonically increasing order of finish time.
Let ak be the first activity in Aij.
– If ak = am
• We are done: am is used in some maximum-size subset of mutually compatible activities of Sij.
– If ak ≠ am
• Construct the subset A′ij = Aij − {ak} ∪ {am}.
• The activities in A′ij are disjoint, since the activities in Aij are, ak is the first activity in Aij to finish, and fm ≤ fk.
• A′ij has the same number of activities as Aij.
• A′ij is a maximum-size subset of mutually compatible activities of Sij that includes am.
• Optimal solution
– Take the activity with the earliest finish time among those compatible with the activities already chosen, and repeat.
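The earliest-finish-time rule can be sketched as a single pass over the activities. This is a minimal illustration, assuming the activities are given in order of increasing finish time as in the example table; the function name `greedy_activity_selector` and the 1-based indices in the result are choices made here for readability.

```python
# Start and finish times of a1..a11 from the example table in these slides.
S = [1, 3, 0, 5, 3, 5, 6, 8, 8, 2, 12]
F = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]


def greedy_activity_selector(s, f):
    """Return 1-based indices of the chosen activities.

    Assumes the activities are sorted by finish time.
    """
    selected = []
    last_finish = 0  # finish time of the most recently chosen activity
    for i, (si, fi) in enumerate(zip(s, f), start=1):
        if si >= last_finish:  # compatible with everything chosen so far
            selected.append(i)
            last_finish = fi
    return selected


print(greedy_activity_selector(S, F))  # [1, 4, 8, 11]
```

After sorting, the pass itself takes time linear in the number of activities, and the output matches the largest subset {a1, a4, a8, a11} given earlier.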
• Huffman Codes
– A widely used and very effective technique for compressing data
– Savings of 20% to 90% are typical, depending on the characteristics of the data being compressed.
– Uses character frequencies
– For example
• To represent 100,000 characters drawn from 6 characters (a, b, c, d, e, f)
• Fixed-length code : 300,000 bits
• Variable-length code : (45·1+13·3+12·3+16·3+9·4+5·4)·1000 = 224,000 bits
• A savings of approximately 25%

                           a     b     c     d     e     f
Frequency (in thousands)   45    13    12    16    9     5
Fixed-length codeword      000   001   010   011   100   101
Variable-length codeword   0     101   100   111   1101  1100
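The two bit counts can be reproduced from the table. The sketch below is only a check of the arithmetic; the dictionary names and the helper `total_bits` are assumptions introduced for the example.

```python
# Frequencies in thousands, and both codes, copied from the table in these slides.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
FIXED = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "100", "f": "101"}
VARIABLE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def total_bits(freq_in_thousands, code):
    """Total number of bits needed to encode the whole file with the given code."""
    return 1000 * sum(freq_in_thousands[c] * len(code[c]) for c in freq_in_thousands)


fixed_bits = total_bits(FREQ, FIXED)
variable_bits = total_bits(FREQ, VARIABLE)
savings = 1 - variable_bits / fixed_bits

print(fixed_bits, variable_bits)  # 300000 224000
print(f"{savings:.1%}")           # 25.3%
```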
• Prefix codes
– Prefix code: no codeword is also a prefix of some other codeword.
• Encoding abc : 0·101·100 = 0101100
• Decoding 001011101 : 0·0·101·1101 = aabe
• Easy decoding : use a binary tree for the code
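Encoding and decoding with the variable-length code from the table can be sketched as below. The slides decode by walking a binary tree; this illustration instead uses a dictionary lookup on the bits read so far, which gives the same result for a prefix code. The names `encode` and `decode` are assumptions.

```python
# Variable-length prefix code from the table in these slides.
CODE = {"a": "0", "b": "101", "c": "100", "d": "111", "e": "1101", "f": "1100"}


def encode(text, code):
    """Concatenate the codewords of the characters in text."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Decode a bit string; unambiguous because no codeword is a prefix of another."""
    inverse = {word: ch for ch, word in code.items()}
    decoded = []
    current = ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete codeword has been read
            decoded.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(decoded)


print(encode("abc", CODE))        # 0101100
print(decode("001011101", CODE))  # aabe
```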
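(Figure: trees for the two codes, with each leaf labeled by a character and its frequency and each internal node labeled by the sum of the frequencies of the leaves in its subtree. Left: the fixed-length code. Right: the optimal variable-length prefix code.)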
– Decoding problem
• a: 0, b: 01, c: 1
• 001: aac or ab
– A prefix code is required.
• 0 : left child
• 1 : right child
• For example :
– 0 = left = a
– 101 = right-left-right = b
(Figure: the optimal prefix code tree)
– The cost of a tree T
• each character c in the alphabet C
• frequency of c : f(c)
• length of the codeword for character c : dT(c)
$$B(T) = \sum_{c \in C} f(c)\, d_T(c) \qquad (16.4)$$
• Huffman code : an optimal prefix code
– An optimal prefix code is always represented by a full binary tree (every node is either a leaf or has two children).
– A full binary tree for alphabet C has |C| leaves and |C|-1 internal nodes.
• Building a Huffman tree
– Running time : O(n lg n)
– Algorithm
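A minimal Python sketch of the construction follows, assuming a binary min-heap (`heapq`) as the priority queue, which gives the O(n lg n) bound. It is not the slides' pseudocode: the names `huffman_tree` and `codewords`, the nested-tuple tree representation, and the tie-breaking counter are all choices made for this example.

```python
import heapq
from itertools import count

# Frequencies in thousands from the table in these slides.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}


def huffman_tree(freq):
    """Merge the two lowest-frequency nodes until one tree remains.

    A leaf is a character; an internal node is a (left, right) tuple.
    """
    tiebreak = count()  # unique counter so the heap never compares tree nodes
    heap = [(w, next(tiebreak), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    for _ in range(len(freq) - 1):
        wx, _, x = heapq.heappop(heap)  # lowest frequency
        wy, _, y = heapq.heappop(heap)  # second lowest frequency
        heapq.heappush(heap, (wx + wy, next(tiebreak), (x, y)))
    return heap[0][2]


def codewords(tree, prefix=""):
    """Read codewords off the tree: 0 for a left edge, 1 for a right edge."""
    if isinstance(tree, str):
        return {tree: prefix or "0"}
    left, right = tree
    codes = codewords(left, prefix + "0")
    codes.update(codewords(right, prefix + "1"))
    return codes


codes = codewords(huffman_tree(FREQ))
cost = sum(FREQ[c] * len(codes[c]) for c in FREQ)
print(codes)
print(cost)  # 224
```

For these frequencies the sketch yields the same codewords as the variable-length row of the table. With ties in the frequencies, a different but equally cheap code may result.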
– Example : the steps of building the Huffman tree for the frequencies above. The queue is kept sorted by frequency; each step merges the two lowest-frequency nodes, labeling the edge to the left child 0 and the edge to the right child 1.
• Initial queue : f:5 e:9 c:12 b:13 d:16 a:45
• Merge f:5 and e:9 into 14 : c:12 b:13 14 d:16 a:45
• Merge c:12 and b:13 into 25 : 14 d:16 25 a:45
• Merge 14 and d:16 into 30 : 25 30 a:45
• Merge 25 and 30 into 55 : a:45 55
• Merge a:45 and 55 into 100 : the final tree
• Correctness
Lemma 16.2
Let C be an alphabet in which each character c in C has frequency f [c].
Let x and y be two characters in C having the lowest frequencies.
Then there exists an optimal prefix code for C in which the codewords for x and y have the same length and differ only in the last bit.
Proof.
Idea : take an arbitrary optimal prefix code tree T.
Modify it to make a tree representing another optimal prefix code such that the characters x and y appear as sibling leaves of maximum depth in the new tree.
Then their codewords will have the same length and differ only in the last bit.
(Figure: the trees T, T′, and T′′ used in the proof, showing the positions of the leaves a, b, x, and y in each.)
• Let a and b be two characters that are sibling leaves of maximum depth in T.
• Assume without loss of generality that f[a] ≤ f[b] and f[x] ≤ f[y].
• f[x] and f[y] are the two lowest leaf frequencies, in order
• f[a] and f[b] are two arbitrary frequencies, in order
• Hence f[x] ≤ f[a] and f[y] ≤ f[b].
• Exchange the positions of a and x in T to produce T′, and then exchange the positions of b and y in T′ to produce T′′.
• By equation (16.4), the difference in cost between T and T′ is
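$$\begin{aligned}
B(T) - B(T') &= \sum_{c \in C} f(c)\, d_T(c) - \sum_{c \in C} f(c)\, d_{T'}(c) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_{T'}(x) - f[a]\, d_{T'}(a) \\
&= f[x]\, d_T(x) + f[a]\, d_T(a) - f[x]\, d_T(a) - f[a]\, d_T(x) \\
&= (f[a] - f[x])(d_T(a) - d_T(x)) \\
&\ge 0
\end{aligned}$$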
– f[a] − f[x] and dT(a) − dT(x) are nonnegative, because x is a minimum-frequency leaf and a is a leaf of maximum depth in T.
– Similarly, exchanging y and b does not increase the cost, so B(T′) − B(T′′) is nonnegative. Therefore, B(T′′) ≤ B(T), and since T is optimal, B(T) ≤ B(T′′), which implies B(T′′) = B(T).
– Thus, T′′ is an optimal tree in which x and y appear as sibling leaves of maximum depth, from which the lemma follows.
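The cost difference produced by one exchange can be checked numerically. The sketch below is only an illustration of the identity B(T) − B(T′) = (f[a] − f[x])(dT(a) − dT(x)): the starting tree is a hypothetical, deliberately non-optimal one (so the difference is visibly positive), described only by its leaf depths, and the names `cost`, `swap_leaves`, and `deep` are assumptions. Here `deep` plays the role of the lemma's leaf a.

```python
# Frequencies in thousands from the example in these slides.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}


def cost(freq, depth):
    """B(T): sum of frequency times leaf depth over all characters."""
    return sum(freq[c] * depth[c] for c in freq)


def swap_leaves(depth, p, q):
    """Return the leaf depths after exchanging the positions of leaves p and q."""
    swapped = dict(depth)
    swapped[p], swapped[q] = depth[q], depth[p]
    return swapped


# Hypothetical tree T: the lowest-frequency character "f" sits at depth 1,
# while the high-frequency character "a" is a leaf of maximum depth.
depth_T = {"f": 1, "b": 3, "c": 3, "d": 3, "e": 4, "a": 4}
x, deep = "f", "a"

depth_T1 = swap_leaves(depth_T, x, deep)  # the tree T' after the exchange
difference = cost(FREQ, depth_T) - cost(FREQ, depth_T1)
predicted = (FREQ[deep] - FREQ[x]) * (depth_T[deep] - depth_T[x])

print(cost(FREQ, depth_T), cost(FREQ, depth_T1))  # 344 224
print(difference, predicted)                      # 120 120
```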
Lemma 16.3
Let C be a given alphabet with frequency f[c] defined for each character c∈C.
Let x and y be two characters in C with minimum frequency.
Let C′ be the alphabet C with characters x, y removed and character z added, so that C′ = C − {x, y} ∪ {z}; define f for C′ as for C, except that f[z] = f[x] + f[y].
Let T′ be any tree representing an optimal prefix code for the alphabet C′.
Then the tree T, obtained from T′ by replacing the leaf node for z with an internal node having x and y as children, represents an optimal prefix code for the alphabet C.
Proof.
For each c ∈ C − {x, y}, we have dT(c) = dT′(c), and hence f[c]dT(c) = f[c]dT′(c). Since dT(x) = dT(y) = dT′(z) + 1, we have
$$\begin{aligned}
f[x]\, d_T(x) + f[y]\, d_T(y) &= (f[x] + f[y])(d_{T'}(z) + 1) \\
&= f[z]\, d_{T'}(z) + (f[x] + f[y]),
\end{aligned}$$
from which we conclude that B(T) = B(T′) + f[x] + f[y]
or, equivalently, B(T′) = B(T) − f[x] − f[y].
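The relation B(T) = B(T′) + f[x] + f[y] can be checked on the running example, taking x = f and y = e (the two lowest frequencies) and merging them into a new character z. This is a sketch under stated assumptions: the trees are represented only by their leaf depths, the depths of T′ are read off the optimal tree of the example with the parent of f and e turned into the leaf z, and all variable names are illustrative.

```python
# Frequencies in thousands from the example in these slides.
FREQ = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
x, y, z = "f", "e", "z"

# Alphabet C' = C - {x, y} + {z}, with f[z] = f[x] + f[y].
freq_prime = {c: w for c, w in FREQ.items() if c not in (x, y)}
freq_prime[z] = FREQ[x] + FREQ[y]

# Leaf depths of T' (assumed: the example's optimal tree with z as a leaf).
depth_T_prime = {"a": 1, "b": 3, "c": 3, "d": 3, "z": 3}

# T: replace the leaf z by an internal node with children x and y.
depth_T = {c: d for c, d in depth_T_prime.items() if c != z}
depth_T[x] = depth_T[y] = depth_T_prime[z] + 1


def cost(freq, depth):
    """B(T): sum of frequency times leaf depth over all characters."""
    return sum(freq[c] * depth[c] for c in freq)


cost_T = cost(FREQ, depth_T)
cost_T_prime = cost(freq_prime, depth_T_prime)
print(cost_T_prime, cost_T)  # 210 224
print(cost_T == cost_T_prime + FREQ[x] + FREQ[y])  # True
```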
Suppose that T does not represent an optimal prefix code for C.
• Then there exists a tree T′′ such that B(T′′) < B(T).
• Without loss of generality (by Lemma 16.2), T′′ has x and y as siblings.
• Let T′′′ be the tree T′′ with the common parent of x and y replaced by a leaf z with frequency f[z] = f[x] + f[y].
B(T′′′) = B(T′′) − f[x] − f[y] < B(T) − f[x] − f[y] = B(T′)
This contradicts the assumption that T′ represents an optimal prefix code for C′.
– T must represent an optimal prefix code for the alphabet C.
Theorem 16.4
Procedure HUFFMAN produces an optimal prefix code.
Proof.
– Immediate from Lemmas 16.2 and 16.3.