the problem solving process - mcmaster …cs2md3/lecturenotes.doc · web viewmultiply matrices a...
THE PROBLEM SOLVING PROCESS
MATHEMATICAL MODEL, INFORMAL ALGORITHM
→ ABSTRACT DATA TYPES, PSEUDO-LANGUAGE PROGRAM OR OTHER FORMAL DESCRIPTION
→ DATA STRUCTURES, PROGRAM (Pascal, C, C++, etc.)
DATA TYPE VERSUS ABSTRACT DATA TYPE
DATA TYPE: Set of values (or objects).
ABSTRACT DATA TYPE (ADT): Set of objects + a mathematical model with a collection of operations defined on the model.
DATA TYPE: Set of values (or objects). Examples by language:
Fortran 77:
Basic types: INTEGER, REAL, CHARACTER/STRING, LOGICAL
Composite types: array of integers, array of reals, etc.
Pascal:
Basic types: integer, real, character, Boolean
Composite types: array of integers, array of characters, etc.; record of integers/reals/characters, etc.; set of …; file of …
THERE ARE OPERATIONS ASSOCIATED WITH EACH TYPE
AGGREGATING TOOLS: array, record, file
C:
Basic types: int, float, char
Composite types: arrays, structures
WHAT ABOUT POINTERS?
A pointer can be treated as a data type, but usually it is treated as a DATA STRUCTURING FACILITY.
ABSTRACT DATA TYPE (ADT)
Set of objects plus a mathematical model with a collection of operations defined on the model.
Example: List a1, a2, …, an
LIST (of integers) is an ADT with the following operations:
1. Calculate the length of the list
2. Get the first member of the list and return null if empty
3. Retrieve the member at position P and return null if P doesn't exist
4. Locate X in the list
5. Insert X into the list at position P
6. Delete the member at position P
P = 1, 2, 3, 4, 5, 6, 7, 8
L = 50, 60, 23, 47, 21, 39, 60, 40
1. LENGTH (L) = 8
2. FIRST (L) = 50
3. RETRIEVE (4, L) = 47; RETRIEVE (9, L) = null
4. LOCATE (60, L) = 2
5. INSERT (30, 5, L) gives the result: L = 50, 60, 23, 47, 30, 21, 39, 60, 40
6. DELETE (3, L) then gives the result: L = 50, 60, 47, 30, 21, 39, 60, 40
ALL OPERATIONS ARE ATOMIC EXCEPT FIRST, SINCE FIRST (L) = RETRIEVE (1, L)
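The six LIST operations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `IntList`, the method names, and the choice of `None` for "null" are not from the notes, and a Python list is used as the underlying store.

```python
class IntList:
    """Sketch of the LIST ADT with 1-based positions (names are illustrative)."""

    def __init__(self, items=()):
        self._items = list(items)

    def length(self):
        return len(self._items)

    def retrieve(self, p):
        # Returns None (the notes' "null") when position p does not exist.
        if 1 <= p <= len(self._items):
            return self._items[p - 1]
        return None

    def first(self):
        # Not atomic: defined through RETRIEVE, as the notes observe.
        return self.retrieve(1)

    def locate(self, x):
        # Position of the first occurrence of x, or None if x is absent.
        for p, item in enumerate(self._items, start=1):
            if item == x:
                return p
        return None

    def insert(self, x, p):
        # Partial operation: only positions 1 .. length+1 are valid.
        if not 1 <= p <= len(self._items) + 1:
            raise IndexError("position out of range")
        self._items.insert(p - 1, x)

    def delete(self, p):
        if not 1 <= p <= len(self._items):
            raise IndexError("position out of range")
        del self._items[p - 1]

    def to_list(self):
        return list(self._items)
```

Running the operations on L = 50, 60, 23, 47, 21, 39, 60, 40 in the order listed reproduces the results of the example, including the partial behaviour of INSERT and DELETE on invalid positions.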
EXAMPLE: ADT STACK (OF INTEGERS)
1. Retrieve the top element
2. Delete the top element (POP)
3. Insert x at the top (PUSH)
4. Test if the stack is empty
S = 27, 40, 32 (top element first); each operation below is applied to this S
1. TOP (S) = 27
2. POP (S) results in S = 40, 32
3. PUSH (0, S) results in S = 0, 27, 40, 32
4. EMPTY (S) = false
EXAMPLE: ADT MATRIX (OF REALS)
1. Return number of rows
2. Return number of columns
3. Multiply matrices A and B
4. Add A and B
5. Compute the transpose of matrix A
6. Delete a row/column
7. Add a row/column
8. Multiply matrix A by a real number (e.g., 6)
OBSERVATIONS:
1. Domain of an operation may involve more than one ADT type
2. Some operations are partial
3. Range of an operation may be a different ADT
A simple application of ADTs: evaluation of arithmetic expressions
a + b*c/d**e + f
Algorithm: Value (x : expression);
  oprnd: STACK OF REALS
  optor: STACK OF CHARS
  x1, x2 : REAL
  i: INTEGER
  Initialize oprnd and optor
  for i := 1 to LEN (x) do
    case x[i] of
      real: PUSH (x[i], oprnd)
      char: if TOP (optor) < x[i] then
              PUSH (x[i], optor)
            else
              repeat
                x2 := TOP (oprnd); POP (oprnd);
                x1 := TOP (oprnd); POP (oprnd);
                x1 := x1 TOP (optor) x2;
                PUSH (x1, oprnd);
                POP (optor)
              until TOP (optor) < x[i];
              PUSH (x[i], optor)
            end if
    endcase
  Value := TOP (oprnd)
Here TOP (optor) < x[i] compares the precedence of the two operators.
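The two-stack evaluation scheme can be sketched in Python as below. Several details are assumptions rather than part of the notes: the precedence table, the bottom-of-stack sentinel `"$"`, treating `**` as left-associative, and the explicit draining of the operator stack after the last token.

```python
import operator

# Assumed precedence table; "$" is a hypothetical bottom-of-stack sentinel
# with the lowest precedence, and "**" is treated as left-associative here.
PRECEDENCE = {"$": 0, "+": 1, "-": 1, "*": 2, "/": 2, "**": 3}
APPLY = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         "/": operator.truediv, "**": operator.pow}

def value(tokens):
    """Evaluate a list of numbers and operator strings, e.g. [1, "+", 2]."""
    oprnd = []        # stack of operands
    optor = ["$"]     # stack of operators, seeded with the sentinel

    def reduce_top():
        # Pop two operands, apply the top operator, push the result.
        x2 = oprnd.pop()
        x1 = oprnd.pop()
        oprnd.append(APPLY[optor.pop()](x1, x2))

    for tok in tokens:
        if isinstance(tok, (int, float)):
            oprnd.append(tok)
        else:
            # Reduce until the operator on top binds less tightly than tok.
            while PRECEDENCE[optor[-1]] >= PRECEDENCE[tok]:
                reduce_top()
            optor.append(tok)
    # Drain whatever operators remain once the input is exhausted.
    while optor[-1] != "$":
        reduce_top()
    return oprnd[-1]
```

For an expression of the same shape as a + b*c/d**e + f, for instance `value([2, "+", 3, "*", 4, "/", 2, "**", 2, "+", 1])`, the operands are combined in precedence order: the power first, then the product and quotient, then the sums.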
Comparison of ADT’s with procedure – the advantages
1. GENERALIZATION
Procedures are generalization of primitive operations (e.g. +, -, *,….)
ADT’s are generalizations of primitive data types.
2. ENCAPSULATION (OR MODULARITY)
A procedure encapsulates all the statements relevant to a certain aspect of a program.
An ADT encapsulates all the definitions and the operations relevant to a data type.
How to implement an ADT?
Note that a data structure doesn’t have to be associated with an ADT.
Data Structure: A collection of data objects connected in various ways.
A data structure is always associated with a specific programming language.
FORTRAN 77: the only data structuring facility is ARRAY
PASCAL: we have: ARRAY, RECORD, and POINTER
C: ARRAY, STRUCTURE, POINTER
Some important terms:
Cell: a box capable of holding a value drawn from some basic or composite data types (e.g. integer, record…)
CELL IS BASIC BUILDING BLOCK OF DATA STRUCTURE
Pointer: a cell whose value indicates another cell
Cursor: an integer-valued cell, used as a pointer to an array
Example: A simple data structure is given below. It may be used in the implementation of ADT MATRIX.
[Figure: cells holding the matrix elements a11, a22, …, linked by down and right pointers; A is a pointer to the first cell.]
type
  celltype = record
    element: real;
    down: ↑celltype;
    right: ↑celltype
  end
Example: The data structure below may be used in the implementation of ADT LIST.
L = 7.8, 1.2, 5.6, 3.4
A CURSOR to the first cell of the list: 4

cell   element   next
1      1.2       3
2      3.4       0
3      5.6       2
4      7.8       1

type
  recordtype = record
    element: real;
    next: integer   { a cursor }
  end
A cursor value of 0 (or -1) ≡ nil pointer.
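The cursor example above can be traced with a short Python sketch. The names `CELLS`, `HEAD`, and `traverse` are illustrative, and the sketch assumes that cursor 0 plays the role of nil and that the list starts at cell 4, as the cell table suggests.

```python
# Cell space indexed 1..4; index 0 is unused so that cursor 0 can mean "nil".
# Each cell is (element, next_cursor), copied from the table in the notes.
CELLS = [None, (1.2, 3), (3.4, 0), (5.6, 2), (7.8, 1)]
HEAD = 4  # cursor to the first cell of the list

def traverse(cells, head):
    """Follow next-cursors from head until the nil cursor 0 is reached."""
    out = []
    cur = head
    while cur != 0:
        element, nxt = cells[cur]
        out.append(element)
        cur = nxt
    return out
```

Following the cursors 4 → 1 → 3 → 2 → 0 visits the elements in list order, even though they are stored in a different order in the array.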
ALGORITHM VERSUS PROGRAM
An algorithm is a finite sequence of instructions satisfying the following criteria:
1) Definiteness: each instruction must be clear and unambiguous.
2) Finiteness: the algorithm terminates after a finite number of steps in all cases.
3) Effectiveness: each instruction can be performed using a finite amount of resources (time and space).
A (well-defined) program is in principle described similarly, but a program:
1) is always associated with a specific programming language;
2) may not halt (e.g., operating systems).
All the programs we are interested in halt. Pseudo-Pascal is our chosen language.
WE WILL USE ALGORITHM AND PROGRAM INTERCHANGEABLY
Examples:
proc search (x: integer; A: array [1..10] of integer)
  i := 1;
  while (i <= 10) and (x <> A[i]) do i := i + 1;
  search := i
end
proc print (S: set)
  print the elements of S
end
proc Pi
  print all the digits of Pi   { never ends }
end
The running time of a program depends on:
1) Computer speed
2) Compiler quality
3) Input to the program
4) Program efficiency (or quality)
The TIME COMPLEXITY of a program is defined as a function of the input, usually of the SIZE of the input.
Program A is of worst-case time complexity T(n) if the maximum running time of A on any input of size n is T(n).
THE UNITS OF T(n) ARE UNSPECIFIED.
Although the constants in T(n) are important, we are more interested in the growth rate (or order) of T(n).
e.g. $2n \approx 10n + 1000$
$2n \ll n^2$ when n is large
$f(n) \ll g(n) \leftrightarrow \lim_{n \to \infty} f(n)/g(n) = 0$
IMPORTANT DEFINITION
T(n) is $O(f(n))$ if there are constants C and $n_0$ such that
$T(n) \le C f(n)$ when $n \ge n_0$
Note: $O(f(n))$ actually denotes a class of functions of the same or slower growth rate, and it would be better to write
$T(n) \in O(f(n))$
Examples:
$3n^2 + 16n + 8 = O(n^2)$, with $T(n) = 3n^2 + 16n + 8$ and $f(n) = n^2$:
take C = 4, $n_0 = 17$; for $n \ge 17$, $3n^2 + 16n + 8 \le 4n^2$.
$n \log n = O(n^2)$:
for n > 0, $n \log n < n^2$, so C = 1, $n_0 = 1$.
$3n^3 - 6n^2 \ne O(n^2)$:
if n > 0 then $3n^3 - 6n^2 > Cn^2 \leftrightarrow 3n - 6 > C$. Hence if $n > (C + 6)/3$ then $3n^3 - 6n^2 > Cn^2$, i.e.
$\forall C\ \exists n_0\ \forall n > n_0:\ 3n^3 - 6n^2 > Cn^2$
In these examples "=" stands for "is", not "equals".
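The witness constants in these examples can be spot-checked numerically. The sketch below is only a finite check over a range of n, not a proof, and the helper names `bound_holds` and `first_violation` are illustrative.

```python
def bound_holds(T, f, C, n0, n_max=2000):
    """Check T(n) <= C*f(n) for every integer n0 <= n <= n_max (a finite spot check)."""
    return all(T(n) <= C * f(n) for n in range(n0, n_max + 1))

def first_violation(T, f, C, n_max=2000):
    """Smallest n in 1..n_max with T(n) > C*f(n), or None if there is none."""
    for n in range(1, n_max + 1):
        if T(n) > C * f(n):
            return n
    return None
```

With T(n) = 3n² + 16n + 8, f(n) = n² and C = 4, the bound holds from n = 17 but fails at n = 16, which is why 17 is the threshold. For T(n) = 3n³ − 6n² and any fixed C, `first_violation` finds an n just above (C + 6)/3, matching the argument that no constant works.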
$\sum_{i=0}^{k} a_i n^i = O(n^k)$ when $a_k > 0$
$10^6 = O(1) = O(2)$
$100n + 10^5 = O(n)$
$n^4 + n^2 + n + 6 = O(n^4)$
$2^n + n^{100} = O(2^n)$
$3^n \gg 2^n$, so $3^n \ne O(2^n)$
$\log_{10} n = O(\log_2 n)$ since $\log_{10} n = \log_2 n / \log_2 10$
$O(f(n))$ is an upper bound on the growth rate (order) of T(n) if $T(n) = O(f(n))$.
To specify a lower bound, we use Ω.
DEFINITION:
T(n) is $\Omega(f(n))$ if there is a constant C > 0 such that $T(n) \ge C f(n)$ infinitely often.
$\frac{1}{2}n + 100 = \Omega(n)$, with $T(n) = \frac{1}{2}n + 100$ and $f(n) = n$: take $C = \frac{1}{2}$; then $\frac{1}{2}n + 100 > Cn$.
T(n) = n if n is odd and $n \ge 1$; $T(n) = n^2/100$ if n is even and $n \ge 0$.
Then $T(n) = \Omega(n^2)$: take C = 1/100; $T(n) \ge Cn^2$ for n = 0, 2, 4, 6, …
WHY IS IT IMPORTANT?
[Figure: running times of 4 programs, $T(n) = 100n$, $5n^2$, $n^3/2$ and $2^n$, plotted for n up to 20 against a time axis up to 3000 sec. Note: 1000 sec ≈ 17 minutes.]
HOW LARGE A PROBLEM CAN WE SOLVE?
SUPPOSE THAT WE NOW BUY A MACHINE THAT RUNS 10 TIMES FASTER AT NO ADDITIONAL COST. THEN FOR THE SAME COST WE CAN SPEND $10^4$ SECONDS ON A PROBLEM WHERE WE SPENT $10^3$ SECONDS BEFORE.

Running time T(n)   Max problem size for 10^3 sec   Max problem size for 10^4 sec   Increase in max problem size
100n                10                              100                             1000%
5n^2                14                              45                              320%
n^3/2               12                              27                              230%
2^n                 10                              13                              130%

THE $O(2^n)$ PROGRAMS CAN SOLVE ONLY SMALL PROBLEMS NO MATTER HOW FAST THE UNDERLYING COMPUTER IS.
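The table entries can be recomputed with a small Python sketch. The function name `max_problem_size` is illustrative, and it assumes T is increasing and that T(n) is measured in seconds.

```python
def max_problem_size(T, budget):
    """Largest n >= 0 with T(n) <= budget, found by linear search (T is increasing)."""
    n = 0
    while T(n + 1) <= budget:
        n += 1
    return n
```

An exact integer search gives 44 rather than 45 for 5n² with 10⁴ seconds, and 9 rather than 10 for 2ⁿ with 10³ seconds, so the table appears to round those two entries; the qualitative conclusion is unchanged.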
THEOREM
IF $T_1(n) = O(f(n))$ AND $T_2(n) = O(g(n))$
THEN $T_1(n) + T_2(n) = O(\max(f(n), g(n)))$.
PROOF
THERE ARE $c_1, n_1, c_2, n_2$ SUCH THAT
$n \ge n_1 \Rightarrow T_1(n) \le c_1 f(n)$
$n \ge n_2 \Rightarrow T_2(n) \le c_2 g(n)$
LET $n_3 = \max(n_1, n_2)$. THEN
$n \ge n_3 \Rightarrow T_1(n) + T_2(n) \le c_1 f(n) + c_2 g(n) \le (c_1 + c_2) \max(f(n), g(n))$.
ENDPROOF
HENCE: $O(f(n)) + O(g(n)) = O(\max(f(n), g(n)))$
$O(n^2) + O(n^3) = O(n^3)$
$O(n^2) + O(2n^2) = O(2n^2) = O(n^2)$
THEOREM
IF $T_1(n) = O(f(n))$ AND $T_2(n) = O(g(n))$
THEN $T_1(n)\,T_2(n) = O(f(n)\,g(n))$
PROOF
THERE ARE $c_1, n_1, c_2, n_2$ SUCH THAT
$n \ge n_1 \Rightarrow T_1(n) \le c_1 f(n)$
$n \ge n_2 \Rightarrow T_2(n) \le c_2 g(n)$
LET $n_3 = \max(n_1, n_2)$. THEN
$n \ge n_3 \Rightarrow T_1(n)\,T_2(n) \le c_1 c_2 f(n)\,g(n)$
ENDPROOF
HENCE: $O(f(n))\,O(g(n)) = O(f(n)\,g(n))$
$O(n^2)\,O(n^5) = O(n^7)$
$O(n^2)\,O(2^n) = O(n^2 2^n) = O(2^{n + 2\log n})$
OTHER IMPLICATIONS
$\sum_{i=1}^{f(n)} O(g(i, n)) = O\left(\sum_{i=1}^{f(n)} g(i, n)\right)$
$\max(O(f(n)), O(g(n))) = O(\max(f(n), g(n)))$
$O(f(n)) = O(g(n))$ IS ASYMMETRIC! It means $O(f(n)) \subseteq O(g(n))$.
"=" MEANS "IS":
MY CAT IS BLACK ≠ BLACK IS MY CAT
O : FUNCTION → SET OF FUNCTIONS
e.g. $n^2 \in O(n^2)$ and $1000n^2 + 5 \in O(n^2)$
DEF: $O(f(n)) \circ O(g(n)) = O(f(n) \circ g(n))$ for any operator $\circ$ (+, ·, etc.)
OTHER USEFUL RULES
$f(n) = O(f(n))$
$C \cdot O(f(n)) = O(f(n))$
$O(f(n)) + O(f(n)) = O(f(n))$
$O(O(f(n))) = O(f(n))$
$O(f(n))\,O(g(n)) = O(f(n)\,g(n))$
$O(f(n)\,g(n)) = f(n)\,O(g(n))$
REMEMBER: "=" HERE IS ASYMMETRIC!
CALCULATING COMPLEXITIES OF ALGORITHMS
procedure bubble (var A: array [1..n] of integer);
{ BUBBLE SORT A INTO INCREASING ORDER }
var i, j, temp: integer;
begin
(1)  for i := 1 to n-1 do
(2)    for j := n downto i+1 do
(3)      if A[j-1] > A[j] then begin
           { swap A[j-1] and A[j] }
(4)        temp := A[j-1];
(5)        A[j-1] := A[j];
(6)        A[j] := temp
         end
end
(3) – (6) TAKES $O(1)$
(2) – (6) TAKES $(n-i)\,O(1) + O(1) = O(n-i)$
(1) – (6) TAKES:
$\sum_{i=1}^{n-1} [O(n-i) + O(1)] = O\left(\sum_{i=1}^{n-1} (n-i)\right) = O\left(\frac{n(n-1)}{2}\right)$
$= O\left(\frac{n^2 - n}{2}\right) = O(n^2)$
function test (m: integer): Boolean;
{ TESTS IF m IS A POWER OF 2, I.E. $m = 2^k$ FOR SOME k }
begin
(1)  if m = 1 then
(2)    test := true
     else
(3)    if (m mod 2 = 0) then
(4)      test := test (m div 2)
       else
(5)      test := false
end
LET T(m) = time complexity of test. The line costs are:
(1) → $C_1$
(2) → $C_2$
(3) → $C_1$
(4) → $C_2 + T(m/2)$
(5) → $C_2$
T(m) = $C_1 + C_2$ if m = 1
T(m) = $2C_1 + C_2$ if m is odd, m > 1
T(m) = $2C_1 + C_2 + T(m/2)$ if m is even
This is a recurrence equation.
Define a new function:
T'(m) = $C_1 + C_2$ if $m \le 1$
T'(m) = $2C_1 + C_2 + T'(m/2)$ if m > 1
Then $T(m) \le T'(m)$ for all m > 0, i.e. T'(m) is an upper bound of T(m).
Note: T'(m) is defined for all real numbers.
$T'(m) = (2C_1 + C_2) + T'(m/2)$
$= 2(2C_1 + C_2) + T'(m/2^2)$
$= 3(2C_1 + C_2) + T'(m/2^3)$
…
$= \lceil \log_2 m \rceil (2C_1 + C_2) + T'(m/2^{\lceil \log_2 m \rceil})$
$= (2C_1 + C_2) \lceil \log_2 m \rceil + C_1 + C_2$
$= O(\log m)$
THUS: $T'(m) = O(\log m)$, $T(m) = O(\log m)$
The worst case occurs when $m = 2^k$.
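Ceiling:
$\lceil x \rceil$ is the smallest integer $\ge x$, e.g. $\lceil 1.5 \rceil = 2$, $\lceil 3.1 \rceil = 4$, $\lceil 3.0 \rceil = 3$
NOTE THAT:
$2^{\lceil \log_2 m \rceil} \ge m$
IF $m = 2^k$ THEN $\log_2 m = k$, $\lceil \log_2 m \rceil = k$ and $2^{\lceil \log_2 m \rceil} = m$
IF m = 6: $\log_2 4 = 2$ and $\log_2 8 = 3$ → $2 < \log_2 6 < 3$ → $\lceil \log_2 6 \rceil = 3$ → $2^{\lceil \log_2 m \rceil} = 8 > m$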
Problem:
What is T(m)? Is m the length of the input?
m is 100, 15, 64, etc., just a number!
IF m is written in BINARY and n is the number of bits of m, THEN
$n = \lceil \log_2 m \rceil$
and $T(m) = O(\log_2 m)$ → $T(n) = O(\lceil \log_2 m \rceil)$, i.e. $T(n) = O(n)$
m CAN ALSO BE TREATED AS 000…0 (m ZEROS), I.E. m UNITS. THEN THE "LENGTH" OF m IS n = m, and
$T(n) = T(m) = O(\log n)$
DESIGN OF A PROGRAM
TOP-DOWN / BOTTOM-UP APPROACH
STEPWISE REFINEMENT, CHOOSE ADT'S AND DATA STRUCTURES
CODING
A REMARK ABOUT RUNNING TIME
ALTHOUGH THE ORDER OF RUNNING TIME IS VERY IMPORTANT, WE SHOULD ALSO CONSIDER THE FOLLOWING FACTORS IN PRACTICE.
1. THE TIME IT TAKES TO WRITE AND DEBUG THE PROGRAM
2. READABILITY, MODULARITY, ETC.: HOW HARD IT IS TO MAINTAIN THE PROGRAM
3. SOMETIMES CONSTANTS ARE ALSO IMPORTANT
4. SPACE (OR STORAGE) COMPLEXITY
5. ACCURACY
ADT LIST
A list is a sequence of zero or more elements of a given type (elementtype).
L = a1, a2, a3, …, an
length = n
first = a1
last = an
each ai is of some data type
ai is at position i
$a_{i-1}$ precedes $a_i$
$a_i$ follows $a_{i-1}$
END(L) = position n+1
Operations
INSERT(x, p, L); DELETE(p, L);
LOCATE(x, L); RETRIEVE(p, L);
MAKENULL(L): L ← ε
FIRST(L); NEXT(p, L); PREVIOUS(p, L)
PRINT(L); LENGTH(L); REVERSE(L)
CONCAT(L1, L2); EMPTY(L); etc.
Array implementation of lists
[Figure: an array with positions 1..max; the elements a1, a2, …, an occupy positions 1..last (the list), and positions last+1..max are empty.]
const max = ?;
type
  position = 1..max;
  LIST = record
    elements: array [position] of elementtype;
    last: 0..max
  end;
function END (var L: LIST): integer;
begin
  END := L.last + 1
end;
procedure INSERT (x: elementtype; p: position; var L: LIST);
var q: position;
begin
  if L.last = max then error ('list is full')
  else if (p > L.last + 1) or (p < 1) then error
  else begin
    for q := L.last downto p do
      L.elements[q+1] := L.elements[q];   { shifting to the right }
    L.last := L.last + 1;
    L.elements[p] := x
  end
end;
Time complexity: INSERT, DELETE, LOCATE: O(n); RETRIEVE, NEXT, PREVIOUS, END, FIRST, MAKENULL: O(1)
Average time: INSERT, DELETE, LOCATE: O(n)
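The shifting that makes INSERT and DELETE cost O(n) can be seen in a Python sketch of the array implementation. The class name `ArrayList`, the use of exceptions for the error cases, and the DELETE routine (not shown in the notes) are assumptions made for illustration.

```python
class ArrayList:
    """Fixed-capacity list; positions are 1-based as in the notes."""

    def __init__(self, max_size):
        self.elements = [None] * (max_size + 1)  # slot 0 is unused
        self.max = max_size
        self.last = 0

    def end(self):
        return self.last + 1

    def insert(self, x, p):
        if self.last == self.max:
            raise OverflowError("list is full")
        if p > self.last + 1 or p < 1:
            raise IndexError("position does not exist")
        # Shift elements p..last one slot to the right, starting from the end.
        for q in range(self.last, p - 1, -1):
            self.elements[q + 1] = self.elements[q]
        self.last += 1
        self.elements[p] = x

    def delete(self, p):
        if p > self.last or p < 1:
            raise IndexError("position does not exist")
        # Shift elements p+1..last one slot to the left.
        for q in range(p, self.last):
            self.elements[q] = self.elements[q + 1]
        self.last -= 1

    def to_list(self):
        return self.elements[1:self.last + 1]
```

Inserting at position 1 moves every element, while inserting at END(L) moves none, which is why both the worst case and the average are proportional to n.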
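Pointer implementation (linked list)
[Figure: L points to a header cell (cell 0), followed by cell 1, cell 2, …, cell n holding a1, a2, …, an; the next field of cell n is nil.]
type
  celltype = record
    element: elementtype;
    next: ↑celltype
  end;
  LIST = ↑celltype;
  position = ↑celltype;
Position i: a pointer to cell i-1, 1 ≤ i ≤ n+1
function END (L: LIST): position;
var q: position;
begin
  q := L;
  while q↑.next <> nil do q := q↑.next;
  END := q
end;
Alternatively, LIST can be a record that also keeps a pointer to the last cell:
LIST: record
  first: ↑celltype;
  last: ↑celltype
end;
Insert x at p: time O(1)
Delete cell at p: time O(1)
PREVIOUS (p, L): time O(n)
[Figures: inserting a new cell after the cell that p points to; unlinking the cell after p; scanning from the header of L to find the predecessor of p.]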
INSERT, DELETE, RETRIEVE, NEXT, FIRST, MAKENULL: O(1)
PREVIOUS, LOCATE, END: O(n)
Compare the two implementations
1. maximum size of the list: array
2. waste of space: both
3. operation speeds

            array    pointer
INSERT      O(n)     O(1)
DELETE      O(n)     O(1)
PREVIOUS    O(1)     O(n)
END         O(1)     O(1) or O(n)

4. pointer representation can be dangerous!
e.g.
  q := NEXT(p, L);
  INSERT (x, p, L);
  . . .
  if q = NEXT(p, L) then …
After the INSERT, q ≠ NEXT(p, L)!
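The danger in item 4 can be reproduced with a small Python sketch of the pointer implementation. It assumes a header cell and the convention that position i is a reference to cell i-1; the names `Cell`, `make_list`, `insert`, `delete`, `next_pos`, and `to_list` are illustrative.

```python
class Cell:
    def __init__(self, element=None, nxt=None):
        self.element = element
        self.next = nxt

def make_list(items):
    """Build a header cell followed by one cell per item; return the header."""
    header = Cell()
    tail = header
    for item in items:
        tail.next = Cell(item)
        tail = tail.next
    return header

def insert(x, p):
    # Position p refers to the cell BEFORE the insertion point, so this is O(1).
    p.next = Cell(x, p.next)

def delete(p):
    # Unlink the cell that follows p, also O(1).
    p.next = p.next.next

def next_pos(p):
    return p.next

def to_list(header):
    out, cur = [], header.next
    while cur is not None:
        out.append(cur.element)
        cur = cur.next
    return out
```

If q is saved as NEXT(p, L) before an insertion at p, then after the insertion NEXT(p, L) is the new cell and q silently denotes a different position than before. With the array implementation a position is just an integer, so this kind of aliasing does not arise in the same way.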
DOUBLY-LINKED LISTS
[Figure: cell 1, cell 2, cell 3, …, cell n holding a1, a2, a3, …, an, each linked to both its successor and its predecessor.]
type
  celltype = record
    element: elementtype;
    next, previous: ↑celltype
  end;
  position = ↑celltype;
Position i: a pointer to cell i
function LAST (L);
begin
  LAST := L.previous
end;
WHAT HAPPENS IF POINTERS AREN'T AVAILABLE? USE CURSORS!
PATTERN MATCHING IN STRINGS
ALPHABET $A = \{a_1, a_2, \ldots, a_k\}$; each $a_i$ is a SYMBOL/CHARACTER
A STRING $x = a_1 a_2 \ldots a_n$, $n \ge 0$, $a_i \in A$
STRINGS ARE A SPECIAL CASE OF LISTS
PATTERN MATCHING: $x = a_1 a_2 \ldots a_n$, $pat = b_1 b_2 \ldots b_m$
Is pat a substring of x?
i.e. $(\exists i : 1 \le i \le n - m + 1)\ a_i a_{i+1} \ldots a_{i+m-1} = b_1 b_2 \ldots b_m$
positions: 1 2 3 4 5 6 7 8 9 10 11
x = aabbabbbaaa, pat = bab => yes, i = 4
pat = abab => no
SIMPLE ALGORITHM
x = aabbabbbaaa, pat = bab

aabbabbbaaa
bab            NO   (i = 1)
 bab           NO   (i = 2)
  bab          NO   (i = 3)
   bab         YES! (i = 4)

BUT FOR pat = aaa WE NEED TO MOVE FROM i = 1 TO i = 9 = 11 - 3 + 1.
SIMILARLY, FOR pat = abab WE MOVE FROM i = 1 TO i = 8 = 11 - 4 + 1.
WORST CASE
[Figure: pat = b1 b2 … bm is aligned under x = a1 a2 … an at each of the positions 1, …, n-m+1.]
n-m+1 passes. EACH PASS TAKES O(m) comparisons, HENCE (n-m+1) O(m) = O(m(n-m+1)) = O(mn).
procedure find (x, pat: STRING; var found: Boolean; var i: position);
{ found is set to false if pat doesn't occur in x; otherwise found is set to true and i is set to the first position in x where pat begins }
var p, q: position;
begin
  if not EMPTY (x) and not EMPTY (pat) then
  begin
    found := false; i := FIRST (x);
    while not found and i <> END (x) do
    begin
      p := i; q := FIRST (pat);
      while not found and p <> END (x) and RETRIEVE (p, x) = RETRIEVE (q, pat) do
      begin
        p := NEXT (p, x); q := NEXT (q, pat);
        if q = END (pat) then found := true
      end;
      if not found then i := NEXT (i, x)
    end
  end
end
IF END(L) IS O(1) THEN T(n, m) = O(mn)
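The simple algorithm can be sketched directly on Python strings. The function below is illustrative: it returns a 1-based position or `None` instead of setting the `found` and `i` variables, and it indexes the strings directly rather than going through the LIST operations.

```python
def find(x, pat):
    """Return the first 1-based position where pat occurs in x, or None."""
    n, m = len(x), len(pat)
    if n == 0 or m == 0:
        return None
    for i in range(n - m + 1):                # at most n-m+1 alignments
        j = 0
        while j < m and x[i + j] == pat[j]:   # up to m comparisons each
            j += 1
        if j == m:
            return i + 1
    return None
```

On x = aabbabbbaaa this reports position 4 for bab, position 9 for aaa, and no match for abab, in line with the examples.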
THE KNUTH, MORRIS, PRATT ALGORITHM (KMP)
x   = abaababaabacabaababaabaab
pat = abaababaabaab
The first 11 characters match (abaababaaba); the MISMATCH is at position 12 (c in x against a in pat).
WHAT DO YOU DO NEXT?
NEXT MOVE: slide pat to the right so that the longest prefix u of the matched part that is also its suffix stays aligned with the text, and start comparing at the mismatched character of x:
  u = abaaba: compare x[12] = c with pat[7] = b, mismatch
  u = aba: compare x[12] = c with pat[4] = a, mismatch
  u = a: compare x[12] = c with pat[2] = b, mismatch
  u empty: compare x[12] = c with pat[1] = a, mismatch
  start comparing at x[13]: pat matches x[13..25]
[Figure: x = … u w u c … against pat = u w u a …; after the move, the prefix u of pat lies under the second u of x.]
THE NUMBER OF COMPARISONS IS O(n). BUT HOW DO WE FIND OUT WHAT u IS?
LET $pat = b_1 b_2 \ldots b_m$.
FOR EACH $1 \le j \le m$, LET
f(j) = the largest i such that 0 < i < j and $b_1 \ldots b_i = b_{j-i+1} \ldots b_j$; f(j) = 0 if such i does not exist.
f(j) < j. f IS THE FAILURE FUNCTION.

j      1  2  3  4  5  6  7  8  9  10  11  12  13
pat    a  b  a  a  b  a  b  a  a  b   a   a   b
f(j)   0  0  1  1  2  3  2  3  4  5   6   4   5

The prefixes of pat with f(j) > 0: aba, abaa, abaab, abaaba, abaabab, abaababa, abaababaa, abaababaab, abaababaaba, abaababaabaa, abaababaabaab.
TIME COMPLEXITY:
T(n, m) = O(n + complexity of computing g) = O(n + complexity of computing f)
f(1) = 0
f(j) = $f^s(j-1) + 1$, where s is the smallest i such that $b_{f^i(j-1)+1} = b_j$
f(j) = 0 if no such i exists
Here $f^i(j-1) = f(f(\ldots f(j-1) \ldots))$, f applied i times, e.g. $f^3(j-1) = f(f(f(j-1)))$.
[Figure: if $b_{f(j-1)+1} = b_j$ then f(j) = f(j-1) + 1; otherwise, with i = f(j-1), try the next shorter border of length f(i) = f(f(j-1)) = $f^2(j-1)$ and compare $b_{f^2(j-1)+1}$ with $b_j$, and so on.]
proc fail (pat[1..m]; var f: array [1..m] of integer);
var i, j: integer;
begin
  f[1] := 0;
  for j := 2 to m do
  begin
    i := f[j-1];
    while (pat[j] ≠ pat[i+1]) and (i > 0) do i := f[i];
    if pat[j] = pat[i+1] then f[j] := i + 1
    else f[j] := 0
  end
end
T(m) = O(m)!
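The failure-function computation can be checked against the table for pat = abaababaabaab with a Python sketch. The only changes from the procedure above are the 0-based indexing and the function name `failure`.

```python
def failure(pat):
    """Return f as a list with f[j-1] = f(j) for 1 <= j <= len(pat)."""
    m = len(pat)
    f = [0] * m
    for j in range(1, m):            # 0-based j corresponds to position j+1
        i = f[j - 1]                 # length of the border of the previous prefix
        while i > 0 and pat[j] != pat[i]:
            i = f[i - 1]             # fall back to the next shorter border
        f[j] = i + 1 if pat[j] == pat[i] else 0
    return f
```

Each iteration of the inner loop strictly decreases i, and i grows by at most one per value of j, which is the reason the total work is O(m).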
procedure KMP (x, pat, g, found, i);
{ x[1..n], pat[1..m] are strings; g[j] = g(j), 1 ≤ j ≤ m, is the pattern position to try next after a mismatch at position j; in terms of the failure function, g(j) = f(j-1) + 1 for j > 1 }
var p, q: position;
begin
  if n ≠ 0 and m ≠ 0 then
  begin
    p := 1; q := 1;
    while (p ≠ n+1 and q ≠ m+1) do          { Time = O(n) }
      if x[p] = pat[q] then begin p := p+1; q := q+1 end
      else if q = 1 then p := p+1
      else q := g[q];
    if q = m+1 then begin found := true; i := p-m end
    else found := false
  end
  else ……
end;
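A Python sketch of the KMP scan follows. It uses the failure function f directly in place of the table g, relying on the relation g(q) = f(q-1) + 1, and it returns a 1-based position or `None`; the function names are illustrative.

```python
def failure(pat):
    """Failure function: f[j-1] is the longest proper border of pat[:j]."""
    f = [0] * len(pat)
    for j in range(1, len(pat)):
        i = f[j - 1]
        while i > 0 and pat[j] != pat[i]:
            i = f[i - 1]
        f[j] = i + 1 if pat[j] == pat[i] else 0
    return f

def kmp(x, pat):
    """Return the first 1-based position of pat in x, or None if absent."""
    n, m = len(x), len(pat)
    if n == 0 or m == 0:
        return None
    f = failure(pat)
    p = q = 0                       # 0-based cursors into x and pat
    while p < n and q < m:
        if x[p] == pat[q]:
            p += 1
            q += 1
        elif q == 0:
            p += 1
        else:
            q = f[q - 1]            # keep the longest border of the matched part
    return p - m + 1 if q == m else None
```

The text cursor p never moves backwards, which is what distinguishes this scan from the simple algorithm and keeps the number of comparisons linear in n.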
ADT STACK
"LAST-IN-FIRST-OUT" LIST (LIFO)
OPERATIONS:
MAKENULL (S): make stack S empty
TOP (S): return the top element of S
  TOP (S) = RETRIEVE (FIRST (S), S) = RETRIEVE (1, S)
POP (S): delete the top element of S. Sometimes POP is defined as a function that returns the element being popped.
  POP (S) = DELETE (FIRST (S), S) = DELETE (1, S)
PUSH (x, S): insert x at the top of S
  PUSH (x, S) = INSERT (x, FIRST (S), S) = INSERT (x, 1, S)
EMPTY (S): return true if S is empty, false otherwise
A SIMPLE EXAMPLE:
#: erase character; it cancels the previous uncancelled character
@: kill character; it cancels all previous characters on the line
abc#d@aa#b = ab
procedure EDIT;
var S: STACK OF CHAR; c: CHAR;
begin
  MAKENULL (S); read (c);
  while not end (c) do
  begin
    if c = '#' then POP (S)
    else if c = '@' then MAKENULL (S)
    else PUSH (c, S);
    read (c)
  end;
  PRINT S IN REVERSE ORDER
end
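The EDIT procedure can be sketched in Python with a list used as the stack. One detail is an assumption not settled by the notes: an erase character on an empty stack is simply ignored here.

```python
def edit(line):
    """Apply '#' (erase) and '@' (kill) to a line and return the resulting text."""
    stack = []
    for c in line:
        if c == "#":
            if stack:            # assumption: '#' on an empty stack is ignored
                stack.pop()
        elif c == "@":
            stack.clear()
        else:
            stack.append(c)
    # The bottom of the stack is the first character, so this is S in reverse order.
    return "".join(stack)
```

For the input abc#d@aa#b the kill character discards abd, and the final erase removes the second a, leaving ab.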
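ARRAY IMPLEMENTATION OF STACKS
[Figure: an array with positions 1..max; positions 1..k-1 are free; the stack occupies positions k..max, with top = k holding the 1st element, position k+1 the 2nd element, and position max the last element.]
type
  position = 1..max;
  STACK = record
    top: 1..max+1;
    elements: array [position] of elementtype
  end
PUSH, POP, TOP: O(1)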
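MORE SPACE-EFFICIENT IMPLEMENTATION
POINTER IMPLEMENTATION
[Figure: Stack points to a linked list of cells a, b, c; the next field of the last cell is nil.]
MANY STACKS IN ONE ARRAY
[Figure: arrays TOP[1..3] and BOTTOM[1..3] index into a shared stack space that holds STACK 1, STACK 2 and STACK 3 with free space between them.]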
ADT QUEUE
A QUEUE IS A "First-In-First-Out" LIST (FIFO)
OPERATIONS:
MAKENULL (Q): make queue Q empty
FRONT (Q): return the first element of Q
  FRONT (Q) = RETRIEVE (FIRST (Q), Q)
ENQUEUE (x, Q): insert x at the end of Q
  ENQUEUE (x, Q) = INSERT (x, END (Q), Q)
DEQUEUE (Q): delete the first element of Q
  DEQUEUE (Q) = DELETE (FIRST (Q), Q)
EMPTY (Q): return true if Q is empty, false otherwise
POINTER IMPLEMENTATION
[Figure: a header cell followed by cells holding a1, a2, …, an; front points to the header and rear points to the last cell.]
type
  celltype = record
    element: elementtype;
    next: ↑celltype
  end;
  QUEUE = record
    front, rear: ↑celltype
  end;
function EMPTY (Q: QUEUE): Boolean;
begin
  if Q.front = Q.rear then EMPTY := true
  else EMPTY := false
end
EACH OPERATION: O(1)
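ARRAY IMPLEMENTATION
[Figure: an array with positions 1..max; the QUEUE occupies a block of consecutive positions from FRONT (1st element) to REAR (last element), and the remaining positions are FREE.]
ENQUEUE: O(n)
CIRCULAR ARRAY IMPLEMENTATION (BUFFER!)
[Figure: positions 1, 2, …, max-1, max arranged in a circle; a1, a2, …, an occupy consecutive positions from front to rear.]
HOW DO WE DISTINGUISH BETWEEN FULL AND EMPTY?
MAINTAIN AN EXTRA BIT, OR DEFINE FULL ≡ (front = addone(addone(rear)))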
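The second option, FULL ≡ (front = addone(addone(rear))), can be sketched in Python as follows. The sketch assumes 0-based array indices, leaves one cell unused so that full and empty are distinguishable, and lets `dequeue` return the removed element; the class and method names are illustrative.

```python
class CircularQueue:
    """Queue in a circular array of max_size cells that holds at most max_size - 1 items."""

    def __init__(self, max_size):
        self.max = max_size
        self.cells = [None] * max_size
        self.front = 0
        self.rear = max_size - 1     # rear is one step "behind" front when empty

    def _addone(self, i):
        return (i + 1) % self.max

    def empty(self):
        return self._addone(self.rear) == self.front

    def full(self):
        return self._addone(self._addone(self.rear)) == self.front

    def enqueue(self, x):
        if self.full():
            raise OverflowError("queue is full")
        self.rear = self._addone(self.rear)
        self.cells[self.rear] = x

    def dequeue(self):
        if self.empty():
            raise IndexError("queue is empty")
        x = self.cells[self.front]
        self.front = self._addone(self.front)
        return x
```

Both ENQUEUE and DEQUEUE only advance one index modulo the array size, so each takes O(1) time; the price is one array cell that is never occupied.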
Mark[i, j] = 1 if we have been to (i, j); 0 otherwise
IF THERE IS NO WAY OUT, BACK UP ONE CELL AND TRY A DIFFERENT MOVE
WE MUST STORE THE CURRENT PATH SOMEWHERE
A PATH: (i1, j1), (i2, j2), …, (is, js)
[Figure: the path kept in a STACK, with (is, js) on top, then (is-1, js-1), …, (i2, j2), and (i1, j1) at the bottom.]
type offsets = record
  x: -1..1;
  y: -1..1
end;
directions = (N, NE, E, SE, S, SW, W, NW);
var move: array [directions] of offsets;

The eight neighbours of (i, j):
NW (i-1, j-1)   N (i-1, j)   NE (i-1, j+1)
W  (i, j-1)       (i, j)     E  (i, j+1)
SW (i+1, j-1)   S (i+1, j)   SE (i+1, j+1)

d     move[d].x   move[d].y
N     -1          0
NE    -1          1
E     0           1
SE    1           1
S     1           0
SW    1           -1
W     0           -1
NW    -1          -1

var maze: array [0..m+1, 0..n+1] of 0..1;
    mark: array [1..m, 1..n] of 0..1;
type dir = (N, NE, E, SE, S, SW, W, NW, D);   { D = DEAD END }
type elementtype = record
  x: 1..m;
  y: 1..n;
  start: dir
end;
STACK = ……
var path: STACK;
function NEXTMOVE (loc: elementtype): dir;
var d: dir; s, r, i, j: integer; found: Boolean;
begin
  i := loc.x; j := loc.y; d := loc.start; found := false;
  while (d ≠ D) and not found do
  begin
    s := move[d].x; r := move[d].y;
    if (maze[i+s, j+r] = 0) and (mark[i+s, j+r] = 0) then found := true
    else d := succ (d)
  end;
  NEXTMOVE := d
end
proc rat (var maze: array [0..m+1, 0..n+1] of 0..1);
var mark: array [1..m, 1..n] of 0..1;
    path: STACK; location: elementtype; d: dir;
function NEXTMOVE (x: elementtype): dir; ….;
begin
  mark := (0); MAKENULL (path);   { initialization (should be specified last) }
  location := (1, 1, E); mark[1, 1] := 1;
  PUSH (location, path);
  while not EMPTY (path) do
  begin
    location := TOP (path); POP (path);
    d := NEXTMOVE (location);
    if d ≠ D then
    begin
      location.start := succ (d);
      PUSH (location, path);
      location.x := location.x + move[d].x;
      location.y := location.y + move[d].y;
      if (location.x = m) and (location.y = n) then
      begin print (path); return end
      else begin
        PUSH ((location.x, location.y, N), path);
        mark[location.x, location.y] := 1
      end
    end
  end
end
TIME COMPLEXITY OF RAT: O(mn)
SPACE COMPLEXITY OF RAT: O(mn)
BUT WITHOUT USING MARK: $O(8^{mn}) = O(2^{3mn})$
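The backtracking search can be sketched in Python as below. The sketch makes a few assumptions for illustration: directions are numbered 0..7 in the order of the move table with 8 standing for the dead-end value D, the maze is a list of lists with a border of 1s, every cell starts with direction N, and the path is returned rather than printed.

```python
# Offsets in the order N, NE, E, SE, S, SW, W, NW, as in the move table.
MOVES = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def rat(maze):
    """Return a path of (row, col) cells from (1, 1) to (m, n), or None.

    maze is a list of lists of 0/1 with a border of 1s (walls), so the open
    cells have 1-based coordinates 1..m by 1..n.
    """
    m, n = len(maze) - 2, len(maze[0]) - 2
    if maze[1][1] != 0:
        return None
    if (m, n) == (1, 1):
        return [(1, 1)]
    mark = [[0] * (n + 2) for _ in range(m + 2)]
    mark[1][1] = 1
    path = [(1, 1, 0)]                    # stack of (row, col, next direction to try)
    while path:
        i, j, d = path.pop()
        while d < 8:
            s, r = MOVES[d]
            if maze[i + s][j + r] == 0 and mark[i + s][j + r] == 0:
                break
            d += 1
        if d == 8:
            continue                      # dead end: back up one cell
        path.append((i, j, d + 1))        # resume here with the next direction
        i, j = i + s, j + r
        mark[i][j] = 1
        path.append((i, j, 0))
        if (i, j) == (m, n):
            return [(a, b) for a, b, _ in path]
    return None
```

Because every cell is marked when first entered and never unmarked, each cell is pushed as a new location at most once, which is the source of the O(mn) time and space bounds.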
An application of queues: breadth-first search in trees
[Figure: a binary tree T with nodes v1, …, v11 holding the data a1, …, a11; v1 is the root with children v2 and v3.]
For a binary tree: LEFT(v) is the left child of v, RIGHT(v) is the right child of v
e.g., LEFT(v3) = null, RIGHT(v3) = v6, DATA(v1) = a1
ROOT(T) = v1
Searching in a tree
Given tree T and data x, find a node v of T s.t. DATA(v) = x.
Possible approaches:
1. Breadth-first search: try level 1, then level 2, then level 3, etc.
2. Depth-first search: search along the leftmost path until a leaf is reached, then back up and try the 2nd leftmost path, etc.
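Breadth-first
X = 20
[Figure: a binary tree with nodes v1, …, v12 numbered level by level, holding data values such as 10, 7, 50, 30, 20, 2, 5, 60, 24.]
Searching order: v1, v2, v3, v4, v5, v6, . . .
Depth-first
[Figure: the same tree with its nodes numbered in depth-first order.]
Searching order: v1, v2, v3, v4, v5, v6, . . .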
procedure DSearch (x, T);
begin
  if x = DATA(ROOT(T)) then print ROOT(T)
  else begin
    DSearch left subtree;
    DSearch right subtree
  end
end;
Nonrecursive version:
procedure DSearch (x, T);
var path: STACK of nodes; v: node;
begin
  v := ROOT(T); MAKENULL (path); PUSH (v, path);
  while not EMPTY (path) do
  begin
    v := TOP (path); POP (path);
    if DATA(v) = x then print v
    else begin
      PUSH (RIGHT(v), path);   { pushed first, so that the left subtree is searched first }
      PUSH (LEFT(v), path)
    end
  end
end
Time: O(n)
Space: O(n)
Space avg: O(log n)
procedure BSearch (x, T);
var level: QUEUE of nodes; v: node;
begin
  v := ROOT(T); MAKENULL (level); ENQUEUE (v, level);
  while not EMPTY (level) do
  begin
    v := FRONT (level); DEQUEUE (level);
    if DATA(v) = x then begin print v; stop end
    else begin
      ENQUEUE (LEFT(v), level);
      ENQUEUE (RIGHT(v), level)
    end
  end
end;
Time: O(n), where n = |T| is the size of T
Space: O(n)
Avg: O(n)
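The two searches differ only in the container that holds the pending nodes, which a Python sketch makes visible. The `Node` class, the handling of missing children by skipping `None`, and returning the visiting order (rather than printing the node found) are choices made here for illustration.

```python
from collections import deque

class Node:
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right

def dsearch(x, root):
    """Depth-first search with an explicit stack; returns the visiting order up to x."""
    visited, path = [], [root]
    while path:
        v = path.pop()
        if v is None:
            continue
        visited.append(v.data)
        if v.data == x:
            return visited
        path.append(v.right)     # pushed first, so the left subtree is searched first
        path.append(v.left)
    return visited

def bsearch(x, root):
    """Breadth-first search with a queue; returns the visiting order up to x."""
    visited, level = [], deque([root])
    while level:
        v = level.popleft()
        if v is None:
            continue
        visited.append(v.data)
        if v.data == x:
            return visited
        level.append(v.left)
        level.append(v.right)
    return visited
```

On a tree with root 1, children 2 and 3, and grandchildren 4, 5 under 2 and 6 under 3, the stack version visits 1, 2, 4, 5, 3, 6 while the queue version visits 1, 2, 3, 4, 5, 6.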
Application: implement a DOS command cd:\
cd:\name: change the current directory to the subdirectory name
What should cd:\letters do?
BFS or DFS?
When do we use DFS?
e.g., solution tree
[Figure: a directory tree rooted at A: with subdirectories such as job, letters, WP 5.0, project, study and homework; several different subdirectories are named letters.]
[Figure: a chain of procedure activations. Proc A(x1, x2, …) with locals y1, y2, … calls A(a1, a2, …); that activation of A calls A(b1, b2, …); that activation calls B(c1, c2, …); Proc B(x1, x2, …) has locals f1, f2, …. Each call is followed by a return label (L1, L3, …).]
Elimination of Recursion
Sometimes it is absolutely necessary to eliminate recursion:
• recursive calls are not supported, e.g., FORTRAN
• speed is the first priority: do it yourself
Solution: STACK of activation records
Generally, an activation record holds
1. current values of the parameters (pass by value)
2. current values of the local variables
3. a label indicating the return address
Assume that if the procedure is p(x1, x2, …, var y1, y2, …)
then the recursive call is p(a1, a2, …, y1, y2, …)
General Rules:
Procedure P (x1, x2: int; var y: int);
Var i, j: int;
Begin
____________________________________;
____________________________________; . . .
P(a1, a2,y)
_________________________________________;
_________________________________________;
. . .
end;
Example 1
procedure Ackermann (m, n: integer; var A: integer);
begin
1:  if (n < 0) or (m < 0) then writeln ('error')
    else if m = 0 then A := n+1
    else if n = 0 then Ackermann (m-1, 1, A)
    else begin
      Ackermann (m, n-1, A);
2:    Ackermann (m-1, A, A);
3:  end
end;
Recursion Elimination
procedure Ackermann (m, n: integer; var A: integer);
label 1, 2, 3;
var S: STACK of record m, n, l: integer end;
begin
  MAKENULL (S);
1:  if (n < 0) or (m < 0) then writeln ('error')
    else if m = 0 then A := n+1
    else if n = 0 then begin
      PUSH ((m, n, 3), S); m := m-1; n := 1; goto 1
    end
    else begin
      PUSH ((m, n, 2), S); n := n-1; goto 1;
2:    PUSH ((m, n, 3), S); m := m-1; n := A; goto 1
    end;
3:  if not EMPTY (S) then begin
      (m, n, l) := TOP (S); POP (S);
      case l of
        2: goto 2;
        3: goto 3
      end
    end
end; { Ackermann }
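A simplified variant of this elimination can be written in Python. It is not a line-by-line translation of the labelled version: since only the pending values of m need to be remembered for this particular function, the stack here holds plain integers instead of (m, n, label) records, and n doubles as the result variable A.

```python
def ackermann(m, n):
    """Ackermann's function computed with an explicit stack instead of recursion."""
    if m < 0 or n < 0:
        raise ValueError("m and n must be non-negative")
    stack = [m]                  # pending values of m; n carries the latest result
    while stack:
        m = stack.pop()
        if m == 0:
            n = n + 1
        elif n == 0:
            stack.append(m - 1)  # A(m, 0) = A(m-1, 1)
            n = 1
        else:
            stack.append(m - 1)  # outer call A(m-1, .), resumed once the inner result is in n
            stack.append(m)      # inner call A(m, n-1)
            n = n - 1
    return n
```

The explicit stack can grow very large even for small arguments, so this removes the dependence on the language's recursion support but not the inherent cost of the function.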
More details in [AHU] pp. 64-69, [HS] pp. 150-153.
* The method works only when
  • there are no pass-by-reference parameters, or the same pass-by-reference parameters are passed each time (e.g., a function);
  • there are no global variables.
General case???
POINTER!!! Replace p(..., var x: type1) by p(..., xp: ↑type1)
procedure R(x:integer var y,z: integer);
var i: integer;
begin
---------------------
---------------------
y:=x*i
-------------------
-------------------
R(a,i,y);
-------------------
------------------
end;
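Trees
Basic Terminology
1. A single node is a tree; that node is also the root.
2. If T1, T2, …, Tk are trees with roots n1, n2, …, nk, then the tree T obtained by making a new node n the parent of n1, n2, …, nk is a tree with root n.
[Figure: root n with children n1, n2, …, nk, the roots of the subtrees T1, T2, …, Tk.]
Each Ti is a subtree of n (and of T).
n1, n2, …, nk are the children of n; they are siblings.
n is the parent of n1, n2, …, nk.
(Actually, this defines a rooted, or oriented, tree.)
[Figure: an example tree with nodes 1 to 12.]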
Note that every node (except the root) has a unique parent.
A node with no children is a leaf.
A non-leaf node is also called an internal node.
n1, n2, n3, …, nk, where each node is the parent of the next, is a path of length k-1 from n1 to nk.
Note: n1 alone is a path of length 0.
n1 is an ancestor of nk.
nk is a descendant of n1.
height of n: length of the longest path from n to a leaf
height of a leaf is 0!
depth of n: length of the unique path from the root to n
depth of the root is 0!
height (or depth) of a tree: height of the root
Order of nodes
In a tree, siblings are ordered from left to right (ordered tree).
[Figure: the tree with root a and children b, c ≠ the tree with root a and children c, b.]
If n1 is to the left of n2 then all descendants of n1 are to the left of all descendants of n2.
Tree Traversals
[Figure: T is a tree with root n and subtrees T1, T2, …, Tk.]
Preorder traversal of T is: n, preorder traversal of T1, preorder traversal of T2, …, preorder traversal of Tk (DFS).
Inorder traversal of T is: i.t. of T1, n, i.t. of T2, …, i.t. of Tk.
Postorder traversal of T is: p.t. of T1, p.t. of T2, …, p.t. of Tk, n (used in the evaluation of expression trees and in divide-and-conquer).
Example
[Figure: a tree with root 1; the children of 1 are 2, 3, 4; the child of 2 is 5; the children of 3 are 6, 7; the children of 4 are 8, 9, 10.]
Preorder: 1, 2, 5, 3, 6, 7, 4, 8, 9, 10
Inorder: 5, 2, 1, 6, 3, 7, 8, 4, 9, 10
Postorder: 5, 2, 6, 7, 3, 8, 9, 10, 4, 1
Preorder: we list a node the first time we pass it
Postorder: we list a node the last time we pass it
Inorder: we list a leaf the first time we pass it, but list an interior node the second time we pass it
procedure Preorder (T: tree);
var v: node;
begin
  v := ROOT(T);
  print v;
  for each subtree T' of v, from left to right do
    Preorder (T')
end;
time complexity (Pre/In/Post): O(|T|), where |T| is the number of nodes
space complexity: O(height of T), for the stack
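The three listings of the example can be reproduced with a Python sketch. The tree is given as a dictionary of children lists; its structure is inferred from the three listings, and the names `CHILDREN`, `preorder`, `inorder`, and `postorder` are illustrative.

```python
# Children lists for the example tree (structure inferred from the three listings).
CHILDREN = {1: [2, 3, 4], 2: [5], 3: [6, 7], 4: [8, 9, 10],
            5: [], 6: [], 7: [], 8: [], 9: [], 10: []}

def preorder(n, children=CHILDREN):
    out = [n]
    for c in children[n]:
        out += preorder(c, children)
    return out

def inorder(n, children=CHILDREN):
    kids = children[n]
    if not kids:
        return [n]
    # First subtree, then the node itself, then the remaining subtrees.
    out = inorder(kids[0], children) + [n]
    for c in kids[1:]:
        out += inorder(c, children)
    return out

def postorder(n, children=CHILDREN):
    out = []
    for c in children[n]:
        out += postorder(c, children)
    return out + [n]
```

Each function visits every node once, and the recursion depth never exceeds the height of the tree plus one, in line with the stated time and space bounds.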
procedure Preorder (T: tree);   { no stack }
var v: node;
begin
  v := ROOT(T);
  while v ≠ null do
  begin
    print v;
    if v is not a leaf then v := 1st child of v
    else begin
      { back up until v is not the last child of Parent(v) }
      while (v ≠ null) and (v = last child of Parent(v)) do
        v := Parent(v);
      if v ≠ null then v := next sibling of v
    end
  end
end;
time = O(|T|) if Parent ( ) is O(1)
space = ?
Reconstructing a tree from its traversals
Preorder and Postorder traversals are sufficient.
Preorder and Inorder traversals aren’t sufficient.
Inorder and Postorder traversals aren’t sufficient.
example trees?
Any single traversal isn’t sufficient.
(pre/in/post)
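Labelled Trees, Expression Trees
[Figure: labelled trees with node labels a, b, c, d, e.]
[Figure: an expression tree with root n1 labelled *; its children n2 and n3 are both labelled +; the leaves under n2 are a and b; the leaves under n3 are a and c.]
n2 represents a+b
n3 represents a+c
n1 represents (a + b) * (a + c)
Evaluation can be done by a postorder traversal.
pre/in/post-order listings give the prefix (Polish), infix, and postfix (Reverse Polish) forms:
prefix: *+ab+ac
infix: a+b*a+c
postfix: ab+ac+*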
ADT TREE
1. PARENT (n, T): node. If n has no parent, return the null node.
2. LEFTMOST-CHILD (n, T): node
3. RIGHT-SIBLING (n, T): node
   returns the sibling immediately following n.
4. LABEL (n, T): label
   ≡ DATA (n, T)
5. ROOT (T): node
6. MAKENULL (T)
7. CREATEi (v, T1, T2, …, Ti): tree; i = 0, 1, 2, …
   [Figure: a new root v with subtrees T1, T2, …, Ti.]
   Alternative: ATTACH (T1, T2): tree
8. DELETE (n, T): delete the subtree rooted at n.
[Figures: example trees with labels a1, …, a9 and nodes n1, …, n7.]
LEFTMOST-CHILD (n1, T) = n2
RIGHT-SIBLING (n1, T) = n4
RIGHT-SIBLING (n7, T) = ^
procedure PREORDER (n: node);
{ list the labels of the descendants of n in T (global) in preorder }
begin
  print LABEL (n, T);
  n := LEFTMOST-CHILD (n, T);
  while n ≠ ^ do
  begin
    PREORDER (n);
    n := RIGHT-SIBLING (n, T)
  end
end;
Array Implementation
[Figure: a tree with nodes 1, …, 10: the children of 1 are 2 and 3; the children of 2 are 4 and 5; the children of 5 are 6, 7 and 8; the children of 3 are 9 and 10.]

node     1  2  3  4  5  6  7  8  9  10
parent   0  1  1  2  2  5  5  5  3  3
label    a  b  a  c  b  a  b  c  a  b

0 = ^ (the root has no parent); e.g. PARENT (10, T) = 3
If node i is to the left of node j then i < j, i.e., number siblings from left to right (e.g., preorder, even inorder).
type
  node = 1..max;
  cell = record
    parent: 0..max;
    label: labeltype
  end;
  TREE = array [1..max] of cell;
function LEFTMOST-CHILD (n: node; T: TREE): node;
var i: integer;
begin
  i := 1;
  while (i ≤ max) and (T[i].parent ≠ n) do i := i+1;
  if i > max then LEFTMOST-CHILD := 0
  else LEFTMOST-CHILD := i
end;   { time: O(|T|) }
function RIGHT-SIBLING (n: node; T: TREE): node;
var i: integer; parent: node;
begin
  parent := T[n].parent;
  i := n+1;
  while (i ≤ max) and (T[i].parent ≠ parent) do i := i+1;
  if i > max then RIGHT-SIBLING := 0
  else RIGHT-SIBLING := i
end;   { time: O(|T|) }
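The two linear scans can be tried on the parent array of the example with a Python sketch. The list `PARENT` is copied from the table, with an unused entry at index 0; the function names are illustrative.

```python
# parent[i] for nodes 1..10, copied from the table; index 0 is a placeholder
# and a parent of 0 marks the root.
PARENT = [None, 0, 1, 1, 2, 2, 5, 5, 5, 3, 3]

def leftmost_child(n, parent=PARENT):
    """Smallest-numbered node whose parent is n, or 0 if n is a leaf: O(|T|)."""
    for i in range(1, len(parent)):
        if parent[i] == n:
            return i
    return 0

def right_sibling(n, parent=PARENT):
    """Next node after n with the same parent, or 0 if there is none: O(|T|)."""
    for i in range(n + 1, len(parent)):
        if parent[i] == parent[n]:
            return i
    return 0
```

Both functions rely on the numbering convention that siblings are numbered from left to right; without it, the first node found with the right parent need not be the leftmost child or the immediate right sibling.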
Trees as lists of children
[Figure: a header array indexed by node 1..10 in the node space; entry i points to the list of the children of node i; a parallel array holds the labels.]
type
  node = 1..max;
  LIST = …;
  TREE = record
    header: array [1..max] of LIST;
    labels: array [1..max] of labeltype;
    root: node
  end;
No matter how LIST is implemented: LEFTMOST-CHILD, RIGHT-SIBLING: O(1); PARENT: O(|T|)
If we want O(1) for all, add a parent field.
Considering CREATE (n, T1, T2, …, Ti):
[Figure: several trees (T1, T, T2), with node labels A to I, stored together in a shared node space of cells 1 to 15.]
Simplified: leftmost-child & right-sibling representation
[Figure: cellspace with the fields leftmost-child, label, right-sibling, showing cells such as (8, B, 5), (0, C, 0), (3, A, 0), (0, D, 0) for a small tree with nodes A, B, C, D.]
var cellspace: array [1..max] of record
  label: labeltype;
  leftmost-child, right-sibling: 0..max
end
SUMMARY
1. Array of Parents
• PARENT: O(1)
• LEFTMOST-CHILD, RIGHT-SIBLING: O(|T|)
• ALL-CHILDREN: O(m)
• simple, space-efficient
2. List of Children
• LEFTMOST-CHILD: O(1)
• PARENT, RIGHT-SIBLING: O(|T|)
• can store several trees, CREATE
3. Leftmost-child, Right-sibling
• LEFTMOST-CHILD, RIGHT-SIBLING: O(1)
• PARENT: O(|T|)
• make tree, CREATE, slightly more space than (2)
BINARY TREES
A node is a binary tree.
If T is a binary tree and v is a node, then v with T as its left subtree is a binary tree, and v with T as its right subtree is a binary tree.
If T1, T2 are binary trees and v is a node, then v with left subtree T1 and right subtree T2 is a binary tree.
A binary tree is NOT a tree!!! In a binary tree a child is either a left child or a right child:
[Figure: A with left child B ≠ A with right child B.]
full binary tree: every internal node has two children and all leaves have the same depth
complete binary tree: obtained from a full binary tree as follows: fix a leaf and delete all the leaves to the right of it
• no. of nodes of depth i $\le 2^i$
• size of a binary tree of depth i $\le \sum_{j=0}^{i} 2^j = 2^{i+1} - 1$
• if complete, $2^i - 1 < \text{size} \le 2^{i+1} - 1$
• if full, size $= 2^{i+1} - 1$
• $\text{size} - 1 \ge \text{depth} \ge \log_2(\text{size} + 1) - 1$
Binary tree traversals
v
T1 T2
97
Preorder (T):
V, preorder (T1), preorder (T2)
Inorder (T):
* Inorder (T1), v, inorder (T2)
Postorder (T):
Postorder (T1), postorder (T2), v
How to reconstruct a binary tree from its traversals?
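A preorder (or inorder, or postorder) traversal alone is not enough.
Preorder and postorder together aren't enough either!
[Figure: node a with only a left child b, and node a with only a right child b; both trees have preorder a, b and postorder b, a]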
Preorder and Inorder
Preorder: a1, a2, …, an; inorder: b1, b2, …, bn
1. Find i s.t. a1 = bi
2. Then T1 = Reconstruct(a2, …, ai; b1, …, bi−1)
   and T2 = Reconstruct(ai+1, …, an; bi+1, …, bn)
3. T = the tree with root a1, left subtree T1 and right subtree T2
Postorder & inorder: similar
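A minimal Python sketch of the preorder-plus-inorder reconstruction follows. It assumes all labels are distinct, and it represents a tree as a nested tuple `(root, left, right)`, which is a choice made for this sketch only.

```python
def reconstruct(pre, ino):
    """Rebuild a binary tree from its preorder and inorder listings.

    Assumes distinct labels. Returns None for the empty tree, otherwise
    a tuple (root, left_subtree, right_subtree).
    """
    if not pre:
        return None
    root = pre[0]
    i = ino.index(root)  # 0-based position of the root in the inorder listing
    # The left subtree has i nodes: the next i preorder labels,
    # and the inorder labels before the root.
    left = reconstruct(pre[1:i + 1], ino[:i])
    right = reconstruct(pre[i + 1:], ino[i + 1:])
    return (root, left, right)
```

For instance, the preorder A B D C together with the inorder D B A C gives back the tree in which A has children B and C, and D is the left child of B.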
Representation of binary trees
[Figure: binary tree with root A and children B and C, drawn as linked records]
type
  node = record
    label : labeltype;
    left, right : ↑node
  end;
  TREE = ↑node;
Notes: 1. cursors may also be used.
2. if operation PARENT ( ) is crucial, a parent field could be included.
3. but if traversal is the only concern, then the parent field is not really needed.
procedure PREORDER(T : TREE);
// print the nodes of T in preorder; links are inverted on the way down instead of using a stack //
var temp, tempparent, tempchild : ↑node;

  procedure BACKUP;
  // find the successor of temp in preorder traversal //
  var stop : boolean;
  begin
    stop := false;
    temp := tempparent;
    while temp ≠ nil and not stop do begin
      if temp↑.tag = 0 then begin
        // coming back from the left subtree: restore the left link //
        tempparent := temp↑.left;
        temp↑.left := tempchild;
        if temp↑.right ≠ nil then begin
          // descend into the right subtree //
          tempchild := temp↑.right;
          temp↑.right := tempparent;
          temp↑.tag := 1;
          tempparent := temp;
          temp := tempchild;
          stop := true;
          return
        end
      end
      else begin // temp↑.tag = 1 //
        tempparent := temp↑.right;
        temp↑.right := tempchild
      end;
      tempchild := temp;
      temp := tempparent
    end
  end; // end of BACKUP //

begin // print nodes of T in preorder //
  temp := T;
  tempparent := nil;
  while temp ≠ nil do begin
    print temp↑.label;
    if temp↑.left ≠ nil then begin
      tempchild := temp↑.left;
      temp↑.left := tempparent;
      temp↑.tag := 0;
      tempparent := temp;
      temp := tempchild
    end
    else if temp↑.right ≠ nil then begin
      tempchild := temp↑.right;
      temp↑.right := tempparent;
      temp↑.tag := 1;
      tempparent := temp;
      temp := tempchild
    end
    else // temp↑.left = temp↑.right = nil //
      BACKUP
  end
end; // end of PREORDER //
Threaded binary trees
lefttag = 0 → left = left child
lefttag = 1 → left = left thread (predecessor in inorder)
righttag = 0 → right = right child
righttag = 1 → right = right thread (successor in inorder)
The predecessor/successor in inorder can be found without using a stack or flipping links.
Representation of complete binary trees
[Figure: complete binary tree with nodes numbered 1, 2, 3, … level by level, stored in positions 1, 2, 3, … of an array]
parent of node i = ⌊i/2⌋ (the largest integer ≤ i/2), 1 < i ≤ n
left child of node i = 2i, if 2i ≤ n
right child of node i = 2i + 1, if 2i + 1 ≤ n
type TREE = record
  n : 0..max;
  labels : array [1..max] of labeltype
end;
[Figure: complete binary tree with labels A to J numbered 1 to 10, stored as A B C D E F G H I J]
Declarations for procedure PREORDER:
type
  node = record
    label : labeltype;
    left, right : ↑node;
    tag : 0..1
  end;
  TREE = ↑node;
var
  temp, tempparent, tempchild : ↑node;
tag = 0 → left points to parent
tag = 1 → right points to parent
An application of binary trees: Huffman codes
characters: A = {a1, a2, …, ak}
string or message: x1 x2 … xn, each xi ∈ A
p(ai): the probability that ai will appear in a message
Encoding: assign a binary code c(ai) to each ai
c(x1 x2 … xn) = c(x1) … c(xn)
Decoding: given code b1 b2 … bm, find the unique message x1 x2 … xn such that c(x1 x2 … xn) = b1 b2 … bm
Average code length: Σ_{i=1..k} p(ai)·|c(ai)|, where |c(ai)| is the length of c(ai)
character     a     b      c      d     e      average
probability   .30   .10    .10    .10   .40
code 1        000   001    010    011   100    3
code 2        01    0010   0011   000   1      2.1
code 3        0001100001 [word boundaries lost]   1.7
Prefix property: c(ai) is not a prefix of c(aj) for any j ≠ i
e.g., Code 1 and Code 2 have the prefix property,
Code 3 doesn't!
Claim: the prefix property makes decoding easy, e.g., consider decoding the code 000…
Code 1: a … (on-line decoding)
Code 2: d …
Code 3: ??
Huffman Code - an optimal (least average code length) prefix code
Algorithm Huffman(a1, a2, …, an);
// find Huffman code c(ai) for each ai //
begin
  let ai and aj be two characters such that p(ai) and p(aj) are the lowest among a1, a2, …, an;
  let a' be a new character with p(a') = p(ai) + p(aj);
  Huffman({a1, a2, …, an} − {ai, aj} + {a'});
  c(ai) := c(a') 0;
  c(aj) := c(a') 1
end;
Example: a, b, c with p(a) = 0.5, p(b) = 0.3, p(c) = 0.2
Huffman(a, b, c) ⇒ c(a) = 0, c(b) = 10, c(c) = 11
Huffman(a, [bc]) ⇒ c(a) = 0, c([bc]) = 1
Binary tree representation of prefix code
[Figure: binary trees for code 1 and code 2; edges are labelled 0 and 1 and the characters a, b, c, d, e are the leaves]
type
  node = record
    left, right : ↑node;
    probability : real;
    character : (a1, a2, …, ak) // used only in leaves //
  end;
A more efficient implementation is given in [AHU] pp.94 -101
Example: characters a b c d e f g h with probabilities .10 .20 .05 .05 .10 .30 .10 .10
[Figure: steps (1) to (7) of the construction. The collection of trees is called a forest; at each step the two trees of lowest probability are merged (first c and d, into a tree of weight .10) until a single tree remains, whose edges are labelled 0 and 1]
Using a modified preorder listing (with a stack), we can print the Huffman codes for the characters.
Algorithm Huffman-Tree;
// construct a Huffman tree for characters a1, a2, …, an //
var forest : array [1..max] of TREE;
    p : real;
begin
  for i := 1 to n do begin
    new(forest[i]);
    forest[i]↑.left := nil;
    forest[i]↑.right := nil;
    forest[i]↑.probability := p(ai);
    forest[i]↑.character := ai
  end;
  while forest contains more than one tree do begin
    i := index of the tree with the smallest probability;
    j := index of the tree with the second smallest probability;
    p := forest[i]↑.probability + forest[j]↑.probability;
    forest[i] := CREATE2((p, -), forest[i], forest[j]);
    delete tree forest[j]
  end
end
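The forest-merging construction can be sketched in Python as below. This is an illustration rather than the notes' implementation: it keeps the forest in a `heapq` priority queue instead of scanning an array, represents an internal node as a pair `(left, right)`, and assumes the characters are strings. Which of two merged trees gets 0 or 1 is arbitrary, so individual code words may differ from the slides while the code lengths agree.

```python
import heapq
from itertools import count


def huffman_codes(prob):
    """Return a dict mapping each character to its Huffman code word.

    prob maps characters (strings) to probabilities. A tie counter keeps
    heap entries comparable when probabilities are equal.
    """
    tie = count()
    forest = [(p, next(tie), ch) for ch, p in prob.items()]
    heapq.heapify(forest)
    if len(forest) == 1:  # a single character still needs one bit
        return {forest[0][2]: "0"}
    while len(forest) > 1:
        p1, _, t1 = heapq.heappop(forest)  # smallest probability
        p2, _, t2 = heapq.heappop(forest)  # second smallest probability
        heapq.heappush(forest, (p1 + p2, next(tie), (t1, t2)))

    codes = {}

    def walk(tree, prefix):
        # Internal nodes are pairs; leaves are the characters themselves.
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(forest[0][2], "")
    return codes
```

For p(a) = 0.5, p(b) = 0.3, p(c) = 0.2 this gives a one-bit code for a and two-bit codes for b and c, matching the lengths in the example above.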
A set is a collection of elements/members
114
Notes:
1. An element can be a set!
2. A set can be infinite or empty.
3. Usually (in this course), members of a set are of the same type.
4. Members of a set are different (otherwise, a multiset).
5. Members could be linearly ordered.
A relation '<' is a linear order on some set S if:
(i) for any a, b in S, exactly one of a < b, a = b, a > b is true (trichotomy)
(ii) for a, b, c in S, a < b and b < c ⇒ a < c (transitivity)
Some notation:
S = {a1, a2, …, an}
or S = {x | x satisfies some condition}
e.g. {1, 2, …, 10} = {x | x is an integer and 1 ≤ x ≤ 10}
∅ = {}
Membership: x ∈ S, x ∉ S
Inclusion (subset): S1 ⊆ S2, S1 ⊈ S2
Proper subset: S1 ⊂ S2 iff S1 ≠ S2 and S1 ⊆ S2
Superset: S1 ⊇ S2
Proper superset: S1 ⊃ S2
Union: S1 ∪ S2, e.g. {1, 2} ∪ {2, 3} = {1, 2, 3}
Intersection: S1 ∩ S2, e.g. {1, 2} ∩ {2, 3} = {2}
Difference: S1 − S2, e.g. {1, 2} − {2, 3} = {1}
ADT SET
1. MAKENULL(S): S := ∅
2. INSERT(x,S): S := S ∪ {x}
3. DELETE(x,S): S := S − {x}
4. MEMBER(x,S): true iff x ∈ S
5. ASSIGN(A,B): copy B into A
6. EQUAL(A,B): true iff A = B
7. UNION(A,B,C): C := A ∪ B
8. INTERSECTION(A,B,C): C := A ∩ B
9. DIFFERENCE(A,B,C): C := A − B
10. MERGE(A,B,C): if A ∩ B = ∅, C := A ∪ B; otherwise C is undefined
11. MIN(S): returns the minimum element in S, assuming S is linearly ordered
12. FIND(x): given disjoint sets A1, A2, …, An (global), find the unique Ai s.t. x ∈ Ai
13. SIZE(S), SUBSET(A,B), COMPLEMENT(A)
SET with Union, Intersection, Difference
Example: data-flow analysis
[Figure: flow graph with blocks B1 to B8; the statements and sets recovered from it are listed below]
B1: 1. t := ?   2. p := ?   3. q := ?     GEN = {1, 2, 3}, KILL = {4, 5, 6, 7, 8, 9}
B2: 4. read(p)   5. read(q)               GEN = {4, 5}, KILL = {2, 3, 7, 8}
B3: q ≤ p ?                               GEN = KILL = ∅
B4: 6. t := p                             GEN = {6}, KILL = {1, 9}
B5: 7. p := q   8. q := t                 GEN = {7, 8}, KILL = {2, 3, 4, 5}
B6: p mod q = 0 ?                         GEN = KILL = ∅
B7: write(q)                              GEN = KILL = ∅
B8: 9. t := p mod q                       GEN = {9}, KILL = {1, 6}
GEN[i] = {data definitions in Bi}
KILL[i] = {d | d ∉ Bi and d defines the same variable as some definition in GEN[i]}
DEFIN[i] = {d | there is a path B1 … Bj Bi such that d is the last definition, before Bi on the path, of the variable it defines}: the definitions reaching Bi
e.g. in the figure: DEFIN = {1, 4, 5}, DEFIN = {4, 5, 6, 7, 8, 9}
DEFOUT[i] = {d | same as in DEFIN[i], except that the path includes Bi itself}: the definitions leaving Bi
DEFOUT[i] = (DEFIN[i] − KILL[i]) ∪ GEN[i]
DEFIN[i] = ∪ DEFOUT[j] over all Bj that are predecessors of Bi (i.e. there is an arc from Bj to Bi)
Algorithm dataflow(GEN, KILL; var DEFIN);
var temp : SET;
    i : integer;
    changed : boolean;
begin // n = no. of blocks //
  for i := 1 to n do begin
    MAKENULL(DEFIN[i]);
    MAKENULL(DEFOUT[i])
  end;
  repeat
    changed := false;
    for i := 1 to n do begin
      DIFFERENCE(DEFIN[i], KILL[i], temp);
      UNION(temp, GEN[i], temp);
      if not EQUAL(temp, DEFOUT[i]) then begin
        ASSIGN(DEFOUT[i], temp);
        changed := true
      end
    end;
    for i := 1 to n do begin
      MAKENULL(DEFIN[i]);
      for each predecessor Bj of Bi do
        UNION(DEFIN[i], DEFOUT[j], DEFIN[i])
    end
  until not changed
end;
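The iteration can be sketched with Python's built-in sets standing in for the SET ADT. The function name and the dictionary-based interface are assumptions of this sketch. In the usage note, the GEN and KILL sets are those of the four-block example in these notes, while the predecessor lists are inferred from its iteration table rather than read off the (lost) flow graph.

```python
def reaching_definitions(gen, kill, preds):
    """Iterate DEFOUT = (DEFIN - KILL) | GEN and DEFIN = union of the
    predecessors' DEFOUT until nothing changes.

    gen, kill: dict block -> set of definition numbers
    preds:     dict block -> list of predecessor blocks
    """
    blocks = list(gen)
    defin = {b: set() for b in blocks}
    defout = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_out = (defin[b] - kill[b]) | gen[b]
            if new_out != defout[b]:
                defout[b] = new_out
                changed = True
        for b in blocks:
            defin[b] = set().union(*(defout[p] for p in preds[b]))
    return defin, defout


# GEN/KILL from the four-block example; predecessor arcs are an assumption.
gen = {1: {1, 2}, 2: {3, 4}, 3: set(), 4: {5}}
kill = {1: {3, 5}, 2: {1}, 3: set(), 4: {2}}
preds = {1: [3], 2: [1], 3: [2], 4: [3]}
defin, defout = reaching_definitions(gen, kill, preds)
```

With these inputs the loop stabilises at DEFIN[1] = {2, 3, 4} and DEFOUT[4] = {3, 4, 5}, the values in the last column of the example table.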
Example
B1: 1. read(x)   2. read(y)        GEN[1] = {1, 2}, KILL[1] = {3, 5}
B2: 3. x := x + y   4. z := 10.0   GEN[2] = {3, 4}, KILL[2] = {1}
B3: test on x and z                GEN[3] = KILL[3] = ∅
B4: 5. y := x * z                  GEN[4] = {5}, KILL[4] = {2}

DEFOUT[i] = (DEFIN[i] − KILL[i]) ∪ GEN[i]
DEFIN[i] = ∪ DEFOUT[j] over all Bj that are predecessors of Bi

iteration    0    1      2        3        4
DEFIN[1]     ∅    ∅      3,4      2,3,4    2,3,4
DEFOUT[1]    ∅    1,2    1,2      1,2,4    1,2,4
DEFIN[2]     ∅    1,2    1,2      1,2,4    1,2,4
DEFOUT[2]    ∅    3,4    2,3,4    2,3,4    2,3,4
DEFIN[3]     ∅    3,4    2,3,4    2,3,4    2,3,4
DEFOUT[3]    ∅    ∅      3,4      2,3,4    2,3,4
DEFIN[4]     ∅    ∅      3,4      2,3,4    2,3,4
DEFOUT[4]    ∅    5      5        3,4,5    3,4,5

BIT-VECTOR IMPLEMENTATION
Universal set: {1, 2, …, N} (e.g. {A, B, …, Z}); S ⊆ universal set
S is stored as bits 1, 2, …, i, …, N, where bit i is true iff i ∈ S

const N = ?;
type SET = packed array [1..N] of boolean;

procedure UNION(A, B : SET; var C : SET);
var i : integer;
begin
  for i := 1 to N do
    C[i] := A[i] or B[i]
end;

MEMBER, INSERT, DELETE: O(1)
MAKENULL, ASSIGN, EQUAL, UNION, DIFFERENCE, INTERSECTION, EMPTY: O(N)
Linked-list implementation
Most general: the size is unlimited.
Efficient if the sets are ordered by '<'; in that case, a set is represented as a sorted list a1, a2, …, an where a1 < a2 < … < an
Unsorted:
MAKENULL, EMPTY: O(1)
INSERT, MEMBER, DELETE, ASSIGN: O(n)
EQUAL, UNION, INTERSECTION, DIFFERENCE: O(nm)
Sorted:
MAKENULL, EMPTY: O(1)
INSERT, MEMBER, DELETE, ASSIGN, EQUAL, UNION, INTERSECTION, DIFFERENCE: O(n) or O(n+m)
Can be improved to O(log n) if balanced search trees are used.
ADT Dictionary
SET with INSERT, DELETE, MEMBER, and MAKENULL
Example
Dean's list database
program deanlist(input, output);
type name = packed array [1..20] of char;
     grade = -1..12;
var student : name;
    average : grade;
    database : DICTIONARY (of names);
begin
  MAKENULL(database);
  readln(student, average);
  while student ≠ '' do begin
    case average of
      10..12 : INSERT(student, database);
      0..9   : DELETE(student, database);
      -1     : if MEMBER(student, database)
                 then writeln('yes')
                 else writeln('no')
    end; // case //
    readln(student, average)
  end
end.
A modified dictionary
type
  elementtype = record
    key : keytype;
    data : datatype
  end;
Then the operations are MAKENULL, INSERT, DELETE and
QUERY(x : keytype) : datatype;
INSERT((key, data), dictionary)
DELETE(key, dictionary)
QUERY(key, dictionary)
Implementation of dictionary
1. Bit-vector, if the universal set is {1, 2, …, N}
   INSERT, DELETE, MEMBER: O(1)
2. Sorted or unsorted list: O(n)
3. Array (of some constant size)
   sorted (if the set is ordered): INSERT, DELETE: O(n); MEMBER: O(log n)
   unsorted: INSERT, DELETE, MEMBER: O(n)
type
  DICTIONARY = record
    last : 0..max+1;
    data : array [1..max] of elementtype
  end;

procedure MAKENULL(var A : DICTIONARY); // O(1) //
begin
  A.last := 0
end;

function MEMBER(x : elementtype; var A : DICTIONARY) : boolean; // O(n) //
var i : integer;
begin
  for i := 1 to A.last do
    if A.data[i] = x then return(true);
  return(false)
end;

procedure INSERT(x : elementtype; var A : DICTIONARY); // O(n) //
begin
  if not MEMBER(x, A) then
    if A.last < max then begin
      A.last := A.last + 1;
      A.data[A.last] := x
    end
    else error('full')
end;

procedure DELETE(x, A);
var i : integer;
begin
  find the i s.t. A.data[i] = x or i > A.last; // O(n) //
  if A.data[i] = x then A.data[i] := …
Hashing: O(1) time per operation on average
INSERT, DELETE, MEMBER
Universal set: {a1, a2, …, an}
To represent a set S: put ai in cell i if ai ∈ S
[Figure: table with one cell for each element a1, a2, a3, a4, …, an−1, an of the universal set]
O(1) time if rank(ai) = i can be computed in O(1) time
Generally, partition the elements into groups and let all elements in a group share a cell.
O(1) time if h(x) = i (for x in group i) can be computed in O(1) time.
Perfect if elements from the same group do not occur simultaneously!
Good if it is unlikely that two elements from the same group occur simultaneously!
Okay if not TOO MANY elements from the same group occur simultaneously!
Some hash functions: i mod p; sum of digits, e.g. h(135) = 9
Hashing
Goal: O(1) per operation on average for INSERT, DELETE, MEMBER
Pr(time > C) << 1.0
Open hashing (buckets)
Partition the elements into B classes
Hashing function: h(x) = i if x ∈ class i
[Figure: bucket table with headers 0, 1, …, B−1; each header points to the list of elements in that bucket]
Avg time = 1 + N/B per operation
If N ≤ C·B, avg time ≤ 1 + C
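Open hashing can be sketched as a table of B Python lists. The class name and the use of the built-in `hash` as h(x) are assumptions of this sketch, not something the notes prescribe.

```python
class OpenHashSet:
    """Set with INSERT, DELETE, MEMBER kept in B buckets (lists)."""

    def __init__(self, buckets=8):
        self.table = [[] for _ in range(buckets)]

    def _bucket(self, x):
        # h(x) = hash(x) mod B selects the class (bucket) of x
        return self.table[hash(x) % len(self.table)]

    def member(self, x):
        return x in self._bucket(x)

    def insert(self, x):
        bucket = self._bucket(x)
        if x not in bucket:
            bucket.append(x)

    def delete(self, x):
        bucket = self._bucket(x)
        if x in bucket:
            bucket.remove(x)
```

Each operation scans only one bucket, so with N elements spread evenly over B buckets it touches about N/B entries.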
Closed Hashing
[Figure: bucket table with cells 0, 1, …, B−1]
INSERT(x): x is placed in bucket h1(x)
If bucket h1(x) is already taken (collision), then try bucket h2(x) (rehashing)
If bucket h2(x) is taken, then try bucket h3(x), …
MEMBER(x): try buckets h1(x), h2(x), … until x is found or an empty bucket is met
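A small Python sketch of closed hashing follows. The notes leave the rehash functions h2, h3, … open; linear rehashing, h_i(x) = (h(x) + i − 1) mod B, is assumed here purely for illustration, and DELETE is omitted because it needs an extra "deleted" marker that the notes do not discuss.

```python
class ClosedHashSet:
    """Closed hashing: every element lives in the bucket table itself."""

    EMPTY = object()  # sentinel marking a bucket that was never used

    def __init__(self, buckets=11):
        self.table = [self.EMPTY] * buckets

    def _probe(self, x):
        # Yields h1(x), h2(x), ...: linear rehashing (an assumption)
        b = len(self.table)
        start = hash(x) % b
        for i in range(b):
            yield (start + i) % b

    def insert(self, x):
        for pos in self._probe(x):
            if self.table[pos] is self.EMPTY:
                self.table[pos] = x
                return
            if self.table[pos] == x:
                return  # already present
        raise OverflowError("bucket table is full")

    def member(self, x):
        for pos in self._probe(x):
            if self.table[pos] is self.EMPTY:
                return False  # an empty bucket ends the search
            if self.table[pos] == x:
                return True
        return False
```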
Example 2
Sorting using a priority queue
Key ≡ priority
Pool: priority queue
procedure PQSort(var A : array [1..n] of …);
var pool : PRIORITY QUEUE of …;
    i : integer;
begin
  MAKENULL(pool);
  for i := 1 to n do
    INSERT(A[i], pool);
  for i := 1 to n do
    A[i] := DELETEMIN(pool)
end;
Obs: if INSERT and DELETEMIN are O(log n), then PQSort is O(n log n)
Previous implementations of sets:
Bit-vector: O(N) DELETEMIN
Array: O(n) INSERT & DELETEMIN
Linked list, unsorted: O(n) DELETEMIN
Linked list, sorted: O(n) INSERT
Hashing: O(n) DELETEMIN
Solution: heap (partially ordered tree in [AHU])
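[Figure: a heap drawn as a complete binary tree with parent ≤ child (root 3 with children 5 and 9), stored level by level in an array]
DELETEMIN:
[Figure: the root is removed, the last leaf is moved to the root and pushed down by swapping it with its smaller child until parent ≤ child holds again]
Generally, time = O(depth of tree) = O(log n)   (2^depth ≤ n < 2^(depth+1))
INSERT(4, heap):
[Figure: 4 is added as the last leaf and swapped upward with its parent, first 8 and then 5, until it becomes a child of the root 3]
time = O(depth) = O(log n)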
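The two heap operations, and PQSort on top of them, can be sketched in Python as follows. The sketch uses 0-based list indices, so the parent of position i is (i − 1) // 2 rather than ⌊i/2⌋; the function names are chosen for this sketch.

```python
def heap_insert(heap, x):
    """Add x as the last leaf, then swap it upward while parent > child."""
    heap.append(x)
    i = len(heap) - 1
    while i > 0 and heap[(i - 1) // 2] > heap[i]:
        parent = (i - 1) // 2
        heap[parent], heap[i] = heap[i], heap[parent]
        i = parent


def heap_deletemin(heap):
    """Remove and return the root; the last leaf is pushed down from the root."""
    smallest = heap[0]
    last = heap.pop()
    if heap:
        heap[0] = last
        i, n = 0, len(heap)
        while True:
            child = 2 * i + 1
            if child >= n:
                break
            if child + 1 < n and heap[child + 1] < heap[child]:
                child += 1  # pick the smaller child
            if heap[i] <= heap[child]:
                break
            heap[i], heap[child] = heap[child], heap[i]
            i = child
    return smallest


def pq_sort(items):
    """PQSort: INSERT everything, then DELETEMIN n times."""
    heap = []
    for x in items:
        heap_insert(heap, x)
    return [heap_deletemin(heap) for _ in range(len(items))]
```

Inserting 4 into the heap 3 5 9 6 8 9 10 10 18 9 swaps it with 8 and then with 5, as in the figure described above.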
A linear order '<' is a relation on elements such that:
(i) for any two elements a and b, exactly one of a < b, a > b, or a = b holds
(ii) a < b and b < c ⇒ a < c
A set is ordered if a linear order '<' on its members exists, e.g. sets of integers, reals, character strings (by lexicographical order).
Note: the order in which elements appear in a set representation is unimportant, e.g. {1, 3, 4} = {3, 1, 4} = {4, 3, 1}
A sorted list: a1, a2, a3, a4, …, an−1, an
Representing ordered sets: binary search trees
Elements are ordered by '<'
Interested in operations:
MAKENULL, INSERT, MEMBER, DELETE, MIN
Previous implementations:
sorted linked list: MEMBER: O(n)
sorted array: INSERT, DELETE: O(n)
Solution: binary search tree
left subtree < parent < right subtree
[Figure: binary search tree on the keys 10, 15, 16, 17, 20, 25, 28, 30, 45]
Pascal Implementation
type
  elementtype = record
    key : real;
    data : datatype
  end;
  nodetypes = (leaf, interior);
  twothreenode = record
    case kind : nodetypes of
      leaf : (element : elementtype);
      interior : (first, second, third : ↑twothreenode;
                  lowofsecond, lowofthird : real)
  end;
  SET = ↑twothreenode;
Need parent : ↑twothreenode?
2-3 tree: 3-way B-tree
AVL tree (Adelson-Velskii, Landis)
Balanced binary search tree [HS pp. 436-452]
[Figure: AVL tree whose left and right subtrees have depths dL and dR]
AVL tree: at every node, |dL − dR| ≤ 1
Empty trees and single nodes are also AVL trees.
An AVL tree is also called a height-balanced (or depth-balanced) binary tree.
Balance factor: BF = dL − dR, so BF is 1, 0 or −1 at every node
[Figure: example AVL tree on the keys 2, 7, 8, 10, 12, 15, 19 with the balance factor of each node]
nd: minimum number of nodes in an AVL tree with depth d.
Fact: n0 = 1, n1 = 2, nd = nd−1 + nd−2 + 1
Similar to the Fibonacci numbers Fd = Fd−1 + Fd−2
nd ≥ Fd ≈ c^d / √5, where c = (1 + √5)/2 > 1
d ≤ log_c nd + log_c √5
depth: O(log n)
MEMBER, INSERT, DELETE: O(log n)
Sets with MERGE and FIND
MERGE(A,B,C): if A ∩ B = ∅ then C := A ∪ B
environment = {A1, A2, …, Am}
FIND(x): the unique Ai s.t. x ∈ Ai
Example
Equivalence problem:
Equivalence relation '≡' on set S
1. a ≡ a (reflexivity)
2. a ≡ b ⇒ b ≡ a (symmetry)
3. a ≡ b, b ≡ c ⇒ a ≡ c (transitivity)
e.g. congruence modulo k: i ≡k j iff (i − j) mod k = 0
Equivalence classes:
S = S1 ∪ S2 ∪ S3 ∪ …
s.t. a, b ∈ Si ⇒ a ≡ b
a ∈ Si, b ∈ Sj, i ≠ j ⇒ a ≢ b
e.g. {0, k, 2k, …}, {1, k+1, 2k+1, …}, …, {k−1, 2k−1, …}
S = {a1, a2, a3, a4, a5, a6, a7}
Fortran: EQUIVALENCE a_i1 ≡ a_i2, a_i3 ≡ a_i4, …
start      {a1} {a2} {a3} {a4} {a5} {a6} {a7}
a1 ≡ a2    {a1, a2} {a3} {a4} {a5} {a6} {a7}
a5 ≡ a6    {a1, a2} {a3} {a4} {a5, a6} {a7}
a3 ≡ a5    {a1, a2} {a3, a5, a6} {a4} {a7}
a4 ≡ a7    {a1, a2} {a3, a5, a6} {a4, a7}
To process ai ≡ aj:
A := FIND(ai); B := FIND(aj);
MERGE(A, B, A);
MAKENULL(B);
U = {a1, a2, …, an} = A1 ∪ A2 ∪ … ∪ Am: a partition
ADT MFSET; A1, A2, …, Am are the components
1. MERGE(A,B): A := A ∪ B or B := A ∪ B
2. FIND(x)
3. INITIAL(A,x): A := {x}
A simple implementation (element-based)
type
  set-id-type = integer;
  membertype = 1..n;
  MFSET = array [membertype] of set-id-type;
Example: U = {1, 2, …, 12} = {2, 3, 6} ∪ {1, 4, 9} ∪ {5, 8, 12} ∪ {7, 10, 11}
x      1  2  3  4  5  6  7  8  9  10  11  12
C[x]   2  1  1  2  3  1  4  3  2  4   4   3
function FIND(x : 1..n; var C : MFSET) : set-id-type; // O(1) //
begin
  FIND := C[x]
end;
procedure MERGE(A, B : integer; var C : MFSET); // A := A ∪ B, O(n) //
var x : 1..n;
begin
  for x := 1 to n do
    if C[x] = B then
      C[x] := A
end;
With a minor improvement (using member lists), n merges can be done in O(n log n) time.
A tree implementation component-based
A = {1, 2, 3, 4}, B = {5, 6}, C = {7}
[Figure: each component is a tree whose nodes point to their parents; MERGE(A, B) makes the root of B a child of the root of A]
MERGE(A, B): time = O(1)
FIND(x): O(depth)
(*) Weight rule: if we always merge the smaller tree into the larger tree, then depth ≤ log2 n.
The root must contain the weight of the tree.
Path compression
[Figure: after FIND(6), every node on the path from 6 to the root becomes a child of the root]
With path compression only:
n consecutive FINDs: O(n) time
n intermixed FINDs and MERGEs: O(n log n)
With both path compression and rule (*):
n intermixed FINDs and MERGEs: O(n α(n))
α(n) = the least m s.t. n ≤ A(m, m): the pseudo-inverse of Ackermann's function
In practice, α(n) ≤ 4, since A(4, 4) is a tower 2^2^…^2 of 65536 twos.
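The tree implementation with both the weight rule and path compression can be sketched in Python as below. The class name `MFSet` and the array layout (a parent index per element, a weight per root) are assumptions of this sketch.

```python
class MFSet:
    """MERGE-FIND sets over the elements 1..n, stored as parent-pointer trees."""

    def __init__(self, n):
        self.parent = list(range(n + 1))  # index 0 is unused
        self.weight = [1] * (n + 1)       # meaningful only at roots

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: hang every node on the path directly below the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def merge(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        # Weight rule: the smaller tree becomes a child of the larger one.
        if self.weight[ra] < self.weight[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.weight[ra] += self.weight[rb]
        return ra
```

Processing the equivalences a1 ≡ a2, a5 ≡ a6, a3 ≡ a5, a4 ≡ a7 from the earlier example with `merge` leaves three components.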
Ordered Sets with MERGE, FIND, SPLIT
SPLIT(S, S1, S2, x):
S1 := {a | a ∈ S and a < x}
S2 := {a | a ∈ S and a ≥ x}
Longest Common Subsequence Problem
Sequence = string e.g. abcdaaa
A subsequence of a sequence x is obtained by removing zero or more (not necessarily contiguous) characters from x,
e.g. ab, aaaa, ada are subsequences of the above sequence.
Longest common subsequence of x and y: a longest sequence that is a subsequence of both x and y,
e.g. x = 1 2 1 4 3 2 1
     y = 2 5 1 3 4 1 2 1
21421 is an LCS; 21321 is another one.
Applications: UNIX diff, DNA analysis, etc.
Solutions: x = x1 x2 … xn, y = y1 y2 … ym, |x| = n, |y| = m
1. O(nm): dynamic programming
2. O(p log n), where p is the size of {(i, j) | 1 ≤ i ≤ n, 1 ≤ j ≤ m, and xi = yj}
worst case: p = O(mn)
in practice: p = O(m+n)
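The O(nm) dynamic-programming solution (the first of the two listed) can be sketched in Python as follows. The table name `L` and the backtracking step that recovers one LCS are choices made for this sketch.

```python
def lcs(x, y):
    """Return one longest common subsequence of the strings x and y."""
    n, m = len(x), len(y)
    # L[i][j] = length of an LCS of x[:i] and y[:j]
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Walk back from L[n][m] to read off one LCS.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

For x = 1214321 and y = 25134121 the result has length 5, like the two LCSs quoted above; which LCS is returned depends on how ties are broken.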
Key idea
Input: A = a1 a2 … an, B = b1 b2 … bm
To find |LCS(A, B)|:
for j := 1 to m do
  find |LCS(a1 … ai, b1 … bj)| for each i
Def. Sk = {i | |LCS(a1 … ai, b1 … bj)| = k}
Example: A = 1 2 1 4 3 2 1
         B = 2 5 1 3 4 1 2 1
j   S0    S1              S2            S3          S4 … S7
1   {1}   {2,3,4,5,6,7}   ∅             ∅           ∅
2   {1}   {2,3,4,5,6,7}   ∅             ∅           ∅
3   ∅     {1,2}           {3,4,5,6,7}   ∅           ∅
4   ∅     {1,2}           {3,4}         {5,6,7}     ∅
5   ∅     {1,2}           {3}           {4,5,6,7}   ∅
6   ∅     …
7   ∅     …
8   ∅     …
Def. PLACES(a) = {i | 1 ≤ i ≤ n, ai = a}
All PLACES(a) can be obtained in O(n) time, assuming the alphabet is finite. If not, O(n log n).
e.g. PLACES(a) = i1, i2, …, ik with i1 > i2 > … > ik
[Figure: array PLACES[a] of lists, located by hashing]
Intuitive fact: in iteration j (i.e. when considering bj), new matches happen at the positions PLACES(bj) in A. These matches may move a position from Sk to Sk+1.
Rule: in iteration j, move r from Sk to Sk+1 iff
1. ar = bj (i.e., r ∈ PLACES(bj))
2. r − 1 ∈ Sk
[Figure: a1 … ar−1 ar … aligned with b1 … bj−1 bj …]
procedure LCS;
begin
(1) initialize S0 = {0, 1, 2, …, n} and Si = ∅ for i = 1, 2, …, n;
(2) for j := 1 to m do
      // compute the Sk's for position j //
(3)   for r in PLACES(bj) do begin
(4)     k := FIND(r);
(5)     if k = FIND(r − 1) then begin
(6)       SPLIT(Sk, Sk, S'k, r);
(7)       MERGE(S'k, Sk+1, Sk+1)
        end
      end
end;
Obs: if FIND, MERGE, SPLIT can be done in O(log n), then the total time is
O(Σ_{j=1..m} |PLACES(bj)| · log n) = O(p log n)
Data structure for sets S0, S1, …, Sn
2-3 trees!!!
FIND(r): O(depth) = O(log n)
MERGE(S'k, Sk+1, Sk+1):
[Figure: the 2-3 trees for S'k and Sk+1 are joined into the new Sk+1 by attaching the shorter tree at the matching level of the taller one]
Similar to INSERT, with repair
Takes O(log n) time
SPLIT( )
[Figure: a 2-3 tree with leaves 6 7 8 9 10 11 12 is split at r = 9 into trees with leaves 6 7 8 and 9 10 11 12]
time = O(log n)
Graphs: A Math Model
[Figure: highway map: Waterloo, Toronto, Hamilton and Niagara Falls joined by Hwy 401, Hwy 6, Hwy 403 and the QEW]
[Figure: Flight Map (Imaginary): Toronto, Minneapolis, London, Miami, New Orleans, Chicago, New York]
[Figure: the relations KNOW and friends among Bob, Mary, Alex and Mark]
Misc: state transition diagrams
Directed Graphs (Digraphs)
V1 = {1, 2, 3, 4, 5}
E1 = {(1,2), (1,3), (2,3), (3,4), (4,5), (4,1), (5,1), (5,4)}
G1 = (V1, E1)
[Figure: drawing of the digraph G1]
A digraph G = (V, E)
V: set of vertices/nodes
E: set of arcs/directed edges
The arc from vertex v to vertex w: v → w or (v,w), v ≠ w
v is the tail, w is the head; w is adjacent to v
|V| = n, |E| ≤ n(n−1) = O(n^2)
A path: v1, v2, …, vm s.t. the arcs (v1,v2), (v2,v3), …, (vm−1,vm) exist.
Length of the path: m − 1
The path passes through v2, v3, …, vm−1.
The path is simple if all vertices on the path are distinct, except possibly the first and last.
(Simple) cycle: a (simple) path of length at least one that begins and ends at the same vertex.
e.g. 1, 2, 3, 4, 1, 3 is a path
1, 2, 3, 4, 5 is a simple path
1, 2, 3, 4, 5, 1 is a simple cycle
1, 2, 3, 4, 1, 3, 4, 1 is a cycle, but not a simple one
Labelled digraph
[Figure: digraph with arcs labelled a and b, together with the label strings bb, abab, abbaaabaa, …]
When the labels are numbers, the digraph is also called a network or weighted digraph.
Representation of digraphs
1. List of edges, e.g. (1,2), (1,3), (2,3), …
2. Adjacency matrix
G = (V,E), V = {1, 2, …, n}
The adjacency matrix for G is an n × n boolean matrix A with
A[i,j] = true (1) if (i,j) ∈ E, false (0) otherwise
Space: O(n^2) even if |E| << n^2
3. Adjacency list
[Figure: adjacency lists of G1: 1: 2, 3; 2: 3; 3: 4; 4: 1, 5; 5: 1, 4]
Space: O(|E|); to decide if i → j, we need O(n) time
ADT DIGRAPH
Single source shortest paths problem:
Given G = (V,E) and a source vertex.
[Figure: digraph on vertices 1 to 6 with arc costs]
Labels (costs) must be ≥ 0; cost(2,1) = +∞
Cost(v1, v2, …, vn) = Σ_{i=1..n−1} cost(vi, vi+1)
We need to determine the cost of the shortest path from the source to every other vertex,
e.g. source = 1:
to         2    3    4    5    6
min cost   70   60   40   10   30
Dijkstra’s algorithm
Source vertex = 1; G = (V,E), V = {1, 2, …, n}
Distance D(i) = cost of the shortest path from 1 to i
Let S ⊆ V be a set of vertices.
D_S(i) = cost of the shortest path from 1 to i that only passes through vertices in S
S is called a restriction set.
[Figure: arcs 1 → 3 of cost 10, 1 → 2 of cost 4 and 2 → 3 of cost 5; D(3) = 9, but D_S(3) = 10 if S = {1}]
Fact: D_V(i) = D(i) and D_∅(i) = cost(1, i)
Idea: Let S ⊆ V be some set s.t. 1 ∈ S.
Suppose we know D_S(i) for each i ∈ V. Then we can enlarge S as follows:
1. choose w ∈ V − S such that D_S(w) is the minimum
2. S := S ∪ {w}
3. D_S(i) := min(D_S(i), D_S(w) + cost(w, i))
[Figure: S, the chosen vertex w and a vertex i outside S; D_S(x) ≤ D_S(w) for any x ∈ S]
Algorithm (D[i] ≡ D_S(i))
begin
  S := {1};
  for i := 1 to n do
    D[i] := cost(1, i);
  for i := 1 to n−1 do begin
    find w in V − S s.t. D[w] is a minimum;
    S := S ∪ {w};
    for j := 2 to n do
      D[j] := min(D[j], D[w] + cost(w, j))
  end
end;
Obs. D[i] = D(i) if i ∈ S. Thus there is no need to update D[i] if i ∈ S.
Example
[Figure: Dijkstra's algorithm run on a 6-vertex digraph, starting from S = {1}; each panel shows the current values D[i], and D_S(i) = D(i) for i ∈ S]
procedure Dijkstra;
// C[i,j] = cost(i,j) //
begin
  S := {1};
  for i := 2 to n do
    D[i] := C[1, i];
  for i := 2 to n−1 do begin
    1. find a w in V − S such that D[w] is a minimum;
       S := S ∪ {w};
    2. for each vertex v in V − S do
         D[v] := min(D[v], D[w] + C[w, v])
  end
end;
How to recover the shortest paths?
Time = O(n^2)
With adjacency lists of costs [Figure: list for w holding pairs (v1, c1), …] and a priority queue for V − S, time: O(|E| log n)
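The priority-queue variant, together with one way of recovering the paths, can be sketched in Python as follows. The adjacency-list format (`adj[w]` is a list of `(v, cost)` pairs) and the `prev` table are assumptions of this sketch; the three-vertex graph in the usage lines is the small illustration with D(3) = 9, with its arcs taken as assumed.

```python
import heapq


def dijkstra(adj, source):
    """Shortest-path costs from source; all arc costs must be >= 0.

    adj maps every vertex to a list of (neighbour, cost) pairs.
    Returns (dist, prev), where prev[v] is the vertex before v on a
    shortest path, or None.
    """
    dist = {v: float("inf") for v in adj}
    prev = {v: None for v in adj}
    dist[source] = 0
    done = set()            # the set S of finished vertices
    queue = [(0, source)]   # priority queue for V - S
    while queue:
        d, w = heapq.heappop(queue)
        if w in done:
            continue        # stale queue entry
        done.add(w)
        for v, cost in adj[w]:
            if d + cost < dist[v]:
                dist[v] = d + cost
                prev[v] = w
                heapq.heappush(queue, (dist[v], v))
    return dist, prev


def path_to(prev, v):
    """Follow prev back from v to the source."""
    path = []
    while v is not None:
        path.append(v)
        v = prev[v]
    return path[::-1]


adj = {1: [(2, 4), (3, 10)], 2: [(3, 5)], 3: []}
dist, prev = dijkstra(adj, 1)
```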
All-Pairs Shortest Paths Problem
Given: a digraph with nonnegative arc costs
Goal: for each pair v, w of vertices find the cost of the shortest path from v to w.
Application: construction of shortest flying time table
Solution 1: repeat Dijkstra's algorithm with source = 1, 2, …, n
time: O(n^3) or O(n |E| log n)
Solution 2: Floyd's algorithm
Let D(i,j) and D_S(i,j) be as before:
D(i,j): distance from i to j
D_S(i,j): distance from i to j under restriction S.
Floyd's idea:
Let Sk = {1, 2, …, k}, 0 ≤ k < n
Suppose D_Sk(i,j) is known for all 1 ≤ i, j ≤ n.
Then, with Sk+1 = {1, 2, …, k+1},
D_Sk+1(i,j) = min( D_Sk(i,j), D_Sk(i,k+1) + D_Sk(k+1,j) ) for all 1 ≤ i, j ≤ n
Thus, we compute
D_S0(i,j), D_S1(i,j), …, D_Sn(i,j)
where D_S0(i,j) = cost(i,j) and D_Sn(i,j) = D(i,j)
In the following procedure, A is an n×n matrix;
A[i,j] = D_Sk(i,j) after the k-th iteration.
procedure Floyd(var A : array [1..n, 1..n] of real; C : …);
var i, j, k : integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      A[i,j] := C[i,j]; // A = D_S0 //
  for i := 1 to n do
    A[i,i] := 0;
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        if A[i,k] + A[k,j] < A[i,j] then
          A[i,j] := A[i,k] + A[k,j] // A[i,j] := min(A[i,j], A[i,k] + A[k,j]) //
end;
time = O(n^3)
164
Recovering the paths
Use an n×n matrix P.
Initially, P[i,j] := 0 for 1 ≤ i, j ≤ n.
In procedure Floyd, insert the line that sets P[i,j]:
if A[i,k] + A[k,j] < A[i,j] then begin
  A[i,j] := A[i,k] + A[k,j];
  P[i,j] := k
end
Meaning: the shortest path from i to j passes through vertex k.
procedure path(i, j : integer);
// print a shortest path from i to j //
var k : integer;
begin
  k := P[i,j];
  if k ≠ 0 then begin // the path is not direct //
    path(i, k);
    writeln(k);
    path(k, j)
  end
end;
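Floyd's algorithm with the matrix P can be sketched in Python as follows. The sketch uses 0-based vertex numbers, `float("inf")` for missing arcs, and `None` (instead of 0) in P to mean "direct arc"; these conventions are assumptions of the sketch.

```python
INF = float("inf")


def floyd(cost):
    """All-pairs shortest path costs for an n x n cost matrix.

    Returns (a, p): a[i][j] is the shortest distance from i to j, and
    p[i][j] is an intermediate vertex on such a path (None if direct).
    """
    n = len(cost)
    a = [row[:] for row in cost]
    p = [[None] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if a[i][k] + a[k][j] < a[i][j]:
                    a[i][j] = a[i][k] + a[k][j]
                    p[i][j] = k
    return a, p


def path(p, i, j):
    """Intermediate vertices of a shortest path from i to j, in order."""
    k = p[i][j]
    if k is None:
        return []
    return path(p, i, k) + [k] + path(p, k, j)


cost = [[0, 4, 10],
        [INF, 0, 5],
        [INF, INF, 0]]
a, p = floyd(cost)
```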
Transitive closure of adjacency matrix
Given: digraph G = (V,E) represented by adjacency matrix C
Goal: for each pair i, j, decide whether there exists a path from i to j
A[i,j] = 1 (true) if there is a path from i to j, 0 (false) otherwise, for 1 ≤ i, j ≤ n
A is called the transitive closure of C.
Solution 1: Use Floyd's algorithm.
Initialize A[i,j] = +∞ if C[i,j] = 0, A[i,j] = 1 otherwise.
At the end, set A[i,j] = 1 if A[i,j] ≠ +∞, and 0 if A[i,j] = +∞.
Solution 2: Simplified Floyd's algorithm: Warshall's algorithm
In iteration k: A[i,j] := A[i,j] or (A[i,k] and A[k,j])
procedure Warshall(var A : array [1..n, 1..n] of boolean; C : …);
var i, j, k : integer;
begin
  for i := 1 to n do
    for j := 1 to n do
      A[i,j] := C[i,j];
  for k := 1 to n do
    for i := 1 to n do
      for j := 1 to n do
        if A[i,j] = false then
          A[i,j] := A[i,k] and A[k,j]
end;
time = O(n^3)
A = C + C·C + C·C·C + … + C^(n−1)
'·': boolean multiplication, i.e. and
It is known that C·C can be done in O(n^2.376).
To obtain A, compute C^2, C^4, C^8, …, C^(n−1): log2 n multiplications
time: O(log n · n^2.376) < O(n^3)
167
Undirected graphs
G = (V,E)
E = {(1,2), (2,3), (3,4), (4,5), (5,1), (1,3), (2,5)}
(u,v) and (v,u) denote the same edge.
(u,v) is incident upon u and v.
v1, v2, …, vn is a path if the edges (v1,v2), (v2,v3), …, (vn−1,vn) exist.
The path v1, v2, …, vn connects v1 and vn.
Definitions for simple path and cycle are the same as for digraphs, except that a cycle must have length ≥ 3.
G1 = (V1,E1) is a subgraph of G2 = (V2,E2) if V1 ⊆ V2 and E1 ⊆ E2.
If E1 contains all edges (u,v) in E2 such that u, v ∈ V1, G1 is called an induced subgraph of G2.
[Figure: the graph G on vertices 1 to 5, with a subgraph and an induced subgraph on vertices 1, 2, 3]
Graph G is connected if every pair of G's vertices is connected by some path.
Connected component of G: a maximal connected induced subgraph of G
G is cyclic if G contains at least one cycle.
G is acyclic if G doesn't contain any cycles.
Free tree: a connected acyclic graph
Fact: 1. Every n-node free tree has n−1 edges.
2. If we add any edge to a free tree, we get a cycle.
Claim: If n > 1, there must be a vertex with degree (i.e., number of edges incident upon the vertex) = 1.
Proof of claim
Let G be a free tree with > 1 node.
Suppose that G's nodes all have degree > 1.
[Figure: a walk v1, v2, v3, …, vi, vi+1, vi+2, …, vj that can always be continued along a new edge and so must return to a vertex already visited]
∴ a cycle exists. A contradiction!!
Proof of (1): true if n = 1.
Suppose (1) is true for n = k.
Let G = (V,E) be a (k+1)-node free tree.
Let u be a vertex of degree 1 and (u,w) be its only incident edge.
G' = (V − {u}, E − {(u,w)}) is a free tree.
By the induction hypothesis, G' has k−1 edges.
∴ G has k edges.
Proof of (2): if there is no cycle, then the graph is still a free tree, but its number of edges = n.
Contradiction!!!
Representation
Adjacency matrix: symmetric, i.e. entry i,j = entry j,i
Adjacency list: redundancy, i.e. if edge (u,v) exists, then u is on the list for v and v is on the list for u.
Minimum-cost spanning tree
G = (V,E) is connected. Each edge (u,v) ∈ E has a cost C(u,v) (= C(v,u)).
A spanning tree of G is a subgraph of G which is a free tree connecting all vertices in V.
The cost of a spanning tree is the sum of the costs of the edges in the tree.
[Figure: a connected graph with edge costs and one of its spanning trees]
The MST Property:
Let G = (V,E) be a connected graph and U ⊂ V a proper subset of V.
If (u,v) is an edge of the lowest cost s.t. u ∈ U and v ∈ V − U, then there is a minimum-cost spanning tree that includes (u,v) as an edge.
[Figure: U and V − U joined by the edges (u,v) and (u',v'), with C(u,v) ≤ C(u',v')]
procedure Prim(G : graph; var T : set of edges);
// constructs a minimum-cost spanning tree T //
var U : set of vertices;
    u, v : vertex;
begin
  T := ∅;
  U := {1};
  while U ≠ V do begin
    find a lowest cost edge (u,v) s.t. u ∈ U and v ∈ V − U;
    T := T ∪ {(u,v)};
    U := U ∪ {v}
  end
end;
An example:
[Figure: Prim's algorithm on a 5-vertex graph with edge costs; U grows as {1}, {1,2}, {1,2,5}, {1,2,3,5}, …]
Kruskal's algorithm
procedure Kruskal(G : graph; var T : set of edges);
var u, v : vertex;
    E' : set of edges;
begin
  E' := E;
  T := ∅;
  while E' ≠ ∅ do begin
    find a lowest cost edge (u,v) in E';
    E' := E' − {(u,v)};
    if u and v are not in the same connected component (w.r.t. T) then
      T := T ∪ {(u,v)}
  end
end;
Testing the components: K1 := FIND(u); K2 := FIND(v); if K1 ≠ K2 then MERGE(K1, K2) … α(n) time
E': PRIORITY QUEUE; components of T: MFSET
Time: O(e log e), e = |E|
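Kruskal's algorithm can be sketched in Python as follows. Sorting the edge list stands in for the priority queue E', and a small parent array with path halving stands in for the MFSET; both are simplifications made for this sketch, and the four-vertex graph in the usage lines is hypothetical.

```python
def kruskal(n, edges):
    """Minimum-cost spanning tree of a connected graph on vertices 1..n.

    edges is a list of (cost, u, v) triples. Returns the chosen edges
    as (u, v, cost) triples, in the order they were added.
    """
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for cost, u, v in sorted(edges):      # lowest cost edge first
        ku, kv = find(u), find(v)
        if ku != kv:                      # different components: keep the edge
            parent[kv] = ku
            tree.append((u, v, cost))
    return tree


# Hypothetical graph: (cost, u, v)
edges = [(1, 1, 2), (4, 1, 3), (3, 2, 3), (2, 3, 4), (5, 2, 4)]
mst = kruskal(4, edges)
```

Here the edges (1,2), (3,4) and (2,3) are added, while (1,3) and (2,4) are discarded because their endpoints are already connected.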
Example:
[Figure: Kruskal's algorithm on the same 5-vertex graph]
add (3,5) to T
add (4,5), discard (3,4)
add (2,5), discard (2,3)
add (1,2), discard (1,3), (1,5)
Graph Traversal and Search
Digraphs: depth-first search
Go as far as you can following the arcs!
type
  digraph = array [1..n] of adjacency list;
  vertex = 1..n;
var
  v : vertex;
  mark : array [vertex] of (visited, unvisited);

// main program: O(e) //
for v := 1 to n do mark[v] := unvisited;
for v := 1 to n do
  if mark[v] = unvisited then dfs(v);

procedure dfs(v : vertex);
var w : vertex;
begin
  mark[v] := visited;
  print v; // or anything //
  for each vertex w on L[v] do
    if mark[w] = unvisited then dfs(w)
end;
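Depth-first search over adjacency lists can be sketched in Python as follows. The dictionary-of-lists graph format and the small example digraph are assumptions of this sketch; a visited set plays the role of the mark array.

```python
def depth_first_order(adj):
    """Return the vertices of a digraph in depth-first order.

    adj maps each vertex to the list of vertices adjacent to it.
    """
    visited = set()
    order = []

    def dfs(v):
        visited.add(v)
        order.append(v)          # "print v"
        for w in adj[v]:
            if w not in visited:
                dfs(w)

    for v in adj:                # restart from every unvisited vertex
        if v not in visited:
            dfs(v)
    return order


# Hypothetical digraph
example = {1: [2, 3], 2: [4], 3: [], 4: [1], 5: [3]}
```

The position of a vertex in the returned list, counted from 1, is its dfnumber. For very deep graphs the recursion would need to be replaced by an explicit stack.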
Example
[Figure: digraph on vertices 1 to 10]
DFS order: 1, 2, 4, 7, 10, 5, 3, 6, 8, 9
Depth-first spanning forest:
[Figure: depth-first spanning forest with each vertex labelled by its dfnumber]
tree arc: an arc of the spanning forest
forward arc: ancestor → descendant, e.g. (3,8)
back arc: descendant → ancestor, e.g. (7,1)
cross arc: all the others, e.g. (7,4), (9,1)
Fact: if (v,w) is a
(1) tree/forward arc, dfnumber(v) < dfnumber(w);
(2) back/cross arc, dfnumber(v) > dfnumber(w)
An application- test for acyclicity
Fact: a digraph is cyclic iff a back arc is encountered in any DFS.
[Figure: a cycle through v and w, where dfnumber(v) is the smallest in the cycle]
How to spot a back arc?
In dfs, include a dfnumber for each node encountered. Also, keep the current path in an array.
Breadth-first search
Go as broadly as possible.
procedure bfs(v);
var Q : QUEUE of vertex;
    x, y : vertex;
begin
  mark[v] := visited;
  print v; // or anything //
  MAKENULL(Q);
  ENQUEUE(v, Q);
  while not EMPTY(Q) do begin
    x := FRONT(Q);
    DEQUEUE(Q);
    for each vertex y adjacent to x do
      if mark[y] = unvisited then begin
        mark[y] := visited;
        ENQUEUE(y, Q)
      end
  end
end;
time = O(e)
BFS order: 1, 2, 5, 4, 6, 3, 7
bfnumber, bf spanning forest
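Breadth-first search can be sketched in Python with `collections.deque` as the queue. The graph format and the example digraph are assumptions of this sketch, and each vertex is recorded when it is first marked visited.

```python
from collections import deque


def bfs_order(adj, start):
    """Return the vertices reachable from start in breadth-first order.

    adj maps each vertex to the list of vertices adjacent to it.
    """
    visited = {start}
    order = [start]
    queue = deque([start])
    while queue:
        x = queue.popleft()          # FRONT + DEQUEUE
        for y in adj[x]:
            if y not in visited:
                visited.add(y)
                order.append(y)
                queue.append(y)      # ENQUEUE
    return order


# Hypothetical digraph
example = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
```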
(Undirected) Graphs
DFS: very similar to DFS for digraphs.
[Figure: undirected graph on vertices 1 to 10 and its dfs spanning tree]
DFS order: 1, 2, 4, 3, 6, 5, 7, 8, 10, 9
dfs spanning forest; if the graph is connected, a df spanning tree
dfnumber(v), tree edges; back edges: (1,4), (4,6), (2,5)
No cross edges!!!
BFS:
For the above graph, the BFS order is: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Applications of DFS and BFS:
1. Test for acyclicity
acyclic iff no back edges
2. Test for connectivity
connected iff only one tree in the DFS/BFS spanning forest
Generally, each tree in the forest gives a connected component.
3. Biconnected components (next lecture)
Articulation points and biconnected components
[Figure: Flight Map 1]
Articulation point: a vertex whose removal disconnects the remaining graph.
Def. A vertex v is called an articulation point or cutpoint if there exist vertices x, w s.t. x ≠ v, w ≠ v, x ≠ w and v is in every path connecting x and w.
Def. A connected graph is biconnected if it does not have any articulation points.
Fact: The following are equivalent:
1. G is biconnected
2. Deletion of any single vertex fails to disconnect G
3. Every pair of vertices is connected by two disjoint paths (n ≥ 3)
Def. A connected graph is k-connected if deletion of any k−1 vertices fails to disconnect the graph.
Def. A connected graph is k edge-connected if deletion of any k−1 edges fails to disconnect the graph.
Biconnected component (or bicomponent): a maximal induced biconnected subgraph, e.g. the above graph has 5 bicomponents.
Problem
Given a connected graph G, identify all its articulation points and bicomponents.
Trivial algorithm: O(n·e). We want O(e)!
To identify the articulation points:
Step 1. Do a depth-first search of G.
Note:
1. a single df spanning tree
2. only tree and back edges
Fact:
1. A leaf cannot be an articulation point.
2. The root is an articulation point iff it has more than one child.
3. Let v be an interior node other than the root. v is an articulation point iff some subtree of v has no back edge incident upon a proper ancestor of v.
Obs. Let w be any proper descendant of v and (w,x) be a back edge. x is a proper ancestor of v iff dfnumber(x) < dfnumber(v).
Def.
low(v) = the smallest of dfnumber(v) and the dfnumbers of all nodes reachable by following a back edge from some descendant of v (including v itself).
[Figure: df spanning tree with nodes labelled by dfnumber 1 to 11]
dfnumber  1  2  3  4  5  6  7  8  9  10  11
low       1  2  1  1  5  1  6  6  9  1   1
low(v) = min of:
  dfnumber(v)
  dfnumber(x) for every back edge (v,x)
  low(y) for every child y of v
Step 2 Traverse the df spanning tree in postorder and compute low(v) for all nodes v.
Note: if v is a leaf,
low(v) = min of:
  dfnumber(v)
  dfnumber(x) for every back edge (v,x)
Step 3 Identify articulation points by traversing the tree in postorder. (This step can be done in parallel with Step 2.)
A non-root node v is an articulation point iff for some child w of v
low(w) ≥ dfnumber(v)
Step 4 In Step 3, whenever an articulation point v is found, delete the subtree rooted at w and output the bicomponent given by the subtree and v.
[Figure: df spanning tree and back edges]
Time: O(e)
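Steps 1 to 3 can be sketched in Python as a single recursive depth-first search. This is only an illustration, not the notes' own code: the function name, the adjacency-dictionary representation and the assumption of a simple connected undirected graph are choices made here.

```python
def articulation_points(adj):
    """adj maps each vertex to a list of neighbours (simple, connected, undirected graph)."""
    dfnumber, low, points = {}, {}, set()
    counter = 0

    def dfs(v, parent):
        nonlocal counter
        counter += 1
        dfnumber[v] = low[v] = counter
        children = 0
        for w in adj[v]:
            if w not in dfnumber:               # (v, w) is a tree edge
                children += 1
                dfs(w, v)
                low[v] = min(low[v], low[w])    # low(y) for a child y of v
                if parent is not None and low[w] >= dfnumber[v]:
                    points.add(v)               # the subtree of w cannot climb above v
            elif w != parent:                   # (v, w) is a back edge
                low[v] = min(low[v], dfnumber[w])
        if parent is None and children > 1:     # the root rule
            points.add(v)

    dfs(next(iter(adj)), None)
    return points


# Hypothetical graph: a triangle 1-2-3 with a tail 3-4-5.
example = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
print(articulation_points(example))
```

Computing low(v) on the way back from each recursive call is the same as computing it in postorder, so Steps 2 and 3 happen in one traversal. For large graphs the recursion depth may exceed Python's default limit.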
Matching in Graphs
[Figure: bipartite graph of teachers and courses]
G = (V, E) is a graph
A matching in G is a set of edges with no two edges incident upon the same vertex.
A matching is maximal if the number of its edges is the maximum.
A matching is complete/perfect if every vertex in V is an endpoint of some edge in the matching.
G is bipartite if V = V1 ∪ V2, V1 ∩ V2 = ∅, and each edge in E has one end in V1 and the other end in V2.
Problem: Given a bipartite G, find a maximal matching in G.
Solution #1: (Brute force) Enumerate all possible matchings. Pick one that has the largest number of edges.
Time: O(n!) = O(n^n)
Solution #2: Augmenting paths
Time: O(ne)
[Figure: bipartite graph on vertices 1 to 10 with matching M]
M = (2,7), (3,6), (4,9)
Let M be a matching
A vertex v is matched if it is an endpoint of an edge in M, e.g. 2, 3, 4, 6, 7, 9 are matched.
An augmenting path relative to M: a path connecting two unmatched vertices in which alternate edges in the path are in M.
e.g. P1 = 1, 6, 3, 9, 4, 10
P2 = 5, 10
Fact: if P is an augmenting path relative to M, then M ⊗ P is a bigger matching.
e.g. M ⊗ P1 = (2,7), (1,6), (3,9), (4,10)
M ⊗ P2 = (3,6), (2,7), (4,9), (5,10)
(⊗ is the Exclusive-Or on sets, i.e. A ⊗ B = (A-B) ∪ (B-A), the symmetric difference)
Fact: M is maximal iff there is no augmenting path relative to M.
Proof: “only if”: straightforward
“if”: i.e., if M is not maximal then there must be an augmenting path.
Let N be a matching s.t. |N| > |M|.
Then each connected component of (V, N ⊗ M) must be one of the following:
1. a simple cycle with edges alternating between N and M (equal number of edges from each)
2. an augmenting path relative to N (1 more edge from M)
3. an augmenting path relative to M (1 more edge from N)
4. a path with equal number of edges from N and M
Since N ⊗ M has more edges from N than M, at least one component must be of type 3, i.e. an augmenting path relative to M.
Algorithm
M := ∅;
repeat
  Find an augmenting path P relative to M;
  M := M ⊗ P
until no more augmenting path exists
[Figure: example bipartite graph with V1 = 1 to 5 and V2 = 6 to 10]
Successive values of M:
(1,6)
(3,6), (1,8)
(2,8), (1,6), (3,9)
(2,8), (1,6), (3,9), (4,7)
Algorithm to find an augmenting path relative to matching M
// G = (V, E), V = V1 ∪ V2 //
Build an augmenting graph level by level as follows:
level 0 := unmatched vertices in V1
repeat
  level 2i+1 := new vertices that are adjacent to a vertex at level 2i through an edge not in M; also add the edge;
  level 2i+2 := new vertices that are adjacent to a vertex at level 2i+1 through an edge in M; also add the edge;
Stop when an unmatched vertex is added at an odd level or no more vertices can be added (i.e. no augmenting path exists)
The path from that unmatched vertex back to level 0 is an augmenting path.
Example
[Figure: augmenting graph built level by level from V1 and V2]
The process is very similar to BFS.
Time: O(e) if adjacency lists are used
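The level-by-level construction and the repeat loop can be sketched in Python as below. The function names, the dictionary representation of the graph and of M, and the example edges are assumptions made for illustration; the vertices of V1 and V2 are assumed to carry distinct labels.

```python
from collections import deque

def find_augmenting_path(adj, match):
    """adj: vertex in V1 -> neighbours in V2. match: vertex -> partner in M.
    Returns an augmenting path as a list of vertices, or None."""
    parent = {u: None for u in adj if u not in match}   # level 0
    queue = deque(parent)
    while queue:
        u = queue.popleft()                 # u is in V1, at an even level
        for w in adj[u]:                    # edges leaving u that are not in M
            if w in parent:
                continue
            parent[w] = u
            if w not in match:              # unmatched vertex at an odd level
                path = [w]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            x = match[w]                    # follow the edge of M to the next even level
            parent[x] = w
            queue.append(x)
    return None

def max_matching(adj):
    match = {}
    while True:
        path = find_augmenting_path(adj, match)
        if path is None:
            break
        # M := M xor P: the 1st, 3rd, 5th, ... edges of P become the matched ones.
        for i in range(0, len(path), 2):
            u, w = path[i], path[i + 1]
            match[u], match[w] = w, u
    return {(u, match[u]) for u in adj if u in match}


# Hypothetical bipartite graph with V1 = 1..5 and V2 = 6..10.
graph = {1: [6, 7, 8], 2: [6, 9], 3: [7], 4: [9, 10], 5: [6]}
print(sorted(max_matching(graph)))
```

Each search is one BFS, so it costs O(e), and at most n/2 augmentations are possible, which matches the O(ne) bound quoted for Solution #2.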
Internal Sorting
Internal: data are stored in the main memory, which is a RAM. Thus, access to each data item takes constant time.
Data item: a record with one or more fields. One field contains the key of the record.
'≤': a linear ordering on keys (compare '<')
Sorting: arrange a sequence of records so that the keys form a nondecreasing sequence, i.e. given
r1, r2, …, rn
find a permutation
ri1, ri2, …, rin s.t.
ri1.key ≤ ri2.key ≤ … ≤ rin.key
Bubble Sort
Move the lighter records to the top
for i := 1 to n-1 do
  for j := n downto i+1 do
    if A[j].key < A[j-1].key then
      swap(A[j], A[j-1])
In place. Time: O(n^2). Bad sequence: descending
Insertion Sort
Insert A[i] into A[1], A[2], ..., A[i-1] at its rightful position
A[0].key := -∞
for i := 2 to n do begin
  j := i;
  while A[j].key < A[j-1].key do begin
    swap(A[j], A[j-1]);
    j := j-1
  end
end;
In place. Time: O(n^2) at descending sequence
Selection Sort
Select the smallest record and place it at its rightful position.
for i := 1 to n-1 do
  select the smallest among A[i], …, A[n];
  swap it with A[i]
Time: O(n^2). In place.
Better than Bubble when records are large
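The three quadratic sorts above can be sketched in Python as follows. The function names are illustrative, plain comparable values stand in for records with a key field, and indices are 0-based, so the sentinel A[0].key := -∞ is replaced by a bounds check.

```python
def bubble_sort(a):
    a = list(a)
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1, i, -1):        # j runs from the bottom up to i+1
            if a[j] < a[j - 1]:
                a[j], a[j - 1] = a[j - 1], a[j]
    return a

def insertion_sort(a):
    a = list(a)
    for i in range(1, len(a)):
        j = i
        while j > 0 and a[j] < a[j - 1]:     # j > 0 plays the role of the sentinel
            a[j], a[j - 1] = a[j - 1], a[j]
            j -= 1
    return a

def selection_sort(a):
    a = list(a)
    n = len(a)
    for i in range(n - 1):
        smallest = min(range(i, n), key=a.__getitem__)   # index of the smallest of a[i..n-1]
        a[i], a[smallest] = a[smallest], a[i]            # one swap per position
    return a


data = [5, 7, 2, 1, 4, 3, 9, 5, 1, 7]
print(bubble_sort(data), insertion_sort(data), selection_sort(data))
```

Selection sort performs at most n-1 swaps, which is why it is attractive when records are large and moving them is expensive.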
Shell Sort (diminishing-increment)
[Figure: passes with incr = 6 and incr = 3]
Time: O(n^(3/2)) for some incr sequence. In place.
Heap Sort
Q: PRIORITY QUEUE
for i := 1 to n do
  INSERT(A[i], Q);
for i := n downto 1 do
  A[i] := DELETEMIN(Q);
Time: O(n log n)
In place if Q is implemented using array A[1..n]
Details in [AHU]
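A minimal sketch of the two loops, using Python's standard heapq module as the priority queue. This version is not in place (it uses a separate list for Q), and the function name is an assumption. As in the pseudocode, position n receives the smallest key, so the array ends up in nonincreasing order when read from position 1.

```python
import heapq

def heap_sort_descending(a):
    q = []
    for record in a:                       # INSERT(A[i], Q) for i = 1..n
        heapq.heappush(q, record)
    out = [None] * len(a)
    for i in range(len(a) - 1, -1, -1):    # i = n downto 1
        out[i] = heapq.heappop(q)          # A[i] := DELETEMIN(Q)
    return out


print(heap_sort_descending([3, 1, 4, 1, 5, 9, 2, 6]))
```

Filling the positions from 1 to n instead would give nondecreasing order; the in-place variant referred to in [AHU] shares the array between the heap and the output.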
Quick Sort
if A[i..j] contains at least two distinct keys then begin
  find the larger of the first two distinct keys, v (called the pivot);
  arrange A[i..j] so that for some k, i+1 ≤ k ≤ j,
    A[i], …, A[k-1] < v and A[k], …, A[j] ≥ v;
  quicksort(i, k-1);
  quicksort(k, j)
end
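Example
A = 5 7 2 1 4 3 9 5 1 7
Partition v = 7: 5 1 2 1 4 3 5 | 9 7 7
Partition v = 5 (left part): 3 1 2 1 4 | 5 5
Partition v = 9 (right part): 7 7 | 9
Partition v = 3: 1 1 2 | 3 4
Partition v = 2: 1 1 | 2
Partition v = 4: 3 | 4
Result: 1 1 2 3 4 5 5 7 7 9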
Worst time complexity: O(n^2)
pivot(i,j): O(j-i+1)
partition(i,j,pivot): O(j-i+1)
T(n) = O(n) + T(n-1)
= . . .
= O(n^2)
Not in place! Stack space can be made O(log n).
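A minimal Python sketch of this quicksort, with the pivot chosen as the larger of the first two distinct keys. The helper names find_pivot and partition and the two-pointer partitioning scheme are assumptions made here, and indices are 0-based and inclusive.

```python
def find_pivot(a, i, j):
    """Return the larger of the first two distinct keys in a[i..j], or None if all are equal."""
    for k in range(i + 1, j + 1):
        if a[k] != a[i]:
            return max(a[i], a[k])
    return None

def partition(a, i, j, v):
    """Rearrange a[i..j] so keys < v come first; return the first index holding a key >= v."""
    l, r = i, j
    while l <= r:
        while a[l] < v:        # stops at a key >= v (one always exists to the right)
            l += 1
        while a[r] >= v:       # stops at a key < v (one always exists to the left)
            r -= 1
        if l < r:
            a[l], a[r] = a[r], a[l]
    return l

def quicksort(a, i=0, j=None):
    if j is None:
        j = len(a) - 1
    if i >= j:
        return a
    v = find_pivot(a, i, j)
    if v is not None:          # at least two distinct keys
        k = partition(a, i, j, v)
        quicksort(a, i, k - 1)
        quicksort(a, k, j)
    return a


print(quicksort([5, 7, 2, 1, 4, 3, 9, 5, 1, 7]))
```

Because the pivot is the larger of two distinct keys, both groups are nonempty, so every recursive call works on a strictly smaller range.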
Average time complexity
Assumptions:
1. all orderings are equally likely
2. the keys are distinct
Pr(1st group is of size i)
= Pr(A[1] is the (i+1)st smallest and A[1] is the pivot)
+ Pr(A[2] is the (i+1)st smallest and A[2] is the pivot)
= (1/n)(i/(n-1)) + (1/n)(i/(n-1)) = 2i/(n(n-1))
Tavg(1) = c0
Tavg(n) ≤ Σ(i=1..n-1) [2i/(n(n-1))] [Tavg(i) + Tavg(n-i)] + cn
≤ (2/(n-1)) Σ(i=1..n-1) Tavg(i) + cn
Suppose Tavg(i) ≤ k i log i for some constant k, 2 ≤ i < n. Then
Tavg(n) ≤ (2/(n-1)) Σ(i=1..n-1) k i log i + cn
≤ (2k/(n-1)) [ Σ(i=1..n/2) i log i + Σ(i=n/2+1..n-1) i log n ] + cn
≤ (2k/(n-1)) [ Σ(i=1..n/2) i (log n - 1) + Σ(i=n/2+1..n-1) i log n ] + cn
≤ k n log n - kn/4 - kn/(2(n-1)) + cn
≤ k n log n, if k is large enough (k ≥ 4c)
Tavg(n) = O(n log n)
2-way merge sort
• divide-and-conquer
• can be used for external sorting
• can be generalized to m-way
Algorithm Msort(A[1..n]);
if n > 1 then begin
  m := ⌊n/2⌋;
  Msort(A[1..m]);
  Msort(A[m+1..n]);
  Merge(A[1..m], A[m+1..n], B[1..n]);
  A[1..n] := B[1..n];
end;
Let k = 2^⌈log n⌉
(i.e. k is the smallest power of 2 that is greater than or equal to n)
If Merge takes O(n) time, then
T(n) ≤ T(k) = T(k/2) + T(k/2) + ck
= 2T(k/2) + ck
= 4T(k/4) + ck + 2c(k/2)
= 8T(k/8) + 3ck
= …
= ck log2 k + k·O(1) = O(n log n)
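The nonrecursive version
[Figure: merge trees of depth log n for the nonrecursive and the recursive version]
NOTE: Merging order may be different in the nonrecursive version.
type afile = array[1..max] of elementtype;
Merging two sorted lists
[Figure: X[l..m] and X[m+1..n] scanned by i and j, merged into Z[l..n] at position k]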
procedure merge(var X, Z: afile; l, m, n: integer);
// Merge X[l..m] and X[m+1..n] into Z[l..n] //
var i, j, k: integer;
begin
  i := l; j := m+1; k := l;
  while i ≤ m and j ≤ n do begin
    if X[i].key ≤ X[j].key then begin
      Z[k] := X[i]; i := i+1
    end
    else begin
      Z[k] := X[j]; j := j+1
    end;
    k := k+1
  end;
  if i > m then Z[k..n] := X[j..n] // move the remaining items //
  else Z[k..n] := X[i..m]
end;
Time: O(n - l)
Union of ordered sets represented by sorted lists!
procedure onepass(var X, Y: afile; n, l: integer);
// This procedure performs one pass of the merge sort. It merges adjacent pairs of segments of length l from list X to list Y. n = |X| //
var i: integer;
begin
  i := 1;
  while i ≤ n - 2l + 1 do begin
    merge(X, Y, i, i+l-1, i+2l-1);
    i := i + 2l
  end;
  // merge remaining segments of total length < 2l //
  if (i+l-1) < n then merge(X, Y, i, i+l-1, n)
  else Y[i..n] := X[i..n]
end;
Time: O(n)
procedure Msort(var X: afile; n: integer);
var l: integer; Y: afile;
begin
  // l is the size of the segments currently being merged //
  l := 1;
  while l < n do begin
    onepass(X, Y, n, l); l := 2*l;
    onepass(Y, X, n, l); l := 2*l
  end
end;
At most ⌈log2 n⌉ + 1 passes.
Each pass takes O(n) time.
Total: O(n log n)
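The three procedures translate almost line for line into Python. The sketch below uses 0-based inclusive indices and plain comparable values in place of records with a key field; those conventions are assumptions made for the illustration.

```python
def merge(x, z, l, m, n):
    """Merge sorted x[l..m] and x[m+1..n] (inclusive bounds) into z[l..n]."""
    i, j, k = l, m + 1, l
    while i <= m and j <= n:
        if x[i] <= x[j]:
            z[k] = x[i]
            i += 1
        else:
            z[k] = x[j]
            j += 1
        k += 1
    if i > m:
        z[k:n + 1] = x[j:n + 1]     # move the remaining items of the second segment
    else:
        z[k:n + 1] = x[i:m + 1]     # move the remaining items of the first segment

def onepass(x, y, n, l):
    """Merge adjacent pairs of segments of length l from x into y."""
    i = 0
    while i <= n - 2 * l:
        merge(x, y, i, i + l - 1, i + 2 * l - 1)
        i += 2 * l
    if i + l < n:                   # one full segment plus a shorter one remain
        merge(x, y, i, i + l - 1, n - 1)
    else:                           # at most one segment remains: copy it
        y[i:n] = x[i:n]

def msort(x):
    n = len(x)
    y = [None] * n
    l = 1
    while l < n:
        onepass(x, y, n, l)
        l *= 2
        onepass(y, x, n, l)
        l *= 2
    return x


print(msort([3, 5, 6, 4, 5, 9, 3, 7, 2, 8, 4, 6, 1]))
```

Always doing the passes in pairs means the result ends up back in X; when l has already reached n, the second pass of a pair degenerates into a copy.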
Example
3 5 6 4 5 9 3 7 2 8 4 6 1    X
l = 1
3 5 4 6 5 9 3 7 2 8 4 6 1    Y
l = 2
3 4 5 6 3 5 7 9 2 4 6 8 1    X
l = 4
3 3 4 5 5 6 7 9 1 2 4 6 8    Y
l = 8
1 2 3 3 4 4 5 5 6 6 7 8 9    X
Obs: The lists X and Y are scanned sequentially from left to right ⌈log2 n⌉ + 1 times.
Bin Sorting
Is Ω(n log n) the lower bound for sorting n elements?
Yes, if we don't make any assumption about the keytype and only use comparisons such as key1 ≤ key2.
What if we know 1 ≤ key ≤ n and the n elements have distinct keys?
To sort such n elements,
for i := 1 to n do B[A[i].key] := A[i];
O(n)
or
for i := 1 to n do
  while A[i].key ≠ i do
    swap(A[i], A[A[i].key]);
O(n)
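Both O(n) variants can be sketched in Python as follows. The function names are illustrative, the records are taken to be the keys themselves, and since Python lists are 0-based, key k belongs at index k-1.

```python
def sort_by_copy(a):
    """Keys are distinct and lie in 1..n: place each record directly into b."""
    b = [None] * len(a)
    for record in a:
        b[record - 1] = record
    return b

def sort_in_place(a):
    """Same assumption, but swap records into position inside a itself."""
    for i in range(len(a)):
        while a[i] != i + 1:
            j = a[i] - 1                  # where the record now at position i belongs
            a[i], a[j] = a[j], a[i]       # each swap puts one record in its final place
    return a


print(sort_by_copy([3, 1, 5, 2, 4]), sort_in_place([3, 1, 5, 2, 4]))
```

The nested loops of the second variant are still O(n) overall, because every swap fixes the final position of at least one record, so at most n-1 swaps happen in total.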
Example
Sorting records that have a small number of distinct keys:
n records, O(log n) distinct keys
Can we do better than O(n log n)?
An algorithm using a modified 2-3 tree (AVL also okay):
[Figure: 2-3 tree whose leaves are the distinct keys, each with a list of records]
size of tree: O(log n)
each insert: O(log log n)
Total time: O(n log log n)
Bin Sorting
keytype = 1..m (any finite and discrete type)
[Figure: bin table B with bins 1, 2, …, m]
procedure binsort;
var i: integer; v: keytype;
begin
  for i := 1 to n do // O(n) //
    INSERT(A[i], END(B[A[i].key]), B[A[i].key]);
  for v := 2 to m do // O(m) //
    CONCAT(B[1], B[v])
end;
Bin sorting when m = n^k for some k
Example
k = 2
keytype = 0..n^2 - 1
Step 1: Place each integer i into bin i mod n. Append i to the end of the list for bin i mod n.
Step 2: Concatenate the lists.
Step 3: Place each integer i into bin ⌊i/n⌋.
Step 4: Concatenate the lists.
Each step: O(n)
Total time: O(n)
n = 10
Given: 45, 36, 21, 64, 60, 33, 12, 27, 30, 25
BIN (i mod 10)   CONTENTS
0   60, 30
1   21
2   12
3   33
4   64
5   45, 25
6   36
7   27
8
9
New list: 60, 30, 21, 12, 33, 64, 45, 25, 36, 27
BIN (⌊i/10⌋)   CONTENTS
0
1   12
2   21, 25, 27
3   30, 33, 36
4   45
5
6   60, 64
7
8
9
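The four steps can be sketched in Python as below, using the n = 10 example. The function name and the use of Python lists as bins are assumptions made for illustration.

```python
def binsort_two_pass(keys, n):
    """Sort integers in 0..n*n-1 with two bin-sort passes over n bins."""
    bins = [[] for _ in range(n)]
    for i in keys:                          # Step 1: bin i mod n, appending at the end
        bins[i % n].append(i)
    new_list = [i for b in bins for i in b]     # Step 2: concatenate
    bins = [[] for _ in range(n)]
    for i in new_list:                      # Step 3: bin floor(i / n)
        bins[i // n].append(i)
    return new_list, [i for b in bins for i in b]   # Step 4: concatenate


given = [45, 36, 21, 64, 60, 33, 12, 27, 30, 25]
first, final = binsort_two_pass(given, 10)
print(first)   # the list after Step 2
print(final)   # the sorted list after Step 4
```

Appending to the end of each bin keeps both passes stable, which is what lets the second pass preserve the order established by the first.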
Radix Sort
type keytype = record
  f1: t1;
  f2: t2;
  …            // each ti is finite and discrete //
  fk: tk
end;
key1 = (a1, a2, …, ak)
key2 = (b1, b2, …, bk)
key1 < key2 iff
1. a1 < b1, or
2. a1 = b1 and a2 < b2, or
…
k. a1 = b1, …, ak-1 = bk-1, and ak < bk
i.e. ∃ i, 0 ≤ i < k s.t. aj = bj for 1 ≤ j ≤ i and ai+1 < bi+1
e.g. abc < aca (called lexicographic order)
var Bi: array[ti] of linked listtype;
procedure radixsort;
// binsort list A first on fk, concatenate the bins in Bk, binsort on fk-1, and so on //
begin
  for i := k downto 1 do begin
    for each value v of type ti do
      make Bi[v] empty;
    for each record r on list A do
      move r from A onto the end of bin Bi[r.fi]; // binsort on fi //
    for each value v of type ti, from lowest to highest do
      concatenate Bi[v] onto the end of A
  end
end;
Time: O(Σ(i=1..k) (|ti| + n)) = O(kn + Σ(i=1..k) |ti|)
Example
A = hact, fact, sack, camp, duck, kuck, codd, less, more
Pass 1 (4th letter):
D codd
E more
K sack, duck, kuck
P camp
S less
T hact, fact
Pass 2 (3rd letter):
C sack, duck, kuck, hact, fact
D codd
M camp
R more
S less
Pass 3 (2nd letter):
A sack, hact, fact, camp
E less
O codd, more
U duck, kuck
Pass 4 (1st letter):
C camp, codd
D duck
F fact
H hact
K kuck
L less
M more
S sack
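A minimal Python sketch of radixsort for the special case used in the example, where every field is one lowercase letter of a fixed-length word. The function signature and the dictionary of bins are assumptions made here.

```python
import string

def radixsort(records, k, alphabet=string.ascii_lowercase):
    """Sort strings of length k by bin sorting on field k, then k-1, ..., then 1."""
    a = list(records)
    for i in range(k - 1, -1, -1):
        bins = {v: [] for v in alphabet}        # make every bin empty
        for r in a:
            bins[r[i]].append(r)                # move r onto the end of its bin
        a = [r for v in alphabet for r in bins[v]]   # concatenate, lowest value first
    return a


words = ["hact", "fact", "sack", "camp", "duck", "kuck", "codd", "less", "more"]
print(radixsort(words, 4))
```

Each pass costs O(n + |alphabet|), so the total is O(k(n + |alphabet|)), in line with the time bound stated for radixsort.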
Odd-even merge sort
(Useful when you have a parallel computer)
Algorithm Odd-even-merge-sort (a0, a1, …, a2n-1)
1. Split the list a0, a1, …, a2n-1 into two lists a0, a1, …, an-1 and an, an+1, …, a2n-1
2. Odd-even-merge-sort (a0, a1, …, an-1)
3. Odd-even-merge-sort (an, an+1, …, a2n-1)
4. Odd-even-merge (a0, a1, …, an-1; an, an+1, …, a2n-1)
Algorithm Odd-even-merge (a0, a1, …, an-1; b0, b1, …, bn-1)
1. c0, c1, …, cn-1 := Odd-even-merge (a0, a2, …, an-2; b0, b2, …, bn-2)
2. d0, d1, …, dn-1 := Odd-even-merge (a1, a3, …, an-1; b1, b3, …, bn-1)
3. For all i > 0, compare ci and di-1 and interchange if necessary
4. Return c0 c1 d0 c2 d1 c3 d2 … cn-1 dn-2 dn-1
Example
Odd-even-merge (4,5,8,11,20,25; 2,9,10,27,30,31)
Odd-even-merge (4,8,20; 2,10,30) returns 2,4,8,10,20,30
Odd-even-merge (5,11,25; 9,27,31) returns 5,9,11,25,27,31
c: 2 4 8 10 20 30
d: 5 9 11 25 27 31
Result: 2 4 5 8 9 10 11 20 25 27 30 31
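A sequential Python sketch of the two algorithms. It assumes that both input lists have the same length and that this length is a power of 2, and it adds a base case for lists of length 1 that the steps above leave implicit; both are assumptions of this illustration.

```python
def odd_even_merge(a, b):
    """Merge two sorted lists of equal length n, where n is a power of 2."""
    n = len(a)
    assert n == len(b) and n > 0 and n & (n - 1) == 0
    if n == 1:                                   # assumed base case: one comparison
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[0::2])         # even-indexed elements
    d = odd_even_merge(a[1::2], b[1::2])         # odd-indexed elements
    out = [c[0]]
    for i in range(1, n):                        # compare c[i] with d[i-1], interchange if necessary
        out.append(min(c[i], d[i - 1]))
        out.append(max(c[i], d[i - 1]))
    out.append(d[n - 1])
    return out

def odd_even_merge_sort(a):
    """Sort a list whose length is a power of 2."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    return odd_even_merge(odd_even_merge_sort(a[:half]), odd_even_merge_sort(a[half:]))


print(odd_even_merge([1, 4, 6, 9], [2, 3, 7, 8]))
print(odd_even_merge_sort([7, 3, 8, 1, 6, 2, 5, 4]))
```

All comparisons in step 3 are independent of each other, which is what makes the method attractive on a parallel computer.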
Sequential time complexity:
Odd-even-merge: T1(n)
T1(n) = 2T1(n/2) + cn = O(n log n)
Odd-even-merge-sort: T2(n)
T2(n) = 2T2(n/2) + c1 n log n = O(n log^2 n)
In parallel: Odd-even-merge takes O(log n) time; Odd-even-merge-sort takes O(log^2 n) time
Odd-even-merge (a0 a1 a2 a3, a4 a5 a6 a7)
a0 a1 a2 a3, a4 a5 a6 a7
a0 a2 a4 a6, a1 a3 a5 a7
a0 a4 a2 a6, a1 a5 a3 a7
b0 b1 b2 b3, b4 b5 b6 b7
c0 c1 c2 c3, c4 c5 c6 c7
c0 c1 c4 c2, c5 c3 c6 c7
d0 d1 d2 d3, d4 d5 d6 d7
Lower bound for sorting
Definition:
Let B be a problem and f(n) a function. B requires Ω(f(n)) time if every algorithm for B has time complexity Ω(f(n)) (i.e., the running time is at least c·f(n) in the worst case for inputs of length n, for some constant c > 0).
f(n) is a time lower bound for B.
Theorem: Sorting by comparisons requires Ω(n log n) time (in fact, Ω(n log n) comparisons).
Assumption: The only operations on keys are comparisons of key values.
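Without loss of generality, assume the keys are distinct.
Decision trees
Let P be any sorting algorithm. Denote the input by:
A[1..n]: a1, a2, …, an
Define a binary tree as follows: each internal node is a comparison A[i] < A[j]? with a yes branch and a no branch, and each leaf is an outcome, i.e. a sorted list ar1, ar2, …, arn.
[Figure: binary tree with root A[i1] < A[j1]?, children A[i2] < A[j2]? and A[i3] < A[j3]?, and leaves labelled with outcomes]
This tree is called the decision tree for P on size n.
Decision tree for bubble sort with n = 3
for i := 1 to 2 do
  for j := 3 downto i+1 do
    if A[j] < A[j-1] then swap(A[j], A[j-1])
A[1..3] = a b c
[Figure: decision tree for bubble sort on a b c, with comparisons A[3] < A[2]? and A[2] < A[1]? at the internal nodes and the six orderings of a, b, c at the leaves]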
Fact: For any sorting algorithm A, the decision tree for A must have at least n! leaves.
Proof: There are n! outcomes when A sorts n elements.
Fact: The depth of the decision tree must be at least log2(n!).
Proof: Let depth = d. A binary tree of depth d has at most 2^d leaves, so
n! ≤ 2^d
d ≥ ⌈log2(n!)⌉
Corollary: A requires at least log2(n!) comparisons in the worst case.
n! ≥ (n/e)^n, e = 2.7183
log2(n!) ≥ n log2(n/e)
n log2(n/e) = Ω(n log n)
Sorting requires Ω(n log n) comparisons. Sorting requires Ω(n log n) time.
Average time complexity for sorting
= avg depth of leaves in decision tree
Claim: Among the n! leaves, at least half of them have depth greater than log2(n!) - 1.
Proof: The maximum number of leaves with depth ≤ log2(n!) - 1 is
2^(log2(n!) - 1) = 2^(log2(n!/2)) = n!/2
So on average, sorting requires
Ω(log2(n!)/2) = Ω(n log n) time
Problem: Given a1, a2, …, an s.t. a1 < a2 < … < an and x, find i s.t. ai = x
Binary search: O(log n) time
Fact: Searching requires Ω(log n) time
Proof: Any of a1, a2, …, an could be x, or x could be absent, so
there are at least n + 1 outcomes when we search for x in a1, …, an;
a decision tree must have depth at least log2(n+1).
Problem: Given a1, a2, …, an, find the smallest element
[Figure: decision tree of comparisons ai < aj with yes/no branches]
n-1 elements must all have lost some comparison!
External sorting
Assumption: The number of data items to be sorted is too large to fit in main memory.
The data items (records) are stored on external storage devices in the form of (sequential) files.
External storage device:
Magnetic tape
[Figure: tape passing a read/write head; BLOCK i and BLOCK i+1 separated by an inter-block gap]
Operations: Read(B), Write(B), fast forward, rewind
Magnetic disk
[Figure: disk with a track, a sector (block) and the R/W head]
To access a sector:
1. locate the correct track by shifting the R/W head
2. wait until the correct sector arrives
The time needed to access a block:
seek time + actual R/W time
>>> main memory access time
File: a sequence of blocks, each containing a fixed number of records
In external sorting, the dominating factor is the number of block accesses.
It is desirable to scan a file from beginning to end.
The model
[Figure: CPU with main memory connected to file1, file2, file3 on external storage]
Objective: sorting with the minimum number of passes through the file (thus, the minimum number of block accesses)
Bubble, insertion, …, quick, heap sorts: require on the order of n passes or more.
2-way merge sort: only requires ⌈log2 n⌉ passes!
ASSUME THAT FILES ARE STORED ON DISKS. THUS, SEEK TIME IS THE “SAME” FOR ALL BLOCKS.
[Figure: main memory holding c blocks]
EXAMPLE
Sort file F = A1, A2, …, A2100
A block = 100 records
Working main memory space = 3 blocks (used as buffers)
Step 1: Internally sort three blocks (300 records) at a time.
Store the resulting runs on disk.
Run 1     Run 2     Run 3     Run 4      Run 5       Run 6       Run 7
1-300     301-600   601-900   901-1200   1201-1500   1501-1800   1801-2100
Step 2: Partition the main memory into three blocks.
Two are used as input buffers and the third is used as an output buffer.
Merge runs 1 and 2.
Algorithm
Merge(R1, R2)
Read a block from R1;
Read a block from R2;
Merge the records in the input buffers and store the result in the output buffer;
If the output buffer gets full, write its contents onto the disk and clear the buffer;
If an input buffer gets empty, read the next block from the same run.
Merge runs 3 and 4, then 5 and 6, then copy run 7.
The result of this is a file of 4 runs.
Merge these runs and produce a file of two runs.
Merge the two runs to obtain a single run (i.e., a sorted file).
[Figure: file F and the intermediate files F1, F2, F3 produced by successive merge passes]
Notes:
1. If the number of initial runs is m, then ⌈log2 m⌉ passes suffice.
2. If the device is tape, then we need four tapes.
Initial runs:
Tape 1: Run 1, Run 3, Run 5, Run 7
Tape 2: Run 2, Run 4, Run 6
After pass 1:
Tape 3: Run 1, Run 3
Tape 4: Run 2, Run 4
After pass 2:
Tape 1: Run 1
Tape 2: Run 2
After pass 3:
Tape 3: Run 1
Tape 4:
3. Temp files can be discarded after being used.
4. k-way merge
[Figure: 3-way merge]
Generally, k-way merge sort requires
⌈logk m⌉ passes
= ⌈log2 m / log2 k⌉
k+1 buffers; more comparisons (k-1 per record) in each pass
For tapes, k-way merge requires 2k tapes.
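The run-forming step and the merge passes can be simulated in memory with a short Python sketch. It ignores blocks and buffers and only counts passes; the function name and parameters are hypothetical, and heapq.merge from the standard library performs each k-way merge.

```python
import heapq
import random

def external_sort(records, run_size, k=2):
    """Form sorted runs of run_size records, then k-way merge them pass by pass."""
    runs = [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]
    passes = 0
    while len(runs) > 1:
        # One pass: merge each group of k adjacent runs into a single run.
        runs = [list(heapq.merge(*runs[i:i + k]))
                for i in range(0, len(runs), k)]
        passes += 1
    return (runs[0] if runs else []), passes


random.seed(1)
data = [random.randrange(10**6) for _ in range(2100)]
result, passes = external_sort(data, run_size=300, k=2)
print(passes)   # 7 initial runs merged 2-way
```

With 2100 records and runs of 300 records there are m = 7 initial runs, so a 2-way merge needs 3 passes (7, 4, 2, 1 runs) and a 3-way merge needs 2 passes (7, 3, 1 runs).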
General algorithm design techniques
Divide-and-conquer (top-down, recursive): e.g. merge sort, quick sort
Dynamic programming (bottom-up): longest common subsequence
Greedy: shortest paths, minimum-cost spanning tree
Brute force
Backtracking: rat-in-maze
Divide-and-conquer
To solve problem A:
if A is small enough
then solve it directly
else
  break A into smaller problems A1, A2, …, Ak (smaller instances of the same problem);
  solve Ai for each i = 1, 2, …, k;
  combine the solutions for A1, …, Ak to obtain the solution for A
Example:
Towers of Hanoi
[Figure: three pegs A, B, C]
Algorithm Move(n, A, B)
// move n disks from A to B, using the third peg C //
if n = 1 then move the disk from A to B
else begin
  Move(n-1, A, C);
  Move(1, A, B);
  Move(n-1, C, B)
end
T(n) = c1 if n = 1
T(n) = 2T(n-1) + c2 otherwise
= O(2^n)
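A minimal Python sketch of Move that records the moves instead of performing them. Passing the spare peg as an explicit argument and collecting the moves in a list are choices made here for illustration.

```python
def move(n, src, dst, spare, moves):
    """Append to moves the disk moves that transfer n disks from src to dst."""
    if n == 1:
        moves.append((src, dst))
    else:
        move(n - 1, src, spare, dst, moves)   # clear the way
        move(1, src, dst, spare, moves)       # move the largest disk
        move(n - 1, spare, dst, src, moves)   # put the rest back on top
    return moves


print(move(2, "A", "B", "C", []))
print(len(move(10, "A", "B", "C", [])))
```

The number of moves satisfies the same recurrence as T(n) with c1 = c2 = 1, which gives 2^n - 1 moves for n disks.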
Example
Given n integers, find both the maximum and the minimum
Algorithm maximin(A[1..n], max, min)
if n = 1 then
  max := min := A[1]
else if n = 2 then
  if A[1] < A[2] then begin
    max := A[2]; min := A[1]
  end else begin
    max := A[1]; min := A[2]
  end
else begin // n > 2 //
  maximin(A[1..n/2], max1, min1);
  maximin(A[n/2+1..n], max2, min2);
  if max1 < max2 then max := max2 else max := max1;
  if min1 ≤ min2 then min := min1 else min := min2
end
C(n) = 2C(n/2) + 2 (comparisons)
C(2) = 1
C(n) = (3/2)n - 2 (by induction, for n a power of 2)
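The comparison count can be checked with a small Python sketch that returns the number of comparisons along with the result. The return convention (max, min, comparisons) and the 0-based inclusive bounds are assumptions of this illustration.

```python
def maximin(a, lo, hi):
    """Return (max, min, comparisons) for a[lo..hi], inclusive bounds."""
    n = hi - lo + 1
    if n == 1:
        return a[lo], a[lo], 0
    if n == 2:
        if a[lo] < a[hi]:
            return a[hi], a[lo], 1
        return a[lo], a[hi], 1
    mid = lo + n // 2 - 1
    max1, min1, c1 = maximin(a, lo, mid)
    max2, min2, c2 = maximin(a, mid + 1, hi)
    big = max2 if max1 < max2 else max1        # one comparison
    small = min1 if min1 <= min2 else min2     # one comparison
    return big, small, c1 + c2 + 2


values = [8, 3, 5, 1, 9, 2, 7, 4]
print(maximin(values, 0, len(values) - 1))
```

For n = 8 the count is 10 = (3/2)·8 - 2, compared with 2n - 2 = 14 for two separate linear scans.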
Dynamic programming
There are situations where:
(i) There is no way to divide a problem into a small number of subproblems.
(ii) The subproblems overlap each other (too much redundancy if divide-and-conquer is used).
(iii) The total number of subproblems to tackle is not large, i.e. polynomial (i.e. n^k; usually k = 2, 3).
Dynamic programming approach:
Systematically solve all the subproblems, with the smallest ones first. Keep track of the solutions to the solved subproblems by means of a table. Solutions to larger subproblems are found by combining solutions to smaller subproblems.
Example
Longest common subsequence problem (LCS)
sequence: x = a1a2…an
subsequence of x: a sequence obtained from x by deleting some characters
[Figure: sequences a b c a b and c a b b a b with matching characters joined]
LCS problem: given x = a1a2…an and y = b1b2…bm, find the length of an LCS of x and y
Previous solution (using sets):
O(p log n) time
where p = the number of pairs of positions, one from each sequence, that have the same character
In the worst case, p = O(nm)
Time: O(nm log n)
Dynamic programming solution
Given x = a1a2…an and y = b1b2…bm
Define an (n+1) × (m+1) matrix L
L[i,j] = the length of an LCS of a1a2…ai and b1b2…bj
for all 0 ≤ i ≤ n, 0 ≤ j ≤ m
Note: L[0,j] = L[i,0] = 0 for 0 ≤ i ≤ n, 0 ≤ j ≤ m
L[n,m] is the length of the LCS of x and y
Each L[i,j] is a subproblem
L[0,j] = 0, 0 ≤ j ≤ m
L[i,0] = 0, 0 ≤ i ≤ n
L[i,j] = max of:
  L[i-1,j]
  L[i,j-1]
  L[i-1,j-1] + 1 if ai = bj, 0 otherwise
for 1 ≤ i ≤ n, 1 ≤ j ≤ m
[Figure: matrix L with row 0 and column 0 set to 0; entry (i,j) is computed from entries (i-1,j), (i,j-1) and (i-1,j-1); the solution is entry (n,m)]
Algorithm LCS
// evaluate matrix L row by row, with row 0 first //
for j := 0 to m do
  L[0,j] := 0;
for i := 1 to n do
  L[i,0] := 0;
for i := 1 to n do
  for j := 1 to m do begin
    if ai = bj then
      temp := L[i-1,j-1] + 1
    else temp := 0;
    L[i,j] := max(L[i-1,j], L[i,j-1], temp)
  end;
writeln(L[n,m])
Time = O(nm)
Space = O(min(n,m)), since only the previous row needs to be kept
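The row-by-row evaluation, keeping only two rows so that the space is O(min(n,m)), can be sketched in Python as follows. The function name is an assumption, and the sequences are swapped if necessary so that rows are indexed by the shorter one.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence of x and y."""
    if len(y) > len(x):
        x, y = y, x                       # keep each row as short as possible
    prev = [0] * (len(y) + 1)             # row 0 is all zeros
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)          # cur[0] corresponds to L[i,0] = 0
        for j in range(1, len(y) + 1):
            temp = prev[j - 1] + 1 if x[i - 1] == y[j - 1] else 0
            cur[j] = max(prev[j], cur[j - 1], temp)
        prev = cur
    return prev[len(y)]


print(lcs_length("abcab", "cabbab"))
```

Only the length is produced here; recovering an actual subsequence would require keeping the whole matrix or some extra bookkeeping.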
Recursive solution
(divide-and-conquer)
Algorithm LCS(n, m)
if n = 0 or m = 0 then
  LCS := 0
else begin
  l1 := LCS(n-1, m);
  l2 := LCS(n, m-1);
  if an = bm then
    l3 := LCS(n-1, m-1) + 1
  else
    l3 := 0;
  LCS := max(l1, l2, l3)
end;
T(n,m) = T(n-1,m) + T(n,m-1) + T(n-1,m-1) + c = O(3^(n+m))
Dynamic programming example 2
World Series Odds
Problem: Teams A and B play a match. Whoever wins n games first wins the match.
Assumption: A and B are equally competent, i.e., each has a 50% chance of winning a particular game.
P(i,j): the probability that A will eventually win the match, given that A needs i more games to win (i.e., A has won n-i games) and B needs j more games to win, 0 ≤ i, j ≤ n.
We want to compute P(s,t) for some particular 0 ≤ s, t ≤ n.
P(0,j) = 1, 1 ≤ j ≤ n
P(i,0) = 0, 1 ≤ i ≤ n
P(i,j) = (P(i-1,j) + P(i,j-1)) / 2, 1 ≤ i, j ≤ n
[Figure: table of P(i,j) with row 0 set to 1 and column 0 set to 0; entry (i,j) is computed from entries (i-1,j) and (i,j-1)]
Order of evaluation:
1. row by row / column by column
2. diagonal
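A row-by-row evaluation of the table can be sketched in Python with exact fractions. The function name odds and the use of the standard fractions module are assumptions made for illustration.

```python
from fractions import Fraction

def odds(s, t):
    """Probability that A wins the match when A needs s more games and B needs t more."""
    P = [[Fraction(0)] * (t + 1) for _ in range(s + 1)]   # column 0: P(i,0) = 0
    for j in range(1, t + 1):
        P[0][j] = Fraction(1)                             # row 0: P(0,j) = 1
    for i in range(1, s + 1):
        for j in range(1, t + 1):
            P[i][j] = (P[i - 1][j] + P[i][j - 1]) / 2
    return P[s][t]


print(odds(2, 3))
print(odds(4, 4))
```

The table has (s+1)(t+1) entries and each costs O(1), so the evaluation takes O(st) time; the entry P(0,0) is never used.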
Greedy Algorithm
Setting: given n objects a1, a2, …, an,
each with a weight (or cost) w(ai)
We want to select a subset of objects
ai1, ai2, …, aim, subject to some
constraint, such that
Σ(j=1..m) w(aij)
is the minimum
Example
Coin changing: A1 = {c1, c2, …, cn} is a set of distinct coin types
c1 > c2 > … > cn ≥ 1
How do we make up an exact amount using a minimum total number of coins?
If cn = 1 then a greedy algorithm can be used.
Algorithm coinchange(x);
i := 1;
while x ≠ 0 do begin
  if ci ≤ x then begin // select the largest coin whose value is ≤ x //
    writeln(ci);
    x := x - ci
  end
  else
    i := i + 1
end
e.g. c1 = 25¢, c2 = 10¢, c3 = 5¢, c4 = 1¢
x = 73¢
change: c1, c1, c2, c2, c4, c4, c4
Notes:
1. The algorithm doesn't necessarily generate change with the minimum total number of coins.
e.g., c1 = 5, c2 = 4, c3 = 1
x = 8: greedy gives 5, 1, 1, 1 (four coins), but 4, 4 uses only two
2. It does so if A1 = {k^(n-1), k^(n-2), …, k^0}
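The greedy procedure can be sketched in Python as below. It assumes the coin list is in decreasing order with the last coin equal to 1 and that x is a nonnegative integer; the function returns the list of coins instead of printing them.

```python
def coinchange(x, coins):
    """Greedy change for amount x; coins must be decreasing and end with 1."""
    change = []
    i = 0
    while x != 0:
        if coins[i] <= x:          # largest remaining coin type that fits
            change.append(coins[i])
            x -= coins[i]
        else:
            i += 1
    return change


print(coinchange(73, [25, 10, 5, 1]))
print(coinchange(8, [5, 4, 1]))     # greedy is not optimal for this coin set
```

The second call illustrates Note 1: the greedy answer uses four coins although two coins of value 4 would do.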
Matching
1. Start with M = ∅
2. Find an augmenting path P relative to M and replace M by M ⊗ P
3. Repeat (2) until no further augmenting path exists; then M is maximal.
[Figure: bipartite graph with V1 = 1, 2, 3, 4, 5 and V2 = 6, 7, 8, 9, 10, with the augmenting graph at each step]
P = (1,6)
M = ∅ ⊗ P = (1,6)
P = (2,6), (1,6), (1,7)
M = (1,6) ⊗ P = (2,6), (1,7)
P = (3,7), (1,7), (1,6), (6,2), (2,9)
M = (2,6), (1,7) ⊗ P = (3,7), (1,6), (2,9)
Does not work:
P = (4,9), (2,9), (1,6), (1,8)
M = (3,7), (1,6), (2,9) ⊗ P = (4,9), (3,7), (1,8)
P = (2,9), (4,9), (4,10)
M = (4,9), (1,8), (3,7) ⊗ P = (2,9), (1,8), (4,10), (3,7)
P = (5,6)
M = (2,9), (1,8), (4,10), (3,7) ⊗ P = (2,9), (1,8), (4,10), (3,7), (5,6)