sigcse 2004 1 tradeoffs, intuition analysis, understanding big-oh aka o-notation owen astrachan...
TRANSCRIPT
SIGCSE 2004 1
Tradeoffs, intuitionanalysis,
understanding big-Ohaka O-notation
Owen [email protected]://www.cs.duke.edu/~ola
SIGCSE 2004 2
Analysis Vocabulary to discuss performance
and to reason about alternative algorithms and implementations It’s faster! It’s more elegant! It’s
safer! It’s cooler!
Use mathematics to analyze the algorithm,
Implementation is another matter cache, compiler optimizations, OS,
memory,…
SIGCSE 2004 3
What do we need? Need empirical tests and
mathematical tools Compare by running
•30 seconds vs. 3 seconds, •5 hours vs. 2 minutes•Two weeks to implement code
We need a vocabulary to discuss tradeoffs
SIGCSE 2004 4
Analyzing Algorithms Three solutions to online problem sort1
Sort strings, scan looking for runs Insert into Set, count each unique
string Use map of (String,Integer) to process
We want to discuss trade-offs of these solutions Ease to develop, debug, verify Runtime efficiency Vocabulary for discussion
SIGCSE 2004 5
What is big-Oh about? Intuition: avoid details when they
don’t matter, and they don’t matter when input size (N) is big enough For polynomials, use only leading
term, ignore coefficients: linear, quadratic
y = 3x y = 6x-2 y = 15x + 44
y = x2 y = x2-6x+9 y = 3x2+4x
SIGCSE 2004 6
O-notation, family of functions first family is O(n), the second is O(n2)
Intuition: family of curves, same shape
More formally: O(f(n)) is an upper-bound, when n is large enough the expression cf(n) is larger
Intuition: linear function: double input, double time, quadratic function: double input, quadruple the time
SIGCSE 2004 7
Reasoning about algorithms We have an O(n) algorithm,
For 5,000 elements takes 3.2 seconds For 10,000 elements takes 6.4 seconds For 15,000 elements takes ….?
We have an O(n2) algorithm For 5,000 elements takes 2.4 seconds For 10,000 elements takes 9.6 seconds For 15,000 elements takes …?
SIGCSE 2004 8
More on O-notation, big-Oh
Big-Oh hides/obscures some empirical analysis, but is good for general description of algorithm Compare algorithms in the limit 20N hours v. N2 microseconds:
•which is better?
SIGCSE 2004 9
More formal definition O-notation is an upper-bound, this
means that N is O(N), but it is also O(N2); we try to provide tight bounds. Formally: A function g(N) is O(f(N)) if there
exist constants c and n such that g(N) < cf(N) for all N > ncf(N)
g(N)
x = n
SIGCSE 2004 10
Big-Oh calculations from code Search for element in array:
What is complexity (using O-notation)?
If array doubles, what happens to time?
for(int k=0; k < a.length; k++) { if (a[k].equals(target)) return true; } return false;
Best case? Average case? Worst case?
SIGCSE 2004 11
Measures of complexity Worst case
Good upper-bound on behavior Never get worse than this
Average case What does average mean? Averaged over all inputs?
Assuming uniformly distributed random data?
SIGCSE 2004 12
Some helpful mathematics 1 + 2 + 3 + 4 + … + N
N(N+1)/2 = N2/2 + N/2 is O(N2)
N + N + N + …. + N (total of N times) N*N = N2 which is O(N2)
1 + 2 + 4 + … + 2N 2N+1 – 1 = 2 x 2N – 1 which is O(2N )
SIGCSE 2004 13
106 instructions/sec, runtimes
N O(log N)
O(N) O(N log N)
O(N2)
10 0.000003 0.00001 0.000033 0.0001
100 0.000007 0.00010 0.000664 0.1000
1,000 0.000010 0.00100 0.010000 1.0
10,000 0.000013 0.01000 0.132900 1.7 min
100,000 0.000017 0.10000 1.661000 2.78 hr
1,000,000 0.000020 1.0 19.9 11.6 day
1,000,000,000
0.000030 16.7 min 18.3 hr 318 centuries
SIGCSE 2004 14
Multiplying and adding big-Oh Suppose we do a linear search then
we do another one What is the complexity? If we do 100 linear searches? If we do n searches on an array of
size n?
SIGCSE 2004 15
Multiplying and adding Binary search followed by linear
search? What are big-Oh complexities?
Sum? 50 binary searches? N searches?
What is the number of elements in the list (1,2,2,3,3,3)? What about (1,2,2, …, n,n,…,n)? How can we reason about this?
SIGCSE 2004 16
Reasoning about big-O Given an n-list: (1,2,2,3,3,3, …n,n,…
n) If we remove all n’s, how many
left?
If we remove all larger than n/2?
SIGCSE 2004 17
Reasoning about big-O Given an n-list: (1,2,2,3,3,3, …n,n,…n)
If we remove every other element?
If we remove all larger than n/1000?
Remove all larger than square root n?
SIGCSE 2004 18
Money matters
while (n != 0) { count++; n = n/2;}
Penny on a statement each time executed What’s the cost?
SIGCSE 2004 19
Money matters
void stuff(int n){ for(int k=0; k < n; k++) System.out.println(k);}while (n != 0) { stuff(n); n = n/2;}
Is a penny always the right cost/statement? What’s the cost?
SIGCSE 2004 20
Find k-th largest Given an array of values, find k-th
largest 0th largest is the smallest How can I do this?
Do it the easy way…
Do it the fast way …
SIGCSE 2004 21
Easy way, complexity?
public int find(int[] list, int index){ Arrays.sort(list); return list[index];}
SIGCSE 2004 22
Fast way, complexity?
public int find(int[] list, int index){ return findHelper(list,index, 0, list.length-1);}
SIGCSE 2004 23
Fast way, complexity?private int findHelper(int[] list, int index, int first, int last){ int lastIndex = first; int pivot = list[first]; for(int k=first+1; k <= last; k++){ if (list[k] <= pivot){ lastIndex++; swap(list,lastIndex,k); } } swap(list,lastIndex,first);
SIGCSE 2004 24
Continued…
if (lastIndex == index) return list[lastIndex];else if (index < lastIndex ) return findHelper(list,index, first,lastIndex-1);else return findHelper(list,index, lastIndex+1,last);
SIGCSE 2004 25
Recurrences
int length(ListNode list) { if (0 == list) return 0; else return 1+length(list.getNext()); }
What is complexity? justification? T(n) = time of length for an n-node
list T(n) = T(n-1) + 1 T(0) = O(1)
SIGCSE 2004 26
Recognizing Recurrences T must be explicitly identified n is measure of size of
input/parameterT(n) time to run on an array of
size n
T(n) = T(n/2) + O(1) binary search O( log n )
T(n) = T(n-1) + O(1) sequential search O( n )
SIGCSE 2004 27
Recurrences
T(n) = 2T(n/2) + O(1)
tree traversal O( n )
T(n) = 2T(n/2) + O(n)
quicksort O( n log n)
T(n) = T(n-1) + O(n)
selection sort O( n2 )
SIGCSE 2004 28
Compute bn, b is BigInteger What is complexity using
BigInteger.pow? What method does it use? How do we analyze it?
Can we do exponentiation ourselves? Is there a reason to do so? What techniques will we use?
SIGCSE 2004 29
Correctness and Complexity?public BigInteger pow(BigInteger b,
int expo){
if (expo == 0) { return BigInteger.ONE; } BigInteger half = pow(b,expo/2); half = half.multiply(half);
if (expo % 2 == 0) return half; else return half.multiply(b);}
SIGCSE 2004 30
Correctness and Complexity?public BigInteger pow(BigInteger b, int expo){ BigInteger result =BigInteger.ONE;while (expo != 0){ if (expo % 2 != 0){
result = result.multiply(b); } expo = expo/2; b = b.multiply(b);}return result;
}
SIGCSE 2004 31
Solve and Analyze Given N words (e.g., from a file)
What are 20 most frequently occurring?
What are k most frequently occurring?
Proposals? Tradeoffs in efficiency Tradeoffs in implementation