
Page 1: SLIDES01_SE15_AOA [Read-Only]


Analysis of Algorithms - Day 1

Page 2: SLIDES01_SE15_AOA [Read-Only]


Objectives of the course

• To introduce the concept of 'Analysis of Algorithms'
• To learn the various factors that affect the performance of an algorithm
• To introduce algorithm design techniques
• To learn Code Tuning Techniques
• To introduce Numerical Analysis (Accuracy)
• To introduce Intractable problems

The main concerns of a software engineer are to ensure:
(i) Correctness of the solution
(ii) Decomposition of a software application into small and clean units which can be maintained easily
(iii) Good performance of the software application

The main objective of the course is to introduce “Analysis of Algorithms” and to compute the performance parameters of an algorithm.

After studying this course, you will get a better understanding of the importance of designing good algorithms and efficient programs.

Page 3: SLIDES01_SE15_AOA [Read-Only]


References

1. Donald E. Knuth (1997), The Art of Computer Programming, Volume 1: Fundamental Algorithms, Third Edition, Addison-Wesley

2. Cormen, Leiserson, Rivest, Stein (2001), Introduction to Algorithms, Second Edition, Prentice Hall

3. Alfred V. Aho, John E. Hopcroft, Jeffrey D. Ullman (1998), The Design & Analysis of Computer Algorithms, Addison-Wesley Publishing Company

4. Ellis Horowitz, Sartaj Sahni, Sanguthevar Rajasekaran (1998), Fundamentals of Computer Algorithms, Galgotia Publications Private Limited, New Delhi

5. Weiss, M. A. (1993), Data Structures and Algorithm Analysis in C, Benjamin Cummings, Addison-Wesley

6. Jon Bentley (2000), Programming Pearls, Second Edition, Pearson Education

7. McConnell, S. (1993), Code Complete, Microsoft Press

8. Press, et al. (2002), Numerical Recipes in C++, Cambridge University Press

Page 4: SLIDES01_SE15_AOA [Read-Only]


Course Plan

Day 1

• Introduction to Analysis of Algorithms
  – What is an Algorithm?
  – Properties of an Algorithm
  – Life cycle of an Algorithm

• Analyzing Algorithms
  – Introduction to Space and Time complexities
  – Basic Mathematical principles
  – Order of magnitude
  – Introduction to Asymptotic notations
    • Best case
    • Worst case
    • Average case

Page 5: SLIDES01_SE15_AOA [Read-Only]


Course Plan (cont...)

Day 2

• Algorithm design techniques
  – Brute force
  – Greedy
  – Divide & Conquer
  – Decrease & Conquer
  – Dynamic Programming

Page 6: SLIDES01_SE15_AOA [Read-Only]


Course plan (cont…)

Day 3

• Code Tuning
• SQL Query Tuning
• Introduction to Numerical Analysis
• Intractable problems
  – Deterministic vs Non-Deterministic machines
  – P vs NP
  – NP-Complete

Page 7: SLIDES01_SE15_AOA [Read-Only]


Analysis of Algorithms, Unit 1 - Introduction

Page 8: SLIDES01_SE15_AOA [Read-Only]


Introduction to Algorithms

The etymology of the word Algorithm dates back to the 8th Century AD. The word Algorithm is derived from the name of the Persian author "Abu Jafar Mohammad ibn Musa al Khowarizmi".

[Image: Muhammad al-Khowarizmi, from a 1983 USSR commemorative stamp scanned by Donald Knuth. Reference: ACM Trans - Algorithms]

Abu Jafar Mohammad ibn Musa al Khowarizmi was a great mathematician who was born around 780 AD in Baghdad. He worked on algebra, geometry, and astronomy. His treatise on algebra, Hisab al-jabr w'al-muqabala, was the most famous and important of all of al-Khwarizmi's works. It is the title of this text that gives us the word "algebra".

Page 9: SLIDES01_SE15_AOA [Read-Only]


What is an Algorithm?
• Finite set of instructions to accomplish a task. The algorithm should be correct
• The properties of an algorithm are as follows:

[Diagram: an Algorithm surrounded by its five properties - Input, Output, Finiteness, Definiteness, Effectiveness]

An Algorithm is defined as “Finite set of instructions to accomplish a task”.

An Algorithm has five properties, as follows:
Finiteness: An algorithm should end in a finite number of steps.
Definiteness: Every step of an algorithm should be clear and unambiguously defined.
Input: The input of an algorithm can either be given interactively by the user or generated internally.
Output: An algorithm should have at least one output.
Effectiveness: Every step in the algorithm should be easy to understand and prove using paper and pencil.

Page 10: SLIDES01_SE15_AOA [Read-Only]


Algorithm

Practice:

Write an algorithm to find the GCD of two numbers?

Step 1: Get two numbers m & n
Step 2: Divide m by n
Step 3: If the remainder is 0 then return n as the GCD
        else
          m ← n, n ← remainder
          Go to Step 2

Check whether the above algorithm (Euclid's Algorithm) to find the GCD of two given numbers satisfies all the properties of an algorithm.

The above algorithm satisfies all the properties except definiteness, because what will happen if m = -2 and n = 3.45? So change Step 1 to "Get two positive non-zero integers m and n".
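As a concrete illustration, here is a minimal C sketch of Euclid's algorithm as stated above. The function name gcd and the use of the % operator for the remainder are implementation choices for this example, not part of the slide.

#include <stdio.h>

/* Euclid's algorithm: while the remainder of m / n is not 0,
   replace m by n and n by the remainder; n is then the GCD.
   Assumes m and n are positive non-zero integers (the definiteness fix). */
unsigned int gcd(unsigned int m, unsigned int n)
{
    while (m % n != 0) {
        unsigned int r = m % n;   /* remainder of Step 2 */
        m = n;                    /* m <- n              */
        n = r;                    /* n <- remainder      */
    }
    return n;                     /* remainder is 0, so n is the GCD */
}

int main(void)
{
    printf("gcd(48, 36) = %u\n", gcd(48, 36));   /* prints 12 */
    return 0;
}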

Page 11: SLIDES01_SE15_AOA [Read-Only]


Algorithms span a vast space

• The definition "Finite set of instructions to accomplish a task" spans a very vast space. We will only discuss a few kinds of algorithms, but will briefly indicate the larger picture through a simplified banking application.

The gamut of algorithms is very vast, spanning symbolic, numerical, power efficient, fault tolerant algorithms, etc. We illustrate this wide variety through a simplified banking example.

The different kinds of algorithms used in the banking application have different speeds, memory requirements, real time response, numerical accuracy, fault tolerance, etc.

Page 12: SLIDES01_SE15_AOA [Read-Only]


Banking Applications: Utilize Computers, Networks, and Storage

[Architecture diagram annotations: WAN link to DR; failover; mirroring - keep multiple copies in sync; replication; communication protocols, real time / error recovery; authentication, encryption; high speed rule-based system, with/without state, 10K+ transactions/second; financially accurate calculation (Rs 1 in Rs 1,000,000 crores, one part in 10^13); bandwidth conservation, MP3; huge databases: 10's of terabytes; disk layout, data compression, database optimization, encryption; fault tolerance (detect potential loss); telephone banking (IVR), real time TTS; fault tolerant data structures; routing tables, link state information]

This banking application utilizes all kinds of algorithms from symbolic through real time through fault tolerant. The figure also illustrates in a simplified form the architectural building blocks which comprise this banking system, and algorithm classes written to execute on it.

We show a Finacle installation, with terminals at a branch connected to a set of clustered web servers for authentication. The web servers are in turn connected to a set of application servers for implementing banking rules and policies. The application servers access mirrored and/or replicated data storage. Redundancy is present in the network also. Telephone banking using an Interactive Voice Response System is used as a backup if the branch terminals break down.

The design of the authentication hardware and software requires fault tolerance – the users should not have to relogin if one or more servers fail – some state should be stored in the form of cookies in non-volatile storage somewhere. The banking calculations require very high accuracy (30+ digit accuracy). Various kinds of fault tolerance schemes are used for storage. For example, two mirrored disks always keep identical data. A write to one disk is not considered complete till the other is written also. The servers have to respond within seconds to each user level request (deposit, withdrawal, etc) – the real time response of the system has to be evaluated using queuing theory and similar techniques. For TTS, the response output speech samples have to be guaranteed to be delivered at periodic time intervals, say every 125 microseconds.

Glossary:
DR: Disaster Recovery
TTS: Text to Speech
IVR: Interactive Voice Response
WAN: Wide Area Network

Page 13: SLIDES01_SE15_AOA [Read-Only]


Pseudo Code
• An algorithm is independent of any language or machine, whereas a program is dependent on a language and machine
• To fill the gap between these two, we need pseudo code

Pseudo code is a way to represent the step-by-step method of finding the solution to the given problem.

Example:
Algorithm arrayMax(A, n)
  Input: array A of n integers
  Output: maximum element of A

  currentMax ← A[0]
  for i = 1 to n - 1 do
    if A[i] > currentMax then
      currentMax ← A[i]
  return currentMax

Algorithms are developed during the design phase of software engineering. During the design phase, we first look at the problem, try to write the "pseudo code" and move towards the programming (implementation) phase.

• It is a high level description of the algorithm
• It is less detailed than the program
• It will not reveal the design issues of the program
• It uses an English-like language
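For comparison with the pseudo code, a minimal C version of arrayMax might look as follows. The function and parameter names simply mirror the pseudo code; handling of n <= 0 is omitted in this sketch.

#include <stdio.h>

/* Returns the maximum element of the n-element array A,
   following the arrayMax pseudo code above. */
int arrayMax(const int A[], int n)
{
    int currentMax = A[0];              /* currentMax <- A[0] */
    for (int i = 1; i < n; i++) {       /* for i = 1 to n-1   */
        if (A[i] > currentMax)
            currentMax = A[i];          /* currentMax <- A[i] */
    }
    return currentMax;
}

int main(void)
{
    int A[] = { 12, 45, 7, 45, 3 };
    printf("max = %d\n", arrayMax(A, 5));   /* prints 45 */
    return 0;
}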

Page 14: SLIDES01_SE15_AOA [Read-Only]


Life Cycle of an Algorithm

• Design the Algorithm

• Write (Implementation of the Algorithm)

• Test the Algorithm

• Analyze the Algorithm

The life cycle of an algorithm consists of four phases: Design, Write, Test and Analyze.
(i) Design: Design techniques help in devising algorithms. Some techniques are Divide & Conquer, the Greedy technique, Dynamic Programming, etc. The design techniques will be dealt with in Unit 3 (Day 2).

(ii) Write (implementation): Implementing the algorithm in pseudo code which will be later represented in an appropriate programming language.

(iii) Test: Testing the algorithm for its correctness.

(iv) Analyze: Estimating the amount of time/space (which are considered to be prime resources) required while executing the algorithm.

Page 15: SLIDES01_SE15_AOA [Read-Only]


Resources available in a computer

[Diagram labels: CPU, primary memory, power]

The primary resources available in a deterministic silicon computer are the CPU and primary memory.

In this course we will focus on time (CPU utilization) and space (memory utilization).

When an algorithm is designed it should be analyzed for the amount of these resources it consumes. While solving a problem, an algorithm consuming more resources than others will not be considered in most of the cases.

Page 16: SLIDES01_SE15_AOA [Read-Only]


Analysis of Algorithms

• An algorithm, when implemented, uses the computer's primary memory and Central Processing Unit
• Analysis estimates the amount of resources needed for a particular solution of the problem
• The Analysis is done at two stages:
  – Priori Analysis: analysis done before implementation
  – Posteriori Analysis: analysis done after implementation

In Analysis we estimate the amount of resources needed for a particular solution of the problem. There are two types of Analysis:

Priori Analysis: This is the theoretical estimation of resources required. Here the efficiency of the algorithm is checked. If possible, the logic of the algorithm can be improved for efficiency. This is done before the implementation of the algorithm on a machine, and so it is done independent of any machine/software.

Posteriori Analysis: This Analysis is done after implementing the algorithm on a target machine. It is aimed at determining actual statistics about the algorithm's consumption of time and space (primary memory) in the computer when it is executed as a program.

E.g. Algorithm to check whether a number is prime or not.
Algo 1: Divide the number n by 2 to (n-1) and check the remainder
Algo 2: Divide the number n by 2 to n/2 and check the remainder
Algo 3: Divide the number n by 2 to sqrt(n) and check the remainder

Before implementing the algorithm in a programming language (Priori Analysis), the best of the three algorithms will be selected (Algo 3 will suit if n is large).

After implementing the algorithm (Posteriori Analysis) in a programming language, the performance is checked with the help of a profiler.
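For concreteness, a minimal C sketch of Algo 3 (trial division up to sqrt(n)) is shown below. The function name is_prime is a naming choice for this illustration; compile with the math library (-lm).

#include <stdio.h>
#include <math.h>

/* Algo 3: divide n by 2 up to sqrt(n) and check the remainder.
   Returns 1 if n is prime, 0 otherwise. */
int is_prime(unsigned int n)
{
    if (n < 2)
        return 0;
    unsigned int limit = (unsigned int)sqrt((double)n);
    for (unsigned int d = 2; d <= limit; d++) {
        if (n % d == 0)     /* remainder 0: a divisor was found */
            return 0;
    }
    return 1;
}

int main(void)
{
    printf("97 prime? %d\n", is_prime(97));   /* 1 */
    printf("91 prime? %d\n", is_prime(91));   /* 0, since 91 = 7 * 13 */
    return 0;
}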

Page 17: SLIDES01_SE15_AOA [Read-Only]


A high-level view of analysis of algorithms

[Diagram: dimensions along which an algorithm can be analyzed - Correctness; Resource usage: time/memory/power, communication/I-O; Accuracy: accurate to within an error margin, condition number; Asymptotics: O(N^2), O(N log N), and beyond asymptotics: mean, variance, ...; Power analysis, physical modeling; Resiliency analysis: mirroring, replication, distributed system analysis]

Algorithms can be analyzed in many dimensions: speed, accuracy, power consumption, and resiliency.

• Numerical algorithms have to be devised for adequate accuracy. Only after we get sufficient accuracy can we look at speed.

• Speed has many dimensions: asymptotics, mean time, variance of the execution time, etc. Memory, or resource usage in general, is a dual metric.

• Embedded systems have to be power efficient, e.g. cell phones.

• Many algorithms, especially in banking and finance, are required to be fault tolerant, especially of server failures. These systems generally have to be geographically distributed. The resulting communication overhead can often be the dominant contribution to time.

Page 18: SLIDES01_SE15_AOA [Read-Only]


Efficiency Measures

• Performance of a solution

• Most of the software problems do not have a single best solution

• Then how do we judge these solutions?

• The solutions are chosen based on performance measures

• Performance Measures

• Time

• Quality

• Simplicity…

Why Performance?

Since most software problems do not have a unique solution, we are always interested in finding a better solution. A better solution is judged based on its performance. Some of the performance measures include the time taken by the solution, the quality of the solution, the simplicity of the solution, etc.

For any solution to a problem we would always ask the following questions:

"Is it feasible to use this solution?" In other words, is it efficient enough to be used in practice? The efficiency measures we normally look for are time and space. How much time does this solution take? How much space (memory) does this solution occupy?

Improving the performance of a solution can be done by improving the algorithm design, database design, transaction design and by paying attention to the end-user psychology. Also continuous improvements in hardware and communication infrastructure aid in improving the performance of a solution.

Page 19: SLIDES01_SE15_AOA [Read-Only]


Efficiency Measures (Contd…)

• Space Time Tradeoff

Example 1: Consider a personnel management product that an organization can purchase and use to maintain information about its employees. If employee details were to be stored in an array, the array would have to be declared large enough to be able to hold the maximum number of records the system was rated to handle. This would always take up a large amount of memory. With a linked list implementation on the other hand, there would be better utilization of memory.

Which implementation would provide faster access to an employee with a given employee number?

Which implementation would be easier to code?

Which implementation would be easier to test?

The above mentioned example tries to highlight the need for performance. Each of the three questions asked are aimed at some performance measure.

The array data structure is a better choice for each of these questions. However, if a different company also plans to buy this product, then the array would have to be declared very large (which could well lead to wastage of space). In that case a linked list might be a better option.

This example also highlights a universal problem called the space time tradeoff, which we will be discussing shortly.

Page 20: SLIDES01_SE15_AOA [Read-Only]


Efficiency Measures (Contd…)

Example 2: Think of a GUI drop-down list box that displays a list of employees whose names begin with a specified sequence of characters. If the employee database is on a different machine, then there are two options:

Option a: fire a SQL and retrieve the relevant employee names each time the list is dropped down.

Option b: keep the complete list of employees in memory and refer to it each time the list is dropped down.

In your opinion which is the preferred option and why?

This example again does not have a unique solution. It depends on various parameters, which include:
• The number of employees
• The transmission time from the database server to the client machine
• The volume of data transmitted each time
• The frequency of such requests
• The network bandwidth

Neither of the solutions is the better one. The main point here is the tradeoff. Whenever we need better performance in terms of time taken, we could opt for option b, which would however lead to more memory requirements. The converse is also true: when we want our solution to occupy less memory (space), we need to compromise on efficiency in terms of time taken. This tradeoff is called the space time tradeoff, which is a universal principle.

Page 21: SLIDES01_SE15_AOA [Read-Only]


Efficiency Measures (Contd …)

Example 3: Which one of the following problems requires more space?

• Design a computer program which produces an output 1 if the word is of length 3n (n = 0, 1, 2, …) and 0 otherwise.
  Example: If the input is "aabcef" the output is 1. If the input is "aabc" then the output is 0.

• Design a computer program that sorts (in ascending order) and outputs the result for any input sequence a1, a2, …, an of numbers, where n is any natural number.

Consider the RAM size required by both the programs. Program 1 always requires a constant amount of memory. Program 2 requires memory that grows with the length of the input.
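To make the contrast concrete, here is a small C sketch of Program 1, under the assumption that "length 3n" means a length divisible by 3. It scans the input once and keeps only a single counter, so its memory use is constant regardless of the input length.

#include <stdio.h>

/* Program 1: output 1 if the word length is a multiple of 3, else 0.
   Only one counter (length modulo 3) is kept, so space is constant. */
int length_is_multiple_of_three(const char *word)
{
    int mod = 0;                            /* current length modulo 3 */
    for (const char *p = word; *p != '\0'; p++)
        mod = (mod + 1) % 3;
    return mod == 0;
}

int main(void)
{
    printf("%d\n", length_is_multiple_of_three("aabcef"));   /* 1 (length 6) */
    printf("%d\n", length_is_multiple_of_three("aabc"));     /* 0 (length 4) */
    return 0;
}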

Page 22: SLIDES01_SE15_AOA [Read-Only]


Summary of Unit - 1

• What is an Algorithm?

• Properties of an Algorithm

• Life Cycle of an Algorithm

• Performance Measures

Page 23: SLIDES01_SE15_AOA [Read-Only]


Analysis of Algorithms, Unit 2 - Analyzing Algorithms

Page 24: SLIDES01_SE15_AOA [Read-Only]


Analysis of Algorithms

• Refers to predicting the resources required by the algorithm, based on the size of the problem
• The primary resources required are Time and Space
• Analysis based on the time taken to execute the algorithm is called the Time complexity of the Algorithm
• Analysis based on the memory required to execute the algorithm is called the Space complexity of the Algorithm

When a programmer builds an algorithm during the design phase of the software life cycle, he/she might not be able to implement it immediately, because programming comes in a later part of the software life cycle. But there is a need to analyze the algorithm at that stage. This will help in forecasting how much time the algorithm takes or how much primary memory it might occupy when it is implemented. So analysis of algorithms becomes very important.

Complexity of an algorithm represents the amount of resources required while executing the algorithm. There will always be a tradeoff between the time and space complexity. Most problems which require more space will take less time to execute, and vice versa.

Page 25: SLIDES01_SE15_AOA [Read-Only]


Space Complexity

The space needed by a program has the following components:
• Instruction space
• Data space
• Environment stack space

Instruction space: Space needed to store the object code.

Data space: Space needed to store constants & variables.

Environment stack space: Space needed when functions are called. If a function fnA calls another function fnB, then the return address and all the local variables and formal parameters have to be stored.
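As a rough illustration of environment stack space (a sketch, not taken from the slides): a recursive factorial keeps one stack frame per pending call, so its stack usage grows with n, while an iterative version uses a constant amount of stack.

#include <stdio.h>

/* Recursive factorial: each call pushes a frame holding the return
   address and the parameter n, so the environment stack grows with n. */
unsigned long fact_recursive(unsigned int n)
{
    if (n <= 1)
        return 1;
    return n * fact_recursive(n - 1);
}

/* Iterative factorial: a single frame, constant environment stack space. */
unsigned long fact_iterative(unsigned int n)
{
    unsigned long result = 1;
    for (unsigned int i = 2; i <= n; i++)
        result *= i;
    return result;
}

int main(void)
{
    printf("%lu %lu\n", fact_recursive(10), fact_iterative(10));   /* 3628800 3628800 */
    return 0;
}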

Page 26: SLIDES01_SE15_AOA [Read-Only]


Time Complexity

Time complexity depends on the machine, compilers and other real time factors.

Total time = Σ_i ( t_i * op_i(n) )

where op_i(n) is the number of times the operation op_i occurs and t_i is the time taken to execute op_i.

This Total time is a varying factor which depends on the current load of the system and other real time factors like communication

Time complexity also depends on all the factors that the space complexity depends on.

Time complexity includes the compilation time and the execution time, but compilation is done once whereas execution is done many times. So in most cases only the execution time is considered, not the compilation time.

Page 27: SLIDES01_SE15_AOA [Read-Only]


Time Complexity (Cont…)

Operation count is one way to estimate the Time Complexity.

• Example 1: Searching an array for the presence of an element.
  Here the time complexity is estimated based on the number of search operations.

• Example 2: Finding the roots of a quadratic equation ax^2 + bx + c = 0.
  The roots are (-b + sqrt(b^2 - 4*a*c)) / (2*a) and (-b - sqrt(b^2 - 4*a*c)) / (2*a).
  Here the number of operations can be reduced by computing the common expression sqrt(b^2 - 4*a*c) only once.

The success of this method (Operation count) depends on the identification of the exact operation/s that contribute most to the time complexity.
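A small C sketch of Example 2, computing the common subexpression sqrt(b^2 - 4ac) only once and reusing it. Variable names are illustrative, a is assumed non-zero and complex roots are not handled; compile with -lm.

#include <stdio.h>
#include <math.h>

/* Roots of a*x^2 + b*x + c = 0, with the common expression
   sqrt(b^2 - 4ac) computed once to reduce the operation count. */
void quadratic_roots(double a, double b, double c, double *r1, double *r2)
{
    double s = sqrt(b * b - 4.0 * a * c);   /* common expression, computed once */
    *r1 = (-b + s) / (2.0 * a);
    *r2 = (-b - s) / (2.0 * a);
}

int main(void)
{
    double r1, r2;
    quadratic_roots(1.0, -5.0, 6.0, &r1, &r2);
    printf("roots: %g %g\n", r1, r2);       /* 3 and 2 */
    return 0;
}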

Page 28: SLIDES01_SE15_AOA [Read-Only]


Time Complexity (Cont…)

Step count is another way to estimate time complexity

Consider the code below:                         Total steps
sum(array, n)                                    0
1.1   tsum = 0;                                  1
1.2   for (i = 0; i < n; i++)                    2n+2
1.2.1   tsum = tsum + array[i];                  n
1.3   return tsum;                               1
                                                 ___________
Total number of steps:                           3n+4
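The same function written out in C, with the step counts from the table attached as comments. This is only a sketch; the counting convention for the for statement follows the slide.

#include <stdio.h>

/* Sum of the first n elements of array; step counts as in the table above. */
int sum(const int array[], int n)
{
    int tsum = 0;                       /* 1.1:   1 step       */
    for (int i = 0; i < n; i++)         /* 1.2:   2n+2 steps   */
        tsum = tsum + array[i];         /* 1.2.1: n steps      */
    return tsum;                        /* 1.3:   1 step       */
}                                       /* total: 3n+4 steps   */

int main(void)
{
    int a[] = { 1, 2, 3, 4 };
    printf("%d\n", sum(a, 4));          /* prints 10 */
    return 0;
}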

Page 29: SLIDES01_SE15_AOA [Read-Only]


Time Complexity (Cont…)

Recursive functions:                             Total steps
fact(n)                                          0
1.1   if (n <= 1)                                n
        return 1;                                1
1.2   return ( n * fact(n-1) );                  2n-2
                                                 ________
Total number of steps:                           3n-1

Step 1.1 is executed n times and its return statement 1 time. Step 1.2 contains one multiplication and one function call; each will be done (n-1) times, so 2n-2.

Page 30: SLIDES01_SE15_AOA [Read-Only]


Time Complexity (Cont…)

Function calls:
Consider a function calling the function sum(array, n) (ref: slide 28)
                                                 Total steps
Callsum(array1, array2, n)                       0
1.1   for (i = 0; i < n; i++)                    2n+2
1.1.1   array2[i] = sum(array1, i+1);            3i+8, summed: n(3n+13)/2
                                                 ______________
Total number of steps:                           (3n^2 + 17n + 4)/2

Regarding step 1.1.1: the function sum(array, n) is being called. The total number of steps for that function was already calculated as 3n + 4. Here sum is called with n = i+1, so substituting n = i+1 gives 3(i+1) + 4 = 3i + 7. This value is incremented by 1 for the function call itself, so it becomes 3i + 8. This 3i + 8 is summed for i = 0 to n-1, which is (3*0 + 8) + (3*1 + 8) + … + (3*(n-1) + 8) = 3(0 + 1 + 2 + … + (n-1)) + 8n = 3(n-1)n/2 + 8n = n(3n+13)/2.

Page 31: SLIDES01_SE15_AOA [Read-Only]


Kinds of Analysis of Algorithms

• Posteriori Analysis is aimed at determination of actual statistics about algorithm’s consumption of time and space requirements (primary memory) in the computer when it is being executed as a program. The Profiler tool is mainly used in finding the performance bottlenecks of a program

• Priori Analysis is aimed at analyzing the algorithm before it is implemented on any computer. It will give the approximate amount of resources required to solve the problem before execution

Posteriori analysis is done after implementing the algorithm in a Programming Language and running it in a machine.

Priori Analysis is carried out before the program is written (based on the algorithm). The calculation of the order of magnitude in the examples we have seen is the priori analysis of the algorithm.

In priori analysis, we ignore machine and platform dependent factors. Also, we analyze the algorithm before we write the program. It is always better to analyze the algorithm at an earlier stage of the software life cycle.

Page 32: SLIDES01_SE15_AOA [Read-Only]


Posteriori analysis

• External factors influence the execution of the algorithm
  – Network delay
  – Hardware failure, etc.
• The same algorithm might behave differently on different systems
• The load on the machine can vary, which affects the real performance measure of the algorithm
• A Profiler tool can be used for performing Posteriori analysis

Page 33: SLIDES01_SE15_AOA [Read-Only]


Posteriori Analysis (Cont…)

PROFILER
• What is a Profiler?
  A tool to identify the performance bottlenecks of an application.

• Why a Profiler?
  – To find the performance bottlenecks
  – To visualize the run time of the code
  – To find out the time consumed by the code for the given input

• Limitations of a Profiler
  – Most profilers report results in terms of specific time durations
  – Results may vary depending on the load on the system

• Queries can also be profiled (tools are provided by database vendors)
  – tkprof

Build a table which lists the total number of steps that each statement contributes. Add the contributions of all statements to obtain the step count for the entire program. So we can get the percentage of each statement. This approach in obtaining the step count (ref: time complexity) is called profiling. The same approach is applicable to various functions (subprograms) available in a program.

Refer Lab guide for VC++ profiler.

Page 34: SLIDES01_SE15_AOA [Read-Only]


Priori Analysis

Priori analysis requires knowledge of:
– Mathematical equations
– Determination of the problem size
– Order of magnitude of the algorithm

Each of these is discussed in the forthcoming sections.

Page 35: SLIDES01_SE15_AOA [Read-Only]


Some Basic Mathematics

Arithmetic Progressions:
  sum(i = 0 to n) i = 1 + 2 + 3 + … + (n-1) + n = n(n+1)/2

Geometric Progressions:
  sum(i = 0 to n) x^i = (x^(n+1) - 1) / (x - 1),  if x ≠ 1
  sum(i = 0 to infinity) x^i = 1 / (1 - x),  if |x| < 1

Mathematical knowledge is essential for performing priori analysis.

Arithmetic progressions: In this series, the difference between an element and its successor is the same as the difference between the element and its predecessor. So the series will be
  a, a + d, a + 2d, a + 3d, …
Sum of n terms = (n/2) * (first term + last term)
Also, the sum of n terms = (n/2) * [2 * first term + (n-1) * constant difference] = (n/2) * [2a + (n-1)d]

Geometric progressions: There is a constant ratio between an element and its successor (it is the same as the ratio between an element and its predecessor). So the series will be
  a, ar, ar^2, ar^3, …

The sums to n terms are shown in the slide above.

Page 36: SLIDES01_SE15_AOA [Read-Only]


Some Basic Mathematics (Contd…)

Logarithms:
  a^(log_a b) = b
  log_a b = 1 / log_b a
  log_b c = (log_a c) * (log_b a)
  log_a b = (log_c b) / (log_c a)

The log functions grow slowly compared to linear functions.
• log_a(x) is a constant multiple of log_b(x) for fixed a, b

Whenever log is written without a base in this course, it means log base 2.

Factorials: A number n! is given by 1 * 2 * 3 * … * (n-1) * n.

Page 37: SLIDES01_SE15_AOA [Read-Only]


Some Basic Mathematics (Contd…)

A few mathematical formulae.

1^2 + 2^2 + … + n^2 = n * (n + 1) * (2n + 1) / 6

1 + a + a^2 + … + a^n = (a^(n+1) - 1) / (a - 1)

Floor function floor(x) or ⌊x⌋: For a real number x, floor(x) is the largest integer not greater than x.

Choice function:
  nCr = n! / (r! * (n-r)!)

• Applying the basic concepts we have seen so far, the above series can be evaluated.

Page 38: SLIDES01_SE15_AOA [Read-Only]


Growth of functions

Algorithm complexity will be represented in terms of mathematical functions, e.g. n log n, n^2.

Given the complexities n log(n) and n^2, which will grow more slowly?

[Graph: growth of log(n), n, n log(n), n^2 and 2^n against problem size]

• In the figure in the slide, the x axis represents the problem size and the y axis represents the resources.
• As part of Basic Mathematical Principles we introduce applicable mathematics as required for this course.
• Growth of functions: The figure shows the growth of a few mathematical functions. The x-axis varies from 0 to 50 and the y-axis varies from 0 to 100. The point to be observed here is that the growth rate of the function log(n) is smaller when compared to the other functions, namely n, n log(n), n^2 and 2^n. An exponential function like 2^n will ultimately overtake any polynomial function. The need to understand the growth of these basic functions will be well appreciated in the later chapters where we analyze algorithms.
• From the graph, we can see that the logarithmic functions grow most slowly and the exponential functions grow much faster.

What are factorial functions? What is their growth rate?

Functions which grow at the rate of n! are called factorial functions. The growth rate of the factorial is so tremendous that it is much greater even than 2^n.
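To see these growth rates side by side, a short C program (purely illustrative) can tabulate the functions for a few problem sizes; compile with -lm.

#include <stdio.h>
#include <math.h>

/* Print log2(n), n*log2(n), n^2 and 2^n for a few problem sizes,
   to compare their growth rates numerically. */
int main(void)
{
    int sizes[] = { 2, 4, 8, 16, 32 };
    printf("%6s %10s %12s %10s %14s\n", "n", "log2(n)", "n*log2(n)", "n^2", "2^n");
    for (int i = 0; i < 5; i++) {
        double n = (double)sizes[i];
        printf("%6.0f %10.2f %12.2f %10.0f %14.0f\n",
               n, log2(n), n * log2(n), n * n, pow(2.0, n));
    }
    return 0;
}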

Page 39: SLIDES01_SE15_AOA [Read-Only]


Some Basic Mathematics (Contd…)

How many times should we divide (into half) the number of elements 'n' (discarding remainders, if any) to reach 1 element?

Since n is being divided by 2 consecutively, we need to consider two cases.

Case 1: n is a power of 2.
Say, for example, n = 8, in which case 8 must be halved 3 times to reach 1: 8 → 4 → 2 → 1. Similarly 16 must be halved 4 times to reach 1: 16 → 8 → 4 → 2 → 1.

Case 2: n is not a power of 2.
Say, for example, n = 9, in which case 9 must be halved 3 times to reach 1: 9 → 4 → 2 → 1. Similarly 15 must be halved 3 times to reach 1: 15 → 7 → 3 → 1. So if 2^m < n < 2^(m+1), then n must be halved m times to reach 1.

In general, n must be halved m times, where m is given by:

  m = floor(log2 n)

• The above result is needed for analyzing many algorithms.

• As a corollary to the above result, we can easily see that a number n must be halved floor(log2 n) + 1 times to reach 0.
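A tiny C check of this result (illustrative only): count the halvings needed to reach 1 and compare with floor(log2(n)); compile with -lm.

#include <stdio.h>
#include <math.h>

/* Count how many times n must be halved (integer division, discarding
   remainders) before it reaches 1; the result equals floor(log2(n)). */
int halvings_to_one(unsigned int n)
{
    int count = 0;
    while (n > 1) {
        n /= 2;
        count++;
    }
    return count;
}

int main(void)
{
    unsigned int tests[] = { 8, 16, 9, 15 };
    for (int i = 0; i < 4; i++) {
        unsigned int n = tests[i];
        printf("n = %2u: halvings = %d, floor(log2(n)) = %d\n",
               n, halvings_to_one(n), (int)floor(log2((double)n)));
    }
    return 0;
}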

Page 40: SLIDES01_SE15_AOA [Read-Only]


A high-level view of analysis of algorithms

[Diagram repeated from Page 17: dimensions along which an algorithm can be analyzed - correctness; resource usage (time/memory/power, communication/I-O); accuracy to within an error margin, condition number; asymptotics (O(N^2), O(N log N)) and beyond asymptotics (mean, variance, ...); power analysis, physical modeling; resiliency analysis (mirroring, replication, distributed system analysis)]

Given the wide variety of algorithms, they can be analyzed in many dimensions: speed, accuracy, power consumption, and resiliency.

• Numerical algorithms have to be devised for adequate accuracy. Only after we get sufficient accuracy can we look at speed.

• Speed has many dimensions: asymptotics, mean time, variance of the execution time, etc. Instead of time, we can also look at memory or resource usage in general.

• Embedded systems have to be power efficient, e.g. cell phones.

• Many algorithms, especially in banking and finance, are required to be fault tolerant, especially of server failures. These systems generally have to be geographically distributed. The resulting communication overhead can often be the dominant contribution to time.

•In this module, we shall primarily focus on ASYMPTOTICS

Page 41: SLIDES01_SE15_AOA [Read-Only]


Problem size

The problem size depends on the nature of the problem for which we are developing the algorithm. The complexity of an algorithm is expressed as a function of the problem size.

Examples:
• If we are searching for an element in an array having 'n' elements, the problem size is the size of the array ( = 'n').
• If we are merging 2 arrays of size 'n' and 'm', the problem size of the algorithm is the sum of the two array sizes ( = 'n + m').
• If we are computing the nth factorial, the problem size is 'n'.

The space required for storing n elements is n. The space required for representing a number n in binary is floor(log2 n) + 1 bits.

Page 42: SLIDES01_SE15_AOA [Read-Only]


Order of Magnitude of an algorithm

Calculate the running time and consider only the leading term of the formula, which gives the order of magnitude.

• Example 1
  for (i = 0; i < n; i++)
    ...
    ...

Assume there are 'c' statements inside the loop and each statement takes 1 unit of time.

Execution time for 1 iteration of the loop = c * 1 = c

Total execution time = n * c

Since 'c' is constant it is insignificant, so the order is 'n'.

In calculating the order of magnitude, the lower order terms are left out as they are relatively insignificant.

The assumptions in the example are made because we do not know on which machine the algorithm will be implemented, so we cannot say exactly how much time each statement will take. The exact time depends on the machine on which the algorithm is run. In the example the approximation is made because, for higher values of 'n', the effect of 'c' (a constant) is not significant. Thus, constants can be ignored.

Page 43: SLIDES01_SE15_AOA [Read-Only]


Order of Magnitude of an algorithm (Cont…)

• Example 2
  for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
      ...
      ...

Assume we have 'c' statements inside the innermost loop. Following the same assumptions as the earlier example:

Execution time for 1 iteration of the inner loop = c * 1

Execution time for the inner loop = m * c

Total execution time = n * (m * c)

Since c is a constant, the order is n * m.

In the above example, the inner loop will be executed m times and the outer loop n times.

Page 44: SLIDES01_SE15_AOA [Read-Only]


Analysis based on the nature of the problem

The analysis of the algorithm can be performed based on the nature of the problem. Thus we have:
• Worst case analysis
• Average case analysis
• Best case analysis

Worst case: Under what condition(s) does the algorithm, when executed, consume the maximum amount of resources? It is the maximum amount of resource the algorithm can consume for any value of the problem size.

Best case: Under what condition(s) does the algorithm, when executed, consume the minimum amount of resources?

Average case: This is between the worst case and the best case. It is probabilistic in nature. Average-case running times are calculated by first arriving at an understanding of the average nature of the input, and then performing a running-time analysis of the algorithm for this configuration. Average case analysis is often done by considering every possibility to be equally likely.

Page 45: SLIDES01_SE15_AOA [Read-Only]

45

Copyright © 2004, Infosys Technologies Ltd

45 ER/CORP/CRS/SE15/003

Version No: 2.0

Why Worst case analysis?

Even though the average case tends to be closer to the real situation, worst case analysis is preferred for the following reasons:

• It is better to bound one's pessimism – the time of execution cannot go beyond T(n), as it is the upper bound
• It is generally easier to compute the worst case than the best case or average case of an algorithm

During Priori analysis, worst case complexity is preferred. Why? The goodness of an algorithm is most often expressed in terms of its worst-case running time. There are two reasons for this: the need for a bound on one's pessimism, and the ease of calculating (in most cases) worst-case times as compared to average-case times.

We prefer the worst case complexity because it is easier to compute than the average case complexity, and the best case is of least use. Also, it is better to know the maximum time of execution of an algorithm, to be on the safer side.

Page 46: SLIDES01_SE15_AOA [Read-Only]


Asymptotic notations for determination of order of magnitude of an algorithm

The limiting behavior of the complexity of a problem as the problem size increases is called asymptotic complexity.

The most common asymptotic notations are:
• 'Big Oh' ('O') notation:
  It represents the upper bound of the resources required to solve a problem. It is represented by 'O'.
• 'Omega' notation:
  It represents the lower bound of the resources required to solve a problem. It is represented by Ω.

The goodness of an algorithm is usually expressed in terms of its worst case running time. The 'worst case running time' of an algorithm is the 'upper bound' for the time of execution of that algorithm for different problem sizes. An algorithm is said to have a worst-case running time of O(n^2) if its running time (execution time) is always bounded by n^2, where n is the problem size.

Goodness of an algorithm refers to efficiency or capability. Upper bound is also called the upper limit or the range of maximum values. E.g.: when we consider the marks of a student out of 100, 100 is the upper bound; the student can't get marks greater than 100.

Page 47: SLIDES01_SE15_AOA [Read-Only]


Asymptotic analysis: What it does?

• Asymptotic analysis is necessary but not sufficient for many kinds of problems

[Diagram: if the problem size grows from N to 2N, by what percentage does the running time grow relative to 100%?]

The large body of literature on asymptotic (priori) analysis basically answers the question: in relative terms, how much more time does a problem of twice (say) the size take? Say, if I can sort 1000 numbers in unit time, how much time will it take to sort 10000 numbers? The unit time is not specified (the analysis is relative), but could be, say, 10-100 microseconds on typical modern PCs.

It does not attempt to give exact estimates of runtime. In database and similar applications, asymptotic analysis is very useful, as it yields insight into scalability to larger database sizes. In real-time and transaction processing systems, scalability in terms of throughput (increased answers/second for problems of the same size) requires the mean and variance of the execution time to be controlled instead.

A large portion of this course will deal with asymptotic analysis.


Page 48: SLIDES01_SE15_AOA [Read-Only]


Big Oh notation

T(n) = O(f(n)) if there are constants c and n0 such that T(n) <= c·f(n) when n >= n0. In this Big-Oh notation for worst case analysis, c and n0 are positive constants (n0 an integer); n0 represents the threshold problem size.

[Graph: T(n) and c·f(n) plotted against problem size. Beyond the threshold problem size n0, T(n) is bounded within c·f(n), the upper bound of the algorithm.]

While we compute the complexity of any algorithm, we take the threshold problem size i.e n > n0 , where n0 is the threshold problem size and n is the problem size. Accordingly we determine the upper bound of computation. In the above graph, the dotted line (parallel to y axis ) passing through the intersection of T(n) and f(n) represents the threshold problem size.The threshold problem size is taken into account in priori analysis because the algorithm might have some assignment operations which can’t be neglected for a lower problem size ( i.e for lower values of ‘n’).

Example: T(n) = (n+1)^2, which is O(n^2).
Take f(n) = n^2, n0 = 1 (threshold value) and c = (1+1)^2 = 4.
Then there exist n0 and c such that T(n) <= c·f(n) for all n >= n0.
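A quick C check of this example (illustrative only): tabulate T(n) = (n+1)^2 against c*f(n) = 4n^2 and confirm that the bound holds from the threshold n0 = 1 onwards.

#include <stdio.h>

/* Verify numerically that T(n) = (n+1)^2 <= 4 * n^2 for n >= n0 = 1,
   i.e. that T(n) is O(n^2) with c = 4. */
int main(void)
{
    const long c = 4;
    for (long n = 1; n <= 10; n++) {
        long T = (n + 1) * (n + 1);     /* T(n)     */
        long bound = c * n * n;         /* c * f(n) */
        printf("n = %2ld  T(n) = %4ld  c*f(n) = %4ld  %s\n",
               n, T, bound, T <= bound ? "ok" : "bound violated");
    }
    return 0;
}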

Page 49: SLIDES01_SE15_AOA [Read-Only]


Theta & Omega notations

Theta notation (Θ):

T( n ) = Θ( f( n )) if there are positive constants c1, c2 and n0 such thatc2.f(n) ≤ T( n ) ≤ c1.f(n), for all n ≥ n0.

Omega Notation (Ω):

T( n ) = Ω( f( n )) if there are positive constants c and n0 such that T( n ) ≥ c.f( n ) for all n ≥ n0.

Theta notation: If it can be proved that, for some two positive constants c1 & c2, T(n) lies between c2.f(n) and c1.f(n) for all n ≥ n0, then T(n) can be expressed as Θ(f(n)).

Omega notation: The function f(n) is the lower bound for T(n). This means that for any value of n (n ≥ n0), the time of computation of the algorithm, T(n), is always above the graph of c.f(n). So f(n) serves as the lower bound for T(n).

Page 50: SLIDES01_SE15_AOA [Read-Only]


Big ‘Oh’ Vs Omega notations

Case (i) : A Project manager requires maximum of 100 software engineers to finish the project on time.

Case (ii) : The Project manager can start the project with minimum of 50 software engineers but cannot assure the completion of project in time.

Case (i) is similar to Big Oh notation, specifying the upper bound of resources needed to do a task.

Case (ii) is similar to Omega notation, specifying the lower bound of resources needed to do a task.

Which case is preferred?

Case (i) is preferred in most of the situations.

Page 51: SLIDES01_SE15_AOA [Read-Only]


‘Big Oh’ manipulations

While finding the worst case complexities of algorithms using Big Oh notation, some/all of the following rules are used.

Rule I: The leading coefficient of the highest power of 'n', all lower powers of 'n', and the constants are ignored in f(n).

Example:
  T(n) = O(100n^3 + 29n^2 + 19n)

Representing the same in Big Oh notation:

  T(n) = O(n^3)

The constants and the slower growing terms are ignored as their growth rates are insignificant compared to the growth rate of the highest power.

Page 52: SLIDES01_SE15_AOA [Read-Only]


Big Oh Manipulations (contd.,)

Rule II: The time of execution of a 'for loop' is the running time of all statements inside the 'for loop' multiplied by the number of iterations of the 'for loop'.

Example:
  for (i = 0 to n)
    x ← x + 1
    y ← y + 1
    x ← x + y

The for loop is executed n times. So, the worst case running time of the algorithm is

  T(n) = O(3 * n) = O(n)

Page 53: SLIDES01_SE15_AOA [Read-Only]


Big Oh Manipulations (contd.)

Rule III: If we have a nested 'for loop' in an algorithm, the analysis of that algorithm should start from the inner loop and move outwards towards the outer loop.

Example:
  for (j = 0 to m)
    for (i = 0 to n)
      x ← x + 1
      y ← y + 1
      z ← x + y

The worst case running time of the inner loop is O(3*n)

The worst case running time of the outer loop is O(m*3*n)

The total running time = O(m * n)

Page 54: SLIDES01_SE15_AOA [Read-Only]


Big Oh Manipulations (contd.)

Rule IV: The execution time of an 'if else statement' in an algorithm comprises:
• The execution time for testing the condition
• The maximum of the execution times of the 'if' part and the 'else' part (whichever is larger)

Example:
  if (x > y)
    print("x is larger than y");
    print("x is the value to be selected");
    z ← x;
    x ← x + 1;
  else
    print("x is smaller than y");

The execution time of the program is the execution time of testing (x > y) + the execution time of the 'if' part, as the execution time of the 'if' part is more than that of the 'else' part.

O(constant) = 1. For example, O(100) = 1.

Page 55: SLIDES01_SE15_AOA [Read-Only]


Case study on analysis of algorithms

The following examples will help us understand the concepts of worst case and average case complexities.

Example 1: Consider the following pseudocode.
To insert a given value k at a particular index l in an array a[1…n]:
1. Begin
2. Copy a[l…n] to a[l+1…n+1] (assuming space is available)
3. Copy k to a[l]
4. End

BEST CASE: O(1)

WORST CASE: O(n)

AVERAGE CASE: O(n)

The above given code inserts a value k into position l in an array a. The basic operation here is the copy.

Worst Case Analysis: Step 2 does n-1 copies in the worst case. Step 3 does 1 copy. So the total number of copy operations is n-1+1 = n. Hence the worst case complexity of array insertion is O(n).

Average Case Analysis: The probability that step 2 performs 1 copy is 1/n, the probability that it performs 2 copies is 1/n, and so on; the probability that it performs n copies is 1/n. Hence the average number of copies that step 2 performs is (1/n)(1 + 2 + … + n) = (n+1)/2. Also, step 3 performs 1 copy. So on an average the array insertion performs ((n+1)/2) + 1 copies. Hence the average case complexity of array insertion is O(n).

Best Case Analysis: O(1), as only one copy is done with no movements.
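A small C sketch of the insertion above, using a 0-indexed array rather than the slide's 1-indexed a[1…n]; the caller must ensure the array has room for one more element.

#include <stdio.h>

/* Insert value k at index l (0-based) among the first n elements of a.
   a must have capacity for at least n+1 elements.
   Copies performed: n - l shifts plus 1 write, i.e. O(n) in the worst
   case (l = 0) and O(1) in the best case (l = n). */
void array_insert(int a[], int n, int l, int k)
{
    for (int i = n; i > l; i--)        /* shift a[l..n-1] one place right */
        a[i] = a[i - 1];
    a[l] = k;                          /* place the new value             */
}

int main(void)
{
    int a[6] = { 10, 20, 30, 40, 50 };
    array_insert(a, 5, 2, 25);         /* insert 25 at index 2 */
    for (int i = 0; i < 6; i++)
        printf("%d ", a[i]);           /* 10 20 25 30 40 50    */
    printf("\n");
    return 0;
}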

Page 56: SLIDES01_SE15_AOA [Read-Only]


Case study (Contd…)

Example 2: Consider the following pseudocode.
To delete the value k at a given index i in an array a[1…n]:
1. Begin
2. Copy a[i+1…n] to a[i…n-1]
3. Clear a[n]
4. End

[Diagram: before deletion the array holds segments 1 to (i-1), i, and (i+1) to n; after deletion it holds segments 1 to (j-1) and j to (n-1)]

The above given code deletes the value k at a given index i in an array a. The basic operation here is the copy.

Worst Case Analysis: Step 2 does n-1 copies in the worst case. So the total number of copy operations is n-1. Hence the worst case complexity is O(n).

Average Case Analysis: On an average, step 2 will perform (n-1)/2 copies. This is derived as follows: the probability that step 2 performs 1 copy is 1/n, the probability that it performs 2 copies is 1/n, and so on; the probability that it performs n-1 copies is 1/n. Hence the average number of copies that step 2 performs is (1/n)(1 + 2 + … + (n-1)) = (n-1)/2. So on an average the array deletion performs (n-1)/2 copies. Hence the average case complexity of array deletion is O(n).

Best Case Analysis: O(1), as only one deletion is done with no further movements.

Page 57: SLIDES01_SE15_AOA [Read-Only]


Summary of Unit-2

• Analyzing Algorithms
  – Introduction to Space and Time complexities
  – Basic Mathematical principles
  – Order of magnitude
  – Introduction to Asymptotic notations
    • Best case
    • Worst case
    • Average case

Page 58: SLIDES01_SE15_AOA [Read-Only]


Thank You!