Introduction to Scientific Computing


Page 1: Introduction to Scientific Computing

University of Tartu, Institute of Computer Science

Introduction to Scientific Computing

MTAT.08.025

[email protected]

Spring 2017

Page 2: Introduction to Scientific Computing

2 Practical information

Lectures: Liivi 2 - 122, MON 10:15
Computer classes: Liivi 2 - 205, MON 12:15, Amnir Hadachi <[email protected]>
3 ECTS

Lectures: 16h; Computer classes: 16h; Independent work: 46h

The final grade is formed from:

1. Active participation at lectures 10%

2. Stand-up quiz(es) 10%

3. Computer class activities 50%

4. Exam 30%

Course homepage (http://courses.cs.ut.ee/2017/isc)

Page 3: Introduction to Scientific Computing

3 Introduction 1.1 Syllabus

1 Introduction

1.1 Syllabus

Lectures:

• Python for Scientific Computing NumPy, SciPy

• Scientific Computing - an Overview

• Floating point numbers, how to deal with roundoff errors

• Large problems in Linear Algebra, condition number

• Memory hierarchies and making use of it

• Numerical solution of differential equations

• Some examples (Google PageRank; Graph partitioning)

• Parallel Computing

Page 4: Introduction to Scientific Computing

4 Introduction 1.1 Syllabus

Computer Classes (preliminary plan)

1. Python for Scientific Computing; Fibonacci numbers; Collatz conjecture

2. Discretization and round-off errors

3. NumPy arrays, matrices; LU-Factorization with Gauss Elimination Method(GEM)

4. UT HPC server; LU-Factorization and GEM on HPC cluster

5. Floating point numbers

6. Fractals

7. Fourier series and Fast Fourier Transform

8. Discrete-time models and ordinary differential equations

Page 5: Introduction to Scientific Computing

5 Introduction 1.2 Literature

1.2 Literature

General Scientific Computing:

1. RH Landau, A First Course in Scientific Computing. Symbolic, Graphic, and Numeric Modeling Using Maple, Java, Mathematica, and Fortran90. Princeton University Press, 2005.

2. LR Scott, T Clark, B Bagheri. Scientific Parallel Computing. Princeton University Press, 2005.

3. MT Heath, Scientific Computing; ISBN: 007112229X, McGraw-Hill Companies, 2001.

4. JW Demmel, Applied Numerical Linear Algebra; ISBN: 0898713897, Society for Industrial & Applied Mathematics, Paperback, 1997.

Page 6: Introduction to Scientific Computing

6 Introduction 1.2 Literature

Python for SC:

1. Hans Petter Langtangen, A Primer on Scientific Programming with Python, Springer, 2009. Book webpage (http://vefur.simula.no/intro-programming/).

2. Hans Petter Langtangen, Python Scripting for Computational Science. Third Edition, Springer 2008. Website for the book (http://folk.uio.no/hpl/scripting/).

3. Neeme Kahusk, Sissejuhatus Pythonisse (Introduction to Python, in Estonian) (http://www.cl.ut.ee/inimesed/nkahusk/sissejuhatus-pythonisse/).

4. Travis E. Oliphant, Guide to NumPy (http://www.tramy.us), Trelgol Publishing 2006.

5. Help documentation inside Sage – used during Computer Classes

Page 7: Introduction to Scientific Computing

7 Introduction 1.3 Scripting vs programming

1.3 Scripting vs programming

1.3.1 What is a script?

• Very high-level, often short, program written in a high-level scripting language

• Scripting languages:

– Unix shells,

– Tcl,

– Perl,

– Python,

– Ruby,

– Scheme,

– Rexx,

– JavaScript,

– VisualBasic,

– ...

Page 8: Introduction to Scientific Computing

8 Introduction 1.3 Scripting vs programming

1.3.2 Characteristics of a script

• Glue other programs together

• Extensive text processing

• File and directory manipulation

• Often special-purpose code

• Many small interacting scripts may yield a big system

• Perhaps a special-purpose GUI on top

• Portable across Unix, Windows, Mac

• Interpreted program (no compilation+linking)

Page 9: Introduction to Scientific Computing

9 Introduction 1.3 Scripting vs programming

1.3.3 Why not stick to Java, C/C++ or Fortran?

Features of Perl and Python compared with Java, C/C++ and Fortran:

• shorter, more high-level programs

• much faster software development

• more convenient programming

• you feel more productive

• no variable declarations, but lots of consistency checks at run time

• lots of standardized libraries and tools

Page 10: Introduction to Scientific Computing

10 Introduction 1.4 Scripts yield short code

1.4 Scripts yield short code

Consider reading real numbers from a file, where each line can contain an arbitrary number of real numbers:

1.1 9 5.2
1.762543E-02
0 0.01 0.001 9 3 7

Python solution:

F = open(filename, 'r')
n = F.read().split()

Perl solution:

open F, $filename;
$s = join "", <F>;
@n = split ' ', $s;

Page 11: Introduction to Scientific Computing

11 Introduction 1.5 Performance issues

Ruby solution:

n = IO.readlines(filename).join.split

Doing this in C++ or Java¹ requires at least a loop, and in Fortran and C quite some code lines are necessary.
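For comparison, NumPy can do the same in a couple of lines (a minimal sketch, not from the slides; the file name numbers.dat is a hypothetical example):

import numpy as np

# read all whitespace-separated reals from a file into a 1-D array
filename = 'numbers.dat'                 # hypothetical input file
with open(filename) as F:
    numbers = np.array(F.read().split(), dtype=float)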

1.5 Performance issues

1.5.1 Scripts can be slow

• Perl and Python scripts are first compiled to byte-code

• The byte-code is then interpreted

¹ True in the case of older Java implementations. With Java 8 and the Java Streams that come with it, for example, plain Java 8 code does the same: Files.lines(filePath).flatMap(Pattern.compile("\\s+")::splitAsStream);

Page 12: Introduction to Scientific Computing

12 Introduction 1.5 Performance issues

• Text processing is usually as fast as in C

• Loops over large data structures might be very slow

for i in range(len(A)):
    A[i] = ...

• Fortran, C and C++ compilers are good at optimizing such loops at compile time

and produce very efficient assembly code (e.g. 100 times faster)

• Fortunately, long loops in scripts can easily be migrated to Fortran or C (orspecial libraries like numpy!)

Page 13: Introduction to Scientific Computing

13 Introduction 1.5 Performance issues

1.5.2 Scripts may be fast enough

Read 100 000 (x,y) data from file and write (x,f(y)) out again

• Pure Python: 4s

• Pure Perl: 3s

• Pure Tcl: 11s

• Pure C (fscanf/fprintf): 1s

• Pure C++ (iostream): 3.6s

• Pure C++ (buffered streams): 2.5s

• Numerical Python modules: 2.2s (!)

• Remark: in practice, 100 000 data points are written and read in binary format,resulting in much smaller differences

Page 14: Introduction to Scientific Computing

14 Introduction 1.5 Performance issues

1.5.3 When scripting is convenient

• The application’s main task is to connect together existing components

• The application includes a graphical user interface

• The application performs extensive string/text manipulation

• The design of the application code is expected to change significantly

• CPU-time intensive parts can be migrated to C/C++ or Fortran

• The application can be made short if it operates heavily on list or hash structures

• The application is supposed to communicate with Web servers

• The application should run without modifications on Unix, Windows, and Macintosh computers, also when a GUI is included

Page 15: Introduction to Scientific Computing

15 Introduction 1.5 Performance issues

1.5.4 When to use C, C++, Java, Fortran

• Does the application implement complicated algorithms and data structures?

• Does the application manipulate large datasets so that execution speed iscritical?

• Are the application’s functions well-defined and changing slowly?

• Will type-safe languages be an advantage, e.g., in large development teams?

In this course we will be using Python and the corresponding libraries for Scientific Computing (numpy, scipy through Sage)

NOTE: is Python really a scripting language?

• Object-oriented language!

• Cython

• PyPy – jit (just in time) compiler

Page 16: Introduction to Scientific Computing

16 What is Scientific Computing? 2.1 Introduction to Scientific Computing

2 What is Scientific Computing?2.1 Introduction to Scientific Computing

• Scientific computing – subject on crossroads of

• physics, chemistry, [social, engineering,...] sciences

– problems typically translated into

∗ linear algebraic problems

∗ sometimes combinatorial problems

• a computational scientist needs knowledge of some aspects of

– numerical analysis

– linear algebra

– discrete mathematics

Page 17: Introduction to Scientific Computing

17 What is Scientific Computing? 2.1 Introduction to Scientific Computing

• An efficient implementation needs some understanding of

– computer architecture

∗ both on the CPU level

∗ on the level of parallel computing

– some specific skills of software management

Scientific Computing – the field of study concerned with constructing mathematical models and numerical solution techniques and using computers to analyse and solve scientific and engineering problems

• typically – application of computer simulation and other forms of computationto problems in various scientific disciplines.

Page 18: Introduction to Scientific Computing

18 What is Scientific Computing? 2.1 Introduction to Scientific Computing

Main purpose of Scientific Computing:

• mirroring and
• prediction

of real-world processes'

– characteristics
– behaviour
– development

Example of Computational Simulation

ASTROPHYSICS: what happens in a collision of two black holes in the universe? A situation which is

• impossible to observe in nature,

• impossible to test in a lab,

• barely possible to estimate theoretically

Page 19: Introduction to Scientific Computing

19 What is Scientific Computing? 2.1 Introduction to Scientific Computing

Computer simulation CAN HELP

But what is needed for simulation?

• an adequate mathematical model (Einstein's general relativity theory)

• an algorithm for the numerical solution of the equations

• a big enough computer for the actual realisation of the algorithms

Frequently there is a need to simulate situations that could be performed experimentally, but simulation on computers is needed because of:

• HIGH COST OF THE REAL EXPERIMENT. Examples:

– car crash-tests

– simulation of gas explosions

– nuclear explosion simulation

– behaviour of ships in Ocean waves

Page 20: Introduction to Scientific Computing

20 What is Scientific Computing? 2.1 Introduction to Scientific Computing

– airplane aerodynamics

– strength calculations in big constructions (for example oil-platforms)

– oil-field simulations

• TIME FACTOR. Some examples:

– Climate change predictions

– Geological development of the Earth (including oil-fields)

– Glacier flow model

– Weather prediction

• SCALE OF THE PROBLEM. Some examples:

– modeling chemical reactions on the molecular level

– development of biological ecosystems

Page 21: Introduction to Scientific Computing

21 What is Scientific Computing? 2.1 Introduction to Scientific Computing

• PROCESSES THAT CANNOT BE INTERFERED WITH. Some examples:

– human heart model

– global economy model

Page 22: Introduction to Scientific Computing

22 What is Scientific Computing? 2.2 Specifics of computational problems

2.2 Specifics of computational problems

Usually, computer simulation consists of:

1. Creation of a mathematical model – usually in the form of equations describing the physical properties and dependencies of the subject

2. Algorithm creation for numerical solution of the equations

3. Application of the algorithms in computer software

4. Using the created software on a computer in a particular simulationprocess

5. Visualizing the results in an understandable way using computer graph-ics, for example

6. Integration of the results and repetition/redesign of any given step above

Page 23: Introduction to Scientific Computing

23 What is Scientific Computing? 2.2 Specifics of computational problems

Most often:

• algorithm

– written down in an intuitive way

– and/or using special modeling software

• computer program written, based on the algorithm

• testing

• iterating

Explorative nature of Scientific Computing!

Page 24: Introduction to Scientific Computing

24 What is Scientific Computing? 2.3 Mathematical model

2.3 Mathematical model

GENERAL STRATEGY: REPLACE A DIFFICULT PROBLEM WITH A SIMPLER ONE

– which has the same solution

– or at least an approximate solution

– but still reflects the most important features of the problem

Page 25: Introduction to Scientific Computing

25 What is Scientific Computing? 2.3 Mathematical model

SOME EXAMPLES OF SUCH TECHNIQUES:

• Replacing infinite spaces with finite ones (in maths sense)

• Infinite processes replacement with finite ones

– replacing integrals with finite sums

– derivatives replaced by finite differences

• Replacing differential equations with algebraic equations

• Nonlinear equations replaced by linear equations

• Replacing higher order systems with lower order ones

• Replacing complicated functions with more simple ones (like polynomials)

• Arbitrary structured matrix replacement with more simple structured matrices

Page 26: Introduction to Scientific Computing

26 What is Scientific Computing? 2.3 Mathematical model

AT THIS COURSE WE TRY TO GIVE:

an overview of some methods and analysis for the development of reliable and efficient software for Scientific Computing

Reliability means here both the reliability of the software and the adequacy of the results – how much one can rely on the achieved results:

• Is the solution acceptable at all? Is it a real solution (extraneous solution? instability of the solution? etc.)? Does the solution algorithm guarantee a solution at all?

• How big is the calculated solution's deviation from the real solution? How well does the simulation reflect the real world?

Another aspect: software reliability

Page 27: Introduction to Scientific Computing

27 What is Scientific Computing? 2.3 Mathematical model

Efficiency expressed on various levels of the solution process

• speed

• amount of used resources

Resources can be:

– Time

– Cost

– Number of CPU cycles

– Number of processes

– Amount of RAM

– Human labour

Page 28: Introduction to Scientific Computing

28 What is Scientific Computing? 2.3 Mathematical model

General formula:

min (approximation error) / (time)

Even more generally: minimise the time of the solution.

Efficient method requires:

(i) good discretisation

(ii) good computer implementation

(iii) depends on the computer architecture (processor speed, RAM size, memory bus speed, availability of cache, number of cache levels and other properties)

Page 29: Introduction to Scientific Computing

29 Approximation 3.1 Sources of approximation error

3 Approximation in Scientific Computing

3.1 Sources of approximation error

3.1.1 Error sources that are under our control

MODELLING ERRORS – some physical entities in the model are simplified or evennot taken into account at all (for example: air resistance, viscosity, friction etc)

(Usually it is OK but sometimes not... )(You may want to look: http://en.wikipedia.org/wiki/Spherical_cow :-)

Page 30: Introduction to Scientific Computing

30 Approximation 3.1 Sources of approximation error

MEASUREMENT ERRORS – laboratory equipment has its precision

Errors also come from

• random measurement deviations

• background noise

As an example, the Newton and Planck constants are used with 8-9 decimal places, while laboratory measurements are performed with much less precision!

THE EFFECT OF PREVIOUS CALCULATIONS – the input for the calculations is often already the output of some previous calculation, with its own computational errors

Page 31: Introduction to Scientific Computing

31 Approximation 3.1 Sources of approximation error

3.1.2 Errors created during the calculations

Discretisation

As an example:

• replacing derivatives with finite differences

• finite sums used instead of infinite series

• etc

Round-off errors – errors created during the calculations due to the limited precision with which the calculations are performed

Page 32: Introduction to Scientific Computing

32 Approximation 3.1 Sources of approximation error

Example 3.1

Suppose a computer program can find the function value f(x) for arbitrary x.
Task: find an algorithm for calculating an approximation to the derivative f′(x).
Algorithm: choose a small h > 0 and approximate

f′(x) ≈ [f(x+h) − f(x)]/h

The discretisation error is

T := |f′(x) − [f(x+h) − f(x)]/h|.

Using the Taylor series, we get the estimate

T ≤ (h/2) ‖f′′‖∞.    (1)

Page 33: Introduction to Scientific Computing

33 Approximation 3.1 Sources of approximation error

The computational error is created by using finite precision arithmetic, approximating the real f(x) with an approximation f̂(x). The computational error C is

C = | [f̂(x+h) − f̂(x)]/h − [f(x+h) − f(x)]/h | = | ([f̂(x+h) − f(x+h)] − [f̂(x) − f(x)])/h |,

which gives the estimate

C ≤ (2/h) ‖f̂ − f‖∞.    (2)

The resulting error is

| f′(x) − [f̂(x+h) − f̂(x)]/h |,

which can be estimated using (1) and (2):

T + C ≤ (h/2) ‖f′′‖∞ + (2/h) ‖f̂ − f‖∞.    (3)

Page 34: Introduction to Scientific Computing

34 Approximation 3.1 Sources of approximation error

⇒ if h is large, the discretisation error dominates; if h is small, the computational error starts to dominate.
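A minimal Python sketch of this trade-off (not from the slides; it uses f = sin as an example, so the exact derivative cos(x) is known; the helper name fd_error is ours):

from numpy import sin, cos, abs

def fd_error(x, h):
    # total error of the forward-difference approximation of f'(x), with f = sin
    return abs(cos(x) - (sin(x + h) - sin(x)) / h)

x = 1.0
for h in [1e-1, 1e-4, 1e-8, 1e-12]:
    print h, fd_error(x, h)
# for large h the discretisation error dominates,
# for very small h the round-off (computational) error dominates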

3.1.3 Forward error (error of the result) and backward error (error of the input data)

Consider computing y = f(x). Usually we can only compute an approximation of y; we denote the approximately calculated value by ŷ. We can observe two measures of the error associated with this computation.

Forward error

The forward error is a measure of the difference between the approximation ŷ and the true value y:

absolute forward error: |ŷ − y|
relative forward error: |ŷ − y| / |y|

Page 35: Introduction to Scientific Computing

35 Approximation 3.1 Sources of approximation error

The forward error would be a natural quantity to measure, but usually (since we don't know the actual value of y) we can only get an upper bound on it. Moreover, tight upper bounds can be very difficult to obtain.

Backward error

The question we might want to ask: for what input data did we actually perform the calculations? We would like to find the smallest ∆x for which

ŷ = f(x + ∆x)

– here ŷ is the exact value of f(x + ∆x). The value |∆x| (or |∆x|/|x|) is called the backward error. This means the backward error is the error we have in the input (just as the forward error is the error we observe in the output of the calculation or algorithm).

Page 36: Introduction to Scientific Computing

36 Approximation 3.1 Sources of approximation error

Condition number – an upper limit of their ratio:

forward error ≤ condition number × backward error

From (2) it follows that in Example 3.1 the value of the condition number is 2/h. In the given calculations all the values are absolute: the actual values of the approximated entities are not considered. The relative forward error and relative backward error are in this case

C / |( f(x+h) − f(x) )/h|    and    ‖f̂ − f‖∞ / ‖f‖∞ .

Assuming that minₓ |f′(x)| > 0, it follows easily from (2) that

C / |( f(x+h) − f(x) )/h|  ≤  [ (2/h) ‖f‖∞ / minₓ |f′(x)| ] · ‖f̂ − f‖∞ / ‖f‖∞ .

Page 37: Introduction to Scientific Computing

37 Approximation 3.1 Sources of approximation error

The value in the brackets [·] is called the relative condition number of the problem. In general:

• If (absolute or relative) condition number is small,

– then (absolute or relative) error in the input data can produce only a smallerror in the result.

• If condition number is large

– then large error in the result can be caused even by a small error in theinput data

– such problems are said to be ill-conditioned

Page 38: Introduction to Scientific Computing

38 Approximation 3.1 Sources of approximation error

• Sometimes, in the case of finite precision arithmetic:

– the backward error is much simpler to estimate than the forward error

– the backward error combined with the condition number makes it possible to estimate the forward error (absolute or relative)

Page 39: Introduction to Scientific Computing

39 Approximation 3.1 Sources of approximation error

Example 3.2 (one of the key problems in Scientific Computing). Consider solving the system of linear equations

Ax = b,    (4)

where the input consists of

• a nonsingular n×n matrix A

• b ∈ Rⁿ

The task is to calculate an approximate solution x̂ ∈ Rⁿ. Suppose that instead of the exact matrix A we are given its approximation Â = A + δA, but (for simplicity) b is known exactly. The solution x̂ = x + δx satisfies the system of equations

(A + δA)(x + δx) = b.    (5)

Page 40: Introduction to Scientific Computing

40 Approximation 3.1 Sources of approximation error

Then from (4), (5) it follows that

(A + δA) δx = −(δA) x.

Multiplying by (A + δA)⁻¹ and taking norms, we estimate

‖δx‖ ≤ ‖(A + δA)⁻¹‖ ‖δA‖ ‖x‖.

It follows that if x ≠ 0 and A ≠ 0, we have

‖δx‖/‖x‖ ≤ ‖(A + δA)⁻¹‖ ‖A‖ · (‖δA‖/‖A‖) ≅ ‖A⁻¹‖ ‖A‖ · (‖δA‖/‖A‖),    (6)

where the last approximation holds for δA sufficiently small.

Page 41: Introduction to Scientific Computing

41 Approximation 3.1 Sources of approximation error

• ⇒ for the calculation of x an important factor is the relative condition number κ(A) := ‖A⁻¹‖ ‖A‖.

– It is usually called the condition number of the matrix A.

– It depends on the norm ‖·‖

Therefore, common practice for forward error estimation is to:

• find an estimate of the backward error

• use the estimate (6), as sketched below
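A small NumPy illustration of this practice (not from the slides; the matrix and the perturbation are arbitrary example choices):

from numpy import *

n = 5
A = random.uniform(0.0, 1.0, (n, n))
x = random.uniform(-1, 1, n)
b = dot(A, x)
dA = 1e-8 * random.uniform(-1, 1, (n, n))    # small backward error in the input A
x_pert = linalg.solve(A + dA, b)             # solution of the perturbed system
rel_forward = linalg.norm(x_pert - x) / linalg.norm(x)
rel_backward = linalg.norm(dA) / linalg.norm(A)
kappa = linalg.cond(A)                       # condition number kappa(A)
print rel_forward, '<=', kappa * rel_backward   # forward error <= cond * backward error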

Page 42: Introduction to Scientific Computing

42 Approximation 3.2 Floating-Point Numbers

3.2 Floating-Point Numbers

The number −3.1416 in scientific notation is −0.31416× 101 or (as computeroutput) -0.31416E01.

sign

exponent

−.31416 101

mantissa base

– floating point numbers in computer notation. Usually, base is 2 (with a few excep-tions like IBM 370 had a base 16; base 10 in most of hand-held calculators; 3 in anill-fated Russian computer).

For example, .101012×23 = 5.2510.(-: There are 10 kinds of people in the world – those who understand binary – and those who don’t :-)

Formally, a floating-point number system F, is characterised by four integers:

• Base (or radix) β > 1

• Precision p > 0

• Exponent range [L,U ]: L < 0 <U

Page 43: Introduction to Scientific Computing

43 Approximation 3.2 Floating-Point Numbers

Any floating-point number x ∈ F has the form

x = ±( d0 + d1·β⁻¹ + ... + d_{p−1}·β^{1−p} ) · β^E,    (7)

where the integers di satisfy

0 ≤ di ≤ β − 1,  i = 0, ..., p−1,

and E ∈ [L, U] (E is a positive, zero or negative integer). The number E is called the exponent and the part in the brackets (·) is called the mantissa.

Example. In arithmetic with precision 4 and base 10 the number 2347 is represented as

( 2 + 3×10⁻¹ + 4×10⁻² + 7×10⁻³ ) × 10³.

Is it possible to represent 2345 in precision 3 and base 10? Note that an exact representation of 2347 in precision 3 and base 10 is not possible!
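A small sketch of rounding to a given precision (not from the slides; the helper name round_to_precision is ours):

from math import floor, log

def round_to_precision(x, base=10, p=4):
    # round x > 0 to p significant base-digits, i.e. to the nearest number of the form (7)
    E = int(floor(log(x, base)))             # exponent of the leading digit
    scale = float(base) ** (E - p + 1)       # weight of the least significant kept digit
    return round(x / scale) * scale

print round_to_precision(2347.0, p=4)   # 2347.0 - exactly representable with 4 digits
print round_to_precision(2347.0, p=3)   # 2350.0 - with 3 digits, 2347 must be rounded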

Page 44: Introduction to Scientific Computing

44 Approximation 3.3 Normalised floating-point numbers

3.3 Normalised floating-point numbers

A number is normalised if d0 > 0.

Example. The number .10101₂ × 2³ is normalised, but .010101₂ × 2⁴ is not.

Floating point systems are usually normalised because:

• Representation of each number is then unique

• No digits are wasted on leading zeros

• In a normalised binary (β = 2) system, the leading bit is always 1 ⇒ no need to store it!

The smallest positive normalised number in the form (7) is 1 × β^L – the underflow threshold. (In the case of underflow, the result is smaller than the smallest representable floating-point number.)

Page 45: Introduction to Scientific Computing

45 Approximation 3.3 Normalised floating-point numbers

The largest positive normalised number in the form (7) is

(β − 1)( 1 + β⁻¹ + ... + β^{1−p} ) β^U = (1 − β^{−p}) β^{U+1}

– the overflow threshold.

If the result of an arithmetic operation is an exact number not representable in the floating-point number system F, the result is represented by a (hopefully close) element of F (rounding).

Page 46: Introduction to Scientific Computing

46 Approximation 3.4 IEEE (Normalised) Arithmetics

3.4 IEEE (Normalised) Arithmetics

• β = 2 (binary)

• d0 = 1 always – not stored

Single precision:

• p = 24, L =−126, U = 127

• Underflow threshold = 2⁻¹²⁶ ≈ 10⁻³⁸

• Overflow threshold = 2¹²⁷ · (2 − 2⁻²³) ≈ 2¹²⁸ ≈ 10³⁸

• One bit for the sign, 23 bits for the mantissa and 8 for the exponent:

1 + 23 + 8 = 32-bit word.

Page 47: Introduction to Scientific Computing

47 Approximation 3.4 IEEE (Normalised) Arithmetics

Double precision:

• p = 53, L = −1022, U = 1023

• Underflow threshold = 2⁻¹⁰²² ≈ 10⁻³⁰⁸

• Overflow threshold = 2¹⁰²³ · (2 − 2⁻⁵²) ≈ 2¹⁰²⁴ ≈ 10³⁰⁸

• One bit for the sign, 52 bits for the mantissa and 11 for the exponent:

1 + 52 + 11 = 64-bit word

• IEEE arithmetics standard – rounding towards the nearest element in F.

• (If the result is exactly between the two elements, the rounding is towards thenumber which has the least significant bit equal to 0 – rounding towards theclosest even number)

Page 48: Introduction to Scientific Computing

48 Approximation 3.4 IEEE (Normalised) Arithmetics

IEEE subnormal numbers - unnormalised numbers with minimal possible expo-nent.

• Between 0 and the smallest normalised floating point value.

• Guarantees that fl(x − y) (the result of the operation x − y in floating point arithmetic) is never zero when x ≠ y – to avoid underflow in such situations

IEEE symbols Inf and NaN – Inf (±∞), NaN (Not a Number)

• Inf - in case of overflow

– x/±∞ = 0 for an arbitrary finite floating-point x

– +∞ + ∞ = +∞, etc.

• NaN is returned when an operation does not have a well-defined finite or infinite value, for example

Page 49: Introduction to Scientific Computing

49 Approximation 3.4 IEEE (Normalised) Arithmetics

– ∞ − ∞

– 0/0

– √−1

– NaN ◦ x (where ◦ is one of the operations +, −, *, /), etc.

IEEE defines also double extended floating-point values

• 64 bit mantissa; 15 bit exponent

• most of the compilers do not support it

• Many platforms support also quadruple precision (double*16)

– often emulated with lower precision and therefore slow performance
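These thresholds and special values can be inspected directly from NumPy (a short illustration, not part of the slides):

from numpy import float32, float64, finfo, inf, nan, isnan

print finfo(float32).tiny, finfo(float32).max   # single precision thresholds (~1e-38, ~3e38)
print finfo(float64).tiny, finfo(float64).max   # double precision thresholds (~1e-308, ~1e308)
print finfo(float64).eps                        # machine epsilon 2**(-52)
print float64(1e-310)                           # a subnormal number: below tiny, but not zero
print 1.0/inf, inf - inf, isnan(nan)            # 0.0, nan, True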

Page 50: Introduction to Scientific Computing

50 Python in SC 4.1 Numerical Python (NumPy)

4 Python in Scientific Computing

4.1 Numerical Python (NumPy)

• NumPy enables efficient numerical computing in Python

• NumPy is a package of modules, which offers efficient arrays (contiguous stor-age) with associated array operations coded in C or Fortran

• There are three implementations of Numerical Python

• Numeric from the mid 90s

• numarray from about 2000

• numpy from 2006 (the new and leading implementation)

• numpy (by Travis Oliphant) – recommended

Page 51: Introduction to Scientific Computing

51 Python in SC 4.1 Numerical Python (NumPy)

# A taste of NumPy: a least-squares procedure
from numpy import *
n = 100; x = linspace(0.0, 1.0, n)      # coordinates
y_line = -2*x + 3
y = y_line + random.normal(0, 0.55, n)  # line with noise
# create and solve least squares system:
A = array([x, ones(n)])
A = A.transpose()
result = linalg.lstsq(A, y)
# result is a 4-tuple, the solution (a,b) is the 1st entry:
a, b = result[0]
p = [(x[i], y[i]) for i in range(len(x))]
p0 = (0, a*0 + b); p1 = (1, a*1 + b)
G = list_plot(p, color='red') + line([(0,3), (1,1)], color='blue')
G = G + line([p0, p1], color='red')
G = G + text('Blue - original line -2*x+3', (0.7, 3.5), color='blue')
G = G + text('Red - line fitted to data', (0.3, 0.5), color='red')
show(G)   # note: retype the quote characters when copy-pasting code into Sage

Page 52: Introduction to Scientific Computing

52 Python in SC 4.1 Numerical Python (NumPy)

Resulting plot:

Page 53: Introduction to Scientific Computing

53 Python in SC 4.1 Numerical Python (NumPy)

4.1.1 NumPy: making arrays

>>> from numpy import *
>>> n = 4

>>> a = zeros(n) # one-dim. array of length n

>>> print a # str(a), float (C double) is default type

[ 0. 0. 0. 0.]

>>> a # repr(a)

array([ 0., 0., 0., 0.])

>>> p = q = 2

>>> a = zeros((p,q,3)) # p*q*3 three-dim. array

>>> print a

[[[ 0. 0. 0.]

[ 0. 0. 0.]]

[[ 0. 0. 0.]

[ 0. 0. 0.]]]

>>> a.shape # a’s dimension

(2, 2, 3)

Page 54: Introduction to Scientific Computing

54 Python in SC 4.1 Numerical Python (NumPy)

4.1.2 NumPy: making float, int, complex arrays

>>> a = zeros(3)
>>> print a.dtype        # a's data type
float64
>>> a = zeros(3, int)
>>> print a, a.dtype
[0 0 0] int64
(or int32, depending on architecture)
>>> a = zeros(3, float32)   # single precision
>>> print a
[ 0.  0.  0.]
>>> print a.dtype
float32
>>> a = zeros(3, complex); a
array([ 0.+0.j,  0.+0.j,  0.+0.j])
>>> a.dtype
dtype('complex128')

Page 55: Introduction to Scientific Computing

55 Python in SC 4.1 Numerical Python (NumPy)

• Given an array a, make a new array of same dimension and data type:

>>> x = zeros(a.shape, a.dtype)

4.1.3 Array with a sequence of numbers

• linspace(a, b, n) generates n uniformly spaced coordinates, startingwith a and ending with b

>>> x = linspace(-5, 5, 11)

>>> print x

[-5. -4. -3. -2. -1. 0. 1. 2. 3. 4. 5.]

Page 56: Introduction to Scientific Computing

56 Python in SC 4.1 Numerical Python (NumPy)

• arange works like range >>> x = arange(-5, 5, 1, float)

>>> print x # upper limit 5 is not included

[-5. -4. -3. -2. -1. 0. 1. 2. 3. 4.]

4.1.4 Warning: arange is dangerous

• arange’s upper limit may or may not be included (due to round-off errors)
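A short illustration of the pitfall (not from the slides): whether the last point appears depends on round-off-sized details, so linspace is the safer choice when the endpoint matters:

>>> from numpy import arange, linspace
>>> len(arange(0, 1, 0.1))          # upper limit 1 excluded here: 10 points
10
>>> len(arange(0, 1 + 1e-9, 0.1))   # a round-off-sized change in the limit adds a point near 1.0
11
>>> linspace(0, 1, 11)              # endpoint and number of points are fixed explicitly
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ])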

4.1.5 Array construction from a Python list

array(list, [datatype]) generates an array from a list: >>> pl = [0, 1.2, 4, -9.1, 5, 8]

>>> a = array(pl)

Page 57: Introduction to Scientific Computing

57 Python in SC 4.1 Numerical Python (NumPy)

• The array elements are of the simplest possible type:

>>> z = array([1, 2, 3])

>>> print z # int elements possible

[1 2 3]

>>> z = array([1, 2, 3], float)

>>> print z

[ 1.  2.  3.]

• A two-dimensional array from two one-dimensional lists:

>>> x = [0, 0.5, 1]; y = [-6.1, -2, 1.2] # Python lists

>>> a = array([x, y]) # form array with x and y as rows

Page 58: Introduction to Scientific Computing

58 Python in SC 4.1 Numerical Python (NumPy)

• From array to list: alist = a.tolist()

4.1.6 From “anything” to a NumPy array

• Given an object a, a = asarray(a)

converts a to a NumPy array (if possible/necessary)

• Arrays can be ordered as in C (default) or Fortran:

a = asarray(a, order='Fortran')
isfortran(a)   # returns True if a's order is Fortran

Page 59: Introduction to Scientific Computing

59 Python in SC 4.1 Numerical Python (NumPy)

• Use asarray to, e.g., allow flexible arguments in functions:

def myfunc(some_sequence, ...):

a = asarray(some_sequence)

# work with a as array

myfunc([1,2,3], ...)

myfunc((-1,1), ...)

myfunc(zeros(10), ...)

Page 60: Introduction to Scientific Computing

60 Python in SC 4.1 Numerical Python (NumPy)

4.1.7 Changing array dimensions >>> a = array([0, 1.2, 4, -9.1, 5, 8])

>>> a.shape = (2,3) # turn a into a 2x3 matrix

>>> a.shape

(2, 3)

>>> a.size

6

>>> a.shape = (a.size,)   # turn a into a vector of length 6 again

>>> a.shape

(6,)

>>> a = a.reshape(2,3) # same effect as setting a.shape

>>> a.shape

(2, 3)

Page 61: Introduction to Scientific Computing

61 Python in SC 4.1 Numerical Python (NumPy)

4.1.8 Array initialization from a Python function >>> def myfunc(i, j):

... return (i+1)*(j+4-i)

...

>>> # make 3x6 array where a[i,j] = myfunc(i,j):

>>> a = fromfunction(myfunc, (3,6))

>>> a

array([[ 4., 5., 6., 7., 8., 9.],

[ 6., 8., 10., 12., 14., 16.],

[ 6., 9., 12., 15., 18., 21.]])

Page 62: Introduction to Scientific Computing

62 Python in SC 4.1 Numerical Python (NumPy)

4.1.9 Basic array indexing a = linspace(-1, 1, 6)

# array([-1. , -0.6, -0.2, 0.2, 0.6, 1. ])

a[2:4] = -1 # set a[2] and a[3] equal to -1

a[-1] = a[0] # set last element equal to first one

a[:] = 0 # set all elements of a equal to 0

a.fill(0) # set all elements of a equal to 0

a.shape = (2,3) # turn a into a 2x3 matrix

print a[0,1] # print element (0,1)

a[i,j] = 10 # assignment to element (i,j)

a[i][j] = 10 # equivalent syntax (slower)

print a[:,k] # print column with index k

print a[1,:] # print second row

a[:,:] = 0 # set all elements of a equal to 0

Page 63: Introduction to Scientific Computing

63 Python in SC 4.1 Numerical Python (NumPy)

4.1.10 More advanced array indexing >>> a = linspace(0, 29, 30)

>>> a.shape = (5,6)

>>> a

array([[  0.,   1.,   2.,   3.,   4.,   5.],
       [  6.,   7.,   8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.,  16.,  17.],
       [ 18.,  19.,  20.,  21.,  22.,  23.],
       [ 24.,  25.,  26.,  27.,  28.,  29.]])

>>> a[1:3,:-1:2] # a[i,j] for i=1,2 and j=0,2,4

array([[ 6., 8., 10.],

[ 12., 14., 16.]])

>>> a[::3,2:-1:2] # a[i,j] for i=0,3 and j=2,4

array([[ 2., 4.],

[ 20., 22.]])

Page 64: Introduction to Scientific Computing

64 Python in SC 4.1 Numerical Python (NumPy)

>>> i = slice(None, None, 3); j = slice(2, -1, 2)

>>> a[i,j]

array([[ 2., 4.],

[ 20., 22.]])

4.1.11 Slices refer to the array data

• With a as list, a[:] makes a copy of the data

• With a as array, a[:] is a reference to the data!!!

>>> b = a[1,:]   # extract the 2nd row of a

>>> print a[1,1]

12.0

>>> b[1] = 2

>>> print a[1,1]

Page 65: Introduction to Scientific Computing

65 Python in SC 4.1 Numerical Python (NumPy)

2.0   # change in b is reflected in a

• Take a copy to avoid referencing via slices:

>>> b = a[1,:].copy()

>>> print a[1,1]

12.0

>>> b[1] = 2 # b and a are two different arrays now

>>> print a[1,1]

12.0 # a is not affected by change in b

Page 66: Introduction to Scientific Computing

66 Python in SC 4.1 Numerical Python (NumPy)

4.1.12 Integer arrays as indices

• An integer array or list can be used as (vectorized) index >>> a = linspace(1, 8, 8)

>>> a

array([ 1., 2., 3., 4., 5., 6., 7., 8.])

>>> a[[1,6,7]] = 10

>>> a # ?

array([ 1., 10., 3., 4., 5., 6., 10., 10.])

>>> a[range(2,8,3)] = -2

>>> a # ?

array([ 1., 10., -2., 4., 5., -2., 10., 10.])

>>> a[a < 0] # pick out the negative elements of a

array([-2., -2.])

>>> a[a < 0] = a.max()

>>> a # ?

Page 67: Introduction to Scientific Computing

67 Python in SC 4.1 Numerical Python (NumPy)

array([ 1., 10., 10., 4., 5., 10., 10., 10.])

• Such array indices are important for efficient vectorized code

4.1.13 Loops over arrays

• Standard loop over each element:

for i in xrange(a.shape[0]):

for j in xrange(a.shape[1]):

a[i,j] = (i+1)*(j+1)*(j+2)

print ’a[%d,%d]=%g ’ % (i,j,a[i,j]),

print # newline after each row

Page 68: Introduction to Scientific Computing

68 Python in SC 4.1 Numerical Python (NumPy)

• A standard for loop iterates over the first index: >>> print a

[[ 2. 6. 12.]

[ 4. 12. 24.]]

>>> for e in a:

... print e

...

[ 2. 6. 12.]

[ 4. 12. 24.]

• View the array as one-dimensional and iterate over all elements:

for e in a.flat:

print e

Page 69: Introduction to Scientific Computing

69 Python in SC 4.1 Numerical Python (NumPy)

• For loop over all index tuples and values:

>>> for index, value in ndenumerate(a):

... print index, value

...

(0, 0) 2.0

(0, 1) 6.0

(0, 2) 12.0

(1, 0) 4.0

(1, 1) 12.0

(1, 2) 24.0

Page 70: Introduction to Scientific Computing

70 Python in SC 4.1 Numerical Python (NumPy)

4.1.14 Array computations

• Arithmetic operations can be used with arrays:

b = 3*a - 1 # a is array, b becomes array

1) compute t1 = 3*a, 2) compute t2= t1 - 1, 3) set b = t2

Page 71: Introduction to Scientific Computing

71 Python in SC 4.1 Numerical Python (NumPy)

• Array operations are much faster than element-wise operations:

>>> import time # module for measuring CPU time

>>> a = linspace(0, 1, 1E+07) # create some array

>>> t0 = time.clock()

>>> b = 3*a -1

>>> t1 = time.clock() # t1-t0 is the CPU time of 3*a-1

>>> for i in xrange(a.size): b[i] = 3*a[i] - 1

>>> t2 = time.clock()

>>> print ’3*a-1: %g sec, loop: %g sec’ % (t1-t0, t2-t1)

3*a-1: 2.09 sec, loop: 31.27 sec

4.1.15 In-place array arithmetics

• Expressions like 3*a-1 generates temporary arrays

Page 72: Introduction to Scientific Computing

72 Python in SC 4.1 Numerical Python (NumPy)

• With in-place modifications of arrays, we can avoid temporary arrays (to someextent)

b = a

b *= 3 # or multiply(b, 3, b)

b -= 1   # or subtract(b, 1, b)

Note: a is changed too; to avoid this, use b = a.copy()

Page 73: Introduction to Scientific Computing

73 Python in SC 4.1 Numerical Python (NumPy)

• In-place operations: a *= 3.0 # multiply a’s elements by 3

a -= 1.0 # subtract 1 from each element

a /= 3.0 # divide each element by 3

a += 1.0 # add 1 to each element

a **= 2.0   # square all elements

• Assign values to all elements of an existing array:

a[:] = 3*c - 1

Page 74: Introduction to Scientific Computing

74 Python in SC 4.1 Numerical Python (NumPy)

4.1.16 Standard math functions can take array arguments # let b be an array

c = sin(b)

c = arcsin(c)

c = sinh(b)

# same functions for the cos and tan families

c = b**2.5 # power function

c = log(b)

c = exp(b)

c = sqrt(b)

Page 75: Introduction to Scientific Computing

75 Python in SC 4.1 Numerical Python (NumPy)

4.1.17 Other useful array operations # a is an array

a.clip(min=3, max=12) # clip elements

a.mean(); mean(a) # mean value

a.var(); var(a) # variance

a.std(); std(a) # standard deviation

median(a)

cov(x,y) # covariance

trapz(a) # Trapezoidal integration

diff(a) # finite differences (da/dx) # more Matlab-like functions:

corrcoef, cumprod, diag, eig, eye, fliplr, flipud, max, min,
prod, ptp, rot90, squeeze, sum, svd, tri, tril, triu

Page 76: Introduction to Scientific Computing

76 Python in SC 4.1 Numerical Python (NumPy)

4.1.18 More useful array methods and attributes >>> a = zeros(4) + 3

>>> a

array([ 3., 3., 3., 3.]) # float data

>>> a.item(2) # more efficient than a[2]

3.0

>>> a.itemset(3,-4.5) # more efficient than a[3]=-4.5

>>> a

array([ 3. , 3. , 3. , -4.5])

>>> a.shape = (2,2)

>>> a

array([[ 3. , 3. ],

[ 3. , -4.5]])

Page 77: Introduction to Scientific Computing

77 Python in SC 4.1 Numerical Python (NumPy)

>>> a.ravel() # from multi-dim to one-dim

array([ 3. , 3. , 3. , -4.5])

>>> a.ndim # no of dimensions

2

>>> len(a.shape) # no of dimensions

2

>>> rank(a) # no of dimensions

2

>>> a.size # total no of elements

4

>>> b = a.astype(int) # change data type

>>> b

array([3, 3, 3, 3])

Page 78: Introduction to Scientific Computing

78 Python in SC 4.1 Numerical Python (NumPy)

4.1.19 Complex number computing >>> from math import sqrt

>>> sqrt(-1) # ?

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

ValueError: math domain error

>>> from numpy import sqrt

>>> sqrt(-1) # ?

Warning: invalid value encountered in sqrt

nan

>>> from cmath import sqrt # complex math functions

>>> sqrt(-1) # ?

1j

>>> sqrt(4) # cmath functions always return complex...

(2+0j)

Page 79: Introduction to Scientific Computing

79 Python in SC 4.1 Numerical Python (NumPy)

>>> from numpy.lib.scimath import sqrt

>>> sqrt(4)

2.0 # real when possible

>>> sqrt(-1)

1j # otherwise complex

Page 80: Introduction to Scientific Computing

80 Python in SC 4.1 Numerical Python (NumPy)

4.1.20 A root function # Goal: compute roots of a parabola, return real when possible,

# otherwise complex

def roots(a, b, c):

# compute roots of a*x^2 + b*x + c = 0

from numpy.lib.scimath import sqrt

q = sqrt(b**2 - 4*a*c) # q is real or complex

r1 = (-b + q)/(2*a)

r2 = (-b - q)/(2*a)

return r1, r2

>>> a = 1; b = 2; c = 100

>>> roots(a, b, c) # complex roots

((-1+9.94987437107j), (-1-9.94987437107j))

>>> a = 1; b = 4; c = 1

>>> roots(a, b, c) # real roots

(-0.267949192431, -3.73205080757)

Page 81: Introduction to Scientific Computing

81 Python in SC 4.1 Numerical Python (NumPy)

4.1.21 Array type and data type >>> import numpy

>>> a = numpy.zeros(5)

>>> type(a)

<type ’numpy.ndarray’>

>>> isinstance(a, ndarray) # is a of type ndarray?

True

>>> a.dtype # data (element) type object

dtype(’float64’)

>>> a.dtype.name

’float64’

>>> a.dtype.char # character code

’d’

>>> a.dtype.itemsize # no of bytes per array element

8

Page 82: Introduction to Scientific Computing

82 Python in SC 4.1 Numerical Python (NumPy)

>>> b = zeros(6, float32)

>>> a.dtype == b.dtype # do a and b have the same data type?

False

>>> c = zeros(2, float)

>>> a.dtype == c.dtype

True

Page 83: Introduction to Scientific Computing

83 Python in SC 4.1 Numerical Python (NumPy)

4.1.22 Matrix objects

• NumPy has an array type, matrix, much like Matlab's array type:

>>> x1 = array([1, 2, 3], float)
>>> x2 = matrix(x1)   # or just mat(x1)

>>> x2 # row vector

matrix([[ 1., 2., 3.]])

>>> x3 = mat(x1).transpose()   # column vector

>>> x3

matrix([[ 1.],

[ 2.],

[ 3.]])

>>> type(x3)

<class ’numpy.core.defmatrix.matrix’>

>>> isinstance(x3, matrix)

True

Page 84: Introduction to Scientific Computing

84 Python in SC 4.1 Numerical Python (NumPy)

• Only 1- and 2-dimensional arrays can be matrix

• For matrix objects, the * operator means matrix-matrix or matrix-vector multiplication (not elementwise multiplication):

>>> A = eye(3) # identity matrix

>>> A = mat(A) # turn array to matrix

>>> A

matrix([[ 1., 0., 0.],

[ 0., 1., 0.],

[ 0., 0., 1.]])

>>> y2 = x2*A # vector-matrix product

>>> y2

matrix([[ 1., 2., 3.]])

>>> y3 = A*x3 # matrix-vector product

>>> y3

matrix([[ 1.],

[ 2.],

[ 3.]])

Page 85: Introduction to Scientific Computing

85 Python in SC 4.2 NumPy: Vectorisation

4.2 NumPy: Vectorisation

• Loops over an array run slowly

• Vectorization = replace explicit loops by function calls such that the whole loop is implemented in C (or Fortran)

• Explicit loops:

r = zeros(x.shape, x.dtype)
for i in xrange(x.size):
    r[i] = sin(x[i])

• Vectorised version:

r = sin(x)

Page 86: Introduction to Scientific Computing

86 Python in SC 4.2 NumPy: Vectorisation

• Arithmetic expressions work for both scalars and arrays

• Many fundamental functions work for scalars and arrays

• Ex: x**2 + abs(x) works for x scalar or array

A mathematical function written for scalar arguments can (normally) take array arguments:

>>> def f(x):

... return x**2 + sinh(x)*exp(-x) + 1

...

>>> # scalar argument:

>>> x = 2

>>> f(x)

5.4908421805556333

>>> # array argument:

>>> y = array([2, -1, 0, 1.5])

Page 87: Introduction to Scientific Computing

87 Python in SC 4.2 NumPy: Vectorisation

>>> f(y)
array([ 5.49084218, -1.19452805,  1.        ,  3.72510647])

4.2.1 Vectorisation of functions with if tests; problem

• Consider a function with an if test: def somefunc(x):

if x < 0:

return 0

else:

return sin(x)

# or

def somefunc(x): return 0 if x < 0 else sin(x)

Page 88: Introduction to Scientific Computing

88 Python in SC 4.2 NumPy: Vectorisation

• This function works with a scalar x but not an array

• Problem: x<0 results in a boolean array, not a boolean value that can be used inthe if test

>>> x = linspace(-1, 1, 3); print x

[-1. 0. 1.]

>>> y = x < 0

>>> y

array([ True, False, False], dtype=bool)

>>> 'ok' if y else 'not ok'   # test of y in a scalar boolean context
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Page 89: Introduction to Scientific Computing

89 Python in SC 4.2 NumPy: Vectorisation

4.2.2 Vectorisation of functions with if tests; solutions

A. Simplest remedy: call NumPy's vectorize function to allow array arguments to a function:

>>> somefuncv = vectorize(somefunc, otypes='d')

>>> # test:

>>> x = linspace(-1, 1, 3); print x

[-1. 0. 1.]

>>> somefuncv(x) # ?

array([ 0.        ,  0.        ,  0.84147098])

Note: the data type must be specified as a character

• The speed of somefuncv is unfortunately quite slow

Page 90: Introduction to Scientific Computing

90 Python in SC 4.2 NumPy: Vectorisation

B. A better solution, using where: def somefunc_NumPy2(x):

x1 = zeros(x.size, float)

x2 = sin(x)

return where(x < 0, x1, x2)

Page 91: Introduction to Scientific Computing

91 Python in SC 4.2 NumPy: Vectorisation

4.2.3 General vectorization of if-else tests

def f(x):               # scalar x
    if condition:
        x = <expression1>
    else:
        x = <expression2>
    return x

def f_vectorized(x):    # scalar or array x
    x1 = <expression1>
    x2 = <expression2>
    return where(condition, x1, x2)

Page 92: Introduction to Scientific Computing

92 Python in SC 4.2 NumPy: Vectorisation

4.2.4 Vectorization via slicing

• Consider a recursion scheme (which arises from a one-dimensional diffusion equation)

• Straightforward (slow) Python implementation:

n = size(u)-1
for i in xrange(1, n, 1):
    u_new[i] = beta*u[i-1] + (1-2*beta)*u[i] + beta*u[i+1]

• Slices enable us to vectorize the expression:

u[1:n] = beta*u[0:n-1] + (1-2*beta)*u[1:n] + beta*u[2:n+1]
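A minimal runnable version of this step (not from the slides; beta and the initial u are arbitrary example values):

from numpy import zeros, linspace

beta = 0.4
u = linspace(0.0, 1.0, 11)          # example initial values
n = u.size - 1

# loop version, writing into a separate array
u_new = zeros(u.size)
for i in xrange(1, n):
    u_new[i] = beta*u[i-1] + (1-2*beta)*u[i] + beta*u[i+1]

# vectorised version: the right-hand side is evaluated completely
# before the assignment, so the old values of u are used throughout
u[1:n] = beta*u[0:n-1] + (1-2*beta)*u[1:n] + beta*u[2:n+1]

print u_new[1:n] - u[1:n]           # all (close to) zero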

Page 93: Introduction to Scientific Computing

93 Python in SC 4.3 NumPy: Random numbers

4.3 NumPy: Random numbers

• Drawing scalar random numbers: import random

random.seed(2198) # control the seed

print ’uniform random number on (0,1):’, random.random()

print ’uniform random number on (-1,1):’, random.uniform(-1,1)

print 'Normal(0,1) random number:', random.gauss(0,1)

• Vectorized drawing of random numbers (arrays):

from numpy import random

random.seed(12) # set seed

u = random.random(n) # n uniform numbers on (0,1)

u = random.uniform(-1, 1, n) # n uniform numbers on (-1,1)

u = random.normal(m, s, n) # n numbers from N(m,s)

Page 94: Introduction to Scientific Computing

94 Python in SC 4.3 NumPy: Random numbers

• Note that both modules have the name random! A remedy:

import random as random_number # rename random for scalars

from numpy import * # random is now numpy.random

Page 95: Introduction to Scientific Computing

95 Python in SC 4.4 NumPy: Basic linear algebra

4.4 NumPy: Basic linear algebra

NumPy contains the linalg module for

• solving linear systems

• computing the determinant of a matrix

• computing the inverse of a matrix

• computing eigenvalues and eigenvectors of a matrix

• solving least-squares problems

• computing the singular value decomposition of a matrix

• computing the Cholesky decomposition of a matrix

Page 96: Introduction to Scientific Computing

96 Python in SC 4.4 NumPy: Basic linear algebra

4.4.1 A linear algebra session

from numpy import *                # includes import of linalg
n = 100                            # fill matrix A and vectors x and b:
A = random.uniform(0.0, 1.0, (n,n)); x = random.uniform(-1, 1, n)
b = dot(A, x)                      # matrix-vector product
y = linalg.solve(A, b)             # solve A*y = b
if allclose(x, y, atol=1.0E-12, rtol=1.0E-12):
    print '--correct solution'
d = linalg.det(A); B = linalg.inv(A)
# check result:
R = dot(A, B) - eye(n)             # residual
R_norm = linalg.norm(R)            # Frobenius norm of matrix R
print 'Residual R = A*A-inverse - I:', R_norm
A_eigenvalues = linalg.eigvals(A)  # eigenvalues only
A_eigenvalues, A_eigenvectors = linalg.eig(A)
for e, v in zip(A_eigenvalues, A_eigenvectors.T):   # eigenvectors are the columns
    print 'eigenvalue %s has corresponding vector\n%s' % (e, v)

Page 97: Introduction to Scientific Computing

97 Python in SC 4.5 Python: Plotting modules

4.5 Python: Plotting modules

By default python environments come with:

• Interface to Gnuplot (curve plotting, 2D scalar and vector fields)

• Matplotlib (curve plotting, 2D scalar and vector fields)

• 3D: Tachyon (ray-tracing) Jmol (interactive plotting)

Available Python interfaces to:

• Interface to Vtk (2D/3D scalar andvector fields)

• Interface to OpenDX (2D/3D scalarand vector fields)

• Interface to IDL

• Interface to Grace

• Interface to Matlab

• Interface to R

• Interface to Blender

• PyX (PostScript/TEX-like drawing)

Page 98: Introduction to Scientific Computing

98 Python in SC 4.5 Python: Plotting modules

from numpy import *
n = 100; x = linspace(0.0, 1.0, n); y = linspace(0.0, 1.0, n)
a = -2; b = 3; c = 7
z_line = a*x + b*y + c
rscal = 0.05
xx = x + random.normal(0, rscal, n)
yy = y + random.normal(0, rscal, n)
zz = z_line + random.normal(0, rscal, n)
A = array([xx, yy, ones(n)])
A = A.transpose()
result = linalg.lstsq(A, zz)
aa, bb, cc = result[0]
p0 = (x[0], y[0], a*x[0]+b*y[0]+c)
p1 = (x[n-1], y[n-1], a*x[n-1]+b*y[n-1]+c)

Page 99: Introduction to Scientific Computing

99 Python in SC 4.5 Python: Plotting modules

pp = [(xx[i], yy[i], zz[i]) for i in range(len(x))]
p = [(x[i], y[i], z_line[i]) for i in range(len(x))]
pp0 = (xx[0], yy[0], aa*xx[0]+bb*yy[0]+cc)
pp1 = (xx[n-1], yy[n-1], aa*xx[n-1]+bb*yy[n-1]+cc)
G = line3d([p0, p1], color='blue')
G = G + list_plot(pp, color='red', opacity=0.2)
G = G + line3d([pp0, pp1], color='red')
G = G + text3d('Blue - original line: ' + '%.4f*x+%.4f*y+%.4f' % (a, b, c),
               (p[0][0], p[0][1], p[0][2]), color='blue')
G = G + text3d('Red - fitted line: ' + '%.4f*x+%.4f*y+%.4f' % (aa, bb, cc),
               (p[n-1][0], p[n-1][1], p[n-1][2]), color='red')
show(G)

Page 100: Introduction to Scientific Computing

100 Python in SC 4.5 Python: Plotting modules

Page 101: Introduction to Scientific Computing

101 Python in SC 4.6 I/O

4.6 I/O

4.6.1 File I/O with arrays; plain ASCII format

• Plain text output to file (just dump repr(array)):

a = linspace(1, 21, 21); a.shape = (3,7)   # 21 elements -> 3x7 matrix
# In the Sage Notebook, use the variable DATA, which holds the current
# working directory name for the current worksheet
file = open(DATA+'tmp.dat', 'w')
file.write('Here is an array a:\n')
file.write(repr(a))   # dump string representation of a
file.close()

(If you need the objects in a different worksheet, use the directory name that was stored in the variable DATA of the original worksheet...)

Page 102: Introduction to Scientific Computing

102 Python in SC 4.6 I/O

• Plain text input (just take eval on input line):

file = open(DATA+’tmp.dat’, ’r’)

file.readline() # load the first line (a comment)

b = eval(file.read())

file.close()

Page 103: Introduction to Scientific Computing

103 Python in SC 4.6 I/O

4.6.2 File I/O with arrays; binary pickling

• Dump (serialized) arrays with cPickle:

# a1 and a2 are two arrays

import cPickle

file = open(DATA+’tmp.dat’, ’wb’)

file.write(’This is the array a1:\n’)

cPickle.dump(a1, file)

file.write(’Here is another array a2:\n’)

cPickle.dump(a2, file)

file.close()

Page 104: Introduction to Scientific Computing

104 Python in SC 4.6 I/O

Read in the arrays again (in the correct order):

file = open(DATA+'tmp.dat', 'rb')

file.readline() # swallow the initial comment line

b1 = cPickle.load(file)

file.readline() # swallow next comment line

b2 = cPickle.load(file)

file.close()

Almost any Python object x can be saved in compressed form to disk using save(x, filename) (or in many cases x.save(filename)):

A = matrix(RR, 10, range(100))
save(A, 'A')
B = load('A')

Page 105: Introduction to Scientific Computing

105 Python in SC 4.7 SciPy

4.7 SciPy

4.7.1 Overview

• SciPy is a comprehensive package (by Eric Jones, Travis Oliphant, Pearu Peterson) for scientific computing with Python

• Much overlap with ScientificPython

• SciPy interfaces many classical Fortran packages from Netlib (QUADPACK, ODEPACK, MINPACK, ...)

• Functionality: special functions, linear algebra, numerical integration, ODEs, random variables and statistics, optimization, root finding, interpolation, ...

• May require some installation effort (uses ATLAS)

See SciPy homepage (http://www.scipy.org)

Page 106: Introduction to Scientific Computing

106 Systems of LE 5.1 Systems of Linear Equations

5 Solving Systems of Linear Equations

5.1 Systems of Linear Equations

System of linear equations:

a11x1 +a12x2 + ...+a1nxn = b1

a21x1 +a22x2 + ...+a2nxn = b2

. . . . . . . . . . . . . .

am1x1 +am2x2 + ...+amnxn = bm

Page 107: Introduction to Scientific Computing

107 Systems of LE 5.1 Systems of Linear Equations

Matrix form: A – given matrix, b – given vector, x – vector of unknowns:

[ a11 a12 ... a1n ] [ x1 ]   [ b1 ]
[ a21 a22 ... a2n ] [ x2 ] = [ b2 ]
[ ...         ... ] [ ...]   [ ...]
[ am1 am2 ... amn ] [ xn ]   [ bm ]

or Ax = b    (8)

Suppose,

• m > n – overdetermined system. Does this system have a solution? A system with more equations than unknowns usually has no solution.

• m < n – underdetermined system. How many solutions does it have? A system with fewer equations than unknowns usually has infinitely many solutions.

• m = n – what about the solution in this case? A system with the same number of equations and unknowns usually has a single unique solution ← we will deal only with this case now.

Page 108: Introduction to Scientific Computing

108 Systems of LE 5.2 Classification

5.2 Classification

Two main types of systems of linear equations: systems with

• full matrix

– most of the values are nonzero

– how to store it?storage in a 2D array

• sparse matrix

– most of the matrix values are zero

– How to store such matrices?storing in a full matrix system would be waste of memory

∗ different sparse matrix storage schemes

Quite different strategies for solution of systems with full or sparse matrices

Page 109: Introduction to Scientific Computing

109 Systems of LE 5.2 Classification

5.2.1 Problem Transformation

• Common strategy is to modify the problem (8) such that

– the solution remains the same

– modified problem – more easy to solve

What kind of transformations do not change the solution?

• It is possible to multiply both sides of equation (8) by an arbitrary nonsingular matrix M without changing the solution.

– To check this, notice that the solution of MAz = Mb is

z = (MA)⁻¹Mb = A⁻¹M⁻¹Mb = A⁻¹b = x.

Page 110: Introduction to Scientific Computing

110 Systems of LE 5.2 Classification

– For example, M = D – diagonal matrix,

– or M = P permutation matrix

NB! Although theoretically the multiplication of (8) by a nonsingular matrix M does not change the solution, we will see later that it may change the numerical solution process and the accuracy of the solution...

The next question we ask: what type of systems are easy to solve?

Page 111: Introduction to Scientific Computing

111 Systems of LE 5.3 Triangular linear systems

5.3 Triangular linear systems

If the system matrix A has a row i with a nonzero only on the diagonal, it is easy to calculate xi = bi/aii; if there is now a row j where, apart from the diagonal ajj ≠ 0, the only nonzero is at position aji, we find that xj = (bj − aji·xi)/ajj; and again, if there exists a row k such that akk ≠ 0 and akl = 0 for l ≠ i, j, we get xk = (bk − aki·xi − akj·xj)/akk, etc.

• Such systems are easy to solve

• They are called triangular systems.

By rearranging rows and unknowns (columns) it is possible to transform the system into Lower Triangular form L or Upper Triangular form U:

    [ l11  0  ...  0  ]        [ u11 u12 ... u1n ]
L = [ l21 l22 ...  0  ],   U = [  0  u22 ... u2n ]
    [ ...         ... ]        [ ...         ... ]
    [ ln1 ln2 ... lnn ]        [  0   0  ... unn ]

Page 112: Introduction to Scientific Computing

112 Systems of LE 5.3 Triangular linear systems

    [ l11  0  ...  0  ]        [ u11 u12 ... u1n ]
L = [ l21 l22 ...  0  ],   U = [  0  u22 ... u2n ]
    [ ...         ... ]        [ ...         ... ]
    [ ln1 ln2 ... lnn ]        [  0   0  ... unn ]

Solving the system Lx = b is called Forward Substitution:

x1 = b1/l11
xi = ( bi − Σ_{j=1}^{i−1} lij·xj ) / lii,  i = 2, ..., n

Solving the system Ux = b is called Back Substitution:

xn = bn/unn
xi = ( bi − Σ_{j=i+1}^{n} uij·xj ) / uii,  i = n−1, ..., 1
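A direct Python/NumPy sketch of the two substitutions (not from the slides; the function names are ours):

from numpy import zeros, dot

def forward_substitution(L, b):
    # solve L x = b for a lower triangular L
    n = len(b); x = zeros(n)
    for i in range(n):
        x[i] = (b[i] - dot(L[i, :i], x[:i])) / L[i, i]
    return x

def back_substitution(U, b):
    # solve U x = b for an upper triangular U
    n = len(b); x = zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - dot(U[i, i+1:], x[i+1:])) / U[i, i]
    return x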

But how to transform an arbitrary matrix to a triangular form?

Page 113: Introduction to Scientific Computing

113 Systems of LE 5.4 Elementary Elimination Matrices

5.4 Elementary Elimination Matrices

For a1 ≠ 0:

[   1     0 ] [ a1 ]   [ a1 ]
[ -a2/a1  1 ] [ a2 ] = [  0 ].

In the general case, if a = [a1, a2, ..., an]^T and ak ≠ 0, the matrix Mk is the identity except for column k, which below the diagonal contains −m_{k+1}, ..., −m_n, so that

Mk a = Mk [a1, ..., ak, a_{k+1}, ..., an]^T = [a1, ..., ak, 0, ..., 0]^T,

where mi = ai/ak, i = k+1, ..., n.

Page 114: Introduction to Scientific Computing

114 Systems of LE 5.4 Elementary Elimination Matrices

The divisor ak is called the pivot (in Estonian: juhtelement). The matrix Mk is also called an elementary elimination matrix or a Gauss transformation.

1. Mk is nonsingular ⇐ Why? Because it is lower triangular with unit diagonal.

2. Mk = I − m·ek^T, where m = [0, ..., 0, m_{k+1}, ..., m_n]^T and ek is column k of the unit matrix.

3. Lk :=(def) Mk⁻¹ = I + m·ek^T.

4. If Mj, j > k, is some other elementary elimination matrix with multiplier vector t, then

Mk Mj = I − m·ek^T − t·ej^T + m·ek^T·t·ej^T = I − m·ek^T − t·ej^T,

because ek^T·t = 0.

Page 115: Introduction to Scientific Computing

115 Systems of LE 5.5 Gauss Elimination and LU Factorisation

5.5 Gauss Elimination and LU Factorisation

• Apply a series of Gauss elimination matrices from the left: M1, M2, ..., M_{n−1}, taking M = M_{n−1}···M1

– we get the linear system

MAx = M_{n−1}···M1 Ax = M_{n−1}···M1 b = Mb

– with an upper triangular matrix ⇒

∗ easy to solve.

The process is called the Gauss Elimination Method (GEM)

Page 116: Introduction to Scientific Computing

116 Systems of LE 5.5 Gauss Elimination and LU Factorisation

• Denoting U = MA and L = M⁻¹, we get that

L = M⁻¹ = (M_{n−1}···M1)⁻¹ = M1⁻¹···M_{n−1}⁻¹ = L1···L_{n−1}

is unit lower triangular (ones on the diagonal)

• ⇒ A = LU.

Expressed in an algorithm:

Page 117: Introduction to Scientific Computing

117 Systems of LE 5.5 Gauss Elimination and LU Factorisation

Algorithm 5.1. LU-factorisation using the Gauss elimination method (GEM)

    do k=1,...,n-1                  # cycle over matrix columns
      if a_kk == 0 then stop        # stop in case pivot == 0
      do i=k+1,n
        m_ik = a_ik / a_kk          # multiplier calculation in column k
      enddo
      do i=k+1,n
        do j=k+1,n                  # applying transformations to
          a_ij = a_ij - m_ik*a_kj   # the rest of the matrix
        enddo
      enddo
    enddo

NB! In a practical implementation: for storing m_ik use the corresponding elements of A (they will be zeroes anyway)
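A short NumPy sketch of Algorithm 5.1 (my own illustration), overwriting A with U above the diagonal and with the multipliers m_ik below it:

    import numpy as np

    def lu_gem(A):
        """In-place LU factorisation by GEM (no pivoting).
        Returns A with U in the upper triangle and the multipliers below it."""
        A = A.astype(float).copy()
        n = A.shape[0]
        for k in range(n - 1):            # cycle over matrix columns
            if A[k, k] == 0.0:
                raise ZeroDivisionError("zero pivot encountered")
            A[k+1:, k] /= A[k, k]         # multipliers stored in place of the zeros
            A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])   # update the rest
        return A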


5.6 Number of operations in GEM

Finding operation counts for Alg. 5.1:

• Replace loops with corresponding sums over number of particular operations:

$$\sum_{i=1}^{n-1}\Big(\sum_{j=i+1}^{n} 1 + \sum_{j=i+1}^{n}\sum_{k=i+1}^{n} 2\Big) = \sum_{i=1}^{n-1}\big((n-i) + 2(n-i)^2\big) = \frac{2}{3}n^3 + O(n^2)$$

• used that $\sum_{i=1}^{m} i^k = m^{k+1}/(k+1) + O(m^k)$ (which is enough for finding the number of operations of the highest order)

• How many operations are there for solving a triangular system? The number of operations for forward and backward substitution with L and U is $O(n^2)$


• $\Longrightarrow$ the whole solution of the system $Ax = b$ takes $\frac{2}{3}n^3 + O(n^2)$ operations

5.7 GEM with row permutations

• If pivot == 0 GEM won’t work

• Row permutations or partial pivoting may help

• For numerical stability, also the pivot must not be small

Example 5.1

• Consider matrix

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

– non-singular

– but LU-factorisation impossible without row permutations


• On the contrary, the matrix
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

– has the LU-factorisation
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} = LU$$

– But what is wrong with matrix A? A is actually a singular matrix!


Example 5.2. Small pivots

• Consider

$$A = \begin{pmatrix} \varepsilon & 1 \\ 1 & 1 \end{pmatrix},$$

with $\varepsilon$ such that $0 < \varepsilon < \varepsilon_{mach}$ in the given floating point system

– (i.e. $1 + \varepsilon = 1$ in floating point arithmetic)

– Without row permutation we get (in floating-point arithmetic):
$$M = \begin{pmatrix} 1 & 0 \\ -1/\varepsilon & 1 \end{pmatrix} \Longrightarrow L = \begin{pmatrix} 1 & 0 \\ 1/\varepsilon & 1 \end{pmatrix},\quad U = \begin{pmatrix} \varepsilon & 1 \\ 0 & 1 - 1/\varepsilon \end{pmatrix} = \begin{pmatrix} \varepsilon & 1 \\ 0 & -1/\varepsilon \end{pmatrix}$$

• But then
$$LU = \begin{pmatrix} 1 & 0 \\ 1/\varepsilon & 1 \end{pmatrix}\begin{pmatrix} \varepsilon & 1 \\ 0 & -1/\varepsilon \end{pmatrix} = \begin{pmatrix} \varepsilon & 1 \\ 1 & 0 \end{pmatrix} \neq A$$


Using row permutation

• the pivot is 1;

• the multiplier is $-\varepsilon$ $\Longrightarrow$
$$M = \begin{pmatrix} 1 & 0 \\ -\varepsilon & 1 \end{pmatrix} \Longrightarrow L = \begin{pmatrix} 1 & 0 \\ \varepsilon & 1 \end{pmatrix},\quad U = \begin{pmatrix} 1 & 1 \\ 0 & 1-\varepsilon \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
in floating point arithmetic

• $\Longrightarrow$
$$LU = \begin{pmatrix} 1 & 0 \\ \varepsilon & 1 \end{pmatrix}\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \varepsilon & 1 \end{pmatrix} \;\leftarrow\; \text{OK!}$$


Algorithm 5.2. LU-factorisation with GEM using row permutations

    do k=1,...,n-1                        # cycle over matrix columns
      Find index p such that:             # looking for the best pivot
        |a_pk| >= |a_ik|, k <= i <= n     # in the given column
      if p != k then interchange rows k and p
      if a_kk == 0 then continue with next k    # skip such a column
      do i=k+1,n
        m_ik = a_ik / a_kk                # multiplier calculation in column k
      enddo
      do i=k+1,n
        do j=k+1,n                        # transformation application
          a_ij = a_ij - m_ik*a_kj         # to the rest of the matrix
        enddo
      enddo
    enddo


As a result, $MA = U$, where U is upper-triangular. OK so far? But actually
$$M = M_{n-1}P_{n-1} \cdots M_1 P_1$$

• Is $M^{-1}$ still lower-triangular? It is not lower-triangular any more, although it is still denoted by L

– but is it triangular? We still have a triangular L

– knowing the permutations $P = P_{n-1} \cdots P_1$ in advance would give
$$PA = LU,$$
where L is indeed a lower triangular matrix


• But do we really need to perform the row exchanges explicitly? Instead of row exchanges we can just use an appropriate mapping of matrix (and vector) indexes:

– We start with the identity index mapping p = [1,2,3,4,...,n]

– If rows i and j need to be exchanged, we exchange the corresponding values p[i] and p[j]

– In the algorithm, take everywhere $a_{p[i]j}$ instead of $a_{ij}$ (and index the other arrays correspondingly)

How does the whole algorithm for solving the system Ax = b (8) look now?

• Solve the lower triangular system Ly = Pb with forward substitution

• Solve the upper triangular system Ux = y with backward substitution

The term partial pivoting comes from the fact that we seek the best pivot only in the current column of the matrix (starting from the diagonal and going down).


• Complete pivoting – the best pivot is chosen from the whole remaining part ofthe matrix

– This means exchanging both rows and columns of the matrix

PAQ = LU,

where P and Q are permutation matrices.

– The system is solved in three stages: Ly = Pb; Uz = y and x = Qz

Although numerical stability is better in the case of complete pivoting, it is rarely used, because

• more costly

• usually not needed


5.8 Reliability of the LU-factorisation with partial pivoting

Introduce the vector norm:

$$\|x\|_\infty = \max_i |x_i|, \quad x \in \mathbb{R}^n$$

and the corresponding matrix norm:

$$\|A\|_\infty = \sup_{x \in \mathbb{R}^n,\, x \neq 0} \frac{\|Ax\|_\infty}{\|x\|_\infty}.$$

Looking at the rounding errors in GEM, it can be shown that we actually find L and U which satisfy the relation
$$PA = LU - E,$$
where the error E can be estimated by
$$\|E\|_\infty \leq n \varepsilon \|L\|_\infty \|U\|_\infty \qquad (9)$$


where ε is the machine epsilon. In practice we replace the system PAx = Pb with the system LUx = Pb $\Longrightarrow$ the system we are solving is actually
$$(PA + E)x = Pb$$
for finding the approximate solution x.

• How far is it from the real solution?

From matrix perturbation theory: let us solve the system
$$Ax = b \qquad (10)$$
where $A \in \mathbb{R}^{n\times n}$ and $b \in \mathbb{R}^n$ are given and $x \in \mathbb{R}^n$ is unknown. Suppose A is given with an error: $A + \delta A$. The perturbed solution satisfies the system of linear equations
$$(A + \delta A)(x + \delta x) = b. \qquad (11)$$


Theorem 5.1. Let A be nonsingular and δA be sufficiently small, such that

$$\|\delta A\|_\infty \|A^{-1}\|_\infty \leq \frac{1}{2}. \qquad (12)$$

Then $(A + \delta A)$ is nonsingular and

$$\frac{\|\delta x\|_\infty}{\|x\|_\infty} \leq 2\,\kappa(A)\,\frac{\|\delta A\|_\infty}{\|A\|_\infty}, \qquad (13)$$

where $\kappa(A) = \|A\|_\infty \|A^{-1}\|_\infty$ is the condition number.

Proof of the theorem (http://www.ut.ee/~eero/SC/konspekt/perturb-toest/perturb-proof.pdf); (EST) (http://www.ut.ee/~eero/SC/konspekt/perturb-toest/perturb-toest.pdf)


Remarks

1. The result is true for an arbitrary matrix norm derived from the corresponding vector norm

2. Theorem 5.1 says that in the case of a small condition number, a small relative error in the matrix A causes (small or large?) only a small error in the solution x

3. If the condition number is big, what can happen? Everything can happen

4. It is not simple to calculate the condition number, but it can still be estimated

Combining the result (9) with Theorem 5.1, we see that the GEM forward error can be estimated as follows:
$$\frac{\|\delta x\|_\infty}{\|x\|_\infty} \leq 2n\varepsilon\,\kappa(A)\,G,$$
where the coefficient $G = \frac{\|L\|_\infty \|U\|_\infty}{\|A\|_\infty}$ is called the growth factor


Conclusions

• If G is not large, then a well-conditioned matrix gives a fairly good answer (i.e. error O(nε)).

• In the case of partial pivoting, the elements of the matrix L are ≤ 1 in absolute value.

– Nevertheless, there exist examples where, with a well-conditioned matrix A, the elements of U are exponentially large in comparison with the elements of A.

– But these examples are more like "academic" ones; in practice one rarely finds such matrices (and the method can be used without any fear)


6 BLAS (Basic Linear Algebra Subroutines)

6.1 Motivation

How to optimise programs that use a lot of linear algebra operations?

• Efficiency depends on

– processor speed

– number of arithmeticoperations

• but also on:

– the speed of memoryreferences

• Hierarchical memory structure:


[Figure: memory hierarchy, from bottom to top: tape/CD/DVD/network storage, hard disk, RAM, cache, registers; the bottom is slow, large and cheap, the top is fast, small and expensive]

• Where are arithmetic operations performed in the picture? Useful arithmetic operations happen only at the top of the hierarchy

• What is the direction of data movement before/after an arithmetic operation? Data needs to be moved up before the operations; afterwards the data is moved down the hierarchy

• Is information movement faster at the top or at the bottom? It is faster at the top

• What is faster, arithmetic operations or data movement? As a rule, arithmetic operations are faster than data movement


Consider an arbitrary algorithm. Denote:

• f – flops (# arithmetic operations: +, - ×, /)

• m – # memory references

Introduce q = f/m.

Why is this number important? Should we prefer it to be small or large?

• $t_f$ – time spent on one flop; $t_m$ – time for one memory access

Then the calculation time is:
$$f \cdot t_f + m \cdot t_m = f \cdot t_f \left(1 + \frac{1}{q}\,\frac{t_m}{t_f}\right)$$

In general $t_m \gg t_f$, and therefore the total time reflects the processor speed only if q is... large or small? Large.


Example. Gauss elimination method – for each i – key operations:

A(i+1 : n, i) = A(i+1 : n, i)/A(i, i), (14)

A(i+1 : n, i+1 : n) = A(i+1 : n, i+1 : n)−A(i+1 : n, i)∗A(i, i+1 : n). (15)

• Operation (14) represents the following general operation:
$$y = ax + y, \quad x, y \in \mathbb{R}^n,\ a \in \mathbb{R} \qquad (16)$$

– Operation (16) is called saxpy ((single-precision) a times x plus y)

• Operation (15) represents:
$$A = A - vw^T, \quad A \in \mathbb{R}^{n\times n},\ v, w \in \mathbb{R}^n \qquad (17)$$

– (17) is a rank-1 update of matrix A (the matrix $vw^T$ has rank 1. Why? Because each row of it is a multiple of the vector w)


Operation (16) analysis:

• m = 3n+1 memory references:

– 2n+1 reads

∗ vectors x, y

∗ scalar a

– n writes

∗ new y

• Computations take f = 2n flops

• $\Longrightarrow$ q = 2/3 + O(1/n) ≈ 2/3 for large n

Operation (17) analysis:

• $m = 2n^2 + 2n$ memory references:

– $n^2 + 2n$ reads

– $n^2$ writes

• Computations: $f = 2n^2$ flops

• $\Longrightarrow$ q = 1 + O(1/n) ≈ 1 for large n

(16) is a 1st order operation (O(n) flops); (17) is a 2nd order operation ($O(n^2)$ flops). Note that the coefficient q is O(1) in both cases


Faster results are obtained with 3rd order operations ($O(n^3)$ operations on $O(n^2)$ memory references). For example, matrix multiplication:
$$C = AB + C, \quad A, B, C \in \mathbb{R}^{n\times n}. \qquad (18)$$
Here $m = 4n^2$ and $f = n^2(2n-1) + n^2 = 2n^3$ (check it!) $\Longrightarrow q = n/2 \to \infty$ as $n \to \infty$. This operation can keep the processor working near peak performance, given good algorithm scheduling!
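A rough way to see this effect in practice (my own sketch, not from the course materials): compare a hand-written triple loop with NumPy's `@`, which calls an optimised BLAS-3 routine underneath.

    import time
    import numpy as np

    n = 200
    A, B, C = (np.random.rand(n, n) for _ in range(3))

    t0 = time.perf_counter()
    C1 = C.copy()
    for i in range(n):              # naive triple loop: poor memory reuse
        for j in range(n):
            s = 0.0
            for k in range(n):
                s += A[i, k] * B[k, j]
            C1[i, j] += s
    t_loop = time.perf_counter() - t0

    t0 = time.perf_counter()
    C2 = A @ B + C                  # BLAS-3 matrix multiply, operation (18)
    t_blas = time.perf_counter() - t0

    print(f"loops: {t_loop:.2f}s, BLAS-3: {t_blas:.4f}s, same: {np.allclose(C1, C2)}")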


6.2 BLAS implementations

• BLAS – standard library for simple 1st, 2nd and 3rd order operations

– BLAS – freeware, available for example from netlib (http://www.netlib.org/blas/)

– Processor vendors often supply their own implementation

– BLAS ATLAS implementation ATLAS (http://math-atlas.sourceforge.net/) – self-optimising code

– OpenBLAS - supporting x86, x86-64, MIPS and ARM processors.

Example of using BLAS (fortran90):

• LU factorisation using BLAS3 operations (http://www.ut.ee/~eero/SC/konspekt/Naited/lu1blas3.f90.html)

• main program for testing different BLAS levels (http://www.ut.ee/~eero/SC/konspekt/Naited/testblas3.f90.html)


7 Numerical Solution of Differential Equations

Differential equation – mathematical equation for an unknown function of oneor several variables that relates the values of the function itself and its derivatives ofvarious orders

Example: the velocity of a ball falling through the air, considering only gravityand air resistance.

Order of Differential Equation – the highest derivative of the dependent variablewith respect to the independent variable

7.1 Ordinary Differential Equations (ODE)

Ordinary Differential Equation (ODE) is a differential equation in which theunknown function (also known as the dependent variable) is a function of a singleindependent variable


Initial value problem

$$y'(t) = f(t, y(t)), \qquad y(t_0) = y_0,$$
where $f : [t_0, \infty) \times \mathbb{R}^d \to \mathbb{R}^d$ and $y_0 \in \mathbb{R}^d$ is the initial condition

• (Boundary value problem: the solution is prescribed at more than one point (on the boundaries))

We consider here only first-order ODEs

• A higher-order ODE can be converted into a system of first-order ODEs

– Example: $y'' = -y$ can be rewritten as two first-order equations: $y' = z$ and $z' = -y$.

7.1.1 Numerical methods for solving ODEs


Euler method (or forward Euler method)

• finite difference approximation

$$y'(t) \approx \frac{y(t+h) - y(t)}{h} \;\Rightarrow\; y(t+h) \approx y(t) + h\,y'(t) \;\Rightarrow\; y(t+h) \approx y(t) + h\,f(t, y(t))$$

Start with $t_0$, $t_1 = t_0 + h$, $t_2 = t_0 + 2h$, etc.:
$$y_{n+1} = y_n + h\,f(t_n, y_n).$$

• Explicit method – the new value ($y_{n+1}$) depends only on values already known ($y_n$)
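A minimal Python sketch of the forward Euler scheme (illustration only; the test problem y' = -15y, y(0) = 1 is the one from Example 1 below):

    import numpy as np

    def forward_euler(f, t0, y0, h, n_steps):
        """Integrate y'(t) = f(t, y) with the explicit Euler method."""
        t, y = t0, y0
        ts, ys = [t], [y]
        for _ in range(n_steps):
            y = y + h * f(t, y)      # y_{n+1} = y_n + h f(t_n, y_n)
            t = t + h
            ts.append(t)
            ys.append(y)
        return np.array(ts), np.array(ys)

    # stiff test problem y' = -15 y, exact solution exp(-15 t)
    ts, ys = forward_euler(lambda t, y: -15.0 * y, 0.0, 1.0, h=1/8, n_steps=16)
    print(ys[-1], np.exp(-15 * ts[-1]))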


Backward Euler method

• Different finite difference version:

$$y'(t) \approx \frac{y(t) - y(t-h)}{h} \;\Rightarrow\; y_{n+1} = y_n + h\,f(t_{n+1}, y_{n+1})$$

• Implicit method – need to solve an equation to find yn+1!


Comparison of the methods

• Implicit methods computationally more complex

• Explicit methods can be unstable – in case of stiff equations

stiff equation – differential equation for which certain numerical methods for solvingthe equation are numerically unstable, unless the step size is taken to be extremelysmall


Example 1

Initial value problem: $y'(t) = -15y(t)$, $t \geq 0$, $y(0) = 1$. Exact solution: $y(t) = e^{-15t} \Rightarrow y(t) \to 0$ as $t \to \infty$.

Explicit schemes with h = 1/4 and h = 1/8; Adams-Moulton scheme (Trapezoidal method):
$$y_{n+1} = y_n + \tfrac{1}{2}h\,\big(f(t_n, y_n) + f(t_{n+1}, y_{n+1})\big)$$


Example 2

Partial differential equation (see below): the wave equation in (1D and) 2D

Wave equation
$$-\left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right) + \frac{\partial^2 u}{\partial t^2} = f(x,y,t),$$
where

• $u(x,y,t)$ – height of a surface (e.g. water level) at point $(x,y)$ at time t

• $f(x,y,t)$ – external force applied to the surface at time t (for simplicity here $f(x,y,t) = 0$)

• we solve on the domain $(x,y) \in \Omega = [0,1]\times[0,1]$ for time $t \in [0,T]$

• Dirichlet boundary conditions $u(x,y,t) = 0$ for $(x,y) \in \partial\Omega$, together with initial values of u and of the derivative $\frac{\partial u}{\partial t}\big|_{t=0}$ for $(x,y) \in \Omega$


Some examples on comparison of explicit vs implicit schemes (http://courses.cs.ut.ee/2009/sc/Main/FDMschemes)

1D wave equation failing with larger h value (http://www.ut.ee/~eero/SC/1DWaveEqExpl-failure.avi)


7.2 Partial Differential Equations (PDE)

PDE overviewExamples of PDE-s:

• Laplace’s equation

– important in many fields of science,

∗ electromagnetism

∗ astronomy

∗ fluid dynamics

– behaviour of electric, gravitational, and fluid potentials

– The general theory of solutions to Laplace’s equation – potential theory

– In the study of heat conduction, the Laplace equation – the steady-stateheat equation


• Maxwell’s equations – electrical and magnetical fields’ relationships– set of four partial differential equations

– describe the properties of the electric and magnetic fields and relate themto their sources, charge density and current density

• Navier-Stokes equations – fluid dynamics (dependencies between pressure,speed of fluid particles and fluid viscosity)

• Equations of linear elasticity – vibrations in elastic materials with given prop-erties and in case of compression and stretching out

• Schrödinger equations – quantum mechanics – how the quantum state of aphysical system changes in time. It is as central to quantum mechanics as New-ton’s laws are to classical mechanics

• Einstein field equations – set of ten equations in Einstein’s theory of generalrelativity – describe the fundamental interaction of gravitation as a result ofspacetime being curved by matter and energy.


7.3 2nd order PDEs

We consider now only the single-equation case. In many practical cases, 2nd order PDEs occur, for example:

• Heat equation: $u_t = u_{xx}$   • Wave equation: $u_{tt} = u_{xx}$

• Laplace's equation: $u_{xx} + u_{yy} = 0$.

A general second order PDE has the (canonical) form:
$$a u_{xx} + b u_{xy} + c u_{yy} + d u_x + e u_y + f u + g = 0.$$

Assuming not all of a, b and c are zero, then depending on the discriminant $b^2 - 4ac$:

$b^2 - 4ac > 0$: hyperbolic equation, typical representative – the wave equation;

$b^2 - 4ac = 0$: parabolic equation, typical representative – the heat equation;

$b^2 - 4ac < 0$: elliptic equation, typical representative – the Poisson equation.


• In case of changing coefficient in time, equations can change their type

• In case of equation systems, each equation can be of different type

• Of course, problem can be non-linear or higher order as well

In general,

• Hyperbolic PDE-s describe time-dependent conservative physical processes likewave propagation

• Parabolic PDE-s describe time-dependent dissipative (or scattering) physicalprocesses like diffusion, which move towards some fixed-point

• Elliptic PDE-s describe systems that have reached a fixed-point and are there-fore independent of time


7.4 Time-independent PDE-s

7.4.1 Finite Difference Method (FDM)

• Discrete mesh in solving region

• Derivatives replaced with approximation by finite differences

Example. Consider the Poisson equation in 2D:
$$-u_{xx} - u_{yy} = f, \quad 0 \leq x \leq 1,\ 0 \leq y \leq 1, \qquad (19)$$

• boundary values as on the figure on the left:


[Figure: unit square domain with the given boundary values (left) and the same domain with the discretisation nodes marked (right)]

• Define discrete nodes as on the figure on the right

• Inner nodes, where the computations are carried out, are defined by
$$(x_i, y_j) = (ih, jh), \quad i, j = 1, \ldots, n$$

– (in our case n = 2 and h = 1/(n+1) = 1/3)


Consider here the case f = 0. Replacing the 2nd order derivatives with standard 2nd order differences at the grid points, we get
$$\frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2} + \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{h^2} = 0, \quad i, j = 1, \ldots, n,$$
where $u_{i,j}$ is an approximation of the real solution $u = u(x_i, y_j)$ at the point $(x_i, y_j)$, and equals the given boundary value if i or j is 0 or n+1. As a result we get:

$$4u_{1,1} - u_{0,1} - u_{2,1} - u_{1,0} - u_{1,2} = 0$$
$$4u_{2,1} - u_{1,1} - u_{3,1} - u_{2,0} - u_{2,2} = 0$$
$$4u_{1,2} - u_{0,2} - u_{2,2} - u_{1,1} - u_{1,3} = 0$$
$$4u_{2,2} - u_{1,2} - u_{3,2} - u_{2,1} - u_{2,3} = 0.$$

In matrix form:


$$
Ax = \begin{pmatrix} 4 & -1 & -1 & 0 \\ -1 & 4 & 0 & -1 \\ -1 & 0 & 4 & -1 \\ 0 & -1 & -1 & 4 \end{pmatrix}
\begin{pmatrix} u_{1,1} \\ u_{2,1} \\ u_{1,2} \\ u_{2,2} \end{pmatrix}
=
\begin{pmatrix} u_{0,1} + u_{1,0} \\ u_{3,1} + u_{2,0} \\ u_{0,2} + u_{1,3} \\ u_{3,2} + u_{2,3} \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix} = b.
$$

This positive definite system can be solved directly with Cholesky factorisation (Gauss elimination for a symmetric matrix, where the factorisation $A = L^T L$ is found) or iteratively. The exact solution of the problem is:

$$x = \begin{pmatrix} u_{1,1} \\ u_{2,1} \\ u_{1,2} \\ u_{2,2} \end{pmatrix} = \begin{pmatrix} 0.125 \\ 0.125 \\ 0.375 \\ 0.375 \end{pmatrix}.$$
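A quick NumPy check of this small system (my own sketch):

    import numpy as np

    A = np.array([[ 4, -1, -1,  0],
                  [-1,  4,  0, -1],
                  [-1,  0,  4, -1],
                  [ 0, -1, -1,  4]], dtype=float)
    b = np.array([0.0, 0.0, 1.0, 1.0])

    x = np.linalg.solve(A, b)   # u11, u21, u12, u22
    print(x)                    # [0.125 0.125 0.375 0.375]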


In the general case, the $n^2 \times n^2$ Laplace matrix has the form
$$
A = \begin{pmatrix}
B & -I & 0 & \cdots & 0 \\
-I & B & -I & \ddots & \vdots \\
0 & -I & B & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & -I \\
0 & \cdots & 0 & -I & B
\end{pmatrix}, \qquad (20)
$$
where the $n \times n$ matrix B has the form
$$
B = \begin{pmatrix}
4 & -1 & 0 & \cdots & 0 \\
-1 & 4 & -1 & \ddots & \vdots \\
0 & -1 & 4 & \ddots & 0 \\
\vdots & \ddots & \ddots & \ddots & -1 \\
0 & \cdots & 0 & -1 & 4
\end{pmatrix}.
$$
It means that most of the elements of matrix A are zero. What are such matrices called? It is a sparse matrix.
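For larger n, matrix (20) is conveniently built as a sparse Kronecker sum; a short SciPy sketch (illustration only, not the course's code):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def laplace_2d(n):
        """n^2 x n^2 five-point Laplacian of form (20), in sparse format."""
        I = sp.identity(n, format="csr")
        T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
        return sp.kron(I, T) + sp.kron(T, I)   # block-tridiagonal with B on the diagonal

    n = 50
    A = laplace_2d(n)
    b = np.ones(n * n)
    u = spla.spsolve(A.tocsc(), b)             # sparse direct solve
    print(A.shape, A.nnz)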


7.4.2 Finite element Method (FEM)

Example: 2D FEM for the Poisson equation. Consider u to be (for example):

• temperature

• electro-magnetic potential

• displacement of an elastic membrane fixed at the boundary, under a transversal load

Define the Laplacian operator ∆ by
$$(\Delta u)(\mathbf{x}) = \left(\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2}\right)(\mathbf{x}), \qquad \mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}$$

Poisson equation:
$$-\Delta u(\mathbf{x}) = f(\mathbf{x}), \quad \forall \mathbf{x} \in \Omega, \qquad u(\mathbf{x}) = g(\mathbf{x}), \quad \forall \mathbf{x} \in \Gamma,$$


Denote by $F = (F_1, F_2)$ a 2-dimensional vector field.

Define two operators: the divergence of a vector function F
$$\mathrm{div}\,F = \frac{\partial F_1}{\partial x} + \frac{\partial F_2}{\partial y}$$
and the gradient of a scalar function f:
$$\nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$$

Denote by $n = (n_x, n_y)$ the outward unit normal to Γ.


Divergence theorem:
$$\int_\Omega \mathrm{div}\,F\,dx = \int_\Gamma F \cdot n\,ds,$$
Here dx is the element of area in $\mathbb{R}^2$ and ds the arc length along Γ. Denote $(u,v) = \int_\Omega u(x)v(x)\,dx$.

Applying the divergence theorem to $F = (vw, 0)$ and $F = (0, vw)$ (details not shown here) gives Green's first identity
$$\int_\Omega \nabla u \cdot \nabla v\,dx = \int_\Gamma v\,\frac{\partial u}{\partial n}\,ds - \int_\Omega v\,\Delta u\,dx \qquad (21)$$
and we come to a variational formulation of the Poisson problem on the space
$$V = \Big\{ v : v \text{ is continuous on } \Omega,\ \frac{\partial v}{\partial x} \text{ and } \frac{\partial v}{\partial y} \text{ are piecewise continuous on } \Omega,\ \text{and } v = 0 \text{ on } \Gamma \Big\}$$


Poisson problem in Variational Formulation: Find u ∈V such that

$$a(u,v) = (f, v), \quad \forall v \in V, \qquad (22)$$
where (in the case of the Poisson equation)
$$a(u,v) = \int_\Omega \nabla u \cdot \nabla v\,dx.$$
Written out, the variational formulation of the Poisson equation is:
$$\int_\Omega \left[\frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y}\right] dx = \int_\Omega f\,v\,dx$$

With discretisation, we replace the space V with $V_h$ – the space of piecewise linear functions. Each function in $V_h$ can be written as
$$v_h(x,y) = \sum_i \eta_i \varphi_i(x,y),$$
where the $\varphi_i(x,y)$ are basis functions (or 'hat' functions).

[Figure: a hat basis function $\varphi_i$ at node $N_i$ and a piecewise linear function $v_h$ on the triangulation T of Ω]

With 2D FEM we demand that the equation in the variational formulation is satisfied for M basis functions $\varphi_i \in V_h$, i.e.
$$\int_\Omega \nabla u_h \cdot \nabla \varphi_i\,dx = \int_\Omega f\,\varphi_i\,dx, \quad i = 1, \ldots, M$$


But we have:

$$u_h = \sum_{j=1}^{M} \xi_j \varphi_j \;\Longrightarrow\; \nabla u_h = \sum_{j=1}^{M} \xi_j \nabla\varphi_j$$

M linear equations with respect to the unknowns $\xi_j$:
$$\int_\Omega \sum_{j=1}^{M} \big(\xi_j \nabla\varphi_j\big) \cdot \nabla\varphi_i\,dx = \int_\Omega f\,\varphi_i\,dx, \quad i = 1, \ldots, M$$

The stiffness matrix $A = (a_{ij})$ elements and the right-hand side $b = (b_i)$ are calculated as:
$$a_{ij} = a(\varphi_i, \varphi_j) = \int_\Omega \nabla\varphi_i \cdot \nabla\varphi_j\,dx, \qquad b_i = (f, \varphi_i) = \int_\Omega f\,\varphi_i\,dx$$


The integrals are computed only where the pairs $\nabla\varphi_i \cdot \nabla\varphi_j$ get in touch (have mutual support).

Example

Two basis functions $\varphi_i$ and $\varphi_j$ for nodes $N_i$ and $N_j$: their common support is $\tau \cup \tau'$, so that
$$a_{ij} = \int_\Omega \nabla\varphi_i \cdot \nabla\varphi_j\,dx = \int_\tau \nabla\varphi_i \cdot \nabla\varphi_j\,dx + \int_{\tau'} \nabla\varphi_i \cdot \nabla\varphi_j\,dx$$

[Figure: two neighbouring triangles τ and τ' sharing the edge $N_i N_j$, with third vertex $N_k$; the hat functions $\varphi_i$ and $\varphi_j$ overlap only on τ ∪ τ']


Element matrices

Consider a single element τ.

Pick two basis functions $\varphi_i$ and $\varphi_j$ (out of three). Each $\varphi_k$ is piecewise linear $\Longrightarrow$ denoting $p_k(x,y) = \varphi_k|_\tau$:
$$p_i(x,y) = \alpha_i x + \beta_i y + \gamma_i, \qquad p_j(x,y) = \alpha_j x + \beta_j y + \gamma_j.$$
$\Longrightarrow \nabla\varphi_i\big|_\tau = \left(\frac{\partial p_i}{\partial x}, \frac{\partial p_i}{\partial y}\right) = (\alpha_i, \beta_i)$ $\Longrightarrow$ for their dot product
$$\int_\tau \nabla\varphi_i \cdot \nabla\varphi_j\,dx = \int_\tau \alpha_i\alpha_j + \beta_i\beta_j\,dx.$$
To find the coefficients α and β, put the three points $(x_i, y_i, 1)$, $(x_j, y_j, 0)$, $(x_k, y_k, 0)$ into the


plane equation and solve the system

$$
D \begin{pmatrix} \alpha_i \\ \beta_i \\ \gamma_i \end{pmatrix}
= \begin{pmatrix} x_i & y_i & 1 \\ x_j & y_j & 1 \\ x_k & y_k & 1 \end{pmatrix}
\begin{pmatrix} \alpha_i \\ \beta_i \\ \gamma_i \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}
$$

The integral is computed by multiplying with the triangle area $\frac{1}{2}|\det D|$:

$$a^\tau_{ij} = \int_\tau \nabla\varphi_i \cdot \nabla\varphi_j\,dx = \big(\alpha_i\alpha_j + \beta_i\beta_j\big)\,\frac{1}{2}\left|\det\begin{pmatrix} x_i & y_i & 1 \\ x_j & y_j & 1 \\ x_k & y_k & 1 \end{pmatrix}\right|$$

The element matrix for τ is
$$A^\tau = \begin{pmatrix} a^\tau_{ii} & a^\tau_{ij} & a^\tau_{ik} \\ a^\tau_{ji} & a^\tau_{jj} & a^\tau_{jk} \\ a^\tau_{ki} & a^\tau_{kj} & a^\tau_{kk} \end{pmatrix}$$


Assembled stiffness matrix

• created by adding appropriately all the element matrices together

• Different types of boundary values used in the problem setup result in slightlydifferent stiffness matrices

• Most typical boundary conditions:

– Dirichlet’

– Neumann

• but also:

– free boundary condition

– Robin boundary condition


Right-hand side

Right-hand side b: $b_i = \int_\Omega f\,\varphi_i\,dx$. The integral is calculated approximately, using the value of f at the midpoint of the triangle τ:
$$f^\tau = f\left(\frac{x_i + x_j + x_k}{3},\ \frac{y_i + y_j + y_k}{3}\right)$$

• the support is the area of the triangle

• $\varphi_i$ is a pyramid (integrating $\varphi_i$ over τ gives the pyramid's volume)
$$b^\tau_i = \int_\tau f^\tau \varphi_i\,dx = \frac{1}{6} f^\tau |\det D^\tau|$$

• An assembling procedure collects the RHS values b from the $b^\tau_i$ (somewhat similar to the matrix assembly, but simpler)

• Introduction of the Dirichlet boundary condition just eliminates the corresponding rows.


EXAMPLE

Consider unit square.

$V_h$ – space of piecewise linear functions on Ω, zero on Γ, with values defined at the inner nodes.

[Figure: uniform triangulation of the unit square]

Discretised problem in variational formulation: Find $u_h \in V_h$ such that
$$a(u_h, v) = (f, v), \quad \forall v \in V_h \qquad (23)$$
where, in the case of the Poisson equation,
$$a(u,v) = \int_\Omega \nabla u \cdot \nabla v\,dx$$


• In FEM the equation (23) needs to be satisfied on a set of test functions $\varphi_i = \varphi_i(\mathbf{x})$,

– which are defined such that
$$\varphi_i(\mathbf{x}) = \begin{cases} 1, & \mathbf{x} = \mathbf{x}_i \\ 0, & \mathbf{x} = \mathbf{x}_j,\ j \neq i \end{cases}$$

• and it is demanded that (23) is satisfied for each $\varphi_i$ (i = 1,...,N)

• As a result, a matrix of the linear equations is obtained

• The resulting matrix is identical to the matrix (20)!


7.5 FEM for more general cases

Poisson equation with a varying coefficient (different materials; varying conductivity, elasticity etc.):
$$-\nabla \cdot (\lambda(x,y)\,\nabla u) = f$$

• assume that λ is constant across each element τ: $\lambda(x,y) = \lambda^\tau$ for $(x,y) \in \tau$

• $\Longrightarrow$ the element matrices $A^\tau$ get multiplied by $\lambda^\tau$.

• Refinement methods for finiteelements

– static

– adaptive refinement


Other equations

• For higher order and mixed systems of PDEs, more complex finite elements

– higher order finite elements for discretization

– special type (mixed) finite elements

7.6 FEM software

• Input of data, f , g (boundary cond.), Ω, coefficients λ (x,y) etc.

• Triangulation construction

• Computation of element stiffness matrices Aτ

• Assembly of the global stiffness matrix A and load vector b


• Solution of the system of equations Ax = b

• Presentation of the result


7.7 Sparse matrix storage schemes

• As we saw, different discretisation schemes give systems with similar matrix structures

• (In addition to FDM and FEM, other discretisation schemes are often used as well, like the Finite Volume Method, but we do not consider it here)

• In each case, the result is a system of linear equations with a sparse matrix.

How to store sparse matrices?


7.7.1 Triple storage format

• n×m matrix A each nonzero with 3 values: integers i and j and (in most appli-cations) real matrix element ai j. =⇒ three arrays:

indi(1:nz), indj(1:nz), vals(1:nz)of length nz, – number of matrix A nonzeroes

Advantages of the scheme:

• Easy to refer to a particular element

• Freedom to choose the order of the elements

Disadvantages :

• Nontrivial to find, for example, all nonzeroes of a particular row or column andtheir positions


7.7.2 Column-major storage format

For each matrix A column k a vector row_ind(j) – giving row numbers i forwhich ai j 6= 0.

• To store the whole matrix, each column nonzeros

– added into a 1-dimensonal array row_ind(1:nz)

– introduce cptr(1:M) referring to column starts of each column inrow_ind.

row_ind(1:nz), cptr(1:M), vals(1:nz)

Advantages:

• Easy to find matrix column nonzeroes together with their positions


Disadvantages:

• Algorithms become more difficult to read

• Difficult to find nonzeroes in a particular row

7.7.3 Row-major storage format

For each matrix A row k a vector col_ind(i) giving column numbers j forwhich ai j 6= 0.

• To store the whole matrix, each row nonzeros

– added into a 1-dimensonal array col_ind(1:nz)

– introduce rptr(1:N) referring to row starts of each row in col_ind.

col_ind(1:nz), rptr(1:N), vals(1:nz)


Advantages:

• Easy to find matrix row nonzeroes together with their positions

Disadvantages:

• Algorithms become more difficult to read.

• Difficult to find nonzeroes in a particular column.
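These three layouts correspond to the COO, CSC and CSR formats in SciPy; a tiny sketch for illustration (my own example matrix):

    import numpy as np
    import scipy.sparse as sp

    A = np.array([[4., -1., 0.],
                  [-1., 4., -1.],
                  [0., -1., 4.]])

    coo = sp.coo_matrix(A)   # triple format: row, col, data
    csc = coo.tocsc()        # column-major: row indices + column pointers
    csr = coo.tocsr()        # row-major: column indices + row pointers

    print(coo.row, coo.col, coo.data)
    print(csr.indptr, csr.indices)   # row starts and column numbers of the nonzeroes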

7.7.4 Combined schemes

Triple format enhanced with cols(1:nz), cptr(1:M), rows(1:nz),

rptr(1:N). Here cols and rows refer to corresponding matrix A values intriple format. E.g., to access row-major type stuctures, one has to index throughrows(1:nz)

Advantages:


• All operations easy to perform

Disadvantages:

• More memory needed.

• Reference through indexing in all cases


8 Iterative methods

8.1 Problem setup

Iterative methods for solving systems of linear equations with sparse matrices

Consider system of linear equations

Ax = b, (24)

where N×N matrix A

• is sparse,

– number of elements for which Ai j 6= 0 is O(N).

• Typical example: Poisson equation discretisation on n×n mesh, (N = n×n)

– in average 5, nonzeros per A row


In case of direct methods, like LU-factorisation

• memory consumption (together with fill-in): $O(N^2) = O(n^4)$

• flops: $\frac{2}{3}N^3 + O(N^2) = O(n^6)$

Banded matrix LU-decomposition:

• memory consumption (together with fill-in): $O(N \cdot L) = O(n^3)$, where L is the bandwidth

• flops: $\frac{2}{3}N L^2 + O(N \cdot L) = O(n^4)$


8.2 Jacobi Method

• Iterative method for solving (24)

• Given an initial approximation $x^{(0)}$, approximations $x^{(k)}$, k = 1,2,3,..., of the real solution x of (24) are calculated as follows:

– the i-th component of $x^{(k+1)}$, $x^{(k+1)}_i$, is obtained by taking from (24) only the i-th row:
$$A_{i,1}x_1 + \cdots + A_{i,i}x_i + \cdots + A_{i,N}x_N = b_i$$

– solving this with respect to $x_i$, an iterative scheme is obtained:
$$x^{(k+1)}_i = \frac{1}{A_{i,i}}\Big(b_i - \sum_{j\neq i} A_{i,j} x^{(k)}_j\Big) \qquad (25)$$


The calculations are in essence parallel with respect to i – there is no dependence on the other components $x^{(k+1)}_j$, $j \neq i$. An iteration stopping criterion can be taken, for example, as
$$\big\|x^{(k+1)} - x^{(k)}\big\| < \varepsilon \quad \text{or} \quad k+1 \geq k_{max}, \qquad (26)$$

– ε – given error tolerance

– $k_{max}$ – maximal number of iterations

• memory consumption (no fill-in):

– $N_{A\neq 0}$ – the number of nonzeroes of matrix A

• Number of iterations to reduce $\big\|x^{(k)} - x\big\|_2 < \varepsilon \big\|x^{(0)} - x\big\|_2$:
$$\#IT \geq \frac{2\ln\varepsilon^{-1}}{\pi^2}(n+1)^2 = O(n^2)$$


• flops/iteration ≈ 10·N = $O(n^2)$ $\Longrightarrow$
$$\#IT \cdot \frac{\text{flops}}{\text{iteration}} = C n^4 + O(n^3) = O(n^4).$$
The coefficient C in front of $n^4$ is
$$C \approx \frac{2\ln\varepsilon^{-1}}{\pi^2}\cdot 10 \approx 2\ln\varepsilon^{-1}$$

• Is this good or bad? This is not very good at all... We need some better methods, because

– For LU-decomposition (banded matrices) we had C = 2/3
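A compact sketch of the Jacobi iteration (25)-(26) for a sparse matrix (illustration only):

    import numpy as np
    import scipy.sparse as sp

    def jacobi(A, b, eps=1e-8, kmax=10_000):
        """Jacobi iteration for A x = b; A is a sparse matrix with nonzero diagonal."""
        d = A.diagonal()
        R = A - sp.diags(d)               # off-diagonal part of A
        x = np.zeros_like(b)
        for k in range(kmax):
            x_new = (b - R @ x) / d       # formula (25)
            if np.linalg.norm(x_new - x) < eps:   # stopping criterion (26)
                return x_new, k + 1
            x = x_new
        return x, kmax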


8.3 Conjugate Gradient Method (CG)

    Calculate r(0) = b - A x(0) with given starting vector x(0)
    for i = 1, 2, ...
        solve M z(i-1) = r(i-1)          # we assume here that M = I for now
        rho(i-1) = r(i-1)^T z(i-1)
        if i == 1
            p(1) = z(0)
        else
            beta(i-1) = rho(i-1)/rho(i-2)
            p(i) = z(i-1) + beta(i-1) p(i-1)
        endif
        q(i) = A p(i);  alpha(i) = rho(i-1) / (p(i)^T q(i))
        x(i) = x(i-1) + alpha(i) p(i);  r(i) = r(i-1) - alpha(i) q(i)
        check convergence; continue if needed
    end
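The same algorithm as a NumPy sketch with M = I, i.e. no preconditioning (an illustration, not the course's reference implementation):

    import numpy as np

    def cg(A, b, x0=None, eps=1e-8, maxit=1000):
        """Conjugate Gradient method for symmetric positive definite A."""
        x = np.zeros_like(b) if x0 is None else x0.copy()
        r = b - A @ x
        p = r.copy()
        rho = r @ r
        for i in range(maxit):
            q = A @ p
            alpha = rho / (p @ q)
            x += alpha * p
            r -= alpha * q
            rho_new = r @ r
            if np.sqrt(rho_new) < eps * np.linalg.norm(b):  # convergence check
                return x, i + 1
            p = r + (rho_new / rho) * p
            rho = rho_new
        return x, maxit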


• memory consumption (no fill-in):

$$N_{A\neq 0} + O(N) = O(n^2),$$
where $N_{A\neq 0}$ is the number of nonzeroes of A

• Number of iterations to achieve $\big\|x^{(k)} - x\big\|_2 < \varepsilon \big\|x^{(0)} - x\big\|_2$:
$$\#IT \approx \frac{\ln\varepsilon^{-1}}{2}\sqrt{\kappa(A)} = O(n)$$

• Flops/iteration ≈ 24·N = $O(n^2)$ $\Longrightarrow$
$$\#IT \cdot \frac{\text{flops}}{\text{iteration}} = C n^3 + O(n^2) = O(n^3),$$


where $C \approx \frac{1}{2}\ln\varepsilon^{-1}\cdot\sqrt{\kappa_2(A)}$.

$\Longrightarrow$ C depends on the condition number of A! This paves the way for the preconditioning technique.


8.4 Preconditioning

Idea: replace Ax = b with the system $M^{-1}Ax = M^{-1}b$. Apply CG to
$$Bx = c, \qquad (27)$$
where $B = M^{-1}A$ and $c = M^{-1}b$. But how to choose M? The preconditioner $M = M^T$ is to be chosen such that

(i) the problem Mz = r is easy to solve

(ii) the matrix B is better conditioned than A, meaning that $\kappa_2(B) < \kappa_2(A)$


Then
$$\#IT(27) = O\big(\sqrt{\kappa_2(B)}\big) < O\big(\sqrt{\kappa_2(A)}\big) = \#IT(24)$$
but
$$\frac{\text{flops}}{\text{iteration}}(27) = \frac{\text{flops}}{\text{iteration}}(24) + \text{(i)} > \frac{\text{flops}}{\text{iteration}}(24)$$

• $\Longrightarrow$ we need to make a compromise!

• (In the extreme cases M = I or M = A)

• Preconditioned Conjugate Gradient (PCG) Method

– obtained by taking $M \neq I$ in the previous algorithm


8.5 Preconditioner examples

Diagonal Scaling (or Jacobi preconditioner)
$$M = \mathrm{diag}(A)$$

(i) flops/iteration = N

(ii) $\kappa_2(B) = \kappa_2(A)$

⇒ Is this good? No large improvement is to be expected


Incomplete LU-factorisation

$$M = \tilde{L}\tilde{U},$$

• $\tilde{L}$ and $\tilde{U}$ – approximations to the actual factors L and U of the LU-decomposition

– nonzeroes in $\tilde{L}_{ij}$ and $\tilde{U}_{ij}$ only where $A_{ij} \neq 0$ (i.e. fill-in is ignored in the LU-factorisation algorithm)

(i) flops/iteration = O(N)

(ii) $\kappa_2(B) < \kappa_2(A)$

How good is this preconditioner? Some improvement is at least expected!
$$\kappa_2(B) = O(n^2)$$


Gauss-Seidel method

    do k=1,2,...
      do i=1,...,n
$$x^{(k+1)}_i = \frac{1}{A_{i,i}}\Big(b_i - \sum_{j=1}^{i-1} A_{ij}x^{(k+1)}_j - \sum_{j=i+1}^{n} A_{i,j}x^{(k)}_j\Big) \qquad (28)$$
      enddo
    enddo

Note that in a real implementation the method is written like this:

    do k=1,2,...
      do i=1,...,n
$$x_i = \frac{1}{A_{i,i}}\Big(b_i - \sum_{j\neq i} A_{ij}x_j\Big) \qquad (29)$$
      enddo
    enddo

Do you see a problem with this preconditioner (with the PCG method)? The preconditioner is not symmetric, which prevents CG from converging!


Symmetric Gauss-Seidel method

To get a symmetric preconditioner, a backward sweep is added:

    do k=1,2,...
      do i=1,...,n
        x_i = (1/A_ii) (b_i - sum_{j!=i} A_ij x_j)
      enddo
    enddo

    do k=1,2,...
      do i=n,...,1
        x_i = (1/A_ii) (b_i - sum_{j!=i} A_ij x_j)
      enddo
    enddo


9 Some examples

9.1 Google PageRank problem

9.1.1 Overview

The WWW is a huge collection of data distributed around the globe, in constant change and growth.

Number of pages indexed by Google (estimate):

May-June 2000: 1 billion
November-December 2000: 1.3 billion
July-August 2002: 2.5 billion
November-December 2002: 4 billion
January-February 2004: 4.28 billion
November-December 2004: 8 billion
August 2005: 8.2 billion
January 2007: 14 billion

The number was roughly doubling every 16 months.


• According to http://www.livescience.com/54094-how-big-is-the-internet.html

– 4.66 billion Web pages online as of mid-March 2016

– but this is only searchable web!

• Need really good tools for navigating, searching, indexing the information

(According to some estimates, number of pages indexed by search engines has notgrown very much last years)


What does the Internet look like?

Maps of the Internet (http://www.opte.org/maps/)

OK, these are just servers. Imagine what the WWW itself would look like!


9.1.2 PageRank Problem Description

Original proposal of the PageRank algorithm by L. Page, S. Brin, R. Motwani andT. Winograd, 1998

• one of the reasons why Google is so effective

• a method for computing the relative rank of web pages

• based on web link structure

• has become a natural part of modern search engines

• Also, a useful tool applied in many other search technologies, for example

– Web spam detection [Z.Gyöngyi et al 2004]

– crawler configuration

– P2P trust networks [S.D.Kamvar et al 2003]


9.1.3 Markov process

Surfing the web, going from page to page by randomly choosing an outgoing link

• can lead to dead ends (dangling nodes)

• cycles

Sometimes choosing simply a random page from the Web.

Markov chain or Markov process

The limiting probability that an infinitely dedicated random surfer visits anyparticular page is its PageRank


9.1.4 Mathematical formulation of PageRank problem

Problem setup

W - set of web pages reachable in a chain following hyperlinks from a root page

G - corresponding n×n connectivity matrix:

$$g_{ij} = \begin{cases} 1 & \text{if there is a hyperlink } i \leftarrow j \\ 0 & \text{otherwise.} \end{cases}$$

• G can be huge, is sparse, column j shows the links on jth page

• # nonzeros in G - the total number of hyperlinks in W


Let ri and c j be the row and column sums of G:

$$r_i = \sum_j g_{ij}, \qquad c_j = \sum_i g_{ij}.$$

• ri - in-degree of the ith page

• c j - out-degree of the jth page.

Let p - the probability that the random walk follows a link.

• A typical value is p = 0.85

• 1− p is the probability that some arbitrary page is chosen

• δ = (1− p)/n - probability that a particular random page is chosen.


Let B be the n×n matrix with elements bi j:

$$b_{ij} = \begin{cases} p\,g_{ij}/c_j + \delta & : c_j \neq 0 \\ 1/n & : c_j = 0 \end{cases}$$

Notice that:

• B is not sparse

• most of the values = δ (the probability of jumping from one page to anotherwithout following link)

• If n = 4 ·109 and p = 0.85, then δ = 3.75 ·10−11

• B - the transition probability matrix of the Markov chain

• 0 < bi j < 1

• ∑ni=1 bi j = 1, ∀ j


Matrix theory: the Perron-Frobenius theorem applies; there exists a unique (up to a scaling factor) solution $x \neq 0$ of the equation
$$x = Bx.$$
If the scaling factor is chosen such that $\sum_i x_i = 1$, then x is the state vector of the Markov chain and is Google's PageRank; $0 < x_i < 1$.


9.1.5 Power method

Algorithm: Power method
Input: matrix B, initial vector x, threshold ε
Output: PageRank vector y

    repeat
        x <- B x
    until ||x - B x|| < ε
    y <- x / ||x||

In practice, the matrix B (or G) is never formed explicitly.
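A small Python sketch of this power iteration (my own illustration with a made-up tiny link matrix; it forms B explicitly, which is only feasible for toy sizes):

    import numpy as np

    def pagerank_power(G, p=0.85, eps=1e-10):
        """Power iteration on the PageRank matrix built from connectivity matrix G."""
        n = G.shape[0]
        c = G.sum(axis=0)                     # out-degrees (column sums)
        B = np.empty((n, n))
        for j in range(n):
            if c[j] != 0:
                B[:, j] = p * G[:, j] / c[j] + (1 - p) / n
            else:
                B[:, j] = 1.0 / n             # dangling node: jump anywhere
        x = np.full(n, 1.0 / n)
        while np.linalg.norm(x - B @ x) >= eps:
            x = B @ x
        return x / x.sum()                    # scale so that sum(x) = 1

    # tiny example: links 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1
    G = np.array([[0, 0, 1],
                  [1, 0, 1],
                  [0, 1, 0]], dtype=float)
    print(pagerank_power(G))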


9.1.6 Transfer to a linear system solution

The first idea: the solution of the problem
$$x = Bx$$
is equivalent to
$$(I - B)x = 0$$
But I − B is not sparse!

Is there a better way?


Yes: Note that

$$B = pGD + ez^T, \qquad (30)$$
where D is the diagonal matrix with
$$d_{jj} = \begin{cases} 1/c_j & : c_j \neq 0 \\ 0 & : c_j = 0 \end{cases}, \qquad
e = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad
z_j = \begin{cases} \delta & : c_j \neq 0 \\ 1/n & : c_j = 0 \end{cases}$$

• $ez^T$ is a rank-one matrix representing the random choices of Web pages that do not follow links.


Due to (30), the equation x = Bx thus becomes:
$$x = (pGD + ez^T)x$$
$$x - pGDx = e\,\underbrace{z^T x}_{\gamma}$$
$$\underbrace{(I - pGD)}_{A}\,x = \gamma e,$$


we get the system of linear equations to solve:

$$Ax = e \qquad (31)$$
(We temporarily take γ = 1.) After solving (31), the resulting x can be scaled so that $\sum_i x_i = 1$ to obtain the PageRank.

Note that the matrix $A = I - pGD$ is

• sparse

• nonsingular, if p < 1

• nonsymmetric

• huge in size


9.2 Graph partitioning

• Consider a graph G = (T, K), T – set of nodes, K – set of edges (each connecting two nodes)

• One of the key tasks: dividing graph G into two disjoint subgraphs $G_i = (T_i, K_i)$, i = 1,2, by erasing a set of edges of G

• The subgraphs need to fulfill certain criteria:

– A. more or less equal sizes – #T1 ≈ #T2

∗ (# denoting the number of elements in a given set)

∗ For example, in parallel processing this means equal load in algorithms associated with the graph

– B. a minimum number of erased edges (in a parallel program: minimising the amount of communication between different processors)


Graph partitioning is applied to the dual graph to obtain two approximately equalsets of triangles.

Note. The method can be applied recursively to obtain other numbers of subdo-mains


9.2.1 Mathematical formulation

Let N = #T be the number of nodes. Assign to each node $t_i$ of graph G a real number $x_i$, obtaining a vector $x = (x_i) \in \mathbb{R}^N$. Define the set of allowed vectors
$$\mathcal{V} = \{x \in \mathbb{R}^N : x_i = +1 \text{ or } -1\ \forall t_i \in T\}.$$
Introduce the vector $e = (1,1,\ldots,1)^T \in \mathcal{V}$. Assuming now that N is even, the partitioning problem can be written as follows:

Find $x \in \mathcal{V}$ for which
$$e^T x = 0 \qquad (32)$$
and which minimises the functional
$$\phi(x) := \sum_{(k,s)\in K} (x_k - x_s)^2 \qquad (33)$$
over the whole set of allowed vectors $\mathcal{V}$.


• Having solved this problem, the graph is divided into two subgraphs by putting each node t for which

– $x_t = 1$ into the first set

– $x_t = -1$ into the second set

• At the same time, the number (33) gives the number of erased edges multiplied by 4

• The equal size of the subgraphs follows from (32)

• The minimum number of erased edges is guaranteed by (33)


Alternative formula for φ(x)

Introduce the adjacency matrix $A_G = (a_{ij})$ of graph G, which is an N×N matrix consisting only of values 0 or 1:
$$a_{ij} = \begin{cases} 1, & \text{there is an edge from node } i \text{ to node } j, \\ 0, & \text{otherwise.} \end{cases}$$

• Note that the matrix $A_G$ is symmetric

• Denote by $d_t$ the number of connections of node $t \in T$ and introduce the diagonal matrix
$$D_G = \mathrm{diag}\{d_t : t \in T\}.$$

Define the graph's Laplacian matrix $L_G$:
$$L_G = D_G - A_G. \qquad (34)$$


Lemma 9.1. The functional φ and the Laplacian matrix $L_G$ are connected as follows:
$$\frac{1}{2}\phi(x) = x^T L_G x.$$
Proof. (Actually quite straightforward; left as an exercise.)

Using Lemma 9.1, the partitioning problem can be reformulated as follows: Find $x \in \mathcal{V}$ such that
$$e^T x = 0 \qquad (35)$$
and which minimises the expression
$$x^T L_G x \ \text{ over all elements } x \in \mathcal{V}. \qquad (36)$$

From Lemma 9.1 it follows that


• LG is

– symmetric

– non-negative definite

• All eigenvalues $\lambda_i$ of matrix $L_G$ satisfy $\lambda_i \geq 0$

– Indeed, from Lemma 9.1 we see that $L_G x = 0$ iff $x_t$ is constant on the whole set $t \in T$, i.e. 0 is an eigenvalue of $L_G$ with the corresponding eigenspace span{e}.

An exact solution of the discrete problem (35),(36) is difficult to find. Therefore, in practice, the minimisation over the allowed vector set $\mathcal{V}$ is replaced with a minimisation over a subset of $\mathbb{R}^N$.


9.2.2 Modified problem

Find $x \in \mathbb{R}^N$ such that
$$e^T x = 0 \qquad (37)$$
and which minimises the expression
$$x^T L_G x \ \text{ over all elements } x \in \mathbb{R}^N \text{ with } \|x\|_2 = N^{1/2}. \qquad (38)$$

Eigenvalue problem formulation

In the eigenvalue problem, pairs (λ, v) are sought for which the following equation is satisfied:
$$L_G v = \lambda v. \qquad (39)$$

• In the case of a symmetric matrix, all the eigenvalues $\lambda_i$, i = 1,...,N, are real.


• The eigenvectors are mutually orthogonal (i.e. $v_i^T v_j = 0$ if $i \neq j$) and form a basis for the space $\mathbb{R}^N$. From this it follows that each vector $x \in \mathbb{R}^N$ can be represented as
$$x = \sum_{i=1}^{N} w_i v_i \quad \text{with multipliers } w_i \in \mathbb{R}. \qquad (40)$$

Lemma 9.2. The solution of problem (37),(38) is given by the eigenvector of $L_G$ corresponding to the smallest positive eigenvalue $\lambda_2$ of $L_G$.

Proof. Exercise. (Using (34), show that an eigenvector of $L_G$ is $v_1 = e$ with corresponding eigenvalue $\lambda_1 = 0$. Then from (37) it follows that $w_1 = 0$ in the expansion (40). To find the minimum of the expression $x^T L_G x$, use (40) and (39) and the nonnegative definiteness of the matrix.)


9.2.3 Method of Spectral Bisectioning

In the spectral bisection method for graph G, an eigenvector x corresponding to the smallest positive eigenvalue $\lambda_2$ of $L_G$ is found.

• x consists of real numbers, not integers

• To obtain the bisection, the arithmetic average a of the elements of x is found.

– All nodes t of G for which $x_t < a$ are assigned to the first set

– nodes with $x_t \geq a$ are assigned to the second set

Despite the heuristic nature of the algorithm, it often results in bisections of good quality.

Details of the algorithm were worked out in:
N. P. Kruyt, A conjugate gradient algorithm for the spectral partitioning of graphs, Parallel Computing, 22 (1997), 1493-1502.


9.2.4 Finding smallest positive eigenvalue of matrix LG

As mentioned above, to find the graph bisection the arithmetic average a of the eigenvector x needs to be found (based on which the graph is divided). But as the result does not depend on the norm of x, we are free to normalise $\|x\|_2 = 1$.

9.2.5 Power Method

    Choose x0 as an initial approximation
    for k=0,1,...
        y_k := L_G x_k
        x_{k+1} := y_k / ||y_k||_2     # normalisation


Theorem 9.1. Let the eigenvalues of the matrix $L_G$ be
$$0 = \lambda_1 < \lambda_2 \leq \cdots \leq \lambda_N$$
with corresponding orthonormal eigenvectors $v_i$, i = 1,...,N. Assume that the initial approximation $x^0$ has the direction $v_N$ represented, i.e.
$$x^0 = \sum_{i=1}^{N} \alpha_i v_i, \quad \text{where } \alpha_N \neq 0. \qquad (41)$$
If $k \to \infty$, then $x^k$ converges to the vector $v_N$, the eigenvector corresponding to the largest eigenvalue $\lambda_N$ of $L_G$.

Proof. Exercise. (Using (41) and (39), write out $L_G^k x^0$. Taking $\lambda_N^k$ in front of the summation, show that the multipliers in front of $v_i$, i = 1,...,N-1, tend to zero.)


• Assuming that σ is not an eigenvalue of $L_G$, and replacing the matrix $L_G$ with the matrix $(L_G - \sigma I)^{-1}$,

– we get an algorithm that converges to the eigenvector of $(L_G - \sigma I)^{-1}$ corresponding to its dominant eigenvalue.

– This is the eigenvector corresponding to the eigenvalue of $L_G$ closest to σ.


9.2.6 Inverse Power Method

    Choose x0 as an initial approximation vector
    Choose σ which is close, but not equal, to the eigenvalue we are searching for
    for k=0,1,...
        solve (L_G - σI) y_k = x_k for y_k
        x_{k+1} := y_k / ||y_k||_2     # normalisation

In our case, if we choose σ < 0, the convergence will be towards e, the eigenvector corresponding to the eigenvalue $\lambda_1 = 0$.


To find the eigenvector corresponding to the smallest positive eigenvalue of $L_G$, we can use the inverse power method while at the same time ensuring that the iteration vector $x^k$ has no component in the direction of e. For that we notice that
$$P(x) = x - \frac{e^T x}{e^T e}\,e$$
makes the vector x orthogonal to the vector e.


9.2.7 Inverse Power Method with Projection

    Choose x0 as an initial approximation vector
    Choose σ which is close, but not equal, to the eigenvalue we are searching for
    Normalise x0 := x0 / ||x0||_2
    Apply the projection x0 <- P(x0)
    for k=0,1,...
        solve (L_G - σI) y_k = x_k for y_k
        apply the projection y_k <- P(y_k)
        normalise x_{k+1} := y_k / ||y_k||_2

Remark. The number
$$R(x^k) = \frac{{x^k}^T L_G x^k}{{x^k}^T x^k}$$
is called the Rayleigh Quotient. In our case ${x^k}^T x^k = 1$, so
$$R(x^k) = {x^k}^T L_G x^k$$

• If $x^k$ converges to an eigenvector $v_i$ of $L_G$, then the Rayleigh Quotient $R(x^k)$ converges to the corresponding eigenvalue $\lambda_i$.

• The Rayleigh Quotient makes it possible to assess how good the pair $(\lambda_i, v_i)$ actually is.

– For this purpose one can calculate the residual norm $\|L_G v_i - R(v_i)v_i\|_2$.
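For illustration, a compact spectral bisection sketch that simply asks SciPy for the second-smallest eigenpair of the graph Laplacian instead of hand-coding the inverse power iteration (the function name and the small example graph are my own):

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def spectral_bisection(edges, n):
        """Split nodes 0..n-1 into two sets using the eigenvector for lambda_2 of L_G."""
        A = sp.lil_matrix((n, n))
        for i, j in edges:
            A[i, j] = A[j, i] = 1.0
        L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A   # L_G = D_G - A_G, eq. (34)
        vals, vecs = spla.eigsh(L.tocsc(), k=2, sigma=-1e-3)  # two eigenvalues nearest 0
        x = vecs[:, np.argsort(vals)[1]]                      # eigenvector for lambda_2
        a = x.mean()                                          # split at the average
        return x < a

    edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
    print(spectral_bisection(edges, 6))   # two triangles joined by one edge -> 3 + 3 split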


10 Parallel Computing

10.1 Motivation

Why do we need ever more computing power?

A much-cited legend: in 1947 the computer engineer Howard Aiken said that the USA will need at most 6 computers in the future!

Gordon Moore's (co-founder of Intel) law: (1965: the number of switches doubles every second year); 1975 refinement: the number of switches on a CPU doubles every 18 months.

Continuing this way, by 2020 or 2030 we would reach the atomic level (or the quantum computer).


Flops:

first computers: $10^2$ (100 Flops)
desktop computers today: $10^9$ – Gigaflops (GFlops)
supercomputers nowadays: $10^{12}$ – Teraflops (TFlops)
the aim of today: $10^{15}$ – Petaflops (PFlops)
next step: $10^{18}$ – Exaflops (EFlops)

• How can parallel processing help achieving these goals?

• Why should we talk about it at the Scientific Computing course?

Example 10.1. Why are speeds of order $10^{12}$ flops not enough?

Weather forecast for Europe for the next 48 hours, from sea level up to 20 km: we need to solve a PDE in 3D (x, y, z and t). The volume of the region: 5000 × 4000 × 20 km³. Stepsize in the xy-plane: 1 km; Cartesian mesh in the z-direction: 0.1 km; timestep: 1 min. Around 1000 flops per mesh point per timestep. As a result, about


$$5000 \cdot 4000 \cdot 20\ \text{km}^3 \times 10\ \text{mesh points per km}^3 = 4\cdot 10^9\ \text{mesh points}$$
and
$$4\cdot 10^9\ \text{mesh points} \times 48\cdot 60\ \text{timesteps} \times 10^3\ \text{flop} \approx 1.15\cdot 10^{15}\ \text{flop}.$$
If a computer can do $10^9$ flops, it will take around
$$1.15\cdot 10^{15}\ \text{flop} / 10^9\ \text{flops} = 1.15\cdot 10^6\ \text{seconds} \approx 13\ \text{days}!!$$
But with $10^{12}$ flops: $1.15\cdot 10^3$ seconds, i.e. less than 20 min.

It is not hard to imagine "small" changes to the given scheme such that $10^{12}$ flops is not enough:


• Reduce the mesh stepsize to 0.1 km in each direction and the timestep to 10 seconds, and the total time would grow from 20 min to 8 days.

• We could replace Europe with a whole-World model (area: $2\cdot 10^7\ \text{km}^2 \to 5\cdot 10^8\ \text{km}^2$) and combine the model with an ocean model.

Therefore we must say: the needs of science and technology grow faster than the available possibilities; one only needs to change ε and h to get unsatisfied again!

But again, why do we need a parallel computer to achieve this goal?


Example 10.2

Have a look at the following piece of code:

    do i=1,1000000000000
       z(i) = x(i) + y(i)     # i.e. 3*10^12 memory accesses
    end do

Assuming that data travels with the speed of light, $3\cdot 10^8$ m/s, to finish the operation in 1 second the average distance to a memory cell must be
$$r = \frac{3\cdot 10^8\ \text{m/s} \cdot 1\ \text{s}}{3\cdot 10^{12}} = 10^{-4}\ \text{m}.$$
Typical computer memory is a rectangular mesh. The length of each edge would then be
$$\frac{2\cdot 10^{-4}\ \text{m}}{\sqrt{3\cdot 10^{12}}} \approx 10^{-10}\ \text{m},$$
the size of a small atom!


Why is parallel computing not yet the only way of doing it? Three main reasons: hardware, algorithms and software.

Hardware: the speed of

• networking

• peripheral devices

• memory access

does not keep up with the growth in processor capacity.

Algorithms: an enormous number of different algorithms exist, but

• problems start with their implementation in some real-life application

Software: development in progress;

• no good enough automatic parallelisers

• everything done by hand

• no easily portable libraries for different architectures

• does the parallelising effort pay off at all?


10.2 Design and Evaluation of Parallel Programs

10.2.1 Two main approaches in parallel program design

• Data-parallel approach

– Data is divided between processes so that each process does the same op-erations but with different data

• Control-parallel approach

– Each process has access to all pieces of data, but they perform differentoperations on them

Majority of parallel programs use mix of the above


What obstacles can be encountered?

• Data dependencies

Example. We need to solve a 2×2 system of linear equations
$$\begin{pmatrix} a & 0 \\ b & c \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} f \\ g \end{pmatrix}$$
Data partitioning: the first row goes to process 0, the 2nd row to process 1. Isn't it easy! → NOT! Write it out as a system of two equations:
$$ax = f$$
$$bx + cy = g.$$
The computation of y on process 1 depends on x computed on process 0. Therefore we need


• Communication (more expensive than arithmetic operations or memoryreferences)

• Synchronisation (Processes that are just waiting are useless)

• Extra work (avoid communication or write a parallel program)

Which way to go?

• Divide data evenly between machines −→ load balancing

• Reducing data dependencies −→ good partitioning (geometric, for example)

• Make communication-free parts of the calculations as large as possible −→ al-gorithms with large granularity

Given requirements are not easily achievable. Some problems are easily parallelisable,others are not...


10.3 Assessing parallel programs

10.3.1 Speedup

$$S(N,P) := \frac{t_{seq}(N)}{t_{par}(N,P)}$$

• $t_{seq}(N)$ – the time for solving the same problem with the best known sequential algorithm

– $\Longrightarrow$ often the sequential algorithm differs from the parallel one!

– If the same parallel algorithm is timed on just one processor (despite the existence of a better sequential algorithm), the corresponding ratio is called relative speedup

• $0 < S(N,P) \leq P$

• If S(N,P) = P, the parallel program has linear or optimal speedup. (Example: calculating π with a parallel quadrature formula)


• Sometimes it may happen that S(N,P) > P. How is that possible? Due to processor cache effects; this is called superlinear speedup

• But sometimes S(N,P) < 1. What does this mean? Slowdown instead of speedup!

10.3.2 Efficiency

$$E(N,P) := \frac{t_{seq}(N)}{P \cdot t_{par}(N,P)}.$$

Presumably, $0 < E(N,P) \leq 1$.


10.3.3 Amdahl's law

Each algorithm has some part(s) that cannot be parallelised

• Let σ (0 < σ ≤ 1) denote the sequential part of a parallel program that cannot be parallelised

• Assume the remaining 1−σ part is parallelised optimally $\Longrightarrow$
$$S(N,P) = \frac{t_{seq}}{\big(\sigma + \frac{1-\sigma}{P}\big)t_{seq}} = \frac{1}{\sigma + \frac{1-\sigma}{P}} \leq \frac{1}{\sigma}.$$

If e.g. 5% of the algorithm is not parallelisable (i.e. σ = 0.05), we get:

P:       2,   4,   10,  20,   100,  ∞
S(N,P):  1.9, 3.5, 6.9, 10.3, 16.8, 20

$\Longrightarrow$ using a large number of processors seems useless for gaining any reasonable speedup increase!
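The table above is easy to reproduce; a two-line Python sketch for illustration:

    sigma = 0.05                              # non-parallelisable fraction
    for P in (2, 4, 10, 20, 100, 10**9):
        S = 1.0 / (sigma + (1 - sigma) / P)   # Amdahl's law
        print(f"P = {P:>10}: S = {S:.1f}")    # approaches 1/sigma = 20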

10.3.4 Validity of Amdahl's law

John Gustafson & Ed Barsis (Sandia National Laboratories): a 1024-processor nCube/10 beat Amdahl's law! Is that possible at all?

In their problem they had σ ≈ 0.004...0.008 and got S ≈ 1000 (according to Amdahl's law, S should have been only 125...250!!!)

Does Amdahl's law hold? Mathematically, Amdahl's law holds, of course. But does it make sense to solve a problem with a fixed problem size N on an arbitrarily large number of processors? (If there are only 1000 operations in the whole calculation, does it help to use 1001 processors? ;-)

• The point is that usually σ is not constant, but decreases as N grows

• An algorithm is said to be efficiently parallel if σ → 0 as N → ∞
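A small sketch of the last point (the decay law σ(N) = s0/N is an assumption made up purely for illustration): if the sequential fraction shrinks as the problem grows, the Amdahl bound for a fixed machine size approaches P.

# Illustrative only: assume sigma(N) = s0/N and watch the bound approach P = 1024.
def speedup_bound(sigma, P):
    return 1.0 / (sigma + (1.0 - sigma) / P)

P, s0 = 1024, 50.0
for N in (10**3, 10**4, 10**5, 10**6, 10**7):
    sigma = s0 / N
    print(f"N = {N:>8d}   sigma = {sigma:.6f}   S <= {speedup_bound(sigma, P):7.1f}")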

To avoid problems with these terms, the term scaled efficiency is often used

10.3.5 Scaled efficiency

$$E_S(N,P) := \frac{t_{\mathrm{seq}}(N)}{t_{\mathrm{par}}(P \cdot N, P)}$$

• Does the solution time remain the same when the problem size grows with the number of processors?

• 0 < E_S(N,P) ≤ 1

• If E_S(N,P) = 1, we have linear speedup

11 Parallel programming models

Many different models for expressing parallelism in programming languages

• Actor model

– Erlang

– Scala

• Coordination languages

– Linda

• CSP-based (Communicating Sequential Processes)

– FortranM

– Occam

• Dataflow

– SISAL (Streams and Iteration in a Single Assignment Language)

• Distributed

– Sequoia

– Bloom

• Event-driven and hardware description

– Verilog hardware description language (HDL)

• Functional

– Concurrent Haskell

• GPU languages

– CUDA

– OpenMP

– OpenACC

– OpenHMPP (HMPP for Hybrid Multicore Parallel Programming)

• Logic programming

– Parlog

• Multi-threaded

– Clojure

• Object-oriented

– Charm++

– Smalltalk

• Message-passing

– MPI

– PVM

• Partitioned global address space (PGAS)

– High Performance Fortran (HPF)

11.1 HPF

Partitioned global address space parallel programming model; a Fortran90 extension

• SPMD (Single Program Multiple Data) model

• each process operates with its own part of data

• HPF commands specify which processor gets which part of the data

• Concurrency is defined by HPF commands based on Fortran90

• HPF directives as comments: !HPF$ <directive>

HPF example: matrix multiplication

      PROGRAM ABmult
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 100
      INTEGER, DIMENSION(N,N) :: A, B, C
      INTEGER :: i, j
!HPF$ PROCESSORS square(2,2)
!HPF$ DISTRIBUTE (BLOCK,BLOCK) ONTO square :: C
!HPF$ ALIGN A(i,*) WITH C(i,*)
!     replicate copies of row A(i,*) onto processors which compute C(i,j)
!HPF$ ALIGN B(*,j) WITH C(*,j)
!     replicate copies of column B(*,j) onto processors which compute C(i,j)
      A = 1
      B = 2
      C = 0
      DO i = 1, N
        DO j = 1, N
          ! All the work is local due to ALIGNs
          C(i,j) = DOT_PRODUCT(A(i,:), B(:,j))
        END DO
      END DO
      WRITE(*,*) C
      END

HPF programming methodology

• Need to find balance between concurrency and communication

• the more processes the more communication

• aiming for

  – a balanced load, based on the owner-computes rule

  – data locality

It is easy to write a program in HPF but difficult to gain good efficiency. Programming in HPF goes more or less like this:

1. Write a correctly working serial program, test and debug it

2. Add distribution directives, introducing as little communication as possible

11.2 OpenMP

OpenMP tutorial (http://www.llnl.gov/computing/tutorials/openMP/)

• Programming model based on thread parallelism

• Helping tool for a programmer

• Based on compiler directives (C/C++, Fortran)

• Nested parallelism is supported (though not all implementations support it)

• Dynamic threads

• OpenMP has become a standard

• Fork-Join model

OpenMP Example: Matrix Multiplication

! http://www.llnl.gov/computing/tutorials/openMP/exercise.html
C******************************************************************************
C OpenMp Example - Matrix Multiply - Fortran Version
C FILE: omp_mm.f
C DESCRIPTION:
C   Demonstrates a matrix multiply using OpenMP. Threads share row iterations
C   according to a predefined chunk size.
C LAST REVISED: 1/5/04  Blaise Barney
C******************************************************************************

      PROGRAM MATMULT

      INTEGER NRA, NCA, NCB, TID, NTHREADS, I, J, K, CHUNK,
     +        OMP_GET_NUM_THREADS, OMP_GET_THREAD_NUM
C     number of rows in matrix A
      PARAMETER (NRA=62)
C     number of columns in matrix A
      PARAMETER (NCA=15)
C     number of columns in matrix B
      PARAMETER (NCB=7)

      REAL*8 A(NRA,NCA), B(NCA,NCB), C(NRA,NCB)

C     Set loop iteration chunk size
      CHUNK = 10

C     Spawn a parallel region explicitly scoping all variables
!$OMP PARALLEL SHARED(A,B,C,NTHREADS,CHUNK) PRIVATE(TID,I,J,K)
      TID = OMP_GET_THREAD_NUM()
      IF (TID .EQ. 0) THEN
        NTHREADS = OMP_GET_NUM_THREADS()
        PRINT *, 'Starting matrix multiple example with', NTHREADS,
     +           'threads'
        PRINT *, 'Initializing matrices'
      END IF

C     Initialize matrices
!$OMP DO SCHEDULE(STATIC, CHUNK)
      DO 30 I=1, NRA
        DO 30 J=1, NCA
          A(I,J) = (I-1)+(J-1)
   30 CONTINUE
!$OMP DO SCHEDULE(STATIC, CHUNK)
      DO 40 I=1, NCA
        DO 40 J=1, NCB
          B(I,J) = (I-1)*(J-1)
   40 CONTINUE
!$OMP DO SCHEDULE(STATIC, CHUNK)
      DO 50 I=1, NRA
        DO 50 J=1, NCB
          C(I,J) = 0
   50 CONTINUE

C     Do matrix multiply sharing iterations on outer loop
C     Display who does which iterations for demonstration purposes
      PRINT *, 'Thread', TID, 'starting matrix multiply...'
!$OMP DO SCHEDULE(STATIC, CHUNK)
      DO 60 I=1, NRA
        PRINT *, 'Thread', TID, 'did row', I
        DO 60 J=1, NCB
          DO 60 K=1, NCA
            C(I,J) = C(I,J) + A(I,K) * B(K,J)
   60 CONTINUE

C     End of parallel region
!$OMP END PARALLEL

C     Print results
      PRINT *, '******************************************************'
      PRINT *, 'Result Matrix:'
      DO 90 I=1, NRA
        DO 80 J=1, NCB
          WRITE(*,70) C(I,J)
   70     FORMAT(2x,f8.2,$)
   80   CONTINUE
        PRINT *, ' '
   90 CONTINUE
      PRINT *, '******************************************************'
      PRINT *, 'Done.'
      END

11.3 GPGPU

General-purpose computing on graphics processing units

11.3.1 CUDA

• Parallel programming platform and model created by NVIDIA

• CUDA gives access to

– virtual instruction set

– memory on parallel computational elements on GPUs

• based on executing a large number of threads simultaneously

• CUDA-accelerated libraries

• CUDA-extended programming languages (like nvcc)

– C, C++, Fortran

CUDA Fortran Matrix Multiplication example (for more information, see http://www.pgroup.com/lit/articles/insider/v1n3a2.htm)

Host (CPU) code:

subroutine mmul( A, B, C )
  use cudafor
  real, dimension(:,:) :: A, B, C
  integer :: N, M, L
  real, device, allocatable, dimension(:,:) :: Adev, Bdev, Cdev
  type(dim3) :: dimGrid, dimBlock
  N = size(A,1); M = size(A,2); L = size(B,2)
  allocate( Adev(N,M), Bdev(M,L), Cdev(N,L) )
  Adev = A(1:N,1:M)
  Bdev = B(1:M,1:L)
  dimGrid = dim3( N/16, L/16, 1 )
  dimBlock = dim3( 16, 16, 1 )
  call mmul_kernel<<<dimGrid,dimBlock>>>( Adev, Bdev, Cdev, N, M, L )
  C(1:N,1:L) = Cdev   ! copy the N x L result back from the device
  deallocate( Adev, Bdev, Cdev )
end subroutine

GPU code:

attributes(global) subroutine MMUL_KERNEL( A, B, C, N, M, L )
  real, device :: A(N,M), B(M,L), C(N,L)
  integer, value :: N, M, L
  integer :: i, j, kb, k, tx, ty
  real, shared :: Ab(16,16), Bb(16,16)
  real :: Cij
  tx = threadidx%x; ty = threadidx%y
  i = (blockidx%x-1) * 16 + tx
  j = (blockidx%y-1) * 16 + ty
  Cij = 0.0
  do kb = 1, M, 16
    ! Fetch one element each into Ab and Bb; note that 16x16 = 256
    ! threads in this thread-block are fetching separate elements
    ! of Ab and Bb
    Ab(tx,ty) = A(i, kb+ty-1)
    Bb(tx,ty) = B(kb+tx-1, j)
    ! Wait until all elements of Ab and Bb are filled
    call syncthreads()
    do k = 1, 16
      Cij = Cij + Ab(tx,k) * Bb(k,ty)
    enddo
    ! Wait until all threads in the thread-block finish with
    ! this iteration's Ab and Bb
    call syncthreads()
  enddo
  C(i,j) = Cij
end subroutine

11.3.2 OpenCL

Open Computing Language (OpenCL):

• Heterogeneous systems of

– CPUs (central processing units)

– GPUs (graphics processing units)

– DSPs (digital signal processors)

– FPGAs (field-programmable gate arrays)

– and other processors

• language (based on C99)

– kernels (functions that execute on OpenCL devices)

– plus application programming interfaces (APIs)

– (fortrancl)

• parallel computing using

– task-based and data-based parallelism

– GPGPU

• open standard by Khronos Group (Apple, AMD, IBM, Intel and Nvidia)

– + adopted by Altera, Samsung, Vivante and ARM Holdings

Example: Matrix Multiplication

/* kernel.cl
 * Matrix multiplication: C = A * B.  (Device code)
 * (http://gpgpu-computing4.blogspot.co.uk/2009/09/matrix-multiplication-2-opencl.html)
 */

// OpenCL Kernel
__kernel void
matrixMul( __global float* C,
           __global float* A,
           __global float* B,
           int wA, int wB )
{
   // 2D Thread ID
   // Old CUDA code:
   //   int tx = blockIdx.x * TILE_SIZE + threadIdx.x;
   //   int ty = blockIdx.y * TILE_SIZE + threadIdx.y;
   int tx = get_global_id(0);
   int ty = get_global_id(1);

   // value stores the element that is computed by the thread
   float value = 0;
   for (int k = 0; k < wA; ++k)
   {
      float elementA = A[ty * wA + k];
      float elementB = B[k * wB + tx];
      value += elementA * elementB;
   }

   // Write the matrix to device memory; each thread writes one element
   C[ty * wA + tx] = value;
}

11.3.3 OpenACC

• standard (Cray, CAPS, Nvidia and PGI) to simplify parallel programming of heterogeneous CPU/GPU systems

• uses annotations (like OpenMP) in C, C++ and Fortran source code

• code is started on both the CPU and the GPU automatically

• OpenACC to merge into OpenMP in a future release of OpenMP

! A simple OpenACC kernel for Matrix Multiplication
!$acc kernels
   do k = 1, n1
      do i = 1, n3
         c(i,k) = 0.0
         do j = 1, n2
            c(i,k) = c(i,k) + a(i,j) * b(j,k)
         enddo
      enddo
   enddo
!$acc end kernels

! http://www.bu.edu/tech/research/computation/linux-cluster/katana-cluster/gpu-computing/openacc-fortran/matrix-multiply-fortran/

program matrix_multiply
  use omp_lib
  use openacc
  implicit none
  integer :: i, j, k, myid, m, n, compiled_for, option
  integer, parameter :: fd = 11
  integer :: t1, t2, dt, count_rate, count_max
  real, allocatable, dimension(:,:) :: a, b, c
  real :: tmp, secs

  open(fd, file='wallclocktime', form='formatted')
  option = compiled_for(fd)   ! 1-serial, 2-OpenMP, 3-OpenACC, 4-both

!$omp parallel
!$ myid = OMP_GET_THREAD_NUM()
!$ if (myid .eq. 0) then
!$   write(fd, "('Number of procs is ', i4)") OMP_GET_NUM_THREADS()
!$ endif
!$omp end parallel

  call system_clock(count_max=count_max, count_rate=count_rate)

  do m = 1, 4   ! compute for different size matrix multiplies
    call system_clock(t1)
    n = 1000*2**(m-1)   ! 1000, 2000, 4000, 8000
    allocate( a(n,n), b(n,n), c(n,n) )

    ! Initialize matrices
    do j = 1, n
      do i = 1, n
        a(i,j) = real(i + j)
        b(i,j) = real(i - j)
      enddo
    enddo

!$omp parallel do shared(a,b,c,n,tmp) reduction(+: tmp)
!$acc data copyin(a,b) copy(c)
!$acc kernels
    ! Compute matrix multiplication.
    do j = 1, n
      do i = 1, n
        tmp = 0.0   ! enables ACC parallelism for k-loop
        do k = 1, n
          tmp = tmp + a(i,k) * b(k,j)
        enddo
        c(i,j) = tmp
      enddo
    enddo
!$acc end kernels
!$acc end data
!$omp end parallel do

    call system_clock(t2)
    dt = t2 - t1
    secs = real(dt)/real(count_rate)
    write(fd, "('For n = ', i4, ', wall clock time is ', f12.2, ' seconds')") &
          n, secs
    deallocate(a, b, c)
  enddo
  close(fd)

end program matrix_multiply

integer function compiled_for(fd)
  implicit none
  integer :: fd
#if defined _OPENMP && defined _OPENACC
  compiled_for = 4
  write(fd, "('This code is compiled with OpenMP & OpenACC')")
#elif defined _OPENACC
  compiled_for = 3
  write(fd, "('This code is compiled with OpenACC')")
#elif defined _OPENMP
  compiled_for = 2
  write(fd, "('This code is compiled with OpenMP')")
#else
  compiled_for = 1
  write(fd, "('This code is compiled for serial operations')")
#endif
end function compiled_for

Contents

1 Introduction ... 3
  1.1 Syllabus ... 3
  1.2 Literature ... 5
  1.3 Scripting vs programming ... 7
    1.3.1 What is a script? ... 7
    1.3.2 Characteristics of a script ... 8
    1.3.3 Why not stick to Java, C/C++ or Fortran? ... 9
  1.4 Scripts yield short code ... 10
  1.5 Performance issues ... 11
    1.5.1 Scripts can be slow ... 11
    1.5.2 Scripts may be fast enough ... 13
    1.5.3 When scripting is convenient ... 14
    1.5.4 When to use C, C++, Java, Fortran ... 15

2 What is Scientific Computing? ... 16
  2.1 Introduction to Scientific Computing ... 16
  2.2 Specifics of computational problems ... 22
  2.3 Mathematical model ... 24

3 Approximation in Scientific Computing ... 29
  3.1 Sources of approximation error ... 29
    3.1.1 Error sources that are under our control ... 29
    3.1.2 Errors created during the calculations ... 31
    3.1.3 Forward error (arvutuslik viga e. tulemuse viga) and backward error (algandmete viga) ... 34
  3.2 Floating-Point Numbers ... 42
  3.3 Normalised floating-point numbers ... 44
  3.4 IEEE (Normalised) Arithmetics ... 46

4 Python in Scientific Computing ... 50
  4.1 Numerical Python (NumPy) ... 50
    4.1.1 NumPy: making arrays ... 53
    4.1.2 NumPy: making float, int, complex arrays ... 54
    4.1.3 Array with a sequence of numbers ... 55
    4.1.4 Warning: arange is dangerous ... 56
    4.1.5 Array construction from a Python list ... 56
    4.1.6 From "anything" to a NumPy array ... 58
    4.1.7 Changing array dimensions ... 60
    4.1.8 Array initialization from a Python function ... 61
    4.1.9 Basic array indexing ... 62
    4.1.10 More advanced array indexing ... 63
    4.1.11 Slices refer the array data ... 64
    4.1.12 Integer arrays as indices ... 66
    4.1.13 Loops over arrays ... 67
    4.1.14 Array computations ... 70
    4.1.15 In-place array arithmetics ... 71
    4.1.16 Standard math functions can take array arguments ... 74
    4.1.17 Other useful array operations ... 75
    4.1.18 More useful array methods and attributes ... 76
    4.1.19 Complex number computing ... 78
    4.1.20 A root function ... 80
    4.1.21 Array type and data type ... 81
    4.1.22 Matrix objects ... 83
  4.2 NumPy: Vectorisation ... 85
    4.2.1 Vectorisation of functions with if tests; problem ... 87
    4.2.2 Vectorisation of functions with if tests; solutions ... 89
    4.2.3 General vectorization of if-else tests ... 91
    4.2.4 Vectorization via slicing ... 92
  4.3 NumPy: Random numbers ... 93
  4.4 NumPy: Basic linear algebra ... 95
    4.4.1 A linear algebra session ... 96
  4.5 Python: Plotting modules ... 97
  4.6 I/O ... 101
    4.6.1 File I/O with arrays; plain ASCII format ... 101
    4.6.2 File I/O with arrays; binary pickling ... 103
  4.7 SciPy ... 105
    4.7.1 Overview ... 105

5 Solving Systems of Linear Equations ... 106
  5.1 Systems of Linear Equations ... 106
  5.2 Classification ... 108
    5.2.1 Problem Transformation ... 109
  5.3 Triangular linear systems ... 111
  5.4 Elementary Elimination Matrices ... 113
  5.5 Gauss Elimination and LU Factorisation ... 115
  5.6 Number of operations in GEM ... 118
  5.7 GEM with row permutations ... 119
  5.8 Reliability of the LU-factorisation with partial pivoting ... 127

6 BLAS (Basic Linear Algebra Subroutines) ... 132
  6.1 Motivation ... 132
  6.2 BLAS implementations ... 138

7 Numerical Solution of Differential Equations ... 139
  7.1 Ordinary Differential Equations (ODE) ... 139
    7.1.1 Numerical methods for solving ODEs ... 140
  7.2 Partial Differential Equations (PDE) ... 147
  7.3 2nd order PDEs ... 149
  7.4 Time-independent PDE-s ... 151
    7.4.1 Finite Difference Method (FDM) ... 151
    7.4.2 Finite element Method (FEM) ... 156
  7.5 FEM for more general cases ... 169
  7.6 FEM software ... 170
  7.7 Sparse matrix storage schemes ... 172
    7.7.1 Triple storage format ... 173
    7.7.2 Column-major storage format ... 174
    7.7.3 Row-major storage format ... 175
    7.7.4 Combined schemes ... 176

8 Iterative methods ... 178
  8.1 Problem setup ... 178
  8.2 Jacobi Method ... 180
  8.3 Conjugate Gradient Method (CG) ... 183
  8.4 Preconditioning ... 186
  8.5 Preconditioner examples ... 188

9 Some examples ... 192
  9.1 Google PageRank problem ... 192
    9.1.1 Overview ... 192
    9.1.2 PageRank Problem Description ... 195
    9.1.3 Markov process ... 196
    9.1.4 Mathematical formulation of PageRank problem ... 197
    9.1.1 Power method ... 201
    9.1.2 Transfer to a linear system solution ... 202
  9.2 Graph partitioning ... 206
    9.2.1 Mathematical formulation ... 208
    9.2.2 Modified problem ... 213
    9.2.3 Method of Spectral Bisectioning ... 215
    9.2.4 Finding smallest positive eigenvalue of matrix LG ... 216
    9.2.5 Power Method ... 216
    9.2.6 Inverse Power Method ... 219
    9.2.7 Inverse Power Method with Projection ... 221

10 Parallel Computing ... 223
  10.1 Motivation ... 223
  10.2 Design and Evaluation of Parallel Programs ... 229
    10.2.1 Two main approaches in parallel program design ... 229
  10.3 Assessing parallel programs ... 232
    10.3.1 Speedup ... 232
    10.3.2 Efficiency ... 233
    10.3.3 Amdahl's law ... 234
    10.3.4 Validity of Amdahl's law ... 235
    10.3.5 Scaled efficiency ... 236

11 Parallel programming models ... 237
  11.1 HPF ... 239
  11.2 OpenMP ... 242
  11.3 GPGPU ... 248
    11.3.1 CUDA ... 248
    11.3.2 OpenCL ... 251
    11.3.3 OpenACC ... 254