python as number crunching code glue


DESCRIPTION

Presented to the Boston Python User Group on 9/21/2011

TRANSCRIPT

Page 1: Python as number crunching code glue

Python as number crunching glue

Jiahao Chen
[email protected]
@mitpostdoc

theochem.mit.edu

Thursday, September 22, 2011

Page 2: Python as number crunching code glue

This is not a crash course on scientific computing or numerical linear algebra

Recommended texts:

nr.com

Page 4: Python as number crunching code glue

NumPy and SciPy

How to say:

NumPy: no official pronunciation

SciPy: “sigh pie”

Where to get:

scipy.org, numpy.scipy.org

You might already have it

Otherwise, have fun installing it ;)

Page 5: Python as number crunching code glue

You may already know how to use numpy/scipy!

Similar to Matlab, Octave, Scilab, R.

see: http://mathesaurus.sourceforge.net/

In many cases, Matlab/Octave/Scilab code can be translated easily to use numpy+scipy+matplotlib.

Other interfaces exist: e.g. mlabwrap lets you wrap Python around Matlab.

Page 6: Python as number crunching code glue

Approximately continuous arithmetic: floating point*

- vs -

Exact discrete arithmetic: booleans, integers, strings, ...

*David Goldberg, “What every computer scientist should know about floating-point arithmetic”

5Thursday, September 22, 2011
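A minimal sketch of the distinction above, written for Python 3 (the slides themselves use Python 2). The variable names are illustrative only.

```python
import math

# 0.1, 0.2 and 0.3 have no exact binary representation,
# so the floating-point sum is only approximately 0.3.
total = 0.1 + 0.2
exact_match = (total == 0.3)             # False
close_match = math.isclose(total, 0.3)   # True: compare with a tolerance instead

# Integer arithmetic is exact, however large the values get.
int_match = ((10**20 + 1) - 10**20 == 1)

print(exact_match, close_match, int_match)
```

Comparing floats with a tolerance rather than `==` is the usual consequence of the first half of the slide.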

Page 7: Python as number crunching code glue

Using numpy can make code cleaner

a = range(10000000)
b = range(10000000)
c = []

for i in range(len(a)):
    c.append(a[i] + b[i])

import numpy as np
a = np.arange(10000000)
b = np.arange(10000000)
c = a + b

What’s different?

Page 8: Python as number crunching code glue

What’s different?

a = range(10000000)       # creates list of dynamically typed int
b = range(10000000)
c = []                    # a+b is concatenation

for i in range(len(a)):
    c.append(a[i] + b[i])

import numpy as np
a = np.arange(10000000)   # creates ndarray of statically typed int
b = np.arange(10000000)
c = a + b                 # vectorized addition

Using numpy can save lots of time: 0.333s with ndarrays vs. 7.050s with the list loop (21x)

numpy is a convenient interface to compiled C/Fortran libraries: BLAS, LAPACK, FFTW, UMFPACK,...
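The comparison above can be reproduced roughly as follows. This is a sketch for Python 3, where `range()` is lazy, so the lists are built explicitly; `n` is smaller than on the slide so it runs quickly, and the function names are made up for the example. Timings will differ from the slide's numbers.

```python
import timeit
import numpy as np

n = 200_000  # the slide uses 10,000,000

def add_lists():
    a = list(range(n))
    b = list(range(n))
    c = []
    for i in range(len(a)):      # explicit Python-level loop
        c.append(a[i] + b[i])
    return c

def add_arrays():
    a = np.arange(n)
    b = np.arange(n)
    return a + b                 # one vectorized call into compiled code

t_list = timeit.timeit(add_lists, number=3)
t_array = timeit.timeit(add_arrays, number=3)
print("list loop: %.3fs, ndarray: %.3fs" % (t_list, t_array))
```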

Page 9: Python as number crunching code glue

Numerical sw stack

(diagram: Your code, SciPy, NumPy and Python, layered over external Fortran/C libraries)

External Fortran/C:
BLAS, LAPACK, ... (linear algebra)
FFTW, ... (Fourier transforms)

Page 11: Python as number crunching code glue

“One thing that graduate students eventually learn is that you can hide just about anything in a NxN matrix... (for sufficiently large N)” - anonymous string theorist

If your data can be put into a matrix/vector, numpy/scipy can help you!

Page 12: Python as number crunching code glue

You may already be working with matrix/vector data...

bitmap/video
waveform
database table
text
differential equation model
graph

Page 13: Python as number crunching code glue

# Chapter NumPy SciPy

1 Scientific Computing

2 Systems of linear equations X X

3 Linear least squares X

4 Eigenvalue problems X X

5 Nonlinear equations X

6 Optimization X

7 Interpolation X

8 Numerical integration and differentiation X

9 Initial value problems for ODEs X

10 Boundary value problems for ODEs X

11 Partial differential equations X

12 Fast Fourier Transform X

13 Random numbers and stochastic simulation X

Table of contents from Michael Heath’s textbook

Page 14: Python as number crunching code glue

Outline:

* NumPy
  : explicit data typing with dtypes
  : array manipulation with ndarrays

* SciPy
  : high-level numerical routines
  : use cases

* NumPy/SciPy as code glue: f2py and weave

Page 15: Python as number crunching code glue

The most fundamental object in NumPy is the ndarray (N-dimensional array)

v[:]            vector
M[:,:]          matrix
x[:,:,...,:]    higher order tensor

Unlike built-in Python data types, ndarrays are designed for homogeneous, explicitly typed data

Page 16: Python as number crunching code glue

numpy primitive dtypes

Bits  Boolean  Signed integer    Unsigned integer  Float           Complex
8     bool     int8              uint8
16             int16             uint16
32             int32             uint32            float32
64             int, intp, int64  uint, uint64      float, float64  complex64
128                                                float128        complex128
256                                                                complex256

dtypes bring explicit typing to Python
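A short sketch of what explicit typing means in practice. The arrays are invented for illustration; only the standard dtype names from the table are used.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.uint8)
b = np.array([1.0, 2.0, 3.0], dtype=np.float32)

print(a.dtype, a.itemsize)   # uint8: 1 byte per element
print(b.dtype, b.itemsize)   # float32: 4 bytes per element

# Fixed-width integers wrap around on overflow instead of growing like Python ints.
wrapped = np.array([255], dtype=np.uint8) + np.array([1], dtype=np.uint8)

# Conversion between dtypes is an explicit step.
as_float = a.astype(np.float64)
```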

Page 17: Python as number crunching code glue

Structured array: ndarray of data structure

>>> mol = np.zeros(3, dtype=('uint8, 3float64'))
>>> mol[0] = 8, (-0.464, 0.177, 0.0)
>>> mol[1] = 1, (-0.464, 1.137, 0.0)
>>> mol[2] = 1, (0.441, -0.143, 0.0)
>>> mol
array([(8, [-0.46400000000000002, 0.17699999999999999, 0.0]),
       (1, [-0.46400000000000002, 1.137, 0.0]),
       (1, [0.441, -0.14299999999999999, 0.0])],
      dtype=[('f0', '|u1'), ('f1', '<f8', (3,))])

Recarray: ndarray of data structure with named fields (record)

>>> mol = np.array(mol, dtype={'atomicnum':('uint8',0), 'coords':('3float64',1)})
>>> mol['atomicnum']
array([8, 1, 1], dtype=uint8)
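The same molecule can also be built with named fields from the start. This is a sketch rather than the slide's exact code: the field names are reused from the slide, and the list-of-tuples dtype spelling is one of several that NumPy accepts.

```python
import numpy as np

# One record per atom: atomic number plus a 3-vector of coordinates.
atom = np.dtype([('atomicnum', np.uint8), ('coords', np.float64, (3,))])

mol = np.zeros(3, dtype=atom)
mol[0] = (8, (-0.464, 0.177, 0.0))
mol[1] = (1, (-0.464, 1.137, 0.0))
mol[2] = (1, (0.441, -0.143, 0.0))

print(mol['atomicnum'])       # one field across all records
print(mol['coords'].shape)    # one coordinate triple per atom

# Fields combine with ordinary boolean masks.
hydrogens = mol[mol['atomicnum'] == 1]
```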

Page 18: Python as number crunching code glue

The most fundamental object in NumPy is the ndarray (N-dimensional array)

In 2D, the matrix class is also useful, especially when porting Matlab/Octave code.

* For matrices, a*b is matrix multiply. For ndarrays, a*b is elementwise multiply.

* Matrices have convenient attributes:
  M.T  transpose of M
  M.H  Hermitian conjugate of M
  M.I  matrix inverse of M

* Matrices are always 2D, no matter how you manipulate them.
  ****** This can lead to some very severe, insidious bugs. ******

Using asarray() and asmatrix() views allows the best of both worlds.
see: http://docs.scipy.org/doc/numpy/reference/arrays.classes.html#matrix-objects
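A small sketch of the differences listed above, using made-up 2x2 data. Note that current NumPy documentation discourages the matrix class for new code; it is shown here only because the slide discusses it.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])

elementwise = a * b          # ndarray: * multiplies entry by entry
matprod = np.dot(a, b)       # ndarray: explicit matrix product

A = np.asmatrix(a)           # matrix view of the same data, no copy
B = np.asmatrix(b)
matrix_star = A * B          # matrix: * is the matrix product

row_of_array = a[0]          # indexing an ndarray drops a dimension
row_of_matrix = A[0]         # a matrix stays 2D
print(row_of_array.shape, row_of_matrix.shape)
```

The last two lines are the kind of shape difference behind the bug warning on the slide.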

Page 19: Python as number crunching code glue

Memory layout of matrices

column major: first dimension is contiguous in memory
  Fortran, Matlab, R,...

row major: last dimension is contiguous in memory
  C, Java, numpy,...

Why you should care:
• Cache coherence
• Transposing a matrix is very expensive
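The two layouts can be inspected through an array's strides. A sketch with an arbitrary 2x3 array of 8-byte integers:

```python
import numpy as np

c_order = np.arange(6, dtype=np.int64).reshape(2, 3)   # row major (numpy default)
f_order = np.asfortranarray(c_order)                   # same values, column major

print(c_order.strides)   # bytes to step along each axis; last axis is contiguous
print(f_order.strides)   # first axis is contiguous

# .T only swaps the strides and returns a view. Physically reordering the
# data, which is the expensive part, happens only when a copy is forced.
t_view = c_order.T
t_copy = np.ascontiguousarray(t_view)
```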

Page 20: Python as number crunching code glue

Creating ndarrays

• from Python iterable: lists, tuples,...
  e.g. array([1, 2, 3]) == asarray((1, 2, 3))
• from intrinsic functions
  empty() allocates memory only
  zeros() initializes to 0
  ones() initializes to 1
  arange() creates a uniform range
  rand() initializes to uniform random
  randn() initializes to standard normal random
  ...
• from binary representation in string/buffer
• from file on disk
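A few of the constructors above in one place. The shapes and values are arbitrary; `tobytes()` and `frombuffer()` are used here as one way to illustrate the "binary representation in a buffer" bullet.

```python
import numpy as np

from_list = np.array([1, 2, 3])
from_tuple = np.asarray((1, 2, 3))

zeros = np.zeros((2, 3))
ones = np.ones(4)
steps = np.arange(0, 10, 2)
uniform = np.random.rand(1000)     # uniform on [0, 1)
normal = np.random.randn(1000)     # standard normal

# From a binary representation held in a bytes buffer.
raw = np.arange(4, dtype=np.int32).tobytes()
from_buffer = np.frombuffer(raw, dtype=np.int32)
```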

Page 21: Python as number crunching code glue

Generating ndarrays

fromfunction() creates an ndarray whose entries are functions of its indices

e.g. the Hilbert matrix, H[i,j] = 1/(i+j-1) for i, j = 1..n (indices are 0-based in the code, hence i+j+1)

>>> np.fromfunction(lambda i,j: 1./(i+j+1), (4,4))
array([[ 1.        ,  0.5       ,  0.33333333,  0.25      ],
       [ 0.5       ,  0.33333333,  0.25      ,  0.2       ],
       [ 0.33333333,  0.25      ,  0.2       ,  0.16666667],
       [ 0.25      ,  0.2       ,  0.16666667,  0.14285714]])

Page 22: Python as number crunching code glue

Generating ndarrays

arange(): like range() but accepts floats
>>> import numpy as np
>>> np.arange(2, 2.5, 0.1)
array([ 2. ,  2.1,  2.2,  2.3,  2.4])

linspace(): creates array with specified number of elements, spaced equally between the specified beginning and ending.
>>> np.linspace(2.0, 2.4, 5)
array([ 2. ,  2.1,  2.2,  2.3,  2.4])

Page 23: Python as number crunching code glue

ndarray native I/O

Format      Reader                     Writer
pickle      pickle.loads(), np.load()  dumps()
NPY         np.load()                  np.save()
NPZ         np.load()                  np.savez()
Memory map  np.memmap                  np.memmap

NPY is numpy’s native binary format
NPZ is a zip file of NPYs
Memory map: a class useful for handling huge matrices; won’t load entire matrix into memory
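A round trip through the NPY and NPZ formats might look like this. The file names are placeholders inside a temporary directory, and `mmap_mode="r"` is used as a convenient way to get a memory map from `np.load()`.

```python
import os
import tempfile
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)
b = np.ones(5)

with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "a.npy")
    npz_path = os.path.join(tmp, "ab.npz")

    np.save(npy_path, a)                 # one array per .npy file
    a_back = np.load(npy_path)

    np.savez(npz_path, a=a, b=b)         # several named arrays in one .npz
    with np.load(npz_path) as archive:
        b_back = archive["b"]

    # Memory-mapped read: data is pulled from disk only when accessed.
    mapped = np.load(npy_path, mmap_mode="r")
    corner = float(mapped[2, 3])
    del mapped                           # release the mapping before cleanup
```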

Page 24: Python as number crunching code glue

ndarray text I/O

Format         Reader                                        Writer
String         eval(), or below with StringIO                np.array_repr(), or below with StringIO
Text file      np.loadtxt(), np.genfromtxt(), np.recfromtxt()  savetxt()
CSV            np.recfromcsv()
Matrix Market  scipy.io.mmread()                             mmwrite()
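The "with StringIO" entries in the table mean that the text-file readers and writers also accept in-memory file objects. A sketch in Python 3, where the class lives in the `io` module; the data and format string are arbitrary.

```python
import io
import numpy as np

x = np.array([[1.5, 2.0], [3.25, 4.0]])

# Write the array as comma-separated text into an in-memory buffer.
buf = io.StringIO()
np.savetxt(buf, x, fmt="%.4f", delimiter=",")
text = buf.getvalue()

# Rewind and parse it back.
buf.seek(0)
x_back = np.loadtxt(buf, delimiter=",")
```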

Page 25: Python as number crunching code glue

ndarray binary I/O

Format           Reader                                         Writer
List             np.array()                                     ndarray.tolist()
String           np.fromstring(), or below with StringIO        tostring(), or below with StringIO
Raw binary file  scipy.io.numpyio.fread(), np.fromfile()        fwrite(), .tofile()
MATLAB           scipy.io.loadmat()                             savemat()
netCDF           scipy.io.netcdf.netcdf_file                    scipy.io.netcdf.netcdf_file
WAV audio        scipy.io.wavfile.read()                        write()
Image (via PIL)  scipy.misc.imread(), scipy.misc.fromimage()    imsave(), toimage()

Also video (OpenCV), HDF5 (PyTables), FITS (PyFITS)...

Page 26: Python as number crunching code glue

Indexing

>>> x = np.arange(12).reshape(3,4); x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> x[1,2] # row, then column
6
>>> x[2,-1]
11
>>> x[0][2]
2
>>> x[(2,2)] # tuple
10
>>> x[:1] # slices return views, not copies
array([[0, 1, 2, 3]])
>>> x[::2,1:4:2]
array([[ 1,  3],
       [ 9, 11]])
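The remark that slices return views has a practical consequence: writing to a slice writes to the original array. A sketch using the same `x` as above; the variable names are illustrative.

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

view = x[::2, 1:4:2]     # basic slicing: a view onto x's memory
view[0, 0] = -1          # writes through to x[0, 1]

fancy = x[[0, 2]]        # indexing with a list of rows: a copy
fancy[0, 0] = 99         # x is unchanged

safe = x[:1].copy()      # ask for a copy explicitly when one is needed
```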

Page 27: Python as number crunching code glue

Fancy indexing

>>> x = np.arange(12).reshape(3,4); x
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> x[(2,2)]
10
>>> x[np.array([2,2])] #array index; same as x[[2,2]]
array([[ 8,  9, 10, 11],
       [ 8,  9, 10, 11]])
>>> x[np.array([1,0]), np.array([2,1])]
array([6, 1])
>>> x[x>8] #Boolean mask
array([ 9, 10, 11])
>>> x>8
array([[False, False, False, False],
       [False, False, False, False],
       [False,  True,  True,  True]], dtype=bool)

Page 28: Python as number crunching code glue

Fancy indexing II

>>> y = np.arange(1*2*3*4).reshape(1,2,3,4); y
array([[[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]]])

>>> y[0, Ellipsis, 0] # == y[0, ..., 0] == y[0,:,:,0]
array([[ 0,  4,  8],
       [12, 16, 20]])
>>> y[0, 0, 0, slice(2,4)] # == y[0, 0, 0, 2:4]
array([2, 3])

Page 29: Python as number crunching code glue

Broadcasting

What happens when you multiply ndarrays of different dimensions?

Case I: trailing dimensions match

>>> x #.shape = (3,4)
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> y #.shape = (1,2,3,4)
array([[[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11]],

        [[12, 13, 14, 15],
         [16, 17, 18, 19],
         [20, 21, 22, 23]]]])
>>> y * x
array([[[[  0,   1,   4,   9],
         [ 16,  25,  36,  49],
         [ 64,  81, 100, 121]],

        [[  0,  13,  28,  45],
         [ 64,  85, 108, 133],
         [160, 189, 220, 253]]]])

Page 30: Python as number crunching code glue

Broadcasting

What happens when you multiply ndarrays of different dimensions?

>>> a = np.arange(4); a
array([0, 1, 2, 3])
>>> b = np.arange(4)[::-1]; b
array([3, 2, 1, 0])
>>> a + b
array([3, 3, 3, 3])

Case II: trailing dimension is 1

>>> b.shape = 4,1
>>> a + b
array([[3, 4, 5, 6],
       [2, 3, 4, 5],
       [1, 2, 3, 4],
       [0, 1, 2, 3]])

>>> b.shape = 1,4
>>> a + b
array([[3, 3, 3, 3]])
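Both cases follow from one rule: shapes are aligned from the right, and a dimension of size 1 (or a missing leading dimension) is stretched to match. A sketch of that rule, including what happens when it cannot be applied; the arrays are illustrative.

```python
import numpy as np

a = np.arange(4)                          # shape (4,)
col = np.arange(4)[::-1].reshape(4, 1)    # shape (4, 1)

# (4,) against (4, 1) broadcasts to (4, 4).
table = a + col

# Incompatible trailing dimensions raise an error rather than guessing.
try:
    np.arange(3) + np.arange(4)
    failed = False
except ValueError:
    failed = True

# The broadcast shape can be checked without doing any arithmetic.
result_shape = np.broadcast(np.empty((1, 2, 3, 4)), np.empty((3, 4))).shape
```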

Page 31: Python as number crunching code glue

Matrix operations

In 2D, the matrix class is often more useful than ndarrays, especially when porting Matlab/Octave code.

* For matrices, a*b is matrix multiply. For ndarrays, a*b is elementwise multiply.

* Matrices have convenient attributes:
  M.T  transpose of M
  M.H  Hermitian conjugate of M
  M.I  matrix inverse of M

* Matrices are always 2D, no matter how you manipulate them.
  ****** This can lead to some very severe, insidious bugs. ******

Using asarray() and asmatrix() views allows the best of both worlds.
see: http://docs.scipy.org/doc/numpy/reference/arrays.classes.html#matrix-objects

Page 32: Python as number crunching code glue

Matrix functions

You can apply a function elementwise to a matrix...
>>> from numpy import array, exp
>>> X = array([[1, 1], [1, 0]])
>>> exp(X)
array([[ 2.71828183,  2.71828183],
       [ 2.71828183,  1.        ]])

...or a matrix version of that function
>>> from scipy.linalg import expm
>>> expm(X).round(4)
array([[ 3.7982,  2.0143],
       [ 2.0143,  1.7839]])

other functions in scipy.linalg.matfuncs
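One way to see that the matrix exponential is a different object from the elementwise one is to rebuild it from an eigendecomposition. The eigenvalue check below is an added illustration, not part of the original slide, and relies on X being symmetric.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[1.0, 1.0], [1.0, 0.0]])

elementwise = np.exp(X)    # exp of each entry
matrix_exp = expm(X)       # I + X + X^2/2! + X^3/3! + ...

# For symmetric X, expm(X) = V diag(exp(w)) V^T with (w, V) from eigh.
w, V = np.linalg.eigh(X)
via_eig = V @ np.diag(np.exp(w)) @ V.T

print(np.allclose(matrix_exp, via_eig))
```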

Page 33: Python as number crunching code glue

SciPy by example

* Data fitting

* Signal matching

* Disease outbreak modeling (epidemiology)

http://scipy-central.org/

Page 34: Python as number crunching code glue

Least-squares curve fitting

from scipy import *
from scipy.optimize import leastsq
from matplotlib.pyplot import plot

#Make up data x(t) with Gaussian noise
num_points = 150
t = linspace(5, 8, num_points)
x = 11.86*cos(2*pi/0.81*t-1.32) + 0.64*t\
    +4*((0.5-rand(num_points))*\
    exp(2*rand(num_points)**2))

# Target function
model = lambda p, x: \
    p[0]*cos(2*pi/p[1]*x+p[2]) + p[3]*x
# Distance to the target function
error = lambda p, x, y: model(p, x) - y
# Initial guess for the parameters
p0 = [-15., 0.8, 0., -1.]
p1, _ = leastsq(error, p0, args=(t, x))   # fit data to model

t2 = linspace(t.min(), t.max(), 100)
plot(t, x, "ro", t2, model(p1, t2), "b-")
raw_input()

Page 35: Python as number crunching code glue

Matching signals

Suppose I have a short audio clip that I know to be part of a larger file

How can I figure out its offset?

Problem: naïve matching scales as O(N^2)

Page 36: Python as number crunching code glue

An O(N lg N) solution

Naïve matching scales as O(N^2)

How can we do faster?

Exploit Fourier transforms: they encode relative offsets in complex phase (phase correlation)

(figure: a phase of 60° corresponds to an offset of 1/6 of a period)

Page 38: Python as number crunching code glue

From math to code

import numpy

#Make up some data
N = 30000
idx = 24700
size = 300
data = numpy.random.rand(N)
frag_pad = numpy.zeros(N)
frag = data[idx:idx+size]
frag_pad[:size] = frag

#Compute phase correlation
data_ft = numpy.fft.rfft(data)
frag_ft = numpy.fft.rfft(frag_pad)
phase = data_ft * numpy.conj(frag_ft)
phase /= abs(phase)
cross_correlation = numpy.fft.irfft(phase)
offset = numpy.argmax(cross_correlation)

print 'Input offset: %d, computed: %d' % (idx, offset)
from matplotlib.pyplot import plot
plot(cross_correlation)
raw_input() #Pause
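For comparison, here is a sketch that runs a brute-force search next to the FFT approach on the same synthetic data. It is written for Python 3; the sizes, the seed and the zero-mean noise are choices made for this example, not taken from the slide. The brute-force loop costs one pass over the fragment per candidate offset, which is where the quadratic scaling comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
N, idx, size = 4096, 2500, 1024     # demo sizes, smaller than the slide's
data = rng.standard_normal(N)
frag = data[idx:idx + size]

# Brute force: slide the fragment along the data and keep the best match.
errors = [np.sum((data[k:k + size] - frag) ** 2) for k in range(N - size + 1)]
offset_naive = int(np.argmin(errors))

# Phase correlation: normalize the cross-spectrum to unit magnitude,
# transform back, and look for the peak.
frag_pad = np.zeros(N)
frag_pad[:size] = frag
cross = np.fft.rfft(data) * np.conj(np.fft.rfft(frag_pad))
cross /= np.abs(cross)
offset_fft = int(np.argmax(np.fft.irfft(cross, n=N)))

print("true: %d, brute force: %d, FFT: %d" % (idx, offset_naive, offset_fft))
```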

Page 41: Python as number crunching code glue

Modeling a zombie apocalypse

http://www.scipy.org/Cookbook/Zombie_Apocalypse_ODEINT

Each person can be in one of three states:

Normal (S)    Zombie (Z)    Dead (R)

Page 42: Python as number crunching code glue

Modeling a zombie apocalypse

http://www.scipy.org/Cookbook/Zombie_Apocalypse_ODEINT

Normal (S)    Zombie (Z)    Dead (R)

Various processes connect these states:

birth (P): adds to Normal (S)
normal death (d): Normal (S) -> Dead (R)
transmission (B): Normal (S) -> Zombie (Z)
resurrection (G): Dead (R) -> Zombie (Z)
destruction (A): Zombie (Z) -> Dead (R)

Page 43: Python as number crunching code glue

From math to code

http://www.scipy.org/Cookbook/Zombie_Apocalypse_ODEINT

(diagram: states S, Z, R connected by the rates P, d, B, G, A)

from numpy import linspace
from scipy.integrate import odeint

P = 0       # birth rate
d = 0.0001  # natural death rate
B = 0.0095  # transmission rate
G = 0.0001  # resurrection rate
A = 0.0001  # destruction rate
def f(y, t):
    Si, Zi, Ri = y
    return [P - B*Si*Zi - d*Si,
            B*Si*Zi + G*Ri - A*Si*Zi,
            d*Si + A*Si*Zi - G*Ri]

y0 = [500, 0, 0]            # initial conditions
t = linspace(0, 5., 1000)   # time grid

soln = odeint(f, y0, t)     # solve ODE
S, Z, R = soln[:, :].T
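The right-hand side of `f` encodes three coupled rate equations, written out as comments in the sketch below. The conservation check at the end is an addition for illustration: the three rates sum to P, so with P = 0 the total population S + Z + R should stay constant, which is a cheap sanity test on the integration.

```python
import numpy as np
from scipy.integrate import odeint

P, d, B, G, A = 0.0, 0.0001, 0.0095, 0.0001, 0.0001

def f(y, t):
    # dS/dt = P - B*S*Z - d*S
    # dZ/dt = B*S*Z + G*R - A*S*Z
    # dR/dt = d*S + A*S*Z - G*R
    S, Z, R = y
    return [P - B*S*Z - d*S,
            B*S*Z + G*R - A*S*Z,
            d*S + A*S*Z - G*R]

t = np.linspace(0, 5.0, 1000)
soln = odeint(f, [500.0, 0.0, 0.0], t)
S, Z, R = soln.T

# With P = 0 nobody enters or leaves the system, so the total is conserved.
total = S + Z + R
print(total.min(), total.max())
```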

Page 44: Python as number crunching code glue

Using external code

“NumPy can get you most of the way to compiled speeds through vectorization. In situations where you still need the last ounce of speed in a critical section, or when it either requires a PhD in NumPy-ology to vectorize the solution or it results in too much memory overhead, you can reach for Cython or Weave. If you already know C/C++, then weave is a simple and speedy solution. If, however, you are not already familiar with C then you may find Cython to be exactly what you are looking for to get the speed you need out of Python.” - Travis Oliphant, 2011-06-20

see:
http://www.scipy.org/PerformancePython
http://technicaldiscovery.blogspot.com/2011/06/speeding-up-python-numpy-cython-and.html

Page 45: Python as number crunching code glue

Python as code glue

- numpy.f2py: wraps
  * C, Fortran 77/90/95 functions
  * Fortran 90/95 module data
  * Fortran 77 COMMON blocks

- scipy.weave
  * .inline: compiles & runs C/C++ code manipulating Python scalars/ndarrays
  * .blitz: interfaces with Blitz++

Other wrapper libraries and programs: see http://scipy.org/Topical_Software

Page 46: Python as number crunching code glue

numpy.f2py: Fortran/C

Fortran:

$ cat>invsqrt.f
      real*8 function invsqrt (a)
      real*8 a
      invsqrt = 1.0/sqrt(a)
      end

$ f2py -c -m invsqrt invsqrt.f
$ python -c 'import invsqrt; print invsqrt.invsqrt(4)'
0.5

C:

$ cat>invsqrt.c
#include <math.h>
double invsqrt(a) {
  return 1.0/sqrt(a);
}
$ cat>invsqrt.m
python module invsqrt
interface
  real*8 function invsqrt(x)
    intent(c) :: invsqrt
    real*8 intent(in) :: x
  end function invsqrt
end interface
end python module invsqrt
$ f2py invsqrt.m invsqrt.c -c
$ python -c 'import invsqrt; print invsqrt.invsqrt(4)'
0.5

see: http://www.scipy.org/F2py

Page 47: Python as number crunching code glue

scipy.weave.inline

>>> from scipy.weave import inline
>>> x = 4.0
>>> inline('return_val = 1./sqrt(x);', ['x'])
0.5

(diagram: python -> scipy.weave inline -> distutils core Extension -> on-the-fly compiled C/C++ program)

see: https://github.com/scipy/scipy/blob/master/scipy/weave/doc/tutorial.txt

Page 48: Python as number crunching code glue

scipy.weave.blitz

Uses the Blitz++ numerical library for C++
Converts between ndarrays and Blitz arrays

>>> # Computes five-point average using numpy and weave.blitz
>>> import numpy
>>> from scipy.weave import blitz
>>> a = numpy.zeros((4096,4096)); c = numpy.zeros((4096, 4096))
>>> b = numpy.random.randn(4096,4096)
>>> c[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1] + b[1:-1,2:] + b[1:-1,:-2]) / 5.0
>>> blitz("a[1:-1,1:-1] = (b[1:-1,1:-1] + b[2:,1:-1] + b[:-2,1:-1] + b[1:-1,2:] + b[1:-1,:-2]) / 5.")
>>> (a == c).all()
True

see: https://github.com/scipy/scipy/blob/master/scipy/weave/doc/tutorial.txt

Page 49: Python as number crunching code glue

Parallelization

The easy way: numpy/scipy’s primitives automatically use vectorization compiled into external BLAS/LAPACK/... libraries

The usual way:
- MPI interfaces (mpi4py,...)
- Python threads/multiprocessing/...
- OpenMP/pthreads... in external C/Fortran

see: http://www.scipy.org/ParallelProgramming

Page 50: Python as number crunching code glue

How I use NumPy/Scipy

(workflow diagram; labels: Text input, Matrices, Test model, Visualize, Text output, External binary, Binary output)

scipy.optimize: Quasi-Newton optimizers

Binary output is read back with np.fromfile()
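As an illustration of the quasi-Newton optimizers mentioned above, here is a minimal sketch using BFGS through `scipy.optimize.minimize`. The Rosenbrock test function (shipped with SciPy as `rosen`) stands in for a real model; the starting point is the conventional one for this test problem, not something from the talk.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])

# BFGS builds up an approximation to the Hessian from successive gradients.
result = minimize(rosen, x0, jac=rosen_der, method="BFGS")

print(result.x, result.fun, result.nit)
```

In a workflow like the one on the slide, `rosen` would be replaced by a function that evaluates the model's misfit, possibly by calling an external binary.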

Page 51: Python as number crunching code glue

Beyond NumPy/SciPy

(diagram: the Python / NumPy / SciPy / external Fortran/C stack, used from "My script" or "My interactive session", alongside other packages)

PyTables: HDF5 file I/O
CVXOpt: numerical optimization
VTK: visualization
matplotlib, Pylab: plots
PyMol: molecule viz.

many more examples at http://www.scipy.org/Topical_Software