effective numerical computation in numpy and scipy

Effective Numerical Computation in NumPy and SciPy. Kimikazu Kato, PyCon JP 2014, September 13, 2014.

Upload: kimikazu-kato

Post on 22-Nov-2014


DESCRIPTION

Presented at PyCon JP 2014. Video is available at http://bit.ly/1tXYhw6. This talk explores case studies of effective usage of NumPy/SciPy and shows that the computational speed sometimes improves drastically with the appropriate derivation of formulas and a performance-conscious implementation. I especially focus on scipy.sparse, the module for sparse matrices, which is often useful in the areas of machine learning and natural language processing.

TRANSCRIPT

Page 1: Effective Numerical Computation in NumPy and SciPy

Effective Numerical Computation in NumPy and SciPy

Kimikazu Kato

PyCon JP 2014

September 13, 2014


About Myself

Kimikazu Kato

Chief Scientist at Silver Egg Technology Co., Ltd.

Ph.D. in Computer Science

Background in mathematics, numerical computation, algorithms, etc.

Less than 2 years of experience in Python; more than 10 years of experience in numerical computation

Now designing algorithms for recommendation systems, and doing research on machine learning and data analysis.


This talk...

...is about effective usage of NumPy/SciPy.

...is NOT an exhaustive introduction to their capabilities, but shows some case studies based on my experience and interest.


Table of Contents

Introduction

Basics about NumPy: Broadcasting, Indexing

Sparse matrix: Usage of scipy.sparse, Internal structure

Case studies

Conclusion


Numerical Computation

Differential equations, simulations, signal processing, machine learning, etc.

Why Numerical Computation in Python?

Productivity: easy to write, easy to debug

Connectivity with visualization tools: Matplotlib, IPython

Connectivity with web systems: many frameworks (Django, Pyramid, Flask, Bottle, etc.)


But Python is Very Slow!

Code in C

    #include <stdio.h>

    int main() {
        int i;
        double s=0;
        for (i=1; i<=100000000; i++) s+=i;
        printf("%.0f\n",s);
    }

Code in Python

    s=0.
    for i in xrange(1,100000001):
        s+=i
    print s

Both of the codes compute the sum of integers from 1 to 100,000,000.

Result of a benchmark in a certain environment:

C: 0.109 sec (compiled with the -O3 option)

Python: 8.657 sec (80+ times slower!!)


Better code

    import numpy as np
    a=np.arange(1,100000001)
    print a.sum()

Now it takes 0.188 sec. (Measured by the "time" command in Linux, loading time included.)

Still slower than C, but sufficiently fast for a scripting language.
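For readers on Python 3, where xrange and the print statement no longer exist, the comparison above can be sketched as follows. This is a minimal sketch, not the benchmark from the slide: the names sum_loop and sum_numpy are hypothetical, n is reduced to 10**6 to keep the run short, and np.int64 is requested explicitly so the total cannot overflow on platforms whose default integer is 32-bit.

```python
import time

import numpy as np


def sum_loop(n):
    """Sum 1..n with an interpreted Python loop (hypothetical helper)."""
    s = 0
    for i in range(1, n + 1):
        s += i
    return s


def sum_numpy(n):
    """Sum 1..n with a single vectorized NumPy call (hypothetical helper)."""
    # int64 avoids overflow where the default integer type is 32-bit
    return int(np.arange(1, n + 1, dtype=np.int64).sum())


if __name__ == "__main__":
    n = 10**6
    for fn in (sum_loop, sum_numpy):
        start = time.perf_counter()
        result = fn(n)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: {result} ({elapsed:.4f} sec)")
```

Both functions return n(n+1)/2; absolute timings depend on the machine, so none are quoted here.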


Lessons

Python is very slow when written badly.

Translating C (or Java, C#, etc.) code directly into Python is often a bad idea.

Python-friendly rewriting sometimes results in a drastic performance improvement.


Basic rules for better performance

Avoid for loops as far as possible.

Utilize the libraries' capabilities instead.

Forget about the cost of copying memory.

A typical C programmer might care about it, but ...
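To illustrate the third rule, here is a hypothetical comparison (poly_loop and poly_vectorized are names made up for this sketch). The vectorized version allocates several temporary arrays, which a C programmer would try to avoid, yet each operation runs in compiled code; in CPython the element-by-element loop is usually far slower even though it allocates nothing extra.

```python
import timeit

import numpy as np


def poly_loop(x):
    """C-style: preallocate the output and fill it element by element."""
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = 2.0 * x[i] ** 2 + 3.0 * x[i] + 1.0
    return out


def poly_vectorized(x):
    """NumPy-style: each operator creates a temporary array, but runs in C."""
    return 2.0 * x ** 2 + 3.0 * x + 1.0


if __name__ == "__main__":
    x = np.random.random(10_000)
    for fn in (poly_loop, poly_vectorized):
        t = timeit.timeit(lambda: fn(x), number=5)
        print(f"{fn.__name__}: {t:.4f} sec")
```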


Basic techniques for NumPy

Broadcasting

Indexing


Broadcasting

    >>> import numpy as np
    >>> a=np.array([0,1,2])
    >>> a*3
    array([0, 3, 6])

    >>> b=np.array([1,4,9])
    >>> np.sqrt(b)
    array([ 1.,  2.,  3.])

A function that is applied to each element when called on an array is called a universal function (ufunc).
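A quick way to see what qualifies as a universal function is to check for np.ufunc instances. The sketch below uses plain NumPy only; it also shows that the same ufunc accepts a scalar, a 1-D array, or an N-D array, and that a binary ufunc such as np.multiply broadcasts a scalar argument just like the a*3 example.

```python
import numpy as np

# np.sqrt and np.tanh are ufuncs, i.e. instances of np.ufunc
print(isinstance(np.sqrt, np.ufunc), isinstance(np.tanh, np.ufunc))  # True True

b = np.array([1.0, 4.0, 9.0])

# The same ufunc accepts a scalar, a 1-D array, or an N-D array
print(np.sqrt(9.0))                    # 3.0
print(np.sqrt(b))                      # [1. 2. 3.]
print(np.sqrt(b.reshape(3, 1)).shape)  # (3, 1)

# A binary ufunc broadcasts the scalar 3 to the shape of b
print(np.multiply(b, 3))               # [ 3. 12. 27.]
```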


Broadcasting (2D)

    >>> import numpy as np
    >>> a=np.arange(9).reshape((3,3))
    >>> b=np.array([1,2,3])
    >>> a
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    >>> b
    array([1, 2, 3])
    >>> a*b
    array([[ 0,  2,  6],
           [ 3,  8, 15],
           [ 6, 14, 24]])
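In the session above, b has shape (3,) and is treated as a row of shape (1, 3), so column j of a is multiplied by b[j]. To scale rows instead, one can add an axis with np.newaxis. The sketch below checks both forms against explicit multiplication by a diagonal matrix; the same idea reappears in Case 3.

```python
import numpy as np

a = np.arange(9).reshape((3, 3))
b = np.array([1, 2, 3])

# b is broadcast as shape (1, 3): column j of a is scaled by b[j]
col_scaled = a * b
# b[:, np.newaxis] has shape (3, 1): row i of a is scaled by b[i]
row_scaled = a * b[:, np.newaxis]

# Same results as multiplying by diag(b) on the right / on the left
assert np.array_equal(col_scaled, a @ np.diag(b))
assert np.array_equal(row_scaled, np.diag(b) @ a)
print(row_scaled)
```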


Indexing

    >>> import numpy as np
    >>> a=np.arange(10)
    >>> a
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> indices=np.arange(0,10,2)
    >>> indices
    array([0, 2, 4, 6, 8])
    >>> a[indices]=0
    >>> a
    array([0, 1, 0, 3, 0, 5, 0, 7, 0, 9])
    >>> b=np.arange(100,600,100)
    >>> b
    array([100, 200, 300, 400, 500])
    >>> a[indices]=b
    >>> a
    array([100,   1, 200,   3, 300,   5, 400,   7, 500,   9])
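When the positions follow a regular pattern, basic slicing does the same job as an index array, and a boolean mask selects elements by a condition rather than by position. A short sketch using only standard NumPy indexing:

```python
import numpy as np

# Integer-array ("fancy") indexing, as in the session above
a = np.arange(10)
a[np.arange(0, 10, 2)] = np.arange(100, 600, 100)

# The same positions expressed as a slice with step 2
b = np.arange(10)
b[::2] = np.arange(100, 600, 100)

# Boolean-mask indexing: select elements by a condition
c = np.arange(10)
c[c % 2 == 1] = 0   # zero out every odd value

print(a)
print(c)
```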



Sparse matrix

A sparse matrix is defined as a matrix in which most elements are zero. A compressed data structure is used to express it, so that it is...

space-efficient

time-efficient
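A rough way to see the space savings is to compare the bytes held by a dense array with the bytes held by the three arrays of a CSR matrix (the format whose internals are shown later in the talk). This sketch assumes an arbitrary randomly generated 2000 × 2000 matrix with 0.1% non-zeros, created with scipy.sparse.random; the sizes are illustrative only.

```python
import numpy as np
from scipy.sparse import random as sparse_random

n = 2000
# Illustrative random matrix: about 0.1% of the n*n entries are non-zero
s = sparse_random(n, n, density=0.001, format="csr", dtype=np.float64)
dense = s.toarray()

dense_bytes = dense.nbytes  # n*n float64 values, zeros included
# CSR keeps only the non-zero values plus two integer index arrays
sparse_bytes = s.data.nbytes + s.indices.nbytes + s.indptr.nbytes

print(f"non-zeros: {s.nnz}")
print(f"dense : {dense_bytes:,} bytes")
print(f"sparse: {sparse_bytes:,} bytes")
```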


scipy.sparse

The module scipy.sparse mainly provides three types for expressing a sparse matrix. (There are other types, but they are not mentioned here.)

lil_matrix: convenient for setting data; setting a[i,j] is fast

csr_matrix: convenient for computation; fast to retrieve a row

csc_matrix: convenient for computation; fast to retrieve a column

Usually, set the data into a lil_matrix, and then convert it to a csc_matrix or csr_matrix.

For csr_matrix and csc_matrix, calculation between matrices of the same type is fast, but you should avoid calculations that mix different types.
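The recommended workflow (fill a lil_matrix, then convert once) might look like the sketch below; the size n = 5 and the diagonal values are arbitrary. Checking .format shows that csr + csr stays CSR. Adding a csc_matrix to a csr_matrix also works, but SciPy typically has to convert one operand to the other's format first, which is the extra cost the rule above warns about.

```python
import numpy as np
from scipy.sparse import lil_matrix

n = 5
a = lil_matrix((n, n))
# lil_matrix: cheap element-by-element assignment
for i in range(n):
    a[i, i] = float(i + 1)
a = a.tocsr()           # convert once, after all data has been set

b = a.copy()
c = a + b               # same type: csr + csr
print(c.format)         # csr

d = a + b.tocsc()       # mixed types: works, but one operand gets converted
print(np.allclose(c.toarray(), d.toarray()))  # True
```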


Use case

    >>> from scipy.sparse import lil_matrix, csr_matrix
    >>> a=lil_matrix((3,3))
    >>> a[0,0]=1.; a[0,2]=2.
    >>> a=a.tocsr()
    >>> print a
      (0, 0)	1.0
      (0, 2)	2.0
    >>> a.todense()
    matrix([[ 1.,  0.,  2.],
            [ 0.,  0.,  0.],
            [ 0.,  0.,  0.]])
    >>> b=lil_matrix((3,3))
    >>> b[1,1]=3.; b[2,0]=4.; b[2,2]=5.
    >>> b=b.tocsr()
    >>> b.todense()
    matrix([[ 0.,  0.,  0.],
            [ 0.,  3.,  0.],
            [ 4.,  0.,  5.]])
    >>> c=a.dot(b)
    >>> c.todense()
    matrix([[  8.,   0.,  10.],
            [  0.,   0.,   0.],
            [  0.,   0.,   0.]])
    >>> d=a+b
    >>> d.todense()
    matrix([[ 1.,  0.,  2.],
            [ 0.,  3.,  0.],
            [ 4.,  0.,  5.]])


Internal structure: csr_matrix

    >>> from scipy.sparse import lil_matrix, csr_matrix
    >>> a=lil_matrix((3,3))
    >>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5.
    >>> b=a.tocsr()
    >>> b.todense()
    matrix([[ 0.,  1.,  2.],
            [ 0.,  0.,  3.],
            [ 4.,  5.,  0.]])
    >>> b.indices
    array([1, 2, 2, 0, 1], dtype=int32)
    >>> b.data
    array([ 1.,  2.,  3.,  4.,  5.])
    >>> b.indptr
    array([0, 2, 3, 5], dtype=int32)
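The three arrays can be read as follows: the stored values of row i are data[indptr[i]:indptr[i+1]], and their column numbers are the matching slice of indices. The pure-Python decoder below only makes that rule concrete; csr_to_dense is a hypothetical helper, and in practice one would simply call .toarray().

```python
import numpy as np


def csr_to_dense(data, indices, indptr, shape):
    """Rebuild a dense array from the three CSR arrays (illustration only)."""
    out = np.zeros(shape)
    for row in range(shape[0]):
        # The entries of this row occupy positions indptr[row] .. indptr[row+1]-1
        for k in range(indptr[row], indptr[row + 1]):
            out[row, indices[k]] = data[k]
    return out


# The arrays printed in the session above
data = np.array([1., 2., 3., 4., 5.])
indices = np.array([1, 2, 2, 0, 1])
indptr = np.array([0, 2, 3, 5])
print(csr_to_dense(data, indices, indptr, (3, 3)))
```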


Internal structure: csc_matrix

    >>> from scipy.sparse import lil_matrix, csr_matrix
    >>> a=lil_matrix((3,3))
    >>> a[0,1]=1.; a[0,2]=2.; a[1,2]=3.; a[2,0]=4.; a[2,1]=5.
    >>> b=a.tocsc()
    >>> b.todense()
    matrix([[ 0.,  1.,  2.],
            [ 0.,  0.,  3.],
            [ 4.,  5.,  0.]])
    >>> b.indices
    array([2, 0, 2, 0, 1], dtype=int32)
    >>> b.data
    array([ 4.,  1.,  5.,  2.,  3.])
    >>> b.indptr
    array([0, 1, 3, 5], dtype=int32)
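Comparing the two layouts, the CSC arrays of a matrix coincide with the CSR arrays of its transpose: indptr now walks over columns and indices holds row numbers. A small check, using the same 3 × 3 matrix built directly from a dense array:

```python
import numpy as np
from scipy.sparse import csc_matrix, csr_matrix

A = np.array([[0., 1., 2.],
              [0., 0., 3.],
              [4., 5., 0.]])
csc = csc_matrix(A)        # column-oriented storage of A
csr_t = csr_matrix(A.T)    # row-oriented storage of A transposed

for name in ("data", "indices", "indptr"):
    same = np.array_equal(getattr(csc, name), getattr(csr_t, name))
    print(name, getattr(csc, name), same)
```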


Merit of knowing the internal structure

Setting a csr_matrix or csc_matrix through its internal structure is much faster than setting a lil_matrix with indices.

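See the benchmark of setting the following n × n matrix:

$$\begin{pmatrix} 2 & 1 & & & \\ & 2 & 1 & & \\ & & \ddots & \ddots & \\ & & & 2 & 1 \\ & & & & 2 \end{pmatrix}$$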


    from scipy.sparse import lil_matrix, csr_matrix
    import numpy as np
    from timeit import timeit

    def set_lil(n):
        a=lil_matrix((n,n))
        for i in xrange(n):
            a[i,i]=2.
            if i+1<n:
                a[i,i+1]=1.
        return a

    def set_csr(n):
        data=np.empty(2*n-1)
        indices=np.empty(2*n-1,dtype=np.int32)
        indptr=np.empty(n+1,dtype=np.int32)
        # to be fair, a for loop is intentionally used
        # (using the indexing technique is faster)
        for i in xrange(n):
            indices[2*i]=i
            data[2*i]=2.
            if i<n-1:
                indices[2*i+1]=i+1
                data[2*i+1]=1.
            indptr[i]=2*i
        indptr[n]=2*n-1
        a=csr_matrix((data,indices,indptr),shape=(n,n))
        return a

    print "lil:",timeit("set_lil(10000)", number=10,setup="from __main__ import set_lil")
    print "csr:",timeit("set_csr(10000)", number=10,setup="from __main__ import set_csr")
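The comment in set_csr notes that the indexing technique is faster than the deliberate for loop. A minimal Python 3 sketch of that variant follows; set_csr_vectorized is a hypothetical name, and it fills the same data, indices, and indptr layout as set_csr using strided slices instead of a loop.

```python
import numpy as np
from scipy.sparse import csr_matrix


def set_csr_vectorized(n):
    """Build the n x n matrix with 2 on the diagonal and 1 on the
    superdiagonal, filling the CSR arrays by slicing instead of a loop."""
    nnz = 2 * n - 1
    data = np.empty(nnz)
    indices = np.empty(nnz, dtype=np.int32)
    indptr = np.empty(n + 1, dtype=np.int32)

    # Even positions hold the diagonal entry of row i, odd positions the
    # superdiagonal entry of row i (the last row has no superdiagonal)
    data[0::2] = 2.0
    data[1::2] = 1.0
    indices[0::2] = np.arange(n)
    indices[1::2] = np.arange(1, n)

    # Row i starts at position 2*i; the final entry closes the last row
    indptr[:n] = 2 * np.arange(n)
    indptr[n] = nnz
    return csr_matrix((data, indices, indptr), shape=(n, n))


if __name__ == "__main__":
    from timeit import timeit
    t = timeit("set_csr_vectorized(10000)", number=10, globals=globals())
    print("csr (vectorized):", t)
```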


Result (total seconds for 10 runs each):

lil: 11.6730761528

csr: 0.0562081336975

Remark

When you deal with already sorted data, setting a csr_matrix or csc_matrix with data, indices, and indptr is much faster than setting a lil_matrix.

But the code tends to be more complicated if you use the internal structure of csr_matrix or csc_matrix.


Case Studies


Case 1: Norms

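$$\|v\|^2 = \sum_i v_i^2$$

If v is dense:

    norm=np.dot(v,v)

This is expressed as a product of matrices. (dot means the matrix product, but you don't have to take the transpose explicitly.) Strictly speaking, norm here holds the squared norm ∥v∥².

When v is sparse, suppose that v is expressed as a 1 × n matrix:

    norm=v.multiply(v).sum()

(multiply() is the element-wise product.)

This is because taking the transpose of a sparse matrix changes its type (the transpose of a csr_matrix is a csc_matrix), so v.dot(v.T) would be a calculation between different types.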
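A small check of both forms, using an arbitrary vector whose squared norm is 3² + 4² = 25. It also prints the format of v.T, illustrating why the sparse version avoids v.dot(v.T). Both expressions give the squared norm; np.sqrt gives ∥v∥ itself.

```python
import numpy as np
from scipy.sparse import csr_matrix

v_dense = np.array([3.0, 0.0, 0.0, 4.0, 0.0])
sq_dense = np.dot(v_dense, v_dense)     # dense: dot product with itself

v = csr_matrix(v_dense)                 # sparse 1 x n matrix
sq_sparse = v.multiply(v).sum()         # element-wise square, then sum

print(sq_dense, sq_sparse)              # 25.0 25.0
print(v.T.format)                       # csc: the transpose changes the type
print(np.sqrt(sq_sparse))               # 5.0
```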


Frobenius norm:

$$\|A\|_{\mathrm{Fro}}^2 = \sum_{i,j} a_{ij}^2$$

    norm=a.multiply(a).sum()
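The same check for the Frobenius norm, on an arbitrary 3 × 3 matrix whose squared entries sum to 1 + 4 + 16 + 25 = 46. Because zeros contribute nothing, the stored values alone also suffice, assuming the matrix has no duplicate entries (true for one built from a dense array):

```python
import numpy as np
from scipy.sparse import csr_matrix

A_dense = np.array([[1., 0., 2.],
                    [0., 0., 0.],
                    [4., 0., 5.]])
a = csr_matrix(A_dense)

fro_sq = a.multiply(a).sum()                      # as on the slide
fro_sq_data = np.sum(a.data ** 2)                 # using only the stored values
fro_sq_dense = np.linalg.norm(A_dense, "fro") ** 2

print(fro_sq, fro_sq_data, fro_sq_dense)
```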


Case 2: Applying a function to all of the elements of a sparse matrix

A universal function can be applied to a dense matrix:

    >>> import numpy as np
    >>> a=np.arange(9).reshape((3,3))
    >>> a
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    >>> np.tanh(a)
    array([[ 0.        ,  0.76159416,  0.96402758],
           [ 0.99505475,  0.9993293 ,  0.9999092 ],
           [ 0.99998771,  0.99999834,  0.99999977]])

This is convenient and fast.

However, we cannot do the same thing for a sparse matrix.


    >>> from scipy.sparse import lil_matrix
    >>> a=lil_matrix((3,3))
    >>> a[0,0]=1.
    >>> a[1,0]=2.
    >>> b=a.tocsr()
    >>> np.tanh(b)
    <3x3 sparse matrix of type '<type 'numpy.float64'>'
            with 2 stored elements in Compressed Sparse Row format>

This is because, for an arbitrary function, its application to a sparse matrix is not necessarily sparse.

However, if a universal function f satisfies f(0) = 0, the sparsity is preserved.

Then, how can we compute it?


Use the internal structure!!

The positions of the non-zero elements are not changed by the application of the function.

Keep indices and indptr, and just change data.

Solution (where a is a csr_matrix):

    b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)
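The one-line solution generalizes to any element-wise function with f(0) = 0. In this sketch, apply_zero_preserving is a hypothetical helper name; it also merges duplicate entries first, because applying a nonlinear function such as tanh to two duplicates separately would not equal tanh of their sum.

```python
import numpy as np
from scipy.sparse import csr_matrix, lil_matrix


def apply_zero_preserving(func, m):
    """Apply func element-wise to a sparse matrix, assuming func(0) == 0.

    Only the stored values change; the sparsity pattern (indices, indptr)
    is reused as is."""
    m = csr_matrix(m, copy=True)
    m.sum_duplicates()  # one stored value per position before applying func
    return csr_matrix((func(m.data), m.indices, m.indptr), shape=m.shape)


a = lil_matrix((3, 3))
a[0, 0] = 1.0
a[1, 0] = 2.0
b = apply_zero_preserving(np.tanh, a)
print(b.toarray())
```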


Case 3: Formula which appears in a paper

In the algorithm for a recommendation system [1], the following formula appears:

$$A^T \cdot D \cdot A$$

where A is an n × f dense matrix, and D is a diagonal matrix defined from a given array (d_i) as:

$$D = \begin{pmatrix} d_1 & & & \\ & d_2 & & \\ & & \ddots & \\ & & & d_n \end{pmatrix}$$

Here, n (which corresponds to the number of users or items) is big and f (which means the number of latent factors) is small.

[1] Hu et al., "Collaborative Filtering for Implicit Feedback Datasets," ICDM, 2008.
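For reference, the formula can be written directly with a dense diagonal matrix, as in this sketch (naive_atda is a hypothetical name). It is fine for small n, but np.diag(d) materializes all n² entries: with n = 1,000,000, as in the benchmark later in the talk, that would be 10¹² float64 values (about 8 TB), which is why the solutions below avoid a dense D.

```python
import numpy as np


def naive_atda(a, d):
    """A^T D A with D built as a dense n x n array: O(n^2) memory."""
    return a.T @ np.diag(d) @ a


rng = np.random.default_rng(0)
a = rng.random((6, 3))      # n = 6, f = 3 (small, illustrative sizes)
d = rng.random(6)
print(naive_atda(a, d).shape)   # (3, 3)
```

Equivalently, A^T D A is the weighted sum of the outer products d_i a_i a_i^T over the rows a_i of A, which the test below uses as an independent check.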


Solution 1:

There is a special class dia_matrix to deal with a diagonal sparse matrix.

    import scipy.sparse as sparse
    import numpy as np

    def f(a,d):
        """a: 2d array of shape (n,f), d: 1d array of length n"""
        dd=sparse.diags([d],[0])
        return np.dot(a.T,dd.dot(a))


Solution 2:

Pack a csr_matrix with data, indices, indptr:

    data=d
    indices=[0,1,...,n-1]
    indptr=[0,1,...,n]

    def g(a,d):
        n,f=a.shape
        data=d
        indices=np.arange(n)
        indptr=np.arange(n+1)
        dd=sparse.csr_matrix((data,indices,indptr),shape=(n,n))
        return np.dot(a.T,dd.dot(a))


Solution 3:

This is equivalent to the broadcasting!

    def h(a,d):
        return np.dot(a.T*d,a)

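$$(A^T D) A =
\begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & & \vdots \\
a_{1f} & a_{2f} & \cdots & a_{nf}
\end{pmatrix}
\begin{pmatrix}
d_1 & & & \\
& d_2 & & \\
& & \ddots & \\
& & & d_n
\end{pmatrix}
A
=
\begin{pmatrix}
a_{11}d_1 & a_{21}d_2 & \cdots & a_{n1}d_n \\
a_{12}d_1 & a_{22}d_2 & \cdots & a_{n2}d_n \\
\vdots & \vdots & & \vdots \\
a_{1f}d_1 & a_{2f}d_2 & \cdots & a_{nf}d_n
\end{pmatrix}
A$$

Column i of A^T is multiplied by d_i, which is exactly what a.T*d computes by broadcasting d (shape (n,)) over the rows of a.T (shape (f, n)).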
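A quick consistency check that the three solutions compute the same f × f matrix, written for Python 3 with small, arbitrary sizes. The functions repeat f, g, and h from the slides; a * d[:, np.newaxis] is another way to write the broadcast (it scales the rows of A rather than the columns of A^T).

```python
import numpy as np
import scipy.sparse as sparse


def f(a, d):
    """Solution 1: diagonal matrix via sparse.diags (dia_matrix)."""
    dd = sparse.diags([d], [0])
    return np.dot(a.T, dd.dot(a))


def g(a, d):
    """Solution 2: diagonal matrix packed directly into a csr_matrix."""
    n, _ = a.shape
    dd = sparse.csr_matrix((d, np.arange(n), np.arange(n + 1)), shape=(n, n))
    return np.dot(a.T, dd.dot(a))


def h(a, d):
    """Solution 3: broadcasting scales column i of A^T by d[i]."""
    return np.dot(a.T * d, a)


rng = np.random.default_rng(0)
a = rng.random((1000, 10))   # n = 1000, f = 10
d = rng.random(1000)

r1, r2, r3 = f(a, d), g(a, d), h(a, d)
r4 = np.dot(a.T, a * d[:, np.newaxis])   # row-scaling form of the same product
print(np.allclose(r1, r2), np.allclose(r1, r3), np.allclose(r1, r4))
```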


Benchmark

    def datagen(n,f):
        np.random.seed(0)
        a=np.random.random((n,f))
        d=np.random.random(n)
        return a,d

    from timeit import timeit
    print "dia_matrix :",timeit("f(a,d)",number=10,
        setup="from __main__ import f,datagen; a,d=datagen(1000000,10)")
    print "csr_matrix :",timeit("g(a,d)",number=10,
        setup="from __main__ import g,datagen; a,d=datagen(1000000,10)")
    print "broadcasting :",timeit("h(a,d)",number=10,
        setup="from __main__ import h,datagen; a,d=datagen(1000000,10)")

Result (total seconds for 10 runs each):

dia_matrix : 1.60458707809

csr_matrix : 1.32580018044

broadcasting : 1.30032682419


Conclusion

Try not to use for loops; use the libraries' capabilities instead.

Knowledge about the internal structure of sparse matrices is useful to extract further performance.

Mathematical derivation is important. The key is to find a mathematically equivalent and Python-friendly formula.

Computational speed does not always matter. Finding better code in a short time is valuable; otherwise, you shouldn't pursue speed too much.


Acknowledgment

I would like to thank

@shima__shima, who gave me useful advice on Twitter.
