Effective Numerical Computation in NumPy and SciPy


DESCRIPTION

Presented at PyCon JP 2014. Video is available at http://bit.ly/1tXYhw6. This talk explores case studies of effective usage of NumPy/SciPy and shows that computational speed sometimes improves drastically with the appropriate derivation of formulas and a performance-conscious implementation. I especially focus on scipy.sparse, the module for sparse matrices, which is often useful in the areas of machine learning and natural language processing.

TRANSCRIPT

Effective Numerical Computation in NumPy and SciPy

Kimikazu Kato

PyCon JP 2014

September 13, 2014

1 / 35

About Myself

Kimikazu Kato, Chief Scientist at Silver Egg Technology Co., Ltd.

Ph.D. in Computer Science

Background in mathematics, numerical computation, algorithms, etc.

< 2 years' experience in Python

> 10 years' experience in numerical computation

Now designing algorithms for a recommendation system, and doing research about machine learning and data analysis.

2 / 35

This talk...

is about effective usage of NumPy/SciPy

is NOT an exhaustive introduction to their capabilities, but shows some case studies based on my experience and interest

3 / 35

Table of Contents

Introduction
Basics about NumPy
    Broadcasting
    Indexing
Sparse matrix
    Usage of scipy.sparse
    Internal structure
Case studies
Conclusion

4 / 35

Numerical Computation

Differential equations
Simulations
Signal processing
Machine learning
etc...

Why Numerical Computation in Python?

Productivity
    Easy to write
    Easy to debug

Connectivity with visualization tools
    Matplotlib
    IPython

Connectivity with web systems
    Many frameworks (Django, Pyramid, Flask, Bottle, etc.)

5 / 35

But Python is Very Slow!

Code in C

#include <stdio.h>

int main() {
    int i;
    double s = 0;
    for (i = 1; i <= 100000000; i++) s += i;
    printf("%.0f\n", s);
}

Code in Python

s = 0.
for i in xrange(1, 100000001):
    s += i
print s

Both pieces of code compute the sum of the integers from 1 to 100,000,000.

Result of a benchmark in a certain environment:

C: 0.109 sec (compiled with the -O3 option)
Python: 8.657 sec (80+ times slower!!)

6 / 35

Better code

import numpy as np
a = np.arange(1, 100000001)
print a.sum()

Now it takes 0.188 sec. (Measured by the "time" command on Linux, loading time included.)

Still slower than C, but sufficiently fast for a scripting language.
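As an aside (my addition, not on the slides), this benchmark also hints at the lesson on mathematical derivation that comes later in the talk: this particular sum has a closed form, n(n+1)/2, which beats any loop:

>>> n = 100000000
>>> print n * (n + 1) / 2
5000000000050000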

7 / 35

Lessons

Python is very slow when written badly.

Translating C (or Java, C#, etc.) code into Python literally is often a bad idea.

Python-friendly rewriting sometimes results in a drastic performance improvement.

8 / 35

Basic rules for better performance

Avoid for loops as far as possible.

Utilize the libraries' capabilities instead.

Forget about the cost of copying memory.

A typical C programmer might care about it, but ...
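As a minimal illustration of the first two rules (an example of mine, not from the slides):

>>> import numpy as np
>>> a = np.random.random(1000000)
>>> s = 0.
>>> for x in a:     # slow: the loop runs in the Python interpreter
...     s += x
...
>>> a.sum()         # fast: the loop runs inside NumPy's compiled code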

9 / 35

Basic techniques for NumPy

Broadcasting

Indexing

10 / 35

Broadcasting

>>> import numpy as np
>>> a = np.array([0, 1, 2])
>>> a * 3
array([0, 3, 6])

>>> b = np.array([1, 4, 9])
>>> np.sqrt(b)
array([ 1.,  2.,  3.])

A function that, when applied to an array, is applied to each of its elements is called a universal function (ufunc).

11 / 35

Broadcasting (2D)

>>> import numpy as np
>>> a = np.arange(9).reshape((3, 3))
>>> b = np.array([1, 2, 3])
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> b
array([1, 2, 3])
>>> a * b
array([[ 0,  2,  6],
       [ 3,  8, 15],
       [ 6, 14, 24]])
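Broadcasting also works along the other axis when the shapes allow it. A supplementary example of mine (not on the slide): a column vector of shape (3, 1) scales each row of a.

>>> c = np.array([[1], [2], [3]])
>>> a * c
array([[ 0,  1,  2],
       [ 6,  8, 10],
       [18, 21, 24]])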

12 / 35

Indexing

>>> import numpy as np
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> indices = np.arange(0, 10, 2)
>>> indices
array([0, 2, 4, 6, 8])
>>> a[indices] = 0
>>> a
array([0, 1, 0, 3, 0, 5, 0, 7, 0, 9])
>>> b = np.arange(100, 600, 100)
>>> b
array([100, 200, 300, 400, 500])
>>> a[indices] = b
>>> a
array([100,   1, 200,   3, 300,   5, 400,   7, 500,   9])
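A closely related technique (my addition, not on the slide) is boolean-mask indexing, which selects elements by a condition instead of by explicit positions:

>>> a = np.arange(10)
>>> a[a % 2 == 0] = 0     # zero out the even-valued elements
>>> a
array([0, 1, 0, 3, 0, 5, 0, 7, 0, 9])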

13 / 35

References

Gabriele Lanaro, "Python High Performance Programming," Packt Publishing, 2013.
Stéfan van der Walt, "NumPy Medkit."

14 / 35

Sparse matrix

Defined as a matrix in which most elements are zero.

A compressed data structure is used to express it, so that it will be:

    Space effective
    Time effective

15 / 35

scipy.sparse

The scipy.sparse module provides mainly three classes to express a sparse matrix. (There are other types, but they are not mentioned here.)

lil_matrix : convenient for setting data; setting a[i,j] is fast
csr_matrix : convenient for computation; fast to retrieve a row
csc_matrix : convenient for computation; fast to retrieve a column

Usually, you set the data into a lil_matrix, and then convert it to a csc_matrix or csr_matrix.

For csr_matrix and csc_matrix, calculation between matrices of the same type is fast, but you should avoid calculations that mix different types.

16 / 35

Use case

>>> from scipy.sparse import lil_matrix, csr_matrix
>>> a = lil_matrix((3, 3))
>>> a[0,0] = 1.; a[0,2] = 2.
>>> a = a.tocsr()
>>> print a
  (0, 0)    1.0
  (0, 2)    2.0
>>> a.todense()
matrix([[ 1.,  0.,  2.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])
>>> b = lil_matrix((3, 3))
>>> b[1,1] = 3.; b[2,0] = 4.; b[2,2] = 5.
>>> b = b.tocsr()
>>> b.todense()
matrix([[ 0.,  0.,  0.],
        [ 0.,  3.,  0.],
        [ 4.,  0.,  5.]])
>>> c = a.dot(b)
>>> c.todense()
matrix([[ 8.,  0., 10.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])
>>> d = a + b
>>> d.todense()
matrix([[ 1.,  0.,  2.],
        [ 0.,  3.,  0.],
        [ 4.,  0.,  5.]])

17 / 35

Internal structure: csr_matrix

>>> from scipy.sparse import lil_matrix, csr_matrix
>>> a = lil_matrix((3, 3))
>>> a[0,1] = 1.; a[0,2] = 2.; a[1,2] = 3.; a[2,0] = 4.; a[2,1] = 5.
>>> b = a.tocsr()
>>> b.todense()
matrix([[ 0.,  1.,  2.],
        [ 0.,  0.,  3.],
        [ 4.,  5.,  0.]])
>>> b.indices
array([1, 2, 2, 0, 1], dtype=int32)
>>> b.data
array([ 1.,  2.,  3.,  4.,  5.])
>>> b.indptr
array([0, 2, 3, 5], dtype=int32)
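To see how these three arrays fit together, here is a small sketch of mine (not on the slide), continuing the session above: row i occupies data[indptr[i]:indptr[i+1]], and indices gives the column of each stored value.

>>> i = 2
>>> start, end = b.indptr[i], b.indptr[i+1]
>>> b.indices[start:end]     # columns of the non-zeros in row 2
array([0, 1], dtype=int32)
>>> b.data[start:end]        # their values
array([ 4.,  5.])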

18 / 35

Internal structure: csc_matrix

>>> from scipy.sparse import lil_matrix, csr_matrix
>>> a = lil_matrix((3, 3))
>>> a[0,1] = 1.; a[0,2] = 2.; a[1,2] = 3.; a[2,0] = 4.; a[2,1] = 5.
>>> b = a.tocsc()
>>> b.todense()
matrix([[ 0.,  1.,  2.],
        [ 0.,  0.,  3.],
        [ 4.,  5.,  0.]])
>>> b.indices
array([2, 0, 2, 0, 1], dtype=int32)
>>> b.data
array([ 4.,  1.,  5.,  2.,  3.])
>>> b.indptr
array([0, 1, 3, 5], dtype=int32)

19 / 35

Merit of knowing the internal structure

Setting a csr_matrix or csc_matrix directly through its internal structure is much faster than setting a lil_matrix through indices.

See the benchmark of setting the n × n bidiagonal matrix

$$\begin{pmatrix}
2 & 1 & & & \\
 & 2 & 1 & & \\
 & & \ddots & \ddots & \\
 & & & 2 & 1 \\
 & & & & 2
\end{pmatrix}$$

20 / 35

from scipy.sparse import lil_matrix, csr_matrix
import numpy as np
from timeit import timeit

def set_lil(n):
    a = lil_matrix((n, n))
    for i in xrange(n):
        a[i, i] = 2.
        if i + 1 < n:
            a[i, i+1] = 1.
    return a

def set_csr(n):
    data = np.empty(2*n - 1)
    indices = np.empty(2*n - 1, dtype=np.int32)
    indptr = np.empty(n + 1, dtype=np.int32)
    # to be fair, a for loop is intentionally used here
    # (using the indexing technique is faster; a sketch appears after the Remark below)
    for i in xrange(n):
        indices[2*i] = i
        data[2*i] = 2.
        if i < n - 1:
            indices[2*i + 1] = i + 1
            data[2*i + 1] = 1.
        indptr[i] = 2*i
    indptr[n] = 2*n - 1
    a = csr_matrix((data, indices, indptr), shape=(n, n))
    return a

print "lil:", timeit("set_lil(10000)", number=10, setup="from __main__ import set_lil")
print "csr:", timeit("set_csr(10000)", number=10, setup="from __main__ import set_csr")

21 / 35

Result:

lil: 11.6730761528
csr: 0.0562081336975

Remark

When you deal with already sorted data, setting a csr_matrix or csc_matrix directly with data, indices, and indptr is much faster than setting a lil_matrix.

But the code tends to be more complicated if you use the internal structure of csr_matrix or csc_matrix.
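For reference, the "indexing technique" mentioned in the comment inside set_csr could look like this. This is a sketch of mine (not on the slides), which fills the same three arrays with vectorized assignments instead of a loop:

def set_csr_vectorized(n):
    data = np.empty(2*n - 1)
    indices = np.empty(2*n - 1, dtype=np.int32)
    data[0::2] = 2.                    # diagonal entries
    data[1::2] = 1.                    # superdiagonal entries
    indices[0::2] = np.arange(n)       # columns of the diagonal
    indices[1::2] = np.arange(1, n)    # columns of the superdiagonal
    indptr = np.empty(n + 1, dtype=np.int32)
    indptr[:n] = 2 * np.arange(n)      # row i starts at position 2*i
    indptr[n] = 2*n - 1                # total number of stored elements
    return csr_matrix((data, indices, indptr), shape=(n, n))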

22 / 35

Case Studies

23 / 35

Case 1: Norms

The squared norm of a vector v is

$$\|v\|^2 = \sum_i v_i^2$$

If v is dense:

norm = np.dot(v, v)

Expressed as a product of matrices. (dot means the matrix product, but you don't have to take the transpose explicitly.)

When v is sparse, suppose that v is expressed as a 1 × n matrix:

norm = v.multiply(v).sum()

(multiply() is the element-wise product.)

This is because taking the transpose of a sparse matrix changes its type (the transpose of a csr_matrix is a csc_matrix), and calculation between different types should be avoided.
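Combining this with the internal-structure idea from earlier, the same norm can also be computed directly on the stored values. A sketch of mine, not from the slides:

>>> from scipy.sparse import csr_matrix
>>> import numpy as np
>>> v = csr_matrix(np.array([[0., 3., 0., 4.]]))   # a 1 x n sparse vector
>>> v.multiply(v).sum()                            # as on the slide
25.0
>>> np.dot(v.data, v.data)                         # same result from the data array
25.0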

24 / 35

Similarly, for the Frobenius norm

$$\|A\|_{\mathrm{Fro}}^2 = \sum_{ij} a_{ij}^2$$

of a sparse matrix a:

norm = a.multiply(a).sum()

25 / 35

Case 2: Applying a function to all of the elements of a sparse matrix

A universal function can be applied to a dense matrix:

>>> import numpy as np
>>> a = np.arange(9).reshape((3, 3))
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> np.tanh(a)
array([[ 0.        ,  0.76159416,  0.96402758],
       [ 0.99505475,  0.9993293 ,  0.9999092 ],
       [ 0.99998771,  0.99999834,  0.99999977]])

This is convenient and fast.

However, we cannot do the same thing for a sparse matrix.

26 / 35

>>> from scipy.sparse import lil_matrix
>>> a = lil_matrix((3, 3))
>>> a[0,0] = 1.
>>> a[1,0] = 2.
>>> b = a.tocsr()
>>> np.tanh(b)
<3x3 sparse matrix of type '<type 'numpy.float64'>'
        with 2 stored elements in Compressed Sparse Row format>

This is because, for an arbitrary function, its application to a sparse matrix is not necessarily sparse.

However, if a universal function f satisfies f(0) = 0, the sparsity is preserved.

Then, how can we compute it?

27 / 35

Use the internal structure!!

The positions of the non-zero elements are not changed after application ofthe function.

Keep indices and indptr, and just change data.

Solution:

b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)
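A quick check of this trick (my example, not on the slides), using a small csr_matrix whose values match the dense tanh example above:

>>> from scipy.sparse import csr_matrix
>>> import numpy as np
>>> a = csr_matrix(np.array([[1., 0., 0.],
...                          [2., 0., 0.],
...                          [0., 0., 0.]]))
>>> b = csr_matrix((np.tanh(a.data), a.indices, a.indptr), shape=a.shape)
>>> b.todense()
matrix([[ 0.76159416,  0.        ,  0.        ],
        [ 0.96402758,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ]])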

28 / 35

Case 3: A formula which appears in a paper

In the algorithm for a recommendation system [1], the following formula appears:

$$A^T \cdot D \cdot A$$

where A is an n × f dense matrix, and D is a diagonal matrix defined from a given array (d_i) as:

$$D = \begin{pmatrix}
d_1 & & & \\
 & d_2 & & \\
 & & \ddots & \\
 & & & d_n
\end{pmatrix}$$

Here, n (which corresponds to the number of users or items) is big, and f (which means the number of latent factors) is small.

[1] Hu et al., "Collaborative Filtering for Implicit Feedback Datasets," ICDM, 2008.

29 / 35

Solution 1:

There is a special class dia_matrix to deal with a diagonal sparse matrix.

import scipy.sparse as sparse
import numpy as np

def f(a, d):
    """a: 2d array of shape (n,f), d: 1d array of length n"""
    dd = sparse.diags([d], [0])
    return np.dot(a.T, dd.dot(a))

30 / 35

Solution 2:

Pack a csr_matrix with data, indices, indptr:

data = d
indices = [0, 1, ..., n-1]
indptr = [0, 1, ..., n]

def g(a, d):
    n, f = a.shape
    data = d
    indices = np.arange(n)
    indptr = np.arange(n + 1)
    dd = sparse.csr_matrix((data, indices, indptr), shape=(n, n))
    return np.dot(a.T, dd.dot(a))

31 / 35

Solution 3:

This is equivalent to broadcasting!

def h(a, d):
    return np.dot(a.T * d, a)

$$(A^T D)A = \begin{pmatrix}
a_{11} & a_{21} & \cdots & a_{n1} \\
a_{12} & a_{22} & \cdots & a_{n2} \\
\vdots & \vdots & & \vdots \\
a_{1m} & a_{2m} & \cdots & a_{nm}
\end{pmatrix} \times \begin{pmatrix}
d_1 & & & \\
 & d_2 & & \\
 & & \ddots & \\
 & & & d_n
\end{pmatrix} \times A$$

$$= \begin{pmatrix}
a_{11}d_1 & a_{21}d_2 & \cdots & a_{n1}d_n \\
a_{12}d_1 & a_{22}d_2 & \cdots & a_{n2}d_n \\
\vdots & \vdots & & \vdots \\
a_{1m}d_1 & a_{2m}d_2 & \cdots & a_{nm}d_n
\end{pmatrix} \times A$$
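As a quick sanity check (my addition, not on the slides), the three solutions can be confirmed to produce the same result:

>>> n, f_ = 1000, 5        # f_ avoids shadowing the function f above
>>> np.random.seed(0)
>>> a = np.random.random((n, f_))
>>> d = np.random.random(n)
>>> np.allclose(f(a, d), g(a, d))
True
>>> np.allclose(g(a, d), h(a, d))
True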

32 / 35

Benchmark

def datagen(n, f):
    np.random.seed(0)
    a = np.random.random((n, f))
    d = np.random.random(n)
    return a, d

from timeit import timeit
print "dia_matrix   :", timeit("f(a,d)", number=10,
        setup="from __main__ import f,datagen; a,d=datagen(1000000,10)")
print "csr_matrix   :", timeit("g(a,d)", number=10,
        setup="from __main__ import g,datagen; a,d=datagen(1000000,10)")
print "broadcasting :", timeit("h(a,d)", number=10,
        setup="from __main__ import h,datagen; a,d=datagen(1000000,10)")

Result:

dia_matrix   : 1.60458707809
csr_matrix   : 1.32580018044
broadcasting : 1.30032682419

33 / 35

Conclusion

Try not to use for loops; use the libraries' capabilities instead.

Knowledge about the internal structure of sparse matrices is useful for extracting further performance.

Mathematical derivation is important. The key is to find a mathematically equivalent and Python-friendly formula.

Computational speed is not all that matters: finding better code in a short time is valuable, and beyond that point you shouldn't pursue optimization too far.

34 / 35

Acknowledgment

I would like to thank @shima__shima, who gave me useful advice on Twitter.

35 / 35
