Number Crunching in Python

NUMBER CRUNCHING IN PYTHON

Enrico Franchi & Valerio Maggio

Uploaded by valerio-maggio on 06-May-2015


DESCRIPTION

"Number Crunching in Python": slides presented at EuroPython 2012, Florence, Italy. The slides were authored by me and by Dr. Enrico Franchi. They cover scientific and engineering computing, the Numpy NDArray implementation, and some working case studies.

TRANSCRIPT

Page 2: Number Crunching in Python

OUTLINE

• Scientific and Engineering Computing

• Common FP pitfalls

• Numpy NDArray (Memory and Indexing)

• Case Studies

Page 5: Number Crunching in Python

number-crunching: n. [common] Computations of a numerical nature, esp. those that make extensive use of floating-point numbers. This term is in widespread informal use outside hackerdom and even in mainstream slang, but has additional hackish connotations: namely, that the computations are mindless and involve massive use of brute force. This is not always evil, esp. if it involves ray tracing or fractals or some other use that makes pretty pictures, esp. if such pictures can be used as screen backgrounds. See also crunch.

We are not evil. Just chaotic neutral.

Page 8: Number Crunching in Python

ALTERNATIVES

• Matlab (IDE, oriented to numerical computations, high-quality algorithms, lots of packages, poor general-purpose programming support, commercial)

• Octave (Matlab clone)

• R (statistics oriented, poor general-purpose programming support)

• Fortran/C++ (very low level, very fast, more complex to use)

• In general, these tools are either low-level general-purpose languages or high-level DSLs

Page 9: Number Crunching in Python

PYTHON

• Numpy (low-level numerical computations) + Scipy (lots of additional packages)

• IPython (wonderful command-line interpreter) + IPython Notebook (“Mathematica-like” interactive documents)

• HDF5 (PyTables, H5Py), databases

• Specific libraries for machine learning, etc.

• General-purpose object-oriented programming

Page 10: Number Crunching in Python

TOOLS USED

Page 13: Number Crunching in Python

NUMPY STACK: c = a · b

Figure: a stack of Our Code, Numpy, and Atlas/MKL, with "Improvements" marked between the layers.

Algorithms are fast because of highly optimized C/Fortran code. At the Python level, c = dot(a, b) compiles to just a few bytecode instructions:

    4     30 LOAD_GLOBAL      1 (dot)
          33 LOAD_FAST        0 (a)
          36 LOAD_FAST        1 (b)
          39 CALL_FUNCTION    2
          42 STORE_FAST       2 (c)
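From the interpreter's point of view, `c = dot(a, b)` is just a global lookup, two local loads and one call; the arithmetic itself happens in compiled code. A minimal sketch that reproduces this with the standard `dis` module, assuming NumPy is installed; the function name `multiply` is illustrative, and exact opcode names vary between Python versions (newer CPython releases emit `CALL` rather than `CALL_FUNCTION`).

```python
import dis

from numpy import dot


def multiply(a, b):
    # One Python-level call; the actual arithmetic runs in compiled code
    c = dot(a, b)
    return c


# Print the bytecode: a handful of instructions regardless of array size
dis.dis(multiply)

# Collect the names loaded from the global namespace
global_names = [
    ins.argval
    for ins in dis.get_instructions(multiply)
    if ins.opname == "LOAD_GLOBAL"
]
print(global_names)
```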

Page 14: Number Crunching in Python

ndarray

Figure: an ndarray consists of a memory area, behavior, and metadata (shape, strides, flags). An index tuple $(i_0, \dots, i_{n-1})$ is mapped to a linear offset $I$. Shape: $(d_0, \dots, d_{n-1})$, e.g., 4x3.

An n-dimensional array references a (usually contiguous) memory area.

An n-dimensional array has properties such as its shape and the data type of the elements it contains.

It is an object, so it also has behavior, e.g., the definition of __add__ and similar methods.

N-dimensional arrays are homogeneous: all elements share the same data type.
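A short sketch inspecting these properties on a 4x3 array, assuming 64-bit integers (so each element takes 8 bytes):

```python
import numpy as np

# A 4x3 array of 64-bit integers, laid out in C (row-major) order by default
a = np.arange(12, dtype=np.int64).reshape(4, 3)

print(a.shape)                  # (4, 3)
print(a.dtype)                  # int64
print(a.strides)                # (24, 8): bytes to step along each axis
print(a.flags["C_CONTIGUOUS"])  # True
print(a.flags["F_CONTIGUOUS"])  # False

# Behavior: __add__ is defined element-wise
b = a + a
print(b[1, 2])                  # 10

# Homogeneity: mixed inputs are converted to one common dtype
mixed = np.array([1, 2.5])
print(mixed.dtype)              # float64
```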

Page 15: Number Crunching in Python

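Element Layout in Memory

An index tuple $(i_0, \dots, i_{n-1})$ is mapped to a linear offset $I$. For an array of shape $(d_0, \dots, d_{n-1})$:

C-contiguous: $I_C = \sum_{k=0}^{n-1} i_k \prod_{j=k+1}^{n-1} d_j$

F-contiguous: $I_F = \sum_{k=0}^{n-1} i_k \prod_{j=0}^{k-1} d_j$

For a 2-D array of shape $(d_0, d_1)$, e.g., 4x3:

$I_C = i_0 \cdot d_1 + i_1$

$I_F = i_0 + i_1 \cdot d_0$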
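The two formulas can be checked directly. The helpers `c_offset` and `f_offset` below are illustrative names, not NumPy API; the cross-check uses `np.ravel_multi_index`, which converts index tuples to flat offsets in either order.

```python
import numpy as np


def c_offset(index, shape):
    # I_C = sum_k i_k * prod_{j=k+1}^{n-1} d_j  (empty product = 1)
    total = 0
    for k, i_k in enumerate(index):
        total += i_k * int(np.prod(shape[k + 1:]))
    return total


def f_offset(index, shape):
    # I_F = sum_k i_k * prod_{j=0}^{k-1} d_j  (empty product = 1)
    total = 0
    for k, i_k in enumerate(index):
        total += i_k * int(np.prod(shape[:k]))
    return total


shape = (4, 3)
index = (2, 1)
print(c_offset(index, shape))  # 2*3 + 1 = 7
print(f_offset(index, shape))  # 2 + 1*4 = 6

# Cross-check against NumPy's own index arithmetic
print(np.ravel_multi_index(index, shape, order="C"))  # 7
print(np.ravel_multi_index(index, shape, order="F"))  # 6
```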

Page 16: Number Crunching in Python

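Strides

C-contiguous: $s_C(k) = \prod_{j=k+1}^{n-1} d_j$, so $I_C = \sum_{k=0}^{n-1} i_k \cdot s_C(k)$

F-contiguous: $s_F(k) = \prod_{j=0}^{k-1} d_j$, so $I_F = \sum_{k=0}^{n-1} i_k \cdot s_F(k)$

(An empty product equals 1, so $s_C(n-1) = s_F(0) = 1$.)

For a 2-D array of shape $(d_0, d_1)$: C-contiguous $(s_0 = d_1, s_1 = 1)$; F-contiguous $(s_0 = 1, s_1 = d_0)$.

These strides are counted in elements; NumPy's strides attribute reports them in bytes (element stride times item size).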
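A sketch computing these element strides and comparing them with NumPy's byte strides; `strides_c` and `strides_f` are hypothetical helper names, and `float64` (8 bytes per item) is assumed.

```python
import numpy as np


def strides_c(shape):
    # s_C(k) = prod_{j=k+1}^{n-1} d_j, in elements
    return tuple(int(np.prod(shape[k + 1:])) for k in range(len(shape)))


def strides_f(shape):
    # s_F(k) = prod_{j=0}^{k-1} d_j, in elements
    return tuple(int(np.prod(shape[:k])) for k in range(len(shape)))


shape = (4, 3)
a_c = np.zeros(shape, dtype=np.float64, order="C")
a_f = np.zeros(shape, dtype=np.float64, order="F")

# NumPy reports strides in bytes; divide by the item size to get elements
print(tuple(s // a_c.itemsize for s in a_c.strides), strides_c(shape))  # (3, 1) (3, 1)
print(tuple(s // a_f.itemsize for s in a_f.strides), strides_f(shape))  # (1, 4) (1, 4)
```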

Page 17: Number Crunching in Python

Views

Figure: several ndarray objects, each with its own behavior and shape, strides, flags, sharing a single memory area. Such arrays are views of the same data.
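A minimal illustration of views created by ordinary slicing and transposition; the array contents are arbitrary.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Slicing creates a new ndarray object (own shape/strides) over the same memory
v = a[1:3, ::2]
print(v.shape, np.shares_memory(a, v))  # (2, 2) True

# Writing through the view changes the original array
v[0, 0] = 100
print(a[1, 0])  # 100

# A transpose is a view too: same buffer, swapped strides
t = a.T
print(np.shares_memory(a, t), t.flags["F_CONTIGUOUS"])  # True True

# .copy() allocates independent memory
c = a.copy()
print(np.shares_memory(a, c))  # False
```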

Page 20: Number Crunching in Python

Figure: a C-contiguous ndarray (behavior and memory), annotated (1,4).

Page 22: Number Crunching in Python

ndarray vs. matrix

Figure: an ndarray and a matrix side by side; both consist of memory, behavior, and shape, strides, flags.
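The figure only shows the two objects side by side. As an illustration (not taken from the slides), here is how `np.matrix`, a subclass of `ndarray`, differs in behavior. Recent NumPy releases discourage `np.matrix`, so the sketch silences the warning it may raise.

```python
import warnings

import numpy as np

a = np.array([[1, 2], [3, 4]])

with warnings.catch_warnings():
    # Recent NumPy versions warn that np.matrix is not recommended
    warnings.simplefilter("ignore", PendingDeprecationWarning)
    m = np.matrix([[1, 2], [3, 4]])

print(isinstance(m, np.ndarray))  # True: matrix subclasses ndarray

# '*' means different things for the two types
print((a * a).tolist())  # element-wise: [[1, 4], [9, 16]]
print((m * m).tolist())  # matrix product: [[7, 10], [15, 22]]
print((a @ a).tolist())  # matrix product for plain ndarrays: [[7, 10], [15, 22]]

# A matrix is always 2-D, even after indexing a single row
print(a[0].shape, m[0].shape)  # (2,) (1, 2)
```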

Page 23: Number Crunching in Python

Basic Indexing
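The slide's examples did not survive extraction. The following sketch illustrates basic indexing (integers, slices, `Ellipsis`, `np.newaxis`) on an arbitrary 2x3x4 array.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

print(a[1, 2, 3])              # 23: one integer per axis gives a scalar
print(a[1].shape)              # (3, 4): missing trailing indices mean ':'
print(a[:, 1:, ::2].shape)     # (2, 2, 2): slices keep their axis
print(a[..., 0].shape)         # (2, 3): Ellipsis expands to as many ':' as needed
print(a[:, np.newaxis].shape)  # (2, 1, 3, 4): newaxis inserts a length-1 axis

# Basic indexing returns views, not copies
print(np.shares_memory(a, a[:, 1:, ::2]))  # True
```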

Page 24: Number Crunching in Python

Advanced Indexing

Broadcasting!
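The slide content is lost here as well. As an assumed illustration, integer-array indexing and broadcasting interact like this:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# Integer-array indexing: pairs (rows[i], cols[i]) are picked element by element
rows = np.array([0, 2, 3])
cols = np.array([1, 0, 2])
print(a[rows, cols].tolist())  # [1, 6, 11]

# Index arrays are broadcast against each other: (2, 1) with (2,) -> (2, 2)
r = np.array([[0], [3]])
c = np.array([0, 2])
print(a[r, c].tolist())  # [[0, 2], [9, 11]]

# Unlike slicing, advanced indexing returns a copy
print(np.shares_memory(a, a[rows, cols]))  # False

# Broadcasting in arithmetic: (4, 3) + (3,) adds the vector to every row
print((a + np.array([100, 200, 300]))[1].tolist())  # [103, 204, 305]
```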

Page 28: Number Crunching in Python

Advanced Indexing 2
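The content of this slide is also lost. Assuming it continues with the other flavor of advanced indexing, boolean masks, a sketch:

```python
import numpy as np

a = np.arange(12).reshape(4, 3)

# A boolean array of the same shape selects the elements where it is True
mask = a % 2 == 0
print(a[mask].tolist())  # [0, 2, 4, 6, 8, 10]

# Boolean indexing on the left-hand side assigns in place
b = a.copy()
b[b > 8] = -1
print(b[3].tolist())  # [-1, -1, -1]

# A 1-D mask selects whole rows along the first axis (row sums: 3, 12, 21, 30)
row_mask = a.sum(axis=1) > 20
print(a[row_mask].tolist())  # [[6, 7, 8], [9, 10, 11]]
```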

Page 34: Number Crunching in Python

Vectorize!

Don’t use explicit for loops unless you have to!
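A small sketch of the same computation written both ways; the sum of squares is just an example workload.

```python
import numpy as np

x = np.arange(1_000, dtype=np.float64)


def sum_of_squares_loop(values):
    # Explicit Python loop: one interpreted iteration per element
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(values):
    # Whole-array operations: the loop runs inside NumPy's compiled code
    return float(np.sum(values * values))


print(sum_of_squares_loop(x), sum_of_squares_vectorized(x))  # 332833500.0 332833500.0
```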

Page 35: Number Crunching in Python

PART II: NUMBER CRUNCHING IN ACTION

Page 37: Number Crunching in Python

General Disclaimer: All the maths appearing in the next slides is only intended to better introduce the case studies considered. The speakers are not responsible for any possible disease or “brain consumption” caused by too many formulas.

So BEWARE; use this information at your own risk! Its intention is solely educational. We would strongly encourage you to use this information in cooperation with a medical or health professional.

Awful Maths

Page 38: Number Crunching in Python

BEFORE STARTING

What you need to get started:

• A handful of Unix command-line tools:

  • Linux / Mac OS X users: you're done.

  • Windows users: it might be time to change your OS :-)

• [I]Python (you say?!)

• A DBMS:

  • Relational: e.g., SQLite3, PostgreSQL

  • NoSQL: e.g., MongoDB

Page 39: Number Crunching in Python

BENCHMARKING

Page 40: Number Crunching in Python

CASE STUDIES ON NUMERICAL EFFICIENCY

• Vectorization (NumPy vs. “pure” Python)

• Loops and math functions (e.g., sin(x))

• Matrix-vector product

• Different implementations of the matrix-vector product

Page 41: Number Crunching in Python

HW Info

Page 42: Number Crunching in Python

Vectorization: sin(x)
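The original benchmark code and timings did not survive extraction. A minimal sketch of the comparison, assuming `math.sin` over a Python list as the "pure" Python baseline and `np.sin` over an array as the vectorized version; the size `N` and repeat count are arbitrary, and timings depend entirely on the machine.

```python
import math
import timeit

import numpy as np

N = 100_000
xs_list = [i * 0.001 for i in range(N)]
xs_array = np.array(xs_list)


def sin_loop(xs):
    # "Pure" Python: math.sin called once per element
    return [math.sin(x) for x in xs]


def sin_numpy(xs):
    # Vectorized: a single call applies sin to the whole array
    return np.sin(xs)


# Both approaches should agree to within floating-point rounding
assert np.allclose(sin_loop(xs_list), sin_numpy(xs_array))

t_loop = timeit.timeit(lambda: sin_loop(xs_list), number=5)
t_numpy = timeit.timeit(lambda: sin_numpy(xs_array), number=5)
print(f"loop: {t_loop:.4f} s  numpy: {t_numpy:.4f} s  ratio: {t_loop / t_numpy:.1f}x")
```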

Page 48: Number Crunching in Python

sin(x): Results

NumPy Wins

Page 49: Number Crunching in Python

sin(x): Results

NumPy Wins. Fatality.

Page 52: Number Crunching in Python

Matrix-Vector Product

Page 53: Number Crunching in Python

dot
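The implementations compared on these slides are not recoverable. As an assumption, three plausible variants are sketched below (`matvec_python`, `matvec_rows`, `matvec_numpy` are hypothetical names); all three must agree numerically before any timing comparison is meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((200, 300))
x = rng.random(300)


def matvec_python(A, x):
    # Two nested Python loops over plain lists
    rows, cols = len(A), len(x)
    return [sum(A[i][j] * x[j] for j in range(cols)) for i in range(rows)]


def matvec_rows(A, x):
    # One NumPy dot product per row: only the inner loop is vectorized
    return np.array([np.dot(row, x) for row in A])


def matvec_numpy(A, x):
    # Single call: the whole product runs in compiled code
    return np.dot(A, x)


y_ref = matvec_numpy(A, x)
print(np.allclose(matvec_python(A.tolist(), x.tolist()), y_ref))  # True
print(np.allclose(matvec_rows(A, x), y_ref))                      # True
```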

Page 59: Number Crunching in Python

dot: Results

NumPy Wins

Page 60: Number Crunching in Python

dot: Results

NumPy Wins. Fatality.

Page 61: Number Crunching in Python

NUMBER CRUNCHING APPLICATIONS

Page 62: Number Crunching in Python

MACHINE LEARNING

• Machine Learning = Learning by Machine(s)

• Algorithms and techniques to gain insights from data

• Supervised or unsupervised learning

• Machine learning is actively being used today, perhaps in many more places than you'd expect:

  • Mail spam filtering

  • Search engine results ranking

  • Preference selection

    • e.g., Amazon “Customers Who Bought This Item Also Bought”

Page 63: Number Crunching in Python

CLUSTERING: BRIEF INTRODUCTION

• Clustering is a type of unsupervised learning that automatically forms clusters (groups) of similar things. It’s like automatic classification. You can cluster almost anything, and the more similar the items are in the cluster, the better your clusters are.

• k-means is an algorithm that will find k clusters for a given dataset.

• The number of clusters k is user defined.

• Each cluster is described by a single point known as the centroid.

• The centroid lies at the center of all the points in the cluster (for k-means, their mean).

Page 64: Number Crunching in Python

K-means

from scipy.cluster.vq import kmeans, vq
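A minimal sketch of `kmeans` and `vq` on synthetic data (two illustrative Gaussian blobs, not the slides' dataset). `kmeans` returns the centroids and the distortion; `vq` returns, for every observation, the index of the nearest centroid and the distance to it. Passing an array of initial centroids keeps the run deterministic. SciPy's documentation also recommends rescaling features with `whiten` first; that step is skipped here because both coordinates share the same scale.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

rng = np.random.default_rng(42)
# Two synthetic, well-separated groups of 2-D points (illustrative data)
group_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
group_b = rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2))
points = np.vstack([group_a, group_b])

# Passing an array as k_or_guess uses it as the initial centroids
initial = points[[0, -1]]
centroids, distortion = kmeans(points, initial)

# vq assigns each observation to its nearest centroid
labels, dists = vq(points, centroids)
print(centroids)
print(np.bincount(labels))  # [50 50]
```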

Page 69: Number Crunching in Python

K-means Plot

from scipy.cluster.vq import kmeans, vq

Page 71: Number Crunching in Python

EXAMPLE: CLUSTERING POINTS ON A MAP

Here’s the situation: your friend <NAME> wants you to take him out in the greater Portland, Oregon, area (US) for his birthday. A number of other friends are coming too, so you need to provide a plan that everyone can follow. Your friend has given you a list of places he wants to go. This list is long; it has 70 establishments in it.

Page 72: Number Crunching in Python

Yahoo API: geoGrab

Page 73: Number Crunching in Python

Spherical Distance Measure

$\phi_s, \lambda_s$ and $\phi_f, \lambda_f$: latitude and longitude coordinates of two points (s and f)

$\Delta\phi, \Delta\lambda$: corresponding differences

$$\Delta\hat{\sigma} = \arccos\left(\sin\phi_s \sin\phi_f + \cos\phi_s \cos\phi_f \cos\Delta\lambda\right)$$
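A sketch of the formula in NumPy. The slide defines only the central angle $\Delta\hat{\sigma}$; multiplying it by the Earth's radius (assumed here to be 6371 km) to obtain a distance, and taking coordinates in degrees, are assumptions of this example. `spherical_distance` is an illustrative name.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def spherical_distance(lat_s, lon_s, lat_f, lon_f, radius=EARTH_RADIUS_KM):
    """Great-circle distance via the spherical law of cosines (inputs in degrees)."""
    phi_s, phi_f = np.radians(lat_s), np.radians(lat_f)
    delta_lambda = np.radians(lon_f - lon_s)
    cos_sigma = (np.sin(phi_s) * np.sin(phi_f)
                 + np.cos(phi_s) * np.cos(phi_f) * np.cos(delta_lambda))
    # Clip to [-1, 1]: rounding can push the value slightly outside arccos's domain
    central_angle = np.arccos(np.clip(cos_sigma, -1.0, 1.0))
    return radius * central_angle


# A quarter of the equator: central angle pi/2
print(spherical_distance(0.0, 0.0, 0.0, 90.0))
```

The arccos form loses precision for points very close together; the haversine formula is the usual remedy in that case.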

Page 74: Number Crunching in Python

kmeans with distSLC
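The slide title refers to k-means with `distSLC`, presumably a spherical-law-of-cosines distance. Running k-means with such a measure means computing the distance from every point to every centroid at each iteration. A vectorized sketch using broadcasting; `slc_distance_matrix` is a hypothetical name, and the column order (latitude, longitude) in degrees and the 6371 km radius are assumptions.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def slc_distance_matrix(points, centroids, radius=EARTH_RADIUS_KM):
    """Pairwise spherical-law-of-cosines distances.

    points: (m, 2) array of (latitude, longitude) in degrees
    centroids: (k, 2) array of (latitude, longitude) in degrees
    returns: (m, k) array of distances
    """
    p = np.radians(points)[:, np.newaxis, :]     # (m, 1, 2)
    c = np.radians(centroids)[np.newaxis, :, :]  # (1, k, 2)
    phi_p, lam_p = p[..., 0], p[..., 1]          # (m, 1) each
    phi_c, lam_c = c[..., 0], c[..., 1]          # (1, k) each
    cos_sigma = (np.sin(phi_p) * np.sin(phi_c)
                 + np.cos(phi_p) * np.cos(phi_c) * np.cos(lam_c - lam_p))
    return radius * np.arccos(np.clip(cos_sigma, -1.0, 1.0))


points = np.array([[0.0, 0.0], [0.0, 90.0], [45.0, 0.0]])
centroids = np.array([[0.0, 0.0], [0.0, 90.0]])
d = slc_distance_matrix(points, centroids)
print(d.shape)           # (3, 2)
print(d.argmin(axis=1))  # [0 1 0]: nearest centroid for each point
```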

Page 75: Number Crunching in Python

TRIVIAL EXAMPLE: INVERSE MATRIX

• Problem: Given an input matrix A, calculate, if possible, its inverse matrix.

• Definition: In linear algebra, an n-by-n (square) matrix A is invertible (a.k.a. nonsingular or nondegenerate) if there exists an n-by-n matrix $B = A^{-1}$ such that $AB = BA = I_n$.

Page 76: Number Crunching in Python

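Solution(s)

✓ Eigendecomposition: if A can be eigendecomposed as $A = Q \Lambda Q^{-1}$ (Q holds the eigenvectors, $\Lambda$ is the diagonal matrix of eigenvalues) and none of its eigenvalues is equal to zero, i.e., A is nonsingular:

$$A^{-1} = Q \Lambda^{-1} Q^{-1}$$

✓ Cholesky decomposition: if A is positive definite, $A = L L^*$, where L is a lower triangular matrix and $L^*$ is its conjugate transpose:

$$A^{-1} = (L^*)^{-1} L^{-1}$$

✓ LU factorization: $A = LU$, with L (U) a lower (upper) triangular matrix, so that $A^{-1} = U^{-1} L^{-1}$.

✓ Analytic solution (writing the matrix of cofactors C), a.k.a. Cramer Method:

$$A^{-1} = \frac{1}{\det(A)} C^T, \qquad (A^{-1})_{i,j} = \frac{1}{\det(A)} C_{j,i}, \qquad C = \begin{pmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,n} \\ C_{2,1} & C_{2,2} & \cdots & C_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ C_{n,1} & C_{n,2} & \cdots & C_{n,n} \end{pmatrix}$$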
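A sketch checking the decomposition-based formulas against `np.linalg.inv` on a small symmetric positive definite matrix (chosen so that Cholesky applies). One deviation from the slide: `scipy.linalg.lu` factors with partial pivoting, $A = PLU$, so the inverse becomes $U^{-1}L^{-1}P^T$. The triangular and eigenvector factors are inverted with `np.linalg.inv` purely to mirror the formulas.

```python
import numpy as np
from scipy.linalg import lu

# A small symmetric positive definite matrix, so every method below applies
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Eigendecomposition: A = Q diag(w) Q^-1  =>  A^-1 = Q diag(1/w) Q^-1
w, Q = np.linalg.eig(A)
inv_eig = Q @ np.diag(1.0 / w) @ np.linalg.inv(Q)

# Cholesky: A = L L*  =>  A^-1 = (L*)^-1 L^-1 = (L^-1)* L^-1
L = np.linalg.cholesky(A)
L_inv = np.linalg.inv(L)
inv_chol = L_inv.conj().T @ L_inv

# LU with partial pivoting (SciPy): A = P L U  =>  A^-1 = U^-1 L^-1 P^T
P, L_lu, U = lu(A)
inv_lu = np.linalg.inv(U) @ np.linalg.inv(L_lu) @ P.T

reference = np.linalg.inv(A)
for name, inv in [("eig", inv_eig), ("cholesky", inv_chol), ("lu", inv_lu)]:
    print(name, np.allclose(inv, reference))
```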

Page 79: Number Crunching in Python

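Example

$$C = \begin{pmatrix} C_{1,1} & C_{1,2} & C_{1,3} \\ C_{2,1} & C_{2,2} & C_{2,3} \\ C_{3,1} & C_{3,2} & C_{3,3} \end{pmatrix}$$

$$\det(C) = C_{1,1}(C_{2,2}C_{3,3} - C_{2,3}C_{3,2}) + C_{2,1}(C_{1,3}C_{3,2} - C_{1,2}C_{3,3}) + C_{3,1}(C_{1,2}C_{2,3} - C_{1,3}C_{2,2})$$

$$C^{-1} = \frac{1}{\det(C)} \begin{pmatrix} C_{2,2}C_{3,3} - C_{2,3}C_{3,2} & C_{1,3}C_{3,2} - C_{1,2}C_{3,3} & C_{1,2}C_{2,3} - C_{1,3}C_{2,2} \\ C_{2,3}C_{3,1} - C_{2,1}C_{3,3} & C_{1,1}C_{3,3} - C_{1,3}C_{3,1} & C_{1,3}C_{2,1} - C_{1,1}C_{2,3} \\ C_{2,1}C_{3,2} - C_{2,2}C_{3,1} & C_{3,1}C_{1,2} - C_{1,1}C_{3,2} & C_{1,1}C_{2,2} - C_{1,2}C_{2,1} \end{pmatrix}$$

The determinant is expanded along the first column, so its three cofactors are exactly the first row of the matrix above.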
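A home-made sketch of these formulas in plain Python, checked against NumPy; `inverse_3x3` is an illustrative name and the test matrix is arbitrary.

```python
import numpy as np


def inverse_3x3(C):
    """Invert a 3x3 matrix with the explicit cofactor (adjugate) formulas."""
    (c11, c12, c13), (c21, c22, c23), (c31, c32, c33) = C
    adj = [
        [c22 * c33 - c23 * c32, c13 * c32 - c12 * c33, c12 * c23 - c13 * c22],
        [c23 * c31 - c21 * c33, c11 * c33 - c13 * c31, c13 * c21 - c11 * c23],
        [c21 * c32 - c22 * c31, c31 * c12 - c11 * c32, c11 * c22 - c12 * c21],
    ]
    # Expansion along the first column reuses the first row of the adjugate
    det = c11 * adj[0][0] + c21 * adj[0][1] + c31 * adj[0][2]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[entry / det for entry in row] for row in adj]


C = [[2.0, 0.0, 1.0],
     [1.0, 3.0, 2.0],
     [1.0, 1.0, 2.0]]
C_inv = inverse_3x3(C)
print(np.allclose(C_inv, np.linalg.inv(C)))  # True
```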

Page 83: Number Crunching in Python

Home Made

Duplicated Code → Template Method Pattern

However, we still have to implement the computational functions from scratch!

Reinventing the wheel!
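The slides' own code is not recoverable. As a hypothetical sketch of the Template Method Pattern applied to matrix inversion: the abstract class fixes the steps once (`invert`), and each subclass supplies only the step that varies. All class and method names are illustrative, and the subclasses still lean on NumPy for the small pieces.

```python
from abc import ABC, abstractmethod

import numpy as np


class MatrixInverter(ABC):
    """Template Method: invert() fixes the steps, subclasses fill in the math."""

    def invert(self, A):
        A = np.asarray(A, dtype=float)
        self.check(A)
        factors = self.decompose(A)
        return self.inverse_from(factors)

    def check(self, A):
        # Shared step: validation lives in one place instead of being duplicated
        if A.ndim != 2 or A.shape[0] != A.shape[1]:
            raise ValueError("matrix must be square")

    @abstractmethod
    def decompose(self, A):
        """Factor A (the step that varies between methods)."""

    @abstractmethod
    def inverse_from(self, factors):
        """Build A^-1 from the factors."""


class EigenInverter(MatrixInverter):
    def decompose(self, A):
        return np.linalg.eig(A)

    def inverse_from(self, factors):
        w, Q = factors
        return Q @ np.diag(1.0 / w) @ np.linalg.inv(Q)


class CholeskyInverter(MatrixInverter):
    def decompose(self, A):
        return np.linalg.cholesky(A)

    def inverse_from(self, L):
        L_inv = np.linalg.inv(L)
        return L_inv.conj().T @ L_inv


A = np.array([[4.0, 1.0], [1.0, 3.0]])
for inverter in (EigenInverter(), CholeskyInverter()):
    print(type(inverter).__name__, np.allclose(inverter.invert(A), np.linalg.inv(A)))
```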

Page 84: Number Crunching in Python

Numpy

from numpy import linalg

Page 85: Number Crunching in Python

Under the hood

    Type:        function
    String Form: <function inv at 0x105f72b90>
    File:        /Library/Python/2.7/site-packages/numpy/linalg/linalg.py
    Definition:  linalg.inv(a)
    Source:
    def inv(a):
        """
        Compute the (multiplicative) inverse of a matrix.
        [...]
        Parameters
        ----------
        a : array_like, shape (M, M)
            Matrix to be inverted.

        Returns
        -------
        ainv : ndarray or matrix, shape (M, M)
            (Multiplicative) inverse of the matrix `a`.

        Raises
        ------
        LinAlgError
            If `a` is singular or not square.

        [...]
        """
        a, wrap = _makearray(a)
        return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
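In other words, this version of `inv` solves $AX = I$. A quick check of that equivalence, and of the `LinAlgError` the docstring promises; newer NumPy releases implement `inv` differently internally, but the results should agree. The test matrices are arbitrary.

```python
import numpy as np
from numpy import linalg

A = np.array([[3.0, 1.0], [2.0, 4.0]])

# What this version of linalg.inv does, in essence: solve A X = I for X
X = linalg.solve(A, np.identity(A.shape[0], dtype=A.dtype))
print(np.allclose(X, linalg.inv(A)))  # True

# Singular input raises LinAlgError rather than returning garbage
try:
    linalg.inv(np.array([[1.0, 2.0], [2.0, 4.0]]))
except linalg.LinAlgError as exc:
    print("LinAlgError:", exc)
```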

Page 86: Number Crunching in Python

Numpy Alternatives

• Alternative built-in solutions to the same problem:
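The list of alternatives shown on the slide did not survive extraction. A few functions that exist in NumPy and SciPy and address the same problem, offered as an illustration rather than a reconstruction of the slide:

```python
import numpy as np
import scipy.linalg

A = np.array([[4.0, 7.0], [2.0, 6.0]])
b = np.array([1.0, 2.0])

reference = np.linalg.inv(A)

# Moore-Penrose pseudo-inverse: equals the inverse for a nonsingular A
print(np.allclose(np.linalg.pinv(A), reference))    # True

# SciPy's own inverse routine
print(np.allclose(scipy.linalg.inv(A), reference))  # True

# Often the inverse is not needed at all: solve A x = b directly
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))                        # True
```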

Page 87: Number Crunching in Python

Thanks for your kind attention.

Page 88: Number Crunching in Python

Vectorization: i+=2
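The benchmark code for this case is also lost. A minimal sketch, assuming the loop version increments a Python list element by element and the NumPy version uses an in-place `+=`; sizes and repeat counts are arbitrary and timings are machine-dependent.

```python
import timeit

import numpy as np

N = 200_000
values_list = list(range(N))
values_array = np.arange(N)


def add_two_loop(values):
    # Pure Python: increment each element in an explicit loop
    for i in range(len(values)):
        values[i] += 2
    return values


def add_two_numpy(values):
    # Vectorized: one in-place operation over the whole array
    values += 2
    return values


# Sanity check on small inputs: both versions compute the same thing
assert add_two_loop(list(range(10))) == add_two_numpy(np.arange(10)).tolist()

t_loop = timeit.timeit(lambda: add_two_loop(values_list), number=3)
t_numpy = timeit.timeit(lambda: add_two_numpy(values_array), number=3)
print(f"loop: {t_loop:.3f} s  numpy: {t_numpy:.3f} s")
```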

Page 92: Number Crunching in Python

i+=2: Results

NumPy Wins

Page 93: Number Crunching in Python

i+=2: Results

NumPy Wins. Fatality.

Page 94: Number Crunching in Python

K-means

    Create k points for starting centroids (often randomly)
    While any point has changed cluster assignment:
        for every point in dataset:
            for every centroid:
                d = distance(centroid, point)
            assign(point, cluster of the nearest centroid)
        for each cluster:
            mean = average(cluster)
            centroid[cluster] = mean
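A NumPy sketch of this pseudocode. The random initialization from data points, the handling of empty clusters (keep the old centroid), and the names `k_means` and `euclidean` are assumptions; the `distance` parameter accepts any function returning an (m, k) distance matrix, which is how a spherical distance could be plugged in.

```python
import numpy as np


def euclidean(points, centroids):
    # (m, k) matrix of distances via broadcasting
    return np.linalg.norm(points[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=2)


def k_means(points, k, distance=euclidean, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Create k starting centroids by picking k distinct data points at random
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Assign every point to the cluster of its nearest centroid
        new_labels = distance(points, centroids).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no point changed cluster assignment
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it
        for c in range(k):
            members = points[labels == c]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return centroids, labels


rng = np.random.default_rng(1)
data = np.vstack([rng.normal((0, 0), 0.5, size=(30, 2)),
                  rng.normal((8, 8), 0.5, size=(30, 2))])
centroids, labels = k_means(data, k=2)
print(centroids)
print(np.bincount(labels))
```

With a spherical distance, the arithmetic mean of latitudes and longitudes used in the update step is only an approximation of the true spherical center.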