the joy of scipy
DESCRIPTION
Presentation to Python Users Berlin on 2/14/2013TRANSCRIPT
The Joy of SciPy
PUB February 14, 2013
David Kammeyer
Brief History
Person Package Year
Jim Fulton Matrix Object in Python
1994
Jim Hugunin Numeric 1995
Perry Greenfield, Rick White, Todd Miller Numarray 2001
Travis Oliphant NumPy 2005
SciPy 2001
Founded in 2001 with Travis Vaught
Eric JonesweaveclusterGA*
Pearu Petersonlinalg
interpolatef2py
Travis Oliphantoptimizesparse
interpolateintegratespecialsignalstats
fftpackmisc
SciPy Ecosystem
Community effort• Chuck Harris• Pauli Virtanen• David Cournapeau• Stefan van der Walt• Dag Sverre Seljebotn• Robert Kern• Warren Weckesser• Ralf Gommers• Mark Wiebe• Nathaniel Smith
Why Python for Technical Computing
• Syntax (it gets out of your way)• Over-loadable operators• Complex numbers built-in early• Just enough language support for arrays• “Occasional” programmers can grok it• Supports multiple programming styles• Expert programmers can also use it effectively• Has a simple, extensible implementation• General-purpose language --- can build a system• Critical mass
Putting Science back in Comp Sci
• Much of the software stack is for systems programming --- C++, Java, .NET, ObjC, web- Complex numbers?- Vectorized primitives?• Array-oriented programming has been
supplanted by Object-oriented programming• Software stack for scientists is not as helpful
as it should be• Fortran is still where many scientists end up
NumPy: an Array-Oriented Extension• Data: the array object
– slicing and shaping – data-type map to Bytes
• Fast Math: – vectorization – broadcasting– aggregations
Zen of NumPy
• strided is better than scattered• contiguous is better than strided• descriptive is better than imperative • array-oriented is better than object-oriented• broadcasting is a great idea • vectorized is better than an explicit loop• unless it’s too complicated --- then use Cython/Numba• think in higher dimensions
Memory using Object-oriented
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Object
Attr1
Attr2
Attr3
Array-oriented (Table) approach
Attr1 Attr2 Attr3
Object1
Object2
Object3
Object4
Object5
Object6
Benefits of Array-oriented
• Many technical problems are naturally array-oriented (easy to vectorize)• Algorithms can be expressed at a high-level• These algorithms can be parallelized more
simply (quite often much information is lost in the translation to typical “compiled” languages)• Array-oriented algorithms map to modern
hard-ware caches and pipelines.
We need more focus on complied array-oriented languages with fast compilers!
What is good about NumPy?• Array-oriented• Extensive Dtype System (including structures)• C-API• Simple to understand data-structure• Memory mapping• Syntax support from Python• Large community of users• Broadcasting• Easy to interface C/C++/Fortran code
New Project
Blaze
NumPy
Next Generation NumPyOut-of-core
Distributed Tables
Overview
Main Script
Processing Node
Processing Node
Processing Node
Processing Node
Processing Node
Code
Code
Code
Code
Timeline (Available on GitHubNow!)
Date Milestone
July 2012 Pre-alpha release
December 2012 Early Beta Release
June 2013 Version 1.0
Spectrogram Demo
Introducing Numba (lots of kernels to write)
NumPy Users
• Want to be able to write Python to get fast code that works on arrays and scalars• Need access to a boat-load of C-extensions
(NumPy is just the beginning)
PyPy doesn’t cut it for us!
Dynamic compilationPython
Function
NumPy Runtime
Ufu
ncs
Gen
eral
ized
U
Func
s
Func
tion-
base
d In
dexi
ng
Mem
ory
Filte
rs
Win
dow
K
erne
l Fu
ncs
I/O F
ilter
s
Red
uctio
n Fi
lters
Com
pute
d C
olum
ns
Dynamic Compilation
function pointer
SciPy needs a Python compiler
integrateoptimize
odespecial
writing more of SciPy at high-level
Numba -- a Python compiler
• Replays byte-code on a stack with simple type-inference• Translates to LLVM (using LLVM-py)• Uses LLVM for code-gen• Resulting C-level function-pointer can be
inserted into NumPy run-time• Understands NumPy arrays• Is NumPy / SciPy aware
NumPy + Mamba = Numba
LLVM 3.1
Intel Nvidia AppleAMD
OpenCLISPC CUDA CLANGOpenMP
LLVM-PY
Python Function Machine Code
Examples
Software Stack Future?
LLVM
Python
COBJCFORTRAN
R
C++
Plateaus of Code re-use + DSLs
MatlabSQL
TDPL
Wakari/Numba Demo
How to pay for all this?
Dual strategy
Blaze
NumFOCUS
Num(Py) Foundation for Open Code for Usable Science
NumFOCUS
• Mission• To initiate and support educational programs
furthering the use of open source software in science.
• To promote the use of high-level languages and open source in science, engineering, and math research
• To encourage reproducible scientific research• To provide infrastructure and support for open
source projects for technical computing
NumFOCUS
Core Projects
Other Projects (seeking more --- need representatives)
NumPy SciPy IPython Matplotlib
Scikits Image
• Large-scale data analysis products• Anaconda, SciPy in a Box• Wakari.io -- Cloud Hosted SciPy• Python training (data analysis and
development)• NumPy support and consulting• Blaze, Numba, and More Development