Transcript
Page 1: Intro to Python Data Analysis in Wakari

Intro to Python Data Analysis in Wakari

Karissa McKelvey, Software Developer, Continuum Analytics

@karissamck

November 8, 2013, PyData NYC

Page 2: Intro to Python Data Analysis in Wakari

$ WHOAMI

karissamck.com
@karissamck

Page 3: Intro to Python Data Analysis in Wakari

truthy.indiana.edu

Page 4: Intro to Python Data Analysis in Wakari

More Tweets, More Votes

Page 5: Intro to Python Data Analysis in Wakari

Get you excited about data analysis in Wakari

Walk through some basic analysis packages and Wakari workflows

Kick-start your journey

MY GOALS

Page 6: Intro to Python Data Analysis in Wakari

WHO ARE YOU?

Page 7: Intro to Python Data Analysis in Wakari
Page 8: Intro to Python Data Analysis in Wakari
Page 9: Intro to Python Data Analysis in Wakari
Page 10: Intro to Python Data Analysis in Wakari
Page 11: Intro to Python Data Analysis in Wakari
Page 12: Intro to Python Data Analysis in Wakari
Page 13: Intro to Python Data Analysis in Wakari
Page 14: Intro to Python Data Analysis in Wakari

Putting Science back in Comp Sci

• Much of the software stack is for systems programming --- C++, Java, .NET, ObjC, web

- Complex numbers?
- Vectorized primitives?

• Software stack for scientists is not as helpful as it should be

• Fortran is still where many scientists end up

Page 15: Intro to Python Data Analysis in Wakari
Page 16: Intro to Python Data Analysis in Wakari

Why Python?

Page 17: Intro to Python Data Analysis in Wakari

High Performance with BIG DATA

Page 18: Intro to Python Data Analysis in Wakari

Packages for data analysis and visualization

Page 19: Intro to Python Data Analysis in Wakari

Syntax – Gets out of your way

Page 20: Intro to Python Data Analysis in Wakari

Community Driven

Page 21: Intro to Python Data Analysis in Wakari

Ready for web applications, too.

Page 22: Intro to Python Data Analysis in Wakari
Page 23: Intro to Python Data Analysis in Wakari

• “Python is good for data cleanup, R for statistical models”

“Which is the better Data Analysis language? R or Python?” Quora. http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python

Page 24: Intro to Python Data Analysis in Wakari

• “Python is good for data cleanup, R for statistical models”

• “R is quirky and weird but the statisticians love it and there really isn’t any compelling reason to switch”

“Which is the better Data Analysis language? R or Python?” Quora. http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python

Page 25: Intro to Python Data Analysis in Wakari

• “Python is good for data cleanup, R for statistical models”

• “R is quirky and weird but the statisticians love it and there really isn’t any compelling reason to switch”

• “You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”

“Which is the better Data Analysis language? R or Python?” Quora. http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python

Page 26: Intro to Python Data Analysis in Wakari

Ready for DATA, and then some

“You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”

Page 27: Intro to Python Data Analysis in Wakari

Numba: just-in-time compiler to LLVM through @decorators

numba.pydata.org

Page 28: Intro to Python Data Analysis in Wakari

Numba: just-in-time compiler to LLVM through @decorators*

numba.pydata.org

*aka, fast. easy.
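As a rough illustration of the decorator idea on this slide, here is a minimal sketch. The function name `sum_of_squares` and the sample data are made up for this example, and the no-op fallback is an assumption added only so the snippet still runs where Numba is not installed; it is not from the talk.

```python
import numpy as np

try:
    from numba import jit
except ImportError:
    # Assumption for illustration: if Numba is missing, fall back to a
    # decorator that returns the function unchanged (no compilation).
    def jit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@jit(nopython=True)
def sum_of_squares(values):
    # A plain Python loop; with Numba installed it is compiled on first call.
    total = 0.0
    for v in values:
        total += v * v
    return total


# Hypothetical input: the numbers 0..999 as 64-bit floats.
data = np.arange(1000, dtype=np.float64)
result = sum_of_squares(data)
print(result)
```

The loop body is ordinary Python; the only change needed to request compilation is the decorator line.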

Page 29: Intro to Python Data Analysis in Wakari
Page 30: Intro to Python Data Analysis in Wakari

Basic packages for data analysis and visualization

Page 31: Intro to Python Data Analysis in Wakari

NumPy: The foundation of the Python Data Analysis stack

Page 32: Intro to Python Data Analysis in Wakari

NumPy: Array-oriented
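To make "array-oriented" concrete, a small sketch follows. The temperature readings are hypothetical sample data, not from the talk; the point is only that one expression operates on every element at once.

```python
import numpy as np

# Hypothetical sample data: five temperature readings in Celsius.
celsius = np.array([12.5, 18.0, 21.3, 9.8, 30.1])

# One expression converts every element; no explicit Python loop is written.
fahrenheit = celsius * 9 / 5 + 32

# A boolean mask selects the elements that satisfy a condition.
warm = celsius[celsius > 15]

print(fahrenheit)
print(warm.mean())
```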

Page 33: Intro to Python Data Analysis in Wakari
Page 34: Intro to Python Data Analysis in Wakari
Page 35: Intro to Python Data Analysis in Wakari
Page 36: Intro to Python Data Analysis in Wakari

Pandas: Builds upon NumPy
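One way to see the relationship between the two libraries is sketched below. The column names and counts are invented for illustration; the sketch assumes a pandas version that provides `Series.to_numpy()`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: tweet counts for two made-up candidates.
df = pd.DataFrame(
    {"candidate": ["A", "B", "A", "B"], "tweets": [120, 80, 150, 95]}
)

# A column's values can be pulled out as a NumPy array.
values = df["tweets"].to_numpy()

# NumPy functions can be applied directly to a pandas column.
log_tweets = np.log(df["tweets"])

# Labels and grouping are what pandas adds on top of the raw arrays.
totals = df.groupby("candidate")["tweets"].sum()
print(totals)
```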

Page 37: Intro to Python Data Analysis in Wakari

Matplotlib: 2D plotting library
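A minimal plotting sketch, under a few assumptions: the sine curve is arbitrary sample data, the non-interactive "Agg" backend is chosen so the snippet runs without a display, and the figure is written to an in-memory buffer rather than a named file.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Arbitrary sample data: one period of a sine wave.
x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.legend()

# Render the figure as PNG bytes in memory.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

In an IPython notebook the same figure would typically be shown inline instead of being saved.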

Page 38: Intro to Python Data Analysis in Wakari

IPython: Interactive Python (+ in the Web)

tab completion

magic %-commands

Inline plots

Page 39: Intro to Python Data Analysis in Wakari

Anaconda: pulls it all together

Page 40: Intro to Python Data Analysis in Wakari
Page 41: Intro to Python Data Analysis in Wakari

wakari.io Browser-based Python & Linux environment

Page 42: Intro to Python Data Analysis in Wakari

Share files, IPython notebooks, and plots with pay-as-you-go compute

IPython Notebook

Scientific Packages

Terminal

Page 43: Intro to Python Data Analysis in Wakari

Sharing in Wakari

• Packages IPython notebooks, files, folders, data, and environment

• Get a link

• Share that link.

Page 44: Intro to Python Data Analysis in Wakari

Reproducible Research

Page 45: Intro to Python Data Analysis in Wakari
Page 46: Intro to Python Data Analysis in Wakari

“A rule of thumb among biotechnology venture capitalists is that half of published research cannot be replicated”

Page 47: Intro to Python Data Analysis in Wakari

How do we replicate research today?

Page 48: Intro to Python Data Analysis in Wakari

How do we replicate research today?
collaborate on

Page 49: Intro to Python Data Analysis in Wakari

How do we replicate research today?
collaborate on data analysis

Page 50: Intro to Python Data Analysis in Wakari

How do we collaborate today?

Page 51: Intro to Python Data Analysis in Wakari

How do we collaborate today?

Page 52: Intro to Python Data Analysis in Wakari

How do we collaborate today?

Page 53: Intro to Python Data Analysis in Wakari

How do we collaborate today?

Page 54: Intro to Python Data Analysis in Wakari

????????

Page 55: Intro to Python Data Analysis in Wakari

How do we replicate research today?

Page 56: Intro to Python Data Analysis in Wakari

wakari.io Browser-based Python & Linux environment

Page 57: Intro to Python Data Analysis in Wakari

Enterprise or Cloud

Online at wakari.io or install locally for access to your hardware and data

Page 58: Intro to Python Data Analysis in Wakari

wakari.io Browser-based Python & Linux environment

Page 59: Intro to Python Data Analysis in Wakari

Coming Soon

Page 60: Intro to Python Data Analysis in Wakari

Project-based interaction

Projects starting at $10/month with unlimited team members

user

Page 61: Intro to Python Data Analysis in Wakari

Interactive Plotting

Next-generation collaborative data manipulation, analysis, and presentation

Page 62: Intro to Python Data Analysis in Wakari

Talks to see

• Jake VanderPlas (Washington) – Efficient computing with NumPy
  • 29th Floor combo, 3pm (Right now, next door!)

• Julia Evans (N/A) – A practical introduction to IPython Notebook & pandas
  • Here, 4:45pm.

Page 63: Intro to Python Data Analysis in Wakari

Talks to see

• Sarah Guido (Michigan) – A Beginner’s Guide to Machine Learning with scikit-learn

• Imran Haque (Counsyl) – Beyond the dict

• Peter Wang (Continuum) – Bokeh Workshop

Page 64: Intro to Python Data Analysis in Wakari

Special Thanks

Ben Zaitlin, Mark Florisson, Clayton Davis

Bryan Van de Ven, Travis Oliphant

Page 65: Intro to Python Data Analysis in Wakari

Karissa McKelvey
@karissamck

