Intro to Python Data Analysis in Wakari

Posted on 26-Jan-2015

DESCRIPTION

Outlines the vision and philosophy for Wakari.io with a basic overview of popular Python data analysis packages. Most of the talk is conducted in Wakari and is not visible on these slides. 90 minutes for PyData NYC, November 8th, 2013.

TRANSCRIPT

Intro to Python Data Analysis in Wakari

Karissa McKelvey
Software Developer, Continuum Analytics

@karissamck

November 8, 2013
PyData NYC

$ WHOAMI

karissamck.com
@karissamck

truthy.indiana.edu

More Tweets, More Votes

MY GOALS

Get you excited about data analysis in Wakari

Walk through some basic analysis packages and Wakari workflows

Kick-start your journey

WHO ARE YOU?

Putting Science back in Comp Sci

• Much of the software stack is for systems programming: C++, Java, .NET, ObjC, web

  - Complex numbers?

  - Vectorized primitives?

• Software stack for scientists is not as helpful as it should be

• Fortran is still where many scientists end up

Why Python?

High Performance with BIG DATA

Packages for data analysis and visualization

Syntax – Gets out of your way

Community Driven

Ready for web applications, too.

• “Python is good for data cleanup, R for statistical models”

• “R is quirky and weird but the statisticians love it and there really isn’t any compelling reason to switch”

• “You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”

“Which is the better Data Analysis language? R or Python?” Quora. http://www.quora.com/Data-Analysis/Which-is-the-better-Data-analysis-language-R-or-Python

Ready for DATA, and then some

“You’re running an MCMC simulation on a laptop? Perhaps you should write it in C++/FORTRAN”

Numba: just-in-time compiler to LLVM through @decorators*

numba.pydata.org

*aka, fast. easy.
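
A minimal sketch of what "through @decorators" can look like in practice. The function name `sum_of_squares` and the loop are illustrative assumptions, not taken from the talk; the fallback decorator is only there so the sketch still runs (without any speed-up) where Numba is not installed.

```python
try:
    from numba import jit  # Numba's JIT decorator
except ImportError:
    # Assumption: Numba may be missing. This no-op stand-in keeps the
    # sketch runnable as plain Python, just without compilation.
    def jit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@jit(nopython=True)
def sum_of_squares(n):
    # A plain Python loop; with Numba installed it is compiled to
    # machine code via LLVM the first time the function is called.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(1000)
```

The function body is unchanged Python; only the decorator is added, which is the point the slide makes with "fast. easy."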

Basic packages for data analysis and visualization

NumPy: The foundation of the Python Data Analysis stack

NumPy: Array-oriented
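
As a rough illustration of "array-oriented", the sketch below applies one expression to a whole array instead of looping over elements. The temperature values and variable names are made up for the example.

```python
import numpy as np

# Hypothetical sample data: temperatures in degrees Celsius.
celsius = np.array([12.5, 15.0, 9.8, 21.3])

# One vectorized expression converts every element; no explicit loop.
fahrenheit = celsius * 9 / 5 + 32

# A boolean mask selects only the elements that satisfy a condition.
warm_days = celsius[celsius > 12.0]

# Complex numbers are a native array dtype as well.
signal = np.array([1 + 2j, 3 - 1j])
magnitudes = np.abs(signal)
```

This is the kind of support for complex numbers and vectorized primitives that the earlier slide asks for from a scientific software stack.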

Pandas: Builds upon NumPy
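
A small sketch of how pandas sits on top of NumPy: a labeled table whose columns can be handed back as NumPy arrays. The city names and readings are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a few temperature readings per city.
df = pd.DataFrame({
    "city": ["NYC", "NYC", "Austin", "Austin"],
    "temp_c": [12.5, 15.0, 28.1, 30.3],
})

# A column can be pulled out as a plain NumPy array.
values = df["temp_c"].to_numpy()

# Label-aware operations such as group-by are what pandas adds.
mean_by_city = df.groupby("city")["temp_c"].mean()
```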

Matplotlib: 2D plotting library

IPython: Interactive Python (+ in the Web)

tab completion

magic %-commands

Inline plots

Anaconda: pulls it all together

wakari.io Browser-based Python & Linux environment

Share files, IPython notebooks, and plots with pay-as-you-go compute

IPython Notebook

Scientific Packages

Terminal

Sharing in Wakari

• Packages IPython notebooks, files, folders, data, and environment

• Get a link

• Share that link.

Reproducible Research

“A rule of thumb among biotechnology venture capitalists is that half of published research cannot be replicated”

How do we replicate research today?

How do we collaborate on data analysis today?

How do we collaborate today?

????????

How do we replicate research today?

wakari.io Browser-based Python & Linux environment

Enterprise or Cloud

Online at wakari.io or install locally for access to your hardware and data

Coming Soon

Project-based interaction

Projects starting at $10/month with unlimited team members

Interactive Plotting

Next-generation collaborative data manipulation, analysis, and presentation

Talks to see

• Jake VanderPlas (Washington) – Efficient computing with NumPy

  • 29th Floor combo, 3pm (Right now, next door!)

• Julia Evans (N/A) – A practical introduction to IPython Notebook & pandas

  • Here, 4:45pm

• Sarah Guido (Michigan) – A Beginner’s Guide to Machine Learning with scikit-learn

• Imran Haque (Counsyl) – Beyond the dict

• Peter Wang (Continuum) – Bokeh Workshop

Special Thanks

Ben Zaitlin

Mark Florisson

Clayton Davis

Bryan Van de Ven

Travis Oliphant

Karissa McKelvey

@karissamck
