Data Analysis with Pandas

Upload: outreach-digital

Post on 11-Apr-2017


Category: Data & Analytics



TRANSCRIPT

Page 1: Data analysis with pandas

Data Analysis with Pandas

Page 2: Data analysis with pandas

When you think of Python...

Page 3: Data analysis with pandas

Meet Jupyter Notebook

Page 4: Data analysis with pandas

And me

job_title != "Developer"

I’m a Consultant at Distilled (since September 2015)

I do build some software in Python

But I mainly use it for data analysis

Page 5: Data analysis with pandas

Getting Started

Page 6: Data analysis with pandas

Python for scientific computing

Huge community

Fantastic ecosystem of packages other people have written

Can be tedious to actually install everything

Page 7: Data analysis with pandas

Just use this! (https://continuum.io/downloads)

Page 8: Data analysis with pandas

What is Anaconda?

Essentially a large (~400 MB) Python installation

But contains everything* you need for data analysis

Unless you have a special reason not to, you should just install and use this

*OK, technically not true, but it has everything you’re likely to need

Page 9: Data analysis with pandas

You need the command line (but only for a minute)

On Windows, open PowerShell

On Mac, Terminal or iTerm2

Page 10: Data analysis with pandas

Just one line, though:

1. Just type “jupyter notebook”

2. Wait

3. ...

Page 11: Data analysis with pandas

Back to safety

Page 12: Data analysis with pandas

Open a new Notebook

Page 13: Data analysis with pandas

Your very own data analysis environment

Page 14: Data analysis with pandas

So that was fairly easy...

Page 15: Data analysis with pandas

but why is it better than Excel?

Page 16: Data analysis with pandas

There’s not enough room to list everything, but:

1. Handle larger data sets: there is no fixed row limit, only the memory on your machine (Excel stops at 1,048,576 rows)

2. Combine multiple files and data sources together instantaneously. Pull data straight from APIs or scraping

3. Everything is completely customisable—if you can imagine a query, it can be done (though not always easily)

4. It’s a safe place to mess things up

5. Keeps a record of your workflow—retrace your steps

Page 17: Data analysis with pandas

...and it’s the perfect playground for learning Python

Page 18: Data analysis with pandas

Side note: don’t know any Python?

Page 19: Data analysis with pandas

Can’t cover it all today, so go here:

1. Learn Python the Hard Way (free)

2. Real Python ($60, but good)

3. Writing Idiomatic Python (~$15)

Page 20: Data analysis with pandas

Unless you’re building applications:

1. Stick with the small building blocks

2. Learn how to write a function (we’ll do this today)

3. Learn about loops, conditional statements, and handling data

4. Probably no need to learn about managing projects and apps
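To make those building blocks concrete, here is a minimal sketch of a function, a loop, and a conditional working together. The function name, the keyword list, and the labelling rule are all made up for illustration.

```python
def classify_length(keyword):
    """Label a keyword by its word count (a hypothetical rule, just for practice)."""
    word_count = len(keyword.split())
    if word_count == 1:
        return "head"
    elif word_count <= 3:
        return "mid"
    return "long tail"


# Made-up sample data
keywords = ["hotels", "cheap hotels london", "best family hotels near me"]

# Loop over the data and call the function on each item
labels = []
for kw in keywords:
    labels.append(classify_length(kw))

print(labels)
```

That is roughly the level of Python you need before Pandas starts being useful.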

Page 21: Data analysis with pandas

Jupyter Notebook

Save notebooks for later

Run and re-run Python code

Really cool features like post-mortem debugging if you make a mistake

Page 22: Data analysis with pandas

Cells

1. Type all the code you want

2. Shift+Enter to run it

3. View the result

Page 23: Data analysis with pandas

Now that we have our Jupyter Notebook up and running, you can start playing around with almost any Python code

We’re going to look at Pandas, though—a data analysis library written in Python

Started its life in finance

Great for fast, flexible computation

The Star of the Show

Page 24: Data analysis with pandas

A little setup, first

You’ll do this more or less at the beginning of each session

It’ll become second nature; just import the workhorse libraries we always use: numpy, pandas, pyplot.
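The setup cell probably looks something like this. The short aliases (np, pd, plt) are the community convention rather than anything the slide confirms.

```python
# Conventional aliases for the three workhorse libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Quick check that everything imported
print(np.__version__, pd.__version__)
```

In a notebook you may also want the `%matplotlib inline` magic so charts render under the cell; that line only works inside Jupyter/IPython.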

Page 25: Data analysis with pandas

The DataFrame

If you’re used to spreadsheets, the DataFrame isn’t too difficult to understand

It’s the fundamental, flexible building block in Pandas

Page 26: Data analysis with pandas

At its simplest, it looks rather like a spreadsheet would

The only obvious difference from Excel is the column labels, which are numeric here instead of A, B, C...
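A minimal sketch of a DataFrame built "by hand" from a grid of numbers, assuming no column names are supplied. That is the case where Pandas falls back to numeric column labels.

```python
import numpy as np
import pandas as pd

# A 4x3 grid of numbers with no column names given
grid = pd.DataFrame(np.arange(12).reshape(4, 3))

print(grid)
print(list(grid.columns))  # numeric labels, starting at 0
```

Once you load real data with a header row, the columns take their names from that header instead.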

Page 27: Data analysis with pandas

You’ll usually create them from some other source:

The Pandas library provides some nice functions for importing from common file formats, so you won’t usually be building “by hand”:

1. pd.read_csv()

2. pd.read_table()

3. pd.read_sql()
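As a rough illustration of the first two readers, the sketch below feeds them in-memory text instead of real files, so it runs anywhere. The sample data is invented. pd.read_sql() is left out because it needs a database connection.

```python
import io

import pandas as pd

# Invented sample data, comma-separated and tab-separated
csv_text = "keyword,volume\nhotels,1000\ncheap hotels,400\n"
tsv_text = csv_text.replace(",", "\t")

# read_csv expects commas by default; read_table expects tabs by default
from_csv = pd.read_csv(io.StringIO(csv_text))
from_tsv = pd.read_table(io.StringIO(tsv_text))

print(from_csv)
```

With a real file you would pass the file path instead of the StringIO object.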

Page 28: Data analysis with pandas

We have so much data stored in CSVs

Our first function call will just read some data into the DataFrame, where we can analyse it

Reading a CSV

Page 29: Data analysis with pandas

Get help at any time with Shift+Tab

Page 30: Data analysis with pandas

1. pd.read_csv() will read in the data

2. Fields are separated by tabs

3. The encoding is UTF-16 (don’t ask…)

4. The whole result is assigned to the variable ‘df’
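The call on the slide is not in the transcript, but from the four points above it was presumably along these lines. The file name and its contents are stand-ins; the sketch writes its own small UTF-16, tab-separated file first so that it is self-contained.

```python
import os
import tempfile

import pandas as pd

# Create a small tab-separated, UTF-16 file to stand in for the real export
path = os.path.join(tempfile.mkdtemp(), "links.txt")  # hypothetical file name
with open(path, "w", encoding="utf-16") as fh:
    fh.write("URL\tLink Active?\n")
    fh.write("http://example.com/a\tTrue\n")
    fh.write("http://example.com/b\tFalse\n")

# Read it: tab separator, UTF-16 encoding, result assigned to df
df = pd.read_csv(path, sep="\t", encoding="utf-16")

print(df)
```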

Page 31: Data analysis with pandas

Get a quick sense of the data (658k rows, here)
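The slide's exact call is not shown, so this is just a sketch of the usual first look, with a tiny made-up DataFrame standing in for the 658k rows.

```python
import pandas as pd

# Tiny stand-in for the real data set
df = pd.DataFrame({
    "keyword": ["hotels", "cheap hotels", "motels", "hostels", "inns"],
    "volume": [1000, 400, 250, 90, 40],
})

print(df.shape)       # (rows, columns)
print(df.head(3))     # first few rows
print(df.describe())  # summary statistics for numeric columns
```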

Page 32: Data analysis with pandas

See the columns
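Listing the columns is a one-liner. A small sketch with invented column names:

```python
import pandas as pd

# Invented column names for illustration
df = pd.DataFrame({
    "URL": ["http://example.com/a"],
    "Anchor Text": ["click here"],
    "Link Active?": [True],
})

print(df.columns)  # the column labels
print(df.dtypes)   # and the type Pandas inferred for each
```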

Page 33: Data analysis with pandas

Filtering

Page 34: Data analysis with pandas

What’s happening there?

df['Link Active?'] is:

1. Checking that whole column for values that are True or False

2. Returning an array of True/False values

3. This is fast, and lets us filter in an amazing variety of ways
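Here is a minimal sketch of that True/False filtering, assuming a 'Link Active?' column that already holds booleans. The URLs are invented.

```python
import pandas as pd

df = pd.DataFrame({
    "URL": ["http://example.com/a", "http://example.com/b", "http://example.com/c"],
    "Link Active?": [True, False, True],
})

# A column of True/False values, one per row
mask = df["Link Active?"]

# Passing the mask back into df keeps only the rows where it is True
active = df[mask]

# ~ flips the mask, which gives the inactive links instead
inactive = df[~mask]

print(active)
```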

Page 35: Data analysis with pandas

Filtering (again)

Page 36: Data analysis with pandas

We’re probably ready for this one, now:

Page 37: Data analysis with pandas

Example project: Getting data from SEMRush

Page 38: Data analysis with pandas

Writing your own function

Page 39: Data analysis with pandas

Call our function, get a DataFrame!
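The function from the slide is not in the transcript, so the following is only a sketch of the general shape: a function that takes raw report text and hands back a DataFrame. The function name, the semicolon separator, the column names, and the sample text are all assumptions, and the step that actually requests the data is left out.

```python
import io

import pandas as pd


def report_to_frame(raw_text, domain):
    """Turn delimited report text into a DataFrame (hypothetical helper)."""
    frame = pd.read_csv(io.StringIO(raw_text), sep=";")  # separator is an assumption
    frame["Domain"] = domain  # remember which domain the rows belong to
    return frame


# Invented sample text standing in for a downloaded report
sample = (
    "Keyword;Position;Search Volume\n"
    "hotels in texas;3;5400\n"
    "cheap hotels tx;7;880\n"
)

semrush_df = report_to_frame(sample, "example.com")
print(semrush_df)
```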

Page 40: Data analysis with pandas

Write to disk in case anything goes wrong
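Saving is a single call. A sketch with a made-up DataFrame and a temporary path, then reading it back to show nothing was lost:

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    "Keyword": ["hotels in texas", "cheap hotels tx"],
    "Search Volume": [5400, 880],
})

# Hypothetical output location; use any path you like
out_path = os.path.join(tempfile.mkdtemp(), "semrush_backup.csv")

# index=False leaves the row numbers out of the file
df.to_csv(out_path, index=False)

# Read it back to confirm the round trip
restored = pd.read_csv(out_path)
print(restored)
```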

Page 41: Data analysis with pandas

Reading in multiple files
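One common pattern for this, which may or may not be what the slide showed, is glob plus pd.concat. The sketch creates two small CSV files first so that it runs on its own; the file names and data are invented.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small CSV files to stand in for real exports
folder = tempfile.mkdtemp()
samples = {
    "site_a.csv": {"Keyword": ["hotels", "motels"], "Position": [3, 9]},
    "site_b.csv": {"Keyword": ["hotels", "inns"], "Position": [5, 2]},
}
for name, rows in samples.items():
    pd.DataFrame(rows).to_csv(os.path.join(folder, name), index=False)

# Find every CSV in the folder, read each one, and tag it with its file name
paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
frames = [pd.read_csv(p).assign(source=os.path.basename(p)) for p in paths]

# Stack them into one DataFrame with a fresh row index
combined = pd.concat(frames, ignore_index=True)
print(combined)
```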

Page 42: Data analysis with pandas

Apply custom filters
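The slide's filters are not in the transcript. As an illustration, here are two invented conditions (keyword contains "hotel", volume of at least 1000) combined with &.

```python
import pandas as pd

df = pd.DataFrame({
    "Keyword": ["hotels in texas", "cheap hotels tx", "texas motels", "luxury hotels austin"],
    "Search Volume": [5400, 880, 320, 1300],
})

# Each condition gives a True/False value per row
has_hotel = df["Keyword"].str.contains("hotel")
big_enough = df["Search Volume"] >= 1000

# & keeps the rows where both conditions are True
filtered = df[has_hotel & big_enough]
print(filtered)
```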

Page 43: Data analysis with pandas

Drill down into individual words:

Counter() will save you a huge amount of work

Here we wanted to home in on modifier words
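A minimal sketch of counting words with Counter, assuming the keywords are plain space-separated phrases. The phrases are invented.

```python
from collections import Counter

import pandas as pd

keywords = pd.Series([
    "cheap hotels texas",
    "best hotels texas",
    "cheap motels austin",
    "luxury hotels austin",
])

# Split each phrase into words and tally them all in one Counter
word_counts = Counter()
for phrase in keywords:
    word_counts.update(phrase.split())

# The most frequent words, with their counts
print(word_counts.most_common(3))
```

Modifiers such as "cheap" and "best" then show up with their counts alongside the core terms.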

Page 44: Data analysis with pandas

More detailed questions

How local are the searches?

Do people search by state code or full name?

Do people search by hotel category?
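One way to approach the state code question, sketched for a single state with invented keywords. The patterns and the choice of Texas are assumptions for illustration only.

```python
import pandas as pd

keywords = pd.Series([
    "hotels in texas",
    "cheap hotels tx",
    "hotels austin tx",
    "texas motels",
    "luxury hotels",
])

# \b marks a word boundary, so "tx" will not match inside a longer word
uses_full_name = keywords.str.contains(r"\btexas\b")
uses_state_code = keywords.str.contains(r"\btx\b")

# Summing a True/False column counts the True values
summary = {
    "full name": int(uses_full_name.sum()),
    "state code": int(uses_state_code.sum()),
}
print(summary)
```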

Page 45: Data analysis with pandas

Second example: Custom Rank Tracking Charts

Page 46: Data analysis with pandas

Where to begin?

If you don’t know Python, start with those books I shared earlier.

If you do, check out Python for Data Analysis

Keep Jupyter Notebook open at all times

Experiment!

Page 47: Data analysis with pandas

Questions?