Introduction to Python for Social Scientists

Presented by: Drew Conway
Date: February 19, 2013

Goals

I. Provide you with a baseline introduction to the Python programming language
II. Get you excited about using Python for your research

Expectations

I. Little to no programming experience
II. You have some project or problem you want to use Python to solve

What you will get out of this tutorial

The purpose of this tutorial is to provide a very basic introduction to the Python language. The vast majority of our time will be spent reviewing the following elements:

Data types and functions

One of Python's strongest characteristics is its syntax. It is sometimes described as "executable pseudo-code", and it is this high degree of readability that attracts programmers from all levels to the language.

We will discuss the basic data types of strings, integers, floating point values, and booleans, and many of Python's built-in functions.

x = True
if x is True:
    print "We will always end up here"
else:
    print "We will never end up here"

Data structures

There are many useful data structures that can be used in Python, but for this tutorial we will focus on the two most frequently used: list and dict.

# A list of fruits
fruits = ["apple", "banana", "peach"]
fruits[1]
"banana"

# A dictionary of my current fruit count
fruit_count = {"apple": 3, "banana": 1, "peach": 0}
fruit_count["peach"]
0

Iteration

In nearly all of your work you will need to iterate over some data structure. We have many ways of accessing and manipulating data via iteration.

for i in range(10):
    print "I can count to "+str(i)

Logical statements

We often need to test if a variable meets some logical criteria in order to make a decision in our programs. We will discuss the most basic forms of logical statements.

c = 10
while c > 0:
    if c > 1:
        print "I am only going to do this "+str(c-1)+" more times"
    else:
        print "This is the last one!"
    c -= 1

Writing functions

Many times we have some task that we need to do many times over. In these cases we do not want to keep writing the same code in our program over and over. To avoid this we will write functions that can be called each time we need to perform that task.

def my_exp(x, i):
    return x**i

print my_exp(5,8)
390625

Loading libraries

As noted, Python has hundreds of standard libraries and thousands of third-party libraries that you may want to work with in your research. We will review how to load libraries and call functions and classes from those libraries.

from datetime import datetime
print datetime.now()
2013-01-10 01:33:33.397024

What you will not get out of this tutorial

All of the things you probably actually want to learn Python to do:

Web scraping
Text processing
Statistical programming
Visualization
Accessing data via an API

Hopefully, this tutorial will give you the building blocks so that you can investigate these topics and approach them independently.

Why should you learn to program in Python?

Presuming you have an interest -- or need -- to program, why should you use Python for your academic work?

The design philosophy

import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

This design philosophy also contributes in large part to the readability, and thus share-ability, of your code.
Code sharing and reproducibility are huge positive externalities of using Python for scientific computing.

It has a very large scientific computing community

Source: scipy.org

People love it

Source: xkcd

How this tutorial will work

This tutorial is running on a program called iPython, or "interactive Python"

Specifically, we are using the iPython notebook, which runs in the browser
It is very similar to a Mathematica or Sage notebook

The notebook is separated by cells

Some of the cells contain Markdown text, which provides information
Other cells are Python input fields, wherein code can be written and executed
As you saw with the import this example above

This tutorial is meant to be highly interactive. There are no slides, only this notebook.

When we finish, you will be able to save all of your work to a Python script file and keep it for your records

Using the iPython notebook

For the purposes of this tutorial you only need to understand three things about the notebook:

I. To navigate between cells you can click on the cell with your mouse, or use the up and down arrows on your keyboard
II. If you double click on a cell you can edit it (do this at your own risk!)
III. To write code in an input field simply click on the field and start typing. Go ahead and give it a shot below.

I. To execute code in a cell click the "play" button in the tool bar above (highlighted below)
II. If you prefer the key-bindings, a cell can be run by pressing Control-Shift and Enter on a PC or Command-Shift and Return on a Mac.

Exercises and problems

The best way to learn any new language is to immerse yourself in it and force yourself to use it.

Programming languages are no different!
You will learn the syntax, functions, and idioms of Python exclusively by working through examples and solving problems.

For this tutorial we will be exploring the Python programming language in two different ways.

Exercises to give you practice with the language
Problems to challenge you to apply the tools you learn

Exercises: code writing calisthenics

Repetition, repetition, repetition!

When you see a block of code and an input field surrounded by single horizontal lines, type that code into the input field and run the code.

You can copy-paste the code as well, but that's lazy and defeats the purpose!

For example:

a = 2 + 2
a

Problems: finding solutions with new tools

The best way to learn a new programming language is to have a specific problem you need to solve.

When you see a problem or question and an input field surrounded by double horizontal lines, write your own code to attempt to solve it.

As with any programming language, there are many ways to solve the same problem in Python.
Attempt to solve the problem on your own before working with your neighbor
Shout out any error messages or odd behavior you observe!

For example:

Set a new variable to the result of some arithmetic operation, then print the variable's value to screen.

Data for this exercise

We will be working with a data set that contains every play from scrimmage for the 2011 NFL season.

This data makes for a great sandbox because it contains columns with many different data types
It is also every play from an NFL season -- awesome!

Loading in the data

First thing we need to do is load in the data

The file is a text file with comma-separated values (CSV)
The file name is 2011_nfl_pbp_data.csv

To load the data we need to load Python's csv library

import csv

Once you have the csv library loaded, let's explore the contents of the library via iPython

In the input field above, type csv. and then hit the TAB key

You can see all of the functions and classes contained in the csv library interactively in iPython.
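Tab completion is an iPython feature. In a plain Python session, the built-in dir function gives a similar listing. The following is a minimal sketch of that alternative; the variable name public_names is only illustrative.

```python
import csv

# dir() returns the names defined in a module as a list of strings.
# Dropping names that start with "_" leaves the public functions and classes.
public_names = [name for name in dir(csv) if not name.startswith("_")]

print(public_names)
```

The list includes DictReader, the class used in the rest of this section.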

We will use the DictReader class
This class will take each row of data in the CSV file and map it to a Python dictionary

csv.DictReader?

By appending a ? to the end of a function or class in iPython we can view its documentation. For the DictReader class we see the following:

Type: classobj
String Form: csv.DictReader
File: /usr/lib/python2.7/csv.py
Docstring: <no docstring>
Constructor information:
Definition: csv.DictReader(self, f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)

One thing to note is that this class does not contain a Docstring, i.e., no plain-English documentation. Unfortunately, many of the Python base libraries have their documentation only online, so getting familiar with them may require some Googling or browsing of docs.python.org.

For our purposes, the most important line is the Definition, which tells us the arguments the class needs to be initialized. The arguments of concern to us are:

f: A file connection to our data file.
This is the only argument that does not have a default value, meaning it is required
fieldnames: These are the column headers contained in the CSV file.
We need to supply a list of strings that match these values so the data can be mapped to dict keys
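To see what the fieldnames argument changes, here is a small sketch that reads a hypothetical three-line CSV from memory instead of the tutorial's file. The sample rows and the names raw, with_names, and without_names are assumptions made for illustration, and the block is written for Python 3 (io.StringIO stands in for an open file).

```python
import csv
import io

# Hypothetical sample: one header line followed by two data rows.
raw = u"gameid,qtr,off,def\n20110908_NO@GB,1,NO,GB\n20110908_NO@GB,1,GB,NO\n"
headers = ["gameid", "qtr", "off", "def"]

# With fieldnames supplied, every line is treated as data,
# so the header line comes back as the first "row".
with_names = list(csv.DictReader(io.StringIO(raw), fieldnames=headers))

# With fieldnames omitted, the first line is consumed as the keys.
without_names = list(csv.DictReader(io.StringIO(raw)))

print(len(with_names))
print(len(without_names))
print(without_names[0]["off"])
```

When the file already has a header line, supplying fieldnames means that line is returned as an ordinary row and has to be dealt with afterwards.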

Creating a list for the fieldnames argument

The columns are:

gameid: A unique alphanumeric ID value for the game
qtr: Game quarter in which the play occurred, {1,2,3,4}
min: Minutes remaining in the game, [0,59]
sec: Seconds remaining in the game, [0,59]
off: Team on offense for the play, two- or three-character value (e.g., NO, NYG)
def: Team on defense for the play, two- or three-character value
down: Down for play, {1,2,3,4}
togo: Yards needed by offensive team to gain a first down
ydline: Yard-line the ball is on for play
description: Text description of play
offscore: Score of offensive team for play
defscore: Score of defensive team for play
season: NFL season in which play occurred

We need to create a list that contains all 13 column header values as strings

nfl_headers = ["gameid","qtr","min","sec","off","def","down","togo","ydline","description","offscore","defscore","season"]
len(nfl_headers)
13

Open a connection to the CSV file

We will use the open function to create a connection to the CSV file. But, first we need to know what open needs to make the connection.

Pull up the documentation for the open function, and tell me what we need to open the connection

We now have the steps needed for opening a CSV file:

I. Find path to file
II. Open connection to file
III. Create DictReader object from file connection

data_file = "2011_nfl_pbp_data.csv"
con = open(data_file, "r")
data = csv.DictReader(con, fieldnames=nfl_headers)

Our new variable data, returned by the DictReader class, is a special kind of Python object called an iterator.

We know this because by inspecting the data object we see a function called next
Spoiler: iterators are meant to be iterated over
Another common example is xrange, which is used to iterate through a set of integers
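The behavior of an iterator can be sketched without the data file. The example below uses a hypothetical list of strings in place of CSV rows, and the built-in next() function, which works on both Python 2.7 and Python 3 (the next method seen on the data object is the Python 2 spelling).

```python
# iter() turns a sequence into an iterator; this list is a hypothetical
# stand-in for the rows a DictReader would produce.
rows = iter(["row 1", "row 2", "row 3"])

# The built-in next() pulls one item at a time.
first = next(rows)

# list() (or a for-loop) consumes whatever is left.
remaining = list(rows)

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(rows)

print(first)
print(remaining)
print(second_pass)
```

This is why the rows are copied into a list in the next step: an iterator can only be walked through once.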

Getting data out of data

On its own, the data iterator is not useful to us.

We need to iterate over it and generate a dict for each row.
Then, store each of those dicts in a list

To do this we will create an empty list, and then iterate over the data object using a for-loop. At each iteration we will use the list's built-in append function to add each new row of data.

data_rows = list()
# Alternatively
# data_rows = []

# Add data to list
for row in data:
    data_rows.append(row)

# Close file connection
con.close()

The data_rows list now contains all of the rows from the 2011_nfl_pbp_data.csv file.

A brief digression on Python lists

In Python you will work with lists a lot.

If you have worked with other languages, they are similar to vectors, arrays or stacks

You can slice, or select ranges, of elements

nfl_headers[2:5]
['min', 'sec', 'off']

You can index in reverse

nfl_headers[-1]
'season'

They are zero-indexed

data_rows[0]
{'def': 'def', 'defscore': 'defscore', 'description': 'description', 'down': 'down', 'gameid': 'gameid', 'min': 'min', 'off': 'off', 'offscore': 'offscore', 'qtr': 'qtr', 'season': 'season', 'sec': 'sec', 'togo': 'togo', 'ydline': 'ydline'}

They come with a very useful set of functions, some of which we have already used in this tutorial.

I. Fix data_rows so that it does not contain an element for the column header row from the CSV file, call the new list play_data.
II. According to this data, how many plays were run during the 2011 NFL season?
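As a rough illustration of a few of those list functions, the sketch below works on a hypothetical fruits list rather than the play data. Each call is a standard list method.

```python
# A hypothetical list, reusing the fruit example from the overview.
fruits = ["apple", "banana", "peach"]

fruits.append("cherry")           # add one element to the end
fruits.extend(["kiwi", "fig"])    # add several elements at once
last = fruits.pop()               # remove and return the last element
position = fruits.index("peach")  # find the position of a value
fruits.sort(reverse=True)         # sort in place, here in descending order

print(fruits)
print(last)
print(position)
```

Note that append, extend, and sort change the list itself rather than returning a new one.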

Manipulating the data

Now we have all of the play-by-play data in a single list

Each element of the list represents a single play
These elements are Python dicts

Digression on Python dictionaries

A dictionary (a dict object) is a data structure that maps keys to values.

Values can be of arbitrary type; keys can be of any hashable type, such as strings, numbers, or tuples
Order does not matter!

first_play = play_data[0]
first_play

We can access the values by referencing the key

first_game_id = first_play["gameid"]
first_game_id
'20110908_NO@GB'

Return the keys as a list

play_keys = first_play.keys()
play_keys
['gameid', 'qtr', 'description', 'min', 'season', 'down', 'togo', 'ydline', 'sec', 'off', 'defscore', 'def', 'offscore']

Or, return the values as a list

play_values = first_play.values()
play_values
['20110908_NO@GB', '1', 'T.Morstead kicks 68 yards from NO 35 to GB -3. R.Cobb to GB 24 for 27 yards (L.Torrence).', '', '2011', '', '', '', '0', 'NO', '0', 'GB', '0']
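A few other everyday dictionary operations can be sketched on a small, hypothetical play that carries only four of the tutorial's fields. The matchup key is invented for this example.

```python
# A hypothetical play with only a few of the tutorial's fields.
play = {"gameid": "20110908_NO@GB", "qtr": "1", "off": "NO", "def": "GB"}

# Membership tests look at the keys.
has_down = "down" in play

# get() returns a fallback instead of raising KeyError for a missing key.
down = play.get("down", "unknown")

# items() yields (key, value) pairs; sorting gives a predictable order.
pairs = sorted(play.items())

# Assigning to a new key adds it to the dictionary.
play["matchup"] = play["off"] + "-" + play["def"]

print(has_down)
print(down)
print(pairs[0])
print(play["matchup"])
```

Sorting the pairs is only needed when a stable display order matters, since the dictionary itself should not be relied on for ordering.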

List Comprehension

One of the most powerful features of Python for working with data is list comprehension.

Very often we need to manipulate a list of data based on set conditions. One way to do this is to iterate over that list and operate on only those values that meet the condition during the iteration.

Suppose we wanted to create a subset of play_data such that it contained only 4th down plays in the 4th quarter?

q4_d4 = []
for p in play_data:
    if p["down"] == "4" and p["qtr"] == "4":
        q4_d4.append(p)
len(q4_d4)

Now, suppose we wanted to know what the average score was in 2011 for teams that "went for it" on fourth down in the 4th quarter.

q4_d4_scores = []
for p in q4_d4:
    q4_d4_scores.append(int(p["offscore"]))
print sum(q4_d4_scores) / float(len(q4_d4_scores))

That was a lot of coding!

List comprehensions allow you to write much more concise code for doing these manipulations.

Because Python has some functional programming aspects to it, you can combine list comprehensions with functions to quickly explore data.

Let's try to answer the same question using list comprehension:

lc_q4_d4_scores = [int(a["offscore"]) for a in play_data if a["down"] == "4" and a["qtr"] == "4"]
print sum(lc_q4_d4_scores) / float(len(lc_q4_d4_scores))

So, what just happened?

The syntax is very specific. We read a list comprehension as:

[(return this) for (this,or,that) in this_list if these conditions are met]

Because in Python a list can contain many different kinds of data objects, the list comprehension must be written as appropriate to the data in the list being passed through it.
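To make the reading pattern concrete, here is a sketch that runs the same filter-and-transform step twice, once as a loop and once as a comprehension. The three plays are hypothetical and carry only the fields the example needs.

```python
# Hypothetical plays with string values, as DictReader would return them.
plays = [
    {"off": "NYG", "down": "4", "togo": "2"},
    {"off": "GB", "down": "1", "togo": "10"},
    {"off": "GB", "down": "4", "togo": "1"},
]

# Explicit loop: filter, transform, collect.
loop_result = []
for p in plays:
    if p["down"] == "4":
        loop_result.append(int(p["togo"]))

# The same steps in one expression:
# [(return this)   for (this) in this_list if (condition)]
comp_result = [int(p["togo"]) for p in plays if p["down"] == "4"]

print(loop_result)
print(comp_result)
```

Both versions produce the same list; the comprehension simply puts the returned expression first and the condition last.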

In our case, we have a list of dict objects, so the list comprehension can understand dict operations on the items being passed:

play_tuples = [a.items() for a in play_data]
play_tuples[0]

[('gameid', '20110908_NO@GB'), ('qtr', '1'), ('description', 'T.Morstead kicks 68 yards from NO 35 to GB -3. R.Cobb to GB 24 for 27 yards (L.Torrence).'), ('min', ''), ('season', '2011'), ('down', ''), ('togo', ''), ('ydline', ''), ('sec', '0'), ('off', 'NO'), ('defscore', '0'), ('def', 'GB'), ('offscore', '0')]

But, you can't do non-dict things to the elements in a list comprehension if that list only contains dict objects. If you do, Python gets very mad...

Suppose we wanted to add a new ID to each element based on the teams playing for each play (off-def), but used the append function for lists. We are passing a dict object, which has no append function.

[a.append(a["off"]+"-"+a["def"]) for a in play_data]
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-33-9993fdcc0f9b> in <module>()
----> 1 [a.append(a["off"]+"-"+a["def"]) for a in play_data]

AttributeError: 'dict' object has no attribute 'append'

We just got bitten by the Python interpreter! This happened because we tried to call a function that does not exist on the list items!

But, even if we had done things the way dict wants you to, we could have done even more damage

new_id_play_data = [a.update({"new_id": a["off"]+"_"+a["def"]}) for a in play_data]
new_id_play_data[0:5]
[None, None, None, None, None]

We nuked all of our data! This happened because the update function operates on an element "in-place", meaning it does not return a value.

The first part of a list comprehension is always the thing that is being returned
If you perform an in-place modification in a list comprehension you return the None object

But...if we inspect the play_data list we will notice something:

play_data[0]
{'def': 'GB', 'defscore': '0', 'description': 'T.Morstead kicks 68 yards from NO 35 to GB -3. R.Cobb to GB 24 for 27 yards (L.Torrence).', 'down': '', 'gameid': '20110908_NO@GB', 'min': '', 'new_id': 'NO_GB', 'off': 'NO', 'offscore': '0', 'qtr': '1', 'season': '2011', 'sec': '0', 'togo': '', 'ydline': ''}

The new_id key and value have been added

List comprehensions should NOT be used to modify the data, even though they can be
To modify a list with iteration, it is much better to use a for-loop

for i in xrange(len(play_data)):
    play_data[i]["new_id"] = play_data[i]["off"] + "-" + play_data[i]["def"]
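If the goal is a new list rather than a modified one, a comprehension can return a fresh dictionary for each element instead of calling an in-place method. The following is one possible sketch on two hypothetical plays; the seen key and the names nones and with_id are invented for illustration.

```python
plays = [{"off": "NO", "def": "GB"}, {"off": "GB", "def": "NO"}]

# In-place methods return None, so a comprehension built on them collects Nones...
nones = [p.update({"seen": True}) for p in plays]

# ...even though each dictionary was modified as a side effect.
modified = all("seen" in p for p in plays)

# To build new data in a comprehension, return a new object instead:
# dict(p, key=value) copies p and adds one entry, leaving p untouched.
with_id = [dict(p, new_id=p["off"] + "-" + p["def"]) for p in plays]

print(nones)
print(modified)
print(with_id[0]["new_id"])
print("new_id" in plays[0])
```

Here the original dictionaries never gain a new_id key, which keeps the comprehension free of side effects.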

Now that we have seen some of the things you can do with list comprehensions, let's try to answer a specific question:

How many games in 2011 were the New York Giants leading at halftime?

First, let's think about what we need in order to answer this question:

I. Plays from only New York Giants games
II. The score at halftime for all of these games
III. Calculate if New York Giants were winning

We begin by slicing the full play_data set to include only plays involving the New York Giants

The team ID for the New York Giants in this data is NYG

nyg_plays = [a for a in play_data if a["off"]=="NYG" or a["def"]=="NYG"]
len(nyg_plays)

We see that the New York Giants were involved in 3,396 plays, on offense or defense, in the 2011 season!

Next, we need to extract only those plays that were the first of each game's second half. From the data, there are a few things we need to keep in mind:

First play of second half: qtr = '3', min = '30' and sec = '0'
The second half always begins with a kickoff, therefore, the description of the first play of the second half will always contain the word "kicks"

nyg_halftime = [a for a in nyg_plays if a["qtr"]=="3" and a["min"]=="30" and a["sec"]=="0" and a["description"].find("kicks") > 0]
len(nyg_halftime)

Here we find 20 plays that start a second half. This is reassuring, because the New York Giants played exactly 20 games in 2011: 16 regular season, and 4 playoff games -- the most games possible for an NFL team in a single season
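One detail of the find test is worth a closer look. find returns the position of the first match, which is 0 when the match is at the very start of the string, and -1 when there is no match. The sketch below, which uses one description from the data and two made-up strings, compares a "> 0" test with the in operator.

```python
description = "T.Morstead kicks 68 yards from NO 35 to GB -3."

# find() returns the index of the first match, or -1 when there is none.
found_at = description.find("kicks")
missing = description.find("punts")
at_start = "kicks off".find("kicks")

# A "> 0" test misses a match at position 0; the in operator does not.
uses_find = "kicks off".find("kicks") > 0
uses_in = "kicks" in "kicks off"

print(found_at)
print(missing)
print(at_start)
print(uses_find)
print(uses_in)
```

In the play descriptions a player name comes before "kicks", so the two tests agree there; the in form is simply the safer habit.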

Finally, we need to inspect each of these plays to see if the New York Giants were leading at halftime.

In this case, we need to answer our question using a decision tree-like structure:

Are the New York Giants on offense to start the half, and if so, are they winning?
Likewise, if they are on defense, are they winning?

We could use a list comprehension to answer this, but it would be very long and overly complex (remember the Zen of Python!). List comprehensions do not support decision-tree processes very well. A better way would be to use a simple for-loop to iterate over each play and do the comparison. We can create a simple counter to keep track of the number of games the New York Giants are ahead at halftime.

nyg_halftime_lead = 0
for g in nyg_halftime:
    if g["def"] == "NYG":
        if int(g["defscore"]) > int(g["offscore"]):
            nyg_halftime_lead += 1
    else:
        if int(g["offscore"]) > int(g["defscore"]):
            nyg_halftime_lead += 1
print nyg_halftime_lead

The Super Bowl Champion New York Giants were only leading at halftime in 40% of their games in 2011?

This is why we love football!

Now that you have seen list comprehensions at work, it's time for you to go to work.

Using list comprehension, calculate the following values:

I. How many offensive plays did the San Francisco 49ers run in 2011?
    San Francisco 49ers team ID is SF
II. What was the average number of yards the Green Bay Packers needed to gain for a first down when going for it on 4th down in 2011?
    Green Bay Packers team ID is GB
III. Since means don't make a lot of sense in down and distance, what was the median? HINT
IV. Challenge: What was the mean point differential between the New England Patriots and their opponents in the 2011 season?

Writing functions

Functions in Python work much like those you may have seen in other languages

Take some input, and produce some output (most of the time)
Can be arbitrarily complex, but best when simple
Shouldn't be disposable, best when used for repetitive tasks

Special things about Python Functions

In Python, there are a few specifics about functions that you may find useful

Unlike some other languages, such as R, Python functions can return multiple values

def multiReturn(x,y):
    return x-y, y-x

x1,x2 = multiReturn(10,11)
print x1,x2
-1 1
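Under the hood, "multiple values" are packed into a single tuple, which the caller may unpack or keep whole. A short sketch with a hypothetical min_max function:

```python
def min_max(values):
    # The comma builds a tuple; the function returns that single object.
    return min(values), max(values)

# Keep the result whole...
result = min_max([3, 1, 4, 1, 5])

# ...or unpack it so each element gets its own name.
low, high = min_max([3, 1, 4, 1, 5])

print(result)
print(low)
print(high)
```

Unpacking requires the number of names on the left to match the number of returned values.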

Default values can be set as function parameters, but parameters that have no default are required.

We have actually already seen this behavior with DictReader

def drewRocks(s, drew_rocks=True):
    if drew_rocks:
        print "Drew rocks at "+s
    else:
        print "Drew sucks at "+s

drewRocks("eating")
Drew rocks at eating
drewRocks("dancing", drew_rocks=False)
Drew sucks at dancing

drewRocks()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-0aadc14722a0> in <module>()
----> 1 drewRocks()

TypeError: drewRocks() takes at least 1 argument (0 given)

Order matters when passing arguments, unless you pass them by name

drewRocks(True, "programming")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-9c847bb9e508> in <module>()
----> 1 drewRocks(True, "programming")

/Users/drewconway/Dropbox/NYU/python_tutorial/tutorial_code.py in drewRocks(s, drew_rocks)
    128 def drewRocks(s, drew_rocks=True):
    129     if drew_rocks:
--> 130         print "Drew rocks at "+s
    131     else:
    132         print "Drew sucks at "+s

TypeError: cannot concatenate 'str' and 'bool' objects
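Naming the arguments removes the dependence on order. The sketch below uses a hypothetical variant of the function, called describe here, that returns its message instead of printing it so the two calls can be compared.

```python
def describe(s, drew_rocks=True):
    # Returns the message instead of printing it, so the result can be inspected.
    if drew_rocks:
        return "Drew rocks at " + s
    else:
        return "Drew sucks at " + s

# Positional: values are matched to parameters from left to right.
a = describe("programming", False)

# Keyword: naming every argument makes the order irrelevant.
b = describe(drew_rocks=False, s="programming")

print(a)
print(b)
```

Both calls produce the same string, because the second one states which value belongs to which parameter.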

Let's try to write a function that calculates a factorial.

def factorial(n):
    if n < 0:
        raise ValueError("Factorials are only defined for weakly positive integers")
    else:
        if n==0:
            return 1
        else:
            f = 1
            for i in xrange(1,n+1):
                f = f * i
            return f

In this case we do the following:

I. First check to make sure the input matches the domain of our function
II. If it does, then check if the input is equal to 0. If it is, return 1.
III. If the input matches our domain, but is not equal to 0, perform the factorial operation.

Think about how we could improve this function -- we'll return to it.

What are some common programming tasks that you have written, or thought you needed to write, functions for?

List them here:

map and reduce

map

Often times we want to perform the same function over some iterator, and return a copy. This pattern is SO common that Python includes the map function for performing this task.

map is exactly analogous to R's apply family.
All maps take the form of map(function,iterator)

Remember when I said functions shouldn't be disposable?

Well, sometimes they are -- especially when using a map
In these cases we can use a lambda function

Square all the elements of a list

range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

squared = map(lambda x: x**2, range(10))
squared
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

You can also assign a lambda function to a variable

s = lambda x: x**2
map(s, range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

map and lambda are two great things that go great together, and you will learn to use them everywhere!
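For comparison, the same squaring can be written as a list comprehension. The sketch below shows both forms side by side; wrapping map in list() is an addition made here so the code behaves the same on Python 3, where map returns a lazy iterator instead of a list.

```python
numbers = range(10)

# list(map(...)) gives a list on both Python 2.7 and Python 3.
squared_map = list(map(lambda x: x**2, numbers))

# The equivalent list comprehension needs no lambda.
squared_comp = [x**2 for x in numbers]

print(squared_map)
print(squared_comp)
```

Which form to use is largely a matter of taste; map reads well when a named function already exists.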

reduce

Similar to map, the reduce function passes a function over an iterator; however, the reduce function aggregates the iterator using the passed function.

With map we perform the same function on each element of an iterator and return a list
With reduce we perform the same function on each element in a cumulative fashion, and return the final value
The function passed to reduce must always have two inputs, and have some cumulative arithmetic effect.
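The cumulative behavior can be made visible by recording every call that reduce makes. In this sketch the helper add_and_record and the names total, empty_total, and steps are illustrative; reduce is imported from functools, which is required on Python 3 and also works on Python 2.7.

```python
from functools import reduce  # also a builtin on Python 2; required on Python 3

values = [1, 2, 3, 4]

# The lambda receives the running result (acc) and the next element (x).
total = reduce(lambda acc, x: acc + x, values)

# An optional third argument seeds the running result, which also
# gives a defined answer for an empty list.
empty_total = reduce(lambda acc, x: acc + x, [], 0)

# Recording each call shows how the result accumulates.
steps = []

def add_and_record(acc, x):
    steps.append((acc, x))
    return acc + x

reduce(add_and_record, values)

print(total)
print(empty_total)
print(steps)
```

Each recorded pair holds the running result so far and the element about to be folded into it.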

Let's revisit the factorial function and see how we could make it more compact with a reduce

def factorial_reduce(n):
    if type(n) != int:
        raise TypeError("Factorials are only defined for int type")
    else:
        if n < 0:
            raise ValueError("Factorials are only defined for weakly positive integers")
        else:
            if n==0:
                return 1
            else:
                return reduce(lambda x,y: x*y, range(1,n+1))

factorial_reduce(4)
24

How the reduce worked:

I. [1,2,3,4]
II. (((1*2)*3)*4)
III. 24

What else did I add?

Error handling is great, and you should learn all about it on your own!
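As a starting point for that exploration, here is a minimal sketch of catching an error with try and except. It uses a condensed copy of the factorial function, and the wrapper safe_factorial is a hypothetical name introduced only for this example.

```python
def factorial(n):
    # Same domain check as the tutorial's function, condensed.
    if n < 0:
        raise ValueError("Factorials are only defined for weakly positive integers")
    result = 1
    for i in range(1, n + 1):
        result = result * i
    return result

def safe_factorial(n):
    # try runs the risky call; except handles only the named error type.
    try:
        return factorial(n)
    except ValueError as err:
        return "Could not compute: " + str(err)

print(safe_factorial(4))
print(safe_factorial(-1))
```

Catching a specific error type, rather than every possible error, keeps unrelated bugs from being silently hidden.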

Time to write some mappers and reducers!

Using only map, lambda and reduce functions, do the following:

I. Create a list that contains only the offensive team for all plays in the data set
    Challenge: Return a list of (offensive, defensive) team tuples from all the play data
II. Return a list of the difference between the offensive and defensive teams' scores for all plays in the data set
III. Return the team on offense for the entire play data set as a single string
    HINT: You need to know about string concatenation in Python to get this one done.
IV. Return the difference of all the score differentials you calculated in #2

Final problem

Time to combine everything you learned! Answer the following question however you like...

Create a list of dict objects with the following key-values: {"gameid" : gameid, "winner": teamid}, where each entry contains the winning team for every game in the data set

Additional resources

(things you could Google on your own)

Installing Python

If you have Mac OS X or a Linux distribution, Python is already installed on your machine (lucky you!)
If you are on Windows, you will need to install the Windows binaries
If you are interested in starting with a new fresh installation, I recommend Python 2.7

Useful distributions

Because Python has so many useful libraries, there are several "full service" distributions that come with many of these libraries already installed

Anaconda
Enthought
SciPy Superpack (requires Mac OS X and Homebrew)

Libraries you will want to learn to do real stuff

NumPy / SciPy / matplotlib: Holy trinity of Python scientific computing: numerical, scientific, and plotting
pandas: Data manipulation, R-like data.frame
statsmodels: Statistical modeling
scikit-learn: Suite of machine learning methods and algorithms
lxml: Web scraping and parsing
nltk: Natural language and text mining