session 2 wharton summer tech camp 1: basic python 2: start regex
Session 2
Wharton Summer Tech Camp
1: Basic Python
2: Start Regex
Announcement
If you did not get an email from me saying that the slides have been uploaded, please email me and I’ll add you to the list.
Python Packaged Distribution
• Download this packaged version
• Enthought Canopy or EPD
  – Company that maintains a great compiled version of Python.
  – Has many packages included.
  – Alternative is to download Python and install countless packages yourself -> can be a nightmare due to compiler incompatibility etc.
  – https://www.enthought.com/products/canopy/academic/
• Free for people with EDU email
Why Python?
• Has many great packages useful for us (scientific computing, machine learning, NLP, scraping, etc.)
• One of the easiest and most concise languages, yet powerful
  – Memory consumption was often "better than Java and not much worse than C or C++"
• Has IDLE ("Integrated DeveLopment Environment")
  – Read-Eval-Print Loop
• Great OOP (compared to other comparable languages, say PERL. bless() those who use it)
• Highly scalable
• Easy incorporation of other languages (Cython, Jython)
• Named after Monty Python
Used by many companies as a prototyping and "duct-tape" language as well as the main language: Wall Street, Yahoo, CERN, NASA, Con Edison, Google, etc. Also, YouTube is written in Python!
Bit More Background on Python
• Does a few things EXCELLENTLY (OOP, scientific computing, etc.) and is generally good for a lot of things
• Created by Guido van Rossum in the late 1980s
• Programmer oriented (easy to write and read). Uses white space (indentation) to delimit blocks.
• Automatic memory management
• Can be interpreted or compiled (PyPy: just-in-time compiler)
• Direct opposite of PERL when it comes to programming philosophy
  – PERL: "There is more than one way to do it" -> Super fun when writing your own code. Rage when you debug other people’s PERL code (there is even an Obfuscated PERL contest)
  – Python: "There should be one-- and preferably only one --obvious way to do it" -> Writing your own & reading others’ = fun
• Would you like to know more?
  – http://www.youtube.com/watch?v=ugqu10JV7dk
  – Van Rossum talks about the history of Python for 110 min!
Let’s start coding in Python!
Fire up your IDLE.
Load the file called basicpython.py from the camp website.
Basic Data Types
• All the standard types
  – Integers, floating-point numbers
    • 2, 2.2, 3.14, etc.
  – Strings
    • "Hi, I am a string"
  – Booleans
    • True
    • False
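A quick way to check which type a value has is type(). The sketch below is only an illustration: the variable names are made up, and it is written with print() calls so that it runs under Python 3, whereas the slides use Python 2 print statements.

```python
# One value of each basic type listed on the slide
an_integer = 2
a_float = 3.14
a_string = "Hi, I am a string"
a_boolean = True

values = (an_integer, a_float, a_string, a_boolean)

# type(value).__name__ is the short name of the type, e.g. "int"
type_names = [type(value).__name__ for value in values]

for value, name in zip(values, type_names):
    print(value, "is of type", name)
```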
Hello World & Arithmetic
Helloworld.py
>>> print "hello, world!"   # that's it
# <- used for commenting

Simple Arithmetic (+ - * ** / %)
>>> 1+1
>>> 5**2

Booleans (operators: and, or, not, >, <, <=, ==, !=, etc.)
>>> True
>>> False
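The operators above can be tried out together. This is a small sketch with made-up variable names, written with print() so it runs under Python 3 (the slides use the Python 2 print statement). One difference worth knowing: / on two integers truncates in Python 2 but gives a float in Python 3, while // floors in both.

```python
# Arithmetic operators from the slide
total = 1 + 1            # addition
square = 5 ** 2          # exponentiation
remainder = 7 % 3        # modulo (remainder after division)
floor_quotient = 7 // 2  # floor division, same result in Python 2 and 3

# Boolean operators and comparisons
both = True and False
either = True or False
negated = not True
compared = (3 <= 5) and (2 != 3)

print(total, square, remainder, floor_quotient)
print(both, either, negated, compared)
```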
Strings
string="hello"
string+string
string*3
string[0]
string[-1]
string[1:4]
len(string)
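For reference, here is what each of the string expressions above evaluates to. The names on the left are only labels chosen for this sketch.

```python
string = "hello"

doubled = string + string   # concatenation: "hellohello"
tripled = string * 3        # repetition: "hellohellohello"
first = string[0]           # indexing starts at 0: "h"
last = string[-1]           # negative indices count from the end: "o"
middle = string[1:4]        # slice from index 1 up to (not including) 4: "ell"
length = len(string)        # number of characters: 5

print(doubled, tripled, first, last, middle, length)
```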
Lists, Tuples, and Dictionaries
Data structures – there are many, but 4 are most commonly used. Each has pros and cons.
• List – an ordered list of values
• Sets – set(list). You can do set operations, which can be faster than going through a list one element at a time.
• Tuples – just like lists but immutable and of fixed size. Also, style-wise, lists usually consist of homogeneous items while tuples can consist of heterogeneous items that form some sort of structure, e.g. (firstname, lastname), (name, age)
• Dictionaries – hash lookup table. An index of stuff. Basic bookkeeping "Key->Value". Fast lookup, O(1) on average.
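A minimal sketch of the trade-offs described above. The second list (shortlist) and all variable names are invented for illustration; the player names are the ones used elsewhere in these slides.

```python
players = ["Federer", "Nadal", "Murray", "Djokovic"]
shortlist = ["Nadal", "Djokovic", "Ferrer"]   # illustrative second list

# Sets: membership questions become set operations instead of nested loops
in_both = set(players) & set(shortlist)        # intersection
only_in_players = set(players) - set(shortlist)  # difference

# Tuples: fixed structure, cannot be changed after creation
person = ("Roger", "Federer")                  # (firstname, lastname)
try:
    person[0] = "Rafael"
    tuple_is_mutable = True
except TypeError:
    tuple_is_mutable = False

# Dictionaries: key -> value lookup without scanning
ranking = {"Federer": 5, "Nadal": 4, "Murray": 2, "Djokovic": 1}
nadal_value = ranking["Nadal"]
has_ferrer = "Ferrer" in ranking

print(in_both, only_in_players, tuple_is_mutable, nadal_value, has_ferrer)
```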
Lists, Tuples, and Dictionaries
• List – []
>>> TPlayersList=["Federer","Nadal","Murray", "Djokovic"]
Useful functions and methods: range(), append(), pop(), insert(), reverse(), sort()
e.g. TPlayersList.sort()

• Tuples – ()
>>> TPlayersTuple=("Federer","Nadal","Murray", "Djokovic")

• Dictionaries – {}
>>> TPlayersDict={"Federer": 5, "Nadal": 4, "Murray": 2, "Djokovic": 1}
>>> TPlayersDict["Ferrer"]=3
>>> TPlayersDict["Ferrer"]
>>> del TPlayersDict["Ferrer"]
Let d be a dictionary; then d.keys(), d.values(), and d.items() give its keys, values, and (key, value) pairs.
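The list methods and dictionary operations named above, applied step by step. This is a sketch in Python 3 syntax with lowercase variable names chosen for the example; in Python 3, range() and d.keys() return lazy objects, so they are wrapped in list() or sorted() here.

```python
players = ["Federer", "Nadal", "Murray", "Djokovic"]

players.append("Ferrer")      # add to the end
removed = players.pop()       # remove and return the last item
players.insert(1, "Ferrer")   # insert at index 1
players.reverse()             # reverse in place
players.sort()                # sort in place (alphabetical)

first_five = list(range(5))   # 0, 1, 2, 3, 4

ranking = {"Federer": 5, "Nadal": 4, "Murray": 2, "Djokovic": 1}
ranking["Ferrer"] = 3         # add a key
ferrer_value = ranking["Ferrer"]
del ranking["Ferrer"]         # remove the key again

keys = sorted(ranking.keys())
values = sorted(ranking.values())
items = sorted(ranking.items())

print(players, removed, first_five)
print(keys, values, items)
```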
• When you are first reading in data
  – Think carefully about what you want to do with the data
  – Then decide what data structures to use
  – It is common to have things like
    • Array of arrays
    • Array of tuples
    • Dictionary of arrays
    • Dictionary of dictionaries
    • Dictionary with tuple keys
  – However, once you need things like a dictionary of dictionaries of dictionaries of arrays or similarly ridiculous structures, consider using object-oriented programming
    • Look up Python classes (http://docs.python.org/2/tutorial/classes.html)
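To make the nested structures concrete, here is a rough sketch of a dictionary of lists, a dictionary with tuple keys, and a small class that could replace deeper nesting. Everything in it is hypothetical: the Player class, its methods, and the pairings and counts are invented purely for illustration.

```python
# Dictionary of lists: player -> list of opponents (invented pairings)
opponents = {}
opponents.setdefault("Federer", []).append("Nadal")
opponents.setdefault("Federer", []).append("Murray")

# Dictionary with tuple keys: (player, opponent) -> match count (invented counts)
matches = {("Federer", "Nadal"): 2, ("Federer", "Murray"): 1}
federer_total = sum(
    count for (player, _), count in matches.items() if player == "Federer"
)


class Player(object):
    """Hypothetical class bundling data that would otherwise be nested dicts."""

    def __init__(self, name, ranking):
        self.name = name
        self.ranking = ranking
        self.matches = {}   # opponent name -> number of matches

    def add_match(self, opponent):
        # Count one more match against this opponent
        self.matches[opponent] = self.matches.get(opponent, 0) + 1


federer = Player("Federer", 5)
federer.add_match("Nadal")
federer.add_match("Nadal")
federer.add_match("Murray")

print(opponents, federer_total, federer.matches)
```

The class keeps the bookkeeping logic in one place (add_match), which is the main gain over reaching into several levels of dictionaries by hand.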
Basic Control Flow
• Boils down to
  – if (elif, else)
  – while
  – for
• Python has convenient syntactic sugar for iterating through different data structures
Basic Control Flow
• True Things
  – True
  – Any non-zero number
  – Any non-empty string or data structure
• False Things
  – False
  – 0
  – ""
  – Empty data structures
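These rules can be checked with bool(), or by using a value directly in a condition. A small sketch with sample values picked for the example:

```python
# Sample values: some that count as true, some that count as false
samples = [True, 1, -2.5, "hi", [0], False, 0, "", [], {}, ()]

truthy = [value for value in samples if value]
falsy = [value for value in samples if not value]

# A list holding a single 0 is non-empty, so it counts as true
list_with_zero = bool([0])
empty_string = bool("")

print(len(truthy), "truthy values,", len(falsy), "falsy values")
```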
If and while
if True:
    print "everything is good"
else:
    print "?! HUHHHHH?"

i=1
while (i<=5):
    print "Hellodoctornamecontinueyesterdaytomorrow"
    i+=1
    if i>5:
        print "good morning dr. chandra"
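The slide lists elif but the example does not use it, so here is a short sketch that does, together with a while loop that collects values instead of printing them. The function name classify and the variable names are made up for this example.

```python
def classify(n):
    """Return a label for n using if / elif / else."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


# while: repeat as long as the condition holds
countdown = []
i = 3
while i > 0:
    countdown.append(i)
    i -= 1

print(classify(-4), classify(0), classify(7), countdown)
```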
Basic Control Flow - for
for player in TPlayersList:
    print player

for player in sorted(TPlayersList):
    print player

for index, player in enumerate(TPlayersList):
    print index, player

for i in xrange(1,10,2):
    print i

for key, value in TPlayersDict.iteritems():
    print key, value
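The same loops in Python 3 syntax, collecting results into lists so they are easy to inspect. Note the renames between versions: xrange() became range() and iteritems() became items(). Variable names are chosen for this sketch only.

```python
players = ["Federer", "Nadal", "Murray", "Djokovic"]
ranking = {"Federer": 5, "Nadal": 4, "Murray": 2, "Djokovic": 1}

# sorted() returns a new sorted list and leaves the original untouched
alphabetical = []
for player in sorted(players):
    alphabetical.append(player)

# enumerate() yields (index, item) pairs
indexed = []
for index, player in enumerate(players):
    indexed.append((index, player))

# range(start, stop, step): 1, 3, 5, 7, 9
odds = []
for i in range(1, 10, 2):
    odds.append(i)

# items() yields (key, value) pairs; sorted here for a stable order
pairs = []
for key, value in sorted(ranking.items()):
    pairs.append((key, value))

print(alphabetical, indexed, odds, pairs)
```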
continue and break
• While running loops, you may need to skip an iteration or stop the loop at some point. Look up
  – continue
  – break
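A minimal sketch of both keywords in one loop: continue skips the rest of the current iteration, break leaves the loop entirely. The numbers and names are arbitrary.

```python
small_odds = []
for n in range(1, 20):
    if n % 2 == 0:
        continue          # even number: skip to the next iteration
    if n > 9:
        break             # first odd number above 9: stop the loop
    small_odds.append(n)

print(small_odds)
```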
Defining a function
def fib(n):    # write Fibonacci series up to n
    """Print a Fibonacci series up to n."""
    a, b = 0, 1
    while a < n:
        print a,
        a, b = b, a+b
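Functions usually return a value rather than print it. As a sketch of that variation, the hypothetical fib_list below uses the same loop as fib but returns the series as a list, which also makes it usable from other code.

```python
def fib_list(n):
    """Return a list of the Fibonacci numbers smaller than n."""
    result = []
    a, b = 0, 1
    while a < n:
        result.append(a)
        a, b = b, a + b   # advance both values in one step
    return result


series = fib_list(10)
print(series)
```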
Importing Libraries
• Import library
• E.g. "import sys"
• Some useful libraries
  – sys
  – re
  – csv
  – scipy
  – numpy
• http://wiki.python.org/moin/UsefulModules#Useful_Modules.2C_Packages_and_Libraries
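A short sketch of importing and using a few of these modules. sys, re, and csv ship with Python; numpy is a separate package, so the example guards that import in case it is not installed. The sample text and variable names are made up.

```python
import sys
import re

# sys: information about the running interpreter
major_version = sys.version_info[0]

# re: regular expressions; \d+ matches one or more digits
digits = re.findall(r"\d+", "Session 2, part 1")

# Third-party packages may be missing, so guard the import
try:
    import numpy
    have_numpy = True
except ImportError:
    have_numpy = False

print(major_version, digits, have_numpy)
```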
File IO
• Reading data files into memory
• open() – returns a file object which can read or write files
• open(filename, mode)
• filehandle = open(filename, mode)
• filehandle.readline()
Mode
• r = read, w = write, a = append, rb = read in binary (Windows makes that distinction)
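A sketch of the write, append, and read modes in sequence. It writes to a temporary directory so nothing on disk is overwritten; the file name example.txt and its contents are placeholders.

```python
import os
import tempfile

# Placeholder file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "example.txt")

filehandle = open(path, "w")          # w: create or overwrite
filehandle.write("first line\nsecond line\n")
filehandle.close()

filehandle = open(path, "a")          # a: append to the end
filehandle.write("third line\n")
filehandle.close()

filehandle = open(path, "r")          # r: read
first = filehandle.readline()         # one line, including its newline
rest = filehandle.readlines()         # all remaining lines as a list
filehandle.close()

# "with" closes the file automatically, even if an error occurs
with open(path, "r") as f:
    line_count = sum(1 for _ in f)

print(repr(first), rest, line_count)
```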
Python Example 1
• Reading a CSV and saving each row as an array
  – Dealing with CSV can be very painful.
  – Sometimes a different character encoding causes problems when reading a CSV
  – If CSV reading just doesn’t work, suspect that you have an encoding issue. Look up encodings (ISO-8859-1/latin1 to UTF-8)
  – This is why serious programs rarely use CSV as a storage mechanism
• Fire up csvRead.py
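csvRead.py itself is not reproduced in these slides, so the following is only a sketch of the idea, not the camp's script. It is written for Python 3, where open() accepts an encoding argument. The helper read_rows, the file name, and the sample rows are all invented. It writes a small latin1 file, reads each row into a list, and shows the kind of error you get when the wrong encoding is assumed.

```python
import csv
import os
import tempfile

# Invented sample file, saved as latin1 (ISO-8859-1)
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", encoding="latin-1", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "value"])
    writer.writerow(["Caf\u00e9", "1"])   # contains a non-ASCII character


def read_rows(filename, encoding="utf-8"):
    """Hypothetical helper: return every CSV row as a list of strings."""
    rows = []
    with open(filename, "r", encoding=encoding, newline="") as f:
        for row in csv.reader(f):
            rows.append(row)
    return rows


# Reading with the right encoding works
rows = read_rows(path, encoding="latin-1")

# Reading latin1 bytes as UTF-8 fails on the accented character
try:
    read_rows(path, encoding="utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(rows, utf8_failed)
```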
Lab
Do interactive tutorials at
http://www.codecademy.com/courses/
http://www.learnpython.org/