LEARNING PYTHON FROM DATA
Mosky
THIS SLIDE
• The online version is at https://speakerdeck.com/mosky/learning-python-from-data.
• The examples are at https://github.com/moskytw/learning-python-from-data-examples.
MOSKY
• I work at Pinkoi.
• I've taught Python for 100+ hours.
• A speaker at COSCUP 2014, PyCon SG 2014, PyCon APAC 2014, OSDC 2014, PyCon APAC 2013, COSCUP 2014, ...
• The author of the Python packages MoSQL, Clime, ZIPCodeTW, ...
• http://mosky.tw/
SCHEDULE
• Warm-up
• Packages - Install the packages we need.
• CSV - Download a CSV from the Internet and handle it.
• HTML - Parse HTML source code and write a Web crawler.
• SQL - Save data into a SQLite database.
• The End
FIRST OF ALL,
PYTHON IS AWESOME!
2 OR 3?
• Use Python 3!
• But it actually depends on the libs you need.
• https://python3wos.appspot.com/
• We will go ahead with Python 2.7, but I will also introduce the changes in Python 3.
THE ONLINE RESOURCES
• The Official Python Documentation
  • http://docs.python.org
  • The Python Tutorial
  • The Python Standard Library
• My Past Slides
  • Programming with Python - Basic
  • Programming with Python - Adv.
THE BOOKS
• Learning Python by Mark Lutz
• Programming in Python 3 by Mark Summerfield
• Python Essential Reference by David Beazley
PREPARATION
• Did you say "hello" to Python?
• If not, visit http://www.slideshare.net/moskytw/programming-with-python-basic.
• If yes, open your Python shell.
WARM-UP
The things you must know.
MATH & VARS
2 + 3
2 - 3
2 * 3
2 / 3, -2 / 3

(1+10)*10 / 2
2.0 / 3
2 % 3
2 ** 3

x = 2
y = 3
z = x + y
print z
'#' * 10
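Since the talk promises to point out the Python 3 changes, here is a small sketch of how the same expressions behave there. It assumes a Python 3 interpreter; the variable names are only for illustration. The differences that matter at this point are that `/` is true division, `//` is floor division, and `print` is a function.

```python
# Python 3 sketch of the warm-up arithmetic.
true_div = 2 / 3        # true division in Python 3 (Python 2 gives 0 here)
floor_div = 2 // 3      # floor division, the same result as Python 2's 2 / 3
neg_floor = -2 // 3     # floors toward negative infinity
remainder = 2 % 3
power = 2 ** 3

x = 2
y = 3
z = x + y
print(z)                # print is a function in Python 3
print('#' * 10)
```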
FOR
for i in [0, 1, 2, 3, 4]:
    print i

items = [0, 1, 2, 3, 4]
for i in items:
    print i

for i in range(5):
    print i

chars = 'SAHFI'
for i, c in enumerate(chars):
    print i, c

words = ('Samsung', 'Apple', 'HP', 'Foxconn', 'IBM')
for c, w in zip(chars, words):
    print c, w
IF
for i in range(1, 10):
    if i % 2 == 0:
        print '{} is divisible by 2'.format(i)
    elif i % 3 == 0:
        print '{} is divisible by 3'.format(i)
    else:
        print '{} is not divisible by 2 nor 3'.format(i)
WHILE
while 1:
    n = int(raw_input('How big a pyramid do you want? '))
    if n <= 0:
        print 'It must be greater than 0: {}'.format(n)
        continue
    break
TRY
while 1:

    try:
        n = int(raw_input('How big a pyramid do you want? '))
    except ValueError as e:
        print 'It must be a number: {}'.format(e)
        continue

    if n <= 0:
        print 'It must be greater than 0: {}'.format(n)
        continue

    break
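The same validation loop could look like this in Python 3, where `raw_input` became `input`. This is only a sketch: wrapping the loop in a function and the `read` parameter are assumptions added here so the loop can be exercised without a keyboard; they are not part of the slides.

```python
def ask_size(read=input):
    """Keep asking until the user enters a positive integer.

    `read` is a hypothetical hook (it defaults to the built-in input)
    that receives the prompt and returns the typed text.
    """
    while True:
        try:
            n = int(read('How big a pyramid do you want? '))
        except ValueError as e:
            print('It must be a number: {}'.format(e))
            continue

        if n <= 0:
            print('It must be greater than 0: {}'.format(n))
            continue

        return n
```

Calling `ask_size()` prompts on the terminal; passing a function such as `lambda prompt: '3'` feeds it a canned answer instead.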
LOOP ... ELSE
for n in range(2, 100):
    for i in range(2, n):
        if n % i == 0:
            break
    else:
        print '{} is a prime!'.format(n)
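The `else` branch of a loop runs only when the loop finishes without hitting `break`. A Python 3 sketch of the same idea, wrapped in a function (the name `primes_below` is made up for this example) so the result can be inspected:

```python
def primes_below(limit):
    """Return the primes smaller than `limit` using a for ... else loop."""
    found = []
    for n in range(2, limit):
        for i in range(2, n):
            if n % i == 0:
                break          # a divisor exists, so n is not a prime
        else:
            found.append(n)    # runs only when the inner loop did not break
    return found


print(primes_below(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```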
A PYRAMID
                ****
            ************
        ********************
    ****************************
************************************
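One possible way to draw a pyramid like this in Python 3 is sketched below. The function name and the `unit=4` scaling are assumptions read off the picture (rows of 4, 12, 20, 28 and 36 stars); the slides leave the solution as an exercise.

```python
def pyramid(n, unit=4):
    """Return the rows of an n-level pyramid as a list of strings.

    Row i (counting from 1) has unit * (2 * i - 1) stars and is centered
    on the widest row. `unit=4` is an assumption, not given in the slides.
    """
    width = unit * (2 * n - 1)
    return [('*' * (unit * (2 * i - 1))).center(width).rstrip()
            for i in range(1, n + 1)]


print('\n'.join(pyramid(5)))
```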
A FATTER PYRAMID
******
**********************
*******************
YOUR TURN!
LIST COMPREHENSION
[
    n for n in range(2, 100)
    if not any(n % i == 0 for i in range(2, n))
]
PACKAGES
import is important.
GET PIP - UN*X
• Debian family
  • # apt-get install python-pip
• Red Hat family
  • # yum install python-pip
• Mac OS X
  • # easy_install pip
GET PIP - WIN *
• Follow the steps in http://stackoverflow.com/questions/4750806/how-to-install-pip-on-windows.
• Or just use easy_install to install it. easy_install should be found at C:\Python27\Scripts\.
• Or find the Windows installer on the Python Package Index.
3RD-PARTY PACKAGES
• requests - Python HTTP for Humans
• lxml - Pythonic XML processing library
• uniout - Print the object representation in readable chars.
• clime - Convert a module into a CLI program without any configuration.
YOUR TURN!
CSV
Let's start by making an HTTP request!
HTTP GET
import requests

#url = 'http://stats.moe.gov.tw/files/school/101/u1_new.csv'
url = 'https://raw.github.com/moskytw/learning-python-from-data-examples/master/sql/schools.csv'

print requests.get(url).content

#print requests.get(url).text
FILE
save_path = 'school_list.csv'

with open(save_path, 'w') as f:
    f.write(requests.get(url).content)

with open(save_path) as f:
    print f.read()

with open(save_path) as f:
    for line in f:
        print line,
DEF
from os.path import basename

def save(url, path=None):

    if not path:
        path = basename(url)

    with open(path, 'w') as f:
        f.write(requests.get(url).content)
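In Python 3 the downloaded body is `bytes`, so the file has to be opened in binary mode. The sketch below shows that change. To stay runnable without extra packages it swaps `requests` for the standard library's `urllib.request.urlopen`, and it returns the path, which the slide version does not do; both are choices made for this example only.

```python
from os.path import basename
from urllib.request import urlopen


def save(url, path=None):
    """Download `url` and write the raw bytes to `path`."""
    if not path:
        path = basename(url)

    with urlopen(url) as response:
        data = response.read()      # bytes in Python 3

    with open(path, 'wb') as f:     # binary mode, because data is bytes
        f.write(data)

    return path                     # returned for convenience (an addition)
```

With `requests`, the equivalent bytes are in `requests.get(url).content`, as in the slide.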
CSV
import csv
from os.path import exists

if not exists(save_path):
    save(url, save_path)

with open(save_path) as f:
    for row in csv.reader(f):
        print row
+ UNIOUT
import csv
from os.path import exists
import uniout # You want this!

if not exists(save_path):
    save(url, save_path)

with open(save_path) as f:
    for row in csv.reader(f):
        print row
NEXT
with open(save_path) as f:
    next(f) # skip the unwanted lines
    next(f)
    for row in csv.reader(f):
        print row
DICT READER
with open(save_path) as f:
    next(f)
    next(f)
    for row in csv.DictReader(f):
        print row

# We now have a great output. :)
DEF AGAIN
def parse_to_school_list(path):
    school_list = []
    with open(path) as f:
        next(f)
        next(f)
        for school in csv.DictReader(f):
            school_list.append(school)

    return school_list[:-2]
+ COMPREHENSION
def parse_to_school_list(path='schools.csv'):
    with open(path) as f:
        next(f)
        next(f)
        school_list = [school for school in csv.DictReader(f)][:-2]

    return school_list
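To try the parsing step without downloading anything, here is a Python 3 sketch that runs on an in-memory file. The sample text is hypothetical: it only imitates the layout the code above implies (two unwanted lines, a header row, data rows, two trailing lines). This variant also takes an open file object instead of a path. With a real file in Python 3, the `csv` documentation recommends opening it with `newline=''`.

```python
import csv
import io

# Hypothetical sample imitating the layout described above.
SAMPLE = """\
title line
note line
School ID,School Name,County
0001,First University,Taipei
0002,Second University,Tainan
total,2,
source,example,
"""


def parse_to_school_list(f):
    """Parse an open text file into a list of dicts, one per school."""
    next(f)                 # skip the two unwanted leading lines
    next(f)
    # DictReader takes the next line as the header; [:-2] drops the footer.
    return [dict(school) for school in csv.DictReader(f)][:-2]


schools = parse_to_school_list(io.StringIO(SAMPLE))
print([school['School Name'] for school in schools])
```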
+ PRETTY PRINT
from pprint import pprint

pprint(parse_to_school_list(save_path))

# AWESOME!
PYTHONIC
school_list = parse_to_school_list(save_path)

# hmmm ...

for school in school_list:
    print school['School Name']

# It is more Pythonic! :)

print [school['School Name'] for school in school_list]
GROUP BY
from itertools import groupby

# You MUST sort it.
keyfunc = lambda school: school['County']
school_list.sort(key=keyfunc)

for county, schools in groupby(school_list, keyfunc):
    for school in schools:
        print '%s %r' % (county, school)
    print '---'
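Why the sort is required: `groupby` only merges neighbouring items with the same key. A Python 3 sketch with a few hypothetical rows (the data and the helper name `group_names_by_county` are made up for this example) shows both the pitfall and the sorted version:

```python
from itertools import groupby

# Hypothetical rows standing in for the parsed school list.
school_list = [
    {'School Name': 'A', 'County': 'Taipei'},
    {'School Name': 'B', 'County': 'Tainan'},
    {'School Name': 'C', 'County': 'Taipei'},
]

keyfunc = lambda school: school['County']

# Without sorting, 'Taipei' shows up as two separate groups.
unsorted_keys = [county for county, _ in groupby(school_list, keyfunc)]
print(unsorted_keys)  # ['Taipei', 'Tainan', 'Taipei']


def group_names_by_county(schools):
    """Return {county: [school names]}; sorts a copy before grouping."""
    ordered = sorted(schools, key=keyfunc)
    return {county: [s['School Name'] for s in group]
            for county, group in groupby(ordered, keyfunc)}


print(group_names_by_county(school_list))
```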
DOCSTRING
'''It contains some useful functions for parsing data from the government.'''

def save(url, path=None):
    '''It saves data from `url` to `path`.'''
    ...

--- Shell ---

$ pydoc csv_docstring
CLIME
if __name__ == '__main__':
    import clime.now

--- shell ---

$ python csv_clime.py
usage: basename <p>
   or: parse-to-school-list <path>
   or: save [--path] <url>

It contains some useful functions for parsing data from the government.
DOC TIPS
help(requests)

print dir(requests)

print '\n'.join(dir(requests))
YOUR TURN!
HTML
Have fun with the final crawler. ;)
LXML
import requests
from lxml import etree

content = requests.get('http://clbc.tw').content
root = etree.HTML(content)

print root
CACHE
from os.path import exists

cache_path = 'cache.html'

if exists(cache_path):
    with open(cache_path) as f:
        content = f.read()
else:
    content = requests.get('http://clbc.tw').content
    with open(cache_path, 'w') as f:
        f.write(content)
SEARCHING
head = root.find('head')
print head

head_children = head.getchildren()
print head_children

metas = head.findall('meta')
print metas

title_text = head.findtext('title')
print title_text
XPATH
titles = root.xpath('/html/head/title')
print titles[0].text

title_texts = root.xpath('/html/head/title/text()')
print title_texts[0]

as_ = root.xpath('//a')
print as_
print [a.get('href') for a in as_]
MD5
from hashlib import md5

message = 'There should be one-- and preferably only one --obvious way to do it.'

print md5(message).hexdigest()

# Actually, it has nothing to do with HTML.
DEF GET
from os import makedirs
from os.path import exists, join

def get(url, cache_dir_path='cache/'):

    if not exists(cache_dir_path):
        makedirs(cache_dir_path)

    cache_path = join(cache_dir_path, md5(url).hexdigest())

    ...
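The slide elides the rest of `get`. One plausible way to finish it, in Python 3, is sketched here; this is a guess that follows the CACHE slide, not the author's actual code. The `download` helper and the `fetch` parameter are hypothetical additions: they keep the sketch free of third-party packages and let the caching be tried without network access. Note that `md5` needs bytes in Python 3, so the URL is encoded first.

```python
from hashlib import md5
from os import makedirs
from os.path import exists, join
from urllib.request import urlopen


def download(url):
    """Fetch `url` and return the body as bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def get(url, cache_dir_path='cache/', fetch=download):
    """Return the content of `url`, reading it from a file cache if present."""
    if not exists(cache_dir_path):
        makedirs(cache_dir_path)

    # The cache file is named after the MD5 hex digest of the URL.
    cache_path = join(cache_dir_path, md5(url.encode('utf-8')).hexdigest())

    if exists(cache_path):
        with open(cache_path, 'rb') as f:
            return f.read()

    content = fetch(url)
    with open(cache_path, 'wb') as f:
        f.write(content)
    return content
```

The second call with the same URL is served from the cache directory and never reaches `fetch`.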
DEF FIND_URLS
def find_urls(content):
    root = etree.HTML(content)
    return [
        a.attrib['href'] for a in root.xpath('//a')
        if 'href' in a.attrib
    ]
BFS 1/2
NEW = 0
QUEUED = 1
VISITED = 2

def search_urls(url):

    url_queue = [url]
    url_state_map = {url: QUEUED}

    while url_queue:

        url = url_queue.pop(0)
        print url
BFS 2/2
        # continue the previous page
        try:
            found_urls = find_urls(get(url))
        except Exception, e:
            url_state_map[url] = e
            print 'Exception: %s' % e
        except KeyboardInterrupt, e:
            return url_state_map
        else:
            for found_url in found_urls:
                if not url_state_map.get(found_url, NEW):
                    url_queue.append(found_url)
                    url_state_map[found_url] = QUEUED
            url_state_map[url] = VISITED
DEQUE
from collections import deque
...

def search_urls(url):
    url_queue = deque([url])
    ...
    while url_queue:

        url = url_queue.popleft()
        print url
        ...
YIELD
...

def search_urls(url):
    ...
    while url_queue:

        url = url_queue.pop(0)
        yield url
        ...
        except KeyboardInterrupt, e:
            print url_state_map
            return
        ...
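Putting the BFS, deque and yield slides together, a Python 3 sketch of the crawler loop might look like this. To keep it runnable offline, the page fetching is passed in as `find_urls`, a hypothetical callable that returns the links of a page (in the slides that role is played by `find_urls(get(url))`), and the tiny `LINKS` graph is made up. The `KeyboardInterrupt` handling from the slides is left out for brevity.

```python
from collections import deque

QUEUED = 1
VISITED = 2


def search_urls(url, find_urls):
    """Yield URLs in breadth-first order starting from `url`."""
    url_queue = deque([url])
    url_state_map = {url: QUEUED}

    while url_queue:
        url = url_queue.popleft()       # O(1), unlike list.pop(0)
        yield url

        try:
            found_urls = find_urls(url)
        except Exception as e:          # Python 3 spelling of `except X, e`
            url_state_map[url] = e      # remember why this page failed
            continue

        for found_url in found_urls:
            if found_url not in url_state_map:
                url_queue.append(found_url)
                url_state_map[found_url] = QUEUED
        url_state_map[url] = VISITED


# A tiny in-memory "web" used instead of real HTTP requests.
LINKS = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': ['d'], 'd': []}
print(list(search_urls('a', LINKS.__getitem__)))  # ['a', 'b', 'c', 'd']
```

Because `search_urls` is a generator, the caller decides how far to crawl, for example by stopping the `for` loop after a fixed number of URLs.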
YOUR TURN!
SQL
How about saving the CSV file into a db?
TABLE
CREATE TABLE schools (
    id TEXT PRIMARY KEY,
    name TEXT,
    county TEXT,
    address TEXT,
    phone TEXT,
    url TEXT,
    type TEXT
);

DROP TABLE schools;
CRUD
INSERT INTO schools (id, name) VALUES ('1', 'The First');
INSERT INTO schools VALUES (...);

SELECT * FROM schools WHERE id='1';
SELECT name FROM schools WHERE id='1';

UPDATE schools SET id='10' WHERE id='1';

DELETE FROM schools WHERE id='10';
COMMON PATTERN
import sqlite3

db_path = 'schools.db'
conn = sqlite3.connect(db_path)
cur = conn.cursor()

cur.execute('''CREATE TABLE schools (
    ...
)''')
conn.commit()

cur.close()
conn.close()
ROLLBACK
...

try:
    cur.execute('...')
except:
    conn.rollback()
    raise
else:
    conn.commit()

...
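To see the rollback pattern do something, here is a Python 3 sketch against an in-memory SQLite database. The two-column table, the `insert_all` helper and the sample rows are assumptions made for this example. It catches `sqlite3.Error` rather than using a bare `except`, which is a slight narrowing of the slide's pattern.

```python
import sqlite3

conn = sqlite3.connect(':memory:')      # in-memory database, nothing on disk
conn.execute('CREATE TABLE schools (id TEXT PRIMARY KEY, name TEXT)')
conn.commit()


def insert_all(conn, rows):
    """Insert every row or none of them."""
    cur = conn.cursor()
    try:
        for row in rows:
            cur.execute('INSERT INTO schools VALUES (?, ?)', row)
    except sqlite3.Error:
        conn.rollback()     # undo the rows inserted before the failure
        raise
    else:
        conn.commit()
    finally:
        cur.close()


try:
    # The duplicate primary key makes the second INSERT fail.
    insert_all(conn, [('1', 'The First'), ('1', 'Duplicate')])
except sqlite3.IntegrityError as e:
    print('rolled back: {}'.format(e))

count = conn.execute('SELECT COUNT(*) FROM schools').fetchone()[0]
print(count)  # 0, because the first INSERT was rolled back too
```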
PARAMETERIZE QUERY
...

rows = ...

for row in rows:
    cur.execute('INSERT INTO schools VALUES (?, ?, ?, ?, ?, ?, ?)', row)

conn.commit()

...
EXECUTEMANY
...

rows = ...

cur.executemany('INSERT INTO schools VALUES (?, ?, ?, ?, ?, ?, ?)', rows)

conn.commit()

...
FETCH
...
cur.execute('select * from schools')

print cur.fetchone()

# or
print cur.fetchall()

# or
for row in cur:
    print row
...
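The insert and fetch steps can be tried end to end with an in-memory database. This Python 3 sketch uses a shortened three-column table and hypothetical rows; in the talk's flow the rows would come from the parsed CSV.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('CREATE TABLE schools (id TEXT PRIMARY KEY, name TEXT, county TEXT)')

# Hypothetical rows for illustration.
rows = [
    ('0001', 'First University', 'Taipei'),
    ('0002', 'Second University', 'Tainan'),
]

# The ? placeholders let sqlite3 handle quoting, so the SQL string is
# never built by hand from the values.
cur.executemany('INSERT INTO schools VALUES (?, ?, ?)', rows)
conn.commit()

cur.execute('SELECT * FROM schools ORDER BY id')
first = cur.fetchone()      # one row as a tuple, or None when exhausted
rest = cur.fetchall()       # the remaining rows as a list of tuples
print(first)
print(rest)

cur.close()
conn.close()
```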
TEXT FACTORY
# SQLite only: Lets you pass 8-bit strings as parameters.

...

conn = sqlite3.connect(db_path)
conn.text_factory = str

...
ROW FACTORY
# SQLite only: Lets you convert a tuple into a dict. It is `DictCursor` in some other connectors.

def dict_factory(cursor, row):
    d = {}
    for idx, col in enumerate(cursor.description):
        d[col[0]] = row[idx]
    return d

...
conn.row_factory = dict_factory
...
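A Python 3 sketch of the row factory in action, again on an in-memory database with a made-up two-column table. The standard library also ships `sqlite3.Row`, a built-in row factory that supports access by column name, which may be enough when a real dict is not required.

```python
import sqlite3


def dict_factory(cursor, row):
    """Map column names (from cursor.description) to the row's values."""
    return {col[0]: row[idx] for idx, col in enumerate(cursor.description)}


conn = sqlite3.connect(':memory:')
conn.row_factory = dict_factory     # applies to cursors created afterwards
conn.execute('CREATE TABLE schools (id TEXT PRIMARY KEY, name TEXT)')
conn.execute("INSERT INTO schools VALUES ('1', 'The First')")

row = conn.execute('SELECT * FROM schools').fetchone()
print(row)  # {'id': '1', 'name': 'The First'}
```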
MORE
• Python DB API 2.0
• MySQLdb - MySQL connector for Python
• Psycopg2 - PostgreSQL adapter for Python
• SQLAlchemy - the Python SQL toolkit and ORM
• MoSQL - Build SQL from common Python data structures.
THE END
• You learned how to ...
  • make an HTTP request
  • load a CSV file
  • parse an HTML file
  • write a Web crawler
  • use SQL with SQLite
• and a lot of techniques today. ;)