python data structures · pythondatastructures zbigniewjędrzejewski-szmek george mason university...

46
Python Data Structures Zbigniew Jędrzejewski-Szmek George Mason University Python Summer School, Zürich, September 02, 2013 Version Zürich-83-gf79f32d This work is licensed under the Creative Commons CC BY-SA 3.0 License. 1 / 46

Upload: others

Post on 19-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Python Data Structures

Zbigniew Jędrzejewski-Szmek

George Mason University

Python Summer School, Zürich, September 02, 2013

Version Zürich-83-gf79f32d

This work is licensed under the Creative Commons CC BY-SA 3.0 License.

1 / 46

Page 2: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Outline

1 Introduction (using lists)How does Python manage memory?Comprehensions

2 Dictionaries, and setsDictionariesHow to create a dictionary?Specialized dictionary subclassesSets

3 IteratorsIterator classesGenerator expressionsGenerator functions

4 The end

2 / 46

Page 3: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

Introduction (and lists)

3 / 46

Page 4: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

Why?

You can write sort(), but it’ll take one day.DRY. Use proper data structures.

4 / 46

Page 5: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

The venerable list

A sequence of arbitrary values

Quiz:Is list.pop(0) faster than list.pop()?

5 / 46

Page 6: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

Test, test, test

N = 1000L0 = range(N)L = L0[:]for i in xrange(N):

L.pop(0)

6 / 46

Page 7: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

How does time required depend on problem size?

0 200000 400000 600000 800000 1000000 1200000 1400000N

0

200

400

600

800

1000

1200

t/

us

7 / 46

Page 8: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

Reminder: computational complexity

f (x) = O(g(x))

There’s a constant C , such thatf (x) 6 C · g(x) for big enough x .

For “computational complexity” x is N

8 / 46

Page 9: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

What does this mean?

Computational complexity depends on the data structure and operation —good to understand what you’re doing

9 / 46

Page 10: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

How is list implemented?

12 4 3

56

78

910 13 12

1415

1611

L

list.pop(0) list.pop(-1)

10 / 46

Page 11: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

pop(0) vs pop(-1)

0 200000 400000 600000 800000 1000000 1200000 1400000N

0

200

400

600

800

1000

1200

t/

us

11 / 46

Page 12: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

Solutionfor the .pop(0) problem

collections.deque pops from both ends cheaply:.pop().popleft()

. . . but does not allow other removals in the middle

12 / 46

Page 13: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

The complexity constant

f (x) 6 C · g(x)

13 / 46

Page 14: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

Complexity constant exampleCPython vs. Jython

0 200000 400000 600000 800000 1000000 1200000 1400000N

0

500

1000

1500

2000

2500

t/

us

14 / 46

Page 15: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

Testing is not enough here

Time complexity issuesdo not show up during development,because O(N2) is small for small N.

15 / 46

Page 16: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

Removing an element from a list

N = 1000L0 = range(N)L = L0[:]...

Why?

16 / 46

Page 17: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists)

Objects are kept around if they are referenced

ints are referenced from both L and L0

12 4 3

56

78

910 13 12

1415

1611

L0

L

17 / 46

Page 18: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists) How does Python manage memory?

Other languages have “variables”

int a = 1;

a = 2;

int b = a;

Idiomatic Python by David Goodger

18 / 46

Page 19: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists) How does Python manage memory?

Python has “names”

a = 1

a = 2

b = a

Idiomatic Python by David Goodger

19 / 46

Page 20: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Introduction (using lists) Comprehensions

List comprehensions

[day for day in week]

[day for day in week if day[0] != ’M’ ]

[(day, hour) for day in weekfor hour in [’12:00’, ’14:00’, ’16:00’] ]

20 / 46

Page 21: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets

Dictionaries and sets

21 / 46

Page 22: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets How to create a dictionary?

Creating dictionaries

dict == a mapping of keys to arbitrary values

D = {’key1’: "value1",2: [’a’, ’list’, ’here’],None: None,

}

22 / 46

Page 23: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets How to create a dictionary?

From an iterator

text = """\lundi Mondaymardi Tuesdaymercredi Wednesdayjeudi Thursdayvendredi Fridaysamedi Saturdaydimanche Sunday"""

source = (line.split() for line in text.splitlines())

weekdays = dict(source)

23 / 46

Page 24: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets How to create a dictionary?

From a comprehension

{x:x**2 for x in xrange(10)}

24 / 46

Page 25: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets How to create a dictionary?

How much does it cost to enter an item in the dictionary?

>>> D = {}>>> L = range(10000)>>> for i in L:... D[i] = i

O∗(1)

25 / 46

Page 26: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets How to create a dictionary?

How much does it cost to retrieve an item from thedictionary?

>>> for i in L:... assert D[i] == i

O(1)

26 / 46

Page 27: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets How to create a dictionary?

How are dictionaries organized?

12 4 3

56

10 7 8

9

D

27 / 46

Page 28: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets How to create a dictionary?

Weekdays in French and in English

The order of keys is random!

A list of the weekdays?

>>> weekdays.keys()

[’mardi’, ’samedi’, ’vendredi’, ’jeudi’,’lundi’, ’dimanche’, ’mercredi’]

28 / 46

Page 29: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets How to create a dictionary?

Demohash_demo.py

# 1. Initialize a dict with the same keys.# 2. Display keys.

from __future__ import print_function

d = dict(monday=1, tuesday=2, wednesday=3,thursday=4, friday=5)

print(d.keys())print(’monday ->’, hash(’monday’))

29 / 46

Page 30: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets Specialized dictionary subclasses

collections.OrderedDict

# source = ((’lundi’, ’Monday’),# (’mardi’, ’Tuesday’),# ...

>>> weekdays = collections.OrderedDict(source)

>>> weekdays.keys()[’lundi’, ’mardi’, ’mercredi’, ’jeudi’,’vendredi’, ’samedi’, ’dimanche’]

30 / 46

Page 31: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets Specialized dictionary subclasses

Classifying objects

Sorting objects into groupsgrades.log:

20120101 John 520120102 John 420120107 Mary 320120109 Jane 2...

31 / 46

Page 32: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets Specialized dictionary subclasses

Sorting objects into groupsversion 0

grades = {}for line in open(’grades.log’):

date, person, grade = line.split()grades[person].append(grade)

32 / 46

Page 33: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets Specialized dictionary subclasses

Sorting objects into groupsversion 0.5

grades = {}for line in open(’grades.log’):

date, person, grade = line.split()if person not in grades:

grades[person] = []grades[person].apend(grade)

33 / 46

Page 34: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets Specialized dictionary subclasses

Sorting objects into groupsversion 1.0

grades = collections.defaultdict(list)for line in open(’grades.log’):

date, person, grade = line.split()grades[person].append(grade)

34 / 46

Page 35: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets Specialized dictionary subclasses

Counting objects in groups

Let’s make a histogram

grades = collections.Counter()for line in open(’grades.log’):

date, person, grade = line.split()grades[grade] += 1

>>> gradesCounter({’4’: 2, ’3’: 1, ’2’: 1, ’5’: 1})

>>> print(’\n’.join(x + ’ ’ + ’x’*y... for (x, y) in sorted(grades.items())))

1 x2 xxx3 x4 xxx5 xx

35 / 46

Page 36: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets Sets

Set

A non-ordered collection of unique elements

visitors = set()

for connection in connection_log():ip = connection.ipif ip not in visitors:

print(’new visitor’, ip)visitors.add(ip)

36 / 46

Page 37: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Dictionaries, and sets Sets

Using sets

Poetry:find words used by the poet, sorted by length

>>> shak = ’’’\... She walks in beauty, like the night... Of cloudless climes and starry skies;... And all that’s best of dark and bright... Meet ...... ’’’

>>> words = set(shak.lower().translate(None, ’,;’).split())>>> words{’she’, ’like’, ’cloudless’, ...}

>>> sorted(words, key=len, reverse=True)[’cloudless’, ’starry’, ’beauty’, ...]

37 / 46

Page 38: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Iterators

Iterators

38 / 46

Page 39: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Iterators

Collections and their iterators

sequence .__iter__() −→ iterator

iterator .__iter__() −→self

iterator .next() −→ item

iterator .next() −→ item

iterator .next() −→ item

39 / 46

Page 40: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Iterators

Iterators can be non-destructive or destructive

list

file

40 / 46

Page 41: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Iterators

How can we create an iterator?

1 write a class with .__iter__ and .next2 use a generator expression3 write a generator function

41 / 46

Page 42: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Iterators Iterator classes

1. Iterator class

import random

class randomaccess(object):def __init__(self, alist):

self.indices = range(len(alist))random.shuffle(self.indices)self.alist = alist

def next(self):return self.alist[self.indices.pop()]

def __iter__(self):return self

42 / 46

Page 43: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Iterators Generator expressions

2. Generator expressions

# chomped lines(line.rstrip() for line in open(some_file))

# remove comments(line for line in open(some_file)

if not line.startswith(’#’))

>>> type([i for i in ’abc’])<type ’list’>>>> type( i for i in ’abc’ )<type ’generator’>

43 / 46

Page 44: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Iterators Generator functions

3. Generator functions

>>> def countdown(n):... print ’--start’... for i in range(n, 0, -1):... yield i... print ’--done’

>>> for i in countdown(3):... print i--start321--done

>>> g = gen(2)>>> g<generator object ...>

>>> g.next()--start2

>>> g.next()1

>>> g.next()--stopTraceback (most recent call last):

...StopIteration

44 / 46

Page 45: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

Iterators Generator functions

Generator objects__iter__ and next

>>> g.__iter__()<generator object gen at 0x7f24a5e1c690>>>> g<generator object gen at 0x7f24a5e1c690>

>>> g.next()--start1

45 / 46

Page 46: Python Data Structures · PythonDataStructures ZbigniewJędrzejewski-Szmek George Mason University PythonSummerSchool,Zürich,September02,2013 Version Zürich-83-gf79f32d This work

The end

That’s all!

46 / 46