pydiomatic
DESCRIPTION
Python is a high-level language focused on readability. The Python community developed the concept of "Pythonic code", which requires not only semantic correctness but also conformity to widely acknowledged stylistic criteria. A prerequisite for writing Pythonic code is writing idiomatic code. Using the right idioms is a matter of acquired taste and experience; however, some idioms are quite easy to learn. This presentation focuses on some of these idioms and other stylistic criteria:
* for vs. while
* iterators, itertools
* code conventions (space invaders)
* avoiding default-value bugs
* first-class functions
* internal/external iterators
* substituting the switch statement
* properties, attributes, read-only objects
* named tuples
* duck typing
* bits of metaprogramming
* exception management: LBYL vs. EAFP
TRANSCRIPT
Idiomatic Python
Enrico
1
Could you please lend me the thing that you put in the wall when you want to turn on the hairdryer and the hairdryer comes from a different country?
Could you please lend me a power adapter?
2
If you are out to describe the truth, leave elegance to the tailor.
Albert Einstein
3
Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
Brian Kernighan
4
READABILITY COUNTS
Zen of Python
5
TOC
Iteration
Naming
Functions are objects
Choice
Attributes and methods
Duck Typing
Exceptions [unless TimeoutError is thrown]
6
FOR vs. WHILE vs. ...
Iteration vs. Recursion
sys.setrecursionlimit(n)
for vs. while
Traditionally bounded iteration vs. unbounded iteration
In C, for and while are completely equivalent
Some languages have for/foreach to iterate over collections
for file in *.py; do
    pygmentize -o "${file%.py}.rtf" "$file"
done
7
Numerical Iteration
int i = 0; while(i < MAX) { printf("%d\n", i); ++i; }
int i = 0; for(i=0; i < MAX; ++i) { printf("%d\n", i); }
i = 0
while i < MAX:
    print i
    i += 1

# O(n) space (Python 2: range builds a whole list)
for i in range(MAX):
    print i

# O(1) space (Python 2: xrange produces values lazily)
for i in xrange(MAX):
    print i
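In Python 3, xrange is gone and range itself is lazy, so the O(n) vs. O(1) distinction above only applies to Python 2. A quick check, assuming CPython 3:

```python
import sys

small = range(10)
huge = range(10**9)

# A range object stores only start, stop and step, so its size does not
# depend on how many values it describes.
print(sys.getsizeof(small), sys.getsizeof(huge))

# Materializing the values into a list is what costs O(n) memory.
print(sys.getsizeof(list(small)))
```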
8
Iteration on elements
It is also common to iterate over the elements of some collection
C uses indices to iterate over array elements
Python uses for
What if we want to iterate over both elements and indices?
BAD
i = 0
while i < len(lst):
    process(lst[i])
    i += 1

GOOD
for el in lst:
    process(el)
9
BAD
j = 0
while j < len(lst):
    process(index=j, element=lst[j])
    j += 1

BAD
for j in range(len(lst)):
    process(index=j, element=lst[j])

GOOD
for j, el in enumerate(lst):
    process(index=j, element=el)
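A runnable Python 3 version of the enumerate idiom; process() here is a hypothetical stand-in that simply returns its arguments, so the pairs can be inspected:

```python
def process(index, element):
    # Hypothetical stand-in for process(): it just returns its arguments.
    return (index, element)

lst = ['a', 'b', 'c']

results = []
for j, el in enumerate(lst):
    results.append(process(index=j, element=el))
print(results)  # [(0, 'a'), (1, 'b'), (2, 'c')]

# enumerate also takes a start value, e.g. for 1-based line numbers.
numbered = list(enumerate(lst, start=1))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```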
10
What about Turing?
for is usually considered the more Pythonic alternative
Ideally, every iteration should be done using for
However, we have shown only iteration over finite collections; that alone would not give for Turing completeness
But everybody knows about generators: Python has infinite (lazy) sequences, and they cover many other patterns as well
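A minimal sketch of an infinite lazy sequence consumed with for (Python 3; fibonacci is just an illustrative generator). The unbounded while True lives inside the generator, while the client only ever writes for:

```python
from itertools import islice

def fibonacci():
    """Infinite (lazy) sequence: each value is computed only when requested."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# A for loop over an infinite generator, stopped by an explicit condition.
for n in fibonacci():
    if n > 100:
        break
    print(n, end=' ')
print()

# islice takes a finite prefix without writing the loop by hand.
first_ten = list(islice(fibonacci(), 10))
print(first_ten)
```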
11
Design Implications
Python's for statement uses external iterators, which are extremely easy to implement with generators
itertools provides lots of functions to manipulate iterators
The iteration logic is pushed inside the iterator; the client code becomes totally agnostic about how values are generated
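To illustrate that agnosticism, here is a sketch of a client function built from itertools (Python 3; the function name and the "EOF" convention are only illustrative). It receives a list, a file object and a generator and cannot tell them apart:

```python
import io
import itertools

def numbered_until_eof(lines):
    """Number the non-blank lines that precede an "EOF" line.

    Works with any iterable of strings: a list, a file object, a generator...
    """
    # takewhile stops consuming the input at the first "EOF" line.
    body = itertools.takewhile(lambda line: line.rstrip('\r\n') != 'EOF', lines)
    # filterfalse drops the lines for which the predicate is true (blank lines).
    nonblank = itertools.filterfalse(lambda line: not line.strip(), body)
    return [(i, line.rstrip('\r\n')) for i, line in enumerate(nonblank)]

text = 'hello\n\nworld\nEOF\nignored\n'
from_list = numbered_until_eof(text.splitlines(True))
from_file = numbered_until_eof(io.StringIO(text))
from_generator = numbered_until_eof(line + '\n' for line in text.split('\n'))
print(from_list)  # [(0, 'hello'), (1, 'world')]
```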
12
import socket

def server_socket(host, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(5)
    csock, info = sock.accept()
    return csock.makefile('rw')

def server(host, port):
    fh = server_socket(host, port)
    for i, line in enumerate(fh):
        if line == "EOF\r\n":
            break
        fh.write("%4d:\t%s" % (i, line))
        fh.flush()  # the file object is buffered: send each reply right away
    fh.close()
... SocketServer's (Forking)TCPServer and higher-level modules and frameworks are better!
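For comparison, a sketch of the same numbering server on top of the standard library (Python 3, where the module is called socketserver; in Python 2 it was SocketServer). The handler name, the loopback address and the one-shot handle_request() call are choices made for a self-contained demo, and running it needs loopback networking; ForkingTCPServer or ThreadingTCPServer would serve many clients:

```python
import socket
import socketserver
import threading

class NumberingHandler(socketserver.StreamRequestHandler):
    """Send back each received line prefixed by its number, until an "EOF" line."""

    def handle(self):
        # self.rfile / self.wfile are file-like wrappers around the client socket.
        for i, line in enumerate(self.rfile):
            if line.rstrip(b'\r\n') == b'EOF':
                break
            self.wfile.write(b'%4d:\t%s' % (i, line))

def run_demo():
    # Port 0 lets the OS pick a free port; handle_request() serves one client.
    with socketserver.TCPServer(('127.0.0.1', 0), NumberingHandler) as server:
        host, port = server.server_address
        worker = threading.Thread(target=server.handle_request)
        worker.start()
        with socket.create_connection((host, port)) as client:
            client.sendall(b'hello\r\nworld\r\nEOF\r\n')
            with client.makefile('rb') as replies:
                data = replies.read()   # returns once the server closes the connection
        worker.join()
    return data

print(run_demo())
```

The loop over lines, the numbering and the "EOF" check are the same as in the hand-written server; socket setup, buffering and connection teardown are handled by the library.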
13
import collections

def depth_first_visit(node):
    stack = [node, ]
    while stack:
        current_node = stack.pop()
        stack.extend(reversed(current_node.children))
        yield current_node.value

def breadth_first_visit(node):
    queue = collections.deque((node, ))
    while queue:
        current_node = queue.popleft()
        queue.extend(current_node.children)
        yield current_node.value

for v in depth_first_visit(tree):
    print v,
print

for v in breadth_first_visit(tree):
    print v,
print
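The visits above only assume that a node has .value and .children. A self-contained Python 3 sketch, with a hypothetical minimal Node class and a tree chosen so that the two orders differ:

```python
import collections

class Node:
    """Hypothetical minimal tree node: the slides only assume .value and .children."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def depth_first_visit(node):
    stack = [node]
    while stack:
        current_node = stack.pop()
        # Push children reversed so that the leftmost child is popped first.
        stack.extend(reversed(current_node.children))
        yield current_node.value

def breadth_first_visit(node):
    queue = collections.deque([node])
    while queue:
        current_node = queue.popleft()
        queue.extend(current_node.children)
        yield current_node.value

tree = Node('A', [Node('B', [Node('D')]), Node('C', [Node('E')])])
print(' '.join(depth_first_visit(tree)))    # A B D C E
print(' '.join(breadth_first_visit(tree)))  # A B C D E
```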
14
PEP-8
http://www.python.it/doc/articoli/pep-8.html
'''One of Guido's key insights is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. As PEP 20 [6] says, "Readability counts".'''
http://www.python.org/dev/peps/pep-0008/
15
PEP-8 (II)
Standard for source code style
names
whitespace
indentation
Consistency with this style guide is important.
Consistency within a project is more important.
Consistency within one module or function is most important.
16
Indentation: 4 spaces; don't mix tabs and spaces
79 characters per line max
Wrap long lines using implied line continuation inside (), [] and {}
Add parentheses to wrap lines
Sometimes a backslash is more appropriate
Newline after operators (current PEP 8 instead recommends breaking before binary operators)
Two blank lines around top-level functions and classes, one between methods
(not filename.startswith('.') and
 filename.endswith(('.pyc', '.pyo')))
17
Space Invaders
Put a space after "," (parameters, lists, tuples, etc.)
Put a space after ":" in dicts, not before
Put spaces around assignments and comparisons
Except around = in keyword arguments and default parameter values
No spaces just inside parentheses or just before argument lists
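The wrapping and spacing rules from the last two slides, applied together in a small Python 3 sketch (the function and file names are illustrative):

```python
def is_compiled_file(filename, extensions=('.pyc', '.pyo')):
    # No spaces around "=" for a default value; a space after each ",".
    # The long condition is wrapped inside parentheses (implied continuation),
    # breaking after the operator as the slide above suggests.
    return (not filename.startswith('.') and
            filename.endswith(extensions))

names = ['a.py', 'a.pyc', '.hidden.pyc', 'b.pyo']
compiled = [name for name in names if is_compiled_file(name)]  # spaces around "="
sizes = {'a.pyc': 120, 'b.pyo': 98}  # space after ":" in dicts, none before
print(compiled)
```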
18
Naming conventions (I)
Always use descriptive names; the longer the scope, the longer the name
Trailing underscore: avoids conflict with keywords or builtins (class_)
Leading underscore: “internal use”/non-public
Double leading underscore: name mangling
Double leading and trailing: “magic”
Avoid l, O, I and similar confusing names (easily mistaken for 1 and 0)
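A short Python 3 sketch of what "name mangling" means in practice (class and attribute names are illustrative): a double leading underscore inside a class body is rewritten to _ClassName__attr, so a subclass can reuse the name without clobbering the base class attribute:

```python
class Base:
    def __init__(self):
        self._internal = 'non-public by convention'   # single underscore: just a hint
        self.__secret = 'mangled'                      # stored as _Base__secret

    def reveal(self):
        return self.__secret                           # compiled as self._Base__secret

class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__secret = 'a different attribute'        # stored as _Derived__secret

d = Derived()
print(sorted(vars(d)))  # ['_Base__secret', '_Derived__secret', '_internal']
print(d.reveal())       # mangled
```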
19
Naming conventions (II)
Style per kind of name (simple = lowercase, no underscores):
Classes: CamelCase
Variables, methods, functions: simple or lower_case
Constants: ALL_CAPS
Packages: simple
Modules: simple (lower_case acceptable)
... and self / cls as the name of the first argument of instance / class methods
20
Default values
Default values are evaluated once, when the function is defined, and the resulting object is shared among all calls
If the default value is a mutable object, this leads to bugs
>>> def f(x=[]):
...     x.append(1)
...     return x
...
>>> f()
[1]
>>> f()
[1, 1]
>>> f()
[1, 1, 1]

>>> def g(x=None):
...     x = [] if x is None else x
...     return x
...
>>> g()
[]
>>> g([1, 2])
[1, 2]
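The shared default can actually be seen on the function object. A Python 3 sketch contrasting the buggy version with the None-sentinel pattern (both function names are illustrative):

```python
def append_one_shared(x=[]):
    # The default list is created once, at definition time, and reused.
    x.append(1)
    return x

def append_one(x=None):
    # None as a sentinel: a fresh list is created on every call.
    if x is None:
        x = []
    x.append(1)
    return x

append_one_shared()
append_one_shared()
print(append_one_shared.__defaults__)  # ([1, 1],): the shared object lives here

print(append_one(), append_one())      # [1] [1]
```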
21
Functions are Objects
In Python everything is an object
Thus, functions are objects
Functions can be passed as arguments (easy)
Functions can be returned as return values
Some APIs explicitly expect functions as arguments (sort(key=))
import sys
import urllib

def reporthook(*a):
    print a

for url in sys.argv[1:]:
    i = url.rfind('/')
    filename = url[i+1:]
    print url, "->", filename
    urllib.urlretrieve(url, filename, reporthook)
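The example above passes a function as an argument; this Python 3 sketch also returns one. make_suffix_key is a hypothetical helper that builds a key function for sorted(key=...), here sorting URLs by their file name rather than by the full URL:

```python
def make_suffix_key(separator):
    """Return a new function: functions are ordinary values that can be returned."""
    def key(name):
        # Sort by the part after the last separator (the file name in a URL).
        return name.rsplit(separator, 1)[-1]
    return key

urls = ['http://example.com/a/zeta.txt', 'http://example.com/b/alpha.txt']
by_filename = make_suffix_key('/')
print(sorted(urls))                   # plain string order: a/zeta.txt first
print(sorted(urls, key=by_filename))  # file-name order: b/alpha.txt first
```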
22
Internal Iterators
import collections
import sys

def dfs(node, action):
    stack = [node, ]
    while stack:
        current_node = stack.pop()
        stack.extend(reversed(current_node.children))
        action(current_node.value)

def bfs(node, action):
    queue = collections.deque((node, ))
    while queue:
        current_node = queue.popleft()
        queue.extend(current_node.children)
        action(current_node.value)

dfs(tree, lambda x: sys.stdout.write("%s, " % x))
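With an internal iterator the traversal owns the loop and calls back into client code. A Python 3 sketch with a hypothetical Node class; any callable works as the action, including a bound method such as list.append:

```python
import collections

class Node:
    """Hypothetical tree node with .value and .children, as assumed by the slides."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def bfs(node, action):
    # Internal iterator: the loop lives here, the client only supplies action.
    queue = collections.deque([node])
    while queue:
        current_node = queue.popleft()
        queue.extend(current_node.children)
        action(current_node.value)

tree = Node('A', [Node('B', [Node('D')]), Node('C', [Node('E')])])
visited = []
bfs(tree, visited.append)   # a bound method is a perfectly good callback
print(visited)              # ['A', 'B', 'C', 'D', 'E']
```

The trade-off with the generator version is that the client cannot simply break out early: the traversal decides when to stop.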
23
def dfs(node, pre_action=None, post_action=None):
    def nop(value):
        pass
    # Test for None explicitly rather than writing "pre_action or nop".
    if pre_action is None:
        pre_action = nop
    if post_action is None:
        post_action = nop
    stack = []

    def process_node(n):
        def do_pre():
            pre_action(n.value)
        def do_post():
            post_action(n.value)
        def do_process():
            # Pushed in reverse order of execution: pre, children, post.
            stack.append(do_post)
            for child in reversed(n.children):
                stack.append(process_node(child))
            stack.append(do_pre)
        return do_process

    stack.append(process_node(node))
    while stack:
        action = stack.pop()
        action()

dfs(tree, pre_action=lambda x: sys.stdout.write("%s, " % x))
print
dfs(tree, post_action=lambda x: sys.stdout.write("%s, " % x))
print
24
[Figure: the sample tree (A with children B and C; C with children D and E) and a 15-step trace of the explicit stack of Pre / Proc / Post actions during dfs]
25
26
Command Pattern is obsolete...
class TreePrinter(object):
    def __init__(self, fh, step=' '):
        self.out = fh
        self.step = step
        self.level = 0

    def pre_print(self, value):
        self.out.write(self.step * self.level)
        self.out.write(str(value))
        self.out.write('\n')
        self.level += 1

    def post_print(self, _):
        self.level -= 1

tp = TreePrinter(sys.stdout)
dfs(tree, tp.pre_print, tp.post_print)
[Figure: a sample tree with nodes 0 to 11 and its indented printout]
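A self-contained Python 3 sketch of the TreePrinter idea, reusing the closure-based dfs from the slides with a hypothetical Node class and an io.StringIO as the output file. The pre/post hooks are just bound methods of one stateful object, which is the role a Command class hierarchy would otherwise play:

```python
import io

class Node:
    """Hypothetical tree node (.value, .children), as assumed by the slides."""
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def dfs(node, pre_action=None, post_action=None):
    def nop(value):
        pass
    if pre_action is None:
        pre_action = nop
    if post_action is None:
        post_action = nop
    stack = []

    def process_node(n):
        def do_pre():
            pre_action(n.value)
        def do_post():
            post_action(n.value)
        def do_process():
            # Pushed in reverse order of execution: pre, children, post.
            stack.append(do_post)
            for child in reversed(n.children):
                stack.append(process_node(child))
            stack.append(do_pre)
        return do_process

    stack.append(process_node(node))
    while stack:
        action = stack.pop()
        action()

class TreePrinter:
    def __init__(self, fh, step=' '):
        self.out = fh
        self.step = step
        self.level = 0

    def pre_print(self, value):
        # Indent by depth, then go one level deeper for the children.
        self.out.write(self.step * self.level + str(value) + '\n')
        self.level += 1

    def post_print(self, _):
        self.level -= 1

tree = Node('A', [Node('B'), Node('C', [Node('D'), Node('E')])])
out = io.StringIO()
tp = TreePrinter(out)
dfs(tree, tp.pre_print, tp.post_print)
print(out.getvalue())
```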
27
The case of the missing switch
Some people think Python should have a switch/case-like statement: something that executes a block of code determined by the value of a variable
Possible solutions
Python's if/elif/else statement
This looks like a job for a dictionary + functions
A cleverly designed class can solve the problem as well
28
What if we use if?
An if/elif chain is easy to read and write when there are few branches, but confusing when there are many
Theoretically correct (provided that the conditions are disjoint)
Possibly slower, as conditions are evaluated in order
Some suggest that if statements should be banned ;)
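$$
f(x_1, \dots, x_n) =
\begin{cases}
\varphi_1(x_1, \dots, x_n) & \text{if } \rho_1(x_1, \dots, x_n) \\
\quad\vdots & \\
\varphi_m(x_1, \dots, x_n) & \text{if } \rho_m(x_1, \dots, x_n) \\
\varphi_{m+1}(x_1, \dots, x_n) & \text{otherwise}
\end{cases}
$$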
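The piecewise definition maps one-to-one onto an if/elif/else chain. A tiny Python 3 sketch using the sign function as an illustrative case: each branch is one row, and the final else is the "otherwise" row:

```python
def sign(x):
    # Conditions are tested in order, so the first true one wins; with
    # disjoint conditions (as here) the order does not change the result.
    if x > 0:
        return 1
    elif x < 0:
        return -1
    else:
        return 0

print([sign(v) for v in (5, -2, 0)])  # [1, -1, 0]
```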
29
Dictionary
If the body of the switch essentially sets some (set of) variable(s), a dictionary is perfect
def some_function(n, *more_args):
    # ...
    masks = {
        0: '0000', 1: '0001', 2: '0010', 3: '0011',
        4: '0100', 5: '0101', 6: '0110', 7: '0111',
        8: '1000', 9: '1001', 10: '1010', 11: '1011',
        12: '1100', 13: '1101', 14: '1110', 15: '1111',
    }
    # ...
    str_bits = masks[n]
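Two small additions to the dictionary approach, sketched in Python 3: the same mask table can be built with format(n, '04b'), and dict.get supplies the equivalent of a switch's "default:" branch. The HTTP status table is just illustrative data:

```python
# Same table as above, built programmatically.
MASKS = {n: format(n, '04b') for n in range(16)}

def describe_status(code):
    # dict.get plays the role of the "default:" branch of a switch.
    messages = {200: 'OK', 404: 'Not Found', 500: 'Internal Server Error'}
    return messages.get(code, 'Unknown status')

print(MASKS[5], describe_status(404), describe_status(418))
```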
30
Dictionary [+ Functions]
If the "actions" in the branches are naturally abstracted as functions, a dictionary is perfect
import operator
# ...
class BinOp(Node):
    # ...
    def compute(self):
        operations = {
            '+': operator.add,
            '-': operator.sub,
            '*': operator.mul,
            '/': operator.div,  # Python 2 only; Python 3 uses operator.truediv
        }
        return operations[self.op](self.left.compute(),
                                   self.right.compute())
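A runnable Python 3 sketch of the BinOp idea; Node and Num are hypothetical classes added to make it self-contained, and the operations table is built once at class level instead of on every call:

```python
import operator

class Node:
    """Hypothetical base class for expression-tree nodes."""

class Num(Node):
    def __init__(self, value):
        self.value = value

    def compute(self):
        return self.value

class BinOp(Node):
    OPERATIONS = {
        '+': operator.add,
        '-': operator.sub,
        '*': operator.mul,
        '/': operator.truediv,   # Python 3 name; Python 2 had operator.div
    }

    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right

    def compute(self):
        # An unknown operator raises KeyError, like a switch with no default.
        return self.OPERATIONS[self.op](self.left.compute(), self.right.compute())

# (1 + 2) * (10 / 4)
expr = BinOp('*', BinOp('+', Num(1), Num(2)), BinOp('/', Num(10), Num(4)))
print(expr.compute())   # 7.5
```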
31
import cmd

class Example(cmd.Cmd):
    def do_greet(self, rest):
        print 'Hello %s!' % rest

    def do_quit(self, rest):
        return True

Example().cmdloop()

# The same loop written by hand, with an imaginary switch statement
# (pseudocode: Python has no switch):
while 1:
    words = raw_input('(cmd) ').split(' ', 1)
    command = words[0]
    try:
        rest = words[1]
    except IndexError:
        rest = ''
    switch command:
        case 'greet':
            print 'Hello %s!' % rest
        case 'quit':
            break
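cmd.Cmd does the dispatch by method name: "greet World" calls do_greet('World'), and unknown commands go to default(). A Python 3 sketch that drives the interpreter with onecmd() and an io.StringIO instead of an interactive cmdloop(), so the output can be inspected:

```python
import cmd
import io

class Example(cmd.Cmd):
    prompt = '(cmd) '

    def do_greet(self, rest):
        # The lookup of "do_" + command name is the "switch".
        self.stdout.write('Hello %s!\n' % rest)

    def do_quit(self, rest):
        # Returning a true value stops cmdloop().
        return True

out = io.StringIO()
shell = Example(stdout=out)
shell.onecmd('greet World')
shell.onecmd('dance')          # no do_dance: handled by Cmd.default()
stop = shell.onecmd('quit')
print(out.getvalue(), stop)
```

Interactively, the same class runs with Example().cmdloop().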
32
Properties are a neat way to implement attributes whose usage resembles attribute access, but whose implementation uses method calls.
These are sometimes known as “managed attributes”.
GvR
33
Example (Track)
class Track(object):
    def __init__(self, artist, title, duration):
        self.artist = artist
        self.title = title
        self.duration = duration

    def __str__(self):
        return '%s - %s - %s' % (self.artist, self.title, self.duration)
34
Properties (I)
Track has public attributes
In Java, public attributes are considered bad practice
Dependency on implementation details
What if we need validation in setters and such?
property: plain attribute access syntax, method calls under the hood
class A(object):
    def __init__(self, foo):
        self._foo = foo

    def get_foo(self):
        print 'got foo'
        return self._foo

    def set_foo(self, val):
        print 'set foo'
        self._foo = val

    foo = property(get_foo, set_foo)

a = A('hello')
print a.foo
# => 'got foo'
# => 'hello'
a.foo = 'bar'
# => 'set foo'
35
Properties (II)
Sometimes we don't need the setter...
class A(object):
    def __init__(self, foo):
        self._foo = foo

    def get_foo(self):
        print 'got foo'
        return self._foo

    foo = property(get_foo)

a = A('ciao')
print a.foo
# => 'got foo'
# => 'ciao'
a.foo = 'bar'
# Traceback (most recent call last):
#   File "prop_example2.py", line 15, in <module>
#     a.foo = 'bar'
# AttributeError: can't set attribute
36
Properties (III)
Nicer syntax: decorators are handy
class A(object):
    def __init__(self, foo):
        self._foo = foo

    @property
    def foo(self):
        print 'got foo'
        return self._foo

a = A('hello')
print a.foo
# => 'got foo'
# => 'hello'
a.foo = 'bar'
# Traceback (most recent call last):
#   File "prop_example2.py", line 15, in <module>
#     a.foo = 'bar'
# AttributeError: can't set attribute
37
Properties (IV)
From Python 2.6, decorator for the setter:
class A(object):
    def __init__(self, foo):
        self._foo = foo

    @property
    def foo(self):
        print 'got foo'
        return self._foo

    @foo.setter
    def foo(self, value):
        print 'set foo'
        self._foo = value

a = A('hello')
a.foo = 'bar'
# => 'set foo'
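This answers the earlier question about validation in setters. A Python 3 sketch where Track gains a check on duration (assumed here to be in seconds) while clients keep using plain attribute syntax; note that __init__ also goes through the setter:

```python
class Track:
    def __init__(self, artist, title, duration):
        self.artist = artist
        self.title = title
        self.duration = duration   # goes through the setter below

    @property
    def duration(self):
        return self._duration

    @duration.setter
    def duration(self, seconds):
        # Validation added later, without changing how clients use the attribute.
        if seconds < 0:
            raise ValueError('duration must be non-negative, got %r' % (seconds,))
        self._duration = seconds

t = Track('Artist', 'Title', 215)
t.duration = 230
try:
    t.duration = -1
except ValueError as exc:
    print(exc)
```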
38
class Track(object):
    def __init__(self, artist, title, duration):
        self._artist = artist
        self._title = title
        self._duration = duration

    @property
    def artist(self):
        return self._artist

    @property
    def title(self):
        return self._title

    @property
    def duration(self):
        return self._duration

    def __str__(self):
        return '%s - %s - %s' % (self.artist, self.title, self.duration)
39
How Pythonic?
We can decouple interface from implementation (getters/setters)
We have "read-only" attributes, therefore "immutable" objects
Trivial getters/setters are repetitive
Properties help code evolve, but are verbose for defining "immutable" objects
40
Named Tuples
Named Tuples solve the problem nicely
Immutable objects (easier to use, too much C++ and FP lately ☺)
Can be used both as objects and tuples
__str__ and other methods have good default implementations
Subclassing can be used to change defaults
Very quick to write!
http://code.activestate.com/recipes/500261-named-tuples/
41
import collections

Track = collections.namedtuple('Track', ['title', 'artist', 'duration'])
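A Python 3 sketch of the properties listed above: tuple and attribute access, the generated repr, immutability with _replace, and a subclass that changes the default string form (the track data is made up):

```python
import collections

Track = collections.namedtuple('Track', ['title', 'artist', 'duration'])

t = Track('Song A', 'Band B', 180)
title, artist, duration = t          # usable as a plain tuple...
print(t.artist, t[2])                # ...and as an object with attributes
print(t)                             # Track(title='Song A', artist='Band B', duration=180)

longer = t._replace(duration=200)    # "modifying" returns a new tuple

class PrettyTrack(Track):
    # Subclassing to change a default: a friendlier __str__.
    __slots__ = ()

    def __str__(self):
        return '%s - %s - %s' % (self.artist, self.title, self.duration)

print(PrettyTrack('Song A', 'Band B', 180))   # Band B - Song A - 180
```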
42
About Java/C++ types...
In statically typed languages like C++ we constrain parameters to be of a given type or any of its subtypes
However, a good programming practice is to program to an interface
Java interfaces (true dynamic polymorphism)
C++ Templates (static polymorphism)
Both solutions have problems (however, I do love ML static typing...)
43
Books, search by title
If the list contains a non-book, an exception is raised
Does not even work with subclasses
Worst strategy
Never type-check like that
Solving a non-problem
class Book(object):
    def __init__(self, title, author):
        self.title = title
        self.author = author

def find_by_title(seq, title):
    for item in seq:
        if type(item) == Book:  # horrible
            if item.title == title:
                return item
        else:
            raise TypeError

def find_by_author(seq, author):
    for item in seq:
        if type(item) == Book:  # horrible
            if item.author == author:
                return item
        else:
            raise TypeError
44
Books, search by title
Subclasses are ok
However, code does not depend on elements being books
They have a title
They have an author
What about songs?
Bad strategy, after all
def find_by_title(seq, title):
    for item in seq:
        if isinstance(item, Book):  # bad
            if item.title == title:
                return item
        else:
            raise TypeError

def find_by_author(seq, author):
    for item in seq:
        if isinstance(item, Book):  # bad
            if item.author == author:
                return item
        else:
            raise TypeError

class Book(object):
    def __init__(self, title, author):
        self.title = title
        self.author = author
class Song(object):
    def __init__(self, title, author):
        self.title = title
        self.author = author
45
What about movies?
Movies have a title. However, they have a director and no author
find_by_title should work; find_by_author shouldn't
An interface for Book and Song. And what about Movie?
Design patterns or code duplication
Square wheel ⇒ roads designed for square wheels
Duck typing simply avoids the problem
46
Books and Songs
The simplest solution is the best
Programmers do not code by chance (hopefully)
AttributeErrors are raised in case of problems
Unit tests catch this kind of error
You have unit tests, don't you?
class Book(object):
    def __init__(self, t, a):
        self.title = t
        self.author = a

def find_by_title(seq, title):
    for item in seq:
        if item.title == title:
            return item

def find_by_author(seq, author):
    for item in seq:
        if item.author == author:
            return item
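A Python 3 sketch of the duck-typed version with books, songs and movies (Movie with a director is the hypothetical case from the previous slide). find_by_title works for all three; find_by_author fails loudly with AttributeError only when it actually reaches a Movie:

```python
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author

class Song:
    def __init__(self, title, author):
        self.title = title
        self.author = author

class Movie:
    # A movie has a director instead of an author.
    def __init__(self, title, director):
        self.title = title
        self.director = director

def find_by_title(seq, title):
    for item in seq:
        if item.title == title:   # only requires a .title attribute
            return item

def find_by_author(seq, author):
    for item in seq:
        if item.author == author:
            return item

items = [Book('B1', 'X'), Song('S1', 'Y'), Movie('M1', 'Z')]
print(type(find_by_title(items, 'M1')).__name__)   # Movie
print(find_by_author(items, 'Y').title)            # S1
try:
    find_by_author(items, 'nobody')   # reaches the Movie and fails loudly
except AttributeError as exc:
    print('AttributeError:', exc)
```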
47
class NotFound(Exception):
    pass

def find_by(seq, **kwargs):
    for obj in seq:
        for key, val in kwargs.iteritems():
            try:
                if getattr(obj, key) != val:
                    break
            except AttributeError:
                break
        else:
            return obj
    raise NotFound

print find_by(books, title='Python in a Nutshell')
print find_by(books, author='M. Beri')
print find_by(books, title='Python in a Nutshell', author='A. Martelli')

try:
    print find_by(books, title='Python in a Nutshell', author='M. Beri')
    print find_by(books, title='Python in a Nutshell', pages=123)
except NotFound:
    pass
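The key trick is the else clause of the inner for: it runs only if the loop finished without break, that is, only if every criterion matched. A Python 3 sketch (kwargs.items() replaces iteritems(); the book data is made up):

```python
class NotFound(Exception):
    """Raised when no object matches all the given attributes."""

class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author

def find_by(seq, **kwargs):
    for obj in seq:
        for key, val in kwargs.items():
            try:
                if getattr(obj, key) != val:
                    break              # mismatch: try the next object
            except AttributeError:
                break                  # a missing attribute counts as a mismatch
        else:
            # The inner loop ended without break: every criterion matched.
            return obj
    raise NotFound(kwargs)

books = [Book('Title A', 'Author 1'), Book('Title B', 'Author 1')]
print(find_by(books, title='Title B', author='Author 1').title)   # Title B
try:
    find_by(books, title='Title A', pages=123)
except NotFound as exc:
    print('not found:', exc)
```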
48
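def find_by(seq, **kwargs):
    for obj in seq:
        for key, val in kwargs.iteritems():
            try:
                attr = getattr(obj, key)
            except AttributeError:
                break
            else:
                try:
                    # exact match, or substring / membership match
                    matches = val == attr or val in attr
                except TypeError:  # attr does not support "in" (e.g. an int)
                    matches = False
                if not matches:
                    break
        else:
            yield obj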
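The generator version yields every match and also accepts partial matches through "in". A Python 3 sketch with made-up data, including a numeric attribute to show why the TypeError guard matters (300 in 450 would otherwise raise):

```python
class Book:
    def __init__(self, title, author, pages):
        self.title = title
        self.author = author
        self.pages = pages

def find_by(seq, **kwargs):
    """Yield every object whose attributes equal, or contain, the given values."""
    for obj in seq:
        for key, val in kwargs.items():
            try:
                attr = getattr(obj, key)
            except AttributeError:
                break
            try:
                # Exact match, or substring / membership match for containers.
                matches = val == attr or val in attr
            except TypeError:
                # attr does not support "in" (e.g. an int): only equality counts.
                matches = False
            if not matches:
                break
        else:
            yield obj

books = [Book('Learning X', 'Ann Smith and Bob Jones', 300),
         Book('Mastering X', 'Bob Jones', 450)]
print([b.title for b in find_by(books, author='Bob Jones')])   # both books
print([b.title for b in find_by(books, pages=300)])            # ['Learning X']
```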
Life expectations
Function parameters and every variable bound in a function body constitute the function's local scope
The scope of these variables is the whole function body
However, using one of them before it is bound is an error (UnboundLocalError)
50
WRONG
a = None
if s.startswith(t):
    a = s[:4]
else:
    a = t
print a
GOOD
if s.startswith(t):
    a = s[:4]
else:
    a = t
print a
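Because local scope covers the whole function body, an assignment anywhere in the function makes the name local everywhere in it. A Python 3 sketch (function names are illustrative) showing the correct if/else binding and two ways to hit UnboundLocalError, a subclass of NameError:

```python
def prefix_or_default(s, t):
    # Both branches bind `a`, so no `a = None` is needed before the if.
    if s.startswith(t):
        a = s[:4]
    else:
        a = t
    return a

def broken(s, t):
    if s.startswith(t):
        a = s[:4]
    # No else: on this path `a` is local to the function but never bound.
    return a

count = 0

def increment():
    # The assignment makes `count` local to the whole body, so the read
    # implied by "+=" happens before any local binding exists.
    count += 1
    return count

print(prefix_or_default('python', 'py'), prefix_or_default('java', 'py'))
```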
50
LBYL vs. EAFP
LBYL: Look before you leap
EAFP: Easier to ask forgiveness than permission
Usually EAFP is the best strategy
Exceptions are rather fast
Atomicity (no window between the check and the action), ...
# LBYL -- bad
if id_ in employees:
    emp = employees[id_]
else:
    report_error(...)

# EAFP -- good
try:
    emp = employees[id_]
except KeyError:
    report_error(...)
51
BAD
if os.access(filename, os.F_OK):
    fh = file(filename)
else:
    print "Something went bad."

VERBOSE
if os.access(filename, os.F_OK):
    try:
        fh = file(filename)
    except IOError:
        print "Something went bad."
else:
    print "Something went bad."

GOOD
try:
    fh = file(filename)
except IOError:
    print "Something went bad."
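Both EAFP cases in Python 3, as a sketch with made-up data: returning None stands in for report_error(...), and the file example uses a temporary directory so it is self-contained. In Python 3, IOError is an alias of OSError and file() is replaced by open():

```python
import os
import tempfile

employees = {1: 'Alice', 2: 'Bob'}   # hypothetical data

def lookup(id_):
    # EAFP on a dict: one lookup, no separate membership test.
    try:
        return employees[id_]
    except KeyError:
        return None   # stands in for report_error(...)

def read_first_line(filename):
    # EAFP on the file system: just try to open the file. A prior os.access()
    # check could pass and open() still fail, because the file can be removed
    # or its permissions changed in between.
    try:
        with open(filename) as fh:
            return fh.readline()
    except OSError:
        return 'Something went bad.'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'notes.txt')
    with open(path, 'w') as fh:
        fh.write('first line\nsecond line\n')
    print(repr(read_first_line(path)))
    print(read_first_line(os.path.join(tmp, 'missing.txt')))
```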
52
More on Exceptions
Exceptions should subclass Exception, directly or indirectly
Catch exceptions using the most specific class possible
Don't use a bare except: unless
You plan to re-raise the exception (but you probably should use finally)
You want to log any error, or something like that
A bare except: also catches KeyboardInterrupt (and SystemExit)
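A Python 3 sketch of these guidelines (AppError, ConfigError, load_setting and run are hypothetical names): a small exception hierarchy rooted in Exception, a specific except clause, and a log-and-re-raise wrapper that uses except Exception so KeyboardInterrupt and SystemExit still pass through:

```python
import logging

class AppError(Exception):
    """Base class for this application's errors."""

class ConfigError(AppError):
    """A more specific error: subclasses Exception indirectly."""

def load_setting(settings, name):
    try:
        return settings[name]
    except KeyError:
        # Translate the low-level error into a specific application error.
        raise ConfigError('missing setting %r' % name) from None

def run(task):
    try:
        return task()
    except Exception:
        # Log anything unexpected, then re-raise it unchanged.
        logging.exception('task failed')
        raise

try:
    load_setting({}, 'timeout')
except AppError as exc:    # also catches ConfigError
    print('configuration problem:', exc)
```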
53
Limit the try scope
BAD
try:
    # Too broad!
    return handle_value(collection[key])
except KeyError:
    # Will also catch KeyError raised by handle_value()
    return key_not_found(key)

GOOD
try:
    value = collection[key]
except KeyError:
    return key_not_found(key)
else:
    return handle_value(value)
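To see why the broad version is dangerous, here is a Python 3 sketch where a hypothetical handle_value() has its own bug that raises KeyError. The broad try silently reports an existing key as missing; the narrow try with else lets the real bug surface:

```python
def key_not_found(key):
    return 'missing: %s' % key

def handle_value(value):
    # Hypothetical handler with its own bug: it raises KeyError internally.
    return {'known': 'handled'}[value]

def lookup_broad(collection, key):
    try:
        return handle_value(collection[key])
    except KeyError:
        return key_not_found(key)

def lookup_narrow(collection, key):
    try:
        value = collection[key]
    except KeyError:
        return key_not_found(key)
    else:
        return handle_value(value)

collection = {'a': 'known', 'b': 'unexpected'}
print(lookup_broad(collection, 'b'))    # missing: b  (wrong: 'b' is present)
try:
    lookup_narrow(collection, 'b')
except KeyError as exc:
    print('bug in handle_value:', exc)
```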
54
Q&A
57