Extensible Networking Platform 11 - CSE 330 – Creative Programming and Rapid Prototyping
Module 4 – Python and Regular Expressions
• Module 4 contains only an individual assignment
• Due Monday July 6th
• Do not wait until the last minute to start on this module
• Read the WIKI before starting along with a few Python tutorials
• Portions of today’s slides came from:
– Marc Conrad, University of Luton
– Paul Prescod, Vancouver Python Users’ Group
– James Casey, Opscode
– Tim Finin, University of Maryland
What is Python?
• Python is an easy-to-learn, powerful programming language
– Efficient high-level data structures
– Simple approach to object-oriented programming
– Elegant syntax and dynamic typing
– Up-and-coming language in the open source world
• We are using Python version 3.4 or later in this course
Usability Features
• Very clear syntax
• Obvious way to do most things
• Huge amount of free code and libraries
• Interactive
• Only innovative where innovation is really necessary
– Better to steal a good idea than invent a bad one!
Python “Hello world”
print("Hello, World")
Python Interpreter
• Just type:
Todds-MacBook-Air:~ todd$ python3
Python 3.6.1 (default, Apr 4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Features of the Interpreter
• Lines start with “>>>”. You can recognize Python interpreter transcripts anywhere you see them.
• Expressions that return a value display the value.
>>> 5+3*4
17
• This saves you from excessive “print”ing
Interactive Interpreters
• Windows command line
• OS X
• Linux/Unix
• Graphical command lines: “IDLE”, “PythonWin”, “MacPython”, …
• Jython
• And many more…
Python scripts
• Sometimes you want to run the same program more than once!
• Make a file with Python statements in it:
foo.py:
print("hello world")

todd$ python3 foo.py
hello world
todd$ python3 foo.py
hello world
Python is dynamically typed
width = 20
print(width)
height = 5 * 9
print(height)
print(width * height)
width = "really wide"
print(width)
Experiment in the Interpreter
• Any Python variable can hold any value.
>>> width = 20
>>> height = 5 * 9
>>> print(width * height)
900
>>> width = "really wide"
>>> print(width)
really wide
Dynamic Type Checking
test_sqrt.py:
import math

def square_root(num):
    return math.sqrt(num)

def goodfunc():
    print(square_root(10))

def badfunc():
    print(square_root("10"))    # wrong argument type, but not checked until called

goodfunc()
badfunc()    # fails here, at run time, with a TypeError
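A minimal sketch of what test_sqrt.py demonstrates: the bad call is not rejected when the function is defined, only when it actually executes. The try/except wrapper and the names `good` and `outcome` are illustrative additions, not part of the original slide.

```python
import math

def square_root(num):
    return math.sqrt(num)

# Nothing is checked until the call actually runs.
good = square_root(16)          # works: 16 is a number

try:
    square_root("10")           # a string is not a number
    outcome = "no error"
except TypeError as err:
    outcome = "TypeError"       # raised at run time, when the call executes
    print("Caught:", err)
```

This is why the Python Limitations slide says dynamic type checking requires more care in testing: a code path that is never exercised can hide a type error.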
Multiple statements on a line
• You can combine multiple simple statements on a line:
>>> a = 5;print (a); a = 6; print (a)
5
6
Indentation
• Python uses indentation for scoping:
if this_function(that_variable):
    do_something()
else:
    do_something_else()
Indentation
• Tabs and spaces look the same in most editors.
• If your editor uses a different conversion rate between tabs and spaces than “standard”, your Python code may not parse properly.
• Three easy solutions:
1. Use only tabs or only spaces in a file: don’t mix them. (Python 3 rejects inconsistent mixing with a TabError.)
2. Use an editor that knows about Python.
3. Configure your editor to use the same tab/space rule as Python, vi, emacs, notepad, edit, etc.: 8 spaces per tab.
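A small sketch of the problem, assuming Python 3: two lines that look aligned in an editor with 8-column tabs, one indented with a tab and the other with eight spaces. The `source` string and the `result` name are made up for illustration; `compile()` is used only so the error can be caught without a separate file.

```python
# First indented line uses one tab, the second uses eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<demo>", "exec")
    result = "compiled"
except TabError as err:
    # Python 3 refuses indentation whose meaning depends on the tab width.
    result = "TabError"
    print("Caught:", err)
```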
Compared to PHP/JavaScript
• PHP and JavaScript are excellent for Web apps (PHP on the server, JavaScript on the client) but not much else.
• Python can be used for your Web apps, your complicated algorithms, your GUIs, your COM components, and as an extension language for Java programs.
• Even in Web apps, Python handles complexity better.
Compared to Java
• Java is more difficult for amateur programmers.
• Static type checking can be inconvenient and inflexible.
• Bottom line: Java can make projects harder than they need to be.
Python Limitations
• Not the fastest executing programming language:
– C/C++ is naturally fast
– Perl’s regular expressions and IO are a little faster
– Some Java implementations have good JITs
– But Python also has some speed advantages:
    • Fast implementations of built-in data structures
    • Pyrex compiles Python code to C
• Dynamic type checking requires more care in testing.
• Language changes (relatively) quickly: this is a strength and a weakness.
Objects All the Way Down
• Everything in Python is an object
• Integers are objects.
• Characters are objects.
• Complex numbers are objects.
• Booleans are objects.
• Functions are objects.
• Methods are objects.
• Modules are objects.
Object Type and Identity
• You can find out the type of any object:
>>> print(type(1))
<class 'int'>
>>> print(type(1.0))
<class 'float'>
• Every object also has a unique identifier (usually only for debugging purposes); the exact numbers vary from run to run:
>>> print(id(1))
7629640
>>> print(id("1"))
7910560
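A short sketch tying type and identity together, assuming Python 3. The variable names are illustrative. `is` compares identity (the same thing `id()` reports), while `==` compares value.

```python
# Everything has a type, including functions and None.
values = [1, 1.0, "1", True, None, len]
for v in values:
    print(type(v).__name__, id(v))

a = [1, 2]
b = a          # same object, two names
c = [1, 2]     # equal value, different object

same_identity = a is b
equal_but_distinct = (a == c) and (a is not c)
```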
None
• “None” represents the lack of a value.
• Like “NULL” in some languages or in databases.
• For instance:
>>> if y!=0:
...     fraction = x/y
... else:
...     fraction = None
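The same idea can be sketched as a function. The function name `fraction` and the messages are illustrative. Testing against None is conventionally done with `is`, since None is a single object.

```python
def fraction(x, y):
    """Return x/y, or None when y is zero."""
    if y != 0:
        return x / y
    return None

result = fraction(1, 0)
if result is None:          # compare to None with "is", not "=="
    print("no value")
else:
    print(result)
```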
File Objects
• Represent opened files:
>>> infile = open("catalog.txt", "r")
>>> data = infile.read()
>>> infile.close()
>>> outfile = open("catalog2.txt", "w")
>>> data = data + "more data"
>>> outfile.write(data)
>>> outfile.close()
• You may sometimes see the name “open” used to create files.
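A hedged variant of the file example using the `with` statement, which closes the file automatically even if an error occurs. To keep the sketch runnable anywhere, it first creates its own catalog.txt in a temporary directory; that setup step and the sample contents are assumptions, not part of the slide.

```python
import os
import tempfile

# Setup (illustrative): make a scratch directory with a small input file.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "catalog.txt")
out_path = os.path.join(workdir, "catalog2.txt")
with open(in_path, "w") as f:
    f.write("some data\n")

# Read the whole input file; it is closed when the block ends.
with open(in_path, "r") as infile:
    data = infile.read()

# Write the data plus a suffix to a second file.
with open(out_path, "w") as outfile:
    outfile.write(data + "more data")

# Read the result back to confirm what was written.
with open(out_path) as check:
    copied = check.read()
```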
Basic Flow Control
• if/elif/else (test condition)
• while (loop until condition changes)
• for (iterate over an iterable object)
if Statement
if j=="Hello":
    doSomething()
elif j=="World":
    doSomethingElse()
else:
    doTheRightThing()
while Statement
line = ""
while line != "quit":
    line = input()    # Python 3: input() replaces Python 2's raw_input()
    print(line)
print("Done")
for Statement
myList = ["a", "b", "c", "d", "e"]
for i in myList:
    print(i)

for i in range(10):
    print(i)

for i in range(len(myList)):
    if myList[i]=="c":
        myList[i]=None

• Can “break” out of for-loops.
• Can “continue” to next iteration.
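A brief sketch of `continue` and `break` inside a for-loop. It uses `enumerate`, a built-in that yields (index, item) pairs, as an alternative to `range(len(...))`. The list contents and the `seen` name are illustrative.

```python
my_list = ["a", "b", "c", "d", "e"]
seen = []

for index, item in enumerate(my_list):
    if item == "b":
        continue            # skip "b" and go on to the next item
    if item == "d":
        break               # stop the loop entirely at "d"
    seen.append((index, item))

print(seen)
```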
Python Modules
What is a Module?
- A file containing some Python code
OR
- A .dll (.so on Unix) containing compiled code which follows some guidelines
- A namespace
A Python Module
def hello_world():
    print("Hello world")

• Save this as “myModule.py”. Now we can use it:
>>> import myModule
>>> myModule.hello_world()
• Or:
>>> from myModule import hello_world
>>> hello_world()
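A self-contained sketch of both import styles. So that it runs without a pre-existing file, it writes myModule.py into a temporary directory and adds that directory to `sys.path`; that setup is an assumption for the demo, and the function here returns its string instead of printing it so the result can be inspected.

```python
import importlib
import os
import sys
import tempfile

# Setup (illustrative): create myModule.py in a scratch directory.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "myModule.py"), "w") as f:
    f.write('def hello_world():\n    return "Hello world"\n')

sys.path.insert(0, module_dir)      # let the import system find the file
importlib.invalidate_caches()       # the file was created after startup

import myModule                     # style 1: access names through the module
from myModule import hello_world    # style 2: bind the name directly

print(myModule.hello_world())
print(hello_world())
```

Both names refer to the same function object; the module is simply the namespace that holds it.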
Other Built-in Protocols
• FTP
• XML-RPC
• Telnet
• POP
• IMAP
• MIME
• NNTP
• HTTP
• SSL
• Sockets
• CGI
• Gopher
• URL Parsing
• Plus downloadable modules for every other protocol in the universe!
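As one concrete case from the list, URL parsing lives in the standard library module `urllib.parse` in Python 3. The sketch below splits a made-up URL (the address is hypothetical) into its parts.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL used only for illustration.
url = "http://www.example.com:8080/catalog/item?id=42&lang=en"

parts = urlparse(url)
query = parse_qs(parts.query)   # maps each parameter name to a list of values

print(parts.scheme, parts.hostname, parts.port, parts.path)
print(query)
```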
Regular Expressions
• Regular expressions are a powerful string manipulation tool
• All modern languages have similar library packages for regular expressions
• Use regular expressions to:
– Search a string (search and match)
– Replace parts of a string (sub)
– Break strings into smaller pieces (split)
Regular Expression Syntax
• Most characters match themselves
– The regular expression “test” matches the string ‘test’, and only that string
• [x] matches any one of a list of characters
– “[abc]” matches ‘a’, ‘b’, or ‘c’
• [^x] matches any one character that is not included in x
– “[^abc]” matches any single character except ‘a’, ‘b’, or ‘c’
Regular Expression Syntax
• “.” matches any single character (except a newline, by default)
• Parentheses can be used for grouping
– “(abc)+” matches ‘abc’, ‘abcabc’, ‘abcabcabc’, etc.
• x|y matches x or y
– “this|that” matches ‘this’ and ‘that’, but not ‘thisthat’.
Regular Expression Syntax
• x* matches zero or more x’s
– “a*” matches ‘’, ‘a’, ‘aa’, etc.
• x+ matches one or more x’s
– “a+” matches ‘a’, ‘aa’, ‘aaa’, etc.
• x? matches zero or one x’s
– “a?” matches ‘’ or ‘a’
• x{m,n} matches i x’s, where m ≤ i ≤ n
– “a{2,3}” matches ‘aa’ or ‘aaa’
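The quantifier, grouping, and alternation rules can be checked directly. This sketch uses `re.fullmatch` (available from Python 3.4) so that each pattern must account for the whole string; the helper name `matches` and the sample strings are illustrative.

```python
import re

def matches(pattern, text):
    """True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

cases = [
    (r"a*", ""),            # zero or more: empty string is fine
    (r"a+", ""),            # one or more: empty string is not
    (r"a?", "aa"),          # zero or one: two is too many
    (r"a{2,3}", "a"),       # too few
    (r"a{2,3}", "aaa"),     # within the range, bounds included
    (r"a{2,3}", "aaaa"),    # too many
    (r"(abc)+", "abcabc"),  # the group repeats as a unit
    (r"this|that", "that"),
    (r"this|that", "thisthat"),
]

results = [matches(pattern, text) for pattern, text in cases]
for (pattern, text), ok in zip(cases, results):
    print(repr(pattern), repr(text), ok)
```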
Regular Expression Syntax
• “\d” matches any digit; “\D” any non-digit
• “\s” matches any whitespace character; “\S” any non-whitespace character
• “\w” matches any alphanumeric character or underscore; “\W” any other character
• “^” matches the beginning of the string; “$” the end of the string
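A quick sketch of the shorthand classes and anchors in use. The sample strings and variable names are made up; patterns are written as raw strings (`r"..."`) so that backslashes reach the regex engine unchanged.

```python
import re

# Shorthand classes, checked against whole strings with re.fullmatch.
digits_ok = re.fullmatch(r"\d+", "330") is not None       # all digits
word_ok = re.fullmatch(r"\w+", "module_4") is not None    # letters, digits, "_"
space_ok = re.fullmatch(r"\s+", " \t\n") is not None      # space, tab, newline

# Upper-case forms are the complements.
non_digits = re.findall(r"\D+", "a1b22c")                 # runs of non-digits

# Anchors tie a pattern to the start or end of the string.
starts_with_cat = re.search(r"^cat", "catalog") is not None
ends_with_cat = re.search(r"cat$", "catalog") is not None
ends_with_log = re.search(r"log$", "catalog") is not None
```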
Debuggex Example
Search and Match in Python RegEx
• The two basic functions are re.search and re.match
– Search looks for a pattern anywhere in a string
– Match looks for a match starting at the beginning
• Both return None (logical false) if the pattern isn’t found and a “match object” instance if it is
>>> import re
>>> pat = "a*b"
>>> re.search(pat,"fooaaabcde")
<_sre.SRE_Match object; span=(3, 7), match='aaab'>
>>> re.match(pat,"fooaaabcde")
>>>
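Because a failed search returns None and a successful one returns a match object, the result can be used directly in an `if`. A minimal sketch with the same pattern; the variable names are illustrative.

```python
import re

pat = "a*b"
text = "fooaaabcde"

found = re.search(pat, text)     # scans forward until the pattern fits
at_start = re.match(pat, text)   # only tries position 0, where "f" blocks it

if found:
    print("search found", found.group(), "at", found.span())
if at_start is None:
    print("match found nothing at the start")

# match succeeds once the string begins with the pattern.
trimmed = re.match(pat, "aaabcde")
```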
What’s a match object?
• An instance of the match class with the details of the match result
>>> r1 = re.search("a*b","fooaaabcde")
>>> r1.group() # group returns string matched
'aaab'
>>> r1.start() # index of the match start
3
>>> r1.end() # index just past the match end
7
>>> r1.span() # tuple of (start, end)
(3, 7)
What got matched?
• Here’s a pattern to match simple email addresses
\w+@(\w+\.)+(com|org|net|edu)
>>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
>>> r1 = re.match(pat1,"todd@arl.wustl.edu")
>>> r1.group()
'todd@arl.wustl.edu'
• We might want to extract the pattern parts, like the email name and host
What got matched?
• We can put parentheses around groups we want to be able to reference
>>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
>>> r2 = re.match(pat2,"todd@arl.wustl.edu")
>>> r2.group(1)
'todd'
>>> r2.group(2)
'arl.wustl.edu'
>>> r2.groups()
('todd', 'arl.wustl.edu', 'wustl.', 'edu')
• Note that the ‘groups’ are numbered in a preorder traversal, that is, by the position of each opening parenthesis
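A sketch of how the numbering works out on a different, made-up address (user@mail.example.edu is hypothetical). Group 0 is always the whole match, and a group that repeats keeps only its last repetition.

```python
import re

pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
#       1      23        4      <- numbered by opening parenthesis

m = re.match(pat2, "user@mail.example.edu")

whole = m.group(0)      # the entire match
name = m.group(1)       # text before the "@"
host = m.group(2)       # everything after the "@"
last_label = m.group(3) # (\w+\.) repeated twice; only the last one is kept
suffix = m.group(4)

print(m.groups())
```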
What got matched?
• We can ‘label’ the groups as well…
>>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
>>> r3 = re.match(pat3,"todd@arl.wustl.edu")
>>> r3.group('name')
'todd'
>>> r3.group('host')
'arl.wustl.edu'
• And reference the matching parts by the labels
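Named groups can also be collected in one step with the match object's `groupdict()` method, and they remain reachable by number. A minimal sketch on a hypothetical address:

```python
import re

pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
m = re.match(pat3, "user@mail.example.edu")   # hypothetical address

labels = m.groupdict()                  # only the named groups, as a dict
by_number = m.group(1)                  # named groups still get numbers
by_name = m.group("name")

print(labels)
```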
More re functions
• re.split() is like split but can use patterns
>>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
• re.sub() replaces every match of a pattern with a string
>>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
'black socks and black shoes'
• re.findall() finds all matches
>>> re.findall(r"\d+","12 dogs,11 cats, 1 egg")
['12', '11', '1']
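A few variations on these functions, sketched with made-up inputs: limiting the number of replacements or splits, reusing captured groups in the replacement text, and how `re.findall` returns tuples when the pattern has more than one group.

```python
import re

# count limits how many matches are replaced.
first_only = re.sub("(blue|white|red)", "black",
                    "blue socks and red shoes", count=1)

# \1 and \2 in the replacement refer back to the captured groups.
swapped = re.sub(r"(\d+)-(\d+)", r"\2-\1", "10-20")

# maxsplit limits how many splits are made.
head_and_rest = re.split(r"\W+", "a, b, c", maxsplit=1)

# With several groups, findall returns one tuple per match.
pairs = re.findall(r"(\d+) (\w+)", "12 dogs,11 cats, 1 egg")
```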
Compiling regular expressions
• If you plan to use a re pattern more than once, compile it to a re object
• Python produces a special data structure that speeds up matching
>>> cpat3 = re.compile(pat3)
>>> cpat3
re.compile('(?P<name>\\w+)@(?P<host>(\\w+\\.)+(com|org|net|edu))')
>>> r3 = cpat3.search("todd@arl.wustl.edu")
>>> r3
<_sre.SRE_Match object; span=(0, 18), match='todd@arl.wustl.edu'>
>>> r3.group()
'todd@arl.wustl.edu'
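A sketch of the typical reason to compile: the same pattern applied to many strings in a loop. The sample lines and addresses are hypothetical, and `email_pat` is an illustrative name.

```python
import re

# Compile once, outside the loop.
email_pat = re.compile(r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))")

lines = [
    "contact: alice@cs.example.edu",
    "no address here",
    "bob@example.org wrote back",
]

hosts = []
for line in lines:
    m = email_pat.search(line)      # same methods as the re module functions
    if m:
        hosts.append(m.group("host"))

print(hosts)
```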