Extensible Networking Platform - CSE 330 – Creative Programming and Rapid Prototyping
Module 4 – Python and Regular Expressions
• Module 4 contains only an individual assignment
• Due Monday July 6th
• Do not wait until the last minute to start on this module
• Read the WIKI before starting along with a few Python tutorials
• Portions of today’s slides came from:
  – Marc Conrad, University of Luton
  – Paul Prescod, Vancouver Python Users’ Group
  – James Casey, Opscode
  – Tim Finin, University of Maryland
What is Python?
• Python is an easy to learn, powerful programming language
  – Efficient high-level data structures
  – Simple approach to object-oriented programming
  – Elegant syntax and dynamic typing
  – Up-and-coming language in the open source world
• We are using Python version 3.4 or later in this course
Usability Features
• Very clear syntax
• Obvious way to do most things
• Huge amount of free code and libraries
• Interactive
• Only innovative where innovation is really necessary
  – Better to steal a good idea than invent a bad one!
Python “Hello world”
    print("Hello, World")
Python Interpreter
• Just type:
    Todds-MacBook-Air:~ todd$ python3
    Python 3.6.1 (default, Apr 4 2017, 09:40:21)
    [GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
Features of the Interpreter
• Lines start with “>>>”. You can recognize Python interpreter transcripts anywhere you see them.
• Expressions that return a value display the value.
    >>> 5+3*4
    17
• This saves you from excessive “print”ing
Interactive Interpreters
• Windows command line
• OS X
• Linux/Unix
• Graphical command lines: “IDLE”, “PythonWin”, “MacPython”, …
• Jython
• And many more…
Python scripts
• Sometimes you want to run the same program more than once!
• Make a file with Python statements in it:
    foo.py:
        print("hello world")

    todd$ python3 foo.py
    hello world
    todd$ python3 foo.py
    hello world
Python is dynamically typed
    width = 20
    print(width)
    height = 5 * 9
    print(height)
    print(width * height)
    width = "really wide"
    print(width)
Experiment in the Interpreter
• Any Python variable can hold any value.
    >>> width = 20
    >>> height = 5 * 9
    >>> print(width * height)
    900
    >>> width = "really wide"
    >>> print(width)
    really wide
Dynamic Type Checking
test_sqrt.py:

    import math

    def square_root(num):
        return math.sqrt(num)

    def goodfunc():
        print(square_root(10))

    def badfunc():
        # Raises TypeError, but only when this line actually runs
        print(square_root("10"))

    goodfunc()
    badfunc()
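Because the type error in badfunc() only appears at run time, code that accepts untrusted values often guards the call. The following is a minimal sketch that reuses square_root from the slide; the helper name try_square_root is hypothetical and not part of the course material.

```python
import math

def square_root(num):
    return math.sqrt(num)

def try_square_root(value):
    """Return the square root of value, or None if value is not a number."""
    try:
        return square_root(value)
    except TypeError:
        # math.sqrt raises TypeError when given a str such as "10"
        return None

print(try_square_root(16))    # 4.0
print(try_square_root("10"))  # None
```

Nothing is checked until the call executes, so a test suite that never exercises the bad path would not notice the problem.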
Multiple statements on a line
• You can combine multiple simple statements on a line:
>>> a = 5;print (a); a = 6; print (a)
5
6
Indentation
• Python uses indentation for scoping:
    if this_function(that_variable):
        do_something()
    else:
        do_something_else()
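To make the indentation rule concrete, here is a small runnable sketch. The function name describe and its argument are hypothetical, chosen only for illustration. The indentation level alone decides which statements belong to each branch and which run afterwards.

```python
def describe(n):
    if n < 0:
        label = "negative"      # belongs to the if-block
    else:
        label = "non-negative"  # belongs to the else-block
    # Dedented back to function level: runs whichever branch was taken
    return label

print(describe(-3))  # negative
print(describe(7))   # non-negative
```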
• Tabs and spaces look the same in most editors.
• If your editor uses a different conversion rate between tabs and spaces than “standard”, your Python code may not parse properly.
• Three easy solutions:
  1. Only use tabs or spaces in a file: don’t mix them.
  2. Use an editor that knows about Python.
  3. Configure editor to use the same tab/space rules as Python, vi, emacs, notepad, edit, etc.: 8 spaces per tab
Compared to PHP/Javascript
• PHP and JavaScript are excellent for Web apps (PHP on the server, JavaScript on the client) but not much else.
• Python can be used for your Web apps, your complicated algorithms, your GUIs, your COM components, an extension language for Java programs
• Even in Web apps, Python handles complexity better.
Compared to Java
• Java is more difficult for amateur programmers.
• Static type checking can be inconvenient and inflexible.
• Bottom line: Java can make projects harder than they need to be.
Python Limitations
• Not the fastest executing programming language:
  – C/C++ is naturally fast
  – Perl’s regular expressions and IO are a little faster
  – Some Java implementations have good JITs
  – But Python also has some speed advantages:
    • Fast implementations of built-in data structures
    • Pyrex compiles Python code to C
• Dynamic type checking requires more care in testing.
• Language changes (relatively) quickly: this is a strength and a weakness.
Objects All the Way Down
• Everything in Python is an object
• Integers are objects.
• Characters are objects.
• Complex numbers are objects.
• Booleans are objects.
• Functions are objects.
• Methods are objects.
• Modules are objects.
Object Type and Identity
• You can find out the type of any object:
    >>> print(type(1))
    <class 'int'>
    >>> print(type(1.0))
    <class 'float'>
• Every object also has a unique identifier (usually only for debugging purposes)
    >>> print(id(1))
    7629640
    >>> print(id("1"))
    7910560
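The difference between type, identity, and equality is easier to see with two names bound to one object. This is a small sketch using only built-ins; the variable names are arbitrary.

```python
a = [1, 2, 3]
b = a          # a second name for the same list object
c = list(a)    # a new list with equal contents

print(type(a))             # <class 'list'>
print(id(a) == id(b))      # True: same object
print(a is c)              # False: different objects
print(a == c)              # True: equal contents
print(isinstance(1, int))  # True
```

The `is` operator compares identity (what id() reports), while `==` compares values.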
None
• “None” represents the lack of a value.
• Like “NULL” in some languages or in databases.
• For instance:
    >>> if y!=0:
    ...     fraction = x/y
    ... else:
    ...     fraction = None
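The same idea can be wrapped in a function so callers can test for the missing value. This is a sketch only; the name safe_fraction is hypothetical. Comparing with `is None` is the usual idiom, since there is exactly one None object.

```python
def safe_fraction(x, y):
    """Return x / y, or None when the division is undefined."""
    if y != 0:
        return x / y
    return None

result = safe_fraction(1, 0)
if result is None:
    print("undefined")
else:
    print(result)
```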
File Objects
• Represent opened files:
    >>> infile = open("catalog.txt", "r")
    >>> data = infile.read()
    >>> infile.close()
    >>> outfile = open("catalog2.txt", "w")
    >>> data = data + "more data"
    >>> outfile.write(data)
    >>> outfile.close()
• You may sometimes see the name “open” used to create files.
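In current Python code, files are usually opened in a `with` block, which closes the file automatically even if an error occurs. The sketch below is an illustration rather than part of the original slides; it writes to a temporary directory so it does not touch real files, and the file name is borrowed from the example above.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "catalog.txt")

    with open(path, "w") as outfile:   # "w" creates or truncates the file
        outfile.write("first line\n")

    with open(path, "a") as outfile:   # "a" appends to the existing file
        outfile.write("more data\n")

    with open(path, "r") as infile:    # closed automatically after the block
        data = infile.read()

print(data)
```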
Basic Flow Control
• if/elif/else (test condition)
• while (loop until condition changes)
• for (iterate over an iterable object)
if Statement
    if j == "Hello":
        doSomething()
    elif j == "World":
        doSomethingElse()
    else:
        doTheRightThing()
while Statement
    line = ""
    while line != "quit":
        line = input()   # input() replaces Python 2's raw_input()
        print(line)
    print("Done")
for Statement
    myList = ["a", "b", "c", "d", "e"]
    for i in myList:
        print(i)

    for i in range(10):
        print(i)

    for i in range(len(myList)):
        if myList[i] == "c":
            myList[i] = None

• Can “break” out of for-loops.
• Can “continue” to next iteration.
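The index-based loop can also be written with the built-in enumerate(), and break and continue are easiest to understand in a small loop. This sketch uses illustrative variable names (my_list, kept) that are not from the slides.

```python
my_list = ["a", "b", "c", "d", "e"]

# enumerate() yields (index, item) pairs, avoiding range(len(...))
for index, item in enumerate(my_list):
    if item == "c":
        my_list[index] = None

kept = []
for item in my_list:
    if item is None:
        continue      # skip this element, go on to the next one
    if item == "e":
        break         # leave the loop entirely
    kept.append(item)

print(my_list)  # ['a', 'b', None, 'd', 'e']
print(kept)     # ['a', 'b', 'd']
```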
Python Modules
What is a Module?
- A file containing some Python code
OR
- A .dll (.so on Unix) containing compiled code which follows some guidelines
- A namespace
A Python Module
    def hello_world():
        print("Hello world")

• Save this as “myModule.py”. Now we can use it:
    >>> import myModule
    >>> myModule.hello_world()
• Or:
    >>> from myModule import hello_world
    >>> hello_world()
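The two import forms behave the same way with standard-library modules. The sketch below uses the math module (instead of the myModule.py file above, so it runs without creating a file) to show that both forms bind names to the same underlying function.

```python
import math             # binds the module object to the name "math"
from math import sqrt   # binds one function directly into this namespace

print(math.sqrt(16))      # 4.0
print(sqrt(16))           # 4.0
print(type(math))         # <class 'module'>
print(sqrt is math.sqrt)  # True: two names, one function object
```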
Other Built-in Protocols
• FTP
• XML-RPC
• Telnet
• POP
• IMAP
• MIME
• NNTP
• HTTP
• SSL
• Sockets
• CGI
• Gopher
• URL Parsing
• Plus downloadable modules for every other protocol in the universe!
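As one illustration of the built-in support, URL parsing lives in the standard urllib.parse module in Python 3. The URL below is made up for the example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL, used only to show the parsed components
parts = urlparse("http://www.example.com:8080/catalog/item?id=42&color=red")

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /catalog/item

query = parse_qs(parts.query)
print(query)           # {'id': ['42'], 'color': ['red']}
```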
Regular Expressions
• Regular expressions are a powerful string manipulation tool
• All modern languages have similar library packages for regular expressions
• Use regular expressions to:
  – Search a string (search and match)
  – Replace parts of a string (sub)
  – Break strings into smaller pieces (split)
Regular Expression Syntax
• Most characters match themselves
    The regular expression "test" matches the string 'test', and only that string
• [x] matches any one of a list of characters
    "[abc]" matches 'a', 'b', or 'c'
• [^x] matches any one character that is not included in x
    "[^abc]" matches any single character except 'a', 'b', or 'c'
• "." matches any single character (except a newline, by default)
• Parentheses can be used for grouping
    "(abc)+" matches 'abc', 'abcabc', 'abcabcabc', etc.
• x|y matches x or y
    "this|that" matches 'this' and 'that', but not 'thisthat'.
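These rules can be checked directly in Python. The sketch below defines a small helper, matches (a hypothetical name), on top of re.fullmatch, which succeeds only when the whole string fits the pattern.

```python
import re

def matches(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches("[abc]", "b"))             # True
print(matches("[^abc]", "b"))            # False
print(matches("[^abc]", "z"))            # True
print(matches(".", "x"))                 # True
print(matches("(abc)+", "abcabc"))       # True
print(matches("this|that", "that"))      # True
print(matches("this|that", "thisthat"))  # False
```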
• x* matches zero or more x’s
    "a*" matches '', 'a', 'aa', etc.
• x+ matches one or more x’s
    "a+" matches 'a', 'aa', 'aaa', etc.
• x? matches zero or one x’s
    "a?" matches '' or 'a'
• x{m,n} matches i x’s, where m ≤ i ≤ n
    "a{2,3}" matches 'aa' or 'aaa'
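The quantifiers can be exercised the same way. This sketch again uses a hypothetical helper, matches, built on re.fullmatch, and adds one re.search call to show that quantifiers are greedy: they take as many repetitions as the pattern allows.

```python
import re

def matches(pattern, text):
    """True if the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(matches("a*", ""))          # True: zero repetitions are allowed
print(matches("a+", ""))          # False: at least one 'a' is required
print(matches("a?", "aa"))        # False: at most one 'a'
print(matches("a{2,3}", "a"))     # False
print(matches("a{2,3}", "aaa"))   # True
print(matches("a{2,3}", "aaaa"))  # False

# Greedy matching: takes three a's even though two would satisfy the pattern
print(re.search("a{2,3}", "aaaa").group())  # aaa
```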
• "\d" matches any digit; "\D" any non-digit
• "\s" matches any whitespace character; "\S" any non-whitespace character
• "\w" matches any alphanumeric character or underscore; "\W" any other character
• "^" matches the beginning of the string; "$" the end of the string
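A short sketch of the character classes and anchors in use. The sample strings are invented for illustration, and the patterns are written as raw strings (r"...") so that Python passes the backslashes through to the re module unchanged.

```python
import re

print(re.findall(r"\d", "room 42b"))         # ['4', '2']
print(re.findall(r"\w+", "hi_there, you!"))  # ['hi_there', 'you']
print(re.split(r"\s+", "a  b\tc"))           # ['a', 'b', 'c']

# ^ and $ anchor the pattern to the whole string
print(bool(re.search(r"^\d+$", "12345")))    # True
print(bool(re.search(r"^\d+$", "123a5")))    # False
```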
Debuggex Example
Search and Match in Python RegEx
• The two basic functions are re.search and re.match
  – Search looks for a pattern anywhere in a string
  – Match looks for a match starting at the beginning
• Both return None (logical false) if the pattern isn’t found and a “match object” instance if it is
    >>> import re
    >>> pat = "a*b"
    >>> re.search(pat,"fooaaabcde")
    <_sre.SRE_Match object at 0x809c0>
    >>> re.match(pat,"fooaaabcde")
    >>>
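Because a failed match returns None, the result is normally tested before it is used. A minimal sketch with the same pattern and string as above; the variable names found and anchored are illustrative.

```python
import re

pat = "a*b"
text = "fooaaabcde"

found = re.search(pat, text)       # scans the whole string
if found:
    print("search found", found.group(), "at", found.span())

anchored = re.match(pat, text)     # only tries at position 0
if anchored is None:
    print("match found nothing at the start")

# match succeeds once the pattern fits at the beginning of the string
print(re.match(pat, "aaabcde").group())  # aaab
```

Calling .group() on a None result raises AttributeError, which is why the check comes first.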
What’s a match object?
• An instance of the match class with the details of the match result
    >>> r1 = re.search("a*b","fooaaabcde")
    >>> r1.group()   # group returns string matched
    'aaab'
    >>> r1.start()   # index of the match start
    3
    >>> r1.end()     # index just past the match end
    7
    >>> r1.span()    # tuple of (start, end)
    (3, 7)
What got matched?
• Here’s a pattern to match simple email addresses
    \w+@(\w+\.)+(com|org|net|edu)

    >>> pat1 = r"\w+@(\w+\.)+(com|org|net|edu)"
    >>> r1 = re.match(pat1,"todd@arl.wustl.edu")
    >>> r1.group()
    'todd@arl.wustl.edu'
• We might want to extract the pattern parts, like the email name and host
• We can put parentheses around groups we want to be able to reference
    >>> pat2 = r"(\w+)@((\w+\.)+(com|org|net|edu))"
    >>> r2 = re.match(pat2,"todd@arl.wustl.edu")
    >>> r2.group(1)
    'todd'
    >>> r2.group(2)
    'arl.wustl.edu'
    >>> r2.groups()
    ('todd', 'arl.wustl.edu', 'wustl.', 'edu')
• Note that the ‘groups’ are numbered in a preorder traversal, that is, in the order of their opening parentheses
• We can ‘label’ the groups as well…
    >>> pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"
    >>> r3 = re.match(pat3,"todd@arl.wustl.edu")
    >>> r3.group('name')
    'todd'
    >>> r3.group('host')
    'arl.wustl.edu'
• And reference the matching parts by the labels
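Named groups can also be collected all at once with the match object's groupdict() method. In this sketch the address is an assumption, inferred from the group values shown on the slides.

```python
import re

pat3 = r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))"

# Assumed sample address, inferred from the group outputs on the slides
r3 = re.match(pat3, "todd@arl.wustl.edu")

parts = r3.groupdict()   # only the named groups appear in the dict
print(parts)             # {'name': 'todd', 'host': 'arl.wustl.edu'}

# Named groups are still numbered, so both forms work
print(r3.group(1) == r3.group("name"))  # True
```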
More re functions
• re.split() is like split but can use patterns
    >>> re.split(r"\W+", "This... is a test, short and sweet, of split().")
    ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', '']
• re.sub() replaces every match of a pattern with a string
    >>> re.sub('(blue|white|red)', 'black', 'blue socks and red shoes')
    'black socks and black shoes'
• re.findall() finds all matches
    >>> re.findall(r"\d+","12 dogs,11 cats, 1 egg")
    ['12', '11', '1']
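These functions take a few optional arguments that are often useful. The sketch below shows backreferences (\1, \2) in a replacement string, the count argument of re.sub, and the maxsplit argument of re.split. The sample strings are invented for illustration.

```python
import re

# \1 and \2 in the replacement refer to the captured groups
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "todd@wustl")
print(swapped)  # wustl at todd

# count limits how many matches are replaced
once = re.sub("(blue|white|red)", "black", "blue socks and red shoes", count=1)
print(once)     # black socks and red shoes

# maxsplit limits how many splits are made
head_tail = re.split(r"\W+", "This... is a test", maxsplit=1)
print(head_tail)  # ['This', 'is a test']

# findall returns strings; convert them when numbers are needed
numbers = [int(n) for n in re.findall(r"\d+", "12 dogs,11 cats, 1 egg")]
print(numbers)  # [12, 11, 1]
```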
Compiling regular expressions
• If you plan to use a re pattern more than once, compile it to a re object
• Python produces a special data structure that speeds up matching
    >>> cpat3 = re.compile(pat3)
    >>> cpat3
    <_sre.SRE_Pattern object at 0x2d9c0>
    >>> r3 = cpat3.search("todd@arl.wustl.edu")
    >>> r3
    <_sre.SRE_Match object at 0x895a0>
    >>> r3.group()
    'todd@arl.wustl.edu'
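A compiled pattern is typically created once and reused inside a loop. This is a minimal sketch under stated assumptions: the name EMAIL and the sample lines are hypothetical, and the pattern is the labeled email pattern from the earlier slides.

```python
import re

# Compile once, reuse for every line
EMAIL = re.compile(r"(?P<name>\w+)@(?P<host>(\w+\.)+(com|org|net|edu))")

# Made-up input lines for illustration
lines = [
    "contact: todd@arl.wustl.edu",
    "no address here",
    "admin@example.org wrote",
]

hosts = []
for line in lines:
    m = EMAIL.search(line)   # same method names as the module-level functions
    if m:
        hosts.append(m.group("host"))

print(hosts)  # ['arl.wustl.edu', 'example.org']
```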