Posted on 20-May-2020
Introduction to Python for Biologists
Katerina Taskova (1), Jean-Fred Fontaine (1, 2)
(1) Faculty of Biology, Johannes Gutenberg-Universität Mainz, Mainz, Germany
(2) Genomics and Computational Biology, Kernel Press, Mainz, Germany
https://cbdm.uni-mainz.de/mb17
March 21, 2017
Table of Contents
- Introduction
- Running code
- Literals and variables
- Numeric types
- Strings
- Exercise
- Lists, tuples and ranges
- Sets and dictionaries
- Convert and copy
- Loops
- Exercise
- Functions
- Branching
- Exercise
- Regular Expressions
- Exercise
- Annexes
March 21, 2017 Johannes Gutenberg-Universitat Mainz Taskova & Fontaine 2
Introduction to Python for Biologists – Introduction
What is Python?
- Python is a general-purpose programming language
  - created by Guido van Rossum (1991)
  - high-level (abstracted from the details of the computer)
  - interpreted (needs an interpreter program to run)
- Python design philosophy
  - code readability
  - syntax brevity
- Python is widely used in biology
  - rich built-in features
  - powerful scientific extensions
  - plotting capabilities
Structured programming I
- Instructions are executed sequentially, one per line
- Conditional statements allow selective execution of code blocks
- Loops allow repeated execution of code blocks
- Functions allow on-demand execution of code blocks
Structured programming II

    n = 15                       # 1st instruction (a hashtag # starts a comment)

    for i in range(20):          # 2nd instruction (a loop starts a block)
        n = n + 1                # block defined by indentation (spaces or tabs, never mixed)
        doubled = n * 2          # 2nd instruction in block

    if n > 10:                   # 3rd instruction (conditional statement)
        print('n is large')      # 1st instruction in block
        print(n)                 # 2nd instruction in block


    # a backslash joins lines (nothing, not even a comment, may follow it)
    total = 1 + 2 + \
            3                    # 4th instruction, parts 1 and 2

    # expressions in (), {}, or [] can span multiple lines
    numbers = (1, 2, 3,          # 5th instruction, part 1
               4, 5, 6)          # 5th instruction, part 2
Namespace
- Variables are names associated with data
  - e.g. a=2 assigns the value 2 to the variable a
- Functions are names associated with specific code blocks
  - built-in functions are available (see the list on slide 100)
  - e.g. print(a) will display '2' on the screen
- The user namespace is the set of names available to the user
  - users can define new variable and function names in their namespace
  - imported modules can add variable and function names to the user namespace
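A minimal sketch of the namespace idea, assuming the code runs at the top level of a script or notebook, where globals() returns the user namespace: assigning a variable or importing a module adds a name, while the functions inside a module stay in that module's own namespace.

```python
import math

a = 2                         # adds the name 'a' to the user namespace
print('a' in globals())       # True
print('math' in globals())    # True: the import added the module name
print('log2' in globals())    # False: log2 lives in the math namespace
print('log2' in dir(math))    # True: dir() lists the names in an object's namespace
```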
Object-oriented programming
- Data is organized in classes and objects
  - a class is a template defining what objects can store and do
  - an object is an instance of a class
  - objects have attributes to store data and methods to perform actions
  - object namespaces are separate from the user namespace
- Example: a class "Human" is defined as:
  - has a name (an attribute "name")
  - has an age (an attribute "age")
  - can introduce itself (a method "who")
  - example with 1 existing Human object P1:

    P1.name = "Mary"    # assigns a value to the attribute name
    P1.age = 26         # assigns a value to the attribute age
    P1.who()            # displays "My name is Mary I am 26!"
    who()               # error! who is not in the user namespace (NameError)
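The slide assumes that a Human object already exists. A minimal sketch of how such a class could be written is below; the constructor arguments and the exact wording produced by who() are assumptions chosen to match the example output, and who() returns the text instead of printing it so the result can be reused.

```python
class Human:
    """A person with a name and an age (illustrative class)."""

    def __init__(self, name, age):
        self.name = name   # attribute "name"
        self.age = age     # attribute "age"

    def who(self):
        # method "who": builds the introduction from the object's own attributes
        return "My name is " + self.name + " I am " + str(self.age) + "!"


P1 = Human("Mary", 26)   # create (instantiate) a Human object
print(P1.who())          # My name is Mary I am 26!
P1.age = 27              # attributes can be changed after creation
print(P1.who())          # My name is Mary I am 27!
```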
Modules
- Modules can add functionalities to Python
  - e.g. classes and functions
- Examples of available modules:
  - NumPy for scientific computing
  - Matplotlib for plotting
  - Biopython for biology
- Modules have to be imported into the code

    # import the datetime module in its own namespace
    import datetime
    datetime.date.today()   # e.g. 2017-03-16 (the current date)
    today()                 # error! (NameError: today is not in the user namespace)

    # import the functions log2 and log10 from the module math
    # into the current namespace
    from math import log2, log10
    log10(1)                # 0.0
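A short sketch contrasting the import styles above, plus the `import ... as` alias form that is commonly used for long module names; only standard-library modules are used, so it runs without extra installs.

```python
import math                  # whole module: names are reached as math.<name>
from math import log10       # one name copied into the current namespace
import datetime as dt        # module imported under a shorter alias

print(math.log2(8))                        # 3.0
print(log10(1000))                         # 3.0
print(dt.date(2017, 3, 21).isoformat())    # 2017-03-21
```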
Introduction to Python for Biologists – Running code
Running code I
- From a terminal by using the interactive Python shell

    $ python3       # opens the Python shell
    >>> a = 2       # assigns 2 to a
    >>> b = 3       # assigns 3 to b
    >>> exit()      # closes the Python shell

- From a terminal by running a script file
  - e.g. let's say myscript.py is a script file (a simple text file)
  - and it contains: print("hello world!")

    $ python3 myscript.py    # runs python3 with the script
    hello world!             # result of the script in the terminal
Running code II
- From Jupyter Notebook
  - web-based graphical interface
  - manage cells of code or text
  - see execution results in the same notebook
  - save/open notebooks
Documentation and messages I
Documentation and help:
- https://docs.python.org/3
- use the built-in help() function
  - e.g. help(print) displays help for the function print()
- see the Help menu or Google it

Examples of error messages (the exact wording varies between Python versions):

    # Forgetting quotes
    print(Hello world)
    #   File "<stdin>", line 2
    #     print(Hello world)
    #                     ^
    # SyntaxError: invalid syntax
Documentation and messages II
    # Spelling mistakes
    prin("Hello world")
    # Traceback (most recent call last):
    #   File "<stdin>", line 2, in <module>
    # NameError: name 'prin' is not defined

    # Wrong line break within a string
    print("Hello
    World")
    #   File "<stdin>", line 2
    #     print("Hello
    #           ^
    # SyntaxError: EOL while scanning string literal
    # (Python 3.10 and later report: unterminated string literal)
Introduction to Python for Biologists – Literals and variables
Numeric and string literals I

    # Numeric literals
    12
    -12
    1.6E3                        # means 1600.0

    # String literals
    'A string'                   # A string
    'A "string"'                 # A "string"
    "A 'string'"                 # A 'string'
    '''Three single quotes'''    # Three single quotes
    """Three double quotes"""    # Three double quotes
    'A \'string\''               # A 'string' (backslash escape sequence)
    r'A \'string\''              # A \'string\' (raw string)

Python stores literals in objects of the corresponding classes (class int for integers, float for floating-point numbers, and str for strings).
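The classes mentioned above can be checked with type(). This small sketch also shows how the escaped and raw forms of the same literal differ in length.

```python
print(type(12), type(-12), type(1.6E3))   # <class 'int'> <class 'int'> <class 'float'>
print(1.6E3 == 1600)                      # True: 1.6E3 is the float 1600.0

escaped = 'A \'string\''   # each \' becomes a single quote character
raw = r'A \'string\''      # in a raw string the backslashes are kept
print(len(escaped), len(raw))             # 10 12
```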
Numeric and string literals II
Printing numeric and string literals:

    print(12)                               # 12
    print(1+2)                              # 3

    print('Hello World')                    # Hello World

    print('Hello World', 1+2)               # Hello World 3
    print('Hello World', 1+2, sep='-')      # Hello World-3
    print('Hello World', 1+2, sep='\t')     # Hello World and 3 separated by a tab
    # (\t: tab, \n: newline)

    print('AB', end='')                     # AB (avoid the newline at the end)
    print('CD')                             # ABCD

    print('Max is', 12, 'and Min is', 3)    # Max is 12 and Min is 3
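print() also accepts a file argument. As a small sketch, writing into an in-memory text buffer (io.StringIO) makes the effect of sep and end visible as one string; the FASTA-style header and sequence are made-up example values.

```python
import io

out = io.StringIO()                              # an in-memory text "file"
print('>seq1', 'ACGTACGT', sep='\n', file=out)   # FASTA-style: header and sequence on 2 lines
print('gene', 'length', sep='\t', file=out)      # tab-separated columns
print('AB', end='', file=out)                    # no newline after AB
print('CD', file=out)                            # continues on the same line

text = out.getvalue()
print(repr(text))   # '>seq1\nACGTACGT\ngene\tlength\nABCD\n'
```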
Variables I
Variables are names used to access objects:
- the first character is a letter or an underscore (not a digit)
- no space characters allowed
- case-sensitive (variable name var is not Var)
- prefer alphanumeric characters (e.g. abc123)
- avoid accents, non-alphanumeric and non-English characters
- underscores may be used (e.g. abc_123)
The following Python 3 keywords cannot be used as variable names (print and exec were keywords only in Python 2):
- False, None, True, and, as, assert, async, await, break
- class, continue, def, del, elif, else, except, finally, for
- from, global, if, import, in, is, lambda, nonlocal, not
- or, pass, raise, return, try, while, with, yield
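Rather than memorising the list, the standard keyword module and the str.isidentifier() method can check a candidate name; a minimal sketch:

```python
import keyword

print(keyword.iskeyword('lambda'))   # True: cannot be used as a variable name
print(keyword.iskeyword('print'))    # False: print is a built-in function in Python 3
print('abc_123'.isidentifier())      # True: a valid variable name
print('123abc'.isidentifier())       # False: starts with a digit
print('abc 123'.isidentifier())      # False: contains a space
```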
Variables II
    # Numeric types
    a = 2        # a is assigned an int object of value 2
    print(a)     # prints the object assigned to a (2)
    b = a        # b is assigned the same object as a (2)
    print(b)     # 2
    a = 5        # a is assigned a new object of value 5
    print(a)     # 5
    print(b)     # 2 (b is still assigned to the object of value 2)

    # Strings
    c1 = 'a'
    print(c1)    # a (print does not display the quotes)
    myName125 = 'abc'
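The is operator tests whether two names refer to the same object, which makes the rebinding described above visible; a minimal sketch:

```python
a = 2
b = a            # b refers to the same object as a
print(b is a)    # True
a = 5            # a is rebound to a new object; b is unaffected
print(b is a)    # False
print(a, b)      # 5 2
```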
Introduction to Python for Biologists – Numeric types
Numeric types I

    type(7)          # <class 'int'> (integer number)
    type(8.25)       # <class 'float'> (floating point)
    type(4.52e-3)    # <class 'float'> (floating point)

    # Operators
    1 + 3            # 4 (addition)
    4 - 1            # 3 (subtraction)
    3 * 2            # 6 (multiplication)
    9 / 2            # 4.5 (division, always returns a float)
    9 // 2           # 4 (floor division: the result is rounded down)
    9 % 2            # 1 (remainder of the floor division)
    2**3             # 8 (exponentiation)

    # Lowest to highest operator precedence (equal if on the same line)
    # +, -             Addition, Subtraction
    # *, /, //, %      Multiplication, Divisions, Remainder
    # +x, -x           Positive, Negative
    # **               Exponentiation
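A few expressions where precedence and floor division are easy to get wrong, following Python 3 semantics:

```python
print(2 + 3 * 4)      # 14: * is evaluated before +
print((2 + 3) * 4)    # 20: parentheses first
print(-2 ** 2)        # -4: ** binds more tightly than the minus sign on its left
print((-2) ** 2)      # 4
print(-9 // 2)        # -5: // rounds down (towards minus infinity), not towards zero
print(-9 % 2)         # 1: so that (-9 // 2) * 2 + (-9 % 2) == -9
print(divmod(9, 2))   # (4, 1): quotient and remainder in one call
```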
Numeric types II

    # Built-in functions
    abs(-2.58)      # 2.58 (absolute value of x)
    round(2.5)      # 2 (round to the closest integer; exact ties go to the even integer)

    # With variables
    a = 1           # 1
    b = 1 + 1       # 2
    c = a + b       # 3
    d = a+c*b       # 7 (precedence of * over +)
    d = (a+c)*b     # 8 (use parentheses to override precedence)

    # Short notations (valid for +, -, *, /, ...)
    a += 1          # a = a + 1
    a *= 5          # a = a * 5

    # Special float values
    float('NaN')    # nan (Not a Number)
    float('Inf')    # inf: positive infinity; -inf: negative infinity
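Two behaviours of the values above that often surprise newcomers, shown in a minimal sketch: round() sends exact ties to the even integer, and NaN is never equal to itself, so math.isnan() is the reliable test.

```python
import math

print(round(2.5), round(3.5))   # 2 4: exact ties go to the even integer
print(round(3.14159, 2))        # 3.14: optional number of decimals

nan = float('NaN')
print(nan == nan)               # False: NaN is not equal to anything, not even itself
print(math.isnan(nan))          # True
print(float('Inf') > 1e308)     # True: infinity is larger than any finite float
```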
Introduction to Python for Biologists – Strings
Sequence types
Text sequence type:
- Strings: immutable sequences of characters
Basic sequence types:
- Lists: mutable sequences
- Tuples: immutable sequences
- Ranges: immutable sequences of numbers
Sequence operations:
- All sequence types support common sequence operations (slide 98)
- Mutable sequence types support specific operations (slide 99)
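As an illustration of the common sequence operations, the same expressions work unchanged on a string, a list, a tuple and a range; a minimal sketch (the full list of operations is on slide 98):

```python
examples = ['ACGT', ['A', 'C', 'G', 'T'], ('A', 'C', 'G', 'T'), range(10, 14)]

for s in examples:
    # len(), indexing, slicing and membership tests are shared by all sequence types
    print(type(s).__name__, len(s), s[1], s[1:3])

print('G' in 'ACGT', 12 in range(10, 14))   # True True
```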
Strings I

    # Quotes
    'A string'                   # A string
    'A "string"'                 # A "string"
    "A 'string'"                 # A 'string'
    '''Three single quotes'''    # Three single quotes
    """Three double quotes"""    # Three double quotes

    # Escape sequences (see annexes)
    "A single quote '"           # A single quote '
    'A single quote \''          # A single quote '
    "A tabulation \t"
    "A newline \n"

- See other escape sequences on slide 97
- Triple-quoted strings may span multiple lines; all associated whitespace is included in the string literal
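A minimal sketch of the last point: the line breaks and leading spaces inside a triple-quoted string become part of its value, and each escape sequence counts as a single character.

```python
record = """>seq1
ACGT
    TTGA"""                    # the 4 spaces before TTGA are part of the string

print(repr(record))            # '>seq1\nACGT\n    TTGA'
print(record.splitlines())     # ['>seq1', 'ACGT', '    TTGA']
print(len("A\tB\n"))           # 4: \t and \n are one character each
```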
Strings II
    # Operators
    'pipe' + 'tte'       # = 'pipette' (concatenation)
    'A'*7                # = 'AAAAAAA' (replication)
    'A'*3 + 'C'*2        # = 'AAACC'
    'A' + str(2.0)       # = 'A2.0' (convert the number, then concatenate)

    # Built-in functions
    len('A string of characters')   # 22 (length in characters)
    type('a')            # <class 'str'> (string)

    # Slices [start:end:step] (0 is the index of the first character)
    "ABCDEFG"[2:5]       # 'CDE' (F at index 5 excluded)
    "ABCDEFG"[:5]        # 'ABCDE' (from the beginning)
    "ABCDEFG"[5:]        # 'FG' (to the end)
    "ABCDEFG"[-2:]       # 'FG' (from 2 before the end to the end)
    "ABCDEFG"[0:5:2]     # 'ACE' (every second letter with step=2)
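A few more slice forms, as a short sketch: a negative step reverses the string, and slice bounds beyond the end are clipped instead of raising an error.

```python
s = "ABCDEFG"
print(s[::-1])     # GFEDCBA: a negative step walks backwards
print(s[-3:-1])    # EF
print(s[1::3])     # BE: start at index 1, take every 3rd character
print(s[5:100])    # FG: bounds beyond the end are clipped, no error
# by contrast, s[100] on its own would raise an IndexError
```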
String methods I
Strings are immutable: methods that change a string return a new object.

    seq = "ACGtCCAgTnAGaaGT"

    # Case
    seq.capitalize()           # 'Acgtccagtnagaagt'
    seq.casefold()             # 'acgtccagtnagaagt' (eszett ß => "ss")
    seq.lower()                # 'acgtccagtnagaagt' (eszett ß => ß)
    seq.swapcase()             # 'acgTccaGtNagAAgt'
    seq.upper()                # 'ACGTCCAGTNAGAAGT'

    # Search and replace
    seq.count('a')             # 2 (case sensitive)
    seq.count('G', 0, 4)       # 1 (slice start and end indexes)
    seq.endswith('GT')         # True
    seq.endswith('G', 0, 4)    # False (slice start and end indexes)
    seq.find('GtC')            # 2 (index of the 1st hit, -1 otherwise)
    seq.replace("aa", "tt")    # 'ACGtCCAgTnAGttGT' (case sensitive)
    seq.replace("A", "x", 2)   # 'xCGtCCxgTnAGaaGT' (first 2 hits only)
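Because count() is case sensitive, a mixed-case sequence like seq is best normalised first. A minimal GC-content sketch; counting N in the length is an assumption here, and other conventions exclude ambiguous bases from the denominator.

```python
seq = "ACGtCCAgTnAGaaGT"

upper_seq = seq.upper()                            # normalise case before counting
gc = upper_seq.count('G') + upper_seq.count('C')
gc_fraction = gc / len(upper_seq)                  # N is included in the length here
print(gc, len(upper_seq), gc_fraction)             # 7 16 0.4375
```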
String methods II

    seq = "ACGtCCAgTnAGaaGT"

    # "is" methods
    seq.isalnum()     # True (Are all characters alphanumeric?)
    seq.isalpha()     # True (Are all characters alphabetic?)
    seq.islower()     # False (Are all cased characters lowercase?)
    seq.isnumeric()   # False (Are all characters numeric?)
    seq.isspace()     # False (Are all characters whitespace?)
    seq.isupper()     # False (Are all cased characters uppercase?)

    # Join and split
    "-".join(["A", "B"])    # 'A-B'
    "-".join(seq)           # 'A-C-G-t-C-C-A-g-T-n-A-G-a-a-G-T'
    seq.partition("aa")     # ('ACGtCCAgTnAG', 'aa', 'GT'): a tuple
    seq.split("aa")         # ['ACGtCCAgTnAG', 'GT']: a list
    '1\n2'.splitlines()     # ['1', '2'] (split at line boundaries such as \r and \n)
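A typical use of partition() and split() is taking apart a FASTA header line. The header below is a made-up example, not a standard format for the description fields.

```python
header = ">seq1 organism=unknown length=16"   # hypothetical FASTA header

# drop the leading '>' and cut at the first space only
seq_id, _, description = header[1:].partition(" ")
print(seq_id)                   # seq1
print(description.split(" "))   # ['organism=unknown', 'length=16']
```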
String methods III

    seq = "ACGtCCAgTnAGaaGT"

    # Deleting
    seq.lstrip()        # remove leading whitespace characters
    seq.rstrip()        # remove trailing whitespace characters
    seq.strip()         # remove whitespace characters from both ends

    seq.lstrip("AC")    # 'GtCCAgTnAGaaGT' (remove leading C's or A's)
    seq.lstrip("CA")    # 'GtCCAgTnAGaaGT' (the order of the characters does not matter)
    seq.lstrip("C")     # 'ACGtCCAgTnAGaaGT' (no impact: seq starts with A)
    # same for rstrip but from the right, and strip from both ends

    # Simple parsing of text lines from CSV files
    line = 'gene1,chr1,100\n'      # example line read from a CSV file
    line.strip().split(',')        # ['gene1', 'chr1', '100'] (remove the newline, then split; use '\t' for TSV)

    # translate (case sensitive)
    table = str.maketrans('atcg', 'tagc')   # map characters by index
    seq.lower().translate(table)            # 'tgcaggtcantcttca'
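Combining maketrans()/translate() with a reversing slice gives a reverse complement. This is a sketch for plain DNA strings (A, C, G, T and N in either case); other IUPAC ambiguity codes are left unchanged, which is an assumption you may want to revisit.

```python
# complement table for upper- and lowercase DNA letters
COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AACG"))               # CGTT
print(reverse_complement("ACGtCCAgTnAGaaGT"))   # ACttCTnAcTGGaCGT
```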
Introduction to Python for Biologists – Exercise
Exercise
Create the following directory structure:

    Dokumente
        python
            notebooks
            data

Jupyter Notebook:
- File: Literals.ipynb
- URL: https://cbdm.uni-mainz.de/mb17
- Download the file into the notebooks folder
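The folders can also be created from Python with the standard pathlib module. This sketch assumes that notebooks and data sit inside python, which sits inside Dokumente; the function name create_course_folders is hypothetical, and the demo uses a temporary directory so nothing is written to your own folders.

```python
import tempfile
from pathlib import Path


def create_course_folders(base):
    """Create Dokumente/python/notebooks and Dokumente/python/data under base."""
    root = Path(base) / "Dokumente" / "python"
    for name in ("notebooks", "data"):
        # parents=True creates missing parent folders; exist_ok=True avoids errors on re-runs
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


with tempfile.TemporaryDirectory() as tmp:
    root = create_course_folders(tmp)
    created = sorted(p.name for p in root.iterdir())
    print(created)   # ['data', 'notebooks']
```

To create the real folders, you could call create_course_folders(Path.home()), assuming Dokumente lives in your home folder.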
Introduction to Python for Biologists – Lists, tuples and ranges
Lists I
A list is an ordered collection of objects:

    List1 = []           # an empty list

    List1 = ['b', 'a', 1, 'cat', 'K', 'dog', 'F']
    List1[0]             # 'b' (access the item at index 0)
    List1[1]             # 'a' (access the item at index 1)
    List1[-1]            # 'F' (access the last item)
    List1[-2]            # 'dog' (access the second-to-last item)

    # Slices [start:end:step]
    List1[2:5]           # [1, 'cat', 'K'] (index 5 excluded)
    List1[:5]            # ['b', 'a', 1, 'cat', 'K']
    List1[5:]            # ['dog', 'F']
    List1[-2:]           # ['dog', 'F']
    List1[0:5:2]         # ['b', 1, 'K']
Lists II

    # Built-in functions
    List2 = [1, 2, 3, 4, 5]
    len(List2)               # 5 (length = 5 items)
    max(List2)               # 5
    min(List2)               # 1
    sum(List2)               # 15

    # List methods
    List2 = []               # empty list
    List2.append(1)          # [1]
    List2.append('A')        # [1, 'A']
    List2.extend(['B', 2])   # [1, 'A', 'B', 2]
    List2.pop(2)             # [1, 'A', 2] (removes and returns 'B')
    List2.insert(3, 'A')     # [1, 'A', 2, 'A'] (insert 'A' at index 3)
    List2.index('A')         # 1 (index of the 1st 'A')
    List2.count('A')         # 2 (number of 'A')
    List2.reverse()          # ['A', 2, 'A', 1]
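The difference between append() and extend() is easy to miss, and pop() without an index removes the last item; a minimal sketch with made-up read sequences:

```python
reads = ['ACGT']
reads.append(['TTGA', 'CCAT'])   # append() adds ONE item: here, a whole list
print(reads)                     # ['ACGT', ['TTGA', 'CCAT']]

reads = ['ACGT']
reads.extend(['TTGA', 'CCAT'])   # extend() adds each item of the list
print(reads)                     # ['ACGT', 'TTGA', 'CCAT']

last = reads.pop()               # without an index, pop() removes the last item
print(last, reads)               # CCAT ['ACGT', 'TTGA']
```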
Lists III
    # sorting
    List3 = [5, 3, 4, 1, 2]
    sorted(List3)        # [1, 2, 3, 4, 5] (builds a new sorted list)
    List3                # [5, 3, 4, 1, 2] (List3 not changed)
    List3.sort()         # modifies the list in place (and returns None)
    List3                # [1, 2, 3, 4, 5] (.sort() did modify List3!)

    # nested lists / 2D lists / tables
    myList = [['b', 'a'],
              [1, 'cat']]     # a list of 2 lists
    myList[0]            # returns the first list ['b', 'a']
    myList[0][0]         # 'b' (1st item of the 1st list)
    myList[0][1]         # 'a' (2nd item of the 1st list)
    myList[1]            # returns the 2nd list [1, 'cat']
    myList[1][0] = 10    # [['b', 'a'], [10, 'cat']]
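sorted() and .sort() also accept key and reverse arguments. A minimal sketch with a few gene symbols in mixed case, used here only as example strings; by default, uppercase letters sort before lowercase ones.

```python
genes = ['tp53', 'BRCA1', 'egfr', 'Myc']

print(sorted(genes))                          # ['BRCA1', 'Myc', 'egfr', 'tp53']
print(sorted(genes, key=str.lower))           # ['BRCA1', 'egfr', 'Myc', 'tp53']: case-insensitive
print(sorted(genes, key=len, reverse=True))   # ['BRCA1', 'tp53', 'egfr', 'Myc']: longest first
```

Sorting is stable, so items with equal keys (tp53 and egfr, both of length 4) keep their original relative order.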
Lists IV
    myList = [['b', 'a'],
              [1, 'cat']]

    for sublist in myList:          # loop over sublists
        for value in sublist:       # loop over values
            print(value)            # print 1 value per line
    # b
    # a
    # 1
    # cat

    for sublist in myList:                  # loop over sublists
        newsublist = map(str, sublist)      # convert each item to a string
        print('\t'.join(newsublist))        # print as a TSV table
    # b    a      (tab-separated)
    # 1    cat
Tuples and ranges
A tuple is an immutable ordered collection of objects:

    Tuple1 = ()                                      # empty tuple
    Tuple1 = ('b', 'a', 1, 'cat', 'K', 'dog', 'F')   # defined tuple

    Tuple1[0]      # 'b'
    Tuple1[1:3]    # ('a', 1) (index 3 excluded)

Ranges:

    # range(start, stop[, step])
    range(10)                 # range(0, 10) (the items are not displayed)
    list(range(10))           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    list(range(0, 30, 5))     # [0, 5, 10, 15, 20, 25]
    list(range(0, -5, -1))    # [0, -1, -2, -3, -4]
    list(range(0))            # []
    list(range(1, 0))         # []
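A short sketch of two everyday tuple uses: unpacking a record into separate names, and the error raised when trying to change an item. The try/except block is used here only so the error is shown without stopping the script; the record values are made-up examples.

```python
record = ('seq1', 16, 0.4375)    # example values: id, length, GC fraction
seq_id, length, gc = record      # tuple unpacking into 3 names
print(seq_id, length, gc)        # seq1 16 0.4375

try:
    record[1] = 20               # tuples are immutable
except TypeError as err:
    print('TypeError:', err)     # TypeError: 'tuple' object does not support item assignment

steps = range(0, 30, 5)
print(15 in steps, steps[-1], len(steps))   # True 25 6
```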
Introduction to Python for Biologists – Sets and dictionaries
Sets I
A set is a mutable, unordered collection of unique objects:

    S0 = set()                 # an empty set ({} creates an empty dictionary)
    S0 = {'a', 1}              # a new set of 2 items
    S1 = {'a', 1, 'b', 'R'}    # a new set of 4 items
    S2 = {'a', 1, 'b', 'S'}    # a new set of 4 items
    len(S0)                    # 2

    # Operators (sets are unordered: the display order may vary)
    'R' in S1            # True
    'R' not in S2        # True
    S1 - S2              # in S1 but not in S2 => {'R'}
    S1 | S2              # in S1 or in S2 => {1, 'a', 'S', 'R', 'b'}
    S1 & S2              # in S1 and in S2 => {1, 'b', 'a'}
    S1 ^ S2              # in S1 or in S2 but not in both => {'R', 'S'}
    S0 <= S1             # S0 is a subset of S1 => True
    S1 >= S2             # S1 is a superset of S2 => False
    S1 >= S0             # True
    S0.isdisjoint(S1)    # False (they share items)
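A typical biological use of set operators is comparing gene lists between samples. The gene names below are hypothetical, and sorted() is used only to get a predictable display order.

```python
sample1 = {'geneA', 'geneB', 'geneC'}   # hypothetical genes detected in sample 1
sample2 = {'geneB', 'geneC', 'geneD'}   # hypothetical genes detected in sample 2

shared = sample1 & sample2              # found in both samples
only_in_1 = sample1 - sample2           # found only in sample 1
print(sorted(shared))                   # ['geneB', 'geneC']
print(sorted(only_in_1))                # ['geneA']

print(sorted(set('GATTACA')))           # ['A', 'C', 'G', 'T']: duplicates are dropped
```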
Sets II
    # Methods (item stands for any hashable object)
    S0.copy()          # return a new set with a shallow copy of S0
    S0.add(item)       # add element item to the set
    S0.remove(item)    # remove element item from the set (KeyError if absent)
    S0.discard(item)   # remove element item from the set if present
    S0.pop()           # remove and return an arbitrary element
    S0.clear()         # remove all elements from the set
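A minimal runnable sketch of the methods above, highlighting the one practical difference between remove() and discard() when the item is absent:

```python
S0 = {'a', 1}
S0.add('b')             # S0 now holds 3 items
S0.discard('z')         # 'z' is absent: discard() silently does nothing

try:
    S0.remove('z')      # 'z' is absent: remove() raises KeyError
except KeyError:
    print("KeyError: 'z' is not in the set")

backup = S0.copy()      # independent set with the same items
S0.clear()
print(len(S0), len(backup))   # 0 3
```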
Dictionaries I
A dictionary is a mutable collection of objects indexed by unique keys:

    d = {}                          # empty dictionary
    d = {'A': "ALA", 'C': "CYS"}    # dictionary with 2 items
    d['A']                          # 'ALA'
    d['C']                          # 'CYS'
    d['H'] = "HIS"                  # add a new item
    d                               # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'} (insertion order, Python 3.7+)
    del d['A']                      # {'C': 'CYS', 'H': 'HIS'}

    'C' in d                        # True (key 'C' is in d)
    'A' not in d                    # True (key 'A' is not in d anymore)
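A dictionary works well as a lookup table. In this sketch the slide's partial one-letter to three-letter amino-acid table is applied to a short example protein; the '???' fallback for letters missing from the table is a hypothetical choice made through .get().

```python
three_letter = {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}   # partial table, as on the slide
protein = "ACHW"                                     # example sequence

codes = []
for aa in protein:
    # .get() returns the fallback '???' for letters missing from the table (W here)
    codes.append(three_letter.get(aa, '???'))
print('-'.join(codes))   # ALA-CYS-HIS-???
```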
Dictionaries II
Operation                   Description
d[key]                      get value by key
d[key] = val                set value by key
del d[key]                  delete item by key
d.clear()                   delete all items
len(d)                      number of items
d.copy()                    make a shallow copy
d.keys()                    return a view of all keys
d.values()                  return a view of all values
d.items()                   return a view of all items (key, value)
d.update(d2)                add all items from dictionary d2
d.get(key[, val])           get value by key if it exists, otherwise val (None by default)
d.setdefault(key[, val])    like d.get(key, val), but also set d[key] = val if key is not in d
d.pop(key[, default])       remove key and return its value; return default if key is absent (KeyError if no default)
d.popitem()                 remove the last inserted item and return it as a (key, value) tuple
Table: Functions for dictionaries
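Two common patterns built from the functions in the table, shown as a minimal sketch: counting with d.get() and grouping with d.setdefault().

```python
# count each base with d.get(): unseen keys start at 0
counts = {}
for base in "GATTACA":
    counts[base] = counts.get(base, 0) + 1
print(counts)   # {'G': 1, 'A': 3, 'T': 2, 'C': 1}

# group RNA names by length with d.setdefault(): the list is created on first use
by_length = {}
for rna in ['miRNA', 'tRNA', 'mRNA', 'rRNA']:
    by_length.setdefault(len(rna), []).append(rna)
print(by_length)   # {5: ['miRNA'], 4: ['tRNA', 'mRNA', 'rRNA']}

for length, names in by_length.items():   # items() yields (key, value) tuples
    print(length, len(names))
```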
Introduction to Python for Biologists – Convert and copy
Converting types I
Many Python functions are sensitive to the type of data. For example, you cannot concatenate a string with an integer:

    sign = 'You are ' + 21 + '-years-old'        # error!! (TypeError)
    sign = 'You are ' + str(21) + '-years-old'   # OK
    sign                                         # 'You are 21-years-old'

    # convert to int (from str or float)
    int('2014')        # 2014 (from a string)
    int(3.141592)      # 3 (from a float: the decimal part is dropped)

    # convert to float (from str or int)
    float('1.99')      # 1.99 (from a string)
    float(5)           # 5.0 (from an integer)
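A few conversion edge cases, as a minimal sketch: int() truncates towards zero rather than rounding, tolerates surrounding whitespace, and refuses decimal strings, which need float() first.

```python
print(int(-3.9))           # -3: int() drops the decimal part (truncates towards zero)
print(round(-3.9))         # -4: round() goes to the nearest integer
print(int(' 42\n'))        # 42: surrounding whitespace is ignored
print(int(float('3.5')))   # 3: a decimal string must go through float() first

try:
    int('3.5')             # not an integer literal
except ValueError:
    print("ValueError: int() cannot parse '3.5'")
```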
Converting types II
    # convert to str (from int, float, list, tuple, dict and set)
    str(3.141592)         # '3.141592'
    str([1, 2, 3, 4])     # '[1, 2, 3, 4]'

    # convert a sequence type to another
    # (str, list, tuple, and set functions)
    old_list = ['A', 'C', 'A']     # example list
    new_set = set(old_list)        # list to set {'A', 'C'}
    new_tuple = tuple(old_list)    # list to tuple ('A', 'C', 'A')
    new_set = set("Hello")         # string to set {'H', 'o', 'e', 'l'}
    new_list = list("Hello")       # string to list ['H', 'e', 'l', 'l', 'o']
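Conversions are often chained when reading data: every field of a text line is a string until it is converted. A minimal sketch with a made-up CSV line of measurements:

```python
line = "0.5,1.25,2\n"                 # example line from a CSV file of measurements
values = []
for field in line.strip().split(','):
    values.append(float(field))       # each field is text until converted
print(values, sum(values))            # [0.5, 1.25, 2.0] 3.75
```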
Copy I
- Assignments (=) do not copy objects; they create bindings between a target and an object.

    # Numeric types (immutable)
    a = 1        # a binds the object 1
    b = a        # b binds the object 1
    b = b + 1    # b binds a new object created by the sum
    a            # 1
    b            # 2

    # Strings (immutable)
    a = "Hello"                       # a binds the object "Hello"
    b = a                             # b binds the object "Hello"
    a = a.replace('o', 'o World!')    # a binds a new object
    a                                 # 'Hello World!'
    b                                 # 'Hello'
Copy II
- For collections that are mutable or contain mutable items, a shallow copy is sometimes needed so that one copy can be changed without changing the other.

    # Dictionary (mutable)
    d1 = {'A': "ALA", 'C': "CYS"}    # d1 binds the object
    d2 = d1                          # d2 binds the same object
    d2['H'] = "HIS"                  # add an item to the object
    d1                               # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}
    d2                               # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}

    d2 = d1.copy()                   # d2 binds a shallow copy of the object
    d2['P'] = "PRO"                  # add an item to the copied object
    d1                               # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS'}
    d2                               # {'A': 'ALA', 'C': 'CYS', 'H': 'HIS', 'P': 'PRO'}
Copy III
    # List (mutable)
    l1 = ['A', 'H', 'C']
    l2 = l1
    l2.append('P')
    l1              # ['A', 'H', 'C', 'P']
    l2              # ['A', 'H', 'C', 'P']

    l2 = l1[:]      # shallow copy by assigning a slice of the whole list
    l2.append('V')
    l1              # ['A', 'H', 'C', 'P']
    l2              # ['A', 'H', 'C', 'P', 'V']
Copy IV
- Convert types to get copies

    new_list = list(old_list)      # shallow copy
    new_dict = dict(old_dict)      # shallow copy
    new_set = set(old_list)        # copy the list as a set
    new_tuple = tuple(old_list)    # copy the list as a tuple

- The copy module

    import copy
    copy.copy(x)        # shallow copy of x
    copy.deepcopy(x)    # deep copy of x, including embedded objects
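The difference between a shallow and a deep copy only shows up with nested objects. A minimal sketch with the nested list used earlier in the Lists slides:

```python
import copy

table = [['b', 'a'], [1, 'cat']]    # a nested (2D) list
shallow = copy.copy(table)          # new outer list, but the SAME inner lists
deep = copy.deepcopy(table)         # new outer list AND new inner lists

table[1][0] = 10                    # modify an inner list of the original
print(shallow[1][0])                # 10: the shallow copy shares the inner lists
print(deep[1][0])                   # 1: the deep copy is fully independent
print(shallow is table, shallow[1] is table[1])   # False True
```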
Introduction to Python for Biologists – Loops
For loop I
# For items in a list
for person in ['Isabel', 'Kate', 'Michael']:
    print("Hi", person)
# Hi Isabel
# Hi Kate
# Hi Michael

# For items in a dictionary
seq = ''                          # an empty string
d = {'A': "ALA", 'C': "CYS"}      # a dictionary with 2 keys
for k in d.keys():                # loop over the keys
    seq += d[k]                   # append value to seq
print(seq)                        # ALACYS (keys in insertion order, Python 3.7+)
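Dictionaries can also be looped over with d.items(), which yields (key, value) pairs. The sketch below, on a made-up sequence, combines a loop over a string with a dictionary that accumulates nucleotide counts; dict.get(key, 0) returns 0 for a base not seen yet. The printed order assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
seq = 'ATGCGCTA'                              # made-up example sequence
counts = {}                                   # empty dictionary for the counts
for base in seq:                              # loop over the characters
    counts[base] = counts.get(base, 0) + 1    # add 1 to this base's count

for base, n in counts.items():                # loop over (key, value) pairs
    print(base, n)
# A 2
# T 2
# G 2
# C 2
```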
For loop II
# For items in a string
for c in 'abc':
    print(c)
# a
# b
# c

# For items in a range
for n in range(3):
    print(n)
# 0
# 1
# 2

# For items from any iterator
iterator = reversed([1, 2, 3])   # e.g. an iterator over a list in reverse order
for n in iterator:
    print(n)
# 3
# 2
# 1
Enumerate
# loop getting index and value
RNAs = ['miRNA', 'tRNA', 'mRNA']
for i, rna in enumerate(RNAs):
    print(i, rna)
# 0 miRNA
# 1 tRNA
# 2 mRNA

# loop over 2 lists
RNAtypes = ['micro', 'transfer', 'messenger']
for i, t in enumerate(RNAtypes):
    r = RNAs[i]
    print(i, t, r)
# 0 micro miRNA
# 1 transfer tRNA
# 2 messenger mRNA
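The second loop above reaches into RNAs through the index that enumerate() provides. As an alternative sketch with the same two lists, the built-in zip() pairs the items of both lists position by position, and it can itself be wrapped in enumerate() when the index is still needed.

```python
RNAs = ['miRNA', 'tRNA', 'mRNA']
RNAtypes = ['micro', 'transfer', 'messenger']

# zip() yields one tuple per position: ('micro', 'miRNA'), ...
for t, r in zip(RNAtypes, RNAs):
    print(t, r)
# micro miRNA
# transfer tRNA
# messenger mRNA

# enumerate() around zip() also gives the index
pairs = []
for i, (t, r) in enumerate(zip(RNAtypes, RNAs)):
    pairs.append((i, t, r))
print(pairs)
# [(0, 'micro', 'miRNA'), (1, 'transfer', 'tRNA'), (2, 'messenger', 'mRNA')]
```

zip() stops at the end of the shortest list, whereas indexing with RNAs[i] raises an IndexError if RNAs is shorter than RNAtypes.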
While loop
i = 0
value = 1
while value < 200:
    i += 1
    value *= i
    print(i, value)
# 1 1
# 2 2
# 3 6
# 4 24
# 5 120
# 6 720
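A while loop fits cases where the number of repetitions is not known in advance. In this sketch the sequence is invented and only the three standard stop codons are considered: the loop reads codon by codon from index 0 until it meets a stop codon or runs out of complete codons.

```python
seq = 'ATGAAACCCTAGGGT'          # invented example sequence
stops = ['TAA', 'TAG', 'TGA']    # standard stop codons

pos = 0                          # start index of the current codon
while pos + 3 <= len(seq) and seq[pos:pos + 3] not in stops:
    pos += 3                     # move on to the next codon

if pos + 3 <= len(seq):
    print('stop codon', seq[pos:pos + 3], 'at index', pos)
else:
    print('no in-frame stop codon')
# stop codon TAG at index 9
```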
Introduction to Python for Biologists – Exercise
Exercise
URL
• https://cbdm.uni-mainz.de/mb17
Jupyter Notebook
• File: Sequences.ipynb
• Download the file into the notebooks folder
Data file
• File: shrub_dimensions.csv
• Download the file into the data folder
Introduction to Python for Biologists – Functions
Functions I
from random import choice     # import function 'choice'

# Simple function
def kmerFixed():              # define function kmerFixed
    print("ACGTAGACGC")       # print predefined string

kmerFixed()                   # displays ACGTAGACGC

# Returning a value
def kmer10():                 # define function kmer10
    seq = ""                  # define an empty string
    for count in range(10):   # repeat 10 times
        seq += choice("CGTA") # add 1 random nucleotide to the string
    return seq                # return the string

newKmer = kmer10()            # store the result of the function in a variable
print(newKmer)                # print the result, e.g. ACGGATACGC
Functions II
from random import choice

# One parameter
def kmer(k):                  # define kmer with 1 parameter k
    seq = ""
    for count in range(k):    # k is used to define the range
        seq += choice("CGTA")
    return seq

print(kmer(k=4))    # e.g. TACC
print(kmer(20))     # e.g. CACAATGGGTACCCCGGACC
print(kmer(0))      # empty string: prints an empty line
print(kmer())       # TypeError: kmer() missing 1 required
                    # positional argument: 'k'
Functions III
from random import choice

# Function with several parameters and default values
def generic_kmer(alphabet="ACGT", k=10):
    seq = ""
    for count in range(k):
        seq += choice(alphabet)
    return seq

generic_kmer("AB12", 15)             # e.g. '112AA1A12AA1121'
generic_kmer("AB12")                 # e.g. '1AA1B1BA2A'
generic_kmer(k=20)                   # e.g. 'GTGGGCTTGTGCCCTGCACT'
generic_kmer()                       # e.g. 'CTTGCCGGGA'
generic_kmer(k=8, alphabet="#$%&")   # e.g. '$$#&%$%$'
Name spaces I
• Variable and function names defined globally can be seen in functions: this is the global namespace
a = 10              # global variable

def my_function():
    print(a)        # will use the global variable

my_function()       # 10 (the global a)
print(a)            # 10 (the global a)
Name spaces II
• Names defined within a function cannot be seen outside: the function has its own namespace.
a = 10              # global variable

def my_function():
    a = 1           # local variable defined by assignment
    b = 2           # local variable defined by assignment
    print(a)

my_function()       # 1 (the local a)
print(a)            # 10 (the global a)
print(b)            # NameError: name 'b' is not defined
Name spaces III
• Use parameters and returned values to get and set variables outside the namespace
a = 10                    # global variable

def my_function(val):     # local variable val
    b = 2
    val = val + b
    return val

print(a)                  # 10 (the global a)
print(my_function(a))     # 12
print(a)                  # 10 (the global a unchanged)

c = my_function(a)        # set val to 10 and assign 10+2 to c
print(c)                  # 12 (global c holds the returned value)
print(a)                  # 10 (global a was unchanged)

a = my_function(a)        # change global a to the value 10+2
print(a)                  # 12
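One subtlety, related to the copy slides: a parameter is a new local name, but it binds the same object as the argument passed in. If that object is mutable, such as a list, changing it in place inside the function is visible outside, even without return. A minimal sketch; add_stop and with_stop are hypothetical helpers for illustration.

```python
def add_stop(codons):          # hypothetical helper
    codons.append('TAA')       # in-place change: the caller's list is modified

def with_stop(codons):         # hypothetical helper
    new = codons.copy()        # shallow copy: the caller's list is left alone
    new.append('TAA')
    return new

orf = ['ATG', 'GCC']           # made-up reading frame as a list of codons
result = with_stop(orf)
print(orf, result)             # ['ATG', 'GCC'] ['ATG', 'GCC', 'TAA']

add_stop(orf)                  # no return value and no assignment...
print(orf)                     # ['ATG', 'GCC', 'TAA']  ...but orf has changed
```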
Introduction to Python for Biologists – Branching
Truth Value Testing I
Any object can be tested for truth value. The following values are considered false (other values are considered true):
• None
• False
• zero of any numeric type: e.g. 0 or 0.0
• an empty sequence or mapping: e.g. '', (), [], {}
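Operations and built-in functions that have a Boolean result always return 0 or False for false and 1 or True for true, unless otherwise stated. Important exception: the Boolean operations or and and always return one of their operands.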
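A short sketch of truth value testing: bool() reports the truth value of any object, and if applies the same test implicitly. The last lines show the documented behavior of or and and, which return one of their operands rather than necessarily True or False.

```python
# Values considered false (as listed above)
falsy = [None, False, 0, 0.0, '', (), [], {}]
print([bool(v) for v in falsy])      # eight times False

# Any other value is true, including a string holding a single space
print(bool(' '), bool('ATG'), bool([0]), bool(-1))   # True True True True

seq = ''                  # e.g. an empty sequence read from a file
if not seq:               # same test as: if len(seq) == 0
    print('empty sequence')

# Comparisons return True or False...
print(3 > 2)              # True
# ...whereas or and and return one of their operands
default = '' or 'N/A'     # '' is false, so the 2nd operand is returned
print(default)            # N/A
print('ATG' and 5)        # 5: 'ATG' is true, so the 2nd operand is returned
```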
Boolean Operations I
A Boolean is equal to True or False
• a and b (true if both a and b are true, false otherwise)
• a or b (true if a or b is true, alone or both, false otherwise)
• a ^ b (exclusive or: true if exactly one of a and b is true, false otherwise)
• not b (true if b is false, false otherwise)
Boolean Operations II
All example tests below evaluate to True unless otherwise specified
# set the values of 3 variables (single "=" symbol)
a = True
b = False
c = True

# simple tests using two "=" symbols (==)
a == True
b == False
c == True
Boolean Operations III
# set the values of 3 variables (one "=" symbol)
a = True
b = False
c = True

# order is irrelevant
(a or b) == (b or a)
(a and b) == (b and a)

# neutral (whatever the value of a)
(a or False) == a
(a and True) == a

# always the same (whatever the value of a)
(a and False) == False
(a or True) == True
Boolean Operations IV
# set the values of 3 variables (one "=" symbol)
a = True
b = False
c = True

# precedence: "==" > "not" > "and" > "or"
(a and b or c) == ((a and b) or c)
(not a == b) == (not (a == b))

# equivalent expressions
((a or b) or c) == (a or (b or c)) == (a or b or c)
(a or a or a) == a
(b and b and b) == b

b and b and b == b   # read as b and b and (b == b): False and False and True => False !!

# distributive laws (parentheses needed because == binds before and/or)
(a and (b or c)) == ((a and b) or (a and c))
(a or (b and c)) == ((a or b) and (a or c))
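Because each variable can only be True or False, identities such as the distributive laws can be checked exhaustively. This sketch loops over all 8 combinations with itertools.product(); its second part shows, for one choice of values, how == binding more tightly than and/or changes the meaning of an unparenthesized expression.

```python
from itertools import product

laws_hold = True
for a, b, c in product([True, False], repeat=3):   # all 8 combinations
    # distributive laws, fully parenthesized
    law1 = (a and (b or c)) == ((a and b) or (a and c))
    law2 = (a or (b and c)) == ((a or b) and (a or c))
    laws_hold = laws_hold and law1 and law2
print(laws_hold)                 # True

# Without parentheses, == is applied before and/or
a, b, c = True, True, False
print(a and b or c == c)         # True:  read as (a and b) or (c == c)
print((a and b or c) == c)       # False: read as True == False
```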
Comparisons
Operation            Meaning
<                    strictly less than
<=                   less than or equal
>                    strictly greater than
>=                   greater than or equal
==                   equal (two = symbols)
math.isclose(a, b)   approximately equal, for floating-point numbers a and b (needs import math)
!=                   not equal
is                   object identity
is not               negated object identity
x < y <= z           is equivalent to "x < y and y <= z" (y is evaluated only once)
• Comparisons between objects of the same class are supported if the operator is defined for the class.
• Different numeric types can be compared: e.g. 2 < 4.56
• Floating-point numbers should not be compared for exact equality, because of the limited precision used to represent numbers with infinite expansions such as 1/3 = 0.33333...
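A sketch of the floating-point caveat: 0.1 + 0.2 is not stored as exactly 0.3, so == fails while math.isclose() (default relative tolerance 1e-09) succeeds. The same block shows a chained comparison; the GC fraction is a made-up value.

```python
import math

x = 0.1 + 0.2
print(x)                      # 0.30000000000000004
print(x == 0.3)               # False: exact comparison fails
print(math.isclose(x, 0.3))   # True: equal within the default relative tolerance

gc = 0.42                     # made-up GC fraction
print(0 <= gc <= 1)           # True: same as 0 <= gc and gc <= 1
print(2 < 4.56)               # True: an int and a float can be compared
```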
Conditionals
• IF-ELIF-ELSE
seq = 'ATGAnnATG'
if 'n' in seq:
    print("sequence contains undefined bases (n)")
elif 'x' in seq:
    print("sequence contains unknown bases x but not n")
else:
    print("no undefined bases in sequence")

# Output:
# sequence contains undefined bases (n)
• ELIF and ELSE are optional
• multiple ELIF are possible
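A sketch with several elif branches; codon_type is a hypothetical helper. Python tests the conditions from top to bottom and runs only the first branch whose condition is true, so the order of the branches matters. The start codon (ATG) and stop codons (TAA, TAG, TGA) are those of the standard genetic code.

```python
def codon_type(codon):
    codon = codon.upper()                 # accept lowercase input too
    if len(codon) != 3:
        return 'not a codon'
    elif codon == 'ATG':
        return 'start'
    elif codon in ('TAA', 'TAG', 'TGA'):
        return 'stop'
    elif 'N' in codon:
        return 'undefined'
    else:
        return 'sense'

for c in ['atg', 'TGA', 'GCN', 'GCC', 'AT']:
    print(c, codon_type(c))
# atg start
# TGA stop
# GCN undefined
# GCC sense
# AT not a codon
```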
Introduction to Python for Biologists – Exercise
Exercise
URL
• https://cbdm.uni-mainz.de/mb17
Jupyter Notebook
• File: Conditionals.ipynb
• Download the file into the notebooks folder
Introduction to Python for Biologists – Regular Expressions
RE: Regular Expressions I
• Regular expressions (called REs, regexes, or regex patterns) are a powerful language for matching text patterns (Python module re)
• In Python a regular expression search is typically written as:
match = re.search(expression, string)
• The re.search() function takes a regular expression pattern and a string and searches for that pattern within the string.
• If the search is successful, re.search() returns a Match object (class re.Match since Python 3.7, _sre.SRE_Match in earlier versions); otherwise it returns None.
RE: Regular Expressions II
import re                                  # import re module
text = 'an example word:cat!!'             # example string
match = re.search(r'word:\w\w\w', text)    # search a pattern
if match:
    print('found', match.group())          # found word:cat
else:
    print('did not find')
• In the pattern string, \w matches a single word character (letter, digit or underscore)
• The 'r' at the start of the pattern string designates a Python "raw" string, which passes backslashes through without change.
RE: Basic Patterns
Pattern      Match
a, X, 9, <   ordinary characters match themselves exactly
.            a period matches any single character except newline
\w           matches a "word" character: a letter or digit or underscore [a-zA-Z0-9_]
\W           matches any non-word character
\b           boundary between word and non-word
\s           a single whitespace character: space, newline, return, tab, form feed [ \n\r\t\f]
\S           matches any non-whitespace character
\t           tab
\n           newline
\r           return
\d           decimal digit [0-9]
^            circumflex (caret) matches the start of a string
$            dollar matches the end of a string
\            inhibits the "specialness" of a character; for example, use \. to match a period
Table: Regular expressions: basic patterns
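A few of the patterns from the table applied to a made-up FASTA-like header line. Note that \w also matches the underscore, and that \b skips the digits in gene_1 and chr1 because they directly follow other word characters.

```python
import re

header = 'gene_1 chr1:100-200 len=101.'   # made-up header line

print(re.search(r'^\w+', header).group())       # gene_1  (^: start; \w includes _)
print(re.search(r'\d+\.$', header).group())     # 101.    (\.: literal dot; $: end)
print(re.search(r'chr\d:\d+', header).group())  # chr1:100
print(re.findall(r'\b\d+', header))             # ['100', '200', '101']
```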
RE: Basic examples I
The basic rules of an RE search for a pattern within a string are:
• The search proceeds through the string from start to end, stopping at the first match found
• All of the pattern must be matched, but not all of the string
• If match = re.search(pat, str) is successful, match is not None and, in particular, match.group() is the matching text
RE: Basic examples II
import re
match = re.search(r'iii', 'piiig')        # found
match.group() == "iii"                    # True

match = re.search(r'igs', 'piiig')        # not found
match is None                             # True

match = re.search(r'..g', 'piiig')        # found
match.group() == "iig"                    # True

match = re.search(r'\d\d\d', 'p123g')     # found
match.group() == "123"                    # True

match = re.search(r'\w\w\w', '@@abcd!!')  # found
match.group() == "abc"                    # True
RE: Repetitions I
Repetitions are defined using +, *, ? and {}
• + means 1 or more occurrences of the pattern to its left
  e.g. i+ = one or more i's
• * means 0 or more occurrences of the pattern to its left
• ? means 0 or 1 occurrence of the pattern to its left
• curly brackets specify an exact number or a range of repetitions
  e.g. A{5} for exactly 5 A letters
  A{6,10} for 6 to 10 A letters
Leftmost and Largest:
• First the search finds the leftmost match for the pattern, and second it tries to use up as much of the string as possible
• i.e. + and * go as far as possible (they are said to be "greedy").
RE: Repetitions II
import re

# simple repetitions
re.search(r'pi+', 'piiig').group()        # 'piii'
re.search(r'pi?', 'ap').group()           # 'p'
re.search(r'pi?', 'apii').group()         # 'pi'
re.search(r'pi*', 'ap').group()           # 'p'
re.search(r'pi*', 'apii').group()         # 'pii'
re.search(r'pi{3}', 'apiiiii').group()    # 'piii'
re.search(r'i+', 'piigiiii').group()      # 'ii' (1st hit only)

# 3 digits possibly separated by whitespace (\s*)
re.search(r'\d\s*\d\s*\d', 'xx1 2 3xx').group()   # '1 2 3'
re.search(r'\d\s*\d\s*\d', 'xx12 3xx').group()    # '12 3'
re.search(r'\d\s*\d\s*\d', 'xx123xx').group()     # '123'
RE: Sets of characters I
• Square brackets indicate a set of characters
  [ABC] matches 'A' or 'B' or 'C'
• The codes \w, \s etc. work inside square brackets too, but a dot (.) inside square brackets just means a literal dot
• A dash indicates a range, or itself if put at the end
  [a-z] for lowercase alphabetic characters
  [a-zA-Z] for alphabetic characters
  [AB-] for A, B or dash
• A circumflex (^) at the start inverts the set
  [^AB] for any character except A or B
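A sketch of character sets on invented sequences: the inverted set [^ACGT] finds every character that is not one of the four DNA bases, which gives a simple validity check. The last two lines show a range and a literal dash placed at the end of a set.

```python
import re

seqs = ['ACGTTGCA', 'ACGNTGCA', 'ACGT-TGCA']   # invented sequences

for s in seqs:
    bad = re.findall(r'[^ACGT]', s)   # every character that is not A, C, G or T
    if bad:
        print(s, 'invalid:', ''.join(bad))
    else:
        print(s, 'valid')
# ACGTTGCA valid
# ACGNTGCA invalid: N
# ACGT-TGCA invalid: -

print(re.findall(r'[A-Z]+', 'ATGgccTAA'))    # ['ATG', 'TAA']: runs of uppercase letters
print(re.findall(r'[ACGT-]+', 'AC-GT NNN'))  # ['AC-GT']: dash at the end is literal
```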
RE: Sets of characters II
import re
text = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'\w+@\w+', text)
if match:
    print(match.group())   # b@google

match = re.search(r'[\w.-]+@[\w.-]+', text)
if match:
    print(match.group())   # alice-b@google.com
RE: Functions I
re module functions:
• re.match() returns a Match object if an occurrence is found at the beginning of the string, None otherwise
• re.search() returns a Match object for the 1st occurrence, None if not found
• re.findall() returns a list of the matched substrings, an empty list if not found
• re.finditer() returns an iterator over Match objects of the occurrences, an empty iterator if not found
Match object methods:
• match.start() returns the start index
• match.end() returns the end index (one past the last matched character)
• match.span() returns the start and end index in a tuple
• match.group() returns the matched string
RE: Functions II
import re
seq = "RPAPPDRAPDQX"      # a sequence
expr = 'A.{1,2}D'         # A and D separated by 1 or 2 characters

match = re.search(expr, seq)
if match:
    print(
        match.start(),                    # start index
        match.end(),                      # end index
        match.span(),                     # start and end index
        match.group(),                    # the matched string
        seq[match.start():match.end()],   # the matched string
        sep=' - '
    )
# 2 - 6 - (2, 6) - APPD - APPD
RE: Functions III
import re
seq = "RPAPPDRAPDQX"      # a sequence
expr = 'A.{1,2}D'         # A and D separated by 1 or 2 characters

match = re.match(expr, seq)       # not found at the beginning
print(match)
# None

matches = re.findall(expr, seq)   # found 2 occurrences
print(matches)
# ['APPD', 'APD']

matches = re.finditer(expr, seq)  # found 2 occurrences
for m in matches:                 # iterate over Match objects
    print(m.span(), m.group())    # use each Match object
# (2, 6) APPD
# (7, 10) APD
RE: Group Extraction
• Groups are defined with parentheses
• On a successful search:
  match.group(): the whole match text
  match.group(1): match text of the 1st left parenthesis
  match.group(2): match text of the 2nd left parenthesis
  ...
import re
text = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'([\w.-]+)@([\w.-]+)', text)
if match:
    print(match.group())    # alice-b@google.com
    print(match.group(1))   # alice-b
    print(match.group(2))   # google.com
RE: Group Extraction and Findall
• If the pattern includes a single set of parentheses, then findall() returns a list of strings corresponding to that single group
• If the pattern includes 2 or more parenthesis groups, then instead of returning a list of strings, findall() returns a list of tuples. Each tuple represents one match of the pattern, and inside the tuple is the group(1), group(2) ... data.
import re
text = 'alice@google.com, monkey bob@abc.com dishwasher'
tuples = re.findall(r'([\w\.-]+)@([\w\.-]+)', text)
print(tuples)
# [('alice', 'google.com'), ('bob', 'abc.com')]

for t in tuples:
    print(t[0], t[1], sep=' | ')
# alice | google.com
# bob | abc.com
RE: Options
The re functions take options to modify the behavior of the pattern match. The option flag is added as an extra argument to search(), findall(), etc., e.g. re.search(pat, str, re.IGNORECASE).
• IGNORECASE ignores upper/lowercase differences for matching
• DOTALL allows dot (.) to match newline; normally it matches anything but newline
  Note that \s (whitespace) includes newlines
• MULTILINE allows ^ and $ to match the start and end of each line within a string made of many lines. Normally they just match the start and end of the whole string.
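A sketch of the three flags on a small, invented FASTA-formatted string. Flags can be combined with the | operator, as in the last call.

```python
import re

fasta = '>seq1 kinase\nACGTACGT\n>seq2 Kinase inhibitor\nGGCC\n'   # invented records

# MULTILINE: ^ matches at the start of every line, so each header is found
print(re.findall(r'^>(\w+)', fasta, re.MULTILINE))   # ['seq1', 'seq2']
print(re.findall(r'^>(\w+)', fasta))                 # ['seq1'] (start of string only)

# IGNORECASE: 'kinase' also matches 'Kinase'
print(re.findall(r'kinase', fasta, re.IGNORECASE))   # ['kinase', 'Kinase']

# DOTALL: . also matches '\n', so a match can run across lines
print(re.search(r'seq1.*GGCC', fasta) is None)              # True
print(re.search(r'seq1.*GGCC', fasta, re.DOTALL) is None)   # False

# Combined flags
print(re.findall(r'^>.*kinase', fasta, re.MULTILINE | re.IGNORECASE))
# ['>seq1 kinase', '>seq2 Kinase']
```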
Greedy vs. Non-Greedy
• .* and .+ return the largest possible match (they are "greedy")
• to get the shortest possible matches instead, use the non-greedy forms .*? or .+?
import re
string = '<b>foo</b> and <i>so on</i>'    # string with XML-like tags

matches = re.findall(r'<.*>', string)     # <.*>
print(matches)   # ['<b>foo</b> and <i>so on</i>']  (got the whole string)

matches = re.findall(r'<.*?>', string)    # <.*?>
print(matches)   # ['<b>', '</b>', '<i>', '</i>']  (got each tag)
Substitution
• re.sub(expression, replacement, string)
import re
text1 = 'alice@google.com and bob@abc.net'
text2 = re.sub(r'\.\w+', r'.de', text1)
print(text2)
# alice@google.de and bob@abc.de
• \1, \2 ... in the replacement refer to match group(1), group(2) ...
import re
text1 = 'alice@google.com and bob@abc.com'
text2 = re.sub(
    r'([\w\.-]+)@([\w\.-]+)',   # expression
    r'\2@\1',                    # replacement string
    text1)                       # input string
print(text2)
# google.com@alice and abc.com@bob
Introduction to Python for Biologists – Exercise
Exercise
URL
• https://cbdm.uni-mainz.de/mb17
Jupyter Notebook
• File: Regex.ipynb
• Download the file into the notebooks folder
Data file
• File: sequences.tsv
• Download the file into the data folder
Introduction to Python for Biologists – Annexes
Escape sequences
Escape Sequence   Meaning
\newline          Backslash and newline ignored
\\                Backslash (\)
\'                Single quote (')
\"                Double quote (")
\a                ASCII Bell (BEL)
\b                ASCII Backspace (BS)
\f                ASCII Formfeed (FF)
\n                ASCII Linefeed (LF)
\r                ASCII Carriage Return (CR)
\t                ASCII Horizontal Tab (TAB)
\v                ASCII Vertical Tab (VT)
\ooo              Character with octal value ooo
\xhh              Character with hex value hh
Table: Escape sequences
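A short sketch of escape sequences: each one stands for a single character, while a raw string (r'...', as used for the regular expressions) keeps the backslashes as they are. repr() shows the escapes instead of interpreting them.

```python
s = 'A\tB\n'            # tab and newline are one character each
print(len(s))           # 4
print(repr(s))          # 'A\tB\n'  (repr shows the escapes)

print('\x41', '\101')   # A A  (hex 41 and octal 101 both encode code point 65)
print('It\'s')          # It's

raw = r'\d\n'           # raw string: the backslashes are kept
print(len(raw))         # 4
print(raw)              # \d\n
```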
Common Sequence Operations
Operation              Result
x in s                 True if an item of s is equal to x, else False
x not in s             False if an item of s is equal to x, else True
s + t                  the concatenation of s and t
s * n or n * s         equivalent to adding s to itself n times
s[i]                   ith item of s, origin 0
s[i:j]                 slice of s from i to j
s[i:j:k]               slice of s from i to j with step k
len(s)                 length of s
min(s)                 smallest item of s
max(s)                 largest item of s
s.index(x[, i[, j]])   index of the first occurrence of x in s (at or after index i and before index j)
s.count(x)             total number of occurrences of x in s
Table: Sequence operations sorted in ascending priority. s and t are sequences of the same type, n, i, j and k are integers and x is an arbitrary object that meets any type and value restrictions imposed by s.
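Strings are sequences too, so the operations in the table apply to DNA strings. A sketch on a made-up sequence; note that for strings, in also tests for substrings, not only single items.

```python
dna = 'ATGCGATAG'              # made-up sequence

print('GAT' in dna)            # True (substring test for strings)
print(dna[0:3])                # ATG
print(dna[::3])                # ACT  (every 3rd base, starting at index 0)
print(dna[::-1])               # GATAGCGTA  (reversed)
print(len(dna), dna.count('A'), dna.index('G'))   # 9 3 2
print(min(dna), max(dna))      # A T
print('AT' * 3 + 'G')          # ATATATG
```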
Operations on mutable sequence types
Operation               Result
s[i] = x                item i of s is replaced by x
s[i:j] = t              slice of s from i to j is replaced by the contents of the iterable t
del s[i:j]              same as s[i:j] = []
s[i:j:k] = t            the elements of s[i:j:k] are replaced by those of t
del s[i:j:k]            removes the elements of s[i:j:k] from the list
s.append(x)             appends x to the end of the sequence (same as s[len(s):len(s)] = [x])
s.clear()               removes all items from s (same as del s[:])
s.copy()                creates a shallow copy of s (same as s[:])
s.extend(t) or s += t   extends s with the contents of t (for the most part the same as s[len(s):len(s)] = t)
s *= n                  updates s with its contents repeated n times
s.insert(i, x)          inserts x into s at the index given by i (same as s[i:i] = [x])
s.pop([i])              retrieves the item at i and also removes it from s
s.remove(x)             removes the first item from s where s[i] == x
s.reverse()             reverses the items of s in place
Table: s is an instance of a mutable sequence type, t is any iterable object and x is an arbitrary object that meets any type and value restrictions imposed by s
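A sketch applying several of these list operations step by step to a small list of bases, with the list contents after each step given in the comments:

```python
bases = ['A', 'C', 'G']
bases.append('T')          # ['A', 'C', 'G', 'T']
bases.insert(0, 'N')       # ['N', 'A', 'C', 'G', 'T']
last = bases.pop()         # removes and returns 'T': ['N', 'A', 'C', 'G']
bases.remove('N')          # ['A', 'C', 'G']
bases[1:2] = ['U', 'U']    # slice replaced: ['A', 'U', 'U', 'G']
bases.reverse()            # ['G', 'U', 'U', 'A']
print(bases, last)         # ['G', 'U', 'U', 'A'] T
```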
Built-in functions
abs()      Return the absolute value of a number.
all()      Return True if all elements of the iterable are true (or if the iterable is empty).
any()      Return True if any element of the iterable is true. If the iterable is empty, return False.
ascii()    Return a string containing a printable representation of an object (escape non-ASCII characters).
bin()      Convert an integer number to a binary string.
bool()     Convert a value to a Boolean.
chr()      Return the string representing the character with the given Unicode code point (an integer).
dict()     Create a new dictionary.
dir()      Without arguments, return the list of names in the current local scope.
float()    Convert a string or a number to floating point.
format()   Convert a value to a "formatted" representation.
help()     Invoke the built-in help system.
hex()      Convert an integer number to a hexadecimal string.
Table: Python built-in functions
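A quick sketch of some of these built-in functions on an invented sequence. all() and any() are given generator expressions here, which they consume like any other iterable.

```python
seq = 'ACGTNACGT'     # invented sequence with one undefined base (N)

only_acgt = all(b in 'ACGT' for b in seq)   # False: N is not in 'ACGT'
has_n = any(b == 'N' for b in seq)          # True
print(only_acgt, has_n)                     # False True

print(abs(-3.5))                  # 3.5
print(bin(10), hex(255))          # 0b1010 0xff
print(chr(65))                    # A
print(float('0.25') * 4)          # 1.0
print(bool([]), bool(['ATG']))    # False True
print(format(1 / 3, '.3f'))       # 0.333
```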