Introduction to Python

September 26, 2011

9/27/11 2

Bioinformatics Languages

- Low-level, compiled languages: C, C++, Java...
    Pros: performance.
    Cons: slower programming (harder to both write and read).
- Statistical languages: R, MATLAB, Octave...
    Pros: many functions are provided.
    Cons: limited applicability to non-statistical problems (and some major ones are non-free).
- Scripting languages: Python, Perl, Ruby...
    Pros: fast programming. Python is easy to read.
    Cons: slower run times and larger memory footprint (sometimes by orders of magnitude).

Python

- Started in 1989. Currently in version 2.7/3.3.
- More than most languages: very readable and clear.
- "There should be one-- and preferably only one --obvious way to do it."
- Like most scripting languages:
    - Interpreted
    - Garbage collected (no need for manual memory management)
    - High memory usage (wasteful)

Printing and Loops

Suppose we want to evaluate the expression x**2 - 5 for the integers 0 through 4:

    for x in [0,1,2,3,4]:
        result = x**2 - 5
        print(result)

- for: a Python keyword indicating a loop. The other option for loops is while.
- Four spaces: in Python, whitespace matters. Indentation shows block structure.
- [0,1,2,3,4]: a Python list, like an array in other languages. Indicated by the brackets.
- The colon shows the beginning of a block.
- A single equals sign assigns a value to a variable.
- print: writes to stdout. It is a statement in Python 2 and a built-in function in Python 3; the parenthesized form used here works in both.
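The slide mentions while as the other loop keyword. As a rough sketch (not part of the original slides), the same calculation could be written with a while loop, where the counter has to be initialized and incremented by hand:

```python
# Same calculation as the for loop, written with while.
x = 0
while x < 5:             # the condition is re-checked before every pass
    result = x**2 - 5
    print(result)
    x = x + 1            # without this line the loop would never end
```

The for version is usually preferred when the values to visit are known in advance, since there is no counter to forget.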

Running Python Interactively

- Start the Python interpreter:
    $ python
- Enter commands at the >>> prompt.

A Python Program

- Structure of a program: [figure not preserved]

A Python Program

- Write the program in the editor of your choice. There are many options.
    - See http://stackoverflow.com/questions/60784/poll-which-python-ide-editor-is-the-best

Declaring a function

- Suppose our goal is similar, but we want to separate the details of the calculation into a function.

    def little_function(n):
        return n**2 - 5

    for x in [0,1,2,3,4]:
        result = little_function(x)
        print(result)

- def: a Python keyword indicating a new function.
- little_function: the name of the function.
- n: the parameter to the function.
- return: a Python keyword indicating the value to return.
- In the loop, we replace our previous expression with a function call.

Modules and Imports

- Now suppose that instead of finding x**2 - 5, we want to find log(x).
- Python has "batteries included", meaning it has a broad standard library.

    import math

    def little_function(n):
        return math.log(n)

    # start at 0.5 rather than 0: math.log(0) raises ValueError
    for x in [0.5,1,2,3,4]:
        result = little_function(x)
        print(result)

- import: a Python keyword, placed at the beginning of the file, that makes a module available.
- math: the name of the module to import. Other common standard library modules include sys, os, re, datetime, and zlib.
- The dot is the member access (scope) operator in Python: math.log is the log function inside the math module.
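To give a feel for a few of the other standard library modules named on this slide, here is a small sketch. The file path and the coordinate string are made up for illustration; only the module functions themselves come from the standard library.

```python
import os.path
import re
import zlib

# Hypothetical path, used only to show the os.path helpers.
path = "data/reads_sample1.fastq"
file_name = os.path.basename(path)       # file name without the directory
extension = os.path.splitext(path)[1]    # extension, including the dot
print(file_name)
print(extension)

# re: find every run of digits in a (made-up) coordinate string.
numbers = re.findall(r"\d+", "chr12:1045-2210")
print(numbers)

# zlib: compress some bytes and recover them unchanged.
raw = b"GATTACA" * 10
packed = zlib.compress(raw)
restored = zlib.decompress(packed)
print(restored == raw)
```

Each function is reached through the module name and a dot, exactly as with math.log.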

Conditional Statements

- Python uses if/elif/else.
- Suppose we want to print "Less than zero." or "Zero." rather than the value.

    import math

    def little_function(n):
        return math.log(n)

    # start at 0.5 rather than 0: math.log(0) raises ValueError
    for x in [0.5,1,2,3,4]:
        result = little_function(x)
        if result < 0:
            print("Less than zero.")
        elif result == 0:
            print("Zero.")
        else:
            print(result)

- If blocks use the same colon and indent rules as for loops.
- elif and else are optional.
- Double equals tests for equality.

Strings

    s = 'Hello'
    print(s[0])       # H
    print(s[4])       # o
    print(s[-1])      # o
    print(s[1:3])     # el
    print(s[2:])      # llo
    print(s[:3])      # Hel
    print(s[::2])     # Hlo
    print(s[::-1])    # olleH
    print(len(s))     # 5

- Single or double quotes denote a string.
- Brackets access characters of the string by index.
- "Slices" can be taken with indices separated by a colon.
- The third term in a slice determines the step size.
- len() gives the length of the string.

String Methods

- s.lower(), s.upper() -- return the lowercase or uppercase version of the string

- s.strip() -- returns a string with whitespace removed from the start and end

- s.isalpha()/s.isdigit()/s.isspace()... -- test if all the string chars are in the various character classes

- s.startswith('other'), s.endswith('other') -- test if the string starts or ends with the given other string

- s.find('other') -- searches for the given other string (not a regular expression) within s, and returns the first index where it begins or -1 if not found

- s.replace('old', 'new') -- returns a string where all occurrences of 'old' have been replaced by 'new'

- s.split('delim') -- returns a list of substrings separated by the given delimiter. The delimiter is not a regular expression, it's just text. 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']. As a convenient special case, s.split() (with no arguments) splits on all whitespace chars.

- s.join(list) -- opposite of split(), joins the elements in the given list together using the string as the delimiter, e.g. '---'.join(['aaa', 'bbb', 'ccc']) -> 'aaa---bbb---ccc'
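A short sketch of several of these methods working together on one line of text. The record below is invented for illustration. Note that every method returns a new string (or list); the original string is never changed.

```python
# A made-up, comma-separated record with stray whitespace around it.
line = "  chr1,100,200,gene_a \n"

clean = line.strip()                    # drop leading/trailing whitespace
fields = clean.split(",")               # split on a plain-text delimiter
name = fields[3].upper()                # uppercase copy of the last field

print(fields)
print(name)
print(fields[0].startswith("chr"))      # does the first field start with 'chr'?
print(fields[1].isdigit())              # is the second field all digits?
print(clean.find("200"))                # index where '200' begins
print(clean.find("xyz"))                # -1 because 'xyz' is not present
print("|".join(fields))                 # glue the fields back together
```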

Practice

- Write a function that takes a string and returns another string consisting of the first two and last two characters of the input string. If the input string has fewer than two characters, return an empty string.

- Write a function that takes two strings and returns the number of times the second string appears in the first.

Lists

- Lists in Python are similar to arrays in other languages.

    z = [17, 19, 23, 29, 31]
    print(z[0])        # 17
    print(z[4])        # 31
    print(z[-1])       # 31
    print(z[-3])       # 23
    print(z[1:3])      # [19, 23]
    print(z[2:])       # [23, 29, 31]
    print(z[:3])       # [17, 19, 23]
    print(z[::-1])     # [31, 29, 23, 19, 17]
    print(range(5))    # [0, 1, 2, 3, 4]

- Square brackets indicate a list.
- Brackets also access elements in the list. Note they are 0-indexed.
- A negative index counts from the end.
- A colon indicates a "slice" from the list.
- z[::-1] is the idiom for reversing a list.
- range(n) returns a list of integers. (In Python 3 it returns a lazy range object; use list(range(n)) to get a list.)

List Methods

- list.append(elem) -- adds a single element to the end of the list. Common error: does not return the new list, just modifies the original.

- list.insert(index, elem) -- inserts the element at the given index, shifting elements to the right.

- list.extend(list2) -- adds the elements in list2 to the end of the list. Using + or += on a list is similar to using extend().

- list.index(elem) -- searches for the given element from the start of the list and returns its index. Throws a ValueError if the element does not appear (use "in" to check without a ValueError).

- list.remove(elem) -- searches for the first instance of the given element and removes it (throws ValueError if not present).

- list.sort() -- sorts the list in place (does not return it). (The built-in sorted() function, which returns a new sorted list and leaves the original untouched, is often preferred.)

- list.reverse() -- reverses the list in place (does not return it).

- list.pop(index) -- removes and returns the element at the given index. Returns the rightmost element if index is omitted (roughly the opposite of append()).
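A brief sketch of these methods in sequence, using an arbitrary list of numbers. It also shows the "common error" noted for append(): the method changes the list and returns None.

```python
primes = [17, 19, 23]

returned = primes.append(29)     # modifies primes; the return value is None
primes.insert(0, 13)             # [13, 17, 19, 23, 29]
primes.extend([31, 37])          # [13, 17, 19, 23, 29, 31, 37]
position = primes.index(23)      # index of the first 23
primes.remove(19)                # [13, 17, 23, 29, 31, 37]
last = primes.pop()              # removes and returns the rightmost element

primes.reverse()                 # in place: [31, 29, 23, 17, 13]
ordered = sorted(primes)         # new list; primes itself is unchanged

# Use "in" to test membership without risking a ValueError.
has_19 = 19 in primes

print(returned)
print(position)
print(last)
print(primes)
print(ordered)
print(has_19)
```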

Practice

- Write a function that takes a list of strings and returns a list with the strings in sorted order, except that all the strings that begin with 'x' are grouped first.

- Write a function that takes a list of numbers and returns a list where all adjacent == elements have been reduced to a single element, so [1, 2, 2, 3] returns [1, 2, 3].

Dictionaries

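- Python's key/value hash table is called a dictionary.

    d = {}
    d['a'] = 'alpha'
    d['g'] = 'gamma'
    print(d['a'])        # alpha
    # print(d['z'])      # would raise KeyError: 'z' is not a key
    if 'z' in d:
        print(d['z'])
    print(d.keys())      # ['a', 'g']
    print(d.values())    # ['alpha', 'gamma']
    print(d.items())     # [('a', 'alpha'), ('g', 'gamma')]

- Curly braces indicate a dictionary.
- d[key] = value associates keys with values.
- d[key] retrieves the value associated with a key, and raises a KeyError if the key is missing.
- Use "in" to check if a key is in the dictionary.
- keys(), values(), and items() return lists in Python 2 (in Python 3 they return view objects; wrap them in list() to get a list).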
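As a sketch of how a dictionary is typically used in practice, the function below counts how often each character occurs in a string. The function name count_bases and the example sequence are illustrative choices, not part of the original slides.

```python
def count_bases(sequence):
    """Return a dictionary mapping each character to its number of occurrences."""
    counts = {}
    for base in sequence:
        if base in counts:           # key already present: increment it
            counts[base] += 1
        else:                        # first time we see this key
            counts[base] = 1
    return counts

counts = count_bases("GATTACA")

# Sort the keys so the output order is predictable.
for base in sorted(counts.keys()):
    print(base + ": " + str(counts[base]))
```

The "in" check avoids the KeyError that counts[base] += 1 would raise the first time a new character appears.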

Files

- Files in Python are generally handled line by line.

    f = open('file.txt', 'r')
    outf = open('output.txt', 'w')
    for line in f:
        print(line)
        outf.write(line)
    f.close()
    outf.close()
    # file() exists only in Python 2; in Python 3 use open() instead
    wholefile = file('file.txt').read()
    oneline = file('file.txt').readline()

- open returns a file object.
- The second argument to open sets the mode. 'r' means read, 'w' means write. Note that write mode completely overwrites an existing file.
- You can iterate through the lines in a file using a for loop.
- The write method of a file object in write mode writes a string to the file.
- In Python 2 you can use the shorter file(filename) syntax to get a file object in read mode. file() was removed in Python 3, where open(filename) does the same job.
- The read method with no arguments returns the contents of the whole file.
- The readline() method returns a single line from a file object.
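The slide closes each file by hand. A common alternative, not shown in the original slides, is the with statement, which closes the file automatically when the block ends. The sketch below assumes a helper named copy_lines (a made-up name) and builds its own small input file in a temporary directory so that it can run anywhere.

```python
import os
import tempfile

def copy_lines(in_path, out_path):
    """Copy in_path to out_path line by line and return the number of lines."""
    count = 0
    with open(in_path, 'r') as f:
        with open(out_path, 'w') as outf:    # 'w' overwrites any existing file
            for line in f:
                outf.write(line)
                count += 1
    return count                             # both files are closed by now

# Build a throwaway input file (paths are illustrative only).
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, 'file.txt')
out_path = os.path.join(workdir, 'output.txt')
with open(in_path, 'w') as f:
    f.write('first line\nsecond line\n')

n_copied = copy_lines(in_path, out_path)
with open(out_path) as f:
    copied_text = f.read()                   # whole file as one string

print(n_copied)
print(copied_text)
```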

Command Line Arguments

- The sys module has a list called argv that contains the arguments used at the command line.

    import sys

    def main(word_to_print):
        print(word_to_print)

    if __name__ == '__main__':
        print(sys.argv)      # ['scriptname.py', 'argument1', 'argument2', ...]
        main(sys.argv[1])

- Here we select the second element of the list (index 1), since we don't care about the name of the script.
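Indexing sys.argv[1] raises an IndexError when the script is run without arguments. One possible way to guard against that is to check the length of the list first. The helper name first_argument is hypothetical, and the lists passed to it below stand in for sys.argv so that the sketch runs without a command line.

```python
def first_argument(argv):
    """Return the first real argument, or None if only the script name is present."""
    if len(argv) < 2:        # argv[0] is always the script name
        return None
    return argv[1]

# Stand-ins for sys.argv; in a real script you would pass sys.argv itself.
with_arg = first_argument(['scriptname.py', 'hello', 'world'])
without_arg = first_argument(['scriptname.py'])

print(with_arg)
print(without_arg)
```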

Practice

- Sequencing data comes in files in the FASTQ format:

    @SEQ_ID1
    GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
    @SEQ_ID2
    AGTGCGGGAAATATCACCGTACATTCATCGCCCCCCTGAACAATACCCATAGATCACTT
    +
    !''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF>>>>>>CCCCCCC65
    ...

- Write a program that reads a FASTQ file and writes only the reversed sequences to another file. The names of the input and output files should be passed as command line parameters.

Getting Help

- Python has an online tutorial and reference at http://docs.python.org/

- The "help" command gives help in interactive mode:

    >>> help(len)
    len(...)
        len(object) -> integer

        Return the number of items of a sequence or mapping.

- Google "python" + your question.

- For Windows users:
    - http://www.richarddooling.com/index.php/2006/03/14/python-on-xp-7-minutes-to-hello-world/