Posted on 18-Dec-2015
1
why to become a Pyologist
Perl is for plumbers – Python is for biologists
Stefan Maetschke, Teasdale Group
2
why
• Biologists suffer for no good reason
• Perl is difficult to write and read
• Perl gives weak error feedback
• Perl obscures basic concepts
• Limited understanding of principles
• Low productivity
• Reduced research scope
Perl is for plumbers – Python is for scientists.
I want to have an easy life.
why, why, why …
3
plumbers and others
spectrum of tasks, tools and roles:
• sys admin: plumbing; vi, awk/Perl, grep/diff
• SW developer: designing; Emacs/IDE, C/C++/Java, UML/unit tests
• scientist: Python
4
equals(Python, Perl)
Cross-platform, open-source, scripting language, multi-paradigm, dynamic typing, statement ratio relative to C: 6
               Python                     Perl
motto          There should be one way    There's more than one way
creator        Guido van Rossum           Larry Wall
released       1991                       1987
verdict        Easy                       Difficult
5
you must be joking!
http://www.strombergers.com/python/
my @list = ('a', 'b', 'c');
my %hash;
$hash{'letters'} = \@list;
print "@{$hash{'letters'}}\n";
my_list = ['a', 'b', 'c']
my_hash = {}
my_hash['letters'] = my_list
print(my_hash['letters'])
package Person;
use strict;

sub new {
    my $class = shift;
    my $age = shift or die "Must pass age";
    my $rSelf = {'age' => $age};
    bless($rSelf, $class);
    return $rSelf;
}
class Person:
    def __init__(self, age):
        self.age = age
my @list = (['a', 'b', 'c'], [1, 2, 3]);
print "@{$list[0]}\n";
my_list = [['a', 'b', 'c'], [1, 2, 3]]
print(my_list[0])
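The Perl and Python sides of this slide do not print quite the same thing: Perl's "@{...}" interpolation joins the elements with spaces ("a b c"), while Python's print shows the list's repr (['a', 'b', 'c']). A minimal sketch of the Python side with the Perl-style output made explicit; perl_style_join is a hypothetical helper introduced only for this illustration. It also shows that Python itself rejects a missing constructor argument, whereas the Perl version needs the explicit `shift or die` (which, because `or` binds more loosely than `=`, also dies for an age of 0).

```python
def perl_style_join(items):
    """Hypothetical helper: join elements with single spaces, like Perl's "@{...}" interpolation."""
    return " ".join(str(item) for item in items)

my_hash = {'letters': ['a', 'b', 'c']}
my_list = [['a', 'b', 'c'], [1, 2, 3]]

class Person:
    def __init__(self, age):
        self.age = age

def constructor_rejects_missing_age():
    """Return True if calling Person() without an age raises TypeError."""
    try:
        Person()
    except TypeError:
        return True
    return False

print(my_hash['letters'])                   # list repr, as on the slide
print(perl_style_join(my_hash['letters']))  # space-separated, as Perl prints it
print(perl_style_join(my_list[1]))          # non-string elements are converted with str()
```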
6
More Perl bashing…
http://www.strombergers.com/python/
def add(a, b): return a + b

sub add { $_[0] + $_[1]; }
sub add { my ($a, $b) = @_; return $a + $b; }
sub add { my $a = shift; my $b = shift; return $a + $b; }
sub add($$) { local ($a, $b) = @_; return $a + $b; }

def diff(a, b): return len(a) - len(b)

sub diff { my ($aref, $bref) = @_; my (@a) = @$aref; my (@b) = @$bref; return scalar(@a) - scalar(@b); }
7
complexity wall
simple scripts
≈ 100 lines => fun stops
Higher order concepts
Data structures, Functions, Classes
=> Python allows you to break through the complexity wall
everything you can do in Python you can do in Perl, but you don't
8
googliness
Google hits in thousands (kilo-hits, May 2008) for the queries "X language", "X load file" and "X bioinformatics":
X         X language   X load file   X bioinformatics
C             53,000         1,820                572
Java           7,760         2,890                320
C++            1,290         3,100                231
C#             1,020           794                161
Perl           1,150           685                101
Python           527           798                199
Ruby             470           806                186
Scala            394           354                 69
Haskell          212           323                 74
10
damn lies and stats
http://rengelink.textdriven.com/blog/
sourceforge projects: Perl declining, Python increasing?
keyword search, May 2008: Perl 3474, Python 4063
11
see the light…
classify Iris plants
Fisher, R.A. "The use of multiple measurements in taxonomic problems". Annals of Eugenics, 7, Part II, 179-188 (1936)
http://archive.ics.uci.edu/ml/datasets/Iris
Three species:
• Iris setosa
• Iris versicolor
• Iris virginica
Four attributes:
• sepal length
• sepal width
• petal length
• petal width
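To make the classification task concrete, a minimal nearest-centroid classifier in plain Python. It assumes the comma-separated layout of the UCI file iris.data (the four measurements in cm, then a label such as Iris-setosa) and Python 3.8+ for math.dist. Nearest-centroid is chosen here only for brevity and is not necessarily the method used in the talk; the function names are hypothetical. The embedded rows are the first three samples of each species from that file; real use would read all 150 rows.

```python
import csv
import io
import math
from collections import defaultdict

# First three rows of each species in the iris.data layout:
# sepal length, sepal width, petal length, petal width (cm), class
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
"""

def load_iris(text):
    """Parse CSV rows into (features, label) pairs, skipping blank lines."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 5:
            continue
        *features, label = row
        samples.append(([float(x) for x in features], label))
    return samples

def train_centroids(samples):
    """Compute the mean feature vector of each class."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    counts = defaultdict(int)
    for features, label in samples:
        counts[label] += 1
        for i, value in enumerate(features):
            sums[label][i] += value
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

centroids = train_centroids(load_iris(SAMPLE))
print(classify(centroids, [5.0, 3.4, 1.5, 0.2]))  # prints Iris-setosa
```

Swapping in a different classifier only means replacing train_centroids and classify; the parsing step stays the same.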
17
libs for life science
• Scientific computing: SciPy, NumPy, matplotlib
• Bioinformatics: BioPython
• Phylogenetic trees: Mavric, Plone, P4, Newick
• Microarrays: SciGraph, CompClust
• Molecular modeling: MMTK, OpenBabel, CDK, RDKit, cinfony, mmLib
• Dynamic systems modeling: PyDSTools
• Protein structure visualization: PyMol, UCSF Chimera
• Networks/Graphs: NetworkX, PyGraphViz
• Symbolic math: SymPy, Sage
• Wrapper for C/C++ code: SWIG, Pyrex, Cython
• R/SPlus interface: RSPython, RPy
• Java interface: Jython
• Fortran to Python: F2PY
• …
Also check out: http://www.scipy.org/Topical_Software and http://pypi.python.org/pypi
18
last words
Perl: perfect for plumbing
Python: excellent for scientific programming
• Easy to learn, write and maintain
• Suited for scripting and mid-size projects
• Huge number of scientific libraries
• An attractive alternative to Matlab/R
• Easy integration of Java, C/C++ or Fortran code
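The wrapper tools named for C/C++ and Fortran (SWIG, Pyrex, Cython, F2PY) all produce extension modules that have to be compiled. For a quick taste without a compiler, the standard-library ctypes module (swapped in here; it is not named in the talk) can call an existing C function directly. A minimal sketch, assuming a POSIX system (Linux or macOS) where the C library's strlen is reachable; c_strlen is a hypothetical wrapper name.

```python
import ctypes
import ctypes.util

# Load the C standard library. If find_library cannot locate it, it returns None,
# and CDLL(None) falls back to the symbols already loaded into the Python process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(text):
    """Hypothetical wrapper: count the UTF-8 bytes of text using C's strlen."""
    return libc.strlen(text.encode("utf-8"))

print(c_strlen("Pyologist"))  # prints 9
```

Declaring argtypes and restype is what lets ctypes convert arguments and results correctly; without them every return value is treated as a C int.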
21
links
• Wikipedia – Python: http://en.wikipedia.org/wiki/Python
• Instant Python: http://hetland.org/writing/instant-python.html
• How to think like a computer scientist: http://openbookproject.net//thinkCSpy/
• Dive into Python: http://www.diveintopython.org/
• Python course in bioinformatics: http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html
• Beginning Python for bioinformatics: http://www.onlamp.com/pub/a/python/2002/10/17/biopython.html
• SciPy Cookbook: http://www.scipy.org/Cookbook
• Matplotlib Cookbook: http://www.scipy.org/Cookbook/Matplotlib
• Biopython tutorial and cookbook: http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial.html
• Huge collection of Python tutorials: http://www.awaretek.com/tutorials.html
• What's wrong with Perl: http://www.garshol.priv.no/download/text/perl.html
• 20 Stages of Perl to Python conversion: http://aspn.activestate.com/ASPN/Mail/Message/python-list/1323993
• Why Python: http://www.linuxjournal.com/article/3882
22
some papers
• Bassi S. (2007). A Primer on Python for Life Science Researchers. PLoS Comput Biol 3(11): e199. doi:10.1371/journal.pcbi.0030199
• Mangalam H. (2002). The Bio* toolkits--a brief overview. Brief Bioinform 3(3):296-302.
• Fourment M., Gillings M.R. (2008). A comparison of common programming languages used in bioinformatics. BMC Bioinformatics 9:82.
23
to whom it may concern
• NPs who don't use Perl yet
• NPs who want to see the light
• NPs who want to give their code away without being rightfully ashamed
• Matlab aficionados
NP = Non-Programmer
24
one of ten Perl myths: http://www.perl.com/pub/a/2000/01/10PerlMyths.html
“…Perl works the way you do…”
“…That's one, fairly natural way to think about it…”
while (<>) { s/(.*):(.*)/$2:$1/; print; }
Swap two sections of a string: “aaa:bbb” -> “bbb:aaa”
import fileinput
for line in fileinput.input():
    line = line.strip()
    first, second = line.split(':')
    print(second + ':' + first)
while (<>) { chomp; ($first, $second) = split /:/; print $second, ":", $first, "\n"; }
“…we can happily consign the idea that ‘Perl is hard’ to mythology.”
import fileinput
from re import sub
for line in fileinput.input():
    print(sub(r'(.*):(.*)', r'\2:\1', line), end='')
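The two Python versions on this slide are not exact equivalents. The greedy regex (.*):(.*), like the Perl s/// one-liner, swaps at the last colon, while unpacking line.split(':') into two names raises ValueError as soon as a line contains more than one colon. A minimal sketch of a split-based version that matches the regex behaviour by using str.rpartition; the function names are hypothetical, and both functions expect a line without its trailing newline.

```python
import re

SWAP = re.compile(r'(.*):(.*)')

def swap_regex(line):
    """Regex version: the greedy first (.*) makes the swap happen at the last colon."""
    return SWAP.sub(r'\2:\1', line)

def swap_split(line):
    """Split version: rpartition also cuts at the last colon."""
    first, sep, second = line.rpartition(':')
    if not sep:  # no colon: leave the line unchanged, as the regex does
        return line
    return second + ':' + first

for text in ['aaa:bbb', 'a:b:c', 'no colon']:
    print(swap_regex(text), swap_split(text))
```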
25
camel chaos
• does not scale well
• complex syntax
• cryptic commands
• does not encourage clear code
• difficult to read/maintain
• hard to understand the principles
• error prone
• no check of subroutine arguments
• variables are global by default
• …
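For contrast with the last two points, a minimal sketch of the corresponding Python defaults: calling a function with the wrong number of arguments raises TypeError, and assigning to a name inside a function creates a local variable unless it is explicitly declared global. The names add, bump and counter are illustrative only.

```python
counter = 0  # module-level (global) name

def add(a, b):
    return a + b

def bump():
    counter = 1  # assignment makes 'counter' local to bump(); the global is untouched
    return counter

def call_with_missing_argument():
    """Call add() with one argument and return the resulting error message."""
    try:
        add(1)
    except TypeError as err:
        return str(err)
    return "no error"

print(call_with_missing_argument())  # e.g. add() missing 1 required positional argument: 'b'
print(bump(), counter)               # prints 1 0
```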
26
why Python
• overcome the complexity wall
• many excellent scientific libraries
• clear, easy-to-learn syntax
• hard to do it wrong
• does not require prior suffering/experience
27
my bias
• R&D: C/C++ -> applied ML in robotics, image processing, quality control
• SW development: Java -> speech processing, data mining
• Computational biology: Java, Python
• Other languages I played with: Ada, APL, Basic, Matlab, Modula, Pascal, Perl, Prolog, R, Groovy, Forth, Fortran, Scala, assembly code