CODING!
Background for the bioinformatic perspective
Samantha Sanford, Will Foran
Overview
• Why?
  – Examples
• Speaking Computer-ese
  – How
  – What
  – Environment (Windows)
• Basic Instructions
  – Declare
  – Conditional
  – Loop
  – Input
• Write a quiz game
• Programming in bioinformatics
  – Transcription
  – Gene Structure
  – BLAST
Why bother?
• Computers do tedious work without complaining
• Computers don’t make arithmetic errors (but they can make a rounding error)
• We want to be lazy
• It’s fun!
Things computers are good at
• Declare a variable
• Define a data structure
• Test a condition
• Iterate through a loop
• Get input
The usefulness of programming
• Calculating huge equations accurately
  – Projectile tables
  – Discrete models
• Searching
  – Quickest route
  – Matches to a database
  – Faces in a picture
• Examples?
Examples in biology
• Make diagnoses
  – Identify mutations
  – Organize and decode test results (fMRI)
• Develop treatments
  – Create and parse connected graphs
• Research
  – Simulating an environment or reaction (protein folding)
  – Deriving probabilities of dependent states
What we are going to do
• Transcribe DNA
• Identify genomic positions
• Parse a BLAST report
Ways to speak
• Piano roll, weaving loom punch cards (1800s)
• FORTRAN (1950s)
• C++, Perl (1980s)
• Ruby, Python (1990s)
• Go, C#, Clojure (2000s)
How to communicate
Write Code → Computer Interprets → Programmed Results
What we need for today
• A text editor to store commands (code) in a text (source) file
  – We will use Notepad++
• A command line to talk to the computer (execute code)
  – We’ll use Cygwin to call Perl on our code
Process
• Edit source file
• Save
• Execute source file (run)
• View results
• Repeat
Launch Cygwin
1. Launch Cygwin from the desktop
2. Type (all on one line):
   wget -qO- http://euler.phys.cmu.edu/wforan1/sams/get.sh | bash
   (the option after wget is: hyphen, lowercase q, capital O, hyphen)
This will retrieve all the code and launch Notepad++
Get Organized
• Use “hot corners” to organize the command line and the text editor
Change directory and play
In Cygwin
  Change directory to code:   cd code
  List files:                 ls
  Execute 00_start_here.pl:   perl 00_start_here.pl
                              OR try “tab completion”: type perl 0, then push the Tab key
In Notepad++
  Edit file:                  edit between the “ and ”
  Save file:                  File->Save OR Ctrl+S
In Cygwin
  Re-run:                     perl 00_start_here.pl OR type perl 0, then push Tab
Run                     Edit and Save   Run
Cygwin                  Notepad++       Cygwin
perl 00_start_here.pl   Ctrl+S          perl 00_start_here.pl
Important things to remember
• SAVE your changes before RUNNING
  – Every time you edit your file and want to run it, save first, or the new code will not run (it’s not saved!)
• In a string of text, always remember to close your quotation marks: “bioinformatics”
• “ ”, ( ), and { } should exist in pairs
• Always remember the ; semicolon at the end of each command.
Reverse program
• In Notepad++, replace the first (and only) line with:
  print scalar reverse("TTGAGC");
• What happened?
• Why is this not the reverse complement?
Edit and Save   Run
Notepad++       Cygwin
Ctrl+S          perl 00_start_here.pl
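To see why a plain reverse is not the reverse complement, here is a minimal sketch that reverses the string and then swaps each base for its pairing partner. The helper name reverse_complement is made up for this example and is not part of the workshop code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reverse a DNA string, then swap A<->T and C<->G.
sub reverse_complement {
    my ($dna) = @_;
    my $rc = reverse $dna;            # assigning to a scalar reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # replace every base with its partner
    return $rc;
}

print reverse_complement("TTGAGC"), "\n";   # prints GCTCAA
```

The plain reverse of TTGAGC is CGAGTT; only after the base swap does it become GCTCAA.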
Proper Perl
• We’ve only looked at executing one line of code
• We need a bit more “fluff” for more complicated programs
• Let’s check out how Perl code usually starts
Proper Perl
• Open file boiler.pl
• #! is a “shebang” or “hashbang”
  – It directs the computer to the interpreter (Perl)
• use tells Perl to use a package
  – warnings and strict help report errors
• Statements are ended with ;
• Anything after a “#” is not seen by the computer (called a comment)
Open
Notepad++
Ctrl+O OR File->Open, then select boiler.pl
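A file that starts this way might look like the sketch below. The actual contents of boiler.pl are not reproduced here, so the interpreter path and the $message variable are assumptions made for illustration.

```perl
#!/usr/bin/perl
# The shebang line above points the computer at the Perl interpreter.
# (The path /usr/bin/perl is an assumption; it can differ between systems.)
use strict;      # among other checks, variables must be declared with "my"
use warnings;    # report likely mistakes while the program runs

# Everything after a # is a comment and is ignored by Perl.
my $message = "Ready to code!";   # hypothetical variable; the statement ends with ;
print "$message\n";
```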
Coding Examples
• We’re ready to look at some examples of
  – Variables: my $variable
  – Conditionals: if( ){ } else{ }
  – Input capture: <STDIN>
• Try to determine the output of the following code
Variables: code/variable.pl
What is the output?
Run                 Open                   Edit and Save   Run
Cygwin              Notepad++              Notepad++       Cygwin
perl variable.pl    Ctrl+O OR File->Open   Ctrl+S          perl variable.pl
OR perl v + Tab     Select variable.pl                     OR push ↑ (up arrow)
Were you right?
wforan1@ug142 ~/code$ perl variable.pl
Pyrimidine contains 2 nitrogen atoms on its ring(s)
Change it up
wforan1@ug142 ~/code$ perl variable.pl
Pyrimidine contains 2 nitrogen atoms on its ring(s)
wforan1@ug142 ~/code$ perl variable.pl
Purine contains 4 nitrogen atoms on its ring(s)
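The source of variable.pl is not shown here. A program along the following lines would produce the first output above; the variable names $base and $nitrogens are guesses, not the workshop's actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declare two variables (names are assumptions for this sketch).
my $base      = "Pyrimidine";
my $nitrogens = 2;

# Variables inside double quotes are replaced by their values.
my $sentence = "$base contains $nitrogens nitrogen atoms on its ring(s)";
print "$sentence\n";
```

Changing the two values to "Purine" and 4 and saving before re-running would give the second line of output.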
Variables: code/length.pl
What is the output?
Run              Open               Edit and Save   Run
Cygwin           Notepad++          Notepad++       Cygwin
perl length.pl   Ctrl+O length.pl   Ctrl+S          perl length.pl or ↑
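As a rough idea of what a length example can look like (the real length.pl is not shown, and the sequence below is made up), Perl's built-in length function counts the characters in a string:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "ATGCGTTAG";      # made-up sequence for this sketch
my $len = length($dna);     # number of characters in the string

print "The sequence $dna is $len bases long\n";
```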
Conditional: code/conditional.pl
Run                   Open                    Edit and Save   Run
Cygwin                Notepad++               Notepad++       Cygwin
perl conditional.pl   Ctrl+O conditional.pl   Ctrl+S          perl conditional.pl or ↑
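A conditional picks one of two paths depending on a test. The sketch below is an illustration only (it is not the contents of conditional.pl) and assumes $base always holds one of A, C, G, or T.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $base = "A";   # try changing this to C, G, or T
my $type;

# "eq" compares text; "or" is true when either side is true.
if ($base eq "A" or $base eq "G") {
    $type = "purine";
}
else {
    $type = "pyrimidine";   # assumes the only other bases are C and T
}

print "$base is a $type\n";
```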
Loops: code/loop2.pl
Run             Open              Edit and Save   Run
Cygwin          Notepad++         Notepad++       Cygwin
perl loop2.pl   Ctrl+O loop2.pl   Ctrl+S          perl loop2.pl or ↑
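A loop repeats the same commands for every item in a list. As an illustration (this is not the contents of loop2.pl), the sketch below walks through a sequence one base at a time and counts the G and C bases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = "TTGAGC";
my $gc  = 0;

# split // breaks the string into single characters;
# foreach runs the block once for each of them.
foreach my $base (split //, $dna) {
    if ($base eq "G" or $base eq "C") {
        $gc += 1;
    }
}

print "$dna has $gc G or C bases\n";
```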
Input: code/in1.pl
Run           Open            Edit and Save   Run
Cygwin        Notepad++       Notepad++       Cygwin
perl in1.pl   Ctrl+O in1.pl   Ctrl+S          perl in1.pl or ↑
Exercise 1 – Transcription: DNA → RNA
• We are going to transcribe DNA to RNA
• This will require
  – Changing the directory in Cygwin
  – Opening a file in a new directory with Notepad++
  – Exploring unknown code
• Listen to the instructions first, then proceed
  – Visit the URL for more precise directions
http://malaria.phys.cmu.edu/sams/exercise/3-1.html
Exercise 1 – Transcription: DNA → RNA
http://malaria.phys.cmu.edu/sams/exercise/3-1.html
1. Run exercise
   – cd ~ (go to home directory)
   – cd exercise/code/dna2rna/ (go to exercise dir)
   – ls (list files, check for dna2rna.pl)
   – perl dna2rna.pl (run exercise 1 program)
     • enter sequence.txt (use this sequence file)
2. Open exercise file in Notepad++
   • Ctrl+O (File->Open)
   • Go back one directory,
   • then click on exercise
   • then code
   • then dna2rna
   • Open dna2rna.pl
3. Look for unfamiliar code
   1. What is unfamiliar?
   2. What does it do?
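The core of a DNA to RNA conversion can be sketched in a few lines. This is not the code in dna2rna.pl (which also reads a sequence file); it assumes the input is the coding strand, so transcription amounts to replacing every T with U. The helper name transcribe is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a coding-strand DNA string into RNA.
sub transcribe {
    my ($dna) = @_;
    my $rna = uc($dna);    # work in uppercase
    $rna =~ tr/T/U/;       # replace every T with U
    return $rna;
}

print transcribe("TTGAGC"), "\n";   # prints UUGAGC
```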
Exercise 2: Gene Structure
• Background
  – Remember gene structure? (intron, exon, intergenic)
  – We will use the human X chromosome
    • What is significant about this chromosome?
• Requires “print” and using defined variables
  my $variable = "badfadfy9osudlfj";
  print "my variable is $variable,";
http://malaria.phys.cmu.edu/outreach/exercise/3-2.html
Exercise 2: Gene Structure
• Open http://malaria.phys.cmu.edu/outreach/exercise/3-2.html
• Steps 1 and 2 are already completed for you (chrX has been downloaded)
• Steps 3-4: cd and run
• Step 5: open and edit the help message
• Step 6: get position status
• Step 7: modify output
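To give a feel for what "position status" means, the sketch below labels a position as exon, intron, or intergenic. The gene boundaries and exon coordinates are invented for illustration; the exercise itself uses real chrX annotation, and its code is not reproduced here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up gene model: one gene from 100 to 500 containing two exons.
my ($gene_start, $gene_end) = (100, 500);
my @exons = ( [100, 200], [350, 500] );

# Hypothetical helper: report where a position falls.
sub position_status {
    my ($pos) = @_;
    # Outside the gene entirely
    return "intergenic" if ($pos < $gene_start || $pos > $gene_end);
    # Inside one of the exons
    foreach my $exon (@exons) {
        my ($start, $end) = @$exon;
        return "exon" if ($pos >= $start && $pos <= $end);
    }
    # Inside the gene but between exons
    return "intron";
}

foreach my $pos (50, 150, 250) {
    my $status = position_status($pos);
    print "position $pos is $status\n";
}
```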
Exercise 3: BLAST
• We are going to parse a BLAST report
• First, we need to save a BLAST search for
  ATATGCGTGCTAGTGCAGTGGGTGGTAGCGTGAATGC
  to C:\cygwin\home\[username]\exercise\code\blast as blast_report
• Now we can code!
http://malaria.phys.cmu.edu/outreach/exercise/3-3.html
Exercise 3: BLAST
http://malaria.phys.cmu.edu/outreach/exercise/3-3.html
3. cd
4. run
5. Print all less than 100%
6. Modify the output to display something different
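One way to approach step 5 is sketched below. It rests on two assumptions: that "100%" refers to percent identity, and that the report is in BLAST's tab-separated tabular layout, where the second column is the subject name and the third is percent identity. The saved blast_report may use a different layout, and the three lines of data here are invented purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented example lines in tabular layout (not real BLAST results).
my @report = (
    "query1\thitA\t100.00\t37\t0\t0\t1\t37\t101\t137\t2e-12\t69.4",
    "query1\thitB\t97.30\t37\t1\t0\t1\t37\t55\t91\t9e-11\t63.9",
    "query1\thitC\t91.89\t37\t3\t0\t1\t37\t8\t44\t4e-07\t52.8",
);

# Hypothetical helper: return the hits whose identity is below the cutoff.
sub hits_below {
    my ($cutoff, @lines) = @_;
    my @hits;
    foreach my $line (@lines) {
        next if $line =~ /^#/;                      # skip comment lines
        my @fields = split /\t/, $line;             # one field per column
        my ($subject, $identity) = @fields[1, 2];   # columns 2 and 3
        push @hits, "$subject ($identity%)" if $identity < $cutoff;
    }
    return @hits;
}

print "$_\n" foreach hits_below(100, @report);
```

Reading from the saved file instead of the array would mean opening blast_report and looping over its lines in the same way.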
Write a quiz
• Declare variables
  – my $numRight = 0; my $userIn;
• Print question
  – print "What is my favorite color? ";
• Get input
  – $userIn = <STDIN>; chomp($userIn);
• Count correct answers
  – if($userIn eq "pink"){ $numRight+=1 }
• Tell the player how many they got right
  – print "You got $numRight correct\n";
• BONUS: Give percent right
  – print 100*$numRight/3;
• BONUS: Count the number wrong
Run            Open                    Edit and Save   Run
Cygwin         Notepad++               Notepad++       Cygwin
perl quiz.pl   Ctrl+O quiz.pl          Ctrl+S          perl quiz.pl or ↑
               OR Ctrl+N (File->New)
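Putting the pieces together, a complete quiz might look like the sketch below. It is one possible arrangement, not a reference solution: the second and third questions, the run_quiz helper, and the choice to ignore upper/lower case are all additions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Question/answer pairs (the last two are made up for this sketch).
my @quiz = (
    [ "What is my favorite color?",       "pink" ],
    [ "Which base replaces T in RNA?",    "u"    ],
    [ "How many bases are in one codon?", "3"    ],
);

# Ask every question, reading answers from the filehandle $in,
# and return the number answered correctly.
sub run_quiz {
    my ($in) = @_;
    my $numRight = 0;
    foreach my $item (@quiz) {
        my ($question, $answer) = @$item;
        print "$question ";
        my $userIn = <$in>;
        $userIn = "" unless defined $userIn;   # nothing typed (end of input)
        chomp($userIn);                        # remove the trailing newline
        if (lc($userIn) eq $answer) { $numRight += 1; }
    }
    return $numRight;
}

my $total    = scalar @quiz;
my $numRight = run_quiz(\*STDIN);
my $numWrong = $total - $numRight;

print "You got $numRight correct and $numWrong wrong\n";
printf "That is %.0f%% right\n", 100 * $numRight / $total;
```

Keeping the questions in a list means the percent calculation uses the real number of questions rather than a hard-coded 3.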