1 bioperl. 2 object oriented programming continued – bioperl install


Upload: cameron-karin-woods

Post on 28-Dec-2015


Page 1: 1 BioPerl. 2 Object Oriented Programming Continued – BioPerl Install


BioPerl


Object Oriented Programming Continued – BioPerl Install


System Config

file: .ncbirc
[NCBI]
Data=~/blast/blast-2.2.6/data
# this tells blast applications where certain data files are located, such as BLOSUM matrices

file: .cshrc (for tcsh/csh)
# this line tells blast applications where to look for pre-formatted DBs
setenv BLASTDB ~/blast/blast-2.2.6/blastdb
# this line is a plain environment variable; it adds the blast applications to the "path" variable
set path = ( $path ~/blast/blast-2.2.6 )

Example
blastall -p blastn -i seq -d yeast.nt
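A quick way to check that the two settings above took effect is to ask Perl what it sees. This is only a sketch: the helper names find_in_path and blast_env_report are made up for illustration, and it assumes the program is called blastall as in the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full path of $prog if an executable file of that name exists
# in one of the PATH directories; otherwise return nothing (undef).
sub find_in_path {
    my ($prog) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $prog );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

# Build a two-line report: the BLASTDB variable and where blastall was found.
sub blast_env_report {
    my $db  = defined $ENV{BLASTDB} ? $ENV{BLASTDB} : '(not set)';
    my $exe = find_in_path('blastall');
    return "BLASTDB=$db\n"
        . 'blastall=' . ( defined $exe ? $exe : '(not on PATH)' ) . "\n";
}

print blast_env_report();
```

If either line reports "not set" or "not on PATH", the .cshrc changes probably have not been sourced in the current shell.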


Example – Bio::SearchIO

#!/usr/bin/perl
# blastPars.pl
# taken almost verbatim from perldoc Bio::SearchIO

use strict;
use Bio::SearchIO;

my $in = new Bio::SearchIO(-file => 'bmp4.out');
while ( my $result = $in->next_result ) {
    while ( my $hit = $result->next_hit ) {
        while ( my $hsp = $hit->next_hsp ) {
            print "Hit= ", $hit->name,
                  ",Length=", $hsp->length('total'),
                  ",Percent_id=", $hsp->percent_identity,
                  ",hit_string=", $hsp->hit_string, "\n";
        }
    }
}


Hit= mm3_dna,Length=14,Percent_id=100,hit_string=ttaattgtaatttt
Hit= mm3_dna,Length=13,Percent_id=100,hit_string=cttccctcctccc
Hit= mm3_dna,Length=13,Percent_id=100,hit_string=ggcaataacacca
Hit= mm3_dna,Length=12,Percent_id=100,hit_string=ccttttaggcca
Hit= mm3_dna,Length=12,Percent_id=100,hit_string=tgttttaatcat
Hit= mm3_dna,Length=12,Percent_id=100,hit_string=gttattttgttt
Hit= mm3_dna,Length=12,Percent_id=100,hit_string=tccttctctttt
Hit= mm3_dna,Length=12,Percent_id=100,hit_string=taaactgttaaa
Hit= mm3_dna,Length=11,Percent_id=100,hit_string=caaaaggagga
Hit= mm3_dna,Length=11,Percent_id=100,hit_string=tcaaagtaaat

HW – Part 2


Installing bioperl

Condensed instructions for installing bioperl on CSS (aka ICAEN) (see also the INSTALL file that comes with bioperl).
First, look at www.cpan.org

0) perl --version
   Note: this will only work with perl version 5.6.0 or higher (so on CSS, use NX-client).

Download bioperl-1.4.tar.gz (~3 Meg) from www.bioperl.org/DIST, OR www.cpan.org/modules/01modules.index.html, or ftp.cpan.org in /pub/CPAN/modules/by-module/Bio:

1) ftp ftp.cpan.org
1.1) bin
1.2) cd pub/CPAN/modules/by-module/Bio
1.3) get bioperl-1.*.*.tar.gz   # bioperl-1.4.tar.gz
1.4) quit

2) gunzip bioperl-1.4.tar.gz
3) tar -xvf bioperl-1.4.tar   (this is approximately 13 meg)


Installing bioperl

3.5) mkdir ~/perl
3.6) mkdir ~/perl/bioperl

3.8) cd bioperl-1.4

4) perl Makefile.PL PREFIX=~/perl/bioperl   (if you do it this way, the "lib" won't work)

   make
   make test
   make install   (see installing in private space on next slides)

To uninstall, just delete ~/perl/bioperl and ~/perl/bioperl-1.4


5) To use:

#!/usr/local/bin/perl

use lib "~/perl/bioperl/";   # this is supposed to work, but did NOT on CSS (Perl does not expand "~" inside a string)
use Bio::SearchIO;

Instead, set environment variable:

Bash
5.1) PERL5LIB=~/perl/bioperl/lib/perl5/site_perl/5.10.0; export PERL5LIB

Csh
5.1) setenv PERL5LIB ~/perl/bioperl/lib/perl5/site_perl/5.10.0

Mac (bash)
5.1) PERL5LIB=~/perl/bioperl/lib/perl5/site_perl; export PERL5LIB

6) To make docs work (I would just put this in your .cshrc file):
Bash: PATH=$PATH:~/perl/bioperl/lib/site_perl/5.10.0; export PATH
Csh: set path = ($path ~/perl/bioperl/lib/site_perl/5.10.0)

Test with: perldoc Bio::SearchIO

7) Test with sample program

FINALLY, please note that the version numbers change over time, and the actual paths may vary a little between CPAN and/or bioperl.org

It may take some trial and error (it usually does for me).
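Since finding the right path takes trial and error, a small check script can shorten the loop. The sketch below is only an illustration (the helper name module_available is made up): it tries to load Bio::SearchIO and, if that fails, prints the directories Perl actually searched so they can be compared with the PERL5LIB setting.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the named module can be loaded from the current @INC, else 0.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Bio::SearchIO -> Bio/SearchIO.pm
    return eval { require $file; 1 } ? 1 : 0;
}

if ( module_available('Bio::SearchIO') ) {
    print "Bio::SearchIO loaded from $INC{'Bio/SearchIO.pm'}\n";
}
else {
    print "Bio::SearchIO not found. Perl searched:\n";
    print "  $_\n" for @INC;
}
```

Directories listed in PERL5LIB should show up near the top of the printed search list; if they do not, the variable was probably not exported in the shell that started the script.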


Try it – Bio::SearchIO

#!/usr/bin/perl
# blastPars.pl
# taken almost verbatim from perldoc Bio::SearchIO

use strict;
use Bio::SearchIO;

my $in = new Bio::SearchIO(-file => 'bmp4.out');
while ( my $result = $in->next_result ) {
    while ( my $hit = $result->next_hit ) {
        while ( my $hsp = $hit->next_hsp ) {
            print "Hit= ", $hit->name,
                  ",Length=", $hsp->length('total'),
                  ",Percent_id=", $hsp->percent_identity,
                  ",hit_string=", $hsp->hit_string, "\n";
        }
    }
}


INSTALLING BIOPERL IN A PERSONAL OR PRIVATE MODULE AREA

If you lack permission to install perl modules into the standard site_perl/ system area you can configure bioperl to install itself anywhere you choose. Ideally this would be a personal perl directory or standard place where you plan to put all your 'local' or personal perl modules.

Note: you _must_ have write permission to this area.

Simply pass a parameter to perl as it builds your system specific makefile.

Example:

perl Makefile.PL LIB=/home/users/dag/My_Local_Perl_Modules
make
make test
make install

This tells perl to install bioperl in the desired place, e.g.: /home/users/dag/My_Local_Perl_Modules/Bio/Seq.pm

Then in your Bioperl script you would write (NOTE ~/dag/My_Local_Perl_Modules will NOT work):

use lib "/home/users/dag/My_Local_Perl_Modules";
use Bio::Seq;

To see "perldoc Bio::SearchIO", you would need to be in directory ~/dag/My_Local_Perl_Modules
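The "~" form fails because Perl, unlike the shell, does not expand a tilde inside a string. One possible workaround, sketched here with a made-up helper named expand_tilde, is to build the path from $ENV{HOME} before handing it to "use lib". The directory name is the one from the slide and does not need to exist for the script to run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace a leading "~" (followed by "/" or end of string) with the given
# home directory (default: $ENV{HOME}). Other paths are returned unchanged.
sub expand_tilde {
    my ( $path, $home ) = @_;
    $home = $ENV{HOME} unless defined $home;
    return $path unless defined $home;
    $path =~ s{^~(?=/|\z)}{$home};
    return $path;
}

# "use lib" runs at compile time; expand_tilde() is already compiled
# at this point, so it can be called here.
use lib expand_tilde('~/My_Local_Perl_Modules');

# Show the resulting search path, one directory per line.
print "$_\n" for @INC;
```

Writing the absolute path by hand, as the slide does, is simpler; the helper only pays off when the same script has to run under several accounts.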


SearchIO.pm

http://www.bioperl.org/wiki/HOWTO:SeqIO

References:

http://bioperl.org/

http://bioperl.org/Core/Latest/faq.html

http://cpan.org


More notes on bioperl: Windows
http://bioperl.org/SRC/bioperl-live/INSTALL.WIN

1) Quick instructions for the impatient, lucky, or experienced user.
==========================================
Download the ActivePerl MSI from http://www.activestate.com/Products/ActivePerl/
Run the ActivePerl Installer (accepting all defaults is fine).

Open a command prompt (Menus Start->Run and type cmd) and run the PPM shell (C:\>ppm).
Add three new PPM repositories with the following commands:
ppm> rep add Bioperl http://bioperl.org/DIST
ppm> rep add Kobes http://theoryx5.uwinnipeg.ca/ppms
ppm> rep add Bribes http://www.Bribes.org/perl/ppm

Install Bioperl with the following commands:

ppm> search Bioperl

This returns a numbered list of packages with corresponding version numbers etc. with "Bioperl" in their name.

ppm> install <number>

Where <number> corresponds to the relevant package and version from the numbered list obtained above.

Go to http://www.bioperl.org and start reading documentation.


Another way

"cpan"

ppm


Windows blast binaries?

ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/blast-2.2.14-ia32-win32.exe


Bioperl

• large collection of Perl modules (extensions to the Perl language) that aid in the task of writing Perl code

• assists with sequence data and associated annotation

• access to various types of databases, remote (GenBank, EMBL etc.) and local (MySQL, flat files, GFF etc.), for storage and retrieval of sequences

• associated documentation and mailing list (community of bioinformaticists)


Bioperl

• "most" bioinformatics and computational biology applications are developed in Unix/Linux environments

• more and more programs are being ported to other operating systems like Windows, and many users (often biologists with little background in programming) are looking for ways to automate bioinformatics analyses in the Windows environment.

• Perl and Bioperl can be installed natively on Windows NT/2000/XP.

• Most of the functionality of Bioperl is available with this type of install


Bioperl

• Some programs (BLAST for example) have been ported to Windows.

• These can be installed and work quite happily with Bioperl in the native Windows environment.

• If you have a fairly simple project, only have access to a computer running Windows, and/or don't mind bumping up against some limitations, then Bioperl on Windows may be a good place for you to start.

• For example, downloading a bunch of sequences from GenBank and sorting out the ones that have a particular annotation or feature works great.

• Running a bunch of your sequences against remote or local BLAST, parsing the output and storing it in a MySQL database would be fine also.

• Be aware that most Bioperl developers are working in some type of a UNIX environment (Linux, OSX, Cygwin).

• If you have problems with Bioperl that are specific to the Windows environment, you may be blazing new ground and your pleas for help on the Bioperl mailing list may get few responses - simply because no one knows the answer to your Windows specific problem.

• One solution to this problem that will keep you working on a Windows machine is to install Cygwin, a UNIX emulation environment for Windows.


Bioperl

• Perl is a programming language that has been extended a lot by the addition of external modules.

• These modules work with the core language to extend the functionality of Perl.

• Bioperl is one such extension to Perl.

• These modular extensions to Perl sometimes depend on the functionality of other Perl modules and this creates a dependency.

• Some Perl modules are so fundamentally useful that the Perl developers have included them in the core distribution of Perl - if you've installed Perl then these modules are already installed


Bioperl

• Bioperl is actually a large collection of Perl modules (over 1000 currently) and these modules are split into six packages.


Bioperl

Bioperl Group        Functions
-----------------------------------------------------------------
bioperl (the core)   Most of the main functionality of Bioperl.
bioperl-run          Wrappers to a lot of external programs.
bioperl-ext          Interaction with some alignment functions and the Staden package.
bioperl-db           Using bioperl with BioSQL and local relational databases.
bioperl-microarray   Microarray specific functions.
bioperl-gui          Some preliminary work on a graphical user interface to some Bioperl functions.


Miscellaneous

Various commands and techniques that did not make it into other sections.

Useful as a review

Valuable (I've used them)


split

split /PATTERN/, EXPR, LIMIT
split /PATTERN/, EXPR
split /PATTERN/
split

-- returns a list of strings
-- scans the string EXPR
-- splits the EXPR string into a list of substrings by delimiters
-- delimiters are defined by repeated pattern matching of the regular expression PATTERN
-- if it doesn't match, the whole string (EXPR) is returned
-- if it matches once, you get 2 strings, etc.
-- if PATTERN is omitted, it splits on whitespace (/\s+/) after skipping leading whitespace
-- if EXPR is omitted, it splits $_
-- if LIMIT is specified, it splits the string into NO MORE than that many fields


Examples

@words = split ' ', $text;

@tokens = split /[ ,]+/, $text;    # inside [ ] a "|" would be a literal pipe, so it is left out

($login, $passwd, $remainder) = split /:/, $_, 3;

# this splits on ":" – and assigns the first 2 to variables, then the rest is stored in $remainder because of the limit (3)
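The effect of LIMIT is easiest to see on a concrete line. The sketch below uses a made-up passwd-style record and a made-up helper name (split_passwd_line); it also shows the "no match" case from the split rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one passwd-style line into at most 3 fields.
sub split_passwd_line {
    my ($line) = @_;
    my ( $login, $passwd, $remainder ) = split /:/, $line, 3;
    return ( $login, $passwd, $remainder );
}

# Made-up record, for illustration only.
my $line = 'dag:x:1001:100:Example User:/home/users/dag:/bin/tcsh';
my ( $login, $passwd, $remainder ) = split_passwd_line($line);

print "login     = $login\n";
print "passwd    = $passwd\n";
print "remainder = $remainder\n";    # everything after the second ":"

# With no match, split returns the whole string as a single field.
my @fields = split /:/, 'no-colons-here';
print scalar(@fields), " field(s)\n";
```

Without the limit, the same split would produce seven fields and the last four would be silently dropped by the three-variable assignment.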


Split Examples

#!/usr/bin/perl

$text = "this is a test";
@words = split ' ', $text;
foreach (@words) {
    print "$_\n";
}
# this is a test (on separate lines)

$text = "this is another, simple test";
@words = split /[ ,]+/, $text;    # inside [ ] a "|" would be a literal pipe, so it is left out
foreach (@words) {
    print "$_\n";
}
# this is another simple test (on separate lines)


splice

splice ARRAY, OFFSET, LENGTH, LIST
splice ARRAY, OFFSET, LENGTH
splice ARRAY, OFFSET

-- removes the elements designated by OFFSET and LENGTH from ARRAY, and replaces them with LIST (if given)
-- if LENGTH is omitted, then everything from OFFSET onward is removed
-- returns the elements removed


splice examples

#!/usr/bin/perl

@array = qw/one two 3 4 5 six seven/;
@middle = ("three", "four", "five");

print "@array\n";
# one two 3 4 5 six seven
splice @array, 2, 3, @middle;
print "@array\n";
# one two three four five six seven
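The example above does not use the return value or the two-argument form. A short sketch of both, reusing the same array (the variable names @removed and @tail are chosen here for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = qw/one two 3 4 5 six seven/;

# splice returns the elements it removed.
my @removed = splice @array, 2, 3, qw/three four five/;
print "removed: @removed\n";
print "array:   @array\n";

# With LENGTH omitted, everything from OFFSET to the end is removed.
my @tail = splice @array, 5;
print "tail:    @tail\n";
print "array:   @array\n";
```

Capturing the return value is handy when the removed elements need to be moved somewhere else rather than thrown away.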


glob

glob EXPR

-- returns the file name expansions of EXPR

@files = glob "*";

@files = glob ".* *"; #multiple patterns separated by spaces

@files = glob "*.pl";
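To try glob without depending on whatever is in the current directory, the sketch below creates a scratch directory with three empty files and lists them by pattern. The file names and the helper names_matching are made up; it also assumes the temporary directory path contains no spaces, since glob treats spaces as pattern separators.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;
use File::Basename qw(basename);

# Create a scratch directory with a few empty files (removed at exit).
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(a.pl b.pl notes.txt)) {
    my $path = File::Spec->catfile( $dir, $name );
    open my $fh, '>', $path or die "cannot create $path: $!";
    close $fh;
}

# Return the sorted base names of the files in $where matching $pattern.
sub names_matching {
    my ( $where, $pattern ) = @_;
    my @paths = glob( File::Spec->catfile( $where, $pattern ) );
    my @names = sort map { basename($_) } @paths;
    return @names;
}

print join( ' ', names_matching( $dir, '*.pl' ) ), "\n";
print join( ' ', names_matching( $dir, '*' ) ),    "\n";
```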


system

system LIST
-- executes any program on the system

#!/usr/bin/perl

system("/mnt/r0-blastdb/blast-bin/blastall -p blastn -d /mnt/r0-blastdb/FormattedDBs/nt -i test.txt -o test.out");
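system also reports whether the program succeeded, which is worth checking before parsing any output file. The sketch below uses the list form of system (no shell involved) and a made-up helper, run_command; the running perl interpreter ($^X) stands in for blastall so the example works on any machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a command given as a list and return its exit code,
# or -1 if the program could not be started.
# (Termination by a signal is not distinguished in this sketch.)
sub run_command {
    my (@cmd) = @_;
    my $status = system(@cmd);
    return -1 if $status == -1;
    return $status >> 8;    # the exit code is in the high byte
}

# $^X is the path of the running perl, used only as a stand-in program.
my $exit = run_command( $^X, '-e', 'exit 3' );
print "exit code: $exit\n";
```

For the BLAST call, the same helper would presumably be used as run_command('/mnt/r0-blastdb/blast-bin/blastall', '-p', 'blastn', ...), with each option and value as a separate list element.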



back ticks

#!/usr/bin/perl

$output = `/mnt/r0-blastdb/blast-bin/blastall -p blastn -d /mnt/r0-blastdb/FormattedDBs/nt -i test.txt -o test.out`;

print "$output\n";

# command is passed on, and interpreted by the shell

# output of command returned
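Back ticks capture only what the command prints to standard output, and the exit status is left in $?. The sketch below wraps both in a made-up helper, capture, and again uses the running perl ($^X) as a stand-in command. Note that in the command above the -o test.out option sends the BLAST report to a file, so presumably little or nothing would come back in $output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run a shell command; return (captured standard output, exit code).
# The exit code is -1 if the command could not be started.
sub capture {
    my ($cmd) = @_;
    my $output = `$cmd`;
    my $exit   = $? == -1 ? -1 : $? >> 8;
    $output = '' unless defined $output;
    return ( $output, $exit );
}

# Stand-in command: ask perl itself to print a number.
my $perl = $^X;
my ( $out, $code ) = capture(qq{"$perl" -e "print 42"});
print "output: $out\n";
print "exit:   $code\n";
```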

Slides Deprecated


New Version of BPlite

• BPlite has actually been "deprecated"
  – this means that its functionality has been replaced by something else
  – the code is still available and included, but will not be supported by future versions

• Replaced by SearchIO

perldoc Bio::Tools::BPlite
perldoc Bio::SearchIO


OOP used extensively in BioPerl

A subject is a BLAST hit, which should not be confused with an HSP (below).

A BLAST hit may have several alignments associated with it. A useful way of thinking about it is that a subject is "analogous" to a gene and HSPs are "analogous" to exons.

Subjects have one attribute (name) and one method (nextHSP).

An HSP is a high scoring pair, or simply an alignment.

Look at:
perldoc Bio::Tools::BPlite
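To make "one attribute and one method" concrete, here is a toy class written from scratch. It is NOT the BPlite implementation: the package name Toy::Subject and the string HSPs are invented purely to show the shape of an object that has a name and hands out its HSPs one at a time.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-in for a BLAST subject (hit); not the Bio::Tools::BPlite class.
package Toy::Subject;

sub new {
    my ( $class, %args ) = @_;
    my $self = {
        name => $args{name},
        hsps => [ @{ $args{hsps} || [] } ],    # private copy of the HSP list
    };
    return bless $self, $class;
}

# The one attribute: the subject's name.
sub name {
    my ($self) = @_;
    return $self->{name};
}

# The one method: return the next HSP, or undef when none are left.
sub nextHSP {
    my ($self) = @_;
    return shift @{ $self->{hsps} };
}

package main;

my $sbjct = Toy::Subject->new(
    name => 'example_hit',
    hsps => [ 'alignment 1', 'alignment 2' ],
);

print "subject: ", $sbjct->name, "\n";
while ( defined( my $hsp = $sbjct->nextHSP ) ) {
    print "  HSP: $hsp\n";
}
```

The while loop has the same shape as the loops used with real BPlite and SearchIO objects: keep asking for the next item until the method returns nothing.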


BPlite Example – what it looks like to use OOP

$report->query;
$report->database;
while (my $sbjct = $report->nextSbjct) {
    $sbjct->name;
    while (my $hsp = $sbjct->nextHSP) {
        print "querySeq ".$hsp->querySeq."\n";
        print "sbjctSeq ".$hsp->sbjctSeq."\n";
        print "homologySeq ".$hsp->homologySeq."\n";
    }
}


use Bio::Tools::BPlite;

$blast_file = "Chr16.0.out";
my $report = new Bio::Tools::BPlite('-file' => $blast_file);
$rp = $report->query;
$db = $report->database;

while (my $sbjct = $report->nextSbjct) {
    $sbjct->name;
    while (my $hsp = $sbjct->nextHSP) {
        print "substart ".$hsp->subject->start."\n";
        print "subjectend ".$hsp->subject->end."\n";
    }
}


#!/usr/bin/perl -w
#
#Input: file_name (blast results file from RPS-BLAST)
#Output: list of domain locations relative to database sequence,
#        and perhaps the genomic sequence with domains emphasized
#
# Note I broke 5.8.0 in:
#    /usr/lib/perl5/site_perl
# to mimic the fact that students probably do not have bio_perl installed

use Bio::Tools::BPlite;
use Getopt::Long;

if($#ARGV != 3) {
    die "usage: domainID.pl -bf file -c cutoff_value(0.001)\n";
}

&GetOptions("bf=s" => \$blast_file, "c=s" => \$cutoff);

my $report = new Bio::Tools::BPlite('-file' => $blast_file);
#open(FH,$blast_file);
#my $report = new Bio::Tools::BPlite('-fh' => \*FH);

$rp = $report->query;
print "rp = $rp\n";
$db = $report->database;
print "db = $db\n";

while(my $sbjct = $report->nextSbjct) {
    $sbjct->name;
    while (my $hsp = $sbjct->nextHSP) {
        #print "score ".$hsp->score."\n";
        #print "bits ".$hsp->bits."\n";
        #print "percent ".$hsp->percent."\n";
        #print "P ".$hsp->P."\n";
        #print "match ".$hsp->match."\n";
        #print "positive ".$hsp->positive."\n";
        #print "length ".$hsp->length."\n";
        #print "querySeq ".$hsp->querySeq."\n";
        #print "sbjctSeq ".$hsp->sbjctSeq."\n";
        #print "homologySeq ".$hsp->homologySeq."\n";

        if($hsp->P <= $cutoff) {
            print "subjectseqname ".$hsp->subject->seqname."\n";
            print "qstart ".$hsp->query->start."\n";
            print "qend ".$hsp->query->end."\n";
            print "e = ".$hsp->P."\n";
            #print "percent = ".$hsp->percent."\n"; #What is this???
            print "match = ".$hsp->match."\n";
            print "positive = ".$hsp->positive."\n";
            print "length = ".$hsp->length."\n";
            #print "NT query start ".(3*$hsp->query->start)." (assuming protein input)\n";
            #print "NT query qend ".(3*$hsp->query->end)."\n";
            #print "querySeq ".$hsp->querySeq."\n";
            print "\n";
        }
        #print "substart ".$hsp->subject->start."\n";
        #print "subjectend ".$hsp->subject->end."\n";
        #print "subjectseqname ".$hsp->subject->seqname."\n";

        # $hsp->subject->overlaps($exon);
    }
}
# the following line takes you to the next report in the stream/file
# it will return 0 if that report is empty,
# but that is valid for an empty blast report.
# Returns -1 for EOF.
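The script above checks the argument count by hand before calling GetOptions. An alternative, sketched below as an assumption rather than the author's method, is to let Getopt::Long do the validation and test its return value. The helper name parse_args is made up, and the argument list is passed in explicitly so the example runs without command-line input; a real script would pass @ARGV.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse "-bf FILE -c CUTOFF" from a list of arguments.
# Returns (blast_file, cutoff) or dies with the usage message.
sub parse_args {
    my (@args) = @_;
    my ( $blast_file, $cutoff );
    my $ok = GetOptionsFromArray(
        \@args,
        'bf=s' => \$blast_file,    # string: name of the BLAST report
        'c=f'  => \$cutoff,        # real number: cutoff value
    );
    die "usage: domainID.pl -bf file -c cutoff_value(0.001)\n"
        unless $ok && defined $blast_file && defined $cutoff;
    return ( $blast_file, $cutoff );
}

# In a real script this would be parse_args(@ARGV).
my ( $file, $cut ) = parse_args( '-bf', 'bmp4.out', '-c', '10' );
print "file=$file cutoff=$cut\n";
```

Declaring the cutoff as "=f" means a non-numeric value is rejected at parse time instead of surfacing later in the numeric comparison.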


Sample "look" file

rp = Random Sequence 500 10 (500 letters)
db = blastdb/yeast.nt
subjectseqname gi|6324971|ref|NC_001148.1| Saccharomyces cerevisiae chromosome XVI, complete chromosome sequence
qstart 1
qend 16
e = 1.2
match = 16
positive = 16
length = 16

subjectseqname gi|6324971|ref|NC_001148.1| Saccharomyces cerevisiae chromosome XVI, complete chromosome sequence
qstart 321
qend 335
e = 4.7
match = 15
positive = 15
length = 15


./domainID.pl -bf bmp4.out -c 10

-------------------- WARNING ---------------------
MSG: SeqFeatureI::seqname() is deprecated. Please use seq_id() instead.
---------------------------------------------------
subjectseqname mm3_dna range=chr14:37718906-37723909 5'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=none
qstart 1
qend 278
e = 1e-110
match = 258
positive = 258
length = 278

-------------------- WARNING ---------------------
MSG: SeqFeatureI::seqname() is deprecated. Please use seq_id() instead.
---------------------------------------------------
subjectseqname mm3_dna range=chr14:37718906-37723909 5'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=none
qstart 707
qend 818
e = 3e-32
match = 102
positive = 102
length = 112


End


Notes

• Note to self – exploring the percent identity, gapped, and non-gapped would be a great assignment

• requires random sequence, alignment (clustalw)