Workshop OUTLINE Part 1: Introduction and motivation; How does BLAST work? Part 2: BLAST programs; Sequence databases; Work Steps; Extract and analyze results

Upload: jessie-johns

Post on 31-Dec-2015


Page 1

Workshop OUTLINE

Part 1:

• Introduction and motivation

• How does BLAST work?

Part 2:

• BLAST programs

• Sequence databases

• Work Steps

• Extract and analyze results

Page 2

BLAST programs


• All types of searches are possible:

Query: DNA or protein

Database: DNA or protein

• blastn – nucleotide query vs. nucleotide database

• blastp – protein query vs. protein database

• blastx – translated nucleotide query vs. protein database

• tblastn – protein query vs. translated nucleotide database

• tblastx – translated nucleotide query vs. translated nucleotide database
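The list above boils down to a small decision rule on the two sequence types. A minimal Python sketch of that rule; the function name and the `translate_both` flag are hypothetical helpers for illustration, not part of any BLAST interface:

```python
def choose_blast_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    """Return the BLAST program for a query/database combination.

    query_type, db_type: "nucleotide" or "protein".
    translate_both: nucleotide vs. nucleotide only; True compares the
    translated sequences (tblastx) instead of the raw nucleotides (blastn).
    """
    valid = {"nucleotide", "protein"}
    if query_type not in valid or db_type not in valid:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if query_type == "protein" and db_type == "protein":
        return "blastp"   # protein vs. protein
    if query_type == "nucleotide" and db_type == "protein":
        return "blastx"   # translated query vs. protein database
    if query_type == "protein" and db_type == "nucleotide":
        return "tblastn"  # protein query vs. translated database
    return "tblastx" if translate_both else "blastn"  # nucleotide vs. nucleotide


print(choose_blast_program("nucleotide", "protein"))  # blastx
```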

Page 3

BLAST programs

Amino acid sequence – most suitable for homology search

• The database and the query can be either nucleotides or amino acids!

• We prefer amino acid sequences:

- amino acid sequences are more conserved
- 20-letter alphabet: two random sequences share about 5% identity on average (compared to 25% for DNA sequences)
- protein substitution matrices are more sensitive
- protein databases are smaller, so there are fewer random hits
- we want to draw conclusions about the structure, and proteins are much more relevant for that
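Where the 5% and 25% figures come from: if residues occur independently with frequencies p_i, two aligned random positions are identical with probability Σ p_i². A short Python check, assuming uniform residue frequencies (real amino-acid frequencies are not uniform, which pushes the true protein value somewhat above 5%):

```python
def expected_random_identity(frequencies):
    """Probability that two independent random residues match: sum of p_i**2."""
    total = sum(frequencies)
    return sum((p / total) ** 2 for p in frequencies)

def uniform(alphabet_size):
    """Equal frequency for every letter of the alphabet (an assumption)."""
    return [1.0] * alphabet_size


protein = expected_random_identity(uniform(20))  # 20 amino acids
dna = expected_random_identity(uniform(4))       # 4 nucleotides
print(f"protein: {protein:.0%}, DNA: {dna:.0%}")  # protein: 5%, DNA: 25%
```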

Page 4

General Issues

• Where? (to find homologues)

• Structural templates: search against the PDB

• Sequence homologues: search against SwissProt or UniProt (recommended!)

• How many?

• As many as possible, as long as the MSA looks good (next week…)

Page 5

General Issues

• How long? (length of homologues)

• Fragments: short homologues (less than 50-60% of the query's length) lead to a bad alignment

• Ensure your sequences contain the desired domain(s)

• N/C termini tend to vary in length between homologues

• How close? (distance from the query sequence)

• All too close: no information

• Too many too far: bad alignment

• Ensure that you have a balanced collection!
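The fragment rule can be applied mechanically before aligning. A minimal sketch, assuming the homologues are held in a dict of name to sequence; the 0.6 cut-off is one point in the slide's 50-60% range, and the function name is hypothetical. Length is only a proxy: it does not check that the desired domain(s) are actually present.

```python
from typing import Dict

def remove_fragments(hits: Dict[str, str], query_length: int,
                     min_fraction: float = 0.6) -> Dict[str, str]:
    """Drop homologues shorter than min_fraction of the query length.

    Gap characters '-' are not counted towards a sequence's length.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    kept = {}
    for name, seq in hits.items():
        length = len(seq.replace("-", ""))
        if length / query_length >= min_fraction:
            kept[name] = seq
    return kept


# Hypothetical homologues of a 273-residue query
hits = {"full_length": "M" * 270, "fragment": "M" * 100}
print(sorted(remove_fragments(hits, query_length=273)))  # ['full_length']
```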

Page 6

General Issues

• From whom? (which species the sequence belongs to)

• Don't care, all homologues are welcome

• Orthologues/paralogues may be helpful

• Sequences from distant/close species provide different types of information

• Which method? (BLAST/PSI-BLAST)

• Depends on the protein, the available homologues, the goal in mind…

Page 7

Sequence databases

Where do we want to search? DNA sequences:

• ESTs: no annotated coding sequences; the largest pool of sequence data for many organisms (NCBI)

• NR: all GenBank + EMBL + DDBJ + PDB sequences. No longer "non-redundant" due to the computational cost.

• Genomes of specific organisms

• RefSeq (mRNA or genomic): an annotated collection from the NCBI Reference Sequence Project.

• EMBL: Europe's primary nucleotide sequence resource (EBI)

• …

Page 8

Sequence databases

Where do we want to search? Protein databases:

• PDB: the sequences of proteins for which structures are available

• NR (non-redundant): non-redundant GenBank CDS translations + PDB + SwissProt + PIR + PRF, excluding those in env_nr

• RefSeq: sequences from the NCBI Reference Sequence project.

• Proteins of specific organisms

• UniProt: SwissProt or TrEMBL

• …

Page 9

Sequence databases

Where do we want to search?

UniProt

• UniProt is a collaboration between the European Bioinformatics Institute (EBI), the Swiss Institute of Bioinformatics (SIB) and the Protein Information Resource (PIR).

• In 2002, the three institutes decided to pool their resources and expertise and formed the UniProt Consortium.

Page 10

Sequence databases

Where do we want to search?

UniProt

• The world's most comprehensive catalog of information on proteins: sequence, function & more…

• Composed mainly of two databases:

– SwissProt – 366,226 protein entries last year, 412,525 now – high-quality annotation, non-redundant & cross-referenced to many other databases.

– TrEMBL – 5,708,298 protein entries last year, 7,341,751 now – computer translation of the genetic information from the EMBL Nucleotide Sequence Database; many proteins are poorly annotated since only automatic annotation is generated.

Page 11

Overall work steps

1. Run the search:
   1. Select the database
   2. Choose an E-value threshold
   3. BLAST or PSI-BLAST? How many rounds?

2. Take out sequences:
   1. HSPs or full sequences
   2. Can (should!) filter out redundant sequences and sequences that are too short (fragments)

3. Usually: align the sequences (choose an alignment program)

4. View the alignment with BioEdit or another program

5. Calculate trees, conservation scores (ConSeq), etc…
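Steps 1-2 are easier to script from a machine-readable result than from the HTML page. A minimal Python sketch, assuming the hits were saved in BLAST+ tabular format (`-outfmt 6`, whose default 12 columns are query id, subject id, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The helper names and the 1e-3 default threshold are illustrative choices, not NCBI or ConSeq defaults, and the example rows are invented:

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Hit:
    query: str
    subject: str
    pident: float
    length: int
    qstart: int
    qend: int
    evalue: float
    bitscore: float

def parse_outfmt6(text: str) -> List[Hit]:
    """Parse BLAST+ tabular output (-outfmt 6, default 12 columns)."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hits.append(Hit(query=row[0], subject=row[1], pident=float(row[2]),
                        length=int(row[3]), qstart=int(row[6]), qend=int(row[7]),
                        evalue=float(row[10]), bitscore=float(row[11])))
    return hits

def best_hits_below(hits: List[Hit], max_evalue: float = 1e-3) -> List[Hit]:
    """Keep hits passing the E-value threshold, one best-scoring HSP per subject."""
    best: Dict[str, Hit] = {}
    for h in hits:
        if h.evalue > max_evalue:
            continue
        if h.subject not in best or h.bitscore > best[h.subject].bitscore:
            best[h.subject] = h
    return sorted(best.values(), key=lambda h: h.evalue)


# Invented example rows, for illustration only (not real BLAST results).
EXAMPLE = (
    "query1\thitA\t98.9\t273\t3\t0\t1\t273\t1\t273\t1e-150\t520\n"
    "query1\thitB\t45.0\t260\t140\t3\t5\t265\t2\t262\t2e-60\t210\n"
    "query1\thitB\t30.0\t50\t35\t0\t200\t250\t180\t230\t0.5\t25.0\n"
    "query1\thitC\t22.0\t80\t60\t2\t10\t90\t5\t85\t3.2\t22.1\n"
)
print([h.subject for h in best_hits_below(parse_outfmt6(EXAMPLE))])  # ['hitA', 'hitB']
```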

Page 12

Multiple Sequence Alignment (MSA)

Overall work steps

• Perform an alignment of a large collection of sequences

• Many algorithms; the leading ones:

1. ClustalW

2. MUSCLE

3. T-COFFEE

Page 13

Overall work steps

Examining BAliBASE 2005… (Edgar, R.C., 2004)

MUSCLE scored best on this benchmark!

Page 14

BLAST NCBI

Page 15

BLAST NCBI

• All program types

• Many databases to choose from, both nucleotide and protein

• 12 genome-specific databases

• Can also search for conserved domains, SNPs and more…

The well-known server: http://blast.ncbi.nlm.nih.gov/Blast.cgi

Page 16

BLAST NCBI

[Screenshot: BLASTp at http://www.ncbi.nlm.nih.gov/blast/Blast.cgi]

Page 17

BLAST NCBI

[Screenshot: BLASTp at http://www.ncbi.nlm.nih.gov/blast/Blast.cgi. Labels: Query Sequence; Database; Run]

Page 18

BLAST NCBI

[Screenshot of the search parameters. Labels: As many as possible; Matrix; E-value]
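The E-value set on this page is the number of hits with at least the observed score expected by chance alone. For an alignment with normalized bit score S', a query of length m and a database of total length n, BLAST reports approximately E = m·n·2^(-S'). A minimal Python sketch of that relation; it ignores the effective-length corrections BLAST applies internally, so its numbers will not exactly match a BLAST report, and the 10^8-residue database size is hypothetical:

```python
import math

def evalue_from_bitscore(bit_score: float, query_length: int, db_length: int) -> float:
    """Approximate E-value: E = m * n * 2**(-S'), with S' the bit score."""
    return query_length * db_length * 2.0 ** (-bit_score)

def bitscore_for_evalue(evalue: float, query_length: int, db_length: int) -> float:
    """Bit score a hit needs to reach the given E-value (inverse of the above)."""
    return math.log2(query_length * db_length / evalue)


# 273-residue query, hypothetical database of 10**8 residues, threshold 1e-3
print(round(bitscore_for_evalue(1e-3, 273, 10**8), 1))  # 44.6
```

The formula also shows why the same alignment gets a different E-value on different servers: each extra bit of score halves E, while doubling the database size doubles it.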

Page 19

BLAST NCBI

[Screenshot: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi. Labels: Mark all; Mark only wanted]

Page 20

BLAST NCBI

[Screenshot: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi]

Page 21

BLAST NCBI

[Screenshot: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi]

Page 22

BLAST EBI

Page 23

BLAST EBI

[Screenshot: http://www.ebi.ac.uk/blastall/index.html. Labels: Many databases, including UniProt; Insert sequence; RUN; Get maximum number of alignments!]

Page 24

BLAST EBI

[Screenshot: http://www.ebi.ac.uk/blastall/index.html. Labels: Mark all or wanted; Get sequences; Send sequences to ClustalW]

Page 25

PSI-BLAST

Page 26

PSI-BLAST NCBI

[Screenshot: PSI-BLAST at http://www.ncbi.nlm.nih.gov/blast/Blast.cgi. Labels: Query Sequence; Database; Run]

Page 27

PSI-BLAST NCBI

[Screenshot: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi. Labels: Pre-calculated PSSM; Threshold for inclusion in PSSM]
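The inclusion threshold decides which hits contribute to the position-specific scoring matrix (PSSM) used in the next round. A simplified Python sketch of the idea, assuming the hits are already aligned to the query (equal length, '-' for gaps), a uniform background and a flat pseudocount of 1. Real PSI-BLAST also weights sequences and derives pseudocounts from a substitution matrix, so this illustrates the concept rather than its algorithm; the 0.005 default is an assumed value:

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(hits: List[Tuple[str, float]], inclusion_evalue: float = 0.005,
               pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Per-column log2-odds scores from the hits below the inclusion threshold.

    hits: (aligned_sequence, e_value) pairs, all aligned to the query.
    Background frequency is assumed uniform (1/20 per amino acid).
    """
    included = [seq for seq, evalue in hits if evalue <= inclusion_evalue]
    if not included:
        raise ValueError("no hit passes the inclusion threshold")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(included[0])):
        # Count residues in this column, ignoring gaps and unknown letters
        counts = Counter(s[col] for s in included if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


# Two significant hits and one that fails the threshold (toy data)
pssm = build_pssm([("AC", 1e-30), ("AC", 1e-10), ("WW", 2.0)])
print(round(pssm[0]["A"], 2), round(pssm[0]["W"], 2))  # 1.45 -0.14
```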

Page 28

PSI-BLAST NCBI

[Screenshot: http://www.ncbi.nlm.nih.gov/blast/Blast.cgi. Labels: Run next round; Include sequence in the PSSM; Not found in previous round]
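The round-by-round logic can be sketched as a loop that stops once a round brings in nothing new. In this Python sketch the `search` callable is hypothetical: it stands for "build a PSSM from the included sequences, search, and return the IDs passing the inclusion threshold", and is replaced here by a toy lookup table:

```python
from typing import Callable, List, Set, Tuple

def iterate_rounds(search: Callable[[Set[str]], Set[str]], query: str,
                   max_rounds: int = 5) -> Tuple[Set[str], List[Set[str]]]:
    """Repeat the search until no new sequence passes the inclusion threshold."""
    included = {query}
    new_per_round: List[Set[str]] = []
    for _ in range(max_rounds):
        hits = search(included)
        new = hits - included  # the "not found in previous round" hits
        new_per_round.append(new)
        if not new:
            break  # converged: the included set no longer changes
        included |= new
    return included, new_per_round


# Toy stand-in for a database search: each ID "finds" its listed neighbours.
LINKS = {"q": {"a", "b"}, "a": {"c"}, "b": set(), "c": {"d"}, "d": set()}

def toy_search(included: Set[str]) -> Set[str]:
    return set().union(*(LINKS[s] for s in included))


members, rounds = iterate_rounds(toy_search, "q")
print([len(r) for r in rounds])  # [2, 1, 1, 0]
```

Capping `max_rounds` mirrors the "how many rounds?" choice: more rounds find more distant homologues but also risk pulling in unrelated sequences.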

Page 29

PSI-BLAST EBI

[Screenshot: http://www.ebi.ac.uk/blastpgp/. Labels: Query Sequence; Database; Run; Number of iterations]

Page 30

(PSI-)BLAST on ConSeq, extract sequence & align

Page 31

PSI-BLAST on ConSeq

The ConSeq webserver

• Calculates evolutionary conservation scores, which are then displayed on the sequence.

• Requires a Multiple Sequence Alignment (MSA); if none is provided, it can create one automatically

• Runs (PSI-)BLAST, extracts hits from the BLAST results, filters them by E-value and aligns the sequences.
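ConSeq derives its scores from an evolutionary model applied to the MSA (the Rate4Site method), which is beyond a short example. As a much simpler stand-in that conveys the idea of per-position conservation, this Python sketch computes the Shannon entropy of each alignment column: 0 bits for a fully conserved column, higher values for variable ones. It is explicitly not ConSeq's algorithm, and the function name is hypothetical:

```python
import math
from collections import Counter
from typing import List

def column_entropies(msa: List[str], gap: str = "-") -> List[float]:
    """Shannon entropy (bits) of each MSA column, ignoring gaps.

    Lower entropy means a more conserved position; an all-gap column gives NaN.
    """
    if not msa or len({len(s) for s in msa}) != 1:
        raise ValueError("MSA rows must be non-empty and of equal length")
    scores = []
    for col in range(len(msa[0])):
        residues = [s[col] for s in msa if s[col] != gap]
        n = len(residues)
        if n == 0:
            scores.append(float("nan"))
            continue
        counts = Counter(residues)
        scores.append(sum(-(c / n) * math.log2(c / n) for c in counts.values()))
    return scores


print(column_entropies(["ACD", "AWD", "ARD", "AKD"]))  # [0.0, 2.0, 0.0]
```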

Page 32

PSI-BLAST on ConSeq

The ConSeq webserver: http://conseq.tau.ac.il/

Page 33

PSI-BLAST on ConSeq

The ConSeq webserver: http://conseq.tau.ac.il/

[Screenshot labels: Query sequence; Email]

Page 34

PSI-BLAST on ConSeq

The ConSeq webserver: http://conseq.tau.ac.il/

[Screenshot labels: Alignment algorithm; Database (SwissProt or UniProt); No. of homologues; Iterations; E-value]

Page 35

PSI-BLAST on ConSeq

The ConSeq webserver: http://conseq.tau.ac.il/

Page 36

PSI-BLAST on ConSeq

The ConSeq webserver: http://conseq.tau.ac.il/

[Screenshot labels: All BLAST hits; MSA]

Page 37

NCBI vs. EBI vs. ConSeq

Summary of web servers:

1. PSI-BLAST at NCBI:
- Can control the PSSM, the included sequences & the threshold
- All types of BLAST programs
- Not against UniProt, but against SwissProt or NR
- Against RefSeq and NT
- Full sequences are downloaded, as in BLAST
- Up to 2000 sequences

Page 38

NCBI vs. EBI vs. ConSeq

Summary of web servers:

2. BLAST at EBI:
- Against UniProt or EMBL, not NR or specific genomes
- Can't control the PSSM: you just get the last round
- Download and align only full sequences
- The number of presented sequences is limited to 500
- blastN, blastP, tblastN, tblastX

Page 39

NCBI vs. EBI vs. ConSeq

Summary of web servers:

3. BLAST at ConSeq:
- Get HSPs, not entire sequences!!!
- Only blastP
- Searches UniProt/SwissProt
- Still, you can't control all options, such as redundancy and minimal HSP length

Page 40

(PSI-)BLAST via Max-Planck

Page 41

(PSI-)BLAST via Max-Planck

Run (PSI-)BLAST, then either:

1. Send HSPs or full sequences to an alignment program, or

2. Forward the HSPs to filtration via "BLAMMER", download the filtered sequences, and align them with a program of choice

Page 42

(PSI-)BLAST via Max-Planck

BLAST at Max-Planck: http://toolkit.tuebingen.mpg.de/sections/search

• Databases: SwissProt, TrEMBL, NR, env, PDB or any combination for proteins, but only NT for DNA.

• All BLAST programs

• Main advantage: you can easily extract and filter the HSPs, in addition to full sequences.

Page 43

The Query Protein

Name: Dihydrodipicolinate reductase

Enzyme reaction:

Molecular process: Lysine biosynthesis (early stages)

Organism: E. coli

Sequence length: 273 aa

Page 44

The Query Protein

Query: DAPB_ECOLI

>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
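Before pasting the query into a server it is worth checking the record. A minimal FASTA reader in Python, written from scratch here rather than taken from any specific library; the record below is the DAPB_ECOLI query from this slide, wrapped into 60-residue lines, and the reader confirms the 273 aa length given on the previous slide:

```python
from typing import Dict, List, Optional

def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first word after '>'; wrapped sequence lines are joined.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)  # close the previous record
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


DAPB_FASTA = """>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGV
TVQSSLDAVKDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAAD
IAIVFAANFSVGVNVMLKLLEKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAH
ALDKDLKDCAVYSREGHTGERVPGTIGFATVRAGDIVGEHTAMFADIGERLEITHKASSR
MTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
"""

records = parse_fasta(DAPB_FASTA)
print(len(records["DAPB_ECOLI"]))  # 273
```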

Page 45

(PSI-)BLAST via Max-Planck: http://toolkit.tuebingen.mpg.de/psi_blast/

Choose database or databases (select several using CTRL)

Upload sequence or MSA

Page 46

(PSI-)BLAST via Max-Planck

Save the PSI-BLAST result

Page 47

(PSI-)BLAST via Max-Planck

The E-value threshold can be assessed using the distribution of the hits' E-values
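One way to read such a distribution is to group hits by order of magnitude of their E-value and look for a gap or sharp change between a block of clearly significant hits and the chance-level tail. A minimal Python sketch; the function names and the floor bin are hypothetical, and the toolkit's own plot may bin differently. The E-values below are invented:

```python
import math
from collections import Counter
from typing import Dict, Iterable

FLOOR_BIN = -200  # catch-all bin, e.g. for E-values reported as 0.0

def evalue_histogram(evalues: Iterable[float]) -> Dict[int, int]:
    """Count hits per decade: key k holds hits with 10**k <= E < 10**(k+1).

    Values lying exactly on a power of ten may land in a neighbouring bin
    because of floating-point rounding in log10.
    """
    counts: Counter = Counter()
    for e in evalues:
        k = math.floor(math.log10(e)) if e > 0 else FLOOR_BIN
        counts[max(k, FLOOR_BIN)] += 1
    return dict(sorted(counts.items()))

def show(hist: Dict[int, int]) -> None:
    """Print a text histogram, one row per decade."""
    for k, n in hist.items():
        print(f"E ~ 1e{k:<5} {'#' * n}")


show(evalue_histogram([0.0, 2e-50, 3e-50, 5e-40, 0.2, 0.5, 3.0]))
```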

Page 48

Filter Results via Max-Planck

Forward results to BLAMMER

Page 49

Filter Results via Max-Planck

BLAMMER: http://toolkit.tuebingen.mpg.de/blammer/

• Intended to create MSAs from BLAST results; we will use it just to filter the results and then align them via MUSCLE or another well-known MSA program.

• Filter according to:

  • E-value

  • Min. coverage: minimum percentage of the query protein covered

  • Max. redundancy: remove highly similar sequences

  • Max. number of homologues, if wanted
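The redundancy filter can be thought of as a greedy pass: walk through the hits from best to worst and keep a hit only if it is not too similar to any hit already kept. A minimal Python sketch, assuming the hits are already aligned to the query (same length, '-' for gaps) and sorted by E-value, with identity computed over columns where both sequences have a residue. BLAMMER's exact procedure may differ; the 90% cut-off and the function names are hypothetical:

```python
from typing import List, Optional

def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def remove_redundant(aligned_hits: List[str], max_identity: float = 0.9,
                     max_homologues: Optional[int] = None) -> List[str]:
    """Greedy redundancy filter; the input must already be sorted best-first."""
    kept: List[str] = []
    for seq in aligned_hits:
        if all(pairwise_identity(seq, k) <= max_identity for k in kept):
            kept.append(seq)
            if max_homologues is not None and len(kept) >= max_homologues:
                break  # optional cap on the number of homologues
    return kept


hits = ["MKVLA-", "MKVLAG", "MRVIAG", "MRVIAS"]  # toy aligned hits, best first
print(remove_redundant(hits))  # ['MKVLA-', 'MRVIAG', 'MRVIAS']
```

The second hit is dropped because it is identical to the first over every ungapped column; the last one is kept because it is about 83% identical to its closest kept neighbour, below the assumed 90% cut-off.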

Page 50

Filter Results via Max-Planck: http://toolkit.tuebingen.mpg.de/blammer

[Screenshot labels: Forwarded PSI-BLAST result; Filtering parameters]

Page 51

Filter Results via Max-Planck

Save & then re-align!

Page 52

Align the BLAST sequences

Page 53

Align via Max-Planck

http://toolkit.tuebingen.mpg.de/sections/alignment

Page 54

Align via Max-Planck

1. Forward the BLAST results to MUSCLE, MAFFT etc...

[Screenshot labels: Choose program; Use hits or full sequences]

Page 55

Align via Max-Planck

2. Filter via BLAMMER and then ALIGN:

Upload the results of BLAMMER (the downloaded file)

Page 56

Align via Max-Planck

Alignment results:

Save the alignment

Page 57

Alignment viewing & editing: BioEdit

• http://www.mbio.ncsu.edu/BioEdit/BioEdit.html

• Easy-to-use sequence alignment editor

• View and manipulate alignments of up to 20,000 sequences.

• Four modes of manual alignment: select and slide, dynamic grab and drag, gap insert and delete by mouse click, and on-screen typing that behaves like a text editor.

• Reads and writes GenBank, FASTA, Phylip 3.2, Phylip 4, and NBRF/PIR formats. Also reads GCG and Clustal formats.

Page 58

Alignment viewing & editing

Easiest: using BioEdit

http://www.mbio.ncsu.edu/BioEdit/bioedit.html

Page 59

Alignment viewing & editing

Easiest: using BioEdit

http://www.mbio.ncsu.edu/BioEdit/bioedit.html

• Find a specific sequence: “Edit -> search -> in titles”

• Erase/add sequences: “Edit -> cut/paste/delete sequence”

• “Sequence Identity matrix” under “Alignment”: useful for a rough evaluation of distances within the alignment.

• After taking out sequences, “Minimize Alignment” under “Alignment” removes unnecessary gaps.

• Can save an image using “File -> Graphic View” & then “Edit -> Copy page as BITMAP”
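Both alignment operations are simple to express in code. A minimal Python sketch, assuming the MSA is a list of equal-length strings with '-' as the gap character. `identity_matrix` computes percent identity over columns where both sequences have a residue (BioEdit may treat gaps differently), and `minimize_alignment` drops columns that became gap-only after sequences were removed, which is one reading of what "Minimize Alignment" does. All function names are hypothetical:

```python
from typing import List

def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0

def identity_matrix(msa: List[str]) -> List[List[float]]:
    """Pairwise percent identity for every pair of rows in the MSA."""
    return [[percent_identity(a, b) for b in msa] for a in msa]

def minimize_alignment(msa: List[str], gap: str = "-") -> List[str]:
    """Remove columns in which every remaining sequence has a gap."""
    if not msa:
        return []
    keep = [i for i in range(len(msa[0])) if any(s[i] != gap for s in msa)]
    return ["".join(s[i] for i in keep) for s in msa]


msa = ["MK-VL", "MRAVL", "MK-IL"]
trimmed = minimize_alignment([msa[0], msa[2]])  # after taking out the 2nd sequence
print(trimmed)                   # ['MKVL', 'MKIL']
print(identity_matrix(trimmed))  # [[100.0, 75.0], [75.0, 100.0]]
```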

Page 60

Each sequence is a different story

Adjust parameters:

• BLAST: E-value, substitution matrix, gap penalties, database, minimum length, redundancy level, fragment overlap…

• PSI-BLAST: BLAST parameters + PSSM inclusion threshold (or choose manually), number of rounds…

• Try using HSPs or full sequences, different MSA programs…

No “miracle solution”

Page 61

THANKS

Some slides were taken from previous presentations by members of the Pupko lab and Prof. Beni Chor