database search

Upload: emerald-georgina-clark

Post on 13-Dec-2015

Page 1: Database search. Overview : 1. FastA : is suitable for protein sequence searching 2. BLAST : is suitable for DNA, RNA, protein sequence searching

Overview:

1. FastA: suitable for protein sequence searching (it also accepts DNA sequences)

2. BLAST: suitable for DNA, RNA, and protein sequence searching

FastA

History: FastA grew out of FASTP, published by Lipman and Pearson in 1985 and one of the first widely used database search programs. The FASTA program itself followed in 1988.

EBI provides a FastA service, available at http://www.ebi.ac.uk/Tools/fasta/

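Idea: find short exact substrings (words of k letters) shared between the query and each database sequence, then concentrate the alignment on the regions where these word matches cluster.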
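The word-matching idea can be sketched in a few lines of Python. This is a minimal illustration, not the FastA implementation: the function names, the word length k=2, and the "count hits per diagonal" step are assumptions chosen for clarity. The query is the example protein used in this deck; the subject is one of the hits from the standalone BLAST example.

```python
from collections import defaultdict

def index_words(seq, k):
    """Map every k-letter word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def shared_words(query, subject, k=2):
    """Return (query_pos, subject_pos, word) for every exact k-letter match."""
    index = index_words(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        word = subject[j:j + k]
        for i in index.get(word, []):
            hits.append((i, j, word))
    return hits

def best_diagonal(hits):
    """Count hits per diagonal (subject_pos - query_pos); return the richest one."""
    counts = defaultdict(int)
    for i, j, _ in hits:
        counts[j - i] += 1
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])

query = "EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP"
subject = "DECSAPGAFCLIRPGLCCSEFCFFAC"
hits = shared_words(query, subject, k=2)
print(best_diagonal(hits))  # (diagonal offset, number of word hits on it)
```

A diagonal that collects many word hits marks a region where the two sequences can probably be aligned without many gaps, which is where a more careful alignment is worth computing.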

Other commonly used software: http://www.ebi.ac.uk/Tools/sss/

Example: protein sequence: EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP

[Screenshot of the search form. Labels: input sequence; select database; parameters]

results

100% identity

17/28 = 60.7% identity, 28 aa overlap

BLAST

BLAST stands for Basic Local Alignment Search Tool.

BLAST was developed by NCBI.

BLAST finds regions of similarity between biological sequences.

Basic BLAST

Program   Query        Database     Program description
Blastn    Nucleotide   Nucleotide   Search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast
Blastp    Protein      Protein      Search a protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast, delta-blast
Blastx    Nucleotide   Protein      Search a protein database using a translated nucleotide query
Tblastn   Protein      Nucleotide   Search a translated nucleotide database using a protein query
Tblastx   Nucleotide   Nucleotide   Search a translated nucleotide database using a translated nucleotide query

T: translation; n: nucleotide; p: protein; x: cross (the nucleotide query is translated before the search)
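The table of programs can be read as a lookup from (query type, database type) to program name. A minimal Python sketch, assuming the hypothetical names PROGRAMS and candidate_programs, which are not part of BLAST:

```python
# (query type, database type) -> BLAST programs, taken from the program table.
PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("protein", "protein"): ["blastp"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
}

def candidate_programs(query_type, db_type):
    """Return the programs that accept this query/database combination."""
    key = (query_type.lower(), db_type.lower())
    if key not in PROGRAMS:
        raise ValueError(f"unknown combination: {key}")
    return PROGRAMS[key]

print(candidate_programs("protein", "nucleotide"))
print(candidate_programs("nucleotide", "nucleotide"))
```

For a nucleotide query against a nucleotide database there are two candidates: blastn compares the sequences directly, while tblastx translates both sides and compares them as proteins.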

BLASTALL

[Diagram: choice of program by query sequence and database type]

Amino acid query sequence: BLASTp (protein database); TBLASTn (translated nucleotide database)

DNA query sequence: BLASTn (nucleotide database); BLASTx (translated query, protein database); TBLASTx (translated query, translated nucleotide database)

Blast source

1. NCBI:

http://blast.ncbi.nlm.nih.gov/Blast.cgi/ (online version)

ftp://ftp.ncbi.nih.gov/blast/ (standalone)

2. Other websites:

http://life.zsu.edu.cn/blast/

http://www.fruitfly.org/blast/

http://www.mcgb.uestc.edu.cn/blast/blast.html

BLAST can be used in two ways:

1. online: through a website

2. standalone: download and install the software

Comparison between them

Web server advantages: 1. easy to use. 2. always up to date. 3. no database download needed.

Web server disadvantages: 1. not suitable for large data sets. 2. you cannot define your own database.

Web BLAST provided by NCBI

Blastn for nucleotide

Blastp for protein

http://blast.ncbi.nlm.nih.gov/Blast.cgi

An example:

1. cctggcgataaccgtcttgtcggcggttgcgctgacgttgcgtcgtgatatcatcagggcAgaccggttacatccccctaa

2. gatcgaaaaacgcttgtgttaaaaatttgctaaattttgccaatttggtaaaacagttgcAtcacaacaggagatagcaat

[Screenshots of the NCBI form for aligning two sequences. Labels: the first sequence; the second sequence; sequence; range; software; similarity from high to low; results shown in new window]

Results of pairwise alignment: No significant similarity found

[Screenshot of the result page. Labels: information of the two sequences; parameters selected]

Blast (standalone version)

Why do we need the standalone version of BLAST?

1. specific database

2. privacy

3. batch processing

How to download BLAST: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release

Installer used here: blast-2.2.23-ia32-win32.exe

After unzipping, we get three folders:

bin: all the exe files

data: data for BLAST

doc: readme

We need to format the database for BLAST.

First, save your database in FASTA format. Second, use formatdb, provided in the BLAST package, to format the database.

DOS command: formatdb -i sequence.fa -p T/F -o T/F -n db_name

An example

1. The file "Delta.txt" contains 13 proteins and serves as the database.

2. One protein is selected as the query sequence and stored in the file "seq.txt".

1. Format Delta.txt:

formatdb -i Delta.txt -p T

Parameters:
1. -i: database file
2. -p: T for protein, F for nucleotide

2. Search Delta.txt using BLAST:

blastall -p blastp -d Delta.txt -i seq.txt -o out.txt

Parameters:
1. -p: program name (blastp, blastn, blastx, tblastn, tblastx)
2. -d: database name
3. -i: query sequences
4. -o: output file
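For batch processing, the two commands can be assembled from a script. The Python sketch below only builds the command lines, using the flags listed in this deck; the helper names formatdb_command and blastall_command are hypothetical. Actually running them (for example with subprocess.run) assumes the legacy formatdb and blastall executables from the bin folder are on the PATH.

```python
import shlex

def formatdb_command(fasta_path, is_protein=True):
    """Build the formatdb call: -i input file, -p T (protein) or F (nucleotide)."""
    return ["formatdb", "-i", fasta_path, "-p", "T" if is_protein else "F"]

def blastall_command(program, database, query_path, output_path):
    """Build the blastall call: -p program, -d database, -i query, -o output."""
    allowed = {"blastp", "blastn", "blastx", "tblastn", "tblastx"}
    if program not in allowed:
        raise ValueError(f"unknown program: {program}")
    return ["blastall", "-p", program, "-d", database,
            "-i", query_path, "-o", output_path]

format_cmd = formatdb_command("Delta.txt", is_protein=True)
search_cmd = blastall_command("blastp", "Delta.txt", "seq.txt", "out.txt")
print(shlex.join(format_cmd))
print(shlex.join(search_cmd))
# To execute (assumption: executables installed and on the PATH):
#   import subprocess; subprocess.run(format_cmd, check=True)
```

Passing the arguments as a list rather than one string avoids quoting problems when file names contain spaces.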

3. To see the other parameters, just type blastall.

4. Results:

                                                  Score    E
Sequences producing significant alignments:       (bits)   Value

P83301|CXO_CONVE                                  69       1e-017
P69749|CXD6A_CONBU                                20       0.009
P69750|CXD6A_CONCN                                18       0.036
P24159|CXDB_CONTE P18511|CXDA_CONTE               18       0.042
P60179|CXD66_CONAA                                17       0.066
P60513|CXD6A_CONER                                17       0.11
P69751|CXD6E_CONCT P69748|CXD6A_CONAI             16       0.19
P69754|CXD6B_CONMA P69753|CXD6A_CONMA             14       0.56
P69752|CXD6B_CONER P58913|CXD6A_CONPU             14       0.62
P69756|CXD6D_CONMA P69755|CXD6C_CONMA             13       0.89
Q9XZK5|CXSO6_CONST P69757|CXD6A_CONSE             12       2.6
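Each line of the hit list ends with a bit score and an E-value, so it is easy to filter hits in a script. A minimal Python sketch, assuming one hit per line with whitespace-separated fields; the function names and the 0.01 cutoff are illustrative choices, and only the first three hits are copied here.

```python
def parse_hit_line(line):
    """Split a hit line into (identifiers, bit score, E-value)."""
    *identifiers, score, evalue = line.split()
    return identifiers, float(score), float(evalue)

def significant_hits(lines, max_evalue=0.01):
    """Keep the hits whose E-value is at or below the cutoff."""
    hits = [parse_hit_line(line) for line in lines if line.strip()]
    return [hit for hit in hits if hit[2] <= max_evalue]

hit_lines = [
    "P83301|CXO_CONVE 69 1e-017",
    "P69749|CXD6A_CONBU 20 0.009",
    "P69750|CXD6A_CONCN 18 0.036",
]
for identifiers, score, evalue in significant_hits(hit_lines):
    print(identifiers[0], score, evalue)
```

A smaller E-value means the match is less likely to have arisen by chance in a database of this size, so a stricter cutoff leaves fewer but more trustworthy hits.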

>P83301|CXO_CONVE Length = 33

Score = 69.3 bits (168), Expect = 1e-017, Method: Compositional matrix adjust.
Identities = 33/33 (100%)

Query: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33
         EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
Sbjct: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33

>P69749|CXD6A_CONBU Length = 27

Score = 20.0 bits (40), Expect = 0.009, Method: Compositional matrix adjust.
Identities = 13/30 (43%), Gaps = 6/30 (20%)

Query: 1 EDCIAVGQLCVFWNIGRP--CCSGLCVFAC 28
           C A G  C      RP  CCS  C FAC
Sbjct: 1 DECSAPGAFCLI----RPGLCCSEFCFFAC 26
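The Identities and Gaps figures can be recomputed from the two aligned rows. The Python sketch below does this for the second hit. One assumption is labeled in the code: the gap characters ("-") are placed where the reported counts (13/30 identities, 6/30 gaps) and the matched residues imply, since the transcript does not show them. The function name alignment_stats is hypothetical.

```python
def alignment_stats(query_row, subject_row, gap="-"):
    """Return (identities, gaps, alignment length) for two aligned rows."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, subject_row))
    identities = sum(1 for q, s in pairs if q == s and q != gap)
    gaps = sum(1 for q, s in pairs if q == gap or s == gap)
    return identities, gaps, len(pairs)

# Assumption: gap positions inferred from the reported counts.
query_row = "EDCIAVGQLCVFWNIGRP--CCSGLCVFAC"
subject_row = "DECSAPGAFCLI----RPGLCCSEFCFFAC"

identities, gaps, length = alignment_stats(query_row, subject_row)
print(f"Identities = {identities}/{length} ({round(100 * identities / length)}%)")
print(f"Gaps = {gaps}/{length} ({round(100 * gaps / length)}%)")
```

Note that the denominator is the alignment length including gap columns, not the length of either sequence.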

5. Pairwise alignment:

bl2seq -p blastp -i seq.txt -j 1.txt -o out.txt

Parameters:
1. -p: program name (blastp, blastn, …)
2. -i: first sequence
3. -j: second sequence
4. -o: output file

To see the other parameters, just type bl2seq.

6. Databases can be downloaded from:

ftp://ftp.ncbi.nih.gov/blast/db/

Scoring matrices can be downloaded from:

ftp://ftp.ncbi.nih.gov/blast/matrices/

PSI-BLAST

Position-Specific Iterative BLAST (PSI-BLAST).

Altschul et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389-3402.

Target: proteins only

Position-Specific Iterative BLAST (PSI-BLAST) is a feature of BLAST 2.0 in which a profile is automatically constructed from the first set of BLAST alignments. PSI-BLAST is similar to NCBI BLAST2 except that it uses position-specific scoring matrices derived during the search. It is used to detect distant evolutionary relationships.
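To make "position-specific scoring matrix" concrete, here is a toy Python sketch that builds one from a few aligned sequences and scores a segment against it. This is a simplified illustration, not the PSI-BLAST procedure: the uniform background frequency, the fixed pseudocount, and the three made-up aligned sequences are all assumptions, and PSI-BLAST itself applies sequence weighting and more refined pseudocounts.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned, alphabet=AMINO_ACIDS, pseudocount=1.0):
    """One dict per column: residue -> log2(observed frequency / background)."""
    background = 1.0 / len(alphabet)  # assumption: uniform background
    pssm = []
    for column in zip(*aligned):
        counts = Counter(res for res in column if res in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        pssm.append({
            res: math.log2(((counts[res] + pseudocount) / total) / background)
            for res in alphabet
        })
    return pssm

def score_segment(pssm, segment):
    """Sum the column scores of an ungapped segment of the same length."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must equal the number of columns")
    return sum(column[res] for column, res in zip(pssm, segment))

# Hypothetical aligned hits from a first search round (made up for illustration).
aligned_hits = ["CCSG", "CCSE", "CCTG"]
pssm = build_pssm(aligned_hits)
print(score_segment(pssm, "CCSG"), score_segment(pssm, "AAAA"))
```

Because each column has its own scores, a conserved position (here the two cysteines) rewards a match much more than a variable one, which is what lets the iterated search pick up distant relatives that a single fixed substitution matrix would miss.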

Online sources:

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_psiblast.html

http://blast.ncbi.nlm.nih.gov/Blast.cgi

http://www.ebi.ac.uk/Tools/blastpgp/