
Post on 13-Dec-2015


Database search

Overview:

1. FastA: suitable for protein sequence searching

2. BLAST: suitable for DNA, RNA, and protein sequence searching

FastA

History: FastA grew out of FASTP, published by Lipman and Pearson in 1985, and was one of the first widely used database search programs.

EBI provides a FastA service, available at

http://www.ebi.ac.uk/Tools/fasta/

Idea: identify short substrings (words) of the query that also occur in the target sequence, and use these matches as starting points for the alignment.
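The word-matching idea can be sketched in a few lines of Python. This is a minimal illustration, not the FastA implementation: the function names (index_words, shared_words, best_diagonal) and the word length k = 2 are assumptions chosen for the example. Matches that fall on the same diagonal (target position minus query position) suggest a common ungapped region.

```python
from collections import defaultdict

def index_words(seq, k):
    """Map every length-k substring of seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index

def shared_words(query, target, k=2):
    """Return (query_pos, target_pos) pairs where the same k-letter word occurs."""
    index = index_words(query, k)
    hits = []
    for j in range(len(target) - k + 1):
        for i in index.get(target[j:j + k], []):
            hits.append((i, j))
    return hits

def best_diagonal(hits):
    """Return (diagonal, count) for the diagonal with the most word hits."""
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])

# Example sequence from the text, compared with itself.
query = "EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP"
print(best_diagonal(shared_words(query, query, k=2)))
```

A sequence compared with itself puts every word on diagonal 0; for two related sequences the best diagonal marks where a fuller alignment would start.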

Other commonly used software:

http://www.ebi.ac.uk/Tools/sss/

Example protein sequence: EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP

[Screenshots: enter the sequence, select the database, and set the parameters. The results list one hit with 100% identity and one hit with 17/28 = 60.7% identity in a 28 aa overlap.]

BLAST

Basic Local Alignment Search Tool (BLAST).

BLAST was developed by NCBI.

BLAST finds regions of similarity between biological sequences.

Basic BLAST programs

Program: query sequence, database, description

Blastn: nucleotide query, nucleotide database. Search a nucleotide database using a nucleotide query. Algorithms: blastn, megablast, discontiguous megablast.

Blastp: protein query, protein database. Search a protein database using a protein query. Algorithms: blastp, psi-blast, phi-blast, delta-blast.

Blastx: nucleotide query, protein database. Search a protein database using a translated nucleotide query.

Tblastn: protein query, nucleotide database. Search a translated nucleotide database using a protein query.

Tblastx: nucleotide query, nucleotide database. Search a translated nucleotide database using a translated nucleotide query.

Naming: t = translation, n = nucleotide, p = protein, x = cross.
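The program table can be read as a lookup from query type and database type to a program name. The sketch below is only an illustration: the helper choose_program and its translated flag (used to tell blastn from tblastx) are hypothetical names, not part of BLAST.

```python
# (query type, database type, translated search?) -> BLAST program name
PROGRAMS = {
    ("nucleotide", "nucleotide", False): "blastn",
    ("protein", "protein", False): "blastp",
    ("nucleotide", "protein", True): "blastx",
    ("protein", "nucleotide", True): "tblastn",
    ("nucleotide", "nucleotide", True): "tblastx",
}

def choose_program(query_type, db_type, translated=False):
    """Pick the BLAST program for a query type and a database type."""
    # A protein/nucleotide mix can only be compared through translation.
    if query_type != db_type:
        translated = True
    key = (query_type, db_type, translated)
    if key not in PROGRAMS:
        raise ValueError(f"no BLAST program for {key}")
    return PROGRAMS[key]

print(choose_program("protein", "nucleotide"))
print(choose_program("nucleotide", "nucleotide", translated=True))
```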

BLASTALL

[Diagram: programs by query sequence type. Amino acid query: BLASTp (protein database), TBLASTn (translated nucleotide database). DNA query: BLASTn (nucleotide database), BLASTx (translated query, protein database), TBLASTx (translated query, translated nucleotide database).]

BLAST sources

1. NCBI: http://blast.ncbi.nlm.nih.gov/Blast.cgi/ (online version)

ftp://ftp.ncbi.nih.gov/blast/ (stand-alone)

2. Other websites: http://life.zsu.edu.cn/blast/

http://www.fruitfly.org/blast/

http://www.mcgb.uestc.edu.cn/blast/blast.html

BLAST

1. online : from website

2. stand alone : download the software

Comparison between them (web server):

Advantages: easy to use; always up to date; no database download needed.

Disadvantages: not suitable for large data sets; you cannot define your own database.

Web BLAST provided by NCBI

Blastn for nucleotide

Blastp for protein

http://blast.ncbi.nlm.nih.gov/Blast.cgi

An example:

1. cctggcgataaccgtcttgtcggcggttgcgctgacgttgcgtcgtgatatcatcagggcAgaccggttacatccccctaa

2. gatcgaaaaacgcttgtgttaaaaatttgctaaattttgccaatttggtaaaacagttgcAtcacaacaggagatagcaat

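[Screenshots: input form with fields for the first sequence, the second sequence, the sequence range, and the software (ordered by similarity from high to low), with results shown in a new window. The result page shows the pairwise alignment result ("No significant similarity found"), information about the two sequences, and the parameters selected.]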

Why do we need the standalone version of BLAST?

1. specific database

2. privacy

3. batch processing

Blast (standalone version)


How to download BLAST: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release

blast-2.2.23-ia32-win32.exe

After unpacking, we get three folders:

bin: all the exe files

data: data for BLAST

doc: readme

We need to format the database for BLAST.

First, save your database in FASTA format. Second, use formatdb, provided in the BLAST package, to format the database.

DOS command: formatdb -i sequence.fa -p T/F -o T/F -n db_name


An example

Setup: the file "Delta.txt" contains 13 proteins and serves as the database. One protein is selected as the query sequence and stored in the file "seq.txt".

1. Format Delta.txt:

formatdb -i Delta.txt -p T

Parameters:
1. -i: database file
2. -p: T for protein, F for nucleotide
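For batch work, both preparation steps (save the database as FASTA, then format it) can be scripted. The sketch below is a minimal example under stated assumptions: write_fasta and formatdb_command are hypothetical helper names, and the command list uses only the -i and -p flags described above. Nothing is executed here; the list could be passed to subprocess.run on a machine where the legacy formatdb program is installed.

```python
def write_fasta(path, records, width=60):
    """Write (identifier, sequence) pairs to path in FASTA format."""
    with open(path, "w") as handle:
        for identifier, sequence in records:
            handle.write(f">{identifier}\n")
            # Wrap long sequences at a fixed line width.
            for start in range(0, len(sequence), width):
                handle.write(sequence[start:start + width] + "\n")

def formatdb_command(fasta_path, protein=True):
    """Build the formatdb argument list: -p T for protein, -p F for nucleotide."""
    return ["formatdb", "-i", fasta_path, "-p", "T" if protein else "F"]

# The same command as in the example above, as an argument list.
print(" ".join(formatdb_command("Delta.txt")))
```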

2. Search Delta.txt using BLAST:

blastall -p blastp -d Delta.txt -i seq.txt -o out.txt

Parameters:
1. -p: program name: blastp, blastn, blastx, tblastn, tblastx
2. -d: database name
3. -i: query sequences
4. -o: output file

3. To see the other parameters, just type blastall.
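Batch processing is one of the reasons given for the standalone version, so the blastall call is a natural thing to wrap in a script. The following is a sketch only: blastall_command and run_blastall are hypothetical helper names, and only the four flags listed above are used. run_blastall requires the legacy blastall program on the PATH and is not called here.

```python
import subprocess

VALID_PROGRAMS = {"blastp", "blastn", "blastx", "tblastn", "tblastx"}

def blastall_command(program, database, query, output):
    """Build the blastall argument list from the four flags described above."""
    if program not in VALID_PROGRAMS:
        raise ValueError(f"unknown BLAST program: {program}")
    return ["blastall", "-p", program, "-d", database, "-i", query, "-o", output]

def run_blastall(program, database, query, output):
    """Run blastall; raises CalledProcessError if the search fails."""
    subprocess.run(blastall_command(program, database, query, output), check=True)

# Batch use: one output file per query file (file names are examples).
for query_file in ["seq.txt"]:
    command = blastall_command("blastp", "Delta.txt", query_file, "out.txt")
    print(" ".join(command))
```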

4. Results:

Sequences producing significant alignments:    Score (bits)    E value

P83301|CXO_CONVE    69    1e-017
P69749|CXD6A_CONBU    20    0.009
P69750|CXD6A_CONCN    18    0.036
P24159|CXDB_CONTE P18511|CXDA_CONTE    18    0.042
P60179|CXD66_CONAA    17    0.066
P60513|CXD6A_CONER    17    0.11
P69751|CXD6E_CONCT P69748|CXD6A_CONAI    16    0.19
P69754|CXD6B_CONMA P69753|CXD6A_CONMA    14    0.56
P69752|CXD6B_CONER P58913|CXD6A_CONPU    14    0.62
P69756|CXD6D_CONMA P69755|CXD6C_CONMA    13    0.89
Q9XZK5|CXSO6_CONST P69757|CXD6A_CONSE    12    2.6
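When many searches are run in batch, the hit summary is usually filtered by E value in a script. The sketch below parses summary lines of the form "identifier score E-value" and keeps the hits under a cutoff. The helper names (parse_hits, significant) and the cutoff 0.01 are assumptions for illustration, and the three sample lines are the first three hits listed above.

```python
def to_float(text):
    """Convert a score or E-value field to float."""
    # Legacy BLAST may print very small E values as "e-100"; read that as 1e-100.
    return float("1" + text if text.startswith("e") else text)

def parse_hits(lines):
    """Parse 'identifier score e-value' lines into (identifier, score, evalue)."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            score, evalue = to_float(fields[-2]), to_float(fields[-1])
        except ValueError:
            continue  # not a hit line (for example a header)
        hits.append((" ".join(fields[:-2]), score, evalue))
    return hits

def significant(hits, max_evalue=0.01):
    """Keep hits whose E value does not exceed the cutoff."""
    return [hit for hit in hits if hit[2] <= max_evalue]

summary = [
    "P83301|CXO_CONVE     69   1e-017",
    "P69749|CXD6A_CONBU   20   0.009",
    "P69750|CXD6A_CONCN   18   0.036",
]
for identifier, score, evalue in significant(parse_hits(summary)):
    print(identifier, score, evalue)
```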

>P83301|CXO_CONVE Length = 33

Score = 69.3 bits (168), Expect = 1e-017, Method: Compositional matrix adjust.
Identities = 33/33 (100%)

Query: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33
         EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP
Sbjct: 1 EDCIAVGQLCVFWNIGRPCCSGLCVFACTVKLP 33

>P69749|CXD6A_CONBU Length = 27

Score = 20.0 bits (40), Expect = 0.009, Method: Compositional matrix adjust.
Identities = 13/30 (43%), Gaps = 6/30 (20%)

Query: 1 EDCIAVGQLCVFWNIGRP--CCSGLCVFAC 28
           C A G  C      RP  CCS  C FAC
Sbjct: 1 DECSAPGAFCLI----RPGLCCSEFCFFAC 26
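The Identities and Gaps figures are simple column counts over the gapped alignment, which can be checked with a short script. The sketch below is an illustration with a hypothetical helper name (alignment_stats). The gap characters in the two rows are a reconstruction: they are placed where the blank stretches appear in the alignment above, which is the placement that reproduces the reported counts.

```python
def alignment_stats(query_row, subject_row):
    """Count identical columns and gap columns in a pairwise alignment."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, subject_row))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gaps, len(pairs)

def percent(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)

# Rows of the second alignment, with gaps written as '-' (reconstructed).
query_row = "EDCIAVGQLCVFWNIGRP--CCSGLCVFAC"
subject_row = "DECSAPGAFCLI----RPGLCCSEFCFFAC"

identities, gaps, columns = alignment_stats(query_row, subject_row)
print(f"Identities = {identities}/{columns} ({percent(identities, columns)}%)")
print(f"Gaps = {gaps}/{columns} ({percent(gaps, columns)}%)")
# prints:
# Identities = 13/30 (43%)
# Gaps = 6/30 (20%)
```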

5. Pairwise alignment:

bl2seq -p blastp -i seq.txt -j 1.txt -o out.txt

Parameters:
1. -p: program name: blastp, blastn, ...
2. -i: first sequence
3. -j: second sequence
4. -o: output file

To see the other parameters, just type bl2seq.

6. Databases can be downloaded from:

ftp://ftp.ncbi.nih.gov/blast/db/

Scoring matrices can be downloaded from: ftp://ftp.ncbi.nih.gov/blast/matrices/

PSI-BLAST

Position-specific iterative BLAST (PSI-BLAST).

Altschul et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25(17):3389-3402.

Target: proteins only

PSI-BLAST refers to a feature of BLAST 2.0 in which a profile is automatically constructed from the first set of BLAST alignments. PSI-BLAST is similar to NCBI BLAST2 except that it uses position-specific scoring matrices derived during the search. The tool is used to detect distant evolutionary relationships.
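To make "position-specific scoring matrix" concrete, the sketch below derives per-column log-odds scores from a set of aligned sequences. It is a deliberately simplified illustration, not the PSI-BLAST procedure: it assumes a uniform background frequency of 1/20, a flat pseudocount of 1, and no sequence weighting, and the names build_pssm and score are hypothetical. The two aligned rows are the CCSG/CCSE fragment from the second alignment shown earlier.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    pssm = []
    for col in range(length):
        # Gap characters and unknown residues are ignored in the counts.
        column = [row[col] for row in aligned if row[col] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score(pssm, seq):
    """Sum the column scores of seq (standard residues only, same length as pssm)."""
    return sum(column[aa] for column, aa in zip(pssm, seq))

pssm = build_pssm(["CCSG", "CCSE"])
print(round(score(pssm, "CCSG"), 2), round(score(pssm, "AAAA"), 2))
```

Unlike a fixed substitution matrix, the score for a residue here depends on the column: C scores well in the first two positions because the aligned sequences agree there, which is what lets a profile pick up distant relatives in the next search round.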

Online sources: http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_psiblast.html

http://blast.ncbi.nlm.nih.gov/Blast.cgi

http://www.ebi.ac.uk/Tools/blastpgp/
