in-silico characterization of proteins

Upload: sumera120488

Post on 14-Apr-2018

  • 7/30/2019 In-silico characterization of proteins

    In-silico characterization of proteins

    BLAST: In bioinformatics, Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing

    primary biological sequence information, such as the amino-acid sequences of different proteins or the

    nucleotides of DNA sequences. A BLAST search enables a researcher to compare a query sequence

    with a library or database of sequences, and identify library sequences that resemble the query

    sequence above a certain threshold. Different BLAST programs are available according to the type of

    query sequence. For example, following the discovery of a previously unknown gene in the mouse, a

    scientist will typically perform a BLAST search of the human genome to see if humans carry a similar

    gene; BLAST will identify sequences in the human genome that resemble the mouse gene based on

    similarity of sequence. The BLAST program was designed by Eugene Myers, Stephen Altschul, Warren

    Gish, David J. Lipman, and Webb Miller at the NIH and was published in the Journal of Molecular

    Biology in 1990.
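
The idea of comparing a query against a library and keeping only hits above a threshold can be sketched in a few lines. The sketch below uses an exhaustive Smith-Waterman local alignment score, not the faster heuristic that BLAST itself applies, and the scoring values, sequence names and threshold are illustrative assumptions only.

```python
def smith_waterman_score(query, subject, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between two sequences.

    The match/mismatch/gap values are arbitrary illustrative choices.
    """
    rows, cols = len(query) + 1, len(subject) + 1
    # score[i][j] is the best local alignment score ending at
    # query[i - 1] and subject[j - 1]; it never drops below zero.
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = query[i - 1] == subject[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            best = max(best, score[i][j])
    return best


def search(query, library, threshold):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    hits = [(name, smith_waterman_score(query, seq)) for name, seq in library.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])


# Hypothetical toy library; real searches run against large sequence databases.
library = {
    "seq_a": "TTGACGTACGTTAG",
    "seq_b": "CCCCCCCCCC",
    "seq_c": "AAACGTACGAAA",
}
hits = search("ACGTACG", library, threshold=10)
for name, score in hits:
    print(name, score)
```

Sequences that contain a region closely resembling the query pass the threshold, while unrelated ones such as seq_b are filtered out.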

    CDD search: The Conserved Domain Database (CDD) is a protein annotation resource that consists of

    a collection of well-annotated multiple sequence alignment models for ancient domains and full-length

    proteins. These are available as position-specific score matrices (PSSMs) for fast identification of

    conserved domains in protein sequences via RPS-BLAST. CDD content includes NCBI-curated domains,

    which use 3D-structure information to explicitly define domain boundaries and provide insights

    into sequence/structure/function relationships, as well as domain models imported from a number of

    external source databases (Pfam, SMART, COG, PRK, TIGRFAM).
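
To illustrate what scoring a sequence against a position-specific score matrix means, here is a minimal sketch. The four-column matrix, its values and the default penalty are invented for illustration; they are not taken from CDD, and RPS-BLAST does considerably more than this sliding-window scan.

```python
# Hypothetical PSSM: one dict of scores per alignment column.
# Residues not listed in a column receive DEFAULT_SCORE.
PSSM = [
    {"G": 3, "A": 1},
    {"K": 4, "R": 2},
    {"S": 3, "T": 3},
    {"G": 2},
]
DEFAULT_SCORE = -2


def window_score(window):
    """Sum the per-column scores for a window as wide as the PSSM."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(PSSM, window))


def best_match(sequence):
    """Return (score, start) of the best-scoring window, or None if too short."""
    width = len(PSSM)
    if len(sequence) < width:
        return None
    scored = [
        (window_score(sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)


print(best_match("MAGKSGLLV"))
```

Because each column carries its own scores, the same residue can be rewarded at one position and penalised at another, which is what distinguishes a PSSM from a single substitution matrix.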

    PFAM: The Pfam database is a large collection of protein families, each represented by multiple

    sequence alignments and hidden Markov models (HMMs). Proteins are generally composed of one or

    more functional regions, commonly termed domains. Different combinations of domains give rise to

    the diverse range of proteins found in nature. The identification of domains that occur within proteins

    can therefore provide insights into their function. There are two components to Pfam: Pfam-A and

    Pfam-B. Pfam-A entries are high quality, manually curated families. Although these Pfam-A entries

    cover a large proportion of the sequences in the underlying sequence database, in order to give a

    more comprehensive coverage of known proteins, Pfam also generates a supplement using

    the ADDA database. These automatically generated entries are called Pfam-B. Although of lower

    quality, Pfam-B families can be useful for identifying functionally conserved regions when no Pfam-A

    entries are found. Pfam also generates higher-level groupings of related families, known as clans. A

    clan is a collection of Pfam-A entries which are related by similarity of sequence, structure or

    profile-HMM.

    TMHMM: TMHMM predicts transmembrane helices in proteins using a hidden Markov model. A variety of

    tools are available to predict the topology of transmembrane proteins. To date

    no independent evaluation of the performance of these tools has been published. A better

    understanding of the strengths and weaknesses of the different tools would guide both the biologist

    and the bioinformatician to make better predictions of membrane protein topology.

    SignalP: The SignalP 4.0 server predicts the presence and location of signal peptide cleavage sites in

    amino acid sequences from different organisms: Gram-positive prokaryotes, Gram-negative

    prokaryotes, and eukaryotes. The method incorporates a prediction of cleavage sites and a signal

    peptide/non-signal peptide prediction based on a combination of several artificial neural networks.

    STRING: STRING is a database of known and predicted protein interactions. The interactions include

    direct (physical) and indirect (functional) associations; they are derived from four sources:

    genomic context, high-throughput experiments, coexpression, and previous knowledge. STRING

    quantitatively integrates interaction data from these sources for a large number of organisms, and

    transfers information between these organisms where applicable. The database currently covers

    5214234 proteins from 1133 organisms.
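
As a rough illustration of what quantitatively integrating several evidence sources can look like, the sketch below combines per-source confidence values under an independence assumption, as 1 minus the product of (1 - score). This is a simplification chosen for illustration; STRING's actual scoring scheme is more involved and is not reproduced here, and the source names and values are made up.

```python
def combine_scores(channel_scores):
    """Combine evidence scores in [0, 1], assuming the sources are independent.

    Returns 1 - prod(1 - s): the chance that at least one source is right.
    """
    remaining = 1.0
    for source, s in channel_scores.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score for {source} must lie in [0, 1], got {s}")
        remaining *= 1.0 - s
    return 1.0 - remaining


# Hypothetical evidence for one protein pair.
evidence = {
    "genomic_context": 0.30,
    "experiments": 0.60,
    "coexpression": 0.20,
    "previous_knowledge": 0.50,
}
print(round(combine_scores(evidence), 3))
```

The combined value is never lower than the strongest single source, so weak but concordant evidence from several sources adds up.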

    PROTPARAM: ProtParam is a tool which allows the computation of

    various physical and chemical parameters for a given protein stored in Swiss-Prot or TrEMBL, or for a

    user-entered sequence. The computed parameters include the molecular weight, theoretical pI, amino

    acid composition, atomic composition, extinction coefficient, estimated half-life, instability index,

    aliphatic index and grand average of hydropathicity (GRAVY).
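
Three of these parameters are simple enough to compute by hand. The sketch below derives amino acid composition, GRAVY (the mean Kyte-Doolittle hydropathy value per residue) and the aliphatic index (mole percent of Ala + 2.9 x Val + 3.9 x (Ile + Leu)) for a user-entered sequence. It is an independent illustration, not ProtParam's own code, and the function names are invented here.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(sequence):
    """Return the mole percent of each residue present in the sequence."""
    counts = Counter(sequence)
    return {res: 100.0 * n / len(sequence) for res, n in counts.items()}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy value per residue."""
    return sum(KYTE_DOOLITTLE[res] for res in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index computed from the mole percent of Ala, Val, Ile and Leu."""
    pct = composition(sequence)
    return (
        pct.get("A", 0.0)
        + 2.9 * pct.get("V", 0.0)
        + 3.9 * (pct.get("I", 0.0) + pct.get("L", 0.0))
    )


seq = "AVIL"  # toy sequence; non-standard residues would raise KeyError in gravy()
print(composition(seq))
print(gravy(seq))
print(aliphatic_index(seq))
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value a hydrophilic one.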

    PROSITE: Search your query sequence for protein motifs by rapidly comparing it

    against all patterns stored in the PROSITE pattern database, which can help infer the

    function of an uncharacterised protein. This tool requires a protein sequence as input, but DNA/RNA

    may be translated into a protein sequence using transeq and then queried.
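
PROSITE patterns use a compact syntax (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats) that maps closely onto regular expressions. The sketch below shows one way to do that conversion and scan a sequence; the function names are invented for illustration, and rarer syntax such as an anchor inside square brackets is not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex = pattern.rstrip(".")            # patterns end with a period
    regex = regex.replace("-", "")         # element separator
    regex = regex.replace("x", ".")        # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # (2,4) = repeat count
    regex = regex.replace("<", "^").replace(">", "$")   # terminus anchors
    return regex


def find_motifs(sequence, pattern):
    """Return (start, matched_text) for every match, including overlapping ones."""
    # The lookahead lets matches that overlap each other all be reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern, scanned against a made-up sequence.
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_motifs("MKNGSAANPSL", "N-{P}-[ST]-{P}."))
```

Short patterns like this one match by chance fairly often, so a hit is a hint to follow up rather than proof of function.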

    InterPro: InterPro is an integrated database of predictive protein signatures used for the

    classification and automatic annotation of proteins and genomes. InterPro classifies sequences at

    superfamily, family and subfamily levels, predicting the occurrence of functional domains, repeats and

    important sites. InterPro adds in-depth annotation, including GO terms, to the protein signatures.
