class 3 2009 european resources protein focused. protein databases ebi – european bioinformatics...
Post on 21-Dec-2015
Protein information
• Name & description
• Gene encoded from
• Organism
• Function (only one?)
• Enzyme?
• Ligands?
• PTMs?
• Interactions?
• Biological processes.
• Structure.
• Sequence.
• Localization
• More...
Protein DB - short history
Pre-UniProt
Swiss-Prot: created in July 1986; since 1987, a collaboration of the SIB and the EMBL/EBI.
TrEMBL: created at the EBI in 1996 as a computer-annotated protein sequence database supplementing Swiss-Prot. It was introduced to cope with the increased data flow from genome projects.
The three-layered approach
The UniProt Archive (UniParc)
• UniProtKB + all other publicly available protein sequences
• Completeness
The UniProt Reference Clusters (UniRef)
• Non-redundant views of UniProtKB + selected UniParc sets
• Speed
The UniProt Knowledgebase (UniProtKB)
• Central database of annotated protein sequences and functional information
• UniProtKB/Swiss-Prot + UniProtKB/TrEMBL
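The archive layer can be pictured as a store that keeps each distinct sequence once and records every source entry that carries it. The following is a minimal sketch of that idea only, not UniParc's actual implementation; the function name `build_archive` and the example records are hypothetical.

```python
def build_archive(records):
    """Group source entries by their exact protein sequence.

    records: iterable of (source_db, source_id, sequence) tuples.
    Returns a dict mapping each distinct sequence to the list of
    (source_db, source_id) pairs that carry it.
    """
    archive = {}
    for source_db, source_id, sequence in records:
        # Normalize: drop whitespace and ignore case so formatting
        # differences do not create spurious duplicates.
        key = "".join(sequence.split()).upper()
        archive.setdefault(key, []).append((source_db, source_id))
    return archive


# Hypothetical input: two databases report the same sequence.
example = [
    ("db_a", "id_1", "MKTAYIAKQR"),
    ("db_b", "id_2", "mktayiakqr"),
    ("db_a", "id_3", "MKTAYIAKQK"),
]
archive = build_archive(example)
print(len(archive))  # number of distinct sequences stored
```

Keying on the sequence itself keeps the archive complete (every source entry is still listed) while storing each sequence only once.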
Protein DBs
• Swiss-Prot – manually annotated.
• TrEMBL – translated EMBL, automatically annotated.
• UniProtKB – The UniProt Knowledgebase
• UniParc – The UniProt Archive
• PIR – Protein Information Resource
• UniRef – The UniProt Reference Clusters
• PDB – Protein Data Bank – structure
• PRIDE – Resource for experimental proteomics (not in this class)
Protein Names
Different DBs – different accessions

DB                              Accessions
TrEMBL                          P12345
Swiss-Prot (to be changed..)    MAPK_HUMAN
RefSeq                          NP_123456, XP_123456
UniRef                          UniRef100_P99999, UniRef90_P99999, UniRef50_P99999
Ensembl                         ENSP00000123456
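Because each database uses its own identifier format, the format alone often tells you where an identifier came from. The sketch below guesses the source from the shapes shown above. The function name `classify_accession` is hypothetical, the patterns cover only the examples on this slide (for instance, only human-style `ENSP` Ensembl identifiers and `NP_`/`XP_` RefSeq prefixes), and a UniProtKB accession pattern cannot tell Swiss-Prot from TrEMBL.

```python
import re

# (label, pattern) pairs; the first matching pattern wins, so the more
# specific formats are listed before the more general ones.
ACCESSION_PATTERNS = [
    ("UniRef cluster", re.compile(r"^UniRef(100|90|50)_\w+$")),
    ("Ensembl protein", re.compile(r"^ENSP\d{11}$")),
    ("RefSeq protein", re.compile(r"^(NP|XP)_\d+(\.\d+)?$")),
    ("UniProtKB entry name", re.compile(r"^[A-Z0-9]{1,5}_[A-Z0-9]{1,5}$")),
    (
        "UniProtKB accession",
        re.compile(
            r"^([OPQ][0-9][A-Z0-9]{3}[0-9]"
            r"|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
        ),
    ),
]


def classify_accession(identifier):
    """Return a label for the identifier's format, or 'unknown'."""
    for label, pattern in ACCESSION_PATTERNS:
        if pattern.match(identifier):
            return label
    return "unknown"


for example in ["P12345", "MAPK_HUMAN", "NP_123456", "UniRef90_P99999",
                "ENSP00000123456"]:
    print(example, "->", classify_accession(example))
```

A format check like this only suggests the source database; it does not confirm that the identifier exists there.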
More in UniProt: a complete annotated protein sequence database
UniProt: The Universal Protein Resource for protein sequences.
UniProt Archive (UniParc): A non-redundant archive of protein sequences extracted from public databases; it contains only protein sequences.
UniProt/UniRef: Clusters similar sequences to yield a representative subset of sequences, which makes searches much faster.
UniProt/UniMES: A repository specifically developed for metagenomic and environmental data.
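The clustering idea behind UniRef (similar sequences collapse onto one representative, so a search scans fewer sequences) can be illustrated with a toy greedy procedure. This is a sketch under strong assumptions, not the method UniRef uses: `greedy_cluster` and `similarity` are hypothetical names, and the similarity score is a crude stand-in from Python's `difflib` rather than an alignment-based sequence identity.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(sequences, threshold):
    """Greedily cluster sequences around representatives.

    Sequences are visited longest first. Each one joins the first cluster
    whose representative is at least `threshold` similar; otherwise it
    starts a new cluster and becomes its representative.
    Returns a list of (representative, members) tuples.
    """
    clusters = []
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if similarity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:
            # No existing representative was similar enough.
            clusters.append((seq, [seq]))
    return clusters


# Hypothetical sequences: the first two differ in one residue.
toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGGGGGGGG"]
for representative, members in greedy_cluster(toy, threshold=0.8):
    print(representative, len(members))
```

Searching only the representatives and then expanding the matching clusters is what makes a clustered set faster to query than the full database.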
PIR – Protein Information Resource
• PIRSF: Protein Family Classification System
• iProClass: Integrated Protein Knowledgebase
• iProLINK: Integrated Protein Literature, Information and Knowledge