uniprot
Protein Sequence Database: UniProt. Jennifer McDowall. Overview: the UniProt databases; UniProt/SwissProt annotation; UniProt/TrEMBL automatic annotation; using the uniprot.org website; computational access.
![Page 1: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/1.jpg)
Protein Sequence Database: UniProt
Jennifer McDowall
EBI is an Outstation of the European Molecular Biology Laboratory.
![Page 2: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/2.jpg)
Overview
1) The UniProt databases
2) UniProt/SwissProt annotation
3) UniProt/TrEMBL automatic annotation
4) Using the uniprot.org website
5) Computational access
![Page 3: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/3.jpg)
1) The UniProt databases
![Page 4: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/4.jpg)
Source of protein sequence data
Submitters: individual scientists, large-scale sequencing projects, patent offices
• Nucleotide sequencing → submit → nucleotide sequence database → derive protein sequence → protein sequence database
• Protein sequencing → submit → protein sequence database
• Protein sequencing is rare
• Most protein sequence is derived from nucleotide data
![Page 5: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/5.jpg)
Protein sequence is mainly derived data
DNA sequence: ACGCTCGTACGCATCGTCACTACTAGCTACGACGACGACACGCTACTACTCGACGATTCT
↓ transcribe
Derived mRNA sequence: AUGCGUAGUGAUGAAUGCUGCUGUGCGAUGAGCUGC
↓ translate
Derived protein sequence: MRSDECCCAMSC
↓ submit
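The transcribe and translate steps on this slide can be sketched in a few lines of Python. This is only an illustration: the codon table is deliberately limited to the codons in the slide's mRNA example plus the stop codons, the coding-strand DNA string is back-derived from that mRNA for the demo (it is not the DNA string shown on the slide), and the function names are hypothetical. Under the standard genetic code GAU encodes aspartate (D), so the example mRNA translates to MRSDECCCAMSC.

```python
# Partial codon table: only the codons used in the slide's example mRNA,
# plus the three stop codons. A real table has 64 entries.
CODON_TABLE = {
    "AUG": "M", "CGU": "R", "AGU": "S", "GAU": "D", "GAA": "E",
    "UGC": "C", "UGU": "C", "GCG": "A", "AGC": "S",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_dna: str) -> str:
    """Turn a coding-strand DNA sequence into mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Assumption: coding strand back-derived from the slide's mRNA example.
mrna = transcribe("ATGCGTAGTGATGAATGCTGCTGTGCGATGAGCTGC")
protein = translate(mrna)
print(mrna)     # AUGCGUAGUGAUGAAUGCUGCUGUGCGAUGAGCUGC
print(protein)  # MRSDECCCAMSC
```

Nothing in this sketch checks whether the start codon, reading frame or stop is real, which is why a derived sequence is a prediction rather than direct evidence.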
![Page 6: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/6.jpg)
Protein sequence is mainly derived data
Predicted start, predicted splice sites and predicted stop: may not have direct evidence
![Page 7: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/7.jpg)
How to find the information you need?
High quality protein sequence
• Non-redundant data
• Splice isoforms, disease variants, PTMs
• Sequence archiving essential
Protein identification
• Stable identifiers
• Consistent nomenclature
Protein annotation
• Information on protein function, biological processes, molecular interactions, pathways
![Page 8: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/8.jpg)
UniProt
Since 2002 a merger and collaboration of three databases: Swiss-Prot, TrEMBL and PIR-PSD
Funded mainly by NIH (US) to be the highest quality, most thoroughly annotated protein sequence database
http://www.uniprot.org/
![Page 9: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/9.jpg)
UniProt Consortium
![Page 10: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/10.jpg)
Where does the data come from?
Sequence sources: ENA (exchange data daily)
UniParc
![Page 11: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/11.jpg)
Where does the data come from?
Sequence sources: ENA, model organisms, PDB, RefSeq, Ensembl, VEGA, patents, more…
UniParc: history of sequences
UniMES: metagenomic & environmental
UniProtKB/TrEMBL: taxonomy known
UniProtKB/SwissProt: remove redundancy, manual annotation; high quality annotation
![Page 12: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/12.jpg)
Where does the data come from?
Sequence sources: ENA, model organisms, PDB, RefSeq, Ensembl, VEGA, patents, more…
UniParc
UniMES (metagenomic & environmental); UniMES Clusters
UniProtKB/TrEMBL and UniProtKB/SwissProt (taxonomy known); UniRef Clusters
![Page 13: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/13.jpg)
4 components of UniProt
UniProtKB
• Swiss-Prot: non-redundant, manual annotation
• TrEMBL: redundant, automatic annotation
UniRef
• Combines sequences (speeds up searching)
• UniRef100, UniRef90, UniRef50
UniParc
• Complete history of sequences (no annotation)
• Cross-links to external sequence sources
UniMES
• Sequences from metagenomic projects
![Page 14: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/14.jpg)
Browsing a UniParc entry
Accession
Sequence
List of databases containing sequence
Navigate to individual entries
Deleted entries identified (greyed out)
Download data
![Page 15: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/15.jpg)
Browsing a UniProtKB/SwissProt entry
General information
Names (synonyms) and taxonomy
Protein attributes
Annotation
Ontologies
Protein interactions
Splice variants
Sequence features
Sequence
References
Navigate to external data sources, e.g. Ensembl
Download data
![Page 16: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/16.jpg)
Browsing a UniRef90 entry
Cluster name
List of entries in cluster
Taxonomy of each entry
% identity of sequences in cluster
Status (SwissProt and/or TrEMBL)
Faster and more sensitive sequence search with no loss of information
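To make the "% identity" figure concrete, here is a minimal sketch of how percent identity between two already-aligned sequences could be computed and compared with the UniRef100/90/50 thresholds. It is an illustration only, not the UniRef clustering procedure: real UniRef clusters are built with dedicated clustering software and additional overlap criteria, and the function names below are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' is a gap)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # Skip columns where both sequences have a gap.
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def uniref_levels(identity: float) -> list[str]:
    """Names of the UniRef levels whose identity threshold this pair meets."""
    thresholds = (("UniRef100", 100.0), ("UniRef90", 90.0), ("UniRef50", 50.0))
    return [name for name, cutoff in thresholds if identity >= cutoff]


identity = percent_identity("MRSDECCCAM", "MRSNECCCAM")  # one mismatch in ten
print(identity, uniref_levels(identity))
```

A pair that differs at one position in ten meets the 90% and 50% thresholds but not the 100% one, so it could share a UniRef90 cluster without being merged at the UniRef100 level.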
![Page 17: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/17.jpg)
Taxonomic distribution of species
All kingdoms:
• Bacteria (61%)
• Eukaryota (32%)
• Archaea (4%)
• Viruses (3%)
Within Eukaryota:
• Other mammals (27%)
• Fungi (18%)
• Viridiplantae (18%)
• Homo (12%)
• Other Vertebrata (10%)
• Other (8%)
• Insecta (5%)
• Nematoda (2%)
![Page 18: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/18.jpg)
SwissProt – most represented species
Mainly model organisms
![Page 19: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/19.jpg)
Protein Existence tag

| Protein existence level | Total |
| --- | --- |
| Evidence at protein level | 13% |
| Evidence at transcript level | 12% |
| Inferred from homology | 70% |
| Predicted | 5% |
| Uncertain (mainly TrEMBL) | - |

!! Not sequence validation !!
![Page 20: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/20.jpg)
Protein existence categories

| Protein existence level | Human |
| --- | --- |
| Evidence at protein level | 59% |
| Evidence at transcript level | 37.5% |
| Inferred from homology | 1% |
| Predicted | 0.5% |
| Uncertain (mainly TrEMBL) | 2% |

!! Not sequence validation !!
![Page 21: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/21.jpg)
2) UniProtKB/SwissProt annotation
![Page 22: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/22.jpg)
Annotation sources for UniProtKB
UniProtKB
* Manual curation
* Literature-based annotation
* Sequence analysis: transmembrane prediction, InterPro classification, signal prediction, other predictions
* Automated annotation
Protein classification
Some data sources for annotation
• PRIDE: protein identification data
• GO: functional info
• InterPro: protein families and domains
• IntAct: molecular interactions
• IntEnz: enzymes
• HAMAP: microbial protein families
• RESID: post-translational modifications
![Page 23: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/23.jpg)
Features of UniProtKB
Sequence
Annotations
Nomenclature
References
Ontologies
Splice variants
Sequence features
![Page 24: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/24.jpg)
A wealth of external links
125 links!
![Page 25: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/25.jpg)
SwissProt manual annotation
1. Protein sequence
• Merge available CDS (coding sequence)
• Annotate sequence discrepancies
• Report sequencing errors...
2. Biological information
• Extract literature information
• Orthologue data propagation
• Protein sequence analysis...
![Page 26: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/26.jpg)
Problem #1: sequence correction
~20% of Swiss-Prot entries required correction
• Typical problems:
– Unsolved conflicts (sequencing errors)
– Erroneous gene model predictions
– Wrong initiation sites
– Frameshifts...
![Page 27: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/27.jpg)
Sequence quality from genome projects
• Drosophila:
  • Well-curated
  • 1.8% of gene models incorrect
• Arabidopsis:
  • Annotated when sequenced, but no update
  • 19.5% of gene models incorrect
• Tetraodon nigroviridis:
  • Automatic run through (no manual intervention)
  • >90% of gene models incorrect
![Page 28: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/28.jpg)
Sequence curation
Sequencing errors
Other examples of sequencing errors include: premature stop codons, read-throughs, erroneous initiator methionines
![Page 29: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/29.jpg)
Problem #2: proteome complexity
1 SwissProt entry = 1 gene (1 species)
Genome: ~20,000 human protein-coding genes
↓ alternative splicing, alternative initiation, mRNA editing...
Transcriptome: ~100,000 human transcripts
↓ post-translational modification
Proteome: >1,000,000 human proteins
Annotation of sequence differences
![Page 30: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/30.jpg)
Merging entries
Multiple entries for the same protein exist in TrEMBL (redundancy) because of:
1) Errors
• Erroneous gene model predictions; sequence errors
2) Natural variation
• Polymorphisms; alternative start sites; alternative splicing
Apart from 100% identical sequences, all merged sequences are analyzed by a curator so they can be annotated accordingly.
![Page 31: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/31.jpg)
Example
Multiple alignment of the end of the available GCR sequences:
Annotation of the sequence differences (protein diversity):
![Page 32: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/32.jpg)
Merging entries
![Page 33: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/33.jpg)
Sequence curation
Alternative Splicing
![Page 34: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/34.jpg)
Sequence curation
Alternative Splicing
![Page 35: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/35.jpg)
Sequence curation
Alternative Splicing
![Page 36: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/36.jpg)
Sequence curation
Alternative Splicing
![Page 37: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/37.jpg)
Sequence curation
Alternative Splicing
![Page 38: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/38.jpg)
Sequence curation
Identification of amino acid variants
...and of PTMs
...and also
![Page 39: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/39.jpg)
Sequence curation
Domain annotation
Binding sites
![Page 40: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/40.jpg)
SwissProt manual annotation
1. Protein sequence
• Merge available CDS (coding sequence)
• Annotate sequence discrepancies
• Report sequencing errors...
2. Biological information
• Extract literature information
• Orthologue data propagation
• Protein sequence analysis...
![Page 41: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/41.jpg)
Sources of annotated information
UniProtKB/SwissProt gathers information from multiple sources:
• Publications (literature/PubMed)
• Prediction programs (PROSITE, Anabelle)
• Contact with experts
• Other databases
• Nomenclature committees
![Page 42: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/42.jpg)
Nomenclature
Synonyms useful for literature searching
![Page 43: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/43.jpg)
Nomenclature
Provides synonyms and cleavage products of bifunctional proteins
![Page 44: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/44.jpg)
Annotation comments
Controlled vocabularies used whenever possible…
>30 comment fields
![Page 45: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/45.jpg)
Disease association
Mendelian Inheritance in Man provides information on genetic disease associations
Pharmacogenomics database
![Page 46: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/46.jpg)
Sequence annotation (Features)
…enable researchers to obtain a summary of what is known about a protein…
![Page 47: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/47.jpg)
Sequence annotation (Features)
Feature (e.g. domain) highlighted on sequence
![Page 48: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/48.jpg)
Gene Ontology
1. Biological Process
A commonly recognized series of events
• Cell division
• Mitosis
• Organelle fission
2. Molecular Function
An elemental activity or task or job
• Protein kinase activity
• Insulin binding
• Insulin receptor activity
3. Cellular Component
Where a gene product is located
• Mitochondrion
• Mitochondrial matrix
• Mitochondrial membrane
![Page 49: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/49.jpg)
Gene Ontology
Annotation for human Rhodopsin:
![Page 50: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/50.jpg)
Imported annotation
Binary interactions are taken from the database
Interactors of human p53
![Page 51: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/51.jpg)
Evidence for annotation
UniProtKB/Swiss-Prot distinguishes between experimental and predicted data

| Type of evidence | Evidence tag |
| --- | --- |
| 1st: Experimental evidence | Reference provided |
| 2nd: Light experimental evidence | Probable |
| 3rd: Inferred by similarity with homologous protein | By similarity |
| 4th: Inferred by sequence prediction | Potential |
![Page 52: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/52.jpg)
Evidence for annotation
Proven
Proven
Potential
Proven
By similarity
![Page 53: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/53.jpg)
Source references included
![Page 54: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/54.jpg)
Versioning and archiving
![Page 55: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/55.jpg)
Versioning and archiving
Able to compare versions directly
![Page 56: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/56.jpg)
Versioning and archiving
![Page 57: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/57.jpg)
3) UniProtKB/TrEMBL automatic annotation
![Page 58: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/58.jpg)
UniProtKB/TrEMBL
!! Caution !!
Quality of UniProtKB/TrEMBL entries depends upon quality of submissions in original EMBL/GenBank/DDBJ entry.
![Page 59: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/59.jpg)
Annotated proteins guide TrEMBL entries
Example for rhodopsin:
• 379 annotated UniProtKB/Swiss-Prot entries
• 9,186 un-annotated UniProtKB/TrEMBL entries
Don’t want un-annotated TrEMBL entries to be skeleton entries with no information
Automatic annotation added using Swiss-Prot and InterPro (function prediction database)
![Page 60: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/60.jpg)
Automatic annotation
UniProtKB uses 2 prediction programs:
• UniRule: maintains a set of manual annotation rules.
• SAAS: generates a set of decision trees using data mining (new set every UniProtKB release).
Inputs: InterPro, Swiss-Prot
![Page 61: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/61.jpg)
Automatic annotation - InterPro
Swiss-Prot: manually annotated sequence
InterPro: protein signatures; groups of related proteins (same family or share domains)
TrEMBL: uncharacterised sequence
Automatic annotation pipeline
![Page 62: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/62.jpg)
Browsing a UniProtKB/TrEMBL entry
Name (could be clone name)
Taxonomy
Automatic annotation (derived from InterPro)
Ontologies (both automatic and manual curation)
![Page 63: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/63.jpg)
4) Using the www.uniprot.org website
![Page 64: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/64.jpg)
www.uniprot.org
Useful Features
Integrated BLAST and Alignments
Batch retrieval in a variety of formats
Simple and modular advanced searching
![Page 65: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/65.jpg)
uniprot.org: anatomy of an entry
Entry Info
Link to UniSave
Link to UniRef
Variety of formats
Navigation bar
Customize order
![Page 66: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/66.jpg)
![Page 67: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/67.jpg)
Searching UniProt
Search tools include:
• Text search
• BLAST sequence search
• Additional search engines through EBI (e.g. SSearch and FASTA)
http://www.uniprot.org/
![Page 68: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/68.jpg)
Search
Powerful text search tool with autocompletion and refinement options
Look for UniProt entries and documentation using biological information
![Page 69: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/69.jpg)
Search
Search sequence database, literature, taxonomy…
More search options
![Page 70: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/70.jpg)
Search
Refine search
![Page 71: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/71.jpg)
Search results
![Page 72: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/72.jpg)
Search results
Define type and order of search results
![Page 73: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/73.jpg)
Search results
Each result linked to the UniProt entry
Entry status shown: SwissProt or TrEMBL
Select specific entries
![Page 74: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/74.jpg)
Search results
Can retrieve or BLAST sequence
Keeps selected entries throughout session
![Page 75: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/75.jpg)
Search results
Can retrieve or align >2 sequences
![Page 76: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/76.jpg)
BLAST
A tool with standard options for searching the UniProt databases by sequence similarity
Search refinement (change parameters)
![Page 77: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/77.jpg)
BLAST
Can query using protein or nucleotide sequences
![Page 78: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/78.jpg)
BLAST
Can query using identifier:
• UniProtKB accession (P00750)
• Specific version (P00750:2)
• Splice variant (P00750-2)
• Name (A4_HUMAN)
• UniParc accession (UPI0000000001)
• UniRef accession (UniRef100_P00750)
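The identifier styles listed above differ enough in shape that a script can tell them apart with regular expressions. The sketch below is an illustration, not part of the UniProt tooling: the accession pattern follows the accession-number format published in the UniProt help pages, while the entry-name, UniParc and UniRef patterns are approximations assumed for this example, and `classify` is a hypothetical helper name.

```python
import re

# UniProtKB accession format as documented in the UniProt help pages.
ACCESSION = (
    r"(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
)

# Checked in order; the remaining patterns are approximations for illustration.
PATTERNS = [
    ("UniRef accession", re.compile(r"UniRef(?:100|90|50)_\w+")),
    ("UniParc accession", re.compile(r"UPI[0-9A-F]{10}")),
    ("specific version", re.compile(ACCESSION + r":\d+")),
    ("splice variant", re.compile(ACCESSION + r"-\d+")),
    ("UniProtKB accession", re.compile(ACCESSION)),
    ("entry name", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
]


def classify(identifier: str) -> str:
    """Return the label of the first pattern that matches the whole identifier."""
    for label, pattern in PATTERNS:
        if pattern.fullmatch(identifier):
            return label
    return "unknown"


for example in ("P00750", "P00750:2", "P00750-2", "A4_HUMAN",
                "UPI0000000001", "UniRef100_P00750"):
    print(example, "->", classify(example))
```

Matching the whole string with `fullmatch` keeps `P00750-2` from being reported as a plain accession.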
![Page 79: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/79.jpg)
BLAST
Threshold = expectation (E) value
Provides cut-off between good and poor hits
Hit categories: best; should verify; biological significance less likely
![Page 80: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/80.jpg)
BLAST
Matrix = assigns probability score for each position
Controls sensitivity of search
![Page 81: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/81.jpg)
BLAST
Filtering = masks low complexity regions
Stretches of cysteines or hydrophobic regions can cause spurious matches
Replaces them with X’s
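The idea of masking low-complexity regions with X can be shown with a simplified stand-in. BLAST uses its own dedicated filtering programs; the sketch below instead assumes a plain sliding-window Shannon entropy test, with a window size and entropy threshold chosen arbitrarily for illustration and hypothetical function names.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    total = len(window)
    counts = Counter(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 2.2) -> str:
    """Replace every residue inside a low-entropy window with 'X'.

    window and min_entropy are illustrative values, not BLAST defaults.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)


example = "ACDEFGHIKLMN" + "C" * 12  # varied stretch followed by a cysteine run
print(mask_low_complexity(example))
```

The run of cysteines is replaced with X while the start of the varied stretch is left intact, so the repetitive region can no longer produce spurious matches.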
![Page 82: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/82.jpg)
BLAST
Gapped = allows gaps in sequence
• Yes = to find more distant homologues
• No = to find closest matches (strict)
![Page 83: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/83.jpg)
BLAST
Hits = limits number of results
![Page 84: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/84.jpg)
BLAST results
Can filter or customize results
![Page 85: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/85.jpg)
BLAST results
Shows length of query sequence aligned
Select match to see alignment
![Page 86: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/86.jpg)
BLAST results – pairwise alignment
Alignment of selected sequence
![Page 87: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/87.jpg)
BLAST results – pairwise alignment
Colour alignment by annotation or properties
![Page 88: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/88.jpg)
BLAST results
Further down the results page… details about matching protein sequences
![Page 89: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/89.jpg)
BLAST results
Can align checked sequences
![Page 90: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/90.jpg)
BLAST results – multiple alignment
Alignment of selected sequence
Can add additional sequences to alignment
![Page 91: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/91.jpg)
BLAST results – multiple alignment
Colour alignment by annotation or properties
![Page 92: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/92.jpg)
Align
ClustalW multiple alignment tool with amino acid highlighting options and a feature annotation highlighting option
![Page 93: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/93.jpg)
Retrieve
UniProt-specific tool:
- retrieve a list of entries in several standard formats.
- then query retrieved sequences with UniProt search tool.
![Page 94: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/94.jpg)
ID Mapping
Allows mapping between different databases for a given protein
![Page 95: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/95.jpg)
Other tools
Sequence Similarity & Analysis
http://www.ebi.ac.uk/
![Page 96: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/96.jpg)
Other tools
BLAST
FASTA
specialized searches
http://www.ebi.ac.uk/Tools/sss/
![Page 97: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/97.jpg)
5) Computational access
![Page 98: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/98.jpg)
Computational access to UniProt
http://www.uniprot.org/
![Page 99: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/99.jpg)
Computational access to UniProt
http://www.ebi.ac.uk/uniprot/
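As a rough idea of what computational access can look like, the sketch below builds request URLs and parses FASTA text in Python. It rests on assumptions not stated in the slides: it targets the REST service currently hosted at rest.uniprot.org rather than the URLs shown here, the field names in `search_url` are assumed return fields, and all function names are hypothetical. The network call is kept in a separate `fetch` function so the rest can be tried offline.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: base URL of the current UniProt REST service.
BASE = "https://rest.uniprot.org"


def entry_url(accession: str, fmt: str = "fasta") -> str:
    """URL for a single UniProtKB entry in the given format."""
    return f"{BASE}/uniprotkb/{accession}.{fmt}"


def search_url(query: str, fields=("accession", "id", "protein_name"), size: int = 10) -> str:
    """URL for a text search returning tab-separated results (fields are assumed names)."""
    params = urlencode({"query": query, "format": "tsv",
                        "fields": ",".join(fields), "size": size})
    return f"{BASE}/uniprotkb/search?{params}"


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch(url: str) -> str:
    """Download a URL and decode it as UTF-8 text (requires network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Example (needs network): parse_fasta(fetch(entry_url("P00750")))
```

Keeping URL construction and parsing separate from the download makes each piece easy to check on its own before running a batch retrieval.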
![Page 100: UniProt](https://reader033.vdocuments.us/reader033/viewer/2022051401/5681436a550346895dafe8fc/html5/thumbnails/100.jpg)
Acknowledgements
Rolf Apweiler
Ioannis Xenarios
Cathy H Wu
+100 annotators