Prosite and UCSC Genome Browser, Exercise 3: Protein motifs and Prosite
Posted on 21-Dec-2015
Turning information into knowledge
The outcome of a sequencing project is masses of raw data.
The challenge is to turn this raw data into biological knowledge.
A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be processed.
From sequence to function
Nature tends to innovate rather than invent.
Proteins are composed of functional elements: domains and motifs.
Domains are structural units that carry out a certain function. The same domains are shared between different proteins.
Motifs are shorter sequences with a certain biological activity.
What is a motif?
A sequence motif = a certain sequence that is widespread and conjectured to have biological significance.
Examples:
KDEL – ER-lumen retention signal
PKKKRKV – an NLS (nuclear localization signal)
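Looking for an exactly defined motif such as KDEL or PKKKRKV is plain substring search. A minimal sketch: the function name `find_motif` and the toy sequences are hypothetical, chosen only for illustration. It reports every 1-based start position, including overlapping occurrences.

```python
def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 1-based start position of every exact occurrence of motif."""
    positions = []
    start = sequence.find(motif)                 # 0-based index, or -1 if absent
    while start != -1:
        positions.append(start + 1)              # report 1-based positions
        start = sequence.find(motif, start + 1)  # +1 also allows overlapping hits
    return positions


# Toy sequences, not real proteins.
print(find_motif("AAKDELKDEL", "KDEL"))          # [3, 7]
print(find_motif("MAPKKKRKVEDP", "PKKKRKV"))     # [3]
```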
More loosely defined motifs
KDEL (usually) + HDEL (rarely) = [HK]-D-E-L: H or K at the first position
This is called a pattern (in biology), or a regular expression (in computer science).
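In Python's `re` module the loose motif [HK]-D-E-L becomes the regular expression `[HK]DEL`: the square brackets keep their meaning and the dashes are simply dropped. A small sketch; the function name and the toy sequence are hypothetical.

```python
import re

# [HK] = H or K at the first position; D, E and L are fixed.
LOOSE_ER_MOTIF = re.compile(r"[HK]DEL")


def loose_motif_hits(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in LOOSE_ER_MOTIF.finditer(sequence)]


print(loose_motif_hits("AKDELGGHDELR"))  # [(2, 'KDEL'), (8, 'HDEL')]
```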
Syntax of a pattern
Example: W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]
Patterns
W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]
x(9,11): any amino acid, between 9 and 11 times
[FYV]: F or Y or V
Example sequence: WOPLASDFGYVWPPPLAWSROPLASDFGYVWPPPLAWSWOPLASDFGYVWPPPLSQQQ
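The example pattern can be translated by hand into a Python regular expression and scanned across the example sequence. This is a sketch, not how Prosite itself scans; the function name `find_hits` is hypothetical. The pattern is wrapped in a lookahead `(?=(...))` so that a hit starting inside an earlier hit is still reported.

```python
import re

# W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE] translated by hand:
# x(9,11) -> .{9,11}, x(6,7) -> .{6,7}, brackets unchanged, dashes dropped.
# The lookahead makes each match zero-width, so overlapping hits are found.
PATTERN = re.compile(r"(?=(W.{9,11}[FYV][FYW].{6,7}[GSTNE]))")


def find_hits(sequence: str) -> list[tuple[int, int, str]]:
    """Return (1-based start, 1-based end, matched text) for each start position."""
    hits = []
    for m in PATTERN.finditer(sequence):
        hit = m.group(1)
        start = m.start() + 1
        hits.append((start, start + len(hit) - 1, hit))
    return hits


sequence = "WOPLASDFGYVWPPPLAWSROPLASDFGYVWPPPLAWSWOPLASDFGYVWPPPLSQQQ"
for hit in find_hits(sequence):
    print(hit)
# (1, 19, 'WOPLASDFGYVWPPPLAWS')
# (18, 38, 'WSROPLASDFGYVWPPPLAWS')
```

Without the lookahead, `finditer` resumes after the end of each match and would report only the first hit, because the second one starts at position 18, inside the first. For each start position only one match length is reported.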
Patterns - syntax
The standard IUPAC one-letter codes for amino acids.
‘x’ : any amino acid.
‘[]’ : residues allowed at the position.
‘{}’ : residues forbidden at the position.
‘()’ : repetition of a pattern element is indicated in parentheses: X(n) or X(n,m) gives the number or range of repetitions.
‘-’ : separates each pattern element.
‘<’ : indicates an N-terminal restriction of the pattern.
‘>’ : indicates a C-terminal restriction of the pattern.
‘.’ : the period ends the pattern.
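The syntax rules above map almost one-to-one onto Python regular expressions. A minimal converter, assuming well-formed input; `prosite_to_regex` is a hypothetical helper, not a Prosite or Biopython function. It handles residues, ‘x’, ‘[]’, ‘{}’, ‘(n)’/‘(n,m)’, ‘-’, ‘<’, ‘>’ and the final ‘.’, but not special cases such as ‘>’ inside square brackets.

```python
import re

# One pattern element: a residue letter, 'x', [allowed] or {forbidden},
# optionally followed by a repetition count (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")      # the final period only ends the pattern
    prefix = suffix = ""
    if pattern.startswith("<"):                 # N-terminal restriction
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):                   # C-terminal restriction
        suffix, pattern = "$", pattern[:-1]
    pieces = []
    for element in pattern.split("-"):          # '-' separates pattern elements
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            piece = "."                         # any amino acid
        elif residue.startswith("{"):
            piece = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            piece = residue                     # a fixed residue or an [allowed] set
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        pieces.append(piece)
    return prefix + "".join(pieces) + suffix


print(prosite_to_regex("W-x(9,11)-[FYV]-[FYW]-x(6,7)-[GSTNE]."))
# W.{9,11}[FYV][FYW].{6,7}[GSTNE]
```

The result can be passed straight to `re.search` or `re.finditer`.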
Profile-pattern-consensus
Multiple alignment:
AACTTG
AAGTCG
CACTTC
Consensus: AACTTG (NANTNN when non-conserved positions are written as N)
Pattern: [AC]-A-[GC]-T-[TC]-[GC]
Profile (fraction of each base at each position):
     1     2     3     4     5
A    0.66  1     0     0     ...
T    0     0     0     1     ...
C    0.33  0     0.66  0     ...
G    0     0     0.33  0     ...
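All three summaries can be derived from the same three aligned sequences (AACTTG, AAGTCG, CACTTC). A sketch with hypothetical function names; within a pattern the order of letters in brackets does not matter, so `[CG]` here is the same element as `[GC]` on the slide, and the slide's 0.66 is 2/3 truncated.

```python
from collections import Counter

ALIGNMENT = ["AACTTG", "AAGTCG", "CACTTC"]


def columns(seqs):
    """Count the bases in each alignment column."""
    return [Counter(col) for col in zip(*seqs)]


def consensus(seqs):
    """Majority consensus: the most frequent base in each column."""
    return "".join(c.most_common(1)[0][0] for c in columns(seqs))


def strict_consensus(seqs):
    """Fully conserved columns keep their base; all others become N."""
    return "".join(next(iter(c)) if len(c) == 1 else "N" for c in columns(seqs))


def pattern(seqs):
    """PROSITE-style pattern: every base seen in a column is allowed there."""
    elements = []
    for c in columns(seqs):
        bases = sorted(c, key=lambda b: (-c[b], b))   # most frequent first
        elements.append(bases[0] if len(bases) == 1 else "[" + "".join(bases) + "]")
    return "-".join(elements)


def profile(seqs, alphabet="ATCG"):
    """Fraction of each base in each column."""
    n = len(seqs)
    return [{b: c[b] / n for b in alphabet} for c in columns(seqs)]


print(consensus(ALIGNMENT))         # AACTTG
print(strict_consensus(ALIGNMENT))  # NANTNN
print(pattern(ALIGNMENT))           # [AC]-A-[CG]-T-[TC]-[GC]
print([round(f, 2) for f in profile(ALIGNMENT)[0].values()])  # [0.67, 0.0, 0.33, 0.0]
```

The pattern only records which bases are allowed, while the profile keeps how often each one occurs.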
http://www.expasy.ch/prosite/
Prosite
A method for determining the function of uncharacterized translated protein sequences
Database of annotated protein families and functional sites, as well as associated patterns and profiles to identify them
Prosite entries are represented with patterns or profiles
Pattern: [AC]-A-[GC]-T-[TC]-[GC]
Profile: a table of per-position base frequencies (A, T, C, G), as in the profile-pattern-consensus example
Profiles are used in Prosite when the motif is relatively divergent and therefore difficult to represent as a pattern.
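Unlike a pattern, which either matches or not, a profile gives every window of a sequence a score, so a divergent sequence can still score well. The sketch below is a simplified position-specific scoring matrix (log-odds against an assumed uniform background, with a pseudocount), built from the toy alignment AACTTG, AAGTCG, CACTTC. It is not Prosite's own generalized-profile format, which also scores insertions and deletions; the function names are hypothetical.

```python
import math
from collections import Counter

ALIGNMENT = ["AACTTG", "AAGTCG", "CACTTC"]


def log_odds_matrix(seqs, alphabet="ACGT", pseudocount=1.0):
    """Per-column log2(observed frequency / background frequency)."""
    background = 1 / len(alphabet)        # assumption: uniform base composition
    total = len(seqs) + pseudocount * len(alphabet)
    matrix = []
    for column in zip(*seqs):
        counts = Counter(column)
        # The pseudocount keeps unseen bases at a finite (negative) score.
        matrix.append({b: math.log2((counts[b] + pseudocount) / total / background)
                       for b in alphabet})
    return matrix


def best_window(matrix, sequence):
    """Slide the matrix along the sequence; return (1-based start, score) of the best window."""
    width = len(matrix)
    best = None
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(col[base] for col, base in zip(matrix, window))
        if best is None or score > best[1]:
            best = (i + 1, score)
    return best


matrix = log_odds_matrix(ALIGNMENT)
print(best_window(matrix, "GGGAACTTGGGG")[0])  # 4, the window AACTTG
```

A window with one unusual base is penalized rather than rejected outright, which is what makes profiles suited to divergent motifs.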
Scanning Prosite
Query: sequence; result: all patterns found in the sequence
Query: pattern; result: all sequences that adhere to this pattern
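The two query directions can be sketched against a toy in-memory "database". The entry names below are hypothetical labels, not real Prosite accessions, and the regular expressions are the motifs discussed earlier in this exercise.

```python
import re

# Toy stand-in for a motif database (hypothetical names, not Prosite entries).
TOY_PATTERNS = {
    "toy_ER_retention": r"[HK]DEL",                        # [HK]-D-E-L
    "toy_NLS": r"PKKKRKV",                                 # PKKKRKV
    "toy_W_motif": r"W.{9,11}[FYV][FYW].{6,7}[GSTNE]",     # W-x(9,11)-...-[GSTNE]
}


def scan_sequence(sequence: str) -> list[str]:
    """Query with a sequence: which patterns occur in it?"""
    return [name for name, regex in TOY_PATTERNS.items() if re.search(regex, sequence)]


def scan_pattern(regex: str, sequences: dict[str, str]) -> list[str]:
    """Query with a pattern: which sequences contain at least one match?"""
    return [seq_id for seq_id, seq in sequences.items() if re.search(regex, seq)]


toy_db = {"seq1": "MAPKKKRKVGGKDEL", "seq2": "MSTAAGHDEL", "seq3": "MSTAAG"}
print(scan_sequence(toy_db["seq1"]))     # ['toy_ER_retention', 'toy_NLS']
print(scan_pattern(r"[HK]DEL", toy_db))  # ['seq1', 'seq2']
```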
Prosite sequence query
Prosite profile
Prosite profile sequence logo
Sequence logo
WebLogo
http://weblogo.berkeley.edu/logo.cgi
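In a sequence logo, the height of each column is its information content: log2 of the alphabet size (2 bits for DNA) minus the Shannon entropy of the column, and each letter's height is its frequency times that value. A sketch on the toy alignment AACTTG, AAGTCG, CACTTC; the function names are hypothetical, and the small-sample correction that logo tools may apply is omitted.

```python
import math
from collections import Counter

ALIGNMENT = ["AACTTG", "AAGTCG", "CACTTC"]


def information_content(seqs, alphabet_size=4):
    """Bits per column: log2(alphabet size) minus the column's Shannon entropy."""
    result = []
    for column in zip(*seqs):
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        result.append(math.log2(alphabet_size) - entropy)
    return result


def letter_heights(seqs, alphabet_size=4):
    """Height of each letter in a logo column: its frequency times the column's bits."""
    heights = []
    for column, bits in zip(zip(*seqs), information_content(seqs, alphabet_size)):
        counts = Counter(column)
        heights.append({b: counts[b] / len(column) * bits for b in counts})
    return heights


print([round(x, 2) for x in information_content(ALIGNMENT)])
# [1.08, 2.0, 1.08, 2.0, 1.08, 1.08]
```

Fully conserved columns reach the 2-bit maximum; for proteins the maximum is log2(20), about 4.32 bits.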
Searching Prosite with a sequence
Patterns with a high probability of occurrence: entries describing commonly found post-translational modifications or compositionally biased regions.
These are found in the majority of known protein sequences (high probability of occurrence).
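Why do short, loose patterns turn up in most proteins? Assuming, unrealistically, that all 20 amino acids are equally frequent and independent, the chance of a match at one position is the product of the per-element probabilities, and the expected number of hits is that chance times the number of positions. A sketch with hypothetical function names; it handles single residues, x, [...] and {...}, but not repetition counts such as x(2,4). The pattern [ST]-x-[RK] is only an illustrative short example.

```python
def element_probability(element: str) -> float:
    """Chance that one random residue satisfies a single pattern element."""
    if element == "x":
        return 1.0                              # any amino acid
    if element.startswith("["):
        return (len(element) - 2) / 20          # allowed residues
    if element.startswith("{"):
        return (20 - (len(element) - 2)) / 20   # all but the forbidden residues
    return 1 / 20                               # one fixed residue


def expected_hits(pattern: str, protein_length: int) -> float:
    """Expected number of random matches in a protein of the given length."""
    elements = pattern.split("-")
    p = 1.0
    for e in elements:
        p *= element_probability(e)
    positions = protein_length - len(elements) + 1
    return max(positions, 0) * p


print(round(expected_hits("[ST]-x-[RK]", 400), 2))  # 3.98
print(round(expected_hits("[HK]-D-E-L", 400), 4))  # 0.005
```

A three-element pattern is expected several times in a 400-residue protein by chance alone, whereas [HK]-D-E-L is not.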
Searching Prosite with a pattern
Prosite pattern query
Searching Prosite with a Prosite AC (accession number)
UCSC Genome Browser
Reset all settings of the previous user
UCSC Genome Browser - Gateway
UCSC Genome Browser query results
UCSC Genome Browser annotation tracks
Vertebrate conservation
mRNA (GenBank)
RefSeq
UCSC Genes
Base position
Single species compared
SNPs
Repeats
Direction of transcription (<)
CDS
Intron
UTR
UCSC Gene
UCSC Genome Browser - movement
Zoom x3 + Center
UCSC Genome Browser – Base view
Annotation track options
dense
squish
full
pack
Annotation track options
Another option to toggle between ‘pack’ and ‘dense’ view is to click on the track title.
Sickle-cell anemia distribution
Malaria distribution
BLAT
BLAT = BLAST-Like Alignment Tool
BLAT is designed to find sequences of >95% similarity for DNA and >80% similarity for protein.
Rapid search by indexing the entire genome.
Good for:
1. Finding genomic coordinates of cDNA
2. Determining exons/introns
3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
4. Finding upstream regulatory regions
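The idea behind "rapid search by indexing the entire genome" can be sketched as a k-mer index: record once where every word of length k occurs in the genome, then look up the words of a query and count hits that fall on the same diagonal (genome offset minus query offset). This is a toy illustration of that idea under simplifying assumptions, not BLAT's implementation, whose index and alignment-extension stages are more elaborate; the function names and sequences are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the genome to its 0-based start positions."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def candidate_offsets(query: str, index, k: int) -> Counter:
    """Count k-mer hits per diagonal (genome start minus query start)."""
    diagonals = Counter()
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], []):
            diagonals[i - j] += 1
    return diagonals


genome = "TTTTACGTACGGATCCTTTT"     # toy "genome"
index = build_index(genome, k=4)
print(candidate_offsets("ACGGATCC", index, k=4).most_common(1))  # [(8, 5)]
```

All five query 4-mers land on diagonal 8, so the query most likely aligns starting at 0-based genome offset 8; a real tool would then extend and align around such seeds.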
BLAT on UCSC Genome Browser
BLAT Results
Match
Non-match (mismatch/indel)
Indel boundaries
BLAT Results on the browser
Getting DNA sequence of region
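The browser's position box uses the form chrom:start-end with 1-based, inclusive coordinates, so extracting a region from a sequence string means shifting the start by one. A sketch on a hypothetical toy chromosome (the helper names are also hypothetical), including the reverse complement for a gene on the minus strand:

```python
import re

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def parse_position(position: str) -> tuple[str, int, int]:
    """Parse a position such as 'chr1:1,001-1,010' (1-based, inclusive)."""
    m = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if m is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom, start, end = m.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


def get_region(chromosomes: dict[str, str], position: str, minus_strand: bool = False) -> str:
    """Return the DNA of a region; reverse-complement it for the minus strand."""
    chrom, start, end = parse_position(position)
    dna = chromosomes[chrom][start - 1:end]   # 1-based inclusive -> 0-based slice
    return dna.translate(COMPLEMENT)[::-1] if minus_strand else dna


toy_genome = {"chrT": "AAAACCCGGTTTT"}       # hypothetical toy chromosome
print(get_region(toy_genome, "chrT:5-9"))                     # CCCGG
print(get_region(toy_genome, "chrT:5-9", minus_strand=True))  # CCGGG
```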