![Page 1: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/1.jpg)
Protein loop classification using Artificial Neural Networks
Armando Vieira¹ and Baldomero Oliva²
¹ ISEP and Centro de Física Computacional, Coimbra, Portugal, www.defi.isep.ipp.pt/~asv
² Structural Bioinformatics Laboratory (GRIB), IMIM/Universitat Pompeu Fabra, Barcelona, Spain
![Page 2: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/2.jpg)
XXI: the century of BIO
![Page 3: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/3.jpg)
BIOINFORMATICS: joining two worlds apart
![Page 4: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/4.jpg)
Outline
Brief review of protein structure
Statement of the problem and why it is so hard
Data pre-processing, corrections, updates and beyond multiple alignments…
Neural Networks in protein structure prediction
HLVQ
Results and future work
![Page 5: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/5.jpg)
Proteins
All proteins are chains built from 20 kinds of amino acids
Not all chains of amino acids are proteins
Fold rapidly and repeatedly
Proteins are the machinery of life
Essential to all (known) organisms
![Page 6: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/6.jpg)
The Gist of it
Amino acid sequence
Physical structure
Function
![Page 7: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/7.jpg)
Typical globular protein
MEKKEFHIVAETGIHARPATLLVQTASLFNSDINLETLGKSVNLKSIMGVMSLGVGQGSDVTITVDGADEADGMAAIVETLQLQGLAQ
![Page 8: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/8.jpg)
Coarse-Grained Model
![Page 9: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/9.jpg)
+180
b b b p o M e e e
b b b p o M M e e
b b b p . l l s e
a a a T . l l g N
N a a a . U l g N
N a a a . U g g N
I a a a . G G G I
e F F F o e e e e
b b b p o e e e e
-180
-180 +180
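As a rough illustration of how such a coarse-grained map could be used, the sketch below looks up the letter for a given pair of backbone angles. It assumes (this is not stated on the slide) that the rows run from ψ = +180° at the top to ψ = -180° at the bottom, that the columns run from φ = -180° on the left to φ = +180° on the right, and that all nine bins per axis are equally wide (40°). The names `GRID` and `coarse_state` are hypothetical.

```python
# Letters copied row by row from the map above.
# Assumption: top row is psi near +180, left column is phi near -180.
GRID = [
    "bbbpoMeee",
    "bbbpoMMee",
    "bbbp.llse",
    "aaaT.llgN",
    "Naaa.UlgN",
    "Naaa.UggN",
    "Iaaa.GGGI",
    "eFFFoeeee",
    "bbbpoeeee",
]


def coarse_state(phi, psi):
    """Return the map letter for angles phi, psi given in degrees (-180..180)."""
    n = len(GRID)
    width = 360.0 / n  # assumed uniform bin width: 40 degrees
    col = min(int((phi + 180.0) // width), n - 1)
    row_from_bottom = min(int((psi + 180.0) // width), n - 1)
    # Row 0 of GRID is the top of the map, so flip the vertical index.
    return GRID[n - 1 - row_from_bottom][col]


if __name__ == "__main__":
    print(coarse_state(-60, -45))   # typical helical angles
    print(coarse_state(-120, 130))  # typical extended (strand) angles
```

Under these assumptions, typical helical angles land in an `a` cell and typical extended angles in a `b` cell.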
![Page 10: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/10.jpg)
Ramachandran Alphabet
[Figure: Ramachandran plot with axes φ and ψ, each running from -180° to 180°, with regions labelled A, B, E and G.]
![Page 11: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/11.jpg)
5-letter alphabet
Residue Sequence
MEKKEFHIVAETGIHARPATLLVQTASLFNSDINLETLGKSVNLKSIMGVMSLGVGQGSDVTITVDGADEADGMAAIVETLQLQGLAQ...
3° Structure
ACCDECBAABDECBDABCDBEABDBCBDBAEBDBDBAEBABDCBBDBADDCBDBCBDBEBDBCBBDCAABDEDCDCEAABACAAAADC…
![Page 12: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/12.jpg)
What shall we do?
• Ab initio: Quantum Mechanics + big computers + large # configurations = huge problems…
• Machine Learning: Use known cases to learn a suitable map: sequence → structure
![Page 13: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/13.jpg)
Machine Learning Approach
![Page 14: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/14.jpg)
Artificial Neural Networks
• A problem-solving paradigm modeled after the physiological functioning of the human brain.
• Neurons in the brain are modeled by computational nodes, and synapses by weighted connections between them.
• The firing of a neuron is modeled by input, output, and threshold functions.
• The network “learns” based on problems to which answers are known (supervised learning).
• The network can then produce answers to entirely new problems of the same type.
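The supervised-learning idea in these bullets can be sketched with a single threshold unit trained by the classic perceptron rule. This is a toy illustration only, not the network used in this work; the logical-AND task, the function names and the integer learning step are all assumptions chosen to keep the example small.

```python
def fire(total):
    """Threshold function: the unit fires (1) only if its input is positive."""
    return 1 if total > 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs followed by the threshold function."""
    return fire(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, epochs=20):
    """Perceptron rule: nudge weights whenever the answer is wrong."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias


# Problems with known answers (supervised learning): logical AND.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, bias = train(AND_SAMPLES)
    for inputs, target in AND_SAMPLES:
        print(inputs, "->", predict(weights, bias, inputs), "expected", target)
```

A single unit like this can only separate classes with a straight line; hidden layers are what let a network represent more complicated maps.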
![Page 15: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/15.jpg)
Neural Networks
[Figure: network diagram with Input Layer, Hidden Layers and Output Layer.]
![Page 16: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/16.jpg)
Overfitting – high risk!
A less complicated hypothesis can have a lower error rate on unseen data
![Page 17: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/17.jpg)
Hidden Layer Vector Quantization (HLVQ)
[Figure: two panels, "Traditional NN" and "HLVQ", each showing two groups of points (x and o); the HLVQ panel also marks a point z.]
Main advantage: detect and correct prediction for outliers
![Page 18: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/18.jpg)
Loops, loops everywhere!!!
![Page 19: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/19.jpg)
Look for a loop…
![Page 20: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/20.jpg)
Geometry of the Motif
![Page 21: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/21.jpg)
Loop Types
β-α: β-strand - α-helix
α-α: α-helix - α-helix
α-β: α-helix - β-strand
β-hairpin: β-strand - β-strand
β-link: β-strand - β-strand
![Page 22: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/22.jpg)
Similar conformation: aa{b}aa / aa{p}aa
Identical geometry: (4,6)(0,45)(45,90)(180,225)
1.3.1 aa{p}aa
1.1.2 aa{b}aa
Pro 75%
Ser 75%
© Baldomero Oliva
![Page 23: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/23.jpg)
Class
![Page 24: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/24.jpg)
ArchDB database
~20 000 loops classified into ~3000 classes.
EE-3.4.1
Loop type - loop size . consensus . motif
TASK: classify a loop from sequence alone
If not possible, get as much information as possible
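The class label format given on this slide (loop type - loop size . consensus . motif) can be split into its parts with a few lines of code. This is a minimal sketch: the names `LoopClass` and `parse_loop_class` are hypothetical, and treating the three dotted fields as integers is an assumption based on the example label EE-3.4.1.

```python
from typing import NamedTuple


class LoopClass(NamedTuple):
    loop_type: str   # e.g. "EE"
    loop_size: int   # first dotted field
    consensus: int   # second dotted field
    motif: int       # third dotted field


def parse_loop_class(label):
    """Split a label such as 'EE-3.4.1' into type, size, consensus and motif."""
    loop_type, rest = label.split("-", 1)
    loop_size, consensus, motif = (int(part) for part in rest.split("."))
    return LoopClass(loop_type, loop_size, consensus, motif)


if __name__ == "__main__":
    print(parse_loop_class("EE-3.4.1"))
```

Keeping the fields separate makes it possible to predict the coarse parts of the label (for example the loop type alone) when the full class cannot be assigned.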
![Page 25: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/25.jpg)
Problems
• Coding of amino acids
• Huge search space, sparsely populated
• How to assign the loop classes?
• High dimensionality → Large Networks → poor generalization
![Page 26: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/26.jpg)
Amino acid coding: the classical way
A → (1, 0, …0)
C → (0, 1, …0)
Y → (0, 0, …1)
Useful but not efficient!!!
I am working to improve it…
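The classical coding shown on this slide is a one-hot (orthogonal) encoding: each residue becomes a 20-component vector with a single 1. A minimal sketch follows, assuming the components are ordered alphabetically by one-letter code (which matches A first, C second and Y last on the slide); the function names are hypothetical.

```python
# The 20 standard amino acids, alphabetical by one-letter code (assumed order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(residue):
    """Return a 20-component vector with a single 1 at the residue's position."""
    vector = [0] * len(AMINO_ACIDS)
    vector[AMINO_ACIDS.index(residue)] = 1
    return vector


def encode_sequence(sequence):
    """Concatenate the one-hot vectors of all residues into one input vector."""
    return [bit for residue in sequence for bit in one_hot(residue)]


if __name__ == "__main__":
    print(one_hot("A"))
    print(len(encode_sequence("MEKKEFHIVA")))  # 10 residues x 20 inputs each
```

The inefficiency is visible in the sizes: every residue costs 20 network inputs, of which 19 are zero, so the input dimension grows quickly with loop length.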
![Page 27: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/27.jpg)
Theory; but how about applications?!
![Page 28: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/28.jpg)
β-link and β-hairpins from sequence
HLVQ (MLP)       Predicted β-link   Predicted β-hairpin
Real β-link      88.4 (79.4)        11.6 (20.6)
Real β-hairpin   12.5 (16.1)        87.5 (83.9)
![Page 29: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/29.jpg)
Prediction of all loop types from sequence alone
        β-lk   α-β    β-hp   β-α    α-α
β-lk    45.9   28.5   3.7    19.8   2.1
α-β     8.8    67.4   1.2    18.0   4.6
β-hp    0.4    0.9    96.1   2.1    0.5
β-α     4.4    6.2    2.4    79.5   7.6
α-α     4.0    15.7   1.3    20.3   58.6
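To summarise a table like this one, the diagonal entries can be read as per-class success rates and averaged. The sketch below assumes (the slide does not say so explicitly) that rows are the real loop types, columns the predicted ones, and that the entries are row percentages; the variable and function names are hypothetical.

```python
LABELS = ["β-lk", "α-β", "β-hp", "β-α", "α-α"]

# Values copied from the table above; assumed rows = real, columns = predicted.
MATRIX = [
    [45.9, 28.5, 3.7, 19.8, 2.1],
    [8.8, 67.4, 1.2, 18.0, 4.6],
    [0.4, 0.9, 96.1, 2.1, 0.5],
    [4.4, 6.2, 2.4, 79.5, 7.6],
    [4.0, 15.7, 1.3, 20.3, 58.6],
]


def diagonal(matrix):
    """Per-class rate of correct predictions (the diagonal entries)."""
    return [row[i] for i, row in enumerate(matrix)]


def mean_diagonal(matrix):
    """Unweighted average of the per-class rates."""
    rates = diagonal(matrix)
    return sum(rates) / len(rates)


def hardest_class(matrix, labels):
    """Label of the class with the lowest diagonal entry."""
    rates = diagonal(matrix)
    return labels[rates.index(min(rates))]


if __name__ == "__main__":
    for label, rate in zip(LABELS, diagonal(MATRIX)):
        print(f"{label}: {rate:.1f}")
    print(f"mean: {mean_diagonal(MATRIX):.1f}")
    print("hardest:", hardest_class(MATRIX, LABELS))
```

With these numbers the unweighted mean of the diagonal is 69.5, and β-lk is the hardest class at 45.9. The average ignores how many loops each class contains, so it is not an overall accuracy.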
![Page 30: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/30.jpg)
What does it all mean?
Given a loop residue sequence, we can (usually) identify its native structure.
Not ab initio: We cannot tell the structure of a novel sequence.
HLVQ is superior to MLP
![Page 31: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/31.jpg)
Future Work
Better coding of amino acids
Larger sequences / low complexity
Going beyond structure
Clever alphabet that exploits similarities
Multiobjective Genetic Algorithms
![Page 32: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/32.jpg)
Beyond Multiple Alignments
• Alignments are good … but expensive and boring ...
• Information contained in a multiple alignment can, in principle, be expressed using an adequate amino acid coding scheme
• How? Sensibility
Genetic Algorithm
![Page 33: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/33.jpg)
Coded Amino Acids
Alanine (A) Arginine (R) Asparagine (N) Aspartic Acid (D) Cysteine (C)
Glutamic Acid (E) Glutamine (Q) Glycine (G) Histidine (H) Isoleucine (I)
Leucine (L) Lysine (K) Methionine (M) Phenylalanine (F) Proline (P)
Serine (S) Threonine (T) Tryptophan (W) Tyrosine (Y) Valine (V)
http://www.chemie.fu-berlin.de/chemistry/bio/
![Page 34: Barcelona sabatica](https://reader035.vdocuments.us/reader035/viewer/2022081413/546c9af6af795953298b4f6e/html5/thumbnails/34.jpg)
ArchDB database
Protein Data Bank (PDB), http://www.rcsb.org, contains ~25 000 proteins with known structure, versus ~10^6 entries in SWISS-PROT
ArchDB: ~20 000 classified loops