![Page 1: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/1.jpg)
Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction
Rajkumar Bondugula,
Ognen Duzlevski and Dong Xu
Digital Biology Laboratory, Dept. of Computer Science
University of Missouri – Columbia, MO 65211, USA
![Page 2: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/2.jpg)
Outline
Introduction
  Protein secondary structure prediction
  Popular methods
  K-Nearest Neighbor method
  Fuzzy K-Nearest Neighbor method
Methods
Filtering the prediction
Results and discussion
Summary and future work
![Page 3: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/3.jpg)
Introduction
Goal: Given a sequence of amino acids, predict which of the eight possible secondary structure states {H, G, I, B, E, C, S, T} each residue will fold into.
CASP convention (reduction to three states):
  {H, G, I} → H
  {B, E} → E
  {C, S, T} → C
Example:
  Amino acid sequence: VKDGYIVDXVNCTYFCGRNAYCNEECTKLXGEQWASPYYCYXLPDHVRTKGPGRCH
  Secondary structure: CEEEEEECCCCCCCCCCCHHHHHHHHHHCCCCEEEECCEEEEECCCCCCCCCCCCC
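The eight-to-three state reduction on this slide can be sketched as a lookup table. This is a minimal illustration; the names `EIGHT_TO_THREE` and `reduce_states` are hypothetical and not taken from the authors' software.

```python
# Eight-state to three-state reduction, following the mapping on the slide.
# EIGHT_TO_THREE and reduce_states are illustrative names (assumptions).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",   # helix-like states
    "B": "E", "E": "E",             # strand-like states
    "C": "C", "S": "C", "T": "C",   # everything else is coil
}


def reduce_states(eight_state: str) -> str:
    """Map a string of eight-state labels to three-state labels."""
    return "".join(EIGHT_TO_THREE[state] for state in eight_state)


print(reduce_states("HGIBECST"))  # HHHEECCC
```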
![Page 4: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/4.jpg)
Protein 3-Dimensional structure
![Page 5: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/5.jpg)
Importance of Secondary Structure
An intermediate step in 3D structure prediction
  Structure → function
Classification
  Ex: α, β, α/β, α+β
Helps in protein folding pathway determination
![Page 6: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/6.jpg)
Existing Methods
Popular methods
  Neural network methods (Ex: PSIPRED, PHD)
  Nearest neighbor methods (Ex: NNSSP)
  Hidden Markov Model methods
![Page 7: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/7.jpg)
Why K-Nearest Neighbors method?
Methods based on neural networks and hidden Markov models
  perform well if the query protein has many homologs in the sequence database
  are not easily expandable
The error rate of the 1-Nearest Neighbor rule is bounded above by twice the optimal Bayes error rate [Keller et al., 1985]
K-NN will work better and better as more structures are solved
![Page 8: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/8.jpg)
K-Nearest Neighbor Algorithm
[Figure: instances to be classified, shown among classified instances of class B and class F]
![Page 11: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/11.jpg)
K-Nearest Neighbor Algorithm
Advantages of nearest neighbor methods
  Simple and transparent model
  New structures can be added without re-training
  Linear complexity
Disadvantage
  Slower compared to other models, as processing is delayed until prediction is needed
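A crisp K-NN classifier of the kind described on these slides can be sketched in a few lines. This is an illustrative toy on 2-D points with squared Euclidean distance, not the profile-based method of the talk; `knn_classify` and the sample data are assumptions made for the example.

```python
import heapq
from collections import Counter


def knn_classify(query, labeled, k=3):
    """Crisp K-NN: majority vote among the k closest labeled points.

    labeled is a list of (point, label) pairs. Squared Euclidean distance
    is an assumption for this toy example.
    """
    def distance(item):
        point, _ = item
        return sum((a - b) ** 2 for a, b in zip(query, point))

    # No training step: all work happens at prediction time.
    nearest = heapq.nsmallest(k, labeled, key=distance)
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [
    ((0.0, 0.0), "B"), ((0.1, 0.2), "B"), ((0.2, 0.1), "B"),
    ((1.0, 1.0), "F"), ((0.9, 1.1), "F"), ((1.1, 0.9), "F"),
]
print(knn_classify((0.15, 0.15), training))  # B
```

Adding a new labeled instance only means appending to `training`, which mirrors the "no re-training" advantage listed above.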
![Page 12: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/12.jpg)
Why Fuzzy K-NN?
Disadvantages of crisp K-NN
  Atypical examples are given as much weight as those that truly represent a particular class
  Once an instance is assigned to a class, there is no indication of the "strength" of its membership in that class
![Page 13: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/13.jpg)
Sequence (padded at the start):
- - - N L G A G N S G L N L G H V A L T F

Windows of 7 residues, sliding one residue at a time:
- - - N L G A
- - N L G A G
- N L G A G N
N L G A G N S
L G A G N S G
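The sliding-window extraction shown on these slides can be sketched as follows. The window width of 7 and the `-` padding at the start follow the slides; padding the end of the sequence the same way is an assumption, and `sliding_windows` is a hypothetical helper name.

```python
def sliding_windows(sequence, width=7, pad="-"):
    """Return one window per residue, centered on that residue.

    Assumes an odd width. Both ends are padded with `pad`; the slides only
    show padding at the start, so the end padding is an assumption.
    """
    half = width // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + width] for i in range(len(sequence))]


windows = sliding_windows("NLGAGNSGLNLGHVALTF")
for window in windows[:5]:
    print(" ".join(window))
```

The first five windows printed match the five windows on the slides.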
![Page 19: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/19.jpg)
Position Specific Scoring Matrix
. . . 5 -2 -2 -2 -1 -1 -1 0 -2 -2 -2 -1 -1 -3 -1 1 0 -3 . . .
. . . -1 -3 -2 -2 -3 -2 -2 -3 -3 -3 -3 -1 -3 -4 8 -1 -2 -4 . . .
. . . 3 -2 -1 -2 -1 -2 -2 2 -2 -2 -2 -1 -2 -3 -2 0 3 -3 . . .
. . . 2 -3 -3 -3 -2 -2 -3 -2 -3 1 0 -2 0 4 -3 -1 -1 -2 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -2 4 4 3 . . .
. . . -1 -3 -4 -4 -1 -3 -3 -4 -4 2 0 -3 0 -1 -3 -2 -1 3 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -2 4 4 3 . . .
. . . -1 -3 -3 -2 -3 -2 -2 -3 -3 -3 -4 -2 -3 -4 8 -1 -2 4 . . .
. . . 4 -2 -1 -2 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -1 3 0 3 . . .
. . . 0 -1 0 -1 -1 -1 -1 -1 -2 -2 -3 -1 -2 -3 -1 4 4 3 . . .
. . . 0 -3 -1 -2 -3 -2 -3 6 -3 -4 -4 -2 -3 -4 -3 -1 -2 3 . . .
. . . 0 -3 -4 -4 -2 -3 -3 -3 -3 1 5 -3 1 0 -3 -2 -2 2 . . .
. . . 2 -1 0 -1 -1 -1 -1 -1 -2 -3 -3 -1 -2 -3 -1 5 1 3 . . .
. . . -2 -2 3 6 -4 -1 1 -2 -1 -4 -4 -1 -4 -4 -2 -1 -1 5 . . .
. . . 2 -3 -3 -3 -2 -2 -3 -2 -3 1 0 -2 0 4 -3 -1 -1 2 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -2 4 4 3 . . .
. . . -1 -3 -4 -4 -1 -3 -3 -4 -4 2 0 -3 0 -1 -3 -2 -1 3 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -2 4 4 3 . . .
. . . -1 -3 -3 -2 -3 -2 -2 -3 -3 -3 -4 -2 -3 -4 8 -1 -2 4 . . .
. . . 4 -2 -1 -2 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -1 3 0 3 . . .
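Rows: one per residue, for the length of the protein (l)
Columns: 20, one per amino acid (A R N D C Q E G H I L K M F P S T W Y V)
PSI-BLAST computes the matrix from the sequence . . . N L G A G N S G L N L G H V A L T F . . .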
![Page 20: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/20.jpg)
Why Profile-FKNN?
Evolutionary information has been shown to increase the accuracy of secondary structure prediction by many popular methods
An attempt to combine the advantages of incorporating the evolutionary information, fuzzy set theory and nearest neighbor methods
![Page 21: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/21.jpg)
Methods
Calculate profiles using PSI-BLAST
  The popular Rost and Sander database of 126 representative proteins (<25% sequence identity)
Find K-Nearest Neighbors
Calculate the membership values of the neighbors
Calculate the membership values of the current residue
Assign classes
Filter the output
![Page 22: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/22.jpg)
Profile Calculation
The profiles of both the query protein and the test protein are calculated using the program PSI-BLAST
Parameters for PSI-BLAST
  Expectation value (e) = 0.1
  Maximum number of passes (j) = 3
  E-value threshold for inclusion in multi-pass model (h) = 5
  Default values for the rest of the parameters
![Page 23: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/23.jpg)
K-Nearest Neighbors
For each profile-window in the query protein, the position-weighted absolute distance ‘d’ is calculated from all profile-windows of all proteins in the database.
The profile-windows corresponding to K smallest distances are retained as the K-Nearest Neighbors
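$$d = \sum_{i=1}^{20} \sum_{j=1}^{W} w_j \left| p^{\mathrm{Query}}_{ij} - p^{\mathrm{Database}}_{ij} \right|$$

where W is the window width, $p_{ij}$ is the profile score of amino acid i at window position j, and $w_j$ stands for the position weight of window position j (the slide expresses it through max and min terms in j and W).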
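A position-weighted absolute distance between two profile windows, and the retention of the K smallest distances, can be sketched as below. The position weights are passed in as a parameter because their exact form is not legible in this transcript; the function names and the tiny two-column example are assumptions made for illustration (real profile rows have 20 columns).

```python
import heapq


def window_distance(query, database, weights):
    """Sum of weighted absolute differences between two profile windows.

    query and database are lists of W rows of profile scores; weights holds
    one position weight per row (supplied by the caller, an assumption).
    """
    return sum(
        w * abs(q - d)
        for w, q_row, d_row in zip(weights, query, database)
        for q, d in zip(q_row, d_row)
    )


def k_nearest(query, database_windows, weights, k):
    """Keep the k database windows with the smallest distance to the query."""
    return heapq.nsmallest(
        k, database_windows,
        key=lambda window: window_distance(query, window, weights),
    )


# Toy windows: W = 3 positions, 2 score columns for brevity.
query = [[1, 0], [2, -1], [0, 3]]
other = [[0, 0], [2, 1], [1, 1]]
print(window_distance(query, other, weights=[1, 2, 1]))  # 8
```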
![Page 24: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/24.jpg)
Distance Calculation
. . . 5 -2 -2 -2 -1 -1 -1 0 -2 -2 2 . . .
. . . -1 -3 -2 -2 -3 -2 -2 -3 -3 -3 -3 . . .
. . . 3 -2 -1 -2 -1 -2 -2 2 -2 -2 -2 . . .
. . . 2 -3 -3 -3 -2 -2 -3 -2 -3 1 0 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 . . .
. . . -1 -3 -4 -4 -1 -3 -3 -4 -4 2 0 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 . . .
. . . -1 -3 -3 -2 -3 -2 -2 -3 -3 -3 -4. . .
. . . 4 -2 -1 -2 -1 -1 -1 -1 -2 -2 -2 . . .
. . . 0 -1 0 -1 -1 -1 -1 -1 -2 -2 -3 . . .
. . . 0 -3 -1 -2 -3 -2 -3 6 -3 -4 -4 . . .
. . . 0 -3 -4 -4 -2 -3 -3 -3 -3 1 5 . . .
. . . 2 -1 0 -1 -1 -1 -1 -1 -2 -3 -3 . . .
. . . -2 -2 3 6 -4 -1 1 -2 -1 -4 -4 . . .
. . . 2 -3 -3 -3 -2 -2 -3 -2 -3 1 0 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 . . .
. . . -1 -3 -4 -4 -1 -3 -3 -4 -4 2 0 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 . . .
. . . -1 -3 -3 -2 -3 -2 -2 -3 -3 -3 -4 . . .
. . . 4 -2 -1 -2 -1 -1 -1 -1 -2 -2 -2 . . .
. . . N L G A G N S G L T F . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -2 4 4 3 . . .
. . . -1 -3 -4 -4 -1 -3 -3 -4 -4 2 0 -3 0 -1 -3 -2 -1 3 . . .
. . . 2 -3 -3 -3 -2 -2 -3 -2 -3 1 0 -2 0 4 -3 -1 -1 -2 . . .
. . . 0 -3 -1 -2 -3 -2 -3 6 -3 -4 -4 -2 -3 -4 -3 -1 -2 3 . . .
. . . 0 -3 -4 -4 -2 -3 -3 -3 -3 1 5 -3 1 0 -3 -2 -2 2 . . .
. . . 2 -1 0 -1 -1 -1 -1 -1 -2 -3 -3 -1 -2 -3 -1 5 1 3 . . .
. . . -2 -2 3 6 -4 -1 1 -2 -1 -4 -4 -1 -4 -4 -2 -1 -1 5 . . .
. . . 2 -3 -3 -3 -2 -2 -3 -2 -3 1 0 -2 0 4 -3 -1 -1 2 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -2 4 4 3 . . .
. . . -1 -3 -4 -4 -1 -3 -3 -4 -4 2 0 -3 0 -1 -3 -2 -1 3 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -2 4 4 3 . . .
. . . -1 -3 -3 -2 -3 -2 -2 -3 -3 -3 -4 -2 -3 -4 8 -1 -2 4 . . .
. . . 4 -2 -1 -2 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -1 3 0 3 . . .
. . . 0 -1 0 -1 -1 -1 -1 -1 -2 -2 -3 -1 -2 -3 -1 4 4 3 . . .
. . . 5 -2 -2 -2 -1 -1 -1 0 -2 -2 -2 -1 -1 -3 -1 1 0 -3 . . .
. . . -1 -3 -2 -2 -3 -2 -2 -3 -3 -3 -3 -1 -3 -4 8 -1 -2 -4 . . .
. . . 3 -2 -1 -2 -1 -2 -2 2 -2 -2 -2 -1 -2 -3 -2 0 3 -3 . . .
. . . 0 -2 0 -1 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -2 4 4 3 . . .
. . . -1 -3 -3 -2 -3 -2 -2 -3 -3 -3 -4 -2 -3 -4 8 -1 -2 4 . . .
. . . 4 -2 -1 -2 -1 -1 -1 -1 -2 -2 -2 -1 -2 -3 -1 3 0 3 . . .
. . . N L G A G N S G L N L G H V A L T F . . .
![Page 30: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/30.jpg)
Membership Values of the Neighbors
The memberships of the nearest neighbors are assigned based on the secondary structures at the various positions in their windows
Residues near the center are weighted more than residues that are farther away
![Page 31: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/31.jpg)
Membership values of the Neighbors
Position weights:     0.067  0.133  0.20  0.20  0.20  0.133  0.067
Neighbor window:      N      L      G     A     G     N      S
Secondary structure:  C      C      E     E     E     C      C

H = 0
E = 0.20x1 + 0.20x1 + 0.20x1 = 0.6
C = 0.067x1 + 0.133x1 + 0.133x1 + 0.067x1 = 0.4

Center residue: A (secondary structure E)
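The worked example on this slide can be reproduced with a short sketch: each window position contributes its weight to the class of the residue at that position. The weights are the ones shown on the slide; `neighbor_membership` is a hypothetical name.

```python
# Position weights for a 7-residue window, as shown on the slide.
WEIGHTS = [0.067, 0.133, 0.20, 0.20, 0.20, 0.133, 0.067]


def neighbor_membership(structure, weights=WEIGHTS):
    """Membership of one neighbor window in classes H, E and C.

    structure is the three-state string of the window, e.g. "CCEEECC".
    """
    membership = {"H": 0.0, "E": 0.0, "C": 0.0}
    for state, weight in zip(structure, weights):
        membership[state] += weight
    return membership


m = neighbor_membership("CCEEECC")
print({state: round(value, 3) for state, value in m.items()})
# {'H': 0.0, 'E': 0.6, 'C': 0.4}
```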
![Page 32: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/32.jpg)
Membership Value
The membership values of each residue in the classes Helix, Sheet and Coil are calculated from the corresponding neighbors using the Fuzzy K-NN algorithm
Each residue is assigned to the class in which it has the highest membership value
Helix = . . . 15 22 61 91 95 96 26 21 23 18 29 30 24 17 5 8 . . .
Sheet = . . . 22 28 13 1 1 2 8 8 12 11 42 44 46 29 14 10 . . .
Coil = . . . 63 50 26 8 4 2 65 71 65 71 29 26 31 53 81 82 . . .
Final = . . . C C H H H H C C C C E E E C C C . . .
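The class assignment above is a per-residue argmax over the three membership values. A minimal sketch using the numbers from this slide (the function name `assign_classes` is an assumption; ties, which do not occur in this data, would go to whichever class comes first in "HEC"):

```python
# Membership values copied from the slide.
helix = [15, 22, 61, 91, 95, 96, 26, 21, 23, 18, 29, 30, 24, 17, 5, 8]
sheet = [22, 28, 13, 1, 1, 2, 8, 8, 12, 11, 42, 44, 46, 29, 14, 10]
coil = [63, 50, 26, 8, 4, 2, 65, 71, 65, 71, 29, 26, 31, 53, 81, 82]


def assign_classes(helix, sheet, coil):
    """Pick, for each residue, the class with the highest membership."""
    result = []
    for h, e, c in zip(helix, sheet, coil):
        scores = {"H": h, "E": e, "C": c}
        result.append(max("HEC", key=lambda state: scores[state]))
    return "".join(result)


print(assign_classes(helix, sheet, coil))  # CCHHHHCCCCEEECCC
```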
![Page 33: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/33.jpg)
Fuzzy K-Nearest neighbor Algorithm
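BEGIN
  Initialize i = 1.
  DO UNTIL (r assigned membership in all classes)
    Compute u_i(r) using the equation below.
    Increment i.
  END DO UNTIL
END

$$u_i(r) = \frac{\sum_{j=1}^{K} u_{ij} \left( 1 / d(r, r_j)^{2/(m-1)} \right)}{\sum_{j=1}^{K} \left( 1 / d(r, r_j)^{2/(m-1)} \right)}$$

Where,
u_i(r) = membership value of residue 'r' in class 'i'
u_ij = membership value of the jth neighbor in class 'i'
i = Helix, Sheet or Coil
d(r, r_j) = distance between the query window centered on residue 'r' and its jth neighbor
m = 2 (fuzzifier)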
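The membership formula of the Fuzzy K-NN algorithm can be sketched directly: each neighbor's class memberships are averaged with weights proportional to the inverse distance raised to 2/(m-1). The function name and the small `eps` guard against a zero distance are assumptions added for the example, not part of the slide.

```python
def fuzzy_knn_membership(neighbor_memberships, distances, m=2.0, eps=1e-9):
    """Membership of a query residue in each class from its K neighbors.

    neighbor_memberships: one dict per neighbor, e.g. {"H": 0, "E": 0.6, "C": 0.4}
    distances: distance from the query window to each neighbor
    eps: guard against division by zero (an assumption, not on the slide)
    """
    exponent = 2.0 / (m - 1.0)
    inverse = [1.0 / max(d, eps) ** exponent for d in distances]
    total = sum(inverse)
    classes = neighbor_memberships[0].keys()
    return {
        c: sum(u[c] * w for u, w in zip(neighbor_memberships, inverse)) / total
        for c in classes
    }


neighbors = [
    {"H": 0.0, "E": 0.6, "C": 0.4},
    {"H": 1.0, "E": 0.0, "C": 0.0},
]
u = fuzzy_knn_membership(neighbors, distances=[1.0, 2.0])
print({c: round(v, 2) for c, v in u.items()})
# {'H': 0.2, 'E': 0.48, 'C': 0.32}
```

With m = 2 the weights are 1/d², so the neighbor at distance 1 counts four times as much as the neighbor at distance 2.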
![Page 34: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/34.jpg)
Structure Filtration
In the basic setting, the secondary structure state is the class with the highest membership value
  Unrealistic structures may be present
Popular methods of structure filtration
  Neural network
  Heuristic based
![Page 35: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/35.jpg)
Heuristic Filter
1. Smooth the membership values: $m_n = 0.25\,m_{n-1} + 0.5\,m_n + 0.25\,m_{n+1}$
2. Filter unrealistic structures
   Helix > 3 amino acids, β-sheet > 2 amino acids
3. Calculate the thresholds to filter noise
4. Mark the possible Helix and Sheet regions
   Resolve conflicts based on the average membership value in the overlap region
5. Fill the rest of the structure with Coil
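The smoothing step of the heuristic filter is a three-point weighted average with coefficients 0.25, 0.5 and 0.25. A minimal sketch follows; leaving the first and last values unchanged is an assumption, since the slide does not say how the ends are handled, and `smooth` is a hypothetical name.

```python
def smooth(values):
    """Apply the 0.25 / 0.5 / 0.25 smoothing to a list of memberships.

    The two end values are copied unchanged (an assumption).
    """
    smoothed = list(values)
    for n in range(1, len(values) - 1):
        smoothed[n] = (
            0.25 * values[n - 1] + 0.5 * values[n] + 0.25 * values[n + 1]
        )
    return smoothed


print(smooth([0, 0, 100, 0, 0]))  # [0, 25.0, 50.0, 25.0, 0]
```

An isolated spike is spread over its neighbors, which is what suppresses one-residue helix or sheet predictions before the later steps run.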
![Page 36: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/36.jpg)
Filter: Final Structure
Unfiltered  CCCCCHCCCCCHHHHHHHHCCCCCCEEEEECCCCCCCCCCCCCEEEEEECCCCCCHHHCCCCC
Target      CCCHHHCCCCHHHHHHHHHHHCCCCEEEEEECCCCEECCCCCCEEEEEEECCCCEECCCCEEC
Filtered    CCHHHHCCCHHHHHHHHHHHHHCCCEEEEEECCCCCCCCCCCCEEEEEEECCCCCCCCCCCCC
![Page 37: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/37.jpg)
Metrics
Seven commonly used metrics

Q3 = (Number of correctly predicted residues / Total number of residues) x 100

Q<H,E,C> = (Number of <helix,sheet,coil> residues correctly predicted / Total number of residues in <helix,sheet,coil>) x 100

Matthews Correlation Coefficient

$$\mathrm{MCC}_{\langle H,E,C \rangle} = \frac{pn - uo}{\sqrt{(n+u)(n+o)(p+u)(p+o)}}$$

where p = true positives, n = true negatives, u = false negatives, o = false positives
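The metrics on this slide can be computed from a predicted and a target three-state string as sketched below. The function names and the short toy strings are assumptions; returning 0.0 when the MCC denominator is zero is also a convention chosen for this sketch.

```python
from math import sqrt


def q3(pred, target):
    """Percentage of residues predicted correctly over all three states."""
    return 100.0 * sum(p == t for p, t in zip(pred, target)) / len(target)


def q_state(pred, target, state):
    """Percentage of residues of one state that are predicted correctly."""
    total = sum(t == state for t in target)
    correct = sum(p == t == state for p, t in zip(pred, target))
    return 100.0 * correct / total


def mcc(pred, target, state):
    """Matthews correlation coefficient for one state."""
    p = n = u = o = 0
    for pr, tg in zip(pred, target):
        if pr == state and tg == state:
            p += 1          # true positive
        elif pr != state and tg != state:
            n += 1          # true negative
        elif tg == state:
            u += 1          # false negative (under-prediction)
        else:
            o += 1          # false positive (over-prediction)
    denominator = sqrt((n + u) * (n + o) * (p + u) * (p + o))
    return (p * n - u * o) / denominator if denominator else 0.0


target = "CCHHHHEECC"
pred = "CCHHHCEECC"
print(q3(pred, target))            # 90.0
print(q_state(pred, target, "H"))  # 75.0
print(round(mcc(pred, target, "H"), 3))
```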
![Page 38: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/38.jpg)
Results
| | Q3 (%) | QH (%) | QE (%) | QC (%) | MH | ME | MC |
|---|---|---|---|---|---|---|---|
| Unfiltered | 74.0 | 69.6 | 55.8 | 79.9 | 0.58 | 0.61 | 0.54 |
| Filtered | 76.2 | 68.1 | 66.1 | 80.4 | 0.64 | 0.64 | 0.56 |

Performance on a database of 1973 proteins (<25% sequence identity) generated by the PISCES server [1]
1. G. Wang and R. L. Dunbrack, Jr. PISCES: a protein sequence culling server. Bioinformatics, 19:1589-1591, 2003.
![Page 39: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/39.jpg)
Relative Performance
| Method | Accuracy |
|---|---|
| MBR [1] | 66.40 |
| NN [2] | 68.00 |
| NNSSP [3] | 72.20 |
| PFKNN | 76.20 |
1. X. Zhang, J. P. Mesirov and D.L Waltz. Hybrid system for Protein Secondary Structure Prediction. J. Mol. Biol., 225:1049-1063, 1992
2. Tau-Mu Yi and E. S. Lander. Protein Secondary Structure Prediction using Nearest-Neighbor Methods. J. Mol. Biol., 232:1117-1129, 1993
3. A. A. Salamov and V. V. Solovyev. Prediction of Protein Secondary Structure by Combining Nearest-neighbor Algorithm and Multiple Sequence Alignments. J. Mol. Biol., 247:11-15, 1995
![Page 40: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/40.jpg)
Summary
A novel approach for PSSP
  Evolutionary information
  K-Nearest Neighbor algorithm
  Fuzzy set theory
Most accurate KNN approach to date
Easily expandable
Accuracy increases with new structures
Average computing time < 1 min on a single-CPU machine
![Page 41: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/41.jpg)
Future Work
System with faster search capabilities
  Efficient search for neighbors
Accurate prediction system
![Page 42: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/42.jpg)
Acknowledgements
Dr. James Keller for insight into the Fuzzy K-Nearest Neighbor Algorithm
Oak Ridge National Laboratory for providing the supercomputing facilities
Members of Digital Biology Laboratory for their support
![Page 43: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/43.jpg)
Software
The enhanced version of the software is coded in C and is available upon request. Please e-mail your requests to
![Page 44: Profiles and Fuzzy K-Nearest Neighbor Algorithm for Protein Secondary Structure Prediction](https://reader036.vdocuments.us/reader036/viewer/2022081419/56814527550346895db1ee32/html5/thumbnails/44.jpg)
Thank you for
Participation!