Posted on 14-Dec-2015
Slide 1 of 38
T-cell EPITOPES PREDICTION OF
HEMAGGLUTININ, NEURAMINIDASE AND
MATRIX PROTEIN OF INFLUENZA A
VIRUS USING SUPPORT VECTOR
MACHINE AND HIDDEN MARKOV MODEL
Vo Cam Quy, Nguyen Thanh Khoi, Nguyen Thi Truc Minh, Tran Linh Thuoc
Department of Biotechnology
University of Natural Sciences
Vietnam National University – Ho Chi Minh City, Vietnam
Sixth International Conference on Bioinformatics (InCoB 2007), Hong Kong
Slide 2 of 38
OUTLINE
- Introduction: epitope prediction methods; Influenza A virus
- Materials and methods
- Results and discussion
- Conclusion and future work
Slide 3 of 38
Epitope in silico Analysis
[Diagram: Gene/Protein Sequence Database; Disease-related protein DB; Epitope prediction; Candidate Epitope DB; VACCINOME; Peptide / multi-epitope vaccines]
Slide 4 of 38
EPITOPE
An epitope is the part of a macromolecule that is recognized by the immune system, specifically by antibodies, B cells, or T cells.
Epitopes are most often described as three-dimensional surface features of an antigen molecule; linear epitopes, in contrast, are determined by the amino acid sequence.
Slide 5 of 38
EPITOPE PREDICTION STRATEGIES
Epitope prediction
- B cell epitope prediction: structure, chemical features
- T cell epitope prediction: sequence, structure
  - Binding motifs, matrices (quantitative matrices)
  - Statistical methods: Hidden Markov Model (flexible model)
  - Machine learning methods: Support Vector Machine, Artificial Neural Network… (high accuracy)
Slide 6 of 38
T CELL EPITOPE PREDICTION APPROACH
T cell epitope prediction
- Direct approach: positive = putative epitopes; negative = non-epitopes
- Indirect approach: positive = MHC-binding peptides (binders); negative = MHC-I non-binding peptides (non-binders)
The results of the two approaches are compared to give epitope candidates.
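One plausible reading of the comparison step is that only peptides called positive by both approaches are kept. That reading is an assumption, although it is consistent with the SVM results later in the deck, where the number of epitope candidates never exceeds either the number of MHC binders or the number of putative epitopes. A minimal sketch with toy peptides (hypothetical, not data from the study):

```python
def epitope_candidates(putative_epitopes, mhc_binders):
    """Return peptides predicted by BOTH the direct approach (putative
    epitopes) and the indirect approach (MHC binders), sorted."""
    return sorted(set(putative_epitopes) & set(mhc_binders))


# Toy peptide lists (hypothetical, for illustration only)
direct_hits = ["AAYKAAAAL", "CCYKCCCCL", "DDYKDDDDL"]
indirect_hits = ["CCYKCCCCL", "DDYKDDDDL", "EEYKEEEEL"]

print(epitope_candidates(direct_hits, indirect_hits))
# ['CCYKCCCCL', 'DDYKDDDDL']
```

A set intersection is the simplest rule that yields "predicted by both"; a stricter variant could also require matching positions in the source protein.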
Slide 7 of 38
Influenza A virus
Influenza A viruses continue to emerge from the aquatic avian reservoir and cause pandemics. The many variants and mutations in the virus population make vaccine production difficult.
http://www.roche.com/pages/facets/10/viruse.htm
Genome: single-stranded, negative-sense (−) RNA in 8 segments. Hemagglutinin, neuraminidase and matrix protein are three proteins of particular interest.
Red: M2 protein
Green: hemagglutinin
Blue: neuraminidase
Inside: viral RNA
Slide 8 of 38
OBJECTIVE
- Building HMM and SVM models for T cell epitope prediction (MHC class I and II) with the direct approach (epitope prediction) and the indirect approach (MHC binder prediction), then combining the results to obtain epitope candidates
- Epitope prediction of Influenza A virus proteins for in silico vaccine design
Slide 9 of 38
METHODS
1. Data collection and processing: raw data collected from AntiJen, MHCBN and IEDB are processed into training sets.
2. Building model: SVM method and HMM method.
3. Parameters optimization: the models are trained and evaluated to select the optimal model.
4. Applying: the optimal models predict epitopes in the proteins; epitopes predicted by both methods / both approaches were considered as epitopes.
Slide 10 of 38
RESULTS OF DATA COLLECTION AND PROCESSING
MHC class | Allele | Indirect: positive data set (binder) | Indirect: negative data set (non-binder) | Direct: positive data set (epitope) | Direct: negative data set (non-epitope)
I | H-2-Db | 452 | 335 | 160 | 344
I | H-2-Kb | 446 | 413 | 219 | 465
I | H-2-Kd | 170 | 74 | 208 | 91
II | H-2-IAd | 411 | 143 | 179 | 195
II | H-2-IEd | 274 | 41 | 199 | 85
II | H-2-IEk | 326 | 28 | 166 | 96
24 data sets (6 alleles × 2 approaches × positive/negative)
Slide 12 of 38
Step 2: BUILDING MODEL – HMM METHOD
Positive training set → ClustalW → Perl script → modelfromalign → initial model
Result: 11 matrices × 6 alleles × 2 approaches = 132 initial models
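The count of 132 initial models follows from crossing the 11 scoring matrices shown on the validation chart (BLOSUM62 to BLOSUM90, PAM50 to PAM90) with the 6 alleles and the 2 approaches. The sketch below simply enumerates that grid; the list names are illustrative. Pairing each matrix with the two training algorithms likewise gives the 22 model couples reported per allele and approach.

```python
from itertools import product

# The 11 scoring matrices shown on the HMM validation chart
MATRICES = [
    "BLOSUM62", "BLOSUM70", "BLOSUM75", "BLOSUM80", "BLOSUM85", "BLOSUM90",
    "PAM50", "PAM60", "PAM70", "PAM80", "PAM90",
]
ALLELES = ["H-2-Db", "H-2-Kb", "H-2-Kd", "H-2-IAd", "H-2-IEd", "H-2-IEk"]
APPROACHES = ["direct", "indirect"]
ALGORITHMS = ["Baum-Welch", "Viterbi"]

# One initial model per (matrix, allele, approach)
initial_models = list(product(MATRICES, ALLELES, APPROACHES))
# For one allele and approach: one couple per (matrix, training algorithm)
couples_per_allele_approach = list(product(MATRICES, ALGORITHMS))

print(len(initial_models))               # 132
print(len(couples_per_allele_approach))  # 22
```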
Slide 13 of 38
Step 2: BUILDING MODEL – SVM METHOD
- MHC class I binder/epitope data processing (Perl script): choosing peptides conforming to the reported motif; motif information from the SYFPEITHI database
- MHC class II binder/epitope data processing: 9-mer motif (binding core)
- Non-binder/non-epitope data processing: the sequence is cut into overlapping 8-mers/9-mers
Output: positive data and negative data
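Cutting a sequence into overlapping windows and keeping those that match an anchor motif can be sketched as follows. The anchor motif here is a hypothetical placeholder; real allele motifs would come from SYFPEITHI. Positions are 1-based and inclusive, matching the Start/Stop columns of the example epitope table.

```python
def overlapping_kmers(sequence, k):
    """Yield (start, stop, peptide) for every window of length k.
    start and stop are 1-based and inclusive."""
    for i in range(len(sequence) - k + 1):
        yield i + 1, i + k, sequence[i:i + k]


def conforms_to_motif(peptide, anchors):
    """True if each anchor position (1-based) holds one of its allowed residues."""
    return all(peptide[pos - 1] in allowed for pos, allowed in anchors.items())


def motif_windows(sequence, k, anchors):
    """Keep only the windows that match the anchor motif."""
    return [w for w in overlapping_kmers(sequence, k) if conforms_to_motif(w[2], anchors)]


# Hypothetical anchor motif: Y at position 2 and I, L or V at position 9
HYPOTHETICAL_ANCHORS = {2: "Y", 9: "ILV"}

print(motif_windows("AYGGGGGGLDYHHHHHHV", 9, HYPOTHETICAL_ANCHORS))
# [(1, 9, 'AYGGGGGGL'), (10, 18, 'DYHHHHHHV')]
```

Without the motif filter, `overlapping_kmers` alone gives every 8-mer or 9-mer of a sequence, which is how the slide describes preparing non-binder/non-epitope data.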
Slide 15 of 38
STEP 3: PARAMETERS OPTIMIZATION
HMM METHOD
COUPLE OF MODELS
- Positive model: built with buildmodel (Baum-Welch or Viterbi) from the 12 positive data sets and the 132 initial models
- Negative model: built with buildmodel (Baum-Welch or Viterbi) from the 12 negative data sets
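TRAINING PRINCIPLE
The data are divided into a training set and a test set. The initial (positive) model is trained on the positive (+) training data and the negative model on the negative (−) training data; the resulting couple of models is evaluated on the test set by ROC analysis.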
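10-FOLD CROSS-VALIDATION
The positive and negative data sets are divided into 10 parts (1 to 10). Each part in turn is used as the test set while the other nine are used for training, giving accuracies Acc. 1 to Acc. 10; their average is reported as the accuracy of the couple of models.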
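The 10-fold procedure can be sketched in Python. Two details are assumptions not stated on the slide: positives and negatives are split separately (stratified folds), and the data are shuffled with a fixed seed. A trivial threshold classifier on a single toy score stands in for the HMM couple, which is not reproduced here.

```python
import random


def split_into_folds(items, k, rng):
    """Shuffle items, then deal them round-robin into k folds."""
    items = list(items)
    rng.shuffle(items)
    return [items[f::k] for f in range(k)]


def stratified_folds(labels, k=10, seed=0):
    """Split positives (label 1) and negatives (label 0) separately,
    so every fold holds a share of both data sets."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if len(pos) < k or len(neg) < k:
        raise ValueError("each class needs at least k examples")
    return [p + n for p, n in zip(split_into_folds(pos, k, rng),
                                  split_into_folds(neg, k, rng))]


def cross_validated_accuracy(samples, labels, train, predict, k=10, seed=0):
    """Return (average accuracy, [Acc. 1, ..., Acc. k])."""
    accuracies = []
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(samples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        model = train(train_x, train_y)
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k, accuracies


# Toy stand-in classifier: threshold halfway between the two class means
def train_threshold(xs, ys):
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


scores = [float(s) for s in range(10)] + [float(s) for s in range(20, 30)]
labels = [0] * 10 + [1] * 10
mean_acc, per_fold = cross_validated_accuracy(scores, labels, train_threshold, predict_threshold)
print(mean_acc)  # 1.0
```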
NLL CALCULATING PRINCIPLE
The queried sequence (e.g. PPVPVSKVVSTDEYVAR) is scored with hmmscore (Viterbi) against both the positive model and the negative model, giving NLL 1 and NLL 2.
final NLL = NLL 1 − NLL 2
The final NLL is compared with a threshold NLL to decide whether the sequence is an epitope or a non-epitope.
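The decision rule can be illustrated with a simplified sketch. SAM's hmmscore and the trained HMMs are not reproduced here; instead, each model is replaced by a position-specific residue-probability profile, a much simpler stand-in. The sketch further assumes that NLL 1 comes from the positive model and NLL 2 from the negative model, that the threshold defaults to 0, and that a final NLL below the threshold means "epitope". The peptides are toy examples.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_profile(peptides, pseudocount=1.0):
    """Per-position residue probabilities from equal-length peptides,
    smoothed with an additive pseudocount (stand-in for a trained HMM)."""
    profile = []
    for pos in range(len(peptides[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for pep in peptides:
            counts[pep[pos]] += 1
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile


def nll(peptide, profile):
    """Negative log-likelihood (natural log) of a peptide under a profile."""
    return -sum(math.log(profile[i][aa]) for i, aa in enumerate(peptide))


def classify(peptide, positive_profile, negative_profile, threshold=0.0):
    """final NLL = NLL 1 (positive model) - NLL 2 (negative model).
    Below the threshold, the positive model explains the peptide better."""
    final_nll = nll(peptide, positive_profile) - nll(peptide, negative_profile)
    return ("epitope" if final_nll < threshold else "non-epitope"), final_nll


pos_model = position_profile(["AYAAAAAAL", "AYAAAAAAV"])
neg_model = position_profile(["GGGGGGGGG", "GGGGGGGGS"])
print(classify("AYAAAAAAL", pos_model, neg_model)[0])  # epitope
print(classify("GGGGGGGGG", pos_model, neg_model)[0])  # non-epitope
```

Subtracting the two NLLs turns the pair of scores into a log-odds ratio, so a single threshold can be tuned (for example by ROC analysis) instead of two.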
ROC (Receiver Operating Characteristic) Analysis (AROC: area under the ROC curve)
AROC > 90%: excellent prediction
AROC > 80%: good prediction
AROC < 80%: not acceptable prediction
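AROC can be computed without plotting the curve, because the area under the ROC curve equals the probability that a randomly chosen positive scores higher than a randomly chosen negative (the Mann-Whitney statistic). The sketch below then applies the cut-offs above. It assumes higher scores mean "more epitope-like"; for the final NLL, where lower values favour the positive model, negate the scores first. The scores are toy values, and treating an AROC of exactly 80% as not acceptable is an assumption, since the slide leaves that boundary open.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve (AROC): probability that a random positive
    scores higher than a random negative, with ties counted as 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def rate_aroc(aroc):
    """Apply the slide's cut-offs (exactly 80% is treated as not acceptable)."""
    if aroc > 0.90:
        return "excellent prediction"
    if aroc > 0.80:
        return "good prediction"
    return "not acceptable prediction"


aroc = roc_auc([0.9, 0.8, 0.7], [0.1, 0.2, 0.75])
print(round(aroc, 3), rate_aroc(aroc))  # 0.889 good prediction
```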
RESULTS OF VALIDATION
Validation results of 22 couples of models trained with the Baum-Welch and Viterbi algorithms in the indirect approach for the H-2-Db allele (accuracy, %)
Scoring matrix | Baum-Welch | Viterbi
BLOSUM62 | 85.14 | 77.33
BLOSUM70 | 84.76 | 77.24
BLOSUM75 | 84.98 | 76.72
BLOSUM80 | 84.54 | 73.61
BLOSUM85 | 84.25 | 74.15
BLOSUM90 | 84.24 | 76.74
PAM50 | 84.81 | 77.28
PAM60 | 83.57 | 78.07
PAM70 | 84.76 | 77.06
PAM80 | 84.98 | 74.00
PAM90 | 85.30 | 77.25
OPTIMAL PARAMETERS
Name | Approach | Algorithm | Matrix | Accuracy (%)
Db_GBA90 | Indirect | Baum-Welch | PAM 90 | 85.30
Db_TBL75 | Direct | Baum-Welch | BLOSUM 75 | 86.00
Kb_GBL70 | Indirect | Baum-Welch | BLOSUM 62 | 79.80
Kb_TBL70 | Direct | Baum-Welch | BLOSUM 70 | 84.54
Kd_GBA50 | Indirect | Baum-Welch | PAM 50 | 83.55
Kd_TBL85 | Direct | Baum-Welch | BLOSUM 85 | 84.72
IAd_GBP70 | Indirect | Baum-Welch | PAM 70 | 77.41
IAd_TBA90 | Direct | Baum-Welch | PAM 90 | 77.84
IEd_GVL75 | Indirect | Viterbi | BLOSUM 75 | 92.77
IEd_TBA70 | Direct | Baum-Welch | PAM 70 | 93.90
IEk_GVL70 | Indirect | Viterbi | BLOSUM 70 | 95.11
IEk_TVL75 | Direct | Viterbi | BLOSUM 75 | 69.52
Slide 22 of 38
STEP 3: PARAMETERS OPTIMIZATION
SVM METHOD
LOOCV (LEAVE-ONE-OUT CROSS-VALIDATION)
1. One peptide is removed from the training set.
2. The model is built from the remaining data.
3. The model is tested on the removed peptide.
These steps are repeated until every peptide in the training set has been left out once.
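A minimal LOOCV loop following the three steps above. The slides do not show how peptides are encoded for the SVM, so the one-hot (sparse) encoding here is an assumption, and a nearest-centroid classifier stands in for the SVM so that the sketch needs only the standard library. The toy peptides are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(peptide):
    """Sparse encoding: 20 binary features per residue position (assumed)."""
    return [1.0 if residue == aa else 0.0 for residue in peptide for aa in AMINO_ACIDS]


def train_nearest_centroid(xs, ys):
    """Stand-in for SVM training: average feature vector per label."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def predict_nearest_centroid(centroids, x):
    """Label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(centroids), key=lambda y: sq_dist(centroids[y]))


def leave_one_out_accuracy(samples, labels, train, predict):
    """Remove one peptide, build the model on the rest, test on the removed one."""
    correct = 0
    for i in range(len(samples)):
        model = train(samples[:i] + samples[i + 1:], labels[:i] + labels[i + 1:])
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)


peptides = ["AYAAAAAAL", "AYAAAAAAV", "AYAAAAAAI",
            "GGGGGGGGG", "GGGGGGGGS", "GGGGGGGGT"]
labels = [1, 1, 1, 0, 0, 0]
features = [one_hot(p) for p in peptides]
print(leave_one_out_accuracy(features, labels, train_nearest_centroid, predict_nearest_centroid))
# 1.0
```

LOOCV trains one model per peptide, so it is practical for data sets of the size in the data collection table but grows expensive for much larger ones.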
THE ACCURACY (MHC CLASS I MODELS)
[Figure: bar chart of accuracy (%) by MHC allele (H-2-Db, H-2-Kd, H-2-Kb) for the direct and indirect methods; bar labels 86.58%, 83.45%, 80.25%, 83.77%, 75.43%, 72.03%]
Comparison of the accuracies of the predictive models of the direct and indirect methods after the LOOCV procedure (MHC class I)
THE ACCURACY (MHC CLASS II MODELS)
[Figure: bar chart of accuracy by MHC allele for the direct and indirect methods]
OPTIMAL PARAMETERS (MHC CLASS I)
MHC allele | Method | Selected kernel function | Optimal parameters
H-2-Db | Direct | Linear function | -t 0 -c 0.1111
H-2-Db | Indirect | RBF function | -t 2 -c 1 -g 0.145
H-2-Kd | Direct | Polynomial function | -t 1 -c 0.1 -d 3 -s 0.2 -r 2
H-2-Kd | Indirect | Polynomial function | -t 1 -c 0.001 -d 3 -s 2.5 -r 8
H-2-Kb | Direct | Linear function | -c 1.4
H-2-Kb | Indirect | RBF function | -t 2 -c 1 -g 0.115
Kernel functions:
- Linear function
- Polynomial function
- RBF function
- Sigmoid function
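The option strings in these tables look like the flags of SVMlight's svm_learn (-t kernel type, -c trade-off, -d polynomial degree, -g RBF gamma, -s and -r the scale and constant of the polynomial and sigmoid kernels); that reading is an assumption. The sketch parses such a string and evaluates the corresponding kernel from the list above. The default values used for missing options are also assumptions.

```python
import math

# Kernel codes for -t, assuming SVMlight-style conventions
KERNELS = {0: "linear", 1: "polynomial", 2: "rbf", 3: "sigmoid"}
# Values used when an option is absent (assumed defaults)
DEFAULTS = {"t": 0.0, "d": 3.0, "g": 1.0, "s": 1.0, "r": 1.0}


def parse_options(text):
    """Parse an option string such as '-t 2 -c 1 -g 0.145' into a dict."""
    tokens = text.replace("\u2013", "-").split()  # tolerate en dashes
    if len(tokens) % 2:
        raise ValueError("every option needs a value")
    opts = dict(DEFAULTS)
    for flag, value in zip(tokens[::2], tokens[1::2]):
        if not flag.startswith("-"):
            raise ValueError(f"expected an option, got {flag!r}")
        opts[flag[1:]] = float(value)
    return opts


def kernel(a, b, opts):
    """Evaluate the kernel selected by -t on two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    kind = KERNELS[int(opts["t"])]
    if kind == "linear":
        return dot
    if kind == "polynomial":  # (s * a.b + r) ** d
        return (opts["s"] * dot + opts["r"]) ** opts["d"]
    if kind == "rbf":  # exp(-g * ||a - b||^2)
        return math.exp(-opts["g"] * sum((x - y) ** 2 for x, y in zip(a, b)))
    return math.tanh(opts["s"] * dot + opts["r"])  # sigmoid


kd_direct = parse_options("-t 1 -c 0.1 -d 3 -s 0.2 -r 2")
print(kernel([1.0, 0.0], [1.0, 1.0], kd_direct))  # (0.2 * 1 + 2) ** 3, about 10.648
```

Under this reading, the same kernel forms are available in scikit-learn's `SVC`, where the polynomial and sigmoid scale maps to `gamma` and the constant to `coef0`.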
OPTIMAL PARAMETERS (MHC CLASS II)
MHC allele | Method | Selected kernel function | Optimal parameters
H-2-Db | Direct | Linear function | -t 0 -c 0.15
H-2-Db | Indirect | Linear function | -t 0 -c 0.53
H-2-Kd | Direct | Linear function | -t 0 -c 0.19
H-2-Kd | Indirect | Linear function | -t 0 -c 0.27
H-2-Kb | Direct | Linear function | -t 0 -c 1.4
H-2-Kb | Indirect | Linear function | -t 2 -c 1 -g 0.115
EPITOPE PREDICTION RESULTS – SVM METHOD
Protein | Prediction | H-2-Db (class I) | H-2-Kb (class I) | H-2-Kd (class I) | H-2-IAd (class II) | H-2-IEd (class II) | H-2-IEk (class II)
HA | MHC binders | 334 | 1565 | 1012 | 3557 | 1982 | 458
HA | Putative epitopes | 1756 | 5618 | 1297 | 3675 | 787 | 2285
HA | Epitope candidates | 268 | 984 | 694 | 938 | 469 | 225
NA | MHC binders | 261 | 911 | 694 | 2595 | 1076 | 236
NA | Putative epitopes | 1109 | 3839 | 774 | 2536 | 339 | 1555
NA | Epitope candidates | 192 | 560 | 309 | 791 | 213 | 123
M | MHC binders | 24 | 95 | 49 | 256 | 130 | 38
M | Putative epitopes | 104 | 318 | 106 | 258 | 65 | 130
M | Epitope candidates | 13 | 65 | 17 | 79 | 44 | 21
Slide 30 of 38
EPITOPE PREDICTION RESULTS – HMM METHOD
Protein | Indirect method | Direct method | Compared results
HA | 11386 | 6752 | 2960
NA | 6658 | 5634 | 2171
Matrix | 929 | 361 | 189
Slide 31 of 38
TOTAL NUMBER OF EPITOPES IN INFLUENZA A VIRUS
MHC allele | HA | NA | M
H-2-Db | 15 | 12 | 0
H-2-Kd | 56 | 14 | 1
Table 7: The number of epitopes predicted by both the HMM and SVM methods
EPITOPE PREDICTION RESULTS – EXAMPLES
MHC allele: H-2-Kd
Sequence description: >Q67157|M1_IAAIC Matrix protein 1 - Influenza A virus (strain A/Aichi/2/1968 H3N2)
No. of epitopes: 3
Start | Stop | Epitope sequence
99 | 107 | YRKLKREIT
129 | 137 | LIYNRMGAV
131 | 139 | YNRMGAVTT

MHC allele: H-2-Kb
Sequence description: >P03445|HEMA_IADM1 Hemagglutinin [Contains: Hemagglutinin HA1 chain] (Fragment) - Influenza A virus (strain A/Duck/Memphis/546/1976 (H11N9))
No. of epitopes: 8
Start | Stop | Epitope sequence
10 | 18 | IICIRADE
21 | 29 | GYLSNNST
44 | 52 | SVELVENE
58 | 66 | SIDGKAPI
69 | 77 | DCSFAGWI
74 | 82 | GWILGNPM
90 | 98 | SWSYIVEN
92 | 100 | SYIVENQS
WEB PREDICTION TOOL FOR HMM METHOD
[Screenshots: web tool output listing positive results, negative results, number of positive sequences and number of negative sequences]
Slide 35 of 38
CONCLUSIONS
SVM method (model accuracy): the indirect method is better.
- MHC class I: H-2-Db (86.58%), H-2-Kb (80.25%) and H-2-Kd (83.45%)
- MHC class II: H-2-IEd (93.26%), H-2-IEk (95.19%) and H-2-IAd (89.42%)
HMM method (model accuracy): the direct method is better.
- MHC class I: H-2-Db (86%), H-2-Kb (84.54%) and H-2-Kd (84.72%)
- MHC class II: H-2-IEd (93.90%), H-2-IEk (95.11%) and H-2-IAd (77.84%)
Slide 36 of 38
CONCLUSIONS
- Built HMM and SVM models with high accuracy for T cell epitope prediction (MHC class I and II), using both the direct approach (epitope prediction) and the indirect approach (MHC binder prediction).
- Successfully applied these models to epitope prediction of Influenza A virus proteins for in silico vaccine design.
Slide 37 of 38
FUTURE WORK
- Applying this tool to other proteins
- Making all programs available through the web
- B cell epitope prediction
- Testing the results by biological experiments…
Slide 38 of 38
THANK YOU FOR YOUR ATTENTION