
T-CELL EPITOPES PREDICTION OF HEMAGGLUTININ, NEURAMINIDASE AND MATRIX PROTEIN OF INFLUENZA A VIRUS USING SUPPORT VECTOR MACHINE AND HIDDEN MARKOV MODEL

Vo Cam Quy, Nguyen Thanh Khoi, Nguyen Thi Truc Minh, Tran Linh Thuoc

Department of Biotechnology

University of Natural Sciences

Vietnam National University, Ho Chi Minh City, Vietnam

Sixth International Conference on Bioinformatics (InCoB 2007), Hong Kong

OUTLINE

- Introduction: epitope prediction methods, Influenza A virus
- Materials and Methods
- Results and Discussion
- Conclusion and future work

Epitope in silico Analysis

Gene/Protein Sequence Database

Disease related protein DB

Candidate Epitope DB

VACCINOME

Peptide / multi-epitope vaccines

Epitope prediction

Epitope

An epitope is the part of a macromolecule that is recognized by the immune system, specifically by antibodies, B cells, or T cells.

Epitopes are most often described as three-dimensional surface features of an antigen molecule.

Linear epitopes are determined by the amino acid sequence.

EPITOPE PREDICTION STRATEGIES

Epitope prediction

B cell epitope prediction; T cell epitope prediction

Structure, chemical features; Sequence; Structure

Binding motifs, matrices; Statistical method; Machine learning method

Hidden Markov Model

Flexible model

Support Vector Machine, Artificial Neural Network…

High accuracy

Quantitative matrices

T cell epitope prediction approach

T cell epitope prediction

Direct approach

- Positive: putative epitope
- Negative: non-epitope

Indirect approach

- Positive: MHC binding peptides (binder)
- Negative: MHC-I non-binding peptides (non-binder)

Compare

Epitope candidates
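The comparison of the two approaches can be sketched as a set intersection: a peptide becomes an epitope candidate only if the direct model calls it a putative epitope and the indirect model calls it an MHC binder. This is a minimal sketch; the function name and the toy peptides are hypothetical, and the slides do not show how the authors implemented the comparison.

```python
def epitope_candidates(putative_epitopes, mhc_binders):
    """Return the peptides predicted positive by both approaches, sorted."""
    return sorted(set(putative_epitopes) & set(mhc_binders))


# Made-up peptides, for illustration only (not predictions from the slides).
direct_hits = ["AAAAAAAAA", "CCCCCCCCC", "DDDDDDDDD"]
indirect_hits = ["DDDDDDDDD", "AAAAAAAAA", "EEEEEEEEE"]

candidates = epitope_candidates(direct_hits, indirect_hits)
print(candidates)
```

Using a set makes the result independent of the order in which each model reports its hits.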

Influenza A virus

- Influenza A viruses continue to emerge from the aquatic avian reservoir and cause pandemics.
- The many variants and mutations in the population make vaccine production difficult.

http://www.roche.com/pages/facets/10/viruse.htm

Genome: consists of single-stranded, negative-sense RNA in 8 segments.

Hemagglutinin, neuraminidase and matrix protein are three of the proteins of greatest interest.

Red: M2 protein

Green: hemagglutinin

Blue: neuraminidase

Inside: viral RNA

OBJECTIVE

Building HMM and SVM models for T cell epitope prediction (MHC class I and II)

- Direct approach (epitope prediction)
- Indirect approach (MHC binder prediction)
- Combining the results to get epitope candidates

Epitope prediction of Influenza A virus proteins for in silico vaccine design

METHODS

AntiJen MHCBN IEDB

Data collection

Raw data

Training set

Processing

Training

models

Evaluating

Optimal model

EPITOPES

Peptides predicted by both methods and both approaches were considered epitopes

Predict

Protein

1. DATA COLLECTION AND PROCESSING

2. BUILDING

3. PARAMETER OPTIMIZATION

4. APPLYING

SVM method; HMM method

RESULTS OF DATA COLLECTION AND PROCESSING

MHC class  Allele    Indirect: positive  Indirect: negative  Direct: positive  Direct: negative
                     (binder)            (non-binder)        (epitope)         (non-epitope)
I          H-2-Db    452                 335                 160               344
I          H-2-Kb    446                 413                 219               465
I          H-2-Kd    170                 74                  208               91
II         H-2-IAd   411                 143                 179               195
II         H-2-IEd   274                 41                  199               85
II         H-2-IEk   326                 28                  166               96

24 data sets


Step 2: BUILDING MODEL – HMM method

Positive training set

ClustalW

Perl script

modelfromalign

Initial model

Result: 11 matrices x 6 alleles x 2 approaches = 132 initial models
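The counts quoted on the slides (24 data sets, 132 initial models) can be checked by enumerating the combinations. This is only a bookkeeping sketch: the slides give the number of scoring matrices but not the full list, so the matrix names below are placeholders.

```python
from itertools import product

alleles = ["H-2-Db", "H-2-Kb", "H-2-Kd", "H-2-IAd", "H-2-IEd", "H-2-IEk"]
approaches = ["direct", "indirect"]
classes = ["positive", "negative"]
# Placeholder names: only the count (11) is stated on the slide.
matrices = [f"matrix_{i}" for i in range(1, 12)]

# One data set per allele, approach and class.
data_sets = list(product(alleles, approaches, classes))
# One initial model per scoring matrix, allele and approach.
initial_models = list(product(matrices, alleles, approaches))

print(len(data_sets), len(initial_models))
```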

Step 2: BUILDING MODEL – SVM method

Motif 9mer (binding core)

MHC class II binder/epitope data processing

non-binder/non-epitope data processing

Sequence is cut into overlapping 8-mers/9-mers

Choosing peptides conforming to reported motif information from the SYFPEITHI database

MHC class I binder/epitope data processing

(Perl script)

Negative data

Positive data
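Cutting a sequence into overlapping 8-mers or 9-mers amounts to a sliding window that advances one residue at a time. The sketch below assumes a step of one residue, which the slide does not state; the function name is hypothetical, and the authors used a Perl script rather than Python.

```python
def overlapping_kmers(sequence, k=9):
    """Return every window of length k, shifting by one residue each time."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# The queried sequence shown on the NLL slide, used here only as sample input.
peptide = "PPVPVSKVVSTDEYVAR"
nine_mers = overlapping_kmers(peptide, k=9)
eight_mers = overlapping_kmers(peptide, k=8)

print(nine_mers[0], nine_mers[-1], len(nine_mers), len(eight_mers))
```

A sequence shorter than k yields an empty list, so short inputs need no special handling.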


STEP 3: PARAMETER OPTIMIZATION

HMM METHOD

COUPLE OF MODELS

12 positive data sets and 132 initial models: buildmodel (Baum-Welch or Viterbi) gives the positive model

12 negative data sets: buildmodel (Baum-Welch or Viterbi) gives the negative model

TRAINING PRINCIPLE

[Figure: training principle. Labels: training set, test set, positive (+) and negative (-) training, initial model (positive), couple 1, ROC analysis.]

10-FOLD CROSS VALIDATION

[Figure: the positive and negative data sets are divided into 10 parts (1 to 10); each part yields one accuracy (Acc. 1 to Acc. 10), and these are combined into the average accuracy.]
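The 10-fold scheme can be sketched as follows: the data are split into 10 parts, each part serves once as the test set while the other nine train the model, and the 10 accuracies are averaged. This is a generic sketch, not the authors' pipeline. The majority-label classifier is a hypothetical stand-in for the trained HMM couple, and the shuffling seed is an assumption.

```python
import random


def k_fold_accuracy(samples, labels, train_fn, predict_fn, k=10, seed=0):
    """Average accuracy over k folds; assumes len(samples) >= k."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint parts
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical stand-in model: always predict the majority training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


data = list(range(20))
labels = [1] * 20
avg_acc = k_fold_accuracy(data, labels, train_majority, predict_majority)
print(avg_acc)
```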

NLL CALCULATING PRINCIPLE

[Figure: the queried sequence (PPVPVSKVVSTDEYVAR) is scored with hmmscore (Viterbi) against the positive model and the negative model, giving NLL 1 and NLL 2.]

final NLL = NLL 1 - NLL 2

The final NLL is compared with a threshold NLL to label the sequence as epitope or non-epitope.
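The decision rule can be sketched as a difference of two scores followed by a threshold test. Two points are assumptions, because the slide export does not preserve them: that NLL 1 is the score against the positive model and NLL 2 the score against the negative model, and that a final NLL below the threshold means "epitope" (a lower negative log-likelihood indicates a better fit). The scores in the example are invented.

```python
def classify_by_nll(nll_positive, nll_negative, threshold):
    """Label a peptide from its scores against the two models.

    Assumption: final NLL = NLL(positive model) - NLL(negative model),
    and a value below the threshold is called an epitope.
    """
    final_nll = nll_positive - nll_negative
    label = "epitope" if final_nll < threshold else "non-epitope"
    return final_nll, label


# Hypothetical scores, not taken from the slides.
final_nll, label = classify_by_nll(20.0, 25.0, threshold=0.0)
print(final_nll, label)
```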

ROC (Receiver Operating Characteristic) Analysis

AROC > 90%: excellent prediction

AROC > 80%: good prediction

AROC < 80%: not acceptable prediction
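The area under the ROC curve (AROC) equals the probability that a randomly chosen positive scores higher than a randomly chosen negative, which gives a short way to compute it. The sketch below uses that pairwise definition and then applies the three bands listed above. The example scores are invented, and higher scores are assumed to mean "more likely positive" (an NLL-based score would be negated first).

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rate_prediction(auc):
    """Apply the AROC bands; exactly 0.80 is treated as not acceptable."""
    if auc > 0.90:
        return "excellent"
    if auc > 0.80:
        return "good"
    return "not acceptable"


# Invented scores and labels, for illustration only.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(auc, rate_prediction(auc))
```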

RESULTS OF VALIDATION

[Figure: accuracy (%) by scoring matrix for models trained with Baum-Welch and Viterbi]

The validation results of 22 couples of models trained by the Baum-Welch and Viterbi algorithms in the indirect approach for the H-2-Db allele

OPTIMAL PARAMETERS

Name       Approach  Algorithm   Matrix     Accuracy (%)
Db_GBA90   Indirect  Baum-Welch  PAM 90     85.30
Db_TBL75   Direct    Baum-Welch  BLOSUM 75  86.00
Kb_GBL70   Indirect  Baum-Welch  BLOSUM 62  79.80
Kb_TBL70   Direct    Baum-Welch  BLOSUM 70  84.54
Kd_GBA50   Indirect  Baum-Welch  PAM 50     83.55
Kd_TBL85   Direct    Baum-Welch  BLOSUM 85  84.72
IAd_GBP70  Indirect  Baum-Welch  PAM 70     77.41
IAd_TBA90  Direct    Baum-Welch  PAM 90     77.84
IEd_GVL75  Indirect  Viterbi     BLOSUM 75  92.77
IEd_TBA70  Direct    Baum-Welch  PAM 70     93.90
IEk_GVL70  Indirect  Viterbi     BLOSUM 70  95.11
IEk_TVL75  Direct    Viterbi     BLOSUM 75  69.52

STEP 3: PARAMETER OPTIMIZATION

SVM METHOD

LOOCV (LEAVE-ONE-OUT CROSS-VALIDATION)

- One peptide is removed from the training data
- The model is built from the remaining data
- Testing is done on the removed peptide

Training set
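The three LOOCV steps can be sketched as a loop that holds out each sample in turn. This is a generic sketch rather than the authors' procedure: the 1-nearest-neighbour classifier on numbers is a hypothetical stand-in for the SVM, and the data are invented.

```python
def loocv_accuracy(samples, labels, train_fn, predict_fn):
    """Hold out each sample once, train on the rest, test on the held-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]   # remove one sample
        train_y = labels[:i] + labels[i + 1:]
        model = train_fn(train_x, train_y)        # build from the remainder
        correct += predict_fn(model, samples[i]) == labels[i]
    return correct / len(samples)


# Hypothetical stand-in classifier: 1-nearest neighbour on scalar features.
def train_1nn(xs, ys):
    return list(zip(xs, ys))


def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


xs = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
ys = [0, 0, 0, 1, 1, 1]
acc = loocv_accuracy(xs, ys, train_1nn, predict_1nn)
print(acc)
```

LOOCV trains one model per sample, so its cost grows with the size of the data set; that is affordable for data sets of a few hundred peptides.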

THE ACCURACY (MHC class I MODELS)

[Figure: bar chart of accuracy (10% to 100%) by MHC allele (H-2-Db, H-2-Kd, H-2-Kb) for the direct and indirect methods. Bar labels: 86.58%, 83.45%, 80.25%, 83.77%, 75.43%, 72.03%.]

Comparing the accuracies of the predictive models between the direct and indirect methods after carrying out the LOOCV procedure (MHC class I)

THE ACCURACY (MHC class II MODELS)

[Figure: accuracy by MHC allele for the direct and indirect methods]

OPTIMAL PARAMETERS (MHC CLASS I)

MHC allele  Kernel functions and parameters  Direct method                  Indirect method
H-2-Db      Selected kernel function         Linear function                RBF function
            Optimal parameters               -t 0 -c 0.1111                 -t 2 -c 1 -g 0.145
H-2-Kd      Selected kernel function         Polynomial function            Polynomial function
            Optimal parameters               -t 1 -c 0.1 -d 3 -s 0.2 -r 2   -t 1 -c 0.001 -d 3 -s 2.5 -r 8
H-2-Kb      Selected kernel function         Linear function                RBF function
            Optimal parameters               -c 1.4                         -t 2 -c 1 -g 0.115

Kernel functions:

- Linear function - Polynomial function - RBF function - Sigmoid function
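The four kernel functions can be written out directly. The slides do not name the SVM package; the flags in the table resemble the options of SVM-light's svm_learn (-t kernel type, -c trade-off, -d polynomial degree, -g RBF gamma, -s and -r kernel coefficients), and the sketch below assumes that reading. The function names and example vectors are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_kernel(a, b):                       # assumed to match -t 0
    return dot(a, b)


def polynomial_kernel(a, b, s=1.0, r=1.0, d=3):  # assumed to match -t 1
    return (s * dot(a, b) + r) ** d


def rbf_kernel(a, b, g=1.0):                   # assumed to match -t 2
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-g * sq_dist)


def sigmoid_kernel(a, b, s=1.0, r=1.0):        # assumed to match -t 3
    return math.tanh(s * dot(a, b) + r)


# Toy vectors; real inputs would be numerically encoded peptides.
u, v = [1.0, 0.0], [1.0, 1.0]
print(linear_kernel(u, v))
print(polynomial_kernel(u, v, s=0.2, r=2, d=3))
print(rbf_kernel(u, v, g=0.145))
```

The trade-off parameter given with -c is not part of the kernel itself; it controls how strongly training errors are penalised when the model is fitted.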

OPTIMAL PARAMETERS (MHC CLASS II)

MHC allele  Kernel functions and parameters  Direct method    Indirect method
H-2-Db      Selected kernel function         Linear function  Linear function
            Optimal parameters               -t 0 -c 0.15     -t 0 -c 0.53
H-2-Kd      Selected kernel function
            Optimal parameters               -t 0 -c 0.19     -t 0 -c 0.27
H-2-Kb      Selected kernel function
            Optimal parameters               -t 0 -c 1.4      -t 2 -c 1 -g 0.115

Kernel functions:

- Linear function - Polynomial function - RBF function - Sigmoid function


EPITOPE PREDICTION RESULTS – SVM METHOD

                   MHC class I                 MHC class II
                   H-2-Db   H-2-Kb   H-2-Kd    H-2-IAd  H-2-IEd  H-2-IEk
MHC binder         334      1565     1012      3557     1982     458
Putative epitope   1756     5618     1297      3675     787      2285
Epitope candidate  268      984      694       938      469      225

MHC binder         261      911      694       2595     1076     236
Epitope candidate  192      560      309       791      213      123

MHC binder         24       95       49        256      130      38
Epitope candidate  13       65       17        79       44       21

EPITOPE PREDICTION RESULTS – HMM METHOD

Protein  Indirect method  Direct method  Compared results
HA       11386            6752           2960
NA       6658             5634           2171
Matrix   929              361            189

Total number of epitopes in Influenza A virus

Allele  HA  NA  M
H-2-Db  15  12  0
H-2-Kd  56  14  1

Table 7: The number of epitopes predicted by both the HMM and SVM methods

EPITOPE PREDICTION RESULTS – EXAMPLES

MHC allele: H-2-Kd
Sequence description: >Q67157|M1_IAAIC Matrix protein 1 - Influenza A virus (strain A/Aichi/2/1968 H3N2)
No. of epitopes: 3

Start  Stop  Epitope sequence
99     107   YRKLKREIT
129    137   LIYNRMGAV
131    139   YNRMGAVTT

MHC allele: H-2-Kb
Sequence description: >P03445|HEMA_IADM1 Hemagglutinin [Contains: Hemagglutinin HA1 chain] (Fragment) - Influenza A virus (strain A/Duck/Memphis/546/1976 (H11N9))

Start  Stop  Epitope sequence
10     18    IICIRADE
21     29    GYLSNNST
44     52    SVELVENE
58     66    SIDGKAPI
69     77    DCSFAGWI
74     82    GWILGNPM
90     98    SWSYIVEN
92     100   SYIVENQS

WEB PREDICTION TOOL FOR HMM METHOD

Positive results

Negative results

Number of positive sequences

Number of negative sequences

WEB PREDICTION TOOL FOR HMM METHOD (cont)

CONCLUSIONS

SVM method: the model accuracy

- Indirect method is better
- MHC class I: H-2-Db (86.58%), H-2-Kb (80.25%) and H-2-Kd (83.45%)
- MHC class II: H-2-IEd (93.26%), H-2-IEk (95.19%), H-2-IAd (89.42%)

HMM method: the model accuracy

- Direct method is better
- MHC class I: H-2-Db (86%), H-2-Kb (84.54%) and H-2-Kd (84.72%)
- MHC class II: H-2-IEd (93.90%), H-2-IEk (95.11%), H-2-IAd (77.84%)

CONCLUSIONS

Built HMM and SVM models for T cell epitope prediction (MHC class I and II) with high accuracy

- Direct approach (epitope prediction)
- Indirect approach (MHC binder prediction)

Successfully applied these models to epitope prediction of Influenza A virus proteins for in silico vaccine design

FUTURE WORK

- Applying this tool to other proteins
- Making all programs runnable via the web
- B cell epitope prediction
- Testing the results by biological experiment…

THANK YOU FOR YOUR ATTENTION
