Source: structure.bmc.lu.se/PPSC/files/User_guide_PPSC.pdf

PPSC version 1.0

User Guide

Index

1 Introduction
2 Runtime environment
3 Functions
3.1 Download PDB File
3.2 Calculate CE
3.3 Train model
3.4 Predict stability
4 Input and output files
4.1 Formats of input files
4.2 Output files
5 Example
6 References

1 Introduction

Many amino acid substitutions affect the stability and biological functions of proteins. Understanding the effects of these alterations facilitates the elucidation of the molecular basis of many diseases (Thusberg and Vihinen, 2009). Site-directed mutagenesis has been utilized for decades (Kearns-Jonker et al., 2007; Rajendhran and Gunasekaran, 2007), but the experimental trial-and-error design and construction of mutations is time-consuming and expensive, and its success rate in designing and altering protein characteristics is low. Because experimental methods are tedious and often costly, computational methods are frequently used to predict stability changes caused by amino acid substitutions, whether to analyze disease-related variations or to design substitutions for protein engineering.

Several methods have been developed to predict protein stability changes based on information about the primary sequence or the three-dimensional structure of the protein. There are energy function-based methods and machine learning-based methods. The energy functions used in these models include: (1) a physical energy function calculated using ab initio quantum mechanics (QM); (2) an empirical energy function or force field derived from experimental data (Capriotti et al., 2004); or (3) a statistical energy function obtained by the analysis of protein structures. Both ab initio QM and force field calculations are time-consuming and sensitive to small displacements in protein structures. Ab initio QM calculations are also impractical for large proteins. Statistical approaches may provide accuracy similar to that of ab initio QM calculations, but their theoretical foundation is not clear (Lazaridis and Karplus, 2000). Machine learning-based methods include, for example, those utilizing support vector machines and neural networks (Capriotti et al., 2004; Capriotti et al., 2005; Cheng et al., 2006; Guerois et al., 2002; Shen et al., 2008). Many methods predict just the sign of ΔΔG. A positive ΔΔG corresponds to an increase in protein stability and a negative ΔΔG to a decrease. Sequence-based classifiers have been trained and tested using different sequence window lengths (Capriotti et al., 2005; Cheng et al., 2006), or in combination with structural information (Capriotti et al., 2004). However, the inputs are encoded using the 20-letter amino acid alphabet, so the number of parameters rises rapidly to more than one hundred. Thus, these models may be over-fitted, with consequent effects on performance when they are applied to new cases (Khan and Vihinen, 2010). ProTherm, a database of experimental ΔΔG values for variations, contains just over 2000 cases (Bava et al., 2004; Kumar et al., 2006). The performance of most published stability predictors was recently tested systematically using known examples and was found to be suboptimal for all methods (Khan and Vihinen, 2010).

2 Runtime environment

PPSC runs on Windows and Linux operating systems. It can be downloaded from http://www.ibio-cn.org/softwares/PPSC/index.html (server in China) or http://structure.bmc.lu.se/ppsc/ (mirror in Sweden). Before running the program, complete the following steps.

1. Install the Java Runtime Environment (JRE) or Java Development Kit (JDK), version 6.0 or later. The JDK can be downloaded from http://www.oracle.com/technetwork/java/javase/downloads/index.html.

2. For Windows: download the executable file (DSSPCMBI.EXE) from http://swift.cmbi.ru.nl/gv/dssp/ and copy it to the folder that contains the jar file.

For Linux: download the DSSP source code from the same website (http://swift.cmbi.ru.nl/gv/dssp/), compile it to generate an executable file (DSSPCMBI), and copy the executable to the folder that contains the jar file.

3. Double-click the “PPSC-v1.0.jar” file to run the program. Firewalls can sometimes cause problems and may need to be adjusted.
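The file layout described in the steps above can be checked before launching the program. The following Python sketch is not part of PPSC; the function name `missing_ppsc_files` is hypothetical, and the only assumption taken from the guide is that the DSSP executable must sit in the same folder as “PPSC-v1.0.jar”.

```python
import os
import sys


def missing_ppsc_files(folder, platform=sys.platform):
    """Return the files named in the guide that are absent from `folder`.

    Hypothetical helper: it only checks that the files exist, not that
    Java is installed or that the DSSP executable actually runs.
    """
    # The guide names DSSPCMBI.EXE for Windows and DSSPCMBI for Linux.
    dssp_name = "DSSPCMBI.EXE" if platform.startswith("win") else "DSSPCMBI"
    required = ["PPSC-v1.0.jar", dssp_name]
    return [name for name in required
            if not os.path.isfile(os.path.join(folder, name))]


if __name__ == "__main__":
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_ppsc_files(folder)
    if missing:
        print("Missing from", folder + ":", ", ".join(missing))
    else:
        print("All files named in the guide are present.")
```

If double-clicking does not start the program, a runnable jar can usually also be started from a terminal with `java -jar PPSC-v1.0.jar`; the guide itself only documents the double-click route.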

3 Functions

Prediction of Protein Stability Changes (PPSC) is based on the M8 and M47 models, which utilize 8 and 47 features, respectively, in a support vector machine (SVM) optimized with experimental data. PPSC predicts the effect of variations on protein stability. It has four main panels: download PDB file, calculate contact energy (CE), train model, and predict stability change. Because PPSC is trained with protein structures, the PDB file has to be downloaded first. The “calculate CE” function calculates the CE value, which is one of the input attributes. The program can calculate effects for a single variant or, by uploading a batch file, for many variants. The results of the calculation are displayed as tables or image files. The “train model” function can be used to train a new SVM model with a user-defined dataset.

3.1 Download PDB File

All PPSC functions are based on PDB files of proteins, which are downloaded from the Protein Data Bank (http://www.pdb.org/pdb/home/home.do) or from a user-defined site. There are two input options: a single PDB id, or a TXT file containing a list of PDB ids, one per line (Fig 3.1).

The TXT file format is:

PDB-id1
PDB-id2
….

For example:

1B55
1VQB
1A23
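A list file of this kind can be validated before it is loaded into the panel. The sketch below is an illustration only: the function name `read_pdb_ids` is hypothetical, and it assumes the classic four-character PDB id (a digit followed by three letters or digits), which matches the examples above but is not stated in the guide.

```python
import re

# Assumption: classic PDB ids are four characters, starting with a digit.
_PDB_ID = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def read_pdb_ids(text):
    """Return the PDB ids found in `text`, one id per non-empty line."""
    ids = []
    for number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # tolerate blank lines between entries
        if not _PDB_ID.match(entry):
            raise ValueError(
                f"line {number}: {entry!r} is not a four-character PDB id")
        ids.append(entry.upper())
    return ids


print(read_pdb_ids("1B55\n1VQB\n1A23\n"))
```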

Fig 3.1 “download PDB File” panel

3.2 Calculate CE

PPSC contains two models to predict protein stability change upon substitution. Model M8 requires 8 input attributes and M47 requires 47. The attribute dCE, defined as the difference in total contact energy between the wild-type protein and the variant protein, has to be calculated separately. A coarse-grained model (Shen and Vihinen, 2003) is implemented in the “Calculate CE” panel (Fig 3.2).

There are two ways to provide input:

1. Enter a single PDB id, variation position and new residue.

2. Provide a txt file containing a list of variations. The file format is:

PDB-id1 Variation1 pH1 Temperature1 ddG1
PDB-id2 Variation2 pH2 Temperature2 ddG2
….

Usually the ddG values are omitted. However, if you want to train a novel predictor using cases with known effects, the values should be provided. The default values for pH and temperature are 7 and 25, respectively, and should be changed only when training a new method.

For example:

1AAR K6E 5 25 0.53
1AAR K6Q 5 25 0.26
1AAR F45W 5 25 0.32
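To make the column layout concrete, the following sketch parses one line of such a list file. It is not PPSC code: the names `Variation` and `parse_variation_line` are hypothetical, and it assumes that a variation such as K6E is read in the usual way (wild-type residue K, position 6, new residue E) and that the ddG column is the only optional one.

```python
import re
from typing import NamedTuple, Optional


class Variation(NamedTuple):
    pdb_id: str
    wild_type: str
    position: int
    new_residue: str
    ph: float
    temperature: float
    ddg: Optional[float]  # None when the ddG column is omitted


# One-letter wild-type residue, position, one-letter new residue.
_VARIATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_variation_line(line):
    """Parse 'PDB-id Variation pH Temperature [ddG]' into a Variation."""
    fields = line.split()
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 columns, got {len(fields)}")
    pdb_id, variation, ph, temperature = fields[:4]
    match = _VARIATION.match(variation.upper())
    if match is None:
        raise ValueError(f"cannot interpret variation {variation!r}")
    wild_type, position, new_residue = match.groups()
    ddg = float(fields[4]) if len(fields) == 5 else None
    return Variation(pdb_id.upper(), wild_type, int(position), new_residue,
                     float(ph), float(temperature), ddg)


print(parse_variation_line("1AAR F45W 5 25 0.32"))
```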

The results can be written as a list file, an SVM file, or both. The SVM file is used to train a new method and can be written only if ddG values are available.

Clicking the “Results and running status box” shows the results in both a table and a histogram view. They can be saved as Excel or txt files by right-clicking.
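The guide does not describe the layout of the SVM file. Assuming it follows the sparse text format used by LIBSVM (a label followed by `index:value` pairs with indices starting at 1), a line could be produced as in the sketch below. The function name `to_libsvm_line` and the feature values are hypothetical; which attributes PPSC actually writes, and in what order, is not stated in the guide.

```python
def to_libsvm_line(label, features):
    """Format one training case as a LIBSVM-style text line.

    `label` is the target value (here, a ddG value) and `features`
    is a sequence of numeric attribute values in a fixed order.
    """
    pairs = " ".join(f"{index}:{value:g}"
                     for index, value in enumerate(features, start=1))
    return f"{label:g} {pairs}"


# Hypothetical case: ddG 0.53 with three made-up attribute values.
print(to_libsvm_line(0.53, [1.2, -0.4, 7.0]))
```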

Fig 3.2 “Calculate CE” panel

3.3 Train model

This option is used for training a new SVM model. For that purpose, a relatively large set of experimentally studied cases with ddG values is needed.

As shown in Fig 3.3, the input file containing the training data should be in the following format:

PDB-id1 Variation1 pH1 Temperature1 ddG1
PDB-id2 Variation2 pH2 Temperature2 ddG2
….

For example:

1AAR K6E 5 25 0.53
1AAR K6Q 5 25 0.26

Set a path for saving the result as a “*.model” file. The default directory is “\PPSC-v1.0_jar\libscm-model\”. Then provide the parameters for the SVM. Consult the LIBSVM documentation (www.csie.ntu.edu.tw/~cjlin/libsvm) for the possible options.
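As an orientation for the parameter step, the sketch below assembles an option string in the style of the LIBSVM `svm-train` tool, where `-s` selects the SVM type, `-t` the kernel type, `-c` the cost and `-g` the kernel gamma. The helper name `libsvm_options` is hypothetical, and how the “Train model” panel expects the parameters to be entered is an assumption; check the panel and the LIBSVM documentation before relying on it.

```python
# Numeric codes as defined by LIBSVM's svm-train.
SVM_TYPES = {"c_svc": 0, "nu_svc": 1, "one_class": 2,
             "epsilon_svr": 3, "nu_svr": 4}
KERNEL_TYPES = {"linear": 0, "polynomial": 1, "rbf": 2, "sigmoid": 3}


def libsvm_options(svm_type="c_svc", kernel="rbf", cost=1.0, gamma=None):
    """Build an svm-train style option string from named settings."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    parts = ["-s", str(SVM_TYPES[svm_type]),
             "-t", str(KERNEL_TYPES[kernel]),
             "-c", f"{cost:g}"]
    if gamma is not None:
        # When gamma is left out, LIBSVM falls back to its own default.
        parts += ["-g", f"{gamma:g}"]
    return " ".join(parts)


# Regression on ddG values with an RBF kernel (illustrative values only).
print(libsvm_options("epsilon_svr", "rbf", cost=10, gamma=0.05))
```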

Fig 3.3 “Train model” panel

3.4 Predict stability

Clicking “Predict stability” calculates the effect on stability and the ddG. Select the M8 model, the M47 model, or both. Choose either the “Predict stability” or the “Predict ddG” function.

Then choose whether to enter the information for a single variation or to provide a list of variations. The single-variation information consists of the PDB id, variation position and new residue, as well as the temperature and pH values.

The list file containing multiple variations is in the following format:

PDB-id1 Variation1 pH1 Temperature1
PDB-id2 Variation2 pH2 Temperature2
….

For example:

1AAR K6E 5 25
1AAR K6Q 5 25
1AAR F45W 5 25

Note that the M47 model produces more reliable results than the M8 model.
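The relationship between a predicted ddG and the stability effect can be illustrated with the sign convention given in the Introduction, where a positive value means increased stability and a negative value means decreased stability. The sketch below is only an illustration of that convention; the function name `stability_effect` and the optional `tolerance` parameter are hypothetical, and the actual classifier in PPSC is an SVM rather than a threshold on ddG.

```python
def stability_effect(ddg, tolerance=0.0):
    """Classify a ddG value using the sign convention of the Introduction.

    Values within +/- `tolerance` of zero are reported as unchanged.
    """
    if ddg > tolerance:
        return "increased"
    if ddg < -tolerance:
        return "decreased"
    return "unchanged"


for value in (0.53, -1.2, 0.0):
    print(value, stability_effect(value))
```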

Fig 3.4 “Predict stability” panel

4 Input and output files

4.1 Formats of input files

1. The format of the PDB id list file used as input in the “Download PDB file” panel is:

PDB-id

For example:

1B55
1VQB
1A23

2. The format of the list file in the “Calculate CE” panel is:

PDB-id1 Variation1 pH1 Temperature1 ddG1
PDB-id2 Variation2 pH2 Temperature2 ddG2
….

The ddG value should be omitted unless you want to generate the SVM file for training a new predictor. The default pH and temperature values are 7 and 25, respectively; adjust them only if necessary.

For example:

1AAR K6E 5 25 0.53
1AAR K6Q 5 25 0.26
1AAR F45W 5 25 0.32

3. The format of the input file in the “Train model” panel is:

PDB-id1 Variation1 pH1 Temperature1 ddG1
PDB-id2 Variation2 pH2 Temperature2 ddG2
….

For example:

1AAR K6E 5 25 0.53
1AAR K6Q 5 25 0.26

4. The format of the input file in the “Predict stability” panel is:

PDB-id1 Variation1 pH1 Temperature1
PDB-id2 Variation2 pH2 Temperature2
….

For example:

1AAR K6E 5 25
1AAR K6Q 5 25
1AAR F45W 5 25

4.2 Output files

The results in the four main panels are displayed in table and histogram views. All tables can be saved as either txt or Excel files, and all histograms can be saved as JPG files, by right-clicking.

5 Example

This section shows how to use PPSC with a brief example. The aim is to calculate CE and predict stability. The data are listed in a file named “example.txt” with the following content:

1LZ1 I89A 2.7 64.9
1LZ1 V121A 2.7 64.9
1VQB C33I 7 25
1BNI A32Y 6.3 25
1CYO V61Y 7 67.2

The file meets the format requirements of three of the main panels. It cannot be used in the “Train model” panel, because it contains no ddG values.
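The properties of “example.txt” mentioned here can be checked with a few lines of Python. This sketch is not part of PPSC, and the function name `summarize` is hypothetical. It assumes, based on the way the example reuses the same file in the “Download PDB file” panel, that only the first column is needed for downloading, and it treats a fifth column as the ddG value required for training.

```python
EXAMPLE = """\
1LZ1 I89A 2.7 64.9
1LZ1 V121A 2.7 64.9
1VQB C33I 7 25
1BNI A32Y 6.3 25
1CYO V61Y 7 67.2
"""


def summarize(text):
    """Return (unique PDB ids in file order, whether every row has a ddG)."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # dict.fromkeys keeps the first occurrence of each id and preserves order.
    pdb_ids = list(dict.fromkeys(row[0] for row in rows))
    has_ddg = all(len(row) == 5 for row in rows)
    return pdb_ids, has_ddg


pdb_ids, has_ddg = summarize(EXAMPLE)
print("PDB ids to download:", pdb_ids)
print("Usable for training:", has_ddg)
```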

Double-click the “PPSC-v1.0.jar” file to start the program.

Step 1: download the PDB files in the “Download PDB file” panel (Fig 5.1). Use “example.txt” as input; “OK” appears in the status column once all PDB files have been fully downloaded.

Fig 5.1 Download PDB file

Step 2: calculate CE using “example.txt”. The CE result is shown in red in the table (Fig 5.2).

Fig 5.2 Calculate CE

Step 3: predict stability. Select both “M8” and “M47”, provide “example.txt” as the list of variants, and obtain the result shown in Fig 5.3.

Fig 5.3 Predict stability of the example by M8 and M47

Alternatively, choose “Predict ddG”; the results can be saved as shown in Figs 5.4 and 5.5.

Fig 5.4 Right click to save the result of “Predict ddG” using M8 and M47 as Excel and txt file

Fig 5.5 Right click to save the result image

6 References

Alberty RA. 1969. Standard Gibbs free energy, enthalpy, and entropy changes as a function of pH and pMg for several reactions involving adenosine phosphates. J Biol Chem 244:3290-302.

Capriotti E, Fariselli P, Calabrese R, Casadio R. 2005. Predicting protein stability changes from sequences using support vector machines. Bioinformatics 21 Suppl 2:ii54-8.

Capriotti E, Fariselli P, Casadio R. 2004. A neural-network-based method for predicting protein stability changes upon single point mutations. Bioinformatics 20 Suppl 1:i63-8.

Chang C-C, Lin C-J. 2001. LIBSVM: a library for support vector machines.

Collantes ER, Dunn WJ 3rd. 1995. Amino acid side chain descriptors for quantitative structure-activity relationship studies of peptide analogues. J Med Chem 38:2705-13.

Eisenberg D, Schwarz E, Komaromy M, Wall R. 1984. Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol 179:125-42.

Ferrer-Costa C, Orozco M, de la Cruz X. 2002. Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties. J Mol Biol 315:771-86.

Keerthi SS, Lin C-J. 2003. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation 15:1667-89.

Khan S, Vihinen M. 2010. Performance of protein stability predictors. Hum Mutat.

Khatun J, Khare SD, Dokholyan NV. 2004. Can contact potentials reliably predict stability of proteins? J Mol Biol 336:1223-38.

Lazaridis T, Karplus M. 2000. Effective energy functions for protein structure prediction. Curr Opin Struct Biol 10:139-45.

Lin H-T, Lin C-J. 2003. A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods. Department of Computer Science, National Taiwan University.

Matthews BW. 1975. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 405:442-51.

Shen B, Bai J, Vihinen M. 2008. Physicochemical feature-based classification of amino acid mutations. Protein Eng Des Sel 21:37-44.

Shen B, Vihinen M. 2003. RankViaContact: ranking and visualization of amino acid contacts. Bioinformatics 19:2161-2.

Thusberg J, Vihinen M. 2009. Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods. Hum Mutat 30:703-14.

Zamyatnin AA. 1972. Protein volume in solution. Prog Biophys Mol Biol 24:107-23.