fwht-rf:anovelcomputationalapproachtopredictplant ...particularly expensive, time-consuming, and...

Research ArticleFWHT-RF A Novel Computational Approach to Predict PlantProtein-Protein Interactions via an Ensemble Learning Method

Jie Pan Li-Ping Li Chang-Qing Yu Zhu-Hong You Zhong-Hao Ren and Jing-Yu Tang

School of Information Engineering Xijing University Xirsquoan 710123 China

Correspondence should be addressed to Li-Ping Li cs2bioinformaticsgmailcom

Received 26 May 2021 Accepted 14 July 2021 Published 22 July 2021

Academic Editor Pengwei Wang

Copyright copy 2021 Jie Pan et al is is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Protein-protein interactions (PPIs) in plants are crucial for understanding biological processes Although high-throughputtechniques produced valuable information to identify PPIs in plants they are usually expensive inefficient and extremely time-consuming Hence there is an urgent need to develop novel computational methods to predict PPIs in plants In this article weproposed a novel approach to predict PPIs in plants only using the information of protein sequences Specifically plantsrsquo proteinsequences are first converted as position-specific scoring matrix (PSSM) then the fast WalshndashHadamard transform (FWHT)algorithm is used to extract feature vectors from PSSM to obtain evolutionary information of plant proteins Lastly the rotationforest (RF) classifier is trained for prediction and produced a series of evaluation results In this work we named this approachFWHT-RF because FWHT and RF are used for feature extraction and classification respectively When applying FWHT-RF onthree plantsrsquo PPI datasets Maize Rice and Arabidopsis thaliana (Arabidopsis) the average accuracies of FWHT-RF using 5-foldcross validation were achieved as high as 9520 9442 and 8385 respectively To further evaluate the predictive power ofFWHT-RF we compared it with the state-of-art support vector machine (SVM) and K-nearest neighbor (KNN) classifier indifferent aspects e experimental results demonstrated that FWHT-RF can be a useful supplementary method to predictpotential PPIs in plants

1 Introduction

Protein-protein interactions (PPIs) in plants underlie manybiological processes including cellular organization signaltransduction [1] metabolic cycles [2] and plant defense [3]us detecting and characterizing the protein interactions arecritically important for understanding the relevant molecularmechanisms inside the plant cells With the scientific andtechnological advances a multitude of experimental approacheshad been developed to identify PPIs in plants such as yeast two-hybrid (Y2H) [4] bimolecular fluorescence complementation(BiFC) [5] tandem affinity purification (TAP) [6] and someother high-throughput DNA sequencing technology for PPIsdetection erefore a huge and ever-increasing of experi-mental data about plants PPIs has been accumulated Howeverthese approaches have some inevitable shortcomings they areparticularly expensive time-consuming and always present

problems with high false-negative rates Besides it is also dif-ficult to apply large-scale experiments on plants due to thecomplexity of interactions in plant cells As a result of theseshortcomings developing accurate computational methods topredict PPIs would be of great value to plant biologists

In recent years many computational methods and en-semble learning algorithms have been established to offercomplementary and supporting information for previousexperimental approaches [7ndash9] ese methods can bebroadly classified into three categories protein structure-based method docking-based method and sequence-basedmethod Generally the first two methods usually needstructural details However many proteins do not haveinformation about the prior knowledge such as 3D struc-tural information and protein homology In addition withthe rapid advance in high-throughput sequencing tech-nology more and more plant protein sequence data are

HindawiScientific ProgrammingVolume 2021 Article ID 1607946 11 pageshttpsdoiorg10115520211607946

available which lead to a great interest in sequence-basedmethods for PPIs prediction

To date many sequence-based approaches have beenpresented for predicting PPIs and many ensemble-learningalgorithms have been proposed for classification [10ndash12]For example Yi et al [13] proposed a method called RPI-SAN which adopts the deep-learning stacked auto-encodernetwork to mine the features from RNA and protein se-quences and then employs the rotation forest classifier topredict ncRNA binding proteins Hashemifar et al [14]developed a novel deep learning framework named DPPIDPPI combined random projection and data augmentationwith a deep Siamese-like convolutional neural network topredict PPIs Zhang et al [15] presented the EnsDNN(Ensemble Deep Neural) method which first employed thelocal descriptor covariance descriptor and multiscalecontinuous and discontinuous local descriptor together toexplore the interactions between proteins en it trainedthe deep neural networks (DNNs) based on different con-figurations of each descriptor Finally they adopted a two-hidden layers neural network to integrate these DNNs topredict potential PPIs Wei et al [16] combined the novelnegative samples features and an ensemble classifier topredict PPIs ey report two types of novel feature ex-traction methods One is the based on physicochemicalproperties of proteins and the other is based on the sec-ondary structure information Sun et al [17] applied a deep-learning algorithm stacked autoencoder to identify PPIsfrom the protein sequence Kulmanov et al [18] developed amethod called DeepGO which combined a deep ontology-aware classifier with amino acid sequence information todetect protein functions and interactions Despite theseadvances there is still room for improvement in the pre-diction performance of PPIsrsquo model [19]

In this article we present a novel sequence-basedcomputational approach namely FWHT-RF to predictpotential protein-protein interactions in plants Morespecifically we first transformed the plants protein se-quences as position-specific scoring matrix (PSSM)en in order to fully characterize the evolutionaryinformation of protein pairs we performed the fastWalshndashHadamard transform (FWHT) on the PSSM toextract featuresrsquo vectors Although FWHT plays an es-sential role in image analysis and pattern recognition butas we know it is first time to be applied in plant biologyfor the purpose of PPIsrsquo prediction Lastly a powerfulclassification model rotation forest (RF) was used totrain the models e major contributions of FWHT-RFare as follows (1) FWHT-RF did not depend on uniquesubspaces in the studied proteomic space or known PPIsrsquosamples because it extracts features directly from PSSM ofthe plant protein sequence (2) Since these characteristicsare linked to the evolutionary past of plant proteins theyhave more power to detect PPIs than many other ap-proaches (3) e basic features from PSSM for each plantproteins were extracted using a novel statistical selectionfeature mechanism and converted into a 400-dimensionalfeature vector As a result the feature vectors of these twoproteins are integrated to create an 800-dimensional

feature vector for each protein pair (4) Finally this worksuggested to use the RF classifier for training these fea-tures which can improve the accuracy of PPIs predictionis model has been well investigated in three plantsrsquodatasets (Maize Rice and Arabidopsis thaliana (Arabi-dopsis)) and yields a high prediction accuracy of 95209442 and 8385 respectively To further evaluate thepredictive performance of FWHT-RF we comparedFWHT-RF with the state-of-art support vector machine(SVM) and k-nearest neighbor (KNN) classifier eexperimental results indicated that FWHT-RF can be acomplement tool to large-scale prediction of PPIs inplants

2 Materials and Methods

21 Benchmark Datasets Collection Although many ex-periments and databases have been developed to identifyand store the PPIs data in plants [20 21] however falsepositive interactions are typical in these data ese falsepositive data may have a negative impact for the com-putational methods erefore the construction ofbenchmark datasets to improve the accuracy of plant PPIsprediction is necessary In this paper we evaluate theFWHT-RF approach through three plantsrsquo benchmarkdatasets including Maize Rice and Arabidopsis thaliana(Arabidopsis)

As we all know maize is one of the most importantcereal crops in the world and a model plant for genomicstudies of PPIs e Maize dataset was gathered from theProtein-Protein Interaction Database for Maize (PPIM)[22] and agriGO [23] We obtained 14800 nonredundantmaize protein pairs which built the positive dataset Inorder to construct the negative dataset we selected 14800additional maize protein pairs of different subcellularlocalizations Consequently the whole Maize datasetconsists of 29600 protein pairs

To further demonstrate the feasibility of the proposedmethod two different types of plant PPIsrsquo datasets werealso adopted in this study e first one is Rice which wasgathered from the PRIN [24] database which consisted of9600 nonredundant rice protein interaction pairs (4800interacting pairs and 4800 noninteracting pairs) esecond is the popular model plant Arabidopsis Wecollected Arabidopsis PPIs from public PPI databasesIntAct [25] BioGRID [26] and TAIR [27] After theremoval of redundant sequences we obtained 28110interactions from 7437 Arabidopsis proteins e negativeprotein pairs are generated by randomly pairing theproteins without evidence of interactions In this way thewhole Arabidopsis dataset is constructed by 56220protein pairs

22 Representation of Target Proteins e position-specificscoring matrix (PSSM) [28] was firstly proposed for testingthe distantly related proteins In recent years PSSM has beenwidely used for mining the evolutionary information ofprotein sequences [29] PSSM is a P times 20 matrix e

2 Scientific Programming

number of amino acids in the proteins is represented by Pand the naive amino acids are represented by 20 columnsSuppose that L φ (i j) i 1 P j 1 201113864 1113865 andthe following is a summary of each matrix

L

φ11 φ12 middot middot middot φ1j middot middot middot φ120


⋮ ⋮ ⋮ ⋮

φi1 φi2 middot middot middot φij middot middot middot φi20

⋮ ⋮ ⋮ ⋮

φp1 φp2 middot middot middot φpj middot middot middot φp20

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

(1)

where φ11 in the i row of PSSM indicates the probability ofthe ith residue being mutated into jth native amino acid

In this study we adopted the Position-Specific IteratedBLAST (PSI-BLAST) [30] tool to generate the PSSM for thepurpose of extracting evolutionary information To achievebroad and high homologous sequences the expectationvalue (e value) was set to 0001 the number of iterations wasset to 3 and other parameters were maintained as the defaultvalues

23 2D Fast WalshndashHadamard TransformWalshndashHadamard transform (WHT) [31] is employed inmany applications such as image analysis and signal pro-cessing It is recognized as a generalized type of Fouriertransforms (FT) and has three popular orderings (1)NaturalOrdering (Hadamard Ordering) (2) Dyadic Ordering (PaleyOrdering) and (3) Sequency Ordering (Walsh Ordering)[32ndash34] In this study we will focus on the WHTof NaturalOrdering e WHT matrix consists only by plusmn1 Since nomultiplication operation is required in the computation thecomputational complexity is greatly reduced In theencrypted domain this algorithm can avoid quantizationerror and thusWHTcan ensure perfect reconstruction of theencrypted image erefore WHT is better and more ef-fective than transformations such as DFT [35] or DCT [36]

Suppose that Q(i j) represented the input image witha times b size where a and b were used to describe the same andthe power of 2 e two-dimensional fast WalshndashHadamardtransform (FWHT) [37] of Natural Ordering (HadamardOrdering) can be defined as follows

ψ(p k) 1a

1113944

aminus1

i01113944

bminus1

j0Hη(p i)Q(i j)Hη(j k) p 0 1 2 3 a minus 3 a minus 2 a minus 1 and n 0 1 2 3 b minus 3 b minus 2 b minus 1

(2)

where μ log2 p andHη denotes the Hadamardmatrix Hηcan be generated by the core matrix

H1 1 1

1 minus11113890 1113891 (3)

and the Kronecker product recursion is as follows

Hη H1 otimesHηminus1 Hηminus1 Hηminus1

Hηminus1 minusHηminus1

⎡⎣ ⎤⎦ (4)

where otimes is the Kronecker product operator [38] e 2DFWHT [39] is a separable transformation which can befurther divided into two 1D transforms When applying the2D FWHT on the input image Q(i j) is equivalent to ap-plying 1D FWHTon all columns of the input image initiallyand then using 1D FWHTon all rows of achieved results For2D FWHT the computational complex is a log a In thisstudy Q(i j) is the input signal matrix and here is the P times

20 PSSMmatrix By this way the plant protein sequence canbe represented by FWHT feature descriptors

24 Ensemble Rotation Forest Classifier Rotation forest(RF) was introduced by Rodriguez et al [40] which is an

ensemble learning algorithm based on an independentlytrained decision tree e main advantage of RF is that itcan balance diversity and accuracy at the same time RFfirst randomly divided the samples into different subsetsen principal component analysis (PCA) [41] was usedto transform the attribute subsets to increase the dif-ference between the subsets At last the transformedsubsets will be fed into the decision trees e results ofRF can be achieved via a voting method by these trees especific steps of RF are as follows

Suppose that qi pi1113864 1113865 contains T samples of which qi

(qi1 qi2 qi3 qiL) be an L-dimensional feature vectorLet Z represents the training sample set containing Ttraining samples and forming a matrix of T times L Let Urepresents the feature set and M denotes the label setAssume that the number of decision trees is S then thedecision trees can be denoted as D1 D2 D3 DS erotation forest algorithm is implemented as follows

(1) Choose the suitable parameter M which can ran-domly split divide U into M disjointed subsets andthe number of features contained in the featuresubset is LM

(2) Let Uij represent the jth feature subset and be usedto train the classifier Di e sample subset Zij

prime is

Scientific Programming 3

constructed by a nonempty subset which is ran-domly picked out from a certain proportion

(3) Apply PCA on Zijprime to order the coefficients which

is stored in matrix λij(4) e coefficients achieved from the matrix λij are

used to construct a sparse rotation matrix φi whichcan be defined as follows

φi

a(1)i1 a

S1( )i1 0 middot middot middot 0

0 a(1)i1 a

S2( )i1 middot middot middot 0

⋮ ⋮ ⋱ ⋮

0 0 middot middot middot a(1)i1 a

SM( )i1

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

(5)

During the prediction process a test sample g is givenwhich is generated by the classifier Di of Rij(Zφa

i ) which isintroduced to indicate that g belongs to class pi en theclass of confidence is calculated via the average combinationand the formula can be expressed as follows

Vj(g) 1S

1113944

S

i1Rij Zφa

i( 1113857 (6)

en assign the category with the largest Vj(g) value tog e overview of FWHT-RF workflow is presented inFigure 1

3 Results and Discussion

31 Validation Measures In this work we employed mul-tiple evaluation indicators to access the effectiveness ofFWHT-RF including accuracy (Acc) sensitivity (Sen)precision (Prec) Matthews correlation coefficient (MCC)and area under the receiver operating characteristic curve(AUC) Correspondingly the first four formulas can berepresented as follows

Acc TN + TP

TN + FP + FN + TP

Sen TP

FN + TP

Prec TP

TP + FP

MCC TN times TP minus FP times FN

(TP + FP) times(TN + FN) times(TN + FP) times(TP + FN)

1113968

(7)

where TP (true positive) represents the number of true plantPPIs that are correctly identified (positive samples) FP (falsepositive) refers to the number of noninteraction plantprotein pairs (negative samples) and TN (true negative)denotes the number of correct classification of positivesamples while FN (false negative) refers to the number ofincorrect classification of negative samples

To provide a more comprehensive assessment of theFWHT-RF method the receiver operating characteristic

(ROC) curves which are suitable for accessing the perfor-mance of the proposed method were computed e areaunder the ROC curves (AUC) was also calculated to test thepredictive ability of FWHT-RF AUC denotes the probabilitythat a positive sample is ahead of a negative one e AUCvalue closer to 10 indicates the better predictive perfor-mance of the FWHT-RF method [42]

32 Assessment of PredictionAbility In this article we adopt5-fold cross-validation technique to comparatively accessthe prediction performance of FWHT-RF in three plantdatasets involvingMaize Rice and Arabidopsis By this waywe can prevent overfitting and test the stability of theproposed method More specifically each plant PPIsrsquo datasetis randomly split into five subsets one of them is used as atesting set in turn and the other four subsets are adopted astraining sets us five models can be generated for the fivesets of data e cross validation has the advantages that itcan minimize the impact of data dependency and improvethe reliability of the results

e 5-fold cross validation results of the proposed ap-proach on the three plants datasets are listed in Tables 1ndash3From Tables 1ndash3 we can observe that when applying theproposed method to the Maize dataset we obtained the bestprediction results of average accuracy precision sensitivityand MCC as 9520 9729 9299 and 9085 withcorresponding standard deviations 038 026 062 and069 respectively When performing FWHT-RF on the Ricedataset we yielded good results of average accuracy precisionsensitivity and MCC of 9442 9463 9417 and 8946respectively e standard deviations of these criteria valuesare 056 084 072 and 099 respectively Whenperforming FWHT-RF on the Arabidopsis dataset the pro-posed approach obtained good results of average accuracyprecision sensitivity and MCC of 8385 8929 7695and 7266 and the standard deviations are 035 062116 and 052 respectively Figures 2ndash4 show the ROCcurves for the proposed approach on Maize Rice andArabidopsis e average AUC values range from 9055 to9750 (Maize 9750 Rice 9690 and Arabidopsis9055) demonstrating that FWHT-RF is fitting well forpredicting PPIs in plants from amino acid sequences

ese good results collectively indicated that it is sufficientto predict PPIs in plants only using protein sequence infor-mation and that powerful prediction capability can be gen-erated by combining the RF classifier with FWHT featuresrsquodescriptorse high accuracies and low standard deviations ofthese criterion values indicate that FWHT-RF is feasible andeffective for predicting potential PPIs in plants

33 Comparison of RF with SVM and KNN Classifiersere are various methodologies for machine learning modelsto identify PPIs and most of them are based on traditionalclassifiers To further access the predictive performance ofFWHT-RF we compared it by using the same feature ex-traction approachwith the state-of-art SVMandKNNclassifierin the same three plantsrsquo datasets e main idea of the SVMalgorithm is to find the optimal hyperplane that maximally


MTSHIATETSVNRWSTEPVVREEGRPPGPEDFPSTKQGREQDGNVIGITPNYKIGEQKPNSELNLGKTGSLPRGSGDYKMTYRQEDYKQTIIDPKVITDESLRELETMKQRIETLKRDANKTVDTSPRSVVSMNGASHDR

Representation ofprotein sequence

1

2

11

12

13

n

M

F

D

V

E

L

L =

φ11 φ12φ22φ21

φi1 φi2 φij φi20

φp20φpjφp2φp1

φ1jφ2j

φ120

x0y0

y1

y2

y3

y4

y5

y6

y7

x4

x1

x5

x2

x6

x3

x7

φ220

A R Y V

i-3

i+3 065 014 098 054

072 013 014 032

089064098016

Position-specific scoring matrix

FWHT

Featuresextraction

2

1

0

-1

-2

-3 -2 0 1-1 2 3

PCA

true

true

true

true

true

true

false

false

false

false

false

false

Decision treeVoting results

Rotation forest

1

08

06

04

02

0

Sens

itivi

ty

0 02 04 06 08 11 ndash specificity

1st-fold (AUC = 09686)

2nd-fold (AUC = 09693)

3rd-fold (AUC = 09700)

4th-fold (AUC = 09683)


Recognized PPIs

Protein A and B

Average AUC = 09678

Figure 1 e workflow of FWHT-RF for predicting protein-protein interactions in plants

Table 1 5-fold cross-validation results obtained on the Maize dataset using FWHT-RF

Testing set Acc () Sen () Prec () MCC () AUC ()1 9468 9206 9724 8991 97062 9506 9297 9693 9060 97403 9556 9349 9765 9150 97954 9559 9360 9727 9155 97625 9512 9282 9737 9070 9745Average 9520 plusmn 038 9299 plusmn 062 9729 plusmn 026 9085 plusmn 069 9750 plusmn 033

Table 2 5-fold cross-validation results obtained on the Rice dataset using FWHT-RF


Table 3 5-fold cross-validation results obtained on the Arabidopsis dataset using FWHT-RF



separates training data from the two classes and it is effectivefor solving classification prediction problems K-nearestneighbor is a supervisedmachine learning technique and it cansolve the classification task e LIBSVM tool was selected inthis paper to training the SVM model At the same time thereare two parameters c and g that need to be optimized In theexperiment of theMaize and Rice dataset we set c 5 g 03c 7 and g 04 respectively When applying the FWHT-RFon the Arabidopsis dataset we set c 5 and g 07 e KNNmodel needs to choose the neighbor k and distance measuringfunction In this paper k is set to be 1 and the distancemeasuring function is selected as L1

Figure 5 shows the experimental results of RF SVMand KNN models in three plants datasetsMaize Rice andArabidopsis From Figures 5(a)ndash5(d) it can be concludedthat the results of the RF classifier are significantly betterthan those of SVM and KNN classifiers For example theaccuracy gaps between SVM and RF on the Maize Riceand Arabidopsis were 798 853 and 326 respec-tively Similarly the accuracy gaps between KNN and RFare 1172 1536 and 1040 respectively e ROCcurves achieved by the SVM and KNN classifiers on thethree plants datasets are shown in Figures 6ndash8 All theexperimental results are listed in Table 4

1

08

06

04

02

00 02 04 06 08 1

1 ndash specificity

1st-fold (AUC = 09706)2st-fold (AUC = 09740)3rd-fold (AUC = 09795)

4th-fold (AUC = 09762)5th-fold (AUC = 09745)

Sens

itivi

ty

Average AUC = 09750

Figure 2 ROC curve for FWHT-RF on Maize PPIs dataset

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09690

Figure 3 ROC curve for FWHT-RF on Rice PPIs dataset


1

08

06

04

02

0

Sens

itivi

ty


1st-fold (AUC = 09073)2nd-fold (AUC = 09021)3rd-fold (AUC = 09091)


Average AUC = 09055

Figure 4 ROC curve for FWHT-RF on Arabidopsis PPIs dataset

100

90

80

70

60

50

40

30

20

10

0

Accu

racy

()

Maize Rice Arabidopsis

RFSVMKNN

(a)

100

90

80

70

60

50

40

30

20

10

0

Prec

ision

()


RFSVMKNN

(b)

Figure 5 Continued


100

90

80

70

60

50

40

30

20

10

0

MCC

()


RFSVMKNN

(c)

100

90

80

70

60

50

40

30

20

10

0

AUC

()


RFSVMKNN

(d)

Figure 5 Performance comparisons of four validation metrics for three classifiers RF (blue bar) SVM (green bar) and KNN (yellow bar)(a) Accuracy (b) Precision (c) MCC (d) AUC

1

08

06

04

02

0

Sens

itivi

ty


1st-fold(AUC = 09316)2nd-fold(AUC = 09381)3rd-fold(AUC = 09322)

4th-fold(AUC = 09265)5th-fold(AUC = 09288)

Average AUC = 09314

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08348

(b)

Figure 6 ROC curves performed onMaize dataset (5-fold cross validation) (a) is the ROC curves of SVMmethod (b) is the ROC curves ofKNN classifier


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09238

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07905

(b)

Figure 7 ROC curves performed on Rice dataset (5-fold cross validation) (a) is the ROC curves of SVM method (b) is the ROC curves ofKNN classifier

1

08

06

04

02

0

Sens

itivi

ty

0 02 04 06 0t8 11 ndash specificity



Average AUC = 08783

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07345

(b)

Figure 8 ROC curves performed on Arabidopsis dataset (5-fold cross validation) (a) is the ROC curves of SVM method (b) is the ROCcurves of KNN classifier


4 Discussion and Conclusions

In this study we presented an effective sequence-basedmethod called FWHT-RF to predict potential PPIs in plantsis method combined position-specific scoring matrix(PSSM) with fast WalshndashHadamard transform (FWHT) androtation forest (RF) classifier First we transformed the plantprotein sequences into PSSM to obtain the evolutionaryinformation of plantsrsquo protein sequences en the FWHTalgorithm was used to extract as much hidden informationas possible from the plant protein sequences At last the RFclassifier was trained for predicting PPIs in plants Whenperformed FWHT-RF on three plantsrsquo PPI datasets MaizeRice and Arabidopsis it achieved a high prediction accuracyof 9520 9442 and 8385 respectively Moreover wecompared FWHT-RF with the state-of-art SVM and KNNclassifier by adopting the same feature extraction methode comprehensive experiments demonstrated that FWHT-RF is an effective tool to predict PPIs in plants In the futurework we will consider applying FWHT-RF to other bio-informatics problems

Data Availability

e data are original and the data source is restricted

Conflicts of Interest

e authors declare that there are no conflicts of interest

Acknowledgments

is research was funded by the National Natural ScienceFoundation of China Grant nos 61722212 and 62002297

References

[1] R Subramaniam D Desveaux C Spickler S W Michnickand N Brisson ldquoDirect visualization of protein interactions inplant cellsrdquo Nature Biotechnology vol 19 no 8 pp 769ndash7722001

[2] T Pawson and P Nash ldquoProtein-protein interactions definespecificity in signal transductionrdquo Genes amp Developmentvol 14 no 9 pp 1027ndash1047 2000

[3] M Alves S Dadalto A Gonccedilalves G De Souza V Barrosand L Fietto ldquoTranscription factor functional protein-proteininteractions in plant defense responsesrdquo Proteomes vol 2no 1 pp 85ndash106 2014

[4] C C Matiolli andMMelotto ldquoA comprehensive Arabidopsisyeast two-hybrid library for protein-protein interactionstudies a resource to the plant research communityrdquo Mo-lecular Plant-Microbe Interactions vol 31 no 9 pp 899ndash9022018

[5] N Ohad and S Yalovsky ldquoUtilizing bimolecular fluorescencecomplementation (BiFC) to assay protein-protein interactionin plantsrdquo in Plant Developmental Biology pp 347ndash358Springer Berlin Germany 2010

[6] J S Rohila M Chen S Chen et al ldquoProtein-protein in-teractions of tandem affinity purified protein kinases fromricerdquo PLoS One vol 4 no 8 Article ID e6685 2009

[7] X-Y Pan Y-N Zhang and H-B Shen ldquoLarge-scale pre-diction of human protein-protein interactions from aminoacid sequence based on latent topic featuresrdquo Journal ofProteome Research vol 9 no 10 pp 4992ndash5001 2010

[8] P Wang R Pleskot J Zang et al ldquoPlant AtEHPan1 proteinsdrive autophagosome formation at ER-PM contact sites withactin and endocytic machineryrdquo Nature Communicationsvol 10 no 1 pp 5132ndash16 2019

[9] K Knox P Wang V Kriechbaumer et al ldquoPutting thesqueeze on plasmodesmata a role for reticulons in primaryplasmodesmata formationrdquo Plant Physiology vol 168 no 4pp 1563ndash1572 2015

[10] A Onan ldquoOn the performance of ensemble learning forautomated diagnosis of breast cancerrdquo inArtificial IntelligencePerspectives and Applications pp 119ndash129 Springer BerlinGermany 2015

[11] A Onan ldquoBiomedical text categorization based on ensemblepruning and optimized topic modellingrdquo Computational andMathematical Methods in Medicine vol 2018 Article ID2497471 22 pages 2018

[12] A Onan ldquoEnsemble learning based feature selection with anapplication to text classificationrdquo in Proceedings of the 201826th Signal Processing and Communications ApplicationsConference (SIU) 2018

[13] H-C Yi Z-H You D-S Huang X Li T-H Jiang andL-P Li ldquoA deep learning framework for robust and accurateprediction of ncRNA-protein interactions using evolutionaryinformationrdquo Molecular IerapymdashNucleic Acids vol 11pp 337ndash344 2018

[14] S Hashemifar B Neyshabur A A Khan and J Xu ldquoPre-dicting protein-protein interactions through sequence-baseddeep learningrdquo Bioinformatics vol 34 no 17 pp i802ndashi8102018

[15] L Zhang G Yu D Xia and J Wang ldquoProtein-protein in-teractions prediction based on ensemble deep neural net-worksrdquo Neurocomputing vol 324 pp 10ndash19 2019

[16] L Wei P Xing J Zeng J Chen R Su and F Guo ldquoImprovedprediction of protein-protein interactions using novel

Table 4 Comparing results of RF with SVM and KNN model on three plants PPIs dataset

Dataset Classifier Acc () Sen () Prec () MCC () AUC ()

MaizeRF 9520plusmn 038 9299plusmn 062 9729plusmn 026 9085plusmn 069 9750plusmn 033SVM 8722plusmn 041 8626plusmn 089 8795plusmn 071 7770plusmn 062 9314plusmn 044KNN 8348plusmn 038 9129plusmn 040 7896plusmn 087 7208plusmn 051 8348plusmn 030

RiceRF 9442 plusmn 056 9417 plusmn 072 9463 plusmn 084 8946 plusmn 099 9690 plusmn 037SVM 8589plusmn 091 8665plusmn 176 8533plusmn 055 7576plusmn 129 9238plusmn 049KNN 7906plusmn 065 8886plusmn 096 7429plusmn 091 6625plusmn 074 7905plusmn 055

ArabidopsisRF 8385 plusmn 035 7695plusmn 116 8929 plusmn 062 7266 plusmn 052 9055 plusmn 041SVM 8059plusmn 037 7722plusmn 085 8281plusmn 041 6865plusmn 046 8783plusmn 045KNN 7345plusmn 041 7853 plusmn 088 7129plusmn 072 6079plusmn 038 7345plusmn 040


negative samples features and an ensemble classifierrdquo Ar-tificial Intelligence in Medicine vol 83 pp 67ndash74 2017

[17] T Sun B Zhou L Lai and J Pei ldquoSequence-based predictionof protein protein interaction using a deep-learning algo-rithmrdquo BMC Bioinformatics vol 18 no 1 pp 277ndash8 2017

[18] M Kulmanov M A Khan and R Hoehndorf ldquoDeepGOpredicting protein functions from sequence and interactionsusing a deep ontology-aware classifierrdquo Bioinformaticsvol 34 no 4 pp 660ndash668 2018

[19] A Onan S Korukoglu and H Bulut ldquoA multiobjectiveweighted voting ensemble classifier based on differentialevolution algorithm for text sentiment classificationrdquo ExpertSystems with Applications vol 62 pp 1ndash16 2016

[20] K Bracha-Drori ldquoDetection of protein-protein interactions inplants using bimolecular fluorescence complementationrdquoIePlant Journal vol 40 no 3 pp 419ndash427 2004

[21] P Zhu H Gu Y Jiao D Huang and M Chen ldquoCompu-tational identification of protein-protein interactions in ricebased on the predicted rice interactome networkrdquo GenomicsProteomics amp Bioinformatics vol 9 no 4-5 pp 128ndash37 2011

[22] G Zhu A Wu X-J Xu et al ldquoPPIM a protein-proteininteraction database for maizerdquo Plant Physiology vol 170no 2 pp 618ndash626 2016

[23] T Tian Y Liu H Yan et al ldquoagriGO v20 a GO analysistoolkit for the agricultural community 2017 updaterdquo NucleicAcids Research vol 45 no W1 pp W122ndashW129 2017

[24] H Gu P Zhu Y Jiao Y Meng and M Chen ldquoPRIN apredicted rice interactome networkrdquo BMC Bioinformaticsvol 12 no 1 p 161 2011

[25] S Kerrien B Aranda L Breuza et al ldquoe IntAct molecularinteraction database in 2012rdquo Nucleic Acids Research vol 40no D1 pp D841ndashD846 2012

[26] R Oughtred C Stark B-J Breitkreutz et al ldquoe BioGRIDinteraction database 2019 updaterdquo Nucleic Acids Researchvol 47 no D1 pp D529ndashD541 2019

[27] S Y Rhee ldquoe Arabidopsis information resource (TAIR) amodel organism database providing a centralized curatedgateway to Arabidopsis biology research materials andcommunityrdquo Nucleic Acids Research vol 31 no 1pp 224ndash228 2003

[28] M Gribskov A D McLachlan and D Eisenberg ldquoProfileanalysis detection of distantly related proteinsrdquo Proceedingsof the National Academy of Sciences vol 84 no 13pp 4355ndash4358 1987

[29] G Raicar H Saini A Dehzangi S Lal and A SharmaldquoImproving protein fold recognition and structural classprediction accuracies using physicochemical properties ofamino acidsrdquo Journal of Ieoretical Biology vol 402pp 117ndash128 2016

[30] S F Altschul and E V Koonin ldquoIterated profile searches withPSI-BLAST-a tool for discovery in protein databasesrdquo Trendsin biochemical sciences vol 23 no 11 pp 444ndash447 1998

[31] B J Fino and V R Algazi ldquoUnified matrix treatment of thefast Walsh-Hadamard transformrdquo IEEE Transactions onComputers vol 25 no 11 pp 1142ndash1146 1976

[32] P Zheng and J Huang ldquoWalsh-Hadamard transform in thehomomorphic encrypted domain and its application in imagewatermarkingrdquo in Proceedings of the 2012 InternationalWorkshop on Information Hiding pp 240ndash254 SpringerBerkeley CA USA 2012

[33] A ompson ldquoe cascading Haar wavelet algorithm forcomputing the Walsh-Hadamard transformrdquo IEEE SignalProcessing Letters vol 24 no 7 pp 1020ndash1023 2017

[34] C-S Park ldquoRecursive algorithm for sliding Walsh Hadamardtransformrdquo IEEE Transactions on Signal Processing vol 62no 11 pp 2827ndash2836 2014

[35] S Weinstein and P Ebert ldquoData transmission by frequency-division multiplexing using the discrete Fourier transformrdquoIEEE Transactions on Communication Technology vol 19no 5 pp 628ndash634 1971

[36] N Ahmed T Natarajan and K R Rao ldquoDiscrete cosinetransformrdquo IEEE Transactions on Computers vol C-23 no 1pp 90ndash93 1974

[37] Y A Geadah and M Corinthios ldquoNatural dyadic and se-quency order algorithms and processors for the Walsh-Hadamard transformrdquo IEEE Computer Architecture Lettersvol 26 no 5 pp 435ndash442 1977

[38] M T Hamood and S Boussakta ldquoFast Walsh-Hadamard-Fourier transform algorithmrdquo IEEE Transactions on SignalProcessing vol 59 no 11 pp 5627ndash5631 2011

[39] P Zheng and J Huang ldquoEfficient encrypted images filteringand transform coding with Walsh-Hadamard transform andparallelizationrdquo IEEE Transactions on Image Processingvol 27 no 5 pp 2541ndash2556 2018

[40] J J Rodriguez L I Kuncheva and C J Alonso ldquoRotationforest a new classifier ensemble methodrdquo IEEE Transactionson Pattern Analysis and Machine Intelligence vol 28 no 10pp 1619ndash1630 2006

[41] H Abdi and L J Williams ldquoPrincipal component analysisrdquoWiley Interdisciplinary Reviews Computational Statisticsvol 2 no 4 pp 433ndash459 2010

[42] A Onan S Korukoglu and H Bulut ldquoA hybrid ensemblepruning approach based on consensus clustering and multi-objective evolutionary algorithm for sentiment classificationrdquoInformation Processing amp Management vol 53 no 4pp 814ndash833 2017


available which lead to a great interest in sequence-basedmethods for PPIs prediction

To date many sequence-based approaches have beenpresented for predicting PPIs and many ensemble-learningalgorithms have been proposed for classification [10ndash12]For example Yi et al [13] proposed a method called RPI-SAN which adopts the deep-learning stacked auto-encodernetwork to mine the features from RNA and protein se-quences and then employs the rotation forest classifier topredict ncRNA binding proteins Hashemifar et al [14]developed a novel deep learning framework named DPPIDPPI combined random projection and data augmentationwith a deep Siamese-like convolutional neural network topredict PPIs Zhang et al [15] presented the EnsDNN(Ensemble Deep Neural) method which first employed thelocal descriptor covariance descriptor and multiscalecontinuous and discontinuous local descriptor together toexplore the interactions between proteins en it trainedthe deep neural networks (DNNs) based on different con-figurations of each descriptor Finally they adopted a two-hidden layers neural network to integrate these DNNs topredict potential PPIs Wei et al [16] combined the novelnegative samples features and an ensemble classifier topredict PPIs ey report two types of novel feature ex-traction methods One is the based on physicochemicalproperties of proteins and the other is based on the sec-ondary structure information Sun et al [17] applied a deep-learning algorithm stacked autoencoder to identify PPIsfrom the protein sequence Kulmanov et al [18] developed amethod called DeepGO which combined a deep ontology-aware classifier with amino acid sequence information todetect protein functions and interactions Despite theseadvances there is still room for improvement in the pre-diction performance of PPIsrsquo model [19]

In this article we present a novel sequence-basedcomputational approach namely FWHT-RF to predictpotential protein-protein interactions in plants Morespecifically we first transformed the plants protein se-quences as position-specific scoring matrix (PSSM)en in order to fully characterize the evolutionaryinformation of protein pairs we performed the fastWalshndashHadamard transform (FWHT) on the PSSM toextract featuresrsquo vectors Although FWHT plays an es-sential role in image analysis and pattern recognition butas we know it is first time to be applied in plant biologyfor the purpose of PPIsrsquo prediction Lastly a powerfulclassification model rotation forest (RF) was used totrain the models e major contributions of FWHT-RFare as follows (1) FWHT-RF did not depend on uniquesubspaces in the studied proteomic space or known PPIsrsquosamples because it extracts features directly from PSSM ofthe plant protein sequence (2) Since these characteristicsare linked to the evolutionary past of plant proteins theyhave more power to detect PPIs than many other ap-proaches (3) e basic features from PSSM for each plantproteins were extracted using a novel statistical selectionfeature mechanism and converted into a 400-dimensionalfeature vector As a result the feature vectors of these twoproteins are integrated to create an 800-dimensional

feature vector for each protein pair (4) Finally this worksuggested to use the RF classifier for training these fea-tures which can improve the accuracy of PPIs predictionis model has been well investigated in three plantsrsquodatasets (Maize Rice and Arabidopsis thaliana (Arabi-dopsis)) and yields a high prediction accuracy of 95209442 and 8385 respectively To further evaluate thepredictive performance of FWHT-RF we comparedFWHT-RF with the state-of-art support vector machine(SVM) and k-nearest neighbor (KNN) classifier eexperimental results indicated that FWHT-RF can be acomplement tool to large-scale prediction of PPIs inplants

2 Materials and Methods

21 Benchmark Datasets Collection Although many ex-periments and databases have been developed to identifyand store the PPIs data in plants [20 21] however falsepositive interactions are typical in these data ese falsepositive data may have a negative impact for the com-putational methods erefore the construction ofbenchmark datasets to improve the accuracy of plant PPIsprediction is necessary In this paper we evaluate theFWHT-RF approach through three plantsrsquo benchmarkdatasets including Maize Rice and Arabidopsis thaliana(Arabidopsis)

As we all know maize is one of the most importantcereal crops in the world and a model plant for genomicstudies of PPIs e Maize dataset was gathered from theProtein-Protein Interaction Database for Maize (PPIM)[22] and agriGO [23] We obtained 14800 nonredundantmaize protein pairs which built the positive dataset Inorder to construct the negative dataset we selected 14800additional maize protein pairs of different subcellularlocalizations Consequently the whole Maize datasetconsists of 29600 protein pairs

To further demonstrate the feasibility of the proposedmethod two different types of plant PPIsrsquo datasets werealso adopted in this study e first one is Rice which wasgathered from the PRIN [24] database which consisted of9600 nonredundant rice protein interaction pairs (4800interacting pairs and 4800 noninteracting pairs) esecond is the popular model plant Arabidopsis Wecollected Arabidopsis PPIs from public PPI databasesIntAct [25] BioGRID [26] and TAIR [27] After theremoval of redundant sequences we obtained 28110interactions from 7437 Arabidopsis proteins e negativeprotein pairs are generated by randomly pairing theproteins without evidence of interactions In this way thewhole Arabidopsis dataset is constructed by 56220protein pairs

22 Representation of Target Proteins e position-specificscoring matrix (PSSM) [28] was firstly proposed for testingthe distantly related proteins In recent years PSSM has beenwidely used for mining the evolutionary information ofprotein sequences [29] PSSM is a P times 20 matrix e



L



⋮ ⋮ ⋮ ⋮


⋮ ⋮ ⋮ ⋮


⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

(1)





ψ(p k) 1a

1113944

aminus1

i01113944

bminus1


(2)


H1 1 1

1 minus11113890 1113891 (3)




⎡⎣ ⎤⎦ (4)









prime is






φi

a(1)i1 a


0 a(1)i1 a


⋮ ⋮ ⋱ ⋮


SM( )i1

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

(5)



Vj(g) 1S

1113944

S

i1Rij Zφa

i( 1113857 (6)




Acc TN + TP

TN + FP + FN + TP

Sen TP

FN + TP

Prec TP

TP + FP



1113968

(7)











1

2

11

12

13

n

M

F

D

V

E

L

L =

φ11 φ12φ22φ21


φp20φpjφp2φp1

φ1jφ2j

φ120

x0y0

y1

y2

y3

y4

y5

y6

y7

x4

x1

x5

x2

x6

x3

x7

φ220

A R Y V

i-3

i+3 065 014 098 054

072 013 014 032

089064098016


FWHT

Featuresextraction

2

1

0

-1

-2

-3 -2 0 1-1 2 3

PCA

true

true

true

true

true

true

false

false

false

false

false

false


Rotation forest

1

08

06

04

02

0

Sens

itivi

ty







Recognized PPIs

Protein A and B

Average AUC = 09678











1

08

06

04

02

00 02 04 06 08 1

1 ndash specificity



Sens

itivi

ty

Average AUC = 09750


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09690



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09055


100

90

80

70

60

50

40

30

20

10

0

Accu

racy

()


RFSVMKNN

(a)

100

90

80

70

60

50

40

30

20

10

0

Prec

ision

()


RFSVMKNN

(b)

Figure 5 Continued


100

90

80

70

60

50

40

30

20

10

0

MCC

()


RFSVMKNN

(c)

100

90

80

70

60

50

40

30

20

10

0

AUC

()


RFSVMKNN

(d)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09314

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08348

(b)



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09238

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07905

(b)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08783

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07345

(b)





Data Availability




Acknowledgments


References




















































L



⋮ ⋮ ⋮ ⋮


⋮ ⋮ ⋮ ⋮


⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

(1)





ψ(p k) 1a

1113944

aminus1

i01113944

bminus1


(2)


H1 1 1

1 minus11113890 1113891 (3)




⎡⎣ ⎤⎦ (4)









prime is






φi

a(1)i1 a


0 a(1)i1 a


⋮ ⋮ ⋱ ⋮


SM( )i1

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

(5)



Vj(g) 1S

1113944

S

i1Rij Zφa

i( 1113857 (6)




Acc TN + TP

TN + FP + FN + TP

Sen TP

FN + TP

Prec TP

TP + FP



1113968

(7)











1

2

11

12

13

n

M

F

D

V

E

L

L =

φ11 φ12φ22φ21


φp20φpjφp2φp1

φ1jφ2j

φ120

x0y0

y1

y2

y3

y4

y5

y6

y7

x4

x1

x5

x2

x6

x3

x7

φ220

A R Y V

i-3

i+3 065 014 098 054

072 013 014 032

089064098016


FWHT

Featuresextraction

2

1

0

-1

-2

-3 -2 0 1-1 2 3

PCA

true

true

true

true

true

true

false

false

false

false

false

false


Rotation forest

1

08

06

04

02

0

Sens

itivi

ty







Recognized PPIs

Protein A and B

Average AUC = 09678











1

08

06

04

02

00 02 04 06 08 1

1 ndash specificity



Sens

itivi

ty

Average AUC = 09750


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09690



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09055


100

90

80

70

60

50

40

30

20

10

0

Accu

racy

()


RFSVMKNN

(a)

100

90

80

70

60

50

40

30

20

10

0

Prec

ision

()


RFSVMKNN

(b)

Figure 5 Continued


100

90

80

70

60

50

40

30

20

10

0

MCC

()


RFSVMKNN

(c)

100

90

80

70

60

50

40

30

20

10

0

AUC

()


RFSVMKNN

(d)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09314

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08348

(b)



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09238

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07905

(b)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08783

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07345

(b)





Data Availability




Acknowledgments


References























































φi

a(1)i1 a


0 a(1)i1 a


⋮ ⋮ ⋱ ⋮


SM( )i1

⎡⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎣

⎤⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎦

(5)



Vj(g) 1S

1113944

S

i1Rij Zφa

i( 1113857 (6)




Acc TN + TP

TN + FP + FN + TP

Sen TP

FN + TP

Prec TP

TP + FP



1113968

(7)











1

2

11

12

13

n

M

F

D

V

E

L

L =

φ11 φ12φ22φ21


φp20φpjφp2φp1

φ1jφ2j

φ120

x0y0

y1

y2

y3

y4

y5

y6

y7

x4

x1

x5

x2

x6

x3

x7

φ220

A R Y V

i-3

i+3 065 014 098 054

072 013 014 032

089064098016


FWHT

Featuresextraction

2

1

0

-1

-2

-3 -2 0 1-1 2 3

PCA

true

true

true

true

true

true

false

false

false

false

false

false


Rotation forest

1

08

06

04

02

0

Sens

itivi

ty







Recognized PPIs

Protein A and B

Average AUC = 09678











1

08

06

04

02

00 02 04 06 08 1

1 ndash specificity



Sens

itivi

ty

Average AUC = 09750


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09690



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09055


100

90

80

70

60

50

40

30

20

10

0

Accu

racy

()


RFSVMKNN

(a)

100

90

80

70

60

50

40

30

20

10

0

Prec

ision

()


RFSVMKNN

(b)

Figure 5 Continued


100

90

80

70

60

50

40

30

20

10

0

MCC

()


RFSVMKNN

(c)

100

90

80

70

60

50

40

30

20

10

0

AUC

()


RFSVMKNN

(d)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09314

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08348

(b)



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09238

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07905

(b)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08783

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07345

(b)





Data Availability




Acknowledgments


References





















































1

2

11

12

13

n

M

F

D

V

E

L

L =

φ11 φ12φ22φ21


φp20φpjφp2φp1

φ1jφ2j

φ120

x0y0

y1

y2

y3

y4

y5

y6

y7

x4

x1

x5

x2

x6

x3

x7

φ220

A R Y V

i-3

i+3 065 014 098 054

072 013 014 032

089064098016


FWHT

Featuresextraction

2

1

0

-1

-2

-3 -2 0 1-1 2 3

PCA

true

true

true

true

true

true

false

false

false

false

false

false


Rotation forest

1

08

06

04

02

0

Sens

itivi

ty







Recognized PPIs

Protein A and B

Average AUC = 09678











1

08

06

04

02

00 02 04 06 08 1

1 ndash specificity



Sens

itivi

ty

Average AUC = 09750


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09690



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09055


100

90

80

70

60

50

40

30

20

10

0

Accu

racy

()


RFSVMKNN

(a)

100

90

80

70

60

50

40

30

20

10

0

Prec

ision

()


RFSVMKNN

(b)

Figure 5 Continued


100

90

80

70

60

50

40

30

20

10

0

MCC

()


RFSVMKNN

(c)

100

90

80

70

60

50

40

30

20

10

0

AUC

()


RFSVMKNN

(d)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09314

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08348

(b)



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09238

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07905

(b)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08783

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07345

(b)





Data Availability




Acknowledgments


References





















































1

08

06

04

02

00 02 04 06 08 1

1 ndash specificity



Sens

itivi

ty

Average AUC = 09750


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09690



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09055


100

90

80

70

60

50

40

30

20

10

0

Accu

racy

()


RFSVMKNN

(a)

100

90

80

70

60

50

40

30

20

10

0

Prec

ision

()


RFSVMKNN

(b)

Figure 5 Continued


100

90

80

70

60

50

40

30

20

10

0

MCC

()


RFSVMKNN

(c)

100

90

80

70

60

50

40

30

20

10

0

AUC

()


RFSVMKNN

(d)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09314

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08348

(b)



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09238

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07905

(b)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08783

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07345

(b)





Data Availability




Acknowledgments


References



















































1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09055


100

90

80

70

60

50

40

30

20

10

0

Accu

racy

()


RFSVMKNN

(a)

100

90

80

70

60

50

40

30

20

10

0

Prec

ision

()


RFSVMKNN

(b)

Figure 5 Continued


100

90

80

70

60

50

40

30

20

10

0

MCC

()


RFSVMKNN

(c)

100

90

80

70

60

50

40

30

20

10

0

AUC

()


RFSVMKNN

(d)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09314

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08348

(b)



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09238

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07905

(b)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08783

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07345

(b)





Data Availability




Acknowledgments


References



















































100

90

80

70

60

50

40

30

20

10

0

MCC

()


RFSVMKNN

(c)

100

90

80

70

60

50

40

30

20

10

0

AUC

()


RFSVMKNN

(d)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09314

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08348

(b)



1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09238

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07905

(b)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08783

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07345

(b)





Data Availability




Acknowledgments


References



















































1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 09238

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07905

(b)


1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 08783

(a)

1

08

06

04

02

0

Sens

itivi

ty




Average AUC = 07345

(b)





Data Availability




Acknowledgments


References





















































Data Availability




Acknowledgments


References



















































fwht-rf:anovelcomputationalapproachtopredictplant ...particularly expensive, time-consuming, and...

Documents