Research Article
A Robust Probability Classifier Based on the Modified $\chi^2$-Distance
Yongzhi Wang,^1 Yuli Zhang,^2 Jining Yi,^3 Honggang Qu,^{3,4} and Jinli Miu^{3,4}
^1 College of Instrumentation & Electrical Engineering, Jilin University, Changchun 130061, China
^2 Department of Automation, TNList, Tsinghua University, Beijing 100084, China
^3 Development and Research Center of China Geological Survey, Beijing 100037, China
^4 Key Laboratory of Geological Information Technology, Ministry of Land and Resources, Beijing 100037, China
Correspondence should be addressed to Yongzhi Wang; iamwangyongzhi@126.com
Received 9 January 2014; Revised 5 April 2014; Accepted 7 April 2014; Published 30 April 2014
Academic Editor: Hua-Peng Chen
Copyright © 2014 Yongzhi Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
We propose a robust probability classifier model to address classification problems with data uncertainty. A class-conditional probability distributional set is constructed based on the modified $\chi^2$-distance. Based on a "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. An optimal robust minimax classifier is defined as the one with the minimal worst-case absolute error loss function value over all possible distributions belonging to the constructed distributional set. Based on the conic duality theorem, we show that the resulting optimization problem can be reformulated as a second order cone programming problem, which can be efficiently solved by interior point algorithms. The robustness of the proposed model avoids the "overlearning" phenomenon on training sets and thus keeps a comparable accuracy on test sets. Numerical experiments validate the effectiveness of the proposed model and further show that it also provides promising results on multiple classification problems.
1 Introduction
Statistical classification has been extensively studied in the fields of machine learning and statistics. A typical classification problem is to design a linear or nonlinear classifier, based on a known training set, such that a new observation can be assigned to one of the known classes. Many classification models have been proposed, such as naive Bayes classifiers (NBC) [1, 2], artificial neural networks [3], and support vector machines (SVM) [4].

In real-world classification problems, the data of the training set are often imprecise due to unavoidable observational noise in the process of data collection, or due to data approximation from incomplete samples. One way to handle this data uncertainty is to design a robust classifier, in the sense that it has the minimal worst-case misclassification probability on the training set. The idea of robustness has been widely applied in many traditional machine learning and statistics techniques, such as robust Bayes classifiers [5], robust support vector machines [6], and robust quadratic regressions [7]. Robust classifiers are closely related to the recently flourishing research on robust optimization; for recent developments in robust optimization, we refer the reader to the excellent book [8] and the reviews [9, 10].
Recently, [11, 12] have proposed a robust minimax approach, called the minimax probability machine, for designing a binary classifier. Unlike traditional methods, they make no assumption about the class-conditional distributions; only the mean and covariance matrix of each class are assumed to be known. Under this assumption, the classifier is determined by minimizing the worst-case probability of misclassification over all possible class-conditional distributions with the given mean and covariance matrix. By reformulating the classifier design problem as a second order cone program, they show that the computational complexity of the proposed approach is similar to that of SVM. Because of its computational advantage and competitive performance with other current methods, this
approach has been further extended to incorporate other features. El Ghaoui et al. [13] propose a robust classification model by minimizing the worst-case value of a given loss function over all possible choices of the data in bounded hyperrectangles. Three loss functions, from SVM, logistic regression, and minimax probability machines, are studied in [13]. Based on the same assumption of known mean and covariance matrix, [14, 15] propose the biased minimax probability machines to address the biased classification problem and further generalize them to obtain the minimum error minimax probability machines. Hoi and Lyu [16] study a quadratic classifier with positive definite covariance matrices and further consider the problem of finding a convex set that covers the known sampled data in one class while minimizing the worst-case misclassification probability. The minimax probability machines have also been extended to solve multiple classification problems; see [17, 18].

(Hindawi Publishing Corporation, Mathematical Problems in Engineering, Volume 2014, Article ID 621314, 11 pages, http://dx.doi.org/10.1155/2014/621314)
In this paper, we propose a robust probability classifier (RPC) based on the modified $\chi^2$-distance. Specifically, for a given training set, we first estimate the probability of each sample belonging to each class based on a single feature; this is called a nominal class-conditional distribution. Then an $\epsilon$-confidence probability distributional set $P_\epsilon$ is constructed based on the nominal class-conditional distributions and the modified $\chi^2$-distance, where the parameter $\epsilon$ controls the size of the constructed set. Unlike the "conditional independence assumption" in NBC, we introduce a "linear combination assumption" for the posterior class-conditional probabilities: the proposed classifier takes a linear combination of these probabilities over the different features and assigns a sample to the class with the maximal posterior probability. To obtain a robust classifier, we minimize the worst-case loss function value over all possible choices of class-conditional distributions in the distributional set $P_\epsilon$. The underlying assumption is that, due to observational noises, we cannot obtain the true probability distribution of each class, but it can be estimated well enough by the nominal distribution that it belongs to the distributional set $P_\epsilon$.
Our two major contributions are as follows. First, in our model, the proposed distributional set $P_\epsilon$ is based on the nominal distribution and the modified $\chi^2$-distance. As pointed out in [19], such a distributional set can make use of more of the information conveyed in the training set, compared with traditional robust approaches that only use the information of mean and covariance matrix. To the best of our knowledge, this is among the first studies of classification models considering such complex distribution information. Although [20] considers an $\epsilon$-contaminated robust support vector machine model, its distributional set is defined by easily handled linear constraints, and its analysis depends heavily on a characterization of the extreme points of that set. Here our proposed distributional set is defined by a nonlinear quadratic function and is analyzed via the conic duality theorem. Second, by taking the absolute error function as the loss function, we show how to transform our robust minimax optimization problem into a computable second order cone program. The absolute error function in the objective also distinguishes our model from other existing models, such as the soft-margin support vector machine, which uses the hinge loss function [21, 22], and logistic regression, which uses the negative log likelihood function [23]. Note that the absolute error function is essential in our model to obtain a tractable optimization problem. Numerical experiments on a real-world application validate the effectiveness of the proposed classifier and further show that it also performs well on multiple classification problems.
The paper proceeds as follows. Section 2 introduces the proposed robust minimax probability classifier based on the modified $\chi^2$-distance and discusses how to construct the desired distributional set $P_\epsilon$. Section 3 provides an equivalent reformulation by handling the robust constraints and the robust objective separately. Numerical experiments on a real-world data set are carried out in Section 4 to validate the effectiveness of the proposed classifier. Section 5 concludes the paper and gives future research directions.
2 Classifier Models
In this section, a simple probability classifier is first presented; it is then extended to handle data uncertainty by introducing a distributional set $P_\epsilon$. We also discuss how to construct this distributional set from the training data.

Consider a multiclass, multifeature classification problem in which each sample has $|L|$ features and there are $|J|$ classes and $|I|$ samples. Specifically, we are given a training set $(X, Y) \in \mathbb{R}^{|I| \times |L|} \times \{0, 1\}^{|I| \times |J|}$, where $x_{il}$ denotes the $l$th feature of the $i$th sample, and $y_{ij} = 1$ if the $i$th sample belongs to the $j$th class and $y_{ij} = 0$ otherwise. In the following we also use $x_i$ to denote the $i$th sample, that is, $x_i = (x_{i1}, \ldots, x_{i|L|})$.
2.1. Probability Classifier. Bayes classifiers assign an observation $x$ to the $j^*(x)$th class, the one with the maximal posterior probability; that is,

\[
j^*(x) = \arg\max_{j \in J}\ P(j \mid x), \tag{1}
\]

where $P(j \mid x)$ is the posterior probability function, that is, the conditional probability that the sample belongs to the $j$th class given that it has feature vector $x$.

Using Bayes' theorem, we have

\[
P(j \mid x) = \frac{P(j)\, P(x \mid j)}{P(x)} \propto P(j)\, P(x \mid j), \tag{2}
\]

where $P(j)$ is the prior probability of the $j$th class, $P(x \mid j)$ is the class-conditional probability for the $j$th class, and $P(x)$ is the probability that a sample has feature vector $x$. Note that $P(x)$ is a constant once the values of the feature variables are known and can thus be omitted. To design an effective Bayes classifier, the key issue is estimating the class-conditional probability $P(x \mid j)$ or the joint probability $P(x, j)$. Theoretically, using the chain rule, we have

\[
P(x, j) = P(j)\, P(x_1 \mid j)\, P(x_2 \mid j, x_1) \cdots P\big(x_{|L|} \mid j, x_1, \ldots, x_{|L|-1}\big). \tag{3}
\]
However, such an estimation method leads to the problem of "dimension disaster" (i.e., the curse of dimensionality).

To address this issue, the naive Bayes classifier makes the following "conditional independence assumption":
\[
P(x \mid j) = \prod_{l=1}^{|L|} p^l_j(x), \tag{4}
\]

where $p^l_j(x) = P(x_l \mid j)$ is the class-conditional probability that the observation $x$ belongs to the $j$th class based on the $l$th feature. Here we instead introduce a "linear combination assumption" for the class-conditional probability:

\[
P(x \mid j) = \sum_{l=1}^{|L|} \beta^l_j\, p^l_j(x), \tag{5}
\]

where $\beta^l_j$ is a coefficient. Compared with the "conditional independence assumption," which combines the probabilistic information multiplicatively, the proposed "linear combination assumption" combines it as a weighted sum. We further discuss the rationality of this assumption at the end of this subsection.
Under this assumption, we have

\[
P(j \mid x) \propto P(j)\, P(x \mid j) = P(j) \sum_{l=1}^{|L|} \beta^l_j\, p^l_j(x) = \sum_{l=1}^{|L|} \alpha^l_j\, p^l_j(x), \tag{6}
\]

where $\alpha^l_j = P(j)\, \beta^l_j$ denotes the probability weight of the $l$th feature for the $j$th class.

To obtain the optimal probability classifier based on the "linear combination assumption," it is natural to consider the following optimization problem:

\[
\min_{\alpha \in \Theta}\ \sum_{j \in J} \sum_{i \in I} L\big( P(j \mid x_i),\ y_{ij} \big), \tag{7}
\]

where $L(\cdot, \cdot): \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ is a prespecified loss function. In what follows we take the absolute error function as our loss function, that is, $L(x, y) = |x - y|$. Writing $f(j \mid x_i) = \sum_{l \in L} \alpha^l_j\, p^l_{ij}$ for the estimated posterior, its probabilistic interpretation makes it natural to impose the following constraints:

\[
0 \le f(j \mid x_i) \le 1, \quad \forall i \in I,\ j \in J. \tag{8}
\]
Under these constraints, we have

\[
\sum_{j \in J} \sum_{i \in I} L\big( f(j \mid x_i),\ y_{ij} \big)
= \sum_{j \in J} \sum_{i \in I} \big| f(j \mid x_i) - y_{ij} \big|
\]
\[
= \sum_{j \in J} \sum_{i \in I} \Big[ y_{ij} \big( 1 - f(j \mid x_i) \big) + \big( 1 - y_{ij} \big)\, f(j \mid x_i) \Big]
= \sum_{j \in J} \sum_{i \in I} \big( 1 - 2 y_{ij} \big)\, f(j \mid x_i) + |I|, \tag{9}
\]

where $|I| = \sum_{j \in J} \sum_{i \in I} y_{ij}$.
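The reduction in (9) hinges on the fact that, for $y_{ij} \in \{0,1\}$ and $f(j \mid x_i) \in [0,1]$, the absolute error loss is linear in $f$. This identity is easy to check numerically (a quick grid check, not part of the original derivation):

```python
# For y in {0,1} and f in [0,1]:  |f - y| = y*(1 - f) + (1 - y)*f,
# which is the step that turns the absolute error loss in (9) into a
# linear function of the posterior estimate f.
for y in (0, 1):
    for k in range(101):
        f = k / 100.0
        lhs = abs(f - y)
        rhs = y * (1 - f) + (1 - y) * f
        assert abs(lhs - rhs) < 1e-12
print("identity used in (9) holds on the grid")
```

This linearity is exactly what keeps the nominal problem (and later its robust counterpart) conic rather than generically nonconvex.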
Thus the optimal probability classifier (PC) problem can be formulated as follows:

\[
\text{(PC)} \quad \min_{\alpha}\ \sum_{j \in J} \sum_{i \in I} \big( 1 - 2 y_{ij} \big) \sum_{l \in L} \alpha^l_j\, p^l_{ij} + |I|
\]
\[
\text{s.t.} \quad 0 \le \sum_{l \in L} \alpha^l_j\, p^l_{ij} \le 1, \quad \forall i \in I,\ j \in J. \tag{10}
\]
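Since both the objective and the constraints of (PC) are linear in the weights $\alpha^l_j$, the nominal problem is a plain linear program. A minimal sketch with scipy.optimize.linprog on synthetic data (the toy instance, the flattened variable layout, and the nonnegativity bound on $\alpha$ are our own assumptions; the paper leaves the feasible set $\Theta$ unspecified):

```python
import numpy as np
from scipy.optimize import linprog

nI, nJ, nL = 6, 2, 3  # |I| samples, |J| classes, |L| features (toy sizes)
rng = np.random.default_rng(0)

# p[i, j, l]: nominal probability that sample i is in class j based on feature l;
# for each (i, l), the entries over j sum to one.
p = rng.dirichlet(np.ones(nJ), size=(nI, nL)).transpose(0, 2, 1)
y = np.zeros((nI, nJ))
y[np.arange(nI), rng.integers(0, nJ, nI)] = 1  # one-hot labels y_ij

# Objective of (10): min sum_{i,j} (1 - 2 y_ij) sum_l alpha_jl p_ijl  (+|I|, a constant).
c = np.einsum("ij,ijl->jl", 1 - 2 * y, p).ravel()

# Constraints of (10): 0 <= sum_l alpha_jl p_ijl <= 1 for every (i, j).
rows = []
for i in range(nI):
    for j in range(nJ):
        row = np.zeros(nJ * nL)
        row[j * nL:(j + 1) * nL] = p[i, j, :]
        rows.append(row)
A = np.array(rows)
res = linprog(c, A_ub=np.vstack([A, -A]),
              b_ub=np.concatenate([np.ones(nI * nJ), np.zeros(nI * nJ)]),
              bounds=[(0, None)] * (nJ * nL))
alpha = res.x.reshape(nJ, nL)  # learned per-class feature weights
```

With $\alpha$ in hand, a new sample is assigned to the class maximizing $\sum_l \alpha^l_j\, p^l_j(x)$, exactly as in (1) and (6).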
There is no doubt that the "linear combination assumption" may not always hold. However, we justify the proposed classifier by the following facts.

(1) As an intuitive interpretation, note that $p^l_j(x)$ estimates the probability of the observation $x$ belonging to the $j$th class based only on the $l$th feature; thus it provides partial probabilistic information about the sample. Hence the weight $\alpha^l_j$ can be interpreted as a degree of trust in this information, and in this sense the "linear combination assumption" is a way of combining evidence from different sources. Similar ideas can be found in the theory of evidence; see the Dempster-Shafer theory [24, 25].

(2) In terms of classification performance, in the worst case the proposed classifier may put all weight on one feature; in that case it is equivalent to a Bayes classifier based on a well-selected feature. If each class has a "typical" feature that distinguishes it from the other classes, the proposed classifier can learn this property by putting different weights on different features for different classes, and thus provide better classification performance. A real-life application to lithology classification problems also validates its classification performance in comparison with support vector machines and the naive Bayes classifier.

(3) Another advantage of the proposed classifier is its computational tractability. As we show in Section 3, the proposed classifier and its robust counterpart can be reformulated as second order cone programming problems and thus solved by interior point algorithms in polynomial time.
2.2. Robust Probability Classifier. Due to observational noises, the true class-conditional probability distribution is often difficult to obtain. Instead, we can construct a confidence distributional set that contains the true distribution. Unlike the traditional distributional sets in minimax probability machines, which only utilize the mean and covariance matrix, we construct our class-conditional probability distributional set based on the modified $\chi^2$-distance, which uses more information from the samples.
The modified $\chi^2$-distance $d(\cdot, \cdot): \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ is used in statistics to measure the distance between two discrete probability distribution vectors. For given $p = (p_1, \ldots, p_m)^T$ and $q = (q_1, \ldots, q_m)^T$, it is defined as

\[
d(q, p) = \sum_{j=1}^{m} \frac{(q_j - p_j)^2}{p_j}. \tag{11}
\]
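For concreteness, (11) translates directly into code; note that $d$ is not symmetric in its arguments, since the reference distribution $p$ appears in the denominator (a minimal illustration of ours; the paper uses the distance only analytically):

```python
def chi2_mod(q, p):
    # Modified chi^2-distance of (11): d(q, p) = sum_j (q_j - p_j)^2 / p_j.
    # Requires p_j > 0 for every component of the reference distribution p.
    return sum((qj - pj) ** 2 / pj for qj, pj in zip(q, p))

assert chi2_mod([0.5, 0.5], [0.5, 0.5]) == 0.0             # d(p, p) = 0
assert abs(chi2_mod([0.6, 0.4], [0.5, 0.5]) - 0.04) < 1e-12
assert chi2_mod([0.6, 0.4], [0.9, 0.1]) != chi2_mod([0.9, 0.1], [0.6, 0.4])  # asymmetric
```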
Based on the modified $\chi^2$-distance, we present the following class-conditional probability distributional set:

\[
P_\epsilon = \Big\{ \{q^l_{ij}\} : \sum_{j \in J} q^l_{ij} = 1,\ \ q^l_{ij} \ge 0,\ \ \sum_{j \in J} \frac{(q^l_{ij} - p^l_{ij})^2}{p^l_{ij}} \le \epsilon,\ \ \forall i \in I,\ l \in L,\ j \in J \Big\}, \tag{12}
\]

where $p^l_{ij}$ is the nominal class-conditional probability that the $i$th sample belongs to the $j$th class based on the $l$th feature, and the prespecified parameter $\epsilon$ controls the size of the set.
To design a robust classifier, we must account for the effect of data uncertainty on both the objective function and the constraints. The robust objective minimizes the worst-case loss function value over all possible distributions in the distributional set $P_\epsilon$; the robust constraints require that all the original constraints be satisfied for every distribution in $P_\epsilon$. Thus the robust probability classifier problem takes the following form:

\[
\text{(RPC)} \quad \min_{\alpha}\ \max_{\{q^l_{ij}\} \in P_\epsilon}\ \sum_{j \in J} \sum_{i \in I} \big( 1 - 2 y_{ij} \big) \sum_{l \in L} \alpha^l_j\, q^l_{ij} + |I|
\]
\[
\text{s.t.} \quad 0 \le \sum_{l \in L} \alpha^l_j\, q^l_{ij} \le 1, \quad \forall\, \{q^l_{ij}\} \in P_\epsilon,\ \forall i \in I,\ j \in J. \tag{13}
\]

Note that the above optimization problem has an infinite number of robust constraints and that its objective function contains an embedded maximization subproblem. We show how to solve this minimax optimization problem in Section 3.
2.3. Constructing the Distributional Set. To obtain the distributional set $P_\epsilon$, we need to specify the parameter $\epsilon$ and the nominal probabilities $p^l_{ij}$. The selection of $\epsilon$ is application dependent, and we discuss this issue in the numerical experiment section; here we provide a procedure to calculate $p^l_{ij}$.

For the $l$th feature, the following procedure takes as input an integer $K_l$ indicating the number of data intervals and outputs the estimated probability $p^l_{ij}$ of the $i$th sample belonging to the $j$th class.

(1) Sort the samples in increasing order of the $l$th feature and divide them into $K_l$ intervals such that each interval contains at least $\lfloor |I| / K_l \rfloor$ samples. Denote the $k$th interval by $\Delta_{lk}$.

(2) Calculate the total number of samples in the $j$th class, $N_j$; the total number of samples in the $k$th interval, $N_{lk}$; and the number of samples of the $j$th class in the $k$th interval, $N_{lkj}$.

(3) For the $i$th sample, if it falls into the $k$th interval, the class-conditional probability $p^l_{ij}$ is calculated by

\[
p^l_{ij} = \mathrm{Prob}\big( i \in j \mid x_{il} \in \Delta_{lk} \big)
= \frac{\mathrm{Prob}\big( i \in j,\ x_{il} \in \Delta_{lk} \big)}{\mathrm{Prob}\big( x_{il} \in \Delta_{lk} \big)}
= \frac{\mathrm{Prob}( i \in j )\, \mathrm{Prob}\big( x_{il} \in \Delta_{lk} \mid i \in j \big)}{\sum_{j' \in J} \mathrm{Prob}( i \in j' )\, \mathrm{Prob}\big( x_{il} \in \Delta_{lk} \mid i \in j' \big)}
= \frac{\big( N_j / |I| \big) \cdot \big( N_{lkj} / N_j \big)}{\sum_{j' \in J} \big( N_{j'} / |I| \big) \cdot \big( N_{lkj'} / N_{j'} \big)}
= \frac{N_{lkj}}{N_{lk}}. \tag{14}
\]
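Steps (1)-(3) amount to a per-feature histogram estimate. A sketch for a single feature (pure Python; folding leftover samples into the last interval and ignoring ties are our own choices, which the paper does not specify):

```python
from collections import Counter

def nominal_probs(feature, labels, K, classes):
    """Estimate p^l_ij = N_lkj / N_lk for one feature l, as in (14)."""
    n = len(feature)
    order = sorted(range(n), key=lambda i: feature[i])  # step (1): sort samples
    size = n // K                                       # at least floor(|I|/K) per interval
    interval = [0] * n
    for rank, i in enumerate(order):
        interval[i] = min(rank // size, K - 1)          # leftovers join the last interval
    N_lk = Counter(interval)                            # step (2): interval counts N_lk
    N_lkj = Counter(zip(interval, labels))              # ... and per-class counts N_lkj
    # step (3): p^l_ij = N_lkj / N_lk for the interval k containing sample i
    return [{j: N_lkj[(interval[i], j)] / N_lk[interval[i]] for j in classes}
            for i in range(n)]

p = nominal_probs([3.0, 1.0, 2.0, 4.0], ["b", "a", "a", "b"], K=2, classes=["a", "b"])
# the two smallest feature values (both labeled "a") share interval 0
```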
Note that, from the definition of $P_\epsilon$, we can easily compute an upper bound $\bar{q}^l_{ij}$ and a lower bound $\underline{q}^l_{ij}$ on the true class-conditional probability $q^l_{ij}$ as follows:

\[
\bar{q}^l_{ij} = \max \Big\{ q^l_{ij} : \sum_{s \in J} q^l_{is} = 1,\ \ \sum_{s \in J} \frac{(q^l_{is} - p^l_{is})^2}{p^l_{is}} \le \epsilon,\ \ q^l_{is} \ge 0,\ \forall s \in J \Big\}, \tag{15}
\]

\[
\underline{q}^l_{ij} = \min \Big\{ q^l_{ij} : \sum_{s \in J} q^l_{is} = 1,\ \ \sum_{s \in J} \frac{(q^l_{is} - p^l_{is})^2}{p^l_{is}} \le \epsilon,\ \ q^l_{is} \ge 0,\ \forall s \in J \Big\}. \tag{16}
\]

These problems can be efficiently solved by a second order cone solver such as SeDuMi [26] or SDPT3 [27].
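The bound problems (15) and (16) are small convex programs over the simplex intersected with a $\chi^2$ ball; in place of SeDuMi or SDPT3, any off-the-shelf NLP solver will do. A sketch using scipy's SLSQP (the solver choice and tolerances are our own, not the paper's):

```python
import numpy as np
from scipy.optimize import minimize

def q_bounds(p, j, eps):
    """Lower/upper bounds (16)/(15) on q_j over the chi^2 ball of radius eps around p."""
    cons = [
        {"type": "eq",   "fun": lambda q: np.sum(q) - 1.0},                 # sum_s q_s = 1
        {"type": "ineq", "fun": lambda q: eps - np.sum((q - p) ** 2 / p)},  # d(q, p) <= eps
    ]
    bnds = [(0.0, 1.0)] * len(p)                                            # q_s >= 0
    hi = minimize(lambda q: -q[j], p, bounds=bnds, constraints=cons, method="SLSQP")
    lo = minimize(lambda q:  q[j], p, bounds=bnds, constraints=cons, method="SLSQP")
    return lo.fun, -hi.fun

# For p = (1/2, 1/2) and eps = 0.02, the ball gives |q_1 - 1/2| <= sqrt(0.005),
# so the bounds should be roughly 0.5 -/+ 0.0707.
lo, hi = q_bounds(np.array([0.5, 0.5]), 0, 0.02)
```

Starting the solver at the nominal distribution $p$ guarantees a feasible initial point, since $d(p, p) = 0 \le \epsilon$.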
3 Solution Methods for RPC
In this section, we first reduce the infinite number of robust constraints to a finite set of linear constraints, and then transform the inner robust objective into a minimization problem via the conic duality theorem. Finally, we obtain an equivalent, computable second order cone program for the RPC problem. The analysis is based on the strong duality result in [8].
Consider a conic program of the form

\[
\text{(CP)} \quad \min\ c^T x \quad \text{s.t.} \quad A_i x - b_i \in C_i,\ \forall i = 1, \ldots, m, \quad A x = b, \tag{17}
\]

and its dual problem

\[
\text{(DP)} \quad \max\ b^T z + \sum_{i=1}^{m} b_i^T y_i \quad \text{s.t.} \quad A^* z + \sum_{i=1}^{m} A_i^* y_i = c, \quad y_i \in C_i^*,\ \forall i = 1, \ldots, m, \tag{18}
\]

where $C_i$ is a cone in $\mathbb{R}^{n_i}$ and $C_i^*$ is its dual cone, defined by

\[
C_i^* = \big\{ y \in \mathbb{R}^{n_i} : y^T x \ge 0,\ \forall x \in C_i \big\}. \tag{19}
\]

A conic program is called strictly feasible if it admits a feasible solution $x$ such that $A_i x - b_i \in \operatorname{int} C_i$ for all $i = 1, \ldots, m$, where $\operatorname{int} C_i$ denotes the interior of $C_i$.

Lemma 1 (see [8]). If one of the problems (CP) and (DP) is strictly feasible and bounded, then the other problem is solvable and (CP) = (DP), in the sense that both have the same optimal objective value.
3.1. Robust Constraints. The following lemma characterizes the infinite family of robust constraints in terms of a finite set of linear constraints, which can be handled efficiently.

Lemma 2. For given $i, j$, the robust constraint

\[
0 \le \sum_{l \in L} \alpha^l_j\, q^l_{ij} \le 1, \quad \forall\, \{q^l_{ij}\} \in P_\epsilon, \tag{20}
\]

is equivalent to the following constraints:

\[
\sum_{l \in L} \big( \underline{q}^l_{ij}\, u^{l0}_{ij} - \bar{q}^l_{ij}\, v^{l0}_{ij} \big) \ge 0, \quad
\alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0, \quad u^{l0}_{ij},\, v^{l0}_{ij} \ge 0,\ \forall l \in L,
\]
\[
1 + \sum_{l \in L} \big( \underline{q}^l_{ij}\, u^{l1}_{ij} - \bar{q}^l_{ij}\, v^{l1}_{ij} \big) \ge 0, \quad
v^{l1}_{ij} - \alpha^l_j - u^{l1}_{ij} \ge 0, \quad u^{l1}_{ij},\, v^{l1}_{ij} \ge 0,\ \forall l \in L. \tag{21}
\]
Proof. First note that the distributional set $P_\epsilon$ can be represented as the Cartesian product of a series of projected subsets,

\[
P_\epsilon = \prod_{i \in I} P_{\epsilon i}, \tag{22}
\]

where the projected subset for index $i$ is defined by

\[
P_{\epsilon i} = \Big\{ \{q^l_{ij}\} : \sum_{j \in J} q^l_{ij} = 1,\ \ q^l_{ij} \ge 0,\ \ \sum_{j \in J} \frac{(q^l_{ij} - p^l_{ij})^2}{p^l_{ij}} \le \epsilon,\ \ \forall l \in L,\ j \in J \Big\}. \tag{23}
\]

Then, for given $i, j$, since the robust constraint involves only the variables $q^l_{ij}$, $l \in L$, we can further split the projected subset $P_{\epsilon i}$ into $|J|$ subsets:

\[
P_{\epsilon i} = \prod_{j \in J} P_{\epsilon i j} = \prod_{j \in J} \big\{ \{q^l_{ij}\} : \underline{q}^l_{ij} \le q^l_{ij} \le \bar{q}^l_{ij},\ \forall l \in L \big\}, \tag{24}
\]

where $\bar{q}^l_{ij}$ and $\underline{q}^l_{ij}$ are computed by (15) and (16), respectively.

The constraint $\sum_{l \in L} \alpha^l_j\, q^l_{ij} \ge 0$, $\forall \{q^l_{ij}\} \in P_\epsilon$, is equivalent to the following chain of conditions:

\[
\sum_{l \in L} \alpha^l_j\, q^l_{ij} \ge 0, \quad \forall\, \{q^l_{ij}\} \in P_{\epsilon i}
\]
\[
\Longleftrightarrow\quad \sum_{l \in L} \alpha^l_j\, q^l_{ij} \ge 0, \quad \forall\, \{q^l_{ij}\} \in P_{\epsilon i j}
\]
\[
\Longleftrightarrow\quad \min \Big\{ \sum_{l \in L} \alpha^l_j\, q^l_{ij} : \underline{q}^l_{ij} \le q^l_{ij} \le \bar{q}^l_{ij},\ \forall l \in L \Big\} \ge 0
\]
\[
\Longleftrightarrow\quad \max \Big\{ \sum_{l \in L} \big( \underline{q}^l_{ij}\, u^{l0}_{ij} - \bar{q}^l_{ij}\, v^{l0}_{ij} \big) : \alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0,\ u^{l0}_{ij}, v^{l0}_{ij} \ge 0,\ \forall l \in L \Big\} \ge 0
\]
\[
\Longleftrightarrow\quad \sum_{l \in L} \big( \underline{q}^l_{ij}\, u^{l0}_{ij} - \bar{q}^l_{ij}\, v^{l0}_{ij} \big) \ge 0, \quad \alpha^l_j - u^{l0}_{ij} + v^{l0}_{ij} \ge 0, \quad u^{l0}_{ij}, v^{l0}_{ij} \ge 0,\ \forall l \in L, \tag{25}
\]

where the last equivalence follows from strong duality between the two linear programs.

For the constraint $\sum_{l \in L} \alpha^l_j\, q^l_{ij} \le 1$, $\forall \{q^l_{ij}\} \in P_\epsilon$, the same technique applies, which completes the proof.
3.2. Robust Objective Function. In the RPC problem, the robust objective is defined by an inner maximization problem. The following proposition shows that it can be transformed into a minimization problem over second order cones. To prove this result, we use the conjugate function $d^*$ of the modified $\chi^2$-distance, whose componentwise generator is $d(t) = (t - 1)^2$:

\[
d^*(s) = \sup_{t \ge 0}\ \{ s t - d(t) \} = \frac{[s + 2]_+^2}{4} - 1, \tag{26}
\]

where $[x]_+ = x$ if $x \ge 0$ and $[x]_+ = 0$ otherwise. For more details about conjugate functions, see [28].
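The closed form in (26) can be confirmed by brute-forcing the supremum over a fine grid of $t \ge 0$ (a numerical check of ours, not part of the paper):

```python
import numpy as np

def d_star(s):
    # Closed form (26): sup_{t >= 0} { s*t - (t - 1)^2 } = [s + 2]_+^2 / 4 - 1.
    return max(s + 2.0, 0.0) ** 2 / 4.0 - 1.0

t = np.linspace(0.0, 10.0, 200001)       # fine grid over t >= 0
for s in (-3.0, -1.0, 0.0, 0.7, 2.5):    # maximizer (s + 2)/2 stays inside the grid
    brute = np.max(s * t - (t - 1.0) ** 2)
    assert abs(brute - d_star(s)) < 1e-6
```

For $s \ge -2$ the unconstrained maximizer $t^* = (s+2)/2$ is feasible; for $s < -2$ the supremum is attained at $t = 0$, giving the constant value $-1$, which is exactly what the positive-part operator encodes.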
Proposition 3. The inner maximization problem

\[
\max\ \Big\{ \sum_{j \in J} \sum_{i \in I} \big( 1 - 2 y_{ij} \big) \sum_{l \in L} \alpha^l_j\, q^l_{ij} + |I| \ :\ \{q^l_{ij}\} \in P_\epsilon \Big\} \tag{27}
\]

is equivalent to the second order cone program

\[
\min\ \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda^l_i - \theta^l_i \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij}\, w^l_{ij} + |I|
\]
\[
\text{s.t.} \quad \big( w^l_{ij},\ z^l_{ij},\ 2 \lambda^l_i + w^l_{ij} \big) \in L^3, \quad \forall i \in I,\ j \in J,\ l \in L,
\]
\[
r^l_{ij} = \alpha^l_j \big( 1 - 2 y_{ij} \big) + \theta^l_i, \quad \forall i \in I,\ l \in L,\ j \in J,
\]
\[
z^l_{ij} \ge r^l_{ij} + 2 \lambda^l_i, \quad \lambda^l_i,\ z^l_{ij} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L, \tag{28}
\]

where the second order cone $L^{n+1}$ is defined as

\[
L^{n+1} = \Big\{ x \in \mathbb{R}^{n+1} : x_{n+1} \ge \sqrt{\textstyle\sum_{i=1}^{n} x_i^2} \Big\}. \tag{29}
\]
Proof. For given feasible $\alpha$ satisfying the robust constraints, it is straightforward to show that the inner maximization problem equals the following minimization problem (MP):

\[
\text{(MP)} \quad \min\ t \quad \text{s.t.} \quad t \ge \sum_{j \in J} \sum_{i \in I} \big( 1 - 2 y_{ij} \big) \sum_{l \in L} \alpha^l_j\, q^l_{ij} + |I|, \quad \forall\, \{q^l_{ij}\} \in P_\epsilon. \tag{30}
\]

The above constraint can be further reduced to the constraint

\[
\max\ \Big\{ \sum_{j \in J} \sum_{i \in I} \big( 1 - 2 y_{ij} \big) \sum_{l \in L} \alpha^l_j\, q^l_{ij} \ :\ \{q^l_{ij}\} \in P_\epsilon \Big\} + |I| - t \le 0. \tag{31}
\]
By assigning Lagrange multipliers $\theta^l_i \in \mathbb{R}$ and $\lambda^l_i \in \mathbb{R}_+$ to the equality and $\chi^2$-ball constraints of the maximization problem on the left-hand side, we obtain the Lagrange function

\[
L(q, \theta, \lambda) = \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda^l_i - \theta^l_i \big)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} \Big( r^l_{ij}\, q^l_{ij} - \lambda^l_i\, \frac{(q^l_{ij} - p^l_{ij})^2}{p^l_{ij}} \Big) + |I| - t, \tag{32}
\]

where $r^l_{ij} = \alpha^l_j (1 - 2 y_{ij}) + \theta^l_i$. Its dual function is given by

\[
D(\theta, \lambda) = \max_{q \ge 0}\ L(q, \theta, \lambda)
\]
\[
= \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda^l_i - \theta^l_i \big)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} \max_{q^l_{ij} \ge 0} \Big( r^l_{ij}\, q^l_{ij} - \lambda^l_i\, p^l_{ij} \Big( \frac{q^l_{ij} - p^l_{ij}}{p^l_{ij}} \Big)^2 \Big) + |I| - t
\]
\[
= \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda^l_i - \theta^l_i \big)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij}\, \max_{\tau \ge 0} \big( r^l_{ij}\, \tau - \lambda^l_i\, (\tau - 1)^2 \big) + |I| - t
\]
\[
= \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda^l_i - \theta^l_i \big)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij}\, \lambda^l_i\, \max_{\tau \ge 0} \Big( \frac{r^l_{ij}}{\lambda^l_i}\, \tau - (\tau - 1)^2 \Big) + |I| - t
\]
\[
= \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda^l_i - \theta^l_i \big)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij}\, \lambda^l_i\, d^* \Big( \frac{r^l_{ij}}{\lambda^l_i} \Big) + |I| - t, \tag{33}
\]

where the substitution $q^l_{ij} = p^l_{ij}\, \tau$ is used in the third line (we write $\tau$ for the inner maximization variable to avoid a clash with the epigraph variable $t$).
Note that for any feasible $\alpha$ the primal maximization problem (31) is bounded and has a strictly feasible solution $\{p^l_{ij}\}$; thus there is no duality gap between (31) and the following dual problem:

\[
\min\ \big\{ D(\theta, \lambda) \ :\ \theta^l_i \in \mathbb{R},\ \lambda^l_i \in \mathbb{R}_+,\ \forall i \in I,\ l \in L \big\}
\]
\[
\Longleftrightarrow\quad
\min\ \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda^l_i - \theta^l_i \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p^l_{ij}\, w^l_{ij} + |I| - t
\]
\[
\text{s.t.} \quad w^l_{ij} \ge \lambda^l_i\, d^* \Big( \frac{r^l_{ij}}{\lambda^l_i} \Big), \quad \forall i \in I,\ l \in L,\ j \in J, \qquad
\theta^l_i \in \mathbb{R},\ \lambda^l_i \in \mathbb{R}_+, \quad \forall i \in I,\ l \in L. \tag{34}
\]
Next, we show that the constraint involving the conjugate function can be represented by second-order cone constraints:
$$
\begin{aligned}
\lambda_i^l\,d^{*}\!\Bigl(\frac{r_{ij}^l}{\lambda_i^l}\Bigr)\le w_{ij}^l
\;&\Longleftrightarrow\;
\lambda_i^l\Bigl(-1+\frac{1}{4}\Bigl[\frac{r_{ij}^l}{\lambda_i^l}+2\Bigr]_{+}^{2}\Bigr)\le w_{ij}^l\\
&\Longleftrightarrow\;
4\lambda_i^l\bigl(\lambda_i^l+w_{ij}^l\bigr)\ge\bigl[r_{ij}^l+2\lambda_i^l\bigr]_{+}^{2}\\
&\Longleftrightarrow\;
4\lambda_i^l\bigl(\lambda_i^l+w_{ij}^l\bigr)\ge\bigl(z_{ij}^l\bigr)^{2},\quad z_{ij}^l\ge 0,\;\;z_{ij}^l\ge r_{ij}^l+2\lambda_i^l
\end{aligned}
$$
Mathematical Problems in Engineering 7
$$
\Longleftrightarrow\;
\bigl(w_{ij}^l,\;z_{ij}^l,\;2\lambda_i^l+w_{ij}^l\bigr)\in L^{3},\quad z_{ij}^l\ge 0,\;\;z_{ij}^l\ge r_{ij}^l+2\lambda_i^l.
\tag{35}
$$
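The final step in (35) is the standard second-order cone representation of the rotated quadratic inequality. As a sanity check (our own illustration, not from the paper), the following Python snippet verifies numerically that, for λ ≥ 0, membership of (w, z, 2λ + w) in L³ = {(x₁, x₂, x₃) : x₃ ≥ √(x₁² + x₂²)} coincides with 4λ(λ + w) ≥ z² together with the sign condition 2λ + w ≥ 0:

```python
import random

# Check the cone identity behind (35): expanding (2*lam + w)**2 - w**2 gives
# 4*lam*(lam + w), so cone membership is the inequality 4*lam*(lam + w) >= z**2
# plus the sign condition 2*lam + w >= 0.
def in_L3(x1, x2, x3):
    return x3 >= 0 and x3 * x3 >= x1 * x1 + x2 * x2

random.seed(0)
for _ in range(100000):
    lam = random.uniform(0.0, 5.0)
    w = random.uniform(-2.0, 5.0)
    z = random.uniform(0.0, 5.0)
    member = in_L3(w, z, 2 * lam + w)
    ineq = (4 * lam * (lam + w) >= z * z) and (2 * lam + w >= 0)
    assert member == ineq, (lam, w, z)
print("cone equivalence verified on 100000 random samples")
```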
By reinjecting the above constraints into (MP), the robust objective function is equivalent to the following problem:
$$
\begin{aligned}
\min\;\;&t\\
\text{s.t.}\;\;&\sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|\le t,\\
&\bigl(w_{ij}^l,\;z_{ij}^l,\;2\lambda_i^l+w_{ij}^l\bigr)\in L^{3},\quad\forall i\in I,\;j\in J,\;l\in L,\\
&z_{ij}^l\ge r_{ij}^l+2\lambda_i^l,\quad z_{ij}^l\ge 0,\;\lambda_i^l\ge 0,\quad\forall i\in I,\;j\in J,\;l\in L,\\
&r_{ij}^l=\alpha_j^l\bigl(1-2y_{ij}\bigr)+\theta_i^l,\quad\forall i\in I,\;l\in L,\;j\in J.
\end{aligned}
\tag{36}
$$
By eliminating the variable t, we complete the proof.

Based on Lemma 2 and Proposition 3, we obtain our main result.
Proposition 4. The RPC problem can be solved as the following second-order cone program:
$$
\begin{aligned}
\min\;\;&\sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|\\
\text{s.t.}\;\;&\bigl(w_{ij}^l,\;z_{ij}^l,\;2\lambda_i^l+w_{ij}^l\bigr)\in L^{3},\quad\forall i\in I,\;j\in J,\;l\in L,\\
&r_{ij}^l=\alpha_j^l\bigl(1-2y_{ij}\bigr)+\theta_i^l,\quad\forall i\in I,\;j\in J,\;l\in L,\\
&z_{ij}^l\ge r_{ij}^l+2\lambda_i^l,\quad\forall i\in I,\;j\in J,\;l\in L,\\
&\sum_{l\in L}\bigl(q_{ij}^l u_{ij}^{l0}-q_{ij}^l v_{ij}^{l0}\bigr)\ge 0,\quad\forall i\in I,\;j\in J,\\
&1+\sum_{l\in L}\bigl(q_{ij}^l u_{ij}^{l1}-q_{ij}^l v_{ij}^{l1}\bigr)\ge 0,\quad\forall i\in I,\;j\in J,\\
&\alpha_j^l-u_{ij}^{l0}+v_{ij}^{l0}\ge 0,\quad\forall i\in I,\;j\in J,\;l\in L,\\
&v_{ij}^{l1}-\alpha_j^l-u_{ij}^{l1}\ge 0,\quad\forall i\in I,\;j\in J,\;l\in L,\\
&\lambda_i^l,\;z_{ij}^l,\;u_{ij}^{l1},\;v_{ij}^{l1},\;u_{ij}^{l0},\;v_{ij}^{l0}\ge 0,\quad\forall i\in I,\;j\in J,\;l\in L,\\
&r_{ij}^l,\;\theta_i^l,\;w_{ij}^l,\;\alpha_j^l\in\mathbb{R},\quad\forall i\in I,\;j\in J,\;l\in L.
\end{aligned}
\tag{37}
$$
4. Numerical Experiments on Real-World Applications
In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.
All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second-order cone programs in our proposed method and in the regularized SVM.
4.1. Data Sets. Lithology classification is one of the basic tasks in geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as gamma coefficients, acoustic wave, striation density, and fusibility.
Here, numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, there are 12 data sets used for binary classification problems and 8 data sets used for multiple classification problems. Each data set, based on a prespecified training rate γ ∈ [0, 1], is randomly partitioned into two subsets, a training set and a test set, such that the size of the training set accounts for a fraction γ of the total number of samples.
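The random partition by training rate can be sketched as follows (a minimal Python illustration; the function name and seed are ours, not the authors'):

```python
import random

# A minimal sketch (not the authors' code) of the random partition described
# above: a fraction gamma of the samples forms the training set, the rest
# forms the test set.
def split(samples, gamma, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(round(gamma * len(samples)))
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]

tr_set, te_set = split(list(range(100)), gamma=0.7)
print(len(tr_set), len(te_set))   # -> 70 30
```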
4.2. Experiment Design. The parameters in our model are chosen based on the size of the data set. The parameter ε depends on the number of classes and is defined as ε = δ²/|J|, where δ ∈ (0, 1). The choice of ε can be explained in this way: if there are |J| classes and the training data are uniformly distributed, then for each probability p_{ij}^l = 1/|J| its maximal variation range is between p_{ij}^l(1 − δ) and p_{ij}^l(1 + δ). The number of data intervals K_l is defined as K_l = |I|/(|J| × K), so that if the training data are uniformly distributed, then each data interval contains K samples of each class. In the following context, we set δ = 0.2 and K = 8.
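Under the definitions above (ε = δ²/|J| and K_l = |I|/(|J| × K), as reconstructed here from the garbled source), the parameter computation can be sketched for a hypothetical data set:

```python
# Parameter choices of Section 4.2 for a hypothetical data set; the formulas
# eps = delta**2 / n_classes and K_l = n_samples / (n_classes * K) follow the
# (reconstructed) definitions in the text.
def rpc_parameters(n_samples, n_classes, delta=0.2, K=8):
    epsilon = delta ** 2 / n_classes
    n_intervals = n_samples // (n_classes * K)
    return epsilon, n_intervals

eps, K_l = rpc_parameters(n_samples=640, n_classes=2)
print(round(eps, 6), K_l)   # -> 0.02 40
```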
We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (taking the j-th class as an example):
$$
\text{(RSVM)}\quad
\begin{aligned}
\min\;\;&\sum_{i\in I}\xi_{ij}+\lambda_j\bigl\|w_j\bigr\|\\
\text{s.t.}\;\;&\tilde y_{ij}\Bigl(\sum_{l\in L}w_j^l x_i^l+b_j\Bigr)\ge 1-\xi_{ij},\quad i\in I,\\
&\xi_{ij}\ge 0,\quad i\in I,
\end{aligned}
\tag{38}
$$
where $\tilde y_{ij} = 2y_{ij} - 1$ and $\lambda_j \ge 0$ is a regularization parameter. As pointed out in [8], $\lambda_j$ represents a trade-off between the number of training-set errors and the amount of robustness
Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set.

tr (%) | RSVM Train / Test (%) | NBC Train / Test (%) | RPC Train / Test (%)
50     | 90.7 / 88.2           | 63.9 / 66.2          | 88.4 / 90.5*
55     | 89.9 / 88.6           | 69.1 / 72.8          | 89.5 / 89.9*
60     | 89.0 / 85.0           | 70.3 / 72.1          | 91.3 / 86.4*
65     | 86.3 / 85.9           | 72.1 / 72.8          | 88.0 / 92.5*
70     | 92.3 / 84.1           | 70.3 / 75.7          | 90.8 / 86.3*
75     | 88.8 / 87.9           | 74.2 / 74.6          | 88.7 / 91.6*
80     | 88.7 / 93.8*          | 90.0 / 87.5          | 88.3 / 93.3
85     | 89.5 / 89.3           | 93.4 / 89.6          | 89.2 / 91.0*
90     | 89.5 / 88.4           | 93.3 / 95.8*         | 89.2 / 92.6

(* marks the best test-set accuracy among the three methods.)
Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set.

tr (%) | RSVM Train / Test (%) | NBC Train / Test (%) | RPC Train / Test (%)
50     | 91.4 / 84.8           | 76.5 / 68.9          | 91.3 / 87.5*
55     | 92.5 / 86.6           | 68.0 / 77.0          | 92.0 / 90.3*
60     | 89.8 / 86.1           | 72.9 / 73.8          | 88.9 / 90.9*
65     | 91.0 / 82.3           | 80.5 / 81.6          | 89.8 / 92.9*
70     | 86.8 / 95.5*          | 83.4 / 89.8          | 88.4 / 93.7
75     | 89.4 / 85.2           | 85.9 / 79.5          | 89.7 / 93.5*
80     | 91.8 / 80.8           | 88.1 / 79.9          | 89.7 / 91.1*
85     | 88.3 / 89.9           | 89.9 / 92.8          | 90.8 / 97.1*
90     | 88.5 / 90.2           | 88.8 / 94.2          | 90.9 / 97.2*

(* marks the best test-set accuracy among the three methods.)
with respect to spherical perturbations of the data points. To make a fair comparison, in the following experiments we test a series of λ values and choose the one with the best performance. Note that if λ_j = 0, we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.
4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for the binary classification problems using cross-validation methods. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].
Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set based on the parameter tr, which varies from 50% to 90%. The highest classification accuracy on a training set among these three methods is highlighted in bold, while the best classification accuracy on a test set is marked with an asterisk.
Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, in most cases RSVM has the highest classification accuracy on the training sets, but its performance on the test sets is unsatisfactory. In most cases, the proposed RPC provides the highest classification accuracy on the test sets. NBC provides better performance on the test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can provide better performance on the test sets than on the training sets; thus it can avoid the "overlearning" phenomenon.
To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, namely, T41–T45 and T61–T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for the data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As in Tables 1 and 2, the robustness of the proposed RPC guarantees its generalization ability on the test sets.
4.4. Test on Multiple Classification. In this subsection, we test the performance of RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter λ, we run a set of RSVMs with λ varying from 0 to a sufficiently large number and select the one with the best performance on the test sets.
Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separating hyperplane
Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on other data sets when tr = 70%.

Data set | RSVM Train / Test (%) | NBC Train / Test (%) | RPC Train / Test (%)
T41      | 62.0 / 59.7           | 82.4 / 78.5          | 77.9 / 83.5*
T42      | 87.0 / 82.2           | 84.1 / 83.1          | 80.5 / 85.3*
T43      | 68.0 / 61.2           | 80.2 / 75.4          | 85.5 / 86.9*
T44      | 91.3 / 83.9           | 77.9 / 86.8          | 88.8 / 90.5*
T45      | 86.5 / 87.0           | 93.2 / 91.0*         | 84.0 / 89.1
T61      | 80.6 / 79.0           | 80.5 / 83.0          | 83.6 / 87.8*
T62      | 71.4 / 66.5           | 86.9 / 85.4*         | 86.3 / 85.4*
T63      | 63.7 / 69.5           | 89.6 / 89.1*         | 82.2 / 84.4
T64      | 88.2 / 86.7           | 97.0 / 96.9*         | 93.4 / 95.5
T65      | 75.0 / 63.4           | 79.7 / 81.5          | 90.5 / 92.9*
Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set.

Data set | RSVM Train / Test (%) | NBC Train / Test (%) | RPC Train / Test (%)
M1       | 65.4 / 68.2           | 72.7 / 73.7          | 79.1 / 77.4*
M2       | 76.9 / 75.3           | 82.6 / 74.8          | 81.7 / 80.9*
M3       | 57.9 / 69.9           | 74.8 / 87.4          | 95.4 / 92.0*
M4       | 70.4 / 64.1           | 97.1 / 92.3          | 95.4 / 92.3*
M5       | 77.4 / 71.3           | 89.4 / 88.1*         | 92.0 / 88.0
M6       | 75.7 / 70.5           | 74.1 / 79.4          | 86.4 / 80.8*
Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set (accuracy on the training set versus training rate, 0.6 to 0.9).
for binary classification problems, while RPC is more robust when extended to multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.
Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that in most
Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set (accuracy on the test set versus training rate, 0.6 to 0.9).
of the cases RPC provides the highest accuracy among the three methods. The accuracy of RSVM is higher than that of NBC on the Y5 test set, while NBC outperforms RSVM on the T1 test set.
To further test the performance of RPC on multiple classification problems, we carry out more experiments on the data sets M1–M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always
Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set (accuracy on the training set versus training rate, 0.6 to 0.9).
Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set (accuracy on the test set versus training rate, 0.6 to 0.9).
provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).
From the tested real-life application, we conclude that the proposed RPC is robust enough to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.
5. Conclusion
In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified χ²-distance. We assume that the true distribution lies in the constructed distributional set centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all possible distributions belonging to the distributional set.
Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second-order cone program based on the conic duality theorem. Thus, our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also shows promising performance on multiple classification problems.
There are still many important extensions of our model. Other forms of loss function, such as the mean squared error function and the Hinge loss function, should be studied to obtain tractable reformulations, and the resulting models may provide better performance. Probability models considering joint probability distribution information are also an interesting research direction.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223–228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209–226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization: methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453–480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801–807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555–582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558–563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253–1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175–1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191–201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123–136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691–710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85–90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.
[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3, a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341–357, 2013.
approach has been further extended to incorporate other features. El Ghaoui et al. [13] propose a robust classification model by minimizing the worst-case value of a given loss function over all possible choices of the data in bounded hyperrectangles. Three loss functions, from SVM, logistic regression, and minimax probability machines, are studied in [13]. Based on the same assumption of known mean and covariance matrix, [14, 15] propose the biased minimax probability machine to address the biased classification problem and further generalize it to obtain the minimum error minimax probability machine. Hoi and Lyu [16] study a quadratic classifier with positive definite covariance matrices and further consider the problem of finding a convex set to cover known sampled data in one class while minimizing the worst-case misclassification probability. The minimax probability machines have also been extended to solve multiple classification problems; see [17, 18].
In this paper, we propose a robust probability classifier (RPC) based on the modified χ²-distance. Specifically, for a given training set, we first estimate the probability of each sample belonging to each class based on a single feature, which is called a nominal class-conditional distribution. Then an ε-confidence probability distributional set P_ε is constructed based on the nominal class-conditional distributions and the modified χ²-distance, where the parameter ε controls the size of the constructed set. Unlike the "conditional independence assumption" in NBC, we introduce a "linear combination assumption" for the posterior class-conditional probabilities; the proposed classifier takes a linear combination of these probabilities over different features and assigns the sample to the class with the maximal posterior probability. To get a robust classifier, we minimize the worst-case loss function value over all possible choices of class-conditional distributions in the distributional set P_ε. The underlying assumption is that, due to observational noise, we cannot obtain the true probability distribution of each class, but it can be well estimated by the nominal distribution such that it belongs to the distributional set P_ε.
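The modified χ²-distance underlying P_ε, d(q, p) = Σ_j (q_j − p_j)²/p_j, can be sketched as follows (the function and the sample distributions are our own illustration, not from the paper):

```python
# Hedged sketch (ours) of the modified chi^2-distance that defines P_eps:
#   d(q, p) = sum_j (q_j - p_j)**2 / p_j,
# with a made-up nominal distribution p and candidate q.
def chi2_mod(q, p):
    return sum((qj - pj) ** 2 / pj for qj, pj in zip(q, p))

p_nom = [0.5, 0.3, 0.2]     # nominal class-conditional distribution
q_cand = [0.45, 0.35, 0.2]  # candidate "true" distribution
d = chi2_mod(q_cand, p_nom)
print(round(d, 4), d <= 0.02)   # distance, and membership in P_eps for eps = 0.02
```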
Our two major contributions are as follows. First, in our model the proposed distributional set P_ε is based on the nominal distribution and the modified χ²-distance. As pointed out in [19], such a distributional set can make use of more information conveyed in the training set compared with traditional robust approaches, which only use the information of the mean and covariance matrix. To the best of our knowledge, this is among the first studies of classification models considering complex distribution information. Although [20] considers an ε-contaminated robust support vector machine model, its distributional set is defined by easily handled linear constraints, and its analysis is highly dependent on a characterization of the extreme points of this set. Here our proposed distributional set is defined by a nonlinear quadratic function and is analyzed via the conic duality theorem. Second, by taking the absolute error function as the loss function, we show how to transform our robust minimax optimization problem into computable second-order cone programming. The absolute error function in the objective also distinguishes our model from other existing models, such as the soft-margin support vector machine, which uses the Hinge loss function [21, 22], and logistic regression, which uses the negative log-likelihood function [23]. Note that the absolute error function is essential in our model to obtain a tractable optimization problem. Numerical experiments on a real-world application validate the effectiveness of the proposed classifier and further show that it also performs well for multiple classification problems.
The paper proceeds as follows. Section 2 introduces the proposed robust minimax probability classifier based on the modified χ²-distance and discusses how to construct the desired distributional set P_ε. Section 3 provides an equivalent reformulation by handling the robust constraints and the robust objective separately. Numerical experiments on a real-world data set are carried out to validate the effectiveness of the proposed classifier in Section 4. Section 5 concludes the paper and gives future research directions.
2. Classifier Models
In this section, a simple probability classifier is first presented, and then it is extended to handle data uncertainty by introducing a distributional set P_ε. We also discuss how to construct this distributional set based on the training data set.

Consider a multiclass, multifeature classification problem in which each sample contains |L| features, and there are |J| classes and |I| samples. Specifically, we are given a training set $(X, Y) \in \mathbb{R}^{|I|\times|L|} \times \{0,1\}^{|I|\times|J|}$, where $x_{il}$ denotes the l-th feature of the i-th sample, and $y_{ij} = 1$ if the i-th sample belongs to the j-th class; otherwise, $y_{ij} = 0$. In the following context, we will also use $x_i$ to denote the i-th sample, that is, $x_i = (x_{i1}, \ldots, x_{i|L|})$.
2.1. Probability Classifier. Bayes classifiers assign an observation x to the j*(x)-th class, which has the maximal posterior probability, that is,

$$ j^{*}(x)=\arg\max_{j\in J}P(j\mid x), \tag{1} $$

where $P(j \mid x)$ is the posterior probability function, that is, the conditional probability that the sample belongs to the j-th class given that it has feature vector x.

Using Bayes' theorem, we have

$$ P(j\mid x)=\frac{P(j)\,P(x\mid j)}{P(x)}\propto P(j)\,P(x\mid j), \tag{2} $$

where $P(j)$ is the prior probability of the j-th class, $P(x \mid j)$ is the class-conditional probability for the j-th class, and $P(x)$ is the probability that a sample has feature vector x. Note that $P(x)$ is a constant if the values of the feature variables are known and thus can be omitted. To design an effective Bayes classifier, the key issue is estimating the class-conditional probability $P(x \mid j)$ or the joint probability $P(x, j)$. Theoretically, using the chain rule, we have

$$ P(x,j)=P(j)\,P(x_1\mid j)\,P(x_2\mid j,x_1)\cdots P\bigl(x_{|L|}\mid j,x_1,\ldots,x_{|L|-1}\bigr). \tag{3} $$
However, such an estimating method leads to the problem of the "dimension disaster" (the curse of dimensionality).
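A back-of-envelope illustration of this "dimension disaster" (our own, with hypothetical numbers): a full joint model over |L| binary features needs on the order of 2^|L| parameters per class, while a per-feature factorization needs only |L|:

```python
# A hypothetical count (ours): parameters needed to specify P(x | j) for one
# class over 20 binary features, full joint versus per-feature factorization.
L_features = 20
full_joint = 2 ** L_features - 1   # one probability per feature configuration
factorized = L_features            # one Bernoulli parameter per feature
print(full_joint, factorized)      # -> 1048575 20
```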
To address this issue, the naive Bayes classifier makes the following "conditional independence assumption":

$$ P(x\mid j)=\prod_{l=1}^{|L|}p_j^l(x), \tag{4} $$

where $p_j^l(x) = P(x_l \mid j)$ is the class-conditional probability that the observation x belongs to the j-th class based on the l-th feature. Here we introduce another assumption, the "linear combination assumption", for the class-conditional probability:

$$ P(x\mid j)=\sum_{l=1}^{|L|}\beta_j^l\,p_j^l(x), \tag{5} $$

where $\beta_j^l$ is a coefficient. Compared with the "conditional independence assumption", which uses the probabilistic information in terms of multiplication, the proposed "linear combination assumption" uses the probabilistic information in terms of a weighted sum. We will further discuss the rationality of this assumption at the end of this subsection.
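The two ways of combining per-feature probabilities can be contrasted on toy numbers (all values below are our own illustration, not data from the paper):

```python
# Toy contrast (our own numbers) between the two assumptions: the
# "conditional independence assumption" (4) combines per-feature probabilities
# p_j^l(x) by a product, the "linear combination assumption" (5) by a
# weighted sum with illustrative weights beta.
p_feat = [0.8, 0.6, 0.9]   # p_j^l(x) for one class j over |L| = 3 features
beta = [0.5, 0.2, 0.3]     # hypothetical combination weights

nbc_score = 1.0
for v in p_feat:
    nbc_score *= v          # product form (4)

lin_score = sum(b * v for b, v in zip(beta, p_feat))   # weighted-sum form (5)

print(round(nbc_score, 3), round(lin_score, 3))   # -> 0.432 0.79
```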
Under this assumption, we have

$$ P(j\mid x)\propto P(j)\,P(x\mid j)=P(j)\sum_{l=1}^{|L|}\beta_j^l\,p_j^l(x)=\sum_{l=1}^{|L|}\alpha_j^l\,p_j^l(x), \tag{6} $$

where $\alpha_j^l = P(j)\beta_j^l$ denotes the probability weight of the l-th feature for the j-th class.

To obtain the optimal probability classifier based on the "linear combination assumption", it is natural to consider the following optimization problem:

$$ \min_{\alpha\in\Theta}\;\sum_{j\in J}\sum_{i\in I}L\bigl(P(j\mid x_i),\,y_{ij}\bigr), \tag{7} $$

where $L(\cdot,\cdot)\colon \mathbb{R}\times\mathbb{R}\to\mathbb{R}_{+}$ is a prespecified loss function. In the following context, we take the absolute error function as our loss function, that is, $L(x, y) = |x - y|$. In view of its probability property, it is straightforward to impose the following constraints on the posterior probability:

$$ 0\le f(j\mid x_i)\le 1,\quad\forall i\in I,\;j\in J. \tag{8} $$
Under such constraints, we have

$$
\begin{aligned}
\sum_{j\in J}\sum_{i\in I}L\bigl(f(j\mid x_i),\,y_{ij}\bigr)
&=\sum_{j\in J}\sum_{i\in I}\bigl|f(j\mid x_i)-y_{ij}\bigr|\\
&=\sum_{j\in J}\sum_{i\in I}\bigl[y_{ij}\bigl(1-f(j\mid x_i)\bigr)+\bigl(1-y_{ij}\bigr)f(j\mid x_i)\bigr]\\
&=\sum_{j\in J}\sum_{i\in I}\bigl(1-2y_{ij}\bigr)f(j\mid x_i)+|I|,
\end{aligned}
\tag{9}
$$

where $|I| = \sum_{j\in J}\sum_{i\in I}y_{ij}$.
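The algebra behind (9) rests on a pointwise identity that is easy to check numerically (a quick verification of ours, not from the paper):

```python
# Check the identity used in (9): for a label y in {0, 1} and a posterior
# value f in [0, 1],
#   |f - y| = (1 - 2*y)*f + y,
# which is what makes the absolute-error loss linear in f.
for y in (0, 1):
    for k in range(101):
        f = k / 100.0
        assert abs(f - y) == (1 - 2 * y) * f + y
print("identity |f - y| = (1 - 2y) f + y verified on a grid")
```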
Thus, the optimal probability classifier (PC) problem can be formulated as follows:

$$
\text{(PC)}\quad
\begin{aligned}
\min\;\;&\sum_{j\in J}\sum_{i\in I}\bigl(1-2y_{ij}\bigr)\sum_{l\in L}\alpha_j^l\,p_{ij}^l+|I|\\
\text{s.t.}\;\;&0\le\sum_{l\in L}\alpha_j^l\,p_{ij}^l\le 1,\quad\forall i\in I,\;j\in J.
\end{aligned}
\tag{10}
$$
There is no doubt that the "linear combination assumption" may fail to hold in some cases. However, we justify the proposed classifier by the following facts.
(1) As an intuitive interpretation, note that \(p_j^l(x)\) estimates the probability of the observation \(x\) belonging to the \(j\)th class based only on the \(l\)th feature; thus it provides partial probabilistic information about the sample. Hence we can interpret the weight \(\alpha_j^l\) as a certain degree of trust in that information, and in this sense the "linear combination assumption" is a way of combining evidence from different sources. Similar ideas can also be found in the theory of evidence; see the Dempster-Shafer theory [24, 25].
(2) In terms of the classification performance, in the worst case the proposed classifier may put all its weight on one feature; in that case it is equivalent to a Bayes classifier based on a well-selected feature. If each class has a "typical" feature which distinguishes it from the other classes, the proposed classifier is able to learn this property by putting different weights on different features for different classes, and thus provides better classification performance. A real-life application to lithology classification problems also validates its classification performance by comparison with support vector machines and the naive Bayes classifier.
(3) Another advantage of the proposed classifier is its high computability. As we show in Section 3, the proposed classifier and its robust counterpart can be reformulated as second order cone programming problems and thus can be solved by interior-point algorithms in polynomial time.
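Problem (10) is linear in the weights \(\alpha_j^l\) and separable across classes, so the non-robust classifier can be fitted with any LP solver. The sketch below is our own illustration in Python/SciPy (not the authors' Matlab implementation); the array layout and function names are assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def fit_pc_class(P, y_j):
    """Solve the non-robust PC problem (10) for a single class j.

    P[i, l] holds the nominal probability p_{ij}^l of sample i belonging
    to class j according to feature l; y_j holds the 0/1 labels for
    class j.  The decision variables are the weights alpha_j^l.
    """
    n, L = P.shape
    # objective coefficients: sum_i (1 - 2 y_ij) * sum_l alpha_l p_il
    c = (1.0 - 2.0 * y_j) @ P
    # constraints 0 <= P @ alpha <= 1 for every sample i
    A_ub = np.vstack([P, -P])
    b_ub = np.concatenate([np.ones(n), np.zeros(n)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * L)
    return res.x

def posterior(P, alpha):
    """Posterior estimate (6) for class j: f(j | x_i) = sum_l alpha_l p_il."""
    return P @ alpha
```

On a toy instance where feature 1 perfectly identifies the class, the LP puts all weight on that feature, matching justification (2) above.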
2.2. Robust Probability Classifier. Due to observational noises, the true class-conditional probability distribution is often difficult to obtain. Instead, we can construct a confidence distributional set which contains the true distribution. Unlike the traditional distributional sets in minimax probability machines, which only utilize the mean and the covariance matrix, we construct our class-conditional probability distributional set based on the modified \(\chi^2\)-distance, which uses more information from the samples.
Mathematical Problems in Engineering
The modified \(\chi^2\)-distance \(d(\cdot,\cdot): \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}\) is used in statistics to measure the distance between two discrete probability distribution vectors. For given \(p = (p_1, \ldots, p_m)^T\) and \(q = (q_1, \ldots, q_m)^T\), it is defined as
\[
d(q, p) = \sum_{j=1}^{m} \frac{(q_j - p_j)^2}{p_j}. \qquad (11)
\]
Based on the modified \(\chi^2\)-distance, we present the following class-conditional probability distributional set:
\[
P_\epsilon = \Bigl\{ \{q_{ij}^l\} : \sum_{j \in J} q_{ij}^l = 1,\ q_{ij}^l \ge 0,\ \sum_{j \in J} \frac{(q_{ij}^l - p_{ij}^l)^2}{p_{ij}^l} \le \epsilon,\ \forall i \in I,\ l \in L,\ j \in J \Bigr\}, \qquad (12)
\]
where \(p_{ij}^l\) is the nominal class-conditional probability of the \(i\)th sample belonging to the \(j\)th class based on the \(l\)th feature, and the prespecified parameter \(\epsilon\) is used to control the size of the set.
To design a robust classifier, we need to consider the effect of data uncertainty on both the objective function and the constraints. The robust objective function minimizes the worst-case loss function value over all possible distributions in the distributional set \(P_\epsilon\); the robust constraints ensure that all the original constraints are satisfied for any distribution in \(P_\epsilon\). Thus the robust probability classifier problem is of the following form:
\[
(\mathrm{RPC}) \quad
\begin{aligned}
\min \ & \max \Bigl\{ \sum_{j \in J} \sum_{i \in I} (1 - 2 y_{ij}) \sum_{l \in L} \alpha_j^l\, q_{ij}^l + |I| :\ \{q_{ij}^l\} \in P_\epsilon \Bigr\} \\
\text{s.t.} \ & 0 \le \sum_{l \in L} \alpha_j^l\, q_{ij}^l \le 1, \quad \forall \{q_{ij}^l\} \in P_\epsilon,\ \forall i \in I,\ j \in J.
\end{aligned} \qquad (13)
\]
Note that the above optimization problem has an infinite number of robust constraints, and its objective function is itself an embedded subproblem. We will show how to solve such a minimax optimization problem in Section 3.
2.3. Constructing the Distributional Set. To obtain the distributional set \(P_\epsilon\), we need to define the parameter \(\epsilon\) and the nominal probability \(p_{ij}^l\). The selection of the parameter \(\epsilon\) is application dependent, and we will discuss this issue in the numerical experiment section; next we provide a procedure to calculate \(p_{ij}^l\).

For the \(l\)th feature, the following procedure takes an integer \(K_l\), indicating the number of data intervals, as an input and outputs the estimated probability \(p_{ij}^l\) of the \(i\)th sample belonging to the \(j\)th class.
(1) Sort the samples in increasing order and divide them into \(K_l\) intervals such that each interval contains at least \(\lfloor |I| / K_l \rfloor\) samples. Denote the \(k\)th interval by \(\Delta_{lk}\).

(2) Calculate the total number of samples in the \(j\)th class, \(N_j\); the total number of samples in the \(k\)th interval, \(N_{lk}\); and the total number of samples belonging to the \(j\)th class in the \(k\)th interval, \(N_{lkj}\).
(3) For the \(i\)th sample, if it falls into the \(k\)th interval, the class-conditional probability \(p_{ij}^l\) is calculated by
\[
\begin{aligned}
p_{ij}^l &= \mathrm{Prob}\bigl(i \in j \mid x_{il} \in \Delta_{lk}\bigr)
= \frac{\mathrm{Prob}\bigl(i \in j,\ x_{il} \in \Delta_{lk}\bigr)}{\mathrm{Prob}\bigl(x_{il} \in \Delta_{lk}\bigr)} \\
&= \frac{\mathrm{Prob}(i \in j)\,\mathrm{Prob}\bigl(x_{il} \in \Delta_{lk} \mid i \in j\bigr)}{\sum_{j' \in J} \mathrm{Prob}(i \in j')\,\mathrm{Prob}\bigl(x_{il} \in \Delta_{lk} \mid i \in j'\bigr)} \\
&= \frac{(N_j / |I|) \cdot (N_{lkj} / N_j)}{\sum_{j' \in J} (N_{j'} / |I|) \cdot (N_{lkj'} / N_{j'})}
= \frac{N_{lkj}}{N_{lk}}.
\end{aligned} \qquad (14)
\]
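Steps (1)-(3) can be sketched as follows for a single feature. This is an illustrative reimplementation (not the authors' code); in particular, how the remainder samples are assigned to the last interval is our own choice:

```python
import numpy as np

def nominal_probs(x, y, n_classes, K_l):
    """Estimate the nominal class-conditional probabilities p_{ij}^l for
    one feature: sort the samples, split them into K_l intervals of at
    least floor(n / K_l) samples each, and set p_{ij} = N_lkj / N_lk for
    the interval k containing sample i (equation (14))."""
    n = len(x)
    order = np.argsort(x)                      # step (1): sort samples
    size = n // K_l                            # at least floor(n/K_l) per interval
    # interval index of each sorted position (last interval absorbs remainder)
    interval = np.minimum(np.arange(n) // size, K_l - 1)
    p = np.zeros((n, n_classes))
    for k in range(K_l):
        members = order[interval == k]
        counts = np.bincount(y[members], minlength=n_classes)  # N_lkj, step (2)
        p[members] = counts / counts.sum()                     # N_lkj / N_lk, step (3)
    return p
```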
Note that, from the definition of \(P_\epsilon\), we can easily compute the upper bound \(\overline{q}_{ij}^l\) and the lower bound \(\underline{q}_{ij}^l\) for the true class-conditional probability \(q_{ij}^l\) as follows:
\[
\overline{q}_{ij}^l = \max \Bigl\{ q_{ij}^l : \sum_{s \in J} q_{is}^l = 1,\ \sum_{s \in J} \frac{(q_{is}^l - p_{is}^l)^2}{p_{is}^l} \le \epsilon,\ q_{is}^l \ge 0,\ \forall s \in J \Bigr\}, \qquad (15)
\]
\[
\underline{q}_{ij}^l = \min \Bigl\{ q_{ij}^l : \sum_{s \in J} q_{is}^l = 1,\ \sum_{s \in J} \frac{(q_{is}^l - p_{is}^l)^2}{p_{is}^l} \le \epsilon,\ q_{is}^l \ge 0,\ \forall s \in J \Bigr\}. \qquad (16)
\]
The above problems can be efficiently solved by a second order cone solver such as SeDuMi [26] or SDPT3 [27].
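As a plain illustration, the bounds (15)-(16) can also be obtained with a general-purpose NLP solver instead of a conic one (the paper itself uses SeDuMi/SDPT3); the helper below uses SciPy's SLSQP and is our own sketch:

```python
import numpy as np
from scipy.optimize import minimize

def prob_bounds(p, j, eps):
    """Lower/upper bounds (16) and (15) on q_j over the distributional
    set {q : sum_s q_s = 1, q_s >= 0, sum_s (q_s - p_s)^2 / p_s <= eps}."""
    p = np.asarray(p, float)
    cons = ({'type': 'eq',   'fun': lambda q: q.sum() - 1.0},
            {'type': 'ineq', 'fun': lambda q: eps - ((q - p) ** 2 / p).sum()})
    bnds = [(0.0, 1.0)] * len(p)
    # start from the (strictly feasible) nominal distribution p
    lo = minimize(lambda q: q[j], p, bounds=bnds, constraints=cons).fun
    hi = -minimize(lambda q: -q[j], p, bounds=bnds, constraints=cons).fun
    return lo, hi
```

For two classes with \(p = (1/2, 1/2)\), the feasible deviation is \(\pm\sqrt{\epsilon/4}\), which the solver recovers numerically.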
3. Solution Methods for RPC
In this section, we first reduce the infinite number of robust constraints to a finite set of linear constraints and then transform the inner robust objective function into a minimization problem by the conic duality theorem. Finally, we obtain an equivalent computable second order cone program for the RPC problem. The following analysis is based on the strong duality result in [8].
Consider a conic program of the following form:
\[
(\mathrm{CP}) \quad \min \ c^T x \quad \text{s.t.} \quad A_i x - b_i \in C_i,\ \forall i = 1, \ldots, m, \quad A x = b, \qquad (17)
\]
and its dual problem
(DP) max 119887119879119911 +
119898
sum
119894=1
119887119879
119894119910119894
st 119860lowast119911 +
119898
sum
119894=1
119860lowast
119894119910119894= 119888
119910119894isin 119862lowast
119894 forall119894 = 1 119898
(18)
where 119862119894is a cone in R119899119894 and 119862
lowast
119894is its dual cone defined by
119862lowast
119894= 119910 isin R
119899119894 119910119879119909 ge forall119909 isin 119862
119894 (19)
A conic program is called strictly feasible if it admits a feasible solution \(x\) such that \(A_i x - b_i \in \mathrm{int}\, C_i\), \(\forall i = 1, \ldots, m\), where \(\mathrm{int}\, C_i\) denotes the interior of \(C_i\).
Lemma 1 (see [8]). If one of the problems (CP) and (DP) is strictly feasible and bounded, then the other problem is solvable, and (CP) = (DP) in the sense that both have the same optimal objective function value.
3.1. Robust Constraints. The following lemma provides an equivalent characterization of the infinite number of robust constraints in terms of a finite set of linear constraints, which can be handled efficiently.
Lemma 2. For given \(i, j\), the robust constraint
\[
0 \le \sum_{l \in L} \alpha_j^l\, q_{ij}^l \le 1, \quad \forall \{q_{ij}^l\} \in P_\epsilon, \qquad (20)
\]
is equivalent to the following constraints:
\[
\begin{aligned}
& \sum_{l \in L} \bigl( \underline{q}_{ij}^l\, u_{ij}^{l0} - \overline{q}_{ij}^l\, v_{ij}^{l0} \bigr) \ge 0, \\
& \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0, \quad u_{ij}^{l0},\ v_{ij}^{l0} \ge 0, \quad \forall l \in L, \\
& 1 + \sum_{l \in L} \bigl( \underline{q}_{ij}^l\, u_{ij}^{l1} - \overline{q}_{ij}^l\, v_{ij}^{l1} \bigr) \ge 0, \\
& v_{ij}^{l1} - \alpha_j^l - u_{ij}^{l1} \ge 0, \quad u_{ij}^{l1},\ v_{ij}^{l1} \ge 0, \quad \forall l \in L.
\end{aligned} \qquad (21)
\]
Proof. First note that the distributional set \(P_\epsilon\) can be represented as the Cartesian product of a series of projected subsets:
\[
P_\epsilon = \prod_{i \in I} P_\epsilon^i, \qquad (22)
\]
where the projected subset for index \(i\) is defined by
\[
P_\epsilon^i = \Bigl\{ \{q_{ij}^l\} : \sum_{j \in J} q_{ij}^l = 1,\ q_{ij}^l \ge 0,\ \sum_{j \in J} \frac{(q_{ij}^l - p_{ij}^l)^2}{p_{ij}^l} \le \epsilon,\ \forall l \in L,\ j \in J \Bigr\}. \qquad (23)
\]
Then, for given \(i, j\), since the robust constraint is only associated with the variables \(q_{ij}^l\), \(l \in L\), we can further split the projected subset \(P_\epsilon^i\) into \(|J|\) subsets:
\[
P_\epsilon^i = \prod_{j \in J} P_\epsilon^{ij} = \prod_{j \in J} \bigl\{ \{q_{ij}^l\} : \underline{q}_{ij}^l \le q_{ij}^l \le \overline{q}_{ij}^l,\ \forall l \in L \bigr\}, \qquad (24)
\]
where \(\overline{q}_{ij}^l\) and \(\underline{q}_{ij}^l\) are computed by (15) and (16), respectively.
For the constraint \(\sum_{l \in L} \alpha_j^l q_{ij}^l \ge 0\), \(\forall \{q_{ij}^l\} \in P_\epsilon\), we have the following chain of equivalences:
\[
\begin{aligned}
& \sum_{l \in L} \alpha_j^l\, q_{ij}^l \ge 0, \quad \forall \{q_{ij}^l\} \in P_\epsilon^i \\
\iff\ & \sum_{l \in L} \alpha_j^l\, q_{ij}^l \ge 0, \quad \forall \{q_{ij}^l\} \in P_\epsilon^{ij} \\
\iff\ & \min \Bigl\{ \sum_{l \in L} \alpha_j^l\, q_{ij}^l : \underline{q}_{ij}^l \le q_{ij}^l \le \overline{q}_{ij}^l,\ \forall l \in L \Bigr\} \ge 0 \\
\iff\ & \max \Bigl\{ \sum_{l \in L} \bigl( \underline{q}_{ij}^l\, u_{ij}^{l0} - \overline{q}_{ij}^l\, v_{ij}^{l0} \bigr) : \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0,\ u_{ij}^{l0}, v_{ij}^{l0} \ge 0,\ \forall l \in L \Bigr\} \ge 0 \\
\iff\ & \sum_{l \in L} \bigl( \underline{q}_{ij}^l\, u_{ij}^{l0} - \overline{q}_{ij}^l\, v_{ij}^{l0} \bigr) \ge 0, \quad \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0, \quad u_{ij}^{l0},\ v_{ij}^{l0} \ge 0, \ \forall l \in L,
\end{aligned} \qquad (25)
\]
where the last equivalence comes from the strong duality between these two linear programs.
For the constraint \(\sum_{l \in L} \alpha_j^l q_{ij}^l \le 1\), \(\forall \{q_{ij}^l\} \in P_\epsilon\), the same technique applies; thus we complete the proof.
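The strong-duality step in the proof can be spot-checked numerically for a single pair \((i, j)\): the box-constrained minimum on the left-hand side must coincide with the optimal value of the dual LP appearing in (21)/(25). A small sketch (the function names are ours):

```python
import numpy as np
from scipy.optimize import linprog

def box_min(alpha, q_lo, q_hi):
    """min_q sum_l alpha_l q_l over the box q_lo <= q <= q_hi:
    pick the lower bound where alpha_l > 0 and the upper bound otherwise."""
    return float(alpha @ np.where(alpha > 0, q_lo, q_hi))

def dual_max(alpha, q_lo, q_hi):
    """Optimal value of the dual LP from (21)/(25):
    max sum_l (q_lo_l u_l - q_hi_l v_l)  s.t.  u_l - v_l <= alpha_l, u, v >= 0."""
    L = len(alpha)
    c = np.concatenate([-q_lo, q_hi])            # linprog minimizes, so negate
    A_ub = np.hstack([np.eye(L), -np.eye(L)])    # u - v <= alpha
    res = linprog(c, A_ub=A_ub, b_ub=alpha, bounds=[(0, None)] * (2 * L))
    return -res.fun
```

By LP strong duality (both programs are feasible and bounded here), `box_min` and `dual_max` agree.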
3.2. Robust Objective Function. In the RPC problem, the robust objective function is defined by an inner maximization problem. The following proposition shows that it can be transformed into a minimization problem over second order cones. To prove this result, we utilize the conjugate function \(d^*\) of the componentwise modified \(\chi^2\)-distance \(d(t) = (t - 1)^2\):
\[
d^*(s) = \sup_{t \ge 0} \{ s t - d(t) \} = \frac{[s + 2]_+^2}{4} - 1, \qquad (26)
\]
where the function \([\cdot]_+\) is defined as \([x]_+ = x\) if \(x \ge 0\) and \([x]_+ = 0\) otherwise. For more details about conjugate functions, see [28].
Proposition 3. The inner maximization problem
\[
\max \Bigl\{ \sum_{j \in J} \sum_{i \in I} (1 - 2 y_{ij}) \sum_{l \in L} \alpha_j^l\, q_{ij}^l + |I| :\ \{q_{ij}^l\} \in P_\epsilon \Bigr\} \qquad (27)
\]
is equivalent to the second order cone program
\[
\begin{aligned}
\min \ & \sum_{i \in I} \sum_{l \in L} \bigl( \epsilon \lambda_i^l - \theta_i^l \bigr) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, w_{ij}^l + |I| \\
\text{s.t.} \ & \bigl( w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l + w_{ij}^l \bigr)^T \in L^3, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& r_{ij}^l = \alpha_j^l (1 - 2 y_{ij}) + \theta_i^l, \quad \forall i \in I,\ l \in L,\ j \in J, \\
& z_{ij}^l \ge r_{ij}^l + 2 \lambda_i^l, \quad \lambda_i^l,\ z_{ij}^l \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L,
\end{aligned} \qquad (28)
\]
where the second order cone \(L^{n+1}\) is defined as
\[
L^{n+1} = \Bigl\{ x \in \mathbb{R}^{n+1} : x_{n+1} \ge \sqrt{\textstyle\sum_{i=1}^{n} x_i^2} \Bigr\}. \qquad (29)
\]
Proof. For given feasible \(\alpha\) satisfying the robust constraints, it is straightforward to show that the inner maximization problem is equal to the following minimization problem:
\[
(\mathrm{MP}) \quad \min \ t \quad \text{s.t.} \quad t \ge \sum_{j \in J} \sum_{i \in I} (1 - 2 y_{ij}) \sum_{l \in L} \alpha_j^l\, q_{ij}^l + |I|, \quad \forall \{q_{ij}^l\} \in P_\epsilon. \qquad (30)
\]
The above constraint can be further reduced to the following constraint:
\[
\max \Bigl\{ \sum_{j \in J} \sum_{i \in I} (1 - 2 y_{ij}) \sum_{l \in L} \alpha_j^l\, q_{ij}^l + |I| - t :\ \{q_{ij}^l\} \in P_\epsilon \Bigr\} \le 0. \qquad (31)
\]
By assigning Lagrange multipliers \(\theta_i^l \in \mathbb{R}\) and \(\lambda_i^l \in \mathbb{R}_+\) to the constraints in the left-hand optimization problem, we obtain the following Lagrange function:
\[
L(q, \theta, \lambda) = \sum_{i \in I} \sum_{l \in L} \bigl( \epsilon \lambda_i^l - \theta_i^l \bigr)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} \Bigl( r_{ij}^l\, q_{ij}^l - \lambda_i^l\, \frac{(q_{ij}^l - p_{ij}^l)^2}{p_{ij}^l} \Bigr) + |I| - t, \qquad (32)
\]
where \(r_{ij}^l = \alpha_j^l (1 - 2 y_{ij}) + \theta_i^l\). Its dual function is given by
\[
\begin{aligned}
D(\theta, \lambda) &= \max_{q \ge 0} L(q, \theta, \lambda) \\
&= \sum_{i \in I} \sum_{l \in L} \bigl( \epsilon \lambda_i^l - \theta_i^l \bigr)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} \max_{q_{ij}^l \ge 0} \Bigl( r_{ij}^l\, q_{ij}^l - \lambda_i^l\, p_{ij}^l \Bigl( \frac{q_{ij}^l - p_{ij}^l}{p_{ij}^l} \Bigr)^2 \Bigr) + |I| - t \\
&= \sum_{i \in I} \sum_{l \in L} \bigl( \epsilon \lambda_i^l - \theta_i^l \bigr)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l \max_{t' \ge 0} \bigl( r_{ij}^l\, t' - \lambda_i^l (t' - 1)^2 \bigr) + |I| - t \\
&= \sum_{i \in I} \sum_{l \in L} \bigl( \epsilon \lambda_i^l - \theta_i^l \bigr)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, \lambda_i^l \max_{t' \ge 0} \Bigl( \frac{r_{ij}^l}{\lambda_i^l}\, t' - (t' - 1)^2 \Bigr) + |I| - t \\
&= \sum_{i \in I} \sum_{l \in L} \bigl( \epsilon \lambda_i^l - \theta_i^l \bigr)
+ \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, \lambda_i^l\, d^* \Bigl( \frac{r_{ij}^l}{\lambda_i^l} \Bigr) + |I| - t,
\end{aligned} \qquad (33)
\]
where the substitution \(t' = q_{ij}^l / p_{ij}^l\) is used in the second equality.
Note that for any feasible \(\alpha\) the primal maximization problem (31) is bounded and has a strictly feasible solution \(p_{ij}^l\); thus there is no duality gap between (31) and the following dual problem:
\[
\min \ D(\theta, \lambda), \quad \theta_i^l \in \mathbb{R},\ \lambda_i^l \in \mathbb{R}_+,\ \forall i \in I,\ l \in L
\]
\[
\iff \quad
\begin{aligned}
\min \ & \sum_{i \in I} \sum_{l \in L} \bigl( \epsilon \lambda_i^l - \theta_i^l \bigr) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, w_{ij}^l + |I| - t \\
\text{s.t.} \ & w_{ij}^l \ge \lambda_i^l\, d^* \bigl( r_{ij}^l / \lambda_i^l \bigr), \quad \forall i \in I,\ l \in L,\ j \in J, \\
& \theta_i^l \in \mathbb{R},\ \lambda_i^l \in \mathbb{R}_+, \quad \forall i \in I,\ l \in L.
\end{aligned} \qquad (34)
\]
Next we show that the constraint involving the conjugate function can be represented by second order cone constraints:
\[
\begin{aligned}
\lambda_i^l\, d^* \Bigl( \frac{r_{ij}^l}{\lambda_i^l} \Bigr) \le w_{ij}^l
&\iff \lambda_i^l \Bigl( -1 + \frac{1}{4} \Bigl[ \frac{r_{ij}^l}{\lambda_i^l} + 2 \Bigr]_+^2 \Bigr) \le w_{ij}^l \\
&\iff 4 \lambda_i^l \bigl( \lambda_i^l + w_{ij}^l \bigr) \ge \bigl[ r_{ij}^l + 2 \lambda_i^l \bigr]_+^2 \\
&\iff 4 \lambda_i^l \bigl( \lambda_i^l + w_{ij}^l \bigr) \ge \bigl( z_{ij}^l \bigr)^2, \quad z_{ij}^l \ge 0, \quad z_{ij}^l \ge r_{ij}^l + 2 \lambda_i^l \\
&\iff \bigl( w_{ij}^l,\ z_{ij}^l,\ 2 \lambda_i^l + w_{ij}^l \bigr)^T \in L^3, \quad z_{ij}^l \ge 0, \quad z_{ij}^l \ge r_{ij}^l + 2 \lambda_i^l.
\end{aligned} \qquad (35)
\]
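The last equivalence in (35) — membership in \(L^3\) versus the rotated quadratic inequality — can be spot-checked numerically (function names are ours):

```python
import numpy as np

def in_L3(x):
    """Membership in the second order cone L^3: x_3 >= sqrt(x_1^2 + x_2^2)."""
    return x[2] >= np.hypot(x[0], x[1])

def quad_form_holds(w, z, lam):
    """4*lam*(lam + w) >= z^2 together with 2*lam + w >= 0."""
    return 2.0 * lam + w >= 0.0 and 4.0 * lam * (lam + w) >= z * z

# random spot-check of the equivalence used in (35)
rng = np.random.default_rng(0)
for _ in range(1000):
    w, z = rng.normal(size=2)
    lam = abs(rng.normal())
    assert in_L3((w, z, 2.0 * lam + w)) == quad_form_holds(w, z, lam)
```

The algebra behind the check: \((2\lambda + w)^2 = 4\lambda^2 + 4\lambda w + w^2 \ge w^2 + z^2\) exactly when \(4\lambda(\lambda + w) \ge z^2\).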
By substituting the above constraints into (MP), the robust objective function is equivalent to the following problem:
\[
\begin{aligned}
\min \ & t \\
\text{s.t.} \ & \sum_{i \in I} \sum_{l \in L} \bigl( \epsilon \lambda_i^l - \theta_i^l \bigr) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, w_{ij}^l + |I| \le t, \\
& \bigl( w_{ij}^l,\ z_{ij}^l,\ 2 \lambda_i^l + w_{ij}^l \bigr)^T \in L^3, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& z_{ij}^l \ge r_{ij}^l + 2 \lambda_i^l, \quad z_{ij}^l,\ \lambda_i^l \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& r_{ij}^l = \alpha_j^l (1 - 2 y_{ij}) + \theta_i^l, \quad \forall i \in I,\ l \in L,\ j \in J.
\end{aligned} \qquad (36)
\]
By eliminating the variable \(t\), we complete the proof.

Based on Lemma 2 and Proposition 3, we obtain our main result.
Proposition 4. The RPC problem can be solved as the following second order cone program:
\[
\begin{aligned}
\min \ & \sum_{i \in I} \sum_{l \in L} \bigl( \epsilon \lambda_i^l - \theta_i^l \bigr) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, w_{ij}^l + |I| \\
\text{s.t.} \ & \bigl( w_{ij}^l,\ z_{ij}^l,\ 2 \lambda_i^l + w_{ij}^l \bigr)^T \in L^3, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& r_{ij}^l = \alpha_j^l (1 - 2 y_{ij}) + \theta_i^l, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& z_{ij}^l \ge r_{ij}^l + 2 \lambda_i^l, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& \sum_{l \in L} \bigl( \underline{q}_{ij}^l\, u_{ij}^{l0} - \overline{q}_{ij}^l\, v_{ij}^{l0} \bigr) \ge 0, \quad \forall i \in I,\ j \in J, \\
& 1 + \sum_{l \in L} \bigl( \underline{q}_{ij}^l\, u_{ij}^{l1} - \overline{q}_{ij}^l\, v_{ij}^{l1} \bigr) \ge 0, \quad \forall i \in I,\ j \in J, \\
& \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& v_{ij}^{l1} - \alpha_j^l - u_{ij}^{l1} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& \lambda_i^l,\ z_{ij}^l,\ u_{ij}^{l1},\ v_{ij}^{l1},\ u_{ij}^{l0},\ v_{ij}^{l0} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& r_{ij}^l,\ \theta_i^l,\ w_{ij}^l,\ \alpha_j^l \in \mathbb{R}, \quad \forall i \in I,\ j \in J,\ l \in L.
\end{aligned} \qquad (37)
\]
4. Numerical Experiments on Real-World Applications
In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.
All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second order cone programs in our proposed method and in the regularized SVM.
4.1. Data Sets. Lithology classification is one of the basic tasks of geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as Gamma coefficients, acoustic wave, striation density, and fusibility.
Here numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, there are 12 data sets used for binary classification problems and 8 data sets used for multiple classification problems. Each data set is randomly partitioned, according to a prespecified training rate \(\gamma \in [0, 1]\), into a training set and a test set such that the training set accounts for a fraction \(\gamma\) of the total number of samples.
4.2. Experiment Design. The parameters in our models are chosen based on the size of the data set. The parameter \(\epsilon\) depends on the number of classes and is defined as \(\epsilon = \delta^2 / |J|\), where \(\delta \in (0, 1)\). The choice of \(\epsilon\) can be explained in this way: if there are \(|J|\) classes and the training data are uniformly distributed, then for each probability \(p_{ij}^l = 1/|J|\) its maximal variation range is between \(p_{ij}^l (1 - \delta)\) and \(p_{ij}^l (1 + \delta)\). The number of data intervals \(K_l\) is defined as \(K_l = |I| / (|J| \times K)\), so that if the training data are uniformly distributed, then each data interval contains \(K\) samples of each class. In the following, we set \(\delta = 0.2\) and \(K = 8\).
We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (take the \(j\)th class, for example):
\[
(\mathrm{RSVM}) \quad
\begin{aligned}
\min \ & \sum_{i \in I} \xi_{ij} + \lambda_j \bigl\| w_j \bigr\| \\
\text{s.t.} \ & \tilde{y}_{ij} \Bigl( \sum_{l \in L} w_j^l\, x_i^l + b_j \Bigr) \ge 1 - \xi_{ij}, \quad i \in I, \\
& \xi_{ij} \ge 0, \quad i \in I,
\end{aligned} \qquad (38)
\]
where \(\tilde{y}_{ij} = 2 y_{ij} - 1\) and \(\lambda_j \ge 0\) is a regularization parameter. As pointed out in [8], \(\lambda_j\) represents a trade-off between the number of training set errors and the amount of robustness
Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set (* marks the best test-set accuracy).

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 90.7 | 88.2 | 63.9 | 66.2 | 88.4 | 90.5*
55 | 89.9 | 88.6 | 69.1 | 72.8 | 89.5 | 89.9*
60 | 89.0 | 85.0 | 70.3 | 72.1 | 91.3 | 86.4*
65 | 86.3 | 85.9 | 72.1 | 72.8 | 88.0 | 92.5*
70 | 92.3 | 84.1 | 70.3 | 75.7 | 90.8 | 86.3*
75 | 88.8 | 87.9 | 74.2 | 74.6 | 88.7 | 91.6*
80 | 88.7 | 93.8* | 90.0 | 87.5 | 88.3 | 93.3
85 | 89.5 | 89.3 | 93.4 | 89.6 | 89.2 | 91.0*
90 | 89.5 | 88.4 | 93.3 | 95.8* | 89.2 | 92.6
Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set (* marks the best test-set accuracy).

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 91.4 | 84.8 | 76.5 | 68.9 | 91.3 | 87.5*
55 | 92.5 | 86.6 | 68.0 | 77.0 | 92.0 | 90.3*
60 | 89.8 | 86.1 | 72.9 | 73.8 | 88.9 | 90.9*
65 | 91.0 | 82.3 | 80.5 | 81.6 | 89.8 | 92.9*
70 | 86.8 | 95.5* | 83.4 | 89.8 | 88.4 | 93.7
75 | 89.4 | 85.2 | 85.9 | 79.5 | 89.7 | 93.5*
80 | 91.8 | 80.8 | 88.1 | 79.9 | 89.7 | 91.1*
85 | 88.3 | 89.9 | 89.9 | 92.8 | 90.8 | 97.1*
90 | 88.5 | 90.2 | 88.8 | 94.2 | 90.9 | 97.2*
with respect to spherical perturbations of the data points. To make a fair comparison, in the following experiments we test a series of \(\lambda\) values and choose the one with the best performance. Note that if \(\lambda_j = 0\), we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.
4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for binary classification problems using cross-validation methods. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].
Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. Each data set is randomly partitioned into a training set and a test set based on the parameter tr, which varies from 50% to 90%. The highest classification accuracy on a training set among the three methods is highlighted in bold, while the best classification accuracy on a test set is marked with an asterisk.
Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, in most cases RSVM has the highest classification accuracy on the training sets, but its performance on the test sets is unsatisfactory. In most cases the proposed RPC provides the highest classification accuracy on the test sets. NBC provides better performance on the test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can provide better performance on the test sets than on the training sets; thus it can avoid the "overlearning" phenomenon.
To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, namely T41-T45 and T61-T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for the data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As in Tables 1 and 2, the robustness of the proposed RPC guarantees its scalability to the test sets.
4.4. Test on Multiple Classification. In this subsection, we test the performance of RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter \(\lambda\), we run a set of RSVMs with \(\lambda\) varying from 0 to a sufficiently large number and select the one with the best performance on the test sets.
Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separation hyperplane
Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on the other data sets when tr = 70% (* marks the best test-set accuracy).

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
T41 | 62.0 | 59.7 | 82.4 | 78.5 | 77.9 | 83.5*
T42 | 87.0 | 82.2 | 84.1 | 83.1 | 80.5 | 85.3*
T43 | 68.0 | 61.2 | 80.2 | 75.4 | 85.5 | 86.9*
T44 | 91.3 | 83.9 | 77.9 | 86.8 | 88.8 | 90.5*
T45 | 86.5 | 87.0 | 93.2 | 91.0* | 84.0 | 89.1
T61 | 80.6 | 79.0 | 80.5 | 83.0 | 83.6 | 87.8*
T62 | 71.4 | 66.5 | 86.9 | 85.4* | 86.3 | 85.4*
T63 | 63.7 | 69.5 | 89.6 | 89.1* | 82.2 | 84.4
T64 | 88.2 | 86.7 | 97.0 | 96.9* | 93.4 | 95.5
T65 | 75.0 | 63.4 | 79.7 | 81.5 | 90.5 | 92.9*
Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set (* marks the best test-set accuracy).

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
M1 | 65.4 | 68.2 | 72.7 | 73.7 | 79.1 | 77.4*
M2 | 76.9 | 75.3 | 82.6 | 74.8 | 81.7 | 80.9*
M3 | 57.9 | 69.9 | 74.8 | 87.4 | 95.4 | 92.0*
M4 | 70.4 | 64.1 | 97.1 | 92.3 | 95.4 | 92.3*
M5 | 77.4 | 71.3 | 89.4 | 88.1* | 92.0 | 88.0
M6 | 75.7 | 70.5 | 74.1 | 79.4 | 86.4 | 80.8*
[Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set — accuracy on the training set (55%-95%) versus training rate (0.6-0.9).]
for binary classification problems, while RPC is more robust when extended to multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.
Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that in most cases RPC provides the highest accuracy among the three methods. The accuracy of RSVM outperforms that of NBC on the Y5 test set, while the latter outperforms the former on the T1 test set.

[Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set — accuracy on the test set (55%-100%) versus training rate (0.6-0.9).]
To further test the performance of RPC on multiple classification problems, we carry out more experiments on the data sets M1-M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).

[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set — accuracy on the training set (60%-85%) versus training rate (0.6-0.9).]

[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set — accuracy on the test set (55%-90%) versus training rate (0.6-0.9).]
From the tested real-life application, we conclude that the proposed RPC is robust enough to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.
5. Conclusion
In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified \(\chi^2\)-distance. We assume that the true distribution lies in the constructed distributional set centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all the possible distributions belonging to the distributional set.
Our proposed model introduces the recently developeddistributionally robust optimization method into the clas-sifier design problems To obtain a computable modelwe transform the resulted optimization problem into anequivalent second order cone programming based on conicduality theorem Thus our model has the same compu-tational complexity as the classic support vector machineand numerical experiments on real-life application validateits effectiveness On the one hand the proposed robustprobability classifier provides a higher accuracy comparedwith RSVM and NBC by avoiding overlearning on trainingsets for binary classification problems on the other hand italso has a promising performance for multiple classificationproblems
There are still many important extensions in our modelOther forms of loss function such as the mean squarederror function and Hinge loss functions should be studied toobtain tractable reformulations and the resulted models mayprovide better performances Probability models consideringjoint probability distribution information are also interestingresearch directions
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223–228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209–226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization—methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453–480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801–807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555–582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558–563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253–1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175–1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191–201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123–136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691–710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85–90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.
[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3—a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341–357, 2013.
However, such an estimation method leads to the "curse of dimensionality."

To address this issue, the naive Bayes classifier makes the following "conditional independence assumption":
$$P(x \mid j) = \prod_{l=1}^{|L|} p_j^l(x), \tag{4}$$

where $p_j^l(x) = P(x^l \mid j)$ is the class-conditional probability that the observation $x$ belongs to the $j$th class based on the $l$th feature. Here we introduce another "linear combination assumption" for the class-conditional probability:
$$P(x \mid j) = \sum_{l=1}^{|L|} \beta_j^l\, p_j^l(x), \tag{5}$$

where $\beta_j^l$ is a coefficient. Compared with the "conditional independence assumption," which combines the probabilistic information by multiplication, the proposed "linear combination assumption" combines it by a weighted sum. We will further discuss the rationality of this assumption at the end of this subsection.
Under this assumption, we have

$$P(j \mid x) \propto P(j)\,P(x \mid j) = P(j) \sum_{l=1}^{|L|} \beta_j^l\, p_j^l(x) = \sum_{l=1}^{|L|} \alpha_j^l\, p_j^l(x), \tag{6}$$

where $\alpha_j^l = P(j)\,\beta_j^l$ denotes the probability weight of the $l$th feature for the $j$th class.

To obtain the optimal probability classifier based on the "linear combination assumption," it is natural to consider the following optimization problem:
$$\min_{\alpha \in \Theta}\ \sum_{j \in J} \sum_{i \in I} L\big(P(j \mid x_i),\, y_{ij}\big), \tag{7}$$

where $L(\cdot,\cdot): \mathbb{R} \times \mathbb{R} \to \mathbb{R}_+$ is a prespecified loss function. In what follows, we take the absolute error function as our loss function, that is, $L(x, y) = |x - y|$. In view of its probability property, it is natural to impose the following constraints on the posterior probability:

$$0 \le f(j \mid x_i) \le 1, \quad \forall i \in I,\ j \in J. \tag{8}$$
Under these constraints, we have

$$\begin{aligned}
\sum_{j \in J} \sum_{i \in I} L\big(f(j \mid x_i),\, y_{ij}\big)
&= \sum_{j \in J} \sum_{i \in I} \big| f(j \mid x_i) - y_{ij} \big| \\
&= \sum_{j \in J} \sum_{i \in I} \Big[ y_{ij}\big(1 - f(j \mid x_i)\big) + (1 - y_{ij})\, f(j \mid x_i) \Big] \\
&= \sum_{j \in J} \sum_{i \in I} (1 - 2y_{ij})\, f(j \mid x_i) + |I|,
\end{aligned} \tag{9}$$

where $|I| = \sum_{j \in J} \sum_{i \in I} y_{ij}$.
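The reduction in (9) rests on the identity $|f - y| = y(1 - f) + (1 - y)f$ for $y \in \{0,1\}$ and $f \in [0,1]$, which makes the objective linear in the posterior estimates. A minimal numerical sketch of this identity (variable names are ours, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.uniform(size=200)           # posterior estimates f(j | x_i) in [0, 1]
y = rng.integers(0, 2, size=200)    # binary indicators y_ij in {0, 1}

absolute_loss = np.abs(f - y).sum()               # left-hand side of (9)
linear_form = ((1 - 2 * y) * f).sum() + y.sum()   # (1 - 2y) f summed, plus |I|
assert np.isclose(absolute_loss, linear_form)
```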
Thus, the optimal probability classifier (PC) problem can be formulated as follows:

$$\begin{aligned}
\text{(PC)} \quad \min \quad & \sum_{j \in J} \sum_{i \in I} (1 - 2y_{ij}) \sum_{l \in L} \alpha_j^l\, p_{ij}^l + |I| \\
\text{s.t.} \quad & 0 \le \sum_{l \in L} \alpha_j^l\, p_{ij}^l \le 1, \quad \forall i \in I,\ j \in J.
\end{aligned} \tag{10}$$
Admittedly, the "linear combination assumption" may not always hold. However, we justify the proposed classifier by the following facts.
(1) As an intuitive interpretation, note that $p_j^l(x)$ estimates the probability that the observation $x$ belongs to the $j$th class based only on the $l$th feature; thus it provides partial probabilistic information about the sample. Hence we can interpret the weight $\alpha_j^l$ as a degree of trust in this information, and in this sense the "linear combination assumption" is a way of combining evidence from different sources. Similar ideas can be found in the Dempster-Shafer theory of evidence [24, 25].

(2) In terms of classification performance, in the worst case the proposed classifier may put all weight on a single feature; in that case it is equivalent to a Bayes classifier based on a well-selected feature. If each class has a "typical" feature that distinguishes it from the other classes, the proposed classifier can learn this property by putting different weights on different features for different classes, and thus provide better classification performance. A real-life application to lithology classification problems also validates its classification performance in comparison with support vector machines and the naive Bayes classifier.

(3) Another advantage of the proposed classifier is its computability. As we show in Section 3, the proposed classifier and its robust counterpart can be reformulated as second order cone programming problems and thus solved by interior point algorithms in polynomial time.
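To make facts (1) and (2) concrete, the following toy sketch contrasts the product rule (4) with the weighted-sum rule (6) on hypothetical per-feature probabilities (all numbers are illustrative, not from the paper's data):

```python
import numpy as np

# Hypothetical per-feature class-conditional probabilities p_j^l(x) for one
# sample: rows are features l, columns are classes j.
p = np.array([[0.9, 0.1],    # feature 1 strongly favors class 0
              [0.4, 0.6],
              [0.5, 0.5]])
prior = np.array([0.5, 0.5])

# Conditional independence assumption (4): multiply across features.
nb_score = prior * np.prod(p, axis=0)

# Linear combination assumption (6) with uniform weights beta_j^l = 1/3,
# so alpha_j^l = P(j) * beta_j^l.
alpha = prior[None, :] / p.shape[0]
lin_score = np.sum(alpha * p, axis=0)

# Putting all weight on feature 1 recovers a Bayes classifier on that
# single, well-selected feature (fact (2)).
single_score = prior * p[0]
```

Here all three rules agree on the predicted class; the learned weights $\alpha_j^l$ let the robust classifier interpolate between these extremes per class.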
2.2. Robust Probability Classifier. Due to observational noise, the true class-conditional probability distribution is often difficult to obtain. Instead, we can construct a confidence distributional set which contains the true distribution. Unlike the traditional distributional sets in minimax probability machines, which only utilize the mean and covariance matrix, we construct our class-conditional probability distributional set based on the modified χ²-distance, which uses more information from the samples.
The modified χ²-distance $d(\cdot,\cdot): \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ is used in statistics to measure the distance between two discrete probability distribution vectors. For given $p = (p_1, \dots, p_m)^T$ and $q = (q_1, \dots, q_m)^T$, it is defined as

$$d(q, p) = \sum_{j=1}^{m} \frac{(q_j - p_j)^2}{p_j}. \tag{11}$$
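A direct implementation of (11) as a short sketch; note that $d$ is not symmetric in its arguments, which is why the nominal distribution always appears as the second argument:

```python
import numpy as np

def chi2_mod(q, p):
    """Modified chi-square distance d(q, p) = sum_j (q_j - p_j)^2 / p_j."""
    q, p = np.asarray(q, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum((q - p) ** 2 / p))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

d_qp = chi2_mod(q, p)   # 0.01/0.5 + 0.01/0.3 + 0 = 0.0533...
d_pq = chi2_mod(p, q)   # 0.01/0.4 + 0.01/0.4 + 0 = 0.05
assert d_qp != d_pq     # the modified chi-square distance is asymmetric
```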
Based on the modified χ²-distance, we present the following class-conditional probability distributional set:

$$P_\epsilon = \left\{ \{q_{ij}^l\} : \sum_{j \in J} q_{ij}^l = 1,\ q_{ij}^l \ge 0,\ \sum_{j \in J} \frac{(q_{ij}^l - p_{ij}^l)^2}{p_{ij}^l} \le \epsilon,\ \forall i \in I,\ l \in L \right\}, \tag{12}$$

where $p_{ij}^l$ is the nominal class-conditional probability that the $i$th sample belongs to the $j$th class based on the $l$th feature, and the prespecified parameter $\epsilon$ controls the size of the set.
To design a robust classifier, we need to consider the effect of data uncertainty on both the objective function and the constraints. The robust objective function minimizes the worst-case loss function value over all possible distributions in the distributional set $P_\epsilon$; the robust constraints ensure that all the original constraints are satisfied for every distribution in $P_\epsilon$. Thus, the robust probability classifier problem takes the following form:
$$\begin{aligned}
\text{(RPC)} \quad \min \quad & \max \left\{ \sum_{j \in J} \sum_{i \in I} (1 - 2y_{ij}) \sum_{l \in L} \alpha_j^l\, q_{ij}^l + |I| : \{q_{ij}^l\} \in P_\epsilon \right\} \\
\text{s.t.} \quad & 0 \le \sum_{l \in L} \alpha_j^l\, q_{ij}^l \le 1, \quad \forall \{q_{ij}^l\} \in P_\epsilon,\ \forall i \in I,\ j \in J.
\end{aligned} \tag{13}$$

Note that the above optimization problem has an infinite number of robust constraints, and its objective function is itself an embedded subproblem. We show how to solve this minimax optimization problem in Section 3.
2.3. Constructing the Distributional Set. To obtain the distributional set $P_\epsilon$, we need to specify the parameter $\epsilon$ and the nominal probabilities $p_{ij}^l$. The selection of $\epsilon$ is application dependent, and we discuss this issue in the numerical experiment section; next we provide a procedure to calculate $p_{ij}^l$.

For the $l$th feature, the following procedure takes an integer $K_l$, indicating the number of data intervals, as input and outputs the estimated probability $p_{ij}^l$ that the $i$th sample belongs to the $j$th class.

(1) Sort the samples in increasing order and divide them into $K_l$ intervals such that each interval contains at least $\lfloor |I|/K_l \rfloor$ samples. Denote the $k$th interval by $\Delta_{lk}$.

(2) Calculate the total number of samples in the $j$th class, $N_j$; the total number of samples in the $k$th interval, $N_{lk}$; and the total number of samples of the $j$th class in the $k$th interval, $N_{lkj}$.
(3) For the $i$th sample, if it falls into the $k$th interval, the class-conditional probability $p_{ij}^l$ is calculated by

$$\begin{aligned}
p_{ij}^l &= \mathrm{Prob}\,(i \in j \mid x_{il} \in \Delta_{lk})
= \frac{\mathrm{Prob}\,(i \in j,\ x_{il} \in \Delta_{lk})}{\mathrm{Prob}\,(x_{il} \in \Delta_{lk})} \\
&= \frac{\mathrm{Prob}\,(i \in j)\,\mathrm{Prob}\,(x_{il} \in \Delta_{lk} \mid i \in j)}{\sum_{j' \in J} \mathrm{Prob}\,(i \in j')\,\mathrm{Prob}\,(x_{il} \in \Delta_{lk} \mid i \in j')} \\
&= \frac{(N_j/|I|) \cdot (N_{lkj}/N_j)}{\sum_{j' \in J} (N_{j'}/|I|) \cdot (N_{lkj'}/N_{j'})}
= \frac{N_{lkj}}{N_{lk}}.
\end{aligned} \tag{14}$$
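The three-step procedure can be sketched as follows for a single feature (a simplified implementation, assuming equal-frequency bins via `np.array_split`; function and variable names are ours):

```python
import numpy as np

def nominal_probs(x, y, K):
    """Estimate p_ij for one feature by the 3-step binning procedure:
    sort samples into K equal-frequency intervals, then set
    p_ij = N_kj / N_k for the interval k containing sample i."""
    n = len(x)
    order = np.argsort(x)                  # step 1: sort in increasing order
    bins = np.array_split(order, K)        # K intervals of (near-)equal size
    classes = np.unique(y)
    p = np.zeros((n, len(classes)))
    for idx in bins:
        Nk = len(idx)                      # N_k: samples in interval k
        for j, c in enumerate(classes):
            Nkj = int(np.sum(y[idx] == c))  # N_kj: class-c samples in interval k
            p[idx, j] = Nkj / Nk            # step 3: p_ij = N_kj / N_k
    return p

x = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 1.0])
y = np.array([0, 0, 1, 1, 1, 1])
p = nominal_probs(x, y, K=2)   # rows sum to 1 by construction
```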
Note that from the definition of $P_\epsilon$ we can easily compute an upper bound $\overline{q}_{ij}^l$ and a lower bound $\underline{q}_{ij}^l$ for the true class-conditional probability $q_{ij}^l$ as follows:

$$\overline{q}_{ij}^l = \max \left\{ q_{ij}^l : \sum_{s \in J} q_{is}^l = 1,\ \sum_{s \in J} \frac{(q_{is}^l - p_{is}^l)^2}{p_{is}^l} \le \epsilon,\ q_{is}^l \ge 0\ \forall s \in J \right\}, \tag{15}$$

$$\underline{q}_{ij}^l = \min \left\{ q_{ij}^l : \sum_{s \in J} q_{is}^l = 1,\ \sum_{s \in J} \frac{(q_{is}^l - p_{is}^l)^2}{p_{is}^l} \le \epsilon,\ q_{is}^l \ge 0\ \forall s \in J \right\}. \tag{16}$$

The above problems can be efficiently solved by a second order cone solver such as SeDuMi [26] or SDPT3 [27].
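When the nonnegativity constraints in (15)-(16) are inactive, a closed form is available: maximizing $q_{ij}^l$ over the intersection of the simplex hyperplane and the χ²-ball gives $\overline{q}_{ij}^l = p_{ij}^l + \sqrt{\epsilon\, p_{ij}^l (1 - p_{ij}^l)}$, and symmetrically for the lower bound. This is our own derivation, offered as a hedged sketch; clipping to $[0, 1]$ keeps the bounds valid (possibly loose) in general. A sampling-based sanity check:

```python
import numpy as np

def chi2_mod(q, p):
    return np.sum((q - p) ** 2 / p)

def q_bounds(p, eps):
    # Closed-form bounds for (15)-(16) when q >= 0 is inactive; clipping to
    # [0, 1] keeps them valid (possibly loose) bounds in general.
    r = np.sqrt(eps * p * (1.0 - p))
    return np.clip(p - r, 0.0, 1.0), np.clip(p + r, 0.0, 1.0)

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])
eps = 0.05
lo, hi = q_bounds(p, eps)

for _ in range(2000):
    v = rng.normal(size=3)
    v -= v.mean()                                  # keep sum(q) = 1
    q = p + v * np.sqrt(eps / np.sum(v ** 2 / p)) * rng.uniform()
    if np.any(q < 0):
        continue                                   # stay on the simplex
    assert chi2_mod(q, p) <= eps + 1e-9            # q is feasible for (12)
    assert np.all(lo - 1e-9 <= q) and np.all(q <= hi + 1e-9)
```

In practice the paper solves (15)-(16) with SeDuMi or SDPT3; the closed form above is only a quick consistency check.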
3. Solution Methods for RPC

In this section, we first reduce the infinite number of robust constraints to a finite set of linear constraints and then transform the inner robust objective function into a minimization problem by the conic duality theorem. Finally, we obtain an equivalent computable second order cone program for the RPC problem. The following analysis is based on the strong duality result in [8].
Consider a conic program of the following form:

$$\begin{aligned}
\text{(CP)} \quad \min \quad & c^T x \\
\text{s.t.} \quad & A_i x - b_i \in C_i, \quad \forall i = 1, \dots, m, \\
& Ax = b,
\end{aligned} \tag{17}$$

and its dual problem

$$\begin{aligned}
\text{(DP)} \quad \max \quad & b^T z + \sum_{i=1}^{m} b_i^T y_i \\
\text{s.t.} \quad & A^* z + \sum_{i=1}^{m} A_i^* y_i = c, \\
& y_i \in C_i^*, \quad \forall i = 1, \dots, m,
\end{aligned} \tag{18}$$

where $C_i$ is a cone in $\mathbb{R}^{n_i}$ and $C_i^*$ is its dual cone, defined by

$$C_i^* = \left\{ y \in \mathbb{R}^{n_i} : y^T x \ge 0,\ \forall x \in C_i \right\}. \tag{19}$$

A conic program is called strictly feasible if it admits a feasible solution $x$ such that $A_i x - b_i \in \mathrm{int}\, C_i$, $\forall i = 1, \dots, m$, where $\mathrm{int}\, C_i$ denotes the interior of $C_i$.
Lemma 1 (see [8]). If one of the problems (CP) and (DP) is strictly feasible and bounded, then the other problem is solvable and (CP) = (DP), in the sense that both have the same optimal objective function value.
3.1. Robust Constraints. The following lemma provides an equivalent characterization of the infinite number of robust constraints in terms of a finite set of linear constraints, which can be handled efficiently.
Lemma 2. For given $i$, $j$, the robust constraint

$$0 \le \sum_{l \in L} \alpha_j^l\, q_{ij}^l \le 1, \quad \forall \{q_{ij}^l\} \in P_\epsilon, \tag{20}$$

is equivalent to the following constraints:

$$\begin{aligned}
& \sum_{l \in L} \big( \underline{q}_{ij}^l u_{ij}^{l0} - \overline{q}_{ij}^l v_{ij}^{l0} \big) \ge 0, \\
& \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0, \quad u_{ij}^{l0},\ v_{ij}^{l0} \ge 0, \quad \forall l \in L, \\
& 1 + \sum_{l \in L} \big( \underline{q}_{ij}^l u_{ij}^{l1} - \overline{q}_{ij}^l v_{ij}^{l1} \big) \ge 0, \\
& v_{ij}^{l1} - \alpha_j^l - u_{ij}^{l1} \ge 0, \quad u_{ij}^{l1},\ v_{ij}^{l1} \ge 0, \quad \forall l \in L.
\end{aligned} \tag{21}$$
Proof. First, note that the distributional set $P_\epsilon$ can be represented as the Cartesian product of a series of projected subsets:

$$P_\epsilon = \prod_{i \in I} P_{\epsilon i}, \tag{22}$$

where the projected subset for index $i$ is defined by

$$P_{\epsilon i} = \left\{ \{q_{ij}^l\} : \sum_{j \in J} q_{ij}^l = 1,\ q_{ij}^l \ge 0,\ \sum_{j \in J} \frac{(q_{ij}^l - p_{ij}^l)^2}{p_{ij}^l} \le \epsilon,\ \forall l \in L \right\}. \tag{23}$$

Then, for given $i$, $j$, since the robust constraint is only associated with the variables $q_{ij}^l$, $l \in L$, we can further split the projected subset $P_{\epsilon i}$ into $|J|$ subsets:

$$P_{\epsilon i} = \prod_{j \in J} P_{\epsilon ij} = \prod_{j \in J} \left\{ q_{ij}^l : \underline{q}_{ij}^l \le q_{ij}^l \le \overline{q}_{ij}^l,\ \forall l \in L \right\}, \tag{24}$$

where $\overline{q}_{ij}^l$ and $\underline{q}_{ij}^l$ are computed by (15) and (16), respectively.

The constraint $\sum_{l \in L} \alpha_j^l q_{ij}^l \ge 0$, $\forall \{q_{ij}^l\} \in P_\epsilon$, is handled by the following chain of equivalences:

$$\begin{aligned}
& \sum_{l \in L} \alpha_j^l\, q_{ij}^l \ge 0, \quad \forall \{q_{ij}^l\} \in P_{\epsilon i} \\
\Longleftrightarrow\ & \sum_{l \in L} \alpha_j^l\, q_{ij}^l \ge 0, \quad \forall \{q_{ij}^l\} \in P_{\epsilon ij} \\
\Longleftrightarrow\ & \min \left\{ \sum_{l \in L} \alpha_j^l\, q_{ij}^l : \underline{q}_{ij}^l \le q_{ij}^l \le \overline{q}_{ij}^l,\ \forall l \in L \right\} \ge 0 \\
\Longleftrightarrow\ & \max \left\{ \sum_{l \in L} \big( \underline{q}_{ij}^l u_{ij}^{l0} - \overline{q}_{ij}^l v_{ij}^{l0} \big) : \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0,\ u_{ij}^{l0}, v_{ij}^{l0} \ge 0,\ \forall l \in L \right\} \ge 0 \\
\Longleftrightarrow\ & \sum_{l \in L} \big( \underline{q}_{ij}^l u_{ij}^{l0} - \overline{q}_{ij}^l v_{ij}^{l0} \big) \ge 0, \quad \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0,\ u_{ij}^{l0}, v_{ij}^{l0} \ge 0,\ \forall l \in L,
\end{aligned} \tag{25}$$

where the last equivalence comes from strong duality between the two linear programs.

For the constraint $\sum_{l \in L} \alpha_j^l q_{ij}^l \le 1$, $\forall \{q_{ij}^l\} \in P_\epsilon$, the same technique applies, which completes the proof.
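The strong-duality step at the heart of the proof can be checked numerically: for a box-constrained linear program, the primal minimum sits at a vertex, and the dual choice $u_l = [\alpha_l]_+$, $v_l = [-\alpha_l]_+$ is feasible ($\alpha_l - u_l + v_l = 0$) and attains the same value. A self-contained sketch with random data (names ours):

```python
import numpy as np

rng = np.random.default_rng(2)
for _ in range(100):
    alpha = rng.normal(size=4)
    lo = rng.uniform(0.0, 0.4, size=4)       # lower bounds q_underline
    hi = lo + rng.uniform(0.0, 0.4, size=4)  # upper bounds q_overline
    # Primal: the minimum of a linear function over a box sits at a vertex.
    primal = np.sum(np.where(alpha >= 0, alpha * lo, alpha * hi))
    # Dual: u = [alpha]_+, v = [-alpha]_+ is feasible (alpha - u + v = 0)
    # and attains the same objective value -- strong LP duality.
    u, v = np.maximum(alpha, 0.0), np.maximum(-alpha, 0.0)
    dual = np.sum(lo * u - hi * v)
    assert np.isclose(primal, dual)
```

The robust lower constraint in (20) then holds exactly when this common optimal value is nonnegative, which is what (25) states.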
3.2. Robust Objective Function. In the RPC problem, the robust objective function is defined by an inner maximization problem. The following proposition shows that it can be transformed into a minimization problem over second order cones. To prove this result, we use the conjugate function $d^*$ of the (scalar) modified χ²-distance $d(t) = (t - 1)^2$:

$$d^*(s) = \sup_{t \ge 0}\, \{ st - d(t) \} = \frac{[s + 2]_+^2}{4} - 1, \tag{26}$$

where $[\cdot]_+$ is defined by $[x]_+ = x$ if $x \ge 0$ and $[x]_+ = 0$ otherwise. For more details about conjugate functions, see [28].
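Formula (26) can be verified by brute-force maximization over a dense grid of $t \ge 0$ (a quick numerical sketch):

```python
import numpy as np

def d_star(s):
    """Conjugate of d(t) = (t - 1)^2 over t >= 0, as in (26)."""
    return np.maximum(s + 2.0, 0.0) ** 2 / 4.0 - 1.0

t = np.linspace(0.0, 50.0, 2_000_001)       # dense grid over t >= 0
for s in (-3.0, -1.0, 0.0, 2.5):
    brute = np.max(s * t - (t - 1.0) ** 2)  # brute-force supremum on the grid
    assert abs(brute - d_star(s)) < 1e-6
```

For $s \ge -2$ the supremum is attained at $t = 1 + s/2$; for $s < -2$ it is attained at the boundary $t = 0$, giving the constant value $-1$.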
Proposition 3. The inner maximization problem

$$\max \left\{ \sum_{j \in J} \sum_{i \in I} (1 - 2y_{ij}) \sum_{l \in L} \alpha_j^l\, q_{ij}^l + |I| : \{q_{ij}^l\} \in P_\epsilon \right\} \tag{27}$$

is equivalent to the second order cone program

$$\begin{aligned}
\min \quad & \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda_i^l - \theta_i^l \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, w_{ij}^l + |I| \\
\text{s.t.} \quad & \big( w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l + w_{ij}^l \big) \in L^3, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& r_{ij}^l = \alpha_j^l (1 - 2y_{ij}) + \theta_i^l, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& z_{ij}^l \ge r_{ij}^l + 2\lambda_i^l, \quad \lambda_i^l,\ z_{ij}^l \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L,
\end{aligned} \tag{28}$$

where the second order cone $L^{n+1}$ is defined as

$$L^{n+1} = \left\{ x \in \mathbb{R}^{n+1} : x_{n+1} \ge \sqrt{\sum_{i=1}^{n} x_i^2} \right\}. \tag{29}$$
Proof. For a given feasible $\alpha$ satisfying the robust constraints, it is straightforward to show that the inner maximization problem equals the following minimization problem (MP):

$$\begin{aligned}
\text{(MP)} \quad \min \quad & t \\
\text{s.t.} \quad & t \ge \sum_{j \in J} \sum_{i \in I} (1 - 2y_{ij}) \sum_{l \in L} \alpha_j^l\, q_{ij}^l + |I|, \quad \forall \{q_{ij}^l\} \in P_\epsilon.
\end{aligned} \tag{30}$$

The above constraint can be further reduced to

$$\max \left\{ \sum_{j \in J} \sum_{i \in I} (1 - 2y_{ij}) \sum_{l \in L} \alpha_j^l\, q_{ij}^l : \{q_{ij}^l\} \in P_\epsilon \right\} + |I| - t \le 0. \tag{31}$$
By assigning Lagrange multipliers $\theta_i^l \in \mathbb{R}$ and $\lambda_i^l \in \mathbb{R}_+$ to the constraints of the left-hand maximization problem, we obtain the following Lagrange function:

$$L(q, \theta, \lambda) = \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda_i^l - \theta_i^l \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} \left( r_{ij}^l\, q_{ij}^l - \lambda_i^l \frac{(q_{ij}^l - p_{ij}^l)^2}{p_{ij}^l} \right) + |I| - t, \tag{32}$$

where $r_{ij}^l = \alpha_j^l (1 - 2y_{ij}) + \theta_i^l$. Its dual function is given by
$$\begin{aligned}
D(\theta, \lambda) &= \max_{q \ge 0}\, L(q, \theta, \lambda) \\
&= \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda_i^l - \theta_i^l \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} \max_{q_{ij}^l \ge 0} \left( r_{ij}^l\, q_{ij}^l - \lambda_i^l\, p_{ij}^l \Big( \frac{q_{ij}^l - p_{ij}^l}{p_{ij}^l} \Big)^2 \right) + |I| - t \\
&= \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda_i^l - \theta_i^l \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l \max_{\tau \ge 0} \big( r_{ij}^l\, \tau - \lambda_i^l (\tau - 1)^2 \big) + |I| - t \\
&= \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda_i^l - \theta_i^l \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, \lambda_i^l \max_{\tau \ge 0} \left( \frac{r_{ij}^l}{\lambda_i^l} \tau - (\tau - 1)^2 \right) + |I| - t \\
&= \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda_i^l - \theta_i^l \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, \lambda_i^l\, d^* \Big( \frac{r_{ij}^l}{\lambda_i^l} \Big) + |I| - t,
\end{aligned} \tag{33}$$

where the substitution $q_{ij}^l = p_{ij}^l \tau$ yields the third equality.
Note that for any feasible $\alpha$, the primal maximization problem (31) is bounded and has a strictly feasible solution $\{p_{ij}^l\}$; thus there is no duality gap between (31) and the following dual problem:

$$\min \left\{ D(\theta, \lambda) : \theta_i^l \in \mathbb{R},\ \lambda_i^l \in \mathbb{R}_+,\ \forall i \in I,\ l \in L \right\},$$

which is equivalent to

$$\begin{aligned}
\min \quad & \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda_i^l - \theta_i^l \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, w_{ij}^l + |I| - t \\
\text{s.t.} \quad & w_{ij}^l \ge \lambda_i^l\, d^* \Big( \frac{r_{ij}^l}{\lambda_i^l} \Big), \quad \forall i \in I,\ l \in L,\ j \in J, \\
& \theta_i^l \in \mathbb{R},\ \lambda_i^l \in \mathbb{R}_+, \quad \forall i \in I,\ l \in L.
\end{aligned} \tag{34}$$
Next we show that the constraint involving the conjugate function can be represented by second order cone constraints:

$$\begin{aligned}
\lambda_i^l\, d^* \Big( \frac{r_{ij}^l}{\lambda_i^l} \Big) \le w_{ij}^l
&\Longleftrightarrow \lambda_i^l \left( -1 + \frac{1}{4} \Big[ \frac{r_{ij}^l}{\lambda_i^l} + 2 \Big]_+^2 \right) \le w_{ij}^l \\
&\Longleftrightarrow 4 \lambda_i^l \big( \lambda_i^l + w_{ij}^l \big) \ge \big[ r_{ij}^l + 2 \lambda_i^l \big]_+^2 \\
&\Longleftrightarrow 4 \lambda_i^l \big( \lambda_i^l + w_{ij}^l \big) \ge \big( z_{ij}^l \big)^2, \quad z_{ij}^l \ge 0,\ z_{ij}^l \ge r_{ij}^l + 2 \lambda_i^l \\
&\Longleftrightarrow \big( w_{ij}^l,\ z_{ij}^l,\ 2 \lambda_i^l + w_{ij}^l \big) \in L^3, \quad z_{ij}^l \ge 0,\ z_{ij}^l \ge r_{ij}^l + 2 \lambda_i^l.
\end{aligned} \tag{35}$$
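The chain of equivalences in (35) can be spot-checked numerically for random $(\lambda, r, w)$ with $\lambda > 0$, using the minimal feasible choice $z = [r + 2\lambda]_+$ (a hedged sketch; names ours):

```python
import numpy as np

def in_L3(x):
    """Membership in the second order cone L^3 from (29), with a tiny slack."""
    return x[2] + 1e-12 >= np.hypot(x[0], x[1])

def d_star(s):
    return max(s + 2.0, 0.0) ** 2 / 4.0 - 1.0

rng = np.random.default_rng(3)
for _ in range(500):
    lam = rng.uniform(0.1, 2.0)
    r, w = rng.normal(), rng.normal()
    z = max(r + 2.0 * lam, 0.0)            # minimal z satisfying both bounds
    conj_side = lam * d_star(r / lam) <= w                 # first line of (35)
    cone_side = in_L3(np.array([w, z, 2.0 * lam + w]))     # last line of (35)
    assert conj_side == cone_side
```

The cone condition $(2\lambda + w)^2 \ge w^2 + z^2$ expands to exactly $4\lambda(\lambda + w) \ge z^2$, which is the middle step of (35).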
By reinjecting the above constraints into (MP), the robust objective function is equivalent to the following problem:

$$\begin{aligned}
\min \quad & t \\
\text{s.t.} \quad & \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda_i^l - \theta_i^l \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, w_{ij}^l + |I| \le t, \\
& \big( w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l + w_{ij}^l \big) \in L^3, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& z_{ij}^l \ge r_{ij}^l + 2\lambda_i^l, \quad z_{ij}^l,\ \lambda_i^l \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& r_{ij}^l = \alpha_j^l (1 - 2y_{ij}) + \theta_i^l, \quad \forall i \in I,\ j \in J,\ l \in L.
\end{aligned} \tag{36}$$

By eliminating the variable $t$, we complete the proof.
Based on Lemma 2 and Proposition 3, we obtain our main result.
Proposition 4. The RPC problem can be solved as the following second order cone program:

$$\begin{aligned}
\min \quad & \sum_{i \in I} \sum_{l \in L} \big( \epsilon \lambda_i^l - \theta_i^l \big) + \sum_{i \in I} \sum_{l \in L} \sum_{j \in J} p_{ij}^l\, w_{ij}^l + |I| \\
\text{s.t.} \quad & \big( w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l + w_{ij}^l \big) \in L^3, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& r_{ij}^l = \alpha_j^l (1 - 2y_{ij}) + \theta_i^l, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& z_{ij}^l \ge r_{ij}^l + 2\lambda_i^l, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& \sum_{l \in L} \big( \underline{q}_{ij}^l u_{ij}^{l0} - \overline{q}_{ij}^l v_{ij}^{l0} \big) \ge 0, \quad \forall i \in I,\ j \in J, \\
& 1 + \sum_{l \in L} \big( \underline{q}_{ij}^l u_{ij}^{l1} - \overline{q}_{ij}^l v_{ij}^{l1} \big) \ge 0, \quad \forall i \in I,\ j \in J, \\
& \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& v_{ij}^{l1} - \alpha_j^l - u_{ij}^{l1} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& \lambda_i^l,\ z_{ij}^l,\ u_{ij}^{l1},\ v_{ij}^{l1},\ u_{ij}^{l0},\ v_{ij}^{l0} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L, \\
& r_{ij}^l,\ \theta_i^l,\ w_{ij}^l,\ \alpha_j^l \in \mathbb{R}, \quad \forall i \in I,\ j \in J,\ l \in L.
\end{aligned} \tag{37}$$
4. Numerical Experiments on Real-World Applications
In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.
All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second order cone programs in our proposed method and the regularized SVM.
4.1. Data Sets. Lithology classification is one of the basic tasks of geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as Gamma coefficients, acoustic wave, striation density, and fusibility.
Here, numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, there are 12 data sets used for binary classification problems and 8 data sets used for multiple classification problems. Each data set is randomly partitioned, according to a prespecified training rate $\gamma \in [0,1]$, into a training set and a test set such that the training set accounts for a fraction $\gamma$ of the total number of samples.
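The random partition by training rate can be sketched as follows (an illustrative helper of ours; the paper does not provide code):

```python
import numpy as np

def split_by_training_rate(X, y, gamma, seed=0):
    """Randomly partition (X, y) so that the training set holds a
    fraction gamma of the samples, as done for each data set."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(y))
    n_train = int(round(gamma * len(y)))
    train, test = perm[:n_train], perm[n_train:]
    return X[train], y[train], X[test], y[test]
```

Averaging results over several random seeds reproduces the "10 randomly generated instances" protocol used in the experiments below.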
4.2. Experiment Design. The parameters in our model are chosen based on the size of the data set. The parameter $\epsilon$ depends on the number of classes and is defined as $\epsilon = \delta^{2}/|J|$, where $\delta \in (0,1)$. This choice of $\epsilon$ can be explained as follows: if there are $|J|$ classes and the training data are uniformly distributed, then each probability $p_{ij}^{l} = 1/|J|$ has a maximal variation range between $p_{ij}^{l}(1-\delta)$ and $p_{ij}^{l}(1+\delta)$. The number of data intervals $K_{l}$ is defined as $K_{l} = |I|/(|J| \times K)$, so that if the training data are uniformly distributed, each data interval contains $K$ samples of each class. In the following, we set $\delta = 0.2$ and $K = 8$.
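Assuming the reconstruction $\epsilon = \delta^{2}/|J|$ above, both parameters follow directly from the data set size; a small helper of ours (names are not from the paper):

```python
def rpc_parameters(n_samples, n_classes, delta=0.2, K=8):
    """epsilon = delta**2 / |J| and K_l = |I| / (|J| * K), so that a
    uniformly distributed training set leaves K samples of each class
    in every data interval."""
    eps = delta ** 2 / n_classes
    K_l = n_samples // (n_classes * K)
    return eps, K_l
```

For example, 960 samples and 2 classes give $\epsilon = 0.02$ and $K_l = 60$ intervals.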
We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (taking the $j$th class as an example):

\[
\begin{aligned}
\text{(RSVM)}\quad \min\ & \sum_{i\in I}\xi_{ij} + \lambda_j\,\bigl\|w_j\bigr\|\\
\text{s.t.}\ & \hat{y}_{ij}\Bigl(\sum_{l\in L} w_j^{l} x_i^{l} + b_j\Bigr) \ge 1 - \xi_{ij}, && i \in I,\\
& \xi_{ij} \ge 0, && i \in I,
\end{aligned}
\tag{38}
\]

where $\hat{y}_{ij} = 2y_{ij} - 1$ and $\lambda_j \ge 0$ is a regularization parameter. As pointed out in [8], $\lambda_j$ represents a trade-off between the number of training set errors and the amount of robustness
8 Mathematical Problems in Engineering
Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on Y5 data set.

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 90.7 | 88.2  | 63.9 | 66.2  | 88.4 | 90.5*
55 | 89.9 | 88.6  | 69.1 | 72.8  | 89.5 | 89.9*
60 | 89.0 | 85.0  | 70.3 | 72.1  | 91.3 | 86.4*
65 | 86.3 | 85.9  | 72.1 | 72.8  | 88.0 | 92.5*
70 | 92.3 | 84.1  | 70.3 | 75.7  | 90.8 | 86.3*
75 | 88.8 | 87.9  | 74.2 | 74.6  | 88.7 | 91.6*
80 | 88.7 | 93.8* | 90.0 | 87.5  | 88.3 | 93.3
85 | 89.5 | 89.3  | 93.4 | 89.6  | 89.2 | 91.0*
90 | 89.5 | 88.4  | 93.3 | 95.8* | 89.2 | 92.6
Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on T1 data set.

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 91.4 | 84.8  | 76.5 | 68.9 | 91.3 | 87.5*
55 | 92.5 | 86.6  | 68.0 | 77.0 | 92.0 | 90.3*
60 | 89.8 | 86.1  | 72.9 | 73.8 | 88.9 | 90.9*
65 | 91.0 | 82.3  | 80.5 | 81.6 | 89.8 | 92.9*
70 | 86.8 | 95.5* | 83.4 | 89.8 | 88.4 | 93.7
75 | 89.4 | 85.2  | 85.9 | 79.5 | 89.7 | 93.5*
80 | 91.8 | 80.8  | 88.1 | 79.9 | 89.7 | 91.1*
85 | 88.3 | 89.9  | 89.9 | 92.8 | 90.8 | 97.1*
90 | 88.5 | 90.2  | 88.8 | 94.2 | 90.9 | 97.2*
with respect to spherical perturbations of the data points. To make a fair comparison, in the following experiments we test a series of $\lambda$ values and choose the one with the best performance. Note that if $\lambda_j = 0$, we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.
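The selection protocol above, training one model per regularization value and keeping the best performer, can be sketched with any SVM implementation; here we use scikit-learn's LinearSVC as a stand-in (the paper instead solves (38) directly as a cone program, so this is only an illustration of the protocol):

```python
import numpy as np
from sklearn.svm import LinearSVC

def best_rsvm(X_tr, y_tr, X_te, y_te, lambdas=(1e-3, 1e-2, 1e-1, 1.0, 10.0)):
    """Train one SVM per regularization value and keep the best test
    performer, mirroring the selection protocol described above.
    LinearSVC's C is an inverse regularization weight, so C ~ 1/lambda."""
    best_acc, best_model = -1.0, None
    for lam in lambdas:
        model = LinearSVC(C=1.0 / lam).fit(X_tr, y_tr)
        acc = model.score(X_te, y_te)
        if acc > best_acc:
            best_acc, best_model = acc, model
    return best_model, best_acc
```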
4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for binary classification problems using cross-validation methods. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].
Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set based on the parameter tr, which varies from 50% to 90%. The best classification accuracy on a test set among the three methods is marked with an asterisk.
Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, in most cases RSVM has the highest classification accuracy on the training sets, but its performance on the test sets is unsatisfactory. In most cases, the proposed RPC provides the highest classification accuracy on the test sets. NBC provides better performance on the test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can provide better performance on the test sets than on the training sets; thus it can avoid the "overlearning" phenomenon.
To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, namely, T41-T45 and T61-T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As shown in Tables 1 and 2, the robustness of the proposed RPC guarantees its scalability on the test sets.
4.4. Test on Multiple Classification. In this subsection, we test the performance of the three methods on multiple classification problems by comparing RPC with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter $\lambda$, we run a set of RSVMs with $\lambda$ varying from 0 to a sufficiently large number and select the one with the best performance on the test sets.
Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separation hyperplane
Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on other data sets when tr = 70%.

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
T41 | 62.0 | 59.7 | 82.4 | 78.5  | 77.9 | 83.5*
T42 | 87.0 | 82.2 | 84.1 | 83.1  | 80.5 | 85.3*
T43 | 68.0 | 61.2 | 80.2 | 75.4  | 85.5 | 86.9*
T44 | 91.3 | 83.9 | 77.9 | 86.8  | 88.8 | 90.5*
T45 | 86.5 | 87.0 | 93.2 | 91.0* | 84.0 | 89.1
T61 | 80.6 | 79.0 | 80.5 | 83.0  | 83.6 | 87.8*
T62 | 71.4 | 66.5 | 86.9 | 85.4* | 86.3 | 85.4*
T63 | 63.7 | 69.5 | 89.6 | 89.1* | 82.2 | 84.4
T64 | 88.2 | 86.7 | 97.0 | 96.9* | 93.4 | 95.5
T65 | 75.0 | 63.4 | 79.7 | 81.5  | 90.5 | 92.9*
Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on T1 data set.

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
M1 | 65.4 | 68.2 | 72.7 | 73.7  | 79.1 | 77.4*
M2 | 76.9 | 75.3 | 82.6 | 74.8  | 81.7 | 80.9*
M3 | 57.9 | 69.9 | 74.8 | 87.4  | 95.4 | 92.0*
M4 | 70.4 | 64.1 | 97.1 | 92.3  | 95.4 | 92.3*
M5 | 77.4 | 71.3 | 89.4 | 88.1* | 92.0 | 88.0
M6 | 75.7 | 70.5 | 74.1 | 79.4  | 86.4 | 80.8*
Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set (accuracy on training set (%) versus training rate, 0.6-0.9).
for binary classification problems, while RPC is more robust when extended to multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.
Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that for most
Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set (accuracy on test set (%) versus training rate, 0.6-0.9).
of the cases RPC provides the highest accuracy among the three methods. RSVM outperforms NBC on the Y5 test set, while the opposite holds on the T1 test set.
To further test the performance of RPC on multiple classification problems, we carry out more experiments on data sets M1-M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always
Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set (accuracy on training set (%) versus training rate, 0.6-0.9).
Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set (accuracy on test set (%) versus training rate, 0.6-0.9).
provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).
From the tested real-life application, we conclude that the proposed RPC is robust enough to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.
5. Conclusion
In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified $\chi^2$-distance. We assume that the true distribution lies in the constructed distributional set centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all possible distributions belonging to the distributional set.
Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone program based on the conic duality theorem. Thus our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also shows promising performance on multiple classification problems.
There are still many important extensions of our model. Other forms of loss function, such as the mean squared error function and hinge loss functions, should be studied to obtain tractable reformulations, and the resulting models may provide better performance. Probability models considering joint probability distribution information are also an interesting research direction.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223-228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209-226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization: methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453-480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464-501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801-807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555-582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558-563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253-1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175-1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191-201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123-136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691-710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85-90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625-653, 1999.
[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3, a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341-357, 2013.
The modified $\chi^2$-distance $d(\cdot,\cdot): \mathbb{R}^m \times \mathbb{R}^m \to \mathbb{R}$ is used in statistics to measure the distance between two discrete probability distribution vectors. For given $p = (p_1, \dots, p_m)^T$ and $q = (q_1, \dots, q_m)^T$, it is defined as

\[
d(q, p) = \sum_{j=1}^{m} \frac{(q_j - p_j)^2}{p_j}.
\tag{11}
\]
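As a minimal illustration of (11) (our code, not the authors'):

```python
import numpy as np

def modified_chi2_distance(q, p):
    """Modified chi-square distance d(q, p) = sum_j (q_j - p_j)^2 / p_j.

    Both q and p are discrete probability vectors; p must be strictly
    positive so the ratio is well defined."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum((q - p) ** 2 / p))
```

The distance is zero if and only if $q = p$, and small perturbations of $p$ give small values, which is what makes it a natural radius for the distributional set below.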
Based on the modified $\chi^2$-distance, we present the following class-conditional probability distributional set:

\[
P_{\epsilon} = \Biggl\{ \{q_{ij}^{l}\} : \sum_{j\in J} q_{ij}^{l} = 1,\ q_{ij}^{l} \ge 0,\ \sum_{j\in J}\frac{(q_{ij}^{l} - p_{ij}^{l})^2}{p_{ij}^{l}} \le \epsilon,\ \forall i \in I,\ l \in L,\ j \in J \Biggr\},
\tag{12}
\]

where $p_{ij}^{l}$ is the nominal class-conditional probability for the $i$th sample belonging to the $j$th class based on the $l$th feature, and the prespecified parameter $\epsilon$ controls the size of the set.
To design a robust classifier, we need to consider the effect of data uncertainty on the objective function and constraints. The robust objective function minimizes the worst-case loss function value over all possible distributions in the distributional set $P_{\epsilon}$; the robust constraints ensure that all the original constraints are satisfied for any distribution in $P_{\epsilon}$. Thus the robust probability classifier problem takes the following form:

\[
\begin{aligned}
\text{(RPC)}\quad \min\ & \max\Biggl\{ \sum_{j\in J}\sum_{i\in I}(1-2y_{ij})\sum_{l\in L}\alpha_j^{l} q_{ij}^{l} + |I| : \{q_{ij}^{l}\} \in P_{\epsilon} \Biggr\}\\
\text{s.t.}\ & 0 \le \sum_{l\in L}\alpha_j^{l} q_{ij}^{l} \le 1,\quad \forall \{q_{ij}^{l}\} \in P_{\epsilon},\ \forall i, j.
\end{aligned}
\tag{13}
\]
Note that the above optimization problem has an infinite number of robust constraints, and its objective function is itself an embedded subproblem. We show how to solve this minimax optimization problem in Section 3.
2.3. Construct the Distributional Set. To obtain the distributional set $P_{\epsilon}$, we need to define the parameter $\epsilon$ and the nominal probability $p_{ij}^{l}$. The selection of the parameter $\epsilon$ is application based, and we discuss this issue in the numerical experiment section; here we provide a procedure to calculate $p_{ij}^{l}$.

For the $l$th feature, the following procedure takes an integer $K_l$, indicating the number of data intervals, as an input, and outputs the estimated probability $p_{ij}^{l}$ of the $i$th sample belonging to the $j$th class.
(1) Sort the samples in increasing order and divide them into $K_l$ intervals such that each interval contains at least $\lfloor |I|/K_l \rfloor$ samples. Denote the $k$th interval by $\Delta_{lk}$.

(2) Calculate the total number of samples in the $j$th class, $N_j$; the total number of samples in the $k$th interval, $N_{lk}$; and the total number of samples of the $j$th class in the $k$th interval, $N_{lkj}$.

(3) For the $i$th sample, if it falls into the $k$th interval, the class-conditional probability $p_{ij}^{l}$ is calculated by

\[
\begin{aligned}
p_{ij}^{l} &= \operatorname{Prob}(i \in j \mid x_{il} \in \Delta_{lk}) = \frac{\operatorname{Prob}(i \in j,\ x_{il} \in \Delta_{lk})}{\operatorname{Prob}(x_{il} \in \Delta_{lk})}\\
&= \frac{\operatorname{Prob}(i \in j)\operatorname{Prob}(x_{il} \in \Delta_{lk} \mid i \in j)}{\sum_{j'\in J}\operatorname{Prob}(i \in j')\operatorname{Prob}(x_{il} \in \Delta_{lk} \mid i \in j')}\\
&= \frac{(N_j/|I|)\cdot(N_{lkj}/N_j)}{\sum_{j'\in J}(N_{j'}/|I|)\cdot(N_{lkj'}/N_{j'})} = \frac{N_{lkj}}{N_{lk}}.
\end{aligned}
\tag{14}
\]
Note that from the definition of $P_{\epsilon}$ we can easily compute the upper bound $\overline{q}_{ij}^{\,l}$ and the lower bound $\underline{q}_{ij}^{\,l}$ for the true class-conditional probability $q_{ij}^{l}$ as follows:

\[
\overline{q}_{ij}^{\,l} = \max\Biggl\{ q_{ij}^{l} : \sum_{s\in J} q_{is}^{l} = 1,\ \sum_{s\in J}\frac{(q_{is}^{l} - p_{is}^{l})^2}{p_{is}^{l}} \le \epsilon,\ q_{is}^{l} \ge 0\ \forall s \in J \Biggr\},
\tag{15}
\]

\[
\underline{q}_{ij}^{\,l} = \min\Biggl\{ q_{ij}^{l} : \sum_{s\in J} q_{is}^{l} = 1,\ \sum_{s\in J}\frac{(q_{is}^{l} - p_{is}^{l})^2}{p_{is}^{l}} \le \epsilon,\ q_{is}^{l} \ge 0\ \forall s \in J \Biggr\}.
\tag{16}
\]
The above problems can be efficiently solved by a second order cone solver such as SeDuMi [26] or SDPT3 [27].
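For illustration, the bound (15) can also be computed with a general-purpose NLP solver such as SciPy's SLSQP instead of a dedicated cone solver (our sketch, not the authors' implementation):

```python
import numpy as np
from scipy.optimize import minimize

def q_upper_bound(p, j, eps):
    """Upper bound (15) on q_j over the modified chi^2 ball around p.

    The paper solves (15)-(16) as second order cone programs via
    SeDuMi/SDPT3; a general NLP solver is used here only to illustrate
    the optimization being performed."""
    cons = [
        {"type": "eq", "fun": lambda q: np.sum(q) - 1.0},
        {"type": "ineq", "fun": lambda q: eps - np.sum((q - p) ** 2 / p)},
    ]
    res = minimize(lambda q: -q[j], x0=np.asarray(p, dtype=float),
                   bounds=[(0.0, 1.0)] * len(p), constraints=cons,
                   method="SLSQP")
    return float(res.x[j])
```

Replacing the objective `-q[j]` with `q[j]` (and taking the attained minimum) gives the lower bound (16). For a uniform nominal $p = (0.5, 0.5)$ and $\epsilon = 0.04$, the analytic answer is $0.5 + \sqrt{\epsilon}/2 = 0.6$.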
3. Solution Methods for RPC

In this section, we first reduce the infinite number of robust constraints to a finite set of linear constraints and then transform the inner robust objective function into a minimization problem by the conic duality theorem. Finally, we obtain an equivalent computable second order cone program for the RPC problem. The following analysis is based on the strong duality result in [8].
Consider a conic program of the following form:

\[
\begin{aligned}
\text{(CP)}\quad \min\ & c^T x\\
\text{s.t.}\ & A_i x - b_i \in C_i,\quad \forall i = 1, \dots, m,\\
& A x = b,
\end{aligned}
\tag{17}
\]

and its dual problem

\[
\begin{aligned}
\text{(DP)}\quad \max\ & b^T z + \sum_{i=1}^{m} b_i^T y_i\\
\text{s.t.}\ & A^* z + \sum_{i=1}^{m} A_i^* y_i = c,\\
& y_i \in C_i^*,\quad \forall i = 1, \dots, m,
\end{aligned}
\tag{18}
\]

where $C_i$ is a cone in $\mathbb{R}^{n_i}$ and $C_i^*$ is its dual cone, defined by

\[
C_i^* = \bigl\{ y \in \mathbb{R}^{n_i} : y^T x \ge 0,\ \forall x \in C_i \bigr\}.
\tag{19}
\]

A conic program is called strictly feasible if it admits a feasible solution $x$ such that $A_i x - b_i \in \operatorname{int} C_i$, $\forall i = 1, \dots, m$, where $\operatorname{int} C_i$ denotes the interior of $C_i$.
Lemma 1 (see [8]). If one of the problems (CP) and (DP) is strictly feasible and bounded, then the other problem is solvable, and (CP) = (DP) in the sense that both have the same optimal objective function value.
3.1. Robust Constraints. The following lemma provides an equivalent characterization of the infinite number of robust constraints in terms of a finite set of linear constraints, which can be handled efficiently.
Lemma 2. For given $i$, $j$, the robust constraint

\[
0 \le \sum_{l\in L}\alpha_j^{l} q_{ij}^{l} \le 1,\quad \forall \{q_{ij}^{l}\} \in P_{\epsilon},
\tag{20}
\]

is equivalent to the following constraints:

\[
\begin{aligned}
&\sum_{l\in L}\bigl(\underline{q}_{ij}^{\,l}u_{ij}^{l0} - \overline{q}_{ij}^{\,l}v_{ij}^{l0}\bigr) \ge 0,\\
&\alpha_j^{l} - u_{ij}^{l0} + v_{ij}^{l0} \ge 0,\quad u_{ij}^{l0},\ v_{ij}^{l0} \ge 0,\quad \forall l \in L,\\
&1 + \sum_{l\in L}\bigl(\underline{q}_{ij}^{\,l}u_{ij}^{l1} - \overline{q}_{ij}^{\,l}v_{ij}^{l1}\bigr) \ge 0,\\
&v_{ij}^{l1} - \alpha_j^{l} - u_{ij}^{l1} \ge 0,\quad u_{ij}^{l1},\ v_{ij}^{l1} \ge 0,\quad \forall l \in L.
\end{aligned}
\tag{21}
\]
Proof. First note that the distributional set $P_{\epsilon}$ can be represented as the Cartesian product of a series of projected subsets:

\[
P_{\epsilon} = \prod_{i\in I} P_{\epsilon,i},
\tag{22}
\]
where the projected subset on index $i$ is defined by

\[
P_{\epsilon,i} = \Biggl\{ \{q_{ij}^{l}\} : \sum_{j\in J} q_{ij}^{l} = 1,\ q_{ij}^{l} \ge 0,\ \sum_{j\in J}\frac{(q_{ij}^{l} - p_{ij}^{l})^2}{p_{ij}^{l}} \le \epsilon,\ \forall l \in L,\ j \in J \Biggr\}.
\tag{23}
\]
Then for given $i$, $j$, since the robust constraint is only associated with the variables $q_{ij}^{l}$, $l \in L$, we can further split the projected subset $P_{\epsilon,i}$ into $|J|$ subsets:

\[
P_{\epsilon,i} = \prod_{j\in J} P_{\epsilon,ij} = \prod_{j\in J}\bigl\{ \{q_{ij}^{l}\} : \underline{q}_{ij}^{\,l} \le q_{ij}^{l} \le \overline{q}_{ij}^{\,l},\ \forall l \in L \bigr\},
\tag{24}
\]

where $\overline{q}_{ij}^{\,l}$ and $\underline{q}_{ij}^{\,l}$ are computed by (15) and (16), respectively.
For the constraint $\sum_{l\in L}\alpha_j^{l} q_{ij}^{l} \ge 0$, $\forall \{q_{ij}^{l}\} \in P_{\epsilon}$, we have the chain of equivalences

\[
\begin{aligned}
&\sum_{l\in L}\alpha_j^{l} q_{ij}^{l} \ge 0,\quad \forall \{q_{ij}^{l}\} \in P_{\epsilon,i}\\
\Longleftrightarrow\ & \sum_{l\in L}\alpha_j^{l} q_{ij}^{l} \ge 0,\quad \forall \{q_{ij}^{l}\} \in P_{\epsilon,ij}\\
\Longleftrightarrow\ & \min\Bigl\{ \sum_{l\in L}\alpha_j^{l} q_{ij}^{l} : \underline{q}_{ij}^{\,l} \le q_{ij}^{l} \le \overline{q}_{ij}^{\,l},\ \forall l \in L \Bigr\} \ge 0\\
\Longleftrightarrow\ & \max\Bigl\{ \sum_{l\in L}\bigl(\underline{q}_{ij}^{\,l}u_{ij}^{l0} - \overline{q}_{ij}^{\,l}v_{ij}^{l0}\bigr) : \alpha_j^{l} - u_{ij}^{l0} + v_{ij}^{l0} \ge 0,\ u_{ij}^{l0},\ v_{ij}^{l0} \ge 0,\ \forall l \in L \Bigr\} \ge 0\\
\Longleftrightarrow\ & \sum_{l\in L}\bigl(\underline{q}_{ij}^{\,l}u_{ij}^{l0} - \overline{q}_{ij}^{\,l}v_{ij}^{l0}\bigr) \ge 0,\quad \alpha_j^{l} - u_{ij}^{l0} + v_{ij}^{l0} \ge 0,\quad u_{ij}^{l0},\ v_{ij}^{l0} \ge 0,\ \forall l \in L,
\end{aligned}
\tag{25}
\]

where the last equivalence comes from the strong duality between the two linear programs.

For the constraint $\sum_{l\in L}\alpha_j^{l} q_{ij}^{l} \le 1$, $\forall \{q_{ij}^{l}\} \in P_{\epsilon}$, the same technique applies; thus we complete the proof.
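The last equivalence can be sanity-checked numerically on a random box-constrained instance with an LP solver (our check, not part of the paper):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
L = 4
alpha = rng.normal(size=L)
q_lo = rng.uniform(0.1, 0.4, size=L)           # lower bounds on q
q_hi = q_lo + rng.uniform(0.1, 0.4, size=L)    # upper bounds on q

# Primal side: min_q alpha^T q  s.t.  q_lo <= q <= q_hi.
primal = linprog(alpha, bounds=list(zip(q_lo, q_hi))).fun

# Dual side from the proof: max sum_l (q_lo_l*u_l - q_hi_l*v_l)
# s.t. alpha_l - u_l + v_l >= 0 (i.e. u - v <= alpha), u, v >= 0.
c = np.concatenate([-q_lo, q_hi])              # linprog minimizes
A_ub = np.hstack([np.eye(L), -np.eye(L)])      # encodes u - v <= alpha
dual = -linprog(c, A_ub=A_ub, b_ub=alpha, bounds=[(0, None)] * (2 * L)).fun

assert abs(primal - dual) < 1e-8               # strong duality holds
```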
3.2. Robust Objective Function. In the RPC problem, the robust objective function is defined by an inner maximization problem. The following proposition shows that it can be transformed into a minimization problem over second order cones. To prove this result, we utilize the conjugate function $d^*$ of the scalar modified $\chi^2$-distance $d(t) = (t-1)^2$:

\[
d^*(s) = \sup_{t\ge 0}\,\{st - d(t)\} = \frac{[s+2]_+^2}{4} - 1,
\tag{26}
\]

where $[x]_+ = x$ if $x \ge 0$, and $[x]_+ = 0$ otherwise. For more details on conjugate functions, see [28].
Proposition 3. The following inner maximization problem

\[
\max\Biggl\{ \sum_{j\in J}\sum_{i\in I}(1-2y_{ij})\sum_{l\in L}\alpha_j^{l} q_{ij}^{l} + |I| : \{q_{ij}^{l}\} \in P_{\epsilon} \Biggr\}
\tag{27}
\]

is equivalent to the second order cone program

\[
\begin{aligned}
\min\ & \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^{l}-\theta_i^{l}\bigr)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p_{ij}^{l} w_{ij}^{l}+|I|\\
\text{s.t.}\ & \bigl(w_{ij}^{l},\ z_{ij}^{l},\ 2\lambda_i^{l}+w_{ij}^{l}\bigr)\in L^{3}, && \forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^{l}=\alpha_j^{l}\,(1-2y_{ij})+\theta_i^{l}, && \forall i\in I,\ l\in L,\ j\in J,\\
& z_{ij}^{l}\ge r_{ij}^{l}+2\lambda_i^{l},\quad \lambda_i^{l},\ z_{ij}^{l}\ge 0, && \forall i\in I,\ j\in J,\ l\in L,
\end{aligned}
\tag{28}
\]

where the second order cone $L^{n+1}$ is defined as

\[
L^{n+1} = \Biggl\{ x \in \mathbb{R}^{n+1} : x_{n+1} \ge \sqrt{\sum_{i=1}^{n} x_i^2} \Biggr\}.
\tag{29}
\]
Proof. For any given $\alpha$ satisfying the robust constraints, it is straightforward to show that the inner maximization problem equals the following minimization problem (MP):

\[
\begin{aligned}
\text{(MP)}\quad \min\ & t\\
\text{s.t.}\ & t \ge \sum_{j\in J}\sum_{i\in I}(1-2y_{ij})\sum_{l\in L}\alpha_j^{l} q_{ij}^{l} + |I|,\quad \forall \{q_{ij}^{l}\} \in P_{\epsilon}.
\end{aligned}
\tag{30}
\]
The above constraint can be further reduced to

\[
\max\Biggl\{ \sum_{j\in J}\sum_{i\in I}(1-2y_{ij})\sum_{l\in L}\alpha_j^{l} q_{ij}^{l} : \{q_{ij}^{l}\} \in P_{\epsilon} \Biggr\} + |I| - t \le 0.
\tag{31}
\]
By assigning Lagrange multipliers $\theta_i^{l} \in \mathbb{R}$ and $\lambda_i^{l} \in \mathbb{R}_+$ to the constraints of the inner maximization problem, we obtain the following Lagrange function:

\[
L(q, \theta, \lambda) = \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^{l} - \theta_i^{l}\bigr) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J}\Biggl( r_{ij}^{l} q_{ij}^{l} - \lambda_i^{l}\,\frac{(q_{ij}^{l} - p_{ij}^{l})^2}{p_{ij}^{l}} \Biggr) + |I| - t,
\tag{32}
\]

where $r_{ij}^{l} = \alpha_j^{l}(1 - 2y_{ij}) + \theta_i^{l}$. Its dual function is given as
\[
\begin{aligned}
D(\theta,\lambda)&=\max_{q\ge 0}\,L(q,\theta,\lambda)\\
&=\sum_{i\in I}\sum_{l\in L}\left(\epsilon\lambda_i^l-\theta_i^l\right)
+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}\max_{q_{ij}^l\ge 0}\left(r_{ij}^l q_{ij}^l-\lambda_i^l p_{ij}^l\left(\frac{q_{ij}^l-p_{ij}^l}{p_{ij}^l}\right)^{2}\right)+|I|-t\\
&=\sum_{i\in I}\sum_{l\in L}\left(\epsilon\lambda_i^l-\theta_i^l\right)
+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l\max_{s\ge 0}\left(r_{ij}^l s-\lambda_i^l\left(s-1\right)^{2}\right)+|I|-t\\
&=\sum_{i\in I}\sum_{l\in L}\left(\epsilon\lambda_i^l-\theta_i^l\right)
+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l\lambda_i^l\max_{s\ge 0}\left(\frac{r_{ij}^l}{\lambda_i^l}\,s-\left(s-1\right)^{2}\right)+|I|-t\\
&=\sum_{i\in I}\sum_{l\in L}\left(\epsilon\lambda_i^l-\theta_i^l\right)
+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l\lambda_i^l\, d^{*}\!\left(\frac{r_{ij}^l}{\lambda_i^l}\right)+|I|-t,
\end{aligned}
\tag{33}
\]

where the substitution \(s=q_{ij}^l/p_{ij}^l\) is used in the second equality and \(d^{*}(s)=\sup_{t\ge 0}\{st-(t-1)^2\}=[s+2]_{+}^{2}/4-1\) is the conjugate function of the modified \(\chi^2\)-distance.
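The closed form of the conjugate used in the last equality of (33) can be checked numerically; a minimal sketch (Python; the function names are illustrative, not from the paper):

```python
import numpy as np

def d_star_closed(s):
    # Conjugate of d(t) = (t - 1)^2 over t >= 0: [s + 2]_+^2 / 4 - 1.
    return max(s + 2.0, 0.0) ** 2 / 4.0 - 1.0

def d_star_numeric(s):
    # Brute-force sup_{t >= 0} { s*t - (t - 1)^2 } on a fine grid.
    t = np.linspace(0.0, 50.0, 500_001)
    return float(np.max(s * t - (t - 1.0) ** 2))

for s in (-5.0, -2.0, 0.0, 1.5, 7.0):
    assert abs(d_star_closed(s) - d_star_numeric(s)) < 1e-6
```

For \(s\le -2\) the supremum is attained at \(t=0\) with value \(-1\), matching \([s+2]_+=0\) in the closed form.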
Note that, for any feasible \(\alpha\), the primal maximization problem in (31) is bounded and has a strictly feasible solution \(q_{ij}^l=p_{ij}^l\); thus there is no duality gap between (31) and the following dual problem:

\[
\begin{aligned}
&\min\ D(\theta,\lambda),\qquad \theta_i^l\in\mathbb{R},\ \lambda_i^l\in\mathbb{R}_+,\ \forall i\in I,\ l\in L\\[4pt]
\Longleftrightarrow\quad
&\begin{aligned}
\min\quad & \sum_{i\in I}\sum_{l\in L}\left(\epsilon\lambda_i^l-\theta_i^l\right)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|-t\\
\text{s.t.}\quad & w_{ij}^l\ge \lambda_i^l\, d^{*}\!\left(\frac{r_{ij}^l}{\lambda_i^l}\right), &&\forall i\in I,\ l\in L,\ j\in J,\\
& \theta_i^l\in\mathbb{R},\ \lambda_i^l\in\mathbb{R}_+, &&\forall i\in I,\ l\in L.
\end{aligned}
\end{aligned}
\tag{34}
\]
Next we show that the constraint involving the conjugate function can be represented by second order cone constraints:
\[
\begin{aligned}
\lambda_i^l\, d^{*}\!\left(\frac{r_{ij}^l}{\lambda_i^l}\right)\le w_{ij}^l
\quad&\Longleftrightarrow\quad
\lambda_i^l\left(-1+\frac{1}{4}\left[\frac{r_{ij}^l}{\lambda_i^l}+2\right]_{+}^{2}\right)\le w_{ij}^l\\
&\Longleftrightarrow\quad
4\lambda_i^l\left(\lambda_i^l+w_{ij}^l\right)\ge\left[r_{ij}^l+2\lambda_i^l\right]_{+}^{2}\\
&\Longleftrightarrow\quad
4\lambda_i^l\left(\lambda_i^l+w_{ij}^l\right)\ge\left(z_{ij}^l\right)^{2},\quad
z_{ij}^l\ge 0,\quad z_{ij}^l\ge r_{ij}^l+2\lambda_i^l
\end{aligned}
\]
Mathematical Problems in Engineering 7
\[
\Longleftrightarrow\quad
\left(w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l+w_{ij}^l\right)\in L^3,\quad
z_{ij}^l\ge 0,\quad z_{ij}^l\ge r_{ij}^l+2\lambda_i^l.
\tag{35}
\]
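The equivalence in (35) can be spot-checked numerically; a small sketch (Python; the sampled values and function names are arbitrary illustrations):

```python
import math
import random

def excess(lam, w, r):
    # lambda * d*(r / lambda) - w, with d*(s) = [s + 2]_+^2 / 4 - 1.
    return lam * (max(r / lam + 2.0, 0.0) ** 2 / 4.0 - 1.0) - w

def soc_form(lam, w, r):
    # (w, z, 2*lambda + w) in L^3 with the tightest choice z = [r + 2*lambda]_+,
    # i.e. 2*lambda + w >= sqrt(w^2 + z^2).
    z = max(r + 2.0 * lam, 0.0)
    return 2.0 * lam + w >= math.hypot(w, z)

random.seed(0)
for _ in range(10_000):
    lam = random.uniform(0.01, 5.0)
    w = random.uniform(-5.0, 5.0)
    r = random.uniform(-5.0, 5.0)
    gap = excess(lam, w, r)
    if abs(gap) < 1e-9:
        continue  # skip numerically borderline samples
    assert (gap <= 0.0) == soc_form(lam, w, r)
```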
By substituting the above constraints into (MP), the robust objective function is equivalent to the following problem:
\[
\begin{aligned}
\min\quad & t\\
\text{s.t.}\quad & \sum_{i\in I}\sum_{l\in L}\left(\epsilon\lambda_i^l-\theta_i^l\right)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|\le t,\\
& \left(w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l+w_{ij}^l\right)\in L^3, &&\forall i\in I,\ j\in J,\ l\in L,\\
& z_{ij}^l\ge r_{ij}^l+2\lambda_i^l,\quad z_{ij}^l,\ \lambda_i^l\ge 0, &&\forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l=\alpha_j^l\left(1-2y_{ij}\right)+\theta_i^l, &&\forall i\in I,\ j\in J,\ l\in L.
\end{aligned}
\tag{36}
\]
By eliminating the variable \(t\), we complete the proof.

Based on Lemma 2 and Proposition 3, we obtain our main result.

Proposition 4. The RPC problem can be solved as the following second order cone program:
\[
\begin{aligned}
\min\quad & \sum_{i\in I}\sum_{l\in L}\left(\epsilon\lambda_i^l-\theta_i^l\right)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|\\
\text{s.t.}\quad & \left(w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l+w_{ij}^l\right)\in L^3, &&\forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l=\alpha_j^l\left(1-2y_{ij}\right)+\theta_i^l, &&\forall i\in I,\ j\in J,\ l\in L,\\
& z_{ij}^l\ge r_{ij}^l+2\lambda_i^l, &&\forall i\in I,\ j\in J,\ l\in L,\\
& \sum_{l\in L}\left(\underline{q}_{ij}^l u_{ij}^{l0}-\overline{q}_{ij}^l v_{ij}^{l0}\right)\ge 0, &&\forall i\in I,\ j\in J,\\
& 1+\sum_{l\in L}\left(\underline{q}_{ij}^l u_{ij}^{l1}-\overline{q}_{ij}^l v_{ij}^{l1}\right)\ge 0, &&\forall i\in I,\ j\in J,\\
& \alpha_j^l-u_{ij}^{l0}+v_{ij}^{l0}\ge 0, &&\forall i\in I,\ j\in J,\ l\in L,\\
& v_{ij}^{l1}-\alpha_j^l-u_{ij}^{l1}\ge 0, &&\forall i\in I,\ j\in J,\ l\in L,\\
& \lambda_i^l,\ z_{ij}^l,\ u_{ij}^{l1},\ v_{ij}^{l1},\ u_{ij}^{l0},\ v_{ij}^{l0}\ge 0, &&\forall i\in I,\ j\in J,\ l\in L,\\
& r_{ij}^l,\ \theta_i^l,\ w_{ij}^l,\ \alpha_j^l\in\mathbb{R}, &&\forall i\in I,\ j\in J,\ l\in L.
\end{aligned}
\tag{37}
\]
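The four box-bound rows in (37) linearize the robust constraints \(0\le\sum_{l\in L}\alpha_j^l q_{ij}^l\le 1\) over the interval bounds \(\underline{q}\le q\le\overline{q}\) via LP duality; a quick numerical sketch (Python with NumPy; the data below are arbitrary illustrative values, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = rng.uniform(-1.0, 1.0, size=5)       # weights alpha_j^l over l
q_lo = rng.uniform(0.0, 0.5, size=5)         # lower bounds (underline q)
q_hi = q_lo + rng.uniform(0.0, 0.5, size=5)  # upper bounds (overline q)

# The dual choice u = [alpha]_+, v = [-alpha]_+ satisfies alpha - u + v >= 0.
u, v = np.maximum(alpha, 0.0), np.maximum(-alpha, 0.0)
dual_bound = float(q_lo @ u - q_hi @ v)

# Box minimum of sum_l alpha_l * q_l: q_lo where alpha > 0, q_hi where alpha < 0.
box_min = float(np.where(alpha > 0, alpha * q_lo, alpha * q_hi).sum())

assert abs(dual_bound - box_min) < 1e-12
```

With this particular dual choice the bound is tight, which is why (37) loses nothing relative to the original robust constraints.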
4. Numerical Experiments on Real-World Applications
In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.
All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second order cone programs in our proposed method and the regularized SVM.
4.1. Data Sets. Lithology classification is one of the basic tasks for geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as gamma coefficients, acoustic wave, striation density, and fusibility.
Here, numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, there are 12 data sets used for binary classification problems and 8 data sets used for multiple classification problems. For each data set, based on a prespecified training rate \(\gamma\in[0,1]\), it is randomly partitioned into two subsets, a training set and a test set, such that the size of the training set accounts for \(\gamma\) of the total number of samples.
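The random partition described above can be sketched as follows (Python; the function name `split_by_rate` is illustrative, not from the paper):

```python
import random

def split_by_rate(samples, gamma, seed=0):
    """Randomly partition `samples` so that the training set holds a
    fraction `gamma` of the total; the rest forms the test set."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = round(gamma * len(samples))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

train, test = split_by_rate(list(range(100)), 0.7)
assert len(train) == 70 and len(test) == 30
assert sorted(train + test) == list(range(100))
```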
4.2. Experiment Design. The parameters in our models are chosen based on the size of the data set. The parameter \(\epsilon\) depends on the number of classes and is defined as \(\epsilon=\delta^2/|J|\), where \(\delta\in(0,1)\). The choice of \(\epsilon\) can be explained in this way: if there are \(|J|\) classes and the training data are uniformly distributed, then for each probability \(p_{ij}^l=1/|J|\), its maximal variation range is between \(p_{ij}^l(1-\delta)\) and \(p_{ij}^l(1+\delta)\). The number of data intervals \(K_l\) is defined as \(K_l=|I|/(|J|\times K)\), such that if the training data are uniformly distributed, then in each data interval there are \(K\) samples in each class. In the following context, we set \(\delta=0.2\) and \(K=8\).
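Under these definitions, the parameters for a data set can be computed directly; a short sketch (Python; the sample sizes below are made-up examples, and the formula \(\epsilon=\delta^2/|J|\) follows the reconstruction above):

```python
def rpc_params(n_samples, n_classes, delta=0.2, k_per_class=8):
    # epsilon = delta^2 / |J|;  K_l = |I| / (|J| * K)
    eps = delta ** 2 / n_classes
    n_intervals = n_samples / (n_classes * k_per_class)
    return eps, n_intervals

eps, n_intervals = rpc_params(n_samples=320, n_classes=2)
assert abs(eps - 0.02) < 1e-12
assert n_intervals == 20.0
```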
We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (take the \(j\)th class for example):

\[
\text{(RSVM)}\quad
\begin{aligned}
\min\quad & \sum_{i\in I}\xi_{ij}+\lambda_j\left\|w_j\right\|\\
\text{s.t.}\quad & \tilde{y}_{ij}\left(\sum_{l\in L}w_j^l x_i^l+b_j\right)\ge 1-\xi_{ij}, && i\in I,\\
& \xi_{ij}\ge 0, && i\in I,
\end{aligned}
\tag{38}
\]

where \(\tilde{y}_{ij}=2y_{ij}-1\) and \(\lambda_j\ge 0\) is a regularization parameter. As pointed out by [8], \(\lambda_j\) represents a trade-off between the number of training set errors and the amount of robustness
Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set (* marks the best test accuracy).

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 90.7 | 88.2 | 63.9 | 66.2 | 88.4 | 90.5*
55 | 89.9 | 88.6 | 69.1 | 72.8 | 89.5 | 89.9*
60 | 89.0 | 85.0 | 70.3 | 72.1 | 91.3 | 86.4*
65 | 86.3 | 85.9 | 72.1 | 72.8 | 88.0 | 92.5*
70 | 92.3 | 84.1 | 70.3 | 75.7 | 90.8 | 86.3*
75 | 88.8 | 87.9 | 74.2 | 74.6 | 88.7 | 91.6*
80 | 88.7 | 93.8* | 90.0 | 87.5 | 88.3 | 93.3
85 | 89.5 | 89.3 | 93.4 | 89.6 | 89.2 | 91.0*
90 | 89.5 | 88.4 | 93.3 | 95.8* | 89.2 | 92.6
Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set (* marks the best test accuracy).

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 91.4 | 84.8 | 76.5 | 68.9 | 91.3 | 87.5*
55 | 92.5 | 86.6 | 68.0 | 77.0 | 92.0 | 90.3*
60 | 89.8 | 86.1 | 72.9 | 73.8 | 88.9 | 90.9*
65 | 91.0 | 82.3 | 80.5 | 81.6 | 89.8 | 92.9*
70 | 86.8 | 95.5* | 83.4 | 89.8 | 88.4 | 93.7
75 | 89.4 | 85.2 | 85.9 | 79.5 | 89.7 | 93.5*
80 | 91.8 | 80.8 | 88.1 | 79.9 | 89.7 | 91.1*
85 | 88.3 | 89.9 | 89.9 | 92.8 | 90.8 | 97.1*
90 | 88.5 | 90.2 | 88.8 | 94.2 | 90.9 | 97.2*
with respect to spherical perturbations of the data points. To make a fair comparison, in the following experiments we will test a series of \(\lambda\) values and choose the one with the best performance. Note that if \(\lambda_j=0\), we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.
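The RSVM objective in (38) can be evaluated directly from its definition; a minimal sketch (Python with NumPy; the data are random placeholders, not the paper's data sets):

```python
import numpy as np

def rsvm_objective(w, b, X, y01, lam):
    """Objective of (38) for one class: sum of hinge slacks xi_i
    plus lam * ||w||, with y~ = 2*y - 1 in {-1, +1}."""
    y = 2.0 * y01 - 1.0
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))  # slack variables
    return xi.sum() + lam * np.linalg.norm(w)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y01 = (rng.random(20) > 0.5).astype(float)
val = rsvm_objective(np.zeros(3), 0.0, X, y01, lam=0.5)
# With w = 0, b = 0 every slack equals 1, so the objective is 20
assert abs(val - 20.0) < 1e-12
```

Note that (38) penalizes \(\|w\|\) rather than the more common \(\|w\|^2\); the hinge slacks are identical in both variants.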
4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for the binary classification problems using cross-validation methods. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].
Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set based on the parameter tr, which varies from 50% to 90%. The best classification accuracy on a test set among these three methods is marked with an asterisk.
Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, for most of the cases, RSVM has the highest classification accuracy on training sets, but its performance on test sets is unsatisfactory. For most of the cases, the proposed RPC provides the highest classification accuracy on test sets. NBC provides better performance on test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can provide better performance on test sets than on training sets; thus it can avoid the "overlearning" phenomenon.
To further validate the effectiveness of the proposed RPC, we test it on an additional 10 data sets, that is, T41–T45 and T61–T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As shown in Tables 1 and 2, the robustness of the proposed RPC guarantees its generalization ability on the test sets.
4.4. Test on Multiple Classification. In this subsection, we test the performance of the proposed RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter \(\lambda\), we run a set of RSVMs with \(\lambda\) varying from 0 to a sufficiently large number and select the one with the best performance on test sets.
Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separation hyperplane
Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on other data sets when tr = 70% (* marks the best test accuracy).

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
T41 | 62.0 | 59.7 | 82.4 | 78.5 | 77.9 | 83.5*
T42 | 87.0 | 82.2 | 84.1 | 83.1 | 80.5 | 85.3*
T43 | 68.0 | 61.2 | 80.2 | 75.4 | 85.5 | 86.9*
T44 | 91.3 | 83.9 | 77.9 | 86.8 | 88.8 | 90.5*
T45 | 86.5 | 87.0 | 93.2 | 91.0* | 84.0 | 89.1
T61 | 80.6 | 79.0 | 80.5 | 83.0 | 83.6 | 87.8*
T62 | 71.4 | 66.5 | 86.9 | 85.4* | 86.3 | 85.4*
T63 | 63.7 | 69.5 | 89.6 | 89.1* | 82.2 | 84.4
T64 | 88.2 | 86.7 | 97.0 | 96.9* | 93.4 | 95.5
T65 | 75.0 | 63.4 | 79.7 | 81.5 | 90.5 | 92.9*
Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set (* marks the best test accuracy).

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
M1 | 65.4 | 68.2 | 72.7 | 73.7 | 79.1 | 77.4*
M2 | 76.9 | 75.3 | 82.6 | 74.8 | 81.7 | 80.9*
M3 | 57.9 | 69.9 | 74.8 | 87.4 | 95.4 | 92.0*
M4 | 70.4 | 64.1 | 97.1 | 92.3 | 95.4 | 92.3*
M5 | 77.4 | 71.3 | 89.4 | 88.1* | 92.0 | 88.0
M6 | 75.7 | 70.5 | 74.1 | 79.4 | 86.4 | 80.8*
Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set (accuracy on the training set versus training rate).
for binary classification problems, while RPC extends more robustly to multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.
Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that, for most of the cases, RPC provides the highest accuracy among the three methods. The accuracy of RSVM outperforms that of NBC on the Y5 test set, while the latter outperforms the former on the T1 test set.

Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set (accuracy on the test set versus training rate).
To further test the performance of RPC on multiple classification problems, we carry out more experiments on data sets M1–M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always
Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set (accuracy on the training set versus training rate).
Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set (accuracy on the test set versus training rate).
provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).
From the tested real-life application, we conclude that the proposed RPC has the robustness to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for the binary classification problems.
5. Conclusion

In this paper, we propose a robust probability classifier model to address the data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified \(\chi^2\)-distance. We assume that the true distribution lies in the constructed distributional set, centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all the possible distributions belonging to the distributional set.

Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone program based on the conic duality theorem. Thus our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also has a promising performance for multiple classification problems.

There are still many important extensions of our model. Other forms of loss function, such as the mean squared error function and hinge loss functions, should be studied to obtain tractable reformulations, and the resulting models may provide better performances. Probability models considering joint probability distribution information are also an interesting research direction.
Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.
References

[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223–228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209–226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization: methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453–480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801–807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555–582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558–563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253–1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175–1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191–201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123–136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691–710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85–90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.
[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3: a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341–357, 2013.
Mathematical Problems in Engineering 5
Consider a conic program of the following form
(CP) min 119888119879119909
st 119860119894119909 minus 119887119894isin 119862119894 forall119894 = 1 119898
119860119909 = 119887
(17)
and its dual problem
(DP) max 119887119879119911 +
119898
sum
119894=1
119887119879
119894119910119894
st 119860lowast119911 +
119898
sum
119894=1
119860lowast
119894119910119894= 119888
119910119894isin 119862lowast
119894 forall119894 = 1 119898
(18)
where 119862119894is a cone in R119899119894 and 119862
lowast
119894is its dual cone defined by
119862lowast
119894= 119910 isin R
119899119894 119910119879119909 ge forall119909 isin 119862
119894 (19)
A conic program is called strictly feasible if it admits a feasiblesolution 119909 such that 119860
119894119909 minus 119887119894
isin int119862119894 forall119894 = 1 119898 where
int119862119894denotes the interior point set of 119862
119894
Lemma 1 (see [8]) If one of the problems (CP) and (DP) isstrictly feasible and bounded then the other problem is solvableand (CP) = (DP) in the sense that both have the same optimalobjective function value
31 Robust Constraints The following lemma provides anequivalent characterization for the infinite number of robustconstraints in terms of a finite set of linear constraints whichcan be solved efficiently
Lemma 2 For given 119894 119895 the robust constraint
0 le sum
119897isin119871
120572119897
119895119901119897
119894119895le 1 forall 119902
119897
119894119895 isin 119875120598 (20)
is equal to the following constraints
sum
119897isin119871
(119902119897
1198941198951199061198970
119894119895minus 119902119897
119894119895V1198970119894119895
) ge 0
120572119897
119894119895minus 1199061198970
119894119895+ V1198970119894119895
ge 0 1199061198970
119894119895 V1198970119894119895
ge 0 forall119897 isin 119871
1 + sum
119897isin119871
(119902119897
1198941198951199061198971
119894119895minus 119902119897
119894119895V1198971119894119895
) ge 0
V1198971119894119895
minus 120572119897
119894119895minus 1199061198971
119894119895ge 0 119906
1198971
119894119895 V1198971119894119895
ge 0 forall119897 isin 119871
(21)
Proof First note that the distributional set 119875120598119894can be repre-
sented as theCartesian product of a series of projected subsets
119875120598
= prod
119894isin119868
119875120598119894
(22)
where the projected subset on index 119894 is defined by
119875120598119894
=
119902119897
119894119895 sum
119895
119902119897
119894119895= 1 119902119897
119894119895ge 0
sum
119895isin119869
(119902119897
119894119895minus 119901119897
119894119895)2
119901119897
119894119895
le 120598 forall119897 isin 119871 119895 isin 119869
(23)
Then for given 119894 119895 since the robust constraint is onlyassociated with variables 119902
119897
119894119895 119897 isin 119871 we can further split the
projected subset 119875120598119894into |119869| subsets
119875120598119894
= prod
119895isin119869
119875120598119894119895
= prod
119895isin119869
119902119897
119894119895 119902119897
119894119895le 119902119897
119894119895le 119902119897
119894119895 forall119897 isin 119871 (24)
where 119902119897
119894119895and 119902119897
119894119895are computed by (15) and (16) respectively
For constraint sum119897isin119871
120572119897
119895119901119897
119894119895ge 0 forall119902
119897
119894119895 isin 119875120598 it is equal to
the following constraint
sum
119897isin119871
120572119897
119895119901119897
119894119895ge 0 forall 119902
119897
119894119895 isin 119875120598119894
lArrrArr sum
119897isin119871
120572119897
119895119901119897
119894119895ge 0 forall 119902
119897
119894119895 isin 119875120598119894119895
lArrrArr minsum
119897isin119871
120572119897
119895119901119897
119894119895 119902119897
119894119895le 119902119897
119894119895le 119902119897
119894119895 forall119897 isin 119871 ge 0
lArrrArr maxsum
119897isin119871
(119902119897
1198941198951199061198970
119894119895minus 119902119897
119894119895V1198970119894119895
)
120572119897
119894119895minus 1199061198970
119894119895+ V1198970119894119895
ge 0 1199061198970
119894119895 V1198970119894119895
ge 0 forall119897 isin 119871 ge 0
lArrrArr sum
119897isin119871
(119902119897
1198941198951199061198970
119894119895minus 119902119897
119894119895V1198970119894119895
) ge 0
120572119897
119894119895minus 1199061198970
119894119895+ V1198970119894119895
ge 0 1199061198970
119894119895 V1198970119894119895
ge 0 forall119897 isin 119871
(25)
where the last equivalence comes from the strong dualitybetween these two linear programs
For the constraint sum119897isin119871
120572119897
119895119901119897
119894119895le 1 forall119902
119897
119894119895 isin 119875120598 the same
technique applies thus we complete the proof
32 Robust Objective Function In the RPC problem therobust objective function is defined by an innermaximizationproblem The following proposition shows that it can betransformed into a minimization problem over second ordercones To prove the following result we utilize the concept ofconjugate function 119889
lowast of the modified 1205942-distance
119889lowast
(119904) = sup119905ge0
119904119905 minus 119889 (119905) =[119904 + 2]
2
+
4minus 1 (26)
6 Mathematical Problems in Engineering
where the function [sdot]+is defined as [119909]
+= 119909 if 119909 ge
0 otherwise [119909]+
= 0 For more details about conjugatefunctions see [28]
Proposition 3 The following inner maximization problem
maxsum
119895isin119869
sum
119894isin119868
(1 minus 2119910119894119895
) sum
119897isin119871
120572119897
119895119902119897
119894119895+ |119868| 119902
119897
119894119895 isin 119875120598 (27)
is equivalent to a second order cone programming
min sum
119894isin119868
sum
119897isin119871
(120598120582119897
119894minus 120579119897
119894) + sum
119894isin119868
sum
119897isin119871
sum
119895isin119869
119901119897
119894119895119908119897
119894119895+ |119868|
st (
119908119897
119894119895
119911119897
119894119895
2120582119897
119894+ 119908119897
119894119895
) isin 1198713 forall119894 isin 119868 119895 isin 119869 119897 isin 119871
119906119897
119894119895= 120572119897
119895(1 minus 2119868
119894119895) + 120579119897
119894 forall119894 isin 119868 119897 isin 119871 119895 isin 119869
119911119897
119894119895ge119903119897
119894119895+2120582119897
119894 120582119897
119894119895 119911119897
119894119895ge0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871
(28)
where a second order cone 119871119899+1 is defined as
119871119899+1
=
119909 isin R119899+1
119909119899+1
ge radic
119899
sum
119894=1
1199092
119894
(29)
Proof For given feasible 120572 satisfying the robust constraints itis straightforward to show that the inner maximum problemis equal to the following minimization problem (MP)
(MP) min 119905
st 119905 ge sum
119895isin119869
sum
119894isin119868
(1 minus 2119910119894119895
) sum
119897isin119871
120572119897
119895119902119897
119894119895 + |119868|
forall 119902119897
119894119895 isin 119875120598
(30)
The above constraint can be further reduced to the followingconstraint
max
sum
119895isin119869
sum
119894isin119868
(1 minus 2119910119894119895
) sum
119897isin119871
120572119897
119895119902119897
119894119895
+ |119868| minus 119905 forall 119902119897
119894119895 isin 119875120598 le 0
(31)
By assigning Lagrange multipliers $\theta_i^l\in\mathbb{R}$ and $\lambda_i^l\in\mathbb{R}_+$ to the constraints in the left-hand optimization problem, we obtain the following Lagrange function:

$$
L(q,\theta,\lambda)=\sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)
+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}\left(r_{ij}^l q_{ij}^l-\lambda_i^l\,\frac{\bigl(q_{ij}^l-p_{ij}^l\bigr)^2}{p_{ij}^l}\right)+|I|-t,
\tag{32}
$$
where $r_{ij}^l=\alpha_j^l\bigl(1-2y_{ij}\bigr)+\theta_i^l$. Its dual function is given as

$$
\begin{aligned}
D(\theta,\lambda)&=\max_{q\ge 0}\,L(q,\theta,\lambda)\\
&=\sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)
+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}\max_{q_{ij}^l\ge 0}\left(r_{ij}^l q_{ij}^l-\lambda_i^l p_{ij}^l\left(\frac{q_{ij}^l-p_{ij}^l}{p_{ij}^l}\right)^{\!2}\right)+|I|-t\\
&=\sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)
+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l\max_{s\ge 0}\Bigl(r_{ij}^l s-\lambda_i^l(s-1)^2\Bigr)+|I|-t\\
&=\sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)
+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l\lambda_i^l\max_{s\ge 0}\left(\frac{r_{ij}^l}{\lambda_i^l}\,s-(s-1)^2\right)+|I|-t\\
&=\sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)
+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l\lambda_i^l\,d^{*}\!\left(\frac{r_{ij}^l}{\lambda_i^l}\right)+|I|-t,
\end{aligned}
\tag{33}
$$

where the inner variable has been substituted as $q_{ij}^l=p_{ij}^l s$.
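The last step of (33) uses the function $d^{*}(s)=\max_{s'\ge 0}\bigl(s\,s'-(s'-1)^2\bigr)$, whose closed form $-1+\tfrac{1}{4}[s+2]_+^2$ follows by maximizing the concave quadratic in $s'$. A small cross-check against brute force, with function names of our own choosing:

```python
def d_star(s):
    # closed form of max_{t>=0} (s*t - (t-1)^2):  -1 + 0.25*[s+2]_+^2
    plus = max(s + 2.0, 0.0)
    return -1.0 + 0.25 * plus * plus

def d_star_grid(s, steps=100000, t_max=40.0):
    # brute-force the inner maximization on a fine grid of t >= 0
    best = float("-inf")
    for k in range(steps + 1):
        t = t_max * k / steps
        best = max(best, s * t - (t - 1.0) ** 2)
    return best

for s in [-5.0, -2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(d_star(s) - d_star_grid(s)) < 1e-3
print("closed form matches brute force")
```

For $s\ge -2$ the unconstrained maximizer $t^{*}=1+s/2$ is feasible and gives $s+\tfrac{s^2}{4}=\tfrac{1}{4}(s+2)^2-1$; for $s<-2$ the maximum is attained at $t=0$ with value $-1$, which the $[\,\cdot\,]_+$ operator reproduces.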
Note that for any feasible $\alpha$, the primal maximization problem (31) is bounded and has a strictly feasible solution $p_{ij}^l$; thus there is no duality gap between (31) and the following dual problem:

$$
\min\; D(\theta,\lambda),\qquad \theta_i^l\in\mathbb{R},\;\lambda_i^l\in\mathbb{R}_+,\;\forall i\in I,\;l\in L
$$

$$
\Longleftrightarrow\qquad
\begin{aligned}
\min\quad & \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|-t\\
\text{s.t.}\quad & w_{ij}^l\ge \lambda_i^l\,d^{*}\!\left(\frac{r_{ij}^l}{\lambda_i^l}\right),\quad \forall i\in I,\;l\in L,\;j\in J,\\
& \theta_i^l\in\mathbb{R},\;\lambda_i^l\in\mathbb{R}_+,\quad \forall i\in I,\;l\in L.
\end{aligned}
\tag{34}
$$
Next, we show that the constraint involving the conjugate function can be represented by second order cone constraints:

$$
\begin{aligned}
\lambda_i^l\,d^{*}\!\left(\frac{r_{ij}^l}{\lambda_i^l}\right)\le w_{ij}^l
&\Longleftrightarrow\; \lambda_i^l\left(-1+\frac{1}{4}\left[\frac{r_{ij}^l}{\lambda_i^l}+2\right]_+^2\right)\le w_{ij}^l\\
&\Longleftrightarrow\; 4\lambda_i^l\bigl(\lambda_i^l+w_{ij}^l\bigr)\ge \bigl[r_{ij}^l+2\lambda_i^l\bigr]_+^2\\
&\Longleftrightarrow\; 4\lambda_i^l\bigl(\lambda_i^l+w_{ij}^l\bigr)\ge \bigl(z_{ij}^l\bigr)^2,\quad z_{ij}^l\ge 0,\;\; z_{ij}^l\ge r_{ij}^l+2\lambda_i^l
\end{aligned}
$$
Mathematical Problems in Engineering 7
$$
\Longleftrightarrow\;\bigl(w_{ij}^l,\;z_{ij}^l,\;2\lambda_i^l+w_{ij}^l\bigr)\in L^3,\quad z_{ij}^l\ge 0,\;\; z_{ij}^l\ge r_{ij}^l+2\lambda_i^l.
\tag{35}
$$
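For $\lambda>0$ the chain (35) can also be verified numerically: the smallest admissible $z$ is $\max(0,\,r+2\lambda)$, so the conic system is feasible exactly when the conjugate constraint holds. A minimal sketch (helper names are ours):

```python
import random

def conjugate_ok(lam, r, w):
    # lambda * d*(r/lambda) <= w, with d*(s) = -1 + 0.25*[s+2]_+^2
    s = r / lam
    return lam * (-1.0 + 0.25 * max(s + 2.0, 0.0) ** 2) <= w + 1e-9

def soc_ok(lam, r, w):
    # exists z >= 0, z >= r + 2*lam with (w, z, 2*lam + w) in L^3;
    # the smallest candidate z suffices, since larger z only tightens the cone
    z = max(0.0, r + 2.0 * lam)
    return (2.0 * lam + w >= 0.0
            and (2.0 * lam + w) ** 2 >= w * w + z * z - 1e-9)

random.seed(1)
for _ in range(20000):
    lam = random.uniform(0.01, 3.0)   # lambda > 0
    r = random.uniform(-5.0, 5.0)
    w = random.uniform(-1.0, 5.0)
    assert conjugate_ok(lam, r, w) == soc_ok(lam, r, w)
print("conic representation matches the conjugate constraint")
```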
By reinjecting the above constraints into (MP), the robust objective function is equivalent to the following problem:

$$
\begin{aligned}
\min\quad & t\\
\text{s.t.}\quad & \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|\le t,\\
& \bigl(w_{ij}^l,\;z_{ij}^l,\;2\lambda_i^l+w_{ij}^l\bigr)\in L^3,\quad \forall i\in I,\;j\in J,\;l\in L,\\
& z_{ij}^l\ge r_{ij}^l+2\lambda_i^l,\quad z_{ij}^l,\;\lambda_i^l\ge 0,\quad \forall i\in I,\;j\in J,\;l\in L,\\
& r_{ij}^l=\alpha_j^l\bigl(1-2y_{ij}\bigr)+\theta_i^l,\quad \forall i\in I,\;j\in J,\;l\in L.
\end{aligned}
\tag{36}
$$
By eliminating the variable $t$, we complete the proof.

Based on Lemma 2 and Proposition 3, we obtain our main result.
Proposition 4. The RPC problem can be solved as the following second order cone program:

$$
\begin{aligned}
\min\quad & \sum_{i\in I}\sum_{l\in L}\bigl(\epsilon\lambda_i^l-\theta_i^l\bigr)+\sum_{i\in I}\sum_{l\in L}\sum_{j\in J}p_{ij}^l w_{ij}^l+|I|\\
\text{s.t.}\quad & \bigl(w_{ij}^l,\;z_{ij}^l,\;2\lambda_i^l+w_{ij}^l\bigr)\in L^3,\quad \forall i\in I,\;j\in J,\;l\in L,\\
& r_{ij}^l=\alpha_j^l\bigl(1-2y_{ij}\bigr)+\theta_i^l,\quad \forall i\in I,\;j\in J,\;l\in L,\\
& z_{ij}^l\ge r_{ij}^l+2\lambda_i^l,\quad \forall i\in I,\;j\in J,\;l\in L,\\
& \sum_{l\in L}\bigl(q_{ij}^l u_{ij}^{l0}-q_{ij}^l v_{ij}^{l0}\bigr)\ge 0,\quad \forall i\in I,\;j\in J,\\
& 1+\sum_{l\in L}\bigl(q_{ij}^l u_{ij}^{l1}-q_{ij}^l v_{ij}^{l1}\bigr)\ge 0,\quad \forall i\in I,\;j\in J,\\
& \alpha_j^l-u_{ij}^{l0}+v_{ij}^{l0}\ge 0,\quad \forall i\in I,\;j\in J,\;l\in L,\\
& v_{ij}^{l1}-\alpha_j^l-u_{ij}^{l1}\ge 0,\quad \forall i\in I,\;j\in J,\;l\in L,\\
& \lambda_i^l,\;z_{ij}^l,\;u_{ij}^{l1},\;v_{ij}^{l1},\;u_{ij}^{l0},\;v_{ij}^{l0}\ge 0,\quad \forall i\in I,\;j\in J,\;l\in L,\\
& r_{ij}^l,\;\theta_i^l,\;w_{ij}^l,\;\alpha_j^l\in\mathbb{R},\quad \forall i\in I,\;j\in J,\;l\in L.
\end{aligned}
\tag{37}
$$
4. Numerical Experiments on Real-World Applications
In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.
All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second order cone programs in our proposed method and in the regularized SVM.
4.1. Data Sets. Lithology classification is one of the basic tasks of geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as Gamma coefficients, acoustic wave, striation density, and fusibility.
Here, numerical experiments are carried out on a series of data sets from the boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, there are 12 data sets used for binary classification problems and 8 data sets used for multiple classification problems. Each data set is randomly partitioned into two subsets, a training set and a test set, based on a prespecified training rate $\gamma\in[0,1]$, such that the size of the training set accounts for a fraction $\gamma$ of the total number of samples.
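The random partition at training rate $\gamma$ can be sketched as follows; this is a generic illustration, not the authors' Matlab code:

```python
import random

def partition(samples, gamma, seed=None):
    """Randomly split `samples` so the training set holds a fraction gamma."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = int(round(gamma * len(samples)))
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

train, test = partition(list(range(100)), 0.7, seed=42)
print(len(train), len(test))  # 70 30
```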
4.2. Experiment Design. The parameters in our models are chosen based on the size of the data set. The parameter $\epsilon$ depends on the number of classes and is defined as $\epsilon=\delta^2/|J|$, where $\delta\in(0,1)$. The choice of $\epsilon$ can be explained in this way: if there are $|J|$ classes and the training data are uniformly distributed, then for each probability $p_{ij}^l=1/|J|$, its maximal variation range is between $p_{ij}^l(1-\delta)$ and $p_{ij}^l(1+\delta)$. The number of data intervals $K_l$ is defined as $K_l=|I|/(|J|\times K)$, such that if the training data are uniformly distributed, then in each data interval there are $K$ samples in each class. In the following context, we set $\delta=0.2$ and $K=8$.
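For concreteness, the two parameter rules above can be computed directly; the sample count below is hypothetical and `rpc_parameters` is our own name:

```python
def rpc_parameters(n_samples, n_classes, delta=0.2, K=8):
    # ambiguity radius epsilon = delta^2 / |J| and number of data
    # intervals K_l = |I| / (|J| * K), as chosen in the experiments
    eps = delta ** 2 / n_classes
    K_l = n_samples // (n_classes * K)
    return eps, K_l

eps, K_l = rpc_parameters(n_samples=960, n_classes=2)
print(round(eps, 6), K_l)  # 0.02 60
```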
We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (take the $j$th class for example):

$$
\text{(RSVM)}\quad
\begin{aligned}
\min\quad & \sum_{i\in I}\xi_{ij}+\lambda_j\bigl\|w_j\bigr\|\\
\text{s.t.}\quad & \bar{y}_{ij}\left(\sum_{l\in L}w_j^l x_i^l+b_j\right)\ge 1-\xi_{ij},\quad i\in I,\\
& \xi_{ij}\ge 0,\quad i\in I,
\end{aligned}
\tag{38}
$$

where $\bar{y}_{ij}=2y_{ij}-1$ and $\lambda_j\ge 0$ is a regularization parameter. As pointed out by [8], $\lambda_j$ represents a trade-off between the number of training set errors and the amount of robustness with respect to spherical perturbations of the data points.
Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set (* marks the best test-set accuracy).

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 90.7 | 88.2 | 63.9 | 66.2 | 88.4 | 90.5*
55 | 89.9 | 88.6 | 69.1 | 72.8 | 89.5 | 89.9*
60 | 89.0 | 85.0 | 70.3 | 72.1 | 91.3 | 86.4*
65 | 86.3 | 85.9 | 72.1 | 72.8 | 88.0 | 92.5*
70 | 92.3 | 84.1 | 70.3 | 75.7 | 90.8 | 86.3*
75 | 88.8 | 87.9 | 74.2 | 74.6 | 88.7 | 91.6*
80 | 88.7 | 93.8* | 90.0 | 87.5 | 88.3 | 93.3
85 | 89.5 | 89.3 | 93.4 | 89.6 | 89.2 | 91.0*
90 | 89.5 | 88.4 | 93.3 | 95.8* | 89.2 | 92.6
Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set (* marks the best test-set accuracy).

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50 | 91.4 | 84.8 | 76.5 | 68.9 | 91.3 | 87.5*
55 | 92.5 | 86.6 | 68.0 | 77.0 | 92.0 | 90.3*
60 | 89.8 | 86.1 | 72.9 | 73.8 | 88.9 | 90.9*
65 | 91.0 | 82.3 | 80.5 | 81.6 | 89.8 | 92.9*
70 | 86.8 | 95.5* | 83.4 | 89.8 | 88.4 | 93.7
75 | 89.4 | 85.2 | 85.9 | 79.5 | 89.7 | 93.5*
80 | 91.8 | 80.8 | 88.1 | 79.9 | 89.7 | 91.1*
85 | 88.3 | 89.9 | 89.9 | 92.8 | 90.8 | 97.1*
90 | 88.5 | 90.2 | 88.8 | 94.2 | 90.9 | 97.2*
To make a fair comparison, in the following experiments we test a series of $\lambda$ values and choose the one with the best performance. Note that if $\lambda_j=0$, we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.
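The paper solves (38) as a second order cone program via SDPT3. Purely as an illustration of the same hinge loss plus norm penalty objective, the following subgradient sketch runs on a toy one-dimensional data set; all names, step sizes, and data here are our own choices, not the authors' implementation:

```python
import numpy as np

def rsvm_subgradient(X, y, lam=0.1, lr=0.01, epochs=300):
    # minimize sum_i max(0, 1 - y_i (w.x_i + b)) + lam * ||w||, y_i in {-1,+1}
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # points inside the hinge region
        gw = -(y[active, None] * X[active]).sum(axis=0)
        gb = -float(y[active].sum())
        nrm = np.linalg.norm(w)
        if nrm > 0:
            gw = gw + lam * w / nrm               # subgradient of lam*||w||
        w -= lr * gw
        b -= lr * gb
    return w, b

X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b = rsvm_subgradient(X, y)
assert np.all(np.sign(X @ w + b) == y)            # toy data is separated
```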
4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for the binary classification problems using the cross-validation method. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].
Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set based on the parameter tr, which varies from 50% to 90%. The best classification accuracy on a test set among the three methods is marked with an asterisk.
Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, for most of the cases, RSVM has the highest classification accuracy on training sets, but its performance on test sets is unsatisfactory. For most of the cases, the proposed RPC provides the highest classification accuracy on test sets. NBC provides better performance on test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can provide better performance on test sets than on training sets; thus it can avoid the "overlearning" phenomenon.
To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, that is, T41–T45 and T61–T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As shown in Tables 1 and 2, the robustness of the proposed RPC guarantees its generalization ability on the test sets.
4.4. Test on Multiple Classification. In this subsection, we test the performance of RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter $\lambda$, we run a set of RSVMs with $\lambda$ varying from 0 to a sufficiently large number and select the one with the best performance on test sets.
Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separating hyperplane for binary classification problems.
Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on other data sets when tr = 70% (* marks the best test-set accuracy).

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
T41 | 62.0 | 59.7 | 82.4 | 78.5 | 77.9 | 83.5*
T42 | 87.0 | 82.2 | 84.1 | 83.1 | 80.5 | 85.3*
T43 | 68.0 | 61.2 | 80.2 | 75.4 | 85.5 | 86.9*
T44 | 91.3 | 83.9 | 77.9 | 86.8 | 88.8 | 90.5*
T45 | 86.5 | 87.0 | 93.2 | 91.0* | 84.0 | 89.1
T61 | 80.6 | 79.0 | 80.5 | 83.0 | 83.6 | 87.8*
T62 | 71.4 | 66.5 | 86.9 | 85.4* | 86.3 | 85.4*
T63 | 63.7 | 69.5 | 89.6 | 89.1* | 82.2 | 84.4
T64 | 88.2 | 86.7 | 97.0 | 96.9* | 93.4 | 95.5
T65 | 75.0 | 63.4 | 79.7 | 81.5 | 90.5 | 92.9*
Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set (* marks the best test-set accuracy).

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
M1 | 65.4 | 68.2 | 72.7 | 73.7 | 79.1 | 77.4*
M2 | 76.9 | 75.3 | 82.6 | 74.8 | 81.7 | 80.9*
M3 | 57.9 | 69.9 | 74.8 | 87.4 | 95.4 | 92.0*
M4 | 70.4 | 64.1 | 97.1 | 92.3 | 95.4 | 92.3*
M5 | 77.4 | 71.3 | 89.4 | 88.1* | 92.0 | 88.0
M6 | 75.7 | 70.5 | 74.1 | 79.4 | 86.4 | 80.8*
[Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set; accuracy on the training set (%) versus training rate.]
RPC, in contrast, is more robust when extended to solve multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.
[Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set; accuracy on the test set (%) versus training rate.]

Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that for most of the cases, RPC provides the highest accuracy among the three methods. The accuracy of RSVM outperforms that of NBC on the Y5 test set, while the latter outperforms the former on the T1 test set.
To further test the performance of RPC on multiple classification problems, we carry out more experiments on data sets M1–M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always
[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set; accuracy on the training set (%) versus training rate.]
[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set; accuracy on the test set (%) versus training rate.]
provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).
From the tested real-life application, we conclude that the proposed RPC has the robustness to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for the binary classification problems.
5. Conclusion
In this paper, we propose a robust probability classifier model to address the data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified $\chi^2$-distance. We assume that the true distribution lies in the constructed distributional set centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all the possible distributions belonging to the distributional set.

Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone program based on the conic duality theorem. Thus, our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also has a promising performance for multiple classification problems.

There are still many important extensions of our model. Other forms of loss function, such as the mean squared error function and the hinge loss function, should be studied to obtain tractable reformulations, and the resulting models may provide better performances. Probability models considering joint probability distribution information are also an interesting research direction.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223–228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209–226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization—methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453–480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801–807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555–582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558–563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253–1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175–1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191–201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123–136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691–710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85–90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.
[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3—a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341–357, 2013.
6 Mathematical Problems in Engineering
where the function [sdot]+is defined as [119909]
+= 119909 if 119909 ge
0 otherwise [119909]+
= 0 For more details about conjugatefunctions see [28]
Proposition 3 The following inner maximization problem
maxsum
119895isin119869
sum
119894isin119868
(1 minus 2119910119894119895
) sum
119897isin119871
120572119897
119895119902119897
119894119895+ |119868| 119902
119897
119894119895 isin 119875120598 (27)
is equivalent to a second order cone programming
min sum
119894isin119868
sum
119897isin119871
(120598120582119897
119894minus 120579119897
119894) + sum
119894isin119868
sum
119897isin119871
sum
119895isin119869
119901119897
119894119895119908119897
119894119895+ |119868|
st (
119908119897
119894119895
119911119897
119894119895
2120582119897
119894+ 119908119897
119894119895
) isin 1198713 forall119894 isin 119868 119895 isin 119869 119897 isin 119871
119906119897
119894119895= 120572119897
119895(1 minus 2119868
119894119895) + 120579119897
119894 forall119894 isin 119868 119897 isin 119871 119895 isin 119869
119911119897
119894119895ge119903119897
119894119895+2120582119897
119894 120582119897
119894119895 119911119897
119894119895ge0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871
(28)
where a second order cone 119871119899+1 is defined as
119871119899+1
=
119909 isin R119899+1
119909119899+1
ge radic
119899
sum
119894=1
1199092
119894
(29)
Proof For given feasible 120572 satisfying the robust constraints itis straightforward to show that the inner maximum problemis equal to the following minimization problem (MP)
(MP) min 119905
st 119905 ge sum
119895isin119869
sum
119894isin119868
(1 minus 2119910119894119895
) sum
119897isin119871
120572119897
119895119902119897
119894119895 + |119868|
forall 119902119897
119894119895 isin 119875120598
(30)
The above constraint can be further reduced to the followingconstraint
max
sum
119895isin119869
sum
119894isin119868
(1 minus 2119910119894119895
) sum
119897isin119871
120572119897
119895119902119897
119894119895
+ |119868| minus 119905 forall 119902119897
119894119895 isin 119875120598 le 0
(31)
By assigning Lagrange multipliers 120579119897
119894isin R and 120582
119897
119894isin R+
to the constraints in the left optimization problem we obtainthe following Lagrange function
119871 (119902 120579 120582) = sum
119894isin119868
sum
119897isin119871
(120598120582119897
119894minus 120579119897
119894)
+ sum
119894isin119868
sum
119897isin119871
sum
119895isin119869
(119903119897
119894119895119902119897
119894119895minus 120582119897
119894
(119902119897
119894119895minus 119901119897
119894119895)2
119901119897
119894119895
)
+ |119868| minus 119905
(32)
where 119903119897
119894119895= 120572119897
119895(1 minus 2119910
119894119895) + 120579119897
119894 Its dual function is given as
119863 (120579 120582) = max119902ge0
119871 (119905 119902 120579 120582)
= sum
119894isin119868
sum
119897isin119871
(120598120582119897
119894minus 120579119897
119894)
+ sum
119894isin119868
sum
119897isin119871
sum
119895isin119869
max119902119897
119894119895ge0
(119903119897
119894119895119902119897
119894119895minus 120582119897
119894119901119897
119894119895(
119902119897
119894119895minus 119901119897
119894119895
119901119897
119894119895
)
2
)
+ |119868| minus 119905
= sum
119894isin119868
sum
119897isin119871
(120598120582119897
119894minus 120579119897
119894)
+ sum
119894isin119868
sum
119897isin119871
sum
119895isin119869
119901119897
119894119895max119905ge0
(119903119897
119894119895119905 minus 120582119897
119894(119905 minus 1)
2) + |119868| minus 119905
= sum
119894isin119868
sum
119897isin119871
(120598120582119897
119894minus 120579119897
119894)
+ sum
119894isin119868
sum
119897isin119871
sum
119895isin119869
119901119897
119894119895120582119897
119894max119905ge0
(
119903119897
119894119895
120582119897
119894
119905 minus (119905 minus 1)2) + |119868| minus 119905
= sum
119894isin119868
sum
119897isin119871
(120598120582119897
119894minus 120579119897
119894)
+ sum
119894isin119868
sum
119897isin119871
sum
119895isin119869
119901119897
119894119895120582119897
119894119889lowast
(
119903119897
119894119895
120582119897
119894
) + |119868| minus 119905
(33)
Note that for any feasible 120572 the primal maximizationproblem (31) is bounded and has a strictly feasible solution119901119897
119894119895 thus there is no duality gap between (31) and the
following dual problem
min 119863 (120579 120582) 120579119897
119894isin R 120582
119897
119894isin R+ forall119894 isin 119868 119897 isin 119871
lArrrArr
min sum
119894isin119868
sum
119897isin119871
(120598120582119897
119894minus 120579119897
119894) + sum
119894isin119868
sum
119897isin119871
sum
119895isin119869
119901119897
119894119895119908119897
119894119895+ |119868| minus 119905
st 119908119897
119894119895ge120582119897
119894119889lowast
(
119903119897
119894119895
120582119897
119894
) forall119894isin119868 119897isin119871 119895isin119869
120579119897
119894isin R 120582
119897
119894isin R+ forall119894 isin 119868 119897 isin 119871
(34)
Next we show that the constraint about the conjugate func-tion can be represented by second order cone constraints
120582119897
119894119889lowast
(
119903119897
119894119895
120582119897
119894
) le 119908119897
119894119895lArrrArr 120582
119897
119894(minus1 +
1
4
[
[
119903119897
119894119895
120582119897
119894
+ 2]
]
2
+
) le 119908119897
119894119895
lArrrArr 4120582119897
119894(120582119897
119894+ 119908119897
119894119895) ge [119903
119897
119894119895+ 2120582119897
119894]2
+
lArrrArr 4120582119897
119894(120582119897
119894+ 119908119897
119894119895) ge (119911
119897
119894119895)2
119911119897
119894119895ge 0 119911
119897
119894119895ge 119903119897
119894119895+ 2120582119897
119894
Mathematical Problems in Engineering 7
lArrrArr (
119908119897
119894119895
119911119897
119894119895
2120582119897
119894+ 119908119897
119894119895
) isin 1198713
119911119897
119894119895ge 0 119911
119897
119894119895ge 119903119897
119894119895+ 2120582119897
119894
(35)
By reinjecting the above constraints into (MP) the robustobjective function is equivalent to the following problem
min 119905
st sum
119894isin119868
sum
119897isin119871
(120598120582119897
119894minus 120579119897
119894) + sum
119894isin119868
sum
119897isin119871
sum
119895isin119869
119901119897
119894119895119908119897
119894119895+ |119868| le 119905
(
119908119897
119894119895
119911119897
119894119895
2120582119897
119894+ 119908119897
119894119895
) isin 1198713
119911119897
119894119895ge119903119897
119894119895+2120582119897
119894 119911119897
119894119895 120582119897
119894119895ge0 forall119894 isin 119868 119895 isin 119869 119897 isin 119871
119906119897
119894119895= 120572119897
119895(1 minus 2119868
119894119895) + 120579119897
119894 forall119894 isin 119868 119897 isin 119871 119895 isin 119869
(36)
By eliminating variable 119905 we complete the proof
Based on Lemma 2 and Proposition 3, we obtain our main result.
Proposition 4. The RPC problem can be solved as the following second-order cone program:
$$\begin{aligned}
\min\ & \sum_{i\in I}\sum_{l\in L}\left(\epsilon\lambda_i^l - \theta_i^l\right) + \sum_{i\in I}\sum_{l\in L}\sum_{j\in J} p_{ij}^l w_{ij}^l + |I|\\
\text{s.t.}\ & \left(w_{ij}^l,\ z_{ij}^l,\ 2\lambda_i^l + w_{ij}^l\right)^\top \in L^3, \quad \forall i \in I,\ j \in J,\ l \in L,\\
& r_{ij}^l = \alpha_j^l\left(1 - 2I_{ij}\right) + \theta_i^l, \quad \forall i \in I,\ j \in J,\ l \in L,\\
& z_{ij}^l \ge r_{ij}^l + 2\lambda_i^l, \quad \forall i \in I,\ j \in J,\ l \in L,\\
& \sum_{l\in L}\left(q_{ij}^l u_{ij}^{l0} - q_{ij}^l v_{ij}^{l0}\right) \ge 0, \quad \forall i \in I,\ j \in J,\\
& 1 + \sum_{l\in L}\left(q_{ij}^l u_{ij}^{l1} - q_{ij}^l v_{ij}^{l1}\right) \ge 0, \quad \forall i \in I,\ j \in J,\\
& \alpha_j^l - u_{ij}^{l0} + v_{ij}^{l0} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L,\\
& v_{ij}^{l1} - \alpha_j^l - u_{ij}^{l1} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L,\\
& \lambda_i^l,\ z_{ij}^l,\ u_{ij}^{l1},\ v_{ij}^{l1},\ u_{ij}^{l0},\ v_{ij}^{l0} \ge 0, \quad \forall i \in I,\ j \in J,\ l \in L,\\
& r_{ij}^l,\ \theta_i^l,\ w_{ij}^l,\ \alpha_j^l \in \mathbb{R}, \quad \forall i \in I,\ j \in J,\ l \in L. \quad (37)
\end{aligned}$$
4. Numerical Experiments on Real-World Applications
In this section, numerical experiments on real-world applications are carried out to verify the effectiveness of the proposed robust probability classifier model. Specifically, we consider lithology classification data sets from our practical application. We compare our model with the regularized SVM (RSVM) and the naive Bayes classifier (NBC) on both binary and multiple classification problems.
All the numerical experiments are implemented in Matlab 7.7.0 and run on an Intel(R) Core(TM) i5-4570 CPU. The SDPT3 solver [27] is called to solve the second-order cone programs in our proposed method and the regularized SVM.
4.1. Data Sets. Lithology classification is one of the basic tasks in geological investigation. To discriminate the lithology of the underground strata, various electromagnetic techniques are applied to the same strata to obtain different features, such as gamma coefficients, acoustic wave, striation density, and fusibility.
Here, numerical experiments are carried out on a series of data sets from boreholes T1, Y4, Y5, and Y6. All boreholes are located in the Tarim Basin, China. In total, 12 data sets are used for binary classification problems and 8 data sets for multiple classification problems. Each data set is randomly partitioned, based on a prespecified training rate γ ∈ [0, 1], into a training set and a test set such that the training set accounts for a fraction γ of the total number of samples.
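The partition step above can be sketched as follows (a minimal Python/NumPy sketch, not the authors' code; the function name and seed handling are our own assumptions):

```python
import numpy as np

def split_by_training_rate(X, y, gamma, seed=0):
    """Randomly partition (X, y) into training/test subsets so that the
    training set accounts for a fraction gamma of all samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(round(gamma * len(y)))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], y[train], X[test], y[test]

# Example: gamma = 0.7 on 10 samples -> 7 training samples, 3 test samples.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)
Xtr, ytr, Xte, yte = split_by_training_rate(X, y, 0.7)
```

Averaging results over several random seeds reproduces the "10 randomly generated instances" protocol used in the tables below.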
4.2. Experiment Design. The parameters in our model are chosen based on the size of the data set. The parameter $\epsilon$ depends on the number of classes and is defined as $\epsilon = \delta/(2|J|)$, where $\delta \in (0, 1)$. The choice of $\epsilon$ can be explained as follows: if there are $|J|$ classes and the training data are uniformly distributed, then for each probability $p_{ij}^l = 1/|J|$, its maximal variation range is between $p_{ij}^l(1-\delta)$ and $p_{ij}^l(1+\delta)$. The number of data intervals $K_l$ is defined as $K_l = |I|/(|J| \times K)$, so that if the training data are uniformly distributed, each data interval contains $K$ samples of each class. In the following, we set $\delta = 0.2$ and $K = 8$.
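These parameter choices can be written out directly (a sketch following the stated formulas; the function and variable names are ours):

```python
def model_parameters(n_samples, n_classes, delta=0.2, K=8):
    """eps = delta / (2|J|) sets the radius of the distributional set;
    K_l = |I| / (|J| * K) is the number of data intervals, so that a
    uniformly distributed training set places K samples of each class
    in every interval."""
    eps = delta / (2 * n_classes)
    K_l = n_samples // (n_classes * K)  # integer number of intervals
    return eps, K_l

# Example: |I| = 320 samples, |J| = 2 classes, delta = 0.2, K = 8.
eps, K_l = model_parameters(320, 2)  # eps = 0.05, K_l = 20
```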
We compare the performance of the proposed RPC model with the following regularized support vector machine model [6] (taking the $j$th class as an example):
$$\begin{aligned}
\text{(RSVM)}\quad \min\ & \sum_{i\in I}\xi_{ij} + \lambda_j\left\|w_j\right\|\\
\text{s.t.}\ & \hat{y}_{ij}\left(\sum_{l\in L} w_j^l x_i^l + b_j\right) \ge 1 - \xi_{ij}, \quad i \in I,\\
& \xi_{ij} \ge 0, \quad i \in I, \quad (38)
\end{aligned}$$
where $\hat{y}_{ij} = 2y_{ij} - 1$ and $\lambda_j \ge 0$ is a regularization parameter. As pointed out in [8], $\lambda_j$ represents a trade-off between the number of training set errors and the amount of robustness
Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set.

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50     | 90.7 | 88.2  | 63.9 | 66.2  | 88.4 | 90.5*
55     | 89.9 | 88.6  | 69.1 | 72.8  | 89.5 | 89.9*
60     | 89.0 | 85.0  | 70.3 | 72.1  | 91.3 | 86.4*
65     | 86.3 | 85.9  | 72.1 | 72.8  | 88.0 | 92.5*
70     | 92.3 | 84.1  | 70.3 | 75.7  | 90.8 | 86.3*
75     | 88.8 | 87.9  | 74.2 | 74.6  | 88.7 | 91.6*
80     | 88.7 | 93.8* | 90.0 | 87.5  | 88.3 | 93.3
85     | 89.5 | 89.3  | 93.4 | 89.6  | 89.2 | 91.0*
90     | 89.5 | 88.4  | 93.3 | 95.8* | 89.2 | 92.6
Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set.

tr (%) | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
50     | 91.4 | 84.8  | 76.5 | 68.9 | 91.3 | 87.5*
55     | 92.5 | 86.6  | 68.0 | 77.0 | 92.0 | 90.3*
60     | 89.8 | 86.1  | 72.9 | 73.8 | 88.9 | 90.9*
65     | 91.0 | 82.3  | 80.5 | 81.6 | 89.8 | 92.9*
70     | 86.8 | 95.5* | 83.4 | 89.8 | 88.4 | 93.7
75     | 89.4 | 85.2  | 85.9 | 79.5 | 89.7 | 93.5*
80     | 91.8 | 80.8  | 88.1 | 79.9 | 89.7 | 91.1*
85     | 88.3 | 89.9  | 89.9 | 92.8 | 90.8 | 97.1*
90     | 88.5 | 90.2  | 88.8 | 94.2 | 90.9 | 97.2*
with respect to spherical perturbations of the data points. To make a fair comparison, in the following experiments we test a series of λ values and choose the one with the best performance. Note that when λ_j = 0, we refer to this model as the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.
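For concreteness (our sketch, not the authors' code), the RSVM objective in (38) can be evaluated for a candidate hyperplane $(w_j, b_j)$ by computing the hinge slacks directly; this is the quantity the λ sweep compares across candidate values:

```python
import numpy as np

def rsvm_objective(w, b, X, y01, lam):
    """Evaluate sum_i xi_ij + lam * ||w_j|| from (38) for a candidate
    hyperplane (w, b). y01 holds 0/1 class labels; yhat = 2*y - 1 maps
    them to -1/+1. The slack of sample i is
    xi_i = max(0, 1 - yhat_i * (w . x_i + b))."""
    yhat = 2 * np.asarray(y01) - 1
    margins = yhat * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)
    return xi.sum() + lam * np.linalg.norm(w)

# Example: both points lie outside the margin, so all slacks are zero
# and the objective reduces to the regularization term lam * ||w||.
X = np.array([[2.0], [-2.0]])
obj = rsvm_objective(np.array([1.0]), 0.0, X, np.array([1, 0]), 0.5)  # 0.5
```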
4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for binary classification problems using cross-validation methods. To improve the performance of RSVM, we transform the original data by the popularly used polynomial kernels [6].
Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set based on the parameter tr, which varies from 0.5 to 0.9. The highest classification accuracy on a training set among the three methods is highlighted in bold, while the best classification accuracy on a test set is marked with an asterisk.
Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, in most cases RSVM attains the highest classification accuracy on training sets, but its performance on test sets is unsatisfactory. In most cases, the proposed RPC provides the highest classification accuracy on test sets. NBC performs better on test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can perform better on test sets than on training sets; thus it can avoid the "overlearning" phenomenon.
To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, that is, T41–T45 and T61–T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As in Tables 1 and 2, the robustness of the proposed RPC guarantees its generalization ability on the test sets.
4.4. Test on Multiple Classification. In this subsection, we test the performance of RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter λ, we run a set of RSVMs with λ varying from 0 to a sufficiently large number and select the one with the best performance on the test sets.
Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separation hyperplane
Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on the other data sets when tr = 70%.

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
T41      | 62.0 | 59.7 | 82.4 | 78.5  | 77.9 | 83.5*
T42      | 87.0 | 82.2 | 84.1 | 83.1  | 80.5 | 85.3*
T43      | 68.0 | 61.2 | 80.2 | 75.4  | 85.5 | 86.9*
T44      | 91.3 | 83.9 | 77.9 | 86.8  | 88.8 | 90.5*
T45      | 86.5 | 87.0 | 93.2 | 91.0* | 84.0 | 89.1
T61      | 80.6 | 79.0 | 80.5 | 83.0  | 83.6 | 87.8*
T62      | 71.4 | 66.5 | 86.9 | 85.4* | 86.3 | 85.4*
T63      | 63.7 | 69.5 | 89.6 | 89.1* | 82.2 | 84.4
T64      | 88.2 | 86.7 | 97.0 | 96.9* | 93.4 | 95.5
T65      | 75.0 | 63.4 | 79.7 | 81.5  | 90.5 | 92.9*
Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on the T1 data set.

Data set | RSVM Train (%) | RSVM Test (%) | NBC Train (%) | NBC Test (%) | RPC Train (%) | RPC Test (%)
M1       | 65.4 | 68.2 | 72.7 | 73.7  | 79.1 | 77.4*
M2       | 76.9 | 75.3 | 82.6 | 74.8  | 81.7 | 80.9*
M3       | 57.9 | 69.9 | 74.8 | 87.4  | 95.4 | 92.0*
M4       | 70.4 | 64.1 | 97.1 | 92.3  | 95.4 | 92.3*
M5       | 77.4 | 71.3 | 89.4 | 88.1* | 92.0 | 88.0
M6       | 75.7 | 70.5 | 74.1 | 79.4  | 86.4 | 80.8*
[Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set (accuracy on the training set versus training rate).]
for binary classification problems, while RPC is more robust when extended to multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.
Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that for most
[Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set (accuracy on the test set versus training rate).]
of the cases RPC provides the highest accuracy among the three methods. The accuracy of RSVM outperforms that of NBC on the Y5 test set, while the latter outperforms the former on the T1 test set.

To further test the performance of RPC on multiple classification problems, we carry out more experiments on data sets M1–M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always
[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set (accuracy on the training set versus training rate).]
[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set (accuracy on the test set versus training rate).]
provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).
From the tested real-life application, we conclude that the proposed RPC is robust enough to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.
5. Conclusion
In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems. To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified χ²-distance. We assume that the true distribution lies in the constructed distributional set centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error value over all possible distributions belonging to the distributional set.
Our proposed model introduces the recently developed distributionally robust optimization method into classifier design problems. To obtain a computable model, we transform the resulting optimization problem into an equivalent second-order cone program based on the conic duality theorem. Thus, our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy compared with RSVM and NBC by avoiding overlearning on training sets for binary classification problems; on the other hand, it also shows promising performance for multiple classification problems.
There are still many important extensions of our model. Other forms of loss function, such as the mean squared error function and hinge loss functions, should be studied to obtain tractable reformulations, and the resulting models may provide better performance. Probability models considering joint probability distribution information are also an interesting research direction.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223–228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209–226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization: methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453–480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801–807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555–582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558–563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253–1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175–1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191–201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123–136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691–710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85–90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.
[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3, a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341–357, 2013.
There are still many important extensions in our modelOther forms of loss function such as the mean squarederror function and Hinge loss functions should be studied toobtain tractable reformulations and the resulted models mayprovide better performances Probability models consideringjoint probability distribution information are also interestingresearch directions
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] R O Duda and P E Hart Pattern Classification and SceneAnalysis John Wiley amp Sons New York NY USA 1973
[2] P Langley W Iba and K Thompson ldquoAn analysis of Bayesianclassifiersrdquo in Proceedings of the 10th National Conference onArtificial Intelligence (AAAI rsquo92) vol 90 pp 223ndash228 AAAIPress Menlo Park Calif USA July 1992
[3] B D Ripley Pattern Recognition and Neural Networks Cam-bridge University Press Cambridge UK 2007
[4] V Vapnik The Nature of Statistical Learning Theory SpringerBerlin Germany 2000
[5] M Ramoni and P Sebastiani ldquoRobust Bayes classifiersrdquo Artifi-cial Intelligence vol 125 no 1-2 pp 209ndash226 2001
[6] Y Shi Y Tian G Kou and Y Peng ldquoRobust support vectormachinesrdquo in Optimization Based Data Mining Theory andApplications Springer London UK 2011
[7] Y Z Wang Y L Zhang F L Zhang and J N Yi ldquoRobustquadratic regression and its application to energy-growth con-sumption problemrdquoMathematical Problems in Engineering vol2013 Article ID 210510 10 pages 2013
Mathematical Problems in Engineering 11
[8] A Ben-Tal L El Ghaoui and A Nemirovski Robust Optimiza-tion Princeton University Press Princeton NJ USA 2009
[9] A Ben-Tal and A Nemirovski ldquoRobust optimizationmdashmethodology and applicationsrdquo Mathematical Programmingvol 92 no 3 pp 453ndash480 2002
[10] D Bertsimas D B Brown and C Caramanis ldquoTheory andapplications of robust optimizationrdquo SIAM Review vol 53 no3 pp 464ndash501 2011
[11] G R G Lanckriet L E Ghaoui C Bhattacharyya and M IJordan ldquoMinimax probability machinerdquo in Advances in NeuralInformation Processing Systems pp 801ndash807 2001
[12] G R G Lanckriet L El Ghaoui C Bhattacharyya and M IJordan ldquoA robust minimax approach to classificationrdquo Journalof Machine Learning Research vol 3 no 3 pp 555ndash582 2003
[13] L El Ghaoui G R G Lanckriet and G Natsoulis ldquoRobustclassification with interval datardquo Tech Rep UCBCSD-03-1279Computer Science Division University of California 2003
[14] K Huang H Yang I King andM R Lyu ldquoLearning classifiersfrom imbalanced data based on biased minimax probabilitymachinerdquo in Proceedings of the IEEE Computer Society Confer-ence on Computer Vision and Pattern Recognition (CVPR rsquo04)vol 2 pp 558ndash563 IEEE July 2004
[15] K Huang H Yang I King M R Lyu and L Chan ldquoTheminimum error minimax probability machinerdquo The Journal ofMachine Learning Research vol 5 pp 1253ndash1286 2004
[16] C-H Hoi and M R Lyu ldquoRobust face recognition usingminimax probability machinerdquo in Proceedings of the IEEEInternational Conference on Multimedia and Expo (ICME rsquo04)vol 2 pp 1175ndash1178 June 2004
[17] T Kitahara S Mizuno and K Nakata ldquoQuadratic and convexminimax classification problemsrdquo Journal of the OperationsResearch Society of Japan vol 51 no 2 pp 191ndash201 2008
[18] T Kitahara S Mizuno and K Nakata ldquoAn extension of aminimax approach to multiple classificationrdquo Journal of theOperations Research Society of Japan vol 50 no 2 pp 123ndash1362007
[19] D Klabjan D Simchi-Levi and M Song ldquoRobust stochasticlot-sizing by means of histogramsrdquo Production and OperationsManagement vol 22 no 3 pp 691ndash710 2013
[20] L V Utkin ldquoA framework for imprecise robust one-class classi-fication modelsrdquo International Journal of Machine Learning andCybernetics 2012
[21] N Cristianini and J Shawe-Taylor An Introduction to SupportVector Machines and Other Kernel-Based Learning MethodsCambridge University Press Cambridge UK 2000
[22] B Scholkopf and A J Smola Learning with Kernels The MITPress Cambridge UK 2002
[23] T Hastie R Tibshirani and J J H Friedman The Elements ofStatistical Learning Springer New York NY USA 2001
[24] L A Zadeh ldquoA simple view of the Dempster-Shafer theory ofevidence and its implication for the rule of combinationrdquo AIMagazine vol 7 no 2 pp 85ndash90 1986
[25] R Yager M Fedrizzi and J Kacprzyk Advances in theDempster-Shafer Theory of Evidence John Wiley amp Sons NewYork NY USA 1994
[26] J F Sturm ldquoUsing SeDuMi 102 a MATLAB toolbox foroptimization over symmetric conesrdquoOptimizationMethods andSoftware vol 11 no 1 pp 625ndash653 1999
[27] K C Toh R H T Tutunu and M J Todd ldquoOn the implemen-tation and usage of SDPT3Cmdasha Matlab software package for
semidefinite quadratic linear programming version 40rdquo 2006httpwwwmathnusedusgsimmattohkcsdpt3guide4-0-draftpdf
[28] A Ben-Tal D D Hertog A D Waegenaere B Melenberg andG Rennen ldquoRobust solutions of optimization problems affectedby uncertain probabilitiesrdquo Management Science vol 59 no 2pp 341ndash357 2013
8 Mathematical Problems in Engineering
Table 1: Performances of RSVM, NBC, and RPC for binary classification problems on the Y5 data set.

tr (%) | RSVM Train / Test (%) | NBC Train / Test (%) | RPC Train / Test (%)
50     | 90.7 / 88.2           | 63.9 / 66.2          | 88.4 / 90.5*
55     | 89.9 / 88.6           | 69.1 / 72.8          | 89.5 / 89.9*
60     | 89.0 / 85.0           | 70.3 / 72.1          | 91.3 / 86.4*
65     | 86.3 / 85.9           | 72.1 / 72.8          | 88.0 / 92.5*
70     | 92.3 / 84.1           | 70.3 / 75.7          | 90.8 / 86.3*
75     | 88.8 / 87.9           | 74.2 / 74.6          | 88.7 / 91.6*
80     | 88.7 / 93.8*          | 90.0 / 87.5          | 88.3 / 93.3
85     | 89.5 / 89.3           | 93.4 / 89.6          | 89.2 / 91.0*
90     | 89.5 / 88.4           | 93.3 / 95.8*         | 89.2 / 92.6
Table 2: Performances of RSVM, NBC, and RPC for binary classification problems on the T1 data set.

tr (%) | RSVM Train / Test (%) | NBC Train / Test (%) | RPC Train / Test (%)
50     | 91.4 / 84.8           | 76.5 / 68.9          | 91.3 / 87.5*
55     | 92.5 / 86.6           | 68.0 / 77.0          | 92.0 / 90.3*
60     | 89.8 / 86.1           | 72.9 / 73.8          | 88.9 / 90.9*
65     | 91.0 / 82.3           | 80.5 / 81.6          | 89.8 / 92.9*
70     | 86.8 / 95.5*          | 83.4 / 89.8          | 88.4 / 93.7
75     | 89.4 / 85.2           | 85.9 / 79.5          | 89.7 / 93.5*
80     | 91.8 / 80.8           | 88.1 / 79.9          | 89.7 / 91.1*
85     | 88.3 / 89.9           | 89.9 / 92.8          | 90.8 / 97.1*
90     | 88.5 / 90.2           | 88.8 / 94.2          | 90.9 / 97.2*
with respect to spherical perturbations of the data points. To make a fair comparison, in the following experiments we test a series of λ values and choose the one with the best performance. Note that if λ_j = 0, the model reduces to the classic support vector machine (SVM). See also [6] for more details on RSVM and its applications to multiple classification problems.
4.3. Test on Binary Classification. In this subsection, RSVM, NBC, and RPC are implemented on 12 data sets for binary classification problems using cross-validation. To improve the performance of RSVM, we transform the original data by the widely used polynomial kernels [6].
Tables 1 and 2 show the averaged classification performances of RSVM, NBC, and the proposed RPC (over 10 randomly generated instances) for binary classification problems on the Y5 and T1 data sets, respectively. For each data set, we randomly partition it into a training set and a test set according to the training rate tr, which varies from 0.5 to 0.9. The highest classification accuracy on a training set among the three methods is highlighted in bold, while the best classification accuracy on a test set is marked with an asterisk.
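The evaluation protocol described above (random partition at rate tr, accuracies averaged over 10 random instances) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the `fit` and `predict` callbacks are hypothetical stand-ins for any of the three classifiers.

```python
import numpy as np

def random_partition(X, y, tr, rng):
    """Randomly split (X, y) into training and test sets at training rate tr."""
    n = len(y)
    idx = rng.permutation(n)
    n_train = int(round(tr * n))
    train, test = idx[:n_train], idx[n_train:]
    return X[train], y[train], X[test], y[test]

def averaged_accuracy(X, y, tr, fit, predict, n_runs=10, seed=0):
    """Average train/test accuracy of a classifier over n_runs random partitions."""
    rng = np.random.default_rng(seed)
    train_acc, test_acc = [], []
    for _ in range(n_runs):
        Xtr, ytr, Xte, yte = random_partition(X, y, tr, rng)
        model = fit(Xtr, ytr)
        train_acc.append(np.mean(predict(model, Xtr) == ytr))
        test_acc.append(np.mean(predict(model, Xte) == yte))
    return np.mean(train_acc), np.mean(test_acc)
```

Reporting the averaged pair for each tr in {0.5, 0.55, ..., 0.9} reproduces the layout of Tables 1 and 2.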
Tables 1 and 2 validate the effectiveness of the proposed RPC for binary classification problems compared with NBC and RSVM. Specifically, for most cases RSVM attains the highest classification accuracy on the training sets, but its performance on the test sets is unsatisfactory. For most cases, the proposed RPC provides the highest classification accuracy on the test sets. NBC performs better on the test sets as the training rate increases. The experimental results also show that, for a given training rate, RPC can achieve better performance on the test sets than on the training sets; thus it avoids the "overlearning" phenomenon.
To further validate the effectiveness of the proposed RPC, we test it on 10 additional data sets, namely, T41-T45 and T61-T65. Table 3 reports the averaged performances of the three methods over 10 randomly generated instances when the training rate is set to 70%. Except for data sets T45, T63, and T64, RPC provides the highest accuracy on the test sets, and for all the data sets its accuracy is higher than 80%. As in Tables 1 and 2, the robustness of the proposed RPC guarantees its generalization ability on the test sets.
4.4. Test on Multiple Classification. In this subsection we test the performance of RPC on multiple classification problems by comparison with RSVM and NBC. Since the performance of RSVM is determined by its regularization parameter λ, we run a set of RSVMs with λ varying from 0 to a sufficiently large value and select the one with the best performance on the test sets.
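The λ selection described above is a plain grid search over the regularization weight, with λ = 0 recovering the classic SVM. A minimal sketch, in which `train_and_score` is a hypothetical callback (not from the paper) that trains an RSVM with the given λ and returns its test-set accuracy:

```python
import numpy as np

def select_best_lambda(lambdas, train_and_score):
    """Train one model per candidate lambda and keep the best test-set score.

    train_and_score(lam) must return the test-set accuracy of the model
    trained with regularization weight lam. lam = 0 corresponds to the
    classic, non-robust SVM.
    """
    scores = [train_and_score(lam) for lam in lambdas]
    best = int(np.argmax(scores))
    return lambdas[best], scores[best]

# Candidate grid from 0 (classic SVM) up to a sufficiently large value.
lambda_grid = [0.0] + list(np.logspace(-3, 2, 6))
```

Selecting on test-set performance, as the paper does for RSVM, gives that baseline the most favorable comparison possible.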
Figures 1 and 3 plot the performances of the three methods on the Y5 and T1 training sets, respectively. Unlike the case of binary classification problems, we can see that RPC provides a competitive performance even on the training sets. One explanation is that RSVM can outperform the proposed RPC on training sets by finding the optimal separating hyperplane
Table 3: Performances of RSVM, NBC, and RPC for binary classification problems on other data sets when tr = 70%.

Data set | RSVM Train / Test (%) | NBC Train / Test (%) | RPC Train / Test (%)
T41      | 62.0 / 59.7           | 82.4 / 78.5          | 77.9 / 83.5*
T42      | 87.0 / 82.2           | 84.1 / 83.1          | 80.5 / 85.3*
T43      | 68.0 / 61.2           | 80.2 / 75.4          | 85.5 / 86.9*
T44      | 91.3 / 83.9           | 77.9 / 86.8          | 88.8 / 90.5*
T45      | 86.5 / 87.0           | 93.2 / 91.0*         | 84.0 / 89.1
T61      | 80.6 / 79.0           | 80.5 / 83.0          | 83.6 / 87.8*
T62      | 71.4 / 66.5           | 86.9 / 85.4*         | 86.3 / 85.4*
T63      | 63.7 / 69.5           | 89.6 / 89.1*         | 82.2 / 84.4
T64      | 88.2 / 86.7           | 97.0 / 96.9*         | 93.4 / 95.5
T65      | 75.0 / 63.4           | 79.7 / 81.5          | 90.5 / 92.9*
Table 4: Performances of RSVM, NBC, and RPC for multiple classification problems on data sets M1-M6.

Data set | RSVM Train / Test (%) | NBC Train / Test (%) | RPC Train / Test (%)
M1       | 65.4 / 68.2           | 72.7 / 73.7          | 79.1 / 77.4*
M2       | 76.9 / 75.3           | 82.6 / 74.8          | 81.7 / 80.9*
M3       | 57.9 / 69.9           | 74.8 / 87.4          | 95.4 / 92.0*
M4       | 70.4 / 64.1           | 97.1 / 92.3          | 95.4 / 92.3*
M5       | 77.4 / 71.3           | 89.4 / 88.1*         | 92.0 / 88.0
M6       | 75.7 / 70.5           | 74.1 / 79.4          | 86.4 / 80.8*
[Figure 1: Performances of RSVM, NBC, and RPC on the Y5 training set. Accuracy on the training set (0.55-0.95) versus training rate (0.6-0.9).]
for binary classification problems, while RPC extends more robustly to multiple classification problems, since it uses the nonlinear probability information of the data sets. The accuracy of NBC on the training sets also improves as the training rate increases.
Figures 2 and 4 show the performances of the three methods on the Y5 and T1 test sets, respectively. We can see that for most
[Figure 2: Performances of RSVM, NBC, and RPC on the Y5 test set. Accuracy on the test set (0.55-1.0) versus training rate (0.6-0.9).]
of the cases, RPC provides the highest accuracy among the three methods. The accuracy of RSVM exceeds that of NBC on the Y5 test set, while NBC outperforms RSVM on the T1 test set.
To further test the performance of RPC on multiple classification problems, we carry out more experiments on data sets M1-M6. Table 4 reports the averaged performances of the three methods on these data sets when the training rate is set to 70%. Except for the M5 data set, RPC always
[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set. Accuracy on the training set (0.6-0.85) versus training rate (0.6-0.9).]
[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set. Accuracy on the test set (0.55-0.9) versus training rate (0.6-0.9).]
provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).
From the tested real-life applications, we conclude that the proposed RPC is robust enough to provide better performance for both binary and multiple classification problems compared with RSVM and NBC. The robustness of RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.
5. Conclusion
In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems.
To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified χ²-distance. We assume that the true distribution lies in the constructed distributional set, centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error over all possible distributions belonging to the distributional set.
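To make the worst-case step concrete, the following sketch computes the worst-case expectation of a loss vector over a modified χ²-distance distributional set of the assumed form {p : p ≥ 0, Σ p_i = 1, Σ (p_i − q_i)²/q_i ≤ r}. This is a minimal numerical illustration, not the paper's full classifier model, and a generic nonlinear solver stands in for the second-order cone reformulation the paper actually uses:

```python
import numpy as np
from scipy.optimize import minimize

def worst_case_expectation(c, q, r):
    """max_p c.p over the set {p >= 0, sum(p) = 1, sum((p-q)^2/q) <= r}.

    Solved with a generic SLSQP solver here; the paper reformulates this
    inner problem as a second-order cone program via conic duality.
    """
    n = len(q)
    cons = [
        {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},
        {"type": "ineq", "fun": lambda p: r - np.sum((p - q) ** 2 / q)},
    ]
    res = minimize(lambda p: -c @ p, x0=q, bounds=[(0.0, 1.0)] * n,
                   constraints=cons, method="SLSQP")
    return res.x, -res.fun

q = np.array([0.4, 0.3, 0.2, 0.1])   # nominal (empirical) distribution
c = np.array([1.0, 2.0, 3.0, 4.0])   # per-outcome loss
p_star, wc = worst_case_expectation(c, q, 0.05)
```

As the radius r shrinks to 0, the worst-case value collapses to the nominal expectation c·q; enlarging r shifts mass toward the costly outcomes, which is exactly the conservatism the robust classifier hedges against.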
Our proposed model introduces the recently developed distributionally robust optimization methodology into classifier design. To obtain a computable model, we transform the resulting optimization problem into an equivalent second-order cone program based on the conic duality theorem. Thus our model has the same computational complexity as the classic support vector machine, and numerical experiments on real-life applications validate its effectiveness. On the one hand, the proposed robust probability classifier provides higher accuracy than RSVM and NBC by avoiding overlearning on the training sets for binary classification problems; on the other hand, it also performs promisingly on multiple classification problems.
There are still many important extensions of our model. Other forms of loss function, such as the mean squared error and hinge losses, should be studied to obtain tractable reformulations, and the resulting models may provide better performance. Probability models considering joint probability distribution information are also an interesting research direction.
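For reference, the three loss functions discussed above can be written in standard form (this notation is generic, not taken from the paper's model): with true label y ∈ {−1, +1} and classifier score f(x),

```latex
\ell_{\mathrm{abs}}\bigl(y, f(x)\bigr) = \lvert y - f(x) \rvert, \qquad
\ell_{\mathrm{mse}}\bigl(y, f(x)\bigr) = \bigl(y - f(x)\bigr)^{2}, \qquad
\ell_{\mathrm{hinge}}\bigl(y, f(x)\bigr) = \max\bigl(0,\; 1 - y\, f(x)\bigr).
```

The paper's robust classifier minimizes the worst case of the first (absolute error) loss; tractable worst-case reformulations for the other two are left as open questions.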
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223-228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209-226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization: methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453-480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464-501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801-807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555-582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558-563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253-1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175-1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191-201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123-136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691-710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85-90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625-653, 1999.
[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3, a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341-357, 2013.
Mathematical Problems in Engineering 9
Table 3 Performances of RSVM NBC and RPC for binary classification problems on other data sets when tr = 70
Data set RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()
T41 620 597 824 785 779 835lowast
T42 870 822 841 831 805 853lowast
T43 680 612 802 754 855 869lowast
T44 913 839 779 868 888 905lowast
T45 865 870 932 910lowast 840 891T61 806 790 805 830 836 878lowast
T62 714 665 869 854lowast 863 854lowast
T63 637 695 896 891lowast 822 844T64 882 867 970 969lowast 934 955T65 750 634 797 815 905 929lowast
Table 4 Performances of RSVM NBC and RPC for multiple classification problems on T1 data set
Data set RSVM NBC RPCTrain () Test () Train () Test () Train () Test ()
M1 654 682 727 737 791 774lowast
M2 769 753 826 748 817 809lowast
M3 579 699 748 874 954 920lowast
M4 704 641 971 923 954 923lowast
M5 774 713 894 881lowast 920 880M6 757 705 741 794 864 808lowast
06 065 07 075 08 085 09055
06
065
07
075
08
085
09
095
Training rate
Accu
racy
on
trai
ning
set (
)
RSVMNBCRPC
Figure 1 Performances of RSVM NBC and RPC on Y5 trainingset
for binary classification problem S while RPC is more robustto extend to solve multiple classification problems since ituses the nonlinear probability information of the data setsThe accuracy of NBC on the training sets also improves asthe training rate increases
Figures 2 and 4 show the performances of both methodson Y5 and T1 test sets respectively We can see that for most
06 065 07 075 08 085 09Training rate
RSVMNBCRPC
055
06
065
07
075
08
085
09
095
1
Accu
racy
on
test
set (
)
Figure 2 Performances of RSVM NBC and RPC on Y5 test set
of the cases RPC provides the highest accuracy among threemethods The accuracy of RSVM outperforms that of NBCon Y5 test set while the latter outperforms the former on theT1 test set
To further test the performance of PRC on multipleclassification problems we carry out more experiments ondata sets M1ndashM6 Table 4 reports the averaged performancesof three methods on these data sets when the training rateis set to 70 Except for the M5 data set PRC always
10 Mathematical Problems in Engineering
06
065
07
075
08
085
Accu
racy
on
trai
ning
set (
)
06 065 07 075 08 085 09Training rate
RSVMNBCRPC
Figure 3 Performances of RSVM NBC and RPC on T1 trainingset
055
06
065
07
075
08
085
09
Accu
racy
on
test
set (
)
06 065 07 075 08 085 09Training rate
RSVMNBCRPC
Figure 4 Performances of RSVM NBC and RPC on T1 test set
provides the highest classification performances among threemethods and even for the M5 data set its accuracy (880)is very close to the best one (881)
From the tested real-life application we conclude that theproposed RPC has the robustness to provide better perfor-mance for both binary and multiple classification problemscompared with RSVM and NBC The robustness of PRCenables it to avoid the ldquooverlearningrdquo phenomenon especiallyfor the binary classification problems
5 Conclusion
In this paper we propose a robust probability classifier modelto address the data uncertainty in classification problems
To quantitatively describe the data uncertainty a class-conditional distributional set is constructed based on themodified 120594
2-distance We assume that the true distribu-tion lies in the constructed distributional set centered inthe nominal probability distribution Based on the ldquolinearcombination assumptionrdquo for the posterior class-conditionalprobabilities we consider a classification criterion using theweighted sum of the posterior probabilities The optimalrobust probability classifier is determined by minimizingthe worst-case absolute error value over all the possibledistributions belonging to the distributional set
Our proposed model introduces the recently developeddistributionally robust optimization method into the clas-sifier design problems To obtain a computable modelwe transform the resulted optimization problem into anequivalent second order cone programming based on conicduality theorem Thus our model has the same compu-tational complexity as the classic support vector machineand numerical experiments on real-life application validateits effectiveness On the one hand the proposed robustprobability classifier provides a higher accuracy comparedwith RSVM and NBC by avoiding overlearning on trainingsets for binary classification problems on the other hand italso has a promising performance for multiple classificationproblems
There are still many important extensions in our modelOther forms of loss function such as the mean squarederror function and Hinge loss functions should be studied toobtain tractable reformulations and the resulted models mayprovide better performances Probability models consideringjoint probability distribution information are also interestingresearch directions
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
References
[1] R O Duda and P E Hart Pattern Classification and SceneAnalysis John Wiley amp Sons New York NY USA 1973
[2] P Langley W Iba and K Thompson ldquoAn analysis of Bayesianclassifiersrdquo in Proceedings of the 10th National Conference onArtificial Intelligence (AAAI rsquo92) vol 90 pp 223ndash228 AAAIPress Menlo Park Calif USA July 1992
[3] B D Ripley Pattern Recognition and Neural Networks Cam-bridge University Press Cambridge UK 2007
[4] V Vapnik The Nature of Statistical Learning Theory SpringerBerlin Germany 2000
[5] M Ramoni and P Sebastiani ldquoRobust Bayes classifiersrdquo Artifi-cial Intelligence vol 125 no 1-2 pp 209ndash226 2001
[6] Y Shi Y Tian G Kou and Y Peng ldquoRobust support vectormachinesrdquo in Optimization Based Data Mining Theory andApplications Springer London UK 2011
[7] Y Z Wang Y L Zhang F L Zhang and J N Yi ldquoRobustquadratic regression and its application to energy-growth con-sumption problemrdquoMathematical Problems in Engineering vol2013 Article ID 210510 10 pages 2013
Mathematical Problems in Engineering 11
[Figure 3: Performances of RSVM, NBC, and RPC on the T1 training set (accuracy on the training set plotted against training rates from 0.6 to 0.9).]

[Figure 4: Performances of RSVM, NBC, and RPC on the T1 test set (accuracy on the test set plotted against training rates from 0.6 to 0.9).]
provides the highest classification performance among the three methods, and even for the M5 data set its accuracy (88.0%) is very close to the best one (88.1%).
From the tested real-life application, we conclude that the proposed RPC is robust enough to provide better performance than RSVM and NBC on both binary and multiple classification problems. The robustness of the RPC enables it to avoid the "overlearning" phenomenon, especially for binary classification problems.
5 Conclusion
In this paper, we propose a robust probability classifier model to address data uncertainty in classification problems.

To quantitatively describe the data uncertainty, a class-conditional distributional set is constructed based on the modified χ²-distance. We assume that the true distribution lies in this distributional set, which is centered at the nominal probability distribution. Based on the "linear combination assumption" for the posterior class-conditional probabilities, we consider a classification criterion using the weighted sum of the posterior probabilities. The optimal robust probability classifier is determined by minimizing the worst-case absolute error over all possible distributions belonging to the distributional set.
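The modified χ²-distance itself is defined in an earlier section of the paper; assuming the standard form D(p, p⁰) = Σᵢ (pᵢ − p⁰ᵢ)²/p⁰ᵢ, a minimal sketch of the distributional set and its membership test could look as follows (the function names and the radius ρ are illustrative, not taken from the paper):

```python
def modified_chi2(p, p0):
    """Modified chi-square distance: sum_i (p_i - p0_i)^2 / p0_i.

    p is a candidate distribution, p0 the nominal (e.g., empirical)
    distribution; p0 must be strictly positive for the distance to be finite.
    """
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, p0))

def in_distributional_set(p, p0, rho, tol=1e-9):
    """Check p against the set {p : p >= 0, sum(p) = 1, D(p, p0) <= rho}."""
    return (all(pi >= -tol for pi in p)
            and abs(sum(p) - 1.0) <= tol
            and modified_chi2(p, p0) <= rho + tol)

# Example: a nominal class-conditional distribution and a perturbed candidate.
p0 = [0.25, 0.75]
p = [0.5, 0.5]
print(modified_chi2(p, p0))                   # 0.25/0.25 + 0.0625/0.75 ~ 0.333
print(in_distributional_set(p, p0, rho=0.5))  # True: distance within radius
```

The radius ρ controls the size of the ambiguity set: ρ = 0 recovers the nominal distribution exactly, while larger ρ admits distributions further from it and makes the classifier more conservative.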
Our proposed model introduces the recently developed distributionally robust optimization method into classifier design. To obtain a computable model, we transform the resulting optimization problem into an equivalent second order cone program based on the conic duality theorem. Thus our model has the same computational complexity as the classic support vector machine, and numerical experiments on a real-life application validate its effectiveness. On the one hand, the proposed robust probability classifier achieves higher accuracy than RSVM and NBC on binary classification problems by avoiding overlearning on training sets; on the other hand, it also performs promisingly on multiple classification problems.
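The full SOCP reformulation requires the conic-duality machinery developed in the paper, but the inner worst-case step admits a simple closed form when the nonnegativity constraints are inactive: maximizing an expected loss Σᵢ pᵢaᵢ over {p : Σᵢ pᵢ = 1, D(p, p⁰) ≤ ρ} yields the nominal expectation plus sqrt(ρ · Var_{p⁰}(a)). A hedged sketch of that step, not of the paper's complete model (the loss vector a and radius ρ are illustrative, and p ≥ 0 is ignored here):

```python
import math

def worst_case_expectation(a, p0, rho):
    """Worst-case value of sum_i p_i * a_i over the chi-square ball
    {p : sum(p) = 1, sum_i (p_i - p0_i)^2 / p0_i <= rho}, ignoring p >= 0.

    A Lagrangian argument gives the maximizer p_i = p0_i * (1 + t*(a_i - m))
    with m = E_{p0}[a] and t = sqrt(rho / Var_{p0}(a)), so the optimal value
    is m + sqrt(rho * Var_{p0}(a)).
    """
    m = sum(pi * ai for pi, ai in zip(p0, a))               # nominal expectation
    var = sum(pi * (ai - m) ** 2 for pi, ai in zip(p0, a))  # Var_{p0}(a)
    if var == 0.0:
        return m  # the loss is constant on the support; no adversarial gain
    return m + math.sqrt(rho * var)

# 0/1 loss over two outcomes with a uniform nominal distribution:
# nominal expectation 0.5, variance 0.25, rho = 0.04 -> about 0.5 + 0.1 = 0.6
print(worst_case_expectation([0.0, 1.0], [0.5, 0.5], 0.04))
```

In the general model, where p ≥ 0 can become active, this inner maximum no longer has a one-line formula, which is exactly why the paper resorts to the conic dual and an SOCP solved by interior point methods (e.g., SeDuMi or SDPT3).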
There are still many important extensions of our model. Other forms of loss function, such as the mean squared error and hinge loss functions, should be studied to obtain tractable reformulations, and the resulting models may provide better performance. Probability models that incorporate joint probability distribution information are also an interesting research direction.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
References
[1] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York, NY, USA, 1973.
[2] P. Langley, W. Iba, and K. Thompson, "An analysis of Bayesian classifiers," in Proceedings of the 10th National Conference on Artificial Intelligence (AAAI '92), vol. 90, pp. 223–228, AAAI Press, Menlo Park, Calif, USA, July 1992.
[3] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, UK, 2007.
[4] V. Vapnik, The Nature of Statistical Learning Theory, Springer, Berlin, Germany, 2000.
[5] M. Ramoni and P. Sebastiani, "Robust Bayes classifiers," Artificial Intelligence, vol. 125, no. 1-2, pp. 209–226, 2001.
[6] Y. Shi, Y. Tian, G. Kou, and Y. Peng, "Robust support vector machines," in Optimization Based Data Mining: Theory and Applications, Springer, London, UK, 2011.
[7] Y. Z. Wang, Y. L. Zhang, F. L. Zhang, and J. N. Yi, "Robust quadratic regression and its application to energy-growth consumption problem," Mathematical Problems in Engineering, vol. 2013, Article ID 210510, 10 pages, 2013.
[8] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization, Princeton University Press, Princeton, NJ, USA, 2009.
[9] A. Ben-Tal and A. Nemirovski, "Robust optimization: methodology and applications," Mathematical Programming, vol. 92, no. 3, pp. 453–480, 2002.
[10] D. Bertsimas, D. B. Brown, and C. Caramanis, "Theory and applications of robust optimization," SIAM Review, vol. 53, no. 3, pp. 464–501, 2011.
[11] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "Minimax probability machine," in Advances in Neural Information Processing Systems, pp. 801–807, 2001.
[12] G. R. G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. I. Jordan, "A robust minimax approach to classification," Journal of Machine Learning Research, vol. 3, no. 3, pp. 555–582, 2003.
[13] L. El Ghaoui, G. R. G. Lanckriet, and G. Natsoulis, "Robust classification with interval data," Tech. Rep. UCB/CSD-03-1279, Computer Science Division, University of California, 2003.
[14] K. Huang, H. Yang, I. King, and M. R. Lyu, "Learning classifiers from imbalanced data based on biased minimax probability machine," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '04), vol. 2, pp. 558–563, IEEE, July 2004.
[15] K. Huang, H. Yang, I. King, M. R. Lyu, and L. Chan, "The minimum error minimax probability machine," The Journal of Machine Learning Research, vol. 5, pp. 1253–1286, 2004.
[16] C.-H. Hoi and M. R. Lyu, "Robust face recognition using minimax probability machine," in Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), vol. 2, pp. 1175–1178, June 2004.
[17] T. Kitahara, S. Mizuno, and K. Nakata, "Quadratic and convex minimax classification problems," Journal of the Operations Research Society of Japan, vol. 51, no. 2, pp. 191–201, 2008.
[18] T. Kitahara, S. Mizuno, and K. Nakata, "An extension of a minimax approach to multiple classification," Journal of the Operations Research Society of Japan, vol. 50, no. 2, pp. 123–136, 2007.
[19] D. Klabjan, D. Simchi-Levi, and M. Song, "Robust stochastic lot-sizing by means of histograms," Production and Operations Management, vol. 22, no. 3, pp. 691–710, 2013.
[20] L. V. Utkin, "A framework for imprecise robust one-class classification models," International Journal of Machine Learning and Cybernetics, 2012.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, Cambridge, UK, 2000.
[22] B. Schölkopf and A. J. Smola, Learning with Kernels, The MIT Press, Cambridge, Mass, USA, 2002.
[23] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, Springer, New York, NY, USA, 2001.
[24] L. A. Zadeh, "A simple view of the Dempster-Shafer theory of evidence and its implication for the rule of combination," AI Magazine, vol. 7, no. 2, pp. 85–90, 1986.
[25] R. Yager, M. Fedrizzi, and J. Kacprzyk, Advances in the Dempster-Shafer Theory of Evidence, John Wiley & Sons, New York, NY, USA, 1994.
[26] J. F. Sturm, "Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones," Optimization Methods and Software, vol. 11, no. 1, pp. 625–653, 1999.
[27] K. C. Toh, R. H. Tütüncü, and M. J. Todd, "On the implementation and usage of SDPT3: a Matlab software package for semidefinite-quadratic-linear programming, version 4.0," 2006, http://www.math.nus.edu.sg/~mattohkc/sdpt3/guide4-0-draft.pdf.
[28] A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen, "Robust solutions of optimization problems affected by uncertain probabilities," Management Science, vol. 59, no. 2, pp. 341–357, 2013.