Three Papers: AUC, PFA and Bioinformatics. The three papers are posted online.
Posted on 19-Dec-2015
TRANSCRIPT
Three Papers: AUC, PFA and Bioinformatics
The three papers are posted online
Learning Algorithms for Better Ranking
Jin Huang, Charles X. Ling: Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans. Knowl. Data Eng. 17(3): 299-310 (2005)
Find the citations online (Google Scholar)
Goal: accuracy vs. ranking
Secondary goal: decision trees vs. Bayesian networks in ranking
– Design algorithms that directly optimize ranking
Accuracy: not good enough
Two classifiers
Accuracy of Classifier 1: 4/5
Accuracy of Classifier 2: 4/5
But intuitively, Classifier 1 is better!
Classifier 1 – – – – + – + + + +
Classifier 2 + – – – – + + + + –
Cutoff line
Higher ranking: more desirable
Accuracy vs Ranking
Accuracy-based evaluation makes two assumptions: a balanced class distribution and equal misclassification costs
Ranking sidesteps these assumptions
– Problem: Training examples are labeled, not ranked
How to evaluate ranking?
ROC Curve (Provost & Fawcett, AAAI'97)
How to Calculate AUC
Rank the test examples in increasing order of score (left: low r_i; right: high r_i = better)
Let r_i be the rank of the ith positive example, and S0 = ∑ r_i
AUC (Hand & Till, 2001, MLJ):

    Â = (S0 − n0(n0 + 1)/2) / (n0 · n1)

where n0 is the number of positive examples and n1 the number of negative examples.
An Example
Classifier 1 – – – – + – + + + +
r_i: 5, 7, 8, 9, 10
S0 = 5 + 7 + 8 + 9 + 10 = 39
AUC = (39 − 5·6/2) / 25 = 24/25
Better result
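The rank-based AUC above can be checked with a short script. This is a minimal sketch of the Hand & Till formula applied to the two classifiers from the slides; the function name `rank_auc` and the string encoding of the label sequences are my own choices, not from the papers.

```python
# Compute AUC from the ranks of the positive examples:
#   AUC = (S0 - n0*(n0+1)/2) / (n0*n1)
# Labels are listed in increasing order of the classifier's score
# (lowest rank first), as on the slides.

def rank_auc(labels):
    ranks = [i + 1 for i, y in enumerate(labels) if y == '+']  # 1-based ranks of positives
    n0 = len(ranks)                 # number of positive examples
    n1 = len(labels) - n0           # number of negative examples
    s0 = sum(ranks)
    return (s0 - n0 * (n0 + 1) / 2) / (n0 * n1)

clf1 = list('----+-++++')   # Classifier 1 from the slides
clf2 = list('+----++++-')   # Classifier 2 from the slides

print(rank_auc(clf1))   # 24/25 = 0.96
print(rank_auc(clf2))   # 16/25 = 0.64
```

This reproduces the worked example: S0 = 39 for Classifier 1 gives AUC 24/25, while the early positive in Classifier 2 drags its AUC down to 16/25.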
ROC Curve and AUC
If A dominates D, then A is better than D
Often A and B do not dominate each other
AUC (area under the ROC curve):
– Overall performance
AUC for evaluating ranking
AUC
Two classifiers:
The AUC of Classifier 1: 24/25
The AUC of Classifier 2: 16/25
Classifier 1 is better than Classifier 2!
Classifier 1 – – – – + – + + + +
Classifier 2 + – – – – + + + + –
AUC Is More Discriminating
For N examples: (N + 1) different accuracies, but N(N + 1)/2 different AUC values
AUC is a better and more discriminating evaluation measure than accuracy
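The claim that AUC takes more distinct values than accuracy can be illustrated by brute force on a small case. This sketch enumerates every ranking of 3 positives and 3 negatives and counts distinct values of each measure; the 3+3 setup and the middle cutoff are my own illustrative choices, so the counts here are for this toy case, not the general formula on the slide.

```python
from itertools import permutations

def rank_auc(labels):
    # Hand & Till rank formula: (S0 - n0*(n0+1)/2) / (n0*n1)
    ranks = [i + 1 for i, y in enumerate(labels) if y == '+']
    n0, n1 = len(ranks), len(labels) - len(ranks)
    return (sum(ranks) - n0 * (n0 + 1) / 2) / (n0 * n1)

def accuracy(labels):
    # Cut in the middle: bottom half predicted '-', top half predicted '+'.
    mid = len(labels) // 2
    correct = labels[:mid].count('-') + labels[mid:].count('+')
    return correct / len(labels)

rankings = set(permutations('+++---'))
aucs = {rank_auc(r) for r in rankings}
accs = {accuracy(r) for r in rankings}
print(len(aucs), len(accs))   # AUC takes more distinct values than accuracy
```

Even on six examples, AUC distinguishes many rankings that accuracy collapses into the same score.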
Naïve Bayes vs C4.4
Overall, Naïve Bayes outperforms C4.4 in AUC
(Ling & Zhang, submitted, 2002)
PCA in Face Recognition
Problem with PCA
The features are principal components
– Thus they do not correspond directly to the original features
– Problem for face recognition: we wish to pick a subset of the original features rather than composed ones
Principal Feature Analysis: pick the best uncorrelated subset of features of a data set
– Equivalent to finding q dimensions of a random variable X = [x1, x2, …, xn]^T
How to Find the q Features?
[q1, q2, q3, …, qn]: the ith row corresponds to the ith feature
The Subspace
Algorithm
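The algorithm slide can be sketched roughly as follows: project the original features into the q-dimensional principal subspace (each row of the eigenvector matrix represents one original feature), cluster those rows, and keep one representative feature per cluster. This is a minimal NumPy sketch of that idea; the farthest-point initialization, the plain k-means loop, and the toy data are my own simplified stand-ins, not the authors' code.

```python
import numpy as np

def pfa(X, q, n_features, iters=50):
    # Principal subspace: top-q eigenvectors of the covariance matrix.
    C = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(C)
    Aq = vecs[:, np.argsort(vals)[::-1][:q]]   # shape (n, q); row i = feature i

    # Deterministic farthest-point initialization (a stand-in for k-means++).
    centers = [Aq[0]]
    while len(centers) < n_features:
        d = np.min([((Aq - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Aq[int(np.argmax(d))])
    centers = np.array(centers)

    # Plain k-means on the rows of Aq.
    for _ in range(iters):
        assign = ((Aq[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for k in range(n_features):
            if np.any(assign == k):
                centers[k] = Aq[assign == k].mean(axis=0)

    # One representative original feature per cluster: the row nearest its center.
    picked = {int(((Aq - centers[k]) ** 2).sum(axis=1).argmin())
              for k in range(n_features)}
    return sorted(picked)

# Toy data: features 0 and 1 are almost perfectly correlated; feature 2 is
# independent noise, so PFA should keep one of {0, 1} plus feature 2.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               base + 0.01 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])
print(pfa(X, q=2, n_features=2))   # indices of the selected features
```

The key design point is that rows of the eigenvector matrix, not its columns, are clustered: two correlated original features land on nearby rows, so only one of them survives the selection.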
Result
When PCA Does Not Work
PCA + Clustering = Bad Idea
More…
Rand Index for Clusters (Partitions)
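The Rand index compares two partitions of the same items: it is the fraction of item pairs on which the partitions agree (both put the pair in the same cluster, or both put it in different clusters). A minimal sketch, with label lists of my own invention as the example:

```python
from itertools import combinations

def rand_index(p1, p2):
    """p1, p2: cluster labels for the same items, in the same order."""
    agree = sum((p1[i] == p1[j]) == (p2[i] == p2[j])
                for i, j in combinations(range(len(p1)), 2))
    return agree / (len(p1) * (len(p1) - 1) / 2)

a = [0, 0, 1, 1]
b = [1, 1, 0, 0]      # same partition as a, just relabeled
c = [0, 1, 1, 1]
print(rand_index(a, b))   # 1.0: identical partitions
print(rand_index(a, c))   # 0.5: partitions agree on half of the pairs
```

Note that the index depends only on pair co-membership, so renaming cluster labels (as in `b`) does not change it.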
Results