three papers: auc, pfa and bioinformatics the three papers are posted online

Three Papers: AUC, PFA and Bioinformatics

The three papers are posted online

Learning Algorithms for Better Ranking

Jin Huang, Charles X. Ling: Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans. Knowl. Data Eng. 17(3): 299-310 (2005)

Find the citations online (Google Scholar)

Goal: accuracy vs ranking

Secondary goal: Decision Tree vs Bayesian Networks in ranking

– Design algorithms that directly optimize ranking

Accuracy: not good enough

Two classifiers

Accuracy of Classifier 1: 4/5

Accuracy of Classifier 2: 4/5

But intuitively, Classifier 1 is better!

Classifier 1 – – – – + – + + + +

Classifier 2 + – – – – + + + + –

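Cutoff line: in the middle, between the 5th and 6th ranked examples (left half predicted negative, right half predicted positive)

Higher ranking (further right): more desirable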
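The 4/5 figures can be checked with a short sketch. It assumes the cutoff sits at the midpoint of each ranking (which is what makes both accuracies come out to 4/5) and uses ASCII `+` and `-` as stand-ins for the slide's symbols; the function name is illustrative only.

```python
# Rankings copied from the slide, lowest rank on the left, highest on the right.
CLASSIFIER_1 = "----+-++++"
CLASSIFIER_2 = "+----++++-"


def accuracy_at_midpoint(ranking: str) -> float:
    """Accuracy when the lower half is predicted negative and the upper half positive.

    Assumption: the cutoff line sits exactly in the middle of the ranking.
    """
    cutoff = len(ranking) // 2
    correct = ranking[:cutoff].count("-") + ranking[cutoff:].count("+")
    return correct / len(ranking)


if __name__ == "__main__":
    print(accuracy_at_midpoint(CLASSIFIER_1))
    print(accuracy_at_midpoint(CLASSIFIER_2))
```

Both rankings score the same because accuracy only counts how many examples fall on the wrong side of the cutoff, not how far they are from it.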

Accuracy vs ranking

Accuracy-based: makes two assumptions: balanced class distribution and equal costs for misclassification

Ranking: sidesteps these assumptions

– Problem: Training examples are labeled, not ranked

How to evaluate ranking?

ROC curve (Provost & Fawcett, AAAI’97)

How to calculate AUC

Rank test examples in increasing order

Let r_i be the rank of the i^th positive example (left: low r_i, right: high r_i = better)

S_0 = ∑ r_i

AUC (Hand & Till, 2001, MLJ):

$$\hat{A} = \frac{S_0 - n_0(n_0+1)/2}{n_0 n_1}$$

where n_0 and n_1 are the numbers of positive and negative examples

An example

Classifier 1 – – – – + – + + + +

r_i: 5, 7, 8, 9, 10

$$\hat{A} = \frac{S_0 - n_0(n_0+1)/2}{n_0 n_1}$$

S_0 = 5+7+8+9+10 = 39

AUC = (39 – 5×6/2) / 25 = 24/25

Better result
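The worked example can be reproduced with a minimal sketch of the rank-sum formula. The function name is hypothetical, `+` and `-` stand in for the slide's symbols, and ranks are assumed to start at 1 on the left.

```python
from fractions import Fraction


def auc_from_ranking(ranking: str) -> Fraction:
    """AUC via the rank-sum formula (S_0 - n_0(n_0+1)/2) / (n_0 n_1).

    `ranking` lists true labels from lowest rank (left) to highest rank (right).
    """
    positive_ranks = [rank for rank, label in enumerate(ranking, start=1) if label == "+"]
    n0 = len(positive_ranks)      # number of positive examples
    n1 = len(ranking) - n0        # number of negative examples
    s0 = sum(positive_ranks)      # S_0, the sum of the positive examples' ranks
    # Numerator and denominator are doubled so everything stays an integer.
    return Fraction(2 * s0 - n0 * (n0 + 1), 2 * n0 * n1)


if __name__ == "__main__":
    print(auc_from_ranking("----+-++++"))  # Classifier 1 from the slide
```

Exact fractions are used so the result can be compared directly with the 24/25 on the slide.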

ROC curve and AUC

If A dominates D, then A is better than D

Often A and B do not dominate each other

AUC (area under the ROC curve)

– Overall performance

AUC for evaluating ranking
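To make "area under the ROC curve" concrete, here is a small sketch that traces the curve for a ranked list of labels and measures the area with the trapezoid rule. It assumes the threshold is lowered one example at a time from the highest rank down; the function names are illustrative and not taken from the papers.

```python
from fractions import Fraction


def roc_points(ranking: str):
    """Return (false positive rate, true positive rate) points for a ranking.

    `ranking` lists true labels ("+" or "-") from lowest to highest rank.
    """
    n_pos = ranking.count("+")
    n_neg = ranking.count("-")
    tp = fp = 0
    points = [(Fraction(0), Fraction(0))]
    # Lower the threshold step by step, starting from the top-ranked example.
    for label in reversed(ranking):
        if label == "+":
            tp += 1
        else:
            fp += 1
        points.append((Fraction(fp, n_neg), Fraction(tp, n_pos)))
    return points


def area_under(points) -> Fraction:
    """Trapezoid-rule area under a piecewise-linear curve."""
    return sum(
        ((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:])),
        Fraction(0),
    )


if __name__ == "__main__":
    print(area_under(roc_points("----+-++++")))
```

A curve that dominates another lies on or above it everywhere, so its area is at least as large; the area gives a single number even when two curves cross.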

AUC

Two classifiers:

The AUC of Classifier 1: 24/25

The AUC of Classifier 2: 16/25

Classifier 1 is better than 2!

Classifier 1 – – – – + – + + + +

Classifier 2 + – – – – + + + + –
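Both AUC values can be cross-checked with an equivalent view of AUC: the fraction of (positive, negative) pairs in which the positive example is ranked higher. This is a sketch under the assumption of no tied ranks; the function name is hypothetical.

```python
from fractions import Fraction


def pairwise_auc(ranking: str) -> Fraction:
    """Fraction of positive/negative pairs where the positive is ranked higher.

    `ranking` lists true labels from lowest rank (left) to highest rank (right).
    """
    positives = [pos for pos, label in enumerate(ranking) if label == "+"]
    negatives = [pos for pos, label in enumerate(ranking) if label == "-"]
    correctly_ordered = sum(1 for p in positives for n in negatives if p > n)
    return Fraction(correctly_ordered, len(positives) * len(negatives))


if __name__ == "__main__":
    print(pairwise_auc("----+-++++"))  # Classifier 1
    print(pairwise_auc("+----++++-"))  # Classifier 2
```

Classifier 2 loses pairs because one positive sits at the very bottom and one negative at the very top, which accuracy at the cutoff does not distinguish from Classifier 1's two near-cutoff mistakes.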

AUC is more discriminating

For N examples:

– (N+1) different accuracies

– N(N+1)/2 different AUC values

AUC is a better and more discriminating evaluation measure than accuracy
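As a rough illustration of why AUC distinguishes more cases, the sketch below enumerates every possible ranking of 5 positive and 5 negative examples (the setting of the two-classifier example) and counts how many distinct accuracy and AUC values occur. The midpoint cutoff and the function name are assumptions made for this illustration, and the counts apply to this specific setting only; they are not meant to reproduce the general counts stated above.

```python
from fractions import Fraction
from itertools import combinations


def distinct_value_counts(n_pos: int = 5, n_neg: int = 5):
    """Count distinct accuracies and AUCs over all rankings of n_pos + n_neg examples."""
    n = n_pos + n_neg
    cutoff = n // 2  # assumption: ranks above the midpoint are predicted positive
    accuracies, aucs = set(), set()
    for positive_ranks in combinations(range(1, n + 1), n_pos):
        true_pos = sum(1 for rank in positive_ranks if rank > cutoff)
        false_pos = (n - cutoff) - true_pos
        true_neg = n_neg - false_pos
        accuracies.add(Fraction(true_pos + true_neg, n))
        # Rank-sum formula, doubled top and bottom to stay in integers.
        s0 = sum(positive_ranks)
        aucs.add(Fraction(2 * s0 - n_pos * (n_pos + 1), 2 * n_pos * n_neg))
    return len(accuracies), len(aucs)


if __name__ == "__main__":
    n_accuracies, n_aucs = distinct_value_counts()
    print(n_accuracies, n_aucs)
```

Many rankings that share one accuracy value are spread over several AUC values, which is the sense in which AUC is the finer-grained measure.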

Naïve Bayes vs C4.4

Overall, Naïve Bayes outperforms C4.4 in AUC

Ling & Zhang, submitted, 2002

PCA in Face Recognition

Problem with PCA

The features are principal components

– Thus they do not correspond directly to the original features

– Problem with face recognition: wish to pick a subset of original features rather than composed ones

Principal Feature Analysis: pick the best, uncorrelated subset of features of a data set

– Equivalent to finding q dimensions of a random variable X = [x1, x2, … , xn]^T

How to find the q features?

[q1, q2, q3, … qn], i^th row = i^th feature

The subspace

Algorithm

Result

When PCA does not work

PCA + Clustering = Bad Idea

More…

Rand Index for Clusters (Partitions)
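The slide's own content is not available here, so the following is only a sketch of the standard Rand index definition: the fraction of example pairs on which two partitions agree (both place the pair in the same cluster, or both place it in different clusters). The function name and the sample labelings are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b) -> float:
    """Rand index between two partitions given as per-example cluster labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same examples")
    if len(labels_a) < 2:
        raise ValueError("at least two examples are needed to form a pair")
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        1
        for i, j in pairs
        if (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    # Same grouping under different label names: full agreement.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # One cluster split in two: one of the six pairs disagrees.
    print(rand_index([0, 0, 1, 1], [0, 0, 1, 2]))
```

Because only pair co-membership is compared, the measure does not depend on how the clusters are named.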

Results
