Three Papers: AUC, PFA and Bioinformatics


Post on 19-Dec-2015


Page 1: Three Papers: AUC, PFA and Bioinformatics

The three papers are posted online

Page 2: Learning Algorithms for Better Ranking

Jin Huang, Charles X. Ling: Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Trans. Knowl. Data Eng. 17(3): 299-310 (2005)

Find the citations online (Google Scholar)
Goal: accuracy vs ranking
Secondary goal: decision trees vs Bayesian networks in ranking
– Design algorithms that directly optimize ranking

Page 3: Accuracy: not good enough

Two classifiers

Accuracy of Classifier 1: 4/5
Accuracy of Classifier 2: 4/5
But intuitively, Classifier 1 is better!

Classifier 1 – – – – + – + + + +

Classifier 2 + – – – – + + + + –

Cutoff line

Higher ranking: more desirable
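
The two accuracies above can be checked with a short Python sketch. It assumes (the slide text does not say so) that the cutoff line sits in the middle of each ranked list, so the five lowest-ranked examples are predicted negative and the five highest-ranked are predicted positive. The function name `accuracy_at_cutoff` and the ASCII `-` used in place of `–` are illustrative choices.

```python
from fractions import Fraction

# Ranked lists from the slide, lowest rank on the left, highest on the right.
CLASSIFIER_1 = "----+-++++"
CLASSIFIER_2 = "+----++++-"


def accuracy_at_cutoff(ranking, cutoff):
    """Accuracy when the first `cutoff` examples are predicted negative
    and all remaining examples are predicted positive.

    The cutoff position is an assumption, not something the slide states.
    """
    correct = 0
    for position, label in enumerate(ranking):
        predicted = "-" if position < cutoff else "+"
        if predicted == label:
            correct += 1
    return Fraction(correct, len(ranking))


if __name__ == "__main__":
    for name, ranking in (("Classifier 1", CLASSIFIER_1),
                          ("Classifier 2", CLASSIFIER_2)):
        print(name, accuracy_at_cutoff(ranking, cutoff=5))
```

Both classifiers make exactly two mistakes at this cutoff, which is why accuracy cannot tell them apart even though Classifier 1 places its errors closer to the cutoff.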

Page 4: Accuracy vs ranking

Accuracy-based evaluation makes two assumptions: a balanced class distribution and equal costs for misclassification
Ranking: sidesteps these assumptions
– Problem: training examples are labeled, not ranked

How to evaluate ranking?

Page 5: ROC curve (Provost & Fawcett, AAAI’97)

Page 6: How to calculate AUC

Rank the test examples in increasing order
Let r_i be the rank of the i-th positive example (left: low r_i, right: high r_i = better)
S_0 = ∑ r_i

AUC (Hand & Till, 2001, MLJ):

$$\hat{A} = \frac{S_0 - n_0(n_0+1)/2}{n_0 n_1}$$
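
The rank-sum calculation above can be sketched in Python as follows. This is a minimal illustration, not code from the paper: it takes n_0 to be the number of positive examples and n_1 the number of negative examples, assumes there are no tied scores, and encodes a ranking as a string of `+` and `-` labels in increasing rank order. The function name `auc_from_ranking` is hypothetical.

```python
from fractions import Fraction


def auc_from_ranking(ranking):
    """AUC from the rank-sum formula (S0 - n0*(n0+1)/2) / (n0*n1).

    `ranking` lists the true labels in increasing rank order, '+' for a
    positive example and any other character for a negative one.
    Ties are not handled in this sketch.
    """
    # Ranks are 1-based: the leftmost example has rank 1.
    positive_ranks = [rank for rank, label in enumerate(ranking, start=1)
                      if label == "+"]
    n0 = len(positive_ranks)      # number of positive examples
    n1 = len(ranking) - n0        # number of negative examples
    if n0 == 0 or n1 == 0:
        raise ValueError("need at least one positive and one negative example")
    s0 = sum(positive_ranks)      # S0: sum of the positive examples' ranks
    return (Fraction(s0) - Fraction(n0 * (n0 + 1), 2)) / (n0 * n1)


if __name__ == "__main__":
    print(auc_from_ranking("----+-++++"))
```

Using `Fraction` keeps the result exact, so it can be compared directly with hand-computed values such as 24/25.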

Page 7: An example

Classifier 1 – – – – + – + + + +

r_i: 5 7 8 9 10

$$\hat{A} = \frac{S_0 - n_0(n_0+1)/2}{n_0 n_1}$$

S_0 = 5+7+8+9+10 = 39
AUC = (39 – 5×6/2) / 25 = 24/25

Better result

Page 8: ROC curve and AUC

If A dominates D, then A is better than D
Often A and B do not dominate each other
AUC (area under the ROC curve)
– Overall performance

AUC for evaluating ranking
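
To make "area under the ROC curve" concrete, here is a hedged Python sketch that builds ROC points from a ranked list and integrates them with the trapezoid rule. The construction (sweep the threshold from the highest-ranked example downward, stepping up for each positive and right for each negative) is the standard one rather than anything shown in the transcript. The names `roc_points` and `area_under` are illustrative, and ties are not handled.

```python
from fractions import Fraction


def roc_points(ranking):
    """ROC points (false positive rate, true positive rate) for a ranking.

    `ranking` lists true labels in increasing rank order, '+' for positive
    and '-' for negative. The threshold is swept from the top rank down.
    """
    n_pos = ranking.count("+")
    n_neg = ranking.count("-")
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    tp = fp = 0
    points = [(Fraction(0), Fraction(0))]
    for label in reversed(ranking):
        if label == "+":
            tp += 1   # one more positive above the threshold: step up
        else:
            fp += 1   # one more negative above the threshold: step right
        points.append((Fraction(fp, n_neg), Fraction(tp, n_pos)))
    return points


def area_under(points):
    """Trapezoid-rule area under a piecewise-linear curve."""
    area = Fraction(0)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


if __name__ == "__main__":
    print(area_under(roc_points("----+-++++")))
```

A single number like this gives an overall comparison even when two ROC curves cross and neither dominates the other.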

Page 9: AUC

Two classifiers:

The AUC of Classifier 1: 24/25
The AUC of Classifier 2: 16/25
Classifier 1 is better than 2!

Classifier 1 – – – – + – + + + +

Classifier 2 + – – – – + + + + –

Page 10: AUC is more discriminating

For N examples:
– (N+1) different accuracies
– N(N+1)/2 different AUC values

AUC is a better and more discriminating evaluation measure than accuracy

Page 11: Naïve Bayes vs C4.4

Overall, Naïve Bayes outperforms C4.4 in AUC

Ling&Zhang, submitted, 2002

Page 12: PCA in Face Recognition

Page 13: Problem with PCA

The features are principal components
– Thus they do not correspond directly to the original features
– Problem with face recognition: we wish to pick a subset of original features rather than composed ones

Principal Feature Analysis: pick the best uncorrelated subset of features of a data set
– Equivalent to finding q dimensions of a random variable X = [x1, x2, …, xn]^T
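
The first point, that a principal component mixes original features instead of selecting one, can be seen on a toy two-feature data set. The sketch below is plain PCA, not Principal Feature Analysis. It uses the closed-form principal-axis angle for a 2×2 covariance matrix, and the data and the function name `first_component_2d` are made up for illustration.

```python
import math


def first_component_2d(samples):
    """Unit-length first principal component of two-feature data.

    Uses the closed form for a symmetric 2x2 covariance matrix:
    theta = 0.5 * atan2(2 * cov_xy, var_x - var_y).
    Returns the loadings (weight on feature 1, weight on feature 2).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in samples) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in samples) / (n - 1)
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return math.cos(theta), math.sin(theta)


if __name__ == "__main__":
    # Hypothetical data: two strongly correlated features.
    toy = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
    w1, w2 = first_component_2d(toy)
    print(f"loading on feature 1: {w1:.3f}, on feature 2: {w2:.3f}")
```

When the features are correlated, both loadings are far from zero, so the component cannot be read as "feature 1" or "feature 2". That is the motivation for selecting a subset of the original features instead.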

Page 14: How to find the q features?

[q1, q2, q3, …, qn]: i-th row = i-th feature

q

Page 15: The subspace

Page 16: Algorithm

Page 17: Result

Page 18: When PCA does not work

Page 19: PCA + Clustering = Bad Idea

Page 20: More…

Page 21: Rand Index for Clusters (Partitions)
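
The slide body did not survive in this transcript, so as a reminder of the measure named in the title, here is a minimal Python sketch of the standard (unadjusted) Rand index: the fraction of example pairs on which two partitions agree, meaning both put the pair in the same cluster or both put it in different clusters. The function name `rand_index` and the label encoding are illustrative, and the slide may have used a different variant.

```python
from fractions import Fraction
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Unadjusted Rand index between two partitions of the same items.

    `labels_a[i]` and `labels_b[i]` are the cluster labels of item i in
    each partition. Label names themselves do not matter, only which
    items share a label.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same items")
    if len(labels_a) < 2:
        raise ValueError("need at least two items to form a pair")
    agreements = 0
    total_pairs = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_in_a = labels_a[i] == labels_a[j]
        same_in_b = labels_b[i] == labels_b[j]
        if same_in_a == same_in_b:
            agreements += 1
        total_pairs += 1
    return Fraction(agreements, total_pairs)


if __name__ == "__main__":
    # Identical partitions with swapped label names still agree on every pair.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```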

Page 22: Results