![Page 1: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/1.jpg)
ROC analysis, a method to assess binarydecision rules
David MakowskiINRA
An introduction to modelling, Poznan, Nov. 2008
![Page 2: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/2.jpg)
Outline
1. What is a binary decision rule?
2. ROC analysis, a method to assess the accuracy of binary decision rules
3. An example: assessment of decision rules for the control of sclerotinia
4. Exercice with R:
Assessment of models for categorizing soft wheat fields according to theirgrain protein content
![Page 3: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/3.jpg)
1.What is a binary decision rule?
![Page 4: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/4.jpg)
Decision rule
« A rule for taking decisions in function of some variables ».
1. What is a binary decision rule?
Decision rule DecisionVariables
![Page 5: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/5.jpg)
Binary decision rule
« A rule for choosing among two decisions ».
1. What is a binary decision rule?
Decision rule
Decision 1
or
Decision 2
Variables
![Page 6: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/6.jpg)
• Apply a chemical treatment / No chemical treatment
• Sow cultivar 1 / Sow cultivar 2
• Apply fertilizer / No application
…
Examples of binary decision rules
1. What is a binary decision rule?
![Page 7: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/7.jpg)
Binary decision rule based on an indicator and a decision threshold
« I apply fungicide if the indicator is higher than a decisionthreshold »
« I apply fungicide if I ≥ S. No application otherwise »
• Indicator I = measure or model prediction (ex: % diseased organs).
• Threshold S = Numerical value (ex: 20%).
1. What is a binary decision rule?
![Page 8: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/8.jpg)
Optimization of
Two practical problems:
• Choose the best threshold S for a given indicator I.
• Choose the best indicator among several candidates.
« I apply fungicide if I ≥ S. No application otherwise »
1. What is a binary decision rule?
![Page 9: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/9.jpg)
A framework for assessing binary decision rules
1. Define a series of indicators (measured variables and/or models).
2. Define the range of variation for the threshold S associatedto each indicator (e.g 0-100 % of diseased flowers).
3. Define one or several criteria for assessing the decisionrules (i.e the combinations of all possible I and S).
4. Estimate the values of the criteria for each rule.
5. Choose the « best » rule.
1. What is a binary decision rule?
![Page 10: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/10.jpg)
2. ROC analysis
ROC = Receiver Operating Characteristic
![Page 11: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/11.jpg)
ROC analysis
Notations
Y : a random variable taking the value 0 or 1 for a negative and positive responserespectively.
I : a variable corresponding to the output of a given indicator.
S : a decision threshold.
Examples for Y
Y = 0 if the yield loss due to the disease is small, Y=1 otherwise.
Y = 0 if the percentage of diseased plants at harvest < 10%, Y=1 otherwise.
Y = 0 if weed biomass < 0.15 t/ha, Y=1 otherwise.
2. ROC analysis
![Page 12: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/12.jpg)
ROC analysis
n plots with Y=0 (e.g. % diseased plants at harvest < 10%).
m plots with Y=1 (e.g. % diseased plants at harvest ≥ 10%).
(i). Determine the value of the indicator I for each plot.
(ii). Define a decision threshold S.
(iii). Sensitivity = Prob(I ≥ S | Y=1) = 1 – False negative rate
(iv). Specificity = Prob(I < S | Y=0) = 1 – False positive rate
(v). ROC curve : Sensitivity (S) versus 1 – Specificity (S)
(vi). Estimate the area under the ROC curve (AUC) for each indicator I.
If AUC~0.5, the indicator is not useful (not better than random decisions).
2. ROC analysis
![Page 13: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/13.jpg)
Number of plots
Threshold S
Plots with highincidence at harvest
Plots with lowincidence at harvest
SensitivitySpecificity
Value of I
ROC analysis2. ROC analysis
![Page 14: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/14.jpg)
Specificity
Sensitivity
0
0
1
1Optimal values
AREA
2. ROC analysis
![Page 15: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/15.jpg)
1 - Specificity
Sensitivity
1
0 1
Area Under the Curve=
Probability of correctlyranking two plots
2. ROC analysis
Optimal values
![Page 16: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/16.jpg)
3. An example: assessment of decisionrules for the control of sclerotinia
![Page 17: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/17.jpg)
Sclerotinia sclerotiorum, Lib., de Bary, in oilseed rape crops
• High variability of disease incidence across sites and years.
• High yield losses if disease incidence at harvest > 10%.
• Efficient chemical treatments exist, but are not always required.
3. An example
![Page 18: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/18.jpg)
Rule 1. Indicator I1 = measured proportion of diseased plant organs
Agricultural field n plant organs are collectedbefore treatment.
If y/n ≥ T, treatmentIf y/n < T , no treatment
Organs are scored for the presence of the disease.
y organs are diseased.
3. An example
![Page 19: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/19.jpg)
In this example, organs = flowers
tSowing HarvestFlowering
n collected flowers
Incubation in Petri dishes
y diseased flowers
If y / n ≥ T treatment
else no treatment
3. An example
![Page 20: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/20.jpg)
0Less than normal
5Normal (50-60
mm)
10More than
normalRain in the last month before
flowering
0Low
5Normal
10High
Plant density
0Dry
10WetType of field
0Low
5Moderate
15HighLevel of
infection in the last crop
0No
15YesOther host crops during the last
five years
01
102-3
203-5
30>5Number of oil-
seed crops during the last
ten years
PointsLevelRisk factor
3. An example
Rule 2. Indicator I2 = sum of risk points
![Page 21: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/21.jpg)
3. An example
Probability that the diseaseincidence at harvest is higher
than 10%
% diseased flowers atflowering
Sum of risk points at flowering
MODEL
Rule 3. Indicator I3 = output of a logistic model
![Page 22: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/22.jpg)
Logistic model
x
Probability
1
0
( )( )0 1 1 2 2
0 1 1 2 2
exp
1 exp
x xz
x x
θ θ θθ θ θ
+ +=
+ + +
Rule 3. Indicator I3 = output of a logistic model
3. An example
![Page 23: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/23.jpg)
3. An example
Which rule is the best?
If I1 ≥ S, a treatment is recommended
If I2 ≥ S, a treatment is recommended
If I3 ≥ S, a treatment is recommended
Three rules for deciding about a chemical treatment at flowe ring
![Page 24: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/24.jpg)
Two types of error
Type 1. False positive rate = 1 - Specificity
I ≥ S (a treatment was recommended)
but % diseased plants at harvest < 10%
(a treatment was not required)
Type 2. False negative rate = 1 – Sensitivity
I < S (a treatment was not recommended)
but % diseased plants at harvest ≥ 10%
(a treatment was required)
3. An example
![Page 25: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/25.jpg)
Low disease incidence at harvest
% diseased flowers
Fre
quen
cy
0.0 0.2 0.4 0.6 0.8
05
1015
2025
High disease incidence at harvest
% diseased flowers
Fre
quen
cy
0.0 0.2 0.4 0.6 0.8 1.0
01
23
45
6
Point sum
Fre
quen
cy
20 40 60 80 100
05
1015
Point sum
Fre
quen
cy
20 40 60 80 100
01
23
4
3. An example
Data from 85 experimental plots in France
Fraction of diseased flowers
Risk point sum
![Page 26: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/26.jpg)
TAB<-read.table("f:\\ Exemples\\Sclero0203.txt",header=T,sep="\t")
TAB<-TAB[is.na(TAB[,1])==F,]
Ind.1<-TAB$KIT
Ind.2<-TAB[,3]+TAB[,4]+TAB[,5]+TAB[,6]+TAB[,7]+TAB[,8]+TAB[,9]+TAB[,10]+TAB[,11]+TAB[,12]
Incidence.t<-0.10
Incidence<-TAB$TxAttNT
Incidence[Incidence<Incidence.t]<-0
Incidence[Incidence>=Incidence.t]<-1
Fit<-glm(Incidence~Ind.1+Ind.2,family=binomial)
3. An example
R code for fitting the logistic model
# glm = R function for fitting generalized linear models (e.g logistic, Poisson)
![Page 27: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/27.jpg)
> print(summary(Fit))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.31581 1.20336 -3.586 0.000335 ***
Ind.1 5.27346 1.21518 4.340 1.43e-05 ***
Ind.2 0.01356 0.01800 0.753 0.451329
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
3. An example
![Page 28: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/28.jpg)
3. An example
R code for ROC analysis for rule 1
library(ROCR)
pred<-prediction(Ind.1,Incidence)
perf<-performance(pred,"sens","spec")
spec.1<-perf@"x.values"[[1]]
sens.1<-perf@"y.values"[[1]]
plot(spec.1,sens.1, ylab="Sensibilité", xlab="Spécificité", type="l",lty=1,lwd=3)
abline(1,-1)
![Page 29: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/29.jpg)
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Spécificité
Sen
sibi
lité
Specificity
Sen
sitiv
ity
% diseased flowers
Risk point sum
ROC curves for rules 1 and 2
3. An example
![Page 30: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/30.jpg)
Pred.cv<-NA
for (i in (1:length(TAB[,1]))) {
TAB.est.i<-data.frame(Ind.1[-i],Ind.2[-i],Incidence[-i])
TAB.pred.i<-c(Ind.1[i],Ind.2[i])
Fit.cv<-glm(TAB.est.i$Incidence~TAB.est.i[,1]+TAB.est.i[,2],family=binomial,data=TAB.est.i)
Para<-as.vector(Fit.cv$coefficients)
Pred.i<-exp(Para[1]+Para[2]*TAB.pred.i[1]+Para[3]*TAB.pred.i[2])/(1+ exp(Para[1]+Para[2]*TAB.pred.i[1]+Para[3]*TAB.pred.i[2]))
Pred.cv<-c(Pred.cv, Pred.i)
}
pred<-prediction(Pred.cv[-1],Incidence)
perf<-performance(pred,"sens","spec")
R code for ROC analysis with cross validation
3. An example
![Page 31: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/31.jpg)
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Spécificité
Sen
sibi
lité
Specificity
Sen
sitiv
ity
without cross validation
with cross validation
ROC curves for rule 3
3. An example
![Page 32: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/32.jpg)
3. An example
Area under the ROC curves
Indicator AUC
% diseased flowers 0.88
Point sum 0.62
Logistic (without cross validation) 0.87
Logistic (with cross validation) 0.85
![Page 33: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/33.jpg)
5. Exercice with R:
Assessment of models for categorizing soft wheatfields according to their grain protein content
Four candidate indicators:
i. Transmitance
ii. Nitrogen nutrition index
iii. Model 1 (dynamic crop model)
iv. Model 2 (static crop model including two input varia bles)
Objective: Identify plots with high grain protein conte nt (>11.5%)
Which indicator is the best? What is its optimal decision threshold?
![Page 34: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/34.jpg)
![Page 35: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/35.jpg)
library(ROCR)
#Read an external data fileTAB<-read.table("f:\\David\\Enseignements\\FormationPologne\\dataAgralys.txt",header=T,sep="\t")print(TAB)
#Grain protein thresholdGPC.t<-11.5
#Variable of reference (binary variable Y)GPC<-TAB$ProteinGPC[GPC<GPC.t]<-0GPC[GPC>=GPC.t]<-1
#IndicatorsInd.1<-TAB$SPADInd.2<-TAB$NNIInd.3<-TAB$Model_1Ind.4<-TAB$Model_2
#Some graphspar(mfrow=c(2,2))hist(Ind.1[GPC==0], xlab="SPAD", main="Low grain protein content")hist(Ind.1[GPC==1], xlab="SPAD", main="High grain protein content")hist(Ind.2[GPC==0], xlab="NNI", main="Low grain protein content ")hist(Ind.2[GPC==1], xlab="NNI", main="High grain protein content ")
![Page 36: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/36.jpg)
Low grain protein content
SPAD
Fre
quen
cy
40 45 50
01
23
45
High grain protein content
SPAD
Fre
quen
cy
38 40 42 44 46 48 50 52
02
46
8
Low grain protein content
NNI
Fre
quen
cy
0.60 0.70 0.80 0.90
01
23
45
High grain protein content
NNI
Fre
quen
cy
0.7 0.8 0.9 1.0 1.1
01
23
45
67
![Page 37: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/37.jpg)
####ROC analysis for Ind.1####
pred<-prediction(Ind.1,GPC)perf<-performance(pred,"auc")
#Area under the ROC curveauc.1<-perf@"y.values"print("AUC for Indicator 1")print(auc.1)
#Sensitivity and specificityperf<-performance(pred,"sens","spec")print(perf)spec.1<-perf@"x.values"[[1]]sens.1<-perf@"y.values"[[1]]
#ROC curveplot(spec.1,sens.1, ylab="Sensitivity", xlab="Specificity", type="l",lty=1,lwd=3)abline(1,-1)
#Thresholdprint(perf@"alpha.values"[[1]][spec.1>0.65 & sens.1>0.65])
![Page 38: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/38.jpg)
#Logistic regressions#Combination of Ind.1 and Ind.2Fit<-glm(GPC~Ind.1+Ind.2,family=binomial)
print("Combination of Ind.1 and Ind.2")print(summary(Fit))
print("ROC analysis for the combinations of indicators without cross-validation")
#ROC analysis for Ind.1 + Ind.2
pred<-prediction(Fit$fitted.values,GPC)perf<-performance(pred,"auc")auc<-perf@"y.values"print("AUC for Ind.1 + Ind.2")print(auc)perf<-performance(pred,"sens","spec")spec.1<-perf@"x.values"[[1]]sens.1<-perf@"y.values"[[1]]plot(spec.1,sens.1, ylab="Sensitivity", xlab="Specificity", type="l",lty=1,lwd=3)abline(1,-1)
![Page 39: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/39.jpg)
print("ROC analysis for the combinations of indicators with cross-validation")
#Initialization
Pred.cv<-NA
for (i in (1:length(TAB[,1]))) {
#New tablesTAB.est.i<-data.frame(Ind.1[-i],Ind.2[-i],GPC[-i])TAB.pred.i<-c(Ind.1[i],Ind.2[i])
#Combination of Ind.1 and Ind.2Fit.cv<-glm(TAB.est.i$GPC~TAB.est.i[,1]+TAB.est.i[,2],family=binomial,data=TAB.est.i)Para<-as.vector(Fit.cv$coefficients)Pred.i<-exp(Para[1]+Para[2]*TAB.pred.i[1]+Para[3]*TAB.pred.i[2])/(1+exp(Para[1]+Para[2]*TAB.pred.i[1]+Para[3]*TAB.pred.i[2]))Pred.cv<-c(Pred.cv, Pred.i)
}
![Page 40: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/40.jpg)
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.4
0.8
Specificity
Sen
sitiv
ity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.4
0.8
Specificity
Sen
sitiv
ity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.4
0.8
Specificity
Sen
sitiv
ity
SPAD NNI Model 2Model 1
without CVwith CV
Indicator AUC
SPAD 0.77
NNI 0.84
Model 1 0.46
Model 2 0.62
SPAD + NNI 0.86 (0.90)
![Page 41: ROC analysis, a method to assess binary decision rules](https://reader034.vdocuments.us/reader034/viewer/2022042600/586a29891a28ab17578c297b/html5/thumbnails/41.jpg)
References
Barbottin A, Makowski D, Le Bail M, Jeuffroy M-H, Bouchard Ch, Barrier C. 2008. Comparison of models and indicators for categorizing soft wheat fields according to their grain protein contents. European Journal of Agronomy 29, 159-183.
Hughes, G., McRoberts, N., Burnett, F.J., 1999. Decision-making and diagnosis in disease management. Plant Pathol. 48, 147-153.
Makowski D., Denis J-B., Ruck L., Penaud A. 2008. A Bayesian approach to assess the accuracy of a diagnostic test based on plant disease measurement. Crop Protection 27:1187-1193.
Makowski, D., M. Taverne, J. Bolomier, M. Ducarne. 2005. Comparison of risk indicators for sclerotinia control in oilseed rape. Crop Protection 24:527-531.
Pepe MS. 2003. The statistical evaluation of medical tests for classification and prediction. Oxford Statistical Science Series 28.
Sing, T., Sander, O., Beerenwinkel, N., Lengauer, T., 2005. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940-3941.
Swets, J.A., Dawes, R.M., Monahan, J., 2000. Better decisions through science. Scientific American 283, 70-75.