1
Data Analysis
Mark Stamp
2
Topics
Experimental design
o Training set, test set, n-fold cross validation, thresholding, imbalance, etc.
Accuracy
o False positive, false negative, etc.
ROC curves
o Area under the ROC curve (AUC)
o Partial AUC (sometimes written as AUCp)
3
Objective
Assume that we have a proposed method for detecting malware
We want to determine how well it performs on a specific dataset
o We want to quantify effectiveness
Ideally, compare to previous work
o But, often difficult to directly compare
Comparisons to AV products?
4
Basic Assumptions
We have a set of known malware
o All from a single (metamorphic) “family”…
o …or, at least, all of a similar type
o For broader “families”, more difficult
Also, a representative non-family set
o Often assumed to be benign files
o The more diverse, the more difficult
Much depends on problem specifics
5
Experimental Design
Want to test a malware detection score
o Refer to malware dataset as the match set
o And benign dataset is the nomatch set
Partition match set into…
o Training set, used to determine parameters of the scoring function
o Test set, reserved to test the scoring function generated from the training set
Note: Cannot test on the training set
6
Training and Scoring
Two phases: training and scoring
Training phase
o Train a model using the training set
Scoring phase
o Score data in the test set and score the nomatch (benign) set
Analyze results from scoring phase
o Assume representative of the general case
7
Scatterplots
Train a model on the training set
Apply score to test and nomatch sets
o Can visualize results as a scatterplot
[Figure: scatterplot of score vs. test case, showing match scores and nomatch scores]
8
Experimental Design
A couple of potential problems…
o How to partition the match set?
o How to get the most out of a limited dataset?
Why are these things concerns?
o When we partition the match set, we might get biased training/test sets, and…
o …more data points is “more better”
Cross validation solves these problems
9
n-fold Cross Validation
Partition match set into n equal subsets
o Denote subsets as S1, S2, …, Sn
Let training set be S2 ∪ S3 ∪ … ∪ Sn
o And test set is S1
Repeat with training set S1 ∪ S3 ∪ … ∪ Sn
o And test set S2
And so on, for each of the n “folds”
o In our work, we usually select n = 5
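As a sketch, the fold partition and the train/score loop above might look like this in Python; `train_model` and `score` are hypothetical stand-ins for an actual scoring method, not part of the slides:

```python
def n_fold_indices(num_samples, n):
    """Partition sample indices into n (nearly) equal subsets S1, ..., Sn."""
    folds = [[] for _ in range(n)]
    for i in range(num_samples):
        folds[i % n].append(i)
    return folds

def cross_validate(match_set, train_model, score, n=5):
    """For each fold, train on the other n-1 subsets and score the held-out one."""
    folds = n_fold_indices(len(match_set), n)
    all_scores = []
    for k, test_idx in enumerate(folds):
        # Training set is the union of every fold except fold k
        train = [match_set[i] for j, fold in enumerate(folds) if j != k for i in fold]
        model = train_model(train)                                  # hypothetical
        all_scores.extend(score(model, match_set[i]) for i in test_idx)  # hypothetical
    return all_scores
```

With n = 5, every match sample receives a score from a model that was never trained on it.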
10
n-fold Cross Validation
Benefits of cross validation?
Any bias in match data is smoothed out
o Since bias only affects one/few of the Si
Obtain lots more match scores
o Usually, no shortage of nomatch data
o But match data can be very limited
And it’s easy to do, so why not?
o Best of all, it sounds so fancy…
11
Thresholding
Threshold based on test vs nomatch
o After training and scoring phases
Ideal is complete separation
o I.e., no overlap in scatterplot
o Usually, that doesn’t happen
o So, where to set the threshold?
In practical use, thresholding is critical
o At research stage, more of a distraction
12
Thresholding
Where to set threshold?
o Left case is easy; right case, not so much
[Figure: two scatterplots of score vs. test case, one with clear separation and one with overlap]
13
Quantifying Success
We need a way to quantify “better”
o Ideas?
[Figure: two scatterplots of score vs. test case]
14
Accuracy
Given scatterplot and a threshold…
We have the following 4 cases
o True positive: correctly classified as +
o False positive: incorrectly classified as +
o True negative: correctly classified as −
o False negative: incorrectly classified as −
TP, FP, TN, FN, respectively, for short
o Append “R” to each for “rate”
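Given match/nomatch scores and a threshold, the four counts can be tallied as follows, assuming higher scores indicate the positive (malware) class; the example scores are made up:

```python
def confusion_counts(match_scores, nomatch_scores, threshold):
    """Tally TP, FP, TN, FN for a threshold; higher score = positive."""
    tp = sum(s >= threshold for s in match_scores)    # malware flagged as malware
    fn = len(match_scores) - tp                       # malware missed
    fp = sum(s >= threshold for s in nomatch_scores)  # benign flagged as malware
    tn = len(nomatch_scores) - fp                     # benign passed
    return tp, fp, tn, fn

print(confusion_counts([0.9, 0.8, 0.4], [0.3, 0.2, 0.7], 0.5))  # (2, 1, 2, 1)
```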
15
Sensitivity and Specificity
The TPR is also known as sensitivity, while the TNR is known as specificity
Consider a medical test
o Sensitivity is the percentage of sick people who “pass” the test (as they should)
o Specificity is the percentage of healthy people who “fail” the test (as they should)
Inherent tradeoff between TPR/TNR
o Note that these depend on the threshold
16
Accuracy
Let P be the number of positive cases tested and N the number of negative cases tested
o Note: P is the size of the test set, N the size of the nomatch set
o Also, P = TP + FN and N = TN + FP
Finally, accuracy = (TP + TN) / (P + N)
o Note that accuracy ranges from 0 to 1
o Accuracy of 1 is the ideal case
o Accuracy 0? Don’t give up your day job…
17
Balanced Accuracy
Often, there is a large imbalance between test set and nomatch set
o Test set is small relative to nomatch set
Define
  balanced accuracy = (TPR + TNR) / 2 = 0.5 TP/P + 0.5 TN/N
o Errors on both sets weighted the same
Consider imbalance issue again later
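A quick illustration, with made-up counts, of why balanced accuracy can tell a very different story than plain accuracy on imbalanced data:

```python
def accuracy(tp, tn, p, n):
    return (tp + tn) / (p + n)

def balanced_accuracy(tp, tn, p, n):
    return 0.5 * tp / p + 0.5 * tn / n

# Imbalanced example: P = 100 malware, N = 9900 benign.
tp, fn = 50, 50      # only half the malware is detected...
tn, fp = 9900, 0     # ...but every benign file is classified correctly
p, n = tp + fn, tn + fp

print(accuracy(tp, tn, p, n))           # 0.995: looks excellent
print(balanced_accuracy(tp, tn, p, n))  # 0.75: reveals the weak TPR
```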
18
Accuracy
Accuracy tells us something…
o But it depends on where the threshold is set
o How should we set the threshold?
o Seems we are going around in circles, like a dog chasing its tail
Bottom line? Still don’t have a good way to compare different techniques
o Next slide, please…
19
ROC Curves
Receiver Operating Characteristic
o Originated in electrical engineering
o But now widely used in many fields
What is an ROC curve?
o Plot TPR vs FPR by varying the threshold through the range of scores
o That is, FPR on x-axis, TPR on y-axis
o Equivalently, 1 – specificity vs sensitivity
o What the … ?
20
ROC Curve
Suppose threshold is set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 1.0
o FPR = 1.0 – TNR = 1.0 – 0.0 = 1.0
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
21
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 1.0
o FPR = 1.0 – TNR = 1.0 – 0.2 = 0.8
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
22
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 1.0
o FPR = 1.0 – TNR = 1.0 – 0.4 = 0.6
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
23
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 1.0
o FPR = 1.0 – TNR = 1.0 – 0.6 = 0.4
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
24
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 0.8
o FPR = 1.0 – TNR = 1.0 – 0.6 = 0.4
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
25
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 0.6
o FPR = 1.0 – TNR = 1.0 – 0.6 = 0.4
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
26
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 0.6
o FPR = 1.0 – TNR = 1.0 – 0.8 = 0.2
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
27
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 0.4
o FPR = 1.0 – TNR = 1.0 – 0.8 = 0.2
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
28
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 0.4
o FPR = 1.0 – TNR = 1.0 – 1.0 = 0.0
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
29
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 0.2
o FPR = 1.0 – TNR = 1.0 – 1.0 = 0.0
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
30
ROC Curve
Suppose threshold set at yellow line
o Above yellow, classified as positive
o Below yellow, classified as negative
In this case,
o TPR = 0.0
o FPR = 1.0 – TNR = 1.0 – 1.0 = 0.0
[Figure: scatterplot with threshold line; corresponding point on TPR vs. FPR plot]
31
ROC Curve
Connect the dots… This is the ROC curve
What good is it?
o Captures info wrt all possible thresholds
o Removes threshold as a factor in the analysis
What does it all mean?
[Figure: completed ROC curve, TPR vs. FPR]
32
ROC Curve
Random classifier?
o Yellow 45 degree line
Perfect classifier?
o Red lines (Why?)
Above 45 degree line?
o Better than random
o The closer to the red, the closer to perfect
[Figure: ROC plot with 45-degree random-classifier line and perfect-classifier lines]
33
Area Under the Curve (AUC)
ROC curve lives within a 1x1 square
Random classifier?
o AUC ≈ 0.5
Perfect classifier (red)?
o AUC = 1.0
Example curve (blue)?
o AUC = 0.8
[Figure: ROC plot with random (diagonal), perfect (red), and example (blue) curves]
34
Area Under the Curve (AUC)
Area under ROC curve quantifies success
o 0.5 is like flipping a coin
o 1.0 is perfection achieved
AUC of ROC curve
o Enables us to compare different techniques
o And no need to worry about threshold
[Figure: ROC curve, TPR vs. FPR]
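A pure-Python sketch of the whole procedure: sweep the threshold through the observed score values to get (FPR, TPR) points, then integrate with the trapezoid rule. The scores here are made up for illustration:

```python
def roc_points(match_scores, nomatch_scores):
    """Return (FPR, TPR) points, sweeping from the highest threshold down."""
    thresholds = sorted(set(match_scores) | set(nomatch_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above all scores: nothing flagged
    for t in thresholds:
        tpr = sum(s >= t for s in match_scores) / len(match_scores)
        fpr = sum(s >= t for s in nomatch_scores) / len(nomatch_scores)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Trapezoidal area under the (FPR, TPR) curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Perfectly separated scores give AUC = 1.0
print(auc(roc_points([0.8, 0.9], [0.1, 0.2])))  # 1.0
```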
35
Partial AUC
Might only consider cases where FPR < p
“Partial” AUC is AUCp
o Area up to FPR of p
o Normalized by p
In this example,
o AUC0.4 = 0.2 / 0.4 = 0.5
o AUC0.2 = 0.08 / 0.2 = 0.4
[Figure: ROC curve with partial areas shaded up to FPR = 0.2 and FPR = 0.4]
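A sketch of AUCp along the same lines: integrate the (FPR, TPR) curve only up to FPR = p, then divide by p. The curve points below are assumed for illustration, not taken from the slide's figure:

```python
def partial_auc(points, p):
    """Area under (FPR, TPR) points up to FPR = p, normalized by p.
    points must be sorted by FPR and span FPR = 0 through at least p."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= p:
            break
        if x1 > p:  # cut the last trapezoid at FPR = p
            y1 = y0 + (y1 - y0) * (p - x0) / (x1 - x0)
            x1 = p
        area += (x1 - x0) * (y0 + y1) / 2
    return area / p

# For the random 45-degree line, AUCp = p/2, so low-FPR performance looks poor:
print(partial_auc([(0.0, 0.0), (1.0, 1.0)], 0.4))  # approximately 0.2
```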
36
Imbalance Problem
Suppose we train a model for a given malware family
In practice, we expect to score many more non-family files than family
o Number of negative cases is large
o Number of positive cases is small
So what? Let’s consider an example
37
Imbalance Problem
In practice, we need a threshold
For a given threshold, suppose sensitivity = 0.99, specificity = 0.98
o Then TPR = 0.99 and FPR = 0.02
Assume 1 in 1000 tested is malware
o Of the type our model is trained to detect
Suppose we scan, say, 100k files
o What do we find?
38
Imbalance Problem
Assuming TPR = 0.99 and FPR = 0.02
o And 1 in 1000 is malware
After scanning 100k files…
o Detect 99 of 100 actual malware (TP)
o Misclassify 1 malware as benign (FN)
o Correctly classify 97902 (out of 99900) benign as benign (TN)
o Misclassify 1998 benign as malware (FP)
39
Imbalance Problem
We have 97903 classified as benign
o Of those, 97902 are actually benign
o And 97902/97903 > 0.9999
We classified 2097 as malware
o Of these, only 99 are actual malware
o But 99/2097 < 0.05
Remember the “boy who cried wolf”?
o Here, we have a detector that cries wolf…
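The arithmetic above is easy to check with TPR = 0.99, FPR = 0.02, and 1 in 1000 of 100k scanned files being malware:

```python
total = 100_000
malware = total // 1000        # 100
benign = total - malware       # 99900

tp = round(0.99 * malware)     # 99 malware detected
fn = malware - tp              # 1 malware missed
fp = round(0.02 * benign)      # 1998 benign misclassified as malware
tn = benign - fp               # 97902 benign classified correctly

print(tn / (tn + fn))          # > 0.9999: a "benign" verdict is almost always right
print(tp / (tp + fp))          # < 0.05: a "malware" verdict is rarely right
```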
40
Imbalance Solution?
What to do?
There is an inherent tradeoff between sensitivity and specificity
Suppose we can adjust the threshold so
o TPR = 0.92 and FPR = 0.0003
As before…
o We have 1 in 1000 is malware
o And we test 100k files
41
Imbalance Solution?
Assuming TPR = 0.92 and FPR = 0.0003
o And 1 in 1000 is malware
After scanning 100k files…
o Detect 92 of 100 actual malware (TP)
o Misclassify 8 malware as benign (FN)
o Correctly classify 99870 (out of 99900) benign as benign (TN)
o Misclassify 30 benign as malware (FP)
42
Imbalance Solution?
We have 99878 classified as benign
o Of those, all but 8 are actually benign
o And 99870/99878 > 0.9999
We classified 122 as malware
o Of these, 92 are actual malware
o And 92/122 > 0.75
Can adjust threshold to further reduce the “crying wolf” effect
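The same sanity check with the adjusted threshold, TPR = 0.92 and FPR = 0.0003:

```python
total = 100_000
malware = total // 1000          # 100
benign = total - malware         # 99900

tp = round(0.92 * malware)       # 92 malware detected
fn = malware - tp                # 8 malware missed
fp = round(0.0003 * benign)      # 30 benign misclassified as malware
tn = benign - fp                 # 99870 benign classified correctly

print(tp / (tp + fp))            # 92/122 > 0.75: most malware alerts are now real
```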
43
Imbalance Problem
A better alternative?
Instead of lowering TPR to reduce FPR…
o Perform secondary testing on files that are initially classified as malware
o We can thus weed out most FP cases
This gives us the best of both worlds
o Low FPR, few benign reported as malware
No free lunch, so what’s the cost?
44
Bottom Line
Design your experiments properly
o Use n-fold cross validation (e.g., n = 5)
o Generally, cross validation is important
Thresholding is important in practice
o But not so useful for analyzing results
o Accuracy not so informative either
Use ROC curves and compute AUC
o Sometimes, partial AUC is better
Imbalance problem may be a significant issue