Steep learning curves
Reading: DH&S, Ch 4.6, 4.5
Administrivia
• HW1 due now
• Late days are ticking...
• No other news today.
Viewing and re-viewing
• Last time:
  • HW1 FAQ
  • 5 minutes of math: function optimization
  • Measuring performance
  • Cross-validation
• Today:
  • Learning curves
  • Metrics
  • The nearest-neighbor rule
Exercise
•Given the function:
•Find the extremum
•Show that the extremum is really a minimum
Mea culpa!
•I copied the wrong example out of the book.
•Oops. My bad.
•You guys did a great job figuring it out, though...
The saddle point
Cross-validation in words
• Shuffle the data vectors
• Break them into k chunks
• Train on the first k-1 chunks
• Test on the last chunk
• Repeat, with a different chunk held out each time
• Average all test accuracies together
CV in pictures
• Original data [X; y] → random shuffle → [X'; y']
• k-way partition into chunks [X1', y1'], [X2', y2'], ..., [Xk', yk']
• k train/test sets, one per held-out chunk
• k accuracies (e.g., 53.7%, 85.1%, 73.2%)
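The same procedure as a minimal Python sketch (not from the slides); `train` and `evaluate` stand in for whatever learner and accuracy measure you are using:

```python
import random

def k_fold_cv(X, y, k, train, evaluate):
    """k-fold CV: shuffle, partition into k chunks, train on k-1, test on the held-out chunk."""
    data = list(zip(X, y))
    random.shuffle(data)                      # random shuffle of the data vectors
    chunks = [data[i::k] for i in range(k)]   # k roughly equal chunks
    accuracies = []
    for i in range(k):
        test = chunks[i]
        train_set = [pair for j, chunk in enumerate(chunks) if j != i for pair in chunk]
        model = train([x for x, _ in train_set], [t for _, t in train_set])
        accuracies.append(evaluate(model, [x for x, _ in test], [t for _, t in test]))
    return sum(accuracies) / k                # average test accuracy over all folds
```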
But is it really learning?
• Now we know how well our models are performing
• But are they really learning?
• Maybe any classifier would do as well
  • E.g., a default classifier (pick the most likely class) or a random classifier
• How can we tell if the model is learning anything?
The learning curve
• Train on successively larger fractions of data
• Watch how accuracy (performance) changes
[Figure: accuracy vs. training-set size for three cases: learning (rising curve), static classifier (flat, no learning), anti-learning (falling, forgetting)]
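A rough sketch of how such a curve might be generated, again assuming the hypothetical `train`/`evaluate` helpers from the CV sketch above:

```python
def learning_curve(X_train, y_train, X_test, y_test, train, evaluate):
    """Train on successively larger fractions of the data and record test accuracy."""
    curve = []
    for pct in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
        n = max(1, int(pct * len(X_train)))        # size of the training subsample
        model = train(X_train[:n], y_train[:n])
        curve.append((pct, evaluate(model, X_test, y_test)))
    return curve  # plot accuracy vs. pct: rising = learning, flat = static, falling = anti-learning
```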
Measuring variance
• Cross-validation helps you get a better estimate of accuracy for small data
• Randomization (shuffling the data) helps guard against poor splits/ordering of the data
• Learning curves help assess learning rate/asymptotic accuracy
• Still one big missing component: variance
• Definition: the variance of a classifier is the fraction of error due to the specific data set it's trained on
Measuring variance
• Variance tells you how much you expect your classifier/performance to change when you train it on a new (but similar) data set
• E.g., take 5 samplings of a data source; train/test 5 classifiers
  • Accuracies: 74.2%, 90.3%, 58.1%, 80.6%, 90.3%
  • Mean accuracy: 78.7%
  • Std dev of accuracy: 13.4%
• Variance is usually a function of both the classifier and the data source
• High-variance classifiers are very susceptible to small changes in the data
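A quick check of the numbers above; the 13.4% figure matches the sample standard deviation (ddof=1 in NumPy):

```python
import numpy as np

acc = np.array([74.2, 90.3, 58.1, 80.6, 90.3])  # accuracies of the 5 classifiers
print(acc.mean())        # 78.7
print(acc.std(ddof=1))   # ~13.4 (sample standard deviation)
```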
Putting it all together
• Suppose you want to measure the expected accuracy of your classifier, assess learning rate, and measure variance all at the same time? In pseudocode (a runnable sketch follows below):

    for (i = 0; i < 10; ++i) {                        // variance reps
        shuffle data
        do 10-way CV partition of data
        for each train/test partition {               // cross-validation
            for (pct = 0.1; pct <= 0.9; pct += 0.1) { // learning curve
                subsample pct fraction of training set
                train on subsample, test on test set
            }
        }
        avg across all folds of CV partition
        generate learning curve for this partition
    }
    get mean and std across all curves
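One possible runnable version of the pseudocode above, with illustrative names (`train`, `evaluate`) that are not from the slides:

```python
import random
import numpy as np

def accuracy_experiment(X, y, train, evaluate, reps=10, k=10,
                        pcts=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Repeat (shuffle -> k-way CV -> learning curve) to estimate mean and variance of accuracy."""
    curves = []                                        # one averaged learning curve per rep
    for _ in range(reps):                              # variance reps
        data = list(zip(X, y))
        random.shuffle(data)                           # shuffle data
        chunks = [data[i::k] for i in range(k)]        # k-way CV partition
        fold_curves = []
        for i in range(k):                             # each train/test partition (xval)
            test = chunks[i]
            train_set = [p for j, c in enumerate(chunks) if j != i for p in c]
            curve = []
            for pct in pcts:                           # learning curve
                n = max(1, int(pct * len(train_set)))  # subsample pct fraction of training set
                sub = train_set[:n]                    # already shuffled, so a prefix is random
                model = train([x for x, _ in sub], [t for _, t in sub])
                curve.append(evaluate(model, [x for x, _ in test], [t for _, t in test]))
            fold_curves.append(curve)
        curves.append(np.mean(fold_curves, axis=0))    # avg across all folds of this partition
    curves = np.array(curves)
    return curves.mean(axis=0), curves.std(axis=0, ddof=1)  # mean and std across all curves
```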
Putting it all together
[Figure: learning-curve results on the “hepatitis” data]
5 minutes of math...
• Decision trees are non-metric
• They don't know anything about relations between instances, except the sets induced by feature splits
• Often, though, we have well-defined distances between points
• The idea of distance is encapsulated by a metric
5 minutes of math...
• Definition: a metric function d(a, b) is a function that obeys the following properties:
  1. Non-negativity: d(a, b) ≥ 0
  2. Reflexivity: d(a, b) = 0 if and only if a = b
  3. Symmetry: d(a, b) = d(b, a)
  4. Triangle inequality: d(a, c) ≤ d(a, b) + d(b, c)
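A small sketch (not from the slides) that spot-checks these four properties for a candidate distance function on a handful of sample points:

```python
def check_metric(d, points, tol=1e-9):
    """Spot-check the four metric properties of d on a finite set of sample points (tuples)."""
    for a in points:
        for b in points:
            assert d(a, b) >= -tol                          # 1. non-negativity
            assert (abs(d(a, b)) < tol) == (a == b)         # 2. reflexivity
            assert abs(d(a, b) - d(b, a)) < tol             # 3. symmetry
            for c in points:
                assert d(a, c) <= d(a, b) + d(b, c) + tol   # 4. triangle inequality
    return True

# Example: ordinary Euclidean distance in the plane passes all four checks
check_metric(lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5,
             [(0, 0), (1, 2), (3, 1)])
```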
5 minutes of math...
• Euclidean distance: d_E(x_a, x_b) = √( Σ_i (x_a,i − x_b,i)² )
5 minutes of math
[Figure: two points x_a and x_b joined by a straight line of length d_E(x_a, x_b)]
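As a sketch, the Euclidean distance for two feature vectors of equal length:

```python
import math

def euclidean_distance(xa, xb):
    """Straight-line (L2) distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))
```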
5 minutes of math...
• Manhattan (taxicab) distance: d_M(x_a, x_b) = Σ_i |x_a,i − x_b,i|
• Distance travelled along a grid between two points
• No diagonals allowed
• Good for integer features
5 minutes of math
[Figure: the same two points x_a and x_b connected by a grid path of length d_M(x_a, x_b)]
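And the corresponding Manhattan distance sketch:

```python
def manhattan_distance(xa, xb):
    """Grid (L1 / taxicab) distance between two feature vectors: no diagonals."""
    return sum(abs(a - b) for a, b in zip(xa, xb))
```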
5 minutes of math...
• What if some attribute is categorical?
• Typical answer is Hamming (sometimes 0/1) distance:
  • For each attribute, add 1 if the instances differ in that attribute, else 0
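A matching sketch of Hamming distance over two instances with the same attributes:

```python
def hamming_distance(xa, xb):
    """Count of attributes on which the two instances differ (add 1 per differing attribute)."""
    return sum(1 for a, b in zip(xa, xb) if a != b)
```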
Distances in classification
• Nearest-neighbor rule: find the nearest instance to the query point in feature space, return the class of that instance
• Simplest possible distance-based classifier
• With more notation: for query x_q, let i* = argmin_i d(x_q, x_i) over the training instances x_i, and predict ŷ(x_q) = y_{i*}
• Distance here is “whatever's appropriate to your data”
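A minimal sketch of the nearest-neighbor rule (names are illustrative; the default distance is Euclidean, but any metric appropriate to your data can be passed in):

```python
import math

def nn_classify(x_query, X_train, y_train, distance=None):
    """Nearest-neighbor rule: return the label of the training instance closest to the query."""
    if distance is None:  # default: Euclidean distance
        distance = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    i_star = min(range(len(X_train)), key=lambda i: distance(x_query, X_train[i]))
    return y_train[i_star]
```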
Properties of NN
• Training time of NN?
• Classification time?
• Geometry of model?
[Figure: a query point compared to training points via d(·, ·), with the feature space split into regions that are closer to one training point or the other]
NN miscellany
• Slight generalization: k-nearest neighbors (k-NN)
• Find the k training instances closest to the query point
• Vote among them for the label (a sketch follows below)
• Q: How does this affect the system?
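A small k-NN sketch with majority voting (again, illustrative names, Euclidean distance by default):

```python
import math
from collections import Counter

def knn_classify(x_query, X_train, y_train, k=3, distance=None):
    """k-NN rule: find the k training instances closest to the query and vote on the label."""
    if distance is None:  # default: Euclidean distance
        distance = lambda a, b: math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
    nearest = sorted(range(len(X_train)), key=lambda i: distance(x_query, X_train[i]))[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]   # majority label among the k neighbors
```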