Lazy Learning: k-Nearest Neighbour
Motivation: availability of large amounts of processing power improves our ability to tune k-NN classifiers
What is Lazy Learning?
• Compare ANNs with CBR or k-NN classifiers
– Artificial Neural Networks are eager learners
• training examples compiled into a model at training time
• the examples themselves are not available at runtime
– CBR and k-Nearest Neighbour are lazy learners
• little offline learning done
• work deferred to runtime
Compare with the conventional use of lazy vs. eager evaluation in computer science.
Outline
• Classification problems
• Classification techniques
• k-Nearest Neighbour
– condensing the training set
– feature selection
– feature weighting
• Ensemble techniques in ML
Classification problems
• An exemplar is characterised by a set of features; decide the class to which the exemplar belongs
Compare regression problems:
• an exemplar is characterised by a set of features;
• decide the value of a continuous output (dependent) variable
Classifying apples and pears
        Greenness  Height  Width  Taste  Weight  Height/Width  Class
No. 1      210       60     62    Sweet   186        0.97      Apple
No. 2      220       70     51    Sweet   180        1.37      Pear
No. 3      215       55     55    Tart    152        1.00      Apple
No. 4      180       76     40    Sweet   152        1.90      Pear
No. 5      220       68     45    Sweet   153        1.51      Pear
No. 6      160       65     68    Sour    221        0.96      Apple
No. 7      215       63     45    Sweet   140        1.40      Pear
No. 8      180       55     56    Sweet   154        0.98      Apple
No. 9      220       68     65    Tart    221        1.05      Apple
No. 10     190       60     58    Sour    174        1.03      Apple
No. x      222       70     55    Sweet   185        1.27      ?

To what class does No. x belong?
Distance/Similarity Function
For query q and training set X (described by features F), compute d(x, q) for each x ∈ X, where

$$d(x, q) = \sum_{f \in F} w_f \, \delta(x_f, q_f)$$

and where

$$\delta(x_f, q_f) = \begin{cases} \lvert x_f - q_f \rvert & f \text{ is continuous} \\ 0 & f \text{ is discrete and } x_f = q_f \\ 1 & f \text{ is discrete and } x_f \neq q_f \end{cases}$$
Category of q decided by its k Nearest Neighbours
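A minimal sketch of this classifier in Python (my illustration, not code from the lecture): feature values are assumed to be numbers (continuous) or strings (discrete), and all weights w_f default to 1.

from collections import Counter

def delta(xf, qf):
    # continuous feature: absolute difference; discrete feature: 0 if equal, else 1
    if isinstance(xf, (int, float)) and isinstance(qf, (int, float)):
        return abs(xf - qf)
    return 0 if xf == qf else 1

def distance(x, q, weights=None):
    # d(x, q) = sum over features f of w_f * delta(x_f, q_f)
    weights = weights or [1.0] * len(q)
    return sum(w * delta(xf, qf) for w, xf, qf in zip(weights, x, q))

def knn_classify(X, y, q, k=3, weights=None):
    # rank all training examples by distance to the query, then vote among the k nearest
    nearest = sorted(range(len(X)), key=lambda i: distance(X[i], q, weights))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

# four rows from the apples-and-pears table: (greenness, height, width, taste, weight, h/w)
X = [(210, 60, 62, "Sweet", 186, 0.97), (220, 70, 51, "Sweet", 180, 1.37),
     (215, 55, 55, "Tart", 152, 1.00), (180, 76, 40, "Sweet", 152, 1.90)]
y = ["Apple", "Pear", "Apple", "Pear"]
print(knn_classify(X, y, (222, 70, 55, "Sweet", 185, 1.27), k=3))

Note that unscaled continuous features (greenness and weight here) dominate the sum; normalising or weighting features via the w_f above addresses this, which is one reason feature weighting appears in the outline.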
k-NN and Noise
• 1-NN is easy to implement
– but susceptible to noise
• a misclassification every time a noisy pattern is retrieved
• k-NN with k ≥ 3 will overcome this (see the sketch below)
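A tiny illustration of the point (made-up 1-D data with one deliberately mislabeled point):

from collections import Counter

# 1-D training data; the point at 1.3 is noise (an "A" mislabeled as "B")
train = [(1.0, "A"), (1.2, "A"), (1.4, "A"), (1.3, "B"),
         (3.0, "B"), (3.2, "B"), (3.4, "B")]

def knn(q, k):
    nearest = sorted(train, key=lambda p: abs(p[0] - q))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn(1.31, k=1))  # "B" -- the noisy point happens to be the single nearest neighbour
print(knn(1.31, k=3))  # "A" -- the two clean neighbours outvote the noisy one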
e.g. Pregnancy prediction
http://svr-www.eng.cam.ac.uk/projects/qamc/
e.g. MVT
• Machine Vision for inspection of PCBs
– components present or absent
– solder joints good or bad
[Figure: PCB images, one with the component absent and one with it present]
Characterise image as a set of features
type       name  Wid2  Wid3  CenX  CenY   M1  Sig1   M2  Sig2   M3  Sig3  Min2
c0402_mvc  c815   556  1344     3    28  134     7   61    16  109     5    51
c0402_mvc  c804  1221  1253   -20   -49  127    30   78    34   97    39    54
c0402_mvc  c802   441  1189   -45   -52  122    28   91    24   89    40    68
c0402_mvc  c808   532  1294    59    60  130    23   74    29  138     9    58
c0402_mvc  c806  1384  1492    -9    65  140     6   72    15  144    13    62
c0402_mvc  c605   943  1278    51    -9  116    29   68    28  139     7    54
c0402_mvc  c813  1446  1462   209    48   93    15  139    29  162     6   100
c0402_mvc  c606  1219  1302    40    -8  161     7   93    25  135     3    65
c0402_mvc  c710  1113  1128   -99   -13  145     6   95    40   88    38    56
c0402_mvc  c703  1090  1386   -56   -18  149    11   72    28  147    14    52
c0402_mvc  c761  1214  1203   -95   -21  149    11   77    34  113    40    56
c0402_mvc  c701  1487  1296   -30    33  142     9   73    28  135    12    54
c0402_mvc  c732  1038  1196   -19    -3  148     8   62    10  100    44    56
c0402_mvc  c753  1015  1288    58   -16  123    13   73    35  128     8    54
c0402_mvc  c751  1146  1036  -163   -25  140     5  102    34   85     2    80
c0402_mvc  c760  1113  1091  -121    44  133    11   94    44   96    37    57
Classification techniques
• Artificial Neural Networks
– also good for non-linear regression
– black box
• development is tricky
• users do not know what is going on
• Decision Trees
– built using induction (information-theoretic analysis)
• k-Nearest Neighbour classifiers
– keep the training examples; find the k nearest at run time
Dimension reduction in k-NN
• Not all features are required
– noisy features are a hindrance
• Some examples are redundant
– retrieval time depends on the number of examples
[Diagram: an m × p table of examples reduced to n covering examples described by the q best features]
Condensed NN
D = set of training samples
Find E where E ⊆ D; the NN rule used with E should be as good as with D

choose x ∈ D randomly, D ← D \ {x}, E ← {x}
DO
    learning? ← FALSE
    FOR EACH x ∈ D
        classify x by NN using E
        if classification is incorrect
        then E ← E ∪ {x}, D ← D \ {x}, learning? ← TRUE
WHILE (learning? ≠ FALSE)
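A runnable Python sketch of the procedure above (my rendering: 1-NN with Euclidean distance, random seed example as in the pseudocode):

import random

def euclid(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def condense(D):
    # D: list of (features, label) pairs; returns E, a subset of D, for the 1-NN rule
    D = D[:]
    random.shuffle(D)
    E = [D.pop()]                          # choose x in D randomly, E <- {x}
    learning = True
    while learning:                        # repeat passes until nothing is absorbed
        learning = False
        remaining = []
        for x, label in D:
            nearest = min(E, key=lambda e: euclid(e[0], x))
            if nearest[1] != label:        # misclassified by NN using E
                E.append((x, label))       # E <- E u {x}, D <- D \ {x}
                learning = True
            else:
                remaining.append((x, label))
        D = remaining
    return E

Because of the shuffle and the pass order, different runs can return different condensed sets E, which is exactly the issue the next slides address.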
[Figure: Condensed NN on 100 examples in 2 categories; different runs give different CNN solutions]
Improving Condensed NN
• Sort the data by distance to the nearest unlike neighbour (NUN) — a code sketch of this ordering follows below
– identifies exemplars near the decision surface
– in the diagram below, B is more useful than A
[Diagram: exemplar A far from the decision surface, exemplar B close to it]
• Plain CNN gives different outcomes depending on data order
– that's a bad thing in an algorithm
[Figure: the same 100 examples in 2 categories — different CNN solutions vs. CNN using NUN]
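A sketch of the NUN ordering (illustrative helper names, assuming at least two classes): sort examples by the distance to their nearest unlike neighbour and feed them to the condensing loop in that order, so border exemplars like B are absorbed first and the arbitrary data order no longer matters.

def euclid(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def nun_order(D):
    # D: list of (features, label); smallest NUN distance first = closest to the decision surface
    def nun_dist(x, label):
        return min(euclid(x, e[0]) for e in D if e[1] != label)
    return sorted(D, key=lambda p: nun_dist(p[0], p[1]))

# use nun_order(D) in place of the random shuffle in the condense() sketch above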
Feature selection
(apples and pears table as above)
• Irrelevant features are noise
– they make classification harder
• Extra features add to the computation cost: retrieval time T ∝ m·p for m examples and p features (a feature-selection sketch follows below)
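One simple way to pick the q best features is a greedy wrapper around the classifier itself. The sketch below is my illustration (numeric features only): it scores candidate feature sets by leave-one-out 1-NN accuracy and adds features one at a time.

def loo_accuracy(X, y, feats):
    # leave-one-out 1-NN accuracy using only the feature indices in feats
    correct = 0
    for i in range(len(X)):
        others = (j for j in range(len(X)) if j != i)
        nearest = min(others, key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in feats))
        correct += y[nearest] == y[i]
    return correct / len(X)

def forward_select(X, y, q_max):
    # greedily add the feature that most improves accuracy; stop when nothing helps
    feats, best = [], 0.0
    while len(feats) < q_max:
        candidates = [f for f in range(len(X[0])) if f not in feats]
        if not candidates:
            break
        f_star = max(candidates, key=lambda f: loo_accuracy(X, y, feats + [f]))
        score = loo_accuracy(X, y, feats + [f_star])
        if score <= best:
            break
        feats, best = feats + [f_star], score
    return feats

With q selected features instead of p, each retrieval touches m·q values rather than m·p.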
Ensemble techniques
• For the user with more machine cycles than they know what to do with
[Diagram: several classifiers feed a combiner, which produces the outcome]
• Build several classifiers
– different training subsets
– different feature subsets
• Aggregate the results
– voting
• vote based on generalisation error
(a sketch follows below)
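A minimal ensemble sketch along these lines (my illustration, numeric features assumed): each member is a k-NN classifier over a random subset of examples and features, combined by plain majority vote. Weighting each member's vote by its estimated generalisation error would be the refinement the slide mentions.

import random
from collections import Counter

def build_ensemble(X, n_members=5):
    # each member: (random example indices, random feature indices)
    members = []
    for _ in range(n_members):
        idx = random.sample(range(len(X)), max(1, len(X) // 2))
        feats = random.sample(range(len(X[0])), max(1, len(X[0]) // 2))
        members.append((idx, feats))
    return members

def ensemble_classify(members, X, y, q, k=3):
    votes = Counter()
    for idx, feats in members:
        # each member runs k-NN on its own subsets, then casts one vote
        nearest = sorted(idx, key=lambda i: sum((X[i][f] - q[f]) ** 2 for f in feats))[:k]
        votes[Counter(y[i] for i in nearest).most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0]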
Conclusions
• Finding a covering set of the training data
– very good solutions exist
• Compare with results of Ensemble techniques