Post on 27-Jun-2020
Learning From Data, Lecture 17
Memory and Efficiency in Nearest Neighbor
M. Magdon-Ismail, CSCI 4100/6100
Recap: Similarity and Nearest Neighbor

Similarity: d(x, x′) = ||x − x′||

[Figure: decision regions of the 1-NN rule and the 21-NN rule.]
1. Simple.
2. No training.
3. Near optimal Eout: k → ∞, k/N → 0 =⇒ Eout → E∗out.
4. Good ways to choose k: k = 3; k = ⌈√N⌉; validation/cross-validation.
5. Easy to justify classification to customer.
6. Can easily do multi-class.
7. Can easily adapt to regression or logistic regression, where y[i](x) is the label of the i-th nearest neighbor of x:

   g(x) = (1/k) ∑_{i=1}^{k} y[i](x)   (regression)

   g(x) = (1/k) ∑_{i=1}^{k} ⟦y[i](x) = +1⟧   (logistic regression)
8. Computationally demanding.
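As a concrete illustration of points 6–7, here is a brute-force k-NN classifier and regressor in plain Python (a minimal sketch; the function names and toy data are made up):

```python
import math
from collections import Counter

def knn_predict(X, y, x, k=3):
    """k-NN classification: majority vote among the k nearest points."""
    # Sort training points by Euclidean distance to the query x -- O(Nd) distances.
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))
    votes = Counter(y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def knn_regress(X, y, x, k=3):
    """k-NN regression: average the k nearest targets, g(x) = (1/k) sum y[i](x)."""
    order = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))
    return sum(y[i] for i in order[:k]) / k

# Toy data: two well-separated clusters labeled +1 / -1.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [+1, +1, +1, -1, -1, -1]
print(knn_predict(X, y, (0.5, 0.5), k=3))   # query near the +1 cluster
```

Each prediction scans all N points, which is exactly the computational burden of point 8.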
© AML Creator: Malik Magdon-Ismail. Memory and Efficiency in Nearest Neighbor, 25 slides.
Computational Demands of Nearest Neighbor
Memory.
Need to store all the data: O(Nd) memory.
N = 10^6, d = 100, double precision ≈ 1 GB

Finding the nearest neighbor of a test point.
Need to compute the distance to every data point: O(Nd) time.
N = 10^6, d = 100, 3 GHz processor ≈ 3 ms (compute g(x))
N = 10^6, d = 100, 3 GHz processor ≈ 1 hr (compute the CV error)
N = 10^6, d = 100, 3 GHz processor > 1 month (choose the best k from among 1000 using CV)
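A back-of-envelope check of these figures (taking the ≈ 3 ms per query above as given):

```python
# Memory: N points in d dimensions, 8 bytes per double-precision coordinate.
N, d = 10**6, 100
memory_gb = N * d * 8 / 2**30
print(f"memory: {memory_gb:.2f} GB")          # ~0.75 GB, i.e. about 1 GB

# Time: at ~3 ms per query, leave-one-out CV makes one query per data point.
secs_per_query = 0.003
cv_hours = N * secs_per_query / 3600
print(f"one CV error: {cv_hours:.2f} hours")  # about 1 hour

# Naively trying 1000 values of k repeats the whole CV computation 1000 times.
choose_k_days = 1000 * cv_hours / 24
print(f"choosing k: {choose_k_days:.0f} days")
```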
Two Basic Approaches
Reduce the amount of data.
The five-year-old does not remember every horse he has seen, only a few representative horses.
Store the data in a specialized data structure.
Ongoing research field to develop geometric data structures to make finding nearest neighbors fast.
Throw Away Irrelevant Data
[Figure: the data set is condensed to a few representative points (k = 1).]
Decision Boundary Consistent
[Figure: a condensed set that leaves the decision boundary intact: g(x) unchanged for every x.]
Training Set Consistent
[Figure: a condensed set that preserves the classifications of the training points: g(xn) unchanged for every xn.]
Decision Boundary Vs. Training Set Consistent
[Figure: decision-boundary (DB) consistent condensing keeps g(x) unchanged everywhere, versus training-set (TS) consistent condensing, which only keeps g(xn) unchanged on the data points.]
Consistent Does Not Mean g(xn) = yn
[Figure: DB- and TS-consistent condensed sets for k = 3; consistency means reproducing the original g, not fitting every yn.]
Training Set Consistent (k = 3)
[Figure: a training-set-consistent condensed set for k = 3: g(xn) unchanged.]
CNN: Condensed Nearest Neighbor (k = 3)
[Figure: the solid blue point is inconsistent — blue w.r.t. the selected set S, red w.r.t. D; the nearest unselected red point is added to S.]

Consider the solid blue point:
i. blue w.r.t. the selected points
ii. red w.r.t. D

Add a red point that is:
i. not already selected
ii. closest to the inconsistent point

1. Randomly select k data points into S.
2. Classify all the data according to S.
3. Let x∗ be an inconsistent point and y∗ its class w.r.t. D.
4. Add to S the closest point to x∗ not in S that has class y∗.
5. Iterate until S classifies all points consistently with D.
Finding the minimum consistent set (MCS) is NP-hard; CNN is a greedy heuristic.
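The condensing loop can be sketched for the 1-NN case (toy data made up; note that for 1-NN the closest point of class y∗ to x∗ is x∗ itself, which recovers Hart's original CNN step):

```python
import math

def nn_classify(S_idx, X, y, p):
    """1-NN label of point p, using only the points indexed by S_idx."""
    j = min(S_idx, key=lambda i: math.dist(X[i], p))
    return y[j]

def cnn_condense(X, y):
    """Greedy condensing: grow S until 1-NN on S agrees with D on every point."""
    S = [0]                                      # seed with one point, for determinism
    while True:
        bad = next((n for n in range(len(X))
                    if nn_classify(S, X, y, X[n]) != y[n]), None)
        if bad is None:                          # S is training-set consistent
            return [X[i] for i in S], [y[i] for i in S]
        # add the closest point to x* (not in S) whose class w.r.t. D is y*
        cand = [i for i in range(len(X)) if i not in S and y[i] == y[bad]]
        S.append(min(cand, key=lambda i: math.dist(X[i], X[bad])))

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [+1, +1, +1, -1, -1, -1]
Xs, ys = cnn_condense(X, y)
print(len(Xs), "of", len(X), "points kept")   # this toy set condenses to 2 points
```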
Nearest Neighbor on Digits Data
[Figure: decision regions of the 1-NN rule and the 21-NN rule on the digits data.]
Condensing the Digits Data
[Figure: the 1-NN and 21-NN rules after condensing the digits data.]
Finding the Nearest Neighbor
1. S1, S2 are ‘clusters’ with centers µ1, µ2 and radii r1, r2.
2. [Branch] Search S1 first → x̂[1].
3. The distance from x to any point in S2 is at least ||x − µ2|| − r2.
4. [Bound] So we are done if ||x − x̂[1]|| ≤ ||x − µ2|| − r2.

A branch and bound algorithm; can be applied recursively.

[Figure: query point x and two clusters S1, S2.]
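Steps 1–4 translate into a few lines of code (a sketch with a hypothetical cluster layout; each cluster is stored as a (center, radius, points) tuple):

```python
import math

def nn_branch_and_bound(clusters, x):
    """Nearest neighbor of x over data partitioned into (mu, r, points) clusters.
    Search the closest-center cluster first [branch]; skip any cluster whose
    lower bound ||x - mu|| - r cannot beat the best distance so far [bound]."""
    order = sorted(clusters, key=lambda c: math.dist(x, c[0]))
    best, best_d = None, float("inf")
    for mu, r, points in order:
        if math.dist(x, mu) - r >= best_d:   # bound: nothing in here can be closer
            continue
        for p in points:                     # branch: search this cluster
            d = math.dist(x, p)
            if d < best_d:
                best, best_d = p, d
    return best

S1 = ((0.0, 0.0), 1.5, [(0, 0), (1, 0), (0, 1)])
S2 = ((10.0, 10.0), 1.5, [(10, 10), (11, 10), (10, 11)])
print(nn_branch_and_bound([S1, S2], (0.5, 0.5)))   # -> (0, 0); S2 is never searched
```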
When Does the Bound Hold?
Bound condition: ||x − x̂[1]|| ≤ ||x − µ2|| − r2.

Since ||x − x̂[1]|| ≤ ||x − µ1|| + r1, it suffices that

r1 + r2 ≤ ||x − µ2|| − ||x − µ1||.

||x − µ1|| ≈ 0 means ||x − µ2|| ≈ ||µ2 − µ1||, so it suffices that

r1 + r2 ≤ ||µ2 − µ1||.

[Figure: query point x inside S1, with S2 well separated.]

The within-cluster spread should be less than the between-cluster spread.
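The chain of inequalities can be sanity-checked numerically (the centers, radii, and query distribution below are made up for the check):

```python
import math
import random

# If r1 + r2 <= ||x - mu2|| - ||x - mu1||, then for ANY candidate xhat in S1
# the triangle inequality gives ||x - xhat|| <= ||x - mu1|| + r1 <= ||x - mu2|| - r2,
# i.e. the branch-and-bound test passes.
random.seed(1)
mu1, r1 = (0.0, 0.0), 2.0
mu2, r2 = (10.0, 0.0), 3.0

def rand_in_disk(mu, r):
    a, s = random.uniform(0, 2 * math.pi), r * math.sqrt(random.random())
    return (mu[0] + s * math.cos(a), mu[1] + s * math.sin(a))

for _ in range(1000):
    x = rand_in_disk(mu1, 1.0)       # query point near mu1
    xhat = rand_in_disk(mu1, r1)     # arbitrary candidate nearest neighbor in S1
    assert math.dist(x, xhat) <= math.dist(x, mu1) + r1 + 1e-9
    if r1 + r2 <= math.dist(x, mu2) - math.dist(x, mu1):
        assert math.dist(x, xhat) <= math.dist(x, mu2) - r2 + 1e-9
print("sufficient condition implies the bound on 1000 random draws")
```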
Finding Clusters – Lloyd’s Algorithm
1. Pick well separated centers for each cluster.
2. Compute the Voronoi regions as the clusters.
3. Update the centers.
4. Update the Voronoi regions.
5. Compute centers and radii:

µj = (1/|Sj|) ∑_{xn ∈ Sj} xn;   rj = max_{xn ∈ Sj} ||xn − µj||.
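Steps 1–5 can be sketched in plain Python (the toy data and M = 2 are made up, and this sketch seeds the centers at random rather than by the farthest-point heuristic of step 1):

```python
import math
import random

def lloyd(X, M, iters=20, seed=0):
    """Lloyd's algorithm sketch: alternate assigning points to the nearest
    center (Voronoi regions) and moving each center to its cluster mean."""
    rng = random.Random(seed)
    centers = rng.sample(X, M)
    for _ in range(iters):
        # Voronoi step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(M)]
        for p in X:
            m = min(range(M), key=lambda j: math.dist(p, centers[j]))
            clusters[m].append(p)
        # Update step: centers become cluster means mu_j (keep old center if empty).
        centers = [tuple(sum(c) / len(S) for c in zip(*S)) if S else centers[j]
                   for j, S in enumerate(clusters)]
    # Radii: r_j = max distance from a cluster point to its center.
    radii = [max((math.dist(p, centers[j]) for p in S), default=0.0)
             for j, S in enumerate(clusters)]
    return centers, radii, clusters

X = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, radii, clusters = lloyd(X, 2)
print([tuple(round(c, 2) for c in mu) for mu in centers])
```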
Radial Basis Functions (RBF)
k-Nearest Neighbor: only considers the k nearest neighbors; each neighbor has equal weight.

What about using all the data to compute g(x)?

RBF: use all the data; data further away from x get less weight.
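A sketch of that idea using a Gaussian weight — the specific kernel and the scale r are illustrative choices, ahead of the details in the next lecture:

```python
import math

def rbf_predict(X, y, x, r=1.0):
    """Every data point votes on g(x), weighted by exp(-||x - xn||^2 / (2 r^2)):
    points farther from x contribute less."""
    w = [math.exp(-math.dist(x, xn) ** 2 / (2 * r * r)) for xn in X]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

# Toy data: two clusters labeled +1 / -1; the sign of g(x) gives the class.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = [+1, +1, +1, -1, -1, -1]
print(rbf_predict(X, y, (0.5, 0.5)))   # positive: dominated by the nearby +1 points
```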