![Page 1: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/1.jpg)
NEAREST NEIGHBOR CLASSIFICATION
PRESENTED BY
Zulhanif
![Page 2: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/2.jpg)
NEAREST NEIGHBOR?
• A classification method based on the k nearest distances
• One of the top 10 data mining algorithms
• The k-NN method is simple, yet it is still a current method for classification
![Page 3: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/3.jpg)
REVIEW
![Page 4: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/4.jpg)
REVIEW
![Page 5: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/5.jpg)
THE CONCEPT OF DISTANCE
Data types and their distance measures:
• Binary data: Jaccard, pattern
• Interval data: Euclidean, Manhattan (block)
• Qualitative data: Chi-square
Distance properties:
• d(a, b) ≥ 0
• d(a, a) = 0
• d(a, b) = d(b, a)
• d(a, b) increases as the two objects become less similar
• d(a, c) ≤ d(a, b) + d(b, c)
![Page 6: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/6.jpg)
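EUCLIDEAN DISTANCE
Figure: points A $(x_1, y_1)$ and B $(x_2, y_2)$ in the X-Y plane, with horizontal leg $x_2 - x_1$ and vertical leg $y_2 - y_1$.
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$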
![Page 7: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/7.jpg)
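EUCLIDEAN DISTANCE
Figure: points A $(1, 2)$ and B $(3, 5)$ in the X-Y plane, with horizontal leg $3 - 1$ and vertical leg $5 - 2$.
$$d = \sqrt{(3 - 1)^2 + (5 - 2)^2} = \sqrt{13} \approx 3.61$$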
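The worked example above can be checked with a few lines of Python. This is a minimal sketch; the function name `euclidean` is chosen here for illustration and does not come from the slides.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points given as (x, y) pairs."""
    return math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)

A = (1, 2)  # point A from the slide
B = (3, 5)  # point B from the slide
d = euclidean(A, B)
print(round(d, 2))  # 3.61
```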
![Page 8: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/8.jpg)
SEVERAL DISTANCE MEASURES
![Page 9: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/9.jpg)
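MANHATTAN
$$d_{ij} = \sum_{k} \left| x_{ik} - x_{jk} \right|$$
This is the $\lambda = 1$ case of the Minkowski distance $d_{ij} = \left( \sum_{k} \left| x_{ik} - x_{jk} \right|^{\lambda} \right)^{1/\lambda}$.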
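As a rough illustration, the Manhattan (block) distance sums the absolute differences over all attributes. The sketch below reuses the points (1, 2) and (3, 5) from the Euclidean example; the function name `manhattan` is an assumption made for this example.

```python
def manhattan(x_i, x_j):
    """Sum of absolute attribute differences between two records."""
    return sum(abs(a - b) for a, b in zip(x_i, x_j))

# |1 - 3| + |2 - 5| = 2 + 3
print(manhattan((1, 2), (3, 5)))  # 5
```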
![Page 10: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/10.jpg)
![Page 11: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/11.jpg)
![Page 12: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/12.jpg)
CHEBYSHEV DISTANCE
$$d_{ij} = \max_{k} \left| x_{ik} - x_{jk} \right|$$
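The Chebyshev distance keeps only the largest absolute difference across attributes. A minimal sketch, again using the points (1, 2) and (3, 5) from the Euclidean example and a function name chosen here for illustration:

```python
def chebyshev(x_i, x_j):
    """Largest absolute attribute difference between two records."""
    return max(abs(a - b) for a, b in zip(x_i, x_j))

# max(|1 - 3|, |2 - 5|) = max(2, 3)
print(chebyshev((1, 2), (3, 5)))  # 3
```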
![Page 13: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/13.jpg)
![Page 14: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/14.jpg)
![Page 15: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/15.jpg)
NEAREST NEIGHBOR CLASSIFIERS
• Basic idea:
  • If an observed object walks like a duck, sounds like a duck, and swims like a duck, then it is probably a duck
Figure: compute the distance from the test record to the training records, then choose the k “nearest” records.
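The two steps in the figure (compute distances, then choose the k nearest records) can be sketched as follows. This is an illustrative sketch, not the presenter's implementation: the function `knn_predict`, the majority-vote rule, and the toy records are assumptions made for this example, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Predict a label for `query` by majority vote among its k nearest records.

    train: list of (features, label) pairs; query: tuple of features.
    """
    # Step 1: order the training records by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda rec: math.dist(rec[0], query))
    # Step 2: keep the k nearest records and count their labels.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy records, not taken from the slides.
train = [((1.0, 1.0), "duck"), ((1.5, 2.0), "duck"), ((2.0, 1.0), "duck"),
         ((6.0, 6.0), "goose"), ((7.0, 5.5), "goose")]

print(knn_predict(train, (1.2, 1.4), k=3))  # duck
```

Ties in the vote are resolved here by whichever label was counted first; other tie-breaking rules are equally valid.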
![Page 16: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/16.jpg)
PREDICT CLASSIFIERS
• Store the training records
• Use training records to predict the class label of unseen cases
Figure: a set of stored cases with attributes Atr1 … AtrN and a Class column (labels A, B, and C), next to an unseen case with attributes Atr1 … AtrN and no class label.
![Page 17: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/17.jpg)
DEFINITION OF NEAREST NEIGHBOR
Figure panels for a record X: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor
![Page 18: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/18.jpg)
NEAREST NEIGHBOR CLASSIFICATION… ISSUES
• Scaling issues
  • Attributes should be rescaled so that no single attribute dominates the distance.
  • Example:
    • Height varies from 1.5 m to 1.8 m
    • Weight varies from 50 kg to 80 kg
    • Income varies from 3 million to 6 million
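To see why scaling matters, the sketch below applies min-max scaling using the attribute ranges from the example. Min-max scaling is one common choice and is an assumption here, since the slide does not name a specific method; the three people `p`, `q`, and `r` are hypothetical.

```python
import math

# Attribute ranges from the example: height (m), weight (kg), income.
RANGES = [(1.5, 1.8), (50.0, 80.0), (3_000_000.0, 6_000_000.0)]

def min_max_scale(record, ranges=RANGES):
    """Map each attribute to [0, 1] using its (min, max) range."""
    return tuple((v - lo) / (hi - lo) for v, (lo, hi) in zip(record, ranges))

# Hypothetical people: (height, weight, income)
p = (1.5, 50.0, 4_000_000.0)
q = (1.8, 80.0, 4_000_100.0)  # very different build, almost the same income
r = (1.5, 50.0, 4_100_000.0)  # same build, somewhat different income

raw_pq, raw_pr = math.dist(p, q), math.dist(p, r)
scaled_pq = math.dist(min_max_scale(p), min_max_scale(q))
scaled_pr = math.dist(min_max_scale(p), min_max_scale(r))

print(raw_pq < raw_pr)        # True: unscaled, income dominates the distance
print(scaled_pq > scaled_pr)  # True: after scaling, height and weight count
```

Without scaling, `q` looks closer to `p` than `r` does, purely because income is measured in much larger units.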
![Page 19: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/19.jpg)
NEAREST NEIGHBOR CLASSIFICATION…
• Choosing the value of k:
  • If k is too small, the classifier is sensitive to noise points
  • If k is too large, the neighborhood may include points from other classes
![Page 20: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/20.jpg)
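| X1 | X2 | Y |
|---|---|---|
| 1 | 3 | + |
| 2 | 1 | + |
| 6 | 1 | + |
| 7 | 3 | + |
| 7 | 6 | + |
| 5 | 1 | + |
| 1 | 5 | + |
| 6 | 5 | + |
| 5 | 5 | + |
| 2 | 7 | + |
| 10 | 9 | - |
| 4 | 8 | - |
| 6 | 4 | - |
| 6 | 9 | - |
| 9 | 7 | - |
| 9 | 8 | - |
| 7 | 4 | - |
| 4 | 9 | - |
| 7 | 5 | - |
| 7 | 10 | - |
| 8 | 9 | - |
| 7 | 7 | - |
| 4 | 5 | - |
| 6 | 10 | - |
| 5 | 5 | ? |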
![Page 21: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/21.jpg)
Distance (squared Euclidean) from the unlabeled record (5, 5) to the first record (1, 3): $(5 - 1)^2 + (3 - 5)^2 = 20$
![Page 22: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/22.jpg)
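| X1 | X2 | Y | Distance |
|---|---|---|---|
| 1 | 3 | + | 20 |
| 2 | 1 | + | 25 |
| 6 | 1 | + | 17 |
| 7 | 3 | + | 8 |
| 7 | 6 | + | 5 |
| 5 | 1 | + | 16 |
| 1 | 5 | + | 16 |
| 6 | 5 | + | 1 |
| 5 | 5 | + | 0 |
| 2 | 7 | + | 13 |
| 10 | 9 | - | 41 |
| 4 | 8 | - | 10 |
| 6 | 4 | - | 2 |
| 6 | 9 | - | 17 |
| 9 | 7 | - | 20 |
| 9 | 8 | - | 25 |
| 7 | 4 | - | 5 |
| 4 | 9 | - | 17 |
| 7 | 5 | - | 4 |
| 7 | 10 | - | 29 |
| 8 | 9 | - | 25 |
| 7 | 7 | - | 8 |
| 4 | 5 | - | 1 |
| 6 | 10 | - | 26 |
| 5 | 5 | ? | |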
![Page 23: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/23.jpg)
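| X1 | X2 | Y | Distance | Nearest Neighbor sign |
|---|---|---|---|---|
| 1 | 3 | + | 20 | |
| 2 | 1 | + | 25 | |
| 6 | 1 | + | 17 | |
| 7 | 3 | + | 8 | + |
| 7 | 6 | + | 5 | + |
| 5 | 1 | + | 16 | |
| 1 | 5 | + | 16 | |
| 6 | 5 | + | 1 | + |
| 5 | 5 | + | 0 | + |
| 2 | 7 | + | 13 | |
| 10 | 9 | - | 41 | |
| 4 | 8 | - | 10 | |
| 6 | 4 | - | 2 | - |
| 6 | 9 | - | 17 | |
| 9 | 7 | - | 20 | |
| 9 | 8 | - | 25 | |
| 7 | 4 | - | 5 | - |
| 4 | 9 | - | 17 | |
| 7 | 5 | - | 4 | - |
| 7 | 10 | - | 29 | |
| 8 | 9 | - | 25 | |
| 7 | 7 | - | 8 | - |
| 4 | 5 | - | 1 | - |
| 6 | 10 | - | 26 | |
| 5 | 5 | ? | | |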
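The distance column and the neighbor signs can be reproduced with a short script. This is a sketch under stated assumptions: the training records are transcribed from the table on this slide, the unlabeled record is (5, 5), distances are squared Euclidean, and `K = 9` is an assumption, because the slide does not state k (nine is the number of records that appear to be flagged as nearest neighbors).

```python
from collections import Counter

# Training records (X1, X2, Y) transcribed from the slide's table.
train = [
    (1, 3, "+"), (2, 1, "+"), (6, 1, "+"), (7, 3, "+"), (7, 6, "+"),
    (5, 1, "+"), (1, 5, "+"), (6, 5, "+"), (5, 5, "+"), (2, 7, "+"),
    (10, 9, "-"), (4, 8, "-"), (6, 4, "-"), (6, 9, "-"), (9, 7, "-"),
    (9, 8, "-"), (7, 4, "-"), (4, 9, "-"), (7, 5, "-"), (7, 10, "-"),
    (8, 9, "-"), (7, 7, "-"), (4, 5, "-"), (6, 10, "-"),
]
query = (5, 5)  # the record marked "?"
K = 9           # assumption: not stated on the slide

def sq_dist(x1, x2, q):
    """Squared Euclidean distance from record (x1, x2) to the query q."""
    return (q[0] - x1) ** 2 + (q[1] - x2) ** 2

distances = [sq_dist(x1, x2, query) for x1, x2, _ in train]
labels = [y for _, _, y in train]

# Rank records by distance and let the K nearest vote.
ranked = sorted(zip(distances, labels), key=lambda pair: pair[0])
votes = Counter(label for _, label in ranked[:K])

print(distances[:5])               # [20, 25, 17, 8, 5]
print(votes["+"], votes["-"])      # 4 5
print(votes.most_common(1)[0][0])  # -
```

Under these assumptions the vote is 4 positive to 5 negative, so the record (5, 5) would be labeled negative.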
![Page 24: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/24.jpg)
Figure: scatter plot of the positive and negative records, both axes running from 0 to 12, with the unlabeled record marked "?".
![Page 25: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/25.jpg)
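EXERCISE

| X1 | X2 | Y |
|---|---|---|
| 2 | 3 | + |
| 1 | 1 | + |
| 5 | 1 | + |
| 7 | 3 | + |
| 7 | 6 | + |
| 3 | 1 | + |
| 1 | 5 | + |
| 6 | 5 | + |
| 5 | 5 | + |
| 2 | 7 | + |
| 4 | 9 | - |
| 6 | 8 | - |
| 5 | 4 | - |
| 5 | 9 | - |
| 5 | 7 | - |
| 7 | 8 | - |
| 7 | 4 | - |
| 4 | 9 | - |
| 7 | 5 | - |
| 7 | 10 | - |
| 8 | 9 | - |
| 7 | 7 | - |
| 4 | 5 | - |
| 6 | 10 | - |
| 6 | 5 | ? |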
![Page 26: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/26.jpg)
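| X1 | X2 | Y | Distance | Nearest Neighbor sign |
|---|---|---|---|---|
| 2 | 3 | + | 20 | |
| 1 | 1 | + | 41 | |
| 5 | 1 | + | 17 | |
| 7 | 3 | + | 5 | + |
| 7 | 6 | + | 2 | + |
| 3 | 1 | + | 25 | |
| 1 | 5 | + | 25 | |
| 6 | 5 | + | 0 | + |
| 5 | 5 | + | 1 | + |
| 2 | 7 | + | 20 | |
| 4 | 9 | - | 20 | |
| 6 | 8 | - | 9 | |
| 5 | 4 | - | 2 | - |
| 5 | 9 | - | 17 | |
| 5 | 7 | - | 5 | - |
| 7 | 8 | - | 10 | |
| 7 | 4 | - | 2 | - |
| 4 | 9 | - | 20 | |
| 7 | 5 | - | 1 | - |
| 7 | 10 | - | 26 | |
| 8 | 9 | - | 20 | |
| 7 | 7 | - | 5 | - |
| 4 | 5 | - | 4 | - |
| 6 | 10 | - | 25 | |
| 6 | 5 | ? | | |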
![Page 27: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/27.jpg)
V-FOLD CROSS-VALIDATION
• V-fold cross-validation splits the data into V folds. Choose a value of k for the nearest neighbor model, then predict the v-th fold (using the other V−1 folds as training data) and evaluate the error. Repeat this successively for every possible v. After all V folds have been processed, compute the average error. Repeat these steps for other values of k, and select the k with the smallest average error.
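The procedure described above can be sketched in plain Python. Everything named here (`knn_predict`, `cv_error`, `choose_k`, the interleaved fold assignment, and the two-cluster toy data) is an assumption made for illustration; in practice the records are usually shuffled before they are split into folds.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training records nearest to `query`."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_error(data, k, n_folds):
    """Average misclassification rate of k-NN over n_folds folds."""
    folds = [data[i::n_folds] for i in range(n_folds)]  # interleaved split
    errors = []
    for v in range(n_folds):
        test = folds[v]
        # Train on the other n_folds - 1 folds.
        train = [rec for j, fold in enumerate(folds) if j != v for rec in fold]
        wrong = sum(knn_predict(train, x, k) != y for x, y in test)
        errors.append(wrong / len(test))
    return sum(errors) / n_folds

def choose_k(data, candidates, n_folds):
    """Return the candidate k with the smallest average CV error."""
    return min(candidates, key=lambda k: cv_error(data, k, n_folds))

# Hypothetical data: two well-separated clusters of 12 records each.
data = ([((float(i % 4), float(i // 4)), "+") for i in range(12)]
        + [((10.0 + i % 4, 10.0 + i // 4), "-") for i in range(12)])

best_k = choose_k(data, candidates=[1, 3, 5, 7], n_folds=4)
print(best_k, cv_error(data, best_k, n_folds=4))
```

When several values of k reach the same error, `min` returns the first candidate in the list, so the order of `candidates` acts as the tie-break.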
![Page 28: NEAREST NEIGHBOR CLASSIFICATION · KONSEP JARAK(DISTANCE) Tipe data Binary Data Interval Data Euclidean, Manhattan (Block) Chi-square Jaccard, pattern Qualitativ Sifatjarak: •d(a,](https://reader036.vdocuments.us/reader036/viewer/2022081518/60753abc68b2922f8b3401d1/html5/thumbnails/28.jpg)
N=40 FOLD=4
Tr Test Test Test Test Test Test Test Test Test
Test Tr Test Test Test Test Test Test Test Test
Test Test Tr Test Test Test Test Test Test Test
Test Test Test Tr Test Test Test Test Test Test
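For N = 40 and 4 folds, each fold holds 10 records. The sketch below shows one way to form the splits, assuming the convention from the cross-validation description, where each fold is held out once for testing while the other V−1 folds are used for training. Contiguous, unshuffled folds are an assumption made to keep the example short.

```python
N = 40
N_FOLDS = 4
fold_size = N // N_FOLDS  # 10 records per fold

indices = list(range(N))
folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(N_FOLDS)]

for v, test_idx in enumerate(folds):
    # Every record outside the held-out fold is used for training.
    train_idx = [i for i in indices if i not in test_idx]
    print(f"round {v + 1}: test records {test_idx[0]}..{test_idx[-1]}, "
          f"{len(train_idx)} training records")
```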