
Page 1: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification

Nearest Neighbor Classifier

1. K-NN Classifier
2. Multi-Class Classification

Page 2: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification

k Nearest Neighbor Classification

kNN = k Nearest Neighbor. Basic steps used to classify a document d into class c:

1. Define the k-neighborhood N as the k nearest neighbors of d.
2. Count the number of documents i in N that belong to c.
3. Estimate P(c|d) as i/k.
4. Choose as class arg max_c P(c|d) [= majority class].

Figure: KNN (K=6)
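A minimal Python sketch of these four steps (illustrative only; the Euclidean distance on dense feature vectors and all identifier names are assumptions, not part of the slides):

from collections import Counter

def knn_classify(d, training_set, k):
    """Classify document d given training_set = [(vector, class), ...]."""
    # 1. Define the k-neighborhood N as the k nearest neighbors of d
    #    (here: smallest squared Euclidean distance between feature vectors).
    neighbors = sorted(
        training_set,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], d))
    )[:k]
    # 2./3. Count the neighbors i in each class c and estimate P(c|d) as i/k.
    counts = Counter(c for _, c in neighbors)
    p = {c: i / k for c, i in counts.items()}
    # 4. Choose the class with the highest estimate, i.e. the majority class.
    return max(p, key=p.get)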

Page 3: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification

KNN Important Points

K-NN Definition: Consider the two-class problem and let $K$ be an odd number. Denote by $X_n^i$ the $i$-th nearest neighbor of $X$ in a training set of size $n$, and let $Y_n^i \in \{0, 1\}$ be the corresponding label.

K-NN Rule

1. Decide 0 if $\sum_{i=1}^{K} Y_n^i < \frac{K}{2}$, and

2. Decide 1 if $\sum_{i=1}^{K} Y_n^i > \frac{K}{2}$.

Hence, the K-NN rule finds the $K$ nearest neighbors of $X$ and uses the majority vote of their labels to assign a label to $X$.
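For illustration (an example added here, not from the original slide): with $K = 5$ and neighbor labels $Y_n^1, \ldots, Y_n^5 = 1, 0, 1, 1, 0$, the sum is $3 > K/2 = 2.5$, so the rule decides class 1. Because $K$ is odd, the sum can never equal $K/2$, so ties cannot occur.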

Table: kNN training (with preprocessing) and testing

TRAIN-KNN(C, D)
1. D' ← Preprocess(D)
2. k ← Select-k(C, D')
3. return D', k

APPLY-KNN(C, D', k, d)
1. S_k ← ComputeNearestNeighbors(D', k, d)
2. for each c_j ∈ C
3. do p_j ← |S_k ∩ c_j| / k
4. return arg max_j p_j

Note: (1) Here, p_j is an estimate for P(c_j | S_k), which is used as the estimate for P(c_j | d), and (2) c_j also denotes the set of all documents in the class c_j.
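A rough Python rendering of this procedure (a sketch under assumptions: documents are already numeric vectors in a NumPy array, cosine similarity is used for the neighbor search, and the trivial preprocess/select-k defaults are placeholders rather than part of the original table):

import numpy as np

def train_knn(C, D, preprocess=lambda D: D, select_k=lambda C, D: 3):
    """TRAIN-KNN: preprocess the training documents and choose k.
    C: list of class labels (one per document); D: 2-D array of document vectors."""
    D_prime = preprocess(D)
    k = select_k(C, D_prime)
    return D_prime, k

def apply_knn(C, D_prime, k, d):
    """APPLY-KNN: classify test document d by the majority class of its k
    nearest neighbors (cosine similarity on the document vectors)."""
    sims = D_prime @ d / (np.linalg.norm(D_prime, axis=1) * np.linalg.norm(d) + 1e-12)
    S_k = np.argsort(-sims)[:k]                 # indices of the k nearest neighbors
    classes = sorted(set(C))
    p = {c: sum(1 for i in S_k if C[i] == c) / k for c in classes}   # p_j = |S_k ∩ c_j| / k
    return max(p, key=p.get)                    # arg max_j p_j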

Page 4: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification

KNN Important Points

Some important points:

1. In the kNN classifier, the nearest neighbors determine the decision boundary locally. For 1NN we assign each document to the class of its closest neighbor. For kNN we assign each document to the majority class of its k closest neighbors, where k is a parameter.

2. 1NN is not very robust. The classification decision of each test document relies on the class of a single training document, which may be incorrectly labeled or atypical. kNN for k > 1 is more robust. It assigns documents to the majority class of their k closest neighbors, with ties broken randomly.

3. A more robust alternative is to find the k most similar examples and return the majority category of these k examples. The value of k is typically odd to avoid ties; 3 and 5 are the most common choices.

4. Also called: (1) Case-based learning, (2) Memory-based learning and (3) Lazy learning

Figure: Voronoi tessellation and decision boundaries (double lines) in 1NN

Page 5: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification


kNN decision boundaries

Government

Science

Arts

Boundaries are in principle arbitrary surfaces – but usually polyhedra

kNN gives locally defined decision boundaries between classes – far-away points do not influence each classification decision (unlike in Naïve Bayes, Rocchio, etc.)

Sec.14.3

Page 6: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification


Similarity Metrics and Complexity

• Nearest neighbor method depends on a similarity (or distance) metric.

• Simplest for continuous m-dimensional instance space is Euclidean distance.

• Simplest for m-dimensional binary instance space is Hamming distance (number of feature values that differ).

• For text, cosine similarity of tf.idf weighted vectors is typically most effective.

• Testing time: O(B|Vt|), where |Vt| is the number of distinct terms in the test document and B is the average number of training documents in which a test-document word appears. Typically B << |D|.
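A minimal sketch of these three measures with NumPy (the tf-idf weighting of the vectors is assumed to have been done elsewhere):

import numpy as np

def euclidean_distance(x, y):
    # Distance in a continuous m-dimensional instance space.
    return float(np.linalg.norm(x - y))

def hamming_distance(x, y):
    # Number of feature values that differ in a binary instance space.
    return int(np.sum(x != y))

def cosine_similarity(x, y):
    # Typically applied to tf-idf weighted document vectors.
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))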

Sec.14.3

Page 7: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification


Illustration of 3 Nearest Neighbor for Text Vector Space

Sec.14.3

Page 8: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification


3 Nearest Neighbor vs. Rocchio

• Nearest Neighbor tends to handle polymorphic categories better than Rocchio/NB.

Page 9: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification


kNN: Discussion

• No feature selection necessary
• Scales well with large number of classes
  – Don't need to train n classifiers for n classes
• Classes can influence each other
  – Small changes to one class can have ripple effect
• Scores can be hard to convert to probabilities
• No training necessary
  – Actually: perhaps not true. (Data editing, etc.)
• May be expensive at test time
• In most cases it's more accurate than NB or Rocchio
• Bias/variance tradeoff
  – Variance ≈ capacity
• kNN has high variance and low bias.
  – Infinite memory
• NB has low variance and high bias.
  – Decision surface has to be linear.

Sec.14.3

Page 10: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification

Types of Classifiers

Binary and Multi-Class Classification Problem: Given a training dataset of the form $(x_i, y_i)$, where $x_i \in \mathbb{R}^n$ is the $i$-th example and $y_i \in \{1, 2, \ldots, k\}$ is the $i$-th class label, we aim at finding a learning model $H$ such that $H(x_i) = y_i$ for new, unseen examples.

Case 1: When the labels $y_i$ take only the values +1 or -1, it is called a two-class (binary) classification problem.

Case 2: When $y_i$ can take more than two values, it is called a multi-class classification problem.

Three approaches applied:

1. The first category of algorithms includes decision trees, neural networks, k-Nearest Neighbor, Naive Bayes classifiers, and Support Vector Machines.

2. The second category includes approaches for converting the multi-class classification problem into a set of binary classification problems that are efficiently solved using binary classifiers, e.g. Support Vector Machines (a minimal sketch of this reduction follows the figure below).

3. The third approach imposes a hierarchy on the output space (the available classes) and performs a series of tests to detect the class label of new patterns.

Figure: Example tree for 5-class problem
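As an illustration of the second approach, here is a minimal one-vs-rest sketch in Python; using scikit-learn's LogisticRegression as the underlying binary learner is an assumption for illustration, not something prescribed by the slides:

import numpy as np
from sklearn.linear_model import LogisticRegression

def train_one_vs_rest(X, y):
    """Train one binary classifier per class: class c vs. all other classes."""
    models = {}
    for c in np.unique(y):
        binary_labels = (y == c).astype(int)   # 1 for class c, 0 for the rest
        models[c] = LogisticRegression().fit(X, binary_labels)
    return models

def predict_one_vs_rest(models, X):
    """Assign each example to the class whose binary classifier is most confident."""
    classes = list(models)
    scores = np.column_stack([models[c].decision_function(X) for c in classes])
    return np.array([classes[i] for i in scores.argmax(axis=1)])

Any binary learner with a confidence score could stand in for the logistic regression here; the point is only that k binary problems replace one k-class problem.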

Page 11: Nearest Neighbor Classifier 1.K-NN Classifier 2.Multi-Class Classification

References

• Aly M. Survey on Multiclass Classification Methods. Technical report, California Institute of Technology, 2005.

• Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval. MIT Press, 2010.

• Andoni, Alexandr, Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab Mirrokni. 2006. Locality-sensitive hashing using stable distributions. In Nearest Neighbor Methods in Learning and Vision: Theory and Practice. MIT Press.