Examples of classification methods CSIT5210


Page 1

Examples of classification methods

CSIT5210

Page 2

Content

• KNN
• Decision Tree
• Naïve Bayesian
• Bayesian Belief Network
• Naïve Neural Network
• Multilayer Neural Network
• SVM

Page 3

KNN
• Question: Assignment 1 Q1
• Solution:

1) Understand the distance function: the number of attributes on which two tuples differ.

The distance between tuple 2 and tuple 3:

one attribute is the same and three attributes are different

Dist(2,3) = |{Height(low!=med), Weight(med!=high), BloodPressure(med!=high)}| = 3
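For reference, this distance can be written as a tiny function. A minimal sketch in Python; the attribute names follow the slide, and "OtherAttr" is a placeholder for the fourth attribute, since the full assignment table is not reproduced here.

```python
# Hamming-style distance on categorical attributes: count how many attributes differ.
def dist(t1, t2):
    return sum(1 for attr in t1 if t1[attr] != t2[attr])

# Illustrative tuples; "OtherAttr" stands in for the fourth attribute of the
# assignment table, the one that matches between tuples 2 and 3.
tuple2 = {"Height": "low", "Weight": "med", "BloodPressure": "med", "OtherAttr": "same"}
tuple3 = {"Height": "med", "Weight": "high", "BloodPressure": "high", "OtherAttr": "same"}

print(dist(tuple2, tuple3))  # -> 3, matching Dist(2,3) above
```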

Page 4

KNN
2) Calculate the distance table (rows 11-20 are the testing data; columns 1-10 are the training data):

Dist   1  2  3  4  5  6  7  8  9  10
 11    4  3  2  2  3  3  2  3  2   2
 12    1  4  3  3  3  3  2  3  2   4
 13    4  4  2  2  4  3  2  4  4   1
 14    4  3  2  2  4  3  3  3  3   1
 15    2  2  4  2  4  3  3  3  4   3
 16    4  3  3  3  4  3  3  3  4   2
 17    0  4  4  3  3  2  2  4  3   4
 18    2  4  3  2  3  2  2  3  3   3
 19    3  3  3  2  2  3  2  3  3   3
 20    2  4  3  4  3  1  4  4  3   3

Page 5

KNN
3) For k = 1, find the nearest neighbor (choosing the smaller id in ties), and compare the actual and predicted labels.

Dist      1(N) 2(Y) 3(Y) 4(Y) 5(N) 6(N) 7(N) 8(Y) 9(N) 10(Y)
11(Y==Y)   4    3    2    2    3    3    2    3    2    2
12(N==N)   1    4    3    3    3    3    2    3    2    4
13(Y==Y)   4    4    2    2    4    3    2    4    4    1
14(Y==Y)   4    3    2    2    4    3    3    3    3    1
15(N==N)   2    2    4    2    4    3    3    3    4    3
16(N!=Y)   4    3    3    3    4    3    3    3    4    2
17(N==N)   0    4    4    3    3    2    2    4    3    4
18(Y!=N)   2    4    3    2    3    2    2    3    3    3
19(Y==Y)   3    3    3    2    2    3    2    3    3    3
20(N==N)   2    4    3    4    3    1    4    4    3    3

There are 2 errors among the 10 test tuples (11-20), so the error rate is 2/10 = 0.2.

We choose the neighbor with the smaller id to break ties.

Page 6

KNN
4) For k = 3, repeat the above procedure and use the majority-vote rule to get the prediction result.

Dist      1(N) 2(Y) 3(Y) 4(Y) 5(N) 6(N) 7(N) 8(Y) 9(N) 10(Y)
11(Y==Y)   4    3    2    2    3    3    2    3    2    2
12(N==N)   1    4    3    3    3    3    2    3    2    4
13(Y==Y)   4    4    2    2    4    3    2    4    4    1
14(Y==Y)   4    3    2    2    4    3    3    3    3    1
15(N!=Y)   2    2    4    2    4    3    3    3    4    3
16(N!=Y)   4    3    3    3    4    3    3    3    4    2
17(N==N)   0    4    4    3    3    2    2    4    3    4
18(Y!=N)   2    4    3    2    3    2    2    3    3    3
19(Y!=N)   3    3    3    2    2    3    2    3    3    3
20(N==N)   2    4    3    4    3    1    4    4    3    3

There are 4 errors, so the error rate is 4/10 = 0.4
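Both runs can be reproduced with a short script. A minimal sketch, assuming the distance table and the labels shown above; distance ties are broken by the smaller training id, as on the slides.

```python
from collections import Counter

# Training labels (from the column headers) and the distance table above.
train_labels = {1: 'N', 2: 'Y', 3: 'Y', 4: 'Y', 5: 'N', 6: 'N', 7: 'N', 8: 'Y', 9: 'N', 10: 'Y'}
dist = {
    11: [4, 3, 2, 2, 3, 3, 2, 3, 2, 2], 12: [1, 4, 3, 3, 3, 3, 2, 3, 2, 4],
    13: [4, 4, 2, 2, 4, 3, 2, 4, 4, 1], 14: [4, 3, 2, 2, 4, 3, 3, 3, 3, 1],
    15: [2, 2, 4, 2, 4, 3, 3, 3, 4, 3], 16: [4, 3, 3, 3, 4, 3, 3, 3, 4, 2],
    17: [0, 4, 4, 3, 3, 2, 2, 4, 3, 4], 18: [2, 4, 3, 2, 3, 2, 2, 3, 3, 3],
    19: [3, 3, 3, 2, 2, 3, 2, 3, 3, 3], 20: [2, 4, 3, 4, 3, 1, 4, 4, 3, 3],
}
actual = {11: 'Y', 12: 'N', 13: 'Y', 14: 'Y', 15: 'N', 16: 'N', 17: 'N', 18: 'Y', 19: 'Y', 20: 'N'}

def knn_predict(test_id, k):
    # Sort neighbors by (distance, training id) so the smaller id breaks ties.
    neighbors = sorted((d, tid) for tid, d in zip(range(1, 11), dist[test_id]))[:k]
    votes = Counter(train_labels[tid] for _, tid in neighbors)
    return votes.most_common(1)[0][0]

for k in (1, 3):
    errors = sum(knn_predict(t, k) != actual[t] for t in dist)
    print(f"k={k}: error rate = {errors}/10")   # -> 2/10 and 4/10, as above
```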

Page 7

Decision Tree
• Question: Assignment 1 Q2
• Solution:
  – There are 6 yes and 6 no in the training data, so Info(D) = I(6,6).
  – For each of the 4 attributes, calculate the information gained by branching on it. E.g., branching on the attribute "age" splits the data into:
    • D(age=old) = {4 yes, 0 no}
    • D(age=young) = {2 yes, 6 no}
    • Info_age(D) = 8/12 * I(2,6) + 4/12 * I(4,0)
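As a quick check of these numbers, here is a minimal Python sketch of the entropy calculation, where I(a,b) is the entropy of a node with a yes and b no tuples:

```python
from math import log2

def I(a, b):
    """Entropy of a node containing a 'yes' and b 'no' tuples."""
    total = a + b
    return -sum((c / total) * log2(c / total) for c in (a, b) if c > 0)

info_D = I(6, 6)                                 # = 1.0
info_age = (8/12) * I(2, 6) + (4/12) * I(4, 0)   # ≈ 0.541
print(info_D, info_age, info_D - info_age)       # gain(age) ≈ 0.459
```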

Page 8

Decision Tree

– Age has the largest gain, so we choose Age as the root.

– For the age=old branch, all decisions are yes, so no further splitting is needed.

– For the age=young branch, repeat the gain calculation.

Page 9

Decision Tree

Then we choose Married for splitting. The remaining data is:
  D(married=yes) = {4 approved=no}
  D(married=no) = {2 approved=yes, 2 approved=no}

For the married=no branch (a 2-2 tie), we choose approved=yes as the leaf label. Here is the final tree:

Page 10

Decision Tree

The final tree:

  Age
  ├─ old   → Yes
  └─ young → Married
              ├─ yes → No
              └─ no  → Yes

Apply the tree on the testing data: error rate = 4/6 = 0.667
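Applying the tree is just two nested tests. A minimal sketch; the record shown is illustrative, since the actual testing table is not reproduced in this transcript.

```python
def classify(record):
    """Prediction of the final tree above for the 'approved' class."""
    if record["age"] == "old":
        return "yes"
    # age == young: split on married
    return "no" if record["married"] == "yes" else "yes"

# Illustrative record (not from the assignment's testing table):
print(classify({"age": "young", "married": "no"}))  # -> 'yes'
```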

Page 11

Naive Bayesian

• Question: Assignment 1 Q3
• Answer:
  – In the training data, there are 6 approved=yes and 6 approved=no, so:
    • P(C1) = P(approved=yes) = 6/12 = 0.5
    • P(C2) = P(approved=no) = 6/12 = 0.5

  – For every attribute and class, compute P(X|Ci):
    • P(Sex = "male" | C1) = 4/6 = 0.667
    • P(Sex = "female" | C1) = 2/6 = 0.333
    • P(Sex = "male" | C2) = 4/6 = 0.667
    • P(Sex = "female" | C2) = 2/6 = 0.333

Page 12

Naive Bayesian

• P(Age = "old" | C1) = 4/6 = 0.667
• P(Age = "young" | C1) = 2/6 = 0.333
• P(Age = "old" | C2) = 0/6 = 0
• P(Age = "young" | C2) = 6/6 = 1

• P(Housing = "yes" | C1) = 1/6 = 0.167
• P(Housing = "no" | C1) = 5/6 = 0.833
• P(Housing = "yes" | C2) = 4/6 = 0.667
• P(Housing = "no" | C2) = 2/6 = 0.333

Page 13

Naive Bayesian

• P(Employed = "yes" | C1) = 4/6 = 0.667
• P(Employed = "no" | C1) = 2/6 = 0.333
• P(Employed = "yes" | C2) = 1/6 = 0.167
• P(Employed = "no" | C2) = 5/6 = 0.833

– For the first testing data:
  X1 = (Sex = "female", Age = "young", Housing = "yes", Employed = "yes")

  P(X1|C1) = 0.333 × 0.333 × 0.167 × 0.667 = 0.012
  P(X1|C2) = 0.333 × 1 × 0.667 × 0.167 = 0.037
  P(X1|C1) * P(C1) = 0.012 * 0.5 = 0.006
  P(X1|C2) * P(C2) = 0.037 * 0.5 = 0.019 > P(X1|C1) * P(C1)

So X1 belongs to C2 (Approved=no)
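A compact sketch of this scoring step, using the conditional probabilities listed on the previous slides; the class with the larger P(X|Ci)·P(Ci) wins.

```python
# C1 = approved=yes, C2 = approved=no; probabilities copied from the slides above.
priors = {"yes": 0.5, "no": 0.5}
cond = {
    "yes": {"Sex": {"male": 0.667, "female": 0.333},
            "Age": {"old": 0.667, "young": 0.333},
            "Housing": {"yes": 0.167, "no": 0.833},
            "Employed": {"yes": 0.667, "no": 0.333}},
    "no":  {"Sex": {"male": 0.667, "female": 0.333},
            "Age": {"old": 0.0, "young": 1.0},
            "Housing": {"yes": 0.667, "no": 0.333},
            "Employed": {"yes": 0.167, "no": 0.833}},
}

def score(x, c):
    """P(X|Ci) * P(Ci) under the naive independence assumption."""
    p = priors[c]
    for attr, value in x.items():
        p *= cond[c][attr][value]
    return p

x1 = {"Sex": "female", "Age": "young", "Housing": "yes", "Employed": "yes"}
scores = {c: score(x1, c) for c in priors}
print(scores)                       # approx {'yes': 0.006, 'no': 0.019}
print(max(scores, key=scores.get))  # -> 'no', i.e. X1 is classified as approved=no
```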

Page 14

Naive Bayesian

• For the remaining testing data, repeat the same procedure.

• So the error rate is 2/3 = 0.667

id  sex     age    housing  employed  approved (actual)  prediction
13  female  young  yes      yes       yes                no
14  male    young  yes      yes       yes                no
15  female  young  yes      no        no                 no

Page 15

Bayesian Network

• Question:
  Smoking is prohibited on high-speed trains. If someone smokes, the alarm may sound, and other passengers may report it to the police. If the police hear the alarm or receive a report, they will very likely come and arrest the smoker.

• This can be modeled in the following Bayes network:

Page 16

Bayesian Network

Network structure: Smoking is the parent of both Alarm and Report; Alarm and Report are the parents of Police comes.

The alarm is not accurate enough: it misses some smoking and sometimes sounds for nothing.

  S   P(A=F)   P(A=T)
  T    0.4      0.6
  F    0.8      0.2

Not every passenger wants to report smokers, and some passengers make mistakes. (The alarm does not affect the passengers.)

  S   P(R=F)   P(R=T)
  T    0.6      0.4
  F    0.9      0.1

The police come if they believe someone is smoking. They do not trust the alarm very much, and they may, rarely, patrol the train.

  A   R   P(P=T)   P(P=F)
  T   T    0.8      0.2
  F   T    0.6      0.4
  T   F    0.4      0.6
  F   F    0.01     0.99

Page 17

Bayesian Network

Suppose the probability of someone smoking is 0.5. What is the probability that the police come?
• Answer:

– P(S=T) = 0.5 and P(S=F) = 0.5.

– The alarm sounds:
  • P(A) = P(A|S)*P(S) + P(A|¬S)*P(¬S) = 0.6*0.5 + 0.2*0.5 = 0.4

– Passengers report:
  • P(R) = P(R|S)*P(S) + P(R|¬S)*P(¬S) = 0.4*0.5 + 0.1*0.5 = 0.25

– Police comes:
  P(P) = P(P|A,R)*P(A)*P(R) + P(P|A,¬R)*P(A)*P(¬R) + P(P|¬A,R)*P(¬A)*P(R) + P(P|¬A,¬R)*P(¬A)*P(¬R)
       = 0.8*0.4*0.25 + 0.4*0.4*0.75 + 0.6*0.6*0.25 + 0.01*0.6*0.75
       = 0.08 + 0.12 + 0.09 + 0.0045 = 0.2945
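A tiny sketch that reproduces this calculation step by step, using the CPT values from the previous page and combining P(A, R) as P(A)·P(R), exactly as on the slide.

```python
p_s = 0.5                                  # P(Smoking)
p_a_given_s, p_a_given_not_s = 0.6, 0.2    # P(Alarm | S), P(Alarm | ¬S)
p_r_given_s, p_r_given_not_s = 0.4, 0.1    # P(Report | S), P(Report | ¬S)
p_p = {("T", "T"): 0.8, ("T", "F"): 0.4,   # P(Police | Alarm, Report)
       ("F", "T"): 0.6, ("F", "F"): 0.01}

p_a = p_a_given_s * p_s + p_a_given_not_s * (1 - p_s)     # 0.4
p_r = p_r_given_s * p_s + p_r_given_not_s * (1 - p_s)     # 0.25

# Combine as on the slide, factoring P(A, R) as P(A) * P(R):
p_police = (p_p[("T", "T")] * p_a * p_r
            + p_p[("T", "F")] * p_a * (1 - p_r)
            + p_p[("F", "T")] * (1 - p_a) * p_r
            + p_p[("F", "F")] * (1 - p_a) * (1 - p_r))
print(p_a, p_r, p_police)   # 0.4  0.25  0.2945
```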

Page 18

Naïve Neural Network

• Question:
  Given a perceptron, the training samples are given in the table below.

• In addition, the initial weights are also given: w0 = 0.5, w1 = 0.4, w2 = 0.5. The learning rate α is 0.2. Please use the sample data as training data and update w0, w1, and w2.

Page 19

Naïve Neural Network

• Answer:
  Step 1:
  – a = w0*x0 + w1*x1 + w2*x2 = -0.5 + 0.4*0 + 0.5*0 = -0.5 < 0 (with bias input x0 = -1)
  – y = 0 = T1, so no need to change the weights.

  Step 2:
  – a = w0*x0 + w1*x1 + w2*x2 = -0.5 + 0.4*0 + 0.5*1 = 0 ≥ 0
  – y = 1 = T2, so no need to change the weights.

Page 20

Naïve Neural Network

• Step 3:
  – a = w0*x0 + w1*x1 + w2*x2 = -0.5 + 0.4*1 + 0.5*0 = -0.1 < 0
  – y = 0 ≠ T3, so the weights must be updated:

  – ∆w0 = α (t-y) x0 = 0.2 * 1 * (-1) = -0.2
  – ∆w1 = α (t-y) x1 = 0.2 * 1 * 1 = 0.2
  – ∆w2 = α (t-y) x2 = 0.2 * 1 * 0 = 0

• Thus,
  – w0 = w0 + ∆w0 = 0.5 - 0.2 = 0.3
  – w1 = w1 + ∆w1 = 0.4 + 0.2 = 0.6
  – w2 = w2 + ∆w2 = 0.5

Page 21

Naïve Neural Network

• Step 4:
  – a = w0*x0 + w1*x1 + w2*x2 = -0.3 + 0.6*1 + 0.5*1 = 0.8 > 0
  – y = 1 = T4, so no need to change the weights.

• So, the final weights are:
  – w0 = 0.3
  – w1 = 0.6
  – w2 = 0.5
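The four steps can be replayed with a short training loop. A minimal sketch, assuming a bias input x0 = -1 and a threshold that fires at a ≥ 0; the sample list is reconstructed from the worked steps above, since the original table is not reproduced in this transcript.

```python
# Samples reconstructed from the worked steps: (x1, x2) -> target.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w0, w1, w2 = 0.5, 0.4, 0.5   # initial weights from the slide
alpha = 0.2                  # learning rate

for (x1, x2), t in samples:
    a = -w0 + w1 * x1 + w2 * x2          # bias input x0 = -1
    y = 1 if a >= 0 else 0               # threshold activation (a = 0 fires, as in step 2)
    if y != t:                           # update only on a misclassification
        w0 += alpha * (t - y) * (-1)
        w1 += alpha * (t - y) * x1
        w2 += alpha * (t - y) * x2

print(round(w0, 2), round(w1, 2), round(w2, 2))   # -> 0.3 0.6 0.5, matching the slide
```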

Page 22

Multilayer Neural Network

• Given the following neural network with weights initialized as in the picture (next page), we are trying to distinguish between nails and screws. An example of training tuples is as follows:
  – T1 = {0.6, 0.1, nail}
  – T2 = {0.2, 0.3, screw}

• Let the learning rate (l) be 0.1. Do the forward propagation of the signals in the network using T1 as input, then perform the back propagation of the error and show the changes of the weights. With the weights updated from T1, use T2 as input and show whether the prediction is correct or not.

Page 23

Multilayer Neural Network

Page 24

Multilayer Neural Network

• Answer:
• First, use T1 as input and then perform the back propagation.
  – At Unit 3:
    • a3 = x1*w13 + x2*w23 + θ3 = 0.14
    • o3 = 1/(1 + e^(-a3)) = 0.535

– Similarly, at Unit 4,5,6: • a4 = 0.22, o4 = 0.555

• a5 = 0.64, o5 = 0.655

• a6 = 0.1345, o6 = 0.534
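A minimal sketch of this forward pass, assuming the logistic (sigmoid) activation and reading the initial weights off the update equations on the following slides (they are given in the picture, which is not reproduced in this transcript):

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# Initial weights/biases (read off the update equations on the following slides).
w13, w23, theta3 = 0.1, -0.2, 0.1
w14, w24, theta4 = 0.0,  0.2, 0.2
w15, w25, theta5 = 0.3, -0.4, 0.5
w36, w46, w56, theta6 = -0.4, 0.1, 0.6, -0.1

x1, x2 = 0.6, 0.1          # T1 (class "nail", target t = 1)

o3 = sigmoid(x1*w13 + x2*w23 + theta3)            # ≈ 0.535
o4 = sigmoid(x1*w14 + x2*w24 + theta4)            # ≈ 0.555
o5 = sigmoid(x1*w15 + x2*w25 + theta5)            # ≈ 0.655
o6 = sigmoid(o3*w36 + o4*w46 + o5*w56 + theta6)   # ≈ 0.534
print(o3, o4, o5, o6)
```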

Page 25

Multilayer Neural Network

• Now go back and perform the back propagation, starting at Unit 6:
  – Err6 = o6 (1 - o6) (t - o6) = 0.534 * (1 - 0.534) * (1 - 0.534) = 0.116
  – ∆w36 = (l) Err6 o3 = 0.1 * 0.116 * 0.535 = 0.0062
  – w36 = w36 + ∆w36 = -0.4 + 0.0062 = -0.394
  – ∆w46 = (l) Err6 o4 = 0.1 * 0.116 * 0.555 = 0.0064
  – w46 = w46 + ∆w46 = 0.1 + 0.0064 = 0.1064
  – ∆w56 = (l) Err6 o5 = 0.1 * 0.116 * 0.655 = 0.0076
  – w56 = w56 + ∆w56 = 0.6 + 0.0076 = 0.6076
  – θ6 = θ6 + (l) Err6 = -0.1 + 0.1 * 0.116 = -0.0884

Page 26

Multilayer Neural Network
• Continue the back propagation:
• Error at Unit 3:
  Err3 = o3 (1 - o3) (w36 Err6) = 0.535 * (1 - 0.535) * (-0.394 * 0.116) = -0.0114
  w13 = w13 + ∆w13 = w13 + (l) Err3 x1 = 0.1 + 0.1 * (-0.0114) * 0.6 = 0.09932
  w23 = w23 + ∆w23 = w23 + (l) Err3 x2 = -0.2 + 0.1 * (-0.0114) * 0.1 = -0.2001154
  θ3 = θ3 + (l) Err3 = 0.1 + 0.1 * (-0.0114) = 0.09886
• Error at Unit 4:
  Err4 = o4 (1 - o4) (w46 Err6) = 0.555 * (1 - 0.555) * (0.1064 * 0.116) = 0.003
  w14 = w14 + ∆w14 = w14 + (l) Err4 x1 = 0 + 0.1 * 0.003 * 0.6 = 0.00018
  w24 = w24 + ∆w24 = w24 + (l) Err4 x2 = 0.2 + 0.1 * 0.003 * 0.1 = 0.20003
  θ4 = θ4 + (l) Err4 = 0.2 + 0.1 * 0.003 = 0.2003
• Error at Unit 5:
  Err5 = o5 (1 - o5) (w56 Err6) = 0.655 * (1 - 0.655) * (0.6076 * 0.116) = 0.016
  w15 = w15 + ∆w15 = w15 + (l) Err5 x1 = 0.3 + 0.1 * 0.016 * 0.6 = 0.30096
  w25 = w25 + ∆w25 = w25 + (l) Err5 x2 = -0.4 + 0.1 * 0.016 * 0.1 = -0.39984
  θ5 = θ5 + (l) Err5 = 0.5 + 0.1 * 0.016 = 0.5016
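A sketch of the same backward pass, starting from the forward-pass outputs above (rounded as on the slides), the inputs of T1, and the initial weights; like the slides, it plugs the already-updated output-layer weights into the hidden-unit errors. Small differences from the printed numbers come from rounding.

```python
l, t = 0.1, 1.0                       # learning rate and target (nail = 1)
x1, x2 = 0.6, 0.1                     # T1 inputs
o3, o4, o5, o6 = 0.535, 0.555, 0.655, 0.534   # forward-pass outputs from the slides

# Output unit 6
err6 = o6 * (1 - o6) * (t - o6)       # ≈ 0.116
w36 = -0.4 + l * err6 * o3            # ≈ -0.394
w46 = 0.1 + l * err6 * o4             # ≈ 0.1064
w56 = 0.6 + l * err6 * o5             # ≈ 0.6076
theta6 = -0.1 + l * err6              # ≈ -0.0884

# Hidden units 3, 4, 5
err3 = o3 * (1 - o3) * (w36 * err6)   # ≈ -0.0114
err4 = o4 * (1 - o4) * (w46 * err6)   # ≈ 0.003
err5 = o5 * (1 - o5) * (w56 * err6)   # ≈ 0.016
w13, w23, theta3 = 0.1 + l*err3*x1, -0.2 + l*err3*x2, 0.1 + l*err3
w14, w24, theta4 = 0.0 + l*err4*x1,  0.2 + l*err4*x2, 0.2 + l*err4
w15, w25, theta5 = 0.3 + l*err5*x1, -0.4 + l*err5*x2, 0.5 + l*err5

print(round(err6, 3), round(w36, 4), round(w13, 5), round(theta5, 4))
```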

Page 27

Multilayer Neural Network

• After T1, the updated values are as follows:

• Now, with the updated values, use T2 as input:
  – At Unit 3:
    • a3 = x1*w13 + x2*w23 + θ3 = 0.0586898
    • o3 = 1/(1 + e^(-a3)) = 0.515

Page 28

Multilayer Neural Network

• Similarly:
  • a4 = 0.260345, o4 = 0.565
  • a5 = 0.441852, o5 = 0.6087

• At Unit 6:
  – a6 = o3*w36 + o4*w46 + o5*w56 + θ6 = 0.13865
  – o6 = 1/(1 + e^(-a6)) = 0.5348

• Since o6 is above 0.5 (closer to 1), the prediction is "nail", which differs from the given class "screw".

• So this prediction is NOT correct.

Page 29

SVM
• Consider the following data points. Please use SVM to train a classifier, and then classify these data points. A point with ai = 1 is a support vector. For example, point 1 (1,2) is a support vector, but point 5 (5,9) is not.

• Training data:

• Testing data:

Page 30

SVM

• Question:
  – (a) Find the decision boundary; show the calculation process in detail.
  – (b) Use the decision boundary you found to classify the testing data. Show all calculations in detail, including the intermediate results and the formulas you used.

Page 31

SVM
• Answer:
• a) As the picture shows, P1, P2, P3 are the support vectors.

Page 32

SVM
• Suppose w is (w1, w2). Since both P1(1,2) and P3(0,1) have y = 1, while P2(2,1) has y = -1:
  – w1*1 + w2*2 + b = 1
  – w1*0 + w2*1 + b = 1
  – w1*2 + w2*1 + b = -1

Solving these gives w1 = -1, w2 = 1, b = 0.

Then the decision boundary is:
  – w1*x1 + w2*x2 + b = 0, i.e. -x1 + x2 = 0

• Shown in the picture on the next page.
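The three support-vector equations can be checked by solving the 3×3 linear system, for example with NumPy:

```python
import numpy as np

# Constraints w·P + b = ±1 at the support vectors, as unknowns (w1, w2, b).
A = np.array([[1.0, 2.0, 1.0],    # P1 = (1, 2), y = +1
              [0.0, 1.0, 1.0],    # P3 = (0, 1), y = +1
              [2.0, 1.0, 1.0]])   # P2 = (2, 1), y = -1
rhs = np.array([1.0, 1.0, -1.0])

w1, w2, b = np.linalg.solve(A, rhs)
print(np.round([w1, w2, b], 6))   # ≈ [-1, 1, 0], so the boundary is -x1 + x2 = 0
```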

Page 33

SVM

Page 34

SVM

• b) Use the decision boundary to classify the testing data:
  – For the point P9 (2,5):
    -x1 + x2 = -2 + 5 = 3 ≥ 1, so we choose y = 1.
  – For the point P10 (7,2):
    -x1 + x2 = -7 + 2 = -5 ≤ -1, so we choose y = -1.
• Shown in the picture on the next page.
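A one-line decision function covers both test points; the slide compares against the ±1 margins, while for classification only the sign of -x1 + x2 matters:

```python
def predict(x1, x2):
    """Sign of the decision function f(x) = -x1 + x2 (boundary found above)."""
    return 1 if -x1 + x2 >= 0 else -1

for name, (x1, x2) in {"P9": (2, 5), "P10": (7, 2)}.items():
    print(name, -x1 + x2, predict(x1, x2))  # P9: 3 -> +1, P10: -5 -> -1
```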

Page 35

SVM

Page 36

Q&A