

Machine Learning: Algorithms and Applications Mockup Examination

14 May 2012

Instructions for students
Write First Name, Last Name, Student Number and Signature where indicated; otherwise, the examination cannot be marked. Use a pen, not a pencil. Write neatly and clearly.

Student Code of Ethics
Students are expected to maintain the highest standards of academic integrity. Work that is not of the student's own creation will receive no credit. Remember that you cannot give or receive unauthorized aid on any assignment, quiz, or exam. A student cannot use the ideas of another and declare them as his or her own. Students are required to properly cite the original source of the ideas and information used in their work.

FIRST NAME LAST NAME

STUDENT NUMBER SIGNATURE


Question 1 (1 point) For which problems is Machine Learning suitable? Explain in a few lines.

Solution Machine Learning is suitable for problems with (some of) the following characteristics:

• There is no algorithm that solves them (or it is very difficult to define one);
• Human expertise does not exist, or humans are unable to explain their expertise (therefore it is extremely difficult to build an algorithm based on this expertise);
• There is abundant data available at low cost. Such data can be mined in order to find regularities, patterns, or structures (a model of the problem). The model can then be used to predict the solution of new instances of the problem;
• There is an agent that needs to learn how to maximize its performance at a task by interacting with a (partially) unknown environment, which gives the agent feedback about how good its current performance is.

Question 2 (2 points) Give one example of a learning task:

• Informally describe the task in a few lines in English;
• Describe the task more formally, focusing on the task T, the performance measure P, and the training experience E;
• Propose a target function F to be learned.

Solution Any of the example tasks presented in slides ml_2012_lecture_01 is a good example (e.g., Mitchell's checkers-playing task: T = playing checkers, P = the percentage of games won, E = games played against itself, and target function F: Board → ℝ that scores board states).

Question 3 (1 point) Classification and regression are two types of supervised learning. Explain the main differences between them.

Solution In short: in classification the variable to be predicted is categorical; in regression the variable to be predicted is numeric and continuous. For example, predicting whether an email is spam is classification, while predicting the price of a house is regression.

Question 4 (1 point) Suppose that:

• H is a set of possible hypotheses;
• X is a training data set;
• h* ∈ H is the most probable hypothesis given the data X.

Under what conditions does the following equality hold? What is h* called if the equality holds?

h* ≝ argmax_{h∈H} P(h | X) = argmax_{h∈H} P(X | h)

Solution The equality holds if all the hypotheses in H are equally probable a priori (i.e., without taking into account the content of the training set X). When the equality holds, h* is called the maximum likelihood hypothesis.
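For concreteness, here is a small numerical illustration (not part of the original exam; the three hypotheses and their probabilities are made up) of why a uniform prior makes the MAP and maximum likelihood hypotheses coincide:

```python
import numpy as np

likelihood = np.array([0.2, 0.5, 0.3])   # P(X | h) for three hypothetical hypotheses
uniform = np.array([1/3, 1/3, 1/3])      # equal priors P(h)
skewed = np.array([0.7, 0.2, 0.1])       # unequal priors P(h)

# The MAP hypothesis maximizes P(h | X) ∝ P(X | h) P(h);
# the ML hypothesis maximizes P(X | h) alone.
print(np.argmax(likelihood))             # ML hypothesis: index 1
print(np.argmax(likelihood * uniform))   # MAP under a uniform prior: also index 1
print(np.argmax(likelihood * skewed))    # MAP under a skewed prior: index 0
```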


Question 5 (2 points) Consider the following training data set X, containing commercial transactions performed by some people. Each training example is represented by four attributes:

• Age, with possible values {Young, Medium, Old};
• Income, with possible values {Low, Medium, High};
• CreditRating, with possible values {Fair, Excellent}; and
• BuyComputer – the classification attribute. It is Yes if the person bought a computer, No if the person did not buy one.

Record ID   Age      Income   CreditRating   BuyComputer
1           Young    High     Fair           No
2           Young    High     Excellent      No
3           Medium   High     Fair           Yes
4           Old      Medium   Fair           Yes
5           Old      Low      Fair           Yes
6           Old      Low      Excellent      No
7           Medium   Low      Excellent      Yes
8           Young    Medium   Fair           No
9           Young    Low      Fair           Yes
10          Old      Medium   Fair           Yes
11          Young    Medium   Excellent      Yes
12          Medium   Medium   Excellent      Yes
13          Medium   High     Fair           Yes
14          Old      Medium   Excellent      No
15          Medium   Medium   Excellent      Yes

Using the Naïve Bayes classification approach, compute in detail the prediction (i.e., buy a computer or not) for a Young person with Medium income and Fair credit rating.

Solution
Let x = (Age=Young, Income=Medium, CreditRating=Fair) be the instance to be classified. Let c1 ≝ (BuyComputer = Yes) and c2 ≝ (BuyComputer = No).

h_NB(x) = argmax_{c∈{c1,c2}} P(c) · ∏_i P(x_i | c)
        = argmax_{c∈{c1,c2}} P(c) · P(Age=Young | c) · P(Income=Medium | c) · P(CreditRating=Fair | c)

Priors:
P(c1) = 10/15 = 2/3        P(c2) = 5/15 = 1/3

Likelihoods:
P(Age=Young | c1) = 2/10 = 1/5            P(Age=Young | c2) = 3/5
P(Income=Medium | c1) = 5/10 = 1/2        P(Income=Medium | c2) = 2/5
P(CreditRating=Fair | c1) = 6/10 = 3/5    P(CreditRating=Fair | c2) = 2/5

Scores:
P(c1) · P(Age=Young | c1) · P(Income=Medium | c1) · P(CreditRating=Fair | c1) = 2/3 · 1/5 · 1/2 · 3/5 = 1/25 = 0.040
P(c2) · P(Age=Young | c2) · P(Income=Medium | c2) · P(CreditRating=Fair | c2) = 1/3 · 3/5 · 2/5 · 2/5 = 4/125 = 0.032

Since 1/25 > 4/125, h_NB(x) = c1, i.e., the person is predicted to buy a computer.
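The same computation can be reproduced programmatically. The following Python sketch (an illustration, not part of the original solution) counts the priors and likelihoods directly from the table above:

```python
from collections import Counter

# Training records from the table above: (Age, Income, CreditRating, BuyComputer)
data = [
    ("Young", "High", "Fair", "No"),       ("Young", "High", "Excellent", "No"),
    ("Medium", "High", "Fair", "Yes"),     ("Old", "Medium", "Fair", "Yes"),
    ("Old", "Low", "Fair", "Yes"),         ("Old", "Low", "Excellent", "No"),
    ("Medium", "Low", "Excellent", "Yes"), ("Young", "Medium", "Fair", "No"),
    ("Young", "Low", "Fair", "Yes"),       ("Old", "Medium", "Fair", "Yes"),
    ("Young", "Medium", "Excellent", "Yes"), ("Medium", "Medium", "Excellent", "Yes"),
    ("Medium", "High", "Fair", "Yes"),     ("Old", "Medium", "Excellent", "No"),
    ("Medium", "Medium", "Excellent", "Yes"),
]
query = ("Young", "Medium", "Fair")

classes = Counter(r[-1] for r in data)          # class counts: Yes = 10, No = 5
scores = {}
for c, n_c in classes.items():
    score = n_c / len(data)                     # prior P(c)
    for i, v in enumerate(query):               # likelihoods P(x_i | c)
        score *= sum(1 for r in data if r[-1] == c and r[i] == v) / n_c
    scores[c] = score

print(scores)                                   # {'Yes': 0.04, 'No': 0.032}
print("prediction:", max(scores, key=scores.get))
```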

Question 6 (2 points)

Consider the problem of plant classification. The training set X is made of plant records. Each plant record (i.e., training example) is represented by 5 attributes:

• SepalLength – the plant's sepal length in cm;
• SepalWidth – the plant's sepal width in cm;
• PetalLength – the plant's petal length in cm;
• PetalWidth – the plant's petal width in cm;
• Class – the classification attribute, with the possible values {Iris-setosa, Iris-versicolor, Iris-virginica}.

PlantID   SepalLength   SepalWidth   PetalLength   PetalWidth   Class
1         5.1           3.5          1.4           0.2          Iris-setosa
2         7.1           3.0          5.9           2.1          Iris-virginica
3         5.4           3.4          1.5           0.4          Iris-setosa
4         6.4           3.2          4.5           1.5          Iris-versicolor
5         6.3           3.3          4.7           1.6          Iris-versicolor
6         7.3           2.9          6.3           1.8          Iris-virginica
7         4.4           2.9          1.4           0.2          Iris-setosa
8         4.9           3.1          1.5           0.1          Iris-setosa
9         5.8           2.8          5.1           2.4          Iris-virginica
10        5.6           2.9          3.6           1.3          Iris-versicolor
11        6.9           3.2          5.7           2.3          Iris-virginica
12        6.0           3.4          4.5           1.6          Iris-versicolor
13        7.2           3.0          5.8           1.6          Iris-virginica
14        4.8           3.4          1.9           0.2          Iris-setosa
15        6.8           2.8          4.8           1.4          Iris-versicolor

Predict the class of the following plant:

• Plant #16: (SepalLength=6.1; SepalWidth=2.8; PetalLength=4.0; PetalWidth=1.3)

by applying the k-Nearest Neighbor learning algorithm. Use k=3 and the Manhattan distance. Assume that the values of the attributes are standardized.

Solution See mockup_exam_nn_solution.xlsx
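Since the spreadsheet is not reproduced here, the following Python sketch (an illustration, not part of the original solution) carries out the same computation: it standardizes each attribute with the training set's statistics, computes Manhattan distances, and takes a majority vote among the k=3 nearest neighbours:

```python
import numpy as np
from collections import Counter

# Training data from the table above: (SepalLength, SepalWidth, PetalLength, PetalWidth)
X = np.array([
    [5.1, 3.5, 1.4, 0.2], [7.1, 3.0, 5.9, 2.1], [5.4, 3.4, 1.5, 0.4],
    [6.4, 3.2, 4.5, 1.5], [6.3, 3.3, 4.7, 1.6], [7.3, 2.9, 6.3, 1.8],
    [4.4, 2.9, 1.4, 0.2], [4.9, 3.1, 1.5, 0.1], [5.8, 2.8, 5.1, 2.4],
    [5.6, 2.9, 3.6, 1.3], [6.9, 3.2, 5.7, 2.3], [6.0, 3.4, 4.5, 1.6],
    [7.2, 3.0, 5.8, 1.6], [4.8, 3.4, 1.9, 0.2], [6.8, 2.8, 4.8, 1.4],
])
y = ["Iris-setosa", "Iris-virginica", "Iris-setosa", "Iris-versicolor",
     "Iris-versicolor", "Iris-virginica", "Iris-setosa", "Iris-setosa",
     "Iris-virginica", "Iris-versicolor", "Iris-virginica", "Iris-versicolor",
     "Iris-virginica", "Iris-setosa", "Iris-versicolor"]

query = np.array([6.1, 2.8, 4.0, 1.3])   # Plant #16

# Standardize with the training set's mean and (population) standard deviation.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Xs, qs = (X - mu) / sigma, (query - mu) / sigma

# Manhattan (L1) distances from the query to every training example.
dist = np.abs(Xs - qs).sum(axis=1)

# Majority vote among the k = 3 nearest neighbours.
k = 3
nearest = np.argsort(dist)[:k]
print([(i + 1, y[i], round(dist[i], 3)) for i in nearest])
print("prediction:", Counter(y[i] for i in nearest).most_common(1)[0][0])
```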


Question 7 (1 point) How can the SVM learning approach be used for datasets that are not linearly separable?

Solution Data can fail to be linearly separable for two reasons:

1. Noise. With noisy data, the constraints of the minimization problem may not be satisfiable, and the SVM algorithm may fail to find a solution. The remedy is to relax the margin constraints by introducing slack variables. See ml_2012_lecture_08 from page 9 for details.

2. The data is intrinsically not linearly separable. The input data is transformed into another (higher-dimensional) space so that a linear decision boundary can separate positive and negative examples in the transformed space. The "kernel trick" is used to avoid computing the transformation explicitly and to keep the optimization problem computationally tractable. See ml_2012_lecture_08 from page 14 for details.
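As a concrete example (illustrative, not from the exam), both remedies appear as parameters of scikit-learn's SVC: C controls the slack-variable penalty of the soft margin, and kernel selects the implicit transformation; the data set below is made up:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 > 1).astype(int)   # circular boundary: not linearly separable

# Soft margin: C trades margin width against slack-variable penalties.
# Kernel trick: the RBF kernel separates the data in an implicit feature space.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```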

Question 8 (1 point)

What are the main weaknesses of the k-means clustering algorithm?

Solution

• k has to be selected a priori;
• it is sensitive to initial seeds;
• it is sensitive to outliers;
• it may fail to find clusters of arbitrary shapes;
• the mean of the data points must be computable.
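The sensitivity to initial seeds, for instance, can be observed directly. The sketch below (illustrative only, using scikit-learn on made-up data; not part of the original solution) runs k-means with a single random initialization per run; different seeds may converge to local optima with different inertia (within-cluster sum of squares):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
centers = [(0, 0), (3, 0), (0, 3)]
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(50, 2)) for c in centers])

# n_init=1 disables the usual multiple-restart safeguard, exposing the
# dependence of the result on the initial seeds.
for seed in (0, 1, 2):
    km = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    print("seed", seed, "inertia", round(km.inertia_, 2))
```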

Question 9 (1 point) Explain how to evaluate a classifier's performance using k-fold cross-validation.

Solution k-fold cross-validation is explained in ml_2012_lecture_06 on page 5.
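A minimal sketch of the procedure (an illustration, not from the lecture slides; train_and_eval is a placeholder for any train-then-score routine):

```python
import numpy as np

def k_fold_cv(X, y, train_and_eval, k=10, seed=0):
    """Split the data into k disjoint folds; in each round, train on k-1 folds,
    test on the held-out fold, and average the k test scores."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        scores.append(train_and_eval(X[train], y[train], X[test], y[test]))
    return np.mean(scores)
```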

Question 10 (1.5 points) Consider a binary classification problem. In the training set there are 100 instances of class c1 and 80 instances of class c2. Suppose that, for the learned classifier, we have the following confusion matrix related to class c1.

Confusion matrix for class c1:

                           The learned classifier's classification
The true classification    c1     c2     Total
c1                         90     10     100
c2                         20     60     80

Give a definition of precision, recall, and F-measure with respect to class c1 and perform the calculation of the three evaluation metrics.


Solution The explanation of precision, recall, and F-measure is in ml_2012_lecture_06 from page 9.

Prec(c1) = TP1 / (TP1 + FP1) = 90 / (90 + 20) = 9/11 ≈ 0.82

Rec(c1) = TP1 / (TP1 + FN1) = 90 / (90 + 10) = 9/10 = 0.9

F(c1) = 2 · Prec(c1) · Rec(c1) / (Prec(c1) + Rec(c1)) = (2 · 0.82 · 0.9) / (0.82 + 0.9) ≈ 0.86
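The same three metrics can be computed with a few lines of Python (an illustrative sketch, not part of the original solution):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F-measure for one class from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Class c1: TP = 90, FP = 20 (c2 classified as c1), FN = 10 (c1 classified as c2).
print(precision_recall_f1(tp=90, fp=20, fn=10))  # ≈ (0.818, 0.900, 0.857)
```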

Question 11 (1 point) Discuss whether survival of the fittest holds in generational GAs.

Solution In generational GAs the entire population is replaced by its offspring at each generation. Therefore, the fittest individual at generation t survives to generation t+1 only if:

• it is selected for mating;
• it is not "destroyed" by crossover and mutation.

Assuming fitness-proportionate selection, the number of copies of the fittest individual in the mating pool grows with how outstanding that individual is in the population (i.e., how much higher its fitness is than all the other fitnesses). At least one of these copies must not be "destroyed" by crossover and mutation; this depends on the probabilities of crossover and mutation.

Question 12 (1 point) Describe the main steps and the main genetic operators of the Genetic Algorithms learning approach.

Solution The general schema of an evolutionary algorithm is described in ml_2012_lecture_04 on page 5. This schema can be adopted for a Genetic Algorithm. The main genetic operators of a genetic algorithm are selection, crossover, and mutation. See ml_2012_lecture_04 for more details.
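For illustration only (not from the slides), here is a minimal generational GA on bit strings showing the three operators; the fitness function (counting ones) and all parameters are arbitrary choices:

```python
import random

def fitness(ind):                      # toy fitness: number of ones (OneMax)
    return sum(ind)

def select(pop):                       # fitness-proportionate (roulette-wheel) selection
    return random.choices(pop, weights=[fitness(i) + 1e-9 for i in pop], k=1)[0]

def crossover(a, b):                   # one-point crossover
    p = random.randrange(1, len(a))
    return a[:p] + b[p:]

def mutate(ind, rate=0.01):            # bit-flip mutation
    return [1 - g if random.random() < rate else g for g in ind]

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for gen in range(50):                  # generational replacement: offspring replace parents
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(len(pop))]
print("best fitness:", max(fitness(i) for i in pop))
```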

Question 13 (1.5 points) Design a two-input perceptron that implements the Boolean function A∨¬B.

Solution The requested perceptron has 3 inputs: A, B, and the constant 1. The values of A and B are 1 (true) or -1 (false). The following table describes the output O of the perceptron:

 A    B    O = A∨¬B
-1   -1     1
-1    1    -1
 1   -1     1
 1    1     1


One correct decision surface (any line that separates the positive points from the negative point would be fine) is described below.

The line crosses the A axis at -1 and the B axis at 1. The equation of the line is

(A - 0)/(-1 - 0) = (B - 1)/(0 - 1)  →  -A = -B + 1  →  1 + A - B = 0

So 1, 1, and -1 are possible values for the weights w0, w1, and w2, respectively. Using these values, the output of the perceptron for A=0, B=0 is positive, so the signs of the weights are correct, and we can conclude that w0=1, w1=1, w2=-1.
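A quick check of the weights found above (an illustrative sketch, not part of the original solution; the threshold unit outputs +1 when the weighted sum is positive):

```python
import itertools

# O = sign(w0*1 + w1*A + w2*B) with weights w = (1, 1, -1) found above.
w0, w1, w2 = 1, 1, -1
for A, B in itertools.product((-1, 1), repeat=2):
    out = 1 if w0 + w1 * A + w2 * B > 0 else -1
    expected = 1 if (A == 1 or B == -1) else -1   # A OR (NOT B), with -1 as false
    print(A, B, out, out == expected)
```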

Question 14 (1 point) Consider the three linearly separable two-dimensional input vectors in the following figure. Find the linear SVM that optimally separates the classes by maximizing the margin.

Solution All three data points are support vectors. The margin hyperplane H+ is the line passing through the two positive points. The margin hyperplane H- is the line passing through the negative point and parallel to H+. The decision boundary is the line halfway between H+ and H-; its equation is -x + 2 = 0.
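Since the figure is not reproduced here, the following sketch uses hypothetical coordinates consistent with the described geometry (two positive points on the line x = 1 and a negative point at (3, 1), so that H+ is x = 1 and H- is x = 3); with a very large C, scikit-learn's linear SVC approximates the hard-margin solution:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical points (the original figure's coordinates are not available).
X = np.array([[1.0, 0.0], [1.0, 2.0], [3.0, 1.0]])
y = np.array([1, 1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)      # near-hard margin
print(clf.coef_[0], clf.intercept_[0])           # ≈ [-1, 0], 2: boundary -x + 2 = 0
print(clf.support_)                              # all three points are support vectors
```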
