Machine Learning for Computer Vision – Part 2


DESCRIPTION

From BMVA Summer School 2014. Watch the lecture here: https://www.youtube.com/watch?v=G6hf6YbPA_s

TRANSCRIPT

Machine Learning Extra : 1 – BMVA Summer School 2014

The bits the whirlwind tour left out ...

BMVA Summer School 2014 – extra background slides

Machine Learning Extra : 2 – BMVA Summer School 2014

Machine Learning

Definition:

– “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

[Mitchell, 1997]

Machine Learning Extra : 3 – BMVA Summer School 2014

Algorithm to construct decision trees ….

Machine Learning Extra : 4 – BMVA Summer School 2014

Building Decision Trees – ID3

node = root of tree

Main loop:
– A = “best” decision attribute for the next node
– .....

But which attribute is best to split on?

Machine Learning Extra : 5 – BMVA Summer School 2014

Entropy in machine learning

Entropy : a measure of impurity

– S is a sample of training examples

– p⊕ is the proportion of positive examples in S

– p⊖ is the proportion of negative examples in S

Entropy measures the impurity of S:

Entropy(S) = −p⊕ log₂ p⊕ − p⊖ log₂ p⊖
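As a quick numeric illustration of this two-class formula, a minimal Python sketch (the example counts are invented for illustration, not taken from the slides):

```python
import math

def entropy(p_pos: float, p_neg: float) -> float:
    """Two-class entropy with proportions p_pos and p_neg (0 log 0 treated as 0)."""
    total = 0.0
    for p in (p_pos, p_neg):
        if p > 0:
            total -= p * math.log2(p)
    return total

# e.g. a sample S with 9 positive and 5 negative examples
print(entropy(9 / 14, 5 / 14))  # ~0.940 (impure)
print(entropy(0.5, 0.5))        # 1.0  (maximally impure)
print(entropy(1.0, 0.0))        # 0.0  (pure)
```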

Machine Learning Extra : 6 – BMVA Summer School 2014

Information Gain – reduction in Entropy

Gain(S, A) = expected reduction in entropy due to splitting on attribute A
– i.e. expected reduction in impurity in the data
– (an improvement in how consistently the data is sorted)

Machine Learning Extra : 7 – BMVA Summer School 2014

Information Gain – reduction in Entropy

– reduction in entropy of the set of examples S if split on attribute A

– S_v = subset of S for which attribute A has value v

– Gain(S, A) = Entropy(S) − ∑_{v ∈ Values(A)} ( |S_v| / |S| ) Entropy(S_v)
  (original entropy minus the weighted sum of the entropies of the sub-nodes created by splitting on A)

Machine Learning Extra : 8 – BMVA Summer School 2014

Information Gain – reduction in Entropy

Information Gain:
– “information provided about the target function given the value of some attribute A”
– How well does A sort the data into the required classes?

Generalise to c classes:
– (not just ⊕ or ⊖)

Entropy(S) = −∑_{i=1}^{c} p_i log₂ p_i
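A small Python sketch of these two quantities for an arbitrary number of classes (the toy attributes and labels below are invented for illustration, not the slides' worked example):

```python
from collections import Counter
import math

def entropy(labels):
    """Entropy(S) = -sum_i p_i log2 p_i over the c classes present in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(ex[attribute] for ex in examples):
        subset = [y for ex, y in zip(examples, labels) if ex[attribute] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# toy data: which attribute best sorts the labels?
examples = [{"outlook": "sun", "windy": True}, {"outlook": "sun", "windy": False},
            {"outlook": "rain", "windy": True}, {"outlook": "rain", "windy": False}]
labels = ["+", "+", "-", "-"]
print(information_gain(examples, labels, "outlook"))  # 1.0 (perfect split)
print(information_gain(examples, labels, "windy"))    # 0.0 (no information)
```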

Machine Learning Extra : 9 – BMVA Summer School 2014

Building Decision Trees – Selecting the Next Attribute

– which attribute should we split on next?

Machine Learning Extra : 10 – BMVA Summer School 2014

Building Decision Trees – Selecting the Next Attribute

– which attribute should we split on next?
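Putting the previous slides together, a minimal recursive ID3 sketch in Python (assumptions: discrete attributes, no pruning and no missing-value handling; the helper functions mirror the entropy/gain sketches above and are not the slide's elided pseudocode):

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    gain = entropy(labels)
    for v in set(ex[attribute] for ex in examples):
        subset = [y for ex, y in zip(examples, labels) if ex[attribute] == v]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def id3(examples, labels, attributes):
    """Return a nested-dict decision tree; leaves are class labels."""
    if len(set(labels)) == 1:                  # examples perfectly sorted -> leaf
        return labels[0]
    if not attributes:                         # nothing left to split on -> majority class
        return Counter(labels).most_common(1)[0][0]
    # A = "best" decision attribute for the next node (highest information gain)
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    for v in set(ex[best] for ex in examples):  # one branch per attribute value
        idx = [i for i, ex in enumerate(examples) if ex[best] == v]
        tree[best][v] = id3([examples[i] for i in idx],
                            [labels[i] for i in idx],
                            [a for a in attributes if a != best])
    return tree

examples = [{"outlook": "sun"}, {"outlook": "sun"}, {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["+", "+", "-", "-"]
print(id3(examples, labels, ["outlook"]))  # e.g. {'outlook': {'sun': '+', 'rain': '-'}} (key order may vary)
```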

Machine Learning Extra : 11 – BMVA Summer School 2014

Backpropagation Algorithm ….

Machine Learning Extra : 12 – BMVA Summer School 2014

Backpropagation Algorithm

Assume we have:
– input examples d = {1 ... D}
  • each is a pair {x_d, t_d} = {input vector, target vector}
– node index n = {1 … N}
– weight w_ji on the connection from node i to node j
– input x_ji is the input on the connection from node i to node j
  • corresponding weight = w_ji
– output error for node n is δ_n
  • similar to (o – t)

(Diagram: input layer taking input x, a hidden layer, and an output layer producing output vector O_k; nodes indexed {1 … N})

Machine Learning Extra : 13 – BMVA Summer School 2014

Backpropagation Algorithm

(1) Input example d

(2) Output layer error based on:
– the difference between output and target (t − o)
– the derivative of the sigmoid function

(3) Hidden layer error: proportional to the node's contribution to the output error

(4) Update the weights w_ji
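A minimal numpy sketch of this per-example loop for a single hidden layer of sigmoid units (the learning rate, layer sizes, iteration count and XOR toy data are illustrative assumptions, not the lecture's values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
eta = 0.5                                           # learning rate (illustrative choice)
W_hid = rng.normal(0, 0.5, (4, 3))                  # 4 hidden units, 2 inputs + bias
W_out = rng.normal(0, 0.5, (1, 5))                  # 1 output unit, 4 hidden + bias

# toy training pairs {x_d, t_d}: XOR of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def forward(x):
    xb = np.append(x, 1.0)                          # input plus constant bias term
    hb = np.append(sigmoid(W_hid @ xb), 1.0)        # hidden activations plus bias term
    return xb, hb, sigmoid(W_out @ hb)

for epoch in range(10000):                          # termination: fixed number of iterations
    for x, t in zip(X, T):                          # stochastic: update per training example
        xb, hb, o = forward(x)                      # (1) feed example d forward
        delta_k = o * (1 - o) * (t - o)             # (2) output error: sigmoid' * (t - o)
        delta_h = hb[:4] * (1 - hb[:4]) * (W_out[:, :4].T @ delta_k)  # (3) hidden error
        W_out += eta * np.outer(delta_k, hb)        # (4) update weights with relevant error
        W_hid += eta * np.outer(delta_h, xb)

print(np.array([forward(x)[2] for x in X]).round(2).ravel())  # should approach 0 1 1 0
```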

Machine Learning Extra : 14 – BMVA Summer School 2014

Backpropagation

Termination criteria:
– number of iterations reached
– or error below a suitable bound

Output layer error: δ_k = o_k (1 − o_k)(t_k − o_k)

Hidden layer error: δ_h = o_h (1 − o_h) ∑_k w_hk δ_k

All weights updated using the relevant error: w_ji ← w_ji + η δ_j x_ji

Machine Learning Extra : 15 – BMVA Summer School 2014

Backpropagation

(Diagram: input layer taking input x, hidden layer unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 16 – BMVA Summer School 2014

Backpropagation

δ_h is expressed as a weighted sum of the output layer errors δ_k to which it contributes (i.e. w_hk > 0)

(Diagram: input layer taking input x, hidden layer unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 17 – BMVA Summer School 2014

Backpropagation

Error is propagated backwards from the network output ... to the weights of the output layer ... to the weights of the hidden layer …

Hence the name: backpropagation

(Diagram: input layer taking input x, hidden layer unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 18 – BMVA Summer School 2014

Backpropagation

Repeat these stages for every hidden layer in a multi-layer network (using error δ_i where x_ji > 0)

(Diagram: input layer taking input x, one or more hidden layers with unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 19 – BMVA Summer School 2014

Backpropagation

Error is propagated backwards from the network output ... to the weights of the output layer ... over the weights of all N hidden layers …

Hence the name: backpropagation

(Diagram: input layer taking input x, N hidden layers with unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 20 – BMVA Summer School 2014

Backpropagation

Will perform gradient descent over the weight space {w_ji} for all connections i → j in the network

Stochastic gradient descent – as the updates are based on training one sample at a time

Machine Learning Extra : 21 – BMVA Summer School 2014

Understanding (and believing) the SVM stuff ….

Machine Learning Extra : 22 – BMVA Summer School 2014

Remedial Note: equations of 2D lines

Line: w⃗ · x⃗ + b = 0

where w⃗ and x⃗ are 2D vectors:
– w⃗ is the normal to the line
– b sets the offset from the origin

2D LINES REMINDER

Machine Learning Extra : 23 – BMVA Summer School 2014

Remedial Note: equations of 2D lines

http://www.mathopenref.com/coordpointdisttrig.html

2D LINES REMINDER

Machine Learning Extra : 24 – BMVA Summer School 2014

Remedial Note: equations of 2D lines

For a defined line equation (fixed w⃗ and b), insert a point x⃗ into the equation w⃗ · x⃗ + b:

– the result is +ve if the point is on the side of the line that the normal w⃗ points towards ( > 0 )
– the result is -ve if the point is on the other side of the line ( < 0 )
– the result is the (signed) distance of the point from the line, given by ( w⃗ · x⃗ + b ) / ‖w⃗‖, which reduces to w⃗ · x⃗ + b for ‖w⃗‖ = 1

2D LINES REMINDER
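A tiny numpy check of this sign/distance behaviour (the particular line and test points are arbitrary illustrations):

```python
import numpy as np

w = np.array([1.0, 1.0])      # normal to the line
b = -1.0                      # offset term: the line is w.x + b = 0, i.e. x + y = 1

def signed_distance(p):
    """Positive on the side the normal points towards, negative on the other side."""
    return (w @ p + b) / np.linalg.norm(w)

print(signed_distance(np.array([2.0, 2.0])))   # > 0 : on the normal's side, ~2.12 away
print(signed_distance(np.array([0.0, 0.0])))   # < 0 : on the other side, ~0.71 away
print(signed_distance(np.array([0.5, 0.5])))   # = 0 : the point lies on the line
```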

Machine Learning Extra : 25 – BMVA Summer School 2014

Linear Separator

Instances (i.e. examples) {x_i , y_i}
– x_i = point in instance space (Rⁿ) made up of n attributes
– y_i = class value for the classification of x_i

Classification of an example: function f(x) = y ∈ {+1, −1}, i.e. 2 classes

Want a linear separator. Can view this as a constraint satisfaction problem:
– w⃗ · x⃗_i + b ≥ +1 for every example with y_i = +1
– w⃗ · x⃗_i + b ≤ −1 for every example with y_i = −1

Equivalently: y_i ( w⃗ · x⃗_i + b ) ≥ 1 for all i

N.B. we have a vector of weight coefficients w⃗

Machine Learning Extra : 26 – BMVA Summer School 2014

Linear Separator

If we define the distance of the nearest point to the boundary as 1
→ the width of the margin is 2 / ‖w⃗‖ (i.e. equal width on each side)

We thus want to maximize 2 / ‖w⃗‖, finding the parameters w⃗ and b

(Diagram: the two classes y = +1 and y = −1 either side of the margin; classification of an example: function f(x) = y ∈ {+1, −1}, i.e. 2 classes)
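As a small numeric check of the constraints and the margin width above (a sketch; the separator w⃗, b and the four points are invented for illustration):

```python
import numpy as np

# a candidate separator w.x + b = 0 for some 2D training data
w = np.array([2.0, 0.0])
b = -2.0
X = np.array([[2.0, 1.0], [3.0, 0.0],    # y = +1 examples
              [0.0, 1.0], [-1.0, 2.0]])  # y = -1 examples
y = np.array([+1, +1, -1, -1])

margins = y * (X @ w + b)                 # y_i (w . x_i + b), one value per example
print(margins)                            # all >= 1 -> the constraints are satisfied
print(np.all(margins >= 1))               # True for this separator
print(2.0 / np.linalg.norm(w))            # margin width 2 / ||w|| = 1.0
```

Maximizing 2 / ‖w⃗‖ over all (w⃗, b) that satisfy these constraints gives the maximum-margin separator.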

Machine Learning Extra : 27 – BMVA Summer School 2014

which is equivalent to minimizing ½ ‖w⃗‖², subject to y_i ( w⃗ · x⃗_i + b ) ≥ 1 for all i

Machine Learning Extra : 28 – BMVA Summer School 2014

…............. back to main slides

Machine Learning Extra : 29 – BMVA Summer School 2014

So ….

Find the “hyperplane” (i.e. boundary) with:

a) maximum margin

b) minimum number of (training) examples on the wrong side of the chosen boundary

(i.e. minimal penalties due to C)

Solve via optimization (in polynomial time/complexity)

Machine Learning Extra : 30 – BMVA Summer School 2014

Example:

– Non-linear separation (red / blue data items on a 2D plane)
– Kernel projection to a higher-dimensional space
– Find the hyperplane separator (a plane in 3D) via optimization
– Non-linear boundary in the original dimension (e.g. a circle in 2D) defined by the planar boundary (cut) in 3D
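A brief scikit-learn sketch of this kernel idea (assuming scikit-learn is available; the concentric-circles dataset and the RBF kernel are illustrative choices, not the lecture's example):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# data not linearly separable in 2D: one class forms a circle inside the other
X, y = make_circles(n_samples=400, factor=0.4, noise=0.08, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X_train, y_train)            # line in the original space
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)   # kernel projection

print("linear kernel accuracy:", linear.score(X_test, y_test))  # poor: no separating line exists
print("RBF kernel accuracy:   ", rbf.score(X_test, y_test))     # near 1.0: non-linear boundary
```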

Machine Learning Extra : 31 – BMVA Summer School 2014

.... but it is all about the data!

Machine Learning Extra : 32 – BMVA Summer School 2014

Desirable Data Properties

Machine learning is a data-driven approach – the data is important! Ideally, the training/testing data used for learning must be:

– Unbiased • towards any given subset of the space of examples ...

– Representative • of the “real-world” data to be encountered in use/deployment

– Accurate • inaccuracies in training/testing produce inaccurate results

– Available • the more training/testing data available, the better the results

• greater confidence in the results can be achieved

Machine Learning Extra : 33 – BMVA Summer School 2014

Data Training Methodologies

Simple approach: Data Splits

– split the overall data set into separate training and test sets
  • no established rule, but 80%:20%, 70%:30% or ⅔:⅓ training-to-testing splits are common

– train on one, test on the other

– Test error = error on the test set

– Training error = error on the training set

– Weakness: susceptible to bias in the data sets or “over-fitting”
  • also leaves less data available for training
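For example, a minimal train/test split sketch (assuming scikit-learn; the iris data, the 70%:30% ratio and the decision-tree classifier are arbitrary illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 70%:30% training-to-testing split of the overall data set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("training error:", 1.0 - clf.score(X_train, y_train))  # error on the training set
print("test error:    ", 1.0 - clf.score(X_test, y_test))    # error on the unseen test set
```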

Machine Learning Extra : 34 – BMVA Summer School 2014

Data Training Methodologies

More advanced (and robust): K-fold Cross Validation

– Randomly split (all) the data into k subsets

– For each of the k subsets in turn:
  • train using all the data not in that subset
  • test the resulting learned [classifier | function …] using that subset

– report the mean error over all k tests
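A short sketch of the same procedure (assuming scikit-learn; k = 5 and the decision-tree classifier are illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
errors = []

# randomly split (all) the data into k subsets, then train/test k times
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])                           # train outside this subset
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))      # test on the held-out subset

print("mean error over all k tests:", np.mean(errors))
```

scikit-learn's cross_val_score wraps this loop in a single call.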

Machine Learning Extra : 35 – BMVA Summer School 2014

Key Summary Statistics #1

tp = true positive / tn = true negative

fp = false positive / fn = false negative

Often quoted or plotted when comparing ML techniques
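The usual summary statistics are simple functions of these four counts; a minimal sketch of some commonly used ones (accuracy, precision, recall, F1 and the false positive rate). Which particular statistics appeared on the original slide is not recorded in this transcript, so the selection below is illustrative:

```python
def summary_stats(tp, tn, fp, fn):
    """Common summary statistics derived from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of the predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of the actual positives, how many were found (TPR)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0  # used, with TPR, for ROC curves
    return accuracy, precision, recall, f1, false_positive_rate

print(summary_stats(tp=40, tn=45, fp=5, fn=10))  # approx (0.85, 0.889, 0.8, 0.842, 0.1)
```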

Machine Learning Extra : 36 – BMVA Summer School 2014

Kappa Statistic

Measure of classification of “N items into C mutually exclusive categories”

κ = ( Pr(a) − Pr(e) ) / ( 1 − Pr(e) )

Pr(a) = probability of success of the classification ( = accuracy)
Pr(e) = probability of success due to chance

– e.g. 2 categories = 50% (0.5), 3 categories = 33% (0.33) ….. etc.

– Pr(e) can be replaced with Pr(b) to measure agreement between two classifiers/techniques a and b

[Cohen, 1960]
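A small sketch of the statistic defined above (the example numbers are invented; Pr(e) here follows the slide's simple 1/C chance rate rather than the marginal-frequency estimate in Cohen's original formulation):

```python
def kappa(pr_a: float, pr_e: float) -> float:
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    return (pr_a - pr_e) / (1.0 - pr_e)

# e.g. a 2-category classifier with 90% accuracy, chance success rate 0.5
print(kappa(pr_a=0.90, pr_e=0.5))    # 0.8
# e.g. a 3-category classifier with 60% accuracy, chance success rate 1/3
print(kappa(pr_a=0.60, pr_e=1/3))    # 0.4
```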
