Machine Learning for Computer Vision Part 2


DESCRIPTION

From the BMVA Summer School 2014. Watch the lecture here: https://www.youtube.com/watch?v=G6hf6YbPA_s

TRANSCRIPT

Page 1: Machine learning for computer vision part 2


The bits the whirlwind tour left out ...

BMVA Summer School 2014 – extra background slides

Page 2: Machine learning for computer vision part 2


Machine Learning

Definition:

– “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

[Mitchell, 1997]

Page 3: Machine learning for computer vision part 2


Algorithm to construct decision trees ….

Page 4: Machine learning for computer vision part 2


Building Decision Trees – ID3

node = root of tree

Main loop: A = “best” decision attribute for next node

.....

But which attribute is best to split on?
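As a concrete (hypothetical) illustration of this main loop, here is a minimal Python sketch of ID3; the dictionary-based example representation and the information_gain helper (defined alongside the entropy slides below) are assumptions, not part of the slides:

def id3(examples, attributes):
    # examples: list of dicts mapping attribute name -> value, plus a 'label' key
    labels = [e['label'] for e in examples]
    if len(set(labels)) == 1:                  # all examples share one class: leaf node
        return labels[0]
    if not attributes:                         # no attributes left: majority-class leaf
        return max(set(labels), key=labels.count)
    # A = "best" decision attribute for the next node (highest information gain)
    best = max(attributes, key=lambda a: information_gain(examples, a))
    node = {'attribute': best, 'children': {}}
    for v in set(e[best] for e in examples):   # one child per value of the chosen attribute
        subset = [e for e in examples if e[best] == v]
        node['children'][v] = id3(subset, [a for a in attributes if a != best])
    return node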

Page 5: Machine learning for computer vision part 2


Entropy in machine learning

Entropy: a measure of impurity

– S is a sample of training examples

– p⊕ is the proportion of positive examples in S

– p⊖ is the proportion of negative examples in S

Entropy measures the impurity of S: Entropy(S) = −p⊕ log2 p⊕ − p⊖ log2 p⊖
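As a quick sketch (not from the slides), this is the entropy of a list of ±1 labels in Python:

import math

def binary_entropy(labels):
    # Entropy(S) = -p+ log2 p+ - p- log2 p- over a sample of +1 / -1 labels
    p_pos = sum(1 for y in labels if y == +1) / len(labels)
    p_neg = 1.0 - p_pos
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(binary_entropy([+1, +1, -1, -1]))   # 1.0 : maximally impure
print(binary_entropy([+1, +1, +1, +1]))   # 0.0 : pure sample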

Page 6: Machine learning for computer vision part 2


Information Gain – reduction in Entropy

Gain(S,A) = expected reduction in entropy due to splitting on attribute A

– i.e. expected reduction in impurity in the data

– (i.e. an improvement in how consistently the data is sorted into the required classes)

Page 7: Machine learning for computer vision part 2


Information Gain – reduction in Entropy

– reduction in entropy in set of examples S if split on attribute A

– Sv = subset of S for which attribute A has value v

– Gain(S,A) = Entropy(S) − Σ_{v ∈ Values(A)} (|Sv| / |S|) · Entropy(Sv), i.e. the original entropy minus the size-weighted entropy of the sub-nodes created by splitting on A
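A minimal Python sketch of this quantity, reusing the binary_entropy helper above (the dict-based example representation is an assumption):

def information_gain(examples, attribute):
    # Gain(S, A): entropy of S minus the |Sv|/|S|-weighted entropy of each subset Sv
    labels = [e['label'] for e in examples]
    total = binary_entropy(labels)
    remainder = 0.0
    for v in set(e[attribute] for e in examples):
        subset = [e['label'] for e in examples if e[attribute] == v]
        remainder += (len(subset) / len(examples)) * binary_entropy(subset)
    return total - remainder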

Page 8: Machine learning for computer vision part 2


Information Gain – reduction in Entropy

Information Gain:

– “information provided about the target function given the value of some attribute A”

– How well does A sort the data into the required classes?

Generalise to c classes: – (not just ⊕ or ⊖)

Entropy(S) = −∑_{i=1}^{c} p_i log2 p_i
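The same computation in Python for c classes (a sketch; the label representation is arbitrary):

from collections import Counter
import math

def entropy(labels):
    # Entropy(S) = -sum_i p_i log2 p_i over the c classes present in the sample
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy(['a', 'b', 'c']))   # log2(3) ≈ 1.585 for three equally likely classes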

Page 9: Machine learning for computer vision part 2


Building Decision Trees – Selecting the Next Attribute

– which attribute should we split on next?

Page 10: Machine learning for computer vision part 2


Building Decision Trees – Selecting the Next Attribute

– which attribute should we split on next?

Page 11: Machine learning for computer vision part 2


Backpropagation Algorithm ….

Page 12: Machine learning for computer vision part 2


Backpropagation Algorithm

Assume we have:

– input examples d = {1 … D}

• each is a pair {xd, td} = {input vector, target vector}

– node index n = {1 … N}

– weight wji connects node j → i

– input xji is the input on the connection node j → i

• corresponding weight = wji

– output error for node n is δn

• similar to (o – t)

[Figure: network diagram – input layer (input x), hidden layer, output layer (output vector Ok); node index {1 … N}]

Page 13: Machine learning for computer vision part 2


Backpropagation Algorithm

(1) Input example d

(2) Output layer error based on:

– difference between output and target (t − o)

– derivative of the sigmoid function

(3) Hidden layer error – proportional to the node's contribution to the output error

(4) Update weights wji
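A minimal numpy sketch of these four steps for a single training example and a single hidden layer (the matrix shapes, sigmoid units and learning rate eta = 0.1 are illustrative assumptions, not from the slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, t, W_hidden, W_output, eta=0.1):
    # (1) input example: forward pass through hidden and output layers
    h = sigmoid(W_hidden @ x)                      # hidden activations
    o = sigmoid(W_output @ h)                      # output activations
    # (2) output layer error: (t - o) times the sigmoid derivative o(1 - o)
    delta_out = (t - o) * o * (1.0 - o)
    # (3) hidden layer error: each hidden unit's share of the output error
    delta_hid = (W_output.T @ delta_out) * h * (1.0 - h)
    # (4) update weights in proportion to the error and the incoming input
    W_output += eta * np.outer(delta_out, h)
    W_hidden += eta * np.outer(delta_hid, x)
    return W_hidden, W_output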

Page 14: Machine learning for computer vision part 2


Backpropagation

Termination criteria:

– number of iterations reached

– or error below a suitable bound

Output layer error

Hidden layer error

All weights updated using the relevant error

Page 15: Machine learning for computer vision part 2


Backpropagation

[Figure: network diagram – input layer (input x), hidden layer (unit h), output layer (unit k), output vector Ok]

Page 16: Machine learning for computer vision part 2


Backpropagation

δh is expressed as a weighted sum of the output layer errors δk to which it contributes (i.e. whk > 0)

[Figure: network diagram – input layer (input x), hidden layer (unit h), output layer (unit k), output vector Ok]

Page 17: Machine learning for computer vision part 2


Backpropagation

Error is propagated backwards from the network output … to the weights of the output layer … to the weights of the hidden layer …

Hence the name: backpropagation

[Figure: network diagram – input layer (input x), hidden layer (unit h), output layer (unit k), output vector Ok]

Page 18: Machine learning for computer vision part 2


Backpropagation

Repeat these stages for every hidden layer in a multi-layer network: (using error δi where xji > 0)

[Figure: network diagram – input layer (input x), multiple hidden layers (unit h), output layer (unit k), output vector Ok]

Page 19: Machine learning for computer vision part 2


Backpropagation

Error is propagated backwards from the network output … to the weights of the output layer … over the weights of all N hidden layers …

Hence the name: backpropagation

[Figure: network diagram – input layer (input x), multiple hidden layers (unit h), output layer (unit k), output vector Ok]

Page 20: Machine learning for computer vision part 2


Backpropagation

Will perform gradient descent over the weight space of {wji} for all connections i → j in the network

Stochastic gradient descent – as updates are based on training one sample at a time

Page 21: Machine learning for computer vision part 2


Understanding (and believing) the SVM stuff ….

Page 22: Machine learning for computer vision part 2


Remedial Note: equations of 2D lines

Line: w⃗ ⋅ x⃗ + b = 0

where: w⃗ and x⃗ are 2D vectors.

b : offset from origin

w⃗ : normal to line

2D LINES REMINDER

Page 23: Machine learning for computer vision part 2


Remedial Note: equations of 2D lines

http://www.mathopenref.com/coordpointdisttrig.html

2D LINES REMINDER

Page 24: Machine learning for computer vision part 2


Remedial Note: equations of 2D lines

For a defined line equation (w⃗ and b fixed): insert the point into the equation …

Result is +ve if the point is on this side of the line (i.e. > 0).

Result is −ve if the point is on the other side of the line (i.e. < 0).

Result is the (+ve or −ve) distance of the point x⃗ from the line, given by (w⃗ ⋅ x⃗ + b) / ‖w⃗‖, i.e. the result scaled by the length of the normal w⃗.

2D LINES REMINDER
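A small numpy check of this signed distance (a sketch only; the particular line and test points are made up):

import numpy as np

w = np.array([0.0, 1.0])   # normal to the line (here: the horizontal line y = 1)
b = -1.0                   # the line is w.x + b = 0

def signed_distance(p):
    # positive on the side the normal points to, negative on the other side
    return (w @ p + b) / np.linalg.norm(w)

print(signed_distance(np.array([3.0, 4.0])))   #  3.0 : above the line
print(signed_distance(np.array([3.0, 0.0])))   # -1.0 : below the line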

Page 25: Machine learning for computer vision part 2


Linear Separator

Instances (i.e. examples) {xi, yi}

– xi = point in instance space (Rn) made up of n attributes

– yi = class value for classification of xi

Want a linear separator. Can view this as a constraint satisfaction problem:

w⃗ ⋅ x⃗i + b ≥ +1 for yi = +1, and w⃗ ⋅ x⃗i + b ≤ −1 for yi = −1

Equivalently, yi (w⃗ ⋅ x⃗i + b) ≥ 1 for all i

[Figure: 2D scatter of the two classes, regions labelled y = +1 and y = −1]

Classification of example: function f(x) = y ∈ {+1, −1}, i.e. 2 classes

N.B. we have a vector of weight coefficients w⃗

Page 26: Machine learning for computer vision part 2


Linear Separator

If we define the distance of the nearest point to the margin as 1

→ width of the margin is 2 / ‖w⃗‖ (i.e. equal width each side)

We thus want to maximize 2 / ‖w⃗‖, finding the parameters w⃗ and b


Page 27: Machine learning for computer vision part 2


which is equivalent to minimizing: ½ ‖w⃗‖²

Page 28: Machine learning for computer vision part 2


… back to main slides

Page 29: Machine learning for computer vision part 2


So ….

Find the “hyperplane” (i.e. boundary) with:

a) maximum margin

b) minimum number of (training) examples on the wrong side of the chosen boundary

(i.e. minimal penalties due to C)

Solve via optimization (in polynomial time/complexity)
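As a hedged illustration of this trade-off (scikit-learn is not part of the slides, and the two-cluster data is made up): C penalises training examples that end up on the wrong side of the margin, and the fitted model exposes the hyperplane as a weight vector and offset.

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 2, rng.randn(50, 2) + 2])   # two noisy clusters
y = np.array([-1] * 50 + [+1] * 50)

clf = SVC(kernel='linear', C=1.0)   # smaller C tolerates more margin violations
clf.fit(X, y)
print(clf.coef_, clf.intercept_)    # w and b of the learned hyperplane
print(clf.score(X, y))              # fraction of training examples classified correctly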

Page 30: Machine learning for computer vision part 2


Example:

Non-linear separation (red / blue data items on a 2D plane).

Kernel projection to a higher-dimensional space.

Find the hyperplane separator (a plane in 3D) via optimization.

Non-linear boundary in the original dimension (e.g. a circle in 2D) defined by the planar boundary (cut) in 3D.
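A scikit-learn sketch of the same idea (assumed library, synthetic data): concentric circles cannot be separated by a line in 2D, but an RBF kernel handles them.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# two concentric rings of points: not linearly separable in the original 2D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel='linear').fit(X, y)
rbf = SVC(kernel='rbf', gamma='scale').fit(X, y)
print('linear kernel accuracy:', linear.score(X, y))   # roughly chance level
print('RBF kernel accuracy:', rbf.score(X, y))          # close to 1.0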

Page 31: Machine learning for computer vision part 2


.... but it is all about the data!

Page 32: Machine learning for computer vision part 2


Desirable Data Properties

Machine learning is a data-driven approach – the data is important! Ideally, the training/testing data used for learning must be:

– Unbiased • towards any given subset of the space of examples ...

– Representative • of the “real-world” data to be encountered in use/deployment

– Accurate • inaccuracies in the training/testing data produce inaccurate results

– Available • the more training/testing data available, the better the results

• greater confidence in the results can be achieved

Page 33: Machine learning for computer vision part 2


Data Training Methodologies

Simple approach: Data Splits

– split the overall data set into separate training and test sets • no established rule, but 80%:20%, 70%:30% or ⅔:⅓ training-to-testing splits are common

– Train on one set, test on the other

– Test error = error on the test set

– Training error = error on the training set

– Weakness: susceptible to bias in the data sets or “over-fitting” • also leaves less data available for training
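A minimal scikit-learn sketch of an 80%:20% split (the library and the iris data set are assumptions used only for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print('training error:', 1 - clf.score(X_train, y_train))
print('test error:', 1 - clf.score(X_test, y_test))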

Page 34: Machine learning for computer vision part 2


Data Training Methodologies

More advanced (and more robust): k-fold cross-validation

– Randomly split (all) the data into k subsets

– For 1 to k

• train using all the data not in the kth subset

• test the resulting learned [classifier | function …] using the kth subset

– report the mean error over all k tests
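The same procedure with k = 5 in scikit-learn (an assumed library; cross_val_score runs the k train/test rounds):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)   # k = 5 folds
print('mean error over the k folds:', 1 - scores.mean())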

Page 35: Machine learning for computer vision part 2


Key Summary Statistics #1

tp = true positive / tn = true negative

fp = false positive / fn = false negative

Often quoted or plotted when comparing ML techniques
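The extracted slide does not show which statistics were plotted, but the usual ones built from these four counts can be computed directly (a sketch with made-up counts):

def summary_stats(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # fraction of all decisions that were correct
    precision = tp / (tp + fp)                   # of the predicted positives, how many were right
    recall = tp / (tp + fn)                      # of the real positives, how many were found
    return accuracy, precision, recall

print(summary_stats(tp=40, tn=45, fp=5, fn=10))   # (0.85, 0.888..., 0.8)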

Page 36: Machine learning for computer vision part 2


Kappa Statistic

A measure of agreement for the classification of “N items into C mutually exclusive categories”

Pr(a) = probability of success of classification (= accuracy)

Pr(e) = probability of success due to chance

– e.g. 2 categories = 50% (0.5), 3 categories = 33% (0.33) ….. etc.

– Pr(e) can be replaced with Pr(b) to measure agreement between classifiers/techniques a and b

[Cohen, 1960]
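The statistic itself is kappa = (Pr(a) − Pr(e)) / (1 − Pr(e)) [Cohen, 1960]; a minimal Python sketch with made-up numbers:

def kappa(pr_a, pr_e):
    # Cohen's kappa: classification success corrected for chance success
    return (pr_a - pr_e) / (1.0 - pr_e)

# a 2-category classifier that is 90% accurate, chance success Pr(e) = 0.5
print(kappa(0.90, 0.5))   # 0.8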