Machine Learning for Computer Vision – Part 2


DESCRIPTION

From BMVA Summer School 2014. Watch the lecture here: https://www.youtube.com/watch?v=G6hf6YbPA_s

TRANSCRIPT

Machine Learning Extra : 1 – BMVA Summer School 2014

The bits the whirlwind tour left out ...

BMVA Summer School 2014 – extra background slides

Machine Learning Extra : 2 – BMVA Summer School 2014

Machine Learning

Definition:

– “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

[Mitchell, 1997]

Machine Learning Extra : 3 – BMVA Summer School 2014

Algorithm to construct decision trees ….

Machine Learning Extra : 4 – BMVA Summer School 2014

Building Decision Trees – ID3

node = root of tree

Main loop:
– A = “best” decision attribute for the next node
– .....

But which attribute is best to split on?

Machine Learning Extra : 5 – BMVA Summer School 2014

Entropy in machine learning

Entropy : a measure of impurity

– S is a sample of training examples

– p⊕ is the proportion of positive examples in S

– p⊖ is the proportion of negative examples in S

Entropy measures the impurity of S:

Entropy(S) = −p⊕ log₂ p⊕ − p⊖ log₂ p⊖
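As a quick numeric illustration of this two-class formula, a minimal Python sketch (the example counts are invented for illustration, not taken from the slides):

```python
import math

def entropy(p_pos: float, p_neg: float) -> float:
    """Two-class entropy with proportions p_pos and p_neg (0 log 0 treated as 0)."""
    total = 0.0
    for p in (p_pos, p_neg):
        if p > 0:
            total -= p * math.log2(p)
    return total

# e.g. a sample S with 9 positive and 5 negative examples
print(entropy(9 / 14, 5 / 14))  # ~0.940 (impure)
print(entropy(0.5, 0.5))        # 1.0  (maximally impure)
print(entropy(1.0, 0.0))        # 0.0  (pure)
```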

Machine Learning Extra : 6 – BMVA Summer School 2014

Information Gain – reduction in Entropy

Gain(S, A) = expected reduction in entropy due to splitting on attribute A
– i.e. expected reduction in impurity in the data
– (an improvement in how consistently the data is sorted)

Machine Learning Extra : 7 – BMVA Summer School 2014

Information Gain – reduction in Entropy

– reduction in entropy of the set of examples S if split on attribute A

– S_v = subset of S for which attribute A has value v

– Gain(S, A) = Entropy(S) − ∑_{v ∈ Values(A)} ( |S_v| / |S| ) Entropy(S_v)
  (original entropy minus the weighted sum of the entropies of the sub-nodes created by splitting on A)

Machine Learning Extra : 8 – BMVA Summer School 2014

Information Gain – reduction in Entropy

Information Gain:
– “information provided about the target function given the value of some attribute A”
– How well does A sort the data into the required classes?

Generalise to c classes:
– (not just ⊕ or ⊖)

Entropy(S) = −∑_{i=1}^{c} p_i log₂ p_i
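A small Python sketch of these two quantities for an arbitrary number of classes (the toy attributes and labels below are invented for illustration, not the slides' worked example):

```python
from collections import Counter
import math

def entropy(labels):
    """Entropy(S) = -sum_i p_i log2 p_i over the c classes present in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(ex[attribute] for ex in examples):
        subset = [y for ex, y in zip(examples, labels) if ex[attribute] == v]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# toy data: which attribute best sorts the labels?
examples = [{"outlook": "sun", "windy": True}, {"outlook": "sun", "windy": False},
            {"outlook": "rain", "windy": True}, {"outlook": "rain", "windy": False}]
labels = ["+", "+", "-", "-"]
print(information_gain(examples, labels, "outlook"))  # 1.0 (perfect split)
print(information_gain(examples, labels, "windy"))    # 0.0 (no information)
```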

Machine Learning Extra : 9 – BMVA Summer School 2014

Building Decision Trees – Selecting the Next Attribute

– which attribute should we split on next?

Machine Learning Extra : 10 – BMVA Summer School 2014

Building Decision Trees – Selecting the Next Attribute

– which attribute should we split on next?
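Putting the previous slides together, a minimal recursive ID3 sketch in Python (assumptions: discrete attributes, no pruning and no missing-value handling; the helper functions mirror the entropy/gain sketches above and are not the slide's elided pseudocode):

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    gain = entropy(labels)
    for v in set(ex[attribute] for ex in examples):
        subset = [y for ex, y in zip(examples, labels) if ex[attribute] == v]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def id3(examples, labels, attributes):
    """Return a nested-dict decision tree; leaves are class labels."""
    if len(set(labels)) == 1:                  # examples perfectly sorted -> leaf
        return labels[0]
    if not attributes:                         # nothing left to split on -> majority class
        return Counter(labels).most_common(1)[0][0]
    # A = "best" decision attribute for the next node (highest information gain)
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    for v in set(ex[best] for ex in examples):  # one branch per attribute value
        idx = [i for i, ex in enumerate(examples) if ex[best] == v]
        tree[best][v] = id3([examples[i] for i in idx],
                            [labels[i] for i in idx],
                            [a for a in attributes if a != best])
    return tree

examples = [{"outlook": "sun"}, {"outlook": "sun"}, {"outlook": "rain"}, {"outlook": "rain"}]
labels = ["+", "+", "-", "-"]
print(id3(examples, labels, ["outlook"]))  # e.g. {'outlook': {'sun': '+', 'rain': '-'}} (key order may vary)
```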

Machine Learning Extra : 11 – BMVA Summer School 2014

Backpropagation Algorithm ….

Machine Learning Extra : 12 – BMVA Summer School 2014

Backpropagation Algorithm

Assume we have:
– input examples d = {1 ... D}
  • each is a pair {x_d, t_d} = {input vector, target vector}
– node index n = {1 … N}
– weight w_ji on the connection from node i to node j
– input x_ji is the input on the connection from node i to node j
  • corresponding weight = w_ji
– output error for node n is δ_n
  • similar to (o – t)

(Diagram: input layer taking input x, a hidden layer, and an output layer producing output vector O_k; nodes indexed {1 … N})

Machine Learning Extra : 13 – BMVA Summer School 2014

Backpropagation Algorithm

(1) Input example d

(2) Output layer error based on:
– the difference between output and target (t − o)
– the derivative of the sigmoid function

(3) Hidden layer error: proportional to the node's contribution to the output error

(4) Update the weights w_ji
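A minimal numpy sketch of this per-example loop for a single hidden layer of sigmoid units (the learning rate, layer sizes, iteration count and XOR toy data are illustrative assumptions, not the lecture's values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
eta = 0.5                                           # learning rate (illustrative choice)
W_hid = rng.normal(0, 0.5, (4, 3))                  # 4 hidden units, 2 inputs + bias
W_out = rng.normal(0, 0.5, (1, 5))                  # 1 output unit, 4 hidden + bias

# toy training pairs {x_d, t_d}: XOR of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

def forward(x):
    xb = np.append(x, 1.0)                          # input plus constant bias term
    hb = np.append(sigmoid(W_hid @ xb), 1.0)        # hidden activations plus bias term
    return xb, hb, sigmoid(W_out @ hb)

for epoch in range(10000):                          # termination: fixed number of iterations
    for x, t in zip(X, T):                          # stochastic: update per training example
        xb, hb, o = forward(x)                      # (1) feed example d forward
        delta_k = o * (1 - o) * (t - o)             # (2) output error: sigmoid' * (t - o)
        delta_h = hb[:4] * (1 - hb[:4]) * (W_out[:, :4].T @ delta_k)  # (3) hidden error
        W_out += eta * np.outer(delta_k, hb)        # (4) update weights with relevant error
        W_hid += eta * np.outer(delta_h, xb)

print(np.array([forward(x)[2] for x in X]).round(2).ravel())  # should approach 0 1 1 0
```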

Machine Learning Extra : 14 – BMVA Summer School 2014

Backpropagation

Termination criteria:
– number of iterations reached
– or error below a suitable bound

Output layer error: δ_k = o_k (1 − o_k)(t_k − o_k)

Hidden layer error: δ_h = o_h (1 − o_h) ∑_k w_hk δ_k

All weights updated using the relevant error: w_ji ← w_ji + η δ_j x_ji

Machine Learning Extra : 15 – BMVA Summer School 2014

Backpropagation

(Diagram: input layer taking input x, hidden layer unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 16 – BMVA Summer School 2014

Backpropagation

δ_h is expressed as a weighted sum of the output layer errors δ_k to which it contributes (i.e. w_hk > 0)

(Diagram: input layer taking input x, hidden layer unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 17 – BMVA Summer School 2014

Backpropagation

Error is propagated backwards from the network output ... to the weights of the output layer ... to the weights of the hidden layer …

Hence the name: backpropagation

(Diagram: input layer taking input x, hidden layer unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 18 – BMVA Summer School 2014

Backpropagation

Repeat these stages for every hidden layer in a multi-layer network (using error δ_i where x_ji > 0)

(Diagram: input layer taking input x, one or more hidden layers with unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 19 – BMVA Summer School 2014

Backpropagation

Error is propagated backwards from the network output ... to the weights of the output layer ... over the weights of all N hidden layers …

Hence the name: backpropagation

(Diagram: input layer taking input x, N hidden layers with unit h, and output layer unit k producing output vector O_k)

Machine Learning Extra : 20 – BMVA Summer School 2014

Backpropagation

Will perform gradient descent over the weight space {w_ji} for all connections i → j in the network

Stochastic gradient descent – as the updates are based on training one sample at a time

Machine Learning Extra : 21 – BMVA Summer School 2014

Understanding (and believing) the SVM stuff ….

Machine Learning Extra : 22 – BMVA Summer School 2014

Remedial Note: equations of 2D lines

Line: w⃗ · x⃗ + b = 0

where w⃗ and x⃗ are 2D vectors:
– w⃗ is the normal to the line
– b sets the offset from the origin

2D LINES REMINDER

Machine Learning Extra : 23 – BMVA Summer School 2014

Remedial Note: equations of 2D lines

http://www.mathopenref.com/coordpointdisttrig.html

2D LINES REMINDER

Machine Learning Extra : 24 – BMVA Summer School 2014

Remedial Note: equations of 2D lines

For a defined line equation (fixed w⃗ and b), insert a point x⃗ into the equation w⃗ · x⃗ + b:

– the result is +ve if the point is on the side of the line that the normal w⃗ points towards ( > 0 )
– the result is -ve if the point is on the other side of the line ( < 0 )
– the result is the (signed) distance of the point from the line, given by ( w⃗ · x⃗ + b ) / ‖w⃗‖, which reduces to w⃗ · x⃗ + b for ‖w⃗‖ = 1

2D LINES REMINDER
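A tiny numpy check of this sign/distance behaviour (the particular line and test points are arbitrary illustrations):

```python
import numpy as np

w = np.array([1.0, 1.0])      # normal to the line
b = -1.0                      # offset term: the line is w.x + b = 0, i.e. x + y = 1

def signed_distance(p):
    """Positive on the side the normal points towards, negative on the other side."""
    return (w @ p + b) / np.linalg.norm(w)

print(signed_distance(np.array([2.0, 2.0])))   # > 0 : on the normal's side, ~2.12 away
print(signed_distance(np.array([0.0, 0.0])))   # < 0 : on the other side, ~0.71 away
print(signed_distance(np.array([0.5, 0.5])))   # = 0 : the point lies on the line
```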

Machine Learning Extra : 25 – BMVA Summer School 2014

Linear Separator

Instances (i.e. examples) {x_i , y_i}
– x_i = point in instance space (Rⁿ) made up of n attributes
– y_i = class value for the classification of x_i

Classification of an example: function f(x) = y ∈ {+1, −1}, i.e. 2 classes

Want a linear separator. Can view this as a constraint satisfaction problem:
– w⃗ · x⃗_i + b ≥ +1 for every example with y_i = +1
– w⃗ · x⃗_i + b ≤ −1 for every example with y_i = −1

Equivalently: y_i ( w⃗ · x⃗_i + b ) ≥ 1 for all i

N.B. we have a vector of weight coefficients w⃗

Machine Learning Extra : 26 – BMVA Summer School 2014

Linear Separator

If we define the distance of the nearest point to the boundary as 1
→ the width of the margin is 2 / ‖w⃗‖ (i.e. equal width on each side)

We thus want to maximize 2 / ‖w⃗‖, finding the parameters w⃗ and b

(Diagram: the two classes y = +1 and y = −1 either side of the margin; classification of an example: function f(x) = y ∈ {+1, −1}, i.e. 2 classes)
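As a small numeric check of the constraints and the margin width above (a sketch; the separator w⃗, b and the four points are invented for illustration):

```python
import numpy as np

# a candidate separator w.x + b = 0 for some 2D training data
w = np.array([2.0, 0.0])
b = -2.0
X = np.array([[2.0, 1.0], [3.0, 0.0],    # y = +1 examples
              [0.0, 1.0], [-1.0, 2.0]])  # y = -1 examples
y = np.array([+1, +1, -1, -1])

margins = y * (X @ w + b)                 # y_i (w . x_i + b), one value per example
print(margins)                            # all >= 1 -> the constraints are satisfied
print(np.all(margins >= 1))               # True for this separator
print(2.0 / np.linalg.norm(w))            # margin width 2 / ||w|| = 1.0
```

Maximizing 2 / ‖w⃗‖ over all (w⃗, b) that satisfy these constraints gives the maximum-margin separator.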

Machine Learning Extra : 27 – BMVA Summer School 2014

which is equivalent to minimizing ½ ‖w⃗‖², subject to y_i ( w⃗ · x⃗_i + b ) ≥ 1 for all i

Machine Learning Extra : 28 – BMVA Summer School 2014

…............. back to main slides

Machine Learning Extra : 29 – BMVA Summer School 2014

So ….

Find the “hyperplane” (i.e. boundary) with:

a) maximum margin

b) minimum number of (training) examples on the wrong side of the chosen boundary

(i.e. minimal penalties due to C)

Solve via optimization (in polynomial time/complexity)

Machine Learning Extra : 30 – BMVA Summer School 2014

Example:

– Non-linear separation (red / blue data items on a 2D plane)
– Kernel projection to a higher-dimensional space
– Find the hyperplane separator (a plane in 3D) via optimization
– Non-linear boundary in the original dimension (e.g. a circle in 2D) defined by the planar boundary (cut) in 3D
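A brief scikit-learn sketch of this kernel idea (assuming scikit-learn is available; the concentric-circles dataset and the RBF kernel are illustrative choices, not the lecture's example):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# data not linearly separable in 2D: one class forms a circle inside the other
X, y = make_circles(n_samples=400, factor=0.4, noise=0.08, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

linear = SVC(kernel="linear", C=1.0).fit(X_train, y_train)            # line in the original space
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)   # kernel projection

print("linear kernel accuracy:", linear.score(X_test, y_test))  # poor: no separating line exists
print("RBF kernel accuracy:   ", rbf.score(X_test, y_test))     # near 1.0: non-linear boundary
```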

Machine Learning Extra : 31 – BMVA Summer School 2014

.... but it is all about the data!

Machine Learning Extra : 32 – BMVA Summer School 2014

Desirable Data Properties

Machine learning is a data-driven approach – the data is important! Ideally, the training/testing data used for learning must be:

– Unbiased • towards any given subset of the space of examples ...

– Representative • of the “real-world” data to be encountered in use/deployment

– Accurate • inaccuracies in training/testing produce inaccurate results

– Available • the more training/testing data available, the better the results

• greater confidence in the results can be achieved

Machine Learning Extra : 33 – BMVA Summer School 2014

Data Training Methodologies

Simple approach: Data Splits

– split the overall data set into separate training and test sets
  • no established rule, but 80%:20%, 70%:30% or ⅔:⅓ training-to-testing splits are common

– train on one, test on the other

– Test error = error on the test set

– Training error = error on the training set

– Weakness: susceptible to bias in the data sets or “over-fitting”
  • also leaves less data available for training
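For example, a minimal train/test split sketch (assuming scikit-learn; the iris data, the 70%:30% ratio and the decision-tree classifier are arbitrary illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 70%:30% training-to-testing split of the overall data set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("training error:", 1.0 - clf.score(X_train, y_train))  # error on the training set
print("test error:    ", 1.0 - clf.score(X_test, y_test))    # error on the unseen test set
```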

Machine Learning Extra : 34 – BMVA Summer School 2014

Data Training Methodologies

More advanced (and robust): K-fold Cross Validation

– Randomly split (all) the data into k subsets

– For each of the k subsets in turn:
  • train using all the data not in that subset
  • test the resulting learned [classifier | function …] using that subset

– report the mean error over all k tests
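A short sketch of the same procedure (assuming scikit-learn; k = 5 and the decision-tree classifier are illustrative choices):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
errors = []

# randomly split (all) the data into k subsets, then train/test k times
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])                           # train outside this subset
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))      # test on the held-out subset

print("mean error over all k tests:", np.mean(errors))
```

scikit-learn's cross_val_score wraps this loop in a single call.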

Machine Learning Extra : 35 – BMVA Summer School 2014

Key Summary Statistics #1

tp = true positive / tn = true negative

fp = false positive / fn = false negative

Often quoted or plotted when comparing ML techniques
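The usual summary statistics are simple functions of these four counts; a minimal sketch of some commonly used ones (accuracy, precision, recall, F1 and the false positive rate). Which particular statistics appeared on the original slide is not recorded in this transcript, so the selection below is illustrative:

```python
def summary_stats(tp, tn, fp, fn):
    """Common summary statistics derived from the four outcome counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of the predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of the actual positives, how many were found (TPR)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0  # used, with TPR, for ROC curves
    return accuracy, precision, recall, f1, false_positive_rate

print(summary_stats(tp=40, tn=45, fp=5, fn=10))  # approx (0.85, 0.889, 0.8, 0.842, 0.1)
```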

Machine Learning Extra : 36 – BMVA Summer School 2014

Kappa Statistic

Measure of classification of “N items into C mutually exclusive categories”

κ = ( Pr(a) − Pr(e) ) / ( 1 − Pr(e) )

Pr(a) = probability of success of the classification ( = accuracy)
Pr(e) = probability of success due to chance

– e.g. 2 categories = 50% (0.5), 3 categories = 33% (0.33) ….. etc.

– Pr(e) can be replaced with Pr(b) to measure agreement between two classifiers/techniques a and b

[Cohen, 1960]
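A small sketch of the statistic defined above (the example numbers are invented; Pr(e) here follows the slide's simple 1/C chance rate rather than the marginal-frequency estimate in Cohen's original formulation):

```python
def kappa(pr_a: float, pr_e: float) -> float:
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    return (pr_a - pr_e) / (1.0 - pr_e)

# e.g. a 2-category classifier with 90% accuracy, chance success rate 0.5
print(kappa(pr_a=0.90, pr_e=0.5))    # 0.8
# e.g. a 3-category classifier with 60% accuracy, chance success rate 1/3
print(kappa(pr_a=0.60, pr_e=1/3))    # 0.4
```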
