Machine Learning for Computer Vision: Part 2
DESCRIPTION
From the BMVA Summer School 2014. Watch the lecture here: https://www.youtube.com/watch?v=G6hf6YbPA_s
TRANSCRIPT
The bits the whirlwind tour left out ...
BMVA Summer School 2014 – extra background slides
Machine Learning
Definition:
– “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
[Mitchell, 1997]
Algorithm to construct decision trees ….
Building Decision Trees – ID3

node = root of tree
Main loop:
– A = “best” decision attribute for next node
– .....

But which attribute is best to split on?
Entropy in machine learning

Entropy: a measure of impurity
– S is a sample of training examples
– p⊕ is the proportion of positive examples in S
– p⊖ is the proportion of negative examples in S

Entropy measures the impurity of S:

$\mathrm{Entropy}(S) = -p_{\oplus} \log_2 p_{\oplus} - p_{\ominus} \log_2 p_{\ominus}$
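As a quick illustration (not from the slides), a minimal Python sketch of this two-class entropy; the function name and the choice of log base 2 are mine:

```python
import math

def entropy_two_class(n_pos, n_neg):
    """Entropy of a sample S with n_pos positive and n_neg negative examples."""
    total = n_pos + n_neg
    result = 0.0
    for count in (n_pos, n_neg):
        p = count / total
        if p > 0:                     # convention: 0 * log(0) = 0
            result -= p * math.log2(p)
    return result

print(entropy_two_class(5, 5))    # 1.0 -> maximally impure (50/50 split)
print(entropy_two_class(10, 0))   # 0.0 -> pure sample
```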
Information Gain – reduction in Entropy

Gain(S, A) = expected reduction in entropy due to splitting on attribute A
– i.e. expected reduction in impurity in the data
– (improvement in consistent data sorting)
Information Gain – reduction in Entropy
– reduction in entropy in the set of examples S if split on attribute A
– S_v = subset of S for which attribute A has value v
– Gain(S, A) = original entropy minus the weighted sum of the entropies of the sub-nodes created by splitting on A:

$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)$
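A sketch of this gain computation in Python (illustrative only; it assumes examples are (attribute-dictionary, label) pairs, a representation chosen here for convenience):

```python
import math
from collections import defaultdict

def entropy(labels):
    """Multi-class entropy of a list of class labels."""
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Gain(S, A): entropy of S minus the weighted entropy of the subsets S_v."""
    subsets = defaultdict(list)
    for x, y in examples:
        subsets[x[attribute]].append(y)        # group labels by the value v of attribute A
    total = len(examples)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy([y for _, y in examples]) - remainder
```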
Information Gain – reduction in Entropy

Information Gain:
– “information provided about the target function given the value of some attribute A”
– How well does A sort the data into the required classes?

Generalise to c classes (not just ⊕ or ⊖):

$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$
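A quick worked example (my own numbers, not from the slides): a sample with 9 positive and 5 negative examples has $\mathrm{Entropy}(S) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940$; a pure sample has entropy 0, and an even 7/7 split has entropy 1.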
Building Decision Trees – Selecting the Next Attribute
– which attribute should we split on next?
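One plausible way to put the pieces together, reusing the information_gain sketch above: score every remaining attribute and split on the highest-gain one, recursing until a node is pure. The helper names and the nested-dictionary tree representation are my own, not the lecture's:

```python
def best_attribute(examples, attributes):
    """Pick the attribute with the highest information gain on these examples."""
    return max(attributes, key=lambda a: information_gain(examples, a))

def id3(examples, attributes):
    """Minimal recursive ID3 sketch: returns a class label or a nested decision dict."""
    labels = [y for _, y in examples]
    if len(set(labels)) == 1:                   # pure node: stop
        return labels[0]
    if not attributes:                          # nothing left to split on: majority vote
        return max(set(labels), key=labels.count)
    a = best_attribute(examples, attributes)
    tree = {a: {}}
    for v in set(x[a] for x, _ in examples):    # one branch per value of the chosen attribute
        subset = [(x, y) for x, y in examples if x[a] == v]
        tree[a][v] = id3(subset, [b for b in attributes if b != a])
    return tree
```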
Backpropagation Algorithm ….
Backpropagation Algorithm

Assume we have:
– input examples d = {1 ... D}
  • each is a pair {x_d, t_d} = {input vector, target vector}
– node index n = {1 … N}
– weight w_ji on the connection from node i to node j
– input x_ji is the input on the connection from node i to node j
  • corresponding weight = w_ji
– output error for node n is δ_n
  • similar to (o – t)

[Figure: network with an input layer (input x), a hidden layer and an output layer (output vector O_k); nodes indexed {1 … N}]
Backpropagation Algorithm

(1) Input example d

(2) Output layer error based on:
– the difference between output and target (t – o)
– the derivative of the sigmoid function

(3) Hidden layer error
– proportional to the node's contribution to the output error

(4) Update weights w_ji
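A minimal numpy sketch of one such update for a single-hidden-layer sigmoid network (an illustration of the four steps above, not the lecture's code; the names W_hidden, W_output and eta, and the omission of bias terms, are my own simplifications):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_update(x, t, W_hidden, W_output, eta=0.1):
    """One stochastic backpropagation update.

    x: input vector, t: target vector,
    W_hidden: (n_hidden, n_inputs) weights, W_output: (n_outputs, n_hidden) weights.
    """
    # (1) Forward pass for example (x, t)
    h = sigmoid(W_hidden @ x)                   # hidden layer activations
    o = sigmoid(W_output @ h)                   # network outputs

    # (2) Output layer error: (t - o) scaled by the sigmoid derivative o(1 - o)
    delta_k = (t - o) * o * (1.0 - o)

    # (3) Hidden layer error: each unit's share of the output error (via the weights
    #     it feeds), scaled by its own sigmoid derivative h(1 - h)
    delta_h = (W_output.T @ delta_k) * h * (1.0 - h)

    # (4) Update the weights: w_ji <- w_ji + eta * delta_j * x_ji
    W_output += eta * np.outer(delta_k, h)
    W_hidden += eta * np.outer(delta_h, x)
    return W_hidden, W_output
```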
Backpropagation

Termination criteria:
– number of iterations reached
– or error below a suitable bound

Each iteration: compute the output layer error, then the hidden layer error, and update all weights using the relevant error.
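A short training-loop sketch around the backprop_update function above, showing both termination criteria (again illustrative; max_iters and err_bound are names I have chosen):

```python
def train(examples, W_hidden, W_output, eta=0.1, max_iters=1000, err_bound=1e-3):
    """Repeat stochastic updates until the iteration limit or the error bound is hit."""
    for _ in range(max_iters):                          # criterion 1: iteration count
        total_err = 0.0
        for x, t in examples:
            W_hidden, W_output = backprop_update(x, t, W_hidden, W_output, eta)
            o = sigmoid(W_output @ sigmoid(W_hidden @ x))
            total_err += 0.5 * np.sum((t - o) ** 2)     # squared error on this example
        if total_err < err_bound:                       # criterion 2: error below bound
            break
    return W_hidden, W_output
```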
Backpropagation

Output layer error for output unit k (difference between output and target, scaled by the sigmoid derivative):

$\delta_k = o_k (1 - o_k)(t_k - o_k)$

[Network diagram: input x, hidden layer unit h, output layer unit k, output vector O_k]
Backpropagation

δ_h is expressed as a weighted sum of the output layer errors δ_k to which it contributes (i.e. w_kh > 0):

$\delta_h = o_h (1 - o_h) \sum_{k \in \mathrm{outputs}} w_{kh}\, \delta_k$
Backpropagation

Error is propagated backwards from the network output ... to the weights of the output layer ... to the weights of the hidden layer …
Hence the name: backpropagation
Backpropagation

Repeat these stages for every hidden layer in a multi-layer network (using error δ_i where x_ji > 0)
.......
Backpropagation

Error is propagated backwards from the network output ... to the weights of the output layer ... over the weights of all N hidden layers …
Hence the name: backpropagation
.......
Backpropagation

Will perform gradient descent over the weight space of {w_ji} for all connections i → j in the network

Stochastic gradient descent
– as updates are based on training one sample at a time
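For reference, the per-example weight update this corresponds to (the standard rule from Mitchell, 1997; $\eta$ is the learning rate, which the slides do not name explicitly):

$\Delta w_{ji} = \eta\, \delta_j\, x_{ji}$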
Understanding (and believing) the SVM stuff ….
Remedial Note: equations of 2D lines

Line:  $\vec{w} \cdot \vec{x} + b = 0$
where $\vec{w}$ and $\vec{x}$ are 2D vectors.
– $b$ : offset from the origin
– $\vec{w}$ : normal to the line

2D LINES REMINDER
Remedial Note: equations of 2D lines
http://www.mathopenref.com/coordpointdisttrig.html
2D LINES REMINDER
Remedial Note: equations of 2D lines

For a defined line equation (fixed $\vec{w}$ and $b$), insert a point $\vec{x}$ into the equation:

$f(\vec{x}) = \vec{w} \cdot \vec{x} + b$

– Result is +ve if the point is on one side of the line (i.e. > 0).
– Result is -ve if the point is on the other side of the line (i.e. < 0).
– Result is the distance (+ve or -ve) of the point from the line, given by $\frac{\vec{w} \cdot \vec{x} + b}{\|\vec{w}\|}$ (equal to $f(\vec{x})$ when $\|\vec{w}\| = 1$).

($\vec{w}$ is the normal to the line.)

2D LINES REMINDER
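A small Python sketch of this point-to-line test (my own function and variable names; not from the slides):

```python
import numpy as np

def signed_distance(w, b, x):
    """Signed distance of point x from the line w . x + b = 0.

    Positive on the side the normal w points towards, negative on the other.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return (np.dot(w, x) + b) / np.linalg.norm(w)

# Example: the line x + y - 1 = 0, i.e. w = [1, 1], b = -1
print(signed_distance([1, 1], -1, [1, 1]))   # ~ +0.707 : on the +ve side
print(signed_distance([1, 1], -1, [0, 0]))   # ~ -0.707 : on the -ve side
```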
Linear Separator

Instances (i.e. examples) {x_i, y_i}
– x_i = point in instance space (R^n) made up of n attributes
– y_i = class value for the classification of x_i

Want a linear separator. Can view this as a constraint satisfaction problem:

$\vec{w} \cdot \vec{x}_i + b \geq +1$ if $y_i = +1$
$\vec{w} \cdot \vec{x}_i + b \leq -1$ if $y_i = -1$

Equivalently,  $y_i (\vec{w} \cdot \vec{x}_i + b) \geq 1$

Classification of example: function f(x) = y = {+1, -1}, i.e. 2 classes
N.B. we have a vector of weight coefficients $\vec{w}$

[Figure: 2D points of the two classes (y = +1, y = -1) separated by a linear boundary]
Linear Separator

If we define the distance of the nearest point to the separating boundary as 1
→ width of margin is $\frac{2}{\|\vec{w}\|}$ (i.e. equal width on each side)

We thus want to maximize $\frac{2}{\|\vec{w}\|}$, finding the parameters $\vec{w}$ and $b$,
which is equivalent to minimizing $\frac{1}{2}\|\vec{w}\|^2$ subject to the constraints $y_i(\vec{w} \cdot \vec{x}_i + b) \geq 1$.
…............. back to main slides
So ….
Find the “hyperplane” (i.e. boundary) with:
a) maximum margin
b) minimum number of (training) examples on the wrong side of the chosen boundary
(i.e. minimal penalties due to C)
Solve via optimization (in polynomial time/complexity)
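As a concrete but purely illustrative sketch of this in practice, using scikit-learn's SVC rather than anything shown in the lecture (the toy data values are invented):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-class data in R^2 (hypothetical values, purely for illustration)
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],         # class +1
              [-1.0, -1.0], [-1.5, -2.0], [-2.0, -1.0]])  # class -1
y = np.array([+1, +1, +1, -1, -1, -1])

# C controls the penalty for examples on the wrong side of the margin:
# large C -> few violations tolerated, small C -> wider, softer margin.
clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)     # the learned w and b of the hyperplane
print(clf.predict([[0.5, 0.5]]))     # classify a new point -> +1 or -1
```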
Example:
– Non-linear separation (red / blue data items on a 2D plane)
– Kernel projection to a higher dimensional space
– Find the hyperplane separator (a plane in 3D) via optimization
– Non-linear boundary in the original dimension (e.g. a circle in 2D) defined by the planar boundary (cut) in 3D
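A matching illustrative sketch of the kernel idea, again with scikit-learn (the dataset and parameter values are my own choices): two concentric rings cannot be separated by a line in 2D, but an RBF kernel lets the SVM find a boundary that looks like a circle in the original space.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly projects the data into a higher-dimensional space
# where a separating hyperplane exists
clf = SVC(kernel='rbf', gamma=1.0, C=1.0)
clf.fit(X, y)
print(clf.score(X, y))   # training accuracy; close to 1.0 on this toy data
```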
.... but it is all about the data!
Desirable Data Properties

Machine learning is a data-driven approach. The data is important! Ideally, training/testing data used for learning must be:
– Unbiased
  • towards any given subset of the space of examples ...
– Representative
  • of the “real-world” data to be encountered in use/deployment
– Accurate
  • inaccuracies in training/testing data produce inaccurate results
– Available
  • the more training/testing data available, the better the results
  • greater confidence in the results can be achieved
Data Training Methodologies

Simple approach: Data Splits
– split the overall data set into separate training and test sets
  • No established rule, but 80%:20%, 70%:30% or ⅔:⅓ training-to-testing splits are common
– Train on one, test on the other
– Test error = error on the test set
– Training error = error on the training set
– Weakness: susceptible to bias in the data sets or “over-fitting”
  • Also less data available for training
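A minimal sketch of such a split with scikit-learn (illustrative; the generated data and the SVC learner are stand-ins for whatever model is being evaluated):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data for a real training set
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hold out 20% of the data for testing (an 80%:20% training-to-testing split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel='linear', C=1.0).fit(X_train, y_train)
print('training error:', 1.0 - clf.score(X_train, y_train))
print('test error:    ', 1.0 - clf.score(X_test, y_test))
```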
Data Training Methodologies

More advanced (and robust): k-fold Cross-Validation
– Randomly split (all) the data into k subsets
– For i = 1 to k:
  • train using all the data not in the i-th subset
  • test the resulting learned [classifier | function …] using the i-th subset
– Report the mean error over all k tests
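And a k-fold sketch of the same idea (again illustrative; KFold comes from scikit-learn and the SVC learner is just an example):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC

def k_fold_error(X, y, k=5):
    """Mean test error over k folds: train on k-1 subsets, test on the held-out one."""
    errors = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        clf = SVC(kernel='linear', C=1.0)
        clf.fit(X[train_idx], y[train_idx])
        errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))   # error = 1 - accuracy
    return float(np.mean(errors))
```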
Key Summary Statistics #1
tp = true positive / tn = true negative
fp = false positive / fn = false negative
Often quoted or plotted when comparing ML techniques
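For reference, the usual quantities derived from these four counts (standard definitions, not taken from the slide) are sketched below:

```python
def summary_stats(tp, tn, fp, fn):
    """Common summary statistics from the four outcome counts (assumes non-zero denominators)."""
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)        # fraction of positive predictions that are correct
    recall    = tp / (tp + fn)        # true positive rate / sensitivity
    fpr       = fp / (fp + tn)        # false positive rate (x-axis of a ROC curve)
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, fpr, f1
```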
Kappa Statistic

Measure of classification of “N items into C mutually exclusive categories”

Pr(a) = probability of success of classification (= accuracy)
Pr(e) = probability of success due to chance
– e.g. 2 categories = 50% (0.5), 3 categories = 33% (0.33) ….. etc.
– Pr(e) can be replaced with Pr(b) to measure agreement between classifiers/techniques a and b

[Cohen, 1960]
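Cohen's kappa is conventionally defined (the standard formula, not copied from the slide) as

$\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}$

so, for example, a classifier with 80% accuracy over 2 equally likely categories gives $\kappa = (0.8 - 0.5)/(1 - 0.5) = 0.6$.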