Data Mining - Neural Networks
Dr. Jean-Michel RICHER
Dr. Jean-Michel RICHER Data Mining - Neural Networks 1 / 79
Outline
1. Introduction
2. History and working principle
3. Improvements of NN
4. How to learn with a NN?
5. Backpropagation example
6. Interesting links and applications
1. Introduction
What we will cover
- basics of Artificial Neural Networks
- the perceptron
- the multi-layer network
- the sigmoid function
- backpropagation
- Synaptic.js
2. History and working principle
ANN
Artificial Neural Networks
- NNs, ANNs or Connectionist Systems are computing systems inspired by the biological neural networks that constitute animal brains
- they are based on a collection of connected units or nodes called artificial neurons
- they try to model how neurons in the brain function
- such systems learn, or progressively improve their performance, by considering examples (training phase)

Note: strong vs weak AI; is intelligence just calculation?
ANN
Specific Artificial Neural Networks
- for image recognition: the Convolutional Neural Network (CNN or ConvNet), a variation of multilayer perceptrons designed to require minimal preprocessing
- for speech recognition: the Time Delay Neural Network (TDNN)
What can you do with a NN?

A first example: MNIST
- the MNIST database of handwritten digits of 28 × 28 pixels
- 784 inputs and 10 outputs
- a database of 60,000 training examples and a test set of 10,000
- smallest error rate of 0.35% with a 6-layer NN (Ciresan et al., 2010)
- smallest error rate of 0.23% with a Convolutional Network (Ciresan et al., 2012)
ANN
McCulloch and Pitts, 1943
- Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, explained the complex decision processes in a brain using a linear threshold gate
- the gate takes a sum and returns 0 if the result is below the threshold, and 1 otherwise
- very simple: binary inputs and outputs, threshold step activation function, no weighting of inputs
ANN
Donald O. Hebb, 1949
- the Hebbian rule is the basis of nearly all neural learning procedures
- the connection between two neurons is strengthened when both neurons are active at the same time
- this change in strength is proportional to the product of the two activities
- introduces the use of weights
ANN
Rosenblatt, 1958
- Frank Rosenblatt, a psychologist at Cornell, was working on understanding the comparatively simpler decision systems present in the eye of a fly, which underlie and determine its flee response
- he proposed the idea of the Perceptron (Mark I Perceptron)
- an algorithm for pattern recognition
- a simple input-output relationship, modeled on a McCulloch-Pitts neuron
- perceptron learning: weights are adjusted only when a pattern is misclassified
ANN
Bernard Widrow, Marcian E. Hoff, 1960
- Professor Widrow and his student Hoff introduced ADALINE (ADAptive LInear NEuron)
- a fast and precise adaptive learning system: the least mean squares (LMS) filter
- delta rule: minimizes the output error using (approximate) gradient descent
- found in nearly every analog telephone for real-time adaptive echo filtering

Note: Hoff received his master's degree from Stanford University in 1959 and his PhD in 1962; he is regarded as a father of the microprocessor at Intel
ANN
Minsky and Papert, 1969
- Marvin Minsky and Seymour Papert led a campaign to discredit neural network research
- they argued that all neural networks suffer from the same fatal flaw as the perceptron (the inability to learn XOR)
- they left the impression that neural network research was a dead end

Note: Minsky (MIT) is known for co-founding the field of AI; Papert (MIT) developed the Logo programming language
ANN
Paul Werbos, 1974
- in 1974 developed the back-propagation learning method, although its importance wasn't fully appreciated until 1986
- it accelerates the training of multi-layer networks
- an input vector is applied to the network and propagated forward from the input layer to the hidden layer, and then to the output layer
- an error value is then calculated, using the desired output and the actual output, for each output neuron in the network
- the error value is propagated backward through the weights of the network, beginning with the output neurons, through the hidden layer and to the input layer
ANN
Geoffrey Hinton, David Rumelhart, Ronald Williams, 1986
- Backpropagation: repeatedly adjust the weights so as to minimize the difference between the actual output and the desired output
- Hidden layers: neuron nodes stacked between inputs and outputs allow a NN to learn more complicated features (such as XOR logic)
Multi Layer NN
Figure: from the course of Nahua Kang on towardsdatascience.com
ANN
Deep Learning
- Deep Learning is about constructing machine learning models that learn a hierarchical representation of the data
- Neural Networks are a class of machine learning algorithms
- example: the NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks
ANN working principle
The Artificial Neuron
- connected with n input channels x1 to xn
- each input has a synaptic weight, w1 to wn
- there is a bias b
- it uses an activation function fa

The output is defined as:

y = fa(Σ_{i=1..n} x_i × w_i + b)
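As a minimal sketch of this formula in Python (the function name neuron_output and the threshold activation used here are illustrative choices, not from the slides):

```python
def neuron_output(x, w, b, fa=lambda s: 1.0 if s > 0 else 0.0):
    """Compute y = fa(sum over i of x_i * w_i + b) for one artificial neuron."""
    s = sum(xi * wi for xi, wi in zip(x, w)) + b
    return fa(s)

# a neuron with weights (0.5, 0.5) and bias -0.7 fires only when both inputs are 1
print(neuron_output([1, 1], [0.5, 0.5], -0.7))  # → 1.0
print(neuron_output([1, 0], [0.5, 0.5], -0.7))  # → 0.0
```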
ANN principle
Neuron formula
The formula can be simplified by incorporating the bias into the sum of the x_i × w_i:
- set x0 = 1 and w0 = b
- the formula becomes:

y = fa(Σ_{i=0..n} x_i × w_i)
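The same trick in code (again a sketch with illustrative names): the bias simply becomes weight w0 applied to a constant input x0 = 1.

```python
def neuron_output(x, w, fa=lambda s: 1.0 if s > 0 else 0.0):
    """y = fa(sum over i = 0..n of x_i * w_i), where x[0] == 1 and w[0] plays the role of b."""
    return fa(sum(xi * wi for xi, wi in zip(x, w)))

# bias b = -0.7 folded in as w0, with the constant input x0 = 1 prepended
print(neuron_output([1, 1, 1], [-0.7, 0.5, 0.5]))  # → 1.0
print(neuron_output([1, 1, 0], [-0.7, 0.5, 0.5]))  # → 0.0
```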
The perceptron
(Diagram: inputs x0, x1, ..., xn, weighted by w0, w1, ..., wn, feed a summation unit Σ followed by the activation function fa, which produces the output y)
Neuron
Figure: from the course of Nahua Kang on towardsdatascience.com
The perceptron
Neuron activation
The Heaviside (threshold or binary) function is of the form:

y = 1 if Σ_{i=0..n} w_i × x_i > 0
y = 0 otherwise

The perceptron is a simple model of prediction.
The perceptron
Learn with the perceptron

initialize w
while not convergence do
    compute the errors
    update w from the errors
end
Algorithm 1: Perceptron learning scheme

The update rule for each weight is:

w_j = w_j + η(y − ŷ) × x_j

where ŷ is the predicted output and η is the learning constant (not too big, not too small), typically between 0.05 and 0.15.
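A sketch of the whole scheme in Python (the function names and the convergence test are my own; the update rule and η = 0.1 follow the slides):

```python
def heaviside(s):
    return 1 if s > 0 else 0

def train_perceptron(X, y, eta=0.1, w=None, max_epochs=100):
    """Algorithm 1: loop until a full pass makes no errors, applying
    w_j = w_j + eta * (y - y_hat) * x_j on each misclassified pattern."""
    w = list(w) if w is not None else [0.0] * len(X[0])
    for _ in range(max_epochs):
        errors = 0
        for xs, target in zip(X, y):
            y_hat = heaviside(sum(xi * wi for xi, wi in zip(xs, w)))
            if y_hat != target:               # weights change only on a misclassification
                errors += 1
                for j, xj in enumerate(xs):
                    w[j] += eta * (target - y_hat) * xj
        if errors == 0:                       # convergence: a full error-free pass
            return w
    return w

# boolean AND, with x0 = 1 acting as the bias input
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
y = [0, 0, 0, 1]
w = train_perceptron(X, y, w=[0.1, 0.2, 0.05])
print([heaviside(sum(xi * wi for xi, wi in zip(xs, w))) for xs in X])  # → [0, 0, 0, 1]
```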
The perceptron
AND / OR
The perceptron can implement boolean formulas like the boolean OR or the AND:

a b | a ∨ b | a ∧ b
0 0 |   0   |   0
0 1 |   1   |   0
1 0 |   1   |   0
1 1 |   1   |   1
The perceptron
Example with AND

X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]   (columns x0, x1, x2)
y = [0, 0, 0, 1]

η = 0.1, w = [0.1, 0.2, 0.05]
The perceptron
Example with AND - first case
take X0 = [1, 0, 0], y0 = 0

Σ w_i × x_i = 0.1×1 + 0.2×0 + 0.05×0 = 0.1
fa(0.1) = 1 = ŷ

w0 = w0 + η(y − ŷ) × X0[0] = 0.1 + 0.1 × (0 − 1) × 1 = 0
w1 = w1 + η(y − ŷ) × X0[1] = 0.2 + 0.1 × (0 − 1) × 0 = 0.2
w2 = w2 + η(y − ŷ) × X0[2] = 0.05 + 0.1 × (0 − 1) × 0 = 0.05

continue with X1 = [1, 0, 1], ...
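This hand computation can be replayed in a few lines (ŷ is written y_hat):

```python
eta = 0.1
w = [0.1, 0.2, 0.05]
x, y = [1, 0, 0], 0                        # first case X0 and its expected output y0

s = sum(xi * wi for xi, wi in zip(x, w))   # 0.1*1 + 0.2*0 + 0.05*0 = 0.1
y_hat = 1 if s > 0 else 0                  # fa(0.1) = 1, but the expected output is 0
w = [wj + eta * (y - y_hat) * xj for wj, xj in zip(w, x)]
print(y_hat, w)                            # → 1 [0.0, 0.2, 0.05]
```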
The perceptron
Example with AND - Convergence
w = [−0.30000001, 0.22, 0.10500001]

the result is:

x0 x1 x2 | ŷ
1  0  0  | 0
1  0  1  | 0
1  1  0  | 0
1  1  1  | 1

It works!
The perceptron
Exercise
- try to implement the perceptron in Python, C++ or Java
- test it for the boolean AND and OR
The perceptron
Why is XOR not possible with a perceptron? (1/2)
The one-layer perceptron acts as a linear separator:

(Figure: the points of AND and of OR can each be separated by a single line in the (a, b) plane, but the points of XOR cannot)
The perceptron
Why is XOR not possible with a perceptron? (2/2)

a b | a XOR b | equation
0 0 |    0    | w0 + 0×w1 + 0×w2 ≤ 0   (1)
0 1 |    1    | w0 + 0×w1 + 1×w2 > 0   (2)
1 0 |    1    | w0 + 1×w1 + 0×w2 > 0   (3)
1 1 |    0    | w0 + 1×w1 + 1×w2 ≤ 0   (4)

adding (1) and (4), and then (2) and (3):

(1) + (4): 2w0 + w1 + w2 ≤ 0
(2) + (3): 2w0 + w1 + w2 > 0

impossible!
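The contradiction can also be observed by brute force: the coarse grid search below (an illustration, not from the slides) finds weights that realize AND but none that realize XOR.

```python
def classifies(w, cases):
    """True if heaviside(w . x) matches the target for every (x, target) pair."""
    return all((1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0) == t
               for x, t in cases)

AND = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
XOR = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 0)]

grid = [i / 10 for i in range(-20, 21)]    # w0, w1, w2 from -2.0 to 2.0, step 0.1
solvable = lambda cases: any(classifies((w0, w1, w2), cases)
                             for w0 in grid for w1 in grid for w2 in grid)
print(solvable(AND), solvable(XOR))        # → True False
```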
3. Improvements of Neural Networks
Other activation functions
Heaviside problem
If the activation function is linear then the final output is still a linear combination of the input data.

Sigmoid
A sigmoid function is a real function (a special case of the logistic function) that is:
- bounded (min, max)
- differentiable
- has a characteristic "S"-shaped curve

s(x) = σ(x) = 1 / (1 + e^−x) = e^x / (1 + e^x)
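A direct transcription of the definition (a sketch; the helper name sigmoid is conventional):

```python
import math

def sigmoid(x):
    """s(x) = 1 / (1 + e^-x) = e^x / (1 + e^x)"""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))              # → 0.5 : the curve crosses 0.5 at x = 0
print(round(sigmoid(10), 4))   # → 1.0 : bounded above by 1
print(round(sigmoid(-10), 4))  # → 0.0 : bounded below by 0
```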
The sigmoid function
Other sigmoid-like functions
- hyperbolic tangent: tanh(x) = (e^x − e^−x) / (e^x + e^−x)
- arctangent function: arctan(x)
- error function: erf(x) = (2/√π) ∫ from 0 to x of e^(−t²) dt

See Wikipedia for a complete list.
The sigmoid function

(Figure: plot of the sigmoid function)
Characteristic features of the sigmoid function
Properties of the sigmoid function
- output values range from 0 to 1
- the curve crosses 0.5 at x = 0
- simple derivative: s′(x) = s(x) × (1 − s(x))
- used for models where you have to predict the probability of an output

See math.stackexchange.com for a derivation of the derivative.
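The derivative formula can be checked numerically against a centered finite difference (an illustrative sketch):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_deriv(x):
    """s'(x) = s(x) * (1 - s(x))"""
    s = sigmoid(x)
    return s * (1 - s)

h = 1e-6
for x in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    assert abs(numeric - sigmoid_deriv(x)) < 1e-9   # the two estimates agree
print(sigmoid_deriv(0))                             # → 0.25, the sigmoid's maximum slope
```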
4. How to learn with a Neural Network?
The backpropagation
Notations
- we will use L to refer to a layer
- y_L represents the output of layer L
- x_{L−1} represents the input layer for the computation of y_L
- w_L is the vector of weights

The output is then computed by:

y_L = σ(w_L × x_{L−1} + b_L)

where σ is the sigmoid activation function and b_L is the bias.
The backpropagation
(Diagram: a neuron of layer L−1 (z_{L−1} = 0.2, y_{L−1} = σ(z_{L−1})) connected through the weights w_L and bias b_L to a neuron of layer L (z_L = 0.6, y_L = σ(z_L)))

To simplify understanding we will write:

y_L = σ(z_L)    with    z_L = w_L × x_{L−1} + b_L
The backpropagation
Imagine you want to build a NN to implement the XORfunction using a hidden layer:
(Diagram: input layer L0, hidden layer L1, output layer L2, with input x1 and layer outputs y1 and y2)

we propagate the input values to the output layer:

y1 = σ(w1 × x0 + b1)
y2 = σ(w2 × y1 + b2)

We then can compare y2 to the expected value yexp.
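The two propagation formulas translate directly into code. The weights below are illustrative hand-picked values that happen to realize XOR (they are not from the slides; learning such weights is exactly what backpropagation is for):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(w, x, b):
    """One layer: output_i = sigmoid(sum over j of w[i][j] * x[j] + b[i])."""
    return [sigmoid(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

w1, b1 = [[20, 20], [-20, -20]], [-10, 30]   # hidden layer ~ (a OR b, a NAND b)
w2, b2 = [[20, 20]], [-30]                   # output layer ~ AND of the two

for x0 in ([0, 0], [0, 1], [1, 0], [1, 1]):
    y1 = layer(w1, x0, b1)        # y1 = sigma(w1 * x0 + b1)
    y2 = layer(w2, y1, b2)        # y2 = sigma(w2 * y1 + b2)
    print(x0, round(y2[0]))       # → 0, 1, 1, 0 : XOR
```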
The backpropagation
Error function and gradient
If y2 and yexp (the expected value for the output) are different, we need to modify the w_i and b_i. For this we compute the error as:

E(y2) = ½ (yexp − y2)²

which in fact results from:

E(y2) = ½ (yexp − σ(w2 × σ(w1 × x0 + b1) + b2))²

so the error in fact depends on w1, b1, w2 and b2.
The backpropagation
Error function and gradient
We will use the gradient of E to determine the influence of the w_L's and the biases b_L's:

∇E = (∂E/∂w_L, ∂E/∂b_L)

+∇E is the direction in which the function increases
−∇E is the direction in which the function decreases
The backpropagation
How to compute ∂E/∂w_L?

Remember that:

z_L = w_L × y_{L−1} + b_L
y_L = σ(z_L)
E = ½ (yexp − y_L)²

So the derivative of E with respect to w_L can be rewritten:

∂E/∂w_L = (∂E/∂y_L) × (∂y_L/∂z_L) × (∂z_L/∂w_L)
The backpropagation
How to compute ∂E/∂w_L?

Remember that:

∂E/∂y_L = ½ × 2 × (yexp − y_L) × (−1) = −(yexp − y_L)
∂y_L/∂z_L = σ′(z_L)
∂z_L/∂w_L = y_{L−1}

So the derivative of E with respect to w_L is:

∂E/∂w_L = −(yexp − y_L) × σ′(z_L) × y_{L−1}
The backpropagation
What about ∂E/∂b_L?

Following the same derivation, we get:

∂E/∂b_L = −(yexp − y_L) × σ′(z_L)
The backpropagation
Last step
- we have to sum the errors over all the input data
- then propagate the change to the previous layer using the gradient:

L2_delta = L2_error * sigmoid_deriv(L2)
L1_error = L2_delta.dot(w2.T)
L1_delta = L1_error * sigmoid_deriv(L1)
w2 += L1.T.dot(L2_delta) * eta
w1 += L0.T.dot(L1_delta) * eta
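A self-contained version of this snippet (numpy; the layer sizes, random seed and iteration count are illustrative choices) that learns XOR:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(y):
    return y * (1.0 - y)       # derivative expressed from the output y = sigmoid(z)

np.random.seed(1)
L0 = np.array([[0,0,1], [0,1,1], [1,0,1], [1,1,1]], dtype=float)  # 3rd column = bias input
target = np.array([[0], [1], [1], [0]], dtype=float)              # XOR of the first two columns
w1 = 2 * np.random.random((3, 4)) - 1     # 3 inputs -> 4 hidden neurons
w2 = 2 * np.random.random((4, 1)) - 1     # 4 hidden -> 1 output
eta = 1.0

for _ in range(60000):
    L1 = sigmoid(L0.dot(w1))              # forward pass: hidden layer
    L2 = sigmoid(L1.dot(w2))              # forward pass: output layer
    L2_error = target - L2
    L2_delta = L2_error * sigmoid_deriv(L2)
    L1_error = L2_delta.dot(w2.T)         # propagate the error backward
    L1_delta = L1_error * sigmoid_deriv(L1)
    w2 += L1.T.dot(L2_delta) * eta
    w1 += L0.T.dot(L1_delta) * eta

print(np.round(L2.ravel(), 2))            # outputs after training; the targets are [0, 1, 1, 0]
```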
How to design a NN?

Design of a Neural Network
1. collect data (data structure?)
2. normalize data
3. define training sets (fold technique)
4. define a test set (or use one of the folds)
5. train the network using backpropagation
6. test the result

Follow Jason Brownlee's online tutorial How to Implement the Backpropagation Algorithm From Scratch In Python (November 2016).
Backpropagation example
(Diagram: a 2-2-2 network. The inputs x_1^1 = 0.05 and x_1^2 = 0.10 feed the hidden layer (weights 0.15, 0.20, 0.25, 0.30, bias b2 = 0.35), whose outputs y_2^1 = σ(z_2^1) and y_2^2 = σ(z_2^2) feed the output layer (weights 0.40, 0.45, 0.50, 0.55, bias b3 = 0.60), whose outputs y_3^1 = σ(z_3^1) and y_3^2 = σ(z_3^2) are compared to the expected values 0.01 and 0.99)
Backpropagation example
define W2 and W3 as matrices:

W2 = [ 0.15  0.20 ]  = [ w_2^{1,1}  w_2^{1,2} ]
     [ 0.25  0.30 ]    [ w_2^{2,1}  w_2^{2,2} ]

W3 = [ 0.40  0.45 ]  = [ w_3^{1,1}  w_3^{1,2} ]
     [ 0.50  0.55 ]    [ w_3^{2,1}  w_3^{2,2} ]
Backpropagation example
Propagate the values of x_1^1 and x_1^2 by computing z_2^1 and y_2^1:

z_2^1 = b2 + w_2^{1,1} × x_1^1 + w_2^{1,2} × x_1^2
z_2^1 = 0.35 + 0.15 × 0.05 + 0.2 × 0.1 = 0.3775
y_2^1 = σ(z_2^1)
y_2^1 = 1/(1 + e^−0.3775) = 0.5932
Backpropagation example
Repeat the process for z_2^2 and y_2^2:

z_2^2 = b2 + w_2^{2,1} × x_1^1 + w_2^{2,2} × x_1^2
z_2^2 = 0.35 + 0.25 × 0.05 + 0.3 × 0.1 = 0.3925
y_2^2 = σ(z_2^2)
y_2^2 = 1/(1 + e^−0.3925) = 0.5968
Backpropagation example
To simplify the computation we could write, with W2 the weight matrix, X1 = (x_1^1, x_1^2)^T the input vector and B2 = b2 × (1, 1)^T:

Z2 = W2 × X1 + B2

and then:

Y2 = σ(Z2)
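In numpy this matrix form becomes the following sketch (note that numpy rounds the last digit where the slides truncate it, hence 0.5933/0.5969 instead of 0.5932/0.5968):

```python
import numpy as np

W2 = np.array([[0.15, 0.20],
               [0.25, 0.30]])
X1 = np.array([0.05, 0.10])
b2 = 0.35

Z2 = W2.dot(X1) + b2            # B2 = b2 * (1, 1) is handled by broadcasting
Y2 = 1.0 / (1.0 + np.exp(-Z2))  # Y2 = sigma(Z2)
print(np.round(Z2, 4))          # → [0.3775 0.3925]
print(np.round(Y2, 4))          # → [0.5933 0.5969]
```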
Backpropagation example
Propagate the values of y_2^1 and y_2^2 by computing z_3^1 and y_3^1:

z_3^1 = b3 + w_3^{1,1} × y_2^1 + w_3^{1,2} × y_2^2
z_3^1 = 0.6 + 0.4 × 0.5932 + 0.45 × 0.5968 = 1.1059
y_3^1 = σ(z_3^1)
y_3^1 = 1/(1 + e^−1.1059) = 0.7513
Backpropagation example
Do the same for z_3^2 and y_3^2:

z_3^2 = b3 + w_3^{2,1} × y_2^1 + w_3^{2,2} × y_2^2
z_3^2 = 0.6 + 0.5 × 0.5932 + 0.55 × 0.5968 = 1.2249
y_3^2 = σ(z_3^2)
y_3^2 = 1/(1 + e^−1.2249) = 0.7729
Backpropagation example
Compute the error of the network, where $y_{exp}$ is the vector of expected values:

$E(y_3) = \frac{1}{2}\sum_{i=1}^{2}(y_{exp}^i - y_3^i)^2$
$E(y_3) = \frac{1}{2}\left((y_{exp}^1 - y_3^1)^2 + (y_{exp}^2 - y_3^2)^2\right)$
$E(y_3) = \frac{1}{2}\left((0.01 - 0.7513)^2 + (0.99 - 0.7729)^2\right)$
$E(y_3) = \frac{1}{2}(0.5496 + 0.0471) = 0.2983$
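The error computation above is easy to check in a few lines of JavaScript (a sketch; the function name `networkError` is ours):

```javascript
// E(y) = 1/2 * sum_i (yExp[i] - y[i])^2  -- half the sum of squared errors.
function networkError(yExp, y) {
  return 0.5 * yExp.reduce((s, t, i) => s + (t - y[i]) ** 2, 0);
}

// networkError([0.01, 0.99], [0.7513, 0.7729]) is approximately 0.2983.
```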
Backpropagation example
We need to compute the gradient of the error to update $W_3$. For $w_{1,1}^3$ the dependency chain is $w_{1,1}^3 \to z_1^3 \to y_1^3 = \sigma(z_1^3) \to E(y_3)$.
Backpropagation example
We apply the chain rule for $w_{1,1}^3$:

$\frac{\partial E(y_3)}{\partial w_{1,1}^3} = \frac{\partial E(y_3)}{\partial y_1^3} \times \frac{\partial y_1^3}{\partial z_1^3} \times \frac{\partial z_1^3}{\partial w_{1,1}^3}$

where

$\frac{\partial E(y_3)}{\partial y_1^3} = 2 \times \frac{1}{2} \times (y_{exp}^1 - y_1^3) \times (-1)$
$\frac{\partial y_1^3}{\partial z_1^3} = \sigma'(z_1^3) = y_1^3 \times (1 - y_1^3)$
$\frac{\partial z_1^3}{\partial w_{1,1}^3} = y_1^2$
Backpropagation example
$\frac{\partial E(y_3)}{\partial w_{1,1}^3} = -(y_{exp}^1 - y_1^3) \times y_1^3 \times (1 - y_1^3) \times y_1^2$
$= -(0.01 - 0.7513) \times 0.7513 \times (1 - 0.7513) \times 0.5932$
$= 0.7413 \times 0.1868 \times 0.5932$
$= 0.0821$
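The same product of three factors (error derivative, sigmoid derivative, incoming activation) applies to every output-layer weight, so it can be factored into one helper; a sketch in JavaScript (the function name is ours):

```javascript
// dE/dw = -(yExp - y) * y * (1 - y) * yPrev, the three chain-rule factors
// for a sigmoid output neuron with squared error.
function outputWeightGrad(yExp, y, yPrev) {
  return -(yExp - y) * y * (1 - y) * yPrev;
}

// outputWeightGrad(0.01, 0.7513, 0.5932) is approximately 0.0821,
// matching the value derived above.
```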
Backpropagation example
For $w_{1,2}^3$:

$\frac{\partial E(y_3)}{\partial w_{1,2}^3} = \frac{\partial E(y_3)}{\partial y_1^3} \times \frac{\partial y_1^3}{\partial z_1^3} \times \frac{\partial z_1^3}{\partial w_{1,2}^3}$

where

$\frac{\partial E(y_3)}{\partial y_1^3} = 2 \times \frac{1}{2} \times (y_{exp}^1 - y_1^3) \times (-1)$
$\frac{\partial y_1^3}{\partial z_1^3} = \sigma'(z_1^3) = y_1^3 \times (1 - y_1^3)$
$\frac{\partial z_1^3}{\partial w_{1,2}^3} = y_2^2$
Backpropagation example
$\frac{\partial E(y_3)}{\partial w_{1,2}^3} = -(y_{exp}^1 - y_1^3) \times y_1^3 \times (1 - y_1^3) \times y_2^2$
$= -(0.01 - 0.7513) \times 0.7513 \times (1 - 0.7513) \times 0.5968$
$= 0.7413 \times 0.1868 \times 0.5968$
$= 0.0826$
Backpropagation example
For $w_{2,1}^3$:

$\frac{\partial E(y_3)}{\partial w_{2,1}^3} = \frac{\partial E(y_3)}{\partial y_2^3} \times \frac{\partial y_2^3}{\partial z_2^3} \times \frac{\partial z_2^3}{\partial w_{2,1}^3}$

where

$\frac{\partial E(y_3)}{\partial y_2^3} = 2 \times \frac{1}{2} \times (y_{exp}^2 - y_2^3) \times (-1)$
$\frac{\partial y_2^3}{\partial z_2^3} = \sigma'(z_2^3) = y_2^3 \times (1 - y_2^3)$
$\frac{\partial z_2^3}{\partial w_{2,1}^3} = y_1^2$
Backpropagation example
$\frac{\partial E(y_3)}{\partial w_{2,1}^3} = -(y_{exp}^2 - y_2^3) \times y_2^3 \times (1 - y_2^3) \times y_1^2$
$= -(0.99 - 0.7729) \times 0.7729 \times (1 - 0.7729) \times 0.5932$
$= -0.2171 \times 0.1755 \times 0.5932$
$= -0.02260$
Backpropagation example
For $w_{2,2}^3$:

$\frac{\partial E(y_3)}{\partial w_{2,2}^3} = \frac{\partial E(y_3)}{\partial y_2^3} \times \frac{\partial y_2^3}{\partial z_2^3} \times \frac{\partial z_2^3}{\partial w_{2,2}^3}$

where

$\frac{\partial E(y_3)}{\partial y_2^3} = 2 \times \frac{1}{2} \times (y_{exp}^2 - y_2^3) \times (-1)$
$\frac{\partial y_2^3}{\partial z_2^3} = \sigma'(z_2^3) = y_2^3 \times (1 - y_2^3)$
$\frac{\partial z_2^3}{\partial w_{2,2}^3} = y_2^2$
Backpropagation example
$\frac{\partial E(y_3)}{\partial w_{2,2}^3} = -(y_{exp}^2 - y_2^3) \times y_2^3 \times (1 - y_2^3) \times y_2^2$
$= -(0.99 - 0.7729) \times 0.7729 \times (1 - 0.7729) \times 0.5968$
$= -0.2171 \times 0.1755 \times 0.5968$
$= -0.02274$
Backpropagation example
We can now update $W_3$:

$W_3^\star = W_3 - \eta \times \begin{bmatrix} 0.0821 & 0.0826 \\ -0.0226 & -0.0227 \end{bmatrix}$

where $\eta$ is the learning rate; we set it to 0.5 in this case:

$W_3^\star = \begin{bmatrix} 0.358916479717885 & 0.408666186076233 \\ 0.511301270238737 & 0.561370121107989 \end{bmatrix}$
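The update step itself is one element-wise operation; a minimal sketch (names are ours), using the layer-3 weights $0.40, 0.45, 0.50, 0.55$ read off the forward-pass slides and the gradients above:

```javascript
// Gradient-descent step: W* = W - eta * G, element-wise on a 2x2 matrix.
function updateWeights(W, G, eta) {
  return W.map((row, i) => row.map((w, j) => w - eta * G[i][j]));
}

// updateWeights([[0.40, 0.45], [0.50, 0.55]],
//               [[0.0821, 0.0826], [-0.0226, -0.0227]], 0.5)
// reproduces W3* to four decimal places.
```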
Limits of NN
Limits of Neural Networks
does the use of the gradient give the minimum?
as for Maximum Parsimony: does the minimum represent the best network?
how to choose the number and size of the hidden layers?
6. Interesting links and applications
Links
To explore the course in greater depth, follow these links:
a series of very interesting videos about NN: 3Blue1Brown
Apple Watch Detects Signs of Diabetes
Solving SpaceNet Road Detection Challenge With Deep Learning
Deep neural network from scratch from Florian Courtial
Toolkits
There are many toolkits for NN available for many languages:

GPU computing: cuDNN (NVidia)
Theano (University of Montreal)
TensorFlow (Google)
Caffe (Berkeley AI Research)
MXNet (Microsoft, Nvidia, Intel, ...)
many more on wikipedia
Synaptic.js
Synaptic.js
Synaptic.js defines itself as the javascript architecture-free neural network library for node.js and the browser:

you can easily define a NN
train it efficiently
integrate the code in a web page
Synaptic.js for XOR
Synaptic.js for XOR
Manually

var manualTrainingSet = [
  { input: [0,0], output: [0] },
  { input: [0,1], output: [1] },
  { input: [1,0], output: [1] },
  { input: [1,1], output: [0] }
];

Generated

var generatedTrainingSet = [];
for (var i = 0; i < 4; ++i) {
  var op1 = Math.trunc(i / 2);
  var op2 = Math.trunc(i & 1);
  var input = [op1, op2];
  generatedTrainingSet.push({ input,
    "output": [Math.trunc(op1 ^ op2)] });
}
Synaptic.js for XOR
Training of the neural network
Don't use myTrainer.trainXOR(), which automatically provides an XOR training set, but use train(..):

// var trainingSet = manualTrainingSet;
var trainingSet = generatedTrainingSet;
myTrainer.train(trainingSet);
Synaptic.js for IRIS
Neural Network for the IRIS dataset
Modify the example of the XOR network to create a network for the IRIS dataset:

take the IRIS dataset from WEKA and convert it to JSON
load the JSON data into the web page using jQuery
train the network and display the results
Synaptic.js for IRIS
Synaptic.js for IRIS
jQuery

<script type="text/javascript"
src="https://ajax.googleapis.com/ajax/libs/
jquery/2.1.3/jquery.min.js">
</script>
<script type="text/javascript">
var trainingSet = [];
$(document).ready(function(){
$.getJSON("iris.json", function(result){
for (i in result) {
...
}
...
});
});
</script>
Synaptic.js for IRIS
Normalization
To get the best results you need to normalize the data, for example:

feature scaling:
$x' = \frac{x - x_{min}}{x_{max} - x_{min}}$

standard score:
$x' = \frac{x - \mu}{\sigma}$

where $\mu$ and $\sigma$ are respectively the mean and standard deviation of the data
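Both normalizations take only a few lines in JavaScript; a minimal sketch (the function names `featureScale` and `standardScore` are ours):

```javascript
// Feature scaling: map all values linearly into [0, 1].
function featureScale(xs) {
  const min = Math.min(...xs), max = Math.max(...xs);
  return xs.map(x => (x - min) / (max - min));
}

// Standard score: subtract the mean, divide by the standard deviation.
function standardScore(xs) {
  const mu = xs.reduce((a, b) => a + b, 0) / xs.length;
  const sigma = Math.sqrt(
    xs.reduce((a, b) => a + (b - mu) ** 2, 0) / xs.length);
  return xs.map(x => (x - mu) / sigma);
}

// featureScale([2, 4, 6]) yields [0, 0.5, 1].
```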
End
University of Angers - Faculty of Sciences
UA - Angers
2 Boulevard Lavoisier, 49045 Angers Cedex 01
Tel: (+33) (0)2-41-73-50-72