Artificial Neural Networks - Historical description (University of Texas at Arlington, 2017)
Outline: First models · First Winter · Multilayer Perceptron · Second Winter · Deep Learning
Artificial Neural Networks
Historical description
Victor G. Lopez
Definition · Biological model · McCulloch and Pitts · Single-layer Perceptron
Artificial Neural Networks (ANN)
• An artificial neural network is a computational model that attempts to
emulate the functions of the brain.
Characteristics of ANNs
Modern ANNs are complex arrangements of processing units able to adapt their parameters using learning techniques.
Their plasticity, nonlinearity, robustness and highly distributed framework have attracted a lot of attention from many areas of research.
Several applications have been studied using ANNs: classification, pattern recognition, clustering, function approximation, optimization, forecasting and prediction, among others.
To date, ANN models are the artificial intelligence methods that most closely imitate human intelligence.
1890s: A neuron model
Santiago Ramón y Cajal proposes that the brain works in a parallel and distributed manner, with neurons as basic processing units.
He described the first complete biological model of the neuron.
Neural synapses
A neural synapse is the region where the axon of a neuron interacts with another neuron.
A neuron usually receives information by means of its dendrites, but this is not always the case.
Neurons share information using electrochemical signals.
Action Potential
The signal sent by a single neuron is usually weak, but a neuron receives inputs from many other neurons.
These inputs are integrated; if a threshold is reached, the neuron sends a powerful signal through its axon, called an action potential.
Neural Pathways
The action potential is an all-or-none signal: whether the threshold is barely reached or vastly surpassed, the resulting action potential is the same.
This means that the action potential alone does not carry much information. All cerebral processes, like memory and learning, depend on neural pathways.
There are over 10^11 neurons in the human brain, forming around 10^15 synapses. They form the basis of human intelligence and consciousness.
1943: McCulloch and Pitts
Warren McCulloch (neurophysiologist) and Walter Pitts (mathematician) wrote a paper describing a logical calculus of neural networks.
Their model can, in principle, approximate any computable function.
This is considered the birth of artificial intelligence.
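The McCulloch-Pitts model can be sketched as a simple threshold unit; the function name and threshold values below are illustrative assumptions, not from the original paper.

```python
# A minimal sketch of a McCulloch-Pitts threshold unit (illustrative;
# the name and threshold values are assumptions, not from the paper).

def mp_neuron(inputs, threshold):
    """Fire (output 1) iff the sum of the binary inputs reaches the threshold."""
    return 1 if sum(inputs) >= threshold else 0

# With two inputs, threshold 2 realizes logical AND and threshold 1 logical OR,
# which is the sense in which the model computes logic with neurons.
and_out = [mp_neuron([a, b], 2) for a in (0, 1) for b in (0, 1)]
or_out  = [mp_neuron([a, b], 1) for a in (0, 1) for b in (0, 1)]
```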
1958: The Perceptron
Frank Rosenblatt (psychologist) proposes the Perceptron, with a novel method of supervised learning.
This is the oldest neural network still in use today.
Single-neuron Perceptron
Here, the activation function f was selected as a saturation function. This simulates the all-or-none property of the action potential.
The single-neuron Perceptron can solve classification problems of two linearly separable groups.
Perceptron with a Layer of Neurons
Using several neurons, the Perceptron can classify objects into many categories, as long as they are linearly separable.
The total number of categories is 2^S, with S the number of neurons.
Training algorithm per neuron
1 Initialize the weights W_0.
2 Compute the output of the network for input p_k. If the output is correct, set
W_{k+1} = W_k
3 If the output is incorrect, set
W_{k+1} = W_k − η p_k, if W_k^T p_k ≥ 0
W_{k+1} = W_k + η p_k, if W_k^T p_k < 0
Here, 0 < η ≤ 1 is the learning rate.
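The per-neuron training rule above can be sketched in code; the data and function name here are illustrative assumptions, not from the slides.

```python
import numpy as np

# A minimal sketch of the per-neuron Perceptron training rule.

def train_perceptron(P, T, eta=1.0, epochs=100):
    """P: inputs p_k as rows (a constant 1 appended for the bias);
    T: 0/1 targets; 0 < eta <= 1 is the learning rate."""
    W = np.zeros(P.shape[1])               # step 1: initialize W_0
    for _ in range(epochs):
        mistakes = 0
        for p, t in zip(P, T):
            y = 1 if W @ p >= 0 else 0     # step 2: compute the output
            if y != t:                     # step 3: update only when wrong
                W = W - eta * p if W @ p >= 0 else W + eta * p
                mistakes += 1
        if mistakes == 0:                  # every sample correct: stop
            break
    return W

# Hypothetical usage: learn the AND gate, which is linearly separable.
P = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([0, 0, 0, 1])
W = train_perceptron(P, T)
```

By the Perceptron convergence theorem, the loop terminates on any linearly separable data set.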
Logical gates AND and OR
Separating the outputs of the logical gates AND and OR is a simple example of a problem solvable by the single-layer Perceptron.
In contrast, the outputs of the XOR gate are not linearly separable.
AND:          OR:           XOR:
x1 x2 y       x1 x2 y       x1 x2 y
0  0  0       0  0  0       0  0  0
0  1  0       0  1  1       0  1  1
1  0  0       1  0  1       1  0  1
1  1  1       1  1  1       1  1  0
1959: ADALINE and MADALINE
Bernard Widrow and Marcian Hoff developed models called ADALINE (adaptive linear elements) and MADALINE (Multiple ADALINE).
The main difference with respect to the Perceptron is the absence of the threshold activation function.
Training of these networks is performed using derivatives.
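Because the error is taken on the linear output, the squared error is differentiable in the weights. A minimal sketch of the Widrow-Hoff LMS update (the data and function name are assumptions for illustration):

```python
import numpy as np

# Illustrative sketch of the Widrow-Hoff LMS rule for an ADALINE:
# the error is computed on the *linear* output W @ p (no threshold),
# so each update is a gradient step on (1/2) e^2.

def train_adaline(P, T, eta=0.1, epochs=100):
    W = np.zeros(P.shape[1])
    for _ in range(epochs):
        for p, t in zip(P, T):
            e = t - W @ p            # error on the linear output
            W = W + eta * e * p      # gradient step on (1/2) e^2
    return W

# Hypothetical usage: fit the AND gate with +/-1 targets; a threshold
# is applied only afterwards, to read out the class.
P = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([-1.0, -1.0, -1.0, 1.0])
W = train_adaline(P, T)
```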
1970s: First Winter in ANNs research
After the successful introduction and development of ANNs during the 1960s, interest in their applications waned for almost two decades.
The limitations of the single-layer Perceptron narrowed its possible practical implementations.
Theoretical research showed that a multilayer Perceptron would drastically improve performance, but there was no training algorithm for it.
1986: Multilayer Perceptron
In his 1974 PhD thesis, Paul Werbos proposed the backpropagation algorithm as a solution to the multilayer Perceptron training problem. His suggestion, however, remained ignored for more than a decade.
In 1986, the backpropagation method is finally popularized in a paper by Rumelhart, Hinton and Williams.
The multilayer Perceptron became the most powerful ANN model to date.
It has been proven to solve nonlinearly separable classification problems, it can approximate any continuous function, and it generalizes from particular samples, among many other capabilities.
17 / 23
First modelsFirst Winter
Multilayer PerceptronSecond WinterDeep Learning
Backpropagation Training
This is supervised learning: we have a list of inputs and their corresponding target outputs, (p_k, t_k).
We can compute the output of the NN for each given input p_k. This is called the forward propagation step. For a three-layer network, this would be
a_k = f^3(W^3 f^2(W^2 f^1(W^1 p_k + b^1) + b^2) + b^3)
Define the output error as e_k = t_k − a_k.
We now want to minimize the squared error
J = (1/2) e_k^2 = (1/2)(t_k − a_k)^2
or the average sum of squared errors over Q samples,
J = (1/(2Q)) Σ_{k=1}^{Q} e_k^2 = (1/(2Q)) Σ_{k=1}^{Q} (t_k − a_k)^2
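The forward propagation step and the squared-error cost can be sketched directly; the layer sizes and the choice of tanh hidden layers with a linear output are assumptions, not from the slides.

```python
import numpy as np

# A minimal sketch of a_k = f3(W3 f2(W2 f1(W1 p_k + b1) + b2) + b3)
# and of the squared-error cost J = (1/2) e_k^2.

def forward(p, layers):
    """layers = [(W1, b1), (W2, b2), (W3, b3)]; tanh hidden layers,
    identity (linear) output layer."""
    (W1, b1), (W2, b2), (W3, b3) = layers
    a1 = np.tanh(W1 @ p + b1)    # f1
    a2 = np.tanh(W2 @ a1 + b2)   # f2
    return W3 @ a2 + b3          # f3 (linear)

def squared_error(t, a):
    e = t - a                    # e_k = t_k - a_k
    return 0.5 * float(e @ e)    # J = (1/2) e_k^2
```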
Backpropagation Training
Use a gradient-descent algorithm to update the weights W_k while minimizing the error e_k:
W_{k+1} = W_k + ΔW_k,   ΔW_k = −η ∂J/∂W_k
The chain rule for derivatives can be used to obtain a clearer expression:
∂J/∂W_k = (∂J/∂e_k)(∂e_k/∂a_k)(∂a_k/∂W_k)
From the previous definitions we note that
∂J/∂e_k = e_k,   ∂e_k/∂a_k = −1,
and ∂a_k/∂W_k depends on the activation functions f^i. Notice that all activation functions must be differentiable.
Backpropagation Algorithm
1 Initialize the weights W_0.
2 By forward propagation, get a_k.
3 Calculate the error e_k = t_k − a_k.
4 Update the neural weights as
W_{k+1} = W_k + η (∂a_k/∂W_k) e_k
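The four steps above can be sketched for a network with one tanh hidden layer and a linear output; the architecture, example numbers and function names are illustrative assumptions.

```python
import numpy as np

# One backpropagation update for a 1-hidden-layer tanh network with
# linear output, following steps 1-4 above.

def forward(p, W1, b1, W2, b2):
    a1 = np.tanh(W1 @ p + b1)      # hidden-layer activation
    return a1, W2 @ a1 + b2        # linear output a_k

def backprop_step(p, t, W1, b1, W2, b2, eta=0.1):
    a1, a2 = forward(p, W1, b1, W2, b2)     # step 2: forward propagation
    e = t - a2                              # step 3: output error e_k
    # step 4: chain rule, W <- W + eta * (da/dW) e, layer by layer
    dW2, db2 = np.outer(e, a1), e
    d1 = (W2.T @ e) * (1 - a1**2)           # tanh'(x) = 1 - tanh(x)^2
    dW1, db1 = np.outer(d1, p), d1
    return W1 + eta*dW1, b1 + eta*db1, W2 + eta*dW2, b2 + eta*db2

# One update on a fixed example should reduce the squared error.
W1 = np.array([[0.5, -0.3], [0.2, 0.1]]); b1 = np.array([0.1, -0.2])
W2 = np.array([[0.3, 0.4]]);              b2 = np.array([0.1])
p, t = np.array([1.0, 2.0]), np.array([1.0])
e0 = t - forward(p, W1, b1, W2, b2)[1]
before = 0.5 * float(e0 @ e0)
W1, b1, W2, b2 = backprop_step(p, t, W1, b1, W2, b2)
e1 = t - forward(p, W1, b1, W2, b2)[1]
after = 0.5 * float(e1 @ e1)
```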
Activation functions
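The original slide shows plots of activation functions. As an illustrative sketch, two standard differentiable choices and their derivatives, which is what backpropagation requires of the f^i:

```python
import numpy as np

# Two standard differentiable activation functions and their derivatives.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # smooth "saturation", range (0, 1)

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # maximal (0.25) at x = 0

def tanh_prime(x):
    return 1.0 - np.tanh(x)**2        # derivative of tanh, maximal (1) at x = 0
```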
Late 1990s to early 2000s: Second Winter in ANN research
In the 1990s, several applications of ANNs are studied and implemented. Areas such as vision, pattern recognition, unsupervised learning and reinforcement learning take advantage of the adaptive characteristics of ANNs.
Late in that decade, a new difficulty delays advances in the field: the basic backpropagation algorithm is not well suited to networks with several hidden layers, mainly because of limited computational capabilities.
Many researchers became pessimistic about ANNs.
2006-2016: Deep learning
In 2006, Hinton, Osindero and Teh published a fast learning algorithm for deep belief networks. This marks the dawn of deep learning.
The 2010s have seen a boom in deep neural network applications. Companies such as Microsoft, Google and Facebook have developed advanced deep learning ANNs. Optimism has returned to the field, and human-level intelligence is expected by some to be achieved within a few decades.