Neural Network Architectures

Aydın Ulaş

02 December 2004

[email protected]

Outline Of Presentation

– Introduction
– Neural Networks
– Neural Network Architectures
– Conclusions

Introduction

Some numbers…
– The human brain contains about 10 billion nerve cells (neurons)
– Each neuron is connected to the others through about 10,000 synapses

The brain as a computational unit
– It can learn and reorganize from experience
– It adapts to the environment
– It is robust and fault tolerant
– Fast computation with a very large number of individual computational units

Introduction

Taking nature as a model: consider the neuron as a processing element (PE). A neuron has
– Inputs (the dendrites)
– An output (the axon)

The information circulates from the dendrites to the axon via the cell body.

Axons connect to dendrites via synapses
– The strength of a synapse can change
– Synapses may be excitatory or inhibitory

Perceptron (Artificial Neuron)

Definition: a nonlinear, parameterized function with a restricted output range

    o = f( w0 + Σ_{i=1..d} wi · xi )

[Figure: a perceptron — inputs x0 = 1, x1, …, xd with weights w0, w1, …, wd feed the activation function f, which produces the output o]
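As an illustration (not part of the original slides), the perceptron above can be sketched in a few lines of Python; the weights and the choice of sigmoid activation here are made up for the example:

```python
import math

def perceptron(x, w, f):
    """Perceptron output o = f(w0 + sum_i w_i * x_i).

    x holds the inputs x1..xd; w holds the weights w0..wd,
    where w0 is the bias (the weight of the fixed input x0 = 1)."""
    s = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return f(s)

sigmoid = lambda a: 1.0 / (1.0 + math.exp(-a))

# Illustrative weights: two inputs, bias w0 = -1.
o = perceptron([1.0, 1.0], [-1.0, 0.5, 0.5], sigmoid)
print(o)  # sigmoid(-1 + 0.5 + 0.5) = sigmoid(0) = 0.5
```

Any of the activation functions on the next slide can be passed in as `f`.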

Activation Functions

Linear:             y = x
Sigmoid:            y = 1 / (1 + exp(−x))
Hyperbolic tangent: y = (exp(x) − exp(−x)) / (exp(x) + exp(−x))

[Figure: plots of the three activation functions]
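The three activation functions can be written directly from the formulas above; `tanh_act` is spelled out from the exponentials to match the slide, although `math.tanh` computes the same value:

```python
import math

def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_act(x):
    # (e^x - e^-x) / (e^x + e^-x); math.tanh(x) gives the same value
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

for f in (linear, sigmoid, tanh_act):
    # linear is unbounded; sigmoid maps to (0, 1); tanh maps to (-1, 1)
    print(f.__name__, f(-2.0), f(0.0), f(2.0))
```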

Neural Networks

A mathematical model to solve engineering problems
– A group of highly connected neurons realizing compositions of nonlinear functions

Tasks
– Classification
– Clustering
– Regression

According to input flow
– Feed-forward neural networks
– Recurrent neural networks

Feed Forward Neural Networks

The information is propagated from the inputs to the outputs.

Time plays no role (acyclic: no feedback from outputs to inputs).

[Figure: a feed-forward network — inputs x0 = 1, x1, …, xd connect to hidden units h1, …, hs through weights wjk (w10 … wsd), and hidden units h0 = 1, h1, …, hs connect to outputs o1, …, om through weights vij (v10 … vms)]
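A minimal sketch of the forward pass through such a two-layer network, with illustrative weight matrices W (input→hidden) and V (hidden→output); the slides do not fix the activations, so the sigmoid hidden layer and linear output layer are assumptions:

```python
import math

def mlp_forward(x, W, V):
    """Forward pass of a two-layer feed-forward net:
    h_j = sigmoid(sum_k w_jk * x_k)   (hidden layer)
    o_i = sum_j v_ij * h_j            (linear output layer)
    with fixed bias inputs x0 = 1 and h0 = 1, as in the figure."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    xb = [1.0] + list(x)                                   # prepend x0 = 1
    h = [sig(sum(w * xi for w, xi in zip(row, xb))) for row in W]
    hb = [1.0] + h                                         # prepend h0 = 1
    return [sum(v * hj for v, hj in zip(row, hb)) for row in V]

# Illustrative sizes: d = 2 inputs, s = 2 hidden units, m = 1 output.
W = [[0.1, 0.4, -0.2],   # row j holds w_j0 (bias), w_j1, ..., w_jd
     [-0.3, 0.2, 0.5]]
V = [[0.2, 1.0, -1.0]]   # row i holds v_i0 (bias), v_i1, ..., v_is
print(mlp_forward([1.0, 0.0], W, V))
```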

Recurrent Networks

Arbitrary topologies
Can model systems with internal states (dynamic ones)
Delays can be modeled
More difficult to train
Problematic performance
– Stable outputs may be more difficult to evaluate
– Unexpected behavior (oscillation, chaos, …)

[Figure: a recurrent network over inputs x1 and x2]

Learning

The procedure of estimating the parameters of the neurons (setting up the weights) so that the whole network can perform a specific task.

Two types of learning
– Supervised learning
– Unsupervised learning

The learning process (supervised)
– Present the network with a number of inputs and their corresponding outputs (training)
– See how closely the actual outputs match the desired ones
– Modify the parameters to better approximate the desired outputs
– Make several passes over the data

Supervised Learning

The real outputs of the model for the given inputs are known in advance. The network's task is to approximate those outputs.

A “supervisor” provides examples and teaches the neural network how to fulfill a certain task.

Unsupervised learning

Group typical input data according to some function.

Data clustering
No need for a supervisor
– The network itself finds the correlations in the data
– Example:
• Kohonen feature maps (SOM)
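The slides only name Kohonen feature maps; as a rough sketch of the competitive idea behind them (the winning prototype moves toward the input — a full SOM would also update the winner's neighbours), with made-up data and prototypes:

```python
def som_step(units, x, lr=0.5):
    """One competitive-learning step: the prototype vector closest
    to input x (the "winner") moves a fraction lr toward x.
    Simplified: a real Kohonen SOM also moves the winner's neighbours."""
    dist = lambda u: sum((ui - xi) ** 2 for ui, xi in zip(u, x))
    winner = min(range(len(units)), key=lambda i: dist(units[i]))
    units[winner] = [ui + lr * (xi - ui) for ui, xi in zip(units[winner], x)]
    return winner

units = [[0.0, 0.0], [1.0, 1.0]]            # two made-up prototype vectors
for x in [[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]]:
    som_step(units, x)                      # no supervisor: only the data
print(units)                                # prototypes drift toward clusters
```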

Properties of Neural Networks

Supervised networks are universal approximators (non-recurrent networks)

Can act as
– A linear approximator (linear perceptron)
– A nonlinear approximator (multi-layer perceptron)

Other Properties

Adaptivity
– Adapts its weights to the environment easily

Ability to generalize
– May compensate for a lack of data

Fault tolerance
– Not too much degradation of performance if damaged; the information is distributed within the entire net

An Example: Regression

[Figure: regression example — a curve fitted to scattered data points]

Example: Classification

Handwritten digit recognition

16×16 bitmap representation
– Converted to a 1×256 bit vector

7500 points in the training set
3500 points in the test set

Example bit vector:
0000000001100000000000011010000000000001000000000000001000000000000001000000000000001000000000000000100000000000000010000000000000001000000000000001000111110000000101100001100000011000000010000001100000001000000100000000100000001000000100000000011111110000
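The 16×16 → 1×256 conversion mentioned above is just a row-by-row flattening; a minimal sketch with a made-up bitmap:

```python
def bitmap_to_vector(bitmap):
    """Flatten a 16x16 binary bitmap (16 rows of 16 bits) into a
    single 1x256 bit vector, row by row."""
    assert len(bitmap) == 16 and all(len(row) == 16 for row in bitmap)
    return [bit for row in bitmap for bit in row]

# Made-up example: a blank bitmap with one pixel set at row 0, column 9.
bmp = [[0] * 16 for _ in range(16)]
bmp[0][9] = 1
vec = bitmap_to_vector(bmp)
print(len(vec), vec.index(1))   # pixel (r, c) lands at index r * 16 + c
```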

Training

Try to minimize an error or cost function

Backpropagation algorithm
– Gradient descent

Learn the weights of the network
Update the weights according to the error function
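As an illustrative stand-in for backpropagation (not the presenter's code), gradient descent on the squared error of a single sigmoid neuron shows the weight-update loop; in a multi-layer network, backpropagation applies the same chain rule through each layer. The learning rate, epoch count, and the OR task are arbitrary choices:

```python
import math

def train(samples, d, lr=0.5, epochs=2000):
    """Gradient descent on the error E = (o - t)^2 / 2 for one
    sigmoid neuron with d inputs; w[0] is the bias weight."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    w = [0.0] * (d + 1)
    for _ in range(epochs):                    # several passes over the data
        for x, t in samples:
            o = sig(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            delta = (o - t) * o * (1.0 - o)    # dE/dnet via the chain rule
            w[0] -= lr * delta                 # step down the error surface
            for i, xi in enumerate(x):
                w[i + 1] -= lr * delta * xi
    return w

# Made-up task: learn logical OR (linearly separable, one neuron suffices).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w = train(data, d=2)
sig = lambda a: 1.0 / (1.0 + math.exp(-a))
print([round(sig(w[0] + w[1] * a + w[2] * b)) for (a, b), _ in data])
```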

Applications

– Handwritten digit recognition
– Face recognition
– Time series prediction
– Process identification
– Process control
– Optical character recognition
– Etc…

Neural Networks

Neural networks are statistical tools
– They adjust nonlinear functions to accomplish a task
– They need multiple and representative examples, but fewer than other methods

Neural networks can model static (FF) and dynamic (RNN) tasks

NNs are good classifiers BUT
– Good representations of the data have to be formulated
– Training vectors must be statistically representative of the entire input space

The use of NNs needs a good comprehension of the problem

Implementation of Neural Networks

– Generic architectures (PCs, etc.)
– Specific neuro-hardware
– Dedicated circuits

Generic architectures

Conventional microprocessors: Intel Pentium, PowerPC, etc.

Advantages
– High performance (clock frequency, etc.)
– Cheap
– Software environment available (NN tools, etc.)

Drawbacks
– Too generic, not optimized for very fast neural computations

Classification of Hardware

NN hardware
– Neurochips
• Special purpose
• General purpose (Ni1000, L-Neuro)
– Neurocomputers
• Special purpose (CNAPS, Synapse)
• General purpose

Specific Neuro-hardware circuits

Commercial chips: CNAPS, Synapse, etc.

Advantages
– Closer to the neural applications
– High performance in terms of speed

Drawbacks
– Not optimized for specific applications
– Availability
– Development tools

CNAPS

SIMD
One instruction sequencing and control unit
Processor nodes (PNs)
Single-dimensional array (connections only to the right or left nodes)

CNAPS 1064

CNAPS

Dedicated circuits

A system where the functionality is buried in the hardware.
For specific applications only; not changeable.

Advantages
– Optimized for a specific application
– Higher performance than the other systems

Drawbacks
– High development costs in terms of time and money

What type of hardware should be used in dedicated circuits?

Custom circuits
– ASIC (Application-Specific Integrated Circuit)
– Requires good knowledge of hardware design
– Fixed architecture, hardly changeable
– Often expensive

Programmable logic
– Valuable for implementing real-time systems
– Flexibility
– Low development costs
– Lower performance compared to ASICs (frequency, etc.)

Programmable logic

Field Programmable Gate Arrays (FPGAs)
– Matrix of logic cells
– Programmable interconnection
– Additional features (internal memories + embedded resources like multipliers, etc.)
– Reconfigurability
• The configuration can be changed as many times as desired

Real Time Systems

Execution of applications with time constraints.
– Hard real-time systems
• Digital fly-by-wire control system of an aircraft: no lateness is accepted; people's lives depend on the correct working of the aircraft's control system.
– Soft real-time systems
• Vending machine: lower performance for lateness is acceptable; it is not catastrophic when deadlines are missed, it merely takes longer to handle one client.

Real Time Systems

ms-scale real-time system
– Connectionist retina for image processing
• Artificial retina: combining an image sensor with a parallel architecture

µs-scale real-time system
– Level 1 trigger in a HEP experiment

Connectionist Retina

Integration of a neural network in an artificial retina

Screen
– Matrix of active pixel sensors

CAN (ADC)
– 8-bit converter, 256 levels of grey

Processing architecture
– A parallel system where the neural networks are implemented

[Figure: eye → CAN → processing architecture]

Maharadja Processing Architecture

Micro-controller
– Generic architecture executing sequential code with low power consumption

Memory
– 256 Kbytes shared between the processor, the PEs, and the input
– Stores the network parameters

UNE (neural unit: SIMD, completely pipelined, 16-bit internal data bus)
– Processors that compute the neuron outputs
– A command bus manages all the different operators in the UNE

Input/Output module
– Data acquisition and storage of intermediate results

[Figure: block diagram — a micro-controller and sequencer drive the command and instruction buses, which connect an input/output unit to four neural units UNE-0 … UNE-3, each with its own memory M]

Level 1 trigger in a HEP experiment

High Energy Physics (particle physics)

Neural networks have provided interesting results as triggers in HEP.
– Level 2: H1 experiment, 10–20 µs
– Level 1: Dirac experiment, 2 µs

Particle recognition
High timing constraints (in terms of latency and data throughput)

Neural Network architecture

[Figure: a fully connected network with 128 inputs and 64 hidden units]

4 outputs: electrons, tau, hadrons, jets

Execution time: ~500 ns, with data arriving every BC = 25 ns

Weights coded in 16 bits
States coded in 8 bits

Very Fast Architecture

[Figure: a 4×4 array of PEs; each row feeds an accumulator (ACC) and a TanH unit, flanked by an I/O module and a control unit]

256 PEs
Matrix of n×m matrix elements
Control unit
I/O module
TanH values are stored in LUTs
One matrix row computes a neuron
The results are back-propagated to calculate the output layer

PE architecture

[Figure: PE datapath — 8-bit input data and 16-bit weights from the weights memory (driven by an address generator) feed a multiplier and an accumulator; a control module connected to the command bus manages data in and data out]

Neuro-hardware today

Generic real-time applications
– Microprocessor technology (PCs, i.e. software) is sufficient to implement most neural applications in real time (ms, or sometimes µs, scale)
• This solution is cheap
• Very easy to manage

Constrained real-time applications
– There remain specific applications where powerful computations are needed, e.g. particle physics
– There remain applications where other constraints have to be taken into consideration (consumption, proximity of sensors, mixed integration, etc.)

Clustering

Idea: combine the performance of different processors to carry out massively parallel computations

[Figure: several machines linked by a high-speed connection]

Clustering

Advantages
– Takes advantage of the implicit parallelism of neural networks
– Uses systems that are already available (universities, labs, offices, etc.)
– High performance: faster training of a neural net
– Very cheap compared to dedicated hardware

Clustering

Drawbacks
– Communication load: needs very fast links between computers
– Requires a software environment for parallel processing
– Not possible for embedded applications

Hardware Implementations

Most real-time applications do not need a dedicated hardware implementation
– Conventional architectures are generally appropriate
– Clustering of generic architectures to combine their performance

Some specific applications require other solutions
– Strong timing constraints
• Technology permits the use of FPGAs
– Flexibility
– Massive parallelism possible
– Other constraints (consumption, etc.)
• Custom or programmable circuits

Questions?