Introduction to Artificial Neural Networks (ANNs)
Keith L. Downing
The Norwegian University of Science and Technology (NTNU), Trondheim, [email protected]
January 19, 2015
NETtalk (Sejnowski + Rosenberg, 1986)
[Figure: NETtalk architecture — a sliding context window of letters feeds a hidden "concepts" layer, which maps each letter to one of the phonemes (or silence).]
DEC's DECtalk: several man-years of work → a reading machine.
NETtalk: 10 hours of backprop training on a 1000-word text, T1000.
95% accuracy on T1000; 78% accuracy on novel text.
Improvement during training sounds like a child learning to read.
Concept layer is key. 79 different (overlapping) clouds of neurons are gradually formed, with each mapping to one of the 79 phonemes.
Sample ANN Applications: Forecasting
1. Train the ANN (typically using backprop) on historical data to learn the mapping $[X(t_{-k}), X(t_{-k+1}), \ldots, X(t_0)] \mapsto [X(t_1), \ldots, X(t_{m-1}), X(t_m)]$ (see the sketch after the application list below).
2. Use it to predict future value(s) based on the past k values.
Sample applications (Ungar, in Handbook of Brain Theory and NNs, 2003)
Car sales
Airline passengers
Currency exchange rates
Electrical loads on regional power systems.
Flour prices
Stock prices (Warning: often tried, but few good, documented results).
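As a concrete illustration of the two-step recipe above, the sketch below builds sliding-window training pairs from a series and fits a tiny one-hidden-layer network with plain gradient descent. It is a minimal sketch, not code from the lecture; the window length, network size and the synthetic sine-wave series are all illustrative assumptions.

```python
# Minimal sketch (not from the slides): windowed forecasting with a tiny MLP.
# All names, sizes and the synthetic data are illustrative choices.
import numpy as np

def make_windows(series, k, m):
    """Build training pairs: k+1 past values -> next m values."""
    X, Y = [], []
    for t in range(k, len(series) - m):
        X.append(series[t - k:t + 1])      # X(t-k) ... X(t0)
        Y.append(series[t + 1:t + 1 + m])  # X(t1) ... X(tm)
    return np.array(X), np.array(Y)

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.05 * rng.standard_normal(2000)
X, Y = make_windows(series, k=10, m=1)

# One hidden layer, trained with plain gradient descent on squared error.
W1 = 0.1 * rng.standard_normal((X.shape[1], 16)); b1 = np.zeros(16)
W2 = 0.1 * rng.standard_normal((16, Y.shape[1])); b2 = np.zeros(Y.shape[1])
lr = 0.01
for epoch in range(200):
    H = np.tanh(X @ W1 + b1)          # hidden activations
    P = H @ W2 + b2                   # predictions
    E = P - Y                         # error
    dW2 = H.T @ E / len(X); db2 = E.mean(0)
    dH = (E @ W2.T) * (1 - H ** 2)    # backprop through tanh
    dW1 = X.T @ dH / len(X); db1 = dH.mean(0)
    W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2

# Predict the value following the last window (k+1 = 11 most recent values).
last = series[-11:]
print(np.tanh(last @ W1 + b1) @ W2 + b2)
```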
Brain-Computer Interfaces (BCI)
[Figure: BCI setup — brain readings from scalp EEG or implanted neural ensembles, plus neural context, are mapped to an action.]
1. Ask the subject to think about an activity (e.g. moving a joystick left).
2. Register brain activity: EEG waves (non-invasive) or neural ensembles (invasive).
3. ANN training case = (brain readings, joystick motion).
Sample applications (Millan, in Handbook of Brain Theory and NNs, 2003)
Keyboards (3 keystrokes per minute)
Artificial (prosthetic) hands
Wheelchairs
Computer games
Brains as Bio-Inspiration
"Watermelon"
Grandmother
"The truth?You can't handle
the truth.""I got a 69 Chevy
with a 396..."
Texas
Distributed Memory - A key to the brain's success, and a major difference between it and computers.
Brain operations slower than computers, but massively parallel.
How can the brain inspire AI advances?
What is the proper level of abstraction?
Signal Transmission in the Brain
[Figure: a neuron — dendrites carry synaptic potentials (SPs) toward the nucleus (soma), and the axon carries the action potential (AP) away from it.]
Action Potential (AP): a wave of voltage change along an axon. The nucleus (soma) generates an AP if the sum of its incoming synaptic potentials (SPs — similar, but weaker, voltage changes along dendrites) is strong enough. Unlike neuroscientists, AI people rarely distinguish between APs and SPs; both are just signals.
Ion Channels
[Figure: ion channels — Na+ and Ca++ influx produces depolarization; K+ efflux produces repolarization.]
Depolarization and Repolarization
[Figure: the action potential — from a resting potential of about -65 mV, Na+ gates open and Na+ influx depolarizes the membrane, overshooting toward +40 mV; Na+ gates then close and K+ gates open, and K+ efflux repolarizes the membrane with a brief undershoot below the resting potential before the K+ gates close.]
Transferring APs across a Synapse
[Figure: a synapse — an action potential (AP) arriving at the presynaptic terminal causes vesicles to release neurotransmitter (NT), which binds NT-gated ion channels on the postsynaptic terminal.]
Neurotransmitters
Excite - Glutamate, AMPA; bind Na+ and Ca++ channels.
Inhibit - GABA; binds K+ channels.
Location, Location, Location ... of Synapses
[Figure: synapses located at different distances from the soma along the dendrites (distal vs. proximal).]
Distal and Proximal Synapses: synapses closer to the soma normally have a stronger effect.
Donald Hebb (1949)
Fire Together, Wire Together
When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells, such that A's efficiency, as one of the cells firing B, is increased.
Hebb Rule: $\Delta w_{i,j} = \lambda o_i o_j$
Instrumental in binding of:
- pieces of an image
- words of a song
- multisensory input (e.g. words and images)
- sensory inputs and proper motor outputs
- simple movements of a complex action sequence
Coincidence Detection and Synaptic Change
2 Key Synaptic Changes
1. the propensity to release neurotransmitter (and the amount released) at the pre-synaptic terminal,
2. the ease with which the post-synaptic terminal depolarizes in the presence of neurotransmitters.

Coincidences
1. Pre-synaptic: Adenyl cyclase (AC) detects the simultaneous presence of Ca++ and serotonin.
2. Post-synaptic: NMDA receptors detect the co-occurrence of glutamate (a neurotransmitter) and depolarization.
Pre-synaptic Modification
[Figure: pre-synaptic modification — a salient event delivers serotonin (5HT), which together with Ca++ activates adenyl cyclase (AC); AC converts ATP to cAMP, which activates PKA in the pre-synaptic terminal. On the post-synaptic side, glutamate and depolarization act on the Mg++-blocked NMDA receptor.]
Post-synaptic Modification
[Figure: post-synaptic modification — in the polarized (relaxed) post-synaptic state, a net negative charge keeps Mg++ blocking the NMDA receptor; in the depolarized (firing) state, a net positive charge expels the Mg++, so glutamate-bound NMDA receptors admit Ca++.]
Neurochemical Basis of Hebbian Learning
Fire together: When the pre- and post-synaptic terminals of a synapse depolarize at about the same time, the NMDA channels on the post-synaptic side notice the coincidence and open, thus allowing Ca++ to flow into the post-synaptic terminal.
Wire together: Ca++ (via CaMKII and protein kinase C) promotes post- and pre-synaptic changes that enhance the efficiency of future AP transmission.
Hebbian Basis of Classical Conditioning
[Figure: classical conditioning circuit — 'see food' (US) and 'hear bell' (CS) connect via synapses S1 and S2 to the neuron driving the 'salivate' response (R).]
Unconditioned Stimulus (US) - sensory input normally associated with a response (R), e.g. the sight of food stimulates salivation.
Conditioned Stimulus (CS) - sensory input having no previous correlation with a response but which becomes associated with it, e.g. Pavlov's bell.
Long-Term Potentiation (LTP)
Early Phase
Chemical changes to pre- and post-synaptic terminals, due to AC and NMDA activity respectively, increase the probability (and efficiency) of AP transmission for minutes to hours after training.
Late Phase
Structural changes occur to the link between the upstream and downstream neuron. This often involves increases in the numbers of axons and dendrites linking the two, and seems to be driven by chemical processes triggered by high concentrations of Ca++ in the post-synaptic soma.
Abstraction
Human Brains
$10^{11}$ neurons
$10^{14}$ connections between them (a.k.a. synapses), many modifiable
Complex physical and chemical activity to transmit ONE action potential (AP) (a.k.a. signal) along ONE connection.
Artificial Neural Networks
$N = 10^{1}$–$10^{4}$ nodes
Max $N^2$ connections
All physics and chemistry represented by a few parameters associated with nodes and arcs.
Structural Abstraction
[Figure: structural abstraction — somas, axonal compartments, dendritic compartments and synapses of biological neurons are collapsed into nodes connected by weighted arcs (w).]
Diverse ANN Topologies
[Figure: six example ANN topologies, labeled A–F.]
Functional Abstraction
[Figure: functional abstraction — at the biophysical level the lipid bilayer acts as a capacitor and the ion channels as resistors in a circuit ($C_M$, $R_K$, $E_K$, $R_{Na}$, $E_{Na}$, $V_M$); at the ANN level a node N1, fed by N2 and N3 through weights $w_{12}$ and $w_{13}$, reduces to the operations Integrate, Activate, Learn and Reset.]
Main Functional Components
[Figure: node N1 receives inputs from N2 and N3 via weights $w_{12}$ and $w_{13}$ and performs Integrate, Activate, Reset and Learn.]
Integrate: $net_i = \sum_{j=1}^{n} x_j w_{i,j}$; $V_i \leftarrow V_i + net_i$
Activate: $x_i = \frac{1}{1 + e^{-V_i}}$
Reset: $V_i \leftarrow 0$
Learn: $\Delta w_{i,j} = \lambda x_i x_j$
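These four operations map directly onto a small class. The following is a minimal sketch of one way to code them, not the lecture's own implementation; the class name, the learning rate and the order of the calls are illustrative choices.

```python
# Minimal sketch of a node with Integrate / Activate / Reset / Learn,
# following the formulas above; the class itself is illustrative.
import math

class Node:
    def __init__(self, n_inputs, learning_rate=0.1):
        self.w = [0.0] * n_inputs   # w_{i,j} for each incoming connection j
        self.V = 0.0                # internal potential V_i
        self.x = 0.0                # output x_i
        self.lam = learning_rate    # lambda in the Hebbian learn rule

    def integrate(self, inputs):
        net = sum(xj * wj for xj, wj in zip(inputs, self.w))
        self.V += net               # V_i <- V_i + net_i

    def activate(self):
        self.x = 1.0 / (1.0 + math.exp(-self.V))   # logistic x_i = 1/(1+e^-V_i)
        return self.x

    def reset(self):
        self.V = 0.0                # V_i <- 0

    def learn(self, inputs):
        # Hebbian: delta w_{i,j} = lambda * x_i * x_j
        self.w = [wj + self.lam * self.x * xj for wj, xj in zip(self.w, inputs)]

# One processing step for a node with two inputs (e.g. from N2 and N3):
node = Node(2)
node.integrate([0.6, 0.9]); node.activate(); node.learn([0.6, 0.9]); node.reset()
```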
Functional Options
[Figure: node variants differing in how $V_i$ is maintained.]
Spiking neuron model: reset $V_i$ only when it exceeds a threshold.
Neurons without state: always reset $V_i$.
Never reset $V_i$: $V_i \leftarrow V_i + net_i$.
Activation Functions: $x_i = f(V_i)$
[Figure: common activation functions — identity, step (threshold T), ramp (threshold T), logistic (range 0 to 1) and hyperbolic tangent (range -1 to 1), each plotting $x_i$ against $V_i$.]
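The five functions in the figure can be written out directly; a minimal sketch follows, where the threshold T used for the step and ramp functions is an illustrative parameter.

```python
# The five activation functions x_i = f(V_i) from the figure, as simple Python
# functions; the threshold T for step/ramp is an illustrative parameter.
import math

def identity(v):            return v
def step(v, T=0.0):         return 1.0 if v > T else 0.0
def ramp(v, T=1.0):         return max(0.0, min(1.0, v / T))   # linear rise from 0 to 1 up to T
def logistic(v):            return 1.0 / (1.0 + math.exp(-v))  # range (0, 1)
def tanh_act(v):            return math.tanh(v)                # range (-1, 1)
```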
Diverse Model Semantics
What Does $x_i$ Represent?
1. The occurrence of a spike in the action potential,
2. The instantaneous membrane potential of a neuron,
3. The firing rate of a neuron (APs / sec),
4. The average firing rate of a neuron over a time window,
5. The difference between a neuron's current firing rate and its average firing rate.
Circuit Models of Neurons
[Figure: circuit model of a neuron — the lipid bilayer acts as a capacitor ($C_M$) and the ion channels act as resistors ($R_K$, $R_{Na}$), each in series with a battery ($E_K$, $E_{Na}$), together determining the membrane potential $V_M$.]
Using Kirchhoff's Current Law
The sum of all currents into the cell must be zero.
The currents:
Capacitive: $I_{cap} = C_M \frac{dV_M}{dt}$
Ionic (Potassium): $I_K = \frac{V_M - E_K}{r_K} = g_K (V_M - E_K)$
Ionic (Sodium): $I_{Na} = \frac{V_M - E_{Na}}{r_{Na}} = g_{Na} (V_M - E_{Na})$
Ionic (Leak): $I_L = \frac{V_M - E_L}{r_L} = g_L (V_M - E_L)$ = passive flow of ions through ungated channels.

where $I$ = current, $r$ = resistance, $g$ = conductance ($\frac{1}{r}$), and $V_M$ = membrane potential.

$I_{cap} + I_K + I_{Na} + I_L = 0$

$C_M \frac{dV_M}{dt} = -g_K (V_M - E_K) - g_{Na} (V_M - E_{Na}) - g_L (V_M - E_L)$
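One way to use the resulting equation is to step it forward in time with forward Euler. The sketch below does this with fixed (non-gated) conductances; the conductance, capacitance and time-step values are illustrative assumptions, while the reversal potentials follow the values quoted later in the lecture.

```python
# Forward-Euler integration of C_M dV_M/dt = -g_K(V_M-E_K) - g_Na(V_M-E_Na) - g_L(V_M-E_L).
# Conductance and capacitance values are illustrative, not from the lecture.
C_M = 1.0                            # membrane capacitance
g_K, g_Na, g_L = 0.3, 0.05, 0.02     # fixed (non-gated) conductances for this sketch
E_K, E_Na, E_L = -70.0, 50.0, -60.0  # reversal potentials (mV), as quoted later in the slides
dt = 0.01                            # time step (ms)

V_M = -65.0                          # start near rest
for step in range(10000):
    dV = -(g_K * (V_M - E_K) + g_Na * (V_M - E_Na) + g_L * (V_M - E_L)) / C_M
    V_M += dt * dV                   # Euler step
print(V_M)  # settles at the conductance-weighted mean of the reversal potentials
```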
Modeling Voltage-Gated Channels
$g_K$ and $g_{Na}$ are sensitive to the membrane potential, $V_M$.

The gating probabilities
$m$, $n$ and $h$ = gating probabilities (between 0 and 1). They are complex functions of $V_M$, determined empirically by Hodgkin and Huxley's work on the squid giant axon.

Conductances are functions of the gating probabilities
$g_K = \bar{g}_K n^4$ - since 4 identical and independent parts of a K gate need to be open; $\bar{g}_K$ = maximum K conductance.
$g_{Na} = \bar{g}_{Na} m^3 h$ - since 3 identical and independent parts (along with a different, 4th part) of an Na gate need to be open; $\bar{g}_{Na}$ = maximum Na conductance.
A Basic Version of the Hodgkin-Huxley Model
$\tau_m \frac{dV_M}{dt} = -g_K (V_M - E_K) - g_{Na} (V_M - E_{Na}) - g_L (V_M - E_L)$

$\Delta V_M \propto$ inflow(Na+) - outflow(K+) - leak current
$E_L \approx -60\,mV$, $E_K \approx -70\,mV$, and $E_{Na} \approx 50\,mV$
$\tau_m$ includes the capacitance, $C_M$.
Leaky Integrate and Fire Neurons
[Figure: a leaky integrate-and-fire neuron i — inputs $x_a$, $x_b$, $x_c$ arrive via weights $w_{ia}$, $w_{ib}$, $w_{ic}$, a leak pulls $V_i$ toward $E_L = -65\,mV$, and the output is $x_i$.]
These models ignore ion channels and activity along axons and dendrites.
A Simple Leak-and-Integrate Model
$\tau_m \frac{dV_i}{dt} = c_L (E_L - V_i) + c_I \sum_{j=1}^{N} x_j w_{ij}$   (1)

$V_i$ = intracellular potential for neuron i.
$x_i$ = output (current) from neuron i.
$w_{ij}$ = weight on the connection from j to i.
$E_L$ = extracellular potential.
$\tau_m$ = membrane time constant; higher $\tau_m$ → slower change.
$c_L$, $c_I$ = leak and integration constants.
A Common Abstraction
$\tau_m \frac{dV_i}{dt} = -V_i + \sum_{j=1}^{N} x_j w_{ij}$   (2)
Firing Models
Continuous: Sigmoid Function
$x_i = \frac{1}{1 + e^{-c_s V_i}}$   (3)

* Often used for rate coding, where $x_i$ = the neuron's firing rate; $c_s$ is a scaling constant.
Discrete: Step Function with Reset
$x_i = \begin{cases} 1 & \text{if } V_i > T_f \\ 0 & \text{otherwise} \end{cases}$   (4)

$V_i \leftarrow V_{reset}$ after exceeding the threshold, $T_f$.
Typical values: $V_{reset} = -65\,mV$, $T_f = -50\,mV$.
Often used in spiking neuron models, where $x_i$ is binary, denoting the presence or absence of an action potential.
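Combining the common abstraction (eq. 2) with the discrete firing model (eq. 4) gives a basic leaky integrate-and-fire simulation. The sketch below is mine, not the lecture's: the time constant, time step, weights and input statistics are arbitrary, and the leak is measured relative to $V_{reset}$ (so the resting level matches the -65 mV quoted above), which is an interpretive choice.

```python
# Leaky integrate-and-fire sketch: eq. (2) drives the membrane, eq. (4) decides firing.
# tau_m, dt, the weights and the random inputs are illustrative; V_reset and T_f follow
# the typical values above. The leak is taken relative to V_reset so the resting level
# sits at -65 mV -- an interpretive choice, since eq. (2) decays toward 0.
import numpy as np

tau_m, dt = 10.0, 1.0             # membrane time constant and time step (ms)
V_reset, T_f = -65.0, -50.0       # reset potential and firing threshold (mV)
w = np.array([20.0, 15.0, 25.0])  # weights w_ij from three presynaptic neurons

rng = np.random.default_rng(0)
V, spike_times = V_reset, []
for t in range(200):
    x = (rng.random(3) < 0.3).astype(float)   # binary presynaptic spikes x_j
    V += dt * (-(V - V_reset) + w @ x) / tau_m
    if V > T_f:                   # eq. (4): emit a spike, then reset V_i
        spike_times.append(t)
        V = V_reset
print(len(spike_times), "spikes at", spike_times[:5])
```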
Temporal Abstraction
[Figure: temporal abstraction — detailed membrane-potential traces (spiking between -65 mV and +40 mV) for neurons A, B and C over time, versus their abstracted activation levels (0.8, 0.5, 0.4).]
Spike Response Model (SRM) - Gerstner et al., 2002

$V_i(t) = \kappa(I_{ext}) + \eta(t - \hat{t}_i) + \sum_{j=1}^{N} w_{ij} \sum_{h=1}^{H} \varepsilon_{ij}(t - \hat{t}_i,\, t - t_j^h)$
[Figure: SRM kernels — each spike of a presynaptic neuron j, and neuron i's own most recent spike at $\hat{t}_i$, trigger kernels that together shape $V_i(t)$.]
The timing of each spike is very important in determining its effects upon downstream neurons.
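The slide leaves the kernels $\kappa$, $\eta$ and $\varepsilon$ unspecified, so the sketch below fills them in with simple exponential forms, which is one common choice; every concrete function and parameter value here is therefore an assumption, not part of the lecture.

```python
# Spike Response Model sketch. The slide does not specify the kernels, so the
# exponential eta (refractoriness) and epsilon (post-synaptic potential) below,
# and all parameter values, are assumptions. This epsilon ignores its possible
# dependence on the neuron's own last spike, which the general form allows.
import math

tau_refr, tau_s = 20.0, 5.0        # assumed time constants (ms)
eta0, eps0 = 15.0, 1.0             # assumed kernel amplitudes

def eta(s):                        # effect of neuron i's own last spike at t_hat_i
    return -eta0 * math.exp(-s / tau_refr) if s >= 0 else 0.0

def eps(s_own, s_pre):             # effect of one presynaptic spike at t_j^h
    return eps0 * math.exp(-s_pre / tau_s) if s_pre >= 0 else 0.0

def V(t, t_hat_i, w, pre_spike_times, kappa_ext=0.0):
    """V_i(t) = kappa(I_ext) + eta(t - t_hat_i) + sum_j w_ij sum_h eps(t - t_hat_i, t - t_j^h)."""
    total = kappa_ext + eta(t - t_hat_i)
    for w_ij, spikes in zip(w, pre_spike_times):
        total += w_ij * sum(eps(t - t_hat_i, t - t_h) for t_h in spikes)
    return total

# Potential of neuron i at t = 30 ms, own last spike at 25 ms, two presynaptic neurons:
print(V(30.0, 25.0, w=[0.8, 0.5], pre_spike_times=[[26.0, 28.0], [27.0]]))
```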
Spiking Neurons
Eugene Izhikevich, 2003
A Simple Model of Spiking Neurons. IEEE Transactions on Neural Networks, 14(6).
$\tau_m \frac{dV_i}{dt} = 0.04 V_i^2 + 5 V_i + 140 - U_i + c_I \sum_{j=1}^{N} x_j w_{ij}$   (5)

$\tau_m \frac{dU_i}{dt} = a (b V_i - U_i)$   (6)
$U_i$ = recovery factor.
If $V_i \geq 30\,mV$ then $V_i \leftarrow V_{reset}$ and $U_i \leftarrow U_i + U_{reset}$.
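A forward-Euler sketch of eqs. (5)-(6) follows. The values of $a$, $b$, $V_{reset}$ and $U_{reset}$ are Izhikevich's standard "regular spiking" parameters, $\tau_m$ is taken as 1, and the constant drive I stands in for $c_I \sum_j x_j w_{ij}$ — all illustrative choices rather than values from the slide.

```python
# Forward-Euler integration of the Izhikevich model, eqs. (5)-(6), with tau_m = 1.
# a, b, V_reset and U_reset are Izhikevich's standard "regular spiking" values;
# the constant drive I stands in for c_I * sum_j x_j w_ij and is arbitrary.
a, b = 0.02, 0.2
V_reset, U_reset = -65.0, 8.0
dt = 0.25                     # ms
V, U = -65.0, b * -65.0       # typical initial conditions
I = 10.0                      # external drive

spike_times = []
for step in range(4000):      # 1000 ms of simulated time
    V += dt * (0.04 * V * V + 5.0 * V + 140.0 - U + I)
    U += dt * a * (b * V - U)
    if V >= 30.0:             # spike: reset V and bump the recovery variable U
        spike_times.append(step * dt)
        V, U = V_reset, U + U_reset
print(spike_times[:5])
```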
Parameterized Spiking Patterns
[Figure: voltage traces $V_i$ over time for regular spiking, chattering, intrinsic bursting and thalamocortical neurons.]

Key parameters $a$, $b$, $V_{reset}$, and $U_{reset}$ → spike patterns.
Continuous Time Recurrent Neural Networks
[Figure: a CTRNN with a 5-node sensory input layer, a 2-node hidden layer, a 2-node motor output layer, and a bias node B.]
CTRNNs abstract away spikes but achieve complex dynamics with neuron-specific time constants, gains and biases.
All weights evolve, but none are modified by learning.
Invented by Randall Beer in the early 1990s and used in many evolved, minimally cognitive agents.
The Simple CTRNN Model
$s_i = \sum_{j=1}^{n} x_j w_{i,j} + I_i$

$\frac{dV_i}{dt} = \frac{1}{\tau_i} [-V_i + s_i + \theta_i]$

$x_i = \frac{1}{1 + e^{-g_i V_i}}$

$\theta_i$ = bias; $g_i$ = gain.
$\tau_i$ = time constant for neuron i.
Each neuron implicitly runs at a different temporal resolution.
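A forward-Euler sketch of the three CTRNN equations for a small fully connected network follows; the network size, weights, biases, gains, time constants and inputs are arbitrary illustrative values.

```python
# Forward-Euler integration of the CTRNN equations for a small recurrent network.
# All sizes, weights, biases, gains and time constants are illustrative values.
import numpy as np

rng = np.random.default_rng(1)
n = 3
W     = rng.uniform(-2, 2, size=(n, n))   # w_{i,j}
theta = rng.uniform(-1, 1, size=n)        # biases theta_i
g     = rng.uniform(0.5, 2.0, size=n)     # gains g_i
tau   = rng.uniform(0.5, 5.0, size=n)     # per-neuron time constants tau_i
I     = np.array([0.5, 0.0, 0.0])         # external input I_i
dt    = 0.05

V = np.zeros(n)
x = 1.0 / (1.0 + np.exp(-g * V))
for step in range(1000):
    s = W @ x + I                          # s_i = sum_j x_j w_{i,j} + I_i
    V += dt * (-V + s + theta) / tau       # dV_i/dt = (1/tau_i)[-V_i + s_i + theta_i]
    x = 1.0 / (1.0 + np.exp(-g * V))       # x_i = logistic(g_i V_i)
print(x)
```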
Essence of Learning in Neural Networks
[Figure: pre-synaptic neurons $u_1, u_2, \ldots, u_n$ connect through weights $w_1, w_2, \ldots, w_n$ to a post-synaptic neuron $v$; the learning question is how to compute each $\Delta w$.]
Most ANNs model neither spikes nor STDP. Learning is based on a comparison of the recent firing rates of neuron pairs.
Spike-Timing Dependent Plasticity (STDP)
[Figure: STDP curve — change in synaptic strength $\Delta s$ (up to about ±0.4%) as a function of $\Delta t$, over roughly -40 ms to +40 ms.]
Change in synaptic strength ($\Delta s$) as a function of $\Delta t = t_{pre} - t_{post}$, the times of the most recent pre- and post-synaptic spikes. The maximum magnitude of change is roughly 0.4% of the maximum possible synaptic strength/conductance.
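The slide specifies only the shape and scale of the curve; an exponential window is a common way to model it. In the sketch below, the exponential form and the decay constant are assumptions, while the ±0.4% amplitude follows the slide.

```python
# Exponential STDP window, matching the slide's scale (max ~0.4% change, ~40 ms span).
# The exponential form and time constant are assumptions; the slide only shows the curve.
import math

A_max = 0.004       # 0.4% of the maximum synaptic strength
tau_stdp = 15.0     # assumed decay constant (ms)

def stdp(dt_pre_minus_post):
    """Fractional change in synaptic strength for delta_t = t_pre - t_post."""
    if dt_pre_minus_post < 0:           # pre fires before post: potentiation
        return  A_max * math.exp(dt_pre_minus_post / tau_stdp)
    else:                               # pre fires after post: depression
        return -A_max * math.exp(-dt_pre_minus_post / tau_stdp)

for d in (-40, -10, 0, 10, 40):
    print(d, round(stdp(d), 5))
```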
3 Fundamental ANN Learning Paradigms
Supervised
Constant, detailed feedback that includes the correct response to each input; an omnipresent teacher.

Reinforced
Simple feedback mainly at the end of a problem-solving attempt, although possibly a few intermediate rewards or penalties, but no direct response recommendations.

Unsupervised
No feedback whatsoever. The ANN normally tries to intelligently cluster the inputs and/or learn proper correlations between components of the input space.
Supervised Learning
[Figure: supervised learning — sensory input produces a motor output, which is compared to the correct action ("You should have turned RIGHT at the last intersection"); the resulting error drives the weight changes $\Delta W$.]
Reinforced Learning
[Figure: reinforced learning — a reinforcement signal ("You are at the goal!") drives weight changes $\Delta w$ throughout the network.]
Unsupervised Learning
[Figure: unsupervised learning — only the input stream is available ("A long trip down a corridor is followed by a left turn"); weight changes $\Delta w$ reflect correlations in the input.]
Hebbian Learning Rules
General Hebbian: $\Delta w_i = \lambda u_i v$
Basic Heterosynaptic: $\Delta w_i = \lambda v (u_i - \theta_i)$
Basic Homosynaptic: $\Delta w_i = \lambda (v - \theta_v) u_i$
BCM: $\Delta w_i = \lambda u_i v (v - \theta_v)$
Oja: $\Delta w_i = u_i v - w_i v^2$
Homosynaptic
All active synapses are modified the same way, depending only on the strength of the postsynaptic activity.

Heterosynaptic
Active synapses can be modified differently, depending upon the strength of their presynaptic activity.
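The five update rules from the table can be applied to a single postsynaptic neuron as sketched below; the learning rate, thresholds, activities and the linear computation of v are arbitrary illustrative choices.

```python
# The five Hebbian-family weight updates from the table above, for one postsynaptic
# neuron v with presynaptic activities u. Learning rate and thresholds are arbitrary.
import numpy as np

lam = 0.1
theta_i = np.array([0.2, 0.2, 0.2])    # per-synapse presynaptic thresholds
theta_v = 0.5                          # postsynaptic threshold

def general_hebb(w, u, v):    return w + lam * u * v
def heterosynaptic(w, u, v):  return w + lam * v * (u - theta_i)
def homosynaptic(w, u, v):    return w + lam * (v - theta_v) * u
def bcm(w, u, v):             return w + lam * u * v * (v - theta_v)
def oja(w, u, v):             return w + (u * v - w * v**2)

w = np.array([0.1, 0.3, 0.5])          # current weights
u = np.array([0.9, 0.1, 0.6])          # presynaptic firing rates
v = float(w @ u)                       # postsynaptic rate (linear unit, for illustration)
for rule in (general_hebb, heterosynaptic, homosynaptic, bcm, oja):
    print(rule.__name__, rule(w, u, v))
```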
Modelling Options to Consider
1. Single or multiple neurons?
2. Can neuron A send more than one axon to neuron B?
3. Are connections modeled as cables or just simple connector points (i.e. a single weight)?
4. Do neurons have state? I.e., does $V_i(t+1)$ depend on $V_i(t)$?
5. Do outputs ($x_i$) represent individual spikes, spike rates, or something else?
6. Are neurons organized by layers?
7. Do layers follow a feed-forward topology or is there recurrence (i.e. looping)?
8. Are neurons connected within layers or only between layers?
9. Is learning supervised, unsupervised or reinforced?
10. Is spike-timing dependent plasticity (STDP) involved in the learning rule?