


Hardware implementation of a pulse-stream neural network

R.J. Haycock, T.A. York

Indexing terms: Neural networks, Synapses, BiCMOS technology, Modelling, Numerical analysis

© IEE, 1998
IEE Proceedings online no. 19981923
Paper first received 14th July 1997 and in revised form 16th January 1998
The authors are with the Department of Electrical Engineering and Electronics, UMIST, Sackville Street, PO Box 88, Manchester, M60 1QD, UK

Abstract: The authors describe the design and test of an artificial neural network, using a pulse-stream approach, that is implemented using BiCMOS technology. Networks are constructed from arrays of customised neuron chips and synapse chips. The neuron chip uses novel circuitry to implement an accurate sigmoid transfer characteristic. The synapse chip uses a new pulse-stream implementation of the differential amplifier and requires only five transistors to produce a linear multiplier. Measured results from the chips show that the neuron has an accurate sigmoid transfer characteristic and gradient suitable for the error backpropagation learning algorithm. The synapse has excellent 1% linearity and properties suitable for multiplication. The chips have been used to implement a three-layer artificial neural network which has been tested using hard learning problems.

1 Introduction

Biological neural networks are extremely good at learning and performing tasks such as content addressable memory, control and, especially, pattern recognition. Artificial neural networks (ANNs) [1, 2] aim to emulate these networks and, like their natural counterparts, consist of highly parallel structures of neurons and synapses. Individually, these are simple processing elements. Each synapse multiplies its input, which can be from another neuron or an input to the network, by the synaptic weight. Each neuron sums the outputs of all the synapses that are connected to its input and then passes the result through a transfer function to form the neuron output. The architecture for a typical simple ANN is shown in Fig. 1. Training of ANNs typically consists of presenting input patterns to the network, calculating the resulting output and comparing this with the desired value, then propagating the difference back through the network while adjusting the synaptic weights accordingly. The adjusted synaptic weights then reduce the error of the network output when subjected to further input patterns.
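As a point of reference for the hardware described in later sections, the following is a minimal software sketch of the forward pass just described, assuming the logistic sigmoid of eqn. 1 (introduced below); the layer sizes and random weights are purely illustrative.

```python
import numpy as np

def sigmoid(activation):
    # Eqn. 1 (below): output = 1 / (1 + exp(-activation))
    return 1.0 / (1.0 + np.exp(-activation))

def forward(inputs, layer_weights):
    """Forward pass through a fully connected layered network: each synapse
    multiplies its input by a weight, each neuron sums its synapse outputs
    and applies the sigmoid transfer function."""
    x = inputs
    for W in layer_weights:
        x = sigmoid(W @ x)
    return x

# Illustrative 2-2-1 network (the topology used for XOR later in the paper)
rng = np.random.default_rng(0)
layer_weights = [rng.uniform(-1, 1, (2, 2)), rng.uniform(-1, 1, (1, 2))]
print(forward(np.array([0.0, 1.0]), layer_weights))
```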

Fig. 1 Typical artificial neural network structure (network inputs, synapses, and input, hidden and output layers)

The architecture of neural networks means that, although the individual processing elements may not themselves be very fast, the network as a whole is able to process data very quickly if the inherent parallelism is exploited. This is because the processing elements operate upon data concurrently, rather than in a sequential manner. Implementing ANNs, which can consist of hundreds of neurons and thousands of synapses, on sequential digital computers does not exploit the full performance benefits that are inherent in the network. Therefore, implementing ANNs in hardware which exploits the parallel structure is desirable.

Digital hardware implementations are accurate but relatively bulky, primarily because of the size of the digital multipliers that are used for each synapse [3-5]. Analogue hardware implementations are able to implement small and very fast multipliers but are not as accurate, being more susceptible to process variations and noise [3-5]. Hybrid approaches, such as pulse-stream neural networks, aim to combine the best attributes of digital and analogue implementations and are an excellent compromise [3]. Pulse-stream ANNs [3, 6, 7] use digital pulses to transmit neural information, making them robust, and use analogue processing elements, making them small and fast.

The popular error backpropagation learning algorithm demands a transfer function that is monotonic and continuous. It employs neurons with a sigmoid transfer function, as shown in Fig. 2 and described by eqn. 1:

output = 1 / (1 + exp(-activation))   (1)



where output is the neuron output, and activation is the summed neuron input.

Fig. 2 Sigmoid transfer characteristic

A particularly attractive feature of this activation function is that the gradient, used in propagating the errors, can be expressed, solely, in terms of the output as shown in eqn. 2:

d(output)/d(activation) = output(1 - output)   (2)
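A quick numerical check of eqn. 2, assuming the logistic form of eqn. 1 (the grid and finite-difference step are arbitrary):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

a = np.linspace(-6.0, 6.0, 13)
h = 1e-6
numeric = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)   # central difference
analytic = sigmoid(a) * (1.0 - sigmoid(a))              # eqn. 2
print(np.max(np.abs(numeric - analytic)))               # round-off level, ~1e-10
```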

Accurate electronic implementation of a sigmoid transfer function is not straightforward and approximations are frequently used. The differential amplifier has been used [6, 8] because it has a transfer function which is monotonic, continuous and saturates at the extremes. However, this implements the tanh function [9] and, consequently, its derivative does not accurately represent the desired gradient because the curves are more acute. Furthermore, it is difficult to produce a truly symmetrical function using the differential amplifier. This results in problems during training because of discrepancies between the error of the network and the error calculated using eqn. 2. Experiments carried out by the authors suggest that this often results in longer training times and, in some cases, very different weight values than those obtained using eqn. 1. CMOS inverters have also been used to implement the neurons [10, 11] and these have similar limitations.

In this paper, an asynchronous neuron is implemented with an accurate sigmoid function. The neuron accepts an activation voltage as input and converts this to a time delay via a sigmoid transfer function. This time delay determines the period of the pulse-stream neuron output. Design and implementation of the neuron is described in Section 2.

The outputs from the neurons in a layer of a neural network are multiplied by the synaptic weights and then summed to form the activation voltage for the neurons in the next layer. In addition to linear operation, the synapse must be small, to offer the possibility of integrating many of them, and asynchronous in operation, to obtain the full parallel performance of ANNs. Typical pulse-stream implementations involve integrating the pulses of charge produced by amplitude modulation of the neural pulse stream. The synapse described by Murray [6, 8] requires only five transistors and implements an asynchronous synapse in a small area. However, the synapse uses, essentially, a current-source-load inverter. Consequently, linearity can be poor due to variation of the output voltage, and the resulting effects are observable in the literature [3]. Section 3 describes a new pulse-stream synapse based on a differential amplifier. It also requires five transistors but has a superior linear region of operation. Section 4 presents the results of tests of a small neural network that was constructed using the neuron and synapse chips.

Fig. 3 Basic sigmoid stage

2 Neuron chip

In this paper, a sigmoidal transfer function is obtained by charging a capacitor, C_sum(neuron), with two currents as shown in Fig. 3. I_exp represents the input to the neuron and I_const is constant. The time taken to charge the capacitor to a threshold voltage, V_thres, assuming a perfect capacitor and current mirrors (M1, M2 and M3, M4), is given by:

t = C_sum(neuron) V_thres / (I_const + I_exp)   (3)

where t is time, C_sum(neuron) is the summing capacitance, V_thres is the capacitor voltage at time t, and I_exp and I_const are the two current sources.

Fig. 4 Simple exponential current source

Current I_exp is implemented using a bipolar transistor (T1) together with a CMOS inverter (M5, M6) and current mirror as shown in Fig. 4. Therefore, I_exp varies exponentially with the base-emitter voltage (V_BE) of T1. Substituting for I_exp using the Ebers-Moll transport model, eqn. 3 can be rewritten as:

t = C_sum(neuron) V_thres / (I_const - I_s + I_s exp(q V_BE / kT))   (4)

where k is Boltzmann's constant, T is temperature, q is the electron charge and I_s is the saturation current. If V_BE represents the activation voltage of the neuron then eqn. 4, assuming constant temperature, implements the transfer function given in eqn. 1, in which the input is a voltage (V_BE) and the output is the time to charge the capacitor (t). The gradient of eqn. 4 is required by the backpropagation learning algorithm [1].



It is readily calculated as:

dt/dV_BE = (-A) t (1 - B t)   (5)

where A = q/kT and B = (I_const - I_s)/(C_sum(neuron) V_thres).
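A small numerical sketch of eqns. 4 and 5 under the constant-temperature assumption; the component values below are illustrative choices, not those of the fabricated neuron.

```python
import numpy as np

k = 1.380649e-23      # Boltzmann's constant, J/K
q = 1.602176634e-19   # electron charge, C
T = 300.0             # assumed temperature, K
C_sum = 1e-12         # assumed summing capacitance, F
V_thres = 2.5         # assumed threshold voltage, V
I_const = 1e-6        # assumed constant current, A
I_s = 1e-15           # assumed bipolar saturation current, A

def charge_time(v_be):
    # Eqn. 4: t = C_sum * V_thres / (I_const - I_s + I_s * exp(q*V_BE/(k*T)))
    return C_sum * V_thres / (I_const - I_s + I_s * np.exp(q * v_be / (k * T)))

def charge_time_gradient(v_be):
    # Eqn. 5: dt/dV_BE = (-A) * t * (1 - B*t),
    # with A = q/(kT) and B = (I_const - I_s)/(C_sum * V_thres)
    t = charge_time(v_be)
    A = q / (k * T)
    B = (I_const - I_s) / (C_sum * V_thres)
    return -A * t * (1.0 - B * t)

v = np.linspace(0.45, 0.65, 5)
h = 1e-7
numeric = (charge_time(v + h) - charge_time(v - h)) / (2 * h)
print(np.allclose(numeric, charge_time_gradient(v), rtol=1e-4))   # True
```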

In the circuit shown in Fig. 4, the emitter voltage of T1 is raised to reduce the potential difference V_BE and this, in conjunction with a CMOS inverter comprising M5 and M6, limits the base current. This also helps to reduce the effects of temperature [12] and is beneficial in reducing the Early effect. An input amplifier with a gain of 0.5 has also been added and is used to increase the dynamic range of the activation voltage. This also reduces the effects of noise on the input, because of the attenuation. The inputs are integrated on the summing capacitor and, therefore, noise performance is good, because the effects of high-frequency noise, such as from neighbouring neurons and digital circuits, are averaged. MOSFETs M1, M2 and M3, M4, which form two current mirrors, employ long channels. This reduces the channel-length modulation parameter which, in turn, reduces the effects of changes in V_DS on the mirrored current. Long channels are also beneficial in improving matching.

Fig. 5 Neuron circuit

Fig. 5 shows the complete neuron. Compensation for the temperature term in eqn. 4 is obtained by the addition of a second bipolar transistor, T2. This acts to compensate for changes with temperature of the term V_BE, for T1, by adjusting the base voltage V_B. The dimensions of M5 and M6 are chosen such that, not only is the inverter linear in the required operating region, but the changes with temperature of the MOSFET drain currents are approximately equal. Solving dV_B/dT = dV_BE/dT and dI_DS5/dT = dI_DS6/dT results in

β_6 = 1.6 β_5   (8)

for the chosen technology. An alternative approach would be the use of a pnp transistor with the same temperature characteristic as T1 to replace M6 [12], but this was not possible with the available fabrication process.

When the voltage on C_sum(neuron) reaches V_thres, a pulse circuit is triggered. The pulse circuit produces pulses of constant width, and the time between pulses represents the neuron output; as this time is that taken by the sigmoid stage to charge C_sum(neuron) to V_thres, the output is a sigmoid function of the activation voltage. The functionality of the pulse stage is that of a simple astable multivibrator, as shown in Fig. 6, and is implemented using digital library components. The use of digital cells enables a logic high to be readily used for the threshold voltage from the sigmoid stage and, therefore, enables the sigmoid and pulse stages to be interfaced directly. The effects of temperature on this threshold voltage are small compared to the effects of temperature on the currents I_exp and I_const. Simulation revealed only a 0.6 mV/°C change in the threshold voltage. Similarly, process variation across a wafer results in a change of approximately 5 mV in the threshold voltage. Both effects are sufficiently small to be ignored. The constant pulsewidth of the astable is determined by an RC constant. To save silicon area the resistance is implemented using an active resistor. The pulse circuit also resets C_sum(neuron). The net effect of the circuits shown in Figs. 5 and 6 is to produce a pulse stream whose output, the time between pulses, is related to the input voltage by a sigmoid expression of the form given in eqn. 1, with the gradient given by eqn. 2. The present neuron design uses a pulsewidth of 0.5 µs and a time between pulses that varies from 0.5 µs to 10.1 µs.
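The pulse-stream encoding can be sketched as follows, using the figures quoted above (a constant 0.5 µs pulse width and an inter-pulse time set by the sigmoid stage); this is an illustrative helper, not a model of the astable circuit itself.

```python
def pulse_edges(gap_s, pulse_width_s=0.5e-6, n_pulses=5, t0=0.0):
    """Rising and falling edge times of the neuron's pulse stream.

    The neuron output value is carried by gap_s, the time between pulses
    set by the sigmoid stage (0.5 us to 10.1 us in the present design);
    the pulse width itself is constant.
    """
    edges = []
    t = t0
    for _ in range(n_pulses):
        t += gap_s                        # capacitor charges to V_thres
        edges.append((t, t + pulse_width_s))
        t += pulse_width_s                # astable emits a fixed-width pulse
    return edges

# Example: a neuron near the middle of its output range (gap of ~5 us)
for rise, fall in pulse_edges(5e-6):
    print(f"pulse from {rise * 1e6:.1f} us to {fall * 1e6:.1f} us")
```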

Fig. 6 CMOS astable circuit

A neuron chip has been fabricated using the AMS 1.2 µm BiCMOS process. This technology was chosen in preference to a CMOS process together with a parasitic bipolar transistor, which would have poor matching and current gain [13]. The chip includes 10 neurons together with a high-stability current source [14] for the constant current I_const, which is based on a V_BE-referenced current source [13]. The layout of the neuron measures 600 x 190 µm, although a significant proportion of this is related to additional test circuits. A more compact neuron in the same technology could be realised in an area measuring approximately 400 x 190 µm. Maximum power consumption of the neuron occurs when the exponential current source is fully turned on. In this case, the power consumption of the neuron is 100 µW. Fig. 7 shows graphs of neuron output, represented by the time between pulses, against activation voltage. The two curves are for the ideal case and measured test data averaged across the chip. Clearly, there is excellent correspondence. The measurements were taken using a source monitor unit to provide the activation voltage, and the output timing was measured



using a digitising oscilloscope. The standard deviation of the measurements is largest in the region of the transition between the two extreme values of the neuron output. This corresponds to a variation in activation voltage of approximately 20 mV to maintain each neuron at the midpoint of the sigmoid characteristic. The cause of this has been attributed to the effects of process variation on the input amplifier. This uses two active-load inverters and is consequently sensitive to the effects of variation in the transconductance of the MOSFET transistors.

Fig. 7 Neuron transfer characteristic: ideal transfer function (calculated) and test measurement (chip average)

Fig. 8 Neuron gradient: ideal gradient (calculated) and test measurements (chip average)

Measurements from the wafer test sites showed that the transconductance parameter varied by as much as 1 µA/V² and 0.2 µA/V² for the nMOS and pMOS transistors, respectively. When variations of this magnitude are used in simulations of the amplifier, the output voltage was found to change by 34 mV for a fixed input voltage. Consequently, the variation between neurons is a result of the input amplifier and not a fundamental problem with the neuron. Future neurons will use a different design of input amplifier. Fig. 8 shows results for the gradient of the transfer function. Note that the plots slope in the opposite direction to that described by eqn. 4. This is


because of the inversion caused by transistors M5, M6 and T1.

Fig. 9 Measured temperature effects on neuron

Fig. 10 Simulated temperature effects on compensated sigmoid stage

Fig. 11 Simulated temperature effects on uncompensated sigmoid stage

The sigmoid transfer characteristic of the neuron was measured over the temperature range 0°C to 65°C. The measured results are shown in Fig. 9. These were disappointing and were not as good as the simulated results of the sigmoid stage shown in Fig. 10. However, they are superior to the uncompensated results shown in Fig. 11. The measured results show that, for a given activation voltage, the output, represented as the time between pulses, varies by, at most, 4 µs over the temperature range 0°C to 65°C. This compares to 1 µs



and 8 µs for the simulated compensated and uncompensated versions, respectively. The reduced temperature compensation was due to process variation, which resulted in the change with temperature of the drain currents of M5 and M6 in Fig. 5 no longer being as well matched.

Fig. 12 Synapse (neuron pulse stream applied via a current mirror)

3 Synapse chip

The asynchronous pulse-stream synapse is implemented using a differential amplifier, as shown in Fig. 12. This compares favourably with the design described by Murray [6, 8]. The differential amplifier has a linear region whose limits, for 1% linearity, are given in [15] in terms of the transconductance K, the amplifier tail current I_5 and the differential input voltage V_id (V_ref - W_ij).

Pulse-mode operation is obtained by switching the tail current of the differential amplifier. When the amplifier is turned off, almost zero current flows through the output. Measured test results using a source monitor unit show this current to be ±10 pA. However, if the output voltage is sufficiently large, the amplifier can start to sink current when it should be turned off, because the drain-source voltage across the output transistor is then sufficient to operate it in its linear region. The amplifier has been designed such that normal operation is outside these conditions.

When turned on, the differential amplifier works as normal and sinks or sources current, depending on the value of the weight voltage in relation to the reference voltage. Switching speed is limited by the size of the transistors, as their capacitance determines how quickly the amplifier can be turned on and off. Consequently, dimensions are kept small. This is also important with respect to the overall size of the synapse. The charge of each pulse supplied by the amplifier is

Q_pulse = I t_pulse = f(W_ij) t_pulse   (10)

where Q_pulse is the charge supplied by a pulse, I is the current magnitude of the pulse and t_pulse is the pulsewidth. The voltage on the summing capacitor after a time t is, therefore,

V_c = N_pulse f(W_ij) t_pulse / C_sum(synapse)   (11)

where C_sum(synapse) is the summing capacitance, V_c is the voltage on C_sum(synapse) after time t and N_pulse is the number of pulses in time t. Multiplication of the neuron output with the weight voltage, and summation of these products, is readily performed. The function f(W_ij) is the synapse output current in eqn. 11. This is


calculated from the CMOS differential amplifier equation [13]:

I_out = K (V_ref - W_ij) [I_5/K - (V_ref - W_ij)^2 / 4]^(1/2)   (12)
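As an illustration of the synapse model, the sketch below evaluates eqn. 12 for the output current and accumulates charge as in eqn. 11, summed over several inputs sharing one summing capacitor as described in the text; the parameter values (K, I_5, V_ref, C_sum(synapse), t_pulse) are assumptions for illustration, not the fabricated values.

```python
import numpy as np

K = 20e-6           # assumed transconductance of the p-channel pair, A/V^2
I5 = 10e-6          # assumed differential-amplifier tail current, A
V_ref = 2.5         # assumed reference voltage on the fixed input, V
C_sum = 10e-12      # assumed synapse-chip summing capacitance, F
t_pulse = 0.5e-6    # neuron pulse width, s

def f_w(w_ij):
    """Eqn. 12: differential-pair output current for weight voltage w_ij."""
    v_id = V_ref - w_ij
    return K * v_id * np.sqrt(np.maximum(I5 / K - v_id**2 / 4.0, 0.0))

def summed_voltage(weights, n_pulses):
    """Eqn. 11, summed over the synapses sharing one summing capacitor:
    each pulse from input j delivers a charge f(W_ij) * t_pulse."""
    return np.sum(n_pulses * f_w(weights)) * t_pulse / C_sum

w = np.array([2.3, 2.5, 2.7])   # weight voltages either side of V_ref
n = np.array([4, 10, 6])        # pulses received from each input neuron
print(f_w(w))                    # per-synapse sink/source currents, A
print(summed_voltage(w, n))      # accumulated activation voltage, V
```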

A 10 by 10 array of synapses has been manufactured using the Mietec 2.4 µm CMOS process. The array consists of p-channel differential pairs. These have a smaller transconductance than n-channel devices and, consequently, the linear range of the synapse is increased. The activation voltage for input to the next neuron chip is obtained by sampling the voltage on the capacitor C_sum(synapse). The voltage on this capacitor represents the summed products of the inputs to the synapse chip and the associated weights. The voltage on C_sum(synapse) is measured using a sample-and-hold circuit which is implemented using a CMOS switch, a capacitor and two op-amps configured with unity gain. After each sample, the capacitor is reset ready for the next charging cycle. The advantage of the sample-and-hold circuit, as compared to an integrator, is that it does not require operational amplifiers. Therefore, it uses less silicon and requires only one capacitor instead of a capacitor and a resistor. Sample-and-hold times are similar to the maximum neuron output period, and so this approach has little effect on the speed of a pulse-stream neural network. Analogue weight storage is achieved by using capacitors which are refreshed using an external 8-bit DAC and RAM. This is a simple method of achieving otherwise complicated and expensive analogue memory, such as that based upon EEPROM technologies. The size of the synapse cell is 206 x 230 µm, of which the main feature is a large weight-storage capacitor; the multiplier occupies less than a quarter of the cell area. A large capacitor is used to allow long refresh cycles and a minimum of 0.5 LSB accuracy. This is similar to the synapse described by Murray [3], one version of which uses 2 µm ES2 technology and has dimensions of 165 x 130 µm. Maximum power consumption of the new synapse is 500 µW and occurs when the synapse is turned on. This is quite large and is a result of the large differential tail current and large supply voltage. Reducing the power consumption is easily performed but care would need to be taken to ensure a suitable linear range for the synapse.

Fig. 13 Measured output current of synapse


Fig. 13 shows the average measured output current from 50 random synapses, a 5% sample of the available synapses on the 10 chips manufactured. As can be seen from Fig. 13, the synapse has excellent linearity and accuracy. Fig. 14 shows typical measured results for multiplication by a synapse and compares these to the ideal. As can be seen, eqn. 11 closely models synapse chip performance, which is linear over the required range for matching to the neuron. The synapse chip, which consists of a 10 x 10 synapse array, performs 100 million connections per second (10 x 10 x 10^6) at the maximum output rate of the present neuron chips. A network containing all 10 synapse chips is capable of 1 giga connections per second. The sample-and-hold time of 10 µs means that the propagation speed through the synapse chip is a maximum of 10 µs. This performance is limited by the sample-and-hold circuit and the neuron output frequency. The synapse chip obviously offers high performance, which is a consequence of its analogue implementation. The results compare very favourably with digital ANNs, for which connections per second are often quoted as a measure of performance, with recent implementations offering speeds of 1.37 giga connections per second [16] and 254 million connections per second [17], for chips of 10^6 and 4600 synapses, respectively.
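The throughput figures quoted above can be checked in a few lines; the 1 MHz maximum pulse rate used here is an assumption inferred from the roughly 1 µs minimum pulse period of the present neuron.

```python
synapses_per_chip = 10 * 10
max_pulse_rate_hz = 1.0e6                        # assumed maximum neuron output rate
per_chip = synapses_per_chip * max_pulse_rate_hz
print(f"{per_chip:.0e} connections/s per synapse chip")        # 1e+08
print(f"{10 * per_chip:.0e} connections/s with all 10 chips")  # 1e+09
print(f"{2 * 10e-6 * 1e6:.0f} us forward pass through two synapse chips")  # 20 us
```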

Fig. 14 Test results for synapse multiplication: calculated and measured results; neuron pulse period from 1.1 to 11.5 µs in steps of 1.1 µs

Fig. 15 Three-layer hardware neural network (NC = neuron chip, SC = synapse chip; 8-bit weights loaded via weight address)

4 ANN results

A three-layer ANN has been constructed using 3 neuron chips and 2 synapse chips, as shown in Fig. 15. This network has been tested using hard learning problems such as parity.


Parity is a particularly hard problem for a neural network to learn, as patterns which require a different result may differ by only a single bit. For example, for 8-bit parity, the network inputs 00000011 and 00000001, which are very similar, would require different outputs. Eqns. 4, 11 and 12 have been used to model the network and have enabled desirable weights to be generated in software, using the backpropagation equations [1], in which W_ji is a weight, η is the learning rate, δ_pj is the error of neuron j for pattern p, ∂o_pj/∂W_ji is the change in output with respect to weight, t_pj is the target output, f'(net_pj) is the gradient of the transfer function, ∂E_p/∂net_pk is the change in error with respect to net input and ∂net_pk/∂o_pj is the change in net input with respect to output.
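A condensed software sketch of this weight-generation step, assuming the standard backpropagation update of [1] with the sigmoid gradient of eqn. 2, for the 2-2-1 XOR network discussed below; bias inputs are handled here as an extra constant-1 input (a software convenience not detailed in the text), and the learning rate, random seed and epoch count are illustrative, so convergence depends on the initialisation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# XOR (2-bit parity) training set; the constant-1 column supplies bias inputs
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1 = rng.uniform(-1, 1, (2, 3))   # input (+bias) -> hidden weights
W2 = rng.uniform(-1, 1, (1, 3))   # hidden (+bias) -> output weights
eta = 0.5                         # learning rate

for _ in range(20000):
    H = np.hstack([sigmoid(X @ W1.T), np.ones((4, 1))])   # hidden outputs + bias
    O = sigmoid(H @ W2.T)                                  # network outputs
    delta_o = (T - O) * O * (1 - O)                        # (t - o) * f'(net), eqn. 2
    delta_h = (delta_o @ W2[:, :2]) * H[:, :2] * (1 - H[:, :2])
    W2 += eta * delta_o.T @ H                              # weight updates
    W1 += eta * delta_h.T @ X

print(np.round(O, 2))   # typically approaches the XOR targets [0, 1, 1, 0]
```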

Fig. 16 Software learning: XOR problem (panels a and b, plotted against training epoch)

Fig. 16 shows an example of a training run for the XOR, or 2-bit parity, problem, using a 2-2-1 neuron network with full interconnection between layers. The weights calculated in software using the given equations are converted to an 8-bit representation and downloaded onto RAM, from which they can be loaded onto the synapse chips via DACs. It was found that the hardware network outputs were identical to those of the trained software version, indicating that the equations accurately model the hardware. The speed of the network, that is the time required to change the output after the input has changed, was found to be typically twice the sample-and-hold time. This was as expected, because there are two synapse chips for the inputs to propagate through. Therefore, for the present hardware configuration, the forward-propagation speed is 20 µs. If the chips implemented a larger network, the forward-propagation time would be the same, because it is the number of layers, not the number of connections, that has the major effect on the speed. This compares very favourably to simulations of the parity problem using the Matlab neural network toolkit on a SPARC 10 workstation, as shown in Fig. 17. As can be seen,



assuming that the chips were sufficiently large, the hardware implementation always outperforms the software implementation on this particular platform. So, for example, if the neuron and synapse chips were able to implement 16-bit parity, then the hardware would be able to perform a forward pass in 20 µs while the software takes 1.4 s. This is obviously a very significant improvement. This would require only six more neurons in the input and hidden layers and 16 x 16 synapse arrays instead of the present 10 x 10 of the test chips.
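The 8-bit weight conversion described above can be sketched as follows; the DAC voltage range and the weight-to-voltage mapping are assumed purely for illustration.

```python
import numpy as np

def weights_to_dac_codes(w, v_min=1.5, v_max=3.5, v_ref=2.5, volts_per_unit=0.2):
    """Map software weights onto 8-bit codes for the refresh RAM.

    Assumed mapping: weight 0 corresponds to the synapse reference voltage
    v_ref and one weight unit to volts_per_unit; the DAC spans v_min..v_max
    with 256 codes, so rounding keeps the error within 0.5 LSB.
    """
    v = np.clip(v_ref + volts_per_unit * w, v_min, v_max)
    return np.round((v - v_min) / (v_max - v_min) * 255).astype(np.uint8)

def dac_codes_to_voltages(codes, v_min=1.5, v_max=3.5):
    """Voltage actually refreshed onto the weight-storage capacitor."""
    return v_min + codes.astype(float) / 255 * (v_max - v_min)

w = np.array([-2.1, 0.0, 1.7])
codes = weights_to_dac_codes(w)
print(codes, dac_codes_to_voltages(codes))
```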

Fig. 17 Software implementations of parity problem: average of 100 runs and 85% CPU time, plotted against number of bits

Measurements were also made on the stability of the outputs to determine the effects of the sample-and-hold circuit. Over a 2 min period, no significant change in output was observable, as can be seen from the results of the XOR problem as shown in Fig. 18.

5 Conclusion

We have presented two custom integrated circuits which together implement ANNs. The neuron has a sigmoid transfer characteristic which accurately implements eqn. 4. The synapse is small and has excellent linearity. A 3-layer ANN has been implemented using 3 neuron chips and 2 synapse chips, thus enabling a 10-neuron-per-layer ANN with full interconnection between layers. The ANN has been tested using the parity hard learning problem. The weights are generated in software using eqns. 4, 11 and 12, the results of which can be downloaded directly onto the synapse chips because of the excellent correspondence between


theory and implementation. Results from the hardware closely match those obtained by modelling, indicating that the equations correctly describe the hardware implementations. Future work will concentrate on developing larger networks so that more complicated problems can be learnt. This will enable the hardware to be used in real-world applications, such as pattern recognition, where the performance advantage of hardware neural networks can be exploited. In these situations, the neural hardware would be used in the training cycle because of the greater speed and hence reduced training times. In such cases, the neuron and synapse models will enable the error backpropagation learning algorithm to obtain excellent solutions for the hardware, while exploiting the favourable characteristics of the sigmoid function. This is because the algorithm is able to produce accurate weight changes, owing to the accuracy of the models used in the weight update calculations.

6 Acknowledgments

R.J. Haycock would like to thank the EPSRC for a 3-year research studentship.

7 References

1 RUMELHART, D.E., and McCLELLAND, J.L.: 'Parallel distributed processing, explorations in the microstructure of cognition - volume 1: foundations' (The Massachusetts Institute of Technology, London, 1986)
2 ALEKSANDER, I., and MORTON, H.: 'An introduction to neural computing' (Chapman and Hall, London, 1991)
3 MURRAY, A.F., and TARASSENKO, L.: 'Analogue neural VLSI, a pulse stream approach' (Chapman and Hall, London, 1994)
4 GLESNER, M., and POCHMULLER, W.: 'Neurocomputers, an overview of neural networks in VLSI' (Chapman and Hall, London, 1994)
5 KUNG, S.Y.: 'Digital neural networks' (PTR Prentice-Hall, Englewood Cliffs, 1993)
6 HAMILTON, A., MURRAY, A.F., and REEKIE, H.M.: 'Integrated pulse stream neural networks: results, issues, and pointers', IEEE Trans. Neural Netw., 1992, 3, (3), pp. 385-393
7 HAMILTON, A., CHURCHER, S., EDWARDS, P.J., JACKSON, G.B., MURRAY, A.F., and REEKIE, H.M.: 'Pulse stream VLSI circuits and systems: the epsilon neural network chipset', Int. J. Neural Syst., 1993, 4, (4), pp. 395-405
8 BAXTER, D.J., MURRAY, A.F., and REEKIE, H.M.: 'Fully cascadable analogue synapses using distributed feedback in VLSI for artificial intelligence and neural networks' (Plenum Press, London, 1991)
9 MEAD, C.: 'Analog VLSI and neural systems' (Addison-Wesley Publishing Company, New York, 1989)
10 MASSENGILL, L.M., and MUNDIE, D.B.: 'An analog neural hardware implementation using charge-injection multipliers and neuron-specific gain control', IEEE Trans. Neural Netw., 1992, 3, (3), pp. 354-362
11 LAZZARO, J.: 'Low power spiking neurons and axons'. ISCAS'92, San Diego, USA, May 1992, pp. 2220-2223
12 HOROWITZ, P., and HILL, W.: 'The art of electronics' (Cambridge University Press, New York, 1989, 2nd edn.)
13 ALLEN, P.E., and HOLBERG, D.R.: 'CMOS analog circuit design' (Holt, Rinehart and Winston, New York, 1987)
14 LOMAS, D.G.: 'A high resolution electron imaging device'. PhD thesis, Department of Electrical Engineering and Electronics, University of Manchester Institute of Science and Technology, 1996
15 TOUMAZOU, C., LIDGEY, F.J., and HAIGH, D.G.: 'Analogue IC design: the current mode approach' (Peter Peregrinus Ltd., 1990)
16 WATANABE, T., KIMURA, K., AOKI, M., SAKATA, T., and ITO, K.: 'A single 1.5V digital chip for a 10^6 synapse neural network', IEEE Trans. Neural Netw., 1993, 4, (3), pp. 387-393
17 KONDO, Y., KOSHIBA, Y., ARIMA, Y., MURASAKI, M., YAMADA, T., AMISHIRO, T., MORI, H., and KYUMA, K.: 'A 1.2 GFLOPS neural network chip for high-speed neural network servers', IEEE J. Solid-State Circuits, 1996, 31, (6), pp. 860-864
