
2010 International Conference on Educational and Information Technology (ICEIT 2010)

SIMULATION OF ARTIFICIAL NEURAL NETWORKS ON PARALLEL COMPUTER ARCHITECTURES

Khushboo Aggarwal

UG Research Scholar, Bharati Vidyapeeth College of Engineering, Computer Science Dept., New Delhi, INDIA

E-mail: [email protected]

Abstract--Artificial neural networks are contemplated as the future of computing technology. They offer solutions to numerous complex problems such as robotic arm control, speech recognition, signal processing, and pattern recognition. These networks exhibit inherent parallelism: performing computations in parallel is a rudimentary property of theirs, because the networks emulate the structure of the human brain, which responds to various inputs in parallel. To exploit this parallelism we need to run these networks on parallel processors, which increases computation speed substantially. Despite the increase in computational speed, the complex and expensive structure of such systems is a major drawback. This paper studies the various techniques through which the learning process can be implemented in parallel. The two types of simulation techniques are software techniques and hardware techniques; this paper concentrates on the software techniques of simulation. In the last section we also study various topologies and models and infer results based on complexity and cost, thus showing the best available option for a parallel computing arrangement so as to provide maximum output in minimum time.

Keywords--Artificial neural networks, Parallel simulation, Parallel and Distributed processing, Neuroprocessing, Speedup, Granularity

I. INTRODUCTION

Artificial Intelligence is a colossal sphere of science and technology which aims at giving machines the ability to conceive and to perceive things that are in the domain of human beings. Smart machines are able to perform tasks such as communicating efficiently in English or any other comprehensible language, analyzing a problem and drawing conclusions, and detecting patterns. An AI machine can be given capabilities to perform tasks that are almost impossible for humans to do. The applications stretch from the military, for autonomous control and target identification, to entertainment in the form of computer games and robotic pets. AI can be used in banks, hospitals, and insurance companies to predict customer behavior and detect trends. A recent example of an AI machine is the robot ASIMO, which is able to avoid obstacles and navigate stairs through sensors and intelligent systems.

An artificial neural network attempts to imitate the computational power of the human brain by simply connecting numerous computational neurons with each other. A human brain consists of millions of neurons, each with an average of 1000-10000 connections. The working is based on the principle that a node receives inputs, computes a weighted sum of them, applies an activation function to the sum, and outputs the result to other nodes. In the learning phase, the network learns a problem: a set of training examples is presented to the network and the weights are adjusted in accordance with the input values, correlations, and output values. The learning process is the most compute-bound part of the processing. The growing use of ANNs has driven a boom in simulation methods and neurohardware.

The implementation of large parallel and complex models on single-processor machines is tedious and highly inefficient, as all the nodes (neurons) have to be processed by a single processor, resulting in an increase in computational time. As a result, their implementation on parallel computers is a requisite for better performance. Parallel computers have more than one processing unit and thus support multiprocessing. Moreover, building a multiprocessor system from a number of single processors is more cost-effective than a single high-performance processor. Further, we infer results from various techniques and models in order to organize the parallel processors in such a way that they provide maximum output with minimum work, i.e. performance increases as time decreases.
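As a plain illustration of the node computation just described, here is a minimal Python sketch (the names sigmoid and neuron_output and the example values are illustrative only, not taken from the paper):

```python
import numpy as np

def sigmoid(x):
    # A common choice of activation function.
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(inputs, weights, bias=0.0):
    # A node forms the weighted sum of its inputs, applies the
    # activation function, and emits the result to other nodes.
    weighted_sum = np.dot(weights, inputs) + bias
    return sigmoid(weighted_sum)

# Example: a node with three incoming connections.
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(neuron_output(x, w))
```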

II. SIMULATION TECHNIQUES ON PARALLEL MACHINES

To simulate on parallel hardware built from general-purpose computers, we need to develop an efficient system that takes into account the specific features of the neural network models and of the available hardware. There are various techniques for the simulation of neural nets.

These are as follows:
1. Training-session parallelism (executing different training sessions simultaneously)
2. Simultaneous learning
3. Layer parallelism (concurrent execution of layers)
4. Node parallelism (parallel execution of nodes)
5. Weight parallelism (simultaneous weight-matrix calculations)

This characterization was made on feed-forward networks.
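As an example of the first technique, training-session parallelism, the sketch below runs several independent training sessions on separate processors and keeps the best result. It is only a schematic, assuming a hypothetical train_once routine standing in for whatever learning procedure is actually used:

```python
from multiprocessing import Pool
import numpy as np

def train_once(seed):
    # Hypothetical stand-in for one complete training session:
    # initialise weights from the given seed and report a final error.
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=10)          # random initial weights
    error = float(np.sum(weights ** 2))    # placeholder for the training error
    return seed, error

if __name__ == "__main__":
    # Training-session parallelism: independent sessions run on
    # separate processors; the session with the lowest error wins.
    with Pool(processes=4) as pool:
        results = pool.map(train_once, range(8))
    best_seed, best_error = min(results, key=lambda r: r[1])
    print(best_seed, best_error)
```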




III. PARALLEL COMPUTER ARCHITECTURES

Large networks can be simulated on different topologies, which can be distinguished into two classes:

Data-parallel techniques: Data-parallel techniques have centralized control and distributed data. A large amount of data is processed synchronously or asynchronously, and the synchronization of the computation is centralized. The most popular data-parallel computers are the Connection Machine, systolic arrays, Warp and MasPar. Data parallelization exploits three types of parallel techniques:

1. Coarse-structuring: Zhang et al. used node-per-layer and training parallelism to implement Back Propagation on the Connection Machine. A node from each layer is stored on one processor, so that a 'slice' of nodes is stored on each processor. The number of processors is equal to the number of nodes in the largest layer. The weights are stored in a memory structure which is shared by 32 processors. The implementation was tested on NETtalk.
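The 'slice' assignment used by this coarse scheme can be sketched as follows (the layer sizes are made up; the point is only that processor p owns node p of every layer wide enough to have one):

```python
# Layer sizes of a hypothetical feed-forward network.
layer_sizes = [8, 16, 12, 4]

# One processor per node of the largest layer, as in the coarse scheme.
num_procs = max(layer_sizes)

# Processor p holds node p of every layer that has a node p:
# a vertical "slice" through the network.
slices = {
    p: [(layer, p) for layer, size in enumerate(layer_sizes) if p < size]
    for p in range(num_procs)
}

print(slices[0])   # processor 0 owns one node from every layer
print(slices[15])  # processors beyond a layer's width own fewer nodes
```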

2. Fine-structuring: Rosenberg and Blelloch used node and weight parallelism to implement Back Propagation on the Connection Machine. The processors are organized as a one-dimensional array. Every node is assigned to a processor, and every connection has two nodes: one for input and one for output. A pass in the forward direction was implemented by spreading the activation values over the processors holding the forward connections. The connection processors multiply the activations by their respective weights and forward the results to the respective node processors. The products are added incrementally at the destinations.
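A rough Python sketch of this connection-level scheme (array names and values are illustrative): each connection independently multiplies its weight by the source activation, and the products are accumulated at the destination nodes, mirroring the forward pass described above:

```python
import numpy as np

# Hypothetical layer: 4 source nodes feeding 3 destination nodes.
activations = np.array([0.2, 0.7, -0.1, 0.5])
# One entry per connection: (source node, destination node, weight).
connections = [(0, 0, 0.3), (1, 0, -0.4), (1, 2, 0.9),
               (2, 1, 0.5), (3, 1, 0.1), (3, 2, -0.7)]

# Each "connection processor" forms its product independently
# (this is the part that runs in parallel in the fine-structured scheme).
products = np.array([w * activations[src] for src, dst, w in connections])
destinations = np.array([dst for _, dst, _ in connections])

# The node processors accumulate the incoming products.
net_input = np.zeros(3)
np.add.at(net_input, destinations, products)
print(net_input)
```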

3. Pipelining: Pipeline structure is exploited on systolic arrays. The processors are arranged in one-dimensional or multi-dimensional arrays, and the operations are performed in a pipelined fashion. Pomerleau et al. used training and layer parallelism for their implementation of Back Propagation. In the forward phase of learning, the activation values are circulated along the ring and multiplied by the weights; each processor accumulates a partial weighted sum. When the final sum is calculated it is passed through the activation function. During the backward phase, errors are passed through instead of activation values.
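The ring circulation can be mimicked in a few lines of Python. This is a sequential simulation of the pipeline with made-up sizes, meant only to show how each processor accumulates its partial weighted sum as the activations travel around the ring:

```python
import numpy as np

# Hypothetical setup: 4 processors, each owning one node of the next layer.
num_procs = 4
rng = np.random.default_rng(0)
activations = rng.normal(size=num_procs)           # one input activation per processor
weights = rng.normal(size=(num_procs, num_procs))  # weights[p, j]: weight of input j into node p

partial = np.zeros(num_procs)     # partial weighted sum held by each processor
circulating = activations.copy()  # activations travelling around the ring

for step in range(num_procs):
    for p in range(num_procs):
        j = (p - step) % num_procs              # which input activation processor p holds now
        partial[p] += weights[p, j] * circulating[p]
    circulating = np.roll(circulating, 1)       # pass activations to the ring neighbour

# The ring pipeline reproduces the ordinary matrix-vector product.
print(np.allclose(partial, weights @ activations))  # True
```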

Structure | No. of processors | Computer architecture | Performance (CUPS)
COARSE (training, node/layer) | 64K | Connection Machine | 38M
COARSE (training, node/layer) | 16K | MasPar | 42M
FINE (node, weight) | 64K | Connection Machine | 13M
PIPELINE (training, layer) | 10K | Warp | 17M
PIPELINE (layer, node) | 13K | Systolic Array | 248M
COARSE (partition) | 6K | Transputers | 207K

Table 1

IV. STUDY OF VARIOUS MODELS OF PARALLEL ARCHITECTURE

New technologies that use a network or cluster of computers (from PCs and workstations to SMPs) are a cost-effective way of parallel processing, and this is consequently leading to low-cost commodity super-computing.

Speed-up is used as a reference in determining the success of a parallel algorithm. It is defined as the ratio of the elapsed time for completing a task using a sequential algorithm on one processor to the elapsed time for completing the same task using the parallel algorithm on m processors.
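Written out (a restatement of the definition above, using t_s for the sequential elapsed time and t_m for the elapsed time on m processors, the notation of the equal duration model below):

```latex
S(m) = \frac{t_s}{t_m}
```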

Granularity is the average process size, measured in instructions executed. Granularity affects the performance of the resulting program. In most cases the overhead related to communication and synchronization is high relative to execution speed, so it is advantageous to have coarse granularity (a high computation-to-communication ratio).

Dynamic networks consist of the following topologies: the crossbar, multiple bus, and multistage interconnection networks.

Static networks consist of the following topologies: completely connected networks (CCNs), linear array networks, tree networks, cube-connected networks, mesh-connected networks, and the k-ary n-cube networks.

Network | Degree | Cost | Symmetry | Worst delay
CCNs | N-1 | N(N-1)/2 | Yes | 1
Linear array | 2 | N-1 | No | N
Binary tree | 3 | N-1 | No | log2 N
n-Cube | log2 N | nN/2 | Yes | log2 N
2-D mesh | 4 | 2(N-n) | No | sqrt(N)
k-ary n-cube | 2n | N*n | Yes | k*log2 N

Table 2. Performance characteristics of static networks

where d is the degree of a node, d = d(in) + d(out); D is the diameter of a network having N nodes, defined as the longest of the shortest paths between any two nodes, D = max(min(length(p))); and a network is symmetric if it is isomorphic to itself.
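Spelled out as a formula (only a restatement of the diameter definition above, with P_ij denoting the set of paths between nodes i and j):

```latex
D = \max_{i,j} \; \min_{p \in P_{ij}} \operatorname{length}(p)
```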

The following inferences can be drawn from the table: if the optimization criterion depends on the cost of the network, then the cost of CCNs is O(N^2), compared with O(N) for the linear array, binary tree, 2-D mesh, and k-ary n-cube, while the cost of the n-cube, nN/2, grows as O(N log N).

EQUAL DURATION MODEL: According to this model, a given task is divided into n equal subtasks, each to be executed on a single processor. If all the processors execute their subtasks simultaneously, the time taken to execute the whole task is

t_m = t_s / n

where t_s is the time taken for execution on a single processor and n is the number of processors used. The speedup factor resulting from using n processors is then

S(n) = t_s / t_m = n

If we also consider the communication overhead, i.e. the time t_c needed for the processors to communicate and exchange data with each other while executing their subtasks, then the actual time taken by each processor to execute its subtask is given by:

t_m = (t_s / n) + t_c

S(n) = t_s / ((t_s / n) + t_c) = n / (1 + n(t_c / t_s))

E (efficiency) = S(n) / n = 1 / (1 + n(t_c / t_s))

where efficiency is the measure of speedup acquired per processor.

The following inferences can be drawn:
1) if t_c << t_s then the potential speedup factor is approximately n;
2) if t_c >> t_s then the potential speedup factor is approximately t_s / t_c << 1;
3) if t_c = t_s then the potential speedup factor is n / (n + 1), which approaches 1 for large n.
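A small Python sketch of these formulas (purely illustrative; the values of t_s and t_c are made up) reproduces the three limiting cases listed above:

```python
def speedup(n, ts, tc):
    # Speedup under the equal duration model with communication overhead:
    # S(n) = ts / (ts/n + tc) = n / (1 + n*(tc/ts)).
    return n / (1.0 + n * (tc / ts))

def efficiency(n, ts, tc):
    # Efficiency is the speedup acquired per processor: E = S(n) / n.
    return speedup(n, ts, tc) / n

ts = 1.0
for tc in (0.001 * ts, 1000.0 * ts, ts):   # tc << ts, tc >> ts, tc = ts
    for n in (4, 64, 1024):
        print(f"tc/ts={tc / ts:g}  n={n:5d}  "
              f"S={speedup(n, ts, tc):8.3f}  E={efficiency(n, ts, tc):6.3f}")
```

With t_c << t_s the speedup approaches n, with t_c >> t_s it collapses to roughly t_s/t_c, and with t_c = t_s it saturates at n/(n+1), close to 1.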

V. CONCLUSION

This paper describes the various research efforts carried out in this field. The discussion is followed by a description of various simulation techniques for general-purpose hardware and, later, by a description of various models and topologies of parallel architecture, from which inferences are drawn based on the costs of parallel computing techniques. The following result graphs were obtained from the simulation.

GRAPH 1: Efficiency vs. Number of Processors



GRAPH 2: Speedup vs. Number of Processors

REFERENCES

[1] D. M. Anthony. Reducing connectivity in compression networks. Neural Network Review, 1990.
[2] J. M. J. Murre. Neurosimulators. In M. A. Arbib, editor, Handbook of Brain Theory and Neural Networks. MIT Press, 1995.
[3] R. C. O'Reilly, C. K. Dawson, and J. L. McClelland. PDP++ users manual. Technical report, Carnegie Mellon University, April 1997. Available from http://www.cnbc.cmu.edu/PDP++/PDP++.html.
[4] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1. MIT Press, Cambridge, Massachusetts, 1986.
[5] A. Zell et al. SNNS Manual. SNNS can be retrieved from ftp.informatik.uni-stuttgart.de in /pub/SNNS.
[6] A. Zell, T. Korb, N. Mache, and T. Sommer. Recent developments of the SNNS Neural Network Simulator. In Proceedings of the Applications of Neural Networks Conference, SPIE, volume 1294.
[7] A. Zell, T. Korb, T. Sommer, and R. Bayer. A neural network simulation environment. In Proceedings of the Applications of Neural Networks Conference, SPIE, volume 1294, pages 534-544, 1990.
[8] N. H. Goddard, K. J. Lynne, T. Mintz, and L. Bukys. The Rochester Connectionist Simulator: User Manual. University of Rochester, Tech. Report 233, 1989.
[9] A. Zell, N. Mache, M. Vogt, and M. Huettel. Problems of massive parallelism in neural network simulation. In Proceedings of the IEEE Int. Conf. on Neural Networks, San Francisco, CA.
[10] E. Mesrobian, J. Skrzypek, A. Lee, and B. Ringer. A simulation environment for computational neuroscience. In J. Skrzypek, editor, Neural Network Simulation Environments, chapter 1. Kluwer Academic Publishers, 1993.
[11] P. Sajda, K. Sakai, S.-C. Yen, and L. Finkel. NEXUS: A neural simulator for integrating top-down and bottom-up modeling. In J. Skrzypek, editor, Neural Network Simulation Environments, chapter 2.
[12] O. Ekeberg, P. Hammarlund, B. Levin, and A. Lansner. SWIM: A simulation environment for realistic neural network modeling. In J. Skrzypek, editor, Neural Network Simulation Environments, chapter 3.
[13] D. Wang and C. Hsu. SLONN: A simulation language for modeling of neural networks. Simulation, 55(2):69-83, Aug. 1990.
[14] L.-C. Chu and B. W. Wah. Optimal mapping of neural network learning on message-passing multicomputers. Journal of Parallel and Distributed Computing, 14:319-339, 1992.
[15] B. W. Wah and L.-C. Chu. Efficient mapping of neural networks on multicomputers. In International Conference on Parallel Processing, volume 1, pages 234-238, 1990.
[16] K. W. Przytula and V. K. P. Kumar. Algorithmic mapping of neural network models on parallel SIMD machines. In International Conference on Application Specific Array Processing, 1990.
[17] K. W. Przytula, W.-M. Lin, and V. K. P. Kumar. Partitioned implementation of neural networks on mesh connected array processors. In Workshop on VLSI Signal Processing, 1990.
[18] S. Shams and K. W. Przytula. Mapping of neural networks onto programmable parallel machines. In International Symposium on Circuits and Systems, May 1990.
[19] W.-M. Lin, V. K. Prasanna, and K. W. Przytula. Algorithmic mapping of neural network models onto parallel SIMD machines. IEEE Transactions on Computers, 40(12):1390-1401, December 1991.
[20] T. Nordström and B. Svensson. Using and designing massively parallel computers for artificial neural networks. Journal of Parallel and Distributed Computing, Vol. 14, No. 3, 1992, pp. 260.
[21] C. R. Rosenberg and G. Blelloch. An implementation of network learning, 1987, pp. 329.
[22] J. Ghosh and K. Hwang. Mapping neural networks onto message-passing multicomputers.
[23] Article on "Neural networks implementation on parallel architectures" by Seniz Demir, Boğaziçi University.
