
A parameterised genetic algorithm IP core: FPGA design, implementation and performance evaluation

K.M. Deliparaschos*, G.C. Doyamis and S.G. Tzafestas

Intelligent Robotics and Automation Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, Athens 15773, Greece

(Received 22 March 2007; final version received 15 July 2008)

A genetic algorithm (GA) is a directed random search technique that works on a population of solutions and is based on natural selection. However, its convergence to the optimum may be very slow for complex optimisation problems, especially when the GA is implemented in software, making it difficult to use in real-time applications. In this article, a parameterised GA intellectual property core is designed and implemented in hardware, achieving impressive speedups when compared to its software version. The parameters are the number of population individuals and their bit resolution, the bit resolution of each individual’s fitness, the number of elite genes in each generation, the crossover and mutation methods, the maximum number of generations, and the mutation probability and its bit resolution. The proposed architecture is implemented in a field programmable gate array chip using the very high-speed integrated-circuits hardware description language and advanced synthesis and place and route tools. The GA discussed in this work achieves a clock frequency of 92 MHz and is evaluated using the ‘travelling salesman problem’ as well as several benchmark functions.

Keywords: genetic algorithm; travelling salesman problem; field programmable gate array chip; very high-speed integrated-circuits description language; intellectual property core

1. Introduction

Genetic algorithms (GAs), initially developed by Holland (1975), are based on the notion of population individuals (genes/chromosomes), to which genetic operations such as mutation, crossover and elitism are applied. GAs obey Darwin’s natural selection law, i.e. the survival of the fittest. GAs have been successfully applied to several hard optimisation problems because of their inherent flexibility and freedom in finding the optimal solution of the problem (Koza 1992; Mitchell 1996).

However, the most serious drawbacks of software-implemented GAs are their heavy consumption of time and system resources. With that in mind, a multitude of hardware-implemented GAs have been developed, mainly during the last decade, exploiting the rapid evolution of field programmable gate array (FPGA) technology and achieving impressive speedups.

This article presents the design and hardware implementation of a parameterised GA intellectual property (IP) core (Wikipedia 2008) on an FPGA chip. The genetic operators applied to the genes of the population are crossover, mutation and elitism, whose method is selected through parameters. The selection algorithm implemented is the ‘roulette wheel selection’ (RWS) algorithm. The FPGA chip used in this work is a Xilinx XC3S1500-4FG676C Spartan-3 FPGA (Xilinx 2008a). A software implementation of the designed GA has also been developed on the Matlab platform to produce input and output test vectors for the performance evaluation of the hardware-implemented GA using several benchmark functions. Finally, after adapting the proposed hardware-implemented GA to the ‘travelling salesman problem’ (TSP), a successful solution to it has been found.

*Corresponding author. Email: [email protected]

International Journal of Electronics, Vol. 95, No. 11, November 2008, 1149–1166
ISSN 0020-7217 print/ISSN 1362-3060 online
© 2008 Taylor & Francis
DOI: 10.1080/00207210802387494
http://www.informaworld.com
Downloaded By: [HEAL-Link Consortium] At: 04:02 15 July 2009

This article is organised in seven sections. After a brief introduction in Section 1, an overview of the designed GA follows in Section 2, whereas the detailed hardware implementation of the algorithm is explained in Section 3. In Section 4, the design flow of the proposed GA is presented, followed by its implementation results (Section 5). The evaluation of the algorithm using the TSP and several benchmark functions follows (Section 6). Finally, concluding remarks are put forward in Section 7.

2. Overview of the designed GA

A high level view of the architectural structure of the presented GA is shown in Figure 1. The system is composed of six basic modules, i.e. the control module, fitness evaluation module, selection module, crossover module, mutation module and observer module. The control module implements a Mealy state machine (Zainalabedin 1997), which feeds all other modules with the necessary control signals, guaranteeing their synchronised execution. The selection module implements the RWS (Koza 1992; Mitchell 1996; Tzafestas 1999), picking the genes of the current population (parents) that will be genetically processed to create the individuals of the new population (offspring/children).

Following this, the crossover and mutation modules perform the corresponding genetic operations on the selected parents of the current generation. Thereafter, the fitness evaluation module not only computes the fitness of each offspring produced by the aforementioned modules but also applies elitism to them, creating the elite genes for the next generation. The task of the observer module is to determine whether the stopping criteria of the GA, i.e. maximum number of generations or fitness limit, have been met, and so to decide whether the algorithm continues.

Figure 1. Architectural structure of the GA (high level view).

Four random number generators (RNGs) are also used to produce both the initial random generation and the necessary random numbers. In addition, one random access memory (RAM 1) is needed for the storage of the current gene population and another (RAM 2) for the storage of the selected parents of each generation. In the following section, the aforementioned hardware modules are described in detail.

3. GA Hardware implementation

This section describes in detail the hierarchical modules of the presented GA architecture.

3.1. GA characteristics

The parameters characterising the proposed GA are summarised in Table 1.

3.2. GA architecture

The architecture employed is depicted in Figure 2. As shown, the architecture is broken into separate blocks, each of which performs a particular task, coordinated by the control block. The blocks also send signals back to the control module to report their state, i.e. ready-out signals. Signal and data buses (thin and bold lines, respectively) are noted on the block diagram.

3.2.1. Control module

To control and synchronise the order of execution of the hardware-implemented modules of the proposed GA architecture, a control module has been implemented. This control block, illustrated in Figure 3, produces the required control signals and feeds them to all other modules using a nine-state Mealy state machine. The tasks performed by each of the nine states, i.e. clear_ram, fill_ram, fit_eval, sel, cross, mut, done, read_write_ram_1 and read_write_ram_2, are described in Table 2.

3.2.2. Fitness evaluation module

The fitness evaluation module functions every time a new population of individuals is formed. This block performs two separate tasks: on the one hand, it calculates the fitness of each individual according to the given fitness function, and on the other, it performs elitism on the current population, producing the elite genes for the next generation. Accordingly, the structural architecture of this module consists of two sub-modules, as shown in Figure 4. One of the sub-modules is used for fitness calculation and the other for performing elitism. These sub-modules are externally connected to the remaining GA architecture, so that further fitness functions can be added without affecting the rest of the design. Furthermore, the module outputs both the sum of fitnesses and the maximum fitness of the current population, as well as the RAM indexes of the elite genes. The number of elite genes is parametrically set. In order for an individual to become an elite gene, i.e. to survive in the next generation, it has to be among the fittest individuals of the current population.

Table 1. GA characteristics.

| Parameter name | Description | Possible value |
|---|---|---|
| genom_lngt | Chromosome length in bits | 16 |
| score_sz | Fitness value bit resolution | 16 |
| pop_sz | Population size | 32 |
| scaling_factor_res | Bit resolution of the random number used in RWS algorithm | 4 |
| elite | Number of elite children | 2 |
| mr | Mutation rate | 80 |

Figure 2. GA architecture.

Table 2. Diverse states of the implemented state machine.

| States | Task performed |
|---|---|
| clear_ram | Clear RAM 1 (population RAM) |
| fill_ram | Fill RAM 1 with a random gene, to create the random initial population |
| fit_eval | Fitness evaluation of the input gene and generation of the elite offspring’s indexes |
| sel | Selects one parent among the genes of the current population |
| cross | Apply crossover operation to the input genes |
| mut | Apply mutation operation to the input gene |
| done | Check of the termination criteria |
| read_write_ram_1 | Read/Write the population RAM (RAM 1) |
| read_write_ram_2 | Read/Write the parents’ RAM (RAM 2) |

Figure 3. Control state machine.
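The two tasks of the fitness evaluation module (fitness calculation and elitism) can be illustrated in software. The following is a minimal Python sketch, not the authors' VHDL: the function name `evaluate_population`, the list-based population and the tie-breaking by sort order are assumptions made for illustration only.

```python
def evaluate_population(population, fitness_fn, elite=2):
    """Sketch of the fitness evaluation module's outputs.

    Returns the per-gene scores, the sum of fitnesses, the maximum
    fitness and the indexes (standing in for RAM addresses) of the
    `elite` fittest genes. All names here are hypothetical.
    """
    scores = [fitness_fn(gene) for gene in population]
    fit_sum = sum(scores)
    fit_max = max(scores)
    # Rank indexes by descending fitness and keep the first `elite` of them.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    elite_idx = ranked[:elite]
    return scores, fit_sum, fit_max, elite_idx
```

With `elite` set to 2, as in Table 1, the two highest-scoring genes would be carried over unchanged, while the sum of fitnesses is what the selection module consumes next.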

3.2.3. Selection module

This section describes the selection module, which operates after the fitness evaluation of the individuals of the current population has ended and is controlled by the control module. The selection block, connected to both RAMs, implements the RWS (Figure 5a), a pseudocode of which is shown in Figure 5b.

The selection module receives a random number produced by the RNG. It then uses this number as the scaling factor r to scale down the sum of the current population fitnesses produced by the fitness evaluation module. Afterwards, it selects one gene of the current population, called the parent, through the RWS algorithm and notifies the control module about the completion of its task. This parent is then stored in RAM 2. The selection block continues selecting parents in the same way until their number suffices to produce a new population with as many individuals as the current population. The selection depends on the fitness value of each stored gene, i.e. a gene with a higher fitness value is more likely to be selected than a gene with a lower one.
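The selection step described above can be sketched in software as follows. This is a minimal Python sketch, not the authors' VHDL or the pseudocode of Figure 5b: the function name `roulette_wheel_select`, the integer scaling by a right shift and the strict comparison against the running sum are assumptions made for illustration.

```python
def roulette_wheel_select(fitnesses, r, scaling_factor_res=4):
    """Return the index of the selected parent.

    `r` is a random number of `scaling_factor_res` bits (as in Table 1).
    The fitness sum is scaled down by r / 2**scaling_factor_res, and the
    population is scanned until the running fitness sum exceeds that
    threshold (assumed behaviour, chosen so fitter genes win more often).
    """
    total = sum(fitnesses)
    threshold = (total * r) >> scaling_factor_res  # integer scaling, hardware-style
    running = 0
    for index, fit in enumerate(fitnesses):
        running += fit
        if running > threshold:
            return index
    return len(fitnesses) - 1  # only reached when every fitness is zero
```

A gene with a larger fitness covers a larger share of the running sum, so more values of `r` land on it, which is the proportional behaviour the text describes.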

3.2.4. Crossover module

This section describes the crossover module, which runs after the selection module has completed its task and applies the crossover operation to the selected parents. The crossover method to be used is selected through a parameter. Diverse crossover strategies are reported in the literature (Koza 1992; Koza, Bennett, Andre and Keane 1999). The present implementation includes three different crossover methods, i.e. single point crossover, two point crossover and uniform crossover, which are schematically explained in Figure 6a–c, respectively.

Figure 4. Top structural architecture of fitness evaluation module.

It is evident from the figures that the crossover module needs random numbers (random crossover points or a mask), according to the method employed, in order to apply the desired crossover operation. Two RNGs are used for this purpose. The former produces the crossover points (if the method in use is single or two point crossover) and the latter the mask needed for applying uniform crossover to the parents (if the method selected is uniform crossover). The crossover block outputs one offspring in each execution, produced by two of the selected parents.
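The three crossover methods can be illustrated on chromosomes held as unsigned integers. The following is a minimal Python sketch under stated assumptions: the function names are hypothetical, and which parent supplies which side of the crossing site (or which mask value selects which parent) is assumed here, since it is defined by Figure 6 rather than by the text.

```python
def single_point_crossover(p1, p2, point, genom_lngt=16):
    """Bits at positions >= point come from p1, the lower bits from p2 (assumed)."""
    full = (1 << genom_lngt) - 1
    low_mask = (1 << point) - 1
    return (p1 & full & ~low_mask) | (p2 & low_mask)


def two_point_crossover(p1, p2, a, b, genom_lngt=16):
    """Bits between the two points come from p2, the rest from p1 (assumed)."""
    full = (1 << genom_lngt) - 1
    lo, hi = sorted((a, b))
    mid_mask = ((1 << hi) - 1) & ~((1 << lo) - 1)
    return (p1 & full & ~mid_mask) | (p2 & mid_mask)


def uniform_crossover(p1, p2, mask, genom_lngt=16):
    """Mask bit 1 takes the bit from p1, mask bit 0 takes it from p2 (assumed)."""
    full = (1 << genom_lngt) - 1
    return (p1 & mask) | (p2 & full & ~mask)
```

Each function returns a single offspring from two parents, matching the one-offspring-per-execution behaviour of the crossover block; the crossover points and the mask would come from the two RNGs.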

3.2.5. Mutation module

This section describes the mutation module, which functions after the crossover module has completed its task and applies the parametrically selected mutation method to the offspring produced by the crossover module. Various mutation strategies are reported in the literature (Koza 1992; Mitchell 1996). The proposed design includes three different mutation methods, i.e. single point mutation, masked mutation and uniform mutation, which are schematically explained in Figure 7a–c, respectively.

Figure 5. Roulette wheel selection: (a) algorithm and (b) pseudocode.

Figure 6. Crossover methods: (a) single point crossover; (b) two point crossover; (c) uniform crossover.

Figure 7. Mutation methods: (a) single point mutation; (b) masked mutation; (c) uniform mutation.

As derived from the abovementioned figures, the mutation module requires random numbers (random mutation points, a mask, or the random numbers p_{r,i}), according to the method employed, so as to apply the desired mutation operation. For this reason two RNGs are needed. The former produces the mutation points (if the method in operation is single point mutation) and the latter a random binary mask (if the method employed is masked mutation). The latter RNG also generates the random number that decides whether the mutation operation will be applied: mutation is applied to the offspring only if this value is less than or equal to the parametrically set mutation probability p_m. If the mutation method employed is uniform mutation, it also generates the required random numbers p_{r,i}, as shown in Figure 7c. The mutation block outputs one offspring in each execution, produced by processing one offspring resulting from crossover each time.
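A software view of the three mutation methods and of the mutation-probability gate may help here. The following Python sketch is illustrative only: the function names are hypothetical, and the exact bit-level behaviour (flipping the addressed bit, XOR with the mask, flipping bit i when p_{r,i} is at most the mutation rate) is assumed, since it is defined by Figure 7 rather than by the text.

```python
def single_point_mutation(gene, point):
    """Flip the bit at the randomly chosen mutation point (assumed behaviour)."""
    return gene ^ (1 << point)


def masked_mutation(gene, mask):
    """Flip every bit selected by the random binary mask (assumed: XOR)."""
    return gene ^ mask


def uniform_mutation(gene, rand_per_bit, mr):
    """Flip bit i whenever its own random number p_{r,i} is <= mr (assumed)."""
    for i, pr in enumerate(rand_per_bit):
        if pr <= mr:
            gene ^= 1 << i
    return gene


def maybe_mutate(gene, rnd, mr, mutate):
    """Apply `mutate` only if the random number is <= the mutation rate."""
    return mutate(gene) if rnd <= mr else gene
```

For example, with `mr = 80` a random draw of 80 triggers the chosen mutation and a draw of 81 leaves the offspring unchanged, following the "less than or equal" rule stated above.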

3.2.6. Observer module

This section describes the observer module, which executes each time a new population has been formed. This block determines whether the algorithm continues, checking whether the parametrically set stopping criteria, i.e. maximum generations or fitness value limit, have been met.
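The observer's decision reduces to a simple predicate. The sketch below is a hypothetical Python rendering: the name `should_stop`, the `fitness_limit` argument and the use of "greater than or equal" for the fitness check are assumptions, as the text only names the two criteria.

```python
def should_stop(generation, best_fitness, max_gen=500, fitness_limit=None):
    """Return True when either stopping criterion is met.

    `max_gen` mirrors the maximum-generations parameter; `fitness_limit`
    is optional here (None disables the check), which is an assumption.
    """
    if generation >= max_gen:
        return True
    return fitness_limit is not None and best_fitness >= fitness_limit
```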

3.2.7. Random number generators

This section describes the RNG modules, which feed most of the modules described earlier with random numbers. This block implements a linear feedback shift register (LFSR) generator, whose sequence length is parametrically set. The maximum length of the randomly generated sequence is 128 bits, whereas the specific characteristics of the four RNGs used in our design are shown in Table 3. The hardware design of an 8-bit LFSR generator is shown in Figure 8.
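To make the LFSR idea concrete, the following Python sketch models an 8-bit Fibonacci LFSR. The tap positions (bits 8, 6, 5 and 4, a commonly tabulated maximal-length choice for 8 bits) are an assumption; the taps actually used in Figure 8 are not stated in the text, and the function names are hypothetical.

```python
def lfsr8_step(state):
    """Advance an 8-bit Fibonacci LFSR by one clock.

    Feedback is the XOR of bits 8, 6, 5 and 4 (1-indexed from the LSB),
    an assumed maximal-length tap set; the register shifts left and the
    feedback bit enters at the LSB.
    """
    bit = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
    return ((state << 1) | bit) & 0xFF


def lfsr8_period(seed=1):
    """Count the steps until the register returns to its (non-zero) seed."""
    state = lfsr8_step(seed)
    steps = 1
    while state != seed:
        state = lfsr8_step(state)
        steps += 1
    return steps
```

With these taps the register cycles through all 255 non-zero states before repeating; the all-zero state is a lock-up state for an XOR-based LFSR and must be avoided as a seed.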

3.2.8. Random access memory

This section describes the RAM modules, RAM 1 and RAM 2, which store the current population and the selected parents, respectively. Both the address and data widths are parametrically set, as shown in Table 4.


4. Design flow for the GA

This section presents the top-down design flow (Deliparaschos, Nenadakis and Tzafestas 2006) followed in our design, illustrated in Figure 9. In a top-down design, one first concentrates on specifying and then on designing the circuit functionality (Sjoholm and Lindh 1997). The starting point of the design process is the system-level modelling of the proposed GA, which enabled us to evaluate the proposed model and to extract test vectors to be used later for RTL and timing simulation. The very high-speed integrated-circuits hardware description language (VHDL) was used to describe the circuit at register transfer level (RTL). Special attention has been paid to the coding of the different blocks, because we aim at writing fully parameterised code for the GA IP core. The GA is parametric in terms of the number of population individuals and their resolution in bits, the resolution in bits of the fitness, the number of elite genes in each generation, the maximum number of generations, the mutation probability and its resolution in bits, the resolution in bits of the scaling factor r used in the selection module, and the methods used for crossover and mutation.

The GA IP core presented here uses pop_sz = 32, genom_lngt = 16, score_sz = 16, elite = 2, cross_method = 1, mut_method = 1, max_gen = 500, mr = 80, mut_res = 8 and scaling_factor_res = 4; however, each parameter is set according to the optimisation problem to be solved. A VHDL package stores these generic parameters for each module. An RTL simulation has been performed to ensure the correct functionality of the circuit. Next, logic synthesis has been performed, where the tool first creates a generic (technology-independent) schematic based on the VHDL code and then optimises the circuit for the chosen FPGA-specific library (Spartan-3 1500-4FG676). At this point, area and timing constraints and specific design requirements must be defined, as they play an important role in the synthesis result.

Figure 8. Hardware implementation of an 8-bit LFSR.

Table 3. Characteristics of the used RNGs.

| RNG | Length |
|---|---|
| RNG 0 | genom_lngt |
| RNG 1 | scaling_factor_res |
| RNG 2 | genom_lngt + mut_res |
| RNG 3 | 2*log2(genom_lngt) |

Table 4. Parameters of RAM 1 and RAM 2.

| RAMs | Address width | Data width |
|---|---|---|
| RAM 1 (population RAM) | pop_sz | genom_lngt + score_sz |
| RAM 2 (parents’ RAM) | 2 × (pop_sz - elite) | genom_lngt |
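The role of the VHDL package of generics, and the way the RNG lengths and RAM dimensions of Tables 3 and 4 follow from it, can be mirrored in a few lines of Python. This is an illustrative sketch only: the class `GAGenerics` and its method names are hypothetical, the default values are those quoted for this core, and the RAM 2 entry count assumes the garbled operator in Table 4 is a multiplication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GAGenerics:
    """Hypothetical software mirror of the generic parameters."""
    pop_sz: int = 32
    genom_lngt: int = 16
    score_sz: int = 16
    elite: int = 2
    cross_method: int = 1
    mut_method: int = 1
    max_gen: int = 500
    mr: int = 80
    mut_res: int = 8
    scaling_factor_res: int = 4

    def rng_lengths(self):
        """RNG lengths in bits, following Table 3."""
        log2_len = (self.genom_lngt - 1).bit_length()  # ceil(log2(genom_lngt))
        return {
            "RNG 0": self.genom_lngt,
            "RNG 1": self.scaling_factor_res,
            "RNG 2": self.genom_lngt + self.mut_res,
            "RNG 3": 2 * log2_len,
        }

    def ram_shapes(self):
        """(entries listed under 'Address width', data width) per Table 4."""
        return {
            "RAM 1": (self.pop_sz, self.genom_lngt + self.score_sz),
            "RAM 2": (2 * (self.pop_sz - self.elite), self.genom_lngt),
        }
```

Changing a single field, for instance `GAGenerics(genom_lngt=32)`, updates every derived width, which is the convenience that keeping all generics in one package provides.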

Following this, the Xilinx ISE (Xilinx 2008b) place and route (PAR) tool accepts the input netlist file (.edf), generated with the Synplify Pro synthesis tool, and proceeds as follows. First, the translation programme translates the input netlist together with the design constraints to a Xilinx database file. After the translation programme has run successfully, the map programme maps the logical design to the Xilinx FPGA device. Finally, the PAR programme accepts the mapped design, places and routes the FPGA and produces output for the bitstream generator (BitGen). The latter programme receives the placed and routed design and generates a bitstream (.bit) for Xilinx device configuration.

Figure 9. Design flow.


Before programming the FPGA, a timing simulation is performed to ensure that the circuit meets the timing requirements set and functions equivalently to the RTL code.

5. Implementation results

This section describes the implementation results of the designed GA. After the synthesis of the design, ISE translates, maps, and places and routes our design to the FPGA device. The FPGA utilisation reported by ISE for both the GA and the GA adapted to the TSP is shown in Table 5.

The hardware implementation of the proposed GA achieves an internal clock frequency of 92 MHz (10.8 ns), whereas the GA adapted to the TSP achieves an internal clock frequency of 91 MHz (11 ns). Moreover, 2450 ns (2.4 µs) and 14,391 ns (14.3 µs) are required to form a new generation of eight individuals in the former and latter version of the GA, respectively. Finally, the VHDL codes for the GA models presented here are fully parameterised, allowing us to generate and test the GA models with different specification scenarios.
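As a rough cross-check of these figures, the generation times can be converted into clock cycles by dividing by the quoted clock periods. The snippet below is only an illustrative calculation; the variable names are hypothetical and the resulting cycle counts are derived here, not stated in the article.

```python
# Quoted clock periods and per-generation times, all in nanoseconds.
ga_period_ns, ga_generation_ns = 10.8, 2450
tsp_period_ns, tsp_generation_ns = 11.0, 14391

# Approximate clock cycles needed to form one new generation.
ga_cycles = ga_generation_ns / ga_period_ns
tsp_cycles = tsp_generation_ns / tsp_period_ns

# Same times expressed in microseconds.
ga_generation_us = ga_generation_ns / 1000
tsp_generation_us = tsp_generation_ns / 1000
```

Under these numbers the plain GA needs on the order of a couple of hundred cycles per generation, and the TSP variant roughly six times as many.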

6. Evaluation results

6.1. Introduction

The evaluation of the system performance has been made both by solving the TSP and by optimising several benchmark functions (Digalakis and Margaritis 2002), which are noted in Subsection 6.3.2. To evaluate the performance of the implemented GA using the TSP, first the hardware was appropriately adapted to the TSP and second a software version of the hardware-implemented GA for the TSP was written. Both the software version of the GA and the one adapted to the TSP have been developed on the Matlab platform. The following subsections explain in detail the hardware adaptation of the proposed GA to the TSP (Subsection 6.2) and the evaluation results (Subsection 6.3) with the aid of several figures.

6.2. GA hardware adaptation to the TSP

According to the definition of the TSP (Pham and Karaboga 2000), each city should be visited only once. So every gene of the population, which contains the towns to be visited in sequence, must contain each town only once. Because the town indices within each gene must be unique, the aforementioned crossover and mutation techniques cannot be used, and new crossover and mutation methods must be developed (Martel 2007).

Table 5. FPGA utilisation for the implemented GA.

| | GA | GA adapted to the TSP |
|---|---|---|
| Logic utilisation | | |
| Slice flip flops | 681 (2%) | 1045 (3%) |
| Four input LUTs | 1086 (4%) | 1630 (6%) |
| Logic distribution | | |
| Occupied slices | 892 (6%) | 1305 (9%) |
| Total four input LUTs | 1116 (6%) | 1686 (6%) |
| Used as logic | 1086 | 1630 |
| Used as route-thru | 6 | 4 |
| Used as 16 × 1 RAMs | 24 | 52 |
| Bonded IOBs | 59 (12%) | 53 (10%) |
| MULT18X18s | 1 (3%) | 3 (9%) |
| GCLKs | 1 (12%) | 1 (12%) |

The crossover operator uses a pool of indices of the towns. To keep the uniqueness of the visited cities, used indices are removed from the pool. The offspring produced after the application of crossover to the parents is formed via the following procedure: in the beginning, a random crossover point is generated and then the town indices of the first parent are added to the offspring starting from the crossing site to the head of the gene. The aforementioned indices are removed from the pool. Afterwards, the right side of the gene is filled by checking whether the town indices in the right side of the second parent are contained in the pool. If a town index is still free, i.e. exists both in the pool and the second parent, we place it in the offspring and remove it from the pool; otherwise we skip it and leave its place in the offspring empty till we reach the tail of the gene. Finally, empty places of the offspring are randomly filled with the town indices remaining in the pool. The described method is depicted in Figure 10a.

The developed mutation operator uses two randomly generated mutation points and simply swaps the town indices stored at these points of the gene. The method is shown in Figure 10b.
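The TSP-specific operators described above can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' hardware: the function names are hypothetical, tours are lists of town indices, and treating the crossing site as the first position taken from the second parent is an assumption.

```python
import random


def tsp_crossover(p1, p2, point, rng=random):
    """Pool-based crossover that keeps every town exactly once."""
    n = len(p1)
    pool = set(p1)          # town indices not yet placed in the offspring
    child = [None] * n
    # Left side (head of the gene up to the crossing site) from parent 1.
    for i in range(point):
        child[i] = p1[i]
        pool.discard(p1[i])
    # Right side from parent 2, but only towns that are still in the pool.
    for i in range(point, n):
        if p2[i] in pool:
            child[i] = p2[i]
            pool.remove(p2[i])
    # Empty places are filled randomly with the towns left in the pool.
    leftovers = sorted(pool)
    rng.shuffle(leftovers)
    for i in range(n):
        if child[i] is None:
            child[i] = leftovers.pop()
    return child


def tsp_swap_mutation(gene, a, b):
    """Swap the town indices stored at the two mutation points."""
    out = list(gene)
    out[a], out[b] = out[b], out[a]
    return out
```

Both operators return a permutation of the parent's towns, so the uniqueness constraint of the TSP holds by construction; passing a seeded `random.Random` instance as `rng` makes the random fill reproducible.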

Figure 10. GA hardware adaptation to the TSP: (a) crossover operation for the TSP and (b) mutation operation for the TSP.

The fitness evaluation function implemented here is the sum, over the number of towns (N), of the squares of the Euclidean distances between adjacent towns in the computed tour, as shown in Equation (1). We compute the square of the Euclidean distance to avoid the hardware implementation of the square root, which is unnecessary because the fitnesses are only compared among the chromosomes.

$$\mathrm{fit}_j = \sum_{i=1}^{N-1}\left(\sqrt{(x_i - x_{i+1})^2 + (y_i - y_{i+1})^2}\right)^2 + \left(\sqrt{(x_N - x_1)^2 + (y_N - y_1)^2}\right)^2, \quad \text{for } j \in [1, \mathrm{pop\_sz}] \tag{1}$$

Because we seek the shortest path connecting the given towns, whereas the GA is designed to return the gene with the maximum fitness, i.e. with the maximum sum of Euclidean distances, we have either to invert the computed fitness or to subtract the fitness from its highest value according to the bit resolution adopted for its binary coding, i.e. 2^score_sz − 1 (see Table 1), in order to get higher values for smaller path lengths. The method for the inversion of the fitness to be implemented is selected through a generic in the VHDL code (inv_type).
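Equation (1) and the two ways of turning a tour cost into a fitness to be maximised can be sketched as follows. This Python code is an illustration under stated assumptions: the function names, the string values accepted for `inv_type`, the integer division used for the inverted variant and the saturation at zero are all hypothetical, since the article names the `inv_type` generic but does not detail its encoding.

```python
def tour_cost_sq(tour, coords):
    """Sum of squared Euclidean distances along the closed tour (Equation 1)."""
    n = len(tour)
    total = 0
    for i in range(n):
        x1, y1 = coords[tour[i]]
        x2, y2 = coords[tour[(i + 1) % n]]  # wraps around: last town back to first
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2
    return total


def tsp_fitness(tour, coords, score_sz=16, inv_type="subtract"):
    """Map a smaller tour cost to a larger fitness value."""
    cost = tour_cost_sq(tour, coords)
    max_score = (1 << score_sz) - 1          # 2^score_sz - 1
    if inv_type == "subtract":
        return max(max_score - cost, 0)      # saturating at 0 is an assumption
    return max_score // max(cost, 1)         # scaled reciprocal, also assumed
```

On a unit square, visiting the corners in order costs 4 while a tour that crosses both diagonals costs 6, so the first tour receives the higher fitness under either inversion method.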

6.3. Evaluation results

6.3.1. TSP

The performance evaluation of the proposed GA using the TSP has been performed by comparing the time needed by the software version and by the hardware implementation to find the optimal solution. The results for eight cities, 60 generations and 32 individuals are summarised in Table 6, where an impressive speedup ratio of approximately 11,035 can be observed. Figure 11a depicts the map of the eight cities used, which are existing cities of the Greek territory, whereas the influence of the population size (pop_sz) on the generations needed for the GA to find the optimal solution is shown in Figure 11b. The algorithm was also tested using the benchmark burma14 derived from the TSPLIB (Reinelt 2007), and the result is depicted in Figure 11c.
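The speedup ratio can be recomputed from the two execution times in Table 6. The short Python calculation below assumes the table entries are read as 1.702 ms for the hardware and 18,783 ms for the software; the variable names are hypothetical.

```python
hardware_ms = 1.702      # hardware GA, clock period 10.8 ns (Table 6)
software_ms = 18783.0    # software GA on the Pentium 4 machine (Table 6)

# Speedup of the hardware implementation over the software version.
speedup = software_ms / hardware_ms
```

Under this reading the ratio comes out at roughly eleven thousand, i.e. about four orders of magnitude.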

6.3.2. Benchmark functions

Several benchmark functions are known in the literature (Zhang and Zhang 2000; Digalakis and Margaritis 2002) for evaluating the performance of a GA, i.e. its ability to reach the optimum of an objective function. In our case we have tested the proposed GA using the functions listed in Table 7 and depicted in Figure 12a–c.

In the following figures, the results of a number of experiments using theaforementioned functions are presented. Figure 13a shows the effect of the chromosomelength (genom_lngt) on the optimal solution found, whereas Figure 13b depicts the effectof the population size (pop_sz) on the generations needed by the algorithm to converge.

Table 6. Software versus hardware-implemented GA.

GA version Time (msec)

Hardware (clk ¼ 10.8 ns) 1.702Software (Pentium 43.2 Ghz 1Gb RAM) 18,783

1162 K.M. Deliparaschos et al.

Downloaded By: [HEAL-Link Consortium] At: 04:02 15 July 2009

Page 15: AparameterisedgeneticalgorithmIPcore:FPGAdesign ...users.ntua.gr/kdelip/resources/pdf/Deliparaschos-et-al.---2008---A-parameterised...applied to the genes of the population are crossover,

Finally, the influence of the population size on the calculation time is shown in Figure 13c. We also have to note that the precision of the optimum value found by the proposed GA depends on the chromosome length adopted in each experiment. Lengths of 16 and 32 bits for evaluating one- and two-variable benchmark functions, respectively, are observed to give high accuracy in relatively low calculation times.
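The link between chromosome length and precision can be made concrete with a small sketch. It assumes a plain unsigned binary encoding that is mapped linearly onto the search interval, and an even split of the chromosome between the two variables; the text does not specify the decoding used in the core, so both points, as well as the function names, are illustrative assumptions.

```python
def decode(bits, lo, hi):
    """Linearly map an unsigned binary string onto the interval [lo, hi]."""
    n = len(bits)
    return lo + int(bits, 2) * (hi - lo) / ((1 << n) - 1)

def decode_pair(chromosome, lo, hi):
    """Split a chromosome into two equal halves, one per variable."""
    half = len(chromosome) // 2
    return decode(chromosome[:half], lo, hi), decode(chromosome[half:], lo, hi)

def resolution(n_bits, lo, hi):
    """Spacing between neighbouring representable values."""
    return (hi - lo) / ((1 << n_bits) - 1)

# One variable on (0, 1): compare an 8-bit and a 16-bit chromosome.
print(resolution(8, 0.0, 1.0), resolution(16, 0.0, 1.0))

# Two variables on [-5.12, 5.12] packed into a 32-bit chromosome (16 bits each).
print(decode_pair("0" * 16 + "1" * 16, -5.12, 5.12))
```

Under these assumptions, going from 8 to 16 bits per variable shrinks the grid spacing by a factor of about 257, which is why the estimated optimum approaches the true one as genom_lngt grows.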

Figure 11. TSP evaluation: (a) map of the cities used; (b) population size versus generations and (c) TSP solution of burma14 benchmark.

Table 7. Benchmarking functions.

Function name | Type | Optimum point
F1; Zhang Zhang | $\left(1 - 2\sin^{20}(3\pi x) + \sin^{20}(20\pi x)\right)^{20}$, for $x \in (0,1)$ | (0.675, 1.0485)
F2; Rastrigin’s | $100 + \sum_{i=1}^{2}\left(x_i^2 - 10\cos(2\pi x_i)\right)$, for $-5.12 \le x_i \le 5.12$ | ((0,0), 120)
F3; Easom | $\left(\prod_{i=1}^{2}\cos(x_i)\right) e^{-\sum_{i=1}^{2}(x_i - \pi)^2}$, for $A \in [10, 100]$ and $-A \le x_i \le A$ | (($\pi$, $\pi$), 1)
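A short sketch can be used to check tabulated optimum points against the function definitions. It covers F1 and F3 only, written as read from Table 7; the function names are hypothetical. With the exponents as printed, F1 evaluates to $2^{20} = 1{,}048{,}576 \approx 1.0485 \times 10^{6}$ at $x = 0.675$, so the tabulated value 1.0485 presumably carries a scale factor that is not shown; this is an inference, not something the table states.

```python
import math

def f1_zhang(x):
    """F1 (Zhang Zhang) as read from Table 7, for x in (0, 1)."""
    return (1 - 2 * math.sin(3 * math.pi * x) ** 20
            + math.sin(20 * math.pi * x) ** 20) ** 20

def f3_easom(x1, x2):
    """F3 (Easom, maximisation form) as read from Table 7."""
    return (math.cos(x1) * math.cos(x2)
            * math.exp(-((x1 - math.pi) ** 2 + (x2 - math.pi) ** 2)))

# Evaluate both functions at the tabulated optimum points.
print(f1_zhang(0.675))
print(f3_easom(math.pi, math.pi))

# Coarse grid scan of F1 on (0, 1): no sample should exceed the value at 0.675.
grid_max = max(f1_zhang(k / 2000) for k in range(1, 2000))
print(grid_max)
```

F3 reaches exactly 1 at $(\pi, \pi)$ because both cosines equal $-1$ and the exponent vanishes, in agreement with the tabulated optimum.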


7. Conclusion

In this work, we presented a GA IP core that is fully parameterised in terms of the number of population individuals (pop_sz) and their resolution in bits (genom_lngt), the resolution in bits of the fitness (score_sz), the number of elite genes in each generation (elite), the method used for crossover (cross_method) and mutation (mut_method), the maximum number of generations (max_gen), the mutation probability (mr) and its resolution in bits (mut_res), as well as the resolution in bits of the scaling factor r used by the RWS algorithm. This parameterisation allows the GA to be adapted to any problem specification without any further change to the developed VHDL code. Furthermore, the proposed hardware-implemented GA operates at a clock rate of 92 MHz (10.8 ns period) and achieves a noteworthy speedup when compared to its software version. Additionally, the hardware area required for the implementation and the RAM requirements are kept small according to the PAR report (see Table 5).

Compared to other GA hardware implementations (Aporntewan and Chongstitvatana 2001; Tu Lei, Zhu Ming-cheng and Wang Jing-Xia 2002; Tang and Yip 2004; Zhu, Mulvaney and Chouliaras 2007), our design operates at a clock frequency up to five times higher and implements more than one crossover and mutation method, which can be changed during its execution. Moreover, our design utilises more parameters and is evaluated not only by using benchmarking functions but also by solving the NP-complete TSP.

Figure 12. Benchmark functions: (a) Zhang Zhang function; (b) Easom function; (c) Rastrigin’s function.

References

Aporntewan, C., and Chongstitvatana, P. (2001), “A Hardware Implementation of the Compact Genetic Algorithm,” in Proceedings of IEEE Congress on Evolutionary Computation, pp. 624–629.

Deliparaschos, K.M., Nenedakis, F.I., and Tzafestas, S.G. (2006), “Design and Implementation of a Fast Digital Fuzzy Logic Controller Using FPGA Technology,” Journal of Intelligent and Robotic Systems, 45, 77–96.

Digalakis, J.G., and Margaritis, K.G. (2002), “An Experimental Study of Benchmarking Functions for Genetic Algorithms,” in Proceedings of IEEE Conference on Transactions, Systems, Man and Cybernetics, 79, 3810–3815.

Holland, J.H. (1975), Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, University of Michigan Press: Ann Arbor, MI.

Koza, J.R. (1992), Genetic Programming: On the Programming of Computers by means of Natural Selection, MIT Press: Cambridge, MA.

Koza, J.R., Bennett, F.H. III., Andre, D., and Keane, M.A. (1999), Genetic Programming III: Darwinian Invention and Problem Solving, Morgan Kaufmann: San Francisco, CA.

Figure 13. Experiment results: (a) estimated optima versus chromosome length; (b) estimated generations versus population size; (c) estimated calculation time versus population size.


Martel, E. (2007), “Solving Travelling Salesman Problems Using Genetic Algorithms,” http://ai-depot.com/Articles/51/TSP.html.

Mitchell, M. (1996), An Introduction to Genetic Algorithms, MIT Press: Cambridge, MA.

Pham, D., and Karaboga, D. (2000), Intelligent Optimisation Techniques: Genetic Algorithms, Tabu Search, Simulated Annealing and Neural Networks, Springer-Verlag: New York, NY.

Reinelt, G. (2007), TSP Libraries, University of Heidelberg, Department of Computer Science. http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/.

Sjoholm, S., and Lindh, L. (1997), VHDL For Designers, Prentice Hall Europe: England.

Tang, W., and Yip, L. (2004), “Hardware Implementation of Genetic Algorithms Using FPGA,” 47th Midwest Symposium on Circuits and Systems (MWSCAS ’04), pp. 549–552.

Tu Lei, Zhu Ming-cheng, and Wang, Jing-Xia (2002), “The Hardware Implementation of a Genetic Algorithm Model with FPGA,” in IEEE International Conference on Field-Programmable Technology (FPT ’02), pp. 374–377.

Tzafestas, S.G. (1999), Soft Computing in Systems and Control Technology, World Scientific: Singapore.

Wikipedia (2008), “Semiconductor Intellectual Property Core,” Wikipedia, The free encyclopedia, http://en.wikipedia.org/wiki/Semiconductor_intellectual_property_core.

Xilinx (2008a), “Spartan-3 FPGA Family: Complete Data Sheet–DS099,” Xilinx. http://www.xilinx.com/bvdocs/publications/ds099.pdf.

Xilinx (2008b), “ISE Foundation software,” http://www.xilinx.com/ise/logic_design_prod/.

Zainalabedin, N. (1997), VHDL: Analysis and Modeling of Digital Systems, McGraw-Hill: New York, NY.

Zhang, L., and Zhang, B. (2000), “Research on the Mechanism of Genetic Algorithms,” Journal of Software, 11, 945–952.

Zhu, Z., Mulvaney, D.J., and Chouliaras, V.A. (2007), “A Novel Genetic Algorithm Designed for Hardware Implementation,” International Journal of Computational Intelligence, 3, 281–288.
