[ieee 2010 21st ieee international symposium on rapid system prototyping (rsp) - fairfax, va, usa...


Reconfigurable Router for Dynamic Networks-on-Chip

Philipp Mahr
Department of Computer Science
University of Potsdam
14482 Potsdam, Germany
Email: [email protected]

Christophe Bobda
Department of Computer Science
University of Potsdam
14482 Potsdam, Germany
Email: [email protected]

Abstract: A reconfigurable router architecture for dynamic Networks-on-Chip (DyNoC) is presented. Dynamically placed modules cover several processing elements and routers of the DyNoC. These processing elements communicate over a second communication level using direct links between neighbouring elements. Routers covered by modules are therefore unused. In this paper, several possibilities to use such a router as an additional resource to enhance the complexity of modules are presented. The reconfigurable router is evaluated in terms of area, speed and latencies. A case study in which the router is used as a lookup table demonstrates the feasibility of this approach.

I. INTRODUCTION

In the past, the performance of microprocessors could be increased through the exploitation of instruction-level parallelism and the scaling of transistors. Unfortunately, this cannot be continued because of a growing concern about power consumption and the diminishing returns in exploiting instruction-level parallelism. A solution to this problem is the exploitation of high-level parallelism using multiprocessor systems [1].

One of the key components in multiprocessor systems is the interconnect. A variety of interconnection schemes, like buses, crossbars, rings, stars or Networks-on-Chip (NoCs), are currently used, with buses and NoCs being the dominant schemes in research communities [2]. While buses offer a low latency, they do not scale well and are therefore a good solution only for a limited number of processing elements (PEs). NoCs provide far better scalability but introduce high and variable latencies, and they are seen as the most viable solution for chips containing many PEs [3].

Because of the diversity of applications, it is not possible to design multiprocessor systems that can meet the requirements of all applications. Therefore, there is an increasing interest in the research community in employing reconfigurable architectures to build flexible and adaptive multiprocessor systems. Common reconfigurable chips are FPGAs, which use single-bit configurable logic blocks (CLBs) and are therefore known as fine-grain reconfigurable devices. Coarse-grain devices provide reconfiguration on a function level using word-level configurable function blocks (CFBs). While fine-grain reconfigurable hardware has the benefit of high flexibility, coarse-grain reconfigurable hardware trades off some flexibility for a potentially higher degree of optimisation in terms of area and power. With coarse-grain architectures, significantly less time is needed to perform reconfiguration, and the amount of reconfiguration data (bit stream) is much lower compared to fine-grain architectures [4], [5].

This work focuses on the design of a coarse-grain reconfigurable router for the dynamic Network-on-Chip (DyNoC). In such a network, modules covering several PEs and routers are dynamically placed on and removed from the chip. Therefore, the routers need to enable communication when used as normal network routers, and need to serve as additional resources to accelerate computation when covered by a module. This work explores the usage of routers as additional resources and presents an implementation of the reconfigurable router.

The rest of this paper is organized as follows. Related work is given in section II, and section III describes the central points of DyNoC, a dynamic Network-on-Chip architecture. Section IV presents the architecture of the reconfigurable router, and in section V the proposed architecture is evaluated in terms of area, speed and latencies. A case study is presented before this paper is concluded in section VI.

II. RELATED WORK

This section lists examples of routers which are adaptive to the network state or provide enhanced fault tolerance by offering several configurations. Additionally, a controller for mapping configurations on NoCs and examples of dynamic NoCs are presented.

[6] describes a modular router architecture using decoupled parallel arbiters and small crossbars for row and column connection. Because the router has a modular design, the network can be gracefully degraded in the event of a permanent fault, meaning that components of the router can be shared or recycled, thus increasing fault tolerance.

ViChar [7], the dynamic Virtual Channel regulator, is a unified buffer structure, able to dynamically allocate buffer resources depending on network state. A variable number of virtual channels (VCs) can be provided to each port of the router, allowing fewer but deeper VCs or more but shallower VCs, depending on the traffic situation.

978-1-42447074-7/10/$26.00 © 2010 IEEE DOI 10.1109/rsp 2010.43

[8] describes an adaptive network interface where the buffers can be dynamically sized, so that each buffer can be spread over its neighbours when more storage is needed. The buffer size can be configured to fit the needs of each network client using a delegate manager, which checks the status (e.g. full, empty) of the buffers and can initiate a configuration.

In [9] the mapping of applications on a NoC-based reconfigurable data-flow architecture is presented. A communication and configuration controller is used to manage reconfiguration and data synchronisation inside each resource. Configuration and scheduling of resources is distributed over the network using configuration servers, reducing the latencies of scheduling and configuration compared to centralized schemes.

CuNoC [10] is a dynamic interconnection structure for run-time fine-grain reconfigurable modules. The basic elements are communication units (CUs), which use adaptive routing of packets. A CU has no local port and only one buffer for all four inputs, and thus has a small memory footprint. Modules placed on the CuNoC occupy at least one CU and inherit all addresses and ports of the occupied CUs.

[11] describes the FLUX interconnection network, a scheme in which the interconnection between processing elements is established on demand before or during execution of the program. The placement of processing elements does not have to be fixed. The FLUX scheme generally targets fine-grain reconfigurable devices like FPGAs, which is in contrast to this work.

III. DYNOC ARCHITECTURE

Packet-based communication used in a Network-on-Chip allows two modules to communicate by sending packets, instead of communicating through direct connections. However, with reconfigurable devices like FPGAs, parts of the network can be reconfigured during runtime, and therefore the communication among modules needs to be dynamic to exploit the possibilities of reconfigurable devices. In [12], [13] Bobda et al. presented the DyNoC, a coarse-grain reconfigurable computing architecture for the dynamic interconnection of reconfigurable modules in a mesh network. The following subsections describe the central points of the DyNoC approach: the dynamic communication infrastructure and the routing in such a dynamic environment.

A. Communication Infrastructure

In the basic state with no modules placed, the network behaves like a normal NoC, consisting of processing elements and routers. PEs access the network via their corresponding routers. In contrast to a normal NoC, PEs can also communicate with their nearest neighbours using direct links. This allows the aggregation of several PEs into a rectangular module that computes a complex task. Such modules can be dynamically placed and removed. PEs inside a module do not need to use the routers in between for communication, but use the direct links instead. The advantage of this hybrid communication scheme is a gain in flexibility. However, resource usage is higher than that of a normal NoC. Figure 1 shows the DyNoC infrastructure, in which modules M1, M2 and M3, each covering several PEs and routers, are placed on the network at time t. At time t + 1 the module M3 has been removed and a new module M4 is placed on the network.

Fig. 1. Placement of modules at times t and t + 1

Because communication between modules is established at runtime and the configuration of the network is not known a priori, it must be ensured that all modules and pins of the network are reachable. Therefore, the modules and pins need to be strongly connected, which is achieved by having a ring of routers surrounding each rectangular module. This guarantees that each message to a module or pin can reach it.

B. Surrounding XY-Routing

Modules placed on the network cover several routers and PEs and therefore create obstacles for packets. In the basic network state each router has four active neighbour routers (except for routers at the border), but with modules placed dynamically on the network, some of the channels between routers are blocked. This is why traditional routing algorithms cannot be used in a DyNoC. An extension to the well-known XY routing algorithm is the adaptive S-XY (Surrounding-XY) algorithm, which makes its routing decisions locally.

The routing algorithm has three different modes. In normal XY mode (N-XY), plain XY routing is applied: packets are first sent horizontally to the correct column (X-coordinate) and then vertically to the correct row (Y-coordinate). The second mode is surround horizontal (SH-XY), which is entered when an obstacle to the left or right of the packet needs to be surrounded. The last mode is surround vertical (SV-XY), which is entered when an obstacle in the upward or downward direction is detected. In [12] it was shown that, with a very high probability, S-XY routing is deadlock free, by proving that there is always a path from the source of a packet to its destination and that the packet will reach its destination after a fixed number of steps.
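The mode selection described above can be illustrated with a short Python sketch. It is not the authors' implementation: the coordinate convention, the `covered` set of router positions hidden under modules, and the function names are assumptions made for illustration, and the actual surrounding path is left out.

```python
from enum import Enum


class Mode(Enum):
    N_XY = "normal XY"
    SH_XY = "surround horizontal"
    SV_XY = "surround vertical"


def xy_step(cur, dst):
    """Preferred next hop under plain XY routing, or None on arrival."""
    (x, y), (dx, dy) = cur, dst
    if x != dx:  # first move horizontally to the destination column
        return (x + (1 if dx > x else -1), y)
    if y != dy:  # then move vertically to the destination row
        return (x, y + (1 if dy > y else -1))
    return None


def select_mode(cur, dst, covered):
    """Return (mode, next_hop); next_hop is None when a detour is required.

    `covered` is a hypothetical set of router coordinates hidden by modules.
    """
    nxt = xy_step(cur, dst)
    if nxt is None or nxt not in covered:
        return Mode.N_XY, nxt
    if nxt[1] == cur[1]:  # the blocked hop was a horizontal move
        return Mode.SH_XY, None
    return Mode.SV_XY, None  # the blocked hop was a vertical move
```

For example, `select_mode((1, 0), (1, 2), {(1, 1)})` reports the SV-XY mode, because the packet is already in the correct column and its vertical hop is blocked.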

In figure 2 a packet surrounding an obstacle in the vertical direction is shown. To avoid a deadlock, which happens with N-XY routing when a packet that is already in the correct column (X-coordinate) is sent to a router on its left or right, the packet needs to be guided around the obstacle. Because the optimal direction in which to surround an obstacle is not known, long routing paths can be the result.

Fig. 2. Surrounding obstacle in vertical direction using router guiding

Router guiding is used to minimize the routing path. The routers around a module are instructed on the direction in which a packet must be sent to minimize the surrounding routing path. For example, a packet already in the correct X-position and moving from top to bottom is blocked by an obstacle. The packet first needs to travel in the X-direction, guided by the routers, to surround the obstacle before it can travel in the Y-direction to complete the surrounding. For router guiding, each router has two additional lines for each neighbour router.
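As a rough illustration of what the guiding information has to encode, the following Python sketch picks the shorter horizontal detour for a packet that is blocked while travelling vertically. The function name, the inclusive column range of the module and the convention that west means decreasing X are assumptions; the paper does not specify how the direction is computed.

```python
def guide_direction(x, module_left, module_right):
    """Pick the shorter horizontal detour for a packet blocked at column x.

    The module is assumed to cover columns module_left..module_right
    (inclusive); the ring of routers lies just outside that range.
    """
    if not module_left <= x <= module_right:
        raise ValueError("packet is not blocked by this module")
    steps_west = x - (module_left - 1)   # hops to the ring on the west side
    steps_east = (module_right + 1) - x  # hops to the ring on the east side
    return "west" if steps_west <= steps_east else "east"
```

Ties are resolved towards the west here, which is an arbitrary choice.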

IV. RECONFIGURABLE ROUTER

In the DyNoC architecture, the routers inside a module are redundant, because communication inside a module uses direct wiring and therefore packet-based communication is not needed. In [12] it was proposed to use the redundant routers as additional resources to implement more complex modules. However, it was left open how to use the routers as additional resources. In this work we provide an answer to this question.

Routers generally consist of three main components: a crossbar switch which connects multiple inputs to multiple outputs, a set of buffers to temporarily store packets or parts of packets, and control components that manage the forwarding of packets according to their destination. Several possibilities to use these components as additional resources in a DyNoC exist. Firstly, the crossbar switch can be used as an additional communication resource between PEs. Secondly, the buffers can serve as randomly accessible memories, e.g. to implement lookup tables. Lastly, the control components can be replaced by a small processor running all control functionality in software, which provides the possibility to use the router as a co-processor. While a co-processor would provide great flexibility, it also has some major drawbacks. Besides the resources for implementing the processor, additional memory to store instructions is needed (when in router mode), thus leading to a higher demand for area. Reconfiguration in this case is performed by exchanging the software, which makes the process of reconfiguration more complex and significantly increases the amount of reconfiguration data and time.

In the following sections, a coarse-grain reconfigurable router is presented that uses the crossbar switch and the set of buffers of the router as additional resources to increase the flexibility and complexity of modules in a DyNoC.

A. Router Architecture

In figure 3 a simplified view of the proposed router architecture is given. The input channels for the local, east and south ports and the output channels for the local and south ports can be seen. Compared to a basic router (see for instance [14]), the main differences of the reconfigurable router are the extension of the control logic to manage the different configurations and the possibility to randomly access the input buffers of the north, east, west and south ports (memory management and configuration controller). Also, the routing protocol is extended to fit the changing environment of the DyNoC by implementing the previously described S-XY routing algorithm and router guiding.

Fig. 3. Simplified reconfigurable router architecture

Wormhole switching (or wormhole flow control) is used because it offers some advantages over switching techniques like cut-through, store-and-forward or circuit switching. While cut-through and store-and-forward switching allocate buffers and channel bandwidth to packets, wormhole switching allocates both of these resources to smaller pieces of a packet (flow control digits, or flits), resulting in an overall smaller footprint of the router. Circuit switching, on the other hand, is a bufferless switching technique where circuits between two elements are established before the actual communication happens. Circuit switching in reconfigurable devices presents some drawbacks, like long communication delay or the exclusive use of chip space [14], [15].

All five ports (local, north, south, east and west) of the router are 32 bits wide. In a DyNoC, routers need to be tightly coupled with their corresponding PE to achieve maximum utilization of the available router resources when combined inside a module. Therefore, it is sufficient to allow reconfiguration to be initiated by a configuration network interface between router and PE. This constraint additionally eases the design of the router and reduces the amount of resources needed for configuration inside the router, because the local port is used for both communication and configuration. In contrast to the buffers of the ports connecting neighbouring routers (north, south, east and west), the local buffer of the router uses a FIFO implementation only, because configuration is initiated over the local port, which must be available at any time.

The buffers assigned to the (input) ports east, west, south and north can be accessed like dual-ported RAM (DPRAM). The first port of the DPRAM is used as a write port connected to a (local) configurable memory controller (CMC), which on the one hand is connected to the channel of the neighbouring router using FIFO logic and on the other hand is connected to the local port, allowing the local PE to write. The second port is a read port connected to the crossbar switch, using either FIFO logic for reading the flits of packets or control logic allowing random reads. A memory management unit (MMU) is used to provide a contiguous buffer (virtual memory) to the attached PE by converting virtual addresses to actual physical ones. In fact, up to four physically fragmented buffers can be used as one contiguous resource by the PE. This allows the PE to easily access the memory of the router.
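The address conversion performed by the MMU can be pictured with the following Python sketch. The buffer depth of 32 words follows from the 1024-bit (32x32) buffers reported in section V; the function name, the list of available buffers and the mapping order are assumptions for illustration only.

```python
BUFFER_WORDS = 32  # each input buffer holds 32 words of 32 bits (1024 bits)


def translate(virtual_addr, available_buffers):
    """Map a virtual word address to (buffer name, physical word offset).

    `available_buffers` is a hypothetical ordered list of the input buffers
    (at most four) that are currently free to be used as memory.
    """
    if virtual_addr < 0:
        raise IndexError("virtual address must be non-negative")
    index, offset = divmod(virtual_addr, BUFFER_WORDS)
    if index >= len(available_buffers):
        raise IndexError("virtual address outside the mapped buffers")
    return available_buffers[index], offset
```

With `["east", "west", "south"]` available, virtual address 40 falls into the second buffer at offset 8, so the PE sees 96 contiguous words although they are stored in three separate buffers.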

All five buffers, which allow simultaneous reads and writes, and their corresponding controllers function independently and in parallel, which is critical for the low latency requirements of the router.

A configuration controller inside the router manages reconfiguration and is implemented as a state machine.

B. Reconfiguration

When a reconfiguration is initiated, the configuration controller reconfigures the router to act as a lookup table (LUT), as a readable and writable memory (RAM) or as a resource for intra-module communication. A fixed path between PEs can be established when the router is used for communication inside a module. Additionally, router guiding can be enabled in each reconfiguration mode.

In LUT mode an 8-, 16- or 32-bit word can be looked up, so that a computation is replaced with a simple read during runtime. Therefore, all the words to look up need to be written to the memory during configuration. To fully utilize the limited amount of memory of the router, four 8-bit words or two 16-bit words can be stored in one 32-bit memory column. When a lookup is initiated, one or more addresses are sent to the router, which then looks up the results by calculating the physical addresses using the MMU and addressing the corresponding CMC. The results are sent directly to the PE.
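The packing of narrow lookup words into 32-bit memory columns can be sketched in Python as follows. The lane order (lowest index in the least significant bits) and the function names are assumptions; the paper only states that four 8-bit or two 16-bit words share one 32-bit column.

```python
def pack_words(values, width):
    """Pack 8-, 16- or 32-bit values into 32-bit memory words."""
    if width not in (8, 16, 32):
        raise ValueError("width must be 8, 16 or 32")
    per_word = 32 // width
    mask = (1 << width) - 1
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for lane, value in enumerate(values[start:start + per_word]):
            word |= (value & mask) << (lane * width)  # assumed lane order
        words.append(word)
    return words


def lookup(words, index, width):
    """Read entry `index` back out of the packed memory words."""
    per_word = 32 // width
    word_addr, lane = divmod(index, per_word)
    return (words[word_addr] >> (lane * width)) & ((1 << width) - 1)
```

Packing five 8-bit entries this way needs two memory words instead of five, which is the saving the text refers to.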

In RAM mode the memory of the router is used as a readable and writable memory. A write packet contains an address followed by a 32-bit data word, while a read packet contains only an address. The read data is then sent to the local PE. This mode comes essentially for free, as all logic to access the memories is needed in LUT mode as well.

A path (communication over routers) between routers can be established by fixing the inputs and outputs of the switching matrix, so that no routing and switch allocation is required. This configuration allows the sending of data between two (non-neighbouring) PEs inside a module and therefore increases the complexity and flexibility of modules. This feature is called tunneling, as a tunnel between two PEs is created using routers.

Fig. 4. Module using reconfigurable routers as resources

In figure 4 an example mapping of a module using reconfigurable routers as resources can be seen. The routers at positions 1/3 and 2/2 are used as lookup tables by their corresponding PEs, while the PE at position 3/1 uses its router as a random access memory. A fixed path (tunnel) between the PEs at positions 1/1 and 3/2 is established. The router at 2/2 is used both as a lookup table and as a resource for intra-module communication. Router guiding is enabled to tell the routers at the border of the module the direction in which a packet must be sent. The module can access the network over router 4/3. Only the active communication channels are drawn in this figure.

Reconfiguration happens with configuration packets sent over the local channel to the router. To reconfigure the router, less than 32 bits are needed, as this is enough to distinguish between the several coarse-grain configurations. However, in LUT mode the results to look up need to be written to the memory during configuration, increasing the amount of configuration data.

Accessing the router always happens with packets, which are divided into configuration packets, memory access packets (LUT, RAM) and (normal) communication packets. Therefore, an adaptive header is used. The header consists of a static part and a configuration-dependent part. The static part is used to distinguish between configuration and non-configuration packets, while the configuration-dependent part holds information about packet size, address, configuration type or tunneling source and destination port. The header of communication packets holds information about size and destination. The information of the configuration packets is stored in configuration control registers inside the router.

When a configuration is triggered, it has to be guaranteed that no packets present in the memories are destroyed. Packets in the memories need to be processed before the memory can be configured. In parallel, router guiding is enabled to lock the port so that the corresponding neighbour routers will not send any new packets over the channel.

V. EVALUATION

Evaluation of the configurable router is done in terms of area, speed and latencies. A case study shows the feasibility of this approach on a real-life example.

A. Synthesis Results

For the purpose of comparison, a generic five-port router (Generic Router) using wormhole switching and XY routing was implemented and compared with the reconfigurable router (Reconfig. Router). Both routers were implemented in Handel-C, using special coding techniques and a custom compiler [16]. In table I the difference between the routers is shown. Synthesis results are given for a Xilinx Virtex 5 LX110 FPGA using Xilinx XST synthesis.

TABLE I
SYNTHESIS RESULTS

| Router | Slice LUTs | Slice Registers | Maximum Frequency [MHz] |
|---|---|---|---|
| Reconfig. Router | 3994 (5%) | 2246 (3%) | 113.1 |
| Generic Router | 2701 (3%) | 1542 (2%) | 176.7 |

Each of the five buffers of the router has 1024 (32x32) bits of memory. Compared to the generic router, the reconfigurable router occupies about 48% more LUTs and 45% more registers, which leads to a total increase of about 47%. This increase in size compared to the generic router implementation results from the extended control logic for configuration and random memory access and from the implementation of the S-XY routing algorithm, which is more costly than XY routing. Also, a 32-bit 1-to-4 demultiplexer is needed to access all input port buffers over the local port.
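The relative overheads can be recomputed from the raw counts in table I with a few lines of Python. The helper name is illustrative; the numbers are taken from the table. Computed this way, the overheads come out at roughly 47.9% for LUTs, 45.7% for registers and 47.1% for both combined.

```python
def overhead_percent(reconfig, generic):
    """Relative increase of the reconfigurable router over the generic one."""
    return 100.0 * (reconfig - generic) / generic


lut_overhead = overhead_percent(3994, 2701)       # slice LUTs
register_overhead = overhead_percent(2246, 1542)  # slice registers
total_overhead = overhead_percent(3994 + 2246, 2701 + 1542)

print(f"LUTs: {lut_overhead:.1f}%  registers: {register_overhead:.1f}%  "
      f"total: {total_overhead:.1f}%")
```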

B. Timings

The time needed to configure the router is low, which is one advantage of coarse-grain reconfiguration. In table II all timings are given in number of cycles, including timings for processing and reconfiguration.

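TABLE II
ROUTER TIMINGS

| Timing | Min. # Cycles | Max. # Cycles |
|---|---|---|
| $T_{LUT}$ | 3 | 5 |
| $T_{RAM}$ | 0 | |
| $T_{COMM}$ | 1 | |
| $T_{GUIDE}$ | 1 | |
| $T_{RESET}$ | 4 | traffic and size dependent |
| $T_{SETUP}$ | 2 | |
| $T_{LOOKUP}$ | 5 · i | 6 · i |
| $T_{WRITE}$ | 3 | 5 |
| $T_{READ}$ | 4 | 6 |
| $T_{PAYLOAD}$ | i | |
| $T_{ROUTING}$ | 1 | 3 (obstacle) |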

Equation 1 describes the amount of time $T_{CONF}$ which is needed for every configuration. No packets should be destroyed during a reconfiguration. Therefore, packets in the memories need to be routed before the buffer can be used, which is indicated by $T_{RESET}$. $T_{GUIDE}$ is the time for activating or deactivating router guiding. A setup time $T_{SETUP}$ is needed to read the first header flit and to extract all necessary information from it (update of the configuration control registers).

$$T_{CONF} = T_{SETUP} + T_{GUIDE} + T_{RESET} \tag{1}$$

While most configurations (RAM or communication) basically set a couple of configuration registers, which are then accessed by other router resources, configuring the router as a LUT takes longer. The words to look up need to be written to the memory of the router, and the time this takes depends on the size of the memories and their availability; therefore min. and max. timings are given in table II. Equation 2 describes the total time to configure the router, $T_{TTC}$. $T_{LUT}$ is the time needed to fill the memory with 32-bit words and $T_{RAM}$ the time to configure the router as a RAM, which does not introduce an additional overhead. The time $T_{COMM}$ is the time to configure the router for communication, either to enable tunneling or to be used as a basic router.

$$T_{TTC} = T_{CONF} + \begin{cases} T_{LUT} \\ T_{RAM} \\ T_{COMM} \end{cases} \tag{2}$$

The biggest impact on configuration time can come from $T_{RESET}$, which is the time needed to forward packets to the neighbouring routers. Only after all packets are processed can configuration take place. This timing is strongly related to packet size, applications and current network capacity.

In equation 3 the time for accessing the router, $T_{TTA}$, is given. Again a setup time $T_{SETUP}$ is needed to read the first header flit and to evaluate whether a reconfiguration needs to be performed. Additionally, all necessary information (packet size and destination) is extracted from the header flit. Timings for looking up words, $T_{LOOKUP}$, and for random access to the memory, $T_{WRITE}$ and $T_{READ}$, are given; a burst mode is currently available only for lookups. In communication mode, the time for calculating the route and switching the data is given by $T_{ROUTING} + T_{PAYLOAD}$. For tunneling $T_{ROUTING}$ is zero, as the route for the data has been fixed at configuration time.

$$T_{TTA} = T_{SETUP} + \begin{cases} T_{LOOKUP} \\ T_{WRITE} \\ T_{READ} \\ T_{ROUTING} + T_{PAYLOAD} \end{cases} \tag{3}$$
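Equations 1 to 3 can be evaluated directly with the cycle counts from table II. The following Python sketch does this for a few cases; the constant and function names are illustrative, and the mode-dependent term is passed in by the caller exactly as listed in the table, so no assumption is made about how $T_{LUT}$ scales with the number of words.

```python
# Cycle counts taken from table II.
T_SETUP = 2
T_GUIDE = 1
T_RESET_MIN = 4  # the maximum depends on traffic and packet size
T_RAM = 0
T_COMM = 1
T_LUT_MIN, T_LUT_MAX = 3, 5


def t_conf(t_reset=T_RESET_MIN):
    """Equation 1: time needed for every configuration."""
    return T_SETUP + T_GUIDE + t_reset


def t_ttc(mode_cycles, t_reset=T_RESET_MIN):
    """Equation 2: total configuration time for one mode."""
    return t_conf(t_reset) + mode_cycles


def t_tta(access_cycles):
    """Equation 3: access time for one lookup, read, write or transfer."""
    return T_SETUP + access_cycles


def lookup_cycles(i):
    """Minimum and maximum T_LOOKUP for a burst of i words."""
    return 5 * i, 6 * i


print("T_CONF (min):", t_conf())
print("T_TTC RAM / COMM / LUT (min):",
      t_ttc(T_RAM), t_ttc(T_COMM), t_ttc(T_LUT_MIN))
low, high = lookup_cycles(4)
print("T_TTA for a 4-word lookup:", t_tta(low), "to", t_tta(high))
```

With the minimum reset time, configuring the router as RAM costs 7 cycles and as a communication resource 8 cycles. A tunneled transfer of i payload flits costs `t_tta(0 + i)` cycles, since $T_{ROUTING}$ is zero in that case.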

C. Case Study

A 3x3 DyNoC was implemented on a Xilinx FPGA using configurable routers and MicroBlaze soft-core processors. The processors, running at a frequency of 125 MHz, were connected to the routers, running at 50 MHz, using Fast Simplex Link (FSL) direct links. The ML509 development board with a Virtex 5 LX110 FPGA was used, as this board provides sufficient resources. In this case study only the LUT mode of the router is evaluated, by comparing the time to calculate the value of a sine using the MicroBlaze processor with the time to look up the result using the router as a LUT. For this, one router and one MicroBlaze were used. A Taylor series is used to approximate the values of the sine on the domain [0, +π/4]. To calculate the sine for other values, the periodicity and symmetry of the sine can be used. The series is calculated up to the 4th term, which gives an accuracy of about five decimal places.
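The computation that the lookup replaces can be sketched in Python as below. This is not the MicroBlaze code used in the case study; the Horner-form evaluation, the function name and the way the 64 table entries are spaced over [0, π/4] are assumptions for illustration.

```python
import math


def sine_taylor(x):
    """Four-term Taylor approximation of sin(x) for 0 <= x <= pi/4."""
    x2 = x * x
    # Horner form of x - x^3/3! + x^5/5! - x^7/7!
    return x * (1 - x2 / 6 * (1 - x2 / 20 * (1 - x2 / 42)))


# Hypothetical 64-entry table, evenly spaced over [0, pi/4].
TABLE_SIZE = 64
SINE_TABLE = [sine_taylor(k * (math.pi / 4) / (TABLE_SIZE - 1))
              for k in range(TABLE_SIZE)]

# Largest deviation from math.sin on a fine grid over the domain.
max_error = max(abs(sine_taylor(k * math.pi / 4000) - math.sin(k * math.pi / 4000))
                for k in range(1001))
print(f"max error on [0, pi/4]: {max_error:.2e}")
```

The first omitted term, x^9/9!, is about 3e-7 at x = π/4, which bounds the truncation error and is in line with the accuracy stated above.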

TABLE III
COMPARISON OF SINE LOOKUP AND COMPUTATION

| Method | Cycles (@125 MHz) | Speedup |
|---|---|---|
| Lookup | 57 | 315 |
| MicroBlaze + FPU | 583 | 30.8 |
| Lookup + Conf. | 1115 | 16.1 |
| MicroBlaze | 17954 | 1 |

The calculation is compared with a lookup of the sine value by using a hardware timer to measure the number of cycles. The hardware timer was attached to the MicroBlaze using FSL. Results are given in table III, where the speedup is given relative to the computation on a MicroBlaze without an additional floating point unit (FPU). A lookup including prior configuration of the router with 64 x 32-bit words is also given.
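The speedup column of table III follows from the cycle counts, as the short Python sketch below shows. The break-even estimate at the end is an added illustration and assumes that the configuration overhead equals the difference between the "Lookup + Conf." and "Lookup" rows.

```python
import math

CLOCK_HZ = 125e6  # MicroBlaze clock used for the cycle counts

cycles = {
    "Lookup": 57,
    "MicroBlaze + FPU": 583,
    "Lookup + Conf.": 1115,
    "MicroBlaze": 17954,
}

baseline = cycles["MicroBlaze"]
speedup = {name: baseline / count for name, count in cycles.items()}
time_us = {name: count / CLOCK_HZ * 1e6 for name, count in cycles.items()}

# Assumed configuration overhead: "Lookup + Conf." minus a plain lookup.
conf_overhead = cycles["Lookup + Conf."] - cycles["Lookup"]
saving_vs_fpu = cycles["MicroBlaze + FPU"] - cycles["Lookup"]
break_even_lookups = math.ceil(conf_overhead / saving_vs_fpu)

for name in cycles:
    print(f"{name:18s} {speedup[name]:6.1f}x  {time_us[name]:8.3f} us")
print("lookups needed to amortize configuration vs. FPU:", break_even_lookups)
```

Under that assumption, the one-time configuration is amortized against the FPU-based computation after three lookups, and against the software-only computation with the first lookup.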

VI. CONCLUSION

In this paper a coarse-grain reconfigurable router for use in dynamic Networks-on-Chip (DyNoC) has been presented. In such an architecture, tasks are placed on the network as modules covering several PEs and routers. The router can be configured as a lookup table or a random access memory, or it can be used to build a fixed path between separated PEs, allowing the reuse of the routers as additional resources inside a module. An evaluation of the router in terms of area, speed and latencies was given, and a case study showed that the router can be used to increase performance by looking up results. Further developments will optimize the design in terms of latencies (e.g. tunneling) and area. The mapping of tasks on the routers as well as the placement, rearrangement and scheduling of tasks during run-time will also be considered. In addition, the latency and bandwidth of the network and the impact of reconfiguration have to be determined.

REFERENCES

[1] J. L. Hennessy and D. A. Patterson, Computer Architecture - A Quantitative Approach, 4th ed. Morgan Kaufmann, 2007.

[2] L. Benini and G. De Micheli, “Networks on chips: A new SoC paradigm,” Computer, vol. 35, pp. 70–78, 2002.

[3] T. D. Richardson, C. Nicopoulos, D. Park, V. Narayanan, Y. Xie, C. Das, and V. Degalahal, “A hybrid SoC interconnect with dynamic TDMA-based transaction-less buses and on-chip networks,” in VLSID ’06: Proceedings of the 19th International Conference on VLSI Design held jointly with 5th International Conference on Embedded Systems Design. Washington, DC, USA: IEEE Computer Society, 2006, pp. 657–664.

[4] S. Vassiliadis and D. Soudris, Fine- and Coarse-Grain Reconfigurable Computing. Springer Publishing Company, Incorporated, 2007.

[5] R. Hartenstein, “A decade of reconfigurable computing: a visionary retrospective,” in DATE ’01: Proceedings of the conference on Design, automation and test in Europe. Piscataway, NJ, USA: IEEE Press, 2001, pp. 642–649.

[6] J. Kim, C. Nicopoulos, and D. Park, “A gracefully degrading and energy-efficient modular router architecture for on-chip networks,” in ISCA ’06: Proceedings of the 33rd annual international symposium on Computer Architecture. Washington, DC, USA: IEEE Computer Society, 2006, pp. 4–15.

[7] C. Nicopoulos, V. Narayanan, and C. R. Das, Network-on-Chip: A Holistic Design Exploration. Springer Science + Business Media, 2009.

[8] R. Dafali and J.-P. Diguet, “Self-adaptive network interface (SANI): Local component of a NoC configuration manager,” in International Conference on Reconfigurable Computing and FPGAs, 2009, pp. 296–301.

[9] F. Clermidy, R. Lemaire, Y. Thonnart, and P. Vivet, “A communication and configuration controller for NoC based reconfigurable data flow architecture,” in NOCS ’09: Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip. Washington, DC, USA: IEEE Computer Society, 2009, pp. 153–162.

[10] S. Jovanovic, C. Tanougast, C. Bobda, and S. Weber, “CuNoC: A dynamic scalable communication structure for dynamically reconfigurable FPGAs,” Microprocess. Microsyst., vol. 33, no. 1, pp. 24–36, 2009.

[11] S. Vassiliadis and I. Sourdis, “FLUX interconnection networks on demand,” J. Syst. Archit., vol. 53, no. 10, pp. 777–793, 2007.

[12] C. Bobda and A. Ahmadinia, “Dynamic interconnection of reconfigurable modules on reconfigurable devices,” IEEE Design and Test of Computers, vol. 22, pp. 443–451, 2005.

[13] C. Bobda, A. Ahmadinia, M. Majer, J. Teich, S. P. Fekete, and J. van der Veen, “DyNoC: A dynamic infrastructure for communication in dynamically reconfigurable devices,” in FPL, 2005, pp. 153–158.

[14] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2002.

[15] C. Bobda, Introduction to Reconfigurable Computing: Architectures, Algorithms, and Applications. Springer Publishing Company, Incorporated, 2007.

[16] L. Middendorf and C. Bobda, “Declarative programming with Handel-C,” in ERSA ’10: International Conference on Engineering of Reconfigurable Systems and Algorithms, Las Vegas, Nevada, USA, 2010.