[ieee 2011 international conference on communications, computing and control applications (ccca) -...



Network Interface Sharing for SoCs based NoC

Brahim Attia, Wissem Chouchene, Abdelkrim Zitouni and Rached Tourki

Faculty of Sciences of Monastir, Monastir, 5019, Tunisia

Electronics and Micro-Electronics Laboratory (LAB-IT06)

{brahim.attia,chouchenewissem}@yahoo.fr

{abdelkrim.zitouni,rached.tourki}@fsm.rnu.tn

Abstract-The demand for IP reuse and system-level scalability in System-on-Chip (SoC) designs is growing. Network-on-Chip (NoC) constitutes a viable solution space for emerging SoC design challenges. In this paper, we present a configurable Network Interface (NI) architecture design approach with smaller area and lower power. The small area is achieved by sharing memory resources among the three burst modes used by the OCP IP, or among many IPs connected to the NI. The low power is obtained by a power-saving mechanism based on two levels of clock gating. Experimental results show that the adaptability, FIFO sharing, and clock gating integrated in the proposed NI allow a significant reduction in area and power.

Keywords: Network-on-Chip, Network Interface, OCP, sharing

I. INTRODUCTION

A major challenge of current and future chip design is the integration of components with millions of transistors and their efficient operation. System-on-Chip (SoC) designs provide such an integrated solution for various complex applications. One of the key issues in SoC design is the communication architecture between heterogeneous components. Most communication architectures in current SoCs are based on buses. However, the bus architecture has inherent limitations [1-3]. A packet-switched network that delivers messages between communicating components has been proposed as a solution for SoC design [4-6]. Such a Network-on-Chip (NoC) provides a high-performance communication infrastructure. NoC is a new paradigm for integrating a large number of IP cores to implement an SoC [7]. In the NoC paradigm, a router-based network is used for packet-switched communication among on-chip cores. Since the communication infrastructure as well as the cores from one design can easily be reused for a new product, NoC offers a high potential for reusability.

A NoC is composed of routers, which transport data from one node to another, links between routers, and Network Interfaces (NIs), which implement the interface to the IP modules. One of the key components of on-chip networks is the wrapper for the different IP cores in the tiles [8-10]. Since reusable IP cores may not have been developed with the NoC in mind, a wrapper is required as the interface between the IP core and its associated router. The NI must provide services at the transport layer of the OSI reference model [11], because this is the first layer where the offered services are independent of the network implementation. This is a key ingredient in decoupling computation from communication [12-13], which allows IP modules and interconnects to be designed independently of each other. A number of socket specifications exist to this end, such as VCI (Virtual Component Interface) [14], OCP (Open Core Protocol) [15] and AXI (Advanced eXtensible Interface) [16]. Since most NoCs are message-passing by nature, an adapter is needed.

In realistic NoC architectures, NIs play a significant role in determining the overall NoC area. NI sharing could be a viable option to reduce the area overhead of these components. The most common approach [8] consists of attaching multiple cores to the same NI, thus sharing the same input port to the network. In [8], the authors proposed replicating the buffering resources in the NI, which leads to an increase in area that hardly justifies this design choice. In fact, buffers can account for more than 30% of the NI area [34]. Moreover, replicating the buffering resources increases the complexity of the buffer control logic, which reduces the maximum operating frequency or increases the NI latency. The problem can be relieved, as in [8], by inferring custom-made hardware FIFOs at the cost of flexibility.

In this paper, we present how many IPs connected to the same NI can share both the NI and its buffering resources with low power. Many master IPs can share one master NI (MNI) and many slave IPs can share one slave NI (SNI). In this way, we can considerably reduce the number of NIs in the SoC and, as a result, the area of the whole SoC. The aim of the proposed approach is to identify the key issues of NI design and to develop an efficient NI that can be used in various NoC topologies such as 2D mesh [18], torus [5], Spidergon [19], and octagon [20].

The paper is organized as follows. Section II presents the related work. Section III details the architecture of the proposed configurable NI (CNI). Section IV presents experimental results. Finally, Section V summarizes the main results and proposes future work.

II. RELATED WORKS

There are many publications in the area of novel network architecture design. Network interfaces, their architectures and their optimizations have, however, not been addressed extensively in the open literature, and only a few publications have addressed issues specific to the design of the NI module [21-24]. Guidelines and trade-offs for NI design are summarized in [25]. NIs with a low area footprint have been proposed in [26-27]. The reduction in NI hardware complexity that comes from aligned packet formats is highlighted in [28].

In [29], an NI implementing the VCI standard interface was presented for the SPIN NoC. However, the adapter has quite a high forward latency. In [30], an OCP-compliant adapter for the Xpipes NoC was briefly described. That NI has a low area and supports a single outstanding read transaction. In [31], an OCP-compliant NI for an asynchronous NoC is presented. In [40], OCP-compliant NIs supporting the precise burst mode were designed for a mesh NoC. In [41], a generic architecture is presented that provides OCP-compliant NIs in any burst mode for a mesh NoC and that can be used with other topologies. In [17], NI sharing techniques are used for area-optimized NoC architectures. In [32], run-time system activity in adaptive NoC-based MP-SoC platforms is monitored by observing the transactions performed on the communication subsystem. The collection of information about system activity is implemented within OCP-IP compliant NIs.

The area concern for NoC implementation has been raised in the literature [33-36]. It was shown for an Xpipes-based system [33] that the NoC architecture of a 30-core system takes one order of magnitude more cell area than a state-of-the-art multi-layer interconnect. In terms of floorplan area, the increase can be as large as a few tens of square millimeters, and more than half of the NoC area is due to NIs. In [34], the authors extend a commercially available SoC for picture improvement in high-end TVs with the Æthereal NoC. Their initial results indicate that replacing the original interconnect (consisting of dedicated links and multiplexers for bypasses) with a programmable NoC as the interconnect fabric increases the SoC area by 4%, and that 78% of this increase comes from the NIs. We conclude that these network components should have a low area footprint, since the IP modules attached to them are relatively small.

Perhaps the most complete description of an NI is that reported in [8], which provides throughput or latency guarantees and supports multiple point-to-point connections. In practice, multiple initiator cores could be connected to the same NI by exploiting these connections. Two message queues are allocated at NI instantiation time for each point-to-point connection: one for outgoing and one for incoming messages. The underlying principle consists of replicating the buffering resources of the shared NI according to the number of attached cores.

This paper proposes a structured approach for designing a configurable NI architecture based on a two-level gated clock scheme. The proposed Configurable NI (CNI) architecture can be shared by many IPs that use the three burst modes. The NI buffering resources are shared among a cluster of processing cores or communication targets that support the OCP-IP standard. The advantage of the proposed approach over the conventional architecture [8] is that no new buffer is needed for each newly connected IP, that is, the buffering resources are not duplicated for each attached core. The connected IPs share the same NI and the same buffering resources, which considerably reduces the area. The user of this NI can employ traffic merging and splitting modules for this purpose [17], which interleave the different traffic flows on a single NI. While potentially resulting in a more area-efficient implementation, this solution needs to preserve compliance with the end-to-end communication protocols linking NIs with communicating cores. This work can be extended to support the AXI and AHB protocols alongside OCP, with the IPs sharing both the NI and the buffering resources.

III. NETWORK INTERFACE ARCHITECTURE

We have designed two types of NIs for OCP-based cores in the proposed NoC, for each mode: the MNI (master NI) and the SNI (slave NI). Each NI is further split into two sub-modules, one for the request channel and one for the response channel. OCP defines three burst models:

(i) Precise burst: the burst length is known when the burst is sent. Each data word is transferred as a normal single transfer, with the address and command given for every data word that is written or read.

(ii) Imprecise burst: the burst length can change within the transaction. MBurstLength gives an estimate of the number of data words that remain to be transferred. Each data word is transferred as in the precise burst model, with the command and address sent for every data word.

(iii) Single request multiple data (SRMD) burst: the command and address fields are sent only once, at the beginning of the transaction. The destination core must therefore be able to reconstruct the whole address sequence from the first address and the MBurstSeq signal.

The advantage of burst transfers is that the bandwidth is used more effectively, since it is only necessary to send the starting address together with some information about the burst. In [30] and [31], the authors present the implementation details of NIs that support single reads and writes as well as SRMD burst reads and writes. In our case, we have designed an NI that supports all three burst modes, with both credit-based and handshake flow control mechanisms. The network on chip used and the packet format specification are described in [40]. The next section presents the architecture of the proposed MNI.
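The address reconstruction required by the SRMD model can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `srmd_addresses`, the 4-byte word size and the simplified handling of the INCR, STRM and WRAP sequence types are assumptions made for the example.

```python
def srmd_addresses(start_addr, burst_length, burst_seq="INCR", word_bytes=4):
    """Rebuild the per-word addresses of a single-request burst.

    Hypothetical helper: only three sequence types are modelled, in a
    simplified form, and word_bytes=4 assumes 32-bit data words.
    """
    if burst_length < 1:
        raise ValueError("burst_length must be at least 1")
    if burst_seq == "INCR":
        # Incrementing burst: each word follows the previous one.
        return [start_addr + i * word_bytes for i in range(burst_length)]
    if burst_seq == "STRM":
        # Streaming burst: every word targets the same address.
        return [start_addr] * burst_length
    if burst_seq == "WRAP":
        # Wrapping burst: addresses wrap inside an aligned window whose
        # size is burst_length * word_bytes.
        span = burst_length * word_bytes
        base = (start_addr // span) * span
        return [base + (start_addr - base + i * word_bytes) % span
                for i in range(burst_length)]
    raise ValueError("unsupported burst sequence: %r" % burst_seq)


if __name__ == "__main__":
    print([hex(a) for a in srmd_addresses(0x100, 4)])
    print([hex(a) for a in srmd_addresses(0x108, 4, "WRAP")])
```

Only the first address and the sequence type travel with the request, which is why the slave side needs a rule like this one to regenerate the addresses of the remaining data words.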

A. Master Network Interface

The mission of the master NI is to receive requests from the master core and to encapsulate each request into a packet. It also transmits the packets to the network, receives the responses from it, decapsulates them and forwards them to the master core. The MNI architecture is composed of two data-flow paths. The first is the request data flow, where the core is the source and the network is the destination. The second is the response data flow, where the network is the source and the core is the destination.

1) Request data-flow: The request data-flow of the MNI architecture in Figure 1 is mainly composed of three blocks representing the three OCP burst modes, a multiplexer, a FIFO, a routing table, a decision module and a set of logical OR gates. When an OCP request is present at the input, the routing table uses a portion of the Maddr field to output the path-to-target field. The routing table is a local memory in the master NI that stores the route paths to the slave cores in the NoC. The decision module first activates the clock of the macro-block corresponding to the mode generated by the IP and stops the clocks of the remaining modules. It then provides the selection code to the multiplexer. Only one block operates at a time, while the other two are idle. In this context, we exploit the concept of a clock tree, which selects one branch among the others for a given mode and transaction. This mechanism is implemented as a two-level tree.

Fig. 1. Architecture of MNI request data-flow

The levels of this tree are as follows:

Level 1: Passes the clock to a given burst mode. The selection is implemented in a module called the 'decision module'.

Level 2: At this level, the stoppable clock module passes or stops the clocks of two modules (header builder and control FIFO) according to the transaction type.

Within the MNI request data-flow, several modules communicate with each other. The modules constituting this entity are described as follows.

Decision module: This module distinguishes the mode generated by the IP for a given transaction. Consequently, it passes the input clock to one of the output clocks (Clk1, Clk2 or Clk3) and stops the remaining clocks. When the designated output clock is activated, the corresponding macro-block (SRMD, BP or BI) begins the transfer of flits. The selected clock is stopped after the last flit of the packet has been received by the network. Table I illustrates how this module distinguishes the burst modes.

TABLE I
BURST MODE DETECTION

Clk    Burst mode   Mburstsinglerequest   Mburstprecise
Clk1   SRMD         1                     1
Clk2   BP           0                     1
Clk3   BI           0                     0

A multiplexer passes the NI signals of the selected burst mode. For this, the decision module provides the necessary selection code to the multiplexer.

Write OR gate: This component takes three input signals, one from each of the three burst modules. The burst module selected by the decision module asserts its write signal to buffer flits in the shared memory resource (FIFO). The other burst modules hold their write signals at zero.
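The first level of the clock tree can be illustrated with a small behavioural sketch of the decision module and the write OR gate. It is a software model under stated assumptions, not the VHDL of the paper: the names `BURST_MODE_TABLE`, `decide` and `write_or` are hypothetical, and the BI row is taken as 0/0, which is what an imprecise, multi-request burst implies.

```python
# Table I as a lookup: (Mburstsinglerequest, Mburstprecise) -> (clock, mode).
# The BI entry (0, 0) is an assumption consistent with an imprecise burst.
BURST_MODE_TABLE = {
    (1, 1): ("Clk1", "SRMD"),
    (0, 1): ("Clk2", "BP"),
    (0, 0): ("Clk3", "BI"),
}

CLOCKS = ("Clk1", "Clk2", "Clk3")


def decide(single_request, precise):
    """Return the burst mode, the per-clock enables and the mux select."""
    key = (single_request, precise)
    if key not in BURST_MODE_TABLE:
        raise ValueError("signal combination not listed in Table I: %r" % (key,))
    clock, mode = BURST_MODE_TABLE[key]
    # Exactly one macro-block receives the clock; the other two are gated.
    enables = {name: name == clock for name in CLOCKS}
    # The same decision drives the multiplexer select input.
    mux_select = CLOCKS.index(clock)
    return mode, enables, mux_select


def write_or(write_signals):
    """Write OR gate: the shared FIFO is written if any burst block writes."""
    return any(write_signals)


if __name__ == "__main__":
    print(decide(1, 1))
    print(write_or([False, True, False]))
```

Because only the selected block is clocked, at most one of the three write inputs can be active, so a plain OR is enough to merge them onto the shared FIFO.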

SRMD Burst module: Take for example the SRMD burst module described in Figure 2, which is mainly composed of four functional sub-blocks (control FIFO, header builder, NI request transmit, stoppable clock).

Fig. 2. Architecture of SRMD burst mode module (blocks: Control FIFO, Header Builder, NI Request Transmit, FIFO interconnection)

For a write transaction, the stoppable clock module activates the two sub-modules (header builder and control FIFO). The header builder prepares the header flits, and the control FIFO module manages the transfer of data from the OCP IP to the FIFO. The NI manages the transmission of header and data flits to the network in two phases. During the first phase, when the header is available (select = 1), the sending of the two header flits starts and the validate signal is asserted. During the second phase, once the header has been received by the network, the NI deasserts the validate signal, which allows the header builder module to be deactivated, and starts reading data from the FIFO. A read transaction consists of the first phase only.

Control FIFO: This module manages writes to the FIFO. It can end or suspend writing when it sees a high level on the FIFO full signal. When a data word has been written to the FIFO and the FIFO is not full, the controller is ready to accept a new request, so it asserts SCmdAccept. It is activated by the stoppable clock module for write requests only.

Header builder: It takes as input the essential OCP signals of the transfer and the field supplied by the routing table, which gives the path to the target. It encapsulates this information into two header flits and sends them to the NI request transmit module. Once these flits have been received by the latter, the header builder is deactivated by the stoppable clock module for the remainder of the current transaction. It can then build the next header in the next transaction if its mode is selected.

NI request transmit: It synchronizes signals between the NI and the network. It receives packet flits from the header builder or from the FIFO-RX and sends them out of the NI to the network, using either the four-phase handshake push protocol or the credit-based protocol. The first flits to be transmitted are the header flits, which are received from the header builder module on the header bus. The validate signal is used to indicate the reception of the header flits of the current packet by the network.

Stoppable clock module: The stoppable clock module passes or stops the local input clock to the two sub-modules, control FIFO and header builder, of the burst mode module. Its role is to distinguish the type of command issued by the IP for a given transaction and to pass the clock to the header builder and control FIFO sub-modules accordingly. The local stoppable clock is driven by two OCP signals (Mcmd, Mdatalast) and one signal generated by the NI (validate):

a) The Mcmd signal identifies the type of request.

b) The Mdatalast signal indicates whether the current write data transfer is the last in a burst.

c) The validate signal is provided by the NI to indicate the reception of the header by the local router port.

The passing or stopping of the clock for a read or write operation proceeds as follows:

i) The first phase is the beginning of the transfer and the building of the header flits for a read or write request. For a write transaction, the control FIFO module must also be activated to allow data to be written to the FIFO.

ii) Once the header flits have been received by the NI, validate is asserted until the two header flits have been transmitted to the local port of the router.

iii) Once this is done, validate is deasserted, which completes a read transaction. In this phase, the clock of the header builder module is stopped for both read and write transactions. For a write transaction, the control FIFO module continues to run as long as Mdatalast is not asserted.

iv) Once Mdatalast has marked the last data word of the burst, the clocks of the header builder and control FIFO are both stopped.

OR gate: This component takes two input signals. The first comes from the control FIFO module (in the case of a write request) and the second from the header builder (in the case of a read request).
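The second level of clock gating, steps i) to iv) above, can be sketched as a cycle-by-cycle behavioural model. This is an illustrative reading of the description, not the authors' RTL: the class name `StoppableClock`, the command encodings "WR", "RD" and "IDLE", and the choice to return the enables that apply after each cycle are assumptions of the example.

```python
class StoppableClock:
    """Behavioural sketch of the per-burst-module stoppable clock.

    Hypothetical model: step() is called once per clock cycle with the
    sampled Mcmd, validate and Mdatalast values and returns the clock
    enables of (header builder, control FIFO) after that cycle.
    """

    def __init__(self):
        self.header_on = False
        self.fifo_on = False
        self._validate_seen = False

    def step(self, mcmd, validate, mdatalast):
        idle = not self.header_on and not self.fifo_on
        # i) A new request starts the header builder; a write also starts
        #    the control FIFO so that data can be buffered.
        if idle and mcmd in ("WR", "RD"):
            self.header_on = True
            self.fifo_on = (mcmd == "WR")
            self._validate_seen = False
        if self.header_on:
            if validate:
                # ii) Header flits are being handed to the router port.
                self._validate_seen = True
            elif self._validate_seen:
                # iii) validate dropped: header sent, stop the builder.
                self.header_on = False
        # iv) The last data word of a write stops the control FIFO clock.
        if self.fifo_on and mdatalast:
            self.fifo_on = False
        return self.header_on, self.fifo_on


if __name__ == "__main__":
    clk = StoppableClock()
    trace = [("WR", False, False), ("IDLE", True, False),
             ("IDLE", False, False), ("IDLE", False, True)]
    for sample in trace:
        print(sample, clk.step(*sample))
```

In this model a read keeps only the header builder clocked and goes fully idle as soon as validate falls, while a write keeps the control FIFO clocked until the last data word, which matches the two-phase behaviour described for the SRMD module.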

2) Response data-flow: The response data-flow is also divided into three stages. The first stage is the NI response receive module, which is activated when data is received from the network. The second stage is the FIFO TX, where data is temporarily stored. The third stage is the OCP response module, which transmits data to the master core. Within the MNI response data-flow in Figure 3, several modules communicate with each other.

Fig. 3. Architecture of MNI response data-flow (blocks: stoppable clock, OCP response module, FIFO, NI response receive module; OCP side signals: OCPCLK, Sdata, Sresp, SrespLast, MrespAccept; router side signals: RouteurCLK, Ack or Credit, Data)

The two modules, NI response receive module and OCP response module, are controlled by a stoppable clock module to minimize the power consumption of the chip when no response packet is being delivered by the NoC. The modules constituting this entity are described as follows.

Stoppable clock response module: This module detects the presence of a response packet when BOP = 1. It activates the NI response receive module by passing it the clock of the transmitting router if credit-based flow control is used, or the clock of the OCP IP if handshake flow control is used. To activate the OCP response module, it passes the clock of the OCP IP (with either handshake or credit-based flow control) as soon as the first data word is placed in the FIFO. Upon detecting the end of the transaction through the SrespLast signal (SrespLast = 1), it stops the clocks of the NI response receive and OCP response modules if there is no new response packet.

NI response receive module: It is the synchronizer between the NI and the network. It receives the packet flits from the network using the handshake or credit-based protocol and writes them to a FIFO. When data arrives on the Data bus, this module stores it temporarily in the FIFO so that it can then be passed to the OCP response module. When the NI uses handshake flow control, the clock passed to this module is the clock of the master OCP IP. When the NI uses credit-based flow control, the module uses the clock provided by the local output port of the router.

OCP response module: Its task is to transmit the response back to the master core. This module handles the response phase of the OCP. It reads the data from the FIFO and transmits it to the master IP. The data passed to the IP is accompanied by other signals such as Sresp, which indicates the nature of the response.
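The clock selection performed by the stoppable clock response module can be summarized with a short behavioural sketch. It is only a model of the rules stated above: the class and method names, the string labels "router_clk" and "ocp_clk", and the per-cycle `step` interface are hypothetical.

```python
def response_clock_sources(flow_control):
    """Clock fed to each response-path module while it is active."""
    if flow_control == "credit":
        ni_clock = "router_clk"   # clock of the transmitting router port
    elif flow_control == "handshake":
        ni_clock = "ocp_clk"      # clock of the master OCP IP
    else:
        raise ValueError("flow_control must be 'credit' or 'handshake'")
    # The OCP response module always runs on the OCP IP clock.
    return {"ni_response_receive": ni_clock, "ocp_response": "ocp_clk"}


class ResponseStoppableClock:
    """Hypothetical cycle-level model of the response clock gating."""

    def __init__(self, flow_control):
        self.sources = response_clock_sources(flow_control)
        self.ni_on = False
        self.ocp_on = False

    def step(self, bop, fifo_empty, sresp_last):
        if bop:
            # A response packet begins: wake the NI receive module.
            self.ni_on = True
        if not fifo_empty:
            # First data word is in the FIFO: wake the OCP response module.
            self.ocp_on = True
        if sresp_last and not bop:
            # End of transaction and no new packet: gate both clocks.
            self.ni_on = False
            self.ocp_on = False
        return {
            "ni_response_receive":
                self.sources["ni_response_receive"] if self.ni_on else None,
            "ocp_response":
                self.sources["ocp_response"] if self.ocp_on else None,
        }


if __name__ == "__main__":
    gate = ResponseStoppableClock("credit")
    print(gate.step(bop=True, fifo_empty=True, sresp_last=False))
    print(gate.step(bop=False, fifo_empty=False, sresp_last=False))
    print(gate.step(bop=False, fifo_empty=False, sresp_last=True))
```

A value of None stands for a gated (stopped) clock, so the dictionary returned by each step shows at a glance which module is drawing dynamic power in that cycle.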

IV. EXPERIMENTAL RESULTS

In this section, the synthesis results are presented, and a cost analysis of area and power consumption is made based on these results. The performance of the SNI and the MNI is evaluated in terms of speed and latency. We present a comparative study of two NI implementations: the first uses handshake flow control and the second uses credit-based flow control. NIs with 32-bit OCP data fields and 32-bit network ports were modeled in VHDL at the RTL level. They were simulated with ModelSim and synthesized with ISE 9.1. The proposed NIs were prototyped on a Xilinx Virtex-5 FPGA device, the XC5VLX30, which has a capacity of 19200 slice registers. Table II shows the area of the configurable NIs for the two implementations. The power consumption results of the configurable NIs are presented in Table III. Table IV presents the speed of the configurable NIs. The maximum operating frequency obtained for these NI implementations is about 378 MHz. The latencies of the master and slave NIs, measured by simulation, are presented in Table V.

A. Network interface area

Table II presents the area of the configurable master and slave NIs that support the three burst modes. The first version uses the handshake flow control mechanism and the second uses the credit-based mechanism. For the slave NI, the handshake version occupies a larger area than the credit-based version, while the two master NI versions are close in size. The four-phase handshake mechanism requires at least two clock cycles to complete and at least two cycles for jitter. In addition, the FSM of the NI using handshake is larger, because it requires twice as many states as the FSM of the credit-based NI. Half of the states in the handshake FSM have the same functionality as the states of the credit-based FSM, and the remaining states are wait states.

TABLE II
SYNTHESIS RESULTS FOR AREA OF THE CONFIGURABLE MASTER AND SLAVE NI

Flow control   Resource                    Master NI    Slave NI
Handshake      Number of slice registers   1031 (5%)    1579 (8%)
               Number of slice LUTs        772 (4%)     2057 (10%)
               LUT-FF                      562 (45%)    1267 (53%)
Credit based   Number of slice registers   1116 (5%)    1293 (6%)
               Number of slice LUTs        811 (4%)     1723 (8%)
               LUT-FF                      590 (45%)    1031 (51%)
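As a cross-check, the register and LUT percentages in Table II can be recomputed from the device capacity. The sketch below reads each Table II entry as a count followed by a percentage and makes two assumptions that are not stated in the paper: that the XC5VLX30 offers the same 19200 capacity for slice LUTs as for slice registers, and that the tool truncates percentages rather than rounding them. The LUT-FF rows are left out because their reference total is not given in the text.

```python
DEVICE_CAPACITY = 19200  # slice registers (from the text); assumed for LUTs too

# (flow control, NI, resource) -> (count, reported percentage), from Table II.
TABLE_II = {
    ("handshake", "master", "registers"): (1031, 5),
    ("handshake", "slave", "registers"): (1579, 8),
    ("handshake", "master", "luts"): (772, 4),
    ("handshake", "slave", "luts"): (2057, 10),
    ("credit", "master", "registers"): (1116, 5),
    ("credit", "slave", "registers"): (1293, 6),
    ("credit", "master", "luts"): (811, 4),
    ("credit", "slave", "luts"): (1723, 8),
}


def utilization_percent(used, capacity=DEVICE_CAPACITY):
    """Integer utilization, truncated toward zero (an assumed convention)."""
    return used * 100 // capacity


def mismatches(table=TABLE_II):
    """Return the entries whose recomputed percentage differs from the table."""
    return [key for key, (used, reported) in table.items()
            if utilization_percent(used) != reported]


if __name__ == "__main__":
    for key, (used, reported) in sorted(TABLE_II.items()):
        print(key, used, utilization_percent(used), reported)
    print("mismatches:", mismatches())
```

Under these assumptions every recomputed value agrees with the table, which also shows that even the largest configuration (the handshake slave NI) uses only about a tenth of the device's LUTs.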

B. Power consumption

The power consumption results come from the Xilinx ISE tool (XPower), with the frequencies on all input ports set to 200 MHz. To obtain a good estimate of power consumption, the switching activity of all input and output signals must be estimated. Normally, using a VHDL simulator and an appropriate test bench, a detailed switching activity summary can be generated for all signals throughout the design. We have evaluated the power consumption based on the energy used per transaction. Table III displays the power consumption of the configurable master and slave NIs using handshake and credit-based flow control, with the switching activity estimated for a READ command.

TABLE III
CONFIGURABLE NI POWER ESTIMATE
(Freq = 200 MHz; rows: Handshake, Credit based)

For the master, power consumption with credit-based flow control is lower than with handshake, whereas it appears to be equal for the SNIs. The stoppable clock technique is beneficial for reducing dynamic power in the configurable NI, whose architecture is more complex than that of an NI supporting only one mode. Without it, the circuit would be likely to consume more power.

C. Speed

The speed of the slave NI and the master NI is given as the maximum frequency at which the designs can run. Table IV displays the speed of the configurable master and slave NIs that use handshake and credit-based flow control.

TABLE IV
SPEED OF THE TWO IMPLEMENTATIONS

The speed of the configurable credit-based MNI is lower than that of the handshake MNI. The SNIs have approximately the same speed. The maximum operating frequency obtained for these two NI implementations is about 378 MHz. We also note that most speed results with handshake are better than those with credit-based flow control for the master, and the reverse holds for the slave.

D. Latency

For the master NI, the latency of a write or read request transaction is defined as the number of cycles needed by the request data-flow, from the time the request is presented at the OCP interface to the time the first flit of the packet leaves the NI. The latency of a read response transaction is defined as the number of cycles needed by the response data-flow, from the time the response packet is presented at the local port of the router to the time the first response appears at the OCP interface. For the slave NI, the latency of a write or read request transaction is defined as the number of cycles needed by the request data-flow, from the time the request packet is presented at the local port of the router to the time the first request appears at the OCP interface. The latency of a read response transaction is defined as the number of cycles needed by the response data-flow, from the time the response is presented at the OCP interface to the time the first flit of the response packet leaves the SNI.

TABLE V
MNI AND SNI LATENCY RESULTS (CLOCK CYCLES)

                                BP           BI           SRMD
Flow control   Transaction      MNI   SNI    MNI   SNI    MNI   SNI
Handshake      Write request    3     10     3     10     3     10
               Read request     3     9      5     9      3     9
               Read response    7     2      7     2      7     2
Credit based   Write request    3     4      3     5      3     5
               Read request     3     4      5     4      3     2
               Read response    var   1      var   1      var   1

Slave latency: With handshake, the three modes have the same latency for each type of transaction. With credit-based flow control, the three modes have different request latencies because of the algorithm used in the OCP request module for each mode. The read response latency, however, is the same for the three modes because they use the same architectural structure (response data flow). Credit-based flow control allows the SNI to receive a new flit at each clock edge, while handshake requires more than two cycles. Therefore, the transaction latencies with credit-based flow control are almost half those with handshake. The latency of the configurable SNI is equal to the latency of the mode configured by the decision module. We have measured the request latency from the moment a request is issued by the master OCP IP on the master NI's CI until it is received by the slave OCP IP on the slave NI's CI.

V. CONCLUSION

In this paper, we describe a new configurable network interface architecture that supports three burst modes and conforms to the OCP specification. A performance study is conducted for area, power consumption, speed, and latency for NIs that use handshake or credit-based flow control. The shared memory technique yields an area gain for the MNI and SNI employing, respectively, the handshake and credit-based flow control. We show how the area can be reduced by sharing memory between the three OCP modes and how stoppable clock techniques can save power. Many IPs can share the same input port of the proposed configurable network interface by using the traffic merging and splitting modules described in [17], which interleave the different traffic flows on a single network interface. This work can be extended to support the AXI or AHB protocols alongside OCP, with IPs sharing the NI and its buffering resources.

REFERENCES

[1] J. Liang, S. Swaminathan, and R. Tessier, aSOC: A scalable, single chip communications architecture, in Proc. PACT, 2000.

[2] W. J. Dally, Virtual-channel flow control, in Proc. 17th Annu. Int. Symp. Comput. Architecture, May 1990.

[3] A. Mello, L. Tedesco, N. Calazans, and F. Moraes, Virtual Channels in Networks on Chip: Implementation and Evaluation on Hermes NoC, 18th Symposium on Integrated Circuits and Systems Design, 4-7 Sept. 2005, pp. 178-183.

[4] S. Kumar, A Network on Chip Architecture and Design Methodology, Proc. of IEEE Annual Symposium on VLSI, 2002, Pittsburgh, USA, pp. 117-124.

[5] W. J. Dally and B. Towles, Route Packets, Not Wires: On-Chip Interconnection Networks, Proc. Design Automation Conf. (DAC), 2001, pp. 638-689.

[6] F. Karim et al., An Interconnect Architecture for Networking Systems on Chips, IEEE Micro, vol. 22, no. 5, Sept.-Oct. 2002, pp. 36-45.

[7] L. Benini and G. De Micheli, Networks on Chips: A New SoC Paradigm, IEEE Computer, Jan. 2002, pp. 70-78.

[8] A. Radulescu, J. Dielissen, K. Goossens, E. Rijpkema, and P. Wielage, An efficient on-chip network interface offering guaranteed services, shared-memory abstraction, and flexible network configuration, in Proceedings of the 2004 Design, Automation and Test in Europe Conference (DATE'04), IEEE, 2004.

[9] S. Yoo, G. Nicolescu, D. Lyonnard, A. Baghdadi, and A. Jerraya, A Generic Wrapper Architecture for Multi-Processor SoC Cosimulation and Design, Int. Symposium on HW/SW Codesign (CODES), Copenhagen, Denmark, April 2001, pp. 195-200.

[10] T. Marescaux, E. Brockmeyer, and H. Corporaal, The Impact of Higher Communication Layers on NoC Supported MP-SoCs, Proceedings of the First International Symposium on Networks-on-Chip, NOCS 2007, pp. 107-116.

[11] M. T. Rose, The Open Book: A Practical Perspective on OSI, Prentice Hall, 1990.

[12] K. Keutzer, System-level design: Orthogonalization of concerns and platform-based design, IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 19, no. 12, 2000, pp. 1523-1543.

[13] M. Sgroi, Addressing the system-on-a-chip interconnect woes through communication-based design, In Proc. DAC, 2001.

[14] Virtual Socket Interface Alliance. Virtual component interface standard - draft specification, v. 2.2.0. http://www.vsia.com (document access may be limited to members only); August 1997.

[15] Open Core Protocol Specification, Release 2.0, www.ocpip.org, OCP-IP Association, 2003.

[16] ARM, AMBA AXI Protocol Specification, version 1.0, www.arm.com, March 2004.

[17] A. Ferrante, S. Medardoni, and D. Bertozzi, Network Interface Sharing Techniques for Area Optimized NoC Architectures, Digital System Design Architectures, Methods and Tools, DSD 2008, pp. 10-17.

[18] S. Badrouchi, A. Zitouni, K. Torki, and R. Tourki, Asynchronous NOC Router Design, Journal of Computer Science, vol. 1, no. 3, 2005, pp. 429-436.

[19] M. Moadeli, A. Shahrabi, W. Vanderbauwhede, and M. Ould-Khaoua, An Analytical Performance Model for the Spidergon NoC, Proceedings of the 21st International Conference on Advanced Networking and Applications, 2007, pp. 1024-1021.

[20] F. Karim, A. Nguyen, and S. Dey, An interconnect architecture for networking systems on chips, IEEE Journal on Micro High Performance Interconnect, vol. 22, issue 5, pp. 36-45, Sept. 2002.

[21] P. T. Wolkotte, G. J. M. Smit, G. K. Rauwerda, and L. T. Smit, An energy-efficient reconfigurable circuit switched network-on-chip, Proc. 19th IEEE International Conference on Parallel and Distributed Processing Symposium, pp. 155-163, 2005.

[22] C. Albenes et al., ParIS: A parameterizable interconnect switch for Networks-on-Chips, Proc. ACM Conference, pp. 204-209, 2004.

[23] C. Neeb, M. Thul, and N. Wehn, Network-on-chip-centric approach to interleaving in high throughput channel decoders, Proc. IEEE International Symposium on Circuits and Systems, pp. 1766-1769, 2005.

[24] F. Moraes and N. Calazans, An infrastructure for low area overhead packet-switching network on chip, Integration - The VLSI Journal, vol. 38, issue 1, pp. 69-93, October 2004.

[25] D. Bertozzi, Network Interface Architecture and Design Issues, book chapter in "Networks on Chips: Technology and Tools", edited by L. Benini and G. De Micheli, Morgan Kaufmann, 2006.

[26] P. Bhojwani and R. Mahapatra, Interfacing cores with on-chip packet-switched networks, in Proc. VLSI Design, pp. 382-387, 2003.

[27] C. A. Zeferino, M. E. Kreutz, L. Carro, and A. A. Susin, A study on communication issues for systems-on-chip, in Proc. SBCCI, pp. 121-126, 2002.

[28] Kim et al., Solutions for Real Chip Implementation Issues of NoC and Their Application to Memory-Centric NoC, Int. Symp. on Networks-on-Chip, pp. 30-39, 2007.

[29] H. Charlery, A. Andriahantenaina, and A. Greiner, Physical design of the VCI wrappers for the on-chip packet-switched network named SPIN, Computer and Electrical Engineering, vol. 33, 2007, pp. 299-309.

[30] S. Stergiou et al., xpipes Lite: a Synthesis Oriented Design Library for Networks on Chips, DAC 2005, pp. 559-564.

[31] T. Bjerregaard, S. Mahadevan, R. Olsen, and J. Sparso, An OCP compliant network adapter for GALS-based SoC design using the MANGO network-on-chip, in Proceedings of International Symposium on System on Chip, IEEE, 2005.

[32] L. Fiorin, G. Palermo, and C. Silvano, MPSoCs Run-Time Monitoring through Networks-on-Chip, in Proceedings of the Design, Automation and Test in Europe Conference (DATE), IEEE, 2009, pp. 558-561.

[33] F. Angiolini, P. Meloni, S. Carta, and L. Benini, Contrasting a NoC and a traditional interconnect fabric with layout awareness, in DATE, 2006, pp. 124-129.

[34] F. Steenhof et al., Networks on Chips for high-end consumer electronics TV system architectures, DATE, 2006, pp. 148-153.

[35] W. Kim and S. Hwang, Design of an Area-Efficient and Low-Power NoC Architecture Using a Hybrid Network Topology, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2008, vol. 91, no. 11, pp. 3297-3303.

[36] D. Tortosa, T. Ahonen, and J. Nurmi, Issues in the development of a practical NoC: the Proteo concept, Integration, the VLSI Journal, October 2004, vol. 38, issue 1, pp. 95-105.

[37] S. Felperin, P. Raghavan, and E. Upfal, A theory of wormhole routing, IEEE Transactions on Computers, June 1996, vol. 45, no. 6, pp. 704-713.

[38] N. Seongmin, K. Daehyun, N. Vu-Duc, and C. Hae-Wook, Performance and Complexity Analysis of Credit-Based End-to-End Flow Control in Network-on-Chip, Springer Berlin / Heidelberg, Parallel and Distributed Processing and Applications, in Proceedings, pp. 268-277, 2007.

[39] P. Guerrier, and A. Greiner,A generic architecture for on-chip packet switched interconnections, in Proc. DATE, 2000.

[40] B. Attia, A. Zitouni, and R. Tourki, Design and implementation of network interface compatible OCP for packet based NOC, 5th International Conference on Design and Technology of Integrated Systems in Nanoscale Era, March 23-25, 2010, Hammamet, Tunisia.

[41] B. Attia, A. Zitouni, N. Abid, and R. Tourki, A modular network interface adapter design for OCP compatibles NoCs, International Journal of Computer and Network Security, IJCNS 2009, vol. 1, no. 2, pp. 101-109.