Architecture and Design of an Efficient Router for OASIS 3D Network-on-Chip System

Yuki Tanaka (s1190130)
Supervised by Prof. Abderazek Ben Abdallah
University of Aizu, Graduation Thesis. March, 2015


Abstract
High-performance 3D Network-on-Chips (3D-NoCs) have become viable solutions for future many-core systems. Through-Silicon-Via (TSV) is a key element of 3D-NoC design that supports good performance and low power consumption. The 3D-OASIS Network-on-Chip (3D-ONoC) was previously proposed in our laboratory. In this thesis, we integrate TSV connections in order to obtain a reliable 3D-ONoC router. The performance evaluation shows that the proposed router correctly delivers messages via the integrated TSV connections without any timing violations. The area of the proposed router is 51939 μm², the power is 403 μW, and the speed is 490.1 MHz.
1 Introduction
Network-on-Chips (NoCs) are packet-based on-chip communication architectures for multi-processor System-on-Chip design. They can improve scalability and throughput, and can reduce power consumption compared to conventional shared-bus architectures. However, increasing the number of cores on a flat chip structure may reduce the efficiency of the NoC, because both the network diameter and the communication distance grow. One solution is to extend the 2D-NoC paradigm to the 3D domain, as shown in Fig. 1.1.
3D Integrated Circuits (3D-ICs) have attracted significant attention in recent years, as they provide new opportunities for chip architecture innovation. 3D Network-on-Chips (3D-NoCs) are part of a cutting-edge technology that combines the NoC paradigm with 3D-IC technology. By including multiple silicon layers of active devices, 3D-ICs allow performance to improve with fewer scaling concerns. Moreover, 3D-NoCs have other potential benefits, such as miniaturization, high density (small footprint), fast signal transmission and processing speed, low power, and high functionality. With contemporary technology, a viable solution for building 3D-ICs that is attracting attention is to stack several 2D-IC layers upon each other using vertical connections, called Through-Silicon-Vias (TSVs), as depicted in Fig. 1.2.
The TSV technology is relatively recent. Data traffic through TSVs might become a bottleneck in 3D-NoCs, because these vertical connections have a much bigger footprint than ordinary NoC connections and different electrical characteristics [1]. Conventionally, 3D packaging connected the signals of the stacked chips by wire bonding along the chip edges. With that method, the package must be enlarged, even if only slightly, to make room for the connections; in addition, interposer layers usually have to be added between the dies. In contrast, TSVs pass through the silicon wafer without the need for wire bonding or interposer layers. In this way, the number of pins located on the chip’s edges is significantly reduced.
Thus, by adopting TSVs, there is no need to enlarge the package for wiring space or to provide an interposer between the dies. This makes it possible to reduce the area and thickness of the 3D package compared with the conventional method.
The TSV technology requires a shorter connection distance, which results in lower parasitic capacitance and lower resistance. As a result, delay and attenuation are reduced, the waveform deteriorates less, and the extra circuitry for amplification or electrostatic breakdown protection may be omitted. Moreover, high-speed operation of the circuit implemented in the package becomes easier and power consumption is lowered. These benefits make TSVs well suited to 3D-NoC systems.
Fig. 1.1 8x8 2D-NoC converted to a 4x4x4 3D-NoC architecture
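The effect of the conversion in Fig. 1.1 on the network diameter can be checked with a little arithmetic. The following is a minimal sketch, assuming a plain mesh topology in which every link counts as one hop; the function names are illustrative and do not come from the thesis.

```python
from itertools import product


def mesh_diameter(dims):
    """Longest shortest path in a mesh: sum of (size - 1) over all dimensions."""
    return sum(d - 1 for d in dims)


def mean_hop_distance(dims):
    """Average Manhattan distance over all ordered node pairs (self-pairs included)."""
    nodes = list(product(*(range(d) for d in dims)))
    total = sum(
        sum(abs(a - b) for a, b in zip(src, dst))
        for src in nodes
        for dst in nodes
    )
    return total / (len(nodes) ** 2)


flat = (8, 8)        # 64 routers on a single layer
stacked = (4, 4, 4)  # the same 64 routers spread over four layers

print(mesh_diameter(flat), mesh_diameter(stacked))
print(mean_hop_distance(flat), mean_hop_distance(stacked))
```

Under these assumptions the diameter drops from 14 to 9 hops and the mean distance from 5.25 to 3.75 hops for the same number of routers.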
In this thesis, we integrate Through-Silicon-Vias (TSVs) into the 3D-OASIS Network-on-Chip (3D-ONoC) [2, 3, 4, 5, 6], which was previously developed in our laboratory, in order to take full advantage of the benefits of 3D stacking. In addition, we compare the performance of 3D-ONoC before and after the implementation of TSVs.
2 Router architecture
2.1 3D OASIS router architecture
The block diagram of the 3D-OASIS-NoC (3D-ONoC) router [2, 3, 4, 5, 6, 11] is shown in Fig. 2.1. It has three main pipeline stages: Buffer Writing (BW), Routing-Calculation/Switch-Arbitration (RC/SA), and Crossbar Traversal (CT).
The router is the backbone component of the whole 3D-ONoC design. Each router has at most seven input and seven output ports: four ports connect to the neighboring routers, one port connects the switch to the local computation tile, and the remaining two ports connect the router to the upper and lower layers for inter-layer communication. As shown in Fig. 2.1, the router contains seven Input-port modules, one per direction, in addition to the Switch-Allocator (which contains the Stall-Go flow control and the Matrix-arbiter scheduler) and the Crossbar module, which handles the transfer of flits to the next neighboring node.
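The Routing-Calculation stage has to pick one of these seven ports for every flit. The sketch below assumes plain dimension-order XYZ routing; the thesis uses the Look-Ahead-XYZ variant of [6], whose details are not given here. The port names other than Local, Up and Down, and the mapping of coordinate directions to ports, are assumptions made for illustration.

```python
# Hypothetical port names; the thesis only names the Local, Up and Down ports.
PORTS = ("LOCAL", "NORTH", "EAST", "SOUTH", "WEST", "UP", "DOWN")

# Assumed coordinate change when leaving through each port.
STEP = {
    "EAST": (1, 0, 0), "WEST": (-1, 0, 0),
    "NORTH": (0, 1, 0), "SOUTH": (0, -1, 0),
    "UP": (0, 0, 1), "DOWN": (0, 0, -1),
}


def xyz_next_port(cur, dest):
    """Dimension-order routing: resolve X first, then Y, then Z."""
    (cx, cy, cz), (dx, dy, dz) = cur, dest
    if dx != cx:
        return "EAST" if dx > cx else "WEST"
    if dy != cy:
        return "NORTH" if dy > cy else "SOUTH"
    if dz != cz:
        return "UP" if dz > cz else "DOWN"
    return "LOCAL"  # arrived: eject to the local computation tile


def route(src, dest):
    """List the output port chosen at every router from src to dest."""
    path, cur = [], src
    while True:
        port = xyz_next_port(cur, dest)
        path.append(port)
        if port == "LOCAL":
            return path
        cur = tuple(c + s for c, s in zip(cur, STEP[port]))


print(route((0, 0, 0), (2, 1, 1)))
```

In this model only the UP and DOWN decisions cross between layers, so only those hops would use TSV connections.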
Figure 2.2 shows the 3D-ONoC flit format. The first bit is the tail bit, which marks the end of the packet. The next seven bits hold the Next-Port field, which the Look-Ahead-XYZ routing algorithm uses to define the direction of the next downstream neighboring node to which the flit will be sent. Then, three bits each are used to store the destination coordinates xdest, ydest, and zdest. Finally, the remaining 64 bits store the payload. Since 3D-ONoC targets various applications, the payload size can easily be modified to meet the requirements of a specific application. In addition, the architecture does not provide a separate head flit; therefore, every flit carries its X, Y, and Z destination addresses and a single bit indicating whether it is a tail flit.
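The flit layout described above can be sketched as a pack/unpack pair. This is an illustration only: it assumes that the fields are laid out from the most significant bit in the order in which the text lists them, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

PORT_BITS = 7  # Next-Port field
ADDR_BITS = 3  # each of xdest, ydest, zdest


@dataclass(frozen=True)
class Flit:
    tail: bool
    next_port: int
    xdest: int
    ydest: int
    zdest: int
    payload: int


def field_widths(payload_bits):
    """Field widths in the assumed MSB-to-LSB order."""
    return [1, PORT_BITS, ADDR_BITS, ADDR_BITS, ADDR_BITS, payload_bits]


def flit_width(payload_bits=64):
    return sum(field_widths(payload_bits))


def pack(flit, payload_bits=64):
    """Concatenate the fields into one integer, tail bit first."""
    values = [int(flit.tail), flit.next_port, flit.xdest,
              flit.ydest, flit.zdest, flit.payload]
    word = 0
    for value, width in zip(values, field_widths(payload_bits)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word, payload_bits=64):
    """Split an integer back into its fields, starting from the payload."""
    values = []
    for width in reversed(field_widths(payload_bits)):
        values.append(word & ((1 << width) - 1))
        word >>= width
    payload, zdest, ydest, xdest, next_port, tail = values
    return Flit(bool(tail), next_port, xdest, ydest, zdest, payload)


flit = Flit(tail=True, next_port=0b0100000, xdest=3, ydest=1, zdest=2, payload=0x1ABCD)
print(flit_width(64), flit_width(21))
print(unpack(pack(flit, 21), 21) == flit)
```

With a 64-bit payload this layout gives an 81-bit flit. With a 21-bit payload it gives 38 bits, which matches the flit size listed in the evaluation parameters.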
2.2 TSV Design
We chose to design the TSV in an abstract way, following the approach analyzed and carried out in [7]. In this design process, the specifications of the TSV must be defined first. The specification that matters for this design is essentially the dimension of the TSV; we considered a size of 4.06 μm × 4.06 μm. The files needed for the design are as follows:
- Verilog HDL description: needed in the synthesis phase to determine the logical functionality and connections.
- LEF file: required by Place&Route to determine the physical form and layout.
- Timing information: necessary for timing simulation.
2.2.1 TSV Verilog HDL Description
The Verilog HDL description is needed for simulation, synthesis, and Place&Route purposes. For simulation, the TSV must be connected to the vertical input and output ports of the router. This means that a vertical output signal from the router must enter a TSV, so the TSV needs an input port. During simulation, that signal must also be read back out of the TSV, so the TSV needs an output port as well. The Verilog code is simple: it connects inputs to outputs. However, Verilog HDL code cannot represent the vertical connection itself, which only takes shape in the Place&Route phase. The solution is to model the TSV with a normal construction, that is, an ordinary input port and an ordinary output port.
Fig. 1.2 3D integration with TSVs
Fig. 2.1 3D-ONoC router architecture
Fig. 2.2 3D-ONoC flit format
2.2.2 TSV LEF file and Timing
In the Cadence SoC Encounter CAD tool, a Library Exchange Format (LEF) file is needed for every standard cell or macro cell. A LEF file gives the physical description of such a cell [8]. Because a TSV is not a standard cell, it is implemented as a macro cell with its own physical properties, which its LEF file describes. To create the TSV LEF file, we use Cadence Virtuoso and FreePDK3D45.
First, we implement a TSV layout obtained from the FreePDK3D45 tool kit with Cadence Virtuoso. Figure 2.3 shows the TSV structure in FreePDK3D45 [10]: a TSV is a stack of metal layers (Metal 2 to Metal 9 are omitted for simplicity). Here, we make the TSV connection for one router, whereas the information provided in the FreePDK3D45 tool kit is for two tiers. To obtain the desired TSV, we extract the metal layer structure for one tier, as seen in Fig. 2.4. At this point, the TSV layout is not yet complete, because the I/O ports are missing. As shown in Fig. 2.4, we added the input port on Metal1 and the output port on TM. After adding the I/O pins, we exported the LEF file.
After generating the TSV LEF file from Virtuoso, more information must be added to it. The first item is the OBS information: OBS is a macro obstruction area where no metal wire can be routed, so the OBS is defined as the entire TSV area. The second item is the CLASS type: a macro definition begins with the macro name followed by the type of macro. In our case, the BLOCK class was chosen, so that the Place&Route tool treats the TSV as an ordinary macro. Finally, every macro description has its own SITE specification, which defines a placement site in the design. It must be compatible with the technology LEF file and the site defined therein, which is FreePDK45_38x28_10R_NP_162NW_34O.
3 Implementation
3.1 TSV integration in Verilog HDL
As a first step, we integrate the TSVs into the 3D-ONoC router in Verilog HDL. The TSVs are placed before the input ports and after the crossbar. In particular, TSVs are connected only to the Up and Down input and output ports. The stop signals (required for the adopted Stall-Go flow control) going to and coming from the Up and Down output/input ports also require TSV connections. Figure 3.1 represents the complete proposed 3D-ONoC router architecture with TSV connections.
Fig. 2.4 Adding I/O to the TSV layout in Virtuoso
3.2 Synthesis with Synopsys Design Compiler
In the second step, we use Synopsys Design Compiler to synthesize the Verilog HDL file, which contains both the router and the TSVs. Figure 3.2 illustrates the file hierarchy of the synthesized system. Synopsys Design Compiler generates the netlist and SDC files. Both files are needed in the Place&Route step: they contain the pin connection information and the timing information.
3.3 Place&Route with Cadence SoC Encounter
In the third step, we use Cadence SoC Encounter to place and route both the router and the TSVs. After this step, we can generate the new netlist, SDF, and SPEF files. These files contain information about the system’s physical properties after the Place&Route phase and are used to perform the post-layout simulation using Synopsys Design Compiler.
4 Evaluation Results
In this section, the simulation results and the obtained layout are provided.
4.1 Timing analysis
For this step, we execute a script that contains the operations needed for timing analysis. The operations are performed using Synopsys Design Compiler, with the netlist of the complete router (with TSVs) obtained from the Place&Route phase as input. From this analysis, we concluded that the maximum frequency for correct data transmission is 490 MHz.
Fig. 3.1 Proposed router architecture
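The maximum frequency found by the timing analysis corresponds to a minimum clock period, and the two are related by simple arithmetic. The sketch below uses the standard setup-timing relation, in which the critical-path delay equals the constrained clock period minus the worst slack. The constraint and slack values in the second call are hypothetical and do not come from the thesis.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1e3 / freq_mhz


def max_freq_mhz(clock_period_ns, worst_slack_ns):
    """Highest frequency at which the critical path still meets setup timing.

    Critical-path delay = constrained period - worst slack
    (positive slack means timing margin is left).
    """
    critical_path_ns = clock_period_ns - worst_slack_ns
    return 1e3 / critical_path_ns


# Period implied by the reported 490 MHz maximum frequency.
print(round(period_ns(490.0), 3))

# Hypothetical timing report: 2.5 ns constraint with 0.5 ns of positive slack.
print(max_freq_mhz(2.5, 0.5))
```

A 490 MHz maximum frequency therefore means that the slowest register-to-register path, including any TSV on it, takes roughly 2.04 ns.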
4.2 Timing simulation
In this second step, we check whether our design is free from any timing violation. We use ncverilog and simvision. We created a testbench in which two routers are connected via their Up and Down ports. Flits are injected into the local port of Router-1 and destined for Router-2; concurrently, flits are injected into the local port of Router-2 and destined for Router-1. Figure 4.1 shows the data injected into the local port of Router-1 and how they are correctly ejected from Router-2. Both values are the same, which confirms the correctness of the proposed router and the absence of timing violations.
4.3 Power evaluation
After making sure that there are no timing violations in our design, we proceed to evaluate the power consumption. For the evaluation, we execute a script that contains the necessary operations. These operations are almost the same as the ones performed in the synthesis step (Section 3.2). This evaluation shows that the total power consumption of the complete router is 403 μW.
4.4 Hardware design results
Figure 4.2 shows the layout of the proposed router with TSVs, produced with Cadence SoC Encounter. We can see the different modules of the router: the seven input ports, the Switch-Allocator, the Crossbar, and the TSV array containing 156 TSV connections. The TSV array is placed in the center of the chip, while the placement of the remaining components is left to the tool’s optimization. Table 4.3 shows the hardware design results in terms of area, power, and speed for both the baseline and the proposed routers. The area increases because TSVs are added to the router, and the power increases accordingly. The frequency decreases, which is usually associated with the area overhead.
Table 4.1 Evaluation parameters
Parameter        Value
Technology       Nangate 45 nm
TSV size         4.06 μm × 4.06 μm
TSV pitch        15 μm
Keep-out zone    2 μm
Flit size        38 bits (payload: 21 bits)

Fig. 4.1 Post-layout timing simulation of the proposed router
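The 156 TSV connections reported for the TSV array are consistent with the 38-bit flit of Table 4.1 under one plausible accounting. That accounting is an assumption made here and is not stated in the thesis: each of the four vertical channels (Up input, Up output, Down input, Down output) carries one full flit plus one Stall-Go stop signal.

```python
# Assumed vertical channels; the names are illustrative.
VERTICAL_CHANNELS = ("up_in", "up_out", "down_in", "down_out")


def tsv_count(flit_bits, stop_bits_per_channel=1):
    """Number of TSVs if every vertical channel carries one flit and its stop signal."""
    data = flit_bits * len(VERTICAL_CHANNELS)
    control = stop_bits_per_channel * len(VERTICAL_CHANNELS)
    return data + control


print(tsv_count(38))  # flit size used in the evaluation
print(tsv_count(81))  # flit size with the 64-bit payload of Section 2.1
```

Under the same assumption, the 81-bit flit that results from a 64-bit payload would need 328 TSVs, which shows how directly the payload width drives the size of the TSV array.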
5 Conclusion
In this thesis, we presented the architecture and design of a 3D Network-on-Chip router integrated with TSVs. The proposed router aims to take full advantage of the benefits of 3D integration. We made several TSV connections and integrated them with our previously designed 3D OASIS-NoC (3D-ONoC) router. The correctness of the connections in the complete proposed router architecture was verified. From the hardware complexity evaluation, we observed that the area of the proposed router is 51939 μm², the power is 403 μW, and the speed is 490 MHz.
As future work, we plan to evaluate the proposed router with a larger network size and to observe its behavior under larger real benchmarks.
Table 4.3 Hardware design comparison results
Fig. 4.2 Proposed 3D-ONoC router design: (a) with metal wires, (b) without metal wires

References
[1] C. Liu et al., “Vertical Interconnects Squeezing in Symmetric 3D Mesh Network-on-Chip”, in: ASP-DAC, 2011.
[2] ... Chip Interconnection for Future Gigascale MCSoCs Applications: Communication and Computation ... Symposium on Society, Science and Technology (TJASSST), Dec. 4-9th, 2006.
[3] K. Mori, A. Ben Abdallah, K. Kuroda, “Design and Evaluation of a Complexity Effective Network-on-Chip Architecture on FPGA”, The 19th Intelligent System Symposium (FAN 2009), Sep. 2009, pp. 318-321.
[4] A. Ben Ahmed, A. Ben Abdallah, K. Kuroda, “Architecture and Design of Efficient 3D Network-on-Chip (3D-NoC) for Custom Multicore SoC”, IEEE Proc. of the 5th International Conference on Broadband, Wireless Computing, Communication and Applications (BWCCA-2010), Nov. 2010.
[5] A. Ben Ahmed, “On the Design of a 3D Network-on-Chip for Many-core SoC”, Master thesis, Graduate School of Computer Science and Engineering, the University of Aizu, Feb. 2012.
[6] A. Ben Ahmed, A. Ben Abdallah, “LA-XYZ: Low Latency, High Throughput Look-Ahead Routing Algorithm for 3D Network-on-Chip (3D-NoC) Architecture”, 2012 IEEE 6th International Symposium on Embedded Multicore SoCs.
[7] ... Delft University of Technology, September 2011.
[8] J. Bhasker, The Exchange Format Handbook: A DEF, LEF, PDEF, SDF, SPEF & VCD Primer, Star Galaxy Publishing, 2006, ISBN: 0965039137.
[9] Library Overview, 2011.
... Deadlock-Free Fault-Tolerant Routing Algorithm for 3D Network-on-Chip Architectures, Journal of Parallel and Distributed Computing 74/4 (2014), pp. 2229-2240.