lab-3: simple accelerator design - national chiao tung ...cjtsai/courses/soc/labs/soc11_lab3.pdf ·...
Lab-3: Simple Accelerator Design
National Chiao Tung University
Chun-Jen Tsai
4/11/2011
Video Decoder Accelerator
- Goal: implement a block-based motion compensation circuit.
- Tasks:
  - Extend the given accelerator from point-wise operation to 8×8 block-based operation.
  - Give a demo (per group) to the TAs by 5/5 to show that your system is working.
  - Each team hands in a three-page report that describes your design and provides the performance improvement numbers.
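Architecture of the MCOMP Logic
- The architecture of the provided motion compensation module is composed of a register file and an interpolation datapath.
[Figure: block diagram of mcomp attached to the LEON3 ahbctrl. haddr and hwdata (32 bits) feed the register file as "addr in" and "data in"; addr[4:2] (3 bits) selects one of reg a, reg b, reg c, reg d, reg r. The datapath computes (a+b+1-r)/2 or (a+b+c+d+2-r)/4 and drives "data out" onto hrdata. hconfig is an 8×32 ROM (256 bits) routed back to ahbctrl.]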
Inside Lab 3 Package
- Download and unzip the package for Lab 3. You will see the following directory structure:

    Lab3_pkg
    |
    +- m4v_dec_ecos  (the complete video decoder with HW accelerator API)
    |
    +- rtl           (synthesizable RTL model of the motion compensation logic)
    |
    +- testbench     (C program for waveform simulation or FPGA verification)
Functions to be Accelerated
- In bilinear8x8.c, there are three functions for bilinear interpolation:
  - halfpel8x8_h()
  - halfpel8x8_v()
  - halfpel8x8_hv()
- The current code performs pixel-wise interpolation; you must modify it to perform block-based interpolation.
- With A, B, C, D the four neighboring integer-position pixels, the half-pel samples p, q, o are:

    p = (A + B + 1 - r) / 2
    q = (A + C + 1 - r) / 2
    o = (A + B + C + D + 2 - r) / 4

  where r = 0 or 1 is a “rounding control” parameter (selected by the encoder).

    A  p  B
    q  o
    C     D
Pixel vs. Block-based Operations
- Take halfpel8x8_h() for example:

    void
    halfpel8x8_h(uint8 * dst, uint8 * src, xint stride, xint rounding)
    {
        xint row, col, idx, sum;

        idx = 0;
        /* row holds the offset of the first pixel of each of the 8 rows */
        for (row = 0; row < (stride << 3); idx = (row += stride))
        {
            for (col = 0; col < 8; col++, idx++)
            {
                /* average of a pixel and its right-hand neighbor */
                sum = (xint) src[idx] + (xint) src[idx+1] + 1 - rounding;
                dst[idx] = (uint8) (sum >> 1);
            }
        }
    }

- Cut-off point between HW and SW in the provided logic: the per-pixel computation inside the inner loop.
- Cut-off point between HW and SW in your logic: the loop nest over the whole 8×8 block.
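Block Interpolation Architecture
[Figure: block diagram of the block-based mcomp attached to the LEON3 ahbctrl. haddr and hwdata (32 bits) feed the register file as "addr in" and "data in"; addr[8:2] (7 bits) selects one entry of the register file, which holds pixel(0) ... pixel(80). A controller and a block-based interpolator (with reg r) operate on the register file, and results return on hrdata. hconfig is an 8×32 ROM (256 bits).]

    type block_9x9 is array (0 to 80) of std_logic_vector(7 downto 0);
    signal pixel: block_9x9;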
Adding Logic to the LEON Platform
- In Lab 3, we add an AHB slave logic to the LEON3 platform. The RTL model directory contains some sample modifications:

    rtl
    |
    +- nctu +- video_acc +- mcomp.vhd, video_acc.vhd
    |       |            +- vhdlsyn.txt
    |       |
    |       +- dirs.txt
    |
    +- devices.vhd  (to replace the one in $(GRLIB)/lib/grlib/amba)
    |
    +- leon3mp.vhd  (to replace the one in $(GRLIB)/designs/leon3-gr-xc3s-1500)

- Put the whole nctu directory in $(GRLIB)/lib.
- Note: $(GRLIB) is the directory of the complete source tree of GRLIB.
Modify LEON Working Libraries
- To add our “nctu” video accelerator logic to the LEON3 working library, we have to add the library name “nctu” to the libs.txt files under the following two directories:
  - $(GRLIB)/designs/leon3-gr-xc3s-1500/
  - $(GRLIB)/lib/
Generation of Build Scripts
- Enter the directory designs/leon3-gr-xc3s-1500 in the GRLIB source tree, and type the following commands:

    $ make distclean    (clean up previous builds)
    $ make scripts      (generate automatic build scripts)

- The command “make scripts” searches the directory tree $(GRLIB)/lib for libs.txt, dirs.txt, vhdlsyn.txt, and vhdlsim.txt in order to generate proper build scripts for all the hardware logic.
Adding Logic to AHB Bus†
- The AMBA bus used for LEON3 has some extensions:
  - All HSEL signals are routed to all slave logics.
  - Some extra signals are added: HCACHE, HIRQ, HCONFIG, and HINDEX.
- The HCONFIG signal contains information about the logic to be added:
  - Each HCONFIG signal is 8×4 bytes.
  - GRLIB allows 64 masters and 64 slaves, with their HCONFIG records mirrored at 0xFFFFF000 ~ 0xFFFFFFFF.

†For details, see Jiri Gaisler and Sandi Habinc, GRLIB IP Library User’s Manual, ver. 1.0.22, Aeroflex Gaisler, 2010.
HCONFIG Record
- The HCONFIG record controls the address map, interrupt control, and caching control of a logic.
Address Decoding (1/2)
- The memory area mapped to a logic is determined by its bank address registers (BARs).
- There are two types of logics: memory and I/O.
  - For memory logic, each BAR specifies by default a 1 MB address space at any location of the 32-bit address space (except 0xFFF00000 ~ 0xFFFFFFFF).
  - For I/O logic, each BAR specifies by default a 256-byte address space at any location in the address range 0xFFF00000 ~ 0xFFFFFFFF.
Address Decoding (2/2)
- If a logic needs a larger contiguous address space, the MASK of a BAR can be used to increase the default range of its space:

    ahbctrl: AHB arbiter/multiplexer rev 1
    ahbctrl: Common I/O area disabled
    ahbctrl: AHB masters: 3, AHB slaves: 8
    ahbctrl: Configuration area at 0xfffff000, 4 kbyte
    ahbctrl: mst0: Gaisler Research Leon3 SPARC V8 Processor
    ahbctrl: mst1: Gaisler Research AHB Debug UART
    ahbctrl: mst2: Gaisler Research SVGA frame buffer
    ahbctrl: slv0: European Space Agency Leon2 Memory Controller
    ahbctrl: memory at 0x00000000, size 512 Mbyte, cacheable, prefetch
    ahbctrl: memory at 0x20000000, size 512 Mbyte
    ahbctrl: memory at 0x40000000, size 1024 Mbyte, cacheable, prefetch
    ahbctrl: slv1: Gaisler Research AHB/APB Bridge
    ahbctrl: memory at 0x80000000, size 1 Mbyte
    ahbctrl: slv2: Gaisler Research Leon3 Debug Support Unit
    ahbctrl: memory at 0x90000000, size 256 Mbyte
    ahbctrl: slv6: CS NCTU CODESIGN Lab3 Motion Compensation Logic
    ahbctrl: memory at 0xb0000000, size 1 Mbyte, prefetch
Interface of Lab 3 Logic

    entity mcomp is
      generic (
        ahbndx  : integer := 0;
        ahbaddr : integer := 0;
        addrmsk : integer := 16#fff#;
        verid   : integer := 0;
        hirq_no : integer := 0
      );
      port (
        rst   : in  std_ulogic;
        clk   : in  std_ulogic;
        ahbsi : in  ahb_slv_in_type;
        ahbso : out ahb_slv_out_type
      );
    end;

    architecture rtl of mcomp is

      constant hconfig : ahb_config_type := (
        0 => ahb_device_reg(VENDOR_NCTU, NCTU_MCOMP, 0, verid, hirq_no),
        4 => ahb_membar(ahbaddr, '1', '0', addrmsk),  -- prefetchable, not cacheable
        others => zero32
      );
Vendor/Device ID Registration
- Your vendor and device ID can be stored in the file $(GRLIB)/lib/grlib/amba/devices.vhd for simulation purposes:

    -- HW/SW Co-design LAB vendor code & device ID
    constant VENDOR_NCTU : amba_vendor_type := 16#18#;
    constant NCTU_MCOMP  : amba_device_type := 16#001#;

    -- pragma translate_off
    -- description strings are space-padded to the fixed length of their types
    constant NCTU_DESC : vendor_description := "CS NCTU CODESIGN        ";

    constant nctu_device_table : device_table_type := (
      NCTU_MCOMP => "Lab3 Motion Compensation Logic ",
      others     => "Unknown Device                 "
    );

    constant nctu_lib : vendor_library_type := (
      vendorid     => VENDOR_NCTU,
      vendordesc   => NCTU_DESC,
      device_table => nctu_device_table
    );
    -- pragma translate_on
Device Instantiation
- The device must be instantiated in the file $(GRLIB)/designs/leon3-gr-xc3s-1500/leon3mp.vhd.
- In this example, the device registers IRQ #14. Raising HIRQ(14) causes an interrupt to the interrupt controller, and then to the ISR.

    my_mcomp : mcomp
      generic map (ahbndx => 6, ahbaddr => 16#B00#, verid => 1, hirq_no => 14)
      port map (rstn, clkm, ahbsi, ahbso(6));
Implement the HW Bit File
- Change directory to designs/leon3-gr-xc3s-1500; logic implementation is done by typing:

    $ make ise

- You can use iMPACT to configure the FPGA.
- When you use “info sys” in GRMON, you should see:

    02.01:004   Gaisler Research  LEON3 Debug Support Unit (ver 0x1)
                ahb: 90000000 - a0000000
                AHB trace 1 lines, stack pointer 0x43fffff0
    . . .
    06.18:001   Unknown vendor  Unknown device (ver 0x1)
                ahb: b0000000 - b0100000
    01.01:00c   Gaisler Research  Generic APB UART (ver 0x1)
    . . .
Testing the Accelerator
- To test the motion compensation logic, you can build and run the C testbench program, mcomp_tb.c.
  - Building mcomp_tb.c does not require eCos.
- To run the complete video decoder with HW “acceleration,” simply drop in the new bilinear8x8.c as a replacement for the file in the previous lab's source tree.
Calling HW Logic from SW Code
- The current reference interface design is naive: every output pixel costs three register writes and one register read over the AHB bus.

    #define USE_HW_MC 1

    #if USE_HW_MC
    /* memory-mapped registers of the mcomp logic (AHB slave at 0xb0000000) */
    volatile xint *reg_a  = (xint *)0xb0000000;
    volatile xint *reg_b  = (xint *)0xb0000004;
    volatile xint *reg_c  = (xint *)0xb0000008;
    volatile xint *reg_d  = (xint *)0xb000000c;
    volatile xint *reg_r  = (xint *)0xb0000010;
    volatile xint *mc_2pt = (xint *)0xb0000014;
    volatile xint *mc_4pt = (xint *)0xb0000018;
    #else
    xint sum;
    #endif

    void
    halfpel8x8_h(uint8 * dst, uint8 * src, xint stride, xint rounding)
    {
        xint row, col, idx;

        idx = 0;
        for (row = 0; row < (stride << 3); idx = (row += stride))
        {
            for (col = 0; col < 8; col++, idx++)
            {
    #if USE_HW_MC
                *reg_a = (xint) src[idx];
                *reg_b = (xint) src[idx+1];
                *reg_r = (xint) rounding;
                dst[idx] = (uint8) (*mc_2pt);  /* read back (a+b+1-r)/2 */
    #else
                sum = (xint) src[idx] + (xint) src[idx + 1] + 1 - rounding;
                dst[idx] = (uint8) (sum >> 1);
    #endif
            }
        }
    }
    . . .
HW/SW Co-Simulation
- Sometimes, you want to use a logic simulator to debug your accelerator.
  - It is difficult to write a comprehensive testbench in HDL for complex systems (with both HW and SW components).
- Since we have the RTL model of the processor, we can perform HW/SW co-simulation using a logic simulator.
Installation of Logic Simulator
- For logic simulation, we use GHDL, a free, open-source logic simulator for VHDL. The project home page is located at http://ghdl.free.fr/
- The native Windows version of GHDL does not work for LEON3; get the Cygwin version from: http://home.comcast.net/~bp_labs/software/ghdl/
  - Note: the command path to GHDL should be set after /usr/bin, because the LEON3 configuration script requires gcc 3.x.x while GHDL has a built-in gcc 4.x.x that will cause problems if the path is not set properly.
Platform Setup for Simulation
- To set up LEON3 for simulation, type make xconfig under the directory $(GRLIB)/designs/leon3-gr-xc3s-1500.
- In the configuration menu, make the following modifications:
  - Processor → Debug Support Unit → Disable instr./AHB trace buffers
  - Debug Link → Enable only serial debug link
  - VHDL Debugging → Enable accelerated UART tracing
  - Peripherals → Disable everything except Memory Controller, UART, Timer, Interrupt Controller
Building Software for Co-simulation
- For HW/SW co-simulation, you must build the software image so that it can be included in the SDRAM model image, sdram.srec.
- If your software code is mcomp_tb.c, type the following commands:

    $ sparc-elf-gcc -o mcomp_tb.elf mcomp_tb.c
    $ sparc-elf-objcopy -O srec mcomp_tb.elf sdram.srec

- Now, sdram.srec can be put under designs/leon3-gr-xc3s-1500 for simulation.
Logic Simulation with GHDL
- GHDL is a VHDL compiler that generates an executable, testbench.exe, for logic simulation. Under $(GRLIB)/designs/leon3-gr-xc3s-1500, type:

    $ make ghdl
    $ ./testbench --vcd=waveform.vcd

- All the runtime signals will be saved in the specified VCD file, waveform.vcd.
Waveform Analysis with GTKWave
- You can analyze the signal file using GTKWave.
  - Downloadable from http://gtkwave.sourceforge.net/
Issues with GHDL Simulation
- GHDL simulation is somewhat slow: it takes about 10 minutes to co-simulate mcomp_tb.c and the HW logic on a 2.4 GHz dual-core machine.
- VCD files are huge (over 200 MB for mcomp_tb.c).
- Structured VHDL signals are not saved in VCD format.
  - You can break out structured signals in your design.
  - Another waveform format, ghw, saves structured VHDL signals, but the ghw file of the LEON3 platform crashes GTKWave.
GHDL Structured Signal Breakout
- If you plan to use the VCD format, each field of a structured signal should be copied to a simple signal:

    architecture rtl of mcomp is
    . . .
    begin
    . . .
    -- pragma translate_off
    hsel    <= ahbsi.hsel;
    haddr   <= ahbsi.haddr;
    hwrite  <= ahbsi.hwrite;
    hwdata  <= ahbsi.hwdata;
    hiready <= ahbsi.hready;
    -- pragma translate_on
    . . .
    -- sequential code follows
    end;
Final Remark
Doing digital circuit design is like crafting a sculpture; you have to visualize the whole structure before you start writing HDL code!