![Page 1: Hardware Image Signal Processing and Integration into Architectural Simulator for SoC Platform Hao Wang University of Wisconsin, Madison](https://reader036.vdocuments.us/reader036/viewer/2022062321/56649f1c5503460f94c33065/html5/thumbnails/1.jpg)
Hardware Image Signal Processing and Integration into Architectural Simulator for SoC Platform
Hao Wang
University of Wisconsin, Madison
Outline
Introduction on SoC
Motivation
Verilog implementation of JPEG encoder
Integrated SoC simulator
Future work
System-on-Chip Platform
Mobile computing – New driving force
Smartphones, Tablets
SoC – Popular solution
Qualcomm’s Snapdragon, Samsung’s Exynos
General-purpose CPU, graphics processing, application-specific accelerators, modem, etc.
Resource Management on SoC
Schematic of Snapdragon SoC
Resource Management on SoC
Memory bandwidth is the most critical resource shared on the SoC
Shared Memory Channel
Motivation
Heterogeneous system
CPU – Sensitive to memory latency
GPU – High bandwidth demand, real-time deadline
DSP, multimedia processor – Low response latency requirement
Key problem
No architectural simulator available for SoC platform
Integrated CPU-GPU simulator: http://cpu-gpu-sim.ece.wisc.edu/
Goal of this project
Design a hardware JPEG encoder using Verilog
Write an architectural model for the hardware encoder
Integrate into a CPU simulator (gem5) as one step to build an architectural simulator for SoC platform
JPEG Encoder (Verilog) Implementation
MATLAB generates the input matrix, which is read by the testbench
Input 8x8 blocks of 24-bit data into the encoder, one pixel per clock cycle
Operand collector ensures the full block is ready, to tolerate variable memory access latency
RGB to YCbCr conversion
DCT on 8x8 blocks
Quantization: multiply by (2^13/Qij), then right shift
DPCM and Huffman Encoding for DC components;
RLE and Huffman Encoding for AC components;
Bit streams coming from Y, Cb and Cr are combined to form an output stream (temporal multiplexing)
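The multiply-then-shift quantization step listed above can be sketched in Python as follows. This is a minimal model, not the author's Verilog: the function names are hypothetical, the table is the example luminance table from Annex K of the JPEG standard (the slides do not say which table the encoder uses), and the round-half-up offset and the 13-bit shift are assumptions that follow from the 2^13 scaling.

```python
SHIFT = 13  # reciprocals are scaled by 2^13, as on the slide

# Example luminance quantization table (JPEG standard, Annex K).
# Assumption: the slides do not state which table the encoder uses.
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def reciprocal_table(q_table):
    """Precompute round(2^13 / Qij) so the datapath needs no divider."""
    return [[round((1 << SHIFT) / q) for q in row] for row in q_table]


def quantize_block(dct_block, recip):
    """Quantize one 8x8 block: multiply by the reciprocal, round, shift right."""
    half = 1 << (SHIFT - 1)  # rounding offset (assumed round-half-up)
    return [
        [(coef * r + half) >> SHIFT for coef, r in zip(c_row, r_row)]
        for c_row, r_row in zip(dct_block, recip)
    ]
```

Replacing the division by a table lookup, one multiply and one shift is what makes this step cheap in hardware; the cost is a small rounding error from the 13-bit reciprocal.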
JPEG Encoder Result
Input: tif format, 768KB
Output: jpg format, 68KB
Synthesis Result & Throughput
Synopsys Design Compiler
TSMC 45nm general-purpose library, 800MHz
~1.0e7 blocks per sec
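The quoted throughput is consistent with a simple cycle count. The sketch below assumes each 8x8 block costs 64 input cycles (one pixel per clock) plus the 13 cycles inserted between blocks that the backup slides mention; that per-block cost is an inference, not a figure stated on this slide, and the function name is hypothetical.

```python
CLOCK_HZ = 800e6          # synthesis target from the slide
PIXELS_PER_BLOCK = 8 * 8  # one pixel enters per clock cycle
GAP_CYCLES = 13           # cycles inserted between blocks (backup slides)


def blocks_per_second(clock_hz=CLOCK_HZ,
                      pixels=PIXELS_PER_BLOCK,
                      gap=GAP_CYCLES):
    """Steady-state block rate under the assumed per-block cycle count."""
    return clock_hz / (pixels + gap)


print(f"{blocks_per_second():.2e} blocks/s")
```

With 77 cycles per block this comes to about 1.04e7 blocks per second, in line with the ~1.0e7 figure above.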
Simulator Integration
Difficult to find a standard
Which hardware components to include?
Low-level implementation details: pipelining, circuit design, etc.
Use Trimaran instead
A widely-used compilation/architecture infrastructure
General VLIW/application-specific processor
Configured to model a DSP processor
JPEG encoder on Trimaran
Software implementation
9.16e7 cycles @ 1GHz – 91.6ms (Verilog design: ~0.4ms)
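The two latencies can be cross-checked with a few lines of arithmetic. The sketch assumes the test image is 512x512 at 24 bits per pixel (which would match the 768KB input) and that the hardware spends 64 + 13 cycles per 8x8 block at 800MHz; neither assumption is stated on this slide, and the function names are hypothetical.

```python
def software_ms(cycles=9.16e7, clock_hz=1e9):
    """Trimaran software encoder: cycle count converted to milliseconds."""
    return cycles / clock_hz * 1e3


def hardware_ms(width=512, height=512,
                cycles_per_block=64 + 13, clock_hz=800e6):
    """Verilog encoder estimate: blocks x assumed cycles per block."""
    blocks = (width // 8) * (height // 8)
    return blocks * cycles_per_block / clock_hz * 1e3


if __name__ == "__main__":
    sw, hw = software_ms(), hardware_ms()
    print(f"software {sw:.1f} ms, hardware {hw:.2f} ms, ratio {sw / hw:.0f}x")
```

Under these assumptions the hardware estimate lands near 0.39ms, consistent with the ~0.4ms quoted above, a gap of roughly 230x.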
Simulator Integration
Still separate processes; communicate using a shared memory structure in Linux
Memory requests on the Trimaran side are fed to the CPU simulator (gem5) side, which simulates the DRAM timing and responds
[Diagram: gem5 (CPU) and Trimaran (DSP) coupled through shared memory. Labels: request queue, response queue, memory subsystem (M5), L2 cache, tick scheduler, clock tick, set, reset]
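The coupling described on this slide can be sketched as two queues and a shared tick loop. This is a single-process Python model for illustration only: the real setup uses two processes and Linux shared memory, and every name here (`MemRequest`, `FixedLatencyMemory`, `run_lockstep`), as well as the fixed DRAM latency, is hypothetical rather than part of gem5 or Trimaran.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MemRequest:
    addr: int
    issued_at: int  # tick at which the DSP side issued the request


class FixedLatencyMemory:
    """Stand-in for the CPU simulator's memory subsystem (assumed fixed latency)."""

    def __init__(self, latency):
        self.latency = latency
        self.in_flight = deque()  # (ready_tick, request), in issue order

    def tick(self, now, request_q, response_q):
        # Accept at most one new request per tick.
        if request_q:
            self.in_flight.append((now + self.latency, request_q.popleft()))
        # Return every request whose data is ready.
        while self.in_flight and self.in_flight[0][0] <= now:
            response_q.append(self.in_flight.popleft()[1])


def run_lockstep(addresses, latency=20, max_ticks=10_000):
    """Advance both sides one tick at a time; return (addr, round-trip ticks)."""
    request_q, response_q = deque(), deque()
    memory = FixedLatencyMemory(latency)
    pending = deque(addresses)
    done = []
    for now in range(max_ticks):
        if pending:  # DSP side: issue one request per tick
            request_q.append(MemRequest(pending.popleft(), now))
        memory.tick(now, request_q, response_q)  # CPU-simulator side
        while response_q:  # DSP side: consume responses
            req = response_q.popleft()
            done.append((req.addr, now - req.issued_at))
        if not pending and not request_q and not memory.in_flight:
            break
    return done
```

Driving both sides from one tick counter is the simplest form of lock-step execution; with separate processes the same loop would need a handshake on the shared tick value.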
Future Work
Figure out how Trimaran simulates timing info
Get lock-step execution done
Figure out real-world usage scenario
Real research – writing papers – graduate
THANK YOU!
BACKUP SLIDES
Some Details
RGB – YCbCr
24-bit in; 24-bit out
Pipelined; 3 cycles: 1 – mult; 2 – sum; 3 – rounding
DCT
8-bit in, pipelined; 64 11-bit outputs
Internal 32-bit
Output_enable is set when input enable is unset, so an idle cycle is required between 8x8 blocks
Quantization
4 cycles: 1 – latch in; 2 – quantize; 3 – buffer; 4 – rounding
Huffman Encoding
DC calculated first, AC calculated in zigzag order
13 cycles in total inserted between 8x8 blocks
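The ordering used ahead of Huffman coding can be sketched as follows: the DC term is coded as a difference from the previous block's DC (DPCM), and the 63 AC terms are scanned in zigzag order and run-length encoded. This is a simplified Python model with hypothetical function names; it emits (run, value) pairs and leaves out the size categories and the 16-zero run escape that JPEG Huffman coding adds on top.

```python
def zigzag_order(n=8):
    """(row, col) pairs of an n x n block in JPEG zigzag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals; direction alternates with the diagonal index.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def encode_block(block, prev_dc):
    """Return (dc_diff, ac_pairs, new_prev_dc) for one quantized 8x8 block."""
    scan = [block[r][c] for r, c in zigzag_order(len(block))]
    dc, ac = scan[0], scan[1:]
    dc_diff = dc - prev_dc  # DPCM on the DC coefficient
    pairs, run = [], 0
    for value in ac:  # run-length encode the AC coefficients
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    if run:  # trailing zeros collapse to a single end-of-block symbol
        pairs.append("EOB")
    return dc_diff, pairs, dc
```

For example, a block whose only nonzero entries are 10 at (0,0), -3 at (0,1) and 5 at (2,0), encoded with a previous DC of 4, gives a DC difference of 6 and the AC symbols (0, -3), (1, 5), EOB.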
Some Details
FIFO buffer
Check for 0xFF in the bitstream and add a dummy 0x00 after it
Append 0xFFD9 at the end
Post-processing
MATLAB generates the JPEG header and standard Huffman table
These are combined with the bitstream to get the actual JPEG file
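The FIFO-buffer fix-up above can be sketched in a few lines. The function name is hypothetical, and the model works on whole bytes after bit packing: it inserts 0x00 after every 0xFF in the entropy-coded data, so that no data byte can be mistaken for a marker, and then appends the end-of-image marker 0xFFD9.

```python
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def finalize_scan(entropy_bytes: bytes) -> bytes:
    """Byte-stuff the entropy-coded data and terminate it with EOI."""
    out = bytearray()
    for b in entropy_bytes:
        out.append(b)
        if b == 0xFF:
            out.append(0x00)  # dummy byte so 0xFF is not read as a marker
    out += EOI
    return bytes(out)
```

A decoder reverses this by dropping the 0x00 that follows any 0xFF inside the scan data.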