Paper Report
Presenter: Zong-Ze Huang
Synchronization for Hybrid MPSoC Full-System Simulation
Luis Gabriel Murillo, Juan Eusse, Jovana Jovic, Sergey Yakoushkin, Rainer Leupers and Gerd Ascheid
49th Design Automation Conference (DAC), 2012
2
Full-system simulators are essential to enable early software development and increase MPSoC programming productivity; however, their speed is limited by the speed of the processor models.
Although hybrid processor simulators provide native execution speed and target architecture visibility, their use for modern multi-core OSs and parallel software is restricted due to dynamic temporal and state decoupling side effects.
Abstract
3
This work analyzes the decoupling effects caused by hybridization and presents a novel synchronization technique which enables full-system hybrid simulation for modern MPSoC software.
Experimental results show speed-ups from 2x to 45x over instruction-accurate simulation while still attaining functional correctness.
Abstract (cont.)
4
Instruction Set Simulators (ISSs) are slower than real systems, and increasing their speed is a difficult challenge.
Hybrid Full-System Simulation
。Target ISS (TS)
。Host-compiled abstract simulator (AS)
Multi-processor system simulation
。Temporal decoupling
。Hybridization-introduced decoupling
What is the Problem
5
Related work
Increase ISS speed
。Simulation frameworks [1][2][3][4][6]
。Dynamic binary translation [17]
More abstract processor models
。[9], [14], [15], [16]
。HySim [10]
。[7]: virtualized function estimated time
This paper: Synchronization for Hybrid MPSoC Full-System Simulation
。Synchronize hybrid processor simulator
6
Traditional simulation workflow
C Sources → Target Compiler → Target Binary → Target ISS
Figure labels: Application, Closed Source Libraries, Memory, Simulator
7
Bridges the gap between two different abstraction levels.
Host mode
。One level of abstraction above ISS-IA
。Executes directly on the host machine
Disadvantage
。Loses visibility of the target architecture
。Synchronization problem
HySim – A Hybrid Simulation Framework
8
TLM-2 offers Temporal Decoupling to improve simulation speed.
Concept: Simulation parts that do not interact frequently with the surrounding environment may run ahead of the current simulation time for a short amount of time.
This avoids unnecessary kernel synchronization points and context switches.
Temporal Decoupling (1)
Figure: timelines of a synchronized simulation and a temporally decoupled simulation
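The effect of temporal decoupling on kernel synchronizations can be illustrated with a minimal Python sketch. This is not TLM-2 code: the function `run` and the parameters `total_ns`, `step_ns` and `quantum_ns` are assumptions made up for illustration. A process advances in fixed steps and either synchronizes with a hypothetical kernel after every step, or runs ahead and synchronizes only once its local offset reaches a quantum.

```python
def run(total_ns, step_ns, quantum_ns=None):
    """Return the number of kernel synchronizations needed to simulate total_ns.

    quantum_ns=None models a fully synchronized process (sync after each step).
    Otherwise the process runs ahead and syncs only once its local time
    offset reaches the quantum.
    """
    sim_time = 0        # time known to the (hypothetical) kernel
    local_offset = 0    # how far the process has run ahead of sim_time
    syncs = 0
    while sim_time + local_offset < total_ns:
        local_offset += step_ns              # execute one step locally
        if quantum_ns is None or local_offset >= quantum_ns:
            sim_time += local_offset         # hand control back to the kernel
            local_offset = 0
            syncs += 1
    if local_offset:                         # flush any remaining offset
        sim_time += local_offset
        syncs += 1
    return syncs

synchronized = run(total_ns=1000, step_ns=10)
decoupled = run(total_ns=1000, step_ns=10, quantum_ns=100)
```

With these made-up numbers the decoupled run needs ten times fewer synchronizations, which is where the saved context switches come from.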
9
TLM-2 defines four timing entities to describe temporal decoupling.
(System) Global Quantum
。This represents the time unit on which all PEs synchronize.
(PE) Global Quantum (βi)
。This represents the time unit on which a particular PE synchronizes.
Local Quantum (αi)
。For each PE, this represents the time remaining from the current SystemC time until the end of the current PE Global Quantum.
Local Time Offset (λi)
。The time by which PEi is ahead of the system.
Temporally Decoupled timing entities
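The relationship between these entities can be sketched in a few lines of Python. This is an illustration only, under the assumption that PE global quanta end at integer multiples of βi; the class `PE` and the helper names are hypothetical and do not come from the paper or from the TLM-2 library.

```python
from dataclasses import dataclass

@dataclass
class PE:
    beta: int          # PE global quantum (beta_i), in ns
    local_time: int    # time this PE has executed up to, in ns

def local_quantum(pe, systemc_time):
    """alpha_i: time from the current SystemC time to the end of the current
    PE global quantum (assumes quanta end at integer multiples of beta_i)."""
    return pe.beta - (systemc_time % pe.beta)

def local_time_offset(pe, systemc_time):
    """lambda_i: how far the PE is ahead of the system."""
    return pe.local_time - systemc_time

def must_sync(pe, systemc_time):
    """The PE has used up its quantum once lambda_i reaches alpha_i."""
    return local_time_offset(pe, systemc_time) >= local_quantum(pe, systemc_time)

pe = PE(beta=100, local_time=270)
alpha = local_quantum(pe, systemc_time=230)      # 100 - (230 % 100)
lam = local_time_offset(pe, systemc_time=230)    # 270 - 230
```

In this example the PE is 40 ns ahead of the system and still has 30 ns left before it has to synchronize.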
10
Hybridization-Introduced Decoupling
Concept:
Host-compiled execution cannot directly affect the simulated time.
Execution of a virtualized function is performed in zero time from the simulator’s perspective.
Software performance estimation techniques help to obtain timing values Ƭ for the functions executed natively.
This causes a hybrid ISS to be temporally decoupled from the rest of the system.
11
A suspension quantum is created dynamically upon the execution of a virtualized function.
Advantage: Avoids unnecessary kernel synchronizations.
Disadvantage: Interrupts may be lost, which causes the system to behave incorrectly.
Suspension Quantum
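A toy Python model may help to picture how a suspension quantum arises. It is a sketch under assumptions, not the HySim implementation: the class `HybridPE`, its methods and the 500 ns estimate are invented for illustration. The virtualized function runs natively without moving simulated time, and its estimated duration Ƭ is then turned into a suspension quantum that is consumed in a single wait.

```python
class HybridPE:
    """Toy model of a hybrid processor; times are in ns."""

    def __init__(self):
        self.sim_time = 0          # simulated time seen by this PE
        self.suspended_for = 0     # remaining suspension quantum

    def run_virtualized(self, func, estimated_ns, *args):
        """Run func natively; by itself it consumes zero simulated time.

        The estimated duration becomes a suspension quantum created on the fly.
        """
        result = func(*args)                 # native execution on the host
        self.suspended_for = estimated_ns    # suspension quantum
        return result

    def elapse_suspension(self):
        """Consume the whole suspension quantum in one kernel wait."""
        self.sim_time += self.suspended_for
        self.suspended_for = 0

pe = HybridPE()
checksum = pe.run_virtualized(sum, 500, [1, 2, 3])   # 500 ns is a made-up estimate
time_after_native_call = pe.sim_time                 # still 0: zero-time execution
pe.elapse_suspension()
```

Because the whole quantum is consumed in one wait, an interrupt that arrives in the middle of it would go unnoticed in this simple model, which is the disadvantage named above.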
12
Step 1. The hybrid ISS detects an incoming interrupt.
Step 2. The processor is woken up.
Step 3. The PC value is associated with the remaining suspension quantum.
Step 4. A breakpoint-like mechanism is activated on the saved PC in order to restore the remaining suspension time.
Breaking the Suspension Quantum
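The four steps can be sketched as a small Python state machine. This is a hedged illustration: `HybridISS`, `resume_points`, `on_interrupt` and `jump` are hypothetical names, and the dictionary lookup merely stands in for the breakpoint-like mechanism described in Step 4.

```python
class HybridISS:
    """Toy hybrid ISS; times in ns, all names are illustrative."""

    def __init__(self):
        self.pc = 0
        self.suspended = False
        self.remaining = 0            # remaining suspension quantum
        self.resume_points = {}       # saved PC -> remaining suspension time

    def suspend(self, pc, quantum):
        self.pc, self.suspended, self.remaining = pc, True, quantum

    def advance(self, elapsed):
        """Simulated time passes while the processor is suspended."""
        if self.suspended:
            self.remaining = max(0, self.remaining - elapsed)
            if self.remaining == 0:
                self.suspended = False

    def on_interrupt(self, handler_pc):
        """Steps 1-4: detect, wake up, save (PC, remaining), arm breakpoint."""
        if self.suspended and self.remaining > 0:
            self.suspended = False                        # step 2
            self.resume_points[self.pc] = self.remaining  # steps 3 and 4
            self.remaining = 0
        self.pc = handler_pc                              # run the handler

    def jump(self, pc):
        """Breakpoint-like check: returning to a saved PC restores the
        remaining suspension time."""
        self.pc = pc
        if pc in self.resume_points:
            self.suspended = True
            self.remaining = self.resume_points.pop(pc)

iss = HybridISS()
iss.suspend(pc=0x400, quantum=1000)
iss.advance(300)                 # 700 ns of the quantum remain
iss.on_interrupt(handler_pc=0x80)
iss.jump(0x400)                  # handler returns to the interrupted PC
```

After the handler returns, the processor is suspended again for the 700 ns that were still outstanding, so no simulated time is lost or counted twice.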
13
Mix hybridization-introduced decoupling and traditional temporal decoupling in the same PE.
Suspension quanta are used to recompute the decoupling parameters.
Updated local time exceeds the end of the next βi (i.e. Ƭ > α-λ)
。 ’ = i + (Ƭ – (α-λ))
。ti’ = ti +
Updated local time does not exceed the end of the next βi (i.e. Ƭ ≤ α-λ)
。ti’ = ti +
Local time exceeds the next synchronization time
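The case distinction can be checked with a short Python helper. It only covers the comparison against α-λ and the size of the overshoot; it deliberately does not reproduce the parameter updates themselves. The function name `overshoot` and the numbers are assumptions for illustration, with `tau` standing for the estimated time Ƭ of the virtualized function.

```python
def overshoot(tau, alpha, lam):
    """Return how far a suspension of length tau runs past the end of the
    current PE global quantum (0 if it fits).

    alpha - lam is the time left in the quantum as seen from the PE's
    local time, so the suspension crosses the boundary only if
    tau > alpha - lam.
    """
    remaining = alpha - lam
    return max(0, tau - remaining)

crosses = overshoot(tau=50, alpha=70, lam=40)   # 30 ns left, 50 ns needed
fits = overshoot(tau=20, alpha=70, lam=40)      # 30 ns left, 20 ns needed
```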
14
In a full-system simulation, virtualizable functions are not allowed to:
。Perform software synchronization or access shared memory without restriction.
。Memory accesses in AS mode will interact with peripherals and accelerators.
HySim virtualization chain
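One way to read these restrictions is as a filter over candidate functions. The Python sketch below is an assumption-laden illustration, not the actual HySim virtualization chain: `FunctionInfo`, its flags and the function names are hypothetical, and a real flow would have to derive such properties from analysis of the code rather than from hand-written flags.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FunctionInfo:
    name: str
    uses_sw_sync: bool = False          # locks, barriers, semaphores, ...
    shared_memory_access: bool = False  # unrestricted access to shared memory
    touches_devices: bool = False       # peripherals or accelerators

def is_virtualizable(f):
    """A function may run in AS mode only if it breaks none of the rules."""
    return not (f.uses_sw_sync or f.shared_memory_access or f.touches_devices)

candidates = [
    FunctionInfo("des_round"),
    FunctionInfo("spin_lock_acquire", uses_sw_sync=True),
    FunctionInfo("lcd_write_frame", touches_devices=True),
]
virtualizable = [f.name for f in candidates if is_virtualizable(f)]
```

Only the pure computation kernel passes the filter; the other two functions would stay on the target ISS.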
15
Simulator
。Simics
。Tensilica Diamond and Xtensa ISSs
Host machine
。64-bit AMD Phenom Quad-Core
。8GB of memory
。Fedora Core 5
Scenario 1: 3DES on single-core system
Single-core platform
。Tensilica Diamond DC_B_570T
Scenario 2: MJPEG on single-core system
Single-core platform
。Tensilica Diamond DC_B_570T
。Multi-media acceleration
。LCD controller
Test Cases 1
16
Scenario 3: Circular-FFT on multi-core system
Multi-core platform
。Three Xtensa XRC_D2MR cores
Scenario 4: OFDM (Orthogonal Frequency Division Multiplexing) transceiver system
Multi-core platform
。Three Xtensa XRC_D2MR cores
Test Cases 2
17
Presented an approach to synchronize hybrid processor simulators within full-system simulation.
。Defining a specialized temporal decoupling mechanism.
。Identifying functions that must be avoided in native execution in order to ensure correctness of parallel applications.
Future work
。Combination with other advanced simulation techniques in this hybrid simulation.
My comment
。Novel idea to improve simulation speed.
Conclusion