TRANSCRIPT
A. Kulmala, E. Salminen, TUT, Spring 2009
Lecture 8 – Simulation engines
Ari Kulmala, Erno Salminen 2009
TKT-1212 Digitaalijärjestelmien toteutus
Contents
Modeling dimensions
  1. Temporal
  2. Data abstraction
  3. Functional
  4. Structural
Basic simulator types
  Event-driven simulation engines
  Cycle-based simulation engines
Waveform viewers and summary
Introduction
Verification => functional correctness
Testing => manufacturing correctness
Design Under Verification (DUV)
Much of the design time is spent on developing the verification environment and debugging HDL
Simulation-based verification is the most common approach
The heart is the simulation engine
  Models the behavior of the design
  Supports high-level verification languages (e.g. Vera), code coverage tools etc.
Acknowledgements
This presentation is based on the book ”Comprehensive Functional Verification: The Complete Industry Cycle” by Bruce Wile, John Goss, and Wolfgang Roesner
Some examples obtained from Mentor Modelsim manual
Modeling dimensions
“Model is a simplification of reality and every system is best approached through a small set of nearly independent models”
– Booch & Rumbaugh
Modeling dimensions #1: Temporal
1. Temporal dimension
  Behavior over time, i.e. when the state changes
  I/Os represent the state of the DUV
a) Continuous time (analog)
  Fairly close to the physical properties of an electrical circuit
b) Discrete time
  "Digital": electrical properties abstracted, delta delay, events occur seemingly simultaneously
  Clock cycle
  No wiring or gate delays
c) Event-based (instruction-level, transaction-level)
  Waits for certain events; the time between events varies
Modeling dimensions #2: Data
2. Data abstraction: signal values
a) Continuous range (analog)
  E.g. voltage measurement, arbitrarily accurate real numbers
b) Discrete values
  Bits, strings, integers, states ...
    E.g. std_logic: '1', '0', 'U', 'X', 'Z', 'H', 'L' ...
  Abstract values, e.g. user-defined enumeration states
    main, read_io, write_io ...
  Structs combine several abstract values
Modeling dimensions #3: Function
3. Functional dimension
  May just be continuous mathematical functions (e.g. Spice simulator)
  Select the level of abstraction
    Transistors -> switches -> Boolean logic -> algorithms -> abstract mathematical formula
E.g.:
a) Boolean (half) adder:
    z0 = x0 xor y0
    c1 = x0 and y0
b) Algorithm: + (automatic implementation or user-defined)
c) Abstract formula
  The whole functionality is specified with abstract, implementation-independent notation
    x = (2+y)^z mod a
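As an illustration of the three abstraction levels above, the following Python sketch writes an addition first as Boolean equations, then as an algorithm using +, and finally evaluates the abstract formula directly. All function names are invented for this example and do not come from the lecture.

```python
def half_adder(x0, y0):
    """Boolean level: one-bit sum and carry."""
    z0 = x0 ^ y0  # z0 = x0 xor y0
    c1 = x0 & y0  # c1 = x0 and y0
    return z0, c1


def full_adder(x, y, carry_in):
    """Boolean level: two half adders and an OR gate."""
    partial, carry_a = half_adder(x, y)
    total, carry_b = half_adder(partial, carry_in)
    return total, carry_a | carry_b


def ripple_add(x, y, width):
    """Boolean-level adder for `width`-bit operands, built bit by bit."""
    carry = 0
    result = 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result | (carry << width)


def algorithmic_add(x, y):
    """Algorithmic level: the implementation of + is left to the tool."""
    return x + y


def abstract_formula(y, z, a):
    """Abstract level: x = (2+y)^z mod a, with no implementation implied."""
    return pow(2 + y, z, a)
```

The Boolean-level model evaluates many small operations per addition, while the algorithmic model is one operation. That difference is the speed versus detail trade-off discussed in this lecture.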
Modeling dimensions #4: Structure
4. Structural dimension
a) Flat (single black box)
  No structure, "just implementation", e.g. FFT described with an abstract mathematical formula
b) Hierarchical
  Implementation is structural
    Many subblocks
    Many components that have subblocks
  E.g. FFT has subblocks for add and multiply
[Figure: FFT as a single black box (Input -> FFT -> Output) and as a hierarchy of + and * subblocks between Input and Output]
VHDL support for modeling dimensions
Temporal:   Continuous | Gate delay | Clock cycle | Instruction cycle | Events
Data:       Continuous | Multivalue bit | Bit | Abstract value | Struct
Functional: Continuous | Switch level | Boolean logic | Algorithmic | Abstract mathematical
Structural: Single black box | Functional blocks | Detailed component hierarchy
[Figure: the chart marks the range of each dimension supported by VHDL and, for comparison, Verilog's support]
Modeling compromise: Speed vs. Accuracy
[Figure: RTL style, gate-level style, and gate-level with detailed delays placed on two axes: simulation runtime and memory requirements vs. model details and accuracy]
More speed -> more abstract models -> more test cases within the same time, but less accuracy.
Designers should start with high-level models
  Basic verification, "something like this could work"
  Gradually refine the models if needed
Simulation engines
Simulation engine types
Evaluate the HDL model over time and present its state
  Standardization: the HDL language reference manual (LRM) defines the behavior of the simulation engine.
1. Evaluate signals and blocks only at model times for which events are scheduled
  Event-driven simulation engines
  Majority of simulators are in this category, e.g. ModelSim
  Evaluate only the "active parts"
2. Evaluate the model at every point of time along the finest granularity known to the simulation engine
  Cycle-based simulation engines
  Simpler simulator and hence faster
  Commonly evaluate the whole model regardless of activity
Basic HDL simulator block diagram
[Figure: block diagram. Elements: simulation engine; HDL model, consisting of the HDL model of the DUV and the HDL testbench (stimulus, check); testbench program (stimulus, check); trace files; coverage files; interactive user control GUI; interactive testbench debug GUI; interactive waveform viewer GUI; interactive coverage analysis GUI.]
Event-Driven Simulation Engines
Event-driven simulation
Most popular approach
  Used also in other areas since it is a very general approach
  Channels or signals transfer data between blocks
  Blocks process the data at their inputs, which may initiate a new transfer
  Engine activates only those blocks whose inputs change
Algorithm to evaluate time:
  "Evaluate signals and blocks only at model times for which events are scheduled"
  The model objects need to notify the simulation engine about future changes (scheduling)
  The engine may skip unused time intervals
  Simulator keeps track of the "current time" and of the times at which events are scheduled
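The time-evaluation algorithm can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular simulator is built: the function names, the `react` callback and the inverter example are assumptions made for this sketch. The point to notice is that the engine jumps from one scheduled time to the next and never visits the idle time in between.

```python
import heapq


def run_event_driven(initial_events, react, end_time):
    """Process (time, signal, value) entries in time order.

    `react(now, signal, value)` is a hypothetical model callback that
    returns a list of (delay, signal, value) entries to schedule.
    """
    queue = list(initial_events)
    heapq.heapify(queue)
    values = {}          # current value of every signal seen so far
    visited_times = []   # model times the engine actually evaluated
    while queue and queue[0][0] <= end_time:
        now, signal, value = heapq.heappop(queue)
        if not visited_times or visited_times[-1] != now:
            visited_times.append(now)
        if values.get(signal) == value:
            continue  # same value again: nothing changes, nothing to evaluate
        values[signal] = value
        for delay, target, new_value in react(now, signal, value):
            heapq.heappush(queue, (now + delay, target, new_value))
    return values, visited_times


def inverter(now, signal, value):
    """Example block: b follows the inverse of a after 5 time units."""
    if signal == "a":
        return [(5, "b", 1 - value)]
    return []


values, times = run_event_driven([(0, "a", 0), (100, "a", 1)], inverter, 1000)
print(values, times)
```

In this run the engine evaluates the model at four time points only, although the simulated interval is 1000 time units long.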
Event-driven simulation (2)
Scheduling is done in internal time intervals if no delay is specified
  Zero-delay scheduling (delta delay), most usual in RTL
  Evaluating one scheduling step creates updates that take effect in the next step
  Parallel updates are handled sequentially by the engine, in effectively random order
Signal changes propagate through the model as the scheduling progresses
Feedback loops may cause endless oscillation
  User or the engine must take action to interrupt uncontrolled oscillation
  Usually a sign of bad HDL design, e.g. a combinatorial loop
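Zero-delay scheduling and the oscillation problem can be illustrated with a small Python sketch. It assumes a simplified model in which each process is a pair of a sensitivity set and a function returning new signal values. The names `settle`, `chain` and `loop` and the limit value are invented for the example.

```python
def settle(values, processes, changed, limit=1000):
    """Run delta steps at one time point until no signal changes.

    processes: list of (sensitivity_set, evaluate), where evaluate(values)
    returns a dict of scheduled signal updates.
    Returns the number of delta steps taken.
    """
    deltas = 0
    while changed:
        if deltas >= limit:
            raise RuntimeError("delta limit reached: possible zero-delay loop")
        pending = {}
        # Evaluate every process that is sensitive to a changed signal.
        for sensitivity, evaluate in processes:
            if sensitivity & changed:
                pending.update(evaluate(values))
        # Apply the updates; they become visible in the next delta step.
        changed = {name for name, new in pending.items() if values[name] != new}
        for name in changed:
            values[name] = pending[name]
        deltas += 1
    return deltas


# Two inverters in a chain: b <= not a; c <= not b;
chain = [({"a"}, lambda v: {"b": 1 - v["a"]}),
         ({"b"}, lambda v: {"c": 1 - v["b"]})]

# Combinatorial loop: x <= not x;
loop = [({"x"}, lambda v: {"x": 1 - v["x"]})]
```

Calling `settle({"a": 1, "b": 1, "c": 0}, chain, {"a"})` lets the change in a ripple through b and c within the same model time, whereas `settle({"x": 0}, loop, {"x"})` never stabilizes and ends with the RuntimeError.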
Simulator example
Simulating a top level having two blocks, which contain 3 sub-blocks: B1, B2, C1.
[Figure: model network. Inputs i1 and i2, internal signals a1 to a7 and s1 to s9, outputs o1 and o2.]
Signal directions imply a partial order for the evaluation.
Sim. example: evaluation over time
A change occurs in i1
Simulator starts to evaluate the model over time by steps (delta delay)
[Figure: the model network with blocks B1, B2 and C1, and below it the simulator engine's scheduling steps, alternating signal updates and block evaluations, from i1 (i1, a1, s1, B1, ...) to o2.]
After reaching o2, the simulator goes back to one of the uncompleted branches (a3 or s4). Their mutual order is not specified.
Block B2 might be evaluated twice (due to s4 and s5)
Another example
[Figure: model network with signals s7 to s13; evaluation steps numbered 1 to 5.]
Another example: Update sequence
[Figure: update sequence of the example network]
Example of how network view is constructed from VHDL (slide 22)
[Figure: VHDL source and the resulting network view. Legend: user-defined signal name; simulator's internal signal.]
Signal assignment
1. Concurrent: in an architecture body
  Signal activity, not top-down flow, controls the execution order
2. Sequential: inside a process, top-down ordering
The value on the right side will be scheduled for the left side
  Value is placed on the driver of the left-hand side signal
Multiple concurrent assignments produce multiple drivers
  That is legal if the signal type defines a resolution function, which resolves a single value from multiple drivers
A sequential body (i.e. a process) may have multiple assignments to a signal, but they produce only a single driver
  Note that this includes both sequential (synchronous) and combinatorial (asynchronous) processes
  The last assignment in the HDL code is "kept"
  The last assigned value (in time) is kept unless otherwise stated -> may require state-holding logic in synthesis (DFF or latch)
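The difference between multiple drivers and multiple assignments inside one process can be sketched in Python. The sketch assumes a reduced value set of '0', '1', 'Z' (high impedance) and 'X' (conflict) instead of the full std_logic type, and the function names are invented for the example.

```python
from functools import reduce


def resolve_pair(a, b):
    """Resolve two driver values over the reduced set '0', '1', 'Z', 'X'."""
    if a == "Z":
        return b          # a high-impedance driver does not contribute
    if b == "Z":
        return a
    if a == b and a in "01":
        return a          # both drivers agree on a strong value
    return "X"            # conflicting or unknown drivers


def resolved(drivers):
    """Resolution function: one signal value from all driver values."""
    return reduce(resolve_pair, drivers, "Z")


def process_driver(assignments):
    """A process has a single driver: the last assignment is kept."""
    return assignments[-1]


print(resolved(["0", "Z"]))                    # one active driver
print(resolved(["0", "1"]))                    # two drivers in conflict
print(resolved([process_driver(["0", "1"])]))  # one process, two assignments
```

Two concurrent assignments of '0' and '1' give a conflict, while the same two assignments inside one process simply leave the later value on the single driver.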
Events and transactions
Signal assignment may have an associated delay
An event occurs when a signal changes its value
When a value is scheduled to be assigned to a target signal after a given time, a transaction has been placed on the driver
  A transaction may assign the same value again, in which case no event occurs
  A transaction is represented with a value-time tuple (new_value, time)
Events and transactions (2)
    target_signal <= v after d;
[Figure: timeline with points t, t+t0, t+t1 and t+d]
At time t, the new value v is computed from the right-hand side
  The assignment also specifies the delay d
  Transaction tr_i = (v, d) is placed on the driver
At time t+t0, the time component of the transaction tr_i has decreased to d-t0
At time t+t1, the time component decreases further
At time t+d, the time component becomes 0 and the transaction expires
  Target signal gets the value v
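The distinction between a transaction and an event can be made concrete with a short Python sketch. It is a simplification under stated assumptions: the hypothetical `Driver` class stores the absolute expiry time t+d instead of a decreasing time component, and it ignores the VHDL rules that remove pending transactions when new ones are scheduled.

```python
class Driver:
    """Driver of one signal with a list of pending transactions."""

    def __init__(self, initial):
        self.value = initial
        self.transactions = []  # pending (expiry_time, new_value) pairs

    def assign(self, now, value, delay):
        """Model of: target_signal <= value after delay;"""
        self.transactions.append((now + delay, value))
        self.transactions.sort()

    def advance(self, now):
        """Expire transactions due by `now`.

        Returns (time, value, is_event) for each expired transaction.
        """
        log = []
        while self.transactions and self.transactions[0][0] <= now:
            time, value = self.transactions.pop(0)
            is_event = value != self.value  # event only if the value changes
            self.value = value
            log.append((time, value, is_event))
        return log


driver = Driver(0)
driver.assign(0, 1, 10)  # expires at 10, changes 0 -> 1: event
driver.assign(0, 1, 20)  # expires at 20, value stays 1: no event
driver.assign(0, 0, 30)  # expires at 30, changes 1 -> 0: event
print(driver.advance(40))
```

The second transaction expires like the others, but since it assigns the value the signal already has, it produces no event and would not wake up any process.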
Transaction example
[Figure: waveforms of signals a, b and c. One transaction expires without causing an event; another causes an event.]
Events and sensitivity list
Simulation engine considers events of only those signals that are included in the sensitivity list
Example 1: combinatorial process
    comb_foo: process (a_in, b_in) ...
  No need to simulate this when, e.g., clk changes
  However, it could be simulated
    but that would not change any value (the needed inputs are stable) -> waste of simulation time
[Figure: process comb_foo with inputs a_in and b_in and its assigned signals as outputs; clk, rst_n and others are not needed in the sensitivity list.]
(Do not read the assigned signals or put them into the sensitivity list! That would create a combinatorial loop, i.e. a random oscillator and an infinite loop in the simulator.)
Events and sensitivity list (2)
Example 2: synchronous process
    sync_bar: process (rst, clk)
    begin
      if (rst = '0') then
        s0 <= '0';
      elsif (clk'event and clk = '1') then
        ... <statements>
      end if;
    end process sync_bar;
Process will be simulated on every clock edge
  But the statements inside the elsif branch are executed only at the rising clock edge
One could include all signals that are read in the statements
  but their events do not occur at the same time as clk'event -> waste of time
[Figure: sync_bar (in simulator) with inputs clk and rst_n and its assigned signals as outputs; a_in, b_in and others are not needed in the sensitivity list.]
(The assigned signals change just after the clk edge, and they can be read inside sync_bar.)
Events and sensitivity list (3)
Synthesis tool does not care about the sensitivity list!
  All necessary signals (those that are read) will go into the logic cloud!
All signals assigned inside an if-branch with x'event and x='1' create a register whose clk input is connected to signal x
Detecting an edge on an arbitrary control signal must be coded explicitly
  Compare values from two consecutive clk cycles: if (a_old_r /= a_in)
  Nested 'events won't produce any meaningful logic
[Figure: sync_bar (synthesized HW). a_in, b_in and others feed comb logic in front of the registers; clk and rst_n go to the registers; the outputs are the assigned signals.]
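The explicit edge detection mentioned above, comparing the values from two consecutive clk cycles, behaves as in the following Python sketch. Each list element stands for the value of a_in sampled at one rising clock edge; the function name and the reset value are assumptions made for the example.

```python
def detect_changes(samples, reset_value=0):
    """Return one flag per clock cycle: True when a_in differs from
    the value registered on the previous cycle (a_old_r /= a_in)."""
    a_old_r = reset_value
    flags = []
    for a_in in samples:
        flags.append(a_old_r != a_in)
        a_old_r = a_in  # the register samples a_in at this clock edge
    return flags


print(detect_changes([0, 1, 1, 0, 0]))
```

Only one register and one comparator are needed, and everything stays synchronous to clk, which is why this style synthesizes cleanly.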
Delays in VHDL
1. Real delays: inertial, reject-inertial, transport
  See last lecture
  Model the gate and wire delays
2. Delta delays
  Simulator's concept to deal with seemingly concurrent events
  Multiple signals may need updating, statements that are sensitive to these signals must be executed, and any new events that result from these statements must then be queued and executed as well
  The steps taken to evaluate the design without advancing simulation time are referred to as "delta times" or just "deltas."
  An infinitesimal interval
  Waveform shows the same global time no matter how many delta delays elapse
  This mechanism may cause unexpected results
Delta delay example
RS latch
In the waveform viewer, all transitions occur at the same time
    ENTITY rsl IS
      PORT (s, r : IN BIT; q, qn : OUT BIT);
    END rsl;

    ARCHITECTURE gate OF rsl IS
      SIGNAL q_temp, qn_temp : BIT;
    BEGIN
      q       <= q_temp;
      qn      <= qn_temp;
      q_temp  <= s NAND qn_temp;  -- Executed once in t
      qn_temp <= r NAND q_temp;   -- Executed twice in t
    END gate;
[Figure: waveforms of q_temp and qn_temp]
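How the cross-coupled NAND gates settle over several delta steps at one model time can be traced with a small Python sketch. It is a simplified model: both gates are re-evaluated in every step from the values of the previous step, and the function names, the chosen input scenario and the step limit are assumptions made for the example.

```python
def nand(a, b):
    return 1 - (a & b)


def rs_latch_deltas(s, r, q_temp, qn_temp, limit=100):
    """Return (q_temp, qn_temp) after each delta step until stable."""
    trace = []
    for _ in range(limit):
        new_q = nand(s, qn_temp)   # q_temp  <= s NAND qn_temp;
        new_qn = nand(r, q_temp)   # qn_temp <= r NAND q_temp;
        if (new_q, new_qn) == (q_temp, qn_temp):
            return trace           # no change: the model time may advance
        q_temp, qn_temp = new_q, new_qn
        trace.append((q_temp, qn_temp))
    raise RuntimeError("latch did not settle within the delta limit")


# Latch holds q=0, qn=1, then s is pulled low (active-low set).
print(rs_latch_deltas(s=0, r=1, q_temp=0, qn_temp=1))
```

In this scenario q_temp changes in the first delta step and qn_temp in the second, yet a waveform viewer would draw both transitions at the same time. Starting from q_temp = qn_temp with s = r = 1, this model never settles and raises the error, which is the oscillation risk of zero-delay feedback.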
Delta delay
The execution order of components with zero delay is unclear, e.g. two processes
  Simulator assumes some order
  This is a serious problem if a signal value is momentarily out of the range of its type
Event-driven simulation
Essential properties:
  1. Evaluate model behavior only at those times when model events are scheduled
  2. Evaluate behavior only for the blocks or signals for which events are scheduled
VHDL and Verilog definitions include the assumption of an underlying event-driven simulator
  Cyclic process
    1. Update signals
    2. Execute processes (concurrent statements are actually also processes)
    3. Advance global time
Event-driven simulation
The three core data structures of the event-driven simulation engine
  1. A list of all executable blocks present in the model network
  2. A data structure that shows the connections between blocks via signals
  3. A value table that holds all current signal values
Activity and time progress are controlled by the time wheel
  At time zero, all executable blocks are scheduled
    In VHDL, all processes and concurrent assignments
  Each time wheel entry has a to-do list
    Assignments scheduled to happen at that point
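The three data structures and the time wheel can be put together in a compact Python sketch. Everything here is an assumption made for illustration: the dictionary layouts, the name `run_time_wheel` and the two-inverter model. A real engine would also separate delta steps, which this sketch leaves out.

```python
from collections import defaultdict


def run_time_wheel(blocks, fanout, values, stimuli, end_time):
    """blocks: name -> (function, output signal, delay)    (structure 1)
    fanout: signal -> names of blocks reading it            (structure 2)
    values: signal -> current value                         (structure 3)
    """
    wheel = defaultdict(list)  # time -> to-do list of (signal, value)
    for time, signal, value in stimuli:
        wheel[time].append((signal, value))
    # At time zero, all executable blocks are scheduled once.
    for function, output, delay in blocks.values():
        wheel[delay].append((output, function(values)))
    history = []
    while wheel:
        now = min(wheel)       # advance to the next non-empty entry
        if now > end_time:
            break
        for signal, value in wheel.pop(now):
            if values[signal] == value:
                continue       # transaction without an event
            values[signal] = value
            history.append((now, signal, value))
            for name in fanout[signal]:
                function, output, delay = blocks[name]
                wheel[now + delay].append((output, function(values)))
    return history


blocks = {"inv1": (lambda v: 1 - v["a"], "b", 2),
          "inv2": (lambda v: 1 - v["b"], "c", 3)}
fanout = {"a": ["inv1"], "b": ["inv2"], "c": []}
history = run_time_wheel(blocks, fanout, {"a": 0, "b": 0, "c": 0},
                         [(10, "a", 1)], 100)
print(history)
```

The history shows c briefly taking a value at time 3 that is corrected at time 5, because the initial evaluation of inv2 used the not yet updated value of b. Transients of this kind are a normal consequence of scheduling all blocks at time zero.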
ModelSim general flow
Source: Modelsim manual
Simulator example
Simulator checks all projected signal traces
Global time is advanced to the next transaction
  1 ns, 10 ns, 15 ns, 20 ns, 35 ns ...
S3 is 10 between 15 ns and 20 ns
  After that value is function of 1 ns 10
[Figure: projected signal values. Source: http://www.ida.liu.se/~petel/SysSyn/lect2.frm.pdf]
Example of delta delay problem
    clk2 <= clk;

    seq0: process (rst, clk)
    begin
      if (rst = '0') then
        s0 <= '0';
      elsif (clk'event and clk = '1') then
        s0 <= inp;
      end if;
    end process;

    seq1: process (rst, clk2)
    begin
      if (rst = '0') then
        s1 <= '0';
      elsif (clk2'event and clk2 = '1') then
        s1 <= s0;
      end if;
    end process;
[Figure: desired HW, this is what you would expect. Two registers in series: inp -> seq0 -> s0 -> seq1 -> s1, clocked by clk and clk2 respectively, with reset rst_n.]
Delta delay problem in simulator
(Same code as in the previous example.)
In one simulation round:
  Before the clock edge: inp = 1, clk = 0 (s0 = 0)
  At the clock edge: inp = 1, clk = 1
  Event queue [t0]: clk2 <= clk and seq0 (clk). These change first,
  then the signal updates, signal update queue [t0]: clk2 and s0 = 1,
  which create a new event: seq1 (clk2),
  and last the signal update: s1 = 1. Wrong value!
Behavior of example code
In this example you have two synchronous processes,
  1. one triggered with clk
  2. one triggered with clk2
To your surprise, the signals change in the clk2 process on the same edge as they are set in the clk process!
  As a result, the value of inp appears at s0 and s1 in the same simulation cycle
During simulation
  1. An event on clk occurs (from the testbench).
  2. From this event ModelSim performs the "clk2 <= clk" assignment and the process which is sensitive to clk
  3. Before advancing the simulation time, ModelSim finds that the process sensitive to clk2 can also be run.
In order to get the expected results, you must do one of the following:
  a) Make certain to use the same clock on both processes or use just one process
  b) Insert a delay at every output
  c) Insert a delta delay
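The race can be reproduced with a small delta-step model in Python. This is a sketch of the mechanism only, not of ModelSim's implementation: the function name, the dictionary-based processes and the fixed initial values (inp = 1, s0 = s1 = 0, reset inactive) are assumptions made for the example.

```python
def rising_edge_result(second_clock):
    """Simulate one rising edge of clk with delta steps.

    `second_clock` names the signal that clocks seq1: "clk2" as in the
    example code, or "clk" as in fix a). Returns (s0, s1, delta steps).
    """
    values = {"clk": 0, "clk2": 0, "inp": 1, "s0": 0, "s1": 0}

    def buffer_clk(v):  # clk2 <= clk;
        return {"clk2": v["clk"]}

    def seq0(v):        # s0 <= inp; on the rising edge of clk
        return {"s0": v["inp"]} if v["clk"] == 1 else {}

    def seq1(v):        # s1 <= s0; on the rising edge of second_clock
        return {"s1": v["s0"]} if v[second_clock] == 1 else {}

    processes = [({"clk"}, buffer_clk), ({"clk"}, seq0), ({second_clock}, seq1)]

    values["clk"] = 1   # the testbench creates the event on clk
    changed = {"clk"}
    deltas = 0
    while changed:
        pending = {}
        for sensitivity, process in processes:
            if sensitivity & changed:
                pending.update(process(values))
        changed = {n for n, new in pending.items() if values[n] != new}
        values.update({n: pending[n] for n in changed})
        deltas += 1
    return values["s0"], values["s1"], deltas


print(rising_edge_result("clk2"))  # seq1 runs one delta later and sees the new s0
print(rising_edge_result("clk"))   # seq1 runs in the same delta and sees the old s0
```

With clk2, seq1 is triggered one delta step after s0 has already been updated, so s1 receives the value of inp in the same simulation cycle. With a shared clock, both processes read their inputs in the same delta step and s1 receives the old value of s0, as the desired hardware would.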
Event-driven simulation performance
Widely used, so optimization is well understood
Performance-critical portions:
  Management of the to-do lists
  The time wheel
  The data that represents the model topology
When an event is evaluated
  traverse the model topology to find the signals and blocks to update
  find the corresponding slots in the time wheel
  put the corresponding events on the to-do lists
Model granularity compromise
  Activity rate affects a fine-grained model more than a coarse-grained model
Granularity vs performance
Simulation performance
Simulation throughput
  Per time spent:
    Amount of verification, i.e. number of tests
    Number of cycles
    Number of distinct states visited and checked
Improve throughput:
  Increase simulation engine performance
    Or increase the simulated model performance
  Run simulations in parallel
  Eliminate redundant simulations
Hard to measure
  The target of the simulation engine: all-around, gate-level, RT-level?
  Profiling: which parts of the model are most time-consuming
Improving performance
Efficiency: time_spent_on_HDL / time_spent_on_scheduling
  Fewer events, more efficient
Speed can be optimized by more abstract HDL:
  No gate-level structures
  Integers instead of bit vectors
  Standard libraries
  Binary values over multivalued
  No delay statements
  Processes instead of concurrent assignments
    A process is a pre-scheduled atomic action for the simulation engine
Example VHDLs
[Figure: example VHDL descriptions ordered from fastest to slowest to simulate]
Event-driven simulations: the future
Significant research on parallel algorithms for simulation engines over the years
  No breakthrough
  Seems to be inherently hard to parallelize
    No commercially available parallel event-driven simulation engine
Two alternatives to parallelize
  1. Trivial parallelization: simulation farm
    Pool of workstations, each running an independent simulation
  2. Running a single model partitioned and parallelized across several workstations
Cycle-Based Simulation Engines (CBSE)
Cycle-based simulation engines
Algorithm to evaluate time:
  Evaluate the model at every point of time along the finest granularity known to the simulation engine
    E.g. once per clock cycle
Based on much simpler algorithms than event-driven simulation
Superior performance
  10x to 20x speed, models 3-10 times smaller
  Optimized totally for the synchronous hardware design style
Downsides
  Severe constraints on HDL design style
    No delays
    Limited sequential structures
  Testbench features of the HDLs largely not supported
    Testbenches with other languages, APIs
Cycle-based simulation engines (2)
Due to the constraints, not commercially accepted (came to market in the mid-90s)
  However, some features have since been integrated into the event-driven simulators
    Applicable portions of the code are automatically handled in cycle-based fashion, others in event-driven fashion
    Hybrid simulators
Synchronous design
  Timing verification and functional verification can be separated
Modeling properties
  Zero-delay simulation
  No combinational feedback loops
    The model is a directed acyclic graph (DAG)
  No dynamic block scheduling, evaluation times are known
Example model network
Cycle-based simulation engines
CBSEs typically use an oblivious simulation algorithm
  Calculates all the combinational functions at every cycle
  Redundant work: evaluates also the parts that do not change
    -> simplicity (time wheels and to-do lists removed)
No multivalue bits
Synchronous: does not care about glitches and hazards
Simulation model actually becomes a piece of executable code (a program)
  Each output is a mathematical function of the inputs -> typical arithmetic optimizations can be used at compile time (synthesis-like optimizations)
    e.g. constant propagation, redundant logic removal
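An oblivious cycle-based evaluation can be sketched in Python as follows. The sketch assumes that the model is given as a dictionary of gates and that the evaluation order is fixed once by a topological sort of the DAG; the function names and the 2-bit adder netlist are illustrative assumptions, not the lecture's own model.

```python
from graphlib import CycleError, TopologicalSorter
from operator import and_, or_, xor


def compile_model(gates):
    """gates: output name -> (function, list of input names).

    Returns a fixed evaluation order in which every gate comes after the
    gates it depends on. Raises CycleError on a combinational loop.
    """
    dependencies = {out: {i for i in ins if i in gates}
                    for out, (_, ins) in gates.items()}
    return list(TopologicalSorter(dependencies).static_order())


def simulate_cycle(gates, order, inputs):
    """Evaluate every gate once, regardless of activity (oblivious)."""
    values = dict(inputs)
    for out in order:
        function, ins = gates[out]
        values[out] = function(*(values[name] for name in ins))
    return values


# 2-bit adder: (x1 x0) + (y1 y0) = (c2 z1 z0), binary values only, no delays
adder = {
    "z0": (xor, ["x0", "y0"]),
    "c1": (and_, ["x0", "y0"]),
    "t": (xor, ["x1", "y1"]),
    "g": (and_, ["x1", "y1"]),
    "z1": (xor, ["t", "c1"]),
    "p": (and_, ["t", "c1"]),
    "c2": (or_, ["g", "p"]),
}
order = compile_model(adder)
print(simulate_cycle(adder, order, {"x0": 1, "x1": 1, "y0": 1, "y1": 0}))
```

There is no time wheel and no to-do list: the order is computed once, and each simulated cycle is a straight pass over all gates. A netlist with a combinational feedback loop cannot be ordered, so `compile_model` rejects it.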
CB model of 2b-adder
Compare with the one shown on slide #22 (fig 5.25). Note the absence of delays
CBSE extensions
In order to be more usable, CBSEs have been extended at the expense of simulation efficiency
Multivalued bits
  E.g. buses (three-state), bus driving errors
  In VHDL, std_logic has 9 different states -> lots more computation than with Boolean logic
  In Verilog, there are 4 states for a bit
    Performance degrades 3x-4x
    Should be a selectable feature
Multiple clock domains
  Overclock the slower domain
  However, simulation alone can never guarantee that clock domain crossings work!
Hybrid simulators
  Events inside CBSE vs. CBSE inside an event-driven simulation engine
Multiclocking example
Slower clock domain clocked with the rate of the faster. (overhead)
Waveform Viewers
Waveform viewers
Every simulator can produce a trace file
  At minimum, it contains symbol names and signal values
  Unfortunately, EDA vendors all have their own file formats
  The usual difference is the compression, because large simulations produce a very large amount of data
Waveform viewers have very similar-looking GUIs
  GTKWave is a free viewer that supports many different formats
Search capabilities are required for usability
  Certain transitions on a signal
  Specific values
Summary
1. Event-driven simulation engine:
  Most common
  Supports arbitrary delays
  Supports a large set of HDL features
2. Cycle-based simulation engine
  Mostly used to boost simulation within an event-driven simulation engine
  Does not support delays (fixed time steps only)
  Severely restricts usable HDL features
  Significantly faster than an event-driven simulation engine
Designer can affect simulation speed by design choices
  More abstract code -> more speed -> less accuracy
    Balance and compromise!
Extra
Other sources
VHDL: Analysis and Modeling of Digital Systems
  Author: Zainalabedin Navabi
  Publisher: McGraw-Hill Professional, 1998
  ISBN 0070464790, 9780070464797
  632 pages
  http://books.google.fi/books?id=Z_EjcfIQqGgC
http://www.ece.msstate.edu/~reese/EE8993/lectures/delay/delay.pdf
http://www.imit.kth.se/courses/2B1512/F1.pdf
http://www.ida.liu.se/~petel/SysSyn/lect2.frm.pdf
http://www.cs.lth.se/EDA380/Lectures/Lecture3.pdf
Time wheel and data structures
Typical ED simulator flow
[Figure: flowchart of a typical event-driven simulator with two yes/no decision points]
Debugging delta delay problems
The best way to debug delta delay problems is to observe your signals in the List window. There you can see how values change at each delta time.
  View -> List
  Select signal in Object window -> RightMouseButton + Add to List
Delta delays in List window
Detecting Infinite Zero-Delay Loops
If a large number of deltas occur without advancing time, it is usually a symptom of an infinite zero-delay loop in the design.
In order to detect the presence of these loops, ModelSim defines a limit, the "iteration limit", on the number of successive deltas that can occur. When ModelSim reaches the iteration limit, it issues a warning message.
The iteration limit default value is 5000.
If you receive an iteration limit warning, first increase the iteration limit and try to continue simulation. You can set the iteration limit from the Simulate > Runtime Options menu or by modifying the IterationLimit variable in the modelsim.ini. See Control Variables Located in INI Files for more information on modifying the modelsim.ini file.
If the problem persists, look for zero-delay loops. Run the simulation and look at the source code when the error occurs.
Use the step button to step through the code and see which signals or variables are continuously oscillating. Two common causes are a loop that has no exit, or a series of gates with zero delay where the outputs are connected back to the inputs.
Source: ModelSim SE User's Manual, v6.2a, June 2006