1
FAST
Frequency-Aware Static Timing Analysis
By
Kiran Seth, Aravindh Anantaraman,
Frank Mueller and Eric Rotenberg
Center for Embedded Systems Research, Departments of CS & ECE
North Carolina State University
2
Real-Time Systems
Tasks have deadlines: they must terminate on time
Classification:
- Hard real-time: a missed deadline is a catastrophe
- Soft real-time: a missed deadline lowers QoS
Multi-tasking real-time systems require scheduling algorithms:
- The scheduler arbitrates between tasks online
- A static schedulability test ensures deadlines are met; it requires a known Worst-Case Execution Time (WCET)
3
Static Timing Analysis
To schedule tasks in real-time systems, we need:
- the Worst-Case Execution Time (WCET) and
- the Worst-Case Execution Cycles (WCEC)
Experimentally measured WCETs are unsafe bounds, due to input and hardware complexity
Use a static timing analysis toolset to obtain safe WCET bounds
4
Static Instruction Cache Analysis
Work described in [Mueller RTS-J’00]
Interprocedural data-flow analysis predicts each cache reference as one of:
- always-hit
- always-miss
- first-hit
- first-miss
Each instruction is categorized for each loop level and for each function (a function is treated as a loop with 1 iteration)
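As an illustration of how these categories feed a timing bound, the sketch below counts the worst-case misses of a single reference at one loop level with a known iteration bound. It is a minimal sketch assuming the usual reading of the categories (first-miss: a miss on the first iteration and hits afterwards; first-hit: a hit on the first iteration and misses afterwards); `Category` and `worst_case_misses` are hypothetical names, not part of the toolset.

```python
from enum import Enum


class Category(Enum):
    ALWAYS_HIT = "always-hit"
    ALWAYS_MISS = "always-miss"
    FIRST_HIT = "first-hit"    # assumed: hit on the first iteration, misses afterwards
    FIRST_MISS = "first-miss"  # assumed: miss on the first iteration, hits afterwards


def worst_case_misses(category: Category, iterations: int) -> int:
    """Worst-case misses for one reference executed `iterations` times
    at a given loop level (a function counts as a loop with 1 iteration)."""
    if iterations <= 0:
        return 0
    if category is Category.ALWAYS_HIT:
        return 0
    if category is Category.ALWAYS_MISS:
        return iterations
    if category is Category.FIRST_MISS:
        return 1
    return iterations - 1  # FIRST_HIT


for cat in Category:
    print(cat.value, worst_case_misses(cat, 10))
```

Counts like these are what the number of worst-case memory accesses, m, summarizes in the parametric frequency model later in the talk.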
5
Static Data Cache Simulation
Accurate static timing analysis also needs data cache analysis
Currently, the data cache analysis tool is not accurate enough:
- too many restrictions, not general enough for real code
- improvements by [Vera RTSS’03]
Simple solutions:
- Treat all data accesses as hits: WCET highly underestimated
- Treat all data accesses as misses: WCET highly overestimated
Assume a cache big enough to fit the entire data set
Treat first-time accesses as misses (cold misses only), all others as hits
- Accurate? Yes. But what if caches are smaller?
- No significant impact on this study
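The cold-miss-only assumption can be illustrated on an address trace. The sketch below is a hypothetical helper, not the static analysis itself (which works without a trace), and the 32-byte line size is an assumed parameter. Every first access to a cache line counts as a miss and every later access as a hit, which matches a cache that holds the whole data set without evictions.

```python
from typing import Iterable, Tuple


def cold_miss_counts(addresses: Iterable[int], line_size: int = 32) -> Tuple[int, int]:
    """Return (misses, hits) for a cache assumed large enough to hold the
    whole data set: only the first access to each cache line misses."""
    touched_lines = set()
    misses = hits = 0
    for addr in addresses:
        line = addr // line_size  # index of the cache line holding addr
        if line in touched_lines:
            hits += 1
        else:
            touched_lines.add(line)
            misses += 1  # cold (compulsory) miss
    return misses, hits


print(cold_miss_counts([0, 4, 8, 32, 0, 64, 36]))  # -> (3, 4)
```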
6
Static Timing Analyzer
Path & tree-based approach [Healy IEEE TC’99]
Find nodes in the CFG and derive WCEC for each node
A node is a function or loop
WCET is calculated bottom-up
Standard timing analysis assumptions apply:
- No recursion
- All loop bounds must be known
- No function pointers
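A minimal sketch of the bottom-up idea, assuming a simplified tree in which each function or loop node lists its possible paths, and each path mixes plain cycle counts with nested nodes. The names `Node` and `wcec` are hypothetical. Unlike the path & tree-based approach of [Healy IEEE TC’99], this sketch charges every iteration the same worst path and ignores cache effects that differ between the first and later iterations.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Node:
    """A function (iterations=1) or a loop with a known bound."""
    name: str
    iterations: int
    # Each path is a sequence of items: plain cycle counts or nested nodes.
    paths: List[List[Union[int, "Node"]]] = field(default_factory=list)


def wcec(node: Node) -> int:
    """Bottom-up WCEC: nested nodes are evaluated first and substituted into
    the enclosing path; the worst path is multiplied by the loop bound."""
    def path_cycles(path: List[Union[int, Node]]) -> int:
        return sum(wcec(item) if isinstance(item, Node) else item for item in path)

    worst = max((path_cycles(p) for p in node.paths), default=0)
    return node.iterations * worst


inner = Node("inner", iterations=10, paths=[[5], [8]])  # two paths through the body
outer = Node("outer", iterations=4, paths=[[3, inner, 2]])
main = Node("main", iterations=1, paths=[[10, outer]])
print(wcec(inner), wcec(outer), wcec(main))  # -> 80 340 350
```

The recursion here walks the loop/function tree, which is finite only because the analyzed program itself has no recursion and all loop bounds are known.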
7
Motivation of FAST
Dynamic Voltage Scaling (DVS) scheduling schemes:
- change the system's frequency/voltage to save power without missing deadlines
- several DVS scheduling schemes are available
- a good fit for real-time systems, since most real-time systems
  - have low utilization
  - are low-power embedded systems
Potential for considerable energy savings with DVS
8
Problem
Current DVS schemes:
- ignore the effects of frequency scaling on WCEC
- assume WCEC is constant across frequencies, and therefore overestimate WCET at lower frequencies
To demonstrate the problem:
- compute the WCET of C-Lab benchmarks with the static timing analysis tool
- for frequencies from 100 MHz to 1 GHz
- compare the observed WCEC and WCET against the assumption made by DVS schemes
9
Actual vs. Assumed WCEC for FFT
[Figure: Number of cycles vs. Frequency (MHz); series: Actual WCEC, Assumed WCEC]
WCEC changes with frequency scaling:
- WCEC increases at higher frequency
- constant memory latency: 100 ns
10
Actual vs. Assumed WCET for FFT
[Figure: Time (ms) vs. Frequency (MHz); series: Actual WCET, Assumed WCET]
Difference in the frequency chosen by DVS for a WCET of 5 ms:
- assumed: ~550 MHz
- actual: ~150 MHz
11
Parametric Frequency Model
Problem:
DVS:
- considers processor frequency scaling
- ignores the effect of frequency scaling on memory accesses
With frequency scaling:
- cycles for processor operations remain constant
- except for memory operations, which is the problem
DVS schemes overestimate the WCET at lower frequencies:
- they cannot fully utilize the available slack
- power-saving potential is largely wasted
12
Parametric Frequency Model
Solution:
Calculate WCEC:
- accounting for the effects of memory accesses
- using the new parametric frequency model
Model:
WCEC(f) = i + mN = i + mLf
i: Invariant # of worst-case cycles (for non-memory operations)
m: # of worst-case memory accesses
N: # of cycles per memory access; depends on memory latency L and frequency f: N = Lf
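The model can be evaluated directly. In the sketch below (hypothetical function names), L is in ns and f in MHz, so N = L·f/1000 cycles and WCET(f) = WCEC(f)/f = i/f + m·L comes out in microseconds. Solving WCET(f) ≤ D for f gives f ≥ i/(D − m·L), whereas a scheme that keeps WCEC fixed at its f_max value needs f ≥ WCEC(f_max)/D. The task values i and m are hypothetical; only the 100 ns latency comes from the FFT example.

```python
from typing import Optional


def cycles_per_access(latency_ns: float, freq_mhz: float) -> float:
    """N = L * f, with L in ns and f in MHz (hence the division by 1000)."""
    return latency_ns * freq_mhz / 1000.0


def wcec(i: float, m: float, latency_ns: float, freq_mhz: float) -> float:
    """WCEC(f) = i + m * N = i + m * L * f."""
    return i + m * cycles_per_access(latency_ns, freq_mhz)


def wcet_us(i: float, m: float, latency_ns: float, freq_mhz: float) -> float:
    """WCET(f) = WCEC(f) / f = i / f + m * L, in microseconds."""
    return wcec(i, m, latency_ns, freq_mhz) / freq_mhz


def min_freq_parametric(i: float, m: float, latency_ns: float,
                        deadline_us: float) -> Optional[float]:
    """Smallest f with i / f + m * L <= D, i.e. f >= i / (D - m * L)."""
    memory_time_us = m * latency_ns / 1000.0
    if deadline_us <= memory_time_us:
        return None  # memory stalls alone exceed the deadline at any frequency
    return i / (deadline_us - memory_time_us)


def min_freq_constant_wcec(i: float, m: float, latency_ns: float,
                           deadline_us: float, fmax_mhz: float) -> float:
    """DVS assumption: WCEC fixed at its fmax value, so f >= WCEC(fmax) / D."""
    return wcec(i, m, latency_ns, fmax_mhz) / deadline_us


i, m, L = 1_000_000, 10_000, 100  # hypothetical task; L = 100 ns as in the FFT example
print(wcec(i, m, L, 100), wcec(i, m, L, 1000))       # -> 1100000.0 2000000.0
print(min_freq_parametric(i, m, L, 5000))            # -> 250.0
print(min_freq_constant_wcec(i, m, L, 5000, 1000))   # -> 400.0
```

For this hypothetical task and a 5 ms deadline, the constant-WCEC assumption asks for 400 MHz where 250 MHz suffices, the same kind of gap shown for FFT.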
13
Using the Parametric Frequency Model
A: add R2, R1, R3
B: load R4, [M1]
C: add R2, R1, R4
D: add R2, R1, R5
This instruction sequence is simulated through a simple pipeline to explain the parametric frequency model
Simple pipeline:
- 6 stages
- data & instruction caches
- N = 10
14
Example 0: Cache Hits
Recall: B is a load instruction
WCEC = 9 + 0N
(Pipeline diagram: each row represents a pipeline stage; time, and cycle count, increases horizontally.)
15
Example 1: Effect of I-cache miss
WCEC = 9 + 1N
Stall due to I-cache miss is shown
Model accurately captures memory latency, however long
16
Example 2: Effect of D-cache miss
Recall: B is a load instruction
WCEC = 9 + 1N
Stall due to D-cache miss is shown
Again, model captures memory latency, however long
Notice: during stall cycles, no useful work is done
17
Example 3: Effect of I- & D-cache Miss
WCEC = 9 + 2N (I-cache miss first, then D-cache miss)
Useful cycles and stall cycles overlap
Overlap also occurs during high-latency execution operations, e.g. floating-point, multiply, … overlapping with a D-cache miss
This leads to overestimation, which is rare in practice; the WCET is still safe
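The arithmetic of the four pipeline examples, with N = 10 as stated for the simple pipeline. The invariant 9 cycles are consistent with 6 + (4 − 1), i.e. four instructions flowing through six stages with no stalls; `example_wcec` is a hypothetical helper. For Example 3 the value is an upper bound, since overlap can make the real count smaller.

```python
N = 10         # cycles per memory access in the example pipeline
INVARIANT = 9  # all-hit case: 6 stages + (4 - 1) further instructions


def example_wcec(misses: int, n: int = N) -> int:
    """WCEC = i + m * N for the A-D instruction sequence."""
    return INVARIANT + misses * n


examples = {
    "Example 0 (cache hits)": 0,
    "Example 1 (I-cache miss)": 1,
    "Example 2 (D-cache miss)": 1,
    "Example 3 (I- and D-cache miss)": 2,
}
for name, misses in examples.items():
    print(f"{name}: WCEC = {INVARIANT} + {misses}N = {example_wcec(misses)}")
```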
18
Experimental Validation
Combine the frequency model with our static timing analyzer: the FAST tool
FAST produces WCEC equations
Experiment to validate results from the FAST tool:
- run benchmarks through the FAST tool, obtaining an equation representing the WCEC of each benchmark
- run the same benchmarks through the traditional timing analysis tool
- vary frequencies: 100 MHz to 1 GHz
19
Frequency-Aware Static Timing Analysis (FAST)
FAST tool “as accurate” as traditional static timing analysis
Slight overestimation in case of floating-point benchmarks
[Figure: Ratio (FAST vs. static timing analysis) for benchmarks fft, adpcm, lms, cnt, mm, srt; y-axis 0.998 to 1.005; series: Frequency = 100, 400, 700, 1000 MHz]
20
FAST in EDF Scheduling with DVS
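DVS with EDF: $\sum_k C_k / P_k \le \alpha$, where $\alpha = f_c / f_m$
FAST with EDF: $\sum_k \frac{i_k + m_k L f_m}{P_k f_m} \le 1$
- Schedulability test: $\frac{1}{f_m} \sum_k \frac{i_k}{P_k} \le 1 - L \sum_k \frac{m_k}{P_k}$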
Implemented the frequency model for 3 EDF-DVS algorithms:
- algorithms by [Pillai & Shin]
- look-ahead improved:
  - at completion, consider the next deadline
  - up to 34% additional energy savings (5-11% on average) at low utilization
  - but 0.5-8% less savings at high utilization
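A minimal sketch of the FAST EDF test, reading f_m in the test above as the candidate operating frequency, next to the conventional test that keeps each WCEC fixed at its f_max value. Selecting the lowest frequency level that passes mirrors static RT-DVS in spirit only; the task set, the 100 MHz to 1 GHz levels in 100 MHz steps, and all function names are hypothetical. Units: i_k in cycles, P_k in microseconds, f in MHz, L in ns.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# (i_k invariant cycles, m_k worst-case memory accesses, P_k period in microseconds)
Task = Tuple[float, float, float]


def cycles_per_access(latency_ns: float, freq_mhz: float) -> float:
    return latency_ns * freq_mhz / 1000.0  # N = L * f


def fast_utilization(tasks: List[Task], latency_ns: float, freq_mhz: float) -> float:
    """sum_k (i_k + m_k * L * f) / (P_k * f): WCEC re-evaluated at f."""
    n = cycles_per_access(latency_ns, freq_mhz)
    return sum((i + m * n) / (p * freq_mhz) for i, m, p in tasks)


def constant_wcec_utilization(tasks: List[Task], latency_ns: float,
                              freq_mhz: float, fmax_mhz: float) -> float:
    """Conventional assumption: WCEC fixed at its value for fmax."""
    n_max = cycles_per_access(latency_ns, fmax_mhz)
    return sum((i + m * n_max) / (p * freq_mhz) for i, m, p in tasks)


def lowest_feasible(levels: Iterable[float],
                    utilization: Callable[[float], float]) -> Optional[float]:
    """Lowest frequency level that passes the EDF test U <= 1."""
    for f in sorted(levels):
        if utilization(f) <= 1.0:
            return f
    return None


tasks = [(200_000, 2_000, 2_000), (300_000, 1_000, 5_000)]  # hypothetical task set
levels = [100 * k for k in range(1, 11)]                     # 100 MHz .. 1 GHz
print(lowest_feasible(levels, lambda f: fast_utilization(tasks, 100, f)))                 # -> 200
print(lowest_feasible(levels, lambda f: constant_wcec_utilization(tasks, 100, f, 1000)))  # -> 300
```

Each test is one pass over the n tasks, so it stays O(n), as on the complexity backup slide. For this hypothetical task set the FAST test admits 200 MHz, while the constant-WCEC test needs 300 MHz.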
21
Improving DVS schemes
Use the parametric frequency model to improve DVS schemes by providing accurate WCETs
Improved energy savings
Architectural simulator: SimpleScalar+Wattch [Brooks ISCA’00]
- 6-stage simple in-order pipeline processor model
- I-cache and D-cache (8KB each)
- run 4-8 tasks simultaneously (the scheduler runs as its own task)
- more accurate than the E ~ V²f model? These results are newer than the paper
22
Static RT-DVS vs. FAST Static RT-DVS
[Figure: Energy normalized to base EDF per taskset (tasksets 1-3, high and low utilization); series: Static, Fast Static]
Base case: EDF
- tasks at 1 GHz, idle at 100 MHz
- no sleep mode, small task periods
Tasksets: 1 = integer, 2 = float, 3 = mix
The Static scheme beats base EDF with 12-60% energy savings; FAST-Static is even better with 40-78% savings, at both high and low utilization
23
Cycle-conserving RT-DVS vs. FAST cycle-conserving RT-DVS
[Figure: Energy normalized to base EDF per taskset (tasksets 1-3, high and low utilization); series: Cycle, Fast Cycle]
Dynamic scheduling: early completion is reclaimed as slack
Cycle-conserving: 57-72% energy savings; FAST: 71-80% savings
24
Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS
[Figure: Energy normalized to base EDF per taskset (tasksets 1-3, high and low utilization); series: Look-Ahead, Fast Look-Ahead]
Most aggressive DVS: early completion + maximum deferral
Look-ahead: slightly higher savings than cycle-conserving, at 68-80%
FAST: slightly better in most cases, at 72-83%
25
Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS
E ~ V²f model:
- higher savings: up to 96% ?
- ratio look-ahead / FAST similar
Wattch detailed power model:
- probably more accurate
[Figure: Energy normalized to base EDF per taskset (tasksets 1-3, high and low utilization); series: Look-Ahead, Fast Look-Ahead]
[Figure: Energy normalized to base EDF per taskset (tasksets 1-6); series: Look-ahead RT-DVS, FAST Look-ahead RT-DVS]
26
Conclusion
Energy savings in real-time systems can be significantly improved by considering the effects of frequency scaling on WCET
- FAST + Static RT-DVS is as good as Look-Ahead RT-DVS, with less overhead
The parameterized frequency model can easily track the effects of frequency scaling on WCET
The FAST tool works best when:
- there are many cache misses
- D-cache analysis is highly inaccurate (usually true): FAST can make up for it
- memory latency is high
- dynamic slack reclaiming (during DVS scheduling) is insufficient
- it is integrated into real-time hardware support [VISA ISCA’03]
27
BACKUP SLIDES
28
The V²f model
[Figure: Power vs. Frequency (MHz)]
29
Old DVS Scheduling Simulator
Event-based simulator of the scheduler
Must assume a miss rate for the tasks in dynamic schemes
Uses the E ~ V²f energy model
Gives a good idea of the savings, but is it accurate?
30
Static RT-DVS vs. FAST Static RT-DVS
[Figure: Energy normalized to base EDF per taskset (tasksets 1-6); series: Static RT-DVS, FAST Static RT-DVS]
[Figure: Energy normalized to base EDF per taskset (tasksets 1-3, high and low utilization); series: Static, Fast Static]
31
Cycle-conserving RT-DVS vs. FAST cycle-conserving RT-DVS
[Figure: Energy normalized to base EDF per taskset (tasksets 1-6); series: Cycle-conserving RT-DVS, FAST Cycle-conserving RT-DVS]
[Figure: Energy normalized to base EDF per taskset (tasksets 1-3, high and low utilization); series: Cycle, Fast Cycle]
32
Look-ahead RT-DVS vs. FAST Look-ahead RT-DVS
[Figure: Energy normalized to base EDF per taskset (tasksets 1-6); series: Look-ahead RT-DVS, FAST Look-ahead RT-DVS]
[Figure: Energy normalized to base EDF per taskset (tasksets 1-3, high and low utilization); series: Look-Ahead, Fast Look-Ahead]
33
DVS schemes (Pillai & Shin)
Static RT-DVS – Uses static slack available in the schedule.
Cycle-conserving RT-DVS – Uses static slack + dynamic slack due to early completion.
Look-ahead RT-DVS – Uses static slack + dynamic slack due to early completion + latest possible scheduling (look-ahead).
34
Complexity
Original EDF test O(n)
Modified EDF test still O(n)