A Mixed Time-Criticality SDRAM Controller
MeAOW
30-09-2013
Sven Goossens, Benny Akesson, Kees Goossens
COBRA – CA104
NEST
www.compsoc.eu
Mixed Time-Criticality
• Embedded multi-core systems are getting more complex:
– Integrating more applications
– Applications get more complex
– (Functionality/Energy) ratio increases
• Driven by power, area and cost constraints
• Results in a mix of applications of different time-criticalities sharing hardware resources
– Firm real-time + Soft real-time = Mixed real-time
The hardware can no longer be tailored for a specific time-criticality class
SDRAM Controllers
• DRAM: Most commonly used off-chip memory resource
• Shared across firm real-time (FRT) and soft real-time (SRT) applications
• Performance metrics: bandwidth (throughput) and latency (response time)
• Difficult to bound performance: locality dependent
Firm Real-Time Controllers
• Maximize worst-case performance
• Simple / analyzable command scheduler
• No attention to average-case performance
• Do not exploit locality across requests
Soft Real-Time Controllers
• Maximize average-case performance
• Complex high-performance command scheduler
• Guaranteeable performance is usually low
• Exploit locality as much as possible
Mixed Real-Time Controllers: requirements
• For FRT: guarantee enough worst-case performance to satisfy requirements
• For SRT: maximize the average-case performance
How can locality be exploited by an MRT controller?
Outline
Introduction
Firm Real-Time Performance
Conservative Open-Page Policy
Reconfigurable Controller Architecture
Conclusions
Firm Real-Time Performance
• Our approach: do not schedule individual commands at run time, instead, use design-time computed command sequences called patterns, and schedule those.
• Select the right memory map / configuration for the mix of applications.
An example read pattern for a DDR3-800 in configuration (BI 2, BC 4):
[Figure: command timeline of the read pattern. It contains one ACT and four RD commands for each of banks 0 and 1, separated by NOPs, with the pattern length marked.]
• Banks Interleaved (BI): the number of banks interleaved in the access
• Burst Count (BC): the number of bursts per bank. Each read command results in a burst data transfer.
• Burst Length (BL): the number of words per read command. They are transferred in BL/2 clock cycles.
• Interface Width (IW): the width of the data interface, in bytes per word
• Access Granularity (AG): the number of bytes read/written in a pattern: AG = BI ∙ BC ∙ BL ∙ IW
• (Gross) efficiency: the fraction of time that the data bus is occupied in the worst case
• The designer can choose the bank interleaving and burst count.
• Each configuration results in a different trade-off between bandwidth, latency and power‡
‡ Goossens, Kouters, Akesson, Goossens, Memory Map Selection for Firm Real-time SDRAM Controllers, Proc. DATE 2012
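The pattern parameters above can be turned into a small worked calculation. The following is a minimal sketch, not the authors' tooling: the function names are made up for illustration, BL = 8 is the standard DDR3 burst length, and the 2-byte interface width and 40-cycle pattern length are assumed values chosen only to make the arithmetic concrete.

```python
def access_granularity(bi: int, bc: int, bl: int, iw_bytes: int) -> int:
    """Bytes transferred by one pattern: AG = BI * BC * BL * IW."""
    return bi * bc * bl * iw_bytes


def gross_efficiency(bi: int, bc: int, bl: int, pattern_length: int) -> float:
    """Fraction of the pattern's cycles in which the data bus carries data.

    Each burst occupies BL/2 clock cycles (two words per cycle on a DDR bus).
    """
    data_cycles = bi * bc * bl // 2
    return data_cycles / pattern_length


# Configuration (BI 2, BC 4) as in the example pattern.
# ASSUMPTIONS: iw_bytes=2 and pattern_length=40 are illustrative values only.
ag = access_granularity(bi=2, bc=4, bl=8, iw_bytes=2)
eff = gross_efficiency(bi=2, bc=4, bl=8, pattern_length=40)
print(f"AG = {ag} bytes, gross efficiency = {eff:.2f}")
```

Under these assumed values a larger BI or BC raises both the access granularity and the share of cycles that carry data, which is the bandwidth side of the trade-off mentioned above.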
Firm Real-Time Performance
[Figure: net bandwidth (GB/s) versus power (W) for each (BI, BC) configuration of three memories: 128MB_DDR2-800, 128MB_DDR3-800 and 256MB_LPDDR2-800-S4.]
• Labels: BI,BC (BI 1 is omitted)
• All memories in this graph run at 400 MHz
• Pareto optimal points are connected
• Isolines denote energy efficiency in GB/J
• Peak bandwidth (1.6 GB/s)
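The two derived quantities in this legend follow from simple arithmetic, sketched below. The function names are illustrative. The 2-byte (16-bit) interface width is an inference: it is the width that reproduces the stated 1.6 GB/s at 400 MHz, and it is not stated on the slide. The 1.0 GB/s at 0.5 W operating point is a made-up example.

```python
def peak_bandwidth_gb_s(clock_mhz: float, iw_bytes: int) -> float:
    """Peak bandwidth of a DDR interface: two transfers per clock cycle."""
    return clock_mhz * 1e6 * 2 * iw_bytes / 1e9


def energy_efficiency_gb_per_j(bandwidth_gb_s: float, power_w: float) -> float:
    """(GB/s) divided by (J/s) gives GB/J, the quantity drawn as isolines."""
    return bandwidth_gb_s / power_w


# ASSUMPTION: a 2-byte interface, inferred from the 1.6 GB/s peak at 400 MHz.
peak = peak_bandwidth_gb_s(clock_mhz=400, iw_bytes=2)

# ASSUMPTION: hypothetical operating point, not read from the graph.
example_efficiency = energy_efficiency_gb_per_j(bandwidth_gb_s=1.0, power_w=0.5)
print(peak, example_efficiency)
```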
• Select the configuration based on the real-time requirements of the requestors, and their request sizes.
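One way to picture this selection step is as a filter over candidate configurations followed by a lowest-power choice. The sketch below is only an illustration under stated assumptions: the `Config` type, the candidate numbers, and the rule that the access granularity should not exceed the request size are all hypothetical, and latency requirements are left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Config:
    bi: int
    bc: int
    ag_bytes: int
    bandwidth_gb_s: float  # worst-case net bandwidth
    power_w: float


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations not dominated in (higher bandwidth, lower power)."""
    def dominated(c: Config) -> bool:
        return any(
            o.bandwidth_gb_s >= c.bandwidth_gb_s
            and o.power_w <= c.power_w
            and (o.bandwidth_gb_s > c.bandwidth_gb_s or o.power_w < c.power_w)
            for o in configs
        )
    return [c for c in configs if not dominated(c)]


def select_config(configs: List[Config], required_gb_s: float,
                  request_bytes: int) -> Optional[Config]:
    """Lowest-power configuration that meets the bandwidth requirement.

    ASSUMPTION: a configuration fits only if AG <= request size, so that
    no fetched data is discarded.
    """
    feasible = [c for c in configs
                if c.bandwidth_gb_s >= required_gb_s
                and c.ag_bytes <= request_bytes]
    return min(feasible, key=lambda c: c.power_w) if feasible else None


# ASSUMPTION: made-up numbers, not taken from the graph.
candidates = [
    Config(1, 1, 16, 0.4, 0.30),
    Config(2, 1, 32, 0.7, 0.40),
    Config(2, 2, 64, 0.9, 0.45),
    Config(4, 1, 64, 0.8, 0.50),
    Config(4, 2, 128, 1.1, 0.55),
]
front = pareto_front(candidates)
chosen = select_config(candidates, required_gb_s=0.75, request_bytes=64)
print([(c.bi, c.bc) for c in front], chosen)
```

With these numbers the (BI 4, BC 1) point drops off the Pareto front because (BI 2, BC 2) offers more bandwidth at lower power.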
Open vs. Close-Page Policy
• Color indicates locality (and request origin)
• For the blue requestor the open-page policy:
– Increases the worst-case execution time
– Reduces the average-case execution time
• We would like to improve average-case performance for SRT applications, without hurting the FRT guarantees
[Figure: command timelines (A = activate, P = precharge) for a sequence of request arrivals under the close-page policy and under the open-page policy.]
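The difference between the two policies can be sketched as the command sequence each one issues for a stream of requests. This is a simplified illustration with a single bank and no timing constraints. The function names and row numbers are hypothetical. The labels A, Read and P follow the figure.

```python
from typing import List, Optional


def close_page(rows: List[int]) -> List[str]:
    """Every request activates its row, reads, and precharges again."""
    cmds: List[str] = []
    for _ in rows:
        cmds += ["A", "Read", "P"]
    return cmds


def open_page(rows: List[int]) -> List[str]:
    """Rows stay open: a hit needs only a read, a miss needs P and A first."""
    cmds: List[str] = []
    open_row: Optional[int] = None
    for row in rows:
        if row != open_row:
            if open_row is not None:
                cmds.append("P")  # close the row that is still open
            cmds.append("A")
            open_row = row
        cmds.append("Read")
    return cmds


# Three requests to the same row: the open-page policy skips two A/P pairs.
print(close_page([7, 7, 7]))
print(open_page([7, 7, 7]))
# A miss after an open row: the request must wait for P and A before its read.
print(open_page([7, 3]))
```

In the miss case the request pays for a precharge that a close-page controller would already have completed, which is why the open-page policy raises the worst case while lowering the average case.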
Conservative Open-Page Policy
[Figure: the close-page and open-page command timelines, with a third timeline added for the conservative open-page policy: A Read Read P A Read P A Read P A Read P.]
• Key idea:
– Do not precharge if next request is known to target the open row
– Precharge if next address is not known in time, or in case of a miss
‡ Goossens, Akesson, Goossens, Conservative Open-Page Policy for Mixed Time-Criticality Memory Controllers, Proc. DATE 2013
• Conservative Open-Page policy‡ can be used in an MRT controller:
– Worst-case guarantees are equal to a close-page policy.
– Average-case performance is better, leading to lower execution times and lower average-case latencies. SRT applications can even benefit indirectly from the quicker service to FRT requests!
– The execution time reduction depends on the memory load of the application.
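The key idea can be written as a single decision taken after each request. The sketch below is a simplified single-bank model without timing. The function names and the `next_known` flags (whether the next address arrived in time) are hypothetical and only illustrate the rule stated above.

```python
from typing import List, Optional


def precharge_after_request(open_row: int, next_row: Optional[int]) -> bool:
    """True if the bank should be precharged after the current request.

    next_row is None when the next address is not known in time.
    """
    return next_row is None or next_row != open_row


def conservative_open_page(rows: List[int], next_known: List[bool]) -> List[str]:
    """Command sequence for one bank; next_known[i] refers to request i+1."""
    cmds: List[str] = []
    bank_open = False
    for i, row in enumerate(rows):
        if not bank_open:
            cmds.append("A")
        cmds.append("Read")
        has_next = i + 1 < len(rows) and next_known[i]
        next_row = rows[i + 1] if has_next else None
        if precharge_after_request(row, next_row):
            cmds.append("P")
            bank_open = False
        else:
            bank_open = True  # the open row equals the next request's row
    return cmds


# Request 2 hits and is known in time; request 3 would hit but arrives late.
print(conservative_open_page([7, 7, 7, 3], [True, False, True, False]))
```

Because a row is only left open when the next access is already known to hit, a request never finds a row open that it has to close first, which is consistent with the worst-case guarantees matching the close-page policy.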
Conservative Open-Page Policy (DDR3-1600)
• Starting point is a predictable memory pattern set, with a “bypass” in case of a page hit
• Use explicit precharges instead of auto-precharge flags
– Postpones the precharge as long as possible, to increase the hit window in which we can decide to bypass the precharge and activate.
[Figure: two command timelines, one command per cycle (A = ACT, R = RD, P = PRE, N = NOP, suffix = bank). The first timeline ends with explicit precharges P0 and P1 and is annotated with PRE-to-ACT = 10 and a hit window of 28 cycles. The second timeline contains no explicit precharge commands and is annotated with a hit window of 14 cycles, up to which the next request can be considered. The ACT-to-ACT constraint is 38 cycles.]
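The numbers in this figure can be reproduced by counting cycles. The sketch below rebuilds both timelines from the command rows of the figure, reading them as one command per clock cycle. It assumes that the second timeline precharges via the auto-precharge flag on the last read to each bank, so the decision for bank 0 must be taken by the last R0. That reading is an interpretation of the figure, and the helper names are made up.

```python
from typing import List, Tuple


def expand(spec: List[Tuple[str, int]]) -> List[str]:
    """Expand (command, repeat) pairs into one command per clock cycle."""
    return [cmd for cmd, count in spec for _ in range(count)]


def last_index(cmds: List[str], name: str) -> int:
    """Cycle of the last occurrence of a command."""
    return len(cmds) - 1 - cmds[::-1].index(name)


# Timeline with explicit precharges (P0, P1).
explicit = expand([("A0", 1), ("N", 7), ("A1", 1), ("N", 1),
                   ("R0", 1), ("N", 3), ("R0", 1), ("N", 3),
                   ("R1", 1), ("N", 3), ("R1", 1), ("N", 5),
                   ("P0", 1), ("N", 7), ("P1", 1), ("N", 1)])

# Timeline without explicit precharges (ASSUMED to use auto-precharge).
auto = expand([("A0", 1), ("N", 7), ("A1", 1), ("N", 1),
               ("R0", 1), ("N", 3), ("R0", 1), ("N", 3),
               ("R1", 1), ("N", 3), ("R1", 1), ("N", 15)])

act_to_act = len(explicit)                 # cycle of the next A0
hit_window_explicit = explicit.index("P0")  # decision can wait until P0
pre_to_act = act_to_act - hit_window_explicit
hit_window_auto = last_index(auto, "R0")    # decision needed at the last R0
print(act_to_act, pre_to_act, hit_window_explicit, hit_window_auto)
```

Under this reading the explicit precharge doubles the window in which a page hit can still be detected, from 14 to 28 cycles, without lengthening the pattern.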
Reconfigurable Back-end
[Figure: block diagram of the SDRAM back-end. Blocks: pattern selector, pattern LUT, command player, refresh timer, address generator, pattern memory, internal configuration bus. Signals: request type, logical address, configuration data, offset, pattern base-address and length, address masks and shift-amounts, commands, row/col and bank, RAS, CAS, WE, etc.]
• Patterns are reprogrammable at run time.
• Can support all devices supported by the PHY (all DDR3 devices)
• A different pattern gives a different worst-case bandwidth, latency and power trade-off.
• Allows a different trade-off per use case.
‡ Goossens, Kuijsten, Akesson, Goossens, A Reconfigurable Real-Time SDRAM Controller for Mixed Time-Criticality Systems, Proc. CODES+ISSS 2013
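The labels in the block diagram suggest a table-driven design: a lookup table maps the request type to a base address and length in the pattern memory, and the address generator applies masks and shift amounts to the logical address. The sketch below models that idea in software. It is not the hardware design: the class names, the shift-then-mask order, the bit positions and the toy patterns are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Field:
    """One address field, extracted as (address >> shift) & mask (ASSUMED order)."""
    mask: int
    shift: int

    def extract(self, address: int) -> int:
        return (address >> self.shift) & self.mask


@dataclass
class BackEnd:
    pattern_memory: List[str]                # commands of all patterns
    pattern_lut: Dict[str, Tuple[int, int]]  # request type -> (base, length)
    bank: Field
    row: Field
    col: Field

    def play(self, request_type: str) -> List[str]:
        """Commands the command player would issue for this request type."""
        base, length = self.pattern_lut[request_type]
        return self.pattern_memory[base:base + length]

    def decode(self, address: int) -> Tuple[int, int, int]:
        """Split a logical address into (bank, row, column)."""
        return (self.bank.extract(address),
                self.row.extract(address),
                self.col.extract(address))


# ASSUMPTION: toy patterns and a made-up memory map, not a real DDR3 setup.
backend = BackEnd(
    pattern_memory=["ACT", "NOP", "RD", "NOP", "ACT", "NOP", "WR", "NOP"],
    pattern_lut={"read": (0, 4), "write": (4, 4)},
    bank=Field(mask=0x7, shift=10),
    row=Field(mask=0x3FFF, shift=13),
    col=Field(mask=0x3FF, shift=0),
)
address = (5 << 13) | (3 << 10) | 42
print(backend.play("write"), backend.decode(address))

# Run-time reconfiguration: append a new pattern and repoint the LUT entry.
backend.pattern_memory += ["ACT", "RD"]
backend.pattern_lut["read"] = (8, 2)
print(backend.play("read"))
```

Keeping the patterns and the memory map as data rather than logic is what would make them reprogrammable at run time in such a design.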
Reconfigurable Controller Architecture
[Figure: block diagram of the controller. Blocks shown: memory client 1 and memory client 2, width converters, req./resp. queues, atomizers, TDM arbiter (resource front-end), resource bus, SDRAM back-end, PHY, SDRAM, and the configuration bus carrying configuration data.]
Run-time reconfiguration infrastructure (memory mapped)
Reconfigurable TDM arbiter (predictable and composable during reconfiguration)
Reconfigurable back-end
• Implemented in SystemC, and on an ML605 Virtex-6 development board from Xilinx
• 2-port instance: 13754 registers, 9543 LUTs and 1 BRAM
• 4-port instance: 22065 registers, 14016 LUTs and 1 BRAM
• (Most registers are used in the req./resp. queues, which contain 256 bytes / port)
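The front-end blocks named above can be illustrated with a toy model. The slides do not describe their behavior, so the sketch rests on two assumptions: the atomizer splits each request into atoms of one access granularity each, and the TDM arbiter follows a fixed slot table and leaves a slot idle when its owner has nothing pending. Function names, addresses and the slot table are hypothetical.

```python
from typing import List, Optional, Tuple


def atomize(address: int, size: int, ag: int) -> List[Tuple[int, int]]:
    """Split a request into atoms of AG bytes each, as (address, size) pairs.

    ASSUMPTION: a request is rounded up to a whole number of atoms.
    """
    return [(address + offset, ag) for offset in range(0, size, ag)]


def tdm_schedule(slot_table: List[int], pending: List[int],
                 n_slots: int) -> List[Optional[int]]:
    """Client served in each slot, or None if the slot owner has no atoms.

    ASSUMPTION: unused slots are not given to other clients.
    """
    remaining = list(pending)
    served: List[Optional[int]] = []
    for slot in range(n_slots):
        owner = slot_table[slot % len(slot_table)]
        if remaining[owner] > 0:
            remaining[owner] -= 1
            served.append(owner)
        else:
            served.append(None)
    return served


# Client 0 issues a 256-byte request, client 1 a 64-byte request; AG = 64 bytes.
atoms_0 = atomize(address=0x100, size=256, ag=64)
atoms_1 = atomize(address=0x800, size=64, ag=64)

# Client 0 owns two of every three slots.
order = tdm_schedule(slot_table=[0, 0, 1],
                     pending=[len(atoms_0), len(atoms_1)],
                     n_slots=6)
print(atoms_0, order)
```

In this model the service a client receives depends only on its own slots and requests, which is one way to read "predictable and composable".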
Conclusions
• Mixed time-criticality controllers should focus on:
– For FRT: guaranteeing enough worst-case performance to satisfy requirements
– For SRT: maximizing the average-case performance
• Choosing the right memory map / pattern configuration for the mix of applications:
– Trade-offs exist between worst-case bandwidth, latency and power
– Select the configuration that satisfies the firm real-time requirements
• Using a conservative open-page policy, some of the locality across requests can be exploited:
– Decrease the gap between worst-case performance and average-case performance
– Reduce average-case latency and thus average-case execution time for soft real-time applications
• Reconfigurable architecture allows changing the memory map / configuration at run time:
– Select the right trade-off per use case
– Leads to other interesting challenges (see CODES2013 paper on predictable reconfiguration)
• For further information / a broader perspective: www.compsoc.eu
Sven Goossens <[email protected]>
Benny Akesson <[email protected]>
Kees Goossens <[email protected]>
Referred papers: www.svengoossens.nl
Electronic Systems Group
Electrical Engineering Faculty
Eindhoven University of Technology
5-tile compSOC platform: