new strategies for system-level design : where to go? · new strategies for system-level design :...
TRANSCRIPT
New Strategies for
System-Level Design :Where to Go?
Daniel GajskiCenter for Embedded Computer Systems
University of California, [email protected]
Copyright 2007 Daniel D. GajskiKeynote 2007
Overview
• Introduction
• Issues
• Models
• Platforms
• Tools
• Benefits
• Conclusion
Copyright 2007 Daniel D. GajskiKeynote 2007
Y-Chart
Copyright 2007 Daniel D. GajskiKeynote 2007
Processor behavioral model
Language C -> CDFG -> FSMD (FSM +DFG)
Copyright 2007 Daniel D. GajskiKeynote 2007
Processor structural model
RF / Scratch pad
ALU MUL
Status
B3
PC
AG
CW
offset
status
address
const
CMem
Memory
B2
B1
Programmable controller
Controller pipelining
Datapath pipelining
Data forwarding
Register file Memory
(Component netlist: NISC technology)
Copyright 2007 Daniel D. GajskiKeynote 2007
Processor synthesis
FSMD model
Component selectionCA schedulingVariable binding Operation Binding
Bus BindingController Synthesis
Processor
Processor
RF / Scratch pad
ALU MUL
Status
B3
PC
AG
CW
offset
status
address
const
CMem
Memory
B2
B1
Copyright 2007 Daniel D. GajskiKeynote 2007
System behavioral model
(Serial-parallel processes: UML + C/ SystemC)
Copyright 2007 Daniel D. GajskiKeynote 2007
System structure
(Netlist of system components: processors, memories, buses)
Copyright 2007 Daniel D. GajskiKeynote 2007
System Synthesis
System behavior
Proc
Proc
Proc
Proc
Proc
Memory
Memory
µProcessor
Interface
Comp.IP
Bus
Interface
Interface
Interface
Custom HW
System structure
ProfilingAllocation IF Synthesis
Refinement
Behavior Binding Channel Binding
System
Scheduling
Copyright 2007 Daniel D. GajskiKeynote 2007
Closing the System Gap
Real gap: behavior and structure (semantics and syntax)
Copyright 2007 Daniel D. GajskiKeynote 2007
Simulation Based Methodology
Simuletable but not synthesizable or verifiable
Ambiguous semantics of hardware/system level languages
Copyright 2007 Daniel D. GajskiKeynote 2007
In Search of a Solution
Algebra: < objects, operations>
Arithmetic algebra allows creation of expressions and equivalences
Copyright 2007 Daniel D. GajskiKeynote 2007
Model Algebra
Model algebra: <objects, compositions>
Model algebra allows creation of models and model equivalences
Copyright 2007 Daniel D. GajskiKeynote 2007
Specify-Explore-Refine Methodology
Design decisions
Model refinement
Replacement or re-composition
FPGA board
Copyright 2007 Daniel D. GajskiKeynote 2007
How many models?
Minimal set for any methodology
(3 is enough)
• System specification model (application designers)
• Transaction-level model (system designers)
• Pin&Cycle accurate model (implementation designers)
Copyright 2007 Daniel D. GajskiKeynote 2007
Three Models (with Respect to OSI)
Pin / Cycle Accurate Model
Transaction Level Model
Specification Model
7. Application6. Presentation5. Session4. Transport3. Network2b. Link + Stream2a. Media Access Ctrl2a. Protocol1. Physical
7. Application6. Presentation5. Session4. Transport3. Network2b. Link + Stream2a. Media Access Ctrl2a. Protocol1. Physical
Address lines
Data lines
Control lines
TLM
Spec
P/CAM
Source: G Schirner
Copyright 2007 Daniel D. GajskiKeynote 2007
System Specification
Brid
ge
v1C
1B1 B2
CPU Mem
HW
B3
IP
B4
C2
System Definition = (Partial) Platform + (Partial) Spec
Arb
iter
Communication• Channels (in C)• Variables (in C)
Computation • Behaviors (in C)
Copyright 2007 Daniel D. GajskiKeynote 2007
Transaction-Level Model (TLM)
CPU Bus
B1 B2
OS
B4
CPU Mem
IP
B3
HW
HALDrivers
IP Bus
Copyright 2007 Daniel D. GajskiKeynote 2007
Pin/Cycle Accurate Model (P/CAM)C
PU Mem
Bridge
HW IP
Arbiter
HALRTOS
EXE
P/CAM is downloaded automatically for fast prototyping
with FPGAs or ASIC design
ICProgram
Source: D. Gajski et al.
Copyright 2007 Daniel D. GajskiKeynote 2007
How many components?
Minimal set for any design
(4 is enough?)
• Processing element (PE)
• Memory
• Transducer / Bridge
• Arbiter
Copyright 2007 Daniel D. GajskiKeynote 2007
General System Model
Bus2Bus1 Bus3
Arbiter 1
PE 1.1
Transducer2-3
Arbiter 2
Arbiter 3
PE 1.2
Memory 1
PE 2.1(Master)
PE 3.1
Memory 3
Interrupt1.1
Interrupt2.1
PE 2.2(Slave)
Transducer1-2 Interrupt2.2
Interrupt3.1
Interrupt3.2
Copyright 2007 Daniel D. GajskiKeynote 2007
Transducer Model
Processor1<clk1>
Processor2<clk2>
FSMD1<clk1>
FSMD2<clk2>
Transducer
Ready1Ack_ready1
Ready2Ack_ready2
Memory1 Memory2
PE1 PE2
Addr bus1 Addr bus2Data bus1 Data bus2
Interrupt2Interrupt1
Data1 Data2
Queue<clk3>
Source: H. Cho
Copyright 2007 Daniel D. GajskiKeynote 2007
Processing Element: NISC technology
RF / Scratch pad
ALU MUL
Status
PC
AG
CW
offset
status
address
const
CMem
Memory
B2
B1
B3
Data forwarding
Datapath pipelining
Controller pipelining
Programmable controller
Multi-cycle units
Datapath Pipelined units
• Direct compilation of C to HW (fastest possible execution)• Statically and dynamically reconfigurable (anytime, anywhere)• Designed for manufacturability (solving timing closure)
Copyright 2007 Daniel D. GajskiKeynote 2007
General System Design EnvironmentModel A
Model B
Estimationtool
Synthesistool
Componentlibrary
SimulationtoolGUI Verify
tool
ti
Refinementtool
Transforms:t1t2...
tn
Copyright 2007 Daniel D. GajskiKeynote 2007
How many tools?
Minimal set for any methodology
(2 is enough?)
• Front-End (for application developers)– Input: C, C++, Mathlab, UML, …
– Output: TLM
• Back-End (for SW/HW system designers)– Input : TLM
– Output: Pin/Cycle accurate Verilog/VHDL
Copyright 2007 Daniel D. GajskiKeynote 2007
DecisionUser
Interface (DUI)
ValidationUser
Interface (VUI)
TIMED
Create
Map
Compile
Replace
Select
Partition
Compiler
Debugger
Stimulate
Verify
ES Environment
Application Tools : Compilers/Debuggers Commercial Tools : FPGA, ASIC
CYCLE ACCURATE
Verify
Simulate
Check
Compile
ESE Front – End
System Capture + Platform Development
SW Development + HW Development
ESE Back – End
Copyright 2007 Daniel D. GajskiKeynote 2007
Benefit: Spec-to-Prototype in 1 Week
Copyright 2007 Daniel D. GajskiKeynote 2007
Does it work?• Intuitively it does
– Well defined models, rules, transformations, refinements– Worked in the past: layout, logic, RTL?– System level complexity simplified
• Proof of concept demonstrated– Embedded System Environment (ESE)– Automatic model generation– Model synthesis and verification– Universal IP technology (NISC)– Productivity gains greater then 1000
• Benefits– Large productivity gains– Easy design management – Easy derivatives– Shorter TTM
Copyright 2007 Daniel D. GajskiKeynote 2007
Design flow with NISC technology
offset
status
const
address
CWPC CMem
AG
status
B1B2
ALU Memory
RF
MUL
B3
const
status
RF
OR
ALUAR
MemDR
offset
CMem
CWPC
AG P
bL
Sum
Add
aL
Mul
for(int i=0; i<8; i++)for(int j=0; j<8; j++){
sum=0;for(int k=0; k<8; k++)
sum = sum + A[i][k] ×B[k][j];C[i][j] = sum;
}
for(int i=0; i<8; i++)for(int j=0; j<8; j++){
sum=0;for(int k=0; k<8; k++)
sum = sum + A[i][k] ×B[k][j];C[i][j] = sum;
}
for(int i=0; i<8; i++)for(int j=0; j<8; j++){
i8 = i × 8;sum = *(A + i8) × *(B + j);sum += *(A + i8 + 1) × *(B + 8 + j);...C[i][j] = sum;
}
for(int i=0; i<8; i++)for(int j=0; j<8; j++){
i8 = i × 8;sum = *(A + i8) × *(B + j);sum += *(A + i8 + 1) × *(B + 8 + j);...C[i][j] = sum;
}
NISC Compiler
Results
Code Refinement
NISC Refinement
NISC
Application
NISC
ApplicationNISC
CompilerResults
Iterative design & refinement
Source: M. Reshadi
Copyright 2007 Daniel D. GajskiKeynote 2007
5.3X 2.1X 11.6X 2.5X
0.83X 1.3X 0 2.1X
1.25X NA NA NA
DCT with NISC technology
0
0.2
0.4
0.6
0.8
1
1.2
1.4
MIPS NMIPS CDCT1 CDCT2 CDCT3 CDCT4 CDCT5 CDCT6 CDCT7 Manual
Execution Time Power Energy Area
10X 1.3X 12.8X 3X
Performance Power saving Energy saving Area reduction
NMIPS vs. MIPS
CDCT3 vs. NMIPS
CDCT7 vs. NMIPS
CDCT7 vs. ManualSource: B. Gorjiara
Copyright 2007 Daniel D. GajskiKeynote 2007
Example: MP3 Decoder
• Functional block diagram (major blocks only)
• Timing constraints• 38 frames per second• Frame delay < 26.12ms
HuffDec
FilterCoreIMDCT
PCM
FilterCoreIMDCT
mp3 pcm
Left channel
Right channel
AliasRed
AliasRed
2 granules
Copyright 2007 Daniel D. GajskiKeynote 2007
MP3 Player TLM (SW+4HW)
• MP3 encoder mapped to SW (MicroBlaze), filters and IMDCT to HW• Mem1 (on OPB bus) for data, Mem2 (on LMB bus) for program• Custom HWs on DoubleHdshk (DH) bus, with bridge to OPB
OPB Bus
MP3OS
Mic
robl
aze
CPU
RightFilter
HW2Mem1
HAL
DH Bus
Bridge
LeftFilter
HW1
LMB Bus
Mem2
RightIMDCT
HW4
LeftIMDCT
HW3
Copyright 2007 Daniel D. GajskiKeynote 2007
Manual Design Quality
• Area• % of FPGA slices and BRAMS
• Performance• Time to decode 1 frame of MP3 data
0102030405060708090
100
SW+0 SW+1 SW+2 SW+4
Design Points
% c
hip
utili
zatio
n
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
seco
nds %Slices
%BRAMsExec. time
Copyright 2007 Daniel D. GajskiKeynote 2007
Design Quality with NISC components
• Area• NISC uses fewer FPGA slices and more BRAMs than manual HW
• Performance• NISC comparable to manual HW and much faster than SW
0102030405060708090
100
SW+0 SW+1 SW+2 SW+4 NISC+0
Design Points
% c
hip
utili
zatio
n
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
seco
nds %Slices
%BRAMsExec. time
Copyright 2007 Daniel D. GajskiKeynote 2007
Development Time with ESE
0
10
20
30
40
50
60
70
Spec. TLM RTL Board
models
pers
on-d
ays SW+0
SW+1SW+2SW+4
Manual Development Time
• Model Development time• Includes time for C, TLM and RTL Verilog coding and debugging
• ESE drastically cuts RTL and Board development time
ESE
0
10
20
30
40
50
60
70
Spec. TLM RTL Board
models
pers
on-d
ays SW+0
SW+1SW+2SW+4
Source: S. Abdi
Copyright 2007 Daniel D. GajskiKeynote 2007
0
10
20
30
40
50
60
70
Spec. TLM RTL Board
models
pers
on-d
ays SW+0
SW+1SW+2SW+4
Development Time with ESE
• ESE drastically cuts RTL and Board development time• Models can be developed at Spec and TL• Synthesizable RTL models are generated automatically by ESE
ESE
Source: S. Abdi
Copyright 2007 Daniel D. GajskiKeynote 2007
Validation Time with ESE
0123456789
10
Spec. TLM RTL Boardmodels
seco
nds SW+0
SW+1SW+2SW+4
Validation Time
• Simulation time measured on 3.3 GHz processor• Emulation time measured on board with Timer• ESE cuts validation time from hours to seconds
ESE
0123456789
10
Spec. TLM RTL Boardmodels
seco
nds SW+0
SW+1SW+2SW+4
X
hour
s
18.06 hrs17.71 hrs17.56 hrs15.93 hrs
Source: S. Abdi
Copyright 2007 Daniel D. GajskiKeynote 2007
0123456789
10
Spec. TLM RTL Boardmodels
seco
nds SW+0
SW+1SW+2SW+4
Validation Time with ESE
• ESE cuts validation time from hours to seconds• No need to verify RTL models• Designers can perform high speed validation at TLM and board
ESE
Source: S. Abdi
Copyright 2007 Daniel D. GajskiKeynote 2007
Conclusions
• Extreme makeover is necessary for a new paradigm, where
– SW = HW = SOC = Embedded Systems– Simulation based flow is not acceptable– Design methodology is based on scientific principles
• Model algebra is enabling technology for– System design, modeling and simulation– System synthesis, verification, and test
• What is next?– Change of mind– Application oriented EDA– Looking for early adapters
Thank You
Daniel GajskiCenter for Embedded Computer Systems (CECS)
www.cecs.uci.edu