Designing Programmable Platforms: From ASIC to ASIP
TRANSCRIPT
![Page 1: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/1.jpg)
Designing Programmable Platforms: From ASIC to ASIP
MPSoC 2005
Heinrich Meyr
CoWare Inc., San Jose
and
Integrated Signal Processing Systems (ISS),
Aachen University of Technology, Germany
![Page 2: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/2.jpg)
Agenda
Facts & Conclusions
Heterogeneous MPSoC
» Energy Efficiency vs. Flexibility
» How to explore the Design Space?
ASIP Design
Economics of SoC Development
Conclusions
![Page 3: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/3.jpg)
Facts & Conclusion
![Page 4: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/4.jpg)
Core Proposition
ASIP-based Platforms (heterogeneous MPSoC)
![Page 5: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/5.jpg)
Agenda
Facts & Conclusions
Heterogeneous MPSoC
» Energy Efficiency vs. Flexibility
» How to explore the Design Space?
ASIP Design
Economics of SoC Development
Conclusions
![Page 6: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/6.jpg)
Trade-off between Flexibility and Energy-Efficiency
Heterogeneous MPSoC
![Page 7: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/7.jpg)
Architectural Objectives
Need more MOPS/Watt and MOPS/mm² to minimize the global performance measure for battery driven devices
Energy / decoded Bit = (Joule/Bit)
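The metric above can be read as average power divided by decoded throughput. A minimal sketch, assuming power in watts and throughput in bits per second; the function name and the example numbers are illustrative and not taken from the talk:

```python
def energy_per_bit(power_watts: float, throughput_bps: float) -> float:
    """Return energy per decoded bit in joules/bit (W divided by bit/s)."""
    if throughput_bps <= 0:
        raise ValueError("throughput must be positive")
    return power_watts / throughput_bps


# Hypothetical decoder: 0.5 W average power at 10 Mbit/s decoded throughput.
example = energy_per_bit(0.5, 10e6)
print(f"{example * 1e9:.1f} nJ/bit")
```

Lower values are better: for a fixed throughput, halving the power halves the energy spent per decoded bit.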
![Page 8: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/8.jpg)
Computational Efficiency vs. Flexibility
Source: T. Noll, RWTH Aachen
![Page 9: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/9.jpg)
Enabling MP-SoC Design
![Page 10: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/10.jpg)
System Level Tools I: Application & IP Creation
[Figure: tool flow by design domain]
Algorithm domain (system application design, algorithmic exploration)
• Matlab
• SPW
• System Studio
High-level IP block design (block specification, Architecture Description Language)
• LISATek Processor Synthesis
• ConvergenSC Buscompiler
Micro-architecture domain (block implementation)
• RTL Synthesis
![Page 11: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/11.jpg)
System Level Tools II: MP-SoC Platform Design
[Figure: tool flow by design domain, extending "System Level Tools I: Application & IP Creation"]
Algorithm domain (system application design, algorithmic exploration)
• Matlab
• SPW
• System Studio
MP-SoC platform design
Abstract architecture
• MP-SoC Intermediate Representation
Virtual prototype (SystemC Transaction Level Modeling)
• ConvergenSC Platform Creator
High-level IP block design (block specification, Architecture Description Language)
• LISATek Processor Synthesis
• ConvergenSC Buscompiler
Micro-architecture domain (block implementation)
• RTL Synthesis
![Page 12: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/12.jpg)
Agenda
Facts & Conclusions
Heterogeneous MPSoC
» Energy Efficiency vs. Flexibility
» How to explore the Design Space?
ASIP Design
Economics of SoC Development
Conclusions
![Page 13: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/13.jpg)
Processor Design Space
[Figure: processor with Core (pipeline stages FE, DC, EX, WB), Cache, MMU, Memory and Peripheral; execution units butterfly 0, butterfly 1 and load/store]
Design questions:
• Bypass?
• Pipeline length?
• Shared resources?
• Parallel execution units?
• Which cache required?
• Bus fast enough?
• Communication?
• Exploit regularity/parallelism in data flow/data storage
• VLIW, SIMD, ?
• Which instructions for compiler support?
• Instruction encoding?
• How many general purpose registers?
• Area constraints met?
• Clock frequency?
Design phases: Instruction Set Design, Micro Architecture Design, RTL Design, SoC Integration
- Instruction-Set Design
- Compiler Design
- Micro Architecture Design
- RTL Design
- RTL ISS Co-verification
- System Integration
- Embedded Software Simulation
Optimal design requires powerful tools and automation!
MESCAL 2: Inclusively identify the architectural space
![Page 14: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/14.jpg)
Architecture Description Language based Processor Design
The purpose of an architecture description language (e.g. LISA) is:
» To allow for an iterative design to efficiently explore architecture alternatives
» To jointly design "Architecture-Compiler" and on-chip communication
» To automatically generate hardware (path to implementation)
» To automatically generate tools
  » Assembler, Linker, Compiler, Simulator, co-simulation interfaces
From a single model at various levels of temporal and spatial abstraction
MESCAL 3: Efficiently describe the ASIP
![Page 15: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/15.jpg)
LISA 2.0 - Abstraction Levels
[Figure: model accuracy along a time axis and an architecture axis, ranging from "no details" to "very detailed"]
Time: Pseudo Instructions, Processor Instructions, Cycles, Phases
Architecture: Pseudo Resources (e.g. C variables); Functional units, Registers, Memories; + Pipelines; + IRQ, etc.
Models: high-level model, instruction-accurate model, cycle-accurate model, phase-accurate model
![Page 16: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/16.jpg)
[Figure: LISATek Processor Designer flow; labels: FFT Processor, Application]
LISATek IP Samples: DSP Sample, VLIW Sample, RISC Sample, Empty Model
Custom Processor Model (LISA 2.0 language)
Steps: Describe/Adopt Processor Model, Generate Tools, Generate...
Outputs shown: Software Tool Chain, RTL, Executable Software Platform, SoC Integration Kit (e.g. SystemC)
Function and instruction level profiling reveals hot-spots -> special purpose instructions
Rapid modeling and re-targetable simulation + code-generation allows for: joint optimization of application and architecture
MESCAL 3: Efficiently describe and evaluate the ASIP
MESCAL 5: Successfully deploy the ASIP
![Page 17: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/17.jpg)
Current Work
[Figure: LISA-based design flow with an EXPLORATION side and an IMPLEMENTATION side]
Target Architecture
LISA Description
EXPLORATION
• LISA Compiler
• C-Compiler, Assembler, Linker, Simulator
• Model Verification & Evaluation
• Evaluation Results: Profile Information, Application Performance
IMPLEMENTATION
• HDL Generator
• SystemC, VHDL, Verilog Output
• Gate Level Synthesis
• Evaluation Results: Chip Area, Clock Speed, Power Consumption
Optimization
• Instruction Set Synthesis
• Memory architecture
• Verification
MESCAL 3: ... evaluate the ASIP
![Page 18: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/18.jpg)
June 10, 2004
A Novel Approach for Flexible and Consistent ADL-driven ASIP Design
Gunnar Braun, Achim Nohl
CoWare, Inc.
DAC Booth #1844, www.CoWare.com
Weihua Sheng, Jianjiang Ceng, Manuel Hohenauer, Hanno Scharwächter, Rainer Leupers, Heinrich Meyr
Integrated Signal Processing Systems (ISS)
Aachen University of Technology
Germany
![Page 19: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/19.jpg)
Introduction
Architecture Description Languages (ADL)
• Automatic generation of Software Toolkit (Compiler, Assembler, Linker, IS-Simulator)
• Architecture Exploration
• SystemC models, RTL code, verification tools, ...
Challenges:
• Different tools need different information
• Unambiguous, redundancy-free architecture model (rather than tools description)
• Multiple abstraction levels (instruction-accurate and/or cycle-accurate)
![Page 20: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/20.jpg)
Tool Requirements: Compiler
[Figure: tree patterns that map operations to instructions]
+ (rs, rt) → rd: add rd = rs, rt
* (rs, rt) → rd: mul rd = rs, rt
LD (@) → rd: ld rd = @
ST (rs) → @: st @ = rs
C Compiler
C: a = b + c;
Assembly: add a = b, c
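The compiler's view can be sketched as a lookup from a C operator to the instruction that covers it, following the slide's `add rd = rs, rt` form. This is a toy illustration, not how a generated C compiler works internally; the `PATTERNS` table and `select_instruction` helper are hypothetical names:

```python
import re

# Hypothetical pattern table: C operator -> mnemonic of the covering instruction.
PATTERNS = {"+": "add", "*": "mul"}

STATEMENT = re.compile(r"\s*(\w+)\s*=\s*(\w+)\s*([+*])\s*(\w+)\s*;\s*")


def select_instruction(stmt: str) -> str:
    """Map a statement of the form 'a = b + c;' to 'add a = b, c'."""
    match = STATEMENT.fullmatch(stmt)
    if match is None:
        raise ValueError(f"unsupported statement: {stmt!r}")
    rd, rs, op, rt = match.groups()
    return f"{PATTERNS[op]} {rd} = {rs}, {rt}"


print(select_instruction("a = b + c;"))  # add a = b, c
```

The point is that the compiler only needs to know what an instruction computes, not how the hardware carries it out.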
![Page 21: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/21.jpg)
Tool Requirements: Simulator
add rd = rs, rt
  ALU_read (rs, rt);
  ALU_add ();
  Update_flags ();
  writeback (rd);
mul rd = rs, rt
  MUL_read (rs, rt);
  MUL_add ();
  Update_flags ();
  writeback (rd);
ld rd = @
  LSU_addrgen();
  data_bus.req();
  data_bus.read();
  writeback (rd);
st @ = rs
  LSU_addrgen();
  LSU_read(rs);
  data_bus.req();
  data_bus.write(rs);
Simulator
Machine Code: add r5 = r2, r1
Simulation Code (C):
  ALU_read (r2, r1);
  ALU_add ();
  Update_flags ();
  writeback (r5);
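The simulator's view can be sketched as expanding each decoded instruction into its sequence of actions. The sketch below only produces the action strings shown on the slide for `add`; the `MICRO_OPS` table and the `expand` helper are illustrative assumptions, not part of any generated simulator:

```python
import re

# Action sequence per mnemonic, transcribed from the slide for "add".
# The table layout is an assumption made for this sketch.
MICRO_OPS = {
    "add": ["ALU_read ({rs}, {rt});", "ALU_add ();", "Update_flags ();", "writeback ({rd});"],
}

INSTRUCTION = re.compile(r"\s*(\w+)\s+(\w+)\s*=\s*(\w+)\s*,\s*(\w+)\s*")


def expand(instruction: str) -> list[str]:
    """Expand e.g. 'add r5 = r2, r1' into its sequence of simulation actions."""
    match = INSTRUCTION.fullmatch(instruction)
    if match is None:
        raise ValueError(f"cannot decode: {instruction!r}")
    mnemonic, rd, rs, rt = match.groups()
    if mnemonic not in MICRO_OPS:
        raise ValueError(f"unknown mnemonic: {mnemonic!r}")
    return [step.format(rd=rd, rs=rs, rt=rt) for step in MICRO_OPS[mnemonic]]


for action in expand("add r5 = r2, r1"):
    print(action)
```

Unlike the compiler, the simulator needs the full "how": which units are read, when flags change, and where the result is written back.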
![Page 22: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/22.jpg)
ADL Model
[Figure: a single ADL model serves both the C compiler and the simulator]
SYNTAX {
  "ADD" dst, src1, src2
}
CODING {
  0b0010 dst src1 src2
}
BEHAVIOR {
  ALU_read (src1, src2);
  ALU_add ();
  Update_flags ();
  writeback (dst);
}
SEMANTICS {
  src1 + src2 → dst;
}
C Compiler (uses SEMANTICS; tree pattern + (rs, rt) → rd):
  a = b + c;
  add a = b, c
Simulator (uses BEHAVIOR):
  add rd = rs, rt
    ALU_read (rs, rt);
    ALU_add ();
    Update_flags ();
    writeback (rd);
  add r5 = r2, r1
    ALU_read (r2, r1);
    ALU_add ();
    Update_flags ();
    writeback (r5);
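The idea of one model feeding several tools can be sketched as a single record whose syntax and coding drive an assembler while its behavior drives a simulator. Only the opcode `0b0010` and the operand order come from the slide; the 4-bit register fields, the 16-entry register file and all function names are assumptions of this sketch, and flag updates are omitted:

```python
REG_BITS = 4  # assumed field width; the slide does not specify it


def add_behavior(regs, dst, src1, src2):
    """Behavior of ADD: regs[dst] = regs[src1] + regs[src2] (no flags modeled)."""
    regs[dst] = regs[src1] + regs[src2]


# One model entry: syntax and coding for the assembler, behavior for the simulator.
ADD_MODEL = {"syntax": "ADD", "opcode": 0b0010, "behavior": add_behavior}


def assemble(model, line):
    """Encode 'ADD r5, r2, r1' as opcode | dst | src1 | src2."""
    mnemonic, operands = line.split(None, 1)
    if mnemonic != model["syntax"]:
        raise ValueError(f"unexpected mnemonic: {mnemonic!r}")
    fields = [int(tok.strip().lstrip("Rr")) for tok in operands.split(",")]
    if len(fields) != 3 or any(not 0 <= f < (1 << REG_BITS) for f in fields):
        raise ValueError(f"bad operands: {operands!r}")
    word = model["opcode"]
    for field in fields:
        word = (word << REG_BITS) | field
    return word


def execute(model, word, regs):
    """Decode the word with the same field layout and run the model's behavior."""
    mask = (1 << REG_BITS) - 1
    src2 = word & mask
    src1 = (word >> REG_BITS) & mask
    dst = (word >> 2 * REG_BITS) & mask
    if word >> 3 * REG_BITS != model["opcode"]:
        raise ValueError("opcode does not match model")
    model["behavior"](regs, dst, src1, src2)


regs = [0] * 16
regs[1], regs[2] = 3, 4
word = assemble(ADD_MODEL, "ADD r5, r2, r1")
execute(ADD_MODEL, word, regs)
print(hex(word), regs[5])  # 0x2521 7
```

Because both tools read the same record, changing the opcode or operand order in one place keeps assembler and simulator consistent, which is the redundancy-free property the slides argue for.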
![Page 23: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/23.jpg)
Problem Statement
• Compiler and Simulator need different information:
  • Compiler: C operation to instruction(s)
    WHAT is the instruction good for? Purpose?
  • Simulator: instructions to sequence of operations
    HOW is the instruction executed? What actions to perform?
• Architecture Designer's Perspective:
  ?????????
  src1 + src2 → dst;
  ALU_read (src1, src2); ALU_add (); Update_flags (); writeback (dst);
![Page 24: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/24.jpg)
Examples
![Page 25: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/25.jpg)
ASDSP FPGA Implementation
ASDSP Core Design
• Support the Special Instruction Set for FFT Operation and the BMU Instruction
• Improve the Performance for OFDM Communication
SEC 0.18 µm Synthesis
• Gate: 77,000
• Program Memory: 4 Kbyte, Data Memory: 8 Kbyte
• Frequency: 290 MHz
• Power consumption: 0.87 W (3 mW/MHz)
FPGA Implementation
• iProve Xilinx xc2v6000
Myjung Sunwoo, Ajiou University
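The quoted 3 mW/MHz follows directly from the power and frequency figures on this slide. A small check, with a hypothetical helper name:

```python
def mw_per_mhz(power_watts: float, freq_mhz: float) -> float:
    """Normalize power to clock frequency, in mW/MHz (numerically equal to nJ per cycle)."""
    return power_watts * 1000.0 / freq_mhz


# Figures from the slide: 0.87 W at 290 MHz.
print(round(mw_per_mhz(0.87, 290), 2))  # 3.0
```

Since 1 mW/MHz equals 1 nJ per clock cycle, the figure can also be read as about 3 nJ per cycle.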
![Page 26: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/26.jpg)
The ICORE
A low-power ASIP for Infineon DVB-T 2nd generation Single-Chip Receiver:
• ASIP for DVB-T acquisition and tracking algorithms (sampling-clock-synchronization, interpolation / decimation, carrier frequency offset estimation)
• Harvard Architecture
• 60 mostly RISC-like Instructions & Special Instructions for CORDIC-Algorithm
• 8x32-Bit General Purpose Registers, 4x9-Bit Address Registers
• 2048x20-Bit Instruction ROM, 512x32-Bit Data Memory
• I2C Registers and dedicated interfaces for external communication
![Page 27: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/27.jpg)
Increasing SW Content - but How?
The Motorola M68HC11 Architecture
![Page 28: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/28.jpg)
Architecture Overview
M68HC11 CPU Architecture:
» 8-bit micro-controller
» Harvard Architecture
» 7 CPU Registers
» 6 different Addressing Modes
» Shared data and program bus
» Instruction width: 8, 16, 24, 32, 40
» 8-bit opcode: 181 instructions
» Clock speed: ~200 MHz
» Performance:
» Area: 15K to 30K (DesignWare® Library)
Hot spots: stalled data access, multi-cycle fetch, non-pipelined
![Page 29: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/29.jpg)
Architecture Development with LISA
[Figure: pipelined M68HC11-compatible architecture]
Pipeline stages: FE, DC, EX
Memory map (0x0000 to 0x10000): 512 Bytes int. RAM, 64 Bytes Conf. Reg., 3.5K ext. RAM, 61K ext. RAM
Bus widths shown: 16, 32
Registers: ACCU (Accu A, Accu B), Index X, Index Y, Stack Pointer, Condition
+ pipelined architecture
+ separate program and data bus
![Page 30: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/30.jpg)
Results
• Area < 23k gates

• Clock speed ~ 200 MHz

• Execution time speed-up of 62 % for spanning tree application

• Mapped onto Xilinx FPGA
![Page 31: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/31.jpg)
Architecture Development with LISA
• Studying the architecture: 4 days
• Basic architecture modifications: 2 days
• Grouping and coding of the instructions: 1 day
• Writing the LISA model
  - basic syntax and coding: 4 days
  - behavior section: 6 days
• Validation: 4 days
• HDL Generation: 2 days
Total: 23 days
![Page 32: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/32.jpg)
Institute for Integrated Signal Processing Systems
Design of Application Specific Processor Architectures
Rainer Leupers
RWTH Aachen University
Software for Systems on [email protected]
![Page 33: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/33.jpg)
2005 © R. Leupers
Overview
1. Introduction
2. ASIP design methodologies
3. Software tools
4. ASIP architecture design
5. Case study
6. Advanced research topics
![Page 34: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/34.jpg)
1. Introduction
![Page 35: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/35.jpg)
Embedded system design automation
Embedded systems
• Special-purpose electronic devices
• Very different from desktop computers
Strength of European IT market
• Telecom, consumer, automotive, medical, ...
• Siemens, Nokia, Bosch, Infineon, ...
New design requirements
• Low NRE cost, high efficiency requirements
• Real-time operation, dependability
• Keep pace with Moore's Law
![Page 36: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/36.jpg)
What to do with chip area ?
![Page 37: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/37.jpg)
Example: wireless multimedia terminals
Multistandard radio
• UMTS
• GSM/GPRS/EDGE
• WLAN
• Bluetooth
• UWB
• …
Multimedia standards
• MPEG-4
• MP3
• AAC
• GPS
• DVB-H
• …
Key issues:
• Time to market (≤ 12 months)
• Flexibility (ongoing standard updates)
• Efficiency (battery operation)
![Page 38: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/38.jpg)
Application specific processors (ASIPs)
"As the performance of conventional microprocessors improves, they first meet and then exceed the requirements of most computing applications. Initially, performance is key. But eventually, other factors, like customization, become more important to the customer..."
[M.J. Bass, C.M. Christensen: The Future of the Microprocessor Business, IEEE Spectrum 2002]
design budget = (semiconductor revenue) × (% for R&D)
  semiconductor revenue: growth ≈ 15%; % for R&D: ≈ 10%
# IC designs = (design budget) / (design cost per IC)
  design budget: growth ≈ 15%; design cost per IC: growth ≈ 50-100%
[Keutzer05]
→ Customizable application specific processors as reusable, programmable platforms
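The consequence of the two relations can be worked through numerically. The sketch assumes the growth annotations are read as: design budget growing by about 15% and design cost per IC growing by 50-100% over the same period; that reading, the period, and the helper name are assumptions, not statements from the slide:

```python
def design_count_growth(budget_growth: float, cost_growth: float) -> float:
    """Relative change in '# IC designs' = budget / cost when both terms grow."""
    return (1.0 + budget_growth) / (1.0 + cost_growth) - 1.0


# Assumed reading of the slide: budget +15%, cost per IC +50% to +100%.
for cost_growth in (0.5, 1.0):
    change = design_count_growth(0.15, cost_growth)
    print(f"cost per IC +{cost_growth:.0%}: number of designs {change:+.1%}")
```

Under these assumptions the number of affordable IC designs shrinks each period, which is the argument for reusable, programmable platforms.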
![Page 39: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/39.jpg)
Efficiency and flexibility
Source: T. Noll, RWTH Aachen
[Figure: design styles plotted against log flexibility, log performance and log power dissipation; ranges marked on the axes: 10^3 ... 10^4 and 10^5 ... 10^6]
Design styles, from SW design to HW design:
• General Purpose Processors
• Digital Signal Processors
• Application Specific Instruction Set Processors
• Field Programmable Devices
• Application Specific ICs
• Physically Optimized ICs
Why use ASIPs?
• Higher efficiency for given range of applications
• IP protection
• Cost reduction (no royalties)
• Product differentiation
![Page 40: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/40.jpg)
2. ASIP design methodologies
![Page 41: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/41.jpg)
ASIP architecture exploration
[Figure: iterative exploration loop leading from an initial processor architecture to an optimized processor architecture]
Tools in the loop, run on the Application: Compiler, Assembler, Linker, Simulator, Profiler
![Page 42: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/42.jpg)
Expression (UC Irvine)
![Page 43: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/43.jpg)
Tensilica Xtensa/XPRES
Source: Tensilica Inc.
![Page 44: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/44.jpg)
MIPS CorExtend/CoWare CorXpert
[Figure: CorExtend Module attached to the processor core, implementing a User Defined Instruction]
1. Profile and identify custom instructions (Hotspot)
2. Replace critical code with special instruction
3. Synthesize HW and profile with MIPSsim and extensions
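Steps 1 and 2 can be sketched with a toy profile: find the function that dominates the cycle count, then estimate the overall gain if a special instruction speeds up only that hot spot. The trace data, function names and speed-up factor are hypothetical; the estimate is a plain Amdahl-style calculation, not a CorXpert or MIPSsim feature:

```python
from collections import Counter


def find_hotspot(trace):
    """Return (function, share of total cycles) for the most expensive function."""
    totals = Counter()
    for name, cycles in trace:
        totals[name] += cycles
    name, cycles = totals.most_common(1)[0]
    return name, cycles / sum(totals.values())


def estimated_speedup(trace, hotspot, factor):
    """Overall speed-up if only the hotspot runs `factor` times faster."""
    total = sum(cycles for _, cycles in trace)
    hot = sum(cycles for name, cycles in trace if name == hotspot)
    return total / (total - hot + hot / factor)


# Hypothetical profile: (function name, cycles spent in one call).
trace = [("fir", 600), ("ctrl", 100), ("fir", 200), ("io", 100)]
hotspot, share = find_hotspot(trace)
print(hotspot, share)                        # fir 0.8
print(estimated_speedup(trace, hotspot, 4))  # 2.5
```

The remaining 20% of cycles bounds the achievable gain, which is why step 3 profiles again after the hardware extension is added.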
![Page 45: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/45.jpg)
CoWare LISATek ASIP architecture exploration
• Integrated embedded processor development environment
• Unified processor model in LISA 2.0 architecture description language (ADL)
• Automatic generation of:
  • SW tools
  • HW models
![Page 46: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/46.jpg)
LISA operation hierarchy
[Figure: operation tree, listed here level by level from the root]
main
decode
addr cond opcode opnds
imm linear cycl control arithm move short long
add sub mul and or
Reflects hierarchical organization of ISAs
![Page 47: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/47.jpg)
LISA operations structure
Sections of a LISA operation:
• DECLARE: References to other operations
• CODING: Binary coding
• SYNTAX: Assembly syntax
• BEHAVIOR: Computation and processor state update
• EXPRESSION: Resource access, e.g. registers
• ACTIVATION: Initiate "downstream" operations in pipe
• SEMANTICS: C compiler generation
![Page 48: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/48.jpg)
202005 © R. Leupers
LISA operation example
    OPERATION ADD
    {
      DECLARE { GROUP src1, src2, dest = { Register } }
      CODING { 0b1011 src1 src2 dest }
      SYNTAX { "ADD" dest "," src1 "," src2 }
      BEHAVIOR { dest = src1 + src2; }
    }

    OPERATION Register
    {
      DECLARE { LABEL index; }
      CODING { index }
      SYNTAX { "R" index }
      EXPRESSION { R[index] }
    }
[Figure: the BEHAVIOR section contains C/C++ code. Operation ADD references three Register operations through its group members src1, src2 and dest.]
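How the CODING and SYNTAX sections of the ADD example tie the binary view and the assembly view of one instruction together can be sketched in Python. This is an illustration only, not code generated by LISATek: the 4-bit width of a register index and the function names `assemble_add` and `disassemble_add` are assumptions that do not appear in the slides.

```python
import re

OPCODE_ADD = 0b1011   # from CODING { 0b1011 src1 src2 dest }
REG_BITS = 4          # assumption: each register index occupies 4 bits

ADD_SYNTAX = re.compile(r"ADD\s+R(\d+)\s*,\s*R(\d+)\s*,\s*R(\d+)")


def assemble_add(text: str) -> int:
    """Translate 'ADD Rdest, Rsrc1, Rsrc2' (SYNTAX order) into a binary word."""
    match = ADD_SYNTAX.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an ADD instruction: {text!r}")
    dest, src1, src2 = (int(group) for group in match.groups())
    for index in (dest, src1, src2):
        if index >= 1 << REG_BITS:
            raise ValueError(f"register index out of range: R{index}")
    # Field order follows CODING: opcode, src1, src2, dest.
    word = OPCODE_ADD
    for field in (src1, src2, dest):
        word = (word << REG_BITS) | field
    return word


def disassemble_add(word: int) -> str:
    """Recover the assembly text from a word produced by assemble_add."""
    mask = (1 << REG_BITS) - 1
    dest = word & mask
    src2 = (word >> REG_BITS) & mask
    src1 = (word >> (2 * REG_BITS)) & mask
    opcode = word >> (3 * REG_BITS)
    if opcode != OPCODE_ADD:
        raise ValueError(f"opcode {opcode:#b} is not ADD")
    return f"ADD R{dest}, R{src1}, R{src2}"
```

For example, `assemble_add("ADD R3, R1, R2")` packs the opcode followed by the fields 1, 2 and 3, and `disassemble_add` reverses the packing. Both directions are derived from the same two sections, which is the point of a unified processor model.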
![Page 49: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/49.jpg)
Exploration/debugger GUI
• Application simulation
• Debugging
• Profiling
• Resource utilization analysis
• Pipeline analysis
• Processor model debugging
• Memory hierarchy exploration
• Code coverage analysis
• ...
![Page 50: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/50.jpg)
Some available LISA 2.0 models
• DSP:
– Texas Instruments TMS320C54x
– Analog Devices ADSP21xx
– Motorola 56000
• RISC:
– MIPS32 4K
– ESA LEON SPARC 8
– ARM7100
– ARM926
• VLIW:
– Texas Instruments TMS320C6x
– STMicroelectronics ST220
• µC:
– MHS 80C51
• ASIP:
– Infineon PP32 NPU
– Infineon ICore
– MorphICs DSP
![Page 51: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/51.jpg)
3. Software tools
![Page 52: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/52.jpg)
Tools generated from processor ADL model
[Figure: the application is processed by the tools generated from the ADL model: compiler, assembler, linker, simulator and profiler.]
![Page 53: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/53.jpg)
Instruction set simulation
Interpretive:
• flexible
• slow (~100 KIPS)
[Figure: at run time, each instruction of the application is read from memory, decoded and executed.]

Compiled:
• fast (> 10 MIPS)
• inflexible
• high memory consumption
[Figure: at compile time, a simulation compiler translates the application into instruction behaviors held in program memory; at run time, these behaviors are only executed.]

JIT-CCS™:
• "just-in-time" compiled
• SW simulation cache
• fast and flexible
[Figure: at run time, instructions from program memory are decoded, their instruction behaviors are stored in the compiled simulation cache, and execution takes them from the cache.]
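The JIT-CCS idea described above, decoding an instruction once at run time and reusing the decoded behavior from a simulation cache afterwards, can be sketched in Python. This is a toy model under stated assumptions, not the LISATek implementation: the 8-bit instruction format, the four opcodes, the unbounded cache and the names `decode` and `JitCachedSimulator` are all invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]              # toy processor state
Behavior = Callable[[State], None]  # compiled behavior of one instruction


def decode(word: int) -> Behavior:
    """Decode a toy 8-bit word: high nibble = opcode, low nibble = operand."""
    opcode, imm = word >> 4, word & 0xF
    if opcode == 0x1:                       # ADDI imm: acc += imm
        def behavior(state: State) -> None:
            state["acc"] += imm
            state["pc"] += 1
    elif opcode == 0x2:                     # DEC: cnt -= 1
        def behavior(state: State) -> None:
            state["cnt"] -= 1
            state["pc"] += 1
    elif opcode == 0x3:                     # BNZ target: branch while cnt != 0
        def behavior(state: State) -> None:
            state["pc"] = imm if state["cnt"] != 0 else state["pc"] + 1
    elif opcode == 0x0:                     # HALT
        def behavior(state: State) -> None:
            state["halted"] = 1
    else:
        raise ValueError(f"unknown opcode {opcode:#x}")
    return behavior


class JitCachedSimulator:
    def __init__(self, program: List[int]) -> None:
        self.program = list(program)
        self.cache: Dict[int, Tuple[int, Behavior]] = {}  # pc -> (word, behavior)
        self.decode_count = 0

    def step(self, state: State) -> None:
        pc = state["pc"]
        word = self.program[pc]
        entry = self.cache.get(pc)
        # Decode on a cache miss, or when program memory changed at this
        # address; the second check is what keeps the scheme flexible.
        if entry is None or entry[0] != word:
            self.decode_count += 1
            entry = (word, decode(word))
            self.cache[pc] = entry
        entry[1](state)

    def run(self, state: State) -> int:
        steps = 0
        while not state["halted"]:
            self.step(state)
            steps += 1
        return steps


# Loop body ADDI 2; DEC; BNZ 0 runs five times, then HALT.
program = [0x12, 0x20, 0x30, 0x00]
state = {"acc": 0, "cnt": 5, "pc": 0, "halted": 0}
sim = JitCachedSimulator(program)
steps = sim.run(state)
```

In this run the simulator executes 16 instructions but decodes only 4, one per program address. An interpretive simulator would decode on every step, while a fully compiled one would decode everything before run time and could not react to program memory changes.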
![Page 54: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/54.jpg)
JIT-CC simulation performance
[Figure: performance [MIPS] (axis 0 to 9) and cache miss ratio [%] (axis 0 to 100) over cache size [records]. Horizontal axis: compiled, interpretive, then JIT-CC cache sizes 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768.]
• Dependent on simulation cache size
• 95% of compiled simulation performance @ 4096 cache blocks (10% memory consumption of compiled sim.)
• Example: ST200 VLIW DSP
![Page 55: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/55.jpg)
Why care about C compilers?
• Embedded SW design becoming predominant manpower factor in system design
• Cannot develop/maintain millions of code lines in assembly language
• Move to high-level programming languages
![Page 56: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/56.jpg)
Why care about compilers?
• Trend towards heterogeneous multiprocessor systems-on-chip (MPSoC)
• Customized application specific instruction set processors (ASIPs) are key MPSoC components
• How to achieve efficient compiler support for ASIPs?
[Figure: heterogeneous MPSoC built from ASIC, CPU and ASIP blocks with several memories.]
![Page 57: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/57.jpg)
C compiler in the exploration loop
"Compiler/Architecture Co-Design"
• Efficient C compilers cannot be designed for ARBITRARY architectures!
[Figure: exploration loop: application software → compiler → processor → results.]
• Compiler and processor form a UNIT that needs to be optimized!
• "Compiler-friendliness" needs to be taken into account during the architecture exploration!
![Page 58: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/58.jpg)
Retargetable compilers
[Figure: Classical compiler: source code → compiler → asm code, with the processor model built into the compiler. Retargetable compiler: source code → compiler → asm code, with the processor model as a separate input to the compiler.]
![Page 59: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/59.jpg)
GNU C compiler (gcc)
• Probably the most widespread retargetable compiler
• Mostly used as a native Unix/Linux compiler, but may operate as a cross-compiler, too
• Support for C/C++, Java, and other languages
• Comes with comprehensive support software, e.g. runtime and standard libraries, debug support
• Portable to new architectures by means of machine description file and C support routines
“The main goal of GCC was to make a good, fast compiler for
machines in the class that the GNU system aims to run on: 32-bit
machines that address 8-bit bytes and have several general registers.
Elegance, theoretical power and simplicity are only secondary.”
![Page 60: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/60.jpg)
CoSy compiler system (ACE)
© ACE - Associated Compiler Experts
• Universal retargetable C/C++ compiler
• Extensible intermediate representation (IR)
• Modular compiler organization
• Generator (BEG) for code selector, register allocator, scheduler
![Page 61: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/61.jpg)
LISATek C compiler generation
[Figure: LISA processor model → automatic analyses and manual refinement (GUI) → CoSy system → C compiler.]

LISA processor model (excerpt):

    SYNTAX { "ADD" dst, src1, src2 }
    CODING { 0b0010 dst src1 src2 }
    BEHAVIOR {
      ALU_read (src1, src2);
      ALU_add ();
      Update_flags ();
      writeback (dst);
    }
    SEMANTICS { src1 + src2 → dst; }
    …
![Page 62: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/62.jpg)
LISATek compiler generation
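[Figure: compiler structure Frontend → Opt → Backend. The backend consists of code selector, register allocator and scheduler. The processor model shows a pipeline (FE, DE, EX, WB) with instruction fetch, decoder, ALU, registers, jump logic, write-back, pipeline control, Mem, DataRAM and ProgRAM. Annotations taken from the model: instruction patterns (ADD … R[i] … #1), register file R[0..31], instruction list (JMP, ADD, SUB, MUL) and table entries (JMP 2 1, ADD 2 3).]

C code:

    int a,b,c;
    a = b+1;
    c = a<<3;
    …

ASM code:

    LD R1, [R2]
    ADD R1, #1
    SHL R1, #3
    …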
![Page 63: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/63.jpg)
Compiled code quality: MIPS example
LISATek generated C compiler:
• Out-of-the-box C compiler
• No manual optimizations
• Development time of model approx. 2 weeks

gcc C compiler:
• gcc with MIPS32 4kc backend
• Used by most MIPS users
• Large group of developers, several man-years of optimization

[Figure: two bar charts comparing gcc -O4, gcc -O2, cosy -O4 and cosy -O2. Cycles: axis 0 to 140,000,000. Size: axis 0 to 80,000.]

Overhead of 10% in cycle count and 17% in code density
![Page 64: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/64.jpg)
Demands on code quality
Compilers for embedded processors have to generate extremely efficient code
• Code size:
» system-on-chip
» on-chip RAM/ROM
• Performance:
» real-time constraints
• Power/energy consumption:
» heat dissipation
» battery lifetime
![Page 65: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/65.jpg)
Compiler flexibility/code quality trade-off
[Figure: variety of embedded processors (DSP, NPU, VLIW). Specialization leads to dedicated optimization techniques; unification leads to retargetable compilation.]
![Page 66: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/66.jpg)
Adding processor-specific code optimizations
• High-level (compiler IR): enabled by CoSy's engine concept
• Low-level (ASM):
[Figure: .C → LISA C compiler → unscheduled .asm → Optimization 1, Optimization 2, Optimization 3 (through the Assembly API) → scheduled & optimized .asm → binary code generation (assembler, linker) → .out]
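A low-level optimization of the kind placed between the compiler's assembly output and the assembler can be sketched as a chain of passes over assembly lines. The slides do not show the Assembly API, so everything here is a hypothetical stand-in: the pass signature, the names `fold_add_immediates`, `drop_add_zero` and `run_passes`, and the two peephole rules themselves. The sketch also assumes that the folded immediate still fits the instruction and that no intermediate flags are observed.

```python
import re
from typing import Callable, List

AsmPass = Callable[[List[str]], List[str]]

# Matches lines such as "ADD R1, #1" (syntax as in the ASM code on the slides).
ADD_IMM = re.compile(r"ADD\s+(R\d+)\s*,\s*#(-?\d+)")


def fold_add_immediates(lines: List[str]) -> List[str]:
    """Merge directly adjacent immediate ADDs on the same register."""
    out: List[str] = []
    for line in lines:
        cur = ADD_IMM.fullmatch(line.strip())
        prev = ADD_IMM.fullmatch(out[-1].strip()) if out else None
        if cur and prev and cur.group(1) == prev.group(1):
            total = int(prev.group(2)) + int(cur.group(2))
            out[-1] = f"ADD {cur.group(1)}, #{total}"
        else:
            out.append(line)
    return out


def drop_add_zero(lines: List[str]) -> List[str]:
    """Remove immediate ADDs that add zero."""
    kept: List[str] = []
    for line in lines:
        match = ADD_IMM.fullmatch(line.strip())
        if match and int(match.group(2)) == 0:
            continue
        kept.append(line)
    return kept


def run_passes(lines: List[str], passes: List[AsmPass]) -> List[str]:
    """Apply the optimization passes one after another."""
    for asm_pass in passes:
        lines = asm_pass(lines)
    return lines


unscheduled = ["LD R1, [R2]", "ADD R1, #1", "ADD R1, #2", "ADD R3, #0", "SHL R1, #3"]
optimized = run_passes(unscheduled, [fold_add_immediates, drop_add_zero])
```

Keeping each optimization as an independent pass mirrors the Optimization 1, 2, 3 chain in the figure: passes can be added, removed or reordered without touching the compiler or the assembler.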
![Page 67: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/67.jpg)
4. ASIP architecture design
![Page 68: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/68.jpg)
ASIP implementation after exploration
![Page 69: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/69.jpg)
Unified Description Layer
[Figure: abstraction levels from top to bottom: LISA → HDL generation → register-transfer level → gate-level synthesis (e.g. Synopsys Design Compiler) → gate level.]
![Page 70: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/70.jpg)
Challenges in Automated ASIP Implementation
ADL: independent description of instruction behavior
[Figure: instructions grouped into arithmetic (MUL, MAC) and control (JMP, BRC).]
+ Efficient design space exploration

HDL: independent mapping to hardware blocks
[Figure: 1:1 mapping from ADL to HDL yields one multiplier for MUL and another multiplier for MAC.]
- Insufficient architectural efficiency by 1:1 mapping
![Page 71: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/71.jpg)
Unified Description Layer
[Figure: abstraction levels from top to bottom: LISA → Unified Description Layer (structure & mapping incl. JTAG/DEBUG; optimizations; backend for VHDL, Verilog, SystemC) → register-transfer level → gate-level synthesis (e.g. Synopsys Design Compiler) → gate level.]
![Page 72: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/72.jpg)
Optimization strategies
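• LISA: separate descriptions for separate instructions
• Goal: share hardware for separate instructions
• Possible optimizations:
– ALU sharing
[Figure: LISA operation A (instruction A) computes x = a + b and LISA operation B (instruction B) computes y = c + d, each with its own adder. Because A and B are mutually exclusive, one adder with multiplexed inputs (a or c, b or d) produces x, y.]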
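The ALU sharing on this slide can be modeled in a few lines of Python. This is a behavioral sketch, not generated HDL: the function names and the `select_b` control signal are assumptions. The sharing is only valid because instructions A and B are mutually exclusive, so at most one of the results x and y is needed in any cycle.

```python
from typing import Tuple


def unshared(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """1:1 mapping: operation A and operation B each get their own adder."""
    x = a + b   # adder of LISA operation A
    y = c + d   # adder of LISA operation B
    return x, y


def shared_adder(select_b: bool, a: int, b: int, c: int, d: int) -> int:
    """Shared mapping: two input multiplexers feed a single adder."""
    left = c if select_b else a    # multiplexer on the first operand
    right = d if select_b else b   # multiplexer on the second operand
    return left + right            # the only adder; its output is x or y
```

With `select_b` false the shared adder returns the same value as x in the unshared version, and with `select_b` true the same value as y. The trade is one adder saved against two multiplexers and one select signal added.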
![Page 73: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/73.jpg)
Optimization strategies
• LISA: separate descriptions for separate instructions
• Goal: same hardware for separate instructions
• Possible optimizations:
– ALU sharing
– Path sharing
– ...
[Figure: LISA operation A (instruction A) and LISA operation B (instruction B) access the register array over separate paths, path PA with AddressA and DataA, path PB with AddressB and DataB. Given mutual exclusiveness, resource sharing merges them into one path to the register array that carries DataA, DataB and selects between AddressA and AddressB.]
![Page 74: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/74.jpg)
5. Case study
![Page 75: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/75.jpg)
Motorola 6811
Project Goals:
• Performance (MIPS) must be increased
• Compatibility on the assembly level for reuse of legacy code (integration into existing tool flow)
• Royalty free design
compatible architecture developed with LISA using RTL processor synthesis
![Page 76: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/76.jpg)
Motorola 6811
[Figure: legacy code → compiler → assembly → assembler → binary code for the 6811/6812; the target architecture that should replace them is still open (?).]
Increase performance (MIPS)!
![Page 77: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/77.jpg)
Motorola 6811
[Figure: Bluetooth app. → 6811 compiler → assembly → assembler → binary code, executed on the architecture synthesized from LISA, which is assembly level compatible.]
![Page 78: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/78.jpg)
Architecture Development
Original 6811 processor:
• 8, 16, 24, 32 and 40 bit instructions
• Instruction is fetched in 8 bit blocks: up to 5 cycles for fetching!

LISA 6811 processor:
• 16 and 32 bit instructions
• 16 bit are fetched simultaneously: max 2 cycles for fetching!
+ pipelined architecture
+ possibility for special instructions
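The fetch cycle counts quoted for the two processors follow from dividing the instruction width by the fetch width and rounding up. The short Python check below assumes one fetch per cycle; the function name `fetch_cycles` is made up for this sketch.

```python
def fetch_cycles(instruction_bits: int, fetch_width_bits: int) -> int:
    """Cycles needed to fetch one instruction, assuming one fetch per cycle."""
    return -(-instruction_bits // fetch_width_bits)  # ceiling division


# Original 6811: 8 to 40 bit instructions, fetched in 8 bit blocks.
original = [fetch_cycles(bits, 8) for bits in (8, 16, 24, 32, 40)]

# LISA 6811: 16 and 32 bit instructions, 16 bit fetched at once.
lisa = [fetch_cycles(bits, 16) for bits in (16, 32)]
```

The worst case drops from 5 fetch cycles (40 bit instruction over an 8 bit fetch) to 2 (32 bit instruction over a 16 bit fetch).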
![Page 79: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/79.jpg)
Tools Flow and RTL Processor Synthesis
[Figure: C application → 6811 compiler → assembly → LISA assembler → executable. The LISA assembler and the other LISA tools are generated from the LISA model.]
• 6811 compatible architecture generated completely in VHDL
1) VLSI implementation:
– Area: < 17 kGates
– Clock speed: ~154 MHz
2) Mapped onto XILINX FPGA
![Page 80: Designing Programmable Platforms: From ASIC to ASIPflavio/ensino/cmp237/aula20.pdf · Current Work Evaluation Results Chip Area, Clock Speed, Power Consumption SystemC, VHDL, Verilog](https://reader034.vdocuments.us/reader034/viewer/2022051919/600b698362d336617e0b0b05/html5/thumbnails/80.jpg)
References
• R. Leupers: Code Optimization Techniques for Embedded Processors - Methods, Algorithms, and Tools, Kluwer, 2000
• R. Leupers, P. Marwedel: Retargetable Compiler Technology for Embedded Systems - Tools and Applications, Kluwer, 2001
• A. Hoffmann, H. Meyr, R. Leupers: Architecture Exploration for Embedded Processors with LISA, Kluwer, 2002
• C. Rowen, S. Leibson: Engineering the Complex SoC: Fast, Flexible Design with Configurable Processors, Prentice Hall, 2004
• M. Gries, K. Keutzer, et al.: Building ASIPs: The Mescal Methodology, Springer, 2005
• P. Ienne, R. Leupers (eds.): Customizable and Configurable Embedded Processor Cores, Morgan Kaufmann, to appear 2006