1
MESCAL: Design Support for Embedded Processors and Applications
Prof. Kurt Keutzer and the MESCAL team, UC Berkeley
2
When We Got Started
[Figure: IC designs per year, 1995 to 1998, ASIC vs. ASSP. Source: Handel Jones, IBS, 9/23/2002]
3
More Trouble for ASICs
• DSM effects
• Complexity
• Heterogeneity
• Time-to-money
Exponentially more complex, greater design risk, greater variety, and a smaller design window!
Quadruple-Whammy
4
Today’s Environment
• Unprecedented desire for product differentiation using per-application silicon, but …
• ASIC design becoming expensive and unpredictable
– Increasing device complexity
– Deep sub-micron effects: interconnect delay, noise
– Design heterogeneity: analog, digital, processors, memory
– Increasing time-to-market pressure
5
The result: total IC Designs
[Figure: total IC designs per year (0 to 10,000), 1995 to 2007, ASIC vs. ASSP. Source: Handel Jones, IBS, 9/23/2002]
6
Solution: ASIC => ASSP => ASIP
ASIP: Programmable Platforms
• Develop platforms that allow for amortization of design costs over multiple generations
• Make platforms programmable so that they have maximum flexibility with minimum overhead
The MESCAL Mission:
– To bring a disciplined methodology, and a supporting tool set, to the development, deployment and programming of application-specific programmable platforms, a.k.a. ASIPs
Invited paper: "From ASIC to ASIP: The Next Design Discontinuity", K. Keutzer, S. Malik, R. Newton, Proceedings of ICCD, pp. 84-91, 2002.
Press coverage, Sept 2002:
Programmable Platforms will Rule: http://www.eetimes.com/story/OEG20020911S0063
High on MESCAL: http://www.eetimes.com/story/OEG20020911S0065
[Figure: platform block diagram. Labels: SDRAM controller, SRAM controller, PCI interface, IX bus interface, StrongARM core with I$, D$, and mini D$, six µengines, hash engine, scratchpad SRAM]
7
The New Design Target
• Explosion of ASIP programmable platforms
– Diverse types of processing elements
– Diverse communications architecture
– Multiple memories
– Peripherals
[Figure: Intel IXP1200, annotated with ARM core, µEngines, buses, Ethernet MACs, and RAM]
8
A Discipline of Programmable Platform Design
1. Develop a disciplined approach to selecting application benchmarks
2. Develop a disciplined approach to identifying the architectural/micro-architectural design-space to be explored
3. Develop a convenient and comprehensive environment for the description, simulation, and analysis of potential architectural platforms within the design space
4. Develop an environment to efficiently explore and evaluate the design space of architectural platforms
10
Step 1: Disciplined Approach to Benchmarking
• The primary goals of (network processor) benchmarks
– The chosen suite of benchmarks should be
  • Representative
  • Easy to specify
  • Consist of a manageable number of benchmarks
– Enable quantitative comparison of architectures
• Developed three benchmark specifications
– IPv4 Packet Forwarding
– Network Address Port Translation
– Multiprotocol Label Switching (MPLS)
• Implemented benchmarks on the Intel IXP1200 in assembler, C, Click, and a commercial environment (Teja)
• M. Tsai, C. Kulkarni, C. Sauer, N. Shah, K. Keutzer, “A Benchmarking Methodology for Network Processors,” First Workshop on Network Processors at the 8th International Symposium on High Performance Computer Architecture (HPCA8), Cambridge MA, USA, February 2002.
12
Charted the Architectural Diversity of NPUs
Surveyed over 30 network processor platforms
[Figure: scatter plot of network processors by issue width per PE (x axis, 0 to 9) and number of PEs (y axis, 0 to 64), with iso-throughput curves at 8, 16, and 64 instrs/cycle. Vendors plotted: Cognigine, Cisco, EZchip, Xelerated, IBM, Lexra, Motorola, Intel, BRECIS, Broadcom, AppliedMicro, Clearwater, ClearSpeed, Vitesse, Agere, PMC-Sierra, Alchemy, Conexant]
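The iso-curves on this chart (8, 16, and 64 instrs/cycle) follow from multiplying its two axes: peak instructions per cycle is the number of PEs times the issue width per PE. A minimal Python sketch of that arithmetic; the design points below are made up for illustration and are not read off the chart.

```python
def peak_instrs_per_cycle(num_pes: int, issue_width: int) -> int:
    """Peak instructions issued per cycle, summed over all processing elements."""
    if num_pes < 1 or issue_width < 1:
        raise ValueError("num_pes and issue_width must be positive")
    return num_pes * issue_width


# Hypothetical design points: name -> (number of PEs, issue width per PE).
design_points = {
    "many_scalar_pes": (16, 1),
    "few_wide_pes": (4, 4),
    "single_wide_pe": (1, 8),
}

for name, (pes, width) in design_points.items():
    print(f"{name}: {peak_instrs_per_cycle(pes, width)} instrs/cycle")
```

Two very different organizations (16 scalar PEs, or 4 PEs issuing 4 wide) land on the same 16 instrs/cycle curve, which is why the chart needs both axes.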
13
Step 2: Defined the Architectural Search Space
Focused on NPUs, but this has been a robust classification for ASIPs
5 Axes of the Architectural Design Space
• Approaches to Parallel Processing
– Processing Element (PE) level
– Instruction-level
– Bit-level
• Elements of Special Purpose Hardware
• Structure of Memory Architectures
• Wide variety of On-Chip Communication Mechanisms
• Use of wide range of peripherals
Niraj Shah. Understanding Network Processors. Master's thesis, University of California, Berkeley, September, 2001.
Invited paper: Network Processors: Origin of Species, Niraj Shah, Kurt Keutzer, Proceedings of ISCIS XVII, The Seventeenth International Symposium on Computer and Information Sciences, October, 2002
15
Step 3: comprehensive environment for the description, simulation, and analysis of architectural platforms
Three significant sub-problems:
• Individual processor models
• Communication network models
• Task-specific processor models
[Figure: Home Gateway example. Labels: Fiber, GbE, Ethernet, 802.11g, POTS, UWB, SATA, NoC, NPU, MEM, IP-SEC, Media Server, Media Acceleration, Internet, Office Network]
16
Key Features
• Natural description
– The environment must enable the easy description of all the key elements of the programmable platform
• Automated high-performance simulation
– The environment must automatically generate simulation models
– Simulation models must be high-performance
• Amenable to analysis
– Analytical or simulation models must provide the relevant information for making key design decisions
• Industrial strength
– The environment must be capable of describing, simulating, and analyzing REAL industrial-strength designs
18
Processor Modeling with MADL
• Research focus
– Modeling concurrency and resource utilization in processors
– Automating software tool-chain generation
• Achievements
– Operation State Machine (OSM) as micro-processor model (for StrongARM, PowerPC 750, TMS320C54x)
[Figure: tool flow. Labels: MADL, PE model, Application, Model Analyzer, Simulator, Compiler, Machine Code]
W. Qin, S. Malik. Automated Synthesis of Efficient Binary Decoders for Retargetable Software Toolkits, Proceedings of the 40th Design Automation Conference (DAC 03), June 2003, pp. 764-769.
W. Qin, S. Malik. Flexible and Formal Modeling of Microprocessors with Application to Retargetable Simulation, Proceedings of 2003 Design Automation and Test in Europe Conference (DATE 03), Mar. 2003, pp. 556-561.
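To give a feel for "modeling concurrency and resource utilization", here is a toy Python sketch in the spirit of an operation state machine: each operation is a small state machine, and a transition into the next pipeline stage succeeds only if a token for that stage can be allocated. This is not MADL syntax and not the authors' implementation; the four-stage pipeline, the class names, and the latencies are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

STAGES = ["fetch", "decode", "execute", "writeback"]  # assumed pipeline


class TokenManager:
    """Holds one token per stage; owning the token means occupying the stage."""

    def __init__(self):
        self.owner = {stage: None for stage in STAGES}

    def allocate(self, stage, op):
        if self.owner[stage] is None:
            self.owner[stage] = op
            return True
        return False

    def release(self, stage):
        self.owner[stage] = None


@dataclass
class Operation:
    name: str
    latency: dict = field(default_factory=dict)  # stage -> cycles (default 1)
    state: int = -1          # index into STAGES; -1 means not yet issued
    remaining: int = 0       # cycles still to spend in the current stage
    done: bool = False

    def try_advance(self, mgr):
        """Take the transition to the next stage if its token is free."""
        if self.done or self.remaining > 0:
            return
        nxt = STAGES[self.state + 1]
        if mgr.allocate(nxt, self):
            if self.state >= 0:
                mgr.release(STAGES[self.state])
            self.state += 1
            self.remaining = self.latency.get(nxt, 1)
        # otherwise the operation stalls in its current state

    def tick(self, mgr):
        """Spend one cycle in the current stage; retire after the last stage."""
        if self.done or self.state < 0:
            return
        if self.remaining > 0:
            self.remaining -= 1
        if self.remaining == 0 and self.state == len(STAGES) - 1:
            mgr.release(STAGES[self.state])
            self.done = True


def simulate(ops):
    """Run until every operation retires; return the cycle count."""
    mgr, cycles = TokenManager(), 0
    while not all(op.done for op in ops):
        for op in ops:          # oldest first, so older operations win tokens
            op.try_advance(mgr)
        for op in ops:
            op.tick(mgr)
        cycles += 1
    return cycles


program = [Operation("add"), Operation("mul", {"execute": 3}), Operation("sub")]
print(simulate(program), "cycles")
```

In this sketch a structural hazard needs no special-case code: the `sub` operation simply fails to allocate the execute token while the three-cycle `mul` holds it, and stalls in decode.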
19
Step 3: comprehensive environment for the description, simulation, and analysis of architectural platforms
Three significant sub-problems:
• Individual processor models
• Communication network models
• Task-specific processor models
M. Sgroi, M. Sheet, A. Mihal, K. Keutzer, S. Malik, J. Rabaey, A. Sangiovanni-Vincentelli, "Addressing the System-on-a-Chip Interconnect Woes Through Communication-Based Design", Proceedings of the 38th Design Automation Conference, Los Angeles, CA., pages 667-672, June 2001.
20
Network-on-a-Chip (NOC) Architectures
[Figure: tool flow. Labels: NOC Description, Distributed Application, Simulation Engine, Timing, Power]
• Research focus:
– Design space exploration tools to evaluate and make NOC design choices
• An application-driven approach based on modular modeling environments
– Multiprocessor simulators developed based on SystemC, Liberty Simulation Environment (LSE)
Xinping Zhu, Sharad Malik, A Hierarchical Modeling Framework for On-Chip Communication Architectures, Proceedings of International Conference on Computer-Aided Design, 2002.
Hang-Sheng Wang, Xinping Zhu, Li-Shiuan Peh and Sharad Malik, Orion: A Power-Performance Simulator for Interconnection Networks, In Proceedings of the 35th International Symposium on Microarchitecture (MICRO), Istanbul, Turkey, November 2002.
21
Power-aware Networks-on-a-Chip
Research focus: Modeling and development of power efficient network architectures
• Hang-Sheng Wang, Li-Shiuan Peh and Sharad Malik, "Power-Driven Design of Router Microarchitectures in On-Chip Networks", In Proceedings of the 36th International Symposium on Microarchitecture (MICRO), San Diego, November 2003, to appear.
• Hang-Sheng Wang, Li-Shiuan Peh and Sharad Malik, Power Model for Routers: Modeling Alpha 21364 and InfiniBand Routers, In IEEE Micro, Vol. 24, No. 1, January/February 2003 (Best of Hot Interconnects 10).
Power-efficient network architectures/microarchitectures
[Figure: bar chart of average power savings (0% to 50%) on synthetic and real traces. Benchmarks: 8x8 torus random traffic, 4x4 torus random traffic, TRIPS CMP traces. Techniques: cut-through crossbar, segmented crossbar, write-through buffer, express cube, all]
Power modeling of interconnection networks
[Figure: the energy of a flit decomposed into in/out queue energy (buffer model), xb traversal energy (xb model), link energy (link model), and arbitration energy (arbiter model)]
Architectural-level power modeling, validated against Raw microprocessor and other routers
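The flit-energy decomposition on this slide can be written down as a simple sum of the four named components. A minimal Python sketch, assuming the components add per router hop and that a flit's path energy is the per-hop energy times the hop count; the picojoule values are placeholders, not numbers from the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterEnergy:
    """Per-flit, per-hop energy components in picojoules (placeholder values)."""
    queue_pj: float        # in/out queue energy (buffer model)
    crossbar_pj: float     # xb traversal energy (xb model)
    link_pj: float         # link energy (link model)
    arbitration_pj: float  # arbitration energy (arbiter model)

    def per_hop_pj(self) -> float:
        return self.queue_pj + self.crossbar_pj + self.link_pj + self.arbitration_pj


def flit_energy_pj(model: RouterEnergy, hops: int) -> float:
    """Energy for one flit crossing `hops` routers (assumes identical routers)."""
    return hops * model.per_hop_pj()


model = RouterEnergy(queue_pj=2.0, crossbar_pj=1.5, link_pj=1.0, arbitration_pj=0.5)
print(flit_energy_pj(model, hops=3), "pJ for a 3-hop flit")
```

Splitting the total this way is what lets a technique such as a segmented crossbar or a write-through buffer be evaluated against the one component it targets.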
23
TIPI: A design environment for task-specific processors
TIPI research focus: operation-based design approach for datapath-intensive task-specific processors
• Convolution Coding Processor was pushed through the Tipi design methodology by an industrial ASIC designer
• An automatically generated compiled-code simulator executed >100 million instructions/second on a 2.4 GHz P4.
• Synthesizable RTL was also automatically generated.
• "Multi-View Operation-Level Design -- Supporting the Design of Irregular ASIPs", Scott J Weber, Matthew W. Moskewicz, Manuel Loew, and Kurt Keutzer, University of California, Berkeley, UCB/ERL M03/12, April, 2003
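One way to read the simulator figure: dividing the host clock rate by the simulated instruction rate gives the average host-cycle budget per simulated instruction. A small Python sketch of that arithmetic, using only the two numbers quoted on the slide; the function name is hypothetical.

```python
def host_cycles_per_simulated_instr(host_hz: float, simulated_ips: float) -> float:
    """Average host clock cycles available per simulated instruction."""
    return host_hz / simulated_ips


# 2.4 GHz host, 100 million simulated instructions per second (slide figures).
budget = host_cycles_per_simulated_instr(2.4e9, 100e6)
print(f"at most {budget:.0f} host cycles per simulated instruction")
```

Because the slide reports more than 100 million instructions/second, 24 cycles is an upper bound on the average cost per simulated instruction.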
25
The Liberty Simulation Environment: Released!
• Research focus: Validation, Automatic Model Generation, Model Language Theory, Simulator Synthesis
• Detailed micro-architectural modeling
• Users: UCLA, UPC Barcelona, Colorado, UIUC, Rice, Intel, UMich, Princeton, Infineon
• Models: IA-64, DLX, Multiprocessor, Networks/Routers
• Version 1.0 Release Party at MICRO!
Optimizations for a Simulator Construction System Supporting Reusable Components, David A. Penry and David I. August, Proceedings of the 40th Design Automation Conference, June 2003.
Microarchitectural Exploration with Liberty, Manish Vachharajani, Neil Vachharajani, David A. Penry, Jason A. Blome, and David I. August, Proceedings of the 35th International Symposium on Microarchitecture, November 2002. (Best Student Paper Award)
27
Step 4: Efficiently explore and evaluate the design space of architectural platforms
[Figure: exploration flow. Labels: Application, Architecture Platform, Performance Analysis]
28
Comprehensive Survey of Design Space Exploration Methods
Research focus: comprehensive survey of design space exploration techniques
• Comparison of 16 frameworks, 9 evaluation schemes, 18 covering and automation methods, cost functions, and representations for architectures and applications
• Overall more than 120 papers considered
• M. Gries: Methods for Evaluating and Covering the Design Space during Early Design Development. Technical report, UC Berkeley, UCB/ERL M03/32, 53 pages, Aug. 2003
[Figure: Y-chart. Labels: Architecture, Application, Mapping, Evaluation]
29
Case Study on Fast Design Space Exploration
Research focus:
• Evaluation of analytical method for fast design space exploration
• Comparison for IPv4 packet forwarding on Intel IXP
M. Gries, C. Kulkarni, C. Sauer, K. Keutzer: Comparing Analytical Modeling with Simulation for Network Processors. DATE, March 2003
[Figure: end-to-end packet delay [µs] (0 to 40) vs. packet length [byte] (40, 64, 65, 128, 129, 192, 193, 256), comparing simulation with NP-GPS analysis]
[Figure: µ-Engine load [%] (0% to 90%) vs. packet length [byte] (same lengths), comparing the analytical model with simulation, split into computation part and polling artifacts]
30
Exploring Processing Element Topologies
Research focus
• Which topology performs best?
• What is the impact of choosing a certain topology on the programmability of the device?
• Scaling issues?
[Figure: number of PEs per stage vs. number of PE stages. Devices plotted: uniprocessor, Intel (IXP1200), Cisco (PXF/Toaster-2), Agere (Payload Plus), Vitesse IQ2000, Broadcom 12500, Xelerated Packet Devices. Topologies explored: 1x8 Pool, 2x4 Pool of Pipelines, 4x2 Pool of Pipelines, 8x1 Pipeline]
M. Gries, C. Kulkarni, C. Sauer, K. Keutzer: Exploring Trade-offs in Performance and Programmability of Processing Element Topologies for Network Processors, In: Network Processor Design: Issues and Practices, volume 2 (NP2 Workshop @ HPCA9), Morgan Kaufmann Publishers, Oct. 2003
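The four topologies named on this slide are exactly the ways to factor 8 PEs into stages times PEs per stage. A short Python sketch that enumerates them; reading "AxB" as stages x PEs-per-stage is an assumption based on the chart's axis labels, and the function names are hypothetical.

```python
def pe_topologies(total_pes: int):
    """All (stages, pes_per_stage) pairs whose product equals total_pes."""
    return [(s, total_pes // s) for s in range(1, total_pes + 1) if total_pes % s == 0]


def describe(stages: int, per_stage: int) -> str:
    """Name a topology the way the slide does (assumed naming convention)."""
    if stages == 1:
        kind = "Pool"
    elif per_stage == 1:
        kind = "Pipeline"
    else:
        kind = "Pool of Pipelines"
    return f"{stages}x{per_stage} {kind}"


for stages, per_stage in pe_topologies(8):
    print(describe(stages, per_stage))
```

The same enumeration works for other PE budgets, which is one way to frame the slide's "scaling issues" question.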
31
The New Design Source
• Heterogeneous applications
• Multiple flavors of concurrency
[Figure: Click element graph for 16 ports. Elements: FromDevice(0) … FromDevice(15), IPVerify, LookupIPRoute, DecIPTTL, Discard, ToDevice(0) … ToDevice(15)]
32
Modeling
• Many interesting problems in modeling complex heterogeneous systems
• We are hoping that Metropolis and Ptolemy II solve them all
33
Implementation Gap
The New Implementation Problem
Mapping concurrent heterogeneous applications onto heterogeneous multiprocessor systems
Can we bridge this gap and provide
- Programmer productivity
- Implementation efficiency
- System correctness
34
Implementation Gap
The New Design Problem
Mapping concurrent heterogeneous applications onto heterogeneous multiprocessor systems
Can we bridge this gap and provide
- Programmer productivity
- Implementation efficiency
- System correctness
Goal: Close the gap!
35
MESCAL Approaches
• Bottom-up: generalize from specific instance
– Start with a specific application domain and a specific architecture
– Develop useful abstractions of the device
– Aspire to achieve within 10% of hand-coded performance with 2-5X improvement in productivity
– Should teach us a lot about how to get this right
• Top-down: specify from general approach
– Consider heterogeneous applications that use combinations of Models of Computation (MoCs)
– Develop a mapping discipline
  • Correct-by-construction implementation
  • Target a broad class of architectures
– Should teach us a lot about how to provide a general solution
36
Bottom-up Approach
• Start with a specific application development environment and a specific architecture instance
• Identify the preferred device: IXP1200
• Identify the preferred programming environment: Click
• Attempt to fill the implementation gap with
– Within 10% of hand-coded efficiency
– With 2-5X productivity
37
Target Architecture of choice: Intel IXP1200
[Figure: Intel IXP1200 block diagram. Labels: SDRAM controller, SRAM controller, PCI interface, IX bus interface, StrongARM core with I$, D$, and mini D$, six µengines, hash engine, scratchpad SRAM]
38
IXP1200 Programming Difficulties
• Current programming abstraction: IXP-C
– Subset of C
– Need to write 6 parallel multi-threaded programs
– Not clear where the architectural bottlenecks are
• Programmer must still:
– Divide code among threads
– Take advantage of distributed, heterogeneous memories
– Arbitrate access to shared resources
– Interact with peripherals
– Take advantage of application concurrency
39
Environment of choice: Click
• Domain-specific language for describing networking applications
• Applications are built by composing elements that correspond to common packet processing operations
• Elements communicate via ports that pass packets
– push: initiated by source element
– pull: initiated by destination element
• Current implementation in C++ for Linux workstations
[Figure: example Click graph. Elements: FromDevice(0), FromDevice(1), LookupIPRoute, ToDevice(0), ToDevice(1)]
Source: E. Kohler et al. The Click Modular Router. TOCS. pg. 263-297, August 2000.
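The push and pull styles can be illustrated with a toy element graph. This is a Python sketch of the idea only, not Click's C++ API: the class names, the dictionary packet format, and the exact-match route table (a real router would use longest-prefix match) are all assumptions for illustration.

```python
from collections import deque


class Element:
    """A packet-processing stage with one downstream connection."""

    def __init__(self):
        self.next = None

    def connect(self, downstream):
        self.next = downstream
        return downstream            # returning it allows chained connect() calls

    def process(self, packet):
        return packet                # default: pass the packet through unchanged

    def push(self, packet):
        """Push: the upstream (source) side initiates the transfer."""
        out = self.process(packet)
        if out is not None and self.next is not None:
            self.next.push(out)


class DecTTL(Element):
    def process(self, packet):
        if packet["ttl"] <= 1:
            return None              # drop: TTL expired
        return {**packet, "ttl": packet["ttl"] - 1}


class LookupRoute(Element):
    def __init__(self, table):
        super().__init__()
        self.table = table           # destination -> output port (exact match)

    def process(self, packet):
        port = self.table.get(packet["dst"])
        if port is None:
            return None              # drop: no route
        return {**packet, "out_port": port}


class Queue(Element):
    """Push input, pull output: decouples the two transfer styles."""

    def __init__(self):
        super().__init__()
        self.items = deque()

    def push(self, packet):
        self.items.append(packet)

    def pull(self):
        """Pull: the downstream (destination) side initiates the transfer."""
        return self.items.popleft() if self.items else None


source, queue = Element(), Queue()
source.connect(DecTTL()).connect(LookupRoute({"10.0.0.2": 1})).connect(queue)
source.push({"dst": "10.0.0.2", "ttl": 64})
source.push({"dst": "10.0.0.2", "ttl": 1})      # dropped by DecTTL
source.push({"dst": "192.168.0.9", "ttl": 9})   # dropped by LookupRoute
print(queue.pull())
```

The queue is where a push path meets a pull path: the receive side pushes packets in as they arrive, and the transmit side pulls one out only when the device is ready to send.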
40
Bridging the Gap
Implementation Gap
Programming Model (NP-Click)
• raises abstraction of architecture
• facilitates mapping of application env
41
What is a Programming Model?
• A programmer’s view of the architecture that balances:
Opacity
– Abstract architecture
– Obviate need to initially learn microarchitecture
– Ease of programming
Visibility
– Expose key architectural features
– Allow performance improvement
– Enable efficient implementation
• Presents a productive approach to using computational power of the device
42
Our Solution: NP-Click
• NP-Click is a programming model implemented on the Intel IXP1200
• Integrates concepts from Click
– elements
– communication via push and pull of packets
• And an abstraction of the underlying hardware
– thread boundaries
– data layout
– arbitration of shared resources
43
NP-Click: Usage Model
• Methodology: identify what is important to the programmer and narrow the scope of their concerns
• Two steps
– Design application by composing elements
  • determine thread boundaries
  • map shared data to physical memory
  • select/implement arbitration schemes
– Implement elements in IXP-C
  • elements have well-defined I/O
  • data descriptors for scoping of variables
  • simple interface to access shared resources
  • special-purpose hardware
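The first step (thread boundaries and data placement) amounts to writing down a mapping and checking it against the device. A hypothetical Python sketch of such a check; this is not NP-Click's actual notation. The six microengines come from the slides, while the four threads per engine, the memory names, and all element names are assumptions for illustration.

```python
NUM_MICROENGINES = 6       # the slides mention 6 parallel programs
THREADS_PER_ENGINE = 4     # assumption for this sketch
MEMORIES = {"scratchpad", "sram", "sdram"}


def validate_mapping(thread_of, memory_of):
    """Return a list of problems with an element/data mapping (empty if valid).

    thread_of: element name -> (microengine index, thread index)
    memory_of: shared data name -> memory name
    """
    errors = []
    for element, (engine, thread) in thread_of.items():
        if not (0 <= engine < NUM_MICROENGINES and 0 <= thread < THREADS_PER_ENGINE):
            errors.append(f"{element}: no such thread ({engine}, {thread})")
    for name, memory in memory_of.items():
        if memory not in MEMORIES:
            errors.append(f"{name}: unknown memory {memory!r}")
    return errors


# Hypothetical mapping for a small forwarder.
thread_of = {"from_device_0": (0, 0), "lookup_route": (1, 0), "to_device_0": (4, 0)}
memory_of = {"route_table": "sram", "packet_buffers": "sdram"}
print(validate_mapping(thread_of, memory_of) or "mapping is valid")
```

Keeping the mapping separate from the element code is what lets the designer move a thread boundary or relocate a table without rewriting the elements themselves.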
44
Evaluating the Methodology
• Implemented a 16-port IPv4 packet forwarder
– NP-Click
– NP-Click with arbitration optimization
– IXP-C (hand-coded)
– Assembler (hand-coded)
• Use maximum sustainable data rate as proxy for performance
• Measured performance across a range of packet sizes, including an IETF recommended packet mix
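The slides do not say how the maximum sustainable data rate was found. One plausible procedure, sketched here in Python as an assumption, is to bisect on the offered load: raise the rate while the device forwards everything, lower it when packets are dropped. The `drops_packets` callback stands in for a real measurement run and is hypothetical.

```python
def max_sustainable_rate(drops_packets, low_mbps=0.0, high_mbps=2000.0, tolerance_mbps=1.0):
    """Bisect for the highest offered rate at which no packets are dropped.

    Assumes drops_packets(rate) is monotone: once the device starts dropping
    at some rate, it also drops at every higher rate.
    """
    while high_mbps - low_mbps > tolerance_mbps:
        mid = (low_mbps + high_mbps) / 2
        if drops_packets(mid):
            high_mbps = mid      # too fast: search the lower half
        else:
            low_mbps = mid       # sustained: search the upper half
    return low_mbps


# Toy device that starts dropping above 850 Mbps (made-up threshold).
rate = max_sustainable_rate(lambda offered: offered > 850.0)
print(f"sustainable rate is about {rate:.0f} Mbps")
```

Repeating such a search once per packet size, and once for the packet mix, would yield one data point per x-axis position in the charts that follow.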
45
Initial Results
• NP-Click achieves 35% of IXP-C's performance
• Poor TFIFO arbitration scheme is responsible for performance shortfall
[Figure: data rate (Mbps, 0 to 1400) vs. input packet size (64, 128, 256, 512, 1024, 1280, 1518, IETF mix) for NP-Click and IXP-C]
Source: N. Shah et al, “NP-Click: A Programming Model for the Intel IXP1200,” NP-2, 9th HPCA, 2003.
46
Performance Tuning
• A better arbitration scheme results in >2x performance improvement
• This improved version performs within 10% of IXP-C for larger packets
[Figure: data rate (Mbps, 0 to 1400) vs. input packet size (64, 128, 256, 512, 1024, 1280, 1518, IETF mix) for NP-Click, NP-Click (w/arb opt), and IXP-C]
Source: N. Shah et al, “NP-Click: A Programming Model for the Intel IXP1200,” NP-2, 9th HPCA, 2003.
47
Comparison to Assembly Language
• ASM version outperforms IXP-C version by ~15%
• Fine-grain synchronization with TFIFO state machine
[Figure: data rate (Mbps, 0 to 1600) vs. input packet size (64, 128, 256, 512, 1024, 1280, 1518, IETF mix) for NP-Click, NP-Click (w/arb opt), IXP-C, and ASM]
Source: N. Shah et al, “A Comparison of Programming Models”, submitted to LCTES 2003.
48
Bottom-up: Lessons Learned
• What does the designer need to see in order to do mapping?
– Application characteristics
– Architectural features
• Concurrency
– Application thread boundaries
– Architectural multiprocessing capabilities
– Match threads with PEs
• State
– Application memory usage
– Multiprocessor memory architecture / memory hierarchy
• Arbitration of shared resources
– Special-purpose function units
– I/O
49
Top-down Approach
• Start with a general application development environment and a broad family of architectures
– Heterogeneous applications are important
– Architectural features evolve during design-space exploration
• Create a formal model of the application
– Capture application concurrency
– Handle heterogeneous combinations of MoCs
• Disciplined approach to mapping
– Enable design-space exploration
– Discover architectural features that give the most performance
• Warpath
– Model heterogeneous applications (with the goal of implementation)
– Map to Teepee architectures
50
Warpath
• Disciplined methodologies and a supporting tool set for the top-down approach
Formal models capture concurrency
Formal model enables automatic exportation
Correct-by-construction implementation
[Figure: Warpath flow. Labels: Applications, Application Development Environment, Architecture Instance, Programmer's Model, Mapping Process, Code Generation Process, Performance Analysis]
51
Disciplined Design-Space Exploration
• Y-chart (Kienhuis, Deprettere et al. 2001), Polis 2001
[Figure: Y-chart flow. Labels: Applications, Application Development Environment, Architecture Instance, Programmer's Model, Mapping Process, Code Generation Process, Performance Analysis. Feedback paths: suggest architectural improvements, modify the applications, use different mapping strategies]
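The Y-chart loop can be sketched as: pick an architecture instance, pick a mapping strategy, map the application, run performance analysis, and feed the result back. A toy Python sketch under heavy assumptions: the application is a list of task costs, the architecture is just a PE count, and "performance analysis" is the load of the busiest PE. None of this is Warpath code.

```python
from itertools import product


def round_robin(tasks, num_pes):
    """Mapping strategy 1: deal tasks to PEs in order."""
    loads = [0] * num_pes
    for i, cost in enumerate(tasks):
        loads[i % num_pes] += cost
    return loads


def greedy(tasks, num_pes):
    """Mapping strategy 2: largest task first, onto the least-loaded PE."""
    loads = [0] * num_pes
    for cost in sorted(tasks, reverse=True):
        loads[loads.index(min(loads))] += cost
    return loads


def explore(tasks, pe_counts, strategies):
    """Evaluate every (architecture, mapping) pair; best (lowest makespan) first."""
    results = []
    for num_pes, (name, strategy) in product(pe_counts, strategies.items()):
        makespan = max(strategy(tasks, num_pes))   # toy performance analysis
        results.append((makespan, num_pes, name))
    return sorted(results)


tasks = [7, 3, 5, 2, 6, 1]                         # made-up task costs in cycles
strategies = {"round_robin": round_robin, "greedy": greedy}
for makespan, num_pes, name in explore(tasks, [2, 3], strategies):
    print(f"{num_pes} PEs, {name}: makespan {makespan}")
```

Each of the three feedback paths in the chart corresponds to changing one input of `explore`: the architecture candidates, the task list, or the set of mapping strategies.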
52
Application Development
• Model concurrent applications formally with Models of Computation
• CLICK: MoC and actor library for network processing applications
53
Warpath Application Development Env.
• Good ideas from Ptolemy II
– Models of Computation
– Orthogonalization of computation, communication, and control
– Library of domain-polymorphic components
– Hierarchical heterogeneity
• Targeted for implementation on a Teepee architectural platform
– Strict software interfaces for computation, communication, control
– Separate implementation and visualization
– Get rid of Java
– Don't assume RISC-like datapaths
54
Teepee Processing Elements
• Control structures are implicit in the model
• Control synthesis strategies:
– Hardcoded state machine
– Horizontal/vertical microcode
– Reconfigurable
– RISC/VLIW
– None of the above
• Runs sequential programs; executes one or more operations each cycle
• Opportunity to customize processing element control to the style of computation the application uses
55
Lessons Learned
• What to capture in an Application Development Environment?
– Ptolemy II
– Separate communications, computation, control
• What to export up from an architecture?
– Processing element capabilities
– Communication architecture capabilities
• Communications Implementation View
– Match application actors with architecture PEs
– Implement communication semantics over communication architecture
– Verify that an implementation is correct
56
MESCAL Summary
• Address the key challenges in supporting the design, deployment, and implementation of a new generation of programmable platforms
• Supply new generation of ASIPs with programming models
• Close the implementation gap between application development environments and target ASIPs
• Explore in parallel a "bottom-up" approach seeking "industrial strength" results and a "top-down" approach seeking a generally applicable methodology
• Examine tradeoffs between
– Quality-of-results (e.g. speed, but also power, device cost)
– Programmer productivity (how long does all this take?)
• Active questions:
– What are the costs and benefits of a general approach vs. an application- and architecture-specific approach?