design support for embedded processors and applications

24
Design Support for Design Support for Embedded Processors Embedded Processors and Applications and Applications Prof. Kurt Keutzer Prof. Kurt Keutzer EECS EECS University of California University of California Berkeley, CA Berkeley, CA [email protected] [email protected]

Upload: roxy

Post on 15-Jan-2016

33 views

Category:

Documents


0 download

DESCRIPTION

Design Support for Embedded Processors and Applications. Prof. Kurt Keutzer EECS University of California Berkeley, CA [email protected]. Embedded system needs meet technology constraints. Embedded system design needs: Fast-time to market Predictability Reliability Robustness - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Design Support for  Embedded Processors and Applications

Design Support for Design Support for Embedded ProcessorsEmbedded Processorsand Applicationsand Applications

Prof. Kurt KeutzerProf. Kurt Keutzer

EECSEECS

University of CaliforniaUniversity of California

Berkeley, CABerkeley, CA

[email protected]@eecs.berkeley.edu

Page 2: Design Support for  Embedded Processors and Applications

2

Embedded system needs meet technology constraintsEmbedded system needs meet technology constraints

Embedded system design needs: Embedded system design needs:

Fast-time to marketFast-time to market

Predictability Predictability

ReliabilityReliability

RobustnessRobustness

EfficiencyEfficiency

EconomyEconomy

Application-specific integrated circuits failing to deliver thisApplication-specific integrated circuits failing to deliver this

Design riskDesign risk

Design/tool costDesign/tool cost

Unmanageable complexityUnmanageable complexity

Page 3: Design Support for  Embedded Processors and Applications

3

Demise of ASIC: Total IC DesignsDemise of ASIC: Total IC Designs

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

1995

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

2007

Year

IC D

esig

ns

ASSP

ASIC

ASIC

ASSP

Handel Jones, IBS 9/23/2002

Page 4: Design Support for  Embedded Processors and Applications

4

Customer needs meet technology constraintsCustomer needs meet technology constraints

Customer needs: Customer needs: Fast-time to marketFast-time to market Predictability Predictability ReliabilityReliability RobustnessRobustness EfficiencyEfficiency EconomyEconomy

Looking for ``platforms’’ – devices that will amortize system design costs over Looking for ``platforms’’ – devices that will amortize system design costs over

multiple generations multiple generations

``"Based on our analysis, having a software approach is the only way to scale to ``"Based on our analysis, having a software approach is the only way to scale to

the next generation," Corgan (, Intel PMM said) . "If you have to approach each the next generation," Corgan (, Intel PMM said) . "If you have to approach each

fourfold increase in speed — from OC-48 [2.5 Gbits/second] to OC-192 [10 fourfold increase in speed — from OC-48 [2.5 Gbits/second] to OC-192 [10

Gbits/s], say — with a new architecture, it's not cost-effective."Gbits/s], say — with a new architecture, it's not cost-effective."

Page 5: Design Support for  Embedded Processors and Applications

5

Solution: ASIC => ASSP => ASIPSolution: ASIC => ASSP => ASIPASIP: Programmable PlatformsASIP: Programmable Platforms

Develop platforms that allow for Develop platforms that allow for

amortization of design costs over multiple amortization of design costs over multiple

generationsgenerations

Make platforms Make platforms programmable programmable so that so that

they have maximum flexibility with they have maximum flexibility with

minimum overheadminimum overhead

SDRAM Controller

enginePCI

Interface

SRAMController

StrongArmCore

I$

engine

engine

engine

engine

engine

MiniD$

D$

IX BusInterface

HashEngine

ScratchPad

SRAM

Page 6: Design Support for  Embedded Processors and Applications

6

TM-xxxxTM-xxxx D$

D$

I$I$

TriMedia CPU

DEVICE I/P BLOCKDEVICE I/P BLOCK

DEVICE I/P BLOCKDEVICE I/P BLOCK

DEVICE I/P BLOCKDEVICE I/P BLOCK

. . .

DVP System Silicon

VLIW Media Processor:• 100 to 300+ MHz• 32-bit or 64-bit

NexperiaSystem Busses• PI bus• Memory bus• 32-128 bit

PI

BU

S

SDRAMSDRAM

MMIMMI

DV

P M

EM

OR

Y

BU

S

DEVICE I/P BLOCKDEVICE I/P BLOCK

PRxxxxPRxxxxD$

D$

I$I$

MIPS CPU

DEVICE I/P BLOCKDEVICE I/P BLOCK. . .

DEVICE I/P BLOCKDEVICE I/P BLOCK

PI

BU

S

General Purpose RISC Processor• 50 to 300+ MHz• 32-bit or 64-bitLibrary of Device Blocks• Imagecoprocessors

• DSPs• UART• 1394• USB

•…and more

TriMediaTMMIPSTM

Example: Philips NexperiaExample: Philips NexperiaTMTM

Flexible architecture for digital video applications

Page 7: Design Support for  Embedded Processors and Applications

7

Configurable/Reconfigurable ProcessorsConfigurable/Reconfigurable Processors

FPGA Processor

SpecializedMicro-Architectures

SpecializedInstruction-SetArchitectures

Domain-Specialization

ChameleonSystems

Morphics

Frontier Design

Tensilica

ARC

Improv SystemsPMC Sierra

Xilinx Altera AtmelTriscend

Actel

Adaptive SiliconProcelereASIC

Page 8: Design Support for  Embedded Processors and Applications

8

0

2

4

6

8

10

12

14

16

18

20

0 1 2 3 4 5 6 7 8 9

Issue width per PE

Nu

mb

er o

f P

Es

32

48

64

Cognigine

Cisco

EZchip

Xelerated

IBMLexraMotorola

Intel

BRECISBroadcom

AppliedMicro

Clearwater

ClearSpeedVitesse

Agere

PMC-Sierra

AlchemyConexant

64 instrs/cycle

16 instrs/cycle

8 instrs/cycle

10

Galaxy of Network ProcessorsGalaxy of Network Processors

Page 9: Design Support for  Embedded Processors and Applications

9

ASIP/Programmable Platform CharacteristicsASIP/Programmable Platform Characteristics

5 Axes of the Architectural Design Space5 Axes of the Architectural Design Space

Approaches to Parallel Processing Approaches to Parallel Processing

Processing Element (PE) levelProcessing Element (PE) level

Instruction-levelInstruction-level

Bit-levelBit-level

Elements of Special Purpose HardwareElements of Special Purpose Hardware

Structure of Memory ArchitecturesStructure of Memory Architectures

Types of On-Chip Communication MechanismsTypes of On-Chip Communication Mechanisms

Use of PeripheralsUse of Peripherals

Era of a single RISC PE (+ 1 DSP) over!Era of a single RISC PE (+ 1 DSP) over!

This is not a ``run POSIX on an ARM’’ problem!This is not a ``run POSIX on an ARM’’ problem!

Explosion of application specific solutions =>ASIPsExplosion of application specific solutions =>ASIPs

RISC +

SFU

RISC +

SFU

RISC +

SFU

RISC

Co-proc

Co-proc

CAM

Ethernet MAC

SRAM

Page 10: Design Support for  Embedded Processors and Applications

10

Three Key Problem Areas EmergeThree Key Problem Areas Emerge

Development of programmable platforms/ASIP:Development of programmable platforms/ASIP:

Characterizing target applicationsCharacterizing target applications

Design space explorationDesign space exploration

Deployment of programmable platforms/ASIP:Deployment of programmable platforms/ASIP:

Development of programming modelDevelopment of programming model

Provision of software environmentProvision of software environment

Mapping applications onto programmable platforms/ASIP:Mapping applications onto programmable platforms/ASIP:

Application modelingApplication modeling

Application mappingApplication mapping

Page 11: Design Support for  Embedded Processors and Applications

11

Addressing the problem areasAddressing the problem areasMModern odern EEmbedded mbedded SSystems ystems CCompilers ompilers

AArchitectures and rchitectures and LLanguagesanguages

MESCALMESCAL research mission: research mission:

To bring a disciplined methodology, To bring a disciplined methodology, and a supporting tool set, to the and a supporting tool set, to the development, deployment, and development, deployment, and programming of application-specific programming of application-specific programmable platforms aka programmable platforms aka application specific instruction application specific instruction processors.processors.

Invited paper: ``From ASIC to ASIP:The Next Design Discontinuity’’,K. Keutzer, S. Malik, R. Newton,Proceedings of ICCD, pp. 84-91, 2002.www.gigascale.org/mescal

Press coverage Sept 2002:Programmable Platforms will Rule:http://www.eetimes.com/story/OEG20020911S0063

High on MESCALhttp://www.eetimes.com/story/OEG20020911S0065

SDRAM Controller

enginePCI

Interface

SRAMController

StrongArmCore

I$

engine

engine

engine

engine

engine

MiniD$

D$

IX BusInterface

HashEngine

ScratchPad

SRAM

Page 12: Design Support for  Embedded Processors and Applications

12

Three Key Problem AreasThree Key Problem Areas

Development of programmable platforms:Development of programmable platforms:

Characterizing target applicationsCharacterizing target applications

Design space explorationDesign space exploration

Deployment of programmable platforms:Deployment of programmable platforms:

Development of programming modelDevelopment of programming model

Provision of software environmentProvision of software environment

Mapping applications onto programmable platformsMapping applications onto programmable platforms

Application modelingApplication modeling

Application mappingApplication mapping

Page 13: Design Support for  Embedded Processors and Applications

13

Complementary IssuesComplementary Issues

Heterogeneous applicationsHeterogeneous applications

Programming EnvironmentProgramming Environment

Domain-specific modelsDomain-specific models Domain specific librariesDomain specific libraries Environmental modelsEnvironmental models

``Software architecture’’/MOC``Software architecture’’/MOC

Primitive computation and Primitive computation and

communication mechanismscommunication mechanisms

Mapping to implementationMapping to implementation

Heterogeneous programmable platforms/ASIPHeterogeneous programmable platforms/ASIP

Programming modelProgramming model

Domain-specific presentationDomain-specific presentation Device Specific LibrariesDevice Specific Libraries Environmental supportEnvironmental support

System architecture/micro-architecture/MOCSystem architecture/micro-architecture/MOC

Primitive computation and communication Primitive computation and communication mechanismsmechanisms

Port 0 IP ForwardingEngine

Port 0

Page 14: Design Support for  Embedded Processors and Applications

14

Our ApproachOur Approach

Bottom-up view - create abstractions of existing devices Bottom-up view - create abstractions of existing devices opacity - hide micro-architectural details from programmeropacity - hide micro-architectural details from programmervisibility - sufficient detail of the architecture to allow the visibility - sufficient detail of the architecture to allow the

programmer to improve the efficiency of the programprogrammer to improve the efficiency of the program Top down – experiment with existing modeling/programming Top down – experiment with existing modeling/programming

environmentsenvironmentsLearn from their abstractions of the devicesLearn from their abstractions of the devicesTry to maximize performance within these environmentsTry to maximize performance within these environments

Page 15: Design Support for  Embedded Processors and Applications

15

Our Constraint/Angle/PrejudiceOur Constraint/Angle/Prejudice

In real-time embedded systems correct logical functionality can In real-time embedded systems correct logical functionality can never be divorced from system performancenever be divorced from system performance

In commercial (especially consumer-oriented) embedded systems In commercial (especially consumer-oriented) embedded systems system price is an utmost concernsystem price is an utmost concern

QuantitativeQuantitative (Quantitatively) examine trade-offs among:(Quantitatively) examine trade-offs among:

Quality-of-results (e.g. speed, but also power, device cost)Quality-of-results (e.g. speed, but also power, device cost)Programmer productivity (how long does all this take?)Programmer productivity (how long does all this take?)

Page 16: Design Support for  Embedded Processors and Applications

16

Application: IPv4 Forwarding BenchmarkApplication: IPv4 Forwarding Benchmark

Port 1

Port 2

Port 15

.

.

.

Port 1

Port 2

Port 15

.

.

.

Ingress Ports

FIFOs Functionality FIFOsEgress Ports

Port 0 IP ForwardingEngine

Port 0

IP ForwardingEngine

IP ForwardingEngine

Page 17: Design Support for  Embedded Processors and Applications

17

Example Programming target – IXP1200Example Programming target – IXP1200

Intel IXP1200Intel IXP1200 Multiple processorsMultiple processors specialized execution unitsspecialized execution units hardware context swaphardware context swap

``Intel Corp., … chose a "fully ``Intel Corp., … chose a "fully

programmable" architecture with programmable" architecture with

plenty of space for users to add their plenty of space for users to add their

own software — but one that turned own software — but one that turned

out to be difficult to program. ‘’out to be difficult to program. ‘’

http://www.eetimes.com/story/http://www.eetimes.com/story/

OEG20020830S0061OEG20020830S0061

SDRAM Controller

enginePCI

Interface

SRAMController

StrongArmCore

I$

engine

engine

engine

engine

engine

MiniD$

D$

IX BusInterface

HashEngine

ScratchPad

SRAM

How do we program these architectures? What’s the right programming model?How do we program these architectures? What’s the right programming model?

Page 18: Design Support for  Embedded Processors and Applications

18

Base-line reference implementations from IntelBase-line reference implementations from Intel

AssemblerAssembler

Reference applicationReference application

uEngine C FeaturesuEngine C Features

Reference application modifiedReference application modified

Basic C language constructs like loops, condition statements and basic Basic C language constructs like loops, condition statements and basic

data types (char, int, float)data types (char, int, float)

IXP library defines additional data types, macros and functions (useful for IXP library defines additional data types, macros and functions (useful for

common networking applications)common networking applications)

Memory management is user defined. Hence explicit declaration of Memory management is user defined. Hence explicit declaration of

memory allocation (and no support for pointers).memory allocation (and no support for pointers).

Page 19: Design Support for  Embedded Processors and Applications

19

Commercial NPU programming environmentCommercial NPU programming environment

Teja TechnologiesTeja Technologies

Teja is founded by Akash Deshpande – Student of Prof. Pravin VaraiyaTeja is founded by Akash Deshpande – Student of Prof. Pravin Varaiya

Based on his thesis “Control of Hybrid Systems” (1994)Based on his thesis “Control of Hybrid Systems” (1994)

Teja Language FeaturesTeja Language Features

User interacts mostly with the graphical interface (which exports pre-User interacts mostly with the graphical interface (which exports pre-

defined application primitives)defined application primitives)

Extending the Teja primitives is done via a FSM-based model (however, Extending the Teja primitives is done via a FSM-based model (however,

this still requires coding in assembly via the graphical interface)this still requires coding in assembly via the graphical interface)

Memory management for pre-defined primitives is done by Teja. User can Memory management for pre-defined primitives is done by Teja. User can

alter this process (but is tedious and error prone)alter this process (but is tedious and error prone)

Page 20: Design Support for  Embedded Processors and Applications

20

Teja FeaturesTeja Features

Page 21: Design Support for  Embedded Processors and Applications

21

Our own NPU programming environment: NPClickOur own NPU programming environment: NPClick Based on ClickBased on Click

Popular environment for describing/implementing network applicationsPopular environment for describing/implementing network applications Developed by Eddie Kohler, MIT=> ICSIDeveloped by Eddie Kohler, MIT=> ICSI

NPClickNPClick Implemented subset of element library in IXP uCImplemented subset of element library in IXP uC Element communication via function callsElement communication via function calls

maintained semantics (packet push/pull)maintained semantics (packet push/pull) packet storage fixed:packet storage fixed:

header in SRAMheader in SRAM payload in DRAMpayload in DRAM

Designer needs to specify:Designer needs to specify: thread boundariesthread boundaries thread/uEngine assignmentthread/uEngine assignment memory allocation of queues (SRAM, DRAM, Scratch)memory allocation of queues (SRAM, DRAM, Scratch)

Opportunities for optimization (future work)Opportunities for optimization (future work) redundant memory loads/stores based on element/thread mappingredundant memory loads/stores based on element/thread mapping schemes for multiplexing hardware resources among multiple element instantiations (e.g. muxing TFIFO among 8 to schemes for multiplexing hardware resources among multiple element instantiations (e.g. muxing TFIFO among 8 to

Device’s)Device’s)

Page 22: Design Support for  Embedded Processors and Applications

22

Programming Models for IXP1200Programming Models for IXP1200

0

200

400

600

800

1000

1200

1400

Th

rou

gh

pu

t (M

b/s

)

Click Teja uEngineC ASM

Page 23: Design Support for  Embedded Processors and Applications

23

Productivity EstimatesProductivity Estimates ``First time’’ learning curve issues makes it difficult to compare the productivity of these approaches``First time’’ learning curve issues makes it difficult to compare the productivity of these approaches

Based on our experience, we estimate the following design times for implementing an IPv4 routerBased on our experience, we estimate the following design times for implementing an IPv4 router

Time to functional correctnessTime to functional correctness Additional time for Additional time for

performance tuningperformance tuning

ASMASM 8 weeks8 weeks 8 weeks8 weeks

uCuC 4 weeks4 weeks 6 weeks6 weeks

TejaTeja 2 weeks2 weeks 3-4 weeks3-4 weeks

NPClickNPClick 2 days2 days 2 weeks2 weeks

The advantages with Teja and NPClick come from the ability to perform design-space The advantages with Teja and NPClick come from the ability to perform design-space

exploration at a higher levelexploration at a higher level

Page 24: Design Support for  Embedded Processors and Applications

24

Conclusions: Programming Embedded SystemsConclusions: Programming Embedded Systems Neither ASICs or general-purpose processors will fill the needs of most embedded system Neither ASICs or general-purpose processors will fill the needs of most embedded system

applicationsapplications

System design teams will increasingly choose ASIPs/programmable platformsSystem design teams will increasingly choose ASIPs/programmable platforms

Programming these devices is a new challenge:Programming these devices is a new challenge: ParallelismParallelism

ProcessProcess OperatorOperator Bit/gate levelBit/gate level

Special-purpose execution unitsSpecial-purpose execution units

Need to develop matches between application development environments and programming Need to develop matches between application development environments and programming models of ASIPs/programmable platformsmodels of ASIPs/programmable platforms

Match must consider:Match must consider: EfficiencyEfficiency ProductivityProductivity RobustnessRobustness ReliabiltyReliabilty