Architecture Tuning in Embedded Systems Greg Stitt, Frank Vahid, Tony Givargis Dept. of Computer Science & Engineering University of California, Riverside Roman Lysecky Department of IP Management Conexant Newport Beach This work was supported by the National Science Foundation under grants CCR- 9811164 and CCR-9876006, and by a Design Automation Conference graduate scholarship. This work is being presented at CASES’00 (Compilers, Architectures and Synthesis for Embedded Systems), November 18-19, 2000, San Jose, CA.


Page 1

Architecture Tuning in Embedded Systems

Greg Stitt, Frank Vahid, Tony Givargis
Dept. of Computer Science & Engineering, University of California, Riverside

Roman Lysecky
Department of IP Management, Conexant, Newport Beach

This work was supported by the National Science Foundation under grants CCR-9811164 and CCR-9876006, and by a Design Automation Conference graduate scholarship.

This work is being presented at CASES’00 (Compilers, Architectures and Synthesis for Embedded Systems), November 18-19, 2000, San Jose, CA.

Page 2

A “short list” of embedded systems

Anti-lock brakes, Auto-focus cameras, Automatic teller machines, Automatic toll systems, Automatic transmission, Avionic systems, Battery chargers, Camcorders, Cell phones, Cell-phone base stations, Cordless phones, Cruise control, Curbside check-in systems, Digital cameras, Disk drives, Electronic card readers, Electronic instruments, Electronic toys/games, Factory control, Fax machines, Fingerprint identifiers, Home security systems, Life-support systems, Medical testing systems

Modems, MPEG decoders, Network cards, Network switches/routers, On-board navigation, Pagers, Photocopiers, Point-of-sale systems, Portable video games, Printers, Satellite phones, Scanners, Smart ovens/dishwashers, Speech recognizers, Stereo systems, Teleconferencing systems, Televisions, Temperature controllers, Theft tracking systems, TV set-top boxes, VCRs, DVD players, Video game consoles, Video phones, Washers and dryers

And the list goes on and on

Page 3

Introduction: Traditional microprocessor use in embedded systems

Tasks (not necessarily in the given order)
(1) Buy a microprocessor IC (integrated circuit)
(2) Integrate it with other ICs onto a board and insert it into an embedded system
(3) Download a software program

[Figure: Processor, Board, and Software, labeled with task numbers 1, 2, 3]

Notice that the processor IC is designed independently of the software
  Different microprocessor variations thus exist, like low-power or high-performance ICs

Page 4

Introduction: Modern core-based approach

Tasks
(1) Buy a microprocessor CORE
  Hard: layout; Firm: structural HDL; Soft: synthesizable HDL
  You are buying Intellectual Property, like a file that may come on a floppy, CD-ROM, over the web, etc. You are NOT buying hardware.
(2) Design a system-on-a-chip (SOC) from this and other cores
(3) Fabricate an SOC IC
(4) Insert the IC into an embedded system
(5) Download a software program

[Figure: Processor (HDL) and Software, labeled with task numbers 1 to 5]

Page 5

Introduction: embedded system unique feature of fixed program

SOCs implementing an embedded system have a unique feature
  Each implements a particular application
  Thus, the processor may execute a single fixed program that never changes
  Unlike desktop systems, which execute a variety of programs
  Examples: digital camera, automobile cruise-controller

We can exploit this fixed-program feature
  For example, by using mask-programmed ROM
  But much more can be done

[Figure callout: The software in here never changes after production]

Page 6

Introduction: Proposed core-based approach with architecture tuning

Tasks
(1) Buy a microprocessor core
(2) Design a system-on-a-chip (SOC) from this and other cores
(3) TUNE the SOC architecture to a software program
(4) Fabricate an SOC IC
(5) Insert the IC into an embedded system
(6) Download the software program

[Figure: Processor (HDL) and Software, labeled with task numbers 1 to 6]

Page 7

Introduction: architecture tuning

Architecture tuning
  A way to exploit the fixed-program feature of embedded systems
  First, do architecture design for the particular application
  Then, “tune” the core-based system architecture to the particular application program, before IC fabrication
  Goals: better performance, power, size

[Figure: cores from a core library (Peripheral A, Peripheral B, Processor X) go through architecture design, then architecture tuning against the fixed program (giving tuned cores in HDL), then fabrication into an IC]

Page 8

Introduction: architecture tuning

Examples of tuning optimizations
  Memory hierarchy: no cache, L1 cache, L1+L2 cache
  Cache organization: size, associativity, write policies
  Bus structure, data/address encoding
  DMA block sizes
  Microprocessor optimizations
    Internal small-loop table
    Controller partitioning
    Datapath shortcuts
    Register file copies

Page 9

Introduction: Tuning is a special case of Y-Chart iteration

Philips/TriMedia approach of simultaneously developing architecture and its applications

[Figure: Y-chart with nodes Architecture, Applications, Mapping, Analysis, and Numbers; one part is marked “Our focus”]

Page 10

Problem description

Focus of this work: tuning a microcontroller to its program
  Goal is reduced power without performance loss

Restrict tuning to maintain exact instruction set compatibility
  No instructions may be added or deleted
  Thus, no modification to software development environment
  Also, no problems with porting software to/from other versions of the microcontroller

Instruction set incompatibility can be a show stopper
  Maintenance/upgrades/re-porting of binaries over the lifetime of product and for product variations is a key issue
  Likewise, a stable software development environment is needed

Page 11

Previous work

Application-specific instruction-set processors [Fisher99]
  Customize a microprocessor to its application(s)
    Delete unnecessary instructions, add new ones along with accompanying datapath extensions
    e.g., Tensilica
  Customized instruction-set requires customized development tools (e.g., compiler, debugger)

Tuning compiler to architecture [Tiwari et al 94]
  Architectural description languages to inform compiler of architecture features [Halambi et al 99]

Tuning cache and cache/bus organization to application [Givargis et al 99]

Page 12

Tuning environment

Currently for the 8051 microcontroller
  Starts from VHDL synthesizable model of 8051 (soft core)
  Uses Synopsys synthesis, simulation and power analysis
  Uses 8051 instruction-set simulator
  Uses numerous scripts

Goal of the environment
  Understand how power is being consumed for a particular application, so that modifications to the architecture (or application) can be made to minimize that power

Three main tools
  Architectural view
  Instruction-set view
  Program/data memory view

Page 13

Tuning environment: architectural view tool

[Figure: tool flow. Program binary -> ROM generator -> ROM entity; Microprocessor soft core -> RT-synthesizer -> Microprocessor structure; ROM entity and Microprocessor structure -> Simulator and power analyzer -> “Flat” power data -> Structural hierarchical power data translator and xdu display]

Component   Power (mW)
ROM         1.04
ALU         1.62
RAM         1.42
CTRL        2.69
DECODER     0.07
Total       7.66
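The translator step in this flow turns flat power data into per-component totals. The slides do not show the flat data format, so the following is only a minimal sketch under assumptions: it supposes each flat entry is a slash-separated instance path with a power value, and the paths and numbers used here are hypothetical placeholders, not measurements from the 8051 model.

```python
from collections import defaultdict


def roll_up(flat):
    """Sum each leaf's power into every ancestor path.

    `flat` maps a hypothetical instance path such as "top/alu/adder"
    to a power value; the result also contains "top" and "top/alu".
    """
    totals = defaultdict(float)
    for path, power in flat.items():
        parts = path.split("/")
        for depth in range(1, len(parts) + 1):
            totals["/".join(parts[:depth])] += power
    return dict(totals)


if __name__ == "__main__":
    # Placeholder values chosen only to show the roll-up, not real data.
    flat = {"top/alu/adder": 1.0, "top/alu/logic": 0.5, "top/ram": 2.0}
    for path, power in sorted(roll_up(flat).items()):
        print(f"{power:6.2f}  {path}")
```

A listing of this shape (one total per hierarchy node) is the kind of structure a tree viewer such as the xdu display named on the slide could present.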

Page 14

Tuning environment: instruction-set view tool

[Figure: tool flow, repeated per instruction. Binaries to exercise instruction 1, 2, 3, ... -> ROM generator -> ROM entity; ROM entity and Microprocessor structure -> Simulator and power analyzer -> Flat power data for instruction 1, 2, 3, ... -> Power data collector, structural power data translator, and xdu display]

Instruction   Power (mW)
ADDC_1        7.340834
ADD_1         7.350741
ANL_1         6.631394
CLR_1         3.76228
CPL_1         5.481627
DA            5.28897
DEC_1         5.368807
DIV           7.716592
INC_1         4.662862
MOVC_1        6.078014
MOVC_2        5.021021
MOV_1         5.577664
MOV_2         6.164267
MUL           5.522886
NOP           4.900275
ORL_1         6.954121
POP           8.103867
PUSH          8.7116

Page 15

Tuning environment: program/data memory view tool

[Figure: tool flow. Program binary -> Instruction-set simulator -> Program/data memory access frequencies and power; together with per-instruction power data (from previous tool) -> Program hierarchy power translator and xdu display]

Addr    Ins     Freq   Pwr       Freq*Pwr
00000   LJMP    1      0         0
00003   MOV_9   108    5.46067   589.752
00005   MOV_9   108    5.46067   589.752
00007   MOV_9   108    5.46067   589.752
00009   MOV_9   108    5.46067   589.752
00011   RET     108    0         0
00012   MOV_9   27     5.46067   147.438
00014   MOV_9   27     5.46067   147.438
00016   MOV_9   27     5.46067   147.438
00018   MOV_9   27     5.46067   147.438
00020   MOV_4   27     4.83507   130.547
00022   LCALL   27     0         0

Addr    Purpose   Accesses
00128   P0        1311
00129   SP        70317
00130   DPL       31189
00131   DPH       7977
00144   P1        161
00208   PSW       413527
00224   ACC       360949
00240   B         2598
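The Freq*Pwr column in the program-memory table is the execution count of each instruction multiplied by its per-instruction power from the instruction-set view. The sketch below recomputes that column from the rows shown on the slide and ranks the hottest addresses. The function names are illustrative, and the zero power entries for LJMP, RET, and LCALL are taken from the table as-is.

```python
# Rows transcribed from the slide: (address, instruction, frequency, power in mW).
PROFILE = [
    (0, "LJMP", 1, 0.0),
    (3, "MOV_9", 108, 5.46067),
    (5, "MOV_9", 108, 5.46067),
    (7, "MOV_9", 108, 5.46067),
    (9, "MOV_9", 108, 5.46067),
    (11, "RET", 108, 0.0),
    (12, "MOV_9", 27, 5.46067),
    (14, "MOV_9", 27, 5.46067),
    (16, "MOV_9", 27, 5.46067),
    (18, "MOV_9", 27, 5.46067),
    (20, "MOV_4", 27, 4.83507),
    (22, "LCALL", 27, 0.0),
]


def weighted_power(profile):
    """Return (address, instruction, freq * power) for every row."""
    return [(addr, ins, freq * pwr) for addr, ins, freq, pwr in profile]


def hot_spots(profile, top=3):
    """Return the `top` rows with the largest freq * power product."""
    ranked = sorted(weighted_power(profile), key=lambda row: row[2], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    for addr, ins, product in weighted_power(PROFILE):
        print(f"{addr:05d} {ins:<6} {product:10.3f}")
```

Sorting by the product rather than by frequency alone keeps cheap but frequent instructions from crowding out expensive ones.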

Page 16

Tuning environment

Inputs: Program binary, Microprocessor core

Tool                              Run time   Output
Program/data memory view tool     seconds    Program power data
Architectural view tool           1 hour     Architecture power data
Instruction-set power view tool   1 day      Instruction-set power data

Page 17

Design flow using the tuning environment

[Figure: flowchart with boxes “Run program/data memory view tool”, “Run architecture view tool”, “Run instruction-set view tool”, “Change application”, and “Change architecture”, a decision “Satisfied?” (Yes/No), and “DONE”]

Page 18

Experiments

Started with 8051 soft core in VHDL
Tuning environment was used to
  Examine where power consumption was occurring for a given application
  Quickly evaluate the impact of tuning optimizations
These are early results; much more work remains

Page 19

Power consumption of the initial 8051 model

Power consumption
  Mainly due to switching wires
    Any wire whose value changed (from 0 to 1) consumes power
  Want to minimize switching

8051 power consumption
  5 main components
  Controller, RAM, and ALU are the most expensive components
  These components have potential for general optimizations

Total gates: 25854
Average power: 37.1824 mW

[Chart: Power consumption of major components for initial model of 8051]

Component   Chart value (units not shown)
CTR         22683
RAM         8555
DEC         808
ALU         10389
ROM         2381

Page 20

General optimizations made to the 8051

Prevent unnecessary switching on wires connecting to memories
  Wires connecting processor to memories are high capacitance
  They were switching even when not being used
  So we inserted latches to hold the previous value, a standard power-saving technique

Prevent unnecessary switching in decoder and ALU
  Again, by latching the inputs coming from the controller

Fetch instruction bytes only when needed
  Hold ROM output when not being read
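The effect of holding the previous value can be illustrated with a toy switching-activity count. This is only a sketch under assumptions: it treats the number of bit flips between consecutive bus values as a rough proxy for switching power (counting flips in both directions), and the bus trace and enable pattern are made up for illustration, not taken from the 8051 model.

```python
def toggles(values):
    """Count bit flips between consecutive values seen on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def latched(values, enables, initial=0):
    """Values seen on the bus when a latch holds the last enabled value."""
    held = initial
    out = []
    for value, enable in zip(values, enables):
        if enable:
            held = value  # the latch is transparent only while the bus is in use
        out.append(held)
    return out


if __name__ == "__main__":
    # Hypothetical trace: the bus is actually used only in cycles 0, 3, and 6.
    internal = [0x12, 0xFF, 0x00, 0x13, 0xAA, 0x55, 0x14]
    used = [True, False, False, True, False, False, True]
    print("unlatched flips:", toggles(internal))
    print("latched flips:  ", toggles(latched(internal, used)))
```

In this made-up trace the values delivered in the used cycles are identical in both cases; only the activity during unused cycles is removed.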

Page 21

Power after general optimizations

Overall power reduction from 37.2 to 11.6 mW.

Total gates: 25951
Average power: 11.6025 mW
% improvements: ROM 82.9%, RAM 70.5%, ALU 60.0%, CTR 19.9%

[Chart: Power consumption of major components after general optimizations]

Component   Chart value (units not shown)
CTR         18167
RAM         2527
DEC         808
ALU         4159
ROM         406

[Chart: History of Power Improvements, bars for Original and Optimized, axis 0 to 40]
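The percentage improvements quoted on this slide can be rechecked from the per-component chart values. The sketch below assumes the chart numbers pair with the labels in the order listed (CTR, RAM, DEC, ALU, ROM); that pairing is an inference from the transcript, though it does reproduce the slide's percentages.

```python
# Chart values for the initial and the generally optimized 8051 model.
ORIGINAL = {"CTR": 22683, "RAM": 8555, "DEC": 808, "ALU": 10389, "ROM": 2381}
OPTIMIZED = {"CTR": 18167, "RAM": 2527, "DEC": 808, "ALU": 4159, "ROM": 406}


def improvement(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100.0 * (before - after) / before


def per_component(before, after):
    """Percentage reduction for each component, rounded to one decimal."""
    return {name: round(improvement(before[name], after[name]), 1) for name in before}


if __name__ == "__main__":
    for name, pct in per_component(ORIGINAL, OPTIMIZED).items():
        print(f"{name}: {pct}%")
    # Overall reduction from the average-power figures on the slides (mW).
    print(f"overall: {improvement(37.1824, 11.6025):.1f}%")
```

The same `improvement` helper applies to the later slides, for example to the average-power figures before and after each tuning optimization.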

Page 22

Tuning optimizations

Sought to tune the microprocessor to a particular application
  GCD (greatest common divisor) computation

Tuning optimizations invoked
  1) Replace frequently-accessed RAM locations by internal registers
  2) Create datapath shortcuts for most common instructions
  3) Partition the controller into a big controller and a small controller, with the small one handling the most frequently-executed GCD instructions

Page 23

Sample tuning optimization

Observation
  RAM consumes much power
  Address 224 accessed frequently

Possible tuning optimization
  Replace this RAM location by a register

Steps
  Modify VHDL model
  Run all three view tools

Results
  Power reduction: 7.67 to 7.27 mW
  RAM reduced from 1.42 to 0.8 mW, CTRL increased slightly

Component   Power (mW)
ROM         1.04
ALU         1.62
RAM         1.42
CTRL        2.69
DECODER     0.07
Total       7.66

Addr    Purpose   Accesses
00128   P0        1311
00129   SP        70317
00130   DPL       31189
00131   DPH       7977
00144   P1        161
00208   PSW       413527
00224   ACC       360949
00240   B         2598
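The observation step amounts to ranking data-memory locations by access count. A minimal sketch of that ranking, using the access table from the slide, is shown here; the function name and the choice of returning the top two locations are illustrative assumptions, not part of the authors' tools.

```python
# Data-memory access counts from the slide: address -> (purpose, accesses).
ACCESSES = {
    128: ("P0", 1311),
    129: ("SP", 70317),
    130: ("DPL", 31189),
    131: ("DPH", 7977),
    144: ("P1", 161),
    208: ("PSW", 413527),
    224: ("ACC", 360949),
    240: ("B", 2598),
}


def register_candidates(accesses, count=2):
    """Return the `count` most-accessed locations as (addr, purpose, accesses)."""
    ranked = sorted(accesses.items(), key=lambda item: item[1][1], reverse=True)
    return [(addr, purpose, hits) for addr, (purpose, hits) in ranked[:count]]


def share(accesses, candidates):
    """Fraction of all listed accesses that go to the chosen candidates."""
    total = sum(hits for _, hits in accesses.values())
    return sum(hits for _, _, hits in candidates) / total


if __name__ == "__main__":
    picked = register_candidates(ACCESSES)
    print(picked, f"{share(ACCESSES, picked):.1%}")
```

On this table the two most-accessed locations are PSW (208) and ACC (224), the same pair the deck moves out of the RAM entity.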

Page 24

Replacing certain RAM locations by registers

PSW and accumulator are separated from RAM entity, placed in internal registers

Total gates: 26465
Average power: 9.7684 mW
% improvements: RAM 46.1%, Overall 15.8%

[Chart: Power consumption of GCD after removing PSW and accumulator from RAM entity]

Component   Chart value (units not shown)
CTR         18170
RAM         1361
DEC         808
ALU         4192
ROM         406

[Chart: History of Power Improvements, bars for Original, Optimized, and ACC/PSW, axis 0 to 40]

Page 25

Optimized datapath

MOV from reg7 to ACC very common
Add “shortcut” signal to register file
Avoids having data go through ALU

Total gates: 26315
Power reduced by 0.32 mW (2.7%)
Average power: 11.2857 mW

[Chart: Power consumption of GCD after datapath optimization]

Component   Chart value (units not shown)
CTR         20340
RAM         2672
DEC         762
ALU         4134
ROM         394

[Chart: History of Power Improvements, bars for Original, Optimized, ACC/PSW, and Datapath, axis 0 to 40]

Addr    Ins     Freq   Pwr       Freq*Pwr
00000   LJMP    1      0         0
00003   MOV_9   108    5.46067   589.752
00005   MOV_9   108    5.46067   589.752
00007   MOV_9   108    5.46067   589.752
00009   MOV_9   108    5.46067   589.752
00011   RET     108    0         0
00012   MOV_9   27     5.46067   147.438
00014   MOV_9   27     5.46067   147.438
00016   MOV_9   27     5.46067   147.438
00018   MOV_9   27     5.46067   147.438
00020   MOV_4   27     4.83507   130.547
00022   LCALL   27     0         0

Page 26

Controller Partitioning

Motivation
  In many applications, 90% of the time is spent in 10% of the code (or some similar ratio)
  So let’s partition the controller into two, one handling the 10% of frequently executed code
  This smaller controller should consume less power

Results
  Average power reduced from 11.6 mW to 11.3 mW (2.6%)
  Total gates: 28731
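The slides do not say how the instructions for the small controller were chosen. One plausible way, sketched here purely as an assumption, is a greedy pick of the most frequently executed instruction types until a target share of executed instructions is covered. The profile is the excerpt of the program-memory table shown earlier in the deck, so the result only illustrates the idea for those rows.

```python
from collections import Counter

# (instruction, execution count) rows from the program-memory table excerpt.
PROFILE = [
    ("LJMP", 1),
    ("MOV_9", 108), ("MOV_9", 108), ("MOV_9", 108), ("MOV_9", 108),
    ("RET", 108),
    ("MOV_9", 27), ("MOV_9", 27), ("MOV_9", 27), ("MOV_9", 27),
    ("MOV_4", 27),
    ("LCALL", 27),
]


def hot_instructions(profile, coverage=0.90):
    """Greedily pick instruction types until `coverage` of executions is reached.

    Returns the chosen instruction names and the fraction they cover.
    """
    counts = Counter()
    for ins, freq in profile:
        counts[ins] += freq
    total = sum(counts.values())
    chosen, covered = [], 0
    for ins, freq in counts.most_common():
        if covered >= coverage * total:
            break
        chosen.append(ins)
        covered += freq
    return chosen, covered / total


if __name__ == "__main__":
    chosen, fraction = hot_instructions(PROFILE)
    print(chosen, f"{fraction:.1%}")
```

For this excerpt two instruction types already cover more than 90% of the executed instructions, which matches the 90/10 motivation on the slide.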

Page 27

Conclusions

Described an environment for tuning a microprocessor to its application for low power
  Full instruction set compatibility
  Multiple views help find power hogs
  Fully automated

Focus is now on developing tuning optimizations
  Controller partitioning, small-loop table, datapath shortcuts, register-file copies, etc.
  Investigate possibility of automating tuning optimizations, develop more general tuning methodology

Environment for the 8051 is available on the web: http://www.cs.ucr.edu/~dalton