![Page 1: Architecture Tuning in Embedded Systems Greg Stitt, Frank Vahid, Tony Givargis Dept. of Computer Science & Engineering University of California, Riverside](https://reader035.vdocuments.us/reader035/viewer/2022062407/56649d615503460f94a43b28/html5/thumbnails/1.jpg)
Architecture Tuning in Embedded Systems
Greg Stitt, Frank Vahid, Tony Givargis
Dept. of Computer Science & Engineering, University of California, Riverside
Roman Lysecky
Department of IP Management, Conexant, Newport Beach
This work was supported by the National Science Foundation under grants CCR-9811164 and CCR-9876006, and by a Design Automation Conference graduate scholarship.
This work is being presented at CASES’00 (Compilers, Architectures and Synthesis for Embedded Systems), November 18-19, 2000, San Jose, CA.
A “short list” of embedded systems
And the list goes on and on
Anti-lock brakes; Auto-focus cameras; Automatic teller machines; Automatic toll systems; Automatic transmission; Avionic systems; Battery chargers; Camcorders; Cell phones; Cell-phone base stations; Cordless phones; Cruise control; Curbside check-in systems; Digital cameras; Disk drives; Electronic card readers; Electronic instruments; Electronic toys/games; Factory control; Fax machines; Fingerprint identifiers; Home security systems; Life-support systems; Medical testing systems
Modems; MPEG decoders; Network cards; Network switches/routers; On-board navigation; Pagers; Photocopiers; Point-of-sale systems; Portable video games; Printers; Satellite phones; Scanners; Smart ovens/dishwashers; Speech recognizers; Stereo systems; Teleconferencing systems; Televisions; Temperature controllers; Theft tracking systems; TV set-top boxes; VCR’s, DVD players; Video game consoles; Video phones; Washers and dryers
Introduction: Traditional micro-processor use in embedded systems
Tasks (not necessarily in the given order)
(1) Buy a microprocessor IC (integrated circuit)
(2) Integrate it with other IC’s onto a board and insert it into an embedded system
(3) Download a software program
Notice that the processor IC is designed independent of the software
- Different microprocessor variations thus exist, like low-power or high-performance IC’s
[Figure: processor (1), board (2), software (3)]
Introduction: Modern core-based approach
Tasks
(1) Buy a microprocessor CORE
- Hard: layout; Firm: structural HDL; Soft: synthesizable HDL
- You are buying Intellectual Property, like a file that may come on a floppy, CD-ROM, over the web, etc. You are NOT buying hardware.
(2) Design a system-on-a-chip (SOC) from this and other cores
(3) Fabricate a SOC IC
(4) Insert the IC into an embedded system
(5) Download a software program
[Figure: steps 1 to 5, showing the processor HDL, the fabricated IC, and the software]
Introduction: embedded system unique feature of fixed program
SOC’s implementing an embedded system have a unique feature
- Implements a particular application
- Thus, the processor may execute a single fixed program that never changes
- Unlike desktop systems, which execute a variety of programs
- Examples: digital camera, automobile cruise-controller
We can exploit this fixed-program feature
- For example, by using mask-programmed ROM
- But much more can be done
[Figure callout: The software in here never changes after production]
Introduction: Proposed core-based approach with architecture tuning
Tasks
(1) Buy a microprocessor core
(2) Design a system-on-a-chip (SOC) from this and other cores
(3) TUNE the SOC architecture to a software program
(4) Fabricate a SOC IC
(5) Insert the IC into an embedded system
(6) Download the software program
[Figure: steps 1 to 6, showing the processor HDL, the tuned processor HDL, the fabricated IC, and the software]
Introduction: architecture tuning
Architecture tuning
- A way to exploit the fixed-program feature of embedded systems
- First, do architecture design for the particular application
- Then, “tune” the core-based system architecture to the particular application program, before IC fabrication
- Goals: better performance, power, size
[Figure: flow from a core library (Peripheral A, Peripheral B, Processor X) through architecture design and architecture tuning against the fixed program, yielding tuned cores in HDL, followed by fabrication into an IC]
Introduction: architecture tuning
Examples of tuning optimizations
- Memory hierarchy: no cache, L1 cache, L1+L2 cache
- Cache organization: size, associativity, write policies
- Bus structure, data/address encoding
- DMA block sizes
- Microprocessor optimizations
  - Internal small-loop table
  - Controller partitioning
  - Datapath shortcuts
  - Register file copies
Introduction: Tuning is a special case of Y-Chart iteration
Philips/TriMedia approach of simultaneously developing architecture and its applications
[Figure: Y-chart with Architecture, Applications, Mapping, Analysis, and Numbers, with “Our focus” marked]
Problem description
Focus of this work: Tuning a microcontroller to its program
- Goal is reduced power without performance loss
Restrict tuning to maintain exact instruction set compatibility
- No instructions may be added or deleted
- Thus, no modification to software development environment
- Also, no problems with porting software to/from other versions of the microcontroller
Instruction set incompatibility can be a show stopper
- Maintenance/upgrades/re-porting of binaries over the lifetime of product and for product variations is a key issue
- Likewise, a stable software development environment is needed
Previous work
Application-specific instruction-set processors [Fisher99]
- Customize a microprocessor to its application(s)
  - Delete unnecessary instructions, add new ones along with accompanying datapath extensions
  - e.g., Tensilica
- Customized instruction-set requires customized development tools (e.g., compiler, debugger)
Tuning compiler to architecture [Tiwari et al 94]
- Architectural description languages to inform compiler of architecture features [Halambi et al 99]
Tuning cache and cache/bus organization to application [Givargis et al 99]
Tuning environment
Currently for the 8051 microcontroller
- Starts from VHDL synthesizable model of 8051 (soft core)
- Uses Synopsys synthesis, simulation and power analysis
- Uses 8051 instruction-set simulator
- Uses numerous scripts
Goal of the environment
- Understand how power is being consumed for a particular application, so that modifications to the architecture (or application) can be made to minimize that power
Three main tools
- Architectural view
- Instruction-set view
- Program/data memory view
Tuning environment: architectural view tool
[Figure: architectural view tool flow. Blocks: microprocessor soft core, RT-synthesizer, microprocessor structure, program binary, ROM generator, ROM entity, simulator and power analyzer, “flat” power data, structural hierarchical power data translator and xdu display]

| Component | Power (mW) |
| --- | --- |
| ROM | 1.04 |
| ALU | 1.62 |
| RAM | 1.42 |
| CTRL | 2.69 |
| DECODER | 0.07 |
| Total | 7.66 |
Tuning environment: instruction-set view tool
[Figure: instruction-set view tool flow. For each instruction (1, 2, 3, ...), binaries to exercise that instruction go through the ROM generator to a ROM entity; the simulator and power analyzer use it with the microprocessor structure to produce flat power data for that instruction; a power data collector, structural power data translator, and xdu display present the results]

| Instruction | Power (mW) |
| --- | --- |
| ADDC_1 | 7.340834 |
| ADD_1 | 7.350741 |
| ANL_1 | 6.631394 |
| CLR_1 | 3.76228 |
| CPL_1 | 5.481627 |
| DA | 5.28897 |
| DEC_1 | 5.368807 |
| DIV | 7.716592 |
| INC_1 | 4.662862 |
| MOVC_1 | 6.078014 |
| MOVC_2 | 5.021021 |
| MOV_1 | 5.577664 |
| MOV_2 | 6.164267 |
| MUL | 5.522886 |
| NOP | 4.900275 |
| ORL_1 | 6.954121 |
| POP | 8.103867 |
| PUSH | 8.7116 |
Tuning environment: program/data memory view tool
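[Figure: program/data memory view tool flow. Blocks: program binary, instruction-set simulator, per-instruction power data (from previous tool), program/data memory access frequencies and power, program hierarchy power translator and xdu display]

| Addr | Ins | Freq | Pwr | Freq*Pwr |
| --- | --- | --- | --- | --- |
| 00000 | LJMP | 1 | 0 | 0 |
| 00003 | MOV_9 | 108 | 5.46067 | 589.752 |
| 00005 | MOV_9 | 108 | 5.46067 | 589.752 |
| 00007 | MOV_9 | 108 | 5.46067 | 589.752 |
| 00009 | MOV_9 | 108 | 5.46067 | 589.752 |
| 00011 | RET | 108 | 0 | 0 |
| 00012 | MOV_9 | 27 | 5.46067 | 147.438 |
| 00014 | MOV_9 | 27 | 5.46067 | 147.438 |
| 00016 | MOV_9 | 27 | 5.46067 | 147.438 |
| 00018 | MOV_9 | 27 | 5.46067 | 147.438 |
| 00020 | MOV_4 | 27 | 4.83507 | 130.547 |
| 00022 | LCALL | 27 | 0 | 0 |

| Addr | Purpose | Accesses |
| --- | --- | --- |
| 00128 | P0 | 1311 |
| 00129 | SP | 70317 |
| 00130 | DPL | 31189 |
| 00131 | DPH | 7977 |
| 00144 | P1 | 161 |
| 00208 | PSW | 413527 |
| 00224 | ACC | 360949 |
| 00240 | B | 2598 |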
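The Freq*Pwr column in the program view is each instruction's execution count multiplied by its per-instruction power from the instruction-set view tool. The sketch below reproduces that column from the rows shown on this slide and sums the products per mnemonic. The names `PROFILE`, `POWER_MW`, `weighted_rows`, and `totals_by_instruction` are illustrative assumptions, not part of the authors' tool.

```python
from collections import defaultdict

# (address, instruction, execution count) rows copied from the slide's table.
PROFILE = [
    (0, "LJMP", 1), (3, "MOV_9", 108), (5, "MOV_9", 108), (7, "MOV_9", 108),
    (9, "MOV_9", 108), (11, "RET", 108), (12, "MOV_9", 27), (14, "MOV_9", 27),
    (16, "MOV_9", 27), (18, "MOV_9", 27), (20, "MOV_4", 27), (22, "LCALL", 27),
]

# Per-instruction power in mW as listed in the table; instructions shown
# with 0 on the slide are kept at 0 here.
POWER_MW = {"MOV_9": 5.46067, "MOV_4": 4.83507,
            "LJMP": 0.0, "RET": 0.0, "LCALL": 0.0}


def weighted_rows(profile, power_mw):
    """Return (address, instruction, freq, power, freq * power) per row."""
    return [(addr, ins, freq, power_mw[ins], freq * power_mw[ins])
            for addr, ins, freq in profile]


def totals_by_instruction(rows):
    """Sum the freq * power products per instruction mnemonic."""
    totals = defaultdict(float)
    for _addr, ins, _freq, _pwr, product in rows:
        totals[ins] += product
    return dict(totals)


rows = weighted_rows(PROFILE, POWER_MW)
totals = totals_by_instruction(rows)
hottest = max(totals, key=totals.get)  # mnemonic with the largest total
```

Sorting `totals` in descending order gives a simple ranking of which instructions are worth targeting with a tuning optimization.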
Tuning environment
[Figure: the program binary and microprocessor core feed three tools, each labeled with its run time]
- Program/data memory view tool (seconds): program power data
- Architectural view tool (1 hour): architecture power data
- Instruction-set power view tool (1 day): instruction-set power data
Design flow using the tuning environment
[Figure: design-flow loop. Run the program/data memory view tool, the architecture view tool, and the instruction-set view tool. Satisfied? Yes: DONE. No: change the application or change the architecture, then run the tools again.]
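The design flow on this slide is a measure, change, measure loop. The sketch below is one possible reading of it, under stated assumptions: `measure` stands in for running the view tools, each entry in `candidates` stands in for one architecture or application change, and a change is kept only if measured power drops. The keep-if-better rule, the function names, and the toy numbers are all hypothetical and do not come from the slides.

```python
def tune(design, measure, candidates, target_mw):
    """Try each candidate change in turn and keep it only if power drops.

    Stops early once the measured power reaches the target ("Satisfied?").
    Returns the final design and the history of accepted power values.
    """
    best = measure(design)
    history = [best]
    for change in candidates:
        if best <= target_mw:
            break  # satisfied: DONE
        trial = change(design)   # "change architecture / application"
        power = measure(trial)   # "run the view tools"
        if power < best:
            design, best = trial, power
            history.append(best)
    return design, history


# Toy stand-ins (hypothetical values in mW, for illustration only).
def measure(design):
    return sum(design.values())

def halve_ram(d):
    return {**d, "RAM": d["RAM"] / 2}

def bloat_ctrl(d):
    return {**d, "CTRL": d["CTRL"] + 1.0}

def trim_alu(d):
    return {**d, "ALU": d["ALU"] - 0.5}


start = {"CTRL": 4.0, "RAM": 3.0, "ALU": 2.0}
final, history = tune(start, measure, [halve_ram, bloat_ctrl, trim_alu], 7.2)
```

In this toy run the second change is rejected because it raises power, which mirrors the "No" branch looping back to try a different change.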
Experiments
Started with 8051 soft core in VHDL
Tuning environment was used to
- Examine where power consumption was occurring for a given application
- Quickly evaluate the impact of tuning optimizations
These are early results, much more work remains
Power consumption of the initial 8051 model
Power consumption
- Mainly due to switching wires
- Any wire whose value changed (from 0 to 1) consumes power
- Want to minimize switching
8051 power consumption
- 5 main components
- Controller, RAM, and ALU are the most expensive components
- These components have potential for general optimizations
Total gates: 25854
Average power: 37.1824 mW
[Figure: Power consumption of major components for initial model of 8051]

| Component | Chart value (units not given) |
| --- | --- |
| CTR | 22683 |
| RAM | 8555 |
| DEC | 808 |
| ALU | 10389 |
| ROM | 2381 |
General optimizations made to the 8051
Prevent unnecessary switching on wires connecting to memories
- Wires connecting processor to memories are high capacitance
- They were switching even when not being used
- So we inserted latches to hold the previous value, a standard power-saving technique
Prevent unnecessary switching in decoder and ALU
- Again, by latching the inputs coming from the controller
Fetch instruction bytes only when needed
- Hold ROM output when not being read
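To illustrate why holding the previous value saves power, the sketch below counts bit flips on a bus with and without a hold latch. It is a behavioral toy, not the authors' VHDL. The function names, the sample bus trace, the enable pattern, and the choice to count every bit flip (rather than only 0-to-1 transitions) are assumptions made for illustration.

```python
def count_toggles(values, width=8):
    """Count bit flips between consecutive values on a `width`-bit bus."""
    mask = (1 << width) - 1
    toggles = 0
    prev = values[0] & mask
    for v in values[1:]:
        v &= mask
        toggles += bin(prev ^ v).count("1")  # differing bits = flipped wires
        prev = v
    return toggles


def hold_when_idle(values, enables, initial=0):
    """Model a latch: pass the value when enabled, else hold the last one."""
    held = []
    last = initial
    for v, en in zip(values, enables):
        if en:
            last = v
        held.append(last)
    return held


# Hypothetical bus trace: the bus carries don't-care values when not enabled.
raw = [0x00, 0xFF, 0x12, 0xA5, 0x12, 0x00, 0x34]
enables = [1, 0, 1, 0, 1, 0, 1]

held = hold_when_idle(raw, enables)
raw_toggles = count_toggles(raw)
held_toggles = count_toggles(held)
```

With this trace the latched bus flips far fewer wires than the raw bus, because the don't-care values in idle cycles never reach the high-capacitance wires.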
Power after general optimizations
Overall power reduction from 37.2 to 11.6 mW.
Total gates: 25951
Average power: 11.6025 mW
% improvements: ROM 82.9%, RAM 70.5%, ALU 60.0%, CTR 19.9%
[Figure: Power consumption of major components after general optimizations]

| Component | Chart value (units not given) |
| --- | --- |
| CTR | 18167 |
| RAM | 2527 |
| DEC | 808 |
| ALU | 4159 |
| ROM | 406 |

[Figure: History of Power Improvements, bar chart with axis from 0 to 40: Original, Optimized]
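The percentage improvements quoted on this slide can be recomputed from the per-component chart values of the initial and optimized models. The sketch below does that arithmetic. It assumes the chart values pair with the component labels in the order listed (CTR, RAM, DEC, ALU, ROM); the dictionary and function names are illustrative.

```python
# Per-component chart values (units not given on the slides).
ORIGINAL = {"CTR": 22683, "RAM": 8555, "DEC": 808, "ALU": 10389, "ROM": 2381}
OPTIMIZED = {"CTR": 18167, "RAM": 2527, "DEC": 808, "ALU": 4159, "ROM": 406}


def percent_improvement(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


improvements = {
    name: round(percent_improvement(ORIGINAL[name], OPTIMIZED[name]), 1)
    for name in ORIGINAL
}

# Overall figure from the reported average power (mW) of the two models.
overall = round(percent_improvement(37.1824, 11.6025), 1)
```

Under that pairing, the computed values match the slide's ROM, RAM, ALU, and CTR percentages, and the decoder is unchanged.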
Tuning optimizations
Sought to tune the microprocessor to a particular application
- GCD (Greatest common divisor) computation
Tuning optimizations invoked
1) Replace frequently-accessed RAM locations by internal registers
2) Create datapath shortcuts for most common instructions
3) Partition the controller into a big controller and a small controller, with the small one handling the most frequently-executed GCD instructions
Sample tuning optimization
Observation
- RAM consumes much power
- Address 224 accessed frequently
Possible tuning optimization
- Replace this RAM location by a register
Steps
- Modify VHDL model
- Run all three view tools
Results
- Power reduction: 7.67 to 7.27 mW
- RAM reduced from 1.42 to 0.8 mW, CTRL increased slightly

| Component | Power (mW) |
| --- | --- |
| ROM | 1.04 |
| ALU | 1.62 |
| RAM | 1.42 |
| CTRL | 2.69 |
| DECODER | 0.07 |
| Total | 7.66 |

| Addr | Purpose | Accesses |
| --- | --- | --- |
| 00128 | P0 | 1311 |
| 00129 | SP | 70317 |
| 00130 | DPL | 31189 |
| 00131 | DPH | 7977 |
| 00144 | P1 | 161 |
| 00208 | PSW | 413527 |
| 00224 | ACC | 360949 |
| 00240 | B | 2598 |
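One way to turn the access table into register candidates is to pick the smallest set of locations that covers most accesses. The sketch below does this with the counts from the slide. The 80% coverage threshold and the names `ACCESSES` and `register_candidates` are assumptions for illustration; the slides do not state how the authors chose the locations.

```python
# address -> (purpose, access count), copied from the slide's table.
ACCESSES = {
    128: ("P0", 1311), 129: ("SP", 70317), 130: ("DPL", 31189),
    131: ("DPH", 7977), 144: ("P1", 161), 208: ("PSW", 413527),
    224: ("ACC", 360949), 240: ("B", 2598),
}


def register_candidates(accesses, share=0.8):
    """Return the hottest locations that together cover `share` of accesses."""
    total = sum(count for _name, count in accesses.values())
    ranked = sorted(accesses.items(), key=lambda kv: kv[1][1], reverse=True)
    picked, covered = [], 0
    for addr, (name, count) in ranked:
        if covered >= share * total:
            break
        picked.append((addr, name, count))
        covered += count
    return picked


candidates = register_candidates(ACCESSES)
names = [name for _addr, name, _count in candidates]
```

With the assumed threshold, the selection is the PSW and the accumulator, the two most heavily accessed locations in the table.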
Replacing certain RAM locations by registers
PSW and accumulator are separated from RAM entity, placed in internal registers
Total gates: 26465
Average power: 9.7684 mW
% improvements: RAM 46.1%, Overall 15.8%
[Figure: Power consumption of GCD after removing PSW and accumulator from RAM entity]

| Component | Chart value (units not given) |
| --- | --- |
| CTR | 18170 |
| RAM | 1361 |
| DEC | 808 |
| ALU | 4192 |
| ROM | 406 |

[Figure: History of Power Improvements, bar chart with axis from 0 to 40: Original, Optimized, ACC/PSW]
Optimized datapath
MOV from reg7 to ACC very common
Add “shortcut” signal to register file
Avoids having data go through ALU
Total gates: 26315
Power reduced by 0.32 mW (2.7%)
Average power: 11.2857 mW
[Figure: Power consumption of GCD after datapath optimization]

| Component | Chart value (units not given) |
| --- | --- |
| CTR | 20340 |
| RAM | 2672 |
| DEC | 762 |
| ALU | 4134 |
| ROM | 394 |

[Figure: History of Power Improvements, bar chart with axis from 0 to 40: Original, Optimized, ACC/PSW, Datapath]

| Addr | Ins | Freq | Pwr | Freq*Pwr |
| --- | --- | --- | --- | --- |
| 00000 | LJMP | 1 | 0 | 0 |
| 00003 | MOV_9 | 108 | 5.46067 | 589.752 |
| 00005 | MOV_9 | 108 | 5.46067 | 589.752 |
| 00007 | MOV_9 | 108 | 5.46067 | 589.752 |
| 00009 | MOV_9 | 108 | 5.46067 | 589.752 |
| 00011 | RET | 108 | 0 | 0 |
| 00012 | MOV_9 | 27 | 5.46067 | 147.438 |
| 00014 | MOV_9 | 27 | 5.46067 | 147.438 |
| 00016 | MOV_9 | 27 | 5.46067 | 147.438 |
| 00018 | MOV_9 | 27 | 5.46067 | 147.438 |
| 00020 | MOV_4 | 27 | 4.83507 | 130.547 |
| 00022 | LCALL | 27 | 0 | 0 |
Controller Partitioning
Motivation
- In many applications, 90% of the time is spent in 10% of the code (or some similar ratio)
- So let’s partition the controller into two, one handling the 10% of frequently executed code
- This smaller controller should consume less power
Results
- Average power reduced from 11.6 mW to 11.3 mW (2.6%)
- Total gates: 28731
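The motivation for controller partitioning can be expressed as a time-weighted average: if a small controller is active for most of the run, its lower power dominates. The sketch below is a back-of-the-envelope model only, not the authors' analysis. It ignores the extra gates, any power drawn by the inactive controller, and the cost of switching between controllers, and the power values are hypothetical placeholders rather than measurements.

```python
def average_controller_power(hot_fraction, small_mw, big_mw):
    """Time-weighted controller power when a small controller handles the
    frequently executed code and the big controller handles the rest."""
    if not 0.0 <= hot_fraction <= 1.0:
        raise ValueError("hot_fraction must be between 0 and 1")
    return hot_fraction * small_mw + (1.0 - hot_fraction) * big_mw


# Hypothetical numbers for illustration only (not from the slides).
BIG_MW = 3.0    # assumed power while the big controller is active
SMALL_MW = 1.0  # assumed power while the small controller is active

partitioned = average_controller_power(0.9, SMALL_MW, BIG_MW)
unpartitioned = average_controller_power(0.0, SMALL_MW, BIG_MW)
```

The measured saving reported on the slide is much smaller than this idealized model would suggest, which is consistent with the overheads the model leaves out.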
Conclusions
Described an environment for tuning a microprocessor to its application for low power
- Full instruction set compatibility
- Multiple views help find power hogs
- Fully automated
Focus is now on developing tuning optimizations
- Controller partitioning, small-loop table, datapath shortcuts, register-file copies, etc.
- Investigate possibility of automating tuning optimizations, develop more general tuning methodology
Environment for the 8051 is available on the web: http://www.cs.ucr.edu/~dalton