Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform

BL Standard IC’s, PL Microcontrollers
October 2007

LPC247x Feb 2007

Outline

LPC3180 Description

What makes the LPC3180 low power

Measurements using EEMBC EnergyBench

LPC3180

ARM9-based microcontroller built on 90nm
– ARM926EJ-S CPU core
– Separate 32Kbyte instruction and data caches
– Vector Floating Point (VFP9) coprocessor

Operating range
– 13 MHz to 20 MHz at 0.9 V
– 20 MHz to 208 MHz at 1.1 V

LPC3180 Block diagram

What Makes the LPC3180 Low Power

Intrinsic features

90nm low-power process

Minimized switching losses
– $0.5 \cdot C_{load} \cdot V^2 \cdot F_{clk}$

Low-voltage operation, 1.1-1.3 volts

Architectural clock gating

Minimized bridge latency

Memories are not clocked until accessed

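[Figure: CMOS output stage with a PMOS and an NMOS transistor between VDD and ground driving a load capacitance (C Load), showing the charge/discharge current and the cross-conduction current]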
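The switching-loss expression on this slide can be evaluated directly to see how strongly supply voltage matters. The following is a minimal sketch, not part of the original deck: the 10 pF load is a hypothetical value chosen only for illustration, and the function name is an assumption. The two voltages and the 13 MHz clock are the ones that appear in the results charts.

```python
def switching_power_w(c_load_f: float, v_dd: float, f_clk_hz: float) -> float:
    """Dynamic switching power in watts: 0.5 * Cload * V^2 * Fclk."""
    return 0.5 * c_load_f * v_dd ** 2 * f_clk_hz


C_LOAD_F = 10e-12   # hypothetical 10 pF load, for illustration only
F_CLK_HZ = 13e6     # 13 MHz, one of the operating points in the results

p_low = switching_power_w(C_LOAD_F, 0.9, F_CLK_HZ)
p_high = switching_power_w(C_LOAD_F, 1.2, F_CLK_HZ)

# The load and clock cancel in the ratio, leaving (0.9 / 1.2)^2.
ratio = p_low / p_high
print(f"switching power at 0.9 V relative to 1.2 V: {ratio:.4f}")
```

Because the voltage term is squared, dropping from 1.2 V to 0.9 V at the same clock cuts this component to a little over half, independent of the assumed load.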

Vector Floating Point Unit

Fully compliant with ANSI/IEEE Std 754-1985

Coprocessor provides full support for single-precision and double-precision add, subtract, multiply, divide, and multiply-accumulate operations

No assembly needed

Floating-point libraries and compile options
– ARM RealView Development Suite (RVDS)
– ARM Developer Suite (ADS)
– RealView Developer Kit for NXP
– IAR Embedded Workbench for ARM
– GCC

Benefits of VFP

Turned on/off via software through a control register

With the clocks disabled, no dynamic power is consumed

Many clock cycles saved
– Increases performance by a factor of about 5 with an approximately 14% increase in power consumption

VFP9 is fully clock gated

Many microcontrollers don’t have hardware floating point
– User is forced to emulate the instructions using special software libraries that require significantly more processor cycles and power consumption
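Since energy is power multiplied by time, the two VFP figures quoted above (about 5x performance, about 14% more power) combine into an energy-per-task estimate. This is a rough sketch under the assumption that both figures apply to the same workload; the function name is illustrative and not from the deck.

```python
def relative_energy(speedup: float, power_increase: float) -> float:
    """Energy per task relative to the baseline.

    Power scales by (1 + power_increase) while run time scales by 1 / speedup,
    so their product gives the relative energy.
    """
    return (1.0 + power_increase) / speedup


# Figures quoted on the slide: about 5x faster, about 14% more power.
vfp_energy = relative_energy(speedup=5.0, power_increase=0.14)
print(f"energy per task with VFP enabled: {vfp_energy:.3f} of baseline")
```

Under these assumptions the floating-point work costs roughly a quarter of the energy it would need with software emulation, even though instantaneous power goes up.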

External Memory Interface Features

Memory subsystem and software design choices can have a significant impact on power consumption
– Memory types
– Code partitioning
– Use of system features that save power

LPC3180 external memory interface supports
– DDR and SDR SDRAM
– Single-level and multi-level NAND flash devices (although flash is not used in the case study presented here)

Typical Power Consumption of Memory

Memory Type    Size (Mb)  Bit Width  Frequency (MHz)  Voltage (V)  Current (mA)  Power (mW)
SDRAM          128        16         133              3.3          150           495
Mobile SDRAM   128        16         125              1.8          50            90
DDR            128        16         133              2.6          110           143*
Mobile SDRAM   128        16         133              1.8          80            72*

Demonstrating relative comparison

Actual system performance varies based on how memories are used

Doesn’t include power dissipated in processor’s pads due to capacitive loading from board layout and memories
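The power column can be cross-checked against the voltage and current columns. The sketch below simply multiplies V by I for each row and compares the result with the listed value; the row data is copied from the table, while the function and variable names are illustrative. The unstarred rows equal V × I, and the starred rows come out at half of V × I. The surviving slide text does not say what the asterisk denotes, so no interpretation is offered here.

```python
# (memory type, voltage in V, current in mA, listed power in mW)
ROWS = [
    ("SDRAM", 3.3, 150, 495),
    ("Mobile SDRAM", 1.8, 50, 90),
    ("DDR", 2.6, 110, 143),          # starred in the table
    ("Mobile SDRAM", 1.8, 80, 72),   # starred in the table
]


def vi_product_mw(voltage_v: float, current_ma: float) -> float:
    """Volts times milliamps gives milliwatts."""
    return voltage_v * current_ma


# Ratio of the listed power to the plain V * I product, per row.
ratios = []
for name, volts, milliamps, listed_mw in ROWS:
    product = vi_product_mw(volts, milliamps)
    ratios.append(round(listed_mw / product, 2))
    print(f"{name:13s} V*I = {product:6.1f} mW, listed = {listed_mw} mW")
```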

Benefits of Internal Memory

Most microcontrollers have internal SRAM and flash memory
– Consumes much lower power than external

LPC3180 has 64K SRAM that runs at half the processor frequency
– 72 µW/MHz, which is 7.5 mA at 104 MHz of constant access
– Interfaces to internal memories have automatic clock gating and are only clocked when an access occurs

Run code in internal SRAM when possible
– Partition code so frequently active processes are located in the internal memory bank and seldom-used routines are placed in external memory
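The per-MHz figure for the internal SRAM scales linearly with access frequency, which makes a quick estimate easy. This is a minimal sketch: the 72 µW/MHz and 104 MHz values come from the slide, while the 1.0 V supply used to convert power into current is an assumption (the slide does not state the voltage behind its milliamp figure), and the names are illustrative.

```python
SRAM_UW_PER_MHZ = 72.0   # from the slide
ASSUMED_SUPPLY_V = 1.0   # assumption: not stated on the slide


def sram_power_mw(access_mhz: float) -> float:
    """Internal SRAM power in mW at a given constant access rate."""
    return SRAM_UW_PER_MHZ * access_mhz / 1000.0


power_mw = sram_power_mw(104.0)
current_ma = power_mw / ASSUMED_SUPPLY_V   # mW divided by V gives mA
print(f"{power_mw:.3f} mW, about {current_ma:.1f} mA at {ASSUMED_SUPPLY_V} V")
```

Compared with the tens to hundreds of milliwatts listed for external memories, this is the reason for placing frequently executed code in internal SRAM.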

Phytec Core Module Housing LPC3180

Board instrumented to measure current and voltage for each supply input

Phytec core modules have jumpers to the processor where series resistors can be inserted to measure the current

Power Take Offs for V and I Measurements

Boards designed to connect 0.1 inch header pins and include a mini USB cable connection
– For connecting to NI DAQ USB-6251

Mini USB cables
– Two conductors for current sense
– Two conductors for voltage sense
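With a series resistor in each supply line, the current-sense pair gives the voltage drop across the resistor and the voltage-sense pair gives the rail voltage. The conversion to current and power is Ohm's law, sketched below. The resistor value and the readings are hypothetical, and the sketch assumes the rail voltage is sensed on the processor side of the resistor; none of these details are given in the slides.

```python
def supply_current_a(v_shunt_v: float, r_shunt_ohm: float) -> float:
    """Current through the series resistor from the voltage across it (I = V / R)."""
    return v_shunt_v / r_shunt_ohm


def supply_power_w(v_rail_v: float, current_a: float) -> float:
    """Power drawn from the rail (P = V * I)."""
    return v_rail_v * current_a


# Hypothetical readings: 5 mV across a 0.1 ohm resistor on a 1.2 V rail.
current = supply_current_a(v_shunt_v=0.005, r_shunt_ohm=0.1)
power = supply_power_w(v_rail_v=1.2, current_a=current)
print(f"{current * 1000:.1f} mA, {power * 1000:.1f} mW")
```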

The Standard

Benchmark suites targeting several application areas
– Automotive: Powertrain, industrial, general purpose
– Consumer: Digital imaging (printers, digital cameras)
– Digital entertainment: Multimedia
– Java: Mobile phones
– Networking: Routing and testing network packets
– Office Automation: Text and image processing for printers
– Telecom: Modem and xDSL related algorithms

EEMBC Energy Methodology

Consortium work over 2005-2006

Applies to all EEMBC benchmarks
– Ties performance with energy consumption

Specified for silicon devices which can be certified under current procedures
– Specific device information is disclosed according to EEMBC rules

Non-intrusive methodology

Available Data

Two figures of merit
– Maximum power consumption
– Typical power consumption

Maximum power consumption used in system design

Typical power is more relevant to battery life, operating costs, heat dispersion, etc.

“Typical power” doing what, however?

Challenges of Hardware-Based Power Measurements

What system components to measure?
– CPU core
– Caches
– Integrated peripheral controllers

Which benchmarks to use?
– Does it matter?

How to measure?
– Equipment
– Time consuming
– Sensitive to environment

The Importance of Benchmark Variety

Processors are complex enough that the power can change based on the benchmark

Processors have multiple resources affected by different benchmarks and even different datasets

Even the coding of the instructions and register selection may affect power

The following slides present sample academic information that was used while creating the methodology

EEMBC Methodology Highlights

Calculate average energy per iteration

Benchmark and workload specific

Use affordable hardware (NI DAQ)
– Multiple unaliased sampling frequencies
– Adaptive statistical process

Specifies ambient temperature to avoid the need for complex hardware
– Alternatives considered:
  • Measure case temperature
  • Measure junction temperature

EnergyBench Test Conditions

Connect to the various power planes
– Core
– IO

Consistent 5% variance between runs
– Resistor contributes 1%
– DAQ board contributes 1%

Maintain ambient room temperature (70 degrees F)
– Vendor must disclose cooling method (e.g. heat-sink dimensions or fan model)

Warm up target device for 30 minutes

Power Measurement Procedure

Sample over the same workload multiple times to achieve statistical confidence

Sample at multiple unaliased frequencies for consistent procedure using affordable hardware

Calculate average power (using RMS) and energy (per benchmark iteration)

Repeat the process with more benchmark iterations if the standard deviation is too large (above 5%)

EEMBC Methodology

[Flowchart]
System warm-up
→ Run benchmark and sample V, I (at 2 unaliased frequencies)
→ Calculate average energy per iteration
→ Std. dev. too big, or energy from the 2 frequencies differs too much?
   – yes: increase samples / change frequencies, then run the benchmark again
   – no: done
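The flow above is an adaptive loop, and its control structure can be sketched in a few lines. This is an illustration only, not EEMBC's implementation: `run_benchmark` is a hypothetical stand-in that returns simulated per-iteration energies instead of reading a DAQ, the two sampling frequencies and the 5% thresholds are placeholder values, and pairing "increase samples" with a large standard deviation and "change frequencies" with a mismatch between the two runs is an assumed reading of the flowchart.

```python
import random
import statistics


def run_benchmark(n_iter, sample_hz, rng):
    """Hypothetical stand-in for one run: simulated per-iteration energies.

    A real setup would sample V and I at sample_hz; the simulation ignores it.
    """
    return [rng.gauss(10.0, 0.2) for _ in range(n_iter)]


def measure_energy(freqs=(40_000, 53_000), n_iter=50, max_rel_std=0.05,
                   max_rel_diff=0.05, max_rounds=8, seed=0):
    """Repeat the measurement until both acceptance checks pass."""
    rng = random.Random(seed)
    for _ in range(max_rounds):
        means = []
        std_ok = True
        for hz in freqs:
            energies = run_benchmark(n_iter, hz, rng)
            mean = statistics.fmean(energies)
            means.append(mean)
            # Check 1: relative standard deviation within the limit.
            if statistics.stdev(energies) / mean > max_rel_std:
                std_ok = False
        # Check 2: the two sampling frequencies agree with each other.
        rel_diff = abs(means[0] - means[1]) / max(means)
        if std_ok and rel_diff <= max_rel_diff:
            return statistics.fmean(means), n_iter
        if not std_ok:
            n_iter *= 2                                      # increase samples
        else:
            freqs = (freqs[0] + 1_000, freqs[1] + 1_000)     # change frequencies
    raise RuntimeError("measurement did not converge")


avg_energy, iterations_used = measure_energy()
print(f"average energy per iteration: {avg_energy:.3f} over {iterations_used} iterations")
```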

Requires Special EEMBC Implementation

Test Harness modified

Partnership with National Instruments
– Use LabVIEW and inexpensive DAQ board

Allows simultaneous measurement of performance and power
– Energy calculated per benchmark

Use NI tools and hardware
– NI DAQ can be easily controlled
– Precompiled LabVIEW software presents consistent GUI and avoids user error

NI LabVIEW Used for Display and Analysis

Computing the Energy Consumption

Computing energy per iteration:

Samples / Iteration = Sampling Freq. / (iterations/sec)

Power RMS = RMS (power samples for each iteration)

Energy for each iteration = Power RMS * (seconds/iteration)

For final result – report average of all energy values calculated
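The four steps above translate directly into code. The following is a minimal sketch that follows the slide's formulas: the function name and the synthetic input (a constant 100 mW trace) are illustrative assumptions, and truncating the samples-per-iteration count to a whole number is a simplification.

```python
import math


def energy_per_iteration(power_samples_w, sampling_hz, iterations_per_s):
    """Average energy per benchmark iteration, in joules."""
    # Samples / iteration = sampling frequency / (iterations / second)
    samples_per_iter = int(sampling_hz / iterations_per_s)
    seconds_per_iter = 1.0 / iterations_per_s
    if samples_per_iter < 1 or len(power_samples_w) < samples_per_iter:
        raise ValueError("need at least one full iteration of samples")

    energies = []
    last_start = len(power_samples_w) - samples_per_iter
    for start in range(0, last_start + 1, samples_per_iter):
        chunk = power_samples_w[start:start + samples_per_iter]
        # Power RMS = RMS of the power samples for this iteration
        power_rms = math.sqrt(sum(p * p for p in chunk) / len(chunk))
        # Energy for the iteration = Power RMS * (seconds / iteration)
        energies.append(power_rms * seconds_per_iter)

    # Final result: average of all per-iteration energies
    return sum(energies) / len(energies)


# Synthetic trace: constant 0.1 W, sampled at 1 kHz, benchmark at 10 iterations/s.
samples = [0.1] * 400
avg_energy_j = energy_per_iteration(samples, sampling_hz=1000.0, iterations_per_s=10.0)
print(f"{avg_energy_j * 1000:.2f} mJ per iteration")
```

For a constant 0.1 W load and 0.1 s per iteration the result is 10 mJ, which is a convenient sanity check before feeding in real samples.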

Published result is average energy consumption for one iteration of the workload

Use confidence intervals to validate power measurements

Sampling will not catch all spikes, but maximum and minimum readings will also be reported
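A confidence interval on the per-iteration energies, reported alongside the minimum and maximum readings, could look like the sketch below. It uses a normal approximation from Python's standard library. The slides do not specify which interval formula the EEMBC procedure uses, so this is an assumption, and the sample values are made up for illustration.

```python
import math
import statistics


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of values."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))  # standard error
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * sem, mean + z * sem


# Made-up per-iteration energies in mJ, for illustration only.
energies_mj = [10.1, 9.9, 10.0, 10.2, 9.8]
low, high = mean_confidence_interval(energies_mj)
print(f"mean in [{low:.3f}, {high:.3f}] mJ, "
      f"min {min(energies_mj)} mJ, max {max(energies_mj)} mJ")
```

With so few samples a Student's t interval would be wider; the normal form is used here only to keep the example within the standard library.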

Results (Power)

[Chart: basefp01 power. Y-axis: power, 0.00 to 160.00. X-axis: setup, with groups "no fp, no ic", "no fp, ic on", "fp on, no ic", "fp on, ic on". Series: 13MHz,0.9V; 13MHz,1.2V; 52MHz; 104MHz; 208MHz.]

Results (Performance)

[Chart: basefp01 iter/s. Y-axis: iter/s, 0 to 35000. X-axis: setup, with groups "no fp, no ic", "no fp, ic on", "fp on, no ic", "fp on, ic on". Series: 13MHz,0.9V; 13MHz,1.2V; 52MHz; 104MHz; 208MHz.]

Results (Energy)

[Chart: basefp01 energy. Y-axis: estimated energy per iteration, 0.00 to 25.00. X-axis: setup, with groups "no fp, no ic", "no fp, ic on", "fp on, no ic", "fp on, ic on". Series: 13MHz,0.9V; 13MHz,1.2V; 52MHz; 104MHz; 208MHz.]

LPC3000 ARM926EJ Road Map

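[Figure: road map chart with a "Functionality" axis and a legend distinguishing "Production" from "In development"]

Devices shown: LPC3180, LPC3180/01, LPC3190, LPC3220, LPC3230, LPC3240, LPC3250, LPC3xxx

Software and reference designs shown:
– Adeneo WinCE 6.0 BSP
– Wind River Linux BSP
– VoIP ref design
– Soft Modem ref design
– Full motion video graphics support
– Wireless Kiosk display ref design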

LPC3000 Product Portfolio

ARM926EJ-S Core

90nm low-power process, operation down to 0.9 V Ultra Low Power Mode

Vector Floating Point Co-Processor

Integrated Java Byte-Code Co-Processor

0.9 V Ultra Low-Power Mode

DMA, 32KB D-Cache, 32KB I-Cache, SPI, I2C (2), UART (7), IrDA, 10-b ADC

289 TFBGA – 15 x 15 x 0.7 with 0.8mm ball pitch

320 LFBGA – 13 x 13 x 0.9 with 0.5mm ball pitch

Optional (contact marketing): 289 TFBGA – 12 x 12 x 0.7 with 0.65mm ball pitch

LPC3180/01 vs LPC3180

Same Advanced Low Power 90nm CMOS process
– LPC3180 is the predecessor manufactured at Crolles2
– LPC3180/01 is manufactured at TSMC

Same pinout and package

LPC3180/01 improvements
– I2C is now master, multi-master, and slave (instead of master only)
– JTAG pull-up and pull-down fully meet the IEEE specification
– Improved voltage ranges
  • Peripheral I/O domains:
    – More flexible voltage domains that can reduce the number of different supply voltages
    – These are all extended to include 1.8V, 2.8V, or 3.0V
  • External bus domain:
    – Now extended to include 2.8V and above
– Improved power-up state
  • 2 pins on each SPI now default to inputs (instead of outputs)
