Techniques for Optimizing Performance and Energy Consumption: Results of a Case Study on an ARM9 Platform
BL Standard ICs, PL Microcontrollers, October 2007
2LPC247x Feb 2007
Outline
LPC3180 Description
What makes this low power
Measurements using EEMBC EnergyBench
LPC3180
ARM9-based microcontroller built on 90nm
– ARM926EJ-S CPU core
– Separate 32Kbyte instruction and data caches
– Vector Floating Point (VFP9) coprocessor
Operating range
– 13 MHz to 20 MHz at 0.9 V
– 20 MHz to 208 MHz at 1.1 V
LPC3180 Block diagram
What makes the LPC3180 low power
Intrinsic features
90nm low power process
Minimized switching losses
– 0.5 * Cload * V^2 * Fclk
Low voltage operation 1.1-1.3 volts
Architectural Clock gating
Minimized bridge latency
Memories are not clocked until accessed
[Figure: CMOS output stage with PMOS and NMOS transistors, VDD, and C Load, showing the charge/discharge current and the cross-conduction current]
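The switching-loss term on this slide can be evaluated numerically. The following is a minimal sketch, not a model of the LPC3180: the load capacitance is a made-up value chosen only for illustration, and `dynamic_power` is a hypothetical helper name.

```python
def dynamic_power(c_load_farads, v_volts, f_clk_hz):
    """Switching power of a CMOS node: 0.5 * Cload * V^2 * Fclk."""
    return 0.5 * c_load_farads * v_volts ** 2 * f_clk_hz


# Assumed load capacitance (illustrative only, not an LPC3180 figure).
C_LOAD = 10e-12  # 10 pF

# Same clock, two supply voltages: the power scales with V squared.
p_low = dynamic_power(C_LOAD, 0.9, 13e6)
p_high = dynamic_power(C_LOAD, 1.2, 13e6)
voltage_ratio = p_high / p_low

print(f"0.9 V: {p_low * 1e6:.2f} uW, 1.2 V: {p_high * 1e6:.2f} uW")
print(f"ratio: {voltage_ratio:.2f}")
```

Because the voltage term is squared, lowering the supply reduces switching power more than lowering the clock by the same proportion would.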
Vector Floating Point Unit
Fully compliant with ANSI/IEEE Std 754-1985
Coprocessor provides full support for single-precision and double-precision add, subtract, multiply, divide, and multiply-accumulate operations
No assembly needed
Floating-point libraries and compile options
– ARM RealView Development Suite (RVDS)
– ARM Developer Suite (ADS)
– RealView Developer Kit for NXP
– IAR Embedded Workbench for ARM
– GCC
Benefits of VFP
Turned on/off via software through a control register
With the clocks disabled, no dynamic power is consumed
Many clock cycles saved
– Increases performance by a factor of about 5 with an approximately 14% increase in power consumption
VFP9 is fully clock gated
Many microcontrollers don’t have hardware floating point
– User is forced to emulate the instructions using special software libraries that require significantly more processor cycles and power
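The performance and power figures quoted for the VFP can be combined into an energy-per-operation estimate. This is a rough sketch that assumes the factor of 5 and the 14% increase refer to the same workload and to average power over the run; `relative_energy_per_op` is a hypothetical helper name.

```python
def relative_energy_per_op(speedup, power_increase):
    """Energy per operation relative to the baseline.

    Energy = power * time, so relative energy is
    (relative power) / (relative throughput).
    """
    return (1.0 + power_increase) / speedup


ratio = relative_energy_per_op(speedup=5.0, power_increase=0.14)
savings = 1.0 - ratio

print(f"energy per operation vs. no VFP: {ratio:.3f}")
print(f"estimated energy saved per operation: {savings:.1%}")
```

Under these assumptions, drawing slightly more power for a much shorter time gives a lower energy cost per floating-point operation.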
External Memory Interface Features
Memory subsystem and software design choices can have significant impact on power consumption
– Memory types
– Code partitioning
– Use of system features that save power
LPC3180 external memory interface support
– DDR and SDR SDRAM
– Single-level and multi-level NAND flash devices (although flash is not used in the case study presented here)
Typical Power Consumption of Memory
Memory Type    Size (Mb)  Bit Width  Frequency (MHz)  Voltage (V)  Current (mA)  Power (mW)
SDRAM          128        16         133              3.3          150           495
Mobile SDRAM   128        16         125              1.8          50            90
DDR            128        16         133              2.6          110           143*
Mobile SDRAM   128        16         133              1.8          80            72*
Demonstrating relative comparison
Actual system performance varies based on how memories are used
Doesn’t include power dissipated in processor’s pads due to capacitive loading from board layout and memories
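The power column of the table can be cross-checked against voltage times current. The sketch below uses only the values in the table. The transcript does not preserve the footnote that explains the asterisk on the last two rows, so the code only reports the ratio of listed power to V × I and does not try to interpret it.

```python
# (memory type, voltage in V, current in mA, listed power in mW)
rows = [
    ("SDRAM", 3.3, 150, 495),
    ("Mobile SDRAM", 1.8, 50, 90),
    ("DDR", 2.6, 110, 143),          # starred in the table
    ("Mobile SDRAM", 1.8, 80, 72),   # starred in the table
]


def v_times_i_mw(voltage_v, current_ma):
    """Volts times milliamps gives milliwatts."""
    return voltage_v * current_ma


ratios = []
for name, volts, milliamps, listed_mw in rows:
    computed_mw = v_times_i_mw(volts, milliamps)
    ratio = round(listed_mw / computed_mw, 2)
    ratios.append(ratio)
    print(f"{name:13s} V*I = {computed_mw:6.1f} mW, listed = {listed_mw} mW, ratio = {ratio}")
```

The unstarred rows match V × I; the starred rows are listed at half of V × I.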
Benefits of Internal Memory
Most microcontrollers have internal SRAM and flash memory
– Consumes much less power than external memory
LPC3180 has 64K SRAM that runs at half the processor frequency
– 72 µW/MHz, which is 7.5 mA at 104 MHz of constant access
– Interfaces to internal memories have automatic clock gating and are only clocked when an access occurs
Run code in internal SRAM when possible
– Partition code so frequently active processes are located in the internal memory bank and seldom-used routines are placed in external memory
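The 72 µW/MHz figure can be turned into a power number and set beside the external memory values from the earlier table. This is a rough sketch: it works in milliwatts only and makes no assumption about the supply voltage behind the 7.5 mA figure, and the external values are the typical numbers from the table, which exclude pad power. `sram_power_mw` is a hypothetical helper name.

```python
SRAM_UW_PER_MHZ = 72.0  # internal SRAM figure quoted on the slide


def sram_power_mw(freq_mhz, uw_per_mhz=SRAM_UW_PER_MHZ):
    """Internal SRAM power in mW under constant access at freq_mhz."""
    return uw_per_mhz * freq_mhz / 1000.0


internal_mw = sram_power_mw(104)

# Typical external memory power from the table (mW), for a rough comparison.
external_mw = {"SDRAM": 495, "Mobile SDRAM": 90}

print(f"internal SRAM at 104 MHz: {internal_mw:.2f} mW")
for name, mw in external_mw.items():
    print(f"{name}: {mw} mW, about {mw / internal_mw:.0f}x the internal SRAM figure")
```

Even against the lowest-power external option in the table, the internal SRAM figure is roughly an order of magnitude smaller, which is the motivation for placing frequently executed code there.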
Phytec Core Module Housing LPC3180
Board instrumented to measure current and voltage for each supply input
Phytec core modules have jumpers to the processor where series resistors can be inserted to measure the current
Power Take Offs for V and I Measurements
Boards designed to connect 0.1 inch header pins and include a mini USB cable connection
– For connecting to NI DAQ USB-6251
Mini USB cables
– Two conductors for current sense
– Two conductors for voltage sense
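With a series resistor in each supply line, the current follows from the voltage drop across the resistor, and the power from that current times the rail voltage. The sketch below illustrates the arithmetic only; the resistor value and readings are made-up examples, not values from the Phytec board, and the function names are hypothetical.

```python
def supply_current_a(v_sense_volts, r_sense_ohms):
    """Current through the series resistor, from Ohm's law."""
    return v_sense_volts / r_sense_ohms


def supply_power_w(v_supply_volts, v_sense_volts, r_sense_ohms):
    """Power delivered to the device on this rail.

    v_supply_volts is assumed to be measured on the device side of the
    resistor, so the resistor's own drop is not counted as device power.
    """
    return v_supply_volts * supply_current_a(v_sense_volts, r_sense_ohms)


# Illustrative readings (assumptions, not measured data).
R_SENSE = 1.0      # ohms
V_SENSE = 0.050    # volts across the resistor
V_CORE = 1.2       # volts at the device

current = supply_current_a(V_SENSE, R_SENSE)
power = supply_power_w(V_CORE, V_SENSE, R_SENSE)
print(f"current: {current * 1e3:.1f} mA, power: {power * 1e3:.1f} mW")
```

The two conductor pairs in each cable map onto the two inputs of this calculation: one pair carries the drop across the resistor, the other the rail voltage.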
The Standard
Benchmark suites targeting several application areas
– Automotive: Powertrain, industrial, general purpose
– Consumer: Digital imaging (printers, digital cameras)
– Digital entertainment: Multimedia
– Java: Mobile phones
– Networking: Routing and testing network packets
– Office Automation: Text and image processing for printers
– Telecom: Modem and xDSL related algorithms
EEMBC Energy Methodology
Consortium work over 2005-2006
Applies to all EEMBC benchmarks
– Ties performance with energy consumption
Specified for silicon devices which can be certified under current procedures
– Specific device information is disclosed according to EEMBC rules
Non-intrusive methodology
Available Data
Two figures of merit
– Maximum power consumption
– Typical power consumption
Maximum power consumption used in system design
Typical power is more relevant to battery life, operating costs, heat dispersion, etc.
“Typical power” doing what, however?
Challenges of Hardware-Based Power Measurements
What system components to measure?
– CPU core
– Caches
– Integrated peripheral controllers
Which benchmarks to use?
– Does it matter?
How to measure?
– Equipment
– Time consuming
– Sensitive to environment
The Importance of Benchmark Variety
Processors are complex enough that the power can change based on the benchmark
Processors have multiple resources affected by different benchmarks and even different datasets
Even the coding of the instructions and register selection may affect power
The following slides present sample academic information that was used while creating the methodology
EEMBC Methodology Highlights
Calculate average energy per iteration
Benchmark and workload specific
Use affordable hardware (NI DAQ)
– Multiple unaliased sampling frequencies
– Adaptive statistical process
Specifies ambient temperature to avoid the need for complex hardware
– Alternatives considered:
• Measure case temperature
• Measure junction temperature
EnergyBench Test Conditions
Connect to the various power planes
– Core
– IO
Consistent 5% variance between runs
– Resistor contributes 1%
– DAQ board contributes 1%
Maintain ambient room temperature (70 degrees F)
– Vendor must disclose cooling method (e.g. heat-sink dimensions or fan model)
Warm up target device for 30 minutes
Power Measurement Procedure
Sample over the same workload multiple times to achieve statistical confidence
Sample at multiple unaliased frequencies for consistent procedure using affordable hardware
Calculate average power (using RMS) and energy (per benchmark iteration)
Repeat the process with more benchmark iterations if the standard deviation is too big (above 5%)
EEMBC Methodology
[Flowchart: System warm-up -> run benchmark and sample V, I at 2 unaliased frequencies -> calculate avg. energy/iteration -> std dev too big, or energy from the 2 frequencies differs too much? If yes, increase samples or change frequencies and repeat; if no, done]
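The adaptive loop in the flowchart can be sketched in code. This is an illustration under stated assumptions, not the EEMBC implementation: the 5% standard-deviation limit comes from the slides, but the limit on disagreement between the two sampling frequencies, the doubling of iterations, the frequency values, and the `fake_benchmark` stand-in for real hardware are all hypothetical.

```python
import statistics


def rel_std(values):
    """Sample standard deviation as a fraction of the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def measure_until_stable(run_benchmark, freq_pairs, iterations,
                         max_rel_std=0.05, max_freq_diff=0.05, max_rounds=8):
    """Repeat the measurement until both flowchart checks pass.

    run_benchmark(sample_freq_hz, iterations) must return a list of
    per-iteration energy values.
    """
    pair_index = 0
    for _ in range(max_rounds):
        f1, f2 = freq_pairs[pair_index]
        e1 = run_benchmark(f1, iterations)
        e2 = run_benchmark(f2, iterations)
        m1, m2 = statistics.mean(e1), statistics.mean(e2)
        if max(rel_std(e1), rel_std(e2)) > max_rel_std:
            iterations *= 2  # "increase samples"
            continue
        if abs(m1 - m2) / max(m1, m2) > max_freq_diff:
            pair_index = (pair_index + 1) % len(freq_pairs)  # "change frequencies"
            continue
        return (m1 + m2) / 2, iterations
    raise RuntimeError("measurement did not converge")


def fake_benchmark(sample_freq_hz, iterations):
    """Stand-in for hardware: the spread shrinks as iterations grow."""
    spread = 2.0 / iterations
    return [10.0 + (spread if k % 2 else -spread) for k in range(iterations)]


energy, used_iterations = measure_until_stable(
    fake_benchmark, [(10_000, 13_000)], iterations=4)
print(f"energy/iteration: {energy:.2f}, iterations used: {used_iterations}")
```

With the stand-in data, the first pass fails the standard-deviation check, so the loop doubles the iteration count once before accepting the result.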
Requires Special EEMBC Implementation
Test Harness modified
Partnership with National Instruments
– Use LabVIEW and inexpensive DAQ board
Allows simultaneous measurement of performance and power
– Energy calculated per benchmark
Use NI tools and hardware
– NI DAQ can be easily controlled
– Precompiled LabVIEW software presents consistent GUI and avoids user error
NI LabVIEW Used for Display and Analysis
Computing the Energy Consumption
Computing energy per iteration:
Samples / Iteration = Sampling Freq. / (iterations/sec)
Power RMS = RMS (power samples for each iteration)
Energy for each iteration = POWER RMS * (seconds/iteration)
For final result – report average of all energy values calculated
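The steps above translate directly into code. The following is a minimal sketch, assuming the power samples are already available as a list of watts and that each iteration covers a whole number of samples; `energy_per_iteration` is a hypothetical name, not part of the EEMBC or NI software.

```python
import math


def energy_per_iteration(power_samples_w, sampling_freq_hz, iterations_per_sec):
    """Average energy (joules) per benchmark iteration.

    Follows the slide: split the samples per iteration, take the RMS of
    the power in each window, multiply by seconds per iteration, and
    average the resulting energies.
    """
    samples_per_iter = int(sampling_freq_hz / iterations_per_sec)
    if samples_per_iter < 1:
        raise ValueError("sampling frequency too low for this iteration rate")
    seconds_per_iter = 1.0 / iterations_per_sec

    energies = []
    last_start = len(power_samples_w) - samples_per_iter
    for start in range(0, last_start + 1, samples_per_iter):
        window = power_samples_w[start:start + samples_per_iter]
        power_rms = math.sqrt(sum(p * p for p in window) / len(window))
        energies.append(power_rms * seconds_per_iter)
    if not energies:
        raise ValueError("not enough samples for one iteration")
    return sum(energies) / len(energies)


# Illustrative input: constant 0.1 W, sampled at 1 kHz, 10 iterations/s.
samples = [0.1] * 1000
result = energy_per_iteration(samples, sampling_freq_hz=1000, iterations_per_sec=10)
print(f"energy per iteration: {result * 1e3:.2f} mJ")
```

For a constant power draw the RMS equals the power itself, so the example reduces to power times the duration of one iteration.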
Computing the Energy Consumption
Published result is average energy consumption for one iteration of the workload
Use confidence intervals to validate power measurements
Sampling will not catch all spikes, but maximum and minimum readings will also be reported
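One way to attach a confidence interval to the reported average is shown below. The slides do not say how EEMBC computes its intervals, so this sketch assumes a 95% interval under a normal approximation (z = 1.96); the function name and the energy values are illustrative only.

```python
import math
import statistics


def summarize_energies(energies, z=1.96):
    """Mean with an approximate 95% confidence interval, plus min and max.

    The z value is an assumption (normal approximation), not a value
    taken from the EEMBC methodology.
    """
    mean = statistics.mean(energies)
    half_width = z * statistics.stdev(energies) / math.sqrt(len(energies))
    return {
        "mean": mean,
        "ci_low": mean - half_width,
        "ci_high": mean + half_width,
        "min": min(energies),
        "max": max(energies),
    }


# Illustrative per-iteration energies (made-up numbers).
summary = summarize_energies([10.0, 10.2, 9.8, 10.1, 9.9])
print(summary)
```

Reporting the minimum and maximum alongside the interval reflects the point on the slide that sampling can miss short spikes.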
Results (Power)
[Chart: basefp01 power. Y axis: power, 0.00 to 160.00 (units not shown). X axis: setup (no fp, no ic; no fp, ic on; fp on, no ic; fp on, ic on). Series: 13MHz,0.9V; 13MHz,1.2V; 52MHz; 104MHz; 208MHz]
Results (Performance)
[Chart: basefp01 iter/s. Y axis: iter/s, 0 to 35000. X axis: setup (no fp, no ic; no fp, ic on; fp on, no ic; fp on, ic on). Series: 13MHz,0.9V; 13MHz,1.2V; 52MHz; 104MHz; 208MHz]
Results (Energy)
[Chart: basefp01 energy. Y axis: estimated energy per iteration, 0.00 to 25.00 (units not shown). X axis: setup (no fp, no ic; no fp, ic on; fp on, no ic; fp on, ic on). Series: 13MHz,0.9V; 13MHz,1.2V; 52MHz; 104MHz; 208MHz]
LPC3000 ARM926EJ Road Map
[Roadmap figure. Vertical axis: functionality. Legend: production, in development. Devices shown: LPC3180, LPC3180/01, LPC3190, LPC3220, LPC3230, LPC3240, LPC3250, LPC3xxx. Enablement shown: Adeneo WinCE 6.0 BSP, Wind River Linux BSP, VoIP ref design, soft modem ref design, full motion video graphics support, wireless kiosk display ref design]
LPC3000 Product Portfolio
ARM926EJ-S Core
90nm low-power process, operation down to 0.9 V Ultra Low Power Mode
Vector Floating Point Co-Processor
Integrated Java Byte-Code Co-Processor
0.9 V Ultra Low-Power Mode
DMA, 32KB D-Cache, 32KB I-Cache, SPI, I2C (2), UART (7), IrDA, 10-bit ADC
289 TFBGA – 15 x 15 x 0.7 with 0.8mm ball pitch
320 LFBGA – 13 x 13 x 0.9 with 0.5mm ball pitch
Optional, contact marketing: 289 TFBGA – 12 x 12 x 0.7 with 0.65mm ball pitch
LPC3180/01 vs LPC3180
Same Advanced Low Power 90nm CMOS process
– LPC3180 is the predecessor manufactured at Crolles2
– LPC3180/01 is manufactured at TSMC
Same Pinout and Package
LPC3180/01 improvement
– I2C is now master, multi-master, and slave (instead of master only)
– JTAG pull-up and pull-down fully meet the IEEE specification
– Improved voltage ranges
• Peripheral I/O domains:
  – More flexible voltage domains that can reduce the number of different supply voltages
  – These are all extended to include 1.8V, 2.8V, or 3.0V
• External bus domain:
  – Now extended to include 2.8V and above
– Improved power-up state
• 2 pins on each SPI now default to inputs (instead of outputs)