Lecture 16: Power Reduction Techniques for FPGAs

ECE 636 Reconfigurable Computing

November 5, 2013




Overview

• FPGAs are generally considered power-hungry compared with ASIC and processor counterparts

- Mostly due to interconnect, much of which goes unused in any given design

• An area of extensive recent research

• Device techniques

- Voltage scaling

- Sleep mode

• Software techniques

- Reduced switching

- Reduced capacitance


Dynamic Power

° Dynamic power is required to charge and discharge load capacitances when transistors switch.

° One switching cycle involves a rising and a falling output transition.

° On a rising output, charge Q = C·VDD is drawn from VDD

° On a falling output, that charge is dumped to GND

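[Figure: a load capacitance C switching at frequency fsw draws supply current iDD(t) from VDD; the current has a charge/discharge component and a short-circuit component. Courtesy: Harris]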


Dynamic Power

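[Figure: load capacitance C switching at fsw, supply current iDD(t) drawn from VDD]

$$P_{\text{dynamic}} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw}\, C V_{DD}\right] = C V_{DD}^2 f_{sw}$$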

Short-circuit power is typically < 10% of dynamic power
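The formula above can be evaluated directly. This minimal sketch computes P = C·VDD²·fsw and the equivalent toggle-rate form; the 10 pF capacitance and 100 MHz frequency are hypothetical values chosen only to show the quadratic effect of supply voltage.

```python
def dynamic_power(c_farads: float, vdd_volts: float, f_sw_hz: float) -> float:
    """P = C * VDD^2 * f_sw, with f_sw in full charge/discharge cycles per second."""
    return c_farads * vdd_volts ** 2 * f_sw_hz


def dynamic_power_from_toggles(c_farads: float, vdd_volts: float, toggle_rate_hz: float) -> float:
    """Same quantity written with the toggle rate (transitions per second).

    Each full cycle is one rising plus one falling transition, so f_sw = toggle_rate / 2.
    """
    return 0.5 * c_farads * vdd_volts ** 2 * toggle_rate_hz


if __name__ == "__main__":
    c = 10e-12   # hypothetical switched capacitance: 10 pF
    f = 100e6    # hypothetical switching frequency: 100 MHz
    p_nominal = dynamic_power(c, 1.2, f)
    p_scaled = dynamic_power(c, 1.0, f)
    print(f"P at 1.2 V: {p_nominal * 1e3:.2f} mW")  # P at 1.2 V: 1.44 mW
    print(f"P at 1.0 V: {p_scaled * 1e3:.2f} mW ({(1 - p_scaled / p_nominal) * 100:.1f}% lower)")
    # P at 1.0 V: 1.00 mW (30.6% lower)
```

Because power goes with VDD², dropping VDD from 1.2 V to 1.0 V removes 1 - (1.0/1.2)² ≈ 31% of dynamic power at the same frequency, which is why voltage scaling appears first among the device techniques.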


FPGA Static Power Consumption

° Junction leakage

° Gate oxide leakage

° Subthreshold leakage


FPGA Static Power Consumption

° Junction leakage

• Small fraction of leakage

° Gate oxide leakage

• Increases exponentially as Vgs increases

° Subthreshold leakage

• When Vgs < Vt, there is still some source-drain current

• Increases exponentially as Vt decreases

• Decreases exponentially as Vgs decreases

[Figure: leakage technology trend. Courtesy: Nowak]
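The exponential dependence of subthreshold leakage on Vt and Vgs can be illustrated with the standard first-order model I = I0·exp((Vgs − Vt)/(n·vT))·(1 − exp(−Vds/vT)). This is a sketch, not a characterized FPGA model: the prefactor I0 = 100 nA, the slope factor n = 1.5, and the 1.2 V supply are assumptions.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k: float = 300.0) -> float:
    """kT/q, about 25.9 mV at 300 K."""
    return K_B * temp_k / Q_E


def subthreshold_current(vgs: float, vth: float, vds: float,
                         i0: float = 1e-7, n: float = 1.5, temp_k: float = 300.0) -> float:
    """First-order subthreshold model for Vgs < Vth:

        I = I0 * exp((Vgs - Vth) / (n * vT)) * (1 - exp(-Vds / vT))

    i0 (A) and the slope factor n are hypothetical, technology-dependent fit parameters.
    """
    v_therm = thermal_voltage(temp_k)
    return i0 * math.exp((vgs - vth) / (n * v_therm)) * (1.0 - math.exp(-vds / v_therm))


if __name__ == "__main__":
    # An "off" transistor (Vgs = 0) with the full 1.2 V supply across drain and source.
    for vth in (0.45, 0.35, 0.25):
        i_off = subthreshold_current(vgs=0.0, vth=vth, vds=1.2)
        print(f"Vth = {vth:.2f} V -> I_off = {i_off:.3e} A")
```

With n = 1.5, each 100 mV reduction in Vth multiplies the off current by exp(0.1 / (1.5 × 0.0259)) ≈ 13. That is why high-Vt transistors are attractive for configuration SRAM cells, which never switch during operation.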


FPGA Power Reduction Goals

• Dynamic power goals

- Reduce Vdd along non-critical paths

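- Low-swing signalling

- Use CAD approaches to limit long high-toggle paths

- Pdynamic = 0.5 · C · Vdd² · f (taking f as the toggle rate in transitions per second, this matches C · Vdd² · fsw, since each full cycle has two transitions)

• Static power goals

- Cut off Vdd to unused transistors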

- Use high Vt transistors for SRAM cells

- Various other voltage biasing techniques


Traditional Routing Switch

[Figure: traditional routing switch. SRAM configuration cells (S) select one of inputs i1, i2, i3, i4 through a MUX; the MUX output (VINT) drives a level-restoring buffer (MP1, MP2) to OUT. Courtesy: Anderson]


Proposed Switch Designs: Anderson

° Based on 3 observations:

• Routing switch inputs are tolerant of weak-1 signals (thanks to level-restoring buffers).

• FPGA designs have considerable slack, so many switches can be slowed down.

• Most routing switches feed other routing switches.

- So they can produce weak-1 logic signals.


“Basic” Switch Design

Mode operation:

- high-speed: MNX & MPX ON

- low-power: MNX ON, MPX OFF

- sleep: MNX OFF, MPX OFF

[Figure: the SRAM-configured MUX (inputs i1, i2, i3, i4) and its output buffer driving OUT are powered from a virtual supply VVD, which connects to VDD through MNX (controlled by ~SLEEP) and MPX (controlled by LOW_POWER v SLEEP).]
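The mode table can be checked against the gate signals in the figure. This sketch assumes, as the figure labels suggest, that MNX is an NMOS header gated by ~SLEEP and MPX a PMOS header gated by LOW_POWER v SLEEP; VDD = 1.2 V and VTH = 0.3 V are hypothetical values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SwitchSupply:
    mnx_on: bool
    mpx_on: bool
    vvd: Optional[float]  # None means VVD is disconnected from VDD (sleep)


def switch_supply(low_power: bool, sleep: bool,
                  vdd: float = 1.2, vth: float = 0.3) -> SwitchSupply:
    # MNX gate is driven by ~SLEEP: an NMOS conducts when its gate is high.
    mnx_on = not sleep
    # MPX gate is driven by (LOW_POWER or SLEEP): a PMOS conducts when its gate is low.
    mpx_on = not (low_power or sleep)
    if mpx_on:
        vvd = vdd          # PMOS passes the full VDD: rail-to-rail output swing
    elif mnx_on:
        vvd = vdd - vth    # NMOS passes only VDD - VTH: reduced output swing
    else:
        vvd = None         # both headers off: virtual supply cut off
    return SwitchSupply(mnx_on, mpx_on, vvd)


if __name__ == "__main__":
    for name, lp, sl in [("high-speed", False, False),
                         ("low-power", True, False),
                         ("sleep", False, True)]:
        print(name, switch_supply(lp, sl))
```

The two control bits reproduce the three rows of the mode table. The fourth combination (LOW_POWER and SLEEP both set) behaves like sleep, because SLEEP alone turns off both headers.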


High-Speed Mode

Output swing: rail-to-rail (VVD = VDD).


Low-Power Mode

Output swing: GND to (VDD - VTH); VVD = VDD - VTH.


Sleep Mode

MNX and MPX are both OFF: the virtual supply VVD is disconnected from VDD.


Leakage Power Results: Anderson

[Chart: % leakage power reduction vs. high-speed mode for the "Basic" switch. Bars: LP mode, sleep mode, LP mode (+unused fanout), LP mode (+used fanout), traditional switch; plotted values 36, 60.8, 39.7, 38.7, 0.3.]


Region Constrained Placement

• Rather than focusing only on routing, consider constraining logic placement

• Most circuits exhibit locality

• Gayasen: FPGA’2004


Region Constrained Placement

• Several issues to consider

• Size of sleep transistor

- Too large: increases leakage, area

- Too small: degrades logic performance

• Size of region

- Too large: unused resources may be stranded in regions that must stay on; complicates placement

- Too small: sleep transistors take up too much area
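The region-size trade-off can be made concrete by counting how many regions a given placement leaves completely empty. This minimal sketch assumes one sleep transistor per square region of tiles, that a region can sleep only if none of its tiles is used, and that leakage scales with the number of powered regions. Sleep-transistor area and leakage are ignored, and the 8 × 8 array and placement are hypothetical.

```python
from typing import Iterable, Set, Tuple


def sleepable_fraction(used_tiles: Iterable[Tuple[int, int]],
                       grid_w: int, grid_h: int, region: int) -> float:
    """Fraction of sleep regions that contain no used logic tile.

    The grid_w x grid_h tile array is cut into region x region squares, each
    assumed (hypothetically) to have its own sleep transistor; a square can be
    powered down only if none of its tiles is used.
    """
    regions_x = -(-grid_w // region)  # ceiling division
    regions_y = -(-grid_h // region)
    occupied: Set[Tuple[int, int]] = {(x // region, y // region) for x, y in used_tiles}
    total = regions_x * regions_y
    return (total - len(occupied)) / total


if __name__ == "__main__":
    # Hypothetical design occupying a 5 x 5 corner of an 8 x 8 array.
    used = [(x, y) for x in range(5) for y in range(5)]
    for size in (1, 2, 4):
        print(f"region {size}x{size}: {sleepable_fraction(used, 8, 8, size):.1%} of regions can sleep")
```

For this placement the sleepable fraction falls from about 61% (1×1 regions) to about 44% (2×2) to 0% (4×4), because larger regions are more likely to contain at least one used tile. Smaller regions, in turn, need more sleep transistors.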


Experimental Flow: RCP

• Different region sizes were evaluated in the flow

• Area constraints for portions of the design were determined by hand

• May encourage designers to create more granular designs


Power Savings: RCP

• Leakage power savings drop significantly as region size increases

• Bottom curve primarily due to luck


Performance Limitation: RCP

• Performance limited by use of regions

• Nearly 10% clock frequency reduction for many designs


Low-swing Signalling

• The techniques examined so far adjust the supply voltage

• It is also possible to modify wire signalling to reduce the voltage swing

• Most of an FPGA is made up of interconnect

• Approach targets dynamic power consumption

George and Rabaey: 1997


Low-swing Signalling

• Interconnect swings 0.8 V while the rest of the circuit operates at 1.5 V

• Cascode circuitry at the sink (receiver) overcomes the slower low-swing transitions

• 50% energy savings at the cost of a 25% delay increase
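The charge argument from the dynamic power derivation explains why a lower swing saves energy. Charging a wire to Vswing draws Q = C·Vswing from whatever supply feeds the driver, so each cycle costs E = C·Vswing·Vsupply. The slides do not say how the 0.8 V swing is generated, so this sketch shows both a driver on the 1.5 V rail and one on a dedicated 0.8 V supply. These are idealized wire-only ratios that ignore the cascode receiver and non-interconnect energy, so they are not expected to match the reported 50% figure. The 1 pF wire is hypothetical.

```python
def wire_energy_per_cycle(c_wire: float, v_swing: float, v_supply: float) -> float:
    """Energy drawn from the supply per full charge/discharge cycle of a wire.

    Charging the wire to v_swing moves Q = C * v_swing out of a supply at v_supply,
    so E = C * v_swing * v_supply (the charge is dumped to GND on the falling edge).
    """
    return c_wire * v_swing * v_supply


if __name__ == "__main__":
    c = 1e-12  # hypothetical 1 pF routing wire
    full = wire_energy_per_cycle(c, 1.5, 1.5)          # rail-to-rail at 1.5 V
    same_supply = wire_energy_per_cycle(c, 0.8, 1.5)   # 0.8 V swing drawn from the 1.5 V rail
    own_supply = wire_energy_per_cycle(c, 0.8, 0.8)    # 0.8 V swing from a dedicated 0.8 V supply
    print(f"swing from 1.5 V rail: {same_supply / full:.2f} of full-swing energy")
    print(f"swing from 0.8 V rail: {own_supply / full:.2f} of full-swing energy")
```

The ratios are 0.8/1.5 ≈ 0.53 and (0.8/1.5)² ≈ 0.28. The quadratic benefit only appears when the low swing comes from a lower supply rather than from a clipped full-voltage rail.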


Alternate approach: Modifying FPGA CAD

• FPGA architecture modifications impact all designs, even those that don't care about power

• Can placement and routing be modified to consider dynamic power?

- Need to know which signals are high-toggle

- Attempt to minimize the length of high-toggle wires

- Minimize the impact on performance and area

• These techniques fit well into the placement and routing algorithms covered in earlier lectures

Lamoreaux and Wilton
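Knowing which signals are high-toggle requires some estimate of switching activity. A minimal sketch of vector-based estimation, assuming the per-cycle 0/1 value of each net is available from a functional simulation (tools may instead propagate signal probabilities analytically). The net names and the 0.25 toggles-per-cycle threshold are hypothetical.

```python
from typing import Dict, List


def toggle_rates(traces: Dict[str, List[int]]) -> Dict[str, float]:
    """Average number of toggles per clock cycle for each net.

    traces maps a net name to its simulated 0/1 value in each cycle.
    """
    rates: Dict[str, float] = {}
    for net, values in traces.items():
        if len(values) < 2:
            rates[net] = 0.0  # no transitions can be observed
            continue
        toggles = sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)
        rates[net] = toggles / (len(values) - 1)
    return rates


def high_toggle_nets(traces: Dict[str, List[int]], threshold: float = 0.25) -> List[str]:
    """Nets at or above the (hypothetical) activity threshold, most active first."""
    rates = toggle_rates(traces)
    return sorted((net for net, r in rates.items() if r >= threshold), key=lambda net: -rates[net])


if __name__ == "__main__":
    sim = {
        "clk_div2": [0, 1, 0, 1, 0, 1, 0, 1, 0],
        "enable":   [0, 0, 0, 0, 1, 1, 1, 1, 1],
        "data":     [0, 1, 1, 0, 0, 1, 1, 0, 0],
    }
    print(toggle_rates(sim))      # {'clk_div2': 1.0, 'enable': 0.125, 'data': 0.5}
    print(high_toggle_nets(sim))  # ['clk_div2', 'data']
```

The resulting per-net rates are the activity values that the placement and routing cost functions can weight.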


Modifying FPGA CAD Placement

• Previous annealing cost functions considered bounding-box wire length and timing cost

• Add a term that accounts for signal switching activity
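One way to add an activity term to an annealing cost is to weight each net's bounding-box length by its switching activity. This is a minimal sketch of that idea, not Lamoreaux and Wilton's exact formulation. The half-perimeter wirelength (HPWL) metric, the weights `w_wire`, `w_timing`, `w_power`, and the example nets and placements are all assumptions; `timing_cost` is passed in rather than computed.

```python
from typing import Dict, List, Tuple

Placement = Dict[str, Tuple[int, int]]  # block name -> (x, y) location


def hpwl(net_blocks: List[str], place: Placement) -> int:
    """Half-perimeter of the bounding box around all blocks on a net."""
    xs = [place[b][0] for b in net_blocks]
    ys = [place[b][1] for b in net_blocks]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def placement_cost(nets: Dict[str, List[str]], activity: Dict[str, float], place: Placement,
                   timing_cost: float = 0.0, w_wire: float = 1.0,
                   w_timing: float = 1.0, w_power: float = 1.0) -> float:
    """Hypothetical annealing cost: wirelength + timing + activity-weighted wirelength."""
    wire = sum(hpwl(blocks, place) for blocks in nets.values())
    power = sum(activity[n] * hpwl(blocks, place) for n, blocks in nets.items())
    return w_wire * wire + w_timing * timing_cost + w_power * power


if __name__ == "__main__":
    nets = {"hot": ["A", "B"], "cold": ["C", "D"]}
    act = {"hot": 0.9, "cold": 0.05}   # toggles per cycle
    p1 = {"A": (0, 0), "B": (3, 0), "C": (0, 1), "D": (1, 1)}  # long high-activity net
    p2 = {"A": (0, 0), "B": (1, 0), "C": (0, 1), "D": (3, 1)}  # long low-activity net
    print(f"{placement_cost(nets, act, p1):.2f}")  # 6.75
    print(f"{placement_cost(nets, act, p2):.2f}")  # 5.05
```

Both placements have the same total wirelength (4), so a wirelength-only cost cannot tell them apart. The activity term makes the annealer prefer shortening the high-toggle net.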


FPGA Placement for Power


• Power decreases by 7% but delay increases by 4%; post-route energy is reduced by 3.0%


FPGA Routing Modifications for Power

• Original routing cost function takes congestion b(n) and delay(n) into account

• Augment it with a factor that takes net activity into account

• Minimize length of most active nets, even in the presence of congestion.
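A routing cost of this kind can be sketched per routing node. The criticality-weighted blend of delay and congestion follows the familiar PathFinder form. The additive `power_weight * activity * capacitance` term is a hypothetical stand-in for the activity factor, not the published formula, and all node values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RoutingNode:
    delay: float        # intrinsic delay of this wire segment
    congestion: float   # congestion cost b(n) (history/present terms folded in)
    capacitance: float  # switched capacitance of the segment


def node_cost(node: RoutingNode, criticality: float, activity: float,
              power_weight: float = 1.0) -> float:
    """Hypothetical activity-aware routing cost for one node."""
    timing_and_congestion = criticality * node.delay + (1.0 - criticality) * node.congestion
    return timing_and_congestion + power_weight * activity * node.capacitance


def path_cost(path: List[RoutingNode], criticality: float, activity: float,
              power_weight: float = 1.0) -> float:
    return sum(node_cost(n, criticality, activity, power_weight) for n in path)


if __name__ == "__main__":
    short_congested = [RoutingNode(delay=1.0, congestion=2.5, capacitance=1.0)] * 2
    long_free = [RoutingNode(delay=1.0, congestion=1.0, capacitance=1.0)] * 4
    for activity in (0.0, 1.0):
        costs = {
            "short/congested": path_cost(short_congested, criticality=0.0, activity=activity),
            "long/uncongested": path_cost(long_free, criticality=0.0, activity=activity),
        }
        print(activity, min(costs, key=costs.get), costs)
```

For a non-critical net, the idle net detours around congestion (cost 4.0 vs 5.0), while the fully active net takes the shorter, congested path (7.0 vs 8.0). This matches "minimize length of most active nets, even in the presence of congestion."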


FPGA Routing for Power Results

• Potential benefits somewhat limited by placement

• Note that most nets have low activity

• Power decreases by 6% but delay increases by 4%; energy savings are about 3%


FPGA Embedded Memory Blocks

° Embedded memory blocks (EMBs) are important parts of FPGAs

° Consume roughly 14% of Altera Stratix II dynamic power *

• Increasing in recent designs

* Stratix II Low Power Applications Note, 2005


Embedded Memory Block Port Internal View

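[Figure: internal view of one EMB port. Address, write data, write enable and read enable are registered on MClk, which is derived from Clk and gated by Clk Enable; the datapath includes row decode, bit-line pre-charge, RAM cells on complementary bit lines (BIT, BIT), column mux, write buffers, sense amps, and a read-data latch.]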

Reducing clocking saves dynamic power


Power Optimization #1

° Convert EMB read enable/write enable signals to associated read/write clock enable signals

° Limitations

• Each port must have a read or write enable control signal

• The embedded memory block must have a read-enable input

[Figure: RAM block connections before and after the conversion. Before: Wren and Rden drive the write and read enables, and the write and read clock enables are tied to Vcc. After: Wren and Rden drive the write and read clock enables, and the write and read enables are tied to Vcc.]


Implementation

° Conversion mode

• Connects the R/W enables to the RAM clock enables

• Does not apply the transform if a clock enable (CE) is already present on the port

° Combining mode

• ANDs the user's RAM clock enables with the derived R/W clock enables

• The extra AND gate could impact performance

[Figure: Write Enable AND User-defined Write Clk Enable → Combined Write Clk Enable]
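The effect of conversion and combining can be illustrated by counting the cycles in which a RAM write port is actually clocked. This sketch assumes that the port's clocking power is roughly proportional to the number of cycles with its clock enable asserted, which is a simplification. The function name and signal traces are hypothetical.

```python
from typing import List, Optional


def active_clock_cycles(wren: List[int], user_ce: Optional[List[int]] = None,
                        convert: bool = True) -> int:
    """Count cycles in which a RAM write port's internal clock is enabled.

    Before the transform, the port clock enable is tied to Vcc (or driven by the
    user's own clock enable), so the port is clocked even when nothing is written.
    """
    n = len(wren)
    ce = user_ce if user_ce is not None else [1] * n
    if not convert:
        return sum(ce)
    # Conversion mode (no user CE): clock enable = write enable.
    # Combining mode (user CE present): clock enable = user CE AND write enable.
    return sum(c & w for c, w in zip(ce, wren))


if __name__ == "__main__":
    wren = [1, 0, 0, 0, 1, 0, 0, 0]      # write in 2 of 8 cycles
    user_ce = [1, 1, 0, 0, 1, 1, 0, 0]   # user clock enable active in 4 of 8 cycles
    print(active_clock_cycles(wren, convert=False), "->", active_clock_cycles(wren))
    print(active_clock_cycles(wren, user_ce, convert=False), "->", active_clock_cycles(wren, user_ce))
```

Conversion takes the port from 8 clocked cycles to 2, and combining takes it from 4 to 2. In both cases the port is clocked only when a write actually happens.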


FPGA RAM Processing

° FIFOs and shift registers are converted into logical RAMs

° Logical RAMs are mapped to RAM blocks

[Flow: FIFO, shift register, RAM specification → Create logical memory → logical RAMs/logic → Logical-to-physical RAM processing → RAM blocks/logic → Memory/logic placement → Placed memory]


Mapping RAM to EMBs

° Implementation choice can impact design area, performance, and power.

° Some mappings may require multiple EMBs

[Figure: a user-defined (logical) memory, 4K deep × 4 wide (16K bits), mapped to physical EMB memory either as four 4K-bit M4K blocks or as one 512K M-RAM block.]


Memory Organization

° Each EMB can be configured to have different depth and width (e.g. Stratix II M4K)

° All configurations hold 4K bits

° Slightly lower power consumption for wider EMB configurations (not including routing)

[Figure: example M4K configurations: 4K words deep × 1 bit wide, 512 words deep × 8 bits wide, 128 words deep × 32 bits wide.]


Area and Delay Optimal Mapping

° Configure each EMB to be as deep as possible

° Number of address bits on each EMB same as on logical memory

° Area and performance efficient: no external logic needed

° Power inefficient: All EMBs must be active during each logical RAM access

[Figure: vertical slicing. The 4K-word × 4-bit logical memory (Addr[0:11], Data[0:3]) is built from four 4K-word × 1-bit EMBs; all 4 EMBs are active during each access.]


Alternative Mapping

° Configure EMB to have width of logical RAM (e.g. 1Kx4)

• Allows shutdown of some RAMs each cycle

• But adds some logic

° Saves RAM power, adds combinational logic and register power

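[Figure: horizontal slicing (more power efficient). The 4K-word × 4-bit logical memory (Data[0:3]) is built from four 1K-deep × 4-wide EMBs sharing Addr[0:9]; an address decoder on Addr[10:11] enables one EMB, and Addr[10:11] also selects the read output, so only 1 EMB is active during each access.]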
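The bookkeeping behind vertical and horizontal slicing reduces to two ceiling divisions. This sketch assumes the power-of-two M4K-style shapes from the memory organization slide (each 4K bits; parity widths such as ×9 are ignored) and assumes that a decoder enables exactly one row of EMBs per access.

```python
import math
from typing import Dict

# Assumed M4K-style shapes (depth, width), each holding 4K bits; parity widths ignored.
M4K_CONFIGS = [(4096, 1), (2048, 2), (1024, 4), (512, 8), (256, 16), (128, 32)]


def map_logical_ram(depth: int, width: int, emb_depth: int, emb_width: int) -> Dict[str, int]:
    """EMB usage when a depth x width logical RAM is tiled with emb_depth x emb_width EMBs."""
    rows = math.ceil(depth / emb_depth)   # EMBs stacked in depth (horizontal slices)
    cols = math.ceil(width / emb_width)   # EMBs side by side in width (vertical slices)
    return {
        "total_embs": rows * cols,
        "active_per_access": cols,        # only the addressed row is enabled
        "decode_bits": math.ceil(math.log2(rows)) if rows > 1 else 0,
    }


if __name__ == "__main__":
    for emb_depth, emb_width in M4K_CONFIGS:
        print(f"{emb_depth:>4} x {emb_width:<2}", map_logical_ram(4096, 4, emb_depth, emb_width))
```

For the 4K × 4 example, 4K × 1 EMBs give 4 blocks, all active, with no decode logic (vertical slicing). 1K × 4 EMBs give 4 blocks with only 1 active and a 2-bit decode on Addr[10:11] (horizontal slicing). Going shallower than 1K × 4 wastes width, because 512 × 8 already needs 8 blocks.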


RAM Slicing - Example

° Power reduction available with different slicing

[Chart: 4K×32 dynamic power (mW) vs. maximum EMB depth (128, 256, 512, 1K, 2K, 4K). Multiplexer power increases toward shallower depths and EMB power increases toward deeper ones, leaving a best range in between.]


Power Optimization #2: Power-aware RAM Partitioning

° The algorithm evaluates possible logical-to-physical RAM mappings

[Flow: FIFO, shift register → Create logical memory → Power-aware physical RAM processing (using a power library) → Insert decode and mux logic → Memory/logic placement → Completed placement]
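The partitioner's search can be sketched as an exhaustive loop over EMB shapes, scoring each candidate mapping with a cost model. Everything about the model here is an assumption. It uses a hypothetical "power library" with a fixed relative energy `e_emb` per active EMB and `e_mux` per output bit per extra row of the output multiplexer. The actual algorithm and characterization data are not given in the slides.

```python
import math
from typing import List, Tuple

# Assumed M4K-style shapes (depth, width), each holding 4K bits.
CONFIGS: List[Tuple[int, int]] = [(4096, 1), (2048, 2), (1024, 4), (512, 8), (256, 16), (128, 32)]


def mapping_power(depth: int, width: int, emb_depth: int, emb_width: int,
                  e_emb: float, e_mux: float) -> float:
    """Relative power of one mapping under a hypothetical linear cost model."""
    rows = math.ceil(depth / emb_depth)
    cols = math.ceil(width / emb_width)
    emb_power = e_emb * cols                 # one row of EMBs is active per access
    mux_power = e_mux * width * (rows - 1)   # output mux grows with the number of rows
    return emb_power + mux_power


def best_mapping(depth: int, width: int, e_emb: float = 1.0, e_mux: float = 0.05,
                 configs: List[Tuple[int, int]] = CONFIGS) -> Tuple[int, int]:
    """Exhaustively pick the EMB shape with the lowest modeled power."""
    return min(configs, key=lambda c: mapping_power(depth, width, c[0], c[1], e_emb, e_mux))


if __name__ == "__main__":
    for d, w in CONFIGS:
        print(f"{d:>4} x {w:<2}: {mapping_power(4096, 32, d, w, 1.0, 0.05):6.1f}")
    print("best:", best_mapping(4096, 32))
```

With these made-up coefficients, a 4K × 32 memory is best built from 1K × 4 slices. Deeper shapes activate more EMBs per access, and shallower ones need larger output muxes, which qualitatively matches the U-shaped trade-off in the 4K×32 example. Setting `e_mux` to 0 pushes the choice to the shallowest shape, and a large `e_mux` pushes it to the deepest.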


Experimental Approach

° 40 designs evaluated

° Quartus 5.1

° Mapped to the smallest possible device, targeting maximum frequency

° Simulation with test vectors

° Power analysis with PowerPlay


Memory Power

° 21.0% average memory power reduction with all techniques combined (9.7% with enable convert/combine alone)

[Chart: % memory dynamic power reduction for each of the 40 designs, comparing "Enable convert/combine" with "Enable convert/combine + Mempartition".]


Overall Core Dynamic Power

° 6.8% average core dynamic power reduction with all techniques combined (2.6% with enable convert/combine alone)

[Chart: % core dynamic power reduction for each design, comparing "Enable convert/combine" with "Enable convert/combine + mempartition".]


Design Performance

° 1.0% average performance loss with all techniques combined (0.1% with enable convert/combine)

[Chart: average design clock frequency, as % frequency improvement for each design, comparing "Enable Convert/Combine" with "Enable Convert/Combine + Mem Partition".]


Results Summary

° Almost 7% average core dynamic power reduction across all designs

• Some designs benefit more than others

° Minimal clock frequency hit for most designs

| Metric | Enable convert | Enable convert/combine | Enable convert/combine + Mem partition |
|---|---|---|---|
| Core dynamic power | -1.8% | -2.6% | -6.8% |
| Memory dynamic power | -6.3% | -9.7% | -21.0% |
| Max clk freq | -0.1% | -0.2% | -1.0% |
| LUT count | 0.0% | 0.1% | 0.7% |


Impact of Multiple Embedded Memory Blocks

° Reran the 40 designs, allowing only one target EMB type for each mapping

° All designs targeted to Stratix II EP2S180

° Significant power increase for most designs versus an EP2S180 target with no restrictions (table values are changes relative to that target)

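| Metric | M512 only | M4K only | M-RAM only |
|---|---|---|---|
| Designs completed | 23 | 38 | 4 |
| Core dynamic power | 40.4% | 6.6% | 47.3% |
| Memory power | 279.5% | 33.3% | 754.0% |
| Max clk freq. | -2.2% | 0.6% | -1.0% |
| LUT count | 0.4% | -0.5% | 0.0% |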


Summary

° The key to reducing RAM power is keeping RAM clocks disabled when the RAM is not being accessed.

° Moving read/write enables to clock enables limits dynamic activity

° The power-aware RAM partitioner attempts to select the power-optimal mapping and is combined with the clock-enable enhancement

° Overall

• About 21% average memory power reduction

- About 10% from enable convert/combine alone

• About 7% average core dynamic power reduction

- About 3% from enable convert/combine alone

• EMB diversity matters: restricting mapping to M4K blocks alone increases memory power by 33%


Summary

• FPGA power consumption is being addressed at numerous levels: architecture, circuit, CAD, and physical

• FPGA companies are just now embracing power-aware CAD; power-aware architectures are on the way

• Many circuit-level techniques are still possible

• RTL synthesis CAD techniques provide a promising area for exploration