OPTIMIZATION OF POWER REDUCTION IN FPGA INTERCONNECT BY CHARGE RECYCLING Deepa Soman, HyunSuk Nam, Rekha Srinivasaraghavan, Shashank Sivakumar

Upload: alexis-gibson

Post on 30-Dec-2015



Agenda

Day 1: Intro, Power Consumption Techniques, Power Reduction Techniques, Discussions

Day 2: Power Reduction Techniques (Contd.), Charge Recycling, Our Project, Discussions

Motivation

Achilles' heel: logic flexibility and re-programmability require longer wires, with power (7-14X) higher than ASICs

Introduction

Power Consumption

Dynamic power: power consumed while the inputs are active

$$P_{dynamic} = C V_{dd}^2 f$$

Static power: power consumed even when there is no circuit activity!

$$P_{leakage} = V_{dd} I_{sub} = V_{dd} I_{DS0}\, e^{-q V_{th} / kT}$$
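As a rough numeric illustration of the two power expressions on this slide, the sketch below evaluates the dynamic term (C times Vdd squared times f) and a subthreshold leakage term of the form Vdd times I_DS0 times exp(-q Vth / kT). The function names and all component values are assumptions chosen for the example, not figures from the talk.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C


def dynamic_power(c_farads, vdd_volts, f_hz):
    """Dynamic power: C * Vdd^2 * f."""
    return c_farads * vdd_volts ** 2 * f_hz


def leakage_power(vdd_volts, i_ds0_amps, vth_volts, temp_kelvin):
    """Static power: Vdd * I_DS0 * exp(-q * Vth / (k * T))."""
    exponent = -Q_ELECTRON * vth_volts / (K_BOLTZMANN * temp_kelvin)
    return vdd_volts * i_ds0_amps * math.exp(exponent)


# Illustrative values only (assumed, not taken from the slides).
p_full = dynamic_power(1e-9, 1.0, 100e6)
p_half_vdd = dynamic_power(1e-9, 0.5, 100e6)
print(f"halving Vdd scales dynamic power by {p_half_vdd / p_full:.2f}")

cool = leakage_power(1.0, 1e-6, 0.3, 300.0)
hot = leakage_power(1.0, 1e-6, 0.3, 350.0)
print(f"leakage grows by a factor of {hot / cool:.1f} from 300 K to 350 K")
```

The quadratic dependence on Vdd is what makes voltage scaling attractive, and the exponential term is why leakage is sensitive to both threshold voltage and temperature.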

Why Panic about Power?

Why Static Power??

Low Power Opportunities

Hardware Techniques

• Voltage Scaling
• Dual Vdd
• Frequency Scaling
• Clock Gating

Voltage Scaling

Selecting core voltage based on performance requirements

How to Choose? – From Timing Analysis

Types: 1) Static Voltage Scaling 2) Dynamic Voltage Scaling


1. Static Voltage Scaling

• Runs at one selected core voltage only
• Realized using an on-chip low-dropout regulator (LDO)
• Voltage controlled by the configuration bitstream
• 0.8 V: minimum dynamic and leakage power
• 1.0 V: overall highest performance

[Figure: LDO with 1.0 V and 0.8 V levels]

[1] "A FPGA Prototype Design Emphasis on Low Power Technique", Xu, Jian

2. Dynamic Voltage Scaling

• Provides different voltage levels
• Realized using a voltage controlling unit, which can be a level shifter or a DC-DC converter

DVS implementation

[Figure: LDMC (Logic Delay Measurement Circuit); delay error]

"Dynamic Voltage Scaling for Commercial FPGAs", C.T. Chow, L.S.M. Tsui, P.H.W.

Dual Supply Voltage (Vdd)

• Separate voltage supplies for the configuration SRAM and other elements
• Purpose: to support sleep mode
• Shut down most logic except the SRAM using the LDO

"A Dual-VDD Low Power FPGA Architecture", A. Gayasen, K. Lee, N. Vijaykrishnan, M. Kandemir, M.J. Irwin, and T. Tuan

Performance

Power reduction:
• Static voltage scaling (SVS): nearly 53%
• Dynamic voltage scaling (DVS): up to 54%
• Dual Vdd: 14%

Merits:
• SVS: simple hardware
• DVS: self-adaptive
• Dual Vdd: eliminates the speed penalty

Demerits:
• SVS: voltage is fixed
• DVS: design complexity
• Dual Vdd: area overhead

[1] "A FPGA Prototype Design Emphasis on Low Power Technique", Xu, Jian
[2] "A 90-nm Low-Power FPGA for Battery-Powered Applications", Tuan, Das, Steve, Sean

Frequency Scaling

$P_{dynamic} = C V_{dd}^2 f$, where f is the frequency of switching

Dynamic Clock Management Implementations

(a) Simple dynamic clock management circuit

(b) PLL circuit with feedback: can reduce skew, at the cost of lock time

(c) Dynamic clock division

Merits:
• Can subsequently reduce voltage

Demerits:
• Increased latency
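The merit and demerit listed for frequency scaling can be checked with a small calculation. The sketch below assumes a task that needs a fixed number of clock cycles and uses the dynamic power relation C times Vdd squared times f; `run_task` and every number in it are hypothetical, chosen only to show the trend.

```python
import math


def run_task(cycles, c_farads, vdd_volts, f_hz):
    """Return (power_w, latency_s, energy_j) for a task of `cycles` clock cycles."""
    power = c_farads * vdd_volts ** 2 * f_hz
    latency = cycles / f_hz
    return power, latency, power * latency


CYCLES = 1_000_000  # assumed task length
C_SW = 1e-9         # assumed switched capacitance

full_speed = run_task(CYCLES, C_SW, 1.0, 100e6)
half_clock = run_task(CYCLES, C_SW, 1.0, 50e6)
half_clock_low_vdd = run_task(CYCLES, C_SW, 0.8, 50e6)

for name, (p, t, e) in [("full speed", full_speed),
                        ("clock / 2", half_clock),
                        ("clock / 2, 0.8 V", half_clock_low_vdd)]:
    print(f"{name:18s} power={p:.3f} W latency={t * 1e3:.1f} ms energy={e * 1e3:.3f} mJ")
```

In this model, halving the clock halves the power but doubles the latency, so the energy per task is unchanged; the energy only drops once the slower clock is used to lower the supply voltage as well.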

Benefits of Frequency Scaling

As frequency decreases, power consumption also decreases

"Dynamic Clock Management for Low Power Applications in FPGAs", Lan, Zilic

Clock Gating

• Controls the clock flow
• Purpose: to temporarily disable blocks
• Can be realized in hardware using clock enable signals
• Minimizes power dissipation in clock circuits/network
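A toy model helps show why gating the clock saves power in the clock network. The sketch below simply counts the clock edges that reach a block, with and without gating by a clock enable signal; `clock_edges` and the 25% duty pattern are assumptions for illustration, not a model of any specific FPGA.

```python
def clock_edges(enable_per_cycle, gated):
    """Count clock edges delivered to a block over a sequence of cycles.

    With gating, the clock only reaches the block in cycles where its
    enable signal is high; without gating it toggles every cycle.
    """
    edges = 0
    for enabled in enable_per_cycle:
        if not gated or enabled:
            edges += 2  # one rising and one falling edge per cycle
    return edges


# Assumed activity pattern: the block is needed in 25 of 100 cycles.
enable = [True] * 25 + [False] * 75

ungated = clock_edges(enable, gated=False)
gated = clock_edges(enable, gated=True)
print(f"ungated: {ungated} edges, gated: {gated} edges")
```

Since each edge charges or discharges the clock load, the reduction in edges is a first-order proxy for the clock power saved while the block is idle.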

Clock Gating - Performance

industry-a, b, c, and d are DSP circuits, while the remaining circuits were collected from customers and are of unknown function

Over 20% power reduction is observed for the DSP circuits

Clock Power Reduction for Virtex-5 FPGAs

Eliminates unnecessary toggling on outputs, gates of FFs, and clock signals

Demerits: clock skew

"Clock Power Reduction for Virtex-5 FPGAs", Wang, Gupta, Anderson

Software Techniques

• System Level:
  • Algorithm Modification
• CAD Tools:
  • Logic Partitioning
  • Mapping
  • Clustering
  • Placement & Routing

Low Power FFT Implementation Architecture

• Matrix multiplication: a 1D array dissipates less power than a 2D array
• Module disabling: clock gating to disable modules, e.g. twiddle factor calculation
• Dynamic memory activation
• Multiple time multiplexed pipeline uP
• Parallel processing
• Algorithm: block matrix multiplication
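Block matrix multiplication, named in the list above, splits the operands into tiles so that each tile is reused while it is held locally. The following is a plain software sketch of the idea for square matrices; it is not the hardware architecture from the cited work, and `block_matmul` and the tile size are hypothetical names and values.

```python
def block_matmul(a, b, block):
    """Multiply two square matrices (lists of lists) tile by tile."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    # Walk over tiles of the output (i0, j0) and of the shared dimension (k0).
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                # Multiply one tile of a by one tile of b and accumulate.
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, n)):
                        a_ik = a[i][k]
                        for j in range(j0, min(j0 + block, n)):
                            c[i][j] += a_ik * b[k][j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(block_matmul(a, b, block=1))
```

The result is the same as the ordinary triple loop; only the order of the accumulations changes, which is what lets a hardware or cache-aware implementation keep a tile resident instead of refetching it.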

FFT implementation Results

17% to 26% power reduction

"High throughput energy efficient multi-FFT architecture on FPGAs", Chen, Park, Prasanna

Energy Reduction Contributions of CAD Stages

Clustering contributes the major share!

"On the interaction between power aware FPGA CAD algorithms", Julien, Steven

Power Aware Clustering

• Power-aware T-VPack
• How? Modify the cost function to include power

Results: power-aware clustering

"Netlength Based Routability Driven Power Aware Clustering", Akoglu, Easwaran

Power Aware Placement

Results

"On the interaction between power aware FPGA CAD algorithms", Julien, Steven

Temperature Aware Routing

Leakage current increases exponentially with temperature

Switching capacitance

Algorithm

Discourage the routing algorithm from forming connections that cross hotspot regions

Cost Function Modification:
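The slide does not spell out the modified cost function, so the following is only a sketch of one way a router could be discouraged from crossing hotspots. The penalty form, the `weight` parameter, and the temperatures are all assumptions made for illustration; they are not taken from the cited paper.

```python
def temperature_aware_cost(base_cost, node_temp_c, hotspot_temp_c, weight=4.0):
    """Hypothetical cost: inflate a routing node's cost when it is hotter
    than the hotspot threshold; cooler nodes keep their base cost."""
    excess = max(0.0, node_temp_c - hotspot_temp_c)
    return base_cost * (1.0 + weight * excess / hotspot_temp_c)


def cheapest_path(paths, hotspot_temp_c):
    """Pick the path name with the lowest total temperature-aware cost.

    `paths` maps a name to a list of (base_cost, temperature_c) nodes.
    """
    def total(nodes):
        return sum(temperature_aware_cost(b, t, hotspot_temp_c) for b, t in nodes)
    return min(paths, key=lambda name: total(paths[name]))


# Assumed example: a short route through a hot region vs. a longer cool detour.
paths = {
    "through_hotspot": [(1.0, 95.0), (1.0, 95.0)],
    "detour": [(1.0, 60.0), (1.0, 60.0), (1.0, 60.0)],
}
print(cheapest_path(paths, hotspot_temp_c=80.0))
```

With the threshold at 80 degrees the longer detour becomes cheaper than the short hot route, while raising the threshold above 95 degrees restores the shorter route.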

Power savings range between 30% and 63%

"A Temperature-Aware Placement and Routing targeting 3D FPGAs", Kostas, Soudris

Power-Aware FPGA Design Flow

Step 1: RTL
• Power-based architectural (high-level modelling)
• Voltage scaling, Dual Vdd
• Frequency scaling, clock gating

Step 2: CAD Tools
• Power-aware packing or clustering
• Power-aware placement
• Power-aware routing

Main/Baseline Paper

Problem addressed: power consumption in FPGAs is dominated by the interconnect (62%)

Proposed idea: charge recycling for power reduction in FPGA interconnect

Charge Recycling (CR)

Charge Recycling in FPGAs

How? "Unused routing resources" serve as charge reservoirs

• Reduces charge drawn from Vdd
• 25% reduction in energy

[Figure: charge recycling sequence, steps 1 to 7; wire labels: Unused/Reservoir, Unused/Reservoir, Unused w/o friends!!]
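To see why a reservoir reduces the charge drawn from Vdd, consider an idealized two-capacitor model: before a rising transition the load wire first shares charge with the reservoir, and before a falling transition it dumps charge back into it. The sketch below assumes lossless switches and uses hypothetical names (`share`, `supply_energy_per_cycle`); it is a toy model, not the circuit from the baseline paper.

```python
import math


def share(c_a, v_a, c_b, v_b):
    """Voltage after two capacitors are connected and share charge."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


def supply_energy_per_cycle(c_load, c_res, vdd, cycles=50):
    """Steady-state energy drawn from Vdd per rise/fall cycle of the load."""
    v_res = 0.0
    energy = 0.0
    for _ in range(cycles):
        # Rising: the discharged load first shares charge with the reservoir.
        v_shared = share(c_load, 0.0, c_res, v_res)
        v_res = v_shared
        # The supply then only has to lift the load from v_shared to Vdd.
        energy = c_load * vdd * (vdd - v_shared)
        # Falling: the load at Vdd shares with the reservoir, then is grounded.
        v_res = share(c_load, vdd, c_res, v_res)
    return energy


baseline = 1.0 * 1.0 * 1.0  # C * Vdd^2 without recycling (C = 1, Vdd = 1)
recycled = supply_energy_per_cycle(c_load=1.0, c_res=1.0, vdd=1.0)
print(f"energy with recycling: {recycled / baseline:.3f} of baseline")
```

With equal capacitances this ideal model settles at two thirds of the baseline energy. That is a property of the toy model only and should not be read as the measured figure on the slide, since a real circuit adds switch losses and control overhead.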

CR-Capable FPGA Interconnect

Analysis: four components

• SRAM cell: produces signals CR and TS, which control a switch (normal, CR, tri-state)
• Delay line: transition between VIN and DLOUT
• CR circuit: performs the charge sharing between the load and the reservoir
• Input stage

Experiments/Methodology

• VPR 6.0
• Baseline: island-style, unidirectional, Wilton (K=6, N=4)
• Router: PathFinder, with cost function modification
• Post-routing: CR mode
• The VPR place/route tool helps in finding the % increase in area

VPR Cost Function

Cost Function – Path Finder

Modified Cost Function
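For orientation, the sketch below writes out the classic PathFinder negotiated-congestion node cost, (b_n + h_n) * p_n, followed by a purely hypothetical CR-aware variant. The discount for nodes with a free neighbouring track (`has_free_neighbor`, `cr_bonus`) is an assumption made for illustration and is not the modified cost function used in the baseline paper.

```python
def pathfinder_node_cost(base_cost, history_cost, present_congestion):
    """Classic PathFinder node cost: (b_n + h_n) * p_n."""
    return (base_cost + history_cost) * present_congestion


def cr_aware_node_cost(base_cost, history_cost, present_congestion,
                       has_free_neighbor, cr_bonus=0.2):
    """Hypothetical variant: make a node slightly cheaper when an unused
    adjacent track could later act as its charge reservoir."""
    cost = pathfinder_node_cost(base_cost, history_cost, present_congestion)
    if has_free_neighbor:
        cost *= (1.0 - cr_bonus)
    return cost


plain = cr_aware_node_cost(1.0, 0.5, 2.0, has_free_neighbor=False)
with_reservoir = cr_aware_node_cost(1.0, 0.5, 2.0, has_free_neighbor=True)
print(plain, with_reservoir)
```

A term of this kind would bias the router toward wires that can be paired with a reservoir, while the congestion terms still resolve overuse in the usual way.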

Post-Routing

• Mixed integer linear program (MILP)
• Tries to maximize the number of nodes put into CR mode
• Constraint: critical delay of the circuit
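The post-routing selection can be pictured on a toy instance. The sketch below replaces the MILP solver with an exhaustive search, which is only feasible for a handful of nodes, and assumes that each node placed in CR mode adds a fixed delay penalty to every path through it. The function name, the penalty model, and the example paths are all hypothetical.

```python
from itertools import product


def best_cr_assignment(paths, cr_delay_penalty, critical_delay):
    """Return the largest set of nodes that can be put in CR mode without
    any path exceeding `critical_delay`.

    `paths` is a list of (base_delay, [node ids on the path]).
    Exhaustive search stands in for the MILP on this tiny example.
    """
    nodes = sorted({n for _, ns in paths for n in ns})
    best = set()
    for bits in product((0, 1), repeat=len(nodes)):
        chosen = {n for n, bit in zip(nodes, bits) if bit}
        feasible = all(
            base + cr_delay_penalty * sum(n in chosen for n in ns) <= critical_delay
            for base, ns in paths
        )
        if feasible and len(chosen) > len(best):
            best = chosen
    return best


# Assumed example: the first path is already critical, so it has no slack.
paths = [(10.0, ["a", "b"]), (6.0, ["b", "c"]), (4.0, ["d"])]
print(sorted(best_cr_assignment(paths, cr_delay_penalty=1.0, critical_delay=10.0)))
```

Nodes on the zero-slack path stay in normal mode, while nodes that only appear on paths with slack can be switched to CR mode, which mirrors the objective and constraint listed above.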

Results

Dynamic power in the FPGA interconnect is reduced by roughly 15% to 18.4%

Results Continued…

Number of min-width transistors as the area metric

Reductions in power savings are not directly proportional to the reduction in CR-capable switches (area)

What do we propose that is new?

• Not all unused wires become friends
• Unused wires are connected to a constant voltage
• "URekha": tri-state the unused wires for "further power savings!!"
• ~6% savings

Thank you