Energy-Performance Trade-offs in Processor Architecture and Circuit Design: A Marginal Cost Analysis. Omid Azizi, Aqeel Mahesri, Ben Lee, Sanjay Patel, Mark Horowitz. Stanford University, UIUC. ISCA 2010, June 21, 2010.


Page 1: Energy-Performance Trade-offs in Processor Architecture ...isca2010.inria.fr/media/slides/Azizi-ISCA2010-EnergyEfficient... · Energy-Performance Trade-offs in Processor Architecture

Energy-Performance Trade-offs in Processor Architecture and Circuit Design: A Marginal Cost Analysis

Omid Azizi, Aqeel Mahesri, Ben Lee, Sanjay Patel, Mark Horowitz

Stanford University, UIUC

ISCA 2010

June 21, 2010


The Power Problem

Processor designs today are power-constrained

VDD has stopped scaling, so the problem will only get worse

[Figure: Power Ceiling]


A New Era of Design

We have to be careful with power consumption in designs

Many design features offer performance, but come at a power cost

Question: How should you spend your power budget?

What design features are worth including?

How can we optimize designs for energy efficiency?

The New Design Objective: Design for Energy Efficiency


The Energy-Performance Design Space

Every design can be plotted in the performance-energy space

We want designs on the energy-efficient frontier

[Figure: designs plotted in the performance-energy space, with the energy-efficient frontier marked]
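The energy-efficient frontier is the set of Pareto-optimal designs. A design is on it when no other design is both at least as fast and no more energy-hungry. The sketch below finds that set. It assumes each design is summarized by a hypothetical (name, performance, energy per operation) tuple; the design names and numbers are made up.

```python
from typing import List, Tuple

# Hypothetical design summary: (name, performance, energy per operation).
Design = Tuple[str, float, float]


def efficient_frontier(designs: List[Design]) -> List[Design]:
    """Return the non-dominated designs (exact duplicates are reported once).

    A design is kept only if it uses strictly less energy than every design
    that is at least as fast and was already examined.
    """
    # Walk from fastest to slowest; among equally fast designs, cheapest first.
    ordered = sorted(designs, key=lambda d: (-d[1], d[2]))
    frontier: List[Design] = []
    best_energy = float("inf")  # lowest energy among faster-or-equal designs seen
    for design in ordered:
        if design[2] < best_energy:
            frontier.append(design)
            best_energy = design[2]
    return frontier[::-1]  # report in order of increasing performance


designs = [
    ("A", 1.0, 10.0),
    ("B", 2.0, 20.0),
    ("C", 1.5, 25.0),  # slower and costlier than B, so it is dominated
    ("D", 3.0, 40.0),
]
print([name for name, _, _ in efficient_frontier(designs)])  # ['A', 'B', 'D']
```

Sorting by performance turns the dominance check into a single pass. A design survives only if it beats the lowest energy of everything faster.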


Optimizing for Energy Efficiency

Goal: Find the processors on the efficient frontier

Study: Consider a large part of the processor design space

High-level architectures

In-order vs out-of-order, single-issue vs dual-issue vs quad-issue, etc.

Micro-architectural design knobs

Cache sizes, pipeline depth, instruction window sizes, etc.

Circuit design

Gate sizing, circuit topology, circuit style, etc.


Outline

Quick review of optimization and marginal costs

Experimental Methodology

Modeling approach for performance and power

Integrated architecture-circuit optimization framework

Results

Compare designs from a simple single-issue in-order core…

…to an aggressive quad-issue out-of-order processor


Marginal Costs & Optimization

Finding efficient designs is a trade-off analysis problem

A design feature usually affects both performance and energy

To gauge efficiency of design choices, we use marginal costs

We want the choices with the lowest energy cost per unit of performance

If we know marginal costs, then we can optimize a design

“Buy” parameters with a low marginal cost, “sell” parameters with high cost

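$$\text{Marginal Cost of } x = \frac{\partial E / \partial x}{\partial P / \partial x}$$

where $\partial E / \partial x$ is the energy cost of x and $\partial P / \partial x$ is the performance benefit of x.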
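The marginal-cost ratio above can be estimated numerically for any knob x. The sketch below uses central differences. The `energy` and `perf` functions stand for whatever models are available for a knob; the quadratic and linear ones in the example are made up.

```python
from typing import Callable


def marginal_cost(energy: Callable[[float], float],
                  perf: Callable[[float], float],
                  x: float, h: float = 1e-4) -> float:
    """Estimate (dE/dx) / (dP/dx) at parameter value x.

    Both derivatives use central differences. The 2h denominators cancel,
    so only the two differences are needed.
    """
    d_energy = energy(x + h) - energy(x - h)
    d_perf = perf(x + h) - perf(x - h)
    return d_energy / d_perf


# Hypothetical knob: energy grows as x**2, performance grows as 3*x.
mc = marginal_cost(lambda x: x ** 2, lambda x: 3.0 * x, x=3.0)
print(round(mc, 6))  # 2.0: each unit of performance costs 2 units of energy here
```

To "buy" and "sell" as the slide suggests, evaluate this for every knob. Then move the knob with the lowest value up and the one with the highest value down.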


A Circuit-Aware Approach To Energy Modeling

Current power modeling tools use fixed energy costs for circuits

But circuits can be designed in different ways

Trade-off: faster circuits require more energy, slower circuits save energy

For true optimization, we need circuit-aware architectural models

[Figure: energy (E) vs. delay (D) trade-off curves for the adder, multiplier, register file, I-cache, and decoder]
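A circuit-aware model needs each unit's energy-delay curve, not one fixed energy number. One simple way to store such a curve is a list of alternative implementations per circuit. The model then picks the cheapest one that meets a delay target. The library contents below (delays in FO4, energies in pJ) are invented for illustration and are not the paper's data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical energy-delay library: each circuit has several implementations,
# given as (delay in FO4, energy in pJ). The numbers are made up for illustration.
LIBRARY: Dict[str, List[Tuple[float, float]]] = {
    "adder": [(8.0, 1.0), (10.0, 0.7), (14.0, 0.5)],
    "multiplier": [(20.0, 9.0), (25.0, 6.0), (35.0, 4.5)],
}


def cheapest_meeting(circuit: str, max_delay: float) -> Optional[Tuple[float, float]]:
    """Return the lowest-energy implementation that fits in max_delay, or None."""
    feasible = [point for point in LIBRARY[circuit] if point[0] <= max_delay]
    if not feasible:
        return None
    return min(feasible, key=lambda point: point[1])


print(cheapest_meeting("adder", 12.0))       # (10.0, 0.7)
print(cheapest_meeting("multiplier", 15.0))  # None: no implementation is that fast
```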


Example: Simple In-order Processor

[Figure: simple in-order pipeline with PC, NPC/branch predictor, I-cache, register file, adder, multiplier, FP adder, D-cache, queue, and write-back]

How big should I make my I-cache?

How fast should I run it?

How fast should I run my multiplier?

[Figure: energy (E) vs. delay (D) and energy vs. size trade-off curves]


Optimization Framework Overview

[Figure: optimization framework. Blocks: macro-architecture; benchmark app(s); simulate random designs; fit architecture model; circuit trade-offs library (energy E vs. delay D curves for adder, multiplier, register file, I-cache); architecture-circuit link; energy budget; optimizer (GP solver); output: optimized micro-architecture]


Step 1: Create Architectural Models

Use statistical inference to capture a large design space


Statistical Performance Modeling

[Figure: two approaches compared.
Traditional performance modeling & design optimization: architecture configuration → simulator → performance data point → evaluate design (design optimization loop).
Statistical inference performance modeling & design optimization: random architecture configurations → simulator → statistical inference (data fit) → analytical performance model → evaluate design (design optimization loop).]
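The statistical-inference flow can be sketched in a few lines:

1. Sample random configurations.
2. Simulate each one once.
3. Fit a cheap analytical model that the optimization loop queries instead of the simulator.

In this sketch, `fake_simulator` is a hypothetical stand-in, not the Joshua simulator. The model form is also an assumption: a single-variable power law, fitted by least squares in log-log space. A power law (monomial) fits naturally into a geometric-programming formulation. Real models would cover many parameters at once.

```python
import math
import random
from typing import List, Tuple


def fake_simulator(cache_kb: float) -> float:
    """Hypothetical stand-in for a cycle-accurate simulator: returns IPC.

    A made-up power law; real simulator output would be noisy and depend on
    many parameters at once.
    """
    return 0.8 * cache_kb ** 0.15


def fit_power_law(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x**b, done as linear regression on logs."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


rng = random.Random(0)                                 # fixed seed for repeatability
samples = [rng.uniform(4.0, 64.0) for _ in range(20)]  # random cache sizes (KB)
ipcs = [fake_simulator(s) for s in samples]            # one simulation per sample
a, b = fit_power_law(samples, ipcs)


def model(cache_kb: float) -> float:
    """Cheap analytical model queried inside the design optimization loop."""
    return a * cache_kb ** b
```

The fake simulator is noise-free, so the fit recovers its coefficients exactly. With real data the residuals would indicate how far the model can be trusted.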


Step 2: Characterize Circuit Trade-offs



Step 3: Integrate circuit trade-offs into architectural models

To create circuit-aware models


Step 4: Optimize

Use posynomial models so the design problem becomes a geometric program (GP), which is convex after a log transformation and can be solved efficiently
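A tiny special case shows why the optimizer ends up equalizing marginal costs. The setup below is an illustration built on assumptions, not the authors' formulation:

- A fixed delay budget is split among circuits in series.
- Each circuit's energy is a hypothetical monomial a·D^(−b), with a, b > 0.

The objective is convex in the delays. The optimality (KKT) condition is that every circuit saves the same energy per extra unit of delay. The sketch finds that common marginal cost λ by bisection, so no GP solver is needed.

```python
import math
from typing import List, Tuple


def min_energy_delays(units: List[Tuple[float, float]], total_delay: float,
                      iters: int = 200) -> List[float]:
    """Minimize sum(a_i * D_i**(-b_i)) subject to sum(D_i) == total_delay.

    At the optimum every unit has the same marginal cost:
    a_i * b_i * D_i**(-b_i - 1) == lam, so D_i = (a_i * b_i / lam)**(1 / (b_i + 1)).
    Each D_i shrinks as lam grows, so we bisect on log(lam) until the delays
    add up to total_delay.
    """
    def delays(lam: float) -> List[float]:
        return [(a * b / lam) ** (1.0 / (b + 1.0)) for a, b in units]

    lo, hi = -50.0, 50.0  # bracket for log(lam)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(delays(math.exp(mid))) > total_delay:
            lo = mid  # delays too long: a larger lam is needed
        else:
            hi = mid
    return delays(math.exp(0.5 * (lo + hi)))


units = [(2.0, 1.0), (1.0, 2.0)]  # hypothetical (a, b) per circuit
d = min_energy_delays(units, total_delay=3.0)
costs = [a * b * di ** (-b - 1.0) for (a, b), di in zip(units, d)]
print(d, costs)  # the two marginal costs come out equal
```

A real GP solver handles general posynomial objectives and constraints. This closed-form-plus-bisection approach only works for this single-constraint case.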


Experimental Setup

90nm CMOS technology

Static logic, except for SRAMs

Energy-delay trade-offs

Logic units: use synthesis tools

Large memories: use CACTI

Architectural Simulator

Joshua simulator from UIUC

Applications

SPECint

Let’s look at the design space without voltage scaling first…


Energy-Performance Tradeoff Space

Optimization of a dual-issue out-of-order processor

Significant performance-energy trade-off range as we tune underlying parameters

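[Figure: energy vs. performance of optimized dual-issue out-of-order designs, TSMC 90nm at 1.2 V; the curve spans ~3x in energy and ~6x in performance]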


Annotated design points:

Clock cycle 18.6 FO4; integer unit 1 cycle; I-cache 32Kb @ 2 cycles; D-cache 42Kb @ 1 cycle; instruction window 8 entries

Clock cycle 19.0 FO4; integer unit 1 cycle; I-cache 32Kb @ 2.2 cycles; D-cache 18Kb @ 1 cycle; instruction window 9 entries

Clock cycle 28.4 FO4; integer unit 1 cycle; I-cache 32Kb @ 1.6 cycles; D-cache 10Kb @ 1 cycle; instruction window 9 entries


Exploring High-Level Architectures

[Figure: 2-issue out-of-order architecture]


[Figure: 1-issue in-order architecture]


[Figure: 2-issue in-order architecture]


[Figure: 4-issue in-order architecture]


[Figure: 1-issue out-of-order architecture]


[Figure: 4-issue out-of-order architecture]


[Figure: optimal architecture along the energy-efficient frontier; labeled regions: 1-issue in-order, 2-issue in-order, 2-issue ooo, 4-issue ooo, 4-in]

1-issue out-of-order: never efficient


Voltage Scaling

Voltage is a powerful parameter

Just turn up the voltage a bit, and everything runs faster

So let’s add voltage scaling to the study now…


[Figure: voltage scaling over 0.7 V to 1.4 V, normalized to 0.9 V; spans ~4x in energy and ~3x in performance]


Optimization: It’s All About Marginal Costs

To optimize, you want the cheapest source of performance

Broadly, we consider two sources…

You can buy from or sell to either source (with no transaction/exchange fees)

Architecture & Circuit Design

Current Price: 6%

Voltage Scaling

Current Price: 1%

For 1% performance


What the Vendors are Offering: Energy-Performance Cost Profiles

Voltage Scaling

Current Price: 1%

Architecture & Circuit Design

Current Price: 5%


Scenario #1: Unoptimized Design

Voltage Scaling

Current Price: 1%

Architecture & Circuit Design

Current Price: 5%


Question: What should you do?


Voltage Scaling

Current Price: 1.1%

Architecture & Circuit Design

Current Price: 2%

150 MIPS lost, 50 pJ/op saved

150 MIPS regained, 16 pJ/op spent
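The arithmetic behind this trade is shown below. It assumes, though the slide does not say so explicitly, that the 150 MIPS given up came from architecture & circuit design (the expensive source) and the 150 MIPS regained came from voltage scaling (the cheap source). Performance ends where it started, while energy per operation falls:

```latex
\Delta P = -150\ \text{MIPS} + 150\ \text{MIPS} = 0, \qquad
\Delta E = -50\ \text{pJ/op} + 16\ \text{pJ/op} = -34\ \text{pJ/op}
```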


Voltage Scaling

Current Price: 1.1%

Architecture & Circuit Design

Current Price: 2% 2%


Scenario #2: Changing Costs

Let’s say you start with your now optimized design

But you want more performance…so you start buying from both categories

But let’s say Voltage Scaling costs never change

While Architecture & Circuit Design quickly become more expensive

You use up all the good architecture & circuit design techniques

Architecture & Circuit Design

Current Price: 2%

Voltage Scaling

Current Price: 2%

For 1% performance


Optimal architecture/circuit design never changes
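Scenario #2 can be mimicked with a greedy buyer. It purchases performance 1% at a time from whichever source is currently cheaper. Prices are in % energy per 1% performance, as on the slides. The rising architecture price curve is a hypothetical example. Adding percentages instead of compounding them is a small-step simplification.

```python
from typing import Callable, Tuple


def allocate(target_pct: int,
             arch_price: Callable[[int], float],
             volt_price: float) -> Tuple[int, int, float]:
    """Buy target_pct percent of performance, 1% at a time, from the cheaper source.

    arch_price(n): hypothetical price (% energy) of the (n+1)-th 1% bought from
    architecture & circuit design; it rises as the good techniques are used up.
    volt_price: voltage-scaling price, held constant as in the scenario.
    Returns (% bought from architecture, % bought from voltage, total % energy).
    """
    arch_bought = volt_bought = 0
    energy = 0.0
    for _ in range(target_pct):
        price = arch_price(arch_bought)
        if price <= volt_price:  # ties go to architecture
            arch_bought += 1
            energy += price
        else:
            volt_bought += 1
            energy += volt_price
    return arch_bought, volt_bought, energy


# Both sources start at 2%; architecture gets 0.5 points pricier per 1% bought.
print(allocate(10, lambda n: 2.0 + 0.5 * n, volt_price=2.0))  # (1, 9, 20.0)
```

After the first step, every further increment comes from voltage scaling. In this toy model, the architecture/circuit design stops changing once its price exceeds the voltage price.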


Voltage Scaling Marginal Costs

Marginal cost profile for voltage scaling is relatively steady

Costs don’t change too rapidly

MC% = % energy cost for 1% performance

[Figure: voltage-scaling marginal cost over 0.7 V to 1.4 V, normalized to 0.9 V; annotated values MC% = 0.8 and MC% = 2.3]
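For small changes, "% energy for 1% performance" is the log-derivative d(ln E)/d(ln P). The sketch below estimates it numerically. The voltage model in the example is an assumption:

- Dynamic energy per operation is taken as ∝ V².
- Frequency follows the alpha-power law, (V − Vt)^α / V, with made-up Vt = 0.3 V and α = 1.3.

The model ignores leakage, so its numbers will not match the slide. It only shows MC% rising with voltage.

```python
import math
from typing import Callable


def mc_percent(energy: Callable[[float], float],
               perf: Callable[[float], float],
               v: float, h: float = 1e-4) -> float:
    """% energy cost per 1% performance at knob setting v.

    For small changes this is d(ln E) / d(ln P), estimated with central
    differences in the knob v (the knob's own units cancel).
    """
    d_ln_e = math.log(energy(v + h)) - math.log(energy(v - h))
    d_ln_p = math.log(perf(v + h)) - math.log(perf(v - h))
    return d_ln_e / d_ln_p


def energy_v(v: float) -> float:
    """Hypothetical dynamic energy per operation, proportional to V**2."""
    return v ** 2


def perf_v(v: float) -> float:
    """Hypothetical alpha-power-law frequency with Vt = 0.3 V, alpha = 1.3."""
    return (v - 0.3) ** 1.3 / v


for volts in (0.7, 0.9, 1.4):
    print(volts, round(mc_percent(energy_v, perf_v, volts), 2))
```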


Architecture-Circuit Marginal Costs

Compare voltage scaling vs architectural marginal costs

[Figure: architecture-circuit trade-off curve annotated with marginal costs MC% = 14.3, 6.2, 3.2, 1.65, 0.92, 0.66, 0.49, 0.25]


Matching Marginal Costs

Recall: at the optimum, the marginal costs of all sources of performance must match


[Figure: architecture + circuit design trade-off curve]


Small region of optimal designs


Architecture Sweet Spot

The interesting region is where architectural marginal costs match the voltage-scaling marginal costs


Annotated design points:

Clock cycle 19.6 FO4; integer unit 1 cycle; I-cache 32Kb @ 2.2 cycles; D-cache 14Kb @ 1.1 cycles; instruction window 10 entries

Clock cycle 20.6 FO4; integer unit 1 cycle; I-cache 32Kb @ 2.3 cycles; D-cache 12Kb @ 1.1 cycles; instruction window 11 entries
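The sweet spot can be located mechanically. Compute the MC% of each segment of an architecture frontier, then keep the segments that fall inside the voltage-scaling MC% band, taken here as the 0.8 to 2.3 range annotated on the voltage slide. The frontier points below are hypothetical.

```python
import math
from typing import List, Tuple


def sweet_spot(frontier: List[Tuple[float, float]],
               mc_lo: float, mc_hi: float) -> List[int]:
    """Indices of frontier segments whose MC% falls within [mc_lo, mc_hi].

    frontier: (performance, energy) points sorted by increasing performance.
    Segment i joins points i and i + 1; its MC% is the ratio of log changes,
    i.e. % energy per % performance along that segment.
    """
    hits = []
    for i in range(len(frontier) - 1):
        (p0, e0), (p1, e1) = frontier[i], frontier[i + 1]
        mc = math.log(e1 / e0) / math.log(p1 / p0)
        if mc_lo <= mc <= mc_hi:
            hits.append(i)
    return hits


# Hypothetical architecture frontier; the band is the 0.8-2.3 voltage MC% range.
frontier = [(1.0, 1.0), (2.0, 1.2), (3.0, 2.5), (4.0, 8.0)]
print(sweet_spot(frontier, 0.8, 2.3))  # [1]
```

Segments below the band are cheap performance that should be bought. Segments above it cost more than voltage scaling and should be sold.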


Full Optimization With Voltage Scaling


Recall: Without Voltage Scaling

[Figure: optimal architecture along the frontier without voltage scaling; labeled regions: 1-issue in-order, 2-issue in-order, 2-issue ooo, 4-issue ooo, 4-in]


With voltage scaling, two architectures dominate the energy-efficient frontier: 2-issue ooo and 2-issue in-order

[Figure: optimal architecture along the frontier with voltage scaling]


A Few Designs Can Go A Long Way

Voltage scaling with two fixed designs (architecture and circuits)

Can still achieve within 3% of optimal for a large part of the design space!

[Figure: energy overhead vs. the optimal frontier, with a 3% overhead line]
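"Within 3% of optimal" means the best voltage-scaled fixed design uses at most 3% more energy than the best design overall, at the same performance. The sketch measures that overhead. Its voltage model is an assumption: each design's MC% is held constant, so energy scales as (P/P0)^MC. This loosely follows the slides' remark that voltage MC% changes slowly. All designs and numbers are made up.

```python
from typing import List, Tuple

# Hypothetical design: (energy, performance) at its nominal voltage, plus a
# constant voltage-scaling MC% (% energy per 1% performance).
Design = Tuple[float, float, float]


def scaled_energy(design: Design, perf: float) -> float:
    """Energy after voltage-scaling the design to reach perf (constant-MC% model)."""
    e0, p0, mc = design
    return e0 * (perf / p0) ** mc


def overhead(fixed: List[Design], candidates: List[Design], perf: float) -> float:
    """Fractional energy overhead of the best fixed design vs. the best of all designs."""
    best_fixed = min(scaled_energy(d, perf) for d in fixed)
    best_any = min(scaled_energy(d, perf) for d in candidates + fixed)
    return best_fixed / best_any - 1.0


fixed = [(1.0, 1.0, 1.5), (3.0, 2.0, 1.5)]    # two fixed designs
others = [(0.97, 1.0, 1.5), (2.0, 1.5, 1.5)]  # designs a full re-optimization might pick
for p in (0.8, 1.0, 1.5, 2.0):
    print(p, round(overhead(fixed, others, p), 3))  # <= 0.03 means "within 3%"
```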

Page 49: Energy-Performance Trade-offs in Processor Architecture ...isca2010.inria.fr/media/slides/Azizi-ISCA2010-EnergyEfficient... · Energy-Performance Trade-offs in Processor Architecture

49

Conclusion

Joint optimization of architecture and circuits is possible

All you need is a performance simulator and circuit libraries

When optimizing, always consider marginal costs

Our framework helps do this in a systematic fashion

Efficient processor design

Architecture/circuits have rapidly changing marginal costs; voltage less so

The law of diminishing returns sets in rapidly for architecture/circuit design

Small set of architecture/circuit features are efficient

Important to pick a good architecture (in the sweet spot)

Want well-tuned design (cache sizes, cycle time, etc.)

Then voltage scaling can go a long way to achieve the desired performance target


Thank You!