Compilers for DSP Processors and Low-Power

Jenq-Kuen Lee, Department of Computer Science, National Tsing-Hua Univ., Hsinchu, Taiwan


Post on 11-Jan-2016


Page 1: Compilers  for DSP Processors and Low-Power

Compilers for DSP Processors and Low-Power

Jenq-Kuen Lee
Department of Computer Science
National Tsing-Hua Univ.
Hsinchu, Taiwan

Page 2: Compilers  for DSP Processors and Low-Power

Agenda

• DSP Compilers
• Compilers for Low-Power

Page 3: Compilers  for DSP Processors and Low-Power

NSC 3C DSP Compiler Infrastructures

Target Machine
• DSP Processor
• Low power instruction support.

Compiler Infrastructure
• Cross-Compiler: GNU Compiler Collection v1.37.1. Supports low power instructions.
• Cross-Assembler: A newly written assembler. Supports low power instructions.

Page 4: Compilers  for DSP Processors and Low-Power

3C DSP Compiler Infrastructure (cont'd)

[Figure: block diagram of the GNU Compiler Collection based flow. Box labels: Source Program, Front-End, Optimization, Instruction scheduling for low power, Assembly Code, Assembler, Executable Code, Machine description, Low power instructions / support.]

Page 5: Compilers  for DSP Processors and Low-Power

ORISAL Architecture Description Language

ORISAL is based on Java-like syntax and style.

Object-oriented style reduces the effort of writing a specification from scratch and gives designers a more natural view of coding.

Object-oriented style will reduce mistakes compared to ADLs based on other, imperative languages.

ORISAL will incorporate power model descriptions to deliver more adaptable power simulations and optimizations for low power.

Page 6: Compilers  for DSP Processors and Low-Power

ORISAL and Simulator Generator

Benefits in ORISAL:
• Java has native thread and exception handling support, so it could behave better than other languages (C/C++) in its synchronization mechanisms.
• The simulator could easily be extended to distributed network environments to accelerate large-scale System-on-a-Chip simulation (RMI and JavaBeans).

Status:
• Simulator Generator implementation is in progress, and we will have an example simulator in several months.
• Power model design is in progress.

Page 7: Compilers  for DSP Processors and Low-Power

ORISAL and DSP Library Porting

We have designed a preliminary pseudo assembly language:
• A speed-critical or size-critical program written in pseudo assembly can be retargeted to another platform more easily than a compiler can.
• Pseudo assembly with machine description annotations provides a different layer for code optimizations in compiler toolkits, especially for library optimizations.

Status:
• We are continuing to implement a pseudo assembler.
• We are starting to write a pseudo-assembly-based DSP library while continuing to enhance the features of our pseudo assembly design.

Page 8: Compilers  for DSP Processors and Low-Power

Compiler Optimization for Low Power on Power Gating

Yi-Ping You, Chingren Lee, Jenq Kuen Lee

Programming Language Lab.
National Tsing Hua University

Page 9: Compilers  for DSP Processors and Low-Power

Motivation

Power is dissipated while components are idling.

Static/leakage power accounts for the majority of power dissipated when the circuit is inactive.

Clock gating doesn't help reduce leakage power.

Page 10: Compilers  for DSP Processors and Low-Power

Significance of Leakage Power

As transistors become smaller and faster, static/leakage power becomes an important factor.
• Deep submicron CMOS circuits

Leakage power soon becomes comparable to dynamic power.

Page 11: Compilers  for DSP Processors and Low-Power

Trends in Dynamic and Static Power Dissipation (From Intel)

[Figure: Power (Watts), log scale from 1.E-05 to 1.E+02, versus Technology Generation (um): 1.0, 0.8, 0.6, 0.35, 0.25, 0.18. Series: Dynamic, Static.]

Page 12: Compilers  for DSP Processors and Low-Power

Leakage Power Trend in Temperature (0.13um) (From Intel)

[Figure: Power (Watts), 0 to 100, versus Temperature (C), 30 to 110 in steps of 10. Series: Dynamic, Static. Conditions: 0.13um, 15mm die, 1V. Percentage data labels, in order along the temperature axis: 1%, 2%, 3%, 5%, 8%, 11%, 15%, 20%, 26%.]

Page 13: Compilers  for DSP Processors and Low-Power

Leakage Power Trend in Temperature (0.1um) (From Intel)

[Figure: Power (Watts), 0 to 100, versus Temperature (C), 30 to 110 in steps of 10. Series: Dynamic, Static. Conditions: 0.1um, 15mm die, 0.7V. Percentage data labels, in order along the temperature axis: 6%, 9%, 14%, 19%, 26%, 33%, 41%, 49%, 56%.]

Page 14: Compilers  for DSP Processors and Low-Power

Static/Leakage Power Reduction

$$P_{static} = V_{cc} \cdot N \cdot K_{design} \cdot I_{leak}$$

• $V_{cc}$: partition circuits into several domains operating at different supply voltages
• $N$: reduce number of devices
• $K_{design}$: use more efficient circuits
• $I_{leak}$: technology parameter; subthreshold leakage
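The static power formula above can be explored numerically. The following is a minimal sketch, not part of the original slides: the function name `static_power` and all numeric values are hypothetical, picked only to show how each factor scales the result.

```python
def static_power(v_cc, n_devices, k_design, i_leak):
    """Static power in watts: P_static = V_cc * N * K_design * I_leak."""
    return v_cc * n_devices * k_design * i_leak


# Hypothetical values (not from the slides), used only for illustration.
baseline = static_power(v_cc=1.0, n_devices=1_000_000, k_design=0.5, i_leak=1e-9)

# Halving the number of powered devices halves the static power,
# which is the lever that power gating pulls.
gated = static_power(v_cc=1.0, n_devices=500_000, k_design=0.5, i_leak=1e-9)

print(baseline, gated)
```

Because the model is a plain product, any reduction in one factor carries over proportionally to the total, which is why the slide lists one reduction technique per term.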

Page 15: Compilers  for DSP Processors and Low-Power

Power Gating

A sleep transistor powers the circuit on or off.

Used to turn off useless components in processors.

[Figure: a Functional Block connected between Vdd and a Virtual Ground, which is switched by a Sleep FET. Source: Kao et al., DAC'97.]

Page 16: Compilers  for DSP Processors and Low-Power

Machine Architecture

[Figure: block diagram of the target machine. Labels: Power Gating Control Register (64 bits); Program Counter (PC - 4, PC, PC + 4, PC + 8, ...); Instruction Bus (32 bits); Instruction Decoder; Micro Codes; Integer ALU/Normal Operation; Integer Multiplier; Floating Point Adder; Floating Point Multiplier; Floating Point Divider; Integer Registers (64 bits x 32); Floating Point Registers (64 bits x 32); Input/Output (64 bits); Constant Supplying Voltage.]

Page 17: Compilers  for DSP Processors and Low-Power

Objective

Use compiler analysis techniques to analyze program behaviors
• Data-flow analysis

Insert power gating instructions at proper points in programs
• Find the maximum inactive intervals
• Employ power gating if necessary

Page 18: Compilers  for DSP Processors and Low-Power

Component-Activity Data-Flow Analysis

$$\mathit{comp\_in}[B] = \bigcup_{P \text{ a predecessor of } B} \mathit{comp\_out}[P]$$

$$\mathit{comp\_out}[B] = \mathit{comp\_gen}[B] \cup (\mathit{comp\_in}[B] - \mathit{comp\_kill}[B])$$

Page 19: Compilers  for DSP Processors and Low-Power

Component-Activity Data-Flow Analysis (Cont.)

A component-activity is
• generated at a point p if a component is required for execution at this point
• killed if the component is released by the last request

Page 20: Compilers  for DSP Processors and Low-Power

Data-Flow Analysis Algorithm for Component Activities

Begin
  for each block B do begin  /* computation of comp_gen */
    for each component C that will be used by B do begin
      RemainingCycle[B][C] := N, where N is the number of cycles needed for C by B;
      comp_gen[B] := comp_gen[B] ∪ {C};
    end
  end
  for each block B do begin
    comp_in[B] := comp_kill[B] := ∅;
    comp_out[B] := comp_gen[B];
  end
  while changes to any comp_out occur do begin  /* iterative analysis */
    for each block B do begin
      for each component C do begin  /* computation of comp_kill */
        RemainingCycle[B][C] := MAX(RemainingCycle[P][C]) - 1, where P is a predecessor of B;
        if RemainingCycle[B][C] = 0 then comp_kill[B] := comp_kill[B] ∪ {C};
      end
      comp_in[B] := ∪ comp_out[P], where P is a predecessor of B;  /* computation of comp_in */
      comp_out[B] := comp_gen[B] ∪ (comp_in[B] - comp_kill[B]);  /* computation of comp_out */
    end
  end
End
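The iterative algorithm above can be sketched in Python. This is one possible reading, not the authors' implementation: the function `component_activity`, the helper `bits`, and the data layout are hypothetical, and the sketch assumes that a block which uses a component resets that component's remaining-cycle counter to its own latency. It is exercised on a straight-line chain of nine blocks; behaviour on graphs with back edges is not examined here.

```python
COMPONENTS = ["ALU", "Mul", "Div", "DBus"]  # bit order of the 4-bit sets


def component_activity(blocks, preds, uses):
    """blocks: names in program order; preds: block -> predecessor list;
    uses: block -> {component: cycles needed by that block}."""
    gen = {b: set(uses[b]) for b in blocks}
    remaining = {b: dict(uses[b]) for b in blocks}
    comp_in = {b: set() for b in blocks}
    kill = {b: set() for b in blocks}
    out = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:  # iterate until nothing changes
        changed = False
        for b in blocks:
            new_in = set()
            for p in preds[b]:
                new_in |= out[p]
            new_kill = set()
            new_rem = dict(uses[b])  # assumption: a use resets the counter
            for c in new_in:
                left = max(remaining[p].get(c, 0) for p in preds[b]) - 1
                if left <= 0:
                    new_kill.add(c)  # last request has completed
                else:
                    new_rem[c] = max(new_rem.get(c, 0), left)
            new_out = gen[b] | (new_in - new_kill)
            if new_out != out[b] or new_rem != remaining[b]:
                changed = True
            comp_in[b], kill[b], out[b], remaining[b] = (
                new_in, new_kill, new_out, new_rem)
    return comp_in, kill, out


def bits(components):
    """Render a component set as a bit string in COMPONENTS order."""
    return "".join("1" if c in components else "0" for c in COMPONENTS)


# Hypothetical straight-line chain: each block's only predecessor is the
# block before it.
uses = {"B1": {}, "B2": {"ALU": 3}, "B3": {"DBus": 1}, "B4": {"Mul": 4},
        "B5": {}, "B6": {}, "B7": {"DBus": 1}, "B8": {"Div": 2}, "B9": {}}
blocks = list(uses)
preds = {b: ([blocks[i - 1]] if i else []) for i, b in enumerate(blocks)}
comp_in, kill, out = component_activity(blocks, preds, uses)
for b in blocks:
    print(b, bits(comp_in[b]), bits(kill[b]), bits(out[b]))
```

Each printed row gives the block followed by its in, kill, and out sets as bit strings, so the result can be compared against a hand-computed table.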

Page 21: Compilers  for DSP Processors and Low-Power

Example for comp_gen_set Computation

Mapping table

Instruction   Component    Execution Latency
I1            ALU          3
I2            Multiplier   4
I3            Divider      2
I4            Data Bus     1
I5            ALU          2
              Data Bus     2
I6            others       -

Instruction sequence

B1: I6
B2: I1
B3: I2
B4: I6
B5: I4
B6: I3
B7: I5
B8: I6
B9: I4
B10: I6

Page 22: Compilers  for DSP Processors and Low-Power

Example for comp_gen_set Computation (Cont.)

         CycleCount
Block    ALU  Mul  Div  DBus    comp_gen_set
B1       0    0    0    0       0000
B2       3    0    0    0       {ALU} 1000
B3       0    4    0    0       {MUL} 0100
B4       0    0    0    0       0000
B5       0    0    0    1       {DBus} 0001
B6       0    0    2    0       {Div} 0010
B7       2    0    0    2       {ALU, DBus} 1001
B8       0    0    0    0       0000
B9       0    0    0    1       {DBus} 0001
B10      0    0    0    0       0000
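The comp_gen_set column can be derived mechanically from the mapping table and the instruction sequence. The sketch below is illustrative only; the names `MAPPING`, `SEQUENCE`, and `comp_gen_bits` are hypothetical, and the bit order (ALU, Mul, Div, DBus) is taken from the table header.

```python
COMPONENTS = ["ALU", "Mul", "Div", "DBus"]  # bit order of comp_gen_set

# Instruction -> {component: execution latency}, from the mapping table.
MAPPING = {
    "I1": {"ALU": 3},
    "I2": {"Mul": 4},
    "I3": {"Div": 2},
    "I4": {"DBus": 1},
    "I5": {"ALU": 2, "DBus": 2},
    "I6": {},  # "others": uses none of the tracked components
}

# Instruction executed by blocks B1..B10, from the instruction sequence.
SEQUENCE = ["I6", "I1", "I2", "I6", "I4", "I3", "I5", "I6", "I4", "I6"]


def comp_gen_bits(instruction):
    """Bit string with a 1 for every component the instruction uses."""
    used = MAPPING[instruction]
    return "".join("1" if c in used else "0" for c in COMPONENTS)


for number, instruction in enumerate(SEQUENCE, start=1):
    cycles = [MAPPING[instruction].get(c, 0) for c in COMPONENTS]
    print(f"B{number}", cycles, comp_gen_bits(instruction))
```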

Page 23: Compilers  for DSP Processors and Low-Power

Example for Component-Activity Data Flow Analysis (1/4)

Block   Instruction   comp_gen_set
B1      I6            0000
B2      I1            1000
B3      I4            0001
B4      I2            0100
B5      I6            0000
B6      I6            0000
B7      I4            0001
B8      I3            0010
B9      I6            0000
B10     I1            1000
B11     I6            0000
B12     I3            0010
B13     I4            0001
B14     I6            0000

Page 24: Compilers  for DSP Processors and Low-Power

Example for Component-Activity Data Flow Analysis (2/4)

Initial values

Block   comp_in_set   comp_kill_set   comp_out_set
B1      0000          0000            0000
B2      0000          0000            1000
B3      0000          0000            0001
B4      0000          0000            0100
B5      0000          0000            0000
B6      0000          0000            0000
B7      0000          0000            0001
B8      0000          0000            0010
B9      0000          0000            0000
B10     0000          0000            1000
B11     0000          0000            0000
B12     0000          0000            0010
B13     0000          0000            0001
B14     0000          0000            0000

Page 25: Compilers  for DSP Processors and Low-Power

Example for Component-Activity Data Flow Analysis (3/4)

        Pass 1                Pass 2
Block   in     kill   out     in     kill   out
B1      0000   0000   0000    0000   0000   0000
B2      0000   0000   1000    0000   0000   1000
B3      1000   0000   1001    1000   0000   1001
B4      1001   0001   1100    1001   0001   1100
B5      1100   1000   0100    1100   1000   0100
B6      0100   0000   0100    0100   0000   0100
B7      0100   0000   0101    0100   0000   0101
B8      0101   0101   0010    0101   0101   0010
B9      0010   0000   0010    0010   0000   0010
B10     0010   0010   1000    0010   0010   1000
B11     1000   0000   1000    1000   0000   1000
B12     0010   0010   0010    0010   0010   0010
B13     0010   0000   0011    0010   0000   0011
B14     0011   0011   0000    0011   0011   0000

Page 26: Compilers  for DSP Processors and Low-Power

Example for Component-Activity Data Flow Analysis (4/4)

        Component-Activity
Block   ALU        Multiplier   Divider    Data Bus   comp_out_set
B1      INACTIVE   INACTIVE     INACTIVE   INACTIVE   0000
B2      ACTIVE     INACTIVE     INACTIVE   INACTIVE   1000
B3      ACTIVE     INACTIVE     INACTIVE   ACTIVE     1001
B4      ACTIVE     ACTIVE       INACTIVE   INACTIVE   1100
B5      INACTIVE   ACTIVE       INACTIVE   INACTIVE   0100
B6      INACTIVE   ACTIVE       INACTIVE   INACTIVE   0100
B7      INACTIVE   ACTIVE       INACTIVE   ACTIVE     0101
B8      INACTIVE   INACTIVE     ACTIVE     INACTIVE   0010
B9      INACTIVE   INACTIVE     ACTIVE     INACTIVE   0010
B10     ACTIVE     INACTIVE     INACTIVE   INACTIVE   1000
B11     ACTIVE     INACTIVE     INACTIVE   INACTIVE   1000
B12     INACTIVE   INACTIVE     ACTIVE     INACTIVE   0010
B13     INACTIVE   INACTIVE     ACTIVE     ACTIVE     0011
B14     INACTIVE   INACTIVE     INACTIVE   INACTIVE   0000
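Once the comp_out sets are known, candidate idle intervals for a component can be read off as runs of zero bits. The sketch below is a simplification and not from the slides: it treats the rows B1..B14 as if they executed strictly in order, which is an assumption because the control-flow graph of the example is not preserved in this transcript. The name `inactive_runs` is hypothetical.

```python
def inactive_runs(activity):
    """Return (start_index, length) for each maximal run of inactive blocks."""
    runs, start = [], None
    for i, active in enumerate(activity):
        if not active and start is None:
            start = i                      # a new inactive run begins
        elif active and start is not None:
            runs.append((start, i - start))
            start = None                   # the run ended at block i - 1
    if start is not None:                  # run extends to the last block
        runs.append((start, len(activity) - start))
    return runs


# comp_out_set for B1..B14; bit order is ALU, Multiplier, Divider, Data Bus.
comp_out = ["0000", "1000", "1001", "1100", "0100", "0100", "0101",
            "0010", "0010", "1000", "1000", "0010", "0011", "0000"]

divider_active = [row[2] == "1" for row in comp_out]
divider_runs = inactive_runs(divider_active)
print(divider_runs)
```

Each run is a candidate for power gating; whether gating pays off depends on comparing the run length with the component's break-even time.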

Page 27: Compilers  for DSP Processors and Low-Power

Cost Model

$$E_{turn\text{-}off}(Component) + E_{turn\text{-}on}(Component) \le BreakEven_{Component} \times P_{static}(Component)$$

Left hand side:
• Energy consumed when power gating is employed

Right hand side:
• Normal energy consumption
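Read as a rule for choosing the break-even time, the inequality gives the smallest idle period for which gating does not cost more than it saves. The sketch below is a hedged illustration: the function names, the per-cycle unit for static power, and all numbers are hypothetical and not taken from the slides.

```python
import math


def break_even_cycles(e_turn_off, e_turn_on, p_static_per_cycle):
    """Smallest whole number of idle cycles that satisfies the inequality."""
    return math.ceil((e_turn_off + e_turn_on) / p_static_per_cycle)


def worth_gating(idle_cycles, e_turn_off, e_turn_on, p_static_per_cycle):
    """True when the switching energy is covered by the leakage saved."""
    return e_turn_off + e_turn_on <= idle_cycles * p_static_per_cycle


# Hypothetical energies in arbitrary units; leakage is 1 unit per cycle.
threshold = break_even_cycles(e_turn_off=10.0, e_turn_on=22.0,
                              p_static_per_cycle=1.0)
print(threshold)
print(worth_gating(40, 10.0, 22.0, 1.0), worth_gating(16, 10.0, 22.0, 1.0))
```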

Page 28: Compilers  for DSP Processors and Low-Power

Scheduling Policies for Power Gating

Basic_Blk_Sched
• Schedule power gating instructions in a given basic block

MIN_Path_Sched
• Schedule power gating instructions by assuming the minimum length among plausible program paths

AVG_Path_Sched
• Schedule power gating instructions by assuming the average length among plausible program paths
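The difference between the minimum-length and average-length policies can be illustrated on a small acyclic graph of inactive blocks. This sketch is an assumption-laden illustration, not the authors' scheduler: `path_lengths`, the example graph, the break-even value, and the use of an unweighted mean are all hypothetical (the slides do not say how the average is computed).

```python
def path_lengths(succ, start, end):
    """Lengths, in blocks, of every path from start to end in an acyclic graph."""
    if start == end:
        return [1]
    lengths = []
    for nxt in succ.get(start, []):
        lengths.extend(1 + n for n in path_lengths(succ, nxt, end))
    return lengths


# Hypothetical region where a component is inactive: A branches to B or C.
succ = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "E": ["D"]}

lengths = path_lengths(succ, "A", "D")
min_len = min(lengths)
avg_len = sum(lengths) / len(lengths)  # unweighted mean, an assumption

break_even = 3  # hypothetical break-even time, in blocks
gate_under_min_policy = min_len > break_even
gate_under_avg_policy = avg_len > break_even
print(sorted(lengths), min_len, avg_len)
print(gate_under_min_policy, gate_under_avg_policy)
```

In this example the minimum-length policy is the conservative one: it declines to gate because the shortest path does not exceed the break-even time, while the average-length policy gates and accepts a loss whenever the short path is taken.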

Page 29: Compilers  for DSP Processors and Low-Power

MIN_Path_Sched (Based on Depth-First Traversal)

MIN_Path_Sched(C, B, Branched, Count) {
  if block B is the end of CFG then return Count;
  if block B has two children then do {
    if C ∉ comp_out[B] then do {  // conditional branch, inactive
      Count := Count + 1;
      l_Count := r_Count := Count;
      if left edge is a forward edge then
        l_Count := MIN_Path_Sched(C, left child of B, TRUE, Count);
      if right edge is a forward edge then
        r_Count := MIN_Path_Sched(C, right child of B, TRUE, Count);
      if MIN(l_Count, r_Count) > BreakEven_C and !Branched then
        schedule power gating instructions at the head and tail of inactive blocks;
      return MIN(l_Count, r_Count);
    } else {  // conditional branch, active
      if Count > BreakEven_C and !Branched then
        schedule power gating instructions at the head and tail of inactive blocks;
      if left edge is a forward edge then
        l_Count := MIN_Path_Sched(C, left child of B, FALSE, Count);
      if right edge is a forward edge then
        r_Count := MIN_Path_Sched(C, right child of B, FALSE, Count);
      return Count;
    }

Page 30: Compilers  for DSP Processors and Low-Power

  } else {
    if C ∉ comp_out[B] then do {  // statements except conditional branch, inactive
      Count := Count + 1;
      if edge is a forward edge then
        return MIN_Path_Sched(C, child of B, Branched, Count);
      else
        return Count;
    } else {  // statements except conditional branch, active
      if Count > BreakEven_C and !Branched then
        schedule power gating instructions at the head and tail of inactive blocks;
      if the edge pointing to child of B is a forward edge then
        MIN_Path_Sched(C, child of B, FALSE, Count);
      return Count;
    }
  }
}

Page 31: Compilers  for DSP Processors and Low-Power

Experimental Environment

Alpha-compatible architecture

Incorporated into the compiler tools SUIF and MachSUIF

Evaluated by the Wattch power estimator, which is based on the SimpleScalar architectural simulator

[Figure: experimental flow.
Compilation: .c Source Code; SUIF (Classical Optimization, High SUIF to Low SUIF); MachSUIF (Representation Translation, Alpha Code Generation, CFG construction, Pseudo Code Elimination, Register Allocation, Stack Frame HouseKeeping); Low Power Optimization (Component-Activity Data-Flow Analysis, Power Gating Scheduling); .s Alpha Assembly Code.
Simulation: .s Alpha Assembly Code; Alpha Assembler & Linker; Alpha Executable Code; Wattch Simulator; Power Results.]

Page 32: Compilers  for DSP Processors and Low-Power

Alpha 21264 Power Components

Global Clock Network       32%
Instruction Issue Units    18%
Caches                     15%
Floating Execution Units   10%
Integer Execution Units    10%
Memory Management Unit     8%
I/O                        5%
Miscellaneous Logic        2%

Source: DAC'98, Digital Equipment Corp.

Page 33: Compilers  for DSP Processors and Low-Power

Benchmarks

Collections of common benchmarks from the FAQ of the comp.benchmarks USENET newsgroup, ftp://ftp.nosc.mail/pub/aburto
• hanoi, heapsort, nsieve, queens, tfftdp, shuffle, eqntott,

Page 34: Compilers  for DSP Processors and Low-Power

Power Gating on FPAdder for nsieve

[Figure: Power (Watts), 0 to 1.6, for Clock Gating and for BreakEven Cycle values 2, 4, 8, 16, 32, 64. Series: BASIC_BLK_Sched, MIN_Path_Sched, AVG_Path_Sched.]

Page 35: Compilers  for DSP Processors and Low-Power

Power Gating on FPMultiplier for nsieve

[Figure: Power (Watts), 0 to 0.4, for Clock Gating and for BreakEven Cycle values 2, 4, 8, 16, 32, 64. Series: BASIC_BLK_Sched, MIN_Path_Sched, AVG_Path_Sched.]

Page 36: Compilers  for DSP Processors and Low-Power

Power Gating on FPAdder (BreakEven=32)

[Figure: Floating Point Adder power (Watts), 0 to 1.6, per benchmark: hanoi, heapsort, nsieve, queens, tfftdp, eqntott-test1, eqntott-test2, eqntott-test3, eqntott-test4. Series: Clock_Gating; Power_Gating, BASIC_BLK_Sched; Power_Gating, MIN_Path_Sched; Power_Gating, AVG_Path_Sched.]

Page 37: Compilers  for DSP Processors and Low-Power

Power Gating on FPMultiplier (BreakEven=32)

[Figure: Floating Point Multiplier power (Watts), 0 to 0.4, per benchmark: hanoi, heapsort, nsieve, queens, tfftdp, eqntott-test1, eqntott-test2, eqntott-test3, eqntott-test4. Series: Clock_Gating; Power_Gating, BASIC_BLK_Sched; Power_Gating, MIN_Path_Sched; Power_Gating, AVG_Path_Sched.]

Page 38: Compilers  for DSP Processors and Low-Power

Summary

We investigated compiler analysis techniques for reducing leakage power
• Component-activity data-flow analysis
• Power gating scheduling

Our approach reduces power consumption compared with clock gating
• Average 82% for FPUnits
• Average 1.5% for IntUnits
• 0.78%~14.58% (avg. 9.9%) for total power consumption

Page 39: Compilers  for DSP Processors and Low-Power

Conclusion

Architecture design and system software arrangements are playing an increasingly important role in energy reduction.

We present research results on compiler support for low power.

Reference projects: NSF/NSC Project (3C), ITRI CCL Project, MOEA Project 2002-2005 (Pending).