BranchTap: Improving Performance With Very Few Checkpoints Through Adaptive Speculation Control

Patrick Akl and Andreas Moshovos
AENAO Research Group
Department of Electrical and Computer Engineering
University of Toronto

June 28th, 2006

What Happens on a Branch Misprediction?

Execution Timeline: Predict a Branch Outcome → Misprediction Discovered → Recover Processor State → Redirect Fetch → Resume Execution

[Figure labels: Predicted Path, Correct Path]

• We wish to make the recovery fast

State-of-the-art recovery

• Existing mechanisms
  – Reorder buffer based: slow
  – Instantaneous checkpoints: faster

• Problem: can’t have enough checkpoints

• State-of-the-art solution: checkpoint prediction
  – Allocate the few checkpoints judiciously

• Another degree of freedom: speculation control
  – Sometimes deeper speculation = higher recovery cost
    • Can hurt performance
  – Throttle speculation

BranchTap Results / Benefits

• No additional checkpoints are needed

• Dynamically adapts to application behavior

• Improves performance for most programs
  – Misprediction performance penalty reduced by 28% on average

• BranchTap comes “for free”
  – Very simple to implement
  – Better than more accurate checkpoint predictors

Outline

• Background

• BranchTap

• Methodology and Results

• Summary

State Recovery Example: Register Alias Table

[Figure: the RAT maps each architectural register to a physical register. Dimension labels: "# arch. regs" and "Lg(# arch. regs)". Entries shown: p1, p2, p3; the animated entry reads p4, p5, p5, p4 in sequence.]

Original Code
A: add r1, r2, 100
B: breq r1, E
C: sub r1, r2, r2

Renamed Code
A: add p4, p2, 100
B: breq p4, E
C: sub p5, p2, p2
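The renaming step in this example can be sketched in Python. This is a minimal illustration, not the authors' implementation: the starting mappings (r1 to p1, r2 to p2, r3 to p3), the free list beginning at p4, and the names `rename` and `free_list` are assumptions chosen so that the result matches the renamed code on the slide.

```python
def rename(program, rat, free_list):
    """Rename architectural registers to physical ones, updating `rat` in place.

    Each instruction is (label, opcode, dest, sources); dest is None for
    instructions that write no register (here, the branch).
    """
    renamed = []
    for label, opcode, dest, sources in program:
        # Sources read the current mapping; immediates and labels pass through.
        new_sources = [rat.get(s, s) for s in sources]
        new_dest = None
        if dest is not None:
            new_dest = free_list.pop(0)  # allocate a fresh physical register
            rat[dest] = new_dest         # later readers of `dest` see the new name
        renamed.append((label, opcode, new_dest, new_sources))
    return renamed


# Assumed starting state: r1->p1, r2->p2, r3->p3, free registers from p4 up.
rat = {"r1": "p1", "r2": "p2", "r3": "p3"}
free_list = ["p4", "p5", "p6"]
program = [
    ("A", "add", "r1", ["r2", "100"]),
    ("B", "breq", None, ["r1", "E"]),
    ("C", "sub", "r1", ["r2", "r2"]),
]
renamed = rename(program, rat, free_list)
```

After the call, `rat["r1"]` holds p5. If branch B turns out to be mispredicted, the mapping has to go back to p4, which is the state recovery problem the rest of the talk addresses.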

ROB: Slow, Fine-Grain Recovery

[Figure: reorder buffer in program order holding several branches (B); the RAT is marked INVALID after the misprediction.]

1. Misprediction discovered
2. Locate newest instruction
3. Undo RAT updates in reverse order

Each entry contains
1. Architectural destination register
2. Its previous RAT map

• Too slow: recovery latency proportional to number of instructions to squash
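The three recovery steps can be sketched as a reverse walk over the reorder buffer. This is a simplified model under stated assumptions: `RobEntry`, `rob_recover`, and the list-based ROB are hypothetical names, and the returned count only stands in for recovery latency.

```python
from collections import namedtuple

# Hypothetical ROB entry holding the two fields listed on the slide.
RobEntry = namedtuple("RobEntry", ["dest", "prev_map"])


def rob_recover(rat, rob, branch_index):
    """Squash everything younger than rob[branch_index], restoring `rat`.

    Returns the number of entries walked, a stand-in for recovery latency.
    """
    walked = 0
    while len(rob) > branch_index + 1:
        entry = rob.pop()                     # newest instruction first
        if entry.dest is not None:
            rat[entry.dest] = entry.prev_map  # undo this instruction's RAT update
        walked += 1
    return walked


# State after renaming A (add), B (breq), C (sub) from the RAT example,
# assuming r1 mapped to p1 before A.
rat = {"r1": "p5", "r2": "p2"}
rob = [RobEntry("r1", "p1"), RobEntry(None, None), RobEntry("r1", "p4")]
walked = rob_recover(rat, rob, branch_index=1)  # B was mispredicted
```

The loop runs once per squashed instruction, which is why the latency grows with the number of instructions fetched past the mispredicted branch.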

Global Checkpoints: Fast, Coarse-Grain Recovery

[Figure: reorder buffer in program order holding several branches (B), with checkpoints attached; the RAT is marked INVALID after the misprediction.]

1. Misprediction discovered

• Branch w/ GC: Recovery is “Instantaneous”
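A global checkpoint can be modelled as a full snapshot of the RAT taken at a branch. The sketch below is illustrative only: the class `CheckpointedRat`, its method names, and the policy of releasing the restored checkpoint together with all younger ones are assumptions, not details given in the slides.

```python
class CheckpointedRat:
    """RAT with a small, fixed number of global checkpoints (GCs)."""

    def __init__(self, mapping, max_checkpoints):
        self.mapping = dict(mapping)
        self.max_checkpoints = max_checkpoints
        self.checkpoints = []  # (branch_id, snapshot), oldest first

    def take_checkpoint(self, branch_id):
        """Snapshot the RAT at a branch; returns False if no GC is free."""
        if len(self.checkpoints) >= self.max_checkpoints:
            return False
        self.checkpoints.append((branch_id, dict(self.mapping)))
        return True

    def restore(self, branch_id):
        """Recover in one step by copying the branch's snapshot back."""
        for i, (bid, snapshot) in enumerate(self.checkpoints):
            if bid == branch_id:
                self.mapping = dict(snapshot)
                del self.checkpoints[i:]  # assumed: release this GC and younger ones
                return True
        return False  # no GC for this branch: a slower recovery is needed


rat = CheckpointedRat({"r1": "p4", "r2": "p2"}, max_checkpoints=1)
took_b = rat.take_checkpoint("B")      # checkpoint at branch B
rat.mapping["r1"] = "p5"               # C renames r1 after the branch
took_other = rat.take_checkpoint("D")  # no free GC left for a later branch
restored = rat.restore("B")
```

Unlike the ROB walk, the restore cost does not depend on how many instructions follow the branch, but the second `take_checkpoint` call shows the limitation: with few GCs, some branches go uncovered.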

Impact of More Checkpoints

• More checkpoints?
  – Power hungry structure
  – Increased delay

• Only a few checkpoints can practically be implemented
  – Cannot always cover all branches

[Figure: two panels, "Concept" and "Actual Implementation". Labels: architectural register, physical register, RAT, Working Copy, checkpoints.]

Intelligent Checkpointing

• State of the art solution
  – Checkpoint allocation: Allocate checkpoints at hard-to-predict branches
  – Checkpoint management: Release checkpoints as soon as they are no longer needed

• Use few checkpoints efficiently
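Allocating checkpoints at hard-to-predict branches requires some way to flag a branch as low confidence. The slides do not say which estimator is used, so the sketch below assumes a resetting-counter table, a common design: the counter grows on each correct prediction and drops to zero on a misprediction. The names `ConfidenceTable` and `wants_checkpoint`, the indexing by PC, and the threshold of 4 are all illustrative assumptions.

```python
class ConfidenceTable:
    """Resetting-counter confidence estimator (an assumed design)."""

    def __init__(self, entries=1024, threshold=4):
        self.counters = [0] * entries
        self.threshold = threshold

    def _index(self, pc):
        return (pc >> 2) % len(self.counters)  # drop byte offset, then wrap

    def is_low_confidence(self, pc):
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            self.counters[i] = min(self.counters[i] + 1, self.threshold)
        else:
            self.counters[i] = 0  # one misprediction resets confidence


def wants_checkpoint(table, pc, free_checkpoints):
    """Allocate a checkpoint only at a low confidence branch, if one is free."""
    return free_checkpoints > 0 and table.is_low_confidence(pc)


table = ConfidenceTable()
pc = 0x400100                                  # hypothetical branch address
cold = wants_checkpoint(table, pc, free_checkpoints=1)
for _ in range(4):
    table.update(pc, prediction_correct=True)  # branch behaves predictably
warm = wants_checkpoint(table, pc, free_checkpoints=1)
table.update(pc, prediction_correct=False)     # one misprediction
after_miss = wants_checkpoint(table, pc, free_checkpoints=1)
```

A branch that has never been seen starts as low confidence here, so it receives a checkpoint when one is free; that starting value is also a design choice of this sketch.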

Conventional Mechanisms: Recovery Scenarios

• Misspeculation on a branch w/ a GC: Direct recovery (Fast Recovery)

• Misspeculation on a branch w/o a GC: Indirect recovery (Slow Recovery)

• With intelligent checkpointing:
  – 30% indirect recoveries → 75% of performance loss

[Figure: two ROB diagrams with branches (B) and a checkpoint, one labelled "Fast Recovery" and one labelled "Slow Recovery".]

BranchTap Motivation

[Figure: two ROB diagrams, "No Wait Scenario" and "Wait Scenario". Each shows branches (B), checkpoints, a low confidence branch, the point where the misprediction is discovered, and the resulting recovery cost.]

• Sometimes, it is better to wait if no checkpoint is available

BranchTap Concept

• Key idea: stall when speculation is likely to deteriorate performance
  – Count the number of low confidence branches w/o a checkpoint
  – If it exceeds a threshold, stall

• Threshold selection
  – Fixed
    • Varies greatly across programs
    • Can deteriorate performance significantly
  – Adaptive
    • Robust performance

• Minimize recovery cost while conserving good speculation opportunities
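The key idea amounts to one counter and one comparison. The following sketch shows that logic under stated assumptions: the class `SpeculationThrottle`, its method names, and the points at which the counter is updated (branch rename, and branch resolution or squash) are hypothetical, since the slides only specify the count-and-compare rule.

```python
class SpeculationThrottle:
    """Stall the front end when too many risky branches are unprotected."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.unprotected = 0  # in-flight low confidence branches w/o a checkpoint

    def on_branch_renamed(self, low_confidence, has_checkpoint):
        if low_confidence and not has_checkpoint:
            self.unprotected += 1

    def on_branch_done(self, low_confidence, had_checkpoint):
        """Call when such a branch resolves or is squashed."""
        if low_confidence and not had_checkpoint:
            self.unprotected -= 1

    def should_stall(self):
        # "If it exceeds a threshold, stall"
        return self.unprotected > self.threshold


throttle = SpeculationThrottle(threshold=1)
throttle.on_branch_renamed(low_confidence=True, has_checkpoint=False)
stall_after_one = throttle.should_stall()
throttle.on_branch_renamed(low_confidence=True, has_checkpoint=True)   # covered, not counted
throttle.on_branch_renamed(low_confidence=True, has_checkpoint=False)
stall_after_two = throttle.should_stall()
throttle.on_branch_done(low_confidence=True, had_checkpoint=False)
stall_after_resolve = throttle.should_stall()
```

With a threshold of 1, the second unprotected low confidence branch triggers the stall, and resolving one of them lifts it again.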

Threshold Adaptation Policy

[Figure: execution timeline (cycles) alternating "No adaptation" and "Sample & adapt" intervals. Labels: WT, Next WT.]

• BranchTap adapts across and within applications
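The slides name a "sample & adapt" phase but do not spell out the policy. Purely as an illustration of what such a step might look like, the sketch below tries the current threshold and its two neighbours during a sampling interval and keeps whichever performed best. The function `sample_and_adapt`, the `measure_ipc` callback, the neighbour-search strategy, and the bounds are all assumptions rather than the authors' algorithm.

```python
def sample_and_adapt(current, measure_ipc, min_t=0, max_t=16):
    """Return the threshold to use until the next sampling interval.

    `measure_ipc(t)` is a hypothetical callback that runs a short sampling
    window with threshold `t` and returns the observed performance.
    """
    # Listing `current` first means ties keep the current threshold.
    candidates = [t for t in (current, current - 1, current + 1)
                  if min_t <= t <= max_t]
    return max(candidates, key=measure_ipc)


def toy_ipc(threshold):
    """Toy performance curve that peaks at a threshold of 3 (illustrative only)."""
    return -abs(threshold - 3)


history = [1]
for _ in range(3):
    history.append(sample_and_adapt(history[-1], toy_ipc))
```

On the toy curve the threshold climbs from 1 to 3 and then stays there. A program phase with a different optimum would move it again at the next sampling interval, which is one way to read "adapts within applications".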

Results Overview

• Performance w/o Checkpoints
  – BranchTap improves even with just an ROB

• Performance w/ 4 Checkpoints
  – BranchTap improves over conventional recovery methods

• Performance w/ Larger Checkpoint Predictors
  – BranchTap offers better performance than a 64x larger predictor

Methodology

• Simulator based on SimpleScalar

• 24 SPEC CPU 2000 benchmarks

• Reference Inputs

• Processor configurations
  – 8-way OoO core
  – Up to 1K in-flight instructions
  – 1K-entry confidence table for low confidence branch identification

• 1B committed instructions after skipping 100B

“Perfect Checkpointing” Configuration

• A checkpoint is auto-magically taken at all mispredicted branches
  – All recoveries are fast

• We report the “deterioration relative to perfect checkpointing”
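The slides do not give a formula for this metric. One natural reading, assumed here, is the fractional performance lost against the perfect-checkpointing configuration, with the "penalty removed" figures being the relative drop in that loss. The function names and the IPC values below are illustrative placeholders, not measurements from the talk.

```python
def deterioration(ipc_perfect, ipc):
    """Fraction of performance lost relative to perfect checkpointing (assumed definition)."""
    return (ipc_perfect - ipc) / ipc_perfect


def penalty_removed(det_baseline, det_new):
    """Relative reduction in deterioration when moving from baseline to new."""
    return (det_baseline - det_new) / det_baseline


# Made-up IPC values, chosen only to show the arithmetic.
ipc_perfect, ipc_conventional, ipc_throttled = 2.0, 1.8, 1.9
det_conventional = deterioration(ipc_perfect, ipc_conventional)  # about 0.10
det_throttled = deterioration(ipc_perfect, ipc_throttled)        # about 0.05
removed = penalty_removed(det_conventional, det_throttled)       # about 0.50
```

Under this reading, a small absolute change in deterioration can still be a large relative reduction of the misprediction penalty.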

Performance with No Checkpoints

• Deterioration relative to “perfect checkpointing”

[Figure: bar chart. Y-axis: deterioration, 0% to 25%, with a "better" direction marker. X-axis: gzip, vpr, lucas, art, AVG. Series: Conventional, BranchTap Adaptive, BranchTap Non-Adaptive. Annotation: -39%.]

• BranchTap improves over conventional mechanisms

• Adaptation leads to robust performance improvements

Performance Evaluation with 4 Checkpoints

• Deterioration relative to “perfect checkpointing”

[Figure: bar chart. Y-axis: deterioration, 0% to 10%, with a "better" direction marker. X-axis: twolf, parser, lucas, mcf, bzip2, AVG. Series: Conventional, BranchTap Adaptive, BranchTap Non-Adaptive. Annotation: -28%.]

• BranchTap with 4 checkpoints is better than 6 checkpoints alone

BranchTap vs. Larger Checkpoint Predictors

• BranchTap with a 1K-entry confidence table and 4 GCs:
  – Higher performance than a 64K-entry confidence table with 4 GCs
  – Lower complexity, virtually comes “for free”

[Figure: chart. Y-axis: deterioration, 0.0% to 3.0%, with a "better" direction marker. X-axis: confidence table size (64, 256, 1K, 4K, 16K, 64K). BranchTap is marked on the chart.]

Summary

• Performance with 4 (no) checkpoints
  – ~28 (39) % of misprediction penalty removed
  – BranchTap is robust:
    • Up to 6 (13) % better and max 1.2 (0.1) % worse than conventional mechanisms

• BranchTap is very simple to implement
  – Few counters and comparators

• BranchTap is better than other alternatives
  – BT + 1K predictor better than a 64K predictor alone
  – BT + 4 GCs better than 6 GCs alone

{pakl, moshovos}@eecg.toronto.edu