Signature Buffer: Bridging Performance Gap between Registers and Caches

Post on 13-Jan-2016

DESCRIPTION

Signature Buffer: Bridging Performance Gap between Registers and Caches. Lu Peng, Jih-Kwon Peir, Konrad Lai. Introduction. Two types of storage. Registers: fast and small; supply data for operations. Memory: large and slow; cache for recently used data.

TRANSCRIPT

1

Signature Buffer: Bridging Performance Gap between Registers and Caches

Lu Peng, Jih-Kwon Peir, Konrad Lai

2

Introduction

Two types of storage
– Registers
  – Fast and small
  – Supply data for operations
– Memory
  – Large and slow
  – Cache for recently used data

Most RISC processors operate only on data from registers

Data communication path
– Producer -> store -> load -> consumer

3

Introduction

Future processors with 35nm technology
– 10 GHz clock
– 64 KB L1 cache
– 3-7 cycles L1 cache access time
– IPC degrades by 3.5% per additional cycle of L1 cache access time
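As a rough illustration of the last bullet, the arithmetic below assumes the 3.5% loss simply adds up linearly for each extra cycle, taking the 3-cycle end of the quoted range as the baseline. The linear model and the function name `ipc_loss` are assumptions made for this sketch; the slides only give the per-cycle figure.

```python
def ipc_loss(l1_cycles, baseline_cycles=3, loss_per_cycle=0.035):
    """Estimated fractional IPC loss relative to the baseline L1 latency.

    Assumption: the per-cycle loss accumulates linearly (not stated in the slides).
    """
    extra_cycles = max(0, l1_cycles - baseline_cycles)
    return extra_cycles * loss_per_cycle


if __name__ == "__main__":
    for cycles in range(3, 8):
        print(f"{cycles}-cycle L1: about {ipc_loss(cycles):.1%} IPC loss")
```

Under this reading, moving from a 3-cycle to a 7-cycle L1 costs roughly 14% of IPC.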

4

Signature Buffer

Zero-cycle load
– “The load and its dependent instructions can be fetched, dispatched and executed at the same time”

Avoid address calculation
– Each load and store uses a signature for accessing the storage

The signature buffer can be accessed in early pipeline stages

A signature consists of
– Color of the base register
– Displacement value
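A minimal sketch of how such a signature could key a buffer without computing an address. It assumes a color is a small integer that is replaced whenever the base register is rewritten; `Signature`, `ColorTable`, `make_signature`, and the dictionary standing in for the buffer are all hypothetical names for illustration, not the authors' design.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Signature:
    color: int         # identifies the current value of the base register (assumed integer)
    displacement: int  # immediate offset taken from the instruction encoding


class ColorTable:
    """Hypothetical allocator: a register gets a fresh color on every write."""

    def __init__(self):
        self._fresh = count()
        self._colors = {}

    def color_of(self, reg):
        if reg not in self._colors:
            self._colors[reg] = next(self._fresh)
        return self._colors[reg]

    def redefine(self, reg):
        # The base register was overwritten, so old signatures must stop matching.
        self._colors[reg] = next(self._fresh)


def make_signature(colors, base_reg, displacement):
    return Signature(colors.color_of(base_reg), displacement)


if __name__ == "__main__":
    colors, sb = ColorTable(), {}
    sb[make_signature(colors, "sp", 16)] = 42        # store to 16(sp)
    print(sb.get(make_signature(colors, "sp", 16)))  # load from 16(sp) finds the stored value
    colors.redefine("sp")                            # sp is rewritten
    print(sb.get(make_signature(colors, "sp", 16)))  # same text, new color: no match
```

The lookup needs only the register name and the displacement bits, which is why it can happen before the address is generated.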

5

Outline

Motivation

Implementation

Performance evaluation

6

Motivation – Memory Reference Correlations

Signature correlations
– Store-load and load-load pairs can be correlated directly by the signature

Signature reference locality
– Nearby memory references often use the same base register and differ only by a small displacement value
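The locality point can be sketched by splitting each signature into a line tag and an offset, so that references with the same base-register color and nearby displacements land in the same buffer line. The 32-byte line size and the names `sb_line` and `group_by_line` are assumptions for illustration; the slides do not give a line size.

```python
from collections import defaultdict

LINE_BYTES = 32  # assumed SB line size


def sb_line(color, displacement):
    """Split a signature into (SB line tag, byte offset within that line)."""
    return (color, displacement // LINE_BYTES), displacement % LINE_BYTES


def group_by_line(signatures):
    """Group (color, displacement) pairs by the SB line they would share."""
    lines = defaultdict(list)
    for color, displacement in signatures:
        tag, offset = sb_line(color, displacement)
        lines[tag].append(offset)
    return dict(lines)


if __name__ == "__main__":
    refs = [(7, 0), (7, 4), (7, 8), (7, 40), (9, 4)]
    for tag, offsets in group_by_line(refs).items():
        print(tag, offsets)
```

Three of the five example references share one line, so a single fill would serve all of them.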

7

Example 1

Source and assembly code of function copy_disjunct from Parser

Signature correlations

Signature reference locality

8

Example 2

Source and assembly code of function bsW from Bzip

9

Signature Buffer

10

Signature Buffer

[Figure: signature buffer, initial state]

11

Signature Buffer

[Figure: signature buffer state following the initial state; surviving labels: 2 -> 3, 32 -> 33]

12

Data Alignment

13

Data Alignment

SB Directory
SB tag  L1 tag  Valid  Bound

SB Data Array

L1 Tag Array
Tag

L1 Data Array

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

14

Data Alignment

SB Directory
SB tag  L1 tag  Valid  Bound
A       C       I-V    101

SB Data Array: 000011

L1 Tag Array
Tag
C
D

L1 Data Array: 101000

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

SB MISS!

15

Data Alignment

SB Directory
SB tag  L1 tag  Valid  Bound
A       C       V-V    101

SB Data Array: 101011, 000011

L1 Tag Array
Tag
C
D

L1 Data Array: 101000, 000000

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

SB MISS!

16

Data Alignment

SB Directory
SB tag  L1 tag  Valid  Bound
A       C       V-V    101
B       D       I-V    101

SB Data Array: 101011, 000011, 010100

L1 Tag Array
Tag
C
D

L1 Data Array: 101000, 101011, 000000

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

SB MISS!

17

Data Alignment

SB Directory
SB tag  L1 tag  Valid  Bound
A       C       I-V    101
B       D       I-I    101

SB Data Array: 101011, 000011, 010100

L1 Tag Array
Tag
C
D

L1 Data Array: 101000, 101011, 000000

Requests (Signature):    A-001 -> A-101 -> B-010 -> X-000
         (Real Address): C-100    D-000    D-101    D-000

SB MISS! Invalidate high A, low B
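The four-request walkthrough can be replayed with a small model. This is one interpretation of the slides, not the authors' implementation. It assumes that an SB line is aligned to the signature rather than to the L1 line, so it can straddle two consecutive L1 lines; that Bound is the first signature offset falling in the second L1 line; that the Valid pair is written high part first; and that a fill invalidates any other SB part holding the same real words. Letter tags with `next_line`/`prev_line` and all class and function names are hypothetical.

```python
from dataclasses import dataclass

LINE_WORDS = 8  # 3-bit offsets, as in the slides


def next_line(tag):
    """Hypothetical helper: L1 line tags are letters, and 'D' follows 'C'."""
    return chr(ord(tag) + 1)


def prev_line(tag):
    return chr(ord(tag) - 1)


@dataclass
class SBEntry:
    l1_tag: str               # L1 line backing the low part
    bound: int                # first signature offset in the next L1 line (8 = no straddle)
    low_valid: bool = False
    high_valid: bool = False

    def part_range(self, high):
        """Real words covered by one part: (L1 line, first offset, last offset)."""
        delta = LINE_WORDS - self.bound   # real offset of signature offset 0
        if high:
            return (next_line(self.l1_tag), 0, delta - 1)  # empty when delta == 0
        return (self.l1_tag, delta, LINE_WORDS - 1)

    def valid(self):
        """Valid bits in the assumed slide order: high part, then low part."""
        return f"{'V' if self.high_valid else 'I'}-{'V' if self.low_valid else 'I'}"


def overlaps(a, b):
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def access(sb, sig_tag, sig_off, real_line, real_off):
    """Return True on an SB hit. On a miss, fill the touched part and
    invalidate parts of other entries that hold the same real words."""
    delta = (real_off - sig_off) % LINE_WORDS
    bound = LINE_WORDS - delta
    high = sig_off >= bound
    low_line = prev_line(real_line) if high else real_line

    entry = sb.get(sig_tag)
    if entry is None or (entry.l1_tag, entry.bound) != (low_line, bound):
        entry = sb[sig_tag] = SBEntry(low_line, bound)   # (re)allocate the line
    elif (entry.high_valid if high else entry.low_valid):
        return True

    if high:
        entry.high_valid = True
    else:
        entry.low_valid = True
    filled = entry.part_range(high)
    for other in sb.values():
        if other is entry:
            continue
        if overlaps(filled, other.part_range(False)):
            other.low_valid = False
        if overlaps(filled, other.part_range(True)):
            other.high_valid = False
    return False


if __name__ == "__main__":
    sb = {}
    requests = [("A", 0b001, "C", 0b100), ("A", 0b101, "D", 0b000),
                ("B", 0b010, "D", 0b101), ("X", 0b000, "D", 0b000)]
    for req in requests:
        hit = access(sb, *req)
        state = {tag: e.valid() for tag, e in sb.items() if tag != "X"}
        print(req[0], "hit" if hit else "miss", state)
```

Under these assumptions all four requests miss, and the valid bits of entries A and B follow the sequence shown in the slides: A becomes I-V, then V-V; B is allocated as I-V; the final request to X leaves A at I-V and B at I-I. The entry for X is not shown in the slides, so its handling here is a guess.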

18

Microarchitecture

Bypass I
– SB hit or an early store-load forwarding

Bypass II
– Normal store-load forwarding

19

Microarchitecture

20

Performance Evaluation

21

Performance Evaluation – IPC

SB – nospec: 13% speedup

SB – perfect: 14% speedup

22

Performance Evaluation – Load Distribution

Normal S-L forwarding and L1 accesses are reduced to 30% of loads; 70% of loads benefit from the SB

SB with a perfect memory dependence predictor obtains 23% zero-cycle loads

23

Performance Evaluation – SB Hit Ratio

Average SB hit rate is about 51%

24

Performance Evaluation – Comparison with L0 Cache

The performance benefit of the SB grows with L1 latency and is always above that of an L0 cache

25

Performance Evaluation – Comparison with L0 Cache

Larger L0 => higher hit rate

SB is less sensitive to size

26

Advantages

Non-speculative
– Data obtained from the SB without intervening stores is always correct

All loads can access data from the SB without any restriction on the type of the load or the base register.

Loads through the SB can bypass address generation and cache access completely.

Store/load correlation is established from the instruction encoding bits, which simplifies the hardware requirements.

SB uses line-based granularity to capture spatial locality.

27

Questions?

28

Loads – SB Specific

Early S-L forwarding
– A load has the same signature as an earlier store in the LSQ, with no intervening store in between (zero-cycle load & SB hit)

Early SB access
– SB is accessed after a load is fetched and decoded (zero-cycle load & SB hit)

Delayed SB access
– SB is accessed after memory dependence resolution because of intervening stores (SB hit)

Non-signature forwarding
– Consecutive SB misses to the same SB line get forwarded data from previous misses (SB miss)
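The four categories can be read as a decision procedure. The sketch below is one possible ordering of the checks, with a single `intervening_store` flag standing in for both "a store between the matching store and the load" and "unresolved stores ahead of the load". The flag names, the check order, and the fallback `NORMAL` category are assumptions, not taken from the slides.

```python
from enum import Enum


class LoadPath(Enum):
    EARLY_SL_FORWARDING = "early S-L forwarding"
    EARLY_SB_ACCESS = "early SB access"
    DELAYED_SB_ACCESS = "delayed SB access"
    NON_SIGNATURE_FORWARDING = "non-signature forwarding"
    NORMAL = "normal S-L forwarding or L1 access"  # assumed fallback


def classify_load(matching_store_in_lsq, intervening_store, sb_hit, earlier_miss_same_line):
    """Pick the path a load takes, given four hypothetical boolean conditions."""
    if matching_store_in_lsq and not intervening_store:
        return LoadPath.EARLY_SL_FORWARDING
    if sb_hit:
        # An intervening store forces the SB access to wait for dependence resolution.
        return LoadPath.DELAYED_SB_ACCESS if intervening_store else LoadPath.EARLY_SB_ACCESS
    if earlier_miss_same_line:
        return LoadPath.NON_SIGNATURE_FORWARDING
    return LoadPath.NORMAL


def is_zero_cycle(path):
    """Only the two early paths are marked zero-cycle in the slides."""
    return path in (LoadPath.EARLY_SL_FORWARDING, LoadPath.EARLY_SB_ACCESS)


if __name__ == "__main__":
    path = classify_load(matching_store_in_lsq=False, intervening_store=True,
                         sb_hit=True, earlier_miss_same_line=False)
    print(path.value, "| zero-cycle:", is_zero_cycle(path))
```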
