MOSAIC: The Case for a Scalable Coherence Protocol for Complex On-Chip Cache Hierarchies in Many-Core Systems
Lucía G. Menezo, Valentín Puente, José Ángel Gregorio
University of Cantabria (Spain)



Page 2: Outline
Motivation
Directory Schemas
  ◦ In-cache
  ◦ Sparse
MOSAIC Coherence Protocol
  ◦ Examples
Evaluation Results
Conclusions
University of Cantabria / Edinburgh - PACT 2013

Page 3: Motivation
Performance improvement: more processors per chip
Major challenges: off-chip bandwidth wall
  ◦ Introduce cache into the chip
  ◦ Complex on-chip cache hierarchies
Coherence protocol: fundamental role to play

Page 4: Motivation
Which coherence protocol to use with a large number of cores?
  ◦ Broadcast-based protocols: high energy requirements
  ◦ Directory-based protocols: more storage needed for sharing information
MOSAIC: a new coherence protocol
  ◦ Directory without inclusiveness
  ◦ Token Coherence to guarantee correctness


Page 6: Directory schemas: In-cache
Each block in the LLC includes the tag, the data, and the sharers information
The LLC receives the requests, so it needs precise knowledge of the sharers
Inclusiveness is necessary: any block in the private levels must also be allocated in the LLC
Advantage: the coherence protocol is less complex
Disadvantage: every LLC block carries the storage overhead
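
The storage cost of the in-cache scheme can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes a full-map sharers vector (one presence bit per core per LLC block), which the slides do not specify, and the function name is hypothetical. The 16MB LLC, 64B block, and 8-core figures come from Config 1 of the evaluation methodology.

```python
def in_cache_directory_overhead_bytes(llc_bytes: int, block_bytes: int, num_cores: int) -> int:
    """Bytes of sharers information when every LLC block carries a full-map bit vector.

    Assumption (not from the slides): one presence bit per core per LLC block.
    """
    num_blocks = llc_bytes // block_bytes
    total_bits = num_blocks * num_cores
    return (total_bits + 7) // 8  # round up to whole bytes


MB = 1024 * 1024
overhead = in_cache_directory_overhead_bytes(16 * MB, 64, 8)
print(overhead // 1024, "KB of sharers bits")  # 256 KB of sharers bits
```

Under this assumption the overhead grows with both the LLC size and the core count, which is the cost flagged as the disadvantage of the in-cache scheme.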

Page 7: Directory schemas: In-cache
[Figure: processors (P) with private caches (@, data) connected through the interconnection network to the LLC with an in-cache directory. Each LLC entry holds @, data, and sharers; the sharers field is labeled "Overhead!!!".]

Page 8: Directory schemas: In-cache
[Figure: processors and private caches (@, data) connected through the interconnection network to the LLC with an in-cache directory. The sharers fields attached to the LLC entries are each labeled "Overhead!!!".]

Page 9: Directory schemas: Sparse
Directory entries are separated from the data
Entries are allocated on demand
Overhead is proportional to the aggregate size of the private levels (not the LLC)
Capacity and associativity have to be sufficient to keep the private-level cache tags

Page 10: Directory schemas: Sparse
[Figure: processors and private caches (@, data) connected through the interconnection network to the LLC (@, data) and to a separate sparse directory (@, sharers).]

Page 11: Directory schemas: Sparse
Duplicate-tag directory: holds all the tags of the private levels
Associativity = # cores * private cache associativity
# sets = # private cache sets
Example: 16 cores with 4-way 32KB L1: 64-way directory
[Figure: directory array drawn as a grid of tag entries.]
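
The geometry rule for a duplicate-tag directory can be checked numerically. This is a minimal sketch with a hypothetical helper name; the 64B block size is borrowed from the evaluation methodology and is an assumption for this slide's example, which does not state it.

```python
def duplicate_tag_geometry(num_cores: int, cache_bytes: int, ways: int, block_bytes: int):
    """Return (sets, associativity) of a duplicate-tag directory mirroring one private cache per core."""
    sets = cache_bytes // (ways * block_bytes)  # same number of sets as one private cache
    associativity = num_cores * ways            # one directory way per private-cache way, per core
    return sets, associativity


# 16 cores, 4-way 32KB L1 (from the slide); 64B blocks (assumed).
sets, assoc = duplicate_tag_geometry(num_cores=16, cache_bytes=32 * 1024, ways=4, block_bytes=64)
print(sets, assoc)  # 128 64
```

The 64-way result matches the slide's example and shows why the required associativity becomes impractical as the core count grows.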

Page 12: Directory schemas: Sparse
Decrease associativity: now << # cores * private cache associativity
Increase number of sets
One tag may be in various private caches, so each entry pairs a tag with a sharers field
More than 1 tag per entry: conflicts
Inclusiveness needed: invalidate private data (recall messages)
[Figure: directory array with fewer ways and more sets; every tag entry has a sharers field.]
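
To illustrate why a conventional sparse directory with reduced associativity generates recalls, the toy model below evicts an entry on a set conflict and reports which private copies would have to be invalidated to preserve inclusiveness. It is a sketch, not the authors' implementation: the class name, the LRU replacement policy, and the modulo set indexing are all assumptions.

```python
from collections import OrderedDict


class SparseDirectory:
    """Toy set-associative sparse directory that stays inclusive of the private caches."""

    def __init__(self, num_sets: int, ways: int):
        self.num_sets = num_sets
        self.ways = ways
        # One OrderedDict per set: block address -> set of sharer ids, kept in LRU order.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def add_sharer(self, block_addr: int, core_id: int):
        """Record core_id as a sharer; return the (victim_addr, sharers) recalls to send."""
        entries = self.sets[block_addr % self.num_sets]
        recalls = []
        if block_addr not in entries and len(entries) >= self.ways:
            # Set conflict: evict the LRU entry and recall its private copies.
            victim_addr, victim_sharers = entries.popitem(last=False)
            recalls.append((victim_addr, sorted(victim_sharers)))
        entries.setdefault(block_addr, set()).add(core_id)
        entries.move_to_end(block_addr)
        return recalls


directory = SparseDirectory(num_sets=4, ways=2)
directory.add_sharer(0, core_id=0)
directory.add_sharer(4, core_id=1)
print(directory.add_sharer(8, core_id=2))  # [(0, [0])]: block 0 must be recalled from core 0
```

Blocks 0, 4, and 8 map to the same set here, so the third allocation forces a recall even though the private caches themselves had no conflict.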


Page 14: MOSAIC Protocol
In-cache or sparse: it doesn't matter
No inclusiveness
No invalidations of data in private caches
Reconstruction of sharing information on demand
Uses token counting to avoid extra traffic and guarantee correctness
Token Coherence protocol:
  ◦ Initially, each block has # tokens == # procs
  ◦ Read request: data and 1 token
  ◦ Write request: data and all tokens
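
The token rules listed above can be expressed as a small bookkeeping model. This is a minimal sketch of token counting in general, not of MOSAIC's controllers: the class and method names are hypothetical, and placing all tokens at memory initially is an assumption.

```python
class TokenBlock:
    """Token bookkeeping for one memory block (toy model)."""

    def __init__(self, num_procs: int):
        self.total = num_procs
        # holder -> tokens held; assumption: memory starts with every token.
        self.tokens = {"memory": num_procs}

    def read(self, requester: str, supplier: str) -> None:
        """A read moves the data plus one token from supplier to requester."""
        self._move(supplier, requester, 1)

    def write(self, requester: str) -> None:
        """A write gathers every token at the requester."""
        for holder in list(self.tokens):
            if holder != requester:
                self._move(holder, requester, self.tokens[holder])

    def can_read(self, holder: str) -> bool:
        return self.tokens.get(holder, 0) >= 1

    def can_write(self, holder: str) -> bool:
        return self.tokens.get(holder, 0) == self.total

    def _move(self, src: str, dst: str, count: int) -> None:
        if self.tokens.get(src, 0) < count:
            raise ValueError(f"{src} does not hold {count} token(s)")
        self.tokens[src] -= count
        if self.tokens[src] == 0:
            del self.tokens[src]
        self.tokens[dst] = self.tokens.get(dst, 0) + count


block = TokenBlock(num_procs=4)
block.read("P1", supplier="memory")
block.read("P2", supplier="memory")
block.write("P0")
print(block.tokens)  # {'P0': 4}
```

Because tokens are only moved, never created or destroyed, the sum over all holders always equals the number of processors, which is the invariant that guarantees a writer is alone.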

Page 15: MOSAIC Conceptual Approach
Private caches (State / Num. tokens / Data):
  P0: I / 0 / N/A
  P1: O / 2 / DATA
  P2: S / 1 / DATA
Last Level Cache: Dir_slice (sharers) and Data_slice (I / 0 / N/A)
[Figure: the private caches, the LLC slices, and the memory controller are connected by the on-chip network; the steps of a request are numbered 1 to 5.]

Page 16: MOSAIC Key Facts
When data is not present in the LLC: broadcast for reconstruction
Private caches report the number of tokens they hold
Token counting avoids negative acknowledgements or timeouts
The reconstruction message piggybacks the type of request and the requestor
Key: the directory may replace entries silently, with no invalidations
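
One way to read the key facts above is that the directory knows a reconstruction is complete once the reported tokens add up to the total, so silent caches never need to send a negative acknowledgement. The sketch below encodes that interpretation; it is an assumption drawn from the bullets, and the function name, the reply tuple layout, and the `tokens_at_llc` parameter are hypothetical.

```python
def reconstruct_entry(responses, total_tokens, tokens_at_llc=0):
    """Rebuild a directory entry from private-cache replies to a reconstruction broadcast.

    responses: iterable of (cache_id, tokens_held, is_owner), sent only by caches holding tokens.
    Returns (sharers, owner, complete).
    """
    sharers, owner, counted = [], None, tokens_at_llc
    for cache_id, tokens_held, is_owner in responses:
        if tokens_held <= 0:
            continue  # caches without tokens have nothing to report
        sharers.append(cache_id)
        counted += tokens_held
        if is_owner:
            owner = cache_id
        if counted == total_tokens:
            break  # every token is accounted for: no need to wait for more replies
    return sharers, owner, counted == total_tokens


# Values mirror the labels on the read-request slide: 1 token, 2 tokens (owner), 1 token.
replies = [("P2", 1, False), ("P1", 2, True), ("P0", 1, False)]
print(reconstruct_entry(replies, total_tokens=4))  # (['P2', 'P1', 'P0'], 'P1', True)
```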

Page 17: MOSAIC Read Request
[Sequence diagram between P0, P1, P2, P3, Dir, and LLC. States shown: Invalid, IS, S, O, C, A.]
Messages shown: Read; Data + token; Reconstruction; Info 1 token; Info 2 tokens, Owner; Unblock (info 1 token); Read; Forward GETS to Owner; Data + token; Unblock
Token counts shown: 3 tokens, 1 token
Directory contents as the replies arrive:
  Sharers [P2], Owner: ?
  Sharers [P2, P1], Owner: P1
  Sharers [P2, P1, P0], Owner: P1
  Sharers [P2, P1, P0, P3], Owner: P1 (after the final Unblock)

Page 18: MOSAIC Write Request
[Sequence diagram between P0, P1, P2, P3, Dir, and LLC. States shown: Invalid, IS, S, O, C, A, IM, M.]
Messages shown: Write; Reconstruction; Data + 3 tokens; 1 token; Unblock (info all tokens); Directory Eviction
Token counts shown: 3 tokens, 1 token
Directory contents: Sharers [P0], Owner: P0


Page 20: Evaluation methodology
                                   Config 1        Config 2
Number of cores                    8 @3GHz         16 @3GHz
IWin size / Issue width            128, 4-way
Block size                         64B
Private L1 size / associativity    32KB I/D, 2-way
Private L2 size / associativity    64KB, 4-way (exclusive with L1)
Shared L3 size / associativity     16MB, 16-way    32MB, 16-way
NUCA mapping                       Static, interleaved across slices
Memory capacity                    4GB
Max. outstanding mem. operations   16
Topology                           4×4 Mesh        6×6 Mesh
[Figure: floorplans of both configurations, with routers (R) forming the mesh. Config 1 shows Core 0 to Core 7 and Slice 0 to Slice 15; Config 2 shows Core 0 to Core 15 and Slice 0 to Slice 31.]
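
A static, interleaved NUCA mapping can be as simple as taking the block address modulo the number of slices. The sketch below shows that idea only; the slides do not give the interleaving granularity or the address bits used, so block-granularity interleaving and the function name are assumptions.

```python
def home_slice(phys_addr: int, num_slices: int, block_bytes: int = 64) -> int:
    """L3 slice index under block-granularity static interleaving (assumed policy)."""
    block_addr = phys_addr // block_bytes  # drop the offset within the block
    return block_addr % num_slices


# Consecutive 64B blocks land on consecutive slices of the 16-slice configuration.
print([home_slice(addr, num_slices=16) for addr in (0, 64, 128, 64 * 16)])  # [0, 1, 2, 0]
```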

Page 21: Simulation stack and Workloads
GEMS: full-system evaluation
  ◦ SLICC: Specification Language for Implementing Cache Coherence
Multithreaded workloads
  ◦ 4 Wisconsin Commercial Workloads
  ◦ 3 NAS Parallel Benchmarks
Multiprogrammed workloads
  ◦ 3 SPEC 2006 (rate mode)

Page 22: MOSAIC Performance: reducing associativity
[Figure: normalized execution time (axis 0.5 to 1.1) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus, and Gmean. Series: 64w128KB, 32w128KB, 2w128KB, 1w128KB.]
128KB = 16K entries (8 bytes per entry)

Page 23: Number of misses
[Figure: normalized number of misses (axis 0 to 2), stacked as L1D, L1I, and L2 misses, for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, and Zeus. Each workload compares BASE and MOSAIC at directory associativities 64, 32, 2, and 1; one annotation reads "x2".]

Page 24: MOSAIC Performance: reducing associativity and capacity
[Figure: normalized execution time (axis 0.4 to 1.1) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus, and Gmean. Series: 64w16KB, 32w16KB, 2w16KB, 1w16KB.]
128KB = 16K entries (8 bytes per entry); 16KB = 2K entries
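
The directory sizes quoted on the performance slides follow from the 8-byte entry size, and they can be compared with the number of blocks the private caches can hold. The sketch below does that arithmetic; reading "32KB I/D" as 32KB instruction plus 32KB data per core is an assumption, and the helper names are hypothetical.

```python
KB = 1024
ENTRY_BYTES = 8   # from the slides: 8 bytes per directory entry
BLOCK_BYTES = 64  # block size from the evaluation methodology


def directory_entries(dir_bytes: int) -> int:
    """Number of entries that fit in a directory of the given capacity."""
    return dir_bytes // ENTRY_BYTES


def private_blocks(num_cores: int, private_bytes_per_core: int) -> int:
    """Number of blocks the private caches of all cores can hold in total."""
    return num_cores * private_bytes_per_core // BLOCK_BYTES


# Assumption: 32KB L1I + 32KB L1D + 64KB exclusive L2 per core (Config 1, 8 cores).
per_core = 32 * KB + 32 * KB + 64 * KB

print(directory_entries(128 * KB))   # 16384 entries (16K)
print(directory_entries(16 * KB))    # 2048 entries (2K)
print(private_blocks(8, per_core))   # 16384 private blocks
```

Under these assumptions the 128KB directory has one entry per private block, while the 16KB directory can track only one eighth of them.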

Page 25: MOSAIC Latency
[Figure: latency in processor cycles (axis 0 to 12), stacked as Local L1, Private L2, Other L1, Other L2, and L3, for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, and Zeus. Each workload compares BASE and MOSAIC at directory associativities 64, 32, 2, and 1.]
16KB = 2K entries

Page 26: MOSAIC Link Utilization
[Figure: average network link utilization (axis 0 to 1.4) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus, and Gmean. Series: 64w128KB, 64w64KB, 64w32KB, 64w8KB, 2w128KB, 2w64KB, 2w16KB.]

Page 27: MOSAIC Link Utilization vs. Dir
[Figure: normalized network link utilization (axis 0 to 1.4) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus, and Gmean. Series: 2w128KB, 2w64KB, 2w16KB; one annotation reads "40%!!".]

Page 28: MOSAIC Scalability
16-core configuration
[Figure: normalized link utilization (axis 0 to 2) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus, and Gmean. Series: 128w256KB, 128w128KB, 128w64KB, 128w32KB, 2w256KB, 2w128KB, 2w64KB, 2w32KB.]

Page 29: Conclusions
MOSAIC Coherence Protocol: the bandwidth scalability of a directory plus the elegance of Token Coherence
Low complexity and great scalability
Very low storage overhead
No noticeable energy cost
An alternative for future many-core cache-coherent CMPs

Page 30: Thank you for your attention


Page 32: Realistic Cache Configuration
L1: 4-way 32KB / L2: 8-way 256KB
[Figure: normalized execution time (axis 0 to 1.2) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus, and Gmean. Series: 16w512KB, 16w256KB, 16w128KB, 16w64KB, 16w32KB; annotations read "x2 full dir" and "1/10 full dir".]
Same experiment with BASE: 20% impact in some cases

Page 33: MOSAIC Energy
[Figure: normalized dynamic energy (axis 0 to 1.8), stacked as L1, L2, L3, sparse directory, and network, for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, and Zeus. Each workload compares BASE and MOSAIC at directory sizes labeled 128, 64, and 16.]