doppelgänger: a cache for approximate computing

Post on 29-Dec-2021

6 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Doppelgänger: A Cache for Approximate Computing

Joshua San Miguel

Jorge Albericio

Andreas Moshovos

Natalie Enright Jerger

Cache Hierarchy

2

main memory

processor core

private caches

shared last-level cache

Cache Hierarchy

3

processor core

private caches

main memory

shared last-level cache

Cache Hierarchy

4

processor core

private caches

main memory

shared last-level cache

Accessing memory is 10x – 100x greater latency and energy than accessing private cache!

Cache Hierarchy

5

processor core

private caches

main memory

shared last-level cache

Accessing memory is 10x – 100x greater latency and energy than accessing private cache!

Need hierarchy of large caches…

Cache Hierarchy

6

main memory

processor core

private caches

shared last-level cache

Cache Hierarchy

7

main memory

processor core

private caches

shared last-level cache

Cache Hierarchy

8

main memory

processor core

private caches

shared last-level cache

But last-level cache consumes substantial energy and takes up 30%-50% of chip area!

Cache Hierarchy

9

main memory

processor core

private caches

shared last-level cache

But last-level cache consumes substantial energy and takes up 30%-50% of chip area!

Higher efficiency via Approximate Computing…

Summary

10

Doppelgänger Cache: Identifies approximate similarity in data block values.

77% cache storage savings of approximable data.

Summary

11

Doppelgänger Cache: Identifies approximate similarity in data block values.

77% cache storage savings of approximable data.

Effectively compresses storage of approximately similar blocks. 3x better compression ratio than state-of-the-art techniques.

Summary

12

Doppelgänger Cache: Identifies approximate similarity in data block values.

77% cache storage savings of approximable data.

Effectively compresses storage of approximately similar blocks. 3x better compression ratio than state-of-the-art techniques.

Significantly reduces area and energy consumption. Reduces total on-chip cache area by 1.36x.

Outline

Approximate Computing

Approximate Similarity

Doppelgänger Cache

Cache Architecture

Similarity Mapping

Evaluation

13

Approximate Computing

Not all data/computations need to be precise.

14

http://www.zentut.com/

http://www.businessweek.com/

http://www.cc.gatech.edu/~cnieto6/

http://www.analyticbridge.com/

http://themusicparlour.blogspot.ca/

http://www.scientific-computing.com/

Data mining

Computer vision Audio and video processing

Gaming Machine learning Dynamical simulation

Approximate Similarity

15

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

Approximate Similarity

16

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

Approximate Similarity

17

1

2

3

92 131 183 91 132 186

90 131 185 93 133 184

35 31 29 43 38 37

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

Approximate Similarity

18

1

2

3

92 131 183 91 132 186

90 131 185 93 133 184

35 31 29 43 38 37

approximately

similar

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

Approximate Similarity

19

1

2

3

92 131 183 91 132 186

90 131 185 93 133 184

35 31 29 43 38 37

approximately

similar

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

Approximate Similarity

20

1

2

3

92 131 183 91 132 186

90 131 185 93 133 184

35 31 29 43 38 37

approximately

similar

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

Approximate Similarity

21

1

2

3

92 131 183 91 132 186

90 131 185 93 133 184

35 31 29 43 38 37

approximately

similar

Two data blocks are approximately similar (i.e., doppelgängers) if replacing the values of one with the other still results in acceptable application output in the end.

Allows for 77% cache storage savings of approximable data!

Outline

Approximate Computing

Approximate Similarity

Doppelgänger Cache

Cache Architecture

Similarity Mapping

Evaluation

22

Doppelgänger Cache

23

main memory

processor core

private caches

shared last-level cache

Doppelgänger Cache

24

main memory

processor core

private caches

shared last-level cache

How can we exploit approximate similarity to save area and energy in the last-level cache?

Doppelgänger Cache

25

main memory

processor core

private caches

shared last-level cache

Doppelgänger Cache

26

main memory

processor core

private caches

shared LLC precise LLC Doppelgänger LLC

Conventional Cache

27

tag array data array

address

from L2

data from

memory

Conventional Cache

28

tag array data array

address

from L2

data from

memory

One-to-one mapping of data values to memory locations.

Conventional Cache

29

tag array data array

address

from L2

data from

memory

One-to-one mapping of data values to memory locations.

But the fundamental goal of a processor is to process data values, not memory locations…

Conventional Cache

30

tag array data array

address

from L2

data from

memory

Conventional Cache

31

tag array data array

address

from L2

data from

memory

Conventional Cache

32

tag array data array

address

from L2

data from

memory

Conventional Cache

33

tag array data array

address

from L2

data from

memory

Conventional Cache

34

tag array data array

address

from L2

data from

memory

Multiple copies of approximately similar blocks.

Conventional Cache

35

tag array data array

address

from L2

data from

memory

Doppelgänger Cache

36

tag array

approximate data array

address

from L2

data from

memory

Doppelgänger Cache

37

tag array

approximate data array

address

from L2

data from

memory

Smaller data array allows for substantial area and energy savings.

Doppelgänger Cache

38

tag array

approximate data array

address

from L2

data from

memory

Doppelgänger Cache

39

tag 0 map X

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

address

from L2

data from

memory

Doppelgänger Cache - Lookups

40

tag 0 map X

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

Doppelgänger Cache - Lookups

41

tag 0 map X

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

address 0

from L2

Doppelgänger Cache - Lookups

42

tag 0 map X

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

address 0

from L2

map X from

tag array

Doppelgänger Cache - Lookups

43

tag 0 map X

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

address 0

from L2

data A to L2

map X from

tag array

Doppelgänger Cache - Insertions

44

tag 0 map X

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

Doppelgänger Cache - Insertions

45

tag 0 map X

tag 5

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

address 5

from L2

Doppelgänger Cache - Insertions

46

tag 0 map X

tag 5

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

address 5

from L2

data B from

memory

data B to L2

Doppelgänger Cache - Insertions

47

tag 0 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

tag array

approximate data array

address 5

from L2

data B from

memory generate map Y from data B

data B to L2

Doppelgänger Cache - Insertions

48

tag 0 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y

tag array

approximate data array

address 5

from L2

data B from

memory generate map Y from data B

data B to L2

Doppelgänger Cache - Insertions

49

tag 0 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y

tag array

approximate data array

address 5

from L2

data B from

memory generate map Y from data B

data B to L2

Miss!

Doppelgänger Cache - Insertions (Miss)

50

tag 0 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y

tag array

approximate data array

address 5

from L2

data B from

memory generate map Y from data B

data B to L2

Doppelgänger Cache - Insertions (Miss)

51

tag 0 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 5

from L2

data B from

memory generate map Y from data B

data B to L2

Doppelgänger Cache - Insertions

52

tag 0 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

Doppelgänger Cache - Insertions

53

tag 0 map X

tag 6

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6

from L2

Doppelgänger Cache - Insertions

54

tag 0 map X

tag 6

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6

from L2

data C from

memory

data C to L2

Doppelgänger Cache - Insertions

55

tag 0 map X

tag 6 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6

from L2

data C from

memory generate map X from data C

data C to L2

Doppelgänger Cache - Insertions

56

tag 0 map X

tag 6 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6

from L2

data C from

memory generate map X from data C

data C to L2

Doppelgänger Cache - Insertions

57

tag 0 map X

tag 6 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6

from L2

data C from

memory generate map X from data C

data C to L2

Hit!

Doppelgänger Cache - Insertions (Hit)

58

tag 0 map X

tag 6 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6

from L2

data C from

memory generate map X from data C

data C to L2

Doppelgänger Cache - Insertions (Hit)

59

tag 0 map X

tag 6 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6

from L2

data C from

memory generate map X from data C

data C to L2

Doppelgänger Cache - Insertions (Hit)

60

tag 0 map X

tag 6 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6

from L2

data C from

memory generate map X from data C

data C to L2 Data block A serves as an acceptable approximation of data block C.

Doppelgänger Cache - Insertions (Hit)

61

tag 0 map X

tag 6 map X

tag 5 map Y

tag 1 map X

tag 2 map X

tag 3 map X

map X data block A

map Y data block B

tag array

approximate data array

address 6

from L2

data C from

memory generate map X from data C

data C to L2

Doppelgänger Cache - Similarity Mapping

62

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.

Doppelgänger Cache - Similarity Mapping

63

data block A

A[0] A[1] A[n]

hash function

mapping

hash

map

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.

Doppelgänger Cache - Similarity Mapping

64

data block A

A[0] A[1] A[n]

mapping

hash

map

hash function Aggregates values in block:

hash = AVG(A*0+, …, A*n+)

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.

Doppelgänger Cache - Similarity Mapping

65

data block A

A[0] A[1] A[n]

hash function

hash

map

Discretizes hash value:

mapping

map

(M-bit)

All possible

hash values

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.

Doppelgänger Cache - Similarity Mapping

66

data block A

A[0] A[1] A[n]

hash function

hash

map

Discretizes hash value:

mapping

map

(M-bit)

All possible

hash values

approximately

similar

The map value represents the signature (or likeness) of a block. Blocks that generate the same map value are approximately similar.

Doppelgänger Cache

67

main memory

processor core

private caches

shared LLC precise LLC Doppelgänger LLC

uniDoppelgänger Cache

68

main memory

processor core

private caches

shared LLC uniDoppelgänger LLC

uniDoppelgänger Cache

69

main memory

processor core

private caches

shared LLC uniDoppelgänger LLC

Precise blocks simply use physical address as the map value.

70

More details in paper: Cache writes, replacements and coherence.

Details on hash functions and mapping.

Sensitivity to size of map space and data array.

Evaluation of uniDoppelgänger.

Doppelgänger Cache

Outline

Approximate Computing

Approximate Similarity

Doppelgänger Cache

Cache Architecture

Similarity Mapping

Evaluation

71

Evaluation

72

Applications: PARSEC and AxBench

Performance: Full-system cycle-level simulation

Error: Pin simulation

Area and Energy: CACTI

Configuration: 4 cores, private L1 and L2

2MB shared LLC (1MB precise, 1MB Doppelgänger)

Doppelgänger: 14-bit similarity map, 1/4 data array

73

Evaluation - Compression Ratio

0x

1x

2x

3x

4x

5x

6x

BΔI exact deduplication doppelganger doppelganger + BΔI

com

pre

ssio

n r

atio

bet

ter

74

Evaluation - Compression Ratio

0x

1x

2x

3x

4x

5x

6x

BΔI exact deduplication doppelganger doppelganger + BΔI

com

pre

ssio

n r

atio

bet

ter

75

Evaluation - Compression Ratio

0x

1x

2x

3x

4x

5x

6x

BΔI exact deduplication doppelganger doppelganger + BΔI

com

pre

ssio

n r

atio

bet

ter

76

Evaluation

0.8x

0.9x

1.0x

1.1x

1.2x

1.3x

1.4x

applicationoutput accuracy

applicationperformance

total cachedynamic energy

reduction

total cacheleakage energy

reduction

total cache areareduction

bet

ter

77

Evaluation

0.8x

0.9x

1.0x

1.1x

1.2x

1.3x

1.4x

applicationoutput accuracy

applicationperformance

total cachedynamic energy

reduction

total cacheleakage energy

reduction

total cache areareduction

bet

ter

Conclusion

78

Doppelgänger Cache: Identifies approximate similarity in data block values.

77% cache storage savings of approximable data.

Effectively compresses storage of approximately similar blocks. 3x better compression ratio than state-of-the-art techniques.

Significantly reduces area and energy consumption. Reduces total on-chip cache area by 1.36x.

Thank you

Doppelgänger: A Cache for Approximate Computing

Joshua San Miguel

Jorge Albericio

Andreas Moshovos

Natalie Enright Jerger

top related