Compressed Tag Architecture for Low-Power Embedded Cache Systems

Jong Wook Kwak and Young Tae Jeon, Journal of Systems Architecture, Volume 56, Issue 9, pp. 419-428, Sep. 2010. Presenter: Chun-Hung Lai. 111/06/23



Page 1: Compressed Tag Architecture for Low-Power Embedded Cache Systems

Jong Wook Kwak and Young Tae Jeon
Journal of Systems Architecture, Volume 56, Issue 9, pp. 419-428, Sep. 2010

Presenter: Chun-Hung Lai

112/04/20

Page 2: Abstract

Processors in embedded systems mostly employ cache architectures in order to alleviate the access latency gap between processors and memory systems. Caches in embedded systems usually occupy a major fraction of the implemented chip area. The power dissipation of the cache system thus constitutes a significant fraction of the power dissipated by the entire processor in embedded systems.

In this paper, we propose the compressed tag architecture to reduce the power dissipation of the tag store in cache systems. We introduce a new tag-matching mechanism by using a locality buffer and a tag compression technique. The main power reduction feature of our proposal is the use of small tag space matching instead of full tag matching, with modest additional hardware costs. The simulation results show that the proposed model provides a power and energy-delay product reduction of up to 27.8% and 26.5%, respectively, while still providing a comparable level of system performance to regular cache systems.

Page 3: What’s the Problem

- Cache power dissipation constitutes a major fraction of the power dissipated by an embedded processor
- The tag bits require a significant fraction of the cache area
  - However, conventional tag bits are unnecessarily large
- Goal: reduce power consumption in the tag store of the cache memory
- Propose a “Compressed Tag Architecture” cache
  - Use partial tag bits instead of full tag matching

Number of Tag Bits vs. Cache Hit Ratio

- Comparing 5~6 tag bits provides the same hit ratio as comparing the full tag
- Locality -> a process data set occupies a small fraction of the addr. range

[Figure: address space containing a process data set; full tags (0110011) and reduced tags (XXXX011)]
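The reduced-tag idea can be illustrated with a small sketch. The 7-bit tag values below are hypothetical, chosen only to mirror the XXXX011 pattern on the slide; `split_tag` and the other names are illustrative and do not come from the paper.

```python
def split_tag(tag: int, low_bits: int) -> tuple[int, int]:
    """Split a tag into (upper bits, lower bits)."""
    return tag >> low_bits, tag & ((1 << low_bits) - 1)

LOW_BITS = 3  # assumed width, matching the XXXX011 pattern on the slide

# Hypothetical 7-bit tags taken from one small region of the address space.
nearby_tags = [0b0110011, 0b0110100, 0b0110101]
uppers = {split_tag(t, LOW_BITS)[0] for t in nearby_tags}
lowers = [split_tag(t, LOW_BITS)[1] for t in nearby_tags]

# A hypothetical tag from a distant region that happens to share the low bits.
far_tag = 0b1010011
same_low = split_tag(far_tag, LOW_BITS)[1] == split_tag(nearby_tags[0], LOW_BITS)[1]
same_upper = split_tag(far_tag, LOW_BITS)[0] in uppers

print(uppers, lowers, same_low, same_upper)
```

With locality, all nearby tags share one upper-bit pattern, so the low bits alone tell them apart. The distant tag shows why the upper bits still have to be checked somewhere: comparing the low bits alone would report a false hit.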

Page 4: Related Works

[Diagram: related work on partial tags, deduplicated]

- Manage partial tag bit (upper address bits) information
- Partial tag comparison [9]
  - False hit
- Tag Overflow Buffering (TOB) [12]
  - Move the MSB tag bits from cache into an external register
  - Fragile when locality changes frequently
- Reduce HW cost for data value predictor
  - Partial tag resolution [11]
  - Fewer tag bits to uniquely identify each instruction
  - Focus on: reduce HW cost
- Energy-efficient cache architecture
  - Partial tag generation [7]
  - Small tag to enable only the data array of the matched way
- Address compression for on-chip address bus
  - Partial match address compression [13]
  - Upper bits of recently occurring addresses are saved for compression
- This paper: Compressed Tag architecture for low power cache
  - Analyze and solve the locality change problem
  - Solve the false hit problem

Page 5: Preface

- Address decomposition for compressed tag architecture
- Locality Buffer (LoB) and Locality Compressed Bit (LCB)

- TagH: role of locality detection
  - If programs exhibit locality, the TagH bits of the address are the same for successive requests
- TagH is saved in a separate register, the LoB
- LCB: index bits of the LoB
- Only the TagL field is checked on a tag comparison
  - If hit, use LCB to find TagH
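The address decomposition can be sketched as follows. The field widths are assumptions made for illustration (32-bit address, 4KB direct-mapped cache, 32-byte lines, 5-bit TagL); the slides do not state the line size or associativity, and `decompose` is a hypothetical helper.

```python
from typing import NamedTuple

OFFSET_BITS = 5  # assumed 32-byte cache line
INDEX_BITS = 7   # assumed 128 sets (4KB, direct-mapped)
TAGL_BITS = 5    # assumed number of low tag bits kept in the cache

class Fields(NamedTuple):
    tag_h: int   # upper tag bits, held in the LoB
    tag_l: int   # lower tag bits, compared in the cache
    index: int
    offset: int

def decompose(addr: int) -> Fields:
    """Split an address into TagH, TagL, index, and offset fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    tag_l = tag & ((1 << TAGL_BITS) - 1)
    tag_h = tag >> TAGL_BITS
    return Fields(tag_h, tag_l, index, offset)

a = decompose(0x12345678)
b = decompose(0x12346000)  # a nearby address
print(a, b)
```

Under these assumed widths the two nearby addresses differ in TagL but share the same TagH, which is the property the LoB relies on.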

Page 6: The Compressed Tag Architecture

- The shaded areas indicate the required cache modification
- LCB: Locality Compressed Bit

1. Index a cache set
2. Only TagL is checked
   - TagL miss: a cache miss
3. TagL hit: the LoB is accessed by the LCB bits
4. LoB hit: a final hit
5. LoB miss: the other LoB entries are checked during the cache line fill; if a match occurs, the corresponding LCB is changed to indicate the correct LoB entry
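A minimal sketch of the lookup path on this slide, assuming a direct-mapped cache for brevity. The `Line` class, the `lookup` function, and the returned strings are illustrative names, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Line:
    valid: bool = False
    tag_l: int = 0
    lcb: int = 0  # index of the LoB entry that holds this line's TagH

def lookup(cache: list[Line], lob: list[int], tag_h: int, tag_l: int, index: int) -> str:
    """Classify one access as 'hit', 'cache miss', or 'LoB miss'."""
    line = cache[index]                        # index a cache set
    if not line.valid or line.tag_l != tag_l:  # only TagL is compared
        return "cache miss"
    if lob[line.lcb] == tag_h:                 # LoB entry selected by the LCB
        return "hit"
    return "LoB miss"                          # resolved during the line fill

cache = [Line() for _ in range(4)]
cache[2] = Line(valid=True, tag_l=5, lcb=1)
lob = [10, 20, 0, 0]  # hypothetical TagH values

print(lookup(cache, lob, tag_h=20, tag_l=5, index=2))
print(lookup(cache, lob, tag_h=10, tag_l=5, index=2))
print(lookup(cache, lob, tag_h=20, tag_l=6, index=2))
```

The second call shows the case that a plain partial-tag cache would report as a false hit: TagL matches, but the LoB entry named by the LCB holds a different TagH.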

Page 7: If There are Still Misses in All LoB Entries

- Goal: to support these locality changes
- Add a Locality Miss Buffer (LoMB)
  - Save localities that are not included in the LoB (provide a second chance)
- Add hit counters for each entry of the LoB and LoMB
  - Increase when a match occurs
- Replacement mechanism for the locality buffer

LoMB: when all LoB entries miss, check all LoMB entries

1. If one of the LoMB entries hits:

   LoMB hit counter++;
   if (LoMB hit counter > threshold) {
       replace the LoB entry that has the smallest hit counter;
   } else {
       do not allocate in LoB or cache;
   }

2. If all LoMB entries miss: place into the LoMB

- Prevents short-lived locality changes from replacing LoB entries
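The replacement policy above can be sketched as a small model. Several details are assumptions because the slides do not specify them: the LoMB victim is chosen FIFO, a promoted entry keeps its hit counter, and invalidating cache lines whose LCB points to the replaced LoB entry is not modeled. The comparison uses `>` as in the pseudocode on this slide. `Entry` and `handle_tagl_miss` are hypothetical names.

```python
from dataclasses import dataclass

THRESHOLD = 1  # assumed; the evaluation configuration uses a threshold of one

@dataclass
class Entry:
    tag_h: int
    hits: int = 0

def handle_tagl_miss(lob: list[Entry], lomb: list[Entry], lomb_size: int, tag_h: int) -> str:
    """Update the LoB and LoMB after a TagL miss and report the action taken."""
    for e in lob:
        if e.tag_h == tag_h:          # LoB hit: count it and fill the cache
            e.hits += 1
            return "fill cache"
    for i, e in enumerate(lomb):
        if e.tag_h == tag_h:          # LoMB hit: second chance
            e.hits += 1
            if e.hits > THRESHOLD:
                # Replace the LoB entry with the smallest hit counter.
                victim = min(range(len(lob)), key=lambda j: lob[j].hits)
                lob[victim] = lomb.pop(i)
                return "promote and fill cache"
            return "no caching"
    if len(lomb) >= lomb_size:        # LoMB full: assumed FIFO replacement
        lomb.pop(0)
    lomb.append(Entry(tag_h))
    return "no caching"

lob = [Entry(1, hits=3), Entry(2, hits=0)]
lomb: list[Entry] = []
actions = [handle_tagl_miss(lob, lomb, 2, t) for t in (1, 9, 9, 9)]
print(actions, lob, lomb)
```

In this run the new locality (TagH 9) is first parked in the LoMB, is not cached on its first LoMB hit, and replaces the least-used LoB entry only after its counter exceeds the threshold.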

Page 8: Example of the Compressed Tag Cache Operation - 1

Scenario: TagL miss, hit in the locality buffer

1. TagL miss: a cache miss -> fetch from memory
2. LoB is accessed by LCB
   - LoB hit -> LoB hit counter++
3. Cache in L1 cache
   - The LCB bit of the hit LoB entry is updated in the cache

Note: a LoB hit includes

1. A hit in the entry indexed by LCB
2. A hit in one of the other entries

Page 9: Example of the Compressed Tag Cache Operation - 2

Scenario: TagL miss, miss in the locality buffer, hit in the locality miss buffer

1. TagL miss: a cache miss -> fetch from memory
2. LoB is accessed by LCB
   - LoB miss
3. Check all LoMB entries
   - LoMB hit -> LoMB hit counter++
4. If (LoMB hit counter >= threshold): locality buffer replacement
   - Replace LoB with LoMB
   - Cache in L1 cache (LCB bit is updated accordingly)

   If (LoMB hit counter < threshold)
   - Feed to CPU without caching (no caching)

Page 10: Example of the Compressed Tag Cache Operation - 3

Scenario: TagL miss, miss in the locality buffer, miss in the locality miss buffer

1. TagL miss: a cache miss -> fetch from memory
2. LoB is accessed by LCB
   - LoB miss
3. Check all LoMB entries
   - LoMB miss
4. Place into LoMB
   - If (LoMB available) insert; else select a candidate and replace
5. Feed to CPU without caching (no caching)

Page 11: Overall Operations for Each Compressed Tag Component

[Figure: overall operations of each compressed tag component, annotated with Example 1, Example 2, and Example 3]

Page 12: Number of “TagL” Bits vs. Miss Ratio (%)

- For a 4KB cache, the number of partial tag bits (TagL) varies from 0 to 8
- 1-6 tag bits (TagL) are enough to provide a miss ratio comparable to the full tag
  - On average, 6 bits are enough
- When using 5 bits, the increase in miss ratio is 0.83% on average
- Bold number: the miss ratio of the partial tag policy becomes the same as that of the full tag policy

Page 13: Number of LoB Entries vs. Miss Ratio (%)

- A four-entry LoB provides miss ratios comparable to the conventional policy
  - Within a 0.3% variation
- Decision: the proper number of LoB entries is four

Page 14: Number of LoMB Entries vs. Miss Ratio (%)

- A 2-entry LoMB provides miss ratios comparable to the conventional policy
  - Although 3 or 4 entries provide slightly better miss ratios
- The proper number of LoMB entries is two
- The following energy evaluation is made on this configuration:
  - 4-entry LoB, 2-entry LoMB, and hit counter with a threshold of one
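For a rough feel of what this configuration means for tag storage, the arithmetic below counts bits under assumed parameters: a 32-bit address, a 4KB direct-mapped cache with 32-byte lines, and a 5-bit TagL. None of these widths are given on the slides, hit-counter bits are ignored, and the result is only an illustration, not a figure from the paper.

```python
ADDRESS_BITS = 32        # assumed
CACHE_BYTES = 4 * 1024   # 4KB cache, as evaluated in the slides
LINE_BYTES = 32          # assumed
TAGL_BITS = 5            # assumed partial tag width
LOB_ENTRIES = 4          # from the chosen configuration
LOMB_ENTRIES = 2         # from the chosen configuration

lines = CACHE_BYTES // LINE_BYTES                # direct-mapped: one line per set
offset_bits = LINE_BYTES.bit_length() - 1        # log2 of the line size
index_bits = lines.bit_length() - 1              # log2 of the number of sets
full_tag_bits = ADDRESS_BITS - index_bits - offset_bits
tag_h_bits = full_tag_bits - TAGL_BITS
lcb_bits = (LOB_ENTRIES - 1).bit_length()        # bits needed to index the LoB

conventional_bits = lines * full_tag_bits
compressed_bits = (lines * (TAGL_BITS + lcb_bits)
                   + (LOB_ENTRIES + LOMB_ENTRIES) * tag_h_bits)

print(full_tag_bits, lcb_bits, conventional_bits, compressed_bits)
```

Under these assumptions each line stores 7 bits (TagL plus LCB) instead of a 20-bit tag, and the LoB and LoMB add a small fixed cost shared by the whole cache.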

Page 15: Cache Energy Saving

- For 4, 8, and 16KB caches, the energy saving is up to 27.8%, 17.2%, and 9.8%, respectively
- The proposed tag architecture provides more energy saving for small cache sizes
- For 4KB, 18% on average

Page 16: Conclusion

- This paper proposed an energy-efficient compressed tag architecture
- Exploit the memory access locality exhibited by programs
  - Small fraction of addr. range -> partial tag bits are enough
- Most of the tag bits are moved out of the cache into a Locality Buffer (LoB)
- Locality changes are solved by the LoMB and hit counters
- Results show that with the proposed scheme
  - The tag address bits are reduced to one-fourth of the original size
  - The energy saving is up to 27.8%
    - While still providing a performance level comparable to the conventional cache

Page 17: Comment for This Paper

- This paper is one of the innovations in tag organization in my research tree
- Reduce the number of tag bits during each tag comparison
  - Programmable active tag bits
  - TLB index-based tagging
  - Tag overflow buffering
  - Selective physical tag/virtual tag cache…
- Things that can be improved
  - The illustration of how the LoB access can be overlapped with the cache access is not clear
    - Why can the LoB be accessed while accessing the cache?
    - The LCB bits in the cache are needed to index a LoB entry
  - The related works on tag innovations used in processor caches are not sufficient