
RESEARCH PAPER on

CACHE MEMORY

AUTHORS:-

Ankur Sharma*

KIRTI AZAD*

JANPREET SINGH JOLLY*.

*CSE B-TECH 4th YEAR, DRONACHARYA COLLEGE OF ENGINEERING.

ABSTRACT

This research paper investigates cache memory and its various aspects. The opening part of the paper introduces the term cache. The paper then covers the principles of locality on which cache operation is based. Next, the cache organization is explained in detail with the help of appropriate diagrams for easy understanding. The write policies implemented by the cache are also examined: should write-through or write-back be adopted? The differences between write-through and write-back are presented to make their advantages and disadvantages clear, and all possible combinations of policies for write hits and write misses are shown. The later part of the paper covers the replacement policies that can be used to evict a block from the cache when a new block is brought in. Thus, after going through this paper, one will have a good understanding of the cache and its importance.

CONTENTS

1. INTRODUCTION
2. BASIC NOTATIONS
3. LEVELS OF MEMORY HIERARCHY
4. CACHE PRINCIPLES
5. CACHE ORGANISATION
   5.1. Direct Mapping
   5.2. Fully Associative Mapping
   5.3. Set-Associative Mapping
6. READ POLICIES
7. CACHE WRITE POLICIES
8. CACHE REPLACEMENT POLICY
   8.1. Replacement Algorithms
        8.1.1. Least Recently Used (LRU) Replacement Policy
        8.1.2. First In First Out (FIFO) Replacement Policy
        8.1.3. Random Replacement Policy
9. CONCLUSION
10. REFERENCES

1. INTRODUCTION

Cache memory is Static Random Access Memory (SRAM) used by the microprocessor to increase the average speed of interaction with the main memory of the computer (random access memory, or RAM). A cache is a buffer memory placed between the CPU and main memory that temporarily holds copies of the portions of main memory that are currently in use. It is one of the upper levels of the memory hierarchy. The cache memory is 3-4 times faster than the main memory (DRAM) [1]. The cache has a small capacity, is very fast, and keeps copies of frequently used code and data. When the CPU needs to access memory to read or write data, it first checks whether a copy of it is in the cache. If a copy is found, the processor performs the operation on the cache with very low latency, increasing overall performance. A cache greatly decreases the main memory traffic for each processor, since most memory references are handled in the cache. Moreover, the cache consumes much less energy than the main memory.

Figure 1: Cache memory placement [2]

2. BASIC NOTATIONS

Cache block (or cache line): The basic unit of cache storage that is transferred between the main memory and the cache. A block consists of one or more physical words accessed from the main memory.

Cache hit: Processor references that are found in the cache.

Cache miss: Processor references that are not found in the cache.

Physical word: Basic unit of access in the memory.

Write hit: When the CPU requests a write operation to a memory location, the cache is searched for the old data. If the old data is found, it is termed a write hit.

Write miss: When the CPU requests a write operation to a memory location, the cache is searched for the old data. If the old data is not found, it is termed a write miss.

Miss rate: Fraction of memory references not found in cache (misses/references).

Hit time: Time to deliver a line in the cache to the processor (includes time to determine whether the line is in the cache).

Miss penalty: Additional time required because of a miss.
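These quantities are commonly combined into the average memory access time, AMAT = hit time + (miss rate x miss penalty). The small sketch below is not part of the original paper; the relation is the standard one from the architecture literature and the numbers are purely illustrative assumptions.

```c
/* Illustrative only: average memory access time (AMAT) computed from the
 * quantities defined above. The example numbers are assumptions, not
 * measurements from this paper. */
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;   /* cycles to deliver a line from the cache */
    double miss_rate    = 0.05;  /* misses / references                     */
    double miss_penalty = 100.0; /* extra cycles needed on a miss           */

    /* AMAT = hit time + miss rate * miss penalty */
    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.2f cycles\n", amat);  /* 1 + 0.05 * 100 = 6 cycles */
    return 0;
}
```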

3. LEVELS OF MEMORY HIERARCHY

Figure 2: Memory Hierarchy

Figure 2 shows that as we move down the memory hierarchy, memories tend to have:

1. Lower cost per bit
2. Higher capacity
3. Increased latency
4. Lower throughput

Thus, among these memory types, the cache is the fastest and the smallest. However, these features come with the drawback that cache is more expensive per bit than the other memories.

4. CACHE PRINCIPLES

Cache works on the basis of the locality of program behavior [3]. There are three principles involved:

1. Spatial Locality – If an access is made to a certain memory location, it is very likely that nearby locations will also be referenced in the near future during the program's lifetime.

2. Temporal Locality – If a sequence of references is made to n locations, there is a high probability that references following this sequence will fall within the same sequence; that is, elements of the sequence will be referenced again during the lifetime of the program.

3. Sequentiality – Given that a reference has been made to a particular location s, it is likely that within the next several references a reference to location s+1 will be made. Sequentiality is a restricted type of spatial locality and can be regarded as a subset of it.
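As a simple illustration (the code below is not from the paper; the array name and size are arbitrary), the loop touches consecutive array elements, which exhibits spatial locality and sequentiality, while reusing the same accumulator on every iteration, which exhibits temporal locality:

```c
/* Illustrative sketch of locality in ordinary code (not from the paper). */
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];   /* zero-initialized array */
    long sum = 0;

    /* Consecutive elements a[i], a[i+1], ... lie in the same or adjacent
     * cache blocks: spatial locality and sequentiality.                  */
    for (int i = 0; i < N; i++) {
        /* 'sum' and 'i' are reused on every iteration: temporal locality. */
        sum += a[i];
    }
    printf("%ld\n", sum);
    return 0;
}
```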

5. CACHE ORGANISATION

For mapping the main memory to the cache, there are three basic types of organization:

a. Direct Mapped
b. Fully Associative
c. Set Associative

Consider a cache of 4096 (4K) words with a block size of 32 words. Therefore, the cache is organized as 128 blocks.

For 4K words, 12 address bits are required. To select one of the 128 blocks we need 7 address bits, and to select one word out of 32 words we need 5 address bits.

Let us consider a main memory system consisting of 64K words. Since the block size of the cache is 32 words, the main memory is also organized in blocks of 32 words. Therefore, the total number of blocks in main memory is 2048 (2K blocks x 32 words = 64K words).

To identify any one block of 2K blocks, we need 11 address lines and to select a word from a block 5 address lines are required.
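The field widths used in this running example can be checked with a short calculation. The sketch below merely recomputes them from the sizes given above and is an illustration, not part of the original paper:

```c
/* Recompute the address field widths of the running example
 * (64K-word main memory, 4K-word cache, 32-word blocks). Illustrative only. */
#include <stdio.h>

/* Number of bits needed to address 'n' items (n is a power of two). */
static unsigned log2u(unsigned n) {
    unsigned bits = 0;
    while (n > 1) { n >>= 1; bits++; }
    return bits;
}

int main(void) {
    unsigned mem_words   = 64 * 1024;   /* 64K words -> 16 address bits */
    unsigned cache_words = 4 * 1024;    /* 4K words  -> 12 address bits */
    unsigned block_words = 32;          /* 32 words  ->  5 address bits */

    unsigned word_bits        = log2u(block_words);               /* 5               */
    unsigned cache_block_bits = log2u(cache_words / block_words); /* 7  (128 blocks) */
    unsigned mem_block_bits   = log2u(mem_words / block_words);   /* 11 (2K blocks)  */

    printf("word field: %u bits, cache block field: %u bits, memory block field: %u bits\n",
           word_bits, cache_block_bits, mem_block_bits);
    return 0;
}
```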

5.1. Direct Mapping

In the direct mapping technique, block k of main memory maps into block k modulo m of the cache, where m is the total number of blocks in the cache. In this example, the value of m is 128. Thus, a particular block of main memory can be transferred only to the particular cache block determined by the modulo function.

Since more than one main memory block is mapped onto a given cache block position, contention may arise for that position. This situation may occur even when the cache is not full. Contention is resolved by allowing the new block to overwrite the currently resident block. So the replacement algorithm is trivial.

The detailed operation of the direct mapping technique is as follows:

The main memory address is divided into three fields. The field sizes depend on the memory capacity and the block size of the cache. In this example, the lower 5 bits of the address are used to identify a word within a block. The next 7 bits are used to select a block out of the 128 blocks (the capacity of the cache). The remaining 4 bits are used as a TAG to identify the proper block of main memory that is mapped to the cache.

When a new block is first brought into the cache, the high-order 4 bits of the main memory address are stored in the four TAG bits associated with its location in the cache. When the CPU generates a memory request, the 7-bit block field determines the corresponding cache block. The TAG field of that block is compared to the TAG field of the address. If they match, the desired word, specified by the low-order 5 bits of the address, is in that block of the cache.

If there is no match, the required word must be accessed from the main memory; that is, the contents of that block of the cache are replaced by the new block specified by the address generated by the CPU, and the TAG bits are correspondingly updated with the high-order 4 bits of the address [4].

Figure 3: Direct-mapping cache
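A minimal sketch of the lookup just described is given below, using the field widths of the running example (5-bit word offset, 7-bit block field, 4-bit TAG). The structure names, and the simplification of tracking only tags and valid bits rather than data, are assumptions made for illustration:

```c
/* Minimal direct-mapped lookup sketch for the running example
 * (128 cache blocks, 32-word blocks, 16-bit word addresses).
 * Tracks only tags and valid bits; names are illustrative. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS 128

struct dm_line { bool valid; unsigned tag; };
static struct dm_line cache[NUM_BLOCKS];

/* Returns true on a hit; on a miss the resident line is simply overwritten. */
static bool access_direct_mapped(unsigned addr16) {
    unsigned block = (addr16 >> 5) & 0x7F;  /* next 7 bits select the cache block */
    unsigned tag   = addr16 >> 12;          /* high-order 4 bits are the TAG      */

    if (cache[block].valid && cache[block].tag == tag)
        return true;                        /* hit: word is delivered from cache  */

    cache[block].valid = true;              /* miss: block fetched from main memory */
    cache[block].tag   = tag;               /* and the TAG bits are updated          */
    return false;
}

int main(void) {
    printf("%d\n", access_direct_mapped(0x1234)); /* 0: first access misses */
    printf("%d\n", access_direct_mapped(0x1234)); /* 1: second access hits  */
    return 0;
}
```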

5.2. Fully Associative Mapping

In fully associative mapping, when a request is made to the cache, the requested address is compared against all entries in a directory. If the requested address is found (a directory hit), the corresponding location in the cache is fetched and returned to the processor; otherwise, a miss occurs. A main memory block can potentially reside in any cache block position. In this case, the main memory address is divided into two groups: the low-order bits identify the location of a word within a block, and the high-order bits identify the block. In the example here, 11 bits are required to identify a main memory block when it is resident in the cache, so the high-order 11 bits are used as TAG bits and the low-order 5 bits are used to identify a word within the block. The TAG bits of an address received from the CPU must be compared to the TAG bits of each block of the cache to see if the desired block is present.

In associative mapping, any block of main memory can go to any block of the cache, so it provides complete flexibility, and a proper replacement policy must be used to choose a cache block to replace when the currently accessed block of main memory is not present in the cache. It may not be practical to use this complete flexibility because of the searching overhead: the TAG field of the main memory address has to be compared with the TAG field of all the cache blocks. In this example, there are 128 blocks in the cache and the size of the TAG is 11 bits [4]. The whole arrangement of the associative mapping technique is shown in Figure 4.

Figure 4: Fully-Associative mapping cache
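Under the same illustrative assumptions as before (tags and valid bits only, made-up names), the corresponding fully associative search compares the 11-bit TAG against every block of the cache:

```c
/* Fully associative search sketch: the 11-bit TAG of a 16-bit word address
 * is compared against every cache block. Tags only; names are illustrative. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_BLOCKS 128

struct fa_line { bool valid; unsigned tag; };
static struct fa_line cache[NUM_BLOCKS];

/* Returns the index of the matching block, or -1 on a miss. */
static int search_fully_associative(unsigned addr16) {
    unsigned tag = addr16 >> 5;                     /* high-order 11 bits are the TAG */
    for (int i = 0; i < NUM_BLOCKS; i++)            /* in hardware this comparison is */
        if (cache[i].valid && cache[i].tag == tag)  /* done on all blocks in parallel */
            return i;
    return -1;                                      /* miss: a replacement policy
                                                       picks the victim (Section 8)  */
}

int main(void) {
    cache[42].valid = true;
    cache[42].tag   = 0x0ABC >> 5;
    printf("%d\n", search_fully_associative(0x0ABC)); /* prints 42 */
    return 0;
}
```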

5.3. Set-Associative Mapping

The set-associative mapping technique is intermediate between the previous two techniques. Blocks of the cache are grouped into sets, and the mapping allows a block of main memory to reside in any block of a specific set. Therefore, the flexibility of associative mapping is reduced from full freedom to a set of specific blocks. This also reduces the searching overhead, because the search is restricted to the blocks of a single set instead of all the blocks of the cache. The contention problem of direct mapping is also eased by having a few choices for block replacement.

Consider the same cache memory and main memory organization as in the previous example, with the cache organized as 4 blocks per set. The TAG field of the associative mapping technique is divided into two groups: one is termed the SET field and the other the TAG field. Since each set contains 4 blocks, the total number of sets is 32. The main memory address is therefore grouped into three parts: the low-order 5 bits identify a word within a block; since there are 32 sets in total, the next 5 bits identify the set; and the high-order 6 bits are used as TAG bits.

The 5-bit set field of the address determines which set of the cache might contain the desired block. This is similar to the direct mapping technique, except that direct mapping looks up a block whereas set-associative mapping looks up a set. The TAG field of the address must then be compared with the TAGs of the four blocks of that set. If a match occurs, the block is present in the cache; otherwise the block containing the addressed word must be brought into the cache, and it can only go into the corresponding set. Since there are four blocks in the set, we have to choose appropriately which block to replace if all the blocks are occupied. Because the search is restricted to four blocks only, the searching complexity is reduced [4]. The whole arrangement of the set-associative mapping technique is shown in Figure 5.

Figure 5: Set-Associative mapping cache
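A comparable sketch of the set-associative lookup, again with illustrative names and tags only, restricts the comparison to the four blocks of the selected set:

```c
/* Set-associative lookup sketch for the running example:
 * 32 sets of 4 blocks, 5-bit word offset, 5-bit set field, 6-bit TAG.
 * Tags only; structure and names are illustrative. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_SETS 32
#define WAYS     4

struct sa_line { bool valid; unsigned tag; };
static struct sa_line cache[NUM_SETS][WAYS];

/* Returns true on a hit; on a miss one block of the selected set must be
 * chosen for replacement (see Section 8). */
static bool access_set_associative(unsigned addr16) {
    unsigned set = (addr16 >> 5) & 0x1F;    /* next 5 bits select the set    */
    unsigned tag = addr16 >> 10;            /* high-order 6 bits are the TAG */

    for (int way = 0; way < WAYS; way++)    /* only the 4 blocks of this set */
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;
    return false;
}

int main(void) {
    printf("%d\n", access_set_associative(0x0ABC)); /* 0: cold cache, miss */
    return 0;
}
```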

6. READ POLICIES

1. Read Through: The required word is read from the main memory directly to the CPU.

2. No Read Through: The block is first read from the main memory into the cache, and the word is then delivered to the CPU from the cache [5].

7. CACHE WRITE POLICIES

When the CPU wants to write to a memory system that uses a cache, it can use any of the following write policies, depending upon whether the old data exists in the cache (a write hit) or not (a write miss) [6].

Write Policies on Write Hit are:

1. Write-Through: The data is written to both the block in the cache and to the block in the lower-level memory. 

2. Write-Back: The data is written only to the block in the cache, i.e. the system memory is not updated immediately. To support this, a dirty bit is attached to each cache block; it indicates whether the cache line is clean (not modified while in the cache) or dirty (modified while in the cache). A dirty block is written back to the lower-level memory only when it has to be replaced; a clean block is not written back on replacement.

| Write-Through | Write-Back |
| --- | --- |
| 1. Write is slower. | 1. Write is faster (occurs at the speed of the cache). |
| 2. Information is written into both the cache and the main memory. | 2. Information is written only into the cache. |
| 3. Main memory is consistent, i.e. the data in it is not stale; it holds the most current copy of the data. | 3. Main memory is updated only when a dirty line is replaced, so it is not always consistent with the cache. |
| 4. Uses more memory bandwidth. | 4. Uses less memory bandwidth. |
| 5. Easy to implement. | 5. Harder to implement. |
| 6. Main memory access is required for each write. | 6. Main memory access is required only when a dirty line is replaced. |

Table 1: Difference between Write-Through and Write-Back.

Write Policies on Write Miss are:

1. Write Allocate: The block is loaded into the cache, after which the appropriate write-hit action is performed.

2. No Write Allocate: The new data is written to the main memory, but the block is not loaded into the cache.

| Write Hit Policy | Write Miss Policy | Action on Hit | Action on Miss |
| --- | --- | --- | --- |
| 1. Write-Through | Write Allocate | Writes to the cache and to main memory. | Updates the block in main memory and brings the block into the cache. |
| 2. Write-Through | No Write Allocate | Writes to the cache and to main memory. | Updates the block in main memory without bringing that block into the cache. |
| 3. Write-Back | Write Allocate | Writes to the cache, setting the dirty bit for the block; main memory is not updated. | Brings the block into the cache and writes to it there, setting the dirty bit; main memory is not updated until the dirty block is replaced. |
| 4. Write-Back | No Write Allocate | Writes to the cache, setting the dirty bit for the block; main memory is not updated. | Updates the block in main memory without bringing that block into the cache. |

Table 2: All potential combinations of policies with main memory on write.

Out of the four combinations, the 2nd (write-through with no write allocate) and the 3rd (write-back with write allocate) are the most efficient.
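To make the control flow of these two recommended combinations concrete, the self-contained sketch below models them with a deliberately tiny one-block "cache" and a small word-addressed memory array; every name and simplification is an assumption made for illustration, not the authors' implementation.

```c
/* Illustrative sketch of the two recommended combinations from Table 2:
 * write-through + no-write-allocate and write-back + write-allocate.
 * A one-block "cache" and a word array stand in for the real hardware. */
#include <stdbool.h>
#include <stdio.h>

#define MEM_WORDS 1024

static unsigned memory[MEM_WORDS];

/* A one-block "cache", just enough to show the policy control flow. */
static struct { bool valid, dirty; unsigned addr, data; } line;

static bool hit(unsigned addr) { return line.valid && line.addr == addr; }

static void fill(unsigned addr) {            /* bring the block into the cache */
    line.valid = true;
    line.dirty = false;
    line.addr  = addr;
    line.data  = memory[addr];
}

static void write_back_if_dirty(void) {      /* update memory on eviction */
    if (line.valid && line.dirty)
        memory[line.addr] = line.data;
}

/* Combination 2: write-through, no write allocate. */
static void write_through_no_allocate(unsigned addr, unsigned w) {
    if (hit(addr)) line.data = w;   /* hit: update the cache ...               */
    memory[addr] = w;               /* ... and always update main memory       */
    /* miss: memory is updated, but the block is NOT brought into the cache   */
}

/* Combination 3: write-back, write allocate. */
static void write_back_allocate(unsigned addr, unsigned w) {
    if (!hit(addr)) {               /* miss: evict (writing back if dirty)     */
        write_back_if_dirty();      /* and bring the block into the cache      */
        fill(addr);
    }
    line.data  = w;                 /* write only to the cache                 */
    line.dirty = true;              /* memory is updated later, on eviction    */
}

int main(void) {
    write_through_no_allocate(7, 42);
    printf("memory[7] = %u (updated immediately)\n", memory[7]);

    write_back_allocate(9, 99);
    printf("memory[9] = %u (stale until the dirty line is evicted)\n", memory[9]);
    return 0;
}
```

Running the sketch shows the essential difference: the write-through handler updates main memory immediately, while the write-back handler leaves main memory stale until the dirty line is evicted.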

8. CACHE REPLACEMENT POLICY

8.1. Replacement Algorithms

Suppose a new block is to be brought into the cache. It may happen that all the positions it may occupy are already occupied. What should be done in such a case to accommodate the new block in the cache?

The only solution is to evict one of the old blocks to create space for the new one such that the evicted block is one that is the least likely to be referenced in the near future.

Now the question arises: which block in the cache should be replaced by the incoming block?

This is done by using any of the under mentioned replacement algorithms:

8.1.1. Least Recently Used (LRU) Replacement policy

Keeping in mind the principle of locality, it can be supposed that there is a great possibility that lines which have been referenced recently will also be referenced in the near future.

The principle that LRU follows is that blocks in the cache that have not been referenced in the recent past are unlikely to be referenced in the near future.

The cache ranks each of the blocks according to how recently they have been referenced and replaces the one that has been least recently used with the new block.

Therefore, when a block is to be overwritten, it is a good decision to overwrite the one that has gone the longest time without being referenced. This is defined as the least recently used (LRU) block, and it must be tracked as the computation proceeds.

Let us take an example of a four-block set. We use a 2-bit counter to keep track of each block.

When a hit occurs, that is, when a read request is received for a word that is in the cache, the counter of the referenced block is set to 0. All counters whose values were originally lower than that of the referenced block are incremented by 1, and all other counters remain unchanged.

When a miss occurs, that is, when a read request is received for a word and the word is not present in the cache, we have to bring the block to cache.

There are two possibilities in case of a miss:

If the set is not full, the counter associated with the new block loaded from the main memory is set to 0, and the values of all other counters are incremented by 1.

If the set is full and a miss occurs, the block with counter value 3 is removed and the new block is put in its place; its counter is set to 0 and the other three block counters are incremented by 1.

The counter values of occupied blocks are always distinct, and the highest counter value always indicates the least recently used block.

Figure 6 illustrates the LRU replacement policy.

Figure 6: LRU replacement policy
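A small sketch of the 2-bit counter scheme described above, for a single four-block set, might look as follows; the data layout and function names are assumptions made for illustration:

```c
/* Sketch of the LRU scheme described above for one four-block set:
 * a 2-bit counter per block, counter 0 = most recently used,
 * counter 3 = least recently used. Names and layout are illustrative. */
#include <stdbool.h>
#include <stdio.h>

#define WAYS 4

struct block { bool valid; unsigned tag; unsigned counter; /* 0..3 */ };
static struct block set[WAYS];

static void reference(unsigned tag) {
    /* Hit: reset the referenced block's counter to 0 and increment all
     * counters whose value was lower than it; others stay unchanged. */
    for (int i = 0; i < WAYS; i++) {
        if (set[i].valid && set[i].tag == tag) {
            for (int j = 0; j < WAYS; j++)
                if (set[j].valid && set[j].counter < set[i].counter)
                    set[j].counter++;
            set[i].counter = 0;
            return;
        }
    }
    /* Miss: use an empty block if the set is not full; otherwise replace
     * the block whose counter is 3 (the least recently used one). */
    int victim = -1;
    for (int i = 0; i < WAYS; i++)
        if (!set[i].valid) { victim = i; break; }
    if (victim < 0)
        for (int i = 0; i < WAYS; i++)
            if (set[i].counter == 3) victim = i;
    for (int i = 0; i < WAYS; i++)
        if (set[i].valid && i != victim) set[i].counter++;
    set[victim].valid   = true;
    set[victim].tag     = tag;
    set[victim].counter = 0;
}

int main(void) {
    unsigned refs[] = { 1, 2, 3, 4, 1, 5 };  /* tag 5 evicts tag 2, the LRU block */
    for (int i = 0; i < 6; i++) reference(refs[i]);
    for (int i = 0; i < WAYS; i++)
        printf("block %d: tag %u, counter %u\n", i, set[i].tag, set[i].counter);
    return 0;
}
```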

8.1.2. First In First Out (FIFO) replacement policy

FIFO follows the first-in, first-out principle, i.e. when a new block is to be brought in, the oldest block is overwritten. With this technique, no update is required when a hit occurs.

When a miss occurs and the set is not full, the new block is put into an empty block position and the counter values of the occupied blocks are incremented by one. When a miss occurs and the set is full, the block with the highest counter value is replaced by the new block and its counter is set to 0, while the counter values of all other blocks of that set are incremented by 1.

The overhead of this policy is lower, since no counter update is required on a hit.
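Under the same illustrative assumptions as the LRU sketch, the counter-based FIFO scheme differs only in that the counters record arrival order and are never touched on a hit:

```c
/* Sketch of the counter-based FIFO scheme described above for one
 * four-block set; counters record arrival order only and are never
 * updated on a hit. Names and layout are illustrative. */
#include <stdbool.h>
#include <stdio.h>

#define WAYS 4

struct block { bool valid; unsigned tag; unsigned counter; };
static struct block set[WAYS];

static void bring_in(unsigned tag) {        /* called only on a miss */
    int victim = -1;
    for (int i = 0; i < WAYS; i++)          /* prefer an empty block position */
        if (!set[i].valid) { victim = i; break; }
    if (victim < 0)                         /* set full: evict the oldest block, */
        for (int i = 0; i < WAYS; i++)      /* i.e. the highest counter value    */
            if (set[i].counter == WAYS - 1) victim = i;
    for (int i = 0; i < WAYS; i++)
        if (set[i].valid && i != victim) set[i].counter++;
    set[victim].valid   = true;
    set[victim].tag     = tag;
    set[victim].counter = 0;                /* newest block starts at 0 */
}

int main(void) {
    unsigned misses[] = { 1, 2, 3, 4, 5 };  /* tag 5 evicts tag 1, the first one in */
    for (int i = 0; i < 5; i++) bring_in(misses[i]);
    for (int i = 0; i < WAYS; i++)
        printf("block %d: tag %u, counter %u\n", i, set[i].tag, set[i].counter);
    return 0;
}
```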

8.1.3. Random replacement policy

The simplest algorithm of all is the random replacement algorithm. It chooses a block of the cache at random whenever an incoming block must be placed; the randomly chosen block is overwritten by the new block.

Remarkably, this simple policy has been found to be quite effective in practice.

9. CONCLUSION

This paper has explained the essentials of cache memory. It makes clear that the cache is faster than the other memories in the hierarchy; the cache memory is 3-4 times faster than the main memory. Among the write policies, the combinations that work best are:

1. Write-through with no write allocate
2. Write-back with write allocate

The LRU policy is generally more efficient than the FIFO and random policies, keeping the number of misses to a minimum.

Thus, this paper has presented the principal features, working, and implementation aspects of the cache in a way that gives the reader a clear understanding of the cache and its importance.

10. REFERENCES

1. anyfreepapers.com/free-research-papers/cache-memory.htm
2. notes.tyrocity.com/primary-memory
3. cseweb.ucsd.edu/classes/su07/cse141/cache-handout.pdf
4. nptel.iitm.ac.in/courses/Webcourse-contents/%20Guwahati/comp_org_arc/pdf/coa.pdf
5. ecs.umass.edu/ece/koren/architecture/Cache/tutorial.html
6. ecee.colorado.edu/~ecen2120/Manual/caches/cache.html