August 8th, 2011, Kevan Thompson: Creating a Scalable Coherent L2 Cache
![Page 1: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/1.jpg)
August 8th, 2011
Kevan Thompson
Creating a Scalable Coherent L2 Cache
![Page 2: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/2.jpg)
Outline
Motivation
Cache Background
System Overview
Methodology
Progress
Future Work
![Page 3: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/3.jpg)
Motivation
Goal: Create a configurable, shared Last Level Cache for use in the PolyBlaze system
![Page 4: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/4.jpg)
Introduction
Zia
Eric
Kevan
![Page 5: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/5.jpg)
Cache Background
In modern systems, processors outperform main memory, creating a bottleneck
This problem is exacerbated as more cores contend for the memory
The problem is reduced if each processor maintains a local copy of the data
![Page 6: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/6.jpg)
Caches
A cache is a small amount of memory on the same die as the processor
The cache provides lower latency and higher throughput than main memory
Systems may include multiple cache levels
The smallest and most local cache is the L1 cache. The next level is the L2 cache, and so on
![Page 7: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/7.jpg)
Shared Last Level Cache
Acts as a common location for data
Can be used to maintain cache coherency between processors
Does not exist in the current MicroBlaze system
We will design our own shared L2 Cache to maintain cache coherency
![Page 8: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/8.jpg)
Cache Speeds
In typical systems:
An L1 cache is very fast (1 or 2 cycles)
An L2 cache is slower (tens of cycles)
Main memory is very slow (hundreds of cycles)
![Page 9: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/9.jpg)
Cache Speeds
In our system we expect:
The L1 cache to be very fast (1 or 2 cycles)
The L2 cache to take about 10 cycles
Main memory to be faster than in a typical system (tens of cycles)
To model the memory bottleneck of a much faster system, we will need to stall the Main Memory
![Page 10: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/10.jpg)
Direct Mapped Cache
Caches store Data, a Valid Bit and a unique identifier called a tag
![Page 11: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/11.jpg)
Tags
As an example, imagine a system with the following:
32-bit Address Bus and 32-bit Word Size
64-KByte Cache with 32-Byte Line Size
Therefore we have 2048 (2^11) Lines
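The arithmetic behind this example can be sketched in a few lines of Python. This is only an illustration, and it assumes a direct mapped cache with power-of-two sizes: 64 KByte / 32 Byte gives 2048 = 2^11 lines, so a 32-bit address would split into a 5-bit byte offset, an 11-bit line index, and a 16-bit tag. The function name `split_address` and its parameters are hypothetical, not taken from the slides.

```python
def split_address(addr, cache_bytes=64 * 1024, line_bytes=32, addr_bits=32):
    """Split an address into (tag, index, offset) for a direct mapped cache.

    Assumes cache_bytes and line_bytes are powers of two (an assumption of
    this sketch, not something stated in the slides).
    """
    num_lines = cache_bytes // line_bytes          # 64 KB / 32 B = 2048 lines
    offset_bits = line_bytes.bit_length() - 1      # log2(32)   = 5
    index_bits = num_lines.bit_length() - 1        # log2(2048) = 11
    tag_bits = addr_bits - index_bits - offset_bits

    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_lines - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset, tag_bits


tag, index, offset, tag_bits = split_address(0x12345678)
print(hex(tag), hex(index), hex(offset), tag_bits)
```

Under these assumptions, only the tag and a Valid Bit need to be stored alongside each line, because the index and offset are implied by where the data sits in the cache.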
![Page 12: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/12.jpg)
Set-Associative Cache
A cache with n possible entries for each address is called an n-way set-associative cache
[Figure: 4-Way Set-Associative Cache]
![Page 13: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/13.jpg)
Replacement Policies
When an entry needs to be evicted from the cache we need to decide which Way it is evicted from.
To do this we use a replacement policy
LRU
Clock
FIFO
![Page 14: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/14.jpg)
LRU
Keep track of when each entry is accessed
Always evict the Least Recently Used
Implemented using a stack
[Figure: LRU stack ordered from MRU (top) to LRU (bottom), shown after "Access 4" and "Access 2"]
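A minimal software sketch of the stack idea follows. It is a Python model for illustration only, not the hardware implementation; the class name `LRUStack` and the 0-based Way numbering are assumptions of this sketch.

```python
class LRUStack:
    """Per-set LRU bookkeeping: position 0 is the MRU Way, the last is the LRU Way."""

    def __init__(self, ways):
        # Arbitrary initial order (an assumption): Way 0 is MRU, the last Way is LRU.
        self.stack = list(range(ways))

    def access(self, way):
        # Move the accessed Way to the top of the stack (most recently used).
        self.stack.remove(way)
        self.stack.insert(0, way)

    def victim(self):
        # The Way at the bottom of the stack is the least recently used.
        return self.stack[-1]


lru = LRUStack(4)
lru.access(3)
lru.access(1)
print(lru.stack, lru.victim())
```

Every access reorders the stack, which is why exact LRU becomes expensive to track as the number of Ways grows.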
![Page 15: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/15.jpg)
Clock
For each Way we store a Reference Bit
Also store a pointer to the oldest entry (the Hand)
Starting with the Hand, we test and clear each R Bit until we reach one that is 0
[Figure: Reference Bits for Ways 0 to 3 being tested and cleared as the Hand advances]
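The sweep described above can be sketched as a small Python model. Two details are assumptions of this sketch rather than statements from the slides: the Reference Bit is set whenever a Way is accessed, and the Hand moves one step past the victim after an eviction. The class name `ClockSet` is hypothetical.

```python
class ClockSet:
    """Clock replacement state for one cache set."""

    def __init__(self, ways):
        self.ref = [0] * ways   # one Reference Bit per Way
        self.hand = 0           # points at the oldest entry

    def touch(self, way):
        # Assumption: an access sets the Way's Reference Bit.
        self.ref[way] = 1

    def victim(self):
        # Test and clear R Bits starting at the Hand until one is already 0.
        # This terminates within one full sweep because bits are cleared as we go.
        while self.ref[self.hand] == 1:
            self.ref[self.hand] = 0
            self.hand = (self.hand + 1) % len(self.ref)
        way = self.hand
        # Assumption: advance the Hand past the Way that was just chosen.
        self.hand = (self.hand + 1) % len(self.ref)
        return way


clock = ClockSet(4)
clock.touch(0)
clock.touch(1)
print(clock.victim(), clock.ref)
```

Compared with the LRU stack, Clock needs only one bit per Way plus the Hand, at the cost of approximating recency.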
![Page 16: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/16.jpg)
System Overview
![Page 17: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/17.jpg)
PolyBlaze L2 Cache
1- to 16-Way Set-Associative Cache
LRU or Clock Replacement Policy
32 or 64 Byte Line Width
64 Bit Memory Interface
Write Back Cache
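To make the configuration space concrete, the sketch below derives the number of sets and the number of 64-bit memory transfers per line for a given configuration. It is an illustration only: it assumes the full 1 MB SRAM listed later in the deck is used for cache data, and the function name `l2_geometry` and its return format are hypothetical.

```python
def l2_geometry(ways, line_bytes, data_bytes=1024 * 1024, mem_bits=64):
    """Derive set count and memory transfers per line for an L2 configuration.

    data_bytes=1 MB assumes the whole SRAM holds cache data (an assumption).
    """
    if not 1 <= ways <= 16:
        raise ValueError("ways must be between 1 and 16")
    if line_bytes not in (32, 64):
        raise ValueError("line width must be 32 or 64 bytes")
    sets, remainder = divmod(data_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("data size is not a multiple of ways * line_bytes")
    transfers_per_line = line_bytes * 8 // mem_bits
    return {"sets": sets, "transfers_per_line": transfers_per_line}


print(l2_geometry(ways=4, line_bytes=32))
print(l2_geometry(ways=16, line_bytes=64))
```

In this model, filling or writing back one line over the 64-bit memory interface takes 4 transfers for a 32-byte line and 8 for a 64-byte line.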
![Page 18: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/18.jpg)
L2 Cache
![Page 19: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/19.jpg)
Reuse Policy
Determines which Way is evicted on a Cache Miss
Currently uses LRU Policy
![Page 20: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/20.jpg)
Tag Bank
Contains Tags and Valid Bits
Stored on FPGA using BRAMs
Instantiate one bank for each Way
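A software model of a lookup across per-Way tag banks might look like the following. It is a sketch under assumptions: each bank is modelled as a list of (valid, tag) pairs indexed by set, and the names `make_banks` and `lookup` are hypothetical. In hardware the banks would be read and compared in parallel; the loop here only models the result.

```python
def make_banks(ways, sets):
    # One tag bank per Way; every entry starts invalid.
    return [[(False, 0)] * sets for _ in range(ways)]


def lookup(tag_banks, index, tag):
    """Return the Way that hits for (index, tag), or None on a miss."""
    for way, bank in enumerate(tag_banks):
        valid, stored_tag = bank[index]
        if valid and stored_tag == tag:
            return way
    return None


banks = make_banks(ways=2, sets=4)
banks[1][2] = (True, 0x1234)
print(lookup(banks, 2, 0x1234), lookup(banks, 2, 0x9999))
```

The Valid Bit check matters: a stale tag left in an invalid entry must not produce a hit.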
![Page 21: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/21.jpg)
Control Unit
Finite State Machine for L2 Cache Pipelining
While a request to the NPI is outstanding, we can service other requests from the SRAM
![Page 22: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/22.jpg)
Data Bank
Control interface for off-chip SRAM
![Page 23: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/23.jpg)
SRAM
32-bit ZBT synchronous SRAM
1 MB
![Page 24: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/24.jpg)
Methodology
Break the L2 cache into three parts and test each separately, then combine them and test the complete system
SRAM Controller
NPI Interface
L2 Core
Complete L2 Cache
![Page 25: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/25.jpg)
SRAM Controller
Create a wrapper that connects the SRAM controller to the MicroBlaze over an FSL
Write a program that writes data to, and reads it back from, every address in the SRAM
Write all 1’s √
Write all 0’s √
Alternate writing all 1’s and all 0’s √
Write Random data √
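The four test patterns can be modelled in Python as below. This is a stand-in for the MicroBlaze test program, not the program itself: the `write` and `read` callbacks are hypothetical placeholders for the FSL accesses, and reading "alternate" as alternating word by word is an assumption of this sketch.

```python
import random

WORD_MASK = 0xFFFFFFFF  # 32-bit words, matching the 32-bit SRAM


def patterns(num_words, seed=0):
    """Yield (name, data) for each of the four test patterns."""
    rng = random.Random(seed)
    yield "all ones", [WORD_MASK] * num_words
    yield "all zeros", [0] * num_words
    # Assumption: alternate between all 1's and all 0's on successive words.
    yield "alternating", [WORD_MASK if i % 2 == 0 else 0 for i in range(num_words)]
    yield "random", [rng.getrandbits(32) for _ in range(num_words)]


def run_memory_test(write, read, num_words):
    """Write each pattern to every address, read it back, and collect mismatches."""
    failures = []
    for name, data in patterns(num_words):
        for addr, word in enumerate(data):
            write(addr, word)
        for addr, word in enumerate(data):
            if read(addr) != word:
                failures.append((name, addr))
    return failures


memory = [0] * 64  # small list standing in for the SRAM
print(run_memory_test(memory.__setitem__, memory.__getitem__, len(memory)))
```

Writing the whole pattern before reading any of it back helps catch addressing faults, where a write to one address disturbs another.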
![Page 26: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/26.jpg)
NPI Interface
Uses a custom FSL width, so we cannot test it using the MicroBlaze
Create a hardware test bench to read and write data to all addresses
Write all 1’s X
Write all 0’s X
Alternate writing all 1’s and all 0’s X
Write Random data X
![Page 27: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/27.jpg)
L2 Core
Simulate the core of the L2 cache in iSim X
Write a test bench that will approximate the responses from the L1/L2 Arbiter, SRAM Controller, and NPI Interface X
The test bench will write to each line multiple times to create a large number of cache misses X
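One way a test bench could generate such traffic is sketched below in Python: it produces addresses that revisit every set several times, each time with a different tag. This is an illustration, not the actual iSim test bench. The function name `conflict_addresses` and its parameters are hypothetical, and the sketch assumes power-of-two set counts and line sizes.

```python
def conflict_addresses(num_sets, line_bytes, rounds):
    """Yield addresses that hit every set `rounds` times with distinct tags."""
    stride = num_sets * line_bytes  # addresses one stride apart share a set index
    for r in range(rounds):
        for s in range(num_sets):
            yield r * stride + s * line_bytes


addrs = list(conflict_addresses(num_sets=4, line_bytes=32, rounds=3))
print([hex(a) for a in addrs])
```

If `rounds` exceeds the number of Ways, every set receives more distinct tags than it can hold, so evictions are forced in this model.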
![Page 28: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/28.jpg)
Complete L2 Cache
Combine the L2 Cache with the rest of PolyBlaze X
Write test programs to read and write to various regions of memory X
![Page 29: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/29.jpg)
Current Progress
SRAM Controller and Data Bank: Designed and Tested
NPI Interface: Testing and Debugging in Progress
L2 Core: Testing and Debugging in Progress
![Page 30: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/30.jpg)
Future Work
Add Clock Replacement Policy to L2 Cache
Add a Write Back Buffer to L2 Cache
Migrate System from XUPV5 to a BEE3 so we can create a system with more cores
Modify the L2 Cache into a NUMA system
Add Custom Hardware Accelerators to PolyBlaze
![Page 31: August 8 th, 2011 Kevan Thompson Creating a Scalable Coherent L2 Cache](https://reader030.vdocuments.us/reader030/viewer/2022033022/56649c785503460f9492dad1/html5/thumbnails/31.jpg)
Questions?