Post on 01-Jan-2016
in1210/01-PDS 1TU-Delft
The Memory System
Organization
[Figure: memory drawn as rows of four bytes. Word addresses 0, 1, 2, 3, ... label the rows; byte addresses 0-3, 4-7, 8-9, ... label the bytes within them.]
Connection Memory-CPU
[Figure: the CPU, with its registers MAR (address) and MDR (data), is connected to the memory by address lines, data lines, and the control signals Read/Write and MFC.]
Memory
- Addressable number of bits
- Different orderings
- Speed-up techniques
  - Cache memories
  - Memory interleaving
- Enlargement
  - Virtual memory
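Organisation(1)
[Figure: internal organisation of a memory chip. An address decoder with inputs A0-A3 drives the word lines W0, W1, ..., W15; each word line selects a row of flip-flop (FF) cells; sense/write circuits connect the cells to the input/output lines b7, ..., b1, b0; control inputs are R/W and CS.]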
Pinning
Total pins required for 16x8 memory: 16
- 4 address lines
- 8 data lines
- 2 control lines
- 2 power lines
1K by 1 memory
[Figure: a 32 by 32 memory array with 10-bit address lines. A 5-bit decoder selects one of the word lines W0, ..., W31; two 32-to-1 multiplexors, one for input and one for output, select the bit within the row.]
Pinning
Total number of pins required: 16
- 10 address lines
- 2 data lines (in/out)
- 2 control lines
- 2 power lines
For 128 by 8 memory: 19 pins (7+8+2+2)
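The pin counts on the two Pinning slides follow one rule: address pins plus data pins plus control pins plus power pins. A minimal sketch of that arithmetic; the function name `pin_count` and its parameters are illustrative and not from the slides, and the number of words is assumed to be a power of two.

```python
def pin_count(words, data_pins, control_pins=2, power_pins=2):
    """Total pins for a memory chip with `words` addressable locations."""
    # Assumes `words` is a power of two, so log2(words) address pins suffice.
    address_pins = words.bit_length() - 1
    return address_pins + data_pins + control_pins + power_pins

print(pin_count(16, 8))     # 16 x 8 chip: 4 + 8 + 2 + 2
print(pin_count(1024, 2))   # 1K x 1 chip with separate in and out pins: 10 + 2 + 2 + 2
print(pin_count(128, 8))    # 128 x 8 chip: 7 + 8 + 2 + 2
```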
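Multiple Modules(1)
Block-wise organization
[Figure: the main memory (MM) address is split into a k-bit module field and an m-bit address-in-module field. The module field selects, via chip select (CS), one of Module 0, ..., Module i, ..., Module n-1; the address-in-module field goes to the address inputs of the modules.]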
Multiple Modules(2)
Interleaving organization
[Figure: the MM address is split into an m-bit address-in-module field and a k-bit module field. The module field selects, via CS, one of Module 0, ..., Module i, ..., Module 2**k-1.]
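The difference between the two organizations shows up when consecutive addresses are split into a module number and an address in the module. The sketch below assumes the usual reading of the two figures: block-wise takes the module number from the high-order k bits, interleaving takes it from the low-order k bits. The function names and the sizes K and M are hypothetical.

```python
def blockwise(addr, m):
    """High-order bits select the module, low-order m bits the word in it."""
    return addr >> m, addr & ((1 << m) - 1)

def interleaved(addr, k):
    """Low-order k bits select the module, the remaining bits the word in it."""
    return addr & ((1 << k) - 1), addr >> k

K, M = 2, 4  # hypothetical sizes: 4 modules of 16 words each
for addr in range(4):
    # Each result is a pair (module, address in module).
    print(addr, blockwise(addr, M), interleaved(addr, K))
```

With these assumptions, consecutive addresses stay in one module in the block-wise case and rotate over all modules in the interleaved case.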
Question?
- What is the advantage of the interleaved organization?
- What is the disadvantage?
Memory Hierarchy
[Figure: hierarchy from top to bottom: CPU, primary cache, secondary cache, main memory, disks. Size increases toward the disks; speed and cost increase toward the CPU.]
Caches(1)
- Problem: main memory is slower than CPU registers (by a factor of 5-10)
- Solution: a fast and small memory between CPU and main memory
- Contains: recently referenced memory locations
[Figure: CPU - Cache - Main Memory]
Caches(2)
- Works because of the locality principle
- Profit:
  - Cache hit ratio: h
  - Access time cache: c
  - Cache miss ratio: 1-h
  - Access time main memory: m
  - Mean access time: h*c + (1-h)*m
- Cache is transparent to the programmer
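The mean access time can be evaluated directly from the quantities on this slide. A small sketch; the numeric values (hit ratio 0.9, cache access time 1, main memory access time 10) are made-up illustration values, not figures from the course.

```python
def mean_access_time(h, c, m):
    """Mean access time for hit ratio h, cache time c, main memory time m."""
    return h * c + (1 - h) * m

# Hypothetical numbers: 90% hits, main memory 10 times slower than the cache.
t = mean_access_time(h=0.9, c=1, m=10)
print(round(t, 2))
```

With these numbers the mean access time comes out at about 1.9 cache access times, far closer to the cache than to main memory.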
Caches(3)
- At READ operation
  - If not in cache, get block into cache and read from cache (possibly read-through)
  - If in cache, read from cache
- At WRITE operation
  - If not in cache, write in main memory
  - If in cache, write in cache, and either:
    » write in main memory as well (store through), or
    » set the modified (dirty) bit
Caches(3a)
Analogy: borrow books from a library and store them in 26 locations according to the first letter of the first author's name.
- Direct mapped: a separate location for a single book for each letter
- Associative: any book can go to any of the 26 locations
- Set-associative: 2 locations for letters A-B, C-D, E-F, etc.
Caches(4)
- Suppose
  - Main memory is N = 2^n bytes
  - Divided in blocks of b = 2^k bytes
  - Cache: 128 blocks
  - e.g. n=16, k=4, b=16
- Every block in cache has a valid bit (reset when memory is modified)
- At context switch: invalidate cache
Direct Mapped Cache(1)
- A block in memory (j) can only be at one place in cache (j mod #cache blocks)
- Place determined by block number
- Main memory address: tag (5 bits) | block (7 bits) | word (4 bits)
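The placement rule j mod #cache blocks can be tried out for a few block numbers. A minimal sketch for the 128-block cache of these slides; the function name `place` is illustrative. The quotient of the division is the 5-bit tag that tells apart the memory blocks competing for the same cache block.

```python
CACHE_BLOCKS = 128  # cache size used on the slides

def place(block_number):
    """Return (cache block, tag) for memory block `block_number`."""
    return block_number % CACHE_BLOCKS, block_number // CACHE_BLOCKS

for j in (0, 1, 127, 128, 129, 255, 256):
    print(j, place(j))
```

Memory blocks 0, 128 and 256 all land in cache block 0 and differ only in their tag.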
Direct Mapped Cache(1)
[Figure: main memory blocks BLOCK 0, ..., BLOCK 127, BLOCK 128, BLOCK 129, ..., BLOCK 255, BLOCK 256, ... mapped onto cache blocks BLOCK 0, BLOCK 1, BLOCK 2, ..., each stored with a 5-bit tag.]
Associative(1)
- Each block can be at any place in cache
- At cache access: parallel (associative) match of the tag in the address with the tags in all cache entries
- Associative: slower, more expensive, higher hit ratio
- Main memory address: tag (12 bits) | word (4 bits)
Associative(2)
[Figure: any main memory block (BLOCK 0, BLOCK 1, ..., BLOCK 127, BLOCK 128, BLOCK 129, ..., BLOCK 255, BLOCK 256, ...) can be placed in any of the 128 cache blocks, each stored with a 12-bit tag.]
Set-Associative(1)
- Combination of direct mapped and associative
- Cache consists of sets
- Each set is associative
- One block can only be placed in one set; determined by set number
- Main memory address: tag (6 bits) | set (6 bits) | word (4 bits)
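The three address formats (5/7/4 for direct mapped, 12/4 for associative, 6/6/4 for set-associative) differ only in how many bits are used as an index into the cache. A sketch that splits one 16-bit address all three ways; `split_address` and the sample address are illustrative choices, not part of the slides.

```python
def split_address(addr, index_bits, word_bits=4):
    """Split an address into (tag, index, word) fields."""
    word = addr & ((1 << word_bits) - 1)
    index = (addr >> word_bits) & ((1 << index_bits) - 1)
    tag = addr >> (word_bits + index_bits)
    return tag, index, word

addr = 0x7A04  # arbitrary 16-bit example address
print("direct mapped  ", split_address(addr, index_bits=7))  # 5-bit tag, 7-bit block
print("set-associative", split_address(addr, index_bits=6))  # 6-bit tag, 6-bit set
print("associative    ", split_address(addr, index_bits=0))  # 12-bit tag, no index
```

The fewer index bits there are, the more bits end up in the tag that has to be stored and compared.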
Set-Associative(2)
[Figure: the 128 cache blocks are grouped into sets (set 0, set 1, ...) of two blocks each; every cache block is stored with a 6-bit tag, and a main memory block can be placed in either block of its set.]
Question?
- Main memory: 4 GByte
- Cache: 512 blocks of 64 byte
- Cache: 8-way set-associative
- How many bits is the:
  - byte address within a block
  - set number
  - tag
Answer!
- Main memory: 4 GByte, so 32-bit address
- Blocks of 64 byte, so 6-bit byte address
- 8-way set-associative cache with 512 blocks, so 512/8=64 sets, so 6-bit set number
- So, 32-6-6=20-bit tag
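The steps of this answer can be written as a small calculation. A sketch under the assumption that all sizes are powers of two; the function name `cache_fields` and its parameter names are illustrative.

```python
def cache_fields(memory_bytes, block_bytes, cache_blocks, ways):
    """Return (byte-offset bits, set bits, tag bits) for a set-associative cache."""
    # (x - 1).bit_length() equals log2(x) when x is a power of two.
    address_bits = (memory_bytes - 1).bit_length()
    offset_bits = (block_bytes - 1).bit_length()
    sets = cache_blocks // ways
    set_bits = (sets - 1).bit_length()
    tag_bits = address_bits - set_bits - offset_bits
    return offset_bits, set_bits, tag_bits

print(cache_fields(4 * 2**30, 64, 512, 8))   # the question above
print(cache_fields(2**16, 16, 128, 1))       # direct mapped cache of the earlier slides
```

Setting `ways` to 1 gives the direct mapped case, and setting it to the number of cache blocks gives the fully associative case with zero set bits.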
Replacement(1)
(Set) associative replacement algorithms: Least Recently Used (LRU)
- With 2^k blocks per set, implement with a k-bit counter per block
- Hit: increment by 1 all counters lower than that of the referenced block, set the referenced block's counter to 0
- Miss and set not full: place the new block, set its counter to 0, increment the rest
- Miss and set full: replace the block whose counter is 2^k - 1, set the new block's counter to 0, increment the rest
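The three counter rules can be sketched for a single set as follows. This is an illustration of the rules as stated on the slide, not a description of any particular hardware; the class name `LRUSet` and the use of `None` for an empty slot are assumptions of the sketch.

```python
class LRUSet:
    """One cache set with 2**k blocks and a k-bit LRU counter per block."""

    def __init__(self, k):
        self.size = 1 << k
        self.blocks = [None] * self.size   # None marks an empty slot
        self.count = [0] * self.size

    def access(self, block):
        """Reference `block`; return True on a hit, False on a miss."""
        if block in self.blocks:                        # hit
            i = self.blocks.index(block)
            for j in range(self.size):
                if self.blocks[j] is not None and self.count[j] < self.count[i]:
                    self.count[j] += 1                  # only the lower counters move up
            self.count[i] = 0
            return True
        if None in self.blocks:                         # miss, set not full
            i = self.blocks.index(None)
        else:                                           # miss, set full
            i = self.count.index(self.size - 1)         # victim has counter 2**k - 1
        for j in range(self.size):
            if j != i and self.blocks[j] is not None:
                self.count[j] += 1                      # increment the rest
        self.blocks[i] = block
        self.count[i] = 0
        return False

s = LRUSet(k=2)
for name in "ABCD":
    s.access(name)
print(s.blocks, s.count)   # A was referenced longest ago
s.access("E")              # set is full, so the block with counter 3 is replaced
print(s.blocks, s.count)
```

In a full set the counters always form a permutation of 0 to 2^k - 1, so exactly one block carries the value 2^k - 1 and is the replacement victim.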
Example
k=2, HIT on the block with counter 10
Counters before: 01 00 10 11
Counters after:  10 01 00 11
Example
k=2, MISS AND SET NOT FULL (first block EMPTY)
Counters before: 11 00 10 01
Counters after:  00 01 11 10
Example
k=2, MISS AND SET FULL
Counters before: 01 00 10 11
Counters after:  10 01 11 00
Replacement(2)
- Replace oldest block
- Random replacement
Program example
/* Normalize elements of first row of A */
int SUM = 0;
for (j = 0; j < 10; j++) {
    SUM = SUM + A[0][j];
}
AVE = SUM / 10;
for (i = 9; i > -1; i--) {
    A[0][i] = A[0][i] / AVE;
}
Example cache
CACHE with 8 blocks, each block 1 word, LRU replacement
[Figure: cache blocks BLOCK 0 to BLOCK 7, each with a tag; in the set-associative case they form Set 0 and Set 1.]
Address formats:
- direct: tag (13 bits) | block (3 bits)
- associative: tag (16 bits)
- set associative: tag (15 bits) | set (1 bit)
Examples(2)
4x10 array a, stored in column order starting at memory address 7A00:
Memory address          Contents
0111101000000 0 0 0     a(0,0)
0111101000000 0 0 1     a(1,0)
0111101000000 0 1 0     a(2,0)
0111101000000 0 1 1     a(3,0)
...                     ...
0111101000100 1 0 0     a(0,9)
0111101000100 1 0 1     a(1,9)
0111101000100 1 1 0     a(2,9)
0111101000100 1 1 1     a(3,9)
Tag direct: leftmost 13 bits. Tag set-associative: leftmost 15 bits. Tag associative: all 16 bits.
Direct mapped
Contents of cache after pass:
block pos.  j=1     j=3     j=5     j=7     j=9     i=6     i=4     i=2     i=0
0           a[0,0]  a[0,2]  a[0,4]  a[0,6]  a[0,8]  a[0,6]  a[0,4]  a[0,2]  a[0,0]
4           a[0,1]  a[0,3]  a[0,5]  a[0,7]  a[0,9]  a[0,7]  a[0,5]  a[0,3]  a[0,1]
Block positions 1-3 and 5-7 are not used by the first row of A.
[Figure legend: entries are marked as miss or hit.]
Associative
block pos.  j=7     j=8     j=9     i=1     i=0
0           a[0,0]  a[0,8]  a[0,8]  a[0,8]  a[0,0]
1           a[0,1]  a[0,1]  a[0,9]  a[0,1]  a[0,1]
2           a[0,2]  a[0,2]  a[0,2]  a[0,2]  a[0,2]
3           a[0,3]  a[0,3]  a[0,3]  a[0,3]  a[0,3]
4           a[0,4]  a[0,4]  a[0,4]  a[0,4]  a[0,4]
5           a[0,5]  a[0,5]  a[0,5]  a[0,5]  a[0,5]
6           a[0,6]  a[0,6]  a[0,6]  a[0,6]  a[0,6]
7           a[0,7]  a[0,7]  a[0,7]  a[0,7]  a[0,7]
Set-associative
block pos.  j=3     j=7     j=9     i=4     i=2     i=0
0           a[0,0]  a[0,4]  a[0,8]  a[0,4]  a[0,4]  a[0,0]
1           a[0,1]  a[0,5]  a[0,9]  a[0,5]  a[0,5]  a[0,1]
2           a[0,2]  a[0,6]  a[0,6]  a[0,6]  a[0,2]  a[0,2]
3           a[0,3]  a[0,7]  a[0,7]  a[0,7]  a[0,3]  a[0,3]
Block positions 0-3 form set 0; block positions 4-7 are not used by the first row of A.
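The three traces can be reproduced with a short simulation of the example cache (8 one-word blocks, LRU replacement). The sketch rests on several assumptions: each loop iteration counts as one reference to its array element (the read and write-back of A[0,i] are treated as one), SUM, AVE, i and j live in registers, and LRU order is kept as an ordered list instead of the k-bit counters. The names `address` and `simulate` are illustrative.

```python
BASE = 0x7A00   # start address of A, from the slides
ROWS = 4        # A is 4 x 10, stored in column order

def address(i, j):
    """Word address of a(i, j)."""
    return BASE + j * ROWS + i

def simulate(num_sets, ways):
    """Run both loops of the program; return (hits, final sets)."""
    # Each set is a list of word addresses, least recently used first.
    sets = [[] for _ in range(num_sets)]
    refs = [address(0, j) for j in range(10)]            # first loop, j = 0..9
    refs += [address(0, i) for i in range(9, -1, -1)]    # second loop, i = 9..0
    hits = 0
    for addr in refs:
        s = sets[addr % num_sets]    # one-word blocks: low address bits pick the set
        if addr in s:
            hits += 1
            s.remove(addr)
        elif len(s) == ways:
            s.pop(0)                 # evict the least recently used block
        s.append(addr)               # addr is now the most recently used
    return hits, sets

for name, num_sets, ways in [("direct mapped", 8, 1),
                             ("associative", 1, 8),
                             ("set-associative", 2, 4)]:
    hits, _ = simulate(num_sets, ways)
    print(name, "hits:", hits, "of 20 references")
```

Because the elements of row 0 lie four words apart, the direct mapped cache uses only two of its blocks and the set-associative cache only one of its two sets, which is why the fully associative cache gets the most hits here.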
PowerPC
- PowerPC 604
- Data and Instruction cache
- Caches are 16 K bytes
- Four-way set associative
- 128 sets, each with 4 blocks, each block 8 words of 32 bits
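Example
address: 0000 0000 0011 1111 0100 | 0000 000 | 0 1000
         tag 003F4                | set 00   | 8
[Figure: Set 0 holds Block 0 (tag 00BA2, st), Block 1, Block 2 and Block 3 (tag 003F4, st). The tag of the address is compared (=?) with the stored tags: no for Block 0, yes for Block 3.]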
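The field widths in this example follow from the cache geometry: 8 words of 4 bytes give a 5-bit byte offset, 128 sets give a 7-bit set number, and the remaining 20 bits of a 32-bit address form the tag. A sketch that checks the example address; the function name `split_604` is illustrative.

```python
def split_604(addr):
    """Split a 32-bit address into (tag, set number, byte offset)."""
    offset = addr & 0x1F           # 5 bits: byte within a 32-byte block
    set_no = (addr >> 5) & 0x7F    # 7 bits: one of 128 sets
    tag = addr >> 12               # remaining 20 bits
    return tag, set_no, offset

# Cache size: 128 sets x 4 blocks x 8 words x 4 bytes.
print(128 * 4 * 8 * 4)

tag, set_no, offset = split_604(0x003F4008)   # the address of the example
print(f"tag {tag:05X}, set {set_no:02X}, offset {offset}")
```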
Virtual Memory(1)
- Problem: a compiled program may not fit into memory
- Solution: virtual memory, where the logical address space is larger than the physical address space
- Logical address space: addresses referable by instructions
- Physical address space: addresses referable in the real machine
Virtual Memory(2)
- For realizing virtual memory, we need an address conversion:
  a_m = f(a_v)
- a_m is the physical address (machine address)
- a_v is the virtual address
- This is generally done by hardware
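Organization
[Figure: the Processor sends the virtual address a_v to the MMU; the MMU sends the physical address a_m to the Cache and the Main Memory; data moves between Processor, Cache and Main Memory; Main Memory and Disk Storage are connected by DMA transfer.]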
Address translation
- Basic approach is to partition both physical address space and virtual address space into equally sized blocks called pages
- A virtual address is composed of a page number and a word within a page, called the offset
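Page tables
[Figure: the virtual address from the processor consists of a virtual page number and an offset. The page table base register holds the page table address; adding (+) the virtual page number selects a page table entry, which holds control bits and a page frame. The physical address consists of that page frame and the unchanged offset.]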
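The translation in this figure can be sketched in a few lines. The sketch makes several simplifying assumptions: pages of 4 KByte (12 offset bits), a Python dict standing in for the page table in memory, and the control bits reduced to a single hypothetical `valid` flag. The name `translate` and the sample entries are illustrative.

```python
PAGE_BITS = 12  # assumed page size of 4 KByte

def translate(virtual_address, page_table):
    """Map a virtual address to a physical address via the page table."""
    vpn = virtual_address >> PAGE_BITS                    # virtual page number
    offset = virtual_address & ((1 << PAGE_BITS) - 1)     # copied unchanged
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise LookupError(f"page fault on virtual page {vpn:#x}")
    return (entry["frame"] << PAGE_BITS) | offset

# Hypothetical page table with one resident page.
page_table = {0x00005: {"valid": True, "frame": 0x0012}}
print(hex(translate(0x00005ABC, page_table)))
```

A reference to a page without a valid entry raises an error here; in a real system that case is the page fault that brings the page in from disk.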
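Associative TLB
[Figure: the virtual page number of the virtual address from the processor is compared (= ?) with the virtual page entries in the TLB, each of which also holds control bits and a real page. On a hit, the page frame from the TLB and the offset form the physical address; otherwise a miss is signalled.]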
Policies
- Number of pages in main memory: resident set
- Mechanism works because of the principle of locality
- Acceleration: recent address translations in a separate cache
Replacement
- Page replacement algorithms
- Protection possible through page table register
- Sharing possible through page table
- Hardware support: Memory Management Unit (MMU)
Question?
- Main memory: 256 MByte
- Maximal virtual-address space: 4 GByte
- Page size: 4 KByte
- How many bits is the
  - offset within a page
  - virtual page frame number
  - (physical) page frame number
Answer!
- Physical address: 8+20=28 bits
- Virtual address: 32 bits
- Offset in a page: 12 bits
- Virtual page frame number: 32-12=20 bits
- Physical page frame number: 28-12=16 bits
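The same bit counts can be computed from the three sizes in the question. A sketch assuming all sizes are powers of two; the function name `vm_fields` is illustrative.

```python
def vm_fields(physical_bytes, virtual_bytes, page_bytes):
    """Return (offset bits, virtual page number bits, physical page frame bits)."""
    # (x - 1).bit_length() equals log2(x) when x is a power of two.
    offset_bits = (page_bytes - 1).bit_length()
    virtual_bits = (virtual_bytes - 1).bit_length()
    physical_bits = (physical_bytes - 1).bit_length()
    return offset_bits, virtual_bits - offset_bits, physical_bits - offset_bits

MBYTE, GBYTE, KBYTE = 2**20, 2**30, 2**10
print(vm_fields(256 * MBYTE, 4 * GBYTE, 4 * KBYTE))
```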