Memory Arithmetic Unit Interface (MAUI)
Jason M. Meier, Justin S. Teller, Tom J. Keeley
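Current Paradigm
[Figure: block diagram of CPU, Memory Controller, and DRAM System, with a timeline whose rows are CPU, MEMORY CTRL, and MEMORY. The CPU row shows Task 1, marked "Done: Task 1", followed by Task 2.]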
Active Pages Implementation
• Used Configurable DRAM - RADRAM
• Reconfigurable logic implements various memory functions
• An “Active Page” consists of a page of data and a set of associated functions
• Works on individual DRAM chips
• Processor-centric and Memory-centric partitioning
* Active Pages: Oskin, Chong, Sherwood, ISCA '98
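MAUI Implementation
[Figure: block diagram of CPU, Memory Controller containing the MAUI and its MAU, and DRAM System, with a timeline whose rows are CPU, MEMORY CTRL/MAUI, and MEMORY. Task 1 appears on the MEMORY CTRL/MAUI row, marked "Done: Task 1", while the CPU row moves on to Task 2.]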
MAUI Instruction Set
MAUI_LD <m_rd>,offset(<cpu_rs>)
1) CPU sends an MAU_LOAD register command to the MC (along with the register number and the address to read) across the front-side bus.
2) MC interprets the command and places a Read command in the transaction queue.
3) DRAM performs the read.
4) Result is stored in the appropriate register in the MAUI register file.
[Figure: LOAD REG. Block diagram (CPU, Memory Controller with MAUI and MAU, DRAM System) with steps 1 to 4 marked, and a timeline with rows CPU, MC/MAUI, and DRAM on which the DRAM row shows one read (R).]
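The four LOAD steps can be sketched as a toy software model. This is only an illustration of the listed steps, not the authors' simulator: the class name `MauiModel`, the dictionary standing in for DRAM, and the tuple layout of a queued command are all assumptions.

```python
from collections import deque

class MauiModel:
    """Toy model of a memory controller (MC) that owns a MAUI register file."""

    def __init__(self, dram, num_regs=4):
        self.dram = dram                      # address -> value (stand-in for DRAM)
        self.regs = [None] * (num_regs + 1)   # index 0 unused so r1..r4 map directly
        self.queue = deque()                  # transaction queue of pending DRAM commands

    def maui_ld(self, reg, addr):
        # Steps 1-2: the command arrives over the front-side bus and the MC
        # places a Read in the transaction queue, tagged with the target register.
        self.queue.append(("R", addr, reg))

    def drain(self):
        # Steps 3-4: DRAM performs each queued read and the result is stored
        # in the matching MAUI register.
        while self.queue:
            op, addr, reg = self.queue.popleft()
            if op == "R":
                self.regs[reg] = self.dram[addr]

dram = {0x100: 7, 0x104: 35}      # hypothetical memory contents
mc = MauiModel(dram)
mc.maui_ld(1, 0x100)
pending = list(mc.queue)          # what the MC has queued before DRAM responds
mc.drain()
```

Separating `maui_ld` from `drain` mirrors the point of the slide: the CPU's part ends once the command is queued, and the register is filled later by the memory side.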
MAUI Instruction Set II
MAUI_LDI <rd>,<cpu_rs>
1) CPU sends an MAU_LOADI register command to the MC (along with the register number and the integer to save) across the front-side bus.
2) MC interprets the command and places the integer in the appropriate register in the MAUI register file.
[Figure: LOADI REG. Block diagram (CPU, Memory Controller with MAUI and MAU, DRAM System) with steps 1 and 2 marked, and a timeline with rows CPU, MC/MAUI, and DRAM on which the DRAM row is empty.]
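For contrast with a load from memory, here is a minimal sketch of the two LOADI steps. The function name `maui_ldi` and the dictionary used as a register file are assumptions made for illustration.

```python
def maui_ldi(regs, transaction_queue, reg, value):
    """Steps 1-2 of the LOADI description: the integer goes straight into the register file."""
    regs[reg] = value
    # Nothing is appended to transaction_queue: the steps describe no DRAM access.

regs = {}                 # MAUI register file: register number -> contents
transaction_queue = []    # pending DRAM commands
maui_ldi(regs, transaction_queue, 4, 2)   # hypothetical call: put the integer 2 in r4
```

Because the value travels with the command, the transaction queue stays empty and the operation completes inside the memory controller.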
MAUI Instruction Set III
MAUI_ADD <rd>,<rs1>,<rs2>,<rsz>
1) CPU invalidates addresses in the cache that fall within the range of the destination array. Addresses within the range of the source arrays are written back if dirty.
2) CPU sends an MAUI_ADD command to the MC (along with the register numbers) across the front-side bus.
3) MC interprets the command; the MAUI adds the appropriate registers and places a Write command and the next two Read commands in the transaction queue.
4) Step 3 repeats for the length of the array.
[Figure: MAU_ADD. Block diagram (CPU, Memory Controller with MAUI and MAU, DRAM System) with steps 1 to 4 marked, and a timeline with rows CPU, MC/MAUI, and DRAM on which the DRAM row shows the access pattern R R W.]
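The ADD sequence can be sketched functionally as follows. This is a simplified model under stated assumptions: memory is word-addressed, the cache is a dictionary mapping an address to a `(value, dirty)` pair, and the names `prepare_cache` and `maui_add` are hypothetical. It also issues each write before the next pair of reads one element at a time, without modelling the queueing overlap described in step 3.

```python
def prepare_cache(cache, dram, dst, src1, src2, size):
    """Step 1: make DRAM the authoritative copy before the MAUI touches it."""
    for i in range(size):
        cache.pop(dst + i, None)              # invalidate destination addresses
        for base in (src1, src2):
            line = cache.get(base + i)
            if line is not None and line[1]:  # dirty source line: write it back
                dram[base + i] = line[0]
                cache[base + i] = (line[0], False)

def maui_add(dram, dst, src1, src2, size):
    """Steps 3-4: per element, two reads and one write; returns the commands issued."""
    issued = []
    for i in range(size):
        a = dram[src1 + i]
        issued.append(("R", src1 + i))
        b = dram[src2 + i]
        issued.append(("R", src2 + i))
        dram[dst + i] = a + b
        issued.append(("W", dst + i))
    return issued

dram = {0: 1, 1: 2, 5: 10, 6: 20, 10: 0, 11: 0}
cache = {0: (100, True), 10: (999, False)}    # a dirty source line and a stale destination line
prepare_cache(cache, dram, dst=10, src1=0, src2=5, size=2)
issued = maui_add(dram, dst=10, src1=0, src2=5, size=2)
```

Without the write-back in `prepare_cache`, the MAUI would add the stale value at address 0, and without the invalidation the CPU could later read the stale cached copy of address 10.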
Issues: Read & Write Locks
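Only the title of this slide survives, so the following is one possible reading rather than the authors' design. It assumes that the MAUI keeps inclusive read-lock ranges over the source elements it has not yet read and a write-lock range over the destination elements it has not yet written, and that a CPU request waits when it would conflict with either. The function name and the conflict policy are assumptions.

```python
def cpu_request_must_wait(op, addr, read_locks, write_lock):
    """op is 'R' or 'W'; each lock is an inclusive (begin, end) range or None."""
    def inside(rng):
        return rng is not None and rng[0] <= addr <= rng[1]

    if inside(write_lock):
        return True    # destination not written yet: a read would see old data, a write would be overwritten
    if op == "W" and any(inside(r) for r in read_locks):
        return True    # source not read yet: the MAUI would pick up the new value
    return False       # reads of source data and accesses outside every lock proceed

# Hypothetical lock ranges for an add of two 2-element arrays into addresses 10..11.
read_locks = [(1, 1), (5, 6)]
write_lock = (10, 11)
blocked_read = cpu_request_must_wait("R", 10, read_locks, write_lock)
blocked_write = cpu_request_must_wait("W", 6, read_locks, write_lock)
free_read = cpu_request_must_wait("R", 5, read_locks, write_lock)
```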
Issues: Address Mapping
[Figure: the TLB mapping pages of Virtual Space onto Physical Space.]
Memory that is contiguous in virtual space may not be contiguous in physical space.
•MAUI assumes consecutive addressing (size register)
•MAUI operations which cross page boundaries must be split into separate operations for each page
•Programmer will not know mapping scheme
•Result: All MAUI operations will need to be privileged instructions, accessed by programs through a system call.
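The splitting rule can be sketched as below, assuming byte addresses and a 4096-byte page (the slides do not state a page size). The helper name `split_maui_op` is hypothetical, and the virtual-to-physical translation that the system call would apply to each piece is left out.

```python
def split_maui_op(dst, src1, src2, nbytes, page_size=4096):
    """Cut one array operation into pieces in which no array crosses a page boundary.

    Returns a list of (dst, src1, src2, length) tuples in virtual addresses.
    """
    ops = []
    done = 0
    while done < nbytes:
        # Bytes left before the nearest page boundary of any of the three arrays.
        room = min(page_size - ((base + done) % page_size) for base in (dst, src1, src2))
        n = min(room, nbytes - done)
        ops.append((dst + done, src1 + done, src2 + done, n))
        done += n
    return ops

# src1 starts 0x100 bytes before a page boundary, so a 0x200-byte add needs two pieces.
pieces = split_maui_op(dst=0x3000, src1=0x1F00, src2=0x5000, nbytes=0x200)
```

Each piece stays within one page for all three arrays, so it can be translated once and handed to the MAUI as a consecutive-address operation.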
Issues: Compiler Issues
• The compiler will be responsible for deciding when MAUI instructions should be used.
• This decision will be based on the size of the array, whether it is likely to be in the cache, and whether it is likely to be used by an instruction that isn't implemented in the MAUI.
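The compiler's decision could be expressed as a simple predicate over the three factors named above. Everything specific here is assumed for illustration: the set of supported operations, the length threshold, the direction of the size test (larger arrays favour the MAUI), and the function name.

```python
MAUI_OPS = {"add"}   # assumed: operations the MAUI implements

def should_use_maui(op, array_len, likely_cached, feeds_non_maui_op, min_len=1024):
    """Return True when offloading an array operation to the MAUI looks worthwhile.

    min_len is a hypothetical tuning threshold, not a figure from the slides.
    """
    return (
        op in MAUI_OPS
        and array_len >= min_len        # small arrays are assumed not to repay the offload
        and not likely_cached           # cached data is already close to the CPU
        and not feeds_non_maui_op       # the result would have to come back to the CPU anyway
    )

offload_large = should_use_maui("add", 60000, likely_cached=False, feeds_non_maui_op=False)
offload_small = should_use_maui("add", 60, likely_cached=True, feeds_non_maui_op=False)
```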
Issues: Task Interrupts
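[Figure: block diagram of CPU, Memory Controller with MAUI and MAU, and DRAM System, with a timeline whose rows are CPU, MEMORY CTRL/MAUI, and MEMORY. Task 1 and Task 2 each appear in two separate segments, with Task 1 shown twice on the MEMORY CTRL/MAUI row.]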
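Example: maui_add I
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_ld r1, 0
MAUI state (- means no value shown):
Size(r4) = -, Offset = -
RL1_beg = -, RL1_end = -
RL2_beg = -, RL2_end = -
WL_beg = -, WL_end = -
R1_Data = -, R1_Addr = 0, R1_status = -
R2_Data = -, R2_Addr = -, R2_status = -
R3_Data = -, R3_Addr = -, R3_status = -
MAU_Status = open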
Example: maui_add II
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_ld r2, 5
MAUI state (- means no value shown):
Size(r4) = -, Offset = -
RL1_beg = -, RL1_end = -
RL2_beg = -, RL2_end = -
WL_beg = -, WL_end = -
R1_Data = -, R1_Addr = 0, R1_status = -
R2_Data = -, R2_Addr = 5, R2_status = -
R3_Data = -, R3_Addr = -, R3_status = -
MAU_Status = open
Example: maui_add III
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_ld r3, 10
MAUI state (- means no value shown):
Size(r4) = -, Offset = -
RL1_beg = -, RL1_end = -
RL2_beg = -, RL2_end = -
WL_beg = -, WL_end = -
R1_Data = -, R1_Addr = 0, R1_status = -
R2_Data = -, R2_Addr = 5, R2_status = -
R3_Data = -, R3_Addr = 10, R3_status = -
MAU_Status = open
Example: maui_add IV
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_ld r4, 2
MAUI state (- means no value shown):
Size(r4) = 2, Offset = -
RL1_beg = -, RL1_end = -
RL2_beg = -, RL2_end = -
WL_beg = -, WL_end = -
R1_Data = -, R1_Addr = 0, R1_status = -
R2_Data = -, R2_Addr = 5, R2_status = -
R3_Data = -, R3_Addr = 10, R3_status = -
MAU_Status = open
Example: maui_add V
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_add r3, r1, r2
Shown in the figure: R, 0; R, 5
MAUI state (- means no value shown):
Size(r4) = 2, Offset = 0
RL1_beg = 0, RL1_end = 1
RL2_beg = 5, RL2_end = 6
WL_beg = 10, WL_end = 11
R1_Data = -, R1_Addr = 0, R1_status = w
R2_Data = -, R2_Addr = 5, R2_status = w
R3_Data = -, R3_Addr = 10, R3_status = u
MAU_Status = occupied
Example: maui_add VI
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_add r3, r1, r2*
Shown in the figure: Read 10; D1[0]
MAUI state (- means no value shown):
Size(r4) = 2, Offset = 0
RL1_beg = 1, RL1_end = 1
RL2_beg = 5, RL2_end = 6
WL_beg = 10, WL_end = 11
R1_Data = D1[0], R1_Addr = 0, R1_status = f
R2_Data = -, R2_Addr = 5, R2_status = w
R3_Data = -, R3_Addr = 10, R3_status = u
MAU_Status = occupied
Example: maui_add VII
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_add r3, r1, r2*
Shown in the figure: D2[0]; Read 10
MAUI state (- means no value shown):
Size(r4) = 2, Offset = 0
RL1_beg = 1, RL1_end = 1
RL2_beg = 6, RL2_end = 6
WL_beg = 10, WL_end = 11
R1_Data = D1[0], R1_Addr = 0, R1_status = f
R2_Data = D2[0], R2_Addr = 5, R2_status = f
R3_Data = -, R3_Addr = 10, R3_status = u
MAU_Status = occupied
Example: maui_add VIII
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_add r3, r1, r2*
Shown in the figure: R, 1; R, 6; W,10, D1[0]+D2[0]; Read 10
MAUI state (- means no value shown):
Size(r4) = 2, Offset = 1
RL1_beg = 1, RL1_end = 1
RL2_beg = 6, RL2_end = 6
WL_beg = 11, WL_end = 11
R1_Data = D1[0], R1_Addr = 0, R1_status = w
R2_Data = D2[0], R2_Addr = 5, R2_status = w
R3_Data = D1[0] + D2[0], R3_Addr = 10, R3_status = f
MAU_Status = occupied
Example: maui_add IX
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_add r3, r1, r2*
Shown in the figure: Write 6, D; D1[1]
MAUI state (- means no value shown):
Size(r4) = 2, Offset = 1
RL1_beg = NULL, RL1_end = NULL
RL2_beg = 6, RL2_end = 6
WL_beg = 11, WL_end = 11
R1_Data = D1[1], R1_Addr = 0, R1_status = f
R2_Data = -, R2_Addr = 5, R2_status = w
R3_Data = -, R3_Addr = 10, R3_status = u
MAU_Status = occupied
Example: maui_add X
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: maui_add r3, r1, r2*
Shown in the figure: D2[1]; Write 6, D
MAUI state (- means no value shown):
Size(r4) = 2, Offset = 1
RL1_beg = NULL, RL1_end = NULL
RL2_beg = NULL, RL2_end = NULL
WL_beg = 11, WL_end = 11
R1_Data = D1[1], R1_Addr = 0, R1_status = f
R2_Data = D2[1], R2_Addr = 5, R2_status = f
R3_Data = -, R3_Addr = 10, R3_status = u
MAU_Status = occupied
Example: maui_add XI
[Figure: Memory, Transaction Queue, Memory Controller, BIU]
Instruction: Next Instruction
Shown in the figure: W,10, D1[1]+D2[1]
MAUI state (- means no value shown):
Size(r4) = 2, Offset = 2
RL1_beg = NULL, RL1_end = NULL
RL2_beg = NULL, RL2_end = NULL
WL_beg = NULL, WL_end = NULL
R1_Data = D1[1], R1_Addr = 0, R1_status = u
R2_Data = D2[1], R2_Addr = 5, R2_status = u
R3_Data = D1[1] + D2[1], R3_Addr = 10, R3_status = f
MAU_Status = free?
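The way Offset and the lock ranges evolve across slides V to XI can be reproduced with a small trace generator. This is a reconstruction from the register dumps, not the authors' logic: it assumes each lock is an inclusive range that loses its first address when the corresponding read returns or write is issued, and becomes NULL (`None` here) once empty. The names `shrink` and `trace_maui_add` are hypothetical.

```python
def shrink(lock):
    """Drop the first address of an inclusive (beg, end) lock; None once it is empty."""
    beg, end = lock
    return (beg + 1, end) if beg < end else None

def trace_maui_add(r1_addr, r2_addr, r3_addr, size):
    """Return one snapshot of Offset and lock ranges per step of a maui_add."""
    state = {
        "offset": 0,
        "RL1": (r1_addr, r1_addr + size - 1),   # read lock on the first source array
        "RL2": (r2_addr, r2_addr + size - 1),   # read lock on the second source array
        "WL": (r3_addr, r3_addr + size - 1),    # write lock on the destination array
    }
    snapshots = [dict(state)]                   # state when maui_add is accepted
    for _ in range(size):
        state["RL1"] = shrink(state["RL1"])     # first operand has arrived
        snapshots.append(dict(state))
        state["RL2"] = shrink(state["RL2"])     # second operand has arrived
        snapshots.append(dict(state))
        state["offset"] += 1                    # sum queued for writing, move to next element
        state["WL"] = shrink(state["WL"])
        snapshots.append(dict(state))
    return snapshots

# Same operands as the worked example: r1 = 0, r2 = 5, r3 = 10, size = 2.
snapshots = trace_maui_add(0, 5, 10, 2)
```

With these operands the seven snapshots line up with the Offset, RL1, RL2, and WL values on slides V through XI. Under this model the second sum would be written to address 10 + Offset = 11, which is what the WL_beg = 11 entry suggests, although the final slide labels the write as "W,10".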
Advantages & Disadvantages
Advantages
•Better performance for DRAM-latency-bound computations
•Lower latency to DRAM compared to CPU
•Reduced traffic on front-side bus
•Concurrent execution
Disadvantages
•MAUI operates at a lower clock frequency
•Increased compiler complexity
•Increased fabrication costs (More Logic = More $$)
•Recently used data may not be cached
Alternative Implementation
MAUI Occupies its Own Read & Write Bus
[Figure: CPU, Memory Controller, MAUI with its MAU, and DRAM System, with a dedicated MAUI Read & Write Bus.]
•Good: Eliminates contention with the CPU for DRAM system resources.
•Good: Creates circular data flow, resulting in increased performance.
•Bad: Needs a specialized triple-ported DRAM system, leading to increased production costs.
Test Setup
• Simulated on SimpleScalar version 4.0
• One set of test benches with dual-array operations, run both on the MAUI and on the CPU, with four different array sizes. This trial was repeated for both shared and independent memory access buses.
• Found up to a 43% speedup!
Results
[Figure: Total CPU Cycles (axis ticks from 10,000 to 10,000,000) for 60-, 600-, 6,000-, and 60,000-integer arrays, comparing No MAUI, MAUI (Shared Bus), and MAUI (Separate Bus).]
Future Enhancements I
[Figure: DRAM System, Memory Controller, and a MAUI containing several MAUs, with a timeline whose rows are CPU, MEMORY CTRL/MAUI, and MEMORY showing Tasks 1, 2, and 3.]
•MAU multi-tasking
•Larger register file
•More MAUs for parallelism
•Small cache
Future Enhancements II
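[Figure: MAU_ADD timeline with rows CPU, MC/MAUI, and DRAM, contrasting the R R W access pattern with a longer sequence in which reads are grouped ahead of writes; block diagram of DRAM System, Memory Controller, MAUI, and MAU.]
•Better pipelining
•Larger register file to hold intermediate results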