trends in memory systemsblj/talks/sun-workshop.pdf• embedded systems designers care deeply about...

16
Trends in Memory Systems Prof. Bruce Jacob Keystone Professor & Director of Computer Engineering Program Electrical & Computer Engineering University of Maryland at College Park

Upload: others

Post on 21-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Trends in Memory Systems

Prof. Bruce JacobKeystone Professor & Director of Computer Engineering ProgramElectrical & Computer EngineeringUniversity of Maryland at College Park

Page 2: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Maryland Memory-Systems Research

• Cagdas Dirik, Ph.D. 2009. Performance Analysis of NAND Flash Memory Solid-State Disks. (SanDisk)

• Sadagopan Srinivasan, Ph.D. 2007. Prefetching vs. the Memory System: Optimizations for Multi-core Server Platforms. (Intel)

• Brinda Ganesh, Ph.D. 2007. Understanding and Optimizing High-Speed Serial Memory-System Protocols. (Intel)

• Ankush Varma, Ph.D. 2007. High-Speed Performance, Power, and Thermal Co-Simulation for SoC Design. (Intel)

• Nuengwong (Ohm) Tuaycharoen, Ph.D. 2006. Disk Design-Space Exploration in Terms of System-Level Performance, Power, and Energy Consumption. (Dhurakijpundit University, Thailand)

• Samuel Rodriguez, Ph.D. 2006. myCACTI: A New Cache-Design Tool for Pipelined Nanometer Caches. (AMD)

• Aamer Jaleel, Ph.D. 2005. The Effects of Out-of-Order Execution on the Memory System. (Intel)

• David Tawei Wang, Ph.D. 2005. Modern DRAM Memory Systems: Performance Analysis and a High Performance, Power-Constrained DRAM-Scheduling Algorithm. (MetaRAM, RIP)

Page 3: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Yesterday’s high-performance technologies are today’s embedded technologies, but yesterday’s embedded-systems issues are today’s high-performance issues

Ankush Varma, U. Maryland PhD 2007 (Intel)

Page 4: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

What’s the point, exactly?

• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design.

• The memory system has become the dominant concern in performance, and it is rapidly becoming a/the dominant concern in power. Our ability to deliver main-memory capacity is nonexistent, limited by bandwidth and power. Bandwidth is also a problem, limited by power/heat, as well as physical size. In larger systems, correctness is critical. The list goes on.

• The embedded-systems community has SOLVED (or at least ADDRESSED more-or-less successfully) issues of correctness, power/space, etc. … in particular the very issues that now confront the general-purpose community.

• Pretty obvious where to look if you want to predict the near-term future …

Page 5: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Problem: Capacity

Page 6: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Problem: Capacity

MC MC

JEDEC DDRx~10W/DIMM, ~20W total

FB-DIMM~10W/DIMM, ~300W total

Page 7: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Problem: Bandwidth

• Like capacity, primarily a power and heat issue: can get more BW by adding busses, but they need to be narrow & thus fast. Fast = hot.

• Required BW per core is roughly 1 GB/s, and cores per chip is increasing

• Graph: Thread-based load (SPECjbb), memory set to 52GB/s sustained … cf. 32-core Sun Niagara: saturates at 25.6 GB/s

Page 8: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Problem: TLB Reach

• Doesn’t scale at all (still small and not upgradeable)

• Currently accounts for 20+%of system overhead

• Higher associativity (which offsets the TLB’s small size) can create a power issue

• The TLB’s “reach” is actually much worse than it looks,because of differentaccess granularities

!

"#$%&$'$()*+'((%,-'*-.)$,',/)0$

!"#$%&'()*(+,-.'/0'(1"*-2/ 12%3)($*4%$3'56($4-*/-5$*/)$()*$'57$4-88$5%*$2)98',)$*/)$:%(*+2),)5*8;+<()7$

38%,60$12%.-7)($9%4)2$7-((-9'*-%5$*/'*$-($=#$%&$'$/->/8;+3'56)7$7-2),*+:'99)7$,',/)?$',,)(($*-:)$*/'*$-($@#$

%&$'$/->/8;+3'56)7$7-2),*+:'99)7$,',/)?$'57$9)2&%2:'5,)$*/'*$-($"#$%&$'$()*+'((%,-'*-.)$,',/)0$

A8)'28;?$).)5$*/)$/->/+9)2&%2:'5,)$.)2(-%5$-($

!""#$%&#'()'(*%+,'

!"#$%&'()'*+&'+,-+.,--/0",1"2&'0,0+&3' -.(#/0&%)12%$1/)#%)3#/,(0%$1/)#/4#%#.%'.*%''/51%$16(#5%5.(#1'#5/+,%0(3#$/#$.%$#/4#%#$0%31$1/)%"#)*7%8'($*%''/51%$16(#5%5.(#9:;#%)3#%#$0%31$1/)%"#)*7%8#:%)<(3#310(5$*+%,,(3#5%5.(#9:;=#>)#,%0$15?"%0@#)/$(#$.%$#$.(#$0%31$1/)%"#)*7%8#'($*%''/51%$16(5%5.(#3016('#$.(#'()'(#%+,'#4/0#)#$%&#%00%8'#%)3#+%<('#)#$%&#5/+,%01'/)'=#-.(#310(5$*+%,,(3#%)3#.%'.*%''/51%$16(#5%5.('#(%5.#3016(#/)(#'($/4#'()'(#%+,'#%)3#+%<(#/)(#$%&#5/+,%01'/)=#

A?$,?$#

B44(5$16(#!330(''

CDE@

F/03

-GH

'($#I :8$(

'()'( '()'(

-!J K!-!

'()'( '()'(

-!J K!-!

H8$(#1)#H"/5<

C(0+1''1/)'

L%5.(>)3(M

A)(#'($

4

N1$ON1$O

%0(#%5$16%$(3A)"8#/)(#:%)<

A?$,?$#

B44(5$16(#!330(''

CDE@

F/03

-GH

'($#I :8$(

'()'( '()'(

-!J K!-!

'()'( '()'(

-!J K!-!

H8$(#1)#H"/5<

C(0+1''1/)'

L%5.(>)3(M

A)(#'($

4

N1$O

1'#%5$16%$(3

:%)<#'("(5$

PCE#:1$'

PCE#:1$'

5,6'*%,7"1"/8,9'8.:,;'-&1.,--/0",1"2&'0,0+& 5<6'*%,7"1"/8,9'7"%&01.=,>>&7'0,0+&?'8'<,8@-

A)"8#/)(#:%)<

A?$,?$#

B44(5$16(#!330(''

CDE@

F/03

-GH

:%)<#'("(5$ '($#I :8$(

'()'( '()'(

-!J K!-!

'()'( '()'(

-!J K!-!

H8$(#1)#H"/5<

C(0+1''1/)'

L%5.(>)3(M

A)(#'($

4

N1$O

1'#%5$16%$(3

.%'.:%)<#'("(5$

PCE#:1$'

506'A,-+.,--/0",1"2&'0,0+&

:

Maps ~1MB

~10MB

Page 9: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Trend: Disk, Flash, and other NV

Rotating Disks vs. SSDsMain take-aways

Forget everything you knew about rotating disks. SSDs are different

SSDs are complex software systems

One size doesn’t fit all

Rotating Disks vs. SSDsMain take-aways

Forget everything you knew about rotating disks. SSDs are different

SSDs are complex software systems

One size doesn’t fit all

Magnet structure of

voice coil motor

Spindle & Motor

Disk

Actuator

Flash

Memory Arrays

Load / Unload

Mechanism

(a) HDD (b) SSD

• Flash is currently eating Disk’s lunch

• PCM is expected to eat Flash’s lunch

Page 10: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Obvious Conclusions I

• A new take on superpages that might overcome previous barriers• A new cache design that enables very large L1 caches• A virtual memory system for modern capacities

!ese are ideas that have been in development in our research group over the past 5–6 years. Fully Bu!ered DIMM, take 2 (aka “BOB”)

In the near term, the desired solution for the DRAM system is one that allows existing commodity DDRx DIMMs to be used, one that supports 100 DIMMs per CPU socket at a bare minimum, and one that does not require active heartbeats to keep its channels alive—i.e., it

What Every CS/CE Needs to Know about the Memory System — Bruce Jacob, U. Maryland

31

CPU (e.g. multicore)

MC MC MC

Master Memory Controller

MC MC MC

Figure X. A DRAM-system organization to solve the capacity & power problems

Fast, wide channel Fast, narrow channels

Slow, wide channel

• Want capacity without sacrificing bandwidth

• Need a new memory system architecture

• This is coming(details will change,of course)

Page 11: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Obvious Conclusions II

• Flash/NV is inexpensive, is fast (rel. to disk), and has better capacity roadmap than DRAM

• Make it a first-class citizen in the memory hierarchy

• Access it via load/store interface, use DRAM to buffer writes, software management

• Probably reduces capacity pressure on DRAM system

$CPU

Page 12: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Obvious Conclusions II

• Flash/NV is inexpensive, is fast (rel. to disk), and has better capacity roadmap than DRAM

• Make it a first-class citizen in the memory hierarchy

• Access it via load/store interface, use DRAM to buffer writes, software management

• Probably reduces capacity pressure on DRAM system

$CPU

DRAMFLASH

Page 13: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Obvious Conclusions III

• Reduce translation overhead (both in performance and in power)

• Need an OS/arch redesign

• Revisit superpages,multi-level TLBs

• Revisit SASOS concepts,*location of translation point/s*

• Probably most suited for the high-end, at least initially

$CPU

DRAMFLASH*

**

Page 14: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

… and while we’re on the topic of high-end …

Enterprise & Super- Computing

• Run same app (set of apps) 24x7

• Developers spend significant time/energy optimizing apps

• Frequently run a custom (or at least fine-tune the existing) OS

• Have significant, pressing correctness/failure/dependability issues=> not intrinsic to application area, but because of large-scale multipliers

• Care very deeply about energy consumption and heat dissipation=> not intrinsic to application area, but because of large-scale multipliers

• Sounds a lot like embedded systems, no?

Page 15: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Acknowledgements & Shameless Plugs

• Much of this has appeared previously in our books, papers, etc.

• The Memory System (You Can’t Avoid It; You Can’t Ignore It; You Can’t Fake It). B. Jacob, with contributions by S. Srinivasan and D. T. Wang. ISBN 978-1598295870. Morgan & Claypool Publishers: San Rafael CA, 2009.

• Memory Systems: Cache, DRAM, Disk. B. Jacob, S. Ng, and D. Wang, with contributions by S. Rodriguez. ISBN 978-0123797513. Morgan Kaufmann: San Francisco CA, 2007.

• Support from Intel, DoD, DOE, Sandia National Lab, Micron, Cypress Semiconductor

Page 16: Trends in Memory Systemsblj/talks/Sun-Workshop.pdf• Embedded systems designers care deeply about power & heat dissipation, cost, physical size, and correctness of design. • The

Questions?(thank you for your kind indulgence)

Prof. Bruce JacobKeystone Professor & Director of Computer Engineering ProgramElectrical & Computer EngineeringUniversity of Maryland at College Park

[email protected]/~blj

… or just google “bruce jacob”