scalable high performance main memory system using pcm...

24
© 2007 IBM Corporation International Symposium on Computer Architecture (ISCA-2009) Jul-28-09 Scalable High Performance Main Memory System Using PCM Technology Moinuddin K. Qureshi Viji Srinivasan and Jude Rivers IBM T. J. Watson Research Center, Yorktown Heights, NY

Upload: others

Post on 26-Apr-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation International Symposium on Computer Architecture (ISCA-2009) Jul-28-09

Scalable High Performance Main Memory System Using PCM Technology

Moinuddin K. Qureshi Viji Srinivasan and Jude Rivers

IBM T. J. Watson Research Center, Yorktown Heights, NY

Page 2: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 2

Main Memory Capacity Wall

More cores in system More concurrency Larger working set

Demand for main memory capacity continues to increase

Main Memory System consisting of DRAM are hitting: 1. Cost wall: Major % of cost of large servers is main memory 2. Scaling wall: DRAM scaling to small technology is challenge 3. Power wall:

IBM P670 Server Processor Memory Small (4 proc, 16GB) 384 Watts 314 Watts

Large (16 proc, 128GB) 840 Watts 1223 Watts Source: Lefurgy et al. IEEE Computer 2003

Need a practical solution to increase main-memory capacity

Page 3: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 3

The Technology Hierarchy

More capacity by cheaper, denser, (slower) technology

21 23 27 211 213 215 219 223

Typical access latency in processor cycles (@ 4 GHz)

L1(SRAM) EDRAM DRAM HDD

25 29 217 221

Flash PCM

Phase Change Memory (PCM) promising candidate for large capacity main memory

High-Performance Disk Memory System

Page 4: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 4

Outline

  Introduction   What is PCM ?   Hybrid Memory System   Evaluation   Lifetime Analysis   Summary

Page 5: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 5

What is Phase Change Memory?

Phase change material (chalcogenide glass) exists in two states: 1. Amorphous: high resistivity 2. Crystalline: low resistivity

Materials can be switched between states reliably, quickly, large number of times

PCM stores data in terms of resistance •  Low resistance (SET state) = 1 •  High resistance (RESET state) = 0 I

Bit Line

Word Line

N

Word Line

N N

Page 6: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 6

How does PCM work ? Tmelt

Tcryst

Time [ns]

RESET

SET

Tem

pera

ture

Switching by heating using electrical pulses

SET: sustained current to heat cell above Tcryst

RESET: cell heated above Tmelt and quenched

Large Current

SET Low resistance

103-104 Ω

Small Current

RESET High resistance

Access Device

Memory Element

106-107 Ω

Photo Courtesy: Bipin Rajendran, IBM

Page 7: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 7

Key Characteristics of PCM

+ Scales better than DRAM, small cell size Prototypes as small as 3nm x 20 nm fabricated and tested [Raoux+ IBMJRD’08]

+ Can store multiple bits/cell More density in the same area Prototypes with 2 bits/cell in ISSCC’08. >2 bits/cell expected soon.

+ Non-Volatile Memory Technology Data retention of 10 years Power implications, system implications

Challenges: -  More latency compared to DRAM. -  Limited Endurance (~10 million writes per cell) -  Write bandwidth constrained, so better to write less often.

Page 8: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 8

Outline

  Introduction   What is PCM ?   Hybrid Memory System   Evaluation   Lifetime Analysis   Summary

Page 9: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 9

Hybrid Memory System

Hybrid Memory System: 1.  DRAM as cache to tolerate PCM Rd/Wr latency and Wr bandwidth 2.  PCM as main-memory to provide large capacity at good cost/power

DATA W

PCM Main Memory

DATA T

DRAM Buffer

PCM Write Queue

T=Tag-Store

Processor

Flash Or

HDD

Page 10: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 10

Lazy Write Architecture

Problem: Double PCM writes to dirty pages on install

DRAM Buffer PCM

Flash/Disk

WRQ

Processor

For example: Daxpy Kernel: Y[i] = Y[i] + X[i] Baseline has 2 writes for Y[i] and 1 for X[i] Lazy write has 1 write for Y[i] and 1 for X[i]

Page 11: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 11

Line Level Write Back

Problem: Not all lines in a dirty page are dirty Solution: Dirty bits per line in DRAM buffer and

write-back only dirty lines from DRAM to PCM

Problem: With LLWB, not all lines in dirty pages are written uniformly

Num

Writ

es to

Eac

h Li

ne (M

ln)

db1 db2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Average Average

Line_id

Page 12: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 12

Fine Grained Wear Leveling

Solution: Fine Grained Wear Leveling (FGWL) -When a page gets allocated page is rotated by a random shift value -The rotate value remains constant while page remains in memory -On replacement of a page, a new random value is assigned for a new page -Over time, the write traffic per line becomes uniform.

FGWL makes writes across lines in a dirty page uniform

Num

Writ

es to

Eac

h Li

ne (M

ln)

db1 db2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Average Average

Line_id

Page 13: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 13

Outline

  Introduction   What is PCM ?   Hybrid Memory System   Evaluation   Lifetime Analysis   Summary

Page 14: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 14

Evaluation Framework

Trace Driven Simulator: 16-core system (simple core), 8GB DRAM main-memory at 320 cycles HDD (2 ms) with Flash (32 us) with Flash hit-rate of 99%

Workloads: Database workloads & Data parallel kernels

1. Database workloads: db1 and db2 2. Unix utilities: qsort and binary search 3. Data Mining : K-means and Gauss Seidal 4. Streaming: DAXPY and Vector Dot Product

Assumption: PCM 4X denser & 4X slower than DRAM 32GB @ 1280 cycle read latency

Page 15: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 15

Reduction in Page Faults

Benefit from capacity Need >16GB Streaming

Page 16: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 16

Impact on Execution Time

PCM with DRAM buffer performs similar to equal capacity DRAM storage

Page 17: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 17

Impact of PCM Latency

Hybrid memory system is relatively insensitive to PCM Latency

DRAM-8GB DRAM-32GB PCM-32GB HYBRID (1+32)GB

Page 18: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 18

Power Evaluations

Significant Power and Energy savings with PCM based hybrid memory system

Page 19: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 19

Outline

  Introduction   What is PCM ?   Hybrid Memory System   Evaluation   Lifetime Analysis   Summary

Page 20: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 20

Impact of Write Endurance

F Frequency of System (4GHz) Y = Number of years (lifetime)

There are 225 seconds in a year

Num. cycles in Y years = Y. F.225

B Bytes/Cycle written to PCM S PCM capacity in bytes Wmax Max writes per PCM cell Assuming uniform writes to PCM

Endurance (in cycles) = (S/B).Wmax

Y = (S/B). Wmax F.225

If Wmax = 10 million, PCM will last for 2.5 years

For a 4GHz System, a 32GB PCM written at

1 Byte per Cycle

Y = Wmax 4 million

Page 21: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 21

Lifetime Results

Configuration Avg. Bytes/Cycle Avg. Lifetime

1GB DRAM + 32GB PCM 0.807 3.0 yrs

+ Lazy Write 0.725 3.4 yrs

+ Line Level Write Back 0.316 7.6 yrs

+ Bypass Streaming Apps 0.247 9.7 yrs

Table shows average bytes per cycle written to PCM and Average lifetime of PCM assuming Wmax = 10 million

Proposed filtering techniques reduce write traffic to PCM by 3.2X, increasing its lifetime from 3 to 9.7 years

Page 22: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 22

Outline

  Introduction   What is PCM ?   Hybrid Memory System   Evaluation   Lifetime Analysis   Summary

Page 23: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 23

Summary

  Need more main memory capacity: DRAM hitting power, cost, scaling wall

  PCM is an emerging technology – 4x denser than DRAM but with slower access time and limited write endurance

  We propose a Hybrid Memory System (DRAM+PCM) that provides significant power and performance benefits

  Proposed write filtering techniques reduce writes by 3x and increase PCM lifetime from 3 years to 9 years

Not touched in this talk but important: Exploiting non-volatile memories for system enhancement & related OS issues.

Page 24: Scalable High Performance Main Memory System Using PCM ...web.cecs.pdx.edu/~zeshan/qureshi_isca09.pdf · Scalable High Performance Main Memory System Using PCM Technology ... The

© 2007 IBM Corporation 24

Thanks!