![Page 1: An Empirical Guide to Scalable Persistent Memory · Persistent Memory Joseph Izraelevitz, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, ... –32 GB Micron DDR4 2666 MHz –384 GB across](https://reader033.vdocuments.us/reader033/viewer/2022052006/601ad07c63485a0f147d2d81/html5/thumbnails/1.jpg)
An Empirical Guide to Scalable Persistent Memory
Joseph Izraelevitz, Jian Yang, Lu Zhang, Juno Kim, Xiao Liu, Amirsaman Memaripour, Yun Joon Soh, Zixuan Wang, Yi Xu,
Subramanya R. Dulloor, Jishen Zhao, Steven Swanson
Non-Volatile Systems Laboratory
Department of Computer Science & Engineering
University of California, San Diego
Department of Electrical, Computer & Energy Engineering
University of Colorado, Boulder
Optane DIMMs are Here!
• Not Just Slow Dense DRAM™
• Slower, denser media
-> More complex architecture
-> Second-order performance anomalies
-> Fundamentally different
Basics: Emulation is misleading
[Figure: RocksDB; series labels: Optane, DRAM]
Outline
• Background
• Basics: Optane DIMM Performance
• Lessons: Optane DIMM Best Practices
• Conclusion
Background
Background: Optane in the Machine
[Diagram labels]
Core, Core, Core (each with L1/2)
L3
iMC, iMC
Memory Mode: DRAM x3 as Near Memory (Direct Mapped Cache); Optane DIMM x3 as Far Memory
AppDirect Mode: DRAM x3; Optane DIMM x3
Background: Optane Internals
[Diagram labels]
From CPU -> iMC (with WPQ) -> Optane DIMM
Optane DIMM: Optane Controller, Optane Buffer, AIT Cache, AIT, Optane Media (256B Block)
ADR (Asynchronous DRAM Refresh)
Inset: $, MC, DRAM, Optane
Background: Optane Interleaving
• Optane is interleaved across NVDIMMs at 4KB granularity.
[Diagram: six DIMMs, Optane1 to Optane6. The physical address space is divided into 4KB blocks at 0, 4KB, 8KB, 12KB, 16KB and 20KB, one per DIMM; the block at 24KB wraps around to Optane1.]
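As a minimal sketch of the 4KB interleaving described on this slide, the Python below maps an offset in the interleaved address space to a DIMM. It assumes six DIMMs and a plain round-robin order; that ordering, and the names `INTERLEAVE_BYTES`, `NUM_DIMMS` and `dimm_for_offset`, are illustrative assumptions rather than anything the slides specify.

```python
INTERLEAVE_BYTES = 4 * 1024   # 4KB interleave granularity (from the slide)
NUM_DIMMS = 6                 # six Optane DIMMs (from the slide)


def dimm_for_offset(offset: int) -> int:
    """Return the 1-based DIMM index that holds `offset`.

    Assumes plain round-robin ordering across DIMMs, which is a
    simplification; the platform may order DIMMs differently.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return (offset // INTERLEAVE_BYTES) % NUM_DIMMS + 1


if __name__ == "__main__":
    # Walk the first stripe and the wrap-around at 24KB.
    for kb in (0, 4, 8, 12, 16, 20, 24):
        print(f"{kb:>2}KB -> Optane{dimm_for_offset(kb * 1024)}")
```

Under these assumptions the pattern repeats every 24KB (6 DIMMs x 4KB), which is the stripe size the later contention slides refer to.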
Basics: How does Optane DC perform?
Basics: Our Approach
• Microbenchmark sweeps across state space
– Access Patterns (random, sequential)
– Operations (read, ntstore, clflush, etc.)
– Access/Stride Size
– Power Budget
– NUMA Configuration
– Address Space Interleaving
• Targeted experiments
• Total: 10,000 experiments
Test Platform
• CPU
– Intel Cascade Lake, 24 cores at 2.2 GHz in 2 sockets
– Hyperthreading off
• DRAM
– 32 GB Micron DDR4 2666 MHz
– 384 GB across 2 sockets w/ 6 channels
• Optane
– 256 GB Intel Optane 2666 MHz QS
– 3 TB across 2 sockets w/ 6 channels
• OS
– Fedora 27, Linux kernel 4.13.0
Basics: Latency
• 2x-3x as slow as DRAM
• Write latency masked by ADR
Basics: Bandwidth
• ~Scalable reads, Non-scalable writes
Basics: Bandwidth
• Access size matters
Basics: Bandwidth
• A mystery!
*answer in ~10 minutes
Lessons: What are Optane Best Practices?
• Avoid small random accesses
• Use ntstores for large writes
• Limit threads accessing one NVDIMM
Lesson #1: Avoid small random accesses
[Diagram: the Optane internals again. Labels: From CPU, iMC (with WPQ), Optane DIMM (Optane Controller, Optane Buffer, AIT Cache, AIT), Optane Media (256B Block). Inset: $, MC, Optane, DRAM]
Lessons: Optane Buffer Size
• Write amplification if working set is larger than Optane Buffer
Lesson #1: Avoid small random accesses
• Bad bandwidth with:
– Small random writes (<256B)
– A working set per NVDIMM that is not tiny (>16KB)
• Good bandwidth with:
– Sequential accesses
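The two thresholds above (256B blocks, 16KB working set per NVDIMM) can be turned into a back-of-the-envelope estimate of write amplification. The sketch below is a deliberately simplified model, not a measurement: it assumes the buffer combines writes perfectly when the working set fits, and that every small random write otherwise costs whole 256B blocks on the media. The function name and constants are hypothetical.

```python
BLOCK_BYTES = 256          # Optane media access granularity (from the slides)
BUFFER_BYTES = 16 * 1024   # working-set threshold per NVDIMM (from the slides)


def estimated_write_amplification(access_bytes: int, working_set_bytes: int) -> float:
    """Estimate media bytes written per byte the program writes.

    Simplified model (an assumption, not a measurement): if the working
    set fits in the buffer, writes are assumed to combine perfectly;
    otherwise each random write is rounded up to whole 256B blocks.
    """
    if access_bytes <= 0 or working_set_bytes <= 0:
        raise ValueError("sizes must be positive")
    if working_set_bytes <= BUFFER_BYTES:
        return 1.0
    # Ceiling division: number of 256B blocks touched (assumes alignment).
    blocks = -(-access_bytes // BLOCK_BYTES)
    return blocks * BLOCK_BYTES / access_bytes


if __name__ == "__main__":
    for size in (64, 128, 256):
        amp = estimated_write_amplification(size, 1 << 20)
        print(f"{size}B random writes, 1MB working set: ~{amp:.1f}x")
```

In this model a 64B random write over a large working set costs about four times its size in media writes, while 256B writes cost nothing extra.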
Lesson #2: Use ntstores for large writes
• Non-temporal stores bypass the cache
Lessons: Store instructions
[Diagram: store paths from the cache ($) through the MC to Optane or DRAM, with numbered steps 1 to 3]
• ntstore
• store + clwb (*lost bandwidth)
• store (*lost locality)
Lesson #2: Use ntstores for large writes
• Non-temporal stores bypass the cache
– Avoid cache-line read
– Maintain locality
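The "avoid cache-line read" point can be illustrated with a rough traffic count. The sketch below assumes 64B cache lines, a destination that is not already cached, and that a regular store reads each line from memory before it is later written back, while a non-temporal store only writes. The function name is hypothetical, and the model ignores cache hits and write-combining details.

```python
CACHE_LINE_BYTES = 64  # assumed x86 cache-line size


def memory_traffic_bytes(write_bytes: int, non_temporal: bool) -> dict:
    """Count bytes read from and written to memory for one large write.

    Assumes a cache-line-aligned destination that is not cached.
    Regular stores (followed by clwb) read each line before modifying
    it; non-temporal stores skip that read.
    """
    if write_bytes < 0:
        raise ValueError("write_bytes must be non-negative")
    lines = -(-write_bytes // CACHE_LINE_BYTES)  # ceiling division
    written = lines * CACHE_LINE_BYTES
    read = 0 if non_temporal else written
    return {"read": read, "written": written}


if __name__ == "__main__":
    print("ntstore     :", memory_traffic_bytes(4096, non_temporal=True))
    print("store + clwb:", memory_traffic_bytes(4096, non_temporal=False))
```

Under these assumptions a 4KB write with regular stores moves twice as many bytes across the memory bus as the same write with ntstores, which is the bandwidth the slide describes as lost.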
Lesson #3: Limit threads accessing one NVDIMM
• Contention at Optane Buffer
• Contention at iMC
Lessons: Contention at Optane Buffer
• More threads = access amplification = lower bandwidth
Lessons: Contention at media & iMC
• iMC queues are small & media is slow
– Short queues end up clogged
Lessons: Contention at media & iMC
• Contention is largest when random access size = interleave size (4KB)
• Load is fairest when access size = #DIMMs x interleave size (24KB)
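To see why these two access sizes behave so differently, the following sketch counts how many bytes of a single access land on each DIMM. It assumes six DIMMs, 4KB interleaving, and a round-robin DIMM order with DIMMs numbered 0 to 5; the ordering and the name `bytes_per_dimm` are assumptions made for illustration.

```python
from collections import Counter

INTERLEAVE_BYTES = 4 * 1024   # interleave size (from the slides)
NUM_DIMMS = 6                 # number of DIMMs (from the slides)


def bytes_per_dimm(offset: int, size: int) -> Counter:
    """Return how many bytes of [offset, offset + size) land on each DIMM.

    DIMMs are numbered 0..5 in an assumed round-robin order.
    """
    load = Counter()
    pos, end = offset, offset + size
    while pos < end:
        # End of the 4KB interleave chunk that contains `pos`.
        chunk_end = min(end, (pos // INTERLEAVE_BYTES + 1) * INTERLEAVE_BYTES)
        load[(pos // INTERLEAVE_BYTES) % NUM_DIMMS] += chunk_end - pos
        pos = chunk_end
    return load


if __name__ == "__main__":
    print("4KB aligned access :", dict(bytes_per_dimm(0, 4 * 1024)))
    print("24KB access        :", dict(bytes_per_dimm(8 * 1024, 24 * 1024)))
```

In this model an interleave-aligned 4KB access lands entirely on one DIMM, so concurrent random 4KB accesses can pile up on the same DIMM, whereas any contiguous 24KB range places exactly 4KB on each of the six DIMMs.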
Lesson #3: Limit threads accessing one NVDIMM
• Contention at Optane Buffer
– Increases access amplification
• Contention at media/iMC
– Loses bandwidth through uneven NVDIMM load
– Avoid interleave-aligned random accesses
Conclusion
• Not Just Slow Dense DRAM
• Slower media
-> More complex architecture
-> Second-order performance anomalies
-> Fundamentally different
• Max performance is tricky
– Avoid small random accesses
– Use ntstores for large writes
– Limit threads accessing one NVDIMM