transcendent memory: not just for virtualization anymore

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. 1

Upload: avi-miller

Post on 12-Jul-2015


Page 1: Transcendent Memory: Not Just for Virtualization Anymore

Page 2: Transcendent Memory: Not Just for Virtualization Anymore

Transcendent Memory

Avi Miller

Principal Program Manager

ORACLE

Page 3: Transcendent Memory: Not Just for Virtualization Anymore

Further reading:

https://lwn.net/Articles/454795/

Page 4: Transcendent Memory: Not Just for Virtualization Anymore

Objectives

Utilise RAM more effectively

– Lower capital costs

– Lower power utilisation

– Less I/O

Better performance on many workloads

– Negligible loss on others

Page 5: Transcendent Memory: Not Just for Virtualization Anymore

Motivation: Memory-inefficient workloads

Page 6: Transcendent Memory: Not Just for Virtualization Anymore

More motivation: memory capacity wall

Memory capacity per core drops ~30% every 2 years

Source: Disaggregated Memory for Expansion and Sharing in Blade Server

http://isca09.cs.columbia.edu/pres/24.pptx

Chart: # Core and GB DRAM by year, 2003 to 2017 (logarithmic scale, 1 to 1000)

Page 7: Transcendent Memory: Not Just for Virtualization Anymore

Page 8: Transcendent Memory: Not Just for Virtualization Anymore

Slide from: Linux kernel support to exploit phase change memory, Linux Symposium 2010, Youngwoo Park, EE KAIST

Page 9: Transcendent Memory: Not Just for Virtualization Anymore

Disaggregated memory

Source: Disaggregated Memory for Expansion and Sharing in Blade Server

http://isca09.cs.columbia.edu/pres/24.pptx

Leverage fast, shared communication fabrics

Diagram: compute blades (CPUs with local DIMMs) attached through a fabric labelled “Exofabric” to a memory blade holding additional DIMMs

Page 10: Transcendent Memory: Not Just for Virtualization Anymore

OS memory “demand”

Operating systems are memory hogs!

Diagram labels: OS, Memory constraint

Page 11: Transcendent Memory: Not Just for Virtualization Anymore

OS Physical Memory Management

If you give an operating system more memory…

Diagram labels: OS, New larger memory constraint

Page 12: Transcendent Memory: Not Just for Virtualization Anymore

OS Physical Memory Management

… it uses up any memory you give it!

My name is Linux and I am a memory hog

Diagram label: Memory constraint

Page 13: Transcendent Memory: Not Just for Virtualization Anymore

OS Memory “Asceticism”

ASSUME

– We should use as little RAM as possible

SUPPOSE

– Mechanism to allow the OS to surrender RAM

– Mechanism to allow the OS to obtain more RAM

THEN

– How does an OS decide how much RAM it actually needs?

as-cet-i-cism, n. 1. extreme self-denial and austerity; rigorous self-discipline and active restraint; renunciation of material comforts so as to achieve a higher state

Page 14: Transcendent Memory: Not Just for Virtualization Anymore

Impact on Linux Memory Subsystem

Page 15: Transcendent Memory: Not Just for Virtualization Anymore

Page 16: Transcendent Memory: Not Just for Virtualization Anymore

Page 17: Transcendent Memory: Not Just for Virtualization Anymore

Page 18: Transcendent Memory: Not Just for Virtualization Anymore

CAPACITY KNOWN

Can read or write to any byte.

Page 19: Transcendent Memory: Not Just for Virtualization Anymore

CAPACITY KNOWN

Can read or write to any byte.

CAPACITY UNKNOWN and may change dynamically!

Page 20: Transcendent Memory: Not Just for Virtualization Anymore

• CAPACITY: known

• USES:

• kernel memory

• user memory

• DMA

• ADDRESSABILITY:

• Read/write any byte

Page 21: Transcendent Memory: Not Just for Virtualization Anymore

• CAPACITY: known

• USES:

• kernel memory

• user memory

• DMA

• ADDRESSABILITY:

• Read/write any byte

• CAPACITY

- “unknowable”

- dynamic

SO… kernel/CPU can’t address directly!

SO… need “permission” to access and need to “follow rules” (even the kernel!)

Page 22: Transcendent Memory: Not Just for Virtualization Anymore

• CAPACITY: known

• USES:

• kernel memory

• user memory

• DMA

• ADDRESSABILITY:

• Read/write any byte

• THE RULES

1. “page”-at-a-time

2. to put data here, kernel MUST use a “put page call”

3. (more rules later)

Page 23: Transcendent Memory: Not Just for Virtualization Anymore

Page 24: Transcendent Memory: Not Just for Virtualization Anymore

We have a page that contains: [image: Tux]

And the kernel wants to “preserve” Tux in Type B memory.

Page 25: Transcendent Memory: Not Just for Virtualization Anymore

We have a page that contains: [image: Tux]

And the kernel wants to “preserve” Tux into Type B memory… but…

Kernel MUST ask permission and may get told NO!

[Type B memory] may say NO to kernel!

Page 26: Transcendent Memory: Not Just for Virtualization Anymore

We have a page that contains: [image: Tux]

And the kernel wants to “preserve” Tux into Type B memory.

Two choices…

1. DEFINITELY want Tux back (e.g. “dirty” page)

[Type B memory] may say NO to kernel!

[Type B memory] may commit to keeping the page around…

Page 27: Transcendent Memory: Not Just for Virtualization Anymore

We have a page that contains: [image: Tux]

And the kernel wants to “preserve” Tux into Type B memory.

Two choices…

1. DEFINITELY want Tux back

2. PROBABLY want Tux back (but OK if disappears, e.g. “clean” pages)

[Type B memory] may say NO to kernel!

[Type B memory] may commit to keeping the page around… or may not!

Page 28: Transcendent Memory: Not Just for Virtualization Anymore

We have a page that contains: [image: Tux]

Two choices…

1. DEFINITELY want Tux back

2. PROBABLY want Tux back

tran-scend-ent, adj., … beyond the range of normal perception

Page 29: Transcendent Memory: Not Just for Virtualization Anymore

We have a page that contains: [image: Tux]

Two choices…

1. DEFINITELY want Tux back: “PERSISTENT PUT”

2. PROBABLY want Tux back: “EPHEMERAL PUT”

eph-em-er-al, adj., … transitory, existing only briefly, short-lived (i.e. NOT persistent)

tran-scend-ent, adj., … beyond the range of normal perception

Page 30: Transcendent Memory: Not Just for Virtualization Anymore

Core Transcendent Memory Operations

“PUT”

“GET”

“FLUSH”
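The three operations and the persistent/ephemeral distinction from the preceding slides can be sketched as a toy page store. This is only an illustration under stated assumptions: the class name `TmemPool`, the `PoolKind` enum, the fixed `capacity_pages` limit and the oldest-first eviction are all invented here and are not taken from the slides or from any kernel interface.

```python
from enum import Enum


class PoolKind(Enum):
    PERSISTENT = "persistent"  # "DEFINITELY want it back"
    EPHEMERAL = "ephemeral"    # "PROBABLY want it back"


class TmemPool:
    """Toy page store; its capacity is hidden from the caller (hypothetical)."""

    def __init__(self, kind, capacity_pages):
        self.kind = kind
        self._capacity = capacity_pages  # the caller never reads this
        self._pages = {}                 # handle -> bytes, in insertion order

    def put(self, handle, page):
        """Try to store a copy of the page; False means the store said NO."""
        full = handle not in self._pages and len(self._pages) >= self._capacity
        if full:
            if self.kind is PoolKind.PERSISTENT or not self._pages:
                return False             # committed pages cannot be dropped
            # Ephemeral pool: silently drop the oldest page to make room.
            del self._pages[next(iter(self._pages))]
        self._pages[handle] = bytes(page)
        return True

    def get(self, handle):
        """Return the page, or None if it was never stored or has disappeared."""
        return self._pages.get(handle)

    def flush(self, handle):
        """Discard the page so a later get cannot return stale data."""
        self._pages.pop(handle, None)
```

The caller has to cope with a refused put (persistent pool) and with a get that returns nothing (ephemeral pool), which is the "ask permission and follow the rules" contract described earlier.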

Page 31: Transcendent Memory: Not Just for Virtualization Anymore

“Normal” RAM addressing

• byte-addressable

• virtual address: @fffff80001024580

Page 32: Transcendent Memory: Not Just for Virtualization Anymore

“Normal” RAM addressing

• byte-addressable

• virtual address: @fffff80001024580

Transcendent Memory

• object-oriented addressing

• object is a page

• handle addresses a page

• kernel can (mostly) choose handle when a page is put

• uses same handle to get

• must ensure handle is and remains unique
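The slide does not say what a handle looks like. As a rough sketch, assume a three-part handle naming a pool, an object within it and a page index within the object; the field names `pool_id`, `object_id` and `index`, and the dictionary standing in for transcendent memory, are assumptions made for illustration only.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    pool_id: int    # hypothetical: which pool (say, one per filesystem)
    object_id: int  # hypothetical: which object (say, one per file)
    index: int      # hypothetical: which page within the object


store = {}  # stand-in for transcendent memory: Handle -> page bytes


def put(handle, page):
    store[handle] = bytes(page)


def get(handle):
    return store.get(handle)


# The caller chooses the handle at put time...
put(Handle(pool_id=0, object_id=42, index=7), b"tux")

# ...and must present an equal handle to get the same page back.
same = Handle(0, 42, 7)
other = Handle(0, 42, 8)
```

Here `get(same)` returns the stored page and `get(other)` returns `None`. If two different pages were ever put under equal handles, the second put would overwrite the first, which is why the caller must keep handles unique.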

Page 33: Transcendent Memory: Not Just for Virtualization Anymore

Why bother?

Page 34: Transcendent Memory: Not Just for Virtualization Anymore

Once we’re behind the curtain, we can do interesting things…

Page 35: Transcendent Memory: Not Just for Virtualization Anymore

Interesting thing #1

virtual machines (aka “guests”)

hypervisor (aka “host”)

hypervisor RAM

Tmem support:

• multiple guests

• compression

• deduplication

Tmem supported in Xen since 4.0 (2009)

future?

Page 36: Transcendent Memory: Not Just for Virtualization Anymore

Interesting thing #2

Zcache (2.6.39 staging driver)

compress on put

decompress on get
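"Compress on put, decompress on get" can be sketched in a few lines. This is an illustration, not zcache's implementation: Python's `zlib` is used purely as a stand-in compressor, and `PAGE_SIZE`, `CompressedStore` and the "refuse pages that do not shrink" policy are assumptions.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


class CompressedStore:
    """Keeps pages compressed in ordinary RAM (hypothetical sketch)."""

    def __init__(self):
        self._blobs = {}  # handle -> compressed bytes

    def put(self, handle, page):
        if len(page) != PAGE_SIZE:
            raise ValueError("expected exactly one page")
        blob = zlib.compress(page)
        if len(blob) >= PAGE_SIZE:
            return False              # no saving, so refuse the page
        self._blobs[handle] = blob
        return True

    def get(self, handle):
        blob = self._blobs.get(handle)
        return None if blob is None else zlib.decompress(blob)

    def bytes_used(self):
        return sum(len(blob) for blob in self._blobs.values())
```

A page of zeros, for example, is accepted and occupies far less than `PAGE_SIZE` bytes in the store, and `get` returns the original 4096 bytes.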

Page 37: Transcendent Memory: Not Just for Virtualization Anymore

Interesting thing #3

Transparently move pre-compressed pages across a high-speed coherent interconnect

Page 38: Transcendent Memory: Not Just for Virtualization Anymore

Interesting thing #3

RAMster

Peer-to-peer transcendent memory

Page 39: Transcendent Memory: Not Just for Virtualization Anymore

Interesting thing #4

SSmem: Transcendent Memory as a “safe” access layer for SSD or NVRAM

e.g. as a “RAM extension” not I/O device

Page 40: Transcendent Memory: Not Just for Virtualization Anymore

Interesting thing #3

…maybe only one large memory server shared by many machines?

Page 41: Transcendent Memory: Not Just for Virtualization Anymore

Cleancache

A third-level victim cache for otherwise reclaimed clean page cache pages

– Optionally load-balanced across multiple clients

Cleancache patchset:

– VFS hooks to put clean page cache pages, get them back, maintain coherency

– Per filesystem opt-in hooks

– Shim to zcache in 2.6.39

– Shim to Xen tmem in 3.0

Merged in Linux 3.0
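The victim-cache idea can be sketched as a tiny page cache that hands evicted clean pages to a second store and checks that store before going to disk. Everything named here (`Disk`, `PageCache`, the plain dictionary playing the backend, write-through writes, and a get that removes the page from the backend) is a simplifying assumption, not the kernel's cleancache code.

```python
from collections import OrderedDict


class Disk:
    """Hypothetical backing store that counts reads."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.blocks[key]


class PageCache:
    def __init__(self, disk, capacity, victim):
        self.disk = disk
        self.capacity = capacity
        self.victim = victim        # dict standing in for the backend
        self.pages = OrderedDict()  # key -> data, least recently used first

    def read(self, key):
        if key in self.pages:
            self.pages.move_to_end(key)
            return self.pages[key]
        data = self.victim.pop(key, None)   # "get": may find nothing
        if data is None:
            data = self.disk.read(key)      # fall back to the disk as normal
        self._insert(key, data)
        return data

    def write(self, key, data):
        self.victim.pop(key, None)          # "flush": keep the two coherent
        self.disk.blocks[key] = data        # write-through, so pages stay clean
        self._insert(key, data)

    def _insert(self, key, data):
        self.pages[key] = data
        self.pages.move_to_end(key)
        while len(self.pages) > self.capacity:
            old_key, old_data = self.pages.popitem(last=False)
            self.victim[old_key] = old_data  # "put" on reclaim
```

With a one-page cache, reading `a`, then `b`, then `a` again costs two disk reads rather than three, because the second read of `a` is served from the victim store. A real backend may also drop pages at any time, which this dictionary never does.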

Page 42: Transcendent Memory: Not Just for Virtualization Anymore

Frontswap

Temporary emergency FAST swap page store

– Optionally load-balanced across multiple clients

Frontswap patchset:

– Swap subsystem hooks to put and get swap cache pages

– Maintain coherency

– Manages tracking data structures (1 bit/page)

– Partial swapoff

– Shim to zcache in 2.6.39

– Shim to Xen tmem merged in 3.1

Merged in Linux 3.5
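The one-bit-per-page tracking can be sketched as follows: each swap slot has a bit saying whether its page went to the backend or to the swap disk. The class `SwapDevice`, the fixed `backend_capacity` admission rule and the dictionaries standing in for the backend and the disk are assumptions made for illustration.

```python
class SwapDevice:
    """Toy swap device with a frontswap-style bitmap (hypothetical)."""

    def __init__(self, nr_slots, backend_capacity):
        self.bitmap = bytearray((nr_slots + 7) // 8)  # 1 bit per swap slot
        self.backend = {}                 # stand-in for the in-RAM backend
        self.backend_capacity = backend_capacity
        self.disk = {}                    # stand-in for the swap disk
        self.disk_writes = 0
        self.disk_reads = 0

    def _set(self, slot):
        self.bitmap[slot // 8] |= 1 << (slot % 8)

    def _clear(self, slot):
        self.bitmap[slot // 8] &= ~(1 << (slot % 8)) & 0xFF

    def _test(self, slot):
        return bool(self.bitmap[slot // 8] & (1 << (slot % 8)))

    def swap_out(self, slot, page):
        # Offer the page to the backend first; it may reject the put.
        if slot in self.backend or len(self.backend) < self.backend_capacity:
            self.backend[slot] = bytes(page)
            self._set(slot)
            return
        # Rejected: write to the swap disk as normal.
        self._clear(slot)
        self.disk[slot] = bytes(page)
        self.disk_writes += 1

    def swap_in(self, slot):
        if self._test(slot):              # bit set: the backend has the page
            return self.backend[slot]
        self.disk_reads += 1
        return self.disk[slot]
```

With a backend that accepts only one page, the first swap-out avoids the disk entirely and the second falls through to it; the bitmap tells swap-in where to look without asking the backend.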

Page 43: Transcendent Memory: Not Just for Virtualization Anymore

Kernel changes

Frontends require core kernel changes

– Cleancache

– Frontswap

Backends do NOT require core kernel changes

– Zcache, RAMster, Xen tmem all implemented as drivers

Page 44: Transcendent Memory: Not Just for Virtualization Anymore

Transcendent Memory in Linux

Multi-year merge effort

Xen  non-Xen  name of patchset                  Linux version
N    Y        zcache/zcache2                    2.6.39/3.7     staging driver
Y    Y        cleancache                        3.0            Linus decided!
Y    N        Xen-tmem, selfballooning          3.1
Y    ?        frontswap-selfshrinking           3.1
Y    Y        Frontswap                         3.5            Linus decided!
?    Y        RAMster (merged w/zcache2)        3.4/3.7        staging driver
Y    Y        module support, frontswap unuse,  3.8?           under development
              frontswap admission improvements

Page 45: Transcendent Memory: Not Just for Virtualization Anymore

Transcendent Memory

Transcendent Memory now in upstream Linux kernel

– cleancache, frontswap

– guest kernel support (aka Xen tmem)

– zcache

– RAMster

Transcendent Memory support has been in the Xen hypervisor for over 2 years.

– Available in Oracle VM 2.2 and 3.x

Transcendent Memory in UEK2 for a year

– cleancache, frontswap

– guest kernel support (aka Xen tmem)

– zcache2 coming soon

Oracle Product Plans

Page 46: Transcendent Memory: Not Just for Virtualization Anymore

Pretty Graphs! Facts! Figures!

Page 47: Transcendent Memory: Not Just for Virtualization Anymore

frontswap patchset diffstat

Low core maintenance impact

– ~100 lines

No impact if CONFIG_FRONTSWAP=n

Negligible impact if CONFIG_FRONTSWAP=y and no backend

How much benefit per backend?

Documentation/vm/frontswap.txt | 210 +++++++++++++++++++

include/linux/frontswap.h | 126 ++++++++++++

include/linux/swap.h | 4

include/linux/swapfile.h | 13 +

mm/Kconfig | 17 ++

mm/Makefile | 1

mm/frontswap.c | 273 +++++++++++++++++++

mm/page_io.c | 12 +

mm/swapfile.c | 64 +++++--

9 files changed, 707 insertions(+), 13 deletions(-)

Page 48: Transcendent Memory: Not Just for Virtualization Anymore

A benchmark

Workload:

– make -jN on linux-3.1 source (after make clean)

– Fresh reboot before each run

– All tests run as root in multi-user mode

Software:

– Linux 3.2

Hardware:

– Dell Optiplex 790 (~$500)

– Intel Core i5-2400 @ 3.1GHz (Quad Core/Hyperthreaded – 6M cache)

– 1GB DDR3 RAM @ 1333MHz (limited by memmap)

– One 7200rpm SATA 6Gbps drive with 8MB cache

– 10GB swap partition

– 1Gb ethernet

Page 49: Transcendent Memory: Not Just for Virtualization Anymore

Workload objective

Small N (4-12)

– No memory pressure

Page cache never fills to exceed RAM, no swapping

Medium N (16-24)

– Moderate memory pressure

Page cache fills so lots of reclaiming, but little to no swapping

Large N (28-36)

– High memory pressure

Much page cache churn, lots of swapping

Largest N (40)

– Extreme memory pressure

Little space for page cache churn, swap storm occurs

Changing N varies memory pressure

Page 50: Transcendent Memory: Not Just for Virtualization Anymore

Native/Baseline (no zcache registered)

Kernel compile “make -jN” (smaller is better)

Elapsed time in seconds:

N         4     8     12    16    20    24    28    32    36    40
nozcache  879   858   858   1009  1316  2164  3293  4286  6516  DNC

DNC = did not complete (18000+)

Page 51: Transcendent Memory: Not Just for Virtualization Anymore

Review: what is zcache?

when clean pages are reclaimed (cleancache “put”)

– zcache compresses/stores contents of evicted pages in RAM

– zcache has a “shrinker hook” in case the kernel runs low on memory

when filesystem reads file pages (cleancache “get”)

– zcache checks if it has a copy, if so decompresses/returns

– else reads from filesystem/disk as normal

One disk access saved for every successful “get”

Captures and compresses evicted clean page cache pages

Page 52: Transcendent Memory: Not Just for Virtualization Anymore

Review: what is zcache?

when a page needs to be swapped out (frontswap “put”)

– zcache compresses/stores contents of swap page in RAM

– zcache enforces policies, may reject some (or all) pages

– frontswap maintains a bit map for saved/rejected swap pages

when a page needs to be swapped in (frontswap “get”)

– if frontswap bit is set, zcache decompresses/returns

– else read from swap disk as normal

One disk write+read saved for every successful “get”

Captures and compresses swap pages (in RAM)

Page 53: Transcendent Memory: Not Just for Virtualization Anymore

Zcache vs native/baseline

Kernel compile “make -jN” (smaller is better)

Elapsed time in seconds:

N         4     8     12    16    20    24    28    32    36    40
nozcache  879   858   858   1009  1316  2164  3293  4286  6516  DNC
zcache    877   856   856   922   1154  1714  2500  4282  6602  13755

Up to 26-31% faster

Page 54: Transcendent Memory: Not Just for Virtualization Anymore

Benchmark analysis - zcache

small N (4-12):

– no memory pressure

zcache has no effect, but apparently no measurable cost either

medium N (16-20):

– moderate memory pressure

zcache increases total pages cached due to compression

performance improves 9%-14%

large N (24-28)

– high memory pressure

zcache increases total pages cached due to compression

AND zcache uses RAM for compressed swap to avoid swap-to-disk

performance improves 26%-31%

large N (32-36)

– very high memory pressure

compressed page cache gets reclaimed before use, no advantage

compressed in-RAM swap counteracted by smaller kernel page cache?

performance improves/loses 0%-(1%)

largest N (40):

– extreme memory pressure

– in-RAM swap compression reduces worst case swapstorm
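The percentages quoted in this analysis can be reproduced from the elapsed times in the benchmark table. The slides do not state the formula; the figures appear consistent with baseline time divided by zcache time, minus one, truncated to a whole percent, so that is what this sketch assumes. The fused table entry for zcache is read as 6602 seconds at N=36, and N=40 is left out because the baseline run did not complete.

```python
# Elapsed seconds from the benchmark table (N=40 omitted: baseline did not complete).
N        = [4, 8, 12, 16, 20, 24, 28, 32, 36]
nozcache = [879, 858, 858, 1009, 1316, 2164, 3293, 4286, 6516]
zcache   = [877, 856, 856, 922, 1154, 1714, 2500, 4282, 6602]


def percent_faster(baseline_s, new_s):
    """Speedup of new over baseline as a whole percent, truncated toward zero.

    Assumed formula: (baseline / new - 1) * 100.
    """
    return int((baseline_s / new_s - 1) * 100)


improvement = {n: percent_faster(b, z) for n, b, z in zip(N, nozcache, zcache)}

for n, pct in improvement.items():
    print(f"N={n:2d}: {pct:+d}%")
```

Under this assumption the results are 9% and 14% for N=16 and N=20, 26% and 31% for N=24 and N=28, and 0% and -1% for N=32 and N=36, matching the ranges given for each group.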

Page 55: Transcendent Memory: Not Just for Virtualization Anymore

Review: what is RAMster?

Leverages zcache, adds cluster code using kernel sockets

same as zcache but also “remotifies” compressed swap pages to another system’s RAM

– One disk write+read saved for every successful swap “get” (at cost of some network traffic)

– One disk access saved for every successful page cache “get” (at cost of some network traffic)

Peer-to-peer or client-server (currently up to 8 nodes)

RAM management is entirely dynamic

Locally compresses swap and clean page cache pages, but stores in remote RAM

Page 56: Transcendent Memory: Not Just for Virtualization Anymore

Zcache and RAMster

Kernel compile “make -jN” (smaller is better)

Elapsed time in seconds:

N         4     8     12    16    20    24    28    32    36    40
nozcache  879   858   858   1009  1316  2164  3293  4286  6516  DNC
zcache    877   856   856   922   1154  1714  2500  4282  6602  13755
ramster   887   866   875   949   1162  1788  2177  3599  5394  8172

Page 57: Transcendent Memory: Not Just for Virtualization Anymore

Workload Analysis - RAMster

small N (4-12):

– no memory pressure

RAMster has no effect, but small cost

medium N (16-20):

– moderate memory pressure

RAMster increases total pages cached due to compression

performance improves 6%-13%

somewhat slower than zcache

large N (24-28)

– high memory pressure

RAMster increases total pages cached (local) due to compression

and RAMster uses remote RAM to avoid swap-to-disk

performance improves 21%-51%

large N (32-36)

– very high memory pressure

compressed page cache gets reclaimed before use, no advantage

but RAMster still uses remote (compressed) RAM to avoid swap-to-disk

performance improves 19%-22% (vs zcache and native)

largest N (40):

– extreme memory pressure

use of remote RAM significantly reduces worst case swapstorm

Page 58: Transcendent Memory: Not Just for Virtualization Anymore

Questions?

Page 59: Transcendent Memory: Not Just for Virtualization Anymore

Page 60: Transcendent Memory: Not Just for Virtualization Anymore
