memory hotplug of memory offline when offlining memory has data, the data is migrated to other page...

31
May 29th, 2013 Yasuaki Ishimatsu <[email protected]> FUJITSU LIMITED Memory Hotplug Copyright 2013 FUJITSU LIMITED LinuxCon Japan 2013

Upload: phungkiet

Post on 23-Jun-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

May 29th, 2013 Yasuaki Ishimatsu <[email protected]> FUJITSU LIMITED

Memory Hotplug

Copyright 2013 FUJITSU LIMITED

LinuxCon Japan 2013

Agenda

Motivation of memory hotplug

What is memory hotplug?

Development of memory hotplug

To-do Lists

Copyright 2013 FUJITSU LIMITED 1

MOTIVATION OF MEMORY HOTPLUG

Copyright 2013 FUJITSU LIMITED 2

Hardware partitioning & Dynamic reconfiguration

Hardware partitioning System can have several servers in a box with flexibility

Copyright 2013 FUJITSU LIMITED

Dynamic reconfiguration System can change partition’s configuration at runtime

SB

Application

Partition 3

Middleware

Linux

SB

Application

Partition 2

Middleware

Linux

SB

Application

Partition 1

Middleware

Linux

SB

Application

Partition 2

Middleware

Linux

SB SB

Application

Partition 1

Middleware

Linux

Users can change partition’s configuration

while a partition is power off

SB

Application

Partition 1

Middleware

Linux

SB SB

Application

Partition 2

Middleware

Linux

Free SB User can add

Free SB

3

Purpose of memory hotplug (1)

Dynamic reconfiguration enhance RAS

Copyright 2013 FUJITSU LIMITED

Dynamic reconfiguration is used for resize

SB SB

Application

Partition 1

Middleware

Linux

SB SB

Application

Partition 1

Middleware

Linux

SB SB

Application

Partition 1

Middleware

Linux hot remove SB

hot add SB

SB

SB SB

Application

Partition 1

Middleware

Linux hot add SB

Free SB SB SB

Application

Partition 1

Middleware

Linux

SB

4

Purpose of memory hotplug (2)

Memory hotplug will be supported by KVM “ACPI memory hotplug” by Vasilis Liaskovitis

http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html

5 Copyright 2013 FUJITSU LIMITED

Node1 Node0

Application

Guest machine

Middleware

Linux

Add memory

⇨ Memory hotplug has been supported by several OSs. But Linux has not copmletely support it yet.

WHAT IS MEMORY HOTPLUG

Copyright 2013 FUJITSU LIMITED 6

Memory hotplug

Memory hotplug allows to users to increase/decrease the amount of memory Physical memory hotplug phase (For hot adding/removing DIMMs physically)

•memory hot add

•memory hot remove

Logical memory hotplug phase (For changing the amount of memory)

•memory online

•memory offline

Copyright 2013 FUJITSU LIMITED 7

Memory management

Copyright 2013 FUJITSU LIMITED

node0

physical address

virtual address

virtual memory map

page table

Kernel manages physical memory by using page table (direct mapping)

•calculate virtual address to physical address

virtual memory map

•manage page structures

8

Memory hot add

Prepare following items for managing added memory page table (direct mapping)

virtual memory map

Copyright 2013 FUJITSU LIMITED

physical address

virtual address

virtual memory map

page table

node0

node1

add virtual memory map

add page table

9

Memory hot remove

Delete following items for managing removed memory page table (direct mapping)

virtual memory map

Copyright 2013 FUJITSU LIMITED

physical address

virtual address

virtual memory map

page table

node0

node1

delete virtual memory map

delete page table

10

Online memory & Offline memory

All pages are managed by each zone structures

User can online/offline memory by echoing sysfs file echo online/offline >

/sys/devices/system/memory/memoryX/state

Online memory change page state to

usable

Offline memory change page stat to

unusable

Copyright 2013 FUJITSU LIMITED

physical address

node0

node1 page page page

page

section

zone

section

….

page page page

page

section

zone

section

….

page page page

page

section

zone

section

….

page page page

page

section

zone

section

….

DMA32

DMA

NORMAL

NORMAL

zone

11

DEVELOPMENT OF MEMORY HOTPLUG

Copyright 2013 FUJITSU LIMITED 12

Status of memory hotplug on linux-3.2

Hot add memory Support

Hot remove memory Not supoprt

Online memory

Support

Offline memory Support with limitation

Copyright 2013 FUJITSU LIMITED 13

Required items for memory hot remove

Copyright 2013 FUJITSU LIMITED

Delete following items for managing removed memory page table (direct mapping)

virtual memory map

/sys/firmware/memmap sysfs

/sys/devices/syste/node/memoryX sysfs

Improve ACPI memory hotplug frame work

Update zone information

physical address

virtual address

virtual memory map

page table

node0

node1

delete virtual memory map

delete page table

All of them are merged into linux3.9

14

Limitation of memory offline

When offlining memory has data, the data is migrated to other page

But migratable pages are only page cache and anonymous page, called movable memory

So if memory is used as other purpose except for movable memory, called kernel memory, the memory cannot be offlined

Copyright 2013 FUJITSU LIMITED

page page page

page

section

zone

section

….

page page page

page

section

zone

section

….

page page page

page

section

zone

section

….

page page page

page

section

zone

section

….

DMA32

DMA

NORMAL

NORMAL

zone

allocate a new page and migrate date to the page

node1

node0

15

Comparison of 2 ways of offlining

Support to offline kernel memory

Pros. •All memory can be offlined

•No performance impact caused by NUMA

Cons. •Kernel address (P<->V relationship) should change completely

Make a node only of movable memory

Pros. •We can make use of existing feature

Cons. •The node can use as only page cache and anonymous page

•Performance impact caused by NUMA

Copyright 2013 FUJITSU LIMITED 16

Comparison of 2 ways of offlining

Support to offline kernel memory

Pros. •All memory can be offlined

•No performance impact caused by NUMA

Cons. •Kernel address (P<->V relation ship) should change completely

For supporting kernel memory offline, it will take several years.

Copyright 2013 FUJITSU LIMITED 17

Comparison of 2 ways of offlining

Support to offline kernel memory

Pros. •All memory can be offlined

•No performance impact caused by NUMA

Cons. •Kernel address (P<->V relation ship) should change completely

Make a node consists only of movable memory (movable node)

Pros. •We can make use of existing feature

Cons. •The node can use as only page cache and anonymous page

•Performance impact by NUMA

Copyright 2013 FUJITSU LIMITED

As first step, we develop following solution

18

Make a node consists only of movable memory

For making a memory range of movable memory, Linux has ZONE_MOVABLE, this is not created automatically

If a node is constructed by only ZONE_MOVABLE, the whole memory on the node can be migrated to other memory and offlined

Copyright 2013 FUJITSU LIMITED

page page page

page

section

zone

section

….

page page page

page

section

zone

section

….

page page page

page

section

zone

section

….

page page page

page

section

zone

section

….

DMA32

DMA

NORMAL

MOVABLE

zone

all pages’ date can be migrated to other pages

node1

node0

19

ZONE_MOVABLE configuration

Enhance/Add features for creating ZONE_MOVABLE 1. New boot option, movablecore=acpi

2. New “online” feature, online_movable

Copyright 2013 FUJITSU LIMITED 20

MOVABLE

MOVABLE

Original boot option of creating MOVABLE zone

Linux has a boot option, called movablecore=, for creating MOVABLE zone

The boot option can specifies amount of movable memory in a system

But movable memory is evenly distributed to all nodes.

Copyright 2013 FUJITSU LIMITED

NORMAL

DMA32

DMA

NORMAL

node1

node0

DMA32

DMA

NORMAL

node1

node0

movablecore=10G

NORMAL

5G

5G

21

NORMAL

New boot option

We are proposing a new boot option “movablecore=acpi” use memory affinity structure in

SRAT table

if hotpluggable bit is enable, the memory range is managed to MOVABLE zone

Copyright 2013 FUJITSU LIMITED

DMA32

DMA

NORMAL

node1

node0

DMA32

DMA

NORMAL

MOVABLE

node1

node0

Node Hotpluggable bit

0 Disable

1 Enable

ACPI SRAT table

movablecore=acpi The feature is under developing

22

Lack of feature at onlining memory

Hot added memory is always managed by NORMAL zone

When onlining memory, the memory may contains kernel memory and movable memory

echo online > /sys/devices/system/node/nodeX/memoryY/state

Copyright 2013 FUJITSU LIMITED

NORMAL

DMA32

DMA

NORMAL

node1

node0

node1

node0

online memory

kernel memory

movable memory

movable memory

kernel memory

physical memory zone

free

free

free

23

New interface at memory onlining

Online memory into ZONE_MOVABLE

echo online_movable > /sys/devices/system/node/nodeX/memoryY/state

Copyright 2013 FUJITSU LIMITED

NORMAL

DMA32

DMA

NORMAL

node1

node0

node1

node0

online_movable memory

movable memory

movable memory

physical memory zone

MOVALBE

DMA32

DMA

NORMAL

node1

node0

zone

free

free

free

The feature is merged into linux3.8

24

Current status of memory hotplug

Hot add memory Support

Hot remove memory supoprt

Online memory

Support

Offline memory Support with limitation

•added online_movable option

•under developing movablecore=acpi option

Copyright 2013 FUJITSU LIMITED 25

TO-DO LISTS

Copyright 2013 FUJITSU LIMITED 26

Switch of changing hot added memory’s zone

Hot added memory is always managed by NORMAL zone

If user want to hot remove memory, user need to use: echo online_movable >

/sys/devices/system/nodeX/memoryY/state

Prepare SRAT information

•If movablecore=acpi is defined, check hotpluggable bit of SRAT information

sysctl

•vm.hotadd_memory_treat_as_movable

Copyright 2013 FUJITSU LIMITED

NORMAL

DMA32

DMA

NORMAL

node1

node0

DMA32

DMA

NORMAL

MOVABLE

node1

node0

use SRAT information or sysctl

27

Kernel object in a hot added memory

When hot adding memory, kernel objects like page table and virtual memory map are allocated into other memory.

Copyright 2013 FUJITSU LIMITED

DMA32

DMA

NORMAL

NORMAL/ MOVABLE

node1

node0

physical memoy zone

virtual memory map

page table

DMA32

DMA

NORMAL

NORMAL/ MOVABLE

node1

node0

physical memoy zone

virtual memory map

page table

move kernel objects in a hot added memory

28

Migrate vmalloc area

Vmalloc area is not continuous physically

When offlining vmalloc’s region, the data on the range is migrated to other memory

Copyright 2013 FUJITSU LIMITED

physical memoy

vmalloc area

Migrate vmalloc’s data

node1

node0

29

30 Copyright 2010 FUJITSU LIMITED