memory hotplug of memory offline when offlining memory has data, the data is migrated to other page...
TRANSCRIPT
May 29th, 2013 Yasuaki Ishimatsu <[email protected]> FUJITSU LIMITED
Memory Hotplug
Copyright 2013 FUJITSU LIMITED
LinuxCon Japan 2013
Agenda
Motivation of memory hotplug
What is memory hotplug?
Development of memory hotplug
To-do Lists
Copyright 2013 FUJITSU LIMITED 1
Hardware partitioning & Dynamic reconfiguration
Hardware partitioning System can have several servers in a box with flexibility
Copyright 2013 FUJITSU LIMITED
Dynamic reconfiguration System can change partition’s configuration at runtime
SB
Application
Partition 3
Middleware
Linux
SB
Application
Partition 2
Middleware
Linux
SB
Application
Partition 1
Middleware
Linux
SB
Application
Partition 2
Middleware
Linux
SB SB
Application
Partition 1
Middleware
Linux
Users can change partition’s configuration
while a partition is power off
SB
Application
Partition 1
Middleware
Linux
SB SB
Application
Partition 2
Middleware
Linux
Free SB User can add
Free SB
3
Purpose of memory hotplug (1)
Dynamic reconfiguration enhance RAS
Copyright 2013 FUJITSU LIMITED
Dynamic reconfiguration is used for resize
SB SB
Application
Partition 1
Middleware
Linux
SB SB
Application
Partition 1
Middleware
Linux
SB SB
Application
Partition 1
Middleware
Linux hot remove SB
hot add SB
SB
SB SB
Application
Partition 1
Middleware
Linux hot add SB
Free SB SB SB
Application
Partition 1
Middleware
Linux
SB
4
Purpose of memory hotplug (2)
Memory hotplug will be supported by KVM “ACPI memory hotplug” by Vasilis Liaskovitis
http://lists.gnu.org/archive/html/qemu-devel/2012-12/msg02693.html
5 Copyright 2013 FUJITSU LIMITED
Node1 Node0
Application
Guest machine
Middleware
Linux
Add memory
⇨ Memory hotplug has been supported by several OSs. But Linux has not copmletely support it yet.
Memory hotplug
Memory hotplug allows to users to increase/decrease the amount of memory Physical memory hotplug phase (For hot adding/removing DIMMs physically)
•memory hot add
•memory hot remove
Logical memory hotplug phase (For changing the amount of memory)
•memory online
•memory offline
Copyright 2013 FUJITSU LIMITED 7
Memory management
Copyright 2013 FUJITSU LIMITED
node0
physical address
virtual address
virtual memory map
page table
Kernel manages physical memory by using page table (direct mapping)
•calculate virtual address to physical address
virtual memory map
•manage page structures
8
Memory hot add
Prepare following items for managing added memory page table (direct mapping)
virtual memory map
Copyright 2013 FUJITSU LIMITED
physical address
virtual address
virtual memory map
page table
node0
node1
add virtual memory map
add page table
9
Memory hot remove
Delete following items for managing removed memory page table (direct mapping)
virtual memory map
Copyright 2013 FUJITSU LIMITED
physical address
virtual address
virtual memory map
page table
node0
node1
delete virtual memory map
delete page table
10
Online memory & Offline memory
All pages are managed by each zone structures
User can online/offline memory by echoing sysfs file echo online/offline >
/sys/devices/system/memory/memoryX/state
Online memory change page state to
usable
Offline memory change page stat to
unusable
Copyright 2013 FUJITSU LIMITED
physical address
node0
node1 page page page
page
section
zone
section
….
page page page
page
section
zone
section
….
page page page
page
section
zone
section
….
page page page
page
section
zone
section
….
DMA32
DMA
NORMAL
NORMAL
zone
11
Status of memory hotplug on linux-3.2
Hot add memory Support
Hot remove memory Not supoprt
Online memory
Support
Offline memory Support with limitation
Copyright 2013 FUJITSU LIMITED 13
Required items for memory hot remove
Copyright 2013 FUJITSU LIMITED
Delete following items for managing removed memory page table (direct mapping)
virtual memory map
/sys/firmware/memmap sysfs
/sys/devices/syste/node/memoryX sysfs
Improve ACPI memory hotplug frame work
Update zone information
physical address
virtual address
virtual memory map
page table
node0
node1
delete virtual memory map
delete page table
All of them are merged into linux3.9
14
Limitation of memory offline
When offlining memory has data, the data is migrated to other page
But migratable pages are only page cache and anonymous page, called movable memory
So if memory is used as other purpose except for movable memory, called kernel memory, the memory cannot be offlined
Copyright 2013 FUJITSU LIMITED
page page page
page
section
zone
section
….
page page page
page
section
zone
section
….
page page page
page
section
zone
section
….
page page page
page
section
zone
section
….
DMA32
DMA
NORMAL
NORMAL
zone
allocate a new page and migrate date to the page
node1
node0
15
Comparison of 2 ways of offlining
Support to offline kernel memory
Pros. •All memory can be offlined
•No performance impact caused by NUMA
Cons. •Kernel address (P<->V relationship) should change completely
Make a node only of movable memory
Pros. •We can make use of existing feature
Cons. •The node can use as only page cache and anonymous page
•Performance impact caused by NUMA
Copyright 2013 FUJITSU LIMITED 16
Comparison of 2 ways of offlining
Support to offline kernel memory
Pros. •All memory can be offlined
•No performance impact caused by NUMA
Cons. •Kernel address (P<->V relation ship) should change completely
For supporting kernel memory offline, it will take several years.
Copyright 2013 FUJITSU LIMITED 17
Comparison of 2 ways of offlining
Support to offline kernel memory
Pros. •All memory can be offlined
•No performance impact caused by NUMA
Cons. •Kernel address (P<->V relation ship) should change completely
Make a node consists only of movable memory (movable node)
Pros. •We can make use of existing feature
Cons. •The node can use as only page cache and anonymous page
•Performance impact by NUMA
Copyright 2013 FUJITSU LIMITED
As first step, we develop following solution
18
Make a node consists only of movable memory
For making a memory range of movable memory, Linux has ZONE_MOVABLE, this is not created automatically
If a node is constructed by only ZONE_MOVABLE, the whole memory on the node can be migrated to other memory and offlined
Copyright 2013 FUJITSU LIMITED
page page page
page
section
zone
section
….
page page page
page
section
zone
section
….
page page page
page
section
zone
section
….
page page page
page
section
zone
section
….
DMA32
DMA
NORMAL
MOVABLE
zone
all pages’ date can be migrated to other pages
node1
node0
19
ZONE_MOVABLE configuration
Enhance/Add features for creating ZONE_MOVABLE 1. New boot option, movablecore=acpi
2. New “online” feature, online_movable
Copyright 2013 FUJITSU LIMITED 20
MOVABLE
MOVABLE
Original boot option of creating MOVABLE zone
Linux has a boot option, called movablecore=, for creating MOVABLE zone
The boot option can specifies amount of movable memory in a system
But movable memory is evenly distributed to all nodes.
Copyright 2013 FUJITSU LIMITED
NORMAL
DMA32
DMA
NORMAL
node1
node0
DMA32
DMA
NORMAL
node1
node0
movablecore=10G
NORMAL
5G
5G
21
NORMAL
New boot option
We are proposing a new boot option “movablecore=acpi” use memory affinity structure in
SRAT table
if hotpluggable bit is enable, the memory range is managed to MOVABLE zone
Copyright 2013 FUJITSU LIMITED
DMA32
DMA
NORMAL
node1
node0
DMA32
DMA
NORMAL
MOVABLE
node1
node0
Node Hotpluggable bit
0 Disable
1 Enable
ACPI SRAT table
movablecore=acpi The feature is under developing
22
Lack of feature at onlining memory
Hot added memory is always managed by NORMAL zone
When onlining memory, the memory may contains kernel memory and movable memory
echo online > /sys/devices/system/node/nodeX/memoryY/state
Copyright 2013 FUJITSU LIMITED
NORMAL
DMA32
DMA
NORMAL
node1
node0
node1
node0
online memory
kernel memory
movable memory
movable memory
kernel memory
physical memory zone
free
free
free
23
New interface at memory onlining
Online memory into ZONE_MOVABLE
echo online_movable > /sys/devices/system/node/nodeX/memoryY/state
Copyright 2013 FUJITSU LIMITED
NORMAL
DMA32
DMA
NORMAL
node1
node0
node1
node0
online_movable memory
movable memory
movable memory
physical memory zone
MOVALBE
DMA32
DMA
NORMAL
node1
node0
zone
free
free
free
The feature is merged into linux3.8
24
Current status of memory hotplug
Hot add memory Support
Hot remove memory supoprt
Online memory
Support
Offline memory Support with limitation
•added online_movable option
•under developing movablecore=acpi option
Copyright 2013 FUJITSU LIMITED 25
Switch of changing hot added memory’s zone
Hot added memory is always managed by NORMAL zone
If user want to hot remove memory, user need to use: echo online_movable >
/sys/devices/system/nodeX/memoryY/state
Prepare SRAT information
•If movablecore=acpi is defined, check hotpluggable bit of SRAT information
sysctl
•vm.hotadd_memory_treat_as_movable
Copyright 2013 FUJITSU LIMITED
NORMAL
DMA32
DMA
NORMAL
node1
node0
DMA32
DMA
NORMAL
MOVABLE
node1
node0
use SRAT information or sysctl
27
Kernel object in a hot added memory
When hot adding memory, kernel objects like page table and virtual memory map are allocated into other memory.
Copyright 2013 FUJITSU LIMITED
DMA32
DMA
NORMAL
NORMAL/ MOVABLE
node1
node0
physical memoy zone
virtual memory map
page table
DMA32
DMA
NORMAL
NORMAL/ MOVABLE
node1
node0
physical memoy zone
virtual memory map
page table
move kernel objects in a hot added memory
28
Migrate vmalloc area
Vmalloc area is not continuous physically
When offlining vmalloc’s region, the data on the range is migrated to other memory
Copyright 2013 FUJITSU LIMITED
physical memoy
vmalloc area
Migrate vmalloc’s data
node1
node0
29