Memory management in Linux kernel
DESCRIPTION
Main topics covered at the seminar: tasks and components of the memory management subsystem; hardware capabilities of the x86_64 platform; how the kernel describes physical and virtual memory; the memory management subsystem API; reclaiming previously allocated memory; monitoring tools; Memory Cgroups; Compaction (defragmentation of physical memory).
TRANSCRIPT
Memory management in Linux kernel
Memory management tasks
• Physical memory allocator
• Physical memory management
• Virtual memory allocator
• PTE management
• Memory allocator for kernel needs
Memory management subsystem
• >100K lines
• Buddy allocator
• Page replacement (“LRU” reclaim model)
• PTE management
• Slab/slob/slub kernel allocator
• Pagecache/writeback/readahead/swap
• Cgroup memory controller
• Compaction
Hardware
• x86_64
• Paging (MMU, TLB, ...)
• 4KB, 2MB and 1GB pages
• NUMA
• 4-level page tables
• Hardware referenced bit
Physical memory description
• Node (pg_data_t)
• Zone (struct zone)
• Page (struct page)
$ cat /proc/zoneinfo | grep Node
Node 0, zone DMA
Node 0, zone DMA32
Node 0, zone Normal
Node 1, zone Normal
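The node/zone listing above can also be extracted programmatically. The following is a minimal sketch, not part of the talk: the helper name `list_zones` and the `SAMPLE` text are illustrative assumptions, and the only format assumed is the `Node N, zone NAME` header lines shown above.

```python
import re

# Hypothetical sample mirroring the header lines of /proc/zoneinfo.
SAMPLE = """Node 0, zone      DMA
Node 0, zone    DMA32
Node 0, zone   Normal
Node 1, zone   Normal
"""

def list_zones(zoneinfo_text):
    """Return (node, zone) pairs from /proc/zoneinfo-style text."""
    pattern = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)", re.MULTILINE)
    return [(int(node), zone) for node, zone in pattern.findall(zoneinfo_text)]

if __name__ == "__main__":
    try:
        with open("/proc/zoneinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    for node, zone in list_zones(text):
        print(f"node {node}: zone {zone}")
```

On a NUMA machine the result shows which zones exist on which node; lower zones such as DMA and DMA32 typically appear only on node 0, as in the output above.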
Virtual memory description
• Address space (struct mm_struct)
• VM area (struct vm_area_struct)
$ cat /proc/self/maps
00400000-0040c000 r-xp 00000000 08:03 2359718 /usr/bin/cat
0060b000-0060c000 r--p 0000b000 08:03 2359718 /usr/bin/cat
0060c000-0060d000 rw-p 0000c000 08:03 2359718 /usr/bin/cat
011a7000-011c8000 rw-p 00000000 00:00 0 [heap]
7f4d072e5000-7f4d0d80e000 r--p 00000000 08:03 2369473 /usr/lib/locale/locale-archive
7f4d0d80e000-7f4d0d9c2000 r-xp 00000000 08:03 2366682 /usr/lib64/libc-2.18.so
7f4d0d9c2000-7f4d0dbc2000 ---p 001b4000 08:03 2366682 /usr/lib64/libc-2.18.so
7f4d0dbc2000-7f4d0dbc6000 r--p 001b4000 08:03 2366682 /usr/lib64/libc-2.18.so
...
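Each line of /proc/self/maps describes one VM area: address range, permissions, file offset, device, inode and backing path. A minimal parsing sketch follows; the `Mapping` type and `parse_maps_line` helper are illustrative names, not kernel or library API.

```python
from typing import NamedTuple

class Mapping(NamedTuple):
    start: int    # first address of the VM area
    end: int      # first address past the VM area
    perms: str    # e.g. "r-xp"
    offset: int   # offset into the backing file
    dev: str      # major:minor of the backing device
    inode: int    # 0 for anonymous mappings
    path: str     # backing file, "[heap]", or "" if absent

def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a Mapping."""
    fields = line.split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(x, 16) for x in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), path)

if __name__ == "__main__":
    m = parse_maps_line("00400000-0040c000 r-xp 00000000 08:03 2359718 /usr/bin/cat")
    print(m.path, hex(m.end - m.start), m.perms)
```

The size of an area is simply `end - start`; for the first line above that is 0xc000 bytes, i.e. twelve 4KB pages of executable text.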
File mappings
• File mappings (struct address_space)
• Radix tree with all resident pages
• Pagecache
• Major/minor pagefault
Kernel API
• __get_free_page()
• kmalloc()/kfree()
• vmalloc()
• ...
Userspace API
• pagefault
• mmap()/munmap()
• brk()
• mlock()/munlock()
• fadvise(), madvise()
• ...
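As a small illustration of the userspace side, the sketch below creates an anonymous mapping through Python's standard `mmap` module (a wrapper over mmap()/munmap()), writes to it and reads the data back. The function name `anon_mapping_roundtrip` is an illustrative assumption. The kernel typically backs such a mapping with physical pages only on first touch, via a page fault.

```python
import mmap

def anon_mapping_roundtrip(payload: bytes) -> bytes:
    """Write payload into a fresh anonymous mapping and read it back."""
    length = max(mmap.PAGESIZE, len(payload))
    # fileno -1 requests an anonymous mapping (not backed by a file).
    with mmap.mmap(-1, length) as region:
        # The first write to an untouched page is what triggers the fault.
        region[:len(payload)] = payload
        return bytes(region[:len(payload)])
    # Leaving the with-block unmaps the region.

if __name__ == "__main__":
    print(anon_mapping_roundtrip(b"hello"))
```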
Memory reclaim
• Normal/direct reclaim (free pool)
• Per-node kswapd
• Working set
• Memory pressure
• File memory vs anonymous memory
• Swap
• OOM
“LRU” model
• 5 doubly linked lists: inactive file, active file, inactive anon, active anon, unevictable
• Referenced flag (PG_referenced) in the flags field of struct page
List transition rules
• mark_page_accessed():
  – unreferenced -> referenced
  – inactive && referenced -> active
• shrink_inactive_list():
  – if (ptes referenced)
    • anonymous -> active
    • referenced -> active
    • (ptes referenced > 1) -> active (3.2)
    • (vm_flags & VM_EXEC) -> active (3.2)
    • set referenced
    • rotate
  – else
    • reclaim
• shrink_active_list():
  – if referenced
    • file & VM_EXEC -> rotate
  – otherwise -> inactive
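The mark_page_accessed() rules can be illustrated with a toy model. This is a sketch under stated assumptions, not kernel code: the `Page` and `ToyLRU` classes are hypothetical, only the two mark_page_accessed() transitions are modelled, and clearing the referenced flag on activation is an assumption of this sketch.

```python
from collections import deque

class Page:
    def __init__(self, name):
        self.name = name
        self.referenced = False  # stands in for the referenced page flag
        self.active = False      # True when the page sits on the active list

class ToyLRU:
    """Two-list toy model: new pages start on the inactive list."""
    def __init__(self):
        self.inactive = deque()
        self.active = deque()

    def add(self, page):
        self.inactive.appendleft(page)

    def mark_page_accessed(self, page):
        if not page.referenced:
            # unreferenced -> referenced
            page.referenced = True
        elif not page.active:
            # inactive && referenced -> active
            self.inactive.remove(page)
            page.active = True
            page.referenced = False  # assumption: flag is reset on promotion
            self.active.appendleft(page)

if __name__ == "__main__":
    lru = ToyLRU()
    p = Page("a")
    lru.add(p)
    lru.mark_page_accessed(p)   # first touch: only sets the flag
    lru.mark_page_accessed(p)   # second touch: promotes to active
    print(p.active, len(lru.inactive), len(lru.active))
```

The point of the two-step promotion is that a page touched once (for example by a sequential scan) stays on the inactive list and is reclaimed first, while a page touched twice is treated as part of the working set.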
Memory pressure balancing
• nr_pages_to_scan = nr_pages/2^priority
• priority = [12..0]: 1/4096, 1/2048, 1/1024, ...
• swappiness
• active > inactive
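The scan-size formula above is a right shift by the priority. A minimal sketch of the arithmetic, assuming the priority range [12..0] given on the slide (the function name and range check are illustrative):

```python
DEF_PRIORITY = 12  # starting priority, per the [12..0] range above

def nr_pages_to_scan(nr_pages, priority):
    """Pages scanned in one pass: nr_pages / 2^priority."""
    if not 0 <= priority <= DEF_PRIORITY:
        raise ValueError("priority must be in [0, 12]")
    return nr_pages >> priority

if __name__ == "__main__":
    lru_size = 1 << 20  # 1,048,576 pages, i.e. 4 GiB of 4KB pages
    for priority in (12, 11, 10, 0):
        print(priority, nr_pages_to_scan(lru_size, priority))
```

With 1,048,576 pages on a list, the first pass at priority 12 scans 256 pages; each drop in priority doubles that, until at priority 0 the whole list is scanned.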
Yasearch-specific problems & solutions
• Working set > 1/2 available memory
• Memory thrashing
• promote_mapped_pages
• file_inactive_ratio
Monitoring & tools
• top
• vmtouch
• /proc/vmstat
• /proc/buddyinfo
• /proc/slabinfo
• perf top
• oom-message in dmesg
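As one example of using these files, /proc/buddyinfo lists, per node and zone, the number of free blocks of each order (order 0 first). The sketch below turns those counts into a free-page total; the helper names and the sample line are illustrative assumptions.

```python
# Hypothetical sample line in /proc/buddyinfo format (orders 0..10).
SAMPLE = "Node 0, zone      DMA      1      1      1      0      2      1      1      0      1      1      3\n"

def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts indexed by order."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node":
            continue
        node = int(parts[1].rstrip(","))
        zone = parts[3]
        result[(node, zone)] = [int(x) for x in parts[4:]]
    return result

def free_pages(counts):
    """A free block of order n holds 2^n pages."""
    return sum(count << order for order, count in enumerate(counts))

if __name__ == "__main__":
    try:
        with open("/proc/buddyinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    for (node, zone), counts in parse_buddyinfo(text).items():
        print(f"node {node} zone {zone}: {free_pages(counts)} free pages")
```

Many free pages concentrated in low orders and few in high orders is a sign of fragmentation: plenty of memory is free, but large contiguous allocations may still fail.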
Demonstration
Cgroups
• Each cgroup has its own LRU lists.
• No common LRU (since 3.3)!
• Common free pool(s)
• Common kswapd thread(s)
• Global reclaim vs target reclaim
Memory controller
• memory.limit_in_bytes
• memory.soft_limit_in_bytes (will be deprecated)
• memory.use_hierarchy
• ...
Monitoring
• memory.usage_in_bytes
• memory.max_usage_in_bytes
• memory.stat
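The memory.stat file is plain text with one `key value` pair per line, so it is easy to read from a script. A minimal sketch follows; the helper name, the sample values and the directory mentioned in the comment are assumptions, since the actual location depends on where the memory controller is mounted and how the group is named.

```python
# Hypothetical sample in memory.stat format; values are made up.
SAMPLE = """cache 4096
rss 8192
inactive_file 4096
"""

def parse_memory_stat(text):
    """Parse 'key value' lines of a memory.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats

if __name__ == "__main__":
    # Assumed location, e.g. /sys/fs/cgroup/memory/<group>/memory.stat;
    # adjust to the actual mount point and group on the target system.
    stats = parse_memory_stat(SAMPLE)
    print(stats["cache"] + stats["rss"])
```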
Accounting
• Each page belongs to one cgroup
• The first cgroup to access a page becomes its owner
• memory.move_charge_at_immigrate
Yasearch-specific problems & solutions
• memory.low_limit_in_bytes
• First accessed – owner? mlock()? low_limit?
• memory.recharge_on_pgfault
Compaction
• Physical pages migration to zone's top
• https://lwn.net/Articles/368869
• Broken in 3.3-3.7
• Replacement for lumpy reclaim
• Use perf top for problem diagnostics
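The idea of migrating pages toward the top of a zone can be shown with a toy model. This is a heavily simplified sketch, not the kernel algorithm: the zone is a Python list, `None` marks a free page frame, every allocated page is assumed to be movable, and the function name `compact` is illustrative. One index walks up from the bottom looking for allocated pages, another walks down from the top looking for free frames, and they stop when they meet.

```python
def compact(zone):
    """Return a copy of zone with allocated pages migrated toward the top.

    zone: list where None is a free frame and anything else is a movable page.
    """
    frames = list(zone)
    migrate = 0               # scans upward for pages to move
    free = len(frames) - 1    # scans downward for free target frames
    while migrate < free:
        if frames[migrate] is None:
            migrate += 1      # nothing to move here
        elif frames[free] is not None:
            free -= 1         # target frame is occupied
        else:
            frames[free] = frames[migrate]  # "migrate" the page
            frames[migrate] = None
            migrate += 1
            free -= 1
    return frames

if __name__ == "__main__":
    before = ["a", None, "b", None, None, "c"]
    print(before)
    print(compact(before))
```

After compaction the free frames form one contiguous run at the bottom of the zone, which is what makes higher-order allocations possible again.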
Thank you for your attention!