The Memory Manager in Windows Server 2003 and Windows Vista

Landy Wang Software Design Engineer Windows Kernel Team Microsoft Corporation

Outline

- Memory Manager (MM) improvements in Windows Server 2003 SP1
  - 64-bit Windows features and enhancements
  - NUMA and large page support added
  - Performance enhancements
  - Support for No Execute (NX) capability

2005 Microsoft Corporation

Outline (cont.)

- Memory Manager improvements planned for Windows Vista
  - Dynamic system address space
  - Kernel page table pages allocated on demand
  - Support for very large registries
  - NUMA and large page support enhancements
  - Advanced video model support
  - I/O and section access improvements
  - Performance improvements
  - Terminal Server improvements
  - Robustness and diagnosability improvements

Server 2003 SP1 Windows 64

- Windows 64-bit memory
  - 8TB user address space
  - 8TB kernel address space
  - 128GB pools
  - 128GB system page table entries (PTEs)
  - 1TB system cache
- Support for x64 platform added
- 4GB virtual address space added for 32-bit large address space aware applications
  - Further increases performance of WOW layer on both Itanium and x64 systems

Server 2003 SP1 NUMA & Large Page Support

- Large page support added for user images and pagefile-backed sections
- Large pages now also used in 32-bit, even when booted with /3GB switch, for
  - Kernel
  - Page Frame Number (PFN) database
  - Initial non-paged pool
- Prior large page support (added in Server 2003) was for the following
  - User private memory
  - Device driver image mappings
  - Kernel, when not booted with /3GB switch

Server 2003 SP1 NUMA & Large Page Support

- Pages zeroed in parallel and in a node-aware fashion during boot-up
  - Reduces boot time on large NUMA systems
- Physical pages initially consumed in top-down order, instead of bottom-up
  - Keeps more pages below 4GB available for drivers that require it
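
The effect of the consumption order can be illustrated with a toy model. This is not Windows code: the 256MB frame size, the 8GB machine, and the `consume` and `frames_below_4gb` helpers are all assumptions made up for the illustration. It only shows why taking frames from the top leaves more memory below 4GB for later requests.

```python
FRAME_SIZE = 256 * 1024**2                     # toy frame size (assumption), not a real page size
FOUR_GB_FRAME = (4 * 1024**3) // FRAME_SIZE    # index of the first frame at or above 4GB


def consume(free_frames, count, top_down):
    """Take `count` frames from the free set; return (taken, remaining)."""
    ordered = sorted(free_frames)
    if top_down:
        split = len(ordered) - count
        return ordered[split:], ordered[:split]   # highest frames go first
    return ordered[:count], ordered[count:]       # lowest frames go first


def frames_below_4gb(frames):
    """Count the frames that sit entirely below the 4GB boundary."""
    return sum(1 for frame in frames if frame < FOUR_GB_FRAME)


machine = range(32)                            # 8GB of RAM as 32 toy frames
_, left_top_down = consume(machine, 10, top_down=True)
_, left_bottom_up = consume(machine, 10, top_down=False)
print(frames_below_4gb(left_top_down), frames_below_4gb(left_bottom_up))  # prints: 16 6
```

After the same ten allocations, the top-down order still has every frame below 4GB free, while the bottom-up order has used up most of them.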

Server 2003 SP1 Performance Enhancements

- Working set management performance increases, especially in
  - Areas of large shared memory and when booted with /3GB switch
- Premature self-trimming and linear hash table walks eliminated
  - Major perf increases for apps like Exchange & SAP

Server 2003 SP1 Performance Enhancements

- Pool tagging paths parallelized
  - Introduced shared acquire mode for spinlocks
  - Employed for tag table updates
- Expand hash table for tagging large pages
  - When we detect searches are occurring, instead of waiting for the table to be entirely filled
- Overlapped asynchronous flushing for user requests to maximize I/O throughput
- Pagefiles zeroed in parallel instead of serially
  - Faster shutdown when "zero my pagefile" is set

Server 2003 SP1 Performance Enhancements

- Per-process working set lock used to synchronize PTE updates and working set list changes to an address space
  - System, session or process
- This lock converted from a mutex to a pushlock
  - Pushlocks support both shared and exclusive acquire modes
  - Mutexes support only exclusive acquisitions
- In conjunction with 2-byte interlocked operations this allows parallelization of many operations
  - MmProbeAndLockPages
    - Completely remove the PFN lock acquire from this very hot routine
  - MmUnlockPages
  - VirtualQuery
  - etc.
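
The difference between the two acquire modes can be sketched with a small shared/exclusive lock. This is only a model of the semantics described above, not how pushlocks are implemented in the kernel; the `SharedExclusiveLock` class and its method names are hypothetical.

```python
import threading


class SharedExclusiveLock:
    """Toy lock: many shared holders at once, or a single exclusive holder."""

    def __init__(self):
        self._cond = threading.Condition()
        self._shared = 0          # number of current shared holders
        self._exclusive = False   # True while an exclusive holder exists

    def acquire_shared(self, blocking=True):
        with self._cond:
            while self._exclusive:
                if not blocking:
                    return False
                self._cond.wait()
            self._shared += 1
            return True

    def acquire_exclusive(self, blocking=True):
        with self._cond:
            while self._exclusive or self._shared:
                if not blocking:
                    return False
                self._cond.wait()
            self._exclusive = True
            return True

    def release_shared(self):
        with self._cond:
            self._shared -= 1
            if self._shared == 0:
                self._cond.notify_all()   # wake any waiting exclusive acquirer

    def release_exclusive(self):
        with self._cond:
            self._exclusive = False
            self._cond.notify_all()       # wake shared and exclusive waiters
```

With a plain mutex, two read-only operations on the same address space would serialize. With shared mode, both can hold the lock together, and only operations that need exclusive access wait.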

Server 2003 SP1 Performance Enhancements

- Major PFN lock reduction to improve scalability
  - Reducing time held
  - Replacing acquisitions with lock-free or alternative lock sequences in many places & APIs
- Translation look-aside buffer (TLB) optimizations

Server 2003 SP1 Other MM Enhancements

- Support for no execute (NX) capability
- New Win32 SetThreadStackGuarantee API
  - Allows user applications & the CLR to specify guaranteed stack space requirements
  - Requirements honored even in low resource scenarios
- Support for hot-patching a running system
  - Patch system without reboot to reduce down time
  - Backported to Windows XP SP2

Windows Vista Dynamic Address Space

- System virtual address (VA) space allocated on demand
  - Instead of at boot time based on registry & configuration information
  - Region sizes bounded only by VA limitations
  - Applies to non-paged, paged, session space, mapped views, etc.
- Kernel page tables allocated on demand
  - No longer preallocated at system boot, saves
    - 1.5MB on x86 systems
    - 3MB on PAE systems
    - 16MB to 2.5GB on 64-bit machines
- Boot with very large registries on 32-bit machines, with and without /3GB switch
  - Important for large multipath LUN machines
  - MM locates registry VA space used by boot loader & reuses it as dynamic kernel virtual address space
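
Demand allocation of page tables can be sketched as follows. This is a simplified model, assuming the non-PAE x86 layout (4KB pages, 1024 PTEs per page table); the `LazyPageTables` class and its methods are hypothetical names, not kernel interfaces. A page table page is only created the first time an address in its range is mapped.

```python
PAGE_SIZE = 4096
PTES_PER_TABLE = 1024   # non-PAE x86: 4-byte PTEs, one 4KB page per page table


class LazyPageTables:
    """Two-level table whose second level is allocated on first use."""

    def __init__(self):
        self.tables = {}   # page-directory index -> list of PTEs

    def map_page(self, va, pfn):
        vpn = va // PAGE_SIZE
        pde_index, pte_index = divmod(vpn, PTES_PER_TABLE)
        table = self.tables.get(pde_index)
        if table is None:
            # First mapping in this 4MB range: allocate the page table now.
            table = [None] * PTES_PER_TABLE
            self.tables[pde_index] = table
        table[pte_index] = pfn

    def translate(self, va):
        vpn, offset = divmod(va, PAGE_SIZE)
        pde_index, pte_index = divmod(vpn, PTES_PER_TABLE)
        table = self.tables.get(pde_index)
        if table is None or table[pte_index] is None:
            return None   # no page table, or no PTE, for this address
        return table[pte_index] * PAGE_SIZE + offset

    def table_bytes(self):
        """Memory currently spent on page table pages."""
        return len(self.tables) * PAGE_SIZE
```

In this model an address range that is never used costs nothing, which is the property the slide describes for kernel page tables.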

Key Benefits of Dynamic Address Space

- No registry editing & reboots to reconfigure systems due to resource imbalances
- Maximum resources available in wide range of scenarios, w/ no human intervention
  - Desktop heap exhaustion
  - Terminal Server maximum scaling
  - Large video clients
  - /3GB SQL and Exchange machines
  - HTTP servers, NFS servers, etc.
- Features enabled w/o reboot, yet have no cost if not used
- 64-bit systems grow to maximum limit regardless of underlying physical configuration
  - 128GB paged pool, nonpaged pool
  - 1TB system cache/system PTEs/special pool
  - 128GB session pool
  - 128GB session views (desktop heaps), etc.

Windows Vista Planned Enhancements for NUMA, Large System, Large Page Support

- Initial nonpaged pool now NUMA aware, with separate VA ranges for each node
- Per-node look-asides for full pages
- Page table allocation for system PTEs, the system cache, etc. distributed across nodes
  - More even locality
  - Avoids exhausting free pages from the boot node
- NUMA-related APIs for device drivers
  - MmAllocateContiguousMemorySpecifyCacheNode
  - MmAllocatePagesForMdlEx
  - Default if no node is specified has been changed
    - From current processor to the thread's ideal processor
- Zeroing of pages for these APIs bounds number of threads more intelligently

Windows Vista Planned Enhancements for NUMA, Large System, Large Page Support

- Win32 APIs that specify nodes for allocations & mapped views on per VAD & per section basis
  - VirtualAllocExNuma
  - CreateFileMappingExNuma
  - MapViewOfFileExNuma
- Scalable query
  - QueryWorkingSetEx
- Higher perf for very physically sparse machines
  - Example: Hewlett-Packard Superdome
    - 1TB gaps between chunks of physical memory
- PFN database & initial nonpaged pool always mapped with large pages regardless of physical memory sparseness

Windows Vista Planned Enhancements for NUMA, Large System, Large Page Support

- /3GB mode on 32-bit systems supports up to 64GB of RAM
  - Booting in /3GB mode on 32-bit systems now supports up to 64GB of RAM instead of just 16GB
  - Booting without /3GB on 32-bit systems continues to support up to 128GB of RAM

Windows Vista Planned Enhancements for NUMA, Large System, Large Page Support

- Much faster large page allocations in kernel & user
- Support for cache-aligned pool allocation directives
- Data structures describing non-paged pool free list converted from linked list to bitmap
  - Reduced lock contention by over 50%
  - Bitmaps can be searched opportunistically lock-free
  - Costly combining of adjacent allocations on free no longer necessary
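
A minimal sketch of a bitmap-based free list shows why freeing no longer needs to combine adjacent blocks. The `BitmapPool` class is a hypothetical illustration, not the kernel's data structure, and it ignores locking entirely. Each bit tracks one page; clearing bits on free makes neighbouring free runs merge on their own.

```python
class BitmapPool:
    """Free-page bitmap: bit i set means page i is allocated."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.bits = 0   # all pages start out free

    def allocate(self, count):
        """Return the first index of `count` contiguous free pages, or None."""
        mask = (1 << count) - 1
        for start in range(self.num_pages - count + 1):
            if (self.bits & (mask << start)) == 0:
                self.bits |= mask << start   # mark the run as allocated
                return start
        return None

    def free(self, start, count):
        # Clearing the bits is the whole operation. Adjacent free runs
        # become one larger run implicitly, with no coalescing step.
        self.bits &= ~(((1 << count) - 1) << start)
```

With a linked list of free blocks, returning two neighbouring allocations would require finding and merging their list entries before a larger request could be satisfied. Here the larger run simply appears in the bitmap.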

Windows Vista New Video Model Support

- Dramatically different video architecture in Windows Vista
  - More fully exploits modern GPUs & virtual memory
- MM provides new mapping type
  - Rotate virtual address descriptors (VADs)
  - Allow video drivers to quickly switch user views from regular application memory into cached, non-cached, write-combined AGP or video RAM mappings
  - Allows video architecture to use GPU to rotate unneeded clients in and out on demand
- First time Windows-based OS has supported fully pageable mappings w/ arbitrary cache attributes

Windows Vista I/O Section Access Improvements

- Pervasive prefetch-style clustering for all types of page faults and system cache read ahead
- Major benefits over previous clustering
  - Infinite size read ahead instead of 64k max
  - Dummy page usage
    - So a single large I/O is always issued regardless of valid pages encountered in the cluster
  - Pages for the I/O are put in transition (not valid)
  - No VA space is required
    - If the pages are not subsequently referenced, no working set trim and TLB flush is needed either
- Further emphasizes that driver writers must be aware that MDL pages can have their contents change!
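
The dummy page idea can be sketched as building the frame list for one contiguous read. This is a simplified model, not the actual MDL construction: `build_cluster_read`, `DUMMY_FRAME`, and the frame numbers are hypothetical. Pages that are already valid are pointed at a shared throwaway frame, so the read stays a single I/O instead of being split around them.

```python
DUMMY_FRAME = "dummy"   # stand-in for the shared throwaway page


def build_cluster_read(cluster_pages, resident, new_frames):
    """Return one frame per page in the cluster, for a single read.

    cluster_pages: consecutive file page numbers covered by the read
    resident:      set of pages that are already valid in memory
    new_frames:    iterator yielding free frames for pages that need data
    """
    frames = []
    for page in cluster_pages:
        if page in resident:
            # Data for this page is read but discarded into the dummy frame.
            frames.append(DUMMY_FRAME)
        else:
            frames.append(next(new_frames))
    return frames


frames = build_cluster_read(range(10, 16), {12, 13}, iter([100, 101, 102, 103]))
print(frames)   # prints: [100, 101, 'dummy', 'dummy', 102, 103]
```

Because the same dummy frame can appear several times in one transfer and is overwritten each time, a driver that assumes the pages in a transfer hold stable, distinct contents would be wrong, which is consistent with the warning in the last bullet.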

Windows Vista I/O Section Access Improvements

- Significant changes in pagefile writing
  - Larger clusters up to 4GB
  - Align near neighbors
  - Sort by virtual address (VA)
  - Reduced fragmentation
  - Improved reads
- Cache manager read ahead size limitations in thread structure removed
- Improved synchronization between cache manager and memory manager data flushing to maximize filesystem/disk throughput and efficiency
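
The slide names sorting by VA and larger clusters but does not give the algorithm, so the following is only a guess at the general shape: sort the dirty pages by virtual address and group neighbours into one write each. The `cluster_by_va` function and its `max_cluster_pages` parameter are hypothetical.

```python
PAGE_SIZE = 4096


def cluster_by_va(dirty_vas, max_cluster_pages):
    """Sort page-aligned VAs and group adjacent pages into write clusters."""
    clusters = []
    for va in sorted(dirty_vas):
        last = clusters[-1] if clusters else None
        if last and va == last[-1] + PAGE_SIZE and len(last) < max_cluster_pages:
            last.append(va)          # extends the current run of neighbours
        else:
            clusters.append([va])    # gap in VA, or cluster is full: start a new one
    return clusters


dirty = [0x3000, 0x1000, 0x2000, 0x9000, 0xA000, 0x5000]
for cluster in cluster_by_va(dirty, max_cluster_pages=16):
    print([hex(va) for va in cluster])
```

Writing pages that are neighbours in the address space next to each other in the pagefile is what would let a later read bring them back with one larger I/O, which matches the "improved reads" bullet.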

Windows Vista I/O Section Access Improvements

- Mapped file writing and file flushing performance increases
  - Support for writes of any size up to 4GB instead of previous 64k limit per write
  - Multiple asynchronous flushes can be issued, both internally and by the caller, to satisfy a single call
- Pagefile fragmentation improvements
  - On dirty bit faults, we use interlocked queuing operation to free the pagefile space of the corresponding page
  - Avoids PFN lock acquisitions
  - Reduces needless pagefile fragmentation

Windows Vista I/O Section Access Improvements

- Elimination of pagefile writes and potential subsequent re-reads of completely zero pages
  - Check pages at trim time to see if they are all zero
  - Optimization used to make this nearly free
    - User virtual address used to check for the first and last ULONG_PTR being zero; if they both are, then
    - After the page is trimmed, and TLB invalidated, a kernel mapping used to make the final check of the entire page
    - Avoids needless scans & TLB flushes
      - We've measured over 90% success rate with this algorithm
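
The two-stage check can be sketched as a cheap probe followed by a full scan. This models only the logic, not the trimming, TLB invalidation, or kernel mapping steps; the function names are hypothetical, and `struct.calcsize("P")` is used as a stand-in for the size of a ULONG_PTR.

```python
import struct

PAGE_SIZE = 4096
WORD = struct.calcsize("P")   # pointer-sized word, standing in for ULONG_PTR


def quick_zero_hint(page):
    """Cheap first pass: are the first and last pointer-sized words zero?"""
    zero_word = bytes(WORD)
    return page[:WORD] == zero_word and page[-WORD:] == zero_word


def is_zero_page(page):
    """Run the full scan only when the cheap hint passes."""
    if not quick_zero_hint(page):
        return False          # most non-zero pages are rejected here
    return not any(page)      # final check of every byte in the page
```

The probe touches two words, so pages that fail it cost almost nothing. The full scan is reserved for pages that are likely to be zero.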

Windows Vista I/O Section Access Improvements

- Access to large section performance increases
  - A subsection is the name of the data structure used to describe on-disk file spans for sections
  - The subsection structure was converted
    - From a singly linked list (i.e., linear walk required)
    - To a balanced AVL tree
  - Enables huge performance gain for sections mapping large files
    - User mappings & flushes, system cache mappings, flushes & purges, section-based backups, etc.
- Mapped page writer does flushing based on a sweep hand
  - Data is written out much sooner than the prior 5 minute "flush everything" model
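
The gain from replacing the linear walk can be sketched by comparing two lookups over the same spans. This is an illustration only: a binary search over a sorted list stands in for the balanced AVL tree (both give logarithmic search), and the `SubsectionIndex` class and its span tuples are hypothetical.

```python
import bisect


class SubsectionIndex:
    """Maps a file offset to the span (subsection) that contains it."""

    def __init__(self, spans):
        # spans: non-overlapping (start_offset, length, name) tuples
        self.spans = sorted(spans)
        self.starts = [start for start, _, _ in self.spans]

    def find_linear(self, offset):
        """Linked-list style: walk the spans in order, counting steps."""
        steps = 0
        for start, length, name in self.spans:
            steps += 1
            if start <= offset < start + length:
                return name, steps
        return None, steps

    def find_sorted(self, offset):
        """Tree style: binary search over the sorted start offsets."""
        i = bisect.bisect_right(self.starts, offset) - 1
        if i >= 0:
            start, length, name = self.spans[i]
            if offset < start + length:
                return name
        return None


spans = [(i * 0x10000, 0x10000, f"sub{i}") for i in range(1000)]
index = SubsectionIndex(spans)
print(index.find_linear(999 * 0x10000 + 5))   # prints: ('sub999', 1000)
print(index.find_sorted(999 * 0x10000 + 5))   # prints: sub999
```

For a file described by 1000 spans, the linear walk visits every entry to reach the last one, while the sorted lookup needs about ten comparisons.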

Windows Vista I/O Section Access Improvements

- Dependencies between modified writer & mapped writer removed to
  - Increase parallelism
  - Reduce filesystem deadlock rules
  - Provide the cache manager with a way to influence which portions of files get written first
    - To optimize disk seek as well as avoiding valid data length extension costs

Windows Vista I/O Section Access Improvements

- Core support for Superfetch
  - Enables significantly faster app launch by deciding which pages should be prioritized
  - Provides mechanisms to pre-fetch pages and prevents premature cannibalization
  - Includes support for
    - Per page priorities
    - Access bit tracing
    - Private page pre-fetching
    - Section (including pagefile-backed) pre-fetching

Windows Vista Fast S4 Support

- Hibernation converted to use memory management mirroring facilities
- Hibernation time reduced by 2x, with 50% smaller hiber-file
- Resume time reductions

Windows Vista Internal Data Structure and Algorithmic Performance Enhancements

- Constant PFN lock time reduction always ongoing, has included areas like
  - User address space trimming and deletion
  - MEM_RESET
  - Page allocations
    - The PFN sharecount now uses interlocked updates instead of requiring the PFN lock, etc.
  - Page faults
  - Modified writes
  - Page color generation
  - MDL construction for fault I/Os, and so on
- Translation look-aside buffer (TLB) optimizations

Windows Vista Internal Data Structure and Algorithmic Performance Enhancements

- The per-process address space lock used to synchronize creation/deletion/changes to user address spaces
  - This lock was converted from a mutex to a pushlock
    - Pushlocks support both shared and exclusive acquire modes
    - Mutexes support only exclusive acquisitions
  - Allowed parallelization of many operations like VirtualAlloc, etc.
- VirtualAlloc support has been revamped to reduce
  - Conventional (non-AWE) allocations by over 30%
  - AWE allocations by over 2500% (not a typo)
- Address Windowing Extension (AWE) non-zeroed allocations are >10x faster than in SP1
  - Can now therefore be used for HTTP responses, for example

Windows Vista Internal Data Structure and Algorithmic Performance Enhancements

- PFN database contains information about all physical memory in the machine
- In the past, whenever a new page was needed:
  - The PFN spinlock was acquired
  - New page removed from appropriate list chained through PFN database
- This has been improved by adding a zero and free page SLIST for every NUMA node and page color
- Now obtain the page without needing the PFN lock in many instances where we need a single page
  - Demand zero faults, copy on write faults, etc.
  - For example, the fault processing path length is cut in half
    - Alleviates pressure on both the working set pushlock & PFN lock
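
The fast path described above can be modelled with one small free list per (node, color) pair and a fallback to a global list. This is a single-threaded sketch: the `PageAllocator` class is hypothetical, Python deques stand in for SLISTs, and a counter stands in for actually taking the PFN lock.

```python
from collections import deque


class PageAllocator:
    def __init__(self, nodes, colors):
        # One free list per (node, color), standing in for the per-node SLISTs.
        self.local = {(n, c): deque() for n in range(nodes) for c in range(colors)}
        self.global_list = deque()    # legacy list, guarded by the "PFN lock"
        self.pfn_lock_acquires = 0    # counts how often the slow path ran

    def add_free(self, pfn, node, color):
        self.local[(node, color)].append(pfn)

    def get_page(self, node, color):
        try:
            # Fast path: pop from the local list, no global lock involved.
            return self.local[(node, color)].pop()
        except IndexError:
            # Slow path: local list is empty, fall back under the global lock.
            self.pfn_lock_acquires += 1
            return self.global_list.pop() if self.global_list else None
```

As long as the local list for the requesting node and color has a page, a single-page request such as a demand zero fault never touches the global lock in this model.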

Windows Vista Terminal Services Improvements

- Added Terminal Server session objects
  - Enables various components to have secure session IDs and implement compartment IDs, for example
- Major overhaul of Terminal Server global-per-session image support
  - Eliminated multiple image control areas
    - To provide single image cache & fix flush/purge/truncate races
  - Only the shared subsections themselves are now per-session, instead of the entire image
  - Shared subsections use an AVL tree instead of a linked list, for faster searches
- Support for hot-patching of session-space drivers
- 64-bit Windows uses demand zero pages instead of pool for WOW64 page table bitmaps

Windows Vista Additional Robustness and Diagnosability

- Capability to mark system cache views as read only
  - Used by Registry to protect views from inadvertent driver corruption
- Reduced data loss in the face of crashes
  - Flush all modified data to its backing store (local & remote) if we are going to bugcheck due to a failed inpage
  - Only failed inpages of kernel and/or drivers are fatal
    - Failed inpages of user process code/data merely result in an inpage exception being handed to the application
- Commit thresholds now reflected in global named events
  - Apps can use this to monitor the system

Windows Vista Additional Robustness and Diagnosability

- .pagein debugger support for kernel/driver addresses added
  - Allows for viewing memory addresses which have been paged out to disk when debugging crashes

Call to Action

- Consider these significant Memory Manager enhancements as you develop drivers for Windows Server 2003 and Windows Vista
- Use new APIs when available in Windows Vista

2005 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.