case comp


Upload: sharma-sudhir

Post on 03-May-2017


Page 1: Case Comp

Windows Operating System Internals - by David A. Solomon and Mark E. Russinovich with Andreas Polze

CSE 5343/7343, Fall 2006

Case Studies

Comparing Windows XP and Linux

Page 2: Case Comp

Copyright Notice

© 2000-2005 David A. Solomon and Mark Russinovich

These materials are part of the Windows Operating System Internals Curriculum Development Kit, developed by David A. Solomon and Mark E. Russinovich with Andreas Polze

Microsoft has licensed these materials from David Solomon Expert Seminars, Inc. for distribution to academic organizations solely for use in academic environments (and not for commercial use)

Page 3: Case Comp

Background

Architecture

Page 4: Case Comp

Linus and Linux

In 1991 Linus Torvalds took a college computer science course that used the Minix operating system

Minix is a “toy” UNIX-like OS written by Andrew Tanenbaum as a learning workbench

Linus wanted to make Minix more usable, but Tanenbaum wanted to keep it ultra-simple

Linus went in his own direction and began working on Linux

In October 1991 he announced Linux v0.02

In March 1994 he released Linux v1.0

Page 5: Case Comp

Windows and Linux

Both Linux and Windows are based on foundations developed in the mid-1970s

[Figure: two timelines spanning 1970 to 2000. UNIX/Linux line: UNIX born, UNIX public, UNIX V6, Linux v1.0, v2.0, v2.1, v2.2, v2.3, v2.4, v2.6. VMS/Windows line: VMS v1.0, Windows NT 3.1, NT 4.0, Windows 2000, Windows XP, Server 2003.]

Page 6: Case Comp

Comparing the Architectures

Both Linux and Windows are monolithic

All core operating system services run in a shared address space in kernel-mode

All core operating system services are part of a single module

Linux: vmlinuz

Windows: ntoskrnl.exe

Windowing is handled differently:

Windows has a kernel-mode Windowing subsystem

Linux has a user-mode X-Windowing system

Page 7: Case Comp

Kernel Architectures

[Figure: side-by-side layer diagrams. Linux: Application and X-Windows in user mode; System Services, Process Management / Memory Management / I/O Management etc., Device Drivers, and Hardware Dependent Code in kernel mode. Windows: Application in user mode; System Services, Win32 Windowing, Process Management / Memory Management / I/O Management etc., Device Drivers, and Hardware Dependent Code in kernel mode.]

Page 8: Case Comp

Linux Kernel

Linux is a monolithic but modular system

All kernel subsystems form a single piece of code with no protection between them

Modularity is supported in two ways:

Compile-time options

Most kernel components can be built as a dynamically loadable kernel module (DLKM)

DLKMs

Built separately from the main kernel

Loaded into the kernel at runtime and on demand (infrequently used components take up kernel memory only when needed)

Kernel modules can be upgraded incrementally

Support for minimal kernels that automatically adapt to the machine and load only those kernel components that are used

Page 9: Case Comp

Windows Kernel

Windows is a monolithic but modular system

No protection among pieces of kernel code and drivers

Support for modularity is somewhat weak:

Windows drivers allow for dynamic extension of kernel functionality

Windows XP Embedded has special tools / packaging rules that allow coarse-grained configuration of the OS

Windows drivers are dynamically loadable kernel modules

Significant amount of code runs as drivers (including network stacks such as TCP/IP and many services)

Built independently from the kernel

Can be loaded on demand

Dependencies among drivers can be specified

Page 10: Case Comp

Comparing Portability

Both Linux and Windows kernels are portable

Mainly written in C

Have been ported to a range of processor architectures

Windows

i486, MIPS, PowerPC, Alpha, IA-64, x86-64

Only x86, x86-64 and IA-64 are currently supported

> 64MB memory required

Linux

Alpha, ARM, ARM26, CRIS, H8300, i386, IA-64, M68000, MIPS, PA-RISC, PowerPC, S/390, SuperH, SPARC, VAX, v850, x86-64

DLKMs allow for minimal kernels for microcontrollers

> 4MB memory required

Page 11: Case Comp

Comparing Layering, APIs, Complexity

Windows

Kernel exports about 250 system calls (accessed via ntdll.dll)

Layered Windows/POSIX subsystems

Rich Windows API (17,500 functions on top of native APIs)

Linux

Kernel supports about 200 different system calls

Layered BSD, Unix Sys V, POSIX shared system libraries

Compact APIs (1742 functions in Single Unix Specification Version 3; not including X Window APIs)

Page 12: Case Comp

Comparing Architectures

Processes and scheduling

SMP support

Memory management

I/O

File Caching

Security

Page 13: Case Comp

Process Management

Page 14: Case Comp

Process Management

Windows

Process

Address space, handle table, statistics and at least one thread

No inherent parent/child relationship

Threads

Basic scheduling unit

Fibers - cooperative user-mode threads

Linux

Process is called a Task

Basic address space, handle table, statistics

Parent/child relationship

Basic scheduling unit

Threads

No threads per se

Tasks can act like Windows threads by sharing handle table, PID and address space

PThreads - cooperative user-mode threads

Page 15: Case Comp

Scheduling Priorities

Windows

Two scheduling classes

“Real time” (fixed) - priority 16-31

Dynamic - priority 1-15

Higher priorities are favored

Priorities of dynamic threads get boosted on wakeups

Thread priorities are never lowered below their base priority

[Figure: Windows priority scale from 0 to 31; labels: Fixed (16-31), Dynamic (up to 15), I/O.]

Page 16: Case Comp

Scheduling Priorities

Windows

Two scheduling classes

“Real time” (fixed) - priority 16-31

Dynamic - priority 1-15

Higher priorities are favored

Priorities of dynamic threads get boosted on wakeups

Thread priorities are never lowered below their base priority

Linux

Has 3 scheduling classes:

Normal - priority 100-139

Fixed Round Robin - priority 0-99

Fixed FIFO - priority 0-99

Lower priorities are favored

Priorities of normal threads go up (decay) as they use CPU

Priorities of interactive threads go down (boost)

Page 17: Case Comp

Scheduling Priorities (cont)

[Figure: priority scales side by side. Windows, 0 to 31: Fixed (16-31), Dynamic (up to 15), I/O label. Linux, 0 to 140: Fixed FIFO and Fixed Round-Robin (0-99), Normal (100-140), with CPU and I/O labels.]

Page 18: Case Comp

Linux Scheduling Details

Most threads use a dynamic priority policy

Normal class - similar to the classic UNIX scheduler

A newly created thread starts with a base priority

Threads that block frequently (I/O bound) will have their priority gradually increased

Threads that always exhaust their time slice (CPU bound) will have their priority gradually decreased

“Nice value” sets a thread’s base priority

Larger values = less priority, lower values = higher priority

Valid nice values are in the range of -20 to +19

Nonprivileged users can only specify positive nice values

Dynamic priority policy threads have static priority zero

Execute only when there are no runnable real-time threads
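The relationship between a nice value and the Normal-class priority band (100-139) can be sketched as a simple offset. This is a minimal illustration, assuming the Linux 2.6 convention that nice 0 corresponds to priority 120; the names nice_to_priority and more_favored are hypothetical helpers, not kernel functions.

```python
MAX_RT_PRIO = 100            # priorities 0-99 belong to the fixed (real-time) classes
NICE_MIN, NICE_MAX = -20, 19


def nice_to_priority(nice):
    """Map a nice value onto the Normal class band (100-139)."""
    if not NICE_MIN <= nice <= NICE_MAX:
        raise ValueError("nice value out of range")
    # Assumption: nice 0 sits in the middle of the band, at priority 120.
    return MAX_RT_PRIO + 20 + nice


def more_favored(priority_a, priority_b):
    """On Linux the numerically lower priority is the one that runs first."""
    return min(priority_a, priority_b)
```

For example, nice_to_priority(0) gives 120, and a thread at nice -20 lands on 100, the most favored Normal priority.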

Page 19: Case Comp

Real-Time Scheduling on Linux

Linux supports two static priority scheduling policies:

Round-robin and FIFO (first in, first out)

Selected with the sched_setscheduler() system call

Use static priority values in the range of 1 to 99

Executed strictly in order of decreasing static priority

FIFO policy lets a thread run to completion unless it blocks or a higher-priority thread becomes runnable

A thread gives up the CPU voluntarily by calling sched_yield()

Round-robin lets threads run for up to one time slice

Then switches to the next thread with the same static priority

RT threads can easily starve lower-priority threads from executing

Root privileges or the CAP_SYS_NICE capability are required for the selection of a real-time scheduling policy

Long running system calls can cause priority inversion

Same as in Windows; but compare RTLinux
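The static priority range and the privilege requirement can be probed from user space. The sketch below uses Python's os wrappers around the scheduler calls (os.sched_get_priority_min, os.sched_get_priority_max, os.sched_setscheduler), which exist only on platforms that provide them. The helper names rt_priority_range and try_make_realtime are hypothetical, and the code assumes a Linux host for meaningful results.

```python
import os


def rt_priority_range(policy_name="SCHED_FIFO"):
    """Return (min, max) static priority for a policy, or None if unsupported here."""
    policy = getattr(os, policy_name, None)
    if policy is None or not hasattr(os, "sched_get_priority_min"):
        return None
    return os.sched_get_priority_min(policy), os.sched_get_priority_max(policy)


def try_make_realtime(priority):
    """Try to move the calling process to SCHED_FIFO; return True on success.

    Without root or CAP_SYS_NICE the call is expected to fail with an OSError.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False
    return True
```

Calling try_make_realtime from an unprivileged shell is a quick way to see the privilege check in action; it is deliberately not invoked automatically here because it changes the scheduling policy of the running process.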

Page 20: Case Comp

Windows Scheduling Details

Most threads run in variable priority levels

Priorities 1-15

A newly created thread starts with a base priority

Threads that complete I/O operations experience priority boosts (but never higher than 15)

A thread’s priority will never be below base priority

The Windows API function SetThreadPriority() sets the priority value for a specified thread

This value, together with the priority class of the thread's process, determines the thread's base priority level

Windows will dynamically adjust priorities for non-realtime threads
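How the process priority class and the per-thread value combine into a base priority can be sketched as a table lookup. The numbers below follow the scheduling-priorities table in the Windows API documentation as best recalled; the function name base_priority and the string keys are hypothetical, and the finer-grained offsets available to the realtime class are left out.

```python
# Base priority of a thread whose relative priority is "NORMAL", per process class.
CLASS_BASE = {
    "IDLE": 4,
    "BELOW_NORMAL": 6,
    "NORMAL": 8,
    "ABOVE_NORMAL": 10,
    "HIGH": 13,
    "REALTIME": 24,
}

# Offsets applied by the relative thread priority.
RELATIVE = {
    "LOWEST": -2,
    "BELOW_NORMAL": -1,
    "NORMAL": 0,
    "ABOVE_NORMAL": 1,
    "HIGHEST": 2,
}


def base_priority(priority_class, thread_priority):
    """Combine a process priority class and a thread priority into a base level."""
    realtime = priority_class == "REALTIME"
    # The two extreme thread priorities saturate at the ends of the range.
    if thread_priority == "IDLE":
        return 16 if realtime else 1
    if thread_priority == "TIME_CRITICAL":
        return 31 if realtime else 15
    return CLASS_BASE[priority_class] + RELATIVE[thread_priority]
```

A normal thread in a normal process therefore starts at 8, and no combination outside the realtime class exceeds 15, which matches the dynamic range given above.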

Page 21: Case Comp

Real-Time Scheduling on Windows

Windows supports static round-robin scheduling policy for threads with priorities in real-time range (16-31)

Threads run for up to one quantum

Quantum is reset to full turn on preemption

Priorities never get boosted

RT threads can starve important system services

Such as CSRSS.EXE

SeIncreaseBasePriorityPrivilege required to elevate a thread’s priority into real-time range (this privilege is assigned to members of Administrators group)

System calls and DPC/APC handling can cause priority inversion

Page 22: Case Comp

Scheduling Timeslices

Windows

The thread timeslice (quantum) is 10ms-120ms

When quanta can vary, has one of 2 values

Reentrant and preemptible

[Figure: Windows quantum lengths. Fixed: 120ms. Variable: Foreground 60ms, Background 20ms.]

Linux

The thread quantum is 10ms-200ms

Default is 100ms

Varies across entire range based on priority, which is based on interactivity level

Reentrant and preemptible

[Figure: Linux quantum range from 10ms to 200ms, with the default at 100ms.]

Page 23: Case Comp

Kernel Reentrancy

Mark Russinovich’s April 1999 Windows NT Magazine article, “Linux and the Enterprise”, pointed out that much of the Linux 2.2 kernel was not reentrant

Ingo Molnar stated in rebuttal:

“his example is a clear red herring.”

A month later he made all major paths reentrant

[Figure: execution timelines for cpu 1 and cpu 2 under a non-reentrant and a reentrant kernel; the reentrant case shows the time saved.]

Page 24: Case Comp

Kernel Preemptibility

A preemptible kernel is more responsive to high-priority tasks

Through the base release of v2.4 Linux was only cooperatively preemptible

There are well-defined safe places where a thread running in the kernel can be preempted

The kernel is preemptible in v2.4 patches and v2.6

Windows NT has always been preemptible

Page 25: Case Comp

Scheduling

The Linux 2.4 scheduler is O(n)

If there are 10 active tasks, it scans 10 of them in a list in order to decide which should execute next

This means long scans and long durations under the scheduler lock

[Figure: ready list holding tasks with priorities 103, 112, 112, 101; the whole list is scanned to find the highest-priority task.]

Page 26: Case Comp

Scheduling

Linux 2.6 has a revamped scheduler that’s O(1) from Ingo Molnar that:

Calculates a task’s priority at the time it makes a scheduling decision

Has per-CPU ready queues where the tasks are pre-sorted by priority

[Figure: per-priority queues holding tasks 101, 103, and 112, 112; the scheduler picks from the highest-priority non-empty queue.]
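The contrast between scanning a list and picking from pre-sorted queues can be sketched in a few lines. This is an illustrative data structure only, assuming one FIFO queue per priority level plus a bitmap of non-empty levels; PriorityArray and pick_next_linear are hypothetical names and the code is not taken from either kernel.

```python
from collections import deque


class PriorityArray:
    """Ready-queue sketch: one FIFO per priority plus a bitmap of non-empty levels."""

    def __init__(self, levels=140):
        self.queues = [deque() for _ in range(levels)]
        self.bitmap = 0  # bit p is set while queue p holds at least one task

    def enqueue(self, task, priority):
        self.queues[priority].append(task)
        self.bitmap |= 1 << priority

    def pick_next(self):
        """Return the next task from the most favored (lowest-numbered) non-empty queue."""
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit; its index is the priority to serve.
        priority = (self.bitmap & -self.bitmap).bit_length() - 1
        queue = self.queues[priority]
        task = queue.popleft()
        if not queue:
            self.bitmap &= ~(1 << priority)
        return task


def pick_next_linear(ready_list):
    """O(n) counterpart: scan every (task, priority) pair in the ready list."""
    return min(ready_list, key=lambda item: item[1])[0]
```

With the priorities from the figure (103, 112, 112, 101), both approaches select the task at 101 first; the difference is that pick_next does not look at the other tasks to find it.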

Page 27: Case Comp

Scheduling

Windows NT has always had an O(1) scheduler based on pre-sorted thread priority queues

Server 2003 introduced per-CPU ready queues

Linux load balances queues

Windows does not

Not seen as an issue in performance testing by Microsoft

Applications where it might be an issue are expected to use affinity

Page 28: Case Comp

Zero-Copy Sendfile

Linux 2.2 introduced Sendfile to efficiently send file data over a socket

I pointed out that the initial implementation incurred a copy operation, even if the file data was cached

Linux 2.4 introduced zero-copy Sendfile

Windows NT pioneered zero-copy file sending with TransmitFile, the Sendfile equivalent, in Windows NT 4

[Figure: 1-copy path: File Data Buffer is copied to a Network Adapter Buffer, then passed through the Network Driver to the Network. 0-copy path: File Data Buffer is handed directly to the Network Driver, then to the Network.]
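From an application's point of view, Sendfile replaces a read-then-write loop with a single call. The sketch below uses Python's socket.sendfile(), which uses the operating system's sendfile call where one is available and falls back to ordinary sends otherwise; whether the transfer is truly zero-copy depends on the kernel and the socket type. The function name send_file_over_socket is hypothetical, and a local socket pair stands in for a network connection.

```python
import socket
import tempfile


def send_file_over_socket(payload):
    """Write payload to a temporary file, send the file over a socket, return what arrived.

    Keep payload small: sender and receiver share one thread here, so the data
    must fit in the socket buffer.
    """
    sender, receiver = socket.socketpair()
    with sender, receiver, tempfile.TemporaryFile() as f:
        f.write(payload)
        f.flush()
        f.seek(0)
        sent = sender.sendfile(f)          # one call instead of a read/write loop
        sender.shutdown(socket.SHUT_WR)    # signal end of data to the receiver
        chunks = []
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    if sent != len(payload):
        raise RuntimeError("short send")
    return b"".join(chunks)
```

A real server would pass the accepted client socket and an open file instead of the temporary stand-ins used here.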

Page 29: Case Comp

Wake-one Socket Semantics

Linux 2.2 kernel had the thundering herd or overscheduling problem

In a network server application there are typically several threads waiting for a new connection

In v2.2 when a new connection came in all the waiters would race to get it

Ingo Molnar’s response:

5/2/99: “here he again forgets to _prove_ that overscheduling happens in Linux.”

5/7/99: “as of 2.3.1 my wake-one implementation and waitqueues rewrite went in”

In Linux 2.4 only one thread wakes up to claim the new connection

Windows NT has always had wake-1 semantics
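The difference between waking every waiter and waking one can be imitated in user space. The sketch below is an analogy built on Python's threading.Condition, not kernel code: notify_all() plays the thundering herd and notify(1) plays wake-one. The function count_wakeups and its parameters are hypothetical, and it assumes at least one waiter.

```python
import threading
import time


def count_wakeups(n_waiters, wake_all):
    """Park n_waiters threads, deliver one 'connection', and count who wakes for it."""
    cond = threading.Condition()
    claimed = threading.Event()
    state = {"waiting": 0, "wakeups": 0, "connections": 0, "stop": False}

    def worker():
        with cond:
            state["waiting"] += 1
            cond.wait()                      # releases the lock while parked
            if state["stop"]:
                return                       # woken only to shut down, not counted
            state["wakeups"] += 1
            if state["connections"] > 0:     # the first thread to get here wins the race
                state["connections"] -= 1
                claimed.set()

    threads = [threading.Thread(target=worker) for _ in range(n_waiters)]
    for t in threads:
        t.start()
    while True:                              # wait until every worker is parked
        with cond:
            if state["waiting"] == n_waiters:
                break
        time.sleep(0.001)

    with cond:
        state["connections"] = 1
        if wake_all:
            cond.notify_all()                # every waiter wakes and races
        else:
            cond.notify(1)                   # exactly one waiter wakes
    claimed.wait()

    if not wake_all:
        with cond:                           # release the waiters that were never woken
            state["stop"] = True
            cond.notify_all()
    for t in threads:
        t.join()
    return state["wakeups"]
```

With eight waiters, the wake-all variant reports eight wakeups for a single connection, while the wake-one variant reports one; the seven extra wakeups are the overscheduling cost.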

Page 30: Case Comp

Light-Weight Synchronization

Linux 2.6 introduces Futexes

There’s only a transition to kernel-mode when there’s contention

Windows has always had CriticalSections

Same behavior

Futexes go further:

Allow for prioritization of waits

Work interprocess as well

Page 31: Case Comp

Memory Management

Page 32: Case Comp

Virtual Memory Management

Windows

32-bit versions split user-mode/kernel-mode from 2GB/2GB to 3GB/1GB

Demand-paged virtual memory

32 or 64 bits

Copy-on-write

Shared memory

Memory mapped files

[Figure: Windows 4GB address space; User from 0 to 2GB, System from 2GB to 4GB.]

Linux

Splits user-mode/kernel-mode from 1GB/3GB to 3GB/1GB

2.6 has “4/4 split” option where kernel has its own address space

Demand-paged virtual memory

32 bits and/or 64 bits

Copy-on-write

Shared memory

Memory mapped files

[Figure: Linux 4GB address space; User from 0 to 3GB, System from 3GB to 4GB.]

Page 33: Case Comp

Physical Memory Management

Windows

Per-process working sets

Working set tuner adjusts sets according to memory needs using the “clock” algorithm

No “swapper”

[Figure: a process's own LRU list supplies the reused page.]

Linux

Global working set management

Uses “clock” algorithm

No “swapper” (the working set trimmer code is called the swap daemon, however)

[Figure: LRU lists of the process and of other processes; the reused page may come from another process.]
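The “clock” algorithm named for both systems approximates LRU with one reference bit per page frame. The sketch below is a textbook second-chance version under simplifying assumptions (a fixed number of frames, a newly loaded page starts with its bit set); ClockCache is a hypothetical class and does not model either kernel's actual trimmer.

```python
class ClockCache:
    """Second-chance ('clock') page replacement over a fixed set of frames."""

    def __init__(self, frames):
        self.frames = [None] * frames        # page held by each frame
        self.referenced = [False] * frames   # one reference bit per frame
        self.hand = 0                        # position of the clock hand
        self.where = {}                      # page -> frame index

    def access(self, page):
        """Touch a page; return the page evicted to make room, or None."""
        if page in self.where:
            self.referenced[self.where[page]] = True
            return None
        # Sweep the hand: referenced frames lose their bit and get a second chance.
        while self.referenced[self.hand]:
            self.referenced[self.hand] = False
            self.hand = (self.hand + 1) % len(self.frames)
        victim = self.frames[self.hand]
        if victim is not None:
            del self.where[victim]
        self.frames[self.hand] = page
        self.referenced[self.hand] = True
        self.where[page] = self.hand
        self.hand = (self.hand + 1) % len(self.frames)
        return victim
```

A per-process policy would keep one such structure per process, while a global policy would run a single one over all pages; that is the distinction the two columns above draw.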

Page 34: Case Comp

I/O Management

Windows

Centered around the file object

Layered driver architecture throughout driver types

Most I/O supports asynchronous operation

Internal interrupt request level (IRQL) controls interruptibility

Interrupts are split between an Interrupt Service Routine (ISR) and a Deferred Procedure Call (DPC)

Supports plug-and-play

Linux

Centered around the vnode

No layered I/O model

Most I/O is synchronous

Only sockets and direct disk I/O support asynchronous I/O

Internal interrupt request level (IRQL) controls interruptibility

Interrupts are split between an ISR and soft IRQ or tasklet

Supports plug-and-play

[Figure labels: IRQL, Masked.]

Page 35: Case Comp

I/O & File System Management

Page 36: Case Comp

File Caching

Windows

Single global common cache

Virtual file cache

Caching is at file vs. disk block level

Files are memory mapped into kernel memory

Cache allows for zero-copy file serving

[Figure: File Cache above File System Driver above Disk Driver.]

Linux

Single global common cache

Virtual file cache

Caching is at file vs. disk block level

Files are memory mapped into kernel memory

Cache allows for zero-copy file serving

[Figure: File Cache above File System Driver above Disk Driver.]

Page 37: Case Comp

Monitoring - Linux procfs

Linux supports a number of special filesystems

Like special files, they are of a more dynamic nature and tend to have side effects when accessed

Prime example is procfs (mounted at /proc)

Provides access to and control over various aspects of Linux (e.g., scheduling and memory management)

/proc/meminfo contains detailed statistics on the current memory usage of Linux

Content changes as memory usage changes over time

Services for Unix implements procfs on Windows
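Because /proc/meminfo is plain text, monitoring it needs nothing beyond file reads. The sketch below assumes the usual "Name: value kB" line layout; parse_meminfo and read_meminfo are hypothetical helper names, and read_meminfo returns None on systems without procfs.

```python
def parse_meminfo(text):
    """Turn 'Name:   value kB' lines into a dict of byte counts (or plain counts)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue                      # skip blank or malformed lines
        value = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024                 # normalize kB figures to bytes
        info[key.strip()] = value
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file; return None where it does not exist."""
    try:
        with open(path) as f:
            return parse_meminfo(f.read())
    except OSError:
        return None
```

Calling read_meminfo() twice a few seconds apart on a busy Linux machine typically shows different values, which is the dynamic behavior described above.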

Page 38: Case Comp

I/O Processing

Linux 2.2 had the notion of bottom halves (BH) for low-priority interrupt processing

Fixed number of BHs

Only one BH of a given type could be active on an SMP system

Linux 2.4 introduced tasklets, which are non-preemptible procedures called with interrupts enabled

Tasklets are the equivalent of Windows Deferred Procedure Calls (DPCs)

Page 39: Case Comp

Asynchronous I/O

Linux 2.2 only supported asynchronous I/O on socket connect operations and tty’s

Linux 2.6 adds asynchronous I/O for direct-disk access

AIO model includes efficient management of asynchronous I/O

Also added alternate epoll model

Useful for database servers managing their database on a dedicated raw partition

Database servers that manage a file-based database suffer from synchronous I/O

Windows I/O is inherently asynchronous

Windows has had completion ports since NT 3.5

More advanced form of AIO
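The epoll model is readiness notification: the application registers descriptors and is told which ones can be read without blocking. The sketch below uses Python's selectors module, whose DefaultSelector picks epoll on Linux when it is available and another mechanism elsewhere. The function wait_for_readable is a hypothetical name, and a local socket pair stands in for a client connection.

```python
import selectors
import socket


def wait_for_readable(timeout=1.0):
    """Register one socket, show it is idle, then show it become readable."""
    sel = selectors.DefaultSelector()
    a, b = socket.socketpair()
    try:
        b.setblocking(False)
        sel.register(b, selectors.EVENT_READ, data="peer")
        idle = sel.select(timeout=0)          # nothing has been sent yet
        a.sendall(b"ping")
        ready = sel.select(timeout=timeout)   # returns once data is waiting
        received = [(key.data, key.fileobj.recv(16)) for key, _ in ready]
        return len(idle), received
    finally:
        sel.close()
        a.close()
        b.close()
```

This differs from completion-based designs such as completion ports, where the application is notified after the transfer has finished rather than when it may start.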

Page 40: Case Comp

Security

Page 41: Case Comp

Security

Windows

Very flexible security model based on Access Control Lists

Users are defined with

Privileges

Member groups

Security can be applied to any Object Manager object

Files, processes, synchronization objects, …

Supports auditing

Linux

Two models:

Standard UNIX model

Access Control Lists (SELinux)

Users are defined with:

Capabilities (privileges)

Member groups

Security is implemented on an object-by-object basis

Has no built-in auditing support

Version 2.6 includes Linux Security Module framework for add-on security models
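The standard UNIX model reduces every access decision to three permission triplets (owner, group, other). The sketch below is a simplified reading of that check, assuming no superuser override, no capabilities and no ACLs; unix_allows and its parameter names are hypothetical.

```python
def unix_allows(mode, owner_uid, owner_gid, uid, gids, want):
    """Decide access from classic mode bits; want is 'r', 'w' or 'x'."""
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if uid == owner_uid:
        shift = 6          # owner triplet applies, and only that one
    elif owner_gid in gids:
        shift = 3          # group triplet
    else:
        shift = 0          # everyone else
    return bool((mode >> shift) & bit)
```

For a file with mode 0o640, the owner may write, a group member may only read, and other users get nothing. An ACL-based model replaces the three fixed triplets with an arbitrary list of entries per object.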

Page 42: Case Comp

A Look at the Future

The kernel architectures are fundamentally similar

There are differences in the details

Linux implementation is adopting more of the good ideas used in Windows

For the next 2-4 years Windows has and will maintain an edge

Linux is still behind on the cutting edge of performance tricks

Large performance team and lab at Microsoft has direct ties into the kernel developers

As time goes on the technological gap will narrow

Open Source Development Labs (OSDL) will feed performance test results to the kernel team

IBM and other vendors have Linux technology centers

Squeezing performance out of the OS gets much harder as the OS gets more tuned

Page 43: Case Comp

Linux Technology Unknowns

Linux kernel forking

Red Hat has already done it: Red Hat Enterprise Server v3.0 is Linux 2.4 with some Linux 2.6 features

Backward compatibility philosophy

Linus Torvalds makes decisions on kernel APIs and architecture based on technical reasons, not business reasons

Page 44: Case Comp

Further Reading

Transaction Processing Council: www.tpc.org

SPEC: www.spec.org

NT vs Linux benchmarks: www.kegel.com/nt-linux-benchmarks.html

The C10K problem: http://www.kegel.com/c10k.html

Linus Torvalds’s home: http://www.osdl.org/

Linux Kernel Archives: http://www.kernel.org/

Linux history: http://www.firstmonday.dk/issues/issue5_11/moon/

Veritest Netbench result: http://www.veritest.com/clients/reports/microsoft/ms_netbench.pdf

Mark Russinovich’s 1999 article, “Linux and the Enterprise”: http://www.winntmag.com/Articles/Index.cfm?ArticleID=5048

The Open Group's Single UNIX Specification: http://www.unix.org/version3/