Sun Solaris OS

Glenn Barney

gb2174@columbia.edu COMS E6998.002 : Advanced Computer Design

Metrics

• Sun focused on 5 major design areas

– Performance
– Security
  - Prevent
  - Detect
  - Respond
– Availability
– Utilization
– Platform Choice
  - Hardware Compatibility List
  - 716 x86/x64 systems, 75 SPARC systems

• Major metric successes are Security and Availability. Performance and Utilization are a bit more questionable, but still very good as we’ll see.

History of Solaris

• It’s a Unix OS that is an amalgam of earlier Unix-based OSs, but mainly Sun’s first OS, SunOS, which was based on BSD, and AT&T’s Unix, System V.

• The general timeline:

– 1970 to 1979 – Unix is first written in assembly and then in C by Ritchie and Thompson.

– 1982 – Bill Joy leaves Berkeley, co-founds Sun, and develops SunOS based on BSD.

– 1984 to 1987 – AT&T develops and releases System V, which competes with BSD until the mid 90s.

– 1988 – AT&T purchases a large stake in Sun.

– 1993 – Sun announces the first version of Solaris, which will no longer be based on BSD but mainly on System V Release 4, a mix of other Unix distributions. The competing Unix standards group, OSF, begins a GUI war with Sun, supporting its own MOTIF/X against Sun’s OPEN LOOK.

– 1994 – Sun creates the Common Desktop Environment to support both MOTIF and OPEN LOOK; by Solaris 5 it’s officially supported.

The Solaris Gestalt

• Pulled in from BSD

– Virtual memory system

– Fast file system with symbolic links

– TCP/IP networking system with Kerberos, Telnet, FTP, sendmail.

– Alternate shells to Bourne shell (C shell)

– Vendor products like NFS (from SUN)

– and symmetric multiprocessing support, thread management and shared libraries

• Pulled in from System V

– Interprocess Communication

– Bourne shell enhancements

– STREAMS and TLI networking libraries

– Remote File Sharing

– Improved memory paging

– Application Binary Interface

Created by Sun for the SunOS

•SunOS 4.x

–NFS

–OpenWindows 2.0 GUI

–OpenBoot monitor

–DeskSet Utilities

–Multiprocessing Support

•SunOS 5.x (i.e. Solaris)

–SMP for more than 100 processors in a single server

–CDE (Motif, PostScript, Open Look)

–Gnome 2.0 to support Linux integration

–Network Information Service (NIS)

–Clustering

–Java

–Ever growing list of new features

Some General Solaris Tidbits

• Solaris 10 does not support old Sun hardware. Chipsets it does support: UltraSPARC II, III, IV and newer, 32-bit Intel x86, and 64-bit AMD Opteron.

• Of course old 32 bit SPARC programs are still supported

• Sun does support mainframe-style batch jobs such as JCL through Sun MBM, which preserves batch step constructs on Sun systems

• Load balancing seems to require a third party application

• Sun Network Cache and Accelerator (SNCA) since Solaris 8 helps cache and serve web pages, but doesn’t do load balancing per se

Solaris Overview

• Processor/Platform Specific code – less than 5% of the kernel, developed to adapt to different hardware platforms

• Device Drivers - dynamically loaded and use a common published interface

• File System and Volume Management – treats a large number of disks as a single volume. The Virtual File System supports unlimited file system extensions: UFS, NFS, Sun StorEdge file systems, PC file systems, etc. New Zettabyte File System.

• Unified TCP/IP Stack

•Linux System Call Handler is in-kernel; it catches Linux system calls and dispatches the equivalent Solaris kernel functions

•DTrace debugging system, new for Solaris 10: a clean and modular pre-deployed global debugging solution at minimum runtime cost.

Solaris Modular Kernel

•Seven types of loadable modules

•Scheduling classes

•File systems

•Loadable system calls

•Loaders for executable file formats

•Stream modules

•Bus or device drivers

•Miscellaneous

Solaris Kernel

• Kernel Thread – core unit of execution that is scheduled and executed on a processor.

– has an execution state and context that includes a global priority and scheduling class

– is the unit that gets scheduled, executed, and context switched on and off processors

• User Thread – user-level thread state maintained within a user process

• Process – executable form of a program

• Lightweight Process (LWP) – kernel-visible execution context for a user thread

• Solaris 2 to 8 had a “two-level threads model” where many user threads could be assigned to a smaller group of LWPs.

• However, the two-level model was replaced with a 1-to-1 model. Why? Basically it was too complicated. The 1-to-1 model brings:

– Improved performance, scalability, and reliability

– Reliable signal behavior

– Improved adaptive mutex lock implementation

– User-level sleep queues for synchronization objects

Kernel Thread Scheduling

• Dispatcher uses priority model to select which kernel thread to execute next.

• Supports preemption, and the kernel itself is preemptable.

• 170 global priorities partitioned by scheduling class.

• Three main classes are TS, SYS, and RT.

Timeshare (TS) – default for all processes and the kernel threads in each process.

Interactive (IA) – enhanced TS used by the windowing system to boost threads under the window focus

Fair Share Scheduling (FSS) – share based, not priority based.

Fixed Priority (FX) – fixed-priority

System (SYS) – used for kernel threads; they are bound and run until they block or complete

Real Time (RT) – fixed priority, fixed-time quantum scheduling.
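The priority model above can be sketched in a few lines of Python. This is only an illustration: the per-class ranges (0-59 for TS/IA/FSS/FX, 60-99 for SYS, 100-159 for RT, with 160-169 left for interrupt threads) are an assumption about how the 170 global priorities are partitioned, and `global_priority` and `pick_next` are hypothetical names, not Solaris kernel functions.

```python
# Assumed partition of the 170 global priorities (0-169) by scheduling class.
# Priorities 160-169 are assumed to be reserved for interrupt threads.
GLOBAL_PRIORITY_RANGES = {
    "TS": (0, 59),
    "IA": (0, 59),
    "FSS": (0, 59),
    "FX": (0, 59),
    "SYS": (60, 99),
    "RT": (100, 159),
}


def global_priority(sched_class, class_priority):
    """Map a class-relative priority onto the global priority scale."""
    low, high = GLOBAL_PRIORITY_RANGES[sched_class]
    if not 0 <= class_priority <= high - low:
        raise ValueError("priority out of range for class " + sched_class)
    return low + class_priority


def pick_next(runnable):
    """Return the runnable (name, class, class_priority) tuple with the
    highest global priority, as a priority-based dispatcher would."""
    return max(runnable, key=lambda t: global_priority(t[1], t[2]))


if __name__ == "__main__":
    threads = [("shell", "TS", 59), ("kthread", "SYS", 0), ("audio", "RT", 0)]
    print(pick_next(threads)[0])
```

Under these assumed ranges, even the lowest real-time priority outranks every system and timeshare thread, which is why RT threads can preempt the kernel.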

Interprocess Communication and Signals

• Traditional Unix IPC

– Pipes – directly channel data between related processes through a file-like object

– Named Pipes – FIFO pipes implemented as files in the file system namespace

– Sockets – can be over a network or local (Unix domain)

• System V IPC

– Shared Memory – processes create a segment of memory that is shared among them

– Message Queues – each message contains a 32-bit type value and a data payload

– Semaphores – processes can sleep on them; used for synchronization, but any process can increment them

• Solaris Doors – a door server contains a thread that sleeps waiting for a client. The client calls the server through a door, and scheduling control is passed directly to the door server thread, then back to the requesting thread when the call returns. Very low latency turnaround.

• Signals – can interrupt a process after an event occurs. Signals can be ignored, caught and handled, or treated with a default action.
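As a small illustration of the traditional pipe mechanism listed above, the following Python sketch sends bytes from a child process to its parent through an anonymous pipe. It uses only the portable POSIX calls exposed by Python's `os` module; `pipe_roundtrip` is a hypothetical helper name, and nothing here is Solaris-specific.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Fork a child that writes `message` into a pipe; the parent reads it."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end, send the data, then exit at once.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
        finally:
            os._exit(0)
    # Parent: close the unused write end so read() sees EOF when the child is done.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(pipe_roundtrip(b"hello through a pipe"))
```

The pipe only works between related processes because the file descriptors are inherited across fork(); unrelated processes would need a named pipe or a socket.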

Memory

• 64-bit kernel and process address space

• Optimizes memory use by sharing program binaries and application data among processes

• VM system manages most objects related to I/O and memory: kernel and user applications, shared libraries, and file systems

– Manages virtual-to-physical mapping of memory.

– Manages swapping memory between primary and secondary storage to optimize performance.

– Handles requests for shared images between multiple users and processes.

– Acts as an integrated file cache.

• Newer features in the VM implementation include:

– During I/O, uses the 64-bit address space to create a permanent mapping of all physical pages into SEGKPM, eliminating the need to map/unmap for each I/O.

– Variable page sizes; largest available now is 256 Mbytes

– Generic framework: Multiple Page Size Support (MPSS) for various page sizes

– Support for nonuniform memory access (NUMA) architectures

– Dynamic reconfiguration – new pages can be added to the free list on the fly while the kernel is in a safe “kernel cage”

– Modern memory allocators support slabs

Virtual Memory

• Pages can vary in size; a common size is 8 Kbytes.

• Solaris kernel uses a combined demand-paged and swapping model.

• Abstract memory objects called segments, vnodes, and pages

– Physical memory, in chunks called pages

– Virtual file object called a vnode

– File system is a hierarchy of vnodes

– Process and kernel address spaces as segments of mapped vnodes

– Mapped hardware devices (e.g. frame buffers) are segments of hardware-mapped pages

• Physical memory management done by the Hardware Address Translation (HAT) layer

– Hides the machine-specific MMU details so the rest of the VM implementation stays machine independent

Virtual Memory Continued

• Process’s virtual address space skeleton is created by the kernel when the fork() system call creates the process

• Memory is allocated on the heap; malloc() doesn’t create physical memory (physical pages are assigned only when the memory is first touched)

• Heap can be allocated in 32-bit or 64-bit mode, much larger in 64-bit mode.

• Picture on the right shows how memory mapping can share data among processes

• Several options govern how a file is shared when it is mapped between processes

– MAP_SHARED can be set with PROT_READ|PROT_WRITE

– MAP_PRIVATE can be set with PROT_READ|PROT_WRITE

• Each segment has a protection mode: Read, Write, or Execute.
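The difference between the two mapping flags can be seen with Python's `mmap` module, which wraps the same POSIX mmap() call. In this sketch (the helper name `write_through_mapping` is made up for illustration) a write through a MAP_SHARED mapping reaches the underlying file, while a write through a MAP_PRIVATE mapping stays in a private copy-on-write page.

```python
import mmap
import os
import tempfile


def write_through_mapping(flags: int) -> bytes:
    """Map an 8-byte temp file with the given flags, overwrite it through
    the mapping, and return what the file itself contains afterwards."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"original")
        mapping = mmap.mmap(fd, 8, flags=flags,
                            prot=mmap.PROT_READ | mmap.PROT_WRITE)
        mapping[0:8] = b"modified"
        if flags == mmap.MAP_SHARED:
            mapping.flush()  # push the shared pages back to the file
        mapping.close()
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 8)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(write_through_mapping(mmap.MAP_SHARED))   # file sees the change
    print(write_through_mapping(mmap.MAP_PRIVATE))  # file is untouched
```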

Page Faults and Anonymous Memory

• Major page fault occurs when the page is not in physical memory

• Minor page fault occurs when the page is in physical memory but no MMU translation exists (the page just needs to be attached)

• Protection fault occurs when an access violates memory permissions

There can also be anonymous memory, pages that are not associated with a vnode. They are used for new heap space, and are allocated by a zero-fill-on-demand operation, or a ZFOD.
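The three fault types described on this slide amount to a short decision sequence. The sketch below is a simplified model with a hypothetical function name and boolean inputs; a real fault handler works from the faulting address and segment, not from ready-made flags.

```python
def classify_access(access_allowed, page_resident, mmu_translation):
    """Classify a memory access using the three fault types on the slide.

    access_allowed  -- the segment's protections permit this access
    page_resident   -- the page is already in physical memory
    mmu_translation -- the MMU already has a translation for the page
    """
    if not access_allowed:
        return "protection fault"
    if mmu_translation:
        return "no fault"
    if page_resident:
        return "minor fault"   # page only needs to be attached
    return "major fault"       # page must be brought into memory


if __name__ == "__main__":
    print(classify_access(True, True, False))
```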

Intimate Shared Memory

• System V shared memory (ipc) option

• Shared Memory optimization:

– Additionally share low-level kernel data

– Reduce redundant mapping info (V-to-P)

• Shared Memory is locked, never paged

– No swap space is allocated

• Use SHM_SHARE_MMU flag in shmat()

Physical Memory

• Memory is managed by the page scanner daemon (except kernel memory)

• When the system is booted, memory is placed on the free list in page-size chunks.

• Anonymous memory is used for most of a process’s memory allocation (heap and stack).

• Pages are read into memory from the free list and then reside in the segmap cache, a process’s address space, or the cache list.

• page_create_va() allocates pages, taking the virtual address into account to calculate page coloring.

• Page scanner uses global page replacement.

• Two bits are kept per page to indicate whether the page has been referenced or modified since the bits were last cleared.

Page swapping: “two-handed clock algorithm”

• In addition to this page-out process, the dispatcher can swap out entire processes to conserve memory; it does this rarely, only in extreme circumstances.
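The slide names the two-handed clock algorithm without spelling it out. In the usual description, a front hand sweeps over the pages clearing each reference bit, and a back hand follows a fixed distance behind (the handspread) and reclaims any page whose bit is still clear. The following Python sketch models one sweep under that description; the function name, the list-of-booleans page model, and the `touched_during_scan` parameter are illustrative assumptions, and the modified bit and page-out queue are left out.

```python
def two_handed_clock_scan(referenced, handspread, touched_during_scan=frozenset()):
    """Simulate one sweep of a two-handed clock over a list of reference bits.

    referenced          -- list of booleans, one reference bit per page
    handspread          -- distance in pages between front and back hand
    touched_during_scan -- pages re-referenced after the front hand passes
    Returns the indices of pages the back hand would reclaim.
    """
    if handspread < 1:
        raise ValueError("handspread must be at least 1")
    page_count = len(referenced)
    freed = []
    for step in range(page_count + handspread):
        front = step
        if front < page_count:
            referenced[front] = False          # front hand clears the bit
            if front in touched_during_scan:
                referenced[front] = True       # page is used again in time
        back = step - handspread
        if 0 <= back < page_count and not referenced[back]:
            freed.append(back)                 # still unreferenced: reclaim
    return freed


if __name__ == "__main__":
    print(two_handed_clock_scan([True] * 6, 2, {1, 4}))
```

A larger handspread gives pages more time to be referenced again before the back hand arrives, so fewer active pages are reclaimed.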

Slab and HAT

• Solaris has a general-purpose memory allocator known as the slab allocator. It is used for memory requests that are:

– Smaller than a page size

– Not an even multiple of a page size

– Frequently allocated and freed, which would otherwise cause fragmentation

• Solves fragmentation issues by grouping different-sized memory objects into separate caches, where each object cache has its own size and characteristics

• The HAT layer programs the TLB with entries identifying the relationship between virtual and physical addresses.

• If the TLB lookup fails, as a backup the UltraSPARC uses a translation storage buffer (TSB), while most other architectures use a hardware page table.

• This is a big difference because the TSB is a software lookup, but Solaris supports both.

• Take a look at the slide titled “Virtual Memory” to see a picture of the HAT layer; it is on the right
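The object-cache idea behind the slab allocator described earlier on this slide can be modeled briefly. This is a toy sketch, not the Solaris kmem implementation: `ObjectCache`, its methods, and the `objects_per_slab` parameter are hypothetical, and a "slab" here is just a batch of equally sized buffers.

```python
class ObjectCache:
    """Toy object cache: every object in the cache has the same size, and
    freed objects are reused instead of being returned to the system."""

    def __init__(self, name, object_size, objects_per_slab=8):
        self.name = name
        self.object_size = object_size
        self.objects_per_slab = objects_per_slab
        self.slab_count = 0
        self._free = []

    def _grow(self):
        # Carve one new slab into equally sized objects.
        self.slab_count += 1
        self._free.extend(bytearray(self.object_size)
                          for _ in range(self.objects_per_slab))

    def alloc(self):
        if not self._free:
            self._grow()
        return self._free.pop()

    def free(self, obj):
        if len(obj) != self.object_size:
            raise ValueError("object does not belong to cache " + self.name)
        self._free.append(obj)


if __name__ == "__main__":
    cache = ObjectCache("inode_cache", 128, objects_per_slab=4)
    first = cache.alloc()
    cache.free(first)
    print(cache.alloc() is first, cache.slab_count)
```

Because each cache only ever hands out objects of one size, freeing and reallocating never leaves odd-sized holes, which is how the design avoids fragmentation.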

Virtual File System (VFS)

• Created to abstract away file systems so NFS and UFS could co-exist

• Made of vnode, the virtual node interface that implements file-related functions, and vfs the virtual file system that directs functions to specific file systems

• Structures consist of file descriptors in a file list, which point to a per-process file table. A vnode is looked up in this table, which eventually points to a physical node depending on file system implementation.

• New in Solaris 10 : Zettabyte File System

– Endian neutral – move files between SPARC and x86 based systems

– ZFS protects all data with 64-bit checksums

– 128-bit file system!

– built on top of virtual storage pools

– All operations are transactional and copy-on-write
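The vnode abstraction described at the top of this slide is essentially an interface that each file system implements. The sketch below shows that dispatch pattern in Python; the class names, the `read_file` helper, and the in-memory "disk" and "remote fetch" stand-ins are all hypothetical and do not mirror the real kernel structures.

```python
from abc import ABC, abstractmethod


class Vnode(ABC):
    """File-system-independent handle for a file."""

    @abstractmethod
    def read(self) -> bytes:
        """Return the file contents using file-system-specific code."""


class UfsVnode(Vnode):
    """Stand-in for a local file backed by disk blocks."""

    def __init__(self, disk_blocks):
        self.disk_blocks = disk_blocks

    def read(self) -> bytes:
        return b"".join(self.disk_blocks)


class NfsVnode(Vnode):
    """Stand-in for a remote file fetched over the network."""

    def __init__(self, fetch_from_server):
        self.fetch_from_server = fetch_from_server

    def read(self) -> bytes:
        return self.fetch_from_server()


def read_file(vnode: Vnode) -> bytes:
    # The caller never needs to know which file system is underneath.
    return vnode.read()


if __name__ == "__main__":
    local = UfsVnode([b"local ", b"data"])
    remote = NfsVnode(lambda: b"remote data")
    print(read_file(local), read_file(remote))
```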

Unix File System (UFS)

• UFS we know and love : The default file system for Solaris, in development for over 20 years.

• Based around disk geometry : the number of sectors in a track, the location of the head, and the number of tracks.

• Supports hard and soft links.

• Inode (index node) is the internal descriptor for a file

• Access scheme : users, group, world.

I/O

• Two distinct methods perform file system I/O:

– read(), write(), and related system calls

– Memory-mapping of a file into the process's address space

• Both are in the picture here to the right.

Performance: NUMA systems

• NonUniform Memory Access (NUMA) machines – machines in which some memory is closer to some CPUs than others

• Addressed by the Memory Placement Optimization (MPO) framework

– Locality awareness

– Balancing

– Dynamic topology support

• Latency groups (lgroup) – sets of CPU and equidistant memory defined in the kernel.

• A home lgroup is chosen for each thread upon creation, and it prefers this lgroup.

• For memory allocation, prefer the home lgroup; but if you know the code is multithreaded and spread out, random placement may be better
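The placement preference in the last bullet can be sketched as follows. This is a simplified model under stated assumptions: lgroups are plain dictionary keys, `latency[a][b]` is a made-up relative cost of lgroup `a` reaching memory in lgroup `b`, and the policy labels "home" and "random" are illustrative names rather than actual MPO settings.

```python
import random


def place_page(home, free_pages, latency, policy="home", rng=None):
    """Pick the lgroup that supplies the next page for a thread.

    home       -- the thread's home lgroup
    free_pages -- dict: lgroup -> number of free pages (updated in place)
    latency    -- dict of dicts: latency[a][b] is the cost for a to reach b
    policy     -- "home" prefers the closest memory, "random" spreads pages
    """
    candidates = sorted(g for g, free in free_pages.items() if free > 0)
    if not candidates:
        raise MemoryError("no free pages in any lgroup")
    if policy == "random":
        chosen = (rng or random).choice(candidates)
    else:
        # Closest lgroup first; the home lgroup wins whenever it has memory.
        chosen = min(candidates, key=lambda g: latency[home][g])
    free_pages[chosen] -= 1
    return chosen


if __name__ == "__main__":
    free = {0: 1, 1: 5, 2: 5}
    cost = {0: {0: 10, 1: 20, 2: 40}}
    print(place_page(0, free, cost), place_page(0, free, cost))
```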

CMT support and Parallel System architectures

• Chip Multithreading (CMT) – CPUs share various processor components and caches

• The three different parallel architectures

– SMP. Symmetric multiprocessor with a shared memory model; single kernel image

– MPP. Message-based model; multiple kernel images

– NUMA/ccNUMA. Shared memory model; single kernel image

So the Solaris kernel has several semaphores and mutex locks to help address concurrent thread memory access. SMP (like Intel and AMD chips) and CMT (the UltraSPARC T1) are a lot more complicated than just a NUMA system, and much research goes on in this field. Sun’s attitude is to try to make things as simple as possible while still providing the necessary synchronization.

Networking : The TCP/IP Stack

• Was two STREAMS layers with packet queueing and locks between layers and 1 processor thread per connection

• Now merges TCP and IP layers and allocates a single thread per CPU.

– Streamlined to process packet through both layers

– Binds connections to a CPU for entire life

• Uses a per-CPU vertical perimeter mechanism to protect the connection. It is implemented with an IP classifier, serialization queue, and worker thread so only one CPU processes a specific packet.

• Integrated support for TCP offload engines – let hardware do the work
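The classifier and serialization queue described above can be illustrated with a small model. It assumes, for the sake of the sketch, that the classifier simply hashes the connection 4-tuple to choose a CPU; the class and method names are hypothetical, and a real stack would use worker threads rather than an explicit `drain` call.

```python
import zlib
from collections import deque


class VerticalPerimeters:
    """One serialization queue per CPU; each connection maps to one queue."""

    def __init__(self, cpu_count):
        self.queues = [deque() for _ in range(cpu_count)]

    def classify(self, connection):
        # Assumed classifier: a stable hash of the connection 4-tuple.
        key = repr(connection).encode()
        return zlib.crc32(key) % len(self.queues)

    def enqueue(self, connection, payload):
        cpu = self.classify(connection)
        self.queues[cpu].append((connection, payload))
        return cpu

    def drain(self, cpu):
        """What the CPU's worker would do: process its queue in order."""
        processed = []
        queue = self.queues[cpu]
        while queue:
            processed.append(queue.popleft())
        return processed


if __name__ == "__main__":
    stack = VerticalPerimeters(cpu_count=4)
    conn = ("10.0.0.1", 40000, "10.0.0.2", 80)
    cpus = {stack.enqueue(conn, n) for n in range(3)}
    print(cpus, stack.drain(cpus.pop()))
```

Since every packet of a connection lands on the same queue, the connection state needs no lock shared between CPUs, and packets are processed in arrival order.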

Security

• For user permissions

– UFS and file system permissions

– Role Based Access Control since Solaris 8

– New in Solaris 10: least privilege model

– Access Control Lists let you make arbitrary security permissions

• Kernel level permissions, the privileged kernel thread and modules run the whole system and control Solaris containers.

• Automated Patch Tool

• Solaris Cryptographic framework

• Full network traffic control, for example TCP packet monitoring, disable redirecting of packets and answering system pings.

Solaris Containers/Zones

•Containers provide the complete virtualized environment; zones are the component that provides the isolation between environments.

•Up to 8192 virtualized environments per Solaris OS instance.

•Provides a secure sandbox that has unique root, user and file systems. Also network interfaces, devices, hardware, I/O all virtualized.

•The kernel makes sure that the zones are isolated.

•If a zone fails, it can reboot in a few seconds.

Process rights management

• Solaris 10 OS least privilege model includes nearly 50 fine-grained privileges as well as the basic privilege set.

– Evolved from Trusted Solaris.

– Basic privilege set includes all privileges given to unprivileged processes in the traditional security model

• Each process has four sets in its kernel credentials

– The Inheritable set (I): The privileges inherited on exec.

– The Permitted set (P): The maximum set of privileges for the process.

– The Effective set (E): The privileges currently in effect, a subset of P.

– The Limit set (L): The upper bound of the privileges a process and its children may obtain

• Once launched, a process uses privilege manipulation functions to add or remove privileges from the privilege sets
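The relationships between the four sets can be modeled with ordinary Python sets. This is a sketch, not the Solaris API: the class and method names are hypothetical, the privilege names are used only as example strings, and the exec rule shown (the new I, P, and E all become I intersected with L, with L unchanged) is an assumption about the default behavior for a normal, non-setuid exec.

```python
class PrivilegeSets:
    """Model of the I, P, E, and L privilege sets of one process."""

    def __init__(self, inheritable, permitted, effective, limit):
        self.inheritable = set(inheritable)
        self.permitted = set(permitted)
        self.effective = set(effective)
        self.limit = set(limit)
        if not self.effective <= self.permitted:
            raise ValueError("effective set must be a subset of permitted")

    def enable(self, privilege):
        """Turn a privilege on; only permitted privileges can be enabled."""
        if privilege not in self.permitted:
            raise PermissionError(privilege + " is not in the permitted set")
        self.effective.add(privilege)

    def disable(self, privilege):
        """Turn a privilege off but keep the right to re-enable it."""
        self.effective.discard(privilege)

    def drop(self, privilege):
        """Give a privilege up for good: it leaves both P and E."""
        self.permitted.discard(privilege)
        self.effective.discard(privilege)

    def after_exec(self):
        """Assumed exec rule: new I, P, and E are I & L; L is unchanged."""
        inherited = self.inheritable & self.limit
        return PrivilegeSets(inherited, inherited, inherited, self.limit)


if __name__ == "__main__":
    basic = {"proc_fork", "proc_exec"}
    proc = PrivilegeSets(basic, basic | {"net_privaddr"}, basic,
                         basic | {"net_privaddr"})
    proc.enable("net_privaddr")   # e.g. before binding to a low port
    proc.drop("net_privaddr")     # and given up once it is not needed
    print(sorted(proc.effective), sorted(proc.after_exec().effective))
```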

Cryptography

Two Basic Types

•User level Framework

•Exists Outside the Kernel

•Uses the PKCS 11 interface

•Applications use it

•Kernel Level Framework

•Operating System modules use it

•Can interface with hardware and software plug-ins

Neither provides actual encryption algorithms; plug-ins do all the work!

Both are verified by the Module Verification Daemon

Cryptography Continued

• Each plug-in must be verified (signed) by the Module Verification Daemon

– First sets up a thread pool that lives in the kCF to service requests

– Second answers requests for verification of user- and kernel-level provider signatures

[Tables: user-level crypto algorithms supported; kernel-level crypto algorithms supported]

• cryptoadm tool provided for administration of uCF and kCF.

• /dev/crypto drivers allow communication between user- and kernel-level plug-ins

• /dev/cryptoadm runs the Module Verification Daemon

• For user level, provides digest and mac for calculating the digest and MAC of files. Provides encrypt and decrypt for encrypting and decrypting files

• Solaris IPsec/IKE and Kerberos, user-level and kernel-level, have been ported to use the Solaris Cryptographic Framework in the Solaris 10 OS.
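To show what computing the digest and MAC of a file involves, here is a minimal Python sketch. It uses Python's own `hashlib` and `hmac` modules, not the Solaris Cryptographic Framework or its command-line tools; the function names and the default choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import hmac


def file_digest(path, algorithm="sha256"):
    """Hash a file in chunks and return the hex digest."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def file_mac(path, key, algorithm="sha256"):
    """Compute a keyed HMAC over a file; unlike a plain digest, it cannot
    be recomputed by someone who does not hold the key."""
    mac = hmac.new(key, digestmod=algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            mac.update(chunk)
    return mac.hexdigest()


if __name__ == "__main__":
    import sys
    print(file_digest(sys.argv[1]))
    print(file_mac(sys.argv[1], b"example key"))
```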

DTrace Debugging System

• Dynamically record data at points of interest (probes) in the user and kernel areas.

• Record stack trace, timestamp, arguments.

• Kernel modules called providers know how to activate probes

• Has its own D language – a compiler looks for probes and providers, using the provider information to find which probes should be logged when fired.

• DTrace won the top prize in the Wall Street Journal's 2006 Technology Innovation Awards competition

30,000 published probes within the Solaris kernel

Recovery – Predictive self healing

• Self-diagnosing system is constantly gathering data. Error reports are encoded as a set of name-value pairs and form an error event. Diagnosis engines run in the background consuming error events.

• Diagnosis engines output a fault event, broadcast to all agents who can respond.

• Enter the Solaris Fault Manager– Manages the diagnosis engines and agents

– Provides a programming model for clients

– Compiles logs

– Manages multiplexing of events between producers and consumers

• Sun message identifier maps an error message to an online knowledgebase article or link

• Diagnoses have a universal link identifier so that solutions can be cross-referenced

Why Solaris beats Linux

• Solaris is more secure – it has ACLs, RBAC, PRM, and containers vs. ACLs and Xen in Linux

• Solaris is more stable – Linux has rapid change and multiple centers of control, while Sun has a predictable lifecycle and the Solaris Application Guarantee.

• Solaris has better price/performance : SPECjAppServer2002 results

Category      | Hardware                                                           | Operating System                      | Price-performance | Improvement over prior record holder
Dual Node     | Sun Fire V20z                                                      | Red Hat Enterprise Linux AS Release 3 | $101.10/TOPS      | 15%
Multiple Node | Application servers: Sun Fire V20z; Database server: Sun Fire V40z | Solaris 9, x86 Platform edition       | $82.74/TOPS       | 40%

• Solaris has a lower cost of support for high-level support

Why Linux Beats Solaris

• Novell points out Solaris’s poor performance

• Novell points out Solaris’s higher cost for multiple-CPU machines

But Sun has put out a lot of technology to fight these criticisms, like ZFS to address big-endian/little-endian compatibility between SPARC and x86, and the Linux binary API to increase software options on Solaris.

Where Solaris is Headed

• Once the maker of the most popular UNIX-based OS in the world, Sun has lost a lot of market share.

– Microsoft Windows took the low-end market away from most Unix systems

– Linux came in to pull away the remainder

– Solaris was left with the high-end space, basing sales on its stability, performance, and support

• Now with Solaris 10 and OpenSolaris, Sun is trying to regain the low-end market

• Trying to work with AMD/Linux, not against it:

– Linux Application Environment

– Specific designs for AMD multiprocessor systems

– Free OS with competitive support options

• Trusted Solaris features in Solaris 10 are a huge selling point

References

• Solaris 10: In a Class By Itself
http://www.sun.com/software/whitepapers/solaris10/classbyitself.pdf

• Solaris and Linux : SealRock research comparison whitepaper
http://www.sun.com/software/whitepapers/solaris10/sealrock.pdf

• Solaris 10 The Complete Reference
http://books.mcgraw-hill.com/downloads/products/0072229985/0072229985_ch01.pdf

• Solaris 8 Administrator Certification Training Guide – Appendix C
http://unixed.com/Resources/history_of_solaris.pdf

• Solaris™ Internals Core Kernel Components
http://www.phptr.com/content/images/0130224960/samplechapter/0130224960.pdf

• Solaris™ Internals : Solaris 10 and OpenSolaris Kernel Architecture
http://www.sun.com/books/catalog/solaris_internals.xml

• The Solaris Cryptographic Framework
http://www.sun.com/bigadmin/features/articles/crypt_framework.pdf

• The least privilege model in the Solaris OS
http://www.sun.com/bigadmin/features/articles/least_privilege.html

• Solaris and Linux Seal Rock Research Paper
http://www.novell.com/collateral/4621445/4621445.pdf

• SUSE® Linux Enterprise Server 9 and Solaris 10 on x86
http://www.novell.com/collateral/4621445/4621445.pdf