Bahir Dar University
Institute of Technology
School of Computing & Electrical Engineering
Electrical & Computer Engineering
Operating System
Case Study: Ubuntu 12.04
Abrham Aschale 087/2000
Haimanot Tizazu 126/2000
Yihenew Alemu 165/2000
January, 2013
Table of Contents
Components of a Linux System
Process Management
    Process Scheduling
    Kernel Synchronization
    Deadlock Management
Memory Management
    Virtual Memory Management
Input Output
Ubuntu Distributed Operating System
References
Components of a Linux System
The Linux system is composed of three main bodies of code, in line with most traditional UNIX
implementations:
1. Kernel. The kernel is responsible for maintaining all the important abstractions of the
operating system, including such things as virtual memory and processes.
2. System libraries. The system libraries define a standard set of functions through which
applications can interact with the kernel. These functions implement much of the
operating-system functionality that does not need
the full privileges of kernel code.
3. System utilities. The system utilities are programs that perform individual, specialized
management tasks. Some system utilities may be invoked just once to initialize and
configure some aspect of the system; others, known as daemons in UNIX terminology,
may run permanently, handling such tasks as responding to incoming network
connections, accepting logon requests from terminals, and updating log files.
Figure 1 illustrates the various components that make up a full Linux system. The most
important distinction here is between the kernel and everything else. All the kernel code executes
in the processor's privileged mode with full access to all the physical resources of the computer.
Linux refers to this privileged mode as kernel mode. Under Linux, no user-mode code is built into
the kernel. Any operating-system-support code that does not need to run in kernel mode is placed
into the system libraries instead. Although various modern operating systems have adopted a
message passing architecture for their kernel internals, Linux retains UNIX's historical model:
the kernel is created as a single, monolithic binary. The main reason is to improve performance.
Because all kernel code and data structures are kept in a single address space, no context
switches are necessary when a process calls an operating-system function or when a hardware
interrupt is delivered.
Figure 1. Components of the Linux system
This single address space contains not only the core scheduling and virtual memory code but all
kernel code, including all device drivers, file systems, and networking code.
Even though all the kernel components share this same melting pot, there is still room for
modularity. In the same way that user applications can load shared libraries at run time to pull
in a needed piece of code, the Linux kernel can load (and unload) modules dynamically at run
time. The kernel does not necessarily need to know in advance which modules may be loaded;
they are truly independent loadable components.
The Linux kernel forms the core of the Linux operating system. It provides all the functionality
necessary to run processes, and it provides system services to give arbitrated and protected
access to hardware resources. The kernel implements all the features required to qualify as an
operating system. On its own, however, the operating system provided by the Linux kernel looks
nothing like a UNIX system. It is missing many of the extra features of UNIX, and the features
that it does provide are not necessarily in the format in which a UNIX application expects them
to appear. The operating-system interface visible to running applications is not maintained
directly by the kernel. Rather, applications make calls to the system libraries, which in turn call
the operating system services as necessary.
The system libraries provide many types of functionality. At the simplest level, they allow
applications to make kernel-system service requests. Making a system call involves transferring
control from unprivileged user mode to privileged kernel mode; the details of this transfer vary
from architecture to architecture. The libraries take care of collecting the system-call arguments
and, if necessary, arranging those arguments in the special form necessary to make the system
call.
The libraries may also provide more complex versions of the basic system calls. For example,
the C language's buffered file-handling functions are all implemented in the system libraries,
providing more advanced control of file I/O than the basic kernel system calls. The libraries also
provide routines that do not correspond to system calls at all, such as sorting algorithms,
mathematical functions, and string-manipulation routines. All the functions necessary to support
the running of UNIX or POSIX applications are implemented here in the system libraries.
The Linux system includes a wide variety of user-mode programs, both system utilities and user
utilities. The system utilities include all the programs necessary to initialize the system, such as
those to configure network devices and to load kernel modules. Continually running server
programs also count as system utilities; such programs handle user login requests, incoming
network connections, and the printer queues.
Not all the standard utilities serve key system-administration functions. The UNIX user
environment contains a large number of standard utilities to do simple everyday tasks, such as
listing directories, moving and deleting files, and displaying the contents of a file. More complex
utilities can perform text-processing functions, such as sorting textual data and performing
pattern searches on input text. Together, these utilities form a standard tool set that users can
expect on any UNIX system; although they do not perform any operating-system function, they
are an important part of the basic Linux system.
Process Management
A process is the basic context within which all user-requested activity is serviced within the
operating system. To be compatible with other UNIX systems, Linux must use a process model
similar to those of other versions of UNIX. Linux operates differently from UNIX in a few key
places, however. In this section, we review the traditional UNIX process model and introduce
Linux's own threading model.
The fork() and exec() Process Model
The basic principle of UNIX process management is to separate two operations: the creation of a
process and the running of a new program. A new process is created by the fork() system call,
and a new program is run after a call to exec(). These are two distinctly separate functions. A
new process may be created with fork() without a new program being run; the new subprocess
simply continues to execute exactly the same program that the first (parent) process was running.
Equally, running a new program does not require that a new process be created first: any process
may call exec() at any time. The currently running program is immediately terminated, and the
new program starts executing in the context of the existing process.
This model has the advantage of great simplicity. It is not necessary to specify every detail of the
environment of a new program in the system call that runs that program; the new program simply
runs in its existing environment. If a parent process wishes to modify the environment in which a
new program is to be run, it can fork and then, still running the original program in a child
process, make any system calls it requires to modify that child process before finally executing
the new program.
Under UNIX, then, a process encompasses all the information that the operating system must
maintain to track the context of a single execution of a single program. Under Linux, we can
break down this context into a number of specific sections. Broadly, process properties fall into
three groups: the process identity, environment, and context.
Process Identity
A process identity consists mainly of the following items:
Process ID (PID). Each process has a unique identifier. The PID is used to specify the process
to the operating system when an application makes a system call to signal, modify, or wait for
the process. Additional identifiers associate the process with a process group (typically, a tree of
processes forked by a single user command) and login session.
Credentials. Each process must have an associated user ID and one or more group IDs that
determine the rights of a process to access system resources and files.
Personality. Process personalities are not traditionally found on UNIX systems, but under Linux
each process has an associated personality identifier that can slightly modify the semantics of
certain system calls. Personalities are primarily used by emulation libraries to request that system
calls be compatible with certain varieties of UNIX.
Most of these identifiers are under the limited control of the process itself. The process group
and session identifiers can be changed if the process wants to start a new group or session. Its
credentials can be changed, subject to appropriate security checks. However, the primary PID of a
process is unchangeable and uniquely identifies that process until termination.
Scheduling
Scheduling is the job of allocating CPU time to different tasks within an operating system.
Normally, we think of scheduling as being the running and interrupting of processes, but another
aspect of scheduling is also important to Linux: the running of the various kernel tasks. Kernel
tasks encompass both tasks that are requested by a running process and tasks that execute
internally on behalf of a device driver.
Process Scheduling
Linux has two separate process-scheduling algorithms. One is a time-sharing algorithm for fair,
preemptive scheduling among multiple processes; the other is designed for real-time tasks, where
absolute priorities are more important than fairness.
The scheduling algorithm used for routine time-sharing tasks received a major overhaul with
Version 2.6 of the kernel. Earlier versions of the Linux kernel ran a variation of the traditional
UNIX scheduling algorithm, which does not provide adequate support for SMP systems and does
not scale well as the number of tasks on the system grows. The overhaul of the Linux scheduler
with Version 2.6 of the kernel provides a scheduling algorithm that runs in constant time, known
as O(1), regardless of the number of tasks on the system. The new scheduler also provides
increased support for SMP, including processor affinity and load balancing, as well as
maintaining fairness and support for interactive tasks.
Kernel Synchronization
The way the kernel schedules its own operations is fundamentally different from the way it
schedules processes. A request for kernel-mode execution can occur in two ways. A running
program may request an operating-system service, either explicitly via a system call or
implicitly, for example, when a page fault occurs. Alternatively, a device controller may deliver a
hardware interrupt that causes the CPU to start executing a kernel-defined handler for that
interrupt.
The problem posed to the kernel is that all these tasks may try to access the same internal data
structures. If one kernel task is in the middle of accessing some data structure when an interrupt
service routine executes, then that service routine cannot access or modify the same data without
risking data corruption. This fact relates to the idea of critical sections: portions of code that
access shared data and that must not be allowed to execute concurrently.
As a result, kernel synchronization involves much more than just process scheduling. A
framework is required that allows kernel tasks to run without violating the integrity of shared
data.
Prior to Version 2.6, Linux was a nonpreemptive kernel, meaning that a process running in kernel
mode could not be preempted, even if a higher-priority process became available to run. With
Version 2.6, the Linux kernel became fully preemptive; so a task can now be preempted when it
is running in the kernel.
The Linux kernel provides spinlocks and semaphores (as well as reader-writer versions of these
two locks) for locking in the kernel. On SMP machines, the fundamental locking mechanism is a
spinlock; the kernel is designed so that the spinlock is held only for short durations. On single-
processor machines, spinlocks are inappropriate for use and are replaced by enabling and
disabling kernel preemption. That is, on single-processor machines, rather than holding a
spinlock, the task disables kernel preemption. When the task would otherwise release the
spinlock, it enables kernel preemption.

Linux uses an interesting approach to disable and enable kernel preemption. It provides two
simple system calls, preempt_disable() and preempt_enable(), for disabling and enabling kernel
preemption. However, in addition, the kernel is not preemptible if a kernel-mode task is holding
a lock. To enforce this rule, each task in the system has a thread-info structure that includes the
field preempt_count, which is a counter indicating the number of locks being held by the task.
The counter is incremented when a lock is acquired and decremented when a lock is released. If
the value of preempt_count for the task currently running is greater than zero, it is not safe to
preempt the kernel, as this task currently holds a lock. If the count is zero, the kernel can safely
be interrupted, assuming there are no outstanding calls to preempt_disable().
Deadlock Management
When two or more processes are each blocked waiting for a resource held by another, none of
them can proceed. This situation is called a deadlock. A deadlock can occur only when all four
of the following necessary conditions hold:
Mutual Exclusion
Hold and Wait
No Preemption
Circular wait
A process uses a resource in the following sequence:
Request the resource
Use the resource
Release the resource
Deadlock prevention is the set of steps we follow to ensure deadlocks cannot occur. More
broadly, there are three methods by which deadlocks can be handled:
Use a protocol to prevent or avoid deadlocks, ensuring that the system never enters a
deadlocked state.
Allow the system to enter a deadlocked state, detect it, and recover from it.
Ignore the problem.
Deadlock detection is the process by which the operating system detects a deadlocked state and
tries to recover from it. Recovery takes one of two forms: either the user resolves the deadlock
manually, or the system recovers from it automatically.
When recovering from the deadlock automatically, there are two options:
Process termination
Terminate all deadlocked processes.
Terminate one process at a time until the deadlock cycle is eliminated.
Resource preemption, which raises three issues:
Victim selection
Rollback
Starvation
Ubuntu 12.04 is well equipped with the deadlock-detection methods mentioned above and avoids
causing deadlocks.
Memory Management
Memory management of a computer deals mainly with the allocation of space to processes in
the Random Access Memory (RAM) of the computer. This memory management process is an
extremely vital part of the computer because the effective management of the memory of the
computer is what helps the computer work at great speeds and effectively execute multiple tasks
without the problem of a system overload and failure.
Memory is divided into two main components: physical memory and logical memory. The
physical memory is the memory available in the RAM and is divided into multiple memory
blocks, each with a physical address, that are used to load processes.
Logical memory comes into play when the operating system has to assign more than one
physical memory block from the RAM to a particular process. In this situation the process would
have more than one physical address, which causes problems in situations where paging is
done. Therefore the operating system gives the process a logical memory block, which is a
combination of the physical memory blocks of the system, and this logical memory block has a
single logical address assigned to it.
Ubuntu, along with all other Linux systems, uses a feature called cached memory when handling
the memory in the RAM. The kernel's page cache keeps the amount of truly free memory in the
system to a minimum at any given time; tools such as "top" make this behaviour visible.
This is because RAM that sits unused is effectively wasted. The kernel therefore fills otherwise
idle memory with what is known as cached space. This cached memory, though it appears to be
occupied, is actually available for the system to reclaim whenever new programs need to be
loaded. Another reason Linux uses this method is that it is faster to access information from the
cached space than from the hard disk itself.
Virtual Memory Management
Virtual memory is a technique, backed by a partitioned-off segment of the hard disk, that allows
large processes requiring more memory than is available in the RAM (Random Access Memory)
to be run. The basic tactic is that a segment of secondary storage, usually a drive, is allocated
for the large process to load into, and a small region of RAM is given into which the needed
pages are loaded at run time for each particular task.
The way this operates is as follows: when a large process needs to be executed and the memory
available in the RAM is insufficient, the system allocates a segment of the secondary storage
device for this process to load its files. Thereafter, the pages required by the process to run are
loaded when needed, using a technique known as paging or swapping. A page is loaded into the
RAM when the process requires it, and after the process is done with that particular page, the
page is removed from memory and a new page loaded.
This method of creating virtual memory is extremely valuable, as it allows large processes to be
loaded into the system without overloading memory. The preferred method of transferring pages
is the swapping technique, because it allows more than one file to be loaded at a time, making a
far more effective system that operates at greater speeds.
Ubuntu uses a feature called swap space for managing memory. This is a space on the hard disk
which forms part of the virtual memory. The system uses this space to store files that are
currently inactive, creating room for new files to be loaded. The swap space temporarily stores a
file until either the file is needed again or the process is terminated. This method allows for
faster speeds on the system. However, a process will not directly use files stored in the swap
space, as this would cause the system to operate more slowly.
Another method used to manage virtual memory is a technique known as demand paging. It
works as follows: when the system is required to run a process that is too big to be loaded into
physical memory completely, the system loads only the pages immediately required. The
remaining pages stay in the virtual memory, and thereafter, when the process requires a page to
work with, it is loaded from the virtual memory. This process is known as demand paging.
The way the Linux operating system handles demand paging is similar across most Linux
systems. When a process starts to execute, the operating system maps its files into the virtual
memory. When a command to execute is given, the file containing it is opened and its contents
mapped into the process's virtual memory. This is carried out by changing the data structure that
describes the process's memory map, and is known as memory mapping. However, the entire
file is not brought in; only the first part is. Accessing the rest creates a page fault, and the
operating system then searches the memory map to determine which remaining part of the file to
bring in for execution.
Input Output
To the user, the I/O system in Linux looks much like that in any UNIX system. That is, to the
extent possible, all device drivers appear as normal files. Users can open an access channel to a
device in the same way they open any other file; devices can appear as objects within the file
system. The system administrator can create special files within a file system that contain
references to a specific device driver, and a user opening such a file will be able to read from and
write to the device referenced. By using the normal file-protection system, which determines
who can access which file, the administrator can set access permissions for each device.

Figure 2. Device-driver block structure.
Linux splits all devices into three classes: block devices, character devices, and network devices.
Figure 2 illustrates the overall structure of the device-driver system.
Block Devices: include all devices that allow random access to completely independent, fixed-
sized blocks of data, including hard disks and floppy disks, CD-ROMs, and flash memory. Block
devices are typically used to store file systems, but direct access to a block device is also allowed
so that programs can create and repair the file system that the device contains. Applications can
also access these block devices directly if they wish; for example, a database application may
prefer to perform its own, fine-tuned laying out of data onto the disk, rather than using the
general-purpose file system.
Character devices: include most other devices, such as mice and keyboards. The fundamental
difference between block and character devices is random access-block devices may be accessed
randomly, while character devices are accessed only serially. For example, seeking to a certain
position in a file might be supported for a DVD but makes no sense for a pointing device such as
a mouse.
Network devices: are dealt with differently from block and character devices. Users cannot
directly transfer data to network devices; instead, they must communicate indirectly by opening a
connection to the kernel's networking subsystem.
Interprocess Communication
Linux provides a rich environment for processes to communicate with each other.
Communication may be just a matter of letting another process know that some event has
occurred, or it may involve transferring data from one process to another.
Synchronization
The standard Linux mechanism for informing a process that an event has occurred is the signal.
Signals can be sent from any process to any other process, with restrictions on signals sent to
processes owned by another user. However, a limited number of signals are available, and they
cannot carry information. Only the fact that a signal has occurred is available to a process.
Signals are not generated only by processes. The kernel also generates signals internally; for
example, it can send a signal to a server process when data arrive on a network channel, to a
parent process when a child terminates, or to a waiting process when a timer expires.
Internally, the Linux kernel does not use signals to communicate with processes running in
kernel mode. If a kernel-mode process is expecting an event to occur, it will not normally use
signals to receive notification of that event. Rather, communication about incoming
asynchronous events within the kernel takes place through the use of scheduling states and wait
queue structures. These mechanisms allow kernel-mode processes to inform one another about
relevant events, and they also allow events to be generated by device drivers or by the
networking system. Whenever a process wants to wait for some event to complete, it places itself
on a wait queue associated with that event and tells the scheduler that it is no longer eligible for
execution. Once the event has completed, it will wake up every process on the wait queue. This
procedure allows multiple processes to wait for a single event. For example if several processes
are trying to read a file from a disk, then they will all be awakened once the data have been read into
memory successfully.
Although signals have always been the main mechanism for communicating asynchronous
events among processes, Linux also implements the semaphore mechanism of System V UNIX.
A process can wait on a semaphore as easily as it can wait for a signal, but semaphores have two
advantages: Large numbers of semaphores can be shared among multiple independent processes,
and operations on multiple semaphores can be performed atomically. Internally, the standard
Linux wait queue mechanism synchronizes processes that are communicating with semaphores.
Ubuntu Distributed Operating System
A distributed operating system (DOS) allows the creation, implementation and completion of
programs (a single task or a number of different tasks running concurrently – often referred to as
parallel processing) that can utilize a network of general purpose and often heterogeneous
computers to accomplish a task. This network of computers forms a single multi-computer or
cluster. Unlike a parallel processor or SMP (symmetric multiprocessor) system that has common
RAM for UMA (uniform memory access) and has other shared resources, a cluster-based
system’s individual computers (nodes) have no direct access to the resources of other nodes.
Therefore, although conceptually similar in the functions required of an SMP operating system,
the cluster OS has additional requirements to provide communication between cluster nodes
which may be dissimilar in hardware and local operating systems. This is a heterogeneous
cluster.
Distributed Operating Systems (DOS) have the following benefits:
Speed - A process can be run faster by being divided into subtasks (threads) that are run
on two or more nodes. Also, more independent processes can be run at the same time.
Low-powered machines can be combined to reach supercomputer performance by having
each node process a small portion of the task.
Fail Safe Operation - Node failure can be compensated for by other nodes in the cluster,
improving program reliability.
Remote Data Capture - Remote data acquisition systems, like point of sale, require a
distributed system to work correctly.
Off the Shelf - DOS can use cheap, generic and even dissimilar hardware.
Scalability - DOS can be easily reshaped, expanded and contracted in scope as needs
change.
Cost – Clusters offer a very high performance to price ratio.
A DOS needs to supply these basic services:
The ability to run processes on remote computers efficiently (distributed scheduling).
The ability to provide data required for processes and get results from processes.
The ability to provide resources required for processes.
Synchronization of processes so process interaction is possible.
A DOS may also provide process migration if required; however, this can cause high
overhead and complications in the DOS design.
Distributed Topographies
A DOS is used in the control of clusters. These networked collections of personal computers and
workstations are often referred to as a scalable computing cluster (SCC), isolated from a general
purpose network of workstations (NOW). A NOW is used by multiple users. An SCC tends to
consist of homogenous machines for ease of control and speed, while a NOW can be
heterogeneous in the mix of PCs and workstations. Unlike massively parallel processors (MPP)
which typically allow a single user per partition (logical collection of processors), these
topographies generally allow multiple users and time sharing. A cluster can emulate an MPP to
solve problems using similar parallel programming methods.
Linux as a Basis for Building a Distributed Operating System
Linux has an inherent capacity to be developed into a DOS. Linux Inter-Process Communication
(IPC) provides a set of mechanisms for data exchange and synchronization. These include:
Shared memory - when one process writes some information in a shared memory,
other processes can immediately use it.
IPC Messages - structures for data exchange between processes using system
calls.
Semaphores – for process synchronization.
Linux is the Preferred Basis for DOS
Synchronicity is often the basis of scientific advancement. At about the same point in time these
factors came into play:
Linus Torvalds' Linux is made available to developers. It's free, flexible, open source,
and inexpensive.
Powerful desktop computers that can run Linux are plentiful and low cost.
Easy, cheap and standardized networking becomes available.
Demand for distributed operating systems grows at an unparalleled rate.
These factors have caused Linux to be adopted as a basis of many distributed operating systems.
Because Linux is free with open source code, universities and research labs can experiment with
it and create application programmer interfaces (APIs; although, strictly speaking, many of these
tools were system-programming tools). These efforts addressed the need, becoming evident in
the late 1980s, for distributed as well as massively parallel systems to be developed.
References
1. Silberschatz, A., Galvin, P., Gagne, G. (2009). Operating System Concepts, 8th edition. Hoboken, NJ: John Wiley & Sons.
2. Stutz, M. (2001). The Linux Cookbook: Tips and Techniques for Everyday Use. San Francisco: No Starch Press.
3. Turnbull, J. (2005). Hardening Linux. New York, NY: Springer-Verlag.