Bahir Dar University
Institute of Technology
School of Computing & Electrical Engineering
Electrical & Computer Engineering
Operating System
Case Study: Ubuntu 12.04
Abrham Aschale 087/2000
Haimanot Tizazu 126/2000
Yihenew Alemu 165/2000
January, 2013
Table of Contents
Components of a Linux System
Process Management
    Process Scheduling
    Kernel Synchronization
    Deadlock Management
Memory Management
    Virtual Memory Management
Input Output
Ubuntu Distributed Operating System
References
Components of a Linux System
The Linux system is composed of three main bodies of code, in line with most traditional UNIX
implementations:
1. Kernel. The kernel is responsible for maintaining all the important abstractions of the
operating system, including such things as virtual memory and processes.
2. System libraries. The system libraries define a standard set of functions through which
applications can interact with the kernel. These functions implement much of the
operating-system functionality that does not need
the full privileges of kernel code.
3. System utilities. The system utilities are programs that perform individual, specialized
management tasks. Some system utilities may be invoked just once to initialize and
configure some aspect of the system; others, known as daemons in UNIX terminology,
may run permanently, handling such tasks as responding to incoming network
connections, accepting logon requests from terminals, and updating log files.
Figure 1 illustrates the various components that make up a full Linux system. The most
important distinction here is between the kernel and everything else. All the kernel code executes
in the processor's privileged mode with full access to all the physical resources of the computer.
Linux refers to this privileged mode as kernel mode. Under Linux, no user-mode code is built into
the kernel. Any operating-system-support code that does not need to run in kernel mode is placed
into the system libraries instead. Although various modern operating systems have adopted a
message passing architecture for their kernel internals, Linux retains UNIX's historical model:
the kernel is created as a single, monolithic binary. The main reason is to improve performance.
Because all kernel code and data structures are kept in a single address space, no context
switches are necessary when a process calls an operating-system function or when a hardware
interrupt is delivered.
Figure 1. Components of the Linux system
This single address space contains not only the core scheduling and virtual memory code but all
kernel code, including all device drivers, file systems, and networking code.
Even though all the kernel components share this same melting pot, there is still room for
modularity. In the same way that user applications can load shared libraries at run time to pull
in a needed piece of code, the Linux kernel can load (and unload) modules dynamically at run
time. The kernel does not necessarily need to know in advance which modules may be loaded;
they are truly independent loadable components.
The Linux kernel forms the core of the Linux operating system. It provides all the functionality
necessary to run processes, and it provides system services to give arbitrated and protected
access to hardware resources. The kernel implements all the features required to qualify as an
operating system. On its own, however, the operating system provided by the Linux kernel looks
nothing like a UNIX system. It is missing many of the extra features of UNIX, and the features
that it does provide are not necessarily in the format in which a UNIX application expects them
to appear. The operating-system interface visible to running applications is not maintained
directly by the kernel. Rather, applications make calls to the system libraries, which in turn call
the operating system services as necessary.
The system libraries provide many types of functionality. At the simplest level, they allow
applications to make kernel-system service requests. Making a system call involves transferring
control from unprivileged user mode to privileged kernel mode; the details of this transfer vary
from architecture to architecture. The libraries take care of collecting the system-call arguments
and, if necessary, arranging those arguments in the special form necessary to make the system
call.
The libraries may also provide more complex versions of the basic system calls. For example,
the C language's buffered file-handling functions are all implemented in the system libraries,
providing more advanced control of file I/O than the basic kernel system calls. The libraries also
provide routines that do not correspond to system calls at all, such as sorting algorithms,
mathematical functions, and string-manipulation routines. All the functions necessary to support
the running of UNIX or POSIX applications are implemented here in the system libraries.
The Linux system includes a wide variety of user-mode programs, both system utilities and user
utilities. The system utilities include all the programs necessary to initialize the system, such as
those to configure network devices and to load kernel modules. Continually running server
programs also count as system utilities; such programs handle user login requests, incoming
network connections, and the printer queues.
Not all the standard utilities serve key system-administration functions. The UNIX user
environment contains a large number of standard utilities to do simple everyday tasks, such as
listing directories, moving and deleting files, and displaying the contents of a file. More complex
utilities can perform text-processing functions, such as sorting textual data and performing
pattern searches on input text. Together, these utilities form a standard tool set that users can
expect on any UNIX system; although they do not perform any operating-system function, they
are an important part of the basic Linux system.
Process Management
A process is the basic context within which all user-requested activity is serviced within the
operating system. To be compatible with other UNIX systems, Linux must use a process model
similar to those of other versions of UNIX. Linux operates differently from UNIX in a few key
places, however. In this section, we review the traditional UNIX process model and introduce
Linux's own threading model.
The fork() and exec() Process Model
The basic principle of UNIX process management is to separate two operations: the creation of a
process and the running of a new program. A new process is created by the fork() system call,
and a new program is run after a call to exec(). These are two distinctly separate functions. A
new process may be created with fork() without a new program being run; the new subprocess
simply continues to execute exactly the same program that the first (parent) process was running.
Equally, running a new program does not require that a new process be created first: any process
may call exec() at any time. The currently running program is immediately terminated, and the
new program starts executing in the context of the existing process.
This model has the advantage of great simplicity. It is not necessary to specify every detail of the
environment of a new program in the system call that runs that program; the new program simply
runs in its existing environment. If a parent process wishes to modify the environment in which a
new program is to be run, it can fork and then, still running the original program in a child
process, make any system calls it requires to modify that child process before finally executing
the new program.
Under UNIX, then, a process encompasses all the information that the operating system must
maintain to track the context of a single execution of a single program. Under Linux, we can
break down this context into a number of specific sections. Broadly, process properties fall into
three groups: the process identity, environment, and context.
Process Identity
A process identity consists mainly of the following items:
Process ID (PID). Each process has a unique identifier. The PID is used to specify the process
to the operating system when an application makes a system call to signal, modify, or wait for
the process. Additional identifiers associate the process with a process group (typically, a tree of
processes forked by a single user command) and login session.
Credentials. Each process must have an associated user ID and one or more group IDs that
determine the rights of a process to access system resources and files.
Personality. Process personalities are not traditionally found on UNIX systems, but under Linux
each process has an associated personality identifier that can slightly modify the semantics of
certain system calls. Personalities are primarily used by emulation libraries to request that system
calls be compatible with certain varieties of UNIX.
Most of these identifiers are under the limited control of the process itself. The process group
and session identifiers can be changed if the process wants to start a new group or session. Its
credentials can be changed, subject to appropriate security checks. However, the primary PID of a
process is unchangeable and uniquely identifies that process until termination.
Scheduling
Scheduling is the job of allocating CPU time to different tasks within an operating system.
Normally, we think of scheduling as being the running and interrupting of processes, but another
aspect of scheduling is also important to Linux: the running of the various kernel tasks. Kernel
tasks encompass both tasks that are requested by a running process and tasks that execute
internally on behalf of a device driver.
Process Scheduling
Linux has two separate process-scheduling algorithms. One is a time-sharing algorithm for fair,
preemptive scheduling among multiple processes; the other is designed for real-time tasks, where
absolute priorities are more important than fairness.
The scheduling algorithm used for routine time-sharing tasks received a major overhaul with
Version 2.6 of the kernel. Earlier versions of the Linux kernel ran a variation of the traditional
UNIX scheduling algorithm, which does not provide adequate support for SMP systems and does
not scale well as the number of tasks on the system grows. The overhaul of the Linux scheduler
with Version 2.6 of the kernel provides a scheduling algorithm that runs in constant time, known
as O(1), regardless of the number of tasks on the system. The new scheduler also provides
increased support for SMP, including processor affinity and load balancing, as well as
maintaining fairness and support for interactive tasks.
Kernel Synchronization
The way the kernel schedules its own operations is fundamentally different from the way it
schedules processes. A request for kernel-mode execution can occur in two ways. A running
program may request an operating-system service, either explicitly via a system call or
implicitly, for example, when a page fault occurs. Alternatively, a device controller may deliver a
hardware interrupt that causes the CPU to start executing a kernel-defined handler for that
interrupt.
The problem posed to the kernel is that all these tasks may try to access the same internal data
structures. If one kernel task is in the middle of accessing some data structure when an interrupt
service routine executes, then that service routine cannot access or modify the same data without
risking data corruption. This fact relates to the idea of critical sections: portions of code that
access shared data and that must not be allowed to execute concurrently.
As a result, kernel synchronization involves much more than just process scheduling. A
framework is required that allows kernel tasks to run without violating the integrity of shared
data.
Prior to Version 2.6, Linux was a nonpreemptive kernel, meaning that a process running in kernel
mode could not be preempted, even if a higher-priority process became available to run. With
Version 2.6, the Linux kernel became fully preemptive; so a task can now be preempted when it
is running in the kernel.
The Linux kernel provides spinlocks and semaphores (as well as reader-writer versions of these
two locks) for locking in the kernel. On SMP machines, the fundamental locking mechanism is a
spinlock; the kernel is designed so that the spinlock is held only for short durations. On single-
processor machines, spinlocks are inappropriate for use and are replaced by enabling and
disabling kernel preemption. That is, on single-processor machines, rather than holding a
spinlock, the task disables kernel preemption. When the task would otherwise release the
spinlock, it enables kernel preemption.

Linux uses an interesting approach to disable and enable kernel preemption. It provides two
simple system calls, preempt_disable() and preempt_enable(), for disabling and enabling kernel
preemption. However, in addition, the kernel is not preemptible if a kernel-mode task is holding
a lock. To enforce this rule, each task in the system has a thread-info structure that includes the
field preempt_count, which is a counter indicating the number of locks being held by the task.
The counter is incremented when a lock is acquired and decremented when a lock is released. If
the value of preempt_count for the task currently running is greater than zero, it is not safe to
preempt the kernel, as this task currently holds a lock. If the count is zero, the kernel can safely
be interrupted, assuming there are no outstanding calls to preempt_disable().
Deadlock Management
When two or more processes are each blocked waiting for a resource held by another, none of
them can proceed. This situation is called a deadlock. A deadlock can occur only when all four
of the following necessary conditions hold:
Mutual Exclusion
Hold and Wait
No Preemption
Circular wait
A process uses a resource in the following sequence:
Request the resource
Use the resource
Release the resource
Deadlock prevention is the set of steps we follow to ensure deadlocks cannot occur. More
broadly, there are three methods by which deadlocks can be handled:
Use a protocol to prevent or avoid deadlocks, ensuring that the system never enters a
deadlocked state.
Allow the system to enter a deadlocked state, detect it, and recover from it.
Ignore the problem.
Deadlock detection is the process by which the operating system detects a deadlocked state and
tries to recover from it. Recovery takes one of two forms: either the user resolves the deadlock
manually, or the system recovers from it automatically.
When recovering from the deadlock automatically, there are two options:
Process termination
Terminate all deadlocked processes.
Terminate one process at a time until the deadlock cycle is eliminated.
Resource preemption, which raises three issues:
Victim selection
Rollback
Starvation
Ubuntu 12.04 is well equipped with the deadlock-detection methods mentioned above and avoids
causing deadlocks.
Memory Management
Memory management of a computer deals mainly with the allocation of space to processes in
the Random Access Memory (RAM) of the computer. This memory management process is an
extremely vital part of the computer because the effective management of the memory of the
computer is what helps the computer work at great speeds and effectively execute multiple tasks
without the problem of a system overload and failure.
Memory is divided into two main components: physical memory and logical memory. The
physical memory is the memory available in the RAM and is divided into multiple memory
blocks, each with a physical address, that are used to load processes.
Logical memory comes into play when the operating system has to assign more than one
physical memory block from the RAM to a particular process. In this situation the process would
have more than one physical address, which causes problems in situations where paging is
done. Therefore the operating system gives the process a logical memory block, which is a
combination of the physical memory blocks of the system, and this logical memory block has a
single logical address assigned to it.
Ubuntu, along with all other Linux systems, uses a feature called cached memory when handling
the memory in the RAM. The kernel's page cache keeps the amount of truly free memory in the
system to a minimum at any given time; tools such as "top" make this behaviour visible.
This is because RAM that sits unused is effectively wasted. The kernel therefore fills otherwise
idle memory with what is known as cached space. This cached memory, though it appears to be
occupied, is actually available for the system to reclaim whenever new programs need to be
loaded. Another reason Linux uses this method is that it is faster to access information from the
cached space than from the hard disk itself.
Virtual Memory Management
Virtual memory is a technique, backed by a partitioned-off segment of the hard disk, that allows
large processes requiring more memory than is available in the RAM (Random Access Memory)
to be run. The basic tactic is that a segment of secondary storage, usually a drive, is allocated
for the large process to load into, and a small region of RAM is given into which the needed
pages are loaded at run time for each particular task.
The way this operates is as follows: when a large process needs to be executed and the memory
available in the RAM is insufficient, the system allocates a segment of the secondary storage
device for this process to load its files. Thereafter, the pages required by the process to run are
loaded when needed, using a technique known as paging or swapping. A page is loaded into the
RAM when the process requires it, and after the process is done with that particular page, the
page is removed from memory and a new page loaded.
This method of creating virtual memory is extremely valuable, as it allows large processes to be
loaded into the system without overloading memory. The preferred method of transferring pages
is the swapping technique, because it allows more than one file to be loaded at a time, making a
far more effective system that operates at greater speeds.
Ubuntu uses a feature called swap space for managing memory. This is a space on the hard disk
which forms part of the virtual memory. The system uses this space to store files that are
currently inactive, creating room for new files to be loaded. The swap space temporarily stores a
file until either the file is needed again or the process is terminated. This method allows for
faster speeds on the system. However, a process will not directly use files stored in the swap
space, as this would cause the system to operate more slowly.
Another method used to manage virtual memory is a technique known as demand paging. It
works as follows: when the system is required to run a process that is too big to be loaded into
physical memory completely, the system loads only the pages immediately required. The
remaining pages stay in the virtual memory, and thereafter, when the process requires a page to
work with, it is loaded from the virtual memory. This process is known as demand paging.
The way the Linux operating system handles demand paging is similar across most Linux
systems. When a process starts to execute, the operating system maps its files into the virtual
memory. When a command to execute is given, the file containing it is opened and its contents
mapped into the process's virtual memory. This is carried out by changing the data structure that
describes the process's memory map, and is known as memory mapping. However, the entire
file is not brought in; only the first part is. Accessing the rest creates a page fault, and the
operating system then searches the memory map to determine which remaining part of the file to
bring in for execution.
Input Output
To the user, the I/O system in Linux looks much like that in any UNIX system. That is, to the
extent possible, all device drivers appear as normal files. Users can open an access channel to a
device in the same way they open any other file; devices can appear as objects within the file
system. The system administrator can create special files within a file system that contain
references to a specific device driver, and a user opening such a file will be able to read from and
write to the device referenced. By using the normal file-protection system, which determines
who can access which file, the administrator can set access permissions for each device.

Figure 2. Device-driver block structure.
Linux splits all devices into three classes: block devices, character devices, and network devices.
Figure 2 illustrates the overall structure of the device-driver system.
Block Devices: include all devices that allow random access to completely independent, fixed-
sized blocks of data, including hard disks and floppy disks, CD-ROMs, and flash memory. Block
devices are typically used to store file systems, but direct access to a block device is also allowed
so that programs can create and repair the file system that the device contains. Applications can
also access these block devices directly if they wish; for example, a database application may
prefer to perform its own, fine-tuned laying out of data onto the disk, rather than using the
general-purpose file system.
Character devices: include most other devices, such as mice and keyboards. The fundamental
difference between block and character devices is random access-block devices may be accessed
randomly, while character devices are accessed only serially. For example, seeking to a certain
position in a file might be supported for a DVD but makes no sense for a pointing device such as
a mouse.
Network devices: are dealt with differently from block and character devices. Users cannot
directly transfer data to network devices; instead, they must communicate indirectly by opening a
connection to the kernel's networking subsystem.
Interprocess Communication
Linux provides a rich environment for processes to communicate with each other.
Communication may be just a matter of letting another process know that some event has
occurred, or it may involve transferring data from one process to another.
Synchronization
The standard Linux mechanism for informing a process that an event has occurred is the signal.
Signals can be sent from any process to any other process, with restrictions on signals sent to
processes owned by another user. However, a limited number of signals are available, and they
cannot carry information. Only the fact that a signal has occurred is available to a process.
Signals are not generated only by processes. The kernel also generates signals internally; for
example, it can send a signal to a server process when data arrive on a network channel, to a
parent process when a child terminates, or to a waiting process when a timer expires.
Internally, the Linux kernel does not use signals to communicate with processes running in
kernel mode. If a kernel-mode process is expecting an event to occur, it will not normally use
signals to receive notification of that event. Rather, communication about incoming
asynchronous events within the kernel takes place through the use of scheduling states and wait
queue structures. These mechanisms allow kernel-mode processes to inform one another about
relevant events, and they also allow events to be generated by device drivers or by the
networking system. Whenever a process wants to wait for some event to complete, it places itself
on a wait queue associated with that event and tells the scheduler that it is no longer eligible for
execution. Once the event has completed, it will wake up every process on the wait queue. This
procedure allows multiple processes to wait for a single event. For example if several processes
are trying to read a file from a disk, then they will all be awakened once the data have been read into
memory successfully.
Although signals have always been the main mechanism for communicating asynchronous
events among processes, Linux also implements the semaphore mechanism of System V UNIX.
A process can wait on a semaphore as easily as it can wait for a signal, but semaphores have two
advantages: Large numbers of semaphores can be shared among multiple independent processes,
and operations on multiple semaphores can be performed atomically. Internally, the standard
Linux wait queue mechanism synchronizes processes that are communicating with semaphores.
Ubuntu Distributed Operating System
A distributed operating system (DOS) allows the creation, implementation and completion of
programs (a single task or a number of different tasks running concurrently – often referred to as
parallel processing) that can utilize a network of general purpose and often heterogeneous
computers to accomplish a task. This network of computers forms a single multi-computer or
cluster. Unlike a parallel processor or SMP (symmetric multiprocessor) system that has common
RAM for UMA (uniform memory access) and has other shared resources, a cluster-based
system’s individual computers (nodes) have no direct access to the resources of other nodes.
Therefore, although conceptually similar in the functions required of an SMP operating system,
the cluster OS has additional requirements to provide communication between cluster nodes
which may be dissimilar in hardware and local operating systems. This is a heterogeneous
cluster.
Distributed Operating Systems (DOS) have the following benefits:
Speed - A process can be run faster by being divided into subtasks (threads) that are run
on two or more nodes. Also, more independent processes can be run at the same time.
Low-powered machines can be combined to reach supercomputer performance by having
each node process a small portion of the task.
Fail Safe Operation - Node failure can be compensated for by other nodes in the cluster,
improving program reliability.
Remote Data Capture - Remote data acquisition systems, like point of sale, require a
distributed system to work correctly.
Off the Shelf - DOS can use cheap, generic and even dissimilar hardware.
Scalability - DOS can be easily reshaped, expanded and contracted in scope as needs
change.
Cost – Clusters offer a very high performance to price ratio.
A DOS needs to supply these basic services:
The ability to run processes on remote computers efficiently (distributed scheduling).
The ability to provide data required for processes and get results from processes.
The ability to provide resources required for processes.
Synchronization of processes so process interaction is possible.
A DOS may also provide process migration if required; however, this can cause high
overhead and complications in the DOS design.
Distributed Topographies
A DOS is used in the control of clusters. These networked collections of personal computers and
workstations are often referred to as a scalable computing cluster (SCC), isolated from a general
purpose network of workstations (NOW). A NOW is used by multiple users. An SCC tends to
consist of homogenous machines for ease of control and speed, while a NOW can be
heterogeneous in the mix of PCs and workstations. Unlike massively parallel processors (MPP)
which typically allow a single user per partition (logical collection of processors), these
topographies generally allow multiple users and time sharing. A cluster can emulate an MPP to
solve problems using similar parallel programming methods.
Linux as a Basis for Building a Distributed Operating System
Linux has an inherent capacity to be developed into a DOS. Linux Inter-Process Communication
(IPC) provides a set of mechanisms for data exchange and synchronization. These include:
Shared memory - when one process writes some information in a shared memory,
other processes can immediately use it.
IPC Messages - structures for data exchange between processes using system
calls.
Semaphores – for process synchronization.
Linux is the Preferred Basis for DOS
Synchronicity is often the basis of scientific advancement. At about the same point in time these
factors came into play:
Linus Torvalds' Linux is made available to developers. It's free, flexible, open source,
and inexpensive.
Powerful desktop computers that can run Linux are plentiful and low cost.
Easy, cheap and standardized networking becomes available.
Demand for distributed operating systems grows at an unparalleled rate.
These factors have caused Linux to be adopted as a basis of many distributed operating systems.
Because Linux is free with open source code, universities and research labs can experiment with
it and create application programmer interfaces (APIs; although, strictly speaking, many of these
tools were system-programming tools). These efforts addressed the need, becoming evident in
the late 1980s, for distributed as well as massively parallel systems to be developed.
References
1. Silberschatz, A., Galvin, P., Gagne, G. (2009). Operating System Concepts, 8th edition. Hoboken, NJ: John Wiley & Sons.
2. Stutz, M. (2001). The Linux Cookbook: Tips and Techniques for Everyday Use. San Francisco: No Starch Press.
3. Turnbull, J. (2005). Hardening Linux. New York, NY: Springer-Verlag.