distributed systems

Lecture 14 – Operating System Architecture and Performance

Upload: george-foster

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Distributed Systems

Lecture 14 – Operating System Architecture and Performance

Page 2: Distributed Systems

Part 1 – Operating System Support: The most important characteristic determining an operating system’s suitability for use in a distributed system is how well it supports both local and remote interprocess communication. We will begin today’s lecture by looking in detail at the issues involved in providing this support.

Page 3: Distributed Systems

OS Features of Concern:

• What primitives does the OS provide to facilitate remote interprocess communication?

• Which standard communications protocols are supported by the OS to do this?

more …

Page 4: Distributed Systems

OS Features of Concern:

• Is the implementation open?

i.e. are the key interfaces published and widely available?

• What has been done in order to ensure that the communication operations are performed efficiently?

more …

Page 5: Distributed Systems

OS Features of Concern:

• What support exists to account for use in networks with high latency?

• What support exists to deal with disconnections from the network?

Page 6: Distributed Systems
Page 7: Distributed Systems

How primitive are the primitives?

• Does the OS provide only basic functions such as “getRequest” and “sendReply”? …

OR

• Are more sophisticated functions such as “doOperation” also provided?
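As a sketch of this layering question, here is a toy doOperation composed from two lower-level request/reply primitives. Everything here runs in one process: the in-memory queue stands in for the kernel transport, and all names other than the slide’s primitives are invented for illustration.

```python
import queue

_transport = queue.Queue()  # stands in for the kernel's message transport

def send_request(server, message):
    # Low-level primitive: hand a request message to the transport.
    _transport.put((server, message))

def get_reply():
    # Low-level primitive: block until the reply is available.
    server, message = _transport.get()
    return server(message)   # in this toy, the "server" runs in-process

def do_operation(server, message):
    # Higher-level primitive composed from the two lower-level ones.
    send_request(server, message)
    return get_reply()

# Usage: invoke a trivial "server" that upper-cases its argument.
result = do_operation(lambda m: m.upper(), "ping")
```

Whether do_operation lives in user space (as here) or in the kernel is exactly the trade-off the next slide illustrates.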

Page 8: Distributed Systems

[Figure: two placements of the invocation primitives. In one, the kernel supplies only sendRequest and getReply, and doOperation is built at user level on top of them; in the other, doOperation itself is provided by the kernel for efficiency.]

Page 9: Distributed Systems
Page 10: Distributed Systems

Theory and Practice:

• There are advantages to embedding higher level functionality in the kernel

• In practice, middleware usually provides this functionality instead

• TCP and UDP are traditionally provided by the OS and used by middleware

Page 11: Distributed Systems

The Research Continues …

Portability and Interoperability

vs.

Efficiency

Page 12: Distributed Systems

Part 2 – Openness and Standardization: Early in our study of distributed systems, we saw the importance of openness. On the other hand, an open system that is not widely used may be problematic. Standardization is therefore also important.

Page 13: Distributed Systems

Internet connectivity has become a given …

• Requirements for UDP and TCP support abound

Page 14: Distributed Systems

Sometimes novel protocols are required for special hardware (e.g. wireless) …

• Layered approach helps with this difficulty (i.e. layering allows alternative choices to be made at the lower layers)

Although TCP is a standard choice, it isn’t terribly effective for wireless communications

Page 15: Distributed Systems

Part 3 – Invocation Performance: Given the importance of both local and in particular remote invocation mechanisms, we will be concerned with the costs incurred by the operating system when providing these capabilities. We will see that the network is not the only source of performance problems.

Page 16: Distributed Systems
Page 17: Distributed Systems

What is a remote invocation? …

• Any invocation that crosses an address space boundary

• may or may not cross machine boundaries (i.e. network space)

Page 18: Distributed Systems

How is crossing address space like crossing network space?

Arguments need to be copied from space to space

• Not unlike marshalling/unmarshalling

• May rely on similar or the same mechanisms
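The parallel between crossing address spaces and crossing the network can be shown with a tiny marshalling sketch. The argument record here (one int, one double) is an invented example; Python’s struct module flattens the arguments into a byte string and rebuilds them on the other side.

```python
import struct

# Hypothetical argument record: one int and one double, in network byte order.
FMT = "!id"

def marshal(count, value):
    # Flatten the arguments into a contiguous byte string.
    return struct.pack(FMT, count, value)

def unmarshal(data):
    # Rebuild the argument values from the flat byte string.
    return struct.unpack(FMT, data)

wire = marshal(42, 3.5)          # bytes ready to cross a boundary
count, value = unmarshal(wire)   # reconstructed on the other side
```

The same flatten-then-rebuild step applies whether the boundary is between two address spaces on one machine or between two machines on a network.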

Page 19: Distributed Systems

Remote vs. Local

• local invocations require only pointers to the arguments; no address space boundary is crossed

• remote invocations are more complex: all the bytes representing the structures involved must be copied across the address spaces involved

Page 20: Distributed Systems

Network capabilities have improved substantially over the years …

… but invocation times have not kept up.

Page 21: Distributed Systems

Invocation Costs:

• crossing address space

• crossing network space

• marshalling/unmarshalling

• data copying

• thread scheduling

• context switching

Page 22: Distributed Systems

Part 4 – Measuring Performance: To fairly measure the penalty paid by remote procedure calls or remote method invocations relative to their local counterparts, we may use a “null RPC” or a “null RMI.” We will now explain what these are and discuss the typical observations made when they are used to measure performance.

Page 23: Distributed Systems

Features of null RPC/null RMI:

• Executes a null procedure

• Passes no arguments from the caller

• Returns no results to the caller

Allows measurement of delay introduced by OS/network

Page 24: Distributed Systems

Network I/O for a typical null RPC/RMI is minimal … only about 100 bytes.

@ 100 Mbps, this amounts to 0.01 milliseconds

The time required for a typical null RPC/RMI is, however, on the order of 0.1 milliseconds.

That’s 10X longer than the network calculation suggests!
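The slide’s arithmetic, worked out explicitly. The 100-byte payload, 100 Mbps rate, and 0.1 ms observation come from the slide; the rest is plain unit conversion.

```python
# Time on the wire for the ~100 bytes of network I/O in a null RPC.
payload_bits = 100 * 8              # 100 bytes, in bits
link_rate_bps = 100_000_000         # 100 Mbps
wire_time_ms = payload_bits / link_rate_bps * 1000
# wire_time_ms is 0.008 ms, which the slide rounds up to ~0.01 ms.

observed_ms = 0.1                   # typical measured null RPC/RMI time
slowdown = observed_ms / wire_time_ms   # roughly an order of magnitude
```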

Page 25: Distributed Systems

Conclusion:

Clearly there is much delay introduced by the operating system in RMI/RPC.

Page 26: Distributed Systems

Part 5 – Improving Performance: There are a number of things that the operating system can do to improve the performance of RPC/RMI. We have already mentioned that one such strategy is to embed higher-level functions in the OS, though we noted that this often comes at the expense of portability and interoperability. We will now look at some other ways to improve performance.

Page 27: Distributed Systems

• Memory sharing

Sharing memory regions allows rapid “copying” of arguments/results, reducing delay. This extends down the protocol stack from layer to layer.

• Choice of protocols

TCP/UDP … the overhead of TCP isn’t always significant. How the OS buffers TCP can matter more: if the OS’s policy is to wait for more data before sending, this can be a hindrance.
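The “wait for more data before sending” buffering policy mentioned above is, in most TCP implementations, Nagle’s algorithm. A middleware layer that issues many small request messages can ask the OS to send small writes immediately via a standard socket option. A minimal sketch (the socket is created but never connected):

```python
import socket

# Create a TCP socket and disable the kernel's small-write batching
# policy (Nagle's algorithm) so request messages go out immediately.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm the buffering policy is off.
nagle_disabled = bool(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
s.close()
```

This is a per-socket choice the OS exposes; it trades more packets on the wire for lower per-request latency.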

Page 28: Distributed Systems

• Recognition of LRPC

If an RPC takes place within a single machine, performance can be improved by recognizing this (hence the name “lightweight RPC”) and treating it differently to take advantage of the fact that no network is involved.

In one implementation, for example, a shared argument stack was used to transfer parameters directly from client to server.

Page 29: Distributed Systems

Part 6 – Network Latency: Although the preceding material implied that the operating system is to blame for all performance problems in distributed systems, network latencies can also be very high for applications running across the Internet. Such applications may additionally suffer outright network disconnection for extended periods, which can be viewed, and treated, as a period of extremely high latency. We will now discuss how this can be dealt with and the role the operating system might play.

Page 30: Distributed Systems

Asynchronous and Concurrent Invocations:

As dictated by the operating system, it may be possible to employ the following strategies:

• Permit Asynchronous Invocations (i.e. do not block when performing I/O)

• Permit Concurrent Invocations (i.e. allow many operations to take place in parallel)
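A minimal sketch of concurrent invocations. The invoke function is invented, not a real RPC binding; a sleep stands in for network and server latency. Issuing the calls in parallel lets their latencies overlap instead of adding up.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def invoke(arg):
    # Stand-in for a remote invocation: latency, then a result.
    time.sleep(0.05)
    return arg * 2

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    # Four invocations in flight at once; their delays overlap.
    results = list(pool.map(invoke, [1, 2, 3, 4]))
elapsed = time.monotonic() - start
# Sequentially this would take ~0.2 s; concurrently it takes ~0.05 s.
```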

Page 31: Distributed Systems

Recall:

• Even with one CPU this is advantageous

• This is called pipelining:

Pipelining: The simultaneous execution of many independent subtasks by a single autonomous unit.

Page 32: Distributed Systems

Persistent Invocations:

Again, the availability of this strategy may be dictated by the operating system.

• Persistent invocation does not give up

• The caller may eventually choose to cancel it

• Such a strategy may be appropriate for something like a PDA which might go out of range for a few minutes at a time
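A persistent invocation of this kind can be sketched as a retry loop with a caller-chosen deadline. The names, timings, and the use of ConnectionError to stand for a disconnection are purely illustrative.

```python
import time

def persistent_invoke(operation, deadline_s, retry_delay_s=0.01):
    """Keep retrying 'operation' through failures until it succeeds
    or the caller's deadline passes (at which point we give up)."""
    start = time.monotonic()
    while True:
        try:
            return operation()
        except ConnectionError:
            if time.monotonic() - start >= deadline_s:
                raise          # caller's patience ran out: cancel
            time.sleep(retry_delay_s)

# Usage: an operation that fails twice (simulating a device out of
# range for a while), then succeeds once "reconnected".
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("out of range")
    return "ok"

result = persistent_invoke(flaky, deadline_s=1.0)
```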

Page 33: Distributed Systems

Part 7 – Operating System Architecture: There are two primary types of operating system kernels typically available: monolithic kernels and microkernels. We will look at the features of each type and compare and contrast them. Hybrid kernels are also possible.

Page 34: Distributed Systems

Openness:

• For efficient use of resources (memory, disk, CPU), only what is required should be included. This is of particular importance with small systems such as PDAs.

• Any hardware or software component should be able to be altered without requiring changes throughout

• Alternative components should be permissible to meet user preferences

• We should be able to add services without compromising existing ones

Page 35: Distributed Systems

Kernel types: the choice of kernel type determines what level of functionality is incorporated in the kernel and what level remains in user space

• Monolithic

- large, massive

- non-modular

- difficult to adapt

Page 36: Distributed Systems

Kernel types (cont’d):

• Microkernel

- sleek and streamlined

- provides only low level basic functionality

- layers may be built on top to provide portability, or user processes can access the low-level functionality directly to improve performance.*

* This will not rival the performance of higher level functionality embedded in a monolithic kernel.

Page 37: Distributed Systems

Comparison of Kernels:

Advantages of microkernels:

- can enforce modularity

- small size suggests less likelihood of bugs

Advantage of monolithic kernels:

- efficient operations

Page 38: Distributed Systems

Hybrid Kernels:

Performance problems are the biggest disadvantage of microkernels. To deal with this, loadable kernel modules have been developed: modules loaded into the kernel’s address space to provide higher-level functionality without requiring any crossing of address spaces.

The research continues …

Page 39: Distributed Systems

Part 8 – Applications: We will now discuss a number of questions based upon applying the material from this lecture.

Page 40: Distributed Systems

Discuss encapsulation, concurrency, protection, name resolution, parameter passing, and scheduling in the context of the UNIX file service running on a single computer.

Page 41: Distributed Systems

Encapsulation:

A process may access file data and attributes only through the system call interface

Concurrency:

Several processes may access the same or different files concurrently. Locks may be placed on files by processes.

Page 42: Distributed Systems

Protection:

Users set access permissions using the familiar ugo/rwx format. Processes are associated with particular users and groups.

Name Resolution:

Pathnames are resolved by looking up each component in the appropriate directory until the actual filename is reached.
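The lookup loop just described can be sketched with directories modeled as nested dicts. This is a toy stand-in: a real kernel walks on-disk directory entries and inodes, and handles mount points and symbolic links along the way.

```python
def resolve(root, path):
    """Look up each pathname component in turn, starting from 'root',
    until the final component (the actual filename) is reached."""
    node = root
    for component in path.strip("/").split("/"):
        if not isinstance(node, dict) or component not in node:
            raise FileNotFoundError(path)
        node = node[component]   # descend into the next directory
    return node

# Toy filesystem: directories are dicts, files are leaf values.
fs = {"home": {"alice": {"notes.txt": "inode 42"}}}
target = resolve(fs, "/home/alice/notes.txt")
```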

Page 43: Distributed Systems

Parameter Passing:

May be done by passing them in machine registers during a system call or by copying them between address spaces

Scheduling:

There are no separate file system threads. All file activity executes in the kernel.

Page 44: Distributed Systems

Why are some system interfaces implemented by dedicated system calls to the kernel while others are built on top of message-based system calls?

Page 45: Distributed Systems

Dedicated system calls are more efficient than message-based calls; however, there is an advantage to implementing a system call as an RPC: it makes operations on local and remote resources transparent.

Page 46: Distributed Systems

What are the advantages of using copy-on-write in UNIX, where a call to fork is often followed by a call to exec? What should occur in the event that a region that has been copied is itself copied?

Page 47: Distributed Systems

It would be wasteful to copy the address space of the forked process, since it is immediately replaced by the exec. With copy-on-write, only the few pages that need to be modified prior to the exec are copied.

If exec is not called and the forked process forks again, there are then three processes whose copy-on-write regions depend on one another: the parent, the child, and the grandchild. This chain of dependencies complicates the copy-on-write policy.
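The copy-on-write mechanism itself can be simulated at user level. This is a toy model with invented names: a real kernel tracks frame reference counts (which this sketch omits, so a sole owner still copies on write), but the essential behavior is the same: fork copies only the page table, and a frame is duplicated only on the first write.

```python
class CowSpace:
    """Toy address space: page number -> index into a shared frame list."""

    def __init__(self, page_table, frames):
        self.page_table = page_table   # page -> frame index
        self.frames = frames           # frame index -> mutable frame data
        self.private = set()           # pages this space has already copied

    @classmethod
    def fresh(cls, contents):
        frames = [list(c) for c in contents]
        return cls({i: i for i in range(len(frames))}, frames)

    def fork(self):
        # Fork copies only the page table; every frame stays shared.
        return CowSpace(dict(self.page_table), self.frames)

    def read(self, page):
        return bytes(self.frames[self.page_table[page]])

    def write(self, page, data):
        if page not in self.private:
            # First write from this space: copy the frame now.
            self.frames.append(list(self.frames[self.page_table[page]]))
            self.page_table[page] = len(self.frames) - 1
            self.private.add(page)
        self.frames[self.page_table[page]][:] = data

parent = CowSpace.fresh([b"aaaa", b"bbbb"])
child = parent.fork()        # nothing is copied at fork time
child.write(0, b"cccc")      # page 0 is copied only now; page 1 stays shared
```

If exec followed the fork, only pages written before it (here, page 0) would ever be copied; a further fork by the child would add a third page table sharing the same frames, which is where the bookkeeping gets complicated.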