
Scale of Distributed Systems

Uploaded by: mahender-kumar

Posted on: 24-Dec-2015



What is a distributed system?

• A collection of independent computers that appears to its users as a single coherent system

• A distributed system is often organized as middleware: the middleware layer runs on all machines and offers a uniform interface to the system.

What is scale?

• A system is said to be scalable if it can handle the addition of users and resources without suffering a noticeable loss of performance or increase in administrative complexity.

• Building a system that makes full use of such added resources requires an understanding of the problems of scale.

• The scale of a system has three dimensions:
– Numerical
– Geographical
– Administrative

Dimensions of Scale

• Numerical: the number of users and objects that are part of the system

• Geographical: the distance between the farthest nodes in the system

• Administrative: the number of organizations that exert administrative control over pieces of the system

• If a system is expected to grow, its ability to scale must be considered when the system is designed

Distributed systems consciously designed to scale

• Grapevine was one of the earliest distributed computer systems consciously designed to scale

More recent projects have concentrated on particular subsystems:

Internet Domain Naming System

Kerberos

Sprite


DEC’s Global Naming and Authentication services

Projects concentrating on complete scalable systems:

Locus

Dash

Amoeba

The Effects of Scale

• Scale affects systems in numerous ways.

• Scalability is negatively affected when the system is based on:
– Centralized servers: a single server for all users
– Centralized data: a single database for all users
– Centralized algorithms: one site collects all information, processes it, and distributes the results to all sites

Contd.

• These are some of the issues that affect the scalability of a system as a whole.

Reliability: As a system scales geographically, it becomes less likely that all components will be able to communicate with one another at any given time. This can be overcome by:

Autonomy

Replication
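Replication helps reliability because a client that cannot reach one copy can try another. Below is a minimal sketch of client-side failover. It assumes a hypothetical `fetch(replica, key)` transport call that raises `Unreachable` on timeout. The replica names and data are illustrative only.

```python
from typing import Callable, Sequence


class Unreachable(Exception):
    """Raised by the (hypothetical) transport when a replica cannot be reached."""


def read_with_failover(replicas: Sequence[str], key: str,
                       fetch: Callable[[str, str], str]) -> str:
    """Try replicas in order (e.g. nearest first); the first answer wins."""
    failures = []
    for replica in replicas:
        try:
            return fetch(replica, key)
        except Unreachable as exc:
            failures.append(f"{replica}: {exc}")   # remember why, then try the next copy
    raise Unreachable("all replicas unreachable: " + "; ".join(failures))


# Usage: simulate a partition that cuts the client off from replica-a.
DOWN = {"replica-a"}
DATA = {"user:42": "alice"}


def fake_fetch(replica: str, key: str) -> str:
    if replica in DOWN:
        raise Unreachable("timeout")
    return DATA[key]


print(read_with_failover(["replica-a", "replica-b"], "user:42", fake_fetch))  # alice
```

Autonomy is the complementary idea. A site that can keep working from its own local data does not depend on this loop succeeding at all.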

System load: As a system grows, the amount of data that network services must manage grows, and so does the total number of requests.

This can be overcome by:

Replication

Distribution and caching

Contd.

Administration: As the number of nodes in a system grows, it becomes impractical to maintain information about the system and its users on each node.

This can be overcome by maintaining common information centrally.

Heterogeneity: Systems that cross administrative boundaries are likely to include not only different types of hardware but also different operating systems, or different versions of the same operating system.

Dealing with heterogeneity:

Coherence

The Effects of Scale on Particular Subsystems


• The three dimensions of scale affect distributed systems in many ways.

• Scale also affects the user's ability to easily interact with the system.

• Among the affected components are:

Naming and directory services

Security subsystems

Remote resources

Naming and Directory Services

• A name server maps a name to information about the name’s binding

• The information might be the address of an object or it might be the general

Granularity of Naming

UID-Based Naming

Directory Services

Contd.

• Granularity of naming: Name servers differ in the size of the objects they name.

Some name servers name only hosts, some name individual users and services, and a few name individual files.

The size of the naming database, the frequency of queries, and the read-to-write ratio are all affected by the granularity of the objects named.

• UID-based naming: Objects are named by unique identifiers (UIDs). A UID usually contains information identifying the server that stores the object, together with an identifier for the object on that server.

A problem with UID-based naming is that objects move. Because a UID often identifies the server on which the object resides, it can point to the wrong server once the object has moved.
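One common mitigation is to leave a forwarding pointer on the old server when an object moves; this is an assumption, not something the slides describe. The sketch uses a hypothetical `UID` made of a server name and a per-server object number. An in-memory dictionary stands in for the network. Each lookup reports how many servers it had to contact.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UID:
    server: str      # server that created the object (only a location hint)
    object_id: int   # number unique on that server


@dataclass
class Server:
    objects: dict[UID, bytes] = field(default_factory=dict)  # data stored here
    forward: dict[UID, str] = field(default_factory=dict)    # UID -> server it moved to


servers = {"a": Server(), "b": Server()}   # hypothetical two-server cluster


def move(uid: UID, src: str, dst: str) -> None:
    """Migrate an object and leave a forwarding pointer behind."""
    servers[dst].objects[uid] = servers[src].objects.pop(uid)
    servers[src].forward[uid] = dst


def lookup(uid: UID, max_hops: int = 8) -> tuple[bytes, int]:
    """Start at the server named in the UID; return (data, servers contacted)."""
    name = uid.server
    for contacted in range(1, max_hops + 1):
        srv = servers[name]
        if uid in srv.objects:
            return srv.objects[uid], contacted
        if uid not in srv.forward:
            raise KeyError(f"{uid} not found")
        name = srv.forward[uid]            # the UID was stale: follow the pointer
    raise RuntimeError("too many forwarding hops")


uid = UID("a", 1)
servers["a"].objects[uid] = b"report"
print(lookup(uid))   # (b'report', 1)
move(uid, "a", "b")
print(lookup(uid))   # (b'report', 2): the stale hint costs an extra hop
```

This design keeps UIDs unchanged forever, at the cost of a longer lookup path. A real system would also need some way to shorten or expire forwarding chains.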

Contd.

• Directory services: A directory maps names to the UIDs of files, other directories, or in fact any object for which a UID exists.

There is no requirement that a subdirectory be on the same server as its parent.

Different parts of the name space can reside on different machines.

A directory server can support pieces of independent name spaces, and it is possible for those name spaces to overlap, or even to contain cycles.
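Path resolution can cross several servers, because each directory may live on a different server from its parent. The sketch below assumes a hypothetical in-memory layout in which each server maps directory UIDs to name-to-UID tables. It records which servers a lookup must contact.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UID:
    server: str
    object_id: int


# Hypothetical state: server -> {directory UID -> {name -> UID}}.
# The root lives on srv1, but its subdirectories live on other servers.
DIRECTORIES = {
    "srv1": {UID("srv1", 0): {"usr": UID("srv2", 7)}},
    "srv2": {UID("srv2", 7): {"alice": UID("srv3", 3)}},
    "srv3": {UID("srv3", 3): {"notes.txt": UID("srv3", 99)}},
}


def resolve(root: UID, path: str) -> tuple[UID, list[str]]:
    """Walk `path` one component at a time, asking whichever server holds
    the current directory. Returns the final UID and the servers contacted."""
    current, contacted = root, []
    for component in filter(None, path.split("/")):
        contacted.append(current.server)           # one request per directory
        entries = DIRECTORIES[current.server][current]
        current = entries[component]               # KeyError if the name is absent
    return current, contacted


print(resolve(UID("srv1", 0), "/usr/alice/notes.txt"))
# (UID(server='srv3', object_id=99), ['srv1', 'srv2', 'srv3'])
```

The walk is driven by the path, so it terminates even if the directory graph contains cycles. The price of spreading the name space is one request per path component.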

The Security Subsystems

• As the size of a system grows, security becomes increasingly important and increasingly difficult to implement.

• The bigger the system, the more vulnerable it is to attack

• Security has several aspects:

– Authentication: how the system verifies a user's identity

Passwords

Host based authentication

Encryption based authentication

Authentication protocols
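Encryption-based authentication avoids sending a reusable secret over the network. The sketch below swaps in a keyed hash (HMAC-SHA256 from Python's standard `hmac` module) instead of encryption. It assumes each user already shares a secret key with the server. It is a single-round challenge-response, not a full protocol such as Kerberos, and it provides no mutual authentication.

```python
import hashlib
import hmac
import secrets


class Verifier:
    """Server side of a shared-key challenge-response exchange (a sketch)."""

    def __init__(self, keys: dict[str, bytes]):
        self.keys = keys                          # user -> pre-shared secret key
        self.pending: dict[str, bytes] = {}       # user -> outstanding nonce

    def challenge(self, user: str) -> bytes:
        nonce = secrets.token_bytes(16)           # a fresh nonce prevents replay
        self.pending[user] = nonce
        return nonce

    def verify(self, user: str, response: bytes) -> bool:
        nonce = self.pending.pop(user, None)      # each nonce can be used only once
        key = self.keys.get(user)
        if nonce is None or key is None:
            return False
        expected = hmac.new(key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(expected, response)   # constant-time comparison


def respond(key: bytes, nonce: bytes) -> bytes:
    """Client side: prove knowledge of the key without ever sending it."""
    return hmac.new(key, nonce, hashlib.sha256).digest()


key = secrets.token_bytes(32)
server = Verifier({"alice": key})
nonce = server.challenge("alice")
print(server.verify("alice", respond(key, nonce)))   # True
print(server.verify("alice", respond(key, nonce)))   # False: the nonce was used up
```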

Contd.

– Authorization: how the system decides whether an authenticated user may perform an operation

In one approach, a request is sent to an authorization service whenever a server needs to make an access control decision. The authorization service makes the decision and sends its answer back to the server.

Alternatively, the client is first authenticated, and then the server makes its own decision about whether the client is authorized to perform an operation.
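The two authorization models can be contrasted in a few lines. This is a sketch under stated assumptions: `AuthorizationService`, `FileServer`, and the ACL layout (operation mapped to allowed principals) are hypothetical names. The client is assumed to be already authenticated.

```python
from typing import Optional

ACL = dict[str, set[str]]   # operation -> principals allowed to perform it


class AuthorizationService:
    """Hypothetical central service that holds ACLs on behalf of many servers."""

    def __init__(self, acls: dict[str, ACL]):
        self.acls = acls    # server name -> that server's ACL

    def check(self, server: str, principal: str, op: str) -> bool:
        return principal in self.acls.get(server, {}).get(op, set())


class FileServer:
    def __init__(self, name: str, local_acl: Optional[ACL] = None,
                 authz: Optional[AuthorizationService] = None):
        self.name = name
        self.local_acl = local_acl or {}
        self.authz = authz

    def handle(self, principal: str, op: str) -> str:
        # `principal` is assumed to have been authenticated already.
        if self.authz is not None:
            # Model 1: ask the authorization service (a network round trip in practice).
            allowed = self.authz.check(self.name, principal, op)
        else:
            # Model 2: the server decides for itself from its own ACL.
            allowed = principal in self.local_acl.get(op, set())
        return "ok" if allowed else "denied"


service = AuthorizationService({"fs1": {"read": {"alice", "bob"}}})
central = FileServer("fs1", authz=service)
local = FileServer("fs2", local_acl={"read": {"alice"}})
print(central.handle("bob", "read"))   # ok
print(local.handle("bob", "read"))     # denied
```

Centralizing ACLs eases administration but places a shared service on the path of every access decision. Local ACLs avoid that dependency but must be maintained on each server.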

Remote Resources

• Scale affects the sharing of many kinds of resources like processors, memory, storage, programs, and physical devices.

• The services that provide access to these resources often inherit scalability problems from the naming and security mechanisms they use.

• This section will look at the scaling issues related to network communication in such services

Communication

File Systems


Communication

Communication typically takes one of two forms:

– Point-to-point: the client sends messages to the particular server that can satisfy the request.

– Broadcast: the client sends the message to everyone, and only those servers that can satisfy the request respond.

As a system grows geographically, the communication medium places limits on the system's performance.

These limits must be considered when deciding how best to access a remote resource.
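The cost difference between the two forms of communication shows up in message counts. Below is a minimal sketch with a hypothetical in-memory `Server` type. It counts only request messages, not replies. A point-to-point request always costs one message, while a broadcast costs one message per server.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    resources: set[str] = field(default_factory=set)   # what this server can satisfy


def point_to_point(target: Server, resource: str) -> tuple[list[str], int]:
    """The client already knows which server to ask: one request message."""
    responders = [target.name] if resource in target.resources else []
    return responders, 1


def broadcast(servers: list[Server], resource: str) -> tuple[list[str], int]:
    """The client asks every server; only those that can satisfy the request respond."""
    responders = [s.name for s in servers if resource in s.resources]
    return responders, len(servers)        # one request message per server


servers = [Server(f"s{i}") for i in range(1000)]
servers[7].resources.add("printer")
print(broadcast(servers, "printer"))           # (['s7'], 1000)
print(point_to_point(servers[7], "printer"))   # (['s7'], 1)
```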

File Systems

• The file system provides an excellent example of a service affected by scale: it is heavily used, and it requires the transfer of large amounts of data.

• Distribution: files are spread across many servers, and each server processes requests only for the files that it stores.

• Replication: files are assigned to multiple servers, and clients contact a subset of those servers when making requests.
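The slides do not say how files are assigned to servers. One simple assumption is to hash each path to pick the servers that hold it. With `replicas=1`, the sketch spreads files so that each server stores a disjoint share. With `replicas>1`, each file is also copied to the next servers in the list. The server names and path are hypothetical.

```python
import hashlib


def placement(path: str, servers: list[str], replicas: int = 1) -> list[str]:
    """Choose which servers hold a file by hashing its path.

    replicas=1 gives pure distribution: each server owns a disjoint share.
    replicas>1 also replicates the file on the next servers in the list.
    """
    if not 1 <= replicas <= len(servers):
        raise ValueError("replicas must be between 1 and len(servers)")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(replicas)]


servers = ["fs0", "fs1", "fs2", "fs3"]
owner = placement("/home/alice/thesis.tex", servers)
holders = placement("/home/alice/thesis.tex", servers, replicas=3)
print(owner, holders)   # the owner is always the first of the holders
```

Simple modulo hashing moves most files when a server is added or removed. Consistent hashing is a common alternative when the server list changes often.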

Solutions for Scalability

• Shed load, but not too much.

• Avoid global broadcast.

• Support multiple access mechanisms.

• Keep the user in mind.

• Build and evaluate scaling techniques:
– Replication
– Distribution
– Caching

Scaling Techniques

• Replication
– Replicate important resources.
– Distribute the replicas.
– Use loose consistency.

• Distribution
– Distribution can be used as a solution to size scalability.
– Distribution means taking a component, splitting it into parts, and spreading those parts among the nodes of the system (e.g., DNS).
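Loose consistency means a replica may answer from stale data until updates reach it. The sketch below is one possible scheme, swapped in for illustration. Each replica accepts writes locally, and periodic anti-entropy passes copy newer entries between replicas under a last-writer-wins rule. The timestamps stand in for a hypothetical logical clock supplied by the caller. Ties are broken by replica name.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    name: str
    # key -> (timestamp, writer name, value); (timestamp, writer) orders writes
    store: dict[str, tuple[int, str, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        """Accept a write locally, without coordinating with other replicas."""
        self.store[key] = (ts, self.name, value)

    def read(self, key: str) -> Optional[str]:
        """Answer from local state, which may be stale."""
        entry = self.store.get(key)
        return entry[2] if entry else None

    def sync_from(self, other: "Replica") -> None:
        """Anti-entropy pass: adopt any entry from `other` that is newer."""
        for key, entry in other.store.items():
            mine = self.store.get(key)
            if mine is None or entry[:2] > mine[:2]:   # last writer wins
                self.store[key] = entry


a, b = Replica("a"), Replica("b")
a.write("motd", "hello", ts=1)
print(b.read("motd"))   # None: b has not heard about the write yet
b.sync_from(a)
print(b.read("motd"))   # hello
```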


• Caching
– Cache frequently accessed data.
– Consider access patterns when caching.
– Use cache timeouts.
– Cache at multiple levels.
– Look locally first.
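Several of the caching guidelines (timeouts, multiple levels, looking locally first) combine naturally. Below is a minimal sketch, assuming a hypothetical `origin` function that stands in for the remote server. An injectable clock makes expiry testable. The levels are ordered from nearest (e.g. per-client) to farthest (e.g. per-site).

```python
import time
from typing import Callable, Optional


class TTLCache:
    """One cache level whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries: dict[str, tuple[float, str]] = {}   # key -> (expiry, value)

    def get(self, key: str) -> Optional[str]:
        hit = self.entries.get(key)
        if hit is None:
            return None
        expiry, value = hit
        if self.clock() >= expiry:          # cache timeout: treat as a miss
            del self.entries[key]
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self.entries[key] = (self.clock() + self.ttl, value)


def lookup(key: str, levels: list[TTLCache], origin: Callable[[str], str]) -> str:
    """Try each level from nearest to farthest, fall back to the origin,
    then fill every level that missed."""
    missed = []
    for cache in levels:
        value = cache.get(key)
        if value is not None:
            break
        missed.append(cache)
    else:
        value = origin(key)
    for cache in missed:
        cache.put(key, value)
    return value


now = [0.0]
local = TTLCache(ttl=5, clock=lambda: now[0])
site = TTLCache(ttl=60, clock=lambda: now[0])
calls = []


def origin(key: str) -> str:
    calls.append(key)          # count trips to the (hypothetical) origin server
    return "value-of-" + key


lookup("motd", [local, site], origin)   # misses everywhere: goes to the origin
now[0] = 10                             # local entry has timed out, site entry has not
lookup("motd", [local, site], origin)   # served by the site cache
print(len(calls))                       # 1
```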

Conclusion

• A distributed system must be designed such that it is scalable.

• We need to face all the problems discussed so far and come up with solutions for them in order to build a successful, long-lasting distributed system.

• The problems discussed here are only a drop in an ocean of problems.
