Distributed Object Systems Design Issues
Perry Hoekstra, Consultant, Talent Software Services, Inc.
TRANSCRIPT
The Dream Assignment
You are the lead technical architect on a new application development project for an online trading system. High-level requirements:
– Integrate with existing legacy systems
– Have 24x7 availability
– Provide Internet access
– Be flexible to change
– Be scalable
– Initial version done in six months
Homework
Colleagues, vendors, and the Gartner Group tout how object technology and distributed systems are solving many of today's IT challenges.
As a diligent architect, you have read numerous white papers and attended many presentations on how object technology and distributed systems will provide IT development projects with the following:
Object Technology provides:
a faster development time
more robust applications
higher level of quality
higher flexibility to meet changing user and business demands
greater scalability
reuse of objects
create world peace
What lurks in the shadows?
You want to know:
– what issues will come up
– what roadblocks will be put in your way
– what design and architectural considerations must be addressed if using this new model
One Ring to rule them all, One Ring to find them, One Ring to bring them all, and in the darkness bind them,
In the Land of Mordor, where the Shadows lie.
The Essence of Distributed Object Systems
The presenter makes a bold statement: the advent of distributed object computing (more specifically, cross-platform middleware such as Enterprise JavaBeans, cross-language middleware such as CORBA, and distributed architectures such as DCOM) has made it dramatically easier for developers to write business objects in different languages, running on different machines with different architectures and operating systems, all communicating with each other.
Ground Rules for Presentation
This isn’t your father’s client-server system. The main advantages to be gained from distributed systems are:
– resource sharing across networks
– tolerance of failures by replicating devices and data
– speedup in computational performance
But realizing these advantages involves addressing many design issues
The following is in no particular order of importance. You may think many of these are common sense, but you would be surprised.
Design
Developing a distributed system takes the same steps as developing a single-address-space system:
understanding the domain requirements
developing or selecting the architecture
representing and communicating the architecture
analyzing or evaluating the architecture
implementing the system based on the architecture
ensuring that the implementation conforms to the architecture
Networks
– Basic communications
– Messaging (among components, systems, etc.)
– Directory services
– Application services
– Event management
– Transaction processing
– Load management
– Security management
Most are covered in more detail in the following slides.
Examples of Network Issues
Established tools make fundamental design assumptions about the configuration, homogeneity, and stability of the network environment, assumptions that fail in dynamic client-network environments
How to provide for unreliable networks and services (something primarily synchronous systems do not handle)
How to deal with firewalls blocking traffic (like IIOP)
Variable bandwidths and reliability when talking about global networks
Network Profiling
Profile users and applications to determine their needs.
Baseline the existing environment, OR simulate a new network.
Bandwidth
Bandwidth is a measure of the carrying capacity of a communications link
It determines the maximum amount of information that can be transmitted over that communications link
Once you have maxed out bandwidth, it’s all about latency
The performance of distributed systems is determined by the speed and latency of end-to-end communication (sending process to receiving process) and not just the network performance
Distributed Communication
Just say NO! It may seem strange that distributed communication is a bad thing in a distributed system, but the biggest bottleneck in any distributed system is the network.
Minimize network roundtrips per transaction (roundtrip = client send-and-receive cycle).
One of the first architectural questions is whether to use Pass by Value or Pass by Reference semantics.
Pass by Value Semantics
Remote Method Invocation (RMI) popularized the concept of Pass by Value
Client applications cache business objects in the form of local objects without having to keep an active remote reference to a server
This means that clients can actually lose connectivity completely in between method invocations and still function correctly.
Minimize the number of remote method invocations that a system needs to make to accomplish a task
Advantages
Once the objects are located on the local client, the server is effectively stateless
This allows for the failure of that server and its replacement without affecting the functionality of the client application.
Performance for the client application is enhanced since interactions with the objects are local
Scalability is enhanced since network usage and server bottlenecks are reduced
Disadvantages
There may be several local objects referring to the same state or piece of data.
Once that data is updated, the other objects holding state are now out of synch with the actual object state
This requires the use of some sort of concurrent usage facility
Is it a proper use of bandwidth to move an entire object in order to update a single text field?
Encourages carelessness; you need to take more direct control of what flows back and forth over the network (that seduction thing)
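The stale-copy problem above can be shown in a few lines of plain Java. This is a sketch with invented names (Quote, QuoteServer), not RMI itself: the "server" marshals a detached copy to the client, so client-side edits never reach the authoritative state.

```java
import java.io.Serializable;

// Sketch of pass-by-value semantics: the "server" hands the client a
// detached copy of the business object. All names here are illustrative.
public class PassByValueDemo {

    // A value object the server serializes out to clients.
    public static class Quote implements Serializable {
        public final String symbol;
        public double price;
        public Quote(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
        Quote copy() { return new Quote(symbol, price); }
    }

    // The server keeps the authoritative state.
    public static class QuoteServer {
        private final Quote master = new Quote("ACME", 100.0);
        // Simulates a remote call that marshals a copy to the client.
        public Quote fetchQuote() { return master.copy(); }
        public double masterPrice() { return master.price; }
    }

    // Returns true if a client-side edit left the server untouched:
    // the out-of-synch problem described above.
    public static boolean clientEditIsLocal() {
        QuoteServer server = new QuoteServer();
        Quote local = server.fetchQuote();
        local.price = 105.0;            // client mutates its own copy
        return server.masterPrice() == 100.0;
    }

    public static void main(String[] args) {
        System.out.println("server unchanged: " + clientEditIsLocal());
    }
}
```

The same copy that makes the client fast and the server stateless is exactly what drifts out of synch once the master data changes.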
Pass by Reference Semantics
This underlies CORBA.
The client holds a reference to a local proxy, which acts as a forwarding agent to the remote server.
All state is held on the server.
Method calls are invoked on the local proxy, which marshals each call and sends it to the remote server, where it is processed and the result sent back.
Advantages
Very successful when CORBA objects are used in a coarse-grained “service” orientation
These services act as a façade to a fine-grained object model running on the server.
Not usually moving large objects across the network, rather shallow copies of object state using IDL structs and sequences
Allows for usage coordination of state on the server
Disadvantages
Network interrupts can cause loss of data.
Can lead to “chatty” conversations with the remote server that can saturate the network.
Network latency causes client performance problems.
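The "chatty conversation" risk above can be made concrete with a sketch of pass-by-reference semantics in plain Java (invented names, not the CORBA API): all state lives on the server, the client holds a proxy that forwards every call, and a roundtrip counter shows how fine-grained calls multiply network traffic.

```java
// Sketch of pass-by-reference semantics: the client-side proxy forwards
// every method call to the server-side object, which holds all state.
public class PassByReferenceDemo {

    public interface Account {
        void deposit(double amount);
        double balance();
    }

    // Server-side object holding the authoritative state.
    public static class AccountImpl implements Account {
        private double balance;
        public void deposit(double amount) { balance += amount; }
        public double balance() { return balance; }
    }

    // Client-side proxy: marshals each call across the "network".
    public static class AccountProxy implements Account {
        private final AccountImpl remote;
        public int roundtrips;
        public AccountProxy(AccountImpl remote) { this.remote = remote; }
        public void deposit(double amount) { roundtrips++; remote.deposit(amount); }
        public double balance() { roundtrips++; return remote.balance(); }
    }

    // Ten fine-grained calls cost ten roundtrips.
    public static int roundtripsForTenDeposits() {
        AccountProxy proxy = new AccountProxy(new AccountImpl());
        for (int i = 0; i < 10; i++) proxy.deposit(1.0);
        return proxy.roundtrips;
    }

    public static void main(String[] args) {
        System.out.println("roundtrips: " + roundtripsForTenDeposits());
    }
}
```

A coarse-grained "depositBatch" style operation would collapse those ten roundtrips into one, which is why the service-façade orientation above works well.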
Load Balancing
Distribute client requests over a group of server objects that typically live in different server processes and run on different hosts in order to prevent a single server from having to process all the requests
This strategy requires either:
– cache synchronization (covered later) across all servers responsible for client requests
– a load director that hands out server references based on some load algorithm
Load balancing and fault tolerance are usually mentioned in the same sentence
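A minimal sketch of the load-director idea above, using the simplest possible algorithm, round-robin rotation over a list of server references. A production director would weight choices by measured load; the class name here is illustrative.

```java
import java.util.List;

// Sketch of a load director: hands out server references round-robin so
// no single server processes all client requests.
public class LoadDirector {
    private final List<String> servers;
    private int next;

    public LoadDirector(List<String> servers) { this.servers = servers; }

    // Returns the next server reference in rotation; synchronized so
    // concurrent clients do not corrupt the rotation index.
    public synchronized String nextServer() {
        String server = servers.get(next);
        next = (next + 1) % servers.size();
        return server;
    }

    public static void main(String[] args) {
        LoadDirector director = new LoadDirector(List.of("hostA", "hostB", "hostC"));
        for (int i = 0; i < 4; i++) System.out.println(director.nextServer());
    }
}
```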
Fault Tolerance
Fault tolerance is a measure of how well an application can deal with exceptional situations
Efforts to improve an application's reliability or fault tolerance usually center on gracefully handling unexpected code faults and unexpected loss of connectivity to some application or external resource
Because of the number of concurrent users impacted, greater emphasis is usually placed on making the server portions of an application very fault tolerant.
Horizontal Partitioning
Horizontal partitioning: whereby objects that represent a service exist on one and only one server
It is up to the client to discover/locate the particular server that the objects exist on
Because an object exists on only one server, this eliminates the need for caching.
Drawbacks
Introduces a single point of failure and a possible performance bottleneck which load balancing and fault tolerance solve
With horizontally partitioned servers, it is difficult to choose the appropriate partitions, because actual load may not be fairly distributed across chosen partitions
Partitioned servers are generally unable to respond dynamically to changing load conditions.
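One simple way a client can discover an object's home server under horizontal partitioning is to hash the object key, so every client computes the same answer without a lookup service. This sketch (invented names) also makes the drawback above visible: the partitioning is fixed by the hash and cannot shift in response to actual load.

```java
// Sketch of client-side discovery for horizontally partitioned servers:
// each object key deterministically maps to exactly one server.
public class PartitionLocator {
    private final String[] servers;

    public PartitionLocator(String... servers) { this.servers = servers; }

    // Deterministic: every client computes the same home server for a key.
    // floorMod keeps the index non-negative even for negative hash codes.
    public String serverFor(String objectKey) {
        return servers[Math.floorMod(objectKey.hashCode(), servers.length)];
    }

    public static void main(String[] args) {
        PartitionLocator locator = new PartitionLocator("hostA", "hostB", "hostC");
        System.out.println("ACME lives on: " + locator.serverFor("ACME"));
    }
}
```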
Caching
Caching the results of remote operations can avoid network overhead associated with calling the same remote methods more than once
Many applications can perform remote operations upon startup rather than during normal usage
Users are often more willing to wait at startup time rather than during application usage
This is part of standard human interface design guidelines.
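The caching and startup-preload ideas above can be sketched in a few lines. This is illustrative only (the remote call is modeled as a plain function): the first lookup for a key pays the "network" cost, repeats are served locally, and preload() models doing the expensive calls at startup when users will tolerate a wait.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of caching remote results to avoid repeated network roundtrips.
public class RemoteResultCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> remoteCall;
    public int remoteCalls;   // counts simulated network roundtrips

    public RemoteResultCache(Function<String, String> remoteCall) {
        this.remoteCall = remoteCall;
    }

    // Warm the cache for known-hot keys at application startup.
    public void preload(String... keys) {
        for (String key : keys) lookup(key);
    }

    // Hits the "remote" service only on a cache miss.
    public String lookup(String key) {
        return cache.computeIfAbsent(key, k -> {
            remoteCalls++;
            return remoteCall.apply(k);
        });
    }

    public static void main(String[] args) {
        RemoteResultCache cache = new RemoteResultCache(symbol -> "quote:" + symbol);
        cache.preload("ACME");          // remote call happens here, at startup
        cache.lookup("ACME");           // served locally, no roundtrip
        System.out.println("remote calls: " + cache.remoteCalls);
    }
}
```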
Cache Synchronization
Also called vertical partitioning, whereby the same data is duplicated across multiple servers
This strategy requires cache synchronization across all servers responsible for client requests because any server needs to be able to process any client request
This ability allows for the support of load balancing and fail-over by keeping object state of multiple servers consistent
Extremely complex to implement and coordinate, usually utilizes some type of reliable messaging as a transport
Synchronous Coordination
There are two ways to coordinate clusters of cached servers, synchronous and asynchronous.
With synchronous coordination, the base update and all updates to the other cache servers must occur in a single unit of work
All items must be locked for write and everything must be successful or everything fails
This is a typical two-phase commit type of problem
These units of work can impose high overhead and cause long open transactions; this approach is generally not favored.
Asynchronous Coordination
Asynchronous coordination usually involves some type of underlying messaging or event system
The local change is moved to a queue and is propagated to the other servers without coordination between the local server and the remote servers
This is a more ideal solution because the complexity of coordinating multiple applications is considerably reduced
Asynchronous Coordination
However, there is a window of inconsistency between when the local update occurs and when the change is propagated to the other servers.
There is the possibility of one server's change overwriting another server's change: two servers could update the data simultaneously and propagate those changes out to the other servers.
All database access (other than read) must now go through the application server
Any database changes not funneled through the application server would invalidate the cache
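The asynchronous coordination described above, including its window of inconsistency, can be sketched with an in-process queue standing in for the messaging layer (a real system would use reliable messaging such as a JMS queue; all names here are invented). The local update is applied immediately and queued; the peer server stays stale until the queue is drained.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch of asynchronous cache coordination between two servers.
public class AsyncCacheSync {
    public final Map<String, String> local = new HashMap<>();
    public final Map<String, String> peer = new HashMap<>();
    private final Queue<String[]> outbound = new ArrayDeque<>();

    // Apply locally and queue for the peer; do not wait for delivery.
    public void update(String key, String value) {
        local.put(key, value);
        outbound.add(new String[] {key, value});
    }

    // Later, the "messaging layer" delivers queued changes to the peer.
    public void drain() {
        String[] change;
        while ((change = outbound.poll()) != null) peer.put(change[0], change[1]);
    }

    public static void main(String[] args) {
        AsyncCacheSync sync = new AsyncCacheSync();
        sync.update("ACME", "101.5");
        // Window of inconsistency: local is current, peer is still stale.
        System.out.println("peer stale: " + !sync.peer.containsKey("ACME"));
        sync.drain();
        System.out.println("peer caught up: " + "101.5".equals(sync.peer.get("ACME")));
    }
}
```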
Database Caching
Relies on the underlying database to cache information
Most modern databases have the concept of a buffer cache, which holds the most recently used blocks of database data.
The buffer cache contains modified as well as unmodified blocks. Because the most recently (and often the most frequently) used data is kept in memory, less disk I/O is necessary and performance is improved
Database caching also allows applications and users outside of the application server to manipulate and change the underlying data
Database Caching Disadvantages
However, the cached data is now physically separate (implies the application server and database are on two different machines) and communication is over a network connection
This produces a possible bottleneck in I/O between the business object and the database
The database cache is optimized not for access by objects but for its own table/row format.
Multi-Tier Systems
Distributing systems among multiple servers is not a panacea.
A study by Yuval Lirov on mission-critical systems management in the financial industry has some amazing numbers on how the cost of systems management and support grows exponentially with the number of nodes added to the network.
We have done our best in controlling availability and performance, but systems management is a cost factor that is becoming very important in scalability (even a bottleneck).
Transactions
Transactions are precious resources; if you must use one, use it for only a short time.
Be careful when using them; don’t use transactions for reads.
Profile: how many database touches occur in one transaction?
Distributed transactions are even worse because they touch resources on multiple servers.
New sets of tools seduce developers into not thinking about transaction processing.
You need to take more direct control of what flows back and forth over the network.
Alternate Technologies
Don’t be blinded by a certain technology
– Can the processing needs be met by a simpler technology?
– If you look at case studies of systems like Amazon and Charles Schwab, they do not use complex technologies
If you have a hammer, don’t view everything as a nail
Simplicity
Simple wins, complex loses. I remember avidly reading the literature about the complex experimental hypertext systems of the late '80s and early '90s, but they lost and HTML won. Now, maybe HTML was a little too far on the stupid side, but not so far that it couldn't mop the floor with the competition
-- David Megginson on the xml-dev mailing list
Reduce the number of moving parts; best-of-breed is not a solution
Does your application really need an EJB/CORBA server architecture, or can it get by with simple servlets and JDBC?
Get to know your design patterns
Patterns give the developer some well-described solutions to common problems, so the wheel is not reinvented
A component may use and encapsulate a variety of external resources (e.g. databases, messaging systems, legacy systems)
These need to be walled off through the use of the Façade pattern
A popular strategy is the wrapping of persistence objects by service objects, business objects, workflow objects, etc.
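The Façade strategy above can be sketched briefly. The legacy systems here (LegacyInventory, LegacyBilling) are invented stand-ins: callers see one coarse business operation instead of the multi-step legacy conversation.

```java
// Sketch of the Façade pattern walling off external resources: one
// business-level call hides two legacy subsystems. Names are illustrative.
public class OrderFacade {
    // Stand-in for a legacy inventory system with a low-level interface.
    static class LegacyInventory {
        boolean reserve(String sku, int qty) { return qty <= 10; }
    }
    // Stand-in for a legacy billing system.
    static class LegacyBilling {
        String invoice(String sku, int qty) { return "INV-" + sku + "-" + qty; }
    }

    private final LegacyInventory inventory = new LegacyInventory();
    private final LegacyBilling billing = new LegacyBilling();

    // One façade call hides the multi-step legacy conversation; returns
    // the invoice id, or null if the reservation fails.
    public String placeOrder(String sku, int qty) {
        if (!inventory.reserve(sku, qty)) return null;
        return billing.invoice(sku, qty);
    }

    public static void main(String[] args) {
        System.out.println(new OrderFacade().placeOrder("ACME-1", 3));
    }
}
```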
Legacy System Integration
Another strategy for the integration of legacy systems is through messaging
Messaging provides a loosely-coupled common communications mechanism whereby diverse applications can exchange information
Messaging is also a viable solution when developers need to integrate disparate applications (connectors)
These interdependent applications can run on the same system or different computers in a network
To manage these interdependencies, applications need a common mechanism for exchanging information.
Security
Security is not an afterthought
– Nothing can blow a project like trying to bolt-on a security scheme after development is completed
– The severity of the overrun maps to the level of security required
Decide on how much security is needed
– Is SSL or Kerberos really needed?
– Encryption: if 50% of the overhead is in the protocol, it translates into a 50% performance hit (more of a factor over slow links)
Monitoring
How are you going to monitor your running server applications?
– Dead processes
– Maxed-out CPU processing
– Load on the server
– Dynamic debugging of running production servers/objects
Granularity of Services/Objects
Objects too small
– Increased network communication among objects
– Higher overhead for the server (virtual memory management, etc.)
Objects too big
– More “server” logic that has to be duplicated into them to support selective updates
– Accessed/locked more often and locked for a longer period of time (limits scalability)
– Possibility of update collisions
– Lengthens transaction time (bad)
Watch getter/setter methods; try to move data in chunks
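The getter/setter warning above can be quantified with a sketch (invented names): fetching a record one field at a time costs one roundtrip per field, while a single coarse-grained call moves the same data in one chunk.

```java
// Sketch of granularity trade-offs: fine-grained field access vs. one
// coarse-grained fetch of the whole record.
public class GranularityDemo {
    public static class CustomerRecord {
        String name = "Ada"; String city = "London"; String phone = "555-0100";
    }

    static int roundtrips;

    // Fine-grained: one simulated "network" call per field.
    static String fetchField(CustomerRecord r, String field) {
        roundtrips++;
        switch (field) {
            case "name": return r.name;
            case "city": return r.city;
            default:     return r.phone;
        }
    }

    // Coarse-grained: one call returns the whole record as a chunk.
    static String[] fetchAll(CustomerRecord r) {
        roundtrips++;
        return new String[] {r.name, r.city, r.phone};
    }

    // Three fine-grained trips plus one coarse-grained trip = 4 total.
    public static int compare() {
        roundtrips = 0;
        CustomerRecord r = new CustomerRecord();
        fetchField(r, "name"); fetchField(r, "city"); fetchField(r, "phone");
        fetchAll(r);
        return roundtrips;
    }

    public static void main(String[] args) {
        System.out.println("total roundtrips: " + compare());
    }
}
```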
Reuse
Look at use within a component (nothing is free).
How many have used third-party AWT/Swing components that were performance dogs?
May not meet database read needs or transaction needs.
May have high remote messaging overhead.
– Normal usage of these components typically utilizes objects to move information into and out of the component, and access to the components is through distributed network calls.
If you think simple reuse is tough, factor in a distributed system environment.
Databases
Come to love your Database Administrator
– Indexes
– Query plans
– Replicated tables
Optimistic/pessimistic locking
– Profile your database access; pessimistic locking can tie up valuable resources
What about read locking?
Can you do batch writes? They are more efficient if you can get away with it.
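The optimistic alternative mentioned above can be sketched with a version number (an illustrative in-memory model, not any particular database's implementation): a write succeeds only if the version the writer originally read is still current, so no lock is held between read and write.

```java
// Sketch of optimistic locking: each row carries a version number; a
// stale write (based on an old version) is rejected as a collision.
public class OptimisticLockDemo {
    public static class Row {
        public int version = 1;
        public String value = "initial";
    }

    // Returns true iff the update was applied; false means the caller
    // read a stale version and must re-read and retry.
    public static boolean update(Row row, int versionRead, String newValue) {
        if (row.version != versionRead) return false; // someone else wrote first
        row.value = newValue;
        row.version++;
        return true;
    }

    public static void main(String[] args) {
        Row row = new Row();
        int v = row.version;                               // two clients read version 1
        System.out.println(update(row, v, "first writer"));  // applied
        System.out.println(update(row, v, "second writer")); // rejected: stale version
    }
}
```

No resources are tied up while the user thinks, which is the scalability win over pessimistic locking; the cost is that the losing writer must retry.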
Directory Services
Naming is intimately related to network (access and location) transparency and migration transparency
There are separate concerns to be addressed: designing the name space; meeting administrative requirements to partition the name space; and creating an implementation that scales
Caching and replication are key to name service implementation
Random Thoughts
Object/Relational Mapping Tools
Make sure development/model office environments mimic production
Use version control
Have a testing tool/framework for regression testing
Invest in a load/stress testing tool
Invest in a code profiler
Use asynchronous message passing where there is network uncertainty
This was just a very small list of issues