Distributed Object Systems Design Issues
Perry Hoekstra, Consultant, Talent Software Services, Inc.
TRANSCRIPT
The Dream Assignment
You are the lead technical architect on a new application development project for an online trading system. High-level requirements:
– Integrate with existing legacy systems
– Have 24x7 availability
– Provide Internet access
– Be flexible to change
– Be scalable
– Initial version done in six months
Homework
Colleagues, vendors, and the Gartner Group tout how object technology and distributed systems are solving many of today's IT challenges.
As a diligent architect, you have read numerous white papers and attended many presentations on how object technology and distributed systems will provide IT development projects with the following:
Object Technology provides:
a faster development time
more robust applications
higher level of quality
higher flexibility to meet changing user and business demands
greater scalability
reuse of objects
create world peace
What lurks in the shadows?
You want to know:
– what issues will come up
– what roadblocks will be put in your way
– what design and architectural considerations must be addressed if using this new model
One Ring to rule them all, One Ring to find them, One Ring to bring them all, and in the darkness bind them,
In the Land of Mordor, where the Shadows lie.
The Essence of Distributed Object Systems
The presenter makes a bold statement: the advent of distributed object computing (more specifically, cross-platform middleware such as Enterprise JavaBeans, cross-language middleware such as CORBA, and distributed architectures such as DCOM) has made it dramatically easier for developers to write business objects in different languages, running on different machines with different architectures and operating systems, all communicating with each other.
Ground Rules for Presentation
This isn’t your father’s client-server system. The main advantages to be gained from distributed systems are:
– resource sharing across networks
– tolerance of failures by replicating devices and data
– speedup in computational performance
But realizing these advantages involves addressing many design issues
The following is in no particular order of importance. You may think many of these are common sense, but you would be surprised.
Design
Developing a distributed system takes the same steps as developing a single-address-space system:
understanding the domain requirements
developing or selecting the architecture
representing and communicating the architecture
analyzing or evaluating the architecture
implementing the system based on the architecture
ensuring that the implementation conforms to the architecture
Networks
– Basic communications
– Messaging (among components, systems, etc.)
– Directory services
– Application services
– Event management
– Transaction processing
– Load management
– Security management
Most are covered in more detail in the following slides.
Examples of Network Issues
Established tools make fundamental design assumptions about the configuration, homogeneity, and stability of the network environment, assumptions that fail in dynamic client-network environments
How to provide for unreliable networks and services (something primarily synchronous systems do not handle)
How to deal with firewalls blocking traffic (like IIOP)
Variable bandwidths and reliability when talking about global networks
Network Profiling
Profile users and applications to determine their needs.
Baseline the existing environment, OR simulate a new network.
Bandwidth
Bandwidth is a measure of the carrying capacity of a communications link
It determines the maximum amount of information that can be transmitted over that communications link
Once you have maxed out bandwidth, it’s all about latency
The performance of distributed systems is determined by the speed and latency of end-to-end communication (sending process to receiving process) and not just the network performance
Distributed Communication
Just say NO! It may seem strange that distributed communication is a bad thing in a distributed system, but the biggest bottleneck in any distributed system is the network.
Minimize network roundtrips per transaction (roundtrip = client send-and-receive cycle).
One of the first architectural questions is whether to use Pass by Value or Pass by Reference semantics.
Pass by Value Semantics
Remote Method Invocation (RMI) popularized the concept of Pass by Value
Client applications cache business objects in the form of local objects without having to keep an active remote reference to a server
This means that clients can actually lose connectivity completely in between method invocations and still function correctly.
Minimize the number of remote method invocations that a system needs to make to accomplish a task
Advantages
Once the objects are located on the local client, the server is effectively stateless
This allows for the failure of that server and its replacement without affecting the functionality of the client application.
Performance for the client application is enhanced since interactions with the objects are local
Scalability is enhanced since network usage and server bottlenecks are reduced
Disadvantages
There may be several local objects referring to the same state or piece of data.
Once that data is updated, the other objects holding state are now out of synch with the actual object state
This requires the use of some sort of concurrent usage facility
Is it a proper use of bandwidth to move an entire object in order to update a single text field?
Encourages carelessness; you need to take more direct control of what flows back and forth over the network (that seduction thing)
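The stale-copy problem above can be shown in a few lines of plain Java. This is a sketch with invented names (Quote, QuoteServer), not RMI itself: the "server" marshals a detached copy to the client, so client-side edits never reach the authoritative state.

```java
import java.io.Serializable;

// Sketch of pass-by-value semantics: the "server" hands the client a
// detached copy of the business object. All names here are illustrative.
public class PassByValueDemo {

    // A value object the server serializes out to clients.
    public static class Quote implements Serializable {
        public final String symbol;
        public double price;
        public Quote(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }
        Quote copy() { return new Quote(symbol, price); }
    }

    // The server keeps the authoritative state.
    public static class QuoteServer {
        private final Quote master = new Quote("ACME", 100.0);
        // Simulates a remote call that marshals a copy to the client.
        public Quote fetchQuote() { return master.copy(); }
        public double masterPrice() { return master.price; }
    }

    // Returns true if a client-side edit left the server untouched:
    // the out-of-synch problem described above.
    public static boolean clientEditIsLocal() {
        QuoteServer server = new QuoteServer();
        Quote local = server.fetchQuote();
        local.price = 105.0;            // client mutates its own copy
        return server.masterPrice() == 100.0;
    }

    public static void main(String[] args) {
        System.out.println("server unchanged: " + clientEditIsLocal());
    }
}
```

The same copy that makes the client fast and the server stateless is exactly what drifts out of synch once the master data changes.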
Pass by Reference Semantics
This underlies CORBA.
The client holds a reference to a local proxy, which acts as a forwarding agent to the remote server.
All state is held on the server.
Method calls are invoked on the local proxy, which marshals each call and sends it to the remote server, where it is processed and the result sent back.
Advantages
Very successful when CORBA objects are used in a coarse-grained “service” orientation
These services act as a façade to a fine-grained object model running on the server.
Not usually moving large objects across the network, rather shallow copies of object state using IDL structs and sequences
Allows for usage coordination of state on the server
Disadvantages
Network interrupts can cause loss of data.
Can lead to “chatty” conversations with the remote server that can saturate the network.
Network latency causes client performance problems.
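The "chatty conversation" risk above can be made concrete with a sketch of pass-by-reference semantics in plain Java (invented names, not the CORBA API): all state lives on the server, the client holds a proxy that forwards every call, and a roundtrip counter shows how fine-grained calls multiply network traffic.

```java
// Sketch of pass-by-reference semantics: the client-side proxy forwards
// every method call to the server-side object, which holds all state.
public class PassByReferenceDemo {

    public interface Account {
        void deposit(double amount);
        double balance();
    }

    // Server-side object holding the authoritative state.
    public static class AccountImpl implements Account {
        private double balance;
        public void deposit(double amount) { balance += amount; }
        public double balance() { return balance; }
    }

    // Client-side proxy: marshals each call across the "network".
    public static class AccountProxy implements Account {
        private final AccountImpl remote;
        public int roundtrips;
        public AccountProxy(AccountImpl remote) { this.remote = remote; }
        public void deposit(double amount) { roundtrips++; remote.deposit(amount); }
        public double balance() { roundtrips++; return remote.balance(); }
    }

    // Ten fine-grained calls cost ten roundtrips.
    public static int roundtripsForTenDeposits() {
        AccountProxy proxy = new AccountProxy(new AccountImpl());
        for (int i = 0; i < 10; i++) proxy.deposit(1.0);
        return proxy.roundtrips;
    }

    public static void main(String[] args) {
        System.out.println("roundtrips: " + roundtripsForTenDeposits());
    }
}
```

A coarse-grained "depositBatch" style operation would collapse those ten roundtrips into one, which is why the service-façade orientation above works well.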
Load Balancing
Distribute client requests over a group of server objects that typically live in different server processes and run on different hosts in order to prevent a single server from having to process all the requests
This strategy requires either:
– cache synchronization (covered later) across all servers responsible for client requests
– a load director that hands out server references based on some load algorithm
Load balancing and fault tolerance are usually mentioned in the same sentence
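A minimal sketch of the load-director idea above, using the simplest possible algorithm, round-robin rotation over a list of server references. A production director would weight choices by measured load; the class name here is illustrative.

```java
import java.util.List;

// Sketch of a load director: hands out server references round-robin so
// no single server processes all client requests.
public class LoadDirector {
    private final List<String> servers;
    private int next;

    public LoadDirector(List<String> servers) { this.servers = servers; }

    // Returns the next server reference in rotation; synchronized so
    // concurrent clients do not corrupt the rotation index.
    public synchronized String nextServer() {
        String server = servers.get(next);
        next = (next + 1) % servers.size();
        return server;
    }

    public static void main(String[] args) {
        LoadDirector director = new LoadDirector(List.of("hostA", "hostB", "hostC"));
        for (int i = 0; i < 4; i++) System.out.println(director.nextServer());
    }
}
```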
Fault Tolerance
Fault tolerance is a measure of how well an application can deal with exceptional situations
Efforts to improve an application's reliability or fault tolerance usually center on gracefully handling unexpected code faults and unexpected loss of connectivity to some application or external resource
Because of the number of concurrent users impacted, greater emphasis is usually placed on making the server portions of an application very fault tolerant.
Horizontal Partitioning
Horizontal partitioning: whereby objects that represent a service exist on one and only one server
It is up to the client to discover/locate the particular server that the objects exist on
Because an object exists on only one server, this eliminates the need for caching.
Drawbacks
Introduces a single point of failure and a possible performance bottleneck which load balancing and fault tolerance solve
With horizontally partitioned servers, it is difficult to choose the appropriate partitions, because actual load may not be fairly distributed across chosen partitions
Partitioned servers are generally unable to respond dynamically to changing load conditions.
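One simple way a client can discover an object's home server under horizontal partitioning is to hash the object key, so every client computes the same answer without a lookup service. This sketch (invented names) also makes the drawback above visible: the partitioning is fixed by the hash and cannot shift in response to actual load.

```java
// Sketch of client-side discovery for horizontally partitioned servers:
// each object key deterministically maps to exactly one server.
public class PartitionLocator {
    private final String[] servers;

    public PartitionLocator(String... servers) { this.servers = servers; }

    // Deterministic: every client computes the same home server for a key.
    // floorMod keeps the index non-negative even for negative hash codes.
    public String serverFor(String objectKey) {
        return servers[Math.floorMod(objectKey.hashCode(), servers.length)];
    }

    public static void main(String[] args) {
        PartitionLocator locator = new PartitionLocator("hostA", "hostB", "hostC");
        System.out.println("ACME lives on: " + locator.serverFor("ACME"));
    }
}
```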
Caching
Caching the results of remote operations can avoid network overhead associated with calling the same remote methods more than once
Many applications can perform remote operations upon startup rather than during normal usage
Users are often more willing to wait at startup time rather than during application usage
This is part of standard human interface design guidelines.
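The caching and startup-preload ideas above can be sketched in a few lines. This is illustrative only (the remote call is modeled as a plain function): the first lookup for a key pays the "network" cost, repeats are served locally, and preload() models doing the expensive calls at startup when users will tolerate a wait.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of caching remote results to avoid repeated network roundtrips.
public class RemoteResultCache {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> remoteCall;
    public int remoteCalls;   // counts simulated network roundtrips

    public RemoteResultCache(Function<String, String> remoteCall) {
        this.remoteCall = remoteCall;
    }

    // Warm the cache for known-hot keys at application startup.
    public void preload(String... keys) {
        for (String key : keys) lookup(key);
    }

    // Hits the "remote" service only on a cache miss.
    public String lookup(String key) {
        return cache.computeIfAbsent(key, k -> {
            remoteCalls++;
            return remoteCall.apply(k);
        });
    }

    public static void main(String[] args) {
        RemoteResultCache cache = new RemoteResultCache(symbol -> "quote:" + symbol);
        cache.preload("ACME");          // remote call happens here, at startup
        cache.lookup("ACME");           // served locally, no roundtrip
        System.out.println("remote calls: " + cache.remoteCalls);
    }
}
```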
Cache Synchronization
Also called vertical partitioning, whereby the same data is duplicated across multiple servers
This strategy requires cache synchronization across all servers responsible for client requests because any server needs to be able to process any client request
This ability allows for the support of load balancing and fail-over by keeping object state of multiple servers consistent
Extremely complex to implement and coordinate, usually utilizes some type of reliable messaging as a transport
Synchronous Coordination
There are two ways to coordinate clusters of cached servers, synchronous and asynchronous.
With synchronous coordination, the base update and all updates to the other cache servers must occur in a single unit of work
All items must be locked for write and everything must be successful or everything fails
This is a typical two-phase commit type of problem
These units of work can impose high overhead and cause long open transactions; this approach is generally not favored.
Asynchronous Coordination
Asynchronous coordination usually involves some type of underlying messaging or event system
The local change is moved to a queue and is propagated to the other servers without coordination between the local server and the remote servers
This is a more ideal solution because the complexity of coordinating multiple applications is considerably reduced
Asynchronous Coordination
However, there is a window of inconsistency between when the local update occurs and when the change is propagated to the other servers.
There is the possibility of one server's change overwriting another server's change: two servers could update the data simultaneously and propagate those changes out to the other servers.
All database access (other than read) must now go through the application server
Any database changes not funneled through the application server would invalidate the cache
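The asynchronous coordination described above, including its window of inconsistency, can be sketched with an in-process queue standing in for the messaging layer (a real system would use reliable messaging such as a JMS queue; all names here are invented). The local update is applied immediately and queued; the peer server stays stale until the queue is drained.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Sketch of asynchronous cache coordination between two servers.
public class AsyncCacheSync {
    public final Map<String, String> local = new HashMap<>();
    public final Map<String, String> peer = new HashMap<>();
    private final Queue<String[]> outbound = new ArrayDeque<>();

    // Apply locally and queue for the peer; do not wait for delivery.
    public void update(String key, String value) {
        local.put(key, value);
        outbound.add(new String[] {key, value});
    }

    // Later, the "messaging layer" delivers queued changes to the peer.
    public void drain() {
        String[] change;
        while ((change = outbound.poll()) != null) peer.put(change[0], change[1]);
    }

    public static void main(String[] args) {
        AsyncCacheSync sync = new AsyncCacheSync();
        sync.update("ACME", "101.5");
        // Window of inconsistency: local is current, peer is still stale.
        System.out.println("peer stale: " + !sync.peer.containsKey("ACME"));
        sync.drain();
        System.out.println("peer caught up: " + "101.5".equals(sync.peer.get("ACME")));
    }
}
```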
Database Caching
Relies on the underlying database to cache information
Most modern databases have the concept of a buffer cache, which holds the most recently used blocks of database data.
The buffer cache contains modified as well as unmodified blocks. Because the most recently (and often the most frequently) used data is kept in memory, less disk I/O is necessary and performance is improved
Database caching also allows applications and users outside of the application server to manipulate and change the underlying data
Database Caching Disadvantages
However, the cached data is now physically separate (implies the application server and database are on two different machines) and communication is over a network connection
This produces a possible bottleneck in I/O between the business object and the database
The database cache is optimized not for access by objects but for its own table/row format.
Multi-Tier Systems
Distributing systems among multiple servers is not a panacea.
A study by Yuval Lirov on mission-critical systems management in the financial industry has some amazing numbers on how the cost of systems management and support grows exponentially with the number of nodes added to the network.
We have done our best in controlling availability and performance, but systems management is a cost factor that is becoming very important in scalability (even a bottleneck).
Transactions
Transactions are precious resources; if you must use one, use it for only a short time.
Be careful when using them; don’t use transactions for reads.
Profile: how many database touches occur in one transaction?
Distributed transactions are even worse because they touch resources on multiple servers.
New sets of tools seduce developers into not thinking about transaction processing.
You need to take more direct control of what flows back and forth over the network.
Alternate Technologies
Don’t be blinded by a certain technology
– Can the processing needs be met by a simpler technology?
– If you look at case studies of systems like Amazon and Charles Schwab, they do not use complex technologies
If you have a hammer, don’t view everything as a nail
Simplicity
Simple wins, complex loses. I remember avidly reading the literature about the complex experimental hypertext systems of the late '80s and early '90s, but they lost and HTML won. Now, maybe HTML was a little too far on the stupid side, but not so far that it couldn't mop the floor with the competition
-- David Megginson on the xml-dev mailing list
Reduce the number of moving parts; best-of-breed is not a solution
Does your application really need an EJB/CORBA server architecture, or can it get by with simple servlets and JDBC?
Get to know your design patterns
Patterns give the developer some well-described solutions to common problems, so the wheel is not reinvented
A component may use and encapsulate a variety of external resources (e.g. databases, messaging systems, legacy systems)
These need to be walled off through the use of the Façade pattern
A popular strategy is the wrapping of persistence objects by service objects, business objects, workflow objects, etc.
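The Façade strategy above can be sketched briefly. The legacy systems here (LegacyInventory, LegacyBilling) are invented stand-ins: callers see one coarse business operation instead of the multi-step legacy conversation.

```java
// Sketch of the Façade pattern walling off external resources: one
// business-level call hides two legacy subsystems. Names are illustrative.
public class OrderFacade {
    // Stand-in for a legacy inventory system with a low-level interface.
    static class LegacyInventory {
        boolean reserve(String sku, int qty) { return qty <= 10; }
    }
    // Stand-in for a legacy billing system.
    static class LegacyBilling {
        String invoice(String sku, int qty) { return "INV-" + sku + "-" + qty; }
    }

    private final LegacyInventory inventory = new LegacyInventory();
    private final LegacyBilling billing = new LegacyBilling();

    // One façade call hides the multi-step legacy conversation; returns
    // the invoice id, or null if the reservation fails.
    public String placeOrder(String sku, int qty) {
        if (!inventory.reserve(sku, qty)) return null;
        return billing.invoice(sku, qty);
    }

    public static void main(String[] args) {
        System.out.println(new OrderFacade().placeOrder("ACME-1", 3));
    }
}
```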
Legacy System Integration
Another strategy for the integration of legacy systems is through messaging
Messaging provides a loosely-coupled common communications mechanism whereby diverse applications can exchange information
Messaging is also a viable solution when developers need to integrate disparate applications (connectors)
These interdependent applications can run on the same system or different computers in a network
To manage these interdependencies, applications need a common mechanism for exchanging information.
Security
Security is not an afterthought
– Nothing can blow a project like trying to bolt-on a security scheme after development is completed
– The severity of the overrun maps to the level of security required
Decide on how much security is needed
– Is SSL or Kerberos really needed?
– Encryption: if 50% of the overhead is in the protocol, it translates into a 50% performance hit (more of a factor over slow links)
Monitoring
How are you going to monitor your running server applications?
– Dead processes
– Maxed-out CPU processing
– Load on the server
– Dynamic debugging of running production servers/objects
Granularity of Services/Objects
Objects too small
– Increased network communication among objects
– Higher overhead for the server (virtual memory management, etc.)
Objects too big
– More “server” logic that has to be duplicated into them to support selective updates
– Accessed/locked more often and locked for a longer period of time (limits scalability)
– Possibility of update collisions
– Lengthens transaction time (bad)
Watch getter/setter methods; try to move data in chunks
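The getter/setter warning above can be quantified with a sketch (invented names): fetching a record one field at a time costs one roundtrip per field, while a single coarse-grained call moves the same data in one chunk.

```java
// Sketch of granularity trade-offs: fine-grained field access vs. one
// coarse-grained fetch of the whole record.
public class GranularityDemo {
    public static class CustomerRecord {
        String name = "Ada"; String city = "London"; String phone = "555-0100";
    }

    static int roundtrips;

    // Fine-grained: one simulated "network" call per field.
    static String fetchField(CustomerRecord r, String field) {
        roundtrips++;
        switch (field) {
            case "name": return r.name;
            case "city": return r.city;
            default:     return r.phone;
        }
    }

    // Coarse-grained: one call returns the whole record as a chunk.
    static String[] fetchAll(CustomerRecord r) {
        roundtrips++;
        return new String[] {r.name, r.city, r.phone};
    }

    // Three fine-grained trips plus one coarse-grained trip = 4 total.
    public static int compare() {
        roundtrips = 0;
        CustomerRecord r = new CustomerRecord();
        fetchField(r, "name"); fetchField(r, "city"); fetchField(r, "phone");
        fetchAll(r);
        return roundtrips;
    }

    public static void main(String[] args) {
        System.out.println("total roundtrips: " + compare());
    }
}
```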
Reuse
Look at use within a component (nothing is free).
How many have used third-party AWT/Swing components that were performance dogs?
May not meet database read needs or transaction needs.
May have high remote messaging overhead.
– Normal usage of these components typically utilizes objects to move information into and out of the component, and access to the components is through distributed network calls.
If you think simple reuse is tough, factor in a distributed system environment.
Databases
Come to love your Database Administrator
– Indexes
– Query plans
– Replicated tables
Optimistic/pessimistic locking
– Profile your database access; pessimistic locking can tie up valuable resources
What about read locking?
Can you do batch writes? They are more efficient if you can get away with it.
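The optimistic alternative mentioned above can be sketched with a version number (an illustrative in-memory model, not any particular database's implementation): a write succeeds only if the version the writer originally read is still current, so no lock is held between read and write.

```java
// Sketch of optimistic locking: each row carries a version number; a
// stale write (based on an old version) is rejected as a collision.
public class OptimisticLockDemo {
    public static class Row {
        public int version = 1;
        public String value = "initial";
    }

    // Returns true iff the update was applied; false means the caller
    // read a stale version and must re-read and retry.
    public static boolean update(Row row, int versionRead, String newValue) {
        if (row.version != versionRead) return false; // someone else wrote first
        row.value = newValue;
        row.version++;
        return true;
    }

    public static void main(String[] args) {
        Row row = new Row();
        int v = row.version;                               // two clients read version 1
        System.out.println(update(row, v, "first writer"));  // applied
        System.out.println(update(row, v, "second writer")); // rejected: stale version
    }
}
```

No resources are tied up while the user thinks, which is the scalability win over pessimistic locking; the cost is that the losing writer must retry.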
Directory Services
Naming is intimately related to network (access and location) transparency and migration transparency
There are separate concerns to be addressed: designing the name space; meeting administrative requirements to partition the name space; and creating an implementation that scales
Caching and replication are key to name service implementation
Random Thoughts
Object/Relational Mapping Tools
Make sure development/model office environments mimic production
Use version control
Have a testing tool/framework for regression testing
Invest in a load/stress testing tool
Invest in a code profiler
Use asynchronous message passing where there is network uncertainty
This was just a very small list of issues