high performance corba...performs better than using any static ql value throughput (n =8, l = 150...

65
High Performance CORBA Shikharesh Majumdar Real Time and Distributed Systems Lab Dept. of Systems & Computer Engineering, Carleton University, Ottawa, CANADA. email: [email protected]

Upload: others

Post on 18-Jan-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

High Performance CORBA

Shikharesh Majumdar

Real Time and Distributed Systems LabDept. of Systems & Computer Engineering,

Carleton University,Ottawa, CANADA.

email: [email protected]

Page 2: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ACKNOWLEDGMENTS

COLLABORATORS:

» Istabrak Abdul-Fatah» Imran Ahmad» Siva Balasubramaniam» E-Kai Shen» Yiping Shi» E-Kai Shen» Wai Keung Wu

SPONSORS:

» Natural Sciences & Engineering Research Council of Canada (NSERC)

» Communications and Information Technology Ontario (CITO)

» Nortel Networks

» Department of National Defense (CANADA)

Page 3: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

OUTLINE

Part I: Introduction

Part II: Impact of Interaction Architectures

Part III: Performance Optimizations for Systems with Limited Heterogeneity

Part IV: Application Level Design Guide lines for Performance Enhancement

Part V: Real-Time Issues

Page 4: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PART I: INTRODUCTION

Heterogeneity in DOC Systems

Distributed Object Computing (DOC)

» Concurrency, Reliability» Reuse

Heterogeneity: inherent in DOC» System upgrade» Addition of new components

Middleware provides inter-operability in heterogeneous environments

Page 5: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

DOC and Middleware

Both clients and servers use a standard API Client binds to server object with the help of agent Client invokes method in server Server receives request, performs requested operation, and sends reply

MiddlewareMiddlewareOS XOS X OS YOS Y

ServerServerClientClient

AgentAgent(Name Server)(Name Server)

requestrequest

responseresponse

bind bind

(locate)(locate) registerregister

Page 6: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

COMMON OBJECT REQUEST BROKER ARCHITECTURE (CORBA)

A standard put forward by OMG - a consortium of a large no. of companies companies

Basic idea: to access services provided by a server object through a pre-definedinterface

IDL: Interface Definition Language Stubs & Skeletons: provide marshalling/unmarshalling of data CDR: Common Data Representation GIOP: A protocol for inter-ORB communication (IIOP is GIOP over TCP/IP)

» IIOP is GIOP over TCP/IP

Page 7: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

MOTIVATION FOR RESEARCH

Scalability is a desirable attribute for most systems Many real-time applications also require low latency and high bandwidth

» Telephone Switch» Embedded Systems

A drawback of most Commercial-Off-The-Shelf CORBA compliant ORB products:

» often sacrifices performance for providing inter-operability

Research Goal: Develop low overhead high performance middleware systems by

» Using innovative client-agent-server interaction architecture (Part II)» Exploiting limited heterogeneity in systems (Part III)» Using application level performance optimizations (Part IV)

Page 8: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PART II: INTERACTION ARCHITECTURES

Goals The H-ORB Architecture The F-ORB Architecture The P-ORB Architecture Performance Comparison of Interaction Architectures Summary of Observations The Adaptive ORB

Page 9: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

GOALS

Understand the performance impact of client-agent-server interaction architecture on performance

Compare different interaction architectures

Gain insight into scalability and latency of these architectures and understand their relationship with workload

Devise an interaction architecture that produces high performance under a variety of different workload conditions

Page 10: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

THE HANDLE DRIVEN ORB (H-ORB)

Fig. 4.2 from [AbdulFig. 4.2 from [Abdul--FatahFatah

Page 11: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

THE FORWARDING ORB (F-ORB)

Fig. 4.5 from [AbdulFig. 4.5 from [Abdul--Fatah]Fatah]

Page 12: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

THE PROCESS PLANNER ORB (PORB)

Fig. 4.8 from [AbdulFig. 4.8 from [Abdul--Fatah]Fatah]

Page 13: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

EXPERIMENTAL ENVIRONMENT

Use a commercial CORBA compliant ORB: ORBeline (Visigenic Systems) Conduct experiments on a network of workstations running Solaris 2.5 Synthetic application:

» Client: operates cyclically» Server:

– Two copies of each server Si (I=1 or 2)– When invoked a server instance executes a for loop to consume a pre-

determined amount of CPU time.

Page 14: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

DESIGN OF EXPERIMENTS

Primary Factor: N (no. of clients) » Used to control system load

Secondary Factors:» Message Size (L)» Server Demands (S1, S2)» Inter-node Delay (D)

Relative performance of the three architectures observed to depend on these factors.

Performance Measures:» Throughput (client requests/sec)» End-to-end response time» Utilization (Process, CPU)

Page 15: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

EXPERIMENTAL RESULTS:SOFTWARE BOTTLENECKS

Fig. 5.10 and 5.12 from [Abdul-Fatah]

The Agent saturates and limits performance (software bottleneck)

Synchronous communication:» agent waits until message is

received

F-ORB Utilizations

Page 16: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

EXPERIMENTAL RESULTS:PROCESS CLONING

Fig. 5.15 and 5.19 from [Abdul-Fatah]

Degree of Cloning = 4 Degree of Cloning = 8

Page 17: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

EXPERIMENTAL RESULTS :SHORT MESSAGES

Fig. 5.23 and 5.33

No Cloning Degree of Cloning = 4

Higher Throughput

Page 18: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

EXPERIMENTAL RESULTS :LONG INTER-NODE DLEAYS

Fig. 5.37 and 5.42

No Cloning Degree of cloning = 4

Higher Throughput

Page 19: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

SUMMARY OF OBSERVATIONS

Possible to change interaction architecture of a COTS ORB Each architecture has a “winning” attribute

» H-ORB: Light weight in construction» F-ORB: Uses 3 messages per interaction» P-ORB: Invokes servers concurrently

Latency-Scalability Tradeoff Agents can become software bottlenecks

» Solution: agent cloning P-ORB: Tradeoff between concurrency of execution and waiting Communication-Bound Systems: H-ORB is a better choice as long as message

transfer delays are not very high Computation -Bound Systems: F-ORB AND P-ORB seem to be preferable

Page 20: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ADAPTIVE ORB

Similar to Fig.3.3 from [Similar to Fig.3.3 from [ShenShen]]

Page 21: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ADAPTIVE ORB (Internals)

Fig. 3.6 from [Fig. 3.6 from [ShenShen]]

Page 22: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

EXPERIMENTAL ENVIRONMENTS AND EXPERIMENTS

Experimental Environment:» Network of Sun workstations running under Solaris» Middleware: Orbix

Primary Factors» Threshold QL :

if length of forwarding queue > QL then return server handle to clientelse forward message to server

» QL=24: Forwarding ORB; QL = 0; Handle Driven ORB» N (no. of clients) -- Used to control system load

Secondary Factors:

» Message Size (L)» Server Demands (S1, S2)» Inter-node Delay (D)

Page 23: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ADAPTIVE ORB: RESULTS(EFFECT OF MESSSAGE LENGTH)

Fig. 5.15 and 5.17 from [Shen]

L= 4800 Bytes, Adaptive ORB performs the best

L = 19200 Bytes Adaptive ORB produces the highest

performance

Page 24: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ADAPTIVE ORB: RESULTS(EFFECT OF INTER-NODE DELAYS)

Fig. 5.21 and Fig. 5.23 from [Shen]

D = 50 ms, Adaptive ORB performs the best

D = 250 ms Adaptive ORB produces the highest

performance

Page 25: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ADAPTIVE ORB: RESULTS(NON-EQUIDISTANT NODES)

Fig. 5.33 from [Shen]

Server Closer to Agent Adaptive performs the best

Similar to Fig. 5-35 [Shen]

Client Closer to Agent Adaptive ORB produces the highest

performance

Page 26: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ADAPTIVE ORB: NEW RESULTS(EFFECT OF MULTITHREADING)

Polling Interval = 1 ms

Fig. 5.41 from [Shen]

Increasing the degree of multi-threading increases throughput

Fig. 5.42 from [Shen]

Increasing the degree of multithreading further decreases throughput

Page 27: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ADAPTIVE ORB: NEWRESULTS(EFFECT OF MULTITHREADING)

Polling Interval = 100 ms

Throughput increases monotonically with the degree of multithreading

Fig. 5.43 and Fig. 5.44 from [Fig. 5.43 and Fig. 5.44 from [ShenShen]]

Page 28: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ADAPTIVE ORB: NEWRESULTS(DYNAMIC THERSHOLDS)

How to determine threshold QL?» Use Analytic/simulation model» tuning parameter

Based on principles of balancing the load between forwarding and returning queues

if Qf < Qr + 1 { insert request in forwarding queue }

else {insert request in returning queue } Dynamic threshold (load balancing)

performs better than using any static QL value

Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms)Threshold Throughput

0 4.49971 8.82982 8.92663 8.03284 7.18685 6.24726 5.85927 5.83998 5.846024 5.8502

Dynamic 8.9802

Dynamic Threshold produces best performance (preliminary results)

Page 29: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

ADAPTIVE ORB: OBSERVATIONS

Adaptive ORB combines the good attributes of H-ORB and F-ORB» Highest performance under many workload conditions» up to 100% improvement in performance observed for certain workload

parameters» Close to highest performer for other workloads» Multithreading can be used to enhance performance at higher loads

Determination of Threshold» Simulation and analytic models» Tuning parameter

Dynamic Threshold-based technique looks promising

Page 30: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PART III: SYSTEMS WITHLIMITED HETEROGENEITY

Certain systems may be characterized by limited heteorgeneity» embedded systems: most system components may be built using the same

programming language and platform.

“Fly over” the CORBA protocol when two similar components talk to each other

Goal: Investigate optimization techniques that exploit limited heterogeneity

MiddlewareMiddlewareOS XOS X OS YOS Y

Server 2Server 2ClientClient

OS XOS X

Server 1Server 1

Page 31: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

POSTMORTEM OF A TYPICAL CORBA METHOD INVOKATION

Fig. 4.1 from [Ahmad]Fig. 4.1 from [Ahmad]

Page 32: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

RESULTS OF POSTMORTEM:DISTRIBUTION OF WORK

Experimental Setup:» Client and server execute on a network of PC’s running under Linux» Middleware: ORBACUS» A simple server that inserts a value into a variable» marshalling time (M) » unmarshalling time (U)» communication time (C)» server execution time (S)

Proportion of End-to-End Response TimeCase 1 - Data type:Octet Case 2 - Data Type: Double

Page 33: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

OPTIMIZATION SRTATEGY I:PREVENTION OF “NATIVE -

NEUTRAL” DATA CONVERSIONS

Marshalling: Convert data in native format into data in a neutral format. Unmarshalling: Convert data in neutral format back into native format Data conversion burns CPU cycles.

Optimization: If client and server are NOT heterogeneous » DO NOT perform data conversions(achieved through modification of ORB source code)

MiddlewareMiddlewareOS XOS X OS YOS Y

Server 2Server 2ClientClient

OS XOS X

Server 1Server 1

Page 34: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

OPTIMIZATION STRATEGY I:RESULTS

Experimental Setup:» Network of PC’s running under Linux» Middleware: ORBACUS» A simple server that inserts a value into a variable

Sources of savings:» reduction in computation time (no data conversions) and data copyingFig. 5.2 from [Ahmad]

Page 35: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

OPTIMIZATION STRATEGY II:REMOVING PADDING IN CDR

CDR aligns primitive data types on byte boundaries that are natural for most machine architectures.

Remove padding before transmission of message and re-introduce padding after reception of message

» achieved through interceptor programming

struct CD {char c;double d;

}; c PADDING d0 1 8

PADDING

Page 36: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

OPTIMIZATION STRATEGY II:RESULTS

Padding Removal: communication-computation tradeoff» Savings: communication times» Overheads: increase in marshalling/unmarshalling times

Performance improvement Is possible for communication intensive applications

Experimental Set up: same as before Zone A: Loss in performance Zone B: Gain in performanceFig. 5.3(b) from [Ahmad]

Page 37: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

OPTIMIZATION STRATEGY II:RESULTS (CONTD.)

Significant benefit is possible for data intensive applications Improvement in performance is proportional to the percentage of padding bytes

in the messageNet savings: savings in communication time - increase in marshalling timeFig. 5.3 (a) from [Ahmad]

Page 38: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

FLYOVER PROTOTYPE

Figure 3.6 from [Wu]

•CORBA ORB - when client and server are dissimilar

•Flyover - with similar client and server

Page 39: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

IDLP Tool

Figure 3.5 from [Wu] Implemented in Perl Takes standard CORBA IDL file as input (transparency) Produces Client Stub and Serve Skeleton Choice of path at run time: default - flyover; client invokes a special method to

use CORBA

Page 40: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

References

[Abdul-Fatah] Istabrak Abdul-Fatah, Performance of CORBA-Based Client-Server Architectures, M.A.Sc. Thesis, Dept. of Systems and Computer Eng., 1997.

[Ahmad] Imran Ahmad, Performance Enhancement Techniques for CORBA-Based Systems with Limited Heterogeneity, M.A.Sc. Thesis, Dept. of Systems and Computer Eng., 2002.

[Shen] E-Kai Shen, Perfomance of Adaptive Middleware Systems, M.A.Sc. Thesis, Dept. of Systems and Computer Eng., 2000.

[Wu] Flyover, A Technique for Achieving High Performance CORBA-Based Systems, M.A.Sc. Thesis, Dept. of Systems and Computer Eng., 2001.

Page 41: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

CONNECTION SET UP

Location as well as state of servers and clients can affect the connection set up latency

Typical scenario:

1. Client wants to bind to server2. Middleware looks up implementation repository3. Middleware daemon activates server4. Handle for requested server is returned to client

client

server

daemon

Implementation repository

1

2

3

4

Page 42: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

CONNECTION SET UP (contd.)

Factors that can affect system performance:

» State of server– Active (A) / Not Active (NA)

» Collocation of client and server in same address space – Collocated (C) / Not Collocated (NC)

» Distribution of client and sever

– Same Node (SN) / Different Nodes (DN)

Various combinations of these lead to different cases

Page 43: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

CONNECTION SET UP-RESULTS

Different possible configurationsMode I Mode II Mode III

Conditions Case I Case II Case III Case IV Case VServer state NA NA A A ADistribution SN DN SN DN SNCollocation NC NC NC NC CLatency (ms) 171.8 174.4 54.6 55.2 0.9

0

50

100

150

200

Latency (ms)

Case I Case II Case III Case IV Case V

Page 44: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

CONNECTION SET UP -OBSERVATIONS

Server state has a large impact on connection set up latency» at least 300 % improvement observed by keeping the server active

– Keep at least the popular servers active

If client and server are on the same node collocation can improve performance significantly

» an order of magnitude performance improvement is observed– Collocate clients and servers when ever possible

If client and server can not be collocated placing them on the same node can improve performance only marginally

» performance improvement observed is of the order of 1%– migration of server from client node to a different node is unlikely to

degrade connection setup latency – can balance load more evenly by distributing servers on different nodes[Note: higher network delays (WAN) may lead to a different conclusion]

Page 45: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PRAMETER PASSING

Two issues» native - CDR conversion overhead» bandwidth savings

CDR alignment of primitive fixed-length types [OMG-98].

Starting Bytes Boundary IDL Data Types

Multiples of 1 char, octet, booleanMultiples of 2 short, unsigned shortMultiples of 4 long, unsigned long, float, enumerated typesMultiples of 8 long long, unsigned long long, double, long double Example of complex data structure:struct CD {

char c;double d;}

c Padding d

8 1 0

Padding bytes in a stream

Page 46: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PRAMETER PASSING (contd.)

Complex Data Structure - Case A

Option I (Most natural)» Method 1 with one parameter

struct CD [size] {char c;double d}

Option II» Method1 with two parameters

char [size] c;double [size] d;

Option III» Method1 with parameter char [size] c;» Method 2 with parameter double [size] d;

Page 47: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

CASE A - RESULTS

Array Size 1 100 1000 5000Option 1 (ms) 2,131 3,472 24,556 123,703Option 2 (ms) 2,014 2,314 8,356 41,489Option 3 (ms) 3,747 4,298 15,490 74,168

parameter passing has a large impact on performance (increases with size) separating data types leads to large performance savings (less padding bytes) multiple method invocations may be preferable to single method invocation with

complex parameter

0%

50%

100%

150%

200%

# 1 #100 #1000 #5000

Array Size

Rat

io

Case A.1Case A.2Case A.3

Page 48: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PARAMETER PASSING (contd.)

Complex (Nested) Data Structure - Case B Option I (Most natural): Method 1 with one parameter

struct CSB [size] {char [size] c;struct s { int x; int y; int z;}boolean b;

} Option II: Method1 with three parameters

char [size] c;struct s {int x; int y; int z;}boolean [size] b;

Option III: Method 1 with 5 parameters: char [size] c; x; y; z, bolean [size] b;

Option IV: Method1 with parameter char [size] c; Method 2 with parameter x;Method 3 with parameter y; Method 4 with parameter z; Method 5 with

parameter b.

Page 49: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

CASE B - RESULTS

Response Time of Case B and the Comparison of Four Different OptionsArray Size 1 100 1000 5000Case B.1 (ms) 370 384 419 690Case B.2 (ms) 362 375 502 560Case B.3 (ms) 360 363 388 511Case B.4 (ms) 976 980 1011 1321

Unwrapping parameters improves performance: largest performance improvement occurs when all primitive types are separated

Invoking multiple methods produces worst performance

0.0%

50.0%

100.0%

150.0%

200.0%

250.0%

300.0%

# 1 # 100 # 1000 # 5000

Array Size

Rat

io

Case B.1

Case B.2

Case B.3

Case B.4

Page 50: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PARAMETER PASSING -OBSERVATIONS

Parameter passing strategy is important: large performance benefits can accrue from adopting an appropriate parameter passing mechanism

Two types of overheads:» data conversion (CPU operations in marshalling/unmarshalling) » padding bytes (communication bandwidth + CPU operations in

marshalling/unmarshalling)

Suggested practice: Unravel the complex data structures» Separating native data types into multiple arguments improves performance» [impact of padding byte removal seems to be the dominant factor]» Savings in term of padding bytes seems to be the most dominant factor

When multiple methods are invoked successively » collapsing multiple methods into one may produce performance

improvement (A.2 and A.3 & B.3 and B.4).

Page 51: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

METHOD PLACEMENT AND DISPATCHINGLATENCY

Experiment with one object and 500 methods.

Measure response time for different method invocation

Results of Experimentnthmethod Latency(microsecond)

1st method 16,406100th method 16,742200th method 17,210300th method 17,537400th method 18,019500th method 18,352

10,00012,00014,00016,00018,00020,000

1st met ho d

100 t h m et ho d200 t h m et ho d300 t h m et ho d400 t h m et ho d500 t h m et h od

late

ncy

tim

e (m

s)

Method dispatching: locate object and then locate method in object

Seems to be a linear search (product dependent)

Page 52: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

METHOD PLACEMENT AND DISPATCHINGLATENCY

Single Interface: all methods in one object (Model 1)

Multiple Interfaces: each method in a separate object (Model II)

Latency Measurement

Method Model I Model II

1st method 16,445 17,05110th method 16,496 17,06520th method 16,542 17,09230th method 16,573 17,09640th method 16,639 17,11850th method 16,689 17,124

Single interface has a marginally better performance

» packing multiple methods into one interface can give rise to performance improvement

15,000

15,500

16,000

16,500

17,000

17,500

1s t method

10th method20th method30th method40th method50th method

late

ncy

time

Model I

Model II

Page 53: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PACKING Of METHODS & OBJECTS INTO SERVER

How many methods should be packed in an object?

How many objects should be placed in a server?

Mode I: » Pack all methods in a single

object (server)

Mode II:» Pack each method in a

separate object» Pack all objects in the same

server

::_bind(Obj1:Server)

::_bind(Obj2:Server)

::_bind(Objn:Server)

Client 2

Client n

Server Process

Obj

Client 1,

……

::_bind(Obj1:Server)

::_bind(Obj2:Server)

::_bind(Objn:Server)

Client 2

Client n

Server Process

Obj 1

Client 1

……

Obj 2

Obj n

Page 54: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PACKING OF METHODS & OBJECTS INTO SERVER (contd.)

Mode III: » One method per object» One object per server (same

node)

Mode IV:» one method per object» one object per server» one server per node

::_bind(Obj1:Server1)

Server 1

Obj 1

Client 1

::_bind(Obj2:Server2)

Client 2

Server 2

Obj 2

::_bind(Objn:Servern)

Client n

Server n

Objn

::_bind(Obj1:Server1)

Server 1

Obj1

Client 1

::_bind(Obj2:Server2)

Client 2

Server 2

Obj 2

::_bind(Objn:Servern)

Client n

Server n

Obj n

Page 55: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PACKING OF METHODS & OBJECTS INTO SERVER - RESULTS

No. of Clients # 5 # 8 # 10 # 15Mode I (Latency (ms)) 100.0 152.8 190.0 276.9 Mode II (Latency time) 99.9 153.2 192.8 278.0 Mode III (Latency time) 96.8 155.0 219.4 433.1 Mode IV (Latency time) 71.7 125.4 169.4 257.6

-50.0

100.0150.0200.0250.0300.0350.0400.0450.0500.0

# 5 # 8 # 10 # 15

User Number

Late

ncy

(ms) Mode I

Mode II

Mode III

Mode IV

Page 56: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PACKING OF METHODS & OBJECTS INTO SERVER -RESULTS (contd.)

Higher Service Time (10 ms) No. of Clients = 10

Mode I Mode II Mode III Mode IVLatency (ms) 291.5 291.8 319.8 220.6Comparison (%) -- +0.1% + 9.71 % - 23.42 %

100.0150.0200.0250.0300.0350.0

Mode I Mode II Mode III Mode IV

Mode

Late

ncy

(ms)

Page 57: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

PACKING OF METHODS & OBJECTS INTO SERVER -OBSERVATIONS

Model IV (maximizing concurrency) produces best performance (expected)

Mode III (independent servers on same node) produces worst performance» process switching overhead

Mode I and Mode II perform comparably

Performance differences among modes increase with the no. of clients

Page 58: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

LOAD BALANCING

Load Balancing by Application Designer

Example system: Two classes of clients -busy and non-busy

Replicated Servers Mode I: Unbalanced System

Two busy clients connected to the same server

Mode II: Balanced System

Load split evenly across two servers

serversclients Serverclients

Page 59: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

LOAD BALANCING(Contd.)

Mode III: Half Dynamic Self RR

Busy client selects server by using localRR.

Non-busy client statically bound to a fixed server

Mode IV: Dynamic Self RR

Both busy and non-busy clients select servers by using a local RR.

clients serversclients servers

Page 60: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

LOAD BALANCING(Contd.)

Mode V: Dynamic Central RR

All client requests are sent to centralized dispatcher

Dispatcher uses (global) RR to select server

clients servers

dispatcher

Page 61: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

LOAD BALANCING-RESULTS

Service time =10ms, Number of Clients=4, number of servers =2

TP1 (Busy client) 16.67 25.64 21.74 21.51 13.79 TP2 (Non-Busy client) 1.88 1.86 1.85 1.85 1.75 TP (System) 18.54 27.50 23.59 23.35 15.55 Improvement (%) - - 48.30% 27.21% 25.93% -16.15%

-5.00

10.0015.0020.0025.0030.00

Strgy 1 Strgy 2 Strgy 3 Strgy 4 Strgy 5

Syst

em

Thro

ughp

ut

Page 62: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

LOAD BALANCING-RESULTS

Service time =80 ms, Number of Clients=4, number of servers =2

TP1 (Busy client) 5.04 5.39 5.06 5.02 5.00 TP2 (Non-Busy client) 1.49 1.36 1.34 1.31 1.36 TP (System) 6.53 6.75 6.40 6.32 6.36 Improvement (%) - 3.3% -2.1% -3.2% -2.7%

-

2.00

4.00

6.00

8.00

Case I Case II Case III Case IV Case V

Thro

ughp

ut

Page 63: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

LOAD BALANCING-RESULTS

Effect of Service Time

-20.00%

-10.00%

0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

Strgy 1 Strgy 2 Strgy 3 Strgy 4 Strgy 5

Strategy

Com

pariso

n (b

ase

on S

trgy

1)

10ms20ms40ms60ms80ms120ms

Page 64: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

LOAD BALANCING-RESULTS

More complex System - 8 Clients and 4 Servers

Effect of Service Time

0.00%

5.00%

10.00%

15.00%

20.00%

25.00%

30.00%

35.00%

40.00%

45.00%

strgy1 strgy2 strgy3 strgy4 strgy5

Strategy

Com

pariso

n (bas

e on

strateg

y)

10ms

20ms

40ms

60ms

80ms

120ms

Page 65: High Performance CORBA...performs better than using any static QL value Throughput (N =8, L = 150 bytes, SA, SB = 10, 15 ms, D = 50 ms) Threshold Throughput 04.4997 18.8298 28.9266

LOAD BALANCING-OBSERVATIONS

Static symmetrically balanced system gave the best performance» needs a priori knowledge of client service requirements» assumes client behavior is in one class (busy or non-busy)

Half Round Robin Vs. Full Round Robin» Both strategies gave a comparable performance

Performance differences among strategies more pronounced at lower server demands

Central Vs. Local Round Robin

» For smaller systems (sub-systems) local RR is preferable– central dispatcher incurs extra message overhead

» For more complex system the centralized RR gives a superior performance