TRANSCRIPT
Explorations in the Grid/WS Computing Jungle
A Narrative of the Reverse-Engineering Attempts of a “New” Distributed Paradigm
François Taïani, Matti Hiltunen & Rick Schlichting
Opera Talk, 6 May 2008, Comp. Dept., Cambridge University
2
The best and safest method of philosophizing seems to be, first to inquire diligently into the properties of things, and to establish those properties by experiences and then to proceed more slowly to hypotheses for the explanation of them. For hypotheses should be employed only in explaining the properties of things, but not assumed in determining them; unless so far as they may furnish experiments.
Isaac Newton
3
Context: Middleware Complexity
One request: 2,065 individual invocations, over 50 C functions and 140 C++ classes.
ORBacus Request Processing
4
Web Services’ Bright New World
Grid Computing: federating resources
Web Services: integrating services
Globus (Argonne Lab.): reference implementation
Large, complex, collaborative middleware (IBM, Apache,...)
Very poor performance:
 Up to 30s to create a simple distributed object (a counter)
 Up to 2s for a round-trip remote add operation on this counter, over the Internet
Where does it come from? Does it tell us something about modern mw development?
5
Globus
Reference implementation of the Grid standards.
Developed by the “Globus Alliance”, a partnership around Argonne National Laboratory.
Globus is a high-level “middleware” (software glue).
It offers services to share and use remote distributed “resources” (CPU, memory, DB, bandwidth).
Since version 3.9.x it uses Web Service “connectivity”:
 Web Services: “connectivity” middleware across the Internet
 Integration of services across organizations
 Related alphabet soup: SOAP, XML, WSDL, HTTP, .NET, etc.
6
Exploration Goals
We wanted to understand Globus (at least its connectivity)
Huge piece of software (3.9.x):
 123,839 lines in Java (without reused libraries)
 1,908,810 lines in C/C++ (including reused libraries)
Many libraries / technologies involved:
 XML, WSDL (Description Language), WSRF (Resource Framework)
 Axis (Java to SOAP), Xerces (XML parsing), com.ibm.wsdl
How to understand that?
A typical reverse engineering problem
7
Methodology I + First Results

Scenario: create resource, subscribe to changes, add 3, notify ×4, destroy resource.

[Figure: the client and the container, both traced at the Java VM level, produce execution traces]
First attempt: tracing everything (outside the JVM libs)
 client: 1,544,734 local method calls (sic)
 server: 6,466,652 local method calls (sic) [+ time out]
How to visualize such results?
8
Program Visualization: a few Notions
Problem studied for quite a long time now.
Different aspects: collection, manipulation, visualization.
Visualization: some form of projection (many proposed).
Our goal: understand software structure.

[Figure: a growing call stack crossing three libraries:
 lib1.Wale.breath → lib1.Mammal.inhale → lib2.Lung.inhale →
 lib2.Muscle.contract → lib2.Nerve.transmit → lib3.Signal.travel]

Tracing calls reveals the software structure.
9
Methodology I

[Figure: a call graph obtained by tracing. lib1.Wale.breath calls lib1.Mammal.inhale, which descends through lib2 (Lung.inhale, Muscle.contract, Muscle.stop, Nerve.transmit) into lib3 (Signal.travel, Blood.flow, Pressure.foo).]

Aggregate invocations of the same library; chart them w.r.t. their position in the call stack.

[Chart: calls per library (lib1, lib2, lib3) against stack depth 1–6, in absolute counts and as percentages]
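As a rough illustration, the projection just described (charge each traced call to its owning library at its stack depth) can be sketched in a few lines of Java. The `enter`/`exit` event format and all names here are invented for the example; they are not the authors' actual tooling.

```java
import java.util.*;

public class DepthHistogram {

    // histogram.get(depth).get(lib): number of calls into `lib` observed at
    // that stack depth (depth 1 = bottom of the stack).
    public static Map<Integer, Map<String, Long>> fromTrace(List<String> events) {
        Map<Integer, Map<String, Long>> histogram = new TreeMap<>();
        Deque<String> stack = new ArrayDeque<>();
        for (String event : events) {
            if (event.startsWith("enter ")) {
                String method = event.substring("enter ".length());
                stack.push(method);
                // "lib1.Wale.breath" -> owning library "lib1"
                String lib = method.substring(0, method.indexOf('.'));
                histogram.computeIfAbsent(stack.size(), d -> new HashMap<>())
                         .merge(lib, 1L, Long::sum);
            } else if (event.equals("exit")) {
                stack.pop();
            }
        }
        return histogram;
    }
}
```

Summing each depth's map then gives one stacked bar of the "package activity vs. stack depth" charts shown in the following slides.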
10
Methodology I
[Figure: the "Package Activity vs. Stack Depth" chart shown alongside the lib 1 / lib 2 / lib 3 layering: the projection reveals the software structure]
11
Package Activity vs. Stack Depth
[Chart: calls (0–180,000) per package against stack depth (1–55). Packages: org.apache.xerces, org.apache.xml, org.apache.axis, org.apache.log4j, org.apache.xpath, org.apache.commons, com.ibm.wsdl, others]
(client, 1 creation, 4 requests, 1 destruction)
12
Package Activity vs. Stack Depth

[Same chart as on the previous slide (client, 1 creation, 4 requests, 1 destruction), shown alongside the small lib1/lib2/lib3 example chart]
13
Package Activity vs. Stack Depth
[Same chart as on slide 11 (client, 1 creation, 4 requests, 1 destruction)]
Looks better, but is the same!
89% of invocations (1,372,534) due to XML!
14
Package Activity vs. Stack Depth (client, percentage view)

[Chart: percentage of calls (0–100%) per package against stack depth (1–55); same packages as before]
15
Package Activity vs. Stack Depth
[Same percentage chart as on the previous slide (client, percentage view)]
XML used by org.apache.axis, not by Globus!
Very long stacks: probably recursive parsing!
16
What does it tell us?
Most of the local invocations (89%) are related to XML parsing (org.apache.xerces, org.apache.xml).
The parsing is used by Axis (the SOAP/Java bridge from the Apache Foundation), not directly by Globus.
The very long stacks we observe (up to 57 frames!) most probably reflect some recursive parsing loop, rather than the program structure.
Similar findings on the server side (only more dramatic: stack depth up to 108 (sic), 4 times more invocations).
17
New Questions
More insight needed:
Does invocation count reflect real performance?
How “bad” is really the platform?
Can we do the same kind of “structural” projection of profiling data?
If yes, is it useful?
Our choice: a 2-step approach
 (1) Black-box profiling
 (2) “Internal” profiling using sampling
18
Chosen Approach
2 steps:
 1. Black-box profiling: minimal interference, coarse results.
 2. Sample-based profiling: less accurate but more detailed.

We focused on the connectivity of the WSRF implementation of GT4-Java:
 Low-level “plumbing”, no high-level service involved
 Motivation: profile the founding bricks of the Globus platform

Experimental set-up:
 Standalone SMP server running 4 Intel Xeon @ 1.6GHz
 No network cost involved! Avoids context-switching overhead!
 Globus 3.9.4 used (last GT4 alpha release, released Dec. 04)
19
Black-Box Profiling: Approach

Black-box approach: measure externally visible latencies (clock instrumentation on the client).

Many different situations to be considered!

[Figure: client/container scenario (create, subscribe, add 3, notify 3, destroy) with nested repetitions (×5, ×5, ×50) to isolate the influence of resource, client and container initialization, and averaging over 10,000 invocations]
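The black-box measurement loop can be sketched as follows. `Request` is a hypothetical stand-in for the real Globus client stub, and the warm-up parameter mimics the repetitions used above to separate lazy-initialization cost from stabilized latency; none of these names come from the actual Globus API.

```java
public class BlackBoxProbe {

    // Hypothetical stand-in for one remote operation (create, add, notify...).
    public interface Request { void invoke(); }

    // Time each invocation externally, exactly as a black-box observer would.
    // Returns per-invocation latencies in nanoseconds.
    public static long[] measure(Request r, int iterations) {
        long[] latencies = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            r.invoke();
            latencies[i] = System.nanoTime() - start;
        }
        return latencies;
    }

    // Stabilized latency: average over the tail, skipping the first `warmup`
    // calls, whose cost is dominated by lazy initialization.
    public static double stabilizedNanos(long[] latencies, int warmup) {
        double sum = 0;
        for (int i = warmup; i < latencies.length; i++) sum += latencies[i];
        return sum / (latencies.length - warmup);
    }
}
```

Comparing `latencies[0]` against `stabilizedNanos(...)` is what exposes the first-call spikes reported on the next slides.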
20
Resource Set-Up
[Figure: client/container trace of create and subscribe, repeated ×5 / ×5]
21
Resource Set-Up

Container init overhead: ~8.2s!
Client init overhead: ~24.8s!

High lazy-initialization costs! (> 30s!)
Stabilized latency remains high (380ms).
22
First Notification
[Figure: client/container trace of the 1st notify, repeated ×5 / ×5]
23
First Notification

Container init overhead: ~430ms
Client init overhead: ~1.4s!
Stabilized latency: ~1.1s!
24
Second Notification
Lazy initialization everywhere
[Figure: client/container trace of the 2nd notify, repeated ×5 / ×5]

Resource init overhead: ~930ms!
Stabilized latency of the 1st notification: ~1.1s
Stabilized request latency still high (170ms).
25
Internal Profiling: Basics

Profiling data obtained through sampling (SUN hprof basic profiler on Java 1.4):
 JVM periodically stopped; the stack of the active thread is captured.
 Result: a set of weighted stack traces. Weight = how often the stack was observed.

Visualization: a set of weighted stacks is a multi-dimensional object:
 Time (represented by weights)
 Threads: each trace belongs to a thread
 Control flow (represented by stacks, reflects use relationships)
 Code structure (package organization, class hierarchy, etc.)

Projection (aggregation / collapsing) required.
Many possibilities. Our choice: code structure + stack depth.
26
Methodology III

Sampling yields a set of weighted stack traces (weight reflects time spent).

[Figure: three weighted stack traces over lib1, lib2 and lib3 (weights ×3, ×1, ×2), e.g. lib1.Wale.breath → lib1.Mammal.inhale → lib2.Lung.inhale → lib2.Muscle.contract → lib2.Nerve.transmit → lib3.Signal.travel]

Aggregate invocations of the same library; chart them w.r.t. position in the call stack.

[Charts: time units per library (lib1, lib2, lib3) against stack depth 1–6, in EXCLUSIVE and INCLUSIVE versions]
27
Experimental Set-Up
[Figure: client/container scenario (create, subscribe, add 3, notify 3, destroy) repeated ×5 / ×5, with hprof attached to the Java VM to collect profiling data]
28
Container Profiling: Results

Sharp drop at length 13.
Some very deep traces; they look quite regular beyond depth 28 (recursion?). org.apache.axis predominant.
Layered structure for upper stack depths.
Busy waiting related to notification management, outside the request critical path.
29
New Experimental Set-Up

Same scenario, but without subscribe and the 3 notifications, and with extra granularity to observe package org.apache.axis.

[Figure: client/container: create, add 3, destroy, repeated ×5 / ×5, with hprof attached to the Java VM to collect profiling data]
30
This is a recursion in org.apache.wsdl.symbolTable (web services). Symbol management issue?
Traces of length 13 have disappeared: they were caused by the notification management.
org.globus.wsrf
org.globus.gsi (security)
sun.reflect (reflection)
New Results
31
Profiling Breakdown
Abstracting away low-level packages (java.*, etc.), the sample breakdown among “higher-level” packages is:

 Package Name                Samples      %
 org.apache.axis.wsdl            231    21%
 org.apache.axis.encoding         66     6%
 org.apache.axis (others)        113    10%
 org.globus.gsi                  249    23%
 org.globus.wsrf                  49     4%
 cryptix.provider.rsa             82     7%
 org.apache.xerces                78     7%
 others                          237    21%
32
Profiling Breakdown
[Same breakdown table as on the previous slide]
Symbol management issue?
33
Profiling Breakdown
[Same breakdown table as on slide 31]
SOAP + XML: 44%
34
Profiling Breakdown
[Same breakdown table as on slide 31]
Security / Cryptography: 30%
35
Many Other Visualisation Ways
[Chart: per-thread breakdown for about 30 JVM threads (Thread-0 to Thread-29, main, Reference Handler, process reaper): time units (0–800) and average stack depth (0–30) per thread (v3.9.2)]
36
Summing Up Globus
Lazy optimisation: very high latency on first invocation of operations (up to 30s to set up a resource on a new container!)
Stabilized latencies still high: ~160ms for a round-trip request (with authentication turned on)
No clear culprit. Technology overload: WSDL, SOAP, security.
Is lazy optimisation a problem? Yes and No.
37
Longer Term
Platforms and technologies come and go; Globus is a moving target.
But experimental approaches stay.
And so do development practices.
38
Middleware Practices: Are we doomed?

Lazy optimization: a flexibility paradox
Poor performance: abstraction leaking
Reverse engineering: Frankenstein's return?

Can cognitive-based middleware save us? APIs are for real beings!
39
To look further
Globus profiling:
 “The Impact of Web Service Integration on Grid Performance”, François Taïani, Matti Hiltunen, Rick Schlichting, HPDC-14, 2005

Large graph reverse engineering:
 “CosmOpen: Dynamic reverse-engineering on a budget”, François Taïani, Marc-Olivier Killijian, Jean-Charles Fabre, TR COMP-002-2008, Lancaster University, 2008
 http://ftaiani.ouvaton.org/7-software/index.html#CosmOpen
Next Generation Middleware Group at Lancaster
40
The End (Thank you)
41
Package Activity vs. “Progress”
[Chart: share of calls per package (0–100%) in each progress slice, against progress measured as % of local calls (1–100). Packages: org.apache.xerces, com.ibm.wsdl, org.apache.xpath, org.apache.xml, org.apache.axis, org.apache.log4j, org.apache.commons, org.xml.sax, org.globus.wsrf, org.apache.naming, others; remote calls also marked]

client (1 creation, 4 requests, 1 destruction)
42
Profiling Results (Exclusive, Server)

[Chart: hits (0–100) against stack depth (1–49). Packages: java.net, java.security, sun.security.provider, org.apache.axis, java.util.zip, java.lang, org.apache.xerces, others]