grid monitoring futures with globus jennifer m. schopf argonne national lab april 2003
Post on 28-Mar-2015
221 Views
Preview:
TRANSCRIPT
Grid Monitoring Futures Grid Monitoring Futures with Globuswith Globus
Jennifer M. SchopfJennifer M. Schopf
Argonne National LabArgonne National Lab
April 2003April 2003
2
My DefinitionsMy Definitions
Grid:– Shared resources– Coordinated problem solving– Multiple sites (multiple institutions)
Monitoring:– Discovery
> Registry service> Contains descriptions of data that is available
– Expression of data> Access to sensors, archives, etc.
3
What do *I* mean by What do *I* mean by GridGrid monitoring? monitoring?
Different levels of monitoring needed:– Application specific– Node level– Cluster/site Level– Grid level
Grid level monitoring concerns data – Shared between administrative domains– For use by multiple people– (think scalability)
4
Grid Monitoring Does Not Include…Grid Monitoring Does Not Include…
All the data about every node of every site Years of utilization logs to use for planning next
hardware purchase Low-level application progress details for a single
user Application debugging data (except perhaps
notification of a failure of a heartbeat) Point-to-point sharing of all data over all sites
5
Overview of This TalkOverview of This Talk
Evaluation of information infrastructures– Globus Toolkit MDS2, R-GMA, Hawkeye
– Insights into performance issues
– (publication at HPDC 2003) What monitoring and discovery could be Next-generation information architecture
– Open Grid Services Architecture mechanisms
– Integrated monitoring & discovery arch for GT3
6
Performance and the GridPerformance and the Grid
It’s not enough to use the Grid, it has to perform – otherwise, why bother?
First prototypes rarely consider performance (tradeoff with dev’t time)– MDS1–centralized LDAP
– MDS2–decentralized LDAP
– MDS3–decentralized Grid service Often performance is simply not known
7
Globus Monitoring andGlobus Monitoring andDiscovery Service (MDS2)Discovery Service (MDS2)
Part of Globus Toolkit, compatible with other elements
Used most often for resource selection– aid user/agent to identify host(s) on which to run an
application Standard mechanism for publishing and discovery Decentralized, hierarchical structure Soft-state protocols Caching Grid Security Infrastructure credentials
8
MDS2 ArchitectureMDS2 Architecture
GI IS
Cache contains info fromA and B
GI IS requests infofrom GRIS services
Client 1 Client 2
Client 2 uses GI IS for searching collective information
GRIS register with GI IS
Resource A
GRIS
IPIP Resource B
GRIS
IPIP
IP
Client 1 searchesthe GRIS directly
GI IS
Cache contains info fromA and B
GI IS requests infofrom GRIS services
Client 1 Client 2
Client 2 uses GI IS for searching collective information
GRIS register with GI IS
Resource A
GRIS
IPIPResource A
GRIS
IPIP Resource B
GRIS
IPIP
IP
Resource B
GRIS
IPIP
IP
Client 1 searchesthe GRIS directly
9
Relational Grid Monitoring Relational Grid Monitoring Architecture (R-GMA)Architecture (R-GMA)
Implementation of the Grid Monitoring Architecture (GMA) defined within the Global Grid Forum (GGF)
Three components
– Consumers
– Producers
– Registry
GMA as defined currently does not specify the protocols or the underlying data model to be used.
11
R-GMAR-GMA
Monitoring used in the EU Datagrid Project
– Steve Fisher, RAL, and James Magowan, IBM-UK
Based on the relational data model Used Java Servlet technologies Focus on notification of events User can subscribe to a flow of data with specific
properties directly from a data source
12
R-GMA ArchitectureR-GMA Architecture
13
HawkeyeHawkeye
Developed by Condor Group Focus – automatic problem detection Underlying infrastructure builds on the Condor and
ClassAd technologies
– Condor ClassAd Language to identify resources in a pool
– ClassAd Matchmaking to execute jobs based on attribute values of resources to identify problems in a pool
14
Hawkeye ArchitectureHawkeye Architecture
15
Comparing Information SystemsComparing Information Systems
MDS2 R-GMA Hawkeye
Info
Collector
Information Provider
Producer Module
Info
Server
GRIS Producer Servlet
Agent
Aggregate Info Server
GIIS (Archiver)
Manager
Directory Server
GIIS Registry Manager
16
Some Architecture ConsiderationsSome Architecture Considerations
Similar functional components– Grid-wide for MDS2, R-GMA; Pool for Hawkeye– Global schema
Different use cases will lead to different strengths– GIIS for decentralized registry; no standard protocol
to distribute multiple R-GMA registries– R-GMA meant for streaming data – currently used for
NW data; Hawkeye and MDS2 for single queries Push vs Pull
– MDS2 is PULL only– R-GMA allows push and pull– Hawkeye allows triggers – push model
17
ExperimentsExperiments
How many users can query an information server at a time?
How many users can query a directory server?
How does an information server scale with the amount of data in it?
How does an aggregator scale with the number of information servers registered to it?
18
TestbedTestbed
Lucky cluster at Argonne– 7 nodes, each has two 1133 MHz Intel PIII CPUs (with
a 512 KB cache) and 512 MB main memory Users simulated at the UC nodes
– 20 P3 Linux nodes, mostly 1.1 GHz– R-GMA has an issue with the shared file system, so
we also simulated users on Lucky nodes All figures are 10 minute averages Queries happening with a one second wait
between each query (think synchronous send with a 1 second wait)
19
MetricsMetrics
Throughput– Number of requests processed per second
Response time– Average amount of time (in sec) to handle a request
Load– percentage of CPU cycles spent in user mode and
system mode, recorded by Ganglia– High when running small number compute intensive aps
Load1– average number of processes in the ready queue
waiting to run, 1 minute average, from Ganglia– High when large number of aps blocking on I/O
20
Performance of Information Performance of Information Servers vs. Number of UsersServers vs. Number of Users
0
20
40
60
80
100
120
140
160
0 100 200 300 400 500 600No. of Users
Thro
ughp
ut(q
uerie
s/se
c)
MDS GRIS (cache) MDS GRIS (nocache)Hawkeye Agent R-GMA ProducerServlet(lucky)R-GMA ProducerServlet(UC)
21
Experiment 1 SummaryExperiment 1 Summary
Caching can significantly improve performance of the information server– Particularly desirable if one wishes the server to scale well
with an increasing number of users When setting up an information server, care should be
taken to make sure the server is on a well-connected machine– Network behavior plays a larger role than expected
– If this is not an option, thought should be given to duplicating the server if more than 200 users are expected to query it
24
Information Service Throughput Information Service Throughput vs. Num. of Information Collectorsvs. Num. of Information Collectors
0
1
2
3
4
5
6
7
8
0 20 40 60 80 100# of information Collectors
Thro
ugh
put(
que
ries
/sec
)
MDS GRIS(cache) MDS GRIS(no cache)Hawkeye Agent R-GMA ProducerServlet
25
Experiment 3 SummaryExperiment 3 Summary
Too many information collectors is a performance bottleneck
Caching data helps Alternatively, register to more instances of
information servers with each handling a subset of the collectors
26
Overall ResultsOverall Results
Performance can be a matter of deployment – Effect of background load
– Effect of network bandwidth Performance can be affected by underlying
infrastructure– LDAP/Java strengths and weaknesses
Performance can be improved using standard techniques– Caching; multi-threading; etc.
27
So what could monitoring be?So what could monitoring be?
Basic functionality– Push and pull (subscription and notification)– Aggregation and Caching
More information available More higher-level services
– Triggers like Hawkeye– Viz of archive data like Ganglia
Plug and Play– Well defined protocols, interfaces and schemas
Performance considerations– Easy searching– Keep load off of clients
28
TopicsTopics
Evaluation of information infrastructures– Globus Toolkit MDS2, RGMA, Hawkeye
– Throughput, response time, load
– Insights into performance issues What monitoring and discovery could be Next-generation information architecture
– Open Grid Services Architecture mechanisms
– Integrated monitoring & discovery arch for GT3
29
Open Grid Services Architecture Open Grid Services Architecture (OGSA)(OGSA)
Defines standard interfaces and behaviors for distributed system integration, especially:– Standard XML-based service information
model
– Standard interfaces for push and pull mode access to service data
> Notification and subscription
30
Key OGSI concept - serviceDataKey OGSI concept - serviceData
Every service has it’s own service data– OGSA has common mechanism to expose a
service instance’s state data to service requestors for query, update and change notification
– Monitoring data is “baked right in”
– Service-level concept, not host-level concept
31
serviceDataserviceData
Every Grid Service can expose internal state as serviceData elements– An XML element of arbitrary complexity
Each service has a serviceData set– The collection of serviceData Elements
(SDEs) Example: state of a host is exposed as an
SDE by GRAM. Similar to MDS2 GRIS functionality, but in
each service (rather than once per host)
32
Example:Example:Reliable File Transfer ServiceReliable File Transfer Service
Performance
Policy
Faults
servicedataelements
Pending
FileTransfer
InternalState
GridService
Notf’nSource
Policy
interfacesQuery &/orsubscribe
to service data
FaultMonitor
Perf.Monitor
Client Client Client
Request and manage file transfer operations
Data transfer operations
33
MDS3MDS3 Monitoring and Discovery System Monitoring and Discovery System
Consists of a various components– Core functionality
– Information providers
– Higher level services
– Clients
34
Core FunctionalityCore Functionality
Xpath support – XPath is a language that describes a way to locate
and process items in XML docs by using an addressing syntax based on a path through the document's logical structure or hierarchy
Xindice support – native XML database Registry support
35
Schema IssuesSchema Issues
Need to keep track of service data schema– Avoid conflicts
– Find the data easier Should really have unified naming
approach
All of the tool are schema-agnostic, but interoperability needs a well-understood common language
36
MDS3 Information Providers in MDS3 Information Providers in June ReleaseJune Release
All the data currently in core MDS2 Full data in the GLUE schema for compute
elements (CE)– Ganglia information provider for cluster data will also
be available from Ganglia folks (with luck) Service data from RFT, RLS, GRAM GT2 to GT3 work
– GridFTP server data– Software version and path data – Documentation for translating your GT2 information
provider to a GT3 information provider
37
MDS3 Higher Level ProductsMDS3 Higher Level Products
Higher-level services can perform actions on service data collected from other services
Part of this functionality can be provided by a set of building blocks provided– Provider interface: GRIS-style API for writing
information providers
– Service Data Aggregator: set up subscriptions to data for other services, and publish it as a single data stream
– Hierarchy Builder: allow for hierarchy of aggregators
38
MDS3 Index ServerMDS3 Index Server
Simplest higher-level service is the caching index service
Much like the GIIS in MDS2 Will have configurablity like an GIIS
hierarchy Will also have PHP-style scripts, much as
available today
39
40
Clients Clients currently in GT3currently in GT3
findServiceData command line client– Same functionality of grid-info-search
C bindings – Core C bindings provide findServiceData C
function
– findServiceData command line client gives an example of using it to parse out information (in this case, registry contents)
41
Service Data BrowserService Data Browser
GUI client to display service data from any service
Extensible for data-specific visualization A version was released with GT3 alpha
http://www.globus.org/ogsa/releases/ alpha/docs/infosvcs/sdbquickstart.html
42
Comparing Information SystemsComparing Information Systems
MDS2 R-GMA Hawkeye MDS3
Info
Collector
Info. Provider Producer Module Service (SDEs)
Info
Server
GRIS Producer Servlet
Agent Service (service data set)
Agg. Info Server
GIIS (Archiver) Manager Index Service
Directory Server
GIIS Registry Manager Index Service
43
Is this enough?Is this enough?
No! Many places where additional help
developing MDS3 is needed
44
We Need More Basic InformationWe Need More Basic Information
Interfaces to other sources of data– GPT data– Other monitoring systems– Others?
Service data from other components– Every service has service data– OGSA-DAI
Will need to interface on schema
45
We Will Need More We Will Need More GUIs and ClientsGUIs and Clients
Additional GUI visualizers may be implemented to display service data specific to a particular port type (as part of service data browser)
Additional Client interfaces possibly– Integration into current portals, brokers
46
We Need MoreWe Need MoreHigher Level ServicesHigher Level Services
We have a couple planned– Archiving service
– Trigger template
47
Post-3.0 release: Archiving ServicePost-3.0 release: Archiving Service
Will allow subscription to service data Logging in a flexible way Well defined interfaces for mining Open questions:
– Best way to store time-series of arbitrary XML?
– Best way to query this archive?– Link to OGSA-DAI?– Link to other archivers?
48
Post-3.0 release: Trigger TemplatePost-3.0 release: Trigger Template
Will provide a template to allow subscription to data, reasoning about that data, and a course of action to take place
Essentially, a gateway service between OGSA Notifications and some other notification framework, with filtering of notifications
Example: Subscribe to disk space information, send mail to sys admin when it reached 90% full
Needed: trigger template and several small examples of common triggers, and documentation for how users could extend them or write new ones.
49
Other Possible HigherOther Possible HigherLevel ServicesLevel Services
Site Validation Service Job Tracking Service Interfacing to Netlogger?
50
We Need SecurityWe Need Security
Need I say more?
51
SummarySummary
Current monitoring systems– Insights into performance issues
What we really want for monitoring and discovery is a combination of all the current systems
Next-generation information architecture– Open Grid Services Architecture
mechanisms– MDS3 plans
Additional work needed!
52
ThanksThanks
Testbed/Experiment support and comments– John Mcgee, ISI; James Magowan, IBM-UK; Alain Roy
and Nick LeRoy at University of Wisconsin, Madison;Scott Gose and Charles Bacon, ANL; Steve Fisher, RAL; Brian Tierney and Dan Gunter, LBNL.
This work was supported in part by the Mathematical, Information, and Computational Sciences Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under contract W-31-109-Eng-38. This work also supported by DOESG SciDAC Grant, iVDGL from NSF, and others.
53
Additional InformationAdditional Information
MDS3 technology coordinators – Ben Clifford (benc@isi.edu) – Jennifer Schopf (jms@mcs.anl.gov)
Zhang, Freschl and Schopf, “A Performance Study of Monitoring and Information Services for Distributed Systems”, to appear in HPDC 2003– http://people.cs.uchicago.edu/~hai/hpdcv25.doc
MDS-3 information– Soon at www.globus.org/mds
Extra SlidesExtra Slides
56
Performance of GIS Information Performance of GIS Information Servers vs. Number of UsersServers vs. Number of Users
0
20
40
60
80
100
120
140
160
0 100 200 300 400 500 600No. of Users
Thro
ughp
ut(q
uerie
s/se
c)
MDS GRIS (cache) MDS GRIS (nocache)Hawkeye Agent R-GMA ProducerServlet(lucky)R-GMA ProducerServlet(UC)
62
Directory Server ScalabilityDirectory Server Scalability
00.5
11.5
22.5
33.5
44.5
1 10 50 100 200 300 400 500 600No. of Users
Load
1
MDS GIIS Hawkeye ManagerR-GMA Registry(lucky) R-GMA Registry(UC)
top related