Community Services Project “Enabling communities of collaborating users and services
on the Grid”
Jon B. Weissman
Distributed Computing Systems Group
Department of Computer Science
University of Minnesota
2
Outline
• Motivation and Vision
– Why network services and the Grid?
– What are the technical challenges?
• Project Details
– System architecture
– Service model
– Middleware
• Related Work
• Conclusion and Future Work
3
Motivation: Why Services?
• What is a network service?
– Software that can be remotely executed across the network
– High-end network services are crucial to scientific application communities
– Putting network services on-line will increase collaboration and productivity
• Other Benefits
– Service provider maintains, tunes, and upgrades the service automatically
– User need not have high-end resources
– User need not become an expert in high performance computing
4
Grid Network Service Benefits
• Synergistic with the “Grid”
– Ensemble of geographically-dispersed resources
– Network services encapsulate Grid resources
  • physical resources (computers, storage, instruments)
  • soft resources (software components)
– Make Grid resources “invisible”
– Common access protocols
– Emerging standard for Grid-based network services (OGSA)
  • dynamic Web service
– Grid Services form the basis of the Virtual Organization concept
  • A VO is a “Grid-let” specific to a user community
5
Service Modalities
• User view: “out-sourcing”
– I need to locate genomic, computing, and storage services
  • “I want to compare my source sequence library against all known target sets”
– Clear separation between user and service providers
• Service provider view: “deployment”
– I need to deploy my service in the Grid for community use and/or “personal use” (in-sourced)
• Resource provider view: “hosting”
– I need to host services to support my user community, possibly to justify my cost, generate revenue, etc.
• Community-based metric
– Amount of “science” that can be done: all parties gain benefit
– Other metrics could be applicable: cost, barter, etc.
6
CHALLENGE: Dynamism
– assembling services, users, and resources may be performed without pre-planning
– environments/VOs must be flexible and adaptive
=> dynamic service deployment
7
Dynamic Service Deployment
• Want new services to be added to the Grid while it is running … and remotely deployed to:
– scale service deployment with demand
– augment the capabilities of a VO by adding new services
– deploy a Grid service on a newly added/discovered pool of resources
– enable a new version of a Grid service to replace an old one
• Service model must support adaptation at many levels
– service architecture must adapt to new/replacement services
– service must adapt to demand and resource availability
– service and service architecture must adapt to faults
8
Issues
• Scheduling
– Where to deploy? When to re-deploy? How long to deploy?
– Where to ship a service request? How many resources to grant it?
• Fault Tolerance
– How to enable self-managing services? How to mask failure?
9
Community Services Project
• System Architecture
– Seonho Kim, Byoung-Dai Lee
• Middleware
– Byoung-Dai Lee, Darin England, Anusha Iyer, Lakshman Rao Abburi
• Testbeds
– Seonho Kim
• Applications
– Byoung-Dai Lee, Darin England, Murali Sangubhatla
10
Grid Stack
[Stack diagram, top to bottom: Applications; Application-level Grid services; Grid system services; OGSI (GT, OGSI.net, ...); Grid Fabric: OGSA; Resources — with reusable middleware spanning the service layers]
11
Adaptive Grid Services
• Divide services into system and application categories + expose (and support) adaptivity
• System services have general utility
– adaptive resource provider (ARP)
– provide leased resource pools (CPUs, storage, etc.) -> ACP, ASP, etc.
– “pre-installed”
• Application services are more specific
– high-end services with high resource requirements
– e.g., parallel equation solver, gene sequence comparison
– adaptive application grid service (AGS)
– an AGS has a front-end and a back-end
– the AGS back-end is hosted on an ARP
12
(AGS) Service Lifecycle
• Packaging
– leverage middleware
• Install
– decide on front-end location
• Deploy
– decide on back-end location
• Initialize
• Access
• Teardown
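The stages above form a strict sequence. A minimal sketch of the lifecycle as a linear state machine (the stage names follow the slide; the class and method names are illustrative, not part of the project's middleware):

```python
# Minimal sketch of the AGS lifecycle as a linear state machine.
# Stage names come from the slide; the class itself is illustrative.

STAGES = ["packaged", "installed", "deployed", "initialized", "accessible", "torn_down"]

class AGSLifecycle:
    def __init__(self):
        self.stage = "packaged"   # packaging produces the service archive

    def advance(self):
        """Move to the next lifecycle stage, refusing to skip steps."""
        i = STAGES.index(self.stage)
        if i == len(STAGES) - 1:
            raise RuntimeError("service already torn down")
        self.stage = STAGES[i + 1]
        return self.stage

svc = AGSLifecycle()
svc.advance()   # install: decide on front-end location
svc.advance()   # deploy: decide on back-end location
svc.advance()   # initialize
print(svc.stage)   # initialized
```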
13
Dynamic Service Architecture
[Architecture diagram: a client sends requests over SOAP/HTTP to the AGS front-end at the home site; the AGS Deployer pulls packages from the AGS repository and deploys the AGS back-end (AGS_factory plus instances, AGSIs) onto a remote-site ARP, whose Lease Manager (query, allocation, and deploy modules) grants leases over monitored resources; a Runtime Prediction Service, Request Manager, and Service Installer consult performance and status DBs of past workload; an Information Service / Registry handles register/query; all hosted on the Globus GT3 Grid Service platform. Arrows denote request/response, service instance creation, and register/query flows.]
14
[Implementation diagram: home and remote sites each run a Tomcat servlet engine with the AXIS framework (SOAP engine); the AGS front-end and AGS Deployer at the home site ship a WAR file over SOAP/HTTP to the ARP host node, where the Tomcat Manager and webapp loader/deployer dynamically load the AGS_factory web-app; the factory creates service instances on member nodes. Arrows denote request/response, service instance creation, and dynamic loading.]
15
Component API
installer {
    ServiceType install (ARP, PackageType);   // install AGS front-end service package on an ARP
    void uninstall (ServiceType);             // uninstall service front-end
}

AGS_Deployer {
    ServiceType deploy_AGS (PackageType, ARP); // deploy back-end service on a selected ARP
    void undeploy_AGS (ServiceType);           // undeploy service back-end
}

AGS_front_end {
    void add_new_AGS (ServiceType, LotType);  // inform front-end about new back-end AGS
    void remove_AGS (ServiceType);            // inform front-end that a back-end AGS has been removed
    PerfDataType get_perf_data ();            // returns performance data for the AGS
    // service-specific interfaces
    ...
}

Key decision-makers:
The AGS_front_end decides where a service request will be sent.
The AGS_Deployer decides where to deploy or re-deploy a service.
16
AGS API
AGS/AGSI {
    factory:   // this interface is supported only by the factory (not the instances, AGSIs)
        ServiceType create ();                       // create AGSI
        void init (LotType);                         // init AGS with lease info
        void shutdown ();                            // disable service
        void log_time (RequestType, TimeType);       // logs performance data for a completed request
        PerfDataType get_perf_data (TimeFrameType);  // perf. data for prior requests over a past time frame
    notification:
        void event_occurred (EventType);             // a subscribed event has occurred
    adaptation:
        void new_resource_lot (LotType);             // provide new lot to AGS factory (i.e. adding/removing resources)
        void add_resources (ResourceType);           // AGSI - add resources to this AGSI
        void remove_resources (ResourceType);        // AGSI - remove resources from this AGSI
        ...
    service-specific interfaces:
        ...
}

Key idea: the service must respond to resource fluctuation.
17
ARP API

AdaptiveResourceProvider {
    query:
        AmtType avail_amt (LeaseType);         // returns amt of avail. resources grantable for the desired lease
        LeaseType lease_length (AmtType);      // returns maximal lease for the desired amount of resources
        PFType platform_features (AmtType);    // returns the platform features of this ARP
        Boolean have_features (PFType);        // does the platform have specific features?
        ProfType get_profile (TimeFrameType);  // returns perf. profile over past/future time frame
    notification:
        void subscribe_event (EventType, AGS); // AGS wishes to subscribe to a particular resource event
    allocation:
        LotType alloc (LeaseType, AmtType);    // initial pool allocation req. to AGS
        Boolean dealloc (LeaseType, AmtType);  // dealloc resources to new amount
        Boolean realloc (LeaseType, AmtType);  // increase granted resources to new amount
        Boolean renew_lease (LotType, LeaseType, AmtType); // renew old lease (in LotType) to new lease
    usage:   // usage for ACP
        ServiceType deploy (PackageType, LotType); // deploy service package using valid Lot
        void undeploy (ServiceType);               // undeploy service from ARP
    ...
}

Key idea: resources are allocated and leased in type-specific lots; the ARP exposes its features.
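To make the allocation interface concrete, here is a hypothetical client-side negotiation against a stubbed-out ARP, in Python rather than the project's Java/GT3 setting. The `StubARP` class, `acquire` helper, and the CPU-count lot representation are all illustrative assumptions, not the real ARP implementation:

```python
# Illustrative stub of the ARP allocation interface and a lease-negotiation
# loop an AGS deployer might run. Amounts are CPU counts; a "lot" is a dict.

class StubARP:
    """Toy resource provider with a fixed pool; stands in for a real ARP."""
    def __init__(self, pool):
        self.pool = pool
        self.leased = 0

    def avail_amt(self, lease_len):
        # a real ARP would consult the lease length and its schedule
        return self.pool - self.leased

    def alloc(self, lease_len, amt):
        if amt > self.avail_amt(lease_len):
            return None                          # allocation denied
        self.leased += amt
        return {"amt": amt, "lease": lease_len}  # the granted "lot"

def acquire(arp, wanted, lease_len):
    """Ask for `wanted` CPUs; fall back to whatever the ARP will grant."""
    grant = min(wanted, arp.avail_amt(lease_len))
    return arp.alloc(lease_len, grant) if grant > 0 else None

arp = StubARP(pool=8)
lot = acquire(arp, wanted=16, lease_len=3600)
print(lot)   # {'amt': 8, 'lease': 3600}
```

The fall-back in `acquire` mirrors the query-then-allocate split in the API: the caller probes `avail_amt` before committing to an `alloc`.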
18
Community Service Testbed
[Testbed diagram: AGS front-ends, AGS factories, and ARPs with deployed services at the Univ. of Minnesota CS department (Linux/Solaris), the Univ. of Minnesota Supercomputing Institute (Solaris), and the Univ. of Virginia (Solaris), connected by LAN and WAN links; clients issue requests/responses and service instance creations.]

Scenarios (Request Flow):

Scenario        Client          AGS_Front-End   AGS_Factory
1: LAN-LAN      U of M (CS)     U of M (MSI)    U of M (CS)
2: LAN-WAN      U of M (CS)     U of M (MSI)    U of Virginia
3: WAN-LAN      U of Virginia   U of M (MSI)    U of M (CS)
4: WAN-WAN      U of M (CS)     U of Virginia   U of M (MSI)
19
System Architecture Results
20
[Four charts: Package Transfer Time over LAN and WAN for package sizes 1K-1M bytes and 5M-10M bytes, comparing Socket, HTTP, and SOAP transports; times range from hundreds of ms (LAN, small packages) to tens of seconds (WAN, 10M).]
Deployment/Installation Cost
SOAP penalty is about a factor of 2 (WAN)
21
Impact of SOAP Buffer Size
[Two charts: transfer time (ms) vs. SOAP buffer size over the WAN, for a 5MB file (buffers 10KB-3010KB) and a 500KB file (buffers 30KB-680KB).]
SOAP buffers must be sized appropriately
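The buffer-size effect can be captured with a toy cost model: each SOAP message pays a fixed per-message overhead, so undersized buffers multiply that overhead across many messages. The numbers below (RTT, bandwidth) are illustrative constants, not measurements from the testbed:

```python
import math

def transfer_time_ms(size_kb, buffer_kb, rtt_ms=40.0, bw_kb_per_ms=50.0):
    """Toy model: each SOAP message of `buffer_kb` pays a fixed per-message
    cost (rtt_ms), so too-small buffers multiply the overhead.
    All constants are illustrative, not testbed measurements."""
    messages = math.ceil(size_kb / buffer_kb)
    return messages * rtt_ms + size_kb / bw_kb_per_ms

# 5 MB file over the toy WAN: compare a small and a large buffer
small = transfer_time_ms(5 * 1024, buffer_kb=10)
large = transfer_time_ms(5 * 1024, buffer_kb=1024)
print(small > large)   # True: small buffers inflate total time
```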
22
Deployment
• Reconfiguration cost is linear in the package size and is on the order of a few seconds
• Transfer cost is also linear in the package size and is on the order of a few seconds (WAN) for both install and deploy
• Total cost is on the order of seconds
[Two charts: deployment cost vs. total library size (2K-500K), and WAN package transfer time (ms) for package sizes 1K-1M bytes under Socket, HTTP, and SOAP.]
23
End-to-end Latency
Service latency (time in msecs, 0-byte request):

Scenario    Latency
LAN-LAN     35
LAN-WAN     155
WAN-LAN     148
WAN-WAN     278

Latency components: client -> front-end, front-end -> back-end, instance creation, handle returned to front-end.
24
End-to-end Cost (Eigenvalue service)
[Four stacked-percentage charts (LAN-LAN, LAN-WAN, WAN-LAN, WAN-WAN) breaking down end-to-end time for problem sizes 1K-100K and a do-nothing request into execution time, front-end-to-factory time, and client-to-front-end time.]
25
Middleware
• Common middleware inside service components
– scheduling, performance prediction
• Scheduling
– Where to send a service request? How many resources to grant it? How many resources to lease to a service?
– Where to deploy?
• Performance prediction
– Key to scheduling
• Common = reusable
– Observation: the best performance predictor and scheduling technique are highly dependent on the service and the Grid
26
Solution: “Mixture of Experts”
[Architecture diagram: the AGS front-end holds a set of scheduling policies (Policy 1..M); in the AGS back-end, a Request Manager combines run-time predictors (Predictor 1..N, fed by a run-time history DB), scheduling policies, service codes, and an adaptive code library to turn requests into results.]

“Meta-level” algorithms – combinations of point algorithms
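The meta-level idea can be sketched concretely: keep several point predictors, score each against completed requests, and route predictions to the current best. The class below is a hedged Python illustration of this "mixture of experts" pattern, not the project's middleware code; the expert models and error metric are assumptions:

```python
# Sketch of "mixture of experts" run-time prediction: keep several point
# predictors, score each against the request history, and use the best so far.

class MixturePredictor:
    def __init__(self, experts):
        self.experts = dict(experts)          # name -> callable(size) -> time
        self.errors = {name: [] for name in self.experts}

    def predict(self, size):
        """Use the expert with the lowest mean absolute error so far."""
        def score(name):
            errs = self.errors[name]
            return sum(errs) / len(errs) if errs else 0.0
        best = min(self.experts, key=score)
        return best, self.experts[best](size)

    def observe(self, size, actual):
        """After a request completes, record every expert's error."""
        for name, f in self.experts.items():
            self.errors[name].append(abs(f(size) - actual))

mix = MixturePredictor({
    "linear":    lambda n: 2.0 * n,        # O(N) model
    "quadratic": lambda n: 0.01 * n * n,   # O(N*N) model
})
for n in (10, 20, 40):
    mix.observe(n, actual=0.01 * n * n)    # true cost is quadratic here
best, _ = mix.predict(30)
print(best)   # 'quadratic'
```

Note the slow-start behavior seen in the results: with an empty history every expert scores equally, so early predictions can be poor until enough requests have been observed.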
27
Middleware Results
28
Performance Prediction
• Meta-level performance prediction
History-Based Dynamic Performance Predictor (N-body simulation service)
[Chart: normalized error rate (%) for predictors O(N), O(N*N), O(N*N*N), filtering, dynamic selection (AVG), and dynamic selection (NORM).]

Slow Start -> High Accuracy (N-body simulation service)
[Chart: error rate (%) vs. request number (0-400), with large early errors converging toward zero.]
29
Scheduling: Where to send request?
Performance Comparison (N-body Simulation Service)
[Charts: average wait, run, and service times under the RR, WQL, MLQ, and DPS policies for a hybrid workload and an extreme workload; companion plots show the number of bodies per request over time for each workload.]

Where to send a request may depend on the service and the workload.
31
Scheduling: How many resources?
Performance Comparison (N-body simulation service)
[Chart: wait, run, and service times (sec) under the MOLDABLE, IDEAL, SRT_HARVEST, and IB_HARVEST policies.]

A service is leased a resource pool. How many resources to give to each request depends on the workload. Meta-level policies are the next step.
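One simple per-request policy in this family: given a runtime model for a moldable request, grant the smallest slice of the leased pool that meets a response-time target. The Amdahl-style model, serial fraction, and target below are illustrative assumptions, not the policies measured above:

```python
# Toy "how many CPUs per request" policy: with an Amdahl-style runtime model,
# pick the smallest allocation whose predicted runtime meets a target.
# The serial fraction and target are illustrative assumptions.

def runtime(work, cpus, serial_frac=0.05):
    """Predicted runtime: serial part plus parallel part divided across CPUs."""
    return work * (serial_frac + (1 - serial_frac) / cpus)

def cpus_for_target(work, target, pool):
    """Smallest CPU count in the leased pool meeting the target runtime."""
    for p in range(1, pool + 1):
        if runtime(work, p) <= target:
            return p
    return pool   # best we can do with the leased pool

print(cpus_for_target(work=100.0, target=10.0, pool=32))
```

Granting the minimum sufficient slice leaves the rest of the leased pool free for concurrent requests, which is where the workload dependence enters.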
32
Current Work
33
Optimizations
• Incremental Service Deployment
• Service Caching
[Chart: Incremental Service Upgrading vs. full Service Redeployment time (ms) for package sizes 2KB-500KB over the LAN.]

Impact of Service Caching
[Chart: time (ms) for package sizes 2KB-500KB, comparing service caching against redeploying through the LAN and through the WAN.]
34
Stochastic Leasing Model
• How many resources to lease to a service?
• Tradeoffs
– Holding resources has an associated cost proportional to the lease length
– Dynamically releasing/reacquiring may be more responsive to demand and availability, but the resource cost may be higher
• Probabilistic demand
– Random demand, random execution times
• Developed a dynamic programming model
– Models the cost tradeoffs and provides an optimal leasing policy
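To illustrate the shape of such a model (not the project's actual DP formulation), here is a stripped-down leasing problem: each period, choose how many CPUs to hold, trading a holding cost against a penalty for unmet random demand. The demand distribution, costs, and horizon are all made-up numbers, and the state space is deliberately trivial:

```python
# Sketch of a leasing cost model solved by exhaustive minimisation: per period,
# decide how many CPUs to hold; holding costs C_HOLD per CPU, unmet demand
# costs C_MISS per CPU. All parameters are illustrative assumptions.

DEMAND = {0: 0.2, 4: 0.5, 8: 0.3}   # demand (CPUs) -> probability
C_HOLD, C_MISS, POOL, HORIZON = 1.0, 3.0, 8, 3

def expected_cost(hold):
    """Expected one-period cost of holding `hold` CPUs."""
    miss = sum(p * max(d - hold, 0) for d, p in DEMAND.items())
    return C_HOLD * hold + C_MISS * miss

def optimal_lease():
    """With no carryover between periods, the optimal policy is the
    per-period minimiser, repeated over the horizon."""
    best = min(range(POOL + 1), key=expected_cost)
    return best, HORIZON * expected_cost(best)

hold, cost = optimal_lease()
print(hold)   # 4: holding the modal demand beats over- or under-provisioning
```

A full DP adds state (the current lease and its remaining length) and a release/reacquire decision each period, which is what makes the dynamic policy beat a static lease.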
35
DP vs. Static Leasing
~20% improvement; less variance
36
Testbed 2004
[Testbed diagram: MSI runs ACPs on IBM, SGI, and Solaris machines and the denali cluster, plus an ASP and a Storage AGS front-end (a1.msi); the CS department runs ACPs (katmai.cs), ASPs on ADCS Windows and Linux machines, the beo cluster, and front-ends for the GC, N-body, and Solver AGSs (sitka.cs, beo1.cs, s1.msi); a demo laptop hosts the user interface, status monitor, information service, service register, and resource monitors; AGS client requests enter via fairbanks over SSH, with LAN and WAN links connecting the sites.]
37
Remote Storage Web Service (ASP)
• Grid service to provide remote archival storage
• Transparent but restricted access to a cluster of storage disks in the ADCS lab
38
Related Projects
• Service environments and testbeds
– NetSolve (Dongarra, U. Tennessee)
– Ninf (Matsuoka, Tokyo IT)
– Open Grid Services Architecture (OGSA)
• Component architecture and interface
– XCAT (Gannon, U. Indiana)
– H2O (Sunderam, Emory)
– Composable Services (Karamcheti, NYU)
• Internet Server environment
– Sharc (Shenoy, U. Mass)
– Muse (Chase, Duke)
39
Summary
• Community Service Project
– Dynamic Grid Infrastructure
– Architecture, Middleware, Testbeds
• Addressing Dynamics and Reuse
– Service demand, Grid resources
– Adaptation at several levels
– Meta-level strategies to promote reuse
• For more info: community-services.cs.umn.edu
• Thanks to DOE and NSF
40
Future Work
• Customization
– how to expose and configure services to meet specific user needs: performance, fault tolerance, etc.
• Data-intensive Services
– large amounts of distributed data
– deploy services that can process/analyze this data
– extend our middleware and system architecture
• Multiple Services
– applications may wish to use multiple services together: pipelines are common in high-end scientific applications
• Customized environments
– collections of customized services configured for specific applications
41
Questions?