Community Services Project “Enabling communities of collaborating users and services
on the Grid”
Jon B. Weissman
Distributed Computing Systems Group
Department of Computer Science
University of Minnesota
2
Outline
• Motivation and Vision
– Why network services and the Grid?
– What are the technical challenges?
• Project Details
– System architecture
– Service model
– Middleware
• Related Work
• Conclusion and Future Work
3
Motivation: Why Services?
• What is a network service?
– Software that can be remotely executed across the network
– High-end network services are crucial to scientific application communities
– Putting network services on-line will increase collaboration and productivity
• Other Benefits
– Service provider maintains, tunes, and upgrades the service automatically
– User need not have high-end resources
– User need not become an expert in high performance computing
4
Grid Network Service Benefits
• Synergistic with the “Grid”
– Ensemble of geographically-dispersed resources
– Network services encapsulate Grid resources
  • physical resources (computers, storage, instruments)
  • soft resources (software components)
– Make Grid resources “invisible”
– Common access protocols
– Emerging standard for Grid-based network services (OGSA)
  • dynamic Web service
– Grid Services form the basis of the Virtual Organization concept
  • A VO is a “Grid-let” specific to a user community
5
Service Modalities
• User view: “out-sourcing”
– I need to locate genomic, computing, and storage services
  • “I want to compare my source sequence library against all known target sets”
– Clear separation between user and service providers
• Service provider view: “deployment”
– I need to deploy my service in the Grid for community use and/or “personal use” (in-sourced)
• Resource provider view: “hosting”
– I need to host services to support my user community, possibly to justify my cost, generate revenue, etc.
• Community-based metric
– Amount of “science” that can be done: all parties gain benefit
– Other metrics could be applicable: cost, barter, etc.
6
CHALLENGE: Dynamism
– assembling services, users, and resources may be performed without pre-planning
– environments/VOs must be flexible and adaptive
=> dynamic service deployment
7
Dynamic Service Deployment
• Want new services to be added to the Grid while it is running … and remotely deployed to:
– scale service deployment with demand
– augment the capabilities of a VO by adding new services
– deploy a Grid service on a newly added/discovered pool of resources
– enable a new version of a Grid service to replace an old one
• Service model must support adaptation at many levels
– service architecture must adapt to new/replacement services
– service must adapt to demand and resource availability
– service and service architecture must adapt to faults
8
Issues
• Scheduling
– Where to deploy? When to re-deploy? How long to deploy?
– Where to ship a service request? How many resources to grant it?
• Fault Tolerance
– How to enable self-managing services? How to mask failure?
9
Community Services Project
• System Architecture
– Seonho Kim, Byoung-Dai Lee
• Middleware
– Byoung-Dai Lee, Darin England, Anusha Iyer, Lakshman Rao Abburi
• Testbeds
– Seonho Kim
• Applications
– Byoung-Dai Lee, Darin England, Murali Sangubhatla
10
Grid Stack
[Stack diagram, top to bottom: Applications; Application-level Grid services; Grid system services; OGSI (GT, OGSI.net, ...); Grid Fabric: OGSA; Resources — with reusable middleware spanning the service layers]
11
Adaptive Grid Services
• Divide services into system and application categories + expose (and support) adaptivity
• System services have general utility
– adaptive resource provider (ARP)
– provide leased resource pools (CPUs, storage, etc.) -> ACP, ASP, etc.
– “pre-installed”
• Application services are more specific
– high-end services with high resource requirements
– e.g., parallel equation solver, gene sequence comparison
– adaptive application grid service (AGS)
– an AGS has a front-end and a back-end
– the AGS back-end is hosted on an ARP
12
(AGS) Service Lifecycle
• Packaging
– leverage middleware
• Install
– decide on front-end location
• Deploy
– decide on back-end location
• Initialize
• Access
• Teardown
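The stages above form a strict sequence. A minimal sketch of the lifecycle as a linear state machine (the stage names follow the slide; the class and method names are illustrative, not part of the project's middleware):

```python
# Minimal sketch of the AGS lifecycle as a linear state machine.
# Stage names come from the slide; the class itself is illustrative.

STAGES = ["packaged", "installed", "deployed", "initialized", "accessible", "torn_down"]

class AGSLifecycle:
    def __init__(self):
        self.stage = "packaged"   # packaging produces the service archive

    def advance(self):
        """Move to the next lifecycle stage, refusing to skip steps."""
        i = STAGES.index(self.stage)
        if i == len(STAGES) - 1:
            raise RuntimeError("service already torn down")
        self.stage = STAGES[i + 1]
        return self.stage

svc = AGSLifecycle()
svc.advance()   # install: decide on front-end location
svc.advance()   # deploy: decide on back-end location
svc.advance()   # initialize
print(svc.stage)   # initialized
```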
13
Dynamic Service Architecture
[Architecture diagram: a client sends requests over SOAP/HTTP to the AGS front-end at the home site; the AGS Deployer pulls packages from the AGS repository and deploys the AGS back-end (AGS_factory plus instances, AGSIs) onto a remote-site ARP, whose Lease Manager (query, allocation, and deploy modules) grants leases over monitored resources; a Runtime Prediction Service, Request Manager, and Service Installer consult performance and status DBs of past workload; an Information Service / Registry handles register/query; all hosted on the Globus GT3 Grid Service platform. Arrows denote request/response, service instance creation, and register/query flows.]
14
[Implementation diagram: home and remote sites each run a Tomcat servlet engine with the AXIS framework (SOAP engine); the AGS front-end and AGS Deployer at the home site ship a WAR file over SOAP/HTTP to the ARP host node, where the Tomcat Manager and webapp loader/deployer dynamically load the AGS_factory web-app; the factory creates service instances on member nodes. Arrows denote request/response, service instance creation, and dynamic loading.]
15
Component API
installer {
    ServiceType install (ARP, PackageType);   // install AGS front-end service package on an ARP
    void uninstall (ServiceType);             // uninstall service front-end
}

AGS_Deployer {
    ServiceType deploy_AGS (PackageType, ARP); // deploy back-end service on a selected ARP
    void undeploy_AGS (ServiceType);           // undeploy service back-end
}

AGS_front_end {
    void add_new_AGS (ServiceType, LotType);  // inform front-end about new back-end AGS
    void remove_AGS (ServiceType);            // inform front-end that a back-end AGS has been removed
    PerfDataType get_perf_data ();            // returns performance data for the AGS
    // service-specific interfaces
    ...
}

Key decision-makers:
The AGS_front_end decides where a service request will be sent.
The AGS_Deployer decides where to deploy or re-deploy a service.
16
AGS API
AGS/AGSI {
    factory:   // this interface is supported only by the factory (not the instances, AGSIs)
        ServiceType create ();                       // create AGSI
        void init (LotType);                         // init AGS with lease info
        void shutdown ();                            // disable service
        void log_time (RequestType, TimeType);       // logs performance data for a completed request
        PerfDataType get_perf_data (TimeFrameType);  // perf. data for prior requests over a past time frame
    notification:
        void event_occurred (EventType);             // a subscribed event has occurred
    adaptation:
        void new_resource_lot (LotType);             // provide new lot to AGS factory (i.e. adding/removing resources)
        void add_resources (ResourceType);           // AGSI - add resources to this AGSI
        void remove_resources (ResourceType);        // AGSI - remove resources from this AGSI
        ...
    service-specific interfaces:
        ...
}

Key idea: the service must respond to resource fluctuation.
17
ARP API

AdaptiveResourceProvider {
    query:
        AmtType avail_amt (LeaseType);         // returns amt of avail. resources grantable for the desired lease
        LeaseType lease_length (AmtType);      // returns maximal lease for the desired amount of resources
        PFType platform_features (AmtType);    // returns the platform features of this ARP
        Boolean have_features (PFType);        // does the platform have specific features?
        ProfType get_profile (TimeFrameType);  // returns perf. profile over past/future time frame
    notification:
        void subscribe_event (EventType, AGS); // AGS wishes to subscribe to a particular resource event
    allocation:
        LotType alloc (LeaseType, AmtType);    // initial pool allocation req. to AGS
        Boolean dealloc (LeaseType, AmtType);  // dealloc resources to new amount
        Boolean realloc (LeaseType, AmtType);  // increase granted resources to new amount
        Boolean renew_lease (LotType, LeaseType, AmtType); // renew old lease (in LotType) to new lease
    usage:   // usage for ACP
        ServiceType deploy (PackageType, LotType); // deploy service package using valid Lot
        void undeploy (ServiceType);               // undeploy service from ARP
    ...
}

Key idea: resources are allocated and leased in type-specific lots; the ARP exposes its features.
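To make the allocation interface concrete, here is a hypothetical client-side negotiation against a stubbed-out ARP, in Python rather than the project's Java/GT3 setting. The `StubARP` class, `acquire` helper, and the CPU-count lot representation are all illustrative assumptions, not the real ARP implementation:

```python
# Illustrative stub of the ARP allocation interface and a lease-negotiation
# loop an AGS deployer might run. Amounts are CPU counts; a "lot" is a dict.

class StubARP:
    """Toy resource provider with a fixed pool; stands in for a real ARP."""
    def __init__(self, pool):
        self.pool = pool
        self.leased = 0

    def avail_amt(self, lease_len):
        # a real ARP would consult the lease length and its schedule
        return self.pool - self.leased

    def alloc(self, lease_len, amt):
        if amt > self.avail_amt(lease_len):
            return None                          # allocation denied
        self.leased += amt
        return {"amt": amt, "lease": lease_len}  # the granted "lot"

def acquire(arp, wanted, lease_len):
    """Ask for `wanted` CPUs; fall back to whatever the ARP will grant."""
    grant = min(wanted, arp.avail_amt(lease_len))
    return arp.alloc(lease_len, grant) if grant > 0 else None

arp = StubARP(pool=8)
lot = acquire(arp, wanted=16, lease_len=3600)
print(lot)   # {'amt': 8, 'lease': 3600}
```

The fall-back in `acquire` mirrors the query-then-allocate split in the API: the caller probes `avail_amt` before committing to an `alloc`.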
18
Community Service Testbed
[Testbed diagram: AGS front-ends, AGS factories, and ARPs with deployed services at the Univ. of Minnesota CS department (Linux/Solaris), the Univ. of Minnesota Supercomputing Institute (Solaris), and the Univ. of Virginia (Solaris), connected by LAN and WAN links; clients issue requests/responses and service instance creations.]

Scenarios (Request Flow):

Scenario        Client          AGS_Front-End   AGS_Factory
1: LAN-LAN      U of M (CS)     U of M (MSI)    U of M (CS)
2: LAN-WAN      U of M (CS)     U of M (MSI)    U of Virginia
3: WAN-LAN      U of Virginia   U of M (MSI)    U of M (CS)
4: WAN-WAN      U of M (CS)     U of Virginia   U of M (MSI)
19
System Architecture Results
20
[Four charts: Package Transfer Time over LAN and WAN for package sizes 1K-1M bytes and 5M-10M bytes, comparing Socket, HTTP, and SOAP transports; times range from hundreds of ms (LAN, small packages) to tens of seconds (WAN, 10M).]
Deployment/Installation Cost
SOAP penalty is about a factor of 2 (WAN)
21
Impact of SOAP Buffer Size
[Two charts: transfer time (ms) vs. SOAP buffer size over the WAN, for a 5MB file (buffers 10KB-3010KB) and a 500KB file (buffers 30KB-680KB).]
SOAP buffers must be sized appropriately
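The buffer-size effect can be captured with a toy cost model: each SOAP message pays a fixed per-message overhead, so undersized buffers multiply that overhead across many messages. The numbers below (RTT, bandwidth) are illustrative constants, not measurements from the testbed:

```python
import math

def transfer_time_ms(size_kb, buffer_kb, rtt_ms=40.0, bw_kb_per_ms=50.0):
    """Toy model: each SOAP message of `buffer_kb` pays a fixed per-message
    cost (rtt_ms), so too-small buffers multiply the overhead.
    All constants are illustrative, not testbed measurements."""
    messages = math.ceil(size_kb / buffer_kb)
    return messages * rtt_ms + size_kb / bw_kb_per_ms

# 5 MB file over the toy WAN: compare a small and a large buffer
small = transfer_time_ms(5 * 1024, buffer_kb=10)
large = transfer_time_ms(5 * 1024, buffer_kb=1024)
print(small > large)   # True: small buffers inflate total time
```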
22
Deployment
• Reconfiguration cost is linear in the package size and is on the order of a few seconds
• Transfer cost is also linear in the package size and is on the order of a few seconds (WAN) for both install and deploy
• Total cost is on the order of seconds
[Two charts: deployment cost vs. total library size (2K-500K), and WAN package transfer time (ms) for package sizes 1K-1M bytes under Socket, HTTP, and SOAP.]
23
End-to-end Latency
Service latency (time in msecs, 0-byte request):

Scenario    Latency
LAN-LAN     35
LAN-WAN     155
WAN-LAN     148
WAN-WAN     278

Latency components: client -> front-end, front-end -> back-end, instance creation, handle returned to front-end.
24
End-to-end Cost (Eigenvalue service)
[Four stacked-percentage charts (LAN-LAN, LAN-WAN, WAN-LAN, WAN-WAN) breaking down end-to-end time for problem sizes 1K-100K and a do-nothing request into execution time, front-end-to-factory time, and client-to-front-end time.]
25
Middleware
• Common middleware inside service components
– scheduling, performance prediction
• Scheduling
– Where to send a service request? How many resources to grant it? How many resources to lease to a service?
– Where to deploy?
• Performance prediction
– Key to scheduling
• Common = reusable
– Observation: the best performance predictor and scheduling technique are highly dependent on the service and the Grid
26
Solution: “Mixture of Experts”
[Architecture diagram: the AGS front-end holds a set of scheduling policies (Policy 1..M); in the AGS back-end, a Request Manager combines run-time predictors (Predictor 1..N, fed by a run-time history DB), scheduling policies, service codes, and an adaptive code library to turn requests into results.]

“Meta-level” algorithms – combinations of point algorithms
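The meta-level idea can be sketched concretely: keep several point predictors, score each against completed requests, and route predictions to the current best. The class below is a hedged Python illustration of this "mixture of experts" pattern, not the project's middleware code; the expert models and error metric are assumptions:

```python
# Sketch of "mixture of experts" run-time prediction: keep several point
# predictors, score each against the request history, and use the best so far.

class MixturePredictor:
    def __init__(self, experts):
        self.experts = dict(experts)          # name -> callable(size) -> time
        self.errors = {name: [] for name in self.experts}

    def predict(self, size):
        """Use the expert with the lowest mean absolute error so far."""
        def score(name):
            errs = self.errors[name]
            return sum(errs) / len(errs) if errs else 0.0
        best = min(self.experts, key=score)
        return best, self.experts[best](size)

    def observe(self, size, actual):
        """After a request completes, record every expert's error."""
        for name, f in self.experts.items():
            self.errors[name].append(abs(f(size) - actual))

mix = MixturePredictor({
    "linear":    lambda n: 2.0 * n,        # O(N) model
    "quadratic": lambda n: 0.01 * n * n,   # O(N*N) model
})
for n in (10, 20, 40):
    mix.observe(n, actual=0.01 * n * n)    # true cost is quadratic here
best, _ = mix.predict(30)
print(best)   # 'quadratic'
```

Note the slow-start behavior seen in the results: with an empty history every expert scores equally, so early predictions can be poor until enough requests have been observed.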
27
Middleware Results
28
Performance Prediction
• Meta-level performance prediction
History-Based Dynamic Performance Predictor (N-body simulation service)
[Chart: normalized error rate (%) for predictors O(N), O(N*N), O(N*N*N), filtering, dynamic selection (AVG), and dynamic selection (NORM).]

Slow Start -> High Accuracy (N-body simulation service)
[Chart: error rate (%) vs. request number (0-400), with large early errors converging toward zero.]
29
Scheduling: Where to send request?
Performance Comparison (N-body Simulation Service)
[Charts: average wait, run, and service times under the RR, WQL, MLQ, and DPS policies for a hybrid workload and an extreme workload; companion plots show the number of bodies per request over time for each workload.]

Where to send a request may depend on the service and the workload.
31
Scheduling: How many resources?
Performance Comparison (N-body simulation service)
[Chart: wait, run, and service times (sec) under the MOLDABLE, IDEAL, SRT_HARVEST, and IB_HARVEST policies.]

A service is leased a resource pool. How many resources to give to each request depends on the workload. Meta-level policies are the next step.
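One simple per-request policy in this family: given a runtime model for a moldable request, grant the smallest slice of the leased pool that meets a response-time target. The Amdahl-style model, serial fraction, and target below are illustrative assumptions, not the policies measured above:

```python
# Toy "how many CPUs per request" policy: with an Amdahl-style runtime model,
# pick the smallest allocation whose predicted runtime meets a target.
# The serial fraction and target are illustrative assumptions.

def runtime(work, cpus, serial_frac=0.05):
    """Predicted runtime: serial part plus parallel part divided across CPUs."""
    return work * (serial_frac + (1 - serial_frac) / cpus)

def cpus_for_target(work, target, pool):
    """Smallest CPU count in the leased pool meeting the target runtime."""
    for p in range(1, pool + 1):
        if runtime(work, p) <= target:
            return p
    return pool   # best we can do with the leased pool

print(cpus_for_target(work=100.0, target=10.0, pool=32))
```

Granting the minimum sufficient slice leaves the rest of the leased pool free for concurrent requests, which is where the workload dependence enters.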
32
Current Work
33
Optimizations
• Incremental Service Deployment
• Service Caching
[Chart: Incremental Service Upgrading vs. full Service Redeployment time (ms) for package sizes 2KB-500KB over the LAN.]

Impact of Service Caching
[Chart: time (ms) for package sizes 2KB-500KB, comparing service caching against redeploying through the LAN and through the WAN.]
34
Stochastic Leasing Model
• How many resources to lease to a service?
• Tradeoffs
– Holding resources has an associated cost proportional to the lease length
– Dynamically releasing/reacquiring may be more responsive to demand and availability, but the resource cost may be higher
• Probabilistic demand
– Random demand, random execution times
• Developed a dynamic programming model
– Models the cost tradeoffs and provides an optimal leasing policy
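To illustrate the shape of such a model (not the project's actual DP formulation), here is a stripped-down leasing problem: each period, choose how many CPUs to hold, trading a holding cost against a penalty for unmet random demand. The demand distribution, costs, and horizon are all made-up numbers, and the state space is deliberately trivial:

```python
# Sketch of a leasing cost model solved by exhaustive minimisation: per period,
# decide how many CPUs to hold; holding costs C_HOLD per CPU, unmet demand
# costs C_MISS per CPU. All parameters are illustrative assumptions.

DEMAND = {0: 0.2, 4: 0.5, 8: 0.3}   # demand (CPUs) -> probability
C_HOLD, C_MISS, POOL, HORIZON = 1.0, 3.0, 8, 3

def expected_cost(hold):
    """Expected one-period cost of holding `hold` CPUs."""
    miss = sum(p * max(d - hold, 0) for d, p in DEMAND.items())
    return C_HOLD * hold + C_MISS * miss

def optimal_lease():
    """With no carryover between periods, the optimal policy is the
    per-period minimiser, repeated over the horizon."""
    best = min(range(POOL + 1), key=expected_cost)
    return best, HORIZON * expected_cost(best)

hold, cost = optimal_lease()
print(hold)   # 4: holding the modal demand beats over- or under-provisioning
```

A full DP adds state (the current lease and its remaining length) and a release/reacquire decision each period, which is what makes the dynamic policy beat a static lease.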
35
DP vs. Static Leasing
~20% improvement; less variance
36
Testbed 2004
[Testbed diagram: MSI runs ACPs on IBM, SGI, and Solaris machines and the denali cluster, plus an ASP and a Storage AGS front-end (a1.msi); the CS department runs ACPs (katmai.cs), ASPs on ADCS Windows and Linux machines, the beo cluster, and front-ends for the GC, N-body, and Solver AGSs (sitka.cs, beo1.cs, s1.msi); a demo laptop hosts the user interface, status monitor, information service, service register, and resource monitors; AGS client requests enter via fairbanks over SSH, with LAN and WAN links connecting the sites.]
37
Remote Storage Web Service (ASP)
• Grid service to provide remote archival storage
• Transparent but restricted access to a cluster of storage disks in the ADCS lab
38
Related Projects
• Service environments and testbeds
– NetSolve (Dongarra, U. Tennessee)
– Ninf (Matsuoka, Tokyo IT)
– Open Grid Services Architecture (OGSA)
• Component architecture and interface
– XCAT (Gannon, U. Indiana)
– H2O (Sunderam, Emory)
– Composable Services (Karamcheti, NYU)
• Internet Server environment
– Sharc (Shenoy, U. Mass)
– Muse (Chase, Duke)
39
Summary
• Community Service Project
– Dynamic Grid Infrastructure
– Architecture, Middleware, Testbeds
• Addressing Dynamics and Reuse
– Service demand, Grid resources
– Adaptation at several levels
– Meta-level strategies to promote reuse
• For more info: community-services.cs.umn.edu
• Thanks to DOE and NSF
40
Future Work
• Customization
– how to expose and configure services to meet specific user needs: performance, fault tolerance, etc.
• Data-intensive Services
– large amounts of distributed data
– deploy services that can process/analyze this data
– extend our middleware and system architecture
• Multiple Services
– applications may wish to use multiple services together: pipelines are common in high-end scientific applications
• Customized environments
– collections of customized services configured for specific applications
41
Questions?