[email protected] grid infrastructure

42
[email protected] Grid Infrastructure

Post on 20-Dec-2015

220 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

[email protected]

Grid Infrastructure

Page 2: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

What is it ?

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 3

SERVERS

Clients

Page 5: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 6

Page 6: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 7

Page 7: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

THE WORLD NEEDS ONLY FIVE COMPUTERS (Thomas J. Watson)

• Google grid• Microsoft's live.com • Yahoo!• Amazon.com• eBay• Salesforce.com

Well, that's O(5) ;)

Greg Matter (http://blogs.sun.com/Gregp/entry/the_world_needs_only_five)

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 8

Page 8: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Scaling• Scale-up

– Add more resources within the system– Does not requires changes in the applications– Limited extension– Singe point of failure

• Scape-out– Add more systems– Architecture dependent (needs change of code)– Economically

• Howto ?– Split the operation into groups– Perform each group on a different machine

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 9

Page 9: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

How fast can parallelization be ?

• Let:– α be the proportion of the process that can not be

parallelized.– P – number of processors– S – System speedup

Amdhals law:

S = 1 / (α + (1- α ) / P )

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 10

Page 10: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Cluster types• High availability

– Active-Active– Active-Passive– Heart beat

• Load Balancing Cluster– Round robin (weighted/non-weighted)– System status aware (session, cpu load, etc)

• Compute cluster– Queuing system (condor, hadoop, open-pbs, LSF, etc.)– Single system image (ScaleMP, SSI, Mosix, nomad,etc.)

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 11

Page 11: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Batch to On-Line scale

gLite

&

Globus

Dedicated resources

PBS Torque

Utility computing

(Condor)hadoop

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 12

Page 12: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Key Cloud Services Attributes• Off-Site, Thirds-party provider• Access via Internet• Minimal/no IT skills required to “implement”• Provisioning - self-service requesting; near

real-time deployment; dynamic & fine-grained scaling

• Fine-grained usage-based pricing model• UI - browser and successors• Web services APIs as System Interface• Shared resources/common versions

Source: IDC, Sep 2008

Page 13: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

What is “Grid”

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 14

Page 14: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

What is Grid Computing ?

Definition is not widely agreed

Foster & Kesselman:

• Computing resources are not administered centrally.

• Open standards are used.

• Non-trivial quality of service is achieved.

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 15

Page 15: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Other definitions

• "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." (Plaszczak/Wellner)

• "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements“ (Buyya )

• "a service for sharing computer power and data storage capacity over the Internet." (CERN)

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 16

Page 16: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

SOA & Web services

• Decompose processing into services

• Each service works independently

• Main components:– Universal Description, Discovery and Integration– Simple Object Access Protocol – Web Services Description Language

• W3C standard

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 17

Page 17: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

The Grid Metaphor

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 18

GRID

MIDDLEWARE

Visualising

Workstation

Mobile Access

Supercomputer, PC-Cluster

Data-storage, Sensors, Experiments

Internet, networks

Page 18: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Virtual Organization

• What’s a VO?– People in different organisations

seeking to cooperate and share resources across their organisational boundaries

• Why establish a Grid?– Share data– Pool computers– Collaborate

• The initial vision: “The Grid”• The present reality: Many “grids” • Each grid is an infrastructure

enabling one or more “virtual organisations” to share computing resources

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 19

Institute A

VO1

Institute C

Institute B

Institute D

Institute E

VO2Institute F

Page 19: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Stand alone computer

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 20

Hardware

Operating system

Application

Page 20: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Stand alone computer

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 21

Hardware

Operating system

Network stack

Application

Page 21: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Stand alone computer

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 22

Hardware

Operating system

Network stack

Grid Middleware

Application

Page 22: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Grid middleware

• Middleware, is an interfaces between resources and the applications

– User/Program Interface– Resource management– Connectivity – Information services– Collaboration

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 23

Page 23: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Middleware components – The batch approach

Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 24

Information Information ServiceService

SE & CE info

Pu

blis

h

Input “sandbox” + Broker Info

ReplicaReplicaCatalogueCatalogueDataSets info

Logging &Logging &Book-keepingBook-keeping

Author.&Authen.

StorageStorageElementElement

ComputingComputingElementElement

Output “sandbox”

ResourceResourceBrokerBroker

Job Status

Job S

ub

mit

Even

t

Job

Qu

ery

Job

Stat

us

Input “sandbox”

Output “sandbox”

““User User interface”interface”

Page 24: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

UI

NetworkServer

Job Contr.

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

Characts.& status

Page 25: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

submitted

Job Status

UI: allows users to access the functionalitiesof the WMS(via command line, GUI, C++ and Java APIs)

Page 26: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

edg-job-submit myjob.jdlMyjob.jdl

JobType = “Normal”;Executable = "$(CMS)/exe/sum.exe";InputSandbox = {"/home/user/WP1testC","/home/file*”, "/home/user/DATA/*"};OutputSandbox = {“sim.err”, “test.out”, “sim.log"};Requirements = other. GlueHostOperatingSystemName == “linux" && other. GlueHostOperatingSystemRelease == "Red Hat 7.3“ && other.GlueCEPolicyMaxCPUTime > 10000;Rank = other.GlueCEStateFreeCPUs;

submitted

Job Statu

s

Job Description Language(JDL) to specify job characteristics and requirements

Page 27: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

Input Sandboxfiles

Jobwaiting

submitted

Job StatusNS: network daemon

responsible for acceptingincoming requests

Page 28: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

WM: acts to satisfy the request

Job

Workload manager

Page 29: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

Match-Maker/Broker

Where must thisjob be executed ?

Page 30: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

Match-Maker/ Broker

Matchmaker: responsible to find the “best” CE for a job

Page 31: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

Match-Maker/ Broker

Where are (which SEs) the needed data ?

What is thestatus of the

Grid ?

Page 32: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

Match-Maker/Broker

CE choice

Page 33: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

waiting

submitted

Job Status

JobAdapter

Job Adapter: responsible for the final “touches” to the job before performing submission(e.g. creation of wrapper script, PFN, etc.)

Page 34: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

Job Status

Job Controller: responsible for theactual job managementoperations (done via CondorG)

Job

submitted

waiting

ready

Page 35: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

CE characts& status

SE characts& status

RBstorage

Job Status

Job

submitted

waiting

ready

scheduled

Page 36: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

“Compute element” – reminder!

Homogeneous set of worker nodes

Grid gate node

Local resource management system:Condor / PBS / LSF master

Globus gatekeeper

Job request

Info system

Logging

gridmapfile

I.S.

Logging

Page 37: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

RBstorage

Job Status

submitted

waiting

ready

scheduled

running

“Grid enabled”data transfers/

accesses

Job

InputSandboxfiles

Page 38: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

RBstorage

Job Status

OutputSandboxfiles

submitted

waiting

ready

scheduled

running

done

Page 39: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

RBstorage

Job Status

submitted

waiting

ready

scheduled

running

done

edg-job-get-output <dg-job-id>

Page 40: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job submission

UI

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ReplicaLocationServer

Inform.Service

ComputingElement

StorageElement

RB node

RBstorage

Job Status

OutputSandboxfiles

submitted

waiting

ready

scheduled

running

done

cleared

Page 41: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

Job monitoring

UI

Log Monitor

Logging &Bookkeeping

NetworkServer

Job Contr.-

CondorG

WorkloadManager

ComputingElement

RB node

LM: parses CondorG logfile (where CondorG logsinfo about jobs) and notifies LB

LB: receives and stores job events; processes corresponding job status

Log ofjob events

edg-job-status <dg-job-id>edg-job-get-logging-info <dg-job-id>

Job status

Page 42: Eddie.Aronovich@cs.tau.ac.il Grid Infrastructure

EGEE08 Istanbul, Turkey

www.eu-egi.eu

43

Grids in Europe•www.eu-egi.eu

•Prof. Dieter KRANZLMUELLER , EGEE 08