[email protected] grid infrastructure
Post on 20-Dec-2015
220 views
TRANSCRIPT
Grid Infrastructure
What is it ?
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 3
SERVERS
Clients
IT all about IT
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 4
Hardware utilization
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 5
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 6
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 7
THE WORLD NEEDS ONLY FIVE COMPUTERS (Thomas J. Watson)
• Google grid• Microsoft's live.com • Yahoo!• Amazon.com• eBay• Salesforce.com
Well, that's O(5) ;)
Greg Matter (http://blogs.sun.com/Gregp/entry/the_world_needs_only_five)
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 8
Scaling• Scale-up
– Add more resources within the system– Does not requires changes in the applications– Limited extension– Singe point of failure
• Scape-out– Add more systems– Architecture dependent (needs change of code)– Economically
• Howto ?– Split the operation into groups– Perform each group on a different machine
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 9
How fast can parallelization be ?
• Let:– α be the proportion of the process that can not be
parallelized.– P – number of processors– S – System speedup
Amdhals law:
S = 1 / (α + (1- α ) / P )
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 10
Cluster types• High availability
– Active-Active– Active-Passive– Heart beat
• Load Balancing Cluster– Round robin (weighted/non-weighted)– System status aware (session, cpu load, etc)
• Compute cluster– Queuing system (condor, hadoop, open-pbs, LSF, etc.)– Single system image (ScaleMP, SSI, Mosix, nomad,etc.)
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 11
Batch to On-Line scale
gLite
&
Globus
Dedicated resources
PBS Torque
Utility computing
(Condor)hadoop
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 12
Key Cloud Services Attributes• Off-Site, Thirds-party provider• Access via Internet• Minimal/no IT skills required to “implement”• Provisioning - self-service requesting; near
real-time deployment; dynamic & fine-grained scaling
• Fine-grained usage-based pricing model• UI - browser and successors• Web services APIs as System Interface• Shared resources/common versions
Source: IDC, Sep 2008
What is “Grid”
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 14
What is Grid Computing ?
Definition is not widely agreed
Foster & Kesselman:
• Computing resources are not administered centrally.
• Open standards are used.
• Non-trivial quality of service is achieved.
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 15
Other definitions
• "the technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations." (Plaszczak/Wellner)
• "a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed autonomous resources dynamically at runtime depending on their availability, capability, performance, cost, and users' quality-of-service requirements“ (Buyya )
• "a service for sharing computer power and data storage capacity over the Internet." (CERN)
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 16
SOA & Web services
• Decompose processing into services
• Each service works independently
• Main components:– Universal Description, Discovery and Integration– Simple Object Access Protocol – Web Services Description Language
• W3C standard
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 17
The Grid Metaphor
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 18
GRID
MIDDLEWARE
Visualising
Workstation
Mobile Access
Supercomputer, PC-Cluster
Data-storage, Sensors, Experiments
Internet, networks
Virtual Organization
• What’s a VO?– People in different organisations
seeking to cooperate and share resources across their organisational boundaries
• Why establish a Grid?– Share data– Pool computers– Collaborate
• The initial vision: “The Grid”• The present reality: Many “grids” • Each grid is an infrastructure
enabling one or more “virtual organisations” to share computing resources
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 19
Institute A
VO1
Institute C
Institute B
Institute D
Institute E
VO2Institute F
Stand alone computer
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 20
Hardware
Operating system
Application
Stand alone computer
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 21
Hardware
Operating system
Network stack
Application
Stand alone computer
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 22
Hardware
Operating system
Network stack
Grid Middleware
Application
Grid middleware
• Middleware, is an interfaces between resources and the applications
– User/Program Interface– Resource management– Connectivity – Information services– Collaboration
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 23
Middleware components – The batch approach
Eddie Aronovich – Operating System course (TAU CS, Jan 2009) 24
Information Information ServiceService
SE & CE info
Pu
blis
h
Input “sandbox” + Broker Info
ReplicaReplicaCatalogueCatalogueDataSets info
Logging &Logging &Book-keepingBook-keeping
Author.&Authen.
StorageStorageElementElement
ComputingComputingElementElement
Output “sandbox”
ResourceResourceBrokerBroker
Job Status
Job S
ub
mit
Even
t
Job
Qu
ery
Job
Stat
us
Input “sandbox”
Output “sandbox”
““User User interface”interface”
UI
NetworkServer
Job Contr.
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
Characts.& status
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
submitted
Job Status
UI: allows users to access the functionalitiesof the WMS(via command line, GUI, C++ and Java APIs)
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
edg-job-submit myjob.jdlMyjob.jdl
JobType = “Normal”;Executable = "$(CMS)/exe/sum.exe";InputSandbox = {"/home/user/WP1testC","/home/file*”, "/home/user/DATA/*"};OutputSandbox = {“sim.err”, “test.out”, “sim.log"};Requirements = other. GlueHostOperatingSystemName == “linux" && other. GlueHostOperatingSystemRelease == "Red Hat 7.3“ && other.GlueCEPolicyMaxCPUTime > 10000;Rank = other.GlueCEStateFreeCPUs;
submitted
Job Statu
s
Job Description Language(JDL) to specify job characteristics and requirements
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
RBstorage
Input Sandboxfiles
Jobwaiting
submitted
Job StatusNS: network daemon
responsible for acceptingincoming requests
Job submission
UI
NetworkServer
Job Contr.-
CondorG
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
RBstorage
waiting
submitted
Job Status
WM: acts to satisfy the request
Job
Workload manager
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
RBstorage
waiting
submitted
Job Status
Match-Maker/Broker
Where must thisjob be executed ?
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
RBstorage
waiting
submitted
Job Status
Match-Maker/ Broker
Matchmaker: responsible to find the “best” CE for a job
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
RBstorage
waiting
submitted
Job Status
Match-Maker/ Broker
Where are (which SEs) the needed data ?
What is thestatus of the
Grid ?
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
RBstorage
waiting
submitted
Job Status
Match-Maker/Broker
CE choice
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
RBstorage
waiting
submitted
Job Status
JobAdapter
Job Adapter: responsible for the final “touches” to the job before performing submission(e.g. creation of wrapper script, PFN, etc.)
Job submission
UI
NetworkServer
Job Contr.
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
RBstorage
Job Status
Job Controller: responsible for theactual job managementoperations (done via CondorG)
Job
submitted
waiting
ready
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
CE characts& status
SE characts& status
RBstorage
Job Status
Job
submitted
waiting
ready
scheduled
“Compute element” – reminder!
Homogeneous set of worker nodes
Grid gate node
Local resource management system:Condor / PBS / LSF master
Globus gatekeeper
Job request
Info system
Logging
gridmapfile
I.S.
Logging
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
RBstorage
Job Status
submitted
waiting
ready
scheduled
running
“Grid enabled”data transfers/
accesses
Job
InputSandboxfiles
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
RBstorage
Job Status
OutputSandboxfiles
submitted
waiting
ready
scheduled
running
done
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
RBstorage
Job Status
submitted
waiting
ready
scheduled
running
done
edg-job-get-output <dg-job-id>
Job submission
UI
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ReplicaLocationServer
Inform.Service
ComputingElement
StorageElement
RB node
RBstorage
Job Status
OutputSandboxfiles
submitted
waiting
ready
scheduled
running
done
cleared
Job monitoring
UI
Log Monitor
Logging &Bookkeeping
NetworkServer
Job Contr.-
CondorG
WorkloadManager
ComputingElement
RB node
LM: parses CondorG logfile (where CondorG logsinfo about jobs) and notifies LB
LB: receives and stores job events; processes corresponding job status
Log ofjob events
edg-job-status <dg-job-id>edg-job-get-logging-info <dg-job-id>
Job status
EGEE08 Istanbul, Turkey
www.eu-egi.eu
43
Grids in Europe•www.eu-egi.eu
•Prof. Dieter KRANZLMUELLER , EGEE 08