![Page 1: Principles & Architectures of Distributed Computationmurli/ISSGC06/session-05/DistCompPrinArch1.pdf · Examples of Distributed (Grid) Computing ... Two types of run replication: Small](https://reader035.vdocuments.us/reader035/viewer/2022070812/5f0b43c67e708231d42fa8a4/html5/thumbnails/1.jpg)
Principles & Architectures of Distributed Computation
Steven Newhouse
Contents
• Examples of Distributed (Grid) Computing
• Fallacies of Distributed (Grid) Computing
• Standards for Distributed Computing
What is Grid Computing?
Grid computing involves sharing heterogeneous resources (based on different platforms, hardware/software architectures, and computer languages), located in different places and belonging to different administrative domains, over a network using open standards.
Grids In A Nutshell
• Co-ordinated use of shared resources across multiple administrative domains amongst a dynamic user community.
• Evolution of various forms of distributed networked computing (HPC, Data, …)
• Various resources: compute, data, storage, instruments, sensors, etc.
• Very diverse scales & skills of use.
BUT what is important… is what you can do with them!
e-Research/e-Science/e-Industry
Grid Computing: A broad spectrum
• Volunteer Computing
  – ClimatePrediction.net
• Co-operative Resource Pools
  – GENIE
• Large-scale Computing & Data Grids
  – EGEE
• Federated Super-Computers
  – DEISA
ClimatePrediction.net
• Climate models are dependent on many parameters and assumptions
• Need to understand the sensitivity of these parameters & assumptions
• Explore through repeated computational runs
BOINC: http://boinc.berkeley.edu
Berkeley Open Infrastructure for Network Computing
Security Issues: Participants
• Software package is digitally signed.
• Communications are always initiated by the client.
• HTTP over a secure socket layer will be used where necessary to protect participant details and guarantee reliable data collection.
• Digitally signed files can be used where necessary.
Security Issues: Experiment
• Two types of run replication:
  – Small number of repeated identical runs.
  – Large numbers of initial condition ensembles.
• Checksum tracking of client package files to discourage casual tampering.
• Opportunity to repeat runs as necessary.
• Server security management and frequent backups.
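Checksum tracking of package files can be sketched as below. This is a minimal illustration, not the ClimatePrediction.net or BOINC implementation: the use of SHA-256, the manifest layout and all function names are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's relative path to its checksum (hypothetical manifest layout)."""
    root = Path(directory)
    return {
        str(p.relative_to(root)): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(directory, manifest):
    """Return files that were added, removed or modified since the manifest was built."""
    current = build_manifest(directory)
    return sorted(
        name for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

A server could record the manifest when the package is released and compare it against what a client reports. As the slide says, this only discourages casual tampering; a determined participant could recompute the checksums.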
GENIE – http://www.genie.ac.uk
• Flexibly couple together state-of-the-art components to form a unified Earth System Model (ESM)
• Execute the resulting ESM across a computational Grid
• Share the distributed data produced by simulation runs
• Provide a high-level open access to the system, creating and supporting virtual organisations of Earth System modellers
The Problem: Thermohaline circulation
• Ocean transports heat through the “global conveyor belt.”
• Heat transport controls global climate.
• Wish to investigate strength of model ocean circulation as a function of two external parameters.
• Use GENIE-Trainer.
• Wish to perform 31×31 = 961 individual simulations.
• Each simulation takes ∼4 hours to execute on a typical Intel P3/1GHz, 256MB RAM machine ⇒ time taken for 961 sequential runs ≈ 163 days!!!
The Results: Scientific Achievements
Intensity of the thermohaline circulation as a function of freshwater flux between Atlantic and Pacific oceans (DFWX), and mid-Atlantic and North Atlantic (DFWY).
Surface air temperature difference between extreme states (off - on) of the thermohaline circulation.
North Atlantic 2°C colder when the circulation is off.
time taken for 961 runs over ~200 machines ≈ 3 days
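The run-time figures can be checked with a few lines of arithmetic. The sketch below takes the quoted "∼4 hours" per run literally, which gives about 160 days sequentially; the 163 days on the earlier slide corresponds to slightly more than 4 hours per run. The "ideal parallel" figure assumes every one of the ~200 machines is always free, which is an assumption of this example and not something the slides claim.

```python
RUNS = 31 * 31            # 961 simulations in the parameter sweep
HOURS_PER_RUN = 4.0       # "~4 hours" per run, so results are approximate
MACHINES = 200            # "~200 machines"

sequential_days = RUNS * HOURS_PER_RUN / 24

# Ideal case: every machine is always available and runs are scheduled in waves.
waves = -(-RUNS // MACHINES)          # ceiling division
ideal_parallel_days = waves * HOURS_PER_RUN / 24

print(f"sequential: {sequential_days:.0f} days")
print(f"ideal parallel: {ideal_parallel_days:.2f} days")
```

The ideal bound is under a day, while the measured figure is about 3 days. The slides do not break down the difference.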
Extensive use of Condor
• Condor pools at:
  – Imperial College London
  – Southampton
• Evolving Infrastructure
  – Initially very simple Condor job
  – Portal interface to define & start Condor job
  – Improved data retrieval and visualisation interfaces
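As an illustration of what a "very simple Condor job" for a parameter sweep might look like, the sketch below generates a Condor submit description that queues one job per parameter pair. The slides do not show the actual GENIE set-up: the executable name `genie_run`, the parameter range and the file naming are all hypothetical.

```python
def sweep_submit_description(n=31, executable="genie_run", lo=0.0, hi=1.0):
    """Build one submit description that queues n*n jobs.

    `executable`, the parameter range and the file naming are hypothetical.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "log = sweep.log",
    ]
    step = (hi - lo) / (n - 1)
    for i in range(n):
        for j in range(n):
            x = lo + i * step
            y = lo + j * step
            # Per-job settings, followed by a queue statement for that job.
            lines += [
                f"arguments = {x:.4f} {y:.4f}",
                f"output = run_{i}_{j}.out",
                f"error = run_{i}_{j}.err",
                "queue",
            ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = sweep_submit_description()
    print(text.count("queue\n"), "jobs described")
```

The generated text would be saved to a file and handed to `condor_submit`. A portal interface, as mentioned above, would hide this step from the modeller.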
What Condor provides…
• Transparent execution environment
• Sandboxing to support remote execution
• Scheduling within & between pools
• Information on available resources
Commodity Production Computing & Data Grid
• A production grid is (inevitably) a collection of solutions customised for a particular scenario:
  – Scalability
  – Reliability
  – Authentication, Authorisation & Accounting
  – Portability
  – Customer satisfaction
CERN: Large Hadron Collider (LHC)
Raw data: 1 Petabyte / sec
Filtered: 100 MByte / sec = 1 Petabyte / year = 1 million CD-ROMs
CMS Detector
EGEE: A Production Grid
• Enabling Grids for E-SciencE
• EU project – funded until March 2008
  – EDG (European Data Grid): 2001-2004
  – EGEE-I: 2004-2006
• Focus on production infrastructure
  – Multiple VOs – large & small
  – Grid Operational Centres – to monitor infrastructure
• It's big & growing
  – 100+ sites, 10K+ nodes, 6PB+ storage
• http://www.eu-egee.org
Workload Management System
EGEE Integrated Software (gLite): Condor, Globus + EGEE
10,000+ concurrent jobs
Super-computing Grids
• DEISA – Distributed European Infrastructure for Super-computing Applications
  – Linked IBM (AIX) machines across Europe
  – Expansion to other systems during the project
  – Standard submission interface from Unicore
  – General Parallel File System (GPFS)
  – Dedicated networking infrastructure
• USA TeraGrid & UK NGS
  – Pre-web service GT4 components
SPICE: Simulated Pore Interactive Computing Environment (Coveney et al)
• NGS & TeraGrid during SuperComputing 05
SPICE Architecture
[Figure: steering clients, a simulation and visualizations (each linked to a steering library), a registry, steering grid services (GS) and displays. Labelled interactions: connect, publish, find, bind; data transfer (Globus-IO). Layers: Clients, Services, Resources, Display Devices.]
Components start independently and attach/detach dynamically.
The Future for HPC Grids
So where does that get us…?
• Discovery:
  – Condor Collector, SPICE Registry
  – gLite Information Index, BOINC Task Distribution
• Selection:
  – Condor Matchmaker
  – gLite Workload Management System
  – Unicore Abstract Job Objects
• Execution:
  – Condor, pre-WS GRAM, GridSAM, Unicore
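The three stages can be sketched as a toy pipeline. This is only an illustration of the discovery, selection and execution pattern: the dictionary-based resource adverts and the function names are assumptions, and none of it reflects the actual interfaces of Condor, gLite or Unicore.

```python
def discover(registry, kind):
    """Discovery: return the advertised resources of a given kind."""
    return [ad for ad in registry if ad["kind"] == kind]


def select(candidates, min_memory_mb, os_name):
    """Selection: keep resources meeting the job's requirements, prefer the least loaded."""
    suitable = [
        ad for ad in candidates
        if ad["memory_mb"] >= min_memory_mb and ad["os"] == os_name
    ]
    return min(suitable, key=lambda ad: ad["load"]) if suitable else None


def execute(resource, command):
    """Execution: a stand-in that only records where the command would run."""
    return f"{command} -> {resource['name']}"


# Hypothetical adverts, as a registry or collector might hold them.
registry = [
    {"name": "node-a", "kind": "compute", "os": "linux", "memory_mb": 256, "load": 0.9},
    {"name": "node-b", "kind": "compute", "os": "linux", "memory_mb": 512, "load": 0.2},
    {"name": "store-1", "kind": "storage", "os": "linux", "memory_mb": 128, "load": 0.1},
]

chosen = select(discover(registry, "compute"), min_memory_mb=256, os_name="linux")
print(execute(chosen, "./simulate"))
```

Both compute nodes satisfy the requirements, so the least loaded one (`node-b`) is chosen.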
The Eight Fallacies of Distributed Computing (and the Grid)
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
I also interpret the ‘network’ as the ‘things’ I want to connect to
From Deutsch & Gosling
The network is reliable
• Services are here today & gone tomorrow
  – DNS provides a level of abstraction above IPs
  – Defensive programming – test failure & success
• ‘Well known’ registries collect information
  – GT4 Index Service, Condor Collector, gLite II
• Good naming schemes protect against failure
  – Human → Abstract → Address (e.g. Services)
  – Services can migrate to new locations
• Condor uses stateless UDP protocol
  – Very good recovery behaviour following failure
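Defensive programming against unreliable services can be sketched as retry with failover across several well-known registries. The endpoint names, the `lookup` operation and the retry policy below are hypothetical, chosen only to show the pattern of testing for failure as well as success.

```python
import time


def call_with_failover(endpoints, operation, attempts_per_endpoint=2, delay_s=0.0):
    """Try each endpoint in turn; retry transient failures before moving on."""
    errors = []
    for endpoint in endpoints:
        for attempt in range(attempts_per_endpoint):
            try:
                return operation(endpoint)
            except ConnectionError as exc:
                errors.append((endpoint, attempt, str(exc)))
                time.sleep(delay_s * (attempt + 1))  # simple linear back-off
    raise ConnectionError(f"all endpoints failed: {errors}")


def lookup(endpoint):
    """Hypothetical registry query; the first registry is 'gone tomorrow'."""
    if endpoint == "registry-1.example.org":
        raise ConnectionError("connection refused")
    return f"service list from {endpoint}"


result = call_with_failover(
    ["registry-1.example.org", "registry-2.example.org"], lookup
)
print(result)
```

The caller addresses registries by name rather than by IP address, so a service that migrates can be found again without changing client code.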
Latency is zero
• Service calls can take a long time to complete
  – Run an HPC climate simulation (~ several days)
• Build in asynchronous communication
  – Start an activity
  – Subscribe to a notification interface
  – Messages delivered to you following ‘events’
  – Firewalls mean that ‘polling’ for state changes is still an option
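The start, subscribe and notify pattern, with polling as a fallback, can be sketched in a single process using threads. This is a local stand-in for what would be remote service calls; the `Activity` class and its method names are assumptions made for the example.

```python
import threading
import time


class Activity:
    """A long-running activity that can be polled or can notify subscribers."""

    def __init__(self, work):
        self._work = work
        self._subscribers = []
        self._done = threading.Event()
        self.result = None

    def subscribe(self, callback):
        """Register a callback to be invoked with the result on completion."""
        self._subscribers.append(callback)

    def start(self):
        """Start the activity in the background and return immediately."""
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        self.result = self._work()
        for callback in self._subscribers:
            callback(self.result)  # 'event' delivered to each subscriber
        self._done.set()

    def is_done(self):
        """Polling interface, for callers that cannot receive notifications."""
        return self._done.is_set()


def simulate():
    time.sleep(0.05)  # stands in for a run lasting several days
    return "simulation finished"


events = []
activity = Activity(simulate)
activity.subscribe(events.append)
activity.start()

while not activity.is_done():   # the polling alternative
    time.sleep(0.01)

print(events)
```

Notification needs the service to open a connection back to the client, which a firewall may block. Polling only needs outbound calls from the client, which is why it remains useful.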
Bandwidth is infinite
• Reliable File Transfer (RFT) within GT4
  – Web Service interface to control GridFTP transfers
  – Asynchronous service call initiates transfers
  – Transfers may take significant time
  – Notifications during the transfer & on completion
• LHC Service Challenges (2005)
  – Demonstrate sustained transfers from Tier 0 → 1
  – 600MB/s daily average over 10 days
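To put the Service Challenge figure in perspective, the total volume moved follows directly from the rate and duration. The sketch assumes decimal units (1 TB = 10^6 MB), which the slide does not specify.

```python
RATE_MB_PER_S = 600          # daily average from the slide
DAYS = 10
SECONDS_PER_DAY = 24 * 60 * 60

total_mb = RATE_MB_PER_S * SECONDS_PER_DAY * DAYS
total_tb = total_mb / 1_000_000      # decimal units: 1 TB = 10^6 MB

print(f"{total_tb:.1f} TB moved in {DAYS} days")
```

That is roughly half a petabyte in ten days, so even at this sustained rate bandwidth remains a finite resource that has to be planned for.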
The Network is Secure
• Need to enable sharing through trust
  – Need to know who is accessing the resource
  – Need to control who can access the resource
  – Need to know how much of the resource they used
  – Need to know what they did and when
• Avoid tampering
  – Digital signatures (also WS-Security)
• Provide confidentiality
  – https & WS-SecureConversation
Topology does not change
• Abstraction layers provide some protection
  – Naming
• Some simple autonomic behaviour
  – GridFTP configurable TCP parameters & channels
• Don’t make assumptions on topology in the design!
There is one administrator
• Clearly many, many, many administrators
  – Networks are partitioned & owned
  – Local Resources are ‘owned’
• Virtual Organisation models critical
  – Provide a virtual administration point
  – Used by local administrators to configure resources
Transport cost is zero
• Cost has many interpretations…
• For reliable data transfers need dedicated networks
  – UKLight – 10 Gb/s optical network
• More flexibility by negotiating quality of service
  – Some protocol support within networks… between networks it is difficult
• Heavy users will have to pay
  – Dedicated infrastructure → $$$
The network is homogeneous
• Heterogeneity assumed in most middleware
  – Low/High encoded Endian-ness for data
• Growing use of Web Services
  – XML encoding for messages
  – Java a common development environment
• Use of open standards for virtualisation
  – Well defined ‘application’ service interfaces
    – Standards Organisations: Open Grid Forum
  – Widely adopted Web Service plumbing
    – OASIS (Organisation for the Advancement of Structured Information Standards): WS-ResourceFramework
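The endianness point can be made concrete with Python's standard `struct` module. The sketch shows the same 32-bit integer encoded in both byte orders, and what happens when a receiver assumes the wrong one; the value itself is arbitrary.

```python
import struct

value = 0x0A0B0C0D

little = struct.pack("<I", value)   # least significant byte first
big = struct.pack(">I", value)      # most significant byte first ("network order")

print(little.hex(), big.hex())

# Reading bytes with the wrong assumption silently yields a different number.
misread = struct.unpack("<I", big)[0]
print(hex(misread))
```

Middleware that records the byte order alongside the data, or that uses a text encoding such as XML, avoids this class of error at the cost of some overhead.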
Defining Open Standards
• Bespoke proprietary protocols lead to wrapping rather than interoperability
  – Condor: Native communication library
  – GT4: WS-ResourceFramework
    – WS-GRAM for job submission
  – UNICORE: ****
  – gLite: Own protocol & interfaces but use others:
    – Globus (pre WS-GRAM) & Condor-C
• There has to be a better way…
Three Generations of Grid
• Local “metacomputers”
  – Distributed file systems
  – Site-wide single sign-on
• "Metacenters" explore inter-organizational integration
• Totally custom-made, top-to-bottom: proofs-of-concept
• Utilize software services and communications protocols developed by grid projects:
  – Condor, Globus, UNICORE, Legion, etc.
• Need significant customization to deliver complete solution
• Interoperability is still very difficult!
• Common interface specifications support interoperability of discrete, independently developed services
• Competition and interoperability among applications, toolkits, and implementations of key services
Standardization is key for third-generation grids!
We are here!
Source: Charlie Catlett
Open Grid Services Architecture
• Security
  – Cross-organizational users
  – Trust nobody
  – Authorized access only
• Information Services
  – Registry
  – Notification
  – Logging/auditing
• Execution Management
  – Job description & submission
  – Scheduling
  – Resource provisioning
• Data Services
  – Common access facilities
  – Efficient & reliable transport
  – Replication services
• Self-Management
  – Self-configuration
  – Self-optimization
  – Self-healing
• Resource Management
  – Discovery
  – Monitoring
  – Control
[Figure layers: OGSA, OGSA “profiles”, Web services foundation]
Execution Management Services (EMS) within OGSA
[Figure: EMS components]
• Provisioning
  – Deployment
  – Configuration
• Information Services
• Service Container
• Accounting Services
• Execution Planning Services
• Candidate Set Generator (Work-Resource mapping)
• Job Manager
• Reservation
Basic Execution Management
• The basic problem
  – Provision, execute and manage services/resources in a grid
  – Allow for legacy applications
1. Describe the job (JSDL)
2. Select from or deploy required resources (CDL)
3. Submit the job
4. Manage the job
Configuration & Deployment: CDL
• Prepare the infrastructure so that the job can execute
  – Provide a right-shaped slot to fit the job
• Main parts:
  – Configuration Description Language (CDL) provides declarative definition of system configuration
  – Deployment service carries out configuration requests to deploy and configure the system
[Figure: CDL is used to prepare the IT infrastructure]
Describing a Job: JSDL
• Job Submission Description Language (JSDL)
  – A language for describing the requirements of jobs for submission
  – Declarative description
• A JSDL document describes the job requirements
  – Job identification information
  – Application (e.g., executable, arguments)
  – Required resources (e.g., CPUs, memory)
  – Input/output files
[Figure: a JSDL document describes the Job submitted to the IT infrastructure]
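The four parts of a job description can be mirrored in a small data structure. The sketch below is an in-memory stand-in, not the JSDL XML schema: the field names, default values and the `fits` helper are assumptions for illustration, and a real JSDL document would be XML written against the published specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobDescription:
    """Illustrative stand-in for the four parts of a JSDL document (not the XML schema)."""
    job_name: str                                   # job identification
    executable: str                                 # application
    arguments: List[str] = field(default_factory=list)
    cpu_count: int = 1                              # required resources
    memory_mb: int = 256
    input_files: List[str] = field(default_factory=list)    # input/output files
    output_files: List[str] = field(default_factory=list)

    def fits(self, free_cpus: int, free_memory_mb: int) -> bool:
        """True if a resource with this free capacity satisfies the requirements."""
        return free_cpus >= self.cpu_count and free_memory_mb >= self.memory_mb


job = JobDescription(
    job_name="thc-sweep-0-0",
    executable="genie_run",          # hypothetical executable name
    arguments=["0.0", "0.0"],
    output_files=["run_0_0.out"],
)
print(job.fits(free_cpus=2, free_memory_mb=512))
```

Because the description is declarative, it states what the job needs and leaves the choice of resource to whatever service consumes it.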
OGSA Basic Execution Service
• BES_Factory
  – CreateActivityFromJSDL
• BES_Activity_Management
  – GetActivityStatus
  – RequestActivityStateChanges
  – GetActivityJSDLDocuments
• BES_Container_Management
  – StopAcceptingNewActivities
  – StartAcceptingNewActivities
  – IsAcceptingNewActivities
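How the three groups of operations fit together can be sketched with a toy in-memory container. Only the operation names come from the slide. The argument and return types, the state names ("Pending", "Running") and the placeholder job document are assumptions, and a real service would expose these operations as Web Service calls.

```python
import itertools


class ToyBESContainer:
    """In-memory sketch using the operation names listed on the slide."""

    def __init__(self):
        self._accepting = True
        self._activities = {}
        self._ids = itertools.count(1)

    # BES_Factory
    def CreateActivityFromJSDL(self, jsdl_document):
        if not self._accepting:
            raise RuntimeError("container is not accepting new activities")
        activity_id = f"activity-{next(self._ids)}"
        self._activities[activity_id] = {"jsdl": jsdl_document, "state": "Pending"}
        return activity_id

    # BES_Activity_Management
    def GetActivityStatus(self, activity_ids):
        return [self._activities[a]["state"] for a in activity_ids]

    def RequestActivityStateChanges(self, changes):
        for activity_id, new_state in changes.items():
            self._activities[activity_id]["state"] = new_state

    def GetActivityJSDLDocuments(self, activity_ids):
        return [self._activities[a]["jsdl"] for a in activity_ids]

    # BES_Container_Management
    def StopAcceptingNewActivities(self):
        self._accepting = False

    def StartAcceptingNewActivities(self):
        self._accepting = True

    def IsAcceptingNewActivities(self):
        return self._accepting


container = ToyBESContainer()
job_id = container.CreateActivityFromJSDL("<JobDefinition/>")  # placeholder document
container.RequestActivityStateChanges({job_id: "Running"})
print(container.GetActivityStatus([job_id]))
```

The factory creates activities, the activity operations query and change them, and the container operations let an administrator drain the service before maintenance.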
Summary
• Flexibility & composability in software design
  – You cannot predict how your software will be used
• Grids are (by definition) very diverse
  – But what you do with them (e-Science/e-Industry) is important
• There will NOT be a single grid middleware
  – Open standards & interoperability key
Thank you…
• Acknowledgements:
  – OGSA-WG
  – Listed organisations
• Questions?