Manchester Computing: Supercomputing, Visualization & e-Science
CS 602 — eScience and Grids
TRANSCRIPT
John Brooke [email protected] Donal Fellows [email protected]
Lecture 1: What is a Grid
We examine how the Grid concept arose and its relation to other concepts such as e-Science and CyberInfrastructure, then develop a more precise definition of a Computational Grid. There are other types of Grid, but the Computational Grid is the main focus of this module.
CS6023
e-Science
“In the future, e-Science will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualisation back to the individual user scientists.”
Dr John Taylor, Director General of the Research Councils, OST
Cyber Infrastructure
• Term coined by a US Blue Ribbon panel: describes the emergence of an infrastructure linking high-performance computers, experimental facilities and data repositories.
• Seems to be distinguished from the term Grid, which is considered to apply more directly to computation and cluster-style computing.
• May or may not be the same thing as eScience: eScience focuses on the way that science is done, cyber-infrastructure on how the infrastructure is provided to support this way of working.
Grids as Virtual Organizations
• Used in the paper "The Anatomy of the Grid" (Foster, Kesselman, Tuecke).
• "… Grid concept is coordinated resource sharing in dynamic, multi-institutional virtual organizations …"
• There is an analogy with an electrical power grid, where producers share resources to provide a unified service to consumers.
• A large unresolved question is how Virtual Organizations federate across security boundaries (e.g. firewalls) and organisational boundaries (resource allocation).
• Grids may have hierarchical structures, e.g. the EU DataGrid, or may have more federated structures, e.g. EuroGrid.
What can Grids be used for?
[Diagram: a user with a laptop/PDA (web-based portal) connects through Grid infrastructure (Globus, Unicore, …) to HPC resources (scalable MD, MC and mesoscale modelling), "instruments" (XMT devices, LUSI, …), visualization engines, VR and/or AG nodes, storage devices, steering (ReG steering API) and performance control/monitoring.]
Moving the bottleneck out of the hardware and into the human mind…
Grids for Knowledge/Information Flow
[Diagram: data capture (clinical, image/signal, genomic/proteomic) feeds information sources and information fusion; analysis, data mining and case-based reasoning support hypotheses, design and integration; annotation/knowledge representation links to knowledge repositories and model & analysis libraries; clinical resources lead to individualised medicine.]
Parallel and Distributed Computing
• Parallel computing is the synchronous coupling of computing resources, usually in a single machine architecture or single administrative domain, e.g. a cluster.
• Distributed computing refers to a much looser use of resources, often across multiple administrative domains.
• Grid computing is an attempt to provide a persistent and reliable infrastructure for distributed computing.
• Users may wish to run workflows many times over a set of distributed resources, e.g. in bioinformatics applications.
• Users may wish to couple heterogeneous resources for scientific collaboration, e.g. telescopes, computers, databases, video-conferencing facilities.
Re-usability and Components
• We wish to develop sufficient reusable components to provide common facilities so that applications and services can interoperate.
• We can do this by various approaches: in Globus a toolkit is developed; in Unicore all actions on the Grid are modelled by abstractions encapsulated in an inheritance hierarchy.
• As part of this course you should start to identify the strengths and weaknesses of these two approaches.
• A more radical approach is to impose a meta-operating system to present the resources as a virtual computer. This was tried by the Legion project, and the idea partially survives in the concept of a DataGrid.
Toolkits for Grid Functions
• Software development toolkits
• Standard protocols, services & APIs
• A modular "bag of technologies"
• Enable incremental development of grid-enabled tools and applications
• Reference implementations
• Learn through deployment and applications
• Open source
[Diagram: layered stack of applications over diverse global services, core services and the local OS.]
Layered Architecture
[Diagram: Grid resources at Manchester, Imperial College, EPCC, Oxford, QM, Loughborough and QM-LUSI/XMT run a Grid fabric of local systems and schedulers (Solaris, Linux, UNICOS, IRIX, Tru64; LSF, NQE, PBS, MPI). Grid services (GRAM, GSI, GSI-FTP, MDS, GASS, HBM, SRB) sit above the fabric; application toolkits (DUROC, MPICH-G, globusrun, GlobusView) sit above the services; applications and problem-solving environments (LUSI Portal, Component Repository, Visualization & Steering, Computational PSE, Component Framework, VIPAR) form the top layer.]
Core Functions for Grids
Acknowledgements to Bill Johnston of LBL
A Set of Core Functions for Grids
• The GGF document "Core Functions for Production Grids" attempts to define Grids by the minimal set of functions that a Grid must implement to be "usable".
• This is a higher-level approach that does not attempt to specify how the functions are implemented, or what base technology is used to implement them.
• In the original Globus Toolkit, functions were implemented in C and could be called via APIs, from scripts, or on the command line.
• In Unicore, functions were abstracted as a hierarchy of Java classes, then mapped to Perl scripts at a lower level: the "incarnation" process.
• In the Open Grid Services Architecture there is a move to a Web-services-based approach, and the hosting environment assumes prominence.
Converging Technologies
[Diagram: three converging strands - agents, Grid computing, and Web service & Semantic Web technologies.]
Web Services
• Early Grids were built on the technologies used for accessing supercomputers, e.g. ssh, shell scripts, ftp. Information services were built on directory services such as LDAP, the Lightweight Directory Access Protocol.
• However, in the commercial sphere Web Services are becoming dominant, based on SOAP (Simple Object Access Protocol), WSDL (Web Services Description Language) and UDDI.
• Early Grid systems such as Unicore and Globus are trying to refactor their functionality in terms of Web Services.
• The key Grid concept not captured in Web Services is state, e.g. the state of a job queue, the load on a resource, etc.
Other Types of Grid
• The word Grid is very loosely used.
• Some aspects of collaborative video-conferencing and advanced visualization are termed Grid.
• These currently try to use technology developed for running computations, and the results are not always usable.
• This is just one indication that we must conceptualise what abstractions we need to capture in Grid software.
• We also need to develop abstractions for both high- and low-level protocols, for security models, and for user access policies.
• The Unicore system we present has captured the key semantics and abstractions of a Computational Grid.
Access Grid
• Manchester: official UK Constellation site
• Solar Terrestrial Physics Workshop
• Teleradiology, Denver
Lecture 2: Computational Resource
If the Grid concept is to move from a vague analogy to a workable scientific concept, the terms need to be more carefully defined. Here we describe one approach to defining one key abstraction, namely computational resource.
Terminology
• We identify a problem: terms in distributed computing are used loosely and are thus not amenable to analysis.
• We identify a possible programme: to seek invariants which are conserved or are subject to identifiable constraints.
• We now trace an analysis of the concept of "Computational Resource", since distributed computing networks are increasingly referred to as Grids.
• An electricity grid distributes electrical power, a water grid distributes water, and an information grid distributes information.
• What does a computational grid distribute?
The Analogy with a Power Grid
• The power grid delivers electrical power in the form of a wave (A/C wave).
• The form of the wave can change over the Grid, but there is a universal (scalar) measure of power: power = voltage x current.
• This universal measure facilitates the underlying economy of the power grid. Since it is indifferent to the way the power is produced (gas, coal, hydro, etc.), different production centres can all switch into the same Grid.
• To define the abstractions necessary for a Computational Grid we must understand what we mean by computational resource.
Information Grids
• Information can be quantified as bits with sending and receiving protocols.
• Bandwidth x time gives a measure of information flow, which allows telcos to charge.
• Internet protocols allow discovery of static resources (e.g. WWW pages).
• Information "providers" do not derive income directly according to the volume of information supplied; they use other means (e.g. advertising, grants) to sustain the resources needed.
• The current Web is static and does not need to consider dynamic state, hence the extensions needed for the Open Grid Services Architecture.
What is Computational Power?
• Is there an equivalent of voltage x current? Megaflops?
• Power is a rate of delivery of energy, so should we take Mflop/s? However, this is application dependent.
• Consider two different computations:
  1. Seti@home: time factors are not important.
  2. Distributed collaborative working on a CFD problem, with computation and visualization of results in multiple locations: time and synchronicity are important!
  3. But both may use exactly the same number of Mflops.
Invariants in Distributed Computation
• To draw an analogy with the current situation, we refer to the status of physics in the 17th and 18th centuries.
• It was not clear what the invariant quantities were that persisted through changes in physical phenomena.
• Gradually, quantities such as momentum, energy and electric charge were isolated and their invariance expressed in the form of Conservation Laws.
• Without Conservation Laws, a precise science of physics is inconceivable.
• The scope has since been extended to important inequalities, e.g. the Second Law of Thermodynamics and Bell's inequality.
• We must have constraints and invariants, or analysis and modelling are impossible.
An Abstract Space for Job-Costing
• Define a job as a vector of computational resources (r1, r2, …, rn).
• A Grid resource advertises a cost function for each resource (c1, c2, …, cn).
• The cost function takes the vector argument to produce the scalar job cost r1*c1 + r2*c2 + … + rn*cn.
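The cost function above is just an inner product of the request vector with the provider's cost vector. A minimal sketch in Java (the class name, method name and example figures are invented for illustration):

```java
// Sketch of the job-costing model: a job is a vector of resource
// requirements (r1..rn), a provider advertises a cost vector (c1..cn),
// and the scalar job cost is the inner product r.c.
public class JobCost {
    // Returns r1*c1 + r2*c2 + ... + rn*cn.
    public static double cost(double[] r, double[] c) {
        if (r.length != c.length) throw new IllegalArgumentException("dimension mismatch");
        double total = 0.0;
        for (int i = 0; i < r.length; i++) total += r[i] * c[i];
        return total;
    }

    public static void main(String[] args) {
        // A job needing 128 CPU-hours and 40 GB of memory; most other
        // entries in the vector are null (zero), as the slides note.
        double[] job  = {128, 40, 0};
        double[] rate = {0.05, 0.10, 2.0}; // tokens per unit of each resource
        System.out.println(cost(job, rate)); // 128*0.05 + 40*0.10 = 10.4
    }
}
```

Because the result is a single scalar in tokens, providers with very different internal architectures can quote comparable prices, which is exactly the role voltage x current plays in the power-grid analogy.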
A Dual Job-Space
• Thus we have a space of "requests", defined as a vector space of the computational needs of users over a Grid. For many jobs most of the entries in the vector will be null.
• We have another space of "services", which can produce "cost vectors" for costing the user jobs (provided they can accommodate them).
• This is an example of a dual vector space.
• A strictly defined dual space is probably too rigid, but it can provide a basis for simulations.
• The abstract job requirements will need to be agreed. It may be a task for a broker to translate a job specification to a "user job" for a given Grid node.
• A Mini-Grid can help to investigate a given Dual Job-Space with vectors of known length.
Dual Space
[Diagram: (1) a user job is expressed as a job vector; (2) the provider's cost vector is applied to it, yielding a scalar cost in tokens.]
Computational Resource
• Computational jobs ask questions about the internal structure of the provider of computational power in a manner that an electrically powered device does not.
• For example, do we require specific compilers, libraries, disk resource, visualization servers?
• What if it goes wrong, do we get support? If we transfer data and methods of analysis over the Internet, is it secure?
• A resource broker for high-performance computation is of a different order of complexity to a broker for an electricity supplier.
Emergent Behaviour
• Given this complexity, self-sustaining global Grids are likely to emerge rather than be planned.
• Planned Grids can be important for specific tasks; the EU DataGrid project is an example. They are not required to be self-sustaining, and questions of accounting and resource transfer are not of central interest.
• We consider the EUROGRID multi-level structure as an emergent phenomenon that could have some pointers to the development of large-scale, complex, self-sustaining computational Grids.
• The Unicore Usite and Vsite structure is an elegant means of encapsulating such structure.
Fractal Structure and Complexity
• Grids are envisaged as having internal structure and also external links.
• Via the external links (WANs, intercontinental networks) Grids can be federated.
• The action of joining Grids raises interesting research questions:
  1. How do we conceptualise the joining of two Grids?
  2. Is there a minimum set of services that defines a Grid?
  3. Are there environments for distributed services and computing that are not Grids (e.g. a cluster)?
• We focus on the emergent properties of Grids in considering whether they are Virtual Organizations.
Resource Requestor and Provider Spaces
• Resource Requestor (RR) space, in terms of what the user wants: e.g. Relocatable Weather Model, 10^6 points, 24 hours, full topography.
• Resource Provider (RP) space: 128 processors, Origin 3000 architecture, 40 Gigabytes memory, 1000 Gigabytes disk space, 100 Mb/s connection.
• We may even forward requests from one resource provider to another; recasting the O3000 job in terms of an IA64 cluster gives a different resource set.
• Linkage and staging of the different stages of a workflow require environmental support: a hosting environment.
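A broker translating from RR space into RP space can be pictured as a set of sizing rules. The sketch below is purely illustrative: the class name, method and the one-processor-per-8000-grid-points rule are invented for this example, not taken from any Grid middleware:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical broker step: translate a request expressed in resource
// requestor (RR) terms into resource provider (RP) terms.
public class RequestBroker {
    // RR input: application-level quantities the user understands
    // (grid points, wall-clock hours). RP output: machine-level terms.
    public static Map<String, String> toProviderTerms(int gridPoints, int hours) {
        Map<String, String> rp = new LinkedHashMap<>();
        // Invented sizing rule: one processor per 8000 grid points.
        rp.put("processors", String.valueOf(gridPoints / 8000));
        rp.put("runtimeHours", String.valueOf(hours));
        rp.put("architecture", "Origin 3000");
        return rp;
    }

    public static void main(String[] args) {
        // "Relocatable Weather Model, 10^6 points, 24 hours" in RR space...
        Map<String, String> offer = toProviderTerms(1_000_000, 24);
        // ...becomes a concrete RP description for one target system.
        System.out.println(offer);
    }
}
```

A different provider (the slides' IA64 cluster) would apply different rules to the same RR request, which is why the request, not the provider-level description, is what gets forwarded in request referral.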
RR and RP Spaces
[Figure 1: A request from RR space at A is mapped into resource providers at B and C, with C forwarding a request formulated in RR space to RP space at D. B and C synchronize at the end of the workflow before results are returned to the initiator A.]
Resume
• We have shown how some concepts from abstract vector spaces may be able to provide a definition of Computational Resource.
• We do not know as yet what conservation laws or constraints could apply to such an abstraction, or whether these would be useful in analysing distributed computing.
• We believe we can show convincingly that simple scalar measures such as Megaflops are inadequate to the task.
• This invalidates the "league table" concept, such as the Top 500 computers. Computational resource will increasingly be judged by its utility within a given infrastructure.
The Resource Universe
• What is the "universe" of resources for which we should broker?
• One might use a search engine, but then there is no agreed resource description language, nor would users be able to run on most of the resources selected.
• Globus uses a hierarchical directory structure, MDS, based on LDAP. Essentially this is a "join the Grid" model, based on the VO concept.
• By making Vsites capable of brokering we can potentially access the whole universe of Vsites.
• The concept of a Shadow Resource DAG makes the resource search structurally similar to its implementation and maintains the AJO abstraction.
Towards a Global Grid Economy?
• Much access to HPC resources is via national grants, or the resources are private (governmental, commercial). There are many problems with sharing resources: what are the incentives?
• Grid resources can be owned by international projects, but resources are allocated by national bodies. This is like collaboration in large-scale facilities, e.g. CERN.
• Europe has to go down the shared-resource route; the US doesn't. Will this produce separate types of Grid economy?
• The problems of accounting and resource trading are rarely touched on. Mini-Grids can help explore the technical issues outside of the political ones.
Summary
• The three different views of a distributed infrastructure relate to the way it is used.
• We need to abstract usage patterns and see if we can link them to invariants that can be quantified.
• We have investigated in depth the concept of "Computational Resource". This ties into all three definitions:
  1. eScience collaborations use resources
  2. Cyber-infrastructures connect resources
  3. Grids distribute resources
Human Factors
• A prediction arises from this: the abstracted idea of human collaboration will be essential to success in this field.
• In an electricity Grid the human participants are completely anonymised and exert influence only via mass action, e.g. a power surge.
• Patterns of usage in eScience will be much more complex and dynamic.
• It will belong to the post-Ford model of industrial production; this time the product will be knowledge.
• Our search for abstractions to encapsulate this will be far more challenging and exciting.
Lecture 3: Introduction to Unicore
Unicore is the Grid middleware system you will study in depth. It is a complete system based on a three-tier architecture. We have chosen it as an illustration because of its compact and complete nature and because it is very well engineered for a Computational Grid. Thanks to Michael Parkin, who created the slides in this lecture.
UNICORE Grid
UNiform Interface to COmputing REsources: a European Grid infrastructure giving secure and seamless access to High Performance Computing (HPC) resources.
• Secure: strong authentication of users based on X.509 certificates; communication using SSL connections over a TCP/IP/Internet connection, defined in the UNICORE Protocol Layer (UPL) specification.
• Seamless: uniform interface and consistent access to computing resources regardless of the underlying hardware, systems software, etc.; achieved using Abstract Job Objects (AJO).
HPC resources based in centres in Switzerland, Germany, Poland, France and the United Kingdom are integrated into a single grid.
UNICORE Grid Architecture
The UNICORE architecture is based on three layers:
• Client: the interface to the user; prepares and submits the job over the unsecured network to the…
• Gateway: the entry point to the computing centre and secured network; authenticates the user and passes the job to the…
• Server: schedules the job for execution and translates the job to commands appropriate for the target system.
UNICORE Terminology
• USite: a site providing UNICORE services (e.g. CSAR).
• VSite: a computing resource within the USite.
• USpace: dedicated file space on a VSite; may only exist during the execution of a job.
• XSpace: permanent storage on the VSite (e.g. the user's home directory).
UNICORE Security
Between the user and the computing centre, communications are over SSL:
• The user's X.509 certificate is stored in the client.
• Data is encrypted using Secure Sockets Layer (SSL) technology:
  - the industry-standard method for protecting web communications,
  - 128-bit encryption strength.
• Defined in the UNICORE Protocol Layer (UPL) standard.
• Prevents eavesdropping on and tampering with communications and data.
• Provides instant authentication of the visitor's identity instead of requiring individual usernames and passwords.
Within the computing centre, communications stay within the secure network:
• Local site policy can specify encrypted communication if necessary.
UNICORE Protocol Layer (UPL)
• A set of rules by which data is exchanged between computers.
• Request/reply structure.
The Abstract Job Object (AJO)
• A collection of approximately 250 Java classes representing actions, tasks, dependencies and resources; v4.0 can be downloaded from www.unicore.org.
• Specifies work to be done at a remote site seamlessly: no knowledge of the underlying execution mechanism is required.
• Example classes: ExecuteScriptTask, ListDirectory, CompileTask, Dependency, Processor, Storage.
• A signed, serialised Java object transmitted from the Client to the gateway using the UPL.
Simplified AJO Class Diagram (1)
[Class diagram: an AbstractJob is composed of AbstractActions - either ActionGroups ({ordered}, with Dependency) or AbstractTasks. AbstractTask specialises into UserTask, ExecuteTask (e.g. ExecuteScriptTask), FileTask (ChangePermissions, CopyFile, CreateDirectory, DeleteFile, FileCheck, ListDirectory, RenameFile, SymbolicLink), FileAction (CopySpooled, DeclarePortfolio, DeleteSpooled, IncarnateFiles, MakePortfolio, Spool, UnSpool) and FileTransfer (CopyPortfolioTask, ExportTask, GetPortfolio, ImportTask, PutPortfolio). Resources include CapacityResources such as Memory, Node, PerformanceResource, Processor, RunTime and Storage.]
• The diagram shows how an AbstractJob object can be constructed from tasks and groups of tasks.
• Resources can be allocated to each task.
Simplified AJO Class Diagram (2)
[Class diagram: every action has a corresponding Outcome class - AbstractJob_Outcome, ActionGroup_Outcome, AbstractTask_Outcome, and per-task outcomes such as ChangePermissions_Outcome, CopyFile_Outcome, CopyPortfolio_Outcome, ExecuteTask_Outcome, ImportTask_Outcome, ListDirectory_Outcome, Spool_Outcome, UnSpool_Outcome and so on.]
AJO Example 1: ListDirectory
• A listDirectory task, with a storage resource attached via addResource(), is added to an abstractJob with add().
• The directory is set using the setTarget(String target) method.
• The AbstractJob is consigned to the gateway.
AJO Example 2: ImportTask
• Used to download files on a specified VSite to the Client.
• The ImportTask imports a file from the storage area to the job's USpace (a Portfolio represents a collection of files in the USpace).
• The file name is set using the addFile(String target) method.
• A Dependency ensures that the file(s) are in the USpace before being copied to the outcome.
• The AbstractJob is consigned to the gateway.
[Diagram: an abstractJob assembled via add() and addResource() calls from an importTask (with storage resource), a dependency, and a copyPortfolioToOutcome action.]
AJO Example 3: ExecuteScriptTask
• Script arguments are set using the setCommandLine(String args) method.
• Dependencies ensure that files arrive before the task is executed.
• The AbstractJob is consigned to the gateway.
[Diagram (marked incomplete in the original): an abstractJob containing an actionGroup assembled via add() calls - incarnateFiles (Name: String, Script: byte[][], Files: String[]), makePortfolio, and executeScriptTask configured via setScriptType() and setResource() with a scriptType and resourceSet - linked by dependencies d1 and d2.]
Lectures 4-5: Unicore Client
We now present a client side view of the Computational Grid. This will allow you to begin the practical exercises before engaging with the full complexity of the server side components and complete Grid architecture of Unicore.
We thank Ralf Ratering of Intel for permission to use this material.
UNICORE
• A production-ready Grid system that connects supercomputers and clusters into a Computing Grid.
• Originally developed in the German research projects UNICORE (1997-2000) and UNICORE Plus (2000-2003):
  - Client implemented by Pallas (now Intel PDSD)
  - Server implemented by Fujitsu as a sub-contractor of Pallas
• Further enhanced in European research projects: EuroGrid (2000-2003), GRIP (2001-2003), OpenMolGrid (2002-2005), NextGrid (2004-2008), SimDat (2004-2008) and others.
• Used as middleware for NaReGI.
The UNICORE Client
• Graphical interface to UNICORE Grids
• Platform-independent Java application
• Open Source, available from the UNICORE Forum
• Functionality:
  - Job preparation, monitoring and control
  - Complex workflows
  - File management
  - Certificate handling
  - Integrated application support
UNICORE Server Components
[Diagram: Client → Gateway → NJS (consulting the UUDB and IDB) → TSI, with the AJO passed along the chain.]
• Gateway: performs authentication; runs in the DMZ.
• Network Job Supervisor (NJS): the main server component; manages jobs and performs authorization.
• UNICORE User Database (UUDB): maps certificates onto logins.
• Incarnation Database (IDB): translates the AJO to a platform-specific incarnation; contains resource descriptions.
• Target System Interface (TSI): the only component that must live on the target system; Perl or Java implementations; executes jobs or submits them to the batch sub-system.
• Abstract Job Object (AJO): a platform-independent description of tasks, dependencies and resources.
History of UNICORE Client Versions (1997 - today)
• Early prototypes developed in the UNICORE project
• First stable version: 3.0
• Final version in UNICORE Plus: 4.1 Build 5
• Pallas UNICOREpro version 1 (www.unicorepro.com)
• UNICORE 5: Open Source, available at www.unicore.org
Starting the Client
• Prerequisites: Java ≥ 1.4.1 (if not available, choose the bundled download package)
• UNICORE configuration directory <.unicore> in your HOME directory
• Get test certificates from the Test Grid CA service
Ready to go? "Hello Grid World!"
• UNICORE Site == Gateway: typically represents a computing centre
• Virtual Site == Network Job Supervisor: typically represents a target system
DEMO:
1. Execute a simple script on the Test Grid
2. Get back standard output and standard error
Behind the Scenes: Authentication
[Diagram: the Client (holding the user certificate) and the Gateway (holding the gateway certificate) establish an SSL connection. The Gateway sends its certificate and the Client asks "Trust gateway certificate issuer?"; the Client sends the user certificate and the Gateway asks "Trust user certificate issuer?".]
Behind the Scenes: Authorization
[Diagram: the Client sends the AJO, signed with the user certificate, through the Gateway to the NJS. The NJS first checks that the AJO certificate matches the SSL certificate, then uses the UUDB to map the certificate onto a local login (certificate 1 → login A, certificate 2 → login B, and so on) for both typical UNICORE users and Test Grid users.]
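The UUDB step in the diagram is essentially a table lookup from certificate to local login. A minimal illustrative sketch (class and method names are invented; a real UUDB keys on the full X.509 certificate, represented here by its distinguished name):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the UUDB authorization step: the NJS looks the user's
// certificate up in the UNICORE User Database to find the local login.
public class UserDatabase {
    private final Map<String, String> certToLogin = new HashMap<>();

    public void addMapping(String certificateDn, String login) {
        certToLogin.put(certificateDn, login);
    }

    // Returns the local login, or null if the certificate is not known
    // (the job would then be rejected).
    public String authorize(String certificateDn) {
        return certToLogin.get(certificateDn);
    }

    public static void main(String[] args) {
        UserDatabase uudb = new UserDatabase();
        uudb.addMapping("CN=Alice,O=Test Grid", "alice01");
        System.out.println(uudb.authorize("CN=Alice,O=Test Grid")); // alice01
        System.out.println(uudb.authorize("CN=Mallory,O=Unknown")); // null
    }
}
```

Note how this separates authentication (did the certificate check out over SSL?) from authorization (which local account, if any, may this identity use?).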
Behind the Scenes: Creation & Submission
[Diagram, client side: the script container is turned into an Abstract Job Object holding an IncarnateFiles action and an ExecuteScriptTask. Server side: 1. create a file (Script_HelloWorld1234…) with the script contents; 2. execute it as a script.]
Job Directory (USpace): a temporary directory at the target system where the job will be executed.
Monitoring the Job Status
• Successful: job has finished successfully
• Not successful: job has finished, but a task failed
• Executing: parts of the job are running or queued
• Running: task is running
• Queued: task is queued at a batch sub-system
• Pending: task is waiting for a predecessor to finish
• Killed: task has been killed manually
• Held: task has been held manually
• Ready: task is ready to be processed by the NJS
• Never run: task was never executed
The Primes Example (ArrBreakKey.java)

// Fields used below (declared elsewhere in ArrBreakKey):
// BigInteger N, p, q, val; String inputLine; StringTokenizer st;
public void breakKey() {
    try {
        // Read candidate primes, one per line, from primes.txt
        BufferedReader br = new BufferedReader(new FileReader("primes.txt"));
        while (true) {
            inputLine = br.readLine();
            st = new StringTokenizer(inputLine, " ");
            val = new BigInteger(st.nextToken());
            // If val divides N, we have the factorization N = p * q
            if (N.mod(val).compareTo(BigInteger.ZERO) == 0) {
                p = val;
                q = N.divide(val);
                return;
            }
        }
    } catch (NullPointerException e) {
        // readLine() returned null: end of file reached without a factor
        System.out.println("Done!");
    } catch (IOException e) {
        System.err.println("IO Error:" + e);
    }
    p = BigInteger.ZERO;
    q = BigInteger.ZERO;
}

Primes.txt contains the candidate primes, one per line: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, ...
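The logic of breakKey() can also be shown self-contained, without the primes.txt file. This hypothetical helper is not part of ArrBreakKey; it just restates the same trial-division idea in a directly runnable form:

```java
import java.math.BigInteger;

// Self-contained variant of the slide's trial-division idea: scan
// candidate primes until one divides N, then return the pair {p, q}.
public class TrialDivision {
    public static BigInteger[] factor(BigInteger n, int[] primes) {
        for (int prime : primes) {
            BigInteger val = BigInteger.valueOf(prime);
            if (n.mod(val).equals(BigInteger.ZERO)) {
                return new BigInteger[] { val, n.divide(val) }; // n = p * q
            }
        }
        // No factor found in the list: mirror the slide's "Done!" case.
        return new BigInteger[] { BigInteger.ZERO, BigInteger.ZERO };
    }

    public static void main(String[] args) {
        int[] primes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
        BigInteger[] pq = factor(BigInteger.valueOf(77), primes);
        System.out.println(pq[0] + " x " + pq[1]); // 7 x 11
    }
}
```

This is deliberately naive: the point of the exercise is not efficient factoring but having a small, compilable Java job to "gridify" in the next slide.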
"Gridify" the Primes Example
[Diagram, client and server sides, using the Job Directory (USpace):]
1. Import the java file (ArrBreakKey.java)
2. Compile the java file (producing ArrBreakKey.class)
3. Execute the class file
4. Get the result in stdout/stderr
DEMO
Behind the Scenes: Software Resources
• Command Task: executes a Software Resource, or Command (a binary that will be imported into the Job Directory).
• Incarnation Database (IDB): application resources contain system-specific information - absolute paths, libraries, environment variables, etc. Example entry:

APPLICATION javac 1.4
Description "Java Compiler"
INVOCATION [
/usr/local/java/bin/javac
]
END
Behind the Scenes: Fetching Outcome
[Diagram: on the server, step 2 (compile java file) and step 3 (execute class file) each produce stdout and stderr in the Job Directory (USpace) alongside ArrBreakKey.java and ArrBreakKey.class; "Fetch Outcome" copies stdout and stderr back to the client's Files Directory.]
Session Directory: configurable in User Defaults: Paths -> Scratch Directory.
Integrated Application Example: POV-Ray
[Diagram: the client sends a scene description and command-line parameters; the POV-Ray application runs in the Job Directory (USpace) with include files and libraries, reading input files from the remote file system (XSpace) and writing the output image, which is displayed at the client. Demo image from the POV-Ray distribution.]
Scene description excerpt:

#include "colors.inc"
#include "shapes.inc"
camera {
  location <50.0, 55.0, -75.0>
  direction z
}
plane {y, 0.0 texture {pigment {RichBlue }}}
object { WineGlass translate -x*12.15}
light_source { <10.0,50.0,35.0> colour White }
...
Behind the Scenes: Plug-In Concept
• Add your own functionality to the Client!
  - Heavily used in research projects all over the world
  - More than 20 plug-ins already exist
• No changes to the basic Client software needed
• Plug-ins are written in Java
• Distributed as signed Jar archives
Existing Plug-Ins (incomplete)
• CPMD, Car-Parrinello Molecular Dynamics (FZ Jülich)
• Gaussian, Amber, Visualizer, SQL Database Access, PDB Search, Plugin Installer (ICM Warsaw)
• Nastran, Fluent, Star-CD (University of Karlsruhe)
• Dyna 3D, Billing (T-Systems Germany)
• Local Weather Model (DWD)
• POV-Ray, Auto Update (Pallas GmbH)
• Resource Broker (University of Manchester)
• Interactive Access (Parallab Norway)
• Application Coupling (IDRIS France)
• ...
Using 3rd-Party Plug-Ins
• Get the plug-in Jar archive from a web site, email, CD-ROM, etc.
• Store it in the Client's plug-in directory
• The Client will check the plug-in signature
• Import plug-in certificates from the Actions menu in the Keystore Editor
[Diagram, signature check: is one certificate in the chain a trusted entry in the keystore? If yes, LOAD; if no, is the signing certificate a trusted entry in the keystore? If yes, LOAD; if no, offer to add the signing certificate to the keystore - accepting LOADs the plug-in, declining REJECTs it.]
Task Plugins
• Add a new type of task to the Client GUI
• New tasks can be integrated into complex jobs
• Application support: CPMD, Fluent, Gaussian, etc.
[Screenshot: add-task item, settings item, icon and plugin info in the Client GUI.]
Supporting an Application at a Site
• Install the application itself
• Add an entry to the Incarnation Database (IDB):

APPLICATION Boltzmann 1.0
Description "Boltzmann Simulation"
INVOCATION [
/usr/local/boltzmann/bin/linuxExec.bin
]
END
Plug-In Example: CPMD
Workflow for the Car-Parrinello molecular dynamics code:
[Diagram: input conf_file1 feeds a Wavefunction Optimization; input conf_file2 plus RESTART feed a Geometry Optimization, which may re-iterate ("further optimization?"); an MD Run produces output (stdout, stderr, RESTART.1, LATEST, ...), which can go on to Visualization or other further evaluation.]
Plug-In Example: CPMD
• The CPMD plugin constructs the UNICORE workflow
• The CPMD wizard assists in setting up the input parameters
• Visualize results
[Screenshots in the original.]
Extension Plugins
• Add any other functionality: Resource Broker, Interactive Access, etc.
[Screenshot: JPA toolbar, settings item, Extensions menu, virtual-site toolbar, plugin info.]
Plug-In Example: Resource Broker
• Specify resource requests in your job
• Submit it to a broker site
• Get back offers from the broker
Example: Steering a Simulation
[Diagram: a Lattice-Boltzmann simulation code runs in the Job Directory on the server. It reads an input file (edited in the client's Editor) and a control file (driven from a Control Panel plugin task), and writes output.gif (fetched via the Export Panel) and sample.gif (shown in the Sample Panel). DEMO.]
Specifying Resource Requests
• Tasks can have resource sets containing requests
• If no resource set is attached, default resources are used
• Resource sets can be edited, loaded and saved
• If a resource request does not match the resources available at a site, the Client displays an error
[Screenshot: Resource Set 1, Resource Set 2.]
Behind the Scenes: Authorization
(Diagram) The user's certificate travels with the job: the Client signs the AJO with the User Certificate and sends it through the Gateway at Site A, where the NJS checks the certificate against the UUDB to map it to a User Login. Sub-AJOs, again carrying the User Certificate, are forwarded to Site B, whose Gateway and NJS repeat the check against their own UUDB. The NJS-to-NJS connection is authenticated with an SSL certificate ("== Trusted NJS?").
CS60279
Using File Tasks
(Diagram) The Client sees the file spaces of each server: on Server 1 a Storage Server exposes Home, Temp, Spool, Root and the USpace; on Server 2 a Storage Server exposes Home, Temp, Root and the USpace; the Client itself contributes the Local file space.
CS60280
Complex Workflow: Control Tasks
– Do N Loop
– Do Repeat Loop
– Hold Task
– If Then Else
CS60281
Behind the Scenes: Ignore Failure
– UNICORE jobs stop execution when a task fails
– Sometimes task failure is acceptable:
  – If and DoRepeat conditions
  – Tasks that try to use restart files
  – Whenever you do not care about task success
– Set the "Ignore Failure" flag on the task (right mouse click in the Dependency Editor)
CS60282
Loops: Accessing the Iteration Counter
– Iteration variable: $UC_ITERATION_COUNTS
– Lives on the server side
– Supported in:
  – Script Tasks
  – File Tasks
  – Re-direction of stdout/stderr
– Nested loops: iteration numbers are separated by "_", e.g. "2_3"
– Caution: the counter will not be propagated to sub-jobs
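A script or plugin that post-processes loop output can split the nested counter into per-level indices. A minimal sketch; the class and method names are hypothetical helpers, only the "2_3" value format comes from the slide above:

```java
// Hypothetical helper: split a nested $UC_ITERATION_COUNTS value such as
// "2_3" (outer loop iteration 2, inner loop iteration 3) into integer indices.
public class IterationCounts {
    public static int[] parse(String counts) {
        String[] parts = counts.split("_");
        int[] indices = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            indices[i] = Integer.parseInt(parts[i]);
        }
        return indices;
    }
}
```

For a single (non-nested) loop the value contains no underscore and a one-element array is returned.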
CS60283
Job Monitor Actions
Get new status for a site, job or task
Get stdout, stderr and exported files of a job
Remove job from server. Deletes local and remote temporary directories
Kill job
Hold job execution
Resume a job that was held by a "Hold Job" action or a Hold task
Copy a job from the job monitor. The job can be pasted into the job preparation tree and re-run e.g. with different parameters
Show dependencies of job
Show resources for task
CS60284
Caching Resource Information
Client works on cached resource information– UNICORE Sites, Virtual Sites, available resources
Resource Cache will be updated on:
– startup
– refresh on the "Job Monitoring" tree node
Client uses cached information in Offline mode
CS60285
Accessing other UNICORE Sites
– UNICORE Sites will be read from an XML file; this can be a URL on the web
– Virtual Sites are configured at the UNICORE Site
– Job Monitor Root: performing a "Refresh" on this node will reload the UNICORE Sites
CS60286
Configuration: Using Different Identities
– Key entries: Who am I?
CS60287
Browsing Remote File Systems
– Remote File Chooser: used in Script Task, Command Task, for File Imports, Exports, etc.
– Select a virtual site or "Local"
– Preemptive file chooser mode will enhance performance on fast file systems
CS60288
The Client Log
– "clientlog.txt" or "clientlog.xml"
– Used by developers to figure out problems
– Configured under User Defaults -> Paths and User Defaults -> Logging Settings
– Use PLAIN format; the INFO level should be fine
– Enable under Windows, when no console is used
CS60289
Starting the Client Revisited
– client.jar in lib directory
  – start with .exe (Windows) or run script (Unix/Linux)
  – or: "java -jar client.jar"
– Command line options
  – Choose an alternative configuration directory:
    • -Dcom.pallas.unicore.configpath=<mypath>
  – Enable the security manager:
    • -Dcom.pallas.unicore.security.manager
CS60290
Outlook: OGSA Grid Services
(Diagram) The classical chain Client -> Gateway -> NJS -> TSI (with UUDB and IDB) is complemented by Grid Services: a UPL Grid Service Factory registers itself with a Registry; the Client's XML file contains Registry handles in addition to classical UNICORE Site addresses. An HTTPS request starts a UPL GSFactory instance and returns a UPL GS handle through which the AJO is submitted. This passes through firewalls, and the Grid Services remain invisible to the user.
CS60291
Summary
With the UNICORE Client you can easily run and monitor complex jobs on a UNICORE Grid
Download the Client from www.unicore.org or www.unicorepro.com and have fun...
Lectures 6-7: Programming Unicore Client Plug-Ins
We now show how the Unicore client can be extended by programming application-specific plugins. This extends standard Java technology to a Grid context and brings flexibility and generality to the Unicore client.
We thank Ralf Ratering of Intel for permission to reproduce this material
CS60293
Overview
– Introduction
  – Existing Plug-Ins
– AJO Plugin
  – An Extension Plugin submitting "raw" Abstract Job Objects that do appear in the Job Monitor
– Small Service Plugin
  – An Extension Plugin using containers for service jobs that do not appear in the Job Monitor
– Boltzmann Plugin
  – A Task Plugin that integrates the Boltzmann Lattice simulation into the Client GUI
CS60294
Functionality of the UNICOREpro Client
– Job Preparation
  – File, execution and control tasks
  – Complex workflows
  – Editing, copying, saving, etc.
– Resource Handling
– Job Monitoring
– Job Control
– Remote File Browsing
– Certificate Handling
CS60295
Plug-In Concept
– Add your own functionality to the Client!
  – Heavily used in research projects all over the world
  – More than 20 plug-ins already exist
– No changes to the basic Client software needed
– Plug-Ins are written in Java
– Distribution as signed Jar Archives
CS60296
Deployment and Installation
– User gets the plugin jar archive from a web site, email, CD-ROM, etc.
– Store it in the Client's plugin path:
  1. Lib directory
  2. User Defaults plugin directory
– Client checks the plugin jar signature:
  – Is one certificate in the chain a trusted entry in the keystore? If no: REJECT.
  – Is the signing certificate a trusted entry in the keystore? If yes: LOAD.
  – Otherwise the user is asked whether to add the signing certificate to the keystore: yes: LOAD, no: REJECT.
CS60297
Task Plugins
– Add a new type of task to the Client GUI
– New task can be integrated into complex jobs
– Application support: CPMD, Fluent, Gaussian, etc.
(Screenshot callouts: Add task item, Settings item, Icon, Plugin info)
CS60298
Extension Plugins
– Add any other functionality
– Resource Broker, Interactive Access, etc.
(Screenshot callouts: JPA toolbar, Settings item, Extensions menu, Virtual site toolbar, Plugin info)
CS60299
Supporting an Application at a Site
– Install the application itself
– Add an entry to the IDB:

APPLICATION Boltzmann 1.0
Description "Boltzmann Simulation"
INVOCATION [
/usr/local/boltzmann/bin/linuxExec.bin
]
END
CS602100
Example Use: CPMD
Workflow for the Car–Parrinello molecular dynamics code:
– Input: conf_file1 -> Wavefunction Optimization -> Geometry Optimization (re-iterate while further optimization is needed)
– Input: conf_file2 + RESTART -> MD Run
– Output: stdout, stderr, RESTART.1, LATEST, ...
– Further evaluation: Visualization, other ...
CS602101
Example Use: CPMD CPMD plugin constructs UNICORE workflow
CS602102
Example Use: CPMD CPMD wizard assists in setting up the input parameters
CS602103
Example Use: CPMD Visualize results
CS602104
Example Use: On Demand Weather Prediction
– On demand mesoscale weather prediction system
– Based on a relocatable version of DWD's prediction model
– Works from regular prediction data, topography and soil database
CS602105
Example Use: On Demand Weather Prediction
(Diagram) Regular prediction data and the topography & soil data are interpolated to the LM grid by GME2LM; the resulting input datasets for LM (1–20 GByte) drive the LM calculation of the mesoscale prediction; the LM forecast data (50–100 MByte) are visualised on the user workstation (transfers of roughly 1–5 MByte and ~50 MByte appear on the links).
CS602106
Example Use: Coupled CAE Applications
– Run coupled aerospace simulations (electromagnetism)
– Use CORBA as coupling substrate
– Provide an internal portal for Airbus engineers
CS602107
Example Use: Resource Broker
– Specify resource requests in your job
– Submit it to a broker site
– Get back offers from the broker
CS602108
Existing Application Plug-Ins
– FZ Jülich: CPMD, OpenMolGrid
– ICM Warsaw: Gaussian, Amber, SQL Database Access
– University of Karlsruhe: Nastran, Fluent, Star-CD
– T-Systems: Dyna 3D
– DWD: Local Weather Model
– Pallas GmbH: POV-Ray, Script, Command, Compile, Globus Proxy Certificate
CS602109
Existing Extension Plug-Ins
– University of Manchester: Resource Broker
– Parallab Norway: Interactive Access
– T-Systems Germany: Billing
– IDRIS France: Application Coupling
– ICM Warsaw: Plugin Installer
– Pallas GmbH: Auto Update, AJO Submitter, Small Service Plugin
CS602110
AJO Plugin
– Idea: an easy way to develop your own AJOs
– Use Client infrastructure
  – Certificates
  – Usites, Vsites and Resources
  – User interface
– Use the JMC to control the AJO
  – Watch status
  – Fetch and display the Outcome
  – Send Control Actions
CS602111
Example: Execute an Application Resource
– Select an Application Resource and execute it at a virtual site
– Submit an AJO containing a UserTask; use the Job Monitor to get back the output
– Implement 2 classes
  – Main Plugin Class
  – AJO Request Class
– Build a Jar Archive named "*Plugin.jar"
– Sign the Jar with your Certificate
CS602112
Using Application Resources

Incarnation Data Base entry:

APPLICATION AJOTest 1.0
Description "Demo Resource for AJO Plugin"
INVOCATION [
echo "Hello World!"
]
END

(Diagram) On the server, the Network Job Supervisor (NJS) publishes its Resource Set: Memory (64, 128, 32000), APPLICATION AJOTest 1.0, APPLICATION CPMD 3.1, Context MPI, etc. On the client, the plugin asks the Resource Manager whether the AJOTest resource is available; if so, it adds the resource to an AJO UserTask and submits it as a Request, otherwise it displays a message.
CS602113
Client Requests
– GetFilesFromUspace, SendFilesToUspace
– GetFilesFromXspace, SendFilesToXspace
– GetByteArrayFromXspace, SendByteArrayToXspace
– GetListings, GetUsites, GetVsites, GetResources
– GetRunningJobs, GetJobStatus, GetOutcome, GetSpooledFiles, ...
A request (RequestObservable) is started as a new thread and notifies its ClientObserver when finished.
CS602114
Class AJORequest
public class AJORequest extends ObservableRequestThread {
...
public void run() {
UserTask userTask = new UserTask("UserTask");
userTask.addResource(software);
User user = ResourceManager.getUser(vsite);
AbstractJob job = new AbstractJob("AJORequest_" + ResourceManager.getNextObjectIdentifier());
job.setVsite(vsite);
job.setEndorser(user);
job.add(userTask);
Reply reply=null;
try { reply = polling(job, vsite, user);
} catch (Exception e) {
logger.log(Level.SEVERE, "Submitting AJO in polling mode failed.", e);
}
notifyObservers(this, reply);
}
}
public abstract class ObservableRequestThread extends ObservableThread {
public void setInterrupted(boolean interrupted) { ... }
public Reply nonPolling(AbstractJob job, Vsite vsite, User user, Vector streamedFiles);
public Reply polling(AbstractJob job, Vsite vsite, User user, Vector streamedFiles);
}
public abstract class ObservableThread extends Thread implements IObservable {
public void addObserver(IObserver anObserver);
public void deleteAllObservers();
public void deleteObserver(IObserver anObserver);
public void notifyObservers(Object theObserved, Object changeCode);
}
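The listings above follow the classic observer pattern: a request thread runs in the background and calls notifyObservers when it finishes. A generic, self-contained sketch of that interaction; the names MiniObserver and MiniRequest are illustrative stand-ins for IObserver and ObservableRequestThread, not part of the UNICORE API:

```java
// Generic sketch of the pattern behind ObservableRequestThread: a worker
// thread performs a request and notifies its observer when finished.
interface MiniObserver {
    void update(Object source, Object result);
}

class MiniRequest extends Thread {
    private final MiniObserver observer;

    MiniRequest(MiniObserver observer) {
        this.observer = observer;
    }

    @Override
    public void run() {
        Object reply = "DONE";        // stand-in for the AJO reply
        observer.update(this, reply); // notify when finished
    }
}
```

In the real client the observer is typically the plugin itself, which implements IObserver and reacts to the reply in observableUpdate.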
CS602115
Class AJOPlugin

public abstract class UnicorePlugable {
  public HelpSet getHelpSet() { ... }
  public abstract String getPluginInfo();
  public JMenuItem getSettingsItem() { ... }
  public abstract void startPlugin();
  public abstract void stopPlugin();
  protected Client getClient();
}
public abstract class ExtensionPlugable
extends UnicorePlugable {
public JMenuItem getCustomMenu();
public Component getJPAToolBarComponent();
public Component getVsiteToolBarComponent();
public Object setupSpecialVsiteFeatures(
Vsite vsite, AbstractJob job);
}
public class AJOPlugin extends ExtensionPlugable implements IObserver {
  public String getPluginInfo() { return "AJO plugin example"; }
  public Component getVsiteToolBarComponent() { return startButton; }
  public void startPlugin() { startButton = new JButton(new ServiceAction()); }
  public void stopPlugin() { /* empty */ }

  private void submitServiceJob(SoftwareResource software, Vsite vsite) {
    AJORequest request = new AJORequest(software, vsite);
    request.addObserver(this);
    request.start();
  }

  public void observableUpdate(Object theObserved, Object changeCode) {
    Reply reply = (Reply) changeCode;
    ...
  }

  private class ServiceAction { ... }
}
CS602116
Small Service Plugin
– Idea: do the complete handling of jobs from the plugin
  – Build, submit and monitor the AJO
  – Fetch back the outcome and exported files
– Use Client Containers to construct the AJO
CS602117
AJOs and Containers
– Client containers encapsulate complex AJOs
– Manage imports, exports and execution
– Hold parameters, keep status, check errors
(Diagram: Import Group, Execute Group, Export Group)
CS602118
Container Hierarchy
Add your own container
CS602119
Implementing the Container
CS602120
Small Service Plugin
(Diagram) The SmallServiceContainer on the client builds the SmallService AJO and submits it; on the server the job executes in its Job Directory and writes serviceOutput.txt. The plugin then sends GetJobStatus requests, repeating until Status == DONE, followed by GetOutcome, GetSpooledFiles (which fetches serviceOutput.txt from the Spool Area) and finally DeleteJob.
CS602121
Class SmallServiceContainer
public class SmallServiceContainer extends UserContainer {
...
public void buildActionGroup() {
String unicoreDir = ResourceManager.getUserDefaults().getUnicoreDir();
String userHome = ResourceManager.getUserDefaults().getUserHome();
String filename = userHome + File.separator + "serviceOutput.txt";
FileExport[] exports = {
new FileExport(this, FileStorage.NSPACE_STRING,
"serviceOutput.txt", filename, true, true)};
setFileExports(exports);
super.buildActionGroup();
}
}
CS602122
Class SmallServicePlugin

public class SmallServicePlugin extends ExtensionPlugable implements IObserver {
  public void startPlugin() {
    job = new JobContainer();
    task = new SmallServiceContainer(job);
    job.addTask(task);
    startButton = new JButton(new ServiceAction());
  }

  private void submitServiceJob(Vsite vsite) {
    job.setName(ResourceManager.getServicePrefix() + "SmallServiceJob"
        + ResourceManager.getNextObjectIdentifier());
    job.setVsite(vsite);
    job.setUser(ResourceManager.getUser(vsite));
    job.run();
  }

  public void observableUpdate(Object theObserved, Object changeCode) {
    if (theObserved instanceof GetJobStatus) {
      ...
      if (status == AbstractActionStatus.DONE) { sendGetOutcome(); }
    } else if (theObserved instanceof GetOutcome) {
      sendGetSpooledFiles();
    } else if (theObserved instanceof GetSpooledFiles) {
      sendDeleteJobs();
    } else if (theObserved instanceof DeleteJob) { /* finished */ }
  }
}
CS602123
The Lattice Boltzmann Application
– Simulation of fluid mixing
– Output: a gif animation
– Intermediate sample files are generated
– Control file can change parameters while the application is executing
(Plot labels: Duration, "Mixing Factor")

Sample input file:

{ folder=".";
  initcond="spinodal";
  steerfile="control";
  gifanimfile="output.gif";
  unicore_demo = 1;
  writecolour=1;
  writecolgif=1;
  makedir = "yes";
  g_cc=2.0 ; tau_r = 1.0 ; tau_b = 1.0; rho = 1.0;
  tmax=5000 ; dt = 10 ; gravity=0.0;
  nx=128 ; ny=128; }
CS602124
Running Boltzmann using a Command Task
(Diagram) The client imports BoltzmannInput.txt into the Job Directory, renaming it to Input. The Command Task executes the Boltzmann Application Resource (with tmax set to 300), which reads Input and writes output.gif; the result is exported back to the client as C:\tmp\output.gif.
Disadvantages of Command Task
– Input file has to be edited outside the Client
– Imports and Exports have to be specified manually
– No integrated GUI for parameters
– Results have to be visualized outside the Client
– No additional functionality possible
  – sample files
  – application steering
Use a specialized Boltzmann Plugin Task!
CS602126
The Boltzmann Plugin
– Task Plugin
  – Add Boltzmann tasks to jobs
  – Input file editor
  – Automatically import input file
  – Export and visualize sample files
  – Send control files
– Implemented Classes
  – Main plugin class
  – Plugin Container
  – JPA Panel
  – Sample Panel
  – Control Panel
CS602127
Class BoltzmannPlugin

public class BoltzmannPlugin extends TaskPlugable {
  public ActionContainer getContainerInstance(GroupContainer parentContainer) {
    BoltzmannContainer container = new BoltzmannContainer(parentContainer);
    container.setName("New_" + getName() + counter);
    counter++;
    return container;
  }

  public String getIconPath() {
    return "org/gridschool/unicore/plugins/boltzmann/boltzmann.gif";
  }

  public String getName() { return "Boltzmann"; }
  public String getPluginInfo() { return "Grid School Example: The Boltzmann Plugin"; }

  public JPAPanel getPanelInstance(ActionContainer container) {
    return new BoltzmannJPAPanel(getClient(), (BoltzmannContainer) container);
  }

  public void startPlugin() {}
  public void stopPlugin() {}
}
CS602128
Run and Steer Boltzmann from the Plugin
(Diagram) The PluginJPAPanel (with Editor and Export Panel) fills the PluginContainer, which exports the input file and executes the Boltzmann Application Resource on the server. The application reads Input and writes output.gif and Sample.gif into the Job Directory. The SamplePanel fetches Sample.gif with a Get File From Uspace Request; the ControlPanel writes the Control file with a Send File To Uspace Request, which the running application reads.
CS602129
Class BoltzmannJPAPanel
– Set parameters in the container
– Use RemoteTextEditor, ImportPanel and ExportPanel
– Implements interface Applyable: applyValues, resetValues and updateValues mediate between Container and JPAPanel
CS602130
Remote Text Editor
Load, edit and save files from remote and local file spaces:

private RemoteTextEditor textEditor = new RemoteTextEditor();
private void buildComponents() {
JTabbedPane tabbedPane = new JTabbedPane();
tabbedPane.add(textEditor, "Input File");
...
}
public void applyValues() {
container.setInputFile(textEditor.getFile());
container.setInputString(textEditor.getText());
...
}
public void resetValues() {
textEditor.setText(container.getInputString());
textEditor.setFile(container.getInputFile());
...
}
public void updateValues(boolean vsiteChanged) {
if (vsiteChanged) {
textEditor.setVsite(container.getVsite());
}
...
}
}
CS602131
Import and Export Panels
– Specify file imports and exports from the GUI
– Use out of the box
(Screenshot callouts: New Import, Remove Import, Browse file systems)
CS602132
Class BoltzmannContainer

public class BoltzmannContainer extends UserContainer {
  private String inputString;

  protected void buildExecuteGroup() {
    byte[] contents = StringTools.dos2Unix(inputString).getBytes();
    IncarnateFiles incarnateFiles = new IncarnateFiles("INCARNATEFILES");
    incarnateFiles.addFile(INPUT_FILENAME, contents);
    ResourceSet taskResourceSet = getResourceSet().getResourceSetClone();
    taskResourceSet.add(getPreinstalledSoftware());
    UserTask executeTask = new UserTask(getName(), null, taskResourceSet,
        getEnv(), getCommandLine(), null, getRedirectStdout(),
        getRedirectStderr(), isVerboseOn(), isVersionOn(), null,
        getMeasureTime(), getDebug(), getProfile());
    executeGroup = new ActionGroup(getName() + "_EXECUTION");
    executeGroup.add(incarnateFiles);
    executeGroup.add(executeTask);
    try {
      executeGroup.addDependency(incarnateFiles, executeTask);
    } catch (InvalidDependencyException e) {
      logger.log(Level.SEVERE, "Cannot add dependency.", e);
    }
  }

  public ErrorSet checkContents() {
    ErrorSet err = super.checkContents();
    if (inputString == null || inputString.trim().length() == 0) {
      err.add(new UError(getIdentifier(), "No input file specified"));
    }
    return err;
  }
}
CS602133
Additional Outcome Panels
Implement interface IPanelProvider in the Container:

public class BoltzmannContainer extends UserContainer implements IPanelProvider {
  ...
  public int getNrOfPanels() { return 2; }

  public JPanel getPanel(int i) {
    if (i == 0) {
      if (samplePanel == null) { samplePanel = new BoltzmannSamplePanel(); }
      return samplePanel;
    } else {
      if (controlPanel == null) { controlPanel = new BoltzmannControlPanel(); }
      return controlPanel;
    }
  }

  public String getPanelTitle(int i) {
    if (i == 0) { return "Sample"; } else { return "Control"; }
  }

  public void finalizePanel() {}
}
CS602134
Class BoltzmannControlPanel

public class BoltzmannControlPanel extends JPanel implements IObserver {
  private RemoteTextEditor editor;
  ...

  private JobContainer getJobContainer() {
    return ResourceManager.getCurrentInstance().getJMCTree().getCurrentJob();
  }

  private BoltzmannContainer getBoltzmannContainer() {
    return (BoltzmannContainer) ResourceManager.getCurrentInstance()
        .getJMCTree().getFocussedObject();
  }

  private void sendControlFile() {
    JobContainer jobContainer = getJobContainer();
    AJOIdentifier ajoId = (AJOIdentifier) jobContainer.getIdentifier();
    Vsite vsite = jobContainer.getVsite();
    String[] filenames = {CONTROL_FILE};
    byte[][] contents = new byte[1][];
    String inputString = StringTools.dos2Unix(editor.getText());
    contents[0] = inputString.getBytes();
    SendFilesToUspace request = new SendFilesToUspace(ajoId, filenames, contents, vsite);
    request.addObserver(this);
    request.start();
  }

  public void observableUpdate(Object theObserved, Object changeCode) {
    if (theObserved instanceof SendFilesToUspace) {
      AbstractJob_Outcome outcome = (AbstractJob_Outcome) changeCode;
      logger.info("SendFilesToUspace result: " + outcome.getStatus());
    }
  }
}
CS602135
Class BoltzmannSamplePanel

public class BoltzmannSamplePanel extends JPanel implements IObserver {
  ...

  private void getSampleFile() {
    JobContainer jobContainer = getJobContainer();
    AJOIdentifier ajoId = (AJOIdentifier) jobContainer.getIdentifier();
    Vsite vsite = jobContainer.getVsite();
    String[] filenames = {SAMPLE_FILE};
    GetFilesFromUspace request = new GetFilesFromUspace(ajoId, filenames, vsite);
    request.addObserver(this);
    request.start();
  }

  public void observableUpdate(Object theObserved, Object changeCode) {
    if (theObserved instanceof GetFilesFromUspace) {
      AbstractJob_Outcome outcome = (AbstractJob_Outcome) changeCode;
      logger.info("GetFileFromUspace result: " + outcome.getStatus());
      if (outcome.getStatus().isEquivalent(AbstractActionStatus.SUCCESSFUL)) {
        GetFilesFromUspace request = (GetFilesFromUspace) theObserved;
        File imageFile = (File) request.getLocalFiles().firstElement();
        Image image = Toolkit.getDefaultToolkit().createImage(imageFile.getAbsolutePath());
        imagePanel.setImage(image);
        imagePanel.repaint();
      }
    }
  }
}
CS602136
Summary
– Extension Plugins
  – Easy way to submit custom AJOs
  – Use Client infrastructure
– Task Plugins
  – Integrated application support
  – Use sub-classes of UserContainer
  – Use Client GUI elements
– UNICOREpro Client Plugin Programmer's Guide
  – www.unicorepro.com -> Documents
Lecture 8: Resource Broker
A resource broker for Unicore. This software was designed to provide an important Grid abstraction, namely that the middleware should find the resources appropriate to the user's request. In this way the user does not need to know what resources are on the Grid or to maintain lists of appropriate resources.
CS602138
Abstract Functions for a Resource Broker
– Resource discovery, for workflows as well as single jobs.
– Resource capability checking: do the offering sites have ALL the necessary capability and environmental support for instantiating the workflow?
– Inclusion of Quality of Service policies in the offers.
– Information necessary for the negotiation between client and provider, and mechanisms for ensuring contract compliance.
Document submitted to the GPA-RG group of the GGF.
CS602139
Design of the EuroGrid Resource Broker
• To utilise the structure of UNICORE, in particular the AJO.
• To utilise the Usite/Vsite structure, in particular to extend the Vsite to the concept of a Brokering Vsite.
Two modes of operation are possible:
• A simple Resource Check request ("Can this job run here?") checks static qualities like software resources (e.g. Gaussian98 A9) as well as dynamic resources like quotas (disk quotas, CPU, etc.)
• A Quality of Service request: returns a range of turnaround time, and cost, as part of a Ticket. If the Ticket is presented (within its lifetime) with the job, the turnaround and cost estimates should be met.
CS602140
Ancestral Broker
The API allows two levels of operation:
– Resource Checking: static requirements, capability and capacity.
– QoS Checking: performance vs cost. Tickets can be issued as a "guarantee".
The protocol can be used symmetrically by a Broker:
1. The User sends CheckQoS to the Broker NJS.
2. The Broker NJS forwards CheckQoS to the Execution NJSs.
3. Each Execution NJS returns a CheckQoS_Outcome.
4. The Broker NJS returns the combined CheckQoS_Outcome to the User.
CS602141
The Brokering Process
1. The User sends a QoS Request to the Broker.
2. The Broker forwards the QoS Request to the candidate machines (T3E, O3000, VPP300).
3. The machines return Ticket(s) to the Broker.
4. The Broker returns Ticket(s) to the User.
CS602142
Resource Broker Graph
Figure 1: Possible Resource Broker Graph. Brokering Vsites (Manchester Computing, EUROGRID, IDRIS, LeSC, UK Grid, "T3Es 'R' Us") are linked in a graph; dumb Vsites such as Green (O3000), Fuji (VPP), Turing (T3E), a Cray T3E, IBM ??? and ??? hang off them.
CS602143
Advanced Features of the Broker
• In addition the interface allows:
  • a Ticket may contain a modified resource set which must be used with the Ticket
  • multiple Tickets may be returned from a single site
• This powerful mechanism allows a Broker to:
  • bid for jobs requiring resources it can't find, e.g. when receiving a bid requiring 256 processors for 1 hour, it could return a Ticket for 128 processors for twice as long
  • return a spread of offers, representing different priorities and corresponding costs
  • refine abstract application-specific resource requirements to concrete resource requirements, using built-in performance information about the code.
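The processors-for-time trade-off described above keeps the total processor-hours constant. A toy sketch of that counter-offer logic; the class TicketOffer and its fields are hypothetical illustrations, not part of the broker's API:

```java
// Hypothetical sketch: if a site cannot offer the requested processor count,
// it bids with fewer processors for proportionally more wall-clock time,
// keeping processor-hours constant (e.g. 256 procs x 1 h -> 128 procs x 2 h).
public class TicketOffer {
    final int processors;
    final int hours;

    TicketOffer(int processors, int hours) {
        this.processors = processors;
        this.hours = hours;
    }

    // Halve the processor count (doubling the time) until the request fits
    // the site's maximum partition size.
    static TicketOffer counterOffer(int requestedProcs, int requestedHours, int maxProcs) {
        int procs = requestedProcs;
        int hours = requestedHours;
        while (procs > maxProcs) {
            procs /= 2;
            hours *= 2;
        }
        return new TicketOffer(procs, hours);
    }
}
```

A real broker would of course fold queue load, scheduling policy and cost into such an offer; this only illustrates the resource-set rewriting that a Ticket may carry.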
CS602144
Example Application-Specific Broker
• An application-specific Resource Requirement is defined that can express "Run the DWD local weather model code, over X grid points, simulating a Y hour period".
• Standardisation is controlled by the Sun-style Community Source Licence (new interfaces must be returned to the community for publication).
• A broker can then be designed that takes such a resource requirement, e.g. "DWD local weather model code, over 1000 grid points, simulating over a 24 hour period", and returns concretised offers, such as "16 T3E processors for an hour, done by midday, for £32" or "32 O3000 processors for 30 minutes, done in an hour, for £45".
• So the knowledge of the application performance is kept in a single place, the Broker, away from the users. Consequently, the users never have to learn the code's performance characteristics.
CS602145
Interoperability across Grids
– The emerging infrastructure with multiple Grids is already complex.
– One cannot guarantee to have a uniform middleware such as UNICORE or Globus across all Grids. Therefore a translation service is necessary.
– We can link this to semantic information via a Grid Resource Ontology.
– We are then starting to get to the right level of abstraction for a genuine infrastructure for computational resource.
– It now no longer matters what you call this; the abstractions reflect the underlying reality of usage and must be flexible enough to change with differing usage.
CS602146
Interoperable Broker: Method 1
1. The Network Job Supervisor delegates the Resource Check to the Broker at the Vsite.
2. The Unicore brokering track utilises the Incarnation Data Base exactly as for the ancestral broker.
3. The Globus track uses a translator of the QoS check object. The translation service is extendable.
4. The results of the translation are used to drive the LDAP search and the Globus broker then utilises MDS to perform this.
CS602147
Architecture: Method 1
(Diagram) The NJS delegates the resource check to the Broker. The Unicore Broker looks up resources in the IDB; the Globus Broker delegates translation to the Translator (a Basic Translator) and uses the Filter to drive the LDAP search, which MDS (GIIS/GRIS) performs.
CS602148
Ontologies
• Need ontologies at BOTH the application and the infrastructure level.
• If we can create a Grid Resource Ontology, creation of specialist translation classes from a basic Grid translator becomes possible.
• The Incarnation Data Base at sites can be created via the ontology; it contains site-specific information which the client's job specification cannot.
• So brokers take the client request formulated in RR (resource requirement) space and at each site use the translator to convert it; offers come back with capability and QoS.
CS602149
Architecture: Method 2
(Diagram) As in Method 1, the NJS delegates the resource check to the Broker, and the Unicore Broker looks up resources in the IDB. The Globus Broker now delegates translation to an Ontology engine, whose Filters drive MDS searches through a Resource Discovery Service; the search can be hierarchical across the Grid or nodal at a single site, and can also involve other Brokers.
Lectures 9-10: Grid Interoperability
A useful test of the validity of the Grid concept is to show that different middleware systems can be made to interoperate. Here we show how the Grid Interoperability Project enabled the high-level abstractions of Unicore to be mapped onto the Globus toolkit. We thank Phillip Wieder for permission to use this material.
CS602151
Outline
• Introduction to GRIP
• Interoperability Layer: Design
• Interoperability Layer: Realisation
• UNICORE – Globus: Work in Progress
• Summary
CS602152
The GRid Interoperability Project
... to realise the interoperability of UNICORE and Globus and to work towards standards for interoperability in the Global Grid Forum:
• Development of an interoperability layer between the two Grid systems
• Interoperable applications
• Contributions made to the Global Grid Forum
• UNICORE towards Grid Services
See http://www.grid-interoperability.org
CS602153
Focus of this Talk
... to describe the UNICORE – Globus interoperability layer in detail, concentrating on the design and the implementation.
Out of scope:
• Interoperable applications
• Standardization work
Briefly discussed:
• Interoperable resource broker
CS602154
Interoperability Layer: Design
CS602155
Starting Point: UNICORE Architecture
(Diagram) The Client talks UPL (Unicore Protocol Layer) to the Gateway of each USite; behind each Gateway an NJS (possibly a brokering NJS) consults the UUDB for authorisation and authentication and the IDB for incarnation, and handles multi-site jobs. The abstract job is incarnated into non-abstract batch system commands and files, passed over the NJS <-> TSI protocol to the TSI on Target System A or B, where each job runs in its Uspace.
CS602156
Note of Clarification
The software the TSI interfaces to can be:
• a batch system or scheduler
• a UNIX shell (batch system emulated)
• a Grid resource manager component (like Globus GRAM)
The term "batch system" is used in this talk. Target system == the execution system the TSI runs on.
CS602157
Interfacing Globus through UNICORE
(Diagram) The Client connects through the Gateway to the NJS (with UUDB and IDB) as usual; the TSI, however, contains a Globus client that talks to the Globus server and to MDS (the Monitoring & Discovery Service) on the Globus host, which fronts the Target System and its Uspace.
CS602158
Challenges
• Grid approach
  • UNICORE: user-oriented workflow environment
  • Globus: services, APIs & portal builder
  -> UNICORE as a workflow portal for Globus
• Security
  • UNICORE: end-to-end security model
  • Globus: requires transitive trust
  -> Don't violate UNICORE's security model
• Resource description
  • UNICORE: one model for discovery & request
  • Globus: different models
  -> Map from MDS (LDAP), map to RSL
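The "map to RSL" direction can be pictured with a toy mapper that renders a simple resource request as a Globus GRAM RSL string. A sketch only: the RslMapper class is hypothetical, and while count and maxWallTime are genuine GRAM RSL attribute names, real incarnation involves far more than string concatenation:

```java
// Illustrative sketch: render a minimal resource request in GT2 GRAM RSL
// syntax, i.e. an ampersand followed by (attribute=value) clauses.
public class RslMapper {
    static String toRsl(String executable, int processors, int wallTimeMinutes) {
        return "&(executable=" + executable + ")"
             + "(count=" + processors + ")"
             + "(maxWallTime=" + wallTimeMinutes + ")";
    }
}
```

The reverse direction (from MDS) is a lookup problem rather than a rendering problem, which is why the broker slides treat it via LDAP searches.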
CS602159
Interoperability Modules
The following modules have been defined:
• Security
• Resources & information
• Job preparation
• Job submission, monitoring & control
• Output retrieval & file management
CS602160
Interoperability Layer: Realisation
CS602161
Security Basics
• Public/private key infrastructure to establish connections
• X509v3 certificates (incl. extensions)
UNICORE:
• End-to-end security, jobs signed
• Keys & certificates are stored in a keystore at the client side
Globus:
• Transitive trust, proxy certificates
• Keys & certificates are stored on the file system
CS602162
Interfacing Globus through UNICORE
(Diagram) As before, the Client (holding the X.509 user certificate) connects through the Gateway to the NJS; the TSI's Globus client now holds a Globus proxy and uses GSI (Grid Security Infrastructure) enabled authentication & communication to talk to the Globus server and MDS on the Globus host, which fronts the Target System and its Uspace.
CS602163
Security Interoperation
• The Proxy Certificate Plugin generates a proxy from the UNICORE user's private key
• The proxy certificate is transferred to the user's Uspace
• The proxy is used for every task involving GSI enabled authentication & communication
• Configure the Globus client (TSI) to use the proxy
• Configure the Globus server to trust the signing CA
Details on the next slide ...
CS602164
Proxy Certificate Creation & Transfer
(Diagram) On the client, the proxy certificate is created and encapsulated in a Site-specific Security Object (SSO), which travels through the Gateway to the Network Job Supervisor (NJS); on the server the proxy is unpacked into $USPACE/.proxy in the Job Directory (Uspace).
CS602165
Resources & Information
• Globus host specific information (hostname, port, ...) is configured at the TSI
• No extensions to the UNICORE Incarnation Database
• Interoperable Resource Broker for UNICORE IDB and Globus Monitoring & Directory Service (MDS)
• Alpha version
• Currently mapping between UNICORE & MDS resource descriptions
• Extensible
CS602166
Resource Broker Architecture (early 2003)
(Diagram: John Brooke, University of Manchester) The NJS delegates the resource check to the Broker. The UNICORE Broker looks up resources in the IDB; the Globus Broker delegates translation to the Translator (a Basic Translator) and uses the Filter to drive the LDAP search, which MDS (GIIS/GRIS) performs.
GIIS – Grid Index Information Service; GRIS – Grid Resource Information Service.
CS602167
The Target System Interface (TSI)
... implements the target system/batch system specific functions to manage the incarnated tasks on the specific system.
• Normally runs as root (set*id)
• Single threaded, multiple workers to support multi-threaded NJS
• NJS – TSI communication via plain sockets
• Two implementations: Perl & Java
CS602168
TSI Flavours

Target            | Perl TSI                   | Java TSI
Batch system      | ×                          | ×
Unix Shell        | ×                          | ×
Globus 2          | ×                          | ×
Globus 3          |                            | work in progress
TSI Grid Service  | prototype using OGSI::Lite | work in progress
CS602169
TSI: Perl Implementation
Vendor       | Type                | OS        | Batch Sub-System
Hitachi      | SR 8000             | HI-UX/MPP | NQS
IBM          | SP                  | AIX       | LoadLeveler (+DCE), LSF (prototype)
Fujitsu      | VPP series          | UXP/V     | NQS
NEC          | SX series           | Super UX  | NQS
Cray         | T3E, SV1            | UNICOS    | NQE
Various PCs  | IA32 clusters       | Linux     | PBS, CCS
SGI          | O2000/3000, Onyx    | IRIX      | NQS
Workstations (e.g. SUN, SGI, Linux): native or emulated batch sub-system
CS602170
TSI: Java Implementation
... implements the same functionality as the Perl TSI.
• Alpha version
• Unix only, since it uses set*id via the Java Native Interface (JNI)
• Globus 2 version makes use of the Java CoG Kit
• Basis for interface to Globus Toolkit 3 (work in progress)
• NJS remains unchanged
CS602171
The Globus TSI
• Implemented interop. modules:
• Job preparation
• Job submission & monitoring & control
• Output retrieval & File management
• Target system: Globus Toolkit 2.x
• Perl (beta, inside firewall) & Java (alpha, outside firewall) implementations
• Both versions under development
• Current focus: GT3 & TSI Grid Service
CS602172
TSI: Architecture (Perl Implementation)
Diagram: the NJS initiates the TSI Shepherd, which forks TSI Workers; control/data flows between the NJS and the workers, and the workers issue batch/OS commands to the batch-/operating system (PBS, LSF, Linux, ...).
CS602173
Globus TSI
Diagram: as before, the NJS initiates the TSI Shepherd, which forks TSI Workers; here each worker acts as a Globus client, using the Globus proxy and Globus protocols to talk to the Globus 2 server, which in turn drives the batch-/operating system.
CS602174
Job Preparation
Incarnated UNICORE job:
#TSI_USPACE_DIR /filespace/uspace_d3d775a/
#TSI_OUTCOME_DIR /filespace/outcome_d3d775a/.../
#TSI_TIME 600
#TSI_MEMORY 1024
#TSI_NODES NONE
#TSI_PROCESSORS 1

Mapping to Globus RSL job:
&("executable"=/var/eurogrid/Globus/bin/globus-sh-exec)
 ("directory"=/filespace/uspace_d3d775a/)
 ("hostCount"="1")("count"="1")
 ("maxTime"="10")("maxMemory"="1024")
 ("queue"="low")
 ("stdout"=https://...:39553/filespace/outcome_d3d775a/.../stdout)
 ("stderr"=https://...:39553/filespace/outcome_d3d775a/.../stderr)
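This mapping is mechanical enough to sketch as a small transformation. Note the unit change: UNICORE's #TSI_TIME is in seconds (600) while RSL maxTime is in minutes (10). The function name and quoting style are ours for illustration; the real TSI is written in Perl:

```python
# Sketch of the UNICORE incarnation -> Globus RSL mapping.
# Helper name and exact quoting are illustrative, not the actual TSI code.

def tsi_to_rsl(directives: dict) -> str:
    """Map #TSI_* directives onto RSL attribute pairs."""
    pairs = [
        ("directory", directives["TSI_USPACE_DIR"]),
        ("count",     directives["TSI_PROCESSORS"]),
        # UNICORE gives run time in seconds; RSL maxTime is in minutes
        ("maxTime",   str(int(directives["TSI_TIME"]) // 60)),
        ("maxMemory", directives["TSI_MEMORY"]),
    ]
    return "&" + "".join(f'("{k}"="{v}")' for k, v in pairs)

rsl = tsi_to_rsl({
    "TSI_USPACE_DIR": "/filespace/uspace_d3d775a/",
    "TSI_PROCESSORS": "1",
    "TSI_TIME": "600",
    "TSI_MEMORY": "1024",
})
```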
CS602175
Job Submission, Monitoring & Control
Diagram: a TSI Worker uses the Globus proxy and a GRAM Client to submit the RSL job (as on the previous slide) to the GRAM Gatekeeper on the Globus 2 side; the gatekeeper creates a GRAM Job Manager, which drives the batch-/operating system; job control & status info flow back along the same path.
GRAM – Globus Resource Allocation Manager
CS602176
Output Retrieval
Diagram: the same GRAM submission path as before, plus a GASS Server on the TSI side; the GRAM Job Manager's GASS Client streams stdout & stderr back to the TSI Worker's GASS Server.
GASS – Global Access to Secondary Storage
CS602177
File Management
• Necessary if TSI & Globus on different target systems
• Usage of GridFTP or GASS (automatic staging possible)
• Maintenance of remote Uspace (“Gspace”)
Diagram: as before, but with file staging & maintenance between the local USpace and a remote GSpace on the Globus target system, alongside the GRAM/GASS job submission and output retrieval paths.
CS602178
Globus 2 Target System Interface
Diagram: the complete picture. The NJS initiates the TSI Shepherd, which forks TSI Workers; each worker performs job preparation, then uses the Globus proxy, GRAM Client and GASS Server to drive the remote Globus 2 side (GRAM Gatekeeper, which creates the GRAM Job Manager driving the batch-/operating system, plus a GASS Client for output), staging files between USpace and GSpace via Globus protocols.
CS602179
Globus API
• Submission: globusrun (returns <jobid>)
• Monitoring: globusrun –status <jobid>
• Control: globus-job-cancel <jobid>
• Output retrieval: globus-gass-server
• File Transfer: globus-url-copy (supports GridFTP & HTTP(S) for GASS transfers)
... or the corresponding Java Commodity Grid (CoG) Kit API methods (Java TSI)
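A TSI worker drives these commands roughly as in the following sketch. The command names and flags (globusrun -b -r, -status) are the real Globus 2 CLI; everything else is illustrative, and the runner is stubbed so the example is self-contained rather than invoking an installed toolkit:

```python
# Sketch of a TSI worker driving the Globus 2 command-line API.
# run() is a stub; a real worker would use subprocess.run() against
# an installed Globus toolkit.

def run(cmd: list[str]) -> str:
    """Stub: pretend to execute a Globus command and return its stdout."""
    if cmd[0] == "globusrun" and "-status" not in cmd:
        # batch submission returns the job contact URL (the jobid)
        return "https://host:32894/2796/1061744497/"
    if "-status" in cmd:
        return "ACTIVE"
    return ""

def submit(rsl: str, contact: str) -> str:
    # -b: batch mode (return immediately), -r: resource manager contact
    return run(["globusrun", "-b", "-r", contact, rsl]).strip()

def status(jobid: str) -> str:
    return run(["globusrun", "-status", jobid]).strip()

jobid = submit('&("executable"=/bin/hostname)', "host:2119")
state = status(jobid)
```

The job contact URL returned by submission is exactly the form shown in the "Behind the Scenes" slides below.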
CS602180
Behind the Scenes: Create Uspace
# Incarnation of task <JobStart>
# Incarnation produced for Vsite <zam289_grip_test> at ...
#TSI_IDENTITY zdv190 NONE
#TSI_USPACE_DIR /opt/Unicore/filespace/uspace_8fdce574/
#TSI_EXECUTESCRIPT
# Commands to incarnate a Uspace
/bin/mkdir -p -m700 /opt/Unicore/filespace/uspace_8fdce574/
/bin/mkdir -p -m700 /opt/Unicore/filespace/outcome_8fdce574/
...
CS602181
Behind the Scenes: Job Submission
# Incarnation of task <SimpleScript>
...
#TSI_IDENTITY zdv190 NONE
#TSI_USPACE_DIR /opt/Unicore/filespace/uspace_8fdce574/
#TSI_SUBMIT
#TSI_JOBNAME SimpleScript
#TSI_OUTCOME_DIR /opt/Unicore/filespace/outcome_8fdce574/AA..
#TSI_TIME 600
#TSI_MEMORY 1024
#TSI_NODES 1
#TSI_PROCESSORS 1
#TSI_HOST_NAME zam289_grip_test
...
#TSI_QUEUE low
#TSI_EMAIL NONE
...
# Incarnation of ExecuteTask, UserTask or ExecuteScriptTask
...
RETURNS: https://zam289.zam.kfa-juelich.de:32894/2796/1061744497/
CS602182
Behind the Scenes: Job Monitoring
#TSI_IDENTITY zdv190 NONE
#TSI_GETSTATUSLISTING

RETURNS: QSTAT
https://zam289.zam.kfa-juelich.de:32894/2796/1061744497/ RUNNING
CS602183
TSI Modules
• tsi: Perl script to be executed; TSI configuration; Globus server information
• Initialisation: contact NJS; create Workers; start repository process
• MainLoop: listen to NJS & process input; no changes
CS602184
TSI Modules (cont.)
• Submit: job submission to the resource manager, returns jobID; prerequisites: pre-staging complete, target job description available
• GetStatusListing: returns list of states (SUCCESSFUL, FAILED, PENDING, QUEUED, EXECUTING) for known jobIDs
• JobControl: abort, cancel, hold, resume
Job submission/monitoring/control
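For the Globus TSI, GetStatusListing has to fold the GT2 GRAM job states into the five UNICORE states. The GRAM state names below are the standard GT2 ones; the exact folding chosen here is our illustration, not necessarily the one the Globus TSI uses:

```python
# Illustrative GRAM -> UNICORE state folding for GetStatusListing.

GRAM_TO_UNICORE = {
    "UNSUBMITTED": "PENDING",
    "PENDING":     "QUEUED",
    "ACTIVE":      "EXECUTING",
    "SUSPENDED":   "QUEUED",
    "DONE":        "SUCCESSFUL",
    "FAILED":      "FAILED",
}

def status_listing(jobs: dict) -> list[str]:
    """Return 'jobid STATE' lines in UNICORE vocabulary for known jobs."""
    return [f"{jid} {GRAM_TO_UNICORE[s]}" for jid, s in jobs.items()]

lines = status_listing({"https://zam289.zam.kfa-juelich.de:32894/2796/1061744497/": "ACTIVE"})
```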
CS602185
TSI Modules (cont.)
• PutFiles: writes files sent by the NJS to the target system
• GetDirectory: returns directory & content to the NJS
• EndProcessing: job finished (check for stdout & stderr)? Close GASS server, update repository
• Reporting: logging, debugging; log Globus output
File transfer
CS602186
TSI Modules (cont.)
• BecomeUser: set*id; no changes
• ExecuteScript: execute script; no changes
• DataTransfer: GASS control
• Globus: job repository & Globus-specific variables
• JobPreparation: mapping from UNICORE job description to RSL
Globus TSI specific
CS602187
“Classic” TSI Setup
Diagram: Client behind the client firewall connects via SSL to the Gateway in the server demilitarized zone, then to the NJS; TSI & Globus server sit together on the target system, inside the server firewall.
TSI & Globus on target system & inside firewall:
+ Ignore Globus firewall issues
+ Uspace == “Gspace”
- “Restricted” interoperability -> no direct remote access
CS602188
“Remote” TSI Setup
TSI & Globus outside firewall & on different machines:
+ Interoperation with any Globus server possible
- Maintenance of temporary “Gspace”
Diagram: Client behind the client firewall connects via SSL to the Gateway in the server demilitarized zone, then to the NJS inside the server firewall; the NJS connects via SSL to a TSI outside the firewall, which talks to a Globus server on a different machine.
CS602189
UNICORE – Globus:Work in Progress
CS602190
TSI Developments
• Java TSI as Grid Service Client (GT3)
• Currently only Job Submission
• Add file transfer & other services
• TSI Grid Service & NJS – TSI protocol
• TSI portType(s) (WSDL)
• XML Schema message definition
• Perl TSI & OGSI::Lite hosting environment
CS602191
Interfacing GT3 GRAM
Diagram: the NJS initiates the TSI Shepherd, which forks TSI Workers as before; each worker's Grid Service Client calls createService (over SOAP) on the Globus 3 Master Job Factory Service (MJFS), which creates a Managed Job Service (MJS) driving the batch system. All interfaces are web services.
CS602192
TSI Grid Service
Diagram: the NJS acts as a TSI Grid Service Client, calling createService on a TSI Grid Service Factory hosted in OGSI::Lite; the factory creates TSI Grid Service Instances, which exchange SOAP messages with the NJS and issue batch/OS commands to the batch-/operating system (PBS, LSF, Linux, ...).
CS602193
Resource Broker Developments
• UNICORE ontology (basis: JavaDoc)
• Ontology for MDS (basis: GLUE schema)
• Ontology mapping
• Integrate ontology engine into broker
• Resource broker portType
• Towards a Grid resource ontology
CS602194
Resource Broker Architecture
Diagram: the Broker, hosted in the NJS, looks up static resources in the IDB, configuration in the NJS, and verifies delegated identities against the UUDB. It delegates to application-domain ExpertBroker code (DWDLMExpert, ICMExpert, Other) and to a Grid architecture-specific LocalResourceChecker engine: a UnicoreRC looking up resources via the TSI, or a GlobusRC looking up dynamic resources via GRAM/MDS. A Translator (SimpleTranslator, or an OntologicalTranslator backed by an Ontology of translations appropriate to the target Globus resource schema) delegates resource domain translation and passes untranslatable resources back to the Unicore resource checker. The broker gets back a set of resource filters and a set of untranslatable resources; a TicketManager issues a signed ticket (contract), looking up the signing identity in the UUDB.
Key: UNICORE components, EUROGRID broker, Globus components, GRIP broker; arrows denote inheritance relations.
Diagram: Donal Fellows, University of Manchester
CS602195
Other Activities
• XML Schema for UNICORE resource model
• OGSI’fication: UUDB portType, resource database portType, ...
• UNICORE Service Data Framework
• GGF: standardize portTypes, protocols, ...
• ...
GRIP is not the end
CS602196
Summary
CS602197
Interoperability Abstraction
• Single sign-on: Use SSO to transfer alternative security credentials through to the target system
• Resource discovery: extend resource broker
• Resource request: map UNICORE job description to representation needed
• Use batch system specific APIs/commands for job submission/monitoring & data transfer
• Income/Outcome staging to/from Uspace
Note: This MAY imply changes not only to the TSI
CS602198
How to Start?
• Take interoperability modules as starting point
• Consider security & resource/information representation/management carefully
• Define UNICORE client extensions if necessary
• Are server modifications necessary?
• Specify Perl modules to be implemented/changed
CS602199
Recommended Reading
• Grid Interoperability Project:http://www.grid-interoperability.org
• UNICORE software download:http://www.unicore.org/downloads.htm
• UNICORE Plus Final Report:http://www.unicore.org/documents/UNICOREPlus-Final-Report.pdf(Good intro to UNICORE)
• “An Analysis of the UNICORE Security Model”, GGF public comment period:http://sourceforge.net/projects/ggf(contains GRIP part; subsequent docs ready for submission)
CS602200
Recommended Reading (cont.)
• Java Commodity Grid Kit:http://www-unix.globus.org/cog/java/index.php(Also good intro to Globus programming)
• Globus Resource Allocation Manager:http://www-unix.globus.org/developer/resource-management.html
• “Globus Firewall Requirements”http://www.globus.org/security/v2.0/Globus%20Firewall%20Requirements-5.pdf
• OGSI::Lite – A Perl Hosting Environment:http://www.sve.man.ac.uk/Research/AtoZ/ILCT
Lecture 12 - Case Study
This case study presents the RealityGrid project. It has used most types of Grid middleware: UNICORE, Globus, and its own Perl web-services implementation, OGSI::Lite. It has also created application APIs for computational steering.
CS602202
The RealityGrid Project
Mission: “Using Grid technology to closely couple high performance computing, high throughput experiment and visualization, RealityGrid will move the bottleneck out of the hardware and back into the human mind.”
Scientific aims:
• to predict the realistic behavior of matter using diverse simulation methods (Lattice Boltzmann, Molecular Dynamics and Monte Carlo) spanning many time and length scales
• to discover new materials through integrated experiments.
CS602203
Partners
Academic:
• University College London
• Queen Mary, University of London
• Imperial College
• University of Manchester
• University of Edinburgh
• University of Oxford
• Loughborough University
Industrial:
• Schlumberger
• Edward Jenner Institute for Vaccine Research
• Silicon Graphics Inc
• Computation for Science Consortium
• Advanced Visual Systems
• Fujitsu
CS602204
RealityGrid Characteristics
• Grid-enabled (Globus, UNICORE)
• Component-based, service-oriented
• Steering is central
 – Computational steering
 – On-line visualisation of large, complex datasets
 – Feedback-based performance control
 – Remote control of novel, grid-enabled instruments (LUSI)
• Advanced Human-Computer Interfaces (Loughborough)
• Everything is (or should be) distributed and collaborative
• High performance computing, visualization and networks
• All in a materials science domain
 – multiple length scales, many "legacy" codes (Fortran90, C, C++, mostly parallel)
CS602205
Exploring Parameter Space through Computational Steering
Figure sequence:
• Initial condition: random water/surfactant mixture.
• Self-assembly starts.
• Rewind and restart from checkpoint.
• Lamellar phase: surfactant bilayers between water layers.
• Cubic micellar phase, low surfactant density gradient.
• Cubic micellar phase, high surfactant density gradient.
CS602206
Computational Steering – Why?
• Terascale simulations can generate in days data that takes months to understand
• Problem: to efficiently explore and understand the parameter spaces of materials science simulations
• Computational steering aims to short-circuit post facto analysis
 – Brute-force parameter sweeps create a huge data-mining problem
 – Instead, we use computational steering to navigate to interesting regions of parameter space
 – Simultaneous on-line visualization develops and engages the scientist's intuition
 – thus avoiding wasted cycles exploring barren regions, or even doing the wrong calculation
CS602207
Computational Steering – How?
• We instrument (add "knobs" and "dials" to) simulation codes through a steering library
• Library provides:
 – Pause/resume
 – Checkpoint and windback
 – Set values of steerable parameters
 – Report values of monitored (read-only) parameters
 – Emit "samples" to remote systems for e.g. on-line visualization
 – Consume "samples" from remote systems for e.g. resetting boundary conditions
• Images can be displayed at sites remote from the visualization system, using e.g. SGI OpenGL VizServer or Chromium
• Implemented in 5+ independent parallel simulation codes (F90, C, C++)
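Instrumenting a simulation's main loop with such a library typically looks like the following sketch. All names here are invented for illustration; RealityGrid's actual steering API is a C/F90 library, and the "steer from the client" is hard-coded so the example runs standalone:

```python
# Hypothetical steering-library usage, mirroring the functionality list
# above: steerable "knobs", monitored "dials", and sample emission.

class Steerer:
    def __init__(self, steerable: dict):
        self.params = dict(steerable)      # "knobs" the client may change
        self.monitored = {}                # "dials" reported back, read-only
        self.emitted = []                  # samples sent to e.g. a visualizer

    def check_control(self, step: int):
        """In a real library: poll the client for pause/param changes."""
        if step == 5:
            self.params["coupling"] = 0.2  # simulate a steer from the client

    def emit_sample(self, data):
        self.emitted.append(data)

steer = Steerer({"coupling": 0.1})
field = 0.0
for step in range(10):                     # the instrumented main loop
    steer.check_control(step)
    field += steer.params["coupling"]      # stand-in for real simulation work
    steer.monitored["field"] = field       # update a monitored "dial"
    steer.emit_sample(field)
```

The key design point, reflected in the Philosophy slide below, is that the loop body never knows how check_control or emit_sample talk to the outside world: filesystem, sockets or grid services.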
CS602208
Philosophy
• Provide the right level of steering functionality to the application developer
• Instrumentation of existing code for steering
 – should be easy
 – should not bifurcate the development tree
• Hide details of implementation and supporting infrastructure
 – e.g. the application should not be aware of whether communication with the visualisation system is through the filesystem, sockets or something else
 – permits multiple implementations
 – application source code is proof against evolution of implementation and infrastructure
CS602209
Steering and Visualization
Diagram: a Simulation and two Visualization components, each linked through its own steering library, with data transfer from simulation to visualizations; a Client steers all three components, and each visualization drives a Display.
CS602210
Architecture
Diagram: the same simulation/visualization/client arrangement as the previous slide. Communication modes:
• Shared file system
• Files moved by UNICORE daemon
• GLOBUS-IO
• SOAP over http/https
Data mostly flows from simulation to visualization. The reverse direction is being exploited to integrate NAMD & VMD into the RealityGrid framework.
CS602211
Steering in the OGSA
Diagram: the Simulation and Visualization components (each with a steering library) publish an associated Steering Grid Service (GS) to a Registry; the Steering client finds services in the registry, binds to the Steering GSs, and connects; data transfer between simulation and visualization is bootstrapped through the services.
CS602212
Steering in OGSA continued…
• Each application has an associated OGSA-compliant “Steering Grid Service” (SGS)
• SGS provides the public interface to the application
 – Use standard grid service technology to do steering
 – Easy to publish our protocol
 – Good for interoperability with other steering clients/portals
 – Future-proofed next step to move away from file-based steering or Modular Visualisation Environments with steering capabilities
• SGSs used to bootstrap direct inter-component connections for large data transfers
• Early working prototype of the OGSA Steering Grid Service exists
 – Based on the light-weight Perl hosting environment OGSI::Lite
 – Lets us use OGSI on a GT2 Grid such as the UK e-Science Grid today
CS602213
Steering client
• Built using C++ and the Qt library – currently have executables for Linux and IRIX
• Attaches to any steerable RealityGrid application
• Discovers what commands are supported
• Discovers steerable & monitored parameters
• Constructs appropriate widgets on the fly
• Web client (portal) under development
CS602214
RealityGrid-L2: LB3D on the L2G

program lbe
  use lbe_init_module
  use lbe_steer_module
  use lbe_invasion_module

Diagram: the simulation (LB3D instrumented with the RealityGrid steering API, as in the Fortran fragment above) is launched with GLOBUS; simulation data flows over GLOBUS-IO to the visualization (Vtk + VizServer on an SGI Onyx), which reaches a laptop via the SGI OpenGL VizServer client. Steering (XML) uses file-based communication via a shared filesystem, and the ReG steering GUI's X output is tunnelled back using ssh.
CS602215
Performance Control
Diagram: an application composed of three components; each component has its own component performance steerer, coordinated by an overall application performance steerer.
CS602216
Advance Reservation and Co-allocation: Summary of Requirements
• Computational steering + remote, on-line visualization demand:
 – co-allocation of HPC (processors) and visualization (graphics pipes and processors) resources
 – at times to suit the humans in the loop -> advance reservation
• For medium to large datasets, network QoS is important
 – between simulation and visualization
 – between visualisation and display
• Integration with Access Grid
 – want to book rooms and operators too
• Cannot assume that all resources are owned by the same VO
• Want programmable interfaces that we can rely on
 – must be ubiquitous, standard, and robust
• Reservations (agreements) should be re-negotiable
• Hard to change attitudes of sysadmins and (some) vendors
CS602217
Steering and Workflows
• Steering adds extra channels of information and control to Grid services.
• Steering and steered components must be state-aware; underlying mechanisms in the OS and in lower-level schedulers, monitors and brokers must be continually updated with changing state.
• How do we store and restore the metadata for the state of the parameter-space search?
• Human factors are built into our architecture: humans continually interact with orchestrated services. What are the implications for workflow languages?
CS602218
Collaborative Aspects
• Multiple groups exploring multiple regions of parameter space.
• How to record and restore the state of the collaboration?
• How to extend the collaboration over multiple sessions?
• What are the services and abstractions necessary to bootstrap collaborative sessions?
• How do we reliably recreate the resources required by the services, in terms of computation, visualization, instrumentation and networking?
CS602219
Integration with Access Grid?
Diagram: a Service for Bootstrapping a session contains “just enough” information to start the other services (red arrows indicate bootstrapping). It starts: a Virtual Venues Server (multicast addressing, bridges), answering “who participates?” from participants' locations and access rights; a Process Repository (collaborative processes captured using an ontology, enactable by workflow engines) and an Application Repository (using an application-specific ontology to describe which in silico processes the session needs), answering “what do they use?”; and Simulation, Visualization and Data Source Workflows (saved from previous sessions or created in this session), driven by the application's data, computation and visualization requirements.
CS602220
How far have we got?
• Linking US Extended Terascale Facilities and UK HPC resources via a trans-Atlantic Grid
• We used these combined resources as the basis for an exciting project
 – to perform scientific research on a hitherto unprecedented scale
• Computational steering, spawning and migration of massive simulations for the study of defect dynamics in gyroid cubic mesophases
• Visualisation output was streamed to distributed collaborating sites via the Access Grid
• Workshop presentation with FZ Juelich and HLRS, Stuttgart on the theme of computational steering
• At Supercomputing, Phoenix, USA, November 2003, the TRICEPS entry won “Most Innovative Data-Intensive Application”
CS602221
Summary
• All our workflow concepts are built around the idea of Steerable Grid Services.
• Resources used by services have complex state, may migrate, and may be reshaped.
• Collaborative aspects of “humans in the loop” are becoming more and more important.
• The problems of allocating and managing the resources necessary for realistic modelling are very hard; at present they require getting below the Grid abstractions.
• Clearly the Grid abstractions are not yet sufficiently comprehensive and, in particular, lack support for expressing synchronicity.