ws – pgrade tutorial mta sztaki laboratory of parallel and distributed systems (lpds)
Post on 03-Jan-2016
22 Views
Preview:
DESCRIPTION
TRANSCRIPT
WS – PGRADE Tutorial
MTA SZTAKILaboratory of Parallel and Distributed Systems (LPDS)
M. KozlovszkyResearch fellowm.kozlovszky@sztaki.hu
www.lpds.sztaki.hu
2
Contents•History, family of products•P-GRADE Portal, WS-PGRADE, gUSE
•WS-PGRADE in a nutshell•Architecture, high level structures, Workflow handling•WS-PGRADE features
•Parameter Study (PS) support in WS-PGRADE•Generator-job-collector type PS•Xconnect-DOTconnect type PS
•Repository•Internal Storage (home dir)•Automated workflow submission & Timing•Web services•Embedded Workflows•DB support•Conditional job submission
•Howto use WS-PGRADE /Basic + Advanced tasks/ •Case studies
•CancerGrid portal, TINKER application
3
Family of P-GRADE Portal products
•P-GRADE portal•Creating (basic) workflows and parameter sweeps for clusters, service grids, desktop grids•www.portal.p-grade.hu
•P-GRADE/GEMLCA portal (University of Westminster)•To wrap legacy applications into Grid Services•To add legacy code services to P-GRADE Portal workflows•http://www.cpc.wmin.ac.uk/cpcsite/gemlca
•WS-PGRADE•Creating complex workflow and parameter sweeps for clusters, service grids, desktop grids, databases•Creating complex applications using embedded workflows, legacy codes and community components from workflow repository•www.wspgrade.hu
4
Motivations of creating gUSE
•To overcome (most of) the limitations of P-GRADE portal:
•To provide better modularity to replace any service•To improve scalability to millions of jobs•To enable advanced dataflow patterns•To interface with wider range of resources•To separate Application Developer view from Application User view
WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment)
and gUSE (Grid User Support Environment) architecture
5
References and WS-PGRADE/gUSE installations
• WS-PGRADE Portal service is available for• GILDA - Training VO of EGEE and other projects• SEE-GRID - South-Eastern European Grid• VOCE - Virtual Organization Central Europe of
EGEE• HunGrid - Hungarian Grid VO of EGEE• NGS - UK National Grid Service • Desktop Grids: SZTAKI DG
• Users and projects using WS-PGRADE/gUSE• EDGeS project (Enabling Desktop Grids for e-
Science)• Integrating EGEE with BOINC and XtremWeb
technologies• User interfaces and tools
• ProSim project• In silico simulation of intermolecular recognition • JISC ENGAGE program
• University of Westminster Desktop Grid• Using AutoDock on institutional PCs
• CancerGrid project• Predicting various properties of molecules to find anti-
cancer leads• Creating science gateway for chemists
6
WS-PGRADE in a nutshell
•General purpose, workflow-oriented portal. Supports the development and execution of workflow-based applications•Based on GridSphere•Services supported by the portal:•New functionalities
•Web services•DB connectors•Embedded workflows•Job level PS•Conditional jobs•Recursive graph•Multi-generator•Multi-collector•CROSS product PS•DOT product PSSolves service+desktop Grids/clusters interoperability problems at workflow level
WS-PGRADE + gUSE = P-GRADE =
Service EGEE grids
(LCG2,GLite)
Globus grids
(GT2,GT4)
Desktop
grids
clusters
Job execution
File storage
Certificate management
Information system
Brokering
Job monitoring
Workflow & job visualization
7
WS-PGRADE architecture
Graphical User Interface: WS-PGRADEGraphical User Interface: WS-PGRADE
WorkflowEngine
WorkflowEngine
Workflowstorage
Workflowstorage File
storage
Filestorage
Applicationrepository
Applicationrepository
LoggingLogging
gUSEinformation
system
gUSEinformation
system
SubmittersSubmitters
Gridsphere portlets
Autonomous Services: high level
middleware service layer
Resources: middleware service layer
Local resources, service grid VOs, Desktop Grid resources, Web services, Databases
Local resources, service grid VOs, Desktop Grid resources, Web services, Databases
gUSE
Meta-brokerMeta-broker SubmittersSubmittersSubmittersSubmitters
Filestorage
Filestorage
SubmittersSubmitters
8
Concrete Workflow
Algorithms, executableResource references,Inputs
Graph
Jobs,Edges,Ports
Template
Constraints,Comments,Form Generators
Workflow Instance
Running state,Outputs
Repository Item
Application ORProject OR,Workflow part(G,T,CW)
Important high-level graph structures in WS-PGRADE
Legend:a b a must reference ba b a may reference b
9
Concrete Workflow
Algorithms,Resource references,Inputs
Graph
Jobs,Edges,Ports
Template
Constraints,Comments,Form Generators
Workflow Instance
Running state,Outputs
Repository Item
Application ORProject OR,Workflow part(G,T,CWI)
New
Edit, CopyDelete
New
New
Configure,Copy, Delete
New
SubmitNew
Workflow handling by Developer
Export
Import
Observe,Download,Suspend,Delete
Edit
1. Create and edit the Graph structure
2. Add name to the graph and configure it (job, port, resources, etc.)
3. Submit the Concrete Workflow, observe its status and fetch its result
4. Restrict some parameters/features of the workflow and create a template
5. Add name to the template and configure it (job, port, resources, etc.)
8. Redefine the originally used graph within the workflow
6. Export workflow, template, graph, to the repository (reuse it later by you or by end users)
Graph2
Jobs,Edges,Ports
7. Reuse an Application, Template, graph or Workflow from repository
10
Concrete Workflow
Algorithms,Resource references,Inputs
Template
Constraints,Comments,Form Generators
Workflow Instance
Running state,Outputs
Repository Item
Application ORProject OR,Workflow part(G,T,CW,WI)
Configure,Delete
Submit
Workflow handling by end user
Import
Observe,Download,Suspend,Delete
1. Import an Application/Template
2. Set the available parameters on the end-user View
3. Submit the Concrete Workflow, observe its status and fetch its result
11
Contents•History, family of products•P-GRADE Portal, WS-PGRADE, gUSE
•WS-PGRADE in a nutshell•Architecture, high level structures, Workflow handling•WS-PGRADE features
•Parameter Study (PS) support in WS-PGRADE•Generator-job-collector type PS•Xconnect-DOTconnect type PS
•Repository•Internal Storage (home dir)•Workflow activation & Timing•Web services•Embedded Workflows•DB support•Conditional job submission
•Howto use WS-PGRADE /Basic + Advanced tasks/ •Case studies
•CancerGrid portal, TINKER application
12
Parameter Study (PS) support in WS-PGRADE•Generator-job-collector type PS•Not totally similar concept as in P-GRADE•P-GRADE: workflow level PS•WS-PGRADE: job level PS
•Xconnect-DOTconnect type PS• X (cross) product• . (dot) product
16
Repository usage & services •Share with others the developed
• Graphs• Workflows - Configured graphs• Templates - Restricted workflows• Applications - Semantically tested, trusted Workflows containing the definitions of the eventually referenced Embedded Workflows (including their transitive closures)• Projects - Applications which are not ready/fully functional yet
•Build up a science/application gateway for end-users• Restrict certain workflow parameters• Provide easy workflow configuration/parameter setup interface• Repository is acting like an interface between developers and end usersNote: Exported/imported “Projects” are not checked for completeness and correctness “Applications” are checked. Name collisions are checked and handled by the system.
17
Repository Export Step 1:The button Export of the requested WF is selected
Step 2:Decide about the way of exporting (in case of Application and Project the Embedded Workflows will be exported with)
Step 3:Enter free input text which appearing in the Import List will describe and identify the object for the user.
Step 4:Execute the Export
Way of working is similar in case of Graph and Template just “Export type” can not be select there
18
Repository import views
•Materials are classified by their type•Latest available lists
• Application • Project• Concrete• Template• Graph
•To import
19
Columns of individual instances, please note, that outputs can be downloaded separately
Internal Storage (user workspace)
Inner slider to encounter and access each Instances Columns of bulk download: All or
proper parts of all instances of a given WF can be downloaded
Information about the quota of the user allotted storage capacity in the Portal server
20
Upload Implementation to the internal Storage
Step 1:Select the compressed file in the client machine containing the requested Workflow Step 2(option):
Check the kind of name(s) you want to redefine
Step 3 (If Step 2 performed):Enter the new name(s) which will not collide
Step 4:Confirm the operation
21
Workflow submission
A workflow can be submitted by 3 different way:1. Interactively started by the user hitting the button
Submit belonging to the given concrete workflow on the portlet Workflow/Concrete (default)
2. Started by a –crontab like - predefined time schedule.The corresponding timetable can be set in the portlet Workflow/Timing
3. Started by an external event. The corresponding event to be waited for and the name of the Workflow can be defined in the portlet Workflow/Remoting
22
Workflow Submission(1)Interactively by the user
Step 1:The workflow is selected by button “Submit”
Step 2:The submission can be confirmed or refused after the optional filling of a free description field identifying Workflow Instance for the user
23
Workflow Submission(2) Start automatically by an internal timer
List of scheduled Workflows
Definition of a new element to be added
Checklist to select an existing Workflow to
be added
Tool to define submission time
Confirm button to append a
new element to the list
Item can be revoked deleting
it from the listThe items will leave the list on schedule time expiration even if the Workflow submission has failed by any cause
•Start a workflow at a predefined time•Valid certificate should be available (if grid is used)•Crontab like working•Single run
24
Workflow Submission(3) Start by external event
Two parts must be defined:•On the Portal Server Side - the name of the Workflow and - the identifier “Remote Key”
•On the Client Side inside of a WSDL description - the URN of service call- the name of the service- the owner of the workflow (user)- the identifier “Remote Key”
- a free string to be associated to Workflow Instance
Note, that a Service call referencing a common Remote Key used in more than one Portal Server Side description submits all the associated Workflows
25
Workflow Submission(3) Start by external event – Portal Side
Name of WF to be appended to the list of remotely callable WF- s
Definition of Keyword to identify the call
Any Workflow can be deleted from the listeners.
The delete button can accessed only in the not
hidden state
List of listening Workflows
Button to hide the stored Keyword and the button
Delete
Append button
Button to see a stored Keyword
Check list can select each of the
available workflows
26
Workflow Submission(3) Start by external event – Client side: WSDLURL of the
Portal SERVER Predefined name of the Service
pUser Owner of the Workflow (Portal User)
pID Remote Key has been defined to identify the call
pText Free string to identify the Workflow instance
The client should provide a Service call conforming the WSDL.
27
Web service support
Principle:
• Job is a web service user can reach an existing remote service with the following attributes:1. Type:
Base standard type of the Web Service. The administrator of the portal server sets the list of standards the portal can understand. The default value is Axis.
2. Services:Selection list of services of the given type having been explored by the Portal Server
3. Methods:Selection list functions the selected service implements
Concrete WF List
Selected WF
Selected Job
Job Executable Job Inputs & Outputs JDL/RSL Job Config. History
Job is Workflow Job is Service Job is Binary
28
Web service support (contd.)
Step 1. Select Service as Job interpretation class
Step 2. Select type of Service to be understood
Step 3. Select one from the found services
Step 4. Select a Method among the interface routines of Service
30
Embedded workflow support
Principle:
Job is an embedded workflow
Original Workflow
Embedded WorkflowTo ensure the compatibility of interfaces the embedded workflow must be defined by a Template
The dummy job whose execution will be substituted by the call of the embedded one
31
Embedded workflow support
Focus on the caller Job
Step 2. Embedded (called) Workflow is selected by the check list showing the possible templated Workflows
Focus has been set by selecting a job
Step 1. Select (embedded) Workflow as Job interpretation class
34
DB support (Datasource: SQL Database)
<any>.mdbDepartments
24ML
26FK
15MA
AgeGenderName
Person
<Db_URL>
External File name
SQL
Upload
Remote
Value
SQL URL (UDBC)
USER
PASSWORD
SELECTAge from Person where Gender =“M”
<DB_URL>
<any_user>
015 1
24
Step 1 (offline)Create Database on a remote site
Step 2 (online)Port configuration:set Query
Step 3(online)Port configuration:File generation from result set
35
Conditional job submissionRule: To any input port a boolean conditional expression may be attached.They will be evaluated upon the value(s) of the referenced port(s). The false value inhibits the execution of the actual and subsequent jobs.The state of the failed job is indicated as “Term_false” the eventual subsequent jobs will remain in “Init” state.
Expression Syntax: <File value> <Operator> <Constant string> | <File value> <Operator> <reference of a foreign port>where <Operator>: == | != | contain
The comparison of the file value (regarded as a string) associated to the current port and of other operand which may be a string or the string file content of a different port results true or false depending on the string operation, where == means equal, != means not equal, and contain means, that the left operand contains the right operand.
AND operation is assumed for more than one ports associated with conditional expressions: B1
B2
Do not execute if
B1 AND B2 = FALSE
36
Howto use WS-PGRADE Basic + Advanced tasks
• Create Graph, or reuse an existing one• Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs)• Submit and check the running Workflow• Create template and share it as application through the repository
37
Graph Editor - Graph Creation There are two ways to create a Workflow Graph (WfG):
•Opening the Graph Editor (with the proper button) in the portlet Workflow/Graph
•Clone an existing one (Saving the actual Graph with a new name)
38
Graph Editor detailed - Build up a WF Graph
New Job New Port Right Click Port Property
Select Output
Port name can be changed
Comment can be inserted
Close by OK
New JobNew Port
Press mouse…
,and drag to Input Port
39
Howto use WS-PGRADE Basic + Advanced tasks
• Create Graph, or reuse an existing one• Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs)• Submit and check the running Workflow• Create template and share it as application through the repository
40
Development cycle – Creating the Concrete Workflow•Create Concrete Workflow from Graph, template, different workflow• give a name to the Concrete Workflow• give notes
41
Howto use WS-PGRADE Basic + Advanced tasks
• Create Graph, or reuse an existing one• Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs)• Submit and check the running Workflow• Create template and share it as application through the repository
42
Setup Concrete Workflow
•Setup all the job properties•Execution model:
•Workflow, Service, Binary
•Type:GEMLCA,GLITE,DG,GT-4,GT-2,Local•Grid resource•Type of binary
•SEQ, MPI, Java
•Number of MPI Nodes•Executable•Additional parameters
•Setup all the port properties
43
Workflow configuration Hierarchy
Concrete WF List
Selected WF
Selected Job
Job Executable Job Inputs & Outputs JDL/RSL Job Config. History
Job is Workflow Job is Service Job is Binary
44
Workflow Configuration; step-by-step
Select Configure
Select a job by mouse click
Fill the job property characteristics.Details have discussed previously.
Select Port Property Configuration
Fill port property characteristics. Details have discussed previously.
Select JDL/RSL Configuration
Select one of the JDL/RSL Configuration Parameters of the list box
Insert a definition
Confirm the settingsClose the configuration of this job
Save & Upload the Workflow configuration. Remind eventual error messages!
Return to main view
Note the inner slider:By moving it you can encounter –and make visible – any Port of the current job
45
Howto use WS-PGRADE Basic + Advanced tasks
• Create Graph, or reuse an existing one• Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs)• Submit and check the running Workflow• Create template and share it as application through the repository
46
Concrete WF List
Workflow
History File
Selected Job
Job Exec Conf Port Configurations JDL/RSL file
Workflow instance List
Workflow Instance
Job List
Details Configure
Short Details
Std Out
Job
Job Instance List
View Contents
Job Instance
Output FileSTD Error File STD Out File Log File
Std ErrOutputLog
Job List
Info,
Submit,
Delete/Abort All
Suspend/Resume/Delete,
Visualize
Job Executable
WF visualization
Job Inputs & Outputs JDL/RSL
Job Config. History
Locate Item in graph
Locate Item
Locate Item
Locate Item
Locate ItemBasic Workflow
management with WS-PGRADE Portal
In case of PS Workflows the list “Job Instance” may contain more than one elements with
different “PID” -s
47
Howto use WS-PGRADE Basic + Advanced tasks
• Create Graph, or reuse an existing one• Create Concrete Workflow from Graph • Setup Concrete Workflow (ports, jobs)• Submit and check the running Workflow• Create template and share it as application through the repository
49
Contents•History, family of products•P-GRADE Portal, WS-PGRADE, gUSE
•WS-PGRADE in a nutshell•Architecture, high level structures•Workflow handling
•By developers•By end users
•WS-PGRADE features•Parameter Study (PS) support in WS-PGRADE
•Generator-job-collector type PS•Xconnect-DOTconnect type PS
•Workflow timing•web service support •Embedded workflows•DB support
•Basic Workflow management with WS-PGRADE Portal•Repository usage•Principles of Workflow management
•Case studies•ProSim portal, CancerGrid portal, TINKER application
50
• EU Framework Program 6• Title: Grid Aided Computer System For Rapid Anti-
Cancer Drug Design • Project period
• January 1, 2007 – December 31, 2009 (+ extended period)
• Goals:• Developing focused libraries with a high content of anti-
cancer leads, building models for predicting various molecule properties
• Developing a computer system based on grid technology, which helps to accelerate and automate the in silico design of libraries for drug discovery processes
The CancerGrid project (Case Study 1)
51
moleculedatabase
executingworkflows
browsing molecules
DG clients from all partners Molecule database server
Portal and DesktopGrid
server
BOINCserver
3GBridge
Portal
DG jobs
WU 1WU 2WU N
Job 1Job 2Job N
GenWrapper forbatch execution
BOINC client
LegacyApplication
PortalStorage
LocalResource
Local jobs
LegacyApplication
WU X
WU Y
The CancerGrid infrastructure
52
The CancerGrid portal (gUSE & SZTAKI DG)
Workflow development & configuration
Algorithms configuration
Workflow execution
Molecule database browser
Integrated components of CancerGrid portal
Structure viewer
53
TINKER Conformer Generator (Case study 2.)P-GRADE/gUSE Workflow - background
• The application uses the open-source TINKER library, which contains tool for molecular design:•http://dasher.wustl.edu/tinker/
• The TINKER molecular modeling software (written in Fortran language) is a complete and general package for molecular mechanics and dynamics, with some special features for biopolymers.
54
Problem: The application to be gridified•A combination of these tools can be used for molecular modeling (so-called QSAR studies for drug development). The original application was developed by a Hungarian biochemist Ferenc Ötvös, PhD, from the Biological Research Center, Szeged, Hungary.
• Target end users are biologists or chemists, who need to examine conformers with the TINKER package.• The application generates conformers by unconstrained molecular dynamics at high temperature to overcome conformational bias then finishes each conformer by simulated annealing and/or energy minimization to obtain reliable structures.• The aim is to obtain conformation ensemble(s) to be evaluated by multivariate statistical modeling.
55
Problem: The application to be gridified (contd.)
•The sequential application generates 50.000 conformers and executes TINKER algorithms on them. It runs for about 7 days on a 2 GHz single CPU 1 GB memory machine.
START Read input
Peptidedefinition:Sequence,Chirality
Generate 1000 Ktrajectory snapshots
ParameterFile 1
T1 Ti Tn
minimize
ParameterFile 2
TM1
300 K dynamics
ParameterFile 3
TD1 minimize
ParameterFile 2
TDM1
Simulated annealing
ParameterFile 4
TSA1 minimize
ParameterFile 2
TSAM1
n = 50.000
56
Gridification process…
• In the original sequention application, the TINKER algorithms are called by Bash unix scripts, which are also applicable in EGEE grids.
• As it can be seen from the previous figure:
• The sequential generation of 50.000 conformers cannot be parallelized.• The rest of the application performs 5 algorithms on each conformers in 3 threads, which can run in parallel.
The gridified P-GRADE workflow
The gridified gUSE workflow
57
Data handling•The input TINKER package is 4.5 MBs.•The generator job (runs for around 15 hours) creates 50 tarballs containing 1000 conformers each, and the TINKER package. Each one of these files are 6.1 MBs.•Offline runtime and output of the PS jobs:
•First (dyn300k+min300k): ~40 min, 2.2 MBs•Second (min1000k): ~35 min, 1.1 MBs•Third (Simann+min): ~60 min, 2.2MBs
•The final output file is around 300 MBs.•The generator and collector input and output files are stored in storage elements. •Results:
• Tested in 3 VOs (VOCE, SEE-GRID-SCI, Biomed)• A significant speedup with 7 times faster execution achieved in average.
58
Thank you for your attention!Questions?
top related