VPH NoE, 2010 Accessing European Computing Platforms Stefan Zasada University College London


Page 1:

VPH NoE, 2010

Accessing European Computing Platforms

Stefan Zasada

University College London

Page 2:

Contents

• Introduction to grid computing
• What resources are available to VPH?
• How to gain access to computational resources
• Deploying applications on EU resources
• Making things easier with VPH ToolKit components
• Coupling multiscale simulations
• AHE demonstration
• Going further

Page 3:

What is a Grid?

Allow resource sharing between multiple, distinct organisations

Cross administrative domains

Security is important – only authorized users are able to use resources

Differs from cluster computing

“Computing as a utility” – Clouds too?

Page 4:

Resources Available to VPH Researchers

DEISA
Heterogeneous HPC infrastructure currently formed by eleven European national supercomputing centres that are interconnected by a dedicated high performance network.

EGI/EGEE
Consists of 41,000 CPUs available to users 24 hours a day, 7 days a week, in addition to about 5 PB disk (5 million gigabytes) plus tape MSS storage, and maintains 100,000 concurrent jobs.

EGI includes national grid initiatives
Clouds – Amazon EC2 etc.

Page 5:

Compute Allocations for VPH

Conducted a survey of all VPH-I projects and Seed EPs to assess their computational requirements.

This was used to prepare an application to the DEISA Virtual Communities programme for an allocation of resources to support the work of all VPH-I projects, managed by the NoE.

We have set up these VPH VC allocations and run them for the past 2 years – now extended until the end of DEISA in 2011

Page 6:

EGEE/EGI
EGEE has transformed into EGI and will start to integrate national grid infrastructures.

Consists of 41,000 CPUs available to users 24 hours a day, 7 days a week, in addition to about 5 PB disk (5 million gigabytes) plus tape MSS storage, and maintains 100,000 concurrent jobs.

EGEE/EGI have agreed to provide access to VPH researchers through their BioMed VO. Ask for more details if you would like to use this.

If you would like access contact [email protected]

Page 7:

DEISA Project Allocations

VPH was awarded 2 million standard DEISA core hours for 2009, renewed for 2010 and 2011:

• HECToR (Cray, UK)

• SARA (IBM Power 6, Netherlands)

+ euHeart in second wave, and other non-VPH EU projects

Page 8:

Choosing the right resources for your application

Parallel/MPI/HPC applications:

Use DEISA (some DEISA sites have policies to promote large-scale computing)

Embarrassingly parallel/task farming

Use EGI/National Grid Initiative

Split multiscale application between appropriate resources
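The resource-selection guidance above can be sketched as a small lookup. This is purely illustrative: the job categories and resource names come from the slide, but the function itself is not part of any AHE or DEISA tooling.

```python
# Hypothetical helper mirroring the slide's guidance: tightly coupled
# MPI/HPC jobs go to DEISA, embarrassingly parallel task farms to an
# EGI/National Grid Initiative resource. Names are illustrative only.

def choose_resource(job_type: str) -> str:
    """Map a job class to the infrastructure suggested on this slide."""
    mapping = {
        "parallel-mpi": "DEISA",    # large-scale, tightly coupled HPC
        "task-farm": "EGI/NGI",     # embarrassingly parallel workloads
    }
    try:
        return mapping[job_type]
    except KeyError:
        raise ValueError(f"unknown job type: {job_type}")

print(choose_resource("parallel-mpi"))  # DEISA
print(choose_resource("task-farm"))     # EGI/NGI
```

A multiscale application would call something like this per sub-model, splitting its components across the appropriate resources.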

Page 9:

Uniform access to resources at all scales

Middleware provides the underlying computational infrastructure

Allows the different resources to be used seamlessly (grid, cloud etc).

Automates machine to machine communication with minimal user input

Virtualises resources – user need know nothing of the underlying operating systems etc

Requires X.509 certificate!

Page 10:

Accessing DEISA/EGEE

DEISA
Provides access through Unicore/SSH (some sites have Globus)
Applications compiled at terminal, then submitted to batch queue or via Unicore

EGI/EGEE
Accessed via gLite; often submit a job to compile

National Grid Initiatives
Should run gLite if part of EGI
May run Globus/Unicore

Page 11:

The Application Hosting Environment

Based on the idea of applications as web services

Lightweight hosting environment for running unmodified applications on grid resources (DEISA, TeraGrid) and on local resources (departmental clusters)

Community model: expert user installs and configures an application and uses the AHE to share it with others

Simple clients with very limited dependencies

Non-invasive – no need for extra software to be installed on resources

http://toolkit.vph-noe.eu/home/tools/compute-resources/tools-for-accessing-compute-resources/ahe.html

Page 12:

Motivation for the AHE

Problems with current middleware solutions:

Difficult for an end user to configure and/or install

Dependent on lots of supporting software also being installed

Require modified versions of common libraries

Require non-standard ports to be opened on firewall

We have access to many international grids running different middleware stacks, and need to be able to seamlessly interoperate between them

Page 13:

AHE Functionality

Launch simulations on multiple grid resources

Single interface to monitor and manipulate all simulations launched on the various grid resources

Run simulations without manually having to stage files and GSISSH in

Retrieve files to local machine when simulation is done

Can use a combination of different clients – PDA, desktop GUI, command line

Page 14:

AHE Design Constraints

Client does not require any other middleware installed locally

Client is NAT'd and firewalled

Client does not have to be a single machine

Client needs to be able to upload and download files but doesn't have a local installation of GridFTP

Client doesn’t maintain information on how to run the application

Client doesn’t care about changes to the backend resources

Page 15:

AHE 2.0 Provides a Single Platform to:

Launch applications on Unicore and Globus 4 grids by acting as an OGSA-BES and GT4 client

Create advanced reservations using HARC and launch applications in to the reservation

Steer applications using the RealityGrid steering API or GENIUS project steering system

Launch cross-site MPIg applications

AHE 3 plans to support:

SPRUCE urgent job submission

Lightweight certificate sharing mechanisms

Page 16:

Virtualizing Applications

Application Instance/Simulation is the central entity, represented by a stateful WS-Resource. State properties include:

simulation owner

target grid resource

job ID

simulation input files and URLs

simulation output files and URLs

job status
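The state properties listed above can be sketched as a plain data structure. This dataclass is only an illustration of the WS-Resource state, not the actual AHE implementation; the field names simply mirror the slide.

```python
# Illustrative sketch of the per-simulation WS-Resource state.
from dataclasses import dataclass, field

@dataclass
class SimulationResource:
    owner: str                   # simulation owner
    target_resource: str         # target grid resource
    job_id: str                  # job ID assigned by the middleware
    input_files: dict = field(default_factory=dict)   # filename -> URL
    output_files: dict = field(default_factory=dict)  # filename -> URL
    status: str = "PENDING"      # job status

# A client would create one such resource per hosted simulation:
sim = SimulationResource(owner="alice", target_resource="HECToR",
                         job_id="job-42")
sim.input_files["config.xml"] = "https://example.org/staging/config.xml"
sim.status = "RUNNING"
```

Because the state lives server-side in the AHE, any client (GUI, command line, PDA) can query or manipulate the same simulation.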

Page 17:

AHE Reach

[Diagram: AHE reach across DEISA (UNICORE), the UK NGS (Leeds, Manchester, Oxford, RAL), HPCx, TeraGrid (Globus) and local resources, via GridSAM, Globus and UNICORE middleware]

Page 18:

Cross site access


[Diagram: a scientific application (e.g. the bloodflow simulation HemeLB) uses the Application Hosting Environment (AHE) as science-specific client technology to access an HPC resource within NGS via the TeraGrid Globus 4 stack (GRAM 4 and GridFTP interfaces, security policies with gridmaps, GLUE-based information) and an HPC resource within DEISA via UNICORE with an OGSA-BES interface (security policies e.g. XACML, GLUE-based information), plus storage resources (e.g. tape archives, robots); both sites provide a common environment (DEISA modules) for massively parallel HemeLB jobs]

Stefan Zasada, Steven Manos, Morris Riedel, Johannes Reetz, Michael Rambadt et al., Preparation for the Virtual Physiological Human (VPH) project that requires interoperability of numerous Grids

Page 19:

AHE Deployment

AHE Server

Released as a VirtualBox VM image – download image file and import to VirtualBox

All required services, containers etc pre-configured

User just needs to configure services for their target resources/applications

AHE Client

User’s machine must have Java installed

User downloads and untars client package

Imports X.509 certificate into Java keystore using provided script

Configures client with endpoints of AHE services supplied by expert user

Page 20:

Hosting a New Application

Expert user must:

Install and configure application on all resources on which it is being shared

Create an XSLT template for the application (easily cloned from existing template)

Add the application to the RMInfo.xml file

Run a script to reread the configuration

Documentation covers whole process of deploying AHE & applications on NGS and TeraGrid
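The registration step above can be sketched roughly as follows. The real RMInfo.xml schema is not reproduced in this talk, so the element names (`Application`, `Executable`) and the helper function are assumptions for illustration only.

```python
# Hypothetical sketch of an expert user registering a new application
# in an RMInfo.xml-style configuration file. The element and attribute
# names are assumed, not taken from the actual AHE schema.
import xml.etree.ElementTree as ET

def add_application(rminfo_root: ET.Element, name: str, executable: str) -> None:
    """Append an application entry to the (assumed) RMInfo structure."""
    app = ET.SubElement(rminfo_root, "Application", attrib={"name": name})
    ET.SubElement(app, "Executable").text = executable

root = ET.Element("RMInfo")
add_application(root, "date", "/bin/date")  # e.g. hosting /bin/date
print(ET.tostring(root, encoding="unicode"))
```

After editing the file, the expert user would run the provided script to make the AHE server reread its configuration.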

Page 21:

Authentication via local credentials

A designated individual puts a single certificate into a credential repository controlled by our new “gateway” service.

User uses a local authentication service to authenticate to our gateway service. Our gateway service provides a session key (not shown) to our modified AHE client and our modified AHE Server to enable the AHE client to authenticate to the AHE Server.

Our gateway service obtains a proxy certificate from its credential repository as necessary and gives it to our modified AHE Server to interact with the grid.

User now has no certificate interaction. Private key of the certificate is never exposed to the user.

Audited Credential Delegation

[Diagram: the user, via our modified AHE client and a local authentication service, talks to our "gateway" service; the gateway draws on a credential repository holding the certificate deposited by a PI, sysadmin, etc., and passes credentials to the modified AHE Server, which interacts with the grid middleware and computational grid]

Page 22:

ACD Architecture

[Diagram: the gateway service combines a credential repository (project/resource certificates, proxy certificates and keys, proxy user IDs), a local database authentication server (DB_LAS holding username/password pairs u1,pw1 … u4,pw4), an authorization module implementing parameterized role-based access control (RolePermission: Role → [Tasks]; UserRole: UserID → [Role]), and trusted/revoked CA lists]

A recent AHE + ACD usability study found the combined solution to be much more usable than other interfaces
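The parameterized role-based access control in the authorization module can be sketched minimally as below. The two tables mirror the diagram's UserRole and RolePermission mappings, but the data layout and function are illustrative, not the actual ACD schema.

```python
# Minimal sketch of parameterized RBAC: a user maps to roles, each role
# is parameterized by a project, and each (role, project) pair maps to
# a set of permitted tasks. All names here are illustrative.
user_roles = {"u1": [("researcher", "VPH")]}          # UserRole: UserID -> [Role]
role_tasks = {("researcher", "VPH"): {"submit_job"}}  # RolePermission: Role -> [Tasks]

def is_authorized(user: str, project: str, task: str) -> bool:
    """Check whether a user may perform a task within a given project."""
    for role in user_roles.get(user, []):
        if role[1] == project and task in role_tasks.get(role, set()):
            return True
    return False

assert is_authorized("u1", "VPH", "submit_job")
assert not is_authorized("u1", "VPH", "delete_data")
```

Parameterizing roles by project is what lets one gateway serve many VPH-I projects from a single credential repository.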

Page 23:

Deploying multiscale applications

Loosely coupled:
• Codes launched sequentially – output of one app is input to next
• Could be distributed across multiple resources
• Perl scripts
• GSEngine

Tightly coupled:
• Codes launched simultaneously
• Could be distributed across multiple resources
• RealityGrid Steering API
• MPIg

Page 24:

Workflow Tools

Many tools exist to allow users to connect up applications running on a grid into a workflow: Taverna, OMII-BPEL, Triana, GSEngine, scripts

Most workflow tools interface to and orchestrate web services

Workflows can also include data access, data processing and analysis services

Can assist with connecting different VPH models together

Page 25:

Constructing workflows with the AHE

By calling the command-line clients from a Perl script, complex workflows can be achieved.

Easily create chained or ensemble simulations.

For example, jobs can be chained together:

ahe-prepare – prepare a new simulation for the first step

ahe-start – start the step

ahe-monitor – poll until step complete

ahe-getoutput – download output files

repeat for next step
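The chained steps above can be sketched as the following control flow. A real workflow would shell out to the AHE command-line clients (e.g. from Perl, or via Python's subprocess); here each client call is replaced by a stub so the loop structure is runnable without an AHE deployment.

```python
# Sketch of the prepare -> start -> poll -> getoutput chain. The ahe-*
# commands are stubbed out; replace the stubs with subprocess calls to
# the actual AHE clients in a real deployment.
import time

def run_step(step: int, poll_interval: float = 0.01) -> str:
    prepare(step)                    # stand-in for: ahe-prepare
    start(step)                      # stand-in for: ahe-start
    while monitor(step) != "DONE":   # stand-in for: ahe-monitor (polled)
        time.sleep(poll_interval)
    return get_output(step)          # stand-in for: ahe-getoutput

# Stub implementations so the sketch executes; each step reports DONE
# after a few polls.
_polls = {}
def prepare(step): _polls[step] = 0
def start(step): pass
def monitor(step):
    _polls[step] += 1
    return "DONE" if _polls[step] >= 3 else "RUNNING"
def get_output(step): return f"output-of-step-{step}"

# Chain three steps: the output of one step feeds the next.
results = [run_step(s) for s in range(1, 4)]
print(results)
```

An ensemble simulation is the same pattern with the steps launched concurrently instead of in sequence.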

Page 26:

Constructing a workflow

[Diagram: ahe-prepare → ahe-start → ahe-monitor → ahe-getoutput]

Page 27:

Computational Techniques

Ensemble MD is suited for HPC grid

• Simulate each system many times from same starting position
• Each run has randomized atomic energies fitting a certain temperature
• Allows conformational sampling

[Diagram: from a start conformation, an equilibration protocol (eq1–eq8) launches simultaneous runs (60 sims, each 1.5 ns); the series of runs yields end conformations C1, C2, C3, C4 … Cx]
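The ensemble setup above (60 simultaneous runs of 1.5 ns each, with randomized atomic energies per run) can be sketched as below. The job-description fields are illustrative; in practice each entry would become an AHE job submission.

```python
# Sketch of ensemble MD job generation: same starting conformation,
# many runs, each with its own random seed for the initial velocities.
import random

def build_ensemble(n_runs: int = 60, length_ns: float = 1.5, seed: int = 0):
    """Return one illustrative job description per ensemble member."""
    rng = random.Random(seed)
    jobs = []
    for i in range(n_runs):
        jobs.append({
            "run": i,
            "length_ns": length_ns,
            # randomized atomic energies fitting a chosen temperature
            "velocity_seed": rng.randrange(2**32),
        })
    return jobs

jobs = build_ensemble()
print(len(jobs), "runs of", jobs[0]["length_ns"], "ns")
```

Launching these as independent jobs is exactly the embarrassingly parallel pattern that suits EGI/NGI-style task farming.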

Page 28:

GSEngine

GSEngine is a workflow orchestration engine developed by the ViroLab project

Can be used to orchestrate applications launched by AHE

It allows services to be orchestrated using both point-and-click and scripting interfaces

Workflows stored in a repository and shared between users

Many of the aims of ViroLab are similar to VPH-I projects, so GSEngine will be useful here

Included in VPH ToolKit: http://toolkit.vph-noe.eu/home/tools/compute-resources/workflow/gsengine.html

Page 29:

Closing the loop between simulation and user

Monitoring a running simulation:

Pause, resume or stop if required

Altering parameters in a running simulation

Receive feedback via on-line visualization

Restarting a simulation from a known checkpoint

Reproducibility, provenance

Computational Steering

[Diagram: steering loop – control, simulate, render, visualize]
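The steering cycle described above can be sketched as follows. The `Simulation` class is a stand-in for a steerable simulation, not the RealityGrid steering API; it just illustrates the pause/alter/checkpoint/resume operations.

```python
# Sketch of computational steering: monitor a running simulation,
# pause it, alter a parameter, record a restart checkpoint, resume.
class Simulation:
    """Illustrative steerable simulation (not the RealityGrid API)."""
    def __init__(self):
        self.state = "RUNNING"
        self.params = {"timestep": 1.0}
        self.checkpoints = []

    def pause(self): self.state = "PAUSED"
    def resume(self): self.state = "RUNNING"
    def set_param(self, key, value): self.params[key] = value
    def checkpoint(self):
        # Recording parameters at the checkpoint supports
        # reproducibility and provenance.
        self.checkpoints.append(dict(self.params))

sim = Simulation()
sim.pause()                     # inspect intermediate results
sim.set_param("timestep", 0.5)  # alter a parameter mid-run
sim.checkpoint()                # known restart point
sim.resume()
```

In the real system these calls travel from the client to the simulation's steering web service, with feedback returned via on-line visualization.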

Page 30:

AHE/ReG Steering Integration

[Diagram: the extended AHE client launches the simulation via the AHE and starts the visualization; simulation and visualization each contain a steering library that registers with a registry; steering web services bind to the registry and connect back to the client, while simulation and visualization transfer data directly via sockets/files]

Slide adapted from Andrew Porter

Page 31:

Cross-site Runs with MPI-g

MPI-g has been designed to allow single applications to run across multiple machines

Some problems won’t fit on a single machine, and require the RAM/processors of multiple machines on the grid.

MPI-g allows for jobs to be turned around faster by using small numbers of processors on several machines - essential for clinicians

Page 32:

MPI-g Requires Co-Allocation

We can reserve multiple resources for specified time periods

Co-allocation is useful for meta-computing jobs using MPIg, viz and for workflow applications.

We use HARC - Highly Available Robust Co-scheduler (developed by Jon Maclaren at LSU).

Slide courtesy Jon Maclaren

Page 33:

HARC

HARC provides a secure co-allocation service

Multiple Acceptors are used
Works well provided a majority of Acceptors stay alive
Paxos Commit keeps everything in sync
Gives the (distributed) service high availability
Deployment of 7 acceptors --> Mean Time To Failure ~ years

Transport-level security using X.509 certificates

HARC is a good platform on which to build portals/other services
XML over HTTPS - simpler than SOAP services
Easy to interoperate with
Very easy to use with the Java Client API
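The availability claim can be made concrete with a little arithmetic: HARC stays up while a majority of its acceptors are alive, so with 7 acceptors and per-acceptor uptime p, the service availability is the probability that at least 4 of the 7 are up. The 95% per-acceptor uptime used below is an assumed illustration, not a measured figure.

```python
# Binomial model of majority-quorum availability: the service is up
# whenever a majority (here 4 of 7) of independent acceptors are up.
from math import comb

def majority_availability(n: int, p: float) -> float:
    """P(at least a majority of n acceptors are up, each up w.p. p)."""
    quorum = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(quorum, n + 1))

# e.g. 95% per-acceptor uptime, 7 acceptors:
print(round(majority_availability(7, 0.95), 6))
```

Even with modest per-acceptor uptime, the quorum availability is very close to 1, which is why a 7-acceptor deployment yields a mean time to failure measured in years.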

Page 34:

Multiscale Applications on European e-Infrastructures (MAPPER)

[Diagram: distributed multiscale computing needs from VPH, fusion, computational biology, materials science and engineering feed into MAPPER]

Partners: UvA, UCL, UU, PSC, Cyfronet, LMU, UNIGE, Chalmers, MPG

€2.5M, starting 1st October for 3 years

Page 35:

MAPPER Infrastructure

Page 36:

Going further

1. Download and install AHE Server:
Download: http://www.realitygrid.org/AHE
Quick start guide: http://wiki.realitygrid.org/wiki/AHE

2. Host a new application (e.g. /bin/date)
Generic app tutorial: http://wiki.realitygrid.org/wiki/AHE

3. Modify these example workflow scripts to run your application:
http://www.realitygrid.org/AHE/training/ahe-workflow.tgz