
Page 1: 1 Grid Programming in Java & Python Prepared by Gregor von Laszewski Argonne National Laboratory gregor@mcs.anl.gov Keith R. Jackson Lawrence Berkeley

1

Grid Programming in Java & Python

Prepared by

Gregor von Laszewski, Argonne National Laboratory, gregor@mcs.anl.gov

Keith R. Jackson, Lawrence Berkeley National Laboratory

Chapter Outline

Introduction to Grids
– What is a Grid?
– What is the Globus Toolkit?
– What is a Commodity Grid Kit?

Using and Programming Grids with the Java™ and Python CoG Kits
– Secure access to remote resources
– Remote job submission and monitoring
– Distributed grid information management and remote data access
– Graphical component to access a Grid

Conclusion for this part of the presentation

Exercises

Integration of High-end Resources

On-demand creation of powerful virtual computing systems

[Figure labels: Sensor nets, Data archives, Supercomputers, Software catalogs, Colleagues]

Grid Computing and Existing Technologies

Commonalities between “Grid computing” and major industrial thrusts
– Business-to-business, peer-to-peer, application service providers, storage service providers, distributed computing, Internet computing

Differences between Grid computing and existing technologies
– Complicated requirements: “Run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”
– High performance: unique demands of advanced & high-performance systems

What challenges do we have to solve?

• Authenticate once
• Specify simulation (code, resources, etc.)
• Locate resources
• Negotiate authorization, acceptable use, etc.
• Acquire resources
• Initiate computation
• Steer computation
• Access remote datasets
• Collaborate on results
• Account for usage

[Figure labels: Domain 1, Domain 2]

An Example Application: An Advanced Scientific Instrument

[Figure labels: Avatar, Virtual Reality Cave, Scientist, Advanced Photon Source, Electronic Library and Databases, Computing Portal Clients, Supercomputer]

The Globus Toolkit

The Globus Toolkit provides a range of basic Grid services
– Security, information, fault detection, communication, resource management

These services are simple and orthogonal
– Can be used independently: mix and match
– Programming model independence

For each service there is generally a well-defined API

Standards are used extensively
– E.g., LDAP, GSS-API, X.509

Globus Approach

A toolkit and collection of services addressing key technical problems
– Modular “bag of services” model
– Not a vertically integrated solution
– General infrastructure tools (aka middleware) that can be applied to many application domains

Interdomain issues, rather than clustering
– Integration of intradomain solutions

Distinction between local and global services

Globus Hourglass

Focus on architecture issues
– Propose set of core services as basic infrastructure
– Use to construct high-level, domain-specific solutions

Design principles
– Keep participation cost low
– Enable local control
– Support adaptation

“IP hourglass” model

[Figure labels: Applications, Diverse global services, Core Globus services, Local OS]

Layered Grid Architecture

[Figure: the Grid layers (Application, Collective, Resource, Connectivity, Fabric) shown next to the Internet Protocol Architecture (Application, Transport, Internet, Link)]

Production Grids & Testbeds

NASA’s Information Power Grid
The Alliance National Technology Grid
GUSTO Testbed

Key Grid Services

Resource Discovery
– List all of the Solaris machines with at least 16 processors and 2 GB of memory

Resource Acquisition
– Submit my job to this machine

Data Management/Movement
– Locate, or create, a replica of this data set
– Move this data set to this host

Security
– Authentication
– Authorization

Grid Security Infrastructure

Standards-based infrastructure that provides:
– Single sign-on
– Authentication & data integrity/confidentiality
– Delegation

Based on:
– X.509 proxy certificates (draft IETF PKIX standard)
– TLS
– GSS-API

Delegation in GSI

[Figure: a Client holding a GSI proxy, a Compute Resource that receives a delegated GSI proxy, and Tertiary Storage]

1) Mutual Authentication using TLS
2) Delegate GSI credential
3) Submit Job
4) Mutual Authentication using TLS (using GSI proxy)
5) Retrieve Data Set
6) Run computation

GridFTP

Implements many of the optional RFCs for the FTP protocol.
– Authentication using the Grid Security Infrastructure through the GSS-API
– Restart markers
– Third-party transfers

Adds some new extensions to support high-performance transfers.
– Parallel streams
– Striped servers
– TCP buffer tuning
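
The restart-marker and parallel-stream ideas above can be illustrated with a small sketch in plain Python 3. It does not use pyGlobus or speak the GridFTP protocol. It assumes a restart marker can be modeled as a list of byte ranges that have already arrived; `missing_ranges` and `split_for_streams` are hypothetical helper names introduced only for this illustration.

```python
def missing_ranges(total_size, received):
    """Return the byte ranges [start, end) that still have to be sent.

    `received` is a list of (start, end) ranges already stored at the
    destination; this is how the sketch models a restart marker.
    """
    missing = []
    cursor = 0
    for start, end in sorted(received):
        if start > cursor:
            missing.append((cursor, start))  # gap before this range
        cursor = max(cursor, end)
    if cursor < total_size:
        missing.append((cursor, total_size))  # tail of the file
    return missing


def split_for_streams(start, end, streams):
    """Split [start, end) into at most `streams` contiguous chunks."""
    base, extra = divmod(end - start, streams)
    chunks = []
    cursor = start
    for i in range(streams):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # fewer bytes than streams
        chunks.append((cursor, cursor + size))
        cursor += size
    return chunks


# A 100-byte file of which two ranges arrived before the fault.
todo = missing_ranges(100, [(0, 40), (60, 80)])
# Spread the first missing range over three parallel streams.
plan = split_for_streams(todo[0][0], todo[0][1], 3)
```

After a fault only the ranges in `todo` need to be requested again, and each of them can be divided among several streams.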

Definition: CoG Kit

A Commodity Grid (CoG) Kit defines and implements a set of general components that map Grid functionality into a commodity environment.
– Web/CGI, Java, Python, CORBA, DCOM, ...
– CoG Kits help us build applications, Problem Solving Environments, and Portals.
– CoG Kits take the good things from both the commodity framework and the Grid

To further elaborate …

[Figure: Computational Grids + Commodity Technologies + Science Application = Computing Portal]

Computational Grids
– Globus Infrastructure

Commodity Technologies
– CORBA, Java, Python, Perl, Web technologies, ...

Science Application

Computing Portal
– Problem Solving Environments
– Workbenches

Motivation: Java & Python CoG Kits

• Use and leverage existing technologies for Grid programming
- The capabilities of the framework onto which Grid Services are mapped can be exploited: Objects, Events, Exceptions, JNDI™, ...
- Objects like jobs/tasks can be defined.
- XML support is provided.
- GUIs, ..., IDEs can be used (Forte, BOA Constructor)
• Maximize software flexibility, extensibility, and reusability
• Provide foundations for application developer teams that are already familiar with developing applications in this framework
- Reduce development and maintenance cost
• Use as glue for many technologies
- Python is well suited to tying together many different languages/technologies.

What is the Java CoG Kit?

The Java CoG Kit provides a mapping between Java and the Globus Toolkit. It extends the use of Globus by enabling access to advanced Java features such as events and objects for Grid programming.

The Java CoG Kit is implemented in pure Java. It speaks the Grid protocols.
– It is not a wrapper of the C Globus Toolkit
– This allows integration within applets
– Mostly client-side support

What is the Python CoG Kit?

Similarly, the Python CoG Kit provides a mapping between Python and the Globus Toolkit. It extends the use of Globus by enabling access to advanced Python features such as events and objects for Grid programming.

The Python CoG Kit is implemented as a series of Python extension modules that wrap the Globus C code.

Uses SWIG (http://www.swig.org) to help generate the interfaces.

Status: Java CoG Kit

Modified core Globus components (protocols)

Basic services are provided accessing:
– Security (GSI)
– Remote job submission and monitoring (GRAM)
– Quality of service (GARA)
– Remote data access (GSIFTP)
– Information service access (MDS)
– Certificate store (myProxy)

Current components are 100% client side

Includes reusable Grid GUI components

Status: Python CoG Kit

Basic services are provided accessing:
– Security (security)
– Remote job submission and monitoring (gramClient)
– GASS server (gassServerEZ)
– Secure high-performance network IO (io)
– Protocol-independent data transfers (gassCopy)
– High-performance Grid FTP transfers (ftpClient)
– Support for building Grid FTP servers (ftpControl)
– Remote file IO (gassFile)
– Replica Catalog (replicaCatalog)
– Replica Management (replicaManagement)
– GSI-enabled SOAP (GSISOAP)

Python CoG Kit basics

Everything is contained in the pyGlobus package.

The low-level C wrappers are all named after their module with a trailing c (modulec), e.g., ftpClientc, gramClientc
– You shouldn’t need to access these directly; instead use the Python wrappers (ftpClient, gramClient)

Exceptions are used to handle error conditions
– pyGlobus.util.GlobusException is the base class for all pyGlobus exceptions. It extends the base Python exceptions.Exception class

The Python wrapper classes manage the underlying memory
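
The value of a single base exception class can be shown with a short, self-contained Python 3 sketch. The class names below (`GridError`, `TransferError`, `JobError`) are hypothetical stand-ins, not pyGlobus classes; `GridError` only plays the role that the slide assigns to `pyGlobus.util.GlobusException`.

```python
class GridError(Exception):
    """Hypothetical base class, standing in for GlobusException."""


class TransferError(GridError):
    """Hypothetical error raised by a data-transfer module."""


class JobError(GridError):
    """Hypothetical error raised by a job-submission module."""


def risky(kind):
    """Raise a module-specific error, or return normally."""
    if kind == "transfer":
        raise TransferError("transfer failed")
    if kind == "job":
        raise JobError("job rejected")
    return "ok"


def describe(kind):
    """Catch every toolkit error through the shared base class."""
    try:
        return risky(kind)
    except GridError as ex:
        return "%s: %s" % (type(ex).__name__, ex)
```

A caller that only cares whether a Grid operation failed can catch the base class, while more specific handlers can still catch an individual subclass.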

Building the Python CoG Kit

Uses standard GNU tools (autoconf, make)

Download the latest source from http://www-itg.lbl.gov/Grid/projects/pyGlobus/

tar zxvf pyGlobus-VERSION.tar.gz
cd pyGlobus; mkdir build; cd build
../configure
– Use --with-flavor=<Globus flavor string> to override the default flavor
– Use --with-python=<path to python> to override the default python installation
make; make install
– Installs into your site-packages directory (no need to edit PYTHONPATH)

Where do we go from here?

On which computers can I perform my task?
– Use a Grid information service (MDS) to answer this.
– Use the LDAP browser as a good example of how to develop a Grid-based Portal to a directory-based information service.

How can I execute a job on a remote machine?
– Use the job submission API or GUI

How can I access data on a remote machine?
– Use Grid FTP.
– Use the protocol-independent gassCopy module.
– Remote file IO using the gassFile module.

Secure high-performance network IO.

LDAP Browser and Editor

User-friendly Windows Explorer-like interface to LDAP directories
– It allows browsing and modification of information
– It is written entirely in Java
• Uses JFC (SwingSet) for the GUI
• Uses JNDI class library for LDAP access
• Connects to LDAP v2 and v3 servers.
– Note: the v3 standard is still being defined as we speak

Recognition
– More than 1600 commercial user licenses
– Many users in academia (license for free)
– Winner of the Best Student Application award in the Novell Developers' Contest
– Jars top 25%

Features

Browsing, editing, searching
Export
Binary attributes, customizable attribute editors
Object templates
LDAP v3 aware, SSL support
Drag and drop, copy-and-paste interface
Named sessions
Applet support


[Screenshot]

Tiny Information Query Program

MDS mds = new MDS("mds.globus.org", 389, "o=Grid");

// An LDAP AND filter needs its own enclosing parentheses.
MDSresult result = mds.search(
    "(&(objectclass=GridComputeResource)(freenodes=64))",
    "contact freenodes totalnodes");

String contact = result.get("contact");

System.out.println(result.print());

This can also be done with JNDI or the Netscape SDK

We have this layer for portability
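
The search filter used in the query follows the standard LDAP string syntax, in which an AND of several conditions is written as `(&(a=1)(b=2))`. The sketch below builds such a filter in Python 3 using only the standard library. `escape` and `ldap_and` are hypothetical helper names for this illustration; they are not part of the CoG Kits, JNDI, or any LDAP library.

```python
def escape(value):
    """Escape the characters that are special inside an LDAP filter value."""
    out = []
    for ch in str(value):
        if ch in "*()\\\0":
            out.append("\\%02x" % ord(ch))  # backslash plus two hex digits
        else:
            out.append(ch)
    return "".join(out)


def ldap_and(**conditions):
    """Build an AND filter such as (&(a=1)(b=2)) from keyword arguments."""
    parts = "".join(
        "(%s=%s)" % (attr, escape(value)) for attr, value in conditions.items()
    )
    return "(&%s)" % parts


# The same query as on the slide: compute resources with 64 free nodes.
query = ldap_and(objectclass="GridComputeResource", freenodes=64)
```

Building the filter from parts makes it harder to produce unbalanced parentheses than writing the string by hand.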


Remote Job Submission

Access to remote resources is an elementary Grid problem

Supports delegation to the remote resource

Gram Submission Demo

[Demo output]
Hello World
1
2
3
Bang
A test to stderr
3
2
1
Bang

C Job Submission Example

static void
callback_func(void *user_arg, char *job_contact, int state, int errorcode)
{
    globus_i_globusrun_gram_monitor_t *monitor;

    /* The user argument carries the monitor shared with the submitter. */
    monitor = (globus_i_globusrun_gram_monitor_t *) user_arg;

    globus_mutex_lock(&monitor->mutex);
    monitor->job_state = state;
    switch (state)
    {
    case GLOBUS_GRAM_PROTOCOL_JOB_STATE_PENDING:
        /* The handling of this state is not legible in the transcript. */
        break;
    case GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED:
        if (monitor->verbose)
        {
            globus_libc_printf("GLOBUS_GRAM_PROTOCOL_JOB_STATE_FAILED\n");
        }
        monitor->done = GLOBUS_TRUE;
        break;
    case GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE:
        if (monitor->verbose)
        {
            globus_libc_printf("GLOBUS_GRAM_PROTOCOL_JOB_STATE_DONE\n");
        }
        monitor->done = GLOBUS_TRUE;
        break;
    }
    /* Wake up the thread waiting for the job to finish. */
    globus_cond_signal(&monitor->cond);
    globus_mutex_unlock(&monitor->mutex);
}

C Job Submission Example (cont.)

globus_l_globusrun_gramrun(char *request_string, unsigned long options, char *rm_contact)
{
    char *callback_contact = GLOBUS_NULL;
    char *job_contact = GLOBUS_NULL;
    globus_i_globusrun_gram_monitor_t monitor;
    int err;

    monitor.done = GLOBUS_FALSE;
    monitor.verbose = verbose;
    globus_mutex_init(&monitor.mutex, GLOBUS_NULL);
    globus_cond_init(&monitor.cond, GLOBUS_NULL);

    err = globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);
    if (err != GLOBUS_SUCCESS) { … }

    /* Register the callback that receives job state changes. */
    err = globus_gram_client_callback_allow(
        globus_l_globusrun_gram_callback_func,
        (void *) &monitor,
        &callback_contact);
    if (err != GLOBUS_SUCCESS) { … }

    /* Submit the job and ask to be notified about every state change. */
    err = globus_gram_client_job_request(rm_contact,
        request_string,
        GLOBUS_GRAM_PROTOCOL_JOB_STATE_ALL,
        callback_contact,
        &job_contact);
    if (err != GLOBUS_SUCCESS) { … }

    /* Block until the callback marks the job as finished. */
    globus_mutex_lock(&monitor.mutex);
    while (!monitor.done)
    {
        globus_cond_wait(&monitor.cond, &monitor.mutex);
    }
    globus_mutex_unlock(&monitor.mutex);

    globus_gram_client_callback_disallow(callback_contact);
    globus_free(callback_contact);

    globus_mutex_destroy(&monitor.mutex);
    globus_cond_destroy(&monitor.cond);
    /* The remainder of the function is not shown on the slide. */
}

Gram

Creating a job
– GramJob job = new GramJob("&(executable=/bin/sleep)(arguments=15)");

Listening to state changes via listeners
– job.addListener(new GramJobListener() {
      public void statusChanged(GramJob job) {
          System.out.println("Job [" + job.getIDAsString() + "]:"
              + " Status : " + job.getStatusAsString());
      }
  });

gramClient

from pyGlobus import gramClient
from threading import *

cond = 0
condV = Condition(Lock())

# rm (the resource manager contact) and rsl (the job description) are set by
# the caller; func is the job state callback.
try:
    # Use a separate name for the instance so the module stays accessible.
    client = gramClient.GramClient()
    callbackContact = client.set_callback(func, condV)
    jobContact = client.submit_request(rm, rsl,
                                       gramClient.JOB_STATE_ALL,
                                       callbackContact)
    # Wait until the callback reports that the job has finished.
    condV.acquire()
    while cond == 0:
        condV.wait()
    condV.release()
except gramClient.GramClientException, ex:
    print ex

gramClient Job State Callback

def func(cv, contact, state, error):
    global cond
    if state == gramClient.JOB_STATE_FAILED:
        print "Job failed"
        cv.acquire()
        cond = 1
        cv.notify()
        cv.release()
    elif state == gramClient.JOB_STATE_DONE:
        print "Job is done"
        cv.acquire()
        cond = 1
        cv.notify()
        cv.release()
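
The submit-then-wait pattern used with gramClient is a general condition-variable idiom. The following sketch reproduces it in Python 3 with the standard `threading` module only, so it runs without a Grid. `JobMonitor`, `fake_job`, and the state strings are hypothetical stand-ins for the GRAM client, the remote job, and the `JOB_STATE_*` constants.

```python
import threading


class JobMonitor:
    """Holds the 'done' flag and the condition variable that guards it."""

    def __init__(self):
        self.cond = threading.Condition()
        self.done = False
        self.states = []

    def on_state_change(self, state):
        # Called from another thread, like the job state callback.
        with self.cond:
            self.states.append(state)
            if state in ("DONE", "FAILED"):
                self.done = True
                self.cond.notify_all()

    def wait(self, timeout=None):
        # wait_for re-checks the flag after every wakeup, so a spurious
        # wakeup cannot end the wait early.
        with self.cond:
            return self.cond.wait_for(lambda: self.done, timeout)


def fake_job(monitor):
    """Stand-in for a remote job that reports three state changes."""
    for state in ("PENDING", "ACTIVE", "DONE"):
        monitor.on_state_change(state)


monitor = JobMonitor()
worker = threading.Thread(target=fake_job, args=(monitor,))
worker.start()
finished = monitor.wait(timeout=5)
worker.join()
```

Keeping the flag and the condition variable in one object avoids the module-level `global cond` used on the slide; the waiting logic is otherwise the same.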

Redirecting stdout with gramClient

from pyGlobus import gassServerEZ

opts = gassServerEZ.STDOUT_ENABLE
server = gassServerEZ.GassServerEZ(opts)
url = server.getURL()

# Point the job's stdout at the local GASS server.
rsl = ("&(executable=/bin/sleep)(arguments=15)"
       "(stdout=%s/dev/stdout)" % url)

(normal job submission from here)
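
The job description strings on these slides all have the same shape: an ampersand followed by parenthesized `attribute=value` relations. A minimal Python 3 sketch of assembling such a string is given below. `make_rsl` is a hypothetical helper, not a pyGlobus function, and the GASS URL is an invented placeholder. The sketch does not quote values, so it only suits simple values without spaces or parentheses.

```python
def make_rsl(**attributes):
    """Join attribute=value pairs into a string like &(a=1)(b=2)."""
    relations = "".join(
        "(%s=%s)" % (name, value) for name, value in attributes.items()
    )
    return "&" + relations


# Hypothetical URL, standing in for the value returned by server.getURL().
gass_url = "https://client.example.org:20000"

rsl = make_rsl(
    executable="/bin/sleep",
    arguments=15,
    stdout=gass_url + "/dev/stdout",
)
```

The resulting string has the same layout as the hand-written one on the slide.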


Remote Data Transfer

Directly use the Grid FTP protocol to transfer the data.

Use the gassCopy module for protocol independent transfers. It currently supports the ftp, gsiftp, http, and https protocols.
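
Protocol independence of this kind usually comes down to choosing a handler from the URL scheme. The sketch below shows that dispatch step in Python 3 with the standard library. It performs no transfer and is not how gassCopy is implemented; `pick_handler` and `plan_copy` are hypothetical names, and the host names are placeholders.

```python
from urllib.parse import urlparse

# The protocols the slide lists as currently supported.
SUPPORTED_SCHEMES = ("ftp", "gsiftp", "http", "https")


def pick_handler(url):
    """Return the scheme that would handle this URL, or raise ValueError."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError("unsupported protocol: %r" % scheme)
    return scheme


def plan_copy(src_url, dest_url):
    """Decide which handler reads the source and which writes the destination."""
    return (pick_handler(src_url), pick_handler(dest_url))


plan = plan_copy("gsiftp://a.example.org/data/set1",
                 "https://b.example.org/incoming/set1")
```

The caller passes two URLs and never names a protocol explicitly, which is the property the gassCopy module provides.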

Java UrlCopy (protocol independent file transfer)

import org.globus.io.urlcopy.*;

UrlCopy c = new UrlCopy();
c.setSourceUrl(from);
c.setDestinationUrl(to);
c.setUseThirdPartyCopy(true); // hint to enable third-party transfer

// register a transfer listener....
c.setListener(new UrlCopyListener() {
    public void transfer(int total, int current) {
        System.out.println(total + " " + current);
    }
    public void transferError(Exception e) {
        System.out.println("transfer failed: " + e.getMessage());
    }
});

c.copy(); // this starts the copy

GSI FTPClient (not GridFTP)

import org.globus.io.ftp.*;

GSIFTPClient ftp = new GSIFTPClient(host, port);
ftp.authenticate();
ftp.setType(FTPClient.BINARY);

// we have convenient calls like
ftp.makeDir(dir);
ftp.deleteDir(dir3);

// the listener for the transfers....
TransferProgressListener listener = new TransferProgressListener() {
    public void transfer(int total, int current, String from, String to) {
        System.out.println(total + " " + current + " " + from + " " + to);
    }
    public void transferError(String from, String to, Exception e) {
        System.out.println("transfer failed: " + from + " " + to);
    }
};

Simple Examples

Retrieve a Makefile from a remote site and save it to Makefile.bak
– File dst = new File("Makefile.bak");
– ftp.get("Makefile", dst, listener);

Retrieve all files in remote directory to a directory called “Destination”
– File destinationDir = new File("Destination");
– ftp.get("*", destinationDir, true, listener);

Disconnect
– ftp.disconnect();

Examples in source code: apis/examples, UrlCopy, FTPClient

ftpClient

Use Grid FTP to transfer a file.

from pyGlobus import ftpClient
from pyGlobus.util import Buffer

handleAttr = ftpClient.HandleAttr()
opAttr = ftpClient.OperationAttr()
marker = ftpClient.RestartMarker()
ftpClnt = ftpClient.FtpClient(handleAttr)
ftpClnt.get(url, opAttr, marker, done_func, condV)
buf = Buffer(64*1024)
handle = ftpClnt.register_read(buf, data_func, 0)

def data_func(cv, handle, buffer, bufHandle, bufLen, offset, eof, error):
    g_dest.write(buffer)
    if not eof:
        try:
            handle = g_ftpClient.register_read(g_buffer, data_func, 0)
        except Exception, e:
            (the handler body is cut off on the slide)

Performance Options in ftpClient

Setting the TCP buffer size

from pyGlobus import ftpControl

battr = ftpControl.TcpBuffer()
battr.set_fixed(64*1024)
# Or:
battr.set_automatic(16*1024, 8*1024, 64*1024)
opAttr.set_tcp_buffer(battr)

Setting parallelism

para = ftpControl.Parallelism()
para.set_mode(ftpControl.PARALLELISM_FIXED)
para.set_size(3)
opAttr.set_parallelism(para)
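
A common rule of thumb for choosing a TCP buffer size is the bandwidth-delay product: link bandwidth multiplied by round-trip time. The Python 3 sketch below works through one example. The link figures (100 Mbit/s, 50 ms) are invented for illustration, and using 8 KB and 64 KB as lower and upper limits simply reuses two numbers from the slide; treating them as a minimum and a maximum is an assumption, not a statement about what `set_automatic` does with its arguments.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes, using integer arithmetic."""
    # bits/s * ms / 1000 gives bits in flight; / 8 converts to bytes.
    return bandwidth_bits_per_s * rtt_ms // 8000


def clamp(value, low, high):
    """Restrict value to the range [low, high]."""
    return max(low, min(value, high))


ideal = bdp_bytes(100_000_000, 50)           # 100 Mbit/s link, 50 ms round trip
chosen = clamp(ideal, 8 * 1024, 64 * 1024)   # assumed lower and upper limits
```

Here the ideal buffer is 625000 bytes, far above the 64 KB cap. A single stream with a 64 KB buffer cannot keep such a link full, which is one motivation for the parallel streams shown on the slide.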

Using ftpClient plugins

Debug Plugin

from pyGlobus import ftpClient

f = open("/tmp/out", "w+")
debugPlugin = ftpClient.DebugPlugin(f, "foobar")
handleAttr.add_plugin(debugPlugin)

RestartMarker Plugin

restartPlugin = ftpClient.RestartMarkerPlugin(beginCB, markerCB, doneCB, None)

gassCopy

Provides a protocol independent API to transfer remote files.

from pyGlobus import gassCopy
from pyGlobus import ftpClient

srcAttr = gassCopy.Attr()
handleAttr = gassCopy.HandleAttr()
destAttr = gassCopy.Attr()
ftpSrcAttr = ftpClient.OperationAttr()
ftpDestAttr = ftpClient.OperationAttr()
srcAttr.set_ftp(ftpSrcAttr)
destAttr.set_ftp(ftpDestAttr)
copy = gassCopy.GassCopy(handleAttr)
copy.copy_url_to_url(srcUrl, srcAttr, destUrl, destAttr)

Remote File IO

Supports reading and writing remote files from http, https, ftp, gsiftp servers.
– Can be opened either as normal Python file objects or as int file descriptors suitable for use with the os module.

from pyGlobus import gassFile

gassFile = gassFile.GassFile()
f = gassFile.fopen("gsiftp://foo.bar.com/file", "r")
lines = f.readlines()
gassFile.fclose(f)


Secure High-Performance IO.

Uses the Grid Security Infrastructure to provide authentication.

Provides access to the underlying network parameters for tuning performance.

TCP Server example

attr = NetIOAttr.TCPIOAttr()
attr.set_authentication_mode(
    io.GLOBUS_IO_SECURE_AUTHENTICATION_MODE_GSS_API)
authData = AuthData.AuthData()
authData.set_callback(auth_callback, None)
attr.set_authorization_mode(
    io.GLOBUS_IO_SECURE_AUTHORIZATION_MODE_CALLBACK, authData)
attr.set_channel_mode(
    io.GLOBUS_IO_SECURE_CHANNEL_MODE_GSI_WRAP)
soc = GSITCPSocket.GSITCPSocket()
port = soc.create_listener(attr)
soc.listen()
childSoc = soc.accept(attr)
buf = Buffer.Buffer(size)
bytesRead = childSoc.read(buf, size, size)

We will develop a similar example for Java; the functionality is already available as part of the GASS server.

TCP Client example

attr = NetIOAttr.TCPIOAttr()
attr.set_authentication_mode(
    io.GLOBUS_IO_SECURE_AUTHENTICATION_MODE_GSS_API)
authData = AuthData.AuthData()
attr.set_authorization_mode(
    io.GLOBUS_IO_SECURE_AUTHORIZATION_MODE_SELF, authData)
attr.set_channel_mode(
    io.GLOBUS_IO_SECURE_CHANNEL_MODE_GSI_WRAP)
soc = GSITCPSocket.GSITCPSocket()
soc.connect(host, port, attr)
# msg is the string to send; the name avoids shadowing the built-in str
nBytes = soc.write(msg, len(msg))

Web Services in Grids

Provides a uniform, lightweight, standards-based framework for:
– Service description (WSDL)
– Discovery (UDDI / WSIL)
– Invocation (SOAP)

Allows the use of standard tools

Separates the interface from implementation details
– Including separating service definition from protocol bindings

Language/programming model agnostic

Reduces the cost of building scientific applications

Uses SOAP over HTTP over the Grid Security Infrastructure
– Similar to HTTPS, but supports delegation

GSISOAP

Built on SOAP.py, looking at switching to ZSI

GSISOAP client example

from pyGlobus import GSISOAP, ioc

proxy = GSISOAP.SOAPProxy("https://host.lbl.gov:8081", namespace="urn:gtg-Echo")
proxy.channel_mode = ioc.GLOBUS_IO_SECURE_CHANNEL_MODE
proxy.delegation_mode = ioc.GLOBUS_IO_SECURE_DELEGATION_MODE_NONE
print proxy.echo("spam, spam, spam, eggs, and spam")

GSISOAP (cont.)

GSISOAP server example

import SOAP  # the SOAP.py module that GSISOAP builds on (assumed import path)
from pyGlobus import GSISOAP, ioc

def echo(s, _SOAPContext):
    cred = _SOAPContext.delegated_cred
    # Do something useful with cred here
    return s

server = GSISOAP.SOAPServer("host.lbl.gov", 8081)
server.channel_mode = ioc.GLOBUS_IO_SECURE_CHANNEL_MODE_GSI_WRAP
server.delegation_mode = ioc.GLOBUS_IO_SECURE_DELEGATION_MODE_FULL_PROXY
server.registerFunction(SOAP.MethodSig(echo, keywords=0, context=1), "urn:gtg-Echo")
server.serve_forever()

Reliable File Transfer Service

A Grid Web Service that:
– Provides a simple interface to reliably transfer data between GridFTP servers
– Supports reliable transfers in the face of network outages, machine crashes, and other faults
– Supports the automated tuning of network parameters to optimize bandwidth

RFT Usage

[Figure: a Client holding a GSI proxy, a WSDL Repository (UDDI?, WSIL?), the Reliable File Transfer Service, and two GridFTP Servers]
– The client retrieves the WSDL from the repository.
– The client requests a transfer and delegates a GSI proxy (SOAP + GSI).
– The service initiates a GridFTP third-party transfer using the delegated GSI credential.

Fault Recovery

[Figure: the Reliable File Transfer Service controlling a transfer between two GridFTP Servers]
– Restart markers are periodically returned
– A network partition occurs
– Restart transfer from last restart marker

Fault Recovery (cont.)

[Figure: the Reliable File Transfer Service with its DB, controlling a transfer between two GridFTP Servers]
– Restart markers are periodically returned and written to the DB
– The machine crashes
– After machine reboot, recover incomplete transfers from the DB
– Restart transfer from last restart marker
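
The recovery steps above can be sketched in Python 3 with the standard `sqlite3` module. This is an illustration under stated assumptions, not the RFT implementation: the table layout, the function names, and the single integer used as a restart marker are all invented here, no data is actually moved, and the in-memory database would have to be a file on disk to survive a real crash.

```python
import sqlite3


def open_store():
    """Create the (hypothetical) table that tracks transfers and markers."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transfers ("
        "handle TEXT PRIMARY KEY, size INTEGER, marker INTEGER, done INTEGER)"
    )
    return db


def submit(db, handle, size):
    db.execute("INSERT INTO transfers VALUES (?, ?, 0, 0)", (handle, size))
    db.commit()


def run(db, handle, chunk, fail_after=None):
    """Advance a transfer chunk by chunk, persisting a marker after each one."""
    size, marker = db.execute(
        "SELECT size, marker FROM transfers WHERE handle = ?", (handle,)
    ).fetchone()
    sent = 0
    while marker < size:
        if fail_after is not None and sent >= fail_after:
            raise ConnectionError("simulated network partition")
        step = min(chunk, size - marker)
        marker += step
        sent += step
        db.execute(
            "UPDATE transfers SET marker = ? WHERE handle = ?", (marker, handle)
        )
        db.commit()
    db.execute("UPDATE transfers SET done = 1 WHERE handle = ?", (handle,))
    db.commit()
    return sent


def incomplete(db):
    """List the transfers that must be resumed after a restart."""
    rows = db.execute("SELECT handle FROM transfers WHERE done = 0 ORDER BY handle")
    return [row[0] for row in rows]


db = open_store()
submit(db, "t1", 100)
try:
    run(db, "t1", chunk=30, fail_after=60)   # fails after 60 bytes
except ConnectionError:
    pass
to_resume = incomplete(db)                    # what recovery would find
resumed_bytes = run(db, "t1", chunk=30)       # continues from the marker
```

Because the marker is committed after every chunk, the second call to `run` sends only the remaining 40 bytes instead of starting over.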

RFT WSDL

<definitions name="FileTransferService" namespace stuff …>
  <message name="submitTransferRequest">
    <part name="srcUrl" type="xsd:string"/>
    <part name="destUrl" type="xsd:string"/>
    <part name="priority" type="xsd:int"/>
  </message>
  <message name="submitTransferResponse">
    <part name="return" type="xsd:string"/>
  </message>
  <message name="checkStatusRequest">
    <part name="handle" type="xsd:string"/>
  </message>
  <message name="checkStatusResponse">
    <part name="return" type="xsd:string"/>
  </message>
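
To see how these messages map onto operation signatures, the fragment can be read with Python's standard `xml.etree.ElementTree` module. This is a sketch under assumptions: the elided namespace declarations are left out, a closing `</definitions>` tag is added so that the fragment parses, and `message_signatures` is a hypothetical helper name. A real WSDL document places these elements in the WSDL namespace, which the lookups would then have to include.

```python
import xml.etree.ElementTree as ET

# The slide's fragment, closed and stripped of namespaces so that it parses.
WSDL_FRAGMENT = """
<definitions name="FileTransferService">
  <message name="submitTransferRequest">
    <part name="srcUrl" type="xsd:string"/>
    <part name="destUrl" type="xsd:string"/>
    <part name="priority" type="xsd:int"/>
  </message>
  <message name="submitTransferResponse">
    <part name="return" type="xsd:string"/>
  </message>
  <message name="checkStatusRequest">
    <part name="handle" type="xsd:string"/>
  </message>
  <message name="checkStatusResponse">
    <part name="return" type="xsd:string"/>
  </message>
</definitions>
"""


def message_signatures(wsdl_text):
    """Map each message name to its list of (part name, part type) pairs."""
    root = ET.fromstring(wsdl_text)
    return {
        message.get("name"): [
            (part.get("name"), part.get("type"))
            for part in message.findall("part")
        ]
        for message in root.findall("message")
    }


if __name__ == "__main__":
    for name, parts in message_signatures(WSDL_FRAGMENT).items():
        print(name, parts)
```

The request messages list the arguments a client supplies (source URL, destination URL, priority, or a handle), and each response carries a single string.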

RFT Internals

Components: Client, GSI SOAP Interface, Persistent DB, Manager Thread, Transfer Threads, Optimization Module, Transfer Queue, two GridFTP Servers

– The client submits a transfer request using a two-phase commit protocol.
– The transfer is written to the DB and then to the transfer queue.
– A transfer thread reads the next transfer.
– The TCP buffer size is obtained.
– The transfer is initiated.
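
The order of operations on this slide (persist first, then enqueue, then let worker threads pick up transfers) can be sketched with Python's standard `queue` and `threading` modules. Everything in the sketch is an assumption made for illustration: the class and method names are hypothetical, a dictionary stands in for the persistent DB, the buffer size is a fixed placeholder, and neither the two-phase commit nor the GridFTP transfer itself is modelled.

```python
import queue
import threading


class TransferService:
    """Toy model of the persist-then-enqueue pipeline (hypothetical names)."""

    def __init__(self, num_threads=2):
        self.db = {}  # stands in for the persistent DB
        self.db_lock = threading.Lock()
        self.transfer_queue = queue.Queue()
        for _ in range(num_threads):
            # Daemon threads so the sketch exits without explicit shutdown.
            threading.Thread(target=self._transfer_thread, daemon=True).start()

    def submit(self, src_url, dest_url):
        """Write the transfer to the DB first, then to the queue."""
        with self.db_lock:
            handle = "transfer-%d" % (len(self.db) + 1)
            self.db[handle] = {"src": src_url, "dest": dest_url,
                               "status": "queued"}
        self.transfer_queue.put(handle)
        return handle

    def _transfer_thread(self):
        """Read the next transfer, obtain a buffer size, initiate it."""
        while True:
            handle = self.transfer_queue.get()
            buffer_size = self._tcp_buffer_size(handle)
            self._initiate_transfer(handle, buffer_size)
            self.transfer_queue.task_done()

    def _tcp_buffer_size(self, handle):
        """Placeholder for the optimization step; returns a fixed value."""
        return 64 * 1024

    def _initiate_transfer(self, handle, buffer_size):
        """Placeholder for the transfer; only records the outcome."""
        with self.db_lock:
            self.db[handle]["buffer_size"] = buffer_size
            self.db[handle]["status"] = "done"

    def wait(self):
        """Block until every queued transfer has been processed."""
        self.transfer_queue.join()
```

Writing to the DB before the queue means a request that has been acknowledged can still be found after a crash, even if no transfer thread had picked it up yet.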

RFT Usage Example

from rftClient import *

handle = ""
srcUrl = "gsiftp://host.lbl.gov/data/climate_model_1234"
destUrl = "gsiftp://lorenz.ncar.edu/tmp/climate_model_1234"

# Submit the transfer; the returned handle identifies the request.
try:
    handle = submit_transfer(srcUrl, destUrl)
except TransferRequestException as ex:
    print(ex)
…
# Query the status of the transfer using the handle.
try:
    status = check_transfer_status(handle)
except TransferRequestException as ex:
    print(ex)
print(status)
…
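
A client usually checks the status more than once. The loop below is a minimal polling sketch, assuming that the status call returns a string and that a small set of strings marks the end of a transfer. The state names `"Finished"` and `"Failed"`, the helper name `wait_for_transfer`, and its parameters are hypothetical; the status function is passed in as an argument so that the sketch does not depend on the `rftClient` module.

```python
import time


def wait_for_transfer(check_status, handle,
                      done_states=("Finished", "Failed"),
                      poll_interval=1.0, max_polls=60):
    """Poll check_status(handle) until it reports a terminal state.

    Returns the terminal status string, or raises TimeoutError after
    max_polls unsuccessful checks.
    """
    for _ in range(max_polls):
        status = check_status(handle)
        if status in done_states:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("transfer %r did not finish" % (handle,))
```

With the client from the slide, the call would look like `wait_for_transfer(check_transfer_status, handle)`.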

Many more CoGs

Perl, JSP, CORBA, …, GUIs

Applications

Java CoG Requirements

JDK 1.2

Security
– Needs a security package such as IAIK.
– We are able to replace security packages, based on a couple of common functions, with a public-domain package.
– We hope that these packages will reach the maturity of IAIK.

Python CoG Requirements

Python 2.0+

The GT2-beta (or better) release of Globus.

Support for dynamic libraries.

Currently pyGlobus is tested on
– Linux
– Solaris
– Should compile
– Will support Win32 when Globus is ported

Review

Modifications to core Globus
– HTTP-based protocols
– APIs are not sufficient: we need services

Components useful to enhance grid infrastructure
– JavaBeans™ can be used to integrate with COTS.
– Java and Python have a rich set of libraries (network, LDAP, ...).
– Grid programming in Java and Python is easier!
– Java uses an event model, not callbacks.
– GUIs do not provide everything: we need scripting!

References

Java CoG Kit
– http://www.globus.org/cog

Python CoG Kit
– http://www-itg.lbl.gov/Grid/projects/pyGlobus

Globus
– http://www.globus.org

SWIG
– http://www.swig.org

gregor@mcs.anl.gov

[email protected]

Acknowledgements

This research is supported by the Mathematical, Information, and Computational Science Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract W-31-109-Eng-38 and under Contract DE-AC03-76SF00098 with the University of California. Globus research and development is supported by DARPA, DOE, and NSF.