Globus Online + Panda: a brief summary

Maxim Potekhin for the BNL PAS team, Brookhaven National Laboratory
ATLAS S&C Week, March 13th 2012


Page 1: Globus Online + Panda: a brief summary

Globus Online + Panda: a brief summary

Maxim Potekhin for the BNL PAS team, Brookhaven National Laboratory

ATLAS S&C Week, March 13th 2012

Page 2: Globus Online + Panda: a brief summary

Overview

• For more info, please see my Globus presentation this coming Thursday

• How does Globus Online work and why we are considering it?

• Globus Online in the context of Panda: what functionality has been implemented and what we can offer in the near term

• Credits: Carlos Contreras, Wensheng Deng, Shuwei Ye, Horst Severini

Page 3: Globus Online + Panda: a brief summary

How does it work?

From the Globus Online Web Site:

“Globus Online is a hosted service that automates the tasks associated with moving files between sites, or “endpoints.” In a nutshell, here’s how to transfer files using the Web interface:

• Log in to Globus Online

• Select “Start Transfer” on the “Go To” drop-down menu

• Specify the source and destination endpoints, and the files to move

• Click an arrow button to initiate the transfer”

https://www.globusonline.org/howitworks/

Why did we become interested?

In addition to forming the basis of ATLAS computing, Panda has occasionally been used by a number of “non-ATLAS” VOs within the Open Science Grid. One of the challenges in enabling these organizations to use Panda was the absence of a sufficiently lightweight and generic data movement subsystem, independent of DDM/DQ2. This was partially resolved by using a variety of one-off solutions.

In 2011, Globus Online was presented at the OSG All-Hands meeting, and its feature set was a near-perfect match for the functionality we had been missing in serving the needs of small organizations and research teams.

Page 4: Globus Online + Panda: a brief summary

How does it work?

1. User initiates transfer request

2. Data transfer (Source to Destination)

3. Globus Online notifies user
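The three steps can be pictured as a small state progression. The sketch below is purely illustrative: it does not call any Globus Online API, the `TransferRequest` class and `run_transfer` function are hypothetical names, and no data is actually moved. The endpoint and file names are borrowed from the Generic Pilot example later in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class TransferRequest:
    """Hypothetical record of one transfer; not a Globus Online type."""
    source: str
    destination: str
    files: list
    status: str = "REQUESTED"  # step 1: the user has initiated the request
    notifications: list = field(default_factory=list)


def run_transfer(req: TransferRequest) -> TransferRequest:
    # Step 2: the hosted service moves data between the endpoints.
    # Nothing is copied in this toy model; only the state changes.
    req.status = "TRANSFERRING"
    req.status = "DONE"
    # Step 3: the service notifies the user of the outcome.
    req.notifications.append(
        f"{len(req.files)} file(s) moved from {req.source} to {req.destination}"
    )
    return req


req = run_transfer(
    TransferRequest("mxp#MXP_BNL_TEST", "mxp#MXP_OU_TEST", ["xs.sh"])
)
print(req.status, req.notifications)
```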

Page 5: Globus Online + Panda: a brief summary

Features

API

In addition to the Web interface, Globus Online offers other APIs:

• HTTP

• Full-fledged gsissh interface

• Python binding

Globus Connect

Globus Connect is a portable client, available in Linux, Windows and Mac versions, which can be used to create Globus Endpoints on demand on a machine of the user’s choice (such as their laptop or their analysis workstation). In the Linux environment, it can be dynamically downloaded and installed if needed.

Page 6: Globus Online + Panda: a brief summary

Why might it be useful for ATLAS?

Hypothetical use case

During analysis, a user would like to run a series of filters on data that resides at a few different sites. In addition to having the resulting data written to the “usual” locations at the processing sites, the user wants the resulting body of data conveniently placed in one location, such as their workstation or laptop, or any GridFTP site.

The transfers can be monitored on the Web Page or from a Python client if needed.

Page 7: Globus Online + Panda: a brief summary

Running with Globus-aware ATLAS Pilot

1. Pilot submission to Panda follows the usual technique (Pilot Scheduler in the past, APF going forward). Pilot type must point to a Globus-Enabled version of the Pilot.

2. The user submits their job using prun, with entries in the “metadata” section specifying the parameters of data transfer.

3. Job runs, output data gets to destination

In our testing we had to reuse existing prun command-line options; we will request changes to make them more user-friendly, so the exact prun option syntax is still a work in progress. Example:

--exec "/usatlas/u/wdeng/prod_test/test/hadd_wrap.py 900b1a06-5d44-4a88-8394-bbf1ca090ccd_0.HIST.root `echo %IN | sed 's/,/ /g'`" --site ANALY_BNL_T3 "--athenaTag=AtlasPhysics,16.6.4.1.1" --noBuild --outDS user.wdeng.0a5c4f41-facf-4af6-b21a-b3230efb7c43.HIST --outputs 900b1a06-5d44-4a88-8394-bbf1ca090ccd_0.HIST.root --inDS data10_7TeV.00165954.physics_CosmicCalo.recon.HIST.r1608_tid176557_00 "--nFiles=2" "--nFilesPerJob=2" --useChirpServer userendpoint:wdeng#mytest111:/tmp/wdeng/test_go/

Note that Wensheng specified his own endpoint, as defined in Globus Online, as the destination for the data.
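To make the structure of the destination value easier to see, here is a minimal Python sketch that splits it into parts. The layout `kind:user#endpoint:/path` is an assumption inferred from the single example above, not a documented prun format, and `Destination` and `parse_destination` are hypothetical names.

```python
from typing import NamedTuple


class Destination(NamedTuple):
    kind: str      # leading tag, "userendpoint" in the example
    endpoint: str  # Globus Online endpoint in user#name form
    path: str      # directory on that endpoint


def parse_destination(value: str) -> Destination:
    """Split 'kind:user#endpoint:/path' into three parts (assumed layout)."""
    # Split on the first two colons only, so the path may contain colons.
    kind, endpoint, path = value.split(":", 2)
    if "#" not in endpoint:
        raise ValueError(f"expected user#endpoint, got {endpoint!r}")
    return Destination(kind, endpoint, path)


dest = parse_destination("userendpoint:wdeng#mytest111:/tmp/wdeng/test_go/")
print(dest.endpoint, dest.path)
```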

Page 8: Globus Online + Panda: a brief summary

Use Case with the Generic Pilot

To better illustrate which options and semantics may be used, let’s consider the “Generic Pilot” use case, where processing is done for a non-ATLAS user.

./sendJob.py --njobs 1 --computingSite TEST3 --transformation http://www.usatlas.bnl.gov/~mxp/panda/transformations/maxim_test.sh

--prodSourceLabel user --cloud OSG --jobParameters ‘\globus-user=mxp \in-mode=server \out-mode=server \files-in=xs.sh dir-in=/direct/usatlas+u/mxp/ \files-out=xs.sh dir-out=/home/usatlas1/ \globus-endpoint-in=mxp#MXP_BNL_TEST \globus-endpoint-local=mxp#MXP_BNL_TEST \globus-endpoint-out=mxp#MXP_OU_TEST’

The “modes” can be any of the following: server (GridFTP endpoint), local (local file copy), gc (Globus Connect).
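As an illustration of these semantics, the following Python sketch turns the job parameter string into a dictionary and checks the two mode settings against the three values listed above. How the pilot actually parses these parameters is not shown in the slides, so `parse_job_parameters` is a hypothetical helper, and stripping the leading backslashes assumes they are leftovers from line continuations.

```python
VALID_MODES = {"server", "local", "gc"}  # the three modes named in the text


def parse_job_parameters(raw: str) -> dict:
    """Turn a whitespace-separated 'key=value' string into a dict."""
    params = {}
    for token in raw.split():
        token = token.lstrip("\\")  # assumed to be line-continuation residue
        if not token:
            continue
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {token!r}")
        params[key] = value
    # Reject any mode that is not one of the three documented values.
    for mode_key in ("in-mode", "out-mode"):
        mode = params.get(mode_key)
        if mode is not None and mode not in VALID_MODES:
            raise ValueError(f"{mode_key} must be one of {sorted(VALID_MODES)}")
    return params


raw = (
    r"\globus-user=mxp \in-mode=server \out-mode=server "
    r"\files-in=xs.sh dir-in=/direct/usatlas+u/mxp/ "
    r"\files-out=xs.sh dir-out=/home/usatlas1/ "
    r"\globus-endpoint-in=mxp#MXP_BNL_TEST "
    r"\globus-endpoint-local=mxp#MXP_BNL_TEST "
    r"\globus-endpoint-out=mxp#MXP_OU_TEST"
)
params = parse_job_parameters(raw)
print(params["in-mode"], params["globus-endpoint-out"])
```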

Page 9: Globus Online + Panda: a brief summary

List of Endpoints

Maintenance of endpoints on the Globus Online portal

Page 10: Globus Online + Panda: a brief summary

Web monitoring of data transfers (task list)

Page 11: Globus Online + Panda: a brief summary

Conclusions

New functionality

On-demand, optional data transfer to any GridFTP server or workstation, with error handling, retries and extensive monitoring.

Status

Prototype tested with actual payload

Issues

Better understanding of use cases

Scalability of GridFTP instances