The Grid2003 Project: An Application Laboratory for Science. Jorge L. Rodriguez, University of Florida, Department of Physics, [email protected]. D0SAR Workshop, Louisiana Tech University, April 7th, 2004

TRANSCRIPT

Page 1: The Grid2003 Project: An Application Laboratory for Science

The Grid2003 Project: An Application Laboratory for Science

Jorge L. Rodriguez
University of Florida
Department of Physics
[email protected]

D0SAR Workshop
Louisiana Tech University

April 7th, 2004

Page 2: The Grid2003 Project: An Application Laboratory for Science

Jorge L. Rodriguez: The Grid2003 Project, D0SAR Workshop, Louisiana Tech

What is Grid2003/Grid3?
International Data Grid with dozens of sites
Serving applications across various disciplines
  HEP experiments (LHC, BTeV)
  Bio-chemical, CS demonstrators…
Currently over 2000 CPUs available for use by over 100 users
A peak throughput of 1100 concurrent jobs with a completion efficiency of approximately 75%

Note: Grid2003 refers to the initial project from 8/2003 – 12/2003; Grid3 refers to the persistent grid infrastructure

Page 3: The Grid2003 Project: An Application Laboratory for Science

Grid3 Organization
Stakeholders:
  US LHC Software and Computing Projects
    US ATLAS, US CMS
  Grid projects (iVDGL, PPDG, GriPhyN)
    CS groups, VDT team, iGOC
  GriPhyN experiments
    LIGO, SDSS as well as ATLAS and CMS
  New collaborators
    Vanderbilt BTeV (Fermilab) Group
    Argonne computational biology group
    U Buffalo chemical structure

Page 4: The Grid2003 Project: An Application Laboratory for Science

Contributors
  Boston University
  Caltech
  Hampton University
  Harvard University
  Indiana University
  Johns Hopkins University
  Vanderbilt University
  University of Oklahoma
  University of Chicago
  University of Florida
  University of Michigan
  University at Buffalo
  Argonne National Laboratory
  Brookhaven National Laboratory
  Fermi National Accelerator Laboratory
  Kyungpook National University
  Lawrence Berkeley National Laboratory
  University of California San Diego
  University of New Mexico
  University of Southern California-ISI
  University of Texas, Arlington
  University of Wisconsin-Madison
  University of Wisconsin-Milwaukee

Page 5: The Grid2003 Project: An Application Laboratory for Science

Contributors
  Argonne National Laboratory: Jerry Gieraltowski, Scott Gose, Natalia Maltsev, Ed May, Alex Rodriguez, Dinanath Sulakhe
  Boston University: Jim Shank, Saul Youssef
  Brookhaven National Laboratory: David Adams, Rich Baker, Wensheng Deng, Jason Smith, Dantong Yu
  Caltech: Iosif Legrand, Suresh Singh, Conrad Steenberg, Yang Xia
  Fermi National Accelerator Laboratory: Anzar Afaq, Eileen Berman, James Annis, Lothar Bauerdick, Michael Ernst, Ian Fisk, Lisa Giacchetti, Greg Graham, Anne Heavey, Joe Kaiser, Nickolai Kuropatkin, Ruth Pordes*, Vijay Sekhri, John Weigand, Yujun Wu
  Hampton University: Keith Baker, Lawrence Sorrillo
  Harvard University: John Huth
  Indiana University: Matt Allen, Leigh Grundhoefer, John Hicks, Fred Luehring, Steve Peck, Rob Quick, Stephen Simms
  Johns Hopkins University: George Fekete, Jan vandenBerg
  Kyungpook National University/KISTI: Kihyeon Cho, Kihwan Kwon, Dongchul Son, Hyoungwoo Park
  Lawrence Berkeley National Laboratory: Shane Canon, Jason Lee, Doug Olson, Iowa Sakrejda, Brian Tierney
  University at Buffalo: Mark Green, Russ Miller
  University of California San Diego: James Letts, Terrence Martin
  University of Chicago: David Bury, Catalin Dumitrescu, Daniel Engh, Ian Foster, Robert Gardner*, Marco Mambelli, Yuri Smirnov, Jens Voeckler, Mike Wilde, Yong Zhao, Xin Zhao
  University of Florida: Paul Avery, Richard Cavanaugh, Bockjoo Kim, Craig Prescott, Jorge L. Rodriguez, Andrew Zahn
  University of Michigan: Shawn McKee
  University of New Mexico: Christopher T. Jordan, James E. Prewett, Timothy L. Thomas
  University of Oklahoma: Horst Severini
  University of Southern California: Ben Clifford, Ewa Deelman, Larry Flon, Carl Kesselman, Gaurang Mehta, Nosa Olomu, Karan Vahi
  University of Texas, Arlington: Kaushik De, Patrick McGuigan, Mark Sosebee
  University of Wisconsin-Madison: Dan Bradley, Peter Couvares, Alan De Smet, Carey Kireyev, Erik Paulson, Alain Roy
  University of Wisconsin-Milwaukee: Scott Koranda, Brian Moe
  Vanderbilt University: Bobby Brown, Paul Sheldon

* Team Leads

Page 6: The Grid2003 Project: An Application Laboratory for Science

Grid3 Services
Software packaging service (pacman)
  Virtual Data Toolkit (VDT)
  Additional middleware configuration packages
Monitoring services
  MonALISA
  Ganglia
  Metrics Data Viewer
User authentication service
  Virtual Organization Management Service (VOMS)
Grid3 operations
  The international Grid Operations Center (iGOC)

Page 7: The Grid2003 Project: An Application Laboratory for Science

Grid3 Packaging

Page 8: The Grid2003 Project: An Application Laboratory for Science

Grid Packaging Service
Packaging is the key to success!
  Automation in software installation greatly improves the reliability of software deployments
The Pacman package manager is used in Grid3
  Complete installation and site configuration is simplified to a single command:

    % pacman -get iVDGL:Grid3

  In reality it takes a little more work. However…

ref. pacman: http://physics.bu.edu/~youssef/pacman/

Page 9: The Grid2003 Project: An Application Laboratory for Science

The VDT packages (version 1.1.12)
Globus Alliance
  Grid Security Infrastructure (GSI)
  Job submission (GRAM)
  Information service (MDS)
  Data transfer (GridFTP)
  Replica Location (RLS)
Condor Group
  Condor/Condor-G
  DAGMan
  Fault Tolerant Shell
  ClassAds
EDG & LCG
  Make Gridmap
  Cert. Revocation List Updater
  Glue Schema/Info provider
ISI & UC
  Chimera & related tools
  Pegasus
NCSA
  MyProxy
  GSI OpenSSH
LBL
  PyGlobus
  Netlogger
Caltech
  MonALISA
VDT
  VDT System Profiler
  Configuration software
Others
  KX509 (U. Mich.)

Page 10: The Grid2003 Project: An Application Laboratory for Science

Grid3 Monitoring

Page 11: The Grid2003 Project: An Application Laboratory for Science

Monitoring Services
Ganglia - http://gocmon.uits.iupui.edu/ganglia-webfrontend
  Open source tool to collect cluster monitoring information such as CPU and network load, memory and disk usage
MonALISA - http://gocmon.uits.iupui.edu:8080/index.html
  Monitoring tool to support resource discovery, access to information and gateway to other information gathering systems
ACDC Job Monitoring System - http://acdc.ccr.buffalo.edu/statistics/acdc/fullsizeindexqueue.php
  Application uses Globus GRAM to query job managers and collect information about jobs. This information is stored in a DB and available for aggregated queries and browsing.
Metrics Data Viewer (MDViewer) - http://grid.uchicago.edu/metrics/
  Application to display and analyze information collected by the different monitoring tools; queries Metrics DBs at iGOC.
Globus MDS
  Information and Index Service for resource discovery, selection and optimization. GLUE schema with Grid3 extension

Page 12: The Grid2003 Project: An Application Laboratory for Science

Monitoring Infrastructure

Page 13: The Grid2003 Project: An Application Laboratory for Science

Grid3 Authentication

Page 14: The Grid2003 Project: An Application Laboratory for Science

Grid3 Authentication

[Diagram: the iVDGL, FNAL and BNL VOMS servers supply user DNs to the clients at site a, site b, … site n. At each site, edg-mkgridmap uses these DN mappings to build the gridmap-file, which maps a user's grid credentials (DN) to a local site group account. VO labels shown in the diagram: USCMS, SDSS; USATLAS; BTeV, LSC, iVDGL.]
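As a rough illustration of the mapping shown on this slide, a gridmap-file is a plain text file in which each line holds a quoted DN followed by a local account name. The Python sketch below parses text in that shape. The sample DNs and account names are invented for illustration and are not taken from Grid3, and `parse_gridmap` is a hypothetical helper, not part of edg-mkgridmap or any other Grid3 tool.

```python
import shlex

# Hypothetical gridmap-file content: quoted DN, then the local account.
# These DNs and account names are made up for this example.
SAMPLE_GRIDMAP = '''\
# generated file, do not edit by hand
"/DC=org/DC=example/OU=People/CN=Alice Example" uscms
"/DC=org/DC=example/OU=People/CN=Bob Example" usatlas
'''


def parse_gridmap(text):
    """Return a dict mapping each DN to its local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            continue  # ignore lines that do not look like "DN" account
        dn, account = parts
        mapping[dn] = account
    return mapping


if __name__ == "__main__":
    for dn, account in parse_gridmap(SAMPLE_GRIDMAP).items():
        print(f"{dn} -> {account}")
```

Mapping many DNs to one group account per VO, as in the sample, mirrors the "local site group account" idea on the slide; a site could equally map users to individual accounts.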

Page 15: The Grid2003 Project: An Application Laboratory for Science

Grid3 Operations

Page 16: The Grid2003 Project: An Application Laboratory for Science

Grid3 Operations: (iGOC)

http://www.ivdgl.org/grid2003/catalog

Page 17: The Grid2003 Project: An Application Laboratory for Science

Grid3 Operations
Support and Policy
  Investigation and resolution of grid middleware problems at the level of 16-20 contacts per week
  With other iGOC personnel, develop Service Level Agreements for iVDGL Grid service systems and the iGOC support service
  Membership Charter completed, which defines the process to add new VOs, sites and applications to the Grid Laboratory
  Support Matrix defining Grid3 and VO service providers and contact information

Page 18: The Grid2003 Project: An Application Laboratory for Science

Grid2003 Applications

Page 19: The Grid2003 Project: An Application Laboratory for Science

Project Application Overview
7 scientific applications and 3 CS demonstrators
  All iVDGL experiments participated in the Grid2003 project
  A third HEP and two Bio-Chemical experiments also participated
Over 100 users authorized to run on Grid3
  Application execution performed by dedicated individuals
  Typically 1, 2 or 3 users ran the applications from a particular experiment
Participation from all Grid3 sites
  Sites categorized according to policies and resources
  Applications ran concurrently on most of the sites
  Large sites with generous local use policies were more popular

Page 20: The Grid2003 Project: An Application Laboratory for Science

Scientific Applications
High Energy Physics Simulation and Analysis
  USCMS: MOP, GEANT based full MC simulation and reconstruction
    Workflow and batch job scripts generated by McRunJob
    Jobs generated at the MOP master (outside of Grid3) are submitted to Grid3 sites via Condor-G
    Data products are archived at Fermilab: SRM/dCache
  USATLAS: GCE, GEANT based full MC simulation and reconstruction
    Workflow is generated by Chimera VDS, the Pegasus grid scheduler and Globus MDS for resource discovery
    Data products archived at BNL: Magda and Globus RLS are employed
  USATLAS: DIAL, distributed analysis application
    Dataset catalogs built, n-tuple analysis and histogramming (data generated on Grid3)
  BTeV: Full MC simulation
    Also utilizes the Chimera workflow generator and Condor-G (VDT)

Page 21: The Grid2003 Project: An Application Laboratory for Science

Scientific Applications
Astrophysics and Astronomical
  LIGO/LSC: blind search for continuous gravitational waves
  SDSS: maxBcg, cluster finding package
Bio-Chemical
  SnB: Bio-molecular program, analyses on X-ray diffraction to find molecular structures
  GADU/Gnare: Genome analysis, compares protein sequences
Computer Science
  Evaluation of adaptive data placement and scheduling algorithms

Page 22: The Grid2003 Project: An Application Laboratory for Science

CS Demonstrator Applications
Exerciser
  Periodically runs low priority jobs at each site to test operational status
NetLogger-grid2003
  Monitored data transfers between Grid3 sites via NetLogger instrumented pyglobus-url-copy
GridFTP Demo
  Data mover application using GridFTP designed to meet the 2TB/day metric
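To put the 2TB/day metric in more familiar terms, the short Python sketch below converts a daily volume into the average sustained rate it implies. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a transfer spread evenly over 24 hours; the slides do not state which unit convention was used, and the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_mb_per_s(tb_per_day):
    """Average rate in MB/s needed to move tb_per_day in 24 hours.

    Assumes decimal units: 1 TB = 1e12 bytes, 1 MB = 1e6 bytes.
    """
    bytes_per_day = tb_per_day * 1e12
    return bytes_per_day / SECONDS_PER_DAY / 1e6


if __name__ == "__main__":
    rate = tb_per_day_to_mb_per_s(2.0)
    print(f"2 TB/day is about {rate:.1f} MB/s sustained")
```

Under these assumptions the metric corresponds to a little over 23 MB/s averaged across the whole grid for a full day.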

Page 23: The Grid2003 Project: An Application Laboratory for Science

Running on Grid3
With information provided by the Grid3 information system, the user:
1. Composes a list of target sites
     Resources available
     Local site policies
2. Finds where to install the application and where to write data
     Use of the Grid3 Information Index Service (~MDS)
     Provides pathnames for $APP, $DATA, $TMP and $WNTMP
3. Sends and remotely installs the application from a local site
4. Submits job(s) through Globus GRAM

The user never needs to interact with local site administrators other than through the Grid3 services!
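Steps 1 and 2 above can be sketched in Python. This is only an illustration under stated assumptions: the `SiteInfo` record, the site names, CPU counts, VO lists and directory paths are all hypothetical stand-ins for what a user would obtain from the Grid3 information service, and no real MDS query is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteInfo:
    """Hypothetical snapshot of what the information service reports for a site."""
    name: str
    free_cpus: int
    allowed_vos: frozenset  # local site policy: which VOs may run here
    app_dir: str            # value published for $APP
    data_dir: str           # value published for $DATA
    tmp_dir: str            # value published for $TMP
    wntmp_dir: str          # value published for $WNTMP


def select_sites(sites, vo, min_free_cpus):
    """Step 1: keep sites whose policy admits the VO and that have enough free CPUs."""
    candidates = [
        s for s in sites
        if vo in s.allowed_vos and s.free_cpus >= min_free_cpus
    ]
    # Prefer the sites with the most free CPUs.
    return sorted(candidates, key=lambda s: s.free_cpus, reverse=True)


# Invented example data, not real Grid3 sites.
SITES = [
    SiteInfo("site_a", 40, frozenset({"uscms", "usatlas"}), "/grid/app", "/grid/data", "/grid/tmp", "/scratch"),
    SiteInfo("site_b", 120, frozenset({"uscms"}), "/opt/app", "/opt/data", "/opt/tmp", "/tmp"),
    SiteInfo("site_c", 300, frozenset({"usatlas"}), "/g3/app", "/g3/data", "/g3/tmp", "/wn/tmp"),
    SiteInfo("site_d", 5, frozenset({"uscms"}), "/app", "/data", "/tmp", "/tmp"),
]

if __name__ == "__main__":
    for site in select_sites(SITES, vo="uscms", min_free_cpus=10):
        # Step 2: the published paths say where to install and where to write.
        print(f"{site.name}: install under {site.app_dir}, write output to {site.data_dir}")
```

Sorting by free CPUs is just one possible ranking; a real submission tool would likely weigh queue depth, storage space and site policy as well.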

Page 24: The Grid2003 Project: An Application Laboratory for Science

Grid3 Metrics

Page 25: The Grid2003 Project: An Application Laboratory for Science

Grid3 Metrics Collection
Grid3 monitoring applications (information consumers)
  MonALISA
  Metrics Data Viewer
Queries to persistent storage DB (on the gocmon server)
  MonALISA plots
  MDViewer plots

Page 26: The Grid2003 Project: An Application Laboratory for Science

Grid3 Metrics Collection

[Plots: MDViewer, MonALISA]

Page 27: The Grid2003 Project: An Application Laboratory for Science

Metrics Summary Table

Metric                                            Target      Grid2003 ("SC2003")
Number of CPUs                                    400         2762 (27 sites)
Number of users                                   > 10        102 (16)
Number of applications                            > 4         10
Number of sites running concurrent applications   > 10        17
Peak number of concurrent jobs                    1000        1100
Data transfer per day                             > 2-3 TB    4.4 TB (11.12.03)
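One way to read the table is as a ratio of achieved value to target. The Python sketch below computes that ratio for each row, using the numbers from the table. Where the target is a lower bound or a range ("> 10", "> 2-3 TB"), the smallest stated number is used as the target, which is a simplifying assumption of this sketch rather than something the slide specifies.

```python
# (metric, target, achieved) taken from the summary table.
# For "> N" or range targets, the smallest stated number is used (an assumption).
METRICS = [
    ("Number of CPUs", 400, 2762),
    ("Number of users", 10, 102),
    ("Number of applications", 4, 10),
    ("Sites running concurrent applications", 10, 17),
    ("Peak number of concurrent jobs", 1000, 1100),
    ("Data transfer per day (TB)", 2, 4.4),
]


def ratios(metrics):
    """Return {metric name: achieved / target}."""
    return {name: achieved / target for name, target, achieved in metrics}


if __name__ == "__main__":
    for name, ratio in ratios(METRICS).items():
        print(f"{name}: {ratio:.1f}x target")
```

On this reading every target was met, with the peak number of concurrent jobs the closest to its target.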

Page 28: The Grid2003 Project: An Application Laboratory for Science

Grid3 Status Summary
Current hardware resources
  Total of 2693 CPUs
    Maximum CPU count
    Off project contribution > 60%
  Total of 25 sites
    25 administrative domains with local policies in effect
    All across US and Korea
Running jobs
  Peak number of jobs: 1100
  During SC2003 various scientific applications were running simultaneously across various Grid3 sites

Page 29: The Grid2003 Project: An Application Laboratory for Science

USCMS and Grid3
So far have completed about 14.2 million events
Significant amount of resources provided over that available to USCMS alone
  About 1.4 times the event yield over dedicated USCMS resources
USCMS alone has utilized more than 147 CPU years on Grid3 resources!
  Another 20 CPU years by other Grid3 applications

[Plot legend: Canonical USCMS resources; Total resources with Grid3]

Page 30: The Grid2003 Project: An Application Laboratory for Science

USCMS and Grid3

[MonALISA plots: history over a 3 month period. Panels: Grid3 sites only; USCMS sites only]

Page 31: The Grid2003 Project: An Application Laboratory for Science

Outlook and Conclusions

Page 32: The Grid2003 Project: An Application Laboratory for Science

Grid3 Near Term Plans - What's running on the grid now?
USCMS has just about completed its Pre-Challenge Production (PCP04) but plans to continue its production runs
  USCMS "MOP Regional Center" was asked to simulate 14.3 million JetMet events
    Higgs analyses signal and background events, 13 channels in all, about 75% of them background
    Particularly challenging run: a typical job runs for 5 days!
    Some run as long as 4 weeks!!
  Work done in preparation for CMS' Data Challenge 04 (DC04)
DC04 in a nutshell
  Reconstruction at CERN (T0 center) of PCP04 "raw" data @ 25Hz
  Stream and catalog at Tier1 centers (FNAL …)
  Physics analysis in real time @ Tier1 and Tier2 sites

Page 33: The Grid2003 Project: An Application Laboratory for Science

Grid3 Near Term Plans cont.

USATLAS, SDSS and LIGO
  USATLAS is in a development mode preparing for its DC2 challenge, which begins April 1st, and is currently using Grid3 to run tests.
  LIGO and SDSS are modifying their workflow generators to enhance reliability and improve productivity when running on the grid. Once the work is completed they also intend to utilize the resources.
Bio-Chemical and Computer Science research
  CS research is ongoing, with of order 500 jobs submitted since SC2003. The work focuses on data management and scheduling. Many more of these experiments are planned.

Page 34: The Grid2003 Project: An Application Laboratory for Science

Grid3 Near Term Plans cont.

New sites will be joining under existing VOs
New HEP experiment VO: CDF has begun work to port their software environment to Grid3
New CS applications:
  Virtual-organization-aware resource allocation
  The Sphinx grid scheduler
  Scalability and robustness of the VDT scheduling algorithms
  …

Lots more to do on evolving the Grid3 infrastructure and operations model…

Page 35: The Grid2003 Project: An Application Laboratory for Science

U.S. Open Science Grid
Goal: An integrated U.S. Grid infrastructure
  Grid computing infrastructure to support US scientific efforts
  CPU & storage resources from laboratories and universities
  DOE and NSF partnership
  Internet2, ESNet, state, international optical networks
Getting there: OSG-1 (Grid3+), OSG-2, …
  Series of releases increasing functionality & scale
Initial meetings
  Sep. 17 @ NSF: Educators, scientists, etc.
  Jan. 12 @ Fermilab: Public discussion, planning sessions
Next steps
  White paper to be expanded into roadmap
  Presentation to funding agencies (May/June?)

Page 36: The Grid2003 Project: An Application Laboratory for Science

Conclusion

A project to deploy a reasonably large distributed international data grid, consisting of tens of sites serving over one hundred users who run applications from a variety of scientific disciplines, is successful.

It is still being used!

Useful work was done on Grid3! 14 million events and counting