The Texas High Energy Grid (THEGrid)
A Proposal to Build a Cooperative Data and
Computing Grid for High Energy Physics and Astrophysics in
Texas
Alan Sill, Texas Tech University
Jae Yu, University of Texas, Arlington
Representing HiPCAT and the members of this workshop
Outline
• High Energy Physics and Astrophysics in Texas
• Work up to this workshop
• CDF, DØ, ATLAS, CMS experiments
• Problems
• A solution
• Implementation of the solution
  – DØ and CDF Grid status
  – ATLAS, CMS
  – Etc.
• Status
• Summary and Plans
High Energy Physics in Texas
• Several universities
  – UT, UH, Rice, TTU, TAMU, UTA, UTB, UTEP, SMU, UTD, etc.
• Many different research facilities used
  – Fermi National Accelerator Laboratory
  – CERN (Switzerland), DESY (Germany), and KEK (Japan)
  – Jefferson Lab
  – Brookhaven National Lab
  – SLAC (CA) and Cornell
  – Natural sources and underground labs
• Sizable community, variety of experiments and needs
• Very large data sets now! Even larger ones coming!!
The Problem
• High Energy Physics and Astrophysics data sets are huge
  – Total expected data size is over 5 PB for CDF and DØ, 10x larger for the CERN experiments
  – Detectors are complicated; many people are needed to construct them and make them work
  – Software is equally complicated
  – Collaborations are large and scattered all over the world
• Solution: use the opportunity of having large data sets to further grid computing technology
  – Allow software development and use at remote institutions
  – Optimize resource management, job scheduling, monitoring tools, and the use of resources
  – Efficient and transparent data delivery and sharing
  – Improve computational capability for education
  – Improve quality of life for researchers and students
Work up to this point
• HiPCAT
  – What is HiPCAT? High Performance Computing Across Texas, a network and organization of computing centers and their directors at many Texas universities
  – Other projects (TIGRE, cooperative education, etc.)
  – Natural forum for this proposal
  – First presentation April 2003
  – Many discussions since then
  – Led to this workshop
DØ and CDF at the Fermilab Tevatron
• World's highest-energy proton-antiproton collider
  – E_cm = 1.96 TeV (= 6.3×10^-7 J per proton; ~13 MJ on a 10^-6 m^2 area), equivalent to the kinetic energy of a 20-t truck at a speed of 80 mi/hr
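The truck analogy on the slide is straightforward kinetic energy arithmetic; a quick sketch (assuming a 20-metric-ton truck, a value not stated more precisely on the slide):

```python
# Sanity check of the slide's analogy: a 20-ton truck at 80 mi/hr
# carries roughly the ~13 MJ attributed to the Tevatron beam.
MPH_TO_MS = 0.44704          # exact miles-per-hour to m/s conversion
mass_kg = 20_000             # 20 metric tons
speed_ms = 80 * MPH_TO_MS    # 80 mi/hr in m/s

kinetic_energy_j = 0.5 * mass_kg * speed_ms ** 2
print(f"{kinetic_energy_j / 1e6:.1f} MJ")  # 12.8 MJ, i.e. roughly 13 MJ
```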
[Map: the Tevatron ring near Chicago, with the CDF and DØ (Dzero) detectors on the p-pbar collider]
Currently generating data at over a petabyte per year
Large-scale cluster computing duplicated all over the world
CDF Data Analysis Flow:
[Diagram: 7 MHz beam crossings and 0.75 million channels feed the L1/L2 triggers (300 Hz output), then the Level 3 trigger (~250 duals) down to 75 Hz; data pass to robotic tape storage at 20 MB/s read/write, through the production farm (~150 duals) for reconstruction and simulation, then to the Central Analysis Farm (CAF, ~500 duals) and user desktops for data analysis]
Distributed clusters in Italy, Germany, Japan, Taiwan, Spain, Korea, several places in the US, the UK, and Canada (more coming).
CDF-GRID: Example of a working practical grid
• CDF-GRID, based on DCAF clusters, is a de-facto working high energy physics distributed computing environment
• Built and developed to be clonable
• Deployment led by TTU
• Large effort on tools usable both on- and off-site
  – Data access (SAM, dCache)
  – Remote / multi-level DB servers
  – Store from remote sites to tape/disk at FNAL
• User MC jobs at remote sites are a reality now
• Analysis on remote data samples is being developed using SAM
  – Up and working, already used for physics!
  – Many pieces borrowed from / developed with / shared with DØ
• This effort is making HEP remote analysis possible -> practical -> working -> easy for physicists to adopt
Basic tools
• Sequential Access via Metadata (SAM)
  – Data replication and cataloging system
• Batch systems
  – FBSNG: Fermilab's own batch system
  – Condor
    • Three of the DØSAR farms consist of desktop machines under Condor
    • CDF: most central resources are already based on Condor
  – PBS
    • More general than FBSNG; most dedicated DØSAR farms use this manager
    • Part of the popular Rocks cluster configuration environment
• Grid framework: JIM (Job and Information Management)
  – Provides a framework for grid operation: job submission, match-making, and scheduling
  – Built upon Condor-G and Globus
  – MonALISA, Ganglia, user monitoring tools
  – Everyone has an account (with suitable controls), so everyone can submit!
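The match-making idea JIM inherits from Condor can be illustrated with a toy sketch. The site advertisements, dataset names, and ranking rule below are illustrative assumptions, not the real SAM-Grid/JIM ClassAd schema:

```python
# Toy Condor-style match-making: a job advertises requirements, sites
# advertise resources, and the broker ranks the sites that satisfy the
# requirements. Site and dataset names here are hypothetical.

sites = [
    {"name": "FNAL", "free_cpus": 40,  "datasets": {"bphys-v4", "top-v2"}},
    {"name": "TTU",  "free_cpus": 120, "datasets": {"top-v2"}},
    {"name": "INFN", "free_cpus": 10,  "datasets": {"bphys-v4"}},
]

def match(job, sites):
    """Return the best site meeting the job's needs, preferring free CPUs."""
    candidates = [s for s in sites
                  if s["free_cpus"] >= job["min_cpus"]
                  and job["dataset"] in s["datasets"]]
    return max(candidates, key=lambda s: s["free_cpus"], default=None)

job = {"dataset": "top-v2", "min_cpus": 8}
print(match(job, sites)["name"])  # TTU: has the dataset and the most free CPUs
```

In the real system the "rank" and "requirements" expressions are ClassAds evaluated by Condor-G; this sketch only mirrors the shape of that negotiation.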
Data Handling: Operation of a SAM Station
[Diagram: producers and consumers interact with the Station & Cache Manager, which coordinates Project Managers, workers, File Stager(s), and a File Storage Server; File Storage Clients move files between the cache disk / temp disk and mass storage systems (MSS) or other stations; data flow and control paths are shown separately]
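The cache-manager role of a station (evicting files from the cache disk when space runs out, while respecting pinned data sets such as those at the INFN and Taiwan sites) can be sketched minimally. The class name and the least-recently-used policy are assumptions for illustration, not SAM's actual replacement algorithm:

```python
from collections import OrderedDict

# Minimal sketch of a station-style file cache: least-recently-used
# eviction of unpinned files; pinned data sets are never evicted.
class StationCache:
    def __init__(self, capacity_gb):
        self.capacity = capacity_gb
        self.used = 0
        self.files = OrderedDict()   # name -> (size_gb, pinned), LRU order

    def fetch(self, name, size_gb, pinned=False):
        if name in self.files:               # cache hit: mark recently used
            self.files.move_to_end(name)
            return "hit"
        # Cache miss: evict least-recently-used unpinned files until it fits.
        while self.used + size_gb > self.capacity:
            victim = next((n for n, (_, p) in self.files.items() if not p), None)
            if victim is None:
                raise RuntimeError("cache full of pinned files")
            vsize, _ = self.files.pop(victim)
            self.used -= vsize
        self.files[name] = (size_gb, pinned)
        self.used += size_gb
        return "miss"

cache = StationCache(capacity_gb=10)
cache.fetch("raw_run1.dat", 4)
cache.fetch("raw_run2.dat", 4)
cache.fetch("raw_run3.dat", 4)   # evicts raw_run1.dat, the LRU file
```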
The tools, cont'd
• Local task management
  – CDF Grid (http://cdfkits.fnal.gov/grid/)
    • Decentralized CDF Analysis Farm = DCAF
    • Develop code anywhere (laptop is supported)
    • Submit to FNAL or TTU or CNAF or Taiwan or San Diego or ...
    • Get output ~everywhere (most desktops OK)
    • User monitoring system including Ganglia; info by queue/user per cluster
  – DØSAR (DØ Southern Analysis Region)
    • Monte Carlo Farm (McFarm) management (cloned to other institutions)
    • DØSAR Grid: submit requests on a local machine; the requests get transferred to a submission site and executed at an execution site
• Various monitoring software
  – Ganglia resource monitoring
  – McFarmGraph: MC job status monitoring
  – McPerM: farm performance monitor
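The DØSAR submit-and-forward flow (local machine, then submission site, then execution site) can be sketched as a three-hop hand-off. The site names, load figures, and the least-loaded selection policy are illustrative assumptions:

```python
# Sketch of a DØSAR-style request flow: a request made on a local
# machine is forwarded to a submission site, which chooses an
# execution site. The least-loaded policy here is illustrative.

EXECUTION_SITES = {"UTA-RAC": 0.9, "OU": 0.3, "LTU": 0.5}  # load fraction

def submit_locally(request):
    # Hop 1: the user's local machine only forwards the request.
    return forward_to_submission_site(request)

def forward_to_submission_site(request):
    # Hop 2: the submission site picks the least-loaded execution site.
    site = min(EXECUTION_SITES, key=EXECUTION_SITES.get)
    return execute(request, site)

def execute(request, site):
    # Hop 3: the execution site runs the job (stubbed out here).
    return {"request": request, "executed_at": site}

result = submit_locally("mcfarm_request_42")
print(result["executed_at"])  # OU, the least-loaded site in this toy table
```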
Background statistics on the CDF Grid
• Data acquisition and data logging rates have increased
  – More data = more physicists
  – Approved by FNAL's Physics Advisory Committee and Director
• Computing needs grow, but the DOE/FNAL-CD budget is flat
• CDF proposal: do 50% of analysis work offsite
  – CDF-GRID: planned at Fermilab, deployment effort led by TTU
    • Have a plan for how to do it
    • Have most tools in place and in use
    • Already in deployment at several locations throughout the world
Hardware resources in CDF-GRID

Site              GHz now  TB now  GHz Summer  TB Summer  Notes
INFN              250      5       950         30         Priority to INFN users; pinned data sets exist
Taiwan            100      2.5     150         2.5        Pinned data sets exist
Korea             120      -       120         -          Running MC only now
UCSD              280      5       280         5          Pools resources from several US groups; min. guaranteed from 2x larger farm (CDF+CMS)
Rutgers           100      4       400         4          In-kind, will do MC production
TTU               6        2       60          4          2 DCAFs, test site + CDF+CMS cluster
GridKa (Germany)  ~200     16      ~240        18         Min. guaranteed CPU from 8x larger pool; open to all by ~Dec (JIM)
Canada            240+     -       240+        -          In-kind, doing MC production, + common pool
Japan             -        -       150         6          Just being deployed (07/2004)
Cantabria         30       1       60          2          ~1 month away
MIT               -        -       200         -          ~1 month away
UK                -        -       400         -          Open to all by ~Dec (JIM), + common pool
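Treating the approximate entries ("~", "+") as exact and the missing ones as zero, the table implies roughly a 2.5x growth in CPU by summer; a quick tally:

```python
# CPU (GHz) totals from the CDF-GRID hardware table, treating "~" and
# "+" entries as exact and "-" entries as zero (an approximation).
ghz_now    = {"INFN": 250, "Taiwan": 100, "Korea": 120, "UCSD": 280,
              "Rutgers": 100, "TTU": 6, "GridKa": 200, "Canada": 240,
              "Cantabria": 30}
ghz_summer = {**ghz_now, "INFN": 950, "Taiwan": 150, "Rutgers": 400,
              "TTU": 60, "GridKa": 240, "Cantabria": 60,
              "Japan": 150, "MIT": 200, "UK": 400}

total_now, total_summer = sum(ghz_now.values()), sum(ghz_summer.values())
print(total_now, total_summer)  # 1326 3250: roughly a 2.5x increase
```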
DØ Grid / Remote Computing, April 2004. Joel Snow, Langston University
DØSAR MC Delivery Stats (as of May 10, 2004)

Institution        Inception  N_MC (TMB) ×10^6
LTU                6/2003     0.4
LU                 7/2003     2.3
OU                 4/2003     1.6
Tata, India        6/2003     2.2
Sao Paulo, Brazil  4/2004     0.6
UTA-HEP            1/2003     3.6
UTA-RAC            12/2003    8.2
DØSAR total        5/10/04    18.9
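As a check, the per-site Monte Carlo figures do sum to the quoted total:

```python
# Per-site DØSAR MC deliveries (millions of TMB events) from the table;
# they add up to the quoted 18.9M total.
delivered = {"LTU": 0.4, "LU": 2.3, "OU": 1.6, "Tata": 2.2,
             "Sao Paulo": 0.6, "UTA-HEP": 3.6, "UTA-RAC": 8.2}
total = round(sum(delivered.values()), 1)
print(total)  # 18.9
```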
DØSAR Computing & Human Resources

Institution     CPU (GHz) [future]  Storage (TB)       People
Cinvestav       13                  1.1                1F + ?
Langston        22                  1.3                1F + 1GA
LTU             25 + [12]           1.0                1F + 1PD + 2GA
KU              12                  ??                 1F + 1PD
KSU             40                  1.2                1F + 2GA
OU              19 + 270 (OSCER)    1.8 + 120 (tape)   4F + 3PD + 2GA
Sao Paulo       60 + [120]          4.5                2F + many
Tata Institute  52                  1.6                1F + 1Sys
UTA             430                 74                 2.5F + 1Sys + 1.5PD + 3GA
Total           943 [1075]          85.5 + 120 (tape)  14.5F + 2Sys + 6.5PD + 10GA
Current Texas Grid Status
• DØSAR Grid
  – At the recent workshop at Louisiana Tech Univ.:
    • 6 clusters form a regional computational grid for MC production
    • Simulated data production on the grid is in progress
  – Institutions are paired to bring up new sites more quickly
  – Collaboration between the DØSAR consortium and the JIM team at Fermilab has begun for further software development
• CDF Grid
  – Less functionality than more ambitious HEP efforts, such as the LHC Grid, but:
    • Works now! Already in use!!
    • Deployment led by TTU
    • Tuned to users' needs
    • Goal-oriented, not just object-oriented, software!
    • Based on working models and sparing use of standards
    • Costs little to get started
• A large amount of documentation and expertise in grid computing has already accumulated between TTU and UTA
• Comparable experience is probably available at other Texas institutions
Also have Sloan Digital Sky Survey and other astrophysics work
  – TTU SDSS DR1 mirror copy (first in the world)
  – Locally hosted MySQL DB
  – Image files stored on university NAS storage
  – Submitted a proposal with Astronomy and CS colleagues for a nationally-oriented database storage model based on a new local observatory
  – Virtual Observatory (VO) storage methods: international standards under development
  – Astrophysics is increasingly moving towards grid methods
Summary and Plans
• Significant progress has been made within Texas in implementing grid computing technologies for current and future HEP experiments
• UTA and TTU are playing leading roles in the Tevatron grid effort for the currently running DØ and CDF, as well as in the LHC ATLAS and CMS experiments
• All HEP experiments are building operating grids for MC data production
• A large amount of documentation and expertise exists within Texas!
• Already doing MC; moving toward data re-processing and analysis
  – Different levels of complexity can be handled by the emerging framework
• Improvements to infrastructure are necessary, especially with respect to network bandwidth
  – THEGrid will boost the stature of Texas in the HEP grid computing world
  – Regional plans: started working with AMPATH, Oklahoma, Louisiana, and Brazilian consortia (tentatively named the BOLT Network)
  – A Texas-based consortium is needed to make progress in HEP and astrophysics computing
Summary and Plans, cont'd
• Many pieces are shared between the DØ and CDF experiments for global grid development: this provides a template for THEGrid work
• Near-term goals:
  – Involve other institutions, including those in Texas
  – Implement and use an analysis grid 4 years before the LHC
  – Work in close relation with, but not as part of, the LHC Grid (so far)
  – Other experiments will benefit from feedback and use cases
  – Lead the development of these technologies for HEP
  – Involve other experiments and disciplines; expand the grid
  – Complete the THEGrid document
• THEGrid will provide ample opportunity to increase inter-disciplinary research and education activities