Athena & the Grid: Architectural View
Craig E. Tull, HCG/NERSC/LBNL. ATLAS/LHCb/GridPP Workshop, Cosener's House, May 23, 2002.
[email protected] - Athena & Grid, Architecture View (23may02 - ATLAS/LHCb/GridPP WkShp @ Cosener's)
Athena & the Grid: Architectural View
Craig E. Tull
HCG/NERSC/LBNL
ATLAS/LHCb/GridPP Workshop
Cosener's House - May 23, 2002
What this talk is:
• What this talk is not:
  - Another presentation of GRAPPA.
  - See Rob's talk of yesterday.
• What this talk is:
  - An ATLAS perspective on the view of the Grid from the Athena/Gaudi Framework.
  - A seat-of-the-pants distillation of some impressions from this workshop's presentations.
  - Food for thought and discussion in this afternoon's session.
  - … and slightly random.
Athena/GAUDI Architecture
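[Diagram: Athena/GAUDI component architecture. The Application Manager steers the Algorithms. Algorithms exchange data through three transient stores (Transient Event Store, Transient Detector Store, Transient Histogram Store), which are served by the Event Data Service, the Detector Data Service, and the Histogram Service respectively. Each data service has a Persistency Service that uses Converters to read and write Data Files. Supporting services: Message Service, JobOptions Service, Particle Properties Service, other services.]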
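The pattern in this diagram (algorithms that only see a transient store, with converters filling it on demand) can be sketched in a few lines. This is a minimal illustration, not Gaudi code: the class names `TransientStore` and `HitCounter`, the method names, and the store keys are all assumptions made up for the example.

```python
from typing import Any, Callable, Dict


class TransientStore:
    """Toy transient store: an object is built by its converter on first access."""

    def __init__(self) -> None:
        self._objects: Dict[str, Any] = {}
        self._converters: Dict[str, Callable[[], Any]] = {}

    def register_converter(self, key: str, converter: Callable[[], Any]) -> None:
        # The converter stands in for "read this object from a data file".
        self._converters[key] = converter

    def retrieve(self, key: str) -> Any:
        # Convert lazily, then serve the cached transient object.
        if key not in self._objects:
            self._objects[key] = self._converters[key]()
        return self._objects[key]

    def record(self, key: str, obj: Any) -> None:
        self._objects[key] = obj


class HitCounter:
    """Toy algorithm: reads one object from the store and records a derived one."""

    def __init__(self, store: TransientStore) -> None:
        self.store = store

    def execute(self) -> None:
        hits = self.store.retrieve("/Event/Hits")
        self.store.record("/Event/NHits", len(hits))


store = TransientStore()
# Hypothetical converter: a real one would go through the persistency service.
store.register_converter("/Event/Hits", lambda: [0.1, 0.7, 1.3])
HitCounter(store).execute()
print(store.retrieve("/Event/NHits"))
```

The algorithm never learns where the hits came from, which is the property the rest of the talk relies on.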
Grid vs. Athena Services
Bigger Picture
[Diagram: the Athena/GAUDI component architecture (Application Manager, Algorithms, Converters, Event Selector, the Event, Detector, and Histogram data services with their transient stores and Persistency Services, and the Message, JobOptions, Particle Properties, and other services) shown within its wider environment. External elements: Analysis Program, OS, Mass Storage, Event Database, PDG Database, DataSet DB, Monitoring Service, Histo Presenter, Job Service, Config. Service, and others.]
Bucket of Cold Water
Grid: The new paradigm
• The Grid offers a vision of computer resources that are: Distributed, Heterogeneous, Robust, and Integrated.
• Some concepts are qualitatively new.
  - Resource Discovery, Virtual Data, Reserved QoS
• Some concepts are quantitatively "new".
  - Number of sites/jobs/nodes/users.
• Some concepts are old wine in new skins.
  - Distributed processing
• Some are natural & "obvious" extensions of old concepts.
  - Unix Groups → VO, LFNs
Grid Projects: Integrated?
• We've heard here about:
  - GANGA, GRAPPA, BOSS, AliEn
  - CMT, Pacman, Packman, DAR
  - WP1 JSS, GriPhyN Planner
  - Magda, WP2 Replica Service
  - NetLogger, Prophesy, GMA, R-GMA, GridView, Ganglia
  - VDL/IVDL, WP1 JDL, Condor ClassAds
  - EDG, PPDG, GriPhyN, GridPP, InfoGrid, CrossGrid, GGF, Monarch, …
• How do we take advantage of Grid capability while protecting ourselves from potential duplication/conflicts of roles & responsibility?
Grid: Ready for PrimeTime?
• CHEP'98 - First HENP Grid (Clipper) talk
  - #237 Directions and Issues for High Data Rate Wide Area Network Environments
• Many Grid projects are CS R&D, but production grids do exist (e.g. NASA InfoGrid), and indications are that Grid computing is gaining momentum in the non-HENP (i.e. mainstream) world.
• IBM/Globus Partnership - 12 developers
ATLAS SW & Grid Projects
• The Grid does now offer advantages & functionality. More will certainly come.
• We cannot afford to wait to be handed the solution.
• APIs to Grid services need to be compatible with, or adapted to, Athena Services.
• ATLAS interests/requirements need to be communicated to Grid researchers/developers & DOE/NSF.
• Timelines for ATLAS need to be defined.
  - Grid timeline is not the same as some others
  - FTE resources avail. are critical input
• Much current work concentrates on issues like:
  - Data Volume, Data Set Distribution, ATLAS Resources (Disk, CPU, HMS), Network Connectivity, $$$, FTE, etc.
• Distributed Computing Model must be defined.
• Control Framework
  - Grid-compatible / Grid-aware, but not Grid-dependent
Grid aware, but not dependent.
• Interface Technologies
  - Programmatic API (e.g. C, C++, etc.)
  - Scripting as Glue à la Stallman (e.g. Python)
  - JobOptions.{txt,py}
  - Sandbox
  - Others?
    • e.g. SOAP, CORBA, RMI, DCOM, .NET, etc.
• International Standards would help!
  - Global Grid Forum
• A staged approach is called for.
  - Simple Batch model to begin. Add simple Grid functionality via Services. Continual feedback.
Athena/Grid Interface
• For the programmatic interface to Grid services, we are thinking in terms of Gaudi services to capture and present the functionality of the grid services (not necessarily a one-to-one mapping, BTW).
• I think it is important at this stage (maybe forever) to ensure that the framework is "grid-capable" without being "grid-dependent", i.e. we should always be able to run without grid services available.
  - Gaudi's component architecture makes this approach to using the grid quite natural.
  - How do we switch between Grid/non-Grid?
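One possible answer to the switching question is to hide Grid access behind a service interface and pick the implementation from the job configuration. The sketch below only illustrates that idea; `IReplicaSvc`, `LocalReplicaSvc`, `GridReplicaSvc`, `make_replica_svc`, and the `UseGrid` and `ReplicaCatalog` options are hypothetical names, not part of Gaudi, Athena, or any Grid middleware.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class IReplicaSvc(ABC):
    """Hypothetical service interface: client code depends only on this."""

    @abstractmethod
    def physical_names(self, lfn: str) -> List[str]:
        ...


class LocalReplicaSvc(IReplicaSvc):
    """Non-Grid fallback: treat the path part of the LFN as a local file path."""

    def physical_names(self, lfn: str) -> List[str]:
        # "lfn://host/some/path" -> "/some/path"
        path = lfn.split("://", 1)[1].split("/", 1)[1]
        return ["/" + path]


class GridReplicaSvc(IReplicaSvc):
    """Grid-backed variant; a plain dict stands in for a real replica catalog."""

    def __init__(self, catalog: Dict[str, List[str]]) -> None:
        self._catalog = catalog

    def physical_names(self, lfn: str) -> List[str]:
        return list(self._catalog.get(lfn, []))


def make_replica_svc(options: dict) -> IReplicaSvc:
    # The Grid/non-Grid switch is a configuration choice, not a code change.
    if options.get("UseGrid", False):
        return GridReplicaSvc(options["ReplicaCatalog"])
    return LocalReplicaSvc()


svc = make_replica_svc({"UseGrid": False})
print(svc.physical_names("lfn://host.example/data/run1.evt"))
```

With this shape, running without Grid services available means changing one option rather than touching algorithm code.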
Jul’01: PSEUDOCODE FOR ATLAS SHORT TERM UC01
Logical File Name
  LFN = "lfn://" hostname "/" any_string
Physical File Name
  PFN = "pfn://" hostname "/" path
Transfer File Name
  TFN = "gridftp://" PFN_hostname "/" path
JDL
  InputData = {LFN[]}
  OutputSE = host.domain.name
Worker Node
  LFN[] = WP1.LFNList()
  for (i = 0; i < LFN.length; i++) {
    PFN[] = ReplicaCatalog.getPhysicalFileNames(LFN[i])
    j = Athena.eventSelectionSrv.determineClosestPF(PFN[])
    localFile = GDMP.makeLocal(PFN[j], OutputSE)
    Athena.eventSelectionSrv.open(localFile)
  }
PFN[] = getPhysicalFileNames(LFN)
PFN = getBestPhysicalFileName(PFN[], String[] protocols)
TFN = getTransportFileName(PFN, String protocol)
filename = getPosixFileName(TFN)
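The worker-node loop above can be tried out with stand-ins for the Grid pieces. In this sketch the replica catalog is an in-memory dict, "closest" is approximated by matching the host name against a local domain, and no file is actually transferred. The host names, the selection rule, and the function names are assumptions for illustration only; they are not the WP1, WP2, or GDMP APIs.

```python
from typing import Dict, List

# Hypothetical catalog contents; a real job would query the replica service.
REPLICA_CATALOG: Dict[str, List[str]] = {
    "lfn://atlas.example/run1.evt": [
        "pfn://se1.example/data/run1.evt",
        "pfn://se2.example/store/run1.evt",
    ],
}


def get_physical_file_names(lfn: str) -> List[str]:
    return list(REPLICA_CATALOG[lfn])


def host_of(name: str) -> str:
    # "scheme://host/path" -> "host"
    return name.split("://", 1)[1].split("/", 1)[0]


def determine_closest(pfns: List[str], local_domain: str) -> int:
    """Assumed rule: prefer a replica hosted in the local domain, else the first."""
    for i, pfn in enumerate(pfns):
        if host_of(pfn).endswith(local_domain):
            return i
    return 0


def get_transport_file_name(pfn: str, protocol: str = "gridftp") -> str:
    # Same host and path as the PFN, with the transfer protocol as the scheme.
    return protocol + "://" + pfn.split("://", 1)[1]


def resolve_inputs(lfns: List[str], local_domain: str) -> List[str]:
    tfns = []
    for lfn in lfns:
        pfns = get_physical_file_names(lfn)
        best = pfns[determine_closest(pfns, local_domain)]
        tfns.append(get_transport_file_name(best))
    return tfns


print(resolve_inputs(["lfn://atlas.example/run1.evt"], "se2.example"))
```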
WP2: Replica Manager API (old: pre-SFN terminology)
• addPhysicalFileName(LogicalFileName, PhysicalFileName)
• deletePhysicalFileName(LogicalFileName, PhysicalFileName)
• SFN = getPhysicalFileNames(LogicalFileName)
• copy(PhysicalFileName source, PhysicalFileName destination, String protocol)
• copyAndAddPhysicalFile(PhysicalFileName source, PhysicalFileName destination, LogicalFileName lfn, String protocol)
• generatePhysicalFileName(LogicalFileName filename, PhysicalFileNamePattern)
• estimateCostForCopy(PhysicalFileName source, PhysicalFileName destination, String protocol)
• SFN = getLocationOfBestReplica(LogicalFileName)
• getBestPhysicalFileName(PhysicalFileNameList, ProtocolList)
• getTransportFileName(PhysicalFileName, Protocol)
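To make the bookkeeping behind a few of these calls concrete, here is a small in-memory stand-in. It mirrors only the add, delete, lookup, and copy-and-add operations from the list; the class name, the Python method names, and the injected `copy` callable are assumptions, and nothing here talks to the real WP2 service.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class InMemoryReplicaManager:
    """Toy catalog mapping each logical file name to its known replicas."""

    def __init__(self) -> None:
        self._replicas: DefaultDict[str, List[str]] = defaultdict(list)

    def add_physical_file_name(self, lfn: str, pfn: str) -> None:
        # Registering the same replica twice is a no-op.
        if pfn not in self._replicas[lfn]:
            self._replicas[lfn].append(pfn)

    def delete_physical_file_name(self, lfn: str, pfn: str) -> None:
        if pfn in self._replicas.get(lfn, []):
            self._replicas[lfn].remove(pfn)

    def get_physical_file_names(self, lfn: str) -> List[str]:
        return list(self._replicas.get(lfn, []))

    def copy_and_add_physical_file(
        self, source: str, destination: str, lfn: str,
        copy: Callable[[str, str], None],
    ) -> None:
        # Copy first, register second, so a failed copy leaves no stale entry.
        copy(source, destination)
        self.add_physical_file_name(lfn, destination)


transfers = []
manager = InMemoryReplicaManager()
manager.add_physical_file_name("lfn://atlas.example/run1.evt",
                               "pfn://se1.example/data/run1.evt")
manager.copy_and_add_physical_file(
    "pfn://se1.example/data/run1.evt",
    "pfn://se2.example/store/run1.evt",
    "lfn://atlas.example/run1.evt",
    copy=lambda src, dst: transfers.append((src, dst)),  # pretend transfer
)
print(manager.get_physical_file_names("lfn://atlas.example/run1.evt"))
```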
Athena Distributed Instrumentation
• Part of SuperComputing 2002 ATLAS demo
• IMonitorSvc (IChronoStatSvc extension?)
  - Abstract application monitoring service.
• Prophesy (http://prophesy.mcs.anl.gov/)
  - An Infrastructure for Analyzing & Modeling the Performance of Parallel & Distributed Applications
  - Normally a parse & auto-instrument approach (C & FORTRAN).
• NetLogger (http://www-didc.lbl.gov/NetLogger/)
  - End-to-End Monitoring & Analysis of Distributed Systems
  - C, C++, Java, Python, Perl, Tcl APIs
  - Web Service Activation
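An abstract monitoring service of the kind suggested here might look roughly like the following. The slide names `IMonitorSvc` but does not define it, so the single `report` method, the in-memory backend, and the `timed` helper are all assumptions. A NetLogger or Prophesy backend would be another implementation of the same interface; neither tool's API is shown.

```python
import time
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Iterator, List, Tuple


class IMonitorSvc(ABC):
    """Hypothetical interface: instrumented code reports named measurements."""

    @abstractmethod
    def report(self, name: str, value: float) -> None:
        ...


class InMemoryMonitorSvc(IMonitorSvc):
    """Simplest backend: keep the measurements in a list."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, float]] = []

    def report(self, name: str, value: float) -> None:
        self.records.append((name, value))


@contextmanager
def timed(monitor: IMonitorSvc, name: str) -> Iterator[None]:
    """Report the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        monitor.report(name, time.perf_counter() - start)


monitor = InMemoryMonitorSvc()
with timed(monitor, "HitCounter.execute"):
    sum(range(1000))  # stands in for an algorithm's execute step
print(monitor.records[0][0])
```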
WP1: Sandbox
• Working area (input & output) replicated on each CE to which a Grid job is submitted.
  - Very convenient & natural.
• My Concerns:
  - Requires network access (with associated privileges) to all CEs on Grid.
    • Could be a huge security issue with local administrators.
  - Not (yet) coordinated with WP2 services.
  - Sandbox contents not customizable to local (CE/SE/PFN) environment.
  - Temptation to Abuse (not for data files)
Grid System
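[Diagram: job flow between the ATLAS software and the Grid system. Labels as extracted: Planner, ATLAS planner, JDL, JobOptions, GDB, Output fragment, Job, Physical File, GDB input, GDB Magda, Sandbox, WP2 Rep Mgr, "Specify input", "Logical filenames", "Register output", WP1 JSS.]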
ATLAS SW & the Grid
• What are the implications of a distributed computing model and grids for:
• The database domain?
  - Extensive in almost any case
• The control framework?
  - Depends upon the model (e.g., distributed data sources versus distributing executables versus distributed execution)
• Other ATLAS software infrastructure?
  - e.g. Build & install tools & kits
Distributed Processing Models
• Batch-like Processing (à la WP1)
• Distributed Single Event (MPP)
• Client-Server (interactive)
• WAN Data Access (AMS, Clipper)
• File Transfer and Local Processing (GDMP)
• Agent-based Processing (distributed control)
• Check-Point & Migrate (save & restore)
• Scatter & Gather (parallel events)
• Move the data or move the executable?
  - No experiment is planning to write PetaBytes of Code!
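As a concrete reading of one entry in this list, the Scatter & Gather model hands independent events to parallel workers and collects the results in order. The sketch below does this on a single machine with Python's standard `concurrent.futures`; the event contents and the `process_event` function are made up for illustration, and a Grid version would scatter across nodes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_event(event: List[float]) -> float:
    """Stand-in for per-event work: sum the energy deposits of one event."""
    return sum(event)


def scatter_gather(events: List[List[float]], workers: int = 4) -> List[float]:
    # Scatter: each event goes to a worker. Gather: map() returns the
    # results in the same order as the input events.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_event, events))


events = [[1.0, 2.0], [3.5], []]
totals = scatter_gather(events)
print(totals)
```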
ATLAS Distributed Processing Model
• At this point, it is still not clear what the final ATLAS distributed computing model will be. Although newer ideas like Agent-based Processing have a great deal of appeal, they are as yet unproven in a large-scale production environment.
• A conservative approach would be some combination of Batch-like Processing and File Transfer and Local Processing for batch jobs, with perhaps a Client-Server or Scatter-Gather approach for interactive/analysis jobs.
  - PPDG CS-11 - Interfacing and Integrating Interactive Data Analysis Tools with the Grid and Identifying Common Components and Services
Data Access Patterns
• Data access patterns of physics jobs also heavily influence our thinking about interacting with the Grid. It is likely that all possible data access patterns will be extant in ATLAS data processing at various stages in that processing. We may find that some data access patterns lend themselves to efficient use of the Grid much better than others.
• Data access patterns include:
  - Sequential Access (reconstruction)
  - Random Access (interactive analysis)
  - File/Data Set Driven (LFN-friendly)
  - Navigational Driven (OODB-like)
  - Query Driven (SQL/OQL/JDO/etc.)
DB Architectural Elements
• Events are write-once
• Three capabilities to support optimization:
  • Event sharing
  • Data sharing
  • Data placement (clustering)
• Therefore, different storage formats
  - Does not mean different technologies!
  - Different ways to represent events and sets of events.
  - Possible because navigation is separated from storage.
  - Examples…
ATLAS DataBase Architecture - Ed Frank
Architectural Motif - Extract & Transform
• Architecture will express many storage formats
  - Any job can read any of them without reconfiguration
• Can always extract events for transport, regardless of format
  - Cost depends upon the storage format
• Tier 0 assigned responsibility of keeping a copy of the data in a format such that extraction costs are affordable
  - Archival data format
• Can always transform (write) data into a new format
  - Store in format for local optimization
Extract and Transform
[Diagram: three sites (Site 1, Site 2, Site 3) exchanging event data. Arrow labels: "Transport & Install", "Extract & transform", "Just Extract", "Transport, transform & Install".]
ATLAS DataBase Architecture - Ed Frank
Object Access vs File Access
• ATLAS (like others) is basing its Event Data Model (EDM) on a (transient) Object Data Model.
  - This transient model maps onto a persistent Object Model (not necessarily 1-to-1).
• We require users to think of objects in the transient store at the Algorithm level.
  - The Transient Data Store (TDS) has data access proxy concepts built in to read objects in from persistency to the TDS.
• Current Grid products are heavily oriented towards an LFN-like view of data.
  - Perfectly natural, as this is the system-level view of data & a convenient unit for atomic data transfer across the network (e.g. FTP, gridFTP).
• BUT, if we want users to think objects, the object to LFN/PFN mapping has to be somewhere.
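One place the object-to-LFN mapping could live is inside the data access proxy, so that the user asks for an object and the proxy works out which file to touch. The sketch below assumes a simple lookup table from object key to (LFN, index within file); the table, the keys, the loader, and the `DataProxy` class are hypothetical and only illustrate the idea.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical mapping from transient-store key to (logical file, index in file).
OBJECT_TO_FILE: Dict[str, Tuple[str, int]] = {
    "/Event/42/Tracks": ("lfn://atlas.example/run1.evt", 0),
    "/Event/43/Tracks": ("lfn://atlas.example/run2.evt", 0),
}

opened_files: List[str] = []  # records which logical files were actually touched


def load_from_file(lfn: str, index: int) -> Any:
    """Stand-in for replica lookup, transfer, and conversion of one object."""
    opened_files.append(lfn)
    return {"source": lfn, "index": index}


class DataProxy:
    """Resolves its object to a file only when the object is first requested."""

    def __init__(self, key: str,
                 loader: Callable[[str, int], Any] = load_from_file) -> None:
        self._key = key
        self._loader = loader
        self._object: Any = None
        self._loaded = False

    def get(self) -> Any:
        if not self._loaded:
            lfn, index = OBJECT_TO_FILE[self._key]
            self._object = self._loader(lfn, index)
            self._loaded = True
        return self._object


proxies = {key: DataProxy(key) for key in OBJECT_TO_FILE}
tracks = proxies["/Event/42/Tracks"].get()
print(opened_files)
```

Only the file holding the requested object is touched, and the user code never mentions an LFN.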
Ganga Scenarios
• Scenario 1
  - User makes a "high-level" selection of data to process and defines processing job.
    • "High-level" means based on event characteristics and not on file or event identity.
  - High-level event selection uses ATLAS Bookkeeping DataBase (similar to current LArC Bookkeeping data base or BNL's Magda) to select event & logical file identities.
  - Construct JDL for WP1 using LFNs
  - Construct jobOptions.py using PFNs (w/ WP2)
  - Submit job(s) using JDL & jobOptions.py in sandbox.
• Scenario 2 - The same, except jobOptions.py now contains LFNs. This requires a Replica Service API-enabled EvtSelector or ConversionSrv.
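The only difference between the two scenarios is where LFNs get turned into PFNs: at submission time (Scenario 1) or inside the running job (Scenario 2). A rough sketch of that choice follows. The dictionaries stand in for a JDL file and a jobOptions.py; the `build_job` function, the `input_files` key, and the lookup callable are assumptions, and real JDL and job options syntax is not reproduced.

```python
from typing import Callable, Dict, List, Tuple


def build_job(
    lfns: List[str],
    replica_lookup: Callable[[str], List[str]],
    resolve_at_submission: bool,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Return (jdl, job_options) stand-ins for the two scenarios."""
    # In both scenarios the JDL names its input data by LFN.
    jdl = {"InputData": list(lfns)}
    if resolve_at_submission:
        # Scenario 1: job options carry PFNs resolved before submission.
        inputs = [replica_lookup(lfn)[0] for lfn in lfns]
    else:
        # Scenario 2: job options carry LFNs; the event selector or
        # conversion service must resolve them while the job runs.
        inputs = list(lfns)
    job_options = {"input_files": inputs}
    return jdl, job_options


catalog = {"lfn://atlas.example/run1.evt": ["pfn://se1.example/data/run1.evt"]}
jdl, opts = build_job(list(catalog), lambda lfn: catalog[lfn], True)
print(jdl, opts)
```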
Observation about GUIs
• Several projects are promoting GUIs.
  - WP1, Grappa, AliEn, others.
• Independently written "native" GUIs are notoriously difficult to integrate/make coherent.
• Web-based GUIs are easier to integrate, but offer limited functionality.
Rule #1: Protect the User
• Real Data vs. Virtual Data
• LFN vs. PFN/TFN/SFN
• Grid Enabled vs. Standalone
• We do not want the user of the Framework to know or care about details like this.
  - Implies: Uniform, abstract access to/specification of data sets (i.e. if Real and Virtual Data are to be used).
  - Dummy (non-Grid) implementations of Grid-enabled Services?
Way Forward/Discussion
• Goal: Give direction to new hires funded by GridPP to ensure that their work has the widest applicability in both ATLAS & LHCb.
• Discussion Questions:
  - Data-File or Data-Object level access?
  - Heterogeneity - How much? (Client vs. Server)
  - Communication Protocols?
  - How to synchronize/coordinate?
    • ATLAS world-wide & Large Active US effort
    • LHCb - no US component => more EDG-centric
  - GAUDI/Athena - Where to draw the line?
    • Grid middleware/Svc Interfaces/Implementations
  - Balance Short-term Usability vs. Long-term Functionality - Remember the mainstream.