TeraGrid
Simo Niskala, Teemu Pasanen
• general
• objectives
• resources
• service architecture
  – grid services
  – TeraGrid application services
• Using TeraGrid
General
• An effort to build and deploy the world's largest, fastest, distributed infrastructure for open scientific research
• Extensible Terascale Facility, ETF
• Funded by the National Science Foundation, NSF
  – a total of $90 million at the moment
• Partners:
  – Argonne National Laboratory, ANL
  – National Center for Supercomputing Applications, NCSA
  – San Diego Supercomputing Center, SDSC
  – Center for Advanced Computing Research (Caltech), CACR
  – Pittsburgh Supercomputing Center, PSC
  – new partners in September 2003:
    • Oak Ridge National Laboratory, ORNL
    • Purdue University
    • Indiana University
    • Texas Advanced Computing Center, TACC
• Provides terascale computing power by connecting several supercomputers with Grid technologies
• Will offer 20 TFLOPS when ready in 2004
  – the first 4 TFLOPS will be available for use around Jan 2004
Objectives
• increase computational capabilities for the research community with geographically distributed resources
• deploy a distributed "system" using Grid technologies rather than a "distributed computer"
• define an open and extensible infrastructure
  – focus on integrating "resources" rather than "sites"
  – adding resources will require significant, but not unreasonable, effort
    • supporting key protocols and specifications (e.g. authorization, accounting)
  – supporting heterogeneity while exploiting homogeneity
    • balancing complexity and uniformity
Resources
• 4 clusters at ANL, Caltech, NCSA and SDSC
  – Itanium 2-based Linux clusters
  – total computing capacity of 15 TFLOPS
• Terascale Computing System, TCS-1, at PSC
  – AlphaServer-based Linux cluster
  – 6 TFLOPS
• HP Marvel system at PSC
  – a set of SMP machines
  – 32 × 1.15 GHz Alpha EV67 CPUs and 128 GB of memory per machine
• ~1 petabyte of networked storage
• 40 Gb/s backplane network
Resources
• Backplane network
  – consists of 4 × 10 Gb/s optical fiber channels
  – enables a "machine room" network across sites
  – optimized for peak requirements
  – designed to scale to a much smaller number of sites than a general WAN
  – a separate TeraGrid resource
    • used only for the data transfer needs of TeraGrid resources
Grid Services

Service layer                         | Functionality                                                                     | TeraGrid implementation
--------------------------------------+-----------------------------------------------------------------------------------+---------------------------------------------
Advanced Grid Services                | super schedulers, resource discovery services, repositories, etc.                  | SRB, MPICH-G2, distributed accounting, etc.
Core Grid Services (Collective layer) | TeraGrid information service, advanced data movement, job scheduling, monitoring   | GASS, MDS, Condor-G, NWS
Basic Grid Services (Resource layer)  | authentication and access; resource allocation/management; data access/management; resource information service; accounting | GSI-SSH, GRAM, Condor, GridFTP, GRIS
Advanced Grid Services
• on top of the Core and Basic Services
• enhancements required for TeraGrid
• for example the Storage Resource Broker, SRB (see the sketch below)
• additional capabilities
• new services possible in the future
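For illustration, a minimal sketch of using SRB from Python by shelling out to the SRB client "Scommands"; it assumes the Scommands are installed and an SRB account is already configured, and the file name is a hypothetical placeholder.

    # Minimal sketch: store and list a file in SRB via the Scommands.
    # Assumes the SRB client tools are installed and configured.
    import subprocess

    subprocess.run(["Sinit"], check=True)                # open an SRB session
    subprocess.run(["Sput", "results.dat"], check=True)  # upload into the current collection (hypothetical file)
    subprocess.run(["Sls"], check=True)                  # list the collection to verify
    subprocess.run(["Sexit"], check=True)                # close the session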
Core Grid Services
• built on the Basic Grid Services
• focus on the coordination of multiple services
• mostly implementations of Globus services
  – MDS, GASS, etc.
• supported by most TeraGrid resources (a Condor-G submission sketch follows below)
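As a sketch of the Core layer in use, the snippet below submits a job through Condor-G from Python, writing a submit description in the classic "globus universe" syntax of the period; the gatekeeper address and all file names are hypothetical placeholders.

    # Minimal sketch: submit a job through Condor-G (globus universe).
    # Gatekeeper host, executable and file names are hypothetical.
    import subprocess

    submit = """\
    universe        = globus
    globusscheduler = tg-login.example.teragrid.org/jobmanager-pbs
    executable      = my_app
    output          = my_app.out
    error           = my_app.err
    log             = my_app.log
    queue
    """

    with open("my_app.submit", "w") as f:
        f.write(submit)

    subprocess.run(["condor_submit", "my_app.submit"], check=True)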
Basic Grid Services
• focus on sharing single resources
• implementations of e.g. GSI and GRAM (see the sketch below)
• should be supported by all TeraGrid resources
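A minimal sketch of the Basic-layer workflow from Python, assuming the Globus Toolkit client tools are installed: create a GSI proxy, then run a command on a remote resource through GRAM. The host name is a hypothetical placeholder.

    # Minimal sketch: GSI authentication followed by a GRAM job.
    import subprocess

    subprocess.run(["grid-proxy-init"], check=True)   # create a GSI proxy (prompts for pass phrase)
    subprocess.run(["globus-job-run",
                    "tg-login.example.teragrid.org",  # hypothetical gatekeeper
                    "/bin/hostname"],
                   check=True)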
Grid Services
• provide clear specifications for what a resource must do in order to participate
• only the specifications are defined; implementations are left open
TeraGrid Application Services
• enable running applications on a heterogeneous system
• on top of the basic and core Grid services
• under development
• new service specifications to be added by current and new TeraGrid sites
TeraGrid Application Services
Service                             | Objective
------------------------------------+---------------------------------------------------------------------
Basic Batch Runtime                 | supports running statically linked binaries (see the sketch below)
High Throughput Runtime (Condor-G)  | supports running naturally distributed applications using Condor-G
Advanced Batch Runtime              | supports running dynamically linked binaries
Scripted Batch Runtime              | supports scripting (including compilation)
On-Demand / Interactive Runtime     | supports interactive applications
Large-Data                          | supports very large data sets, data pre-staging, etc.
File-Based Archive                  | supports a GridFTP interface to data services
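As a sketch of what the Basic Batch Runtime implies in practice, the snippet below submits a binary as a batch job through GRAM with an RSL description; the gatekeeper address, executable path and job size are hypothetical placeholders.

    # Minimal sketch: batch submission via globusrun with an RSL string.
    import subprocess

    rsl = "&(executable=/home/myuser/my_static_app)(count=4)(jobType=mpi)"
    subprocess.run(["globusrun", "-r",
                    "tg-login.example.teragrid.org/jobmanager-pbs",  # hypothetical resource
                    rsl],
                   check=True)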
Using TeraGrid
• Access
  – account
    • account request form
    • Globus certificate for authentication and a Distinguished Name (DN) entry
  – logging in
    • single-site access requires SSH
    • multiple-site access requires GSI-enabled SSH (see the sketch below)
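A small sketch of the two login paths from Python, assuming the GSI-enabled SSH client (gsissh) is installed; host names are hypothetical placeholders.

    # Minimal sketch: single-site vs. multiple-site login.
    import subprocess

    # single-site access: ordinary SSH
    subprocess.run(["ssh", "myuser@tg-login.example.teragrid.org"])

    # multiple-site access: create a GSI proxy once, then gsissh to any site
    subprocess.run(["grid-proxy-init"], check=True)
    subprocess.run(["gsissh", "tg-login.example.teragrid.org"])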
Using TeraGrid
• Transferring files
  – Storage Resource Broker (SRB)
    • data management tool for storing large data sets across distributed, heterogeneous storage
  – High Performance Storage System (HPSS)
    • moving entire directory structures between systems
  – SCP
    • copying user files to TeraGrid platforms using SCP
  – globus-url-copy
    • transferring files between sites using GridFTP (see the sketch below)
  – GSINCFTP
    • uses a proxy for authentication
    • additional software on top of the Globus Toolkit
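A sketch of a site-to-site GridFTP transfer with globus-url-copy, assuming a valid GSI proxy already exists; both URLs are hypothetical placeholders.

    # Minimal sketch: third-party transfer between two GridFTP servers.
    import subprocess

    subprocess.run(["globus-url-copy",
                    "gsiftp://gridftp.site-a.example.org/home/myuser/data.tar",   # hypothetical source
                    "gsiftp://gridftp.site-b.example.org/home/myuser/data.tar"],  # hypothetical destination
                   check=True)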
Using TeraGrid
• Programming Environments
  – IA-64 clusters (at NCSA, SDSC, Caltech, ANL)
    • Intel (default) and GNU compilers; mpich-gm (default MPI compiler)
  – PSC clusters
    • HP (default) and GNU compilers
  – SoftEnv software
    • manages users' environments through symbolic keys (see the sketch below)
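A sketch of what a user's ~/.soft file might contain; the key names here are hypothetical examples, not the actual TeraGrid key set, and SoftEnv typically re-reads the file when the resoft command is run.

    @default          # hypothetical: the site's default environment
    +intel            # hypothetical key: add the Intel compilers
    +mpich-gm         # hypothetical key: add the default MPI environment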