Australian Partnership for Advanced Computing
“providing advanced computing and grid infrastructure for eResearch”
Rhys Francis, Manager, APAC grid program
Partners:
• Australian Centre for Advanced Computing and Communications (ac3) in NSW
• The Australian National University (ANU)
• Commonwealth Scientific and Industrial Research Organisation (CSIRO)
• Interactive Virtual Environments Centre (iVEC) in WA
• Queensland Parallel Supercomputing Foundation (QPSF)
• South Australian Partnership for Advanced Computing (SAPAC)
• The University of Tasmania (TPAC)
• Victorian Partnership for Advanced Computing (VPAC)
• National Facility Program
  – a world-class advanced computing service
  – currently 232 projects and 659 users (27 universities)
  – major upgrade in capability (1650 processor Altix 3700 system)
• APAC Grid Program
  – integrate the National Facility and Partner Facilities
  – allow users easier access to the facilities
  – provide an infrastructure for Australian eResearch
• Education, Outreach and Training Program
  – increase skills to use advanced computing and grid systems
  – courseware project
  – outreach activities
  – national and international activities
APAC Programs
Program structure (diagram): Steering Committees, an Engineering Taskforce and an Implementation Taskforce; Project Leaders and Research Leaders for the research and development activities of APAC Grid Development and APAC Grid Operation. Scale: 140 people, >50 full-time equivalents, $8M pa in people, plus compute/data resources.
Projects
Grid Infrastructure
Computing Infrastructure
• Globus middleware
• certificate authority
• system monitoring and management (grid operation centre)
Information Infrastructure
• resource broker (SRB)
• metadata management support (Intellectual Property control)
• resource discovery
User Interfaces and Visualisation Infrastructure
• portals to application software
• workflow engines
• visualisation tools
(a minimal sketch of using this infrastructure follows)
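These infrastructure pieces combine into a simple usage pattern: authenticate with a certificate issued by the project CA, then drive the Globus middleware to run work on a partner facility. A minimal sketch, assuming the GT2 command-line clients are installed; the gatekeeper name is invented:

```python
# Minimal sketch, not APAC's actual scripts: create a proxy credential from a
# CA-issued certificate, then run a trivial job through a GT2 GRAM gatekeeper.
import subprocess

GATEKEEPER = "ng1.example.edu.au/jobmanager-pbs"   # hypothetical site gateway/jobmanager

# 1. Short-lived proxy from the user's certificate (prompts for a passphrase).
subprocess.run(["grid-proxy-init", "-valid", "12:00"], check=True)

# 2. Trivial remote job via the gatekeeper: print the execution host's name.
subprocess.run(["globus-job-run", GATEKEEPER, "/bin/hostname"], check=True)
```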
Grid Applications
Astronomy
High-Energy Physics
Bioinformatics
Computational Chemistry
Geosciences
Earth Systems Science
Organisation Chart
Program Manager Rhys Francis
Project – Project Leader – Steering Committee Chair
• Gravity Wave (Astronomy) – Susan Scott – Rachael Webster
• Astrophysics portal – Matthew Bailes – Rachael Webster
• Australian Virtual Observatory – Katherine Manson – Rachael Webster
• Genome annotation – Matthew Bellgard – Mark Ragan
• Molecular docking – Rajkumar Buyya – Mark Ragan
• Chemistry workflow – Andrey Bliznyuk – Brian Yates
• Earth Systems Science workflow – Glenn Hyland – Andy Pitman
• Geosciences workflow – Robert Woodcock – Scott McTaggart
• EarthBytes – Dietmar Muller – Scott McTaggart
• Experimental high energy physics – Glenn Moloney – Tony Williams
• Theoretical high energy physics – Paul Coddington – Tony Williams
• Remote instrument management – Chris Willing – Bernard Pailthorpe
Project – Leader – Services – Gateway VM
• Compute Infrastructure – David Bannon – CA, VOMS/VOMRS, Gram2/4 – NG1, NG2
• Information Infrastructure – Ben Evans – SRB, GridFTP, MDS2/4 – NGdata
• User Interfaces & Visualisation Infrastructure – Rajesh Chabbra – Gridsphere, MyProxy – NGportal
• Collaboration Services – Chris Willing – Access Grid (A/G)
APAC Executive Director John O’Callaghan
Name – Partner:
• Youzhen Cheng – ac3
• David Baldwyn – ANU
• Bob Smart – CSIRO
• Darran Carey – iVEC
• Martin Nicholls – QPSF/UQ
• Grant Ward – SAPAC
• John Dalton – TPAC
• Chris Samuel – VPAC
Associated grid nodes:
• David Green – QPSF/Griffith
• Ian Atkinson – QPSF/JCU
• Ashley Wright – QPSF/QUT
• Marco La Rosa – UoM
Gateway Servers Support Team: David Bannon
Services Architect: Markus Buchhorn
LCG VM: Marco La Rosa
Infrastructure Support (Middleware)
Application Support
Infrastructure Support (Systems)
Strategic Management
Middleware Deployment
Research Applications
Systems Management
Experimental High Energy Physics
• Belle Physics Collaboration
  – KEK B-factory detector, Tsukuba, Japan
  – matter/anti-matter investigations
  – 45 institutions, 400 users worldwide
  – 10 TB of data currently
  – Australian grid for KEK-B data
    • testbed demonstrations
    • data grid centred on APAC National Facility
• ATLAS Experiment
  – Large Hadron Collider (LHC) at CERN
    • 3.5 PB data per year (now 15 PB pa)
    • operational in 2007
  – installing LCG (GridPP), will follow EGEE
Belle Experiment
• Simulated collisions or events
  – used to predict what we’ll see (features of data)
  – essential to support design of systems
  – essential for analysis
• 2 million lines of code
Belle simulations
• Computationally intensive
  – simulate beam particle collisions, interactions, decays
  – all components and materials: 10 x 10 x 20 m at 100 µm accuracy
  – tracking and energy deposition through all components
  – all electronics effects (signal shapes, thresholds, noise, cross-talk)
  – data acquisition system (DAQ)
• Need 3 times as many simulated events as real events to reduce statistical fluctuations (see the short calculation below)
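Why roughly three times: when real and simulated samples are compared, their Poisson uncertainties add in quadrature, so generating k times more simulated events than data pushes the combined relative error toward the data-only limit. A small illustrative calculation (the event count is made up):

```python
# Illustrative only: combined relative statistical error when N_data real events
# are compared against N_mc = k * N_data simulated events.
from math import sqrt

n_data = 1_000_000                                       # hypothetical number of real events
for k in (1, 3, 10):
    rel_err = sqrt(1.0 / n_data + 1.0 / (k * n_data))    # quadrature sum of Poisson errors
    print(f"k = {k:2d}: relative error = {rel_err:.2e}")
# With k = 3 the simulation adds only ~15% to the data-only error of 1e-3.
```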
Belle status
• Apparatus at KEK in Japan
• Simulation work done worldwide
• Shared using an SRB federation: KEK, ANU, VPAC, Korea, Taiwan, Krakow, Beijing… (led by Australia!)
• Previous research work used script-based workflow control; the project is currently evaluating LCG middleware for workflow management
• Testing in progress: LCG job management, APAC grid job execution (2 sites), APAC grid SRB data management (2 sites), with data flow using international SRB federations (a minimal SRB sketch follows)
• Limitation is international networking
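The SRB federation above is driven with the standard SRB "Scommands". A minimal sketch, assuming the SRB client tools are installed and a ~/.srb configuration points at a federated zone; the file and collection names are invented:

```python
# Hypothetical sketch: push a simulated-event file into a federated SRB collection
# using the standard Scommands (assumed to be on PATH and configured via ~/.srb).
import subprocess

COLLECTION = "/apac/home/belle.sim/mc-prod"        # invented collection name

subprocess.run(["Sinit"], check=True)                                    # open an SRB session
subprocess.run(["Sput", "evtgen_run042.mdst", COLLECTION], check=True)   # upload one file
subprocess.run(["Sls", COLLECTION], check=True)                          # list the collection
subprocess.run(["Sexit"], check=True)                                    # close the session
```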
Earth Systems Science Workflow
• Access to Data Products
  – Intergovernmental Panel on Climate Change scenarios of future climate (3 TB)
  – Ocean Colour Products of the Australasian and Antarctic region (10 TB)
  – 1/8 degree ocean simulations (4 TB)
  – weather research products (4 TB)
  – Earth Systems Simulations
  – Terrestrial Land Surface Data
• Grid Services (a minimal data-access sketch follows)
  – Globus-based version of OPeNDAP (UCAR/NCAR/URI)
  – server-side analysis tools for data sets: GrADS, NOMADS
  – client-side visualisation from on-line servers
  – THREDDS (catalogues of OPeNDAP repositories)
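The OPeNDAP services above can be read directly from client-side tools without downloading whole files. A sketch assuming the Python pydap package; the dataset URL and variable name are invented:

```python
# Hypothetical sketch: subset a remote dataset over OPeNDAP; only the requested
# slice is transferred. URL and variable name are invented for illustration.
from pydap.client import open_url

url = "http://opendap.example.edu.au/thredds/dodsC/ocean/sst_monthly.nc"
dataset = open_url(url)            # opens the remote dataset lazily (metadata only)
sst = dataset["sst"]               # a server-side variable
print(sst.shape)                   # full remote shape, still no bulk data transfer
subset = sst[0, 0:10, 0:10]        # this slice is fetched from the server
```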
Workflow Vision
Diagram: a Gridsphere portal (Discovery, Visualisation, Get Data and Analysis Toolkit portlets) sits over web services (Web Map Service, Web Processing Service, Web Coverage Service, OAI, a Java library API), an application layer (Live Access Server, OPeNDAP server, processing applications, digital library, metadata crawler, job/data management, analysis toolkit), a data layer (digital repository, config metadata) and a hardware layer of compute engines at the APAC NF, VPAC, ac3, SAPAC and iVEC.
Workflow Components
• APAC NF (Canberra): international IPCC model results (10-50 TB), TPAC 1/8 degree ocean simulations (7 TB)
• Met Bureau Research Centre (Melbourne): near real-time LAPS analysis products (<1 GB), sea- and sub-surface temperature products
• TPAC & ACE CRC (Hobart): NCEP2 (150 GB), WOCE3 Global (90 GB), Antarctic AWS (150 GB), climate modelling (4 GB), sea-ice simulations 1980-2000
• CSIRO Marine Research (Hobart): ocean colour products & climatologies (1 TB), satellite altimetry data (<1 GB), sea-surface temperature product
• CSIRO HPSC (Melbourne): IPCC CSIRO Mk3 model results (6 TB)
• AC3 Facility (Sydney): land surface datasets
All exposed through OPeNDAP services (a small catalogue-crawling sketch follows).
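The catalogue side of this (THREDDS) can be crawled with nothing more than an XML parser, which is essentially what the crawler in the workflow vision does. A rough sketch using only the standard library; the catalogue URL is invented:

```python
# Hypothetical sketch: list dataset entries advertised by a THREDDS catalogue.
# The catalogue URL is invented; only standard THREDDS InvCatalog elements are assumed.
import urllib.request
import xml.etree.ElementTree as ET

CATALOG_URL = "http://opendap.example.edu.au/thredds/catalog.xml"        # invented
THREDDS_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

with urllib.request.urlopen(CATALOG_URL) as resp:
    root = ET.fromstring(resp.read())

# <dataset> elements with a urlPath are typically accessible via OPeNDAP.
for ds in root.iter("{%s}dataset" % THREDDS_NS):
    if ds.get("urlPath"):
        print(ds.get("name"), "->", ds.get("urlPath"))
```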
Diagram: a user query goes to a registry and to SIA/SSA (Simple Image Access / Simple Spectral Access) services, each returning a list of matches; the selected data are then retrieved through SRB (MCAT) get() calls and OPeNDAP services.
Australian Virtual Observatory
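The query / list-of-matches step in the diagram corresponds to an IVOA Simple Image Access call: an HTTP GET carrying a position and size, answered with a VOTable of matching images. A rough sketch; the service endpoint is invented:

```python
# Hypothetical sketch of a Simple Image Access (SIA) query. The endpoint is invented;
# POS and SIZE are the standard SIA query parameters, the response is a VOTable.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

SIA_SERVICE = "http://avo.example.edu.au/sia/query?"       # invented endpoint
params = urllib.parse.urlencode({"POS": "187.25,2.05", "SIZE": "0.1"})

with urllib.request.urlopen(SIA_SERVICE + params) as resp:
    votable = ET.fromstring(resp.read())

# Each <TR> row in the VOTable is one match: the "list of matches" in the diagram.
rows = [el for el in votable.iter() if el.tag.endswith("TR")]
print(f"{len(rows)} matching images")
```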
APAC Grid Geoscience
• Conceptual models
• Databases
• Modeling codes
• Mesh generators
• Visualization packages
• People
• High Performance Computers
• Mass Storage Facilities
Diagram: the Earth system components modelled, from the core, deep and upper mantle, oceanic and subcontinental lithosphere, oceanic crust, lower and upper crust and sediments, to the oceans, biosphere, atmosphere and weathering processes.
Mantle Convection
• Observational Databases
  – access via SEE Grid Information Services standards
• EarthBytes 4D Data Portal
  – allows users to track observations through geological time and use them as model boundary conditions and/or to validate process simulations
• Mantle Convection
  – solved via Snark on HPC resources
• Modeling Archive
  – stores the problem descriptions so they can be mined and audited (a sketch of such a record follows)
Trial application provided by:
• D. Müller (Univ. of Sydney)
• L. Moresi (Monash Univ./MC2/VPAC)
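One way to picture a Modeling Archive entry is a small self-describing record stored alongside each run so it can later be mined and audited. The field names below are invented for illustration; they are not the archive's actual schema:

```python
# Hypothetical problem-description record for the Modeling Archive.
# All field names and values are illustrative, not the real schema.
import datetime
import json

problem_description = {
    "project": "mantle-convection",
    "solver": "Snark",                                              # code named on the slide
    "submitted_by": "example.user",                                 # invented
    "submitted_at": datetime.datetime.utcnow().isoformat() + "Z",
    "boundary_conditions": "EarthBytes reconstruction, 0-140 Ma",   # illustrative
    "mesh": {"elements": 2_000_000, "resolution_km": 25},           # illustrative
    "hpc_resource": "APAC National Facility",
}

with open("run_0042_problem.json", "w") as fh:
    json.dump(problem_description, fh, indent=2)                    # archived beside the results
```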
Workflows and services
Diagram: the user logs in, edits a problem description, runs a simulation through the EarthBytes and Snark services under an AAA job management service, and monitors the job as status updates flow back. The workflow draws on a resource registry, service registry, data management service, HPC repository, local repository and results archive, with archive search across geology and rock-property databases for SA, WA and NSW.
APAC National Grid
Key steps
• Implementation of our own CA
• Adoption of VDT middleware packaging
• Agreement to a GT2 base for 2005, GT4 in 2006
• Agreement on portal implementation technology
• Adoption of federated SRB as base for shared data
• Development of gateways for site grid architecture
• Support for inclusion of ‘associated’ systems
• Implementation of VOMS/VOMRS (a proxy sketch follows this list)
• Development of user and provider policies
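The VOMS step extends the plain certificate proxy with virtual-organisation attributes, and portals typically retrieve delegated credentials from a MyProxy server. A minimal sketch, assuming the client tools are installed; the VO name, MyProxy host and username are invented:

```python
# Hypothetical sketch: VOMS-extended proxy plus a delegated credential on MyProxy.
# The VO name, MyProxy host and username are invented for illustration.
import subprocess

VO = "apacgrid"                                  # invented virtual organisation name
MYPROXY_HOST = "myproxy.example.edu.au"          # invented MyProxy server

# Proxy carrying VO membership attributes, which gateway services can authorise against.
subprocess.run(["voms-proxy-init", "-voms", VO], check=True)

# Store a delegated credential so a portal can retrieve it on the user's behalf.
subprocess.run(["myproxy-init", "-s", MYPROXY_HOST, "-l", "example.user"], check=True)
```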
VDT components
VDT package set built around Globus Toolkit 2:
• DOE and LCG CA Certificates v4 (includes LCG 0.25 CAs)
• GriPhyN Virtual Data System (containing Chimera and Pegasus) 1.2.14
• Condor/Condor-G 6.6.7, VDT Condor configuration script
• Fault Tolerant Shell (ftsh) 2.0.5
• Globus Toolkit 2.4.3 + patches, VDT Globus configuration script
• GLUE Schema 1.1, extended version 1
• GLUE Information Providers CVS version 1.79, 4-April-2004
• EDG Make Gridmap 2.1.0
• EDG CRL Update 1.2.5
• GSI-Enabled OpenSSH 3.4
• Java SDK 1.4.2_06
• KX509 20031111
• MonALISA 1.2.12
• MyProxy 1.11
• PyGlobus 1.0.6
• UberFTP 1.3
• RLS 2.1.5
• ClassAds 0.9.7
• NetLogger 2.2
VDT package set built around Globus Toolkit 4:
• Apache HTTPD v2.0.54
• Apache Tomcat v4.1.31 and v5.0.28
• Clarens v0.7.2
• ClassAds v0.9.7
• Condor/Condor-G v6.7.12, VDT Condor configuration script
• DOE and LCG CA Certificates v4 (includes LCG 0.25 CAs)
• DRM v1.2.9
• EDG CRL Update v1.2.5
• EDG Make Gridmap v2.1.0
• Fault Tolerant Shell (ftsh) v2.0.12
• Generic Information Provider v1.2 (2004-05-18)
• gLite CE Monitor v1.0.2
• Globus Toolkit, pre web-services, v4.0.1 + patches
• Globus Toolkit, web-services, v4.0.1
• GLUE Schema v1.2 draft 7
• Grid User Management System (GUMS) v1.1.0
• GSI-Enabled OpenSSH v3.5
• Java SDK v1.4.2_08
• jClarens v0.6.0
• jClarens Web Service Registry v0.6.0
• JobMon v0.2
• KX509 v20031111
• MonALISA v1.2.46
• MyProxy v2.2
• MySQL v4.0.25
• Nest v0.9.7-pre1
• NetLogger v3.2.4
• PPDG Cert Scripts v1.6
• PRIMA Authorization Module v0.3
• PyGlobus vgt4.0.1-1.13
• RLS v3.0.041021
• SRM Tester v1.0
• UberFTP v1.15
• Virtual Data System v1.4.1
• VOMS v1.6.7
• VOMS Admin (client 1.0.7, interface 1.0.2, server 1.1.2) v1.1.0-r0
Our most important design decision
Diagram: each site places a Gateway Server in front of its clusters and datastores, and the gateway servers are linked to one another over a private V-LAN.
• Installing Gateway Servers at all grid sites, using VM technology to support multiple grid stacks
• Gateways will support GT2, GT4, LCG/EGEE, data grid (SRB etc.), production portals, development portals and experimental grid stacks
• High bandwidth, dedicated private networking between grid sites
Gateway Systems
• Support the basic operation of the APAC National Grid and translate grid protocols into site-specific actions
  – limit the number of systems that need grid components installed and managed
  – enhance security, as many grid protocols and associated ports only need to be open between the gateways
  – in many cases only the local gateways need to interact with site systems
  – support roll-out and control of the production grid configuration
  – support production and development grids and local experimentation using virtual machine implementation
Grid pulse – every 30 minutes each gateway reports its status to the grid operations centre (in the snapshot shown, 14 gateways were up and 5 down); a toy polling sketch follows below. Gateway VMs by service:
• NG1 – Globus Toolkit 2 services: ANU, iVEC, VPAC
• NG2 – Globus Toolkit 4 services: iVEC, SAPAC (down), VPAC
• NGDATA – SRB & GridFTP: ANU, iVEC, VPAC (down)
• NGLCG – special physics stack: VPAC
• NGPORTAL – Apache/Tomcat: iVEC, VPAC
http://goc.vpac.org/
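At its core the grid pulse is a periodic reachability check on every gateway. A toy sketch of that idea using only the standard library; the hostnames and probed port are invented, and the real GOC checks are richer than a TCP probe:

```python
# Toy "grid pulse": every 30 minutes, probe each gateway and report Up/Down.
# Hostnames and the port are invented; the real monitoring does far more than this.
import socket
import time

GATEWAYS = ["ng1.example.edu.au", "ng2.example.edu.au", "ngdata.example.edu.au"]
PORT = 2119              # classic GT2 gatekeeper port, used here purely as an example

def is_up(host: str, port: int, timeout: float = 5.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

while True:
    for gw in GATEWAYS:
        print("Gateway", "Up" if is_up(gw, PORT) else "Down", gw)
    time.sleep(30 * 60)  # pulse interval: 30 minutes
```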
A National Grid
Map: the GrangeNet backbone, CeNTIE/GrangeNet links and AARNet links connect the grid sites: Townsville (QPSF), Brisbane, Canberra (ANU), Sydney (ac3), Melbourne (VPAC, CSIRO), Adelaide (SAPAC), Perth (iVEC, CSIRO) and Hobart (TPAC, CSIRO).
+3500 processors
+3PB near line storage
Mass stores (15 TB cache, 200+ TB holdings, 3 PB capacity)
• ANU 5+1300 TB; CSIRO 5+1300 TB; plus several 70-100 TB stores
Compute systems (aggregate 3500+ processors)
• Altix: 1,680 × 1.6 GHz Itanium-II, 3.6 TB memory, 120 TB disk
• NEC: 168 SX-6 vector CPUs, 1.8 TB memory, 22 TB disk
• IBM: 160 Power5 CPUs, 432 GB memory
• 2 × Altix: 160 × 1.6 GHz Itanium-II, 160 GB memory
• 2 × Altix: 64 × 1.5 GHz Itanium-II, 120 GB memory, NUMA
• Altix: 128 × 1.3 GHz Itanium-II, 180 GB memory, 5 TB disk, NUMA
• 374 × 3.06 GHz Xeon, 374 GB memory, Gigabit Ethernet
• 258 × 2.4 GHz Xeon, 258 GB memory, Myrinet
• 188 × 2.8 GHz Xeon, 160 GB memory, Myrinet
• 168 × 3.2 GHz Xeon, 224 GB memory, GigE, 28 with InfiniBand
• 152 × 2.66 GHz P4, 153 GB memory, 16 TB disk, GigE
Significant Resource Base
Diagram: functional decomposition of the grid, spanning resources and users (data, compute, monitoring, constraints, activities, interfaces):
• grid staging and execution; workflow processing (job execution); global resource allocation and scheduling; queues; progress monitoring
• authentication (identity management); authorisation (policy & enforcement); VO management (rights, shares, delegations)
• resource discovery; resource registration; resource availability; configuration management
• data movement; data and metadata management (curation); files, DBs, streams; binaries, libraries, licenses
• access services: portals and workflow; command line access to resources; 3rd-party GUIs for applications and activities; AccessGrid interaction; portal for grid management (GOC)
• accounting; history and auditing; reporting, analysis and summarisation; application development; grid interfaces
• layered over operating systems and hardware; firewalls, NATs and physical networks
• security: agreements, obligations, standards, installation, configuration, verification
Functional decomposition
Diagram (repeated): the same functional decomposition, with the functions grouped into six numbered areas.
APAC National Grid
• Basic Services (a resource-discovery sketch follows)
  – single ‘sign-on’ to the facilities
  – portals to the computing and data systems
  – access to software on the most appropriate system
  – resource discovery and monitoring
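Resource discovery in the GT2-era stack is an LDAP query against the MDS information service. A minimal sketch assuming the Globus client tools are installed; the index-server hostname and VO name are invented:

```python
# Hypothetical sketch: query the MDS (GT2 information service) for everything it
# publishes about grid resources. The GIIS hostname and VO name are invented.
import subprocess

GIIS_HOST = "mds.example.edu.au"            # invented index server
BASE_DN = "mds-vo-name=apacgrid, o=grid"    # invented VO name in the usual MDS base DN

subprocess.run(
    ["grid-info-search", "-h", GIIS_HOST, "-p", "2135", "-b", BASE_DN, "(objectclass=*)"],
    check=True,
)
```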
Diagram: the partner facilities (VPAC, QPSF, TPAC, iVEC, ANU, CSIRO, SAPAC, ac3) and the APAC National Facility presented as one virtual system of computational facilities.