SEE-GRID-2 Infrastructure and Operations Overview
Policy workshop on research infrastructures and eScience, Sarajevo, 21 November 2007. Antun Balaz, WP3 Leader, Institute of Physics, Belgrade, [email protected].
The SEE-GRID-2 initiative is co-funded by the European Commission under the FP6 Research Infrastructures contract no. 031775
www.see-grid.eu
SEE-GRID-2
SEE-GRID-2 Infrastructure and Operations Overview
Antun Balaz, WP3 Leader
Institute of Physics, Belgrade
[email protected]
Policy workshop on research infrastructures and eScience, Sarajevo, 21 November 2007
Grid Operations Objectives
Develop the next-generation SEE-GRID infrastructure
    Next generation of EGEE middleware (gLite) and services
Support in deployment and operations of the Resource Centres
    Monitoring, helpdesk, overall upgrade of infrastructure
Network resource provision and assurance in close cooperation with the SEEREN2 project
    Bandwidth-on-Demand requirements
CA and RA guidelines and deployment
    Catch-all Certification Authority (CA)
    Per-country CA deployment and operations
User portal deployment and operations
    P-GRADE
Main Achievements
Infrastructure maintained and expanded
Core services deployed redundantly and maintained with no interruptions in operation
Operations maintained and improved:
    BBmSAM deployed and integrated with HGSM
    Other operational tools developed, deployed and integrated
    SLA conformance; availabilities
    Grid-Operator-On-Duty shifts
    Accounting portal
Development areas identified and significant progress achieved: HGSM, BBmSAM, WiatG, Application-level accounting, YAIM customizations (glite-yaim-seegrid), JAVA Data Management API, software repository (SVN + apt-get/yum), firewall configuration suite, RB/WMS monitoring tool
Network Status
Majority of SEE-GRID countries covered by GEANT2 and SEEREN2; connectivity problems remain for Albania and Moldova
Liaison with SEEREN2 for effective network and services provision
Two applications with BoD requirements have been identified:
    EMMIL (developed by International Business School, Hungary)
    VIVE (developed by the University of Belgrade, Serbia)
SALUTE application actively uses FTS
Several applications use site-level MPI
SEE-GRID Infrastructure (1)
SEE-GRID Infrastructure (2)
The SEE-GRID infrastructure currently contains the following resources:
    34 sites in SEE-GRID production
    6 sites in certification phase (2 AL + 1 HR + 2 RO + 1 MD)
    Over 1150 CPUs available
    Storage: 42 TB + 27 TB in preparation
All sites on gLite-3, with 12 sites on gLite-3.1 and the rest on gLite-3.0
glite-WMSLB actively used (its 3.1 version)
Guides provided for deployment of gLite-3.1 WNs on SL4.5, for both 32-bit and 64-bit architectures
Other 64-bit guides in preparation (SE_dpm will be first)
SEE-GRID Infrastructure (3)
SEE-GRID total and free CPUs from November 2006 (from GStat)
SEE-GRID Infrastructure (4)
SEE-GRID Core services
Catch-all Certification Authority
    enables regional sites to obtain user and host certificates
Virtual Organisation Management Service (VOMS)
    authorization system for the SEE-GRID Virtual Organisation (VO), supporting groups and roles
    deployed two instances (master and slave) for failover
Workload management service (lcg-RB and glite-WMSLB)
    deployed several instances for failover
Information Services (BDII)
    deployed several instances for failover
MyProxy is operational
    supports certificate renewal
FTS deployed
    used in production
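The failover idea behind deploying several instances of a core service can be sketched from the client side. This is only an illustration, not how gLite clients are implemented: the host names, the `query_with_failover` helper and the `fake_query` stand-in are all hypothetical.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def query_with_failover(endpoints: Iterable[str], query: Callable[[str], T]) -> T:
    """Try each endpoint in order and return the first successful result."""
    errors = []
    for endpoint in endpoints:
        try:
            return query(endpoint)
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("all instances failed: " + "; ".join(errors))


# Hypothetical host names, used only for illustration.
BDII_INSTANCES = ["bdii1.example.org", "bdii2.example.org"]


def fake_query(endpoint: str) -> str:
    """Stand-in for a real information-system query; the first instance is 'down'."""
    if endpoint == "bdii1.example.org":
        raise ConnectionError("connection refused")
    return f"answer from {endpoint}"


result = query_with_failover(BDII_INSTANCES, fake_query)
print(result)
```

With the first instance failing, the call falls through to the second one; only when every instance fails does the caller see an error.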
SEE-GRID Operations (1)
SEE-GRID Operations (2)
Distributed Operations
Pilot SLA established
Monitoring and Accounting Tools
Helpdesk tickets procedures
    Generic support group for users
        TPM-like (monitoring open tickets created by users, trying to solve the simple ones, routing the tickets, etc.)
    Country level user support groups
        Step towards stand-alone operations
Grid-Operator-On-Duty shifts improving site availabilities
SEEGRID Wiki with detailed information for site admins: http://wiki.egee-see.org/index.php/SEE-GRID_Wiki
VOMS Role=ops used for SAM jobs submission
Operational & monitoring tools (1)
[Diagram: operational and monitoring tools. Labels: HGSM, Helpdesk, BDII, R-GMA, SAM, GStat (Taiwan), VOMS, RTM (UK), Google Maps, BBmSAM, GridICE, MonALISA, Nagios, WiatG, Accounting]
Operational & monitoring tools (2)
Operational & monitoring tools deployment status
    Hierarchical Grid Site Management (HGSM) - Turkey
    Service Availability Monitoring (SAM) (+ porting to MySQL) - Bosnia and Herzegovina with CERN support
    Helpdesk - Romania
    BBmSAM - Bosnia and Herzegovina
    GridICE - FYR of Macedonia
    SEE-GRID GoogleEarth - Turkey + Gidon Moont (ic.ac.uk)
    SEE-GRID GoogleMaps - Turkey
    Global Grid Information Monitoring System (GStat) - Min-Hong Tsai (ASGC, Taiwan)
    Relational Grid Monitoring Architecture (R-GMA) - Bulgaria
    Nagios - Bulgaria
    Real Time Monitor (RTM) - Gidon Moont (ic.ac.uk) and Turkey (HGSM)
    MONitoring Agents using a Large Integrated Services Architecture (MonALISA) - Romania
    What is at the Grid (WiatG) - CERN with support from Serbia
    Accounting Portal - IPP
    Pakiti - AUTH
BBmSAM & WiatG
BBmSAM portal
    Created for SLA monitoring
        Generates site availability statistics according to several criteria
        Overview (HTML, XLS) and full dump (CSV) of data possible
    Extended into full SAM portal
        Availability for last 24h period for all sites/services
        Latest results per service
        History for nodes/services
WiatG
    Web application for visualization of BDII information
        http://bdii.phy.bg.ac.yu/WiatG/pl/WiatG.pl
    Used as an operational tool for site monitoring
    Current version searches for: CE, gCE, RB, gRB, SE, LFC, FTS and GridICE
    Documentation available: http://wiki.egee-see.org/index.php/WiatG
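A figure such as "availability for the last 24h period" can be read as the share of monitoring tests that passed inside a time window. The sketch below assumes exactly that simplified definition; the `ProbeResult` record, the site names and the sample data are hypothetical and do not reflect the BBmSAM data model or its actual availability criteria.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class ProbeResult(NamedTuple):
    """One monitoring test outcome (hypothetical record layout)."""
    site: str
    timestamp: datetime
    ok: bool


def availability(results: List[ProbeResult], site: str, now: datetime,
                 window: timedelta = timedelta(hours=24)) -> Optional[float]:
    """Fraction of passed tests for `site` within the window ending at `now`."""
    recent = [r for r in results
              if r.site == site and now - window <= r.timestamp <= now]
    if not recent:
        return None  # no data is not the same as 0% availability
    return sum(r.ok for r in recent) / len(recent)


# Hypothetical sample data.
now = datetime(2007, 11, 21, 12, 0)
results = [
    ProbeResult("SITE-A", now - timedelta(hours=1), True),
    ProbeResult("SITE-A", now - timedelta(hours=7), False),
    ProbeResult("SITE-A", now - timedelta(hours=13), True),
    ProbeResult("SITE-A", now - timedelta(hours=19), True),
    ProbeResult("SITE-A", now - timedelta(hours=30), False),  # outside the window
    ProbeResult("SITE-B", now - timedelta(hours=2), False),
]

site_a = availability(results, "SITE-A", now)
```

Returning `None` for a site without any results keeps "no measurements" distinguishable from "always failing", which matters when the numbers feed SLA statistics.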
Accounting Portal (1)
Provides full accounting data
    Per site/institution
    Per country
    Per VO
Provides full statistics for usage
    Per institution
    Per application (in progress)
Provides job statistics (success rates etc.)
Accounting portal is based on SEE-GRID R-GMA data
Publishing of site accounting data to R-GMA done by the deployed Java publisher, developed by IPP
Accounting Portal (2)
Accounting views for SEEGRID – per country/institution user accounting
https://gserv1.ipp.acad.bg:8443/Accounting-2
SEE-GRID Accounting Data
[Chart: base CPU time (hours) per month, Oct 06 to Oct 07; vertical axis from 0 to 200,000]
Over 160 CPU-years provided to SEE-GRID user communities
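To relate the CPU-years figure to the CPU-hours unit used on the chart, a simple conversion is enough. The sketch assumes a 365-day year and plain wall-clock CPU hours; how the accounting portal normalises "base CPU time" is not stated in the slides.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def cpu_hours_to_cpu_years(cpu_hours: float) -> float:
    """Convert accumulated CPU time in hours to CPU-years."""
    return cpu_hours / HOURS_PER_YEAR


def cpu_years_to_cpu_hours(cpu_years: float) -> float:
    """Convert CPU-years to CPU-hours."""
    return cpu_years * HOURS_PER_YEAR


print(cpu_years_to_cpu_hours(160))  # 1401600
```

Under these assumptions, 160 CPU-years corresponds to roughly 1.4 million CPU-hours.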
HGSM database
SEE-GRID GOCDB
    Introduced as a lightweight version of GOCDB
    Allows us to easily change its format when necessary and to adapt it to regional needs
    Allows us to provide custom exports on demand, depending on operational tools/application developers
Contains static information about all sites
Developed and maintained by TUBITAK-ULAKBIM, Turkey
    https://hgsm.grid.org.tr/
Used by EUMedGRID; other regional projects have expressed interest
HGSM Developments
HGSM Development Roadmap document
Implemented improvements:
    Universal Exports System
        Exports site XML data
    Site Information XML Import System (SIXIS)
        Importer parses site information, nodes, contacts, downtimes and administrators
    sBDII Pull-Insert System
        Data available in the information system can be inserted into HGSM
In progress:
    Field Verification and Convenience Add-Ons
    Revision of Fields in HGSM Web Interface
    Site Snapshots and Exports
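As a rough illustration of what an XML importer for site information does, the sketch below parses a small document into plain dictionaries. The XML layout, element names and attribute names are entirely hypothetical; the actual HGSM/SIXIS export format is not described in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format, invented for this example only.
SAMPLE_XML = """\
<sites>
  <site name="EXAMPLE-SITE" country="XX">
    <node hostname="ce.example.org" type="CE"/>
    <node hostname="se.example.org" type="SE"/>
    <contact role="admin" email="admin@example.org"/>
    <downtime start="2007-11-01T08:00" end="2007-11-01T12:00"/>
  </site>
</sites>
"""


def parse_sites(xml_text: str) -> list:
    """Parse the hypothetical export into plain dictionaries, one per site."""
    root = ET.fromstring(xml_text)
    sites = []
    for site in root.findall("site"):
        sites.append({
            "name": site.get("name"),
            "country": site.get("country"),
            "nodes": [(n.get("hostname"), n.get("type")) for n in site.findall("node")],
            "contacts": [(c.get("role"), c.get("email")) for c in site.findall("contact")],
            "downtimes": [(d.get("start"), d.get("end")) for d in site.findall("downtime")],
        })
    return sites


sites = parse_sites(SAMPLE_XML)
print(sites[0]["name"], sites[0]["nodes"])
```

Keeping the parsed result as plain dictionaries makes it easy to feed the same data to different operational tools, which is the role the custom exports play in the text above.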
SEE-GRID-2 SLA
Hardware and connectivity criteria
    Min. amount of resources for sites to participate in the infrastructure
    Network to fulfill operations test requirements
Level of support
    Site and security administrators availability and response time
Level of expertise
    Site and security administrators declaration of expertise
VO support
    Site to provide support to SEEGRID VO and its OPS role
Conformance to Operational Metrics
    Site availability
    Downtimes
SEE-GRID-2 SLA communicated to EGEE
Conformance to SEE-GRID-2 SLA
Improvements seen after four quarters of pilot SLA enforcement
[Chart: SLA Conformance (CE Availability). Categories: Over 90%, 50% to 90%, Less than 50%. Periods: Dec 06 - Jan 07, Feb 07 - Apr 07, May 07 - Jul 07, Aug 07 - Oct 07. Vertical axis: 0% to 50%]
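The three categories on the conformance chart can be reproduced by binning per-site CE availability. The sketch below assumes availabilities expressed as fractions between 0 and 1 and assigns a value of exactly 90% to the middle category; both the boundary handling and the sample numbers are assumptions, not data from the slides.

```python
from collections import Counter

BUCKETS = ("Over 90%", "50% to 90%", "Less than 50%")


def bucket(availability: float) -> str:
    """Map a CE availability in [0, 1] to one of the three chart categories."""
    if availability > 0.90:
        return "Over 90%"
    if availability >= 0.50:
        return "50% to 90%"
    return "Less than 50%"


def conformance_shares(availabilities: dict) -> dict:
    """Share of sites in each category, as fractions that sum to 1."""
    if not availabilities:
        raise ValueError("no sites to classify")
    counts = Counter(bucket(a) for a in availabilities.values())
    total = len(availabilities)
    return {name: counts[name] / total for name in BUCKETS}


# Hypothetical per-site CE availabilities for one period.
quarter = {"SITE-A": 0.97, "SITE-B": 0.82, "SITE-C": 0.45, "SITE-D": 0.93}
shares = conformance_shares(quarter)
print(shares)
```

Computing the shares once per reporting period gives one group of bars per period, which is the comparison the chart makes across the four periods.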
Contribution Areas
HGSM
Application-level accounting tool
YAIM customizations (glite-yaim-seegrid)
SAM porting to MySQL (BBmSAM)
WiatG
New tool “What should be at the Grid” (WsbatG)
    Based on the site configuration exported from HGSM, should provide the expected status of BDII
JAVA Data Management API
Firewall configuration development
Contributions to standards (e.g. Glue Schema)
    Mainly providing feedback
    Coordination with other projects missing
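The idea behind WsbatG, deriving what the information system should show from the site configuration and comparing it with what is actually published, can be sketched as a set comparison. The function name, the data layout and the sample sites are hypothetical; the tool itself is only announced in the slides, not specified.

```python
def compare_services(expected: dict, published: dict) -> dict:
    """For each site, list services missing from or unexpected in the published data."""
    report = {}
    for site in sorted(set(expected) | set(published)):
        want = set(expected.get(site, ()))
        have = set(published.get(site, ()))
        if want != have:
            report[site] = {
                "missing": sorted(want - have),
                "unexpected": sorted(have - want),
            }
    return report


# Hypothetical data: service types per site from a site-configuration export
# and from the information system.
expected = {"SITE-A": ["CE", "SE"], "SITE-B": ["CE", "SE", "LFC"]}
published = {"SITE-A": ["CE", "SE"], "SITE-B": ["CE"], "SITE-C": ["SE"]}

report = compare_services(expected, published)
print(report)
```

Sites whose published services match the expectation are left out of the report, so an empty result means the two sources agree.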
CA Status
CAs accredited in the region in 2007
    Bulgaria (BG.ACAD CA), accredited on March 5, 2007
    Serbia (AEGIS CA), accredited on June 1, 2007
    Romania (ROSA CA), accredited on August 1, 2007
Earlier accredited CAs
    Greece (HellasGrid CA)
    Croatia (SRCE CA)
    Turkey (TRGRID CA)
Grid CA candidates
    Montenegro CA (MREN CA)
        CP/CPS reviewed by GridAUTH (via see-ca-incubation mailing list) on July 10, 2007
    F.Y.R.O.M. CA (MARGI CA)
        Accreditation request on May 4, 2007
        First CP/CPS not yet available
CA Map
[Map legend: Catch-all CA, New CA, Candidate CA, Training CA, RA, Established CA]
Conclusions
Regional Grid infrastructure matures in operations and provides reliable distributed computing and storage resources to RTD communities
Usage continuously grows, user communities widen
User-level services developed and improved
Support available on the regional and national level
NGI model should provide long-term sustainability in terms of human resources, Grid operations, infrastructure maintenance and upgrades