INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
GLOBAL GRID USER SUPPORT: THE MODEL AND EXPERIENCE IN LCG/EGEE
Gilles Mathieu(1), Torsten Antoni(2), Flavia Donno(3), Helmut Dres(2), Alistair Mills(3), Philippa Strange(4), David Spence(4), Min Tsai(5), Marco Verlato(6)
(1) IN2P3/CNRS Computing Centre, Lyon – France
(2) Forschungszentrum Karlsruhe, Karlsruhe – Germany
(3) CERN, European Organization for Nuclear Research, Geneva – Switzerland
(4) Rutherford Appleton Laboratory, CCLRC, Chilton – England
(5) Academia Sinica, Taipei – Taiwan
(6) Istituto Nazionale Fisica Nucleare (INFN), Padova – Italy
G. Mathieu et al., CHEP06 Mumbai 13-17 February 2006 2
Outline
• Motivation
• GGUS: history and general description/architecture
• Services offered
• GGUS supporters: first and second line support
• Interface to the Regional Operations Centres (ROCs)
• GGUS for Grid Operations: CIC-on-duty
• Some performance numbers and statistics
• Conclusions and future work
Motivation
• A single access point for support
• A portal with well-structured information and updated documentation
• Knowledgeable experts
• Correct, complete and responsive support
• Tools to help resolve problems
– search engines
– monitoring applications
– resources status
• Examples, templates, specific distributions for software of interest
• Interface with other Grid support systems
• Connection with developers, deployment, operation teams
• Assistance during production use of the grid infrastructure
The Support Model
“Regional Support with Central Coordination”
The ROCs, the VOs and the other project-wide groups, such as the Core Infrastructure Center (CIC), middleware groups (JRA), network groups (NA) and service groups (SA), are connected via a central integration platform provided by GGUS.
[Diagram: Central Application (GGUS) with Web portal and Interface, connected to the TPM and to three kinds of support units. Regional Support units: ROC 1 … ROC 10. User Support units: VO Support. Technical Support units: Deployment Support, Middleware Support, Network Support, Operations Support.]
The GGUS System
GGUS Portal: user services
• Browseable tickets
• Search through solved tickets
• Useful links (Wiki FAQ)
• Broadcast tools
• Latest News
• GGUS Search Engine
• Updated documentation (Wiki FAQ)
GGUS Portal: Search engine
GGUS Search Engine: ongoing work to make it faster and to search through a wider set of docs and DBs
GGUS Portal: … and more
• Hotline with supporters available to respond to problems
• VO specific supporting teams for middleware integration problems
• Links to specific useful tools and middleware
• Examples and templates
• Links to dedicated Virtual Organization portals that gather resources and monitoring information, contact mailing lists, etc.
• Modular system to accommodate further needs
GGUS has been shown to meet user needs
GGUS Supporters
• User
• First line support
– TPM: Grid experts
– VO-TPM: VO experts
• Second line support
– VO Support Units
– Middleware Support Units
– Deployment Support Units
– Operations Support
– ROC Support Units
– Network Support
Operations support
• Purpose/role
– Detect problems by monitoring the grid
– Report them by creating and assigning GGUS tickets
– Provide help and follow-up on problems
• Operations Support teams: “CIC on duty”
– Currently 6 teams (CERN, France, Italy, UK, Russia, Taiwan)
– Weekly shift
• CIC/GGUS interface
– Based on Web services at GGUS side
– “CIC-on-duty dashboard”: graphical user interface for operators, hosted at IN2P3 Computing Centre (Lyon, France)
CIC/ROC Basic Workflow
[Diagram: the operator on duty works through the CIC-on-duty dashboard of the CIC portal (IN2P3-CC, Lyon, France). The dashboard exchanges tickets over SOAP, described by WSDL, with the GGUS interface of the central helpdesk (FZK, Karlsruhe, Germany). Operations shown: Create(), Set(ticket), Get(ticket), Get_all(). Flows shown: problem detection & reporting, ticket follow-up. Tickets reach the regional support units (UK, FR, GER, IT, …).]
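The four operations named in the workflow (Create, Set, Get, Get_all) can be illustrated with a minimal in-memory sketch. This is not the GGUS web service or its WSDL: the class `HelpdeskStub`, the `Ticket` fields and the status values are all hypothetical, chosen only to show how a dashboard might create a ticket, update it and follow it up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Ticket:
    # Field names are assumptions; the slides do not list the real ticket fields.
    ticket_id: int
    description: str
    support_unit: str
    status: str = "new"


class HelpdeskStub:
    """In-memory stand-in for a remote ticket service (not the real GGUS API)."""

    def __init__(self) -> None:
        self._tickets: Dict[int, Ticket] = {}
        self._next_id = 1

    def create(self, description: str, support_unit: str) -> int:
        """Create a ticket, assign it to a support unit, and return its id."""
        ticket = Ticket(self._next_id, description, support_unit, "assigned")
        self._tickets[ticket.ticket_id] = ticket
        self._next_id += 1
        return ticket.ticket_id

    def set(self, ticket_id: int, **changes: str) -> None:
        """Update one or more existing fields of a ticket."""
        ticket = self._tickets[ticket_id]
        for name, value in changes.items():
            if not hasattr(ticket, name):
                raise AttributeError(f"unknown ticket field: {name}")
            setattr(ticket, name, value)

    def get(self, ticket_id: int) -> Ticket:
        """Return a single ticket for follow-up."""
        return self._tickets[ticket_id]

    def get_all(self) -> List[Ticket]:
        """Return every ticket known to the helpdesk."""
        return list(self._tickets.values())


if __name__ == "__main__":
    desk = HelpdeskStub()
    tid = desk.create("Storage element not responding", "ROC_France")
    desk.set(tid, status="solved")
    print(desk.get(tid))
    print(len(desk.get_all()))
```

In a real deployment the method bodies would be replaced by SOAP calls generated from the service's WSDL; the calling pattern on the dashboard side would stay roughly the same.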
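GGUS/ROC Basic Workflow
[Diagram: the GGUS System, fed by the Web Portal, the TPM and Operations support, connects to each ROC through a ROC interface (ROC-1 Interface … ROC-X Interface). Behind each interface sits the ROC helpdesk with its support units (SU-1, SU-2, … SU-N). Flows shown: ticket assignment to ROC-1, ticket solved, ticket re-assigned.]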
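The three flows in this workflow (assignment, re-assignment, solution) amount to a small state machine per ticket. The sketch below is one possible reading, with hypothetical names (`RoutedTicket`, `assign`, `solve`) and an assumed rule that a solved ticket is not re-assigned; the slides do not specify the actual rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RoutedTicket:
    ticket_id: int
    assigned_to: str = "TPM"      # assumption: tickets start with the TPM
    status: str = "open"
    history: List[str] = field(default_factory=list)

    def assign(self, unit: str) -> None:
        """Assign (or re-assign) the ticket, recording the previous holder."""
        if self.status == "solved":
            raise ValueError("a solved ticket cannot be re-assigned")
        self.history.append(self.assigned_to)
        self.assigned_to = unit
        self.status = "assigned"

    def solve(self) -> None:
        """Mark the ticket as solved by its current support unit."""
        self.status = "solved"


if __name__ == "__main__":
    ticket = RoutedTicket(1)
    ticket.assign("ROC-1")   # ticket assignment to ROC-1
    ticket.assign("ROC-X")   # ticket re-assigned
    ticket.solve()           # ticket solved
    print(ticket)
```

Keeping the list of previous holders makes it possible to measure how long each unit held the ticket.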
ROCs Interface: how it works
• First interface between ROC_Italy Helpdesk and GGUS
– ready since November ’04
– in ‘production’ since March ’05
• Based on Web Services at GGUS side, several advantages:
– sample code available for PHP/Perl/Python and other programming languages
– very fast: 600-1000 service requests/sec on the GGUS servers
– easy to adapt
• Based on e-mail at local side (importing tool)
• XML exchange format
• Ticket fields mapping between the two systems
http://infnforge.cnaf.infn.it/eticketimp/
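The last two points, an XML exchange format and a field mapping between the two systems, can be sketched as follows. The XML layout and every field name here are invented for illustration; the slides do not give the real schema or the real mapping used by the ROC_Italy importing tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from central (GGUS-side) field names to local helpdesk
# field names. The real names are not given in the slides.
FIELD_MAP = {
    "ticket_id": "remote_id",
    "short_description": "subject",
    "description": "body",
    "status": "state",
}


def import_ticket(xml_text: str) -> dict:
    """Parse one XML ticket and rename its fields for the local helpdesk."""
    root = ET.fromstring(xml_text)
    local = {}
    for child in root:
        target = FIELD_MAP.get(child.tag)
        if target is not None:  # fields without a mapping are skipped
            local[target] = (child.text or "").strip()
    return local


# Made-up sample ticket, as it might arrive in the body of an e-mail.
SAMPLE = """<ticket>
  <ticket_id>1234</ticket_id>
  <short_description>Job submission fails</short_description>
  <description>Jobs abort on site X.</description>
  <status>assigned</status>
  <internal_note>not mapped</internal_note>
</ticket>"""

if __name__ == "__main__":
    print(import_ticket(SAMPLE))
```

Keeping the mapping in a single table means that a change of field names on either side only touches `FIELD_MAP`, not the parsing code.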
Performance statistics
Average processing times (hh:mm:ss)
• cms tickets, September: 00:00:00 from ticket creation to ticket assignment; 140:01:52 from ticket assignment to ticket solution
• cms tickets, October: 0:18:35 from ticket submission to ticket assignment; 21:03:45 from ticket assignment to ticket solution
• TPM, October: 0:01:34 from ticket creation to ticket assignment; 2:38:37 from ticket assignment to ticket solution
• All ROCs, October: 1:35:16 from ticket creation to ticket assignment; 41:59:13 from ticket assignment to ticket solution
A peak of 80 tickets per day has been reached. The system can handle up to
1400 tickets a week.
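To put these figures side by side, the short sketch below converts the hh:mm:ss values (whose hour field can exceed 24) into durations and compares the reported peak with the stated weekly capacity. The helper name `parse_hms` is ours, and spreading the weekly capacity evenly over seven days is an assumption made only for the comparison.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Parse 'hh:mm:ss' where the hour field may exceed 24."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


WEEKLY_CAPACITY = 1400  # tickets per week, as stated on the slide
PEAK_PER_DAY = 80       # tickets per day, as stated on the slide

# Assumption: the weekly capacity is spread evenly over seven days.
daily_capacity = WEEKLY_CAPACITY / 7
peak_load_fraction = PEAK_PER_DAY / daily_capacity

if __name__ == "__main__":
    september = parse_hms("140:01:52")  # cms, assignment to solution, September
    october = parse_hms("21:03:45")     # cms, assignment to solution, October
    print(september.total_seconds(), october.total_seconds())
    print(daily_capacity, peak_load_fraction)
```

Under that assumption the reported peak of 80 tickets per day corresponds to 40% of a daily capacity of 200 tickets.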
[Bar chart: amount of tickets per support unit, November 2005 (315 tickets in total). Units shown: Castor, Generic Deployment, GlobalGridUserSupport, Network, Operations, ROC_Asia/Pacific, ROC_CE, ROC_CERN, ROC_DE/CH, ROC_France, ROC_Italy, ROC_North, ROC_Russia, ROC_SE, ROC_SW, ROC_UK/Ireland, Security Management, TPM, VOSupport, Workload Management.]
Conclusions
• The GGUS model of “Regional Support with Central Coordination” has been shown to be a workable and scalable solution for user support, given the feedback received from the users.
• The functionality and usability of the GGUS system have improved in the last months, thanks to the help of the ROCs and the experience acquired.
• The GGUS/CIC interface has made the system a central mechanism for Grid operations and monitoring.
• The existing interfaces with the ROCs are quite practical and make the system function as one.
• The ticket traffic is increasing. We still do not know what a realistic figure would be for the number of tickets to be expected. However, the system can be dimensioned appropriately with more TPMs and support units.
• Many metrics have been established to measure the performance of the system.
• GGUS is working on a plan to offer resilience to system and network failures.
• We need more specialized supporters and dedicated manpower in order to guarantee good quality of service.
• Training sessions for supporters are organized by the GGUS teams, and recordings are made available off-line.