Site Report from KEK, Japan
JP-KEK-CRC-01 and JP-KEK-CRC-02
Go Iwai, KEK/CRC
Grid Operations Workshop – 2007
Kungliga Tekniska högskolan, Stockholm, Sweden
13-15 June 2007
DEPLOYMENT STATUS AT KEK
JP-KEK-CRC-01 and JP-KEK-CRC-02
2007/6/13 Grid Operations Workshop at KTH, Stockholm
Logical Site Overview
(Diagram. Labels: KEK External Network; APAN (Taiwan, Asia-Pacific region); SuperSINET (domestic institutes, U.S.A.); KEK Firewall; KEK Internal Network; Grid LAN, scoped only for Grids; HPSS; Central Computing System (new KEK-CC); JP-KEK-CRC-00, JP-KEK-CRC-01 and JP-KEK-CRC-02; "Production System"; "Not for WLCG / Staff's training / Will shift to PPS".)
Physical Site Overview
(Figure: physical layout of KEK-1 and KEK-2.)
Brief Summary of LCG Deployment

JP-KEK-CRC-01
• Operating since Nov. 2005.
• Registered with the GOC and ready for WLCG.
• Operated by KEK staff.
• Site role:
  – Practice for the production system JP-KEK-CRC-02.
  – Test use among university groups in Japan.
• Resources and components:
  – SL-3.0.5 with gLite-3.0 or later
  – CPU: 14, storage: ~1.5 TB
  – FTS, FTA, RB, MON, BDII, LFC, CE, SE
• Supported VOs:
  – belle, apdg, g4med, ppj, dteam, ops, calice, ilc and ail
JP-KEK-CRC-02
• Operating since early 2006.
• Registered with the GOC and ready for WLCG.
• Site role:
  – More stable services, based on the experience gained with KEK-1.
• Resources and components:
  – SL or SLC with gLite-3.0 or later
  – CPU: 48, storage: ~1 TB (excluding HPSS)
  – Full set of components
• Supported VOs:
  – belle, apdg, g4med, atlasj, ppj, ilc, calice, dteam, ops and ail
Grid-Related Services
• We operate our own Grid CA
  – Started in Feb. 2006 and recognized by LCG.
  – Accredited by the APGrid PMA.
  – http://gridca.kek.jp/
• VO Membership Service
  – Supported VOs:
    • apdg: the VO for the Asia-Pacific Data Grid.
    • belle: the VO for the Belle experiment.
    • atlasj: the VO for the ATLAS experiment in Japan.
    • g4med: the VO for Geant4 medical applications.
    • ppj: the VO for Particle Physics in Japan.
    • ail: the VO for the Associated International Laboratory between Japan and France.
  – http://voms.kek.jp/
• Local Mirror Service
  – SL, SLC, LCG, gLite
  – Updating with apt-get from the CERN or FNAL repositories takes ~30 minutes.
    • ~3 minutes with the KEK repository.
  – http://hepdg.cc.kek.jp/mirror/
• Semi-automatic Installation Service
  – Worker nodes (WNs) can be installed semi-automatically using PXE (Preboot eXecution Environment) and a kickstart configuration file.
  – http://hepdg.cc.kek.jp/install/
• Site Portal
  – http://grid.kek.jp/
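The semi-automatic WN installation combines PXE booting with a kickstart file. The Python sketch below shows what a per-node boot configuration might look like, assuming a PXELINUX-based setup (the slides do not say which PXE boot loader is used). PXELINUX looks for a per-host configuration file named after the client's IPv4 address in uppercase hexadecimal, and the installer reads the kickstart location from the `ks=` kernel argument. The install-server URL, kernel/initrd names, host name, address and per-host kickstart path are all hypothetical placeholders, not the actual layout under http://hepdg.cc.kek.jp/install/.

```python
import ipaddress

# Hypothetical install server and path scheme (not KEK's real layout).
KS_BASE_URL = "http://install.example.org/ks"


def pxelinux_config_name(ip: str) -> str:
    """Return the PXELINUX per-host config file name: the IPv4 address
    written as eight uppercase hexadecimal digits."""
    return f"{int(ipaddress.IPv4Address(ip)):08X}"


def pxelinux_entry(hostname: str, kernel: str = "vmlinuz",
                   initrd: str = "initrd.img") -> str:
    """Build a single-label PXELINUX config that boots the installer and
    points it at a per-host kickstart file via the ks= argument."""
    ks_url = f"{KS_BASE_URL}/{hostname}.cfg"  # hypothetical naming scheme
    return (
        "default ks\n"
        "label ks\n"
        f"  kernel {kernel}\n"
        f"  append initrd={initrd} ks={ks_url}\n"
    )


if __name__ == "__main__":
    # Hypothetical worker node address and name.
    ip, host = "130.87.208.10", "wn001"
    print("pxelinux.cfg/" + pxelinux_config_name(ip))
    print(pxelinux_entry(host))
```

Keeping one small generated file per node means a WN can be reinstalled just by rebooting it into PXE, while the kickstart file carries everything node-specific.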
People on Grid at KEK/CRC
• 7 people in total
• CA
  – T. Sasaki and Y. Iida
• VOMS
  – Y. Watase and G. Iwai
• Site Operation and Security
  – KEK-0
    • G. Iwai
  – KEK-1
    • T. Sasaki, Y. Iida, Y. Watase and G. Iwai
  – KEK-2
    • T. Sasaki, Y. Watase and G. Iwai
• Deployment
  – Y. Watase, Y. Iida and G. Iwai
• Documentation
  – Y. Watase
• Networking
  – S. Suzuki, S. Yashiro and Y. Iida
• Application (SRB, Portal and some Gridified applications)
  – K. Murakami, Y. Iida and G. Iwai
OPERATION STATISTICS
Submitted GGUS Tickets in JFY2006
• Total number of submitted tickets: 28
  – KEK-1: 11
  – KEK-2: 17
Number of Submitted Jobs in JFY2006
(Charts: JP-KEK-CRC-01, JP-KEK-CRC-02)
Normalized CPU Time in JFY2006 (kSI2K*hrs)
(Charts: JP-KEK-CRC-01, JP-KEK-CRC-02)
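The kSI2K*hrs unit scales CPU time by the SPECint2000 rating of the core that consumed it, so that hours on fast and slow CPUs can be compared. A minimal Python sketch of that normalization follows. The job records and per-core ratings are made-up illustrative values, not figures from JP-KEK-CRC-01 or JP-KEK-CRC-02, and the accounting system behind the charts may differ in detail (for example, in how a node's benchmark rating is obtained).

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    cpu_hours: float      # raw CPU time consumed by the job, in hours
    si2k_per_core: float  # SPECint2000 rating of the core it ran on


def normalized_cpu_time(jobs: list[JobRecord]) -> float:
    """Sum CPU time weighted by core power, in kSI2K * hours."""
    return sum(j.cpu_hours * j.si2k_per_core / 1000.0 for j in jobs)


if __name__ == "__main__":
    # Hypothetical records: the same 10 CPU-hours on two classes of core.
    jobs = [JobRecord(10.0, 1500.0), JobRecord(10.0, 1000.0)]
    print(f"{normalized_cpu_time(jobs):.1f} kSI2K*hrs")  # 25.0 kSI2K*hrs
```

The same 10 raw CPU-hours count as 15 kSI2K*hrs on the faster core and 10 on the slower one, which is why normalized totals, rather than raw hours, are used to compare sites.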
VIRTUAL ORGANIZATION
Belle Experiment and Accelerator Science
VO for the Belle Experiment
• The Belle VO is federated among 4 countries, 6 institutes and 9 sites.
  – Japan: Nagoya University and KEK
  – Taiwan: ASGC and NCU
  – Australia: University of Melbourne
  – Poland: CYFRONET
  – Korea University will join soon.
• Started using SRB and LCG.
• Data distribution service using SRB-DSI
  – Belle already has a few PB of data in total, including hundreds of TB of DST and MC.
  – Bulk file registration with Sregister helps: we do not move any of the data.
    • Physically exporting the existing data to LCG would be too difficult.
    • This benefits both native SRB users and LCG users.
• SRB-DSI with LCG is now in operation.
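The point of bulk registration is that the catalogue learns where files already sit instead of receiving copies of them. The Python sketch below only illustrates that idea: it walks a local data directory and pairs each physical path with a logical name under a collection. The directory and collection names are hypothetical, and the sketch does not invoke Sregister, whose command-line syntax is not given in the slides; passing the pairs to the real registration tool is left out.

```python
import os
from pathlib import Path


def registration_manifest(data_root: str, collection: str) -> list[tuple[str, str]]:
    """Map every file under data_root to a logical name under collection.

    Files are only listed, never moved or copied: registering in place
    records where each file already lives.
    """
    root = Path(data_root)
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            physical = Path(dirpath) / name
            relative = physical.relative_to(root).as_posix()
            logical = f"{collection.rstrip('/')}/{relative}"
            pairs.append((str(physical), logical))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical local directory and SRB collection name.
    for physical, logical in registration_manifest("/data/belle/dst", "/belle/dst"):
        print(physical, "->", logical)
```

Because only a list of names is produced, the cost of registration grows with the number of files rather than with the petabytes of data behind them.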
(Map of Belle VO sites: CYFRONET (Poland); KEK and Nagoya Univ. (Japan); Melbourne Univ. (Australia); ASGC and NCU (Taiwan).)
(Map label: Hiroshima IT)
VO for the Accelerator Science
• Domestic support
  – Typical case at a laboratory: a few staff, ~10 students and no technician.
• Started monitoring them centrally over the VO
  – Focused on supporting their operations
  – Not only for WLCG sites but also for non-WLCG sites
  – The PPJ VO was started for accelerator science in Japan.
  – Federated among a few universities:
    • Tohoku Univ., Tsukuba Univ., Kobe Univ., Hiroshima Univ., Nagoya Univ. and KEK.
  – Usage:
    • To share resources and experience among the major groups (ILC, KamLAND, CDF and ATLAS) independently of any single experimental project.
Conclusion
• Tools used in daily grid operations
  – Semi-automatic installation tools, for WNs only
    • Most of the tools are handmade scripts.
  – Monitoring tools such as SAM and GSTAT are very useful.
  – GGUS Search and APWIKI are also useful.
  – We are testing auditing with nCircle, a vulnerability management system.
• Scheduled interventions
  – 11 times in JFY2006
  – Due to:
    • Software/hardware upgrades and site reconfiguration
    • Annual maintenance
    • Replacement of host certificates
• Unscheduled interventions
  – ~10 times/year
  – E.g., a failed site reconfiguration, or a power cut caused by lightning.
• Domestic support in Japan
  – An important mission for KEK.
• ~90% of problems are detected by the COD, SAM, GSTAT and Nagios.
  – Our Grid operations are supported by the great efforts of the APROC members at ASGC, Taiwan.
  – We would like to keep up our close collaboration with ASGC.
END
Thank you
LCG with SRB at Belle VO
(Network diagram. Labels: KEK-CC, Grid LAN, B-NET, KEK-FB, KEK-DMZ, KEK Firewall, APAN, SuperSINET, "New built". Address blocks: KEK-2 202.13.197.0/24; 130.87.224.0/21; SRB/MCAT 172.22.28.0/24; KEK-1 130.87.208.0/22; GridFTP 130.87.104.0/22. Storage and services: HSM, NFS, SRB, GridFTP, SRB-DSI (pluggable extension).)
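Since the diagram assigns separate address blocks to the grid sites, the SRB catalogue and the GridFTP servers, a quick lookup against those blocks tells which segment a given host belongs to (for example, when reading transfer or firewall logs). The Python sketch below uses only the standard-library `ipaddress` module. The blocks are transcribed from the diagram; the label for 130.87.224.0/21 is not clear there, so the name used for it is a hypothetical placeholder, and the sample addresses are made up.

```python
import ipaddress
from typing import Optional

# Address blocks as shown on the "LCG with SRB at Belle VO" diagram.
# "unlabelled-224" is a hypothetical name: the diagram's label for
# 130.87.224.0/21 is unclear.
SEGMENTS = {
    "KEK-1": ipaddress.ip_network("130.87.208.0/22"),
    "KEK-2": ipaddress.ip_network("202.13.197.0/24"),
    "GridFTP": ipaddress.ip_network("130.87.104.0/22"),
    "SRB/MCAT": ipaddress.ip_network("172.22.28.0/24"),
    "unlabelled-224": ipaddress.ip_network("130.87.224.0/21"),
}


def segment_of(address: str) -> Optional[str]:
    """Return the name of the segment containing address, or None."""
    ip = ipaddress.ip_address(address)
    for name, net in SEGMENTS.items():
        if ip in net:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical sample addresses.
    for addr in ("130.87.210.5", "172.22.28.40", "130.87.231.255", "8.8.8.8"):
        print(addr, "->", segment_of(addr))
```

Note that 172.22.28.0/24 is a private (RFC 1918) range, so the SRB/MCAT segment is not directly routable from outside KEK; `ipaddress.ip_network("172.22.28.0/24").is_private` returns `True`.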
Points to Cover in Each Presentation
• Tools used in daily grid operations
• What features are missing to make your work easier
• Examples of the most frequent scheduled interventions at your site
• Examples of the most frequent unscheduled interventions at your site
• Points to improve in communication with the ROC, other sites, VOs, the rest of the world...
• How do you plan the deployment of updates/new versions so that continuous production is not interrupted?
• Communication with users: how are you informed about operational problems at your site reported by local/remote users? Mail/GGUS/phone/other?
• Correlation of cross-site issues: is the operations meeting enough for this? How do you do it otherwise?
• What percentage of real site problems are detected and reported by the COD before you know about them?
• Usefulness of the following operations bodies/meetings, and suggestions to improve them:
  – COD
  – your ROC support team
  – operations meeting