TRIUMF Site Report for HEPiX, SLAC, October 10-14, 2005
TRIUMF SITE REPORT
Corrie Kost
Update since HEPiX Spring 2005
Google Mini comes to TRIUMF
• $2995 US with 1-year support
• Indexes up to 100,000 docs
• 220 different file formats
• Two 10/100 Ethernet ports
  - 1st for normal operation
  - 2nd for setup using a cross-over cable
• 120 GB Seagate drive
• 2 GB memory
• Maintenance via special Google dial-up modem
Read a complete in-depth review at http://www.anandtech.com/IT/showdoc.aspx?i=2523&p=2
The TRIUMF-CERN 1 GbE Lightpath(s)
• Path: TRIUMF → BCNET → CANARIE → SURFnet → CERN
• 1st GbE circuit established April 18th, 2005
• 2nd GbE circuit established July 19th, 2005
http://grid.triumf.ca/status/sc3.html
ATLAS Service Challenge
Servers:
• 3 EM64T systems, each with:
  - 2 GB memory
  - Hardware RAID: 3ware 9xxx SATA RAID controller
  - Seagate Barracuda 7200.8 drives in hardware RAID 5 (8 x 250 GB)
• 1 dual Opteron 246 server with:
  - 2 GB memory
  - 3ware 9xxx SATA RAID controller
  - WD Caviar SE drives in hardware RAID 0 (2 x 250 GB)
• 2 IBM 4560-SLX tape libraries (currently each with only 1 SDLT 320 tape drive)
• 1 borrowed EM64T system used temporarily as an FTS server with:
  - 1 GB memory
  - 2 SATA 80 GB drives for the OS and for Oracle's needs
Storage: 5.5+ TB disk, 8+ TB tape
10 GbE Lightpath to CERN
(Diagram: the TRIUMF-to-CERN lightpath segments, including the Atlantic crossing; all segments shown in place (√) except one marked X.)
10 GbE Lightpath to CERN
• Permanent 10 GbE TRIUMF-CERN lightpath ~ year-end 2005
• Foundry BigIron RX-4's at TRIUMF & BCnet
TRIUMF WAN CWDM
(Diagram: single pair of fiber, TRIUMF to BCNET, 22 km)
• MRV CWDM with SFP 4-port optical mux: 1610 nm, 1590 nm, 1570 nm, 1550 nm
• 4 x 1 GbE channels on a Passport 8600: ORAN, WESTGRID, 2x CERN
• 2x GbE TDM; 10 GbE Foundry switch (CERN / Ottawa)
• Potential to add 2 more 1 GbE channels
• PROBLEM: MRV needs 1550 ±3 nm but the Foundry is 1550 ±15 nm
RAID5: Puzzling I/O results
• Repeated reads on the same set of files (at 600 MB/sec): one or more files will "degrade", typically after the set of 16 x 8 GB files has been read 1000 times.
• Positive: read ~2 PB during 50 days, averaging about 600 MB/sec.
(Plot: 8 GB file read time (sec), 0-20 sec, vs. file number (same file every 16th); 8 SATA disks on each of a pair of RAID5 RocketRAID 1820A controllers.)
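The degradation test above can be sketched as a small script: time each sequential read pass over the file set and log per-file read times, so a "degrading" file shows up as a growing read time. The pass count and file-set size are taken from the slide, but the paths are placeholders; this is a sketch, not the tool TRIUMF actually used.

```shell
#!/bin/sh
# Sketch: time repeated sequential reads of a set of large files.
# A file whose read time grows across passes is "degrading".
measure_reads() {
  passes=$1; shift            # number of passes; the slide used 1000
  for pass in $(seq 1 "$passes"); do
    for f in "$@"; do         # file set; the slide used 16 x 8 GB files
      start=$(date +%s%N)     # nanosecond timestamp (GNU date)
      dd if="$f" of=/dev/null bs=1M 2>/dev/null
      end=$(date +%s%N)
      echo "pass=$pass file=$f ns=$((end - start))"
    done
  done
}

# Example (hypothetical paths):
# measure_reads 1000 /raid/test/file0? /raid/test/file1?
```

One output line per file per pass; sorting the log by file name makes a per-file trend easy to plot.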
Unix Backups at TRIUMF
• Amanda system: dual Opteron 248, 2.2 GHz
  - 2 GB memory
  - 16 x 400 GB WD disks ~ 6 TB (1.5 TB on the present system, ~10-day cycle)
  - 2 LSI MegaRAID 8-disk controllers
• Disk-based: ~1 month of backups
  - At least 2 full backups with daily incrementals
• 26-slot Overland DLT tape library
  - SDLT 600 drive, 300 GB native capacity per tape
• 150 Linux machines (users: home dirs; servers: full)
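The retention numbers above map directly onto Amanda's main tuning knobs. A hypothetical amanda.conf fragment under those assumptions (not TRIUMF's actual configuration) might look like:

```
# Hypothetical amanda.conf fragment; values inferred from the slide above
dumpcycle 10 days       # each client gets a full backup at least once per 10 days
runspercycle 10         # one amdump run per day
tapecycle 26 tapes      # rotate across the 26-slot Overland library
```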
Cheap Hot-Swap Backup
• Promise SuperSwap 1100 enclosures
• Four 400 GB Seagate SATA drives
• Promise FastTrak S150 SX4 SATA controller
• RAID 5
• Linux 2.4.20-8, Red Hat 9
A disk can be removed and replaced at any time; the array rebuilds in the background.
Used to keep multiple live (daily) rsync copies (via dirvish) of critical servers for ~1 month. See http://www.dirvish.com/
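The dirvish approach boils down to rsync's --link-dest: each day's snapshot hard-links unchanged files against the previous day's copy, so a month of daily snapshots costs little more disk than one full copy plus the daily churn. A minimal sketch of that idea follows; the paths and date-based naming are assumptions, not TRIUMF's actual dirvish vault layout.

```shell
#!/bin/sh
# Sketch of a dirvish-style rotating snapshot using rsync --link-dest.
snapshot() {
  src=$1; vault=$2
  today=$(date +%Y-%m-%d)
  prev=$(ls -1 "$vault" 2>/dev/null | tail -n 1)   # most recent snapshot, if any
  if [ -n "$prev" ] && [ "$prev" != "$today" ]; then
    # Unchanged files become hard links into the previous snapshot.
    rsync -a --delete --link-dest="$vault/$prev" "$src"/ "$vault/$today"/
  else
    rsync -a "$src"/ "$vault/$today"/              # first snapshot: plain copy
  fi
}

# Example (hypothetical paths): snapshot /etc /backups/etc-vault
```

Expiring a month-old snapshot is then just removing its directory; the hard links keep newer snapshots intact.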
HEPiX projects page: http://hepix.caspur.it/afs/hepix.org/projects.html
Conclusions / Observations
- Site services (web, email, batch, Windows) all much more stable: new hardware, more memory (typically 4-8 GB) in servers
- Quad Opteron SUN I/O, using external SATA, still limited to below 1 GB/sec
- Read 16 x 8 GB files repeatedly, averaging over 600 MB/sec for ~2 PB
- Site "backup" services still problematic:
  - tape media capacity (outgrown in 2 years)
  - reliability (is SDLT robust?)
- Permanent 10 GbE TRIUMF-CERN service by year-end
- ATLAS Service Challenge targets being met for TRIUMF as a Tier-1
- Started using Plone as content management for the TRIUMF web server
- Moving some phones to voice-over-IP
- Scientific Linux (3 & 4) still the preferred Linux OS at TRIUMF
- Moving away from distributed printing to print/scan-to-email/copy stations
TRIUMF Servers – May/2005
(Diagram of TRIUMF servers: STORM1, STORM2, SUN1; Foundry switch; LCG storage; worker nodes; GPS time; MSR web; named; documents; CondorG; web; share; mail; file; IBM cluster; Fedora / SL mirror; IBM / share storage; Amanda backup (via disks).)