CERN IT Department
CH-1211 Genève 23, Switzerland
www.cern.ch/it
R&D Activities on Storage in CERN-IT’s FIO group
Helge Meinhard / CERN-IT
HEPiX Fall 2009, LBNL
27 October 2009
Outline
Follow-up of two presentations at the Umeå meeting:
• iSCSI technology (Andras Horvath)
• Lustre evaluation project (Arne Wiebalck)
Storage R&D at CERN IT-FIO – Helge Meinhard at cern.ch – 27-Oct-2009
iSCSI - Motivation
• Three approaches
– Possible replacement for rather expensive setups with Fibre Channel SANs (used e.g. for physics databases with Oracle RAC, and for the backup infrastructure) or proprietary high-end NAS appliances
  • Potential cost saving
– Possible replacement for bulk disk servers (Castor)
  • Potential gain in availability, reliability and flexibility
– Possible use for applications for which small disk servers have been used in the past
  • Potential gain in flexibility, cost saving
• Focus is on functionality, robustness and large-scale deployment rather than ultimate performance
iSCSI terminology
• iSCSI is a set of protocols for block-level access to storage
– Similar to FC
– Unlike NAS (e.g. NFS)
• “Target”: storage unit listening to block-level requests
– Appliances available on the market
– Do-it-yourself: put a software stack on a storage node, e.g. our storage-in-a-box nodes
• “Initiator”: unit sending block-level requests (e.g. read, write) to the target
– Most modern operating systems feature an iSCSI initiator stack: Linux (RHEL 4, RHEL 5); Windows
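On the Linux initiator side the standard open-iscsi tools can be used; a minimal sketch, assuming a target reachable at the placeholder address 192.0.2.10 (not one of our machines):

```shell
# Ask the storage node which targets it offers
iscsiadm -m discovery -t sendtargets -p 192.0.2.10

# Log in to the discovered targets; the exported LUNs then
# appear as ordinary local block devices (e.g. /dev/sdb)
iscsiadm -m node --login
```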
Hardware used
• Initiators: a number of different servers, including
– Dell M610 blades
– Storage-in-a-box servers
– All running SLC5
• Targets:
– Dell EqualLogic PS5000E (12 drives, 2 controllers with 3 GigE each)
– Dell EqualLogic PS6500E (48 drives, 2 controllers with 4 GigE each)
– Infortrend A12E-G2121 (12 drives, 1 controller with 2 GigE)
– Storage-in-a-box: various models with multiple GigE or 10 GigE interfaces, running Linux
• Network (if required): private, HP ProCurve 3500 and 6600
Target stacks under Linux
• Red Hat Enterprise Linux 5 comes with tgtd
– Single-threaded
– Does not scale well
• Tests with IET (iSCSI Enterprise Target)
– Multi-threaded
– No performance limitation in our tests
– Required a newer kernel to work out of the box (Fedora and Ubuntu Server worked for us)
• In the context of the collaboration between CERN and CASPUR, work is ongoing to understand the steps needed to backport IET to RHEL 5
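For the do-it-yourself targets, IET is driven by a small configuration file, conventionally /etc/ietd.conf; a minimal sketch exporting one disk of a storage-in-a-box node (the IQN and device name are made up for illustration):

```
# /etc/ietd.conf -- one target backed by a local disk
Target iqn.2009-10.ch.cern:storage01.disk1
        # fileio goes through the page cache;
        # Type=blockio would bypass it for raw block access
        Lun 0 Path=/dev/sdb,Type=fileio
```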
Performance comparison
• 8 kB random I/O test with Oracle’s Orion tool
Performance measurement
• 1 server, 3 storage-in-a-box servers as targets
– Each target exporting 14 JBOD disks over 10 GigE
Almost production status…
• Two storage-in-a-box servers with hardware RAID 5, running SLC5 and tgtd over GigE
– Initiator provides multipathing and software RAID 1
– Used for some grid services
– No issues
• Two Infortrend boxes (JBOD configuration)
– Again, the initiator provides multipathing and software RAID 1
– Used as backend storage for the Lustre MDT (see next part)
• Tools for setup, configuration and monitoring in place
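The initiator-side layering can be sketched as follows, assuming dm-multipath has already aggregated the redundant paths to each target box into a single device (device names are hypothetical):

```shell
# Inspect the multipath devices -- one per target box
multipath -ll

# Software RAID 1 across the two boxes: either target server
# can fail entirely without losing the data
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/mapper/mpatha /dev/mapper/mpathb
```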
Being worked on
• Large deployment of EqualLogic ‘Sumos’ (48 drives of 1 TB each, dual controllers, 4 GigE per controller): 24 systems, 48 front-end nodes
• Experience encouraging, but there are issues
– Controllers don’t support DHCP; manual configuration required
– Buggy firmware
– Problems with the batteries on the controllers
– Support not yet fully integrated into Dell’s structures
– Remarkable stability
  • We have failed all network and server components that can fail; the boxes kept running
– Remarkable performance
EqualLogic performance
• 16 servers, 8 Sumos, 1 GigE per server, iozone
Appliances vs. home-made
• Appliances
– Stable
– Performant
– Highly functional (EqualLogic: snapshots, relocation without server involvement, automatic load balancing, …)
• Home-made with storage-in-a-box servers
– Inexpensive
– Complete control over the configuration
– Can run things other than the target software stack
– Can select the function at software install time (iSCSI target vs. classical disk server with rfiod or xrootd)
Ideas (testing partly started)
• Two storage-in-a-box servers as a highly redundant setup
– Running target and initiator stacks at the same time
– Mounting half the disks locally, half on the other machine
– A heartbeat detects failures and moves the functionality to one or the other box (e.g. by resetting an IP alias)
• Several storage-in-a-box servers as targets
– Exporting disks either as JBOD or as RAID
– Front-end server creates a software RAID (e.g. RAID 6) over volumes from all storage-in-a-box servers
– Any one storage-in-a-box server (or two, with software RAID 6) can fail entirely; the data remain available
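The second idea can be sketched with mdadm on the front-end server, assuming the iSCSI volumes from four storage-in-a-box targets show up after login as /dev/sdb … /dev/sde (hypothetical device names):

```shell
# RAID 6 over the LUNs of four separate target servers: any two
# member devices -- i.e. up to two whole storage-in-a-box
# servers -- can fail and the data stay available
mdadm --create /dev/md0 --level=6 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde

mkfs.ext4 /dev/md0
mount /dev/md0 /data
```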
Lustre Evaluation Project
• Tasks and goals
– Evaluate Lustre as a candidate for storage consolidation
  • Home directories
  • Project space
  • Analysis space
  • HSM
– Reduce the service catalogue
  • Increase overlap between service teams
  • Integrate with CERN fabric management tools
Areas of interest (1/2)
• Installation
– Quattorized installation of Lustre instances
– Client RPMs for SLC5
• Backup
– LVM-based snapshots for metadata
– Tested with TSM, set up for the PPS instance
– Changelogs feature of v2.0 not yet usable
• Strong authentication
– v2.0: early adoption; full Kerberos in Q1/2011
– Tested & used by other sites (not by us yet)
• Fault tolerance
– Lustre comes with built-in failover
– PPS MDS iSCSI setup
FT: MDS PPS Setup
[Diagram: Dell EqualLogic iSCSI arrays (16x 500 GB SATA) and Dell PowerEdge M600 blade servers (16 GB) on a private iSCSI network; MDS/MDT pair, OSS nodes and clients attached]
• Fully redundant against component failure
– iSCSI for shared storage
– Linux device mapper + md for mirroring
– Quattorized
– Needs testing
Areas of interest (2/2)
• Special performance & optimization
– Small files: “numbers dropped from slides”
– Postmark benchmark (not done yet)
• HSM interface
– Active development, driven by CEA
– Access to the Lustre HSM code (to be tested with TSM/CASTOR)
• Life cycle management (LCM) & tools
– Support for day-to-day operations?
– Limited support for setup, monitoring and management
Findings and Thoughts
• No strong authentication as of now– Foreseen for Q1/2011
• Strong client/server coupling
– Recovery
• Very powerful users
– Striping, pools
• Missing support for life cycle management
– No user-transparent data migration
– Lustre/kernel upgrades difficult
• Moving targets on the roadmap
– v2.0 not yet stable enough for testing
Summary
• Some desirable features not there (yet)
– Wish list communicated to Sun
– Sun interested in the evaluation
• Some more tests to be done
– Kerberos, small files, HSM
• Documentation