© 2012 IBM Corporation
IBM GPFS Storage Server – ZKI-AK Paderborn, 15.03.2013
Karsten Kutzer
Client Technical Architect – Technical Computing, IBM Systems & Technology Group
IBM System x GPFS Storage Server – "Schöne Aussichten" (bright prospects) for HPC storage
ZKI Working Group (ZKI-Arbeitskreis), Paderborn, 15.03.2013
Agenda
• IBM General Parallel File System overview
• GPFS Native RAID technology
• GPFS Storage Server
IBM General Parallel File System (GPFS™) – History and evolution

1998 – GPFS (the first release to carry the GPFS name)
– General file serving: standards, portable operating system interface (POSIX) semantics
– Large block I/O; directory and small-file performance
– Data management
– Workloads: HPC, Virtual Tape Server (VTS)

2002 – GPFS 2.1-2.3
– Linux® clusters (multiple architectures); IBM AIX® loose clusters
– 32-bit / 64-bit; inter-operation between IBM AIX and Linux
– Workloads: HPC, research, visualization, digital media, seismic exploration, weather, life sciences

2005/2006 – GPFS 3.1-3.2
– GPFS Multicluster; GPFS over wide area networks (WAN)
– Large-scale clusters with thousands of nodes
– Ease of administration; multiple networks / RDMA; distributed token management
– Windows 2008; multiple NSD servers; NFS v4 support; small-file performance
– Information lifecycle management (ILM): storage pools, file sets, policy engine

2009 – GPFS 3.3
– Restricted admin functions; improved installation; new license model
– Improved snapshot and backup; improved ILM policy engine

2010 – GPFS 3.4
– Enhanced Windows cluster support (homogeneous Windows Server clusters)
– Performance and scaling improvements
– Enhanced migration and diagnostics support

2012 – GPFS 3.5
– Caching via Active File Management (AFM)
– GPFS Storage Server
– GPFS File Placement Optimizer (FPO)
Agenda
• IBM General Parallel File System overview
• GPFS Native RAID technology
• GPFS Storage Server
PERSEUS/GNR Motivation – Hard Disk Rates Are Lagging

[Charts: "Disk drive latency by year" (ms, 1985-2010); "Disk areal density trend 2000-2010" (Gb/sq. in., log scale; trend lines at 100% CAGR and 25-35% CAGR); "Media data rate vs. year" (MB/s, 1995-2010)]
• There have been recent inflection points in disk technology – in the wrong direction
• In spite of these trends, peta- and exascale projects aim to maintain performance increases
GPFS Native RAID technology
• Declustered RAID
  – Data and parity stripes are uniformly partitioned and distributed across a disk array
  – Arbitrary number of disks per array (not constrained to an integral number of RAID stripe widths)
• 2-fault and 3-fault tolerance (RAID-D2, RAID-D3)
  – Reed-Solomon parity encoding
  – 2- or 3-fault-tolerant: stripes = 8 data strips + 2 or 3 parity strips
  – 3- or 4-way mirroring
• End-to-end checksum
  – From the disk surface to the GPFS user/client
  – Detects and corrects off-track and lost/dropped disk writes (a minimal sketch follows after this list)
• Asynchronous error diagnosis while affected I/Os continue
  – Media error: verify and restore if possible
  – Path problem: attempt alternate paths
  – Unresponsive or misbehaving disk: power-cycle the disk
• Supports service of multiple disks on a carrier
  – I/O operations continue for tracks whose disks have been removed during carrier service
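The end-to-end checksum idea can be illustrated with a minimal sketch (Python; the helper names and the trailer layout are invented for illustration and are not GNR's actual checksum algorithm or on-disk format): the client attaches a checksum and a write version to each block on write and verifies both on read-back, so corrupted, off-track, or silently dropped writes are detected rather than returned as good data.

```python
import zlib
from dataclasses import dataclass

@dataclass
class StoredBlock:
    payload: bytes
    version: int      # incremented on every write of this block
    checksum: int     # covers payload + version

def seal(payload: bytes, version: int) -> StoredBlock:
    """Client side: attach version and checksum before the block travels to disk."""
    csum = zlib.crc32(payload + version.to_bytes(8, "big"))
    return StoredBlock(payload, version, csum)

def verify(block: StoredBlock, expected_version: int) -> bytes:
    """Client side on read-back: detect corruption and lost/dropped writes."""
    csum = zlib.crc32(block.payload + block.version.to_bytes(8, "big"))
    if csum != block.checksum:
        raise IOError("checksum mismatch: corrupted or off-track write")
    if block.version != expected_version:
        raise IOError("stale version: a write was lost or dropped")
    return block.payload

# Example: a dropped write leaves the old version on disk and is caught on read.
on_disk = seal(b"old contents", version=1)
try:
    verify(on_disk, expected_version=2)   # the client expected its rewrite to be on disk
except IOError as e:
    print(e)                              # -> stale version: a write was lost or dropped
```

When a mismatch is detected, the data would be reconstructed from the stripe's redundancy, which is what the declustered RAID layer provides.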
De-clustering – Bringing Parallel Performance to Disk Maintenance

• Traditional RAID: narrow data+parity arrays
  – 20 disks, 5 disks per traditional RAID array; 4x4 RAID stripes (data plus parity)
  – Rebuild uses the I/O capacity of only the failed array's 4 surviving disks
  – Because striping spans all arrays, all file accesses are throttled by array 2's rebuild overhead
• Declustered RAID: data+parity distributed over all disks
  – The same 20 disks form 1 declustered array holding 16 RAID stripes (data plus parity)
  – Rebuild uses the I/O capacity of the array's 19 surviving disks
  – Load on file accesses is reduced by 4.8x (= 19/4) during array rebuild (a small sketch follows below)

[Figure: the same failed disk in the 4 × 5-disk traditional layout vs. the 20-disk declustered layout]
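A minimal sketch of why the rebuild load drops (Python; the placement functions and stripe counts are illustrative assumptions, not GNR's actual placement algorithm): with 20 disks organized as four traditional 5-disk arrays, only the 4 surviving disks of the failed array can feed the rebuild, while a layout that spreads every stripe over all 20 disks lets the rebuild reads be shared by all 19 survivors.

```python
import random
from collections import Counter

DISKS, WIDTH = 20, 5                  # 20 disks, 5 strips per stripe (data + parity)

def traditional_layout(stripes_per_array=4):
    """Four 5-disk arrays; every stripe of an array lives on exactly those 5 disks."""
    layout = []
    for array in range(DISKS // WIDTH):
        member_disks = list(range(array * WIDTH, (array + 1) * WIDTH))
        layout += [member_disks] * stripes_per_array
    return layout

def declustered_layout(n_stripes=200, seed=0):
    """Each stripe is placed on 5 disks chosen uniformly from all 20 (illustrative placement)."""
    rng = random.Random(seed)
    return [rng.sample(range(DISKS), WIDTH) for _ in range(n_stripes)]

def rebuild_readers(layout, failed_disk=0):
    """Disks that must be read to rebuild the strips lost on the failed disk."""
    readers = Counter()
    for stripe in layout:
        if failed_disk in stripe:
            readers.update(d for d in stripe if d != failed_disk)
    return readers

for name, layout in [("traditional", traditional_layout()),
                     ("declustered", declustered_layout())]:
    print(f"{name}: {len(rebuild_readers(layout))} surviving disks share the rebuild reads")
# traditional: only the 4 other disks of the failed array do rebuild reads
# declustered: the reads are spread over (essentially) all 19 surviving disks
```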
Low-Penalty Disk Rebuild Overhead

[Figure: rebuild read (Rd) / write (Wr) activity over time for a failed disk, traditional vs. declustered layout]

• Reduces rebuild overhead by 3.5x
Declustered RAID6 Example

14 physical disks / 3 traditional RAID6 arrays / 2 spares vs. 14 physical disks / 1 declustered RAID6 array / 2 spares (data, parity and spare space declustered across all 14 disks). With the same two failed disks:

Number of faults per stripe – traditional RAID6 (both failures land in the green array):

  Red  Green  Blue
   0     2     0
   0     2     0
   0     2     0
   0     2     0
   0     2     0
   0     2     0
   0     2     0

Number of stripes with 2 faults = 7

Number of faults per stripe – declustered RAID6 (the same failures are spread across stripes):

  Red  Green  Blue
   1     0     1
   0     0     1
   0     1     1
   2     0     0
   0     1     1
   1     0     1
   0     1     0

Number of stripes with 2 faults = 1

(A sketch reproducing these counts follows below.)
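The fault counting in the example can be reproduced with a short sketch (Python; the concrete stripe-to-disk placements below are invented for illustration and are not GNR's real layout): given each stripe's set of disks and the set of failed disks, count how many stripes carry two faults.

```python
import random

def faults_per_stripe(layout, failed):
    """layout: one set of disks per stripe; failed: set of failed disks."""
    return [len(disks & failed) for disks in layout]

N_DISKS, STRIPE_WIDTH, N_STRIPES = 14, 4, 21   # 3 groups x 7 stripes, as in the slide

# Traditional: three 4-disk RAID6 groups (red: 0-3, green: 4-7, blue: 8-11), spares 12-13.
groups = [{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}]
traditional = [g for g in groups for _ in range(7)]

# Declustered: the same 21 stripes, each spread over 4 of the 14 disks
# (uniform pseudo-random placement, purely for illustration).
rng = random.Random(1)
declustered = [set(rng.sample(range(N_DISKS), STRIPE_WIDTH)) for _ in range(N_STRIPES)]

failed = {4, 5}                                 # both failed disks sit in the green group

for name, layout in [("traditional", traditional), ("declustered", declustered)]:
    critical = sum(1 for f in faults_per_stripe(layout, failed) if f >= 2)
    print(f"{name}: {critical} stripes with 2 faults")
# traditional: 7 stripes with 2 faults; declustered: typically only 1 or 2
```

Stripes with 2 faults are the critical ones: a third failure there means data loss, so spreading them thinly (and rebuilding them first) is what makes the declustered layout safer.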
Non-intrusive disk diagnostics
• Disk Hospital: background determination of problems
  – While a disk is in the hospital, GNR non-intrusively and immediately returns data to the client using the error-correction code
  – For writes, GNR non-intrusively marks the write data and reconstructs it later in the background, after problem determination is complete (see the sketch after this list)
• Advanced fault determination
  – Statistical reliability and SMART monitoring
  – Neighbor check, drive power cycling
  – Media error detection and correction
• Supports concurrent disk firmware updates
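A minimal sketch of the hospital read/write behaviour (Python; the class and method names are invented, a single XOR parity stands in for the Reed-Solomon code, and this is not GNR's actual code path): reads of a strip on a hospitalized disk are served by reconstructing it from the surviving strips, and writes to such a strip are remembered and reconciled later in the background.

```python
from functools import reduce

def xor(strips):
    """Single-parity stand-in for GNR's Reed-Solomon code (illustration only)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

class Stripe:
    def __init__(self, data_strips):
        self.strips = data_strips + [xor(data_strips)]   # data strips + one parity strip
        self.pending = {}                                 # writes waiting on a hospitalized disk

    def read(self, idx, hospital):
        """Serve a read immediately, even if the disk holding strip idx is in the hospital."""
        if idx in self.pending:
            return self.pending[idx]
        if idx not in hospital:
            return self.strips[idx]
        survivors = [s for i, s in enumerate(self.strips) if i != idx]
        return xor(survivors)                             # reconstruct from the other strips

    def write(self, idx, payload, hospital):
        """Never block a write on a hospitalized disk: remember it, reconcile in background."""
        if idx in hospital:
            self.pending[idx] = payload                   # written back once diagnosis completes
        else:
            self.strips[idx] = payload
        # the parity update is omitted to keep the sketch short

stripe = Stripe([b"AAAA", b"BBBB", b"CCCC"])
print(stripe.read(1, hospital={1}))                       # b'BBBB', reconstructed from survivors
```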
Mean time to data loss (MTTDL), 8+2 vs. 8+3

  Parity   50 disks            200 disks          50,000 disks
  8+2      200,000 years       50,000 years       200 years
  8+3      250 billion years   60 billion years   230 million years
Simulation assumptions: disk capacity = 600 GB, MTTF = 600,000 hours, hard error rate = 1 in 10^15 bits, 47-HDD declustered arrays, uncorrelated failures and hard read errors. The MTTDL figures are due to hard errors, with AFR (2-fault-tolerant) = 5 × 10^-6 and AFR (3-fault-tolerant) = 4 × 10^-12 (the arithmetic behind the table is sketched below).
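The table rows are consistent with the quoted failure rates if the AFR figures are read as the annual data-loss probability of one ~50-disk declustered array: the loss rates of independent arrays add, so the system MTTDL shrinks in proportion to the number of disks. A small sketch of that arithmetic (Python; the per-array reading of the AFR is an assumption, chosen because it reproduces the 50-disk column):

```python
AFR = {"8+2": 5e-6, "8+3": 4e-12}        # annual data-loss rate of one ~50-disk declustered array

def mttdl_years(parity, n_disks, disks_per_array=50):
    """System MTTDL: loss rates of independent arrays add, so the MTTDL divides."""
    arrays = n_disks / disks_per_array
    return 1.0 / (AFR[parity] * arrays)

for parity in ("8+2", "8+3"):
    row = [f"{mttdl_years(parity, n):,.0f} years" for n in (50, 200, 50_000)]
    print(parity, row)
# 8+2: ['200,000 years', '50,000 years', '200 years']
# 8+3: ['250,000,000,000 years', '62,500,000,000 years', '250,000,000 years']
#      (the slide reports these, with its own rounding, as 250 billion / 60 billion / 230 million years)
```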
Agenda
• IBM General Parallel File System overview
• GPFS Native RAID technology
• GPFS Storage Server
GPFS Native RAID (GNR) on Power 775 – GA on 11/11/11

• See Chapter 10 of the GPFS 3.5 Advanced Administration Guide (SC23-5182-05)
  – or the GPFS 3.4 Native RAID Administration and Programming Reference (SA23-1354-00)
• First customers: weather agencies, government agencies, universities
Workload Evolution – Migrate RAID to Commodity File/Data Servers

[Diagram: on one side, a compute cluster attached over FDR IB / 10 GigE to NSD file servers (x3650) backed by dedicated disk controllers and disk enclosures; on the other, the same compute cluster attached to NSD file servers (x3650) running GPFS Native RAID, connected directly to the disk enclosures with no dedicated disk controllers.]
Introducing IBM GPFS Storage Server
• What's new:
  – Replaces the external hardware controller with software-based RAID
  – Modular upgrades improve TCO
  – Non-intrusive disk diagnostics
• Client business value:
  – Integrated and ready to go for Big Data applications
  – 3 years of maintenance and support
  – Improved storage affordability
  – Delivers end-to-end data integrity
  – Faster rebuild and recovery times
  – Reduces rebuild overhead by 3.5x
• Key features:
  – Declustered RAID (8+2p, 8+3p)
  – 2- and 3-fault-tolerant erasure codes
  – End-to-end checksum
  – Protection against lost writes
  – Off-the-shelf JBODs
  – Standardized in-band SES management
  – Built-in SSD acceleration
A Scalable Building Block Approach to Storage

Complete storage solution: data servers (x3650 M4), disk (SSD and NL-SAS) in "twin-tailed" JBOD disk enclosures, software, InfiniBand and Ethernet.

• Model 24 – Light and Fast: 4 enclosures, 20U, 232 NL-SAS + 6 SSD
• Model 26 – HPC Workhorse: 6 enclosures, 28U, 348 NL-SAS + 6 SSD
• High-density HPC options: 18 enclosures, 2 × 42U standard racks, 1044 NL-SAS + 18 SSD

(A quick consistency check of these drive counts follows below.)
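A quick consistency check of the drive counts (Python; the per-enclosure figure is derived from the slide's numbers rather than stated on it) shows that every configuration is built from identical enclosures, which is what makes the building-block scaling work:

```python
models = {                      # enclosures, NL-SAS drives, SSDs (numbers from the slide)
    "Model 24":         (4,   232,  6),
    "Model 26":         (6,   348,  6),
    "High-density HPC": (18, 1044, 18),
}

for name, (enclosures, nl_sas, ssd) in models.items():
    print(f"{name}: {nl_sas / enclosures:.0f} NL-SAS drives per enclosure, {ssd} SSDs")
# Every configuration works out to 58 NL-SAS drives per enclosure,
# so capacity grows by whole enclosures as building blocks are added.
```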