
© 2012 IBM Corporation

IBM GPFS Storage Server – ZKI-AK Paderborn, 15.03.2013

Karsten Kutzer

Client Technical Architect – Technical Computing
IBM Systems & Technology Group

IBM System x GPFS Storage Server – "Schöne Aussichten" (bright prospects) for HPC storage

ZKI Working Group (ZKI-Arbeitskreis), Paderborn, 15.03.2013


Agenda

• IBM General Parallel File System overview

• GPFS Native RAID technology

• GPFS Storage Server


IBM General Parallel File System (GPFS™) – History and evolution

[Timeline: 1998 – 2002 – 2005/2006 – 2009 – 2010 – 2012]

1998 – GPFS (first called GPFS): general file serving
• Standards: Portable Operating System Interface (POSIX) semantics
• Large block size; directory and small-file performance
• Data management
• Early workloads: HPC, Virtual Tape Server (VTS)

2002 – GPFS 2.1-2.3
• Linux® clusters (multiple architectures), IBM AIX® loose clusters
• Workloads: HPC, research, visualization, digital media, seismic exploration, weather, life sciences
• 32-bit / 64-bit; inter-operation between IBM AIX and Linux
• GPFS Multicluster; GPFS over wide area networks (WAN)
• Large-scale clusters with thousands of nodes

2005/2006 – GPFS 3.1-3.2
• Ease of administration
• Multiple networks / RDMA
• Distributed token management
• Windows 2008 support; multiple NSD servers; NFS v4 support
• Small-file performance
• Information lifecycle management (ILM): storage pools, file sets, policy engine

2009 – GPFS 3.3
• Restricted admin functions
• Improved installation; new license model
• Improved snapshot and backup
• Improved ILM policy engine

2010 – GPFS 3.4
• Enhanced Windows cluster support (homogeneous Windows Server clusters)
• Performance and scaling improvements
• Enhanced migration and diagnostics support

2012 – GPFS 3.5
• Caching via Active File Management (AFM)
• GPFS Storage Server
• GPFS File Placement Optimizer (FPO)


PERSEUS/GNR Motivation – Hard Disk Rates Are Lagging

[Charts: disk drive latency by year, 1985-2010 (ms); disk areal density trend, 2000-2010 (Gb/sq. in., slowing from ~100% CAGR to 25-35% CAGR); media data rate vs. year, 1995-2010 (MB/s)]

• There have been recent inflection points in disk technology – in the wrong direction
• In spite of these trends, peta- and exascale projects aim to maintain performance increases


GPFS Native RAID technology

• Declustered RAID (see the placement sketch below)
  – Data and parity stripes are uniformly partitioned and distributed across a disk array
  – Arbitrary number of disks per array (not constrained to an integral number of RAID stripe widths)

• 2-fault and 3-fault tolerance (RAID-D2, RAID-D3)
  – Reed-Solomon parity encoding
  – 2- or 3-fault-tolerant stripes: 8 data strips + 2 or 3 parity strips
  – 3- or 4-way mirroring

• End-to-end checksum
  – From the disk surface to the GPFS user/client
  – Detects and corrects off-track and lost/dropped disk writes

• Asynchronous error diagnosis while affected IOs continue
  – If media error: verify and restore if possible
  – If path problem: attempt alternate paths
  – If unresponsive or misbehaving disk: power cycle the disk

• Supports servicing of multiple disks on a carrier
  – IO operations continue for tracks whose disks have been removed during carrier service
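To make the declustered-RAID bullet concrete, here is a minimal placement sketch in Python. It is not GNR's actual layout algorithm; it only illustrates the idea that 8+2 stripes (8 data strips plus 2 parity strips) can be spread uniformly over an arbitrary number of disks by rotating the starting disk per stripe, so that no small group of disks holds all the strips of any one stripe. The function name `place_stripes` and the 47-disk array size used in the demo are illustrative choices.

```python
# Minimal sketch: spread 8+2 RAID stripes uniformly over an arbitrary number
# of disks by rotating the starting disk for each stripe. This is only an
# illustration of declustering, not GNR's real placement algorithm.
from collections import Counter

DATA_STRIPS = 8      # data strips per stripe (8+2 code from the slide)
PARITY_STRIPS = 2    # parity strips per stripe

def place_stripes(n_disks: int, n_stripes: int):
    """Return a list of stripes; each stripe is the list of disks holding its strips."""
    width = DATA_STRIPS + PARITY_STRIPS
    assert n_disks >= width, "need at least as many disks as strips per stripe"
    layout = []
    for s in range(n_stripes):
        start = (s * width) % n_disks                     # rotate the starting disk
        stripe = [(start + i) % n_disks for i in range(width)]
        layout.append(stripe)
    return layout

if __name__ == "__main__":
    layout = place_stripes(n_disks=47, n_stripes=470)     # e.g. a 47-disk declustered array
    per_disk = Counter(d for stripe in layout for d in stripe)
    print("strips per disk:", sorted(set(per_disk.values())))   # uniform load across disks
    # Each stripe touches 10 distinct disks, so a single disk failure involves
    # only a fraction of the strips held by each surviving disk.
    failed = 0
    affected = [s for s in layout if failed in s]
    print(f"stripes touching failed disk {failed}: {len(affected)} of {len(layout)}")
```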


De-clustering – Bringing Parallel Performance to Disk Maintenance

• Traditional RAID: narrow data+parity arrays (20 disks, 5 disks per traditional RAID array; 4x4 RAID stripes, data plus parity)
  – Rebuild of a failed disk uses the IO capacity of that array's only 4 surviving disks
  – With striping across all arrays, all file accesses are throttled by the rebuilding array's overhead

• Declustered RAID: data+parity distributed over all disks (the same 20 disks in 1 declustered array; 16 RAID stripes, data plus parity)
  – Rebuild of a failed disk uses the IO capacity of the array's 19 surviving disks
  – Load on file accesses is reduced by 4.8x (= 19/4) during array rebuild (see the sketch after this list)
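The 4.8x figure above is just 19/4: the rebuild work that traditionally falls on the 4 surviving disks of a narrow 5-disk array is instead spread over the 19 surviving disks of the 20-disk declustered array. A back-of-envelope check in plain Python (no GPFS interfaces involved):

```python
# Back-of-envelope check of the slide's 4.8x claim: rebuild work per surviving
# disk in a narrow 5-disk array vs. a 20-disk declustered array.
TOTAL_DISKS = 20
ARRAY_WIDTH = 5                               # traditional: 4 arrays of 5 disks

rebuild_work = 1.0                            # one failed disk's worth of data to reconstruct

traditional_survivors = ARRAY_WIDTH - 1       # 4 disks share the rebuild
declustered_survivors = TOTAL_DISKS - 1       # 19 disks share the rebuild

per_disk_traditional = rebuild_work / traditional_survivors
per_disk_declustered = rebuild_work / declustered_survivors

print(f"traditional per-disk rebuild load: {per_disk_traditional:.3f}")
print(f"declustered per-disk rebuild load: {per_disk_declustered:.3f}")
print(f"load reduction factor: {per_disk_traditional / per_disk_declustered:.1f}x")  # ~4.8x
```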


Low-Penalty Disk Rebuild Overhead

[Figure: per-disk read/write activity over time while rebuilding one failed disk, traditional vs. declustered layout]

• Declustering reduces rebuild overhead by 3.5x


Declustered RAID6 Example

14 physical disks / 3 traditional RAID6 arrays / 2 spares  vs.  14 physical disks / 1 declustered RAID6 array / 2 spares
(declustering spreads data, parity and spare space across all 14 disks)

With two failed disks, number of faults per stripe (Red, Green, Blue denote the three RAID6 arrays):

Traditional layout (both failed disks fall in the green array):
  Red  Green  Blue
   0     2     0
   0     2     0
   0     2     0
   0     2     0
   0     2     0
   0     2     0
   0     2     0
  Number of stripes with 2 faults = 7

Declustered layout (the same two disk failures):
  Red  Green  Blue
   1     0     1
   0     0     1
   0     1     1
   2     0     0
   0     1     1
   1     0     1
   0     1     0
  Number of stripes with 2 faults = 1

(A small counting sketch follows below.)
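The point of the example can be reproduced with a few lines of Python: given a mapping from stripes to disks, count how many stripes lose two strips (critical for RAID6) when two particular disks fail. The placements below, three narrow 4-disk arrays versus one randomly declustered layout, are illustrative stand-ins for the slide's 14-disk picture, not its exact placement.

```python
# Count stripes that lose 2 strips (critical for RAID6) when two disks fail,
# for a traditional narrow layout vs. a declustered layout. Illustrative only;
# the placements are not the exact layout drawn on the slide.
import random

def stripes_with_two_faults(layout, failed):
    return sum(1 for stripe in layout if len(set(stripe) & failed) >= 2)

N_DISKS = 14
STRIPE_WIDTH = 4             # small 2+2 RAID6 stripes, as in the 14-disk example
STRIPES_PER_ARRAY = 7

# Traditional: three separate 4-disk RAID6 arrays (disks 0-3, 4-7, 8-11), 2 spares.
traditional = [list(range(a * 4, a * 4 + 4))
               for a in range(3)
               for _ in range(STRIPES_PER_ARRAY)]

# Declustered: the same 21 stripes, each placed on 4 disks chosen from all 14.
random.seed(1)
declustered = [random.sample(range(N_DISKS), STRIPE_WIDTH)
               for _ in range(3 * STRIPES_PER_ARRAY)]

failed = {4, 6}              # two failed disks that sit in the same traditional array
print("traditional:", stripes_with_two_faults(traditional, failed), "stripes with 2 faults")
print("declustered:", stripes_with_two_faults(declustered, failed), "stripes with 2 faults")
```

In the traditional layout every stripe of the affected array has two faults (7 here), while the declustered layout leaves only a small number of stripes, typically one or two, in that critical state.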


Non-intrusive disk diagnostics

• Disk Hospital: background determination of problems (see the sketch after this list)
  – While a disk is in the hospital, GNR non-intrusively and immediately returns data to the client using the error correction code
  – For writes, GNR non-intrusively marks the write data and reconstructs it later in the background, after problem determination is complete

• Advanced fault determination
  – Statistical reliability and SMART monitoring
  – Neighbor check, drive power cycling
  – Media error detection and correction

• Supports concurrent disk firmware updates
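A rough sketch of the decision flow described above, in Python. All names here (`reconstruct_from_parity`, `verify_and_restore`, `try_alternate_path`, `power_cycle`) are hypothetical placeholders standing in for internal behavior, not GPFS or GNR interfaces.

```python
# Rough sketch of the "disk hospital" decision flow described on the slide.
# All helper names are hypothetical placeholders, not GPFS/GNR interfaces.
from enum import Enum, auto

class Diagnosis(Enum):
    MEDIA_ERROR = auto()
    PATH_PROBLEM = auto()
    UNRESPONSIVE = auto()

def serve_read_while_in_hospital(stripe, reconstruct_from_parity):
    # Reads are served immediately from the erasure code; the client never
    # waits on the sick disk while diagnosis runs in the background.
    return reconstruct_from_parity(stripe)

def handle_sick_disk(diagnosis, disk, actions):
    # Background problem determination, following the slide's three cases.
    if diagnosis is Diagnosis.MEDIA_ERROR:
        actions.verify_and_restore(disk)      # verify the sector, rewrite if possible
    elif diagnosis is Diagnosis.PATH_PROBLEM:
        actions.try_alternate_path(disk)      # retry the IO over another path
    elif diagnosis is Diagnosis.UNRESPONSIVE:
        actions.power_cycle(disk)             # power cycle a misbehaving drive
```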


Mean time to data loss (MTTDL), 8+2 vs. 8+3

Parity   50 disks            200 disks          50,000 disks
8+2      200,000 years       50,000 years       200 years
8+3      250 billion years   60 billion years   230 million years

Simulation assumptions: disk capacity = 600 GB, MTTF = 600k hours, hard error rate = 1 in 10^15 bits, 47-HDD declustered arrays, uncorrelated failures and hard read errors. These MTTDL figures are due to hard errors; AFR (2-fault-tolerant) = 5 x 10^-6, AFR (3-fault-tolerant) = 4 x 10^-12. (A back-of-envelope check follows below.)
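One way to read the AFR figures in the footnote: if a 47-disk declustered array suffers a hard-error-induced data-loss event at an annualized rate of 5 x 10^-6 (8+2) or 4 x 10^-12 (8+3), the system-level MTTDL is roughly 1 / (number of arrays × AFR). The sketch below reproduces the table's order of magnitude under that reading; it is an interpretation of the slide, not the original simulation.

```python
# Back-of-envelope reproduction of the MTTDL table, assuming the quoted AFR is
# the annual rate of data-loss events per 47-disk declustered array.
DISKS_PER_ARRAY = 47
AFR = {"8+2": 5e-6, "8+3": 4e-12}    # data-loss events per array per year (from the slide)

def mttdl_years(n_disks: int, parity: str) -> float:
    n_arrays = n_disks / DISKS_PER_ARRAY
    return 1.0 / (n_arrays * AFR[parity])

for parity in ("8+2", "8+3"):
    row = [f"{mttdl_years(n, parity):.3g} years" for n in (50, 200, 50_000)]
    print(parity, row)
# 8+2 -> roughly 2e5, 5e4 and 2e2 years; 8+3 -> roughly 2e11, 6e10 and 2e8 years,
# matching the orders of magnitude in the table above.
```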


GPFS Native RAID (GNR) on Power 775 – GA on 11/11/11

• See Chapter 10 of the GPFS 3.5 Advanced Administration Guide (SC23-5182-05)
  – or the GPFS 3.4 Native RAID Administration and Programming Reference (SA23-1354-00)

• First customers: weather agencies, government agencies, universities


Workload Evolution – File/Data Servers: Migrate RAID to Commodity File Servers

[Diagram, before: compute cluster connected over FDR InfiniBand / 10 GigE to NSD file servers (x3650) sitting in front of dedicated disk controllers and disk enclosures. After: compute cluster connected to NSD file servers running GPFS Native RAID, attached directly to the disk enclosures; the dedicated disk controllers are eliminated.]


Introducing IBM GPFS Storage Server

• What's new:
  – Replaces the external hardware controller with software-based RAID
  – Modular upgrades improve TCO
  – Non-intrusive disk diagnostics

• Client business value:
  – Integrated and ready to go for Big Data applications
  – 3 years of maintenance and support
  – Improved storage affordability
  – Delivers end-to-end data integrity
  – Faster rebuild and recovery times; reduces rebuild overhead by 3.5x

• Key features (see the checksum sketch below):
  – Declustered RAID (8+2p, 8+3p); 2- and 3-fault-tolerant erasure codes
  – End-to-end checksum; protection against lost writes
  – Off-the-shelf JBODs; standardized in-band SES management
  – Built-in SSD acceleration
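The "end-to-end checksum" and "protection against lost writes" features can be illustrated with a tiny sketch: each block carries a checksum and a version (sequence) number in a trailer, added on the write path and verified on read, so a silently dropped write shows up as a stale version even when the old data still checksums correctly. This shows the general technique only; it is not GNR's actual on-disk trailer format.

```python
# Illustrative end-to-end checksum with a version number, showing how a lost
# (dropped) write can be detected even when the stale data is internally
# consistent. This is NOT GNR's actual on-disk trailer format.
import zlib

def make_trailer(data: bytes, version: int) -> bytes:
    csum = zlib.crc32(data + version.to_bytes(8, "big"))
    return version.to_bytes(8, "big") + csum.to_bytes(4, "big")

def verify(data: bytes, trailer: bytes, expected_version: int) -> str:
    version = int.from_bytes(trailer[:8], "big")
    csum = int.from_bytes(trailer[8:], "big")
    if zlib.crc32(data + trailer[:8]) != csum:
        return "corrupted block (checksum mismatch)"
    if version != expected_version:
        return "lost/dropped write (stale version)"
    return "ok"

# Write version 7 of a block, but simulate the disk silently dropping the write,
# leaving version 6 on the platter.
old_data, old_trailer = b"old contents", make_trailer(b"old contents", 6)
print(verify(old_data, old_trailer, expected_version=7))    # -> lost/dropped write
print(verify(old_data, old_trailer, expected_version=6))    # -> ok
print(verify(b"garbled!", old_trailer, expected_version=6)) # -> corrupted block
```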


A Scalable Building Block Approach to Storage

Complete storage solution: data servers (x3650 M4), disk (SSD and NL-SAS) in "twin-tailed" JBOD disk enclosures, software, InfiniBand and Ethernet.

• Model 24 – "Light and Fast": 4 enclosures, 20U, 232 NL-SAS drives, 6 SSDs
• Model 26 – "HPC Workhorse": 6 enclosures, 28U, 348 NL-SAS drives, 6 SSDs
• High-density HPC option: 18 enclosures in 2 standard 42U racks, 1044 NL-SAS drives, 18 SSDs

(A rough capacity sketch follows below.)
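A quick capacity sketch for the building blocks above. The drive counts come from the slide; the per-drive capacities and the choice of 8+2 parity are placeholders for illustration, not the actual GSS drive options or configuration rules.

```python
# Rough capacity sketch for the GSS building blocks on the slide. Drive counts
# are from the slide; per-drive capacities and the 8+2 parity choice are
# illustrative assumptions, not the actual GSS configuration options.
NLSAS_TB = 3.0               # assumed NL-SAS drive size (placeholder)
SSD_TB = 0.2                 # assumed SSD size (placeholder)
PARITY_EFFICIENCY = 8 / 10   # 8 data strips out of 10 in an 8+2 stripe

models = {
    "Model 24 (4 enclosures, 20U)":          (232, 6),
    "Model 26 (6 enclosures, 28U)":          (348, 6),
    "High-density (18 enclosures, 2 racks)": (1044, 18),
}

for name, (nlsas, ssd) in models.items():
    raw = nlsas * NLSAS_TB + ssd * SSD_TB
    usable = nlsas * NLSAS_TB * PARITY_EFFICIENCY   # before spare space and metadata
    print(f"{name}: raw ~ {raw:.0f} TB, usable (8+2, pre-spare) ~ {usable:.0f} TB")
```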


Doing Big Data Since 1998

IBM GPFS