TRANSCRIPT
IBM General Parallel File System (GPFS™) 3.5
File Placement Optimizer (FPO)
Rick Koopman
IBM Technical Computing Business Development Benelux
© 2012 IBM Corporation
GPFS 3.5 vs. HDFS feature comparison (X = supported)

Category                 Feature                     GPFS 3.5   HDFS
Performance              Terasort: large reads       X          X
                         HBase: small writes         X          X
                         Metadata intensive          X          X
Enterprise readiness     POSIX compliance            X
                         Metadata replication        X
                         Distributed name node       X
Protection & Recovery    Snapshots                   X
                         Asynchronous replication    X
                         Backup                      X
Security & Integrity     Access Control Lists        X
Ease of Use              Policy-based ingest         X
An enterprise-class replacement for HDFS.
A typical HDFS Environment
[Diagram: users submit jobs to a Map Reduce cluster backed by HDFS, alongside NFS filers.]
Uses disk local to each server
Aggregates the local disk space into a single, redundant shared filesystem
The open-source standard file system used with Hadoop MapReduce
Map Reduce Environment Using GPFS-FPO (File Placement Optimizer)
[Diagram: a Map Reduce cluster with GPFS-FPO in place of HDFS, alongside NFS filers and users submitting jobs.]
Uses disk local to each server
Aggregates the local disk space into a single redundant shared filesystem
Designed for MapReduce workloads
Unlike HDFS, GPFS-FPO is POSIX compliant, so data maintenance is easy (as sketched below)
Intended as a drop-in replacement for open-source HDFS (the IBM BigInsights product may be required)
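Because GPFS-FPO presents a POSIX file system, MapReduce data can be ingested and maintained with ordinary file APIs and tools rather than HDFS-specific commands (on stock HDFS the equivalent copy would go through tools such as hadoop fs -put). A minimal sketch, assuming a hypothetical GPFS-FPO mount at /gpfs/fpo; the paths and file names are illustrative only:

```python
import os
import shutil

GPFS_MOUNT = "/gpfs/fpo"                      # hypothetical GPFS-FPO mount point
raw_dir = os.path.join(GPFS_MOUNT, "raw")

# Ingest: a plain POSIX copy is enough, no HDFS-specific tooling required.
os.makedirs(raw_dir, exist_ok=True)
shutil.copy("/data/incoming/events.log", os.path.join(raw_dir, "events.log"))

# Maintenance: standard directory walks, stat() calls, renames, etc. all work.
for root, _dirs, files in os.walk(raw_dir):
    for name in files:
        path = os.path.join(root, name)
        print(path, os.path.getsize(path), "bytes")
```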
GPFS FPO advanced storage for Map Reduce Data
Hadoop HDFS vs. IBM GPFS advantages:
- HDFS: the NameNode is a single point of failure. GPFS: no single point of failure; metadata is distributed.
- HDFS: large block sizes, poor support for small files (a back-of-envelope sketch follows this list). GPFS: variable block sizes, suited to multiple types of data and data access patterns.
- HDFS: non-POSIX file system with obscure commands. GPFS: POSIX file system, easy to use and manage.
- HDFS: difficult to ingest data, special tools required. GPFS: policy-based data ingest.
- HDFS: single-purpose, Hadoop MapReduce only. GPFS: versatile and multi-purpose.
- HDFS: not recommended for critical data. GPFS: enterprise-class advanced storage features.
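The single-NameNode and small-file points are related: every file and block in HDFS is tracked in one NameNode's memory, so large numbers of small files inflate a single server's heap. A rough back-of-envelope sketch of that effect, using the commonly cited rule of thumb of about 150 bytes of NameNode heap per namespace object (the 150-byte figure and the file count are assumptions, not from this deck):

```python
# Rough estimate of NameNode heap consumed by small files (rule-of-thumb figures).
BYTES_PER_OBJECT = 150          # assumed heap cost per file/block object
num_files = 100_000_000         # e.g. 100 million small files
objects = num_files * 2         # ~1 file entry + 1 block entry per small file

heap_gb = objects * BYTES_PER_OBJECT / 1e9
print(f"~{heap_gb:.0f} GB of NameNode heap")   # ~30 GB on a single server
```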
Next-Generation Archiving Solutions: LTFS Storage Platforms
IBM Storage
The Problem – Network Disk Growth
A single user-defined namespace (C:/...) that is large and growing bigger.
Data protection: difficult to protect and back up – cost, backup windows, time to recovery – and the data mix reduces the effectiveness of compression/dedupe.
Operational manageability: cost; a data mix of rich media, databases, etc.; uses spanning active, time-sensitive access and static, immutable data.
The Solution – Tiered Network Storage
A single file system view over the user-defined namespace (C:/...), with policy-based tier migration between a disk tier and an LTFS tape tier (illustrated conceptually below).
Disk tier: smaller and easier to protect – faster time to recovery, smaller backup footprint – holding time-critical applications/data and high-use data (databases, email, etc.).
LTFS tape tier: lower-cost, scalable storage for static data, rich media, unstructured data, and archive, supporting replication backup strategies.
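In GPFS the tier migration above is driven by the file system's policy engine; purely to illustrate the idea, here is a sketch that demotes files not accessed for a given number of days from a disk tier to an LTFS tape tier mounted as a directory. The paths and the 90-day threshold are hypothetical, and a real deployment would use GPFS policies rather than a script like this:

```python
import os
import shutil
import time

DISK_TIER = "/storage/disk"      # hypothetical high-use disk tier
TAPE_TIER = "/storage/ltfs"      # hypothetical LTFS tape tier mount
AGE_DAYS = 90                    # demote files untouched for this long

cutoff = time.time() - AGE_DAYS * 86400

for root, _dirs, files in os.walk(DISK_TIER):
    for name in files:
        src = os.path.join(root, name)
        if os.path.getatime(src) < cutoff:             # last access older than cutoff
            dst = os.path.join(TAPE_TIER, os.path.relpath(src, DISK_TIER))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)                      # demote to the tape tier
            print("migrated", src, "->", dst)
```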
Smarter Storage – Distributed Data: Tokyo, London, Los Angeles
Namespace file view
Load balancing
Policy migration
Storage Distribution
Reduction of cost for storage
Data monetization
[Diagram: GPFS nodes 1–4, each with local disk/SSD, DSM, and LTFS LE, exporting the namespace over NFS/CIFS across the three sites, with an LTFS tape tier behind them.]
IBM System x GPFS Storage Server – A Revolution in HPC Intelligent Cluster Management!
“Twin Tailed” JBOD disk enclosures with x3650 M4 servers – a complete storage solution: data servers, disk (SSD and NL-SAS), software, InfiniBand and Ethernet.

Model 24 (“Light and Fast”):   4 enclosures, 20U                 232 NL-SAS + 6 SSD     10 GB/s
Model 26 (“HPC Workhorse”):    6 enclosures, 28U                 348 NL-SAS + 6 SSD     12 GB/s
High-density HPC options:      18 enclosures, 2 x 42U racks      1044 NL-SAS + 18 SSD   36 GB/s

A scalable building-block approach to storage.
Mean time to data loss (MTTDL), 8+2 vs. 8+3 parity

Parity   50 disks            200 disks           50,000 disks
8+2      200,000 years       50,000 years        200 years
8+3      250 billion years   60 billion years    230 million years

Simulation assumptions: disk capacity = 600 GB, MTTF = 600,000 hours, hard error rate = 1 in 10^15 bits, 47-HDD declustered arrays, uncorrelated failures. These MTTDL figures are due to hard errors; AFR (2-FT) = 5 x 10^-6, AFR (3-FT) = 4 x 10^-12. The figures assume uncorrelated failures and hard read errors.
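One way to read the table: with uncorrelated failures, MTTDL should fall roughly in proportion to the number of disks, and the figures follow that 1/N trend. A quick consistency check of that reading (the arithmetic is mine, not from the deck):

```python
# Quick consistency check (not from the deck): with uncorrelated failures,
# MTTDL is expected to scale roughly as 1/N_disks.  The table values follow
# that trend to within rounding.
table = {
    "8+2": {50: 200_000, 200: 50_000, 50_000: 200},     # years
    "8+3": {50: 250e9,   200: 60e9,   50_000: 230e6},   # years
}

for parity, mttdl in table.items():
    base_disks, base_years = 50, mttdl[50]
    for disks, years in mttdl.items():
        predicted = base_years * base_disks / disks      # 1/N scaling from the 50-disk point
        print(f"{parity}, {disks} disks: table={years:.3g} y, 1/N prediction={predicted:.3g} y")
```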
De-clustering – Bringing Parallel Performance to Disk Maintenance

Traditional RAID: 20 disks, 5 disks per RAID array, holding 4x4 RAID stripes (data plus parity). The narrow data+parity arrays mean that after a disk fails, the rebuild can use the IO capacity of only the array's 4 surviving disks, and because files are striped across all arrays, every file access is throttled by the rebuilding array's overhead.

Declustered RAID: the same 20 disks form one declustered array holding 16 RAID stripes (data plus parity), with data and parity distributed over all disks. After a disk fails, the rebuild uses the IO capacity of all 19 surviving disks, so the load on file accesses is reduced by 4.8x (= 19/4) during the array rebuild, as worked through in the short example below.
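The 4.8x figure is simply the ratio of surviving disks that share the rebuild work in the two layouts. A small worked example of that arithmetic:

```python
# Illustrative arithmetic for the rebuild example above (20 disks, one failure).
traditional_survivors = 4    # 5-disk RAID array minus the failed disk
declustered_survivors = 19   # one 20-disk declustered array minus the failed disk

# Rebuild work is spread over the surviving disks, so the per-disk rebuild
# load (and thus the impact on user IO) drops by the ratio of survivors.
reduction = declustered_survivors / traditional_survivors
print(f"Rebuild load per surviving disk reduced by {reduction:.1f}x")  # ~4.8x
```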
Low-Penalty Disk Rebuild Overhead
[Diagram: read/write (Rd-Wr) activity over time during rebuild of a failed disk, shown for the two cases.]
Reduces rebuild overhead by 3.5x.