TRANSCRIPT
IBM General Parallel File System (GPFS™) 3.5
File Placement Optimizer (FPO)
Rick Koopman
IBM Technical Computing Business Development Benelux
© 2012 IBM Corporation
GPFS 3.5 vs. HDFS feature comparison (X = supported)

Category                 Feature                     GPFS 3.5   HDFS
Performance              Terasort: large reads       X          X
                         HBase: small writes         X          X
                         Metadata intensive          X          X
Enterprise readiness     POSIX compliance            X
                         Metadata replication        X
                         Distributed name node       X
Protection & Recovery    Snapshots                   X
                         Asynchronous replication    X
                         Backup                      X
Security & Integrity     Access Control Lists        X
Ease of Use              Policy-based ingest         X
An enterprise-class replacement for HDFS.
A typical HDFS Environment
[Diagram: users submit jobs to a Map Reduce cluster backed by HDFS, alongside NFS filers.]
Uses disk local to each server
Aggregates the local disk space into a single, redundant shared filesystem
The open-source standard file system used with Hadoop MapReduce
Map Reduce Environment Using GPFS-FPO (File Placement Optimizer)
[Diagram: a Map Reduce cluster with GPFS-FPO in place of HDFS, alongside NFS filers and users submitting jobs.]
Uses disk local to each server
Aggregates the local disk space into a single redundant shared filesystem
Designed for MapReduce workloads
Unlike HDFS, GPFS-FPO is POSIX compliant, so data maintenance is easy (as sketched below)
Intended as a drop-in replacement for open-source HDFS (the IBM BigInsights product may be required)
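Because GPFS-FPO presents a POSIX file system, MapReduce data can be ingested and maintained with ordinary file APIs and tools rather than HDFS-specific commands (on stock HDFS the equivalent copy would go through tools such as hadoop fs -put). A minimal sketch, assuming a hypothetical GPFS-FPO mount at /gpfs/fpo; the paths and file names are illustrative only:

```python
import os
import shutil

GPFS_MOUNT = "/gpfs/fpo"                      # hypothetical GPFS-FPO mount point
raw_dir = os.path.join(GPFS_MOUNT, "raw")

# Ingest: a plain POSIX copy is enough, no HDFS-specific tooling required.
os.makedirs(raw_dir, exist_ok=True)
shutil.copy("/data/incoming/events.log", os.path.join(raw_dir, "events.log"))

# Maintenance: standard directory walks, stat() calls, renames, etc. all work.
for root, _dirs, files in os.walk(raw_dir):
    for name in files:
        path = os.path.join(root, name)
        print(path, os.path.getsize(path), "bytes")
```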
GPFS FPO advanced storage for Map Reduce Data
Hadoop HDFS vs. IBM GPFS advantages:
- HDFS: the NameNode is a single point of failure. GPFS: no single point of failure; metadata is distributed.
- HDFS: large block sizes, poor support for small files (a back-of-envelope sketch follows this list). GPFS: variable block sizes, suited to multiple types of data and data access patterns.
- HDFS: non-POSIX file system with obscure commands. GPFS: POSIX file system, easy to use and manage.
- HDFS: difficult to ingest data, special tools required. GPFS: policy-based data ingest.
- HDFS: single-purpose, Hadoop MapReduce only. GPFS: versatile and multi-purpose.
- HDFS: not recommended for critical data. GPFS: enterprise-class advanced storage features.
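The single-NameNode and small-file points are related: every file and block in HDFS is tracked in one NameNode's memory, so large numbers of small files inflate a single server's heap. A rough back-of-envelope sketch of that effect, using the commonly cited rule of thumb of about 150 bytes of NameNode heap per namespace object (the 150-byte figure and the file count are assumptions, not from this deck):

```python
# Rough estimate of NameNode heap consumed by small files (rule-of-thumb figures).
BYTES_PER_OBJECT = 150          # assumed heap cost per file/block object
num_files = 100_000_000         # e.g. 100 million small files
objects = num_files * 2         # ~1 file entry + 1 block entry per small file

heap_gb = objects * BYTES_PER_OBJECT / 1e9
print(f"~{heap_gb:.0f} GB of NameNode heap")   # ~30 GB on a single server
```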
Next-Generation Archiving Solutions: LTFS Storage Platforms
IBM Storage
The Problem – Network Disk Growth
A single user-defined namespace (C:/...) that is large and growing bigger.
Data protection: difficult to protect and back up – cost, backup windows, time to recovery – and the data mix reduces the effectiveness of compression/dedupe.
Operational manageability: cost; a data mix of rich media, databases, etc.; uses spanning active, time-sensitive access and static, immutable data.
The Solution – Tiered Network Storage
A single file system view over the user-defined namespace (C:/...), with policy-based tier migration between a disk tier and an LTFS tape tier (illustrated conceptually below).
Disk tier: smaller and easier to protect – faster time to recovery, smaller backup footprint – holding time-critical applications/data and high-use data (databases, email, etc.).
LTFS tape tier: lower-cost, scalable storage for static data, rich media, unstructured data, and archive, supporting replication backup strategies.
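In GPFS the tier migration above is driven by the file system's policy engine; purely to illustrate the idea, here is a sketch that demotes files not accessed for a given number of days from a disk tier to an LTFS tape tier mounted as a directory. The paths and the 90-day threshold are hypothetical, and a real deployment would use GPFS policies rather than a script like this:

```python
import os
import shutil
import time

DISK_TIER = "/storage/disk"      # hypothetical high-use disk tier
TAPE_TIER = "/storage/ltfs"      # hypothetical LTFS tape tier mount
AGE_DAYS = 90                    # demote files untouched for this long

cutoff = time.time() - AGE_DAYS * 86400

for root, _dirs, files in os.walk(DISK_TIER):
    for name in files:
        src = os.path.join(root, name)
        if os.path.getatime(src) < cutoff:             # last access older than cutoff
            dst = os.path.join(TAPE_TIER, os.path.relpath(src, DISK_TIER))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)                      # demote to the tape tier
            print("migrated", src, "->", dst)
```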
Smarter Storage – Distributed Data: Tokyo, London, Los Angeles
Namespace file view
Load balancing
Policy migration
Storage Distribution
Reduction of cost for storage
Data monetization
[Diagram: GPFS nodes 1–4, each with local disk/SSD, DSM, and LTFS LE, exporting the namespace over NFS/CIFS across the three sites, with an LTFS tape tier behind them.]
IBM System x GPFS Storage Server – A Revolution in HPC Intelligent Cluster Management!
“Twin Tailed” JBOD disk enclosures with x3650 M4 servers – a complete storage solution: data servers, disk (SSD and NL-SAS), software, InfiniBand and Ethernet.

Model 24 (“Light and Fast”):   4 enclosures, 20U                 232 NL-SAS + 6 SSD     10 GB/s
Model 26 (“HPC Workhorse”):    6 enclosures, 28U                 348 NL-SAS + 6 SSD     12 GB/s
High-density HPC options:      18 enclosures, 2 x 42U racks      1044 NL-SAS + 18 SSD   36 GB/s

A scalable building-block approach to storage.
Mean time to data loss (MTTDL), 8+2 vs. 8+3 parity

Parity   50 disks            200 disks           50,000 disks
8+2      200,000 years       50,000 years        200 years
8+3      250 billion years   60 billion years    230 million years

Simulation assumptions: disk capacity = 600 GB, MTTF = 600,000 hours, hard error rate = 1 in 10^15 bits, 47-HDD declustered arrays, uncorrelated failures. These MTTDL figures are due to hard errors; AFR (2-FT) = 5 x 10^-6, AFR (3-FT) = 4 x 10^-12. The figures assume uncorrelated failures and hard read errors.
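One way to read the table: with uncorrelated failures, MTTDL should fall roughly in proportion to the number of disks, and the figures follow that 1/N trend. A quick consistency check of that reading (the arithmetic is mine, not from the deck):

```python
# Quick consistency check (not from the deck): with uncorrelated failures,
# MTTDL is expected to scale roughly as 1/N_disks.  The table values follow
# that trend to within rounding.
table = {
    "8+2": {50: 200_000, 200: 50_000, 50_000: 200},     # years
    "8+3": {50: 250e9,   200: 60e9,   50_000: 230e6},   # years
}

for parity, mttdl in table.items():
    base_disks, base_years = 50, mttdl[50]
    for disks, years in mttdl.items():
        predicted = base_years * base_disks / disks      # 1/N scaling from the 50-disk point
        print(f"{parity}, {disks} disks: table={years:.3g} y, 1/N prediction={predicted:.3g} y")
```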
De-clustering – Bringing Parallel Performance to Disk Maintenance

Traditional RAID: 20 disks, 5 disks per RAID array, holding 4x4 RAID stripes (data plus parity). The narrow data+parity arrays mean that after a disk fails, the rebuild can use the IO capacity of only the array's 4 surviving disks, and because files are striped across all arrays, every file access is throttled by the rebuilding array's overhead.

Declustered RAID: the same 20 disks form one declustered array holding 16 RAID stripes (data plus parity), with data and parity distributed over all disks. After a disk fails, the rebuild uses the IO capacity of all 19 surviving disks, so the load on file accesses is reduced by 4.8x (= 19/4) during the array rebuild, as worked through in the short example below.
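The 4.8x figure is simply the ratio of surviving disks that share the rebuild work in the two layouts. A small worked example of that arithmetic:

```python
# Illustrative arithmetic for the rebuild example above (20 disks, one failure).
traditional_survivors = 4    # 5-disk RAID array minus the failed disk
declustered_survivors = 19   # one 20-disk declustered array minus the failed disk

# Rebuild work is spread over the surviving disks, so the per-disk rebuild
# load (and thus the impact on user IO) drops by the ratio of survivors.
reduction = declustered_survivors / traditional_survivors
print(f"Rebuild load per surviving disk reduced by {reduction:.1f}x")  # ~4.8x
```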
Low-Penalty Disk Rebuild Overhead
[Diagram: read/write (Rd-Wr) activity over time during rebuild of a failed disk, shown for the two cases.]
Reduces rebuild overhead by 3.5x.