
Page 1

© 2014 IBM Corporation

IBM GPFS 2014 / Elastic Storage
Title: Software Defined Storage in action with GPFS v4.1
Speaker: Frank Kraemer, IBM Systems Architect, mailto:[email protected]

Page 2

Agenda:

• File Systems Market Overview
• GPFS v4.1 News & Roadmap
• ILM with TSM/HSM & LTFS-EE
• Network Attached Storage (cNFS)
• GPFS Native RAID (GNR)
  o GPFS Storage Server (Intel x86)
  o Elastic Storage Server (Power8)
• GPFS-FPO (Hadoop/MapReduce)
• Summary & Roadmap

LEGO, the LEGO logo and the Minifigure are trademarks and/or copyrights of the LEGO Group.

Page 3

IBM GPFS vs. Competitors

Why choose GPFS?
1. Stability
2. Features
3. Scalability
4. OS platform support
5. Global namespace
6. References

Competitors (some):
• Lustre (Intel, DDN, Cray, Xyratex, ...)
• StorNext FS (Quantum)
• Gluster (Red Hat)
• Panasas (NAS)
• EMC Isilon (NAS)
• NetApp ONTAP v8.x (NAS)
• HDS HNAS/BlueArc (NAS)

Open source & research projects:
• Ceph (part of Red Hat as of April 30, 2014)
• BeeGFS (ex-Fraunhofer FS)
• dCache
• XtreemFS

More info: http://en.wikipedia.org/wiki/List_of_file_systems


Page 4

GPFS 1998

GPFS: A Shared‐Disk File System for Large Computing Clusters

Frank Schmuck and Roger Haskin, IBM Almaden Research Center, San Jose, CA

Page 5

GPFS history and milestones

[The original slide is a timeline graphic that did not survive this text export. Recoverable milestones:]
• Roots in the Tiger Shark multimedia file system; GPFS 1.x shipped in 1998 for AIX SP clusters (sp), with 1.4 (HACMP, SSA), 1.5, and 2.1 (HACMP, ESS) following around 2000-2002.
• GPFS 2.2 and 2.3 introduced the "lc" (loose cluster) model and added Linux alongside AIX, plus interoperability, Disaster Recovery (DR), and remote mount capabilities (WAN).
• GPFS 3.1, 3.2, and 3.3 brought Information Lifecycle Management (ILM) and extended platform coverage to pLinux and Windows 2008R2.
• GPFS 3.5 ran on AIX v6/7, Linux, and Windows (Win7 x64, Win2012R2); the 3.x line added GPFS AFM/Panache, GPFS Native RAID, and GPFS-SNC/Hadoop.
• GPFS 4.1 arrived in 2014 (v4.1.0.3 in October).
• A separate track shows the related IBM SAN File System (SFS), v1.0 and v1.1.

Page 6

Software Defined Storage for Dummies
http://www-01.ibm.com/common/ssi/cgi-bin/ssialias?subtype=BK&infotype=PM&appname=STGE_DC_ZQ_USEN&htmlfid=DCM03004USEN&attachment=DCM03004USEN.PDF

This book examines data storage and management challenges and explains software‐defined storage, an innovative solution for high‐performance, cost‐effective storage using the IBM General Parallel File System (GPFS).

http://en.wikipedia.org/wiki/Software-defined_storage

mailto:[email protected]

Page 7

GPFS = Software Defined Storage (SDS)

A single software-defined storage solution across all these application types: Technical Computing, Big Data & Analytics, and Cloud.
• Access methods: POSIX (file), GPFS NFS (file), Cinder (block), Swift (object), and the GPFS Hadoop Connector
• Linear capacity & performance scale-out
• Enterprise storage on standard hardware
• Single namespace
• Served by a GPFS Storage Server cluster

Page 8

GPFS DEVELOPMENT TEAMS
Background Information

Page 9

GPFS Almaden Research, CA

Latitude: 37°12′37.53″N / Longitude: 121°48′25.23″W

Page 10

GPFS Lab Poughkeepsie, N.Y.

Latitude: 41°39′8.35″N / Longitude: 73°56′5.20″W

Page 11

GPFS Support & Lab Mainz, Germany

Page 12

GPFS CONCEPTS
Tutorial

Page 13

GPFS Architecture (Basis)

Variant 1: Storage Area Network (SAN), shared SAS, twin-tailed disks, etc. Each SAN LUN maps 1:1 to a GPFS NSD.
(LUN = Logical Unit Number / NSD = Network Shared Disk)
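In practice, the LUN-to-NSD mapping is declared in an NSD stanza file and registered with mmcrnsd; the device, server, and pool names below are illustrative, so treat this as a configuration sketch rather than a recipe:

```
# nsd.stanza -- one stanza per LUN (illustrative names)
%nsd:
  device=/dev/sdb
  nsd=nsd001
  servers=nsdserver1,nsdserver2
  usage=dataAndMetadata
  failureGroup=1
  pool=system

# Register the NSD(s) described in the stanza file:
#   mmcrnsd -F nsd.stanza
```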

Page 14

GPFS Architecture (Common)

Variant 2: GPFS NSD servers access the LUNs over the SAN, and GPFS NSD clients reach the data over the LAN through those servers.

Page 15

GPFS Architecture (Typical)

Variant 3 (typical): GPFS NSD servers sit in front of the disk LUNs on an FC SAN, alongside twin-tailed disks, internal disks, GSS building blocks, and FPO/Hadoop nodes; GPFS NSD clients connect over LAN / WAN / InfiniBand (any mix).
(GSS = GPFS Storage Server // FPO = File Placement Optimizer)

Page 16

Remote Cluster Mount (synchronous), variant 4: NSD clients in a remote cluster mount the local cluster's file system over LAN / InfiniBand; the local cluster's NSD servers serve the LUNs.
(NSD = Network Shared Disk)

Page 17

GPFS Active File Management (asynchronous), variant 5: a remote cluster caches (read/write) data from the local cluster over WAN / InfiniBand; the local cluster's NSD servers serve the LUNs, and the remote cluster's NSD clients work against the cache.
(NSD = Network Shared Disk)

Page 18

GPFS System Structure
• An application's file system calls pass through the OS kernel (vnode/VFS layer) to the GPFS kernel extension (GPFS inode layer).
• The multi-threaded GPFS daemon (mmfsd) provides the configuration manager, file system manager, and metanode services, plus the NSD (Network Shared Disk) layer.
• GPFS administration commands (mm...) drive the daemon.
• The GPFS portability layer is required for Linux only.

Page 19

GPFS Metadata Services (roles of the multi-threaded GPFS daemon, mmfsd)

Configuration manager: 1 per cluster, elected by the quorum nodes
– drives recovery after node failure
– selects file system managers

File system manager(s): 1 per file system
– file system configuration
– disk space allocation
– token management
– quota management
– security services

Metanode(s): 1 per open file
– file metadata updates

Page 20

Consistency control: Locks and Tokens

• Client systems maintain local consistency: GPFS locks protect locally cached data buffers and file structures.
• Token servers grant clients a cached capability with global consistency: clients request and release tokens, and the server revokes them on conflict.
• Tokens come in multiple modes, are distributed via hash across the token servers, and the token service is recoverable.

[Diagram: applications on client systems holding locks on cached files (Foo.1-Foo.3) and data blocks; token servers tracking the corresponding tokens.]

Page 21

GPFS Replicated Data and Metadata

No designated "mirror" and no fixed placement function:
– flexible replication (e.g., replicate only metadata, or only important files)
– dynamic reconfiguration: data can migrate block-by-block
– mm<cr|ch>fs interfaces for administration

Inodes, indirect blocks, and/or data blocks may be replicated. Each disk address is a list of pointers to the replicas, and each pointer is a disk id plus a sector number.

Page 22

GPFS Failure Group (FG) concept
Failure Group: a collection of disks that could become unavailable simultaneously, e.g.,
– disks attached to the same storage controller
– disks served by the same NSD server

Used for two purposes:
– Replication: replicas of the same block must be on disks in two different failure groups.
– Striping: stripe across failure groups first, then across disks within a failure group: D1, D3, D5, D7, D2, D4, D6, D8.
Reason: a common point of failure is also a common resource that requires load balancing.

GPFS-FPO: an "extended failure group" conveys additional location information. Example: r,n = rack, node within rack; with replication 3, the second copy is placed in a different rack and the third copy in the same rack but on a different node.

[Diagram: disks D1-D8 in failure groups FG1-FG4, with extended failure groups 1,1 / 1,2 / 2,1 / 2,2 spanning rack 1 and rack 2.]
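The striping order above can be sketched as a round-robin across failure groups, then across disks within each group. This is a toy illustration of the ordering on the slide, not GPFS's actual block allocator:

```python
def stripe_order(failure_groups):
    """Interleave disks: one disk from each failure group per pass.

    Toy sketch of the slide's ordering; GPFS's real allocator is more
    sophisticated (allocation maps, regions, etc.).
    """
    order = []
    passes = max(len(fg) for fg in failure_groups)
    for i in range(passes):
        for fg in failure_groups:
            if i < len(fg):
                order.append(fg[i])
    return order

# FG1..FG4 from the slide's diagram:
fgs = [["D1", "D2"], ["D3", "D4"], ["D5", "D6"], ["D7", "D8"]]
print(stripe_order(fgs))  # ['D1', 'D3', 'D5', 'D7', 'D2', 'D4', 'D6', 'D8']
```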

Page 23

GPFS v3.5 already has full IPv6 support

• IPv6 (Internet Protocol version 6) is the version of the Internet Protocol intended to succeed IPv4.
• IPv6 was developed by the Internet Engineering Task Force (IETF) to deal with the long-anticipated IPv4 address exhaustion, and is described in Internet standard document RFC 2460, published in December 1998.
• While IPv4 allows 32 bits for an IP address, and therefore has 2^32 (4,294,967,296) possible addresses, IPv6 uses 128-bit addresses, for an address space of 2^128 addresses.
• IPv6 also implements additional features not present in IPv4; network security, including IPsec, is integrated into the design of the IPv6 architecture.

Page 24

GPFS VERSION 4.1
What's new with GPFS

Page 25

GPFS v4.1 (announced April 22, 2014)

• IBM GPFS Concepts, Planning, and Installation Guide (GA76-0441)
• IBM GPFS Administration and Programming Reference (SA23-1452)
• IBM GPFS Advanced Administration and Programming Reference (SC23-7032)
• IBM GPFS Problem Determination Guide (GA76-0443)
• IBM GPFS Data Management API Guide (GA76-0442)

http://www.ibm.com/common/ssi/cgi‐bin/ssialias?infotype=AN&subtype=CA&appname=gpateam&supplier=897&letternum=ENUS214‐079&pdf=yes

Page 26

GPFS v4.1 product structure

Socket-based licensing, with a Server and a Client license for each: simpler, no more PVUs.
• Express Edition: gpfs.base (no ILM, AFM, cNFS), gpfs.docs, gpfs.gpl, gpfs.msg, gpfs.gskit
• Standard Edition: adds gpfs.ext
• Advanced Edition: adds gpfs.crypto
• New platforms: zLinux, Ubuntu

Feature                                  Express   Standard   Advanced
Basic GPFS functionality                    ✓          ✓          ✓
ILM: storage pools, policy, mmbackup                   ✓          ✓
Active File Management (AFM)                           ✓          ✓
Clustered NFS (cNFS)                                   ✓          ✓
Encryption                                                        ✓

(Express and Standard feature content is the same as v3.5; the Advanced Edition and its encryption support are *NEW* in v4.1.)

Page 27

Encryption and NIST Compliance: native encryption support for GPFS v4.1 file systems

Addresses critical requirements: encryption of data at rest, and secure erase, which is mandatory today.
• User data and directory blocks are fully encrypted.
• Each file gets a per-inode file encryption key (FEK), which is wrapped with one or more master encryption keys (MEKs).
• MEK management is external to GPFS (TKLM, Tivoli Key Lifecycle Manager).
• GPFS v4.1 is NIST SP 800-131A compliant.
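The FEK/MEK relationship can be illustrated with a deliberately simplified key-wrapping toy. GPFS actually uses standard cryptographic key wrapping with keys served by an external key manager; everything below is illustrative only:

```python
import hashlib
import os

def wrap_key(fek: bytes, mek: bytes) -> bytes:
    """Toy 'wrap': XOR the FEK with a keystream derived from the MEK.

    NOT real key wrapping (no integrity, no AES-KW) -- illustration only.
    """
    stream = hashlib.sha256(mek).digest()
    return bytes(a ^ b for a, b in zip(fek, stream))

unwrap_key = wrap_key  # XOR is its own inverse

fek = os.urandom(16)   # per-inode file encryption key
mek = os.urandom(32)   # master encryption key, managed outside GPFS
wrapped = wrap_key(fek, mek)
assert unwrap_key(wrapped, mek) == fek
# Secure erase follows from this structure: destroy the MEK and every FEK
# wrapped with it (hence all file data they protect) becomes unrecoverable,
# with no need to overwrite the data blocks themselves.
```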

Page 28

Encryption and NIST Compliance

• Native: encryption is built into the “Advanced” GPFS product

• Protects data from security breaches, unauthorized access, and being lost, stolen or improperly discarded

• Cryptographic erase for fast, simple and secure file deletion

• Complies with NIST SP 800-131A and is FIPS 140-2 certified

• Supports HIPAA, Sarbanes-Oxley, EU and national data privacy law compliance

Page 29

Native Encryption and Secure Erase

Encryption of data at rest• Files are encrypted before they are stored on disk

• Keys are never written to disk

• No data leakage in case disks are stolen or improperly decommissioned

Secure deletion • Ability to destroy arbitrarily large subsets of a file system

• No “digital shredding”, no overwriting: secure deletion is a cryptographic operation

Page 30

Reliability, Availability and Serviceability (RAS) #1

Automated deadlock detection, notification, and debug data collection
– automated deadlock detection
– automated deadlock data collection
– automated deadlock breakup

Dump improvements
– daemon survival under heavy loads
– ability to dump more data

Message logging
– send message logs to the system event logging facility

Directory enhancements to allow shrinking
– merging mostly empty blocks
– allows larger directory block sizes

Page 31

Reliability, Availability and Serviceability (RAS) #2

User-defined node classes
– mmcrnodeclass, mmchnodeclass, mmdelnodeclass, mmlsnodeclass

Quota file improvements
– quota management can be enabled without unmounting the file system

fsck() speed improvements

Support for GPT-labeled NSDs
– adds a standard disk partition table (GPT type) to NSDs
– disk label support for Linux

The new GPFS NSD v2 format provides the following benefits:
– includes a partition table so that the disk is recognized as a GPFS device
– adjusts data alignment to support disks with a 4 KB physical block size
– adds backup copies of some key GPFS data structures
– expands some reserved areas to allow for future growth
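The node-class commands listed above might be used like this configuration sketch (node names and the pagepool value are illustrative):

```
# Create a node class, then use it as the target of other mm-commands:
mmcrnodeclass nsdNodes -N nsd1,nsd2,nsd3
mmlsnodeclass nsdNodes
mmchconfig pagepool=8G -N nsdNodes    # apply a setting to the whole class
mmchnodeclass nsdNodes add -N nsd4    # grow the class later
mmdelnodeclass nsdNodes
```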

Page 32

Performance & Health Monitoring: Network Performance Monitoring

The GPFS daemon caches statistics relating to RPCs; a set of up to seven statistics is cached per node:
– channel wait time
– send time TCP
– send time verbs
– receive time TCP
– latency TCP
– latency verbs
– latency mixed

GPFS RPC latency measurement: mmdiag --rpc

Ongoing enhancements in GPFS 4.1 TLs: disk performance monitoring, memory utilization monitoring.
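A per-node cache of recent RPC latency samples, as described above, might look like this toy sketch. The statistic names are hypothetical; GPFS's real statistics live inside mmfsd and are reported by mmdiag --rpc:

```python
from collections import deque

class RpcStatsCache:
    """Toy per-node cache of recent RPC statistics (hypothetical names)."""

    STATS = ("channelWait", "sendTimeTcp", "sendTimeVerbs",
             "receiveTimeTcp", "latencyTcp", "latencyVerbs", "latencyMixed")

    def __init__(self, window=64):
        # Keep only the most recent `window` samples per statistic.
        self._samples = {s: deque(maxlen=window) for s in self.STATS}

    def record(self, stat, millis):
        self._samples[stat].append(millis)

    def average(self, stat):
        s = self._samples[stat]
        return sum(s) / len(s) if s else None

cache = RpcStatsCache(window=3)
for ms in (10, 20, 30, 40):          # the oldest sample (10) falls out
    cache.record("latencyTcp", ms)
print(cache.average("latencyTcp"))   # 30.0 -> mean of (20, 30, 40)
```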

Page 33

Performance Improvements

Fine-Grained Directory Locking (FGDL)

Local Read Only Cache (LROC)
– overflow the file data cache to local SSD storage
– defined as an NSD with usage "localCache"
– configure it for data or metadata (inodes/directories)
– utilizes SSD as an extension of the GPFS buffer pool, saving more memory for applications
– automatic management of the local storage

Write Data Logging (WDL)
– takes advantage of NVRAM in GPFS client nodes to reduce the latency of small and synchronous writes
– write performance scales with the addition of GPFS client nodes

[Diagram: GPFS clients with LROC in front of a GPFS Storage Server cluster.]

Page 34

Backup/Restore Improvements

New tool to restore from a fileset snapshot into the active file system: only the blocks and attributes that have changed since the snapshot being restored are copied.

TSM configuration verification by mmbackup:
– The TSM B/A client must be installed, at the same version, on all nodes that will execute the mmbackup command.
– The TSM B/A configuration is verified to be correct before the backup executes.

Automatic TSM tuning adjustments:
"The mmbackup command can be tuned to control the numbers of threads used on each node to scan the file system, perform inactive object expiration, and modified object backup. In addition, the sizes of lists of objects expired or backed up can be controlled, or autonomically tuned to select these list sizes if they are not specified. List sizes are now independent for backup and expire tasks."
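A parallel backup as described above could be driven like this sketch; the file system name, TSM server names, and node class are illustrative:

```
# Incremental backup of /gpfs01, spreading work across two TSM servers
# and the nodes in the class backupNodes:
mmbackup /gpfs01 -t incremental --tsm-servers tsmsrv1,tsmsrv2 -N backupNodes
```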

Page 35

GPFS v4.1 on Windows via Cygwin (http://cygwin.com/)

Cygwin is:
– a large collection of GNU and open-source tools which provide functionality similar to a Linux distribution on Windows
– a DLL (cygwin1.dll) which provides substantial POSIX API functionality

GPFS:
– GPFS uses Cygwin for its shell execution environment only.
– All GPFS programs (executables/binaries) are native Windows binaries and have no linkage with Cygwin DLLs.
– Cygwin is needed because SUA has been completely removed by Microsoft in Windows Server 2012 R2 (see http://technet.microsoft.com/en-us/library/dn303411.aspx).

Page 36

New and changed commands

Changed with GPFS v4.1: mmaddcallback, mmafmctl, mmafmlocal, mmbackup, mmchcluster, mmchconfig, mmchfileset, mmchfs, mmcrcluster, mmcrfileset, mmcrfs, mmdiag, mmlsfs, mmlsmount, mmmigratefs, mmmount, mmrestorefs, mmsnapdir, mmumount

New with GPFS v4.1: mmafmconfig, mmchnodeclass, mmcrnodeclass, mmdelnodeclass, mmlsnodeclass, mmsetquota

Page 37

GPFS MULTICLUSTER
Cloud File Systems via WAN (IP)

Page 38

GPFS, the cloud 'backbone'

Why?
• Tie together multiple sets of data into a single namespace
• Allow multiple application groups to share portions of, or all, data
• Secure, available, high-performance data sharing
• Support of public and private clouds

Two (or more) GPFS clusters, each with its own LAN and SAN, connect via the GPFS NSD protocol on TCP/IP to create an enterprise-wide global namespace.
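Multicluster access is configured with the mmauth / mmremotecluster / mmremotefs commands; the cluster names, key-file names, and device names below are illustrative, so verify the exact syntax against the Administration Guide:

```
# On the cluster that owns the file system:
mmauth genkey new
mmauth update . -l AUTHONLY                     # enable authentication
mmauth add clusterB.example.com -k clusterB_key.pub
mmauth grant clusterB.example.com -f /dev/gpfs1

# On the accessing (remote) cluster:
mmremotecluster add clusterA.example.com -k clusterA_key.pub -n nodeA1,nodeA2
mmremotefs add rgpfs1 -f gpfs1 -C clusterA.example.com -T /gpfs1
mmmount rgpfs1 -a
```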

Page 39

GPFS Multicluster (cloud mode): Cluster A 'Europe' serving /gpfs1_clusterA, Cluster B 'US' serving /gpfs2_clusterB, and Cluster C 'Far East Asia', all connected over LAN / WAN via TCP/IP.

Page 40

GPFS Multicluster - Firewall

• bi-directional daemon communication
• data to the file system always uses port 1191 (the default)
• optional: mmchconfig tscTcpPort=PortNumber

Page 41

WIDE AREA DATA SERVICES
GPFS WAN Cache with AFM / Panache

Page 42

GPFS WAN Cache Support (AFM)

http://www.almaden.ibm.com/storagesystems/projects/panache/
http://static.usenix.org/event/fast10/tech/full_papers/eshel.pdf

Page 43

Motivation for GPFS AFM

• Data sharing across geographically distributed sites is common; while the bandwidth is decent, latency is high, and the network is unreliable and subject to outages.
• The infrastructure needs to be scalable to move data across the WAN, masking latency and fluctuating network performance.
• Applications want local performance for remote data: move the data closer to the compute servers.
• Traditional protocols for remote file serving are chatty and unsuitable; large files (VM images, virtual disks) are becoming predominant; existing caching systems are primitive.

Page 44

Global Namespace + AFM Cache

Clients of every cluster see the same namespace: /global/data1 through /global/data6.
• File system store1: cache filesets /data1, /data2 and /data5, /data6; local filesets /data3, /data4
• File system store2: local filesets /data1, /data2; cache filesets /data3, /data4 and /data5, /data6
• File system store3: cache filesets /data1, /data2 and /data3, /data4; local filesets /data5, /data6

See all data from any cluster; cache as much data as required, or fetch data on demand. Each data set is local ("home") in one file system and a cache in the others.

Page 45

AFM Cache Modes

Read Only (RO)
– The cache can only read data; no data changes are allowed.

Local Update (LU)
– Data is cached from home and changes are allowed, like SW mode, but changes are not pushed to home.
– Once data is changed, the relationship is broken, i.e. cache and home are no longer in sync for that file.

Single Writer (SW)
– Only the cache can write data; home must not change.
– Other peer caches have to be set up as read-only caches.

Independent Writer (IW)
– One or more filesets can be linked to the same home; other peer caches can point to the same home and can be set up as "iw" as well.

Change of modes
– SW and RO mode caches can be changed to any other mode.
– An LU cache can't be changed (too many complications/conflicts to deal with).
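Creating a cache fileset in one of these modes might look like the following sketch; the file system, fileset, and home path are illustrative, and the exact afmTarget syntax should be checked against the Administration Guide:

```
# Single-writer AFM cache fileset backed by an NFS-exported home directory:
mmcrfileset store1 data3 --inode-space new \
    -p afmMode=sw -p afmTarget=nfs://homeserver/gpfs/home/data3
mmlinkfileset store1 data3 -J /gpfs/store1/data3
```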

Page 46

AFM Technical Details

Home and cache carry mirrored directory trees (appl, data, web) but are independent file systems with separate inode spaces; the cache is itself a clustered FS. In SW mode the home FS must not be changed; in IW mode home can be changed.

The cache records the remote state per file, e.g. cache inode 100 stores the home inode number (1024) and its attributes (mtime, ctime, ...), kept current via LOOKUP/GETATTR:

[root@c25m4n03 fs10]# mmlsattr -d -X -L file435234
file name:            file435234
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
appendOnly:           no
flags:
storage pool name:    system
fileset name:         AFM_fs10
snapshot name:
creation time:        Fri Mar 22 10:35:06 2013
Windows attributes:   ARCHIVE
gpfs.pcache.inode: 0x0000000000500003597E255F00000001
gpfs.pcache.attr:  0x0000000000036126000000000[...]

Page 47

GPFS ILM
Information Lifecycle Management (ILM)

Page 48

GPFS Storage Pools & Policies

Motivation:
– Not all storage is the same: some is faster, cheaper, more reliable, ...
– Not all data are the same: some are more valuable, important, popular, ...

Storage Pool: a named collection of disks with similar attributes, intended to hold similar data.
– System pool: one per file system; holds all metadata
– Data pools: zero or more; hold only data
– External pool: off-line storage (e.g., tape) for rarely accessed data

Policy: a set of user-specified rules that match data to the appropriate pool.
– SQL-like syntax for selecting files based on file attributes, such as name or name pattern (e.g., *.jpg), owner, file size, time stamps, and extended attributes

Example tiering: SSD = "gold", 10k rpm SAS = "silver", 7200 rpm SATA = "bronze".
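A placement policy in the SQL-like rule language described above might look like this configuration sketch; the rule names, pool names, and thresholds are illustrative:

```
/* policy.rules -- initial file placement (illustrative) */
RULE 'jpegs-to-bronze' SET POOL 'bronze'
     WHERE LOWER(NAME) LIKE '%.jpg'
RULE 'big-to-silver'   SET POOL 'silver'
     WHERE FILE_SIZE > 1073741824
RULE 'default'         SET POOL 'gold'

/* Install with: mmchpolicy fs1 policy.rules */
```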

Page 49

Actionable intelligence for file storage tiering via GPFS

GPFS knows: file name, file type, file size, I/O size, block size, type of storage technology, latency of storage, locality of storage, time of last access, time of last change, time of last metadata change, time of file creation, clone attributes, file tree location, file heat, filesets, generation of the file's reuse, owning group, owning user, space efficiency of the file, and access permissions, plus another 27 miscellaneous attributes and custom extended attributes (XATTR).

Block storage knows only: I/O size, type of storage technology, compressible data set, latency of storage, locality of storage.

Page 50

GPFS Pools

When creating a file system or adding disks, specify the name of the pool that each disk belongs to → a pool is the collection of all disks with the same pool name (e.g., system, data1, data2).

Pools can have attributes specified via "stanzas", e.g., allocation map layout and block size. A separate allocation map per pool makes it efficient to find space in a particular pool.

Block size:
– All data pools must have the same block size (this allows migrating files one data block at a time).
– The system pool may have a different block size, but only if it is used for metadata only.

The pool assignment is recorded in the inode of each file:
→ a file can only "belong" to a single pool
→ writes fail if the pool is full (ENOSPC)

Page 51

GPFS Policies

Placement policy:
– evaluated at file creation time
– determines initial file placement and replication

Migration policy:
– evaluated periodically or on demand
– can move data between pools, change replication, delete data, or run arbitrary user commands

Policy engine (mmapplypolicy):
– fast, parallel directory traversal combined with an inode scan
– runs outside the daemon, but makes use of GPFS infrastructure and APIs (extended readdir, inode scan)
– can be used as a powerful framework for building parallel file system utilities, e.g. fast find/grep, remote replication

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom.ibm.cluster.gpfs.v3r5.gpfs100.doc%2Fbl1adm_mmapplypolicy.htm
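A migration rule evaluated by mmapplypolicy might look like this configuration sketch; the rule name, pools, thresholds, and the 30-day cutoff are illustrative:

```
/* migrate.rules -- move cold files off the gold pool when it fills up.
   THRESHOLD(90,70): start migrating at 90% full, stop at 70%. */
RULE 'cold-to-silver'
     MIGRATE FROM POOL 'gold' THRESHOLD(90,70) TO POOL 'silver'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30

/* Run on demand: mmapplypolicy fs1 -P migrate.rules */
```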

Page 52

GPFS Filesets & Fileset Snapshots

Fileset: a partition of the file system namespace (a sub-directory tree), e.g. root containing filesets fset1, fset2, fset3.
– Allows administrative operations at finer granularity than the entire file system, e.g., disk space limits, user/group quotas, snapshots, caching, ...
– Can be used to refer to a collection of files in policy rules.

Independent Fileset: a fileset with a reserved set of inode block ranges (an "inode space").
– Allows a per-fileset inode scan.
– Enables fileset snapshots (inode copy-on-write operates on inode blocks).
– Separate inode limit and inode file expansion for each inode space → the active inode file may become sparse.
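Independent filesets and their snapshots are managed with commands like the following sketch; the file system, fileset, and snapshot names are illustrative, and the exact mmcrsnapshot syntax should be verified against the Administration Guide:

```
# Create an independent fileset (its own inode space), link it, snapshot it:
mmcrfileset fs1 fset1 --inode-space new
mmlinkfileset fs1 fset1 -J /gpfs/fs1/fset1
mmcrsnapshot fs1 snap1 -j fset1      # fileset-level snapshot of fset1
mmcrsnapshot fs1 globalsnap          # global (whole file system) snapshot
```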

Page 53

GPFS Fileset Snapshots

[Diagram, garbled in this export: the active file system (root with filesets fset1 and fset2) alongside per-fileset snapshots (fset1 snapshot, fset2 snapshot) and a global snapshot; snapshots use copy-on-write, with "ditto" resolution between snapshot levels.]

Page 54

GPFS BACKUP & RESTORE
GPFS Backup, Restore and HSM via TSM


Backup/Restore via Tivoli Storage Manager

• GPFS can use multiple TSM servers in parallel
• The TSM B/A client for GPFS runs on each node
• Backup & restore are done in parallel
• LAN-free mode is possible
• The GPFS policy engine is used; no file-tree walk is needed

(Diagram: the GPFS file system /gpfs01 is backed up (1) to TSM disk storage pools on TSM servers #1…#N, which migrate (2) data to tape and back up the storage pools (3) to copy pools #1 and #2; restores (4) flow back into GPFS. The setup scales by adding TSM servers.)
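The fan-out across several TSM servers can be pictured as a simple partitioning of the backup candidate list. The helper below is purely illustrative; GPFS/mmbackup distributes work through the policy engine, not through this exact scheme:

```python
def assign_to_servers(files, n_servers):
    """Round-robin a list of backup candidates across n_servers TSM servers.

    Hypothetical sketch of the fan-out idea only: each shard would be
    handled by one B/A client session against one server, in parallel.
    """
    shards = [[] for _ in range(n_servers)]
    for i, f in enumerate(files):
        shards[i % n_servers].append(f)
    return shards
```

With two servers, five candidate files split into shards of three and two, which back up concurrently instead of serially.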


What is ‘mmbackup’?


GPFS HSM via DMAPI

(Diagram: migration through DMAPI replaces each file with a stub carrying an HSM object ID / handle.)


GPFS Hierarchical Storage Management

(Diagram: HSM client operations between the GPFS file system and the TSM server. A normal file can be backed up and restored, with a number of versions kept. Migrate replaces the file with a stub holding an object ID (DMAPI handle); a premigrated file keeps its data on disk alongside the server copy; recall brings migrated data back. After a restore with migstate=yes, a stub with a new DMAPI handle is created and the link to the restore path is re-established.)
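The three HSM file states shown on the slide (resident, premigrated, migrated) and the transitions between them can be summarized as a small table. This is an illustrative sketch only; the real client drives these transitions through DMAPI events, not a lookup table:

```python
# Hypothetical model of the TSM/HSM file states and transitions.
TRANSITIONS = {
    ("resident",    "premigrate"): "premigrated",  # copy to server, keep data on disk
    ("resident",    "migrate"):    "migrated",     # copy to server, leave stub only
    ("premigrated", "migrate"):    "migrated",     # drop local data, stub remains
    ("migrated",    "recall"):     "premigrated",  # data back on disk, server copy kept
}

def apply_action(state, action):
    """Return the next HSM state, or raise on an impossible transition."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"invalid action {action!r} in state {state!r}")
```

Migrating a premigrated file is cheap (the server copy already exists), which is why SOBAR asks for pre/migration of all files before taking the image.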


GPFS Hierarchical Storage Management


GPFS SOBAR Backup Procedure

1. Preparation: pre-migrate or migrate all files
2. Information collection: create a file system configuration backup file; create a file system snapshot & file system image
3. TSM backup: back up the file system configuration & file system image to TSM

Scale Out Backup and Restore (SOBAR) is a specialized disaster-protection mechanism, available only for GPFS file systems that are managed by Tivoli Storage Manager (TSM) Hierarchical Storage Management (HSM).


GPFS SOBAR Restore Procedure

1. TSM restore: restore the file system configuration & file system image
2. Target FS preparation: extract & apply the file system configuration; create NSDs and the file system
3. Extract the file system image: mount the file system, recreate the file system image
4. Start production: start the HSM daemons & remount the file system; add HSM management and start recall


GPFS ILM VIA LTFS-EE
Linear Tape File System Enterprise Edition


GPFS ILM with LTFS-EE
• LTFS Enterprise Edition integrates LTFS with GPFS
– LTFS represents an external tape pool to GPFS
– Files can be migrated using GPFS policies or LTFS EE commands
– Similar implementation as with TSM HSM
• LTFS EE can be configured on multiple nodes
– Multiple instances of LTFS EE share the same tape library

(Diagram: GPFS nodes running LTFS LE+ and LTFS EE, attached via a Fibre Channel SAN to an IBM TS3500 tape library; the GPFS user file system holds the data.)


GPFS and LTFS EE integration

User accesses file system on all GPFS nodes

User file system is staging area for subsequent migration

HSM integrates with GPFS user file system and MMM to manage migration and recall

MMM manages workload over all LTFS EE instances

LTFS LE+ manages tape access via local tape drives

Metadata file system stores shared LTFS tape index


GPFS HSM and LTFS EE
• The HSM client integrates with DMAPI to intercept file access
• The HSM client calls the migration driver / MMM to perform migration
– Migration can be triggered manually or via policies
– Migration moves the file to LTFS and leaves a stub
– The stub includes a reference to the directory on LTFS tape
– MMM performs load balancing
• When a stub is accessed, the HSM client calls MMM
– MMM identifies free resources and performs a recall from LTFS tape
– After the entire file is back on disk, user access is granted

(Diagram: file access on the user file system raises DMAPI events to the HSM client; migration flows via the migration driver and MMM to LTFS LE+ and the tape library; recalls flow back to disk; other LTFS nodes share the workload.)


LTFS-EE Tape Import Feature
• Import adds the specified tape to the LTFS Enterprise Edition system
– Adds stub files in the GPFS file system; imported files are in migrated state
– No file data movement; the actual file data remains on tape
– File data can still be accessed (recalled) via the stub
• LTFS SDE and LE tapes can also be imported
– They are first converted to LTFS EE tapes
• LTFS EE provides a command for tape import: ltfsee import
– Import can be done to a specific directory in the GPFS file system
– Options rename, overwrite, or ignore can be used to manage conflicts with existing files
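How the rename / overwrite / ignore options might resolve a name clash can be sketched as follows. The suffixing scheme (".1", ".2", ...) is an assumption for illustration, not the documented LTFS EE naming rule:

```python
def resolve_import(name, existing, policy):
    """Decide the target name for an imported stub when a file of the
    same name may already exist in the GPFS file system.

    Returns the name to create, or None when the import is skipped.
    Mirrors the documented options rename / overwrite / ignore; the
    '.N' rename suffix is hypothetical.
    """
    if name not in existing:
        return name
    if policy == "overwrite":
        return name            # replace the existing file
    if policy == "ignore":
        return None            # keep the existing file, skip the import
    if policy == "rename":
        i = 1
        while f"{name}.{i}" in existing:
            i += 1
        return f"{name}.{i}"
    raise ValueError(f"unknown conflict policy {policy!r}")
```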


LTFS-EE Tape Export Feature
• Export removes tapes from the LTFS EE system for vaulting or data exchange
– Removes tapes from the pool
– Exported tapes are no longer targets for migrations or recalls
– Files (stubs) migrated to an exported tape can be deleted or kept in GPFS
– An export message can be added to file stubs (64 bytes)
• Export with the offline option keeps the file stubs in GPFS
– Files remain visible in the GPFS namespace but are no longer accessible
• LTFS EE provides a command for tape export: ltfsee export
– To move a tape to the I/O station, use ltfsee tape move ieslot


LTFS-EE References

LTFS EE InfoCenter: http://pic.dhe.ibm.com/infocenter/ltfsee/cust/index.jsp

GPFS InfoCenter: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp

LTFS EE Redbook: http://www.redbooks.ibm.com/redpieces/abstracts/sg248143.html

LTFS EE Installation Demo: http://www.youtube.com/watch?v=bF5tHAjp5xA&feature=youtu.be


NETWORK ATTACHED STORAGE (NAS)
CIFS/SMB & NFS via GPFS


GPFS Clustered NFS (cNFS)

• Feature of GPFS on Linux
• Share files with non-GPFS clients using the NFS protocol
• The NFS daemon (nfsd) of the Linux OS is used as normal
• All nodes can share the same data
• If an NFS server node fails, client connections are moved to another server
• NFS server node(s) need a GPFS server license; NFS clients need no GPFS license

(Diagram: NFS clients on the LAN connect to the NFS server(s) — Linux only! — backed by GPFS NSDs and the GPFS file system; GPFS clients run on AIX, Linux, OS X, Windows.)


GPFS Clustered NFS (cNFS) #2

# Enable cNFS on the GPFS cluster
1> mmchconfig cnfsSharedRoot=<Dir_Path_Name>

# Add each node with the correct IP interface
2> mmchnode -N <node_name> --cnfs-enable --cnfs-interface=<nfs_ip>

# Check cluster status
3> mmlscluster --cnfs

# Done!

http://www.redbooks.ibm.com/redpapers/pdfs/redp4400.pdf


User Space NFS v4

“NFS-Ganesha”
NFS-GANESHA is an NFS server running in user space. It is available under the LGPL license.

It was designed to meet two goals:
1. providing very large metadata and data caches (up to millions of records)
2. providing NFS exports to various file systems and namespaces (a set of data organized as trees, with a structure similar to a file system)

NFS-GANESHA uses dedicated backend modules called FSALs (File System Abstraction Layer) that provide the product with a single internal API for accessing the underlying namespace. The FSAL module is essentially the "glue" between the namespace and the rest of NFS-GANESHA.

https://github.com/nfs-ganesha/nfs-ganesha/wiki
https://github.com/nfs-ganesha/nfs-ganesha/
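The FSAL idea — one abstract interface, pluggable backends — can be sketched in a few lines. Class and method names here are illustrative Python, not the real C FSAL API:

```python
from abc import ABC, abstractmethod

class FSAL(ABC):
    """Toy stand-in for the File System Abstraction Layer: the protocol
    layer only ever talks to this interface, never to a concrete backend."""
    @abstractmethod
    def lookup(self, path): ...
    @abstractmethod
    def read(self, handle): ...

class DictFSAL(FSAL):
    """Hypothetical backend exporting an in-memory namespace; a real
    deployment would plug in FSAL_GPFS or FSAL_VFS here instead."""
    def __init__(self, tree):
        self.tree = tree
    def lookup(self, path):
        if path not in self.tree:
            raise FileNotFoundError(path)
        return path            # the 'handle' is just the path in this toy
    def read(self, handle):
        return self.tree[handle]

def serve_read(fsal, path):
    # What the NFS protocol layer does: resolve a handle, then read it,
    # without knowing which backend sits underneath.
    return fsal.read(fsal.lookup(path))
```

Swapping `DictFSAL` for a GPFS-backed implementation changes nothing above the interface, which is exactly the point of the FSAL design.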


GANESHA NFS

File System Abstraction Layer = FSAL_GPFS

GANESHA, a multi-usage NFSv4 server with a large cache (part of SONAS / V7000U v1.5)


SAMBA – Does it work with GPFS ?

(Diagram: SMB/CIFS clients connect via SMB/CIFS to the nodes of a GPFS cluster holding the data; CTDB runs on the cluster nodes.)

Many customers use Samba with CTDB (which enables clustered Samba) to share GPFS data with SMB/CIFS clients.

Clustered Trivial Database (CTDB)


Samba/CTDB/GPFS Update

Find more technical details:
http://sambaxp.org/past-conferences/sambaxp-2013/archive.html
http://sambaxp.org/program/schedule.html

Reminder: SMB/CIFS support via Samba or other software is not provided or supported by IBM with GPFS – you're on your own!


GPFS NATIVE RAID (GNR)
GPFS Perseus – Declustered RAID


GPFS Native Raid (GNR)

Features
• Auto rebalancing
• Only 2% rebuild performance hit
• Reed-Solomon erasure code, "8 data + 3 parity"
• ~10^5-year MTTDL for a 100-PB file system
• End-to-end, disk-to-GPFS-client data checksums

Software RAID on the I/O servers:
• SAS-attached JBODs
• Special JBOD storage drawer for very dense drive packing
• Solid-state drives (SSDs) for metadata storage

(Diagram: NSD servers on the LAN, each exposing vdisks built on SAS-attached JBODs.)
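The capacity trade-off between the 8+2p / 8+3p erasure codes and plain replication is simple arithmetic; a quick sketch:

```python
def erasure_efficiency(data_strips, parity_strips):
    """Fraction of raw capacity available for user data with an
    erasure code of data_strips + parity_strips."""
    return data_strips / (data_strips + parity_strips)

def replication_efficiency(copies):
    """Fraction of raw capacity available with n-way replication."""
    return 1 / copies
```

8+3p yields about 73% usable capacity (8/11) and 8+2p yields 80%, versus only 33% for 3-way replication — while still tolerating two or three concurrent disk failures.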


GNR is a software implementation of storage RAID technologies 

(Diagram: "classic" GPFS with external RAID controllers vs. GPFS Native RAID running on the servers.)


GNR Fault Tolerance
2- or 3-fault-tolerant RAID:
– 8 data strips + 2 or 3 parity strips
– 3- or 4-way replication

When one disk is down (most common case):
– Rebuild slowly, with minimal impact to the client workload

When three disks are down (rare case):
– The fraction of stripes that have three failures is ~1%
– Quickly get back to the non-critical (2-failure) state, vs. rebuilding all stripes as in conventional RAID
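The "small fraction of stripes" claim follows from declustering: with strips spread uniformly over the array, the chance that a given stripe contains all three failed disks is hypergeometric. A sketch, assuming a 58-disk declustered array and 8+3p stripes of width 11 (numbers taken from the GSS layout slide, not a general rule):

```python
from math import comb

def critical_fraction(n_disks, stripe_width, failures=3):
    """Probability that a given stripe includes all `failures` failed
    disks, when its strips are placed uniformly over n_disks.

    Illustrative model of declustered RAID; real GNR placement also
    balances spares and rebuild load.
    """
    return comb(n_disks - failures, stripe_width - failures) / comb(n_disks, stripe_width)
```

For n_disks=58 and stripe_width=11 this gives about 0.5% of stripes in the critical state after three failures — the same order of magnitude as the slide's ~1% — so only that small fraction needs urgent rebuild.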


GPFS GNR v2.5

Supported server hardware:

GPFS Storage Server V2.5, consisting of two IBM Power System S822L servers (type 5146), with either:
• 128 GB memory (models 21S and 22S)
• 256 GB memory (models 24S and 26S)

GPFS Native RAID for GPFS Storage Server is also supported with the Lenovo Intelligent Cluster and current GSS Lenovo x86 solutions.

*NEW* Oct 6th, 2014


GPFS STORAGE SERVER (GSS)
Declustered Software RAID Building Block


GPFS Storage Server (GSS)

Benefits of GSS:
• 3 years maintenance and support
• Improved storage affordability
• Delivers data integrity, end to end
• Faster rebuild and recovery times
• Reduces rebuild overhead by 3.5x

Features:
• Declustered RAID (8+2p, 8+3p)
• 2- and 3-fault-tolerant erasure codes
• End-to-end checksum
• Protection against lost writes
• Off-the-shelf JBODs
• Standardized in-band SES management
• Built-in SSD acceleration

Local Area Network (LAN)

GPFS

http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/doc_updates/bl1du13a.pdf


GSS v2.0 'Runs' GPFS v4.1 (Release "R2.0")

GUI: configuration, performance monitoring
Hardware changes: new servers and cards; smaller trays; SSD, SAS and NL-SAS drives
Software enhancements: enclosure protection

(Diagram: GSS#1 and GSS#2 building blocks.)

GPFS Native RAID @ x86 – Building block on 6x IBM JBOD, recovery group layout

(Diagram: two NSD servers (IBM x3650-M4, each with three LSI 9201-16e HBAs, PCIe gen2 x8) attached to six 60-slot disk enclosures, JBOD01-06. Each server owns one recovery group: RG01 and RG02 each contain three declustered arrays of 58 disks (DA1-DA3) plus a log group using 3-way replication across 3 HDDs, with SSDs in the remaining slots.)


GPFS GSS GUI #1 – Top-Level Navigation

• Home• Monitoring• Files• Volumes • Copy Services• Access• Configuration

(Preview Information)


GPFS GSS GUI #2

(Preview Information)


GPFS GSS GUI #3

(Preview Information)


IBM System x GPFS Storage Server (GSS) 2.0
Introducing four *new* models for entry-level high-performance storage servers

Model 21s: 24 SSDs
Model 22s: 48 SAS or SSD drives
Model 24s: 96 SAS or SSD drives
Model 26s: 144 SAS drives

What's inside the new models?
• Server: IBM System x3650 M4
• Storage: 2U JBOD (24 slots)
– 24 SSDs (Model 21s)
– 1.2 TB SAS drives, plus a choice of 2 x 200 GB or 2 x 800 GB SSDs (Model 22s)
• Networking: 10/40 Gb Ethernet and/or FDR InfiniBand
• Software: GPFS 4.1

Balanced system – high capacity and performance; near-linear scalability; less hardware – more reliable, lower cost; pre-integrated, shipped with one part number, 3-year support; fast disk rebuilds

Announce: June 10, 2014; ship support: June 12; GA: June 13
Owner: Scott Seal, [email protected] · Kit: SSI, PartnerWorld


Non-intrusive disk diagnostics
GPFS/GNR Disk Hospital

Background problem determination:
• While a disk is in the hospital, GNR non-intrusively and immediately returns data to the client using the error correction code
• For writes, GNR non-intrusively marks the write data and reconstructs it later in the background, after problem determination is complete

Advanced fault determination:
• Statistical reliability and SMART monitoring
• Neighbor check, drive power cycling
• Media error detection and correction
• Supports concurrent disk firmware updates


GPFS GSS

http://www.ibm.com/systems/technicalcomputing/platformcomputing/products/gpfs/


20x IBM GSS‐24 @ FZ Jülich

http://www.fz-juelich.de/ias/jsc/EN/Expertise/Datamanagement/OnlineStorage/JUST/Configuration/Configuration_node.html

(4640 Disks + 120 SSDs)


ELASTIC STORAGE SERVER (ESS)
Declustered Software RAID Building Block


GPFS Elastic Storage Server (ESS)

*NEW* Oct 6th 2014

Components:
• Power8 server hardware
• Red Hat Enterprise Linux 7 for Power
• GPFS Standard Edition v4.1
• GPFS Native RAID v4.1
• IBM Support for xCAT 2

Models: 5146-GL2, 5146-GL4, 5146-GL6, 5146-GS1, 5146-GS2, 5146-GS4, 5146-GS6
Enclosures: DCS3700, EXP24S


IDEA – GPFS Elastic Storage Server (ESS)

*NEW* Oct 6th 2014

IBM Data Engine for Analytics is a customized infrastructure solution with integrated software that is optimized for big data and analytics workloads.


TSM BACKUP TO DISK
TSM Backup to Disk Storage Pool on GPFS GSS


Whitepaper

• 2 x TSM servers: IBM x3650-M4, Red Hat Enterprise Linux Server release 6.5, IBM Tivoli Storage Manager v7.1
• 1 x IBM System x GPFS Storage Server GSS26: 6 x 4U-60 drawers with 58 x 2 TB NL-SAS disks each, 348 disks in total
• 1 x Mellanox 32-port InfiniBand FDR switch; each TSM server is connected to the GSS system with a 56 Gbit/s link

"More TSM bang for the buck than EMC Isilon…"


PoC Hardware Setup


GSS as Backend Disk Storage for TSM

• Peak performance for a single TSM server is 5.4 GB/s sequential write with 10 or 50 parallel sessions
• Peak performance for both TSM servers is 4.5 GB/s per server, i.e. 9 GB/s aggregate sequential write, with 10 sessions per server (or 3.8 GB/s per server with 50 sessions per server)
• Performance for a single sequential write session starts at 100 MB/s with 100 KB file size and reaches 2.5 GB/s with 1 GB file size
• Multiple sequential write sessions start at 12 MB/s per session (50 parallel sessions = 600 MB/s) with 100 KB file size and reach 108 MB/s per session (50 parallel sessions = 5.4 GB/s) with 1 GB file size

Environment: 1 x GSS26 connected via dedicated 56 Gbit/s InfiniBand links to 2 x TSM v7.1 servers


GPFS‐FPO Shared Nothing Cluster and Hadoop


GPFS‐FPO for Hadoop, BigData &  HANA

PERFORMANCE & FLEXIBILITY

IMPROVED DATA SHARING FORBETTER COLLABORATION

BUSINESS CONTINUITY AND DATA INTEGRITY

MORE EFFECTIVE MANAGEMENT OF DATA OVER ITS LIFECYCLE

AVOID EXPENSIVE DATA SILOS WITH MORE VERSATILE STORAGE

Enterprise features


What is HDFS?
The Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable file system written in Java for the Hadoop framework.

File access can be achieved through the native Java API, the Thrift API to generate a client in the language of the users' choosing (C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml), the command‐line interface, or browsed through the HDFS‐UI webapp over HTTP. 

http://en.wikipedia.org/wiki/Apache_Hadoop

Rangers know: "Lots of yellow Elephants can Cause Extensive Damage to your IT!"


Research Paper

“In this paper, we revisit the debate on the need of a new non‐POSIX storage stack for cloud analytics and argue, based on an initial evaluation, that it can be built on traditional POSIX‐based cluster filesystems.“


Cluster storage configuration for Hadoop and GPFS storage

Example with 4 datanodes (3 internal disks per datanode):
• 2 storage pools: a system pool for metadata (1 disk per node) and an FPO data pool (2 disks per node)
• Several filesets in the data pool to manage block replication factors:
– root fileset, replication factor 3: /gpfs-fpo
– mrl fileset for the map local dir, replication factor 3: /gpfs-fpo/hadoop/mapred/local/node1-4
– tmp_set fileset for the Hadoop framework, replication factor 1: /gpfs-fpo/tmp/hadoop4

No namenode any more: metadata is distributed across the datanodes in a dedicated storage pool (the system pool), using physical disks or disk partitions.

(Diagram: Linux disks /dev/sda–/dev/sdc on each datanode, exposed as NSDs nsd1–nsd12; nsd1–nsd4 form the system pool for metadata, nsd5–nsd12 the FPO data pool with three filesets.)
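The effect of those per-fileset replication factors on usable capacity is simple arithmetic; a sketch with hypothetical numbers:

```python
def usable_tb(raw_tb, mix):
    """Usable capacity given raw TB and a replication mix.

    mix: list of (fraction_of_user_data, replication_factor) pairs,
    fractions summing to 1. Illustrative capacity math only; it ignores
    metadata, spare space, and block-level overheads.
    """
    assert abs(sum(f for f, _ in mix) - 1.0) < 1e-9
    raw_per_usable_byte = sum(f * r for f, r in mix)
    return raw_tb / raw_per_usable_byte
```

With replication 3 everywhere, 90 TB raw yields 30 TB usable; keeping half of the user data (e.g. a scratch/tmp fileset) at replication 1 lowers the average raw cost from 3x to 2x.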


GPFS-FPO for Hadoop/BigInsights
http://www.ibm.com/systems/technicalcomputing/platformcomputing/products/gpfs/


GPFS-FPO new capabilities for BigInsights – File system reliability

• GPFS-FPO avoids the need for a central namenode, a common failure point in HDFS

• Avoid long recovery times in the event of name node failure

• Pipelined replication for efficient storage of block replicas in GPFS-FPO environment

• Boost performance for meta-data intensive applications where the name-node can emerge as a bottleneck.

(Diagram: HDFS with its namenode and secondary namenode vs. an IBM BigInsights cluster with GPFS-FPO, where metadata is striped across the FPO nodes — providing better reliability and avoiding the need for primary and secondary namenodes.)


GPFS-FPO new capabilities for BigInsights – Flexible storage configuration

• GPFS-FPO's distributed metadata avoids the need for a central namenode, a common failure point in HDFS environments
• Avoids long recovery times in the event that the namenode fails and metadata needs to be recovered from the secondary namenode
• Pipelined replication for efficient storage of block replicas in a GPFS-FPO environment

(Diagram: an IBM BigInsights cluster with GPFS-FPO can combine shared-nothing storage (GPFS-FPO nodes) with shared storage (GPFS servers on a switched fabric).)


GPFS-FPO (File Placement Optimizer) – advanced storage for MapReduce

Hadoop HDFS → IBM GPFS-FPO advantages:
• HDFS NameNode is a single point of failure → No single point of failure, distributed metadata
• Large block sizes – poor support for small files → Variable block sizes – suited to multiple types of data and data access patterns
• Non-POSIX file system – obscure commands → POSIX file system – easy to use and manage
• Difficult to ingest data – special tools required → Policy-based data ingest
• Single-purpose, Hadoop MapReduce only → Versatile, multi-purpose
• Not recommended for critical data → Enterprise-class advanced storage features


SUMMARY & ROADMAP


GPFS Elastic Storage Vision


GPFS Wiki, FAQ & Forums

• GPFS Home Page
http://www.ibm.com/systems/gpfs

• GPFS Wiki
http://www.ibm.com/developerworks/wikis/display/hpccentral/General+Parallel+File+System+(GPFS)

• GPFS FAQ
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.pdf

• GPFS Forum and Mailing list
http://www-128.ibm.com/developerworks/forums/dw_forum.jsp?forum=479&cat=13
http://lists.sdsc.edu/mailman/listinfo.cgi/gpfs-general
