
Post on 19-Mar-2020


Page 1: IBM Spectrum Scale Scalable Global Parallel File Systems(GPFS)konferenz-nz.dlr.de/pages/samfs2019/present/2. Konferenztag/1 - HSM... · Introducing IBM Spectrum Storage for AI with

IBM Spectrum Scale

Scalable Global Parallel File Systems(GPFS)

Spectrum Scale 2019

Josef (Sepp) Weingand

Business Development Leader DACH – Data Retention Infrastructure - Tape Storage

Infos / Find me on: [email protected], +49 171 5526783

Blog http://sepp4backup.blogspot.de/

Facebook https://www.facebook.com/Sepp4Tape/

http://www.linkedin.com/pub/josef-weingand/2/788/300

http://www.facebook.com/josef.weingand

http://de.slideshare.net/JosefWeingand

https://www.xing.com/profile/Josef_Weingand

https://www.xing.com/net/ibmdataprotection


© IBM Corporation 2019 2


Spectrum Scale I/O for Summit

• Single name space supporting 250 PB capacity

• Total number of files supported is 100 billion (10 million files per single directory)

• Single node: 16 GB/s sequential read/write, as requested by ORNL

• Performs at an aggregate sequential peak read/write bandwidth of 2.5 TB/s

• Performs at an aggregate random peak read/write bandwidth of 2.2 TB/s

• Provides rich metadata performance: single-directory parallel create rate of 50,000/s

• Provides rich interactive performance: 2.6 million IOPS at 32 KiB I/O
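As a rough back-of-the-envelope check on the figures above, the following Python sketch derives two numbers the slide does not state: how many nodes would have to stream at the full single-node rate to reach the aggregate sequential peak, and the bandwidth implied by the 32 KiB IOPS figure. It assumes decimal units (1 TB = 1000 GB) and perfectly linear scaling across nodes, which a real system will only approximate.

```python
import math

SINGLE_NODE_GBPS = 16          # GB/s sequential per node (from the slide)
AGGREGATE_SEQ_TBPS = 2.5       # TB/s aggregate sequential peak (from the slide)
IOPS = 2.6e6                   # I/O operations per second at 32 KiB (from the slide)
IO_SIZE_BYTES = 32 * 1024      # 32 KiB

# Assumption: decimal units and perfectly linear scaling across nodes.
min_nodes = math.ceil(AGGREGATE_SEQ_TBPS * 1000 / SINGLE_NODE_GBPS)

# Bandwidth implied by the IOPS figure, in decimal GB/s.
small_io_gbps = IOPS * IO_SIZE_BYTES / 1e9

print(f"nodes needed at full single-node rate: {min_nodes}")
print(f"bandwidth at 32 KiB I/O: {small_io_gbps:.1f} GB/s")
```

Under these assumptions, small-block I/O moves far less data per second than the sequential peak, which is why the slide lists the two as separate figures.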


Integration with Hortonworks Data Platform (HDP)

— IBM Spectrum Scale allows Hadoop applications to access data on centralized or local storage

• Data can also be accessed through NFS, SMB and POSIX

• Spectrum Scale storage can also be shared with other applications

— Hortonworks Data Platform (HDP) fully integrates with IBM Spectrum Scale

• HDP uses best-of-breed open source Apache Hadoop components

• Fully tested and supported with centralized management GUI (Ambari)

— HDP can leverage the Spectrum Scale tiering function to facilitate different performance tiers (hot, warm and cold)

— HDP supports federation of different data lakes

[Diagram labels: Applications; Higher-level languages (Hive, BigSQL, JAQL, Pig, …); MapReduce API; Hadoop File system APIs; HDFS Client; HDFS RPC; Spectrum Scale HDFS Connector; Global Name Space; NFS, POSIX, SMB]


Flash

NVIDIA DGX

Virtually all other vendors in the AI space work over NFS

“Storage AI Performance”

NFS = Not For Speed


IBM & Nvidia – November 2018

Composable to grow as needed

• Up to 9 DGX-1 servers (72 GPUs) in a rack

• Storage scale-out from a single 300TB node to 8 Exabytes and a Yottabyte of files

High-Performance to feed the GPUs

• NVMe throughput of 120GB/s in a rack

• Over 40GB/s sustained random read per 2U

Extensible for the AI Data Pipeline

• Support for any tiered storage, including Cloud and Tape

Introducing IBM Spectrum Storage for AI with NVIDIA DGX

A scalable, software-defined infrastructure powered by IBM Spectrum Scale and NVIDIA DGX-1 systems. IBM Spectrum Storage for AI with NVIDIA DGX is a powerful engine for your data pipeline.

The workhorse of an AI data infrastructure on which companies can build their shared data service.


Spectrum Scale User Group 2019, Stuttgart, 19-21 March 2019

• Spectrum Scale user group meeting @ISC, June 17 2019, Germany


What is Spectrum Scale (a.k.a. GPFS)?

• General Parallel File System: IBM’s shared disk, parallel cluster file system. Runs under AIX, Linux and Windows OS on IBM Power and Intel/AMD x86 architecture. Designed for high performance commercial and scientific applications. Used on many of the largest supercomputers in the world.

• Cluster: 2-10,000 nodes, fast reliable communication, common admin domain.

• Shared disk: all data and metadata on storage devices accessible from any node through block I/O interface

• (“disk”: any kind of block storage device)

• Parallel: data and metadata flow from all of the nodes to all of the disks in parallel.

• Reduce storage costs with tiering/archive function to Object storage/Cloud, and Tape


Spectrum Scale (GPFS): 20+ years of development

• GPFS (Spectrum Scale) started in 1998

• Spectrum Scale always in the leaders zone, across many use cases and industries

Best multi-purpose “open hardware” data store available:

• Most other products are niche, don’t work well in other areas

• Spectrum Scale is excellent for:

• File storage/NAS for AI, HPC, Technical, compute intensive, high performance

• File storage for Enterprise applications

• e.g. SAP, MQ, DB2, SAS, Datastage, Sterling, etc.

• Storage for Object, Analytics (Hadoop), OpenStack etc.

• Cost effective tiering for all data stored (file, object, analytics, etc.) = HSM

• Flash->Disk->Object or tape

• Practical openness: clusters can support Linux, Windows, and AIX across x86, mainframe, and POWER servers


System View

• Scale file system is where files are stored

• File system is comprised of one or more pools

• Scale pool is the destination for placement and migration of data

• Is comprised of one or more NSDs

• All NSDs of one pool must come from the same type of storage (disk type, RAID type)

• Scale Network Shared Disk (NSD) is the device where data blocks are stored

• One NSD is one LUN of the storage system

• Storage LUN is provided by the storage systems

• Stores the data on a block device

• Can be provisioned by one or more RAID arrays

• All NSDs in one pool must come from the same type of storage
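The hierarchy above (file system, pools, NSDs) can be pictured as a small data model. The Python sketch below is purely illustrative: the class names, pool names and storage-type labels are hypothetical and are not part of any Spectrum Scale API. It only encodes the rule from the slide that all NSDs of one pool must share the same disk type and RAID type.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NSD:
    name: str        # one NSD corresponds to one LUN of the storage system
    disk_type: str   # illustrative label, e.g. "ssd" or "nl-sas"
    raid_type: str   # illustrative label, e.g. "raid6"


@dataclass
class Pool:
    name: str
    nsds: list = field(default_factory=list)

    def add(self, nsd: NSD) -> None:
        # Enforce the slide's rule: same disk type and RAID type within a pool.
        if self.nsds:
            first = self.nsds[0]
            if (nsd.disk_type, nsd.raid_type) != (first.disk_type, first.raid_type):
                raise ValueError(
                    f"{nsd.name} does not match the storage type of pool {self.name}"
                )
        self.nsds.append(nsd)


@dataclass
class FileSystem:
    name: str
    pools: dict = field(default_factory=dict)

    def pool(self, name: str) -> Pool:
        # Return the named pool, creating it on first use.
        return self.pools.setdefault(name, Pool(name))


fs = FileSystem("fs1")
fs.pool("system").add(NSD("nsd1", "ssd", "raid1"))
fs.pool("silver").add(NSD("nsd2", "nl-sas", "raid6"))
fs.pool("silver").add(NSD("nsd3", "nl-sas", "raid6"))
```

Adding an SSD-backed NSD to the `silver` pool in this model raises a `ValueError`, mirroring the constraint that a pool is homogeneous.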


Spectrum Scale Policies

• Placement policy:

• Evaluated at file creation time

• Determines initial file placement and replication

• Migration policy:

• Evaluated periodically or on-demand

• Can move data between pools, change replication, delete data, or run arbitrary user commands

• Policy engine (mmapplypolicy):

• Fast, parallel directory traversal combined with inodescan

• Runs outside the daemon, but makes use of Scale infrastructure and APIs (extended readdir, inodescan)

• Can be used as powerful framework for building parallel file system utilities, e.g.

• Fast find/grep

• Remote replication
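To make the split between placement and migration policies concrete, here is a minimal Python model. It is a sketch only and does not use the Spectrum Scale policy language or `mmapplypolicy`; the rule conditions, pool names and function names are hypothetical. The point it illustrates is the timing: the placement rule runs once at file creation, while the migration rule runs whenever a scan is triggered.

```python
from dataclasses import dataclass


@dataclass
class File:
    name: str
    pool: str = ""
    days_since_access: int = 0


def placement_rule(f: File) -> str:
    # Evaluated once, at file creation time (hypothetical rule).
    return "system" if f.name.endswith(".db") else "silver"


def migration_rule(f: File):
    # Evaluated during a periodic or on-demand scan (hypothetical rule).
    if f.pool == "system" and f.days_since_access > 30:
        return "silver"
    return None  # leave the file where it is


def create(name: str) -> File:
    f = File(name)
    f.pool = placement_rule(f)
    return f


def scan(files):
    # Apply the migration rule to every file and report which ones moved.
    moved = []
    for f in files:
        target = migration_rule(f)
        if target is not None:
            f.pool = target
            moved.append(f.name)
    return moved


files = [create("orders.db"), create("notes.txt")]
files[0].days_since_access = 45
moved = scan(files)
print(moved)
```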


2019 Unleash Storage Economics on a Global Scale

[Diagram: shared namespace powered by IBM Spectrum Scale, with automated data placement and data migration. Labels as shown on the slide:

Consumers: client workstations; users and applications; HPC compute farm; traditional applications; new-gen applications; AI

Access: Block (iSCSI); File (SMB, NFS, POSIX); Analytics (Transparent HDFS); OpenStack (Cinder, Glance, Manila); Object (Swift, S3); Containers (Storage Enabler for Containers, Kubernetes)

Storage: Flash, NVMe, Disk, Tape, Shared Nothing Cluster / ECE, JBOD/JBOF, Spectrum Scale RAID, Transparent Cloud Tier (TCT)

Functions: Encryption, Compression, Immutability, Audit Logging, Watch Folder, REST API, GUI / Admin

Worldwide data distribution and collaboration: Site A, Site B, Site C (AFM, Share); DR Site (AFM-DR)]


Spectrum Scale on AWS #1

https://aws.amazon.com/de/quickstart/architecture/ibm-spectrum-scale/


IBM Spectrum Scale & Archive Trial VM now available @ ibm.com

• A 90-day evaluation version of IBM Spectrum Archive Enterprise Edition is now available on ibm.com.

• This Trial VM offers a pre-configured IBM Spectrum Archive instance in a virtual machine based on IBM Spectrum Archive Enterprise Edition 1.2 GA and IBM Spectrum Scale 4.2, fully functional for demonstrations, hands-on training, functional testing, and data management planning. The download can be deployed in minutes using VirtualBox, and you can try it out on your laptop, desktop or server; no tape hardware is needed.

• In addition to the user guide included in the download, a dedicated Redbook was created for the Spectrum Archive Trial VM. It comes in a nicer format with better explanations and many contributions from the technical writers and reviewers.

• Download URL: http://www.redbooks.ibm.com/abstracts/redp5384.html


Spectrum Scale Global Data Sharing

— Remote file system mount (cross-cluster mount) allows sharing files between sites synchronously at high speeds

— IBM Spectrum Scale Active File Management (AFM) allows sharing files asynchronously between sites

• Files are globally visible and only locally present when accessed or pre-fetched

• Tolerates reliability and latency of WAN connections

— IBM Aspera can be used for efficient long-distance file transfer

[Diagram: several Spectrum Scale clusters, each presenting a global name space over NFS, SMB, POSIX, Swift, S3 and HDFS, linked by remote mount, Active File Management (AFM, WAN caching) and file transfer]


AFM Migration use case (local update)

• Migrate files from old NFS (e.g. SAM-FS) servers to a new Spectrum Scale/GPFS cache server

• After establishing the AFM relation, the cache server “sees” all files from the old NFS (e.g. SAM-FS) server

• Cache is configured in LU (local update) mode

• Home provides NFS share or GPFS cluster mount

• Files can be pre-fetched (transferred) based on results of policy scans

• Switch over when sufficient files are pre-fetched

• Uncached files accessed on cache are transferred from home

• Files changed on cache are not replicated back
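The behaviour listed above can be summarised in a toy model. The Python sketch below is an illustration under simplifying assumptions, not AFM itself: the class and method names are hypothetical, and "home" is just a dictionary standing in for the old NFS server. It shows the three properties the slide names: all home files are visible on the cache, uncached files are fetched on first access, and local changes are never written back to home.

```python
class LocalUpdateCache:
    """Toy model of a cache fileset in local-update (LU) mode."""

    def __init__(self, home: dict):
        self.home = home      # stands in for the old NFS (home) server
        self.cached = {}      # file data present on the new cache server

    def listing(self):
        # Once the relation exists, every home file is visible on the cache.
        return sorted(set(self.home) | set(self.cached))

    def prefetch(self, names):
        # Transfer selected files ahead of the switch-over.
        for n in names:
            self.cached.setdefault(n, self.home[n])

    def read(self, name):
        if name not in self.cached:       # uncached: transfer from home
            self.cached[name] = self.home[name]
        return self.cached[name]

    def write(self, name, data):
        self.cached[name] = data          # local only, never pushed to home


home = {"a.dat": b"old-a", "b.dat": b"old-b"}
cache = LocalUpdateCache(home)
cache.prefetch(["a.dat"])
cache.write("a.dat", b"new-a")
data_b = cache.read("b.dat")
```

After these calls the cache holds the updated `a.dat`, while the home copy still contains the old data, which is the reason this mode suits a one-way migration.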

[Diagram: legacy NFS server (old source server) connected via NFS to the GPFS cache server (new target server)]


Spectrum Scale Cloud data exchange via TCT

— With IBM Spectrum Scale Transparent Cloud Tiering, files are copied to Object Storage (S3)

• Object storage can be on or off-premises

• Mapping of files and objects is included (manifest)

— Objects can be imported as files in other TCT instance (cluster)

• Based on file to object mapping (manifest)

• Import creates stub-files, does not transfer data

• Objects are transferred upon access or pre-fetch operation

— No global locking, last writer wins
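The export/import flow above can be sketched as follows. This is a simplified Python model with hypothetical class and method names, not the TCT interface: the object store is an in-memory dictionary, and the manifest is reduced to a list of paths where the file path doubles as the object key. It illustrates that import creates stubs without data, that data arrives on first access, and that concurrent exports follow last-writer-wins.

```python
class ObjectStore:
    """Stands in for an S3 bucket shared by several clusters."""

    def __init__(self):
        self.objects = {}    # object key -> bytes
        self.manifest = []   # file-to-object mapping (here: path == key)


class TctCluster:
    def __init__(self, store: ObjectStore):
        self.store = store
        self.files = {}      # path -> bytes, or None for a stub without data

    def export(self, path, data):
        self.files[path] = data
        self.store.objects[path] = data    # no global lock: last writer wins
        if path not in self.store.manifest:
            self.store.manifest.append(path)

    def import_all(self):
        # Creates stub files only; no object data is transferred here.
        for path in self.store.manifest:
            self.files.setdefault(path, None)

    def read(self, path):
        if self.files[path] is None:       # stub: fetch the object on access
            self.files[path] = self.store.objects[path]
        return self.files[path]


store = ObjectStore()
site_a, site_b = TctCluster(store), TctCluster(store)
site_a.export("/data/x", b"v1")
site_b.import_all()
stub_before = site_b.files["/data/x"]   # None: stub without data
value = site_b.read("/data/x")          # data is transferred on access
```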

[Diagram: three Spectrum Scale global name spaces, each with a TCT instance, connected through Export and Import operations]


INTEGRATED Spectrum Scale - ESS System

The Enterprise Features and Reliability of IBM’s Spectrum Scale File System

The Power of the Spectrum Scale Architecture

Proven at scale across global organizations

Easy to use, get up and running in a few hours

Enterprise ready: data protection, management, security and more

Industry’s Fastest Converged Scale-Out Platform

Highest performance throughput per hard disk drive

Industry’s highest quality disk drives with lowest disk failure rate

Robust management and support

Designed for the world’s most data intensive workflows

Pre-integrated, tested, tuned, ready to deploy

Removes metadata bottlenecks with SSD

De-clustered RAID for Spectrum Scale on Power platform

Drastically lowers Total Cost of Ownership

Automated tiering and ILM from HDDs to Flash to Tape


What is Elastic Storage Server (ESS)?

“Building blocks” of Spectrum Scale storage

• Software Defined Storage implementation

• Standard hardware and software

• Red Hat Linux, Spectrum Scale, Power servers, commodity storage enclosures (JBODs), networks (InfiniBand, Ethernet)

• Unique intelligent de-clustered erasure coding (“super RAID”)

• High performance, high availability, high reliability storage layer

ESS has its own built-in JBOD storage = enclosures with drives

• But can mix/match using Spectrum Scale support for almost any block storage (disk, flash, etc.)

ESS reduces risk: quick to deploy and grow a Spectrum Scale cluster

• Fully validated hardware and software stack

• Pre-assembled and pre-configured

• Comes with lab services for on-site deployment

Flash, disk, and hybrid ESS models

▪ Elastic Storage Server (ESS) is an integrated building block solution for Spectrum Scale


Spectrum Scale Native RAID (Declustered SW Raid)

Spectrum Scale build-your-own solution

Requires dedicated disk controllers (SAN)

ESS eliminates the need for a SAN; implements de-clustered RAID in software

▪ Integrate de-clustered RAID into software stack

▪ QDR/EDR IB

▪ 10/40 GigE


Declustered Raid6 example


The data deluge

80% of all files created are inactive: no access in at least 3 months!

=> NAS: Never Access Storage

Source: D. Anderson, 2013 IEEE Conf. on Massive Data Storage

The HDDs sold per year correspond to the CO2 of 1.8 million cars


HDD ?!?

▪ HDD has reached the limit of (known) materials to produce larger write fields:

• Areal density/capacity scaling achieved by shrinking the same basic technology to write smaller and smaller bits on disk

▪ Technologies to go beyond the superparamagnetic limit:

• Two Dimensional Magnetic Recording (TDMR)

• Heat Assisted Magnetic Recording (HAMR)

• Microwave Assisted Magnetic Recording (MAMR)

• Bit Patterned Media (BPM)

▪ Recent capacity scaling of HDD: volumetric density

• Slowdown in areal density scaling partially compensated by adding more disks: conventional technology has reached the space limit (~5 platters)

• Helium-filled drive: less turbulence, thinner disks, higher capacity

• WD 6 TB (2013): 6 platters

• HGST 10 TB drive (2015): 7 platters, CAGR 29%

• 14 TB (2017): 9 platters, CAGR 18%

• Doesn’t scale: no space for more heads and platters!
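The CAGR figures quoted for the helium drives follow from the capacity points on the slide. The short Python sketch below reproduces them using the standard compound annual growth rate formula, (end / start)^(1 / years) - 1; the function name is chosen here for illustration.

```python
def cagr(start, end, years):
    """Compound annual growth rate as a fraction."""
    return (end / start) ** (1 / years) - 1


# Capacity points (TB) and years taken from the slide.
print(f"2013 -> 2015: {cagr(6, 10, 2):.0%}")   # 6 TB to 10 TB over 2 years
print(f"2015 -> 2017: {cagr(10, 14, 2):.0%}")  # 10 TB to 14 TB over 2 years
```

Rounded to whole percent, this gives 29% and 18%, matching the values on the slide and showing the slowdown in capacity growth.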


Magnetic Media “Trilemma”:


Seagate hits density problem with HAMR, WD infects MAMR with shingles ....

▪ https://www.theregister.co.uk/2019/03/07/hamr_and_mamr_hdd_direction_debate/

▪ Seagate's next-generation HAMR disk drive will be a drop-in replacement while Western Digital's MAMR drive will not

▪ WD's technical product marketing director Eyal Shani told us that MAMR drives would use host-managed shingling, and so would not be drop-in replacements for existing drives.

▪ With shingling, write tracks are partially overlapped, meaning any rewriting of already written data incurs a time penalty as the affected block of write tracks is read, altered with the new data, and then rewritten.


IBM Research Lab Rüschlikon: Tape Technology

Demonstration August 2017

Areal recording density: 201 Gb/in²

20x TS1155 areal density

→ 330 TB cartridge capacity

With this demonstration, IBM shows the potential for increasing tape capacity, using technologies that are already in use today!

HDD technology:

• No room to continue adding platters

• HDD capacity will be driven by areal density scaling (10-20% per year)

Cost advantage of tape will continue to grow!


Tape Drive History and Roadmap

LTO generations: LTO-5; LTO-6; LTO-7; LTO-8; LTO-9; LTO10; LTO11; LTO12

New format capacity (native): 1.5 TB (L5); 2.5 TB (L6); 6 TB (L7); 12.0 TB; up to 24 TB; up to 48 TB; up to 96 TB; up to 192 TB

Other format capacities (native): 800 GB (L4), 400 GB L3 R/O; 1.5 TB (L5), 800 GB L4 R/O; 2.5 TB (L6), 1.5 TB L5 R/O; 9 TB (M8), 6 TB (L7); up to 12 TB (L8), 6 TB L7 R/O; up to 24 TB (L9), 12 TB L8 R/O; up to 48 TB (L10), 24 TB L8 R/O; up to 96 TB (L11), 48 TB L10 R/O

Native data rate: 140 MB/s; 160 MB/s; 300 MB/s; up to 360 MB/s; up to 708 MB/s; up to 1100 MB/s

TS1100 generations: TS1130; TS1140; TS1150; TS1155; TS1160; TS1170

New format capacity (native), in slide order: 1 TB (JB), 640 GB (JA), 4 TB (JC), 1.6 TB (JB), 10 TB (JD), 7 TB (JC), 15 TB (JD), 20 TB (JE), 15 TB (JD), 7 TB (JC), up to 50 TB (JF), up to 30 TB (JE), 15 TB (JD)

Other format capacities (native), in slide order: 700 GB (JB), 500 GB (JA), 300 GB (JA), 1 TB (JB), 700 GB (JB), (all JA R/O), 4 TB (JC), 7 TB (JC), 4 TB read only (JC), 10 TB (JD), 7 TB (JC), 4 TB (JC), 10 TB (JD)

Native data rate: 160 MB/s; 250 MB/s; 360 MB/s; 360 MB/s; 400 MB/s; up to 1000 MB/s

Years marked on the timeline: 2008, 2010, 2011, 2013, 2014, 2015, 2017, 2017, 2018

Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.


Data Growth and the GAP with HDD Technology

Until now, the capacity growth of disks has been greater than data growth. In the future, however, data growth will be significantly greater!

This means we would have to install more disks/HDDs or rely more on tape!

We should finally bury the “tape is dead” discussion!


Microsoft Azure uses Tape

▪ Why Microsoft Azure will use tape:

• The cheapest, most economical way to store cold data, continued improvement with 30%+ CAGR and easiest roadmap.

• Cheaper: separating the media from the reader/writer

• Both tape and optical pull the media out of the reader/writer.

• 1 expensive part can service any amount of media.

• Tape libraries can be more flexible in drive / media ratio

• New tape drives can store more data on older media

• Tape libraries have fewer environmental constraints

[Figure labels: Factor ~2; Factor ~4]


IBM Spectrum Archive: Policy-based Cost Optimization

•Powerful policy engine

–Example: File Heat measures how often the file is accessed.

–As the file gets “cold”, move it automatically to a lower cost storage pool

–Information Lifecycle Management

–Fast metadata ‘scanning’ and data movement

–Automated data migration based on threshold

•Users not affected by data migration

–Single namespace

–Persistent view of the data

•Tape as the external pool of Spectrum Scale

[Diagram: tiered pools with Spectrum Archive automation: System pool (Flash), Gold pool (SSD), Silver pool (NL-SAS), TS4500 tape library. Rule labels between the tiers: “Small files last accessed > 30 days”; “last accessed > 60 days”; “Silver pool is >60% full: drain it to 20%”; “accessed today and file size is <1G”; “Send it back to Silver pool when accessed”]
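The threshold rule from the diagram (“Silver pool is >60% full, drain it to 20%”) can be sketched in a few lines of Python. This is a minimal illustration, not the policy engine: the function name and the tuple layout are hypothetical, and choosing the least recently accessed files first is an assumption made here; an actual policy may weight candidates differently.

```python
def drain(files, capacity, high=0.60, low=0.20):
    """Select files to migrate once occupancy exceeds `high`, until it is <= `low`.

    files: list of (name, size, days_since_access); sizes use the unit of `capacity`.
    """
    used = sum(size for _, size, _ in files)
    if used / capacity <= high:
        return []                         # threshold not reached, nothing to do
    selected = []
    # Assumption: coldest files (longest since last access) are migrated first.
    for name, size, _ in sorted(files, key=lambda f: f[2], reverse=True):
        if used / capacity <= low:
            break
        selected.append(name)
        used -= size
    return selected


files = [("a", 30, 90), ("b", 25, 10), ("c", 15, 61), ("d", 10, 1)]
print(drain(files, capacity=100))
```

With the sample data the pool starts at 80% occupancy, so files are selected in order of coldness until the remaining data occupies no more than 20% of the pool.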


Positioning Spectrum Scale ILM/HSM Function in general

Storing large volumes of larger files which are infrequently accessed on tape

▪ Optimize Total Cost of Ownership leveraging tape

▪ Providing easy access to files stored on tape in a tiered storage system

• Transparent user access to files via GPFS file system layer

• Automated migration from disk to tape using GPFS policies

▪ Exchanging data on LTFS tape

• Leverage copy, export and import functionality for LTFS tapes

➔ Spectrum Archive does not make tape faster but much easier to use in many industries and branches

[Figure labels: 100s of TB; > 10 MB; files are never or rarely accessed]


Combine disk/flash and tape, optimizing storage cost in scaling environments

Move files which are no longer accessed to LTFS tape, leveraging automation, transparent access and standardized format.

[Diagram: single file system view (C:\ user-defined namespace) in the Spectrum Scale global name space; frequently used files on Spectrum Scale, never or infrequently used files on LTFS tapes managed by Spectrum Archive; migration & recall between the two]


Functional Overview

▪ Supports pre/migration to address different use cases

▪ Stub size can be defined

▪ Fully integrated with Spectrum Scale cluster and file system capabilities

▪ Parallelism node wide and cluster wide

▪ Independent file list driven processing

▪ Supports multiple Spectrum Protect/Archive servers for a single file system

▪ Close integration with Spectrum Protect Backup Archive Client

▪ The migration of files in tape pools of the SP server is optimized for performance

▪ Read starts recall

• SC allows setting the stub size for a managed file system

• When the stub size is set to 1 MB, the first 1 MB of a file is kept in the stub

• SC allows setting a trigger for when recall is started

• Trigger is relative to stub size, e.g. 512 KB

• When a read request gets beyond the trigger, the recall is initiated in the background

[Figure labels: Stub; Recall trigger]
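The interplay of stub size and recall trigger can be sketched as a small decision function. The Python below is illustrative only: the function name and return labels are hypothetical, a read is taken to be “beyond the trigger” when its last byte lies past the trigger offset, and the “wait” case for reads past the stub is an assumption that the slide does not spell out.

```python
MB = 1024 * 1024
KB = 1024


def read_action(offset, length, stub_size=1 * MB, trigger=512 * KB):
    """Classify a read on a migrated file.

    Returns "stub" (served from the stub on disk), "stub+recall" (served from
    the stub while a background recall starts) or "wait" (the data lies beyond
    the stub; assumed here to wait for the recall).
    """
    end = offset + length
    if end > stub_size:
        return "wait"
    if end > trigger:
        return "stub+recall"
    return "stub"


print(read_action(0, 4 * KB))          # small read at the start of the file
print(read_action(600 * KB, 4 * KB))   # read past the 512 KB trigger
print(read_action(2 * MB, 4 * KB))     # read past the 1 MB stub
```

Placing the trigger before the end of the stub gives the recall a head start while a sequential reader is still being served from disk.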


High-level Architecture

Spectrum Scale NSD server

Spectrum Scale NSD clients

Supported platforms: AIX™, xLinux, pLinux, zLinux, Windows®

Supported platforms: AIX™, xLinux, pLinux, zLinux, HP, Sol, Windows®

Supported platforms: AIX™, xLinux, zLinux (4Q15)

Spectrum Archive Enterprise Edition

Supported platform: xLinux

Supported storage medium: LTFS compatible Tape Library

Function:

• Backup, Restore

• Migration, Recall

• SOBAR

Function:

• Migration, Recall

Customer application can run on Spectrum Protect NSD client or server nodes.

Supported platforms:

• Customer application on NSD client: AIX™, xLinux, pLinux, zLinux, Windows®

• Customer application on NSD server:

• Spectrum Protect: AIX™, xLinux, zLinux (4Q15)

• Spectrum Archive: xLinux

Disk, Optical, Tape Library, Object Storage

Both components use the same HSM logic and cannot be operated in one Spectrum Scale cluster

Spectrum Protect for Space Management

Spectrum Protect Backup Archive Client

Spectrum Protect Server

Supported storage technologies


Spectrum Archive Architecture

▪ Spectrum Archive integrates with Spectrum Scale as tape tier

• Spectrum Scale provides global name space

• Spectrum Archive migrates data to tape

▪ Each Spectrum Archive node has tape drives

• Supports up to two libraries, one per node

▪ Files are (pre-) migrated from disk to tape

• Based on policies or file lists

• Supports multiple copies on distinct tapes

▪ Files are recalled on access or by command

• Supports tape optimized recalls

▪ Tapes can be exported and imported

▪ Workload is distributed across nodes

[Diagram: Spectrum Scale & Spectrum Archive cluster. Users and applications access file systems with a global namespace through file & object protocols; each Spectrum Scale node runs Spectrum Archive and performs policy-based migration to LTFS tape]


Spectrum Archive Architecture with two tape libraries

▪ At least two Spectrum Archive nodes required

• Each manages one tape library

▪ Each node is in one node group

• Nodes in one node group are connected to the same library and share all tape resources

• Additional nodes can be added to one node group

• Provides high availability of the node group

▪ Node groups can be stretched over two locations

• Files are replicated by Spectrum Scale on disk

• Files are migrated by Spectrum Archive to two tapes

• Read from local disk can be configured

[Diagram: stretched Spectrum Archive cluster. Node 1 (Group A) and Node 2 (Group B) share one file system over a TCP/IP network; each side has its own disk, SAN and tape library (Library 1, Library 2)]


Spectrum Archive functional overview

[Diagram: users and applications work on a user file system (user data) in the global name space, which also holds the LTFS metadata; GPFS Node 1 (node group 1) and GPFS Node 2 (node group 2) each run Spectrum Archive; Pool 1 resides in Library 1 and Pool 2 in Library 2]

• Migration with optional copy to another tape in the other library

• Recall with option for bulk recall

• Tape management: reclamation (free space) and reconciliation (synchronize)

• Export with option to keep stub in GPFS

• Import (only creates stubs in GPFS)


Archive & Storage Tiering Solutions with Tape

• Spectrum Archive Single Drive Edition - LTFS SE

• Spectrum Archive Library Edition - LTFS LE

• Spectrum Protect Space Management - TSM HSM

• Spectrum Scale with

• Spectrum Archive - LTFS EE

• Spectrum Protect (HSM)

• Spectrum Protect

• HPSS

• ADMIRA/AREN for Video and Audio


Press…

▪ 63% of organizations expect to increase or at least maintain their tape footprint for the foreseeable future.

▪ https://www.esg-global.com/data-point-of-the-week-01-21-19?utm_campaign=Data%20Point%20of%20the%20Week&utm_source=hs_email&utm_medium=email&utm_content=69116242&_hsenc=p2ANqtz-_JzRAzgcewHNgo8a27E9Oo_2UnjALrp1dtWVr1fVD50rcOgQYpaE408ycJSXGMkzeITTOXpMljFX_k2L-8ByvxywoNQw&_hsmi=69141333

▪ From MySpace to MyFreeDiskSpace: 12 years of music – 50m songs

▪ "...We apologize for the inconvenience and suggest that you retain your back up copies."

• https://www.theregister.co.uk/2019/03/18/myspace_server_migration_data_loss/?fbclid=IwAR2n5stx2u3xZls9lqQc23H8_EBOTYUv6WV9iPav2xFq1h7NwotTmEun8K4


BSI warns of targeted attacks on companies

• "We are currently seeing the mass proliferation of sophisticated attack methods by organized crime that, until a few months ago, were reserved for intelligence-service actors….", says BSI President Arne Schönbohm.

• The attackers try to manipulate or delete any backups and then selectively deploy ransomware, in a coordinated fashion, on the computer systems of promising targets. In some cases this leads to considerable disruption of business operations. This elaborate approach lets attackers demand significantly higher ransoms from companies than was the case with earlier untargeted ransomware campaigns. In addition to individual companies, IT service providers are increasingly affected; the attackers then use the providers' networks to gain access to their customers.

• Complete data loss is a real threat

In contrast to automated, broad-based ransomware campaigns, these manually executed attacks mean considerably more work for the attackers. However, because they can target more lucrative victims this way and may manipulate or delete backups so that these are no longer available for restoring the systems, the attackers can demand substantially higher ransoms. Companies that have no offline backups lose all of their backups in such an attack, even if these are stored on external backup appliances. The BSI is aware of several cases in which the encryption of all systems, including the backup appliances, had not been considered in a risk assessment, which is why the affected companies lost all of their data.

▪ https://www.datensicherheit.de/aktuelles/ransomware-bsi-warnung-gezielte-angriffen-unternehmen-31815


Massive data loss is a threat if no tape backup is in use!

=> Act now and renew your IBM tape backup systems; only tape backup protects against data loss!



Disaster Recovery Using SOBAR (Scale Out Backup And Restore)

Function

• Backup function

• Spectrum Protect HSM is used to premigrate files

• The SOBAR toolset is used to generate a file system metadata image

• The Spectrum Protect backup-archive client is used to back up the image files

• Restore function

• The Spectrum Protect backup-archive client is used to restore the image files

• The SOBAR toolset is used to recreate the file system structure

• Spectrum Protect HSM is used to pre-fetch files and to allow direct access through transparent recall

Challenges

• All files to be included must be premigrated or migrated

• The cluster configuration has to be backed up separately

Recommendations

• Frequently applied policy rules should ensure that newly created files are premigrated immediately

• Integrate the SOBAR backup into your business processes to prevent file changes shortly before the image is captured

• Prepare a prioritized pre-fetch list for recovery processing
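
The challenge and the last recommendation above can be illustrated with a small Python sketch. It is not part of the SOBAR toolset and calls no Spectrum Scale or Spectrum Protect command; the `FileRecord` type, the `priority` field, and the example paths are hypothetical. It only shows the two checks an administrator might script: which files are still resident (neither premigrated nor migrated) before the image is captured, and in which order files should be pre-fetched after a restore.

```python
from dataclasses import dataclass

# HSM file states. The slide names "premigrated" and "migrated";
# "resident" is the usual HSM term for a file with no copy on the server yet.
RESIDENT, PREMIGRATED, MIGRATED = "resident", "premigrated", "migrated"


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical inventory entry; not a Spectrum Scale data structure."""
    path: str
    hsm_state: str  # one of RESIDENT, PREMIGRATED, MIGRATED
    priority: int   # assumption: lower number = recall earlier after a restore


def not_ready_for_image(files):
    """Return the files that are neither premigrated nor migrated.

    Per the slide, such files cannot be included in a SOBAR-based recovery.
    """
    return [f for f in files if f.hsm_state not in (PREMIGRATED, MIGRATED)]


def prefetch_order(files):
    """Return paths ordered by priority, then by path for a stable result."""
    return [f.path for f in sorted(files, key=lambda f: (f.priority, f.path))]


if __name__ == "__main__":
    inventory = [
        FileRecord("/gpfs/fs1/db/index.dat", MIGRATED, 1),
        FileRecord("/gpfs/fs1/logs/old.log", PREMIGRATED, 9),
        FileRecord("/gpfs/fs1/new/upload.tmp", RESIDENT, 5),
    ]
    blockers = not_ready_for_image(inventory)
    print("premigrate before image capture:", [f.path for f in blockers])
    print("pre-fetch order:", prefetch_order(inventory))
```

In a real environment the inventory would come from a policy scan or from the HSM client rather than from a hand-written list; that integration is deliberately left out here.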

[Diagram: a Spectrum Scale cluster and a Spectrum Protect server, connected by migration and by recall (transparent and manual). The Spectrum Protect for Space Management client and the backup-archive client are typically installed on several cluster nodes. The Spectrum Scale SOBAR toolset is used for processing image backup and image restore.]


[Diagram: Hadoop-Backup Cluster with Spectrum Archive EE]

GPFS clients: sap00b36, sap00b35, sap00b37, sap00b38

GPFS NSD servers: sap00334, sap00338

Spectrum Archive EE Node 1 and Node 2, each running LTFS EE, HSM, and MMM

LTFS EE control nodes for TLL44O01: one active, one inactive

Tape drive sets: 4 x TS1160 (SAP00338), 2 x TS1160 (SAP00334); the drive groups are also labeled 4 x TS1160 and 2 (4) x TS1160

Tape library: TS4500 [3 frames, HA, 1100 slots], 14 (12) x TS1160, 380 x 3599 Type E

Library label: TLP44

Logical libraries: TLL44O01 (LTFS), TLL44O0x (Spectrum Protect)

File systems:
ltfsmeta01: 2 x 500 GB Flash/SSD
hdfs01 (DMAPI enabled): 20 TB SAS or NL-SAS

LTFS tape pools:
Prod/Test-P-Pool0x: Prod/Test primary copy pool
Prod/Test-R-Pool0x: Prod/Test redundant copy pool
Prod-P-Pool01: 50 x 3599 Type E
Prod-R-Pool01: 50 x 3599 Type E
Test-P-Pool01: 10 x 3599 Type E
Test-R-Pool01: 10 x 3599 Type E

The label CP appears on several components in the diagram.
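
The pool sizes given for the Hadoop backup cluster can be tallied with a few lines of Python. This is only a back-of-the-envelope sketch: the cartridge counts come from the diagram, but the capacity per cartridge is not stated on the slide, so `ASSUMED_TB_PER_CARTRIDGE` is an assumption to be replaced with the value for the media actually in use. The treatment of the redundant (R) pools as full copies of the primary (P) pools follows the pool naming in the diagram.

```python
# Cartridge counts per LTFS tape pool, as labeled in the diagram.
POOLS = {
    "Prod-P-Pool01": 50,
    "Prod-R-Pool01": 50,
    "Test-P-Pool01": 10,
    "Test-R-Pool01": 10,
}

# Total cartridges in the TS4500 (380 x 3599 Type E), as labeled in the diagram.
LIBRARY_CARTRIDGES = 380

# ASSUMPTION (not on the slide): native capacity per cartridge in TB.
ASSUMED_TB_PER_CARTRIDGE = 20


def pool_capacity_tb(cartridges, tb_per_cartridge=ASSUMED_TB_PER_CARTRIDGE):
    """Raw capacity of a pool under the assumed per-cartridge capacity."""
    return cartridges * tb_per_cartridge


def summary(pools=POOLS, library_cartridges=LIBRARY_CARTRIDGES):
    """Tally assigned cartridges and the capacity available for unique data."""
    assigned = sum(pools.values())
    return {
        "assigned_cartridges": assigned,
        # Cartridges not assigned to these four LTFS pools; the library is
        # shared with Spectrum Protect, so they are not necessarily free.
        "unassigned_cartridges": library_cartridges - assigned,
        # R pools hold redundant copies, so only P pools count as unique data.
        "unique_data_tb": sum(
            pool_capacity_tb(n) for name, n in pools.items() if "-P-" in name
        ),
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value}")
```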


Josef (Sepp) Weingand

Business Development Leader DACH – Data Retention Infrastructure - Tape Storage

Infos / Find me on: [email protected], +49 171 5526783

Blog http://sepp4backup.blogspot.de/

Facebook https://www.facebook.com/Sepp4Tape/

http://www.linkedin.com/pub/josef-weingand/2/788/300

http://www.facebook.com/josef.weingand

http://de.slideshare.net/JosefWeingand

https://www.xing.com/profile/Josef_Weingand

https://www.xing.com/net/ibmdataprotection


Spectrum Scale FAQ & Redbooks

Contact mailto:[email protected] if you need more info.

https://www.ibm.com/support/knowledgecenter/STXKQY/ibmspectrumscale_welcome.html

(Coming soon!)


Disclaimers


▪ Copyright© 2015 by International Business Machines Corporation.

▪ No part of this document may be reproduced or transmitted in any form without written permission from IBM Corporation.

▪ The performance data contained herein were obtained in a controlled, isolated environment. Results obtained in other operating environments may vary significantly. While IBM has reviewed each item for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. These values do not constitute a guarantee of performance. The use of this information or the implementation of any of the techniques discussed herein is a customer responsibility and depends on the customer's ability to evaluate and integrate them into their operating environment. Customers attempting to adapt these techniques to their own environments do so at their own risk.

▪ Product data has been reviewed for accuracy as of the date of initial publication. Product data is subject to change without notice. This information could include technical inaccuracies or typographical errors. IBM may make improvements and/or changes in the product(s) and/or program(s) at any time without notice. Any statements regarding IBM's future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

▪ References in this document to IBM products, programs, or services do not imply that IBM intends to make such products, programs or services available in all countries in which IBM operates or does business. Any reference to an IBM Program Product in this document is not intended to state or imply that only that program product may be used. Any functionally equivalent program that does not infringe IBM's intellectual property rights may be used instead. It is the user's responsibility to evaluate and verify the operation of any non-IBM product, program or service.

▪ THE INFORMATION PROVIDED IN THIS DOCUMENT IS DISTRIBUTED "AS IS" WITHOUT ANY WARRANTY, EITHER EXPRESS OR IMPLIED. IBM EXPRESSLY DISCLAIMS ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NONINFRINGEMENT.

▪ IBM shall have no responsibility to update this information. IBM products are warranted according to the terms and conditions of the agreements (e.g. IBM Customer Agreement, Statement of Limited Warranty, International Program License Agreement, etc.) under which they are provided. IBM is not responsible for the performance or interoperability of any non-IBM products discussed herein.

▪ Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products in connection with this publication and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

▪ The provision of the information contained herein is not intended to, and does not, grant any right or license under any IBM patents or copyrights. Inquiries regarding patent or copyright licenses should be made, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.


Trademarks


▪ The following terms are trademarks or registered trademarks of the IBM Corporation in either the United States, other countries or both:

– IBM, GDPS, Spectrum Storage, Spectrum Archive, Spectrum Scale, System Storage, System z, Virtualization Engine

▪ Linear Tape File System, Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.

▪ Other company, product or service names may be trademarks or service marks of others.