General Parallel File System

Slide 1: General Parallel File System
Presentation by: Lokesh Pradhan

Slide 2: Introduction
A file system is a way to organize data that is expected to be retained after a program terminates. It provides procedures to store, retrieve, and update data, and manages the available space on the device that contains it. As juniors and seniors in computer science we all know what a file system is, but the definition is worth restating.

Slide 3: Types of File System
Disk file system: FAT, exFAT, NTFS
Optical discs: CD, DVD, Blu-ray
Tape file system: IBM Linear Tape
Database file system: DB2
Transactional file system: TxF, Valor, Amino, TFFS
Flat file system: Amazon S3
Cluster / distributed / shared / SAN / parallel file systems: NFS, CIFS, AFS, SMB, GFS, GPFS, Lustre, PAS

File systems can be classified into many types; the list above covers only a few that you may already know or will encounter in this presentation.

Slide 4: In the HPC World
Equally large applications
Large input data sets (e.g., astronomy data)
Parallel execution on large clusters
Parallel file systems are used for scalable I/O, e.g., IBM's GPFS, Sun's Lustre, PanFS, and the Parallel Virtual File System (PVFS).

High-performance computing is another style of computing at a comparable scale, processing large input data sets in parallel. In the HPC world, parallel file systems provide highly scalable storage I/O; examples are GPFS, Lustre, PanFS, and PVFS. What do you think we can propose?

Slide 5: General Parallel File System
Cluster: 512 nodes today, fast reliable communication, common admin domain
Shared disk: all data and metadata on disk are accessible from any node through the disk I/O interface (i.e., "any to any" connectivity)
Parallel: data and metadata flow from all of the nodes to all of the disks in parallel
RAS: reliability, availability, serviceability

The General Parallel File System (GPFS) is a high-performance shared-disk clustered file system developed by IBM. Because it follows the shared-disk model, it provides concurrent, high-speed file access to applications executing on multiple nodes of a cluster. It can be used with AIX 5L clusters, Linux clusters, Microsoft Windows Server, or a heterogeneous cluster of AIX, Linux, and Windows nodes. GPFS also offers tools to manage and administer the GPFS cluster, and it allows shared access to file systems from remote GPFS clusters.

GPFS is the file system of the ASC Purple supercomputer, which is composed of more than 12,000 processors and has 2 petabytes of total disk storage spanning more than 11,000 disks.

Slide 6: History of GPFS
Shark video server: video streaming from a single RS/6000; a complete system that included the file system, network driver, and control server; large data blocks, admission control, deadline scheduling; Bell Atlantic video-on-demand trial (1993-94).
Tiger Shark multimedia file system: multimedia file system for the RS/6000 SP; data striped across multiple disks, accessible from all nodes; Hong Kong and Tokyo video trials, Austin video server products.
GPFS parallel file system: general-purpose file system for commercial and technical computing on RS/6000 SP, AIX, and Linux clusters; recovery, online system management, byte-range locking, fast prefetch, parallel allocation, scalable directories, small-block random access; released as a product: 1.1 - 05/98, 1.2 - 12/98, 1.3 - 04/00.

Slide 7: What is Parallel I/O?
Multiple processes (possibly on multiple nodes) participate in the I/O.
Application-level parallelism.
The file is stored on multiple disks on a parallel file system.
A minimal sketch of application-level parallel I/O is shown below.
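
To make "multiple processes participate in the I/O" concrete, here is a minimal MPI-IO sketch in C. It is not GPFS-specific code and the file path is a hypothetical example; it assumes only a working MPI installation and a POSIX-like parallel file system mount, which is exactly the kind of access a parallel file system serves from many nodes concurrently.

```c
/* Minimal MPI-IO sketch: every rank writes its own block of one shared file.
 * Assumes an MPI library and a parallel file system at the (hypothetical)
 * path below; build with "mpicc pario.c -o pario" and launch with mpirun. */
#include <mpi.h>
#include <string.h>

#define BLOCK 4096  /* bytes written by each rank */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char buf[BLOCK];
    memset(buf, 'A' + (rank % 26), BLOCK);   /* fill with a rank-specific byte */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/gpfs/scratch/demo.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes at a disjoint offset, so all writes proceed in parallel. */
    MPI_Offset offset = (MPI_Offset)rank * BLOCK;
    MPI_File_write_at(fh, offset, buf, BLOCK, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```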

Slide 8: What must a parallel file system support?
A parallel file system must support:
Parallel I/O.
A consistent global name space across all nodes of the cluster, including a consistent view of the same file from every node.
A programming model that allows programs to access file data distributed over multiple nodes, from multiple tasks running on multiple nodes.
Physical distribution of data across disks and network entities eliminates bottlenecks at both the disk interface and the network, providing more effective bandwidth to the I/O resources.

Slide 9: Why use a general parallel file system?
Native AIX file system (JFS): no file sharing - an application can only access files on its own node, so applications must do their own data partitioning.
DCE Distributed File System: application nodes (DCE clients) share files on a server node; the switch is used as a fast LAN; coarse-grained (file- or segment-level) parallelism; the server node is a performance and capacity bottleneck.
GPFS parallel file system: GPFS file systems are striped across multiple disks on multiple storage nodes; independent GPFS instances run on each application node; the instances use storage nodes as "block servers", so all instances can access all disks. A sketch of block-level striping follows this slide.

Parallel file systems offer numerous advantages and address some key issues by providing:
Concurrent access to files by multiple nodes of a cluster. This prevents users from having to use the local disk of each node and then reassemble the output into either a coherent single file or a collection of multiple files (sometimes referred to as post-mortem reassembly).
Scalable performance. Parallel file systems are designed with scalability in mind: as clusters grow, more disks and more network connections can be incorporated into the fabric of the file system.
A single disk space where serial files and files created by parallel applications can coexist and be manipulated.
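
The sketch below illustrates the striping idea with a round-robin mapping from a file's logical block number to a disk. The block size and disk count are made-up parameters for the example; this is the general principle only, not GPFS's actual allocation map, which also handles fragments, replication, and rebalancing.

```c
/* Illustrative round-robin striping: which disk, and which block local to
 * that disk, a given file offset lands on. Parameters are assumptions for
 * the example; real configurations vary. */
#include <stdio.h>
#include <stdint.h>

#define BLOCK_SIZE (256 * 1024)   /* hypothetical file system block size */
#define NUM_DISKS  8              /* hypothetical number of striped disks */

typedef struct {
    int      disk;        /* which disk holds this block */
    uint64_t disk_block;  /* block index local to that disk */
} stripe_loc;

static stripe_loc locate(uint64_t file_offset)
{
    uint64_t file_block = file_offset / BLOCK_SIZE;     /* logical block number */
    stripe_loc loc = {
        .disk       = (int)(file_block % NUM_DISKS),    /* round-robin across disks */
        .disk_block = file_block / NUM_DISKS,
    };
    return loc;
}

int main(void)
{
    /* A 4 MiB sequential read touches 16 blocks spread over all 8 disks,
     * which is why aggregate bandwidth scales with the number of disks. */
    for (uint64_t off = 0; off < 4 * 1024 * 1024; off += BLOCK_SIZE) {
        stripe_loc loc = locate(off);
        printf("offset %8llu -> disk %d, block %llu\n",
               (unsigned long long)off, loc.disk,
               (unsigned long long)loc.disk_block);
    }
    return 0;
}
```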

Slide 10: Performance advantages of the GPFS file system
Using GPFS to store and retrieve your files can improve system performance by:
Allowing multiple processes or applications on all nodes in the cluster simultaneous access to the same file using standard file system calls.
Increasing the aggregate bandwidth of your file system by spreading reads and writes across multiple disks.
Balancing the load evenly across all disks to maximize their combined throughput; no one disk is more active than another.

Slide 11: Performance advantages of the GPFS file system (cont.)
Supporting very large file and file system sizes.
Allowing concurrent reads and writes from multiple nodes.
Allowing distributed token (lock) management. Distributing token management reduces the system delays associated with a lockable object waiting to obtain a token; a sketch of byte-range locking through the standard file system interface follows this slide.
Allowing separate networks to be specified for GPFS daemon communication and for GPFS administration commands within your cluster.
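
Byte-range locking is reached through the standard POSIX fcntl interface rather than a special API; on a GPFS mount the distributed token manager coordinates such locks across nodes behind the scenes. The sketch below is ordinary POSIX code with a hypothetical file path, shown here only to illustrate what a lockable byte range looks like to the application.

```c
/* POSIX byte-range lock on a region of a shared file. The application code
 * is the same as on a local file system; on a parallel file system the lock
 * must be honored cluster-wide. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/gpfs/scratch/shared.dat", O_RDWR);  /* hypothetical path */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct flock lk = {
        .l_type   = F_WRLCK,   /* exclusive write lock */
        .l_whence = SEEK_SET,
        .l_start  = 0,         /* lock only the first 64 KiB */
        .l_len    = 64 * 1024,
    };

    /* Blocks until no other process (on any node) holds a conflicting lock
     * on this byte range; the rest of the file stays available to others. */
    if (fcntl(fd, F_SETLKW, &lk) == -1) {
        perror("fcntl");
        return 1;
    }

    /* ... update the locked region ... */

    lk.l_type = F_UNLCK;       /* release the range */
    fcntl(fd, F_SETLK, &lk);
    close(fd);
    return 0;
}
```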

Slide 12: GPFS Architecture Overview
Implications of the shared-disk model:
All data and metadata reside on globally accessible disks (VSD).
All access to permanent data goes through the disk I/O interface.
Distributed protocols, e.g., distributed locking, coordinate disk access from multiple nodes.
Fine-grained locking allows parallel access by multiple clients.
Logging and shadowing restore consistency after node failures.

Slide 13: GPFS Architecture Overview (cont.)
Implications of large scale:
Supports up to 4096 disks of up to 1 TB each (4 petabytes); the largest system in production is 75 TB.
Failure detection and recovery protocols handle node failures.
Replication and/or RAID protect against disk or storage node failure.
On-line dynamic reconfiguration (add, delete, or replace disks and nodes; rebalance the file system).

Slide 14: GPFS Architecture - Special Node Roles
Three types of nodes: file system nodes, manager nodes, and storage nodes.
File system nodes: run user programs and read/write data to/from the storage nodes; implement the virtual file system interface; cooperate with manager nodes to perform metadata operations.
Manager nodes: global lock manager; file system configuration (recovery, adding disks, ...); disk space allocation manager; quota manager; file metadata manager, which maintains file metadata integrity.
Storage nodes: implement the block I/O interface; provide shared access for file system and manager nodes; interact with manager nodes for recovery (e.g., fencing). Data and metadata are striped across multiple disks on multiple storage nodes.

Slide 15: Disk Data Structures
A large block size allows efficient use of disk bandwidth; fragments reduce the space overhead for small files.
No designated "mirror" and no fixed placement function: flexible replication (e.g., replicate only metadata, or only important files) and dynamic reconfiguration, so data can migrate block by block.
Multi-level indirect blocks.
Each disk address is a list of pointers to replicas; each pointer is a disk id plus a sector number. An illustrative sketch of this layout is shown below.
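
The C sketch below only visualizes the bullet points above (a disk address is a list of replica pointers, each pointer a disk id plus a sector number, reached through multi-level indirect blocks). Field names, widths, and the replica limit are assumptions for illustration, not GPFS's actual on-disk format.

```c
/* Illustrative on-disk addressing, following the slide's description only.
 * Field widths and the maximum replica count are assumptions. */
#include <stdint.h>

#define MAX_REPLICAS 3          /* assumed upper bound for the sketch */

/* One pointer: disk id + sector number. */
typedef struct {
    uint16_t disk_id;           /* which disk in the file system */
    uint64_t sector;            /* sector number on that disk */
} replica_ptr;

/* One disk address: the list of replicas holding copies of a block. */
typedef struct {
    uint8_t     num_replicas;            /* how many copies exist */
    replica_ptr replicas[MAX_REPLICAS];  /* e.g., data and its mirror(s) */
} disk_addr;

/* An inode points at data blocks directly and, for large files, through
 * multi-level indirect blocks that themselves contain disk addresses. */
typedef struct {
    disk_addr direct[16];       /* first few data blocks of the file */
    disk_addr indirect;         /* block full of disk_addr entries */
    disk_addr double_indirect;  /* block of pointers to indirect blocks */
} inode_disk_layout;
```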

Slide 16: Availability and Reliability
Eliminates single points of failure.
Designed to transparently fail over token (lock) operations.
Supports data replication to increase availability in the event of a storage media failure.
Offers time-tested reliability and has been installed on thousands of nodes across industries.
Basis of many cloud storage offerings.

For optimal reliability, GPFS can be configured to eliminate single points of failure: the file system can remain available automatically in the event of a disk or server failure. GPFS is designed to transparently fail over token (lock) operations and other GPFS cluster services, which can be distributed throughout the entire cluster to eliminate the need for dedicated metadata servers, and it can be configured to recover automatically from node, storage, and other infrastructure failures. GPFS provides this functionality by supporting data replication to increase availability in the event of a storage media failure; multiple paths to the data in the event of a communications or server failure; and file system activity logging, which enables consistent, fast recovery after system failures. In addition, GPFS supports snapshots that provide a space-efficient image of a file system at a specified time, allowing online backup and helping protect against user error. GPFS offers time-tested reliability and has been installed on thousands of nodes across industries, from weather research to broadcasting, retail, financial industry analytics, and web service providers. It is also the basis of many cloud storage offerings.

Slide 17: GPFS's Achievements
Used on six of the ten most powerful supercomputers in the world, including the largest (ASCI White).
Installed at several hundred customer sites, on clusters ranging from a few nodes with less than a TB of disk up to 512 nodes with 140 TB of disk in two file systems.
20 filed patents.
File system of the ASC Purple supercomputer, which is composed of more than 12,000 processors and has 2 PB of total disk storage spanning more than 11,000 disks.

Slide 18: Conclusion
Efficient for managing large data volumes.
Provides world-class performance, scalability, and availability for your file data.
Designed to optimize the use of storage.
Provides a highly available platform for data-intensive applications.
Delivers on real business needs by streamlining data workflows and improving services while reducing cost and managing risk.
GPFS is already managing data volumes that most companies will not need to support for five years or more. You may not need a multi-petabyte file system today, but with GPFS you know you will have room to expand as your data volume increases.

Slide 19: References
"File System." Wikipedia, the Free Encyclopedia. Web. 20 Jan. 2012.
"IBM General Parallel File System for AIX: Administration and Programming Reference - Contents." IBM General Parallel File System for AIX. IBM. Web. 20 Jan. 2012.
"IBM General Parallel File System." Wikipedia, the Free Encyclopedia. Web. 20 Jan. 2012.
Intelligent Storage Management with IBM General Parallel File System. Issue brief. IBM, July 2010. Web. 21 Jan. 2012.
Mandler, Benny. Architectural and Design Issues in the General Parallel File System. IBM Haifa Research Lab, May 2005. Web. 21 Jan. 2012.
"NCSA Parallel File Systems." National Center for Supercomputing Applications at the University of Illinois. University of Illinois, 20 Mar. 2008. Web. 21 Jan. 2012.
Parallel File System. Rep. Dell Inc., May 2005. Web. 21 Jan. 2012.
Welch, Brent. "What Is a Cluster Filesystem?" Brent B Welch. Web. 21 Jan. 2012.

Here are the references.

Slide 20: Questions?