Lustre Replication and Migration Tool - EOFS · 2016. 9. 20.

ddn.com © 2016 DataDirect Networks, Inc. * Other names and brands may be claimed as the property of others. Any statements or representations around future events are subject to change.

Lustre Replication and Migration Tool

2016/09/20

DataDirect Networks Japan, Inc.

Shuichi Ihara ([email protected])

Background: Replication and Migration

▶ Data backup is important
• Many backup and replication requirements
• Lustre data needs to be migrated from old systems to new ones
• Time to think about backup for Lustre systems of a few PB
• Lustre replication is not ready yet

▶ Lustre tiering is possible
• Data migration from fast HDD (or SSD) to slow disks is possible
  o Different from fully automated HSM
• Lack of user-space utilities

Lustre data replication and migration

▶ Data replication (backup)
• Two independent and different namespaces (file systems)
• Asynchronous file-level replication

▶ Data migration
• Migrate data from one storage device to another type of device (e.g. SSD or fast HDD to slow HDD)
• Keep the same metadata and namespace
• Applications access data transparently even after migration

This talk introduces two utilities, ldsync and ldmigrate, for data replication and migration in Lustre.

Challenges on Backup and Replication

▶ Two major challenges for backup and replication on large file systems
• "Delta" detection between file systems
  o Determined by file attributes (mtime, size, checksum, etc.)
  o Cost depends on how many files are in the file systems
• Data transfer time
  o Copying many large files, as well as a large single shared file
  o Needs efficient resource allocation to maximize utilization
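For reference, attribute-based delta detection of the kind listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any tool in this talk: the function names `snapshot` and `delta` are hypothetical, and only size and mtime are compared (no checksum). It has to visit every file in both trees, which is why the cost grows with the number of files.

```python
import os


def snapshot(root):
    """Walk a tree and record (size, mtime_ns) for every non-directory entry,
    keyed by its path relative to root."""
    attrs = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            attrs[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return attrs


def delta(src_root, dst_root):
    """Return (paths to copy, paths to delete) needed to make dst match src."""
    src = snapshot(src_root)
    dst = snapshot(dst_root)
    # Copy anything that is missing on dst or whose attributes differ.
    to_copy = sorted(p for p, a in src.items() if dst.get(p) != a)
    # Delete anything that exists only on dst.
    to_delete = sorted(p for p in dst if p not in src)
    return to_copy, to_delete
```

Both snapshots require one stat call per file, so the metadata load is proportional to the total file count even when almost nothing has changed.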

Major Copy Tools: RSYNC and DCP

▶ RSYNC
• Has been maintained for more than 10 years and is packaged in Linux distributions
• Supports many features, but lacks parallelization

▶ DCP (part of fileutils, http://fileutils.io)
• Designed for scalability and performance
• Started as a collaboration among several large US laboratories and DDN, which was involved from the beginning
• Supports MPI and any POSIX file system
• Splits files into chunks and allocates them efficiently to MPI ranks for copying, to maximize resource utilization
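The idea of splitting files into chunks and spreading them over MPI ranks can be illustrated without MPI. The sketch below is not DCP's actual algorithm: the function names are hypothetical, the chunk size in the usage example is arbitrary, and round-robin assignment is a simplifying assumption chosen only to show how one large file can keep many ranks busy.

```python
def split_into_chunks(file_size, chunk_size):
    """Return (offset, length) pairs that together cover file_size bytes."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


def assign_round_robin(chunks, num_ranks):
    """Deal chunks out to ranks in turn so each rank gets a near-equal share."""
    work = {rank: [] for rank in range(num_ranks)}
    for i, chunk in enumerate(chunks):
        work[i % num_ranks].append(chunk)
    return work


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Hypothetical sizes, chosen only for illustration.
    chunks = split_into_chunks(file_size=200 * MiB, chunk_size=64 * MiB)
    for rank, items in assign_round_robin(chunks, num_ranks=2).items():
        print(rank, items)
```

Because work is divided by byte range rather than by file, a single shared file and a directory of many files can be handled by the same scheduling step.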

Accelerating file system delta detection

▶ "Diff" detection between two file systems
• RSYNC takes a few hours for delta detection when the file system holds tens of millions of inodes
• DCP (DCMP) in fileutils is much faster, but still consumes a lot of time and puts pressure on metadata

▶ Lustre Changelog to the rescue
• The Lustre Changelog records events that change the file system namespace or file metadata (timestamp, FID and operation)
• It is kept on the MDT and can be fetched from a Lustre client
• No more file system scanning, except for the initial copy!
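To show why a change log removes the need for scanning, here is a minimal sketch that reduces a stream of (timestamp, FID, operation) records to one pending action per FID. It is not ldsync's analyzer: the `Record` type and the `collapse` function are hypothetical, the records are plain in-memory tuples rather than real Changelog output, and the operation names are used only as illustrative labels.

```python
from collections import namedtuple

# Hypothetical in-memory form of a change record: timestamp, FID, operation.
Record = namedtuple("Record", "timestamp fid op")


def collapse(records):
    """Reduce change records to one pending action per FID.

    Only the most recent operation on each FID matters: if it is an unlink,
    the file should be removed from the backup; otherwise it should be copied.
    """
    last_op = {}
    for rec in sorted(records, key=lambda r: r.timestamp):
        last_op[rec.fid] = rec.op
    to_delete = {fid for fid, op in last_op.items() if op == "UNLNK"}
    to_copy = set(last_op) - to_delete
    return to_copy, to_delete
```

The work done here is proportional to the number of changes since the last run, not to the number of files in the file system. A real tool would also have to handle renames and files that were created and removed within the same window.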

ldsync : Parallel synchronization tool based on Lustre changelog

▶ A similar tool, lustre_rsync, exists, but...
• It is single-threaded and still based on rsync
• Partial Changelog support

▶ ldsync is a replacement for lustre_rsync
• A parallel synchronization framework that includes a Lustre Changelog analyzer
• The Changelog analyzer walks through the Changelog and issues the minimum number of stat() calls needed to determine which files to sync
• Flexible backend copy tool support (uses DCP for now, but any native copy tool is possible)

Architecture of ldsync

1. Fetch the Changelog from the MDT and analyze it
2. Issue a minimum number of stat() calls to the MDS to get additional metadata information
3. Copy files with "dcp" and unlink files from the backup file system with "drm"
4. Clear the old Changelog
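The four steps can be sketched as one synchronization cycle. This is a structural illustration only, not ldsync's code: `sync_cycle` and all five callables are hypothetical stand-ins (for fetching the Changelog, stat, dcp, drm and clearing the Changelog), and records are simplified to (record id, path) pairs, whereas a real Changelog identifies files by FID.

```python
def sync_cycle(fetch_changelog, stat_source, copy_files, remove_files,
               clear_changelog):
    """Run one fetch / stat / copy / clear cycle and return the record count."""
    # Step 1: fetch pending records as (record_id, path) pairs.
    records = fetch_changelog()
    if not records:
        return 0

    # Step 2: stat each distinct path once to decide between copy and unlink.
    to_copy, to_delete, seen = [], [], set()
    for _record_id, path in records:
        if path in seen:
            continue
        seen.add(path)
        if stat_source(path) is None:
            to_delete.append(path)   # gone from the primary file system
        else:
            to_copy.append(path)     # present, so it must be (re)copied

    # Step 3: hand the lists to the copy and unlink backends.
    copy_files(to_copy)
    remove_files(to_delete)

    # Step 4: clear records only after the backends returned without error.
    clear_changelog(max(record_id for record_id, _ in records))
    return len(records)
```

Clearing the log last means that a failed copy raises before any record is discarded, so the same records are seen again on the next cycle.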

[Diagram: Primary Lustre file system, Data Mover Node, Backup Lustre file system.]

The Data Mover mounts both the primary (read-only) and the backup file system.

Delta detection speed in two file systems

[Chart: ldsync (Changelog) and rsync (without copy). X axis: number of files (millions), 1.2 to 12. Y axis: time (sec), 0 to 1200. Series: ldsync, rsync. Annotation: 25x reduction.]

ldsync : Experimental performance (Many small files)

Creation of 1 million 4 KB files and synchronization between two file systems

[Chart: Y axis: time (sec), 0 to 450. Bars: ldsync (1 node); ldsync (1 node) + Optimized Changelog Walk; ldsync (8 nodes) + Optimized Changelog Walk. Bar components: Data Copy, Changelog Analysis, Changelog Scan. Annotations: Changelog Scanning (2 sec); 3x reduction; 2.5x reduction.]

ldsync : Experimental performance (1TB single shared file)

[Chart: Y axis: time (sec), 0 to 5000. Bars: Rsync, DCP (1n1p), DCP (1n4p), DCP (1n8p), DCP (2n16p), DCP (4n32p), DCP (8n64p), DCP (16n128p), grouped as Single Client and Multiple Clients. Annotations: 9x reduction; copy speed: ~80% of H/W bandwidth (10 GB/sec).]

Primary System: 1 x DDN ES7K (140 x NLSAS), 2 x FDR
Backup System: 1 x DDN ES7K (140 x NLSAS), 2 x FDR

ldmigrate : Parallel data migration tool for Lustre

▶ Keep metadata, but change OST object placement
• e.g. migrate data from an SSD OST pool to HDD
• "lfs migrate" can do this, but with limited scalability

▶ ldmigrate
• Migrates OST objects to other OSTs (OST pool) based on the Lustre data layout (which determines how data is placed on OSTs)
• Parallelization and scalability
• Integration with a job scheduler is possible

Architecture of ldmigrate

[Diagram: two panels, each showing the MDT, FileA under /scratch/user/, TmpFile, and their objects on OSTs in OST pool "SSD" and OST pool "HDD".]

1. Parallel data copy: acquire a group lock and start copying data to tmp files with dcp.
2. Swap Lustre layout (after the copy finishes): swap the layouts so that FileA's objects are now on OSTs of OST pool "HDD", then remove the tmp files.
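The two phases in the diagram (parallel data copy, then layout swap) can be modelled with a toy in-memory namespace. This is only a conceptual sketch, not how ldmigrate or Lustre is implemented: the `migrate` function and the dictionary model are hypothetical, the copy is a plain assignment standing in for the parallel dcp copy, and the group lock is omitted.

```python
def migrate(namespace, path, tmp_path, target_pool):
    """Move the data behind `path` to `target_pool` without changing its name.

    `namespace` maps a file name to its layout, modelled here as a dict with
    the pool that holds the objects and the data stored in them.
    """
    source = namespace[path]

    # Phase 1: create a tmp file on the target pool and copy the data into it.
    namespace[tmp_path] = {"pool": target_pool, "data": source["data"]}

    # Phase 2: swap layouts, so `path` now points at the target-pool objects
    # and the tmp name points at the old objects.
    namespace[path], namespace[tmp_path] = namespace[tmp_path], namespace[path]

    # Removing the tmp file releases the old objects.
    del namespace[tmp_path]
```

The file name never changes during the swap, which is what lets applications keep accessing the same path after migration.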

How to Set Up Tiered OST Pools and Migration

▶ Creating OST pools for different types of device
[root@mds ~]# lctl pool_new scratch.SSD
[root@mds ~]# lctl pool_new scratch.HDD
[root@mds ~]# lctl pool_add scratch.SSD OST[0-9]
[root@mds ~]# lctl pool_add scratch.HDD OST[a-13]

▶ Assign OST pool to directory
[root@client ~]# lfs setstripe -p SSD /scratch/user
[root@client ~]# lfs getstripe -p /scratch/user/file*
SSD

▶ Copy and layout change
[root@client ~]# ldmigrate -g 100 -m -o SSD /scratch/user /scratch/tmp
[root@client ~]# lfs getstripe -p /scratch/user/file*
HDD

Conclusions

▶  Introduced ldsync and ldmigrate for backup and data migration.

▶  Demonstrated huge performance improvements compared to existing tools and techniques.

▶ Several performance optimizations and stability issues are still under investigation, and more testing is required.

▶ These tools are still in a private repository on GitHub, but we plan to publish them as open source or push patches to fileutils.

Thank you!