VENDOR LOCK-IN-FREE STORAGE AT MICHIGAN STATE UNIVERSITY
GREG MASON
INSTITUTE FOR CYBER-ENABLED RESEARCH
WHO AM I
• Sysadmin at MSU for over 6 years
• A couple of years in industry before that, doing operations.
• Primary engineer for HPC storage
• On the internet: [email protected], @nodoubleg
WHAT IS HPC?
• High Performance Computing
• Built with fast CPUs, low-latency, high-bandwidth networks, fast storage, and batch job schedulers
MSU’S SCALE
• ~7,600 cores
• ~50TB RAM
• ~2PB storage
• ~2,000 software titles installed
MSU’S HPC WORKLOAD
• We serve everybody, from Ag Econ to Zoology
• Tuning anything for a specific workload is futile
• Chemistry
• Bioinformatics
MSU’S HPC STORAGE
• Persistent storage is all ZFS. Reasonably fast, reasonably available, cheap. Always safe*
• High-speed parallel storage is Lustre, currently backed by ldiskfs (a modified ext4). Fast, but only moderately reliable/safe.
• A NetApp filer supports the VMware environment
ZFS AT MSU
• Running in production since 2009: OpenSolaris then, OpenZFS now.
• Over 1.5PB in production at iCER
• Even using it in odd places
TOPICS
• Benefits of ZFS
• Overview of ZFS
• Platforms with ZFS
• Build a ZFS-based system
• Potential pitfalls
• ZFS alternatives, if you must
• Storage of the future
(some) BENEFITS OF ZFS
• Checksum ALL THE THINGS!!1
• Integrated raid understands the objects it stores
• Copy-on-write transactions are atomic
• Snapshots
• Reduces hardware costs: no hardware RAID controllers, just commodity disks behind plain HBAs
• Simplified administration:
  zfs set refquota=3T tank/filesystem
  zfs snapshot tank/filesystem@beforeupgrade
  zpool status
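A toy Python model of the first two bullets, assuming a two-way mirror and SHA-256 (one of the checksum algorithms ZFS supports; fletcher4 is the usual default). The `Mirror` and `BlockPointer` names are hypothetical. What it illustrates: because the checksum lives in the parent block pointer rather than next to the data, the filesystem can tell which copy is bad and rewrite it from the good one, something a raid layer that knows nothing about the data cannot do.

```python
import hashlib
from dataclasses import dataclass


def checksum(data: bytes) -> bytes:
    # SHA-256 stands in for whichever checksum the dataset is configured to use.
    return hashlib.sha256(data).digest()


@dataclass
class BlockPointer:
    # Hypothetical: the checksum is stored with the pointer (in the parent), not beside the data.
    address: int
    cksum: bytes


class Mirror:
    """Toy two-way mirror; each side maps a block address to its bytes."""

    def __init__(self) -> None:
        self.sides = [{}, {}]
        self.next_address = 0

    def write(self, data: bytes) -> BlockPointer:
        addr = self.next_address
        self.next_address += 1
        for side in self.sides:
            side[addr] = data
        return BlockPointer(addr, checksum(data))

    def read(self, bp: BlockPointer) -> bytes:
        # Return the first copy whose checksum matches the one in the pointer.
        good = next(
            (side[bp.address] for side in self.sides
             if checksum(side[bp.address]) == bp.cksum),
            None,
        )
        if good is None:
            raise OSError(f"unrecoverable checksum error at block {bp.address}")
        # Self-heal: overwrite any copy that failed verification.
        for side in self.sides:
            if checksum(side[bp.address]) != bp.cksum:
                side[bp.address] = good
        return good


if __name__ == "__main__":
    mirror = Mirror()
    bp = mirror.write(b"important research data")
    mirror.sides[0][bp.address] = b"importAnt research data"  # simulated silent corruption
    print(mirror.read(bp))              # good data, served from side 1
    print(mirror.sides[0][bp.address])  # side 0 has been repaired
```

If both copies fail verification, the read raises instead of silently returning bad data.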
OVERVIEW OF ZFS COMPONENTS
• pool: A collection of devices that provides storage for data managed by ZFS
• vdev: A top-level device in a pool. Can be a plain disk, raid group (raidz), or mirror.
• dataset: A zvol or filesystem
• zvol: A block device presented to the OS
• filesystem: A plain ol’ POSIX filesystem
• snapshot: A read-only, copy-on-write reference to a dataset at a point in time. Not a full copy.
• zil/log/slog/logzilla: ZFS Intent Log. Synchronous writes that have not yet been committed to the pool in a transaction group are logged here; a dedicated log device (SLOG) can hold it. Only read when recovering from an unclean shutdown. Not a write buffer.
• ARC/primarycache: Adaptive Replacement Cache. Much of the smarts behind ZFS performance. Not just a dumb page cache: it balances recently used against frequently used data. Resides in RAM.
• l2arc/cache/secondarycache: A second-level ARC on a block device, commonly an SSD. Blocks evicted from the ARC may be cached on the L2ARC.
• For more info: http://bit.ly/zfsdocs
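To make the copy-on-write and snapshot definitions concrete, here is a toy Python model. All names are hypothetical, and it simplifies heavily: a snapshot here copies a flat map of pointers, whereas real ZFS just keeps the old root of its block tree, which is cheaper still. What it does show is that writes never overwrite a block in place, a snapshot copies no data, and an old block is freed only once nothing references it.

```python
from typing import Dict, Optional


class Dataset:
    """Toy copy-on-write dataset: block maps pointing into a pool of immutable blocks."""

    def __init__(self) -> None:
        self.pool: Dict[int, bytes] = {}                # block id -> data, never modified in place
        self.live: Dict[int, int] = {}                  # offset -> block id for the live dataset
        self.snapshots: Dict[str, Dict[int, int]] = {}  # snapshot name -> frozen block map
        self.next_id = 0

    def _snapshot_refs(self) -> set:
        refs = set()
        for block_map in self.snapshots.values():
            refs.update(block_map.values())
        return refs

    def write(self, offset: int, data: bytes) -> None:
        old = self.live.get(offset)
        # Copy-on-write: allocate a new block, then repoint the live map at it.
        self.pool[self.next_id] = data
        self.live[offset] = self.next_id
        self.next_id += 1
        # The old block is freed only if no snapshot still references it.
        if old is not None and old not in self._snapshot_refs():
            del self.pool[old]

    def snapshot(self, name: str) -> None:
        # Copies pointers only; no data blocks are duplicated.
        self.snapshots[name] = dict(self.live)

    def read(self, offset: int, snapshot: Optional[str] = None) -> bytes:
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.pool[block_map[offset]]

    def destroy_snapshot(self, name: str) -> int:
        """Drop a snapshot, free the blocks only it referenced, and return how many were freed."""
        del self.snapshots[name]
        in_use = set(self.live.values()) | self._snapshot_refs()
        unreferenced = [bid for bid in self.pool if bid not in in_use]
        for bid in unreferenced:
            del self.pool[bid]
        return len(unreferenced)
```

For example, write `b"v1"`, take a snapshot named `beforeupgrade` (echoing the command on the benefits slide), then write `b"v2"`: the live dataset reads `v2`, the snapshot still reads `v1`, and the pool holds exactly two blocks until the snapshot is destroyed.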
PLATFORMS WITH ZFS
• OpenZFS
  • Illumos
  • FreeBSD
  • Linux
  • Mac OS X
• Oracle ZFS
  • ZFS Storage Appliance
  • Solaris 11
BUILD A ZFS-BASED SYSTEM
• You want trustworthy HBAs, disks, and NICs.
  • I use LSI HBAs with IT (initiator-target) firmware, so ZFS sees the raw disks.
  • My NICs are Mellanox and Intel.
• Hardware spec isn’t scary!
  • Illumos HCL: http://illumos.org/hcl/
  • FreeBSD & Linux: anything these run on. Both tend to have better hardware vendor support.
RECOMMENDED CONFIG
• Disk enclosures: Quanta M4600H, Seagate 84-drive JBOD, Sanmina JBODs, or Supermicro SAS JBODs.
• Servers: any decent 2-socket Intel server with lights-out management, and lots of ECC RAM for the ARC.
• Network: at least 10-gig. Investigate 40-gig Ethernet or InfiniBand (IB).
• Be sure the number of disks, and of vdevs, meets the performance requirement: as a rule of thumb, random IOPS scale with the number of vdevs rather than the number of disks.
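A back-of-the-envelope Python sketch of the last bullet, assuming a pool built from identical raidz vdevs. The function name and the per-disk figures in the example are hypothetical, and the rules of thumb baked in (each raidz vdev gives roughly one disk's worth of random IOPS; streaming bandwidth scales with data disks; usable space ignores metadata, padding, and reserved space) are approximations, not guarantees.

```python
from dataclasses import dataclass


@dataclass
class PoolEstimate:
    vdevs: int
    spares: int
    usable_tb: float
    random_iops: float
    streaming_mb_s: float


def estimate_raidz_pool(disks: int, width: int, parity: int,
                        disk_tb: float, disk_iops: float, disk_mb_s: float) -> PoolEstimate:
    """Rough numbers for a pool of identical raidz vdevs (rules of thumb, not guarantees)."""
    if width <= parity:
        raise ValueError("vdev width must exceed parity")
    vdevs = disks // width
    spares = disks - vdevs * width              # leftover disks, e.g. for hot spares
    data_disks = vdevs * (width - parity)
    return PoolEstimate(
        vdevs=vdevs,
        spares=spares,
        usable_tb=data_disks * disk_tb,         # ignores metadata, padding, reserved space
        random_iops=vdevs * disk_iops,          # ~one disk's random IOPS per raidz vdev
        streaming_mb_s=data_disks * disk_mb_s,  # before HBA/network limits
    )


if __name__ == "__main__":
    # Hypothetical drives: 6 TB, 150 random IOPS, 150 MB/s sequential.
    for width in (10, 6):
        print(f"84 disks as raidz2 x{width}:", estimate_raidz_pool(84, width, 2, 6, 150, 150))
```

With those hypothetical drives, an 84-drive JBOD as 8 × 10-wide raidz2 gives about 384 TB usable and roughly 1,200 random IOPS, while 14 × 6-wide raidz2 trades capacity (about 336 TB) for roughly 2,100 IOPS. Either layout's streaming figure is far beyond what one 10-gig link (about 1,250 MB/s) can carry, which is one reason to look at 40-gig Ethernet or InfiniBand.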
POTENTIAL PITFALLS
• Using cheap SATA hard drives
• Using SAS expanders with SATA drives
• Improperly sized raid stripes
• Picking the wrong SSDs for acceleration
• Using the wrong disk multipathing strategy/algorithm
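A Python sketch of why raid stripe width matters on raidz. It follows the allocation arithmetic in OpenZFS's `vdev_raidz_asize()` (worth checking against the source for your OpenZFS version): a block's data sectors, plus `parity` sectors per row of `width - parity` data sectors, rounded up to a multiple of `parity + 1`. The function names are hypothetical.

```python
def raidz_alloc_sectors(psize: int, ashift: int, width: int, parity: int) -> int:
    """Sectors a raidz vdev of `width` disks allocates for one block of `psize` bytes."""
    if width <= parity:
        raise ValueError("vdev width must exceed parity")
    sector = 1 << ashift                        # ashift=12 means 4 KiB sectors
    data = (psize + sector - 1) // sector       # data sectors, rounded up
    data_cols = width - parity
    rows = (data + data_cols - 1) // data_cols  # stripe rows needed for the data
    total = data + parity * rows                # one set of parity sectors per row
    # Pad up to a multiple of (parity + 1), as OpenZFS does.
    return (total + parity) // (parity + 1) * (parity + 1)


def space_efficiency(psize: int, ashift: int, width: int, parity: int) -> float:
    """Fraction of the allocation that holds data rather than parity or padding."""
    sector = 1 << ashift
    data = (psize + sector - 1) // sector
    return data / raidz_alloc_sectors(psize, ashift, width, parity)


if __name__ == "__main__":
    for width in (6, 7, 8, 10):
        eff = space_efficiency(128 * 1024, 12, width, 2)
        ideal = (width - 2) / width
        print(f"raidz2 x{width}: 128K records use {eff:.1%} for data (nominal {ideal:.1%})")
```

With 4 KiB sectors and 128 KiB records, a 6-wide raidz2 allocates 48 sectors per record (32 data + 16 parity), matching its nominal 4/6 efficiency. A 7-wide raidz2 also allocates 48 sectors (32 data + 14 parity + 2 padding), so the extra disk buys no capacity at that record size. Small 8 KiB blocks on raidz2 take 6 sectors to store 2 sectors of data.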
ZFS ALTERNATIVES (if you must)
• btrfs
• ReFS
• GPFS
• HAMMER
• Ceph
BTRFS
• Default filesystem for some Linux distros. Only very recently considered stable.
• Features checksums, mirroring, and integrated double-parity raid that is still maturing.
• Can shrink the “array” or pool of disks, thanks to code reused from Linux MD raid.
• As of kernel 3.10: “mostly works OK”, “typically doesn’t corrupt itself”
• As of kernel 4.0, things are looking better-ish
ReFS
• Proprietary, successor to NTFS
• Works with Storage Spaces in Windows
• Supports most NTFS features
• 64-bit checksums for metadata, stored separately from the blocks they protect. The same for file data, when enabled (integrity streams).
• Keeps running even after checksum failures, allowing for online recovery
• Performance is very low when data checksums are enabled
GPFS
• Proprietary parallel filesystem from IBM
• Similar to Lustre on ZFS: parallel filesystem with integrated raid, checksumming, and compression.
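• Better raid implementation (declustered raid): data and parity are spread across all disks, so rebuild work is spread across them too and finishes sooner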
• Excellent policy-driven data movement, and truly global namespaces
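A toy Python comparison of rebuild load under conventional raid groups versus a declustered layout. This is not GPFS's actual placement algorithm: as a simplifying assumption, the declustered layout uses every possible combination of `width` disks as a stripe, which spreads stripes perfectly evenly, and rebuilding a chunk is assumed to read every surviving chunk in its stripe. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def declustered_layout(n_disks: int, width: int) -> list:
    """Every width-sized combination of disks is one stripe: a perfectly balanced toy layout."""
    return list(combinations(range(n_disks), width))


def grouped_layout(n_disks: int, width: int, stripes_per_group: int) -> list:
    """Conventional raid: disks split into fixed groups, each stripe confined to one group."""
    layout = []
    for start in range(0, n_disks - width + 1, width):
        group = tuple(range(start, start + width))
        layout.extend([group] * stripes_per_group)
    return layout


def rebuild_reads(layout: list, failed: int) -> Counter:
    """Chunks each surviving disk must read to rebuild every chunk on the failed disk."""
    reads = Counter()
    for stripe in layout:
        if failed in stripe:
            reads.update(disk for disk in stripe if disk != failed)
    return reads


if __name__ == "__main__":
    n, w = 12, 4
    declustered = declustered_layout(n, w)
    # 165 stripes per group gives every disk the same 165 chunks as the declustered layout.
    grouped = grouped_layout(n, w, stripes_per_group=165)
    print("grouped:    ", dict(rebuild_reads(grouped, failed=0)))
    print("declustered:", dict(rebuild_reads(declustered, failed=0)))
```

With 12 disks and 4-wide stripes, both layouts put 165 chunks on each disk, but rebuilding disk 0 makes its three group-mates read 165 chunks each in the grouped layout, versus 45 chunks on each of the 11 surviving disks in the declustered one, so the rebuild load is spread thinner.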
HAMMER
• Default filesystem in DragonFly BSD
• All data is CRC-checked, with a smaller checksum than ZFS uses; it is designed for bit rot detection, not blind data verification.
• Raid is left to other software or devices, so a bit flip on a raid array is not easily recoverable: HAMMER can detect the bad block but has no redundant copy of its own to repair it from.
• Single-file history is accessible with the undo command
• Smallest maximum filesystem size of the alternatives, at “only” 1 exabyte
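A small Python illustration of the checksum-size point, assuming a 32-bit CRC comparable to zlib's CRC-32 (HAMMER's exact CRC variant may differ) versus a 256-bit checksum such as SHA-256, which fills ZFS's 256-bit checksum field. Both catch a single flipped bit; the difference is how likely arbitrary corruption is to slip through by chance. The `flip_bit` helper is hypothetical.

```python
import hashlib
import zlib


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (simulated bit rot)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)


block = bytes(range(256)) * 16  # a 4 KiB block of sample data
rotted = flip_bit(block, 12345)

crc_before, crc_after = zlib.crc32(block), zlib.crc32(rotted)
sha_before, sha_after = hashlib.sha256(block).digest(), hashlib.sha256(rotted).digest()

# CRC-32 is guaranteed to catch any single-bit error (and any burst of up to 32 bits).
print("CRC-32 detects flip:", crc_before != crc_after)
print("SHA-256 detects flip:", sha_before != sha_after)

# Against arbitrary corruption, the chance a random change goes unnoticed
# is roughly 2**-bits for a well-mixed checksum of that width.
print(f"CRC-32 miss rate  ~ 2**-32  = {2**-32:.1e}")
print(f"SHA-256 miss rate ~ 2**-256 = {2**-256:.1e}")
```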
CEPH
• A distributed data storage system, not a filesystem
• Superb object store (RADOS) and block device provider (RBD)
• Objects are the way of the future
STORAGE OF THE FUTURE?
• ZFS still plays an important role in persistent data storage and robust POSIX filesystems.
• Vendors are publicly committing to Lustre on ZFS.
• The future is object stores: Ceph, Amazon S3, Microsoft Azure, even objects on Lustre.
• Networks will unify, bringing unified storage with them: InfiniBand and Ethernet will converge.
MORE INFORMATION
• OpenZFS: http://www.open-zfs.org
• me: [email protected], @nodoubleg
QUESTIONS?
http://bit.ly/zfsdocs