Functional Assessment of Erasure Coded Storage Archive
Blair Crossman, Taylor Sanchez, Josh Sackos
LA-UR-13-25967
Computer Systems, Cluster, and Networking Summer Institute
Presentation Overview
• Introduction
• Caringo Testing
• Scality Testing
• Conclusions
Storage Mediums
• Tape
  o Priced for capacity, not bandwidth
• Solid State Drives
  o Priced for bandwidth, not capacity
• Hard Disk
  o Bandwidth scales with more drives
Object Storage: Flexible Containers
• Files are stored in data containers
• Metadata is kept outside of the file system
• Key-value pairs
• File system scales with machines
• METADATA EXPLOSIONS!!
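To make the key-value idea concrete, here is a minimal in-memory sketch. It assumes nothing about either product's API: `ObjectStore` and its `put`/`get`/`head` methods are hypothetical names. It only shows the payload and its metadata living in separate key-value maps, so metadata can be read without touching the object.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Toy object store: payloads and metadata live in separate key-value maps."""

    data: dict[str, bytes] = field(default_factory=dict)
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)

    def put(self, key: str, blob: bytes, **meta: str) -> None:
        # The payload goes into a flat data container, addressed only by key...
        self.data[key] = blob
        # ...while its metadata is stored separately as key-value pairs.
        self.metadata[key] = {"size": str(len(blob)), **meta}

    def get(self, key: str) -> bytes:
        return self.data[key]

    def head(self, key: str) -> dict[str, str]:
        # Metadata can be read without touching the object itself.
        return self.metadata[key]


store = ObjectStore()
store.put("run42/checkpoint.0001", b"\x00" * 1024, owner="hpc5", project="archive")
print(store.head("run42/checkpoint.0001"))  # {'size': '1024', 'owner': 'hpc5', 'project': 'archive'}
```

Because nothing ties the metadata map to a directory tree, each object can carry as many user-defined keys as it likes, which is both the flexibility and the "explosion" risk.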
What is the problem?
• RAID, replication, and tape systems were not designed for exascale computing and storage
• Hard disk capacity continues to grow, so rebuilding a failed drive takes longer
• Solution to multiple hard disk failures is needed
Erasure Coding: Reduce, Rebuild, Recalculate
Reduce! Rebuild! Recalculate!
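The slide is graphical, so a worked sketch of the idea may help. The code is a toy systematic Reed-Solomon-style code over the prime field GF(257), not what Caringo or Scality actually implement: data is split into n shards, k parity shards are computed by evaluating the interpolating polynomial at extra points, and any n surviving shards rebuild the data. It assumes n counts data shards and k parity shards (with n=3, k=3 as on our test systems, the two readings give the same numbers); `encode` and `decode` are hypothetical helpers.

```python
P = 257  # prime modulus: every byte value 0..255 is a field element


def _lagrange(points: list[tuple[int, int]], x: int) -> int:
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data: bytes, n: int, k: int) -> list[list[int]]:
    """Split data into n data shards and append k parity shards (systematic code)."""
    if n + k > P:
        raise ValueError("toy code: n + k must not exceed 257")
    if len(data) % n:
        raise ValueError("pad the data to a multiple of n first")
    size = len(data) // n
    shards = [list(data[i * size:(i + 1) * size]) for i in range(n)]
    for x in range(n, n + k):
        # Parity shard x: the polynomial through each data column, evaluated at x.
        shards.append([_lagrange([(i, shards[i][c]) for i in range(n)], x)
                       for c in range(size)])
    return shards


def decode(survivors: dict[int, list[int]], n: int) -> bytes:
    """Rebuild the original bytes from any n surviving shards, keyed by shard index."""
    if len(survivors) < n:
        raise ValueError("fewer than n shards survive: data cannot be rebuilt")
    ids = sorted(survivors)[:n]
    size = len(survivors[ids[0]])
    out = bytearray()
    for x in range(n):  # data shards sit at x = 0..n-1
        for c in range(size):
            out.append(_lagrange([(i, survivors[i][c]) for i in ids], x))
    return bytes(out)


shards = encode(b"erasure!!", n=3, k=3)  # 3 data shards + 3 parity shards
lost = {0, 2, 4}                          # lose any three shards
survivors = {i: s for i, s in enumerate(shards) if i not in lost}
print(decode(survivors, n=3))             # b'erasure!!'
```

Production systems use Galois fields such as GF(2^8) so parity symbols fit in a byte; the prime field here just keeps the arithmetic readable.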
Project Description
• Erasure coded object storage file system is a potential replacement for LANL’s tape archive system
• Installed and configured two prototype archives
  o Scality
  o Caringo
• Verified the functionality of both systems
Functionality Not Performance
Caringo
  o SuperMicro admin node
  o 1GigE interconnect
  o 10 IBM System x3755 nodes
    § 4 x 1TB HDD each
  o Erasure coding: n=3, k=3
Scality
  o SuperMicro admin node
  o 1GigE interconnect
  o 6 HP ProLiant DL160 G6 nodes
    § 4 x 1TB HDD each
  o Erasure coding: n=3, k=3
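A back-of-the-envelope calculation for these two configurations, as a sketch only. It assumes each node holds the four 1TB drives listed, that n is the data-shard count and k the parity count, and it ignores file system and OS overhead, so `usable_tb` is an upper bound. `ec_summary` is a hypothetical helper.

```python
def ec_summary(n: int, k: int, nodes: int, disks_per_node: int, disk_tb: float) -> dict[str, float]:
    """Raw/usable capacity and fault tolerance for an n + k erasure-coded cluster."""
    raw_tb = nodes * disks_per_node * disk_tb
    overhead = (n + k) / n  # raw bytes written per byte of user data
    return {
        "raw_tb": raw_tb,
        "overhead": overhead,
        "usable_tb": raw_tb / overhead,   # upper bound: ignores FS/OS overhead
        "shard_losses_tolerated": k,      # any k of the n + k shards may be lost
    }


caringo = ec_summary(n=3, k=3, nodes=10, disks_per_node=4, disk_tb=1.0)
scality = ec_summary(n=3, k=3, nodes=6, disks_per_node=4, disk_tb=1.0)
print(caringo)  # {'raw_tb': 40.0, 'overhead': 2.0, 'usable_tb': 20.0, 'shard_losses_tolerated': 3}
print(scality)  # {'raw_tb': 24.0, 'overhead': 2.0, 'usable_tb': 12.0, 'shard_losses_tolerated': 3}
```

Under these assumptions both prototypes write two bytes of raw disk per byte of user data ((3 + 3) / 3 = 2) and survive the loss of any three shards of an object, which is the same space cost as 2-way replication but with three-failure tolerance instead of one.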
Project Testing Requirements
• Data: ingest, retrieval, balance, rebuild
• Metadata: accessibility, customization, query
• POSIX Gateway: read, write, delete, performance overhead
How We Broke Data
• Pulled out HDDs (on Scality, killed the daemon instead)
• Turned off nodes
• Uploaded files, downloaded files
• Used md5sum to compare originals to downloaded copies
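The comparison step can be scripted. A minimal sketch, assuming the originals and the downloaded copies sit in two local directories with matching file names (a hypothetical layout); it uses Python's `hashlib.md5`, which yields the same digest as the `md5sum` command.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_roundtrip(originals: Path, downloads: Path) -> list[str]:
    """Return names of files whose downloaded copy is missing or differs."""
    bad = []
    for src in sorted(originals.iterdir()):
        if not src.is_file():
            continue
        copy = downloads / src.name
        if not copy.exists() or md5sum(src) != md5sum(copy):
            bad.append(src.name)
    return bad
```

For example, `verify_roundtrip(Path("originals"), Path("downloads"))` returning an empty list means every file came back bit-for-bit intact after drives or nodes were taken away.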
Caringo: The automated storage system
• Warewulf/Perceus-like diskless (RAM) boot
• Reconfigurable, but changes require a reboot
• Provisioned via DHCP and PXE boot
• Little flexibility or customizability
• http://www.caringo.com
No Node Specialization
• Nodes "bid" for tasks
  o Lowest latency wins
  o Distributes the work
• Each node performs all tasks: administrator, compute, storage
• Automated power management
  o Set a sleep timer
  o Set an interval to check disks
• Limited administration options
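A toy model of the bidding scheme. It assumes, hypothetically, that a node's bid is its base latency plus a fixed cost per task already queued on it; Caringo's real bid calculation is not described in these slides, and `Bid`, `award`, and `schedule` are illustrative names. Because a busy node bids higher, the "lowest latency wins" rule by itself spreads work across nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    node: str
    latency_ms: float


def award(bids: list[Bid]) -> str:
    """Lowest latency wins; on a tie, the first bid received wins."""
    if not bids:
        raise ValueError("no node bid for the task")
    return min(bids, key=lambda b: b.latency_ms).node


def schedule(tasks: int, base_ms: dict[str, float], per_task_ms: float = 5.0) -> dict[str, int]:
    """Hand out tasks one at a time; a node's bid grows with its queue (hypothetical model)."""
    queued = {node: 0 for node in base_ms}
    for _ in range(tasks):
        bids = [Bid(node, base + per_task_ms * queued[node]) for node, base in base_ms.items()]
        queued[award(bids)] += 1
    return queued


print(schedule(3, {"a": 1.0, "b": 2.0, "c": 3.0}))  # {'a': 1, 'b': 1, 'c': 1}
```

Here node a wins the first task, after which its bid rises to 6 ms, so b and then c win the next two. No central scheduler is needed, which matches the slide's point that no node is specialized.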
Caringo Rebuilds Data As It Is Written
• Balances data as it is written
  o Primary access node
  o Secondary access node
• Automated
  o New HDD/node: automatically balanced
  o New drives format automatically
  o Rebuilds constantly
  o If any node goes down, rebuild starts immediately
  o Volumes can go “stale”
  o 14-day limit on unused volumes
What’s a POSIX Gateway?
• Content File Server
  o Fully POSIX-compliant
  o Performs system administration tasks
  o Parallel writes
• Was not available for testing
“Elastic” Metadata
• Accessible
• Query by key-value pairs
  o By file size, date, etc.
• Indexing requires a separate Elasticsearch machine
  o Can become the bottleneck in the system
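A sketch of what a metadata query does, assuming hypothetical per-object records with `size` (bytes) and `modified` (date) fields; the catalog entries are made-up sample input, not test results. The function scans every record. An index such as Elasticsearch exists to avoid that full scan, which is why indexing is delegated to its own machine and why that machine can become the bottleneck.

```python
from datetime import date
from typing import Optional


def query(metadata: dict[str, dict], min_size: int = 0,
          since: Optional[date] = None) -> list[str]:
    """Return keys of objects of at least min_size bytes, modified on or after since."""
    hits = []
    for key, meta in metadata.items():  # full scan: what an index avoids
        if meta["size"] < min_size:
            continue
        if since is not None and meta["modified"] < since:
            continue
        hits.append(key)
    return sorted(hits)


catalog = {
    "sim/run1.dat": {"size": 10, "modified": date(2013, 7, 1)},
    "sim/run2.dat": {"size": 5_000, "modified": date(2013, 8, 1)},
    "sim/run3.dat": {"size": 9_000, "modified": date(2013, 6, 1)},
}
print(query(catalog, min_size=1_000, since=date(2013, 7, 15)))  # ['sim/run2.dat']
```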
Minimum Node Requirements
• Needs the full n + k nodes to:
  o Rebuild
  o Write
  o Balance
• Does not need the full n + k nodes to:
  o Read
  o Query metadata
  o Perform administration
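The rule on this slide can be written down as a small function. It is a sketch only: the operation names are ours, and the extra cutoff for reads below n nodes is an assumption drawn from erasure coding in general (decoding needs at least n shards, one per node), not something stated on the slide.

```python
NEEDS_FULL_WIDTH = frozenset({"rebuild", "write", "balance"})
WORKS_DEGRADED = frozenset({"read", "query metadata", "administration"})


def allowed_ops(nodes_up: int, n: int, k: int) -> frozenset[str]:
    """Operations available with nodes_up storage nodes in an n + k cluster."""
    if nodes_up >= n + k:
        return NEEDS_FULL_WIDTH | WORKS_DEGRADED
    ops = WORKS_DEGRADED
    if nodes_up < n:
        # Assumption: decoding needs n shards, one per node, so reads fail too.
        ops = ops - {"read"}
    return ops


for up in (6, 4, 2):
    print(up, sorted(allowed_ops(up, n=3, k=3)))
```

With n=3, k=3, losing even one of six nodes blocks writes, rebuilds, and balancing, while reads and metadata queries continue.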
Static Disk Install
• Requires installation to local disk
• Static IP addresses
• Optimizations require deeper knowledge
• http://www.scality.com
Virtual Ring Resilience
• Operations succeed until fewer virtual nodes are available than the n + k erasure coding configuration requires
• Data is stored to the ‘ring’ via a distributed hash table
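A minimal consistent-hashing sketch of ring placement. It assumes each object's n + k chunks go to the next n + k virtual nodes clockwise from the hash of its key; Scality's actual placement and key scheme are not described here, and `Ring` and `place` are hypothetical names. It also reproduces the failure mode above: placement is refused once the ring holds fewer than n + k virtual nodes. A real deployment would additionally keep chunks on distinct physical servers, which this sketch does not check.

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a name onto a 64-bit hash ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, vnodes: list[str]):
        # Virtual nodes sorted by their position on the ring (names assumed unique).
        self._ring = sorted((ring_position(v), v) for v in vnodes)

    def place(self, key: str, n: int, k: int) -> list[str]:
        """Pick n + k virtual nodes clockwise from the key's hash, one chunk each."""
        width = n + k
        if len(self._ring) < width:
            raise RuntimeError(f"{len(self._ring)} virtual nodes available, n + k = {width} needed")
        start = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(width)]


ring = Ring([f"server{s}-vnode{v}" for s in range(6) for v in range(2)])
print(ring.place("run42/checkpoint.0001", n=3, k=3))
```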
Manual Rebuilds, But Flexible
• Rebuilds on fewer than the required number of nodes
  o Lacks full protection
• Populates data back to an additional node
• New node/HDD: must be added manually
• Data is balanced during:
  o Writing
  o Rebuilding
Indexer Sold Separately
• Query all erasure coding metadata per server
• Per-item metadata
• User-definable
• Did not test Scality’s ‘Mesa’ indexing service
  o Extra software
FUSE Gives 50% Overhead, but Scales
On the right path
• Scality
  o Static installation, flexible erasure coding
  o Helpful
  o Separate indexer
  o 500 MB file limit ('Unlimited' update coming)
• Caringo
  o Variable installation, strict erasure coding
  o Good documentation
  o Indexer included
  o 4 TB file limit (limited by addressing bits)
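To relate the 4 TB limit to address bits, here is a sketch assuming the limit is exactly 2^42 bytes (4 TiB), that is, a 42-bit byte offset. The actual field width inside Caringo is an assumption, not a documented value, and `offset_bits` and `fits` are hypothetical helpers.

```python
def offset_bits(max_size_bytes: int) -> int:
    """Bits needed to address every byte offset in a file of max_size_bytes (>= 1)."""
    return (max_size_bytes - 1).bit_length()


def fits(size_bytes: int, bits: int) -> bool:
    """True if a file of size_bytes can be addressed with bits-wide byte offsets."""
    return size_bytes <= 2 ** bits


FOUR_TIB = 4 * 2**40
print(offset_bits(FOUR_TIB))                       # 42
print(fits(FOUR_TIB, 42), fits(FOUR_TIB + 1, 42))  # True False
```

Each extra address bit doubles the ceiling, so lifting such a limit is a format change rather than a tuning knob.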
Very Viable
• Some early limitations
• Changes needed on both products
• Scality seems more ready to make those changes.
Questions?
Acknowledgements
Special thanks to:
Dane Gardner - NMC Instructor
Matthew Broomfield - NMC Teaching Assistant
HB Chen - HPC-5 - Mentor
Jeff Inman - HPC-1 - Mentor
Carolyn Connor - HPC-5, Deputy Director ISTI
Andree Jacobson - Computer & Information Systems Manager NMC
Josephine Olivas - Program Administrator ISTI
Los Alamos National Labs, New Mexico Consortium, and ISTI