don’t lose my datapeople.redhat.com/sellis/presentations/storage war... · 2020. 4. 29. ·...
TRANSCRIPT
![Page 1: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/1.jpg)
35+ years of storage war stories
Don’t lose my data
Steven Ellis - Red Hat
![Page 2: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/2.jpg)
Agenda
A little bit of history
- Highly abbreviated
War Stories
- Names have been avoided to protect the “innocent”
A couple of tips and tricks along the way
2
![Page 3: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/3.jpg)
Where to begin?
3
![Page 4: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/4.jpg)
https://commons.wikimedia.org/wiki/File:Bhimbetka_Cave_Paintings.jpg
![Page 5: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/5.jpg)
https://pxhere.com/en/photo/1411286
![Page 6: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/6.jpg)
We’ve always needed storage
https://www.theregister.co.uk/2013/09/09/feature_history_of_enterprise_storage/6
![Page 7: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/7.jpg)
Physical Storage Devices
7
![Page 9: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/9.jpg)
Can I fsck that for you?
![Page 10: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/10.jpg)
35+ years ago
10
![Page 11: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/11.jpg)
Beware aging / legacy storage
Bit Rot
Disk Rot
Flash failure
Mould
Old Interfaces (IDE / SCSI)
Old Tape formats
Hardware failure
Old Filesystems11
![Page 12: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/12.jpg)
Data recovery tips
Isopropyl Alcohol + lint free cloths
- Dust / oils can kill an optical drive
USB based dongles to reduce reboots
- CD/DVD Drive- Floppy Drives- IDE Drive- SATA
NAS / SAN / External USB for initial archive
- I prefer locally attached USB-3 drives
Linux tools
- SystemRescueCd / UltimateBootCD- http://www.system-rescue-cd.org/- https://www.ultimatebootcd.com/
- ddrescue- lzop
- Fast lightweight compression
- testdisk / photorec- Recovery of filesystems and individual files
off failed media
12
![Page 13: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/13.jpg)
Original Media Has Failed
USB based dongles don’t always behave well with failed Hard Drives
- Time to dig out / borrow some old hardware- Boot original hardware with a USB Live OS Image
Always create a full copy of the original media
- ddrescue is your friend- perform data recovery with a snapshot/copy of the backup- Fail back to testdisk/photorec
13
![Page 14: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/14.jpg)
HDDs > SSDs
TRIM is critical to Flash Storage performance
- Allows for elegant wear leveling
Makes it nearly impossible to recover “deleted data”
- On a HDD a deleted file is “often” just unlinked from the filesystem
14
![Page 15: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/15.jpg)
Raid is not a backup mechanism
Raid 0/1/10/5/6 can be implemented via
- Hardware raid controllers- Proprietary or in kernel drivers
- “Fake Raid”- Really a software driver - dm-raid
- Software Raid- mdadm or LVM based
15
![Page 16: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/16.jpg)
Original talk from Sys Admin Miniconf - LCA 2010 in Wellington
Pinpointing the issue
- RAID / HBA Adapter- Firmware Issues
- updates that can trash a Raid array- Raid metadata incompatible with different firmware versions
- Legacy Adapter- Conflict with motherboard chipset
MDADM can be your friend
- running XFS can also helps
Going Mad with MDADM Pt 1
16
![Page 17: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/17.jpg)
Problem - Hardware Raid Controller Failure
- No spare compatible hardware- Trade Me or Ebay was the only option for parts
- Installed a SATA/SAS HBA into a generic modern Linux box- Raid metadata was detected by dm-raid- Raid array assembled into a running state- Data recovered onto replacement hardware
Going Mad with MDADM Pt 2
17
![Page 18: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/18.jpg)
The Problem - recovering failed a RAID 5 array
Software Raid-5 set via mdadm
- 4 x 3TB Drives- Marvel 88SE9230 PCI-e SATA HBA- DMA errors under high I/O
- Or during weekly raid consistency check- 2 Drives were removed from RAID array
BZs
- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810239- https://bugzilla.redhat.com/show_bug.cgi?id=1337313
Going Mad with MDADM Pt 3
Solution
1. Change HBA
2. Check the Raid Setmdadm --detail /dev/md2
3. Confirm Event of 3 disks is close enoughmdadm --examine /dev/sd[abd]2 | \
grep Event
4. Force start a degraded array and cross
fingers
5. Consistency check on LVM and
filesystems18
![Page 19: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/19.jpg)
Go fsck
![Page 20: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/20.jpg)
Sanity check for
- HDD- SSD- SAN
hdparm -t
How fsck is your storage
Additional tool for flash storage
f3read
f3write
20
![Page 21: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/21.jpg)
Cluster fsck
![Page 22: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/22.jpg)
I need HA storage for/because
Beware
- there are dragons ahead- may you live in interesting time- life is to short to build a cluster of two
HA / DR / Backup
- HA isn’t a backup mechanism- DR with high or variable latency- Fail over / Fail back
What is the use case
- RTO / RPO- Workload performance requirements- Any latency issues
Cluster of two
- Is a problem waiting to happen- 3rd quorum node / arbiter is critical
22
![Page 23: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/23.jpg)
HA NFS
Approaches
- Active/Active- Requires a cluster aware file system
- gpfs / gfs2
- Active / Passive- Shared storage over FC/iSCSI- Partition is only mounted on a single node- Pacemaker + VIP
Issues
- Application services are latency sensitive- Requirement sub 5ms- NFS failover is >= 30 seconds
- Scale- 2 node cluster couldn’t cope with workload- Had to scale to 3 nodes with considerable
added complexity
- No Live migration- Environment was virtualised
23
![Page 24: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/24.jpg)
Multiple Single points of failure
Understand the requirements
- And the existing infrastructure- Especially any SAN arrays
- And any associated network infrastructure
Real cost of the solution
- What does an NFS head for the SAN cost- vs project and operational cost of your “busy
work”
Common statement is the existing storage infrastructure isn’t reliable or meet the RTO/RPO requirements of a project
- Secondary requirement is solution has to be virtualised
All Virtual infrastructure runs off the same SAN array
- But you need to meet a higher SLA than the array
24
![Page 25: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/25.jpg)
Software Defined Storage
Gluster
- Suits file centric workloads- Simple to implement- Can run virtually or on bare metal- Scales elegantly- Supports CIFS/NFS/pNFS + Gluster Fuse
Ceph
- Object / Block / File- Focused on bare metal- See my rook talk tomorrow for containers- Vibrant community- Replica 3 for performance- Excellent EC implementation for scale
25
![Page 26: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/26.jpg)
What the fsck!
![Page 27: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/27.jpg)
Dust and Humidity
Existing machine room re-sized
- Shrunk to provide additional storage space- New drywall installed
- and sanded- But they did place drop cloths over the racks
Outcome
- We had to vacuum out all the servers- Almost every hard drive was replaced over next
9 months
Aircon unit leak
- Wet carpet in the machine room- Temporary aircon couldn’t deal with
additional humidity from drying out carpets
Outcome
- Almost every hard drive failed over next 6 months
27
![Page 28: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/28.jpg)
Expect the unexpected
Corrupted LVM
- Multiple LUNs from SAN- Combined into a single VG via LVM
Corrupted filesystem
- xfs filesystem consistency issues- Rebooting host inconsistent behavior
Issue - FC LUNs had been allocated to 2 systems
- No partition table was present- Unix team had re-formatted the LUN- LUN was in the middle of a Linux LVM VG
Recommendation
- Always create a partition table
Issue - Poor grouping of LUNs
- Virtual Guests hosted on KVM- Direct LUNs mapped to Virtual Guests- Guests mounted wrong /var or /data
Recommendation
- Unique LV UUIDs & mount by UUID
28
![Page 29: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/29.jpg)
Back to the Future
![Page 30: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/30.jpg)
You will always need more storage
HDDs (rust) aren’t dead (yet)
At some point TCO for Flash will drop below rust
EDSFF for hyperdense flash storage
Persistent Memory
Cloud players will continue to innovate
Everything old is new again
30
![Page 32: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/32.jpg)
![Page 33: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/33.jpg)
Bonus Slides
![Page 34: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/34.jpg)
Data Protection Techniques
RAID Parity and striping across block devices to create sets of redundancy
Multipath Redundant network paths from host to storage (dual HBA/NIC at host)
EC Erasure coding saves data in fragments with parity across different locations
Mirror Storage array level synchronous and asynchronous mirroring of data (DR/BC)
Cache Battery or super-capacitor backed up cache
34
![Page 35: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/35.jpg)
Data Protection Techniques
RAID Redundant Array of Independent Disks
RAID 0 - striping only, no protection
RAID 1 - exact mirroring
RAID 5 - 5D+1P, parity blks striped
RAID 6 - 4D+2P, parity blks striped
Performance only
Least capacity
1 disk fail
2 disk fail
35
![Page 36: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/36.jpg)
Data Protection Techniques
EC Erasure Coding - K + M
EC 4+2
EC 8+3
2 disk fail
3 disk fail
36
https://stonefly.com/blog/understanding-erasure-codinghttps://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/storage_strategies_guide/erasure_code_poolshttps://www.snia.org/sites/default/files/SDC15_presentations/datacenter_infra/Shenoy_The_Pros_and_Cons_of_Erasure_v3-rev.pdf
![Page 37: Don’t lose my datapeople.redhat.com/sellis/Presentations/Storage War... · 2020. 4. 29. · hdparm -t How fsck is your storage Additional tool for flash storage f3read f3write 20](https://reader033.vdocuments.us/reader033/viewer/2022051917/6009931d8e26540f625f7fcf/html5/thumbnails/37.jpg)
DRBDont
DRBD
- Distributed Replicated Block Device
DRBDont
- Maintenance can be (was) painful- Fail back issues- Cluster of 2- STONITH !!!!!
CC BY-SA 2.5https://en.wikipediaorg/wiki/Distributed_Replicated_Block_Device
37