Don’t lose my data · people.redhat.com/sellis/presentations/storage war... · 2020. 4. 29.


35+ years of storage war stories

Don’t lose my data

Steven Ellis - Red Hat


Agenda

A little bit of history

- Highly abbreviated

War Stories

- Names have been avoided to protect the “innocent”

A couple of tips and tricks along the way


Where to begin?


https://commons.wikimedia.org/wiki/File:Bhimbetka_Cave_Paintings.jpg


https://pxhere.com/en/photo/1411286


We’ve always needed storage

https://www.theregister.co.uk/2013/09/09/feature_history_of_enterprise_storage/


Physical Storage Devices


Can I fsck that for you?


35+ years ago


Beware aging / legacy storage

Bit Rot

Disk Rot

Flash failure

Mould

Old Interfaces (IDE / SCSI)

Old Tape formats

Hardware failure

Old Filesystems

Data recovery tips

Isopropyl Alcohol + lint free cloths

- Dust / oils can kill an optical drive

USB based dongles to reduce reboots

- CD/DVD Drive
- Floppy Drives
- IDE Drive
- SATA

NAS / SAN / External USB for initial archive

- I prefer locally attached USB-3 drives

Linux tools

- SystemRescueCd / UltimateBootCD
  - http://www.system-rescue-cd.org/
  - https://www.ultimatebootcd.com/
- ddrescue
- lzop
  - Fast lightweight compression
- testdisk / photorec
  - Recovery of filesystems and individual files off failed media

Original Media Has Failed

USB based dongles don’t always behave well with failed Hard Drives

- Time to dig out / borrow some old hardware
- Boot original hardware with a USB Live OS Image

Always create a full copy of the original media

- ddrescue is your friend
- perform data recovery with a snapshot/copy of the backup
- Fall back to testdisk/photorec
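The "full copy first" advice can be sketched as a two-pass GNU ddrescue run. The Python below is only an illustration: it builds the command lines and does not execute anything. The device path, image name and mapfile name are placeholders, and the choice of three retry passes is an assumption, not something from the talk. The `-n`, `-d` and `-r` options are the ones described in the GNU ddrescue manual.

```python
from typing import List


def ddrescue_plan(source: str, image: str, mapfile: str,
                  retries: int = 3) -> List[List[str]]:
    """Return the argv lists for a two-pass GNU ddrescue imaging run.

    Nothing is executed here; the caller decides when to run the commands.
    """
    # Pass 1: copy everything that reads cleanly and skip the slow
    # scraping phase (-n), so a dying drive is stressed as little as possible.
    first_pass = ["ddrescue", "-n", source, image, mapfile]
    # Pass 2: go back to the bad areas recorded in the mapfile, using
    # direct disc access (-d) and a limited number of retry passes (-r).
    second_pass = ["ddrescue", "-d", f"-r{retries}", source, image, mapfile]
    return [first_pass, second_pass]


if __name__ == "__main__":
    # /dev/sdX, failed-disk.img and failed-disk.map are placeholder names.
    for argv in ddrescue_plan("/dev/sdX", "failed-disk.img", "failed-disk.map"):
        print(" ".join(argv))
```

Because both passes share one mapfile, the second pass only revisits the areas the first one could not read. Recovery tools such as testdisk/photorec would then be pointed at a copy of the resulting image, never at the original media.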


HDDs > SSDs

TRIM is critical to Flash Storage performance

- Allows for elegant wear leveling

TRIM makes it nearly impossible to recover “deleted data”

- On an HDD a deleted file is “often” just unlinked from the filesystem


RAID is not a backup mechanism

RAID 0/1/10/5/6 can be implemented via

- Hardware RAID controllers
  - Proprietary or in kernel drivers
- “Fake RAID”
  - Really a software driver - dm-raid
- Software RAID
  - mdadm or LVM based


Going Mad with MDADM Pt 1

Original talk from Sys Admin Miniconf - LCA 2010 in Wellington

Pinpointing the issue

- RAID / HBA Adapter
  - Firmware Issues
    - updates that can trash a RAID array
    - RAID metadata incompatible with different firmware versions
  - Legacy Adapter
    - Conflict with motherboard chipset

MDADM can be your friend

- running XFS can also help

Going Mad with MDADM Pt 2

Problem - Hardware RAID Controller Failure

- No spare compatible hardware
  - Trade Me or eBay was the only option for parts
- Installed a SATA/SAS HBA into a generic modern Linux box
- RAID metadata was detected by dm-raid
- RAID array assembled into a running state
- Data recovered onto replacement hardware

Going Mad with MDADM Pt 3

The Problem - recovering a failed RAID 5 array

Software RAID-5 set via mdadm

- 4 x 3TB Drives
- Marvell 88SE9230 PCI-e SATA HBA
- DMA errors under high I/O
  - Or during weekly RAID consistency check
- 2 Drives were removed from RAID array

BZs

- https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1810239
- https://bugzilla.redhat.com/show_bug.cgi?id=1337313

Solution

1. Change HBA

2. Check the RAID set
   mdadm --detail /dev/md2

3. Confirm the Event count of the 3 disks is close enough
   mdadm --examine /dev/sd[abd]2 | grep Event

4. Force start a degraded array and cross fingers

5. Consistency check on LVM and filesystems
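Step 3 can be made a little less error-prone by comparing the counters programmatically. The sketch below parses the full text printed by `mdadm --examine` (without the `grep`) and reports how far apart the members are. The line layout it expects is an assumption based on v1.x metadata output, the sample text is invented for illustration, and what counts as "close enough" is deliberately left to the operator.

```python
import re
from typing import Dict

# Matches lines such as "/dev/sda2:" and "         Events : 18446".
# This layout is an assumption based on v1.x metadata output.
DEVICE_RE = re.compile(r"^(/dev/\S+):\s*$")
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)\s*$")


def event_counts(examine_output: str) -> Dict[str, int]:
    """Map each member device to its Events counter."""
    counts: Dict[str, int] = {}
    device = None
    for line in examine_output.splitlines():
        dev_match = DEVICE_RE.match(line)
        if dev_match:
            device = dev_match.group(1)
            continue
        ev_match = EVENTS_RE.match(line)
        if ev_match and device is not None:
            counts[device] = int(ev_match.group(1))
    return counts


def event_spread(counts: Dict[str, int]) -> int:
    """Difference between the newest and the oldest member."""
    return max(counts.values()) - min(counts.values())


if __name__ == "__main__":
    # Invented sample text, not output from a real array.
    sample = """\
/dev/sda2:
         Events : 18446
/dev/sdb2:
         Events : 18446
/dev/sdd2:
         Events : 18441
"""
    counts = event_counts(sample)
    print(counts, "spread:", event_spread(counts))
```

If the spread looks acceptable, step 4 corresponds to mdadm's `--assemble --force` mode, which assembles an array even when some members carry out-of-date metadata. Running it against copies of the member disks rather than the originals keeps a way back if the forced assembly goes wrong.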


Go fsck


How fsck is your storage

Sanity check for

- HDD
- SSD
- SAN

hdparm -t

Additional tool for flash storage

f3read

f3write

Cluster fsck


I need HA storage for/because

Beware

- there are dragons ahead
- may you live in interesting times
- life is too short to build a cluster of two

HA / DR / Backup

- HA isn’t a backup mechanism
- DR with high or variable latency
- Fail over / Fail back

What is the use case

- RTO / RPO
- Workload performance requirements
- Any latency issues

Cluster of two

- Is a problem waiting to happen
- 3rd quorum node / arbiter is critical
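Why a third vote matters can be shown with a few lines of arithmetic. The sketch below assumes plain majority voting with one vote per node, which is a simplification: real cluster stacks offer more options than this. The function names are made up for the illustration.

```python
from typing import List, Optional


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True when a partition holds a strict majority of all votes."""
    return reachable_votes > total_votes // 2


def surviving_side(partition_sizes: List[int]) -> Optional[int]:
    """Return the index of the partition that keeps quorum, or None.

    partition_sizes lists how many voting nodes ended up on each side
    of a network split.
    """
    total = sum(partition_sizes)
    for index, size in enumerate(partition_sizes):
        if has_quorum(size, total):
            return index
    return None


if __name__ == "__main__":
    # Two nodes split 1/1: neither side has a majority.
    print("2 nodes, split 1/1:", surviving_side([1, 1]))
    # Two nodes plus an arbiter split 2/1: the side with two votes carries on.
    print("3 votes, split 2/1:", surviving_side([2, 1]))
```

With two votes, a split leaves each node unable to tell whether its peer has died or is merely unreachable, so either both stop or both carry on and risk diverging data. A third vote breaks the tie.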


HA NFS

Approaches

- Active/Active
  - Requires a cluster aware file system
    - gpfs / gfs2
- Active / Passive
  - Shared storage over FC/iSCSI
  - Partition is only mounted on a single node
  - Pacemaker + VIP

Issues

- Application services are latency sensitive
  - Requirement sub 5ms
  - NFS failover is >= 30 seconds
- Scale
  - 2 node cluster couldn’t cope with workload
  - Had to scale to 3 nodes with considerable added complexity
- No Live migration
  - Environment was virtualised

Multiple Single points of failure

Understand the requirements

- And the existing infrastructure
  - Especially any SAN arrays
  - And any associated network infrastructure

Real cost of the solution

- What does an NFS head for the SAN cost
  - vs project and operational cost of your “busy work”

A common statement is that the existing storage infrastructure isn’t reliable or doesn’t meet the RTO/RPO requirements of a project

- Secondary requirement is that the solution has to be virtualised

All Virtual infrastructure runs off the same SAN array

- But you need to meet a higher SLA than the array

Software Defined Storage

Gluster

- Suits file centric workloads
- Simple to implement
- Can run virtually or on bare metal
- Scales elegantly
- Supports CIFS/NFS/pNFS + Gluster FUSE

Ceph

- Object / Block / File
- Focused on bare metal
  - See my Rook talk tomorrow for containers
- Vibrant community
- Replica 3 for performance
- Excellent EC implementation for scale

What the fsck!


Dust and Humidity

Existing machine room re-sized

- Shrunk to provide additional storage space
- New drywall installed
  - and sanded
- But they did place drop cloths over the racks

Outcome

- We had to vacuum out all the servers
- Almost every hard drive was replaced over the next 9 months

Aircon unit leak

- Wet carpet in the machine room
- Temporary aircon couldn’t deal with additional humidity from drying out carpets

Outcome

- Almost every hard drive failed over the next 6 months

Expect the unexpected

Corrupted LVM

- Multiple LUNs from SAN
- Combined into a single VG via LVM

Issue - FC LUNs had been allocated to 2 systems

- No partition table was present
- Unix team had re-formatted the LUN
- LUN was in the middle of a Linux LVM VG

Recommendation

- Always create a partition table

Corrupted filesystem

- xfs filesystem consistency issues
- Inconsistent behavior when rebooting the host

Issue - Poor grouping of LUNs

- Virtual Guests hosted on KVM
- Direct LUNs mapped to Virtual Guests
- Guests mounted wrong /var or /data

Recommendation

- Unique LV UUIDs & mount by UUID

Back to the Future


You will always need more storage

HDDs (rust) aren’t dead (yet)

At some point TCO for Flash will drop below rust

EDSFF for hyperdense flash storage

Persistent Memory

Cloud players will continue to innovate

Everything old is new again



http://people.redhat.com/sellis

Questions?


Bonus Slides


Data Protection Techniques

RAID - Parity and striping across block devices to create sets of redundancy

Multipath - Redundant network paths from host to storage (dual HBA/NIC at host)

EC - Erasure coding saves data in fragments with parity across different locations

Mirror - Storage array level synchronous and asynchronous mirroring of data (DR/BC)

Cache - Battery or super-capacitor backed up cache

Data Protection Techniques

RAID - Redundant Array of Independent Disks

RAID 0 - striping only, no protection (performance only)

RAID 1 - exact mirroring (least capacity)

RAID 5 - 5D+1P, parity blocks striped (tolerates 1 disk failure)

RAID 6 - 4D+2P, parity blocks striped (tolerates 2 disk failures)

Data Protection Techniques

EC - Erasure Coding - K + M

EC 4+2 - tolerates 2 disk failures

EC 8+3 - tolerates 3 disk failures
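The D+P and K+M notations share the same arithmetic: the parity count is the number of failures survived, and data / (data + parity) is the share of raw capacity left for data. The sketch below works that out for the layouts named on these slides. It assumes every parity or coding chunk can repair exactly one lost chunk, which matches the failure figures quoted above. The `Layout` type is invented for the illustration.

```python
from fractions import Fraction
from typing import NamedTuple


class Layout(NamedTuple):
    name: str
    data: int    # data disks or chunks: K, or the "D" in 5D+1P
    parity: int  # parity disks or coding chunks: M, or the "P"


def usable_fraction(layout: Layout) -> Fraction:
    """Share of raw capacity that ends up holding data."""
    return Fraction(layout.data, layout.data + layout.parity)


def failures_tolerated(layout: Layout) -> int:
    """Assumes each parity/coding chunk can repair one lost chunk."""
    return layout.parity


if __name__ == "__main__":
    layouts = (
        Layout("RAID 5 5D+1P", 5, 1),
        Layout("RAID 6 4D+2P", 4, 2),
        Layout("EC 4+2", 4, 2),
        Layout("EC 8+3", 8, 3),
    )
    for layout in layouts:
        share = usable_fraction(layout)
        print(f"{layout.name}: usable {share} of raw capacity, "
              f"survives {failures_tolerated(layout)} failure(s)")
```

Wider layouts such as 8+3 keep more of the raw capacity usable for a given failure tolerance, at the cost of touching more devices on every rebuild.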

https://stonefly.com/blog/understanding-erasure-coding

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/storage_strategies_guide/erasure_code_pools

https://www.snia.org/sites/default/files/SDC15_presentations/datacenter_infra/Shenoy_The_Pros_and_Cons_of_Erasure_v3-rev.pdf

DRBDont

DRBD

- Distributed Replicated Block Device

DRBDont

- Maintenance can be (was) painful
- Fail back issues
- Cluster of 2
- STONITH !!!!!

CC BY-SA 2.5 https://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device