Raid 6 Resync: Tests & Tweaks

Upload: aidanlinz

Post on 10-Oct-2015


DESCRIPTION

Raid 6 Resync Testing: Exploring the limiting factors in resync times using a Synology RS2112+ with a 10x 2TB RAID 6 array, an Intel Atom N2700 CPU, and 1GB of DDR3 memory. Results reveal memory rather than CPU time to be the limiting factor in RAID resync speed, and several kernel variables were found to have a high impact on total array recovery time.

TRANSCRIPT

    Is CPU the rate-limiter on rebuilds of modern arrays, given the gains in performance of embedded processors?

    Test hardware:
    Synology RS2112+
    Intel Atom N2700 dual-core CPU
    1GB DDR3 memory
    10x Western Digital 2TB RE4 SATA drives
    RAID 6+1, 13TB volume
    Controllers: 2x 4-port Marvell 7042, 1x onboard Intel (it's using all 3... weird)

    The commands 'blockdev', 'pidof' and 'ionice' were not part of the original firmware image and were installed via the package 'util-linux-ng'.
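    (The post doesn't say how the package got onto the box; on Synology units of this era the usual route was the Optware/ipkg bootstrap, so something along these lines is a reasonable guess rather than the exact steps used.)

    # Hypothetical install step -- assumes an Optware ipkg bootstrap is already
    # set up on the RackStation; the original note only names the package.
    ipkg update
    ipkg install util-linux-ng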

    For the "too long; didn't read" folks, there's a conclusion section at the end :)

    First, I forced a resync with a network backup task inbound, which was writing at between 2 and 10MB/s.
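    (How the resync was forced isn't spelled out; one generic way to kick off a full pass on an mdraid array is the sysfs sync_action knob, sketched below. Failing and re-adding a member is another option, so the exact method is an assumption here.)

    # Sketch only: request a repair/resync pass on md2 via sysfs.
    # 'repair' rewrites parity where it mismatches; 'check' is read-only.
    echo repair > /sys/block/md2/md/sync_action
    cat /proc/mdstat          # progress shows up here, as in the output below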

    Rebuild rate with initial settings:

    RackStation> cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
    md2 : active raid6 sda3[0] sdi3[8] sdh3[7] sdg3[6] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1]
          13641545792 blocks super 1.2 level 6, 64k chunk, algorithm 2 [9/9] [UUUUUUUUU]
          [========>............]  resync = 44.5% (868861184/1948792256) finish=1304.9min speed=13792K/sec

    This is per drive, so roughly 120MB/s, with an estimate of 39 hours to complete. The md2_raid6 and md2_resync processes are using 2% and 12% of the CPU, respectively. IO wait is under 5%.
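    (If you want to script that per-drive-to-aggregate conversion, a rough one-liner along these lines works; the factor of 9 active members is specific to this array and is assumed below.)

    # Rough aggregate rebuild rate: md reports a per-device speed in KB/s,
    # so multiply by the number of active members (9 on this array).
    awk -F'speed=' '/speed=/ { split($2, a, "K"); printf "~%.0f MB/s aggregate\n", a[1] * 9 / 1024 }' /proc/mdstat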

    First change set:

    sysctl -w dev.raid.speed_limit_min=100000
    renice -20 -p $(pidof md2_raid6)
    renice -20 -p $(pidof md2_resync)
    ionice -n 0 -c 1 $(pidof md2_raid6)
    ionice -n 0 -c 1 $(pidof md2_resync)

    Stopped backup process.

    Now we're up to 252MB/s, 19.2 hours to rebuild. IO wait jumped to 20%, memory usage began steadily rising, CPU usage jumped to 50% or so.
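    (The note doesn't name the monitoring tools; figures like these can be watched with something as simple as the following, assuming a sysstat-style iostat is available on the box.)

    # Keep an eye on rebuild progress and per-disk load while tuning.
    watch -n 30 cat /proc/mdstat      # rebuild speed and ETA
    iostat -x 30                      # per-device utilization and IO wait (needs sysstat)
    top                               # CPU and memory headroom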


    Second change set:

    Used 'echo [value] >' on each of the following kernel memory and disk parameters, repeating the changes to device sda for each disk, a-j:

    /proc/sys/vm/dirty_ratio = 90 (% per process, default = 10)
    /proc/sys/vm/dirty_background_ratio = 80 (% system-wide, default = 5)
    /proc/sys/vm/dirty_expire_centisecs = 6000 (60s, default = 30s)
    /proc/sys/vm/dirty_writeback_centisecs = 4000 (40s, default = 5s)
    /sys/block/sda/queue/scheduler = "deadline" (default = "cfq")
    /sys/block/sda/queue/nr_requests = 16384 (distinct IO requests, default = 128)
    blockdev --setra 3584 /dev/md2 (doubled the default readahead value)
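    (These were applied by hand here; a minimal sketch of doing it in one pass looks like this, assuming the member disks really are sda through sdj as on this unit.)

    # Sketch: apply the second change set in one go (assumes members sda..sdj).
    for d in sda sdb sdc sdd sde sdf sdg sdh sdi sdj; do
        echo deadline > /sys/block/$d/queue/scheduler
        echo 16384    > /sys/block/$d/queue/nr_requests
    done
    echo 90   > /proc/sys/vm/dirty_ratio
    echo 80   > /proc/sys/vm/dirty_background_ratio
    echo 6000 > /proc/sys/vm/dirty_expire_centisecs
    echo 4000 > /proc/sys/vm/dirty_writeback_centisecs
    blockdev --setra 3584 /dev/md2    # double the default readahead on the md device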

    Rebuilding at 280MB/s now, 17.67 hours to rebuild -- but more importantly, IO wait halved back down to 10% with these changes.

    Third change set:

    Change the stripe cache size from 1024 to 8192 (the value is counted in pages per member device, not KB):

    echo 8192 > /sys/block/md2/md/stripe_cache_size

    This change had the most significant impact:

    Rebuild rate increased from 31MB/s per disk to 49MB/s per disk, nearly 450MB/s combined

    IOPS were cut in half or better, down to 2,800 per disk, roughly 20,000 combined

    Memory utilization jumped sharply, up to 84% and 90% by the time the rebuild finished.

    IO wait shot up to 25% from 10%.

    Write speed plummeted to 200 KB/s

    Total time to complete rebuild: 10 hours
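    (The memory jump tracks how md sizes this cache: stripe_cache_size is counted in pages per member device, so the rough cost of the new setting, assuming 4KiB pages and 9 active members, is a sizable chunk of the 1GB in this box.)

    # Approximate stripe-cache memory: entries * page size * active members.
    # 8192 * 4KiB * 9 ~= 288MiB, which lines up with the memory climb seen
    # above (assumption: 4KiB pages, 9 members).
    awk 'BEGIN { printf "%.0f MiB\n", 8192 * 4 * 9 / 1024 }'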

    There was one change suggested that can only be turned on while the array is synced and healthy:

    mdadm --grow /dev/md2 --bitmap=internal
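    (This adds a write-intent bitmap, which mostly pays off in future incidents: after an unclean shutdown or a briefly dropped member, md can resync only the dirty regions instead of the whole array. A quick way to confirm it took, and the reverse operation, might look like this.)

    # Confirm the internal write-intent bitmap is in place...
    mdadm --detail /dev/md2 | grep -i bitmap
    cat /proc/mdstat                      # a 'bitmap:' line appears under md2
    # ...and drop it again later if it hurts write performance:
    # mdadm --grow /dev/md2 --bitmap=none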

    Conclusion:

    Processing power is definitely not the limiting factor -- quite the opposite: processor usage was the only resource with a healthy buffer under the final changes. The most influential factors were low-level kernel modifications, with memory and memory-related tweaks having the biggest impact. The web interface was always snappy, and only at the very end did it start to be just a little bit slower than it normally is. Rebuild time was reduced by 30 hours with the most aggressive changes, cut to 1/4 the time of the default settings. Using settings that had very little impact on system performance, rebuild time was reduced by more than 21 hours, down to under 18 hours start to finish.

    Everything I changed here can also be changed on a Thecus, though I'm not sure if it has blockdev, ionice and pidof by default. You would also need to install the SSH add-on module. It seems like Thecus has the right idea with the larger amount of memory, but from the looks of it, the Xeon processors Thecus uses are more or less a complete waste.

    References used:
    http://www.fccps.cz/download/adv/frr/hdd/hdd.html (amazing document, extremely technical though)
    http://ubuntuforums.org/showthread.php?t=1715955
