Transcript
Page 1: DevAgent: Analyzing the Reliability of Storage Software ...esaule/NSF-PI-CSR-2017... · [6] Emulating Realistic Flash Device Errors with High Fidelity (NAS’16-Poster) SSDs exhibit

DevAgent:AnalyzingtheReliabilityofStorageSoftwareStackviaSmartDevicesMaiZheng,ComputerScienceDepartment,NewMexicoStateUniversity

[email protected]

Background&Motivation

• NewNVM-basedcomponentsarerevolutionizingthetraditionalstoragesystems,potentiallycreatingnewfailuremodesdifficulttounderstand

Observations&Approach

Acknowledgement

0

0.5

1

1.5

2

0 1000 2000 3000 4000 5000 6000 7000 8000

0 = uncom

mitted, 1

= commi

tted

write requests

0

0.5

1

1.5

2

0 1000 2000 3000 4000 5000 6000 7000 8000

0 = drop

ped, 1 =

committe

d

write requests

Results&Milestones

This material is based upon work supported in part by the National Science Foundation(NSF) under Grant Number 1566554 (CRII). Any opinions, findings, and conclusions orrecommendations expressed in this material are those of the author(s) and do notnecessarily reflect the views of the NSF.

[1]UnderstandingtheFaultResilienceofFileSystemCheckers(HotStorage’17)[2]OnFaultResilienceofFileSystemCheckers(FAST’17-WiP)[3]DoNotBlameDevicesforAllFailures(NVMW’17- Poster)[4]AGenericFrameworkforTestingParallelFileSystems (PDSW’16)[5]ReliabilityAnalysisofSSDsunderPowerFault (TOCS’16)[6]EmulatingRealisticFlashDeviceErrorswithHighFidelity(NAS’16- Poster)

SSDsexhibitdifferent#oferrorswhentestedondifferentOSes

devicedriver

blocklayer

filesystem

device

userlevel

OS (Kernel) SSD 1 SSD 2 SSD 3Debian 6 (2.6.32) 317 991 2Ubuntu 14 (3.16) 88 0 1

DevAgent-FW

June2015,flash-basedserversinAlgoliadatacenterstartedcorruptingfiles;developers“spentabigportionoftwoweeksjustisolatingmachinesandrestoringthemasquicklyaspossible”;SamsungSSDsweremistakenly blamed&blacklisted(untilonemonthlatertheyidentifiedakernelbug)

• Failureexample1:

• Failureexample2:

• Limitation of exiting analysis tools:- heavily rely on kernel &assumekernel is correct

• Wecan no longer completely trust thekernel=> analysis tools shouldminimize dependency/interferenceon kernel

• Wecan no longer focus on single component=> cross-layer analysis• Weneed to focus on generic & fundamental operations => interfacesb/w layers

• Publications:

• Prototyping DevAgent-FW onCosmosOpenSSD Platform[3]

• Emulatingrealisticdevicebehaviorsfortestinghighersoftwarelayers[5][6]

• Exposingvulnerabilitiesinpopularfilesystemcheckers[1][2]

• Analyzingfine-grainedI/Obehaviorofparallelfilesystems[4]

syscallinterface

deviceinterface SCSI/NVMe

systemcalls

iSCSINVMe/Fabrics DevAgent-SW

DevAgent-Usr

• Twokeyfunctionalities:- recordhost/deviceinteraction

fordiagnosis- emulatedevicebehaviorfor

testinghigherlayersfirmwareimplementation

(minimalintrusionintokernel)

softwareimplementation(goodportability)

- mapfunctioncallstodevice-levelcommandsforreasoninghigh-levelsemantics

• Identifyingdevice-levelimpactofkernelbugpatches[3]

- real

V.S.

- emulated

Top Related