Towards weakly consistent local storage systems: jyshin/talks/socc16-poster-shin.pdf
TRANSCRIPT
Towards Weakly Consistent Local Storage Systems
Ji-Yong Shin1,2, Mahesh Balakrishnan2, Tudor Marian3, Jakub Szefer2 and Hakim Weatherspoon1
1 Cornell University, 2 Yale University, 3 Google
StaleStore
• Primary/Backup setting
• Primary performs the worst due to network delays (100 ms)
• Yogurt performs better than local latest by using the trade-off
Performance: Accessing Blocks and K-V Pairs
• Modern servers are as powerful as the distributed systems of the past
  ✓ CPU and storage devices are parallel, similar to distributed nodes
• Goal is to trade off consistency and performance in a local store
  ✓ Use of stale data in different storage devices for better performance
Server Trends
GetCost Overhead
Yogurt: A Block Level StaleStore
Summary
• Modern servers are similar to distributed systems
• Local storage systems can adopt weak consistency
  ✓ We define them as StaleStores
• Yogurt, a block level StaleStore
  ✓ Effectively trades off consistency and performance
  ✓ Supports high level multi-block data constructs
Year | 2006 | 2016 | Comparisons
Model (4U) | Dell PowerEdge 6850 | Dell PowerEdge R930 |
CPU [# of cores] | 4 × 2 core Xeon [8] | 4 × 24 core Xeon [96] | 12X
Memory | 64 GB | 6 TB | 96X
Network bandwidth | 2 × 1 GigE | 2 × 1 GigE, 2 × 10 GigE | 11X
Storage | 8 × SCSI/SAS HDD | 24 × SAS HDD/SSD, 10 × PCIe SSD | # of devices: 4.2X, Capacity: 175.3X
Use of SSDs
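Most of the comparison column can be re-derived from the hardware counts in the table. The following is a minimal Python sketch, assuming 1 TB = 1024 GB and that network bandwidth and device counts are plain sums over ports and devices (these assumptions are not stated on the poster). The capacity ratio is not recomputed because per-device capacities are not listed.

```python
# Ratios between the 2016 and 2006 servers, recomputed from the table.
# Assumptions (not from the poster): 1 TB = 1024 GB; bandwidth and
# device counts are simple sums over ports and devices.

cores_2006, cores_2016 = 4 * 2, 4 * 24
mem_gb_2006, mem_gb_2016 = 64, 6 * 1024
net_gbps_2006, net_gbps_2016 = 2 * 1, 2 * 1 + 2 * 10
devices_2006, devices_2016 = 8, 24 + 10

ratios = {
    "cpu_cores": cores_2016 / cores_2006,
    "memory": mem_gb_2016 / mem_gb_2006,
    "network": net_gbps_2016 / net_gbps_2006,
    "storage_devices": devices_2016 / devices_2006,
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:g}X")
```

The device ratio comes out at 4.25, which the poster reports as 4.2X.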
Distributed vs Modern Server
Distributed Systems
• Different versions of data exist in different servers due to network delays during replication
• Older versions are faster to access when the network overhead is low
Modern Servers
• Different versions of data exist in different storage media due to logging, caching, copy-on-write, deduplication, etc.
• Older versions are faster to access when they are on faster storage media
Reasons for different access speeds
  ✓ RAM, SSD, HDD, hybrid-drives, etc.
  ✓ Disk with arm contention or SSD under garbage collection
  ✓ RAID under degraded mode
• Local storage systems in any form that can trade off consistency and performance (e.g. KV-store, file system, block store, DB, etc.)
Requirements:
1. Maintain multiple versions of data
   - Should have an interface to access older versions
2. Aware of consistency semantics
   - Bounded staleness, monotonic-reads, read-my-writes, etc.
3. Can give cost estimates for accessing each version
   - Considerations for data locations and storage conditions
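Requirement 2 can be read as a rule for the oldest version a client is still allowed to see. The following is a minimal Python sketch, assuming integer version numbers and staleness counted in versions (a real system may bound it in time instead). The `ClientState` record and the `min_allowed_version` helper are hypothetical names that do not appear on the poster.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    """Per-client bookkeeping (hypothetical; not from the poster)."""
    last_read: int = 0     # newest version this client has read
    last_written: int = 0  # newest version this client has written


def min_allowed_version(semantics, latest, client, bound=0):
    """Return the oldest version a read may return under `semantics`."""
    if semantics == "bounded-staleness":
        # At most `bound` versions behind the latest one (assumed unit).
        return max(latest - bound, 0)
    if semantics == "monotonic-reads":
        # Never older than what this client has already read.
        return client.last_read
    if semantics == "read-my-writes":
        # Never older than this client's own latest write.
        return client.last_written
    raise ValueError(f"unknown semantics: {semantics}")


client = ClientState(last_read=3, last_written=5)
for name in ("bounded-staleness", "monotonic-reads", "read-my-writes"):
    print(name, min_allowed_version(name, latest=6, client=client, bound=2))
```

Any version between the returned lower bound and the latest version is then a candidate, which is what leaves room to choose by access cost.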
1. Issue GetCost() for block 1 between versions 3 and 6 (N queries with uniform distance)
2. Read the cheapest: e.g. 1(5): Read(1, 5)
3. Record the selected version for block 1
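The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not Yogurt's implementation: `get_cost` is a stand-in backed by a made-up cost table, and `uniform_versions` and `read_cheapest` are hypothetical helper names.

```python
# Hypothetical cost table: (block, version) -> estimated access cost.
# Lower is cheaper, e.g. a cached copy versus one on a busy disk.
COSTS = {(1, 3): 200, (1, 4): 90, (1, 5): 10, (1, 6): 120}


def get_cost(blk, ver):
    """Stand-in for GetCost(blk, ver); unknown versions cost infinity."""
    return COSTS.get((blk, ver), float("inf"))


def uniform_versions(low, high, n):
    """Pick up to n distinct versions spread evenly over [low, high]."""
    if n <= 1 or low == high:
        return [high]
    step = (high - low) / (n - 1)
    return sorted({round(low + i * step) for i in range(n)})


def read_cheapest(blk, low, high, n, last_read):
    """Query costs, pick the cheapest version, and record the choice."""
    candidates = uniform_versions(low, high, n)             # step 1
    best = min(candidates, key=lambda v: get_cost(blk, v))  # step 2
    last_read[blk] = best                                   # step 3
    return best


# Monotonic-reads: the lower bound is the version read last time.
last_read = {1: 3}
chosen = read_cheapest(1, low=last_read[1], high=6, n=4, last_read=last_read)
print(chosen, last_read)
```

Recording the selected version matters for monotonic-reads: the next read of block 1 uses it as the new lower bound, so the client never observes an older version than one it has already seen.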
[Figure: Yogurt cache and log. Log entries, written as block(version): 3(3) 1(4) 2(4) 1(5) 3(5) 1(6)]
I/O | Write(blk, data, ver), Read(blk, ver) | Versioned writes to snapshots; versioned reads from snapshots
Cost | GetCost(blk, ver) | cache << disk, # of queued I/O (read << write)
Multi-block object access | GetVersionRange(blk, ver) | Returns a version range in which a block is valid
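To make the three API groups concrete, here is a toy in-memory model in Python. It is a sketch under assumptions, not Yogurt's design: the class name `ToyStaleStore`, the fixed cache and disk cost constants, and the dictionary layout are all hypothetical. Only the four call names come from the table.

```python
import bisect

CACHE_COST, DISK_COST = 1, 100  # hypothetical constants: cache << disk


class ToyStaleStore:
    """Toy versioned block store (illustrative only)."""

    def __init__(self):
        self.latest = 0       # newest snapshot number seen so far
        self.versions = {}    # blk -> sorted versions at which blk was written
        self.data = {}        # (blk, ver) -> payload
        self.cached = set()   # (blk, ver) pairs assumed to sit in the cache

    def _written_at(self, blk, ver):
        """Newest write of blk at or before snapshot ver."""
        vers = self.versions.get(blk, [])
        i = bisect.bisect_right(vers, ver)
        if i == 0:
            raise KeyError(f"block {blk} has no data at snapshot {ver}")
        return vers[i - 1]

    def write(self, blk, data, ver, cached=False):
        vers = self.versions.setdefault(blk, [])
        if ver not in vers:
            bisect.insort(vers, ver)
        self.data[(blk, ver)] = data
        if cached:
            self.cached.add((blk, ver))
        self.latest = max(self.latest, ver)

    def read(self, blk, ver):
        return self.data[(blk, self._written_at(blk, ver))]

    def get_cost(self, blk, ver):
        key = (blk, self._written_at(blk, ver))
        return CACHE_COST if key in self.cached else DISK_COST

    def get_version_range(self, blk, ver):
        """Inclusive snapshot range over which this copy of blk is valid."""
        start = self._written_at(blk, ver)
        vers = self.versions[blk]
        i = vers.index(start)
        # Assumption: the newest copy stays valid up to the latest snapshot.
        end = vers[i + 1] - 1 if i + 1 < len(vers) else self.latest
        return (start, end)


store = ToyStaleStore()
store.write(1, b"a", 4)
store.write(1, b"b", 5, cached=True)
store.write(1, b"c", 6)
print(store.read(1, 5), store.get_cost(1, 5), store.get_cost(1, 6))
print(store.get_version_range(1, 5))
```

The toy cost model only distinguishes cache from disk. The cost row in the table also mentions the number of queued I/Os, which this sketch leaves out.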
Reading block 1 (monotonic-reads)
• Key-value stores and file systems can store an object over multiple blocks
• Read should be served from a persistent snapshot: GetVersionRange()
Multi-Block Object Access in Yogurt
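One way to read the second bullet: an object that spans several blocks is only coherent if every block is read as of a common snapshot. The following is a minimal Python sketch, assuming each block's validity range is an inclusive `(start, end)` pair such as a call like GetVersionRange() might return. The helper name `common_snapshot_range` and the sample ranges are hypothetical.

```python
def common_snapshot_range(ranges):
    """Intersect inclusive (start, end) ranges; None if they are disjoint."""
    start = max(r[0] for r in ranges)
    end = min(r[1] for r in ranges)
    return (start, end) if start <= end else None


# Hypothetical validity ranges for the three blocks of one object.
overlapping = [(3, 5), (4, 6), (2, 4)]
disjoint = [(1, 2), (4, 6)]

print(common_snapshot_range(overlapping))
print(common_snapshot_range(disjoint))
```

When the intersection is empty, the chosen block versions never coexisted in one snapshot, so the reader would have to pick a different version for at least one block.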
[Figure: multi-block object layout. Labels: Hard Disk Drive, Solid State Disk. Axes: Block Addr (0 to 2), Timestamp (Snapshot #) (0 to 3)]
[Figure: Average Read Latency (us), 0 to 200000, vs. # of Stale Versions @ start time, 1 to 8. Series: Primary, Local latest, Yogurt MR, Yogurt RMW]
[Figure: latency axis, 0 to 200000, vs. Key-Value Pair Size, 4KB to 20KB]
[Figure: Average Latency (us), 0 to 7, vs. GetCost Query Size (# of queries): 32B (3), 64B (7), 128B (15), 256B (31), 512B (63), 1024B (127)]
• Cost querying overhead is negligible compared to disk and SSD access latencies
Other Possible StaleStores
• Single disk log-structured store
• SSD flash translation layers
• Log-structured arrays
• Durable write caches that are fast for writes but slow for reads
• Deduplicated systems with read caches
• Fine-grained logging over a block-grained cache
• Systems storing differences from previous versions