
He Xiao, Zhenhua Li, Ennan Zhai, Tianyin Xu

Practical Web-based Delta Sync for Cloud Storage Services

xiaoh16@gmail.com, July 10, 2017, HotStorage '17

Network Traffic is Overwhelming in Cloud Storage

Cloud traffic has a 30% CAGR (compound annual growth rate).

[Figure: file sync between client and server; labels: Network Traffic, Users, Vendors]

Delta Sync Improves Network Efficiency

Delta sync is crucial for reducing cloud storage network traffic.

[Figure: full sync vs. delta sync between a new file and an old file; labels: 10 MB, 1 B. Full sync transfers the full file, delta sync transfers only the delta data.]

[Table: Delta sync support in nine state-of-the-art cloud storage services]

No Web-based Delta Sync

Why is web-based delta sync not supported by today's cloud storage services?

Web apps with local storage or log files need web-based delta sync.

The web is the most pervasive and OS-independent cloud storage access method.

Web-based delta sync is essential for cloud storage web clients and web apps.

Contribution

• We quantitatively study why web-based delta sync is not offered by today's cloud storage services.

• We build a practical web-based delta sync solution for cloud storage services.
  • By reversing the traditional delta sync process, we make the overhead affordable at the web client side.
  • By exploiting the locality of users' edits and trading off hash algorithms, we make the computation overhead affordable at the server side.

WebRsync: Implement Delta Sync on Web

• Implement rsync on real cloud storage with native web tech: JavaScript + HTML5 + WebSocket
• rsync is the de facto solution of delta sync in cloud storage

[Figure: WebRsync architecture. Labels: Web Browser (JavaScript implementation of rsync, HTML5 File API, local file system); WebSocket; Web Server (C implementation of rsync); high-speed internal network; storage backend (Aliyun OSS / OpenStack Swift).]

WebRsync vs. rsync

[Figure: Sync time of WebRsync vs. rsync]

[Figure: Average client CPU utilization]

Stagnation due to JavaScript's Single-threaded Event Loop Model

StagMeter

// print a timestamp every 100 ms
setInterval(() => print(timestamp()), 100)

// print the timestamp of every keystone (start or end of a task)
on_start(task):  print(task.id, timestamp())
on_finish(task): print(task.id, timestamp())
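The StagMeter idea can be sketched in plain JavaScript. This is a minimal illustration, not the authors' tool: the names `startHeartbeat`, `runTask`, `findStagnations`, and the 50 ms tolerance are assumptions made for the example. A timer records a timestamp every 100 ms; while the single thread is busy the callback cannot fire, so a stagnation shows up as a gap between consecutive timestamps.

```javascript
// Hypothetical StagMeter-style helpers; all names are illustrative.
const INTERVAL_MS = 100;

// Record a timestamp every INTERVAL_MS. While the single JavaScript
// thread is busy, the callback cannot run, so the log shows a gap.
function startHeartbeat(log, now = Date.now) {
  const id = setInterval(() => log.push(now()), INTERVAL_MS);
  return () => clearInterval(id); // call the returned function to stop
}

// Record the start and the end of a task (the "keystones").
function runTask(taskId, fn, events, now = Date.now) {
  events.push({ taskId, phase: "start", t: now() });
  const result = fn();
  events.push({ taskId, phase: "finish", t: now() });
  return result;
}

// Report gaps between consecutive heartbeats that exceed the interval
// by more than toleranceMs; each gap is one stagnation period.
function findStagnations(heartbeats, toleranceMs = 50) {
  const gaps = [];
  for (let i = 1; i < heartbeats.length; i++) {
    const delta = heartbeats[i] - heartbeats[i - 1];
    if (delta > INTERVAL_MS + toleranceMs) {
      gaps.push({ from: heartbeats[i - 1], to: heartbeats[i], ms: delta });
    }
  }
  return gaps;
}
```

Matching the reported gaps against the task start and finish events shows which step of the sync was running while the page was unresponsive.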

StagMeter on WebRsync

1. Send metadata; wait for the server
2. Checksum search and comparison
3. Send tokens and literal bytes; wait for the server

High CPU utilization when computing.

Timestamp printing is suspended: the web page is in a stagnation state.

[Figure: StagMeter output over the sync process (seconds)]

WebR2sync: Client-side Optimization, Reverse Computation Process

[Figure: WebRsync message flow between client and server. Labels: request for syncing file f'; checksum list of f; segmentation and fingerprinting; searching and comparing; generate tokens and literal bytes; construct new file f; ACK.]

WebR2sync: Client-side Optimization, Reverse Computation Process

• Web Reverse Rsync: move the complicated computation (searching and comparing) from the client to the server.

[Figure: WebR2sync message flow between client and server. Labels: request for syncing file f'; segmentation and fingerprinting; generate tokens and literal bytes; construct new file f; ACK; searching and comparing; checksum list of f.]
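The reversed flow can be sketched as a single-process simulation. This is an illustration under stated assumptions, not the authors' code: the function names, the 4-byte block size, and the message shapes are hypothetical, and a 32-bit FNV-1a hash stands in for the strong hash (MD5 in the talk). The client only fingerprints its new file; the server does the expensive search over its old copy; the client then supplies literal bytes only for the blocks the server could not find.

```javascript
// Hypothetical sketch of a reversed (WebR2sync-style) delta sync.
// All names and the tiny block size are illustrative assumptions.
const BLOCK = 4;

// Stand-in weak checksum (Adler-32-like sums over a window).
function weak(buf, start, end) {
  let a = 0, b = 0;
  for (let i = start; i < end; i++) {
    a = (a + buf[i]) % 65521;
    b = (b + a) % 65521;
  }
  return b * 65536 + a;
}

// Stand-in strong hash (32-bit FNV-1a); NOT MD5 or SipHash.
function strong(buf, start, end) {
  let h = 0x811c9dc5;
  for (let i = start; i < end; i++) {
    h = Math.imul(h ^ buf[i], 0x01000193) >>> 0;
  }
  return h;
}

// Client side: cheap segmentation and fingerprinting of the new file.
function clientChecksumList(newFile) {
  const list = [];
  for (let s = 0; s < newFile.length; s += BLOCK) {
    const e = Math.min(s + BLOCK, newFile.length);
    list.push({ weak: weak(newFile, s, e), strong: strong(newFile, s, e), len: e - s });
  }
  return list;
}

// Server side: expensive searching and comparing over the old file.
// Returns, for each client block, its offset in oldFile or -1.
function serverSearch(oldFile, list) {
  const table = new Map(); // weak checksum -> client block indices
  list.forEach((c, i) => {
    if (c.len !== BLOCK) return; // a short tail block is always sent as literal
    if (!table.has(c.weak)) table.set(c.weak, []);
    table.get(c.weak).push(i);
  });
  const found = new Array(list.length).fill(-1);
  for (let off = 0; off + BLOCK <= oldFile.length; off++) {
    // Recomputed per offset for brevity; a real search would roll the checksum.
    const candidates = table.get(weak(oldFile, off, off + BLOCK));
    if (!candidates) continue;
    const s = strong(oldFile, off, off + BLOCK);
    for (const i of candidates) {
      if (found[i] === -1 && list[i].strong === s) found[i] = off;
    }
  }
  return found;
}

// Simulates one sync; returns the rebuilt file and the literal byte count.
function webR2sync(oldFile, newFile) {
  const list = clientChecksumList(newFile);  // client -> server
  const found = serverSearch(oldFile, list); // server -> client (match info)
  const out = [];
  let literalBytes = 0;
  found.forEach((off, i) => {
    const start = i * BLOCK;
    if (off === -1) { // client sends literal bytes for this block
      const lit = newFile.slice(start, start + list[i].len);
      literalBytes += lit.length;
      out.push(...lit);
    } else {          // server copies the block from its old file
      out.push(...oldFile.slice(off, off + BLOCK));
    }
  });
  return { result: Uint8Array.from(out), literalBytes };
}
```

Calling `webR2sync(oldBytes, newBytes)` with two `Uint8Array` inputs returns the reconstructed file together with the number of bytes that had to travel as literals, which makes the traffic saving easy to inspect for small inputs.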

Performance of WebR2sync

[Two figures: sync time (seconds) vs. edit size (bytes)]

Issue: the server takes on severely heavy overhead.

Server-side Overhead Profiling

Checksum searching and block comparison occupy 80% of the computing time.

[Figure labels: MD5 Computing, Checksum Search]

• Use faster hash functions to replace MD5
• Reduce checksum searching overhead

Replacing MD5 with SipHash in Chunk Comparison

Hash Function   Collision Probability   Cycles per Byte
MD5             Low                     5.58
Murmur3         High                    0.33
Spooky          High                    0.14
SipHash         Low                     1.13

Table: A comparison of pseudorandom hash functions

SipHash retains a low collision probability at a much faster speed.

Solve Possible Hash Collision

• Replacing MD5 with SipHash may cause collisions (with probability p), but so may MD5.

• Our solution: use Spooky as well (the fastest method, with collision probability p').
  • The probability of collisions is p * p'.

• Alternative: use MD5 or another strong hash function as a global verification.
  • Computing MD5 over the whole file is expensive.
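The product on the slide follows if the two hash functions are treated as colliding independently, an assumption the slide leaves implicit. A pair of different blocks is wrongly accepted as a match only when both hashes collide on it:

```latex
\[
\Pr[\text{false match}]
  = \Pr[\text{SipHash collides}] \cdot \Pr[\text{Spooky collides}]
  = p \cdot p' \le p .
\]
```

Since $p' \le 1$, the combined check is never weaker than SipHash alone under this assumption.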

Reduce Chunk Searching by Exploiting the Locality of File Edits

[Figure: hash table of weak checksums (Adler32-1 to Adler32-4) and strong checksums (MD5-1 to MD5-4) for Block 1 to Block 4; steps: checksum search, then compare]

95% of synchronized files have fewer than 10 edits.
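The slide does not spell out the mechanism, so the following is one plausible reading rather than the authors' algorithm: when a block has just matched server block j, the next block is first compared directly against server block j+1, and the hash table is searched only when that shortcut fails. The function name `matchChunks`, the `{ weak, strong }` record shape, and the block-aligned comparison (no byte-offset rolling) are all simplifying assumptions.

```javascript
// Hypothetical sketch: locality-aware chunk matching.
// Simplification (assumption): both files are compared block by block,
// so byte-offset rolling is left out.

// clientSums / serverSums: arrays of { weak, strong }, one entry per block.
function matchChunks(clientSums, serverSums) {
  const table = new Map(); // weak checksum -> server block indices
  serverSums.forEach((s, j) => {
    if (!table.has(s.weak)) table.set(s.weak, []);
    table.get(s.weak).push(j);
  });

  const matches = [];   // matches[i] = matching server block index, or -1
  let tableLookups = 0; // how many times the hash table was searched
  let prev = -1;        // server index matched by the previous client block

  for (const c of clientSums) {
    let hit = -1;
    const next = serverSums[prev + 1];
    if (prev !== -1 && next && next.strong === c.strong) {
      hit = prev + 1; // locality shortcut: no checksum search needed
    } else {
      tableLookups++;
      for (const j of table.get(c.weak) || []) {
        if (serverSums[j].strong === c.strong) { hit = j; break; }
      }
    }
    matches.push(hit);
    prev = hit;
  }
  return { matches, tableLookups };
}
```

With few edits per file, most blocks follow a matched neighbour, so the `tableLookups` counter stays far below the number of blocks.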

Evaluation Setup

[Figure: basic experiment setup visualized on a map of China]

Sync Time

[Figure: sync time (seconds, log scale from 10^-1 to 10^1) vs. edit size (1 B to 100 KB) for WebRsync, WebR2sync, WebR2sync+, and rsync]

WebR2sync+ is 2-3 times faster than WebR2sync and 15-20 times faster than WebRsync.

Throughput

[Figure: number of concurrent users (0 to 8000) for NoWebRsync, WebRsync, WebR2sync, WebR2sync+, and rsync]

The throughput of WebR2sync+ is 4 times that of WebR2sync/rsync and 9 times that of NoWebRsync.

Future Work

• Evaluate our approach under different edit modes
  • delete, insert, append

• Evaluate traffic efficiency
  • all the methods should have similar traffic efficiency

• Understand the effects of the three optimizations
  • evaluate them separately

Discussion

• Probability of collisions of file checksums

• Characteristics of file operations in real-world scenarios from the perspective of sync

• Locality measure for deciding whether to apply the locality-based optimization

Conclusion

• WebR2sync+ is a practical solution for web-based delta sync
  • lightweight computation at the client side
  • optimized overhead at the server side
  • the server-side optimizations can be adopted in the traditional cloud storage architecture

Thanks! Discussion

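WebRsync Detailed Description

[Flowchart, client and server: the file is split into blocks (Block 1, Block 2, Block 3, ...), each with an Adler32 weak checksum and an MD5 strong checksum. A weak checksum search is followed, on YES, by a strong checksum compare. YES branches advance by a 1-block offset, NO branches advance by a 1-byte offset. Outputs: matched tokens and literal bytes, used to construct the new file.]

Rolling Adler32 is O(1): Adler(i) => Adler(i+1)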
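The O(1) step from Adler(i) to Adler(i+1) can be illustrated with an rsync-style rolling checksum. This is a sketch under assumptions: it uses modulus 65536 as rsync's Adler-32-inspired weak checksum does (standard Adler-32 uses 65521 and a different update), and the names `weakSum` and `roll` are made up for the example.

```javascript
// Hypothetical rolling weak checksum in the style of rsync.
const M = 65536; // rsync-style modulus (assumption; standard Adler-32 uses 65521)

// Direct O(len) computation over the window buf[start, start + len).
function weakSum(buf, start, len) {
  let a = 0, b = 0;
  for (let i = 0; i < len; i++) {
    a = (a + buf[start + i]) % M;
    b = (b + (len - i) * buf[start + i]) % M;
  }
  return { a, b };
}

// O(1) update: slide the window one byte to the right,
// dropping byte `out` on the left and adding byte `inn` on the right.
function roll(sum, out, inn, len) {
  const a = (sum.a - out + inn + M) % M;
  const b = ((sum.b - len * out + a) % M + M) % M;
  return { a, b };
}
```

Rolling from offset i to i+1 touches only the two bytes that leave and enter the window, which is what keeps the 1-byte-offset branch of the search affordable.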

WebR2sync: Flowchart and Data Structure

[Flowchart, client and server: a weak checksum search is followed, on YES, by a strong checksum compare. NO branches advance by a 1-byte offset; a YES on the strong compare needs no further operation. Both sides list Block 1 to Block 4; the result is used to construct the new files.]

When a match is found, record the associated index.

Sync Time Decomposed

[Figure: sync time (seconds, 0 to 0.2) vs. edit size (1 B to 100 KB), decomposed into server, network, and client time]

The WebR2sync+ client takes a stable and shorter time. Because of the server-side optimization, computing time is much shorter on both the client and the server.
