towards a redundancy -aware network stack for data centersmusa/uploads/hotnets_2016_talk.pdf ·...

Post on 21-May-2020

8 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

TowardsaRedundancy-AwareNetworkStackforDataCenters

AliMusaIftikhar(Tufts)

Ihsan AyyubQazi(LUMS)

FahadR.Dogar(Tufts)

TheProblemofTailLatency inDataCenters!

2

TheProblemofTailLatency inDataCenters!

Highfan-out

3

TheProblemofTailLatency inDataCenters!

• Loadimbalance• Backgroundtasks• Failures,etc.

Highfan-out

Straggler

4

TheProblemofTailLatency inDataCenters!

• Loadimbalance• Backgroundtasks• Failures,etc.

LongtaillatencyHighfan-out Stragglers+

Highfan-out

Straggler

5

Howtoavoidstragglers?

Reactively Proactively

6

Howtoavoidstragglers?

Reactively Proactively

PRO:lowoverhead

Hopper(SIGCOMM’15)C3(NSDI’15)

Sinbad(SIGCOMM’13)

CON:requiresstragglerdetection(slowandinaccurate)

7

Howtoavoidstragglers?

Reactively Proactively

PRO:lowoverhead

Dolly(NSDI’13)Lowlatencyvia

redundancy(CoNext’13)

Hopper(SIGCOMM’15)C3(NSDI’15)

Sinbad(SIGCOMM’13)

PRO:fastandaccurate

CON:requiresdeterminingthresholdload(non-trivial)

CON:requiresstragglerdetection(slowandinaccurate)

8

Howtoavoidstragglers?

Reactively Proactively

PRO:lowoverhead

Can we achieve the benefits of both without their limitations?

Dolly(NSDI’13)Lowlatencyvia

redundancy(CoNext’13)

Hopper(SIGCOMM’15)C3(NSDI’15)

Sinbad(SIGCOMM’13)

PRO:fastandaccurate

CON:requiresdeterminingthresholdload(non-trivial)

CON:requiresstragglerdetection(slowandinaccurate)

9

Overview• Duplicate-Aware Scheduling Framework

• Redundancy-Aware Network Stack

• Preliminary Results

Genericframework

NewnetworkstackforDC

10

Duplicate-awarescheduling

Replica1

Client

Replica2 11

Duplicate-awarescheduling

Replica1

Client

Replica2

high

low

high

low

1. PriorityQueues

12

Duplicate-awarescheduling

Replica1

Client

Replica2

high

low

high

low

request

1. PriorityQueues

13

Duplicate-awarescheduling

Replica1

Client

Replica2

high

low

high

low

request

1. PriorityQueues

14

Duplicate-awarescheduling

Replica1

Client

Replica2

high

low

high

low

P

request

B

1. PriorityQueues

15

Duplicate-awarescheduling

Replica1

Client

Replica2

high

low

P

high

lowB

request

1. PriorityQueues

16

Duplicate-awarescheduling

Replica1

Client

Replica2

high

low

high

lowB

request

1. PriorityQueues

17

Duplicate-awarescheduling

Replica1

Client

Replica2

high

low

high

lowB

request

purge

1. PriorityQueues2. Purging

18

NeedforPriorityQueuing

high

lowbackup

primary

19

ØDuplicationhasanoverhead!

L

NeedforPriorityQueuing

high

lowbackup

primary

üStrictprioritiesüWorkconservationüPreemption

20

ØDuplicationhasanoverhead!

LPropertiesrequired:

NeedforPriorityQueuing

high

lowbackup

primary

üStrictprioritiesüWorkconservationüPreemption

21

ØDuplicationhasanoverhead!

LPropertiesrequired:

PQ makes the overhead of duplication low. sJ

NeedforPriorityQueuing

high

lowbackup

primary

üStrictprioritiesüWorkconservationüPreemption

22

ØDuplicationhasanoverhead!

LPropertiesrequired:

PQ makes the overhead of duplication low. sJessential

ImportanceofPurging

ØStalerequestsblocknewrequests.

Lhigh

lowreq1req2

stale

23

ImportanceofPurging

high

lowreq1req2

stale

ØStalerequestsblocknewrequests.

L

Purging makes the system more efficient! A J24

ImportanceofPurging

high

lowreq1req2

stale

ØStalerequestsblocknewrequests.

L

Purging makes the system more efficient! A Joptimization 25

RealizingDuplicate-AwareSchedulingateverypotentialbottleneck resourceinaDC

26

RealizingDuplicate-AwareSchedulingateverypotentialbottleneck resourceinaDC

Network

27

RealizingDuplicate-AwareSchedulingateverypotentialbottleneck resourceinaDC

Compute

Network

28

RealizingDuplicate-AwareSchedulingateverypotentialbottleneck resourceinaDC

Memory

Compute

Network

29

RealizingDuplicate-AwareSchedulingateverypotentialbottleneck resourceinaDC

GFSHDFS BigTable

Memory

Compute

Filesystem/Database

Network

30

RealizingDuplicate-AwareSchedulingateverypotentialbottleneck resourceinaDC

GFSHDFS BigTable

Storage

Memory

Compute

Filesystem/Database

Network

31

ateverypotentialbottleneck resourceinaDC

GFSHDFS BigTable

Memory

Compute

Storage

Network

Filesystem/Database

In-networkpurging

Prioritization

Purging+preemption

challenges

32

RealizingDuplicate-AwareScheduling

RedundancyAwareNetworkStack(RANS)

Application

Transport

Link

Network

Duplicate-Awareness

Pointtomultipoint

PriorityQueues

Physical Sameasbefore

Sameasbefore

ExpressiveInterface

Layer NewRole

+purging

+purging

33

RedundancyAwareNetworkStack(RANS)

Application

Transport

Link

Network

Duplicate-Awareness

Pointtomultipoint

PriorityQueues

Physical Sameasbefore

Sameasbefore

ExpressiveInterface

Layer NewRole

+purging

Applicationsneedtobemodified.

challenge

+purging

34

RedundancyAwareNetworkStack(RANS)

Application

Transport

Link

Network

Duplicate-Awareness

Pointtomultipoint

PriorityQueues

Physical Sameasbefore

Sameasbefore

ExpressiveInterface

Layer NewRole

+purging

Applicationsneedtobemodified.

Expressiveinterfaceallowsrichcommunicationb/wAppand

Transport.E.g.DAG

challenge

opportunity

+purging

35

RedundancyAwareNetworkStack(RANS)

Application

Transport

Link

Network

Duplicate-Awareness

Pointtomultipoint

PriorityQueues

Physical Sameasbefore

Sameasbefore

ExpressiveInterface

Layer NewRole

+purging

Applicationsneedtobemodified.

Expressiveinterfaceallowsrichcommunicationb/wAppand

Transport.E.g.DAG

challenge

opportunity

Hardtoimplementperpacketpurging.

challenge+purging

36

RedundancyAwareNetworkStack(RANS)

Application

Transport

Link

Network

Duplicate-Awareness

Pointtomultipoint

PriorityQueues

Physical Sameasbefore

Sameasbefore

ExpressiveInterface

Layer NewRole

+purging

Applicationsneedtobemodified.

Expressiveinterfaceallowsrichcommunicationb/wAppand

Transport.E.g.DAG

challenge

opportunity

Hardtoimplementperpacketpurging.

challenge

AddssupportforexistingPQsinDCswitches.

opportunity

+purging

37

RedundancyAwareNetworkStack(RANS)

Application

Transport

Link

Network

Duplicate-Awareness

Pointtomultipoint

PriorityQueues

Physical Sameasbefore

Sameasbefore

ExpressiveInterface

Layer NewRole

+purging

Applicationsneedtobemodified.

Expressiveinterfaceallowsrichcommunicationb/wAppand

Transport.E.g.DAG

challenge

opportunity

Hardtoimplementperpacketpurging.

challenge

AddssupportforexistingPQsinDCswitches.

opportunity

+purging

38

e.g.Improvedfaulttolerance

üMultipath

üMulti-destination

RANSTransport:PointtoMulti-point

ØEnables:Richtransport

Sender1(replica1)

Receiver(client)

Sender2(replica2)

39

RANSTransport:ByteAggregation

Sender1(replica1)

Receiver(client)

Sender2(replica2)

ØOpportunity:Receiverdriventransport

Response

e.g.Moreefficientcongestioncontrol(2xormore)

üTwoormoreresponsestreams

üAggregatebytesatreceiverside

40

RANSTransport:PriorityAssignment

Sender1(replica1)

Receiver(client)

Sender2(replica2)

ØDynamicreplicaassignment

Response+Feedback

e.g.Improvedreplicaassignment

üFinegrainedmonitoringofcongestionwindow

üDynamicallyreprioritizeflows

üFeedbacktoApplication

41

Overview• Duplicate-Aware Scheduling Framework

• Redundancy-Aware Network Stack

• Preliminary Results

42

PreliminaryEvaluation:ns-2setupdetailsØ Storagescenario

Client

10servers

Replica1

Replica2

43

PreliminaryEvaluation:ns-2setupdetailsØ Storagescenario

Client

10servers

Replica1

Replica2

bottlenecks

44

PreliminaryEvaluation:ns-2setupdetails

TrafficDetails

Totalrequests 20K

Arrivalprocess Poisson

Server&replicaselection Uniformlyrandom

Ø Storagescenario

Client

10servers

Replica1

Replica2

bottlenecks

45

PreliminaryEvaluation:ns-2setupdetails

TrafficDetails

Totalrequests 20K

Arrivalprocess Poisson

Server&replicaselection Uniformlyrandom

Ø Storagescenario

Client

10servers

Replica1

Replica2

bottlenecks

The only source of stragglers is load imbalance.

46

Noduplicates(baseline)

2-copies(proactivew/oPQ)

+PQs

+Purging

+ByteAggregation(RANS)

Averagerequestcompletiontimeof:

Load(%)

Requ

estcom

pletiontim

e(s)

47

Noduplicates(baseline)

2-copies(proactivew/oPQ)

+PQs

+Purging

+ByteAggregation(RANS)

Averagerequestcompletiontimeof:

Load(%)

Requ

estcom

pletiontim

e(s)

48

Noduplicates(baseline)

2-copies(proactivew/oPQ)

+PQs

+Purging

+ByteAggregation(RANS)

Averagerequestcompletiontimeof:

Load(%)

Requ

estcom

pletiontim

e(s)

49

Noduplicates(baseline)

2-copies(proactivew/oPQ)

+PQs

+Purging

+ByteAggregation(RANS)

Averagerequestcompletiontimeof:

Load(%)

Requ

estcom

pletiontim

e(s)

50

Noduplicates(baseline)

2-copies(proactivew/oPQ)

+PQs

+Purging

+ByteAggregation(RANS)

Averagerequestcompletiontimeof:

Load(%)

Requ

estcom

pletiontim

e(s)

51

Noduplicates(baseline)

2-copies(proactivew/oPQ)

+PQs

+Purging

+ByteAggregation(RANS)

Averagerequestcompletiontimeof:

Load(%)

Requ

estcom

pletiontim

e(s)

~2X

52

Noduplicates(baseline)

2-copies(proactivew/oPQ)

+PQs

+Purging

+ByteAggregation(RANS)

Averagerequestcompletiontimeof:

Load(%)

Requ

estcom

pletiontim

e(s)

~2X

Expecting more gains even at lower loads with additional straggler sources.

53

Noduplicates(baseline)

2-copies(proactivew/oPQ)

+PQs

+Purging

+ByteAggregation(RANS)

Averagerequestcompletiontimeof:

Load(%)

Requ

estcom

pletiontim

e(s)

50-80% improvement over the baseline across all loads.

Expecting more gains even at lower loads with additional straggler sources.

~2X

54

Summary&Futurework

• TheIssueofStragglers

• Duplicate-AwareSchedulingFramework

• RANS

• ImplementinginHDFSandCassandra

Simpleyetchallengingsolution

Afirststeptowardsaduplicate-awarenetwork

55

RANS:FeedbackandDiscussion

• AliMusaIftikhar(musa@cs.tufts.edu)

• FahadR.Dogar (fahad@cs.tufts.edu)

• Ihsan A.Qazi (ihsan.qazi@lums.edu.pk)Transport

Link

Network

PointtomultipointByteaggregationPriorityassignment

PriorityQueues

Physical Sameasbefore

Sameasbefore

ExpressiveInterface

Application Duplicate-Awareness

Layer NewRole

+purging

+purging

56

Possiblequestions– backupslide• Preemptionoverhead

• Notreallyanissueinthenetworkbecausepacketsaresmall.

• Packetpurging• PFC(backpressure,buildqueuesattheendhostsandpurgethem)

• Droptheentireduplicatequeue(easierthanper-packetdrops)

• Recenttrendtowardsprogrammableswitches

• GainswithPQ• Moregainswithfailuresasstragglers(primaryundergoesafailure)

• Alsomorebenefitswithdifferentresources

• Duplicationoverheadatclient• Clientisusuallynotthebottleneck

• Non-Idempotentrequests• Wearetargetingtheclassofappswhichhaveflexibleendpointsandrequireatleastoncesemantics

• Replicatingonlysmallpacketsandprioritizingthem• Onlybeneficialwithbursty smallflows• HDFShaveatypicalchunksizeb/w64MB-128MB

• Quorumsystems• RANScomplementssuchsystems,theycanusethistechniqueandsendKoutofNrequestsathighprio whileN-Kasbackups

• Can’tjustimplementattheappandgetthesamebenefits?• Networkcouldbeabottleneck• Finegrainedcontrol,muchmorecontrol

• Rootcausesofperformanceimprovement• PQavoidsoverheads• Nowwecaneasilygetthebenefitsofduplicationslikeaggregationetc.

• Purgingwillalsoattimespurgeprimarymakingthesystemmoreefficient.

57

Foodforthought

DCPrimary DCFailover

InterDCDuplicate-AwareScheduling

e.g.Google’sGeo-DistributedDatabase“Spanner”(OSDI’12)

58

Foodforthought

DCPrimary DCFailover

InterDCDuplicate-AwareScheduling

e.g.Google’sGeo-DistributedDatabase“Spanner”(OSDI’12)

Prefetch

SearchsuggestionsSpellcheck

• Searchenginesdropspellcheck,suggestions,etc.athighloads.

• Canbenefitfromduplicate-awarescheduling.

59

WhenRANSworksbest?

• Applicationfanout ishighandstragglersarefrequent.• End-pointsareflexibleand“atleastonce”semanticsaresufficient.• Clientisnotthebottleneck.• Requestsizesaresmall(orpreemptionoverheadisminimal).

60

top related