Distributed Debugging
Presenter: Chi-Hung Lu

Uploaded by zachary-rivas on 30-Dec-2015

Problems
- Distributed applications are hard to validate:
  - Application state is distributed across many distinct execution environments
  - Protocols involve complex interactions among a collection of networked machines
  - Failures ranging from network problems to crashing nodes must be handled
  - Intricate sequences of events can trigger complex errors as a result of mishandled corner cases

Approaches
- Logging-based debugging: X-Trace, Bi-directional Distributed BackTracker (BDB), Pip
- Deterministic replay: WiDS, Friday, Jockey
- Model checking: MaceMC

X-Trace: A Pervasive Network Tracing Framework
R. Fonseca et al., NSDI '07

Problem Description
- It is difficult to diagnose the source of a problem in an Internet application
- Current network diagnostic tools focus on only one particular protocol
- They do not share information about the application among the user, the service, and the network operators

Examples
- traceroute
  - Can locate IP connectivity problems
  - Cannot reveal proxy or DNS failures
- HTTP monitoring suite
  - Can locate application problems
  - Cannot diagnose routing problems

Examples (figures): diagrams involving the User, DNS Server, Proxy, and Web Server

X-Trace
- An integrated tracing framework that records the network paths that were taken
- X-Trace is invoked when an application task is initiated
- X-Trace metadata carrying a task identifier is inserted into the request
- The metadata is propagated down to lower layers through protocol interfaces
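The X-Trace slide says that metadata carrying a task identifier is created when a task starts and is then propagated, including down to lower protocol layers. The following is a minimal Python sketch of that idea, not X-Trace's actual implementation. It assumes random 32-bit IDs, models only the TaskID and TreeInfo fields, and uses hypothetical helper names (`start_task`, `push_next`, `push_down`) and assumed EdgeType values (`"NEXT"`, `"DOWN"`).

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class XTraceMetadata:
    # Simplified: only TaskID and the TreeInfo fields are modeled.
    task_id: int
    op_id: int
    parent_id: Optional[int] = None
    edge_type: Optional[str] = None  # assumed values: "NEXT" or "DOWN"


def _new_id() -> int:
    # Random 32-bit identifier (an assumption; the slides only say "unique").
    return secrets.randbits(32)


def start_task() -> XTraceMetadata:
    """Called when the application initiates a task: mint a fresh TaskID."""
    return XTraceMetadata(task_id=_new_id(), op_id=_new_id())


def push_next(md: XTraceMetadata) -> XTraceMetadata:
    """Propagate to the next hop at the same layer (e.g. client to proxy)."""
    return XTraceMetadata(md.task_id, _new_id(), parent_id=md.op_id, edge_type="NEXT")


def push_down(md: XTraceMetadata) -> XTraceMetadata:
    """Hand the metadata to the protocol layer below (e.g. HTTP to TCP)."""
    return XTraceMetadata(md.task_id, _new_id(), parent_id=md.op_id, edge_type="DOWN")


# Usage: an HTTP request that is carried over TCP and forwarded by a proxy.
http_client = start_task()
tcp_client = push_down(http_client)
http_proxy = push_next(http_client)
```

Every derived record keeps the same TaskID and points back to its parent's OpID; that parent link is what later lets the task tree be rebuilt from reports.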

Task Tree
- X-Trace tags all network operations resulting from a particular task with the same task identifier
- The task tree is the set of network operations connected with an initial task
- The task tree can be reconstructed after the trace data has been collected in reports

An example of the task tree: a simple HTTP request through a proxy
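Reconstructing the task tree amounts to grouping reports by task identifier and linking each operation to its parent. A minimal Python sketch, assuming a hypothetical report record with `task_id`, `op_id`, `parent_id`, and a human-readable `label` (the real X-Trace report format is not shown on the slides):

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Report:
    # Hypothetical report record: one per network operation.
    task_id: int
    op_id: int
    parent_id: Optional[int]  # None for the operation that started the task
    label: str


def build_task_tree(reports: List[Report], task_id: int
                    ) -> Tuple[List[int], Dict[int, List[int]], Dict[int, Report]]:
    """Select one task's reports and link each operation to its parent."""
    ops = {r.op_id: r for r in reports if r.task_id == task_id}
    children: Dict[int, List[int]] = defaultdict(list)
    roots = []
    for r in ops.values():
        if r.parent_id is None or r.parent_id not in ops:
            # The initial operation, or an orphan whose parent report was lost.
            roots.append(r.op_id)
        else:
            children[r.parent_id].append(r.op_id)
    return roots, children, ops


def render(op_id: int, children: Dict[int, List[int]], ops: Dict[int, Report],
           depth: int = 0) -> List[str]:
    """Indented text rendering of the subtree rooted at op_id."""
    lines = ["  " * depth + ops[op_id].label]
    for child in sorted(children.get(op_id, [])):
        lines.extend(render(child, children, ops, depth + 1))
    return lines


reports = [
    Report(7, 1, None, "HTTP client"),
    Report(7, 2, 1, "HTTP proxy"),
    Report(7, 3, 2, "HTTP server"),
    Report(7, 4, 1, "TCP connection 1"),
    Report(9, 5, None, "another task"),
]
roots, children, ops = build_task_tree(reports, task_id=7)
print("\n".join(render(roots[0], children, ops)))
```

Treating an operation whose parent report is missing as an extra root is one simple way to tolerate report loss, which the X-Trace discussion slide lists as an open issue.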

X-Trace Components
- Data: X-Trace metadata, network path, task tree
- Report: reconstruct the task tree

Propagation of X-Trace Metadata: the propagation of X-Trace metadata through the task tree

The X-Trace Metadata
Field        Usage
Flags        Bits that specify which of the three optional components are present
TaskID       A unique integer ID
TreeInfo     ParentID, OpID, EdgeType
Destination  The address to which X-Trace reports should be sent
Options      A mechanism to accommodate future extensions
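The Flags field makes the metadata self-describing: a receiver reads the bits to learn which optional components follow the TaskID. A minimal Python sketch of that idea; the bit positions (`FLAG_TREEINFO`, `FLAG_DESTINATION`, `FLAG_OPTIONS`) and the field types are assumptions for illustration, not X-Trace's wire format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed bit assignments for the three optional components.
FLAG_TREEINFO = 0x1
FLAG_DESTINATION = 0x2
FLAG_OPTIONS = 0x4
_NAMES = ((FLAG_TREEINFO, "TreeInfo"),
          (FLAG_DESTINATION, "Destination"),
          (FLAG_OPTIONS, "Options"))


@dataclass
class Metadata:
    task_id: int                                        # always present
    tree_info: Optional[Tuple[int, int, str]] = None    # (ParentID, OpID, EdgeType)
    destination: Optional[str] = None                   # where reports are sent
    options: List[bytes] = field(default_factory=list)  # future extensions

    def flags(self) -> int:
        """Set one bit for each optional component that is present."""
        f = 0
        if self.tree_info is not None:
            f |= FLAG_TREEINFO
        if self.destination is not None:
            f |= FLAG_DESTINATION
        if self.options:
            f |= FLAG_OPTIONS
        return f


def present_components(flags: int) -> List[str]:
    """Tell a receiver which optional components to expect after the TaskID."""
    return [name for bit, name in _NAMES if flags & bit]


md = Metadata(task_id=42, tree_info=(1, 2, "NEXT"))
print(hex(md.flags()), present_components(md.flags()))
```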

Operation of X-Trace Metadata

X-Trace Report Architecture

Usage Scenario (1): web request and recursive DNS queries

Usage Scenario (2): a request fault annotated with user input

Usage Scenario (3): a client and a server communicate over the I3 overlay network

Usage Scenario (3): Internet Indirection Infrastructure (I3)

Usage Scenario (3): tree for normal operation

Usage Scenario (3): the receiver host fails

Usage Scenario (3): middlebox process crash

Usage Scenario (3): the middlebox host fails

Discussion
- Report loss
- Non-tree request structures
- Partial deployment
- Managing report traffic
- Security considerations

WiDS Checker: Combating Bugs in Distributed Systems
X. Liu et al., NSDI '07

Problem Description
- Log mining is both labor-intensive and fragile
- Latent bugs are often distributed across multiple nodes
- Logs capture only incomplete information about an execution
- Distributed applications are non-deterministic

Goals
- Efficiently verify application properties
- Provide fairly complete information about an execution
- Reproduce buggy runs deterministically and faithfully

Approach
- Log the actual execution of a distributed system
- Apply predicate checking in a centralized simulator over a run that is either driven by testing scripts or replayed from logs
- Output a violation report along with message traces
- An execution is interpreted as a sequence of events, which are dispatched to the corresponding handling routines

Components
- A versatile scripting language that allows a developer to refine system properties into straightforward assertions
- A checker that inspects the run for violations

Architecture: components of WiDS Checker
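The approach treats an execution as a sequence of events dispatched to handlers, with predicates checked centrally after each event. A minimal Python sketch of that loop, assuming a toy global state (a dict of node roles) and a hypothetical "single leader" invariant; WiDS Checker itself uses its own scripting language and replayed process state, which this does not reproduce.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    node: str  # node whose handler runs
    kind: str  # selects the handling routine


State = Dict[str, str]


class Simulator:
    """Replay an event sequence in one process, checking invariants after each event."""

    def __init__(self, handlers: Dict[str, Callable[[State, Event], None]],
                 invariants: Dict[str, Callable[[State], bool]]):
        self.state: State = {}  # combined view of every node's state
        self.handlers = handlers
        self.invariants = invariants
        self.trace: List[Event] = []

    def run(self, events: List[Event]) -> List[Tuple[str, List[Event]]]:
        violations = []
        for ev in events:
            self.handlers[ev.kind](self.state, ev)  # dispatch to handling routine
            self.trace.append(ev)
            for name, pred in self.invariants.items():
                if not pred(self.state):
                    # Report the violated invariant with the events that led to it.
                    violations.append((name, list(self.trace)))
        return violations


def on_elect(state: State, ev: Event) -> None:
    state[ev.node] = "leader"


def on_step_down(state: State, ev: Event) -> None:
    state[ev.node] = "follower"


def single_leader(state: State) -> bool:
    return sum(role == "leader" for role in state.values()) <= 1


sim = Simulator({"elect": on_elect, "step_down": on_step_down},
                {"single_leader": single_leader})
log = [Event("A", "elect"), Event("A", "step_down"),
       Event("B", "elect"), Event("C", "elect")]
for name, trace in sim.run(log):
    print(name, "violated after", [(e.node, e.kind) for e in trace])
```

Because every node's state lives in one process, a global invariant that spans several nodes can be checked directly, which is hard to do by mining separate per-node logs.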

Architecture
- Reproduce real runs: log all non-deterministic events using Lamport's logical clock
- Check user-defined predicates: a versatile scripting language specifies the system states being observed and the predicates for invariants and correctness
- Screen out false alarms with auxiliary information (for liveness properties)
- Trace root causes using a visualization tool

Programming with WiDS
- WiDS APIs are mostly member functions of the WiDSObject class
- The WiDS runtime maintains an event queue to buffer pending events and dispatches them to the corresponding handling routines

Enabling Replay
- Logging: log all WiDS nondeterminism; redirect OS calls and log their results; embed a Lamport clock in each outgoing message
- Checkpoint: support partial replay by saving the WiDS process context
- Replay: start from the beginning or from a checkpoint; replay events in serialized Lamport order
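Replay relies on Lamport clocks: each outgoing message carries the sender's clock, a receiver advances its own clock past that value, and the merged logs are replayed in timestamp order so every receive comes after its send. A minimal Python sketch of the clock rule and the serialized ordering, assuming in-memory per-node logs and ties broken by node name (a common convention, not necessarily what WiDS does):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

LogEntry = Tuple[int, str, str]  # (Lamport timestamp, node name, description)


@dataclass
class Node:
    name: str
    clock: int = 0
    log: List[LogEntry] = field(default_factory=list)

    def local_event(self, what: str) -> None:
        self.clock += 1
        self.log.append((self.clock, self.name, what))

    def send(self, what: str) -> int:
        """Tick, log the send, and return the timestamp embedded in the message."""
        self.clock += 1
        self.log.append((self.clock, self.name, "send " + what))
        return self.clock

    def receive(self, what: str, msg_clock: int) -> None:
        """Lamport rule: advance past the sender's timestamp before logging."""
        self.clock = max(self.clock, msg_clock) + 1
        self.log.append((self.clock, self.name, "recv " + what))


def replay_order(*nodes: Node) -> List[LogEntry]:
    """Merge per-node logs into one serialized order (ties broken by node name)."""
    merged = [entry for node in nodes for entry in node.log]
    return sorted(merged, key=lambda e: (e[0], e[1]))


a, b = Node("A"), Node("B")
b.local_event("timer fires")  # B: clock 1
ts = a.send("ping")           # A: clock 1, timestamp carried by the message
b.receive("ping", ts)         # B: max(1, 1) + 1 = 2
a.local_event("write")        # A: clock 2
for entry in replay_order(a, b):
    print(entry)
```

Any total order consistent with the timestamps respects causality, so a deterministic tie-breaker is enough to make the serialized replay reproducible.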

Checker
- Observe memory state: define states and evaluate predicates
- Refresh the database for each event: maintain history and re-evaluate the predicates affected by the modification
- Auxiliary information for violations: liveness properties are only guaranteed to be true eventually
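The checker slide combines two ideas: refresh state after every event but re-evaluate only the predicates that could have changed, and treat a false liveness predicate as a candidate rather than a definite violation, since liveness holds only eventually. A minimal Python sketch under stated assumptions: each predicate declares the state keys it reads (a hypothetical mechanism), and a liveness predicate still false at the end of the run is reported with the step at which it first became false, standing in for WiDS Checker's auxiliary information.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Pred = Callable[[dict], bool]


class IncrementalChecker:
    """Re-evaluates only predicates whose watched state keys changed."""

    def __init__(self) -> None:
        self.state: dict = {}
        self.preds: Dict[str, Tuple[FrozenSet[str], Pred, bool]] = {}
        self.results: Dict[str, bool] = {}       # last value of each predicate
        self.failing_since: Dict[str, int] = {}  # liveness name -> first failing step
        self.safety_violations: List[Tuple[int, str]] = []
        self.evaluations = 0                     # count of predicate evaluations

    def add(self, name: str, keys: Iterable[str], pred: Pred, liveness: bool = False) -> None:
        self.preds[name] = (frozenset(keys), pred, liveness)

    def apply(self, step: int, updates: dict) -> None:
        """Apply one event's state updates, then re-check affected predicates."""
        self.state.update(updates)
        changed = set(updates)
        for name, (keys, pred, liveness) in self.preds.items():
            if name in self.results and not (keys & changed):
                continue  # nothing this predicate reads has changed
            self.evaluations += 1
            ok = pred(self.state)
            self.results[name] = ok
            if liveness:
                # A false liveness predicate is only a candidate violation.
                if ok:
                    self.failing_since.pop(name, None)
                else:
                    self.failing_since.setdefault(name, step)
            elif not ok:
                self.safety_violations.append((step, name))

    def liveness_violations(self) -> Dict[str, int]:
        """Liveness predicates still false at the end of the run, with the step
        at which each first became false (auxiliary info for diagnosis)."""
        return dict(self.failing_since)


chk = IncrementalChecker()
chk.add("one_leader", ["leaders"], lambda s: s.get("leaders", 0) <= 1)
chk.add("all_joined", ["joined"], lambda s: s.get("joined", 0) == 3, liveness=True)
chk.apply(1, {"joined": 1})   # first event: both predicates evaluated
chk.apply(2, {"joined": 2})   # only all_joined is re-evaluated
chk.apply(3, {"leaders": 1})  # only one_leader is re-evaluated
print(chk.evaluations, chk.liveness_violations(), chk.safety_violations)
```

Safety predicates are flagged immediately, while liveness predicates are judged only at the end of the run; this split is one way to screen out false alarms from properties that are temporarily false by design.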

Visualization Tools: message flow graph

Evaluation: benchmark and result summary

Performance: running time for evaluating predicates

Logging Overhead: percentage of logging time

Discussion
- The system is debugged by those who developed it
- Bugs are hunted by those who are intimately familiar with the system