netcheck: network diagnoses from blackbox traces · 2019. 12. 30. · netcheck: network diagnoses...

NetCheck: Network Diagnoses from Blackbox Traces Yanyan Zhuang *^ , Eleni Gessiou * , Fraida Fund * , Steven Portzer @ , Monzur Muhammad ^ , Ivan Beschastnikh ^ , Justin Cappos * (*) New York University, (^) University of British Columbia, (@) University of Washington


Post on 09-Mar-2021


TRANSCRIPT

Page 1: NetCheck: Network Diagnoses from Blackbox Traces · 2019. 12. 30. · NetCheck: Network Diagnoses from Blackbox Traces Yanyan Zhuang *^, Eleni Gessiou*, Fraida Fund*, Steven Portzer@,

NetCheck: Network Diagnoses from Blackbox Traces

Yanyan Zhuang*^, Eleni Gessiou*, Fraida Fund*, Steven Portzer@, Monzur Muhammad^, Ivan Beschastnikh^, Justin Cappos*

(*) New York University, (^) University of British Columbia, (@) University of Washington

Page 2: Goal

• Find bugs in networked applications
• Large, complex, unknown applications
• Large, complex, unknown networks
• Understandable output / fix

Page 3: Motivation

[Diagram: Chrome client and Apache server]

Page 4: Motivation

[Diagram: Chrome client and Apache server; probing with ping]

Page 5: Motivation

[Diagram: Chrome client and Apache server; probing with ping]

Different traffic (ICMP), often a different result

Page 6: Motivation

[Diagram: Chrome client and Apache server]

Page 7: Motivation

[Diagram: Chrome client and Apache server; packet capture]

Page 8: Motivation

[Diagram: Chrome client and Apache server; packet capture]

Requires detailed protocol / app knowledge

Page 9: Motivation

[Diagram: Chrome client and Apache server]

Page 10: Motivation

[Diagram: Chrome client and Apache server, each with a "Model" box]

Model apps: Magpie, Xtrace, Pip...

Page 11: Motivation

[Diagram: Chrome client and Apache server, each with a "Model" box]

Model apps: Magpie, Xtrace, Pip...

Need a model per application

Page 12: Motivation

[Diagram: Chrome client and Apache server]

Page 13: Motivation

[Diagram: Chrome client, Apache server, and four "Model & Config" boxes]

Network Config Analysis: Header Space Analysis, etc.

Page 14: Motivation

[Diagram: Chrome client, Apache server, and four "Model & Config" boxes]

Network Config Analysis

Need detailed network knowledge (HW + config)

Page 15: Motivation

[Diagram: Chrome client, Apache server, and a question mark]

Page 16: NetCheck

[Diagram: Chrome client and Apache server, each with a programmer]

Page 17: NetCheck

[Diagram: Chrome client and Apache server, each with a programmer]

Page 18: NetCheck

[Diagram: Chrome client and Apache server, each with a programmer]

Model: programmer’s understanding
Deutsch’s Fallacies

Page 19: Outline

• Motivation
• NetCheck Overview
• Trace Ordering
• Network Model
• Fault Classification
• Results / Conclusion

Page 20: NetCheck overview

[Diagram: Application (Fail), Traces, NetCheck, Likely Faults]

Page 21: NetCheck overview

[Diagram: Application (Fail), Traces, NetCheck, Likely Faults; traces labeled "ktrace, strace"]

Page 22: NetCheck overview

[Diagram: Application, Traces, NetCheck, Likely Faults]

[Diagram: inside NetCheck. Input: Host Traces. Components: Ordering Algorithm, Network Model, Diagnoses Engine. Output: Diagnosis. Edge labels: syscall, simulation result, simulation state, errors]

Page 23: NetCheck overview

[Diagram: Application, Traces, NetCheck, Likely Faults]

[Sample output sections: Network Configuration Issues, Traffic Statistics, Problem Detected]

Page 24: Outline

• Motivation
• NetCheck Overview
• Trace Ordering
• Network Model
• Fault Classification
• Results / Conclusion

[Diagram labels: Traces; (a) Trace Ordering]

Page 25: Traces

Series of locally ordered system calls
Don’t want to modify apps or use a global clock
Gathered by strace, ktrace, systrace, truss, etc.
Call arguments and “return values”

socket() = 3
bind(3, …) = 0
listen(3, 1) = 0
accept(3, …) = 4
recv(4, "HTTP", …) = 4
close(4) = 0

[Annotations on the trace: call arguments, return values, return buffer]
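
A trace in this shape can be turned into structured records with a short parser. This is a minimal sketch, assuming the simplified `name(args) = retval` layout shown on the slide (real strace or ktrace output carries more detail); `CALL_RE` and `parse_call` are hypothetical names, not part of NetCheck.

```python
import re

# Assumed line layout, modeled on the slide: name(args) = retval [ERRNO]
CALL_RE = re.compile(
    r'^(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>-?\d+)(?:,?\s+(?P<errno>[A-Z0-9]+))?$'
)

def parse_call(line):
    """Split one trace line into call name, raw argument string, and return value."""
    match = CALL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized trace line: {line!r}")
    return {
        "name": match.group("name"),
        "args": match.group("args"),
        "ret": int(match.group("ret")),
        "errno": match.group("errno"),  # None when no error name follows
    }

# The server-side trace from the slide, with "..." standing in for elided arguments.
trace = [
    'socket() = 3',
    'bind(3, ...) = 0',
    'listen(3, 1) = 0',
    'accept(3, ...) = 4',
    'recv(4, "HTTP", ...) = 4',
    'close(4) = 0',
]
calls = [parse_call(line) for line in trace]
```

Keeping the argument string raw is a deliberate simplification: a fuller parser would split arguments per call type, since buffers and addresses need different handling.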

Page 26: What we see is this:

Node A                          Node B
1. socket() = 3                 1. socket() = 3
2. bind(3, ...) = 0             2. connect(3, ...) = 0
3. listen(3, 1) = 0             3. send(3, "Hello", ...) = 5
4. accept(3, ...) = 4           4. close(3) = 0
5. recv(4, "Hello", ...) = 5
6. close(4) = 0

- one trace per host
- local order but no global order
Q: how do we reconstruct what really happened?

Page 27: What we want is this

The ground truth (calls of A and B interleaved):

A1. socket() = 3
B1. socket() = 3
A2. bind(3, ...) = 0
A3. listen(3, 1) = 0
B2. connect(3, ...) = 0
A4. accept(3, ...) = 4
B3. send(3, "Hello", ...) = 5
A5. recv(4, "Hello", ...) = 5
B4. close(3) = 0
A6. close(4) = 0

Page 28: What we want is this

The ground truth (same interleaving as on page 27)

Goal: find an equivalent interleaving

Page 29: Observation 1: Order Equivalence

(Same Node A / Node B traces as on page 26.)

- one trace per host
- local order but no global order
Q: how do we reconstruct what really happened?
The socket() calls are not visible to the other side
Some orders are equivalent!

Page 30: Observation 2: Return Values Guide Ordering

(Same Node A / Node B traces as on page 26.)

- one trace per host
- local order but no global order
Q: how do we reconstruct what really happened?

Page 31: Return values guide ordering

One valid ordering: all syscalls returned successfully.

A2. bind(3, ...) = 0
A3. listen(3, 1) = 0
B2. connect(3, ...) = 0

A second valid ordering: connect failed with ECONNREFUSED.

A2. bind(3, ...) = 0
B2. connect(3, ...) = -1, ECONNREFUSED
A3. listen(3, 1) = 0

A call’s return value may-depend-on a remote call’s action
Result indicates order of calls
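
The way a return value constrains the order can be checked mechanically. The following is a minimal sketch, assuming a single listening socket on host A and a single client on host B; `is_consistent` and the `(host, call, return value)` tuple layout are hypothetical and only illustrate the may-depend-on idea for connect and listen.

```python
def is_consistent(ordering):
    """Return True if every connect result matches the listener state at that point.

    Assumption: connect returns 0 only after a successful listen on the peer,
    and -1 (ECONNREFUSED) when no listener exists yet.
    """
    listening = False
    for host, call, ret in ordering:
        if call == "listen" and ret == 0:
            listening = True
        elif call == "connect":
            expected = 0 if listening else -1
            if ret != expected:
                return False
    return True

# The two valid orderings from the slide.
all_succeeded = [("A", "bind", 0), ("A", "listen", 0), ("B", "connect", 0)]
refused = [("A", "bind", 0), ("B", "connect", -1), ("A", "listen", 0)]

# An ordering that contradicts the traced return value of connect.
impossible = [("A", "bind", 0), ("B", "connect", 0), ("A", "listen", 0)]

results = [is_consistent(o) for o in (all_succeeded, refused, impossible)]
```

A successful connect can only be placed after listen, and a refused one only before it, so the traced result is enough to pick between the two interleavings.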

Page 32: Deciding call order

[Graph: full set of may-depend-on relations. Nodes: socket; bind; listen; connect; accept; getsockopt, setsockopt; getsockname; getpeername; poll, select; recv, recvfrom, recvmsg, read; send, sendto, sendmsg, write, writev, sendfile; close, shutdown]

Page 33: Ordering Algorithm

[Diagram: input traces for hosts A and B, the algorithm process, and the output ordering]
Input traces: socket, bind, listen, accept, recv (A); socket, connect, send (B)

Page 34: Ordering Algorithm

Try socket on host A: accepted
Output ordering: socket (A)

Page 35: Ordering Algorithm

Try connect on host B: rejected
Remaining input: listen, accept, recv (A); connect, send (B)
Output ordering: socket (A), socket (B), bind (A)

Page 36: Ordering Algorithm

Try listen on host A: accepted
Remaining input: accept, recv (A); connect, send (B)
Output ordering: socket (A), socket (B), bind (A), listen (A)

Page 37: Ordering Algorithm

Try recv on host A: rejected
Remaining input: recv (A); send (B)
Output ordering: socket (A), socket (B), bind (A), listen (A), connect (B), accept (A)
TCP buffer: “”; recv: “Hola!”

Page 38: Ordering Algorithm

Try send on host B: accepted
Remaining input: recv (A); None (B)
Output ordering: socket (A), socket (B), bind (A), listen (A), connect (B), accept (A), send (B)
TCP buffer: “”; recv: “Hola!”

Page 39: Ordering Algorithm

Try send on host B: accepted
Remaining input: recv (A); None (B)
Output ordering: socket (A), socket (B), bind (A), listen (A), connect (B), accept (A), send (B)
TCP buffer: “Hello”; recv: “Hola!”

Page 40: Ordering Algorithm

Try recv on host A: Fatal Error
Remaining input: recv (A); None (B)
Output ordering: socket (A), socket (B), bind (A), listen (A), connect (B), accept (A), send (B)
TCP buffer: “Hello”; recv: “Hola!”

Page 41: Outline

• Motivation
• NetCheck Overview
• Trace Ordering
• Network Model
• Fault Classification
• Results / Conclusion

[Diagram: Model, with outcomes Accept, Reject, Fatal Error]

Page 42: Network Model

● Simulates invocation of a syscall
  ○ datagrams sent/lost
  ○ reordering / duplication is notable
  ○ track pending connections
  ○ buffer lengths and contents
  ○ send -> put data into buffer
  ○ recv -> pop data from buffer

● Simulation outcome
  ○ Accept → can process (correct buffer)
  ○ Reject → wrong order (incomplete buffer)
  ○ Permanent reject → abnormal behavior (incorrect buffer)

[Diagram: Model, with outcomes Accept, Reject, Fatal Error]
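
The buffer handling and the three simulation outcomes can be illustrated with a small model of one direction of a TCP stream. This is a sketch under stated assumptions, not NetCheck's model: `Outcome` and `StreamModel` are hypothetical names, and the rule that an empty or partial buffer means "reject" while a mismatching buffer means "permanent reject" is one reading of the bullets above.

```python
from enum import Enum

class Outcome(Enum):
    ACCEPT = "accept"                      # call can be processed now
    REJECT = "reject"                      # wrong order: retry after other calls
    PERMANENT_REJECT = "permanent reject"  # no ordering can make this call valid

class StreamModel:
    """One direction of a simulated TCP connection (hypothetical helper)."""

    def __init__(self):
        self.buffer = b""

    def send(self, data):
        # send -> put data into buffer
        self.buffer += data
        return Outcome.ACCEPT

    def recv(self, data):
        # recv -> pop data from buffer, if the traced bytes are at the front
        if self.buffer.startswith(data):
            self.buffer = self.buffer[len(data):]
            return Outcome.ACCEPT
        # Incomplete buffer: the traced bytes may still arrive from a later send.
        if data.startswith(self.buffer):
            return Outcome.REJECT
        # Incorrect buffer: the contents already contradict the traced bytes.
        return Outcome.PERMANENT_REJECT

# Matching data is accepted once the send has been simulated.
ok = StreamModel()
early = ok.recv(b"Hello")        # buffer still empty
ok.send(b"Hello")
late = ok.recv(b"Hello")

# Mismatching data ("Hola!" vs "Hello") can never be accepted.
bad = StreamModel()
bad.send(b"Hello")
mismatch = bad.recv(b"Hola!")
```

Rejected calls leave the model state untouched, which lets the ordering algorithm retry them later without undoing anything.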

Page 43: Network Model

● Simulates invocation of a syscall
● Capture programmer assumptions
● Assumes a simplified network view
  • Assume transitive connectivity
  • Little, random loss
  • No middle boxes
  • Assume uniform platform
  • Flag OS differences

Page 44: How Model Return Values Impact Trace Ordering

● Blackbox Tracing mechanism

Trace ordering: linear running time, (total trace length) * (number of traces)

[Diagram: inside NetCheck. Input: Host Traces. Components: Ordering Algorithm, Network Model, Diagnoses Engine. Output: Diagnosis. Edge labels: syscall, simulation result, simulation state, errors]

Page 45: Outline

• Motivation
• NetCheck Overview
• Trace Ordering
• Network Model
• Fault Classification
• Results / Conclusion

[Diagram labels: (c) Fault Classifier; Output]

Page 46: Fault Classifier

● Goal: Decide what to output
● Problem: Show relevant information
● Fault classifier: global (rather than local) view
  ○ uncovers high-level patterns by extracting low-level features
  ○ Examples: middleboxes, non-transitive connectivity, MTU, mobility, network disconnection
  ○ All look like loss, but have different patterns in the context of other flows

Page 47: Fault Classifier

● Options to show different levels of detail
● Network admins / developers
  ● detailed info
● End users
  ● Classification
  ● Recommendations

[Sample output sections: Network Configuration Issues, Traffic Statistics, Problem Detected]

Page 48: Outline

• Motivation
• NetCheck Overview
• Trace Ordering
• Network Model
• Fault Classification
• Results / Conclusion

Page 49: Evaluation: Production Application Bugs

● Reproduce reported bugs from bug trackers (Python, Apache, Ruby, Firefox, etc.)
  ○ A total of 71 bugs
  ○ Grouped into 23 categories
    ■ Virtualization incurred/portability bugs
    ■ SO_REUSEADDR behaves differently across OSes
    ■ accept inherits O_NONBLOCK
    ■ …
  ○ Correct analysis of >95% of bugs

Page 50: Evaluation: Observed Network Faults

● Twenty faults observed in practice on a live network
  ○ MTU bug
    ■ Intermediary device
  ○ Port forward
    ■ Traffic sent to non-relevant addresses
  ○ Provide supplemental info
    ■ packet loss
    ■ buffers closed with data still in them
  ○ 90% of cases correctly detected

Page 51: General Findings in Practice

● Middle boxes
  ○ Multiple unaccepted connections
    ■ client behind NAT in FTP
● TCP/UDP
    ■ non-transitive connectivity in VLC
● Complex failures
  ○ VirtualBox: send data larger than buffer size
  ○ Pidgin: returned IP different from bind
  ○ Skype: NAT + close socket from a different thread
● Used on Seattle Testbed (seattle.poly.edu)

Page 52: NetCheck Performance Overhead

[Chart: performance overhead for Firefox, Skype, Telnet, SSH, VLC]

Page 53: Conclusion

Built and evaluated NetCheck, a tool to diagnose network failures in complex apps

● Key insights:
  ○ model the programmer’s misconceptions
  ○ relation between calls → reconstruct order

● NetCheck is effective
  ○ Everyday applications & networks
  ○ Real network / application bugs
  ○ No per-network knowledge
  ○ No per-application knowledge

Try it here: https://netcheck.poly.edu/

Page 54: Backup slides.

Page 55: What is NetCheck?

  ○ No app- or network-specific knowledge
  ○ No modification to apps/infrastructure
  ○ No synchronized global clock

● Blackbox Tracing mechanism (e.g., strace)
  ○ Reconstruct a plausible total ordering of syscall traces from multiple hosts
  ○ Uses simulation and captured state to identify network related issues
  ○ Map low-level issues to higher-level characterizations of failure

Page 56: Diagnosis Model

● Blackbox Tracing mechanism

[Diagram: Traces; Trace Ordering; Application-Agnostic Model; Collating Fault Classifier. Edge label: call dependency]

Page 57: Diagnosis Model

● Blackbox Tracing mechanism

[Diagram: Traces; Trace Ordering; Application-Agnostic Model; Collating Fault Classifier. Edge labels: call dependency; accept/reject/FE]

Page 58: Diagnosis Model

● Blackbox Tracing mechanism

[Diagram: Traces; Trace Ordering; Application-Agnostic Model; Collating Fault Classifier. Edge labels: call dependency; accept/reject/FE; reject → reorder]

Trace Ordering: linear running time

Page 59: Pseudocode and Analysis

push trace t_0 in stack s_0, …, trace t_{n-1} in stack s_{n-1}
while (s_0, …, s_{n-1}) not empty:
    q = peek_stack(s_0, …, s_{n-1}); q.sort(priority)
    while True:
        if q empty: raise FatalError
        i_j = q.dequeue()
        outcome = model_simulate(i_j)
        if outcome == ACCEPT:
            ordered_trace.push(s_j.pop()); break
        elif outcome == REJECT: pass
        elif outcome == FatalError: raise FatalError

Outer while loop: O(L)
Inner while loop: best case O(1), worst case O(n)
Overall: best case O(L), worst case O(n*L)
(L = total trace length, n = number of traces)

Page 60: Pseudocode and Analysis

push trace t_0 in list s_0, …, trace t_{n-1} in list s_{n-1}
while (s_0, …, s_{n-1}) not empty:
    q = peek_stack(s_0, …, s_{n-1}); q.sort(priority)
    while True:
        if q empty: raise FatalError
        i_j = q.dequeue()
        outcome = model_simulate(i_j)
        if outcome == ACCEPT:
            ordered_trace.push(s_j.pop()); break
        elif outcome == REJECT: continue
        elif outcome == FatalError: raise FatalError

Accept → Traverse
Reject → Backtrack
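
The pseudocode can be made runnable with a toy model. The sketch below is an illustration under stated assumptions rather than NetCheck's implementation: `ToyModel` is a hypothetical model for one listener and one client-to-server byte stream, every traced call is assumed to have succeeded, and the priority is simply the host name because the slides do not say how priority is computed.

```python
from collections import deque

ACCEPT, REJECT, FATAL = "accept", "reject", "fatal"

class FatalError(Exception):
    """No ordering of the remaining calls is consistent with the model."""

class ToyModel:
    """Hypothetical model: one listener, one byte stream from client to server."""

    def __init__(self):
        self.listening = False
        self.pending = 0   # connections made but not yet accepted
        self.buffer = ""   # data sent but not yet received

    def simulate(self, call):
        name, data = call
        if name == "listen":
            self.listening = True
        elif name == "connect":
            if not self.listening:
                return REJECT      # a successful connect needs a listener first
            self.pending += 1
        elif name == "accept":
            if self.pending == 0:
                return REJECT      # nothing to accept yet
            self.pending -= 1
        elif name == "send":
            self.buffer += data
        elif name == "recv":
            if self.buffer.startswith(data):
                self.buffer = self.buffer[len(data):]
            elif data.startswith(self.buffer):
                return REJECT      # incomplete buffer: try another host first
            else:
                return FATAL       # buffer contents can never match
        return ACCEPT              # socket, bind, close are purely local here

def order_traces(traces, model):
    """Merge per-host traces into one global order accepted by the model."""
    stacks = {host: deque(calls) for host, calls in traces.items()}
    ordered = []
    while any(stacks.values()):
        # Peek at the head of every non-empty trace, in priority order.
        for host in sorted(h for h, s in stacks.items() if s):
            outcome = model.simulate(stacks[host][0])
            if outcome == ACCEPT:
                ordered.append((host, stacks[host].popleft()))
                break
            if outcome == FATAL:
                raise FatalError(f"{host}: {stacks[host][0]}")
        else:
            raise FatalError("every pending call was rejected")
    return ordered

traces = {
    "A": [("socket", ""), ("bind", ""), ("listen", ""), ("accept", ""),
          ("recv", "Hello"), ("close", "")],
    "B": [("socket", ""), ("connect", ""), ("send", "Hello"), ("close", "")],
}
ordering = order_traces(traces, ToyModel())
labels = [f"{host}.{call[0]}" for host, call in ordering]
```

With this priority the result differs from the interleaving drawn on the earlier slides but is equivalent to it: each host's local order is kept, connect follows listen, and recv follows send. Each outer iteration consumes one call and tries at most one head per host, which matches the O(n*L) worst case stated above.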

Page 61: NetCheck input

Node A                          Node B
1. socket() = 3                 1. socket() = 3
2. bind(3, ...) = 0             2. connect(3, ...) = 0
3. listen(3, 1) = 0             3. send(3, "Hello", ...) = 5
4. accept(3, ...) = 4           4. close(3) = 0
5. recv(4, "Hello", ...) = 5
6. close(4) = 0

[Annotation: Syscall]

Page 62: NetCheck input

(Same Node A / Node B traces as on page 61.)

[Annotation: Syscall]

Page 63: connect depends on listen

Order 1
A1 bind(3, ...) = 0
A2 listen(3, 5) = 0
B1 connect(3, ...) = 0

Order 2
A1 bind(3, ...) = 0
B1 connect(3, ...) = -1 ECONNREFUSED
A2 listen(3, 5) = 0

Order 3
B1 connect(3, ...) = -1 ECONNREFUSED
A1 bind(3, ...) = 0
A2 listen(3, 5) = 0

Page 64: Example Rules

● Middle boxes
  ○ Multiple unaccepted connections ⇒ client behind NAT in FTP
  ○ Missing connect on accepted connections → server behind NAT or port forwarding
  ○ Multiple connect non-standard failure → firewall filtering connections
  ○ Multiple connect to listening address get refused
  ○ Multiple non-blocking connect failure
  ○ Traffic sent to non-relevant addresses → NAT or 3rd party proxy/traffic forwarding

Page 65: Example fault classifier rules

● Middle boxes
  ○ Multiple unaccepted connections ⇒ client behind NAT in FTP
  ○ Missing connect on accepted connections → server behind NAT or port forwarding
  ○ Traffic sent to non-relevant addresses → NAT or 3rd party proxy/traffic forwarding

● TCP
  ○ select/poll timeout
  ○ send data after connection closed
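
Rules of this kind map low-level features of the ordered trace to a higher-level diagnosis. The following is a minimal sketch, assuming the features have already been counted; the feature names, the thresholds, and the `classify` function are hypothetical, and only the rule texts come from the slides.

```python
def classify(features):
    """Map counted low-level features to the diagnoses named in the example rules."""
    diagnoses = []
    # "Multiple unaccepted connections" (threshold of more than one is an assumption)
    if features.get("unaccepted_connections", 0) > 1:
        diagnoses.append("client behind NAT")
    # "Missing connect on accepted connections"
    if features.get("accepts_without_connect", 0) > 0:
        diagnoses.append("server behind NAT or port forwarding")
    # "Traffic sent to non-relevant addresses"
    if features.get("traffic_to_unrelated_addresses", 0) > 0:
        diagnoses.append("NAT or 3rd party proxy/traffic forwarding")
    return diagnoses

# Hypothetical feature counts for an FTP session with lost reverse connections.
ftp_features = {"unaccepted_connections": 3}
ftp_diagnoses = classify(ftp_features)
```

Because the rules look at counts across all flows rather than at a single failed call, they reflect the global view described for the fault classifier.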

Page 66: Example rules (cont.)

● UDP
  ○ datagram sent/lost per connection
  ○ high datagram loss rate ⇒ non-transitive connectivity in VLC

● Misc
  ○ apps send data larger than default OS buffer size ⇒ bug report from VirtualBox bug tracker
  ○ returned IP different from bind ⇒ simultaneous net disconnect/reconnect in Pidgin
  ○ Skype attempted to close socket from a different thread

Page 67: Evaluation: Everyday Applications

● FTP
  ○ All reverse connections from server lost
    ■ Client behind NAT
● Pidgin
  ○ getsockname returns different IP
    ■ Client poor connection results in IP changes
● Skype
  ○ Poor call quality, msg drop
    ■ Network delay, NAT
    ■ Skype closes socket from different thread
● VLC
  ○ Packet loss
    ■ Non-transitive connectivity issue