Frequency Analysis of Protocols
Dr. Craig Partridge
BBN Technologies
An Emerging Field
Using techniques from signal processing to better understand networks and protocols
A quick tour of the work done to date
Along with some highly speculative thoughts about what might come next
An Overview of the Basic Concepts
Please note, I’m a systems person, not a mathematician. This talk is structured for an intuitive understanding…
… although I’ll try to be rigorous where necessary
Step 1: Capture Packet Traces
Place taps or measuring devices in various spots in the network
For each transmission seen, capture time, direction, duration, and other stuff as desired
[Figure: a network with four taps]
Step 2: Trace to Signal
Trace is a discrete time series (time + data in non-uniform time increments)
Signal processing wants a time/amplitude series (often a uniform series)
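The conversion from a trace to a uniform series can be sketched as below. This is a minimal illustration, not the tooling used in the work surveyed here: the function name `trace_to_signal`, the bin width, and the sample timestamps are all assumptions made for the example.

```python
from math import floor

def trace_to_signal(timestamps, bin_width, start=None, end=None):
    """Count events per fixed-width time bin (hypothetical helper)."""
    if not timestamps:
        return []
    t0 = min(timestamps) if start is None else start
    t1 = max(timestamps) if end is None else end
    n_bins = int(floor((t1 - t0) / bin_width)) + 1
    signal = [0] * n_bins
    for t in timestamps:
        idx = int(floor((t - t0) / bin_width))
        if 0 <= idx < n_bins:  # ignore events outside [start, end]
            signal[idx] += 1
    return signal

# Illustrative packet arrival times in milliseconds (made-up data).
trace_ms = [0.4, 1.1, 1.3, 4.2]
counts = trace_to_signal(trace_ms, bin_width=1.0, start=0.0)
# counts == [1, 2, 0, 0, 1]

# A 0/1 presence signal: 1 if any packet fell in the bin.
presence = [1 if c else 0 for c in counts]
```

The choice between a count series and a 0/1 series depends on the algorithm run afterwards, which is the point the next slide makes.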
Step 3: Run Feature Detection Algorithms over Signal
The meat of the task… Indeed, the signal representation you choose is largely dictated by the algorithm you wish to run
Various algorithms extract various types of information
Rest of the talk is a survey of what has been done
USC DDoS Attack
How many sources are attacking you?
Capture attack packets
Convert to a uniform series: x(t) = # of attack packets received in millisecond t
Condition the signal
  Subtract the mean of x(t) out
  This removes the otherwise dominant zero-frequency (DC) component
DDoS Continued
Now do auto-correlation and compute spectral density
  Basically looking for frequency variations in the attack stream over time
  A uniform source would show a single stable set of frequencies
Spectral density: a spectrum where you show the power at each frequency
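The conditioning, auto-correlation, and spectral density steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the USC analysis itself: the helper names, the plain O(n²) DFT, and the synthetic single-source signal (one packet every 8 ms) are all made up for the example.

```python
import cmath
import math

def remove_mean(x):
    """Subtract the mean so the zero-frequency term does not dominate."""
    m = sum(x) / len(x)
    return [v - m for v in x]

def autocorrelation(x, max_lag):
    """Biased sample autocorrelation, scaled so that lag 0 equals 1."""
    n = len(x)
    r0 = sum(v * v for v in x)
    if r0 == 0:
        return [0.0] * (max_lag + 1)
    return [sum(x[t] * x[t + k] for t in range(n - k)) / r0
            for k in range(max_lag + 1)]

def power_spectrum(x, sample_rate):
    """Periodogram: (frequency in Hz, power) for DFT bins 0..n//2."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append((k * sample_rate / n, abs(s) ** 2 / n))
    return out

# Synthetic example: a single source sending one packet every 8 ms,
# binned at 1 ms, so the sample rate is 1000 Hz.
x = remove_mean([1 if t % 8 == 0 else 0 for t in range(256)])
acf = autocorrelation(x, max_lag=16)
spec = power_spectrum(x, sample_rate=1000.0)
strong = [f for f, p in spec if p > 1.0]  # frequencies carrying real power
```

For this synthetic source the auto-correlation peaks at a lag of 8 bins and the power sits at 125 Hz and its harmonics, which is the "single stable set of frequencies" a uniform source would show.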
Wavelet-based Approach
Huang, Feldmann, Willinger
Finding time structures in traces
Capture packet traces at some point
Divide into conversations/flows
  Use source/destination/prefix info to do division
  Divide according to what class of traffic you wish to analyze
Convert traces to uniform signal of 0/1
More Wavelet
Compute an energy function
  Compute discrete Haar wavelet transform
  Energy function measures wavelet coefficients
  Low coefficients reveal regular or periodic structure in time series
Use energy graphs to reveal periodic structure
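A rough sketch of a Haar-based energy function is given below. It assumes one common definition, the mean squared detail coefficient at each scale; the exact energy function used by Huang, Feldmann, and Willinger may differ, and the function name and test signal are hypothetical.

```python
import math

def haar_energy(signal):
    """Return [(scale, mean squared Haar detail coefficient)], finest first.

    Assumes the signal length is a power of two; an odd trailing sample
    at any level would simply be dropped.
    """
    approx = [float(v) for v in signal]
    energies = []
    scale = 1
    while len(approx) >= 2:
        half = len(approx) // 2
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2)
                  for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2)
                  for i in range(half)]
        energies.append((scale, sum(d * d for d in detail) / half))
        scale += 1
    return energies

# Made-up 0/1 signal: one packet in every 8th slot, 64 slots in total.
signal = [1 if t % 8 == 0 else 0 for t in range(64)]
energies = haar_energy(signal)
```

In this example the energy is about 0.125 at scales 1 to 3 and drops to zero from scale 4 onward, where each wavelet spans whole periods of the signal. That drop is the kind of low-coefficient signature the slide refers to.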
Lomb Periodogram
Cousins, Krishnan, Partridge
Similar to wavelet approach
Lomb periodogram:
  Designed for non-uniform signal traces [ideal for packets]
  Computes spectral power at each frequency
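For intuition, here is a small sketch of the classic normalized Lomb periodogram on irregularly sampled data. It is not the implementation behind the results in this talk: the function name, the synthetic 5 Hz test signal, and the frequency grid are assumptions chosen for illustration.

```python
import math
import random

def lomb_periodogram(times, values, freqs):
    """Normalized Lomb periodogram for unevenly spaced samples."""
    n = len(times)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    if var == 0:
        return [0.0] * len(freqs)
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The time offset tau makes the estimate independent of time shifts.
        s2 = sum(math.sin(2 * w * t) for t in times)
        c2 = sum(math.cos(2 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2 * w)
        cs = ss = cc = sn = 0.0
        for t, v in zip(times, values):
            c = math.cos(w * (t - tau))
            s = math.sin(w * (t - tau))
            cs += (v - mean) * c
            ss += (v - mean) * s
            cc += c * c
            sn += s * s
        p = 0.0
        if cc > 0:
            p += cs * cs / cc
        if sn > 0:
            p += ss * ss / sn
        power.append(p / (2 * var))
    return power

# Synthetic data: a 5 Hz sinusoid observed at 200 irregular times over 10 s.
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 10.0) for _ in range(200))
values = [math.sin(2 * math.pi * 5.0 * t) for t in times]
freqs = [0.1 * k for k in range(1, 201)]  # 0.1 Hz to 20 Hz
power = lomb_periodogram(times, values, freqs)
peak = freqs[max(range(len(freqs)), key=lambda i: power[i])]
```

No resampling onto a uniform grid is needed, which is why the method suits packet timestamps. With these inputs the strongest peak should land at or next to 5 Hz.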
Lomb Example
Example Results
• Identified CBR Send Rates
• Identified FTP Round Trip Times
• Characteristics from all three flows observed

Node ID   Application Frequencies (Hz)
XX0       –
X00       1.0
X01       4.88
X02       1.0, 4.51
X03       1.0
X04       1.0
X05       4.88
X06       1.0
X07       1.0
X08       4.88
X09       1.0
X10       4.88
X11       1.0, 4.51
X12       1.0
Xp0       1.0, 24.41
Xp1       1.0, 24.41
Xp2       1.0, 24.41
Xp3       1.0, 24.41
Xp4       1.0, 14.64
Xp5       1.0, 14.64

Green: Correct Detection
Red: Missed Detection

Data: 18 nodes, tcpdump
Results:
• Detected 6 out of 6 application frequencies emitted
• Detected 15 out of 27 traffic generators
• Missed most generators emitting at 1 Hz
Spectral Techniques easily show periodic application traffic on the network
Lomb Analysis of 802.11b Data
[Figure: Lomb periodogram of 802.11b data, annotated at 24.41 Hz]
A Pause to Comment
All three approaches mentioned so far have the characteristic that
  We can detect timing structure from our data
  If we have ground-truth, we can show how the timing structure we find relates to the timing structures in the network
But, without ground-truth, we can’t say for sure what the structure means
Topology Discovery
Techniques where we can show a valuable set of results, without ground-truth to interpret
Discover links in a network (wireless)
  Coherence
  Causality
Given complete map, which links are used?
  Route discovery
Coherence
Take samples of the time series at different points in the network
Compare them, offset in time
  Look for statistically significant relationships between their spectral peaks
A Sketch of the Coherence Math
Compute the Discrete Fourier Transform
  This gives you a series of equally spaced points in a spectrum
The Cross Spectral Density is an averaged product (for each of the points in the spectrum) of the DFT of one series with the complex conjugate of the DFT from another series
Normalize the CSD to 0…1 to get coherence
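The sketch above can be turned into a few lines of code. The version below assumes the usual magnitude-squared coherence, |Pxy|² / (Pxx · Pyy), with the averaging done over consecutive non-overlapping segments; the segment length, the noise level, and the two synthetic node signals are assumptions for illustration only.

```python
import cmath
import math
import random

def dft(x):
    """DFT bins 0..n//2 of a real sequence (plain O(n^2) version)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

def coherence(x, y, seg_len):
    """Magnitude-squared coherence, averaged over non-overlapping segments."""
    n_seg = min(len(x), len(y)) // seg_len
    n_bins = seg_len // 2 + 1
    pxx = [0.0] * n_bins
    pyy = [0.0] * n_bins
    pxy = [0j] * n_bins
    for s in range(n_seg):
        X = dft(x[s * seg_len:(s + 1) * seg_len])
        Y = dft(y[s * seg_len:(s + 1) * seg_len])
        for k in range(n_bins):
            pxx[k] += abs(X[k]) ** 2
            pyy[k] += abs(Y[k]) ** 2
            pxy[k] += X[k] * Y[k].conjugate()  # cross spectral density term
    return [abs(pxy[k]) ** 2 / (pxx[k] * pyy[k])
            if pxx[k] > 0 and pyy[k] > 0 else 0.0
            for k in range(n_bins)]

# Two synthetic "nodes": the same periodic traffic (4 cycles per 32 samples),
# seen 3 samples later at the second node, each with independent noise.
rng = random.Random(1)
n = 512
x = [math.sin(2 * math.pi * 4 * t / 32) + rng.gauss(0, 0.3) for t in range(n)]
y = [math.sin(2 * math.pi * 4 * (t - 3) / 32) + rng.gauss(0, 0.3)
     for t in range(n)]
coh = coherence(x, y, seg_len=32)
```

Here `coh[4]`, the bin of the shared periodicity, comes out close to 1 despite the time offset, while the noise-only bins stay low. Averaging over several segments is essential: with a single segment the ratio is identically 1 at every bin.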
Coherence Plots
[Figure: coherence plots for node pairs (n0, n1), (n0, n2), (n1, n2), (n0, n3), (n1, n3), (n2, n3); horizontal axis 0 to 1000, vertical axis 0 to 1]
Coherence Comments
Coherence works
  Nicely tracks moving nodes
But coherence gets confused
  For instance, confusion over applications with similar periodicity
  Sometimes skips hop in path
Causality
Instead of related spectra, try relating individual transmissions to transmissions that came before
Define a weight function W that estimates the likelihood that event k came from a prior transmission by node i
Then the probability that an event at node i caused k is:
$$p_i^k = \frac{W_i^k}{\sum_{j=1}^{n} W_j^k}$$
Topology Discovery
Now create a conversation matrix
Consider C_i, which is the set of all events at a particular node i. The probability that node j is sending to node i is:
$$p_{ij} = \frac{1}{|C_i|} \sum_{e \in C_i} \frac{W_j^e}{\sum_{l=1}^{n} W_l^e}$$
These values define a matrix
  Row x is the probabilities that x is sending to each of the nodes
  Column x is the probabilities that x is receiving from each of the nodes
N.B.: Probability can be computed incrementally over C_i
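The conversation matrix and its incremental computation can be sketched as follows. Several things here are assumptions rather than the method as implemented: the weight is taken to be exp(-rate × time since node j's most recent transmission), the `rate` value is arbitrary, a node is never considered a cause of its own events, and the event list is synthetic.

```python
import math
from collections import defaultdict

def conversation_matrix(events, rate=10.0):
    """events: iterable of (time, node) transmissions.

    Returns matrix[i][j], an estimate of p_ij (node j is sending to node i),
    kept as a running mean over the events seen at node i. Transpose it to
    get the row/column layout described on the slide.
    """
    last_tx = {}                       # node -> time of its latest transmission
    counts = defaultdict(int)          # node -> number of events seen so far
    matrix = defaultdict(lambda: defaultdict(float))
    for t, node in sorted(events):
        # Assumed weight: exponential decay since j's most recent transmission.
        weights = {j: math.exp(-rate * (t - tj))
                   for j, tj in last_tx.items() if j != node}
        total = sum(weights.values())
        counts[node] += 1
        n = counts[node]
        row = matrix[node]
        for j in set(row) | set(weights):
            p = weights.get(j, 0.0) / total if total > 0 else 0.0
            row[j] += (p - row[j]) / n  # incremental mean update
        last_tx[node] = t
    return {i: dict(r) for i, r in matrix.items()}

# Synthetic trace: A sends once a second, B forwards 10 ms after each A
# packet, and C sends on its own unrelated schedule.
events = []
for k in range(10):
    events += [(float(k), "A"), (k + 0.01, "B"), (k + 0.5, "C")]
matrix = conversation_matrix(events)
```

In this trace `matrix["B"]["A"]` ends up close to 1, so A is identified as the node sending to B. Because each event's weights are normalized, a node whose traffic has no real source in the trace can still show a confident-looking entry, which is the situation the egress weight is meant to flag.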
Comments on Causality
Core idea:
  Over the course of a number of events, the probability function will give enough extra weight to correct sources to yield a good conversation matrix
Current W is pretty simple
  Exponential (Poisson) focused on most recent event
  Self similarity not a problem until we look fairly deep back in time
  May need a more expensive weight function
Very fast… (real time analysis)
Egress Nodes
Extend the causality equation
  For each event, compute 1 minus maximum weight: the egress weight
  I.e. figure the weighting algorithm correctly identified the source of the event, if present. If there is no source, this value will be large
Define a new column of the conversation matrix that contains the normalized average of the egress weight.
  Large values flag egress nodes
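A minimal sketch of the egress weight follows. It assumes the same kind of exponential weight as the causality slides describe, with an arbitrary `rate`, and it reads "normalized average" as the per-node mean of the egress weight; both are assumptions, as are the function name and the synthetic events.

```python
import math
from collections import defaultdict

def egress_weights(events, rate=10.0):
    """events: iterable of (time, node). Returns node -> mean egress weight."""
    last_tx = {}
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, node in sorted(events):
        # Largest weight any other node earns as the cause of this event.
        w_max = max((math.exp(-rate * (t - tj))
                     for j, tj in last_tx.items() if j != node),
                    default=0.0)
        sums[node] += 1.0 - w_max      # egress weight for this event
        counts[node] += 1
        last_tx[node] = t
    return {n: sums[n] / counts[n] for n in counts}

# Synthetic trace: B forwards 10 ms after A; A and C originate traffic whose
# cause is not visible in the trace.
events = []
for k in range(10):
    events += [(float(k), "A"), (k + 0.01, "B"), (k + 0.5, "C")]
egress = egress_weights(events)
```

With these inputs A and C get values near 1 and B a value near 0.1, so A and C would be flagged as the nodes where traffic enters from outside the observed set. How sharply the two groups separate depends directly on the weight function and its rate.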
Egress Example
[Video: egress example]
Stitching
Once egress nodes are identified, it is possible to connect graphs efficiently
  Each probe shares with its neighboring probes the traffic traces from its egress nodes
  Traces are combined to create a single trace between each set of pairs
  Rerun the topology algorithm with the additional trace and see if a link appears
Stitching Example
[Video: stitching example]
Thoughts on Egress and Stitching
Extensions to causality analysis
Egress is highly dependent on the weighting function
End-to-End Route Discovery
Discover end-to-end paths between communicating hosts (src and dst)
  Route: A path or sequence of links (src to dst)
  There may be multiple paths – need the path actually taken by data from src to dst
Require identification of active links
  Can do receiver identification from conversation matrix
Choose shortest paths
  Break ties using “aggregate path coherence”
    Coherence between steps in each path
End result: Layer 3 (network) connectivity – Routing Tables
Some Thoughts
Progress is likely to be rapid
  Better techniques
    Match and latch
    Max-plus
Timing structure is remarkably robust
  E.g. Lomb showed frequency of traffic that wasn’t visible