PAM 2010 Zurich, Switzerland
Network DVR: A Programmable Framework for Application-Aware Trace Collection
Chia-Wei Chang*, Alex Gerber+, Bill Lin*, Shubho Sen+, Oliver Spatscheck+
*University of California, San Diego
+AT&T Labs-Research
April 9, 2010

Introduction
• Network traces are essential for a wide range of network applications:
– e.g., traffic analysis, network measurement, performance monitoring, security analysis …
• Existing capture tools typically focus on recording packets based on simple packet header rules (e.g., port numbers):
– Often only capture packet headers, not content
• However, for certain applications, operators would like to record content selectively based on application requirements
– Recording all packets is not practical

Motivating Application Example
• Consider Snort for intrusion detection
– Snort can identify when an intrusion has occurred and raise the alarm
– e.g., for the following truncated Snort FTP policy rule:
    User-Agent:[^\n\r]+WebshotsNetClient
  where [^\n\r]+ means 1 or more of any characters except newline or carriage return
– Suppose we would like to record the character sequence that raised the alarm. What is the challenge?

Challenges
• “False positive” problem
– Naïve approach: record from the beginning
– Most initial partial matches eventually lead to “false positives”, consuming resources unnecessarily
– e.g., the input may match User-Agent: in the beginning, but eventually fail to match WebshotsNetClient
– Packet recording at line-speed is very resource intensive
• Need a better way to start/stop recording
– Different applications have different recording requirements
– Need flexible programmability to capture application-specific “Content-of-Interest”
E.g.: User-Agent:[^\n\r]+WebshotsNetClient

Impact of False Positive Problem
• Example experiment
– Real data collected on a trunk of four Gigabit Ethernet links (4 Gb/s aggregate) at a large enterprise gateway over a 60-minute period
– Consider all FTP regular expression rules from Snort 2007

Approach                                                          Amount of Memory Copies
Using only IP-address and TCP-port filtering                      900 MB
Eager recording (begin immediately when a rule starts to match)   38 MB
Actual relevant recording                                         31 KB

• Eager vs. actual recording ratio > 1200x
• Most initial recordings lead to false positives
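
The ratio quoted on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 MB = 1000 KB), which the slide does not specify; with binary units the ratio is slightly larger, so the "> 1200x" claim holds either way.

```python
# Sizes as reported on the slide. Decimal units (1 MB = 1000 KB) are an assumption.
filtered_kb = 900 * 1000   # IP-address and TCP-port filtering only
eager_kb = 38 * 1000       # eager recording
actual_kb = 31             # actual relevant recording

eager_vs_actual = eager_kb / actual_kb       # how much eager recording over-copies
filter_vs_eager = filtered_kb / eager_kb     # what eager already saves over plain filtering
wasted_fraction = 1 - actual_kb / eager_kb   # share of eager copies that are false positives

print(f"eager / actual   = {eager_vs_actual:.0f}x")
print(f"filtered / eager = {filter_vs_eager:.1f}x")
print(f"wasted by eager  = {wasted_fraction:.2%}")
```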

Our Solution: Network DVR
• Loosely analogous to the concept of a “Digital Video Recorder” (DVR) for TV recording, where users can program what content to record …
• Network DVR enables users to program the packet recorder using their own regular expression rules
– e.g., when to start recording, when to stop …
• Network DVR uses a concept of “Triggered-Recording” to minimize the impact of false positives

Triggered-Recording Example
• Suppose we want to record all domain names corresponding to HTTP requests to educational URLs
– i.e., those of the form http://.*\.edu, for example for tracking the top 100 most popular educational web sites
• Basic idea of Triggered-Recording
– Define 3 classes of matching rules: Start, Abort, and Final
– Start rule: e.g., http://
  • Only start recording after some “start rule” has been matched; this avoids false starts when “h” or “ht” has been encountered, etc.
– Abort rule: e.g., \.com and \.org
  • Stop recording if some “abort rule” has been encountered; this enables early garbage collection
– Final rule: e.g., \.edu
  • Stop recording and flush the recorded character sequence to disk
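
The start/abort/final behavior described above can be sketched for a single signature as follows. This is a minimal illustration, not the Network DVR implementation: the function name `triggered_record` is hypothetical, the rules are treated as literal strings rather than regular expressions, matching uses a bounded tail of the input instead of a DFA, and a start rule seen while already recording is ignored.

```python
def triggered_record(stream, start, aborts, finals):
    """Return the recordings flushed for one signature (literal rules only)."""
    longest = max(len(rule) for rule in [start, *aborts, *finals])
    history = ""        # bounded tail of the input, used only for rule matching
    recording = None    # None = idle; otherwise the list of recorded symbols
    flushed = []
    for ch in stream:
        history = (history + ch)[-longest:]
        if recording is not None:
            recording.append(ch)                        # one copy per symbol while active
            if any(history.endswith(rule) for rule in finals):
                flushed.append("".join(recording))      # final rule: flush the recording
                recording = None
            elif any(history.endswith(rule) for rule in aborts):
                recording = None                        # abort rule: discard early
        elif history.endswith(start):
            recording = []                              # start rule fully matched
    return flushed


requests = "GET http://www.ucsd.edu/ GET http://www.example.com/"
print(triggered_record(requests, "http://", [".com", ".org"], [".edu"]))
```

With the rules from the slide, only the request to the .edu host is flushed; the .com request is recorded briefly and then discarded when the abort rule matches.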

Triggered-Recording Example (cont’d)
• Suppose we also want to record character sequences that match the following signature:
– DEL[^\r\n]*ATT
• Corresponding rules
– Start rule: DEL
– Abort rule: \r and \n
– Final rule: ATT

Features to Design Network DVR
• A Deterministic Finite Automaton (DFA) is used to match incoming traffic against the 3 sets of trigger rules
– Accept states correspond to matched rules
– Extend the basic DFA to control the packet recorder and to “remember” the state of the packet recorder (e.g., whether the recorder is currently recording)
– Consider matches for a flow across a “packet boundary” (need to maintain a flow table and per-flow DFA states)
• Start, abort, and final trigger rules are referred to as α, β, and γ rules and are grouped into 3 sets Ω_Start, Ω_Abort, and Ω_Final, respectively
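
Matching across a packet boundary amounts to saving each flow's DFA state between packets. The sketch below illustrates this with a tiny hand-written DFA for the single literal start rule http://; the names `step`, `on_packet`, and `flow_table` are hypothetical, and the fallback transition is only valid because this pattern has no repeated prefix.

```python
PATTERN = "http://"   # start rule from the slides; it has no repeated prefix


def step(state, ch):
    """Advance a tiny DFA whose state is the number of matched characters."""
    if state < len(PATTERN) and ch == PATTERN[state]:
        return state + 1
    return 1 if ch == PATTERN[0] else 0   # 'h' from any state leads to state 1


flow_table = {}   # flow ID -> DFA state saved between packets


def on_packet(flow_id, payload):
    """Resume the flow's saved state, scan the payload, and save the state back."""
    state = flow_table.get(flow_id, 0)
    hits = 0
    for ch in payload:
        state = step(state, ch)
        if state == len(PATTERN):   # accept state: the start rule has matched
            hits += 1
    flow_table[flow_id] = state
    return hits


# The start rule is split across two packets of flow 1, with flow 5 interleaved.
print(on_packet(1, "GET ht"), on_packet(5, "tp://x"), on_packet(1, "tp://a.edu"))
```

Because the state is keyed by flow ID, the interleaved packet of flow 5 neither completes nor disturbs the partial match in flow 1.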

1. Trigger Rules Creation

Application-Specific Signatures
Monitor Application 1: Match http://.*\.edu; don’t match http://.*\.com or http://.*\.org
Monitor Application 2: Match Del[^\r\n]*\.ATT

Design Trigger Rules
Signature 1: start = http://   abort = \.com, \.org   final = \.edu
Signature 2: start = Del       abort = \r, \n         final = ATT

Corresponding Rulesets
Ruleset    Rules
Ω_Start    http://, Del
Ω_Abort    \.com, \.org, \r, \n
Ω_Final    \.edu, ATT

2. Construct DFA to Implement Matching
• Each accept state corresponds to 1 or more matched rules from the 3 rulesets

[Figure: compiled DFA with states 0-25 for the Start, Abort, and Final rulesets. Accept states and the rules they match: 7: http:// (start), 11: \.edu (final), 14: \.com (abort), 17: \.org (abort), 20: Del (start), 23: ATT (final), 24: \r (abort), 25: \n (abort). Transitions on h, D, A, \., \r, and \n are drawn "from any state"; the figure also marks a group of "remaining transitions".]
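
To make the idea of one automaton for all three rulesets concrete, the sketch below builds an Aho-Corasick automaton over the eight literal trigger rules. This is a swapped-in technique chosen for brevity, not the regular-expression DFA compiler used in the talk, and the labels such as "start:1" are hypothetical names for "start rule of signature 1".

```python
from collections import deque


def build_automaton(rules):
    """rules: dict mapping a literal pattern to a label such as 'start:1'."""
    goto = [{}]        # goto[state][char] -> next state (trie edges)
    accept = [set()]   # accept[state] -> labels of the rules matched in that state
    for pattern, label in rules.items():
        s = 0
        for ch in pattern:
            if ch not in goto[s]:
                goto.append({})
                accept.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        accept[s].add(label)
    fail = [0] * len(goto)             # failure links, filled in breadth-first order
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            accept[t] |= accept[fail[t]]   # inherit rules that end at a suffix
            queue.append(t)
    return goto, fail, accept


def scan(automaton, text):
    """Return (position, label) for every rule match in text."""
    goto, fail, accept = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        hits.extend((i, label) for label in sorted(accept[s]))
    return hits


RULES = {
    "http://": "start:1", ".com": "abort:1", ".org": "abort:1", ".edu": "final:1",
    "Del": "start:2", "\r": "abort:2", "\n": "abort:2", "ATT": "final:2",
}
automaton = build_automaton(RULES)
print(len(automaton[0]), scan(automaton, "http://a.edu"))
```

For these eight rules the trie has 26 states, because the three rules beginning with a dot share one state for that first character. State numbering depends on insertion order, so it will not necessarily match any particular drawing of the automaton.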

3. Trigger Recording Behavior

[Figure: the compiled DFA with recorder actions attached to its accept states.
Matching Index MA1: start rule (http://): set v1; start recording. Final rule (\.edu): if (v1): reset v1; flush recording. Abort rules (\.com, \.org): if (v1): reset v1; abort recording.
Matching Index MA2: start rule (Del): set v2; start recording. Final rule (ATT): if (v2): reset v2; flush recording. Abort rules (\r, \n): if (v2): reset v2; abort recording.]

• For each signature MAi = αi βi1 … βij γi1 … γik, there is a corresponding variable vi that gets set when its corresponding start rule has been matched
• Upon encountering an abort or final trigger rule, we check if there is an active recording for this rule by testing vi. If it is set, the recording is either aborted or flushed, respectively, and vi is reset. If it is not, the matched abort or final trigger rule is ignored
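
The per-signature variable vi and the actions attached to accept states can be sketched as a small event handler. The class and method names below are hypothetical, and the rule matches are assumed to be delivered by a separate matcher. Note that this naive version keeps one buffer per active signature, so each symbol may be copied once per set variable.

```python
class Recorder:
    """Reacts to trigger-rule matches for signatures identified by an index i."""

    def __init__(self):
        self.active = {}    # i -> recorded symbols; a key is present exactly when v_i is set
        self.flushed = []   # (i, recorded string) pairs handed to the output

    def on_symbol(self, ch):
        for buf in self.active.values():   # naive: one copy per active signature
            buf.append(ch)

    def on_rule(self, kind, i):
        if kind == "start":
            self.active.setdefault(i, [])          # set v_i; start recording
        elif i in self.active:                     # test v_i
            buf = self.active.pop(i)               # reset v_i
            if kind == "final":
                self.flushed.append((i, "".join(buf)))   # flush recording
            # kind == "abort": the buffer is simply dropped
        # v_i not set: the matched abort or final rule is ignored


rec = Recorder()
rec.on_rule("final", 1)            # ignored, no active recording for signature 1
rec.on_rule("start", 1)
for ch in "a.edu":
    rec.on_symbol(ch)
rec.on_rule("final", 1)
print(rec.flushed)
```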

3. Trigger Recording Issues
• How to efficiently examine the trigger condition?
– To test if vi is set for flow f, we can perform a hash lookup on the key f:vi, where the key is constructed by combining the flow ID f and the variable name vi
– We can perform a hash insert (or lookup-then-insert) or a hash delete with the key f:vi to set or reset vi for flow f
• What if multiple recording requests are triggered?
– Worst-case bound on memory bandwidth/processing time is O(N), where N = total number of signatures
– Network DVR uses a single aggregated recording string for each flow to guarantee exactly one memory copy for each incoming symbol
– By logging the recording-begin/end memory positions of each valid matching result in the aggregated recording string, the system can output all recorded matching strings for each application-specific signature
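
A minimal sketch of the two ideas on this slide: the f:vi hash key and the single aggregated recording string per flow. The class `FlowRecording` and its methods are hypothetical; the key is modeled as a Python tuple (flow ID, signature index), and only the begin position is stored because the end position is the buffer length at the time of the final rule.

```python
class FlowRecording:
    """One shared recording buffer per flow; each set v_i stores only a begin offset."""

    def __init__(self, flow_id, table):
        self.flow_id = flow_id
        self.table = table    # shared hash table: (flow ID, i) -> begin position
        self.buffer = []      # aggregated recording string for this flow
        self.active = 0       # number of variables currently set for this flow

    def on_symbol(self, ch):
        if self.active:                   # one copy, however many v_i are set
            self.buffer.append(ch)

    def start(self, i):
        key = (self.flow_id, i)
        if key not in self.table:         # lookup-then-insert sets v_i
            self.table[key] = len(self.buffer)
            self.active += 1

    def stop(self, i, flush):
        begin = self.table.pop((self.flow_id, i), None)   # hash delete resets v_i
        if begin is None:
            return None                   # v_i was not set: ignore the rule
        self.active -= 1
        out = "".join(self.buffer[begin:]) if flush else None
        if self.active == 0:
            self.buffer.clear()           # nothing references the buffer any more
        return out


table = {}
flow = FlowRecording(7, table)
flow.start(1)
for ch in "ab":
    flow.on_symbol(ch)
flow.start(2)                  # second signature starts mid-recording
for ch in "cd":
    flow.on_symbol(ch)
second = flow.stop(2, flush=True)
flow.on_symbol("e")
first = flow.stop(1, flush=True)
print(second, first)
```

Both overlapping recordings are served from the same buffer, so the per-symbol cost stays at one copy regardless of how many signatures are active.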

4. Memory Management

[Figure: four panels labeled Application-Specific Signatures, Flow Table, Memory Allocation, and Free List. The flow table holds entries such as Fid 1, Fid 5, and Fid 108, each with its DFA state and a chain of memory cells holding recorded characters. A valid recording is sent to the output queue for flushing to disk and its memory cells are recycled to the free list; an aborted recording has its memory cells recycled as well.]

• Need constant-time memory operations
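
One way to obtain constant-time memory operations is a preallocated pool of cells threaded into a free list, with each recording kept as a chain that records both its head and its tail. The sketch below is an assumption about how such a scheme could look, not the layout used in Network DVR; `CellPool` and `Chain` are hypothetical names.

```python
class CellPool:
    """Fixed pool of one-symbol cells linked into a free list."""

    def __init__(self, n):
        self.sym = [None] * n
        self.nxt = list(range(1, n)) + [-1]   # cell i links to i + 1; -1 ends a list
        self.free = 0 if n else -1            # head of the free list

    def alloc(self, ch):
        i = self.free
        if i == -1:
            raise MemoryError("cell pool exhausted")
        self.free = self.nxt[i]               # O(1): pop the head of the free list
        self.sym[i], self.nxt[i] = ch, -1
        return i


class Chain:
    """A recording stored as a linked chain of cells."""

    def __init__(self, pool):
        self.pool, self.head, self.tail = pool, -1, -1

    def append(self, ch):
        i = self.pool.alloc(ch)
        if self.head == -1:
            self.head = i
        else:
            self.pool.nxt[self.tail] = i      # O(1): link after the current tail
        self.tail = i

    def text(self):
        out, i = [], self.head
        while i != -1:
            out.append(self.pool.sym[i])
            i = self.pool.nxt[i]
        return "".join(out)

    def recycle(self):
        """Return the whole chain to the free list in O(1), whatever its length."""
        if self.head != -1:
            self.pool.nxt[self.tail] = self.pool.free
            self.pool.free = self.head
            self.head = self.tail = -1


pool = CellPool(4)
rec = Chain(pool)
for ch in "edu":
    rec.append(ch)
print(rec.text())
rec.recycle()   # abort or post-flush cleanup: splice the chain onto the free list
```

Keeping the tail index is what makes recycling independent of the recording length: the chain is spliced onto the free list with two pointer updates.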

Evaluation Setup
• Use real data traces from a large ISP
– Traces collected on a trunk with four Gigabit Ethernet links at a large enterprise Internet gateway by using IP-filtering
– Partitioned into 10 datasets of 60-minute intervals
– Each dataset has approx. 3,500 flows
– For each dataset, with the given application-specific signatures, we replayed the complete trace and calculated the number of memory copies that Network DVR needed vs. an eager approach in which recording starts when the first character of a regular expression is matched
• Use practical signatures from Snort
– Use Snort 2007 FTP signatures (58 regex) to evaluate the amount of unnecessary memory copies that Network DVR can reduce by using the proposed triggered-recording concept
– Use the public domain DFA implementation provided by Becchi and Crowley [CoNEXT 2008] to serve as the matching module in Network DVR
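
The memory-copy comparison can be illustrated on a toy input. The sketch below counts copies for one signature under two simplifying assumptions that are not from the talk: rules are literals, and the eager approach copies every symbol that extends a partial match of the start rule. The function name `count_copies` and the sample text are hypothetical.

```python
def count_copies(text, start="DEL", aborts=("\r", "\n"), final="ATT"):
    """Return (eager_copies, triggered_copies) for one literal signature."""
    eager = triggered = 0
    matched = 0          # characters of the start rule matched so far
    recording = False    # True once the whole start rule has matched
    tail = ""            # last len(final) recorded symbols
    for ch in text:
        if recording:
            eager += 1
            triggered += 1
            tail = (tail + ch)[-len(final):]
            if ch in aborts or tail == final:
                recording, matched, tail = False, 0, ""
        else:
            if ch == start[matched]:
                matched += 1
            else:
                matched = 1 if ch == start[0] else 0   # valid: "DEL" has no repeated prefix
            if matched:
                eager += 1            # eager copies from the first matching character
            if matched == len(start):
                recording = True      # triggered recording begins only here
    return eager, triggered


eager, triggered = count_copies("DEAD DELxATT\n")
print(eager, triggered)
```

On this input the false starts "DE" and "D" cost the eager approach extra copies, while the triggered approach copies only the four symbols of the real match. Real traffic with long partial matches widens the gap considerably.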

Evaluation Results
• Comparison on Memory Copy Times
– “data” shows the size of the total incoming symbols
– “eager” begins copying symbols to memory when the first character of a regular expression is matched
– “netDVR” uses our triggered-recording approach
– “actual” shows the memory needed to record the details of matching results (Snort signatures)

Evaluation Results (cont’d)
• Comparison on Memory Copy Times
[Figure: memory copy times (log scale) across the data sets]
• Reduction = eager/netDVR: 500-800x
• Overhead = netDVR/actual: only 1.48-1.6x

Conclusions
• Proposed Network DVR as a programmable application-aware packet recording system
• Proposed the Triggered-Recording concept to minimize the impact of false positive recordings (partial recordings that will eventually fail)
• Experimental results on real data sets from a large enterprise gateway demonstrate a 500-800x reduction in memory copies