Anomaly/Intrusion Detection and Prevention in
Challenging Network Environments
Yan Chen
Department of Electrical Engineering and Computer Science
Northwestern University
Lab for Internet & Security Technology (LIST)
http://list.cs.northwestern.edu
The Spread of Sapphire/Slammer Worms
Current Intrusion Detection Systems (IDS)
• Mostly host-based and not scalable to high-speed networks
  – Slammer worm infected 75,000 machines in <10 mins
  – Host-based schemes inefficient and user-dependent
    • Have to install an IDS on all user machines!
• Mostly simple signature-based
  – Inaccurate, e.g., with polymorphism
  – Cannot recognize unknown anomalies/intrusions
Current Intrusion Detection Systems (II)
• Cannot provide quality info for forensics or situational-aware analysis
  – Hard to differentiate malicious events from unintentional anomalies
    • Anomalies can be caused by network element faults, e.g., router misconfiguration or link failures, or by application (such as P2P) misconfiguration
  – Cannot tell situational-aware info: attack scope/target/strategy, attacker (botnet) size, etc.
Network-based Intrusion Detection, Prevention, and Forensics System
• Online traffic recording [SIGCOMM IMC 2004, INFOCOM 2006, ToN 2007] [INFOCOM 2008]
  – Reversible sketch for data streaming computation
  – Record millions of flows (GB of traffic) in a few hundred KB
  – Small number of memory accesses per packet
  – Scalable to large key space sizes (2^32 or 2^64)
• Online sketch-based flow-level anomaly detection [IEEE ICDCS 2006] [IEEE CG&A, Security Visualization 2006]
  – Adaptively learn traffic pattern changes
  – As a first step, detect TCP SYN flooding, horizontal and vertical scans, even when mixed
• Online stealthy spreader (botnet scan) detection [IEEE IWQoS 2007]
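The sketch-based traffic recording above can be illustrated with a toy count-min-style sketch (a simplified stand-in for the reversible sketch in the cited papers, which additionally supports recovering the offending keys): a few rows of hashed counters summarize an arbitrarily large flow key space in fixed memory, with one counter access per row per packet.

```python
# Toy count-min-style sketch: fixed-size counter rows summarize a huge
# flow key space with a handful of memory accesses per packet. This is
# a simplified stand-in for the reversible sketch in the cited papers.
import hashlib

class Sketch:
    def __init__(self, rows=4, width=1024):
        self.counters = [[0] * width for _ in range(rows)]
        self.width = width

    def _hashes(self, key):
        for row in range(len(self.counters)):
            h = hashlib.sha256(f"{row}:{key}".encode()).digest()
            yield row, int.from_bytes(h[:8], "big") % self.width

    def update(self, key, count=1):
        for row, col in self._hashes(key):
            self.counters[row][col] += count

    def estimate(self, key):
        # Over-estimates only: hash collisions can inflate counters.
        return min(self.counters[row][col] for row, col in self._hashes(key))

s = Sketch()
for _ in range(1000):
    s.update("10.0.0.1:80")   # heavy flow (made-up key)
s.update("10.0.0.2:443")      # light flow
print(s.estimate("10.0.0.1:80") >= 1000)  # True
```

A heavy-change detector would compare such estimates across time intervals; the reversible sketch additionally allows walking back from an anomalous counter to the responsible keys.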
Network-based Intrusion Detection, Prevention, and Forensics System (II)
• Polymorphic worm signature generation & detection [IEEE Symposium on Security and Privacy 2006] [IEEE ICNP 2007]
• Accurate network diagnostics [SIGCOMM IMC 2003, SIGCOMM 2004, ToN 2007] [SIGCOMM 2006] [INFOCOM 2007 (2)]
• Scalable distributed intrusion alert fusion w/ DHT [SIGCOMM Workshop on Large Scale Attack Defense 2006]
Network-based Intrusion Detection, Prevention, and Forensics System (III)
• Large-scale botnet and P2P misconfiguration event situational-aware forensics [work under submission]
  – Botnet attack target/strategy inference
  – Root-cause analysis of P2P misconfiguration/poisoning traffic
• NetShield: vulnerability-signature-based NIDS for high-performance network defense [work in progress]
• Vulnerability analysis of wireless network protocols and its defense [work in progress]
System Deployment
• Attached to a router/switch as a black box
• Edge network detection is particularly powerful
[Figure: three deployment configurations. (a) Original configuration: a router connects the Internet to LANs through switches. (b) Monitor each port separately: a RAND system attaches to each switch's scan port via a splitter. (c) Monitor aggregated traffic from all ports: splitters feed the combined traffic to the RAND/HPNAIDM system.]
NetShield: Matching with a Large Vulnerability Signature Ruleset for High
Performance Network Defense
Outline
• Motivation
• Feasibility Study: a Measurement Approach
• High Speed Parsing
• High Speed Matching for Large Rulesets
• Evaluation
• Conclusions
Motivation
• Desired features for signature-based NIDS/NIPS
  – Accuracy (especially for IPS)
  – Speed
  – Coverage: large ruleset
             Regular Expression    Vulnerability (Shield [SIGCOMM'04])
Accuracy     Relatively poor       Much better
Speed        Good                  ??
Memory       OK                    ??
Coverage     Good                  ??

Regular expressions cannot capture the vulnerability condition well! The "??" entries are the focus of this work.

Vision of NetShield
Research Challenges
• Background
  – Use protocol semantics to express vulnerabilities
  – Protocol state machine & predicates for each state
  – Example: ver==1 && method=="put" && len(buf)>300
• Challenges
  – Matching thousands of vulnerability signatures simultaneously
    • Sequential matching → algorithmic parallel matching
  – High-speed parsing
  – Applicability to large NIDS/NIPS rulesets
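The example predicate above can be read as straight-line code over parsed protocol fields. A minimal sketch in Python, assuming a hypothetical dict of parsed PDU fields (not NetShield's actual representation):

```python
# Evaluate the slide's example vulnerability predicate
# (ver == 1 && method == "put" && len(buf) > 300) against fields
# parsed from a PDU. The dict layout is an illustrative assumption.
def matches_signature(pdu):
    return (pdu.get("ver") == 1
            and pdu.get("method") == "put"
            and len(pdu.get("buf", "")) > 300)

benign = {"ver": 1, "method": "get", "buf": "x" * 10}
attack = {"ver": 1, "method": "put", "buf": "A" * 400}
print(matches_signature(benign))  # False
print(matches_signature(attack))  # True
```

The matching challenge is that a NIDS/NIPS must evaluate thousands of such predicates per PDU, which is why sequential evaluation has to give way to the algorithmic parallel matching discussed later.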
Outline
• Motivation
• Feasibility Study: a Measurement Approach
  Given a large NIDS/NIPS ruleset, what percentage of the rules can be improved with protocol-semantic vulnerability signatures?
• High Speed Parsing
• High Speed Matching for Large Rulesets
• Evaluation
• Conclusions
Measure Snort Rules
• Semi-manually classify the rules:
  1. Group by CVE ID
  2. Manually examine each vulnerability
• Results
  – 86.7% of the rules can be improved by protocol-semantic vulnerability signatures
  – Most of the remaining rules (9.9%) are related to web DHTML and scripts, which are not suitable for a signature-based approach
  – On average, 4.5 Snort rules reduce to one vulnerability signature
  – For binary protocols the reduction ratio is much higher than for text-based ones
    • For netbios.rules the ratio is 67.6
Outline
• Motivation
• Feasibility Study: a Measurement Approach
• High Speed Parsing
• High Speed Matching for Large Rulesets
• Evaluation
• Conclusions
Observations
[Figure: a PDU parse tree; one internal node is an array, and the leaf nodes are the parsed fields]
• PDU parse tree
• Leaf nodes are integers or strings
• Vulnerability signatures are mostly based on leaf nodes
• Observation 1: only the fields related to signatures need to be parsed
• Observation 2: traditional recursive descent parsers, which need one function call per node, are too expensive
Efficient Parsing with State Machines
• Pre-construct parsing state machines based on parse trees and vulnerability signatures
• Studied eight protocols (HTTP, FTP, SMTP, eMule, BitTorrent, WINRPC, SNMP, and DNS) and their vulnerability signatures
• Common relationships among leaf nodes:
[Figure: four leaf-node relationships: (a) sequential, (b) branch, (c) loop, (d) derive, where one variable is derived from another]
Example for WINRPC
• Rectangles are states
• Parsing variables: R0 .. R4
• 0.61 instructions/byte for the BIND PDU
[Figure: the parsing state machine for WINRPC. After the common header fields (rpc_vers, rpc_ver_minor, ptype, pfc_flags, packed_drep, frag_length), R0 selects the Bind or Bind-ACK branch and R1 tracks the remaining length (R1-16); for Bind, R3 ← ncontext and R2 ← 0, and the machine loops over the presentation contexts (context ID, n_tran_syn, padding, 16-byte UUID, 4-byte UUID_ver), incrementing R2 while R2 ≤ R3.]
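The table-driven idea behind such a parsing state machine can be sketched as follows; the states and field layout here are a simplified stand-in, not the generated WINRPC machine:

```python
# A toy table-driven parsing state machine in the spirit of the WINRPC
# example: each state consumes a fixed number of bytes and either
# records a field needed by signatures or skips it, with no per-node
# function calls. The field layout below is an illustrative assumption.

# state -> (num_bytes, field_name_or_None, next_state)
STATES = {
    "ver":       (1, "rpc_vers", "ptype"),
    "ptype":     (1, "ptype", "skip_drep"),
    "skip_drep": (4, None, "done"),   # field irrelevant to signatures
}

def parse(buf):
    fields, pos, state = {}, 0, "ver"
    while state != "done":
        size, name, nxt = STATES[state]
        if name is not None:
            chunk = buf[pos:pos + size]
            # single-byte ints here; a real machine dispatches on width
            fields[name] = chunk[0] if size == 1 else chunk
        pos += size
        state = nxt
    return fields

pdu = bytes([5, 11]) + b"\x10\x00\x00\x00"  # rpc_vers=5, ptype=11
print(parse(pdu))  # {'rpc_vers': 5, 'ptype': 11}
```

Because the machine only touches signature-relevant fields and advances by precomputed offsets, the per-byte cost stays near the 0.61 instructions/byte figure cited for the BIND PDU.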
Outline
• Motivation
• Feasibility Study: a Measurement Approach
• High Speed Parsing
• High Speed Matching for Large Rulesets
• Evaluation
• Conclusions
A Matching Problem Example
• Data representations
  – For all the vulnerability signatures we studied, we only need integers and strings
  – Integer operators: ==, >, <
  – String operators: ==, match_re(., .), len(.)
• Example signature for the Blaster worm
Matching Problem Formulation
• Suppose we have n signatures, each defined over k matching dimensions (matchers)
  – A matcher is a two-tuple (field, operation), or a four-tuple for associative-array elements
• Challenges for the single-PDU matching problem (SPM)
  – Large number of signatures n
  – Large number of matchers k
  – Large number of "don't cares"
  – Cannot reorder matchers arbitrarily: buffering constraint
  – Field dependency
    • Arrays, associative arrays
    • Mutually exclusive fields
Observations
• Observation 1: most matchers are good
  – After matching against them, only a small number of signatures remain (candidates)
  – String matchers are all good, and most integer matchers are good
  – We can buffer bad matchers to change the matching order
• Observation 2: real-world traffic mostly does not match any signature; in fact, for most traffic no matcher is met at all
• Observation 3: a NIDS/NIPS must report all matched rules regardless of ordering, unlike firewall rules
Matching Algorithms
• Two steps:
  1. Pre-computation decides the rule order and matcher order
  2. For each matcher m, compare the traffic with all the rules that involve m, and filter/combine the candidate matching rules iteratively
• Matcher implementation
  – Integer range checking: binary search tree
  – String exact matching: trie
  – String regular expressions: DFA, XFA, etc.
  – String length checking: binary search tree
Step 1: Pre-Computation
• Put the selective matchers earlier
• Observe the buffering constraint and field arrival order
[Figure: rule ordering. ER1 holds the rules filtered by good matcher 1; the rules that don't care about good matcher 1 are extended by good matcher 2 (ER2), and so on, ending with the rules that don't care about all good matchers 1 to n.]
Step 2: Iterative Matching
PDU = {Method=POST, Filename=fp40reg.dll, VARs: name="file"; value~".*\.\./.*", Headers: name="host"; len(value)=450}
Rule buckets per matcher: RB1: 1 2 3; RB2: 4 5 6; RB3: 7; RB4: 8; RB5: 9
At step i, a previous candidate survives if it either does not use matcher i or appears in matcher i's surviving set A_i; B_i is the set of new candidates matcher i introduces (writing ⊙ for this filtering step):
  S1 = {3}
  S2 = S1 ⊙ A2 + B2 = {3} ⊙ {} + {6} = {} + {6} = {6}
  S3 = S2 ⊙ A3 + B3 = {6} ⊙ {} + {} = {6} + {} = {6}
  S4 = S3 ⊙ A4 + B4 = {6} ⊙ {4} + {} = {6} + {} = {6}
  S5 = S4 ⊙ A5 + B5 = {6} ⊙ {6} + {} = {6} + {} = {6}
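The iterative filtering above can be sketched as a loop over matchers; the bucket and result sets below are illustrative assumptions chosen to reproduce a {3}-to-{6} narrowing like the example's:

```python
# Toy candidate-selection loop: walk the matchers in order, keep
# previous candidates that either don't use the current matcher or
# appear in its surviving set, and add newly introduced candidates.
# The bucket/result contents are made-up illustrative values.
def iterative_match(buckets, results, s1):
    # buckets[i]: set of rule IDs that use matcher i
    # results[i]: (A_i, B_i) = (surviving old candidates, new candidates)
    s = set(s1)
    for i in sorted(results):
        a, b = results[i]
        s = {r for r in s if r not in buckets[i] or r in a} | b
    return s

buckets = {2: {3, 4, 5, 6}, 3: {7}, 4: {8}, 5: {6, 9}}
results = {2: (set(), {6}), 3: (set(), set()),
           4: ({4}, set()), 5: ({6}, set())}
print(iterative_match(buckets, results, {3}))  # {6}
```

Here rule 3 is eliminated at matcher 2 (it uses that matcher but fails it), while rule 6 enters as a new candidate and passes through the matchers it does not use.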
Refinement and Extension
• SPM improvements
  – Allow negative conditions
  – Handle the array case
  – Handle the associative-array case
  – Handle the mutually exclusive case
  – Report matched rules as early as possible
• Extension to multiple-PDU matching (MPM)
  – Allow checkpoints
Outline
• Motivation
• Feasibility Study: a Measurement Approach
• Problem Statement
• High Speed Parsing
• High Speed Matching for Large Rulesets
• Evaluation
• Conclusions
Evaluation Methodology
• Fully implemented and deployed to sniff a campus router hosting university web servers and several labs
• Runs on a P4 3.8 GHz single-core PC with 4 GB memory
• Much smaller memory usage, e.g., for 791 HTTP vulnerability signatures from 941 Snort rules:
  DFA: 5.29 GB vs. NetShield: 1.08 MB
Stress Test Results
• Traces from Tsinghua Univ. (TH) and Northwestern Univ. (NU)
• After TCP reassembly, the PDUs are preloaded in memory
• For DNS we only evaluate parsing
• For WINRPC we have 45 vulnerability signatures covering 3,519 Snort rules
• For HTTP we have 791 vulnerability signatures covering 941 Snort rules
Conclusions
• A novel network-based vulnerability-signature matching engine
  – A measurement study on the Snort ruleset shows that vulnerability signatures can improve most of the signatures in a NIDS/NIPS
  – Proposed a parsing state machine for fast parsing
  – Proposed a candidate selection algorithm for matching a large number of vulnerability signatures simultaneously
With Our Solutions

             Regular Expression    Vulnerability
Accuracy     Relatively poor       Much better
Speed        Good                  Even faster
Memory       OK                    Better
Coverage     Good                  Similar

Build a better Snort alternative. Ongoing work: apply NetShield to the Cisco signature ruleset.
Backup
Observation
[Figure: a PDU parse tree; one internal node is an array, and the leaf nodes are the parsed fields]
• PDU parse tree
• Leaf nodes are integers or strings
• Vulnerability signatures are mostly based on leaf nodes
• Traditional recursive descent parsers (BINPAC), which need one function call per node, are too expensive
• Only need to parse the fields related to signatures
Limitations of Regular Expression Signatures
[Figure: a network protected by traffic filtering at its Internet edge using the regular-expression signature "10.*01"; polymorphic variants of the attack payload evade the filter]
Polymorphism! A polymorphic attack (worm/botnet) might not have an exact regular-expression-based signature.
Reason
Regular expressions are not powerful enough to capture the exact vulnerability condition!
[Figure: an expressiveness spectrum: regular expressions cannot express the exact condition, while Shield-style vulnerability signatures can]
Outline
• Motivation
• Feasibility Study: a Measurement Approach
• Problem Statement
• High Speed Parsing
• High Speed Matching for Massive Vulnerability Signatures
• Evaluation
• Conclusions
What Do We Do?
• Build a NIDS/NIPS with much better accuracy and similar speed compared with regular-expression-based approaches
  – Feasibility: in the Snort ruleset (6,735 signatures), 86.7% can be improved by vulnerability signatures
  – High-speed parsing: 2.7~12 Gbps
  – High-speed matching:
    • Efficient algorithm for matching a large number of vulnerability rules
    • HTTP: 791 vulnerability signatures at ~1 Gbps
Network-based IDS/IPS
• Accuracy (especially for IPS)
  – False positives
  – False negatives
• Speed
• Coverage: large ruleset

             Regular Expression    Vulnerability
Accuracy     Poor                  Much better
Speed        Good                  Good
Coverage     Good                  Good

Regular expressions are not powerful enough to capture the exact vulnerability condition!