
1

Issues in Benchmarking Intrusion Detection Systems

Marcus J. Ranum <[email protected]>

2

IDS Benchmarking?

• How hard can it be to benchmark intrusion detection systems?
  – Very!
  – There are lots of ways to get it wrong
    • Accidentally
    • Deliberately
  – Avoiding doing it wrong does not necessarily mean you’ve done it right

3

What’s an IDS?

• IDS = Intrusion Detection System
  – Primary criterion for measurement is the IDS’ ability to detect intrusions
  – Secondary criteria for measurement are other issues:
    • False positives - false alarms
    • False negatives - real attacks that are missed
    • Performance impact - throughput delay or CPU usage on host processor
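
The criteria above can be made concrete with a small scoring sketch. It assumes a benchmark run in which every injected attack has a known identifier and the IDS output has been reduced to a set of alert identifiers; the function name `score_run` and the ID format are illustrative assumptions, not part of any IDS product.

```python
def score_run(attacks_sent, alerts_raised):
    """Score one benchmark run.

    attacks_sent:  IDs of attacks actually injected (ground truth)
    alerts_raised: IDs the IDS alerted on (may include benign events)
    """
    attacks_sent = set(attacks_sent)
    alerts_raised = set(alerts_raised)
    detected = attacks_sent & alerts_raised          # real attacks that were caught
    false_negatives = attacks_sent - alerts_raised   # real attacks that were missed
    false_positives = alerts_raised - attacks_sent   # false alarms
    # Guard against an empty ground-truth set to avoid dividing by zero.
    detection_rate = len(detected) / len(attacks_sent) if attacks_sent else 0.0
    return {
        "detected": len(detected),
        "false_negatives": len(false_negatives),
        "false_positives": len(false_positives),
        "detection_rate": detection_rate,
    }

# Hypothetical run: four attacks injected, two caught, one false alarm.
result = score_run({"a1", "a2", "a3", "a4"}, {"a1", "a3", "benign7"})
```

Performance impact is not covered here; it has to be measured separately on the host or the wire.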

4

Types of IDS

• Primary Types:
  – Network IDS (NIDS)
  – Host IDS (HIDS)

• Hybrid Types:
  – Per-Host Network IDS (PH-NIDS)
  – Load Balanced Network IDS (LB-NIDS)
  – Firewall IDS (FW-IDS)

5

Properties of: Network IDS

• Collect packets in promiscuous mode

• Issues:
  – Packet collection rate - what is the maximum throughput?
  – Reassembly/defragmentation/reordering - what about traffic spoofing?
  – Selective analysis - is the IDS choosing to ignore some traffic in order to optimize?

6

Properties of: Host IDS

• Operate on host logs and processes
  – Sometimes forward audit records to a central system for analysis

• Issues:
  – CPU usage on host
  – What about packet-oriented attacks?
  – Per-platform (individual) view of attacks - single system is monitored per agent

7

Properties of: Per-Host Network IDS

• Network IDS “shim” layer inserted into network stack on each host

• Issues:
  – Has properties of a network IDS
  – But:
    • Traffic is processed per-host only
    • Does not have same performance as NIDS
    • “Local” only view of traffic (but no drops)

8

Properties of: Load-Balanced Network IDS

• Use a load-balancing pre-processor to “spread” load across multiple NIDS

• Issues:
  – Can scale to “infinite” bandwidth
  – Total cost of solution is not single unit pricing (requires switch + multiple NIDS)
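
One way a pre-processor might “spread” load is to hash each flow to a sensor. The sketch below is an assumption about how such a balancer could work, not a description of any specific product; `pick_sensor` is a hypothetical name. It normalizes the endpoint order so both directions of a session reach the same NIDS, which matters for any sensor that reassembles streams.

```python
import zlib

def pick_sensor(src, sport, dst, dport, n_sensors):
    """Map a flow to a sensor index so both directions land on the same NIDS."""
    # Sort the two endpoints so (A -> B) and (B -> A) produce the same key.
    lo, hi = sorted([(src, sport), (dst, dport)])
    key = f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}".encode()
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(key) % n_sensors

forward = pick_sensor("10.0.0.5", 40000, "192.168.1.10", 80, n_sensors=4)
reverse = pick_sensor("192.168.1.10", 80, "10.0.0.5", 40000, n_sensors=4)
```

When benchmarking such a setup, the traffic mix matters: a test stream made of very few flows would land on very few sensors no matter how many are installed.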

9

Properties of: Firewall IDS

• Place network IDS capability in a firewall or bridge type device

• Issues:
  – No packet loss issues (retransmits take care of packets that are lost)
  – (May) slow down network throughput

10

Other Issues

• Other things affecting speed and detection ability:
  – IP fragment re-assembly
  – TCP packet re-ordering
  – TCP state/sequence tracking
  – Analyzing only selected sessions

11

Fragment Re-assembly

• Re-assembling fragments takes significant CPU time as well as memory to buffer packets
  – IDS can be negatively impacted by faked fragments intended to consume extra memory
  – How does IDS handle fragmented attacks? Simply alert “I see fragmented traffic” or de-fragment then apply IDS logic?
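
The memory cost described above can be illustrated with a minimal reassembly buffer. This is a simplified sketch, not how any particular IDS does it: offsets are plain byte offsets (real IP fragment offsets are in 8-byte units), overlapping fragments are never completed, and there is no timeout. The class name `FragmentBuffer` and the tuple used as a key are illustrative assumptions.

```python
class FragmentBuffer:
    """Hold fragments per datagram until the whole payload can be rebuilt."""

    def __init__(self):
        # key -> {"parts": {offset: bytes}, "total": expected length or None}
        self.pending = {}

    def add(self, key, offset, data, more_fragments):
        entry = self.pending.setdefault(key, {"parts": {}, "total": None})
        entry["parts"][offset] = data
        if not more_fragments:
            # The final fragment tells us the full payload length.
            entry["total"] = offset + len(data)
        return self._try_reassemble(key)

    def _try_reassemble(self, key):
        entry = self.pending[key]
        if entry["total"] is None:
            return None  # final fragment not seen yet
        out = bytearray()
        for offset in sorted(entry["parts"]):
            if offset != len(out):
                return None  # gap or overlap: keep buffering
            out += entry["parts"][offset]
        if len(out) != entry["total"]:
            return None
        del self.pending[key]  # memory is released only on completion
        return bytes(out)

    def buffered_bytes(self):
        return sum(len(d) for e in self.pending.values()
                   for d in e["parts"].values())

buf = FragmentBuffer()
key = ("10.0.0.1", "10.0.0.2", 42)  # hypothetical (src, dst, IP id)
first = buf.add(key, 3, b"ACK", more_fragments=False)  # arrives out of order
second = buf.add(key, 0, b"ATT", more_fragments=True)  # completes the datagram

# Faked fragments that never complete just sit in memory.
for i in range(100):
    buf.add(("10.9.9.9", "10.0.0.2", i), 0, b"x" * 1000, more_fragments=True)
held = buf.buffered_bytes()
```

An IDS that only alerts on “I see fragmented traffic” skips all of this work, which is why the two approaches benchmark so differently.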

12

Packet Re-ordering

• Re-ordering packets requires significant CPU as well as memory for packet buffering
  – IDS can be impacted by unintentional or deliberate packet drops since it tries to buffer out-of-sequence packets
  – How does IDS handle re-ordering? Does it just flag out-of-sequence packets, or does it re-order then apply IDS logic?
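
A sketch of the “re-order then apply IDS logic” option follows. It is a simplification under stated assumptions: one direction of one stream, no sequence-number wraparound, no handling of partially overlapping segments, and no limit on buffered data. `StreamReorderer` is a hypothetical name.

```python
class StreamReorderer:
    """Deliver TCP payload bytes in sequence order, holding early segments."""

    def __init__(self, initial_seq):
        self.next_seq = initial_seq  # next byte we expect to deliver
        self.held = {}               # seq -> payload waiting for a gap to fill

    def push(self, seq, payload):
        """Accept a segment; return the bytes that are now deliverable in order."""
        if seq < self.next_seq:
            return b""  # already delivered (retransmission): ignored in this sketch
        self.held[seq] = payload
        out = bytearray()
        # Drain every held segment that continues the stream without a gap.
        while self.next_seq in self.held:
            chunk = self.held.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)

r = StreamReorderer(initial_seq=1000)
early = r.push(1004, b"/bin")    # out of sequence: buffered, nothing delivered
in_order = r.push(1000, b"cat ") # fills the gap: both segments are delivered
stuck = r.push(1020, b"zzz")     # bytes 1008-1019 were dropped: held indefinitely
```

The last call shows the drop problem: without a timeout or cap, a single lost packet leaves later data sitting in memory.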

13

TCP State Tracking

• Tracking TCP states requires maintaining per-session information
  – IDS is impacted by number of simultaneous streams
  – IDS is impacted by randomized traffic
  – IDS is harder to fool with faked out-of-sequence FIN packets
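
The trade-off above can be sketched as a session table. This is a deliberately reduced model under stated assumptions: it tracks one direction only, stores just the next expected sequence number, and accepts only exact matches (no window). `SessionTable` and its method names are illustrative.

```python
class SessionTable:
    """Per-session state: one entry for every simultaneous stream."""

    def __init__(self):
        # (src, sport, dst, dport) -> next expected sequence number
        self.sessions = {}

    def open(self, key, initial_seq):
        self.sessions[key] = initial_seq

    def data(self, key, seq, length):
        """Advance the expected sequence number if the segment is in order."""
        if self.sessions.get(key) == seq:
            self.sessions[key] = seq + length
            return True
        return False

    def fin(self, key, seq):
        """Close the session only if the FIN carries the expected sequence number."""
        if key in self.sessions and self.sessions[key] == seq:
            del self.sessions[key]
            return True
        return False  # faked or out-of-sequence FIN: keep tracking the session

table = SessionTable()
k = ("10.0.0.5", 40000, "192.168.1.10", 80)  # hypothetical session
table.open(k, 100)
accepted = table.data(k, 100, 50)
fake_fin = table.fin(k, 9999)          # rejected, session stays open
still_tracked = len(table.sessions)
real_fin = table.fin(k, 150)           # accepted, entry is removed
```

Memory use grows with the number of open entries, so a load generator that opens many sessions, or sends traffic that never matches a session, stresses this table in ways a stateless IDS never notices.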

14

Analyzing Selected Sessions

• IDS can “optimize” performance by only reassembling or tracking TCP sessions associated with known signatures
  – IDS might have extremely good performance against random traffic but poor performance against (e.g.) Web traffic
  – Tradeoff is coverage versus performance; vendors do not usually document this

15

Naïve Simulation Network

[Figure labels: Test Network, Attack Generator, Target Host, Attack Stream, NIDS]

16

What’s Wrong?

• The Naïve test network permits traffic that is not likely to be seen in a “real world” deployment - e.g.: ARP cache poisoning (you see a lot of this on DEFCON CTF networks)

• The presence of a router would “smooth” spikes somewhat and actually achieve higher sustained loads

17

Naïve Simulation Network #2

[Figure labels: Test Network #2, Target Host, Attack Stream, NIDS, Router w/ some screening, Test Network #1, Attack Generator, SmartBits Load Generator]

18

What’s Wrong?

• SmartBits style traffic generators do not generate “real” TCP traffic
  – This penalizes IDS that actually look at streams and try to reassemble them (which are desirable properties of a good IDS)

19

Skunking a Benchmark

[Figure labels: Test Network, Attack Generator, Target Host w/ Host-Net, Attack Stream]

20

What’s Wrong?

• Packet-style counts are not relevant to a host-network IDS, which only processes its own host’s traffic

21

Skunking a Benchmark: #2

[Figure labels: Test Network, Attack Generator, Target Host, Attack Stream, NIDS with selective detection turned on]

22

What’s Wrong?

• IDS with selective detection can be configured to only look at traffic aimed at the local subnet
  – SmartBits style generators’ random traffic largely gets seen and discarded
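
A rough sketch of why this skews results: if destination addresses are drawn at random, almost none fall inside the subnet the IDS is configured to watch. The subnet, the sample size, and the uniformly random address model are all assumptions for illustration; a real load generator may choose addresses differently.

```python
import ipaddress
import random

def fraction_inspected(dest_addrs, local_subnet):
    """Share of packets whose destination falls inside the watched subnet."""
    if not dest_addrs:
        return 0.0
    net = ipaddress.ip_network(local_subnet)
    inspected = sum(1 for a in dest_addrs if ipaddress.ip_address(a) in net)
    return inspected / len(dest_addrs)

# Hypothetical load: 10,000 packets with uniformly random IPv4 destinations.
rng = random.Random(0)
random_dests = [str(ipaddress.IPv4Address(rng.getrandbits(32)))
                for _ in range(10000)]
share = fraction_inspected(random_dests, "192.168.1.0/24")

# For comparison: one packet inside the subnet, one outside.
half = fraction_inspected(["192.168.1.5", "8.8.8.8"], "192.168.1.0/24")
```

With a /24 the expected share is 256 out of 2^32 addresses, so the IDS does real analysis on almost none of the advertised load.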

23

Effective Simulation Network

[Figure labels: Test Network; replayed packets dumped back onto network; NIDS; recorded attack and normal traffic on hard disk]

24

What’s Wrong?

• Nothing:
  – Predictable baseline
  – Can verify traffic rate with simple math
  – Can scale load arbitrarily (use multiple machines each with different capture data)
  – Traffic is real including “real” data contents
  – NIDS cannot be configured to watch a specific machine (there are no targets)
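
The “simple math” might look like the following sketch, which assumes the capture is available as (timestamp, length) pairs. The function name, the sample capture, and the assumption that three replay machines carry captures of equal rate are illustrative only.

```python
def offered_load_bps(packets):
    """Average offered load in bits per second.

    packets: list of (timestamp_seconds, length_bytes) in capture order.
    """
    if len(packets) < 2:
        return 0.0
    duration = packets[-1][0] - packets[0][0]
    if duration <= 0:
        return 0.0
    total_bits = 8 * sum(length for _, length in packets)
    return total_bits / duration

# Hypothetical capture: 4,000 bytes over 2 seconds.
capture = [(0.0, 1500), (0.5, 500), (1.0, 1500), (2.0, 500)]
single = offered_load_bps(capture)   # one replay machine
combined = 3 * single                # three machines, assuming equal-rate captures
```

Because the input is a fixed recording, the same number can be recomputed before every run and compared with what the NIDS reports having seen.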

25

Tools to Use

• Fragrouter - generates fragmented packets

• Whisker - generates out-of-sequence packets

• Pcap-pace - replays packets from a hard disk with original inter-packet timing
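
Replaying with the original inter-packet timing can be sketched as below. This is not the interface of Pcap-pace or any other listed tool; `replay`, its parameters, and the in-memory packet list are assumptions chosen to show the pacing idea only. The `send` callable stands in for whatever actually writes to the network.

```python
import time

def replay(packets, send, speedup=1.0, sleep=time.sleep):
    """Send recorded packets, waiting the recorded gap between each one.

    packets: iterable of (timestamp_seconds, payload) in capture order
    send:    callable that transmits one payload
    speedup: values above 1.0 compress the gaps to raise the load
    sleep:   injectable so the pacing can be checked without real waiting
    """
    prev_ts = None
    sent = 0
    for ts, payload in packets:
        if prev_ts is not None:
            gap = (ts - prev_ts) / speedup
            if gap > 0:
                sleep(gap)
        send(payload)
        prev_ts = ts
        sent += 1
    return sent

# Dry run: record the gaps and payloads instead of sleeping and transmitting.
gaps, out = [], []
count = replay([(0.0, b"a"), (0.25, b"b"), (1.0, b"c")],
               out.append, sleep=gaps.append)
```

Preserving the recorded gaps keeps bursts and idle periods intact, which is the property a constant-rate generator loses.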

26

Summary

• It’s easy to skunk an intrusion detection benchmark

• It’s hard to design a good intrusion detection benchmark

• If you want to see if a given system works, the best way to find out is to try it on your actual network
