nids — pattern search vs. protocol decode

0167-4048/01$20.00 © 2001 Elsevier Science Ltd

NIDS (Network Intrusion Detection System) technology is increasingly deployed by corporations as an adjunct to the firewall. It provides increased visibility into what is really going on across the network wire. NIDS is useful not only in detecting the intrusions themselves, but also in detecting errors in firewall configurations, as security administrators discover events that, in theory, could not possibly happen. This paper contrasts two of the most popular technologies for implementing NIDS: pattern-searching and protocol-analysis.

Introduction

NIDS has a long history going back to the mid-1980s. There are several classical views of NIDS, such as differentiating between ‘anomaly detection’ (e.g. people logging in at 2 a.m.) and ‘misuse detection’ (e.g. an intruder entering bad passwords in order to guess the right one).

However, in the late 1990s, a different type of NIDS started to appear that didn’t quite follow the classical architectures. Rather than focusing on how the NIDS ought to be designed in theory, the NIDS was designed in a much more pragmatic “whatever I can get” fashion. Security experts used packet sniffers to eavesdrop on actual intrusions, then pieced together bits of code that would trigger whenever those intrusions were seen again. The code needed to detect each intrusion was called a ‘signature’. Most signatures were for clear misuses, some were for anomalies, and some were for events that are hard to classify in the classical models.

Pattern Search

The basic technology for implementing ‘signatures’ is simply to record a unique pattern and search for it within network traffic. You can think of this as virus-scanning techniques applied to network traffic rather than files. This is called ‘pattern searching’, or more technically ‘network grep’. The technology is quite simple to explain. Take a standard packet sniffer (such as the free tcpdump) and pipe its output into a pattern-searching system (such as one of the many free regular expression parsers).

There are many free open-source variants of this technique, some of which are as simple as ‘tcpdump | grep xxx’. The best pattern-searching system available is the open-source (i.e. free) Snort. The average Snort installation contains a list of about 500 patterns to look for on the wire. Any packet that triggers on one of those patterns will be captured and saved in an event file. In Snort parlance, these pattern signatures are known as “rules”. A sample Snort rule is:

alert TCP $EXTERNAL any -> $INTERNAL 80 (msg: "PHF"; flags: AP; content: "/cgi-bin/phf";)

Robert Graham, Network ICE Corporation, 2121 South El Camino Real, Suite 1100, San Mateo, CA 94403, USA.

Computers & Security, 20 (2001) 37-41


This rule will cause Snort to trigger whenever an HTTP request is made to the URL “/cgi-bin/phf”. PHF was a sample program shipped with many early Web servers. A hacker could craft a special URL that would execute PHF and break into the system. By checking all incoming URLs against ‘/cgi-bin/phf’, the system can find intruders. Since the majority of such systems have long since been patched or replaced, the exploit no longer works against modern systems. The real reason the PHF signature still triggers is that intruders run broad-spectrum vulnerability scans against Web servers. Each of the other 500 rules in Snort looks similar. They all look for patterns within packets that meet certain conditions.
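The blind pattern search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Snort's implementation: the `SIGNATURES` table and the `pattern_search` function are hypothetical names, and the rule's port and TCP-flag conditions are left out.

```python
# Hypothetical signature table: signature name -> byte pattern.
# This is NOT Snort's rule format, just the "content" part of the idea.
SIGNATURES = {
    "PHF": b"/cgi-bin/phf",
}


def pattern_search(payload: bytes) -> list[str]:
    """Return the names of all signatures whose pattern occurs anywhere in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


attack = b"GET /cgi-bin/phf HTTP/1.0\r\n\r\n"
benign = b"GET /index.html HTTP/1.0\r\n\r\n"

print(pattern_search(attack))  # ['PHF']
print(pattern_search(benign))  # []
```

Note that every signature is tested against every payload, with no regard for where in the packet the pattern appears.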

Protocol Analysis

There are a number of problems with blind pattern search because it doesn’t truly understand the nature of the network traffic. The pattern ‘/cgi-bin/phf’ may appear in a packet for reasons not related to an attack, thereby causing the NIDS to trigger a ‘false positive’.

Rather than processing just the surface of the packets, a NIDS can be constructed to dig deeper into them, reconstructing the original meaning of the data. The technological difference is similar to that between a packet-filtering firewall and a proxy server. A packet filter understands only IP addresses and port numbers; a proxy server must fully reconstruct the client and server ends of the communication.

This requires that a lot more code be written. The NIDS must essentially implement a complete Web server in order to be able to detect signatures in an HTTP connection. Likewise, the NIDS must implement a full FTP server in order to detect FTP intrusions. There are potentially hundreds of protocols the NIDS must examine, each requiring an extensive amount of code to fully analyze what is going on.

One way to understand this is to use the same PHF exploit described in the section above. The text below shows a typical example of an HTTP request:

GET /index.html HTTP/1.0
Host: www.robertgraham.com
Referer: http://www.robertgraham.com/cgi-bin/phf
User-Agent: Mozilla/2.0

From the perspective of a pattern-search NIDS, the HTTP header looks like a raw, meaningless stream of characters, which you can visualize below:

GET/index.htmlHTTP/1.0Host:www.robertgraham.comReferer:http://www.robertgraham.com/cgi-bin/phfUser-Agent:Mozilla/2.0

However, the protocol-analysis NIDS applies more intelligence, pulling apart each of the fields within the header and assigning ‘meaning’ to them. This is shown in the following:

Method = GET
URL = /index.html
Version = HTTP/1.0
Fieldname = Host
HTTP_HOST = www.robertgraham.com
Fieldname = Referer
HTTP_REFERER = http://www.robertgraham.com/cgi-bin/phf
Fieldname = User-Agent
HTTP_USERAGENT = Mozilla/2.0

This example was chosen to highlight one of the issues with false positives. The example is a request for ‘/index.html’. However, it also contains the string ‘/cgi-bin/phf’ elsewhere within the packet. This PHF pattern will be erroneously discovered by the blind pattern-search technique, but correctly ignored by the protocol-analysis technique. When looking for hostile URLs, the protocol-analysis NIDS will look only in the URL field, and nowhere else in the packet.
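The false positive can be reproduced with a minimal sketch. It assumes a deliberately simplified HTTP parser (one request per buffer, no folded headers, no body); `HOSTILE_URLS`, `blind_match`, `parse_request` and `protocol_match` are illustrative names and do not come from any real NIDS.

```python
HOSTILE_URLS = {"/cgi-bin/phf"}  # hypothetical list of known hostile URLs

REQUEST = (
    b"GET /index.html HTTP/1.0\r\n"
    b"Host: www.robertgraham.com\r\n"
    b"Referer: http://www.robertgraham.com/cgi-bin/phf\r\n"
    b"User-Agent: Mozilla/2.0\r\n"
    b"\r\n"
)


def blind_match(raw: bytes) -> bool:
    """Pattern search: look for a hostile URL anywhere in the raw bytes."""
    return any(url.encode() in raw for url in HOSTILE_URLS)


def parse_request(raw: bytes) -> dict:
    """Split the request into method, URL, version and header fields."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, url, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "url": url, "version": version, "headers": headers}


def protocol_match(raw: bytes) -> bool:
    """Protocol analysis: compare only the decoded URL field."""
    return parse_request(raw)["url"] in HOSTILE_URLS


print(blind_match(REQUEST))     # True  (false positive from the Referer field)
print(protocol_match(REQUEST))  # False (the requested URL is /index.html)
```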

Evasion

One of the difficulties with NIDS is that you can’t always assume that intruders will be cooperative. You must assume that the intruder will be making attempts to hide his/her tracks. In the field of NIDS, this is known as evasion.

An example of evasion with HTTP would be to use ‘form-url-encoding’ with the URL. In this manner, the URL ‘/cgi-bin/phf’ could be sent across the wire as:

%2F%63%67%69%2D%62%69%6E%2F%70%68%66

In response to this, Snort includes a plugin that will automatically decode such data before sending it to the pattern-search subsystem. Most other NIDS have followed suit, even though they don’t contain full parsers for HTTP.
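The effect of such a decoding step can be illustrated with Python's standard `urllib.parse.unquote_to_bytes`. This is a sketch of the general idea only, not of Snort's plugin; `matches_raw` and `matches_decoded` are hypothetical helper names.

```python
from urllib.parse import unquote_to_bytes

PATTERN = b"/cgi-bin/phf"


def matches_raw(url: bytes) -> bool:
    """Search the URL exactly as it appeared on the wire."""
    return PATTERN in url


def matches_decoded(url: bytes) -> bool:
    """Undo %XX escapes first, then search the decoded bytes."""
    return PATTERN in unquote_to_bytes(url)


encoded = b"%2F%63%67%69%2D%62%69%6E%2F%70%68%66"

print(matches_raw(encoded))      # False: the pattern is hidden by the encoding
print(matches_decoded(encoded))  # True:  decoding restores /cgi-bin/phf
```

A single decoding pass is an assumption of this sketch; a URL that has been encoded twice would need further handling.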

Another example of evasion arises when analyzing SNMP (Simple Network Management Protocol) traffic. A common signature you might look for within this protocol is somebody attempting to access user accounts on a Windows NT system. The raw Snort signature for this is:

alert udp !$HOME_NET any -> $HOME_NET 161 (msg: "NETBIOS-SNMP-NT-UserList"; content: "|2b 06 01 04 01 4d 01 02 19|";)

However, SNMP allows padding within the data. With some extra padding, the data on the wire would actually be sent as:

2b 80 06 80 01 80 04 80 01 80 4d 80 01 80 02 80 19

Since the original pattern has been ‘smudged’, Snort will no longer trigger on this attack. In contrast, a protocol-analysis system will automatically unsmudge the data back into a canonical form and correctly trigger on the intrusion, no matter how much extra padding is added to the data.

Most protocols allow similar sorts of encoding or smudging that will hide the true signature of the intrusion. A protocol-analysis NIDS will automatically handle this, but most pattern-search NIDS will fail to detect the intrusion.
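The ‘unsmudging’ of the SNMP example can be sketched as follows. The sketch assumes that the padding consists of redundant 0x80 bytes placed in front of each sub-identifier of the encoded object identifier, which is what the byte sequence above suggests; a real analyzer would decode the full SNMP message rather than a bare byte string. The function name `canonicalize_oid` is hypothetical.

```python
SIGNATURE = bytes.fromhex("2b 06 01 04 01 4d 01 02 19")
PADDED = bytes.fromhex("2b 80 06 80 01 80 04 80 01 80 4d 80 01 80 02 80 19")


def canonicalize_oid(encoded: bytes) -> bytes:
    """Drop 0x80 bytes that appear at the start of a sub-identifier.

    Each sub-identifier is a run of bytes in which every byte except the last
    has its high bit set. A leading 0x80 contributes nothing to the value, so
    it is treated here as padding (an assumption of this sketch).
    """
    out = bytearray()
    at_start = True  # True when the next byte begins a new sub-identifier
    for byte in encoded:
        if at_start and byte == 0x80:
            continue  # padding byte: skip it
        out.append(byte)
        at_start = (byte & 0x80) == 0  # high bit clear ends the sub-identifier
    return bytes(out)


print(SIGNATURE in PADDED)                    # False: blind search misses it
print(SIGNATURE in canonicalize_oid(PADDED))  # True:  canonical form matches
```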

Performance Comparison

One of the reasons engineers select pattern-searching technology over protocol-analysis is the belief that it is faster. However, pattern-searching has actually proven to be slower in practice. (Protocol-analysis NIDS have come out on top in recent tests of NIDS speed.)

The misunderstanding comes from the association of ‘simple code’ with ‘fast code’: the assumption that fewer lines of code written means fewer lines of code executed. However, although protocol-analysis has more lines of code, only a few of them are executed per packet. Conversely, pattern-search requires the same lines of code to be executed over and over, resulting in more CPU resources being used.

The best way to understand this is to look at the operation of linear searching. If you had to look for the word “syzygy” in this document, you would have to look from top to bottom, comparing each word one by one. However, if I told you that this word was located in the third paragraph of the “Performance Comparison” section, your search would be dramatically quicker. The raw pattern-search logic is easy to describe to a child (‘just look at all the words’), but carrying it out takes a long time. In contrast, explaining the protocol of section titles and paragraphs is more difficult, yet quicker to carry out.

A protocol-analysis IDS has what we call a ‘decision tree’. These decisions are difficult to program into the system, but result in fewer operations carried out per packet because the system quickly zeroes in on just the relevant information. Figure 1 shows this graphically. The top shows pattern-search logic that loops over the same bit of code repeatedly. The bottom shows protocol-analysis that walks down a decision tree. While the total number of possible decisions it must theoretically make is quite large, the total number actually made is quite small because following any branch of the tree essentially ‘prunes’ the possible decisions.
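A toy version of such a decision tree is sketched below, using nested dictionaries keyed by transport protocol, port, and decoded field. The structure and the names `TREE` and `walk` are assumptions made for illustration, and the tree holds only the two signatures mentioned in this paper.

```python
# Hypothetical decision tree: transport -> port -> decoded field -> signature name.
TREE = {
    "tcp": {
        80: {"/cgi-bin/phf": "PHF"},  # HTTP branch: keyed by decoded URL
    },
    "udp": {
        161: {  # SNMP branch: keyed by canonical object identifier bytes
            bytes.fromhex("2b 06 01 04 01 4d 01 02 19"): "NETBIOS-SNMP-NT-UserList",
        },
    },
}


def walk(transport, port, field):
    """Follow one branch of the tree; every step discards all other branches."""
    node = TREE
    for key in (transport, port, field):
        node = node.get(key)
        if node is None:
            return None  # nothing under this branch: stop early
    return node


print(walk("tcp", 80, "/cgi-bin/phf"))  # PHF
print(walk("tcp", 80, "/index.html"))   # None
print(walk("udp", 53, "anything"))      # None
```

At most three lookups are made per packet here, however many signatures hang off the other branches, whereas a blind pattern search would test every signature against every packet.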

Computers & Security, Vol. 20, No. 1


Complexity

One of the issues with a pattern-search NIDS is that it gets slower as more signatures are added to the system. If the number of signatures is doubled, the NIDS will run half as fast. Therefore, a lot of time is spent pruning the signature database in order to improve performance.

This is a classical computer science issue of ‘complexity’. We would label this problem as having O(n) complexity. A more desirable level of complexity is known as O(log n). An example of this would be a phonebook. When you look up somebody’s name, you follow what is known as a ‘binary search’. To start with, you split the phonebook in half at around the letter M. If the name starts with a letter between A and M, you know the name is in the first half of the phonebook. If the name starts with a letter between N and Z, you know it is in the second half of the phonebook. Once you have selected the correct half of the phonebook, you select the middle of that section and repeat the algorithm. You keep repeating this step until you zero in on the name.

The reason that O(log n) complexity is so desirable is that if you double the size of the phonebook, you only add one extra step. Once you have selected half of the phonebook, you are now down to a problem of the original size.
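The ‘one extra step’ claim follows directly from the logarithm. The phonebook sizes used below (one and two million entries) are illustrative figures, not taken from any measurement:

```latex
\log_2(2n) = \log_2 2 + \log_2 n = \log_2 n + 1,
\qquad \text{e.g. } \lceil \log_2 10^6 \rceil = 20
\quad\text{and}\quad \lceil \log_2 (2 \times 10^6) \rceil = 21 .
```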

Protocol-analysis allows easy O(log n) signature sets where doubling the size of the rule-base has no appreciable effect on performance. Take for example the HTTP protocol-analyzer that breaks the packet down into the URL. At that point, the URL can be looked up in an alphabetically sorted list of known hostile URLs. This search works just like a phonebook search and is an O(log n) step.
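The sorted-list lookup can be sketched with Python's standard `bisect` module. Apart from ‘/cgi-bin/phf’, the URLs in the list are made-up placeholders, and `is_hostile` is a hypothetical helper name.

```python
from bisect import bisect_left

# Alphabetically sorted list of known hostile URLs.
# Only /cgi-bin/phf comes from the text; the rest are placeholders.
HOSTILE_URLS = sorted([
    "/cgi-bin/example-a",
    "/cgi-bin/example-b",
    "/cgi-bin/phf",
    "/scripts/example-c",
])


def is_hostile(url: str) -> bool:
    """Binary search: each comparison halves the remaining candidates."""
    i = bisect_left(HOSTILE_URLS, url)
    return i < len(HOSTILE_URLS) and HOSTILE_URLS[i] == url


print(is_hostile("/cgi-bin/phf"))  # True
print(is_hostile("/index.html"))   # False
```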

The Downside of Protocol-Analysis

It should be noted that protocol-analysis doesn’t come free. In the HTTP example, the actual signature matching is dramatically more accurate and faster, but the system as a whole spends more time decoding the HTTP traffic. Therefore, the NIDS becomes highly sensitive to variations in traffic. If the network is filled with lots of little HTTP requests that download small Web pages, then the NIDS will have to analyze lots of HTTP headers. Conversely, if the requested Web pages are large, then for the same amount of traffic, the NIDS will process far fewer headers.

Another example would be SNMP traffic. While testing a popular NIDS, I found that by sending 500 SNMP packets/second, the NIDS would overload and eventually shut down. In contrast, the same NIDS could handle 20 000 packets/second of HTTP traffic.

With a pattern-search NIDS, the entire NIDS is essentially a single subsystem. With a protocol-analysis NIDS, each protocol has its own little sub-NIDS. All the sub-NIDS will have different performance issues unique to themselves. Just because the NIDS can handle 20 000 packets/second of HTTP traffic doesn’t mean it won’t be brought to its knees by 500 packets/second of SNMP traffic. A raw pattern-searching system doesn’t care about the nature of the traffic, whereas the protocol-analysis system may bog down unexpectedly.


Figure 1: Graphical representation of pattern-search (top) and protocol-analysis (bottom).


Conclusion

If you want to design your own NIDS from scratch, then the easiest technology to use is certainly pattern-searching. However, it is my belief that protocol-analysis is a better technology. Rather than skimming the surface of network traffic, it digs down deep, finding out what is truly going on. It is not only more accurate, but usually faster as well.

More and more NIDS are going the route of protocol-analysis. As product vendors come up against the limits of pattern-searching techniques, they are finding they sometimes have to add an analyzer for a specific protocol in order to adequately analyze the signature. An example of this is the HTTP preprocessor in Snort, but similar examples can be found in other NIDS as well. Since each protocol analyzer is essentially an independent NIDS, it can easily be grafted onto any existing NIDS. Protocol-analysis is probably the NIDS technology of the future.

Robert Graham is a long-time developer of NIDS technology. He has spent more than 10 years in the protocol analysis industry. He is currently the Chief Technology Officer (CTO) at Network ICE, a vendor of an almost pure protocol-analysis based NIDS, containing over 50 separate protocol subsystems.
