TRANSCRIPT
Locality, Network Control and Anomaly Detection
Portland State University Computer Science
2
Outline
intro to ourmon, a network monitoring system
network control and anomaly detection
theory
pictures at an exhibition of attacks
port signatures
Gigabit Ethernet - flow measurement
what happens when you receive 1,488,000 64-byte packets a second?
answer: not enough
more information
3
the network security problem
you used IE and downloaded a file
and executed it, perhaps automagically
you used Outlook and got email
and executed it: just click here to solve the password problem
you were infected by a worm/virus thingee trolling for holes, which tries to spread itself with a known set of network-based attacks
result: your box is now controlled by a 16-year-old boy in Romania, along with 1000 others
and may be a porn server, or is simply running an IRC client or server, or has illegal music, “warez”, on it
and your box was just used to attack Microsoft or the Whitehouse
4
how do the worm/virus thingees behave on the network?
1. case the joint: try to find other points to infect:
via TCP syn scanning of remote IPs and remote ports
this is just L4, TCP SYNs, no data at L7
ports 135-139, 445, 80, 443, 1433, 1434 are popular
ports like 9898 are also popular
• and represent worms looking for worms (backdoors)
2. once they find you
they can launch known and unknown attacks at you
now we have L7 data
3. this is a random process, although targeting of certain institutions/networks is certainly common
4. a tool like agobot can be aimed at dest X, and then dest X can be “DOSsed”/scanned with a set of attacks
5
what the heck is going on out there?
Vern Paxson said: the Internet is not something you can simulate
Comer defined the term Martian
means a packet on your network that is unexpected
The notion of well-known ports, so that you can state that port 80 is used *only* for web traffic, is defunct
media vendors “may” have had something to do with this
John Gilmore said: the Internet routes around censorship
We have more and more P2P apps where 1. you don’t know the ports used, 2. things are less server-centric, and 3. crypto may be in use
6
a lot of this work was motivated by this problem
gee, where’s the “anomaly”?
now tell me what caused it …
and what caused the drops too
7
ourmon introduction
ourmon is a network monitoring system
with some similarities/differences to:
traditional SNMP RMON II
• name is a take-off on this (ourmon is not rmon)
Linux ntop is somewhat similar, also Cisco netflow
we deployed it in the PSU DMZ a number of years ago (2001)
first emphasis on network stats
• how many packets, how much TCP vs UDP, etc.
• what the heck is going on out there?
recent emphasis on detection of network anomalies
8
PSU network
Gigabit Ethernet backbone including GE connection to Inet1 and Inet2
I2 is from University of Washington (OC-3)
I1 is from State of Oregon university net (NERO)
PREN (L3) / NWAX (L2) exchange in Pittock
350 Ethernet switches at PSU
10000 live ports, 5-6k hosts
4 logical networks: resnet, OIT, CECS, 802.11
10 Cisco routers in DMZ
ourmon shows 15-30k packets per second in DMZ (normally), 2G packets per day
9
PSU network DMZ Simplified overview
DMZ net
Fricka: Pittock PSU router at NWAX exchange
gig E link
local DMZ switch
MCECS, PUBNET, OIT, RESNET
all (often redundant) L3 routers
instrumentation including ourmon front-end
Firewall ACLs exist for the most part on DMZ interior routers
ourmon.cat.pdx.edu/ourmon is within the MCECS network
10
ourmon architectural overview
a simple 2-system distributed architecture
front-end probe computer – gathers stats
back-end graphics/report processor
front-end depends on Ethernet switch port-mirroring
like Snort
does NOT use ASN.1/SNMP
summarizes/condenses data for the back-end
data is ASCII
copy the summary file via an out-of-band technique
micro_httpd/wget, or scp, or rsync, or whatever
11
ourmon current deployment in PSU DMZ
Cisco gE switch
A. gigabit connection toexchange router (Inet)
B. to Engineering
C. to dorms, OIT, etc.
2 ourmon probe boxes (P4)
port-mirror of A, B, C
ourmon graphics display box
Inet
gE
12
ourmon architectural breakdown
probe box / FreeBSD
graphics box / BSD or Linux
ourmon.conf config file
runtime:
1. N BPF expressions
2. + topn (hash table) of flows and other things (lists)
3. some hardwired C filters (scalars of interest)
pkts from NIC / kernel BPF buffer
mon.lite report file
outputs:
1. RRDTOOL strip charts
2. histogram graphs
3. various ASCII reports,
hourly summaries
tcpworm.txt
13
the front-end probe
written in C
input file: ourmon.conf (filter spec)
1-6 BPF expressions may be grouped in a named graph, and count either packets or bytes
some hardwired filters written in C
topn filters (generate lists: #1, #2, ... #N)
3 kinds of topn tuples: flow, TCP syn, icmp/udp error
output files: mon.lite, tcpworm.txt
summarization of stats by named filter
ASCII, but usually small (currently 5k)
14
the front-end probe
typically use a 7-8 megabyte kernel BPF buffer
we only look at the traditional 68-byte snap size
a la tcpdump
meaning L2-L4 HEADERS only, not data
this is starting to change
at this point, due to hash tuning, we rarely drop packets
barring massive syn attacks
the front-end is basically 2-stage:
gather packets and count according to filter type
write a report at the 30-second alarm period
15
ourmon.conf filter types
1. hardwired filters are specified as:
fixed_ipproto # tcp/udp/icmp/other pkts
packet capture filter cannot be removed
2. user-mode bpf filters (configurable):
bpf “ports” “ssh” “tcp port 22”
bpf-next “p2p” “port 1214 or port 6881 or ...”
bpf-next “web” “tcp port 80 or tcp port 443”
bpf-next “ftp” “tcp port 20 or tcp port 21”
3. topN filter (ip flows) is just:
topn_ip 60 meaning top 60 flows
16
mon.lite output file roughly like this:
pkts: caught:670744 : drops:0:
fixed_ipproto: tcp:363040008 : udp:18202658 : icmp:191109 : xtra:1230670:
bpf:ports:0:5:ssh:6063805:p2p:75721940:web:102989812:ftp:7948:email:1175965:xtra:0
topn_ip : 55216 : 131.252.117.82.3112->193.189.190.96.1540(tcp): 10338270 : etc.
useful for debug, but not intended for human use
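The mon.lite counters above can be pulled apart with a short script. This is a hedged sketch: the field layout is inferred from the example line here, not from the ourmon source, and the function name is mine.

```python
import re

# Hypothetical parser for one mon.lite counter line, such as the
# fixed_ipproto line shown above.  Assumes name:count pairs separated
# by colons, which matches the example but is not guaranteed by ourmon.
def parse_counts(line):
    """Return a dict of counter-name -> integer value for one line."""
    return {name: int(val) for name, val in re.findall(r"([a-z_]+):(\d+)", line)}

line = "fixed_ipproto: tcp:363040008 : udp:18202658 : icmp:191109 : xtra:1230670:"
counts = parse_counts(line)
# e.g. counts["tcp"] -> 363040008
```

Since the file is small ASCII (~5k), a per-line parse like this is all the back-end needs before handing numbers to RRDTOOL.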
17
back-end does graphics
written in perl: one main program, but there are now a lot of helper programs
uses Tobias Oetiker’s RRDTOOL for some graphs (hardwired and BPF): integers
as used in cricket/mrtg and other apps popular with network engineers
easy to baseline 1 year of data
logs (rrd database) have a fixed size at creation
top N uses histogram (our program) plus perl reports for topn data and other things
hourly/daily summarization of interesting things
18
hardwired-filter #1: bpf counts/drops
this happens to be yet another SQL slammer attack.
the front-end was stressed, as it lost packets due to the attack.
19
2. bpf filter output example
note: xtra means any remainder and is turned off in this graph.
note: 5 bpf filters are mapped to one graph
20
3. topN example (histogram)
21
ourmon has taught us a few hard facts about the PSU net
P2P never sleeps (although it does go up in the evening)
Internet2 wanted “new” apps. It got bittorrent.
PSU traffic is mostly TCP traffic (ditto worms)
web and P2P are the top apps
bittorrent/edonkey
PSU’s security officer spends a great deal of time chasing multimedia violations ...
Sorry students: Disney doesn’t like it when you serve up Shrek
It’s interesting that you have 1000 episodes of Friends on a server, but …
22
current PSU DMZ ourmon probe
has about 60 BPF expressions grouped in 16 graphs
many are subnet-specific (e.g., watch the dorms)
some are not (watch tcp control expressions)
about 7 hardwired graphs
including a count of IP/TCP/UDP/ICMP flows, and a worm-counter …
topn graphs include (5 fundamentally):
TCP syn’ners, IP flows (TCP/UDP/ICMP), top ports, ICMP error generators, UDP weighted errors
topn scans: ip src to ip dst, ip src to L4 ports
23
ourmon and intrusion detection
obviously it can be an anomaly detector
McHugh/Gates paraphrase: locality is a paradigm for thinking about normal behavior and the “outsider” threat
or the insider threat, if you are at a university with dorms
thesis: anomaly detection focused on:
1. network control packets; e.g., TCP syns/fins/rsts
2. errors such as ICMP packets (UDP)
3. meta-data such as flow counts, # of hash inserts
this is useful for scanner/worm finding
24
inspired by noticing this ...
mon.lite file (reconstructed), Oct 1, 2003:
note 30-second flow counts (how many tuples)
topn_ip: 163000
topn_tcp: 50000
topn_udp: 13000
topn_icmp: 100000
normal icmp flow count: 1000/30 seconds
We should have been graphing the meta-data (the flow counts).
Widespread Nachi/Welchia worm infection in PSU dorms
oops ...
25
actions taken as a result:
we use BPF/RRDTOOL to graph:
1. network “errors”: TCP resets and ICMP errors
2. TCP syns/resets/fins
3. ICMP unreachables (admin prohibit, host unreachable, etc.)
we have RRDTOOL graphs for flow meta-data:
topN flow counts
tcp SYN tuple weighted scanner numbers
the topn syn graph now shows more info
based on ip src, sorts by a work weight, shows SYNs/FINs/RESETs
and the port report …
based on SYNs / 2 functions
26
more new anomaly detectors
the TCP syn tuple is a rich vein of information
top ip src syn’ners + front-end generates a 2nd output
tcpworm.txt file (which is processed by the back-end)
port signature report, top wormy ports, stats
work metric is a measure of TCP efficiency
worm metric is a set of IP srcs deemed to have too many syns
topN ICMP/UDP error weighted metric for catching UDP scanners
27
daily topn reports are useful
top N syn reports show us the cumulative syn’ners over time (so far today, and days/week)
if many syns, few fins, few resets:
• almost certainly a scanner/worm (or trinity?)
many syns, same amount of fins: may be a P2P app
ICMP error stats
show both top TCP and UDP scanning hosts
especially in cumulative report logs
both of the above reports show MANY infected systems (and a few that are not)
28
front-end syn tuple
TCP syn tuple produced by the front-end:
(ip src, syns sent, fins received, resets received, icmp errors received, total tcp pkts sent, total tcp pkts received, set of sampled ports)
there are three underlying concepts in this tuple:
1. 2-way info exchange
2. control/data plane
3. attempt to sample TCP dst ports, with counts
29
and these metrics used in particular places
TCP work weight (0..100%):
per IP src: (S(s) + F(r) + R(r)) / total pkts
TCP worm weight (a low-pass filter):
per IP src: S(s) - F(r) > N
TCP error weight:
per IP src: (S - F) * (R + ICMP_errors)
UDP work metric:
per IP src: (U(s) - U(r)) * (ICMP_errors)
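The four metrics can be written directly from the formulas on this slide. A minimal sketch; the function and parameter names are mine, the arithmetic follows the slide:

```python
# Per-IP-src metrics from the front-end syn tuple, as defined above.

def work_weight(syns_sent, fins_recv, rsts_recv, total_pkts):
    """TCP work weight, 0..100%: share of control packets in all TCP pkts."""
    if total_pkts == 0:
        return 0
    return 100 * (syns_sent + fins_recv + rsts_recv) // total_pkts

def worm_flag(syns_sent, fins_recv, threshold):
    """TCP worm weight (low-pass filter): true if SYNs outrun FINs by N."""
    return (syns_sent - fins_recv) > threshold

def error_weight(syns_sent, fins_recv, rsts_recv, icmp_errors):
    """TCP error weight: (S - F) * (R + ICMP_errors)."""
    return (syns_sent - fins_recv) * (rsts_recv + icmp_errors)

def udp_work(udp_sent, udp_recv, icmp_errors):
    """UDP work metric: (U(s) - U(r)) * ICMP_errors."""
    return (udp_sent - udp_recv) * icmp_errors
```

A scanner with 90 SYNs out, almost no FINs back, and 100 total packets scores a work weight near 100%; a busy but healthy host stays low because data packets dominate the denominator.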
30
EWORM flag system
TCP syn tuple only:
E, if the error weight > 100000
W, if the work weight is greater than 90%
O (ouch), if there are some SYNs and few FINs returned (no FINs, more or less)
R (reset), if there are RESETs
M (keeping mum flag), if no packets are sent back, therefore not 2-way
what does that spell? (often it spells WOM)
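The flag logic above fits in a few lines. A hedged sketch: the two thresholds (100000 and 90%) come from the slide, but the strict reading of "O" as zero FINs and the function signature are my assumptions.

```python
# EWORM flags from one TCP syn tuple, per the slide's definitions.
# Thresholds for E and W are from the slide; the O test below takes
# "no FINs more or less" literally as fins_recv == 0 (an assumption).
def eworm_flags(syns_sent, fins_recv, rsts_recv, pkts_recv,
                work_weight, error_weight):
    flags = ""
    if error_weight > 100000:
        flags += "E"                        # E: error weight too high
    if work_weight > 90:
        flags += "W"                        # W: work weight above 90%
    if syns_sent > 0 and fins_recv == 0:
        flags += "O"                        # O (ouch): SYNs but no FINs
    if rsts_recv > 0:
        flags += "R"                        # R: resets seen
    if pkts_recv == 0:
        flags += "M"                        # M: keeping mum, not 2-way
    return flags
```

A hard scanner lights up all five ("EWORM"); a host doing the common scan pattern with some resets but a moderate error weight spells the "WOM" the slide mentions.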
31
6:00 am TCP attack - BPF net errors
32
topn RRD flow count graph
33
bpf TCP control
34
6 am TCP top syn (this filter is useful...)
35
topn syn syslog sort
start log time : instances : DNS/ip : syns/fins/resets total counts : end log time
----------------------------------------------------------------------------------------
Wed Mar 3 00:01:04 2004 : 777 : host-78-50.dhcp.pdx.edu : 401550:2131:2983 : Wed Mar 3 07:32:36 2004
Wed Mar 3 00:01:04 2004 : 890 : host-206-144.resnet.pdx.edu : 378865:1356:4755 : Wed Mar 3 08:01:03 2004
Wed Mar 3 00:01:04 2004 : 876 : host-245-190.resnet.pdx.edu : 376983:1919:8041 : Wed Mar 3 08:01:03 2004
Wed Mar 3 00:01:04 2004 : 674 : host-244-157.resnet.pdx.edu : 348895:8468:29627 : Wed Mar 3 08:01:03 2004
36
1st graph you see in the morning:
37
BPF: in or out?
38
BPF ICMP unreachables
39
hmm... size is 100.500 bytes
40
flow picture: UDP counts up
41
bpf subnet graph:
OK, it came from the dorms (this is rare ..., it takes a strong signal)
42
top ICMP shows the culprit’s IP
8 out of 9 slots taken by ICMP errors back to 1 host
43
the worm graph (worm metric)
44
port signatures in tcpworm.txt
port signature report simplified:
ip src : flags : work : port-tuples
131.252.208.59 () 39: (25,93)(2703,2) ...
131.252.243.162 () 22: (6881,32)(6882,34)(6883,5) ...
211.73.30.187 (WOM) 100: (80,11)(135,11)(139,11)(445,11)(1025,11)(2745,11)(3127,11)(5000,11)
211.73.30.200 (WOM) 100: port sig same as above
212.16.220.10 (WOM) 100: (135,100)
219.150.64.11 (WOM) 100: (5554,81)(9898,18)
45
tcpworm.txt stats show:
on May 20, 2004
top 20 ports for the “W” metric are:
445, 135, 5554, 9898, 80, 2745, 6667,
1025, 139, 6129, 3127, 44444, 2125, 6666,
7000, 9900, 5000, 1433
www.incidents.org reports at this time:
port 135 traffic increase due to bobax.c
ports mentioned are 135, 445, 5000
46
Gigabit Ethernet speed testing
test questions: what happens when we hit ourmon and its various kinds of filters
1. with max MTU packets at gig E rate?
can we do a reasonable amount of work?
can we capture all the packets?
2. with min-sized packets (64 bytes)?
same questions
3. is any filter kind better/worse than any other?
topn in particular (answer: it is worse)
and by the way, roll the IP addresses (insert-only)
47
Gigabit Ethernet - Baseline
acc. to TR-645-2, Princeton University, Karlin, Peterson, “Maximum Packet Rates for Full-Duplex Ethernet”:
3 numbers of interest for gE:
min-packet theoretical rate: 1488 Kpps (64 bytes)
max-packet theoretical rate: 81.3 Kpps (1518 bytes)
min-packet end-to-end time: 672 ns
note: the min-pkt inter-frame gap for gigabit is 96 ns (not a lot of time between packets ...)
an IXIA 1600 packet generator can basically send min/max at those rates
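The three numbers above follow from standard Ethernet framing overhead: every frame carries 8 bytes of preamble plus a 12-byte (96-bit-time) inter-frame gap, and at 1 Gbit/s one bit takes one nanosecond. A quick check:

```python
# Theoretical gigabit Ethernet packet rates: each frame pays
# 8 bytes preamble + 12 bytes inter-frame gap on the wire.
def gige_pps(frame_bytes):
    wire_bits = (frame_bytes + 8 + 12) * 8   # total bits on the wire
    return 1_000_000_000 // wire_bits        # frames per second at 1 Gb/s

min_rate = gige_pps(64)            # ~1,488,095 pps -> the 1488 Kpps figure
max_rate = gige_pps(1518)          # ~81,274 pps    -> the 81.3 Kpps figure
frame_time_ns = (64 + 8 + 12) * 8  # 672 ns per minimum-sized frame
```

So the 672 ns end-to-end time is just the wire time of one minimum frame including its overhead, which is the entire per-packet budget the monitor gets.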
48
test setup for ourmon/bpf measurement
ixia 1600 gE packetgenerator
1. min-sized pkts
2. max-sized pkts
1.7 GHz AMD 2000, UNIX/FreeBSD
64-bit PCI / syskonnect NIC
+ ourmon/bpf
ixia
Packet Engines line-speed gigabit ethernet switch
UNIXpc
ixia sends/recvs packets
port-mirror of IXIA send port
49
test notes
keep in mind that ourmon snap length is 68 bytes (includes L4 headers, not data)
The kernel BPF is not capturing all of a max-sized packet
An IDS like snort must capture all of the pkt
it must run an arbitrary set of signatures over an individual packet
reconstruct flows
undo fragmentation
shove it in a database? (may I say… GAAAA)
50
maximum packets results
with work including 32 bpf expressions, top N, and hardwired filters:
we can always capture all the packets
but we need an N-megabyte BPF buffer in the kernel
add bpfs, add kernel buffer size
8 megabytes (true for Linux too)
this was about the level of real work we were doing at the time in the real world
51
minimum packet results
using only the basic count/drop filter
NO OTHER WORK!
using any size of kernel buffer (didn’t matter)
we start dropping packets at around 80 mbit speed (10% of the line rate, with overhead)
this is only with the drop/count filter!
if you want to do real work, 30-50 mbits is more like it
we are challenged by any opponent that has a 100mbit NIC card ... (or a distributed zombie)
52
why is min performance so poor?
Two complementary points of view:
1. there is not enough time to do any real work (you have 500 ns or so)
2. the bottom half of the OS runs at HW priority; interrupts prevent the top half from running (enough) to avoid drops. note that growing the kernel buffer doesn’t help
research question: what is to be done?
btw: this is why switch/router vendors usually publish performance stats on min-sized pkts.
53
also top N has a problem
random inserts mean the bucket lookup always fails
followed by memory allocation/cache churn
random IP src and/or random IP dst
how to deal with this?
one obvious answer: make sure the hash algorithm is optimized as much as possible
improved lookup certainly does help
insert is logically: lookup to find the correct bucket + insert (node allocation/setup/chaining)
our hash bucket size was way too small ...
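The pathology can be shown with a toy chained hash table. This is an illustration only: bucket count and packet count are made up, and real ourmon is C, not Python; the point is that randomized source addresses make (nearly) every lookup miss, so every packet pays for a node allocation and chain insert.

```python
import random

# Toy model of the top-N problem under randomized source IPs.
NBUCKETS = 1 << 16
table = [[] for _ in range(NBUCKETS)]   # chained hash table
random.seed(1)

hits = inserts = 0
for _ in range(10_000):
    ip = random.getrandbits(32)          # attacker rolls the IP src
    bucket = table[ip % NBUCKETS]
    if ip in bucket:
        hits += 1                        # normal repeat traffic lands here
    else:
        bucket.append(ip)                # miss: allocate + chain a new node
        inserts += 1
```

With real traffic the same flows repeat, so lookups hit and inserts are rare; with random 32-bit addresses essentially all 10,000 packets take the expensive insert path, which is the cache-churn behavior described above.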
54
this explains my long-standing question of
why does ourmon sometimes drop packets?
in the drop/count graph
but you couldn’t find any reason for it
reason: we were looking at BIG things like flows,
not small things like TCP syn attacks
which do not add up to anything in the way of a megabyte flow
and may be distributed
small packets are evil...
55
ourmon gigabit test conclusions
min packets are a problem
no filters and still overflows
topn needed optimization (now bpf needs optimization)
does the topn problem apply to route caching in routers?
a different parallel architecture is needed
for min pkts
consequences for an IDS snort system are terrible
easy to construct a DOS attack that can sneak packets by snort
80 mbits of 64-byte packets will likely clog it
to say nothing of a concerted zombie attack
less could always do the trick, depending on the exact circumstances and the amount of work done in the monitor
56
current work
auto-capture packets with a packet-capture probe
front-end driven with a simple trigger
auto-capture the new big worm graph event
correlate ip sources across back-end topn reports; e.g., did ip src X show up in
top_udp flows, udp scanners, icmp errors?
statistical analysis of the two basic TCP metrics
under normal and adverse conditions
can we build a prototype intrusion prevention system? how about a jail for “bad” hosts?
57
other ideas
simple L7 scanning to determine information about P2P
IRC hacker control plane
what about traffic analysis in a more general sense?
what if all traffic is encrypted, L4 and up?
signal graph correlation analysis
RRDTOOL graphs can correlate in ourmon AND the router/switch infrastructure, so can we find the switch port fast?
what about saving the last N days’ worth of packets for analysis?
tons of data. but given IP X infected, can I determine how that happened? or was there an IRC correlation with a DDOS?
58
more ourmon information
ourmon.cat.pdx.edu/ourmon - PSU stats and download page
ourmon.cat.pdx.edu/ourmon/info.html - how it works (not totally out of date)
talk at http://www.cs.pdx.edu/~jrb/ourmon/ourmon.pdf
2 technical reports … if you want to read them, write to me