Post on 14-Dec-2015

MINDS: Data Mining Based Network Intrusion Detection System

Vipin Kumar [email protected]

Army High Performance Computing Research Center University of Minnesota

http://www.cs.umn.edu/research/minds/

Team Members: Eric Eilertson, Paul Dokas, Levent Ertoz, Ben Mayer, Aleksandar Lazarevic, Michael Steinbach, George Simon, Varun Chandola, Mark Shaneck, Jaideep Srivastava, Zhi-Li Zhang, Yongdae Kim, Vipin Kumar

1AHPCRC

Information Assurance

• Sophistication of cyber attacks and their severity is increasing
– ARL, the Army, DOD and other U.S. Government agencies are major targets for sophisticated state sponsored cyber terrorists
– Cyber strategies can be a major force multiplier and equalizer
– Across DoD, computer assets have been compromised and information has been stolen, putting technological advantage and battlefield superiority at risk

• Security mechanisms always have inevitable vulnerabilities
– Firewalls are not sufficient to ensure security in computer networks
– Insider attacks

[Figure: Incidents reported to Computer Emergency Response Team/Coordination Center, 1990 to 2002 (vertical axis 0 to 90000)]

[Figure: Spread of SQL Slammer worm 10 minutes after its deployment]

Information Assurance

www.snort.org

Example of SNORT rule (MS-SQL “Slammer” worm)

any -> udp port 1434 (content:"|81 F1 03 01 04 9B 81 F1 01|"; content:"sock"; content:"send")

• Intrusion Detection System
– Combination of software and hardware that attempts to perform intrusion detection
– Raises the alarm when a possible intrusion happens

• Traditional intrusion detection system (IDS) tools are based on signatures of known attacks

• Limitations
– Signature database has to be manually revised for each new type of discovered intrusion
– Substantial latency in deployment of newly created signatures across the computer system
– They cannot detect emerging cyber threats
– Not suitable for detecting policy violations and insider abuse
– Do not provide understanding of network traffic
– Generate too many false alarms
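The signature idea can be illustrated with a minimal Python sketch. It assumes a packet is available as a protocol name, a destination port and a raw payload; the function name and the sample payloads are hypothetical, and only the byte pattern and the two strings come from the rule shown on the slide. It is not how SNORT itself is implemented.

```python
# Byte pattern and strings taken from the slide's simplified Slammer rule.
SLAMMER_BYTES = bytes.fromhex("81F10301049B81F101")
SLAMMER_STRINGS = (b"sock", b"send")


def matches_slammer_signature(protocol, dst_port, payload):
    """Return True if a packet satisfies every condition of the signature."""
    if protocol != "udp" or dst_port != 1434:
        return False
    return SLAMMER_BYTES in payload and all(s in payload for s in SLAMMER_STRINGS)


# Hypothetical payloads for illustration only (not real worm traffic).
known = b"\x04" + SLAMMER_BYTES + b"...sock...send"
variant = known.replace(b"\x9b", b"\x9c")  # a single byte changed

print(matches_slammer_signature("udp", 1434, known))    # True
print(matches_slammer_signature("udp", 1434, variant))  # False
```

The second call shows the limitation listed above: a variant that differs in one byte no longer matches until someone revises the signature.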

Data Mining for Intrusion Detection

• Increased interest in data mining based intrusion detection
– Attacks for which it is difficult to build signatures
– Unforeseen/Unknown/Emerging attacks

• Misuse detection
– Building predictive models from labeled data sets (instances are labeled as “normal” or “intrusive”) to identify known intrusions
– High accuracy in detecting many kinds of known attacks
– Cannot detect unknown and emerging attacks

• Anomaly detection
– Detect novel attacks as deviations from “normal” behavior
– Potentially high false alarm rate: previously unseen (yet legitimate) system behaviors may also be recognized as anomalies

Data Mining for Intrusion Detection

• Misuse Detection – Building Predictive Models

[Figure: a classifier model is learned from the training set and applied to the test set. Attribute types shown: categorical, temporal, continuous, class]

Training Set

Tid  SrcIP          Start time  Dest IP         Dest Port  Number of bytes  Attack
1    206.135.38.95  11:07:20    160.94.179.223  139        192              No
2    206.163.37.95  11:13:56    160.94.179.219  139        195              No
3    206.163.37.95  11:14:29    160.94.179.217  139        180              No
4    206.163.37.95  11:14:30    160.94.179.255  139        199              No
5    206.163.37.95  11:14:32    160.94.179.254  139        19               Yes
6    206.163.37.95  11:14:35    160.94.179.253  139        177              No
7    206.163.37.95  11:14:36    160.94.179.252  139        172              No
8    206.163.37.95  11:14:38    160.94.179.251  139        285              Yes
9    206.163.37.95  11:14:41    160.94.179.250  139        195              No
10   206.163.37.95  11:14:44    160.94.179.249  139        163              Yes

Test Set

Tid  SrcIP          Start time  Dest IP         Number of bytes  Attack
1    206.163.37.81  11:17:51    160.94.179.208  150              ?
2    206.163.37.99  11:18:10    160.94.179.235  208              ?
3    206.163.37.55  11:34:35    160.94.179.221  195              ?
4    206.163.37.37  11:41:37    160.94.179.253  199              ?
5    206.163.37.41  11:55:19    160.94.179.244  181              ?

• Summarization of attacks using association rules

Rules Discovered:

{Src IP = 206.163.37.95, Dest Port = 139, Bytes in [150, 200]} --> {ATTACK}

• Anomaly Detection

Key Technical Challenges
– Large data size
– High dimensionality
– Temporal nature of the data
– Skewed class distribution
– Data preprocessing
– On-line analysis


MINDS – Minnesota INtrusion Detection System

[Figure: MINDS system architecture. Components shown: network, data capturing device (net flow tools, tcpdump), filtering, feature extraction, known attack detection (detected known attacks), anomaly detection (anomaly scores), human analyst (labels), association pattern analysis, summary and characterization of attacks, detected novel attacks]

• Data mining based intrusion detection system
• Incorporated into Interrogator architecture at ARL Center for Intrusion Monitoring and Protection (CIMP)
– Helps analyze data from multiple sensors at DoD sites around the country
– MINDS anomalies are used as the primary key when viewing related alerts from other tools (SNORT, Jids, etc.)
• MINDS is the first effective anomaly intrusion detection system used by ARL
• Routinely detects attacks and intrusive behavior not detected by widely used intrusion detection systems
– Insider Abuse / Policy Violations / Worms / Scans

Feature Extraction Module

• Three groups of features
– Basic features of individual TCP connections
  • Source & destination IP – Features 1 & 2
  • Source & destination port – Features 3 & 4
  • Protocol – Feature 5
  • Duration – Feature 6
  • Bytes per packet – Feature 7
  • Number of bytes – Feature 8
– Time based features
  • For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in last T seconds – Features 9 (13)
  • Number of connections from source (destination) IP to the same destination (source) port in last T seconds – Features 11 (15)
– Connection based features
  • For the same source (destination) IP address, number of unique destination (source) IP addresses inside the network in last N connections – Features 10 (14)
  • Number of connections from source (destination) IP to the same destination (source) port in last N connections – Features 12 (16)
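The two source-side time based features (Features 9 and 11) can be sketched in Python as follows. This is a minimal illustration rather than the MINDS implementation: the `Flow` record, the function name and the sample flows are hypothetical, flows are assumed to be sorted by time, and the "inside the network" restriction is left out for brevity.

```python
from collections import namedtuple

# Hypothetical flow record; field names are assumptions for this sketch.
Flow = namedtuple("Flow", "time src_ip dst_ip dst_port")


def time_window_features(flows, window_seconds):
    """For each flow, count activity by the same source in the last T seconds.

    Returns one (unique_dst_ips, same_port_connections) pair per flow.
    `flows` must be sorted by time; each flow is included in its own counts.
    """
    features = []
    start = 0  # index of the oldest flow still inside the window
    for i, flow in enumerate(flows):
        while flows[start].time < flow.time - window_seconds:
            start += 1
        recent = [f for f in flows[start:i + 1] if f.src_ip == flow.src_ip]
        unique_dst_ips = len({f.dst_ip for f in recent})
        same_port = sum(1 for f in recent if f.dst_port == flow.dst_port)
        features.append((unique_dst_ips, same_port))
    return features


flows = [
    Flow(0, "A", "d1", 80),
    Flow(1, "A", "d2", 80),
    Flow(1, "B", "d1", 80),
    Flow(2, "A", "d3", 22),
    Flow(20, "A", "d1", 80),
]
print(time_window_features(flows, window_seconds=10))
# [(1, 1), (2, 2), (1, 1), (3, 1), (1, 1)]
```

The connection based variants (Features 10 and 12) would replace the time window with the last N connections, which keeps slow scans visible even when they fall outside any T-second window.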

Detection of Anomalies on Real Network Data

Anomalies/attacks picked by MINDS include scanning activities, worms, and non-standard behavior such as policy violations and insider attacks. Many of the attacks detected by MINDS were already on the CERT/CC list of recent advisories and incident notes.

Some illustrative examples of intrusive behavior detected using MINDS at U of M

• Scans
– Detected scanning for Microsoft DS service on port 445/TCP
  • Undetected by SNORT since the scanning was non-sequential (very slow). Rule added to SNORT in September 2002
– Detected scanning for Oracle server
  • Undetected by SNORT because the scanning was hidden within another Web scanning
– Detected a distributed Windows networking scan from multiple source locations

• Policy Violations
– Identified machine running Microsoft PPTP VPN server on non-standard ports
  • Undetected by SNORT since the collected GRE traffic was part of the normal traffic
– Identified compromised machines running FTP servers on non-standard ports, which is a policy violation
  • Example of anomalous behavior following a successful Trojan horse attack
– Detected computers on the network apparently communicating with outside computers over a VPN or on IPv6

• Worms
– Detected several instances of the Slapper worm that were not identified by SNORT since they were variations of existing worm code
– Detected unsolicited ICMP ECHOREPLY messages to a computer previously infected with the Stacheldraht worm (a DDoS agent)

Typical Anomaly Detection Output

– January 26, 2003 (48 hours after the “slammer” worm)

score srcIP sPort dstIP dPort protocol flags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
37674.69 63.150.X.253 1161 128.101.X.29 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 0
26676.62 63.150.X.253 1161 160.94.X.134 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.59 0 0 0 0 0
24323.55 63.150.X.253 1161 128.101.X.185 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
21169.49 63.150.X.253 1161 160.94.X.71 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
19525.31 63.150.X.253 1161 160.94.X.19 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
19235.39 63.150.X.253 1161 160.94.X.80 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
17679.1 63.150.X.253 1161 160.94.X.220 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.81 0 0.58 0 0 0 0 0
8183.58 63.150.X.253 1161 128.101.X.108 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.58 0 0 0 0 0
7142.98 63.150.X.253 1161 128.101.X.223 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
5139.01 63.150.X.253 1161 128.101.X.142 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
4048.49 142.150.Y.101 0 128.101.X.127 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
4008.35 200.250.Z.20 27016 128.101.X.116 4629 17 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
3657.23 202.175.Z.237 27016 128.101.X.116 4148 17 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
3450.9 63.150.X.253 1161 128.101.X.62 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
3327.98 63.150.X.253 1161 160.94.X.223 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2796.13 63.150.X.253 1161 128.101.X.241 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2693.88 142.150.Y.101 0 128.101.X.168 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2683.05 63.150.X.253 1161 160.94.X.43 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2444.16 142.150.Y.236 0 128.101.X.240 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2385.42 142.150.Y.101 0 128.101.X.45 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
2114.41 63.150.X.253 1161 160.94.X.183 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
2057.15 142.150.Y.101 0 128.101.X.161 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1919.54 142.150.Y.101 0 128.101.X.99 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1634.38 142.150.Y.101 0 128.101.X.219 2048 1 16 [2,4) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1596.26 63.150.X.253 1161 128.101.X.160 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1513.96 142.150.Y.107 0 128.101.X.2 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1389.09 63.150.X.253 1161 128.101.X.30 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1315.88 63.150.X.253 1161 128.101.X.40 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.82 0 0.57 0 0 0 0 0
1279.75 142.150.Y.103 0 128.101.X.202 2048 1 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1237.97 63.150.X.253 1161 160.94.X.32 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0
1180.82 63.150.X.253 1161 128.101.X.61 1434 17 16 [0,2) [0,1829) 0 0 0 0 0 0 0 0 0.83 0 0.56 0 0 0 0 0

• Anomalous connections that correspond to the “slammer” worm
• Anomalous connections that correspond to the ping scan
• Connections corresponding to UM machines connecting to “half-life” game servers

Summarization Using Association Patterns

[Figure: ranked connections from the anomaly detection system, labeled attack and normal, feed the discriminating association pattern generator, which updates the knowledge base. Example rules shown:
R1: TCP, DstPort=1863 → Attack
R100: TCP, DstPort=80 → Normal]

1. Build normal profile
2. Study changes in normal behavior
3. Create attack summary
4. Detect misuse behavior
5. Understand nature of the attack
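One simple way to picture discriminating patterns is to compare how often an attribute value occurs among connections ranked as attacks versus connections ranked as normal. The Python sketch below is an assumption-laden illustration, not the MINDS pattern generator: it scores single attribute values only (no multi-attribute itemsets), and the function name, thresholds and sample data are hypothetical.

```python
from collections import Counter


def discriminating_items(attack_flows, normal_flows, min_support=0.5, min_ratio=2.0):
    """Label (attribute, value) pairs that are frequent in one set but not the other."""

    def support(flows):
        counts = Counter(item for flow in flows for item in set(flow.items()))
        return {item: n / len(flows) for item, n in counts.items()}

    attack_sup = support(attack_flows)
    normal_sup = support(normal_flows)
    rules = []
    for item in sorted(set(attack_sup) | set(normal_sup)):
        a = attack_sup.get(item, 0.0)
        n = normal_sup.get(item, 0.0)
        if a >= min_support and a >= min_ratio * n:
            rules.append((item, "Attack"))
        elif n >= min_support and n >= min_ratio * a:
            rules.append((item, "Normal"))
    return rules


# Hypothetical data echoing the two example rules on the slide.
attack = [{"proto": "TCP", "dst_port": 1863}] * 3
normal = [{"proto": "TCP", "dst_port": 80}] * 3 + [{"proto": "UDP", "dst_port": 53}]

print(discriminating_items(attack, normal))
# [(('dst_port', 80), 'Normal'), (('dst_port', 1863), 'Attack')]
```

Note that `proto = TCP` is frequent in both sets and is therefore not reported: it does not discriminate between attack and normal traffic.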

Typical MINDS Output

score c1 c2 src IP sPort dst IP dPort protocol flags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
31.2 - - 218.19.X.168 5002 134.84.X.129 4182 6 27 [5,6) [0,2045) 0 0.01 0.01 0.03 0 0 0 0 0 0 0 0 0 0 1 0
3.04 138 12 64.156.X.74 ----- xxx.xxx.xxx.xxx ----- xxx 4 [0,2) [0,2045) 0.12 0.48 0.26 0.58 0 0 0 0 0.07 0.27 0 0 0 0 0 0
15.4 - - 218.19.X.168 5002 134.84.X.129 4896 6 27 [5,6) [0,2045) 0.01 0.01 0.01 0.06 0 0 0 0 0 0 0 0 0 0 1 0
14.4 - - 134.84.X.129 4770 218.19.X.168 5002 6 27 [5,6) [0,2045) 0.01 0.01 0.05 0.01 0 0 0 0 0 0 1 0 0 0 0 0
7.81 - - 134.84.X.129 3890 218.19.X.168 5002 6 27 [5,6) [0,2045) 0.01 0.02 0.09 0.02 0 0 0 0 0 0 1 0 0 0 0 0
3.09 4 1 xxx.xxx.xxx.xxx 4729 xxx.xxx.xxx.xxx ----- 6 ------ --------- --------- 0.14 0.33 0.17 0.47 0 0 0 0 0 0 0.2 0 0 0 0 0
2.41 64 8 xxx.xxx.xxx.xxx ----- 200.75.X.2 ----- xxx ------ --------- [0,2045) 0.33 0.27 0.21 0.49 0 0 0 0 0 0 0 0 0.28 0.25 0.01 0
6.64 - - 218.19.X.168 5002 134.84.X.129 3676 6 27 [5,6) [0,2045) 0.03 0.03 0.03 0.15 0 0 0 0 0 0 0 0 0 0 0.99 0
5.6 - - 218.19.X.168 5002 134.84.X.129 4626 6 27 [5,6) [0,2045) 0.03 0.03 0.03 0.17 0 0 0 0 0 0 0 0 0 0 0.98 0
2.7 12 0 xxx.xxx.xxx.xxx ----- xxx.xxx.xxx.xxx 113 6 2 [0,2) [0,2045) 0.25 0.09 0.15 0.15 0 0 0 0 0 0 0.08 0 0.79 0.15 0.01 0
4.39 - - 218.19.X.168 5002 134.84.X.129 4571 6 27 [5,6) [0,2045) 0.04 0.05 0.05 0.26 0 0 0 0 0 0 0 0 0 0 0.96 0
4.34 - - 218.19.X.168 5002 134.84.X.129 4572 6 27 [5,6) [0,2045) 0.04 0.05 0.05 0.23 0 0 0 0 0 0 0 0 0 0 0.97 0
4.07 8 0 160.94.X.114 51827 64.8.X.60 119 6 24 [483,-) [8424,-) 0.09 0.26 0.16 0.24 0 0 0 0.91 0 0 0 0 0 0 0 0
3.49 - - 218.19.X.168 5002 134.84.X.129 4525 6 27 [5,6) [0,2045) 0.06 0.06 0.06 0.35 0 0 0 0 0 0 0 0 0 0 0.93 0
3.48 - - 218.19.X.168 5002 134.84.X.129 4524 6 27 [5,6) [0,2045) 0.06 0.06 0.07 0.35 0 0 0 0 0 0 0 0 0 0 0.93 0
3.34 - - 218.19.X.168 5002 134.84.X.129 4159 6 27 [5,6) [0,2045) 0.06 0.07 0.07 0.37 0 0 0 0 0 0 0 0 0 0 0.92 0
2.46 51 0 200.75.X.2 ----- xxx.xxx.xxx.xxx 21 6 2 --------- [0,2045) 0.19 0.64 0.35 0.32 0 0 0 0 0.18 0.44 0 0 0 0 0 0
2.37 42 5 xxx.xxx.xxx.xxx 21 200.75.X.2 ----- 6 20 --------- [0,2045) 0.35 0.31 0.22 0.57 0 0 0 0 0 0 0 0 0.18 0.28 0.01 0
2.45 58 0 200.75.X.2 ----- xxx.xxx.xxx.xxx 21 6 ------ --------- [0,2045) 0.19 0.63 0.35 0.32 0 0 0 0 0.18 0.44 0 0 0 0 0 0

• UM computer connecting to a remote FTP server, running on port 5002
• Summarized TCP reset packets received from 64.156.X.74, which is a victim of a DoS attack; we were observing backscatter, i.e. replies to spoofed packets
• Summarization of an FTP scan from a computer in Colombia, 200.75.X.2
• Summary of IDENT lookups, where a remote computer tries to get a user name
• Summarization of a USENET server transferring a large amount of data

Typical MINDS Output

score c1 c2 src IP sPort dst IP dPort prot flags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14
611 - - 128.118.x.96 873 160.94.x.50 4529 6 ---AP--- [24k,124k] [20M,182M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0
348 - - 160.94.x.50 4529 128.118.x.96 873 6 ---A---- [24k,124k] [3M,5M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0
24 - - 128.101.x.33 20 200.95.x.2255001 6 ---AP--- [24k,124k] [20M,182M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0
11 - - 24.223.x.59 1135 160.94.x.1 554 6 ---APRSF [338,379] [15k,17k] 0.08 0.1 0.1 0.3 0 0 0 1 0 0 0 0 0 0
7.8 11 0 x.x.x.x 8200 160.94.x.154 --- 6 ---AP-SF [4,4] --- 0.36 0.4 0.7 0.1 0 0 0 0 0 0.2 0.1 0 0 0
10 - - 128.101.x.173 22 24.26.x.13 4949 6 ---AP--- [24k,124k] [3M,5M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0
9.6 - - 128.101.x.113 20 81.168.x.40 ### 6 ---AP-SF [24k,124k] [20M,182M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0
9.5 - - 192.18.x.40 ### 134.84.x.19 ### 6 ---AP--F [24k,124k] [20M,182M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0
9.5 - - 192.18.x.40 ### 134.84.x.19 ### 6 ---AP--F [24k,124k] [20M,182M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0
9.4 - - 24.33.x.62 2011 160.94.x.150 3989 6 ---AP-SF [217,217] [252k,265k] 0.16 0.2 0.3 0.2 0 0 1 1 0 0 0 0 0 0
7.8 13 1 x.x.x.x 8200 134.84.x.21 --- 6 ---AP-SF [4,4] --- 0.37 0.4 0.7 0.3 0 0 0 0 0 0.1 0 0 0 0
9.1 - - 24.33.x.62 2011 160.94.x.150 4010 6 ---AP-SF [217,217] [252k,265k] 0.16 0.2 0.3 0.1 0 0 1 1 0 0 0 0 0 0
9.1 - - 24.33.x.62 2011 160.94.x.150 3995 6 ---AP-SF [217,217] [252k,265k] 0.16 0.2 0.3 0.1 0 0 1 1 0 0 0 0 0 0
9.1 - - 24.33.x.62 2011 160.94.x.150 3992 6 ---AP-SF [217,217] [252k,265k] 0.16 0.2 0.3 0.1 0 0 1 1 0 0 0 0 0 0
9 - - 24.33.x.62 2011 160.94.x.150 4007 6 ---AP-SF [217,217] [252k,265k] 0.16 0.2 0.3 0.1 0 0 1 1 0 0 0 0 0 0
8.9 - - 24.33.x.62 2011 160.94.x.150 4004 6 ---AP-SF [218,234] [265k,309k] 0.16 0.2 0.3 0.1 0 0 1 1 0 0 0 0 0 0
8.9 - - 24.33.x.62 2011 160.94.x.150 4001 6 ---AP-SF [217,217] [252k,265k] 0.16 0.2 0.3 0.1 0 0 1 1 0 0 0 0 0 0
5.7 10 # 63.251.x.177 8200 x.x.x.x --- 6 ---AP-SF [4,4] --- 0.38 0.4 0.3 0.4 0 0 0 0 0 0 0.1 0 0 0
7.3 27 7 66.151.x.190 8200 x.x.x.x --- 6 ---AP-SF [4,4] [559,559] 0.39 0.4 0.7 0.2 0 0 0 0 0 0.2 0 0 0 0

• UM computers doing bulk transfers
• Attack on Real-Media server (reported by CERT on September 9, 2003: RealNetworks media server RTSP protocol parser buffer overflow)
• 8200/tcp traffic related to gotomypc.com, which allows users to remotely control a desktop (involves a third party)
• Mysterious traffic currently being investigated

Typical MINDS Output

score c1 c2 src IP sPort dst IP dPort protocol flags packets bytes 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
57973 - - 128.101.X.1 56025 192.67.X.205 22 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
6530 - - 141.213.X.100 4354 160.94.X.142 59999 tcp ---AP-SF [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
3227 - - 192.67.X.206 43710 128.101.X.1 22 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
1534 - - 160.94.X.142 59999 141.213.X.100 4354 tcp ---A--SF [32k,1M] [3M,8M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
19.3 9 67 193.62.X.38 ----- 160.94.X.132 ----- tcp ---A--SF --------- --------- 0.3 0.3 0.3 0.5 0 0 0 0.1 0.1 0 0.1 0 0.1 0 0 0
14.9 23 81 134.84.X.117 ----- xxx.xxx.xxx.xxx ----- tcp ---AP--- --------- --------- 0.3 0.3 0.3 0.5 0 0 0 0.1 0.1 0 0.1 0 0.1 0 0 0
26.6 81 258 208.2.X.101 ----- xxx.xxx.xxx.xxx 139 tcp ------S- [4,4] --------- 0.2 0.3 0.3 0.4 0 0 0 0 0.1 0 0.1 0 0.1 0 0 0
88.2 5 1 208.2.X.101 ----- xxx.xxx.xxx.xxx 139 tcp ------S- [4,4] [200,200] 0 0.1 0 0.1 0 0 0 0 0 0 0 0 0 0 1 0
143 - - 160.94.X.132 35755 193.62.X.38 45288 tcp ---A---F [32k,1M] [1M,3M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
117 - - 144.34.X.164 1676 128.101.X.190 22 tcp ---A---- [32k,1M] [1M,3M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
13.4 4 31 128.101.X.204 ----- xxx.xxx.xxx.xxx ----- tcp ---A---F --------- --------- 0.3 0.3 0.3 0.5 0 0 0 0.2 0.1 0 0.1 0 0 0 0 0
12.3 11 101 xxx.xxx.xxx.xxx ----- 134.84.X.117 ----- tcp ---AP--- --------- --------- 0.3 0.2 0.5 0.3 0 0 0 0.1 0.1 0.1 0.1 0 0.1 0 0 0
58.9 - - 134.84.X.2 554 67.40.X.170 62727 tcp ---AP-S- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
54 - - 128.101.X.39 54906 65.221.X.2 50789 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
34.4 - - 62.70.X.101 17534 134.84.X.43 6881 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
28.4 - - 220.120.X.249 15074 160.94.X.1 2355 tcp ---AP--- [32k,1M] [8M,1765M] 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
12.1 23 73 xxx.xxx.xxx.xxx 57 216.196.X.78 ----- tcp ---A-R-- --------- --------- 0.2 0.3 0.3 0.4 0 0 0 0 0.2 0 0 0 0.2 0 0 0

• UMN computers doing bulk transfers
• 160.94.122.142 is running a rogue FTP server on 60000/TCP
• UMN computers doing large transfers via BitTorrent to many outside hosts
• This computer is scanning for computers on port 139/TCP. Majority of the packets are 192 bytes or 144 bytes, except for the second summary (score 88.2)
• UMN computer running a RealMedia server that was not known to the analyst
• Odd looking P2P traffic to/from a UMN computer (potentially KaZaA or Gnutella)
• The remote computer was scanning for 57/TCP, where RESET packets are sent back from computers that do not have 57/TCP open.

Scan Detection

• Despite the importance of scan detection its value is often overlooked
– Lack of good tools for scan detection
  • Existing methods either miss stealth scans or give too many false alarms
  • Fast scans are easy to catch using existing schemes but stealth scans are very difficult to recognize

• MINDS employs our new methodology for detecting network scans
– Makes use of powerful new heuristics
  • Only considers flows with a small number of packets
  • Only considers scans in a subnet (not the whole internet)
– Makes effective use of usage information
  • Touches to rare IP / port combinations are more suspicious than others
  • A scanner will hit machines where the service is not available, resulting in a low count

• Very low false alarm rate
– Evaluation of 36 million flows over a 30-minute window at the University of Minnesota showed 2583 alarms but only 22 false alarms
– Evaluation on an hour of data at the ARL showed 1150 scans reported, but only 5 false alarms

• Routinely finds compromised machines at ARL-CIMP
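The heuristics listed for scan detection can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the thresholds and the `service_usage` table (how often each IP / port combination has been used in past traffic) are hypothetical, and the flows are assumed to be already restricted to the monitored subnet. The actual MINDS scan detector is not described in enough detail here to reproduce.

```python
from collections import defaultdict


def find_scanners(flows, service_usage, max_packets=3, rare_usage=1, min_rare_touches=3):
    """Flag sources that touch several rarely used IP / port combinations.

    flows: iterable of (src_ip, dst_ip, dst_port, packets) tuples.
    service_usage: dict mapping (dst_ip, dst_port) to a past usage count.
    """
    rare_touches = defaultdict(set)
    for src_ip, dst_ip, dst_port, packets in flows:
        if packets > max_packets:
            continue  # only flows with a small number of packets are considered
        if service_usage.get((dst_ip, dst_port), 0) <= rare_usage:
            rare_touches[src_ip].add((dst_ip, dst_port))
    return sorted(src for src, touched in rare_touches.items()
                  if len(touched) >= min_rare_touches)


# Hypothetical usage history and flows.
service_usage = {("10.0.0.5", 80): 500}
flows = [
    ("198.51.100.7", "10.0.0.1", 445, 1),
    ("198.51.100.7", "10.0.0.2", 445, 1),
    ("198.51.100.7", "10.0.0.3", 445, 1),
    ("203.0.113.9", "10.0.0.5", 80, 2),
    ("203.0.113.9", "10.0.0.5", 80, 2),
    ("203.0.113.9", "10.0.0.9", 22, 40),
]
print(find_scanners(flows, service_usage))  # ['198.51.100.7']
```

Because the count is kept per source rather than per time window, a slow scan accumulates rare touches just as a fast one does.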

Detecting Suspicious Ports for Possible Worm Activity

• We find destinations located within the network for which there is a high connection failure rate on specific ports for inbound, non-scan connections
• Then we find ports on which there are many such destinations
• The existence of these ports indicates a potential worm or slow scan
• This warrants targeted and more detailed data collection and analysis that cannot be done easily on the entire data
– Packet content analysis
– Signature generation
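The two-step procedure for finding suspicious ports can be written as a short Python sketch. The function name, the thresholds and the sample connections are assumptions made for illustration, and the input is assumed to contain only inbound, non-scan connections.

```python
from collections import defaultdict


def suspicious_ports(connections, min_failure_rate=0.8, min_destinations=3):
    """Return {port: number of internal destinations with a high failure rate}.

    connections: iterable of (dst_ip, dst_port, failed) tuples, failed is a bool.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for dst_ip, dst_port, failed in connections:
        totals[(dst_ip, dst_port)] += 1
        if failed:
            failures[(dst_ip, dst_port)] += 1

    # Step 1: destinations with a high failure rate on a specific port.
    failing = defaultdict(set)
    for (dst_ip, dst_port), total in totals.items():
        if failures[(dst_ip, dst_port)] / total >= min_failure_rate:
            failing[dst_port].add(dst_ip)

    # Step 2: ports on which there are many such destinations.
    return {port: len(ips) for port, ips in failing.items()
            if len(ips) >= min_destinations}


# Hypothetical connections.
connections = (
    [("10.0.0.1", 445, True)] * 2 + [("10.0.0.2", 445, True)] * 2
    + [("10.0.0.3", 445, True)] * 2
    + [("10.0.0.5", 80, False)] * 10 + [("10.0.0.5", 80, True)]
    + [("10.0.0.7", 22, True)]
)
print(suspicious_ports(connections))  # {445: 3}
```

Port 22 fails on a single destination and port 80 mostly succeeds, so only port 445 is reported as a candidate for closer packet-level analysis.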

[Figure: IP / port pairs for which a large percentage of connections failed]

[Figure: IP / port pairs for which a large percentage of connections failed (only for ports with many hits)]

[Figure: map of the IPv4 address space by first octet (0 to 255). Labeled blocks: 3 GE, 9 IBM, 12 ATT, 13 Xerox, 15 HP, 16 HP, 17 Apple, 18 MIT, 19 Ford, 20 CSC, 24 Cable, 32 ATT, 34 Halliburton, 35 Merit Networks, 38 PSI, 40 Eli Lilly, 44 Am Rad Digi Com, 45 Interop Show Net, 47 Nortel, 48 Prudential, 52 DuPont, 53 Chrysler, 54 Merck, 172 AOL. Legend: APNIC (Asia), RIPE (Europe), LACNIC (Lat. Am.), ARIN, Japan Inet, SITA (French), US Military, USPS, UK Government, IANA Reserved, Private Use, Loopback, Multicast, Public Data Network]

• 999 unique sources (Min: 1, Max: 28, Avg: 1)
• 1126 unique destinations (Min: 1, Max: 55, Avg: 1)
• 1516 total flows involved
• 1472 scan flows on port 80 (found by scan detector)

• 7982 unique sources (Min: 1, Max: 16, Avg: 1)
• 6184 unique destinations (Min: 1, Max: 28, Avg: 1)
• 9930 total flows involved
• 9406 scan flows on port 445 (found by scan detector)

Clustering

• Useful for detecting modes of behavior
– Shared Nearest Neighbor (SNN) clustering works quite well at determining modes of behavior
  • Not distracted by “noise” in the data

• SNN is CPU intensive, O(N^2)
• Requires storing an N x K matrix
– K (number of neighbors) is typically between 10 and 20
– K should be about the size of the smallest expected mode

• Clustered 850,000 connections collected over one hour at one US Army Fort
• Took 10 hours using 3 quad 2.8 GHz servers and 4 2 GHz workstations (total of 16 CPUs)
• Required around 100 MB of memory per PE for the distance calculations
– 500 MB of memory for the final clustering step on a single PE

• Found 3135 clusters
– Largest clusters around 500 records, smallest cluster 10 records
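The O(N^2) cost and the N x K neighbor matrix come from the first stage of SNN clustering: find each record's K nearest neighbors, then measure similarity between two records by the number of neighbors they share. The Python sketch below covers only that similarity step, with hypothetical function names, plain Euclidean distance and toy 2-D points as assumptions; a full SNN clustering would add density estimation and cluster formation on top.

```python
def k_nearest(points, k):
    """Return, for each point, the set of indices of its k nearest neighbors.

    Brute force: every pair of points is compared, hence O(N^2) distances.
    """
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        neighbors.append({j for _, j in dists[:k]})
    return neighbors


def snn_similarity(neighbors, i, j):
    """Number of shared neighbors, counted only if i and j list each other."""
    if i not in neighbors[j] or j not in neighbors[i]:
        return 0
    return len(neighbors[i] & neighbors[j])


# Two well separated toy groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
neighbors = k_nearest(points, k=3)

print(snn_similarity(neighbors, 0, 1))  # 2
print(snn_similarity(neighbors, 0, 4))  # 0
```

Points in the same group share most of their neighbor lists, while points from different groups share none, which is why the measure is not distracted by isolated noise records.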

Detecting Large Modes of Network Traffic Using Clustering

• Large clusters of VPN traffic (hundreds of connections)
• Used between forts for secure sharing of data and working remotely

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Packets  Bytes
20040407.10:00:00.428036  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:00:00.685520  0:00:03   A       -1        B       -1        gre    237  1        556
20040407.10:00:00.748920  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:01:44.138057  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:01:59.267932  0:00:00   A       -1        B       -1        gre    237  1        96
20040407.10:02:44.937575  0:00:01   A       -1        B       -1        gre    237  1        556
20040407.10:04:00.717395  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:04:30.976627  0:00:01   A       -1        B       -1        gre    237  1        556
20040407.10:04:46.106233  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:05:46.715539  0:00:00   A       -1        B       -1        gre    237  1        556
20040407.10:06:16.975202  0:00:01   A       -1        B       -1        gre    237  1        556
20040407.10:06:32.105013  0:00:00   A       -1        B       -1        gre    237  1        556

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Packets  Bytes
20040407.10:00:40.685522  0:00:03   B       -1        A       -1        gre    237  1        96
20040407.10:00:58.748922  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:01:44.138059  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:02:14.678442  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:02:44.937577  0:00:01   B       -1        A       -1        gre    237  1        96
20040407.10:03:15.308206  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:04:30.976629  0:00:01   B       -1        A       -1        gre    237  1        96
20040407.10:06:16.975204  0:00:01   B       -1        A       -1        gre    237  1        96
20040407.10:06:32.105015  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:06:47.234837  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:07:02.367471  0:00:00   B       -1        A       -1        gre    237  1        96
20040407.10:07:17.494574  0:00:00   B       -1        A       -1        gre    237  1        96

Detecting Unusual Modes of Network Traffic Using Clustering

• Clusters involving GoToMyPC.com (Army Data)
• Policy violation, allows remote control of a desktop

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:00:10.428036  0:00:00   A       4125      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:00:40.685520  0:00:03   A       4127      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:00:58.748920  0:00:00   A       4138      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:01:44.138057  0:00:00   A       4141      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:01:59.267932  0:00:00   A       4143      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:02:44.937575  0:00:01   A       4149      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:04:00.717395  0:00:00   A       4163      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:04:30.976627  0:00:01   A       4172      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:04:46.106233  0:00:00   A       4173      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:05:46.715539  0:00:00   A       4178      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:06:16.975202  0:00:01   A       4180      B       8200      tcp    123  ***AP*SF  5        248
20040407.10:06:32.105013  0:00:00   A       4181      B       8200      tcp    123  ***AP*SF  5        248

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:00:40.685522  0:00:03   B       8200      A       4127      tcp    123  ***AP*SF  4        211
20040407.10:00:58.748922  0:00:00   B       8200      A       4138      tcp    123  ***AP*SF  4        211
20040407.10:01:44.138059  0:00:00   B       8200      A       4141      tcp    123  ***AP*SF  4        211
20040407.10:02:14.678442  0:00:00   B       8200      A       4145      tcp    123  ***AP*SF  4        211
20040407.10:02:44.937577  0:00:01   B       8200      A       4149      tcp    123  ***AP*SF  4        211
20040407.10:03:15.308206  0:00:00   B       8200      A       4153      tcp    123  ***AP*SF  4        211
20040407.10:04:30.976629  0:00:01   B       8200      A       4172      tcp    123  ***AP*SF  4        211
20040407.10:06:16.975204  0:00:01   B       8200      A       4180      tcp    123  ***AP*SF  4        211
20040407.10:06:32.105015  0:00:00   B       8200      A       4181      tcp    123  ***AP*SF  4        211
20040407.10:06:47.234837  0:00:00   B       8200      A       4182      tcp    123  ***AP*SF  4        211
20040407.10:07:02.367471  0:00:00   B       8200      A       4183      tcp    123  ***AP*SF  4        211
20040407.10:07:17.494574  0:00:00   B       8200      A       4184      tcp    123  ***AP*SF  4        211

Detecting Unusual Modes of Network Traffic Using Clustering

• Clusters involving mysterious ping and SNMP traffic

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  ICMP Type  ICMP Code  # Packets  # Bytes
20040407.10:01:00.181261  0:00:00   A       1176      B       161       udp    123  -          -          1          95
20040407.10:01:23.183183  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:02:54.182861  0:00:00   A       1514      B       161       udp    123  -          -          1          95
20040407.10:03:03.196850  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:04:45.179841  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:06:27.180037  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:09:48.420365  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:11:04.420353  0:00:00   A       3013      B       161       udp    123  -          -          1          95
20040407.10:11:30.420766  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:12:47.421054  0:00:00   A       3329      B       161       udp    123  -          -          1          95
20040407.10:13:12.423653  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84
20040407.10:14:53.420635  0:00:00   A       -1        B       -1        icmp   123  8          0          1          84

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  ICMP Type  ICMP Code  # Packets  # Bytes
20040407.10:01:00.181488  0:00:00   B       161       A       1176      udp    63   -          -          1          103
20040407.10:01:23.183291  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:01:55.180590  0:00:00   B       161       A       1326      udp    63   -          -          1          234
20040407.10:02:54.184537  0:00:00   B       161       A       1514      udp    63   -          -          1          134
20040407.10:03:03.196958  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:04:45.179965  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:05:09.180542  0:00:00   B       161       A       1927      udp    63   -          -          1          234
20040407.10:06:27.180159  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:09:48.420410  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:11:30.420773  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:13:12.423663  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84
20040407.10:14:53.421019  0:00:00   B       -1        A       -1        icmp   254  0          0          1          84

Detecting Unusual Modes of Network Traffic Using Clustering

• Clusters involving unusual repeated ftp sessions
• Further investigations revealed a misconfigured Army computer was trying to contact Microsoft

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:10:57.097108  0:00:00   A       3004      B       21        tcp    123  ***AP*SF  7        318
20040407.10:11:27.113230  0:00:00   A       3007      B       21        tcp    123  ***AP*SF  7        318
20040407.10:11:37.111176  0:00:00   A       3008      B       21        tcp    123  ***AP*SF  7        318
20040407.10:11:57.118231  0:00:00   A       3011      B       21        tcp    123  ***AP*SF  7        318
20040407.10:12:17.125220  0:00:00   A       3013      B       21        tcp    123  ***AP*SF  7        318
20040407.10:12:37.132428  0:00:00   A       3015      B       21        tcp    123  ***AP*SF  7        318
20040407.10:13:17.146391  0:00:00   A       3020      B       21        tcp    123  ***AP*SF  7        318
20040407.10:13:37.153713  0:00:00   A       3022      B       21        tcp    123  ***AP*SF  7        318
20040407.10:14:47.178228  0:00:00   A       3031      B       21        tcp    123  ***AP*SF  7        318
20040407.10:15:47.199100  0:00:00   A       3040      B       21        tcp    123  ***AP*SF  7        318
20040407.10:16:07.206450  0:00:00   A       3042      B       21        tcp    123  ***AP*SF  7        318

Start Time                Duration  Src IP  Src Port  Dst IP  Dst Port  Proto  TTL  Flags     Packets  Bytes
20040407.10:00:06.627895  0:00:01   B       21        A       2924      tcp    123  ***AP*SF  7        449
20040407.10:00:16.633872  0:00:01   B       21        A       2925      tcp    123  ***AP*SF  7        449
20040407.10:00:36.638794  0:00:01   B       21        A       2927      tcp    123  ***AP*SF  7        449
20040407.10:01:16.652664  0:00:01   B       21        A       2932      tcp    123  ***AP*SF  7        449
20040407.10:01:26.659694  0:00:01   B       21        A       2933      tcp    123  ***AP*SF  7        449
20040407.10:01:56.666816  0:00:01   B       21        A       2937      tcp    123  ***AP*SF  7        449
20040407.10:02:06.670680  0:00:01   B       21        A       2938      tcp    123  ***AP*SF  7        449
20040407.10:02:56.687932  0:00:01   B       21        A       2944      tcp    123  ***AP*SF  7        449
20040407.10:03:26.698413  0:00:01   B       21        A       2947      tcp    123  ***AP*SF  7        449
20040407.10:04:06.712495  0:00:01   B       21        A       2952      tcp    123  ***AP*SF  7        449
20040407.10:05:06.733731  0:00:01   B       21        A       2961      tcp    123  ***AP*SF  7        449
20040407.10:06:16.758442  0:00:01   B       21        A       2969      tcp    123  ***AP*SF  7        449

MINDS: CRITICAL TO COMPLETE FUNCTIONALITY

[Figure labels: Header Analysis; Packet-Based Signature Detection; Session-Based Signature Detection; Behavior Analysis (MINDS); Simple Scans; Viruses and Worms; Scans with Automatic Virus Attacks; Scans with Target Responses; New and Variant Attacks; Compromises; Anomaly Detection and New Attacks]

Army Research Laboratory (ARL), supported by the AHPCRC and the MINDS initiative, successfully monitors and analyzes network data to protect ARL and its Army and DoD customer infospace

Current MINDS Research and Development Work

• Correlation of suspicious events across network sites
– Helps detect sophisticated attacks not identifiable by single site analyses
– Scalable anomaly detection
– Distributed correlation algorithms
– Grids & middleware

• Analysis of long term data (months/years)
– Uncover suspicious stealth activities (e.g. insiders leaking/modifying information)

[Figure: diagram with five nodes labeled MINDS]