pattern detection in computer networks using robust...

Pattern Detection in Computer Networks Using Robust Principal Component

Analysis

CS525 URBAN NETWORKS: METHODS AND ANALYSIS

4-19-2017

Randy PaffenrothAssociate Professor of Mathematical Sciences, and

Associate Professor of Computer ScienceData Science Program

Worcester Polytechnic Institute

Urban networks vs.

Computer networks

By Howchou (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

By Hibernia Networks (Hibernia Networks) [Public domain], via Wikimedia Commons

● Robust Principal Component Analysis as applied to analysis of computer networks.

● In effect, I am interested in “semi-supervised” learning where much of the data is unlabeled and has to “speak for itself”

● Attempt to justify why I think this is an interesting way to think about network analysis.

● Show some examples in this area.

● Beware! I am a mathematician and, morally, I can’t give a talk without any equations :-)

What am I going to talk about?

Theory and Practice

Where we find our inspiration for practice...

● Stuxnet, Flame, Target Inc., Neiman Marcus, Affinity Gaming, Dairy Queen...● Et tu, Dairy Queen!? This is when things got personal...

● "Axiom" 1: Unless some sensor, or collection of sensors, is effected by an attack then you can't detect it.

● I.e. either the marginal or joint probability density function of the sensors must be different in a statistically meaningful way, conditioned on the absence or presence of an attack.

● "Axiom" 2: The most dangerous attacks are those for which you don’t have a signature.● Virus detection and intrusion detection systems (IDS) do a good job of detecting

attacks for which a signature is known, but have nothing to say if the attack has no signature

● "Theorem": Therefore the most dangerous attacks can only be detected by sensors which were not designed to detect that threat.

● You have to get lucky and have a sensor that detects the new attack even though it was not designed to do so.

● "Corollary": You want lots of sensors!● But, how do you fuse them? Even once you have a way of fusing the data, how do

you avoid being overwhelmed with false alarms!

Advanced Persistent Threats

Reconnai

Botnet

Pivotin

Pivoting

Exfiltration

Point of e

Command and

control

What do we mean by a sensor?

No attackNo attack

AttackAttack

No attackNo attack

AttackAttack

No attackNo attack

AttackAttack

What kinds of sensors?

● Already talked about packet rates.● Port, CPU, memory activity, etc.● Intrusion Detection Systems

● Bro, Snort, Suricata, etc.

● More "complicated" sensors such as those inspired by information theory.● Packet payload entropy

Butun, Ismail, Salvatore D. Morgera, and Ravi Sankar. "A survey of intrusion detection systems in wireless sensor networks." Communications Surveys & Tutorials, IEEE 16.1 (2014): 266-282.Moosavi, M. R., et al. "ENTROPY BASED FUZZY RULE WEIGHTING FOR HIERARCHICAL INTRUSION DETECTION." Iranian Journal of Fuzzy Systems 11.3 (2014): 77-94.

Best with an example!

Data matrix

First order anomaly

Sparse correlations?Latent signal model...

A simple second order anomaly

Second order theory!

In our work we focused on analyzing the second order statistics of by way of its covariance or normalized cross correlation matrix , such as

Interesting questions:● Correlation versus covariance?● More refined calculations such as Maximum Likelihood covariance estimation (e.g. using convex optimization).

Well defined for missing data and different data types (e.g. point-biserial correlation).

Second order anomaly

Standing on the shoulders of giants

●Over the past 4-5 years there has been a flurry of activity on this problem, much of which we suspect the current audience is aware of.

● Ideas such as matrix completion, robust principal component analysis, and robust matrix completion have generated a lot of interest, including among us!

Z. Zhou, X. Li, J. Wright, E. Cande`s, and Y. Ma, “Stable Principal Component Pursuit,” ISIT 2010: Proceedings of IEEE International Symposium on Information Technology, 2010.

E. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” J. ACM, vol. 58, pp. 11:1–11:37, June 2011.

E. Candes and Y. Plan, “Matrix Completion With Noise,” Proceedings of the IEEE, vol.98, no.6, p.11, 2009

E.Candes and B.Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, pp. 717–772, December 2009.

Eckart, C.; Young, G. (1936). "The approximation of one matrix by another of lower rank". Psychometrika 1 (3): 211–8.

Matrix completion: The Netflix problem!

Robust principal component analysis

R. Paffenroth, P. Du Toit, R. Nong, L. Scharf, A. Jayasumana and V. Bandara Space-time signal processing for distributed pattern detection in sensor networks IEEE Journal of Selected Topics in Signal Processing, Vol. 7, No.1, February 2013 P. Du Toit, R. Paffenroth, R. Nong Stability of Principal Component Pursuit with Point-wise Error Constraints in preparation 2012.

What I am interested in :-)

Singular values

The appropriate structures appear all over the place in real data!

Insurance Satisfaction Surveys

Elisa Rosales

Singular Values of Matrices

Amazon product communities SKAION Internet Attack (e.g., DDoS) simulations

The appropriate structures appear all over the place in real data!

Rakesh Biradar

Singular Values of Matrices

Abilene Internet2 Backbone

Enough math for the moment, lets try a really practical example

● DARPA Lincoln Lab Intrusion Detection Evaluation Data Set➢ IPsweep of the AFB from a remote site➢ Probe of live IP's to look for the sadmind daemon running on Solaris hosts➢ Breakins via the sadmind vulnerability, both successful and unsuccessful on those hosts➢ Installation of the trojan mstream DDoS software on three hosts at the AFB➢ Launching the DDoS

https://www.ll.mit.edu/ideval/data/2000/LLS_DDOS_1.0.html

Feature generation

Raw PCAP files Derived features

Imporant idea... don't blindly follow theory

arg min

( ) ( )

Lincoln Labs DARPA Intrusion Detection Data Set - PCA

● IP sweep from a remote site,● a probe of live IP addresses looking for a running Sadmind daemon,● and then an exploitation of a Sadmind vulnerability.

Lincoln Labs DARPA Intrusion Detection Data Set - Comparison

PCA – Too many false negatives

RPCA – Too many false positives

Lincoln Labs DARPA Intrusion Detection Data Set - Comparison

Too “thick”Too “thin”

Just right l

Key idea● Semi-supervised learning

● PCA and RPCA have many parameters● Far to many to train on reasonably sized collections

of attacks● Only train a few important parameters on

supervised training data– Like

● Gives better generalization and less over-fitting

Key idea● Semi-supervised learning

Training data for l

Algorithm not trained on this attack vector!

pattern detection in computer networks using robust...

Documents

robust game theory - mit.edudbertsim/papers/robust...

ds504/cs586: big data...

short course robust optimization and machine...

ds504/cs586: big data...

rebalancing bike sharing systems: a multi-source data smart...

robust pca robust pca 3: spherical pca. robust pca

pattern detection in computer networks using robust...

lore: exploiting sequential inﬂuence for location...

more robust doubly robust estimators

new efficient large-scale fleet management via multi-agent...

01 min-max robust and cvar robust mean-variance...

dissecting regional weather-traffic sensitivity throughout...

wireshark lab: getting started v6 - worcester...

telco churn prediction with big data - worcester...

unveiling taxi drivers’ strategies via...

product robust design and process robust design

robust optimization - stanford university · robust...

section 5 robust details section 5 robust details

redtide model output vs a.fundyence abundance observation...

wireshark ethernet arp v7 -...