Network Mapping and Anomaly Detection. Athina Markopoulou (Irvine), Robert Calderbank (Princeton), Rob Nowak (Madison). MURI Kickoff Meeting, September 19, 2009.



TRANSCRIPT

Network Mapping and Anomaly Detection
Athina Markopoulou (Irvine), Robert Calderbank (Princeton), Rob Nowak (Madison)
MURI Kickoff Meeting, September 19, 2009

Outline
- Challenges: applications and mathematics
- Preliminary results: detecting malicious traffic sources (Athina Markopoulou), network topology identification, network-wide anomaly detection
- Research directions

Application Challenges
- Network Mapping: infer network topology/connectivity from minimal measurements.
- Detecting Topology Changes: quickly sense changes in routing or connectivity.
- Network-wide Anomalies: detect weak and distributed patterns of anomalous network activity.
- Predicting Malicious Traffic: identify network sources that are likely to launch future attacks.

Mathematical Challenges
- Vastly Incomplete Data: it is impossible to monitor a network everywhere and all the time. Where and when should we measure?
- Large-scale Inference: inference of high-dimensional signals/graphs from noisy and incomplete data. Robust statistical data analysis and scalable algorithms are crucial.
- Network Representations: statistical analysis matched to network structures. Can network data be sparsified using new representations and transformations?
- Network Prediction Models: new network-centric statistical methods are needed to cluster network nodes for robust prediction from limited datasets.

Predicting Malicious Traffic Sources: Predictive Blacklisting as an Implicit Recommendation System
- Problem: predict sources of malicious traffic on the Internet.
- Blacklists: lists of worst offenders (source IP addresses or prefixes) used to block (or to further scrub) traffic originating from those sources.
- Goal: predict malicious sources that are likely to attack a victim in the future, based on past logs.
- Prediction vs. detection: strictly speaking this is not detection, but it does require finding patterns in the data.

Traditional Blacklist Generation
- Local Worst Offender List (LWOL): the most prolific local offenders; reactive but not proactive.
- Global Worst Offender List (GWOL): the most prolific global offenders; might contain irrelevant offenders, and non-prolific attackers are elusive to GWOL.

State of the Art: Collaborative Blacklisting
- J. Zhang, P. Porras, and J. Ullrich, "Highly Predictive Blacklisting," USENIX Security 2008 (best paper award).
- Key idea: a victim is likely to be attacked not only by sources that attacked this particular victim, but also by sources that attacked similar victims.
- Methodology: use link analysis (PageRank) on the victims' similarity graph to predict future attacks.
- The first methodological development in this problem in a long time.

Formulating Predictive Blacklisting as an Implicit Recommendation System
[Figure: a recommendation system (e.g., Netflix, Amazon) is summarized by a partially observed rating matrix R over users and items (movies); predictive blacklisting is summarized analogously by partially observed attack matrices R(t) over victims and attackers, indexed by time t.]
- Collaborative filtering (CF): different techniques capture different patterns in the data.

Multi-level Prediction
- Individual level: for each (attacker, victim) pair, use time series to project past trends.
- Local level: neighborhood-based CF; group similar victim networks (k-nearest neighbors), with a notion of similarity that accounts for common attackers and time; find groups of attackers attacking the same victims using the cross-association (CA) method.
- Global level: factorization-based CF (in progress); find latent factors in the data using, e.g., SVD.
- Combine the ratings from the different predictors. (A minimal sketch of the neighborhood-based step is given below.)
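The neighborhood-based (local-level) step can be illustrated with a short toy sketch. This is not the authors' implementation: it uses plain cosine similarity between victims' binary attack histories and a similarity-weighted k-nearest-neighbor vote, and it omits the time weighting and the cross-association grouping that the multi-level method also uses. The function name knn_blacklist_scores and all parameter choices are illustrative.

```python
# A minimal, hypothetical sketch of neighborhood-based collaborative filtering
# for predictive blacklisting: victims are rows of a binary attack matrix, and
# each victim's score for an attacker is a similarity-weighted vote of its
# k most similar victims.
import numpy as np

def knn_blacklist_scores(attacks, k=5):
    """attacks: (n_victims, n_attackers) 0/1 matrix of past attacks."""
    norms = np.linalg.norm(attacks, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = attacks / norms
    sim = unit @ unit.T                        # cosine similarity between victims
    np.fill_diagonal(sim, 0.0)                 # ignore self-similarity
    scores = np.zeros_like(attacks, dtype=float)
    for v in range(attacks.shape[0]):
        nbrs = np.argsort(sim[v])[-k:]         # k most similar victims
        w = sim[v, nbrs]
        if w.sum() > 0:
            scores[v] = w @ attacks[nbrs] / w.sum()
    return scores                              # rank attackers per victim by score

# Toy usage: rank attackers for victim 0 from a random 4x6 attack matrix.
R = np.random.default_rng(0).integers(0, 2, size=(4, 6))
print(np.argsort(-knn_blacklist_scores(R, k=2)[0]))
```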
Evaluation on DShield Data
- Tested our approach on DShield data: six months of logs.
- DShield.org is a central repository of shared logs: several victim organizations (e.g., UCI, Princeton) submit their IDS logs (flow data), and the repository analyzes the logs and provides a predictive blacklist tailored to each victim.
- Several different patterns co-exist in the data and should be detected and used for prediction.

Preliminary Results
- A combination of methods significantly improves the hit count of the blacklist, by up to 70% (57% on average) compared to the state of the art (HPB).
[Figure: hit counts of the combined method, the state-of-the-art method (HPB), and an older method (GWOL).]
- ... and there is much room for improvement.

Challenges and Future Directions
- Get closer to the upper bound: latent factor techniques, dealing with missing data, an adversarial model, scalability.
- Hopefully, interactions with other people in this group.
(F. Soldo, A. Le, A. Markopoulou)

Network Mapping
- Existing methods: active probing (e.g., traceroute).
- New approach: passive monitoring (Lumeta Corporation).
- The data: hop-counts from end-hosts to monitors, extracted from the TTL fields of traffic observed at the monitors.

Clustering End-Hosts
- Problem: use hop-count data to automatically cluster end-hosts into topologically relevant groups (e.g., subnets).
- Intuition: end-hosts with similar hop-counts are probably close together.
- Challenge: clustering with missing data.
[Figure: 2-d histogram of hop-counts from end-hosts to honeypots; ellipses indicate end-hosts from different subnets.]

Matrix Completion
- The observed hop-counts are random samples of the entries of the complete hop-count matrix.
- The complete hop-count matrix is low rank (rank r), as revealed by its SVD.

Results
[Figure: clusters recovered from complete data vs. from 25% of the data; accuracy of the mixture model as a function of the fraction of complete data, from 0 to 1.]

Network-wide Anomaly Detection
- Observation model: noisy per-node measurements of an unknown binary (0/1) activation pattern with a given signal strength.
- The anomaly can be weak in strength (the signal strength is too small to show up in any per-node signature) and weak in extent (the number of affected nodes is too small to show up in the network-wide aggregate); both are unknown.

Distributed Network Anomaly Detection
- Prior work: weak and unstructured patterns can be detected by exploiting multiplicity (Ingster; Jin and Donoho, 2003; Abramovich et al., 2001).
- Subtle adaptive testing procedures: higher criticism, false discovery control.
[Figure: detectable vs. localizable regions as a function of signal strength and sparsity (number of active nodes): "now you see it, now you don't."]

Detecting Weak and Sparse Patterns
- In addition to multiplicity, can we exploit the (possibly non-local) dependencies between node measurements to boost performance?
- The method must be adaptive to the network interaction structure.
- How do node interactions affect the thresholds of detectability/localizability?

Modeling Network Anomaly Patterns
- Latent multi-scale Ising model over the node interactions: the probability of a pattern grows exponentially with the number of edge agreements, scaled by the strength of interaction.
- The network node measurements are observed; the multi-scale dependencies are latent.

Theorem: Consider a latent multi-scale Ising model with uniform node interaction strength. With high probability, (1) the correct dependency structure (a tree) can be learnt from i.i.d. network observations x by hierarchical correlation clustering, and (2) the number of non-zero basis coefficients for an x drawn at random is small.

Hierarchical Clustering and Basis Learning
- Hierarchical correlation clustering yields an unbalanced Haar basis.
[Figure: network data vs. wavelet coefficients; the signal is focused and its strength amplified in the transform domain.]
- Weak patterns are amplified by the sparsifying transform adapted to the network topology, while the noise characteristics remain the same.
- Anomalies are then detected in the transform domain. (A toy sketch of this transform follows.)
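As a concrete illustration of the transform step, the sketch below builds unbalanced-Haar-style vectors from an agglomerative clustering dendrogram and applies them to one noisy network snapshot. It is a toy under stated assumptions, not the method from the slides: scipy's average-linkage clustering on correlation distance stands in for hierarchical correlation clustering, the function name unbalanced_haar_basis and the synthetic group structure are invented for the example, and only the amplification of a coherent pattern is shown, not the full detection procedure.

```python
# A minimal, hypothetical sketch: build unbalanced-Haar-style vectors from an
# agglomerative clustering dendrogram and apply them to a noisy snapshot.
import numpy as np
from scipy.cluster.hierarchy import linkage

def unbalanced_haar_basis(X):
    """X: (n_obs, p) node measurements. Returns a (p-1, p) matrix whose rows are
    unit-norm vectors, one per merge of the correlation-based dendrogram."""
    p = X.shape[1]
    corr = np.corrcoef(X.T)                               # node-by-node correlations
    Z = linkage(1.0 - corr[np.triu_indices(p, k=1)], method="average")
    members = {i: [i] for i in range(p)}                  # cluster id -> node indices
    basis = []
    for m, row in enumerate(Z):
        A, B = members[int(row[0])], members[int(row[1])]
        v = np.zeros(p)
        v[A], v[B] = 1.0 / len(A), -1.0 / len(B)          # contrast the merged clusters
        basis.append(v / np.linalg.norm(v))
        members[p + m] = A + B
    return np.array(basis)                                # rows are mutually orthogonal

# Toy demo: 4 correlated groups of 16 nodes; a weak anomaly activates one group.
rng = np.random.default_rng(1)
p, n, mu = 64, 400, 1.0
groups = np.repeat(np.arange(4), 16)
X = rng.normal(size=(n, 4))[:, groups] + 0.3 * rng.normal(size=(n, p))
W = unbalanced_haar_basis(X)

y = mu * (groups == 2) + rng.normal(size=p)               # one noisy snapshot
# Each affected node carries only mu, but the coefficient contrasting the affected
# group with its sibling cluster carries roughly mu * sqrt(|A||B| / (|A|+|B|)),
# while white noise stays white because the rows of W are orthonormal.
print("largest per-node value  :", np.abs(y).max())
print("largest Haar coefficient:", np.abs(W @ y).max())
```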
Detection vs. Signal Strength
[Figure: detection performance as a function of signal strength, using the original data vs. using the wavelet coefficients.]
- Coherent activation patterns result in few non-zero basis coefficients and can be detected at much smaller signal strengths.

Internet Anomaly Detection
- Example basis vectors learnt from O(log p) network measurements using hierarchical clustering.
[Figure: sample delay covariance matrix, obtained by monitoring an unknown network.]
- Compression is achieved for real Internet RTT data.

Research Directions
- Active Sensing: sequential algorithms that automatically decide where, what, and when to measure.
- Online Large-scale Inference: online and near real-time network monitoring to detect topology changes and traffic anomalies.
- Wireless Network Sensing: exploitation of sparsity and diversity in wireless networks for fast and robust identification of network-wide characteristics.
- New Network Representations: relationships between wavelet representations and persistent homology.

Extra Slides: Network Discovery
- The network structure is unknown; infer network routing/topology by triangulation.
- Unfortunately, many hop-counts are not observed. (A toy matrix-completion sketch for this missing-data problem is given below.)
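The missing hop-count problem is the one the matrix-completion slide addresses. Below is a minimal sketch under stated assumptions, not the authors' algorithm: it fills the unobserved entries of a hop-count matrix by iterating a truncated SVD (a simple impute-and-project scheme) and then clusters end-hosts on the completed rows with k-means. The function name complete_low_rank, the rank r, the observation fraction, and the toy subnet structure are all illustrative.

```python
# A hypothetical sketch: low-rank completion of a partially observed hop-count
# matrix by iterated truncated SVD, then clustering of end-hosts into subnets.
import numpy as np
from scipy.cluster.vq import kmeans2

def complete_low_rank(H, mask, r=3, n_iter=100):
    """H: (hosts, monitors) hop-counts, arbitrary where mask is False.
    mask: True where a hop-count was actually observed."""
    X = np.where(mask, H, H[mask].mean())        # start missing entries at the mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X_low = (U[:, :r] * s[:r]) @ Vt[:r]      # best rank-r approximation
        X = np.where(mask, H, X_low)             # keep observed entries fixed
    return X

# Toy example: 3 subnets of 20 end-hosts, 6 monitors, 40% of hop-counts observed.
rng = np.random.default_rng(0)
subnet = np.repeat(np.arange(3), 20)
base = rng.integers(5, 20, size=(3, 6))          # per-subnet hop-counts to monitors
H = base[subnet] + rng.integers(0, 2, size=(60, 6))
mask = rng.random(H.shape) < 0.4
H_hat = complete_low_rank(np.where(mask, H, 0), mask, r=3)
_, labels = kmeans2(H_hat, 3, minit="++", seed=0)
print("recovered end-host clusters:", labels)
```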