Waterfall: Rapid identification of IP flows using cascade classification
Paweł Foremski, MSc. Eng.
The Institute of Theoretical and Applied Informatics of the Polish Academy of Sciences, Gliwice
CN 2014 Conference, Brunów, 24th June 2014
Identification of IP flows?
Also called “traffic classification” or “traffic identification”
TC: input and output
[Figure: network traffic (input) → Traffic Classifier → application names (output)]
TC input
• TC input is the object of classification:
o Single IP packet
o IP flow
o Endpoint
o Host
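A small sketch of how one packet header maps to three of these classification objects. It assumes a hypothetical `Packet` record, treats a flow as a bidirectional 5-tuple, an endpoint as the (IP address, port, protocol) the packet is sent to, and a host as a bare IP address; real tools may define these differently (for example, by also splitting flows on timeouts).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal view of an IP packet header."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str  # e.g. "TCP" or "UDP"


def flow_key(p: Packet) -> Tuple:
    """Bidirectional 5-tuple: both directions of a connection get the same key."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    # Any consistent order works, so plain string comparison of IPs is enough.
    return (p.proto, min(a, b), max(a, b))


def endpoint_key(p: Packet) -> Tuple[str, int, str]:
    """Endpoint the packet is sent to: (IP address, port, protocol)."""
    return (p.dst_ip, p.dst_port, p.proto)


def host_key(p: Packet) -> str:
    """Host the packet is sent to: just its IP address."""
    return p.dst_ip


req = Packet("10.0.0.1", "93.184.216.34", 51000, 443, "TCP")
rep = Packet("93.184.216.34", "10.0.0.1", 443, 51000, "TCP")
print(flow_key(req) == flow_key(rep))  # True: one flow, two directions
```

Classifying at the flow level lets a decision made on the first packets be reused for every later packet that maps to the same key.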
TC output
• TC output is the result of classification:
o Application name – e.g. Skype, Teamviewer
o Network protocol – e.g. HTTP, SMTP
o Category – e.g. chat, streaming
o Traffic profile – e.g. bulk, interactive
o Content type – e.g. text, image
o Web application – e.g. Google Docs, Facebook
TC: the problem
• How to identify network traffic?
• How to cope with practical constraints?
o With limited resources (on high-speed routers)
o With limited details (only packet headers)
o ...
• How to measure the performance?
o Result accuracy
o Reaction time
o Temporal stability
o Spatial stability
o ...
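A minimal sketch of scoring the first two measures for a classifier that may also leave flows unclassified. The `Decision` record, the split into coverage and accuracy, and the definition of reaction time as the delay from a flow's first packet to the decision are assumptions made for illustration. Temporal and spatial stability could reuse the same scoring on traffic captured at later dates or at other sites.

```python
from statistics import mean
from typing import List, NamedTuple, Optional


class Decision(NamedTuple):
    """Hypothetical per-flow record of one classification decision."""
    true_label: str
    predicted: Optional[str]  # None = flow left unclassified (rejected)
    delay_s: float            # seconds from the flow's first packet to the decision


def evaluate(decisions: List[Decision]) -> dict:
    classified = [d for d in decisions if d.predicted is not None]
    correct = [d for d in classified if d.predicted == d.true_label]
    return {
        # share of flows that received any label at all
        "coverage": len(classified) / len(decisions),
        # accuracy measured only over flows that received a label
        "accuracy": len(correct) / len(classified) if classified else 0.0,
        # average reaction time of the decisions that were made
        "mean_reaction_time_s": (
            mean(d.delay_s for d in classified) if classified else float("nan")
        ),
    }


decisions = [
    Decision("HTTP", "HTTP", 0.0),
    Decision("Skype", "Skype", 2.0),
    Decision("DNS", "HTTP", 1.0),  # a false positive for HTTP
    Decision("FTP", None, 10.0),   # rejected, counts against coverage only
]
print(evaluate(decisions))
```

Reporting coverage separately keeps a classifier from looking accurate simply by refusing to label hard flows.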
TC: applications
[Figure: traffic such as HTTP, Skype, BitTorrent and FTP, and uses of TC: queuing, Quality of Service, firewall, access policy, monitoring, routing, ...]
TC: applications
Alessandro Finamore, Marco Mellia, Michela Meo, Maurizio M. Munafò, Dario Rossi, Experiences of Internet Traffic Monitoring with Tstat, IEEE Network, Vol. 25, No. 3, pp. 8-14, March/April 2011, ISSN: 0890-8044
TC: applications
[Figure: FTTH (4 Mbps) and ADSL (24 Mbps) access links; traffic groups VoIP, DNS, Games, ... and BitTorrent, eMule, YouTube, ...; delays of 5-10 ms and 50-100 ms]
TC: existing solutions
• Port numbers
• Deep Packet Inspection (DPI) - e.g. [2,3]
• Machine Learning - e.g. [5,9]
• Behavioral analysis - e.g. [4,7,8]
• Classifier fusion - e.g. [6]
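The first approach in the list is the simplest. A minimal sketch of port-based classification, assuming an illustrative four-entry table of IANA well-known ports (a real deployment would use a much larger table):

```python
from typing import Optional

# Illustrative subset of IANA well-known ports; not a complete table.
WELL_KNOWN_PORTS = {
    ("TCP", 21): "FTP",
    ("TCP", 25): "SMTP",
    ("UDP", 53): "DNS",
    ("TCP", 80): "HTTP",
}


def classify_by_port(proto: str, src_port: int, dst_port: int) -> Optional[str]:
    """Return a protocol name if either port is well known, else None (unknown)."""
    for port in (dst_port, src_port):  # the server port is usually the destination
        label = WELL_KNOWN_PORTS.get((proto, port))
        if label is not None:
            return label
    return None


print(classify_by_port("TCP", 51000, 80))    # HTTP
print(classify_by_port("TCP", 51000, 6881))  # None
```

It needs only packet headers and costs one dictionary lookup, but it returns `None` for applications on dynamic or non-standard ports, which is where payload, statistical and behavioral methods come in.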
Waterfall: motivation
Each TC algorithm has advantages and disadvantages.
The problem: could we integrate these approaches into one system and so move TC forward?
How would solving this problem affect classification performance?
Waterfall: the idea
1. Use existing classifiers as modules
2. Implement the rejection option
3. Minimize false positives
4. Connect in a cascade structure
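The four steps can be sketched as follows. This is not the authors' implementation; it assumes each module returns a `(label, confidence)` pair or `None`, and models the rejection option as a per-module confidence threshold, so raising a threshold trades coverage for fewer false positives. Both modules are hypothetical toys.

```python
from typing import Callable, List, Optional, Tuple

# A module looks at a flow and returns (label, confidence), or None if it has no opinion.
Module = Callable[[dict], Optional[Tuple[str, float]]]


def cascade_classify(
    flow: dict, modules: List[Tuple[str, Module, float]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (label, deciding module name), or (None, None) if every module rejects."""
    for name, module, threshold in modules:
        result = module(flow)
        if result is None:
            continue  # the module cannot say anything about this flow
        label, confidence = result
        if confidence >= threshold:
            return label, name  # accepted: later modules never see this flow
        # below threshold: reject and pass the flow down the cascade
    return None, None


# Two hypothetical toy modules, for illustration only.
def port_module(flow: dict) -> Optional[Tuple[str, float]]:
    return ("DNS", 0.99) if flow.get("dst_port") == 53 else None


def size_module(flow: dict) -> Optional[Tuple[str, float]]:
    sizes = flow.get("sizes", [])
    if not sizes:
        return None
    return ("bulk", 0.9) if sizes[0] > 1000 else ("interactive", 0.6)


modules = [("port", port_module, 0.9), ("size", size_module, 0.8)]
print(cascade_classify({"dst_port": 53, "sizes": [60]}, modules))  # ('DNS', 'port')
```

A flow with a small first packet on port 443 ends up unknown: `size_module` answers "interactive" with 0.6, which is below its 0.8 threshold, and no module remains.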
An old (yet new) idea
• Classifier selection
o Mixture of experts
o Cascade classification
• Classifier fusion
o Majority vote
o Weighted vote
o Naive Bayes combination
o Behavior Knowledge Space
o ...
Kuncheva L., “Combining pattern classifiers: methods and algorithms”, John Wiley & Sons, 2004
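For contrast with a cascade, where a flow stops at the first module that accepts it, fusion methods run every classifier and combine their label outputs. A minimal sketch of three of the listed combiners, following their textbook definitions; the labels, weights and training tuples in the usage lines are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def majority_vote(outputs: Sequence[str]) -> str:
    """Label predicted by the most classifiers (ties go to the first one seen)."""
    return Counter(outputs).most_common(1)[0][0]


def weighted_vote(outputs: Sequence[str], weights: Sequence[float]) -> str:
    """Each classifier's vote counts with its weight, e.g. its validation accuracy."""
    scores: Dict[str, float] = defaultdict(float)
    for label, w in zip(outputs, weights):
        scores[label] += w
    return max(scores, key=scores.get)


class BKS:
    """Behavior Knowledge Space: a lookup table from the tuple of classifier
    outputs to the true label seen most often with that tuple in training."""

    def __init__(self) -> None:
        self.table: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)

    def fit(self, outputs: List[Tuple[str, ...]], truth: List[str]) -> "BKS":
        for combo, label in zip(outputs, truth):
            self.table[combo][label] += 1
        return self

    def predict(self, combo: Tuple[str, ...]) -> Optional[str]:
        cell = self.table.get(combo)  # .get does not create empty cells
        return cell.most_common(1)[0][0] if cell else None  # None: unseen combination


print(majority_vote(["HTTP", "Skype", "HTTP"]))                    # HTTP
print(weighted_vote(["HTTP", "Skype", "Skype"], [0.9, 0.3, 0.4]))  # HTTP
bks = BKS().fit([("HTTP", "Skype")] * 3, ["Skype", "Skype", "HTTP"])
print(bks.predict(("HTTP", "Skype")))                              # Skype
```

Unlike the two votes, BKS can learn that a particular disagreement between classifiers usually means a third answer, but it needs enough training data to fill the table.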
Waterfall: the idea
Waterfall: practical system
[Figure: five-module cascade with dstip, dnsclass, portsize, npkts and port]
(Python source code available at mutrics.iitis.pl)
Flow features limited to first 10 seconds
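Limiting features to the first 10 seconds lets a decision be made without waiting for the flow to end. A minimal sketch, assuming each flow arrives as time-ordered (timestamp, size) pairs; the feature names are hypothetical and are not the actual inputs of the dstip, dnsclass, portsize, npkts or port modules.

```python
from typing import Dict, List, Tuple


def early_flow_features(
    packets: List[Tuple[float, int]], window_s: float = 10.0, first_n: int = 5
) -> Dict[str, object]:
    """Features computed only from packets seen within `window_s` seconds of the
    flow's first packet. `packets` holds (timestamp_s, size_bytes) pairs in
    arrival order. Feature names here are hypothetical."""
    if not packets:
        return {"packets": 0, "bytes": 0, "first_sizes": []}
    t0 = packets[0][0]
    early: List[int] = []
    for ts, size in packets:
        if ts - t0 >= window_s:
            break  # time-ordered input: no later packet can fall inside the window
        early.append(size)
    return {
        "packets": len(early),
        "bytes": sum(early),
        "first_sizes": early[:first_n],  # sizes of the first few packets
    }


pkts = [(0.0, 60), (0.5, 1500), (3.0, 1500), (9.9, 40), (10.0, 1500), (25.0, 1500)]
print(early_flow_features(pkts))
# {'packets': 4, 'bytes': 3100, 'first_sizes': [60, 1500, 1500, 40]}
```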
Waterfall: validation
• Over 3.5 TB of data in total
• Validation of spatial and temporal stability
Foremski P., Callegari C., Pagano M., “Waterfall: Rapid identification of IP flows using cascade classification”, Proceedings of the 21st International Conference on Computer Networks, CN 2014, CCIS 431, pp. 14-23, Springer, 2014
Validation: dataset 1
Validation: dataset 2
Temporal stability (8 months)
Validation: datasets 3 and 4
Spatial stability
No payloads
Experiment 1: more than 50% of traffic is easy to identify
Experiment 2: more modules, faster classification
[Figure: effect of adding specialized modules to the cascade]
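One possible reading of this result, sketched under stated assumptions: take speed to mean the average per-flow cost of the cascade, where a flow pays for every module it reaches, so the expected cost is the sum over modules of c_i times the product of (1 - a_j) over the modules j before it. The costs and acceptance rates below are invented; they only show how a cheap specialized module placed early can lower the average even though the cascade gets longer.

```python
from typing import List, Tuple


def expected_cost(stages: List[Tuple[float, float]]) -> float:
    """Expected per-flow cost of a cascade.

    Each stage is (cost, accept_rate): the cost of running the module on one
    flow, and the fraction of the flows *reaching* it that the module accepts.
    E[cost] = sum_i cost_i * prod_{j<i} (1 - accept_rate_j)
    """
    reach = 1.0  # fraction of all flows that reach the current module
    total = 0.0
    for cost, accept_rate in stages:
        total += reach * cost
        reach *= 1.0 - accept_rate  # accepted flows leave the cascade here
    return total


slow_only = [(10.0, 0.9)]                # one expensive module
with_fast = [(1.0, 0.5), (10.0, 0.9)]    # cheap module added in front
print(expected_cost(slow_only), expected_cost(with_fast))  # 10.0 6.0
```

The same reasoning applies if cost is read as the number of packets a module needs before deciding, which relates to reaction time rather than CPU load.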
Discussion
• Waterfall is a new architecture for TC
• We propose an idea and an open-source implementation
• A 5-element system yielded very good results
• Findings
o More than 50% of Internet traffic is easy to identify
o Adding more modules to the cascade can increase the speed
• Open questions
o Quantitative comparison: Waterfall vs. BKS
o How to train the system in an optimal way?
o How to put the modules in the proper order?
References
1. P. Foremski, On different ways to classify Internet traffic: a short review of selected publications, Theoretical and Applied Informatics, vol. 25, no. 2, 2013.
2. B.-C. Park, Y. J. Won, M.-S. Kim, and J. W. Hong, Towards automated application signature generation for traffic identification, in IEEE Network Operations and Management Symposium (NOMS 2008), pp. 160–167, IEEE, 2008.
3. S. H. Yeganeh, M. Eftekhar, Y. Ganjali, R. Keralapura, and A. Nucci, CUTE: Traffic Classification Using TErms, in 21st International Conference on Computer Communications and Networks (ICCCN 2012), pp. 1–9, IEEE, 2012.
4. T. Karagiannis, K. Papagiannaki, and M. Faloutsos, BLINC: Multilevel traffic classification in the dark, in ACM SIGCOMM Computer Communication Review, vol. 35, pp. 229–240, ACM, 2005.
5. A. Finamore, M. Mellia, M. Meo, and D. Rossi, KISS: Stochastic packet inspection classifier for UDP traffic, IEEE/ACM Transactions on Networking, vol. 18, no. 5, pp. 1505–1515, 2010.
6. A. Dainotti, A. Pescapé, and C. Sansone, Early classification of network traffic through multi-classification, in Traffic Monitoring and Analysis, pp. 122–135, 2011.
7. P. Foremski, C. Callegari, and M. Pagano, DNS-Class: Immediate classification of IP flows using DNS, International Journal of Network Management, John Wiley & Sons, 2014, DOI: 10.1002/nem.1864.
8. P. Bermolen, M. Mellia, M. Meo, D. Rossi, and S. Valenti, Abacus: Accurate behavioral classification of P2P-TV traffic, Computer Networks, vol. 55, no. 6, pp. 1394–1411, 2011.
9. G. Münz, H. Dai, L. Braun, and G. Carle, TCP traffic classification using Markov models, in Traffic Monitoring and Analysis, pp. 127–140, 2010.
Thank you!
Paweł Foremski
Website: http://mutrics.iitis.pl/
TC: definition
Internet traffic classification (or identification) is the act of matching IP packets to the applications that generated them. [1]
TC: the problem
• How to identify network traffic?
• How to do it well?
o With limited resources (on high-speed routers)
o With limited details (only packet headers)
o With good accuracy (no errors)
o In limited time (in real-time)
o For current and future protocols (flexibility and stability)
o For the whole Internet (backbone routers and gateways)
• How to measure the performance?
o Result accuracy
o Reaction time
o Temporal stability
o Spatial stability
o Processing time
o Unknown detection
Example: dnsclass
Foremski P., Callegari C., Pagano M., “DNS-Class: Immediate classification of IP flows using DNS”, International Journal of Network Management, John Wiley & Sons, 2014
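The title promises immediate classification, i.e. a label at the very first packet of a flow. A plausible minimal sketch of that idea, assuming the classifier watches DNS answers and labels a new flow by the domain name the client just resolved to that server IP; the `DnsFlowLabeler` class and its suffix-rule table are hypothetical stand-ins, not the paper's actual domain-name classifier.

```python
from typing import Dict, Optional, Tuple


class DnsFlowLabeler:
    """Hypothetical sketch: remember which name each client resolved to each
    server IP, then label a new flow from its first packet by that name."""

    def __init__(self, domain_rules: Dict[str, str]) -> None:
        self.domain_rules = domain_rules  # hypothetical: domain suffix -> label
        self.cache: Dict[Tuple[str, str], str] = {}

    def observe_dns_answer(self, client_ip: str, qname: str, answer_ip: str) -> None:
        """Record an address answer seen in DNS traffic (name normalized)."""
        self.cache[(client_ip, answer_ip)] = qname.lower().rstrip(".")

    def classify_flow(self, client_ip: str, server_ip: str) -> Optional[str]:
        """Label for a new flow, or None if no DNS name or no rule matches."""
        name = self.cache.get((client_ip, server_ip))
        if name is None:
            return None  # no preceding DNS lookup seen: leave for later modules
        for suffix, label in self.domain_rules.items():
            if name == suffix or name.endswith("." + suffix):
                return label
        return None


lab = DnsFlowLabeler({"youtube.com": "streaming", "skype.com": "chat"})
lab.observe_dns_answer("10.0.0.1", "www.YouTube.com.", "203.0.113.5")
print(lab.classify_flow("10.0.0.1", "203.0.113.5"))  # streaming
```

Keying the cache on the client as well as the server address avoids labeling one client's flow with a name that only a different client resolved.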
dnsclass: details
dnsclass: motivation