Internet Traffic Classification Demystified: Myths, Caveats, and Best Practices
KNOM Tutorial 2007, POSTECH, November 8, 2007

Hyunchul Kim (김현철), CAIDA (Cooperative Association for Internet Data Analysis)
Joint work with Marina Fomenkov, kc claffy (CAIDA) and Dhiman Barman, Michalis Faloutsos (UC Riverside)
Internet Traffic Classification
• Port number based
• Payload based
• Behavior based
• Flow features based
Questions
• Which traffic classification method is best? Under what conditions? Why?
• What are the contributions and limitations of each method?
Network Science ?????
盲人摸象: Blind men touching an elephant
Outline
• We evaluated the performance of:
– CoralReef [CoralReef 07]
– BLINC [Karagiannis 05]
– Six machine learning algorithms [WEKA 07]
• Data used: 7 traces with payload (3 backbone and 4 edge traces), from Japan, Korea, a trans-Pacific link, and the US
• Performance metrics:
– Per-trace: accuracy
– Per-application: precision, recall, and F-measure
– Running time
Datasets

| Trace (country) | Link type | Date (local time) | Start time & duration (local time) | Average utilization (Mbps) | Payload bytes per packet |
|---|---|---|---|---|---|
| PAIX-I (US) | OC48 backbone, uni-directional | 2004.2.25 Wed | 11:00, 2h | 104 | 16 |
| PAIX-II (US) | OC48 backbone | 2004.4.21 Wed | 19:59, 2h 2m | 997 | 16 |
| WIDE (US-JP) | 100 Mbps Ethernet backbone | 2006.3.3 Fri | 22:45, 55m | 35 | 40 |
| KEIO-I (JP) | 1 GE edge | 2006.8.8 Tue | 19:43, 30m | 75 | 40 |
| KEIO-II (JP) | 1 GE edge | 2006.8.10 Thu | 01:18, 30m | 75 | 40 |
| KAIST-I (KR) | 1 GE edge | 2006.9.10 Sun | 02:52, 48h 12m | 24 | 40 |
| KAIST-II (KR) | 1 GE edge | 2006.9.14 Thu | 16:37, 21h 16m | 28 | 40 |
Developing a classification benchmark
• Payload signatures of 50+ applications, drawn from:
– The BLINC work [Karagiannis 05]
– "Traffic Classification Using Clustering Algorithms" [Erman 06]
– Korean P2P/file-sharing applications [Won 06]
– Manual payload inspection
• Classification unit: the 5-tuple flow <srcip, dstip, protocol, srcport, dstport>, with a 64-second timeout, evaluated over 5-minute intervals
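The flow definition above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name and the packet-tuple layout are my own, and only the 5-tuple key and the 64-second inactivity timeout come from the slide.

```python
FLOW_TIMEOUT = 64  # seconds of inactivity before a flow is closed, as in the talk

def aggregate_flows(packets):
    """Group packets into 5-tuple flows with a 64 s inactivity timeout.

    `packets` is assumed to be an iterable of
    (timestamp, src_ip, dst_ip, protocol, src_port, dst_port),
    sorted by timestamp.  Returns a list of (key, [first_ts, last_ts, pkts]).
    """
    flows = {}      # active flows: 5-tuple key -> [first_ts, last_ts, pkt_count]
    finished = []   # flows closed by the timeout
    for ts, src, dst, proto, sport, dport in packets:
        key = (src, dst, proto, sport, dport)
        state = flows.get(key)
        if state is not None and ts - state[1] > FLOW_TIMEOUT:
            finished.append((key, state))   # idle too long: close the old flow
            state = None
        if state is None:
            flows[key] = [ts, ts, 1]        # start a new flow
        else:
            state[1] = ts                   # update last-seen time
            state[2] += 1
    finished.extend(flows.items())          # flush flows still active at the end
    return finished
```

With this definition, two packets 100 seconds apart on the same 5-tuple produce two distinct flows.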
Application breakdown
[Figure: per-trace application breakdown, as percentage of flows and percentage of bytes]
Tools evaluated
• CoralReef: port-number-based classification, version 3.8 (or later)
• BLINC: host-behavior-based classification, with 28 configurable threshold parameters
• WEKA: a collection of machine learning algorithms; we evaluated the 6 most often used / well-known algorithms, examining flow-attribute selection and training-set size vs. performance (accuracy / F-measure)
BLINC: behavior-based classification
How BLINC works: 1/2
[Figure: example BLINC data structure [Karagiannis 05B]: per host IP, per transport protocol (TCP/UDP), per source port, BLINC keeps {dstIPs}, {dstPorts}, {#pkts}, and {avg_pkt_size}]
• For each preconfigured interval (default 5 minutes), BLINC reads input flows and stores all behavior information for every host and port.
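A minimal sketch of such a per-host table. The record layout and field names here are illustrative assumptions, not BLINC's actual code; only the kept quantities ({dstIPs}, {dstPorts}, packet counts, average packet size) come from the slide.

```python
from collections import defaultdict

def new_port_record():
    """One record per (host, protocol, source port) in the current interval."""
    return {"dst_ips": set(), "dst_ports": set(), "pkts": 0, "bytes": 0}

# key: (src_ip, protocol, src_port) -> behavior record
behavior = defaultdict(new_port_record)

def observe_flow(src_ip, dst_ip, proto, src_port, dst_port, pkts, bytes_):
    """Fold one input flow into the per-host behavior table."""
    rec = behavior[(src_ip, proto, src_port)]
    rec["dst_ips"].add(dst_ip)
    rec["dst_ports"].add(dst_port)
    rec["pkts"] += pkts
    rec["bytes"] += bytes_     # avg_pkt_size can be derived as bytes / pkts

observe_flow("10.0.0.1", "10.0.0.2", "TCP", 80, 51000, 12, 9000)
```

A host answering many distinct destination IPs from one source port, as above, is the kind of pattern BLINC's heuristics then interpret as server behavior.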
How BLINC works: 2/2
• After populating the data structures with all input flows, BLINC:
1. Identifies per-host server behavior.
2. Identifies per-port server behavior.
3. Applies heuristics: community heuristics and recursive-detection heuristics.
4. Re-reads all flows and classifies them based on the results of steps 1-3.
Machine Learning How-To
[Figure: workflow. Raw IP trace (dag, pcap, etc.) → extract flow features and ground truth → split into training and testing flow records → convert each to ARFF format → train a model with a chosen algorithm (Bayesian, decision tree, etc.) → test → statistics (accuracy, true positives)]
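The "convert to ARFF format" step of the workflow can be illustrated with a toy writer; WEKA consumes ARFF files, so each flow record with its payload-derived ground-truth label becomes one data row. The attribute list below is a hypothetical subset for illustration, not the exact feature set used in the study.

```python
def flows_to_arff(flows, relation="traffic"):
    """Serialize flow records to WEKA's ARFF text format.

    `flows` is assumed to be a list of
    (protocol, src_port, dst_port, min_pkt_size, label) tuples.
    """
    lines = [
        f"@relation {relation}",
        "@attribute protocol numeric",
        "@attribute src_port numeric",
        "@attribute dst_port numeric",
        "@attribute min_pkt_size numeric",
        "@attribute class {WWW,DNS,Mail,P2P,Other}",
        "@data",
    ]
    for proto, sport, dport, minsz, label in flows:
        lines.append(f"{proto},{sport},{dport},{minsz},{label}")
    return "\n".join(lines)

print(flows_to_arff([(6, 51000, 80, 40, "WWW")]))
```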
Machine learning algorithms
Supervised machine learning algorithms evaluated, by family:
• Bayesian: Naïve Bayes, Bayesian Network [Moore 05, Zander 06]
• Decision trees: C4.5 [Williams 06]
• Rules-based learners [Williams 06]
• Functions: Support Vector Machine, Neural Networks [Auld 07, Nogueira 06]
• Lazy: k-Nearest Neighbors [Roughan 04] (still in progress)
Accuracy (Per-trace)
Accuracy = (# of correctly classified flows) / (total # of flows in a trace)
Per-application performance metrics
• Precision ("How precise is an application fingerprint?") = True Positives / (True Positives + False Positives)
• Recall ("How complete is an application fingerprint?") = True Positives / (True Positives + False Negatives)
• F-Measure (combination of precision and recall) = (2 × Precision × Recall) / (Precision + Recall)
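These three definitions translate directly into code:

```python
def precision_recall_f(tp, fp, fn):
    """Per-application metrics exactly as defined on this slide."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Example: 90 flows correctly labeled WWW, 10 non-WWW flows mislabeled
# as WWW, and 30 true WWW flows missed:
p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
# p = 0.90, r = 0.75, and f (their harmonic mean) lies between them
```

The F-measure punishes a fingerprint that trades one metric for the other, which is exactly the failure mode seen later for P2P (high precision, low recall) and Streaming (the reverse).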
Accuracy of CoralReef
[Figure: CoralReef per-trace flow accuracy (PAIX-I, PAIX-II, WIDE, KEIO-I, KEIO-II, KAIST-I, KAIST-II): over 71.8% for all traces]
CoralReef: Precision and Recall
| Application | Precision (%) | Recall (%) |
|---|---|---|
| DNS | >= 99.9 | >= 99.9 |
| Mail | >= 92.0 | >= 99.9 |
| News | >= 98.5 | >= 99.9 |
| SNMP | >= 98.9 | >= 98.9 |
| NTP | >= 99.9 | >= 99.9 |
| WWW | >= 98.8 | >= 94.3 |
| Chat | >= 96.3 | >= 89.7 |
| SSH | >= 85.2 | >= 94.5 |
CoralReef on P2P and FTP
[Figure: CoralReef precision and recall per trace for P2P (high precision, low recall) and FTP]
CoralReef on Streaming and Games
[Figure: CoralReef precision and recall per trace for Streaming (low precision, high recall) and Games]
Port number based approach: Lessons Learned
• Port numbers still identify many applications very accurately: Mail, News, NTP, SNMP, Chat, DNS, and WWW.
• Four causes of misclassification:
– Applications that often use non-default ports: P2P and FTP
– Applications whose default ports are abused by others: Streaming, Games, …
– Conflicts in default ports: SHOUTcast vs. HTTP (8000-8005, 8080)
– Local variants of common applications: IRC (low recall in Japan and Korea)
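The port-based approach itself is simple enough to sketch. The table below holds only a handful of illustrative well-known ports (6 = TCP, 17 = UDP); CoralReef ships a far larger, maintained port database, so this is a toy stand-in, not its actual table.

```python
# (protocol, port) -> application; a few illustrative defaults only
PORT_TABLE = {
    (6, 80): "WWW", (6, 443): "WWW",
    (6, 53): "DNS", (17, 53): "DNS",
    (6, 25): "Mail", (6, 110): "Mail",
    (6, 119): "News", (17, 123): "NTP",
    (17, 161): "SNMP", (6, 22): "SSH",
}

def classify_by_port(proto, src_port, dst_port):
    """Label a flow by its well-known port, checking both directions,
    since the server-side port may appear as either source or destination."""
    return (PORT_TABLE.get((proto, dst_port))
            or PORT_TABLE.get((proto, src_port))
            or "Unknown")

classify_by_port(6, 51234, 80)    # a web flow
classify_by_port(6, 51234, 6881)  # a P2P port absent from this toy table
```

The failure modes listed above fall straight out of this design: a P2P flow on a random port returns "Unknown" (hurting recall), and any application squatting on port 80 is labeled "WWW" (hurting precision).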
Tuning BLINC
• 28 different threshold parameters
– We tuned the four P2P-related parameters first, then WWW, DNS, …, for every single trace.
– A parameter set tuned for one link did not work well on other links: more than 25 rounds of trial and error per trace.
• Some if-statements in the code needed to be changed for P2P traffic identification.
Accuracy of BLINC
[Figure: BLINC flow accuracy per trace, run on <srcIP, srcport> pairs vs. on <dstIP, dstport> pairs]
BLINC on Mail and DNS
[Figure: BLINC precision and recall per trace for Mail and DNS: lower recall on backbone traces, due to asymmetric routing]
BLINC on P2P flows and bytes
[Figure: BLINC precision and recall for P2P per trace, in flows and in bytes. Flow recall exceeds byte recall, meaning BLINC missed large P2P flows]
BLINC : Lessons learned
• On backbone traffic, BLINC must be run in reverse (on <dstIP, dstport> pairs).
• Performance strongly depends on the link characteristics.
• Parameter tuning is too cumbersome.
• Lower byte accuracy.
• Incomplete fingerprints for some applications.
Selected key flow attributes
| Trace | Protocol | srcport | dstport | Payloaded or not | Min pkt size | TCP flags | Size of n-th pkt |
|---|---|---|---|---|---|---|---|
| KEIO-I | V | V | V | | | PUSH | 2, 8 |
| KEIO-II | V | V | V | | | PUSH | 1, 4 |
| WIDE | V | V | V | V | | SYN, PUSH | 4, 7 |
| KAIST-I | V | V | V | V | | SYN, RST, PUSH, ECN | 3, 5 |
| KAIST-II | V | V | V | V | | SYN, PUSH | 2, 3, 7 |
| PAIX-I | V | V | | | | SYN, ECN | 2, 9 |
| PAIX-II | V | V | V | V | | SYN, CWR | 1, 4 |

* CFS (Correlation-based Feature Selection [Williams 06]) was used.
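CFS scores a candidate feature subset by a "merit" that favors attributes correlated with the class but not with each other. The sketch below uses Pearson correlation for simplicity; WEKA's CFS uses symmetric uncertainty for discrete attributes, so treat this as an illustration of the idea rather than WEKA's implementation.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def cfs_merit(features, labels):
    """CFS merit of a subset of k features:
        merit = k * r_cf / sqrt(k + k*(k-1) * r_ff)
    where r_cf is the mean feature-class correlation and r_ff the mean
    feature-feature correlation.  High merit = predictive but non-redundant.
    """
    k = len(features)
    r_cf = sum(abs(pearson(f, labels)) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    r_ff = (sum(abs(pearson(features[i], features[j])) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

Adding a feature that merely duplicates an existing one raises r_ff without raising r_cf, so the merit drops; this is why CFS keeps the compact, non-redundant attribute sets shown in the table.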
Training set size vs. accuracy (machine learning algorithms)
Support Vector Machine achieves over 97.5% and 99% accuracy when only 0.1% and 1% of a trace, respectively, is used to train it.
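One plausible way to draw such tiny training sets while still covering rare applications is a per-class (stratified) random sample. The talk does not specify its exact sampling scheme, so this helper is an assumption for illustration:

```python
import random
from collections import defaultdict

def stratified_sample(flows, labels, fraction=0.001, seed=0):
    """Sample `fraction` of the flows from each application class.

    Sampling per class (rather than uniformly over the whole trace)
    keeps at least one example of rare applications even at 0.1%.
    """
    by_class = defaultdict(list)
    for flow, label in zip(flows, labels):
        by_class[label].append(flow)
    rng = random.Random(seed)
    sample = []
    for label, group in by_class.items():
        k = max(1, int(len(group) * fraction))   # never drop a class entirely
        sample.extend((f, label) for f in rng.sample(group, k))
    return sample
```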
Training set size vs F-Measure (SVM vs. C4.5)
F-Measure vs. training set size (SVM vs. Bayesian Net.)
F-Measure vs. training set size (SVM vs. k-Nearest.)
Running time of machine learning algos
[Figure: training time and testing time of each machine learning algorithm]
Machine Learning Algorithms : Lessons Learned
• Key flow features
– Protocol, port numbers, packet size info, and TCP header flags
– No site/location-dependent features!
• Support Vector Machine worked the best
– On every trace, for all applications!
– Caveat: running time
Accuracy Comparison
[Figure: per-trace accuracy comparison of all methods; annotations mark the trans-Pacific backbone trace and the traces with the highest and lowest proportions of P2P flows. Only 0.1% of each trace was used to train the machine learning algorithms.]
F-Measure of CoralReef
• Most applications (WWW, DNS, Mail, News, NTP, SNMP, SpamAssassin, SSL, Chat, Games, SSH, and Streaming) use their default ports in most cases.
[Figure: per-application F-measure of CoralReef, with high-precision/low-recall and low-precision/high-recall applications annotated]
F-Measure of BLINC
• Strongly depends on the link characteristics and traffic mixes.
[Figure: per-application F-measure of BLINC: lower recall on backbone traces (< 1% P2P flows) than on edge traces (4-13% P2P flows); high precision but low recall]
Conclusions
• Diverse traces and per-application performance evaluation let us understand the contributions and limitations of each method.
• Port numbers still identify many applications, and are powerful when used in combination with packet size info and TCP header flags.
• BLINC strongly depends on the link characteristics, and its parameter tuning is too cumbersome.
• Support Vector Machine worked the best and requires the smallest training set, at a cost in running time.
Future work
1. A robust traffic classifier: an SVM trained with samples from our traces, evaluated on 10 different payload traces (the 7 existing plus 3 new traces [DITL 07]); so far >= 94-96% accuracy on all of them.
2. Classifying header-only packet traces: a longitudinal study of traffic classification (Internet2/NLANR packet-header trace archive, etc.).
3. Graph-similarity-based traffic classification: automatically tuning the 28 parameters of BLINC.
Network Science !!!!!!!
A Day in the Life of the Internet
• At least annual measurements, with as many networks participating as possible.
• Most recent: January 9-10, 2007
– 7 DNS participants (C root, F root, K root, M root, AS112, B ORSN, M ORSN)
– 6+ network participants (WIDE, KAIST, POSTECH, Chungnam National University, AMPATH, CAIDA, Cambridge Univ., …)
• http://www.caida.org/projects/ditl/
“A Collaborative Internet Data Analysis in Korea”
• Do we understand the current Internet in Korea? How much? We are living in the future of other people.
• Does any one of us have all the required resources? Nope!
– Network, data, H/W, S/W, manpower, funding, time, …
– We need to work cooperatively/collaboratively on data collection, measurement, analysis, comparison, brainstorming, …
• Co-work with:
– ANF Measurement WG members
– Sue Moon (KAIST), Won-ki Hong (POSTECH), Choongsun Hong (KHU), Youngseok Lee (CNU), and Taek-keun Kown (CNU)
– CAIDA and WIDE
Acknowledgements
• This research was supported in part by the IT Scholarship Program of the Institute for Information Technology Advancement and the Ministry of Information and Communication, South Korea. http://www.mic.go.kr
• This research was supported in part by the National Science Foundation through TeraGrid resources provided by SDSC. http://www.nsf.gov
• This research was supported in part by the San Diego Supercomputer Center under SDS104 and utilized the DataStar system. http://www.sdsc.edu
References
[Allman 07] Allman et al., "A Brief History of Scanning," ACM Internet Measurement Conference 2007, Oct 2007.
[Auld 07] Auld et al., "Bayesian Neural Networks for Internet Traffic Classification," IEEE Transactions on Neural Networks, 18(1):223-239, January 2007.
[Bennet 00] Bennett, "Support Vector Machines: Hype or Hallelujah?," ACM SIGKDD Explorations, 2(2):1-13, October 2000.
[CoralReef 07] CoralReef. http://www.caida.org/tools/measurement/coralreef
[Dimov 07] Dimov, "Weka: Practical Machine Learning Tools and Techniques with Java Implementations," AI Tools Seminar, Saarland University, April 2007.
[DITL 07] Day in the Life of the Internet. http://www.caida.org/projects/ditl
[Erman 06] Erman et al., "Traffic Classification Using Clustering Algorithms," ACM SIGCOMM Workshop on Mining Network Data (MineNet), Pisa, Italy, September 2006.
[Karagiannis 05] Karagiannis et al., "BLINC: Multilevel Traffic Classification in the Dark," ACM SIGCOMM 2005, Philadelphia, PA, August 2005.
[Karagiannis 05B] Karagiannis et al., "BLINC: Multilevel Traffic Classification in the Dark," Technical Report, UC Riverside, 2005.
[Moore 05] Moore et al., "Internet Traffic Classification Using Bayesian Analysis Techniques," ACM SIGMETRICS 2005, June 2005.
[Nogueira 06] Nogueira et al., "Detecting Internet Applications Using Neural Networks," IARIA International Conference on Networking and Services, July 2006.
[Roughan 04] Roughan et al., "Class-of-Service Mapping for QoS: A Statistical Signature-Based Approach to IP Traffic Classification," ACM Internet Measurement Conference 2004, Oct 2004.
[WEKA 07] WEKA: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka
[Williams 06] Williams et al., "A Preliminary Performance Comparison of Five Machine Learning Algorithms for Practical IP Traffic Flow Classification," ACM SIGCOMM Computer Communication Review, 36(5):7-15, October 2006.
[Won 06] Won et al., "A Hybrid Approach for Accurate Application Traffic Identification," IEEE/IFIP E2EMON, April 2006.
[Zander 06] Zander et al., "Internet Archeology: Estimating Individual Application Trends in Incomplete Historic Traffic Traces," CAIA Technical Report 060313A, March 2006.
Backup slides
SVM Design Objective [Bennet 00]
Find the hyperplane that maximizes the margin.
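For linearly separable training pairs $(\mathbf{x}_i, y_i)$ with labels $y_i \in \{-1, +1\}$, this objective is the standard hard-margin quadratic program:

```latex
\min_{\mathbf{w},\, b} \quad \frac{1}{2}\lVert \mathbf{w} \rVert^2
\qquad \text{subject to} \qquad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1,
\quad i = 1, \dots, n
```

Minimizing $\lVert \mathbf{w} \rVert$ maximizes the geometric margin $2 / \lVert \mathbf{w} \rVert$ between the two classes, which is exactly the "maximize the margin" goal stated above.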
Why the maximum-margin hyperplane? [Bennet 00]
• Intuitively, it feels safest.
• A hyperplane is really simple.
• It is robust to outliers, since non-support vectors do not affect the solution at all.
• A small variation near the boundary has the least chance of causing a misclassification.
• Structural Risk Minimization theory (via the VC dimension) gives an upper bound on the generalization error.
• Empirically, it works very well.
[Figure: optimal vs. non-optimal separating hyperplanes]
F-Measure vs. training set size (SVM vs. Naïve Bayes)
Application breakdown: flows & packets
[Figure: per-trace application breakdown, as percentage of flows and percentage of packets]
Application breakdown: bytes & packets
[Figure: per-trace application breakdown, as percentage of bytes and percentage of packets]
F-Measure of BLINC
• Incomplete fingerprints for the behavior of FTP, Streaming, and Games.
• The threshold-based mechanism requires sufficient behavioral information per host: hence high precision but low recall, with lower recall on backbone traces (< 1% P2P flows) than on edge traces (4-13% P2P flows).
• Often misclassifies DNS flows on backbone traces.