Southeast University
Dept. of Cs. & Eng.
2008.8
AsiaFI School
Wang Yang
Southeast University
August 2008
FAME : Factor Analysis Based Metrics Exploring Algorithm
Outline
Introduction
Basics of FA
FAME algorithm
Experiments
Conclusion and Future work
Introduction
Basics of Metrics
Metrics are the basis of network behavior research
Different metrics are needed to describe the behavior of different network research objects.
Example
Objects of network behavior research
Different levels: links, packets, flows, sessions
Introduction
Basics of Metrics
Atomic metrics
Describe a direct property of the object that cannot be decomposed further
Derivative metrics
Derived from atomic metrics through a limited number of elementary operations; they reflect characteristics of the object.
Introduction
Atomic metrics exploring method
Rules: measurability; repeatability of the measurement process
Research intuition
Enumerate every possibility
IETF IPPM WG: connectivity; one-way delay; one-way packet loss rate
Introduction
Derivative metrics exploring method
Enumerate different operations on atomic metrics
Andrew Moore : mean, variance, FFT
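As a minimal sketch of this enumeration approach, the snippet below applies three operations (mean, variance, and a Fourier transform) to one atomic series. The packet sizes are made-up illustration data, the function names are hypothetical, and the naive DFT stands in for an FFT library call only to keep the example dependency-free.

```python
import cmath
import statistics

def dft_magnitudes(samples):
    """Naive O(n^2) discrete Fourier transform; returns |X_k| for each bin k."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]

def derivative_metrics(samples):
    """Derive a few candidate metrics from one atomic series."""
    mags = dft_magnitudes(samples)
    # Strongest non-DC frequency bin, searched up to the Nyquist bin.
    dominant = max(range(1, len(samples) // 2 + 1), key=lambda k: mags[k])
    return {
        "mean": statistics.mean(samples),
        "variance": statistics.pvariance(samples),
        "dominant_bin": dominant,
    }

# Hypothetical packet sizes (bytes) observed for one flow.
packet_sizes = [40, 1500, 40, 1500, 40, 1500, 40, 1500]
metrics = derivative_metrics(packet_sizes)
print(metrics)
```

Each extra operation multiplies the number of candidate metrics, which is why enumeration alone tends to produce many metrics of unknown usefulness.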
Introduction
Shortcomings
Atomic metrics:
Reflect what happens, not why or how
Derivative metrics:
No systematic method exists for finding them
Enumeration yields many useless metrics
We need a systematic method
Basics of FA
What is Factor Analysis?
A statistical method used to explain variability among observed variables in terms of fewer unobserved variables called factors.
It originated in psychometrics and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences that deal with large quantities of data.
The information gained about the interdependencies among variables can later be used to reduce the set of variables in a dataset.
Basics of FA
FA Example
Spearman
Performance on a wide variety of mental tests could be explained by a single underlying intelligence factor (a notion now rejected).
Basics of FA
Schema for common factor theory
Basics of FA
Mathematical model
X = (X_1, ..., X_p)' is the vector of p observable variables
F = (F_1, ..., F_m)' is an m × 1 vector of unobservable random variables (the factors)
a_ij is the factor loading, which describes the relationship between the source metric X_i and the factor metric F_j
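\[
\begin{aligned}
X_1 &= a_{11}F_1 + a_{12}F_2 + \cdots + a_{1m}F_m \\
X_2 &= a_{21}F_1 + a_{22}F_2 + \cdots + a_{2m}F_m \\
    &\;\;\vdots \\
X_p &= a_{p1}F_1 + a_{p2}F_2 + \cdots + a_{pm}F_m
\end{aligned}
\]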
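The model on this slide can be written compactly in matrix form. Here it is assumed that X stacks the p observable variables, F stacks the m factors, and A collects the loadings a_ij into a p × m matrix:

```latex
\[
X = A F, \qquad
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pm}
\end{pmatrix}, \qquad m < p .
\]
```

Textbook statements of the common factor model usually add a unique-factor term \varepsilon_i to each equation. The slide does not show one, so it is omitted here.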
FAME Algorithm
Algorithm
1. Select the set of original metrics X;
2. Obtain the observed data x for X through measurement;
3. Test whether x is suitable for factor analysis. If it is, go to step 4; otherwise return to step 1 and reselect the metrics;
4. Obtain the factor loading matrix A through factor analysis;
5. Assign a semantic meaning to each factor by inspecting A.
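Steps 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the experiments (which were run in SPSS 15). The suitability test is assumed to be the Kaiser-Meyer-Olkin (KMO) measure with a 0.5 threshold, loadings are extracted with the principal component method, factors are kept when their eigenvalue exceeds 1, and no rotation is applied. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure computed from a correlation matrix R."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    Q = -P / np.outer(d, d)                   # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(Q[off] ** 2)
    return r2 / (r2 + q2)

def fame(x, kmo_threshold=0.5):
    """Return (loading matrix A, KMO score); A is None if x fails the test."""
    R = np.corrcoef(x, rowvar=False)          # columns of x are metrics
    score = kmo(R)
    if score < kmo_threshold:
        return None, score                    # step 3 failed: reselect metrics
    eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    m = int(np.sum(eigvals > 1.0))            # assumed rule: keep eigenvalues > 1
    A = eigvecs[:, :m] * np.sqrt(eigvals[:m]) # unrotated loadings (step 4)
    return A, score

# Synthetic data: 500 observations of 6 metrics driven by 2 hidden factors.
rng = np.random.default_rng(0)
F = rng.normal(size=(500, 2))
pattern = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
x = F @ pattern.T + 0.33 * rng.normal(size=(500, 6))

A, score = fame(x)
print(A.shape, round(float(score), 2))
```

Step 5 stays a manual task: an analyst reads the rows of A and names each factor after the metrics that load heavily on it.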
Experiment
Experiment Setup
Environment
NetFlow data aggregated by host
Captured at the CERNET X Province border router (Cisco 7609)
SPSS 15
Two types of data
Traffic data for all IPs over the same time range
Traffic data for the same IP at different times
Experiment
Original metrics
Metric name   Meaning
ipkts         Number of incoming packets
opkts         Number of outgoing packets
iocts         Number of incoming octets
oocts         Number of outgoing octets
iflows        Number of incoming flows
oflows        Number of outgoing flows
iIPs          Different IP addresses connected to the host
oIPs          Different IP addresses connected by the host
iports        Different source ports seen in incoming flows
oports        Different destination ports seen in outgoing flows
pkts_r        Ratio of ipkts to opkts
octs_r        Ratio of iocts to oocts
flows_r       Ratio of iflows to oflows
IPs_r         Ratio of iIPs to oIPs
ports_r       Ratio of iports to oports
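The five ratio metrics are derivative metrics computed from the ten atomic counters. A minimal sketch follows, using made-up counter values for one host. Returning None when the outgoing counter is zero is an assumption, since the slides do not say how that case is handled, and the helper names are hypothetical.

```python
# Each ratio metric maps to its (incoming, outgoing) atomic counters.
RATIO_PAIRS = {
    "pkts_r": ("ipkts", "opkts"),
    "octs_r": ("iocts", "oocts"),
    "flows_r": ("iflows", "oflows"),
    "IPs_r": ("iIPs", "oIPs"),
    "ports_r": ("iports", "oports"),
}

def ratio(incoming, outgoing):
    """Incoming over outgoing; None if the denominator is zero (assumption)."""
    return incoming / outgoing if outgoing else None

def derive_ratios(host):
    """Compute the five ratio metrics for one host record."""
    return {name: ratio(host[i], host[o]) for name, (i, o) in RATIO_PAIRS.items()}

# Hypothetical per-host counters aggregated from flow records.
host = {
    "ipkts": 1200, "opkts": 300,
    "iocts": 900000, "oocts": 45000,
    "iflows": 40, "oflows": 20,
    "iIPs": 10, "oIPs": 10,
    "iports": 12, "oports": 0,
}
ratios = derive_ratios(host)
print(ratios)
```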
Experiment
Same time range all-IP traffic data
Variable   Factor 1   Factor 2   Factor 3   Factor 4
iIPs         .928       .329      -.034      -.014
oIPs         .930       .321      -.041      -.085
iports       .937       .316      -.074      -.019
oports       .920       .299      -.100      -.088
ipkts        .404       .901       .006      -.031
opkts        .329       .883      -.192      -.011
iocts        .154       .768       .260      -.083
oocts        .251       .792      -.289       .023
iflows       .638       .741      -.025      -.032
oflows       .607       .750      -.138      -.062
pkts_r       .008      -.099       .852       .395
octs_r      -.173       .017       .762      -.023
flows_r      .029      -.107       .766       .452
IPs_r       -.069      -.033       .187       .891
ports_r     -.068      -.015       .203       .883
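One simple way to read the loading table is to assign each variable to the factor on which its absolute loading is largest. The sketch below does this for the values reported above. The grouping rule is a common reading aid and an assumption here, not something the slides state, and the function name is hypothetical.

```python
# Loadings copied from the table: (factor 1, factor 2, factor 3, factor 4).
LOADINGS = {
    "iIPs":    (.928, .329, -.034, -.014),
    "oIPs":    (.930, .321, -.041, -.085),
    "iports":  (.937, .316, -.074, -.019),
    "oports":  (.920, .299, -.100, -.088),
    "ipkts":   (.404, .901, .006, -.031),
    "opkts":   (.329, .883, -.192, -.011),
    "iocts":   (.154, .768, .260, -.083),
    "oocts":   (.251, .792, -.289, .023),
    "iflows":  (.638, .741, -.025, -.032),
    "oflows":  (.607, .750, -.138, -.062),
    "pkts_r":  (.008, -.099, .852, .395),
    "octs_r":  (-.173, .017, .762, -.023),
    "flows_r": (.029, -.107, .766, .452),
    "IPs_r":   (-.069, -.033, .187, .891),
    "ports_r": (-.068, -.015, .203, .883),
}

def group_by_dominant_factor(loadings):
    """Map each factor number (1-based) to the variables that load most on it."""
    groups = {}
    for name, row in loadings.items():
        factor = max(range(len(row)), key=lambda k: abs(row[k])) + 1
        groups.setdefault(factor, []).append(name)
    return groups

groups = group_by_dominant_factor(LOADINGS)
for factor in sorted(groups):
    print(factor, groups[factor])
```

Note that iflows and oflows also load noticeably on factor 1 (.638 and .607), so a hard assignment hides some shared variance.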
Experiment
Same time range all-IP traffic data
Four factors
Active factor
The level of the user's interaction with the outside world
Throughput factor
The host's throughput in terms of the number of packets, bytes, and flows
Load factor
The host's tendency to provide or acquire traffic
Role factor
Whether the host acts as a client, a server, or a P2P peer
Experiment
Same IP different time traffic data
Variable   Factor 1   Factor 2
ipks         .982      -.056
opks         .980      -.094
iocts        .959      -.034
oocts        .953      -.120
iflows       .990      -.079
oflows       .988      -.110
iIPs         .935      -.241
oIPs         .920      -.295
ipors        .962      -.177
opors        .947      -.230
rpks        -.100       .893
rocts       -.181       .779
rflows      -.099       .963
rIPs        -.118       .921
rpors       -.119       .902
Experiment
Same IP different time traffic data
Two factors
Active factor
The level of the user's interaction with the outside world
Role factor
Whether the host acts as a client, a server, or a P2P peer
Conclusion and Future work
Conclusion
Factor analysis is a systematic method for exploring derivative metrics
Factor metrics can help explain and reduce the source atomic and derivative metrics
Future work
How to select source variables for factor analysis
How to compute the value of the factor metrics
References
V. Paxson, G. Almes, J. Mahdavi, M. Mathis. Framework for IP Performance Metrics. RFC 2330, May 1998.
A. W. Moore and D. Zuev. Discriminators for Use in Flow-Based Classification. Technical report, Intel Research, Cambridge, 2005.
Mingzhong Zhou. Study of Large-Scale Network IP Flows Behavior Characteristics and Measurement Algorithms. PhD thesis, Southeast University, August 2006.
Questions?
Thank You