A Meta-Analysis Approach for Feature Selection in Network Traffic Research. Daniel C. Ferreira, Félix Iglesias Vázquez, Gernot Vormayr, Maximilian Bachl, Tanja Zseby. Institute of Telecommunications, TU Wien.

Page 1: A Meta-Analysis Approach for Feature Selection in … conferences.sigcomm.org/sigcomm/2017/files/program... · 2017. 10. 27. · Institute of Telecommunications TU Wien. Network Traffic Analysis

A Meta-Analysis Approach for Feature Selection in Network Traffic Research

Daniel C. Ferreira, Félix Iglesias Vázquez, Gernot Vormayr, Maximilian Bachl, Tanja Zseby

Institute of Telecommunications, TU Wien

Network Traffic Analysis

August 2017, Reproducibility Workshop

[Diagram blocks: Traffic Observation; Packet/Flow Data; Feature Vector; Traffic Analysis; Post Processing; Results]

Feature Selection: Select most suitable features

Well-chosen Features → Simplified Analysis

[Figure labels: diameter; length; magnetic; weight]

Agree to Disagree

Source: Iglesias, Zseby: "Analysis of network traffic features for anomaly detection"; Machine Learning, 101 (2015), 1; 59-84.

Why a Meta-Analysis?
• Meta-Analysis common in other disciplines
  – Structures the state of the art
  – Combines existing results
  – Identifies agreements/disagreements in the community
  – Provides basis for gap analysis
• Provides information about
  – Availability of data and tools
  – Parameter settings
  – Validation methods
  – Terminology and notation

→ Supports reproducibility and comparability

Data Structure

Example: Features
• Base features
• Operations on base features
• Flow keys

[Figure annotations: Standard IPFIX Information Element; Non-IPFIX feature]
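
As a rough illustration of the three ingredients named on this slide, the sketch below models a feature as a base feature plus a chain of operations, next to a flow key. The class name, field names, the example flow key, and the non-IPFIX feature name are assumptions made for this example; they are not the schema used by the authors. The IPFIX names are a small hand-picked subset of the IANA registry.

```python
from dataclasses import dataclass

# A few names from the IANA IPFIX Information Element registry
# (a tiny, hand-picked subset for illustration only).
KNOWN_IPFIX_ELEMENTS = {
    "sourceIPv4Address",
    "destinationIPv4Address",
    "sourceTransportPort",
    "destinationTransportPort",
    "protocolIdentifier",
    "ipTotalLength",
    "packetTotalCount",
    "octetTotalCount",
}


@dataclass(frozen=True)
class Feature:
    """Hypothetical feature record: operations applied to a base feature."""
    base: str               # e.g. "ipTotalLength"
    operations: tuple = ()  # applied innermost first, e.g. ("mean",)

    @property
    def is_ipfix(self) -> bool:
        # True if the base feature is in the illustrative IPFIX subset.
        return self.base in KNOWN_IPFIX_ELEMENTS

    def describe(self) -> str:
        # Render nested operations, e.g. mean(ipTotalLength).
        text = self.base
        for op in self.operations:
            text = f"{op}({text})"
        return text


# Hypothetical flow key built from IPFIX element names.
flow_key = ("sourceIPv4Address", "destinationIPv4Address")

mean_len = Feature("ipTotalLength", ("mean",))
custom = Feature("interArrivalTime", ("log", "variance"))  # made-up non-IPFIX name

print(mean_len.describe(), mean_len.is_ipfix)
print(custom.describe(), custom.is_ipfix)
```

Keeping the base feature separate from the operations lets two papers that use, say, the mean and the variance of the same quantity be counted under one base feature.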

Example: Data Set

[Figure annotation: Dataset available]

Example: Algorithms

[Figure annotations: Tool available; Parameters not provided; Link to tools provided]

Initial Results
• 71 papers from years 2005 to 2017

[Figure labels: pcap; Anomaly Detection; Analysis Chain]

Initial Results
• Flow Definitions
  – 64.6% of papers that define a flow key use the classical 5-tuple (sIP, dIP, sPort, dPort, Protocol)
  – 70.8% use bi-directional flows
  – 83.1% use flow-based features
• Data Sets
  – 46.5% use at least one public data set
• Most Common Features
  – Number of papers that use a specific base feature
  – Number of papers weighted with their citations, log10(citations)
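
To make the flow definitions above concrete, here is a minimal sketch of a classical 5-tuple flow key with an optional bi-directional mode, followed by one simple flow-based feature (total bytes per flow). The function name `flow_key`, the packet tuples, and the way the two endpoints are ordered are assumptions for illustration; the surveyed papers may define direction differently.

```python
from collections import defaultdict


def flow_key(src_ip, dst_ip, src_port, dst_port, protocol, bidirectional=True):
    """Return a 5-tuple flow key; optionally direction-independent."""
    a = (src_ip, src_port)
    b = (dst_ip, dst_port)
    if bidirectional and b < a:
        # Order the endpoints canonically so both directions share one key.
        # (Plain tuple comparison; any consistent ordering would do.)
        a, b = b, a
    return (a[0], b[0], a[1], b[1], protocol)


# Made-up packets: (sIP, dIP, sPort, dPort, Protocol, length in bytes)
packets = [
    ("10.0.0.1", "10.0.0.2", 40000, 80, 6, 60),
    ("10.0.0.2", "10.0.0.1", 80, 40000, 6, 1500),
    ("10.0.0.1", "10.0.0.3", 40001, 53, 17, 72),
]

# Group packet lengths by bi-directional flow key.
flows = defaultdict(list)
for s_ip, d_ip, s_port, d_port, proto, length in packets:
    flows[flow_key(s_ip, d_ip, s_port, d_port, proto)].append(length)

# One simple flow-based feature per flow: total bytes.
total_bytes = {key: sum(lengths) for key, lengths in flows.items()}
print(total_bytes)
```

With `bidirectional=False` the first two packets would fall into separate uni-directional flows, which is one reason results from papers with different flow definitions are hard to compare directly.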

Most Recurrent Base Features
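
A ranking of this kind can be sketched as two counters: one counting papers per base feature, and one adding log10(citations) per paper. The records below are made up, and the handling of uncited papers (weight 0) is an assumption of this sketch, since the slides do not state it.

```python
import math
from collections import Counter

# Made-up records: (citations, base features used by the paper)
papers = [
    (100, {"packetTotalCount", "octetTotalCount"}),
    (10, {"octetTotalCount", "protocolIdentifier"}),
    (0, {"protocolIdentifier"}),
]

paper_count = Counter()
weighted_count = Counter()
for citations, features in papers:
    # Assumption: uncited papers get weight 0, since log10(0) is undefined.
    weight = math.log10(citations) if citations > 0 else 0.0
    for name in features:
        paper_count[name] += 1          # plain number of papers
        weighted_count[name] += weight  # citation-weighted count

print(paper_count.most_common())
print(weighted_count.most_common())
```

The logarithm dampens the influence of a few very highly cited papers, so the weighted ranking still reflects how many papers use a feature.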

Summary
• Meta-Analysis for Network Traffic Analysis
  – Supports comparability and reproducibility
  – Focus on feature selection, but much more information collected
• JSON files
  – Structured, searchable state of the art
  – Fast extraction of relevant information from papers
• Initial results
  – Most common features
  – Flow definitions
  – Usage of public data sets
→ Data allows for many further analysis opportunities
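
The "structured, searchable" point can be illustrated with a small query over JSON records. The field names (`title`, `year`, `public_dataset`, `features`) and both records are invented for this sketch; the actual JSON files may be organized differently.

```python
import json

# Two made-up records; the real files may be structured differently.
raw = """
[
  {"title": "Paper A", "year": 2009, "public_dataset": true,
   "features": ["octetTotalCount", "packetTotalCount"]},
  {"title": "Paper B", "year": 2015, "public_dataset": false,
   "features": ["protocolIdentifier"]}
]
"""
records = json.loads(raw)


def search(records, feature=None, public_only=False):
    """Return titles of records matching the given (hypothetical) criteria."""
    hits = []
    for rec in records:
        if feature is not None and feature not in rec["features"]:
            continue
        if public_only and not rec["public_dataset"]:
            continue
        hits.append(rec["title"])
    return hits


print(search(records, feature="octetTotalCount"))
print(search(records, public_only=True))

# Share of records that use a public data set.
share = sum(r["public_dataset"] for r in records) / len(records)
print(f"{share:.1%} of records use a public data set")
```

Statistics such as the share of papers using a public data set then reduce to a one-line aggregation over the records.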

Discussion
• Manual data curation → Errors
  – Involve authors (check and correct)
• Analysis just shows “preferred” features, methods
  – → not necessarily the best!
• Incentives to fill the database
  – Conferences can require accepted papers to be added
  – Students can add data when exploring the state of the art
  – A searchable database may increase citations for included papers
• All data, documentation, paper database available at: www.cn.tuwien.ac.at/meta

Thank you!
Contact: [email protected]