Building a usable visual analytics tool for network security [1]
Mike Just, Heriot-Watt University
Edinburgh, UK
12 July 2016 @ Dalhousie University
Halifax, NS, Canada
[1] Joint work with Muhammad Adnan (Leeds) and Lynne Baillie
Network security challenges
- Increase in
  - Number of users
  - Variety of connecting devices
  - Diversity of communicating applications
  - Amounts of network data
- Layered security model
  - Firewalls, IDS/IPS, ...
  - Active monitoring typically supported by textual or semi-visual tools as well as home-made scripts
  - Efficiency and effectiveness of these tools are challenged by the high volume and complexity of the data being generated
Visual analytics for network security
- State of the art still necessitates computer+human solutions
- Visual analytics has emerged as a promising approach to deal with the data overload
  - Network data is processed and presented in a visualisation
  - Visualisation is interpreted by a human, perhaps to identify possible attack traffic for further analysis
- Unfortunately, many proposed VA tools have failed to gain wide acceptance among network security professionals
Visual analytics for network security
Figure: VISUAL (Ball et al., 2004)
Figure: TNV (Goodall et al., 2006)
Figure: VisAlert (Foresti et al., 2006)
Figure: Itoh et al. (2006)
Figure: ClockView (Kintzel et al., 2011)
Figure: FloVis (Taylor et al., 2009)
Figure: NFlowVis (Mansmann et al., 2009)
Visual analytics for network security
- Several common issues
  - Target fairly broad use cases
  - Lack design justifications
  - Don’t necessarily meet user needs (match their work practices)
- “researchers come to us and say, here’s a visualization tool, let’s fit your problem to this tool. But what we need is a tool built to fit our problem” (Hao, VizSec 2013)
- Closest to our design are FloVis and NFlowVis
Our approach
Use case: detecting potential bandwidth depletion DDoS attacks
Approach
1 Started with a low-fidelity design of the proposed visual analytics tool based on existing design guidelines
2 Selected appropriate time series visualisations for the tool by performing a quantitative graphical perception study
3 Evaluated the proposed tool by designing and conducting a mixed-method user study
Our goal was not only to design a tool, but to do so via an effective user-centred design process
Talk Outline
1 Low-fidelity design
2 Time series visualisations
3 Proposed tool evaluation
Initial low-fidelity design
1 Network traffic overview
2 Data filters (a) packets & bytes (b) source & destination
3 Network traffic details
Initial LF design approach
- Use case: detection of bandwidth depletion DDoS attacks from network flow data
- Pre-design domain analysis of the use case identified the following characteristics
  - Causes a considerable increase in the amount of network traffic
  - Originates from multiple source IP addresses
  - Usually targets servers within a network
  - Usually targets well-known services/ports within a network (e.g., web and e-mail services)
- Shneiderman design: “Overview first, zoom and filter, then details-on-demand”
  - Hence, included options for tooltips and zoom
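The attack characteristics listed for the use case could be combined into a rough screening rule over flow records. The following is only a sketch under stated assumptions, not the tool's implementation: the `Flow` record, its field names, the port set and both thresholds are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """Hypothetical flow record; field names are assumptions."""
    minute: int      # time bin the flow falls into
    src_ip: str
    dst_ip: str
    dst_port: int
    n_bytes: int


# Assumed set of well-known services (e-mail and web).
WELL_KNOWN_PORTS = {25, 80, 443}


def suspicious_bins(flows, baseline_bytes, volume_factor=5.0, min_sources=100):
    """Return sorted (minute, dst_ip) pairs whose traffic to a well-known
    port exceeds volume_factor * baseline_bytes and comes from at least
    min_sources distinct source addresses. Thresholds are illustrative."""
    total = defaultdict(int)
    sources = defaultdict(set)
    for f in flows:
        if f.dst_port in WELL_KNOWN_PORTS:
            key = (f.minute, f.dst_ip)
            total[key] += f.n_bytes
            sources[key].add(f.src_ip)
    return sorted(
        key for key in total
        if total[key] > volume_factor * baseline_bytes
        and len(sources[key]) >= min_sources
    )
```

A rule like this only narrows down candidate time bins; in the approach described here the final judgement is left to the analyst looking at the visualisation.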
LF design validation
- Semi-structured design interviews with network security professionals
  - Asked about suitability of different components of the proposed tool
- Interviews coded and analysed using the constant comparative method (CCM)
  - Part of grounded theory
- Categories
  - Core design elements
  - Interaction techniques
  - Titles and legends
  - Placement of interface components
  - Network traffic details
  - Network traffic overview
LF design validation (some results)
- Interaction techniques
  - Simplification of data filters
  - Endorsement of interaction techniques (e.g., tooltips, zoom)
- Increased specificity for titles
  - From ‘main interactive visualisation’ to ‘network traffic overview’
  - From ‘details on demand’ to ‘network traffic details’
- Inclusion of baseline historical data for network traffic overview
Proposed tool – A sneak peek
Time series visualisation component
- Initial plan: Determine appropriate time series visualisation based on feedback from LF designs
- In fact, we introduced 10 visualisations as part of our LF validation
  - Scatter plot, line chart, silhouette/area chart, bar chart, horizon graph, radar chart, rectangular heatmap, circular heatmap, treemap and sunburst visualisation
- However, feedback was not conclusive
- Further research uncovered gaps in the study of time series visualisations
Time series visualisations
- Time series visualisations widely used
- Example: Network security analysis
  - Time (horizontal), number of packets (vertical)
  - Tasks such as maxima and comparison used to identify possible Denial of Service attacks
Time series visualisations
Several possible visual representations to use
Time series visualisations
- Which visual representation to use?
- What about user interaction?
- Dozens of research papers since the early 80s on visual representation and graphical perception
- Gaps re: some fundamental factors
  - Interaction techniques
  - Visual encodings
  - Coordinate systems
Gaps
- Interaction techniques
  - Graphical perception studies are commonly run in a static setting, limiting knowledge of user experience.
- Visual encodings
  - Effectiveness has been studied within and across position and colour visual encodings, but not area.
- Coordinate systems
  - Limited empirical evidence on Cartesian vs. polar coordinate systems for time series visualisations using different visual encodings.
Visual Representations
- Visual encodings: position, colour, and area
- For each, a Cartesian and a polar coord. system
- Interaction techniques: highlighting & tooltips
- Position encoding: Cartesian (line chart)
- Position encoding: Polar (radar chart)
- Colour encoding: Cartesian (rectangular heatmap)
- Colour encoding: Polar (circular heatmap)
- Area encoding: Cartesian (icicle plot)
- Area encoding: Polar (sunburst plot)
Visual Representation Summary
Graphical perception study
- 4 arrangements of two interaction techniques: no interaction, only tooltips, only highlighting, both highlighting & tooltips
- 3 visual encodings: position, colour, area
- 2 coordinate systems: Cartesian, polar
- 4 study tasks: maxima, minima, comparison, trend detection
- 96 (4x3x2x4) experimental conditions
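The 96 conditions are the full cross of the four factors. A minimal sketch of that enumeration follows; the string labels and dictionary keys are illustrative, and the chart assigned to each encoding and coordinate system pair follows the pairing given in the talk.

```python
from itertools import product

INTERACTIONS = ["none", "tooltips", "highlighting", "highlighting+tooltips"]
ENCODINGS = ["position", "colour", "area"]
COORDINATES = ["Cartesian", "polar"]
TASKS = ["maxima", "minima", "comparison", "trend detection"]

# Chart used for each (encoding, coordinate system) pair.
CHARTS = {
    ("position", "Cartesian"): "line chart",
    ("position", "polar"): "radar chart",
    ("colour", "Cartesian"): "rectangular heatmap",
    ("colour", "polar"): "circular heatmap",
    ("area", "Cartesian"): "icicle plot",
    ("area", "polar"): "sunburst plot",
}

# Full factorial cross: 4 x 3 x 2 x 4 = 96 conditions.
conditions = [
    {"interaction": i, "encoding": e, "coordinates": c, "task": t,
     "chart": CHARTS[(e, c)]}
    for i, e, c, t in product(INTERACTIONS, ENCODINGS, COORDINATES, TASKS)
]

# Each task is seen under 4 x 3 x 2 = 24 conditions.
per_task = [c for c in conditions if c["task"] == "maxima"]
print(len(conditions), len(per_task))
```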
Study Tasks
- Maxima
  - To identify the highest absolute value in a dataset
- Minima
  - To identify the lowest absolute value in a dataset
- Comparison
  - To compare two sets of data points to find out which set has the highest aggregated value
- Trend detection
  - To identify the subset of data (i.e., a week) with the lowest value increase (upward trend) within the dataset
- Task scenario
  - Presented as sales data of a fictitious company
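The four tasks can be stated as small functions over a daily series. This is a sketch of one reading of the task definitions, not the study's answer key: "aggregated value" is taken to be a sum, and "lowest value increase" is taken to be the smallest positive difference between the last and first day of a week. Both readings, and all function names, are assumptions.

```python
def maxima(values):
    """Index of the highest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def minima(values):
    """Index of the lowest value (first one on ties)."""
    return min(range(len(values)), key=values.__getitem__)


def comparison(set_a, set_b):
    """'A' or 'B', whichever set has the higher summed value ('A' on ties)."""
    return "A" if sum(set_a) >= sum(set_b) else "B"


def weakest_upward_week(values, week_len=7):
    """Index of the complete week with the smallest positive rise
    (last day minus first day), or None if no week rises."""
    weeks = [values[i:i + week_len]
             for i in range(0, len(values) - week_len + 1, week_len)]
    rises = {w: wk[-1] - wk[0] for w, wk in enumerate(weeks) if wk[-1] > wk[0]}
    return min(rises, key=rises.get) if rises else None
```

Maxima and minima are single look-ups, while comparison and trend detection need aggregation over several points, which is one way to see why the latter two are the harder tasks.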
Study Design
- Study design
  - 24 study participants (14 male, 10 female; 18-44 years old)
  - Within-subject factorial design with 96 (4x3x2x4) experimental conditions for each participant
- Experimental conditions
  - Counterbalanced visualisations and interactions
  - Tasks ordered simple to complex (Javed et al., 2010)
- Data for visual representations
  - 96 distinct, synthetic time series datasets (one for each condition) following Fuchs et al. (2013)
  - Each dataset had 112 data points (1 per day) over a 16-week period
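To make the shape of the study data concrete, here is a sketch that produces 96 series of 16 weeks x 7 days = 112 points and one possible counterbalancing order per participant. Both parts are assumptions: the seeded random walk is a placeholder and is not the generation procedure of Fuchs et al. (2013), and the cyclic Latin square is one common counterbalancing scheme, not necessarily the one used in the study.

```python
import random

WEEKS, DAYS_PER_WEEK = 16, 7


def synthetic_series(seed, start=100.0, step=10.0):
    """Placeholder generator: seeded random walk, one value per day.
    NOT the procedure of Fuchs et al. (2013); it only matches the
    stated length of 16 weeks x 7 days = 112 points."""
    rng = random.Random(seed)
    value, series = start, []
    for _ in range(WEEKS * DAYS_PER_WEEK):
        value = max(0.0, value + rng.uniform(-step, step))
        series.append(value)
    return series


def latin_square_order(items, participant):
    """Cyclic Latin square: participant p sees the items rotated by p."""
    shift = participant % len(items)
    return items[shift:] + items[:shift]


# One dataset per experimental condition.
datasets = [synthetic_series(seed) for seed in range(96)]

charts = ["line chart", "radar chart", "rectangular heatmap",
          "circular heatmap", "icicle plot", "sunburst plot"]
# With 24 participants and 6 charts, each chart comes first 4 times.
orders = [latin_square_order(charts, p) for p in range(24)]
```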
Study Procedure
Stage          Description
Introduction   Greetings, consent, demographic questionnaire, study explanation
Maxima         Task training, 24 conditions
Minima         Task training, 24 conditions
Comparison     Task training, 24 conditions
Trend detect.  Task training, 24 conditions

24 experimental conditions for each task (3 visual encodings x 2 coord. systems x 4 interact.)
Study data collected
- Effectiveness measured with four components, collected after each experimental condition
  - Completion time of an experimental condition (sec)
  - Accuracy of the given answer (binary)
  - Confidence in the given answer (5-point Likert)
  - Ease of use of a visualisation (5-point Likert)
- Final two collected via questionnaire per condition
Results: Interaction Techniques
- Interactivity enhanced user experience
  - Interaction significantly better than no interaction
    - Confidence and ease-of-use
  - No effect on completion time or accuracy
    - Exception: Minima, and colour encoding
- Textual (tooltips) better than highlighting
Results: Visual Encodings
- Completion, accuracy, confidence, & ease
  - Position & colour better: max, min, trend det.
    - Colour more accurate for minima
  - Area more effective for comparison task
Results: Coordinate Systems
- Completion, accuracy, confidence, & ease
  - Cartesian generally better than polar
  - Polar better for minima task with area
  - Negligible effect of coordinate system for colour
Key Findings
- Interactivity improved user experience
  - Improved confidence and ease of use, without a significant effect on completion time or accuracy.
- No “one-size-fits-all”
  - The choice of a visual representation should be based on the type of tasks
- Generally, Cartesian is better
  - Cartesian coordinate systems are generally comparable or more effective than Polar, except for visualisations that use area for minima.
Initial low-fidelity design (reminder)
1 Network traffic overview
2 Data filters (a) packets & bytes (b) source & destination
3 Network traffic details
Proposed tool – Line chart
Proposed tool – Icicle plot
Proposed tool – Updates
- Streamlining of source and destination filters
  - And radio buttons, rather than checkboxes
- Updates to some titles
- Inclusion of zoom interaction
- Visualisation choices
  - Line chart: Effectiveness for maxima, minima, and trend detection (could have also selected rectangular heatmap)
  - Icicle plot: Effectiveness for data comparison (could have also selected sunburst visualisation)
Tool Development and Dataset
- Developed as a web application
  - HTML5, CSS, JavaScript, and D3.js
  - MySQL to store network flow data, via PHP
- Network flow dataset from the VAST 2013 challenge
  - 8 GB of data with about 70 million network flow records
  - 15 days of network traffic collected from a simulated network
  - Includes four potential bandwidth depletion DDoS attacks
- We created three different variations of the dataset for our three experimental conditions
  - Original & increased/decreased traffic volume
  - Temporal position of DDoS attacks randomised
User Study
- We recruited 12 participants for a lab study to measure the tool’s effectiveness
- A within-subjects design with participants exposed to three conditions (counterbalanced)
  1 Tool with line chart only
  2 Tool with icicle plot only
  3 Tool with both visualisations available (radio button)
- Participants asked to find three possible network attacks
- Measures
  - Completion time and accuracy
  - Usability measured using SUS and NASA-TLX
  - Also conducted a post-evaluation design interview
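For readers unfamiliar with SUS, the standard scoring rule turns ten 1-5 Likert responses into a 0-100 score: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5. The sketch below implements that standard rule; the function name is illustrative, and nothing here describes how the study's own analysis scripts were written.

```python
def sus_score(responses):
    """Standard SUS scoring for ten responses on a 1-5 scale,
    given in questionnaire order (item 1 first)."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses, each between 1 and 5")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5
```

For example, answering 3 to every item gives a score of 50.0.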
Quantitative Results
Condition  Time (s)  Acc. (%)  SUS  NASA-TLX
Line       153       89        77   31
Icicle     129       89        76   31
Both       164       97        80   31
Average    149       92        78   31

No statistically significant difference between the conditions
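The Average row is consistent with an unweighted mean of the three condition rows, rounded to the nearest integer. A quick check, assuming each reported figure is a per-condition mean over the same participants (the dictionary keys are illustrative):

```python
results = {
    "Line":   {"time_s": 153, "accuracy_pct": 89, "sus": 77, "nasa_tlx": 31},
    "Icicle": {"time_s": 129, "accuracy_pct": 89, "sus": 76, "nasa_tlx": 31},
    "Both":   {"time_s": 164, "accuracy_pct": 97, "sus": 80, "nasa_tlx": 31},
}

measures = ["time_s", "accuracy_pct", "sus", "nasa_tlx"]
# Unweighted mean across the three conditions, rounded to an integer.
averages = {
    m: round(sum(row[m] for row in results.values()) / len(results))
    for m in measures
}
print(averages)
```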
Qualitative Results
- Post-evaluation semi-structured design interview
  - Interviews recorded and analysed similarly to the LF design
- Network traffic overview
  - Preference for line chart vs. icicle
  - Desire for ability to better compare data, e.g., view zoomed chart simultaneously with original
- Network traffic details
  - Desire for more detail and interaction
- Interactive functionality
  - Desire for more detail with tooltips
Looking Ahead
- Future work on time series visualisations
  - Increased study of interactivity
  - Offset, interaction effects, different tasks and interactions
  - Use in different domains
- Visualisations for network security
  - Challenge to meet needs/desires of network security professionals
  - Challenge to convey information in visualisations. Max/min are “easy”. Comparison and trend detection more challenging.
  - Approaches need to start with clear use case, and requirements (e.g., involvement of end-user professionals)
Further reading
Work on time series visualisations was published at CHI’16. Paper available from my website.
Contact
Interactive & Trustworthy Technologies (ITT)
  Web http://www.ittgroup.org/
  Twitter @ITT Research
Mike Just
  Web http://www.justmikejust.co.uk/
  Email [email protected]