Making sense of data visually: a modern look at data visualization
TRANSCRIPT
Making sense of data visually:
A modern look at data visualization
VLADIMIR MILEV
NEW VENTURE SOFTWARE
Author Bio
Vladimir Milev
MCPD Enterprise
Speaker (Devreach, NTK Slovenia and others)
DV Evangelist
Founder at New Venture Software
@vmilev
www.linkedin.com/in/vladimirmilev/
http://www.newventuresoftware.com/
Agenda
1. Big data and information overload
2. What problems DataViz solves
3. DataViz fundamental theory
4. Basic visualizations
5. Advanced visualizations
Information Overload
Twitter: 500 million tweets per day
Facebook: 55 million status updates per day
Facebook: 900 million interactions per day (comments, likes etc.)
Reddit:
Proliferation of smart devices
We are already living in a world dominated by smart devices
What is the meaning of this?
More connected, data is more accessible
Less space for tables and text
Must use visual communication
Making Sense of Data
Increasing amount of data available
Increasing number of data consumer devices
Obtaining data no longer a problem
We have an Information Overload issue
Quick data analysis is the new problem
But how quick?
A Picture is worth 1000 words
“With about 1,000,000 ganglion cells, the human retina would transmit data at roughly the rate of an Ethernet connection, or 10 million bits per second.”
- Vijay Balasubramanian, PhD, Professor of Physics at U Penn
OK – That’s a lot of bandwidth
BUT ARE WE USING IT EFFICIENTLY?
Efficiency
Best readers usually read up to about 300 words per minute.
Average word length is 5.1 letters
300 * 5.1 = 1530 characters per minute
Or 1530 / 60 = 25.5 characters per second (rounded up to 26)
1 character is usually stored as 8 bits
26 * 8 = 208 bits per second
Reading bandwidth is ~0.025 KiB/s
Or 0.00208% efficiency (208 bits per second out of the retina's 10 million bits per second)
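The arithmetic on this slide can be replayed in a few lines of Python. This is only a sketch of the slide's own back-of-the-envelope estimate: the constants are the slide's figures, and the constant and function names are made up for illustration.

```python
import math

# All figures below come from the slides; the names are illustrative only.
WORDS_PER_MINUTE = 300               # fast reader
AVG_WORD_LENGTH = 5.1                # letters per word
BITS_PER_CHAR = 8                    # one character stored as one byte
RETINA_BITS_PER_SECOND = 10_000_000  # estimate quoted on the previous slide


def reading_bits_per_second():
    """Reading speed expressed in bits per second."""
    chars_per_second = WORDS_PER_MINUTE * AVG_WORD_LENGTH / 60
    # The slide rounds 25.5 characters per second up to 26.
    return math.ceil(chars_per_second) * BITS_PER_CHAR


def reading_efficiency_percent():
    """Share of the retina's estimated bandwidth that reading uses."""
    return reading_bits_per_second() / RETINA_BITS_PER_SECOND * 100


bits = reading_bits_per_second()
print(bits, "bits per second")
print(round(bits / 8 / 1024, 3), "KiB/s")
print(round(reading_efficiency_percent(), 5), "% efficiency")
```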
So reading clearly isn’t the way to go…
BUT WHAT IS THE SOLUTION?
Using statistics
For most of the 20th century
Using arithmetic mean (average), standard deviation
Variance, correlations, regressions
Turns out this is not good enough
Anscombe’s Quartet
       I              II             III             IV
   x      y       x      y       x      y       x      y
  10    8.04     10    9.14     10    7.46      8    6.58
   8    6.95      8    8.14      8    6.77      8    5.76
  13    7.58     13    8.74     13   12.74      8    7.71
   9    8.81      9    8.77      9    7.11      8    8.84
  11    8.33     11    9.26     11    7.81      8    8.47
  14    9.96     14    8.10     14    8.84      8    7.04
   6    7.24      6    6.13      6    6.08      8    5.25
   4    4.26      4    3.10      4    5.39     19   12.50
  12   10.84     12    9.13     12    8.15      8    5.56
   7    4.82      7    7.26      7    6.42      8    7.91
   5    5.68      5    4.74      5    5.73      8    6.89
• Statistical properties are nearly identical:
• Mean of X (9.0) and Y (7.5) values are constant
• Nearly the same variances, correlations and regressions
• As far as statistics is concerned these sets are almost the same
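To check this claim, the summary statistics can be computed directly from the table. The sketch below uses only the Python standard library; the data is copied from the slide, while the names QUARTET and summarize are illustrative.

```python
from math import sqrt

# Data copied from the table; sets I to III share the same x values.
X_COMMON = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_COMMON, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_COMMON, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_COMMON, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Mean, sample variance, Pearson correlation and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (xs, ys) in QUARTET.items():
    stats = summarize(xs, ys)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four rows come out almost the same, which is the point of the slide: the differences only show up once the points are plotted.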
Anscombe’s Quartet
So DataViz is very powerful
But why does it work so well?
Gestalt Psychology
Seeing with the brain
The mind understands external stimuli as a whole rather than as the sum of their parts
We tend to order our experience in a manner that is regular, orderly, symmetric, and simple
Key principles of gestalt: reification, multistability, invariance
Gestalt laws of grouping: proximity, similarity, closure, symmetry
Gestalt Principles - Reification
Our minds tend to construct/generate information
Gestalt Principles - Multistability
The tendency of our mind to jump back and forth between ambiguous alternative interpretations
Examples: Spinning Girl, Rubin Vase
Gestalt Principles - Invariance
The tendency to perceive simple geometric objects independent of rotation, translation, and scale
Also elastic deformations, different lighting, and different component features
Gestalt Laws of Grouping - Similarity
We group objects based on visual similarity
Gestalt Laws of Grouping - Proximity
We group items based on spatial proximity
Gestalt Laws of Grouping - Closure
We perceive objects such as shapes, letters, pictures, etc., as being whole when they are not complete
Application in Data Visualization
Introducing the visual variables
Fundamental properties of objects which can encode information into a picture
Fundamental visual variables:
◦ Position
◦ Size
◦ Color
◦ Shape
◦ Orientation
Basis for all Data Visualization!
Basic/Common Visualizations
Bar graphs
Line graphs
Area charts
Pie charts
Bar Graphs
• Using color correctly to encode gender
• Using position (ordering) to create an orderly scale
• Using size to encode the values
• Using orientation to differentiate gender again
Bar Graphs continued
• Labels are used
• Color is neutral and does not encode information
• Again, we have top-down ordering (position)
• And again size encodes the relative numeric value
Bars and Normal Distribution
Minimum passing grade
• Distribution of test scores for Polish “Matura” exam
• Normal Distribution is expected
• Red line shows normal distribution
• 30 is the minimum expected grade
• Detecting behavioral changes
• What happened?
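A reference curve like the red line on this chart can be approximated by computing how many test takers a normal distribution would put at each score, then comparing against the observed bars. The sketch below is not the method used for the original chart: it assumes integer scores from 0 to 100, and the mean, standard deviation and observed counts are inputs the caller has to supply. The function names are illustrative.

```python
from statistics import NormalDist


def expected_counts(mean, std, total, max_score=100):
    """Expected number of test takers at each integer score 0..max_score.

    Assumes scores follow a normal distribution; each integer score s
    collects the probability mass between s - 0.5 and s + 0.5.
    """
    dist = NormalDist(mean, std)
    return [
        total * (dist.cdf(score + 0.5) - dist.cdf(score - 0.5))
        for score in range(max_score + 1)
    ]


def deviations(observed, expected):
    """Observed minus expected count per score; large values stand out."""
    return [o - e for o, e in zip(observed, expected)]
```

With real counts per score, a strongly negative deviation just below the passing grade followed by a strongly positive one at the passing grade would be the numeric counterpart of what the bars show visually.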
Line Graphs
Confirming what we already know: paper media is declining rapidly.
• Shape encodes the value
• Color is not significant
• Design goal is to show a trend/change
Area Graphs
Effect of school year on Team Fortress 2 players
School starts
• Similar to line graph
• Design goal for area charts is to emphasize the value/quantity, not so much the trend
• You can see both
• Color has no meaning
Area Graphs continued
• This time color carries a meaning (legend)
• The graph is also good for displaying ratio between series of data over time
Pie Charts
Golden Rules for Pie Charts
• Ratio of one piece to the whole
• Order the values
• Fewer than 6 pieces
• Avoid legends
• Sum up to 100%
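Three of these rules can be checked mechanically before a chart is drawn. The sketch below is a hypothetical helper, not part of any charting library. It assumes the slices are given as percentages and that "order the values" means largest first; the legend and part-to-whole rules are design decisions it cannot check.

```python
def check_pie_chart(percentages, max_slices=5, tolerance=0.01):
    """Return a list of golden-rule violations for the given slices.

    percentages: slice sizes in percent, in the order they will be drawn.
    """
    problems = []
    if len(percentages) > max_slices:
        problems.append(f"too many slices: {len(percentages)} (use fewer than 6)")
    # Assumption: ordering means descending, largest slice first.
    if list(percentages) != sorted(percentages, reverse=True):
        problems.append("slices are not ordered from largest to smallest")
    if abs(sum(percentages) - 100) > tolerance:
        problems.append(f"slices sum to {sum(percentages)}%, not 100%")
    return problems


print(check_pie_chart([40, 30, 20, 10]))
print(check_pie_chart([10, 20, 30, 30]))
```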
Abusing Pie Charts
Don’t break the rules!
Maps
Plot millions of journal entries from 18th and 19th century ship logs, and you reveal a picture of ocean trade you've never seen before
• Visualization of routes
• Color saturation indicates heavily used routes
Maps are good with animations too
• Concentration of NO2 from 2005 to 2011
• Using both color and position to encode concentration
• Using continuous color scale
• Adding another dimension: time
Choropleth Maps
Displaying the most popular name for a newborn in each state
• Using discrete palette to encode information
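The difference between a continuous color scale (as on the NO2 map) and a discrete palette (as on the names map) can be sketched in code. Both functions below are illustrative only: the white-to-red ramp, the function names and the hex colors are assumptions, not the colors used in the original maps.

```python
def continuous_color(value, vmin, vmax, low=(255, 255, 255), high=(255, 0, 0)):
    """Map a number to an RGB color by linear interpolation between two colors.

    Values outside [vmin, vmax] are clamped to the ends of the scale.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))


def discrete_palette(categories, colors):
    """Assign one fixed color to each distinct category (for example a name)."""
    unique = sorted(set(categories))
    if len(unique) > len(colors):
        raise ValueError("not enough colors for the number of categories")
    return dict(zip(unique, colors))


# Hypothetical inputs, for illustration only.
print(continuous_color(7.5, 0, 10))
print(discrete_palette(["Emma", "Liam", "Emma"], ["#1f77b4", "#ff7f0e"]))
```

A continuous scale suits measured quantities such as concentration; a discrete palette suits categories that have no natural order.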
Heat Maps
• Excellent for plotting recurring values
• Color saturation/brightness encodes the values
• Position also encodes information
• Easy to spot concentrations and find patterns
Heat Maps in medicine/genetics
Tree Maps
• Excellent for representing hierarchical data
• Color carries a meaning
• Size carries a meaning as well
• Position is irrelevant
• Suitable for annotations
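How size ends up encoding value in a tree map can be shown with the simplest layout scheme, often called slice-and-dice: each node's rectangle is split among its children in proportion to their totals, alternating between horizontal and vertical cuts. This is a minimal sketch, not the algorithm of any particular tool. The (label, value-or-children) tuple format and the function names are assumptions made for the example.

```python
def node_size(node):
    """Total value of a node: its own value, or the sum of its children."""
    _label, payload = node
    if isinstance(payload, list):
        return sum(node_size(child) for child in payload)
    return payload


def layout(node, x, y, width, height, horizontal=True):
    """Return (label, x, y, width, height) rectangles for all leaf nodes."""
    label, payload = node
    if not isinstance(payload, list):
        return [(label, x, y, width, height)]
    total = node_size(node)
    rects = []
    offset = 0.0
    for child in payload:
        share = node_size(child) / total
        if horizontal:
            child_width = width * share
            rects += layout(child, x + offset, y, child_width, height, False)
            offset += child_width
        else:
            child_height = height * share
            rects += layout(child, x, y + offset, width, child_height, True)
            offset += child_height
    return rects


# Hypothetical hierarchy: leaf values are arbitrary illustrative numbers.
tree = ("root", [("a", 3), ("b", [("c", 1), ("d", 1)])])
for rect in layout(tree, 0, 0, 100, 50):
    print(rect)
```

Each leaf's area is proportional to its value, while where a rectangle lands depends only on the layout scheme, which matches the note that position carries no meaning.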
Parallel Coordinates Plot
• Interactive visualization
• Good at displaying relationships between different dimensions of data
• Position encodes dimension
• Color encodes scale
Parallel Coordinates Plot – in action
Selecting a subset of a dimension to display the relationships with the other dimensions
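Underneath the interaction, this kind of selection (often called brushing) is a filter: keep the records whose value on the chosen dimension falls inside the selected range, then redraw only those lines across all axes. The sketch below assumes each record is a dict keyed by dimension name; the function name and sample records are hypothetical.

```python
def brush(rows, dimension, low, high):
    """Keep only the rows whose value on `dimension` lies within [low, high]."""
    return [row for row in rows if low <= row[dimension] <= high]


# Hypothetical records, for illustration only.
rows = [
    {"weight": 1200, "power": 90, "year": 1995},
    {"weight": 1800, "power": 150, "year": 2003},
    {"weight": 2400, "power": 210, "year": 2010},
]

# Select a subset on one axis; the surviving rows keep all their other
# dimensions, so their relationships remain visible on the remaining axes.
selected = brush(rows, "weight", 1500, 2500)
print(selected)
```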
Chord Diagram
• Similar to Parallel Coordinates plot
• Color and Position used to encode data
• Design is different
• Filtering of dimensions is not a design goal
• Focuses on selecting a whole dimension
Some resources
http://www.reddit.com/r/dataisbeautiful/
http://blog.visual.ly/
http://flowingdata.com/
http://eagereyes.org/
http://www.perceptualedge.com/blog/
Thank You!