Infographics and Visualisation (or: Beyond the Pie Chart) LSS: ITNPBD4, 6 and 10 Oct 2015


ITNPD4: Applications of Big Data 2

Overview

–  Why infographics and visualisation?
–  What’s the problem we’re trying to solve?
–  What makes for good infographics and visualisations?
–  Where are we now in this area?

The problem

•  Data analysis may tell you something about the structure of a problem
•  Or may predict how to optimise something
    –  Profit, energy usage etc.
•  BUT:
    –  In general you will have to convince someone
    –  And they may not be convinced by the numbers on their own
•  They expect some sort of graphic that they can show to the Board/CEO to convince them
    –  A visualisation, perhaps an infographic.

Visualisation and infographics

•  Visualisation is the generic name for displaying data
    –  May be a single image
    –  Or a movie, for example.
    “Visualizations help people see things that were not obvious to them before” (SAS website)
•  There is also sonification, where data is sounded out: this works because our ears are very good at picking up patterns.
    –  E.g. Geiger counter, reversing systems in modern cars.
•  Infographics are generally single images
    –  Providing a visualisation of a specific set of data.

Infographics

•  An infographic is a picture that displays information in an accessible and/or informative way.
•  Can be quite simple
•  …or quite complex

…not a new idea (Minard, 1869)!

The standard text in this area is E. R. Tufte, “The visual display of quantitative information”

Infographic shows the troops and troop movements on the Eastern Front in World War 2.

Visualisation of low-dimensional datasets

•  Low-dimensional datasets are often visualised as simple X/Y graphs: but even here there are issues
    –  For both X and Y axes:
        •  Offset (is the origin at 0?)
        •  Scale
        •  Linear or logarithmic?
        •  Continuous or broken axes.
    –  Graph lines:
        •  One or more than one?
        •  Line style: continuous, dashed, dotted…
        •  Line colour
        •  Symbols and/or lines?
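
To make the linear-versus-logarithmic choice concrete, here is a minimal Python sketch of how a value is placed along an axis. The function name `axis_position` and the axis range 1 to 1000 are assumptions chosen for illustration; they do not come from the slides.

```python
import math

def axis_position(value, lo, hi, log=False):
    """Map a data value to a 0..1 position along an axis spanning [lo, hi]."""
    if log:
        if value <= 0 or lo <= 0:
            raise ValueError("a logarithmic axis cannot show zero or negative values")
        # On a log axis, positions are computed from the exponents.
        value, lo, hi = math.log10(value), math.log10(lo), math.log10(hi)
    return (value - lo) / (hi - lo)

# Compare where the same values land on a linear and on a logarithmic axis.
for v in (1, 10, 100, 1000):
    linear = axis_position(v, 1, 1000)
    logarithmic = axis_position(v, 1, 1000, log=True)
    print(v, round(linear, 3), round(logarithmic, 3))
```

On the linear axis the values 1, 10 and 100 are squeezed into the first tenth of the axis; on the logarithmic axis each factor of ten takes up the same distance. This is also why a logarithmic axis cannot include 0.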

[Figure: four plots combining a linear (0 to 1000) or logarithmic (10^0 to 10^3) X axis with a linear (-1 to 1) or logarithmic (10^-4 to 10^1) Y axis]

[Figure: two plots, each with a linear X axis from 0 to 1000 and a linear Y axis from -1 to 1]

Visualising high dimensional datasets

•  This is harder: and can be where infographics come in
    –  Cannot do this directly.
        •  Can plot two or three dimensions directly, but not more
        •  Clever infographics can plot more dimensions, for example using geographical location, lines of varying thickness and colour, multiple symbols
    –  How can we show the structure of such datasets?
        •  When we can’t think of one-off target-domain clever tricks …
    –  Discuss earlier infographics
•  Clearly depends on what we are trying to show!
    –  Geography as timeline, for example
    –  See also http://www.creativebloq.com/graphic-design-tips/great-infographic-design-tips-1232813

What can we do in general

•  Let’s say that we don’t have any inspiration for designing a good infographic (!)
    –  Infographics often depend on specific factors
        •  E.g. dates, geographic distribution, …
•  Can we find 2 or 3 (or even a few more) dimensions that …
    –  … in some sense
        •  … summarise (what we want to emphasise about) the dataset?
•  Ways forward: projecting and clustering

Choosing dimensions and projecting data

•  If the data is evenly spread throughout all the dimensions and has no structure?
    –  Give up. There’s nothing to be learned from it.
•  Datasets that have something to tell us have some form of structure
•  Maybe the data lie (largely) on a lower-dimensional subset of the high-dimensional space.
    –  As opposed to being spread evenly throughout the original space.

Example

•  Say that we have 3-dimensional data, sampled over time
    –  Each point is (x,y,z,t): really 4-dimensional data
        •  and -1 <= x,y,z <= 1, 0 <= t <= 10 (the points (x,y,z) are inside a sphere of radius 1, centred at the origin)
        •  Let’s also say that at each time t, sqrt(x^2+y^2+z^2) = t/10
    –  So that the points at time t are on the surface of a sphere of radius t/10
•  Clearly, if we simply look at all the (x,y,z) points (ignoring t) they are spread throughout the sphere
    –  But not in an unstructured way
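
A small Python sketch of this example, under some assumptions of our own: the times t are drawn uniformly from [0, 10], directions are drawn uniformly over the sphere, and the names `sample_point`, `radii` and `max_residual` are illustrative only.

```python
import math
import random

def sample_point(t, rng):
    """Return (x, y, z, t) with (x, y, z) on the sphere of radius t/10."""
    # A normalised Gaussian vector points in a uniformly random direction.
    gx, gy, gz = (rng.gauss(0.0, 1.0) for _ in range(3))
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    r = t / 10.0
    return (r * gx / norm, r * gy / norm, r * gz / norm, t)

rng = random.Random(0)
points = [sample_point(rng.uniform(0.0, 10.0), rng) for _ in range(1000)]

# Ignoring t, the radii are spread over the whole range 0..1.
radii = [math.sqrt(x * x + y * y + z * z) for x, y, z, _ in points]

# Using t, every point satisfies radius = t/10 up to rounding error.
max_residual = max(abs(r - p[3] / 10.0) for r, p in zip(radii, points))

print(min(radii), max(radii), max_residual)
```

Looking only at (x,y,z) the cloud fills the sphere, yet one extra relation (radius against t) accounts for all of the spread. That relation is the structure we would want a visualisation to reveal.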

Discovering structure in data

•  There are many techniques for discovering structure
    –  Principal component analysis (PCA)
        •  Linearly projecting a high dimensional dataset on to a smaller number of dimensions
        •  In such a way that as much as possible of the variance in the data is contained in this smaller number of dimensions
        •  And the dimensions are orthogonal to each other
        •  Well-understood and commonly used technique for data dimension reduction
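
As a rough illustration of the projection idea, the sketch below finds only the first principal component of a small synthetic 3-dimensional dataset, using power iteration on the covariance matrix. Power iteration is our choice for a dependency-free example; numerical libraries normally use an eigen-decomposition or SVD instead. The dataset (points scattered around the direction (2, 1, 0)) and all function names are assumptions made for the example.

```python
import math
import random

def mean_centre(rows):
    """Subtract the per-dimension mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of rows that are already mean-centred."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Power iteration: converges to the direction of largest variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

# Synthetic data: 3-D points lying close to the line through (2, 1, 0).
rng = random.Random(1)
data = []
for _ in range(500):
    s = rng.gauss(0.0, 3.0)
    data.append([2 * s + rng.gauss(0, 0.1), s + rng.gauss(0, 0.1), rng.gauss(0, 0.1)])

centred = mean_centre(data)
cov = covariance(centred)
direction, variance = first_component(cov)
explained = variance / sum(cov[i][i] for i in range(3))

# One number per point: its coordinate along the first principal component.
projected = [sum(r[j] * direction[j] for j in range(3)) for r in centred]
print(direction, explained)
```

Here `explained` is the fraction of the total variance captured by the single projected dimension. When it is close to 1, plotting `projected` alone loses very little of the spread in the data.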

Independent components analysis

•  Independent components analysis (ICA)
    –  a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. Hyvärinen (U Helsinki)
        •  Essentially looking for dimensions that co-vary
        •  Finding ways of summarising points in the N-dimensional space using fewer than N values.
        •  Data is assumed to be a linear mixture of underlying latent variables
            –  These are assumed non-Gaussian, and mutually independent: independent components
        •  Related to PCA, but can find structure when PCA fails to do so
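
The following toy sketch shows the idea for two signals only. It is not the method behind any particular ICA package: it whitens the mixed signals and then searches for the rotation that makes the outputs as non-Gaussian as possible, measured by excess kurtosis. The uniform sources, the mixing weights and all function names are assumptions made for the example.

```python
import math
import random

def centre(xs):
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def kurtosis(xs):
    """Excess kurtosis of a zero-mean sample (about 0 for Gaussian data)."""
    n = len(xs)
    m2 = sum(x * x for x in xs) / n
    m4 = sum(x ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0

def rotate(a, b, theta):
    cs, sn = math.cos(theta), math.sin(theta)
    return ([cs * u + sn * v for u, v in zip(a, b)],
            [-sn * u + cs * v for u, v in zip(a, b)])

def whiten(x1, x2):
    """Make two centred signals uncorrelated, each with unit variance."""
    n = len(x1)
    a = sum(u * u for u in x1) / n
    b = sum(u * v for u, v in zip(x1, x2)) / n
    c = sum(v * v for v in x2) / n
    # This rotation angle diagonalises the 2x2 covariance [[a, b], [b, c]].
    p1, p2 = rotate(x1, x2, 0.5 * math.atan2(2.0 * b, a - c))
    s1 = math.sqrt(sum(p * p for p in p1) / n)
    s2 = math.sqrt(sum(p * p for p in p2) / n)
    return [p / s1 for p in p1], [p / s2 for p in p2]

def unmix(x1, x2, steps=90):
    """Pick the rotation of the whitened data with the most non-Gaussian outputs."""
    z1, z2 = whiten(centre(x1), centre(x2))
    best = None
    for k in range(steps):
        y1, y2 = rotate(z1, z2, (math.pi / 2) * k / steps)
        score = abs(kurtosis(y1)) + abs(kurtosis(y2))
        if best is None or score > best[0]:
            best = (score, y1, y2)
    return best[1], best[2]

def correlation(a, b):
    a, b = centre(a), centre(b)
    return sum(u * v for u, v in zip(a, b)) / math.sqrt(
        sum(u * u for u in a) * sum(v * v for v in b))

# Two independent, non-Gaussian (uniform) sources, linearly mixed.
rng = random.Random(42)
s1 = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
s2 = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
x1 = [0.7 * a + 0.3 * b for a, b in zip(s1, s2)]
x2 = [0.4 * a + 0.6 * b for a, b in zip(s1, s2)]

y1, y2 = unmix(x1, x2)
print(correlation(y1, s1), correlation(y1, s2))
print(correlation(y2, s1), correlation(y2, s2))
```

Each recovered signal should line up closely with one of the sources. ICA cannot determine the order, sign or scale of the components, so only the absolute correlations are meaningful.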

Example: input

ICA output

Clustering data

•  Often rather than projecting data on to other axes, it is better to look at how the data points are grouped
    –  The aim is to classify a large number of data vectors into a small number of manageable groups
•  Does the data fall into clusters?
    –  How unevenly distributed is the data?
    –  Does it cluster in
        •  The original high-dimensional space
        •  In a lower-dimensional projected space?

How does clustering work?

•  Techniques
    –  Partition or Hierarchical

Examples

Partition-based clustering

•  Based on distance between vectors
    –  But which distance?
        •  Euclidean
        •  City-block?
        •  Weighted versions
        •  Chebyshev distance
•  Forming clusters:
    –  Simple method:
        •  Start with each vector as a single-element cluster
        •  Identify the two closest vectors and combine them into the same cluster.
        •  Keep doing this until the distance between the two closest vectors not in the same cluster is large.
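
The simple method above can be sketched in a few lines of Python. Merging on the closest pair of vectors in different clusters is usually called single-linkage agglomerative clustering. The slide does not say what counts as a "large" distance, so the `threshold` parameter, the sample points and the function names are assumptions for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def cluster(vectors, distance, threshold):
    """Merge clusters until the closest cross-cluster pair exceeds threshold."""
    # Start with each vector (by index) as a single-element cluster.
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Distance between the two closest vectors in different clusters.
                gap = min(distance(vectors[i], vectors[j])
                          for i in clusters[p] for j in clusters[q])
                if best is None or gap < best[0]:
                    best = (gap, p, q)
        gap, p, q = best
        if gap > threshold:
            break  # the remaining clusters are all "far apart"
        clusters[p].extend(clusters[q])
        del clusters[q]
    return [sorted(c) for c in clusters]

points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (9.0, 0.0)]
groups = cluster(points, euclidean, threshold=1.0)
print(groups)
```

Swapping `euclidean` for `city_block` or `chebyshev`, or changing `threshold`, can change which clusters are found. This brute-force version recomputes every pairwise distance on each pass, so it is only suitable for small examples.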

Criticisms of clustering

•  Clustering is descriptive, and not unique
    –  Actual clusters may depend on techniques used, as well as on the data
•  Clustering techniques will always find clusters
    –  Even when there aren’t any!
    –  (This implies some measure for quality of clustering should be used)
•  Clustering techniques depend strongly on the measures used
    –  There should ideally be some conceptual support of the measures used to calculate distances between vectors.
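
One commonly used quality measure is the silhouette coefficient. The slides do not name a specific measure, so this choice, the sample points and the two labellings below are assumptions for illustration. For each vector it compares the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), giving (b - a) / max(a, b).

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def silhouette(vectors, labels):
    """Mean silhouette coefficient over all vectors (ranges from -1 to 1)."""
    scores = []
    for i, v in enumerate(vectors):
        # Group the distances from v to every other vector by cluster label.
        by_label = {}
        for j, w in enumerate(vectors):
            if j != i:
                by_label.setdefault(labels[j], []).append(euclidean(v, w))
        own = by_label.pop(labels[i], [])
        if not own or not by_label:
            # Convention used here: singleton clusters, or a single cluster, score 0.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_label.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sensible = silhouette(points, [0, 0, 0, 1, 1, 1])   # labels follow the two groups
arbitrary = silhouette(points, [0, 1, 0, 1, 0, 1])  # labels ignore the geometry
print(sensible, arbitrary)
```

A score near 1 suggests compact, well-separated clusters; a score near 0 or below suggests the labels do not reflect real groups in the data. The result still depends on the distance measure, so the last criticism above applies to the quality measure too.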

Examples:

•  Google News indexes
    –  Uses text to create topic clusters
        •  Title, article listings
        •  Used to discover multiple reports of same story
•  Video clusters on YouTube
    –  Uses keywords, popularity, viewer engagement, user browsing history
    –  http://www.strutta.com/blog/six-degrees-of-youtube/

Infographics tools

•  At its simplest, Excel has many facilities for creating infographics and visualisations.
    –  But it’s limited, and proprietary (though one can import comma separated values)
•  Matlab? Not free! Good graphing tools
•  Flot: jQuery and JavaScript based
•  Google Chart API: free
    –  JavaScript based, browser output
•  D3: JavaScript based, very powerful.
