Visualisation - Module 3 - Big Data

Post on 13-Apr-2017


Post-academic course Big Data

Joris Klerkx, Research Manager, PhD. joris.klerkx@cs.kuleuven.be

Visualisation, Big Data

IVPV - Instituut voor Permanente Vorming, 28-05-2015

1

Augment group - HCI research lab, Dept. of Computer Science (Computerwetenschappen), KU Leuven, https://augmenthuman.wordpress.com

2

Erik Duval, 11/9/1965 – 12/3/2016

3

Our mission

“To augment the human intellect” (Engelbart, 1962)

4

By ‘augmenting human intellect’ we mean increasing the capability of a man to approach a complex problem situation, to gain comprehension to suit his particular needs, and to derive solutions to problems.

Design, build and evaluate relevant tools and technologies that help users to become better in their daily life & work (Duval, 2015)

Our mission

5

What are relevant user actions?

How can we capture signals? How can we store them?

How can we create a meaningful feedback loop?

Our Research

Physiological, behavioural signals

Sensors, (self-)trackers

Information visualization

Scalable infrastructure

6

Application Domains

Technology-Enhanced Learning

Media Consumption

Science 2.0

(e)Health

7

Slides will be posted to Slideshare & Zephyr

8

http://www.hearts.com/ecolife/cut-paper-consumption-protect-forests/

9

Big Data

10

Big data

11

Big data, insights

12

Better Human Understanding

13

A mental model represents what a person thinks is true… but isn’t necessarily true

14

UNDERSTANDING OF THEIR MENTAL MODELS

15

Wouter Walgraeve - http://www.slideshare.net/wouterwalgraeve/mental-models-as-information-radiators

16

17

18

?

19

“The idea that business is strictly a numbers affair has always struck me as preposterous. For one thing, I’ve never been particularly good at numbers, but I think I’ve done a reasonable job with feelings. And I’m convinced that it is feelings, and feelings alone, that account for the success of the Virgin brand in all of its myriad forms.” -- Richard Branson

20

Gut feeling

21

What your gut feeling says

What the facts say

22

What your gut feeling says

What the facts say

Confirmation bias

Undervalued Overvalued Foolish

23

Big data, insights, data-driven insights

24

25

Big data, insights, data-driven insights

Meaningful

26

Defining visualization

27

Definition

28

Information Visualization is the use of interactive visual representations to amplify cognition [Card et al.]

algorithm <> human

29

Information Visualisation is the use of interactive visual representations to amplify cognition [Card et al.]

Definition

30

http://www.demorgen.be/dm/nl/5403/Internet/article/detail/1890428/2014/05/18/Twitteractiviteit-verraadt-je-politieke-profiel.dhtml

31

Facilitate human interaction for exploration with and understanding of big data

32

Data visualization

Slide source: John Stasko

Scientific visualization

Information visualization

33

Scientific visualisation

Specifically concerned with data that has a well-defined representation in 2D or 3D space (e.g., from a simulation mesh or a scanner).

Slide source: Robert Putman

34

Information Visualisation

Concerned with data that does not have a well-defined representation in 2D or 3D space (i.e., “abstract data”)

35

Dispersion (Backstrom & Kleinberg)

36

The role of visualisation

37

Big data, insights, data-driven insights

Meaningful

38

By Longlivetheux - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=37705247

39

https://medium.com/@angelamorelli/3-powerful-lessons-i-have-learnt-as-an-information-designer-cb028940254#.mkgb0h2cc

40

The Role of visualisation

Brehmer, M.; Munzner, T., "A Multi-Level Typology of Abstract Visualization Tasks," IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2376–2385, Dec. 2013

41

Explore

Data insights: a visualization (Gregor Aisch)

42

http://www.visual-analytics.eu/faq

Also: Visual Analytics

43

Visualizing Big Data

44

Multiple data sources with varied data types

“Diverse” data

I talk GeoJSON

I talk custom XML

I talk Apache logs

45

millions of records

“Tall” data

46

http://dataclysm.org

Example: 51 million ratings

47

Example: 51 million ratings

48

http://dataclysm.org

Example: 51 million ratings

49

http://dataclysm.org

Example: 51 million ratings

50

http://dataclysm.org 51

Cluttered displays

Heer, J. & Kandel, S. (2012), Interactive Analysis of Big Data, XRDS, 19(1)

52

Cluttered displays: binned density scatterplot

Hexagonal instead of rectangular bins

Heer, J. & Kandel, S. (2012), Interactive Analysis of Big Data, XRDS, 19(1)

53
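To make the binning idea concrete, here is a minimal Python sketch, assuming points arrive as (x, y) pairs and the hexagons are pointy-top with a chosen `size` (centre-to-corner distance). It uses axial hexagon coordinates with cube rounding, a common technique that is not taken from the slides; `hex_bin`, `hex_center` and `hex_density` are illustrative names. Drawing and colouring the hexagons by count is left to a plotting tool.

```python
import math
from collections import Counter

def hex_bin(x, y, size):
    """Axial (q, r) coordinates of the pointy-top hexagon that contains (x, y)."""
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round in cube coordinates (cx + cy + cz = 0) and repair the component
    # with the largest rounding error so the constraint still holds.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz

def hex_center(q, r, size):
    """Centre of hexagon (q, r) in data coordinates, e.g. for drawing it."""
    return (size * math.sqrt(3) * (q + r / 2), size * 1.5 * r)

def hex_density(points, size):
    """Count how many points fall into each hexagonal bin."""
    return Counter(hex_bin(x, y, size) for x, y in points)

counts = hex_density([(0, 0), (0.1, 0.1), (10, 10)], size=1.0)
print(counts)
```

However many records there are, the number of shapes to draw now depends on the bin size, not on the number of points, which is what keeps a binned scatterplot readable.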

Multi-variate data with 100s to 1000s of variables

“Wide” data

54

http://www.perceptualedge.com/blog/?p=2046

In this day of so-called Big Data, organizations are scrambling to implement new software and hardware to increase the amount of data that they collect and store. In so doing they are unwittingly making it harder to find the needles of useful information in the rapidly growing mounds of hay. If you don’t know how to differentiate signals from noise, adding more noise only makes matters worse.

55

Avoid the All-You-Can-Eat buffet! (Ben Fry)

56

Visualizations might help reveal multidimensional patterns

Use the power of the machine to find a proxy in the data that predicts the selected variables

Depending on their specific questions, domain experts might select a subset of variables they are interested in
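As a rough illustration of letting the machine suggest a proxy, the sketch below ranks every other column of a wide table by its absolute Pearson correlation with one variable a domain expert selected. Linear correlation is only one possible criterion and is chosen here as an assumption; the table and its column names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def rank_proxies(columns, target):
    """Rank all columns except `target` by |r| with the target column."""
    t = columns[target]
    scores = {name: pearson(values, t)
              for name, values in columns.items() if name != target}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical wide table: column name -> one value per record.
columns = {
    "target": [1, 2, 3, 4, 5],
    "a": [2, 4, 6, 8, 10],
    "b": [5, 4, 3, 2, 1.5],
    "c": [1, 3, 2, 5, 4],
}
for name, r in rank_proxies(columns, "target"):
    print(f"{name}: r = {r:+.2f}")
```

A high score only flags a candidate; the expert still has to judge whether the proxy makes sense, and a visualization of the candidates is a quick way to do that.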

57

Example: 4 million messages/day on OkCupid

http://dataclysm.org 58

Each dot drawn at 90% transparency

http://dataclysm.org 59
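Why transparency helps against overplotting can be worked out with a small formula. Assuming standard "over" alpha compositing, a spot covered by n dots that each have opacity α (90% transparency means α = 0.1) ends up with opacity 1 − (1 − α)^n, so dense regions darken gradually instead of saturating after a single dot. The helper names are illustrative.

```python
import math

def stacked_opacity(n, alpha=0.1):
    """Opacity of a spot covered by n overlapping dots, each drawn with
    opacity `alpha`, using 'over' compositing: 1 - (1 - alpha) ** n."""
    return 1 - (1 - alpha) ** n

def dots_needed(target, alpha=0.1):
    """Smallest number of stacked dots whose combined opacity reaches `target`."""
    return math.ceil(math.log(1 - target) / math.log(1 - alpha))

for n in (1, 5, 10, 22, 50):
    print(n, round(stacked_opacity(n), 3))
```

With α = 0.1, about 22 overlapping dots are needed before a spot looks 90% opaque, so differences in density stay visible up to roughly that amount of overlap.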

http://dataclysm.org 60

http://dataclysm.org 61

http://dataclysm.org 62

Multiple views on the data allow exploration of patterns

63

The strength of visualization

64

Anscombe's quartet http://en.wikipedia.org/wiki/Anscombe's_quartet

Enables discovery of visual patterns in data sets

Graphics reveal data (Tufte, 2001)
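Anscombe's point can be checked numerically. The sketch below uses the four data sets as commonly published (for example on the Wikipedia page linked above) and computes the summary statistics that come out nearly identical for all four, even though their scatterplots look completely different.

```python
import math

X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(xs, ys):
    """Means, Pearson r and least-squares line y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"mean_x": mx, "mean_y": my, "r": sxy / math.sqrt(sxx * syy),
            "slope": slope, "intercept": my - slope * mx}

for name, (xs, ys) in QUARTET.items():
    s = summary(xs, ys)
    print(f"{name:>3}: mean_y={s['mean_y']:.2f} r={s['r']:.3f} "
          f"y={s['intercept']:.2f}+{s['slope']:.3f}x")
```

Only plotting the four sets reveals the outlier, the curve and the single leverage point that these numbers hide.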

65

World Population Growth

A tremendous change occurred with the industrial revolution: whereas it had taken all of human history until around 1800 for world population to reach one billion, the second billion was achieved in only 130 years (1930), the third billion in less than 30 years (1959), the fourth billion in 15 years (1974), and the fifth billion in only 13 years (1987). During the 20th century alone, the world population grew from 1.65 billion to 6 billion.

Seeing is understanding

66
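The arithmetic behind the population text can be made explicit in a few lines; the years and population figures are exactly those quoted on the slide, nothing else is assumed.

```python
# Year in which world population reached each billion, as quoted on the slide.
milestones = [(1, 1800), (2, 1930), (3, 1959), (4, 1974), (5, 1987)]

# Years needed for each additional billion.
intervals = [later - earlier
             for (_, earlier), (_, later) in zip(milestones, milestones[1:])]

# Growth over the 20th century: from 1.65 billion to 6 billion.
growth_factor = 6 / 1.65

print(intervals)
print(f"{growth_factor:.2f}x")
```

The shrinking intervals are exactly the pattern that a line chart of the same numbers makes visible at a glance.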

Facilitates understanding

http://www.bbc.co.uk/news/world-15391515

67

Facilitates human interaction for exploration and understanding

http://www.bbc.co.uk/news/world-15391515

68

http://www.informationisbeautiful.net/visualizations/how-many-gigatons-of-co2/

Tells stories

69

T. Nagel, M. Maitan, E. Duval, A. Vande Moere, J. Klerkx, K. Kloeckl, and C. Ratti. Touching transport - a case study on visualizing metropolitan public transit on interactive tabletops. In AVI2014: 12th ACM International Working Conference on Advanced Visual Interfaces, pages 281–288, 2014.

http://www.youtube.com/watch?v=wQpTM7ASc-w

Facilitates human interaction for exploration and understanding

70

Will there be enough food?

http://www.footprintnetwork.org/en/index.php/gfn/page/earth_overshoot_day/

Communicates insights easily

71

Triggers impact

http://terror.periscopic.com

Shows patterns & triggers questions

72

Interactivity allows comparison

73

http://blog.stephenwolfram.com/2012/03/the-personal-analytics-of-my-life/

Shows trends & anomalies in the data, therefore triggers questions

74

Helps to find stories, see trends

Belgium

Brazil

USA

India

75

Sentiment analysis in an enterprise social network (Slack)

Shows patterns

76

http://deredactie.be/cm/vrtnieuws/grafiek/interactief/1.2248561

77

Reader Client

Tracking Service

WebSockets

Database

engagement data mouse data

10,065 sessions were tracked

9,674 sessions were used in the analysis

391 sessions were removed from the analysis (noise)

78
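The slide does not say how the 391 noise sessions were identified. The sketch below shows one hypothetical way to do it on session records as a tracking service might store them: the `Session` class, the noise rule (no slide views, or a session shorter than a couple of seconds) and its threshold are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Session:
    session_id: str
    # (timestamp in seconds, slide index) pairs, as a tracking service might log them
    slide_views: List[Tuple[float, int]] = field(default_factory=list)

def is_noise(session, min_duration=2.0):
    """Hypothetical rule: no slide views at all, or too short to be a real read."""
    if not session.slide_views:
        return True
    times = [t for t, _ in session.slide_views]
    return max(times) - min(times) < min_duration

def split_sessions(sessions):
    """Separate tracked sessions into those kept for analysis and those removed."""
    kept = [s for s in sessions if not is_noise(s)]
    removed = [s for s in sessions if is_noise(s)]
    return kept, removed

sessions = [
    Session("a", [(0.0, 0), (75.0, 1), (90.0, 2)]),
    Session("b"),              # opened, but no slide viewed
    Session("c", [(0.0, 0)]),  # a single event, zero duration
]
kept, removed = split_sessions(sessions)
print(len(sessions), len(kept), len(removed))
```

Whatever rule is used, kept and removed sessions should add up to the tracked total, as they do on the slide (9,674 + 391 = 10,065).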

Visualizing Reader Activity

Each square is a ‘slide’

Each row represents a navigation pattern through the slides

Column 1 shows the absolute number of readers

Column 2 shows the percentage of readers

79
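The rows and columns described above can be derived from logged sessions roughly as follows. This sketch assumes each session is available as the ordered list of slide indices a reader visited; it collapses consecutive repeats, counts identical paths, and returns the absolute number and the percentage of readers per navigation pattern. The input format and the function name are assumptions.

```python
from collections import Counter

def navigation_patterns(sessions):
    """Return (pattern, readers, percentage) rows, most common pattern first."""
    patterns = Counter()
    for visited in sessions:
        path = []
        for slide in visited:
            if not path or path[-1] != slide:  # ignore repeated events on the same slide
                path.append(slide)
        patterns[tuple(path)] += 1
    total = sum(patterns.values())
    return [(pattern, count, 100.0 * count / total)
            for pattern, count in patterns.most_common()]

# Hypothetical sessions: slide indices in the order each reader saw them.
sessions = [[0, 1, 2, 3, 0], [0, 1, 2, 3, 0], [0, 0, 1], [0, 1, 2, 3]]
for pattern, count, pct in navigation_patterns(sessions):
    print(pattern, count, f"{pct:.1f}%")
```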

262 readers (2.7%) go through all the slides and then quickly return to the first slide to look at it once more.

Reader time per slide

Readers spend about 75 seconds (on average) on the first slide, studying what information is available.

80
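An average such as the roughly 75 seconds on the first slide can be computed from timestamped navigation events. In this sketch, the data format is an assumption: each session is a time-ordered list of (timestamp in seconds, slide index) events. The time spent on a slide is the gap until the next event, and the last slide of each session is skipped because its end time is unknown.

```python
from collections import defaultdict

def average_time_per_slide(sessions):
    """Average seconds spent per visit on each slide, over all sessions."""
    totals = defaultdict(float)
    visits = defaultdict(int)
    for events in sessions:
        # Pair each event with the next one; the gap is the time on `slide`.
        for (t, slide), (t_next, _) in zip(events, events[1:]):
            totals[slide] += t_next - t
            visits[slide] += 1
    return {slide: totals[slide] / visits[slide] for slide in totals}

# Hypothetical sessions.
sessions = [[(0, 0), (70, 1), (80, 2)], [(0, 0), (80, 1)]]
print(average_time_per_slide(sessions))
```

Skipping the last slide is a deliberate choice: counting it would require guessing when the reader actually left.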

Shows patterns

Sentiment analysis in an enterprise social network (Slack)

Triggers questions & creates awareness

Disclaimer: should we trust NLP algorithms?

81

Empowers users to make informed decisions

Positive Badges

Negative Badges

82

Show errors in the data

http://woutervds.github.io/InfoVisPostgraduwhat/

83

Show errors in the data

84

Khaled Bachour, Frederic Kaplan, Pierre Dillenbourg, "An Interactive Table for Supporting Participation Balance in Face-to-Face Collaborative Learning," IEEE Transactions on Learning Technologies, vol. 3, no. 3, pp. 203-213, July-September, 2010

Creates awareness

85

http://infosthetics.com/

http://visualizing.org

http://www.visualcomplexity.com/vc/

http://visual.ly/

http://flowingdata.com

http://www.infovis-wiki.net

86

Visualizing (big) data: Guidelines & Facts

88

How many circles?

89

Humans have advanced perceptual abilities

Our brains make us extremely good at recognizing visual patterns

90

91

Humans have little short-term memory

Our brains remember relatively little of what we perceive.

Most of us can only hold three to seven chunks of data at the same time.

92

Recognition: identify previously learned information

93

Humans have advanced perceptual abilities

Humans have little short-term memory

Our brains make us extremely good at recognizing visual patterns

Our brains remember relatively little of what we perceive

Externalize data by using interactive, visual encodings

Promote recognition rather than recall

94

https://www.youtube.com/watch?v=og7bzN0DhpI (9:51 - 11:22)

95

96

“The centrality of human activity in the process is key”

97

Explore

Data insights: a visualization (Gregor Aisch)

98

“It’s not a magical algorithm that finds the insight for you”

“You have to look at the overview, you have to decide what you zoom in to, what you filter out. And then you click to get the details” (Ben Shneiderman, 2011)

99

http://www.bbc.com/future/bespoke/20140724-flight-risk/

Overview first, zoom & filter, details-on-demand

100

Overview first, zoom & filter, details-on-demand

http://www.student.kuleuven.be/~r0580868/

101

https://postgraduwhatblog.wordpress.com/2016/02/13/infovis-van-de-week-1-wouter/

Overview first, zoom & filter, details-on-demand

102

Visual Information Seeking Mantra

103
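The mantra can be sketched as three operations on an in-memory list of records. `MantraExplorer` and the record fields are hypothetical names for illustration; a real tool would attach each step to an interactive view rather than to method calls.

```python
from collections import Counter

class MantraExplorer:
    """Hypothetical helper that walks a list of records (dicts) through
    'overview first, zoom & filter, details on demand'."""

    def __init__(self, records):
        self.records = list(records)
        self.view = list(records)  # records currently in focus

    def overview(self, key):
        """Overview first: summarise the current view by one attribute."""
        return Counter(r[key] for r in self.view)

    def zoom_filter(self, predicate):
        """Zoom & filter: narrow the view; returns how many records remain."""
        self.view = [r for r in self.view if predicate(r)]
        return len(self.view)

    def details(self, index):
        """Details on demand: fetch one full record only when asked."""
        return self.view[index]

    def reset(self):
        """Go back to the full data set."""
        self.view = list(self.records)

explorer = MantraExplorer([
    {"category": "a", "value": 1},
    {"category": "b", "value": 5},
    {"category": "a", "value": 9},
])
print(explorer.overview("category"))
explorer.zoom_filter(lambda r: r["value"] > 2)
print(explorer.details(0))
```

The order matters: the aggregate overview stays cheap even for large data, and full records are only touched once the user has narrowed down what to look at.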

Real data is ugly and needs to be cleaned

http://hcil2.cs.umd.edu/trs/2011-34/2011-34.pdf

http://www.netmagazine.com/features/seven-dirty-secrets-data-visualisation

https://code.google.com/p/google-refine/

http://vis.stanford.edu/wrangler/

Pre-process your data

104
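Cleaning steps depend on the data, but a typical case is numbers typed with Dutch/Belgian separators, where "10.065" means ten thousand and sixty-five rather than ten point zero six five. The sketch below (hypothetical helper names and sample rows) parses such values with explicit separators, normalises labels, and drops missing values and duplicates. Dedicated tools such as the ones linked above do this interactively and at scale.

```python
def parse_number(raw, decimal_sep=",", thousands_sep="."):
    """Parse a number typed with the given separators; None for missing values."""
    if raw is None:
        return None
    s = raw.strip()
    if s.lower() in {"", "n/a", "na", "-", "?"}:
        return None
    s = s.replace(thousands_sep, "").replace(decimal_sep, ".")
    try:
        return float(s)
    except ValueError:
        return None

def clean_rows(rows):
    """Normalise labels, parse values, drop unparseable rows and duplicate labels."""
    seen = set()
    cleaned = []
    for label, raw_value in rows:
        name = " ".join(label.split())  # collapse stray whitespace
        key = name.casefold()           # case-insensitive duplicate check
        value = parse_number(raw_value)
        if value is None or key in seen:
            continue
        seen.add(key)
        cleaned.append((name, value))
    return cleaned

rows = [(" Leuven ", "10.065"), ("leuven", "12"), ("Gent", "n/a"), ("Brugge", "1.234,5")]
print(clean_rows(rows))
```

Making the separators explicit parameters is the design choice here: guessing them per value is exactly how "10.065" silently becomes 10.065.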

http://nieuws.vtm.be/verkiezingen/gemeente?province=P1&city=G73

Always check & pre-process your data

105

Elections 14/10/12

Forget about 3D graphs (on a 2D screen...)

Occlusion

Complex to interact with

Doesn’t add anything to the data

106

Source: Stephen Few

What if we need to add a 3rd variable?

107

Use small coordinated graphs to add variables

108
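A minimal sketch of the data preparation behind small coordinated graphs, assuming rows are dicts: the third variable (`panel_key`) splits the data into one small chart per value, and all panels share one y-range so they can be compared at a glance. The function and field names are hypothetical; the drawing itself is left to a plotting tool.

```python
def small_multiples(rows, panel_key, x_key, y_key):
    """Group rows into one (x, y) series per value of the third variable and
    return a single y-range shared by all panels."""
    panels = {}
    for row in rows:
        panels.setdefault(row[panel_key], []).append((row[x_key], row[y_key]))
    for series in panels.values():
        series.sort()  # order each panel's points along the x-axis
    ys = [row[y_key] for row in rows]
    return panels, (min(ys), max(ys))

rows = [
    {"year": 2012, "region": "north", "sales": 30},
    {"year": 2006, "region": "north", "sales": 25},
    {"year": 2012, "region": "south", "sales": 10},
]
panels, shared_y = small_multiples(rows, "region", "year", "sales")
print(panels, shared_y)
```

The shared range is what makes the graphs "coordinated": with a separate scale per panel, equal heights would no longer mean equal values.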

Forget about 3D graphs

Source: Stephen Few

Which student has more blogposts?

• Size & angle are difficult to compare
• Without labels & legends, it is impossible to show exact quantitative differences
• Limited short-term (visual) memory

109

Source: Stephen Few

Save the pies for dessert (S. Few)

Try using either of the pies to put the slices in order by size

110
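Ordering the slices by size becomes trivial once the same shares are drawn as sorted bars. A tiny text-based sketch, with hypothetical category names and deliberately similar values that would be hard to rank in a pie:

```python
def sorted_bars(shares, width=40):
    """Render shares as text bars, largest first, with their percentage."""
    total = sum(shares.values())
    lines = []
    for label, value in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(width * value / total)  # bar length proportional to share
        lines.append(f"{label:<8} {bar} {100 * value / total:.1f}%")
    return lines

for line in sorted_bars({"A": 22, "B": 27, "C": 25, "D": 26}, width=20):
    print(line)
```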

deredactie.be

demorgen.be

vtm.be

Elections 14/10/12

111

Obviously there are exceptions to the rule

112

http://themetapicture.com/the-sunny-side-of-the-pyramid/

[Chart: two bar charts of blogposts, tweets, comments on blogs and reports submitted for Student 1, scale 0-30]

Use Common Sense

[Chart: bar chart of blogposts, comments on blogs, tweets and reports submitted for Student 1, scale 0-30]

113

[Chart: stacked bar chart of blogposts, tweets, comments on blogs and reports submitted for Students 1-4, scale 0-60]

[Chart: the same data as a 100% stacked bar chart, scale 0%-100%]

Use Common Sense

What are you comparing? What story do you get from it?

114
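The two student charts answer different questions: absolute counts compare how active students are, while the 100% version compares how each student's activity is composed. Converting counts to shares is a small step per student; the counts below are hypothetical.

```python
def to_percentages(counts_by_student):
    """Turn absolute activity counts into per-student shares (each row sums to 100%)."""
    shares = {}
    for student, counts in counts_by_student.items():
        total = sum(counts.values())
        shares[student] = {activity: (100.0 * n / total if total else 0.0)
                           for activity, n in counts.items()}
    return shares

counts = {
    "Student 1": {"blogposts": 10, "tweets": 25, "comments on blogs": 10, "reports submitted": 5},
    "Student 2": {"blogposts": 4, "tweets": 2, "comments on blogs": 3, "reports submitted": 1},
}
for student, row in to_percentages(counts).items():
    print(student, {k: round(v) for k, v in row.items()})
```

After normalisation, Student 2's much lower total activity disappears from view, which is exactly why the choice between the two charts depends on the question being asked.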

Which graph makes it easier to focus on the pattern of change through time, instead of the individual values?

Choose the graph that answers your questions about your data

115

Source: Stephen Few

vtm.be

deredactie.be

nieuwsblad.be

Elections 14/10/12

Communicate the correct story

116

Don’t use visualisations to mislead

117

Don’t use visualisations to mislead

118

Source: Stephen Few 119

Source: Stephen Few 120

121

http://fellinlovewithdata.com/research/deceptive-visualizations 122

http://fellinlovewithdata.com/research/deceptive-visualizations 123

How much better are the drinking water conditions in Willowtown as compared to Silvatown?

124

http://fellinlovewithdata.com/research/deceptive-visualizations
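One way to quantify how much a truncated axis misleads is Tufte's lie factor: the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below applies it to two bars; the scores are hypothetical, not the values used in the linked study.

```python
def lie_factor(first, second, baseline=0.0):
    """Tufte's lie factor for two bars: effect shown in the graphic divided by
    the effect in the data, with effect = relative change from first to second.
    Bars are assumed to be drawn from `baseline`; a baseline above zero is a
    truncated axis."""
    data_effect = (second - first) / first
    shown_first, shown_second = first - baseline, second - baseline
    graphic_effect = (shown_second - shown_first) / shown_first
    return graphic_effect / data_effect

# Hypothetical water-quality scores for the two towns.
print(lie_factor(84, 88))               # axis starting at 0
print(lie_factor(84, 88, baseline=80))  # axis truncated at 80
```

With the axis cut at 80, a difference of under 5% in the data is drawn as one bar twice as tall as the other, a lie factor of about 21.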

Storytelling with visualisation

125

Visualization tasks

Brehmer, M.; Munzner, T., "A Multi-Level Typology of Abstract Visualization Tasks," IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, pp. 2376–2385, Dec. 2013

126

http://www.ted.com/talks/hans_rosling_shows_the_best_stats_you_ve_ever_seen.html

127

Human Perception

128

Our brains make us extremely good at recognizing visual patterns

Source: Katrien Verbert 129

Source: Katrien Verbert 130

A limited set of visual properties that are detected by the low-level visual system:
- very rapidly (< 200 to 250 ms),
- accurately,
- with little effort,
- before focused attention on them.

Healey, C., & Enns, J. (2012). Attention and Visual Memory in Visualization and Computer Graphics. IEEE Transactions on Visualization and Computer Graphics, 18(7), 1170-1188.

Pre-attentive characteristics

Note that eye movements take at least 200 ms to initiate.

131

Pre-attentive characteristics

Find the red dot

<> Hue

Find the dot

<> shape

Find the red dot

conjunction not pre-attentive

http://www.csc.ncsu.edu/faculty/healey/PP/

Helps to spot differences in multi-element displays

132

Pre-attentive characteristics

Line orientation Length, width Closure Size

Curvature Density, contrast Intersection 3D depth

Not all of them allow showing exact quantitative differences

Helps to spot differences in multi-element displays

133

http://www.csc.ncsu.edu/faculty/healey/PP/

http://www.slideshare.net/chelsc/gestalt-laws-and-design-presentation

http://artspilesenglish.blogspot.be/2011/11/gestalt-theory-exercise-for-3rdlevel.html

134

Gestalt Laws (“Pattern” laws)

Basic rules or design principles that describe perceptual phenomena. They explain how humans see patterns in visualisations.

Figure & Ground

135

136

Closure

Smallness

137

Source: Katrien Verbert

Common Fate

Objects with a common movement, that move in the same direction, at the same pace, at the same time are organised as a group (Ehrenstein, 2004).

138

Law of Isomorphism

Similarity that can be behavioural or perceptual, and can be a response based on the viewer's previous experiences (Luchins & Luchins, 1999; Chang, 2002). This law is the basis for symbolism (Schamber, 1986).

139

London Tube Map

Which Gestalt laws do you see?

140

Visualization design process

141

B. McDonnel and N. Elmqvist. Towards utilizing GPUs in information visualization: A model and implementation of image-space operations. IEEE Transactions on Visualization and Computer Graphics, 15(6):1105–1112, 2009.

http://www.infovis-wiki.net/index.php/Visualization_Pipeline

142

143

Data

- structure: time, hierarchy, network, 1D, 2D, nD, …

- questions: where, when, how often, …

- audience: domain & visualisation expertise, …

144

S. Stevens. On the theory of scales of measurement. Science, 103(2684), 1946.

Structure: time? hierarchical? 1D? 2D? nD? network? …

145

Questions (to get things going)

What is the average number of students who bought the course book?

What? When? How much? How often?

When did students start looking at the course material?

How many hours did Peter work on this assignment?

(Why did Peter have to redo his assignment?)

How often did Peter retake the course before he passed?

(why?)

146

147

Visual mapping

Encode data characteristics into visual form

Each mark (point, line, area,…) represents a data element

Think about relationships between elements (position)

“Simplicity is the ultimate sophistication.”Leonardo da Vinci
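The mapping step can be sketched in a few lines: each data element becomes one mark, and two of its attributes are encoded as position through linear scales. The scale function mirrors what charting libraries call a linear scale, but the names, canvas size and record fields here are hypothetical.

```python
def linear_scale(domain, target_range):
    """Return a function mapping values in `domain` linearly onto `target_range`."""
    (d0, d1), (r0, r1) = domain, target_range
    def scale(value):
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0)
    return scale

def to_marks(records, x_key, y_key, width=400, height=300):
    """Encode each record as one point mark whose position carries two attributes.
    Assumes each attribute has at least two distinct values."""
    xs = [r[x_key] for r in records]
    ys = [r[y_key] for r in records]
    sx = linear_scale((min(xs), max(xs)), (0, width))
    sy = linear_scale((min(ys), max(ys)), (height, 0))  # screen y grows downwards
    return [{"mark": "point", "x": sx(r[x_key]), "y": sy(r[y_key])} for r in records]

marks = to_marks([{"hours": 2, "grade": 8}, {"hours": 10, "grade": 16},
                  {"hours": 6, "grade": 12}], "hours", "grade")
print(marks)
```

Position is used here because it is the channel people read most accurately; size and colour, discussed next, are better kept for secondary attributes.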

Size

http://www.informationisbeautiful.net/2009/visualising-the-guardian-datablog/

148

X4

How much bigger is the lower bar?

Slide adapted from Michael Porath & Katrien Verbert

Length

149

X5

How much bigger is the right circle?

Slide adapted from Michael Porath & Katrien Verbert

Area

150

X9

How much bigger is the right circle?

151

Apparent magnitude curves

http://makingmaps.net/2007/08/28/perceptual-scaling-of-map-symbols

Slide adapted from Michael Porath

152

Which one looks more accurate?

Slide adapted from Michael Porath

153

Compensating magnitude to match perception
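The length and area slides reduce to two small formulas. Circle area grows with the square of the radius, so 3× the radius gives 9× the area. For proportional symbols, scaling the radius with value^0.5 makes area exactly proportional to the value; perceptual-scaling work such as the linked article describes a correction (often attributed to Flannery, with an exponent around 0.57) that enlarges larger symbols because area tends to be underestimated. The exponent is treated here as an assumption to tune, not a fixed rule.

```python
import math

def circle_area(radius):
    return math.pi * radius ** 2

def circle_radius(value, max_value, max_radius, exponent=0.5):
    """Radius of a proportional circle. exponent=0.5 makes area proportional to value;
    a larger exponent (e.g. about 0.57) shrinks small circles relative to the largest
    one, offsetting the tendency to underestimate differences in area."""
    return max_radius * (value / max_value) ** exponent

print(circle_area(3) / circle_area(1))            # area ratio for 3x the radius
print(circle_radius(25, 100, 40))                 # mathematically scaled
print(circle_radius(25, 100, 40, exponent=0.57))  # perceptually compensated
```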

Color

Color Principles - Hue, Saturation, and Value

https://www.youtube.com/watch?v=l8_fZPHasdo

154

Use at most about 5 colors (e.g., for categories), because short-term memory is limited

http://en.wikipedia.org/wiki/HSL_and_HSV

• hue: categorical

• saturation: ordinal and quantitative

• luminance/brightness: ordinal and quantitative

How to choose colors

Source: Katrien Verbert

155

http://colorbrewer2.org

156
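The hue/saturation/lightness guideline can be sketched with Python's standard `colorsys` module: evenly spaced hues for a handful of categories, and lightness for an ordered or quantitative value. The limit of five categories, the fixed saturation and lightness values and the function names are assumptions; for real charts, tested palettes such as those from ColorBrewer are the safer choice.

```python
import colorsys

def to_hex(rgb):
    """Convert an (r, g, b) tuple with components in [0, 1] to '#rrggbb'."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)

def categorical_colors(categories, max_categories=5):
    """Hue for categorical data: one evenly spaced hue per category."""
    if len(categories) > max_categories:
        raise ValueError("too many categories to tell apart reliably")
    n = len(categories)
    return {cat: to_hex(colorsys.hls_to_rgb(i / n, 0.5, 0.6))
            for i, cat in enumerate(categories)}

def sequential_color(value, vmin, vmax, hue=0.6):
    """Lightness for ordinal/quantitative data: higher values are drawn darker."""
    t = (value - vmin) / (vmax - vmin)
    return to_hex(colorsys.hls_to_rgb(hue, 0.9 - 0.6 * t, 0.7))

print(categorical_colors(["blogposts", "tweets", "comments"]))
print([sequential_color(v, 0, 100) for v in (0, 50, 100)])
```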

157

https://eagereyes.org/basics/rainbow-color-map

158

http://gizmodo.com/why-a-white-cup-makes-your-coffee-taste-more-intense-1663691154

intensity, sweetness, aroma, bitterness, and quality

159

How to choose colors

Position

160

Position & color

http://time.com/12933/what-you-think-you-know-about-the-web-is-wrong/

161

J. Mackinlay. Automating the design of graphical presentations of relational information. ACM Transactions On Graphics, 5(2):110–141, 1986.

162

163

J. Mackinlay. Automating the design of graphical presentations of relational information. ACM Transactions On Graphics, 5(2):110–141, 1986.

164

Offer precise controls for sharing on the Internet... Users have to navigate through 50 settings with more than 170 options

Example Facebook privacy statement

Questions?

How did its complexity change over time?

How does its length compare to privacy statements of other tools?

165

How did its complexity change over time?

http://www.nytimes.com/interactive/2010/05/12/business/facebook-privacy.html

166

How does its length compare to privacy statements of other tools?

http://www.nytimes.com/interactive/2010/05/12/business/facebook-privacy.html

167

Example: Encoding a weather forecast on a smartphone

168

?

Joris Klerkx, Research Manager, PhD. joris.klerkx@cs.kuleuven.be, @jkofmsk, https://augmenthuman.wordpress.com

169

Always on the lookout for new opportunities…
