
Post on 23-May-2020


An Introduction to Data Visualization

Anamaria Crisan

@amcrisan · http://cs.ubc.ca/~acrisan · acrisan@cs.ubc.ca

Master of Science (Bioinformatics)

PhD (Computer Science)

GenomeDX Biosciences

British Columbia Centre for Disease Control

[Timeline: 2008, 2010, 2013, 2015]

PhD Candidate, Computer Science, University of British Columbia

Webinar Learning Goals

• Today: Have a high-level understanding of data visualization design and evaluation

• Tomorrow: Have a basic understanding of different data visualization tools, as well as their strengths and limitations

What we’ll talk about

• Why should we visualize data?
• How do we use data visualizations?
• How should we visualize data?

A Comment on “How Should We Visualize Data?”

There are two aspects of visualizations to think about:
• How do you make a visualization?
• Is it the right visualization?

Why should we visualize data?


Translating Numbers to Words

http://bit.ly/1FxtT2z

It is not always easy to reason consistently with numbers


[Figure: the same risk shown as a probability (60%), as a frequency (6 in 10), and as a visualization; understandability increases from probability (least understandable) to frequency to visualization (most understandable)]

Whiting (2015) “How well do health professionals interpret diagnostic information? A systematic review”

• Numeracy: the ability to reason with numbers
  § Individuals with low numeracy have difficulty interpreting numbers and probabilities
  § This is also true amongst educated professionals
• Visualization can make data more accessible to individuals with lower numeracy skills
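The progression from probability to frequency to visualization can be sketched in a few lines. This is a minimal illustration, not taken from the cited review: the helpers `as_frequency` and `icon_array` are hypothetical, and the text-based icon array only stands in for the graphical icon arrays used in risk communication.

```python
def as_frequency(probability: float, denominator: int = 10) -> str:
    """Re-express a probability as a natural frequency, e.g. 0.6 -> '6 in 10'."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    count = round(probability * denominator)
    return f"{count} in {denominator}"


def icon_array(probability: float, total: int = 10, per_row: int = 10,
               filled: str = "●", empty: str = "○") -> str:
    """Render a text icon array: one icon per person, affected people filled."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    count = round(probability * total)
    icons = filled * count + empty * (total - count)
    # Break the icons into rows of `per_row` so larger arrays stay readable.
    rows = [icons[i:i + per_row] for i in range(0, total, per_row)]
    return "\n".join(rows)


print(f"{0.6:.0%}")       # probability: 60%
print(as_frequency(0.6))  # frequency: 6 in 10
print(icon_array(0.6))    # visualization: ●●●●●●○○○○
```

Keeping the denominator fixed (10 or 100) across comparisons is a deliberate choice here, since a frequency format helps most when readers do not have to rescale between different totals.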

Visualizing Data is Effective


But… Visualization Design ALSO Matters

[Figure: a baseline visualization and two alternatives (Alternative 1, Alternative 2)]

Zikmund-Fisher (2013). A demonstration of “less can be more” in risk graphics.

Example: Communicating Survival Benefit of Cancer Therapy

[Figure: Option A vs. Option B]

Example: Infection Transmission in a Hospital


Example: Visualizing Arteries of the Heart for Surgery Planning

Borkin (2011). “Evaluation of Artery Visualizations for Heart Disease Diagnosis”. Made with: Processing

• Existing standard: accuracy 39%
• Revised visualization: accuracy 91%


How do we use data visualizations?

Role of Data Visualization in the Current Paradigm of Scientific Research = Communication

[Flowchart, built up over several slides; source: https://www.ratbotcomics.com/comics/pgrc_2014/1/1.html]

Do you have a research problem?
  Yes. → Do all the science!
  No. → But eventually you’ll have a problem, right? Duh. → Do all the science!
Do all the science! → Inform the masses!
Inform the masses! → Maybe data visualization? (Infographics are pretty.)
Did it work?
  No :( → Different infographics?
  Yes! → Declare victory (maybe?)

Limitation #1: Missed Opportunity in Exploration

[Flowchart: Do you have a research problem? → Do all the science! → Data visualization! → Inform the masses!]

§ Missed opportunity for exploration: exploration is looking at your data, trying different analysis methods, assessing if there are outliers or missing data, etc.
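The exploration checks named above (outliers, missing data) can start with a few summary numbers before any chart is drawn. A minimal sketch, assuming a plain Python list in which `None` marks a missing value; `explore` is a hypothetical helper, the readings are made up, and Tukey's 1.5 × IQR fences are just one common outlier rule.

```python
import statistics


def explore(values):
    """Quick exploratory checks: count missing values and flag outliers.

    Outliers are values beyond Tukey's fences (1.5 x IQR outside the
    quartiles); this is one common convention, not the only one.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"n": len(values), "missing": missing,
            "mean": statistics.mean(present), "outliers": outliers}


# Hypothetical measurements with one missing entry and one suspicious value.
readings = [2.0, 2.1, None, 2.3, 2.4, 2.5, 2.6, 2.8, 50.0]
print(explore(readings))
```

Note how the single value of 50.0 pulls the mean far above every other reading; a summary number alone would hide that, which is exactly why looking at the data matters.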

Limitation #1: Missed Opportunity in Exploration

Same stats, different graphs (Anscombe’s quartet)

Same stats, different graphs

Same stats, different graphs (Datasaurus)

Autodesk Research (2017). Same Stats, Different Graphs: https://www.autodeskresearch.com/publications/samestats
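Anscombe’s quartet makes the point concrete: four small datasets whose summary statistics agree while their scatterplots look nothing alike. A minimal sketch that recomputes those statistics from the published quartet values (Anscombe, 1973); the rounding levels are a display choice made for this sketch, and `pearson` and `summary` are hypothetical helpers.

```python
import statistics

# Anscombe's quartet: four datasets of 11 (x, y) points each.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summary(xs, ys):
    """Means, sample variances, correlation, and least-squares line."""
    r = pearson(xs, ys)
    slope = r * statistics.stdev(ys) / statistics.stdev(xs)
    intercept = statistics.mean(ys) - slope * statistics.mean(xs)
    return {
        "mean_x": statistics.mean(xs), "mean_y": round(statistics.mean(ys), 2),
        "var_x": statistics.variance(xs), "var_y": round(statistics.variance(ys), 1),
        "r": round(r, 2), "slope": round(slope, 2), "intercept": round(intercept, 2),
    }


for name, (xs, ys) in QUARTET.items():
    print(name, summary(xs, ys))
```

All four rows print the same numbers, yet plotting them shows a linear trend, a curve, a line with one outlier, and a vertical cluster with one extreme point.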

Limitation #2: Identifying the Appropriate Vis

[Flowchart step: Data visualization!]

• Selecting the appropriate data visualization is challenging
  § True for both exploration and communication applications
• We’ll spend the rest of the talk on this subject

How should we visualize data?

Cross-Cutting Disciplines in Information Visualization

[Diagram: Human Perception & Cognition, Computer Graphics, and Data Analysis, all feeding into Visualization Design & Analysis]

A Small Digression

Encoding and Decoding Information

R. Kosara (EagerEyes) – https://eagereyes.org/basics/encoding-vs-decoding

Concrete Examples of Perception in Action

Example 1: A heat map, as seen by a non-colour-blind individual and by a colour-blind individual
Example 2: The Dress

Colour Blind Simulator: http://www.color-blindness.com/coblis-color-blindness-simulator/

And… we’re back!


Putting it all Together for Visualization Design & Analysis

§ Non-trivial to condense knowledge across all these areas
§ Still an ongoing area of research
§ I will try to convey a simpler intuition about design & analysis

Breaking Down a Visualization in Three Questions

Why? (Motivation)
  Why do you need to visualize data? How will you, or others, use the visualization?

What? (Data & Tasks)
  What kind of data is being visualized? What tasks are performed with the data?

How? (Visual & Interactive Design)
  How do you make the visualization? Is it the right visualization?

People tend to jump to this level (How?) and ignore why and what

Design & Evaluation with Three Questions

[Diagram: Why?, What?, How?, each paired with design and evaluation]

Evaluation questions:
• Does the visualization address the intended need?
• Are you using the right data, or deriving the right data?
• Are the visual & interactive choices appropriate for the data and tasks?
• Does the visualization support the tasks using that data?
• If interactive / computer based, is the visualization easy to use and reliable (i.e., doesn’t crash all the time)?

A Nested Model for Visualization Design & Analysis

[Diagram: four nested levels, mapped onto the three questions, with design and evaluation running across them]
• Domain Problem* (Why?)
• Data + Task (What?)
• Visual + Interaction Design Choices (How?)
• Algorithm

T. Munzner (2014) – Visualization Analysis and Design

Thinking Systematically about Data Visualization

Infovis (information visualization) research advocates an iterative process of design and evaluation.

*Domain Problem = Motivation
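In Munzner’s nested model, each level’s output is the input to the level nested inside it, so a mistake at an upstream level cascades to every level below it. The sketch below encodes only that reading; the `Level` enum and the `levels_to_revisit` helper are hypothetical illustrations, not part of any library or of the cited book.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of the nested model, from outermost (upstream) to innermost."""
    DOMAIN_PROBLEM = 0
    DATA_AND_TASK = 1
    VISUAL_AND_INTERACTION_DESIGN = 2
    ALGORITHM = 3


def levels_to_revisit(changed: Level) -> list:
    """Return the changed level plus every level nested inside it.

    Decisions at one level feed the levels below it, so revising an
    upstream level means re-checking everything downstream.
    """
    return [level for level in Level if level >= changed]


# Rethinking the data and tasks invalidates design choices and algorithms too.
print([level.name for level in levels_to_revisit(Level.DATA_AND_TASK)])
```

This is also why the iterative process front-loads feedback: catching a wrong task abstraction early is cheaper than discovering it after the encoding and algorithm have been built.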

An Iterative Process

An iterative approach to development allows us to get feedback before committing to ineffective design choices

1. Identify a relevant problem that affects you or a group of stakeholders


Public Health Stakeholders

[Diagram: Nurses, Clinicians, Medical Health Officers, Researchers, Community Leaders, Politicians, Patients]

§ Multidisciplinary decision-making teams
§ More data & diverse data types = more informed decision making
§ BUT: stakeholders differ in their ability to interpret data and in their needs

2. Ask what data stakeholders use (and whether it is available)

3. Ask what stakeholders do with the data [tasks]


Many Different Types of Data!

T. Munzner (2014) – Visualization Analysis and Design

Don’t Just Visualize the Raw Data!

[Figure, left: an example of original (raw) data vs. derived data, from T. Munzner (2014) – Visualization Analysis and Design. Right: an example of what happens when this advice is ignored (XKCD)]
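A common way to illustrate derived data is to compute one new attribute from two raw ones, such as a trade balance from exports and imports. A minimal sketch with made-up numbers; `TradeRecord` and `trade_balance` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeRecord:
    year: int
    exports: float  # original (raw) attribute
    imports: float  # original (raw) attribute


def trade_balance(records):
    """Derive one attribute (exports - imports) per year from two raw ones.

    If the task is "when did we run a deficit?", plotting this single derived
    series answers it directly; two raw lines would force the reader to
    subtract by eye.
    """
    return {r.year: r.exports - r.imports for r in records}


# Hypothetical figures in arbitrary units.
raw = [TradeRecord(2018, 120.0, 100.0), TradeRecord(2019, 110.0, 115.0),
       TradeRecord(2020, 90.0, 105.0)]
balance = trade_balance(raw)
deficit_years = [year for year, value in balance.items() if value < 0]
print(balance)        # {2018: 20.0, 2019: -5.0, 2020: -15.0}
print(deficit_years)  # [2019, 2020]
```

The choice of what to derive follows from the task: the same raw table could just as well yield growth rates or ratios if those are what stakeholders need to read off the chart.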

People Also Perform Different Tasks with Data

A Crisan (2017) – Evidence-Based Design and Analysis of a whole genome sequence clinical report…

Degree of consensus: 3 (High), 2 (Some), 1 (Low), 0 (Very Low). Task columns are grouped into diagnosis, treatment, and surveillance tasks.

| Data element | WGS equivalent | Diagnose Latent TB | Diagnose Active TB | Reactive vs New Acquisition | Characterize Transmission Risk | Choose Meds | Choose Tx Duration | Assess Response to Tx | Guide Contact Tracing | Report to Public Health | Define a Cluster | Connect Case to Existing Cluster | Guide Public Health Response | Total Score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Patient Identifier | Same | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 1 | 1 | 1 | 1 | 26 |
| Sample Collection Date | Same | 3 | 3 | 2 | 3 | 3 | 3 | 3 | 1 | 1 | 1 | 1 | 1 | 24 |
| Patient Prior TB Results | Same | 3 | 2 | 3 | 3 | 3 | 3 | 3 | 1 | 1 | 1 | 0 | 1 | 23 |
| Speciation | Speciation | 1 | 3 | 2 | 3 | 3 | 3 | 3 | 2 | 1 | 1 | 1 | 1 | 23 |
| Sample Type (sputum, fine needle aspirate) | Same | 2 | 3 | 2 | 3 | 3 | 3 | 3 | 1 | 1 | 1 | 0 | 1 | 22 |
| Culture results | WGS data | 1 | 3 | 2 | 3 | 3 | 3 | 3 | 2 | 1 | 1 | 0 | 1 | 22 |
| Sample Collection Site (lymph node, blood draw, etc.) | Same | 2 | 3 | 2 | 3 | 3 | 3 | 3 | 1 | 1 | 0 | 0 | 1 | 21 |
| Acid Fast Bacilli Smear | Speciation | 2 | 3 | 2 | 3 | 2 | 3 | 3 | 1 | 1 | 1 | 0 | 1 | 21 |
| Resistotype | Predicted DST | 0 | 2 | 3 | 1 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 19 |
| Phenotype DST | Predicted DST* | 0 | 2 | 3 | 2 | 3 | 3 | 2 | 1 | 1 | 1 | 0 | 1 | 18 |
| Chest x-ray | NA | 3 | 3 | 2 | 3 | 0 | 2 | 3 | 1 | 0 | 0 | 0 | 0 | 17 |
| Report Release Date | Same | 2 | 2 | 1 | 2 | 2 | 2 | 2 | 1 | 0 | 1 | 0 | 1 | 15 |
| Requester IDs | Same | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 1 | 0 | 0 | 0 | 0 | 15 |
| Interpretation or comments from reviewer | Same | 2 | 2 | 1 | 2 | 2 | 2 | 3 | 1 | 0 | 0 | 0 | 0 | 15 |
| Predicted DST | Predicted DST | 0 | 2 | 2 | 1 | 3 | 3 | 2 | 1 | 0 | 1 | 0 | 0 | 15 |
| MIRU-VNTR | SNPs | 0 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 13 |
| Cluster Assignment | Cluster Assignment | 0 | 2 | 2 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 11 |
| SNP/variant distance | SNPs | 0 | 1 | 2 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 10 |
| Phylogenetic Tree | Phylogenetic Tree | 0 | 2 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 9 |
| Reviewer ID | Same | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 8 |
| TST results | Speciation** | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 7 |
| IGRA results | Speciation** | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 7 |
| Lab QC | WGS Specific | 0 | 1 | 2 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 7 |
| Spoligotype | SNPs | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| RFLP | SNPs | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
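A consensus table like this can drive design decisions: for a given task, show first the data elements that stakeholders agreed on most. A minimal sketch using a handful of rows copied from the table; the task columns are assumed to follow the header order, and both the threshold and the helper `data_for_task` are hypothetical.

```python
# Task columns, assumed to follow the order of the table header.
TASKS = [
    "Diagnose Latent TB", "Diagnose Active TB", "Reactive vs New Acquisition",
    "Characterize Transmission Risk", "Choose Meds", "Choose Tx Duration",
    "Assess Response to Tx", "Guide Contact Tracing", "Report to Public Health",
    "Define a Cluster", "Connect Case to Existing Cluster",
    "Guide Public Health Response",
]

# A few rows copied from the consensus table (3 = high ... 0 = very low).
CONSENSUS = {
    "Patient Identifier": [3, 3, 3, 3, 3, 3, 3, 2, 1, 1, 1, 1],
    "Resistotype": [0, 2, 3, 1, 3, 3, 2, 2, 1, 1, 1, 1],
    "Chest x-ray": [3, 3, 2, 3, 0, 2, 3, 1, 0, 0, 0, 0],
    "Cluster Assignment": [0, 2, 2, 1, 1, 1, 0, 1, 1, 1, 1, 1],
    "Phylogenetic Tree": [0, 2, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1],
}


def data_for_task(task, min_consensus=3):
    """Return data elements whose consensus score for `task` meets the threshold."""
    column = TASKS.index(task)  # raises ValueError for an unknown task name
    return [name for name, scores in CONSENSUS.items()
            if scores[column] >= min_consensus]


print(data_for_task("Choose Meds"))          # ['Patient Identifier', 'Resistotype']
print(data_for_task("Define a Cluster", 1))  # everything except 'Chest x-ray'
```

The same data element can rank high for one task and low for another (compare the chest x-ray for diagnosis versus choosing medications), which is the point of asking about tasks rather than data alone.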

4. Explore if other visualizations have addressed this problem and set of tasks & data

5. Implement your own solution (remember that this includes interaction!)


Example of a More Complex Visualization

https://www.youtube.com/watch?v=j4Ut4krp8GQ

A Small Digression

Marks & Channels: Basic Building Blocks

• Mark: a basic graphical element (the basic building block)
• Channel: controls the appearance of marks

T. Munzner (2014) – Visualization Analysis and Design

Channels Vary in Their Effectiveness

Example:
• Bar chart: position on a common scale
• Pie chart: angle & area

J. Heer (2010) – Crowdsourcing Graphical Perception: Using Mechanical Turk …
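The bar-versus-pie contrast reflects a broader ranking of channels for ordered (magnitude) data. A minimal sketch, assuming a simplified version of the magnitude-channel ordering given in Munzner (2014) (channel names vary slightly between sources); `most_effective` is a hypothetical helper, not a library function.

```python
# Magnitude channels, most effective first (simplified from Munzner, 2014).
MAGNITUDE_CHANNELS = [
    "position on common scale",
    "position on unaligned scale",
    "length",
    "angle",
    "area",
    "depth",
    "colour luminance",
    "colour saturation",
    "curvature",
    "volume",
]


def most_effective(available):
    """Pick the highest-ranked channel among those a chart type uses."""
    wanted = set(available)
    ranked = [c for c in MAGNITUDE_CHANNELS if c in wanted]
    if not ranked:
        raise ValueError("no known magnitude channel in %r" % (available,))
    return ranked[0]


bar_chart = ["position on common scale", "length"]
pie_chart = ["angle", "area"]
print(most_effective(bar_chart))  # position on common scale
print(most_effective(pie_chart))  # angle
```

Even the best channel a pie chart offers sits below what a bar chart provides, which is consistent with the crowdsourced perception results cited above.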

Marks & Channels: ggplot2 Example

library(ggplot2)
ggplot(data = mpg, aes(x = displ, y = cty, colour = class)) + geom_point()

• Mark: point
• Channels: position (x = displ, y = cty) and colour (class)

Note: Generally in ggplot2, aesthetics refer to channels and geoms refer to marks, but there are complex geoms that aren’t simple marks but chart types (e.g., geom_density), and there are aesthetics that have little to do with the visual channels directly (e.g., group).

https://rpubs.com/hadley/ggplot-intro
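The same mark/channel split can be written out by hand. A minimal sketch of what a call like the ggplot2 one does conceptually, not how ggplot2 is implemented; `PointMark`, `encode_points`, the two-colour palette, and the sample rows (shaped like the `mpg` data but chosen by hand) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointMark:
    """A point mark whose appearance is set by three channels."""
    x: float     # horizontal position channel
    y: float     # vertical position channel
    colour: str  # colour (hue) channel


def encode_points(rows, x, y, colour, palette):
    """Turn each data row into one point mark.

    Quantitative fields drive position; the categorical field drives hue,
    with categories assigned palette entries in sorted order.
    """
    categories = sorted({row[colour] for row in rows})
    hue = {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}
    return [PointMark(float(r[x]), float(r[y]), hue[r[colour]]) for r in rows]


# Illustrative rows shaped like ggplot2's mpg data (values chosen by hand).
cars = [
    {"displ": 1.8, "cty": 18, "class": "compact"},
    {"displ": 5.7, "cty": 11, "class": "suv"},
    {"displ": 2.0, "cty": 21, "class": "compact"},
]
for mark in encode_points(cars, "displ", "cty", "class", ["red", "blue"]):
    print(mark)
```

Separating the mark type from the channel mappings is what lets a grammar-of-graphics library swap `geom_point()` for another geom while keeping the same aesthetic mappings.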

And… we’re back!


4. Explore if other visualizations have addressed this problem and set of tasks

5. Implement your own solution (part or all of that solution could be a new algorithm)


6. Test multiple alternatives (including new ones you develop) with stakeholders

7. Gather qualitative & quantitative evaluation data


Design
1. Identify a relevant problem that affects you or a group of stakeholders
2. Ask what data stakeholders use (and whether it is available)
3. Ask what stakeholders do with the data [tasks]
4. Explore if other visualizations have addressed this problem and set of tasks & data
5. Implement your own solution (vis and/or algorithm)

Evaluation
6. Test multiple alternatives (including new ones you develop) with stakeholders
7. Gather qualitative & quantitative evaluation data

My Work: Evidence-Based Design

[Figure: study design. Phases: Discovery (information gathering), Design (design & evaluation), Implement (finalize design). Activities: expert consults, task & data questionnaire, TB workflow map, design sprint, design choice questionnaire. Data gathered: qualitative and quantitative. Study design: exploratory sequential model; embedded model.]

https://peerj.com/articles/4218/

[Figure: example report]

MYCOBACTERIUM TUBERCULOSIS GENOME SEQUENCING REPORT
NOT FOR DIAGNOSTIC USE

Patient Name: JOHN DOE | [Barcode] | Birth Date: 2000-01-01 | Patient ID: 12345678910 | Location: SOMEPLACE
Sample Type: SPUTUM | Sample Source: PULMONARY | Sample Date: 2016-12-25
Sample ID: A12345678 | Sequenced From: MGIT CULTURED ISOLATE
Reporting Lab: LAB NAME | Report Date/Time: 2017-01-01, 15:36
Requested By: REQUESTER NAME | Requester Contact: REQUESTER@EMAIL.COM

Summary
The specimen was positive for Mycobacterium tuberculosis. It is resistant to isoniazid and rifampin. It belongs to a cluster, suggesting recent transmission.

Organism
The specimen was positive for Mycobacterium tuberculosis, lineage 2.2.1 (East-Asian Beijing).

Drug Susceptibility
Resistance is reported when a high-confidence resistance-conferring mutation is detected. “No mutation detected” does not exclude the possibility of resistance.

Legend: No drug resistance predicted; Mono-resistance predicted; Multi-drug resistance predicted; Extensive drug resistance predicted

Drug class | Interpretation | Drug: Resistance Gene (Amino Acid Mutation)
First Line | Susceptible | Ethambutol: No mutation detected; Pyrazinamide: No mutation detected; Streptomycin: No mutation detected
First Line | Resistant | Isoniazid: katG (S315T); Rifampin: rpoB (S531L)
Second Line | Susceptible | Ciprofloxacin, Ofloxacin, Moxifloxacin, Amikacin, Kanamycin, Capreomycin: No mutation detected

Page 1 of 2 | Patient ID: 12345678910 | Date: 2017-01-01 | Location: Someplace

My Work: Exploring Vis for Genomic Epidemiology

[Figure: Option A vs. Option B]

• How do researchers visualize data?
• How can we systematically compare visualizations?

Wrapping up

DATA VISUALIZATION IS NOT JUST AN ART PROJECT

Revisiting Today’s Learning Goal

Have a high-level understanding of data visualization design and evaluation

• Visualizations of data are useful
  § Helpful in instances of low numeracy
  § Can be used in communication and exploration
• But… visualization design also matters
  § Many different alternatives; it is important to test them
• It’s possible to think systematically about visualizations
  § Many disciplines cut across information visualization research
  § At the bare minimum, think “Why”, “What”, “How”
• Some small examples to get you started
  § https://peerj.com/articles/4218/ + more to come
