Visual Analytics in omics - why, what, how? Prof Jan Aerts STADIUS - ESAT, Faculty of Engineering, University of Leuven, Belgium Data Visualization Lab [email protected] [email protected] creativecommons.org/licenses/by-nc/3.0/


Post on 11-Jun-2015


TRANSCRIPT

Page 1: Visual Analytics in Omics - why, what, how?

Visual Analytics in omics - why, what, how?

Prof Jan Aerts

STADIUS - ESAT, Faculty of Engineering, University of Leuven, Belgium

Data Visualization Lab

[email protected]

[email protected]

creativecommons.org/licenses/by-nc/3.0/

Page 2: Visual Analytics in Omics - why, what, how?

• What problem are we trying to solve?

• What is Visual Analytics and how can it help?

• How do we actually do this?

• Some examples

• Challenges


Page 3: Visual Analytics in Omics - why, what, how?

A. What’s the problem?


Page 4: Visual Analytics in Omics - why, what, how?

hypothesis-driven -> data-driven

Scientific Research Paradigms (Jim Gray, Microsoft)

I have a hypothesis -> need to generate data to (dis)prove it.

I have data -> need to find hypotheses that I can test.

• 1st paradigm, 1,000s of years ago: empirical

• 2nd paradigm, 100s of years ago: theoretical

• 3rd paradigm, last few decades: computational

• 4th paradigm, today: data exploration

Page 5: Visual Analytics in Omics - why, what, how?

What does this mean?

• immense re-use of existing datasets

• biologically interesting signals may be too poorly understood to be analyzed in automated fashion

• much of the initial analysis is exploratory in nature => what’s my hypothesis? => searching for unknown unknowns

• automated algorithms often act as black boxes => biologists must have blind faith in the bioinformatician (and the bioinformatician in his/her own skills)

Page 6: Visual Analytics in Omics - why, what, how?
Page 7: Visual Analytics in Omics - why, what, how?

For domain expert: what’s my hypothesis?


Martin Krzywinski

Page 8: Visual Analytics in Omics - why, what, how?

For developer and domain expert: opening the black box

[Figure: analysis pipeline with the nodes input, filter 1, filter 2, filter 3, output A, output B and output C]
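A minimal sketch of what "opening the black box" could look like in code, assuming a simple linear chain of filters: every intermediate result is kept so that it can be inspected or visualized, instead of only the final output being returned. The record fields, thresholds and the `run_pipeline` helper are hypothetical and not taken from the talk.

```python
def run_pipeline(records, filters):
    """Apply named filters in order, keeping every intermediate result."""
    stages = [("input", list(records))]
    for name, keep in filters:
        previous = stages[-1][1]
        stages.append((name, [r for r in previous if keep(r)]))
    return stages


# Hypothetical variant records and thresholds, for illustration only.
variants = [
    {"id": "v1", "quality": 50, "depth": 30},
    {"id": "v2", "quality": 10, "depth": 40},
    {"id": "v3", "quality": 45, "depth": 5},
    {"id": "v4", "quality": 60, "depth": 25},
]
filters = [
    ("filter 1: quality >= 30", lambda r: r["quality"] >= 30),
    ("filter 2: depth >= 10", lambda r: r["depth"] >= 10),
]

stages = run_pipeline(variants, filters)
for name, kept in stages:
    # Show how many records survive each step, and which ones.
    print(f"{name}: {len(kept)} kept -> {[r['id'] for r in kept]}")
```

Because each stage is retained, a domain expert can see which filter removed a given record rather than having to trust the end result.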

Page 9: Visual Analytics in Omics - why, what, how?

B. What is Visual Analytics and how can it help?


Page 10: Visual Analytics in Omics - why, what, how?

Our research interest: visual design + interaction design + backend

Page 11: Visual Analytics in Omics - why, what, how?

What is visualization?


visualization of simulations

in situ visualization of real-world structures

Page 12: Visual Analytics in Omics - why, what, how?

What is visualization?

T. Munzner


Page 13: Visual Analytics in Omics - why, what, how?

What is visualization?

T. Munzner

cognition <=> perception

cognitive task => perceptual task

Page 14: Visual Analytics in Omics - why, what, how?

Why do we visualize data?

• record information

  • blueprints, photographs, seismographs, ...

• analyze data to support reasoning

  • develop & assess hypotheses

  • discover errors in data

  • expand memory

  • find patterns (see Snow’s cholera map)

• communicate information

  • share & persuade

  • collaborate & revise

Page 15: Visual Analytics in Omics - why, what, how?

Sedlmair et al. IEEE Transactions on Visualization and Computer Graphics. 2012

Page 16: Visual Analytics in Omics - why, what, how?

The strength of visualization

Page 17: Visual Analytics in Omics - why, what, how?

pictorial superiority effect

[Figure: recall of “information” after 72 hr: “informa” (65%) vs “i” (10%)]

Page 18: Visual Analytics in Omics - why, what, how?

Stevens’ psychophysical law

= proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
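For reference, the law is usually written as a power function. The symbols below are the conventional ones and do not appear on the slide:

```latex
\psi(I) = k \, I^{a}
```

Here \psi(I) is the perceived magnitude, I the physical stimulus intensity, k a scaling constant, and a an exponent that depends on the type of stimulus. With a < 1 differences are perceptually compressed, with a > 1 they are exaggerated, and with a close to 1 perception is roughly proportional to the stimulus.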


Page 19: Visual Analytics in Omics - why, what, how?

Accuracy of quantitative perceptual tasks

Mackinlay

what/where (qualitative)

how much (quantitative)

Page 20: Visual Analytics in Omics - why, what, how?

Accuracy of quantitative perceptual tasks

Mackinlay

what/where (qualitative)

how much (quantitative)

Page 21: Visual Analytics in Omics - why, what, how?

Accuracy of quantitative perceptual tasks

Mackinlay

“power of the plane”

what/where (qualitative)

how much (quantitative)

Page 22: Visual Analytics in Omics - why, what, how?

Pre-attentive vision

= ability of low-level human visual system to rapidly identify certain basic visual properties

• some features “pop out”

• used for:

  • target detection

  • boundary detection

  • counting/estimation

  • ...

• visual system takes over => all cognitive power available for interpreting the figure, rather than needing part of it for processing the figure

Page 23: Visual Analytics in Omics - why, what, how?


Page 24: Visual Analytics in Omics - why, what, how?


Page 25: Visual Analytics in Omics - why, what, how?

Limitations of pre-attentive vision

1. Combining pre-attentive features does not always work => would need to resort to “serial search” (most channel pairs; all channel triplets), e.g. “is there a red square in this picture?”

2. Speed depends on which channel is used (use one that is good for categorical data)

Page 26: Visual Analytics in Omics - why, what, how?

Gestalt laws - interplay between parts and the whole


Page 27: Visual Analytics in Omics - why, what, how?

Gestalt laws - interplay between parts and the whole

• simplicity

• proximity

• similarity

• connectedness

• good continuation

• common fate

• familiarity

• symmetry


Page 28: Visual Analytics in Omics - why, what, how?

Bret Victor - Ladder of abstraction

Page 29: Visual Analytics in Omics - why, what, how?

For domain expert: what’s my hypothesis?


Martin Krzywinski

Page 30: Visual Analytics in Omics - why, what, how?


Martin Krzywinski

Page 31: Visual Analytics in Omics - why, what, how?


Martin Krzywinski

Page 32: Visual Analytics in Omics - why, what, how?

For developer and domain expert: opening the black box

[Figure: analysis pipeline with the nodes input, filter 1, filter 2, filter 3, output A, output B and output C]

Page 33: Visual Analytics in Omics - why, what, how?

[Figure labels: A, B, C]

Page 34: Visual Analytics in Omics - why, what, how?

[Figure labels: A, B, C]

Page 35: Visual Analytics in Omics - why, what, how?

[Figure labels: A, B, C]

Page 36: Visual Analytics in Omics - why, what, how?

C. How do we actually do this?


Page 37: Visual Analytics in Omics - why, what, how?

Talking to domain experts


Page 38: Visual Analytics in Omics - why, what, how?

Data visualization framework


Page 39: Visual Analytics in Omics - why, what, how?

Card sorting


Page 40: Visual Analytics in Omics - why, what, how?

Tools of the trade


Page 41: Visual Analytics in Omics - why, what, how?

Processing - http://processing.org

• Java

Page 42: Visual Analytics in Omics - why, what, how?

D3 - http://d3js.org/

• JavaScript

Page 43: Visual Analytics in Omics - why, what, how?

Vega - https://github.com/trifacta/vega/wiki

• HTML + JSON

Page 44: Visual Analytics in Omics - why, what, how?

D. Examples

• Data exploration

• Data filtering

• User-guided analysis

Page 45: Visual Analytics in Omics - why, what, how?

Data exploration

HiTSee

Bertini E et al. IEEE Symposium on Biological Data Visualization (2011)

Page 46: Visual Analytics in Omics - why, what, how?

Aracari

Ryo Sakai

Bartlett C et al. BMC Bioinformatics (2012)


Page 47: Visual Analytics in Omics - why, what, how?

Reveal

Jäger G et al. Bioinformatics (2012)

Page 48: Visual Analytics in Omics - why, what, how?

Meander

Pavlopoulos et al. Nucl Acids Res (2013)

Georgios Pavlopoulos

Page 49: Visual Analytics in Omics - why, what, how?

ParCoord

Boogaerts T et al. IEEE International Conference on Bioinformatics & Bioengineering (2012)

Thomas Boogaerts

Endeavour gene prioritization

Page 50: Visual Analytics in Omics - why, what, how?

Sequence logo

Page 51: Visual Analytics in Omics - why, what, how?

Seagull

Page 52: Visual Analytics in Omics - why, what, how?
Page 53: Visual Analytics in Omics - why, what, how?

subgroup

similarity difference

Page 54: Visual Analytics in Omics - why, what, how?

Data filtering (visual parameter setting)

TrioVis

Ryo Sakai

Sakai R et al. Bioinformatics (2013)


Page 55: Visual Analytics in Omics - why, what, how?

User-guided analysis

Spark

Nielsen et al. Genome Research (2012)

[Figure labels: clustering; chromatin modification; DNA methylation; RNA-Seq; data samples; regions of interest]

Page 56: Visual Analytics in Omics - why, what, how?

BaobabView

van den Elzen S & van Wijk J. IEEE Conference on Visual Analytics Science and Technology (2011)

decision trees

Page 57: Visual Analytics in Omics - why, what, how?

E. Challenges


Page 58: Visual Analytics in Omics - why, what, how?

Many challenges remain

• scalability (data processing + perception), uncertainty, “interestingness”, interaction, evaluation

• infrastructure & architecture

  • fast imprecise answers with progressive refinement

  • incremental re-computation

  • steering computation towards data regions of interest
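A minimal sketch of "fast imprecise answers with progressive refinement", assuming the statistic of interest is a simple mean: the data are processed in chunks and an updated estimate is emitted after each one, so a display could show a rough answer immediately and sharpen it over time. The running total is updated incrementally rather than recomputed from scratch. The `progressive_mean` helper and the toy data are hypothetical.

```python
import random


def progressive_mean(values, chunk_size):
    """Yield (fraction_seen, running_mean) after each chunk of a shuffled copy."""
    shuffled = list(values)
    random.Random(0).shuffle(shuffled)  # fixed seed so the sketch is repeatable
    total = 0.0
    seen = 0
    for start in range(0, len(shuffled), chunk_size):
        chunk = shuffled[start:start + chunk_size]
        total += sum(chunk)  # incremental update, no re-computation
        seen += len(chunk)
        yield seen / len(shuffled), total / seen


# Toy data: 10,000 values cycling through 0..99 (exact mean is 49.5).
data = [float(i % 100) for i in range(10_000)]
estimates = list(progressive_mean(data, chunk_size=1_000))
for fraction, estimate in estimates:
    print(f"{fraction:.0%} of data seen: mean is about {estimate:.2f}")
```

Shuffling first means that every early chunk is a random sample, so the first estimates are already representative of the whole dataset.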

Page 59: Visual Analytics in Omics - why, what, how?

Computational scalability

• speed

  • preprocessing big data: MapReduce = batch

  • interactivity: max 0.3 sec lag!

• size

  • multiple data resolutions => data size increase

  • not all resolutions necessary for all data regions: steer computation to regions of interest
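A minimal sketch of steering computation to a region of interest, assuming a one-dimensional signal such as per-position coverage: the whole signal is summarized at a coarse resolution only, and the fine resolution is computed just for the region the user is looking at. The function names, bin sizes and data are hypothetical.

```python
def bin_means(values, bin_size):
    """Average consecutive runs of bin_size values (the last bin may be shorter)."""
    bins = (values[i:i + bin_size] for i in range(0, len(values), bin_size))
    return [sum(b) / len(b) for b in bins]


def overview_with_detail(values, coarse_bin, fine_bin, roi):
    """Coarse bins for the whole signal, fine bins only inside roi=(start, end)."""
    start, end = roi
    return {
        "overview": bin_means(values, coarse_bin),
        "detail": bin_means(values[start:end], fine_bin),
    }


# Hypothetical per-position signal of length 1,000.
coverage = [float(i % 10) for i in range(1_000)]
view = overview_with_detail(coverage, coarse_bin=100, fine_bin=5, roi=(200, 300))
print(len(view["overview"]), "coarse bins;", len(view["detail"]), "fine bins in ROI")
```

Computing the fine resolution everywhere would need 200 bins here; restricting it to the region of interest needs only 20 in addition to the 10 coarse ones.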

Page 60: Visual Analytics in Omics - why, what, how?

• Options:

  • distribute visualization calculations over a cluster

  • distributed Scala/Spark or another “real-time” MapReduce-style paradigm

  • functional programming paradigm?

  • lazy evaluation and smart preprocessing: only calculate what’s needed

=> generic framework
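A minimal sketch of "only calculate what’s needed", assuming the data are split into fixed-size tiles and that a view only ever requests the tiles currently on screen. Summaries are computed on first request and cached with `functools.lru_cache`, so panning to an overlapping view reuses earlier work. The tile layout, the `tile_summary` helper and the stand-in data are hypothetical.

```python
from functools import lru_cache

calls = []  # records which tiles were actually computed


@lru_cache(maxsize=None)
def tile_summary(tile_index, tile_size=100):
    """Compute the mean of one tile on first request; cached afterwards."""
    calls.append(tile_index)
    start = tile_index * tile_size
    values = range(start, start + tile_size)  # stand-in for reading real data
    return sum(values) / tile_size


def visible_summaries(first_tile, last_tile):
    """Summaries for the tiles currently in view (inclusive range)."""
    return [tile_summary(i) for i in range(first_tile, last_tile + 1)]


first_view = visible_summaries(3, 5)   # computes tiles 3, 4 and 5
second_view = visible_summaries(4, 6)  # reuses 4 and 5, computes only tile 6
print("tiles computed:", calls)
```

Tiles that are never viewed are never computed, which is the point of combining lazy evaluation with caching.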

Page 61: Visual Analytics in Omics - why, what, how?

Perceptual scalability

• “overview first, then zoom and filter, details on demand”: breaks down with very big datasets

• “analyze first, show results, then zoom and filter, details on demand” => need to identify regions of interest and “interestingness features”

• identify higher-level structure in data (e.g. clustering, dimensionality reduction) -> use these to guide user
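A minimal sketch of the "analyze first" step, assuming a one-dimensional signal and using per-window variance as a stand-in "interestingness feature": windows are scored automatically, and only the highest-scoring ones are offered to the user as starting points for zooming and filtering. The choice of variance, the window size and the `rank_windows` helper are assumptions for illustration.

```python
from statistics import pvariance


def rank_windows(values, window, top_k):
    """Score non-overlapping windows by variance; return top_k as (start, end, score)."""
    scored = []
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        scored.append((start, start + window, pvariance(chunk)))
    return sorted(scored, key=lambda w: w[2], reverse=True)[:top_k]


# Hypothetical flat signal with one short burst of activity.
signal = [1.0] * 300
signal[120:130] = [1.0, 9.0, 1.0, 9.0, 1.0, 9.0, 1.0, 9.0, 1.0, 9.0]

regions = rank_windows(signal, window=50, top_k=2)
for start, end, score in regions:
    print(f"positions {start}-{end}: variance {score:.2f}")
```

Any other score (cluster membership, distance from a model, density of outliers) could replace variance; the structure of ranking first and showing second stays the same.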

Page 62: Visual Analytics in Omics - why, what, how?

Thank you

• Georgios Pavlopoulos

• Ryo Sakai

• Thomas Boogaerts

• Toni Verbeiren

• Data Visualization Lab (datavislab.org)

• Erik Duval

• Andrew Vande Moere
