Visual Analytics in omics - why, what, how?

Prof Jan Aerts

STADIUS - ESAT, Faculty of Engineering, University of Leuven, Belgium

Data Visualization [email protected]@datavislab.org

creativecommons.org/licenses/by-nc/3.0/

• What problem are we trying to solve?

• What is Visual Analytics and how can it help?

• How do we actually do this?

• Some examples

• Challenges

A. What’s the problem?

hypothesis-driven -> data-driven

Scientific Research Paradigms (Jim Gray, Microsoft)

I have a hypothesis -> need to generate data to (dis)prove it.

I have data -> need to find hypotheses that I can test.

1st: 1,000s of years ago - empirical

2nd: 100s of years ago - theoretical

3rd: last few decades - computational

4th: today - data exploration

What does this mean?

• immense re-use of existing datasets

• biologically interesting signals may be too poorly understood to be analyzed in an automated fashion

• much of the initial analysis is exploratory in nature => what’s my hypothesis? => searching for unknown unknowns

• automated algorithms often act as black boxes => biologists must have blind faith in the bioinformatician (and the bioinformatician in their own skills)

For domain expert: what’s my hypothesis?

Martin Krzywinski

For developer and domain expert: opening the black box

[Figure: analysis pipeline with boxes labelled input, filter 1, filter 2, filter 3, output A, output B and output C]
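
One way to read "opening the black box" for a pipeline like the one in the figure is to keep every intermediate result available for inspection instead of returning only the final output. The following is a minimal sketch under that assumption; the `run_pipeline` helper, the simple chain topology and the two filter functions are hypothetical and only stand in for the boxes of the diagram.

```python
from typing import Callable, Iterable

Step = tuple[str, Callable[[list[float]], list[float]]]


def run_pipeline(data: Iterable[float], steps: list[Step]) -> dict[str, list[float]]:
    """Run each step in order and keep every intermediate result by name."""
    current = list(data)
    trace = {"input": current}
    for name, step in steps:
        current = step(current)
        trace[name] = current  # kept so that a user can inspect or plot it
    return trace


# Hypothetical filters, standing in for the boxes in the diagram.
steps: list[Step] = [
    ("filter 1", lambda xs: [x for x in xs if x >= 0]),   # drop negative values
    ("filter 2", lambda xs: [x for x in xs if x < 100]),  # drop extreme values
]

trace = run_pipeline([-5, 3, 42, 250, 7], steps)
for name, values in trace.items():
    print(f"{name:>8}: {len(values)} values -> {values}")
```

Because the trace records what each filter removed, a domain expert can check whether a filter threw away something biologically relevant before trusting the final output.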

B. What is Visual Analytics and how can it help?

Our research interest: visual design + interaction design + backend

What is visualization?

visualization of simulations

in situ visualization of real-world structures


T. Munzner

cognition <=> perception

cognitive task => perceptual task

Why do we visualize data?

• record information

  • blueprints, photographs, seismographs, ...

• analyze data to support reasoning

  • develop & assess hypotheses

  • discover errors in data

  • expand memory

  • find patterns (see Snow’s cholera map)

• communicate information

  • share & persuade

  • collaborate & revise

Sedlmair et al. IEEE Transactions on Visualization and Computer Graphics. 2012

The strength of visualization

pictorial superiority effect

[Figure: how much of “information” is retained after 72 hr: 65% (“informa”) vs. 10% (“i”)]

Stevens’ psychophysical law

= proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
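
The relationship is usually written as a power law, perceived magnitude = k * intensity^a, where the exponent a depends on the type of stimulus. A small sketch of what that implies for visual encodings follows; the exponent values are commonly cited approximations and are an assumption here, not figures taken from the slides.

```python
def perceived_magnitude(intensity: float, exponent: float, k: float = 1.0) -> float:
    """Stevens' power law: perceived magnitude = k * intensity ** exponent."""
    return k * intensity ** exponent


# Commonly cited approximate exponents; treat them as illustrative only.
EXPONENTS = {"length": 1.0, "area": 0.7, "brightness": 0.5}

for channel, a in EXPONENTS.items():
    ratio = perceived_magnitude(2.0, a) / perceived_magnitude(1.0, a)
    print(f"{channel}: doubling the stimulus is perceived as x{ratio:.2f}")
```

With an exponent below 1, a doubling of the stimulus is perceived as less than a doubling, which is one reason why area and brightness are weaker choices than length for encoding quantities.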

Accuracy of quantitative perceptual tasks

Mackinlay

what/where (qualitative)

how much (quantitative)
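
A ranking of perceptual tasks by accuracy can be used to decide which visual channel gets which data field. The sketch below assumes the orderings as they are commonly reproduced from Mackinlay (1986), truncated to the top of each list; the `assign_channels` helper and the field names are hypothetical, and treating "position" as a single channel is a simplification.

```python
# Orderings as commonly reproduced from Mackinlay (1986), most accurate first;
# only the top of each list is shown here.
RANKING = {
    "quantitative": ["position", "length", "angle", "slope", "area", "volume"],
    "nominal": ["position", "color hue", "texture", "connection", "containment"],
}


def assign_channels(fields: list[tuple[str, str]]) -> dict[str, str]:
    """Greedily give each field the best still-unused channel for its type.

    `fields` is a list of (field name, data type) pairs, most important first.
    Raises StopIteration if a type runs out of channels.
    """
    used: set[str] = set()
    assignment: dict[str, str] = {}
    for name, kind in fields:
        channel = next(c for c in RANKING[kind] if c not in used)
        used.add(channel)
        assignment[name] = channel
    return assignment


# Hypothetical fields of an expression dataset.
fields = [("expression", "quantitative"), ("tissue", "nominal"), ("p_value", "quantitative")]
print(assign_channels(fields))
```

The most important field ends up on position in both rankings, which is the practical meaning of the "power of the plane".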

Mackinlay: “power of the plane”

Pre-attentive vision

= ability of low-level human visual system to rapidly identify certain basic visual properties

• some features “pop out”

• used for:

  • target detection

  • boundary detection

  • counting/estimation

  • ...

• visual system takes over => all cognitive power is available for interpreting the figure, rather than needing part of it for processing the figure

Limitations of pre-attentive vision

1. Combining pre-attentive features does not always work => the viewer has to resort to “serial search” (true for most channel pairs and all channel triplets), e.g. “is there a red square in this picture?”

2. Speed depends on the channel used (pick one that works well for categorical data)
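
The first limitation can be made concrete with a toy model: a target pops out only if a single channel separates it from every other item; if it is defined only by a combination of channels, the viewer has to search serially. This is a deliberately simplified sketch of that rule, not a model of human perception, and the `search_type` helper is hypothetical.

```python
Item = dict[str, str]  # e.g. {"color": "red", "shape": "square"}


def search_type(target: Item, distractors: list[Item]) -> str:
    """Classify a visual search as 'pop-out' or 'serial' (conjunction) search.

    The target pops out if at least one single channel (e.g. color alone)
    separates it from every distractor; otherwise it is only defined by a
    combination of channels.
    """
    for channel, value in target.items():
        if all(d[channel] != value for d in distractors):
            return "pop-out"
    return "serial"


red_square = {"color": "red", "shape": "square"}

# Color alone identifies the target.
blue_only = [{"color": "blue", "shape": "square"}, {"color": "blue", "shape": "circle"}]
print(search_type(red_square, blue_only))

# Red circles and blue squares: only the combination red + square is unique.
mixed = [{"color": "red", "shape": "circle"}, {"color": "blue", "shape": "square"}]
print(search_type(red_square, mixed))
```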

Gestalt laws - interplay between parts and the whole

• simplicity

• proximity

• similarity

• connectedness

• good continuation

• common fate

• familiarity

• symmetry

Bret Victor - Ladder of abstraction

For domain expert: what’s my hypothesis?

Martin Krzywinski

For developer and domain expert: opening the black box

[Figure: analysis pipeline with boxes labelled input, filter 1, filter 2, filter 3, output A, output B and output C]

C. How do we actually do this?

Talking to domain experts

Data visualization framework

Card sorting

Tools of the trade

Processing - http://processing.org

• Java

D3 - http://d3js.org/

• JavaScript

Vega - https://github.com/trifacta/vega/wiki

• HTML + JSON

D. Examples

• Data exploration

• Data filtering

• User-guided analysis

Data exploration

HiTSee - Bertini E et al. IEEE Symposium on Biological Data Visualization (2011)

Aracari

Ryo Sakai

Bartlett C et al. BMC Bioinformatics (2012)

Reveal - Jäger G et al. Bioinformatics (2012)

Meander - Pavlopoulos et al. Nucl Acids Res (2013)

Georgios Pavlopoulos

ParCoord - Boogaerts T et al. IEEE International Conference on Bioinformatics & Bioengineering (2012)

Thomas Boogaerts

Endeavour gene prioritization

Sequence logo

Seagull

[Figure labels: subgroup, similarity, difference]

Data filtering (visual parameter setting)

TrioVis

Ryo Sakai

Sakai R et al. Bioinformatics (2013)

User-guided analysis

Spark - Nielsen et al. Genome Research (2012)

[Figure labels: clustering, chromatin modification, DNA methylation, RNA-Seq, data samples, regions of interest]

BaobabView - van den Elzen S & van Wijk J. IEEE Conference on Visual Analytics Science and Technology (2011)

decision trees

E. Challenges

Many challenges remain

• scalability (data processing + perception), uncertainty, “interestingness”, interaction, evaluation

• infrastructure & architecture

  • fast imprecise answers with progressive refinement

  • incremental re-computation

  • steering computation towards data regions of interest
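
"Fast imprecise answers with progressive refinement" and "incremental re-computation" can be illustrated with a running estimate that is updated chunk by chunk. The following is a minimal sketch under the assumption that the statistic of interest is a mean and that the data can be visited in random order; the `progressive_mean` generator and the synthetic data are hypothetical.

```python
import random
from typing import Iterator, Sequence


def progressive_mean(
    values: Sequence[float], chunk_size: int, seed: int = 0
) -> Iterator[tuple[int, float]]:
    """Yield (items seen, running mean) after each chunk of a random ordering.

    Early estimates are rough but available immediately; the last one is exact.
    """
    order = list(values)
    random.Random(seed).shuffle(order)
    total = 0.0
    for start in range(0, len(order), chunk_size):
        chunk = order[start:start + chunk_size]
        total += sum(chunk)  # incremental update: only the new chunk is processed
        seen = start + len(chunk)
        yield seen, total / seen


# Synthetic data standing in for a large dataset.
data = [float(i % 100) for i in range(10_000)]
for seen, estimate in progressive_mean(data, chunk_size=2_500):
    print(f"after {seen:>6} values: mean ~ {estimate:.2f}")
```

A visualization can redraw after every yielded estimate, so the user sees a first answer quickly and can stop or steer the computation before it has processed everything.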

Computational scalability

• speed

  • preprocessing big data: MapReduce = batch

  • interactivity: max 0.3 sec lag!

• size

  • multiple data resolutions => data size increase

  • not all resolutions necessary for all data regions: steer computation to regions of interest

• Options:

  • distribute visualization calculations over a cluster

  • Scala/Spark or another “real-time” MapReduce paradigm

  • functional programming paradigm?

  • lazy evaluation and smart preprocessing: only calculate what’s needed

=> generic framework
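
The combination of multiple resolutions and lazy evaluation can be sketched as tiles that are computed only when a view asks for them and are then cached. This is an illustrative sketch, not the framework the slides refer to; the `SIGNAL` data, the `tile` and `view` helpers and the choice of the mean as summary are all assumptions.

```python
from functools import lru_cache

# Hypothetical signal, e.g. read depth along a chromosome.
SIGNAL = [(i * 37) % 11 for i in range(1_000)]


@lru_cache(maxsize=None)
def tile(level: int, index: int) -> float:
    """Mean of SIGNAL over one tile; a tile at level n spans 2**n values."""
    width = 2 ** level
    window = SIGNAL[index * width:(index + 1) * width]
    return sum(window) / len(window)


def view(level: int, start: int, end: int) -> list[float]:
    """Return the tiles at `level` overlapping positions [start, end).

    Assumes 0 <= start < end <= len(SIGNAL).
    """
    width = 2 ** level
    first, last = start // width, (end - 1) // width
    return [tile(level, i) for i in range(first, last + 1)]


overview = view(6, 0, len(SIGNAL))  # coarse tiles for the whole signal
detail = view(2, 128, 192)          # fine tiles, but only for one region
print(len(overview), len(detail), tile.cache_info().currsize)
```

Only the tiles that were actually requested exist in the cache: the fine resolution is computed for the region of interest and nowhere else, and revisiting a region costs no recomputation.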

Perceptual scalability

• “overview first, then zoom and filter, details on demand”: breaks down with very big datasets

• “analyze first, show results, then zoom and filter, details on demand” => need to identify regions of interest and “interestingness features”

• identify higher-level structure in data (e.g. clustering, dimensionality reduction) -> use these to guide user
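
"Analyze first, show results" implies scoring the data before drawing anything, so that the user is pointed to a few regions instead of being shown everything. The sketch below uses variance as a stand-in for an "interestingness" feature; that choice, the fixed-size regions and the `rank_regions` helper are assumptions made for illustration.

```python
from statistics import pvariance


def rank_regions(
    signal: list[float], region_size: int, top: int = 3
) -> list[tuple[int, float]]:
    """Split `signal` into fixed-size regions and return the `top` most variable.

    Variance is only a stand-in for an "interestingness" feature.
    Returns (region start position, score) pairs, highest score first.
    """
    scored = []
    for start in range(0, len(signal), region_size):
        region = signal[start:start + region_size]
        scored.append((start, pvariance(region)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


# Hypothetical data: flat everywhere except for positions 40-49.
signal = [1.0] * 100
signal[40:50] = [1.0, 5.0, 1.0, 9.0, 1.0, 7.0, 1.0, 8.0, 1.0, 6.0]
print(rank_regions(signal, region_size=10, top=1))
```

The ranked list can then drive the initial view, for example by opening the display zoomed in on the top-scoring region.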

Thank you

• Georgios Pavlopoulos

• Ryo Sakai

• Thomas Boogaerts

• Toni Verbeiren

• Data Visualization Lab (datavislab.org)

• Erik Duval

• Andrew Vande Moere
