Visual Analytics in omics - why, what, how?

Prof Jan Aerts

STADIUS - ESAT, Faculty of Engineering, University of Leuven, Belgium

Data Visualization Lab

jan.aerts@esat.kuleuven.be

jan@datavislab.org

creativecommons.org/licenses/by-nc/3.0/

• What problem are we trying to solve?

• What is Visual Analytics and how can it help?

• How do we actually do this?

• Some examples

• Challenges

A. What’s the problem?

hypothesis-driven -> data-driven

Scientific Research Paradigms (Jim Gray, Microsoft)

I have a hypothesis -> need to generate data to (dis)prove it.

I have data -> need to find hypotheses that I can test.

1st: 1,000s of years ago - empirical

2nd: 100s of years ago - theoretical

3rd: last few decades - computational

4th: today - data exploration

What does this mean?

• immense re-use of existing datasets

• biologically interesting signals may be too poorly understood to be analyzed in an automated fashion

• much of the initial analysis is exploratory in nature => what’s my hypothesis? => searching for unknown unknowns

• automated algorithms often act as black boxes => biologists must have blind faith in the bioinformatician (and the bioinformatician in his/her own skills)

For domain expert: what’s my hypothesis?

Martin Krzywinski

[Figure: analysis pipeline with filter 1, filter 2 and filter 3, producing output A, output B and output C]

For developer and domain expert: opening the black box

B. What is Visual Analytics and how can it help?

Our research interest: visual design + interaction design + backend

What is visualization?

visualization of simulations

in situ visualization of real-world structures

T. Munzner

cognition <=> perception

cognitive task => perceptual task

Why do we visualize data?

• record information

    • blueprints, photographs, seismographs, ...

• analyze data to support reasoning

    • develop & assess hypotheses

    • discover errors in data

    • expand memory

    • find patterns (see Snow’s cholera map)

• communicate information

    • share & persuade

    • collaborate & revise

Sedlmair et al. IEEE Transactions on Visualization and Computer Graphics. 2012

The strength of visualization

pictorial superiority effect

[Figure: the word “information” shown in full, then as “informa” (65%) and “i” (10%)]

Stevens’ psychophysical law

= proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
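The relationship is usually written as a power function, perceived = k * intensity ** exponent. The sketch below is only an illustration of that form: the exponents are approximate, commonly quoted values and should be read as assumptions, and the names `APPROX_EXPONENTS`, `perceived_magnitude` and `perceived_ratio` are hypothetical.

```python
# Stevens' power law: perceived = k * intensity ** exponent.
# The exponents are approximate, commonly quoted values (assumptions for illustration).
APPROX_EXPONENTS = {
    "length": 1.0,      # perceived roughly linearly
    "area": 0.7,        # differences in area are underestimated
    "brightness": 0.5,  # strongly compressed
}


def perceived_magnitude(intensity: float, exponent: float, k: float = 1.0) -> float:
    """Return the perceived magnitude of a stimulus under Stevens' power law."""
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    return k * intensity ** exponent


def perceived_ratio(physical_ratio: float, channel: str) -> float:
    """How much larger a stimulus appears when it is physical_ratio times larger."""
    return physical_ratio ** APPROX_EXPONENTS[channel]


if __name__ == "__main__":
    # A value that is 4x larger, encoded in three different visual channels.
    for channel in APPROX_EXPONENTS:
        print(channel, round(perceived_ratio(4.0, channel), 2))
```

Under these assumed exponents, a fourfold difference is read faithfully when encoded as length but looks considerably smaller when encoded as area or brightness, which is one argument for preferring position and length for quantitative data.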

Accuracy of quantitative perceptual tasks

Mackinlay

what/where (qualitative)

how much (quantitative)

Mackinlay

“power of the plane”

Pre-attentive vision

= ability of low-level human visual system to rapidly identify certain basic visual properties

• some features “pop out”

• used for:

• target detection

• boundary detection

• counting/estimation

• ...

• visual system takes over => all cognitive power available for interpreting the figure, rather than needing part of it for processing the figure

Limitations of preattentive vision

1. Combining pre-attentive features does not always work => would need to resort to “serial search” (most channel pairs; all channel triplets), e.g. is there a red square in this picture?

2. Speed depends on which channel is used (use one that is good for categorical data)

Gestalt laws - interplay between parts and the whole

• simplicity

• proximity

• similarity

• connectedness

• good continuation

• common fate

• familiarity

• symmetry

Bret Victor - Ladder of Abstraction

C. How do we actually do this?

Talking to domain experts

Data visualization framework

Card sorting

Tools of the trade

Processing - http://processing.org

• Java

D3 - http://d3js.org/

• JavaScript

Vega - https://github.com/trifacta/vega/wiki

• HTML + JSON

D. Examples

• Data exploration

• Data filtering

• User-guided analysis

Data exploration

HiTSee

Bertini E et al. IEEE Symposium on Biological Data Visualization (2011)

Aracari

Ryo Sakai

Bartlett C et al. BMC Bioinformatics (2012)

Reveal

Jäger G et al. Bioinformatics (2012)

Meander

Pavlopoulos et al. Nucl Acids Res (2013)

Georgios Pavlopoulos

ParCoord

Boogaerts T et al. IEEE International Conference on Bioinformatics & Bioengineering (2012)

Thomas Boogaerts

Endeavour gene prioritization

Sequence logo

Seagull

[Figure labels: subgroup; similarity; difference]

Data filtering (visual parameter setting)

TrioVis

Ryo Sakai

Sakai R et al. Bioinformatics (2013)

User-guided analysis

Spark

Nielsen et al. Genome Research (2012)

[Figure labels: clustering; chromatin modification; DNA methylation; RNA-Seq; data samples; regions of interest]

BaobabView

van den Elzen S & van Wijk J. IEEE Conference on Visual Analytics Science and Technology (2011)

decision trees

E. Challenges

Many challenges remain

• scalability (data processing + perception), uncertainty, “interestingness”, interaction, evaluation

• infrastructure & architecture

• fast imprecise answers with progressive refinement

• incremental re-computation

• steering computation towards data regions of interest
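To make "fast imprecise answers with progressive refinement" and "incremental re-computation" concrete, here is a minimal sketch that estimates a mean over data arriving in chunks. It is an illustration under assumptions, not part of any tool mentioned in these slides; the name `progressive_mean` is hypothetical.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def progressive_mean(
    chunks: Iterable[Sequence[float]],
) -> Iterator[Tuple[int, float]]:
    """Yield (values_seen, running_mean) after each chunk.

    The running sum is updated incrementally, so each refinement costs
    only the size of the new chunk, not the size of all data seen so far.
    """
    total = 0.0
    count = 0
    for chunk in chunks:
        if not chunk:
            continue  # nothing new to refine the estimate with
        total += sum(chunk)
        count += len(chunk)
        yield count, total / count


if __name__ == "__main__":
    data = [[1.0, 3.0], [5.0, 7.0], [9.0]]
    for seen, estimate in progressive_mean(data):
        # A visualization could redraw here with the current estimate.
        print(seen, estimate)
```

Because the function is a generator, the caller decides when to stop: a user interface could keep consuming estimates until the interaction time budget runs out and show the latest one.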

Computational scalability

• speed

    • preprocessing big data: MapReduce = batch

    • interactivity: max 0.3 sec lag!

• size

    • multiple data resolutions => data size increase

    • not all resolutions necessary for all data regions: steer computation to regions of interest

• Options:

    • distribute visualization calculations over cluster

    • distributed Scala/Spark or another “real-time” MapReduce-style paradigm

    • functional programming paradigm?

    • lazy evaluation and smart preprocessing: only calculate what’s needed

=> generic framework
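One way to read "lazy evaluation" and "not all resolutions necessary for all data regions" is sketched below: summaries are computed only for the region and resolution that is actually requested, and cached for reuse. This is a toy, single-machine illustration under assumptions; the class `LazyMultiResolution` and its methods are hypothetical names, not an existing framework.

```python
from typing import Dict, List, Sequence, Tuple


class LazyMultiResolution:
    """Compute binned averages only for the region and resolution requested."""

    def __init__(self, values: Sequence[float]) -> None:
        self._values = list(values)
        self._cache: Dict[Tuple[int, int, int], List[float]] = {}
        self.computations = 0  # counts real (non-cached) computations

    def summary(self, start: int, end: int, bin_size: int) -> List[float]:
        """Return the mean of each bin of bin_size values in values[start:end]."""
        if bin_size <= 0:
            raise ValueError("bin_size must be positive")
        key = (start, end, bin_size)
        if key not in self._cache:
            self.computations += 1
            region = self._values[start:end]
            self._cache[key] = [
                sum(region[i:i + bin_size]) / len(region[i:i + bin_size])
                for i in range(0, len(region), bin_size)
            ]
        return self._cache[key]


if __name__ == "__main__":
    signal = LazyMultiResolution(range(16))
    print(signal.summary(0, 16, 8))  # coarse overview of everything
    print(signal.summary(4, 8, 1))   # full detail, only for a region of interest
    print(signal.summary(0, 16, 8))  # served from the cache
    print(signal.computations)
```

The design choice here is to key the cache on (start, end, bin_size), which keeps the sketch short; a real system would more likely cache fixed tiles per resolution so that overlapping requests can share work.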

Perceptual scalability

• “overview first, then zoom and filter, details on demand”: breaks down with very big datasets

• “analyze first, show results, then zoom and filter, details on demand” => need to identify regions of interest and “interestingness features”

• identify higher-level structure in data (e.g. clustering, dimensionality reduction) -> use these to guide user
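The "analyze first" step can be sketched as scoring regions of the data and handing only the highest-scoring ones to the user. In the sketch below the score is simply the variance within a fixed window, which is a placeholder assumption for whatever "interestingness feature" suits the data; the function name `rank_regions` is hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def rank_regions(
    values: Sequence[float], window: int, top_k: int
) -> List[Tuple[int, int, float]]:
    """Return the top_k (start, end, score) windows, highest score first.

    The score is the population variance within the window, used here only
    as a placeholder "interestingness feature".
    """
    if window <= 0:
        raise ValueError("window must be positive")
    scored = []
    for start in range(0, len(values), window):
        region = values[start:start + window]
        scored.append((start, start + len(region), pvariance(region)))
    # Stable sort: windows with equal scores keep their left-to-right order.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    signal = [1, 1, 1, 1, 0, 10, 0, 10, 2, 2, 3, 2]
    for start, end, score in rank_regions(signal, window=4, top_k=2):
        # A visualization could open with these regions highlighted.
        print(start, end, score)
```

The returned regions could then seed the initial view, after which the usual zoom, filter and details-on-demand interactions apply.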

Thank you

• Georgios Pavlopoulos

• Ryo Sakai

• Thomas Boogaerts

• Toni Verbeiren

• Data Visualization Lab (datavislab.org)

• Erik Duval

• Andrew Vande Moere
