Visual Analytics in omics - why, what, how?
Prof Jan Aerts
STADIUS - ESAT, Faculty of Engineering, University of Leuven, Belgium
Data Visualization Lab
[email protected]@datavislab.org
creativecommons.org/licenses/by-nc/3.0/
• What problem are we trying to solve?
• What is Visual Analytics and how can it help?
• How do we actually do this?
• Some examples
• Challenges
A. What’s the problem?
hypothesis-driven -> data-driven
Scientific Research Paradigms (Jim Gray, Microsoft)
I have a hypothesis -> need to generate data to (dis)prove it.
I have data -> need to find hypotheses that I can test.
1st: 1,000s of years ago - empirical
2nd: 100s of years ago - theoretical
3rd: last few decades - computational
4th: today - data exploration
What does this mean?
• immense re-use of existing datasets
• biologically interesting signals may be too poorly understood to be analyzed in automated fashion
• much of initial analysis is exploratory in nature => what’s my hypothesis? => searching for unknown unknowns
• automated algorithms often act as black boxes => biologists must have blind faith in the bioinformatician (and the bioinformatician in his/her own skills)
For domain expert: what’s my hypothesis?
Martin Krzywinski
For developer and domain expert: opening the black box
[Pipeline diagram: input; filter 1; filter 2; filter 3; output A; output B; output C]
B. What is Visual Analytics and how can it help?
Our research interest: visual design + interaction design + backend
What is visualization?
visualization of simulations
in situ visualization of real-world structures
What is visualization? (T. Munzner)
cognition <=> perception
cognitive task => perceptual task
Why do we visualize data?
• record information
  • blueprints, photographs, seismographs, ...
• analyze data to support reasoning
  • develop & assess hypotheses
  • discover errors in data
  • expand memory
  • find patterns (see Snow’s cholera map)
• communicate information
  • share & persuade
  • collaborate & revise
Sedlmair et al. IEEE Transactions on Visualization and Computer Graphics. 2012
The strength of visualization
pictorial superiority effect
[Figure: recall of information after 72 hr: 65% when paired with a picture vs. 10% without]
Stevens’ psychophysical law
= proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
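Stevens’ law is usually written as a power function, perceived = k * stimulus^a, where the exponent a depends on the type of stimulus. A minimal Python sketch of what this implies for visual encodings follows. The exponents are approximate, commonly quoted values and are assumptions for illustration, not taken from the slides; the function names are hypothetical.

```python
# Stevens' power law: perceived = k * stimulus ** a
# Exponents are approximate, commonly quoted values (assumption, not from the slides).
EXPONENTS = {
    "length": 1.0,      # perceived roughly linearly
    "area": 0.7,        # larger areas are underestimated
    "brightness": 0.5,  # strongly compressed
}


def perceived_magnitude(stimulus, exponent, k=1.0):
    """Perceived intensity for a physical stimulus magnitude."""
    return k * stimulus ** exponent


def perceived_ratio(physical_ratio, channel):
    """How large a physical ratio appears on a given visual channel."""
    return perceived_magnitude(physical_ratio, EXPONENTS[channel])


if __name__ == "__main__":
    for channel in EXPONENTS:
        ratio = perceived_ratio(4.0, channel)
        print(f"{channel:10s}: a 4x stimulus looks about {ratio:.2f}x larger")
```

Under these assumed exponents, a value encoded as length is read close to its true ratio, while the same value encoded as area or brightness is systematically underestimated.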
Accuracy of quantitative perceptual tasks
Mackinlay: “power of the plane”
what/where (qualitative); how much (quantitative)
Pre-attentive vision
= ability of low-level human visual system to rapidly identify certain basic visual properties
• some features “pop out”
• used for:
• target detection
• boundary detection
• counting/estimation
• ...
• visual system takes over => all cognitive power available for interpreting the figure, rather than needing part of it for processing the figure
Limitations of preattentive vision
1. Combining pre-attentive features does not always work => would need to resort to “serial search” (most channel pairs; all channel triplets), e.g. is there a red square in this picture?
2. Speed depends on which channel (use one that is good for categorical data)
Gestalt laws - interplay between parts and the whole
• simplicity
• proximity
• similarity
• connectedness
• good continuation
• common fate
• familiarity
• symmetry
Bret Victor - Ladder of abstraction
C. How do we actually do this?
Talking to domain experts
Data visualization framework
Card sorting
Tools of the trade
Vega - https://github.com/trifacta/vega/wiki
• html + json
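A Vega visualization is described declaratively as a JSON document that a small HTML page hands to the Vega runtime. As a rough sketch, the Python snippet below builds a minimal bar-chart specification and serializes it to JSON. The layout is assumed from recent Vega releases (the version documented on the linked wiki may differ), and the data values and the names `table`, `xscale`, `yscale` and `bar_chart_spec` are made up for illustration.

```python
import json


def bar_chart_spec(values):
    """Build a minimal Vega-style bar chart specification as a dict."""
    return {
        "width": 400,
        "height": 200,
        "data": [{"name": "table", "values": values}],
        "scales": [
            {"name": "xscale", "type": "band", "range": "width",
             "domain": {"data": "table", "field": "gene"}, "padding": 0.1},
            {"name": "yscale", "type": "linear", "range": "height",
             "domain": {"data": "table", "field": "count"}, "nice": True},
        ],
        "axes": [
            {"orient": "bottom", "scale": "xscale"},
            {"orient": "left", "scale": "yscale"},
        ],
        "marks": [{
            "type": "rect",
            "from": {"data": "table"},
            "encode": {"enter": {
                "x": {"scale": "xscale", "field": "gene"},
                "width": {"scale": "xscale", "band": 1},
                "y": {"scale": "yscale", "field": "count"},
                "y2": {"scale": "yscale", "value": 0},
            }},
        }],
    }


# Hypothetical read counts per gene, purely illustrative.
spec = bar_chart_spec([
    {"gene": "geneA", "count": 28},
    {"gene": "geneB", "count": 55},
    {"gene": "geneC", "count": 43},
])
spec_json = json.dumps(spec, indent=2)  # the JSON half of "html + json"
```

The point of the design is that the chart is data, not drawing code: the same JSON can be generated by any backend and rendered in the browser.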
D. Examples
• Data exploration
• Data filtering
• User-guided analysis
Data exploration
HiTSee: Bertini E et al. IEEE Symposium on Biological Data Visualization (2011)
Aracari
Ryo Sakai
Bartlett C et al. BMC Bioinformatics (2012)
Reveal: Jäger G et al. Bioinformatics (2012)
Meander: Pavlopoulos et al. Nucl Acids Res (2013)
Georgios Pavlopoulos
ParCoord: Boogaerts T et al. IEEE International Conference on Bioinformatics & Bioengineering (2012)
Thomas Boogaerts
Endeavour gene prioritization
Sequence logo
Seagull
subgroup
similarity difference
Data filtering (visual parameter setting)
TrioVis
Ryo Sakai
Sakai R et al. Bioinformatics (2013)
User-guided analysis
Spark: Nielsen et al. Genome Research (2012)
clustering
chromatin modification
DNA methylation
RNA-Seq
data samples
regions of interest
BaobabView: van den Elzen S & van Wijk J. IEEE Conference on Visual Analytics Science and Technology (2011)
decision trees
E. Challenges
Many challenges remain
• scalability (data processing + perception), uncertainty, “interestingness”, interaction, evaluation
• infrastructure & architecture
• fast imprecise answers with progressive refinement
• incremental re-computation
• steering computation towards data regions of interest
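Fast, imprecise answers with progressive refinement can be sketched as a generator that yields a running estimate after each chunk of data, so that a view can be redrawn long before the full pass finishes. This is a minimal illustration under assumed names (`progressive_mean`, `chunk_size`), not a description of any particular system.

```python
import random


def progressive_mean(values, chunk_size=1000):
    """Yield (items_seen, running_mean) after each chunk of input.

    Only the running sum and count are kept, so each refinement step
    is incremental rather than a full re-computation.
    """
    total = 0.0
    seen = 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        seen += len(chunk)
        yield seen, total / seen


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    data = [rng.gauss(10.0, 2.0) for _ in range(10_000)]
    for seen, estimate in progressive_mean(data, chunk_size=2500):
        print(f"after {seen:>6} values: mean ~ {estimate:.3f}")
```

A consumer can stop iterating as soon as the estimate is good enough for the current zoom level, which is what keeps the interaction responsive.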
Computational scalability
• speed
• preprocessing big data: mapreduce = batch
• interactivity: max 0.3 sec lag!
• size
• multiple data resolutions => data size increase
• not all resolutions necessary for all data regions: steer computation to regions of interest
• Options:
• distribute visualization calculations over cluster
• distributed computing with Scala/Spark or another “real-time” MapReduce paradigm
• functional programming paradigm?
• lazy evaluation and smart preprocessing: only calculate what’s needed
=> generic framework
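Lazy evaluation combined with multiple data resolutions can be sketched as follows: aggregates are computed only for the tiles a user actually looks at, and cached so that revisiting a region costs nothing. The class and method names (`LazyPyramid`, `tile`, `view`) are hypothetical, and mean-binning stands in for whatever summary a real tool would use. The sketch assumes the requested region lies within the data.

```python
class LazyPyramid:
    """Multi-resolution view of a 1-D signal, computed on demand."""

    def __init__(self, values, tile_size=4):
        self.values = values
        self.tile_size = tile_size
        self.cache = {}  # (level, index) -> mean of that tile

    def tile(self, level, index):
        """Mean of tile `index` at `level`; level 0 is the finest."""
        key = (level, index)
        if key not in self.cache:
            width = self.tile_size * 2 ** level  # coarser levels span more data
            chunk = self.values[index * width:(index + 1) * width]
            self.cache[key] = sum(chunk) / len(chunk)
        return self.cache[key]

    def view(self, level, start, stop):
        """Tiles overlapping data positions [start, stop) at `level`."""
        width = self.tile_size * 2 ** level
        first, last = start // width, (stop - 1) // width
        return [self.tile(level, i) for i in range(first, last + 1)]


pyramid = LazyPyramid(list(range(32)), tile_size=4)
coarse = pyramid.view(level=1, start=8, stop=16)  # one coarse tile
fine = pyramid.view(level=0, start=0, stop=8)     # two fine tiles
```

Only three tiles are ever computed here, even though the full pyramid over 32 values would hold more; the rest stay uncomputed until someone navigates to them.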
Perceptual scalability
• “overview first, then zoom and filter, details on demand”: breaks down with very big datasets
• “analyze first, show results, then zoom and filter, details on demand” => need to identify regions of interest and “interestingness features”
• identify higher-level structure in data (e.g. clustering, dimensionality reduction) -> use these to guide user
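The “analyze first” step can be as simple as scoring every region with an interestingness measure and pointing the user at the top-scoring ones. The sketch below uses variance within fixed-width windows as a stand-in measure; both the measure and the function name `rank_regions` are assumptions for illustration.

```python
from statistics import pvariance


def rank_regions(values, window=4, top=2):
    """Return the `top` windows with highest variance as (start, stop, score)."""
    regions = []
    for start in range(0, len(values) - window + 1, window):
        stop = start + window
        regions.append((start, stop, pvariance(values[start:stop])))
    # Highest score first: these are the regions to show the user first.
    regions.sort(key=lambda region: region[2], reverse=True)
    return regions[:top]


# Toy signal: flat, strongly oscillating, mildly varying, flat.
signal = [1, 1, 1, 1, 1, 9, 1, 9, 2, 2, 3, 2, 1, 1, 1, 1]
suggested = rank_regions(signal, window=4, top=2)
```

In practice the score could come from clustering or dimensionality reduction instead; the pattern of ranking regions before drawing anything stays the same.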
Thank you
• Georgios Pavlopoulos
• Ryo Sakai
• Thomas Boogaerts
• Toni Verbeiren
• Data Visualization Lab (datavislab.org)
• Erik Duval
• Andrew Vande Moere