Humanizing bioinformatics
Jan Aerts
Assistant Professor - ESAT/SCD
BioData Analysis & Visualization
Faculty of Engineering
Leuven University
@jandot
whoami
Leuven -> Wageningen -> Roslin -> Hinxton -> Leuven
why “humanizing bioinformatics”?
what I’ll talk about
- scientific research paradigms
- big & complex data
- what about the user?
- data visualization
scientific research throughout time
Science Paradigms (Jim Gray)
1st: 1,000s of years ago, empirical
2nd: 100s of years ago, theoretical
3rd: last few decades, computational
4th: today, data exploration
computational biology
bioinformatics
ever bigger datasets
ever more complicated mining algorithms
case in point: genome sequencing
why do we sequence?
variation discovery
transcriptionally active sites
gene expression
copy number variation
miRNA expression & discovery
protein-DNA interactions
alternative splicing
[figure: reads, coverage, polymorphisms, gene model]
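The figure labels above (reads, coverage) refer to how many sequencing reads overlap each position of the genome. A minimal sketch follows. It assumes the reads are already aligned and are given as hypothetical half-open (start, end) intervals on a single chromosome. It computes per-base coverage with a difference array.

```python
def per_base_coverage(reads, genome_length):
    """Count how many reads overlap each position (0-based, half-open intervals)."""
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1   # a read starts covering at `start`
        diff[end] -= 1     # ... and stops covering at `end`
    coverage, depth = [], 0
    for pos in range(genome_length):
        depth += diff[pos]  # running sum = number of reads over this base
        coverage.append(depth)
    return coverage

# Hypothetical aligned reads on a 10-base toy genome.
reads = [(0, 5), (2, 7), (4, 10)]
cov = per_base_coverage(reads, 10)
print(cov)  # [1, 1, 2, 2, 3, 2, 2, 1, 1, 1]
```

The difference array touches each read once, so the cost grows with the number of reads plus the genome length, not with their product.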
single nucleotide polymorphisms
Robberecht et al, 2010
Molecular Biology of the Cell, 4th Edition
structural variation
Human Genome Project
automate, automate, automate
HGP: 15 years, $3 billion, tens of labs => 1 genome
now: 1 week, $5,000, 1 technician => 1 genome
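The two lines above can be turned into fold changes. This is a back-of-the-envelope sketch that takes the slide's figures at face value. It also approximates a year as 52 weeks, which is an assumption of this example.

```python
# Figures from the slide: Human Genome Project vs. one genome today.
HGP_COST_USD = 3_000_000_000   # $3 billion
HGP_WEEKS = 15 * 52            # 15 years, assuming 52 weeks per year
NOW_COST_USD = 5_000           # $5,000
NOW_WEEKS = 1                  # 1 week

cost_fold = HGP_COST_USD / NOW_COST_USD   # how many times cheaper per genome
time_fold = HGP_WEEKS / NOW_WEEKS         # how many times faster per genome

print(f"~{cost_fold:,.0f}x cheaper, ~{time_fold:,.0f}x faster")
# prints "~600,000x cheaper, ~780x faster"
```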
genome sequencing throughput
Mardis, 2010
“next-generation” sequencing platforms
Mardis, 2010
NHGRI
Metzker et al, 2010
big throughput => big data
advanced data structures
advanced data mining
support vector machine recursive feature elimination
manifold learning
adaptive cascade sharing trees
“Dammit Jim, I’m a doctor, not a bioinformatician!” (Christophe Lambert)
We’re alienating the user...
too much data
blind trust (?) in the bioinformatician
what’s the question?
what parameters should I use?
can I trust this output?
I can’t wrap my head around this...
but...
what’s the question?
4th paradigm
question -> hypothesis -> generate data
generate data -> see what we can do with it
Gene interaction data: “A regulates B”
what parameters should I use?
peak
but is this?
van de Wiel et al, 2010
T. Voet
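The question “what parameters should I use?” can be made concrete with a toy peak caller. This minimal sketch uses a hypothetical caller, not any published method. It assumes a peak is simply a local maximum at or above a height threshold, and it runs on a made-up signal. It shows how the answer changes with that single parameter.

```python
def call_peaks(signal, min_height):
    """Hypothetical caller: indices of local maxima whose height >= min_height."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] >= min_height
            and signal[i] > signal[i - 1]      # rising into the point
            and signal[i] >= signal[i + 1]]    # not rising after it

signal = [0, 2, 9, 3, 1, 4, 2, 0, 6, 1]  # made-up coverage profile
for threshold in (3, 5, 8):
    print(threshold, call_peaks(signal, threshold))
# 3 [2, 5, 8]
# 5 [2, 8]
# 8 [2]
```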
can I trust this output?
data filtering
putative mutations -> filter 1 -> filter 2 -> filter 3
different settings for the filters give different result sets A, B and C
[figure: Venn diagram of result sets A, B and C]
State of the art: run many filter pipelines and take the intersection
What we should have found...
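The “run many filter pipelines and take the intersection” strategy can be sketched in a few lines. This is a minimal sketch under stated assumptions. The `quality` and `depth` filters, their thresholds, and the mutation IDs are all hypothetical. The point is that the intersection keeps only what every setting agrees on, so it can be much smaller than what any single setting reports.

```python
# Hypothetical candidate calls; IDs, qualities and depths are made up.
CANDIDATES = [
    {"id": "mut1", "quality": 50, "depth": 30},
    {"id": "mut2", "quality": 35, "depth": 12},
    {"id": "mut3", "quality": 22, "depth": 40},
    {"id": "mut4", "quality": 18, "depth": 8},
    {"id": "mut5", "quality": 45, "depth": 9},
]

# Three hypothetical settings (min_quality, min_depth) for the same two filters.
SETTINGS = {"A": (20, 10), "B": (30, 5), "C": (15, 15)}

def run_pipeline(candidates, min_quality, min_depth):
    """Keep the IDs of calls that pass both (hypothetical) filters."""
    return {c["id"] for c in candidates
            if c["quality"] >= min_quality and c["depth"] >= min_depth}

results = {name: run_pipeline(CANDIDATES, *s) for name, s in SETTINGS.items()}
consensus = set.intersection(*results.values())  # "take the intersection"
union = set.union(*results.values())             # anything any setting reported

print(sorted(consensus))  # ['mut1']
print(sorted(union))      # ['mut1', 'mut2', 'mut3', 'mut5']
```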
different algorithms for finding the same thing
I can’t wrap my head around this...
too much (?) info
treatment plan for cancer patients
heterogeneous datasets
multiple abstraction levels
multiple sources
multiple formats
population/family data, patient/clinical data
MR/CT/X-ray, tissue samples
pathways, gene expression data
collaborative data examination: pathologist, geneticist, biologist
researcher is lost...
data visualization
“... the use of computer-supported, interactive, visual representations of data to amplify cognition” (S Card, J Mackinlay & B Shneiderman)
“... computer-based visualization systems providing visual representations of datasets intended to help people carry out some task more effectively.” (T Munzner)
cognitive task => perceptive task
Anscombe's quartet
        I             II            III           IV
    x      y      x      y      x      y      x      y
 10.0   8.04   10.0   9.14   10.0   7.46    8.0   6.58
  8.0   6.95    8.0   8.14    8.0   6.77    8.0   5.76
 13.0   7.58   13.0   8.74   13.0  12.74    8.0   7.71
  9.0   8.81    9.0   8.77    9.0   7.11    8.0   8.84
 11.0   8.33   11.0   9.26   11.0   7.81    8.0   8.47
 14.0   9.96   14.0   8.10   14.0   8.84    8.0   7.04
  6.0   7.24    6.0   6.13    6.0   6.08    8.0   5.25
  4.0   4.26    4.0   3.10    4.0   5.39   19.0  12.50
 12.0  10.84   12.0   9.13   12.0   8.15    8.0   5.56
  7.0   4.82    7.0   7.26    7.0   6.42    8.0   7.91
  5.0   5.68    5.0   4.74    5.0   5.73    8.0   6.89
identical for all four datasets (to the precision shown):
n = 11
mean x = 9.0, mean y = 7.5
variance x = 11.0, variance y = 4.12
correlation x & y = 0.816
regression line: y = 3 + 0.5x
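The point of the quartet is that four quite different datasets share the same summary statistics, and only a plot reveals the difference. This minimal sketch uses only Python's standard library to recompute those statistics from the values above. It uses sample variance (n - 1), Pearson correlation and a least-squares line. The last y value of dataset IV is taken as 6.89, Anscombe's published value. The function and variable names are illustrative.

```python
from statistics import fmean, variance

X_123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]
QUARTET = {
    "I":   (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8.0] * 7 + [19.0] + [8.0] * 3,
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summary(xs, ys):
    """Summary statistics from the slide for one (x, y) dataset."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx                                 # least-squares slope
    return {
        "n": len(xs),
        "mean_x": mx, "mean_y": my,
        "var_x": variance(xs), "var_y": variance(ys),  # sample variance (n - 1)
        "corr": sxy / (sxx * syy) ** 0.5,              # Pearson correlation
        "intercept": my - slope * mx, "slope": slope,
    }

for name, (xs, ys) in QUARTET.items():
    s = summary(xs, ys)
    print(f"{name:>3}: mean y={s['mean_y']:.2f} var y={s['var_y']:.2f} "
          f"r={s['corr']:.3f} y={s['intercept']:.2f}+{s['slope']:.3f}x")
```

All four lines of output agree to the precision printed, even though a scatter plot of each dataset looks completely different.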
exploration explanation
pictorial superiority effect
“information”
“informa” “i” 65% 1%
72hr
exploration explanation
J van Wijk
some of the principles
know your visual encodings
power of the plane
danger of depth
eyes beat memory
overview, zoom and filter, details on demand
...
(taken from T Munzner)
visual encoding channels (Mackinlay)
position on common scale, position on unaligned scale
2D size, 3D size
“power of the plane”
examples of sub-optimal encoding
Florence Nightingale
Don’t believe everything you see
networks... <sigh>
Martin Krzywinski
same network
Martin Krzywinski
different networks!
3D, anyone?
occlusion
interaction complexity
perspective distortion
text legibility
Gene interaction data: “A regulates B”
regulator
manager
workhorse
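The labels regulator, manager and workhorse suggest assigning each gene in an “A regulates B” network a role based on the direction of its edges. This minimal sketch assumes those roles are defined purely by edge direction: a regulator has only outgoing edges, a workhorse has only incoming edges, and a manager has both. That definition is an assumption of this example, and the gene names are hypothetical.

```python
from collections import defaultdict

def classify_nodes(edges):
    """Assign each node a role from the direction of its 'A regulates B' edges."""
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for a, b in edges:            # edge (a, b) means "a regulates b"
        out_deg[a] += 1
        in_deg[b] += 1
    roles = {}
    for node in sorted(set(out_deg) | set(in_deg)):
        if out_deg[node] and in_deg[node]:
            roles[node] = "manager"    # regulates and is regulated
        elif out_deg[node]:
            roles[node] = "regulator"  # only regulates others
        else:
            roles[node] = "workhorse"  # only regulated
    return roles

edges = [("geneA", "geneB"), ("geneB", "geneC"), ("geneA", "geneD")]
print(classify_nodes(edges))
# {'geneA': 'regulator', 'geneB': 'manager', 'geneC': 'workhorse', 'geneD': 'workhorse'}
```

Grouping nodes by role like this gives a layout-independent structure: the same network always produces the same grouping, unlike the force-directed “hairballs” shown earlier.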
“lie factor” = (size of effect shown in graphic) / (size of effect in data)
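The lie factor is Edward Tufte's measure of graphical distortion. This minimal sketch follows Tufte's usual definition of “size of effect” as the relative change between a first and a second value. The bar-chart numbers are hypothetical. A lie factor close to 1 means the graphic shows the change faithfully.

```python
def size_of_effect(first, second):
    """Relative change between two values: |second - first| / first."""
    return abs(second - first) / first

def lie_factor(data_first, data_second, shown_first, shown_second):
    """Effect drawn in the graphic divided by the effect in the data."""
    return (size_of_effect(shown_first, shown_second)
            / size_of_effect(data_first, data_second))

# Hypothetical bar chart: the value rises from 10 to 15 (+50 %),
# but the bar is drawn 1 cm tall and then 4 cm tall (+300 %).
print(lie_factor(10, 15, 1, 4))  # 6.0
```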
Humanizing bioinformatics
there and back again
put the user back in the loop!
Thank you
Acknowledgments
• graphics creators
• Tamara Munzner
• Martin Krzywinski
Image attributions
... got lost ...
If you find something that’s yours, let me know!