humanizing bioinformatics

Post on 11-May-2015

DESCRIPTION

In this talk, I explain the need for basic visualization know-how in bioinformatics.

TRANSCRIPT

Humanizing bioinformatics

Jan Aerts
Assistant Professor - ESAT/SCD
BioData Analysis & Visualization
Faculty of Engineering
Leuven University

jan.aerts@esat.kuleuven.be
@jandot

whoami

Leuven -> Wageningen -> Roslin -> Hinxton -> Leuven

why “humanizing bioinformatics”?

what I’ll talk about

• scientific research paradigms
• big & complex data
• what about the user?
• data visualization

scientific research throughout time

Science Paradigms

1st   1,000s of years ago   empirical
2nd   100s of years ago     theoretical
3rd   last few decades      computational
4th   today                 data exploration

Jim Gray

computational biology

bioinformatics

ever bigger datasets

ever more complicated mining algorithms

case in point: genome sequencing

why do we sequence?

variation discovery

transcriptionally active sites

gene expression

copy number variation

miRNA expression & discovery

protein-DNA interactions

alternative splicing

coverage

reads

polymorphisms

gene model

single nucleotide polymorphisms

Robberecht et al, 2010

Molecular Biology of the Cell, 4th Edition

structural variation

Human Genome Project

automate, automate, automate

HGP: 15 years, $3 billion, tens of labs => 1 genome

now: 1 week, $5000, 1 technician => 1 genome

genome sequencing throughput

Mardis, 2010

genome sequencing throughput

“next-generation” sequencing platforms

Mardis, 2010

NHGRI

Metzker et al, 2010

big throughput => big data

advanced data structures

advanced data mining

support vector machine recursive feature elimination

manifold learning

adaptive cascade sharing trees

“Dammit Jim, I’m a doctor, not a bioinformatician!” (Christophe Lambert)

We’re alienating the user...

• too much data
• blind trust (?) in bioinformatician

what’s the question?

what parameters should I use?

can I trust this output?

I can’t wrap my head around this...

but...

what’s the question?

4th paradigm

question -> hypothesis -> generate data

generate data -> see what we can do with it

Gene interaction data: “A regulates B”

what parameters should I use?

peak

but is this?

van de Wiel et al, 2010

T. Voet

putative mutations

filter 1

filter 2

A B C

different settings for filters

filter 3

data filtering

can I trust this output?

[Figure: overlapping result sets A, B and C]

State of the art: run many filter pipelines and take intersection

What we should have found...
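The "run many filter pipelines and take intersection" idea can be sketched in a few lines. This is only an illustration: the three call sets and their (chromosome, position) entries are made up, and the function names `consensus_calls` and `support_counts` are hypothetical, not taken from the talk or from any particular tool.

```python
from collections import Counter

# Hypothetical putative mutations, as (chromosome, position), reported by
# three filter pipelines that differ only in their settings.
pipeline_a = {("chr1", 1050), ("chr1", 2300), ("chr2", 410), ("chr7", 9912)}
pipeline_b = {("chr1", 1050), ("chr2", 410), ("chr3", 77), ("chr7", 9912)}
pipeline_c = {("chr1", 1050), ("chr2", 410), ("chr5", 6001)}


def consensus_calls(*call_sets):
    """Return only the calls reported by every pipeline (set intersection)."""
    return set.intersection(*call_sets)


def support_counts(*call_sets):
    """Count how many pipelines report each call."""
    return Counter(call for calls in call_sets for call in calls)


consensus = consensus_calls(pipeline_a, pipeline_b, pipeline_c)
support = support_counts(pipeline_a, pipeline_b, pipeline_c)

# Calls that the strict intersection discards, with their level of support.
discarded = {call: n for call, n in support.items() if call not in consensus}
```

Looking at `discarded` next to `consensus` shows what the intersection throws away: a call supported by two of the three pipelines is dropped just like one supported by a single pipeline, which is where a visual overview of the overlaps can help the user judge the output.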

different algorithms for finding the same thing

I can’t wrap my head around this...too much (?) info

treatment plan for cancer patients

• heterogeneous datasets
• multiple abstraction levels
• multiple sources
• multiple formats

• population/family data
• patient/clinical data
• MR/CT/X-ray
• tissue samples
• pathways
• gene expression data

collaborative data examination: pathologist, geneticist, biologist

researcher is lost...

data visualization

“... the use of computer-supported, interactive, visual representations of data to amplify cognition” (S Card, J Mackinlay & B Shneiderman)

“... computer-based visualization systems providing visual representations of datasets intended to help people carry out some task more effectively.” (T Munzner)

cognitive task => perceptive task

       I              II             III             IV
   x      y       x      y       x      y       x      y
 10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
  8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
 13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
  9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
 11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
 14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
  6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
  4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
 12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
  7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
  5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

n = 11
mean x = 9.0
mean y = 7.5
variance x = 11.0
variance y = 4.12
correlation x & y = 0.816
regression line: y = 3 + 0.5x
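The four datasets above are Anscombe's quartet, and the summary statistics can be checked with a short script. The sketch below uses only the standard library and the values as published by Anscombe (1973); the names `QUARTET`, `summarize` and `SUMMARIES` are illustrative choices, and sample variance (dividing by n - 1) is assumed because it reproduces the figures on the slide.

```python
from math import sqrt

# Datasets I-III share the same x values.
X_SHARED = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

QUARTET = {
    "I": (X_SHARED,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_SHARED,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_SHARED,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics listed on the slide for one dataset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx  # least-squares regression of y on x
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four datasets give practically the same numbers, yet a scatter plot of each looks completely different. The statistics alone hide that; the picture shows it immediately.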

exploration explanation

pictorial superiority effect

“information”

“informa” “i”65% 1%

72hr

exploration explanation

J van Wijk

some of the principles

know your visual encodings

power of the plane

danger of depth

eyes beat memory

overview, zoom and filter, details on demand

...

(taken from T Munzner)

visual encoding channels

position on common scale

position on unaligned scale

2D size

3D size

Mackinlay

“power of the plane”

position on common scale

position on unaligned scale

2D size

3D size

examples of sub-optimal encoding

Florence Nightingale

Don’t believe everything you see

networks... <sigh>

Martin Krzywinski

same network

Martin Krzywinski

different networks!

3D, anyone?

occlusion

interaction complexity

perspective distortion

text legibility

Gene interaction data: “A regulates B”

regulator

manager

workhorse

“lie factor” = (size of effect shown in graphic) / (size of effect in data)
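As a worked example of the lie factor, the sketch below assumes that "size of effect" means the relative change between two values, which is how Tufte defines it. The numbers and the function names `effect_size` and `lie_factor` are invented for illustration.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)


# Hypothetical case: the data rise from 10 to 15 (a 50% increase), but the
# bars are drawn 2 cm and 6 cm tall (a 200% increase).
example = lie_factor(10, 15, 2, 6)
```

Here `example` comes out at 4: the graphic overstates the change fourfold. A value close to 1 means the graphic represents the data faithfully.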

Humanizing bioinformatics

there and back again

put the user back in the loop!

Thank you

Acknowledgments

• graphics creators

• Tamara Munzner

• Martin Krzywinski

Image attributions

... got lost ...

If you find something that’s yours, let me know!
