statistics on big biomedical data - methods and pitfalls when analyzing high-throughput screens

Post on 10-May-2015

Statistics on big biomedical data

Methods and pitfalls when analyzing high-throughput screens

Lars Juhl Jensen

t-test

ANOVA

normal distribution

useful tests

counts

contingency table

Jensen et al., Nature Reviews Genetics, 2012

Fisher’s exact test
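A minimal sketch of a one-sided Fisher's exact test on a 2x2 contingency table of counts, written with the standard library only. The function name and the row/column interpretation in the comments are illustrative assumptions, not taken from the slides; in practice a statistics library would normally be used.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    seeing at least `a` counts in the top-left cell if rows and columns
    are independent and all margins are fixed.
    """
    row1 = a + b        # e.g. genes in the hit list (assumed meaning)
    col1 = a + c        # e.g. genes carrying the annotation (assumed meaning)
    n = a + b + c + d   # all genes considered
    total = comb(n, row1)
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of all tables at least as extreme.
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1)) / total

# Hypothetical counts: 3 of 4 hits are annotated, 1 of 4 non-hits is annotated.
p_value = fisher_exact_greater(3, 1, 1, 3)
```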

real numbers

no theoretical distribution

non-parametric statistics

do the medians differ?

Mann–Whitney U test

medians can mislead you
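A rough sketch of the Mann–Whitney U statistic, assuming a plain normal approximation for the p-value (no tie or continuity correction, so it is only indicative for small samples). The example data are made up: both samples have median 5, yet U is well below its null expectation. That is one possible reading of the point that medians can mislead, and it is an interpretation rather than something the slides state.

```python
from math import sqrt
from statistics import NormalDist, median

def mann_whitney_u(x, y):
    """Return (U, p): U counts pairs with x_i > y_j, ties counting 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2.0                            # expected U under the null
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # ignores tie correction
    z = (u - mu) / sigma
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))    # two-sided, approximate
    return u, p

# Hypothetical samples with identical medians but different upper tails.
x = [1, 2, 5, 6, 7]
y = [3, 4, 5, 20, 30]
same_median = median(x) == median(y)
u_stat, p_approx = mann_whitney_u(x, y)
```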

do the distributions differ?

Kolmogorov–Smirnov test

does not tell how they differ
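A small sketch of the two-sample Kolmogorov–Smirnov statistic D, the largest vertical gap between the two empirical cumulative distribution functions. Only the statistic is computed here; a p-value could be obtained by resampling. The example values are invented. D is a single unsigned number, which is why the test alone does not say whether the samples differ in location, spread, or shape.

```python
from bisect import bisect_right

def ks_statistic(x, y):
    """Maximum absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # Both ECDFs are step functions, so the maximum gap occurs at a data point.
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)  # fraction of x values <= v
        fy = bisect_right(ys, v) / len(ys)  # fraction of y values <= v
        d = max(d, abs(fx - fy))
    return d

# Hypothetical samples: same centre, different spread.
d_spread = ks_statistic([4, 5, 6], [0, 5, 10])
```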

resampling

Monte Carlo testing

always applicable

compute intensive
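A minimal sketch of Monte Carlo testing by label permutation, assuming the difference in means as the default test statistic (any other statistic can be passed in). The function name, the default of 10,000 resamples, and the fixed seed are illustrative choices. The loop over resamples is what makes the approach compute intensive.

```python
import random
from statistics import mean

def permutation_test(x, y, statistic=None, n_resamples=10_000, seed=0):
    """Estimate a p-value by shuffling group labels n_resamples times."""
    if statistic is None:
        statistic = lambda a, b: abs(mean(a) - mean(b))
    rng = random.Random(seed)
    observed = statistic(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # Split the shuffled pool into groups of the original sizes.
        if statistic(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

The smallest p-value this can report is 1 / (n_resamples + 1), so strict multiple-testing thresholds require correspondingly more resamples.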

multiple testing

xkcd.com

compare multiple conditions

Gene Ontology enrichment

Bonferroni

avoid making any errors

too conservative
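A short sketch of the Bonferroni correction: each p-value is multiplied by the number of tests and capped at 1. The example p-values are invented.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical p-values from three tests.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

With thousands of tests, as in a Gene Ontology enrichment analysis, the multiplier m becomes very large, which is why the correction is often too conservative.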

Benjamini–Hochberg

control false discovery rate

assumes independence
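A minimal sketch of the Benjamini–Hochberg step-up procedure, returning adjusted p-values (often called q-values) in the original order. The input values are made up. As the slide notes, the guarantee relies on an independence assumption between tests.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by rising p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from four tests.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

Calling a test a discovery when its adjusted value is at most 0.05 aims for an expected false discovery rate of at most 5% among the reported hits.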

resampling

negative set

systematic biases

Huang et al., Journal of Proteome Research, 2014

studiedness bias

we study disease proteins

thus we know many PTMs

abundance bias

more highly expressed

easier to detect in assays

better characterized

matched background
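One way a matched background could be built is sketched below: for every foreground protein, pick the unused candidate whose covariate (for example abundance or number of publications) is closest. This greedy nearest-neighbour matching, the function name, and the protein identifiers are all assumptions for illustration; the slides do not specify how the matching was done.

```python
def matched_background(foreground, candidates, covariate):
    """Greedily pick one background item per foreground item.

    `covariate` maps every item to a number (e.g. abundance); each pick is
    the unused candidate with the closest covariate value.
    """
    available = set(candidates) - set(foreground)
    matched = []
    for item in foreground:
        if not available:
            break  # ran out of candidates; background will be shorter
        best = min(
            available,
            # Ties are broken by name so the result is deterministic.
            key=lambda c: (abs(covariate[c] - covariate[item]), c),
        )
        matched.append(best)
        available.remove(best)
    return matched

# Hypothetical abundances for two foreground and four candidate proteins.
abundance = {"A": 10.0, "B": 1.0, "c1": 9.5, "c2": 1.2, "c3": 5.0, "c4": 100.0}
background = matched_background(["A", "B"], ["c1", "c2", "c3", "c4"], abundance)
```

Comparing the foreground against such a background, rather than against all proteins, keeps the covariate from driving the apparent enrichment.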

the big data effect

if you have enough data

any difference is significant

but maybe not relevant
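A small worked example of the big data effect, assuming a two-sample z-test with known, equal standard deviation and n observations per group. The effect size of 0.01 standard deviations is an invented value chosen to be negligible.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_p(delta, sd, n):
    """Two-sided p-value for a mean difference `delta` with n per group."""
    z = delta / (sd * sqrt(2.0 / n))  # standard error shrinks as n grows
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# The same tiny difference at two very different sample sizes.
p_small_n = two_sample_z_p(delta=0.01, sd=1.0, n=100)
p_large_n = two_sample_z_p(delta=0.01, sd=1.0, n=1_000_000)
```

The difference itself is unchanged between the two calls; only n differs, yet the second p-value is far below any conventional threshold while the first is nowhere near it.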

“significant”

statistical significance

p-value

biological relevance

fold change

relative risk
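A brief sketch of relative risk as an effect-size measure: the fraction of cases in one group divided by the fraction of cases in a reference group. The argument names and the counts are hypothetical.

```python
def relative_risk(exposed_cases, exposed_total, control_cases, control_total):
    """Risk in the exposed group divided by risk in the control group."""
    risk_exposed = exposed_cases / exposed_total
    risk_control = control_cases / control_total
    return risk_exposed / risk_control

# Hypothetical counts: 30 of 100 exposed vs 10 of 100 controls are cases.
rr = relative_risk(30, 100, 10, 100)
```

Unlike a p-value, this number does not shrink or grow with sample size alone, so it speaks to relevance rather than significance.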

significant and relevant

volcano plots

Lundby et al., Science Signaling, 2013

rather ad hoc
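A minimal sketch of the data behind a volcano plot: each result is placed at (log2 fold change, -log10 p-value) and flagged when it passes both a significance and an effect-size cutoff. The cutoffs of p <= 0.05 and a two-fold change are assumed defaults, not values from the slides, and illustrate how ad hoc such thresholds are. Gene names and numbers are invented.

```python
from math import log2, log10

def volcano_points(results, max_p=0.05, min_abs_log2_fc=1.0):
    """Return (name, log2 fold change, -log10 p, passes_both) per result."""
    points = []
    for name, fold_change, p in results:
        x = log2(fold_change)   # effect size: biological relevance
        y = -log10(p)           # evidence: statistical significance
        keep = p <= max_p and abs(x) >= min_abs_log2_fc
        points.append((name, x, y, keep))
    return points

# Hypothetical results: (name, fold change, p-value).
points = volcano_points([
    ("g1", 4.0, 0.001),   # large change, significant
    ("g2", 1.1, 1e-10),   # tiny change, highly significant
    ("g3", 8.0, 0.5),     # large change, not significant
])
selected = [name for name, _, _, keep in points if keep]
```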

questions?
