Statistics on big biomedical data
Methods and pitfalls when analyzing high-throughput screens
Lars Juhl Jensen
t-test
ANOVA
normal distribution
useful tests
counts
contingency table
Jensen et al., Nature Reviews Genetics, 2012
Fisher’s exact test
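For count data in a 2x2 contingency table, the test can be sketched directly from the hypergeometric distribution. This is a minimal illustration using only the standard library; the function name `fisher_exact_two_sided` and the cell labels `a, b, c, d` are assumptions made for this example, not taken from the slides.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are at least as unlikely as the observed one.
    # The small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical table: 3 of 4 in the first group and 1 of 4 in the second.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

Because the p-value is an exact sum over all possible tables, no large-sample approximation is involved, which is why the test suits small counts.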
real numbers
no theoretical distribution
non-parametric statistics
do the medians differ?
Mann–Whitney U test
medians can mislead you
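A rough sketch of the rank-based test, assuming the normal approximation without tie or continuity correction (so only suitable for illustration and moderately large samples). The function name `mann_whitney_u` is an assumption for this example.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value from the normal approximation)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Assign midranks so that tied values share the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    # Approximation only: no tie correction and no continuity correction.
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, erfc(abs(z) / sqrt(2))
```

U counts the pairs in which a value from `x` exceeds a value from `y` (ties count one half). The test therefore responds to a general shift in ranks, which is not always the same thing as a difference in medians.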
do the distributions differ?
Kolmogorov–Smirnov test
does not tell how they differ
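The test statistic is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes it and an asymptotic p-value, which is rough for small samples; the function name `ks_two_sample` and the cutoff of 0.3 are assumptions made for this illustration.

```python
from math import exp, sqrt

def ks_two_sample(x, y):
    """Return (D, asymptotic two-sided p-value) for the two-sample KS test."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Step through the pooled values, tracking the largest gap between ECDFs.
    for v in sorted(set(xs + ys)):
        while i < n1 and xs[i] <= v:
            i += 1
        while j < n2 and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    lam = d * sqrt(n1 * n2 / (n1 + n2))
    if lam < 0.3:
        # The series below converges poorly here and the p-value is close to 1.
        return d, 1.0
    p = 2 * sum((-1) ** (k - 1) * exp(-2 * (k * lam) ** 2) for k in range(1, 101))
    return d, min(1.0, max(0.0, p))
```

D is a single number, so a small p-value says that the distributions differ somewhere, but not whether the difference lies in location, spread, or shape.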
resampling
Monte Carlo testing
always applicable
compute intensive
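A minimal sketch of a Monte Carlo permutation test, assuming the difference in group means as the test statistic (any other statistic could be swapped in). The function name `permutation_test` and its parameters are illustrative assumptions.

```python
import random

def permutation_test(x, y, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in means."""
    rng = random.Random(seed)

    def stat(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the pooled values breaks any link between value and group.
        rng.shuffle(pooled)
        if stat(pooled[:len(x)], pooled[len(x):]) >= observed:
            hits += 1
    # The add-one correction avoids reporting an impossible p-value of zero.
    return (hits + 1) / (n_perm + 1)
```

The cost is visible in the loop: the statistic is recomputed `n_perm` times, and the smallest p-value that can be reported is 1 / (`n_perm` + 1), so small p-values require many resamplings.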
multiple testing
xkcd.com
compare multiple conditions
Gene Ontology enrichment
Bonferroni
avoid making any false positives
too conservative
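As a short sketch, the correction multiplies each p-value by the number of tests and caps the result at 1. The function name `bonferroni` is an assumption for this example.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Hypothetical p-values from three tests.
print(bonferroni([0.01, 0.04, 0.5]))
```

With thousands of tests, as in a high-throughput screen, the multiplier becomes so large that only extremely small raw p-values survive.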
Benjamini–Hochberg
control false discovery rate
assumes independence (or positive dependence) between tests
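A minimal sketch of the step-up procedure, returning adjusted p-values (often called q-values). The function name `benjamini_hochberg` is an assumption for this example.

```python
def benjamini_hochberg(p_values):
    """Benjamini–Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from three tests.
print(benjamini_hochberg([0.001, 0.5, 0.02]))
```

Calling everything with an adjusted value below 0.05 a hit aims for an expected false discovery rate of at most 5% among those hits, rather than ruling out any single false positive.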
resampling
negative set
systematic biases
Huang et al., Journal of Proteome Research, 2014
studiedness bias
we study disease proteins
thus we know many PTMs
abundance bias
more highly expressed proteins are
easier to detect in assays
better characterized
matched background
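One way to build such a background, sketched under the assumption that abundance is the confounder to match on: bin all proteins by abundance rank, then draw one background protein from the same bin for each foreground protein. The function name, the binning scheme, and all parameters are hypothetical choices for illustration, not the method used in the cited study.

```python
import random
from collections import defaultdict

def matched_background(foreground, candidates, abundance, n_bins=5, seed=0):
    """Draw one abundance-matched background protein per foreground protein."""
    rng = random.Random(seed)
    fg = set(foreground)
    # Bin proteins by abundance rank (hypothetical scheme: equal-sized bins).
    ranked = sorted(abundance, key=abundance.get)
    bin_of = {p: r * n_bins // len(ranked) for r, p in enumerate(ranked)}
    pools = defaultdict(list)
    for p in candidates:
        if p not in fg and p in bin_of:
            pools[bin_of[p]].append(p)
    background = []
    for p in foreground:
        pool = pools.get(bin_of.get(p), [])
        if pool:
            # Sample without replacement; skip if no unused match remains.
            background.append(pool.pop(rng.randrange(len(pool))))
    return background
```

Comparing the foreground against this set rather than against all proteins means that a difference can no longer be explained by abundance alone. The same idea applies to studiedness if a proxy for it is available.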
the big data effect
if you have enough data
any difference is significant
but maybe not relevant
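The effect can be illustrated with a small calculation, assuming a two-sample z-test with known unit variance and an observed difference in means of exactly `effect_size` standard deviations. The function name `z_test_p` and the chosen numbers are assumptions for this sketch.

```python
from math import erfc, sqrt

def z_test_p(effect_size, n_per_group):
    """Two-sided p-value for a mean difference of `effect_size` standard deviations."""
    z = effect_size * sqrt(n_per_group / 2)
    return erfc(z / sqrt(2))

# The same tiny effect (1% of a standard deviation) at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, z_test_p(0.01, n))
```

The effect size never changes, yet the p-value shrinks towards zero as the sample size grows, so the p-value alone says nothing about whether the difference matters.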
“significant”
statistical significance
p-value
biological relevance
fold change
relative risk
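As a small sketch of an effect-size measure that is reported alongside the p-value, relative risk is the ratio of the event rates in two groups. The function name and argument names are assumptions for this example.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of the event rate in the exposed group to that in the unexposed group."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# Hypothetical counts: 30 of 100 exposed and 10 of 100 unexposed have the event.
print(relative_risk(30, 100, 10, 100))
```

Unlike a p-value, this number does not shrink or grow with sample size, only its uncertainty does.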
significant and relevant
volcano plots
Lundby et al., Science Signaling, 2013
rather ad hoc
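The coordinates of a volcano plot can be sketched as follows: log2 fold change on the x-axis, -log10 p-value on the y-axis, and a hit defined by passing a cutoff on both. The function name, the input layout, and the default cutoffs are assumptions for this illustration; the cutoffs in particular are arbitrary choices.

```python
from math import log2, log10

def volcano_points(results, min_abs_log2_fc=1.0, max_p=0.05):
    """Turn (name, mean_a, mean_b, p_value) rows into volcano-plot coordinates."""
    points = []
    for name, mean_a, mean_b, p in results:
        x = log2(mean_b / mean_a)   # effect size: log2 fold change
        y = -log10(p)               # significance: -log10 p-value
        # A hit must be both large enough and significant enough.
        hit = abs(x) >= min_abs_log2_fc and p <= max_p
        points.append((name, x, y, hit))
    return points

# Hypothetical rows: large and significant, tiny but significant, large but not significant.
rows = [("A", 1.0, 4.0, 0.001), ("B", 1.0, 1.1, 1e-10), ("C", 1.0, 8.0, 0.5)]
for point in volcano_points(rows):
    print(point)
```

Only the first row passes both cutoffs. Moving either cutoff changes the hit list, which is what makes the approach ad hoc.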
questions?