Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens


Post on 10-May-2015


TRANSCRIPT

Page 1: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Statistics on big biomedical data

Methods and pitfalls when analyzing high-throughput screens

Lars Juhl Jensen

Page 3: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

t-test

Page 4: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

ANOVA

Page 5: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

normal distribution

Page 6: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

useful tests

Page 7: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

counts

Page 8: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

contingency table

Page 9: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Jensen et al., Nature Reviews Genetics, 2012

Page 10: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Fisher’s exact test
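
For count data in a 2x2 contingency table, the test can be sketched in a few lines. This is a minimal, one-sided (enrichment) version written for illustration only; the function name and the table layout `[[a, b], [c, d]]` are assumptions, not something taken from the slides. In practice a library routine such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null with all margins fixed,
    i.e. the probability of seeing at least this much overlap by chance.
    """
    row1 = a + b          # size of the first row (e.g. genes in the hit list)
    col1 = a + c          # size of the first column (e.g. genes with the annotation)
    n = a + b + c + d     # total number of items
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


# Hypothetical counts: 3 of 4 hits carry the annotation, 1 of 4 non-hits does.
print(fisher_exact_greater(3, 1, 1, 3))
```

The p-value is exact, which is why the test remains valid for small counts where a chi-squared approximation would not be.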

Page 11: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

real numbers

Page 12: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

no theoretical distribution

Page 13: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

non-parametric statistics

Page 14: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

do the medians differ?

Page 15: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Mann–Whitney U test
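
As a rough sketch of what the test computes: U counts, over all pairs, how often a value from one sample exceeds a value from the other. The version below uses the normal approximation without a tie correction, so it is only reasonable for moderately large samples; the function name is an assumption made for this example, and `scipy.stats.mannwhitneyu` is the usual choice in real analyses.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation, no tie correction."""
    n, m = len(x), len(y)
    # U = number of (xi, yj) pairs with xi > yj; ties count as half a win.
    u = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)
    mean_u = n * m / 2.0
    sd_u = sqrt(n * m * (n + m + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p = erfc(abs(z) / sqrt(2.0))
    return u, p


u, p = mann_whitney_u([1.2, 3.4, 2.2, 5.1, 4.0], [6.3, 7.7, 5.9, 8.1, 9.0])
print(u, p)
```

Only the ordering of the values enters the calculation, which is what makes the test independent of any theoretical distribution.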

Page 16: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

medians can mislead you

Page 17: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

do the distributions differ?

Page 18: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Kolmogorov–Smirnov test

Page 19: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens
Page 20: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

does not tell how they differ
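
The statistic behind the test is the largest vertical gap D between the two empirical cumulative distribution functions. The sketch below computes D and an asymptotic p-value using a common small-sample correction to the Kolmogorov series; that approximation, the cutoff below which the p-value is simply reported as 1, and the function name are all assumptions of this example. For small samples the asymptotic p-value is crude, and `scipy.stats.ks_2samp` would be the more careful choice.

```python
from bisect import bisect_right
from math import exp, sqrt


def ks_two_sample(x, y):
    """Return (D, approximate two-sided p) for the two-sample KS test."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    # D = largest absolute difference between the two empirical CDFs,
    # which can only change at observed data points.
    d = max(abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m) for v in xs + ys)
    en = sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The series converges poorly here and the true value is close to 1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


# Same median (0), very different spread: the medians agree, D does not.
narrow = [-1.0, -0.5, 0.0, 0.5, 1.0]
wide = [-10.0, -5.0, 0.0, 5.0, 10.0]
print(ks_two_sample(narrow, wide))
```

D reports how far apart the distributions are at their most different point, but not whether the difference lies in location, spread, or shape.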

Page 21: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

resampling

Page 22: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Monte Carlo testing

Page 23: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens
Page 24: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

always applicable

Page 25: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

compute intensive
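
One common form of Monte Carlo testing is a permutation test: shuffle the group labels many times and ask how often the shuffled data look at least as extreme as the real data. The sketch below is one possible implementation under assumed names and defaults (`permutation_test`, `n_perm`, `seed`); the test statistic is passed in as a function, which is where the "always applicable" property comes from.

```python
import random
from statistics import mean


def mean_difference(a, b):
    return mean(a) - mean(b)


def permutation_test(x, y, statistic=mean_difference, n_perm=10000, seed=0):
    """Two-sided permutation p-value for an arbitrary two-sample statistic."""
    rng = random.Random(seed)
    observed = abs(statistic(x, y))
    pooled = list(x) + list(y)
    n = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        if abs(statistic(pooled[:n], pooled[n:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


print(permutation_test([5.1, 4.8, 6.0, 5.5], [3.9, 4.1, 3.5, 4.4], n_perm=2000))
```

The cost is visible in the loop: the smallest p-value that can be resolved is about 1 / `n_perm`, so very small p-values require very many permutations.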

Page 26: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

multiple testing
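
The arithmetic behind the problem is short enough to write out. Assuming independent tests in which every null hypothesis is true (an idealization chosen for this example), the chance of at least one false positive among m tests at level alpha is 1 - (1 - alpha)^m:

```python
def familywise_error_rate(alpha, n_tests):
    """P(at least one false positive) among n independent tests of true nulls."""
    return 1.0 - (1.0 - alpha) ** n_tests


for m in (1, 20, 1000):
    print(m, familywise_error_rate(0.05, m))
```

With 20 tests at alpha = 0.05 the probability is already about 0.64, and with the thousands of tests typical of a high-throughput screen it is practically 1.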

Page 27: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

xkcd.com

Page 28: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

xkcd.com

Page 29: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

xkcd.com

Page 30: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

xkcd.com

Page 31: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

compare multiple conditions

Page 32: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Gene Ontology enrichment

Page 33: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Bonferroni

Page 34: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

avoid making any errors

Page 35: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

too conservative
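
The correction itself is a one-liner: multiply each p-value by the number of tests and cap the result at 1. The function name below is an assumption for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


print(bonferroni([0.01, 0.5, 0.001]))
```

Because every p-value is multiplied by m, a screen with 20,000 tests needs a raw p-value below 2.5e-6 to pass at 0.05, which is why many real effects are lost.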

Page 36: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Benjamini–Hochberg

Page 37: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

control false discovery rate

Page 38: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

assumes independence
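
A minimal sketch of the procedure, written as adjusted p-values (often called q-values): sort the p-values, scale the one at rank i by m / i, and enforce monotonicity from the largest downwards. The function name is an assumption of this example. The guarantee on the false discovery rate holds for independent tests; it is also known to hold under certain forms of positive dependence.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Calling everything with an adjusted value below 0.05 significant then aims for at most about 5% false discoveries among the reported hits, rather than no errors at all.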

Page 39: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

resampling

Page 40: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

negative set

Page 41: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

systematic biases

Page 42: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Huang et al., Journal of Proteome Research, 2014

Page 43: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

studiedness bias

Page 44: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

we study disease proteins

Page 45: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

thus we know many PTMs

Page 46: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

abundance bias

Page 47: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

more highly expressed

Page 48: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

easier to detect in assays

Page 49: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

better characterized

Page 50: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

matched background
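
One way to build a matched background, sketched under assumptions of this example: bin all proteins by a confounding covariate (abundance here, but it could equally be a measure of how well studied a protein is), then draw for each foreground protein one background protein from the same bin. The function name, the rank-based binning, and the identifiers are hypothetical.

```python
import random
from collections import defaultdict


def matched_background(foreground, candidates, covariate, n_bins=10, seed=0):
    """Pick one background item per foreground item from the same covariate bin.

    covariate maps every identifier to a number (e.g. protein abundance).
    Foreground items with no remaining candidate in their bin are skipped.
    """
    rng = random.Random(seed)
    fg = set(foreground)
    everyone = sorted(fg | set(candidates), key=lambda k: (covariate[k], k))
    # Rank-based bins, so each bin holds roughly the same number of items.
    bin_of = {k: rank * n_bins // len(everyone) for rank, k in enumerate(everyone)}
    pool = defaultdict(list)
    for k in sorted(set(candidates) - fg):
        pool[bin_of[k]].append(k)
    for members in pool.values():
        rng.shuffle(members)
    matched = []
    for k in foreground:
        members = pool[bin_of[k]]
        if members:
            matched.append(members.pop())  # sample without replacement
    return matched


abundance = {f"p{i}": float(i) for i in range(100)}
hits = ["p90", "p91", "p92", "p93", "p94"]
print(matched_background(hits, list(abundance), abundance))
```

Comparing the hits against this matched set, instead of against all proteins, keeps a difference in abundance from showing up as an apparent enrichment.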

Page 51: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

the big data effect

Page 52: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

if you have enough data

Page 53: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

any difference is significant

Page 54: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

but maybe not relevant
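
The effect is easy to reproduce with an idealized two-sample z-test, used here only because its p-value has a closed form. The assumptions are equal group sizes, known unit variance, and a fixed, tiny true difference of 0.01 standard deviations; the function name is made up for this example.

```python
from math import erfc, sqrt


def z_test_pvalue(effect_size, n_per_group):
    """Two-sided p-value when the observed mean difference equals effect_size
    (in standard deviation units) and each group has n_per_group samples."""
    z = effect_size * sqrt(n_per_group / 2.0)
    return erfc(abs(z) / sqrt(2.0))


for n in (100, 10_000, 1_000_000):
    print(n, z_test_pvalue(0.01, n))
```

The difference between the groups never changes; only the sample size does. At a million samples per group the p-value is vanishingly small, even though a shift of 0.01 standard deviations is unlikely to matter biologically.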

Page 55: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

“significant”

Page 56: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

statistical significance

Page 57: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

p-value

Page 58: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

biological relevance

Page 59: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

fold change

Page 60: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

relative risk

Page 61: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

significant and relevant

Page 62: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

volcano plots

Page 63: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

Lundby et al., Science Signaling, 2013

Page 64: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

rather ad hoc
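
A volcano plot places each measurement at (log2 fold change, -log10 p-value) and highlights the points that pass both a relevance cutoff and a significance cutoff. The sketch below computes only the coordinates and the selection, leaving the plotting aside; the cutoffs of a 2-fold change and p <= 0.05, the input layout, and the names are arbitrary choices made for this example.

```python
from math import log2, log10


def volcano_points(results, min_abs_log2fc=1.0, max_p=0.05):
    """results: iterable of (name, mean_control, mean_treated, p_value).

    Returns (name, log2 fold change, -log10 p, selected) for each entry,
    where selected means both cutoffs are passed.
    """
    points = []
    for name, mean_control, mean_treated, p in results:
        x = log2(mean_treated / mean_control)   # effect size: relevance
        y = -log10(p)                           # evidence: significance
        selected = abs(x) >= min_abs_log2fc and p <= max_p
        points.append((name, x, y, selected))
    return points


# Hypothetical measurements.
data = [
    ("A", 1.0, 4.0, 0.001),   # large change, significant
    ("B", 1.0, 1.05, 1e-10),  # significant but tiny change
    ("C", 1.0, 8.0, 0.5),     # large change, not significant
]
for point in volcano_points(data):
    print(point)
```

Only A is selected. In a real screen the p-values would normally be corrected for multiple testing first, and moving either cutoff changes the hit list, which is the ad hoc part.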

Page 65: Statistics on big biomedical data - Methods and pitfalls when analyzing high-throughput screens

questions?