saving the babies from the bathwater in genome-wide ...zzlab.net/workshopisu/gwas_iowa.pdf · mlm...

44
Saving the Babies from the Bathwater in Genome-Wide Association Studies Zhiwu Zhang Washington State University

Upload: others

Post on 18-Jul-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Saving the Babies from the Bathwater in Genome-Wide Association Studies

Zhiwu ZhangWashington State University

Page 2: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

http://luckyrobot.com/wp-content/uploads/2013/04/nih-cost-genome.jpg

Human genome

2nd

Generation Sequencing

Biomedical Innovations out-paced Computer Innovations

Page 3: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

More Research on GWAS and GS

By May 31, 2013

Page 4: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

∗ As fast as one season∗ 50~300 kb resolution

Page 5: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

∗ Computing difficulties: millions of markers, individuals, and traits

∗ False positives, ex: “Amgen scientists tried to replicate 53 high-profile cancer research findings, but could only replicate 6”, Nature, 2012, 483: 531

∗ False negatives

Problems in GWAS

Page 6: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

GWAS Stream

GCTA

PCPC+K

SELECT

EMMA

FST-LMM

EMMAx

GEMMA

GenAbel

P3D

MLMM

Q

Q+K

CMLM

FarmCPUBLINK

ECMLM

Page 7: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

0.1

NC33

B115

I 137TN

81- 1

M EF 156- 55- 2

I L677A

I a5125

I A2132P39I L101

F2EP1

F7

CO 255

NC366

B52

SC213R

NC238

G T112

M p339

G A209

M37WD940Y

T232

U267Y

CI 28A

B2

F2834T

M o24WM S1334

I DS28

I - 29

SA24

4722

SG 18Sg1533

HP301I DS91I DS69

F6

F44

Ab28A

CM L328

O h603

A441- 5

CM L323

SC55

NC264NC370

NC320

NC318

NC334

NC332

CM L92

CM L220

CM L218A272

Tzi9

CM L77

TZI 10

CM L311

CM L349CM L158Q

CM L154Q

CM L281

CM L91

par vi- 14

par vi- 36

par vi- 03

par vi- 49

par vi- 30

NC356

TX601

NC340

NC350

NC304

NC338NC302

NC354TZI 18

A6

Ki44

Ki43

Ki2007Ki21

Ki2021

Ki14

CM L238CM L321

CM L157Q

CM L322

CM L331CM L261

CML277

CM L341CM L45CM L11

CM L10

Q 6199

CM L314

CM L258

CM L5CM L254

CM L61

CM L264

CM L9

CM L108

CM L287Tzi11

CM L38CM L14

SC357L578

T234

N6E2558W

K64

CI 64

CI 44

CI 31ANC230

CI 66K55

R109BM oG

L317

NC232

VaW6

CH701- 30

Va85

CI 90C M 14

H99

Va99 PA91

Ky226

H95

O h40B

VA26Pa762

O H43

A619

Va22

Va17

Va14 Va59

Va102

Va35

T8

A654

Pa880

SD44

CH9

Pa875

H49

W64A

WF9

M o1W

O s420

R168

38- 11

NC260

M o44

CI 21E

Hy

A661

ND246

WD

A554

CO 125

CO 109

B75

DE- 3

DE- 2

DE1

B57

C123

C103

NC360

NC364

NC362

NC342

NC290A

NC262

NC344

NC258

CI 91B

CI 187- 2

A682

K4

NC222

NC236

CI 3A

K148

M t 42

M S153

W401

A556

B77

CM 37

CM V3

W117HT

CM 7

CO 106

B103

R4

Ky228

B164

M o46

M o47

M o45

Hi27

Yu796- NS

I 205

Tzi16Tzi25

B79

B105

N7A

SD40

N28HT

A641

DE811

H105W

A214N

H100

A635

A632

A634B14A

H91B68

B64

CM 174CM 105

B104

B84

NC250

H84

B76

B37

N192

A679

B109NC294

NC368NC326

NC314

NC328 R229

A680

NC310

NC324NC322NC330

NC372

B73Ht r hmB73NC308

NC312 NC306NC268

B46B10

A239

C49AC49

A188

W153R

A659

R177

W22

W182B

CI - 7

33- 16

4226

NC348

NC298

NC352

NC336

NC296

NC346NC296A

NC292

SWEET

POPCORNTROPICAL-SUBTROPICAL

STIFF STALKNON STIFF STALK

MIXED Based on 89 SSR loci

Flint-Garcia et al. (2005) Plant J. 44: 1054

B97

CML103CML69

CML52

CML228

CML247

CML332

IL14H

Ky21

Ki11

Ki3

MS71

Mo18W

OH7B

M162W

MO17

Oh43E

Tx303

TZI8CML333

NC358NC300

ssp. parviglumis

Page 8: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Populationstructure

Lines Q1 Q2 Q333-16 0.014 0.972 0.01438-11 0.003 0.993 0.0044226 0.071 0.917 0.0124722 0.035 0.854 0.111A188 0.013 0.982 0.005

A214N 0.762 0.017 0.221A239 0.035 0.963 0.002A272 0.019 0.122 0.859

A441-5 0.005 0.531 0.464A554 0.019 0.979 0.002A556 0.004 0.994 0.002

A6 0.003 0.03 0.967A619 0.009 0.99 0.001

Page 9: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

GLMforGWAS

Phenotype on individuals

Population structure

Y = SNP +(fixed effect)

General Linear Model (GLM)

(fixed effect)eQ (or PCs) +

Page 10: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Additive numerator relationship

A B

C

D

E

Individual Father MotherABC A BD A CE D B

A B C D E

A 1 0 0.5 0.75 0.375

B 0 1 0.5 0.25 0.625

C 0.5 0.5 1 0.75 0.625

D 0.75 0.25 0.75 1.25 0.75

E 0.375 0.725 0.625 0.75 1.125

Diagonals=1+F

Page 11: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

∗ Proportion of shared allele∗ Average across markers

Marker based kinship

Marker 1 2 3 4 5 Average

Individual1 AA AA AA BB AB

Individual2 AA AB BB BB AB

Similarity 1 0.5 0 1 0.5 0.6

Maximum similarity: 1

Page 12: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Kinship matrix (re-scaled)

Page 13: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

MLMforGWAS

Phenotype

Population structure

Unequal relatedness

Y = SNP + Q (or PCs) + Kinship + e(fixed effect) (random effect)

General Linear Model (GLM)

Mixed Linear Model (MLM)

(fixed effect)

(Yu et al. 2005, Nature Genetics)

Page 14: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Atwell et al Nature 2010

a, No correction test

b, Correction with MLM

GWAS does work well for traits associated with structure

Magnus Norborg

Page 15: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Queen + King

Page 16: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

MLM was more enriched on Flowering time genes

Page 17: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Groupbykinship

7100

7150

7200

7250

7300

7350

7400

74501 2 4 8 16 32 64 128

256

512

1024

Page 18: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

CompressedMLMimprovespower

Average number of individuals per group

Zhang et al, Nature Genetics, 2010

Page 19: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

SA, GC, PCA and QTDT

Henderson’s MLM

GLM (1 group)

Full MLM(n groups)

Pedigreebased kinship

Markerbased kinship

Com

pres

sed

MLM

(s

grou

ps)

Sire model

n ≥

s ≥

1

Unified MLM

Compressed MLM

Compressed MLM is more generalZhang et al, Nature Genetics, 2010

Page 20: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Enriched Compressed MLM

Kinship: Among individuals -> among groups

1 .25 .125 .125

.25 1 .5 .5

.125 .5 1 .75

.125 .5 .75 1

1 .167

.167 .72

1 .25

.25 1

Page 21: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Statistical power improvement

Method shift Human Dog Maize Arabidopsis

GLM to MLM 3.6% 13.8% 10.1% 29.6%

MLM to compression 4.0% 14.2% 7.6% 2.5%

Compression to group kinship

6.4% 13.3% 2.9% 2.6%

Meng Li

Li et al, BMC Biology, 2014

Page 22: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Kinship defined by single marker

S1 S2 S3 S4 R1 R2 R3 R4

S1 1 1 1 1 0 0 0 0

S2 1 1 1 1 0 0 0 0

S3 1 1 1 1 0 0 0 0

S4 1 1 1 1 0 0 0 0

R1 0 0 0 0 1 1 1 1

R2 0 0 0 0 1 1 1 1

R3 0 0 0 0 1 1 1 1

R4 0 0 0 0 1 1 1 1

Sensitive Resistance

Adding additional markers bluer the picture

Page 23: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Derivation of kinship

All SNPs

QTNs

Non-QTNs SNP

Kinship

Page 24: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Statistical power of kinship from

Page 25: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Sing

le t

rait

All

trai

ts

Remove QTN one at a time

Kinship evolution

Page 26: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Statistical power of kinship from

Page 27: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Bin approach

Page 28: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Mimic QTN-1 ∗ 1. Choose t associated SNPs as QTNs each represent an

interval of size s. ∗ 2. Build kinship from the t QTNs∗ 3. Optimization on t and s∗ 4. For a SNP, remove the QTNs in LD with the SNP, e.g. R

square > 1%∗ 5. Use the remaining QTNs to build kinship for testing the

SNP

Page 29: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Statistical power of kinship from

Qishan Wang

PLoS One, 2014

SUPER(Settlement of kinship Under Progressively Exclusive Relationship)

Page 30: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Usage of Software PackagesSoftware Leading Authors Corresponding authors Language Released Citation

PUMA Gabriel E. Hoffman Jason G. Mezey C++ 2013 8

TATES Sophie van der Sluis Sophie van der Sluis Fortran 2013 20

GAPIT Lipka AE Zhang Z R 2012 106

MLMM Vincent S Nordborg M R/python 2012 69

GEMMA Zhou X Stephens M C++ 2012 88

FastLMM Christoph L, Listgarten J,Heckerman D

Christoph L, Listgarten J,Heckerman D C++ 2011 104

Qxpak M. Pérez-Enciso M. Pérez-Enciso Fortran 2004 141

EMMAX Kang HM Sabatti C & Eskin E C++ 2010 349

GCTA Jian Y Jian Y C++ 2011 380

GenABEL Aulchenko YS Aulchenko YS R 2007 510

TASSEL Bradbury PJ, Zhang Z, KroonDE Bradbury PJ Java 2006 660

PLINK Purcell S Purcell S C++ 2007 703775%

Page 31: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Why human geneticists not go beyond PLINK?

Page 32: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Adjustment on marker

si: Testing marker

Q: Population structure

K: Kinship

S: Pseudo QTNs

Adjustment on covariates

GWAS Model Development

Page 33: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

It is time for human geneticists to move forward

Xiaolei Liu

Page 34: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Re-analysis of Arabidopsis data

Xiaolei Liu

Page 35: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Asso

ciat

ions

on

flow

erin

g tim

e

Page 36: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Flowering time genes enriched

Page 37: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

FarmCPU is computing efficient

Testing 60K SNPs

Page 38: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Half million individuals, half million SNPs three days

But, PINK new version is faster

Page 39: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

ZZLab.Net

http://zzlab.net/GAPIT/data

Page 40: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%
Page 41: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

BLINK is super fastTesting half million SNPs

BLINK is 5X > PLINK, 200X > FarmCPU

FarmCPU

BLINK

PLINK

Page 42: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

0.0 0.2 0.4 0.6 0.8 1.0

0.2

0.4

0.6

0.8

1.0

fdr.mean[, 1]

power PLINKGAPIT

BLINK

FarmCPU

http://zzlab.net/GAPIT/data/Workshop_Iowa.R

Page 43: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

t, F, X2…

GLM

MLM

CMLM

Uncorrelated or equally correlated

Ladder for high hanging fruits

ECMLM

Structure, Eigenstrate

PLIK

EMMA, EMMAx,GCAT, GenABEL

TASSELGAPIT

GAPIT

Page 44: Saving the Babies from the Bathwater in Genome-Wide ...zzlab.net/WorkshopISU/GWAS_Iowa.pdf · MLM to compression 4.0% 14.2% 7.6% 2.5% Compression to group kinship 6.4% 13.3% 2.9%

Acknowledgements

Meng HuangXiaolei LiuMeng LiAlex Lipka

ECMLM

Xiahui Yuan