Increasing Power in Association Studies by using Linkage Disequilibrium Structure and Molecular Function as Prior Information Eleazar Eskin UCLA


Post on 07-Jan-2016


Page 1: Eleazar Eskin UCLA

Increasing Power in Association Studies by using Linkage Disequilibrium Structure and

Molecular Function as Prior Information

Eleazar Eskin, UCLA

Page 2: Eleazar Eskin UCLA

Motivation

• Whole genome association study

• How to perform multiple hypothesis correction

– To increase statistical power

• Incorporate prior information on molecular function of associated loci

• Information on linkage disequilibrium structure

Page 3: Eleazar Eskin UCLA

Main idea

• Traditional method

– Use a single significance threshold

• In practice, markers are not identical

• Set a different threshold at each marker, which reflects both intrinsic (e.g. LD, allele freq.) and extrinsic information on the markers

Page 4: Eleazar Eskin UCLA

Standard Association Study

• M markers in N cases and N controls

• fi = minor allele frequency at marker i

• True case/control allele frequencies: $p_i^+$ / $p_i^-$

• Marker d: causal variant with a relative risk $\gamma$

$$p_d^+ = \frac{\gamma f_d}{\gamma f_d + (1 - f_d)}, \qquad p_d^- = f_d$$
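A small numeric sketch of how a relative risk translates into a case allele frequency. It assumes a multiplicative risk model in which the control frequency equals the population frequency; the symbol `gamma` for the relative risk and the function name are illustrative choices, not taken from the slide.

```python
def case_allele_frequency(f_d: float, gamma: float) -> float:
    """Allele frequency among cases at the causal marker d.

    Assumption: each copy of the risk allele multiplies disease risk by
    gamma, and the control frequency is the population frequency f_d.
    """
    return gamma * f_d / (gamma * f_d + (1.0 - f_d))


f_d = 0.2      # population (control) minor allele frequency, example value
gamma = 2.0    # relative risk, example value
p_case = case_allele_frequency(f_d, gamma)
p_control = f_d
print(f"case: {p_case:.4f}  control: {p_control:.4f}")
```

With a relative risk of 1 the two frequencies coincide, which is the null hypothesis that the association test is trying to reject.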

Page 5: Eleazar Eskin UCLA

Standard Association Study

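• Test statistic: $S_i \sim N(\lambda_i \sqrt{N}, 1)$, where $\lambda_i \sqrt{N}$ is the non-centrality parameter

• Power at a single marker: the probability of detecting an association with N individuals at p-value (significance) threshold t, written $P_s(t, \lambda_i, N)$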
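A minimal sketch of the single-marker power, assuming a two-sided z-test on a statistic distributed as N(ncp, 1). The function name and the `ncp` argument (the mean of the statistic, i.e. the non-centrality parameter) are illustrative; the closed form is the standard normal-theory expression, not a formula recovered from the slide.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def single_marker_power(t: float, ncp: float) -> float:
    """P(|S| > z_t) for S ~ N(ncp, 1), where z_t is the two-sided
    critical value for significance threshold t."""
    if not 0.0 < t <= 1.0:
        raise ValueError("t must be in (0, 1]")
    # Critical value chosen so that P(|Z| > z) = t under the null.
    z = -_STD_NORMAL.inv_cdf(t / 2.0)
    # Reject in either tail.
    return _STD_NORMAL.cdf(ncp - z) + _STD_NORMAL.cdf(-ncp - z)


# Power grows with the non-centrality parameter at a fixed threshold.
for ncp in (0.0, 2.0, 4.0, 6.0):
    print(ncp, single_marker_power(5e-8, ncp))
```

When `ncp` is 0 the function returns `t` itself: under the null, the chance of crossing the threshold is exactly the false positive rate.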

Page 6: Eleazar Eskin UCLA

Multiple Hypothesis correction

• Fix the false positive rate (the probability of rejecting a correct null hypothesis) at each marker so that the total false positive rate is α

• Bonferroni correction

– $t_i = \alpha / M$

• Expected power: $\sum_{i=1}^{M} c_i \, P_s(\alpha / M, \lambda_i, N)$, where $c_i$ is the probability that marker i is causal

Page 7: Eleazar Eskin UCLA

Multi-Threshold Association

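• Allow a different threshold $t_i$ for each marker

• Power: $\sum_{i=1}^{M} c_i \, P_s(t_i, \lambda_i, N)$

with adjusted false positive rate $\sum_{i=1}^{M} t_i = \alpha$

• Goal: set values for $t_i$ to maximize the power subject to the constraint on the total false positive rate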
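The goal on this slide can be written as a constrained optimization. The block below is a sketch under stated assumptions: $P_s(t, \lambda, N)$ denotes single-marker power, $c_i$ the prior probability that marker i is causal, and the Lagrange multiplier $\mu$ is introduced here for illustration only.

```latex
\begin{aligned}
\max_{t_1, \dots, t_M} \quad & \sum_{i=1}^{M} c_i \, P_s(t_i, \lambda_i, N) \\
\text{subject to} \quad & \sum_{i=1}^{M} t_i = \alpha, \qquad 0 \le t_i \le 1 .
\end{aligned}

% Lagrangian: L = \sum_i c_i P_s(t_i, \lambda_i, N) - \mu \left( \sum_i t_i - \alpha \right).
% Setting \partial L / \partial t_i = 0 gives, for every marker with 0 < t_i < 1,
c_i \, \frac{\partial P_s(t_i, \lambda_i, N)}{\partial t_i} = \mu .
```

The Bonferroni choice $t_i = \alpha / M$ satisfies the constraint, so the maximizer can do no worse than Bonferroni in expected power, given the assumed $c_i$ and $\lambda_i$.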

Page 8: Eleazar Eskin UCLA

Maximizing the Power

• Gradient at each marker will be equal at the optimal point

• Given a value of gradient, solve for the threshold at each marker to achieve that gradient

• Do binary search over the gradient until thresholds sum to α
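The three steps above can be sketched as follows. This is an illustration under assumptions, not the author's implementation: it assumes a two-sided z-test with statistic N(ncp, 1), for which the gradient of power with respect to the threshold has the closed form exp(-ncp²/2)·cosh(z·ncp), and it assumes moderate `ncp` values (very large ones would overflow `exp`). All function names are hypothetical.

```python
from math import acosh, erfc, exp, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power(t, ncp):
    """Two-sided power at threshold t for a statistic S ~ N(ncp, 1)."""
    if t <= 0.0:
        return 0.0
    z = -_STD_NORMAL.inv_cdf(t / 2.0)
    return _STD_NORMAL.cdf(ncp - z) + _STD_NORMAL.cdf(-ncp - z)


def threshold_for_gradient(g, c, ncp):
    """Threshold t in [0, 1] at which c * dP/dt equals the gradient g."""
    ncp = abs(ncp)
    if c <= 0.0:
        return 0.0                     # marker cannot be causal: no budget
    if ncp == 0.0:
        return 1.0 if g <= c else 0.0  # flat gradient: all or nothing
    ratio = g * exp(0.5 * ncp * ncp) / c   # equals cosh(z * ncp)
    if ratio <= 1.0:
        return 1.0                     # gradient never falls to g: cap at 1
    z = acosh(ratio) / ncp             # critical value giving gradient g
    return erfc(z / sqrt(2.0))         # t = P(|Z| > z)


def optimal_thresholds(c, ncp, alpha, iterations=200):
    """Binary search (on a log scale) over the common gradient g until
    the per-marker thresholds sum to alpha."""
    def total(g):
        return sum(threshold_for_gradient(g, ci, ni) for ci, ni in zip(c, ncp))

    # Smallest gradient any marker can attain; thresholds sum to >= 1 here.
    lo = min(ci * exp(-0.5 * ni * ni) for ci, ni in zip(c, ncp) if ci > 0.0)
    hi = lo
    while total(hi) > alpha:           # thresholds shrink as g grows
        hi *= 2.0
    for _ in range(iterations):
        mid = sqrt(lo * hi)
        if total(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return [threshold_for_gradient(hi, ci, ni) for ci, ni in zip(c, ncp)]


# Toy example: four equally likely markers with different effect sizes.
c = [0.25, 0.25, 0.25, 0.25]
ncp = [1.0, 2.0, 3.0, 4.0]
alpha = 0.05
t_opt = optimal_thresholds(c, ncp, alpha)
bonferroni = sum(ci * power(alpha / len(c), ni) for ci, ni in zip(c, ncp))
multi = sum(ci * power(ti, ni) for ci, ti, ni in zip(c, t_opt, ncp))
print(t_opt, bonferroni, multi)
```

Because the thresholds decrease monotonically as the shared gradient increases, a one-dimensional search is enough even with millions of markers; each evaluation is a single pass over the markers.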

Page 9: Eleazar Eskin UCLA

Maximizing Power for Proxies

• In practice, markers are tags for causal variation

• Given K variants, assign each potential causal variant $v_k$ to the best marker i

• The effective non-centrality parameter is reduced by a factor of $|r_{ki}|$, where $r_{ki}$ is the correlation coefficient between variant k and marker i

• If $v_k$ is causal, the power function when observing proxy marker i is $P_s(t_i, |r_{ki}| \lambda_k, N)$

Page 10: Eleazar Eskin UCLA

Maximizing Power for Proxies

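• Each variant k has a probability $c_k$ of being causal

• The total power captured by each marker i, where $T_i$ is the set of variants assigned to marker i:

$$P_m(t_i, T_i, N) = \sum_{v_k \in T_i} c_k \, P_s(t_i, |r_{ki}| \lambda_k, N)$$

• The total power of the association study:

$$P(t_1, t_2, \ldots, t_M) = \sum_{i=1}^{M} P_m(t_i, T_i, N) = \sum_{i=1}^{M} \sum_{v_k \in T_i} c_k \, P_s(t_i, |r_{ki}| \lambda_k, N)$$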
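A short sketch of the proxy calculation: each variant is assigned to its most correlated marker, its non-centrality parameter is scaled by |r|, and the weighted powers are summed. It assumes a two-sided z-test for the single-marker power, and that "best marker" means the one with the largest |r|; the data layout (`r[k][i]`) and the function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def single_marker_power(t, ncp):
    """Two-sided power at threshold t for a statistic S ~ N(ncp, 1)."""
    z = -_STD_NORMAL.inv_cdf(t / 2.0)
    return _STD_NORMAL.cdf(ncp - z) + _STD_NORMAL.cdf(-ncp - z)


def assign_to_best_marker(r):
    """r[k][i] is the correlation between variant k and marker i.
    Returns, for each variant, the index of the marker with largest |r|."""
    return [max(range(len(row)), key=lambda i: abs(row[i])) for row in r]


def total_power(thresholds, c, ncp, r):
    """Sum over variants of c_k * power at the tagging marker, with the
    variant's non-centrality parameter reduced by |r_ki|."""
    best = assign_to_best_marker(r)
    return sum(
        c_k * single_marker_power(thresholds[i], abs(r[k][i]) * ncp_k)
        for k, (c_k, ncp_k, i) in enumerate(zip(c, ncp, best))
    )


# Toy example: three variants tagged by two markers.
r = [[1.0, 0.2], [-0.8, 0.1], [0.3, 0.9]]
c = [0.5, 0.3, 0.2]          # prior probability that each variant is causal
ncp = [4.0, 4.0, 4.0]        # non-centrality if each variant were typed directly
thresholds = [0.025, 0.025]  # per-marker significance thresholds
print(assign_to_best_marker(r), total_power(thresholds, c, ncp, r))
```

A variant tagged at |r| = 0.8 behaves like a directly typed variant with 80% of the non-centrality parameter, so poorly tagged variants contribute less to the total and pull less of the threshold budget toward their marker.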

Page 11: Eleazar Eskin UCLA

Candidate Gene study

• 1000 cases and controls over ENCODE regions using markers in the Affymetrix 500K GeneChip

Page 12: Eleazar Eskin UCLA

Robustness over relative risks

Page 13: Eleazar Eskin UCLA

Whole Genome Association

• Assumption

– Each SNP is equally likely to be causal, with a relative risk of 2

• Power for traditional study and multi-threshold association for 2,614,057 SNPs

– Average: 0.593 / 0.610

– Average over power in [0.1, 0.9]: 0.568 / 0.615

Page 14: Eleazar Eskin UCLA

Impact of extrinsic information

1. cSNPs are more likely to be involved in disease

2. Add information on sets of genes which are more likely to be involved in a specific disease

• 30,700 cSNPs in HapMap contribute 20% of the disease-causing variation

• Cancer Gene Census: 363 genes in which mutations have been implicated in cancer. 20% of causal variation is assumed to lie in these genes
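To make the cSNP assumption concrete, the sketch below turns "20% of causal variation lies in 30,700 cSNPs" into per-SNP prior probabilities. It assumes the prior is uniform within each class and borrows the total of 2,614,057 SNPs from the whole-genome slide; both are assumptions for illustration, and the function name is hypothetical.

```python
def uniform_class_priors(n_total: int, n_enriched: int, enriched_share: float):
    """Per-SNP causal prior inside and outside an enriched class, assuming
    the prior is spread uniformly within each class."""
    c_inside = enriched_share / n_enriched
    c_outside = (1.0 - enriched_share) / (n_total - n_enriched)
    return c_inside, c_outside


n_total = 2_614_057   # assumed total, taken from the whole-genome slide
n_csnps = 30_700
c_in, c_out = uniform_class_priors(n_total, n_csnps, 0.20)
print(f"cSNP prior: {c_in:.3e}  other SNP prior: {c_out:.3e}  ratio: {c_in / c_out:.1f}")
```

Under these assumptions a cSNP carries a prior roughly 21 times larger than any other SNP, which is what lets the multi-threshold method give cSNPs a more lenient threshold.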

Page 15: Eleazar Eskin UCLA