Genome-wide estimates of heritability
Ben Domingue
Institute of Behavioral Science
University of Colorado [email protected]
1/16
- Genes → behaviors & outcomes of interest.
- Genome-wide data: FHS, HRS, AddHealth, etc.
- Hard to get a handle on genotype/phenotype connection.
- GWAS results help, but have limited availability.
- Even when available, polygenic scores have limited predictive value.
What else can we do?
2/16
GCTA
Genome-wide Complex Trait Analysis (GCTA) tells us about heritability.
- GCTA estimates heritability without knowledge of causal variants.
- Instead uses “genetic similarity” (similar to logic of twin studies).
3/16
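Method
1. Estimate genome-wide similarity:
   $$A_{jk} = \frac{1}{N} \sum_i \frac{(x_{ij} - 2p_i)(x_{ik} - 2p_i)}{2p_i(1 - p_i)}$$
2. Then estimate mixed model:
   $$y = X\beta + g + \varepsilon$$
   where $g \sim \mathrm{MVN}[0, \sigma_g^2 A]$.
3. Heritability:
   $$\frac{\hat{\sigma}_g^2}{\hat{\sigma}_g^2 + \hat{\sigma}_\varepsilon^2}$$
Complicated model & not the DGP.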
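Step 1 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the GCTA software itself: the function name `grm`, the simulated genotypes, and the choice to estimate each allele frequency from the sample are all hypothetical. Steps 2 and 3 (the mixed model fit) are not shown.

```python
import numpy as np

def grm(genotypes):
    """Genetic similarity matrix A from an (n_people, n_snps) array of 0/1/2 allele counts."""
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0                 # allele frequency per SNP (assumed: sample estimate)
    keep = (p > 0) & (p < 1)                 # drop monomorphic SNPs, whose variance is zero
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))   # center and scale each SNP
    return z @ z.T / z.shape[1]              # A_jk = (1/N) * sum_i z_ij * z_ik

# Toy data: 200 unrelated people, 500 independent SNPs (hypothetical, no LD).
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=500)
geno = rng.binomial(2, freqs, size=(200, 500))
A = grm(geno)
```

The diagonal of `A` should sit near 1 and the off-diagonal entries near 0 for unrelated people; the small off-diagonal variation is the "genetic similarity" that the mixed model in step 2 relies on.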
4/16
Sensitivity to genetic architecture?
- Robust to # of causal variants.
- Sensitive to LD.
5/16
Sensitivity to environment?
Could genetic similarity just be a proxy for environmental similarity?
6/16
My goal: Offer intuition and basic guidance on when GCTA estimates may be reliable.
7/16
Data
HRS: 4950 non-Hispanic whites, ≈ 1.5M autosomal SNPs.
- Height: 0.40
8/16
Q1: Gen sim as function of SNPs
SNP subset     Correlation
50% Sample     0.98
30% Sample     0.95
10% Sample     0.83
r² = 0.01      0.57
r² = 0.2       0.75
r² = 0.5       0.88
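One way to read the "% Sample" rows is as the correlation between pairwise genetic similarity computed from all SNPs and from a random subset of them. The sketch below assumes that reading. The helper names and the simulated genotypes are hypothetical, and because the toy SNPs are independent (no LD, no real relatedness) it should not be expected to reproduce the HRS values. The r² rows, which look like LD-pruning thresholds, are not simulated here.

```python
import numpy as np

def grm(x):
    """Genetic similarity matrix from an (n_people, n_snps) array of 0/1/2 allele counts."""
    x = np.asarray(x, dtype=float)
    p = x.mean(axis=0) / 2.0
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / x.shape[1]

def offdiag_corr(a, b):
    """Correlation between the off-diagonal (pairwise) entries of two similarity matrices."""
    iu = np.triu_indices_from(a, k=1)
    return float(np.corrcoef(a[iu], b[iu])[0, 1])

# Toy data (hypothetical): 200 people, 2000 independent SNPs.
rng = np.random.default_rng(0)
n, m = 200, 2000
freqs = rng.uniform(0.1, 0.9, size=m)
geno = rng.binomial(2, freqs, size=(n, m))
full = grm(geno)

# Recompute similarity from random SNP subsets and compare with the full set.
corr_by_fraction = {}
for frac in (0.5, 0.3, 0.1):
    cols = rng.choice(m, size=int(frac * m), replace=False)
    corr_by_fraction[frac] = offdiag_corr(full, grm(geno[:, cols]))
```

The qualitative pattern to look for is the same as in the table: the smaller the SNP subset, the weaker its agreement with the full-set similarity.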
9/16
Q2: GWAS (height) variants
10/16
Q3: Heteroskedasticity
Heteroskedasticity is a common problem.
- weight on height.
- own education on paternal education.
Of concern here since we’re estimating variance components.
- Simulate outcome based on GCTA model.
- $y = 0.5 \cdot \text{height} + g + \varepsilon$.
- $\varepsilon_i$ has variance $\exp(\alpha \cdot \text{height} \cdot \sigma_\varepsilon^2)$, where $\alpha$ controls level of heteroskedasticity and $\sigma_\varepsilon^2$ controls heritability.
Examine recovery of heritability, but def’n no longer simple.
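The simulation design can be sketched as follows. This is a rough illustration rather than the code behind the slides: the genotypes are simulated, height is replaced by a hypothetical standardized variable, and the residual variance follows the formula exactly as written on the slide. The genetic component is drawn as `z @ u`, which has covariance $\sigma_g^2 A$ without forming $A$ explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 300, 400

# Toy standardized genotypes (hypothetical): z @ z.T / m is the similarity matrix A.
freqs = rng.uniform(0.1, 0.9, size=m)
x = rng.binomial(2, freqs, size=(n, m)).astype(float)
p = x.mean(axis=0) / 2.0
z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))

def simulate(alpha, sigma2_g=1.0, sigma2_e=1.0):
    """Draw y = 0.5*height + g + eps with height-dependent residual variance."""
    height = rng.standard_normal(n)                       # assumed: standardized height
    g = z @ rng.normal(0.0, np.sqrt(sigma2_g / m), size=m)  # Cov(g) = sigma2_g * A
    var_e = np.exp(alpha * height * sigma2_e)             # residual variance as on the slide
    eps = rng.standard_normal(n) * np.sqrt(var_e)
    y = 0.5 * height + g + eps
    return y, height, var_e

y_homo, _, var_homo = simulate(alpha=0.0)         # alpha = 0: constant residual variance
y_het, height_het, var_het = simulate(alpha=1.0)  # alpha > 0: variance grows with height
```

With `alpha = 0` every residual variance equals 1, which recovers the standard GCTA model; increasing `alpha` makes the residual variance differ across people, which is why a single heritability number stops being well defined.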
11/16
Q3: Heteroskedasticity
12/16
Q4: Environmental Moderation
Heritability not constant: What are implications for GCTA?
- Standard GCTA: $g \sim \mathrm{MVN}[0, \sigma_g^2 A]$.
- We simulate data using $g \sim \mathrm{MVN}[0, A']$ where the $(i, j)$-th entry of $A'$ is $h_i h_j A_{ij}$.
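The moderated covariance $A'$ is the similarity matrix rescaled row- and column-wise by $h$, that is $A' = DAD$ with $D = \mathrm{diag}(h)$. A minimal sketch follows. The slides do not say how $h_i$ is built, so the environment variable `env` and the mapping `h = 0.5 + env` are purely hypothetical, as are the simulated genotypes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 150, 300

# Toy standardized genotypes and similarity matrix A (hypothetical data).
freqs = rng.uniform(0.1, 0.9, size=m)
x = rng.binomial(2, freqs, size=(n, m)).astype(float)
p = x.mean(axis=0) / 2.0
z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))
A = z @ z.T / m

env = rng.uniform(0.0, 1.0, size=n)   # hypothetical environment measure per person
h = 0.5 + env                         # hypothetical person-specific scaling h_i

# (i, j)-th entry of A' is h_i * h_j * A_ij.
A_prime = np.outer(h, h) * A

# Draw g ~ MVN[0, A']: scaling a draw with covariance A by h gives covariance D A D.
g = h * (z @ rng.normal(0.0, np.sqrt(1.0 / m), size=m))
```

People with larger `h` get a larger genetic variance, $h_i^2 A_{ii}$, so heritability differs across environments even though the underlying similarity matrix is unchanged.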
13/16
Q4: Environmental Moderation
What if we ignore environment?
14/16
Q4: Environmental Moderation
What if we allow for environmental variation?
15/16
- LD is an important consideration (aside: I’m skeptical about using KING or REAP estimates).
- Heteroskedasticity leads to inflation of $h^2$ estimates.
- Environmental differences are likely to be problematic (and yet may be rampant?).
In closing: GCTA is like a table saw.
16/16