Constructing gene association networks for complex human disorders
TRANSCRIPT
Constructing Association Networks using BGTA
Constructing Gene Association Networks for Complex Human Disorders
Using the BGTA Algorithm
Tian Zheng
Department of Statistics
Columbia University
June 7th, 2008
Acknowledgements
- http://statgene.stat.columbia.edu
- Collaborators
  - Professors Shaw-Hwa Lo and Herman Chernoff
  - Graduate student: Yuejing Ding
  - Our computer specialist: Lei Cong
- The research presented is, in part, supported by NIH and NSF.
Motivation
- Complex traits: “... are caused by multiple genes interacting with each other and with environmental factors to create a gradient of genetic susceptibility to disease.” (Weeks and Lathrop 1995)
- Gene-gene interactions may play a more important role in common human disorders, which has made the identification of disease-predisposing genes less successful.
- Multi-marker information collected in genetic studies provides a way to examine interaction information about a disease.
Gene association networks
- In association mapping, we can consider the association between a pair of genetic loci and the disease outcome.
- If the joint genotype of the two loci is more informative about the disease risk than the individual loci are, one can say that the interaction of these two loci is associated with the disease.
- Combining such association information, one can construct a network among the identified genetic loci.
- The relevance of such a network to biological interactions still needs to be studied using biological tools.
New Statistics for studying interactions
Interaction
- “Interaction” is a biological term and a statistical term.
- In biology, interaction means a joint action in a molecular or etiological sense.
- In statistics, interaction is usually based on a specific model.
- In our research, we study the joint association between a pair of loci and the disease trait and compare it with the individual associations of these two loci. If the former is greater than the latter, we define this as evidence of some “interaction”.
Partition-based measure of influence
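- Consider a set of $k$ SNPs.
- Each SNP has three possible genotypes (A/A, A/B, B/B).
- This creates a partition $\Pi$ of $3^k$ elements.
- We can study the joint “influence” (association) of these SNPs on a trait $Y$ using

\[
I_\Pi = \sum_{j \in \Pi} n_j^2 \left(\bar{Y}_j - \bar{Y}\right)^2
      = n\sigma^2 \sum_{j \in \Pi} \frac{n_j}{n} \left( \frac{\bar{Y}_j - \bar{Y}}{\sigma/\sqrt{n_j}} \right)^2 ,
\]

  where $n_j$ is the number of observations in partition element $j$, $\bar{Y}_j$ is their mean trait value, $\bar{Y}$ is the overall mean, $n = \sum_j n_j$, and $\sigma^2$ is the variance of $Y$.
- Under the null hypothesis,

\[
I_\Pi / (n\sigma^2) \sim \sum_{j \in \Pi} \frac{n_j}{n}\, \chi^2_1 .
\]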
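The influence score above can be computed directly from per-individual data. The following is a minimal sketch, assuming genotypes are coded 0/1/2 and that individuals sharing the same genotype combination form one partition element; the function name `influence_score` and the toy data are hypothetical and not taken from the talk.

```python
from collections import defaultdict

def influence_score(genotypes, y):
    """Partition-based influence score: sum_j n_j^2 * (mean_j - overall_mean)^2.

    genotypes: one tuple of genotype codes (0, 1, 2) per individual.
    y: trait values, in the same order as `genotypes`.
    """
    y_bar = sum(y) / len(y)
    cells = defaultdict(list)              # partition element -> trait values
    for g, yi in zip(genotypes, y):
        cells[tuple(g)].append(yi)
    return sum(len(v) ** 2 * (sum(v) / len(v) - y_bar) ** 2
               for v in cells.values())

# Hypothetical toy data: 2 SNPs, 6 individuals, binary trait.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (2, 2), (2, 2)]
y = [1, 1, 0, 0, 1, 0]
score = influence_score(genotypes, y)
```

Only the non-empty partition elements contribute, so the cost grows with the number of individuals rather than with $3^k$.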
[Figure: simulation results for the influence score I, in panels (a) and (b); panel (b) is labelled δ = 1.5. Columns, left to right: distribution f(y) of y; distribution of I scores (shaded histogram, black CDF) vs. asymptotic distribution (empty histogram, red CDF); conditional distribution of I scores (shaded histogram, black CDF) vs. conditional asymptotic distribution (empty histogram, red CDF); bar plot of the partition element sizes n_i used in the conditional distribution. Rows in (a): partition 20, observations 100; partition 200, observations 200; partition 2000, observations 1000. Rows in (b): partition 20, observations 100; partition 200, observations 200; partition 200, observations 1000; partition 2000, observations 1000.]
Backward Genotype-Trait Association method
- In Zheng et al. (2006), we proposed the Genotype-Trait Distortion (GTD) score to measure association between a set of SNPs and the disease status.
- GTD uses the sum of squared differences between the genotype distributions among the cases and the controls.
- This is a special case of the partition-based influence score:

\[
I_\Pi = \sum_{j \in \Pi} n_j^2 \left(\bar{Y}_j - \bar{Y}\right)^2
      = \sum_{i=1}^{3^m} \left(n_{d,i} + n_{u,i}\right)^2 \left( \frac{n_{d,i}}{n_{d,i} + n_{u,i}} - \frac{n_d}{n_d + n_u} \right)^2
      = \left( \frac{n_d n_u}{n_d + n_u} \right)^2 \sum_{i=1}^{3^m} \left( \frac{n_{d,i}}{n_d} - \frac{n_{u,i}}{n_u} \right)^2 .
\]
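The identity above can be checked numerically. This is a small sketch under stated assumptions: cases are coded Y = 1 and controls Y = 0, genotypes are given as tuples, and the function name `gtd_forms` and the toy counts are hypothetical.

```python
from collections import Counter

def gtd_forms(case_genotypes, control_genotypes):
    """Return (influence score, rescaled GTD sum, raw GTD sum) for case/control data."""
    n_d, n_u = len(case_genotypes), len(control_genotypes)
    d = Counter(map(tuple, case_genotypes))      # n_{d,i}: cases per genotype cell
    u = Counter(map(tuple, control_genotypes))   # n_{u,i}: controls per genotype cell
    cells = set(d) | set(u)                      # non-empty cells only
    p = n_d / (n_d + n_u)                        # overall proportion of cases
    influence = sum((d[c] + u[c]) ** 2 * (d[c] / (d[c] + u[c]) - p) ** 2
                    for c in cells)
    gtd = sum((d[c] / n_d - u[c] / n_u) ** 2 for c in cells)
    scaled = (n_d * n_u / (n_d + n_u)) ** 2 * gtd
    return influence, scaled, gtd

# Hypothetical toy data: one SNP, 4 cases and 6 controls.
cases = [(0,), (0,), (1,), (2,)]
controls = [(0,), (1,), (1,), (2,), (2,), (2,)]
influence, scaled, gtd = gtd_forms(cases, controls)
```

Because the factor in front of the sum depends only on the numbers of cases and controls, ranking SNP sets by the influence score or by the GTD sum gives the same order within one data set.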
- The GTD score changes when a SNP is removed from the set under evaluation.
- If GTD drops, the SNP is important and possibly interacts with the other SNPs in the set.
- If GTD increases, the SNP is not important given the other SNPs in the set.
- In BGTA, we use backward greedy screening on random subsets of SNPs to screen a large set of candidate SNPs.
- The screening results are saved as the return frequencies of each SNP and the GTD scores of the irreducible SNP clusters from the backward screening.
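The screening loop described above can be sketched as follows. This is an illustration under assumptions, not the published BGTA implementation: the stopping rule (stop as soon as no single removal increases GTD), the tie handling, and the names `gtd`, `backward_screen` and `bgta_return_counts` are all hypothetical choices made for this example.

```python
import random
from collections import Counter

def gtd(cases, controls, snps):
    """GTD score restricted to the SNP indices in `snps`."""
    n_d, n_u = len(cases), len(controls)
    d = Counter(tuple(g[s] for s in snps) for g in cases)
    u = Counter(tuple(g[s] for s in snps) for g in controls)
    return sum((d[c] / n_d - u[c] / n_u) ** 2 for c in set(d) | set(u))

def backward_screen(cases, controls, subset):
    """Greedy backward elimination; returns (retained SNPs, their GTD score)."""
    current = list(subset)
    score = gtd(cases, controls, current)
    while len(current) > 1:
        # GTD of every set obtained by dropping exactly one SNP
        trials = [(gtd(cases, controls, [t for t in current if t != s]), s)
                  for s in current]
        best_score, drop = max(trials)
        if best_score <= score:      # no removal increases GTD: set is irreducible
            break
        current.remove(drop)
        score = best_score
    return sorted(current), score

def bgta_return_counts(cases, controls, n_snps, subset_size, n_rounds, seed=0):
    """Count how often each SNP is retained over random starting subsets."""
    rng = random.Random(seed)
    returns = Counter()
    for _ in range(n_rounds):
        subset = rng.sample(range(n_snps), subset_size)
        kept, _ = backward_screen(cases, controls, subset)
        returns.update(kept)
    return returns

# Hypothetical toy data: SNP 0 separates cases from controls, SNP 1 is noise.
cases = [(2, 0), (2, 1), (2, 0), (2, 1)]
controls = [(0, 0), (0, 1), (0, 0), (0, 1)]
kept, score = backward_screen(cases, controls, [0, 1])
returns = bgta_return_counts(cases, controls, n_snps=2, subset_size=2, n_rounds=5)
```

In the toy data, dropping the noise SNP raises GTD from 1.0 to 2.0, so only SNP 0 is retained in every round.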
Definition of interaction
In our research, we have considered two methods to identify “interactions”:
- Screening return frequencies: SNPs that have interactions associated with the disease status are likely to be “returned” together more often than expected by chance.
- GTD scores: irreducible SNP clusters are local maxima from the backward screening based on random subsets of SNPs. When SNPs have a high joint GTD score and belong to an irreducible set, this is regarded as evidence that they are jointly associated with the disease status, and thus of “interaction”.
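The first criterion amounts to counting how often SNPs are retained together across screening rounds. A minimal sketch, assuming each round yields the list of SNP indices it retained; the function name `joint_return_counts` and the example rounds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def joint_return_counts(returned_sets):
    """Count single and pairwise returns over a list of retained SNP sets."""
    single = Counter()
    pair = Counter()
    for kept in returned_sets:
        kept = sorted(set(kept))              # ignore order and repeats within a round
        single.update(kept)
        pair.update(combinations(kept, 2))    # each unordered pair once, as (low, high)
    return single, pair

# Hypothetical output of four screening rounds.
rounds = [[1, 2], [1, 2, 3], [3], [2, 1]]
single, pair = joint_return_counts(rounds)
```

Whether a pair count is "more often than expected by chance" still needs a reference distribution, for example from permuted data; that step is not shown here.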
Examples
A simulation example: oligogenic trait with a gene network
Disease model:
(a) Tri-locus genotypic risk array (penetrance by genotype):

Locus A            AA             Aa             aa
Locus B        BB  Bb  bb     BB  Bb  bb     BB  Bb  bb
Locus E  EE    n   n   D      n   n   n      n   n   n
         Ee    n   n   D      n   D   D      n   n   n
         ee    n   n   D      n   D   D      D   D   D
[Figure (a): Specified disease model. Diagram of two groups of loci: A1, B1, E1 and A2, B2, E2.]
Joint return frequencies (screening is done on 30 SNPs, 6 of which are associated with the disease genes).

Joint returns  group 1                                group 2
(p-value)      M1→A1        M2→B1        M3→E1        M4→A2        M5→B2        M6→E2
M1→A1          993          253          341          0            0            1
                            (<10^-15)    (<10^-15)
M2→B1          253          823          3            6            1            1
               (<10^-15)
M3→E1          341          3            755          144          10           1
               (<10^-15)                              (3.1×10^-7)
M4→A2          0            6            144          656          80           304
                                         (3.1×10^-7)               (0.015)      (<10^-15)
M5→B2          0            1            10           80           487          1
                                                      (0.015)
M6→E2          1            1            1            304          1            841
                                                      (<10^-15)
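One way to turn the joint-return table into a network is to connect two markers whenever their joint return count has a small p-value. The sketch below uses the p-values reported in the table; the cut-off `alpha`, the function name `build_network`, and the encoding of "< 10^-15" as the upper bound 1e-15 are assumptions made for illustration.

```python
# Upper bounds on the p-values reported in the table for marker pairs.
# Pairs without a reported p-value are omitted.
p_values = {
    ("M1", "M2"): 1e-15,
    ("M1", "M3"): 1e-15,
    ("M3", "M4"): 3.1e-7,
    ("M4", "M5"): 0.015,
    ("M4", "M6"): 1e-15,
}

def build_network(p_values, alpha=0.05):
    """Undirected adjacency sets for pairs whose p-value is below alpha."""
    adjacency = {}
    for (a, b), p in p_values.items():
        if p < alpha:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency

network = build_network(p_values)               # hypothetical cut-off 0.05
strict = build_network(p_values, alpha=1e-3)    # stricter cut-off drops M4-M5
```

With the looser cut-off, M3 and M4 link the two groups and M5 attaches to M4; with the stricter one, M5 is left out of the network.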
[Figure (b): Network constructed from data. Diagram with nodes M1 to M6.]
Application to Rheumatoid Arthritis I
- Rheumatoid Arthritis (RA) is a heterogeneous disease that exhibits a complex genetic component.
- We studied 349 controls and 474 cases with genotypes on 5407 SNPs throughout the genome.
- We used a two-stage screening for this data set.
  - First stage: use standard BGTA screening and select approximately the top 20% of markers by importance.
  - Second stage: further screening to identify important marker clusters.
- Significant markers were selected based on FDR estimated using permutations.
- For 39 identified loci that showed strong association with RA, about 2/3 of which were found in the RA literature, we constructed an association network using association scores.
clusters             count   gtd score       gtd variance
(12, 15, 20)         1       0.0452466485    0.0000076622
(40, 47)             23      0.0447642803    0.0000057909
(12, 15)             23      0.0439568423    0.0000086410
(11, 21, 26)         1       0.0415141680    0.0000063145
(2, 28, 42)          1       0.0414631006    0.0000055331
(2, 28, 48)          1       0.0411999494    0.0000049326
(2, 28, 39)          2       0.0410552395    0.0000058928
(4, 19, 49)          1       0.0407363209    0.0000052556
(18, 34, 48, 992)    1       0.0405105566    0.0000067487
(2, 28)              28      0.0403270020    0.0000059208
(11, 21, 42)         1       0.0389051813    0.0000073572
(11, 13, 31)         1       0.0388380522    0.0000050127
(24, 41)             30      0.0387972020    0.0000038129
(11, 16, 42)         1       0.0386918428    0.0000067935
(9, 41, 43)          1       0.0386662864    0.0000069116
(11, 13)             23      0.0385128244    0.0000053157
(5, 6, 27)           1       0.0384394683    0.0000027612
(18, 48)             23      0.0383532977    0.0000087086
(17, 38)             26      0.0383203274    0.0000064260
(39, 40, 636)        1       0.0382886778    0.0000046856
(40, 636)            17      0.0381770392    0.0000047069
(11, 42)             32      0.0379699185    0.0000089074
(29, 45, 50)         1       0.0375913449    0.0000055796
Application to Rheumatoid Arthritis II
- A candidate gene study on RA.
- 20 SNPs from 14 candidate genes for RA.
- 839 cases and 855 unrelated controls.
- We evaluated all subsets of the 20 SNPs and identified those that are irreducible in BGTA screening.
- We used 100 permutations to control the family-wise error rate.
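The talk does not say how the permutations were used. One common approach is the max-statistic permutation test: shuffle the case/control labels, record the largest statistic over all candidates, and compare each observed statistic with these maxima. The sketch below assumes that approach; the stand-in statistic (absolute difference in mean genotype), the function names, and the toy data are hypothetical and replace the GTD scores of SNP subsets used in the study.

```python
import random

def fwer_adjusted_pvalues(labels, stat_fns, n_perm=100, seed=0):
    """Max-statistic permutation p-values, one per statistic in `stat_fns`.

    labels: case (1) / control (0) labels.
    stat_fns: functions mapping a label list to a non-negative score.
    """
    rng = random.Random(seed)
    observed = [f(labels) for f in stat_fns]
    perm = list(labels)
    max_null = []
    for _ in range(n_perm):
        rng.shuffle(perm)                            # break genotype-label link
        max_null.append(max(f(perm) for f in stat_fns))
    return [(1 + sum(m >= obs for m in max_null)) / (1 + n_perm)
            for obs in observed]

def mean_diff_stat(genotype):
    """Stand-in statistic: |mean genotype in cases - mean genotype in controls|."""
    def stat(labels):
        cases = [g for g, l in zip(genotype, labels) if l == 1]
        controls = [g for g, l in zip(genotype, labels) if l == 0]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))
    return stat

# Hypothetical toy data: 10 cases, 10 controls, two SNPs.
labels = [1] * 10 + [0] * 10
snp_a = [2] * 10 + [0] * 10      # separates cases from controls
snp_b = [1] * 20                 # constant, carries no information
p_adj = fwer_adjusted_pvalues(labels, [mean_diff_stat(snp_a), mean_diff_stat(snp_b)])
```

With 100 permutations the smallest attainable adjusted p-value is 1/101, which limits how finely significance can be resolved.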
Current and Future effort
- We are analyzing a breast cancer whole-genome scan.
- We are developing gene-based analysis tools.
- For whole-genome association studies, new methods and computational strategies are needed to accommodate the large number of SNPs.
THANK YOU!