Constructing Gene Association Networks for Complex Human Disorders

Constructing Association Networks using BGTA

Constructing Gene Association Networks for Complex Human Disorders

Using the BGTA Algorithm

Tian Zheng
Department of Statistics

Columbia University

June 7th, 2008


Acknowledgements

- http://statgene.stat.columbia.edu

- Collaborators
  - Professors Shaw-Hwa Lo and Herman Chernoff
  - Graduate student: Yuejing Ding
  - Our computer specialist: Lei Cong

- The research presented is, in part, supported by NIH and NSF.


Motivation

- Complex traits: “... are caused by multiple genes interacting with each other and with environmental factors to create a gradient of genetic susceptibility to disease.” (Weeks and Lathrop 1995)

- Gene-gene interactions may play a more important role in common human disorders, which has made the identification of disease-predisposing genes less successful.

- Multi-marker information collected in genetic studies provides a way to examine interaction information about a disease.


Gene association networks

- In association mapping, we can consider the association between a pair of genetic loci and the disease outcome.

- If the genetic combination (genotype) of the two loci is more informative about the disease risk than the information from each locus individually, one can say that the interaction of these two loci is associated with the disease.

- Combining such association information, one can construct a network among the identified genetic loci.

- The relevance of such a network to biological interactions still needs to be studied using biological tools.


New Statistics for studying interactions

Interaction

- “Interaction” is both a biological term and a statistical term.

- In biology, interaction means a joint action in a molecular or etiological sense.

- In statistics, interaction is usually defined relative to a specific model.

- In our research, we study the joint association between a pair of loci and the disease trait and compare it with the individual associations of the two loci. If the former is greater than the latter, we take this as evidence of some “interaction”.


Partition-based measure of influence

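- Consider a set of k SNPs.

- Each SNP has three possible genotypes (A/A, A/B, B/B).

- This creates a partition Π of $3^k$ elements.

- We can study the joint “influence” (association) of these SNPs on a trait Y using

$$I_\Pi = \sum_{j \in \Pi} n_j^2 (\bar{Y}_j - \bar{Y})^2 = n\sigma^2 \sum_{j \in \Pi} \frac{n_j}{n} \left( \frac{\bar{Y}_j - \bar{Y}}{\sigma/\sqrt{n_j}} \right)^2$$

- $I_\Pi / (n\sigma^2) \sim \sum_{j \in \Pi} \frac{n_j}{n}\,\chi^2_1$ under the null hypothesis.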
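The score can be computed directly from its definition. The following is a minimal sketch, assuming genotypes are given as one tuple of codes per subject; the function name `influence_score` and the 0/1/2 coding in the toy data are illustrative choices, not part of the talk.

```python
from collections import defaultdict


def influence_score(genotypes, y):
    """Partition-based score: sum over cells of n_j^2 * (cell mean - overall mean)^2.

    genotypes: one tuple of k genotype codes per subject (defines the partition cell).
    y: trait values, in the same subject order.
    """
    n = len(y)
    overall_mean = sum(y) / n
    cells = defaultdict(list)  # partition element -> trait values of its subjects
    for g, value in zip(genotypes, y):
        cells[tuple(g)].append(value)
    score = 0.0
    for values in cells.values():
        n_j = len(values)
        cell_mean = sum(values) / n_j
        score += n_j ** 2 * (cell_mean - overall_mean) ** 2
    return score


# Toy data (hypothetical): two SNPs coded 0/1/2 and a binary trait.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1)]
y = [0, 0, 0, 1, 1, 1]
print(influence_score(genotypes, y))
```

Only the non-empty partition elements contribute, so the cost grows with the number of observed joint genotypes rather than with $3^k$.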


[Figure: simulation results for the influence score I, in two panels (a) and (b). Each row shows four plots, left to right: dist. of y (density f(y) against y); dist. of I (shaded histogram, black cdf) vs. asymptotic (empty histogram, red cdf); conditional dist. of I (shaded histogram, black cdf) vs. conditional asymptotic (empty histogram, red cdf); bar plot of the partition element sizes n_i used in the conditional distribution (counts). Panel (a) rows: partition: 20, observations: 100; partition: 200, observations: 200; partition: 2000, observations: 1000. Panel (b), δ = 1.5, rows: partition: 20, observations: 100; partition: 200, observations: 200; partition: 200, observations: 1000; partition: 2000, observations: 1000.]
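A quick consequence of the null distribution stated for the influence score, taking that approximation at face value: each $\chi^2_1$ term has mean 1 and the weights $n_j/n$ sum to one, so the normalized score has mean 1 under the null hypothesis whatever the partition sizes are.

$$E\!\left[\frac{I_\Pi}{n\sigma^2}\right] = \sum_{j \in \Pi} \frac{n_j}{n}\, E\!\left[\chi^2_1\right] = \sum_{j \in \Pi} \frac{n_j}{n} = 1$$

This uses only linearity of expectation, so it does not require the $\chi^2_1$ terms to be independent.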


Backward Genotype-Trait Association method

- In Zheng et al. (2006), we proposed the Genotype-Trait Distortion (GTD) score to measure the association between a set of SNPs and the disease status.

- GTD uses the sum of squared differences between the genotype distributions among the cases and the controls.

- This is a special case of the partition-based influence score:

$$
\begin{aligned}
I_\Pi &= \sum_{j \in \Pi} n_j^2 (\bar{Y}_j - \bar{Y})^2 \\
&= \sum_{i=1}^{3^m} (n_{d,i} + n_{u,i})^2 \left( \frac{n_{d,i}}{n_{d,i} + n_{u,i}} - \frac{n_d}{n_d + n_u} \right)^2 \\
&= \left( \frac{n_d n_u}{n_d + n_u} \right)^2 \sum_{i=1}^{3^m} \left( \frac{n_{d,i}}{n_d} - \frac{n_{u,i}}{n_u} \right)^2 .
\end{aligned}
$$
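The identity above can be checked numerically. This sketch reads $n_{d,i}$ and $n_{u,i}$ as the numbers of cases and controls carrying joint genotype $i$ (an interpretation of the subscripts, which the slide does not spell out); the function names and the toy genotypes are hypothetical.

```python
from collections import Counter


def gtd_score(case_genotypes, control_genotypes):
    """Sum over joint genotypes of (n_di / n_d - n_ui / n_u)^2."""
    n_d, n_u = len(case_genotypes), len(control_genotypes)
    cases, controls = Counter(case_genotypes), Counter(control_genotypes)
    return sum((cases[g] / n_d - controls[g] / n_u) ** 2
               for g in set(cases) | set(controls))


def influence_score_binary(case_genotypes, control_genotypes):
    """Influence score with Y = 1 for cases and Y = 0 for controls."""
    n_d, n_u = len(case_genotypes), len(control_genotypes)
    cases, controls = Counter(case_genotypes), Counter(control_genotypes)
    overall = n_d / (n_d + n_u)  # overall mean of Y
    total = 0.0
    for g in set(cases) | set(controls):
        n_j = cases[g] + controls[g]  # size of this partition element
        total += n_j ** 2 * (cases[g] / n_j - overall) ** 2
    return total


# Toy joint genotypes over two SNPs (hypothetical data).
cases = [(0, 0), (0, 1), (1, 1), (1, 1)]
controls = [(0, 0), (0, 0), (0, 1), (2, 2)]
n_d, n_u = len(cases), len(controls)
scale = (n_d * n_u / (n_d + n_u)) ** 2

print(influence_score_binary(cases, controls))
print(scale * gtd_score(cases, controls))  # agrees up to rounding
```

Because the scale factor depends only on the numbers of cases and controls, ranking SNP sets by GTD or by the influence score gives the same order within one data set.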


- The GTD score changes when a SNP is removed from the set under evaluation.

- If GTD drops, this SNP is important and possibly interacts with the other SNPs in the set.

- If GTD increases, this SNP is not important given the other SNPs in the set.

- In BGTA, we use backward greedy screening on random subsets of SNPs to screen a large set of candidate SNPs.

- The screening results are saved as the return frequencies of each SNP and the GTD scores of the irreducible SNP clusters from the backward screening.
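The screening loop described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it removes, at each step, the SNP whose deletion raises GTD the most, stops when no deletion raises GTD, and never goes below one SNP. The names `gtd`, `backward_screen`, `bgta`, and the parameters `subset_size` and `n_rounds` are hypothetical.

```python
import random
from collections import Counter


def gtd(cases, controls, snps):
    """GTD of a SNP subset: squared differences between case and control
    frequencies of each joint genotype, summed over joint genotypes."""
    d = Counter(tuple(row[s] for s in snps) for row in cases)
    u = Counter(tuple(row[s] for s in snps) for row in controls)
    return sum((d[g] / len(cases) - u[g] / len(controls)) ** 2
               for g in set(d) | set(u))


def backward_screen(cases, controls, snps):
    """Greedy backward elimination; returns the irreducible cluster and its GTD."""
    current = list(snps)
    score = gtd(cases, controls, current)
    while len(current) > 1:
        candidates = []
        for s in current:
            reduced = [t for t in current if t != s]
            candidates.append((gtd(cases, controls, reduced), reduced))
        best_score, best_subset = max(candidates)
        if best_score <= score:  # no removal raises GTD: the set is irreducible
            break
        current, score = best_subset, best_score
    return tuple(current), score


def bgta(cases, controls, n_snps, subset_size, n_rounds, seed=0):
    """Repeat the backward screening on random SNP subsets and tally results."""
    rng = random.Random(seed)
    return_counts = Counter()  # how often each SNP is returned
    clusters = Counter()       # how often each irreducible cluster is returned
    for _ in range(n_rounds):
        start = sorted(rng.sample(range(n_snps), subset_size))
        cluster, _score = backward_screen(cases, controls, start)
        return_counts.update(cluster)
        clusters[cluster] += 1
    return return_counts, clusters


# Toy data (hypothetical): SNP 0 separates cases from controls, SNPs 1 and 2 are noise.
cases = [(1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
controls = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
print(backward_screen(cases, controls, [0, 1, 2]))
```

In the toy data, dropping either noise SNP raises GTD while dropping SNP 0 sends it to zero, so the screening keeps SNP 0 alone.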


Definition of interaction

In our research, we have considered two methods to identify “interactions”:

- Screening return frequencies: SNPs whose interactions are associated with the disease status are likely to be “returned” together more often than expected by chance.

- GTD scores: irreducible SNP clusters are local maxima from the backward screening based on random subsets of SNPs. When SNPs in an irreducible set have a high joint GTD score, this is regarded as evidence that they are jointly associated with the disease status, and thus of “interaction”.
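Joint return frequencies are simple tallies over the clusters returned by the screening. A minimal sketch, assuming each returned cluster is a tuple of SNP indices; `joint_return_counts` and the toy clusters are hypothetical. How "more often than expected by chance" is turned into a p-value is not described on the slides, so that step is left out.

```python
from collections import Counter
from itertools import combinations


def joint_return_counts(returned_clusters):
    """Count how often each SNP, and each pair of SNPs, is returned."""
    single = Counter()
    joint = Counter()
    for cluster in returned_clusters:
        members = sorted(set(cluster))
        single.update(members)
        joint.update(combinations(members, 2))  # every unordered pair in the cluster
    return single, joint


# Toy screening output (hypothetical): one tuple per screening round.
returned = [(1, 2), (1, 3), (1, 2), (4,), (3, 4), (1, 2, 3)]
single, joint = joint_return_counts(returned)
print(single[1], joint[(1, 2)])
```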


Examples

A simulation example: oligogenic trait with a gene network

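Disease model: (a) Tri-locus genotypic risk array (penetrance). Columns give the locus A genotype followed by the locus B genotype; rows give the locus E genotype.

| Locus E | AA/BB | AA/Bb | AA/bb | Aa/BB | Aa/Bb | Aa/bb | aa/BB | aa/Bb | aa/bb |
|---|---|---|---|---|---|---|---|---|---|
| EE | n | n | D | n | n | n | n | n | n |
| Ee | n | n | D | n | D | D | n | n | n |
| ee | n | n | D | n | D | D | D | D | D |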
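For simulation work, the risk array is convenient as a lookup table. The sketch below encodes it as given, keeping the labels `n` and `D` exactly as they appear on the slide; the names `RISK_ARRAY`, `B_ORDER`, and `risk_class` are hypothetical.

```python
# Outer key: locus E genotype. Inner key: locus A genotype.
# Each string lists the labels for locus B genotypes BB, Bb, bb in that order.
RISK_ARRAY = {
    "EE": {"AA": "nnD", "Aa": "nnn", "aa": "nnn"},
    "Ee": {"AA": "nnD", "Aa": "nDD", "aa": "nnn"},
    "ee": {"AA": "nnD", "Aa": "nDD", "aa": "DDD"},
}
B_ORDER = ("BB", "Bb", "bb")


def risk_class(a, b, e):
    """Return the label ('n' or 'D') for a tri-locus genotype."""
    return RISK_ARRAY[e][a][B_ORDER.index(b)]


print(risk_class("AA", "bb", "EE"))  # AA with bb is labelled D in every row
print(risk_class("aa", "BB", "EE"))
```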

[Figure (a): Specified disease model, showing two groups of disease loci: A1, B1, E1 and A2, B2, E2.]


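Joint return frequencies (screening is done on 30 SNPs, 6 of which are associated with the disease genes). Markers M1 to M3 belong to group 1 and markers M4 to M6 to group 2.

| Joint returns (p-value) | M1 → A1 | M2 → B1 | M3 → E1 | M4 → A2 | M5 → B2 | M6 → E2 |
|---|---|---|---|---|---|---|
| M1 → A1 | 993 | 253 ($< 10^{-15}$) | 341 ($< 10^{-15}$) | 0 | 0 | 1 |
| M2 → B1 | 253 ($< 10^{-15}$) | 823 | 3 | 6 | 1 | 1 |
| M3 → E1 | 341 ($< 10^{-15}$) | 3 | 755 | 144 ($3.1 \times 10^{-7}$) | 10 | 1 |
| M4 → A2 | 0 | 6 | 144 ($3.1 \times 10^{-7}$) | 656 | 80 (0.015) | 304 ($< 10^{-15}$) |
| M5 → B2 | 0 | 1 | 10 | 80 (0.015) | 487 | 1 |
| M6 → E2 | 1 | 1 | 1 | 304 ($< 10^{-15}$) | 1 | 841 |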
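The pairs with reported p-values can be turned into a network by thresholding. The rule for drawing an edge is not stated on the slide; this sketch assumes an edge whenever the reported p-value is below a chosen level `alpha`, and stores "< 10^-15" as the upper bound 1e-15. The names `REPORTED_PAIRS` and `build_network` are hypothetical.

```python
# (marker, marker) -> (joint return count, p-value or its upper bound), from the table.
REPORTED_PAIRS = {
    ("M1", "M2"): (253, 1e-15),
    ("M1", "M3"): (341, 1e-15),
    ("M3", "M4"): (144, 3.1e-7),
    ("M4", "M5"): (80, 0.015),
    ("M4", "M6"): (304, 1e-15),
}


def build_network(pairs, alpha):
    """Adjacency sets for all pairs whose p-value bound is below alpha (assumed rule)."""
    adjacency = {}
    for (a, b), (_count, p_bound) in pairs.items():
        if p_bound < alpha:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency


network = build_network(REPORTED_PAIRS, alpha=0.05)
for marker in sorted(network):
    print(marker, sorted(network[marker]))
```

With `alpha=0.05` all five reported pairs become edges, and M3 to M4 is the only link between the two groups. A stricter level such as 0.01 would drop the M4 to M5 edge.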

[Figure (b): Network constructed from data, with nodes M1 to M6.]


Application to Rheumatoid Arthritis I

- Rheumatoid Arthritis (RA) is a heterogeneous disease that exhibits a complex genetic component.

- We studied 349 controls and 474 cases with genotypes on 5407 SNPs throughout the genome.

- We used a two-stage screening for this data set.
  - First stage: use standard BGTA screening and select approximately the top 20% of important markers.
  - Second stage: further screening to identify important marker clusters.

- Significant markers were selected based on FDR estimated using permutations.

- For 39 identified loci that showed strong association with RA, of which about 2/3 were found in the RA literature, we constructed an association network among them using association scores.
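The slides do not say how the permutation-based FDR was computed. One common generic estimate, shown here only as an assumed illustration, divides the average number of permuted-label scores above a threshold by the number of observed scores above it. The function name `permutation_fdr` and all numbers are hypothetical.

```python
def permutation_fdr(observed, permuted, threshold):
    """Estimated FDR at a threshold.

    observed: scores from the real data, one per marker.
    permuted: list of score lists, one per permutation of the case/control labels.
    """
    n_selected = sum(s >= threshold for s in observed)
    if n_selected == 0:
        return 0.0
    # Average number of markers passing the threshold when labels carry no signal.
    mean_false = sum(sum(s >= threshold for s in perm) for perm in permuted) / len(permuted)
    return min(1.0, mean_false / n_selected)


# Toy scores (hypothetical): six markers, two permutations.
observed = [5.0, 4.2, 3.9, 1.0, 0.8, 0.5]
permuted = [
    [1.2, 0.9, 4.0, 0.3, 0.7, 0.6],
    [0.4, 1.1, 0.8, 0.2, 0.9, 1.3],
]
print(permutation_fdr(observed, permuted, threshold=3.5))
```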


| Cluster | Count | GTD score | GTD variance |
|---|---|---|---|
| (12, 15, 20) | 1 | 0.0452466485 | 0.0000076622 |
| (40, 47) | 23 | 0.0447642803 | 0.0000057909 |
| (12, 15) | 23 | 0.0439568423 | 0.0000086410 |
| (11, 21, 26) | 1 | 0.0415141680 | 0.0000063145 |
| (2, 28, 42) | 1 | 0.0414631006 | 0.0000055331 |
| (2, 28, 48) | 1 | 0.0411999494 | 0.0000049326 |
| (2, 28, 39) | 2 | 0.0410552395 | 0.0000058928 |
| (4, 19, 49) | 1 | 0.0407363209 | 0.0000052556 |
| (18, 34, 48, 992) | 1 | 0.0405105566 | 0.0000067487 |
| (2, 28) | 28 | 0.0403270020 | 0.0000059208 |
| (11, 21, 42) | 1 | 0.0389051813 | 0.0000073572 |
| (11, 13, 31) | 1 | 0.0388380522 | 0.0000050127 |
| (24, 41) | 30 | 0.0387972020 | 0.0000038129 |
| (11, 16, 42) | 1 | 0.0386918428 | 0.0000067935 |
| (9, 41, 43) | 1 | 0.0386662864 | 0.0000069116 |
| (11, 13) | 23 | 0.0385128244 | 0.0000053157 |
| (5, 6, 27) | 1 | 0.0384394683 | 0.0000027612 |
| (18, 48) | 23 | 0.0383532977 | 0.0000087086 |
| (17, 38) | 26 | 0.0383203274 | 0.0000064260 |
| (39, 40, 636) | 1 | 0.0382886778 | 0.0000046856 |
| (40, 636) | 17 | 0.0381770392 | 0.0000047069 |
| (11, 42) | 32 | 0.0379699185 | 0.0000089074 |
| (29, 45, 50) | 1 | 0.0375913449 | 0.0000055796 |


Application to Rheumatoid Arthritis II

- A candidate gene study on RA.

- 20 SNPs from 14 candidate genes for RA.

- 839 cases and 855 unrelated controls.

- We evaluated all subsets of the 20 SNPs and identified those that are irreducible in BGTA screening.

- We used 100 permutations to control the family-wise error rate.
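Two pieces of arithmetic sit behind this design: the number of SNP subsets to evaluate, and a permutation cutoff for the family-wise error rate. The cutoff rule below is the usual max-statistic approach, assumed here for illustration because the slides do not give the procedure; `count_nonempty_subsets`, `fwer_threshold`, and the stand-in scores are hypothetical.

```python
def count_nonempty_subsets(n_snps):
    """Number of non-empty subsets of n_snps SNPs."""
    return 2 ** n_snps - 1


def fwer_threshold(perm_max_scores, alpha):
    """Cutoff exceeded by at most a fraction alpha of the permutation maxima.

    perm_max_scores: for each label permutation, the largest score over all subsets.
    An observed score strictly above the returned cutoff is called significant.
    """
    ordered = sorted(perm_max_scores, reverse=True)
    n_allowed = int(alpha * len(ordered) + 1e-9)  # maxima allowed above the cutoff
    return ordered[n_allowed]


print(count_nonempty_subsets(20))

# Stand-in permutation maxima (hypothetical): 100 permutations, as in the study design.
perm_max = list(range(1, 101))
print(fwer_threshold(perm_max, alpha=0.05))
```

Exhaustive evaluation is feasible here because 20 SNPs give about a million subsets; the count doubles with every additional SNP.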


Current and Future effort

- We are analyzing a breast cancer whole-genome scan.

- We are developing gene-based analysis tools.

- For whole-genome association studies, new methods and computational strategies are needed to accommodate the large number of SNPs.


THANK YOU!
