bmc proceedings biomed centrals-space.snu.ac.kr/.../109989/1/12919_2007_article_2516.pdf · 2019....

6
BioMed Central Page 1 of 6 (page number not for citation purposes) BMC Proceedings Open Access Proceedings Normalization of microarray expression data using within-pedigree pool and its effect on linkage analysis Yoonhee Kim 1,2 , Betty Q Doan 3 , Priya Duggal 1 and Joan E Bailey-Wilson* 1 Address: 1 Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive, Suite 1200, Baltimore, Maryland 21224, USA, 2 Department of Biostatistics and Epidemiology, School of Public Health, Seoul National University, 28 Yeungundong, Chongnogu, Seoul 110-799, Republic of Korea and 3 McKusick-Nathans Institute of Genetic Medicine, School of Medicine, Johns Hopkins University, 733 North Broadway, Baltimore, Maryland 21205, USA Email: Yoonhee Kim - [email protected]; Betty Q Doan - [email protected]; Priya Duggal - [email protected]; Joan E Bailey- Wilson* - [email protected] * Corresponding author Abstract "Genetical genomics", the study of natural genetic variation combining data from genetic marker- based studies with gene expression analyses, has exploded with the recent development of advanced microarray technologies. To account for systematic variation known to exist in microarray data, it is critical to properly normalize gene expression traits before performing genetic linkage analyses. However, imposing equal means and variances across pedigrees can over- correct for the true biological variation by ignoring familial correlations in expression values. We applied the robust multiarray average (RMA) method to gene expression trait data from 14 Centre d'Etude du Polymorphisme Humain (CEPH) Utah pedigrees provided by GAW15 (Genetic Analysis Workshop 15). We compared the RMA normalization method using within-pedigree pools to RMA normalization using all individuals in a single pool, which ignores pedigree membership, and investigated the effects of these different methods on 18 gene expression traits previously found to be linked to regions containing the corresponding structural locus. Familial correlation coefficients of the expressed traits were stronger when traits were normalized within pedigrees. Surprisingly, the linkage plots for these traits were similar, suggesting that although heritability increases when traits are normalized within pedigrees, the strength of linkage evidence does not necessarily change substantially. Background Genetical genomics [1] integrates genome-wide expres- sion profile data of microarray experiments and marker- based measures of genetic variation. It has newly become a central methodology in quantitative trait studies in order to determine loci involved in regulatory expression of quantitative variation in RNA level. As a result of the reduced cost of microarray expression arrays and marker genotyping, these studies have been extended to study quantitative traits in linkage and association studies [2,3]. from Genetic Analysis Workshop 15 St. Pete Beach, Florida, USA. 11–15 November 2006 Published: 18 December 2007 BMC Proceedings 2007, 1(Suppl 1):S152 <supplement> <title> <p>Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci</p> </title> <editor>Heather J Cordell, Mariza de Andrade, Marie-Claude Babron, Christopher W Bartlett, Joseph Beyene, Heike Bickeböller, Robert Culverhouse, Adrienne Cupples, E Warwick Daw, Josée Dupuis, Catherine T Falk, Saurabh Ghosh, Katrina A Goddard, Ellen L Goode, Elizabeth R Hauser, Lisa J Martin, Maria Martinez, Kari E North, Nancy L Saccone, Silke Schmidt, William Tapper, Duncan Thomas, David Tritchler, Veronica J Vieland, Ellen M Wijsman, Marsha A Wilcox, John S Witte, Qiong Yang, Andreas Ziegler, Laura Almasy and Jean W MacCluer</editor> <note>Proceedings</note> <url>http://www.biomedcentral.com/content/pdf/1753-6561-1-S1-info.pdf</url> </supplement> This article is available from: http://www.biomedcentral.com/1753-6561/1/S1/S152 © 2007 Kim et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Upload: others

Post on 27-Jan-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • BioMed CentralBMC Proceedings

    ss

    Open AcceProceedingsNormalization of microarray expression data using within-pedigree pool and its effect on linkage analysisYoonhee Kim1,2, Betty Q Doan3, Priya Duggal1 and Joan E Bailey-Wilson*1

    Address: 1Inherited Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive, Suite 1200, Baltimore, Maryland 21224, USA, 2Department of Biostatistics and Epidemiology, School of Public Health, Seoul National University, 28 Yeungundong, Chongnogu, Seoul 110-799, Republic of Korea and 3McKusick-Nathans Institute of Genetic Medicine, School of Medicine, Johns Hopkins University, 733 North Broadway, Baltimore, Maryland 21205, USA

    Email: Yoonhee Kim - [email protected]; Betty Q Doan - [email protected]; Priya Duggal - [email protected]; Joan E Bailey-Wilson* - [email protected]

    * Corresponding author

    Abstract"Genetical genomics", the study of natural genetic variation combining data from genetic marker-based studies with gene expression analyses, has exploded with the recent development ofadvanced microarray technologies. To account for systematic variation known to exist inmicroarray data, it is critical to properly normalize gene expression traits before performinggenetic linkage analyses. However, imposing equal means and variances across pedigrees can over-correct for the true biological variation by ignoring familial correlations in expression values. Weapplied the robust multiarray average (RMA) method to gene expression trait data from 14 Centred'Etude du Polymorphisme Humain (CEPH) Utah pedigrees provided by GAW15 (Genetic AnalysisWorkshop 15). We compared the RMA normalization method using within-pedigree pools to RMAnormalization using all individuals in a single pool, which ignores pedigree membership, andinvestigated the effects of these different methods on 18 gene expression traits previously foundto be linked to regions containing the corresponding structural locus. Familial correlationcoefficients of the expressed traits were stronger when traits were normalized within pedigrees.Surprisingly, the linkage plots for these traits were similar, suggesting that although heritabilityincreases when traits are normalized within pedigrees, the strength of linkage evidence does notnecessarily change substantially.

    BackgroundGenetical genomics [1] integrates genome-wide expres-sion profile data of microarray experiments and marker-based measures of genetic variation. It has newly becomea central methodology in quantitative trait studies in

    order to determine loci involved in regulatory expressionof quantitative variation in RNA level. As a result of thereduced cost of microarray expression arrays and markergenotyping, these studies have been extended to studyquantitative traits in linkage and association studies [2,3].

    from Genetic Analysis Workshop 15St. Pete Beach, Florida, USA. 11–15 November 2006

    Published: 18 December 2007

    BMC Proceedings 2007, 1(Suppl 1):S152

    Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci

    Heather J Cordell, Mariza de Andrade, Marie-Claude Babron, Christopher W Bartlett, Joseph Beyene, Heike Bickeböller, Robert Culverhouse, Adrienne Cupples, E Warwick Daw, Josée Dupuis, Catherine T Falk, Saurabh Ghosh, Katrina A Goddard, Ellen L Goode, Elizabeth R Hauser, Lisa J Martin, Maria Martinez, Kari E North, Nancy L Saccone, Silke Schmidt, William Tapper, Duncan Thomas, David Tritchler, Veronica J Vieland, Ellen M Wijsman, Marsha A Wilcox, John S Witte, Qiong Yang, Andreas Ziegler, Laura Almasy and Jean W MacCluer Proceedings http://www.biomedcentral.com/content/pdf/1753-6561-1-S1-info.pdf

    This article is available from: http://www.biomedcentral.com/1753-6561/1/S1/S152

    © 2007 Kim et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

    Page 1 of 6(page number not for citation purposes)

    http://www.biomedcentral.com/1753-6561/1/S1/S152http://creativecommons.org/licenses/by/2.0http://www.biomedcentral.com/http://www.biomedcentral.com/info/about/charter/

  • BMC Proceedings 2007, 1(Suppl 1):S152 http://www.biomedcentral.com/1753-6561/1/S1/S152

    In microarray experiments, research on determiningwhich normalization method best adjusts for systematicexperimental variation to produce unbiased data is neces-sary. Among the normalization methods used for thecommon Affymetrix GeneChip, the robust multiarrayaverage (RMA) method and the statistical algorithmimplemented in Affymetrix's Microarray Suite (MAS5)program are the gold standard to control for systematicvariation in samples of unrelated individuals [4,5]. Inves-tigation of the effects of various normalization methodsin family data are needed [6].

    RMA adjusts for systematic variation by performing aquantile normalization procedure, which assumes thatthe data for the variable considered (such as study sam-ple) all are sampled from the same or similar distributionsand the values for that variable are then normalized to astandard distribution. However, it is not yet known whichstandard distribution is the best to use for family data ingenetical genomics studies. Linkage analysis utilizes datafrom pedigrees, which usually have more homogeneity

    within pedigrees for the trait under study than betweenpedigrees, both biologically and environmentally. Thus,the increased background sharing within pedigrees canresult in increased correlation between related individualsfor expression levels. Consequently, the familial correla-tion within each pedigree due to biological similarity is ofinterest in linkage studies. Therefore, we hypothesizedthat linkage analysis that uses expression trait data nor-malized by ignoring pedigree membership could improp-erly 'correct' the trait values by imposing equal means andvariances across pedigrees. A conceptual graph (Fig. 1)plots the values of one trait for each individual. It showshow an outlier in a family, whose members have a differ-ent distribution of expression values than other families,will be systematically adjusted (solid arrow) toward thefamilial mean value if its value is normalized by within-pedigree normalization methods. However, the normal-ized trait values will be over-corrected and biased towardthe overall mean if normalized using all individualsbecause this method falsely assumes that they are unre-lated individuals sampled from a single distribution. A

    Adjustment of systematic variation by normalization using within-pedigree pool and using all individuals poolFigure 1Adjustment of systematic variation by normalization using within-pedigree pool and using all individuals pool. a, RMA using a within-pedigree pool: an outlier with systematic variation will be adjusted toward the familial mean of that par-ticular gene expression value. Each symbol (circle, triangle, cross, and X) represents the family and horizontal lines indicate the familial mean value. b, RMA using all individuals as the pool (all subjects shown as circles): an outlier will be adjusted toward the mean of all individuals' gene expression values.

    Page 2 of 6(page number not for citation purposes)

  • BMC Proceedings 2007, 1(Suppl 1):S152 http://www.biomedcentral.com/1753-6561/1/S1/S152

    second member of the same family would be adjusted ina different direction (dashed arrow) away from the overallmean and toward the family mean if its value is normal-ized by within-pedigree normalization.

    Our aim is to maintain the individual familial distribu-tions with normalization. We address this problem ofnormalization of expression data within pedigrees bycomparing two possible standard distributions for nor-malization: 1) applying RMA across all arrays as a pool,which assumes all individuals are independent and sharethe same distribution of trait values, and 2) applying RMAto arrays within pedigrees assuming family memberswithin a pedigree share the same distribution.

    MethodsStudy subjects consisted of 194 individuals from 14 Cen-tre d'Etude du Polymorphisme Humain (CEPH) Utahpedigrees, and 2882 autosomal and X-linked single-nucle-otide polymorphism (SNP) genotypes were availablefrom The SNP Consortium [2]. Quantitative phenotypedata were generated from immortalized B cells, and 8793gene expression values were available from the microarrayraw CEL (cell intensity file) data files from the AffymetrixGenechips and Hgfocus CDF (chip description file) files.In cases in which a subject had more than one expressionfile, the first replicate was chosen for this analysis so thatwe used one array per subject. To evaluate the perform-ance of different scenarios, two data sets were used: 1)RMA normalized using all individuals (arrays) as the nor-malization pool, and 2) RMA normalization applied to awithin-pedigree pool to be able to allow for as many dis-tributions as the number of pedigrees. The 'affy' packagev1.8 in R v2.3.1 was used to perform the normalizationsof the expression data [7].

    In order to examine the effect of using different normali-zation pools consisting of different types of individuals,we performed paired t-tests using 8793 gene expressionvalues of four founders of one family in two ways: com-paring the values of the four founders after normalizingusing themselves as the pool to paired gene expressionvalues after normalizing 1) using all family membersincluding themselves as the pool, and 2) using independ-ent individuals as the pool (other grandparents fromother families) [8]. The purpose of microarray normaliza-tion methods is to remove systematic variation while pre-serving biological variation. Therefore, comparison of thenormalized gene expression data using the four foundersas their own normalization pool to the normalized datausing either of the other two pools (family members orindependent individuals) should not show significant dif-ferences if the normalization pools are all only removingrandom error variation. Thus, we should observe non-sig-nificant p-values for these paired t-tests if only random

    variation is removed by each method. We assumed thatthese t-tests indicated a significant difference in the nor-malization methods if the p-value was less than a conserv-ative p-value of 0.001 (since we were performing over8,000 tests). However, we also evaluated this using a p-value of 0.05 as the significance threshold.

    To examine the effect of normalization methods on link-age results, we selected 18 cis-acting transcriptional regu-lator phenotypes with previous evidence of cis-actinglinkage to the known location of each correspondingstructural gene [2]. Nonparametric quantitative linkageanalysis was performed using Merlin v1.0 with the qtloption. We plotted both the negative p-values of the non-parametric linkage score [8] and the allele-sharing LODscore of Kong and Cox [9]. Based on the change of meanand variance of the trait values in each array, some indi-viduals may have different trait values when using differ-ent normalization methods. FCOR in S.A.G.E v5.1.0 wasused to calculate familial correlations (e.g., parent-off-spring, sibling, and grandparent), which were comparedfor each trait across the different normalization methods.

    Results and discussionWhen comparing the gene expression data normalizedusing four individuals as their own normalization pool ver-sus using their family members as a pool, 39% of traits hada p-value < 0.001 and 91% had a p-value < 0.05. Interest-ingly, 95% of the genes had a p-value < 0.001 and 99.5%had a p-value < 0.05 when comparing data normalizedusing the four individuals as their own normalization poolversus using independent individuals as the normalizationpool. This suggests that using independent individuals asthe normalization pool may be removing biological varia-tion in addition to removing random error. Figure 2a showsthe different means and variances individually before nor-malization colored distinctively by family. Prior to normal-ization, the pedigrees appear to have familial trends, withsiblings having similar trait values while unrelated grand-parents are more different (Fig. 2b). These might reflect thepotential genetic correlations in expression values withinpedigrees. After normalization using all individuals as thenormalization pool (Fig. 2c), all subjects were aligned at theoverall mean and variance. However, when the normaliza-tion uses the within-pedigree pool, the mean and variancewere aligned distinctly for each family (Fig. 2d). Plots of thewithin-pair trait differences for the two normalizationmethods show that some sib-pairs had very similar valuesacross these methods, whereas there was a marked changein these within-pair trait difference values for other sib pairs(data not shown).

    In Table 1, we show that the familial correlation coeffi-cients of expression phenotype values have changedaccording to the normalization pool. Generally, the corre-

    Page 3 of 6(page number not for citation purposes)

  • BMC Proceedings 2007, 1(Suppl 1):S152 http://www.biomedcentral.com/1753-6561/1/S1/S152

    lation coefficients between all relative pairs are higher inthe data normalized within pedigrees than in the dataafter normalization using all individuals. CPNE1, CSTB,ICAP_1A, and TM7SF3 have negative familial correlationcoefficients (bold in Table 1) when they are calculatedwith the phenotypes after normalizing using all individu-als as the normalization pool. Those coefficients are posi-tive after normalizing using the within-pedigree pool.This suggests that the within-pedigree method preservesthe underlying biological variation patterns while remov-ing the systematic variation, since we have strong priorevidence that 18 cis-acting genes are involved in the regu-lation of these gene expression levels. The results of non-

    parametric quantitative linkage show that normalizationusing either the all-individuals pool or the within-pedi-gree pool yields similar maximum allele-sharing LODscores for signals within 20 cM of the target structuralgene. However, most of these allele-sharing LOD scoreswere slightly decreased for the within-pedigree pool com-pared to the individual pool (15/18). The difference inmaximum LOD scores between the methods ranged from0.7 to 0.01. When we compared the negative log p-valuesof the NPL scores, the patterns of most genes normalizedwithin-pedigree (red solid line) and using all individualsin the normalization pool (black dashed line) were simi-lar except for RPS26 and CTSH (Fig. 3).

    Box plots of 194 individuals across 8793 genes before and after normalizationFigure 2Box plots of 194 individuals across 8793 genes before and after normalization. a, Box plots of 194 individuals across 8793 genes before normalization. b, Box plots from the representative two families before normalization. Green, grandparents; black, parents; red, siblings. c, After RMA normalization using all individuals (arrays) as normalization pool. d, After RMA nor-malization using within-pedigree pool. For panels a, c, and d, each color represents a different family.

    Page 4 of 6(page number not for citation purposes)

  • BMC Proceedings 2007, 1(Suppl 1):S152 http://www.biomedcentral.com/1753-6561/1/S1/S152

    Page 5 of 6(page number not for citation purposes)

    Results of nonparametric quantitative linkage analysis of 18 cis-acting gene expression values using MerlinFigure 3Results of nonparametric quantitative linkage analysis of 18 cis-acting gene expression values using Merlin. Plots of negative log p-values (Y axis) of 18 cis-acting genes in nonparametric linkage analysis. Red solid line: LOD score using pheno-types normalized by RMA using a within-pedigree pool. Black dashed line: LOD score using phenotypes normalized by RMA using all individuals as a normalization pool.

    0 50 100 200

    02

    46

    CH13L2

    Chromosome 1

    -logPvalue

    0 50 100 200

    02

    46

    Chromosome 1

    -logPvalue

    0 50 100 200

    02

    46

    GSTM2

    Chromosome 1

    -logPvalue

    0 50 100 200

    02

    46

    Chromosome 1

    -logPvalue

    0 50 100 200

    02

    46

    VAMP8

    Chromosome 2

    -logPvalue

    0 50 100 200

    02

    46

    Chromosome 2

    -logPvalue

    0 50 100 200

    02

    46

    ICAP_1A

    Chromosome 2

    -logPvalue

    0 50 100 200

    02

    46

    Chromosome 2

    -logPvalue

    0 50 100 150

    02

    46

    CTBP1

    Chromosome 4

    -logPvalue

    0 50 100 150

    02

    46

    Chromosome 4

    -logPvalue

    0 50 100 150

    02

    46

    PPAT

    Chromosome 4

    -logPvalue

    0 50 100 150

    02

    46

    Chromosome 4

    -logPvalue

    0 50 100 150

    02

    46

    LOC64167

    Chromosome 5

    -logPvalue

    0 50 100 150

    02

    46

    Chromosome 5

    -logPvalue

    0 50 100 150

    02

    46

    PSPHL

    Chromosome 7

    -logPvalue

    0 50 100 150

    02

    46

    Chromosome 7

    -logPvalue

    0 50 100 150

    02

    46

    IRF5

    Chromosome 7

    -logPvalue

    0 50 100 150

    02

    46

    Chromosome 7

    -logPvalue

    0 50 100 150

    02

    46

    ZP3

    Chromosome 7

    -logPvalue

    0 50 100 150

    02

    46

    Chromosome 7

    -logPvalue

    0 20 60 100

    02

    46

    HSD17B12

    Chromosome 11

    -logPvalue

    0 20 60 100

    02

    46

    Chromosome 11

    -logPvalue

    0 20 60 100

    02

    46

    RPS26

    Chromosome 12

    -logPvalue

    0 20 60 100

    02

    46

    Chromosome 12

    -logPvalue

    0 20 60 100

    02

    46

    TM7SF3

    Chromosome 12

    -logPvalue

    0 20 60 100

    02

    46

    Chromosome 12

    -logPvalue

    20 40 60 80 100

    02

    46

    CTSH

    Chromosome 15

    -logPvalue

    20 40 60 80 100

    02

    46

    Chromosome 15

    -logPvalue

    20 40 60 80 100

    02

    46

    IL16

    Chromosome 15

    -logPvalue

    20 40 60 80 100

    02

    46

    Chromosome 15

    -logPvalue

    0 10 30 50

    02

    46

    CPNE1

    Chromosome 20

    -logPvalue

    0 10 30 50

    02

    46

    Chromosome 20

    -logPvalue

    15 25 35 45

    02

    46

    CSTB

    Chromosome 21

    -logPvalue

    15 25 35 45

    02

    46

    Chromosome 21

    -logPvalue

    15 25 35 45

    02

    46

    DDX17

    Chromosome 22

    -logPvalue

    15 25 35 45

    02

    46

    Chromosome 22

    -logPvalue

    Table 1: Familial correlation coefficients and maximum LOD score of non-parametric linkage analysis after RMA normalization

    Familial correlation coefficienta

    Parent-offspring Sibling Grandparent-grandchild Maximum LODb

    Gene Target gene loci

    All arrays pool

    Within pedigree pool

    All arrays pool

    Within-pedigree pool

    All arrays pool

    Within-pedigree pool

    All arrays pool

    Within-pedigree pool

    CHI3L2 1p12 0.308 0.2689 0.1477 0.1146 0.1274 0.082 4.08 4.25GSTM2 1p13 0.2887 0.3217 0.3284 0.3941 0.2277 0.3412 3.15 3.58VAMP8 2p11 0.0939 0.4196 0.4304 0.7284 0.0475 0.4261 2.7 2.18ICAP_1A 2p25 -0.2898c 0.1242 0.4277 0.7435 -0.1441 0.5436 1.23 1.97CTBP1 4p16 0.3012 0.3772 0.5581 0.639 0.2002 0.3534 0.81 0.58PPAT 4q12 0.2055 0.2545 0.4469 0.4539 0.0778 0.1774 2.24 1.77LOC64167 5q15 0.2927 0.3464 0.2254 0.2843 0.0483 0.1295 4.61 4.12PSPHL 7p11 0.3932 0.5599 0.4086 0.5207 0.223 0.4089 2.51 2.54ZP3 7q11 0.4084 0.6565 0.43 0.7211 0.2311 0.5527 4.61 3.93IRF5 7q32 0.3462 0.384 0.2874 0.314 0.0825 0.1731 3.06 2.42HSD17B12 11p11 0.2375 0.7565 0.3892 0.8512 0.1205 0.7026 2.06 1.96TM7SF3 12p11 0.0618 0.5035 0.1892 0.6145 -0.1298 0.3133 1.67 1.41RPS26 12q13 0.5907 0.6533 0.7628 0.8023 0.3606 0.453 1.58 1.57CTSH 15q25 0.421 0.5694 0.5442 0.6546 0.322 0.5205 2.29 1.6IL16 15q26 0.0235 0.1814 0.4244 0.5689 0.047 0.2333 2.18 1.97CPNE1 20q11 -0.0113 0.3377 0.3958 0.73 0.0512 0.3515 1.52 0.68CSTB 21q22 0.1592 0.3987 0.2717 0.5898 -0.0075 0.2784 2.31 1.69DDX17 22q13 0.1687 0.2588 0.1911 0.5493 0.1911 0.2523 2.02 2.74

    aCorrelation coefficients of 18 gene expression values by familial pairs using FCOR, S.A.G.E.v5.1.0.bMaximum allele-sharing LOD score of nonparametric linkage analysis after RMA normalization.cBold text indicates negative familial correlation coefficient.

  • BMC Proceedings 2007, 1(Suppl 1):S152 http://www.biomedcentral.com/1753-6561/1/S1/S152

    Publish with BioMed Central and every scientist can read your work free of charge

    "BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime."

    Sir Paul Nurse, Cancer Research UK

    Your research papers will be:

    available free of charge to the entire biomedical community

    peer reviewed and published immediately upon acceptance

    cited in PubMed and archived on PubMed Central

    yours — you keep the copyright

    Submit your manuscript here:http://www.biomedcentral.com/info/publishing_adv.asp

    BioMedcentral

    ConclusionBecause of the presence of systematic variation in generat-ing gene expression data, proper normalization of raw val-ues is required to separate true signals from backgroundnoise. Previous gene expression studies have typicallyfocused on unrelated individuals. With the emergence ofgenetic linkage studies of gene expression traits, the use ofpedigrees poses concerns for proper normalization. Weinvestigated the need to account for pedigrees by normal-izing within families. Our results for the 18 linked traitsshow strong familial correlations, which generallyincreased when traits were normalized within pedigrees.However, in general, the strength of the maximum LODscore decreased (13/18) when traits were normalizedwithin pedigrees. This is not surprising because theincreased correlation of a quantitative trait within a familymay decrease evidence for linkage. It is also possible thatlinkage signals are inflated when normalizing using allindividuals as the pool. Because we did not evaluate link-age of these 18 traits to all markers on other chromosomesin this data set, we cannot evaluate possible inflation oflinkage signals. However, the pattern of the chromosomallinkage plot for these genes (15/18) did not differ acrossthe two normalization strategies. This lack of differencesuggests that normalization within pedigrees may not benecessary, at least in studies with small pedigree sizes.However future linkage studies on expression data withlarger pedigree sizes and/or large sample sizes may benefitfrom normalization by pedigree.

    Competing interestsThe author(s) declare that they have no competing inter-ests.

    AcknowledgementsThis work was partly supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF 2005-213-C00007) and in part by the Intramural Research Program of the National Human Genome Research Institute, National Institutes of Health. Some of the results of this paper were obtained by using the program package S.A.G.E., which is supported by a U.S. Public Health Service Resource Grant (RR03655) from the National Center for Research Resources.

    This article has been published as part of BMC Proceedings Volume 1 Sup-plement 1, 2007: Genetic Analysis Workshop 15: Gene Expression Analysis and Approaches to Detecting Multiple Functional Loci. The full contents of the supplement are available online at http://www.biomedcentral.com/1753-6561/1?issue=S1.

    References1. Hubner N, Wallace CA, Zimdahl H, Petretto E, Schulz H, Maciver F,

    Mueller M, Hummel O, Monti J, Zidek V, Musilova A, Kren V, CaustonH, Game L, Born G, Schmidt S, Müller A, Cook SA, Kurtz TW, Whit-taker J, Pravenec M, Aitman TJ: Integrated transcriptional profil-ing and linkage analysis for identification of genes underlyingdisease. Nat Genet 2005, 37:243-253.

    2. Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, SpielmanRS, Cheung VG: Genetic analysis of genome-wide variation inhuman gene expression. Nature 2004, 430:743-747.

    3. Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, BurdickJT: Mapping determinants of human gene expression byregional and genome-wide association. Nature 2005,437:1365-1369.

    4. Chesler EJ, Lu L, Shou S, Qu Y, Gu J, Wang J, Hsu HC, Mountz JD,Baldwin NE, Langston MA, Threadgill DW, Manly KF, Williams RW:Complex trait analysis of gene expression uncovers poly-genic and pleiotropic networks that modulate nervous sys-tem function. Nat Genet 2005, 37:233-242.

    5. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ,Scherf U, Speed TP: Exploration, normalization, and summa-ries of high density oligonucleotide array probe level data.Biostatistics 2003, 4:249-264.

    6. Chesler EJ, Bystrykh L, de Haan G, Cooke MP, Su A, Manly KF, Wil-liams RW: Reply to "Normalization procedures and detectionof linkage signal in genetical-genomics experiments". NatGenet 2006, 38:856-858.

    7. Gautier L, Cope L, Bolstad BM, Irizarry RA: affy – analysis ofAffymetrix GeneChip data at the probe level. Bioinformatics2004, 20:307-315.

    8. Whittemore AS, Halpern J: A class of tests for linkage usingaffected pedigree members. Biometrics 1994, 50:118-127.

    9. Kong A, Cox NJ: Allele-sharing models: LOD scores and accu-rate linkage tests. Am J Hum Genet 1997, 61:1179-1188.

    Page 6 of 6(page number not for citation purposes)

    http://www.biomedcentral.com/1753-6561/1?issue=S1http://www.biomedcentral.com/1753-6561/1?issue=S1http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=15711544http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=15711544http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=15711544http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=15269782http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=15269782http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=16251966http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=16251966http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=15711545http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=15711545http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=15711545http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12925520http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=12925520http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=14960456http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=14960456http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=8086596http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=8086596http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=9345087http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Abstract&list_uids=9345087http://www.biomedcentral.com/http://www.biomedcentral.com/info/publishing_adv.asphttp://www.biomedcentral.com/

    AbstractBackgroundMethodsResults and discussionConclusionCompeting interestsAcknowledgementsReferences