Design of Genetic Studies
Dan Koller, Ph.D.
Research Assistant Professor
Medical and Molecular Genetics
Genetics and Medicine
• Over the past decade, advances from genetics have permeated medicine
– Identification of genes causing disease
– Identification of genetic risk factors that may modulate disease risk
Objectives
• Review basic genetic concepts
• Review the study designs and statistical methods that led to genetic advances in medicine
• Review concepts in genetics which are being widely used in current studies seeking to dissect the genetic contribution to disease
Tools of Genetic Studies
• Molecular markers
– Microsatellite markers
– Widely distributed in the genome
– Variable number of copies of a tandemly repeated segment
– Typically, this segment is 2 (di-), 3 (tri-) or 4 (tetra-) base pairs long
Allele 1 AGCTCACACACACACACACACAATCG
Allele 2 AGCTCACACACACACACAATCGTCGA
Allele 3 AGCTCACACACACAATCGTCGACCGC
Allele 4 AGCTCACACACAATCGTCGACCGCGG
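The four microsatellite alleles above differ only in how many times the CA dinucleotide is repeated. A minimal Python sketch of how that copy number could be read off a sequence is shown here; the function name `longest_repeat_run` is hypothetical, and real genotyping works from fragment lengths rather than from string matching.

```python
import re

# Allele sequences copied from the slide.
ALLELES = {
    "Allele 1": "AGCTCACACACACACACACACAATCG",
    "Allele 2": "AGCTCACACACACACACAATCGTCGA",
    "Allele 3": "AGCTCACACACACAATCGTCGACCGC",
    "Allele 4": "AGCTCACACACAATCGTCGACCGCGG",
}


def longest_repeat_run(sequence: str, unit: str = "CA") -> int:
    """Return the copy number of the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", sequence)
    return max((len(run) // len(unit) for run in runs), default=0)


for name, sequence in ALLELES.items():
    print(name, "CA copies:", longest_repeat_run(sequence))
```

Passing `unit="ATG"` (or any other motif) would apply the same count to tri- or tetranucleotide repeats.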
Tools of Genetic Studies
• Molecular markers
– Single Nucleotide Polymorphisms (SNPs)
Allele 1 AGCTCACACACACACAC
Allele 2 AGCTAACACACACACAC
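The two SNP alleles above differ at a single base. As a rough illustration, the following Python sketch lists the positions at which two aligned sequences differ; the helper name `snp_positions` is an assumption made for this example.

```python
def snp_positions(allele_a: str, allele_b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, base in A, base in B) wherever the sequences differ."""
    if len(allele_a) != len(allele_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, base_a, base_b)
        for position, (base_a, base_b) in enumerate(zip(allele_a, allele_b), start=1)
        if base_a != base_b
    ]


# Allele sequences copied from the slide.
print(snp_positions("AGCTCACACACACACAC", "AGCTAACACACACACAC"))
```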
Several Important Questions
• What is the evidence that a disease or trait is genetic?
• Do I have the patient and family resources to perform genetic studies?
Is it Genetic?
• Single gene (Mendelian) disorders
– Obvious they are genetic
– Reviewing pedigrees makes the mode of inheritance clear
• Genetically complex disorders
– There may be NO recognizable pattern of inheritance
How to prove a disease has a genetic component?
• Twin Studies
• Familial Aggregation
Twin Studies
• Compare Monozygotic and Dizygotic Twins
– Monozygotic Twins
• Genetically identical
– Dizygotic Twins
• Like siblings (1/2 genome shared)
• Compare concordance rates of MZ and DZ twins
Twin Studies
• If disease entirely genetic:
– MZ disease concordance = 100%
– DZ disease concordance = 50%
• If disease only partly genetic:
– MZ concordance < 100%
– DZ concordance < 50%
– MZ concordance > DZ concordance
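The MZ versus DZ comparison can be sketched in a few lines of Python. This example assumes pairwise concordance (pairs with both twins affected divided by pairs with at least one twin affected); the counts are invented purely for illustration and do not come from any study.

```python
def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Fraction of pairs with at least one affected twin in which both are affected."""
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("need at least one pair with an affected twin")
    return concordant_pairs / total


# Hypothetical counts, invented for illustration only.
mz = pairwise_concordance(concordant_pairs=30, discordant_pairs=70)
dz = pairwise_concordance(concordant_pairs=10, discordant_pairs=90)

print(f"MZ concordance = {mz:.2f}, DZ concordance = {dz:.2f}")
print("Pattern consistent with a partly genetic trait:", dz < mz < 1.0)
```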
Familial Aggregation
• Increased risk for disease among family members of an affected individual
• Compare frequency of disease among first degree relatives of affected individuals with the frequency of the disease in the general population.
Familial Aggregation
Heart disease: 3 fold increased risk of disease among the offspring of an affected individual
Parkinson disease: 2-3 fold increased risk of disease among the siblings of an affected person
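The fold increases quoted above are ratios of the disease frequency in relatives to the frequency in the general population. A minimal Python sketch of that ratio follows; the function name and all numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def relative_risk(affected_relatives: int, total_relatives: int,
                  population_prevalence: float) -> float:
    """Risk among first degree relatives divided by general population risk."""
    if total_relatives <= 0 or population_prevalence <= 0:
        raise ValueError("relative count and prevalence must be positive")
    risk_in_relatives = affected_relatives / total_relatives
    return risk_in_relatives / population_prevalence


# Hypothetical figures: 30 affected among 1000 relatives, 1% population prevalence.
ratio = relative_risk(affected_relatives=30, total_relatives=1000,
                      population_prevalence=0.01)
print(f"Fold increase in risk among relatives: {ratio:.1f}")
```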
Simple Mendelian Disease
• Most single gene, Mendelian disorders have been identified
– Examples: cystic fibrosis, Huntington's disease
– Caused by gross changes in the DNA sequence of a gene
• A few disorders still remain
– Often found in only a few families
Meiosis and Linkage
• Gamete formation
• Meiosis I: Homologous chromosomes pair
• Crossing over occurs
• Genes that are physically close together are more likely to be coinherited
• Genes that are physically far apart on the chromosome are less likely to be coinherited
Genome Screen Approach
• Seeks to identify, IN FAMILIES, chromosomal regions that are consistently transmitted to affected individuals.
• Analyze markers located at regular intervals throughout the genome
[Figure label: Markers]
Positional Candidate Approach
• Once linkage to a particular chromosomal region has been detected the following steps are followed:
– Narrow the critical region by adding more families or more members of existing families to the analysis
– Once the region is reduced to a few centimorgans, identify all genes in the interval
– Sequence candidate genes in affected and unaffected family members to identify DNA sequence alterations
Simple Mendelian Disease
• Even with the identification of the mutant disease gene, important questions often remain
– Why is there clinical variability among individuals with the same mutation?
– Why do individuals with the same mutation develop disease at variable ages?
– When does disease onset occur?
Variability in Simple Mendelian Diseases
• Currently, there is great interest in determining whether polymorphisms in other genes might contribute to phenotypic variability in what otherwise seems to be a genetically simple disease
What is a complex disease?
• Disorders with complex inheritance
– Likely due to the action of multiple genes
– Genes may be interacting with each other to result in disease phenotype (epistasis)
– Affected individuals may have different genetic mutations/polymorphisms leading to the same disease phenotype
– Environmental factors may be important
Genetics of a Complex Disease
[Figure: diagram labeled Environmental Factor 1, Environmental Factor 2, Gene 1, Gene 2; diseases shown: Alzheimer's Disease, Parkinson's Disease, Heart Disease]
Identifying genes for complex disease
Association
• Test candidate gene
• Collect sample of affected and control subjects
• Compare frequency of a genetic polymorphism in 2 samples (Affected, Control)
Linkage
• Test entire genome
• Collect families with multiple affected members
Linkage vs. Association
• Linkage
– Measures the segregation of alleles and a phenotype within a family
– Detected over large physical distances
• Association
– Measures preferential segregation of a particular allele with a phenotype across families
– Detected over shorter distances
Linkage in Complex Disease
• Identify families with multiple affected members
– Increases the likelihood that genes are important in disease susceptibility in that family
• Pattern of inheritance less certain
• Collect family members to follow segregation of disease and marker alleles
Identity By Descent (IBD)
Allele 1 AGCTCACACACACACACACACAATCG
Allele 2 AGCTCACACACACACACAATCGTCGA
Allele 3 AGCTCACACACACAATCGTCGACCGC
Allele 4 AGCTCACACACAATCGTCGACCGCGG
Linkage Analysis
• Employ nonparametric linkage methods
• Identify chromosomal regions that are preferentially transmitted within a family to the affected individuals.
• Method is not based on recombination but on IBD marker allele sharing
• It is often used in the analysis of complex diseases (ex. heart disease, Alzheimer's disease, diabetes)
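To make the idea of IBD allele sharing concrete, here is a small Python sketch of one simple nonparametric statistic, the affected sib pair "mean test". It is offered as an assumption about what such a method can look like, not as the specific method used in the studies discussed. Under no linkage, sib pairs share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2 and 1/4, so the expected proportion shared is 0.5 with variance 1/8 per pair. The counts are invented.

```python
from math import sqrt


def asp_mean_test(n_ibd0: int, n_ibd1: int, n_ibd2: int) -> tuple[float, float]:
    """Return (mean proportion of alleles shared IBD, z score against 0.5)."""
    n_pairs = n_ibd0 + n_ibd1 + n_ibd2
    if n_pairs == 0:
        raise ValueError("need at least one affected sib pair")
    mean_share = (0.5 * n_ibd1 + 1.0 * n_ibd2) / n_pairs
    # Null variance of the per-pair sharing proportion is 1/8.
    z_score = (mean_share - 0.5) / sqrt(0.125 / n_pairs)
    return mean_share, z_score


# Hypothetical counts of affected sib pairs sharing 0, 1 and 2 alleles IBD at a marker.
share, z = asp_mean_test(n_ibd0=20, n_ibd1=50, n_ibd2=30)
print(f"mean IBD sharing = {share:.2f}, z = {z:.2f}")
```

A positive z score indicates that affected siblings share the marker region more often than chance predicts, which is the signal a genome screen looks for.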
Linkage Analysis in Complex Disease
• This approach often leads to the identification of broad chromosomal regions shared by affected family members
• Often, there can be a lack of replication of linkage results between studies
Replication of Linkage
• Lack of replication may be due to:
– Initial linkage was a false positive result
– A different proportion of contributory genes were sampled in the 2 groups
– Insufficient sample size (power) to detect loci of small to moderate effect size
– Unique risk genes in certain populations
– Differences in sample recruitment
– Differences in environmental risk factors
Replication of Linkage
[Figure: diagram labeled Population, Initial Sample and Replication Sample, each containing a mix of Gene 1, Gene 2 and Gene 3]
Linkage Approaches in Complex Disease
• This technique has been widely used to identify chromosomal regions linked to
– Diabetes
– Inflammatory bowel disease
– Cancer
– Alzheimer's disease
– Bipolar disorder
Linkage vs. Association
• Linkage
– Measures the segregation of alleles and a phenotype within a family
– Detected over large physical distances
• Association
– Measures preferential segregation of a particular allele with a phenotype across families
– Detected over shorter distances
Association Studies
• Typically employed to test the role of a candidate gene
• Candidate gene may be nominated based on:
– Pathophysiology
– Genomic location
– Similarity to other important genes
Association Studies
• Most tests of association are evaluating the evidence of linkage disequilibrium between polymorphisms in a candidate gene and a disease risk allele
Linkage Disequilibrium Studies
• LD is defined as associations between alleles at different loci within the population
• Measure LD:
– between SNPs
– between SNP and phenotype
Goldstein et al. Nature Genetics 29: 109-111, 2001
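For LD between two SNPs, two commonly used summary measures are D (the haplotype frequency minus the product of the allele frequencies) and r squared. The Python sketch below computes both from haplotype counts; the function name and the counts are hypothetical and assume phased haplotypes are available.

```python
def ld_statistics(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> tuple[float, float]:
    """Return (D, r squared) for two biallelic loci from haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("need at least one haplotype")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("both loci must be polymorphic in the sample")
    return d, d * d / denominator


# Hypothetical haplotype counts for SNP alleles A/a and B/b.
d_value, r_squared = ld_statistics(n_AB=40, n_Ab=10, n_aB=10, n_ab=40)
print(f"D = {d_value:.3f}, r^2 = {r_squared:.3f}")
```

Values of r squared near 1 mean one SNP predicts the other almost perfectly; values near 0 mean the alleles are close to independent in the population.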
Association Studies
• Two commonly used statistical tests employed to test for association between a SNP and a disease
– Population based approach
• Case control design
– Family based approach
• Transmission Disequilibrium Test (TDT)
Population Based Association
• For a disease risk, the most commonly applied design to test for association is the case control design
– Compare allele frequencies of a polymorphism in a candidate gene between the cases and controls
– Can be quite powerful to detect relatively small genotypic effects, even in modest samples of cases and controls (ex. 100-500 of each)
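One way to compare allele frequencies between cases and controls is a Pearson chi-square test on the 2 x 2 table of allele counts. The Python sketch below assumes that test (other tests are possible) and uses invented counts for 200 cases and 200 controls; for 1 degree of freedom the p-value equals erfc(sqrt(chi2 / 2)).

```python
from math import erfc, sqrt


def allelic_chi_square(case_risk: int, case_other: int,
                       control_risk: int, control_other: int) -> tuple[float, float]:
    """Pearson chi-square (1 df) and p-value for a 2 x 2 table of allele counts."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical allele counts: 400 alleles from cases, 400 from controls.
chi2, p_value = allelic_chi_square(case_risk=120, case_other=280,
                                   control_risk=80, control_other=320)
print(f"case frequency = {120 / 400:.2f}, control frequency = {80 / 400:.2f}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
```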
Population Based Association
• For a quantitative phenotype (ex. Bone density, a-beta levels, etc), the most commonly applied design to test for association is analysis of variance
– Evaluate the evidence of association using a regression model with the SNP genotype as the main effect.
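A regression with SNP genotype as the main effect can be sketched as follows. This Python example assumes an additive coding (genotype scored as 0, 1 or 2 copies of one allele), which is only one of several possible codings, and fits the line by ordinary least squares. The phenotype values are invented.

```python
def additive_regression(genotypes: list[int], phenotypes: list[float]) -> tuple[float, float]:
    """Least-squares (slope, intercept) of phenotype on allele count (0, 1 or 2)."""
    if len(genotypes) != len(phenotypes) or len(genotypes) < 2:
        raise ValueError("need matched genotype and phenotype lists")
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("genotypes must vary in the sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_g


# Hypothetical data: allele counts and a quantitative trait value per person.
genotypes = [0, 0, 1, 1, 2, 2]
trait = [1.0, 1.2, 1.4, 1.6, 2.0, 2.2]
slope, intercept = additive_regression(genotypes, trait)
print(f"estimated change in trait per allele copy = {slope:.2f}")
```

The slope is the estimated per-allele effect; a formal test of association would also need its standard error, which this sketch omits.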
Population Based Association
• Advantages
– Quite powerful to detect relatively small genotypic effects, even in modest samples of cases and controls (ex. 100-500 of each)
– Relatively easy to collect the cases and controls or general population samples
Population Based Association
• Disadvantages
– Population stratification – if there are underlying differences in the cases and controls that are unrelated to disease risk, false positive results are more likely
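The following Python sketch illustrates how stratification can produce a spurious association. It uses two hypothetical subpopulations, A and B, with different allele frequencies and different proportions of cases; every number is invented. Within each subpopulation the allele frequency is identical in cases and controls, yet the pooled sample shows a difference.

```python
# Hypothetical allele counts per subpopulation: (copies of the allele, total alleles).
STRATA = {
    "A": {"case": (80, 400), "control": (20, 100)},
    "B": {"case": (60, 100), "control": (240, 400)},
}


def stratum_frequency(stratum: str, group: str) -> float:
    """Allele frequency for one group (case or control) within one subpopulation."""
    count, total = STRATA[stratum][group]
    return count / total


def pooled_frequency(group: str) -> float:
    """Allele frequency for one group after pooling both subpopulations."""
    count = sum(data[group][0] for data in STRATA.values())
    total = sum(data[group][1] for data in STRATA.values())
    return count / total


for name in STRATA:
    print(name, stratum_frequency(name, "case"), stratum_frequency(name, "control"))
print("pooled", pooled_frequency("case"), pooled_frequency("control"))
```

Because subpopulation A contributes most of the cases and has the lower allele frequency, the pooled comparison suggests an association that is absent within either group.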
Family Based Association
• Employ a trio design which includes both parents and an affected offspring
– Compare the frequency of alleles transmitted to affected offspring to the frequency of alleles not transmitted to the affected offspring
[Pedigree: genotypes 1/2 and 2/2 above, 1/2 below]
Allele   Transmitted   Not transmitted
1        1             0
2        0             1
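The transmitted versus not transmitted comparison is usually summarized with a McNemar-type statistic over heterozygous (1/2) parents: (b - c)^2 / (b + c), where b and c count how often allele 1 and allele 2 were transmitted. The Python sketch below assumes that standard form of the TDT; the counts are invented.

```python
from math import erfc, sqrt


def tdt_statistic(transmitted_1: int, transmitted_2: int) -> tuple[float, float]:
    """TDT chi-square (1 df) and p-value from heterozygous-parent transmissions."""
    informative = transmitted_1 + transmitted_2
    if informative == 0:
        raise ValueError("need at least one heterozygous parent")
    chi2 = (transmitted_1 - transmitted_2) ** 2 / informative
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Hypothetical counts: among 100 heterozygous parents, allele 1 was
# transmitted to the affected offspring 60 times and allele 2 was transmitted 40 times.
chi2, p_value = tdt_statistic(transmitted_1=60, transmitted_2=40)
print(f"TDT chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

Homozygous parents are left out because they always transmit the same allele and so carry no information about preferential transmission.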
Family based Association
• Advantages
– Resistant to potential bias from population stratification since alleles not transmitted in the family are used as the 'control alleles'
Family based Association
• Disadvantages
– Requires at least one parent to be heterozygous at the marker being tested, therefore power of this approach is significantly lower than population based approaches
– Can be more difficult to find 2 generational families willing and able to participate
Association Approaches in Complex Disease
• This technique has begun to be used to test candidate genes for
– Alzheimer's disease and APOE
– Diabetes and Calpain
– Inflammatory bowel disease and NOD2
Summary
• Past success of genetics in medicine has led to the identification of a number of genes which when mutated lead to disease
• Focus of many current studies is to identify risk factors for disease
– Various approaches can be employed including Linkage and Association
Where to next
• Things to consider when designing a genetic study….
– Is it clear that the disease/trait is genetic?
– Do I have the sample base to support this type of research which typically requires large numbers of families or patients?