Copy Number Variations and Association Mapping
SNP and CNV Genotyping
• SNP genotyping assumes two copy numbers at each locus (i.e., no CNVs)
• CNV genotyping assumes no SNPs in the region
• However, SNP and CNV coexist throughout the genome
• Ignoring either SNP or CNV will result in genotyping error
– E.g., genotypes like AAB, A
BirdSuite
• A joint estimation of SNP calls and CNV calls by combining SNP and CNV probe information
– Discover CNV genotypes for known CNVs (previously catalogued CNVs)
– Discovery of novel and rare CNVs
• Association analysis that incorporates both SNP and CNV information
BirdSuite
• Canary: assigns copy number for known common CNPs
• Birdseed: assigns genotypes for SNPs
• Birdseye: detects novel and rare CNVs
• Fawkes: integrates SNPs, CNPs and CNVs
Detecting CNPs
• One-dimensional mixture model
– Mean: intensity location of the CNP
– Variance around each copy number
– EM algorithm to estimate the parameters
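A one-dimensional mixture fit by EM can be sketched as follows. This is a minimal illustration, not Canary's actual implementation: it assumes Gaussian components, one per copy-number class, and the function name, starting means and variance floor are hypothetical choices made for the example.

```python
import numpy as np

def em_gaussian_mixture_1d(x, init_means, n_iter=200, tol=1e-8):
    """Fit a 1-D Gaussian mixture by EM (assumed model: one component per copy number)."""
    x = np.asarray(x, dtype=float)
    means = np.asarray(init_means, dtype=float).copy()
    k = means.size
    variances = np.full(k, np.var(x) / k + 1e-6)  # broad starting variances
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each intensity
        diff = x[:, None] - means[None, :]
        log_pdf = -0.5 * (np.log(2 * np.pi * variances) + diff**2 / variances)
        log_joint = np.log(weights) + log_pdf
        log_norm = np.logaddexp.reduce(log_joint, axis=1)
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: update mixing weights, means and variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        diff = x[:, None] - means[None, :]
        variances = (resp * diff**2).sum(axis=0) / nk + 1e-6  # floor avoids collapse
        ll = log_norm.sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    # Component index of the most probable class for each sample
    return weights, means, variances, resp.argmax(axis=1)
```

If the starting means are given in increasing order, the returned component index can be read as an ordered copy-number class for each sample.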
Birdseed: Genotyping SNPs
• Combining SNP and CNP probe data
• Two-dimensional mixture model for two-copy genotypes
Birdseed: Genotyping SNPs
• Mixture component for minor allele homozygous sites may be hard to detect if minor allele frequency is low
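A quick calculation shows why the minor-allele homozygous cluster can be nearly empty. The sketch below assumes Hardy-Weinberg proportions, which the slides do not state; the function name is hypothetical.

```python
def expected_genotype_counts(n_samples, maf):
    """Expected AA/AB/BB counts under Hardy-Weinberg proportions (an assumption)."""
    p = 1.0 - maf  # major allele (A) frequency
    return {
        "AA": n_samples * p * p,
        "AB": 2 * n_samples * p * maf,
        "BB": n_samples * maf * maf,  # minor-allele homozygotes
    }
```

For example, with 100 samples and a minor allele frequency of 0.1, only about one BB sample is expected, which leaves very little data to estimate that mixture component.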
Birdseye: Detecting De Novo CNVs
• Combine information from Canary and Birdseed
[Figure: CNV probes and SNP probes]
Evaluation
• No ground truth is available. However, the rate of Mendelian inconsistency (MI) in HapMap trio samples can be used for evaluation.
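A Mendelian inconsistency check on trios can be sketched as below. This is an illustration only: it assumes plain two-copy genotypes written as two-letter strings such as "AB" (no copy-number-aware genotypes), and both function names are hypothetical.

```python
from itertools import product

def is_mendelian_consistent(father, mother, child):
    """True if the child genotype can be formed from one allele of each parent."""
    child_alleles = sorted(child)
    return any(
        sorted(pa + ma) == child_alleles
        for pa, ma in product(father, mother)
    )

def mi_rate(trios):
    """Fraction of (father, mother, child) trios that are Mendelian-inconsistent."""
    trios = list(trios)
    if not trios:
        raise ValueError("at least one trio is required")
    inconsistent = sum(not is_mendelian_consistent(*t) for t in trios)
    return inconsistent / len(trios)
```

A lower MI rate at the same set of SNPs suggests fewer genotyping errors, which is what makes it usable as a proxy when no ground truth exists.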
Evaluation
• Birdsuite vs. Birdseed: the rate of Mendelian inconsistency (MI) at SNPs that overlap a known CNP, for 91 children
Association Analysis with SNPs and CNVs
• At each locus, we have both SNP and CNV information and want to incorporate both in the association test
– How can we disentangle the effects of SNPs and CNVs?
– If allele B has lower activity than allele A, genotypes A and BB may have similar phenotypes
Association Analysis with SNPs and CNVs
• Assuming A, B represent two SNP alleles, we fit a regression model
• A+B: total copy numbers
• A-B: SNP genotypes
• b1: CNV effect
• b2: SNP effect
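The equation itself did not survive in this transcript. One form consistent with the definitions above is written out here, where A and B are the copy counts of the two alleles in an individual; the intercept b_0, the phenotype symbol y and the error term are assumptions added for completeness.

```latex
y = b_0 + b_1\,(A + B) + b_2\,(A - B) + \varepsilon
```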
Association Analysis with SNPs and CNVs
• Assuming A, B represent two SNP alleles, we fit a regression model
– When there is no copy number variation at the locus, the model reduces to a regression with only SNP effects
– When there is no SNP genotype variation at the locus, the model reduces to a regression on CNVs only
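A least-squares fit of such a model can be sketched as follows. It is a minimal illustration that assumes a quantitative phenotype, an intercept term, and allele copy counts a and b per individual; the function name is hypothetical.

```python
import numpy as np

def fit_snp_cnv_model(a, b, y):
    """Least-squares fit of y ~ b0 + b1*(a+b) + b2*(a-b).

    a, b: per-individual copy counts of alleles A and B.
    Returns the array [b0, b1, b2].
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, total copy number (CNV term), allele difference (SNP term)
    X = np.column_stack([np.ones_like(a), a + b, a - b])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

When every individual has the same total copy number, the a+b column is constant and collinear with the intercept, so b1 is not identifiable and the column would be dropped, leaving a SNP-only regression. The same holds for the a-b column when there is no SNP variation.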
Simulation Study
• Scenarios to be considered
– Deletion: genotypes {A, B, -} at candidate locus
– Duplication: genotypes {A, B, BB} at candidate locus
– Fix the frequency of B alleles and duplication/deletion events
• Different association tests
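The two scenarios can be simulated along these lines. This is a sketch under stated assumptions, not the setup used in the study: each individual carries two independently drawn chromosomes, the frequencies are supplied by the caller, and the function name is hypothetical.

```python
import numpy as np

def simulate_copy_counts(n, freqs, rng):
    """Draw two chromosomes per individual and count copies of A and B.

    freqs maps a per-chromosome state to its frequency, e.g.
    {"A": 0.6, "B": 0.3, "-": 0.1} for a deletion or
    {"A": 0.6, "B": 0.3, "BB": 0.1} for a duplication.
    """
    states = list(freqs)
    probs = np.array([freqs[s] for s in states], dtype=float)
    probs = probs / probs.sum()  # normalise in case of rounding
    draws = rng.choice(len(states), size=(n, 2), p=probs)
    a_per_state = np.array([s.count("A") for s in states])
    b_per_state = np.array([s.count("B") for s in states])
    a = a_per_state[draws].sum(axis=1)
    b = b_per_state[draws].sum(axis=1)
    return a, b
```

The resulting counts give the total copy number a+b and the allele difference a-b for each simulated individual, which is the input a joint SNP and CNV association test would need.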
SNP and CNV Associations
• CNVs have been implicated in rare genomic disorders
• CNVs have been implicated in only a few percent of the 2000 or more Mendelian diseases
• Complex diseases might be more susceptible to 'soft' forms of variation (variation in noncoding sequences and copy number variations)
• In an eQTL study, SNPs and CNVs were associated with 83% and 18% of the gene expression traits, respectively
– Potentially greater roles of SNPs
– Possible underestimation of CNV effects: need a more extensive catalogue of CNVs
Association Studies with CNVs
• Gender artifact for dispersed duplications: males/females are not equally represented in case and control groups