DNA copy number variation and cancer risk
John F Pearson
Canterbury Statistics Open Day, University of Canterbury, 2/10/2012
2
Breast Cancer
Foulkes WD. N Engl J Med 2008; 359:2143-2153
3
Missing heritability
TA Manolio et al. Nature 461, 747-753 (2009) doi:10.1038/nature08494
4
Evan E. Eichler.
5
Copy number variation
Allele 1, Allele 2
Copy number loss
Copy number gain
Whole gene
Partial gene
Contiguous genes
Regulatory effects
6
Copy number variants (CNVs)
16,000 copy number variant loci cover >50% of the human genome
CNVs are associated with cancer risk
• Rare CNVs detected in ~50% of familial cancer genes, e.g. BRCA1, BRCA2
• Genome-wide association studies of cancer
  • prostate cancer, hepatocarcinoma, nasopharyngeal carcinoma, and neuroblastoma
• Increased CNV load
  • Li-Fraumeni syndrome (cancer-related genes?)
  • breast cancer (TP53 pathway, ESR1 pathway)
7
SNP arrays
$R = X + Y$
$\theta = \frac{2}{\pi} \arctan(Y / X)$
$\mathrm{LRR} = \log_2(R_{\mathrm{observed}} / R_{\mathrm{expected}})$
"The B Allele Frequency (BAF) is a somewhat confusing term that actually refers to a normalized measure of relative signal intensity ratio of the B and A alleles."
Wang et al. Genome Res. 2007 November; 17(11): 1665–1674.
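The quantities on this slide can be sketched in a few lines. This is a minimal illustration, assuming X and Y are the normalized A-allele and B-allele intensities for one SNP; the function names are hypothetical and the expected R value would in practice come from reference genotype clusters, which are not modelled here.

```python
import math

def polar_transform(x, y):
    """Return (R, theta) for normalized A-allele (x) and B-allele (y) intensities."""
    r = x + y
    # atan2 handles x == 0; theta is 0 for a pure A signal and 1 for a pure B signal
    theta = (2.0 / math.pi) * math.atan2(y, x)
    return r, theta

def log_r_ratio(r_observed, r_expected):
    """LRR = log2(R_observed / R_expected)."""
    return math.log2(r_observed / r_expected)

# Hypothetical intensities, for illustration only
r, theta = polar_transform(1.0, 1.0)   # balanced heterozygote: theta is about 0.5
lrr_half = log_r_ratio(1.0, 2.0)       # half the expected total intensity
```

In this idealized calculation, halving the total intensity gives an LRR of -1; real array data is noisier and usually less extreme.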
8
Genomic location
9
Copy number
Genotype clusters: AA, AB, BB
Normal
Copy neutral LOH
Copy number loss
10
Copy number gain
Genotype clusters: AAA, AAB, ABB, BBB
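The genotype clusters on these slides correspond to idealized B allele fractions. The sketch below enumerates them for a given total copy number; the function name is hypothetical, and measured BAF values scatter around these ideal positions.

```python
def expected_baf(copy_number):
    """Map each possible genotype to its idealized fraction of B alleles."""
    if copy_number < 1:
        raise ValueError("BAF is undefined when no allele copies are present")
    fractions = {}
    for n_b in range(copy_number + 1):
        genotype = "A" * (copy_number - n_b) + "B" * n_b
        fractions[genotype] = n_b / copy_number
    return fractions

normal = expected_baf(2)   # AA, AB, BB
gain = expected_baf(3)     # AAA, AAB, ABB, BBB
loss = expected_baf(1)     # A, B
```

Copy neutral LOH is not distinguished by this function: it has two copies, but only the AA and BB clusters are observed.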
11
CNV calling
Illumina bead arrays.
CNV calling algorithms
o CNVision (workflow software)
o Gnosis
o PennCNV
o QuantiSNP
o CNV Partition
12
Hidden Markov Model
Estimate copy number at each SNP from
• Log R ratio
• B allele frequency
• transition probability at previous SNP.
PennCNV, QuantiSNP
13
PennCNV
14
PennCNV
$r_i$: LRR and $b_i$: BAF at SNP $i$ ($1 \le i \le M$)
$z_i$: copy number state
The likelihood of the observed data is:
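The formula itself is not preserved in this transcript. As a hedged reconstruction, a standard hidden Markov model likelihood in the notation above would sum over all copy number paths. It assumes that LRR and BAF are conditionally independent given the copy number state, which should be checked against Wang et al. 2007:

```latex
L = P(r, b)
  = \sum_{z_1, \ldots, z_M}
    P(z_1) \prod_{i=2}^{M} P(z_i \mid z_{i-1})
    \prod_{i=1}^{M} P(r_i \mid z_i) \, P(b_i \mid z_i)
```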
15
PennCNV
LRR emission probability: the model includes a term for chemical fluctuations and misannotation/assembly
BAF emission probability: a complicated mixture model
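One way to read the LRR emission model is as a normal density per copy number state mixed with a small uniform background that absorbs outliers. The sketch below assumes that form. The state numbering (1 = zero copies, 2 = one copy, 3 = normal, 4 = copy neutral LOH, 5 = three copies, 6 = four copies) comes from the PennCNV paper rather than these slides, and every numeric value is an illustrative placeholder, not a fitted parameter.

```python
import math

# Illustrative placeholders, NOT fitted PennCNV parameters
LRR_MEAN = {1: -3.5, 2: -0.66, 3: 0.0, 4: 0.0, 5: 0.4, 6: 0.68}
LRR_SD = 0.2
UNIFORM_WEIGHT = 0.01          # assumed weight of the background term
LRR_RANGE = (-5.0, 5.0)        # assumed support of the uniform component

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def lrr_emission(r, state):
    """P(r | z = state): uniform background plus a state-specific normal."""
    low, high = LRR_RANGE
    uniform = 1.0 / (high - low) if low <= r <= high else 0.0
    signal = normal_pdf(r, LRR_MEAN[state], LRR_SD)
    return UNIFORM_WEIGHT * uniform + (1.0 - UNIFORM_WEIGHT) * signal
```

The uniform term keeps the probability of an extreme LRR value from collapsing to zero, so a single aberrant probe does not by itself force a change of state.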
16
PennCNV
Transition probabilities between 2 adjacent SNPs $i-1$ and $i$, with copy numbers $z_{i-1}$ and $z_i$, at distance $d_i$.
$D$ = 100 Mb for state 4, 100 kb for other states.
$p$ are unknowns, estimated by the Baum-Welch algorithm.
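The transition formula is missing from this transcript. A form consistent with the slide text, recalled from Wang et al. 2007 and worth checking against the paper, scales each baseline probability $p$ by $1 - e^{-d_i/D}$, so that nearby SNPs are more likely to share a state. In the sketch below the function name is hypothetical, and choosing $D$ by the state being left is an assumption.

```python
import math

def transition_row(p_row, prev_state, distance_bp):
    """Probabilities of moving from prev_state to every state.

    p_row maps each other state k to the baseline probability p of
    prev_state -> k (the unknowns estimated by Baum-Welch).
    """
    # Assumption: D is selected by the state being left
    D = 100e6 if prev_state == 4 else 100e3
    scale = 1.0 - math.exp(-distance_bp / D)
    row = {k: p * scale for k, p in p_row.items() if k != prev_state}
    # The probability of staying absorbs whatever mass is left
    row[prev_state] = 1.0 - sum(row.values())
    return row

# Hypothetical baseline probabilities, for illustration only
p_from_normal = {1: 0.01, 2: 0.01, 4: 0.01, 5: 0.01, 6: 0.01}
near = transition_row(p_from_normal, 3, 5_000)
far = transition_row(p_from_normal, 3, 500_000)
```

With these placeholder values, the probability of staying in state 3 is higher for the 5 kb gap than for the 500 kb gap.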
17
PennCNV
• Baum-Welch used to train the model
• Viterbi algorithm used to infer most likely path
• CNV called whenever a stretch of states is different from normal (usually state 3 or 4)
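The last two steps can be sketched as follows. This is a generic log-space Viterbi decoder followed by a scan for runs of non-normal states, not PennCNV's own code. It assumes a fixed transition matrix (no distance dependence), a toy two-state model, and made-up probabilities; all names are hypothetical.

```python
import math

def viterbi(states, log_start, log_trans, log_emit):
    """Return the most likely state path.

    log_start[s]     log P(z_1 = s)
    log_trans[a][b]  log P(z_i = b | z_{i-1} = a), held constant here
    log_emit[i][s]   log P(r_i, b_i | z_i = s) for SNP i
    """
    score = {s: log_start[s] + log_emit[0][s] for s in states}
    backpointers = []
    for emit in log_emit[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda a: prev[a] + log_trans[a][s])
            pointers[s] = best
            score[s] = prev[best] + log_trans[best][s] + emit[s]
        backpointers.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

def call_cnvs(path, normal_states=(3, 4)):
    """Return (first_snp, last_snp, state) for each run of a non-normal state."""
    calls, start = [], None
    for i, s in enumerate(path):
        if start is not None and s != path[start]:
            calls.append((start, i - 1, path[start]))
            start = None
        if start is None and s not in normal_states:
            start = i
    if start is not None:
        calls.append((start, len(path) - 1, path[start]))
    return calls

# Toy model: state 2 = one copy, state 3 = normal; probabilities are made up
STATES = (2, 3)
log_start = {2: math.log(0.01), 3: math.log(0.99)}
log_trans = {a: {b: math.log(0.99 if a == b else 0.01) for b in STATES}
             for a in STATES}
looks_normal = {2: math.log(0.01), 3: math.log(0.99)}
looks_deleted = {2: math.log(0.99), 3: math.log(0.01)}
log_emit = [looks_normal] * 2 + [looks_deleted] * 4 + [looks_normal] * 3

path = viterbi(STATES, log_start, log_trans, log_emit)
calls = call_cnvs(path)   # one deletion call spanning SNPs 2 to 5
```

Working in log space avoids numerical underflow when the path covers hundreds of thousands of SNPs.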
18
Copy number gain
Genotype clusters: AAA, AAB, ABB, BBB
19
Noisy data
20
Breast cancer
A characteristic of breast tumour cells is genomic instability
BRCA1, BRCA2
21
BRCA1: known large deletions
Detected
Sample ID     BRCA1 mutation
EMB0001242    del exons 2-24
EMB0001532    del exons 3-19
EMB0001222    del exons 1-23
EMB0001425    del exons 1-21
EMB0001439    del exons 1-23
EMB0001458    del exons 1-23
EMB0001477    del exons 1-21
GEM0002463    del exons 16-23
PAD0005718    del exons 9-19
EMB0001770    del exons 1-17
EMB0001057    del exons 1-17
KCO0003228    del exons 1-17
EMB0001082    del exons 8-13
GEM0002430    del exons 8-13

Not detected
Sample ID     BRCA1 mutation
EMB0001530    del exons 3-19
EMB0001689    del exons 1-17

CNV prediction summary:
• cnvPartition - 25% (4/16)
• GNOSIS - 19% (3/16)
• PennCNV - 88% (14/16)
• QuantiSNP - 81% (13/16)
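The percentages in the summary follow directly from the counts out of 16 known deletions. A small sketch of the arithmetic, using the counts from the slide (the variable names are hypothetical):

```python
KNOWN_DELETIONS = 16
detected = {"cnvPartition": 4, "GNOSIS": 3, "PennCNV": 14, "QuantiSNP": 13}

def detection_rate(n_detected, n_total):
    """Percentage of known deletions recovered by an algorithm."""
    return 100.0 * n_detected / n_total

rates = {name: detection_rate(n, KNOWN_DELETIONS) for name, n in detected.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}% ({detected[name]}/{KNOWN_DELETIONS})")
```

The unrounded rates are 25%, 18.75%, 87.5% and 81.25%, which round to the figures shown on the slide.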
22
CNV calling by 4 algorithms
QC(1) – GWAS criteria
Endometrial cancer: 1343 cases (ANECS, SEARCH)
655 female controls (Hunter Community Study)
Case vs. control analyses
1279 cases 619 controls
1210 cases 612 controls
Want to find:
1. CNVs overlapping known susceptibility genes
2. novel CNVs in the mismatch repair pathway
3. common or rare CNV associations
23
CNV frequency: all
Mean CNV per sample

                Case          Control      Difference   P
                (n = 1,210)   (n = 612)
Total CNVs      26.7          26.5          0.2         NS
Deletions       17.7          18.1         -0.4         NS
Duplications     8.9           8.4          0.5         NS
Exons            7.1           6.9          0.2         NS
24
CNV frequency: rare (< 1%)
Mean rare CNV per sample

                Case          Control      Difference   P
                (n = 1,210)   (n = 612)
Total CNVs      6             3.3          2.7          4.0E-05
Deletions       3.8           1.4          2.4          3.0E-06
Duplications    2.2           1.9          0.3          NS
Exons           6             3.3          2.7          2.0E-04
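The slides do not say which test produced these P values. As one possible approach, and purely as an assumption, a difference in mean CNV count per sample can be assessed with a two-sided permutation test. The sketch below uses made-up toy counts, not study data, and a hypothetical function name.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(cases, controls, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in mean count per sample."""
    rng = random.Random(seed)
    observed = abs(mean(cases) - mean(controls))
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign case/control labels at random
        diff = abs(mean(pooled[:n_cases]) - mean(pooled[n_cases:]))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never zero
    return (hits + 1) / (n_perm + 1)

# Toy counts invented for illustration only
toy_cases = [6, 7, 5, 8, 6, 7]
toy_controls = [3, 2, 4, 3, 3, 2]
p_value = permutation_p_value(toy_cases, toy_controls)
```

A permutation test makes no distributional assumption about the counts, which can matter when rare CNV counts are skewed.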
26
Association study
CNV Regions

        Case                      Control
Chr     0     1     3     4       0     1     3     4       P adjusted
X       0     1     0     0       0     57    0     0       0.000
X       0     30    7     0       0     78    0     0       0.000
X       0     2     0     0       0     34    0     0       0.000
X       0     0     0     0       0     24    0     0       0.000
6       9     10    0     0       4     35    0     0       0.000
16      0     125   127   0       0     10    19    0       0.000
X       0     0     0     0       0     14    0     0       0.001
6       812   203   438   20      477   184   276   14      0.003
2       0     2     2     0       0     14    16    0       0.006
7       0     0     0     0       0     12    4     0       0.006
11      0     38    32    0       0     1     3     0       0.010
X       0     1     0     0       0     0     11    0       0.016
27
Association study
CNV overlapping genes

        Case                      Control
Chr     0     1     3     4       0     1     3     4       P adjusted
X       0     2     0     0       0     53    0     0       0.000
1       0     37    2     0       0     0     0     0       0.004
1       0     35    2     0       0     0     0     0       0.004
7       0     0     1     0       0     13    5     0       0.004
1       0     36    2     0       0     0     0     0       0.004
1       0     36    2     0       0     0     0     0       0.004
1       0     34    2     0       0     0     0     0       0.005
1       0     33    2     0       0     0     0     0       0.008
1       0     31    1     0       0     0     0     0       0.011
1       0     31    1     0       0     0     0     0       0.011
7       0     4     32    2       0     0     0     0       0.011
X       0     22    6     0       0     36    0     0       0.021
28
29
Acknowledgements
University of Otago
• Gemma Moir-Meyer
• Logan Walker
• Mackenzie Cancer Research Group
Queensland Institute of Medical Research
• Mandy Spurdle
• Felicity Lose
• Yen Tan
• Alex Metcalf
• Australian National Endometrial Cancer Study
• Bryony Thompson
University of Cambridge
• Deborah Thompson
• Paul Pharoah
• Alison Dunning
• Douglas Easton
• Studies of Epidemiology and Risk Factors in Cancer Heredity (SEARCH)
University of Newcastle
• Rodney Scott
• Mark McEvoy
• John Attia
• Elizabeth Holliday
• The Hunter Community Study
CIMBA consortium
Mayo Clinic
• Fergus Couch