![Page 1: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/1.jpg)
Copy-number estimation using Robust Multichip Analysis
- Supplementary materials for the aroma.affymetrix lab session
Henrik Bengtsson & Terry Speed
Dept of Statistics, UC Berkeley
August 7, 2007
BioC 2007
![Page 2: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/2.jpg)
Affymetrix chips
![Page 3: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/3.jpg)
Generic Affymetrix chip
Chip size: 1.28 cm × 1.28 cm, with 6.5 million probes per chip.
Feature size: 100 µm to 18 µm to 11 µm and now 5 µm. Soon: 1 µm, 0.8 µm, with a huge increase in the number of probes.
Each 5 µm × 5 µm feature holds more than 1 million identical 25 bp sequences.
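As a rough consistency check on the numbers above, the probe count can be estimated from the chip and feature sizes. This is a minimal sketch that assumes the whole 1.28 cm × 1.28 cm area is tiled with square features (an assumption; real chips also reserve space for borders and controls). The function name is hypothetical.

```python
# Chip side length from the slide: 1.28 cm = 12,800 µm.
CHIP_SIDE_UM = 12_800

def features_per_chip(feature_side_um, chip_side_um=CHIP_SIDE_UM):
    """Number of square features, assuming the full chip area is tiled."""
    per_side = round(chip_side_um / feature_side_um)
    return per_side * per_side

print(features_per_chip(5))    # 6553600, i.e. about 6.5 million
print(features_per_chip(1))    # 163840000
print(features_per_chip(0.8))  # 256000000
```

With 5 µm features this gives 2560 × 2560 features, which agrees with the 6.5 million probes per chip quoted on the slide.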
![Page 4: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/4.jpg)
Abbreviated generic assay description
1. Start with target gDNA (genomic DNA) or mRNA.
2. Obtain labeled single-stranded target DNA fragments for hybridization to the probes on the chip.
3. After hybridization, washing, staining and scanning we get a digital image. This is summarized across pixels to probe-level intensities before we begin. They are our raw data.
![Page 5: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/5.jpg)
Affymetrix probe terminology
Target DNA: ...CGTAGCCATCGGTAAGTACTCAATGATAG...
Perfect match (PM): ATCGGTAGCCATTCATGAGTTACTA
Mis-match (MM): ATCGGTAGCCATACATGAGTTACTA
Each probe is 25 nucleotides long.
[Figure: hybridization to the PM probe, the MM probe and other PM probes, with labels "Target seq.", "Other DNA" and "Other seq."]
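The relation between the target, PM and MM sequences on this slide can be sketched in a few lines. The PM probe is the base-wise complement of a 25-nucleotide segment of the target (written in the slide's alignment order), and the MM probe differs from the PM only at the middle (13th) base, which is replaced by its complement. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def perfect_match(target_segment):
    """Base-wise complement of the target segment, in alignment order."""
    return "".join(COMPLEMENT[base] for base in target_segment)

def mismatch(pm):
    """Replace the middle base of the probe by its complement."""
    mid = len(pm) // 2  # index 12, i.e. the 13th of 25 bases
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]

# 25-nt segment of the slide's target DNA (without the two flanking bases)
target = "TAGCCATCGGTAAGTACTCAATGAT"
pm = perfect_match(target)
mm = mismatch(pm)
print(pm)  # ATCGGTAGCCATTCATGAGTTACTA
print(mm)  # ATCGGTAGCCATACATGAGTTACTA
```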
![Page 6: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/6.jpg)
Affymetrix SNP chips (Mapping 10K, 100K, 500K)
![Page 7: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/7.jpg)
Single Nucleotide Polymorphism (SNP)
Definition: A sequence variation such that two chromosomes may differ by a single nucleotide (A, T, C, or G).
...CGTAGCCATCGGTA[A/G]GTACTCAATGATAG...
Allele A: A
Allele B: G
A person is either AA, AB, or BB at this SNP.
![Page 8: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/8.jpg)
Probes for SNPs
Allele A: ...CGTAGCCATCGGTAAGTACTCAATGATAG...
PMA: ATCGGTAGCCATTCATGAGTTACTA
Allele B: ...CGTAGCCATCGGTAGGTACTCAATGATAG...
PMB: ATCGGTAGCCATCCATGAGTTACTA
(Also MMs, but not in the newer chips, so we will not use these!)
AA: PMA >> PMB
AB: PMA ≈ PMB
BB: PMA << PMB
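The three signal patterns above suggest a very crude genotype rule based on the share of the total signal that comes from the allele-B probe. The sketch below is only an illustration of that idea: the function names and the thresholds 1/3 and 2/3 are hypothetical and are not taken from the slides or from any genotyping algorithm.

```python
def relative_b_signal(pm_a, pm_b):
    """Fraction of the total probe signal that comes from the allele-B probe."""
    return pm_b / (pm_a + pm_b)

def naive_genotype(pm_a, pm_b, low=1/3, high=2/3):
    """Crude call: AA if PMA >> PMB, BB if PMA << PMB, otherwise AB.

    The cut-offs `low` and `high` are illustrative assumptions.
    """
    beta = relative_b_signal(pm_a, pm_b)
    if beta < low:
        return "AA"
    if beta > high:
        return "BB"
    return "AB"

print(naive_genotype(4000, 400))   # AA
print(naive_genotype(2000, 2100))  # AB
print(naive_genotype(400, 4000))   # BB
```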
![Page 9: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/9.jpg)
Copy-number analysis with SNP arrays
![Page 10: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/10.jpg)
Copy-number estimation using Robust Multichip Analysis (CRMA)
CRMA steps:
1. Preprocessing (probe signals): allelic crosstalk calibration (or quantile normalization)
2. Total CN: PM = PMA + PMB
3. Summarization (SNP signals): log-additive model, PM only
4. Post-processing: fragment-length normalization (GC-content)
5. Raw total CNs: $M_{ij} = \log_2(\theta_{ij} / \theta_{Rj})$ for chip i and SNP j, where R = Reference
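The last step, the raw total copy number as a log2 ratio against a reference, can be sketched as follows. The slide only says "R = Reference"; using the per-SNP median across chips as the reference is an assumption made here for illustration, as are the function name and the toy values.

```python
from math import log2
from statistics import median

def raw_copy_numbers(theta):
    """theta[i][j] is the summarized signal for chip i and SNP j.

    Returns M[i][j] = log2(theta[i][j] / reference[j]), where the reference
    is taken to be the median across chips (an assumption).
    """
    n_snps = len(theta[0])
    reference = [median(row[j] for row in theta) for j in range(n_snps)]
    return [[log2(row[j] / reference[j]) for j in range(n_snps)] for row in theta]

# Toy data: 3 chips, 3 SNPs
theta = [
    [1000, 2000, 500],
    [1000, 2000, 500],
    [1500, 2000, 250],
]
M = raw_copy_numbers(theta)
print(M[0])  # [0.0, 0.0, 0.0]
print(M[2])  # first SNP above the reference, third SNP at half the reference (M = -1.0)
```

A value of M = 0 means the chip has the same signal as the reference at that SNP.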
![Page 11: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/11.jpg)
Cross-hybridization:
Allele A: TCGGTAAGTACTC
Allele B: TCGGTATGTACTC
AA: PMA >> PMB
AB: PMA ≈ PMB
BB: PMA << PMB
![Page 12: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/12.jpg)
[Figure: scatter plot of PMT versus PMA with the genotype groups AA, AT and TT; an annotation marks the offset.]
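One common way to describe allelic crosstalk with an offset is a linear model in which the observed pair (PMA, PMB) equals an offset plus a 2×2 crosstalk matrix applied to the true allele signals; calibration then inverts that model. The sketch below assumes this model and uses hypothetical offset and matrix values. In practice both would be estimated from the data, which is not shown here.

```python
def calibrate_pair(pm_a, pm_b, offset, crosstalk):
    """Invert observed = offset + W @ true for one allele probe pair.

    offset    : (a_A, a_B), background offset per allele probe
    crosstalk : ((w_AA, w_AB), (w_BA, w_BB)), the 2x2 crosstalk matrix W
    """
    (w_aa, w_ab), (w_ba, w_bb) = crosstalk
    det = w_aa * w_bb - w_ab * w_ba
    if det == 0:
        raise ValueError("crosstalk matrix is singular")
    y_a = pm_a - offset[0]
    y_b = pm_b - offset[1]
    # Apply the inverse of the 2x2 matrix W
    x_a = (w_bb * y_a - w_ab * y_b) / det
    x_b = (-w_ba * y_a + w_aa * y_b) / det
    return x_a, x_b

# Hypothetical values: offset of 200, and 25% of each allele leaking into the other probe
offset = (200.0, 200.0)
W = ((1.0, 0.25), (0.25, 1.0))

# An AA sample with true signals (2000, 0) is observed as (2200, 700)
print(calibrate_pair(2200.0, 700.0, offset, W))  # (2000.0, 0.0)
```

After calibration the AA sample has no signal left in the allele-B probe, so the genotype groups line up with the axes.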
![Page 13: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/13.jpg)
[Figure: scatter plot of PMT versus PMA.]
![Page 14: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/14.jpg)
Crosstalk calibration also corrects for differences in distributions.
[Figure: distributions of log2 PM.]
![Page 15: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/15.jpg)
![Page 16: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/16.jpg)
For every genotype (AA, AB and BB), the total probe signal is PM = PMA + PMB.
![Page 17: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/17.jpg)
The log-additive model:
$\log_2(PM_{ijk}) = \log_2 \theta_{ij} + \log_2 \phi_{jk} + \varepsilon_{ijk}$
for sample i, SNP j, probe k.
Fit using robust linear models (rlm).
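To make the log-additive model concrete, the sketch below fits it for a single SNP, splitting log2 PM into a per-sample chip effect (log2 θ) and a per-probe affinity (log2 φ). Note that it uses median polish as a simple robust stand-in, not the rlm fit named on the slide. The constraint that the median probe affinity is zero and the toy data are assumptions.

```python
from math import log2
from statistics import median

def median_polish(y, n_iter=10):
    """Fit y[i][k] = chip[i] + probe[k] + resid[i][k] by median polish.

    y[i][k] is log2 PM for sample i and probe k of one SNP.
    chip[i] estimates log2(theta_i); probe[k] estimates log2(phi_k).
    """
    n, m = len(y), len(y[0])
    resid = [list(row) for row in y]
    chip = [0.0] * n
    probe = [0.0] * m
    for _ in range(n_iter):
        # Sweep row medians into the chip effects
        for i in range(n):
            d = median(resid[i])
            chip[i] += d
            resid[i] = [r - d for r in resid[i]]
        # Sweep column medians into the probe effects
        for k in range(m):
            d = median(resid[i][k] for i in range(n))
            probe[k] += d
            for i in range(n):
                resid[i][k] -= d
    # Identifiability (assumption): centre probe effects at median zero
    shift = median(probe)
    probe = [p - shift for p in probe]
    chip = [c + shift for c in chip]
    return chip, probe, resid

# Toy PM intensities: 3 samples x 3 probes, with one outlier (sample 2, probe 3)
pm = [
    [512, 1024, 2048],
    [1024, 2048, 131072],
    [256, 512, 1024],
]
y = [[log2(v) for v in row] for row in pm]
chip, probe, resid = median_polish(y)
print(chip)         # [10.0, 11.0, 9.0]
print(probe)        # [-1.0, 0.0, 1.0]
print(resid[1][2])  # 5.0, the outlier stays in the residual
```

The outlying probe signal ends up in the residual instead of distorting the chip effect, which is the reason for using a robust fit in the first place.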
![Page 18: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/18.jpg)
100K
Longer fragments ⇒ less amplified by PCR ⇒ weaker SNP signals
![Page 19: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/19.jpg)
500K
Longer fragments ⇒ less amplified by PCR ⇒ weaker SNP signals
![Page 20: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/20.jpg)
Normalize to get the same fragment-length effect for all hybridizations
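The idea of giving every hybridization the same fragment-length effect can be sketched as follows. For each hybridization, the effect is estimated as the median log2 SNP signal within bins of fragment length, and each signal is then shifted from its own bin effect to a common target effect. Binned medians and the mean-across-arrays target are simplifications chosen here for illustration (a smooth curve fit would be the more usual choice); all names, bin edges and values are hypothetical, and every bin is assumed to contain at least one SNP.

```python
from statistics import median

def bin_index(length, edges):
    """Index of the fragment-length bin that contains `length`."""
    return sum(length >= edge for edge in edges)

def length_effects(log_theta, lengths, edges):
    """Median log2 signal per fragment-length bin for one hybridization."""
    bins = [[] for _ in range(len(edges) + 1)]
    for value, length in zip(log_theta, lengths):
        bins[bin_index(length, edges)].append(value)
    return [median(values) for values in bins]  # assumes no empty bins

def normalize_fragment_length(arrays, lengths, edges):
    """Shift each array so that all arrays share the same per-bin effect."""
    effects = [length_effects(a, lengths, edges) for a in arrays]
    n_bins = len(edges) + 1
    # Target effect: mean of the per-array effects in each bin (assumption)
    target = [sum(e[b] for e in effects) / len(effects) for b in range(n_bins)]
    normalized = []
    for array, effect in zip(arrays, effects):
        normalized.append([
            value - effect[bin_index(length, edges)] + target[bin_index(length, edges)]
            for value, length in zip(array, lengths)
        ])
    return normalized

lengths = [300, 400, 900, 1000, 1700, 1800]  # fragment lengths in bp
edges = [600, 1400]                          # three bins: short, medium, long
arrays = [
    [11.0, 11.0, 10.5, 10.5, 10.0, 10.0],    # mild drop with fragment length
    [11.0, 11.0, 10.0, 10.0, 9.0, 9.0],      # steeper drop
]
normalized = normalize_fragment_length(arrays, lengths, edges)
print(normalized[0])  # [11.0, 11.0, 10.25, 10.25, 9.5, 9.5]
print(normalized[1])  # [11.0, 11.0, 10.25, 10.25, 9.5, 9.5]
```

In this toy case the two hybridizations differ only in their fragment-length effect, so they coincide after normalization.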
![Page 21: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/21.jpg)
![Page 22: Henrik Bengtsson & Terry Speed Dept of Statistics, UC Berkeley August 7, 2007 BioC 2007](https://reader035.vdocuments.us/reader035/viewer/2022070401/5681362f550346895d9daba4/html5/thumbnails/22.jpg)