A Review/Update of the ERCC & MAQC Microarray Consortia and Some Applications of Their Findings
“Expressionist” Seminar Group, Johns Hopkins School of Public Health
Ernest S. Kawasaki
NCI Advanced Technology Center, Microarray Facility
August 9, 2006
ERCC Summary/Update: External RNA Controls Consortium
MAQC Summary/Update: MicroArray Quality Control Consortium
Possible Use of ERCC/MAQC Standards & Large Data Set
Organizations/Consortia Developing Standards & Controls for Gene Expression Profiling Technologies
• MGED -- Microarray Gene Expression Data Society: standard for data reporting (MIAME)
• MAQC -- Microarray Quality Control Group: FDA-sponsored RNA standards, reference datasets, etc. (Leming Shi et al.)
• ERCC -- External RNA Controls Consortium (M. Salit, J. Warrington et al.)
• NIST -- Metrology for Gene Expression Program: provides a better understanding of the fundamentals of microarray technologies (M. Salit, M. Satterfield et al.)
Rapid Increase in Microarray Publications
2005 -- The 10-Year Anniversary of the First Expression Microarray
[Figure: number of microarray publications per year (yearly summary from PubMed), 1995-98 through 2005, rising from 140 to more than 5,000.]
No common standards are used across platforms, so data are difficult or impossible to compare.
Proliferation of Whole Genome Arrays
Company             Probe length    Probe sets
ABI                 60mer           31,000
Affymetrix          25mer           54,000
Agilent             60mer           44,000
GE Amersham         30mer           55,000
Illumina            50mer           46,000
Microarrays Inc.    70mer           49,000
NimbleGen           60mer           38,000
Phalanx Biotech     60mer           ~30,000
Home Brew           70mer / cDNA    ~40,000
Etc. Many other companies (e.g., Combimatrix) make smaller custom arrays.
A DNA-DNA hybrid occupies ~4 nm² on the slide surface.
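To give a feel for what a ~4 nm² footprint implies, the sketch below computes an upper bound on the number of DNA-DNA hybrids that could fit in one spot. It is a back-of-the-envelope calculation under loud assumptions: the 100 µm spot diameter is a hypothetical value chosen for illustration (it is not on the slide), and the bound assumes perfect close packing, which a real slide surface never reaches.

```python
import math

HYBRID_FOOTPRINT_NM2 = 4.0  # approximate area per DNA-DNA hybrid (from the slide)

def max_hybrids_per_spot(spot_diameter_um: float,
                         footprint_nm2: float = HYBRID_FOOTPRINT_NM2) -> float:
    """Upper bound on hybrids in a circular spot, assuming ideal close packing."""
    radius_nm = spot_diameter_um * 1000.0 / 2.0  # 1 um = 1000 nm
    spot_area_nm2 = math.pi * radius_nm ** 2
    return spot_area_nm2 / footprint_nm2

# Hypothetical 100 um spot diameter, chosen only for illustration.
n = max_hybrids_per_spot(100.0)
print(f"{n:.2e}")  # prints 1.96e+09
```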
ERCC: External RNA Controls Consortium
Conceived in March 2003 at Stanford University
The private, public, and academic sectors working together to produce control materials for gene expression analysis.
Mark Salit, NIST / Janet Warrington, Affymetrix
Mission of the ERCC
The ERCC is developing external RNA controls useful for gene expression assays in microarrays & QRT-PCR on a wide variety of platforms.
J. Warrington -- Affymetrix
Members of the ERCC: More than 70 and counting….
A good mix of academic, government and commercial organizations, with ~115 scientists from 10 countries:
Affymetrix, Agilent, Ambion, Applied Biosystems, ATCC, Biomerieux, BMS, Cambridge University, Capital Bio, Celera Diagnostics, Cenetron, Centers for Disease Control, Centers for Medicare & Medicaid Services, Clinical & Laboratory Standards Institute, Clinical Hospital Center Zagreb, Combimatrix, Eli Lilly, Eppendorf Microarray Division, Expression Analysis, FDA CBER, FDA CDER, FDA CDRH, FDA NCTR, FDA OIVD, GE Healthcare, Genetics Society of Vietnam, Harvard University, Illumina, Informax Inc., International Federation of Clinical Chemistry & Laboratory Medicine, Invitrogen, Johns Hopkins University, Lawrence Livermore Lab, LGC, Maine Molecular Quality Controls, Mayo Clinic, National Institute of Standards & Technology, NIH National Cancer Institute, Northwestern, Nugenic, Qiagen, Queens University Hospital, Roche Molecular Systems, Stanford University, Stratagene, Tokyo University, UCLA, University Health Network, US Department of Agriculture, Veridex (Johnson & Johnson), Vialogy, Vigentech, etc.
J. Warrington -- Affymetrix
The ERCC is producing standardized expression controls, analysis tools and protocols:
• Well-characterized, widely accepted RNA standard controls for multiple platforms – Certified Reference Material (CRM)
• Protocols for multiple applications, research and the clinical laboratory (CLSI – Clinical & Laboratory Standards Institute), approved July 2006!
• Software tools to support development work
• Software tools to support multiple applications
J. Warrington -- Affymetrix
Control Sequences, June 2006
Number            Affiliation                 Genus/Species           Length (nt)
1 - 28            Affymetrix                  B. subtilis             700 – 2000
29 - 40           Affymetrix                  Artificial sequences    500 – 1900
41 - 43           USDA-ARS-NCAUR              Bos taurus              500
44 - 46           USDA-ARS-NCAUR              Glycine max             500
47                Ambion                      Lambda phage            2002
48 - 53           Ambion                      Artificial sequences    1000
54 - 61           Ambion                      E. coli                 750 – 2000
62 - 82           Stanford University         Methanococcus           500 – 750
83 - 85           Agilent Technologies        Artificial sequences    500
86 - 90           GE HealthCare               E. coli                 1000
91 – 140,         Ambion/Atactic,             Artificial sequences    1000
141 - 146         Eppendorf
L. Reid -- Expression Analysis, J. Warrington -- Affymetrix
A Summary of How ERCC Controls Will Be Tested and Selected
Testing Strategy for RNA Controls
1. Design and development -- generate reagents (~100 in place, 70 sequenced)
2. Prototype testing -- validate reagents
3. Proof of concept -- validate the assays
4. Functional testing -- validate the product
5. Performance review -- analyze all data
Testing begins in the 4th quarter. (L. Reid et al.)
Uses of RNA Controls/Standards
• Negative controls
  -- Determine “true” background
  -- QC for slide quality, hybridization, etc.
• Positive controls
  -- QC as above
  -- Labeling efficiency
  -- Dilution series: determine the sensitivity of the assay, i.e., the lowest concentration with reliable signal
  -- Ratiometric series: normalization tool
These controls will allow better comparison of intra- and inter-lab data, on the same or different array platforms.
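The two positive-control uses listed above (dilution series and ratiometric series) can be sketched as follows. This is a minimal illustration, not ERCC protocol: the function names, the hypothetical concentrations and signals, and the choice of a median log2 offset as the normalization factor are all assumptions made for the example.

```python
import math
from statistics import median

def lowest_reliable_concentration(concentrations, signals, threshold):
    """Lowest spike-in concentration whose measured signal exceeds `threshold`.

    `threshold` could be, e.g., a background estimate from negative controls.
    Returns None if no concentration in the dilution series is detected.
    """
    detected = [c for c, s in zip(concentrations, signals) if s > threshold]
    return min(detected) if detected else None

def ratiometric_scale_factor(known_ratios, observed_ratios):
    """Correction factor mapping observed spike-in ratios onto the known ones.

    Computed as the median log2 offset, so 2-fold over- and under-estimates
    are treated symmetrically; returned as a linear multiplier.
    """
    log_offsets = [math.log2(k) - math.log2(o)
                   for k, o in zip(known_ratios, observed_ratios)]
    return 2 ** median(log_offsets)

# Hypothetical dilution series (arbitrary concentration units and signals).
print(lowest_reliable_concentration([0.1, 1, 10, 100], [80, 150, 1200, 9800], threshold=100))  # 1
# Hypothetical ratiometric spike-ins that read 2-fold too high.
print(ratiometric_scale_factor([1, 2, 4], [2, 4, 8]))  # 0.5
```

Multiplying every observed ratio on the array by the returned factor would apply the correction; the median keeps one misbehaving spike-in from dominating it.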
Tests for Validation of ERCC Controls
• Negative control test – background studies
• Cross-hybridization – determine whether any of the controls hybridize to each other or to mRNAs
• Labeling test – determine efficiency in the presence of a complex RNA sample
• Latin square – test controls over a range of concentrations (1:5,000,000 to 1:1000)
• Linear range test and ratiometric studies
The above studies will require ~102 arrays per site!
Latin Squares Design for Testing Controls
A1 – A4 = the 4 arrays used
G1 – G4 = the 4 transcripts being studied
L1 – L4 = the 4 concentrations of each transcript
L. Reid, BMC Genomics 6:150
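One way to build such a layout is a cyclic Latin square, in which each concentration level appears exactly once on each array and exactly once for each transcript. The sketch below assumes that cyclic construction; the actual assignment used in the ERCC/Reid design may differ.

```python
def cyclic_latin_square(n: int):
    """n x n Latin square: square[i][j] is the concentration-level index
    (0-based) assigned to transcript j on array i."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]

def is_latin_square(square) -> bool:
    """Check that every level appears once per row (array) and per column (transcript)."""
    n = len(square)
    levels = set(range(n))
    rows_ok = all(set(row) == levels for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == levels for j in range(n))
    return rows_ok and cols_ok

square = cyclic_latin_square(4)
for i, row in enumerate(square, start=1):
    # e.g. "A1: G1=L1  G2=L2  G3=L3  G4=L4"
    print(f"A{i}: " + "  ".join(f"G{j}=L{lvl + 1}" for j, lvl in enumerate(row, start=1)))
print(is_latin_square(square))  # True
```

Because every transcript sees every concentration across the four arrays, concentration effects can be separated from array and transcript effects with only four hybridizations.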
ERCC Test Sites: >100 Arrays/Site for Validating Controls
• Affymetrix
• GE Healthcare
• Illumina
• NIAID
• Novartis
• Qiagen
• Agilent, ABI, Roche (maybe)
The MAQC Project: MicroArray Quality Control
• An FDA-sponsored consortium (Leming Shi)
• Founded to address concerns of the microarray community about the reproducibility of expression profiling experiments.
• The group consists of over 140 members from academia, government, pharma & biotech.
• A large study was designed to compare expression data from 10 different platforms and 40 different test sites with >650 arrays.
• The study has been completed and results will be published in Nature Biotechnology. Data will be released next month.
MAQC Study Goals/Experimental Design
• Establish a set of reference standards for use in the MAQC, but more importantly for the array community
• Generate a large collection of reference data sets using multiple microarray platforms and many different labs
• Promote the use of reference RNA samples
• Make recommendations on the appropriate uses of microarray technology.
The MAQC group first tested multiple RNAs with 160 arrays and then chose two for titration studies with 200 arrays. The two RNAs, alone and mixed at two proportions, were assayed in replicate (5 arrays per sample) as four pools. The samples were UHRR (Universal Human Reference RNA) from Stratagene and HBRR (Human Brain Reference RNA) from Ambion. The four pools were:
A. 100% UHRR
B. 100% HBRR
C. 75% UHRR : 25% HBRR
D. 25% UHRR : 75% HBRR
At the completion of this study there are data from over 1026 arrays!
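Because pools C and D are mixtures of A and B, each gene's expected signal in them follows from its signals in the pure samples, which gives a built-in accuracy check without knowing the true expression level. A minimal sketch, assuming signal scales linearly with the amount of each RNA and that UHRR and HBRR contain equal mRNA per unit of total RNA (a simplification; a real analysis may need to correct for differences in mRNA content). The gene values are hypothetical.

```python
def expected_mixture_signal(signal_a: float, signal_b: float, fraction_a: float) -> float:
    """Expected signal in a pool made of `fraction_a` of sample A and
    (1 - fraction_a) of sample B, assuming linear additivity."""
    if not 0.0 <= fraction_a <= 1.0:
        raise ValueError("fraction_a must be between 0 and 1")
    return fraction_a * signal_a + (1.0 - fraction_a) * signal_b

# Hypothetical gene that is higher in UHRR (pool A) than in HBRR (pool B).
a, b = 1000.0, 100.0
c = expected_mixture_signal(a, b, 0.75)  # pool C: 75% UHRR / 25% HBRR
d = expected_mixture_signal(a, b, 0.25)  # pool D: 25% UHRR / 75% HBRR
print(a, c, d, b)  # prints 1000.0 775.0 325.0 100.0
```

For any gene that differs between A and B, a platform should reproduce the ordering A > C > D > B (or its reverse); departures from that ordering flag measurement problems.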
Platforms Used in MAQC Study
Code  Manufacturer          Type                Probes/assays
ABI   Applied Biosystems    One-color array     32,878
AFX   Affymetrix            One-color array     54,675
AGL   Agilent               Two-color array     43,931
AGI   Agilent               One-color array     43,931
CBC   CapitalBio Corp       One & two color     23,231
EPP   Eppendorf             One-color array     294
GEH   GE Healthcare         One-color array     54,359
ILM   Illumina              One-color array     47,293
NCI   NCI-Operon            Two-color array     37,632
TCI   TeleChem Int          One & two color     27,648
TAQ   Applied Biosystems    TaqMan® assays      1,004 PCRs
QGN   Panomics              QuantiGene assay    245
GEX   GeneExpress           StaRT-PCR™ assay    205
MAQC Study Design
12,091 genes used for comparison across all platforms (Damir Herman, Jean Thierry-Mieg)
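Once every probe has been mapped to a gene, the last step of a cross-platform comparison is to keep only the genes represented on every platform. The sketch below shows only that intersection step, with hypothetical probe IDs; the careful sequence-based probe annotation behind the 12,091-gene set is the hard part and is not reproduced here.

```python
def common_genes(platform_maps):
    """Genes represented on every platform.

    `platform_maps` maps a platform code to a dict of probe ID -> gene ID.
    Probes that could not be mapped to a gene should simply be omitted.
    """
    gene_sets = [set(probe_to_gene.values()) for probe_to_gene in platform_maps.values()]
    return set.intersection(*gene_sets) if gene_sets else set()

# Hypothetical probe-to-gene mappings for three platforms.
maps = {
    "AFX": {"p1": "TP53", "p2": "GAPDH", "p3": "ACTB"},
    "ILM": {"i1": "GAPDH", "i2": "TP53"},
    "ABI": {"a1": "TP53", "a2": "GAPDH", "a3": "MYC"},
}
print(sorted(common_genes(maps)))  # prints ['GAPDH', 'TP53']
```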
Take-Home Messages/General Findings From the MAQC Study
• Large data sets are available for objectively assessing platform performance and various data analysis algorithms.
• Microarray technology is reproducible and reliable when one understands its limitations.
• Cross-platform analyses require very careful annotation & mapping of probe sequences.
• All the platforms had good intra-lab repeatability, and inter-lab reproducibility after removal of outliers.
• Methods of microarray analysis are an important variable, and this large data set will help resolve issues in this area (statisticians and bioinformaticists take delight……..)
Manuscripts in MAQC Study -- Entire Issue of Nature Biotechnology, Sept. 2006
• Editorial
• FDA Foreword
• Stanford -- Data quality in genomics and microarrays
• Impact of microarray data quality on genomic data submissions to the FDA
• US EPA efforts to develop a framework for using genomics data in risk assessment and regulatory decision making
• MAQC main manuscript -- overall description
• The reproducibility of differentially expressed gene lists in microarray studies*
• An analysis and comparison of alternative platforms
• Use of RNA titrations to assess platform performance
• Performance of one-color vs two-color arrays
MAQC Manuscripts (cont.)
• External RNA controls for assessment of microarray analytical performance
• Normalization and technical variation in gene expression measurements*
• Toxicogenomics and microarrays: biological response measurements are preserved across platforms
• Reproducibility probability score: a metric incorporating measurement variability across labs for gene comparison*
Late news: 9 manuscripts were submitted and 6 were accepted. With 3 commentaries, there are 9 articles from the MAQC in the Sept. Nature Biotechnology supplement.
Nature, May 25, 2006
With proper use of negative and positive controls, microarrays may be used to identify and quantitate expression and to count the absolute number of genes being expressed in any given cell or tissue sample.
………Anonymous………. (aka ESK)
Present (P) & Absent (A) Calls in Spotted Long Oligo Arrays
• The “average” cell expresses <10,000 genes.
• A “whole” genome array contains >25,000 genes.
• Therefore, Present calls should be 40% or less (i.e., 60% or more Absent).
• However, P calls are usually 90% or more with common image analysis systems such as GenePix.
• Why is this? Why do we care?
Good negative controls may resolve this issue.
Li et al (2005) Bioinformatics 21:2875
What is Background?
Articles are still being written about how to determine “true” background. Controls can be used to settle this issue.
[Figure: internal background vs. external background.]
W Yin et al (2005) Bioinformatics 21:2410
Common Methods for Background Subtraction
Use of Negative Controls for Background Subtraction
Internal background ~ 500-1000 units; external background ~ 100-200 units
%Present using external = 96% (21,565/22,464)
%Present using internal = 76% (17,010/22,464)
Background subtraction eliminated 4,555 genes from further analysis. Good or bad??
Use of negative controls can dramatically change values for % genes expressed and gene expression ratios!
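In its simplest form a Present call means "signal above background", so the background estimate directly sets the percentage of Present calls. A minimal sketch, assuming a single-threshold rule (real call algorithms such as GCOS use statistical tests rather than one cutoff); the second part simply recomputes the percentages from the probe counts on the slide.

```python
def percent_present(signals, background):
    """Percent of probes whose signal exceeds a given background level."""
    present = sum(1 for s in signals if s > background)
    return 100.0 * present / len(signals)

# Hypothetical probe signals: raising the background from an external-style
# level (100) to a negative-control level (500) lowers the Present percentage.
print(percent_present([50, 150, 600, 900], background=100))  # 75.0
print(percent_present([50, 150, 600, 900], background=500))  # 50.0

# Recompute the slide's percentages from its counts (22,464 probes in total).
total = 22464
for label, n_present in [("external background", 21565),
                         ("internal (negative-control) background", 17010)]:
    print(f"{label}: {100 * n_present / total:.1f}% present")
print("probes removed:", 21565 - 17010)  # 4555
```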
[Figure: low-signal spot, comparing negative control background with external background.]
Negative Controls & Background
[Figure: mean intensities per array (jurkat_70-73, L428_57, L428_58, L428_74, L428_75, lncap_53-56, mcf_48, mcf_49, mcf_51, mcf_52, oci_66-69, sud_61-64), one panel per channel, with series B_Cy5, neg_Cy5, F_Cy5 and B_Cy3, neg_Cy3, F_Cy3.]
Signal distribution of noise background (B), negative control background (median) (neg) and mean intensities of all probes (F) on the slide, separated by Cy5 and Cy3 channels.
Histogram of Intensities of Negative Controls
[Figure: histogram of Jurkat 70, all negative control spots, Cy3 and Cy5; x-axis: intensity bin (0-6600); y-axis: number of negative probes.]
Influence of Type of Background Subtraction on Expression Ratios
• Assume the control sample has a signal of 600 units for a gene, and the experimental sample has a signal of 5600 for the same gene.
• The external background is 100 units.
• Therefore, the calculated ratio would be 11: (5600 - 100)/(600 - 100) = 5500/500 = 11.
• But if the negative control background is 500, the ratio is now 51: (5600 - 500)/(600 - 500) = 5100/100 = 51.
• Use of negative controls as background may relieve some of the “compression” in ratios for these types of arrays and give a more accurate expression value.
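The ratio arithmetic above can be checked with a small helper. A minimal sketch, assuming the same background value is subtracted from both the experimental and the control signal, as in the worked example; the function name is illustrative.

```python
def background_subtracted_ratio(experimental, control, background):
    """Expression ratio after subtracting the same background from both signals."""
    denominator = control - background
    if denominator <= 0:
        # At or below background the ratio is undefined: this is exactly the
        # low-signal regime where the choice of background matters most.
        raise ValueError("control signal does not exceed background")
    return (experimental - background) / denominator

print(background_subtracted_ratio(5600, 600, 100))  # 11.0 (external background)
print(background_subtracted_ratio(5600, 600, 500))  # 51.0 (negative-control background)
```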
[Figure: box plots of CV for 12 data sets: 1. jurkat; 2. jurkat_neg; 3. L428; 4. L428_neg; 5. lncap; 6. lncap_neg; 7. mcf; 8. mcf_neg; 9. oci; 10. oci_neg; 11. sud; 12. sud_neg.]
Box plots of CV (data are loess normalized; one set with negative-control background subtraction, another set without). This figure shows that background subtraction could improve the data quality.
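Each box in such a plot summarizes per-probe coefficients of variation (CV = standard deviation / mean) across replicate arrays. A minimal sketch of that calculation, assuming the values are already normalized (the loess normalization used on the slide is not reproduced) and that each probe has at least two replicates.

```python
from statistics import mean, stdev

def coefficient_of_variation(replicates):
    """CV (sample standard deviation / mean) of one probe across replicate arrays."""
    m = mean(replicates)
    if m == 0:
        raise ValueError("mean is zero; CV undefined")
    return stdev(replicates) / m

def per_probe_cv(matrix):
    """One CV per probe; `matrix` holds one row of replicate values per probe."""
    return [coefficient_of_variation(row) for row in matrix]

# Hypothetical normalized intensities: 2 probes x 3 replicates.
print([round(cv, 3) for cv in per_probe_cv([[100, 100, 100], [90, 100, 110]])])  # [0.0, 0.1]
```

Comparing the CV distributions computed with and without negative-control background subtraction gives the paired boxes shown on the slide.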
Probability of True Positives and True Negatives Using 3 SD Cutoff
[Figure: probability of true positives (tp5) and true negatives (tn5) at each cutoff threshold (0-2000 intensity units), with the 3 standard deviation cutoff marked.]
Sensitivity & Specificity Cutoff Threshold at a 3 SD Value
[Figure: ROC curve of Jurkat_70, Cy5 channel; x-axis: 1 - specificity; y-axis: sensitivity; the 3 standard deviation cutoff is marked.]
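A sketch of how a 3 SD cutoff and the sensitivity/specificity (ROC) analysis can be computed. It assumes two sets of spot intensities with known status: negative-control spots (true negatives) and spots known to be expressed, such as positive controls (true positives). The slides do not say how true positives were defined, so that part, the function names and the data are assumptions.

```python
from statistics import mean, stdev

def sd_cutoff(negative_controls, k=3.0):
    """Detection threshold: mean of negative-control intensities plus k SDs."""
    return mean(negative_controls) + k * stdev(negative_controls)

def sensitivity_specificity(positives, negatives, threshold):
    """Sensitivity: fraction of known-positive spots above the threshold.
    Specificity: fraction of known-negative spots at or below it."""
    tp = sum(1 for x in positives if x > threshold)
    tn = sum(1 for x in negatives if x <= threshold)
    return tp / len(positives), tn / len(negatives)

def roc_points(positives, negatives, thresholds):
    """(1 - specificity, sensitivity) pairs, one per threshold, as in an ROC plot."""
    points = []
    for t in thresholds:
        sens, spec = sensitivity_specificity(positives, negatives, t)
        points.append((1.0 - spec, sens))
    return points

# Hypothetical intensities.
negatives = [90, 100, 110, 100]
positives = [300, 500, 120, 800]
cut = sd_cutoff(negatives)
print(round(cut, 1))                                        # 124.5
print(sensitivity_specificity(positives, negatives, cut))   # (0.75, 1.0)
```

Sweeping the threshold with `roc_points` traces the ROC curve; the 3 SD value is then one point on it.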
Perfect Match (PM) and Mismatch (MM): The Affy Image Quantitation Methods
GCOS (GeneChip Operating Software): default Affy analysis software.
RMA (Robust Multi-array Average): Irizarry method using only PM signals.
GCRMA: similar to RMA, but takes GC content into account.
dChip: similar to GCOS, but with options to include or exclude MM.
YW Chip: the Yonghong Wang method. PM only, using only sequence-validated oligos in the analysis.
Correlation Between 2 Technical Replicates – Affy Chips
[Figure panels: GCOS; no background subtraction; MAS background subtraction; RMA background subtraction.]
Influence of Different Methods of Background Subtraction
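Replicate correlations like these are commonly computed on log2 intensities. A minimal sketch (Pearson correlation of log2 values), assuming both replicates list the same probe sets in the same order. The `floor` parameter is a hypothetical choice that keeps log2 defined when background subtraction yields zero or negative values, and how such values are handled can itself shift the correlation.

```python
import math

def log2_pearson(rep1, rep2, floor=1.0):
    """Pearson correlation of log2 intensities between two technical replicates.

    Values below `floor` are raised to `floor` before taking logs, so zero or
    negative background-subtracted intensities do not break log2.
    """
    x = [math.log2(max(v, floor)) for v in rep1]
    y = [math.log2(max(v, floor)) for v in rep2]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a replicate has constant log2 intensities")
    return cov / (sx * sy)

# Hypothetical replicates differing only by a 2-fold scale: correlation is 1.
print(round(log2_pearson([100, 200, 400, 800], [200, 400, 800, 1600]), 6))  # 1.0
```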
PM Only vs PM-MM Analysis of Technical Replicates
[Figure panels: mean values of intensities; S.D. distribution of probe sets across 4 replicates; PM only vs PM-MM; x-axis: log2 intensities.]
Correlation Study of Genes With Absent Calls
Genes here were called absent by GCOS in 8 hybs from 2 technical replicates. The data indicate that absent calls may not be truly absent in many cases.
The MM Probes: C or T at the 13th Position May Result in Artefactual High Signal: 92% of All MM With Higher Signal Than PM Have C or T
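Affymetrix MM probes are designed to be identical to the PM probe except at the central (13th) base of the 25-mer. The sketch below assumes that base is replaced by its Watson-Crick complement, which is how MM probes are usually described. Given PM/MM sequences and signals, the same test could tally how many MM probes with higher signal than PM carry C or T at position 13; no probe data are reproduced here, and the sequences used are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mismatch_probe(pm: str) -> str:
    """MM probe for a 25-mer PM probe: identical except the 13th base is complemented."""
    if len(pm) != 25:
        raise ValueError("expected a 25-mer probe")
    centre = 12  # 0-based index of the 13th base
    return pm[:centre] + COMPLEMENT[pm[centre]] + pm[centre + 1:]

def mm_centre_is_pyrimidine(mm: str) -> bool:
    """True if the MM probe has C or T at position 13."""
    return mm[12] in ("C", "T")

def fraction_pyrimidine_centre(mm_probes):
    """Fraction of MM probes (e.g. those brighter than their PM) with C or T at 13."""
    hits = sum(1 for p in mm_probes if mm_centre_is_pyrimidine(p))
    return hits / len(mm_probes)

pm = "A" * 12 + "G" + "A" * 12          # hypothetical PM with G at position 13
print(mismatch_probe(pm)[12])            # C
```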
Probe Mapping Data Will Be Available for All Platforms Used in the MAQC Study
Analysis of Probe Sequences Within Probe Sets in Affy GeneChip
# of “correct” (mapped) oligos/probe set    # of probe sets in each category
1                                           692
2                                           514
3                                           433
4                                           450
5                                           425
6                                           499
7                                           626
8                                           862
9                                           1,608
10                                          3,771
11                                          36,562
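The table can be summarized to show how many probe sets would survive a filter on the number of correctly mapped oligos. The counts below are copied from the table; the `min_mapped` cutoff is a hypothetical filtering parameter. Probe sets with no mapped oligos do not appear in the table, so the total covers only probe sets with at least one mapped oligo.

```python
# Number of correctly mapped oligos per probe set -> number of probe sets.
MAPPED_OLIGOS = {1: 692, 2: 514, 3: 433, 4: 450, 5: 425, 6: 499,
                 7: 626, 8: 862, 9: 1608, 10: 3771, 11: 36562}

def summarize(counts, min_mapped):
    """Total probe sets, those with at least `min_mapped` mapped oligos, and their fraction."""
    total = sum(counts.values())
    passing = sum(n for k, n in counts.items() if k >= min_mapped)
    return total, passing, passing / total

total, fully_mapped, frac = summarize(MAPPED_OLIGOS, 11)
print(total, fully_mapped, f"{frac:.1%}")  # prints 46442 36562 78.7%
```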
How the ERCC & MAQC Can Increase the Reliability/Acceptance of Microarray Data
• A set of controls used by all expression platforms will go a long way to end confusion about the comparability of data from related experiments.
• Probe mapping and sequences from all platforms will be extremely useful for cross-platform comparisons.
• A very large data set from all major platforms will point out problem areas in present protocols/technologies, which, hopefully, will result in their improvement.
• Large data sets from the ERCC and MAQC combined will provide a great resource for critically evaluating algorithms used in analyzing arrays. Which analysis method provides “true” answers?
• Hopefully, a (workable) consensus about utilization of microarray technologies will arise from these two large exercises in (sometimes a bit contentious) human scientific cooperation.
Is your back to the wall? Are you under a lot of pressure?
USF
In Closing……. My attempt at being funny…..
Do you feel you’re on the Treadmill of Life?
Moebius Strip II by M.C. Escher
Nature vol. 246, p776, 2003
Nano Smiley DNAs --- Many Happy Genomes
100 nm
Courtesy P. Rothemund, Nature 440:297 (2006)
Keep on smilin’, ‘cause when you’re smilin’, the whole world smiles with you……
Thank you all:
ERCC & MAQC Consortia
ATC Microarray Lab Crew
YW for Analysis & AP for Chip Data