Download - Canadian Bioinformatics Workshops
Lecture 8: Microarrays II: Data Analysis
MBP1010
Dr. Paul C. Boutros
Winter 2014
DEPARTMENT OF MEDICAL BIOPHYSICS
This workshop includes material originally developed by Drs. Raphael Gottardo, Sohrab Shah, Boris Steipe and others
Aegeus, King of Athens, consulting the Delphic Oracle. High Classical (~430 BCE)
Lecture 8: Microarrays Part II bioinformatics.ca
Course Overview
• Lecture 1: What is Statistics? Introduction to R
• Lecture 2: Univariate Analyses I: continuous
• Lecture 3: Univariate Analyses II: discrete
• Lecture 4: Multivariate Analyses I: specialized models
• Lecture 5: Multivariate Analyses II: general models
• Lecture 6: Sequence Analysis
• Lecture 7: Microarray Analysis I: Pre-Processing
• Lecture 8: Microarray Analysis II: Multiple-Testing
• Lecture 9: Machine-Learning
• Final Exam (written)
House Rules
• Cell phones to silent
• No side conversations
• Hands up for questions
Topics For This Week
• Examples
• Attendance
• Pre-Processing
• QA/QC
• Microarray-Specific Statistics
• ProbeSet remapping
• Organizing –omics studies
Example #1
You are conducting a study of osteosarcomas using mouse models. You are using a strain of mice that is naturally susceptible to these tumours at a frequency of ~20%. You are studying two transgenic lines, one of which has a deletion of a putative tumour suppressor (TS), the other of which has an amplification of a putative oncogene (OG). Tumour penetrance in these two lines is 100%.
Your hypothesis: tumours in mice lacking TS will be smaller than those in mice with amplification of OG, as assessed by post-mortem volume measurements of the primary tumour.
Your data:
TS (cm3): 3.9, 7.1, 3.1, 4.4, 5.0
OG (cm3): 5.2, 1.9, 5.0, 6.1, 4.5, 4.8
Example #2
You are conducting a study of osteosarcomas using mouse models. You are studying transgenic animals with deletion of a tumour suppressor (TS), or with amplification of an oncogene (OG). You consider the penetrance of tumours in a set of 8 different mouse strains.
Your hypothesis: some mouse strains lead to bigger tumours than others when OG is amplified, considering only animals in which tumours form. You measure tumour volume in mm3 using calipers.
Strain 1 (mm3)916983
Strain 2 (mm3)2017071
Strain 3 (mm3)153620
Strain 4 (mm3)525253
Strain 5 (weeks)11
53859
Strain 6 (mm3)6
6063
Strain 7 (mm3)857970
Strain 8 (mm3)100105121
Example #3
You are conducting a study of osteosarcomas using mouse models. You are using a strain of mice that is naturally susceptible to these tumours at a frequency of ~20%. You are studying two transgenic lines, one of which has a deletion of a putative tumour suppressor (TS), the other of which has an amplification of a putative oncogene (OG). Tumour penetrance in these two lines is 100%.
Your hypothesis: mice lacking TS are less likely to respond to a novel targeted therapeutic (DX) than wildtype animals, as assessed by molecular imaging:
TS (imaging response): Yes, No, Yes, Yes, No
WT (imaging response): Yes, Yes, Yes, Yes, No, Yes
Example #4
You are conducting a study of osteosarcomas using mouse models. You are using a strain of mice that is naturally susceptible to these tumours at a frequency of ~20%. You are studying two transgenic lines, one of which has a deletion of a putative tumour suppressor (TS), the other of which has an amplification of a putative oncogene (OG). Based on your previous data, you now hypothesize that mice lacking TS will show a similar molecular response to DX as those with amplification of OG. You use microarrays to study 20,000 genes in each line, and identify the following genes as changed between drug-treated and vehicle-treated:
TS (DX-responsive genes): MYC, KRAS, CD53, CDH1, FBW1, SEPT7, MUC1, MUC3, MUC9, RNF3
OG (DX-responsive genes): MYC, KRAS, CD53, CDH1, MUC1, MARCH1, PTEN, IDH3, ESR2, RHEB, CTCF, STK11, MLL3, KEAP1, NFE2L2, ARID1A
Example #5
You are conducting a study of osteosarcomas using mouse models. You are using a strain of mice naturally susceptible to these tumours at ~20% penetrance. You are studying two transgenic lines, one with deletion of a tumour suppressor (TS), the other with amplification of an oncogene (OG). Tumour penetrance in these is 100%.
Your hypothesis: you now wonder whether tumour size differs with the age of the animal, and suspect that tumour size differs between lines but is confounded by age differences.
Your data:
TS (cm3): 3.9 (17 weeks), 7.1 (15 weeks), 3.1 (15 weeks), 4.4 (22 weeks), 5.0 (22 weeks)
OG (cm3): 5.2 (17 weeks), 1.9 (9 weeks), 5.0 (15 weeks), 6.1 (15 weeks), 4.5 (21 weeks), 4.8 (20 weeks)
Wildtype (cm3): 1.1 (9 weeks), 1.5 (10 weeks), 2.1 (15 weeks), 2.5 (15 weeks), 0.3 (17 weeks), 2.2 (21 weeks)
Example #6
You are conducting a study of osteosarcomas using mouse models. You are using a strain of mice that is naturally susceptible to these tumours at a frequency of ~20%. You are studying two transgenic lines, one of which has a deletion of a putative tumour suppressor (TS), the other of which has an amplification of a putative oncogene (OG). Tumour penetrance in these two lines is 100%.
Your hypothesis: mice lacking TS will acquire tumours sooner than wildtype mice. You test the mice weekly using ultrasound imaging.
Your data:
TS (week of tumour)47765
OG (week of tumour)393243
Summary Point #1:
Microarray data is analyzed with a pipeline of sequential algorithms.
This pipeline defines the standard workflow for microarray experiments.
[Figure: microarray analysis workflow. Boxes shown: Quantitation (Cy3 / Cy5 spots), Spot Quality, Background, Intra-Array, Inter-Array, Spot List, Significance Testing, Clustering, Integration (?)]
Summary Point #3:
These basic steps hold true for all microarray platforms and types.
What Is BioConductor?
“Bioconductor is an open source, open development software project to provide tools for the analysis and comprehension of high-throughput genomic data.”
- BioConductor website
The vast majority of our analyses will use BioConductor code, but there are clearly non-BioConductor approaches.
Module 1 bioinformatics.ca
I’ve outlined the general workflow.
Each technology and application has its own unique characteristics to consider.
[Figure: the same workflow, with the following annotations]
• Quantitation is done according to Affymetrix defaults with minimal user intervention.
• One-channel array
• Typically ignored
• Single-channel array, so one simultaneous normalization procedure
[Figure: Affymetrix workflow. Boxes shown: .CEL Files, Background, Normalization, ProbeSet Annotation, Spot List, Statistics, Clustering, Integration (?)]
First let’s go Back to Pre-Processing
What exactly is pre-processing (aka normalization)?
Why do we do it?
Important Note
Pre-processing is never a substitute for good experimental design. This is not a course on statistical design, but a few basic principles should be mentioned.
• Always try to balance experimental groups.
• Biological replicates are preferable to technical replicates.
• If processing samples identically is not possible, include controls for processing-effects.
Sources of Technical Noise
Where does technical noise come from?
Any step in the experimental pipeline can introduce artifactual noise
• Array design
• Array manufacturing
• Sample quality
• Sample identity (sequence effects?)
• Sample processing
• Hybridization conditions (ozone?)
• Scanner settings
Pre-Processing tries to remove these systematic effects
Affymetrix Pre-Processing Steps
1. Background Correction
2. Normalization
3. Probe-Specific Adjustment
4. Summarizing multiple Probes into a single ProbeSet
Let’s look at two common approaches
Introducing Two Major Affymetrix Pre-Processing Methods
• The two most commonly used methods are:
  • RMA = Robust Multi-array Average
  • MAS5 = Microarray Suite version 5
• MAS5 has strengths & weaknesses
  • Sacrifices precision for accuracy
  • Can easily be used in clinical settings
• RMA has strengths & weaknesses
  • Sacrifices accuracy for precision
  • Challenging to integrate multiple studies
  • Reduces variance (critical for small-n studies)
• Both are well accepted by journals and reviewers, perhaps RMA a bit more so. We’ll talk about some of the mathematics later on in this course.
Approach #1: MAS5
• Affymetrix put significant effort into developing good data pre-processing approaches
• MAS5 was an attempt to develop a “standard” technique for 3’ expression arrays
• The flaws of MAS5 led to an influx of research in this area.
• The algorithm is best-described in an Affymetrix white-paper, and is actually quite challenging to reproduce exactly in R.
MAS5 Model
Observations = True Signal + Random Noise + Probe Effects
Assumptions?
MAS5: Background & Noise
Background
• Divide chip into zones
• Select lowest 2% intensity values
• Standard deviation of those values is the zone variability
• Background at any location is the sum of all zone backgrounds, weighted by 1/((distance^2) + fudge factor)
Noise
• Using same zones as above
• Select lowest 2% background
• Standard deviation of those values is the zone noise
• Noise at any location is the sum of all zone noise, weighted as above
•From http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
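The distance weighting above can be sketched as follows. This is a minimal illustration, not the Affymetrix implementation: the function name, the zone centres and the background values are hypothetical, the weights are normalized to sum to one (an assumption, so the result is a weighted average), and `smooth=100.0` stands in for the "fudge factor".

```python
def smoothed_background(x, y, zone_centres, zone_backgrounds, smooth=100.0):
    """Distance-weighted average of per-zone background values at (x, y)."""
    # Weight of each zone: 1 / (squared distance to the zone centre + smooth).
    weights = [
        1.0 / ((x - cx) ** 2 + (y - cy) ** 2 + smooth)
        for cx, cy in zone_centres
    ]
    total = sum(weights)
    # Normalizing by the total weight is an assumption of this sketch.
    return sum(w * b for w, b in zip(weights, zone_backgrounds)) / total


# Hypothetical 2 x 2 grid of zones on a 100 x 100 chip.
centres = [(25, 25), (75, 25), (25, 75), (75, 75)]
backgrounds = [40.0, 60.0, 50.0, 70.0]

near_first_zone = smoothed_background(25, 25, centres, backgrounds)
chip_centre = smoothed_background(50, 50, centres, backgrounds)
print(near_first_zone, chip_centre)
```

A location close to one zone centre is dominated by that zone's background; a location equidistant from all zones gets their plain average.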
MAS5: Adjusted Intensity
A = intensity minus background; the final value should be > noise.
• A: adjusted intensity
• I: measured intensity
• b: background
• NoiseFrac: default 0.5 (another fudge factor)
And the value should always be >= 0.5 (log issues) (fudge factor)
•From http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
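A minimal sketch of the adjusted-intensity rule, under two assumptions that are not spelled out on the slide: the 0.5 floor is applied to the measured intensity before subtraction, and the result is never allowed to fall below NoiseFrac times the local noise. The function and parameter names are hypothetical.

```python
def adjusted_intensity(intensity, background, noise, noise_frac=0.5, floor=0.5):
    """Background-subtract one cell intensity, bounded below by a fraction of the noise."""
    # Assumed: floor the measured intensity so later log transforms stay defined.
    shifted = max(intensity, floor)
    # Assumed: the adjusted value may not drop below noise_frac * noise.
    return max(shifted - background, noise_frac * noise)


print(adjusted_intensity(200.0, 50.0, 10.0))  # signal well above background
print(adjusted_intensity(40.0, 50.0, 10.0))   # signal below background
```

In the second call the subtraction would give a negative value, so the noise-based lower bound is returned instead.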
MAS5: Ideal Mismatch
Because Sometimes MM > PM
•From http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
MAS5: Signal
Value for each probe:
Modified mean of probe values:
Scaling Factor (Sc default 500)
ReportedValue(i) = nf * sf * 2^(SignalLogValue_i)
Signal (nf = 1)
• Tbi = Tukey Biweight (mean estimate, resistant to outliers)
• TrimMean = Mean less top and bottom 2%
•From http://www.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
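The two summaries named above can be sketched as follows. This is an illustration under stated assumptions rather than the exact Affymetrix algorithm: the biweight is a one-step version with tuning constant `c=5.0` and a small `epsilon` to avoid division by zero, the trimmed mean simply drops whole values from each end, and the scaling factor is assumed to be the target `Sc` divided by the trimmed mean of the unlogged signals. All function names are hypothetical.

```python
import statistics


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a mean that gives outliers little or no weight."""
    centre = statistics.median(values)
    mad = statistics.median([abs(v - centre) for v in values])
    weights = []
    for v in values:
        u = (v - centre) / (c * mad + epsilon)
        # Values far from the median (|u| >= 1) get zero weight.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def reported_values(signal_log_values, sc=500.0, nf=1.0):
    """ReportedValue(i) = nf * sf * 2^(SignalLogValue_i), with sf assumed as Sc / TrimMean."""
    signals = [2 ** s for s in signal_log_values]
    sf = sc / trimmed_mean(signals)
    return [nf * sf * s for s in signals]


probe_values = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical, with one outlier
print(tukey_biweight(probe_values), statistics.mean(probe_values))
```

The outlier pulls the ordinary mean to 22, while the biweight stays near the bulk of the values.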
What is RMA?
RMA = Robust Multi-Array Average
Why do we use a “robust” method?
Robust summaries improve on the standard ones by down-weighting outliers and leaving their effects visible in the residuals.
Why do we use “array”?
To put each chip’s values in the context of a set of similar values.
What is RMA?
Assumes all the chips have the same background distribution
Does not use the mismatch probe (MM) data from the microarray experiments
It is a log scale linear additive model
Why?
What is RMA?
Mismatch probes (MM) definitely have information - about both signal and noise - but using it without adding more noise is a challenge
We should be able to improve the background correction using MM, without having the noise level blow up: topic of current research (GCRMA)
Ignoring MM decreases accuracy but increases precision
Methodology
Quantile Normalization – the goal of this method is to make the distribution of probe intensities for each array in a set of arrays the same. This method is motivated by the idea that a Q-Q plot shows that the distribution of two data vectors is the same if the plot is a straight diagonal line and not the same if it is anything else.
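A minimal sketch of quantile normalization in plain Python, assuming every array has the same number of probes and breaking ties by position (real implementations usually average tied ranks). The function name and the toy data are hypothetical.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted arrays."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean across arrays at each rank.
    reference = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


arrays = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 4.0, 8.0]]
normalized = quantile_normalize(arrays)
print(normalized)
```

After normalization each array contains exactly the same set of values, so a Q-Q plot of any two arrays is a straight diagonal line, while each probe keeps its within-array rank.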
Methodology
Summarization: combining the multiple probe intensities of each probeset to produce expression values
An additive linear model is fit to the normalized data to obtain an expression measure for each probeset on each GeneChip
$Y_{ij} = a_j + \beta_i + \varepsilon_{ij}$
Methodology
$Y_{ij} = a_j + \beta_i + \varepsilon_{ij}$
$Y_{ij}$ denotes the background-corrected, normalized probe value for the $i$th GeneChip and the $j$th probe within the probeset, $\log_2(PM - BG)^{*}_{ij}$
$\varepsilon_{ij}$ is the random error term
$a_j$ is the probe affinity of the $j$th probe
$\beta_i$ is the chip effect for the $i$th GeneChip (log-scale expression level)
Methodology
$Y_{ij} = a_j + \beta_i + \varepsilon_{ij}$
Estimate $a_j$ (probe affinity) and $\beta_i$ (chip effect) using a robust method:
• Tukey’s Median polish (quick) - fits iteratively, successively removing row and column medians, and accumulating the terms, until the process stabilizes. The residuals are what is left at the end
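A minimal sketch of median polish for one probeset, assuming rows are GeneChips and columns are probes. The function name, the iteration limit and the toy values are hypothetical; the decomposition always satisfies value = overall + chip effect + probe effect + residual.

```python
import statistics


def median_polish(table, max_iter=10, tol=1e-6):
    """Fit table[i][j] ~ overall + chip_effect[i] + probe_effect[j] using medians."""
    resid = [row[:] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    chip_effect = [0.0] * n_rows
    probe_effect = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep out row (chip) medians.
        row_meds = [statistics.median(row) for row in resid]
        for i in range(n_rows):
            chip_effect[i] += row_meds[i]
            resid[i] = [v - row_meds[i] for v in resid[i]]
        shift = statistics.median(probe_effect)
        probe_effect = [p - shift for p in probe_effect]
        overall += shift
        # Sweep out column (probe) medians.
        col_meds = [
            statistics.median([resid[i][j] for i in range(n_rows)])
            for j in range(n_cols)
        ]
        for j in range(n_cols):
            probe_effect[j] += col_meds[j]
            for i in range(n_rows):
                resid[i][j] -= col_meds[j]
        shift = statistics.median(chip_effect)
        chip_effect = [c - shift for c in chip_effect]
        overall += shift
        # Stop once the medians being removed are negligible.
        if max(abs(m) for m in row_meds + col_meds) < tol:
            break
    return overall, chip_effect, probe_effect, resid


# Hypothetical log2 values: 3 chips x 4 probes, built to be exactly additive.
chips = [8.0, 10.0, 9.0]
probes = [0.0, 1.0, -1.0, 0.5]
table = [[c + p for p in probes] for c in chips]
overall, chip_effect, probe_effect, resid = median_polish(table)
expression = [overall + c for c in chip_effect]
print(expression)
```

The per-chip expression value for the probeset is taken here as overall plus chip effect; on exactly additive data the residuals vanish and the differences between chips are recovered.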
RMA vs. MAS5
• RMA sacrifices accuracy for precision
• RMA is generally not appropriate for clinical settings
• RMA provides higher sensitivity/specificity in some tests
• RMA reduces variance (critical for small-n studies)
• RMA is better accepted by journals and reviewers
One key detail has been omitted so far:
How do we know if our pre-processing actually worked?
Can we determine how well our pre-processing worked?
Or if our data looks good?
Those Three Were From A Spike-In Experiment Done by Affymetrix
Those Last Three Were From An Experiment We Did On Rat Liver Samples
Were Those Bad Samples?
• Lots of evident spatial artifacts
• But in practice all samples were carried forward into analysis
• And validation (RT-PCR) confirmed the overall study results for many genes
Eye-ball Assessments Are Hard
• A couple of useful tricks:
  • Look at the distributions
    • Did quantile normalization work (for RMA)?
  • Look at the inter-sample correlations
    • Is one sample a strong outlier?
  • Look at the 3' to 5' trend across a ProbeSet
I know of no accepted, systematic QA/QC methods
What Do You Do If You Find a Bad Array?
• Repeat it?
• Drop the sample?
• Include it but account for the “noise” in another way?
In This Case
• We excluded a series of outlier samples
• We believed these samples had been badly degraded because they were derived from FFPE blocks
T-tests
• What are the assumptions of the t-test?
• When would you feel comfortable using a t-test?
T-Test Alternative: Wilcoxon Rank-Sum
• Also called:
  • U-test
  • Mann-Whitney (U) test
• Some argue that for continuous microarray data there is rarely a good reason to use this test:
  • Low n: tests of normality are not very powerful
  • High n: the central limit theorem provides support
• If the sample is normal, asymptotic efficiency relative to the t-test is 0.95
T-Test Alternative: Moderated Statistics
• A series of highly complex methods based on Bayesian statistical methodologies
• Gordon Smyth’s limma R package is by far the most widely used implementation of this technique
The variance term in the t-statistic is “shrunk” by borrowing information across all genes. This increases effective power.
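The shrinkage can be sketched as a weighted average of each gene's own variance and a prior variance shared by all genes. This is an illustration only: limma estimates the prior variance and prior degrees of freedom from the data, whereas here they are supplied by hand as hypothetical values, and the function name is made up for this sketch.

```python
def moderated_variance(gene_variance, gene_df, prior_variance, prior_df):
    """Shrink one gene's sample variance toward a prior variance shared by all genes."""
    return (prior_df * prior_variance + gene_df * gene_variance) / (prior_df + gene_df)


# Hypothetical values: three genes, each with 4 residual degrees of freedom.
gene_variances = [0.01, 0.20, 1.50]
shrunk = [
    moderated_variance(v, gene_df=4, prior_variance=0.20, prior_df=3)
    for v in gene_variances
]
print(shrunk)
```

Very small variances are pulled up and very large ones are pulled down, which stabilizes the t-statistic when there are few replicates per gene.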
T-Test Alternative: Permutation Tests
• SAM is the classic method
  • Most people suggest not using SAM today
• Empirically estimate the null distribution
[Figure: start with many samples, randomly sample, iterate]
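A minimal sketch of a two-group permutation test for a single gene. It uses the absolute difference in means as the test statistic, which is a simplification and not the SAM statistic; the function name, the number of permutations and the example values are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """Estimate a two-sided p-value by shuffling the group labels."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign samples to the two groups
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


separated = permutation_p_value([10, 11, 12, 13, 14, 15], [1, 2, 3, 4, 5, 6])
identical = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4])
print(separated, identical)
```

The shuffled differences form the empirical null distribution; the p-value is the fraction of shuffles at least as extreme as the observed difference.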
Problems with Significance Testing
• What happens if there are NO changes?
• Imagine:
  • You analyzed 1,000 clinical samples
  • 20,000 genes in the genome
  • P < 0.05
• What if… somebody comes and randomizes all your data?
You had a lot of Data
• 20,000 genes / array
• 1,000 patients
• 20,000,000 data points
All Randomized
• Genes are mixed up together
• Patients are mixed together
What happens if you analyze this data?
There should be NO real hits anymore!
What will you actually find?
Array: 20,000 genes
Threshold: p < 0.05
20,000 x 0.05 = 1,000 False Positives
This is called “multiple testing”.
There is a solution
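The arithmetic above can be checked with a small simulation. Under the null hypothesis p-values are uniformly distributed between 0 and 1, so the sketch below draws 20,000 uniform random numbers in place of real test results; the function name and the seed are hypothetical.

```python
import random


def count_null_hits(n_genes=20000, alpha=0.05, seed=2014):
    """Count p-values below alpha when every gene is truly unchanged."""
    rng = random.Random(seed)
    # Under the null, each p-value is a uniform draw on [0, 1).
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < alpha for p in p_values)


expected = 20000 * 0.05
observed = count_null_hits()
print(expected, observed)
```

The observed count lands close to the expected 1,000 even though no gene is truly changed.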
A “false-discovery rate adjustment” (FDR) for multiple testing considers all 20,000 p-values simultaneously
In this experiment there are lots of low p-values, so we can use this to “adjust” the p-values and find the true hits.
[Figure: histogram of p-values compared with the expected value; axis ticks from 0% to 20%]
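One common FDR adjustment is the Benjamini-Hochberg procedure. The slides do not name a specific method, so its use here is an assumption; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(adjusted)
```

Each p-value is scaled by the number of tests divided by its rank, so the adjustment depends on the whole set of p-values rather than on each test alone.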
In this experiment, NO enrichment for low p-values,
so no more hits than expected randomly.
This is what you get from randomized data…
The Mask Production Makes Affymetrix Designs Expensive To Change
Photolithographic mask
We Can Change Those Mappings!
[Figure: hybridized chip]
CDF File
• Chip Definition File
• This file maps Probes (positions) into ProbeSets
• We can update those mappings
  • Ignore deprecated or cross-hybridizing probes
  • Merge multiple probes that recognize the same gene
  • Account for entirely new genes that were not known at the time of array-design
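The remapping idea can be sketched with plain dictionaries. This is not how CDF files are actually read or written; the probe IDs, gene IDs and function name are hypothetical, and the input is assumed to be the result of aligning each probe against a current transcriptome.

```python
from collections import defaultdict


def remap_probes(probe_alignments):
    """Group probes into gene-level probesets from up-to-date alignments."""
    probesets = defaultdict(list)
    for probe_id, genes in sorted(probe_alignments.items()):
        # Drop probes that match no gene (deprecated) or several genes
        # (cross-hybridizing).
        if len(genes) != 1:
            continue
        (gene,) = genes
        # Probes recognizing the same gene are merged into one probeset.
        probesets[gene].append(probe_id)
    return dict(probesets)


alignments = {
    "probe_001": {"GENE_A"},
    "probe_002": {"GENE_A"},
    "probe_003": {"GENE_A", "GENE_B"},  # cross-hybridizing
    "probe_004": set(),                 # no longer matches any gene
    "probe_005": {"GENE_C"},            # gene unknown at array-design time
}
remapped = remap_probes(alignments)
print(remapped)
```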
Sequence Mappings Are Slow
• Requires aligning millions of 25 bp probes against the transcriptome and identifying the best match for each
• Fortunately, other groups have done this for us, and regularly update their mappings
But There Is Also A Major Benefit
Increased validation rates using RT-PCR (~10%)
Sandberg et al., BMC Bioinformatics, 2007
What Are The Outputs of A Microarray Study?
• Primary Data
  • Raw image (.DAT file)
  • Quantitation (.CEL file)
• Secondary Data
  • Normalized data (usually an ASCII text file)
  • QA/QC plots
• Tertiary Data
  • Statistical analyses
  • Global visualization (e.g. heatmaps)
  • Downstream analyses (e.g. pathway, dataset-integration)
These files can be tens of GB for a typical Affy study
How Do You Organize These Data?
/data/
I recommend you put things on a fast, backed-up network drive
/data/Project
Organize data by project
/data/Project/raw
/data/Project/QAQC
/data/Project/pre-processing
/data/Project/statistical
/data/Project/pathway
Create separate directories for each analysis
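The layout above could be created with a short helper. This is a sketch only: the function name is hypothetical, and the demonstration runs in a temporary directory rather than on a real /data/ drive.

```python
import tempfile
from pathlib import Path

# One sub-directory per analysis, as listed above.
ANALYSIS_DIRS = ["raw", "QAQC", "pre-processing", "statistical", "pathway"]


def create_project_tree(data_root, project):
    """Create data_root/project with one sub-directory per analysis."""
    project_dir = Path(data_root) / project
    for name in ANALYSIS_DIRS:
        # exist_ok=True makes the helper safe to re-run.
        (project_dir / name).mkdir(parents=True, exist_ok=True)
    return project_dir


# Demonstrate in a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    project_dir = create_project_tree(tmp, "Project")
    created = sorted(p.name for p in project_dir.iterdir())
    print(created)
```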
How Do You Organize The Scripts?
I recommend you write a separate script for each analysis, and put those in a standardized (backed-up!) location, mirroring the directory structure and naming of your dataset directories.
Some sub-structure here is often useful:
/scripts/Project/pre-processing.R
/scripts/Project/statistical-univariate.R
/scripts/Project/statistical-multivariate.R
/scripts/Project/pathway/GOMiner.R
/scripts/Project/pathway/Reactome.R
/scripts/Project/integration/mRNA+CNV.R
/scripts/Project/integration/public-data.R
Why Many Small Scripts?
• Monolithic scripts are hard to maintain
  • Easier to make errors
    • Accidentally re-using the same variable name
  • Harder to debug
  • Harder for somebody else to learn
• Small scripts are more flexible
  • Quicker to modify/re-run a small part of your analysis
  • Easier to re-use the same code on another dataset
• This is akin to the “unix” mindset of systems design
What To Save?
• Everything!!
  • All QA/QC plots (common reviewer request)
  • All pre-processed data (needed for GEO uploads)
  • Gene-wise statistical analyses
    • Not just the statistically-significant genes
    • Collapse all analyses into one file, though
  • All plots/etc
• Using clear filenames is critical
• Disk-space is not usually a critical concern here
  • Your raw data will be much larger than your output!
Most Important Points
• Do not delete things:
  • Keep all old versions of your scripts by including the date in the filename (or using source-control)
  • Version output files by date
  • I have needed to go back to analyses done 7 years prior!
• Make regular (weekly) backups:
  • Try to pass this work off to professional sysadmins
  • External hard-drives/USBs are okay if you cannot get access to network drives, but try to automate