Some Statistical Issues in Microarray Data Analysis
Alex Sánchez
Estadística i Bioinformàtica
Departament d’Estadística, Universitat de Barcelona
Unitat d’Estadística i Bioinformàtica, IR-HUVH
Outline
• Introduction
• Experimental design
• Selecting differentially expressed genes
  – Statistical tests
  – Significance testing
  – Linear models and analysis of variance
  – Multiple testing
• Software for microarray data analysis
Introduction
Microarray experiments: Overview
Why are we talking about statistics?
A microarray experiment is, as its name says, an experiment; that is:
• It is performed to determine whether some prior hypotheses are true or false (although it can also lead to new hypotheses).
• It is subject to errors, which may arise from many sources.
Sources of variability
• Biological heterogeneity in the population
• Specimen collection/handling effects
  – Tumor: surgical biopsy (bx), fine-needle aspiration (FNA)
  – Cell line: culture conditions, confluence level
• Biological heterogeneity in the specimen
• RNA extraction
• RNA amplification
• Fluor labeling
• Hybridization
• Scanning: PMT voltage, laser power
(Geschwind, Nature Reviews Neuroscience, 2001)
Categories of variability
• Systematic variability
  – Amount of RNA in the biopsy
  – Efficiencies of lab procedures such as RNA extraction, reverse transcription, labeling or photodetection
• Random variation
  – PCR yield
  – DNA quality
  – Spotting efficiency, spot size
  – Cross-/unspecific hybridization
  – Stray signal
Dealing with systematic variability
• Systematic variability has similar effects on many measurements.
• Corrections can therefore be estimated from the data.
• CALIBRATION or NORMALIZATION is the general name for processes that correct for systematic variability.
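As a minimal illustration of estimating a correction from the data, the sketch below applies global median normalization to the log-ratios $M = \log_2(R/G)$ of one two-colour array. It assumes that most genes are not differentially expressed, so the median log-ratio estimates a bias shared by all spots. This is only the simplest form of normalization (intensity-dependent methods such as loess are common in practice); the function name and the simulated intensities are hypothetical.

```python
import numpy as np

def median_normalize(R, G):
    """Return median-centred log2-ratios for one two-colour array.

    Assumes most genes are not differentially expressed, so the median
    log-ratio estimates a systematic (e.g. dye) bias shared by all spots.
    """
    M = np.log2(np.asarray(R, dtype=float) / np.asarray(G, dtype=float))
    return M - np.median(M)

# Hypothetical array on which the red channel is systematically 1.5x brighter
rng = np.random.default_rng(0)
G = rng.lognormal(mean=8.0, sigma=1.0, size=1000)
R = 1.5 * G * rng.lognormal(mean=0.0, sigma=0.1, size=1000)

M_raw = np.log2(R / G)
M_norm = median_normalize(R, G)
print(np.median(M_raw))   # close to log2(1.5), about 0.58
print(np.median(M_norm))  # 0 up to rounding
```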
Dealing with random variation
• Random variation cannot be explicitly accounted for.
• The usual way to deal with it is to assume some ERROR MODEL (e.g. $e_i \sim N(0, \sigma^2)$).
• Assuming these error models are true:
  – EXPERIMENTAL DESIGN is (must be) used to control the action of random variation.
  – STATISTICAL INFERENCE is (must be) used to draw conclusions in the presence of random variation.
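Figure (flowchart of a microarray study): biological question → experimental design → microarray experiment → image analysis → quality measurement (Failed / Pass) → normalization → analysis (estimation, testing, clustering, discrimination) → biological verification and interpretation. The stages marked "Today" are the topics of this talk.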
Experimental design
Why experimental design?
The objective of experimental design is to make the analysis of the data and the interpretation of the results:
• as simple and as powerful as possible,
• given the purpose of the experiment,
• and the constraints of the experimental material.
Scientific aims and design choice
The primary focus of the experiment needs to be clearly stated, whether it is:
• to identify differentially expressed genes,
• to search for specific gene-expression patterns, or
• to identify phenotypic subclasses.
The aim of the experiment guides the design choice:
• Sometimes only one choice is reasonable.
• Sometimes different options are available.
Designing microarray experiments
The appropriate design of a microarray experiment must consider:
• the design of the array;
• the allocation of mRNA samples to the slides.
I: Layout of the array
• Which sequences to use:
  – cDNAs: selection of cDNA from a library (Riken, NIA, etc.)
  – Affymetrix: PMs and MMs (perfect-match and mismatch probes)
  – Oligo probe selection (from Operon, Agilent, etc.)
  – Control probes: what percentage? Where should controls be placed?
• How many sequences to use?
• Should there be replicate spots within a slide?
II: Allocating samples to slides
• Types of samples
  – Replication: technical vs biological
  – Pooled vs individual samples
• Different design layouts / data analysis:
  – Scientific aim of the experiment
  – Efficiency, robustness, extensibility
• Physical limitations (cost):
  – Number of slides
  – Amount of material
Basic principles of experimental design
Apply the following principles to best attain the objectives of experimental design:
• Replication
• Local control or blocking
• Randomization
1. Replication
It is important:
• To reduce uncertainty (increase precision), since $\mathrm{var}(\bar{X}) = \sigma^2/n$
• To obtain sufficient power for the tests
• As a formal basis for inferential procedures
Consider different types of replicates:
• Technical
  – Duplicate spots
  – Multiple hybridizations from the same sample
• Biological
Replicate most at the level that is expected to vary most!
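The formula $\mathrm{var}(\bar{X}) = \sigma^2/n$ can be checked by simulation. A minimal sketch, assuming independent replicates drawn from $N(0, \sigma^2)$; the function name and simulation settings are hypothetical.

```python
import numpy as np

def empirical_var_of_mean(n, sigma=1.0, n_sim=20_000, seed=1):
    """Variance of the mean of n i.i.d. N(0, sigma^2) replicates, estimated by simulation."""
    rng = np.random.default_rng(seed)
    # n_sim independent experiments, each averaging n replicates
    means = rng.normal(0.0, sigma, size=(n_sim, n)).mean(axis=1)
    return means.var()

for n in (2, 4, 8):
    print(n, round(empirical_var_of_mean(n), 3), 1.0 / n)  # simulated vs sigma^2 / n
```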
Biological vs Technical Replicates
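Figure (source: Nature Reviews & G. Churchill, 2002): biological and technical replicates, with the variance components introduced at each level: biological ($\sigma^2_B$), array/technical ($\sigma^2_A$) and measurement error ($\sigma^2_e$).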
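One way to read these variance components: in a balanced nested design with $n_B$ biological samples, $n_A$ arrays per sample and $n_S$ replicate spots per array, the variance of a group mean is $\sigma^2_B/n_B + \sigma^2_A/(n_B n_A) + \sigma^2_e/(n_B n_A n_S)$. The sketch below, with hypothetical variance values, compares two ways of spending the same eight arrays and shows why biological replicates matter most.

```python
def var_group_mean(s2_bio, s2_array, s2_err, n_bio, n_arrays=1, n_spots=1):
    """Variance of a group mean in a balanced nested design (sample > array > spot)."""
    return (s2_bio / n_bio
            + s2_array / (n_bio * n_arrays)
            + s2_err / (n_bio * n_arrays * n_spots))

# Hypothetical variance components on the log scale
s2_bio, s2_array, s2_err = 0.10, 0.05, 0.02

# Same budget of 8 arrays, allocated in two different ways
v_biological = var_group_mean(s2_bio, s2_array, s2_err, n_bio=8, n_arrays=1)
v_technical = var_group_mean(s2_bio, s2_array, s2_err, n_bio=2, n_arrays=4)
print(v_biological, v_technical)  # approx. 0.02125 vs 0.05875
```

Technical replicates only shrink the array and error terms; the biological term $\sigma^2_B/n_B$ can be reduced only by adding biological samples.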
Replication vs pooling
mRNA from different samples is often combined to form a “pooled sample” or pool. Why?
• If each sample does not yield enough mRNA
• To compensate for an excess of variability?
Statisticians tend not to like it, but pooling may be OK if properly done:
• Combine several samples in each pool
• Use several pools from different samples
• Do not use pools when individual information is important (e.g. paired designs)
2. Blocking
Assume we wish to perform an experiment to compare two treatments. The samples or their processing may not be homogeneous; there are blocks, e.g.:
• Subjects: male/female
• Arrays produced in two lots (February, March)
If there are systematic differences between blocks, the effects of interest (e.g. treatment) may be confounded: are observed differences attributable to the treatment effect or to confounding factors?
Confounding block with treatment effects
| Sample | Treatment (awful) | Sex (awful) | Batch (awful) | Treatment (balanced) | Sex (balanced) | Batch (balanced) |
|---|---|---|---|---|---|---|
| 1 | A | Male | 1 | A | Male | 1 |
| 2 | A | Male | 1 | A | Female | 2 |
| 3 | A | Male | 1 | A | Male | 1 |
| 4 | A | Male | 1 | A | Female | 2 |
| 5 | B | Female | 2 | B | Male | 1 |
| 6 | B | Female | 2 | B | Female | 2 |
| 7 | B | Female | 2 | B | Male | 1 |
| 8 | B | Female | 2 | B | Female | 2 |

Two alternative designs to investigate treatment effects:
• Awful design: treatment effects are confounded with sex and batch effects.
• Balanced design: treatments are balanced between blocks. The influence of blocks is automatically compensated, and the statistical analysis can separate block effects from the treatment effect.
3. Randomisation
Randomly assign samples to groups to eliminate unspecific disturbances:
• Randomly assign individuals to treatments.
• Randomise the order in which experiments are performed.
Randomisation is required to ensure the validity of statistical procedures.
Block what you can and randomize what you cannot.
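"Block what you can and randomize what you cannot" can be combined in a randomized block assignment: within each block (e.g. a batch of arrays), treatment labels are balanced and then shuffled. A minimal sketch using only Python's standard library; the function name and the two-batch layout (mirroring the batches of the balanced design) are hypothetical.

```python
import random

def randomize_within_blocks(samples_by_block, treatments=("A", "B"), seed=None):
    """Randomly assign treatments so that each block receives each treatment equally often."""
    rng = random.Random(seed)
    assignment = {}
    for block, samples in samples_by_block.items():
        if len(samples) % len(treatments) != 0:
            raise ValueError(f"block {block!r}: size is not a multiple of the number of treatments")
        # Balanced list of labels for this block, then shuffled
        labels = list(treatments) * (len(samples) // len(treatments))
        rng.shuffle(labels)
        assignment.update(zip(samples, labels))
    return assignment

# Hypothetical layout: 8 samples processed in two batches of 4
blocks = {"batch 1": [1, 3, 5, 7], "batch 2": [2, 4, 6, 8]}
for sample, treatment in sorted(randomize_within_blocks(blocks, seed=42).items()):
    print(sample, treatment)
```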
Experimental layout
How are mRNA samples assigned to arrays? The experimental layout has to be chosen so that the resulting analysis can be done as efficiently and robustly as possible:
• Sometimes there is only one reasonable choice.
• Sometimes several choices are available.
Example I: Only one design choice
Case 1: Meaningful biological control (C)
• Samples: liver tissue from 4 mice treated with cholesterol-modifying drugs.
• Question 1: genes that respond differently between the treatments (T) and the control (C).
• Question 2: genes that respond similarly across two or more treatments relative to the control.
Case 2: Use of a universal reference
• Samples: different tumor samples.
• Question: to discover tumor subtypes.
Figure: in Case 1, each treated sample T1, …, T4 is hybridized against the common control C; in Case 2, each tumor sample T1, …, Tn is hybridized against the reference Ref.
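Example II: A number of different designs are suitable
(1) Time course experiments
Design choice depends on the comparisons of interest.
Figure: alternative designs for time points T1 to T4, one using a common reference (Ref) and the others hybridizing time points directly against each other.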
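How can we decide?
A-optimality: choose the design that minimizes the (average) variance of the estimates of the effects of interest.
A simple example: direct vs indirect estimates, using two arrays in each case.
• Direct design (A hybridized against B on both arrays): estimate = average(log(A/B)), with variance $\sigma^2/2$.
• Indirect design (A vs a reference R on one array, B vs R on the other): estimate = log(A/R) – log(B/R), with variance $2\sigma^2$.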
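The two variances can be checked by simulation. A minimal sketch, assuming each array yields a log-ratio with independent error of variance $\sigma^2$ and that both designs use two arrays; the function name, `base` (a hypothetical true log(B/R)) and the other parameter values are illustrative.

```python
import numpy as np

def compare_designs(true_lfc=1.0, base=0.3, sigma=0.5, n_sim=100_000, seed=0):
    """Return (mean, variance) of the direct and the indirect estimate of log(A/B)."""
    rng = np.random.default_rng(seed)
    # Direct design: both arrays hybridize A against B; average the two log(A/B)
    direct = (true_lfc + rng.normal(0.0, sigma, size=(n_sim, 2))).mean(axis=1)
    # Indirect design: array 1 measures log(A/R), array 2 measures log(B/R)
    a_vs_r = base + true_lfc + rng.normal(0.0, sigma, n_sim)
    b_vs_r = base + rng.normal(0.0, sigma, n_sim)
    indirect = a_vs_r - b_vs_r
    return (direct.mean(), direct.var()), (indirect.mean(), indirect.var())

(d_mean, d_var), (i_mean, i_var) = compare_designs()
print(d_var, i_var)  # about sigma^2 / 2 = 0.125 and 2 * sigma^2 = 0.5
```

Both estimates are unbiased; with the same two arrays, the direct design is four times more precise.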
Summary
• Selection of mRNA samples is important:
  – Most important: biological replicates
  – Technical replicates are also useful, but different
  – If needed and possible, use pooling wisely
• The choice of experimental layout is guided by:
  – The scientific question
  – Experimental design principles
  – Efficiency and robustness considerations
• The correspondence between experimental designs, linear models and ANOVA can be exploited to select a model and analyze the data
Experimental design, linear models and analysis of variance
In experimental design, the different sources of variability influencing the observed response may be identified.
These sources can be related to the response using a linear model.
Analysis of variance can then be used to separately estimate and test the relative importance of each source of variability.
Statistical methods to detect differentially expressed genes
Class comparison: identifying differentially expressed genes
• Identify genes differentially expressed between different conditions, such as:
  – Treatment, cell type, ... (qualitative covariates)
  – Dose, time, ... (quantitative covariates)
  – Survival, infection time, ...
• Estimate effects/differences between groups, usually using log-ratios, i.e. the difference on the log scale, log(X) – log(Y) [= log(X/Y)]
What is a “significant change”?
It depends on the variability within groups, which may differ from gene to gene.
To assess the statistical significance of differences, conduct a statistical test for each gene.
Different settings for statistical tests
• Indirect comparisons: two groups, two samples, unpaired
  – E.g. 10 individuals: 5 with diabetes, 5 healthy; one sample from each individual.
  – Typically: two-sample t-test or similar.
• Direct comparisons: two groups, two samples, paired
  – E.g. 6 individuals with a brain stroke, two samples from each: one from a healthy region (region 1) and one from the affected region (region 2).
  – Typically: one-sample t-test (also called paired t-test) or similar, based on the individual differences between conditions.
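In Python, SciPy's `scipy.stats.ttest_ind` (unpaired, two-sample) and `scipy.stats.ttest_rel` (paired) cover the two settings, and the paired test is the same as a one-sample t-test on the individual differences. The expression values below are simulated and hypothetical, for a single gene.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Indirect (unpaired): one sample from each of 5 diabetic and 5 healthy individuals
diabetic = rng.normal(8.5, 0.4, size=5)
healthy = rng.normal(8.0, 0.4, size=5)
unpaired = stats.ttest_ind(diabetic, healthy)  # two-sample t-test, equal variances

# Direct (paired): healthy region 1 and affected region 2 from each of 6 patients
patient_level = rng.normal(8.0, 1.0, size=6)   # large between-patient variation
region1 = patient_level + rng.normal(0.0, 0.2, size=6)
region2 = patient_level + 0.5 + rng.normal(0.0, 0.2, size=6)
paired = stats.ttest_rel(region2, region1)
one_sample = stats.ttest_1samp(region2 - region1, popmean=0.0)

print(unpaired.pvalue, paired.pvalue, one_sample.pvalue)
```

Analysing the paired data with `ttest_ind` instead would ignore the pairing and leave the between-patient variation in the error term, usually losing power.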
Different ways to do the experiment
An experiment may use cDNA arrays (“two-colour”) or Affymetrix arrays (“one-colour”). Depending on the technology used, the allocation of conditions to slides changes.

| Experiment | cDNA (two-colour) | Affymetrix (one-colour) |
|---|---|---|
| 10 individuals: diabetic (5), healthy (5) | Reference design: (5) Diab/Ref, (5) Heal/Ref | Comparison design: (5) Diab vs (5) Heal |
| 6 individuals: region 1, region 2 | 6 slides, 1 individual per slide: (6) reg1/reg2 | 12 slides: (6) paired differences |
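“Natural” measures of discrepancy
For indirect comparisons in two colour or direct comparisons in one colour (groups T and C with $n_T$ and $n_C$ samples):
• Mean difference: $\bar{T} - \bar{C} = \frac{1}{n_T}\sum_{i=1}^{n_T} T_i - \frac{1}{n_C}\sum_{i=1}^{n_C} C_i$
• Classical t-test: $t = \dfrac{\bar{T} - \bar{C}}{s_p\sqrt{1/n_T + 1/n_C}}$, where $s_p$ is the pooled standard deviation
• Robust t-test: use robust estimates of location and scale
For direct comparisons in two colour or paired one colour (log-ratios $R_1, \dots, R_n$):
• Mean (log) ratio: $\bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i$ (R and M are used interchangeably)
• Classical t-test: $t = \bar{R}/SE$, where $SE$ estimates the standard error of $\bar{R}$
• Robust t-test: use robust estimates of location and scale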
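A minimal NumPy sketch of the two classical statistics, cross-checked against SciPy. `T`, `C` and `R` follow the notation above; the function names and data values are hypothetical.

```python
import numpy as np
from scipy import stats

def pooled_t(T, C):
    """(mean(T) - mean(C)) / (s_p * sqrt(1/n_T + 1/n_C)), s_p = pooled standard deviation."""
    T, C = np.asarray(T, dtype=float), np.asarray(C, dtype=float)
    n_t, n_c = len(T), len(C)
    sp2 = ((n_t - 1) * T.var(ddof=1) + (n_c - 1) * C.var(ddof=1)) / (n_t + n_c - 2)
    return (T.mean() - C.mean()) / np.sqrt(sp2 * (1.0 / n_t + 1.0 / n_c))

def ratio_t(R):
    """mean(R) / SE, with SE = s / sqrt(n) estimating the standard error of mean(R)."""
    R = np.asarray(R, dtype=float)
    return R.mean() / (R.std(ddof=1) / np.sqrt(len(R)))

T = [8.1, 8.4, 8.9, 8.6, 8.3]        # hypothetical log-intensities, group T
C = [7.9, 8.0, 8.2, 7.7, 8.1]        # hypothetical log-intensities, group C
R = [0.4, 0.7, 0.2, 0.5, 0.6, 0.3]   # hypothetical log-ratios from direct comparisons

print(pooled_t(T, C), stats.ttest_ind(T, C).statistic)
print(ratio_t(R), stats.ttest_1samp(R, 0.0).statistic)
```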
Some issues
Can we trust average effect sizes (average difference of means) alone? Can we trust the t statistic alone? Here is evidence that the answer is no.

| Gene | M1 | M2 | M3 | M4 | M5 | M6 | Mean | SD | t |
|---|---|---|---|---|---|---|---|---|---|
| A | 2.5 | 2.7 | 2.5 | 2.8 | 3.2 | 2 | 2.61 | 0.40 | 16.10 |
| B | 0.01 | 0.05 | -0.05 | 0.01 | 0 | 0 | 0.003 | 0.03 | 0.25 |
| C | 2.5 | 2.7 | 2.5 | 1.8 | 20 | 1 | 5.08 | 7.34 | 1.69 |
| D | 0.5 | 0 | 0.2 | 0.1 | -0.3 | 0.3 | 0.13 | 0.27 | 1.19 |
| E | 0.1 | 0.11 | 0.1 | 0.1 | 0.11 | 0.09 | 0.10 | 0.01 | 33.09 |

Courtesy of Y.H. Yang
• Averages can be driven by outliers (e.g. gene C).
• t statistics can be driven by tiny variances (e.g. gene E).
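The table can be recomputed directly. The sketch below assumes the t column is the one-sample statistic $\bar{M}/(s/\sqrt{6})$; it reproduces the table approximately and adds one possible robust alternative (median divided by a normal-scaled MAD, an assumption rather than the method of the talk), which is far less affected by the outlying value 20 of gene C.

```python
import numpy as np
from scipy import stats

M = {  # log-ratios of the five genes in the table (6 arrays each)
    "A": [2.5, 2.7, 2.5, 2.8, 3.2, 2.0],
    "B": [0.01, 0.05, -0.05, 0.01, 0.0, 0.0],
    "C": [2.5, 2.7, 2.5, 1.8, 20.0, 1.0],
    "D": [0.5, 0.0, 0.2, 0.1, -0.3, 0.3],
    "E": [0.1, 0.11, 0.1, 0.1, 0.11, 0.09],
}

def summarize(m):
    """Return (mean, classical t, median, robust t) for one gene's log-ratios."""
    m = np.asarray(m, dtype=float)
    n = len(m)
    t = m.mean() / (m.std(ddof=1) / np.sqrt(n))
    mad = stats.median_abs_deviation(m, scale="normal")  # robust estimate of the SD
    t_robust = np.median(m) / (mad / np.sqrt(n))
    return m.mean(), t, np.median(m), t_robust

for gene, values in M.items():
    mean, t, med, t_rob = summarize(values)
    print(f"{gene}: mean={mean:.3f}  t={t:6.2f}  median={med:.3f}  robust t={t_rob:6.2f}")
```

Gene E still gets the largest classical t despite its tiny effect, which is the motivation for the variance-stabilized t-statistics discussed next.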
Variations in t-tests (1)
Let
• $R_g$ = mean observed log ratio for gene g
• $SE_g$ = standard error of $R_g$ estimated from the data on gene g
• $SE$ = standard error of $R_g$ estimated from the data across all genes
Global t-test: $t = R_g/SE$
Gene-specific t-test: $t = R_g/SE_g$
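A minimal sketch of both statistics for a matrix of log-ratios (rows are genes, columns are replicate arrays). The slide does not say how the common SE is pooled across genes; averaging the gene variances is an assumption made here for illustration, and the function name is hypothetical.

```python
import numpy as np

def global_and_gene_t(M):
    """Return (global t, gene-specific t) for a genes x arrays matrix of log-ratios.

    Gene-specific: SE_g = s_g / sqrt(n), from gene g's own replicates.
    Global: one SE for all genes; here (an assumption) sqrt(mean(s_g^2) / n).
    """
    M = np.asarray(M, dtype=float)
    n = M.shape[1]
    R = M.mean(axis=1)
    s2 = M.var(axis=1, ddof=1)
    se_gene = np.sqrt(s2 / n)
    se_global = np.sqrt(s2.mean() / n)
    return R / se_global, R / se_gene

M = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # hypothetical: gene 2 is gene 1 scaled by 2
t_global, t_gene = global_and_gene_t(M)
print(t_gene)    # both genes: 2 * sqrt(3), about 3.46
print(t_global)  # gene 2 ranked higher
```

The gene-specific t treats the two genes identically because gene 2 is also twice as variable; the global t ignores this and ranks gene 2 higher, which is biased when variances are not homogeneous.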
Some pros and cons of t-tests

| Test | Pros | Cons |
|---|---|---|
| Global t-test: $t = R_g/SE$ | Yields a stable variance estimate | Assumes variance homogeneity; biased if this is false |
| Gene-specific t-test: $t = R_g/SE_g$ | Robust to variance heterogeneity | Low power; yields unstable variance estimates (few data per gene) |
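t-test extensions
• SAM (Tibshirani, 2001): $S_g = \dfrac{R_g}{c + SE_g}$
• Regularized t (Baldi, 2001): $t_g = \dfrac{R_g}{\sqrt{\dfrac{v_0\, SE_0^2 + (n-1)\, SE_g^2}{v_0 + n - 2}}}$
• EB-moderated t (Smyth, 2003): $t_g = \dfrac{R_g}{\sqrt{\dfrac{d_0\, SE_0^2 + d\, SE_g^2}{d_0 + d}}}$
Here $c$ is a small positive constant, $SE_0$ a common (prior) standard error, $v_0$ and $d_0$ prior degrees of freedom, and $d$ the residual degrees of freedom for gene g.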
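The three statistics share one idea: keep the numerator $R_g$ and stabilise the gene-specific standard error by borrowing strength from all genes (the regularized t has the same shrinkage form as the EB version, with weights $v_0$ and $n-1$). A minimal sketch of the SAM-type and EB-moderated forms for a one-sample (log-ratio) design with $d = n - 1$. Here $c$, $d_0$ and $SE_0^2$ are simply supplied, with $SE_0^2$ defaulting to the median of the $SE_g^2$ (an assumption); SAM and limma estimate these quantities from the data with their own procedures, which are not reproduced here.

```python
import numpy as np

def moderated_t(M, c=0.1, d0=4.0, se0_sq=None):
    """Return (SAM-type, EB-moderated) statistics for a genes x arrays matrix M."""
    M = np.asarray(M, dtype=float)
    n = M.shape[1]
    d = n - 1                                   # residual df, one-sample design
    R = M.mean(axis=1)
    se_sq = M.var(axis=1, ddof=1) / n           # squared gene-specific SE_g
    if se0_sq is None:
        se0_sq = np.median(se_sq)               # crude prior value (assumption)
    sam = R / (c + np.sqrt(se_sq))              # fudge constant c added to SE_g
    eb = R / np.sqrt((d0 * se0_sq + d * se_sq) / (d0 + d))  # shrunken SE
    return sam, eb

# Genes A and E from the table: E has a tiny effect and a tiny variance
M = np.array([[2.5, 2.7, 2.5, 2.8, 3.2, 2.0],
              [0.1, 0.11, 0.1, 0.1, 0.11, 0.09]])
sam, eb = moderated_t(M, c=0.1, d0=4.0, se0_sq=0.01)
print(sam, eb)  # gene E no longer outranks gene A
```

With `c=0` and `d0=0` both statistics reduce to the ordinary gene-specific t.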
Up to here…: can we generate a list of candidate genes?
With the tools we have, the reasonable steps to generate a list of candidate genes may be:
1. Start from the data for each gene: Gene 1: $M_{11}, M_{12}, \dots, M_{1k}$; Gene 2: $M_{21}, M_{22}, \dots, M_{2k}$; …; Gene G: $M_{G1}, M_{G2}, \dots, M_{Gk}$.
2. For every gene, calculate a statistic of interest $S_i = t(M_{i1}, M_{i2}, \dots, M_{ik})$, e.g. t-statistics, S, B, …, giving $S_1, S_2, \dots, S_G$.
3. From these statistics, obtain a list of candidate DE genes. But how?
We need an idea of how significant these values are: we would like to assign them p-values.
Significance testing
Nominal p-values
After a test statistic is computed, it is convenient to convert it to a p-value: the probability that a test statistic, say $S(X)$, takes values equal to or greater than that taken on the observed sample, say $S(X_0)$, under the assumption that the null hypothesis is true:
$p = P\{S(X) \ge S(X_0) \mid H_0 \text{ true}\}$
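When the null distribution of $S$ is known, for instance Student's t with `df` degrees of freedom, the p-value is its upper tail area beyond the observed value. A minimal sketch with SciPy's `scipy.stats.t.sf` (the survival function, 1 − CDF); the two-sided version doubles the tail beyond $|t|$, the usual choice when both up- and down-regulation matter. The function name is hypothetical.

```python
from scipy import stats

def t_pvalue(t_obs, df, two_sided=True):
    """p = P(T >= t_obs | H0) for T ~ Student t(df); the two-sided version uses |t_obs|."""
    if two_sided:
        return 2.0 * stats.t.sf(abs(t_obs), df)
    return stats.t.sf(t_obs, df)

print(t_pvalue(2.571, df=5))                   # about 0.05
print(t_pvalue(2.015, df=5, two_sided=False))  # about 0.05
```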
Significance testing
Test of significance at the α level: reject the null hypothesis if the p-value is smaller than the significance level α.
This approach has advantages but is not free from criticism.
Genes with p-values falling below a prescribed level may be regarded as significant.
Hypothesis testing overview for a single gene
| State of nature ("truth") \ Reported decision | H0 rejected (gene selected) | H0 accepted (gene not selected) | |
|---|---|---|---|
| H0 false (affected) | TP, prob. 1 − β | FN, prob. β (Type II error) | Sensitivity = TP/(TP+FN) |
| H0 true (not affected) | FP, P[reject H0 when H0 true] ≤ α (Type I error) | TN, prob. 1 − α | Specificity = TN/(TN+FP) |
| | Positive predictive value = TP/(TP+FP) | Negative predictive value = TN/(TN+FN) | |
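The four summary rates are simple ratios of the counts in the table. A minimal sketch with hypothetical counts for 1,000 genes, 100 of them truly affected; it shows that even with 95% specificity, false positives can make up a large share of the selected genes when most genes are unaffected.

```python
def decision_rates(tp, fn, fp, tn):
    """Summary rates from the counts of a 2x2 decision table."""
    return {
        "sensitivity": tp / (tp + fn),  # power, 1 - beta
        "specificity": tn / (tn + fp),  # 1 - type I error rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }

# Hypothetical counts: 100 affected genes, 900 unaffected genes
rates = decision_rates(tp=80, fn=20, fp=45, tn=855)
print(rates)  # sensitivity 0.8, specificity 0.95, ppv 0.64, npv about 0.977
```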
Calculation of p-values
Standard methods for calculating p-values:
(i) Refer to a statistical distribution table (Normal, t, F, …) or
(ii) Perform a permutation analysis
(i) Tabulated p-values
Tabulated p-values can be obtained for standard test statistics (e.g. the t-test).
They often rely on the assumption of normally distributed errors in the data.
This assumption can be checked (approximately) using a histogram or a Q-Q plot.
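A numerical companion to the Q-Q plot: `scipy.stats.probplot` returns the ordered data against normal quantiles together with the correlation r of the fitted line, which is close to 1 when the points lie on a straight line. A minimal sketch with simulated, hypothetical residuals; looking at the plot itself (e.g. `stats.probplot(x, dist="norm", plot=plt)` with matplotlib) remains the primary check.

```python
import numpy as np
from scipy import stats

def qq_correlation(x):
    """Correlation of the normal Q-Q plot; values near 1 are consistent with normality."""
    (_theoretical, _ordered), (_slope, _intercept, r) = stats.probplot(x, dist="norm")
    return r

rng = np.random.default_rng(3)
normal_resid = rng.normal(size=500)
skewed_resid = rng.exponential(size=500)  # clearly non-normal
print(qq_correlation(normal_resid), qq_correlation(skewed_resid))
```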
Example
Golub data: 27 ALL vs 11 AML samples, 3051 genes. A t-test yields 1045 genes with p < 0.05.
(ii) Permutations tests
Based on data shuffling; no distributional assumptions are needed (only that samples are exchangeable under H0).
• Randomly interchange labels between samples.
• Estimate p-values for each comparison (gene) using the permutation distribution of the t-statistics:
  – Repeat for every possible permutation, b = 1, …, B:
    · Permute the n data points for the gene (x). The first n1 are referred to as “treatments”, the second n2 as “controls”.
    · For each gene, calculate the corresponding two-sample t-statistic, $t_b$.
  – After all the B permutations are done, put $p = \#\{b : |t_b| \ge |t_{\text{observed}}|\}/B$.
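A minimal sketch of this procedure for one gene. Because the t-statistic depends only on which values are labelled "treatments", it enumerates every distinct split of the n values into n1 treatments and n2 controls (for larger n, a random subset of permutations is used instead). The function names and data are hypothetical; the observed labelling is one of the B splits, so p ≥ 1/B.

```python
from itertools import combinations
import numpy as np

def pooled_t(a, b):
    """Two-sample t-statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1.0 / na + 1.0 / nb))

def permutation_pvalue(x, n1):
    """Two-sided permutation p-value; x[:n1] are treatments, x[n1:] controls."""
    x = np.asarray(x, dtype=float)
    t_obs = pooled_t(x[:n1], x[n1:])
    t_perm = []
    for treat_idx in combinations(range(len(x)), n1):  # every split, b = 1..B
        mask = np.zeros(len(x), dtype=bool)
        mask[list(treat_idx)] = True
        t_perm.append(pooled_t(x[mask], x[~mask]))
    t_perm = np.abs(np.array(t_perm))
    B = len(t_perm)
    # small tolerance so that exact ties with t_obs are not lost to rounding
    return np.sum(t_perm >= abs(t_obs) - 1e-12) / B, B

# Hypothetical gene: 5 treatments clearly above 5 controls
p, B = permutation_pvalue([10, 11, 12, 13, 14, 1, 2, 3, 4, 5], n1=5)
print(p, B)  # 2/252: only the observed split and its mirror image are as extreme
```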
Permutation tests (2)
Volcano plot: fold change vs log(odds)¹
Figure regions: significant change detected; no change detected.
¹ log(odds) plays the same role as −log(p-value): larger values indicate stronger evidence of change.
Linear models and analysis of variance to analyze designed experiments
From experimental design to linear models
Some weaknesses of the statistical framework so far:
• What if a treatment has more than two levels?
• How to deal with more than one treatment or experimental condition?
• How to deal with nuisance factors such as batch effects, covariates, etc.?
Most of this can be solved with an alternative approach: analysis of variance (ANOVA).
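As a minimal illustration with the two 8-sample designs of the "Confounding block with treatment effects" slide, the sketch below builds the design matrix of the linear model $y = \mu + \tau_{\text{treatment}} + \beta_{\text{batch}} + \varepsilon$ with NumPy. In the awful design the treatment and batch columns coincide, so the matrix loses rank and the two effects cannot be separated; in the balanced design both are estimable. The function name and the noise-free response values are hypothetical.

```python
import numpy as np

def design_matrix(treatment, batch):
    """Columns: intercept, indicator of treatment B, indicator of batch 2."""
    treatment, batch = np.asarray(treatment), np.asarray(batch)
    return np.column_stack([np.ones(len(treatment)), treatment == "B", batch == 2]).astype(float)

treatment = list("AAAABBBB")
X_awful = design_matrix(treatment, [1, 1, 1, 1, 2, 2, 2, 2])
X_balanced = design_matrix(treatment, [1, 2, 1, 2, 1, 2, 1, 2])
print(np.linalg.matrix_rank(X_awful), np.linalg.matrix_rank(X_balanced))  # 2 3

# Hypothetical noise-free response: baseline 5, treatment effect 1.0, batch effect 0.8
true_coef = np.array([5.0, 1.0, 0.8])
y = X_balanced @ true_coef
coef, *_ = np.linalg.lstsq(X_balanced, y, rcond=None)
print(coef)  # recovers [5.0, 1.0, 0.8] up to rounding
```

In a real analysis one such model is fitted per gene, and the treatment coefficient is tested while accounting for the batch effect.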
Multiple testing
How far can we trust the decision?
The test “Reject H0 if p-value ≤ α” is said to control the type I error because, under a certain set of assumptions, the probability of falsely rejecting H0 is at most a fixed small threshold α.
Nothing is guaranteed about P[FN]:
• “Optimal” tests are built to minimize this probability.
• In practical situations it is often high.
What if we wish to test more than one gene at once? (1)
Consider more than one test at once (assuming independent tests):
• Two tests, each at the 5% level: the probability of getting at least one false positive is now 1 − 0.95 × 0.95 = 0.0975.
• Three tests: 1 − 0.95³ = 0.1426.
• n tests: 1 − 0.95ⁿ, which converges towards 1 as n increases.
Small p-values don't necessarily imply significance! We are no longer controlling the probability of a type I error.
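The arithmetic above as a short function, assuming the n tests are independent and all null hypotheses are true; the function name is hypothetical.

```python
def prob_any_false_positive(n_tests, alpha=0.05):
    """P(at least one false positive) among n independent tests of true nulls, each at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n in (1, 2, 3, 10, 100, 6000):
    print(n, round(prob_any_false_positive(n), 4))
```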
What if we wish to test more than one gene at once? (2): a simulation
Simulation of this process for 6,000 genes with 8 treatments and 8 controls
All the gene expression values were simulated i.i.d. from a N(0,1) distribution, i.e. NOTHING is differentially expressed in our simulation
The number of genes falsely rejected will be on average 6000 · α, i.e. if we rejected all genes with a p-value below 1% we would falsely reject around 60 genes
See example
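The simulation described above can be sketched as follows. To stay within the Python standard library, this sketch swaps the usual two-sample t-test for a z-test with known σ = 1, which is valid here only because the simulated values are drawn from N(0, 1). The function name and the random seed are arbitrary assumptions; the count should land near 6000 · 0.01 = 60 but varies with the seed.

```python
import random
from statistics import NormalDist, fmean


def simulate_null_genes(n_genes=6000, n_per_group=8, alpha=0.01, seed=2024):
    """Count genes 'selected' at raw level alpha when no gene is differentially expressed."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = (2 / n_per_group) ** 0.5  # SD of a difference of two means when sigma = 1 is known
    n_selected = 0
    for _ in range(n_genes):
        treat = [rng.gauss(0, 1) for _ in range(n_per_group)]
        ctrl = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (fmean(treat) - fmean(ctrl)) / se
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p <= alpha:
            n_selected += 1  # every selection here is a false positive
    return n_selected


print(simulate_null_genes())
```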
Multiple testing: Counting errors

| State of nature ("truth") | Decision: H0 rejected (genes selected) | Decision: H0 accepted (genes not selected) | Total |
|---|---|---|---|
| H0 is false (affected) | S | T | m - m0 |
| H0 is true (not affected) | V | U | m0 |
| Total | R | m - R | m |

V = number of Type I errors (false positives); T = number of Type II errors (false negatives).
In practice only m and R are observed: the other quantities could be known only if m0 (and which hypotheses are truly null) were known.
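In a simulation the truth is known, so every cell of the table can be counted. The sketch below is a hypothetical helper (not from the slides) that tabulates S, T, V, U and R from a truth vector and a decision vector; in a real experiment only the decisions, and hence R, would be available.

```python
def count_outcomes(h0_false, rejected):
    """Tabulate multiple-testing outcomes.

    h0_false[i]: True if gene i is truly affected (H0 false).
    rejected[i]: True if gene i was selected (H0 rejected).
    """
    pairs = list(zip(h0_false, rejected))
    s = sum(1 for aff, sel in pairs if aff and sel)              # true positives
    t = sum(1 for aff, sel in pairs if aff and not sel)          # false negatives (type II)
    v = sum(1 for aff, sel in pairs if not aff and sel)          # false positives (type I)
    u = sum(1 for aff, sel in pairs if not aff and not sel)      # true negatives
    return {"S": s, "T": t, "V": v, "U": u, "R": s + v, "m0": v + u, "m": len(pairs)}


print(count_outcomes([True, True, False, False, False],
                     [True, False, True, False, False]))
```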
How does type I error control extend to multiple testing situations?
Selecting genes with a p-value less than α no longer controls P[FP]
What can be done? Extend the idea of type I error: FWER and FDR are two such extensions
Look for procedures that control the probability of these extended error types, mainly by adjusting the raw p-values
Two main error rate extensions
Family Wise Error Rate (FWER): the probability of at least one false positive
FWER = Pr(# of false discoveries > 0) = Pr(V > 0)
False Discovery Rate (FDR): the expected proportion of false positives among the rejected null hypotheses
FDR = E[V/R; R > 0] = E[V/R | R > 0] · P[R > 0], i.e. V/R is taken to be 0 when R = 0
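Both error rates can be estimated by Monte Carlo when the truth is known. This sketch applies the naive rule "reject if raw p ≤ α" to 1000 hypothetical genes, 900 of them true nulls. It assumes that null p-values are Uniform(0, 1) and, as a purely illustrative choice, that non-null p-values follow a Beta(0.1, 1) distribution concentrated near 0. The FWER comes out essentially 1, while the FDR is much smaller but still far above 5%.

```python
import random


def estimate_fwer_fdr(m=1000, m0=900, alpha=0.05, n_sim=200, seed=1):
    """Monte Carlo estimates of FWER = P(V > 0) and FDR = E[V/R] (V/R := 0 if R = 0)
    for the naive rule 'reject H0 if raw p <= alpha'."""
    rng = random.Random(seed)
    n_any_fp = 0
    fdp_sum = 0.0
    for _ in range(n_sim):
        null_p = [rng.random() for _ in range(m0)]                     # true nulls: Uniform(0, 1)
        alt_p = [rng.betavariate(0.1, 1.0) for _ in range(m - m0)]     # hypothetical non-null p-values
        v = sum(1 for p in null_p if p <= alpha)  # false positives
        s = sum(1 for p in alt_p if p <= alpha)   # true positives
        r = v + s
        if v > 0:
            n_any_fp += 1
        fdp_sum += v / r if r > 0 else 0.0        # false discovery proportion
    return n_any_fp / n_sim, fdp_sum / n_sim


fwer, fdr = estimate_fwer_fdr()
print(f"FWER ~ {fwer:.2f}, FDR ~ {fdr:.2f}")
```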
FDR and FWER controlling procedures
FWER: Bonferroni (adjusted p-value = min{n · p-value, 1}), Holm (1979), Hochberg (1988), Westfall & Young (1993) maxT and minP
FDR: Benjamini & Hochberg (1995), Benjamini & Yekutieli (2001)
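In R these adjustments are available through `p.adjust` (methods `"bonferroni"`, `"holm"`, `"BH"`, among others). Below is a minimal standard-library Python sketch of three of the procedures listed above, returning adjusted p-values in the original gene order. It is an illustration of the textbook step-down (Holm) and step-up (BH) formulas, not the code of any particular package; the Westfall & Young resampling methods are not covered.

```python
def bonferroni(pvals):
    """Adjusted p = min(n * p, 1)."""
    n = len(pvals)
    return [min(n * p, 1.0) for p in pvals]


def holm(pvals):
    """Holm step-down: p_(j) * (n - j + 1), made monotone with a running maximum."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adj = [0.0] * n
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p-value
        running_max = max(running_max, min((n - rank) * pvals[i], 1.0))
        adj[i] = running_max
    return adj


def benjamini_hochberg(pvals):
    """BH step-up: p_(j) * n / j, made monotone with a running minimum from the top."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adj = [0.0] * n
    running_min = 1.0
    for rank in range(n - 1, -1, -1):  # from the largest p-value down
        i = order[rank]
        running_min = min(running_min, n * pvals[i] / (rank + 1))
        adj[i] = running_min
    return adj


raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(raw), holm(raw), benjamini_hochberg(raw))
```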
Difference between controlling FWER or FDR
FWER controls the probability of any (even one) false positive: it gives many fewer genes (few false positives), but you are likely to miss many truly differentially expressed ones. It is adequate if the goal is to identify a few genes that differ between two groups.
FDR controls the proportion of false positives: if you can tolerate more false positives, you will get many fewer false negatives. It is adequate if the goal is to pursue the study further, e.g. to determine functional relationships among genes.
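The trade-off can be seen by counting how many genes each criterion selects on the same p-values. This sketch uses the Bonferroni cut-off α/m and the Benjamini-Hochberg step-up rule (reject the k smallest p-values, where k is the largest i with p_(i) ≤ iα/m). The data are hypothetical: 900 uniform null p-values plus 100 p-values drawn near 0. BH always selects at least as many genes as Bonferroni.

```python
import random


def n_bonferroni(pvals, alpha=0.05):
    """Number of genes with p <= alpha / m (controls FWER)."""
    m = len(pvals)
    return sum(1 for p in pvals if p <= alpha / m)


def n_benjamini_hochberg(pvals, alpha=0.05):
    """BH step-up: largest i such that p_(i) <= i * alpha / m (controls FDR)."""
    m = len(pvals)
    k = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= i * alpha / m:
            k = i
    return k


rng = random.Random(7)
# Hypothetical data: 900 true nulls and 100 affected genes
pvals = [rng.random() for _ in range(900)] + [rng.betavariate(0.1, 1.0) for _ in range(100)]
print("Bonferroni (FWER 5%):", n_bonferroni(pvals))
print("Benjamini-Hochberg (FDR 5%):", n_benjamini_hochberg(pvals))
```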
Steps to generate a list of candidate genes revisited (2)
Data: for each gene i = 1, …, G, the log-ratios Mi1, Mi2, …, Mik (Gene 1: M11, M12, …, M1k; Gene 2: M21, M22, …, M2k; …; Gene G: MG1, MG2, …, MGk)
1. For every gene, calculate a statistic of interest Si = t(Mi1, Mi2, …, Mik), e.g. t-statistics, S, B, …, giving S1, S2, …, SG
2. Using an assumption on the null distribution (data normality), obtain nominal p-values P1, P2, …, PG
3. Adjust for multiple testing to obtain adjusted p-values aP1, aP2, …, aPG
4. Select the genes with adjusted p-values smaller than α: this gives a list of candidate DE genes
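The four steps can be chained in a short standard-library sketch, assuming an ordinary one-sample t statistic per gene (real analyses often use moderated statistics, e.g. limma in Bioconductor) and Bonferroni adjustment. The t-distribution p-value uses the classical closed-form series for integer degrees of freedom. All function names and the toy data are hypothetical, and each gene needs at least two non-identical values.

```python
import math
from statistics import fmean, stdev


def t_two_sided_p(t, df):
    """P(|T| >= |t|) for Student's t with integer df >= 1 (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c2 = math.sin(theta), math.cos(theta) ** 2
    total, term = 0.0, 1.0
    if df % 2 == 1:
        for j in range((df - 1) // 2):  # empty loop when df == 1
            total += term
            term *= c2 * (2 * j + 2) / (2 * j + 3)
        a = 2 / math.pi * (theta + s * math.cos(theta) * total)
    else:
        for j in range(df // 2):
            total += term
            term *= c2 * (2 * j + 1) / (2 * j + 2)
        a = s * total
    return max(0.0, 1.0 - a)  # a = P(|T| <= |t|)


def select_candidate_genes(m_values, alpha=0.05):
    """m_values: dict gene -> log-ratios [M_g1, ..., M_gk].
    Returns (gene, adjusted p) for genes with Bonferroni-adjusted p < alpha."""
    n_genes = len(m_values)
    selected = []
    for gene, ms in m_values.items():
        k = len(ms)
        t = fmean(ms) / (stdev(ms) / math.sqrt(k))  # step 1: statistic S_g
        p = t_two_sided_p(t, k - 1)                  # step 2: nominal p-value
        adj_p = min(n_genes * p, 1.0)                # step 3: adjusted p-value
        if adj_p < alpha:                            # step 4: selection
            selected.append((gene, adj_p))
    return sorted(selected, key=lambda item: item[1])


data = {"g1": [2.1, 1.9, 2.0, 2.2], "g2": [0.1, -0.2, 0.05, 0.0], "g3": [-0.3, 0.2, 0.1, -0.1]}
print(select_candidate_genes(data))
```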
Example
Golub data, 27 ALL vs 11 AML samples, 3051 genes
Bonferroni adjustment: 98 genes with adjusted p < 0.05 (raw p < 0.000016)
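The raw-p cut-off in the Golub example follows directly from the Bonferroni formula: min(3051 · p, 1) < 0.05 exactly when p < 0.05 / 3051. A quick check of that arithmetic:

```python
n_genes = 3051
alpha = 0.05
raw_threshold = alpha / n_genes  # Bonferroni cut-off on the raw p-values
print(f"{raw_threshold:.6f}")  # prints 0.000016
```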
Extensions
Some issues we have not dealt with:
Replicates within and between slides
Several effects: use a linear model
ANOVA: are the effects equal?
Time series: selecting genes for trends
Different solutions have been suggested for each problem
Still many open questions
Examples
Ex. 1- Swirl zebrafish experiment
Swirl is a point mutation causing defects in the organization of the developing embryo along its ventral-dorsal axis
As a result some cell types are reduced and others are expanded
A goal of this experiment was to identify genes with altered expression in the swirl mutant compared to wild-type zebrafish
Example 1: Experimental design
Each microarray contained 8848 cDNA probes (either genes or EST sequences)
4 replicate slides: 2 sets of dye-swap pairs. For each pair, the target cDNA of the swirl mutant was labeled with one of Cy5 or Cy3 and the target cDNA of the wild type was labeled with the other dye
[Design diagram: Wild type vs Swirl, 2 hybridizations in each dye orientation]
Example 1. Data analysis
Gene expression data on 8848 genes for 4 samples (slides), each hybridized with mutant and wild-type targets
On a gene-per-gene basis this is a one-sample problem
Hypothesis to be tested for each gene: H0: log2(R/G) = 0
The decision will be based on average log-ratios
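With a dye-swap design, log2(R/G) measures swirl vs wild type on some slides and wild type vs swirl on the others, so the log-ratios need a common orientation before averaging. A minimal sketch, assuming the usual convention that R is the Cy5 (red) channel and G the Cy3 (green) channel; the helper name and the values below are hypothetical. Averaging the raw values would cancel the effect to about 0.

```python
from statistics import fmean


def oriented_log_ratios(m_values, swirl_in_red):
    """Express each slide's log2(R/G) as log2(swirl / wild type).

    m_values[j]:     log2(R/G) for one gene on slide j
    swirl_in_red[j]: True if swirl was labelled Cy5 (red) on slide j
    """
    return [m if red else -m for m, red in zip(m_values, swirl_in_red)]


# Hypothetical gene on 4 slides forming 2 dye-swap pairs
m = [1.2, -1.0, 0.9, -1.1]
design = [True, False, True, False]
oriented = oriented_log_ratios(m, design)
print("raw mean:", round(fmean(m), 3), "oriented mean:", round(fmean(oriented), 3))
```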
Example 2. Scavenger receptor BI (SR-BI) experiment
Callow et al. (2000). A study of lipid metabolism and atherosclerosis susceptibility in mice.
Transgenic mice with SR-BI gene overexpressed have low HDL cholesterol levels.
Goal: To identify genes with altered expression in the livers of transgenic mice overexpressing the SR-BI gene (T) compared to "normal" FVB control mice (C).
Example 2. Experimental design
8 treatment mice (Ti) and 8 control mice (Ci). 16 hybridizations: liver mRNA from each of the 16 mice (Ti, Ci) is labelled with Cy5, while pooled liver mRNA from the control mice (C*) is labelled with Cy3.
Probes: ~ 6,000 cDNAs (genes), including 200 related to pathogenicity.
[Design diagram: T and C each hybridized against the reference C*, 8 arrays each]
Example 2. Data analysis
Gene expression data on 6348 genes for 16 samples: 8 for treatment (log(T/C*)) and 8 for control (log(C/C*))
On a gene-per-gene basis this is a two-sample problem
Hypothesis to be tested for each gene: H0: [log(R1/G) - log(R2/G)] = 0, where R1 = T, R2 = C and G = C*
Decision will be based on average difference of log ratios
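Because both groups are measured against the same pooled reference C*, the difference of mean log-ratios estimates log(T/C): the reference term is common to both groups. The sketch below computes that difference and a pooled-variance two-sample t statistic for one gene; with 8 + 8 arrays it would be compared with a t distribution on 14 degrees of freedom. The function name and toy values are hypothetical, and the pooled (equal-variance) t-test is one choice among several (Welch's test or moderated statistics are common alternatives).

```python
import math
from statistics import fmean, variance


def reference_design_stats(m_treat, m_ctrl):
    """m_treat: log2(T_i / C*), m_ctrl: log2(C_i / C*) for one gene.
    Returns (difference of mean log-ratios, pooled two-sample t statistic)."""
    n1, n2 = len(m_treat), len(m_ctrl)
    diff = fmean(m_treat) - fmean(m_ctrl)  # estimates log2(T / C)
    pooled_var = ((n1 - 1) * variance(m_treat) + (n2 - 1) * variance(m_ctrl)) / (n1 + n2 - 2)
    t = diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff, t


print(reference_design_stats([1.0, 2.0, 3.0], [0.0, 1.0, 2.0]))
```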
Software for microarray data analysis
Introduction
Microarray experiments generate huge quantities of data which have to be stored, managed, visualized, processed, …
Many options are available. However, no tool satisfies all users' needs, so there is a trade-off. A tool must be:
powerful but user-friendly; complete but without too many options; flexible but easy to start with and to go further; available, up to date and well documented, but affordable
So, what you need is "R"?
R is an open-source system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, and access to certain system functions.
It can be used interactively, through a command language, or by running programs stored in script files.
http://www.r-project.org/
Some pros & cons
Pros: powerful and used by statisticians; easy to extend (you can create add-on packages, and many are already available); freely available for Unix, Windows & Mac; lots of documentation.
Cons: not very easy to learn (command-based, documentation sometimes cryptic); memory intensive (worst in Windows); slow at times.
We believe the effort is worth it!!!
• If you "just want to do statistical analysis", it is easy to find alternatives.
• If you intend to do microarray data analysis, R is probably one of the best options.
R and Microarrays
R is a popular tool among statisticians. Once they started to work with microarrays, they continued using it to perform the analysis and to implement new tools.
This very quickly gave rise to lots of free R-based software to analyze microarrays.
The Bioconductor project groups many (but not all) of these developments.
The Bioconductor project
An open source and open development software project for the analysis and comprehension of genomic data.
Most early developments were released as R packages. Extensive documentation and training material from short courses: http://www.bioconductor.org/workshop.html.
It has reached some stability but is still evolving!!! What is a standard now may not be one in the future.
There's much more than R!
Take a look at "My microarray software comparison": http://ihome.cuhk.edu.hk/~b400559/arraysoft.html