generating quality metrics reports for microarray …...p < 0.01 p < 0.001 # genes 18 outlier...

29
Generating quality metrics reports for microarray data sets Audrey Kauffmann

Upload: others

Post on 28-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

Generating quality metrics reports for 

microarray data sets 

Audrey Kauffmann

Page 2: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

2

Introduction

• Microarrays are widely/routinely used

• Technology and protocol improvements → trustworthy

• Variance and noise– Technical causes:

• Platform

• Lab, experimentalist

• RNA extraction

• Amplification, labeling, hybridization, scanning...

– Biological causes:

• Tissue itself (cell lines, biopsies, blood...)

• Tissue contamination

• Clinical covariates (age, sex, race...)

• Cell cycle...

Page 3: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

3

Who is concerned?• Experimentalist investigating a set of samples

– Choose between different technology platforms, expt. Protocols– Decide when to repeat (certain parts of) the experiment

• Statistical collaborator analysing the experiment– Decide whether to proceed or to ask the experimentalist to go back to the lab

• Microarray core facility– Decide whether to consider their product fit for delivery to customer– Customer decides whether to be content (pay the bill)

• Integrative biologist analysing data in a public database– Has to choose which experiments– Which arrays within an experiment to consider

• Public data(base) provider– Put a quality score on each of her offerings

Page 4: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

4

At which step of the analysis?

• Importing the data

• Preprocessing: background correction, 

normalisation, summarization of probesets

• Differential Expression

• Gene set enrichment analysis

Page 5: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

5

At which step of the analysis?

• Importing the data

• Quality Assessment

• Preprocessing: background correction, 

normalisation, summarization of probesets

• Quality Assessment

• Differential Expression

• Gene set enrichment analysis

Page 6: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

6

At which step of the analysis?

• Importing the data

• Quality Assessment

• Preprocessing: background correction, 

normalisation, summarization of probesets

• Quality Assessment

• Differential Expression

• Gene set enrichment analysis

Remove outlier(s)

Page 7: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

7

What aspects to be evaluated?Which quality metrics?

Per Slide

• What are we looking at?– Intensity­dependent ratio– Detection of spatial effects

• How?– MAplots– Representation of the chip

Between Slides

•  What are we looking at?– Homogeneity– Outlier samples– Biological meaning

•  How?– Boxplots, density plots– Heatmap, PCA

Page 8: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

8

How to easily perform quality assessment?

• arrayQualityMetrics  ­  Bioconductor  package  for  Affymetrix, 

Agilent, Illumina, homemade arrays …

• From an R object ⇒ HTML report

• Plots:– MA plot and spatial representations

– Boxplots and density

– Heatmap and PCA

– Variance­mean dependency

– GC content and probe mapping studies

– Affymetrix only: NUSE, RLE, RNA degradation, QCstats, PM/MM

• Outlier identification

Page 9: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

9

Hands­on

• Run reports

Page 10: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

10

arrayQualityMetrics report – outlier detection

1.5 * IQR

outliers

Boxplot of the scores  S

Page 11: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

11

Fourier Transform

S i=∣ik

arrayQualityMetrics report – per array

S i=∑ lowFreqik

∑ highFreq ik

Page 12: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

12

S i =median I ik S i =IQR I ik

arrayQualityMetrics report – intensity distributions

Page 13: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

13

d ij =mean k ∣I ik−I jk∣

For each couple of arrays i and j, k is a probe and the distance between the arrays is:

S i=∑ d ij

arrayQualityMetrics report – Between arrays

Page 14: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

14

arrayQualityMetrics report – Affymetrix plots

Page 15: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

15

S i =median I ik S i =IQR I ik

arrayQualityMetrics report – Affymetrix NUSE: Normalised Unscaled Standard Error

• Fitting a probe level model (gene k, array j)

• Differences in variability between genes. An array with elevated SE 

(standard error) relative to the other arrays is of lower quality

Page 16: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

16

Why is outlier detection important? Example

• ArrayExpress experiment E­MEXP­886, cerebellar gene 

expression:

– 5 WT mice (15 weeks of age)

– 5 Atnx1 KO mice (15 weeks of age)

• Affymetrix MOE­430A (mouse) Genechip

• Ataxin 1 (Atxn1): protein of unknown function associated 

with cerebellar neurodegeneration in SpinoCerebellar 

Ataxia type 1 (SCA1), which impairs the of eye movement

Page 17: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

17

Results from E­MEXP­886 analysis

• Moderated t­test

• Most enriched KEGG Pathway

­5.65410 samples

Corrected t­value# Significant genes

Neuroactive ligand­receptor interaction

43410 samples

P < 0.001P < 0.01

# Genes

Page 18: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

18

Outlier array’s impact on results• Moderated t­test (limma)

• Most enriched KEGG Pathway

43410 samples

14190Without array 1

P < 0.001P < 0.01

# Genes

­11.5323Without array 1

­5.65410 samples

Corrected t­value# Significant genes

Neuroactive ligand­receptor interaction

Page 19: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

19

Validation of outlier’s array specificity

• Array #1 specific effect?

⇒Remove one by one each array and perform the moderated t­test

# Genes

P < 0.01 P < 0.001

10 samples 34 4

Without sample 1 190 14

Without sample 2 39 3

Without sample 3 29 2

Without sample 4 21 1

Without sample 5 12 1

Without sample 6 87 5

Without sample 7 23 4

Without sample 8 34 4

Without sample 9 17 2

Without sample 10 23 2

Page 20: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

20

Horror Picture Show

Page 21: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

21

Page 22: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

22

Page 23: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

23

Page 24: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

24

Page 25: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

25

Page 26: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

26

Page 27: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

27

Page 28: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

28

Page 29: Generating quality metrics reports for microarray …...P < 0.01 P < 0.001 # Genes 18 Outlier array’s impact on results • Moderated t test (limma) • Most enriched KEGG Pathway

29

Conclusions

• Quality assessment is important– Still needed– Give a first “taste” of the data– By removing outlier, the statistical power is increased

• arrayQualityMetrics package– One command line– Before preprocessing: to decide which normalisation– After normalisation: to check normalisation efficiency– Comprehensive report– Outlier detection