Page 1: Microarray Data Analysis of Illumina Data Using R/Bioconductor Reddy Gali, Ph.D. rgali@hms.harvard.edu submit-c3-bioinformatics@rt.med.harvard.edu

Microarray Data Analysis of Illumina Data Using R/Bioconductor

Reddy Gali, Ph.D.
rgali@hms.harvard.edu
submit-c3-bioinformatics@rt.med.harvard.edu

http://catalyst.harvard.edu

Agenda

• Introduction to microarrays
• Workflow of a gene expression microarray experiment
• Microarray experimental design
• Public microarray databases
• Microarray preprocessing - Quality control and Diagnostic analysis

• Introduction to R/Bioconductor
• Installation of R and Bioconductor Packages
• General data analysis and strategies
• Data analysis using lumi package
• Data analysis using limma package

Workflow of Gene Expression

Steps, in order:

• Biological question
• Experimental design
• Tissue / sample preparation
• Extraction of total RNA
• Probe amplification & labeling
• Microarray hybridization & processing
• Image analysis
• Data analysis: Expression measures - Normalization - Statistical Filtering - Clustering - Pathway analysis
• Biological verification

QC checkpoints are marked at five points along the workflow.

Pitfalls of Microarray Experiment

• Gene expression changes detected by microarray analysis sometimes cannot be validated by other methods. Possible reasons:

- Inadequate design
- Data quality is low
- Statistical approach is not adequate
- Expression level of the gene is below the detection limit
- Change in gene expression is small
- Microarray detection probe is not specific or not sensitive

Questions usually asked

• What kind of technology or microarrays do I have to use?
• How many replicates do I need?
• What is a real replicate?
• Do I need statistical advice?
• Should I do technical replicates?
• Should I pool my samples?
• How do I analyze my dataset?
• What software should I use?

Design of Microarray Experiment

• Replicates
- Goal, resources, technology, quality, design and analysis
- Two-fold change: 3 replicates
- Smaller change: 5 replicates
- Technical replicates and biological replicates

• Sample pooling
- Amount of sample
- Replicates of pooled sample
- No way to find variance between samples
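
The rule of thumb above (fewer replicates for a two-fold change, more for a smaller change) can be explored with a simple power calculation. The sketch below uses base R's power.t.test on the log2 scale. The standard deviation of 0.4 log2 units, the 5% significance level and the 80% power are illustrative assumptions, not values from the slides, and the helper name replicates_needed is hypothetical. It ignores multiple testing across thousands of probes, so a real experiment may need more replicates than it suggests.

```r
# Hypothetical helper: replicates per group needed to detect a given
# log2 fold change with a two-sample t-test (single gene, no multiple-testing
# correction). sd_log2, alpha and power are assumptions for illustration.
replicates_needed <- function(log2_fc, sd_log2, alpha = 0.05, power = 0.8) {
  res <- power.t.test(delta = log2_fc, sd = sd_log2,
                      sig.level = alpha, power = power,
                      type = "two.sample", alternative = "two.sided")
  ceiling(res$n)  # round up to whole arrays per group
}

# Two-fold change corresponds to a log2 difference of 1
n_twofold <- replicates_needed(log2_fc = 1, sd_log2 = 0.4)

# A smaller, 1.5-fold change needs more replicates at the same variability
n_small <- replicates_needed(log2_fc = log2(1.5), sd_log2 = 0.4)

print(c(two_fold = n_twofold, one_point_five_fold = n_small))
```

Changing sd_log2 to a value estimated from pilot data or a comparable public dataset shows how strongly the required number of replicates depends on sample-to-sample variability.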

Gene Expression Omnibus - GEO

Public Microarray Databases

• BodyMap - http://bodymap.ims.u-tokyo.ac.jp/
• SMD - http://genome-www5.stanford.edu/
• RIKEN - http://read.gsc.riken.go.jp/
• MGI - http://www.informatics.jax.org/
• GEO - http://www.ncbi.nlm.nih.gov/geo/
• CIBEX - http://cibex.nig.ac.jp/index.jsp
• ArrayExpress - http://www.ebi.ac.uk/microarray-as/ae/

Microarray Platforms

• Agilent Microarrays 60-mer format

• Codelink Bioarrays 30-mer format

• Affymetrix GeneChips 25-mer format

• Illumina Beadchips

• NimbleGen 60-mer format

Illumina Bead Array Technology

Silica Beads

Each bead is covered with hundreds of thousands of copies of a specific oligonucleotide

Some Facts

• Each bead carries many copies of a single probe, and every bead type is replicated about 30 times per array on average

• Around 10^5 copies of a particular DNA sequence of interest are covalently attached to each bead

• DNA sequences (oligonucleotides) attached to the beads are 75 base pairs in length, with 25 base pairs used for decoding and 50 base pairs used for target hybridization

• A pool of different bead types is created, beads of the same type having the same probe sequence attached

Box Plots of unnormalized data

Raw vs Normalized data

[Figure panels: Raw Data; Normalized Data]

Histograms of unnormalized data

Why Normalize

• Normalization adjusts the individual hybridization intensities to balance them appropriately so that meaningful biological comparisons can be made.

• Unequal quantities of starting RNA
• Differences in labeling or detection efficiencies between the fluorescent dyes used

• Systematic biases in the measured expression levels
- Sample preparation
- Variability in hybridization
- Spatial effects
- Scanner settings
- Experimenter bias
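
One common way to balance intensity distributions across arrays is quantile normalization, which these slides later list as the default normalization method of lumiExpresso. The base-R sketch below illustrates the idea on simulated data. It is not the lumi implementation: the function name quantile_normalize, the simulated sample names and the simple tie handling are assumptions made for illustration.

```r
# Minimal quantile normalization sketch (illustrative, not lumi's code).
# Rows are probes, columns are samples.
quantile_normalize <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  # Rank each value within its sample; ties are broken by order of appearance
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest value across samples
  reference <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at the same rank
  normalized <- apply(ranks, 2, function(r) reference[r])
  dimnames(normalized) <- dimnames(x)
  normalized
}

# Simulated log2 intensities for two samples with different location and scale
set.seed(1)
raw_data <- cbind(sample_a = rnorm(100, mean = 8, sd = 1),
                  sample_b = rnorm(100, mean = 9, sd = 1.5))

normalized_data <- quantile_normalize(raw_data)

# After normalization both samples share the same set of values,
# so their box plots line up
boxplot(raw_data, main = "Raw")
boxplot(normalized_data, main = "Quantile normalized")
```

Within each sample the ordering of probes is preserved; only the values assigned to each rank change.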

Free Software – Data analysis

• Bioconductor is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.

• TMEV 4.0 is an application that allows the viewing of processed microarray slide representations and the identification of genes and expression patterns of interest.

R / Bioconductor

• R and Bioconductor packages

• R (http://cran.r-project.org/) is a comprehensive statistical environment and programming language for professional data analysis and graphical display.

• Bioconductor (http://www.bioconductor.org/) is an open source and open development software project for the analysis of microarray, sequence and genome data.

• More than 300 Bioconductor packages.

• http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/R_BioCondManual.html

R / Bioconductor - Installation

Preparing R for analysis
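
The installation slides are screenshots, so the commands are not in the transcript. As a rough sketch, current Bioconductor releases are installed through the BiocManager package from CRAN. The helper missing_packages below is a hypothetical name, and the install calls are left commented out because they need internet access.

```r
# Packages used in the remaining slides
needed <- c("lumi", "limma")

# Hypothetical helper: return the packages that are not yet installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install (requires internet access):
# if (!requireNamespace("BiocManager", quietly = TRUE)) {
#   install.packages("BiocManager")
# }
# BiocManager::install(to_install)
```

After installation, library(lumi) and library(limma) load the packages for the analysis steps that follow.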

Analysis using lumi R package

- Loading data into R/Bioconductor

> library(lumi)

> lumi.Rdata <- lumiR('worshop_data.csv')

- Summary of the loaded data

> lumi.Rdata

- Quality control of loaded data

> summary(lumi.Rdata, 'QC')

> density(lumi.Rdata)   # density plot of intensities, one curve per sample

> boxplot(lumi.Rdata)   # box plot of intensities per sample

> MAplot(lumi.Rdata)   # pairwise MA plots between samples

> plot(lumi.Rdata, what='sampleRelation')   # relations between samples

> plot(lumi.Rdata, what='cv')   # density plot of the coefficient of variation

> plot(lumi.Rdata, what='outlier')   # detect outlier samples


Variance Stabilization

> lumi.Tdata <- lumiT(lumi.Rdata)

> lumi.VSdata <- plotVST(lumi.Tdata)

Normalization

> lumi.Ndata <- lumiN(lumi.Tdata)

Or do all the default preprocessing in one step

> lumi.N.Q <- lumiExpresso(lumi.Rdata)

- Background Correction: bgAdjust
- Variance Stabilizing Transform method: vst
- Normalization method: quantile
- Perform all the QC again

> summary(lumi.Ndata, 'QC')

Differential expression

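> library(limma)

# Design matrix: samples 1-4 are control, samples 5-8 are affected
> design <- model.matrix(~ -1 + factor(c(1, 1, 1, 1, 2, 2, 2, 2)))

> colnames(design) <- c("control", "affected")

# Fit a linear model to each probe
> fit <- lmFit(lumi.Ndata, design)

# Contrast of interest: affected minus control
> cont.matrix <- makeContrasts(signature = affected - control, levels = design)

> fit2 <- contrasts.fit(fit, cont.matrix)

# Moderated statistics by empirical Bayes
> ebFit <- eBayes(fit2)

# Top 100 probes by B-statistic, re-sorted by log fold change (M)
> results <- topTable(ebFit, number = 100, sort.by = "B", resort.by = "M")

> print(results)

# Tab-delimited table of up to 25000 probes with FDR-adjusted p-values
> write.table(topTable(ebFit, coef = 1, adjust = "fdr", sort.by = "B", number = 25000), file = "results.xls", row.names = FALSE, sep = "\t")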
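
To see what the design matrix and the "affected - control" contrast encode, the following base-R sketch fits the same kind of model to simulated data without limma. The expression values, probe names and sample names are made up for illustration, and ordinary least squares (lm.fit) stands in for lmFit. The moderated statistics from eBayes are not reproduced here.

```r
# Same group layout as on the slide: 4 control samples, then 4 affected
groups <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))
design <- model.matrix(~ -1 + groups)
colnames(design) <- c("control", "affected")

# Simulated log2 expression matrix: 5 probes x 8 samples (illustrative only)
set.seed(42)
expr <- matrix(rnorm(5 * 8, mean = 8), nrow = 5,
               dimnames = list(paste0("probe", 1:5), paste0("s", 1:8)))
# Shift probe1 upward in the affected group
expr["probe1", 5:8] <- expr["probe1", 5:8] + 2

# Least-squares fit per probe; with this design the coefficients
# are simply the group means
coefs <- lm.fit(design, t(expr))$coefficients

# The contrast "affected - control" is the difference of the two coefficients
log_fc <- coefs["affected", ] - coefs["control", ]

# Check against the difference of group means computed directly
manual <- rowMeans(expr[, 5:8]) - rowMeans(expr[, 1:4])
print(cbind(log_fc, manual))
```

Because the design has no intercept, each column is an indicator for one group, which is why a contrast is needed to express the comparison between groups.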

http://catalyst.harvard.edu

Reddy Gali, Ph.D.
rgali@hms.harvard.edu
617 432 7471

Thank you