gene expression analyses - welcome to sandberg...

Rickard Sandberg Gene Expression Analyses Assistant Professor Ludwig Institute for Cancer Research Department of Cell and Molecular Biology Karolinska Institutet


Page 1: Gene Expression Analyses - welcome to sandberg lab · sandberg.cmb.ki.se/media/data/courses/bioinfocell/Gene... · 2014-05-22 · Gene quantification and mRNA copy numbers in cells

Rickard Sandberg

Gene Expression Analyses

Assistant Professor Ludwig Institute for Cancer Research Department of Cell and Molecular Biology Karolinska Institutet


Outline

- microarrays

- RNA-Seq

- Common gene expression analyses steps

- clustering of samples

- differential expression tests

- enrichment tests


Transcriptome analyses

- rRNAs (dominating, ~95%)

- mRNAs (~5%)

- long non-coding RNAs (e.g. lincRNAs) (~0.05%)

- snoRNAs, snRNAs

- microRNAs, piRNAs


Different protocols identify different parts of the transcriptome

PolyA selection

- rRNAs (dominating, ~95%)

- mRNAs (~5%)

- long non-coding RNAs (e.g. lincRNAs) (~0.05%)

- snoRNAs, snRNAs

- microRNAs, piRNAs


Different protocols identify different parts of the transcriptome

Ribominus (removal of ribosomal RNAs)

not-so-random hexamers or DSN

- rRNAs (dominating, ~95%)

- mRNAs (~5%)

- long non-coding RNAs (e.g. lincRNAs) (~0.05%)

- snoRNAs, snRNAs

- microRNAs, piRNAs


Different protocols identify different parts of the transcriptome

small RNA protocol

- rRNAs (dominating, ~95%)

- mRNAs (~5%)

- long non-coding RNAs (e.g. lincRNAs) (~0.05%)

- snoRNAs, snRNAs

- microRNAs, piRNAs


DNA microarrays

§ Oligonucleotide arrays (Affymetrix, Agilent, Illumina, etc.)
§ cDNA microarrays (competitive hybridization)


Important Considerations

§ Microarrays were designed based on EST clusters
§ Probes mapping at multiple locations
§ Multiple probe sets mapping to the same gene
§ Many projects have curated microarray probes to keep only uniquely mapping ones, e.g. customCDF

http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/genomic_curated_CDF.asp


Basis of Microarrays


Steps in microarray analyses

§ Start with raw data (for Affymetrix arrays: CEL files)
§ Normalize
  → remove systematic signal-strength biases
  → often quantile normalization
§ Background adjust/transform
  → tries to estimate signal from background
  → log2 transform (ratios problem, stabilize variance)
§ Gene (or probe set) summarization
  → median polish (a robust average of the probes targeting the same gene/transcript/probe set)
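The normalization and summarization steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used by any microarray package: the function names `quantile_normalize` and `median_polish_summary` are hypothetical, ties are ranked arbitrarily, background adjustment is skipped, and median polish runs for a fixed number of iterations.

```python
import numpy as np

def quantile_normalize(x):
    """Give every array (column) the same intensity distribution.

    x: probes x arrays matrix of raw intensities.
    Ties within a column are ranked arbitrarily (a simplification).
    """
    # Rank of each probe within its own array.
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)
    # Reference distribution: mean of the sorted intensities across arrays.
    mean_sorted = np.sort(x, axis=0).mean(axis=1)
    # Replace each value by the reference value of the same rank.
    return mean_sorted[ranks]

def median_polish_summary(y, n_iter=10):
    """Summarize one probe set with Tukey's median polish.

    y: probes x arrays matrix of log2 intensities for ONE probe set.
    Returns one expression value per array (overall effect + array effect).
    """
    resid = np.array(y, dtype=float)
    overall = 0.0
    row = np.zeros(resid.shape[0])  # probe effects
    col = np.zeros(resid.shape[1])  # array effects
    for _ in range(n_iter):
        # Sweep out probe (row) medians.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        row += row_med
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep out array (column) medians.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        col += col_med
        delta = np.median(row)
        row -= delta
        overall += delta
    return overall + col
```

A typical order of use would be `np.log2(quantile_normalize(raw))` followed by `median_polish_summary` on the rows belonging to each probe set. Because medians are used throughout, a single misbehaving probe has little influence on the summary value.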


Gene Expression - Microarray data

§ Repositories of raw and processed data:
  → Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
  → ArrayExpress: http://www.ebi.ac.uk/microarray-as/ae/
§ Databases with gene expression atlases
  → Human, mouse and rat tissue atlas: SymAtlas / BioGPS, http://biogps.gnf.org/
  → Cancer gene expression atlas: Oncomine, www.oncomine.org


In what tissues is my gene expressed? Using BioGPS (formerly SymAtlas)

http://biogps.gnf.org/


Finding experiments where my gene is differentially expressed

ArrayExpress, GEO

§ Do not use updated CDFs (probe-to-transcript mappings)
§ Constantly evolving (hard to reproduce years later)
§ Offer no quality control
§ Limited capabilities for more comprehensive analyses


What are the methods measuring?

• Expressed Sequence Tags
• Traditional 3’UTR-focused microarrays
• Exon and tiling arrays
• Deep sequencing using Illumina/Solexa, SOLiD, (454)


mRNA-seq protocol

Isolate polyA+ RNA

Wang et al. 2009 Nat Rev Genet

§ polyA+ RNAs
§ rRNA- RNAs
§ short RNAs (e.g. miRNAs)
§ Ribosome footprint sequencing
§ GRO-Seq (Global Run-On sequencing)
§ CLIP-Seq (RNA-protein interactions)
§ non-RNA applications: ChIP-Seq, DNase hypersensitive sites, ...


Strand-specific RNA-Seq protocols


Mapping of splice junctions

1. Compile sets of junctions: Exon n | GTAAGT-----------AG | Exon n+1

2. Map reads towards genome + junction compilation: genome chromosome FASTA files + known and putative splice junctions FASTA file
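Compiling the junction set (step 1) can be sketched as follows. This is a minimal illustration, assuming exons are given as 0-based half-open (start, end) coordinates on the forward strand of a single chromosome; `junction_sequences` and its `flank` parameter are hypothetical names, not part of any mapper. All ordered exon pairs are emitted, so both annotated (adjacent) and putative (exon-skipping) junctions are covered.

```python
def junction_sequences(chrom_seq, exons, flank):
    """Build exon-exon junction sequences for one gene.

    chrom_seq: chromosome sequence as a string.
    exons: list of (start, end) tuples, 0-based half-open, sorted by start.
    flank: number of bases taken from each side of the junction.
    Returns a dict mapping a junction name to its sequence.
    """
    junctions = {}
    for i, (donor_start, donor_end) in enumerate(exons):
        for j in range(i + 1, len(exons)):
            acceptor_start, acceptor_end = exons[j]
            # Last `flank` bases of the upstream exon (never past its start).
            left = chrom_seq[max(donor_start, donor_end - flank):donor_end]
            # First `flank` bases of the downstream exon (never past its end).
            right = chrom_seq[acceptor_start:min(acceptor_end, acceptor_start + flank)]
            junctions[f"exon{i + 1}_exon{j + 1}"] = left + right
    return junctions
```

Writing these sequences to a FASTA file next to the chromosome files would give the combined reference for step 2. One reasonable choice is to set `flank` a few bases shorter than the read length, so that any read aligned to a junction sequence must overlap both exons.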


TopHat first method: identifying the transcriptome

1. Identify candidate exons (A, B, C) via genomic mapping

2. Generate possible pairings of exons

3. Align “unmappable” reads to possible junctions


Longer reads

[Figure: example read >HWI-EAS229_75_30DY0AAXX:7:1:0:949 with sequence segments GATGTTCTCAGTGTCC, GATGTAATCAGTGTCC and AACCCTCTCAGTGTCC, and a very long (100 kb+) intron]

By segmenting the long reads and mapping the segments independently, we can look harder for junctions we might have missed with shorter reads.

Running time is independent of intron size.


Mapping to transcriptome

[Figure: gene model on the DNA (genome, W and C strands) with exons, introns, 5’UTR and 3’UTR; transcription gives the pre-mRNA; RNA processing (splicing, polyadenylation) gives the mRNA with its AAAAA tail]


Microexons and junction coverage

[Figure: gene model on the DNA (genome, W and C strands) with exons, introns, 5’UTR and 3’UTR; in-house mapping compared with TopHat mapping]

2 or more splice junctions within the same read

Different read lengths will have different problems!


Finding novel non-annotated genes or transcript variants


Example of STAR-aligned single-cell RNA-Seq data

Mapping speed: 308 M reads / hour
% uniquely mapping: 60
% multimapping: 25
% unmapped: 15

281 719 splice junctions
279 356 with GT/AG
2 123 with GC/AG
215 with AT/AC


RNA-Seq generates quantitative expression estimates

[Figure: log10(reads) tracks for Testes, Liver, Skeletal Muscle and Heart; labels AK074759, BC011574, AK092689, 3A, 3B; <10M reads]

[Figure: Brain reads / UHR reads (RNA-Seq) versus Brain expression / UHR expression (Taqman), both axes from 10^-4 to 10^4; R = 0.953, slope = 0.933]

[Figure: bar chart for Exon, Intron and Intergenic regions (axis 0 to 15); values 12.3, 0.13 and 0.10; labels "MKPR" and "150x"]

Mortazavi et al. Nat Methods 2008; Ramskold et al. PLoS Comp Biol 2009; Wang*, Sandberg* et al. Nature 2008


How gene expression levels are estimated

[Figure: gene A (2 kb transcript) and gene B (600 bp transcript) go through fragmentation and sequencing, giving reads such as ACGCG..., TCGAG..., AGGTA..., CCGTG..., CTGCG...]

The number of fragments is proportional to the abundance and the length of the transcript.

Normalize for different transcript lengths and for different sequencing depths in different samples.

RPKM (reads per kilobase and million mappable reads). Given 10 million mappable reads:

RPKM, gene A: 500 reads x 1000/2000 x 10^6/10^7 = 500 / (2 x 10) = 25 RPKM

RPKM roughly corresponds to transcripts per cell (Mortazavi et al. 2008), assuming a standard cell with ~300,000 transcripts.

Fragments PKM (FPKM): the same normalization applied to fragments (read pairs) instead of reads
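The RPKM arithmetic for gene A can be reproduced with a few lines of Python. This is a minimal sketch; the function name `rpkm` and its argument names are illustrative choices, not taken from any particular tool.

```python
def rpkm(read_count, transcript_length_nt, mappable_reads):
    """Reads per kilobase of transcript and million mappable reads.

    Equivalent to read_count / (length in kb) / (mappable reads in millions).
    """
    return read_count * 1e9 / (transcript_length_nt * mappable_reads)

# Gene A from the slide: 500 reads, 2 kb transcript, 10 million mappable reads.
print(rpkm(500, 2000, 10_000_000))  # 25.0
```

Under the same assumptions, the 600 bp gene B would need only 150 reads to reach the same 25 RPKM, which is why the length normalization matters when comparing genes.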


Gene quantification and mRNA copy numbers in cells

$$\frac{C}{N} = \frac{X L}{T}$$

$$X = \frac{C\,T}{N\,L} = \frac{R\,T}{10^{3} \cdot 10^{6}} = \frac{R\,T}{10^{9}}$$

C, number of reads mapping to transcript
N, total number of sequenced reads
X, copies per cell of transcript
T, total length of transcriptome
L, transcript length
R, RPKM (reads per kilobase and million mappable reads)

T can be estimated from
1. starting amount of mRNA
2. spiked-in controls
3. estimated transcriptome length: 300,000 transcripts of around 1500 nt each -> 4.5 x 10^8 nt
   - 1 RPKM ~ 0.5 transcripts per cell
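The conversion between RPKM and copies per cell can be checked numerically. The sketch below assumes the slide's estimate of the transcriptome length (300,000 transcripts of about 1500 nt, so T = 4.5 x 10^8 nt) as a default; the function names are hypothetical.

```python
def copies_per_cell(rpkm_value, transcriptome_length_nt=4.5e8):
    """X = R * T / 10^9, with T the total transcriptome length in nt."""
    return rpkm_value * transcriptome_length_nt / 1e9

def copies_per_cell_from_counts(c, n, transcript_length_nt,
                                transcriptome_length_nt=4.5e8):
    """X = C * T / (N * L), the same quantity computed from raw read counts."""
    return c * transcriptome_length_nt / (n * transcript_length_nt)

# 1 RPKM corresponds to about 0.45 copies per cell, i.e. roughly 0.5.
print(copies_per_cell(1.0))
```

Both functions give the same answer for the same gene, since R is defined as 10^9 C / (N L); the value of T is the uncertain part and dominates the error of the estimate.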


Depth needed for accurate expression level estimation

[Figure: percentage of genes within ±20% of final expression versus million mapped reads (0 to 45), by expression bin: 1-9 RPKM (n=4338), 10-29 RPKM (n=3048), 30-99 RPKM (n=2817), 100-999 RPKM (n=1469), 1000-6705 RPKM (n=56)]

[Figure: percentage of genes within a given fold-change of final expression (2-fold, 1.5-fold, 1.2-fold, 1.1-fold, 1.05-fold) versus million mapped reads (0 to 45)]

Mortazavi et al. 2008; Ramskold/Kavak et al. 2011 (book chapter)


RNA sequencing of blastocyst-derived cell lines

Read counts for selected genes

Gene     ES      TS      XEN     EpiSC
Nanog    6525    20      1       263
Cdx2     124     6256    1       1
Sox17    11      5       9814    99
Sox3     151     1234    6       796
Shh      0       0       0       1
Ihh      4       12      107     17
Dhh      10      212     575     80


Significance of expression level

background RPKM ~ 0.05 RPKM
detection level of 0.3 RPKM
an average 1 500 nt transcript
20 M uniquely mapping reads

background model: 0.05 x 1.5 x 20 = 1.5 reads

expressed at 0.3 RPKM: 0.3 x 1.5 x 20 = 9 reads
binomial test for 9 reads out of 20 M mapping to the transcript, given a background probability of 1.5 / (20 x 10^6), gives a p-value of 2.8e-5

expressed at 1 RPKM: 1 x 1.5 x 20 = 30 reads
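The expected read counts and the binomial p-value can be reproduced with the Python standard library. This is a sketch under stated assumptions: the background probability is taken as 1.5 reads per 20 x 10^6 reads, which is the value that reproduces the quoted p-value; the test is one-sided (9 or more reads); and the function names are hypothetical. The upper tail is summed directly and truncated after a fixed number of terms, which is adequate here because the observed count is far above the background mean.

```python
import math

def expected_reads(rpkm_value, transcript_kb, million_reads):
    """Expected read count for a transcript at a given RPKM."""
    return rpkm_value * transcript_kb * million_reads

def binomial_upper_tail(k, n, p, extra_terms=200):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term in log space."""
    total = 0.0
    for i in range(k, min(n, k + extra_terms) + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return total

n_reads = 20_000_000
background = expected_reads(0.05, 1.5, 20)   # 1.5 reads expected from background
observed = 9                                 # reads at 0.3 RPKM: 0.3 x 1.5 x 20
p_value = binomial_upper_tail(observed, n_reads, background / n_reads)
print(f"{p_value:.1e}")  # 2.8e-05
```

With 30 reads (1 RPKM) the same calculation gives a far smaller p-value, which is the point of the comparison between the 0.3 and 1 RPKM cases.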

[Figure: examples at 0.05 RPKM and at 1 RPKM]


Mixed species/strains experiments

§ Mixed-species experiments allow mapping of host and pathogen interactions

§ Parasite-host interactions

§ Tumor-stroma interactions


Allele-sensitive RNA-seq using mouse crosses


Fusion events, e.g. translocations in cancer

Ozsolak and Milos, Nat Rev Genet 2011


Outline

- microarrays

- RNA-Seq

- Common gene expression analyses steps

- clustering of samples

- differential expression tests

- enrichment tests


Early Quality Control

[Figure: fraction of mapped reads (0.0 to 1.0) in the 20% at 5', the middle, and the 20% at 3' of genes, for SMARTer, Optimized, and variants #1 to #4]

Supplementary Figure 6. Read coverage across genes in single-cell RNA-Seq data. Fraction of reads mapping to the 20% 5’ most, the 20% 3' most, and the 60% in the middle region for all individual single-cell transcriptome data from HEK293T cells. Variant protocols are as the optimized except for differences in volume of TSO used (variant #1 uses 2 ul instead of 1 ul), template switching oligo (variant #2 uses rGrG+N, variant #4 uses rGrGrG) or preamplification enzyme (variant #3 uses Advantage 2).

[Figure, panels A to C: (A) fraction of mapped reads (0.00 to 0.12) by number of mismatches (1 to 9); (B) read mapping (STAR to hg19), reads (%) that are uniquely mapping, multimapping, or have no match; (C) genomic regions, fraction of mapped reads that are exonic, intronic, or intergenic; each panel shows SMARTer, Optimized, and variants #1 to #4]

Supplementary Figure 2. Mapping statistics for single-cell libraries generated using SMARTer, optimized Smart-Seq and variants of the optimized protocol. (A) The fraction of uniquely aligned reads with 1 to 9 mismatches for each single-cell RNA-Seq library. (B) Percentage of reads that could be aligned uniquely, aligned to multiple genomic coordinates (multimapping) or did not align for all single-cell RNA-Seq libraries. (C) The fraction of uniquely aligned reads that mapped to exonic, intronic or intergenic regions (annotations based on RefSeq gene models). Variant protocols are as the optimized except for differences in volume of TSO used (variant #1 uses 2 ul instead of 1 ul), template switching oligo (variant #2 uses rGrG+N, variant #4 uses rGrGrG) or preamplification enzyme (variant #3 uses Advantage 2).


Biological QC: look at replicates and check that samples group by origin/type

Hierarchical clustering

PCA / SVD

[Figure: samples PC3 (n=4), T24 (n=4) and LNCaP (n=4) plotted on SVD component 1 versus SVD component 2, both axes from -100 to 150]


NCI60 cell line expression clustering

[Figure: clustered heat map (scale -1.00 to 1.00) of the NCI60 cell lines, from U251 and SNB-19 through to UACC-62 and SK-MEL-2A, with tissue-of-origin labels: leukaemia, colon, melanoma, CNS, renal, ovarian, breast, prostate, non-small-lung]

Ordering pretty arbitrary

Careful about high-order clustering


Singular Value Decomposition (SVD)

[Figure: the genes x arrays expression matrix (arrays e_0m to e_390m, every 30 minutes) decomposed into genes x eigenarrays (1 to 14), eigenarrays x eigengenes (1 to 14), and eigengenes x arrays]


QC: Similarities between replicates

SVD Analysis of Mouse T-cell Stimulation

[Figure: sample projection onto eigengene 1 (52%) versus eigengene 2 (31%) for the 0 hr, 6 hr and 48 hr samples, with the eigengene profiles across 0 hr, 6 hr and 48 hr]

Captures 83% of variation
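A sample projection of this kind, together with the fraction of variation captured by each eigengene, can be computed with NumPy. This is a minimal sketch assuming a genes x samples matrix of log-scale expression values with each gene centered first; the function name `svd_sample_projection` is hypothetical and the original analysis may have used a different preprocessing.

```python
import numpy as np

def svd_sample_projection(expr, n_components=2):
    """Project samples onto the leading eigengenes.

    expr: genes x samples matrix (e.g. log2 expression values).
    Returns (coords, explained): coords is samples x n_components,
    explained is the fraction of variation captured by each component.
    """
    # Center each gene so the decomposition describes variation between samples.
    centered = expr - expr.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # Rows of vt are the eigengenes; scale them by the singular values.
    coords = (s[:, None] * vt)[:n_components].T
    return coords, explained[:n_components]
```

Replicates of the same condition should land close together in the `coords` plot, and summing the first entries of `explained` gives a figure comparable to the "captures 83% of variation" statement (52% + 31%).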


QC: Outliers

[Figure: samples labeled "Embryoid bodies" and "Sonic Hedgehog induced", with one sample marked "?"]


Differential Expression

Either based on reads or RPKM values

Most tools developed for microarrays are based on probe set expression values, whereas RNA-Seq tools aim to use read counts.

Reads
• have more statistical power
• have unresolved biases
• need fewer replicates?

Expression levels, RPKMs
• better understood statistics, but less power


Statistical models of differential expression



Transcript length effects in differential expression tests

Oshlack and Wakefield Biology Direct 2009

p-values should not be the basis for sorting


non-coding RNAs in prostate cancer: Expression and differential expression


Enrichment analyses


Goals of enrichment analyses


Factors to consider


Gene Sets, e.g. pathways and gene ontology

§ Gene Ontology
§ KEGG
§ BioCarta
§ PANTHER

§ Chromosomal location

§ Genes found differentially expressed in another experiment


Two strategies


List-based enrichment analyses

                   Gene in list    Gene NOT in list
In category        a               b
NOT in category    c               d

[Figure: diagram of all genes, the genes in the category, the gene set, and the part of the gene set in the category]
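One common way to assess such a 2x2 table is a one-sided Fisher's exact test, which is equivalent to a hypergeometric test for over-representation. The sketch below uses only the Python standard library; the function name `enrichment_p_value` is hypothetical, and this is not necessarily the exact statistic any given tool reports (DAVID, for instance, documents a modified variant).

```python
from math import comb

def enrichment_p_value(a, b, c, d):
    """One-sided Fisher's exact test for over-representation.

    a: genes in the list and in the category
    b: genes NOT in the list but in the category
    c: genes in the list but NOT in the category
    d: genes in neither
    Returns P(overlap >= a) when the list is drawn at random from all genes.
    """
    in_list = a + c
    in_category = a + b
    total = a + b + c + d
    upper = min(in_list, in_category)
    favourable = sum(
        comb(in_category, k) * comb(total - in_category, in_list - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(total, in_list)
```

The total a + b + c + d is the background gene set, so changing the background (all genes versus only expressed genes, for example) changes d and therefore the p-value.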


Assessing significance


DAVID


Query many types of gene sets in one go

Current Background: HOMO SAPIENS
Check Defaults

• Main Accessions (0 selected)
• Other Accessions (0 selected)
• Gene Ontology (3 selected)
• Protein Domains (3 selected)
• Pathways (3 selected)
• General Annotations (0 selected)
• Functional Categories (3 selected)
• Protein Interactions (0 selected)
• Literature (0 selected)
• Disease (1 selected)
• Tissue Expression


Gene set enrichment analyses (GSEA)


Molecular Signatures Database (MSigDB)


Gene Ontology analyses

§ Note: Background matters. Choosing the wrong background set of genes may affect/confound your results
§ Depends upon preselected categories
§ List-dependent methods, e.g. DAVID, http://david.abcc.ncifcrf.gov/
§ List-independent methods, e.g. GSEA, http://www.broad.mit.edu/gsea/


Questions?