Microarrays

Post on 20-Dec-2015


Page 1: Microarrays. Regulation of Gene Expression Cells respond to environment Heat Food Supply Responds to environmental conditions Various external messages

Microarrays

Regulation of Gene Expression

Cells respond to environment

Heat

Food supply

Responds to environmental conditions

Various external messages

Where gene regulation takes place

• Opening of chromatin

• Transcription

• Translation

• Protein stability

• Protein modifications

Transcriptional Regulation

• Strongest regulation happens during transcription

• Best place to regulate: No energy wasted making intermediate products

• However, slow response time

After a receptor notices a change:

1. Cascade message to nucleus

2. Open chromatin & bind transcription factors

3. Recruit RNA polymerase and transcribe

4. Splice mRNA and send to cytoplasm

5. Translate into protein

Transcription Factors Binding to DNA

Transcription regulation:

Certain transcription factors bind DNA

Binding recognizes DNA substrings:

Regulatory motifs

Promoter and Enhancers

• Promoter necessary to start transcription

• Enhancers can affect transcription from afar

[Figure labels: RNA polymerase, TBP, TATA box, Gene X, three enhancers, DNA binding sites, transcription factors]

Example: a human heat shock protein

• TATA box: positioning transcription start

• TATA, CCAAT: constitutive transcription

• GRE: glucocorticoid response

• MRE: metal response

• HSE: heat shock element

[Figure: promoter of heat shock gene hsp70, spanning positions -158 to 0 upstream of the GENE. Motifs shown: TATA, SP1, CCAAT, AP2, HSE, AP2, CCAAT, SP1]
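Because regulatory motifs are DNA substrings, locating candidate binding sites can be sketched as a plain substring search. The sequence below is a made-up toy string, not the real hsp70 promoter, and `find_motif` is a hypothetical helper name; real motif finding usually tolerates mismatches, which this sketch does not.

```python
def find_motif(sequence, motif):
    """Return every 0-based start position where motif occurs in sequence."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append(start)
        # Continue one base further so overlapping occurrences are also found.
        start = sequence.find(motif, start + 1)
    return hits


# Toy sequence, invented for illustration only.
TOY_PROMOTER = "GGCCAATCTATAAAGCCAATG"

if __name__ == "__main__":
    for motif in ("TATA", "CCAAT"):
        print(motif, find_motif(TOY_PROMOTER, motif))
```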

The Cell as a Regulatory Network

[Figure: two genes wired as a regulatory network]

Promoter D (gene D), inputs A, B, C:

• If C then D

• If B then NOT D

• If A and B then D

• Make D

Promoter B (gene B), input D:

• If D then B

• Make B
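The two promoters can be read as a small Boolean network. The sketch below is one possible reading: the slide does not say which rule wins when several apply, so this code assumes the rules for D are applied in the order listed, with later rules overriding earlier ones, and that both genes update at the same time. The function name `step` is hypothetical.

```python
def step(state):
    """One synchronous update of the two-gene network.

    state maps "A", "B", "C", "D" to booleans (factor present or absent).
    Assumption: rules for D are applied in listed order, later ones override.
    """
    a, b, c, d = state["A"], state["B"], state["C"], state["D"]

    make_d = False
    if c:            # If C then D
        make_d = True
    if b:            # If B then NOT D
        make_d = False
    if a and b:      # If A and B then D
        make_d = True

    make_b = d       # If D then B

    return {"A": a, "B": make_b, "C": c, "D": make_d}


if __name__ == "__main__":
    state = {"A": False, "B": False, "C": True, "D": False}
    for _ in range(4):
        print(state)
        state = step(state)
```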

DNA Microarrays

Measuring gene transcription in a high-throughput fashion

Measuring transcription

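[Figure: from gene to fluorescent cDNA]

1. Gene (DNA) is transcribed into a transcript (RNA) with a poly-A tail (AAAAAAAAA) by RNA polymerase, a cellular enzyme

2. Extract RNA

3. A synthetic primer (oligo dT, TTTTTTTTT) binds the poly-A tail

4. Reverse transcriptase (RT), a retroviral enzyme, synthesizes complementary DNA (cDNA) carrying fluorescence tags

Expression ~ RNA ~ fluorescence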

What is a microarray

What is a microarray (2)

• A 2D array of DNA sequences from thousands of genes

• Each spot has many copies of same gene

• Allow mRNAs from a sample to hybridize

• Measure number of hybridizations per spot

How to make a microarray

• Method 1: Printed Slides (Stanford)

– Use PCR to amplify a 1 kb portion of each gene / EST

– Apply each sample on glass slide

• Method 2: DNA Chips (Affymetrix)

– Grow oligonucleotides (20 bp) on glass

– Several words per gene (choose unique words)

If we know the gene sequences, we can sample all genes in one experiment!

Microarray Experiment

[Figure: high-glucose and low-glucose samples, each processed by RT-PCR, applied to a DNA "chip" and read with a laser]

Raw data – images

• Red (Cy5) dot – overexpressed or up-regulated

• Green (Cy3) dot – underexpressed or down-regulated

• Yellow dot – equally expressed

• Intensity – "absolute" level

[Figure: cDNA plotted microarray]

Levels of analysis

• Level 1: Which genes are induced / repressed? Gives a good understanding of the biology. Methods: factor-2 rule, t-test.

• Level 2: Which genes are co-regulated? Inference of function. Methods: clustering algorithms.

• Level 3: Which genes regulate others? Reconstruction of networks. Methods: transcription factor binding sites.

Experiment: time course

[Figure: measurements at time 0. Rows: genes, with gene annotations. Columns: Intensity (Red) and Intensity (Green), with sample annotations]

Experiment: time course

[Figure: measurements at time 0 and time 0.5. Rows: genes. Columns: Intensity (Red) and Intensity (Green)]

Experiment: time course

[Figure: genes (rows) against time in hours (columns); tick labels as extracted: 00 0.50 20 50 70 90 110]

Gene expression database

[Figure: gene expression matrix holding the gene expression levels. Rows: genes, with gene annotations. Columns: samples, with sample annotations]

Gene expression database

[Figure: gene expression matrix. Rows: genes. Columns: samples]

Samples can be:

• Time series

• Conditions A, B, …

• Mutants in genes a, b, …

• Etc.

Data normalization

$E_i$: expression of gene x in experiment i

$R_i$: expression of gene x in reference

$X_i = \log(E_i / R_i)$

The logarithm of the ratio treats induction and repression of identical magnitude as numerically equal but with opposite sign.

red/green – ratio of expression

– 2: 2x overexpressed

– 0.5: 2x underexpressed

log2(red/green) – "log ratio"

– 1: 2x overexpressed

– -1: 2x underexpressed
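The symmetry of the log ratio is easy to check numerically. This is a minimal sketch; `log_ratio` is a hypothetical helper name and the intensities are invented for illustration.

```python
import math


def log_ratio(red, green):
    """log2 of the red/green intensity ratio for one spot."""
    return math.log2(red / green)


if __name__ == "__main__":
    # Invented intensities: 2x over, 2x under, and equal expression.
    for red, green in [(2000.0, 1000.0), (500.0, 1000.0), (1000.0, 1000.0)]:
        print(red / green, log_ratio(red, green))
```

A ratio of 2 and a ratio of 0.5 map to +1 and -1, so up- and down-regulation of the same magnitude sit at the same distance from 0.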

Analysis of multiple experiments

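$X_i = \log(E_i / R_i)$

Expression of gene x in m experiments can be represented by an expression vector with m elements:

$X = (X_1, \ldots, X_m)$

Z-transformation: if $X \sim N(\mu, \sigma)$, then

$Z = \frac{X - \mu}{\sigma}$

Estimated from the data, with $\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i$:

$\tilde{X}_i = \frac{X_i - \bar{X}}{\mathrm{stdev}(X)}$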
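As a rough sketch of the Z-transformation on one expression vector: subtract the mean and divide by the standard deviation. The slide does not say whether stdev is the sample (m - 1) or population (m) version; this example assumes the sample version. `z_transform` is a hypothetical name and the input values are invented.

```python
from statistics import mean, stdev


def z_transform(x):
    """Centre an expression vector on 0 and scale it to unit standard deviation.

    Assumption: stdev is the sample standard deviation (divides by m - 1).
    """
    x_bar = mean(x)
    s = stdev(x)
    return [(x_i - x_bar) / s for x_i in x]


if __name__ == "__main__":
    log_ratios = [1.0, 2.0, 3.0]  # invented log ratios of one gene in m = 3 experiments
    print(z_transform(log_ratios))
```

After the transformation, genes with the same expression pattern but different amplitudes become directly comparable.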

Level 1

• 2-fold rule: Is a gene 2-fold up (or down) regulated?

• Student's t-test: Is the regulation significantly different from background variation? (Needs repeated measurements)
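Both Level 1 checks can be sketched with the standard library. The example assumes a two-sample comparison (treated versus control replicates) with pooled variance, which is one common form of Student's t-test; the slides do not specify the exact variant. It computes only the t statistic, not the p-value, and all function names and measurements are illustrative.

```python
import math
from statistics import mean, variance


def log2_fold_change(treated, control):
    """log2 of the ratio of mean treated to mean control expression."""
    return math.log2(mean(treated) / mean(control))


def is_two_fold(treated, control):
    """2-fold rule: at least 2x up or 2x down, i.e. |log2 ratio| >= 1."""
    return abs(log2_fold_change(treated, control)) >= 1.0


def student_t(treated, control):
    """Two-sample t statistic with pooled variance (assumed variant)."""
    n1, n2 = len(treated), len(control)
    pooled = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


if __name__ == "__main__":
    treated = [4.0, 4.2, 3.8]   # invented replicate measurements
    control = [1.0, 1.1, 0.9]
    print(is_two_fold(treated, control), student_t(treated, control))
```

The 2-fold rule ignores replicate noise, while the t statistic grows only when the difference is large relative to that noise, which is why it needs repeated measurements.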

T-test

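$X \sim N(\mu, \sigma)$

$H_0: \bar{X} = \mu$

$H_a: \bar{X} \neq \mu$

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{m}}$

[Figure: distribution of Z with regions labelled "Cannot reject H0" and "Reject H0"]

The p-value is the probability, if the null hypothesis is true, of a result at least as extreme as the one observed. Rejecting H0 only when the p-value is below a chosen significance level bounds the probability of drawing the wrong conclusion by rejecting a true null hypothesis.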

Multiple testing

In a microarray experiment, we perform 1 test per gene, so n tests for n genes.

c: error probability of a single comparison

e: error probability of the whole experiment

Prob(correct) = $1 - c$

Prob(globally correct) = $(1 - c)^n$

Prob(wrong somewhere) = $1 - (1 - c)^n$

$e = 1 - (1 - c)^n$

For small e: $c \approx e / n$

Bonferroni correction for multiple testing of independent events
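A short numerical sketch of these formulas, assuming independent tests as the slide does. The function names are hypothetical.

```python
def experiment_error(c, n):
    """Probability of at least one wrong call in n independent tests,
    each with single-comparison error probability c."""
    return 1 - (1 - c) ** n


def bonferroni_threshold(e, n):
    """Per-test threshold c that keeps the experiment-wide error near e."""
    return e / n


if __name__ == "__main__":
    n = 20
    print(experiment_error(0.05, n))          # error when every test uses 0.05
    c = bonferroni_threshold(0.05, n)
    print(c, experiment_error(c, n))          # corrected threshold and resulting error
```

With 20 tests at 0.05 each, the chance of at least one false call is roughly 64%; dividing the threshold by n brings the experiment-wide error back to just under 0.05.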

Multiple testing

Genes Treated 1 Treated 2 Control 1 Control 2 p-value

Gene 1 0.659081 0.97234 0.372675 0.69511 0.010362

Gene 2 0.341119 0.100549 0.56026 0.285965 0.052948

Gene 3 0.667136 0.29554 0.498284 0.019279 0.150739

Gene 4 0.880788 0.871784 0.552085 0.208167 0.20722

Gene 5 0.092942 0.756629 0.488266 0.84595 0.358535

Gene 6 0.07958 0.736049 0.022873 0.406469 0.391526

Gene 7 0.534497 0.146925 0.659746 0.951731 0.401714

Gene 8 0.062087 0.678039 0.979814 0.795904 0.418683

Gene 9 0.224166 0.17082 0.650215 0.16222 0.512849

Gene 10 0.372998 0.184738 0.353879 0.451197 0.545602

Gene 11 0.537619 0.853997 0.606766 0.083149 0.556954

Gene 12 0.232855 0.77575 0.275746 0.438622 0.58056

Gene 13 0.760863 0.508516 0.823947 0.074637 0.591919

Gene 14 0.568507 0.932771 0.72373 0.027096 0.60806

Gene 15 0.838437 0.549377 0.92673 0.100789 0.623721

Gene 16 0.017407 0.723751 0.310977 0.220452 0.836162

Gene 17 0.893638 0.293472 0.542273 0.886285 0.840617

Gene 18 0.536479 0.887943 0.859521 0.382404 0.861986

Gene 19 0.675622 0.604696 0.445713 0.916473 0.904506

Gene 20 0.836653 0.397073 0.438522 0.778742 0.986562

Significance level: 0.05
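Applying the two thresholds to the 20 p-values in the table shows the effect of the correction. The p-values are copied from the table; `significant` is a hypothetical helper name.

```python
# p-values for Gene 1 .. Gene 20, copied from the table above.
P_VALUES = [
    0.010362, 0.052948, 0.150739, 0.20722, 0.358535,
    0.391526, 0.401714, 0.418683, 0.512849, 0.545602,
    0.556954, 0.58056, 0.591919, 0.60806, 0.623721,
    0.836162, 0.840617, 0.861986, 0.904506, 0.986562,
]


def significant(p_values, level):
    """Names of the genes whose p-value falls below the given level."""
    return [f"Gene {i}" for i, p in enumerate(p_values, start=1) if p < level]


if __name__ == "__main__":
    level = 0.05
    print("uncorrected:", significant(P_VALUES, level))
    print("Bonferroni: ", significant(P_VALUES, level / len(P_VALUES)))
```

At the uncorrected level one gene passes; at the Bonferroni-corrected level of 0.05 / 20 = 0.0025 none does.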

Clustering

Hierarchical clustering:

– Transforms the n (genes) * m (experiments) matrix into a symmetric n * n similarity (or distance) matrix

Similarity (or distance) measures:

– Euclidean distance

– Pearson's correlation coefficient

Eisen et al. 1998 PNAS 95:14863-14868

Vectors in space: distances

[Figure: Gene 1 and Gene 2 drawn as vectors in a space with axes Experiment 1, Experiment 2 and Experiment 3; d is the distance between them]

Distance Measures: Minkowski Metric

Suppose two objects x and y both have m features:

$x = (x_1, x_2, \ldots, x_m)$

$y = (y_1, y_2, \ldots, y_m)$

The Minkowski metric is defined by

$d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^r \right)^{1/r}$

Most Common Minkowski Metrics

1. r = 2 (Euclidean distance): $d(x, y) = \sqrt{\sum_{i=1}^{m} |x_i - y_i|^2}$

2. r = 1 (Manhattan distance): $d(x, y) = \sum_{i=1}^{m} |x_i - y_i|$

3. r → +∞ ("sup" distance): $d(x, y) = \max_{1 \le i \le m} |x_i - y_i|$

An Example

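[Figure: points x and y, 4 apart along one axis and 3 apart along the other]

1. Euclidean distance: $\sqrt{4^2 + 3^2} = 5$.

2. Manhattan distance: $4 + 3 = 7$.

3. "sup" distance: $\max\{4, 3\} = 4$.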
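The three distances in this example can be reproduced with one small function. This is a sketch; `minkowski` is a hypothetical name, and the coordinates (0, 0) and (4, 3) are chosen only to give the offsets 4 and 3 used in the example.

```python
def minkowski(x, y, r):
    """Minkowski distance of order r between equal-length vectors x and y.

    r = 1 gives Manhattan, r = 2 Euclidean, r = float("inf") the "sup" distance.
    """
    diffs = [abs(x_i - y_i) for x_i, y_i in zip(x, y)]
    if r == float("inf"):
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1 / r)


if __name__ == "__main__":
    x, y = (0, 0), (4, 3)   # assumed coordinates giving offsets 4 and 3
    print(minkowski(x, y, 2), minkowski(x, y, 1), minkowski(x, y, float("inf")))
```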

Similarity Measures: Correlation Coefficient

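$s(x, y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2 \times \sum_{i=1}^{m} (y_i - \bar{y})^2}}$

averages: $\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$ and $\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i$.

$|s(x, y)| \le 1$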
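A direct transcription of the correlation coefficient into code, as a sketch. `pearson` is a hypothetical name and the expression profiles are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient s(x, y) of two equal-length vectors."""
    m = len(x)
    x_bar = sum(x) / m
    y_bar = sum(y) / m
    numerator = sum((x_i - x_bar) * (y_i - y_bar) for x_i, y_i in zip(x, y))
    denominator = math.sqrt(
        sum((x_i - x_bar) ** 2 for x_i in x) * sum((y_i - y_bar) ** 2 for y_i in y)
    )
    return numerator / denominator


if __name__ == "__main__":
    gene_a = [1.0, 2.0, 3.0, 4.0]        # invented time course
    gene_b = [10.0, 20.0, 30.0, 40.0]    # same shape, larger amplitude
    gene_c = [4.0, 3.0, 2.0, 1.0]        # opposite trend
    print(pearson(gene_a, gene_b), pearson(gene_a, gene_c))
```

Unlike Euclidean distance, the correlation depends only on the shape of the two profiles, not on their amplitude or offset.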

Similarity Measures: Correlation Coefficient

[Figure: three panels plotting expression level against time for Gene A and Gene B]

Clustering of Genes and Conditions

• Unsupervised:

– Hierarchical clustering

– K-means clustering

– Self Organizing Maps (SOMs)

Ordered dendrograms

Hierarchical clustering:

Hypothesis: guilt-by-association

Common regulation -> common function

(Eisen et al. 1998)

Hierarchical Clustering

Given a set of n items to be clustered, and an n*n distance (or similarity) matrix, the basic process of hierarchical clustering is this:

1. Start by assigning each item to its own cluster, so that if you have n items, you now have n clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.

2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.

3. Compute distances (similarities) between the new cluster and each of the old clusters.

4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size n.

Merge two clusters by:

• Single-Link Method / Nearest Neighbor (NN): minimum of pairwise dissimilarities

• Complete-Link / Furthest Neighbor (FN): maximum of pairwise dissimilarities

• Unweighted Pair Group Method with Arithmetic Mean (UPGMA): average of pairwise dissimilarities

Single-Link Method

Distance matrix (Euclidean distance; symmetric n*n, upper triangle shown):

      b   c   d
a     2   5   6
b         3   5
c             4

(1) Merge a and b (distance 2):

        c   d
a,b     3   5
c           4

(2) Merge a,b and c (distance 3):

          d
a,b,c     4

(3) Merge a,b,c and d (distance 4): a,b,c,d

Complete-Link Method

Distance matrix (Euclidean distance; upper triangle shown):

      b   c   d
a     2   5   6
b         3   5
c             4

(1) Merge a and b (distance 2):

        c   d
a,b     5   6
c           4

(2) Merge c and d (distance 4):

        c,d
a,b     6

(3) Merge a,b and c,d (distance 6): a,b,c,d
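The merge sequences of the four-item example can be checked with a naive agglomerative sketch. It recomputes cluster distances from the original pairwise distances on every pass, which is simple rather than efficient, and it breaks ties by taking the first closest pair found. The function name `agglomerate` and the `DIST` layout are illustrative choices, not part of the slides; the distances are those of the a, b, c, d example.

```python
from itertools import combinations

# Pairwise distances of the four-item example (a, b, c, d).
DIST = {
    frozenset(pair): value
    for pair, value in {
        ("a", "b"): 2, ("a", "c"): 5, ("a", "d"): 6,
        ("b", "c"): 3, ("b", "d"): 5, ("c", "d"): 4,
    }.items()
}

LINKAGE = {
    "single": min,                                # nearest neighbour
    "complete": max,                              # furthest neighbour
    "average": lambda v: sum(v) / len(v),         # UPGMA
}


def agglomerate(items, dist, linkage="single"):
    """Return the merges as (cluster_1, cluster_2, height), bottom-up."""
    combine = LINKAGE[linkage]
    clusters = [frozenset([item]) for item in items]   # step 1: one cluster per item
    merges = []

    def cluster_distance(c1, c2):
        return combine([dist[frozenset((p, q))] for p in c1 for q in c2])

    while len(clusters) > 1:
        # Step 2: find the closest pair of clusters and merge it.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(c1), sorted(c2), cluster_distance(c1, c2)))
        # Step 3 is implicit: distances to the new cluster are recomputed on demand.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


if __name__ == "__main__":
    for method in ("single", "complete"):
        print(method, agglomerate("abcd", DIST, method))
```

The merge heights are the levels at which branches join in the corresponding dendrogram.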

Compare Dendrograms

[Figure: single-link and complete-link dendrograms over the leaves a, b, c, d, with a height axis from 0 to 6]

Serum stimulation of human fibroblasts (24h)

[Figure: clustered expression data with clusters labelled Cholesterol biosynthesis, Cell cycle, I-E response, Signalling / Angiogenesis, Wound healing]

Partitioning

• k-means clustering

• Self organizing maps (SOMs)

k-means clustering

Tavazoie et al. 1999 Nature Genet. 22:281-285

k-Means Clustering Algorithm

1) Select an initial partition of k clusters

2) Assign each object to the cluster with the closest centre

3) Compute the new centres of the clusters

4) Repeat steps 2 and 3 until no object changes cluster
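The four steps can be sketched in plain Python. The slides do not say how the initial partition is chosen; this example assumes k randomly picked data points as initial centres and squared Euclidean distance for "closest". The names `kmeans` and `squared_distance` and the sample points are illustrative.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Return (assignment, centres) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 1 (assumption): use k random data points as the initial centres.
    centres = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the closest centre.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        # Step 4: stop when no object changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: compute the new centre (mean) of every non-empty cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centres


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # invented points
    labels, centres = kmeans(data, k=2)
    print(labels, centres)
```

The result depends on the starting centres, so in practice the algorithm is often run several times with different seeds.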

[Figures: successive k-means iterations with k = 6, showing centroids 1 to 6]

Self organizing maps

Tamayo et al. 1999 PNAS 96:2907-2912

[Figures: self-organizing map with centroids 1 to 6 arranged on a 2 x 3 grid, k = (2,3) = 6]