
Post on 20-Dec-2015


Microarrays

Regulation of Gene Expression

Cells respond to environment

Heat

Food supply

Responds to environmental conditions

Various external messages

Where gene regulation takes place

• Opening of chromatin

• Transcription

• Translation

• Protein stability

• Protein modifications

Transcriptional Regulation

• Strongest regulation happens during transcription

• Best place to regulate: No energy wasted making intermediate products

• However, slow response time

After a receptor notices a change:

1. Cascade message to nucleus

2. Open chromatin & bind transcription factors

3. Recruit RNA polymerase and transcribe

4. Splice mRNA and send to cytoplasm

5. Translate into protein

Transcription Factors Binding to DNA

Transcription regulation:

Certain transcription factors bind DNA

Binding recognizes DNA substrings: regulatory motifs

Figure: transcription factors bound to DNA, with RNA polymerase and TBP

Promoter and Enhancers

• Promoter necessary to start transcription

• Enhancers can affect transcription from afar

Figure: three enhancers and a TATA box upstream of gene X; the enhancers are DNA binding sites for transcription factors

Example: A Human heat shock protein

• TATA box: positioning transcription start

• TATA, CCAAT: constitutive transcription

• GRE: glucocorticoid response

• MRE: metal response

• HSE: heat shock element

Figure: promoter of heat shock gene hsp70, positions -158 to 0 upstream of the gene. Motifs shown: TATA, SP1, CCAAT, AP2, HSE

The Cell as a Regulatory Network

Gene D (promoter D): Make D

If C then D

If B then NOT D

If A and B then D

Gene B (promoter B): Make B

If D then B

Figure: network with factors A, B, C and D acting on promoter D and promoter B
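The rules on this slide read like Boolean update rules, so a toy simulation may help. The sketch below is only one possible reading: the slide does not say how "If B then NOT D" and "If A and B then D" combine, so the rule `D = (A and B) or (C and not B)` and the synchronous update scheme are assumptions made here for illustration.

```python
def step(state):
    """One synchronous update of the toy network (rule combination is assumed)."""
    a, b, c, d = state["A"], state["B"], state["C"], state["D"]
    # Assumed reading: C makes D unless B represses it; A and B together make D.
    new_d = (a and b) or (c and not b)
    # If D then B.
    new_b = d
    return {"A": a, "B": new_b, "C": c, "D": new_d}


# Hypothetical starting condition: only C is present.
state = {"A": False, "B": False, "C": True, "D": False}
history = [state]
for _ in range(4):
    state = step(state)
    history.append(state)

for t, s in enumerate(history):
    print(t, s)
```

Under these assumptions D switches on, induces B, and B then represses D, so the network returns to its starting state after four updates.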

DNA Microarrays

Measuring gene transcription in a high-throughput fashion

Measuring transcription

Gene (DNA)

Transcript (RNA) with poly-A tail (AAAAAAAAA), made by RNA polymerase – cellular enzyme

Extract RNA

Synthetic primer (oligo dT, TTTTTTTTT) pairs with the poly-A tail

Reverse transcriptase (RT) – retroviral enzyme – makes complementary DNA (cDNA) with fluorescence tags

Expression ~ RNA ~ fluorescence

What is a microarray

• A 2D array of DNA sequences from thousands of genes

• Each spot has many copies of same gene

• Allow mRNAs from a sample to hybridize

• Measure number of hybridizations per spot

How to make a microarray

• Method 1: Printed Slides (Stanford)

– Use PCR to amplify a 1 kb portion of each gene / EST

– Apply each sample on glass slide

• Method 2: DNA Chips (Affymetrix)

– Grow oligonucleotides (20bp) on glass

– Several words per gene (choose unique words)

If we know the gene sequences, we can sample all genes in one experiment!

Microarray Experiment

Figure: high-glucose and low-glucose samples, each processed by RT-PCR, applied to a DNA “chip” and read with a laser

Raw data – images

• Red (Cy5) dot – overexpressed or up-regulated

• Green (Cy3) dot – underexpressed or down-regulated

• Yellow dot – equally expressed

• Intensity – “absolute” level

Figure: cDNA plotted microarray

Levels of analysis

• Level 1: Which genes are induced / repressed? Gives a good understanding of the biology. Methods: factor-2 rule, t-test.

• Level 2: Which genes are co-regulated? Inference of function. Methods: clustering algorithms.

• Level 3: Which genes regulate others? Reconstruction of networks. Methods: transcription factor binding sites.

Experiment: time course

Figure: red and green intensities per gene, measured at successive time points (Time 0, Time 0.5, ...) along a time axis in hours, with sample annotations and gene annotations

Gene expression database

Gene expression matrix: genes (rows) by samples (columns), holding gene expression levels, together with gene annotations and sample annotations

Samples: time series, conditions A, B, …, mutants in genes a, b, …, etc.

Data normalization

$E_i$: expression of gene x in experiment i

$R_i$: expression of gene x in reference

Logarithm of ratio: treats induction and repression of identical magnitude as numerically equal but with opposite sign.

red/green – ratio of expression

– 2: 2x overexpressed

– 0.5: 2x underexpressed

log2(red/green) – “log ratio”

– 1: 2x overexpressed

– -1: 2x underexpressed

$X_i = \log(E_i / R_i)$
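A small sketch of the log ratio may make the symmetry concrete. The function name `log_ratio` is a placeholder chosen here, and base 2 is used because the slide's "log ratio" bullet uses log2.

```python
import math


def log_ratio(red, green):
    """Return log2(red / green) for one spot; both intensities must be positive."""
    return math.log2(red / green)


# Two-fold induction and two-fold repression differ only in sign.
up = log_ratio(2.0, 1.0)
down = log_ratio(1.0, 2.0)
same = log_ratio(3.0, 3.0)
print(up, down, same)
```

With the raw ratio, induction ranges over (1, ∞) while repression is squeezed into (0, 1); the logarithm puts both on the same scale.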

Analysis of multiple experiments

$X_i = \log(E_i / R_i)$

Expression of gene x in m experiments can be represented by an expression vector with m elements:

$X = (X_1, \ldots, X_m)$

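Z-transformation:

$\bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i$

$X'_i = \frac{X_i - \bar{X}}{\mathrm{stdev}(X)}$

If $X \sim N(\mu, \sigma)$, then

$Z = \frac{X - \mu}{\sigma}$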
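As an illustration of the Z-transformation of one expression vector, here is a minimal sketch. It assumes the sample standard deviation (n - 1 in the denominator); the slide does not say which estimator is meant, and `z_transform` is a name chosen here.

```python
import statistics


def z_transform(values):
    """Centre an expression vector on its mean and scale by its standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]


# Hypothetical log ratios of one gene across three experiments.
z = z_transform([1.0, 2.0, 3.0])
print(z)
```

After the transformation every gene has mean 0 and standard deviation 1, so genes with different absolute amplitudes become comparable.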

Level 1

• 2-fold rule: Is a gene 2-fold up (or down) regulated?

• Student's t-test: Is the regulation significantly different from background variation? (Needs repeated measurements)
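The 2-fold rule is easy to state on log2 ratios: a gene is called when its log ratio is at least 1 or at most -1. The sketch below uses hypothetical gene names and values; `two_fold_calls` is a name chosen here.

```python
def two_fold_calls(log2_ratios, threshold=1.0):
    """Label each gene 'up', 'down' or 'unchanged' using the 2-fold rule."""
    calls = {}
    for gene, x in log2_ratios.items():
        if x >= threshold:
            calls[gene] = "up"
        elif x <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical log2(red/green) values.
calls = two_fold_calls({"geneA": 1.6, "geneB": -1.2, "geneC": 0.4})
print(calls)
```

The rule needs no replicates, but it also ignores measurement noise, which is what the t-test addresses.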

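T-test

$X \sim N(\mu, \sigma)$

$H_0: \bar{X} = \mu$

$H_a: \bar{X} \neq \mu$

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{m}}$

Figure: distribution of the test statistic with regions labelled "Cannot reject H0" and "Reject H0"

The p-value is the probability, if the null hypothesis is true, of obtaining a result at least as extreme as the one observed. Rejecting H0 only when the p-value is below the significance level limits the risk of drawing the wrong conclusion to that level.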
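The test statistic can be sketched with the standard library alone. This example assumes σ is known, so the statistic is compared against a standard normal distribution; with σ estimated from a few replicates one would use Student's t distribution instead, which the standard library does not provide. The function name and the replicate values are hypothetical.

```python
import math
from statistics import NormalDist


def z_test(values, mu, sigma):
    """Two-sided test of H0: mean == mu, assuming a known standard deviation sigma."""
    m = len(values)
    mean = sum(values) / m
    z = (mean - mu) / (sigma / math.sqrt(m))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical replicate log ratios of one gene, tested against "no change" (mu = 0).
z_stat, p_value = z_test([1.0, 1.0, 1.0, 1.0], mu=0.0, sigma=1.0)
print(z_stat, p_value)
```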

Multiple testing

In a microarray experiment, we perform 1 test / gene

Prob (correct) = $1 - c$

Prob (globally correct) = $(1 - c)^n$

Prob (wrong somewhere) = $1 - (1 - c)^n$

$e = 1 - (1 - c)^n$

For small e: $c \approx e / n$

Bonferroni correction for multiple testing of independent events

c: error rate of a single comparison

e: error rate of the whole experiment (n comparisons)

Multiple testing

Genes Treated 1 Treated 2 Control 1 Control 2 p-value

Gene 1 0.659081 0.97234 0.372675 0.69511 0.010362

Gene 2 0.341119 0.100549 0.56026 0.285965 0.052948

Gene 3 0.667136 0.29554 0.498284 0.019279 0.150739

Gene 4 0.880788 0.871784 0.552085 0.208167 0.20722

Gene 5 0.092942 0.756629 0.488266 0.84595 0.358535

Gene 6 0.07958 0.736049 0.022873 0.406469 0.391526

Gene 7 0.534497 0.146925 0.659746 0.951731 0.401714

Gene 8 0.062087 0.678039 0.979814 0.795904 0.418683

Gene 9 0.224166 0.17082 0.650215 0.16222 0.512849

Gene 10 0.372998 0.184738 0.353879 0.451197 0.545602

Gene 11 0.537619 0.853997 0.606766 0.083149 0.556954

Gene 12 0.232855 0.77575 0.275746 0.438622 0.58056

Gene 13 0.760863 0.508516 0.823947 0.074637 0.591919

Gene 14 0.568507 0.932771 0.72373 0.027096 0.60806

Gene 15 0.838437 0.549377 0.92673 0.100789 0.623721

Gene 16 0.017407 0.723751 0.310977 0.220452 0.836162

Gene 17 0.893638 0.293472 0.542273 0.886285 0.840617

Gene 18 0.536479 0.887943 0.859521 0.382404 0.861986

Gene 19 0.675622 0.604696 0.445713 0.916473 0.904506

Gene 20 0.836653 0.397073 0.438522 0.778742 0.986562

Significance level: 0.05
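To see what the correction does to this table, the sketch below applies both the uncorrected level and the Bonferroni level to the 20 p-values listed above. The variable names are chosen here for illustration; the p-values are copied from the table.

```python
p_values = [
    0.010362, 0.052948, 0.150739, 0.20722, 0.358535,
    0.391526, 0.401714, 0.418683, 0.512849, 0.545602,
    0.556954, 0.58056, 0.591919, 0.60806, 0.623721,
    0.836162, 0.840617, 0.861986, 0.904506, 0.986562,
]
genes = [f"Gene {i}" for i in range(1, len(p_values) + 1)]

alpha = 0.05
n = len(p_values)

# Probability of at least one false positive if every test uses alpha.
wrong_somewhere = 1 - (1 - alpha) ** n

# Bonferroni: per-test level c = e / n.
per_test_level = alpha / n

uncorrected = [g for g, p in zip(genes, p_values) if p < alpha]
corrected = [g for g, p in zip(genes, p_values) if p < per_test_level]

print(round(wrong_somewhere, 3), per_test_level)
print(uncorrected, corrected)
```

With 20 tests at level 0.05 the chance of at least one false positive is roughly 64%, and no gene in the table survives the corrected level of 0.0025.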

Clustering

Hierarchical clustering: transforms the n (genes) * m (experiments) matrix into a symmetric n * n similarity (or distance) matrix

Similarity (or distance) measures:

Euclidean distance

Pearson's correlation coefficient

Eisen et al. 1998 PNAS 95:14863-14868

Vectors in space: distances

Figure: Gene 1 and Gene 2 as points in a space with axes Experiment 1, Experiment 2 and Experiment 3, separated by distance d

Distance Measures: Minkowski Metric

Suppose two objects x and y both have m features:

$x = (x_1, x_2, \ldots, x_m)$

$y = (y_1, y_2, \ldots, y_m)$

The Minkowski metric is defined by

$d(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^r \right)^{1/r}$

Most Common Minkowski Metrics

1. r = 2 (Euclidean distance): $d(x, y) = \sqrt{\sum_{i=1}^{m} |x_i - y_i|^2}$

2. r = 1 (Manhattan distance): $d(x, y) = \sum_{i=1}^{m} |x_i - y_i|$

3. r → +∞ (“sup” distance): $d(x, y) = \max_{1 \le i \le m} |x_i - y_i|$

An Example

Figure: points x and y separated by 4 along one axis and 3 along the other

1. Euclidean distance: $\sqrt{4^2 + 3^2} = 5$

2. Manhattan distance: $4 + 3 = 7$

3. “sup” distance: $\max\{4, 3\} = 4$
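The three distances in this example can be reproduced with one function. In this sketch the coordinates of x and y are hypothetical (the slide only gives the differences 4 and 3), and `minkowski` is a name chosen here.

```python
import math


def minkowski(x, y, r):
    """Minkowski distance of order r; r = math.inf gives the 'sup' distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(r):
        return max(diffs)
    return sum(d ** r for d in diffs) ** (1.0 / r)


# Hypothetical coordinates that differ by 4 and 3.
x, y = (0.0, 0.0), (4.0, 3.0)
euclidean = minkowski(x, y, 2)
manhattan = minkowski(x, y, 1)
sup = minkowski(x, y, math.inf)
print(euclidean, manhattan, sup)
```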

Similarity Measures: Correlation Coefficient

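$s(x, y) = \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2 \times \sum_{i=1}^{m} (y_i - \bar{y})^2}}$

averages: $\bar{x} = \frac{1}{m} \sum_{i=1}^{m} x_i$ and $\bar{y} = \frac{1}{m} \sum_{i=1}^{m} y_i$.

$|s(x, y)| \le 1$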
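A direct implementation of the correlation coefficient shows why it is popular for expression profiles: it compares shape and ignores offset and scale. The profiles below are hypothetical, and `pearson` is a name chosen here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (fails if either is constant)."""
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return num / den


gene_a = [1.0, 2.0, 3.0]
shifted = [11.0, 12.0, 13.0]   # same shape, higher level
mirrored = [3.0, 2.0, 1.0]     # opposite trend

same_shape = pearson(gene_a, shifted)
opposite = pearson(gene_a, mirrored)
print(same_shape, opposite)
```

Two genes with the same time profile at very different expression levels are far apart in Euclidean distance but have correlation 1.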

Figure: three panels plotting expression level against time for Gene A and Gene B

Clustering of Genes and Conditions

• Unsupervised:

– Hierarchical clustering

– K-means clustering

– Self Organizing Maps (SOMs)

Ordered dendrograms

Hierarchical clustering:

Hypothesis: guilt-by-association

Common regulation -> common function

Eisen et al. 1998

Hierarchical Clustering

Given a set of n items to be clustered, and an n*n distance (or similarity) matrix, the basic process of hierarchical clustering is this:

1. Start by assigning each item to its own cluster, so that if you have n items, you now have n clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain.

2. Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster.

3. Compute distances (similarities) between the new cluster and each of the old clusters.

4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size n.

Merge two clusters by:

• Single-Link Method / Nearest Neighbor (NN): minimum of pairwise dissimilarities

• Complete-Link / Furthest Neighbor (FN): maximum of pairwise dissimilarities

• Unweighted Pair Group Method with Arithmetic Mean (UPGMA): average of pairwise dissimilarities

Single-Link Method

Triangular n*n distance matrix (Euclidean distance):

      b   c   d
a     2   5   6
b         3   5
c             4

(1) Merge a and b (distance 2):

      c   d
a,b   3   5
c         4

(2) Merge a,b and c (distance 3):

        d
a,b,c   4

(3) Merge a,b,c and d (distance 4): a,b,c,d

Complete-Link Method

Distance matrix (Euclidean distance):

      b   c   d
a     2   5   6
b         3   5
c             4

(1) Merge a and b (distance 2):

      c   d
a,b   5   6
c         4

(2) Merge c and d (distance 4):

      c,d
a,b   6

(3) Merge a,b and c,d (distance 6): a,b,c,d

Compare Dendrograms

Figure: dendrograms over items a, b, c, d on a height axis from 0 to 6, one for Single-Link and one for Complete-Link
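The difference between the two dendrograms can be reproduced with a short agglomerative loop. This is a minimal sketch, not an optimized implementation: it assumes the four-item distance matrix d(a,b)=2, d(a,c)=5, d(a,d)=6, d(b,c)=3, d(b,d)=5, d(c,d)=4, and the function name `agglomerate` is chosen here. Passing `min` gives single-link, `max` gives complete-link.

```python
def agglomerate(items, dist, linkage=min):
    """Merge the closest pair of clusters until one remains.

    dist maps frozenset({a, b}) to the distance between items a and b.
    Returns a list of (sorted members of the new cluster, merge height).
    """
    clusters = [frozenset([i]) for i in items]

    def cluster_dist(c1, c2):
        # min -> single link, max -> complete link
        return linkage(dist[frozenset((a, b))] for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        pairs = [
            (cluster_dist(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        ]
        height, i, j = min(pairs)
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        merges.append((sorted(merged), height))
    return merges


d = {
    frozenset("ab"): 2, frozenset("ac"): 5, frozenset("ad"): 6,
    frozenset("bc"): 3, frozenset("bd"): 5, frozenset("cd"): 4,
}
single = agglomerate("abcd", d, linkage=min)
complete = agglomerate("abcd", d, linkage=max)
print(single)
print(complete)
```

Single-link chains c onto a,b at height 3, while complete-link pairs c with d first and joins the two pairs at height 6.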

Serum stimulation of human fibroblasts (24h)

Cholesterol biosynthesis

Cell cycle

I-E response

Signalling / Angiogenesis

Wound healing

Partitioning

• k-means clustering

• Self organizing maps (SOMs)

k-means clustering

Tavazoie et al. 1999 Nature Genet. 22:281-285

k-Means Clustering Algorithm

1) Select an initial partition of k clusters

2) Assign each object to the cluster with the closest centre

3) Compute the new centres of the clusters

4) Repeat steps 2 and 3 until no object changes cluster
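The four steps above can be sketched in a few lines. This version is an illustration under stated assumptions: the initial partition is given implicitly by a list of starting centres passed in by the caller, Euclidean distance is used, and the function name, the points and the starting centres are all hypothetical.

```python
import math


def kmeans(points, centres, max_iter=100):
    """Plain k-means; returns (cluster index per point, final centres)."""
    centres = list(centres)
    k = len(centres)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each object to the cluster with the closest centre.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points
        ]
        # Step 4: stop when no object changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each centre as the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centres


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Deliberately poor start: both centres lie in the same group.
labels, final_centres = kmeans(points, centres=[(0, 0), (0, 1)])
print(labels, final_centres)
```

Even from this poor start the two groups are recovered after two reassignments; in general the result depends on the starting centres, so k-means is usually run from several starts.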

Figure: k-means with k = 6, showing centroids 1 to 6 over successive iterations

Self organizing maps

Tamayo et al. 1999 PNAS 96:2907-2912

Figure: self-organizing map with centroids 1 to 6 arranged on a 2 x 3 grid, k = (2,3) = 6
