
11/11/2014 Jack Binysh MRC Internship

Analysis of Promoter Shifting

Using CAGE data

An insight into transcription regulation


Outline

• Background

Introduction to Promoters and Transcription Start Sites (TSS’s)

Classification of Promoters

Motivation for Project

• Project

CAGE data

Previous work

CAGEr

Results

Future Work


Promoters and TSS’s

How is regulation achieved?

• Promoter region

Contains regulatory elements (binding motifs, CG enrichment…)

Controls gene expression

• Context specific

• Dynamic

• Associated epigenetics

Histone placement

Histone ‘marks’


Classification

• Correlation between several features

Broad vs Sharp

CG islands vs TATA

Ordered vs Disordered Histones

General vs Specific function


CAGE Data

• Cap Analysis of Gene Expression

• mRNA captured, first ~20 bp sequenced from 5’ end → Tags

Full length estimated

Tags mapped to Genome

• TSS determination at bp resolution.

• Genome-wide mapping of mRNA transcription

• FANTOM5 – CAGE datasets for many cell types


A ‘TSS Profile’


Motivation for Project

• Already known that:

One gene may have multiple types of promoter → regulated in several ways

Variants of transcription factors may exist in different cells


Do the rules governing transcription change between cell types, both temporally (embryonic development) and spatially (in adult tissues)?

• Focus on housekeeping genes (always expressed)

• Look for changes in TSS profile between cell types…


CAGEr

Diagram: CAGEr workflow

Input: CAGE bam files, CTSS files, available resources, custom input

Methods: CAGEset

Output: TSS, tag clusters (TC), normalized expression


Clustering in CAGEr

• Two levels of clustering

TSS profiles → Tag clusters

Tag clusters → Consensus clusters

Diagram: CTSSs in samples S1, S2, S3 are grouped into TCs, which are merged into one consensus cluster

• Tag clusters are sample specific

• Consensus clusters are the same for all samples
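The two levels can be illustrated with a minimal Python sketch. This is not CAGEr's implementation: the function names, the toy CTSS positions and the 20 bp gap threshold are assumptions chosen for illustration, and expression levels and strands are ignored.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, inclusive


def tag_clusters(ctss_positions: List[int], max_gap: int = 20) -> List[Interval]:
    """Level 1 (per sample): chain CTSSs that lie within max_gap bp of each other."""
    clusters: List[Interval] = []
    for pos in sorted(ctss_positions):
        if clusters and pos - clusters[-1][1] <= max_gap:
            clusters[-1] = (clusters[-1][0], pos)  # extend the current cluster
        else:
            clusters.append((pos, pos))  # start a new cluster
    return clusters


def consensus_clusters(per_sample: Dict[str, List[Interval]]) -> List[Interval]:
    """Level 2 (across samples): merge overlapping tag clusters into shared intervals."""
    merged: List[Interval] = []
    all_tcs = sorted(iv for tcs in per_sample.values() for iv in tcs)
    for start, end in all_tcs:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical CTSS positions on one chromosome and strand.
samples = {
    "S1": [100, 105, 112, 400],
    "S2": [108, 115, 130],
    "S3": [395, 402, 410],
}
tcs = {name: tag_clusters(positions) for name, positions in samples.items()}
consensus = consensus_clusters(tcs)
print(tcs)
print(consensus)
```

Each sample keeps its own tag clusters, while the consensus intervals give a common coordinate system in which TSS profiles from different samples can be compared.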


Extending CAGEr


• Large datasets

• 1 sample ~ 46 million tag sites

• FANTOM5 has hundreds of samples

• Pairwise comparison of datasets is O(n²)

• If 1 comparison takes ~1 hour, 60 samples take ~10 weeks!

• Need to speed things up, avoid doing every comparison, etc.
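The ~10 week figure follows from counting unordered pairs of samples. A short check, assuming the comparisons run one after another around the clock (the function name is illustrative):

```python
from math import comb


def pairwise_cost(n_samples: int, hours_per_comparison: float = 1.0):
    """Return (number of unordered sample pairs, total run time in weeks)."""
    comparisons = comb(n_samples, 2)  # n * (n - 1) / 2
    hours = comparisons * hours_per_comparison
    return comparisons, hours / (24 * 7)


comparisons, weeks = pairwise_cost(60)
print(comparisons, round(weeks, 1))
```

For 60 samples this gives 1770 comparisons, i.e. about 10.5 weeks at one hour each.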


Dendrogram


• 67 cell types compared

• Most show very little shifting (the ‘bulk’)

• ~8 ‘outliers’

Cardiac Myocytes

Sertoli Cells

Hepatocytes

Hair follicle papilla

CD19

Renal Glomerular

Neurons

Aortic Endothelial


Heatmap


• Each outlier is separated from every other cell type

• The difference between two outliers is greater than the difference between one outlier and the ‘bulk’

• Suggests a different set of shifting promoters in every outlier


Scatter plots


Dinucleotide Density plots

• Each cluster has two dominant TSS’s: modal and sample specific

• Look for dinucleotide enrichment in sequences

• Initiator sequence at modal CTSS visible

• No obvious motif at the TSS of the outlier
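A positional dinucleotide density of this kind can be sketched as follows. The function name and the toy sequences are assumptions for illustration; real input would be equal-length genomic windows centred on either the modal or the outlier TSS.

```python
from typing import List, Sequence


def dinucleotide_density(seqs: List[str], dinucs: Sequence[str]) -> List[float]:
    """Fraction of sequences in which one of `dinucs` starts at each position."""
    if not seqs:
        raise ValueError("need at least one sequence")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    wanted = {d.upper() for d in dinucs}
    density = []
    for i in range(length - 1):
        hits = sum(1 for s in seqs if s[i:i + 2].upper() in wanted)
        density.append(hits / len(seqs))
    return density


# Toy windows that all carry a pyrimidine-purine "CA" step at the same offset.
windows = ["CCTCAGTC", "GGTCATTG", "AATCAGAA"]
ca_density = dinucleotide_density(windows, ["CA"])
print(ca_density)
```

A position-specific signal such as an initiator shows up as a sharp peak in the density, whereas a flat profile around the outlier TSS would match the "no obvious motif" observation.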

Figure panels: Cardiac Myocytes, centered on outlier TSS; Cardiac Myocytes, centered on modal TSS


Motif discovery

• Motif discovery finds no specific motifs within 500 bp either side of the outlier TSS in any of the samples

• All of the samples show general GC enrichment

• ~80% of clusters overlap 1 annotated CpG island, ~20% overlap none
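The overlap percentages can be computed with a simple interval test. The sketch below assumes inclusive coordinates on a single chromosome and uses made-up intervals; the function names are illustrative and not taken from the project code.

```python
from collections import Counter
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) in bp, inclusive


def count_overlaps(cluster: Interval, islands: List[Interval]) -> int:
    """Number of CpG islands sharing at least one base with the cluster."""
    start, end = cluster
    return sum(1 for s, e in islands if s <= end and start <= e)


def overlap_summary(clusters: List[Interval], islands: List[Interval]) -> Dict[int, float]:
    """Map 'number of islands overlapped' to the fraction of clusters with that count."""
    counts = Counter(count_overlaps(c, islands) for c in clusters)
    return {k: v / len(clusters) for k, v in counts.items()}


# Hypothetical CpG islands and consensus clusters.
islands = [(100, 300), (1000, 1500)]
clusters = [(150, 180), (290, 320), (1200, 1210), (1490, 1600), (5000, 5050)]
summary = overlap_summary(clusters, islands)
print(summary)
```

For a genome-wide run the linear scan per cluster would normally be replaced by a sorted or indexed lookup, but the overlap criterion stays the same.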


Cardiac Myocytes


Gene Ontology

• Gene Ontology analysis

Each cluster associated with nearest annotated TSS & Entrez Gene ID

Keywords tagged to each Entrez Gene ID

Statistics on over/under-representation of keywords
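The slides do not say which statistic was used. A common choice for keyword over-representation is the hypergeometric upper tail, sketched here as an assumption with made-up counts:

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which carry the keyword."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Toy numbers: 20 background genes, 5 carry the keyword,
# 4 genes sit next to shifting promoters, 3 of those carry the keyword.
p_value = hypergeom_upper_tail(k=3, n=4, K=5, N=20)
print(p_value)
```

Under-representation can be tested the same way with the lower tail, and testing many keywords at once calls for a multiple-testing correction.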


Cardiac Myocytes


Gene Ontology


Cardiac Myocytes

• Significantly over-represented biological functions tend to be housekeeping, not cell specific

• Perhaps the shifting promoters are not involved with cell-specific gene function at all?


Future Work

• Repeat analysis using different consensus clusters

Problems with thresholds within analysis

More promising recent dinucleotide maps


Future Work

• Analysis of non-shifting promoters

Looking at more general changes in shape

E.g. dot product, linear scaling
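One way to read the dot-product idea is as a normalised dot product (cosine similarity) between two TSS profiles over the same consensus cluster. This is a sketch under that assumption, with toy tag counts per position:

```python
from math import sqrt
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalised dot product of two equal-length TSS profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same positions")
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must contain at least one tag")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


same_shape = cosine_similarity([1, 4, 1], [10, 40, 10])  # linear scaling only
moved_peak = cosine_similarity([4, 1, 1], [1, 1, 4])     # dominant TSS moved
print(same_shape, moved_peak)
```

Because the score ignores overall expression level, a profile that is merely scaled up scores close to 1, while a change in shape lowers the score.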


Extra slides…


Previous Results


• Zebrafish embryonic development

• Initial RNA transcriptome inherited from mother, zygotic gene activation at Mid Blastula Transition (MBT)

• Corresponds to change in TSS profile

Sharp Broad

Position of TSS’s shift

Shifting Promoters

• “Differential promoter interpretation by the maternal and zygotic transcription machinery”


Shifting Promoters


Search for genetic structure correlated with this shifting

• TATA-like enrichment always found ~30 bp upstream in Maternal

• In Zygote, boundary 50 bp downstream of TSS

• Majority of TATA-like motifs not canonical TATA boxes (W box)

Two Independent Mechanisms for Transcription Initiation


Nucleosome Location


• H3K4me3 nucleosome locations estimated at 4 developmental stages

• Alignment with Zygotic, but not Maternal, TSS, 50 bp downstream

Same location as boundary

• Suggests Zygotic mechanism for positioning nucleosomes after MBT


Internucleosomal Phasing Patterns


• 10 bp AA/TT dinucleotide enrichment periodicity downstream of zygotic TSS, but not maternal

• Weaker GC/AT enrichment pattern matching nucleosome free and wrapped DNA

• Zygotic, not maternal, TSS associated with nucleosome positioning
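A 10 bp periodicity of this kind can be probed with the autocorrelation of an AA/TT indicator signal. The sketch below is an illustrative assumption, not the method behind these results, and uses a synthetic sequence with a period of exactly 10 bp:

```python
from typing import List


def aa_tt_signal(seq: str) -> List[int]:
    """1 where an AA or TT dinucleotide starts, else 0."""
    seq = seq.upper()
    return [1 if seq[i:i + 2] in ("AA", "TT") else 0 for i in range(len(seq) - 1)]


def autocorrelation(signal: List[int], lag: int) -> float:
    """Mean product of mean-centred values that lie `lag` positions apart."""
    n = len(signal) - lag
    mean = sum(signal) / len(signal)
    return sum((signal[i] - mean) * (signal[i + lag] - mean) for i in range(n)) / n


# Synthetic sequence: one AA every 10 bp, GC filler in between.
sequence = ("AA" + "GCGCGCGC") * 6
signal = aa_tt_signal(sequence)
best_lag = max(range(2, 16), key=lambda lag: autocorrelation(signal, lag))
print(best_lag)
```

On real data the signal would be averaged over many promoters aligned at the TSS before looking for a peak at lags near 10 bp.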