Identifying conserved promoter motifs and transcription factor binding sites in plant promoters
Endre Sebestyén, ARI-HAS, Martonvásár, Hungary
26 November 2009, RCPGD Annual Meeting



Page 2: Transcription factor binding sites
• TFs bind short, often degenerate DNA sequences
• Promoters are variable-length 5' sequences
  ▫ With TFBSs
• TFBSs are usually conserved in a nonconserved surrounding sequence
• Some well-known TFBSs
  ▫ TATA box
  ▫ GC box
  ▫ CpG island
• Lots of other, less general TFBSs
• Similarly expressed genes, or homologues, should contain similar TFBSs
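To make "short, often degenerate" concrete, here is a minimal Python sketch that expands an IUPAC degenerate consensus into a regular expression and scans a sequence for it. The function names, the test sequence and the use of TATAWAW (a commonly quoted TATA box consensus) are illustrative assumptions, not taken from the slides. Only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_to_pattern(consensus):
    """Expand a degenerate consensus into a regular-expression string."""
    parts = []
    for code in consensus.upper():
        bases = IUPAC[code]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_sites(consensus, sequence):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # The lookahead makes the regex engine report overlapping matches too.
    lookahead = "(?=" + consensus_to_pattern(consensus) + ")"
    return [m.start() for m in re.finditer(lookahead, sequence.upper())]


# Hypothetical toy sequence containing one TATA-like site.
print(find_sites("TATAWAW", "GGTATAAATCC"))
```

A short consensus like this matches by chance quite often in long sequences, which is one reason the later slides stress conservation and enrichment rather than single matches.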

Page 3: Transcription

Page 4: TFBS search and promoter analysis
• Wet-lab methods
  ▫ DNase footprinting
  ▫ Electrophoretic mobility shift assay
  ▫ ChIP-chip, ChIP-seq
• In silico methods
  ▫ Experimentally verified sites: consensus sequences, consensus matrices
  ▫ De novo motif discovery: oligo frequency, phylogenetic footprinting, other methods
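The "consensus matrices" item can be illustrated with a small position weight matrix. The sketch below assumes a log-odds matrix with a pseudocount and a uniform background, which is one common construction and not necessarily the one used by any of the databases named in this talk. The example sites are made up, and sequences are assumed to contain only A, C, G and T.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix from aligned binding sites of equal length."""
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        total = len(column) + 4 * pseudocount
        scores = {}
        for base in "ACGT":
            freq = (column.count(base) + pseudocount) / total
            scores[base] = math.log2(freq / background)
        pwm.append(scores)
    return pwm


def score(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [(score(pwm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)


# Hypothetical aligned sites, for illustration only.
pwm = build_pwm(["TATAAA", "TATATA", "TATAAA", "TATAAT"])
print(best_hit(pwm, "GCGCTATAAAGC"))
```

Compared with a consensus string, the matrix keeps the information that some positions tolerate substitutions better than others.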

Page 5: Experimentally verified sites
• TRANSFAC
• JASPAR
• PLACE
• PlantCARE

Page 6: De novo motif discovery
• Orthologous gene groups
  ▫ Evolutionarily conserved functional sites
• Co-regulated genes
  ▫ Same tissue, body part
  ▫ Same developmental stage
  ▫ Etc.

Page 7: "Real" promoter structure
• No general motifs
  ▫ No TATA box, GC box, etc.
• Lots of false positive TFBSs
  ▫ With both wet-lab and in silico methods
• Sometimes no apparent common TFBSs between coregulated genes

Page 8: Database of Orthologous Promoters
• Orthologous promoter sequence collections
  ▫ Based on a BLAST search with first exons of reference species
    ▫ Plants (Viridiplantae), reference species: Arabidopsis thaliana
    ▫ Chordates, reference species: Homo sapiens
  ▫ 500/1000/3000 bp 5' upstream regions
    ▫ Conserved sequence regions
    ▫ Annotations
    ▫ Xrefs to other databases
    ▫ Annotated transcription start sites
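As a rough illustration of what a "500/1000/3000 bp 5' upstream region" means in coordinates, the following sketch cuts the region upstream of an annotated transcription start site from a chromosome string. It is not the DoOP pipeline; the function names, the 0-based coordinate convention and the toy sequence are assumptions made for the example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chromosome, tss, strand, length):
    """Return up to `length` bp upstream of the TSS, read 5'->3' on the gene's strand.

    `tss` is the 0-based index of the transcription start site on the
    forward strand. The region is truncated at the chromosome ends.
    """
    if strand == "+":
        start = max(0, tss - length)
        return chromosome[start:tss]
    if strand == "-":
        # For a minus-strand gene, "upstream" lies at higher forward coordinates.
        end = min(len(chromosome), tss + 1 + length)
        return reverse_complement(chromosome[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")


# Hypothetical 10 bp "chromosome".
toy = "ACGTTGCAAG"
print(upstream_region(toy, 4, "+", 3), upstream_region(toy, 4, "-", 3))
```

Calling the function with 500, 1000 and 3000 for the same gene gives three nested regions, the shorter ones being the TSS-proximal ends of the longer ones.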

Page 9: DoOP
http://doop.abc.hu

Page 10: DoOP cluster number

Page 11: DoOP subsets
• Cluster > Subset
  ▫ Subset: collection of evolutionarily monophyletic sequences in a cluster
  ▫ Plant subsets
    ▫ Brassicaceae: Arabidopsis thaliana, Brassicaceae species
    ▫ Eudicotyledons: grape, Solanum species, papaya, tobacco
    ▫ Magnoliophyta: maize, rice
    ▫ Viridiplantae

Page 12: DoOP subsets

Page 14: Chart by version (v1.5, v1.6, v1.8); value axis from 0 to 45000. Legend: Other, Solanum tuberosum, Arabidopsis lyrata, Sorghum bicolor, Physcomitrella patens, Capsella rubella, Glycine max, Zea mays, Oryza sativa, Solanum lycopersicum, Nicotiana tabacum, Brassica napus, Lotus japonicus, Medicago truncatula, Vitis vinifera, Ricinus communis, Populus trichocarpa, Carica papaya, Boechera stricta, Brassica oleracea, Brassica rapa, Arabidopsis thaliana

Page 15: Gene types – Gene Ontology
• Standardized annotation for genes
  ▫ Biological process: what does it do?
    ▫ Transcription, translation, stress response, etc.
  ▫ Cellular component: where is it located?
    ▫ Membrane, ribosome, cytosol, etc.
  ▫ Molecular function: how does it work?
    ▫ Dehydrogenase, ATP binding, etc.

Page 16: Gene types – Gene Ontology
• 500 bp promoters
  ▫ Search for significantly enriched terms in annotation
  ▫ Subsets: Brassicaceae, eudicotyledons, Magnoliophyta, Viridiplantae
  ▫ BP: transcription, translation, protein folding, stress response
  ▫ CC: plasma membrane, ribosome parts
  ▫ MF: ATP/GTP binding, DNA binding, ribosome parts
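The slide does not say which statistical test was used to call a term "significantly enriched". A one-sided hypergeometric test is a common choice for GO term enrichment, so the sketch below assumes it. The function name, parameter names and the numbers in the usage line are hypothetical.

```python
from math import comb  # available from Python 3.8


def enrichment_pvalue(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of genes in the background set
    annotated:  background genes carrying the GO term
    sample:     number of genes in the studied subset
    hits:       studied genes carrying the GO term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, upper + 1))
    return tail / total


# Hypothetical numbers: 3 of 10 genes carry the term, and all 3 sampled genes do.
print(enrichment_pvalue(10, 3, 3, 3))
```

When many GO terms are tested at once, the resulting p-values would normally need a multiple-testing correction before any term is reported as enriched.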

Page 17: Motif generation
• Phylogenetic footprinting
• Functional TFBSs should be conserved
• Local sequence alignment
• Define conserved regions
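The last step, defining conserved regions, can be sketched as follows. The example assumes the orthologous promoters have already been aligned (the alignment step itself is not shown) and treats a region as conserved when every sequence has the same base in a gap-free run of columns. The identity criterion and the `min_length` threshold are assumptions; the slides do not give the actual rule.

```python
def conserved_regions(aligned, min_length=5):
    """Return (start, end, motif) for each run of identical, gap-free columns.

    `aligned` is a list of equal-length aligned sequences with '-' for gaps.
    Coordinates refer to alignment columns; `end` is exclusive.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    width = len(aligned[0])
    regions = []
    run_start = None
    # Iterate one column past the end so a run touching the last column is closed.
    for col in range(width + 1):
        conserved = (col < width
                     and aligned[0][col] != "-"
                     and all(s[col] == aligned[0][col] for s in aligned))
        if conserved and run_start is None:
            run_start = col
        elif not conserved and run_start is not None:
            if col - run_start >= min_length:
                regions.append((run_start, col, aligned[0][run_start:col]))
            run_start = None
    return regions


# Hypothetical three-sequence alignment sharing the block GCCGAC.
alignment = ["ACGCCGACTT-A", "TTGCCGACGATA", "A-GCCGACCTTA"]
print(conserved_regions(alignment))
```

A stricter or looser definition (for example allowing one mismatch per window) changes how many motifs are reported, which matters when comparing motif counts between subsets.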

Page 18: Motif generation
Figure labels: Magnoliophyta, eudicotyledons, Brassicaceae

Page 19: Motif statistics

Motif number      500 bp    1000 bp   3000 bp
Brassicaceae      323411    410720    893788
eudicotyledons    13863     20192     34353
Magnoliophyta     2009      2211      1938
Viridiplantae     589       565       372

Page 20: Motif statistics

% conserved       500 bp    1000 bp   3000 bp
Brassicaceae      22        19        16
eudicotyledons    5         3         2
Magnoliophyta     6         5         2
Viridiplantae     4         2         1

Avg length        500 bp    1000 bp   3000 bp
Brassicaceae      9         9         9
eudicotyledons    7         7         7
Magnoliophyta     8         9         8
Viridiplantae     9         9         9

Page 21: TFBS databases

Database      TFBSs
TRANSFAC      977
JASPAR        18
PLACE         416
PlantCARE     646
ABS           650
AGRIS         72

• Lots of redundant data
• Low quality, not updated
• More than 100 different versions of the TATA box

Page 22: Synthetic biology
• Synthetic biology
  ▫ iGEM competition
  ▫ BioBricks
  ▫ MIT Registry of Standard Biological Parts
    ▫ UV-responsive promoter
    ▫ Promoter expressed in roots
    ▫ Etc.
• Synthetic promoters
  ▫ Define basic promoter elements
  ▫ Build and use custom-made promoters
  ▫ Gene expression more or less when and where you want it

Page 23: SNP conservation
• Gene expression levels change because
  ▫ Regulatory elements change
  ▫ Usually NOT protein-coding regions
• Conserved promoter regions might be functional regulatory elements
  ▫ Search for SNPs in these regions
  ▫ These SNPs might be interesting for breeders, as they are likely to be functional ones
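Searching for SNPs in conserved regions comes down to an interval lookup. The sketch below assumes conserved regions are given as non-overlapping half-open (start, end) intervals in the same coordinate system as the SNP positions. The function name and all coordinates are hypothetical.

```python
from bisect import bisect_right


def snps_in_conserved_regions(regions, snp_positions):
    """Return (snp_position, region) pairs for SNPs that fall inside a region.

    `regions` are non-overlapping (start, end) intervals with `end` exclusive.
    """
    ordered = sorted(regions)
    starts = [start for start, _ in ordered]
    hits = []
    for pos in sorted(snp_positions):
        # Index of the last region that starts at or before this position.
        idx = bisect_right(starts, pos) - 1
        if idx >= 0 and pos < ordered[idx][1]:
            hits.append((pos, ordered[idx]))
    return hits


# Hypothetical conserved regions and SNP positions within one promoter.
print(snps_in_conserved_regions([(10, 16), (40, 49)], [5, 12, 16, 40, 48, 49, 100]))
```

The SNPs returned this way are only candidates: overlap with a conserved region suggests, but does not prove, a regulatory effect.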

Page 24: A real example
• Vilmos Soós, Endre Sebestyén, Angéla Juhász, János Pintér, Marnie E. Light, Johannes Van Staden, Ervin Balázs (2009) Stress-related genes define essential steps in the response of maize seedlings to smoke-water. Functional and Integrative Genomics, Volume 9, Number 2, Pages 231-242; doi:10.1007/s10142-008-0105-8
• Microarray experiments
  ▫ Maize kernels (Mv 540)
  ▫ 24 and 48 h, control vs smoke-treated samples
  ▫ Up- and downregulated genes
    ▫ Promoter sequences up to 1500 bp were extracted if available

Page 25: Analysis of promoters
• TRANSFAC database version 12.1
  ▫ Collection of TFBSs
  ▫ More than 100 plant TFBSs
    ▫ DRE element: GCCGAC
• Scan for the TFBSs in the maize promoters
  ▫ Up- and downregulated
• Also count the frequencies of all 5-8mer sequences
  ▫ In all available maize promoters, not only the up- or downregulated ones
• Calculate the over- or underrepresentation of a TFBS as follows
  ▫ Observed frequency in up- or downregulated promoters divided by the expected frequency in all promoters
  ▫ If ratio > 1: overrepresented
  ▫ If ratio < 1: underrepresented
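The ratio described above can be sketched in a few lines. The slide does not define how "frequency" is normalised, so this example assumes occurrences per possible start position, pooled over a promoter set, and counts exact forward-strand matches only. The function names and toy promoter sequences are made up; GCCGAC is the DRE element named on the slide.

```python
def count_overlapping(motif, sequence):
    """Count occurrences of `motif` in `sequence`, overlapping ones included."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def motif_frequency(motif, promoters):
    """Occurrences per possible start position, pooled over all promoters."""
    hits = sum(count_overlapping(motif, p) for p in promoters)
    windows = sum(max(len(p) - len(motif) + 1, 0) for p in promoters)
    return hits / windows if windows else 0.0


def representation_ratio(motif, regulated, all_promoters):
    """Observed frequency in the regulated set over the frequency in all promoters."""
    expected = motif_frequency(motif, all_promoters)
    if expected == 0:
        raise ValueError("motif never occurs in the background set")
    return motif_frequency(motif, regulated) / expected


# Hypothetical toy promoters; real ones would be up to 1500 bp long.
regulated = ["AAGCCGACAA", "GCCGACGCCGAC"]
others = ["AAAAAAAAAA", "TTTTGCCGAC"]
print(representation_ratio("GCCGAC", regulated, regulated + others))
```

A ratio above 1 marks the motif as overrepresented in the regulated set, below 1 as underrepresented. With small promoter sets the ratio is noisy, so it is best read together with the raw counts.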

Page 26: Analysis of promoters
• Results
  ▫ Binding sites related to
    ▫ Organogenesis
    ▫ Meristem development
    ▫ Housekeeping functions
    ▫ Biotic stress
    ▫ Cold and dehydration stress
    ▫ ABA-related motifs

Page 27: Thank you for your attention!