![Page 1: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/1.jpg)
Comparative analysis of eukaryotic genes
Mar Albà
Barcelona Biomedical Research Park
http://genomics.imim.es/evolgenome
![Page 2: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/2.jpg)
Genome Projects
GOLD: Genomes Online Database (www.genomesonline.org)
![Page 3: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/3.jpg)
Genome Projects
GOLD: Genomes Online Database (www.genomesonline.org)
![Page 4: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/4.jpg)
Genome Projects
GOLD: Genomes Online Database (www.genomesonline.org)
![Page 5: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/5.jpg)
Genome Browsers
- NCBI Map Viewer: http://www.ncbi.nlm.nih.gov/mapview/
- Ensembl: http://www.ensembl.org
- UCSC Genome Browser: http://genome.cse.ucsc.edu
The three databases use the same genome assembly, which is generated by NCBI.
![Page 6: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/6.jpg)
Ensembl
![Page 7: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/7.jpg)
Ensembl
- genomic regions: alignments with syntenic sequences
- genes: homologs, SNPs
- transcripts: EMBL mRNAs, ESTs, expression
- proteins: Gene Ontology (function), protein domains, disease associations
![Page 8: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/8.jpg)
Ensembl - Biomart
- retrieval of information on gene datasets
![Page 9: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/9.jpg)
Gene comparative sequence analysis
Genome and transcriptome projects have generated a vast amount of information on protein-coding and non-coding gene sequences.
Identification of conserved sequences in different genes can help us understand gene evolution and identify functional regions.
[Figure: N orthologous genes (promoter and coding regions) compared across species 1, species 2, ..., species m]
![Page 10: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/10.jpg)
Non-coding sequences in vertebrate genomes
- only 1.2% of the human genome codes for proteins
- but 5% exhibits high sequence conservation levels, compatible with negative selection (MGSC, 2002)
- non-coding sequences:
- Transcription regulatory regions
- Introns
- Non-protein coding exons/genes (miRNAs, etc.)
- Repetitive elements (Alus, etc.)
- Ultra-conserved elements
![Page 11: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/11.jpg)
Gene transcription regulatory sequences
Maston et al. (2006), Annu. Rev. Genomics Hum. Genet. 7: 29-59
![Page 12: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/12.jpg)
Frequently-found metazoan motifs in the core promoter
Maston et al., 2006
![Page 13: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/13.jpg)
Eukaryotic promoter diversity
Wray et al. (2003), Mol. Biol. Evol. 20(9):1377-1419.
![Page 14: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/14.jpg)
High evolvability of regulatory sequences
- most of the changes in regulatory networks are likely to occur in cis; changes in trans (transcription factors) may often have effects that are too strong.
- a single mutation may lead to the acquisition of a new DNA-factor interaction (rapid turnover)
-the expression in one tissue may evolve independently of expression in another tissue (promoter modular organization)
Wray et al. (2003) The Evolution of Transcriptional Regulation in Eukaryotes. Mol. Biol. Evol. 20(9):1377-1419.
![Page 15: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/15.jpg)
Transcription factor binding sites (TFBS) are short and imprecise
-short sequence motifs (6-12 bp)
- some positions of the motif are variable
- sometimes different transcription factors can recognize the same sequence motif
TATAAA
TATAGA
TATAAA
TATAAA
GATAAA
TATAAA
TATAAA
TATAAT
 ***
TATA box
![Page 16: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/16.jpg)
Transcription factor binding sites (TFBS)
Weight matrices
TATAAA
TATAGA
TATAAA
TATAAA
GATAAA
TATAAA
TATAAA
TATAAT
 ***

| | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A | 0 | 8 | 0 | 8 | 7 | 7 |
| C | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 1 | 0 | 0 | 0 | 1 | 0 |
| T | 7 | 0 | 8 | 0 | 0 | 1 |

-> can be used to search for putative motifs in sequences
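
As a rough illustration of how such a matrix can be built and used for searching, the sketch below counts bases per position in the eight TATA box sites from this slide, converts the counts to log-odds weights, and slides the matrix along a sequence. The pseudocount, the uniform background of 0.25, the score threshold and the test sequence are assumptions made for the example, not values from the talk.

```python
import math

# The eight TATA box sites shown on the slide.
SITES = ["TATAAA", "TATAGA", "TATAAA", "TATAAA",
         "GATAAA", "TATAAA", "TATAAA", "TATAAT"]

def count_matrix(sites):
    """Count how often each base occurs at each motif position."""
    length = len(sites[0])
    counts = {base: [0] * length for base in "ACGT"}
    for site in sites:
        for i, base in enumerate(site):
            counts[base][i] += 1
    return counts

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2 odds (pseudocount and background are assumed)."""
    length = len(counts["A"])
    n_sites = sum(counts[base][0] for base in "ACGT")
    total = n_sites + 4 * pseudocount
    return {
        base: [math.log2(((counts[base][i] + pseudocount) / total) / background)
               for i in range(length)]
        for base in "ACGT"
    }

def scan(sequence, weights, threshold):
    """Return (start, window, score) for every window scoring >= threshold."""
    length = len(weights["A"])
    hits = []
    for start in range(len(sequence) - length + 1):
        window = sequence[start:start + length]
        if any(base not in weights for base in window):
            continue  # skip windows containing N or other symbols
        score = sum(weights[base][i] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits

counts = count_matrix(SITES)
weights = log_odds_matrix(counts)
hits = scan("GGCTATAAAGGC", weights, threshold=5.0)  # hypothetical sequence
print(hits)
```

Lowering the threshold returns more putative sites, which is the trade-off behind the false positive problem discussed later in the talk.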
![Page 17: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/17.jpg)
Transcription factor binding site databases
- TRANSFAC: http://transfac.gbf.de/TRANSFAC/ and http://www.biobase.de
- TRRD: http://www.bionet.nsc.ru/trrd/
- PLACE: http://www.dna.affrc.go.jp/htdocs/PLACE/
- ooTFD / rTFD: http://www.ifti.org/cgi-bin/ifti/ootfd.pl
- SCPD: http://cgsigma.cshl.org/jian/
- RegulonDB: http://regulondb.ccg.unam.mx/
![Page 18: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/18.jpg)
TFBS prediction using weight matrices
PROMO
Farré, D., et al. (2003). Nucleic Acids Research 31: 1739-1748.
http://promo.lsi.upc.edu
![Page 19: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/19.jpg)
High false positive rate in TFBS prediction
Test sequences: 200 vertebrate promoter sequences, 607 experimentally-verified sites
Blanco, E., et al. (2006). Nucleic Acids Research 34: D63-D67.
Predictions: Transfac v.6.4
SENSITIVITY: 46%
SPECIFICITY: 2% (very low!)
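
To get a feel for what these two percentages imply, the following back-of-the-envelope sketch derives approximate prediction counts from them. It assumes that "specificity" is used here in the sense common in sequence-prediction benchmarks, TP / (TP + FP), and that the counts are made per site; the slide does not state either, so the resulting numbers are only indicative.

```python
def implied_prediction_counts(n_known_sites, sensitivity, specificity):
    """Derive rough counts, assuming specificity = TP / (TP + FP)."""
    true_positives = n_known_sites * sensitivity
    total_predictions = true_positives / specificity
    false_positives = total_predictions - true_positives
    return true_positives, total_predictions, false_positives

# 607 verified sites, 46% sensitivity, 2% specificity (values from the slide).
tp, total, fp = implied_prediction_counts(607, 0.46, 0.02)
print(f"true positives ~{tp:.0f}, predictions ~{total:.0f}, false positives ~{fp:.0f}")
```

Under these assumptions, roughly 49 out of every 50 predicted sites would be false positives.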
![Page 20: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/20.jpg)
Comparative approaches are necessary
Select those motifs or regions that are shared by:
- orthologous sequences: phylogenetic footprinting
- co-expressed genes: shared regulatory motifs
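
A minimal sketch of the orthologous-sequence filter: given a pairwise alignment of two orthologous promoters and a list of predicted sites in the first sequence, keep only the sites whose aligned columns are identical in the second. The function names, the identity threshold, the example alignment and the site coordinates are all hypothetical; real phylogenetic footprinting tools use more elaborate conservation measures.

```python
def alignment_columns(aligned_seq):
    """Map each ungapped sequence position to its alignment column."""
    return [col for col, ch in enumerate(aligned_seq) if ch != "-"]

def conserved_sites(aligned_a, aligned_b, sites_a, min_identity=1.0):
    """Keep sites of sequence A (half-open, ungapped coordinates)
    whose aligned columns match sequence B closely enough."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = alignment_columns(aligned_a)
    kept = []
    for start, end in sites_a:
        cols = columns[start:end]
        # A gap in sequence B counts as a mismatch.
        matches = sum(1 for c in cols if aligned_a[c] == aligned_b[c])
        if cols and matches / len(cols) >= min_identity:
            kept.append((start, end))
    return kept

# Hypothetical alignment of a promoter fragment from two species.
species1 = "ACGTATAAAGG-CTTGACC"
species2 = "ACCTATAAAGGACT--ACC"
predicted = [(3, 9), (12, 18)]  # hypothetical predicted sites in species1
kept = conserved_sites(species1, species2, predicted)
print(kept)
```

Only the first predicted site survives here, because the second one overlaps a gap in the other species.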
![Page 21: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/21.jpg)
Phylogenetic footprinting
Boffelli D, Nobrega MA, Rubin EM. (2004) Nat Rev Genet. 5:456-65
![Page 22: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/22.jpg)
Highly conserved enhancer in gene DACH1
Phylogenetic footprinting
![Page 23: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/23.jpg)
Proximal promoter
pre-initiation complex
![Page 24: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/24.jpg)
Motif positional bias
Signal Search Analysis Server (SIB)
![Page 25: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/25.jpg)
Why should some motifs show positional bias?
- promoter structure
- protein-protein interaction positional constraints
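[Figure: a predicted element positioned relative to a known reference element. Proximal promoter panel: TFBS 1 relative to the TSS and PIC. Regulatory module panel: TFBS 1 and TFBS 2 bound by TF1 and TF2. Other label: ACT]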
![Page 26: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/26.jpg)
PEAKS: identification of motif positional bias
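[Figure: positions of a predicted element (TFBS) relative to a known reference element (TSS), in functionally-related sequences (e.g. co-expressed) and in random sequences; the functionally-related set shows over-representation]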
![Page 27: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/27.jpg)
PEAKS
Step 1. Construct motif frequency profile
[Figure: predicted elements in seq1 to seq4, positioned relative to a known reference element, are counted with a sliding window to build the profile]
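
The profile construction can be sketched as follows, assuming that each predicted motif is reduced to a single position relative to the reference element and that the profile value is the number of motifs falling inside each window. The window size, the region limits and the hit positions are made up for illustration; the slide does not give the parameters used by PEAKS.

```python
def motif_frequency_profile(hits_by_sequence, region_start, region_end, window=10):
    """Return [(window_start, count)], counting motif positions from all
    sequences that fall in each window [window_start, window_start + window)."""
    positions = [p for hits in hits_by_sequence.values() for p in hits]
    profile = []
    for w_start in range(region_start, region_end - window + 1):
        w_end = w_start + window
        count = sum(1 for p in positions if w_start <= p < w_end)
        profile.append((w_start, count))
    return profile

# Hypothetical motif positions relative to the reference element (e.g. the TSS).
hits = {
    "seq1": [-30, -95],
    "seq2": [-28],
    "seq3": [-33, -150],
    "seq4": [-31],
}
profile = motif_frequency_profile(hits, region_start=-200, region_end=0, window=10)
peak_start, peak_count = max(profile, key=lambda item: item[1])
print(peak_start, peak_count)
```

A motif with positional bias produces a sharp maximum in this profile, while randomly placed motifs give a flat one.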
![Page 28: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/28.jpg)
PEAKS
Step 1. Construct motif frequency profile
308 housekeeping genes
Transfac v.6.4 matrix library
[Figure: motif frequency profiles relative to the TSS]
![Page 29: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/29.jpg)
PEAKS
Step 2. Measure significance of peaks
For each matrix:
Score (max peak) = Sa x Sb x Sc
Sa = max peak / num motif
Sb = max peak / num seq
Sc = max peak / average num motifs
[Figure: CAAT-box profile (axis labels +675 and -325) marking the maximum peak, the average signal and the difference between them]
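
The score above can be written out directly. In this sketch "num motif" is read as the total number of predicted motifs for the matrix, "num seq" as the number of sequences in the dataset, and "average num motifs" as the mean height of the profile; these readings are assumptions, and the example profile is invented.

```python
def peak_score(profile_counts, n_motifs, n_sequences):
    """Score (max peak) = Sa x Sb x Sc, under the interpretation stated above."""
    max_peak = max(profile_counts)
    average = sum(profile_counts) / len(profile_counts)
    if n_motifs == 0 or n_sequences == 0 or average == 0:
        return 0.0
    sa = max_peak / n_motifs      # share of all motifs found in the peak
    sb = max_peak / n_sequences   # peak height relative to dataset size
    sc = max_peak / average       # peak height relative to the average signal
    return sa * sb * sc

# Hypothetical profile: 8 windows, 6 predicted motifs in 4 sequences.
example_counts = [0, 1, 1, 4, 1, 1, 0, 0]
score = peak_score(example_counts, n_motifs=6, n_sequences=4)
print(round(score, 3))
```

Multiplying the three ratios rewards peaks that are high in absolute terms, high relative to the dataset, and well above the background signal.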
![Page 30: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/30.jpg)
PEAKSStep 2. Measure significance of peaks
- determine random expectation score cut-off for different levels of significance using 1000 random datasets
- define significant signal range:
[Figure: CAAT-box profile showing the max peak, the average signal and the 0.005 cut-off]
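
One way to obtain such a cut-off is an empirical null distribution. The sketch below assumes that a random dataset is made by placing the same number of motifs uniformly at random in the region, and it uses the maximum window count as a stand-in statistic; the slide does not say how the random datasets were generated, so both choices are assumptions.

```python
import random

def max_window_count(positions, region_start, region_end, window):
    """Highest number of motif positions found in any sliding window."""
    best = 0
    for w_start in range(region_start, region_end - window + 1):
        count = sum(1 for p in positions if w_start <= p < w_start + window)
        best = max(best, count)
    return best

def random_cutoff(n_motifs, region_start, region_end, window,
                  alpha=0.005, n_random=1000, seed=0):
    """Statistic value exceeded by at most a fraction alpha of random datasets."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    null_scores = []
    for _ in range(n_random):
        positions = [rng.randrange(region_start, region_end)
                     for _ in range(n_motifs)]
        null_scores.append(
            max_window_count(positions, region_start, region_end, window))
    null_scores.sort()
    n_exceed = int(alpha * n_random)
    index = max(0, n_random - 1 - n_exceed)
    return null_scores[index]

strict = random_cutoff(10, -200, 0, window=10, alpha=0.005, n_random=200)
loose = random_cutoff(10, -200, 0, window=10, alpha=0.05, n_random=200)
print(strict, loose)
```

An observed peak is then called significant at level alpha if its statistic is above the corresponding cut-off.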
![Page 31: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/31.jpg)
PEAKS
Step 3. Build “promoter type”
52 genes regulated by NFkB, p < 0.5%
[Figure: promoter type showing TATA, Sp1, NFkB and BACH1 signals]
![Page 32: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/32.jpg)
PEAKS server
http://genomics.imim.es/peaks/
Bellora, Farré and Albà (2007). Bioinformatics 23, 243-4.
![Page 33: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/33.jpg)
PEAKS results
human promoter sequences
TRANSFAC vertebrate matrices
[Figure: significant motif signals in 308 housekeeping genes and 52 NFkB-regulated genes; labels: TATA, CAAT, GC-box, YY, TATA, NFkB, GC-box, BACH1]
![Page 34: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/34.jpg)
PEAKS results
promoters from yeast genes, amino acid metabolism (86 genes)
- 54 yeast weight matrices tested
- significant regions detected by the method show significant enrichment in experimentally-validated sites
![Page 35: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/35.jpg)
Measuring promoter sequence divergence
Divergence (non-aligned promoter fraction, or dSM)
[Figure: two examples of promoters aligned between species 1 and species 2, with divergence 0.8 and 0.4]
Castillo-Davis et al., 2004
1. highly divergent -> fewer constraints
2. highly conserved -> more constraints
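
As a minimal sketch, the non-aligned promoter fraction can be computed from the promoter length and the coordinates of the aligned blocks. Merging overlapping blocks before counting is an assumption of this example, and the block coordinates are invented so that the results reproduce the two values in the figure.

```python
def non_aligned_fraction(promoter_length, aligned_blocks):
    """Fraction of the promoter not covered by any aligned block.
    Blocks are half-open (start, end) intervals; overlaps are counted once."""
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_blocks):
        start = max(start, last_end, 0)
        end = min(end, promoter_length)
        if end > start:
            covered += end - start
            last_end = end
    return 1 - covered / promoter_length

# Hypothetical aligned blocks in a 2 Kb promoter.
divergent = non_aligned_fraction(2000, [(0, 150), (100, 300), (1500, 1600)])
conserved = non_aligned_fraction(2000, [(0, 1200)])
print(round(divergent, 2), round(conserved, 2))
```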
![Page 36: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/36.jpg)
Variability in promoter sequence divergence
8385 human-mouse orthologues
2 Kb from transcription start site
Average divergence = 70%
[Figure: histogram of promoter divergence in bins of 0.1, from 0 to 1]
![Page 37: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/37.jpg)
Regulatory genes contain more conserved promoters than structural/metabolic genes
Functional classes enriched in high score promoter alignments
Lee et al. (2006). BMC Genomics 6: 188
- consistent with results by Iwama and Gojobori (2004)
![Page 38: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/38.jpg)
Structural/metabolic genes contain less highly conserved promoters than regulatory genes
Functional classes enriched in low score promoter alignments
Lee et al. (2006). BMC Genomics 6: 188
![Page 39: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/39.jpg)
Comparison neurogenesis versus ribosomal
[Figure panels: neurogenesis, ribosomal]
Lee et al. (2006). BMC Genomics 6: 188
![Page 40: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/40.jpg)
Is expression breadth related to promoter sequence divergence?
[Figure: histogram of the number of genes (y axis, 0 to 1200) by expression breadth (number of tissues, bins 01-05 to 51-55) for human-mouse orthologues, grouped as tissue-specific, intermediate and housekeeping]
Expression data from Zhang et al. (2004)
![Page 41: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/41.jpg)
Measure sequence divergence
- tissue-specific
- intermediate
- housekeeping
Divergence = non-aligned promoter fraction
[Figure: 2 Kb promoter compared between species 1 and species 2]
![Page 42: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/42.jpg)
Relationship between promoter divergence and expression breadth
Zhang dataset (N = 3983), l=16

| number of tissues | N | dSM | Ka | Ks | Ka/Ks |
|---|---|---|---|---|---|
| 01-10 | 1006 | 0.688 | 0.108 | 0.764 | 0.148 |
| 11-50 | 1931 | 0.702 | 0.080 | 0.746 | 0.114 |
| 51-55 | 1046 | 0.734 | 0.050 | 0.678 | 0.078 |

[Figure: from tissue-specific through intermediate to housekeeping genes, coding sequence divergence (evolutionary rate) decreases, but promoter divergence increases]
![Page 43: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/43.jpg)
Relationship between promoter divergence and expression breadth
- divergence measured in 100 nt bins
[Figure: % conservation (y axis, 0 to 70) by position relative to the TSS (x axis, -2000 to -100 in 100 nt bins) for housekeeping and non-housekeeping genes]
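
The binned measurement can be sketched as below, assuming each promoter is represented by a per-nucleotide mask that is True where the position falls in an aligned block. The mask representation and the two example promoters are hypothetical.

```python
def percent_conservation_by_bin(masks, bin_size=100):
    """Return [(bin_start_relative_to_TSS, percent_conserved)].
    Each mask covers the upstream region, ending at the TSS."""
    length = len(masks[0])
    result = []
    for b_start in range(0, length, bin_size):
        total = 0
        conserved = 0
        for mask in masks:
            chunk = mask[b_start:b_start + bin_size]
            total += len(chunk)
            conserved += sum(chunk)  # True counts as 1
        result.append((b_start - length, 100.0 * conserved / total))
    return result

# Two hypothetical 2 Kb promoters: aligned only close to the TSS.
mask1 = [False] * 1900 + [True] * 100
mask2 = [False] * 1800 + [True] * 200
bins = percent_conservation_by_bin([mask1, mask2], bin_size=100)
print(bins[-2:], len(bins))
```

Running the same function separately on housekeeping and non-housekeeping promoters gives the two curves compared in the figure.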
![Page 44: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/44.jpg)
Promoter divergence and gene function
highly divergent promoter
- RNA binding
- ligase activity
- hydrolase activity
- catalytic activity
highly conserved promoter
- receptor binding
- signal transducer activity
- receptor activity
- structural molecule activity
- transcription regulator activity
- transcription factor activity
- DNA binding
GO class > 50 genes, p-value < 0.01
![Page 45: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/45.jpg)
Promoter divergence and gene function
[Figure: divergence (y axis, 0.5 to 0.8) for organ development, transcription factor, receptor and protein metabolism genes, split into housekeeping and non-housekeeping]
![Page 46: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/46.jpg)
Summary
- the prediction of transcription factor binding sites is very noisy; we need to use comparative genomics
- some motifs show positional bias; this property can help us understand the structure of promoters and improve motif predictions
- promoter sequence conservation is related to gene function and to gene expression breadth. The fact that housekeeping genes contain less conserved promoters may reflect simpler regulation of gene expression
![Page 47: Comparative analysis of eukaryotic genes Mar Albà Barcelona Biomedical Research Park](https://reader036.vdocuments.us/reader036/viewer/2022062422/56649ee15503460f94bf2673/html5/thumbnails/47.jpg)
The team
Evolutionary Genomics Group
Universitat Pompeu Fabra, Barcelona
http://genomics.imim.es/evolgenome
Nicolas Bellora
Domènec Farré
Loris Mularoni
Macarena Toll
Medya Shikhagaie