Promoter search
Regulatory sequences and promoter search
Bq. Francisco Duarte, Ph.D. Biotechnology
Contents
1. Background
2. Representation of regulatory motifs
3. Promoter search algorithms
4. Databases related to promoter search
Probability of occurrence of each nucleotide
-10 sequence (TATAAT): T 77%, A 76%, T 60%, A 61%, A 56%, T 82%
-35 sequence (TTGACA): T 69%, T 79%, G 61%, A 56%, C 54%, A 54%
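The percentages above can be turned into a simple score for any candidate hexamer. The sketch below is a minimal illustration, not the method from the slides: it assumes each percentage is the probability of the consensus base at that position, that the leftover probability is split evenly among the other three bases, and that the background is uniform (0.25 per base). The name `log_odds` is hypothetical.

```python
import math

# Consensus hexamers and the per-position frequencies of the consensus base,
# copied from the table above.
MINUS10 = ("TATAAT", [0.77, 0.76, 0.60, 0.61, 0.56, 0.82])
MINUS35 = ("TTGACA", [0.69, 0.79, 0.61, 0.56, 0.54, 0.54])


def log_odds(kmer, consensus, freqs, background=0.25):
    """Log2-odds score of a 6-mer against one consensus box.

    Assumptions (not from the slide): a non-consensus base at position i
    has probability (1 - freqs[i]) / 3, and the background is uniform.
    """
    if len(kmer) != len(consensus):
        raise ValueError("k-mer and consensus must have the same length")
    total = 0.0
    for base, cons_base, p in zip(kmer.upper(), consensus, freqs):
        prob = p if base == cons_base else (1.0 - p) / 3.0
        total += math.log2(prob / background)
    return total


if __name__ == "__main__":
    for kmer in ("TATAAT", "TATGAT", "GCGCGC"):
        print(kmer, round(log_odds(kmer, *MINUS10), 2))
```

The exact consensus gets the highest score, and each mismatch costs more at strongly conserved positions (such as the final T of the -10 box) than at weakly conserved ones.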
TRANSFAC: structure of a eukaryotic gene
[Figure: contig and gene with splice variants and mRNA (5'-UTR, CDS, 3'-UTR); transcription produces the primary transcript (including an alternative exon), which is then spliced. Regulatory elements: promoter, enhancer 1 and enhancer 2; around the TSS, the TATA box and the initiator (Inr); boxes A to G and a composite element.]
General scheme of the hierarchical structure of the transcriptional regulatory regions in eukaryotic genes
What is a transcription factor?
A transcription factor is a protein that regulates transcription
after nuclear translocation by specific interaction with DNA
or by stoichiometric interaction with a protein that can be
assembled into a sequence-specific DNA-protein complex.
http://www.gene-regulation.com/pub/databases/transfac/clSM.html
Regulatory regions
Gene regulation
• Virtually every cell in your body contains a complete set of genes
• But they are not all turned on in every tissue
• Each cell in your body expresses only a small subset of genes at any time
• During development, different cells express different sets of genes in a precisely regulated fashion.
• A major level of gene regulation is transcription, the production of mRNA
• A given cell transcribes only a specific set of genes and not others
• Insulin, for example, is made only by the beta cells of the pancreas
Characteristics of regulatory regions
Check:
http://www.ccg.unam.mx/Computational_Genomics/PromoterTools/
http://molbiol-tools.ca/Promoters.htm
http://www.phisite.org/main/index.php?nav=tools&nav_sel=hunter
http://www.fruitfly.org/seq_tools/promoter.html
http://linux1.softberry.com/berry.phtml?topic=bprom&group=programs&subgroup=gfindb
Central dogma
Genetic information flows from DNA to RNA to protein
Gene regulation has been well studied in E. coli
When a bacterial cell encounters a potential food source, it manufactures the enzymes necessary to metabolize that food
Gene Regulation
In addition to sugars like glucose and lactose, E. coli cells also require amino acids. One amino acid they need is tryptophan.
When E. coli is swimming in tryptophan (abundant in foods such as milk and poultry), it absorbs the amino acid from the medium. When tryptophan is not present in the medium, the cell must manufacture its own.
Trp Operon
E. coli uses several proteins encoded by a cluster of 5 genes to manufacture the amino acid tryptophan.
All 5 genes are transcribed together as a unit called an operon, which produces a single long piece of mRNA for all the genes.
RNA polymerase binds to a promoter located at the beginning of the first gene and proceeds down the DNA, transcribing the genes in sequence.
Gene regulation
In addition to amino acids, E. coli cells also metabolize sugars in their environment.
In 1959, Jacques Monod and François Jacob studied the ability of E. coli cells to digest the sugar lactose.
In the presence of lactose, E. coli makes an enzyme called beta-galactosidase.
Beta-galactosidase breaks lactose down into glucose and galactose, which E. coli can use as food.
It is the lacZ gene in E. coli that codes for beta-galactosidase.
Lac Z Gene
The tryptophan genes are turned on when there is no tryptophan in the medium, that is, when the cell needs to make its own tryptophan.
E. coli cells cannot make the sugar lactose.
They can only use lactose when it is present in their environment, and then they turn on the genes needed to break it down.
E. coli only needs beta-galactosidase if there is lactose in the environment to digest; there is no point in making the enzyme if there is no lactose to break down.
It is the combination of the promoter and the regulatory DNA next to it that determines when a gene will be transcribed.
This unit, a promoter and its regulatory DNA together with the genes they control, is called an OPERON
THE OPERON
Operon is a cluster of genes encoding related enzymes that are regulated together
An operon consists of:
• a promoter site, where RNA polymerase binds and begins transcribing the message;
• a region that makes a repressor.
The repressor sits on the DNA at a spot between the promoter and the gene to be transcribed.
This site is called the operator.
LAC Z GENE
• E. coli regulates the production of beta-galactosidase by using a regulatory protein called a repressor
• The repressor binds to the DNA at a site between the promoter and the start of the lacZ coding sequence
• The site the repressor binds to is called the operator
LAC Z GENE
• Normally the repressor sits on the operator, repressing transcription of the lacZ gene
• In the presence of lactose, the repressor binds the sugar (strictly, its isomer allolactose) and releases the operator, which allows RNA polymerase to move down the lacZ gene
LAC Z GENE
This results in the production of beta-galactosidase, which breaks down the sugar.
When there is no sugar left, the repressor returns to the operator and stops transcription of the lacZ gene.
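The repressor/operator switch described above can be written as a tiny Boolean model, with the trp genes (on only when tryptophan is absent) for contrast. This is a sketch under the simplifying assumption that the repressor is the only input; any other regulatory inputs are deliberately ignored, and the function names are hypothetical.

```python
def lac_operon_state(lactose_present: bool) -> dict:
    """Simplified Boolean model of negative control of the lac operon."""
    # Lactose (via allolactose) binds the repressor and pulls it off the operator.
    repressor_on_operator = not lactose_present
    # RNA polymerase can transcribe lacZ only when the operator is free.
    lacZ_transcribed = not repressor_on_operator
    return {
        "repressor_on_operator": repressor_on_operator,
        "lacZ_transcribed": lacZ_transcribed,
    }


def trp_genes_on(tryptophan_present: bool) -> bool:
    """The trp genes are switched on only when tryptophan is absent."""
    return not tryptophan_present


for lactose in (False, True):
    print("lactose" if lactose else "no lactose", lac_operon_state(lactose))
for trp in (False, True):
    print("tryptophan" if trp else "no tryptophan", "trp genes on:", trp_genes_on(trp))
```

The two cases are mirror images: lac is induced by the presence of its substrate, while trp is switched off by the presence of its product.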
Mechanism: operon switched off
GENE REGULATION
• In eukaryotic organisms like ourselves, there are several methods of regulating protein production
• Most regulatory sequences are found upstream of the transcription start site
• Genes are controlled by regulatory elements in the promoter region that act like on/off switches or dimmer switches
GENE REGULATION
• Specific transcription factors bind to these regulatory elements and regulate transcription.
• Regulatory elements may be tissue-specific and activate their gene only in one kind of tissue
• Sometimes the expression of a gene requires the function of two or more different regulatory elements
INTRONS AND EXONS
• Eukaryotic DNA differs from prokaryotic DNA in that the coding sequences along a gene are interspersed with noncoding sequences.
• The coding sequences are called
– EXONS
• The non coding sequences are called
– INTRONS
INTRONS AND EXONS
• After the initial transcript is produced, the introns are spliced out to form the completed message, ready for translation
• Introns can be very large and numerous, so some genes are much bigger than the final processed mRNA
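Splicing, as described above, amounts to keeping the exon segments of the primary transcript and concatenating them in order. The sketch below assumes exon coordinates are already known and given as 0-based, half-open intervals; the toy transcript and the name `splice` are hypothetical.

```python
from typing import List, Tuple


def splice(pre_mrna: str, exons: List[Tuple[int, int]]) -> str:
    """Join the exons of a primary transcript, dropping the introns between them.

    exons: 0-based, half-open (start, end) intervals in transcript order.
    """
    pieces = []
    previous_end = 0
    for start, end in exons:
        if not (previous_end <= start < end <= len(pre_mrna)):
            raise ValueError(f"invalid or overlapping exon ({start}, {end})")
        pieces.append(pre_mrna[start:end])
        previous_end = end
    return "".join(pieces)


# Toy primary transcript: exon 1 | intron with GU...AG ends | exon 2
PRE_MRNA = "AUGAAA" + "GUAAGUCAG" + "CCCUAA"
print(splice(PRE_MRNA, [(0, 6), (15, 21)]))
```

Here the 21-nt primary transcript becomes a 12-nt message; in a gene like DMD the same operation removes most of the sequence.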
INTRONS AND EXONS
• Example: Duchenne muscular dystrophy
• The DMD gene is about 2.5 million base pairs long
• It has more than 70 introns
• The final mRNA is only about 14,000 nucleotides long
RNA Splicing
• Provides a point at which the expression of a gene can be controlled
• Exons can be spliced together in different ways
• This allows a variety of different polypeptides to be assembled from the same gene
• Alternative splicing is common in insects and vertebrates, where two or more different proteins can be produced from one gene
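One way to see how a single gene yields several proteins is to enumerate the isoforms produced by skipping internal exons. This sketch models only cassette-exon skipping (the first and last exons are always kept and exon order is preserved); other alternative splicing events are not covered, and the exon strings and function name are hypothetical.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """All isoforms produced by skipping any subset of internal exons."""
    if len(exons) < 2:
        raise ValueError("need at least a first and a last exon")
    first, *internal, last = exons
    isoforms = []
    for k in range(len(internal), -1, -1):  # longest isoform first
        for kept in combinations(internal, k):  # combinations keep exon order
            isoforms.append("".join((first, *kept, last)))
    return isoforms


for isoform in exon_skipping_isoforms(["AUG", "GGC", "UUU", "UAA"]):
    print(isoform)
```

With n internal exons this gives 2^n possible isoforms, which is why even a modest number of optional exons can produce many variants.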
Protein domains and regulatory sequences
TFBS: transcription factor binding sites
Motif representations: from alignments to motifs
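Going from an alignment of binding sites to a motif usually means counting bases per column (a position frequency matrix) and converting the counts into log-odds weights (a position weight matrix). The sketch below assumes a pseudocount of 0.5 per base and a uniform 0.25 background, both illustrative choices; the aligned sites and function names are hypothetical.

```python
import math

BASES = "ACGT"


def count_matrix(sites):
    """Position count matrix from equal-length aligned sites (A, C, G, T only)."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    counts = [{base: 0 for base in BASES} for _ in range(length)]
    for site in sites:
        for i, base in enumerate(site.upper()):
            counts[i][base] += 1  # KeyError for N or gaps, by design
    return counts


def consensus(counts):
    """Most frequent base in each column."""
    return "".join(max(column, key=column.get) for column in counts)


def pwm(counts, pseudocount=0.5, background=0.25):
    """Log2-odds weights; the pseudocount avoids log(0) for unseen bases."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        matrix.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in BASES
        })
    return matrix


def score(matrix, kmer):
    """Sum of per-position weights for a k-mer as long as the motif."""
    if len(kmer) != len(matrix):
        raise ValueError("k-mer length must match the motif length")
    return sum(column[base] for column, base in zip(matrix, kmer.upper()))


# Hypothetical aligned -10-like sites, for illustration only.
SITES = ["TATAAT", "TATGAT", "TAAAAT", "TATACT", "GATAAT"]
COUNTS = count_matrix(SITES)
PWM = pwm(COUNTS)
print(consensus(COUNTS), round(score(PWM, "TATAAT"), 2))
```

A consensus string keeps only the top base per column, whereas the weight matrix keeps how strongly each position is conserved, so near-matches can still be ranked.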
Transcription factors
[Figure: sequence-specific DNA-binding transcription factors (TF1 to TF4) and non-DNA-binding factors (adapter, co-activator, HAT) organized in layers I, II and III on the DNA.]
Structure of transcription factors
[Figure: USF-1 dimer; typical transcription factor domains: DNA-binding domain, activation domain, oligomerization domain, ligand-binding domain, and protein-protein interaction domain.]
Composite elements (CE) in TRANSCompel
N. Gene, species | Composite element | Schema and positions of the CE | TRANSCompel accession number
1. Scavenger receptor, Homo sapiens | AP-1 + Ets | enhancer, -4500/-4100 | C00080
2. GM-CSF, Mus musculus | AP-1 + Ets | -53/-40 | C00081
3. Collagenase, Homo sapiens | AP-1 + Ets | -89, -82, -72, -66 | C00083
4. IgH, Mus musculus | AP-1 + Ets | enhancer at the 3' flank | C00133
5. Interleukin 2, Homo sapiens | AP-1 + NFAT | -283/-268 | C00109
6. Interleukin 2, Homo sapiens | AP-1 + NF-κB | -167/-142 | C00165
7. 2, Mus musculus | AP-1 + Oct-2 | -167/-142 | C00158
8. IgH, Homo sapiens | Ets + CBF | (not given) | C00173
9. A1, Rattus norvegicus | NF-κB + C/EBP | -117/-73 | C00101
10. IRF-1, Mus musculus | NF-κB + STAT-1 | -123, -113, -49, -40 | C00192
Ternary complex NFATp/AP-1/DNA
[Figure: factor F1 or F2 bound alone gives a low level of transcription; F1 and F2 bound together give synergistic activation of transcription.]
Composite elements
Minimal functional units where both protein-DNA and protein-protein
interactions contribute to a highly specific pattern of gene expression
and provide cross-coupling of different signal transduction pathways.
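A first computational step toward finding candidate composite elements is to locate sites for two factors and keep pairs that lie close together. The sketch below uses deliberately simplified patterns, an NFAT core (GGAAA, or TTTCC on the other strand) and a TRE-like AP-1 site (TGA[CG]TCA), and a hypothetical 20 bp maximum distance; these are assumptions for illustration, not TRANSFAC/TRANSCompel matrices, and the test sequence is synthetic.

```python
import re

# Simplified, illustrative site patterns (assumptions, not TRANSFAC matrices).
PATTERNS = {
    "NFAT": "GGAAA|TTTCC",  # NFAT core on either strand
    "AP-1": "TGA[CG]TCA",   # TRE-like site; palindromic, so one pattern covers both strands
}


def find_sites(seq, pattern):
    """0-based start positions of (possibly overlapping) matches."""
    return [m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq.upper())]


def composite_candidates(seq, factor_a, factor_b, max_gap=20):
    """Pairs of site starts for two factors lying within max_gap bp of each other."""
    sites_a = find_sites(seq, PATTERNS[factor_a])
    sites_b = find_sites(seq, PATTERNS[factor_b])
    return [(a, b) for a in sites_a for b in sites_b if abs(a - b) <= max_gap]


# Synthetic sequence: an NFAT core next to an AP-1 site, plus a distant AP-1 site.
SEQ = "CCCGGAAAATGACTCACCC" + "A" * 50 + "TGAGTCA"
print(composite_candidates(SEQ, "NFAT", "AP-1"))
```

Proximity alone only nominates candidates; as the definition above stresses, a real composite element also requires the protein-protein interaction between the two factors, which sequence scanning cannot show.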
[Figure: signalling from a membrane receptor through Src (SH2, SH3), Ras (GDP/GTP), adaptors, PLC, PI3-K and PKB/Akt; IP3 and Ca2+ (entering through a Ca2+-dependent channel) activate calcineurin, which acts on NFATp; the kinases ERK, JNK and p38 MAPK act on c-Fos, c-Jun and ATF-2 by phosphorylation; in the nucleus these factors converge on a composite element of the IL-2 gene.]
Integration of signals. Cross-coupling of signal transduction pathways
[Figure: maps of the mouse IL-4 promoter and of the human GM-CSF promoter relative to +1, showing AP-1/NFAT composite elements (CE) together with HMG Y, STAT6, NF-Y, c-MAF, NF-κB (p50/p65 and c-Rel/p65), HMG Y(I), CBF and TATA/TATTT sites, the CD28 response element and, for GM-CSF, a T-cell-specific inducible enhancer at -3500 bp.]
Recruitment of CIITA to MHC-II promoters. A prototypical MHC-II promoter (HLA-DRA) is represented schematically with the W, X, X2, and Y sequences conserved in all MHC-II, Ii, and HLA-DM promoters. RFX, X2BP, NF-Y, and an as yet undefined W-binding protein bind cooperatively to these sequences and assemble into a stable higher order nucleoprotein complex referred to here as the MHC-II enhanceosome. CIITA is tethered to the enhanceosome via multiple weak protein-protein interactions with the W, X, X2, and Y-binding factors. The octamer site found in the HLA-DRA promoter (O), and its cognate activators (Oct and OBF-1) are not required for recruitment of CIITA. CIITA is proposed to activate transcription (arrow) via its amino-terminal activation domains (AD), which contact the RNA polymerase II basal transcription machinery.
Masternak K et al., Genes Dev 2000 May 1;14(9):1156-66
Enhanceosome
[Figure: site-specific TFs recruit co-activators (p300/CBP, PCAF) with acetylase activity; acetylation opens closed nucleosomes so that TFIID, TFIIA, TFIIB, TFIIF, TFIIE, TFIIH and RNA pol II can assemble at the promoter.]
Databases on gene regulation
http://regulondb.ccg.unam.mx/
Exercise
• Search the .gbk (GenBank) file and extract the 100 base pairs upstream of the gene
• Run BLASTp against NR to find probable orthologs
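Extracting the upstream region of an annotated gene depends on its strand: for a gene on the minus strand, "upstream" lies after its end coordinate and must be reverse-complemented. The sketch below assumes the gene's 1-based, inclusive coordinates and strand have already been read from the GenBank feature table (by hand or with a parser such as Biopython's `SeqIO`), and treats the genome as linear, truncating at its ends; circular wrap-around is not handled. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=100):
    """Up to `length` bp immediately upstream of a gene, 5'->3' on the gene's strand.

    start/end: 1-based inclusive coordinates with start <= end, as in a
    GenBank feature table; strand is "+" or "-".
    """
    if strand == "+":
        return genome[max(0, start - 1 - length):start - 1]
    if strand == "-":
        return reverse_complement(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


GENOME = "AAAACCCCGGGGTTTT"  # toy sequence
print(upstream_region(GENOME, 9, 12, "+", length=4))
print(upstream_region(GENOME, 1, 4, "-", length=4))
```

Returning the region in the gene's own orientation means the -35 and -10 boxes appear in the usual 5'->3' order, whatever strand the gene is on.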
>malE - 100 bases upstream
aaagaactacctgaatttcgagattaggcctt
gatcgcgccggggtgaaagcgttatact
gacgcgcaaacgtttgcgcaatttgggcacag
agggggtt
>malE - 100 bases upstream
aggaggatggaaagaggatgtcatagaaagaa
actaaagaccgttaagcgacctctgcgt
atccacgagcaatatacacaaatggaaaagga
cgggttat
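A very simple promoter search over the two fragments above: slide along each sequence, compare every 6-mer to the -35 consensus TTGACA and the 6-mer that starts 15 to 19 bp further downstream to the -10 consensus TATAAT, and keep the pair with the fewest total mismatches. The 15 to 19 bp spacer range is an assumption based on the commonly cited spacing of about 17 bp for sigma-70 promoters; the tools linked below use more elaborate models, and the function names here are hypothetical. A best hit is only a candidate, and a 100 bp fragment may not contain the true promoter at all.

```python
# The two 100 bp malE upstream fragments listed above, joined line by line.
MALE_UPSTREAM = [
    "aaagaactacctgaatttcgagattaggcctt"
    "gatcgcgccggggtgaaagcgttatact"
    "gacgcgcaaacgtttgcgcaatttgggcacag"
    "agggggtt",
    "aggaggatggaaagaggatgtcatagaaagaa"
    "actaaagaccgttaagcgacctctgcgt"
    "atccacgagcaatatacacaaatggaaaagga"
    "cgggttat",
]

MINUS35, MINUS10 = "TTGACA", "TATAAT"


def mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


def best_promoter(seq, spacers=range(15, 20)):
    """Best -35/-10 pair on the given strand, ranked by total mismatches.

    `spacers` must be ascending. Returns (total_mismatches, pos35, spacer,
    box35, box10) with a 0-based pos35, or None if the sequence is too
    short. Ties keep the 5'-most -35 box, then the shortest spacer.
    """
    seq = seq.upper()
    best = None
    for i in range(len(seq) - 5):
        box35 = seq[i:i + 6]
        for spacer in spacers:
            j = i + 6 + spacer  # start of the -10 box
            if j + 6 > len(seq):
                break  # larger spacers will not fit either
            box10 = seq[j:j + 6]
            total = mismatches(box35, MINUS35) + mismatches(box10, MINUS10)
            if best is None or total < best[0]:
                best = (total, i, spacer, box35, box10)
    return best


for n, fragment in enumerate(MALE_UPSTREAM, start=1):
    print(n, best_promoter(fragment))
```

Only the given strand is scanned, because the fragments are already oriented 5'->3' relative to malE; counting mismatches treats every position equally, unlike the weighted scores that the nucleotide percentages at the start of this section would allow.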
http://molbiol-tools.ca/Promoters.htm
http://www.prodoric.de/vfp/vfp_promoter.php
http://www.phisite.org/main/index.php?nav=tools&nav_sel=hunter