Post on 02-Jul-2015
Subtypes of Associated Protein-DNA (TF-TFBS) Patterns
Prepared by: Cyrus Tak-Ming Chan (tmchan@cse.cuhk.edu.hk)
Tak-Ming Chan, Kwong-Sak Leung, Kin-Hong Lee, Man-Hon Wong, Chi-Kong Lau, Stephen Kwok-Wing Tsui, Subtypes of Associated Protein-DNA (Transcription Factor-Transcription Factor Binding Site) Patterns, Nucleic Acids Research, 2012, doi: 10.1093/nar/gks749.
17/Sep/2012 Version 1.2 (Typos corrected on P12)
Introduction
Proteins bind to DNA fragments to regulate genes, i.e. transcription factors (TFs) bind to transcription factor binding sites (TFBSs).
Finding the binding cores (only several residues long) is fundamental and important.
Motivations
Finding patterns/motifs on one side only is challenging, e.g. TFBS motif discovery: noise, variations through mutations, and unknown locations leave only weak signals to be recovered.
[Figure legend: Prediction; True TFBS]
Tak-Ming Chan et al, IEEE Transactions on Evolutionary Computation, 2012 / BMC Bioinformatics, 2009, 10: 321 / Bioinformatics, 2007, 24(3)
Introduction
Finding associated patterns on both sides is shown to be promising when many diverse binding sequences are available (e.g. TRANSFAC).
Associated TF-TFBS patterns are found from sequences:
TFs: 7664 in TRANSFAC; 408 AAs on average
TFBSs: 26786 bound TFBSs, 1225 matrices in TRANSFAC; 25 bp on average
Associated pattern discovery
…NRIAA… …TGACA…
…NRAAA… …TGACA…
…NREAA… …TGTGA…
Tak-Ming Chan et al, Discovering approximate-associated sequence patterns for protein-DNA interactions. Bioinformatics, 2011, 27(4)
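The idea of associating patterns on both sides can be illustrated with a minimal sketch. It is not the published algorithm: the cited method allows approximate matches, whereas this sketch counts only exact k-mer co-occurrences. The function names, the `min_support` threshold, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def associated_pairs(records, k_tf=5, k_dna=5, min_support=2):
    """Count (TF k-mer, TFBS k-mer) pairs that co-occur in the same record.

    records: list of (tf_sequence, [tfbs_sequences]) tuples.
    A pair is counted at most once per record; pairs seen in fewer than
    min_support records are dropped.
    """
    support = Counter()
    for tf_seq, sites in records:
        tf_k = kmers(tf_seq, k_tf)
        dna_k = set()
        for site in sites:
            dna_k |= kmers(site, k_dna)
        for pair in product(tf_k, dna_k):
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


# Toy, made-up records: (TF sequence, bound TFBS sequences).
records = [
    ("MKNRIAAQ", ["CCTGACAGG"]),
    ("LSNRIAAT", ["ATTGACATT"]),
    ("QQNREAAK", ["GGTGTGACC"]),
]
found = associated_pairs(records)
print(found)
```

On the toy data only the pair NRIAA / TGACA is shared by two records, which mirrors the first row of the example patterns above.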
Introduction
Associated TF-TFBS patterns found from sequences are verified on 3D structures to be binding cores.
…NRIAA… …TGACA…
…NRAAA… …TGACA…
…NREAA… …TGTGA…
Verified on 3D structures (binding cores < 3.5 Å): 40222 binding pairs from 1290 PDB protein-DNA complexes
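A distance cutoff of this kind can be sketched as follows. This is an illustrative check only, assuming atom coordinates have already been extracted from a structure file. The identifiers and coordinates are made up, and how the slides define a binding pair beyond the 3.5 Å threshold is not stated here.

```python
import math

CUTOFF = 3.5  # angstroms, the threshold quoted on the slide


def close_pairs(protein_atoms, dna_atoms, cutoff=CUTOFF):
    """Return (residue_id, nucleotide_id) pairs with any atom pair closer than cutoff.

    Each atom is given as (unit_id, (x, y, z)); several atoms may share a unit_id.
    """
    pairs = set()
    for res_id, p_xyz in protein_atoms:
        for nuc_id, d_xyz in dna_atoms:
            if math.dist(p_xyz, d_xyz) < cutoff:
                pairs.add((res_id, nuc_id))
    return pairs


# Hypothetical coordinates, not taken from any PDB entry.
protein_atoms = [("ARG10", (0.0, 0.0, 0.0)), ("ALA11", (10.0, 0.0, 0.0))]
dna_atoms = [("G5", (3.0, 0.0, 0.0)), ("C6", (0.0, 4.0, 0.0))]
contacts = close_pairs(protein_atoms, dna_atoms)
print(contacts)
```

The double loop is quadratic in the number of atoms, which is fine for a sketch; a spatial index would be the usual choice for whole complexes.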
Tak-Ming Chan et al, Discovering approximate-associated sequence patterns for protein-DNA interactions. Bioinformatics, 2011, 27(4)
Introduction—Motivations
We can go further with these promising associated TF-TFBS patterns: discovering and analyzing the binding variances (subtypes).
…NRIAA… …TGACA…
…NRAAA… …TGACA…
…NREAA… …TGTGA…
Subtypes may
• Lead to changed binding preferences
• Distinguish conserved from flexible binding residues
• Reveal novel binding mechanisms
Methods & Materials
Both the L2 distance and the p-value of a chi-squared test are used to shortlist subtypes (3rd: G-C; 4th: G/C-G)
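The two criteria can be sketched for a single aligned position. This is a sketch under assumptions, not the paper's procedure: it compares base counts at one position between two candidate subtypes, using the L2 distance between normalized frequencies and a chi-squared test of homogeneity. The example sites and their counts are made up, and the actual thresholds used for shortlisting are not given on the slide.

```python
import math

BASES = "ACGT"


def column_counts(sites, pos):
    """Count each base (A, C, G, T) at one position of equal-length sites."""
    return [sum(1 for s in sites if s[pos] == b) for b in BASES]


def l2_distance(counts_a, counts_b):
    """Euclidean distance between the two normalized frequency vectors."""
    fa = [c / sum(counts_a) for c in counts_a]
    fb = [c / sum(counts_b) for c in counts_b]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def chi2_sf(x, df):
    """Upper-tail probability of the chi-squared distribution, integer df >= 1."""
    y = x / 2.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) / math.gamma(1.5)
    for j in range((df - 1) // 2):
        total += math.exp(-y) * term
        term *= y / (j + 1.5)
    return total


def chi2_test(counts_a, counts_b):
    """Chi-squared statistic and p-value for a 2 x (bases observed) table."""
    cols = [(a, b) for a, b in zip(counts_a, counts_b) if a + b > 0]
    na, nb = sum(counts_a), sum(counts_b)
    stat = 0.0
    for a, b in cols:
        ea = na * (a + b) / (na + nb)  # expected count in subtype A
        eb = nb * (a + b) / (na + nb)  # expected count in subtype B
        stat += (a - ea) ** 2 / ea + (b - eb) ** 2 / eb
    df = len(cols) - 1
    return stat, (chi2_sf(stat, df) if df > 0 else 1.0)


# Hypothetical aligned sites for two candidate subtypes (made-up counts).
sites_a = ["CAGCTG"] * 10
sites_b = ["CACGTG"] * 10
ca, cb = column_counts(sites_a, 2), column_counts(sites_b, 2)  # 3rd position
dist = l2_distance(ca, cb)
stat, p = chi2_test(ca, cb)
print(dist, stat, p)
```

A large distance together with a small p-value at a position would flag it as a candidate subtype difference; using both guards against large frequency shifts that rest on only a few sequences.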
Results
Sample results from http://www.cse.cuhk.edu.hk/~tmchan/subtypes/
Results
Subtypes with evidence of changed binding preferences
More than 70% of subtypes (and pairs) reflect changed binding preferences according to PDB structure evidence.
Results
Subtype clusters show that the more conserved (invariant) residues are important for protein-DNA interactions, while variant residues show specific properties.
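Separating invariant from variant positions within a subtype cluster can be sketched in a few lines. This is an illustration only, assuming the patterns in a cluster are already aligned and of equal length. The function name is hypothetical and positions are 0-based.

```python
def split_positions(patterns):
    """Split aligned, equal-length patterns into conserved and variant positions.

    Returns two lists of (position, sorted residues seen at that position).
    """
    if len({len(p) for p in patterns}) != 1:
        raise ValueError("patterns must be non-empty and of equal length")
    conserved, variant = [], []
    for i, column in enumerate(zip(*patterns)):
        residues = sorted(set(column))
        (conserved if len(residues) == 1 else variant).append((i, residues))
    return conserved, variant


# TF-side patterns taken from the example subtypes shown earlier in the slides.
conserved, variant = split_positions(["NRIAA", "NRAAA", "NREAA"])
print(conserved)
print(variant)
```

For this cluster only the third residue varies (I, A, or E), while N, R, and the two trailing A residues are invariant.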
Results
Case study shows subtypes that are potentially critical for regulation through dimerization and thus TF-TFBS binding
PKVEIL-CAGCTG: myogenic regulatory factor (MRF) family, PDB 1MDY
PKVVIL-CACGTG: Myc family (oncogene), PDB 1NKP
PKVEIL appears in TFs of MRF4, Myf-5, Myf-6, MyoD… in TRANSFAC
PKVVIL appears in TFs of c/L/v-Myc in TRANSFAC
• The subtypes are discovered without family information while reflecting strong familial specificity
• Wet-lab literature supports that if V is mutated to an amino acid (AA) similar to E (MycV394D), the dimerization of Myc-Max is abolished (Miz1 binding deficient)
Discussion
Further applications
• Application to TFBS (motif) matching by adding TF-associated subtype information
• Extension of the method to high-throughput sequencing data (e.g. ChIP-Seq, Protein Binding Microarrays)
• Integration of other information to enhance TF-TFBS prediction
• Incorporation of 3D homology modeling to better model protein-DNA interactions
• Analysis with other data, e.g. allele-specific mRNA data, to reveal more detailed regulatory mechanisms