TF-DNA binding dependency A progress report March 17, 2010 Hugo Willy


Page 1: TF-DNA binding dependency A progress report

TF-DNA binding dependency: A progress report

March 17, 2010, Hugo Willy

Page 2: TF-DNA binding dependency A progress report

Outline

• Re-Introduction of my problem

• Current state of affairs

• Known dependency factor 1 – Rotamer

• Known dependency factor 2 – Water

• Known dependency factor 3 – DNA flexibility

• Some thoughts on what to do next

Page 3: TF-DNA binding dependency A progress report

Re-Introduction

• I am working on finding a dependency model of TF-DNA binding.

• What is TF-DNA binding?
  – If you ask this, you may be in the wrong room.

• It is known that different TFs prefer to bind different DNA sequences.

• Classic example: the TATA-box binding protein binds the sequence “TATA”.

Page 4: TF-DNA binding dependency A progress report

Re-Introduction (2)

• It is commonly assumed that each position in T-A-T-A contributes independently to the binding energy.

• That is to say, some guys from the TF will bind the first “T”, some others will bind the “A” at the second position, and so on.

• If the sequence becomes CATA, then the outcome depends on how much the guys who bind the 1st position like the new “C”. If they are OK with it, the binding energy may change a little but the TF still binds.

• Otherwise, too bad: the TF no longer binds.
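The independence assumption above can be written down in a few lines. This is a minimal sketch, not a fitted model: the `ENERGY` table and the function name `additive_energy` are made up for illustration, and the numbers are arbitrary units chosen so that TATA is the preferred site.

```python
# Hypothetical per-position energy contributions (arbitrary units,
# lower = tighter binding). These values are invented for illustration.
ENERGY = [
    {"A": 1.0, "C": 0.8, "G": 1.2, "T": -2.0},   # position 1 prefers T
    {"A": -2.0, "C": 1.1, "G": 0.9, "T": 0.7},   # position 2 prefers A
    {"A": 0.9, "C": 1.0, "G": 1.3, "T": -2.0},   # position 3 prefers T
    {"A": -2.0, "C": 1.2, "G": 1.0, "T": 0.6},   # position 4 prefers A
]


def additive_energy(site, energy=ENERGY):
    """Sum the independent per-position contributions for one site."""
    if len(site) != len(energy):
        raise ValueError("site length must match the model length")
    return sum(energy[i][base] for i, base in enumerate(site))


wild_type = additive_energy("TATA")
mutant = additive_energy("CATA")

# Under independence, the change is decided by position 1 alone:
# it equals the difference between the C and T entries of that column.
delta = mutant - wild_type
print(f"TATA: {wild_type:.2f}  CATA: {mutant:.2f}  change: {delta:.2f}")
```

If positions were dependent, `delta` for the T-to-C change would also vary with the bases at the other positions, which is exactly what this model cannot express.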

Page 5: TF-DNA binding dependency A progress report

Re-Introduction (3)

• One such model, a very popular one, is the PSSM (position-specific scoring matrix) model.

• It has been shown to be very good at estimating the real binding sites of many TFs.

• However, some were curious whether the model holds for all TFs.
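For reference, a PSSM can be sketched as a table of per-position log-odds scores built from aligned binding sites. The code below is a minimal illustration under stated assumptions: the example sites, the uniform background, the pseudocount of 1, and the helper names (`build_pssm`, `score`, `best_hit`) are all hypothetical choices, not taken from any of the tools discussed here.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pssm(sites, background=None, pseudocount=1.0):
    """Return one {base: log2-odds score} dict per aligned position."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumed uniform background
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for i in range(length):
        counts = Counter(s[i] for s in sites)
        pssm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pssm


def score(pssm, site):
    """Score a site as the sum of independent per-position scores."""
    return sum(column[base] for column, base in zip(pssm, site))


def best_hit(pssm, sequence):
    """Slide the PSSM along a sequence; return (best score, start index)."""
    width = len(pssm)
    return max((score(pssm, sequence[i:i + width]), i)
               for i in range(len(sequence) - width + 1))


# Hypothetical aligned binding sites, for illustration only.
sites = ["TATA", "TATA", "TATT", "TAAA", "CATA"]
pssm = build_pssm(sites)
print(score(pssm, "TATA"), score(pssm, "GCGC"))
print(best_hit(pssm, "GGCTATAGG"))
```

Note that `score` is a plain sum over columns, so the PSSM carries the same independence assumption as the additive energy picture.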

Page 6: TF-DNA binding dependency A progress report

Current state of affairs

• There are quite a few publications which try to show that there are measurable dependencies among the positions.
  – RECOMB 2003: Modeling dependencies in Protein-DNA binding sites
    • Multi-PSSM, tree, multi-tree. Bayesian network based training.
  – Bioinformatics 2004: Modeling within-motif dependence for transcription factor binding site predictions
    • PSSM with pairwise correlated positions using Bayes factors. Gibbs sampling based.
  – BIBE 2006: Discovering DNA Motifs with Nucleotide Dependency
    • PSSM with multi-position dependencies, heuristic.
  – Bioinformatics 2007: Position dependencies in transcription factor binding sites
    • Checks dependencies within a set of aligned binding sites with different statistical measures.

Page 7: TF-DNA binding dependency A progress report

Current state of affairs (2)

  – Bioinformatics 2008: Context-dependent DNA recognition code for C2H2 zinc-finger transcription factors
    • Neural network based.
  – PLoSCompBio 2008: A Feature-Based Approach to Modeling Protein-DNA Interactions
    • Feature based; currently considers only pairwise position dependency features.
  – NAR 2010: On the detection and refinement of transcription factor binding sites using ChIP-Seq data
    • Similar to Bioinformatics 2004.

Page 8: TF-DNA binding dependency A progress report

Current state of affairs (3)

• However, they share a similar framework:
  – Start with a set of “known” binding sequences.
  – Try to guess a model with and without dependencies.
  – Train the model using the dataset (possibly making gradual changes to the model during the training).
  – Compare which model is better.
  – List the positions with dependencies: most are consecutive positions, but some are quite distant.
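One simple way to sketch the "with vs. without dependencies" comparison is mutual information between two columns of the aligned sites. With maximum-likelihood estimates, the number of sites times the mutual information equals the log-likelihood gained by modeling the two columns jointly instead of independently. The sites below and the function names are hypothetical; the cited papers use their own, more elaborate statistics.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(sites, i, j):
    """Mutual information (bits) between columns i and j of aligned sites."""
    n = len(sites)
    count_i = Counter(s[i] for s in sites)
    count_j = Counter(s[j] for s in sites)
    count_ij = Counter((s[i], s[j]) for s in sites)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2((c * n) / (count_i[a] * count_j[b]))
    return mi


def ranked_pairs(sites):
    """All position pairs, sorted from most to least dependent."""
    length = len(sites[0])
    pairs = [((i, j), mutual_information(sites, i, j))
             for i, j in combinations(range(length), 2)]
    return sorted(pairs, key=lambda item: item[1], reverse=True)


# Hypothetical sites in which positions 0 and 1 always change together.
sites = ["ACGT", "ACGA", "TGGT", "TGGA", "ACGT", "TGGA"]
for (i, j), mi in ranked_pairs(sites):
    print(f"positions {i},{j}: MI = {mi:.3f} bits")
```

With only a handful of sites the estimate is biased upward, so in practice a significance check (for example, shuffling one column and recomputing) would be needed before calling a pair dependent.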

Page 9: TF-DNA binding dependency A progress report

Current state of affairs (4)

• Well, these approaches just fit a model to a set of sequences known to bind. The binding energy is not really taken into account.

• So others, with more $$$ in their labs, ran large biological experiments to see whether the experimentally measured binding energies of some TFs exhibit dependency patterns.
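The energy-based test can be sketched as a check for non-additivity: if two positions are independent, the change in binding energy of the double mutant should equal the sum of the two single-mutant changes. The measurements, the tolerance, and the function names below are all hypothetical placeholders, not data from the papers.

```python
def additive_prediction(ddg_a, ddg_b):
    """Double-mutant change expected if the two positions are independent."""
    return ddg_a + ddg_b


def coupling_energy(ddg_double, ddg_a, ddg_b):
    """Deviation of the measured double mutant from the additive prediction."""
    return ddg_double - additive_prediction(ddg_a, ddg_b)


# Hypothetical binding energy changes relative to the wild-type site
# (kcal/mol). These numbers are invented for illustration only.
measurements = {
    "mutate position 1": 1.0,
    "mutate position 2": 1.5,
    "mutate both": 1.2,
}

coupling = coupling_energy(
    measurements["mutate both"],
    measurements["mutate position 1"],
    measurements["mutate position 2"],
)

# Hypothetical experimental error; a real analysis would use replicates.
tolerance = 0.3
is_dependent = abs(coupling) > tolerance
print(f"coupling = {coupling:.2f} kcal/mol, dependent: {is_dependent}")
```

A coupling energy near zero is what the PSSM picture predicts; a value well outside the measurement error is the kind of dependency pattern these experiments look for.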

Page 10: TF-DNA binding dependency A progress report

Current state of affairs (5)

• Hence some more papers:
  – NAR 2002: Nucleotides of transcription factor binding sites exert interdependent effects on the binding affinities of transcription factors
  – NAR 2002: Additivity in protein-DNA interactions-how good an approximation is it?
  – Nature Biotechnology 2006: Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities
  – Science 2009: Diversity and Complexity in DNA Recognition by Transcription Factors
  – PLoSCompBio 2009: Inferring Binding Energies from Selected Binding Sites

Page 11: TF-DNA binding dependency A progress report

Current state of affairs (6)

From Science 2009: protein binding microarray experiment.

Page 12: TF-DNA binding dependency A progress report

Current state of affairs (7)

• Yet none of the publications I have read so far gives concrete evidence on HOW such dependencies could arise.

• We are now trying to find out what happens at the physical level when two positions in the DNA are dependent.

Page 13: TF-DNA binding dependency A progress report

Known dependency factor 1 – Rotamer

• Recently there was an experiment involving the zinc finger TF Zif268, which has been one of the most popular zinc finger modeling targets.

Page 14: TF-DNA binding dependency A progress report

Known dependency factor 1 – Rotamer

• They changed the wild-type DNA sequence GCG to ACG, CCG, AAG, and CAG.

• We want to see whether a program that changes the side chains of the TF to conform to the new DNA sequence can approximate the change in the binding energy.

• We tried FoldX. It does rotamer checks, but we are not sure whether its search is optimal.

Page 15: TF-DNA binding dependency A progress report

FoldX results

Total energy | Backbone Hbond | Sidechain Hbond | Van der Waals | Electrostatics | Solvation Polar | Solvation Hydrophobic
0 | 0 | 0 | 0 | 0 | 0 | 0
4.23 | -0.36 | 5.01 | 2.08 | 2.25 | -5.13 | 0.95
4.28 | 0 | 4.37 | 0.06 | 1 | -1.23 | -0.17
-0.02 | -0.01 | 1.96 | 0.87 | 0.29 | -3.1 | -0.1
4 | -0.35 | 6.81 | 3.14 | 2.38 | -8.67 | 1.17
4.39 | 0 | 5.58 | 1.28 | 1.55 | -4.15 | -0.13

Page 16: TF-DNA binding dependency A progress report

Known dependency factor 1 – Rotamer

• However, the rotamers that FoldX predicts do not coincide with the diagrams.

• Either FoldX is not optimal, or the homology modeling done in the paper is not accurate.

• But given the close agreement between the paper's predicted and experimental differences in binding affinity, the paper's models are most probably (more) correct.

• I am still checking on that.

Page 17: TF-DNA binding dependency A progress report

Known dependency factor 2 – Water

• What the NAR paper computes explicitly is the solvation penalties (the circles, rectangles and triangles in the diagram).

• They claim that the water-mediated H-bonds are not that crucial.

• We can see that FoldX does compute hydration to a certain extent. Yet its rotamer search may not be good enough.

Page 18: TF-DNA binding dependency A progress report

Different solvation state of polar atoms

Page 19: TF-DNA binding dependency A progress report

Known dependency factor 3 – DNA flexibility

• DNA is not a rigid rod.

Page 20: TF-DNA binding dependency A progress report

Known dependency factor 3 – DNA flexibility

[Figure with labels A, G, T, C]

Page 21: TF-DNA binding dependency A progress report

Known dependency factor 3 – DNA flexibility

Page 22: TF-DNA binding dependency A progress report

Known dependency factor 3 – DNA flexibility

• G-C will have a higher roll angle, making it less stable (weaker stacking energy) and easier to “open”.

• Several works show that different dinucleotide steps have different bending and twisting energies.
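The link between flexibility and position dependency can be sketched as follows: if the deformation cost is a sum over dinucleotide steps, then changing one base alters its steps with both neighbours, so the effect of a substitution depends on what sits next to it. The `STEP_COST` values below are invented placeholders (not published stacking or roll parameters), and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hypothetical deformation cost for the 10 unique dinucleotide steps
# (arbitrary units). A step and its reverse complement share one value.
STEP_COST = {
    "AA": 0.4, "AC": 0.6, "AG": 0.5, "AT": 0.7, "CA": 0.2,
    "CC": 0.5, "CG": 0.3, "GA": 0.5, "GC": 0.8, "TA": 0.1,
}


def step_cost(step):
    """Look up a step, falling back to its reverse complement."""
    if step in STEP_COST:
        return STEP_COST[step]
    return STEP_COST[step.translate(COMPLEMENT)[::-1]]


def deformation_cost(seq):
    """Sum the cost of every consecutive dinucleotide step in seq."""
    return sum(step_cost(seq[i:i + 2]) for i in range(len(seq) - 1))


def substitution_effect(seq, pos, new_base):
    """Change in deformation cost when seq[pos] is replaced by new_base."""
    mutated = seq[:pos] + new_base + seq[pos + 1:]
    return deformation_cost(mutated) - deformation_cost(seq)


# Same T-to-C substitution at position 0, but with a different neighbour.
effect_next_to_a = substitution_effect("TATA", 0, "C")
effect_next_to_c = substitution_effect("TCTA", 0, "C")
print(f"{effect_next_to_a:.2f} vs {effect_next_to_c:.2f}")
```

Because the two effects differ, no single per-position score for "C at position 0" can reproduce both, which is one physical route to the dependencies discussed earlier.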

Page 23: TF-DNA binding dependency A progress report

Known dependency factor 3 – DNA flexibility

• The TATA binding protein actually binds TATA not because it generates the best binding energy.

• The bindings are mostly non-specific.

Page 24: TF-DNA binding dependency A progress report

Known dependency factor 3 – DNA flexibility

Page 25: TF-DNA binding dependency A progress report

Conclusion

• Up to now, these 3 factors are the known/most probable sources of dependency in TF-DNA binding.

• The challenge is to combine all of them into one scoring function that is simple enough to run on large datasets.

Page 26: TF-DNA binding dependency A progress report

Thank you for bearing with me.

Q & A