central dogma of molecular biology - cs.cornell.edu visit day slides/compbio-uri.pdf · • dna is...

Central Dogma of Molecular Biology

• DNA is merely the blueprint

• Shared spatially (eyes, ears, heart etc.) and temporally (from cradle to tomb)

• What changes is what exactly, and how much, each cell “produces” at any given moment

• To “first order” the products are proteins, and the two-stage process involves

• transcription: an imprint of the DNA is taken by mRNA

• translation: the mRNA is used to guide the assembly of proteins

• Protein production can be regulated by transcription factors which

• bind to specific DNA sites (Transcription Factor Binding Sites)

• regulate transcription rate of proximal genes
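The two stages named above, transcription and translation, can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes the input is the coding strand, starts reading at position 0, and ignores promoters, splicing and regulation. The function names `transcribe` and `translate` are illustrative choices, not part of the slides.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each RNA codon (T written as U) to its amino acid; '*' marks a stop codon.
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")  # toy gene: start codon, one codon, stop codon
    print(mrna, translate(mrna))
```

The same DNA always yields the same protein here; what the sketch leaves out is exactly the regulation step, where transcription factors change how often a gene is transcribed.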

Transcription initiation of the glnA gene in E. coli

Motif finding

• Do these sequences share a common TFBS?

• tagcttcatcgttgacttctgcagaaagcaagctcctgagtagctggccaagcgagctgcttgtgcccggctgcggcggttgtatcctgaatacgccatgcgccctgcagctgctagaccctgcagccagctgcgcctgatgaaggcgcaacacgaaggaaagacgggaccagggcgacgtcctattaaaagataatcccccgaacttcatagtgtaatctgcagctgctcccctacaggtgcaggcacttttcggatgctgcagcggccgtccggggtcagttgcagcagtgttacgcgaggttctgcagtgctggctagctcgacccggattttgacggactgcagccgattgatggaccattctattcgtgacacccgacgagaggcgtccccccggcaccaggccgttcctgcaggggccaccctttgagttaggtgacatcattcctatgtacatgcctcaaagagatctagtctaaatactacctgcagaacttatggatctgagggagaggggtactctgaaaagcgggaacctcgtgtttatctgcagtgtccaaatcctat
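When the site is repeated exactly, as in the text above, a simple k-mer tally is enough to expose it. The following is a minimal sketch, assuming the motif width `k` is known in advance and the sequences are treated as one string; `kmer_counts` and `occurrences` are hypothetical helper names, and the input is a toy string rather than the slide's sequence.

```python
from collections import Counter


def kmer_counts(text: str, k: int) -> Counter:
    """Count every length-k substring of text (overlapping windows included)."""
    return Counter(text[i:i + k] for i in range(len(text) - k + 1))


def occurrences(text: str, motif: str) -> list[int]:
    """Return the 0-based start positions of exact matches of motif in text."""
    last_start = len(text) - len(motif)
    return [i for i in range(last_start + 1) if text.startswith(motif, i)]


if __name__ == "__main__":
    toy = "aactgcagttctgcagg"  # toy sequence with two planted copies of ctgcag
    print(kmer_counts(toy, 6).most_common(1))
    print(occurrences(toy, "ctgcag"))
```

Running `kmer_counts` on the full text with `k = 6` and inspecting the most frequent entries is one way to look for a shared exact site.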

If only life could be that simple

• The binding sites are almost never exactly the same

• A more likely sample is:

tagcttcatcgttgactttTGaAGaaagcaagctcctgagtagctggccaagcgagctgcttgtgcccggctgcggcggttgtatcctgaatacgccatgcgccCTGgAGctgctagaccCTGCAGccagctgcgcctgatgaaggcgcaacacgaaggaaagacgggaccagggcgacgtcctattaaaagataatcccccgaacttcatagtgtaatCTGCAGctgctcccctacaggtgcaggcacttttcggatgCTGCttcggccgtccggggtcagttgcagcagtgttacgcgaggttCTaCAGtgctggctagctcgacccggattttgacggaCTGCAGccgattgatggaccattctattcgtgacacccgacgagaggcgtccccccggcaccaggccgttcCTaCAGgggccaccctttgagttaggtgacatcattcctatgtacatgcctcaaagagatctagtctaaatactacCTaCAGaacttatggatctgagggagaggggtactctgaaaagcgggaacctcgtgtttattTGCAttgtccaaatcctat
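Once sites are allowed to vary, exact matching no longer works and some tolerance for mismatches is needed. The sketch below scans for windows within a given Hamming distance of a consensus string. It assumes the consensus (here CTGCAG) is already known, which is precisely what a real motif finder does not have; `hamming` and `approximate_matches` are illustrative names.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def approximate_matches(text: str, consensus: str, max_mismatches: int):
    """Return (position, window, distance) for every window close to consensus."""
    text, consensus = text.lower(), consensus.lower()
    k = len(consensus)
    hits = []
    for i in range(len(text) - k + 1):
        window = text[i:i + k]
        distance = hamming(window, consensus)
        if distance <= max_mismatches:
            hits.append((i, window, distance))
    return hits


if __name__ == "__main__":
    print(approximate_matches("aactacagtt", "CTGCAG", 1))
```

Each of the capitalized occurrences in the sample above is within two mismatches of CTGCAG. Allowing two mismatches in a six-letter word also admits many chance matches, which is one reason the significance question matters.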

The dual face of motif finding

• Motif finding really consists of two problems:

• finding the most pronounced motifs in the text

• statistical significance: are they merely artifacts of the size of the

• In the remaining few minutes I will touch on the second problem

Assessing the significance of a putative motif

• Begin by aligning the motif occurrences:

tTGaAG
CTGgAG
CTGCAG
CTGCAG
CTGCtt
CTaCAG
CTGCAG
CTaCAG
CTaCAG
tTGCAt

then create the alignment matrix:

     1   2   3   4   5   6
A    0   0   3   1   9   0
C    8   0   0   8   0   0
G    0   0   7   1   0   8
T    2  10   0   0   1   2

which you then summarize with the entropy score:

• $I := \sum_{\text{column } i} \sum_{\text{letter } j} n_{ij} \log \frac{n_{ij}/n}{q_j}$  (n = 10)
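The alignment matrix and the entropy score can be computed directly from the ten aligned occurrences. This is a minimal sketch: the slides do not state the background frequencies q_j, so a uniform background of 0.25 per letter is assumed here, and the natural logarithm is used. `count_matrix` and `entropy_score` are illustrative names.

```python
import math

ALPHABET = "ACGT"

# The ten aligned motif occurrences from the slide.
SITES = ["tTGaAG", "CTGgAG", "CTGCAG", "CTGCAG", "CTGCtt",
         "CTaCAG", "CTGCAG", "CTaCAG", "CTaCAG", "tTGCAt"]


def count_matrix(sites):
    """Return {letter: [count in column 1, count in column 2, ...]}."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same width")
    counts = {letter: [0] * width for letter in ALPHABET}
    for site in sites:
        for i, letter in enumerate(site.upper()):
            counts[letter][i] += 1
    return counts


def entropy_score(sites, background=None):
    """Sum over columns i and letters j of n_ij * log((n_ij / n) / q_j)."""
    if background is None:
        # Assumption: uniform background; the slides leave q_j unspecified.
        background = {letter: 0.25 for letter in ALPHABET}
    n = len(sites)
    score = 0.0
    for letter, row in count_matrix(sites).items():
        for n_ij in row:
            if n_ij > 0:  # a zero count contributes nothing to the sum
                score += n_ij * math.log((n_ij / n) / background[letter])
    return score


if __name__ == "__main__":
    for letter, row in count_matrix(SITES).items():
        print(letter, row)
    print("I =", entropy_score(SITES))
```

A fully conserved column contributes n log(1/q_j), the largest possible amount, while a column that matches the background contributes close to zero.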

What’s in a score?

• By itself, the entropy score s of a particular motif has limited use

• we cannot compare scores of alignments with varying depth or

• The solution is to assess the statistical significance of s

• This is often accomplished by computing the p-value of the observed score:

• assuming the observed columns are randomly drawn from the background frequencies {qa, qc, qg, qt}, what is P0(I ≥ s)?

• This is not as simple as it might look at first sight
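One naive way to approximate P0(I ≥ s) is plain Monte Carlo simulation: draw random columns from the background, score them, and count how often the score reaches s. The sketch below is that naive estimator only, not the method developed in this work, and it illustrates the difficulty: with a feasible number of trials it cannot resolve the very small p-values that interesting motifs have. The function names, the default trial count and the fixed seed are all assumptions made for the example.

```python
import math
import random


def column_score(counts, n, q):
    """Contribution of one column: sum over letters of n_ij * log((n_ij/n)/q_j)."""
    return sum(c * math.log((c / n) / q_j) for c, q_j in zip(counts, q) if c > 0)


def random_alignment_score(n, width, q, rng):
    """Score one alignment of n rows whose columns are drawn from background q."""
    total = 0.0
    for _ in range(width):
        letters = rng.choices(range(len(q)), weights=q, k=n)
        counts = [letters.count(j) for j in range(len(q))]
        total += column_score(counts, n, q)
    return total


def monte_carlo_p_value(s, n, width, q, trials=10000, seed=0):
    """Estimate P0(I >= s) as the fraction of random alignments scoring >= s."""
    rng = random.Random(seed)
    hits = sum(random_alignment_score(n, width, q, rng) >= s for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    uniform = [0.25, 0.25, 0.25, 0.25]  # assumed background frequencies
    print(monte_carlo_p_value(s=15.0, n=10, width=6, q=uniform))
```

An estimate of exactly zero from this procedure only says the true p-value is probably below roughly one over the number of trials, which is why faster and more accurate methods are needed.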

Can I submit two different answers?

MEME E-values compared with Consensus E-values (log10 scale)

• MEME is consistently pessimistic when compared with Consensus (by a factor of over 500 at times)

• Who’s right?

Our work

• We developed a method that borrows ideas from large-deviation theory to compute a reliable answer reasonably fast

• The same underlying idea can be used for other fundamental statistical problems

• Joint work with: Neil Jones (Ph.D. student at UCSD), Niranjan Nagarajan (Ph.D. student here)
