central dogma of molecular biology - cs.cornell.edu visit day slides/compbio-uri.pdf · • dna is...

Post on 20-Apr-2018

1

Central Dogma of Molecular Biology

• DNA is merely the blueprint

• Shared spatially (eyes, ears, heart etc.) and temporally (from cradle to tomb)

• What changes is what exactly, and how much, each cell “produces” at any given moment

• To “first order” the products are proteins and the two-stage process involves

  • transcription: an imprint of the DNA is taken by mRNA

  • translation: the mRNA is used to guide the assembly of proteins

• Protein production can be regulated by transcription factors which

  • bind to specific DNA sites (Transcription Factor Binding Sites)

  • regulate transcription rate of proximal genes
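
As a toy illustration of the two stages on this slide, the sketch below transcribes a DNA string into mRNA and translates it into a protein string. The function names are hypothetical, and the input is assumed to be the coding strand, so that transcription reduces to replacing T with U; regulation, splicing and the template strand are all ignored.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order
# (one-letter amino acid codes, "*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("atggcctaa")  # made-up input, not taken from the slides
protein = translate(mrna)
print(mrna, protein)
```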

2

Transcription initiation of the glnA gene in E. coli

3

Motif finding

• Do these sequences share a common TFBS?

• tagcttcatcgttgacttctgcagaaagcaagctcctgagtagctggccaagcgagctgcttgtgcccggctgcggcggttgtatcctgaatacgccatgcgccctgcagctgctagaccctgcagccagctgcgcctgatgaaggcgcaacacgaaggaaagacgggaccagggcgacgtcctattaaaagataatcccccgaacttcatagtgtaatctgcagctgctcccctacaggtgcaggcacttttcggatgctgcagcggccgtccggggtcagttgcagcagtgttacgcgaggttctgcagtgctggctagctcgacccggattttgacggactgcagccgattgatggaccattctattcgtgacacccgacgagaggcgtccccccggcaccaggccgttcctgcaggggccaccctttgagttaggtgacatcattcctatgtacatgcctcaaagagatctagtctaaatactacctgcagaacttatggatctgagggagaggggtactctgaaaagcgggaacctcgtgtttatctgcagtgtccaaatcctat

4

If only life could be that simple

• The binding sites are almost never exactly the same

• A more likely sample is:

tagcttcatcgttgactttTGaAGaaagcaagctcctgagtagctggccaagcgagctgcttgtgcccggctgcggcggttgtatcctgaatacgccatgcgccCTGgAGctgctagaccCTGCAGccagctgcgcctgatgaaggcgcaacacgaaggaaagacgggaccagggcgacgtcctattaaaagataatcccccgaacttcatagtgtaatCTGCAGctgctcccctacaggtgcaggcacttttcggatgCTGCttcggccgtccggggtcagttgcagcagtgttacgcgaggttCTaCAGtgctggctagctcgacccggattttgacggaCTGCAGccgattgatggaccattctattcgtgacacccgacgagaggcgtccccccggcaccaggccgttcCTaCAGgggccaccctttgagttaggtgacatcattcctatgtacatgcctcaaagagatctagtctaaatactacCTaCAGaacttatggatctgagggagaggggtactctgaaaagcgggaacctcgtgtttattTGCAttgtccaaatcctat
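
To make "almost never exactly the same" concrete, the sketch below scans a sequence for windows within a given Hamming distance of a consensus word. The consensus CTGCAG is read off the capitalized occurrences in the sample above; the function names and the mismatch threshold are assumptions for illustration. This is not a motif finder, since it presupposes that the consensus is already known.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def approximate_hits(sequence: str, consensus: str, max_mismatches: int):
    """Yield (position, window, mismatches) for windows close to the consensus."""
    seq, w = sequence.upper(), len(consensus)
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        d = hamming(window, consensus)
        if d <= max_mismatches:
            yield i, window, d


# Short excerpt from the sample above, containing one of the marked occurrences.
excerpt = "cgttgactttTGaAGaaagcaag"
for hit in approximate_hits(excerpt, "CTGCAG", max_mismatches=2):
    print(hit)
```

The looser the threshold, the more windows in a long sequence will match by chance alone, which is where the question of statistical significance comes in.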

5

The dual face of motif finding

• Motif finding really consists of two problems:

  • finding the most pronounced motifs in the text

  • statistical significance: are they merely artifacts of the size of the data?

• In the remaining few minutes I will touch on the second problem

6

Assessing the significance of a putative motif

• Begin by aligning the motif occurrences:

  tTGaAG
  CTGgAG
  CTGCAG
  CTGCAG
  CTGCtt
  CTaCAG
  CTGCAG
  CTaCAG
  CTaCAG
  tTGCAt

then create the alignment matrix:

      1   2   3   4   5   6
  A   0   0   3   1   9   0
  C   8   0   0   8   0   0
  G   0   0   7   1   0   8
  T   2  10   0   0   1   2

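which you then summarize with the entropy score:

$$I := \sum_{\text{column } i} \; \sum_{\text{letter } j} n_{ij} \log \frac{n_{ij}/n}{q_j} \qquad (n = 10)$$

where $n_{ij}$ is the count of letter $j$ in column $i$, $n$ is the number of aligned occurrences, and $q_j$ is the background frequency of letter $j$.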
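
The alignment matrix and entropy score on this slide can be sketched as follows. The ten occurrences are the ones aligned above, converted to upper case; the uniform background (q_j = 0.25 for every letter) and the function names are assumptions for illustration, since the slides do not state which background frequencies were used.

```python
import math

OCCURRENCES = ["TTGAAG", "CTGGAG", "CTGCAG", "CTGCAG", "CTGCTT",
               "CTACAG", "CTGCAG", "CTACAG", "CTACAG", "TTGCAT"]
ALPHABET = "ACGT"


def count_matrix(occurrences):
    """Return one dict of letter counts per alignment column."""
    width = len(occurrences[0])
    return [
        {letter: sum(occ[i] == letter for occ in occurrences) for letter in ALPHABET}
        for i in range(width)
    ]


def entropy_score(columns, background):
    """I = sum over columns i and letters j of n_ij * log((n_ij / n) / q_j)."""
    score = 0.0
    for col in columns:
        n = sum(col.values())
        for letter, n_ij in col.items():
            if n_ij > 0:  # a zero count contributes nothing (0 * log 0 := 0)
                score += n_ij * math.log((n_ij / n) / background[letter])
    return score


uniform = {letter: 0.25 for letter in ALPHABET}  # assumed background frequencies
columns = count_matrix(OCCURRENCES)
print(columns[0])  # letter counts in the first column
print(entropy_score(columns, uniform))
```

A perfectly conserved column contributes the most: with n = 10 and a uniform background it adds 10 log 4 to the score, while a column that mirrors the background contributes close to nothing.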

7

What’s in a score?

• By itself, the entropy score s of a particular motif has limited use

  • we cannot compare scores of alignments with varying depth or width

• The solution is to assess the statistical significance of s

  • This is often accomplished by computing the p-value of the observed score:

    • assuming the observed columns are randomly drawn from the background frequencies {q_a, q_c, q_g, q_t}

    • what is P_0(I ≥ s)?

• This is not as simple as it might look at first sight
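
One way to see why the p-value is harder than it looks is to try the most naive estimator: draw random alignments from the background and count how often their score reaches s. The sketch below does that. It is an illustration only and not the method discussed in this talk; the function names, the uniform background, the trial count and the example threshold are all assumptions.

```python
import math
import random

ALPHABET = "ACGT"


def column_score(counts, background, n):
    """Entropy contribution of one column given its letter counts."""
    return sum(
        c * math.log((c / n) / background[a]) for a, c in counts.items() if c > 0
    )


def random_score(width, depth, background, rng):
    """Entropy score of an alignment whose letters are drawn i.i.d. from the background."""
    letters = list(background)
    weights = [background[a] for a in letters]
    total = 0.0
    for _ in range(width):
        column = rng.choices(letters, weights=weights, k=depth)
        counts = {a: column.count(a) for a in letters}
        total += column_score(counts, background, depth)
    return total


def monte_carlo_p_value(s, width, depth, background, trials=10_000, seed=0):
    """Fraction of random alignments whose score is at least s."""
    rng = random.Random(seed)
    hits = sum(random_score(width, depth, background, rng) >= s for _ in range(trials))
    return hits / trials


uniform = {a: 0.25 for a in ALPHABET}  # assumed background frequencies
print(monte_carlo_p_value(s=20.0, width=6, depth=10, background=uniform))
```

For a strongly conserved motif the true probability can be far smaller than 1/trials, in which case this estimator simply returns 0. Sampling alone therefore cannot resolve the tail probabilities that matter most.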

8

Can I submit two different answers?

MEME E-values compared with Consensus E-values (log10 scale)

• MEME is consistently pessimistic when compared with Consensus (by a factor of over 500 at times)

• Who’s right?

9

Our work

• We developed a method that borrows ideas from large-deviation theory to compute a reliable answer reasonably fast

• The same underlying idea can be used for other fundamental statistical problems

• Joint work with: Neil Jones (Ph.D. student at UCSD), Niranjan Nagarajan (Ph.D. student here)
