Post on 01-Jan-2016

224 Views

Category:

Documents

5 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Recitation on EM

Slides taken from:

http://www.cs.ucsb.edu/~ambuj/Courses/bioinformatics/EM.pdf

Computational Genomics

Recitation #6

All EM questions are in the format:

1. Write the likelihood function.
2. Write the Q function.
3. Derive the update rule.

Estimation problems

What is the unobserved data in this case?

EM question

• Let G = (G_1, …, G_n) be n contiguous DNA regions representing genes. For each G_i we define the mRNA concentration of the gene as P_i, such that ∑_i P_i = 1. P = (P_1, …, P_n) can be interpreted as the normalized expression levels of the regions in G.

• Our model assumes that reads are generated by randomly picking a region from G according to the distribution P and then copying this region. The copying process is error-prone. This process is repeated until we have a set of m reads R = (r_1, …, r_m) generated according to this model.

• For each region G_i and read r_j, we have a probability p_ij = P(r_j | G_i), the probability of observing r_j given that the locus of the read was gene G_i. In practice, for each read r_j, this probability will be close to zero for all but a few regions.
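To make the generative model concrete, here is a minimal simulation sketch. The slides do not specify the error model, so everything about it is an assumption made for illustration: reads are full-length copies of a region, each base is independently replaced by one of the three other bases with probability eps, and the helper names (copy_with_errors, sample_reads, read_given_gene) are hypothetical.

```python
import random

BASES = "ACGT"

def copy_with_errors(region, eps, rng):
    """Copy a region base by base; with probability eps a base is
    replaced by one of the three other bases (assumed error model)."""
    out = []
    for b in region:
        if rng.random() < eps:
            out.append(rng.choice([c for c in BASES if c != b]))
        else:
            out.append(b)
    return "".join(out)

def sample_reads(genes, P, m, eps, rng):
    """Draw m reads: pick gene i with probability P[i], then copy it with errors."""
    reads = []
    for _ in range(m):
        i = rng.choices(range(len(genes)), weights=P, k=1)[0]
        reads.append(copy_with_errors(genes[i], eps, rng))
    return reads

def read_given_gene(read, gene, eps):
    """p_ij = P(r_j | G_i) under the substitution-only model assumed above."""
    if len(read) != len(gene):
        return 0.0  # this toy model cannot change the length
    prob = 1.0
    for a, b in zip(read, gene):
        prob *= (1.0 - eps) if a == b else eps / 3.0
    return prob
```

Under this toy model, p_ij is large only when a read matches a gene at nearly every position, which is one way to see why most p_ij are close to zero in practice.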

Likelihood function

• Write the likelihood of observing the m reads.
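One way to write it, assuming the reads are independent given P and the p_ij are known constants: each read is a mixture over its possible source genes, so the likelihood is a product over reads of a sum over genes.

```latex
L(P) = P(r_1, \dots, r_m \mid P)
     = \prod_{j=1}^{m} \sum_{i=1}^{n} P_i \, p_{ij},
\qquad
\log L(P) = \sum_{j=1}^{m} \log \sum_{i=1}^{n} P_i \, p_{ij}.
```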

Q function

• Write the Q(P | P^(t)) term.
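A sketch of one standard way to obtain it. The notation z_j (the unobserved gene of origin of read r_j) and w_ij^(t) (the posterior probability that r_j came from G_i under the current estimate P^(t)) is introduced here and does not appear in the slides. The Q function is the expected complete-data log-likelihood under that posterior.

```latex
\log P(R, z \mid P) = \sum_{j=1}^{m} \log\big( P_{z_j} \, p_{z_j j} \big),
\qquad
w_{ij}^{(t)} = P(z_j = i \mid r_j, P^{(t)})
             = \frac{P_i^{(t)} \, p_{ij}}{\sum_{k=1}^{n} P_k^{(t)} \, p_{kj}},

Q(P \mid P^{(t)}) = \sum_{j=1}^{m} \sum_{i=1}^{n} w_{ij}^{(t)} \big( \log P_i + \log p_{ij} \big).
```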

M-step

• Write the M-step term using argmax function.
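A possible form, writing w_ij^(t) for the posterior probability that read r_j came from gene G_i under P^(t) (notation introduced here, not in the slides). The log p_ij terms of Q do not depend on P, so they drop out of the maximization.

```latex
w_{ij}^{(t)} = \frac{P_i^{(t)} \, p_{ij}}{\sum_{k=1}^{n} P_k^{(t)} \, p_{kj}},
\qquad
P^{(t+1)} = \arg\max_{P \,:\, \sum_i P_i = 1,\; P_i \ge 0} Q(P \mid P^{(t)})
          = \arg\max_{P \,:\, \sum_i P_i = 1,\; P_i \ge 0}
            \sum_{i=1}^{n} \Big( \sum_{j=1}^{m} w_{ij}^{(t)} \Big) \log P_i .
```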

Update rule

• Infer the update step for P from the M-step expression (part c).

When we want to maximize ∑_i a_i log(P_i) over P subject to ∑_i P_i = 1 (with a_i ≥ 0), the maximum is attained at P_i = a_i / ∑_k a_k.
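Applying this fact with a_i equal to the expected number of reads assigned to gene G_i (the sum over reads of the posterior probability that the read came from G_i) gives the update P_i ← a_i / m, because the posteriors of each read sum to 1 and so ∑_i a_i = m. The following is a minimal pure-Python sketch of the resulting iteration. The function name em_expression, the uniform starting point, and the stopping tolerance are illustrative choices, not part of the slides.

```python
def em_expression(p, n_iter=100, tol=1e-10):
    """Estimate P by EM from p[i][j] = P(r_j | G_i).

    p is a list of n rows (genes), each holding m entries (reads).
    """
    n, m = len(p), len(p[0])
    P = [1.0 / n] * n  # uniform start (an arbitrary choice)
    for _ in range(n_iter):
        # E-step: a[i] accumulates the posterior weight of gene i over all reads.
        a = [0.0] * n
        for j in range(m):
            denom = sum(P[k] * p[k][j] for k in range(n))
            if denom == 0.0:
                raise ValueError(f"read {j} has zero probability under every gene")
            for i in range(n):
                a[i] += P[i] * p[i][j] / denom
        # M-step: P_i = a_i / sum_k a_k, and sum_k a_k equals m.
        new_P = [a_i / m for a_i in a]
        converged = max(abs(x - y) for x, y in zip(new_P, P)) < tol
        P = new_P
        if converged:
            break
    return P


# Three reads that can only come from gene 0, one read only from gene 1.
print(em_expression([[1, 1, 1, 0], [0, 0, 0, 1]]))  # [0.75, 0.25]
```

With unambiguous reads the estimate is just the fraction of reads per gene. When reads are ambiguous, each one is split across genes in proportion to P_i · p_ij.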
