Lecture 2: Splicing graphs / annotated transcript expression estimation


Upload: merilyn-horn

Posted on 26-Dec-2015






  • Slide 1
  • LECTURE 2: Splicing graphs / Annotated transcript expression estimation
  • Slide 2
  • Slide 3
  • Transcriptomics: the study of RNA biology. [Figure: a gene in the genome undergoes transcription and alternative splicing, yielding transcripts at different abundances: 1203x, 87x and 234x.] How can we measure these?
  • Slide 4
  • RNA-seq: split-read alignment. [Figure: reads are aligned to the genome; a read spanning an exon junction aligns in pieces to the gene.] More on this in the Thursday study group.
  • Slide 5
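  • Splicing graph. [Figure: splicing graph annotated with read counts 1603, 1508, 1380, 1597, 95, 1511, 220, 1198 and 1303, explained by transcript paths with abundances 1203x, 87x and 234x.] Find the paths that best explain the graph, e.g. the transcript paths shown in the figure. We'll study this problem next week.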
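Each candidate transcript is a source-to-sink path in the splicing graph, so a natural first step is to enumerate those paths. A minimal Python sketch, assuming a hypothetical four-exon graph with junctions e1→e2, e1→e3, e2→e3, e2→e4 and e3→e4; this is our reading of the figure, chosen to agree with the least-squares terms on Slide 6, and the exon names are illustrative:

```python
from typing import Dict, List

# Hypothetical splicing graph (our assumption): exons are nodes,
# splice junctions are directed edges in genomic order.
SPLICING_GRAPH: Dict[str, List[str]] = {
    "e1": ["e2", "e3"],
    "e2": ["e3", "e4"],
    "e3": ["e4"],
    "e4": [],
}


def enumerate_paths(graph: Dict[str, List[str]], source: str, sink: str) -> List[List[str]]:
    """Return every source-to-sink path using depth-first search.

    A splicing graph is a DAG (exons are ordered along the genome),
    so no visited set is needed to avoid cycles.
    """
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == sink:
            paths.append(path)
            return
        for nxt in graph[node]:
            dfs(nxt, path + [nxt])

    dfs(source, [source])
    return paths


if __name__ == "__main__":
    for p in enumerate_paths(SPLICING_GRAPH, "e1", "e4"):
        print(" -> ".join(p))
```

Under this assumed graph the three paths are e1→e2→e3→e4, e1→e2→e4 and e1→e3→e4. In general the number of paths can grow exponentially with the number of exons, which is why choosing the paths that best explain the counts is a problem of its own.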
  • Slide 6
  • Annotated transcript expression estimation. [Figure: the same splicing graph with read counts 1603, 1508, 1380, 1597, 95, 1511, 220, 1198 and 1303, and three annotated transcripts a, b, c of unknown abundance ?x.] Assume we know the possible transcripts, but their relative abundances a, b, c are unknown. Least squares problem: minimize

    $$f(a,b,c) = (1603-(a+b+c))^2 + (95-b)^2 + (1511-(a+c))^2 + (1508-(a+c))^2 + (220-c)^2 + (1198-a)^2 + (1380-(a+b))^2 + (1303-(a+b))^2 + (1597-(a+b+c))^2$$
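The objective has a generic shape: each observed count is compared with the sum of the abundances of the transcripts that cover it. A minimal Python sketch; the `COVERAGE` table (count paired with its covering transcripts) is transcribed term by term from f(a,b,c) above, and the names `COVERAGE` and `squared_error` are ours:

```python
from typing import Dict, List, Tuple

# (observed read count, transcripts whose abundances should add up to it),
# transcribed term by term from f(a, b, c).
COVERAGE: List[Tuple[int, str]] = [
    (1603, "abc"), (95, "b"), (1511, "ac"), (1508, "ac"), (220, "c"),
    (1198, "a"), (1380, "ab"), (1303, "ab"), (1597, "abc"),
]


def squared_error(abundance: Dict[str, float]) -> float:
    """Least-squares objective: sum over counts of (count - predicted)^2."""
    total = 0.0
    for count, transcripts in COVERAGE:
        predicted = sum(abundance[t] for t in transcripts)
        total += (count - predicted) ** 2
    return total


if __name__ == "__main__":
    print(squared_error({"a": 1237, "b": 103, "c": 257}))
```

Writing the model as a table keeps f mechanical to extend: a new transcript only adds a letter to the entries it covers.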
  • Slide 7
  • Least squares problem: f(a,b,c) attains its minimum where all partial derivatives of f are zero (f is a sum of squares of linear functions, hence convex, so such a stationary point is the global minimum):

    $$f_a(a,b,c) = 2(a+b+c-1603) + 2(a+c-1511) + 2(a+c-1508) + 2(a-1198) + 2(a+b-1380) + 2(a+b-1303) + 2(a+b+c-1597)$$
    $$f_b(a,b,c) = 2(a+b+c-1603) + 2(b-95) + 2(a+b-1380) + 2(a+b-1303) + 2(a+b+c-1597)$$
    $$f_c(a,b,c) = 2(a+b+c-1603) + 2(a+c-1511) + 2(a+c-1508) + 2(c-220) + 2(a+b+c-1597)$$

    Setting each derivative to zero and dividing by 2 gives the linear system

    $$\begin{aligned} 7a + 4b + 4c &= 10100 \\ 4a + 5b + 2c &= 5978 \\ 4a + 2b + 5c &= 6439 \end{aligned}$$

    This system has a unique solution: a = 21032/17, b = 5260/51, c = 13097/51. (Google "linear equations solver", open the first link, copy-paste the system, click "Solve the system", and copy-paste the result.)
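Instead of an online solver, the normal equations can be assembled and solved exactly in a few lines. A minimal Python sketch using exact rational arithmetic (`fractions.Fraction`) and plain Gauss-Jordan elimination; the coverage table is transcribed from f(a,b,c) on Slide 6, and the helper names are ours:

```python
from fractions import Fraction
from typing import List, Tuple

NAMES = "abc"
# (observed count, transcripts covering it), transcribed from f(a, b, c).
COVERAGE: List[Tuple[int, str]] = [
    (1603, "abc"), (95, "b"), (1511, "ac"), (1508, "ac"), (220, "c"),
    (1198, "a"), (1380, "ab"), (1303, "ab"), (1597, "abc"),
]


def normal_equations() -> List[List[Fraction]]:
    """Build the augmented matrix [M | r] of grad f = 0, divided by 2.

    Row i is the partial derivative w.r.t. transcript i: every term that
    contains i adds its covering transcripts to the left-hand side and
    its observed count to the right-hand side.
    """
    n = len(NAMES)
    rows = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for count, transcripts in COVERAGE:
        for t in transcripts:
            i = NAMES.index(t)
            for u in transcripts:
                rows[i][NAMES.index(u)] += 1
            rows[i][n] += count
    return rows


def solve(rows: List[List[Fraction]]) -> List[Fraction]:
    """Gauss-Jordan elimination with exact fractions (assumes a unique solution)."""
    n = len(rows)
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        pv = rows[col][col]
        rows[col] = [x / pv for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]


if __name__ == "__main__":
    for name, value in zip(NAMES, solve(normal_equations())):
        print(name, value, round(float(value)))
```

Exact fractions reproduce a = 21032/17, b = 5260/51 and c = 13097/51 without rounding noise; for larger systems a floating-point least-squares routine would be the usual choice.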
  • Slide 8
  • Solution: [Figure: the splicing graph with read counts 1603, 1508, 1380, 1597, 95, 1511, 220, 1198 and 1303.] Rounded, the abundances are a ≈ 1237x, b ≈ 103x and c ≈ 257x.
  • Slide 9
  • Without annotation: a gene with n exons has $2^n$ possible transcripts (each exon is either included or skipped). We could solve least squares for each combination of possible transcripts and select the combination with the best solution, but there are $2^{2^n}$ combinations to consider. In next week's Thursday study group we study an algorithm that solves the same problem in polynomial time! Before that, we study an easier problem, where the goal is just to predict transcripts, not their expression levels.
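The blow-up is easy to see by enumerating candidates directly: encode a transcript as a bit mask over the n exons, and a combination as a subset of those masks. A minimal Python sketch (function names are ours) that lists the $2^n$ candidate transcripts and counts the $2^{2^n}$ combinations; like the slide's count, it includes the empty transcript:

```python
from itertools import compress
from typing import List


def candidate_transcripts(n_exons: int) -> List[List[int]]:
    """All 2^n exon subsets; bit i of the mask says whether exon i+1 is kept."""
    exons = list(range(1, n_exons + 1))
    return [
        list(compress(exons, [(mask >> i) & 1 for i in range(n_exons)]))
        for mask in range(2 ** n_exons)
    ]


def number_of_combinations(n_exons: int) -> int:
    """Each candidate transcript is either in the combination or not: 2^(2^n)."""
    return 2 ** len(candidate_transcripts(n_exons))


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, len(candidate_transcripts(n)), number_of_combinations(n))
```

Already at n = 5 there are 2^32 combinations, each needing its own least-squares solve, so the brute force is hopeless beyond toy genes.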
  • Slide 10
  • Study group this Thursday: an algorithm for split-read alignment. Input: maximal exact matches between the genome and an RNA-sequencing read, e.g. between ACGATCATCGCT and ACGAGATCCGCTAGT. Such alignment anchors can be computed efficiently using methods from the Biological Sequence Analysis course.
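To make "maximal exact match" concrete, here is a naive scan: start a match only where it is left-maximal (the preceding characters differ or a string begins) and extend it to the right until a mismatch. This is quadratic in the sequence lengths and is not the efficient method the slide refers to. A minimal Python sketch; the `min_len` threshold and the names are our assumptions:

```python
from typing import List, Tuple


def maximal_exact_matches(x: str, y: str, min_len: int = 1) -> List[Tuple[int, int, int]]:
    """Return (i, j, length) for every maximal exact match x[i:i+length] == y[j:j+length].

    Left-maximal: the match cannot be extended to the left.
    Right-maximal: extension stops at the first mismatch or string end.
    Naive O(|x| * |y| * L) scan, for illustration only.
    """
    mems = []
    for i in range(len(x)):
        for j in range(len(y)):
            if x[i] != y[j]:
                continue
            if i > 0 and j > 0 and x[i - 1] == y[j - 1]:
                continue  # not left-maximal: covered by the match starting at (i-1, j-1)
            length = 0
            while i + length < len(x) and j + length < len(y) and x[i + length] == y[j + length]:
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


if __name__ == "__main__":
    x, y = "ACGATCATCGCT", "ACGAGATCCGCTAGT"
    for i, j, length in maximal_exact_matches(x, y, min_len=4):
        print(i, j, x[i:i + length])
```

With `min_len=4` the slide's example yields three anchors: ACGA at (0, 0), GATC at (2, 4) and CGCT at (8, 8).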
  • Slide 11
  • Study group this Thursday. Output: a consistent split-read alignment that maximally covers the initial local alignments. [Figure: a read aligned across Exon 1, Exon 2 and Exon 3.]
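One simple way to turn anchors into a consistent alignment is colinear chaining: keep anchors that appear in the same order, without overlap, in both sequences, and maximize the total anchored length. A minimal Python sketch with an O(k^2) dynamic program; this is a generic chaining technique, not necessarily the algorithm of the study group. The input anchors are the three length-4 exact matches (ACGA, GATC, CGCT) shared by the example strings on Slide 10, written as (start in first string, start in second string, length):

```python
from typing import List, Tuple

Anchor = Tuple[int, int, int]  # (start in first sequence, start in second, length)


def can_precede(p: Anchor, q: Anchor) -> bool:
    """p may come before q if it ends before q starts in both sequences."""
    return p[0] + p[2] <= q[0] and p[1] + p[2] <= q[1]


def best_chain(anchors: List[Anchor]) -> Tuple[int, List[Anchor]]:
    """Colinear chaining DP: maximize the summed length of a consistent chain."""
    anchors = sorted(anchors)  # any valid predecessor sorts before its successor
    if not anchors:
        return 0, []
    score = [a[2] for a in anchors]  # best chain score ending at anchor k
    prev = [-1] * len(anchors)
    for k, q in enumerate(anchors):
        for m in range(k):
            if can_precede(anchors[m], q) and score[m] + q[2] > score[k]:
                score[k] = score[m] + q[2]
                prev[k] = m
    end = max(range(len(anchors)), key=score.__getitem__)
    chain = []
    while end != -1:
        chain.append(anchors[end])
        end = prev[end]
    return max(score), chain[::-1]


if __name__ == "__main__":
    total, chain = best_chain([(0, 0, 4), (2, 4, 4), (8, 8, 4)])
    print(total, chain)
```

ACGA and GATC overlap, so they cannot both be chained; ACGA followed by CGCT (total length 8) is returned, and GATC followed by CGCT ties with it, with the strict `>` keeping the earlier one. In a split-read setting, large gaps between consecutive anchors on the genome side would correspond to introns.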