Welcome to Introduction to Bioinformatics
Wednesday, 10 February: Genome Sequencing/Assembly
• Genome sequencing/Assembly
What to do for summer vacation?
Deadline, SUNday Feb 28!
Target, Monday Mar 1!
Deadline, ???
Deadline, FRIday Feb 26!
Global Viral Genome Project
Deadline, whenever!
Learn more about…
HHMI: http://www.vcu.edu/csbc/hhmi/
BBSI: http://www.vcu.edu/csbc/bbsi/
VCU-USF: http://www.research.vcu.edu/vpr/fellowship.htm
GVGP: http://biobike.csbc.vcu.edu (News)
What is the sequence (5' to 3') represented by the gel?
Myers et al SQ2
G A T C
Dideoxy sequencing (= Sanger sequencing)
What is the sequence (5' to 3') represented by the gel?
[Figure: gel with lanes G, A, T, C; bands labeled ddC]
TCGTGTACATCGTAACACGGTTAAGT
Myers et al SQ2
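A minimal sketch of how a dideoxy gel is read, assuming each band is recorded as a (fragment length, lane) pair. The toy band list is made up for illustration and is not the gel from the slide. The shortest fragment runs farthest, so reading the bands from shortest to longest gives the newly synthesized strand 5' to 3'.

```python
def read_gel(bands):
    """Return the synthesized strand (5' to 3') from (fragment_length, lane) bands."""
    # Shortest fragments terminate closest to the primer, so they come first.
    ordered = sorted(bands, key=lambda band: band[0])
    return "".join(lane for _, lane in ordered)


# Hypothetical toy gel: four bands, one per fragment length.
toy_bands = [(23, "G"), (21, "T"), (22, "C"), (24, "T")]
print(read_gel(toy_bands))  # TCGT
```

Each lane letter names the dideoxynucleotide that ended the fragment, so the lane of each successive band is the next base in the sequence.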
Sequencing process: Drosophila genome (~100 million nt)
Sequence it
Technical limitation
Reads limited to 100’s of nt
How many possible 500 nt fragments are there?
[Figure: 500 nt fragments sampled from the genome]
How many 500 nt samples are needed to cover 100 million nt?
100 000 000 / 500 = 200 000
Is this enough?
Oversampling … coverage?
1 000 000 samples: 5x oversampling
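The arithmetic on these slides can be sketched as follows, using the slide's figures (a 100 million nt genome and 500 nt reads). The function names are illustrative only.

```python
GENOME_NT = 100_000_000  # Drosophila genome size used on the slide
READ_NT = 500            # read length used on the slide


def min_reads(genome_nt, read_nt):
    """Fewest reads that could tile the genome end to end (ceiling division)."""
    return -(-genome_nt // read_nt)


def oversampling(n_reads, read_nt, genome_nt):
    """Total nucleotides sequenced divided by genome size (fold coverage)."""
    return n_reads * read_nt / genome_nt


print(min_reads(GENOME_NT, READ_NT))                # 200000
print(oversampling(1_000_000, READ_NT, GENOME_NT))  # 5.0
```

The 200 000 figure assumes reads that line up perfectly end to end. Shotgun reads fall at random, which is why oversampling is needed.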
Paint the wall
Study Questions 8 & 9: "oversampling"? "coverage"?
Shotgun sequencing ?
How long will this take?
Wall: 40" x 25"
Paintball: 1 sq "
1000 paintballs?
[Plot: completeness (0 to 1) versus oversampling (0 to 10)]
How much is painted with 1x oversampling?
What fraction won't be painted?
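One way to explore the question is a small simulation. This is a sketch under simplifying assumptions that are not on the slide: the wall is treated as 1000 grid cells of 1 sq inch, and each paintball lands on exactly one cell chosen uniformly at random.

```python
import random


def painted_fraction(n_cells, n_balls, rng):
    """Throw n_balls at random cells and return the fraction hit at least once."""
    hit = set()
    for _ in range(n_balls):
        hit.add(rng.randrange(n_cells))
    return len(hit) / n_cells


rng = random.Random(0)   # fixed seed so a rerun gives the same numbers
wall_cells = 40 * 25     # 1000 cells of 1 sq inch each
for fold in (1, 2, 5, 10):
    frac = painted_fraction(wall_cells, fold * wall_cells, rng)
    print(f"{fold}x oversampling: {frac:.3f} painted, {1 - frac:.3f} unpainted")
```

Under these assumptions, 1x oversampling leaves roughly a third of the wall bare, because many paintballs land on cells that are already painted.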
P(TT) = 1/2 x 1/2 = 1/4
Probability that two coins come up both tails
Rule of multiplication: intersection, independent
Gets T from first AND gets T from second
Intersection of possibilities (Rule of multiplication)
                      Second coin toss
                      H     T
First coin toss   H   HH    HT
                  T   TH    TT
P(at least 1 T) = 1/4 + 1/4 + 1/4
Probability that either of two coins comes up tails
1/2 x 1/2 = 1/4?
Gets HT or TH or TT
Union of possibilities (Rule of addition)
1/2 + 1/2 = 1?
Rule of addition: union, mutually exclusive
P(at least 1 T) = 1 - P(HH) = 1 - 1/4 = 3/4
Probability that either of two coins comes up tails = 1 - probability that neither comes up tails
Probability(2 T) = 1 - Probability(NOT 2 T)
Union of possibilities (Rule of complementation)
Rule of complementation: yin-yang, adds to 1
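The three rules can be checked by listing all four outcomes of two fair coin tosses. This is an illustrative sketch; the helper name `prob` is made up here.

```python
from fractions import Fraction
from itertools import product

# All equally likely outcomes of two fair coin tosses: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
p_each = Fraction(1, len(outcomes))


def prob(event):
    """Probability of an event, given as a true/false test on one outcome."""
    return sum(p_each for outcome in outcomes if event(outcome))


# Rule of multiplication: independent events, T on the first AND T on the second.
p_tt = Fraction(1, 2) * Fraction(1, 2)

# Rule of addition: mutually exclusive outcomes HT or TH or TT.
p_addition = (prob(lambda o: o == ("H", "T"))
              + prob(lambda o: o == ("T", "H"))
              + prob(lambda o: o == ("T", "T")))

# Rule of complementation: 1 minus the probability of no tails at all (HH).
p_complement = 1 - prob(lambda o: o == ("H", "H"))

print(p_tt)          # 1/4
print(p_addition)    # 3/4
print(p_complement)  # 3/4
```

Addition and complementation give the same answer, but complementation needs only one term. That matters when the number of outcomes is too large to list.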
Sequencing process: Drosophila genome (~100 million nt)
Focus on one nucleotide…
What’s the probability that it’s covered by one read?
What’s the probability that it’s covered by two reads?
What’s the probability that it’s covered by 200,000 reads?
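A sketch of the calculation, under assumptions not stated on the slide: reads are 500 nt, each read starts at a uniformly random position, reads are independent, and the ends of the genome are ignored. "Covered by N reads" is interpreted here as "covered by at least one of N reads".

```python
import math


def p_missed(genome_nt, read_nt, n_reads):
    """Probability that one given nucleotide is missed by every one of n_reads."""
    p_one = read_nt / genome_nt     # chance that a single read covers it
    return (1 - p_one) ** n_reads   # rule of multiplication over independent reads


GENOME_NT, READ_NT = 100_000_000, 500
for n in (1, 2, 200_000, 1_000_000):
    covered = 1 - p_missed(GENOME_NT, READ_NT, n)   # rule of complementation
    print(f"{n:>9} reads: P(covered) = {covered:.6f}")

# For comparison: at 1x oversampling the missed fraction is close to 1/e.
print(math.exp(-1))
```

Working through "missed by all reads" and then taking the complement avoids adding up every way the nucleotide could be covered.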
Problem Set 3, Problem 2: Statistics of mini-plasmid assembly
Why read pairs? Scaffolds?
Myers et al SQ6
[Figure: DNA, Contig 1, Contig 2; plasmid with insert (~2000 nt), a primer at each end, mates; G A T C gel; x 1000's]
[Figure: Bacterial Artificial Chromosome, ~150,000 nt, mates]
[Figure: P1-derived Artificial Chromosome]
SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:
a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."
b. ". . .trillions of overlaps between reads are examined."
c. ". . .to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."
Myers et al (2000)
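A sketch of the kind of arithmetic that statements (a) and (b) invite, using only the numbers quoted in the question. It assumes that "overlaps examined" can be bounded by the number of distinct read pairs. Whether the results agree with the paper's text and Table 1 still has to be checked against the paper.

```python
from math import comb

n_reads = 3_156_000        # "3.156 million reads" from statement (a)
total_bp = 1_760_000_000   # "1.76 Gbp of sequence" from statement (a)

# Statement (a): the mean read length implied by the two quoted numbers.
mean_read_bp = total_bp / n_reads
print(f"implied mean read length: {mean_read_bp:.0f} bp")

# Statement (b): the number of distinct pairs of reads, n * (n - 1) / 2.
n_pairs = comb(n_reads, 2)
print(f"distinct read pairs: {n_pairs:.3e}")
```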