Welcome to Introduction to Bioinformatics Wednesday, 10 February Genome Sequencing/Assembly • Genome sequencing/Assembly Click anywhere to go on to the next slide This demonstration is best viewed as a slide show, enabling you to simulate a session and make changes in cursor position more obvious. To do this, click Slide Show on the top tool bar, then View show.

Upload: claud-cross

Post on 16-Dec-2015


Page 1: Welcome to Introduction to Bioinformatics Wednesday, 10 February Genome Sequencing/Assembly Genome sequencing/Assembly Click anywhere to go on to the next

Welcome to Introduction to Bioinformatics

Wednesday, 10 February
Genome Sequencing/Assembly

• Genome sequencing/Assembly

Click anywhere to go on to the next slide

This demonstration is best viewed as a slide show, enabling you to simulate a session and make changes in cursor position more obvious. To do this, click Slide Show on the top tool bar, then View show.

Page 2
Page 3

What to do for summer vacation?

Page 4

Deadline, SUNday Feb 28!

Page 5

Target, Monday Mar 1!

Page 6

Deadline, ???

Page 7

Deadline, FRIday Feb 26!

Page 8

Global Viral Genome Project

Deadline, whenever!

Page 9

Learn more about…

HHMI: http://www.vcu.edu/csbc/hhmi/

BBSI: http://www.vcu.edu/csbc/bbsi/

VCU-USF: http://www.research.vcu.edu/vpr/fellowship.htm

GVGP: http://biobike.csbc.vcu.edu (News)

Page 10

What is the sequence (5' to 3') represented by the gel?

Myers et al SQ2

G A T C


Page 12

Dideoxy sequencing (= Sanger sequencing)

Pages 13-22

Dideoxy sequencing

Page 23

What is the sequence (5' to 3') represented by the gel? G A T C

Myers et al SQ2

Page 24

What is the sequence (5' to 3') represented by the gel? G A T C

[Figure labels: ddC, five occurrences]

TCGTGTACATCGTAACACGGTTAAGT

Myers et al SQ2
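The gel question can be worked through with a small sketch. It assumes the sequence shown on the slide is the newly synthesized strand, ignores the primer, and treats each band as one chain-terminated fragment whose length is its position on the gel. The function names `simulate_gel` and `read_gel` are made up for this illustration.

```python
def simulate_gel(new_strand):
    """Group fragment lengths by the dideoxynucleotide that ended each one."""
    lanes = {"G": [], "A": [], "T": [], "C": []}
    for length, base in enumerate(new_strand, start=1):
        # A fragment of this length ends with ddG, ddA, ddT, or ddC.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read bands from shortest fragment to longest, which gives 5' to 3'."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


strand = "TCGTGTACATCGTAACACGGTTAAGT"  # sequence shown on the slide
lanes = simulate_gel(strand)
print("Bands in the C lane (fragment lengths):", lanes["C"])
print("Sequence read from the gel:", read_gel(lanes))
```

The shortest fragments run farthest, so reading the bands from the bottom of the gel upward corresponds to sorting by fragment length here.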

Page 25

Sequencing process: Drosophila genome (~100 million nt)

Sequence it

Technical limitation

Reads limited to 100’s of nt

Page 26

Sequencing process: Drosophila genome (~100 million nt)

. . .

How many possible 500 nt fragments are there?

Page 27

Sequencing process: Drosophila genome (~100 million nt)

. . .

SAMPLE

Page 28

Sequencing process: Drosophila genome (~100 million nt)

SAMPLE

. . .

How many 500-nt samples are needed to cover 100 million nt?

100 000 000 / 500 = 200 000

Page 29

Sequencing process: Drosophila genome (~100 million nt)

SAMPLE

. . .

How many 500-nt samples are needed to cover 100 million nt?

Is this enough?

Oversampling … coverage?

1 000 000 samples = 5x oversampling
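The arithmetic behind these questions can be sketched as follows. The genome and read lengths are the round figures from the slides; the function names are made up for this illustration, and the count of possible fragments assumes a single linear molecule.

```python
GENOME_NT = 100_000_000  # ~100 million nt, as on the slide
READ_NT = 500            # read length used on the slide


def possible_fragments(genome_nt, read_nt):
    """Number of distinct start positions for a read on a linear molecule."""
    return genome_nt - read_nt + 1


def reads_for_one_pass(genome_nt, read_nt):
    """Reads needed if they could be laid end to end with no overlap."""
    return -(-genome_nt // read_nt)  # ceiling division


def oversampling(n_reads, read_nt, genome_nt):
    """Total nucleotides sequenced divided by genome length."""
    return n_reads * read_nt / genome_nt


print(possible_fragments(GENOME_NT, READ_NT))
print(reads_for_one_pass(GENOME_NT, READ_NT))
print(oversampling(1_000_000, READ_NT, GENOME_NT))
```

Laying reads end to end is not possible in practice because samples are taken at random, which is why the slides go on to ask about oversampling.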

Page 30

Paint the wall

Study Questions 8 & 9: "oversampling"? "coverage"?

Shotgun sequencing?

How long will this take?

Page 31

Paint the wall

How long will this take?

Study Questions 8 & 9: "oversampling"? "coverage"?

Shotgun sequencing?

Page 32

Paint the wall

How long will this take?

40"

25"

1 sq"

Study Questions 8 & 9: "oversampling"? "coverage"?

Shotgun sequencing?

Page 33

Paint the wall

How long will this take?

40"

25"

1000 paint balls?

Study Questions 8 & 9: "oversampling"? "coverage"?

Shotgun sequencing?

Page 34

[Plot: Completeness (0 to 1) versus Oversampling (0 to 10)]

How much is painted with 1x oversampling?

Study Questions 8 & 9: "oversampling"? "coverage"?

Shotgun sequencing?

What fraction won't be painted?
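One way to explore the paint-the-wall question is a small simulation. It is a sketch under simplifying assumptions that are not on the slides: the 40" by 25" wall is treated as a grid of 1000 one-square-inch cells, and each paint ball lands on exactly one cell chosen uniformly at random. The function name and the seed are arbitrary.

```python
import random


def unpainted_fraction(width=40, height=25, balls=1000, seed=0):
    """Throw paint balls at random grid cells; return the fraction left bare."""
    rng = random.Random(seed)
    painted = set()
    for _ in range(balls):
        painted.add((rng.randrange(width), rng.randrange(height)))
    return 1 - len(painted) / (width * height)


simulated = unpainted_fraction()
# Each cell is missed by one ball with probability 1 - 1/1000,
# and by all 1000 independent balls with that probability to the 1000th power.
expected = (1 - 1 / 1000) ** 1000

print(f"simulated unpainted fraction: {simulated:.3f}")
print(f"expected unpainted fraction:  {expected:.3f}")
```

With 1000 balls on 1000 cells (1x oversampling) the expected bare fraction is close to 1/e, so well over a third of the wall stays unpainted.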

Page 35

P(TT) = 1/2 x 1/2 = 1/4

Probability that two coins come up both tails

Rule of multiplication: intersection, independent

Gets T from first AND gets T from second

Intersection of possibilities (Rule of multiplication)

                     Second coin toss
                     H      T
First coin toss  H   HH     HT
                 T   TH     TT

Page 36

P(at least 1 T) = 1/4 + 1/4 + 1/4

Probability that either of two coins comes up tails

1/2 x 1/2 = 1/4?

Gets HT or TH or TT

Union of possibilities (Rule of addition)

                     Second coin toss
                     H      T
First coin toss  H   HH     HT
                 T   TH     TT

1/2 + 1/2 = 1?

Page 37

P(at least 1 T) = 1/4 + 1/4 + 1/4

Probability that either of two coins comes up tails

Gets HT or TH or TT

Union of possibilities (Rule of addition)

                     Second coin toss
                     H      T
First coin toss  H   HH     HT
                 T   TH     TT

Rule of addition: union, mutually exclusive

Page 38

P(at least 1 T) = 1 - 1/4

Probability that either of two coins does not come up tails

Probability(2 T) = 1 – Probability(NOT 2 T)

Union of possibilities (Rule of complementation)

                     Second coin toss
                     H      T
First coin toss  H   HH     HT
                 T   TH     TT

Rule of complementation: yin-yang, adds to 1
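The three rules can be checked against the four outcomes in the coin-toss table. The sketch below enumerates the outcomes and counts them; the helper name `prob` is made up for this illustration, and the coins are assumed fair and independent.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event, counted over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


p_tt = prob(lambda o: o == ("T", "T"))
p_hh = prob(lambda o: o == ("H", "H"))
p_at_least_one_t = prob(lambda o: "T" in o)

# Rule of multiplication: independent events, intersection.
print(p_tt, "=", Fraction(1, 2) * Fraction(1, 2))
# Rule of addition: mutually exclusive outcomes HT, TH, TT, union.
print(p_at_least_one_t, "=", 3 * Fraction(1, 4))
# Rule of complementation: an event and its opposite add to 1.
print(p_at_least_one_t, "=", 1 - p_hh)
```

Adding 1/2 + 1/2 for "tails on the first or tails on the second" overcounts TT, because those two events are not mutually exclusive.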

Page 39

Sequencing process: Drosophila genome (~100 million nt)

. . .

Focus on one nucleotide…

What’s the probability that it’s covered by one read?

What’s the probability that it’s covered by two reads?

What’s the probability that it’s covered by 200,000 reads?
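These three questions can be approached with the rules of multiplication and complementation. The sketch below assumes that reads start at uniformly random, independent positions, ignores the ends of the molecule, and takes the chance that one 500-nt read covers a given nucleotide to be 500 / 100 000 000. The function name `p_covered` is made up for this illustration.

```python
import math

GENOME_NT = 100_000_000  # ~100 million nt, as on the slide
READ_NT = 500            # assumed read length, as in the earlier slides


def p_covered(n_reads, read_nt=READ_NT, genome_nt=GENOME_NT):
    """Probability that a given nucleotide is covered by at least one read."""
    p_one = read_nt / genome_nt        # one read covers the nucleotide
    p_missed_by_all = (1 - p_one) ** n_reads  # rule of multiplication
    return 1 - p_missed_by_all         # rule of complementation


for n in (1, 2, 200_000):
    print(f"{n:>7} reads: P(covered) = {p_covered(n):.6g}")

print(f"1 - 1/e = {1 - math.exp(-1):.6g}")
```

At 200,000 reads (1x oversampling) the result is close to 1 - 1/e, which matches the paint-the-wall picture: about 63% covered and about 37% missed.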

Page 40

Problem Set 3, Problem 2: Statistics of mini-plasmid assembly

Page 41

Why read pairs? Scaffolds?

DNA

Myers et al SQ6

Contig 1 Contig 2

Page 42

G A T C

primer

primer

x 1000's

plasmid

insert

~2000 nt mates

Myers et al SQ6

Why read pairs? Scaffolds?

Page 43

. . .

~ 150,000 nt

Bacterial Artificial CHROMOSOME

mates

Myers et al SQ6

Why read pairs? Scaffolds?

P1-derived Artificial CHROMOSOME

Page 44

Myers et al SQ6

Why read pairs? Scaffolds?
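One way to think about the read-pair question is to count the mate pairs whose two reads land in different contigs. The sketch below is purely illustrative: the read names, contig names, and placements are invented, and real assemblers also use the orientation of each read and the expected insert length, which this toy version leaves out.

```python
from collections import Counter

# Hypothetical placements: read name -> contig it was assembled into.
read_to_contig = {
    "r1": "contig1", "r2": "contig2",
    "r3": "contig1", "r4": "contig2",
    "r5": "contig1", "r6": "contig1",
}

# Hypothetical mate pairs: reads from the two ends of the same insert.
mates = [("r1", "r2"), ("r3", "r4"), ("r5", "r6")]


def contig_links(mates, read_to_contig):
    """Count mate pairs that bridge two different contigs."""
    links = Counter()
    for left, right in mates:
        a, b = read_to_contig[left], read_to_contig[right]
        if a != b:
            # Store the pair in a fixed order so A-B and B-A count together.
            links[tuple(sorted((a, b)))] += 1
    return links


links = contig_links(mates, read_to_contig)
for (a, b), count in links.items():
    print(f"{a} and {b} are linked by {count} mate pairs")
```

Contigs joined by such links can be ordered into a scaffold even when the sequence in the gap between them is unknown.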

Page 45

SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:
a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."
b. ". . .trillions of overlaps between reads are examined."
c. ". . .to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."

Myers et al (2000)
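Some of the checking in SQ14 is plain arithmetic on the quoted figures. The sketch below uses only the numbers that appear in the three statements; Table 1 is not reproduced in these slides, so comparing against it is left to the reader. It assumes that each "mate" is a pair of two reads and that every read could in principle be compared with every other read.

```python
READS = 3_156_000         # statement a: 3.156 million reads
TOTAL_BP = 1_760_000_000  # statement a: 1.76 Gbp of sequence
MATES_2KBP = 654_000      # statement c
MATES_10KBP = 497_000     # statement c

# a. Average read length implied by the two figures.
mean_read_bp = TOTAL_BP / READS

# b. Number of distinct pairs of reads that could be compared, n(n-1)/2.
read_pairs_to_compare = READS * (READS - 1) // 2

# c. Reads accounted for by mates, assuming two reads per mate pair.
reads_in_mates = 2 * (MATES_2KBP + MATES_10KBP)

print(f"mean read length: {mean_read_bp:.0f} bp")
print(f"possible read-read comparisons: {read_pairs_to_compare:.3e}")
print(f"reads in mate pairs: {reads_in_mates:,} of {READS:,}")
```

The number of possible comparisons is in the trillions, which is at least consistent in scale with statement b.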