solid technology for high-througput

34
Gennady V. Vasiliev SOLiD technology for high-througput sequencing” Institute of Cytology and Genetics SB RAS

Upload: others

Post on 11-Dec-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Gennady V. Vasiliev

“SOLiD technology for high-througput

sequencing”

Institute of Cytology and Genetics SB RAS

First Generation of sequencers – Sanger based

96-capillary ABI 3730XL

Second Generation of sequencers – MPSS

454-FLX

Titanium

Illumina/

Solexa SOLiD 4

Read length (bp) 240–400 2x100 2 x 50

Human genome

resequencing cost

($)

1 000 000 20 000 5 000 - 10 000

High throughput models Medium throughput models

454 Junior Ion Torrent

240–400 100-200

N/A N/A

Third Generation of sequencers – Single molecular

Helicos

tSMS PacBio SMRT Complete Genomics

Read length (bp) 30 Up to 1 000 2 x 35

Human genome

resequencing cost ($) 70 000

Less than

10 000 5000

SOLiD

SOLiD 4 SOLiD 5500

SOLiD 4 and SOLiD 5

SOLiD 4 SOLiD 4hg SOLiD 5500 SOLiD 5500xl

Throughput

per run

Up to 100 GB

(1 hg, 30x)

Up to 300 GB

(3 hg, 30x)

Up to 90 GB

(1 hg, 30x)

Up to 180 GB

(2 hg, 30x)

Samples number Up to 8 per slide,

2 slide

Up to 4 per slide,

2 slide

1–6

(1 FlowChip)

1–12

(2 FlowChips)

Multiplexing 96 DNA and 48

RNA barcodes

- 96 barcodes for

RNA and DNA

96 barcodes for RNA

and DNA

Maximum read

lengths

2 x 50 bp (Mate-

Paired)

50 bp x 25 bp

(Paired-End)

2 x 50 bp (Mate-

Paired)

50 bp x 25 bp

(Paired-End)

2 x 60 bp (Mate-

Paired)

75 bp x 35 bp

(Paired-End)

2 x 60 bp (Mate-

Paired)

75 bp x 35 bp

(Paired-End)

Step 1: Library preparation

Basic 2 × 60 bp mate-paired library preparation workflow

T7 Exonuclease and S1

Nuclease digestion

Major types of SOLiD sequencing runs

mRNA preparation for transcriptome experiment

RiboMinus™ Concentration Module (Invitrogen)

PureLink™ RNA Micro Kit (Invitrogen)

Library preparation for transcriptome experiment

Library preparation for transcriptome experiment

Library preparation for transcriptome experiment

Step 2: E-PCR

SOLiD: результат ПЦР в эмульсии

1.6 x 109

150 – 300 x 109

P1-coupled beads

2) Polymerase extends from P1

1) Template Anneals to P1

3) Complementary sequence is

extended off bead surface

4) Template disassociates

E-PCR

E-PCR

Step 3: Bead enrichment

P1 P2

P2’

P2 P1

Large (5µ)

Polystyrene

bead

Centrifuge in

glycerol gradient Supernatant

Captured beads with templates

Pellet

Beads with no template

Step 4: 3′-end modification and beads deposition

Beads attached to glass

surface in a random array

SOLiD

2357 panel per slide (SOLiD 4)

The targeted bead deposition density is 300,000

P2-positive beads per panel, 708 million per slide

SOLiD: Probe

2 Base Pair Encoding

Using 4 Dyes

A C G T

A

C

G

T

2nd Base

1st B

as

e

Red-probe

5’

n n n A T z z z

3’

Blue-probe

5’

n n n T T z z z

3’

On our probes the 1st base encoded is

position 4

the 2nd base encoded is position 5

SOLiD

SOLiD

SOLiD

1µm

bead

P1 Adapter P2 Adapter Internal Adapter

25 base

Tag #1

25 base

Tag #2

SOLiD read

A C G G T C G T C G T G T G C G T

A C G G T C G T C G T G T G C G T

A C G T A C G T

2nd Base

1st B

as

e

SNP Insertion / Deletion

SOLiD: SNP vs sequencing error

Reference SNP 2 colors change

98% raw base accuracy

99,99% consensus accuracy (~ 20x coverage) SNP

error

SOLiD: SNP and sequencing errors

SOLiD usage

• Whole genome (eucariota or bacteria) resequencing

•Pathogen evolution study

• SNP discovery / associations studies

• Exome resequencing (using Agilent SureSelect tecnology)

• Whole transcriptome analysis

• Micro-RNA analysis

• Genome structural variation analysis / cancer genome study

• ChIP-Seq analysis

•Methylation analysis

SOLiD shortcomings

• Unusable for de novo sequencing

• Some reads are artificial – polyclonal beads result

•Two PCR reactions produce deviation from random reads distribution

• Multi-step sample preparation produce artifacts

Real life differs from advertising pictures

Gaps in sequence up 5% of the reference

Coverage by uniquely

placing single reads

• Duplicated rRNA

genes: rrsE & rrIE

• No coverage by

unique single reads

• Mapped with mate-

paired reads

Coverage by mate-

paired reads

Deviation from Poisson for MPSS reads

Bad library Usable library

Multi-step sample preparation produce artifacts

Some reads are artificial – polyclonal beads result

Data quality drops with read lenth