solid technology for high-througput
TRANSCRIPT
Gennady V. Vasiliev
“SOLiD technology for high-througput
sequencing”
Institute of Cytology and Genetics SB RAS
Second Generation of sequencers – MPSS
454-FLX
Titanium
Illumina/
Solexa SOLiD 4
Read length (bp) 240–400 2x100 2 x 50
Human genome
resequencing cost
($)
1 000 000 20 000 5 000 - 10 000
High throughput models Medium throughput models
454 Junior Ion Torrent
240–400 100-200
N/A N/A
Third Generation of sequencers – Single molecular
Helicos
tSMS PacBio SMRT Complete Genomics
Read length (bp) 30 Up to 1 000 2 x 35
Human genome
resequencing cost ($) 70 000
Less than
10 000 5000
SOLiD 4 and SOLiD 5
SOLiD 4 SOLiD 4hg SOLiD 5500 SOLiD 5500xl
Throughput
per run
Up to 100 GB
(1 hg, 30x)
Up to 300 GB
(3 hg, 30x)
Up to 90 GB
(1 hg, 30x)
Up to 180 GB
(2 hg, 30x)
Samples number Up to 8 per slide,
2 slide
Up to 4 per slide,
2 slide
1–6
(1 FlowChip)
1–12
(2 FlowChips)
Multiplexing 96 DNA and 48
RNA barcodes
- 96 barcodes for
RNA and DNA
96 barcodes for RNA
and DNA
Maximum read
lengths
2 x 50 bp (Mate-
Paired)
50 bp x 25 bp
(Paired-End)
2 x 50 bp (Mate-
Paired)
50 bp x 25 bp
(Paired-End)
2 x 60 bp (Mate-
Paired)
75 bp x 35 bp
(Paired-End)
2 x 60 bp (Mate-
Paired)
75 bp x 35 bp
(Paired-End)
mRNA preparation for transcriptome experiment
RiboMinus™ Concentration Module (Invitrogen)
PureLink™ RNA Micro Kit (Invitrogen)
P1-coupled beads
2) Polymerase extends from P1
1) Template Anneals to P1
3) Complementary sequence is
extended off bead surface
4) Template disassociates
E-PCR
Step 3: Bead enrichment
P1 P2
P2’
P2 P1
Large (5µ)
Polystyrene
bead
Centrifuge in
glycerol gradient Supernatant
Captured beads with templates
Pellet
Beads with no template
SOLiD
2357 panel per slide (SOLiD 4)
The targeted bead deposition density is 300,000
P2-positive beads per panel, 708 million per slide
SOLiD: Probe
2 Base Pair Encoding
Using 4 Dyes
A C G T
A
C
G
T
2nd Base
1st B
as
e
Red-probe
5’
n n n A T z z z
3’
Blue-probe
5’
n n n T T z z z
3’
On our probes the 1st base encoded is
position 4
the 2nd base encoded is position 5
SOLiD read
A C G G T C G T C G T G T G C G T
A C G G T C G T C G T G T G C G T
A C G T A C G T
2nd Base
1st B
as
e
SNP Insertion / Deletion
SOLiD: SNP vs sequencing error
Reference SNP 2 colors change
98% raw base accuracy
99,99% consensus accuracy (~ 20x coverage) SNP
error
SOLiD usage
• Whole genome (eucariota or bacteria) resequencing
•Pathogen evolution study
• SNP discovery / associations studies
• Exome resequencing (using Agilent SureSelect tecnology)
• Whole transcriptome analysis
• Micro-RNA analysis
• Genome structural variation analysis / cancer genome study
• ChIP-Seq analysis
•Methylation analysis
SOLiD shortcomings
• Unusable for de novo sequencing
• Some reads are artificial – polyclonal beads result
•Two PCR reactions produce deviation from random reads distribution
• Multi-step sample preparation produce artifacts
Real life differs from advertising pictures
Gaps in sequence up 5% of the reference
Coverage by uniquely
placing single reads
• Duplicated rRNA
genes: rrsE & rrIE
• No coverage by
unique single reads
• Mapped with mate-
paired reads
Coverage by mate-
paired reads
Bad library Usable library
Multi-step sample preparation produce artifacts
Some reads are artificial – polyclonal beads result