A Novel Application of Pacific Biosciences SMRT Technology
Steven T. Lott, PhD, MB(ASCP)CR
PACIFIC BIOSCIENCES™ CONFIDENTIAL
Agenda for Today
Technology Overview
Technology Applications
DNA Polymerase as a Sequencing Engine
ZMW with DNA polymerase
ZMW with DNA polymerase and phospholinked nucleotides
A New Concept for Labeled Nucleotides
Base-linked (2nd Generation)
• Fluorophore stays in DNA
• Inhibits enzyme
• Creates background light
Phospholinked (PacBio)
• Fluor naturally cleaved off by polymerase (cleavage by DNA polymerase)
• DNA synthesized is uninterrupted
• Eliminates steric hindrance and noise
Processive Synthesis with Phospholinked Nucleotides
Step 1: Fluorescent phospholinked labeled nucleotides are introduced into the ZMW.
Step 2: The base being incorporated is held in the detection volume for tens of milliseconds, producing a bright flash of light.
Step 3: The phosphate chain is cleaved, releasing the attached dye molecule.
Step 4-5: The process repeats.
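The steps above can be pictured as a toy simulation: each incorporation holds a dye in the detection volume long enough to produce a pulse in one color channel, and the base is read from the channel that lights up. This is a minimal sketch only. The channel-to-base mapping, the threshold, the frame values, and the name `call_bases` are all assumptions made for illustration, and this is not PacBio's pulse-calling method.

```python
# Toy model: each frame is (channel, intensity). All values here are invented.
# Assumed mapping from color channel to base; the real dye assignment may differ.
CHANNEL_TO_BASE = {0: "A", 1: "C", 2: "G", 3: "T"}

def call_bases(frames, threshold=100.0):
    """Emit one base per contiguous run of above-threshold frames in a channel."""
    calls = []
    active = None  # channel of the pulse in progress, or None when dark
    for channel, intensity in frames:
        lit = channel if intensity >= threshold else None
        if lit is not None and lit != active:
            calls.append(CHANNEL_TO_BASE[lit])  # a new pulse has started
        active = lit
    return "".join(calls)

# Three pulses separated by dark frames: G (3 frames), A (2 frames), G (1 frame).
frames = [(2, 300), (2, 310), (2, 290), (0, 5), (0, 250), (0, 240), (1, 3), (2, 280)]
print(call_bases(frames))  # GAG
```

Note that this toy version merges two back-to-back pulses in the same channel when no dark frame separates them, which is one reason real pulse calling is considerably more involved.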
Real-Time Detection of ZMWs in a Multiplex Array
• 75,000 ZMWs monitored simultaneously
• Tens of thousands of polymerase molecules, immobilized at the bottom of the ZMWs, monitored*
• ≥1 base incorporated per second
Building of the SMRTbell™
Sample Preparation Workflow
Sample Preparation
DNA Sample
Fragment DNA
Repair Ends
Ligate Adapters
Purify DNA
Binding
Sample Preparation Workflow
Input samples: targeted enrichment products, WGA, cDNA, genomic DNA, FFPE (< 600 bp)
Applications:
• Cancer genomics
• Gene expression, transcriptome profiling
• Resequencing, de novo sequencing, metagenomics
• Targeted sequencing, resequencing
Workflow: DNA Sample, Fragment DNA, Repair Ends, Ligate Adapters, Purify DNA, Binding
Single Molecule Prep Keeps Bias Low
SMRT
2G
Flexibility of Sequencing with Multiple Protocols
Sample Preparation, then Sequence
Standard
• Large insert sizes
• Generates one pass on each molecule sequenced
Strobe
• Very large insert sizes
• Generates distributed reads on each molecule sequenced
Circular Consensus
• Small insert sizes
• Generates multiple passes on each molecule sequenced
Agenda for Today
Technology Overview
SMRT Applications
SMRT technology will enable comprehensive profiling by providing real-time measurements
DNA
RNA
PROTEIN
animations by: wehi.edu.au
One of the first applications to leverage the power of SMRT was
the real-time disease weather map
Fast Time to Result
In collaboration with NYDOH and JCVI
The capability of SMRT™ sequencing to sequence the same molecule repeatedly leads to unprecedented accuracy
Example: Influenza, H1N1 (A/New York/1682/2009), Fragment 7 (1,027 bp)
3 consecutive subreads (2.7 kb, 4.1 kb, 5.2 kb): 12 kb combined read (4+ complete SMRTbell laps)
Full-length genomic segment consensus sequence: 100% accurate
[Figure: example section]
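Why repeated passes over the same molecule raise accuracy can be illustrated with a per-position majority vote. This is a minimal sketch, assuming the passes are already aligned, equal in length, and contain only substitution errors. The reads and the name `majority_consensus` are invented for illustration, and real circular consensus calling must also handle insertions and deletions.

```python
from collections import Counter

def majority_consensus(passes):
    """Per-column majority vote over equal-length, pre-aligned passes."""
    if not passes:
        return ""
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to the same length")
    consensus = []
    for column in zip(*passes):
        # most_common(1) returns the most frequent base in this column
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)

# Invented passes over one molecule; each error appears in a single pass only.
passes = [
    "ACGTTGCA",
    "ACGATGCA",  # substitution at position 3
    "ACGTTGCA",
    "TCGTTGCA",  # substitution at position 0
    "ACGTTGGA",  # substitution at position 6
]
print(majority_consensus(passes))  # ACGTTGCA
```

Because the errors fall at different positions in different passes, each one is outvoted, which is the intuition behind using more laps to get a more accurate consensus.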
Detection of Viral Pathogens from Inanimate Surfaces
• High-traffic surfaces at Pacific Biosciences:
– Front door handle
– Common laboratory bench top
– Break room refrigerator door handle
– Slide projector remote control
– Lavatory toilet flush handle
– Lavatory door handle
– Laboratory telephone handle
– Cubicle desk surface
– Money
Sampled every week for a period of one month
Pacific Biosciences Volunteers
Anonymous donors, submitting nasopharyngeal swabs every two weeks over ~2.5 months
Detection of Viral Pathogens – Influenza on Surfaces
• Multiple strains identified
– H1N1
– H3N2
– H2N2, H5N1 (rarely)
• Some samples with multiple strains present
• Some strains only occurred over single sampling period
– H5N1 – 1
– H2N2 – 2
[Figure: influenza strains detected per surface sample across sampling periods 1 to 4. Surface samples: projector remote 1, projector remote 2, restroom door 1, restroom door 2, front door 1, toilet flush handle 1, desk 2, desk 3, desk 8, fridge door 2, $1* (two samples). Strains shown: H1N1, H3N2, H2N2, H5N1, with H1N1/H3N2 the most frequent combination.]
Rapid identification of the Haiti cholera outbreak strain
World Health Organization
Typical Cholera Outbreak Patterns (Given by WHO in 2009)
No outbreaks seen in nearly 100 years
Timeline
Initiating the project (6-7 Nov)
• 6 Nov: Matt calls
• 7 Nov: PacBio decides to go
Sample prep and sequencing (8-12 Nov)
• 8 Nov: Matt’s group cultures samples
• 10 Nov: Matt’s group sends DNA
• 11 Nov: Sequencing begins
• 12 Nov: Sequencing of 5 genomes complete
Analysis (13-15 Nov)
• Data QC
• Assembly and variation detection
• Building phylogenetic trees
• Annotating structural variation regions
Writing the paper (15-19 Nov)
• PacBio first draft (16 Nov)
• Refined with editor input (18 Nov)
• Submitted to NEJM (19 Nov)
Provisional acceptance (20-24 Nov)
• Reviews received 22 Nov
• Decision indicating intent to publish 24 Nov
Formal acceptance (25 Nov – 1 Dec)
• Multiple revisions with editor (25-30 Nov)
• Official acceptance (1 Dec)
Paper Published!! (9 Dec)
Single nucleotide variations unambiguously positioned the Haitian cholera strain next to South Asia strains
The Haitian strain groups solidly within the South Asia group, NOT the Latin America group
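Placement by single nucleotide variations can be sketched as counting mismatches between aligned sequences and picking the reference with the fewest differences. The sequences and labels below are invented placeholders, not data from the study, and `snv_distance` and `closest_reference` are hypothetical helper names. The analysis in the timeline built phylogenetic trees, which is a richer method than this nearest-neighbor shortcut.

```python
def snv_distance(a, b):
    """Number of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def closest_reference(query, references):
    """Return (label, distance) for the reference with the fewest SNVs."""
    return min(
        ((label, snv_distance(query, seq)) for label, seq in references.items()),
        key=lambda item: item[1],
    )

# Invented toy sequences; the labels are placeholders, not real strain groups.
references = {
    "reference_group_1": "ACGTACGTAC",
    "reference_group_2": "ACGTTCGAAC",
}
query = "ACGTTCGAAT"
print(closest_reference(query, references))  # ('reference_group_2', 1)
```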
What is the functional relevance of a complete genome: Resolving the cholera toxin region
Cholera toxin prophage
• Adjacency to other elements defines abilities such as producing virions
• Knowing the order of these elements in this region is critical to understanding infectivity, virulence, and other parameters related to the public health threat
Biological Hydrogen Production
• A potential transportation fuel for the future
• At present a major commodity chemical
– Ammonia production
– Hydrocracking of crude oil
2008 market for hydrogen = 10 billion kg.
2020 projected market for hydrogen = 20 billion kg.
An appealing aspect of hydrogen is that it can be produced by many different routes. Today, 98% of the world’s hydrogen comes from fossil fuels.
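The two market figures on this slide imply a growth rate that is easy to work out: a doubling from 10 to 20 billion kg over the 12 years from 2008 to 2020. A short sketch, assuming steady compound annual growth, which the slide itself does not state:

```python
def compound_annual_growth(start, end, years):
    """Constant yearly growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1.0 / years) - 1.0

# Figures from the slide: 10 billion kg (2008) and 20 billion kg (2020, projected).
rate = compound_annual_growth(start=10e9, end=20e9, years=2020 - 2008)
print(f"{rate:.1%}")  # 5.9%
```

Under that assumption, the projection corresponds to roughly 6% growth per year.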
R. palustris as an Efficient Path to Hydrogen Production
[Figure: Rhodopseudomonas (image from ORNL), with photosynthesis and carbon metabolism labeled]
• Hydrogen production involves hundreds of proteins in a web of molecular interactions
• We are seeking to understand this network to enhance hydrogen production
Characterize a population of strains (125 isolates from the wild) of Rhodopseudomonas
Whole-genome sequencing: To reveal genetic variation between strains. Will serve as a scaffold on which to map transcriptional activity.
Digital, sequence-based transcriptome profiling: To reveal transcriptional output of strains grown under different hydrogen-producing conditions.
Measure H2 and nitrogenase activities: To reveal physiological potential of strains.
INTEGRATE!!! → Probabilistic causal networks that drive production of hydrogen
Collaboration with Carrie Harwood/UW: Experimentally validate computationally derived networks
Beyond the long reads and flexible configurations for a run lies the time dimension: Getting back to real time all the time
[Figure: two fluorescence traces, fluorescence intensity (a.u., 0 to 400) vs. time (s), covering 70.5 to 74.5 s and 104.5 to 108.5 s, annotated with base calls including a methylated adenine (mA)]
• Kinetic variation at a given site can be observed directly across the multiple reads covering that site
• In R. pal. and other alpha-proteobacteria, the A residue in a GATC context can be
methylated by the DNA adenine methylase enzyme (for gene regulation)
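To make the GATC point concrete, the sketch below lists the position of the A in every GATC motif of a sequence and compares the mean time between incorporations at one site in a sample against a control. The sequence, the timing values, and the names `gatc_adenine_positions` and `rate_ratio` are illustrative assumptions; the slides do not specify how the ratio of rates is computed.

```python
from statistics import mean

def gatc_adenine_positions(seq):
    """0-based positions of the A in each GATC occurrence in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find("GATC")
    while start != -1:
        positions.append(start + 1)  # the A is the second base of GATC
        start = seq.find("GATC", start + 1)
    return positions

def rate_ratio(sample_durations, control_durations):
    """Ratio of mean inter-incorporation durations (assumed definition)."""
    return mean(sample_durations) / mean(control_durations)

# Invented sequence with two GATC motifs.
print(gatc_adenine_positions("TTGATCGGATCA"))  # [3, 8]

# Invented durations in seconds across reads covering one site.
print(rate_ratio([0.5, 1.5], [0.25, 0.25]))  # 4.0
```

A ratio well above 1 at a GATC adenine, seen consistently across reads, is the kind of signal that would flag a candidate methylated site under these assumptions.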
DNA
RNA
PROTEIN
Beyond Long Reads and the A’s, C’s, G’s, and T’s by Detecting Kinetic Variation Regions that Affect Phenotype
Analyze R. pal seq data for kinetic variation; R. pal genome covered by many reads
[Figure: fluorescence intensity (a.u.) vs. time (s) traces, covering 70.5 to 74.5 s and 104.5 to 108.5 s]
[Figure: ratio of rates vs. genome coordinate over a 4000-base window, contrasting regions detected with little variation vs. regions with a lot of variation]
Genes associated with kinetic variation that in turn associate with hydrogen production
[Figure: ratio of rates vs. genome coordinate over 4000-base and 1500-base windows, showing regions detected with little variation and regions detected with a lot of variation]
Results:
• Thousands of sites detected with significant variation (1% FDR)
• AT residues in GATC context were > 12-fold enriched for kinetic variation!
• Regions of increased and decreased variation found
• Even without validation, these data can be used as a covariate in QTL analyses (i.e., is a kinetic variation region associated with H2 production?)
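The fold-enrichment result above can be read as a ratio of two fractions: the share of sites in the GATC context that show significant kinetic variation, divided by the share among all other sites. The counts below are invented purely to show the arithmetic and are not the study's numbers; `fold_enrichment` is a hypothetical helper, and the slides do not say exactly how the enrichment was computed.

```python
def fold_enrichment(hits_in_context, total_in_context, hits_elsewhere, total_elsewhere):
    """Hit rate inside a sequence context divided by the background hit rate."""
    if total_in_context <= 0 or total_elsewhere <= 0:
        raise ValueError("totals must be positive")
    if hits_elsewhere <= 0:
        raise ValueError("background hit count must be positive")
    in_rate = hits_in_context / total_in_context
    background_rate = hits_elsewhere / total_elsewhere
    return in_rate / background_rate

# Invented counts: 300 of 1,000 in-context sites vs. 2,000 of 100,000 other sites.
enrichment = fold_enrichment(300, 1_000, 2_000, 100_000)
print(round(enrichment, 2))
```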
[Figure label: nitrogenase genes]
An Entire Network Identified that Affects Nitrogenase Activity
and then Hydrogen Production as a Result
Structural differences → differences in energy production
[Figure: expression level of key H2 network drivers (log ratio, -1.0 to 0.5) by structural QTL genotype in the H2 network of genes; labels: BIS3 genotype, DX-1 genotype, big H2 output, small H2 output]
[Figure: portion of the R. pal genome, comparing the structure of the DX-1 strain with the structure of the BIS3 strain]
Structural differences emerge in comparing these strains; structural differences drive the hydrogen network
If we want to succeed at this kind of game, we need the power of SMRT to provide a more accurate picture
• Long reads to assemble de novo genomes (not using whole genome sequencing for SNP detection)
• Long reads to fully characterize the transcriptome (fusion products, etc.)
• Kinetic information to inform on the full epigenome
• Circular consensus to get at the most highly accurate base calls possible
• Quick time to results for maximal impact