A Novel Application of Pacific Biosciences SMRT Technology

Steven T. Lott, PhD, MB(ASCP)CR

PACIFIC BIOSCIENCES™ CONFIDENTIAL

Agenda for Today

Technology Overview

Technology Applications


DNA Polymerase as a Sequencing Engine

ZMW with DNA polymerase

ZMW with DNA polymerase and phospholinked nucleotides


A New Concept for Labeled Nucleotides

Base-linked (2nd Generation)

• Fluorophore stays in DNA

• Inhibits enzyme

• Creates background light

Phospholinked (PacBio)

• Fluor naturally cleaved off by polymerase (cleavage by DNA polymerase)

• DNA synthesized is uninterrupted

• Eliminates steric hindrance and noise


Processive Synthesis with Phospholinked Nucleotides

Step 1: Fluorescent phospholinked labeled nucleotides are introduced into the ZMW.

Step 2: The base being incorporated is held in the detection volume for tens of milliseconds, producing a bright flash of light.

Step 3: The phosphate chain is cleaved, releasing the attached dye molecule.

Step 4-5: The process repeats.


Real-Time Detection of ZMWs in a Multiplex Array

• 75,000 ZMWs monitored simultaneously

• Tens of thousands of polymerase molecules, immobilized at the bottom of the ZMWs, monitored at once*

• ≥1 base incorporated per second
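The figures on this slide imply a simple lower bound on aggregate throughput. The sketch below works through that arithmetic in Python. The 75,000 ZMWs and the rate of at least 1 base per second come from the slide; the fraction of ZMWs holding an active polymerase (ACTIVE_FRACTION) is a hypothetical value chosen only to be consistent with "tens of thousands", and the function name is illustrative.

```python
# Rough lower bound on aggregate throughput for one multiplex array.
# From the slide: 75,000 ZMWs, >= 1 base incorporated per second.
# ACTIVE_FRACTION is an assumption for illustration, not a slide figure.

ZMW_COUNT = 75_000
MIN_BASES_PER_SECOND = 1.0
ACTIVE_FRACTION = 1 / 3  # hypothetical share of ZMWs with an active polymerase


def aggregate_bases(zmw_count, active_fraction, bases_per_second, seconds):
    """Bases read across all productive ZMWs over `seconds` of acquisition."""
    active_zmws = zmw_count * active_fraction
    return active_zmws * bases_per_second * seconds


if __name__ == "__main__":
    per_minute = aggregate_bases(
        ZMW_COUNT, ACTIVE_FRACTION, MIN_BASES_PER_SECOND, 60
    )
    print(f"At least {per_minute:,.0f} bases per minute")
```

Because the per-ZMW rate is stated as a minimum, the result should be read as a floor under the stated assumption, not as a measured yield.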


Building of the SMRTbell™

Sample Preparation Workflow

Sample Preparation

DNA Sample

Fragment DNA

Repair Ends

Ligate Adapters

Purify DNA

Binding


Sample Preparation Workflow: Input Types and Applications

Input sample types: targeted enrichment products, WGA, cDNA, genomic DNA, FFPE (< 600 bp)

Applications: cancer genomics; gene expression and transcriptome profiling; resequencing, de novo sequencing, and metagenomics; targeted sequencing and resequencing

Workflow: DNA Sample → Fragment DNA → Repair Ends → Ligate Adapters → Purify DNA → Binding


Single Molecule Prep Keeps Bias Low

Figure: SMRT vs. 2G workflows, with rows for Sample Preparation and Sequence

Flexibility of Sequencing with Multiple Protocols

Standard

• Large insert sizes

• Generates one pass on each molecule sequenced

Strobe

• Very large insert sizes

• Generates distributed reads on each molecule sequenced

Circular Consensus

• Small insert sizes

• Generates multiple passes on each molecule sequenced


Agenda for Today

Technology Overview

SMRT Applications


SMRT technology will enable comprehensive profiling by providing real-time measurements

DNA

RNA

PROTEIN

animations by: wehi.edu.au


One of the first applications to leverage the power of SMRT was the real-time disease weather map


Fast Time to Result

In collaboration with NYDOH and JCVI


The capability of SMRT™ sequencing to sequence the same molecule repeatedly leads to unprecedented accuracy

Example: Influenza, H1N1 (A/New York/1682/2009), Fragment 7 (1,027 bp)

3 consecutive subreads (2.7 kb, 4.1 kb, 5.2 kb): 12 kb combined read (4+ complete SMRTbell laps)

Full-length genomic segment consensus sequence: 100% accurate

Example section: (figure not reproduced)
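As a rough illustration of why repeated passes over the same molecule improve accuracy, the sketch below takes a column-wise majority vote over several passes. It assumes the passes are already aligned and of equal length; the toy sequences are invented for illustration, and a production consensus caller would use proper alignment and a probabilistic model rather than a plain vote.

```python
from collections import Counter


def majority_consensus(passes):
    """Column-wise majority vote over equal-length, pre-aligned passes.

    On a tie, the base seen first in that column wins.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be pre-aligned to the same length")
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Toy, invented passes over one 8-base template.
# Three of the five passes carry a single substitution error.
passes = ["GATTACAG", "GATCACAG", "GATTACAG", "GTTTACAG", "GATTACAT"]
print(majority_consensus(passes))
```

Since the errors fall at different positions in different passes, each column is still dominated by the correct base, which is the intuition behind circular consensus.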


Detection of Viral Pathogens from Inanimate Surfaces

• High-traffic surfaces at Pacific Biosciences:

– Front door handle

– Common laboratory bench top

– Break room refrigerator door handle

– Slide projector remote control

– Lavatory toilet flush handle

– Lavatory door handle

– Laboratory telephone handle

– Cubicle desk surface

– Money

Sampled every week for a period of one month


Pacific Biosciences Volunteers

Anonymous donors, submitting nasopharyngeal swabs every two weeks over ~2.5 months


Detection of Viral Pathogens – Influenza on Surfaces

• Multiple strains identified

– H1N1

– H3N2

– H2N2, H5N1 (rarely)

• Some samples with multiple strains present

• Some strains occurred only in a single sampling period

– H5N1 – 1

– H2N2 – 2

Figure: influenza strains detected on each surface across sampling periods 1 to 4

Surfaces shown: projector remote 1, projector remote 2, restroom door 1, restroom door 2, front door 1, toilet flush handle 1, desk 2, desk 3, desk 8, fridge door 2, $1* (two entries)

Detections shown: H1N1, H2N2, H5N1, and the co-detections H1N1/H3N2 and H1N1/H2N2


Rapid identification of the Haiti cholera outbreak strain

World Health Organization


Typical Cholera Outbreak Patterns (Given by WHO in 2009)

No outbreaks seen in nearly 100 years


Timeline

Initiating the project (6-7 Nov)

• 6 Nov: Matt calls

• 7 Nov: PacBio decides to go

Sample prep and sequencing (8-12 Nov)

• 8 Nov: Matt’s group cultures samples

• 10 Nov: Matt’s group sends DNA

• 11 Nov: Sequencing begins

• 12 Nov: Sequencing of 5 genomes complete

Analysis (13-15 Nov)

• Data QC

• Assembly and variation detection

• Building phylogenetic trees

• Annotating structure variation regions

Writing the paper (15-19 Nov)

• PacBio first draft (16 Nov)

• Refined with editor input (18 Nov)

• Submitted to NEJM (19 Nov)

Provisional acceptance (20-24 Nov)

• Reviews received 22 Nov

• Decision indicating intent to publish 24 Nov

Formal acceptance (25 Nov – 1 Dec)

• Multiple revisions with editor (25-30 Nov)

• Official acceptance (1 Dec)

Paper Published!! (9 Dec)


Single nucleotide variations unambiguously positioned the Haitian cholera strain next to South Asia strains

Groups solidly within South Asia group, NOT Latin America group
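The idea of placing a strain by its single nucleotide variations can be sketched as a pairwise SNV count against candidate references. The example below is a minimal Python illustration only: it assumes the sequences are already aligned to equal length, the sequences and reference names are invented placeholders, and it is not the phylogenetic tree building used in the actual analysis.

```python
def snv_count(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def nearest_reference(query, references):
    """Name of the reference with the fewest SNVs relative to the query."""
    return min(references, key=lambda name: snv_count(query, references[name]))


# Invented toy data; names and sequences are hypothetical placeholders.
references = {
    "reference_A": "ACGTACGTAC",
    "reference_B": "ACGTTCGAAC",
}
query = "ACGTACGTAT"

for name, seq in references.items():
    print(name, snv_count(query, seq))
print("nearest:", nearest_reference(query, references))
```

A tree-based analysis uses the same underlying signal, shared and differing variant positions, but models it across all strains at once rather than one pair at a time.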


What is the functional relevance of a complete genome: Resolving the cholera toxin region

Cholera toxin prophage

• Adjacency to other elements defines ability, for example, to produce virions

• Knowing the order of the elements in this region is critical to understanding infectivity, virulence, and other parameters related to the public health threat


Biological Hydrogen Production

A potential transportation fuel for the future

At present a major commodity chemical

• Ammonia production

• Hydrocracking of crude oil

2008 market for hydrogen = 10 billion kg.

2020 projected market for hydrogen = 20 billion kg.

An appealing aspect of hydrogen is that it can be produced by many different routes. Today, 98% of the world’s hydrogen comes from fossil fuels.


R. palustris as an Efficient Path to Hydrogen Production

Figure: Rhodopseudomonas (image from ORNL), with photosynthesis and carbon metabolism labeled

• Hydrogen production involves hundreds of proteins in a web of molecular interactions

• We are seeking to understand this network to enhance hydrogen production


Characterize a population of strains (125 isolates from the wild) of Rhodopseudomonas

Whole genome sequencing: To reveal genetic variation between strains. Will serve as scaffold on which to map transcriptional activity.

Digital, sequence-based transcriptome profiling: To reveal transcriptional output of strains grown under different hydrogen-producing conditions.

Measure H2 and nitrogenase activities: To reveal physiological potential of strains.

INTEGRATE!!! → Probabilistic causal networks that drive production of hydrogen

Collaboration with Carrie Harwood/UW: Experimentally validate computationally derived networks


Beyond the long reads and flexible configurations for a run lies the time dimension: Getting back to real time all the time

Figure: two fluorescence traces (fluorescence intensity in a.u., 0 to 400, vs. time in s; windows 70.5 to 74.5 s and 104.5 to 108.5 s) with base calls annotated, including a methylated adenine (mA)

• Kinetic variation at a given site can be observed directly across the multiple reads covering that site

• In R. pal. and other alpha-proteobacteria, the A residue in a GATC context can be methylated by the DNA adenine methylase enzyme (for gene regulation)
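To make the GATC context concrete, the sketch below lists the position of the adenine in every GATC occurrence in a sequence, which is where kinetic variation from adenine methylation would be examined. It is a minimal Python illustration; the function name and the toy sequence are invented, and positions are 0-based on the strand as given.

```python
def gatc_adenine_positions(sequence):
    """Return 0-based positions of the A in every GATC occurrence."""
    seq = sequence.upper()
    positions = []
    start = seq.find("GATC")
    while start != -1:
        positions.append(start + 1)  # A is the second base of GATC
        start = seq.find("GATC", start + 1)
    return positions


# Invented toy sequence with three GATC sites.
toy = "TTGATCAAGATCGATC"
print(gatc_adenine_positions(toy))
```

Because GATC is its own reverse complement, every hit also marks a GATC site on the opposite strand, so a full scan would check the adenine on both strands.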

DNA

RNA

PROTEIN


Beyond Long Reads and the A’s, C’s, G’s, and T’s by Detecting Kinetic Variation Regions that Affect Phenotype

Analyze R. pal seq data for kinetic variation

R. pal genome covered by many reads

Figure: fluorescence intensity (a.u.) vs. time (s) traces, windows 70.5 to 74.5 s and 104.5 to 108.5 s

Figure: ratio of rates vs. genome coordinate (4000-base window), contrasting regions detected with little variation and regions with a lot of variation


Genes associated with kinetic variation that in turn associate with hydrogen production

Figure: ratio of rates vs. genome coordinate for two windows (4000 bases and 1500 bases), contrasting regions detected with little variation and regions detected with a lot of variation

Results:

• Thousands of sites detected with significant variation (1% FDR)

• AT residues in GATC context were > 12-fold enriched for kinetic variation!

• Regions of increased and decreased variation found

• Even without validation, these data can be used as a covariate in QTL analyses (i.e., is a kinetic variation region associated with H2 production?)

Nitrogenase genes
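The results above report sites called at a 1% FDR, but the slides do not say how the FDR was controlled. As one common option, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of per-site p-values. Both the choice of procedure and the toy p-values are assumptions made for illustration only.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Indices of tests called significant by Benjamini-Hochberg at `fdr`.

    Finds the largest rank k such that p_(k) <= (k / m) * fdr and
    accepts the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Invented per-site p-values for five hypothetical sites.
toy_p_values = [0.0001, 0.2, 0.004, 0.03, 0.0015]
print(benjamini_hochberg(toy_p_values, fdr=0.01))
```

With thousands of sites tested across a genome, a procedure of this kind keeps the expected share of false calls among the reported sites near the chosen level.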


An Entire Network Identified that Affects Nitrogenase Activity and then Hydrogen Production as a Result


Structural differences → differences in energy production

Figure: expression level of key H2 network drivers (log ratio, -1.0 to 0.5) by structural QTL genotype in the H2 network of genes. Labels: BIS3 genotype, DX-1 genotype; big H2 output, small H2 output

Figure: portion of the R. pal genome, comparing the structure of the DX-1 strain with the structure of the BIS3 strain

Structural differences emerge in comparing these strains; structural differences drive the hydrogen network


If we want to succeed at this kind of game, we need the power of SMRT to provide a more accurate picture

• Long reads to assemble de novo genomes (not using whole genome sequencing for SNP detection)

• Long reads to fully characterize the transcriptome (fusion products, etc.)

• Kinetic information to inform on the full epigenome

• Circular consensus to get at the most highly accurate base calls possible

• Quick time to results for maximal impact
