Thanks to: Washington U, Harvard-MIT
Broad Inst., DARPA-BioSpice, DOE-GTL, EU-MolTools,
NHGRI-CEGS, NHLBI-PGA, NIGMS-SysBio, PhRMA, Lipper Foundation
Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, SynBioCorp, ThermoFinnigan, Xeotron/Invitrogen
For more info see: arep.med.harvard.edu
CHI Microarrays in Medicine, 4-May-2005, 9:20-9:50 AM
Synthesis & Analysis on Molecular Arrays
Systems Biology Loop
[Figure: loop connecting Syntheses & Perturbations, Models, Experimental designs, and (Systematic) Data]
Analysis & Synthesis Tools: Genome engineering; DNA & RNA Polony Sequencing
Why Synthetic Genomes & Proteomes?
• Test array hypotheses, e.g. cis-DNA/RNA-elements
• Multi-epitopes, vaccines, protein design
• Mass spectrometry & array standards
• Access to any protein (complex) including post-transcriptional modifications
• Utility of molecular biology DNA-RNA-Protein in vitro "kits" (e.g. PCR, T7, Roche)
Whole genome or part? Whole if major redesign, e.g. changing the genetic code and stability.
Up to 760K Oligos/Chip
18 Mbp for $1K raw (6-18K genes)
Oligos/chip  Source                       Chemistry                  Developers
<1K          Oxamer                       Electrolytic acid/base
8K           Atactic/Xeotron/Invitrogen   Photo-Generated Acid       Sheng, Zhou, Gulari, Gao (U. Houston)
24K          Agilent                      Ink-jet standard reagents
48K          Febit
100K         Metrigen
380K         Nimblegen                    Photolabile 5' protection  Nuwaysir, Smith, Albert
Tian, Gong, Church
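The headline figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above (760K oligos, 18 Mbp, $1K, 6-18K genes). The derived quantities (average bases per oligo, cost per base, implied gene length) are back-of-envelope values, not figures stated in the talk.

```python
# Figures quoted on the slide
OLIGOS_PER_CHIP = 760_000
BASES_PER_CHIP = 18_000_000      # "18 Mbp"
COST_USD = 1_000                 # "$1K raw"
GENES_LOW, GENES_HIGH = 6_000, 18_000


def chip_summary():
    """Derive per-oligo, per-base and per-gene figures from the slide's totals."""
    return {
        # Average oligo length implied by the totals
        "bases_per_oligo": BASES_PER_CHIP / OLIGOS_PER_CHIP,
        # Raw synthesis cost per base
        "usd_per_bp": COST_USD / BASES_PER_CHIP,
        # Gene length implied by "6-18K genes" (shortest, longest)
        "bp_per_gene_range": (BASES_PER_CHIP / GENES_HIGH,
                              BASES_PER_CHIP / GENES_LOW),
    }


if __name__ == "__main__":
    for name, value in chip_summary().items():
        print(name, value)
```

Under these numbers the chip averages roughly 24 bases per oligo, and the 6-18K gene range corresponds to genes of 1 to 3 kb.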
Improve DNA Synthesis Cost
Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12), and bimolecular kinetics slow with the square of the concentration decrease!
Solution: Amplify the oligos, then release them.
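To make the kinetics argument concrete: for a bimolecular step the rate scales with the product of the two strand concentrations, so if both drop by the same factor the rate drops by that factor squared. The sketch below assumes the reaction volume is unchanged, so that concentration scales with molecule count. That assumption is ours and is not stated on the slide.

```python
def relative_bimolecular_rate(n_chip=1e6, n_conventional=1e12):
    """Rate of a bimolecular step on chip-scale material relative to
    conventional-scale material, assuming rate is proportional to [A][B]
    and both species are diluted by the same factor at fixed volume."""
    fold = n_chip / n_conventional   # 1e-6: fold change in concentration
    return fold ** 2                 # rate changes with the square


if __name__ == "__main__":
    print(relative_bimolecular_rate())
```

With the slide's numbers the concentration falls a million-fold and the annealing rate about 1e12-fold, which is why the oligos are amplified before assembly.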
10 + 50 + 10 => ss-70-mer (chip)
=> ds-90-mer (20-mer PCR primers with restriction sites at the 50-mer junctions)
=> ds-50-mer
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
Improve DNA Synthesis Accuracy via mismatch selection
Tian & Church
Other mismatch methods: MutS (& H, L)
Improving DNA synthesis accuracy
Method                    Bp/error
Chip assembly only        160
Hybridization-selection   1,400
MutS-gel-shift            10,000
PCR 35 cycles             10,000
MutHLS cleavage           100,000
Tian & Church 2004 Nature
Carr & Jacobson 2004 NAR
Smith & Modrich 1997 PNAS
http://www.invitrogen.com/content.cfm?pageid=453
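One way to read the Bp/error column is as the chance that a construct of a given length carries no errors at all. The sketch below assumes errors are independent and uniformly distributed along the sequence, which is a simplification of ours and not a claim from the cited papers. The 1,000 bp construct length is likewise only an illustrative choice.

```python
# Bp/error values from the slide's table
BP_PER_ERROR = {
    "Chip assembly only": 160,
    "Hybridization-selection": 1_400,
    "MutS-gel-shift": 10_000,
    "PCR 35 cycles": 10_000,
    "MutHLS cleavage": 100_000,
}


def error_free_fraction(length_bp, bp_per_error):
    """Probability that a construct of length_bp has zero errors,
    assuming each base is wrong independently with probability 1/bp_per_error."""
    return (1.0 - 1.0 / bp_per_error) ** length_bp


if __name__ == "__main__":
    length = 1_000  # illustrative construct length, not from the slide
    for method, bpe in BP_PER_ERROR.items():
        print(f"{method:25s} {error_free_fraction(length, bpe):.4f}")
```

Under this model about 0.2% of 1 kb constructs from chip assembly alone would be error-free, rising to roughly 49% after hybridization-selection and about 99% at 100,000 bp/error.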
Computer aided Design Polymerase Assembly Multiplexing (CAD-PAM)
For tandem, inverted and dispersed repeats: Focus on 3' ends, hierarchical assembly, size-selection and scaffolding.
Mullis 1986 CSHSQB; Dillon 1990 BioTech; Stemmer 1995 Gene; Tian et al. 2004 Nature; Kodumal et al. 2004 PNAS
Assembly sizes (bp): 50, 75, 125, 225, 425, 825 … 100*2^(n-1)
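The size series 50, 75, 125, 225, 425, 825 is reproduced by a simple recurrence: each cycle joins two fragments of the current size that overlap by 25 bp. The 25 bp overlap is inferred from the numbers and is an assumption, not something the slide states. The sketch below generates the series and counts the cycles needed to reach a target length.

```python
def pam_sizes(start_bp=50, overlap_bp=25, cycles=5):
    """Fragment size after each assembly cycle, assuming every cycle joins
    two equal fragments that share overlap_bp (an inferred value)."""
    sizes = [start_bp]
    for _ in range(cycles):
        sizes.append(2 * sizes[-1] - overlap_bp)
    return sizes


def cycles_to_reach(target_bp, start_bp=50, overlap_bp=25):
    """Number of cycles until the fragment size is at least target_bp.
    Requires start_bp > overlap_bp, otherwise the size never grows."""
    if start_bp <= overlap_bp:
        raise ValueError("start_bp must exceed overlap_bp")
    size, n = start_bp, 0
    while size < target_bp:
        size = 2 * size - overlap_bp
        n += 1
    return n


if __name__ == "__main__":
    print(pam_sizes())
    print(cycles_to_reach(10_000))
```

Under this assumption the size after n cycles is 25 + 25*2^n, which is the same doubling as the slide's 100*2^(n-1) apart from the constant term and where n starts counting.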
Genome assembly
PAM cycle #:  0   1   2    3    4    5
#bp:          50  75  125  225  425  825
50 => HS => PAM => 425 => MutS => PAM => 10K => anneal => 100K => red => 5 Mbp
[Figure labels: USER, USER-S1, USER-5' only; one pool, 480 pools, 480 genomic, 48, 1 of 117K; universal primers, primer pairs; brackets marking PCR and in vitro steps]
HS = Hybridization-Selection
USER = Uracil DNA glycosylase & EndoVIII; removes flanking primer pairs
All 30S-Ribosomal-protein DNAs (codon re-optimized)
Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
[Gel image: size markers 1.7 kb and 0.3 kb; s19 at 0.3 kb; Nimblegen 95K chip vs. Atactic <4K chip]
Extreme mRNA makeover for protein expression in vitro
RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially.
RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable.
Solution: Iteratively resynthesize all mRNAs with less mRNA structure.
Tian & Church
[Western blot based on His-tags. Lanes: 20w, 20m, 17w, 17m, 16w, 16m; 10 kd marker. W: wild-type; M: modified]
3 Exponential technologies
Shendure J, Mitra R, Varma C, Church GM, 2004 Nature Reviews Genetics. Carlson 2003; Kurzweil 2002
[Figure: log-scale plot, 1E-3 to 1E+13, against year, 1830 to 2010. Series: Computation & communication (bits/sec); Synthesis (daltons); Analysis (bp/$). Labeled points: telegraph, urea, B12, tRNA, operons, E. coli]
Why Personal Genomics?
• Pathogen rapid response: emerging disease & biowarfare
• B & T-cell diversity: clinical temporal profiling
• Proteomics: antibodies & aptamers
• RNA & methylation: quantitate splicing & chromatin
• Preventative medicine: genotype–phenotype association
• Cancer: drug targets, loss-of-heterozygosity
• Synthetic biology: laboratory selections
• Phylogenetic: footprinting, biodiversity
Shendure et al. 2004 Nature Reviews Genetics
Cancer Genome Project: diagnosis, prognosis, therapies
Mutations G719S, L858R, Del746ELREA in red.
EGFR Mutations in lung cancer: correlation with clinical response to gefitinib [Iressa] therapy.
Paez, … Meyerson (Apr 2004) Science 304: 1497
Lynch … Haber, (Apr 2004) New Engl J Med. 350:2129.
Pao … Mardis, Wilson, Varmus H. (Aug 2004) PNAS 101:13306-11.
Dulbecco R. (1986) A turning point in cancer research: sequencing the human genome. Science 231:1055-6.
Polymerase colony (polony) PCR in a gel
[Figure: Single Molecule From Library (strand labels A, A', B) => 1st Round of PCR => Primer is Extended by Polymerase]
Primer A has 5' immobilizing Acrydite
Mitra & Church Nucleic Acids Res. 27: e34
Polymerase clones Plone sequencing
Polony-slides vs. Plone-beads
1 vs. 2 immobilized primers
dNTP extension vs. ligation
Single molecule vs. multi-molecule detection
Cleavable dNTP-Fluorophore (& terminators)
Mitra,RD, Shendure,J, Olejnik,J, Olejnik,EK, and Church,GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65
Reduce or photo-cleave
Polony Bead Sequencing Pipeline
In vitro libraries via paired tag manipulation
Bead polonies via emulsion PCR
[Dre03]
Monolayered immobilization in acrylamide
Enrichment of amplified beads
SOFTWARE
Images → Tag Sequences
Tag Sequences → Genome
FISSEQ or "wobble" sequencing
Epifluorescence Scope with Integrated Flow Cell
Mitra, Shendure, Porreca, Rosenbaum, Church unpub.
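The "Tag Sequences → Genome" step can be pictured as placing each pair of short tags on a reference sequence. The sketch below is a toy illustration only, not the pipeline's actual software: it uses exact matching on one strand, and the names `build_index` and `place_pair`, the gap limits and the toy genome are all hypothetical.

```python
from collections import defaultdict


def build_index(genome, k):
    """Map every k-mer in the reference to the list of positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def place_pair(index, left_tag, right_tag, min_gap, max_gap):
    """Return (left_pos, right_pos) placements where both tags match exactly
    and the right tag starts min_gap..max_gap bases after the left tag.
    Simplification: forward strand only, no mismatches allowed."""
    hits = []
    for i in index.get(left_tag, []):
        for j in index.get(right_tag, []):
            if min_gap <= j - i <= max_gap:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    toy_genome = "ACGTTGCAAGGCTTAACCGGTACGATCG"   # made-up sequence
    idx = build_index(toy_genome, k=4)
    print(place_pair(idx, "ACGT", "GATC", min_gap=10, max_gap=30))
```

Requiring both tags to land within an expected distance of each other is what lets short tags be placed unambiguously; a real implementation would also need to handle both strands and sequencing errors.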
Haplotypes inferred by pedigree vs. direct single molecule measures
Homozygous in the parents (GM12248, GM12249); heterozygous in the son (GM10835)
Parental genotypes: C-A-G-C-G-C (both copies) and T-T-A-T-A-T (both copies)

SNP          Position   Son's two haplotypes
rs3778973    1.8Mb      C   T
rs997906     79.9Mb     A   T
rs1557917    88.2Mb     G   A
rs39284      89.4Mb     C   T
rs10500042   114Mb      G   A
rs4717028    155Mb      C   T

Direct single molecule counts (GM10835):
1Mb haplotypes:    AT=198  GT=0   GC=45
75Mb haplotypes:   TT=8    TC=0   AC=23
153Mb haplotypes:  TT=72   CT=15  CC=28
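The allele-pair counts can be summarized as the fraction of single-molecule observations that agree with the pedigree-inferred haplotypes C-A-G-C-G-C and T-T-A-T-A-T. The sketch below assumes which SNP pair each distance refers to (88.2/89.4 Mb for 1 Mb, 79.9/155 Mb for 75 Mb, 1.8/155 Mb for 153 Mb). That pairing is inferred from the listed positions and is not stated on the slides.

```python
# Allele-pair counts as listed on the slides
COUNTS = {
    "1Mb": {"AT": 198, "GT": 0, "GC": 45},
    "75Mb": {"TT": 8, "TC": 0, "AC": 23},
    "153Mb": {"TT": 72, "CT": 15, "CC": 28},
}

# Allele pairs expected from the pedigree haplotypes, under the assumed
# SNP pairing described in the text above (an inference, not slide content)
EXPECTED = {
    "1Mb": {"AT", "GC"},
    "75Mb": {"TT", "AC"},
    "153Mb": {"TT", "CC"},
}


def concordance(counts, expected):
    """Fraction of observations whose allele pair matches an expected haplotype."""
    total = sum(counts.values())
    agree = sum(n for pair, n in counts.items() if pair in expected)
    return agree / total


if __name__ == "__main__":
    for span in COUNTS:
        print(span, round(concordance(COUNTS[span], EXPECTED[span]), 3))
```

Under this reading the 1 Mb and 75 Mb pairs are fully concordant, while about 87% of the 153 Mb observations (100 of 115) match the expected haplotypes.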
Plone-bead Fluorescent In Situ Sequencing in vitro Libraries
Greg Porreca, Abraham Rosenbaum
[Figure: 1 to 100 kb genomic fragments; segment labels M, L, R; PCR bead, sequencing primers, selector bead; 2x20 bp after MmeI]
Dressman et al. PNAS 2003 emulsion
Plone-FISSeq: up to 1 billion beads/slide
White = Fe-core pixels; Cy5 primer (570nm); Cy3 dNTP (666nm)
Jay Shendure, Greg Porreca
Plone-bead FISSeq                                   '04       '05
• # of bases sequenced (total Mbp)                  23 (no)   10.8 (yes)
• # bases sequenced (unique)                        73 b      4.7 Mb (72%)
• Avg fold coverage                                 324K      2.3
• Pixels used per bead (analysis)                   3.6       3.6
• Read Length (bp)                                  14        24
• Indels                                            0.6%      ?
• Substitutions (raw error-rate)                    4e-5      1e-2
• Throughput (kb/min)                               360       10
• Speed/cost ratio relative to current ABI
  capillary sequencing @ 0.75 kb/min/device         1100      32
Consider amplification, homopolymer, context errors?
Shendure & Porreca
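Two of the table's rows follow from others by division: fold coverage is total bases over unique bases, and raw speed relative to capillary sequencing is throughput over 0.75 kb/min. The sketch below recomputes both from the quoted values. It does not attempt to reproduce the slide's speed/cost ratio, which also depends on a cost term not given here.

```python
# Values quoted in the table (total and unique bases in bp, throughput in kb/min)
RUNS = {
    "'04": {"total_bp": 23e6, "unique_bp": 73, "throughput_kb_min": 360},
    "'05": {"total_bp": 10.8e6, "unique_bp": 4.7e6, "throughput_kb_min": 10},
}
ABI_KB_PER_MIN = 0.75  # capillary reference rate quoted on the slide


def fold_coverage(run):
    """Average number of times each unique base was read."""
    return run["total_bp"] / run["unique_bp"]


def raw_speedup(run):
    """Throughput relative to one capillary device (speed only, no cost term)."""
    return run["throughput_kb_min"] / ABI_KB_PER_MIN


if __name__ == "__main__":
    for year, run in RUNS.items():
        print(year, round(fold_coverage(run), 1), round(raw_speedup(run), 1))
```

The '05 coverage comes out at about 2.3, matching the table. The '04 value comes out near 315K rather than the listed 324K, presumably because the 23 Mbp total is rounded.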
CD44 Exon Combinatorics (Zhu & Shendure)
• Alternatively Spliced Cell Adhesion Molecule
• Specific variable exons are up-or-down-regulated in various cancers (>2000 papers)
• v6 & v7 enable direct binding to chondroitin sulfate, heparin…
Zhu J, Shendure J, Mitra RD, Church GM. Single molecule profiling of alternative pre-mRNA splicing. Science 301:836-8.
EXON PATTERN           Eph4   Eph4bDD   TOTAL   RATIO    P-V
------------7-8-9-10   609    764       1373    1.17     1E-4
--------------8-9-10   320    390       710     1.13     3E-2
----------6-7-8-9-10   431    251       682     -1.85    4E-18
------4-5-6-7-8-9-10   218    216       434     -1.08    2E-1
----------------9-10   68     143       211     1.96     7E-7
--------5-6-7-8-9-10   86     39        125     -2.37    2E-6
----3-4-5-6-7-8-9-10   40     56        96      1.30     9E-2
------4-5---7-8-9-10   16     74        90      4.30     2E-9
--2-3-4-5-6-7-8-9-10   44     28        72      -1.69    1E-2
1-2-3-4-5-6-7-8-9-10   22     5         27      -4.73    3E-4
--------5---7-8-9-10   5      19        24      3.53     3E-3
----3-4-5---7-8-9-10   1      15        16      13.95    4E-4
--2-3-4-5---7-8-9-10   1      10        11      9.30     5E-3
Eph4 = murine mammary epithelial cell line
Eph4bDD = stable transfection of Eph4 with MEK-1 (tumorigenic)
CD44 RNA splicing isoforms
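The RATIO column appears to be a fold change between the two cell lines after normalizing each isoform count by its library total, with decreases written as negative reciprocals. That reading is inferred from the numbers and is not stated in the deck. The sketch below applies it to the counts in the table, using the column sums of the listed rows as library totals, so results differ slightly from the printed ratios.

```python
# (exon pattern, Eph4 count, Eph4bDD count) from the table above
ROWS = [
    ("------------7-8-9-10", 609, 764),
    ("--------------8-9-10", 320, 390),
    ("----------6-7-8-9-10", 431, 251),
    ("------4-5-6-7-8-9-10", 218, 216),
    ("----------------9-10", 68, 143),
    ("--------5-6-7-8-9-10", 86, 39),
    ("----3-4-5-6-7-8-9-10", 40, 56),
    ("------4-5---7-8-9-10", 16, 74),
    ("--2-3-4-5-6-7-8-9-10", 44, 28),
    ("1-2-3-4-5-6-7-8-9-10", 22, 5),
    ("--------5---7-8-9-10", 5, 19),
    ("----3-4-5---7-8-9-10", 1, 15),
    ("--2-3-4-5---7-8-9-10", 1, 10),
]


def signed_fold_changes(rows):
    """Normalized Eph4bDD/Eph4 fold change per isoform; values below 1 are
    reported as negative reciprocals (assumed convention, see text)."""
    total_a = sum(a for _, a, _ in rows)
    total_b = sum(b for _, _, b in rows)
    result = {}
    for pattern, a, b in rows:
        ratio = (b / total_b) / (a / total_a)
        result[pattern] = ratio if ratio >= 1 else -1 / ratio
    return result


if __name__ == "__main__":
    for pattern, fold in signed_fold_changes(ROWS).items():
        print(pattern, round(fold, 2))
```

For example, the 6-7-8-9-10 isoform comes out near -1.85, in line with the table. Rows with single-digit counts are more sensitive to the exact normalization used.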
Molecular Systems Biology
Transcriptomics, Proteomics, Metabolomics
Functional genomics, Structural genomics
Computational biology, Theoretical biology
Mathematical biology, Synthetic biology
An open access journal
www.nature.com/msb/