Use a circular template to get redundant reads and so more accuracy. Pacific Biosciences

Upload: kaley-settle

Post on 02-Apr-2015


Page 1

Use a circular template to get redundant reads and so more accuracy.

Pacific Biosciences
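Why redundant reads of one circular molecule raise accuracy can be made concrete with a simple model. This is a sketch under stated assumptions, not PacBio's circular consensus algorithm: each pass miscalls a given base independently with probability `p` (0.15 below is an illustrative value, not a measured one), every wrong pass agrees on the same wrong base (a pessimistic worst case), and the consensus is a plain majority vote over an odd number of passes `n`.

```python
from math import comb


def consensus_error(p, n):
    """Chance that a majority vote over n passes calls a base wrongly.

    Toy model (not PacBio's CCS algorithm): each pass errs independently
    with probability p, and all erring passes agree on the same wrong base.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of passes to avoid ties")
    # The consensus is wrong only when more than half of the passes are wrong.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


for passes in (1, 3, 5, 9):
    print(passes, f"{consensus_error(0.15, passes):.1e}")
```

Under these assumptions the consensus error drops from 15% with one pass to about 6% with three, and keeps falling with more passes; errors that are correlated between passes would not average away like this.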

Page 2

DNA methylation detection by bisulfite conversion

Page 3

Detection of methylated adenine in Pacific Biosciences (SMRT) sequencing

Page 4

IPD ratio = average interpulse duration (IPD) on the methylated template divided by that on the non-methylated template (meth/non-meth), plotted against template position
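A minimal sketch of the per-position IPD ratio, assuming IPDs from several reads are already aligned to template positions for both the native sample and an unmethylated control (for example, amplified DNA, since PCR does not copy methylation). The numbers are toy values and the 2.0 cutoff is an arbitrary illustration, not a PacBio default.

```python
from statistics import mean


def ipd_ratios(native_ipds, control_ipds):
    """Per-position IPD ratio: mean IPD on the native (possibly methylated)
    template divided by the mean IPD on an unmethylated control.

    Both arguments map template position -> list of IPDs (same time unit)."""
    ratios = {}
    for pos, values in native_ipds.items():
        control = control_ipds.get(pos)
        if control:  # skip positions with no control measurements
            ratios[pos] = mean(values) / mean(control)
    return ratios


# Toy data: the polymerase pauses longer at position 11 on the native template.
native = {10: [0.9, 1.1, 1.0], 11: [4.8, 5.5, 5.2], 12: [1.2, 0.8, 1.0]}
control = {10: [1.0, 1.0, 1.0], 11: [1.0, 1.1, 0.9], 12: [1.0, 0.9, 1.1]}
ratios = ipd_ratios(native, control)
flagged = [pos for pos, r in ratios.items() if r > 2.0]  # hypothetical cutoff
print(ratios, flagged)
```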

Page 5

Pacific Biosciences

• 50,000 ZMWs (zero-mode waveguides) as of Aug. 2011, and density may climb

• Long reads (e.g., full molecules to determine full length splicing isoforms)

• Direct RNA sequencing possible.

• DNA methylation detectable

Page 6

Agilent SureSelect RNA Target Enrichment

Capture a subgenomic region of interest for economy and speed of sequencing:

E.g.,

• the entire exome (all exons, without introns or intergenic regions)

• hundreds of cancer genes

• a particular genomic locus

Alternative: hybridize to a custom microarray.

Agilent

Page 7

NimbleGen (Roche) subgenomic DNA capture options: beads or microarrays

Page 8

Targeted Capture and Next-Generation Sequencing Identifies C9orf75, encoding Taperin, as the Mutated Gene in Nonsyndromic Deafness DFNB79

Rehman et al., American Journal of Human Genetics 86, 378–388, 2010

Some results using DNA capture for subgenomic sequencing

Page 9

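Detection of methylated C (~all in CpG dinucleotides)

DS DNA: ----CpG---- (unmethylated) and ----CmpG---- (methylated; paired with ----GpCm---- on the other strand)

Na bisulfite + heat (deamination): cytosine → uracil; 5-methylcytosine (Cm) is resistant
----CpG---- → ----UpG----
----CmpG---- → ----CmpG----

PCR:
----UpG---- → ----TpG---- / ----ApC----
----CmpG---- → ----CpG---- / ----GpC----

All NON-methylated Cs are changed to T. Sequence and compare with the untreated sequence to deduce the methylated Cs.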
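The read-out logic can be sketched for a single strand. Assumptions: the reference is the untreated top strand, conversion of unmethylated C is complete, and there are no C→T SNPs (a real C/T SNP would look exactly like an unmethylated C). The sequence and the methylated position are hypothetical.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated C -> U (deamination) -> T after PCR; a C at a 0-based
    position listed in `methylated` is protected and still reads as C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, bisulfite_read):
    """Compare a bisulfite read with the untreated reference:
    C read as C -> methylated; C read as T -> unmethylated."""
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference, bisulfite_read)):
        if ref == "C":
            if obs == "C":
                calls[i] = "methylated"
            elif obs == "T":
                calls[i] = "unmethylated"
    return calls


ref = "ACGTTCGACCA"                              # hypothetical sequence
read = bisulfite_convert(ref, methylated={1})  # CpG at position 1 is methylated
print(read)
print(call_methylation(ref, read))
```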

Page 10

DEEP SEQUENCING (Next generation sequencing, High throughput sequencing, Massively parallel sequencing) applications:

Human genome re-sequencing (mutations, SNPs, haplotypes, disease associations, personalized medicine)

Tumor genome sequencing

Microbial flora sequencing (microbiome, viruses)

Metagenomic sequencing (without cell culturing)

RNA sequencing (RNAseq; gene expression levels, miRNAs, lncRNAs, splicing isoforms)

Chromatin structure (ChIP-seq; histone modifications, nucleosome positioning)

Epigenetic modifications (DNA CpG methylation and hydroxymethylation)

Transcription kinetics (GROseq; nascent RNA, BrdU pulse labeled RNA)

High throughput genetics (QUEPASA; cis-acting regulatory motif discovery)

Drug discovery (bar-coded organic molecule libraries) [Mannocci PNAS paper]

Page 11

Ke et al. and Chasin. Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res. 2011; 21: 1360–1374.

Order an equal mixture of all 4 bases at these 6 positions
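Ordering an equal mixture of the four bases at each of the six positions means the library can, in principle, contain every possible hexamer: 4^6 = 4,096 of them, the same number as the ranks in the ESRseq table. A short sketch enumerating that space (variable names are illustrative):

```python
from itertools import product

BASES = "ACGT"

# Each of the 6 randomized positions can be any of the 4 bases, so the
# library covers the full Cartesian product: 4 ** 6 = 4096 hexamers.
hexamers = ["".join(p) for p in product(BASES, repeat=6)]
print(len(hexamers))
print(hexamers[:3], hexamers[-1])

# With a perfectly equal mixture, each hexamer is expected in 1/4096 of molecules.
print(f"{1 / len(hexamers):.4%}")
```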

Page 12

Quantifying extensive phenotypic arrays from sequence arrays (= QUEPASA)

Page 13

Rank   6-mer    ESRseq score (~ -1 to +1)

Best exonic splicing enhancers:
1      AGAAGA   1.0339
2      GAAGAT   0.9918
3      GACGTC   0.9836
4      GAAGAC   0.9642
5      TCGTCG   0.9517
6      TGAAGA   0.9434
7      CAAGAA   0.9219
8      CGTCGA   0.8853
:      :
Worst exonic splicing enhancers = best exonic splicing silencers:
4086   TAGATA   -0.8609
4087   AGGTAG   -0.8713
4088   CGTCGC   0.8850
4089   CTTAAA   -0.8786
4090   CCTTTA   -0.8812
4091   GCAAGA   0.8911
4092   TAGTTA   -0.8933
4093   TCGCCG   0.9113
4094   CCAGCA   -0.8942
4093   CTAGTA   -0.9251
4094   TAGTAG   -0.9383
4095   TAGGTA   -0.9965
4096   CTTTTA   -1.0610

Page 14

Composite exon (from ~100,000)

Constitutive exons

Alternative exons

Pseudo exons

Page 15

Experiment: 1 1 1 2 2 1+2 2 2 1 2

Sequence (36 nt)                        Quality code
CGCACTGTGCTGGAGCTCCCGGGGTTAACTCTAGAA    abU^Vaa`a\aaa]aWaTNZ`aa`Q][TE[UaP_U]
TACACTGTGCTGGAGCTCCCAACGGCAACTCTAGAA    a`P^Wa`[`Wa^`X_X_XWVa^NSP]_]S^X_T\X^
CGCACTGTGCTGGAGCTCCCATGGAGAACTCTAGAA    aTa`^b``baaaa^aab^YaTQLOHIa`^a``TX]]
TACACTGTGCTGGAGCTCCCCTCCCAAACTCTAGAA    I_`aaaa`aaaaaaa_a_^[KZIGIGZ`U`\^P^^`
CGCACTGTGCTGGAGCTCCCAATAGTAACTTTAGAA    aY_\abb[T\abaaa`a`bZ[HXXIZa_`_LGMS[`
TATACTGTGCTGGAGCTCCCGACGTAAACTCTAGAA    aba]^aa_a]`aa]_]`XWSMFGGIPX[P]X`V_Y^
TACACTGTGCTGGAGCTCCCTGGTAAAACTCTAGAA    a_^a^aa`aYaaa_aY`Y_^[I]VY\`]V]R\W]VV
TACACTGTGCTGGAGCTCCCAATAAAAACTCTAGAA    XZababa`aZaaaaaYaYXX`baa``\\TaUa\aW`

2 nt barcode (TA or CG)

Constant regions (peculiar to our experiment)

Variable region

Barcoding allows multiplexing of several or many experiments at once (in one channel of a sequencer), for economy. Here, two biological replicates.

What the data looks like:

Error
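Splitting these reads by barcode can be sketched in Python. The segment lengths (2-nt barcode, 18-nt 5’ constant region, 6-nt variable region, 10-nt 3’ constant region) are inferred from the eight reads shown rather than stated on the slide, and reading the quality characters as Phred+64 (the older Illumina encoding) is an assumption. Reads whose constant regions do not match exactly are set aside rather than corrected, which is only one simple policy.

```python
from collections import defaultdict

# Layout inferred from the reads on the slide (36 nt in total).
BARCODES = {"TA", "CG"}
CONST_5 = "CACTGTGCTGGAGCTCCC"   # 18 nt
HEXAMER_LEN = 6
CONST_3 = "AACTCTAGAA"           # 10 nt
PHRED_OFFSET = 64                # assumption: Phred+64 quality encoding


def split_read(seq):
    """Cut a read into (barcode, 5' constant, hexamer, 3' constant)."""
    b = 2 + len(CONST_5)
    c = b + HEXAMER_LEN
    return seq[:2], seq[2:b], seq[b:c], seq[c:]


def phred_scores(qual, offset=PHRED_OFFSET):
    """Convert a quality string into per-base Phred scores."""
    return [ord(ch) - offset for ch in qual]


def demultiplex(reads):
    """Group hexamers by barcode (experiment); set aside reads whose barcode
    or constant regions do not match exactly, e.g. sequencing errors."""
    by_barcode = defaultdict(list)
    rejected = []
    for seq in reads:
        barcode, c5, hexamer, c3 = split_read(seq)
        if barcode in BARCODES and c5 == CONST_5 and c3 == CONST_3:
            by_barcode[barcode].append(hexamer)
        else:
            rejected.append(seq)
    return dict(by_barcode), rejected


reads = [
    "CGCACTGTGCTGGAGCTCCCGGGGTTAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCAACGGCAACTCTAGAA",
    "CGCACTGTGCTGGAGCTCCCATGGAGAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCCTCCCAAACTCTAGAA",
    "CGCACTGTGCTGGAGCTCCCAATAGTAACTTTAGAA",
    "TATACTGTGCTGGAGCTCCCGACGTAAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCTGGTAAAACTCTAGAA",
    "TACACTGTGCTGGAGCTCCCAATAAAAACTCTAGAA",
]
groups, rejected = demultiplex(reads)
print(groups)
print(rejected)
# Lowest base quality in the fifth read's quality string, as shown on the slide
print(min(phred_scores(r"aY_\abb[T\abaaa`a`bZ[HXXIZa_`_LGMS[`")))
```

Of these eight reads, three carry the CG barcode and five the TA barcode; the fifth read (T in the 3’ constant region) and the sixth (T where the 5’ constant region has C) are rejected, leaving two CG and four TA hexamers.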

Page 16

Next generation methods for high throughput genetic analysis:

Use custom oligo libraries to construct minigene libraries (40,000, up to 60 nt long):

E.g., for saturation mutagenesis to identify all exonic bases contributing to splicing (or transcription, or polyadenylation, …)

Use bar codes to detect sequences missing from the selected molecules

E.g., Nat Biotechnol. 2009 27:1173-5. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Patwardhan RP, Lee C, Litvin O, Young DL, Pe'er D, Shendure J.

Long (200-mer) synthetic oligo library
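Designing a saturation-mutagenesis library and spotting the variants that drop out after selection can be sketched with two small functions. The 10-nt segment is hypothetical, and treating "missing" as a plain set difference between designed and observed sequences (or their bar codes) is a simplification; a real analysis would compare read counts before and after selection.

```python
def single_substitutions(seq):
    """All single-base substitution variants of seq: 3 per position."""
    variants = []
    for i, original in enumerate(seq):
        for base in "ACGT":
            if base != original:
                variants.append(seq[:i] + base + seq[i + 1:])
    return variants


def missing_variants(designed, observed_after_selection):
    """Designed sequences (or bar codes) never seen among selected molecules."""
    return sorted(set(designed) - set(observed_after_selection))


exon_segment = "GAAGACTTCA"            # hypothetical 10-nt segment
library = single_substitutions(exon_segment)
print(len(library))                    # 3 variants x 10 positions

# Pretend the first two designed variants were lost during selection.
print(missing_variants(library, library[2:]))
```

A 60-nt region, the upper length on the slide, gives 3 × 60 = 180 single-substitution variants.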

Page 17

OUTLINE OF LECTURE TOPICS COMING UP

Expression and manipulation of transgenes in the laboratory

• In vitro mutagenesis to isolate variants of your protein/gene with desirable properties
  – Single base mutations
  – Deletions
  – Overlap extension PCR
  – Cassette mutagenesis

• To study the protein: express your transgene
  – Usually in E. coli, for speed, economy
  – Expression in eukaryotic hosts
  – Drive it with a promoter/enhancer
  – Purify it via a protein tag
  – Cleave it to get the pure protein

• Explore protein-protein interactions
  – Co-immunoprecipitation (co-IP) from extracts
  – 2-hybrid formation
  – Surface plasmon resonance
  – FRET (fluorescence resonance energy transfer)
  – Complementation readout

Page 18

Site-directed mutagenesis by overlap extension PCR

PCR → fragment → subsequent cloning in a plasmid
(or not: the PCR product itself can be used in many ways, e.g., transfection)

Cut with restriction enzymes (RE) 1 and 2 at sites RS1 and RS2

Ligate into a similarly cut vector

Strachan and Read, Human Mol. Genet. 3, p. 148
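The logic of overlap extension can be traced in sequence space. In this sketch the template, mutation and 10-nt primer flank are hypothetical, and the outer primers are assumed to sit at the very ends of the template. The two internal mutagenic primers are reverse complements of each other, so the two first-round PCR products share an overlap that carries the mutation; annealing and extending that overlap fuses them into the full-length mutant.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_extension(template, pos, new_base, flank=10):
    """Return (forward internal primer, reverse internal primer, fused product)
    for a point mutation at 0-based `pos`."""
    if template[pos] == new_base:
        raise ValueError("new_base must differ from the template base")
    mutant = template[:pos] + new_base + template[pos + 1:]
    start, end = max(0, pos - flank), min(len(mutant), pos + flank + 1)
    fwd_internal = mutant[start:end]       # top-strand mutagenic primer
    rev_internal = revcomp(fwd_internal)   # complementary bottom-strand primer
    left = mutant[:end]      # PCR 1: outer forward primer + rev_internal
    right = mutant[start:]   # PCR 2: fwd_internal + outer reverse primer
    # PCR 3: the fragments anneal over mutant[start:end]; extension and
    # amplification with the outer primers give the full-length mutant.
    fused = left + right[end - start:]
    return fwd_internal, rev_internal, fused


template = "ATGAGTAAAGGAGAAGAACTTTTCACTGGAGTTGTC"  # hypothetical 36-nt template
fwd, rev, product = overlap_extension(template, 12, "C")
print(fwd, rev)
print(product)
```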

Page 19

Original sequence coding for, e.g., a transcription enhancer region

Cassette mutagenesis = random mutagenesis but in a limited region:

1) by error-prone PCR

[Diagram: error-prone PCR copy of the region (* = random point mutations) aligned with the original sequence]

Cut in primer sites and clone upstream of a reporter protein sequence.

Pick colonies → analyze phenotypes → sequence

PCR-amplify the fragment with a high concentration of Taq polymerase and Mn2+ instead of Mg2+ → errors
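The outcome of error-prone PCR on a limited region can be mimicked with a toy model: every base is copied wrongly with the same probability and replaced by one of the other three bases at random. Real error-prone PCR has base-specific biases and accumulates errors over cycles; the 120-nt region and 2% rate below are hypothetical.

```python
import random


def error_prone_copy(seq, error_rate, rng):
    """Copy seq, substituting each base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mutated_positions(original, copy):
    """0-based positions where the copy differs from the original."""
    return [i for i, (a, b) in enumerate(zip(original, copy)) if a != b]


rng = random.Random(1)  # fixed seed so runs are repeatable
region = "".join(rng.choice("ACGT") for _ in range(120))  # hypothetical cassette
clones = [error_prone_copy(region, 0.02, rng) for _ in range(5)]
for clone in clones:
    # Expect about 0.02 * 120 = 2.4 mutations per clone on average.
    print(mutated_positions(region, clone))
```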

Page 20

Original enhancer sequence

[Diagram: doped-oligo copy of the enhancer (* = random point mutations) aligned with the original enhancer sequence]

Buy 2 doped oligos; anneal. OK for up to ~80 nt.

Clone upstream of a reporter. Doping = e.g., 90% G, 3.3% A, 3.3% C, 3.3% T at a position where the original base is G (likewise 90% of the original base at every position)

Pick colonies → analyze phenotypes → sequence

Cassette mutagenesis = random mutagenesis but in a limited region:

2) by “doped” synthesis. Target = e.g., an enhancer element
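With doping, the number of changes per oligo follows a binomial distribution if positions are independent (an assumption). At 90% original base per position (3 × 3.3% ≈ 10% mutant), an 80-nt oligo carries about 8 substitutions on average. A sketch of that distribution, using the slide's 90% level and ~80-nt length:

```python
from math import comb


def mutation_count_distribution(length, wild_type_fraction=0.90):
    """P(k substitutions) per doped oligo, k = 0..length, assuming each
    position independently keeps the original base with probability
    wild_type_fraction and otherwise carries one of the other 3 bases."""
    p = 1.0 - wild_type_fraction
    return [comb(length, k) * p**k * (1 - p) ** (length - k)
            for k in range(length + 1)]


dist = mutation_count_distribution(80)
mean_mutations = sum(k * pk for k, pk in enumerate(dist))
print(f"mean = {mean_mutations:.1f}")
print(f"P(0 mutations) = {dist[0]:.1e}")
print(f"P(<= 3 mutations) = {sum(dist[:4]):.3f}")
```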

Page 21

E. coli as a host

• PROs: easy, flexible, high tech, fast, cheap; but there are problems

• CONs

• Folding (can misfold)

• Sorting within the cell -> can form inclusion bodies

• Purification -- endotoxins

• Modifications -- not done (glycosylation, phosphorylation, etc.)

• Modifications:
  – Glycoproteins
  – Acylation: acetylation, myristoylation
  – Methylation (arg, lys)
  – Phosphorylation (ser, thr, tyr)
  – Sulfation (tyr)
  – Prenylation (farnesyl, geranylgeranyl on cys)
  – Vitamin C-dependent modifications (hydroxylation of proline and lysine)
  – Vitamin K-dependent modifications (gamma carboxylation of glu)
  – Selenoproteins (seleno-cys tRNA at UGA stop)

Page 22

E. coli expression vectors

Promoter examples:

1) Lac promoter (with operator)-YFG, + lac repressor (I gene): induce expression by inactivation of the lac repressor with IPTG or lactose

2) As above but with a hybrid Tac promoter (tryptophan operon + lac operon): stronger. Use the Iq mutant of the lac I gene (lacIq), which produces high levels of the lac repressor.

Expression regulatable over several orders of magnitude.

3) BAD (araBAD) promoter-YFG. Arabinose utilization operon. Inducible by arabinose via the endogenous araC gene for a transcriptional activator. Background levels driven down by including glucose.

4) Phage T7 promoter-YFG. Vector carries the gene for T7 polymerase, under control of the lac promoter. Add IPTG or lactose to induce T7 polymerase and thence YFG.

IPTG = isopropyl-β-D-thiogalactoside (non-metabolizable inducer)
YFG = your favorite gene

Page 23

Myristoylation: myristic acid attached to the alpha-amino group of an N-terminal glycine

Anchors the protein to the membrane.

Page 24

Lysine epsilon amino group modifications

Monomethyl and dimethyl also

Well-studied in histones, microtubules

Page 25

Via seleno-cys tRNA at a UGA nonsense codon

Sequence context dictates efficiency.

Page 26

Gamma carboxylation of glutamic acid

Binds calcium, used in coagulation proteins

Page 27

Some alternative hosts

• Yeasts (Saccharomyces, Pichia)

• Insect cells with baculovirus vectors

• Mammalian cells in culture (later)

• Whole organisms (mice, goats, corn)

(not discussed) • In vitro (cell-free), for analysis only, not preparatively

(good for radiolabeled proteins, discussed later)

Page 28

Some popular yeast promoters

ARS = autonomously replicating sequence element

Selectable marker

ori

http://biochemie.web.med.uni-muenchen.de/Yeast_Biol/04 Yeast Molecular Techniques.pdf

Page 29

Yeast Expression Vector (example)

2μ = 2 micron plasmid

2μ sequence features: yeast ori
oriE = bacterial ori
Ampr = bacterial selection
LEU2, e.g. = Leu biosynthesis, for yeast selection

Saccharomyces cerevisiae (baker’s yeast)

[Diagram: plasmid map with 2μ, oriE, Ampr, LEU2, and GAPD prom. → your favorite gene (Yfg) → GAPD term’n]

Complementation of an auxotrophy can be used instead of drug-resistance

Auxotrophy = state of a mutant in a biosynthetic pathway resulting in a requirement for a nutrient

GAPD = the enzyme glyceraldehyde-3-phosphate dehydrogenase

For growth in E. coli

Page 30

Got this far

Page 31

Yeast - genomic integration via homologous recombination

[Diagram: vector DNA carrying a functional HIS4 gene plus Yfg recombines with genomic DNA carrying a defective HIS4 gene (HIS4 mutation), integrating the vector, with Yfg, into the genomic HIS4 locus]

Page 32

Double recombination in yeast (integration in Pichia pastoris)

AOX1 = alcohol oxidase gene (~30% of total protein)

[Diagram: vector DNA (AOX1p, Yfg, AOX1t, HIS4, 3’AOX1) recombines at both ends with the AOX1 locus, giving genomic DNA ... AOX1p, Yfg, AOX1t, HIS4, 3’AOX1 ... genomic DNA]

P. pastoris: tight control; methanol-induced (AOX1); large-scale production (gram quantities)

Page 33

Expression in mammalian cells

Lab examples of immortal cell lines:
HEK293: human embryonic kidney (high transfection efficiency)
HeLa: human cervical carcinoma (historical, low RNase)
CHO: Chinese hamster ovary (hardy, diploid DNA content, mutants)
COS: monkey cells with SV40 replication proteins (→ high transgene copies)
3T3: mouse or human, exhibiting ~regulated (normal-like) growth
+ various others, many differentiated to different degrees, e.g.:
BHK: baby hamster kidney
HepG2: human hepatoma
GH3: rat pituitary cells
PC12: rat neuronal-like tumor cells
MCF7: human breast cancer
HT1080: human fibroblastic cells with near-diploid karyotype
iPS: induced pluripotent stem cells
and: primary cells cultured with a limited lifetime. E.g., MEF = mouse embryonic fibroblasts, HDF = human diploid fibroblasts

Common in industry:
NS1: mAbs; mouse plasma cell tumor cells
Vero: vaccines; African green monkey cells
CHO: mAbs, other therapeutic proteins; Chinese hamster ovary cells
PER.C6: mAbs, other therapeutic proteins; human retinal cells

Page 34

Mammalian cell expression

Generalized gene structure for mammalian expression:

[Diagram: Mam. prom. → 5’UTR → cDNA gene → 3’UTR → polyA site, with an intron in the transcribed region]

Intron is optional but a good idea

Page 35

Popular mammalian cell promoters

• SV40 large T Ag (Simian Virus 40)
• RSV LTR (Rous sarcoma virus)
• MMTV (steroid inducible) (Mouse mammary tumor virus)
• HSV TK (low expression) (Herpes simplex virus)
• Metallothionein (metal inducible, Cd++)
• CMV early (Cytomegalovirus)
• Actin
• EIF2alpha
• Engineered inducible / repressible:
  – tet, ecdysone, glucocorticoid (tet = tetracycline)

Page 36

Engineered regulated expression: tetracycline-responsive promoters

Tet-OFF (add tet → shut off)

tTA = tet activator fusion protein: tetR domain (tet repressor, its original role) + VP16 transcription activation domain

No tet: tTA binds the tet operator (multiple copies) → active (only if tet is not also bound)

Tetracycline (tet), or, better, doxycycline (dox), bound: allosteric change in conformation → not active

The tTA gene must be in the cell (permanent transfection, integrated): CMV prom. → tTA cDNA → polyA site

(Bujold et al.)

Page 37

Tet-OFF, cont.

Reporter: multiple tet operator elements → MIN. CMV prom. → your favorite gene → polyA site

Doxycycline present: tTA (tetR domain + VP16 transcription activation domain) not active → little transcription (2%?, background)

No doxycycline: tTA bound to the tet operators, active → RNA pol → plenty of transcription

Page 38

Tetracycline-responsive promoters

Tet-ON (add tet → turn on gene)

Different fusion protein (reverse tTA, rtTA: tetR domain + VP16 transcription activation domain): does NOT bind the tet operator if tet is not bound

No tet: not active
Tetracycline (tet), or, better, doxycycline (dox), bound → active

Full CMV prom. → tTA cDNA → polyA site. Must be in the cell (permanent transfection, integrated): commercially available (293, CHO) or do-it-yourself

Page 39

Tet-ON

Reporter: multiple tet operator elements → MIN. CMV prom. → your favorite gene → polyA site

Doxycycline absent: activator (tetR domain + VP16 transcription activation domain) not active → little transcription (background)

Add dox: doxycycline-bound activator active on the tet operators → RNA pol II → plenty of transcription (> 50X)
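The on/off logic of the two tetracycline systems can be summarized as a toy function. It models only whether the activator sits on the tet operators (tTA without dox for Tet-OFF, the reverse activator with dox for Tet-ON); the names are illustrative and the "high"/"background" labels stand in for the levels on the slides rather than modeling them quantitatively.

```python
def transcription_level(system, dox_present):
    """Toy model of the Tet-OFF / Tet-ON systems on these slides.

    Tet-OFF: tTA binds the tet operators only when no tet/dox is bound.
    Tet-ON: the reverse activator binds the operators only when dox is bound.
    """
    if system == "Tet-OFF":
        activator_bound = not dox_present
    elif system == "Tet-ON":
        activator_bound = dox_present
    else:
        raise ValueError(f"unknown system: {system}")
    # The VP16 activation domain drives the minimal CMV promoter only when
    # the fusion protein is bound to the operators upstream of it.
    return "high" if activator_bound else "background"


for system in ("Tet-OFF", "Tet-ON"):
    for dox in (False, True):
        print(system, "dox" if dox else "no dox", transcription_level(system, dox))
```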