next generation sequencing in virus and parasite research

Post on 17-Dec-2015


Next Generation Sequencing in Virus and Parasite Research

Sanger read: >800 bp

GS-FLX read: ~250 bp, 500 bp

100 Mb, 500 Mb per run

WGS

Annotation

Population Diversity

Pathogen Discovery

Applications Presented

Four main projects in the lab

Brugia malayi Genome Project

Parasitic nematode, causes lymphatic filariasis

• Total scaffolds: ~8250
• Longest scaffold: 6.5 Mb
• Total bases in scaffolds: 71 Mb
• Total span of scaffolds: 80 Mb

Genome size ~100 Mb

6 chromosomes in 8250 pieces

Sanger (cloning bias)

Closing the Genome

Next-generation sequencing

Fingerprint maps

Curating the Data

DATABASE

Mapping 5’ and 3’ UTRs

Functional annotation

Re-assemble genome, re-annotate

Brugia malayi Genome Project

PHASE II: Use Next-Gen Data

(Hybrid Sanger/GS-FLX assembly) (Confirm UTRs by GS-FLX)

Mix of random reads and paired reads

Avg read length: ~220 bp

~100 Mb

GS-FLX Sequencing of Worm gDNA and cDNA

5 runs = 5X coverage of the genome

5’UTR 3’UTR SL gDNA

Paired-Ends and WGS UTRs

Whole Plate 4-well gasket

Mapping of paired and non-paired reads onto genomic assembly

[Figure: hits on the sequence assembly. Labels: 100%, 80%, paired-ends]

No apparent Bias

20 Mb of Brugia reads = ~0.25X coverage
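The coverage figures on these slides follow from dividing sequenced bases by genome size. The sketch below is only an arithmetic check: it assumes ~100 Mb per GS-FLX run and, for the 0.25X figure, assumes the denominator is the 80 Mb scaffold span rather than the ~100 Mb genome estimate. The function name is illustrative.

```python
def fold_coverage(total_read_bases: float, genome_size: float) -> float:
    """Average depth = total sequenced bases / genome size (same units)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


MB = 1_000_000

# Assumption: each GS-FLX run yields ~100 Mb, genome size ~100 Mb.
five_runs = fold_coverage(5 * 100 * MB, 100 * MB)

# Assumption: 20 Mb of reads measured against the 80 Mb scaffold span.
mapped_subset = fold_coverage(20 * MB, 80 * MB)

print(f"5 runs: {five_runs:.2f}X, 20 Mb subset: {mapped_subset:.2f}X")
```

Against the ~100 Mb genome estimate the same 20 Mb would give 0.2X, so the slide's ~0.25X fits the scaffold span better.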

Sequencing UTRs of B. malayi

[Diagram labels: mRNA; 5’ P ... AAAA 3’; CIP, TAP, RNA ligase; RT-PCR; RNA oligo with MmeI site; NlaIII; SAGE tag; unique sequence; concatenated SAGE tags; ditags (variable length)]

Sequencing Results

One sequence run

~50 Mb of data in ~400,000 reads

5’UTR 3’UTR SL

Data processing

Raw Data

Remove: linker, small tags (<10), identical, junk

BLAST against: genome, EST, exon, CDS

Unmatched tags

BLAST against: small contigs, mitochondrion, bacterial singletons
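The first stage of this pipeline (remove linker, small tags, identical tags, junk) can be sketched as a simple filter. This is a minimal illustration, not the lab's actual code: the linker sequence is hypothetical, and "junk" is assumed here to mean any tag containing characters other than A, C, G, T.

```python
def clean_tags(raw_tags, linker, min_len=10):
    """Strip a leading linker, then drop short, junk, and duplicate tags."""
    seen = set()
    kept = []
    for tag in raw_tags:
        tag = tag.upper()
        if tag.startswith(linker):
            tag = tag[len(linker):]          # remove linker
        if len(tag) < min_len:
            continue                         # small tag (<10)
        if set(tag) - set("ACGT"):
            continue                         # junk (assumed: ambiguous bases)
        if tag in seen:
            continue                         # identical to an earlier tag
        seen.add(tag)
        kept.append(tag)
    return kept


LINKER = "GGATCC"  # hypothetical linker, for illustration only
raw = [
    "GGATCCACGTACGTACGTAA",   # linker + 14-base tag
    "ACGTACGTACGTAA",         # duplicate of the tag above
    "GGATCCACGT",             # only 4 bases left after linker removal
    "ACGTNNNNACGTACGT",       # contains N
    "TTGACCATGCAAGT",         # clean 14-base tag
]
cleaned = clean_tags(raw, LINKER)
print(cleaned)
```

The surviving tags would then go to the BLAST steps; tags with no hit form the "unmatched" set for the second BLAST round.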

Mapping of Tags

[Figure: 5’-tag, SL-tag and 3’-tag mapped onto an EST; example gene: 40S ribosomal protein S18]

Intra-Host Diversity of Influenza A Virus

Antigenic variants

Drug resistant and sensitive variants

HA1, HA2

566 aa, 1,757 nt

Amplicons:

Mapped GS-FLX Sequence Reads on antigenic domain of Hemagglutinin

450bp

Mapped Translated GS-FLX Reads on Epitopes of HA1 Domain

E D A B D B D D E C

Patterns: Non-synonymous mutations are predominantly in epitope regions (13/19 sites)

[Figure: #reads per variant, with epitope labels B, B, A, A, A, A, D]

Identifying rare variants: drug resistance mutation

Resistant H1N1: 1/437 = 0.2%

agt (S) aat (N)

N31S

#reads

Matrix segment in H1N1 isolate
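The codon change on this slide can be checked with a small translation sketch. It uses the standard genetic code and assumes, from the N31S label, that aat (N) is the majority codon and agt (S) the rare variant. The function name is illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def classify_change(ref_codon: str, alt_codon: str, position: int) -> str:
    """Return 'synonymous' or a label such as 'N31S' for a codon substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    return f"{ref_aa}{position}{alt_aa}"


label = classify_change("aat", "agt", 31)

# Frequency of the rare variant: 1 read out of 437.
variant_percent = round(100 * 1 / 437, 1)

print(label, f"{variant_percent}%")
```

A change such as aat to aac would come back as "synonymous", since both codons encode asparagine.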

SNP Analyses: Probability that Polymorphism is Real

Base# A C G N T GAP SNP probability

pbShort (PolyBayes) - Marth Lab, Boston College
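To see why a probability model is needed for rare variants, here is a deliberately simple binomial check. It is not the PolyBayes/pbShort model. It only asks how likely it is that at least k reads show a variant base purely through sequencing error, under an assumed (hypothetical) per-base error rate of 0.1%.

```python
from math import comb


def prob_at_least_k_errors(n_reads: int, k_variant_reads: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(n_reads, error_rate)."""
    p_fewer = sum(
        comb(n_reads, i) * error_rate**i * (1 - error_rate) ** (n_reads - i)
        for i in range(k_variant_reads)
    )
    return 1.0 - p_fewer


ERROR_RATE = 0.001  # assumption for illustration, not a measured GS-FLX value

# 1 variant read out of 437, as on the drug-resistance slide.
p_one = prob_at_least_k_errors(437, 1, ERROR_RATE)

# The same depth with 5 variant reads, for comparison.
p_five = prob_at_least_k_errors(437, 5, ERROR_RATE)

print(f"P(>=1 error read) = {p_one:.3f}, P(>=5 error reads) = {p_five:.2e}")
```

Under this assumed error rate a single variant read in 437 is quite compatible with error alone, while five would not be. That is the kind of distinction a per-site SNP probability is meant to capture.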

Error Correction (homopolymer tracts)

Signal Processing: Length Distribution

Adjusting the stringency of quality filters

Changes length distribution

Reads slightly shorter BUT average quality is higher

[Figure: read length distribution]

Default: 75,000 reads, avg length 200
Higher stringency: 70,000 reads, avg length 195
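The trade-off on this slide (reads slightly shorter, average quality higher) can be reproduced with a toy 3’-end quality trim. This is a stand-in, not the GS-FLX signal-processing filter: the trimming rule, thresholds, and quality values below are all made up for illustration.

```python
def trim_3prime(quals, min_q):
    """Drop trailing bases whose quality score is below min_q."""
    end = len(quals)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return quals[:end]


def summarize(reads, min_q):
    """Return (mean read length, mean base quality) after trimming."""
    trimmed = [trim_3prime(r, min_q) for r in reads]
    trimmed = [r for r in trimmed if r]        # discard reads trimmed to nothing
    n_bases = sum(len(r) for r in trimmed)
    mean_len = n_bases / len(trimmed)
    mean_q = sum(sum(r) for r in trimmed) / n_bases
    return mean_len, mean_q


# Toy per-base quality scores for three reads (hypothetical values).
toy_reads = [
    [30, 30, 28, 25, 22, 18, 12],
    [35, 33, 30, 27, 20, 15],
    [32, 31, 29, 26, 24, 21, 19, 10],
]

default_stats = summarize(toy_reads, min_q=15)   # "default" stringency
strict_stats = summarize(toy_reads, min_q=20)    # "higher" stringency

print("default:", default_stats)
print("strict: ", strict_stats)
```

With the stricter threshold the mean length drops and the mean quality rises, matching the direction of the shift described on the slide.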

Signal Processing: Quality Distribution

Reduce the # of bases BUT increase the proportion of bases of HIGH QUALITY

[Figure: quality score distribution]

Default: 15 million bp
Higher stringency: 14 million bp
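The numbers on the two signal-processing slides are consistent with each other if 75,000 and 70,000 are read counts (an assumption, since the transcript does not say so explicitly). A quick check:

```python
def total_bases(n_reads: int, avg_length: int) -> int:
    """Total yield in bases = number of reads x average read length."""
    return n_reads * avg_length


default_bp = total_bases(75_000, 200)   # default filter stringency
strict_bp = total_bases(70_000, 195)    # higher filter stringency

print(f"default: {default_bp / 1e6:.2f} Mbp, strict: {strict_bp / 1e6:.2f} Mbp")
```

This gives 15.00 Mbp and 13.65 Mbp, which round to the 15 and 14 million bp quoted for the quality distributions.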

Whole Virus Genome Sequencing

Limitation of read length BUT:

- Isolate single genome (limiting dilution, other?)
- Random prime or specific primers with barcodes
- Use barcode to amplify
- Multiplex: 20 barcodes, 16-well gasket = 320 samples

Virus Genomic Library Construction - Discovery

[Diagram labels: RNA; RT; PCR; cDNA or ssDNA; Klenow exo- DNA polymerase; dsDNA; select 500 bp amplicons for emulsion PCR and pyrosequencing; NNNN (random-primer sequence)]

1a. Reverse transcription
1b. DNA extension from random primers
2. Amplification from tags
3. Size selection & sequencing

Multiplexing by Barcoding

Pools

Barcodes mapped onto reads: NUCMER

MySQL db

BLASTN, BLASTX

Post-Processing Pipeline

Reads clustered and reduced to a unique set

26,750 contigs, BLASTN: 56% match human DNA
12,889 contigs, BLASTX: 120 match viruses
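The barcode step in this pipeline assigns each read in a pool back to its sample. The deck does this by aligning barcodes to reads with NUCMER. The sketch below swaps in a much simpler exact-prefix match just to show the idea; the barcode sequences and read strings are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample):
    """Bin reads by their leading barcode and strip the barcode from each read."""
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)   # no barcode matched
    return dict(bins)


# Hypothetical barcodes; the tag names echo the TagA/TagB labels in the deck.
BARCODES = {"ACGTAC": "TagA", "TGCATG": "TagB"}

pooled_reads = [
    "ACGTACGGGTTTAAAC",
    "TGCATGCCCAAATTTG",
    "GGGGGGTTTTTT",
]
by_sample = demultiplex(pooled_reads, BARCODES)
print(by_sample)
```

An alignment-based approach tolerates sequencing errors inside the barcode, which exact matching does not. That is one plausible reason to prefer an aligner for this step.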

[Figure: sample pooling layout. Labels: Periodontal Disease, Caries; VIRAL (x4), BACTERIAL (x4); HIGH, LOW, HIGH, LOW; Pool 1; Family (x4); TagA, TagB, TagC, TagD; 5 2 3 76 84]

Sample IDs (each group of four appears twice in the figure):
BU128, WV409, BK026, BR095
WV001, WV213, BK044, BU130
BR009, WV597, WV631, BU133
BR023, WV041, BU137, WV628

Oral Microbiome Project

Bacterial Diversity Heat Maps:

Sequencing of 16S rRNA variable region

Sequencing of PCR amplicons 250 bp in size

Acknowledgments

School of Dental Medicine: Mary Marazita

Ghedin Lab, School of Medicine: Jay DePasse, Adam Fitch, Xu Zhang

Graduate School of Public Health: Robert Ferrell, Mike Barmaba

GPCL: Debby Hollingshead, Paul Wood, Janette Lamb

Funding: NIDCR/NIH, CTSI, JDRF, Burroughs-Wellcome Fund
