Next Generation Sequencing in Virus and Parasite Research

Sanger read: >800 bp

GS-FLX read: ~250 bp / 500 bp

100 Mb / 500 Mb per run

Annotation

Population Diversity

Pathogen Discovery

Applications Presented

Four main projects in the lab

Brugia malayi Genome Project

Parasitic nematode, causes lymphatic filariasis

• Total scaffolds: ~8250
• Longest scaffold: 6.5 Mb
• Total bases in scaffolds: 71 Mb
• Total span of scaffolds: 80 Mb

Genome size ~100Mb

6 chromosomes in 8250 pieces

Sanger (cloning bias)

Closing the Genome

Next-generation sequencing

Fingerprint maps

Curating the Data

DATABASE

Mapping 5’ and 3’ UTRs

Functional annotation

Re-assemble genome Re-annotate

Brugia malayi Genome Project

PHASE II: Use Next-Gen Data

(Hybrid Sanger-GSFLX assembly) (Confirm UTRs by GSFLX)

Mix of random reads and paired reads

Avg read length: ~220 bp

~100 Mb

GS-FLX Sequencing of Worm gDNA and cDNA

5 runs = 5X coverage of the genome
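The 5X figure follows from dividing the total sequenced bases by the genome size. A minimal sketch of that arithmetic, assuming the ~100 Mb per run yield quoted for the GS-FLX earlier in the deck and the ~100 Mb genome size (the function name `fold_coverage` is illustrative, not from the slides):

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average fold coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


MB = 1_000_000
runs = 5
bases_per_run = 100 * MB  # assumed yield per GS-FLX run
genome_size = 100 * MB    # approximate B. malayi genome size

coverage = fold_coverage(runs * bases_per_run, genome_size)
print(f"{coverage:.1f}X")  # prints 5.0X
```

This is an average only; actual depth varies along the genome.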

5’UTR 3’UTR SL gDNA

Paired-Ends and WGS UTRs

Whole Plate 4-well gasket

Mapping of paired and non-paired reads onto genomic assembly

SEQUENCE ASSEMBLY hits 100%

80% Paired-ends

No apparent Bias

20 Mb of Brugia reads = ~0.25X coverage

Sequencing UTRs of B. malayi

CIP, TAP, RNA ligase

RT-PCR

RNA oligo, MmeI site

NlaIII

SAGE Tag

Unique sequence

Concatenated SAGE Tags

DITAGS

(variable length)

Sequencing Results

One sequence run

~50 Mb of data in ~400,000 reads

5’UTR 3’UTR SL

Data processing

Raw data

Remove: linker, small tags (<10), identical, junk

BLAST against: genome, EST, exon, CDS

Unmatched tags

BLAST against: small contigs, mitochondrion, bacterial, singletons
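The cleanup step before BLAST can be sketched as a simple filter. This is a minimal illustration, not the lab's pipeline: the linker sequence is hypothetical, the linker is assumed to sit at the 5' end of the read, and "junk" is assumed to mean any tag containing characters other than A, C, G, T.

```python
def clean_tags(raw_tags, linker, min_len=10):
    """Strip a leading linker, then drop short, junk, and duplicate tags."""
    linker = linker.upper()
    seen = set()
    kept = []
    for tag in raw_tags:
        seq = tag.upper()
        if seq.startswith(linker):       # remove linker (assumed 5' position)
            seq = seq[len(linker):]
        if len(seq) < min_len:           # small tags (<10)
            continue
        if set(seq) - set("ACGT"):       # junk: ambiguous or non-DNA characters
            continue
        if seq in seen:                  # identical tags
            continue
        seen.add(seq)
        kept.append(seq)
    return kept


LINKER = "GGATCC"  # hypothetical linker, for illustration only
raw = [
    "GGATCCACGTACGTACGTAA",  # linker + tag
    "ACGTACGTACGTAA",        # identical to the tag above
    "GGATCCACGT",            # too short once the linker is removed
    "ACGTNNNNACGTACGT",      # junk (contains N)
    "TTTTGGGGCCCCAAAA",      # kept
]
tags = clean_tags(raw, LINKER)
print(tags)
```

The surviving tags are what would then be searched against the genome, EST, exon, and CDS sets.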

3’-tag

SL-tag

5’-tag

40S ribosomal protein S18

Mapping of Tags

Intra-Host Diversity of Influenza A Virus

Antigenic variants Drug resistant and Sensitive variants

HA1 HA2

566 aa

1,757 nt

Amplicons:

Mapped GS-FLX Sequence Reads on antigenic domain of Hemagglutinin

Mapped Translated GS-FLX Reads on Epitopes of HA1 Domain

E D A B D B D D E C

Patterns: Non-synonymous mutations are predominantly in epitope regions (13/19 sites)

[Figure: #reads per variant, labeled by epitope (A, B, D)]

Identifying rare variants: Drug resistance mutation

Resistant H1N1: 1/437 = 0.2%

agt (S) aat (N)

#reads

Matrix segment in H1N1 isolate
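The codon change and the variant frequency on this slide can be checked with a short sketch. It builds the standard genetic code table, classifies agt to aat, and computes the fraction of reads carrying the variant. The function names are illustrative and not part of the original analysis.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return (ref amino acid, alt amino acid, mutation class)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


def variant_frequency(variant_reads, total_reads):
    """Fraction of reads at a site that carry the variant."""
    return variant_reads / total_reads


print(classify_codon_change("agt", "aat"))   # ('S', 'N', 'non-synonymous')
print(f"{variant_frequency(1, 437):.1%}")    # 0.2%
```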

SNP Analyses: Probability that Polymorphism is Real

Base# A C G N T GAP SNP probability

pbShort (PolyBayes), Marth Lab, Boston College

Error Correction (homopolymer tracts)
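To give a feel for how such a probability depends on read depth and error rate, here is a deliberately simplified binomial sketch. It is not the PolyBayes/pbShort model, which works from per-base quality values; the per-base error rate used below is a hypothetical value chosen only for illustration.

```python
from math import comb


def prob_at_least_k_errors(n_reads, k_variant, error_rate):
    """P(X >= k) for X ~ Binomial(n_reads, error_rate).

    Chance that sequencing error alone yields k or more reads
    showing the same non-reference base at one site.
    """
    p_fewer = sum(
        comb(n_reads, i) * error_rate**i * (1 - error_rate) ** (n_reads - i)
        for i in range(k_variant)
    )
    return 1.0 - p_fewer


ERROR_RATE = 0.005  # hypothetical per-base error rate (assumption)

for k in (1, 5, 10):
    p = prob_at_least_k_errors(437, k, ERROR_RATE)
    print(f"{k} variant reads out of 437: P(error only) = {p:.3g}")
```

Under this toy model the probability falls quickly as more reads support the variant, which is why low-frequency calls need a proper error model.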

Signal Processing: Length Distribution

Adjusting the stringency of quality filters changes the length distribution

Reads slightly shorter BUT average quality is higher

Read length

Default: 75,000 reads, avg length 200

Higher stringency: 70,000 reads, avg length 195

Signal Processing: Quality Distribution

Reduce the # of bases BUT increase the proportion of bases of HIGH QUALITY

Quality Score

Default: 15 million bp

Higher stringency: 14 million bp
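The trade-off shown here (fewer bases, higher average quality) can be reproduced with a toy filter. This sketch trims each read at the first base below a quality threshold; it is a stand-in for illustration, not the GS-FLX signal-processing filter, and the reads and thresholds are made up.

```python
def trim_read(seq, quals, min_quality):
    """Keep the read up to (not including) the first base below min_quality."""
    for i, q in enumerate(quals):
        if q < min_quality:
            return seq[:i], quals[:i]
    return seq, quals


def filter_reads(reads, min_quality):
    """Trim every read and drop those left empty."""
    kept = []
    for seq, quals in reads:
        t_seq, t_quals = trim_read(seq, quals, min_quality)
        if t_seq:
            kept.append((t_seq, t_quals))
    return kept


def summarize(reads):
    """Read count, base count, average length, and mean base quality."""
    n_bases = sum(len(seq) for seq, _ in reads)
    total_q = sum(sum(quals) for _, quals in reads)
    return {
        "reads": len(reads),
        "bases": n_bases,
        "avg_length": n_bases / len(reads) if reads else 0.0,
        "mean_quality": total_q / n_bases if n_bases else 0.0,
    }


# Made-up reads: (sequence, per-base quality scores)
reads = [
    ("ACGTACGTAC", [30, 30, 30, 28, 25, 22, 20, 18, 15, 10]),
    ("GGGTTTAAAC", [35, 34, 33, 32, 30, 30, 28, 27, 26, 25]),
    ("ACGT", [12, 11, 10, 9]),
]

default = summarize(filter_reads(reads, min_quality=10))  # lenient threshold
strict = summarize(filter_reads(reads, min_quality=20))   # higher stringency
print(default)
print(strict)
```

With the stricter threshold the toy data set loses reads and bases while its mean quality rises, the same direction of change as on the slide.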

Whole Virus Genome Sequencing

Limitation of read length BUT:

- Isolate single genome (limited dilution, other?)
- Random prime or specific primers with barcodes
- Use barcode to amplify
- Multiplex: 20 barcodes, 16-well gasket = 320 samples
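The multiplexing capacity is the number of barcodes times the number of gasket regions, and reads are later assigned back to samples by their barcode. A minimal sketch, assuming the barcode sits at the 5' end of the read and is matched exactly; the barcode sequences and sample names are hypothetical.

```python
def multiplex_capacity(n_barcodes, n_regions):
    """Samples per run = barcodes per region x gasket regions."""
    return n_barcodes * n_regions


def demultiplex(reads, barcodes):
    """Assign reads to samples by exact 5' barcode match.

    barcodes maps barcode sequence -> sample name. Barcodes must not be
    prefixes of one another. The barcode is removed from assigned reads.
    """
    bins = {name: [] for name in barcodes.values()}
    bins["unassigned"] = []
    for read in reads:
        for barcode, name in barcodes.items():
            if read.startswith(barcode):
                bins[name].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return bins


print(multiplex_capacity(20, 16))  # 320

# Hypothetical barcodes and reads, for illustration only.
BARCODES = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "CCCCCCCCCC"]
bins = demultiplex(reads, BARCODES)
print(bins)
```

Exact matching is the simplest choice; sequencing errors in the barcode send a read to `unassigned`, so a production workflow would usually tolerate mismatches.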

Virus Genomic Library Construction: Discovery

cDNA or

Klenow exo- DNA polymerase

Select 500 bp amplicons for emulsion PCR and pyrosequencing

Random primers: NNNNNNNN

1a. Reverse transcription
1b. DNA extension from random primers
2. Amplification from tags
3. Size selection & sequencing

Multiplexing by Barcoding

Barcodes mapped onto reads (NUCMER)

MySQL db

BLASTN, BLASTX

Post-Processing Pipeline

Reads clustered and reduced to a unique set

26,750 contigs, BLASTN: 56% match human DNA

12,889 contigs, BLASTX: 120 match viruses
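Reducing reads to a unique set before BLAST avoids searching the same sequence repeatedly. The sketch below collapses exact duplicates only and keeps a count for each; this is a simplification, since the clustering used in the actual pipeline is not described on the slide and real clustering would normally tolerate mismatches and overlaps.

```python
from collections import Counter


def unique_reads(reads):
    """Collapse exact duplicate reads (case-insensitive).

    Returns (sequence, count) pairs, most abundant first,
    ties broken alphabetically.
    """
    counts = Counter(read.upper() for read in reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


reads = ["ACGT", "acgt", "GGCC", "ACGT", "TTAA"]
unique = unique_reads(reads)
print(unique)
```

Keeping the counts preserves abundance information that would otherwise be lost when only one representative is searched.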

Periodontal Disease Caries

Pool 1

Family Family Family Family

HIGH LOW HIGH LOW

5 2 3 76 84

Oral Microbiome Project

Bacterial Diversity Heat Maps:

Sequencing of 16S rRNA variable region

Sequencing of PCR amplicons 250 bp in size

Acknowledgments

School of Dental Medicine: Mary Marazita

Ghedin Lab, School of Medicine: Jay DePasse, Adam Fitch, Xu Zhang

Graduate School of Public Health: Robert Ferrell, Mike Barmaba

Funding: NIDCR/NIH, CTSI, JDRF, Burroughs-Wellcome Fund

GPCL: Debby Hollingshead, Paul Wood, Janette Lamb
