Next Generation Sequencing in Virus and Parasite Research
![Page 1: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/1.jpg)
Next Generation Sequencing in Virus and Parasite Research
![Page 2: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/2.jpg)
Sanger read: >800 bp
GS-FLX read: ~250 bp | 500 bp
GS-FLX output: 100 Mb | 500 Mb per run
Applications presented (four main projects in the lab):
• WGS
• Annotation
• Population diversity
• Pathogen discovery
![Page 3: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/3.jpg)
Brugia malayi Genome Project
Parasitic nematode, causes lymphatic filariasis
• Total scaffolds: ~8250
• Longest scaffold: 6.5 Mb
• Total bases in scaffolds: 71 Mb
• Total span of scaffolds: 80 Mb
Genome size: ~100 Mb
6 chromosomes in 8250 pieces
Sanger (cloning bias)
![Page 4: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/4.jpg)
Brugia malayi Genome Project
PHASE II: Use Next-Gen Data
• Closing the genome: next-generation sequencing, fingerprint maps
• Curating the data: database, mapping 5' and 3' UTRs, functional annotation
• Re-assemble genome (hybrid Sanger-GSFLX assembly)
• Re-annotate (confirm UTRs by GSFLX)
![Page 5: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/5.jpg)
GS-FLX Sequencing of Worm gDNA and cDNA
Mix of random reads and paired reads
Avg read length: ~220 bp
~100 Mb
5 runs = 5X coverage of the genome
Whole plate: paired-ends and WGS
4-well gasket: UTRs
Samples: 5' UTR, 3' UTR, SL, gDNA
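The coverage figure on this slide follows from simple arithmetic: total sequenced bases divided by genome size. A minimal sketch, assuming the "~100 Mb" on the slide is the yield of one run and using the ~100 Mb genome size quoted earlier; the function name `fold_coverage` is made up for this illustration.

```python
def fold_coverage(total_read_bases: float, genome_size: float) -> float:
    """Return average fold coverage: sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


MB = 1_000_000

# Assumption: ~100 Mb of sequence per run, 5 runs, ~100 Mb genome.
five_runs = fold_coverage(5 * 100 * MB, 100 * MB)
print(f"5 runs: {five_runs:.1f}X coverage")
```

This is an average only; actual depth varies along the genome.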
![Page 6: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/6.jpg)
Mapping of paired and non-paired reads onto genomic assembly
Sequence assembly hits: 100%
Paired-ends: 80%
No apparent bias
20 Mb of Brugia reads = ~0.25X coverage
![Page 7: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/7.jpg)
Sequencing UTRs of B. malayi
[Diagram labels: mRNA with poly(A) tail; CIP, TAP, RNA ligase; RNA oligo with MmeI site; RT-PCR; NlaIII; SAGE tag (unique sequence); ditags (variable length); concatenated SAGE tags]
![Page 8: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/8.jpg)
Sequencing Results
One sequence run
~50 Mb of data in ~400,000 reads
5' UTR, 3' UTR, SL
![Page 9: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/9.jpg)
Data processing
Raw data
Remove: linker, small tags (<10), identical, junk
BLAST against: genome, EST, exon, CDS
Unmatched tags
BLAST against: small contigs, mitochondrion, bacterial, singletons
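The cleaning step that precedes the BLAST searches (remove linker, small tags, identical tags, junk) could look roughly like the sketch below. It is not the lab's pipeline: the linker sequence, the definition of "junk" as any tag containing a non-ACGT character, and the function name `clean_tags` are all assumptions made for illustration.

```python
def clean_tags(raw_tags, linker, min_length=10):
    """Strip a 5' linker, then drop short, junk, and duplicate tags.

    Order of the surviving tags follows their first appearance.
    """
    seen = set()
    kept = []
    for tag in raw_tags:
        tag = tag.upper()
        if tag.startswith(linker):
            tag = tag[len(linker):]      # remove linker
        if len(tag) < min_length:
            continue                     # small tag (<10)
        if set(tag) - set("ACGT"):
            continue                     # junk: ambiguous or non-DNA characters (assumed definition)
        if tag in seen:
            continue                     # identical tag already kept
        seen.add(tag)
        kept.append(tag)
    return kept


LINKER = "GGATCC"  # hypothetical linker, not from the slides
raw = [
    "GGATCCACGTACGTACGT",    # kept after linker removal
    "GGATCCACGTACGTACGT",    # identical, dropped
    "GGATCCACG",             # too short after linker removal
    "GGATCCACGTNNNNACGTAA",  # contains N, treated as junk
    "ACGTTTGACCAGT",         # no linker, kept
]
print(clean_tags(raw, LINKER))
```

Tags that survive this step would then go to the BLAST searches listed on the slide.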
![Page 10: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/10.jpg)
Mapping of Tags
40S ribosomal protein S18
[Figure tracks: EST, 3'-tag, SL-tag, 5'-tag]
![Page 11: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/11.jpg)
Intra-Host Diversity of Influenza A Virus
Antigenic variants
Drug-resistant and sensitive variants
![Page 12: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/12.jpg)
Mapped GS-FLX sequence reads on antigenic domain of hemagglutinin
HA1, HA2: 566 aa, 1,757 nt
Amplicons: 450 bp
![Page 13: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/13.jpg)
Mapped Translated GS-FLX Reads on Epitopes of HA1 Domain
E D A B D B D D E C
![Page 14: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/14.jpg)
Patterns: Non-synonymous mutations are predominantly in epitope regions (13/19 sites)
[Figure: # reads per variant at epitope sites B, B, A, A, A, A, D]
![Page 15: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/15.jpg)
Identifying rare variants: drug resistance mutation
Matrix segment in H1N1 isolate
Resistant H1N1: 1/437 = 0.2%
N31S: agt (S), aat (N)
[Figure: # reads per codon]
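The 1/437 figure is a per-codon read frequency. A minimal sketch of that tally is below. The read list is synthetic: the slide gives the ratio but the extracted text does not make clear which codon is the minority, so the assignment of `aat` as the majority here is arbitrary, and `codon_frequencies` is a name invented for this example.

```python
from collections import Counter


def codon_frequencies(codons):
    """Return the fraction of reads carrying each codon at one position."""
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


# Synthetic input: 437 reads covering the codon, one of them a variant.
# Which codon is the rare one is an assumption for illustration only.
reads = ["aat"] * 436 + ["agt"]
freqs = codon_frequencies(reads)
for codon, freq in sorted(freqs.items()):
    print(f"{codon}: {freq:.1%}")
```

Whether a variant at this frequency is real or a sequencing error is a separate question that a frequency alone does not answer.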
![Page 16: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/16.jpg)
SNP Analyses: Probability that Polymorphism is Real
Table columns: Base #, A, C, G, N, T, GAP, SNP probability
pbShort (PolyBayes), Marth Lab, Boston College
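To give a feel for why a probability is attached to each candidate SNP, here is a deliberately simple stand-in. It is not PolyBayes or pbShort: it only asks how likely it is to see at least `k` variant reads out of `n` if every variant read were an independent sequencing error. The per-base error rate of 1% is an assumed value, not one from the slides.

```python
from math import comb


def prob_at_least(k: int, n: int, error_rate: float) -> float:
    """P(X >= k) for X ~ Binomial(n, error_rate).

    Computed as 1 minus the lower tail, which is fine for small k;
    very small upper tails lose precision with this approach.
    """
    below = sum(
        comb(n, i) * error_rate**i * (1 - error_rate) ** (n - i)
        for i in range(k)
    )
    return 1.0 - below


ASSUMED_ERROR_RATE = 0.01  # hypothetical per-base error rate

# Chance of seeing 1+ and 20+ variant reads in 437 from errors alone.
p_one = prob_at_least(1, 437, ASSUMED_ERROR_RATE)
p_twenty = prob_at_least(20, 437, ASSUMED_ERROR_RATE)
print(f"P(>=1 error read):  {p_one:.3f}")
print(f"P(>=20 error reads): {p_twenty:.2e}")
```

Under this toy model a single variant read is easily explained by error, while many concordant reads are not. A real caller would also use base qualities and the homopolymer context.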
![Page 17: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/17.jpg)
Error Correction (homopolymer tracts)
![Page 18: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/18.jpg)
Signal Processing: Length Distribution
Adjusting the stringency of quality filters changes the length distribution
Reads slightly shorter BUT average quality is higher
[Histogram: read length]
Default: 75,000 reads, avg length 200
Higher stringency: 70,000 reads, avg length 195
![Page 19: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/19.jpg)
Signal Processing: Quality Distribution
Reduce the # of bases BUT increase the proportion of bases of HIGH QUALITY
[Histogram: quality score]
Default: 15 million bp
Higher stringency: 14 million bp
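The trade-off on these two slides (fewer bases, higher average quality) can be reproduced with a toy filter. The sketch below trims low-quality bases from the 3' end of each read using Phred-style scores. This is a stand-in: the instrument's own filters work on signal data, and the thresholds and reads here are invented for illustration.

```python
def trim_read(qualities, min_quality):
    """Trim bases below min_quality from the 3' end; return kept scores."""
    end = len(qualities)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return qualities[:end]


def summarize(reads, min_quality):
    """Return (total bases kept, mean quality of kept bases)."""
    kept = [trim_read(q, min_quality) for q in reads]
    kept = [q for q in kept if q]  # drop reads trimmed to nothing
    total = sum(len(q) for q in kept)
    mean_q = sum(sum(q) for q in kept) / total if total else 0.0
    return total, mean_q


# Made-up per-base quality scores for three reads.
reads = [[30, 30, 28, 25, 12], [35, 33, 20, 18, 10], [8, 7, 5]]
default = summarize(reads, min_quality=10)
strict = summarize(reads, min_quality=20)
print("default:", default)
print("strict: ", strict)
```

With the stricter threshold the total base count drops and the mean quality of what remains goes up, the same direction as the 15 million vs 14 million bp comparison.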
![Page 20: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/20.jpg)
Whole Virus Genome Sequencing
Limitation of read length BUT:
- Isolate single genome (limiting dilution, other?)
- Random prime or specific primers with barcodes
- Use barcode to amplify
- Multiplex: 20 barcodes × 16-well gasket = 320 samples
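The multiplexing capacity is the product of barcodes and gasket wells. A small sketch of how samples might be assigned to (well, barcode) slots follows; the function name, the numbering of wells and barcodes from 1, and the fill order are all assumptions for illustration.

```python
from itertools import product


def assign_slots(samples, n_barcodes=20, n_wells=16):
    """Map each sample to a unique (well, barcode) pair, filling well by well."""
    slots = list(product(range(1, n_wells + 1), range(1, n_barcodes + 1)))
    if len(samples) > len(slots):
        raise ValueError(
            f"{len(samples)} samples exceed capacity of {len(slots)} slots"
        )
    return dict(zip(samples, slots))


capacity = 20 * 16
print("capacity:", capacity)

# Hypothetical sample names.
layout = assign_slots(["sample_1", "sample_2", "sample_21"])
print(layout)
```

Each barcode is reused across wells, so a read is identified by the well it came from plus the barcode it carries.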
![Page 21: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/21.jpg)
Virus Genomic Library Construction - Discovery -
1a. Reverse transcription: RNA → cDNA (RT)
1b. DNA extension from random primers: cDNA or ssDNA → dsDNA (Klenow exo- DNA polymerase)
2. Amplification from tags (PCR)
3. Size selection & sequencing: select 500 bp amplicons for emulsion PCR and pyrosequencing
[Diagram: tagged random primers (NNNN, NNNNNNNN, NNNNNNNNNNNN)]
![Page 22: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/22.jpg)
Multiplexing by Barcoding
Pools
![Page 23: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/23.jpg)
Post-Processing Pipeline
Barcodes mapped onto reads (NUCMER)
Reads clustered and reduced to a unique set
BLASTN, BLASTX
MySQL db
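The first two pipeline steps, assigning reads to barcodes and reducing them to a unique set, can be sketched in a few lines. This is a simplification: the slide names NUCMER for barcode mapping, whereas the sketch uses exact prefix matching, and "clustering" is reduced to removing exact duplicates. The barcode sequences and sample names are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Group reads by exact 5' barcode match.

    Returns {sample: sorted list of unique barcode-trimmed reads}.
    Reads without a known barcode are discarded. Barcodes are assumed
    to be prefix-free (no barcode is the start of another).
    """
    by_sample = defaultdict(set)
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].add(read[len(barcode):])  # set removes duplicates
                break
    return {sample: sorted(seqs) for sample, seqs in by_sample.items()}


# Hypothetical barcodes and reads.
barcodes = {"sampleA": "ACGT", "sampleB": "TGCA"}
reads = ["ACGTGGGCCC", "ACGTGGGCCC", "TGCAAATTT", "CCCCGGGG"]
print(demultiplex(reads, barcodes))
```

An alignment-based approach tolerates sequencing errors in the barcode, which exact matching does not.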
![Page 24: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/24.jpg)
26,750 contigs, BLASTN: 56% match human DNA
12,889 contigs, BLASTX: 120 match viruses
![Page 25: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/25.jpg)
Oral Microbiome Project
Periodontal Disease, Caries
[Figure: pooling design with VIRAL and BACTERIAL fractions, HIGH and LOW groups, four families, tags TagA to TagD, Pool 1. Sample IDs: BU128, WV409, BK026, BR095, WV001, WV213, BK044, BU130, BR009, WV597, WV631, BU133, BR023, WV041, BU137, WV628]
![Page 26: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/26.jpg)
Bacterial Diversity Heat Maps:
Sequencing of 16S rRNA variable region
Sequencing of PCR amplicons 250 bp in size
![Page 27: Next Generation Sequencing in Virus and Parasite Research](https://reader030.vdocuments.us/reader030/viewer/2022032723/56649cfa5503460f949cbe0d/html5/thumbnails/27.jpg)
Acknowledgments
School of Dental Medicine: Mary Marazita
Ghedin Lab, School of Medicine: Jay DePasse, Adam Fitch, Xu Zhang
Graduate School of Public Health: Robert Ferrell, Mike Barmaba
GPCL: Debby Hollingshead, Paul Wood, Janette Lamb
Funding: NIDCR/NIH, CTSI, JDRF, Burroughs-Wellcome Fund