Master Thesis Presentation
Whole Genome Sequencing analysis of an outbreak of Mycobacterium caprae in the Alpine region affecting both domestic and wild animals: a molecular epidemiological perspective
By Ashok Varadharajan, MSc Epidemiology, LMU
Supervisor: Dr. Helmut Blum, Laboratory for Functional Genome Analysis, Gene Center of the LMU, Feodor-Lynen-Strasse 25, 81377 Munich
BACKGROUND
Tuberculosis: an infectious bacterial disease characterized by the growth of nodules (tubercles) in the tissues, especially the lungs.
Mycobacterium tuberculosis complex (MTC): a genetically related group of Mycobacterium species causing tuberculosis in humans or animals:
Mycobacterium tuberculosis
Mycobacterium africanum
Mycobacterium bovis
Mycobacterium microti
Mycobacterium canettii
Mycobacterium caprae
Mycobacterium pinnipedii
Mycobacterium suricattae
Mycobacterium mungi
[Figure: tubercles]
BACKGROUND
Mycobacterium caprae: formerly known as M. tuberculosis subsp. caprae and M. bovis subsp. caprae
Designated as a new species within the MTC in 2003
Predominantly found in Central and Western European countries
The pathogen can affect both animals and humans
Differentiating features within the MTC:
- A specific combination of polymorphisms in the oxyR, pncA, katG, gyrA and gyrB genes
- Region of Difference 4 (RD4)
BACKGROUND
Phylogeny of the MTC (Rodriguez-Campos et al., 2014)
Differences between M. caprae and M. bovis
PROBLEMS
Increased prevalence of tuberculosis caused by M. caprae in the Alpine region
Infections in various species, such as cattle, red deer, foxes and humans
Lack of knowledge about the transmission of infection between domestic animals and wildlife
Little knowledge of the genomic features of M. caprae that could serve as genotyping markers
Lack of a complete genome sequence of M. caprae, complicating the differentiation of M. caprae from M. bovis
OBJECTIVES
Analysis of the spatial spread of the disease by Whole Genome Sequencing:
1. Identification of variants
2. Clustering of isolates
3. Analysis of the infection chain
4. Analysis of the RD4 region
Development of a Web Integrated Environment:
1. Automation of the variant calling pipeline
2. A tailored database to store the identified variants
3. Real-time generation of results such as the phylogenetic tree, Minimum Spanning Tree, SNP plot and coverage plot
Distribution of sequenced M. caprae isolates
Host Number of Isolates
Red Deer 95
Cattle 180
Human 3
Sheep 1
Total 279
By location
Germany
Region Number of Isolates
GAP 9
Landshut 3
OA 122
OAL 21
Oldenburg 2
RO 1
TÖÄ 2
UA 7
UAL 4
WM 2
Unknown 15
Austria
Region Number of Isolates
BH Schwaz 7
Bludenz 18
Bregenz 2
Eranet 12
Kaisers 12
Reutte 15
Vorarlberg 1
Zillertal 23
Total 279
WHOLE GENOME SEQUENCING (WGS)
Sequencer - Illumina HiSeq 1500
Technique - Paired-end sequencing
Coverage - 100x
Read length - 100-150 base pairs
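The coverage figure links the amount of sequencing to the genome size through the expected mean depth C = N × L / G (N reads of length L over a genome of size G). A minimal sketch, assuming a genome size of about 4.4 Mb (close to the M. tuberculosis H37Rv reference, used here only as an approximation for M. caprae) and 150 bp reads counted individually rather than as pairs:

```python
import math


def expected_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_needed(coverage: float, read_length: int, genome_size: int) -> int:
    """Single reads (not pairs) needed to reach a target mean depth."""
    return math.ceil(coverage * genome_size / read_length)


# Assumed values for illustration: ~4.4 Mb genome, 150 bp reads, 100x target
GENOME_SIZE = 4_400_000
n = reads_needed(100, 150, GENOME_SIZE)
print(n)  # 2933334 reads, i.e. about 1.47 million read pairs
print(expected_coverage(2_000_000, 100, 4_000_000))  # 50.0
```

The estimate assumes reads fall uniformly over the genome; real runs show local dips and peaks around the mean.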
Principle of WGS: M. caprae genome -> Library preparation -> Sequenced reads -> Downstream analysis
METHODS - IDENTIFICATION OF VARIANTS
Variant Calling Workflow:
Paired-end FASTQ files
-> Quality filter
-> Alignment: BWA-MEM
-> Removing duplicates: Picard MarkDuplicates
-> Variant calling: SAMtools mpileup, VarScan mpileup2snp
-> SNPs (VCF)
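Automating this workflow amounts to running the tools in sequence for each isolate. The sketch below only assembles and, optionally, runs the command lines. It assumes the reference is already indexed (`bwa index`), that Picard and VarScan are available as jar files at the placeholder paths shown, and it adds a `samtools sort` step that the slides do not list but MarkDuplicates expects coordinate-sorted input. The quality-filter step is omitted because its tool is not named; file names are hypothetical.

```python
import subprocess
from typing import List, Optional, Tuple

# Each step: (command, file that receives stdout, or None)
Step = Tuple[List[str], Optional[str]]


def build_commands(ref: str, r1: str, r2: str, sample: str,
                   picard_jar: str = "picard.jar",
                   varscan_jar: str = "VarScan.jar") -> List[Step]:
    """Command lines for one isolate, in workflow order (placeholder paths)."""
    sam, bam = f"{sample}.sam", f"{sample}.sorted.bam"
    dedup, pileup = f"{sample}.dedup.bam", f"{sample}.pileup"
    return [
        # Align the read pair against the bwa-indexed reference (SAM on stdout)
        (["bwa", "mem", ref, r1, r2], sam),
        # Coordinate sorting, expected by MarkDuplicates
        (["samtools", "sort", "-o", bam, sam], None),
        # Remove duplicates; the metrics file reports the duplication rate
        (["java", "-jar", picard_jar, "MarkDuplicates", f"I={bam}",
          f"O={dedup}", f"M={sample}.dup_metrics.txt",
          "REMOVE_DUPLICATES=true"], None),
        # Pileup of the deduplicated alignment against the reference
        (["samtools", "mpileup", "-f", ref, dedup], pileup),
        # SNP calling with VarScan's default thresholds, VCF on stdout
        (["java", "-jar", varscan_jar, "mpileup2snp", pileup,
          "--output-vcf", "1"], f"{sample}.vcf"),
    ]


def run_pipeline(steps: List[Step], dry_run: bool = True) -> None:
    """Print (dry run) or execute each step, stopping at the first failure."""
    for cmd, stdout_file in steps:
        if dry_run:
            suffix = f" > {stdout_file}" if stdout_file else ""
            print(" ".join(cmd) + suffix)
        elif stdout_file:
            with open(stdout_file, "w") as handle:
                subprocess.run(cmd, stdout=handle, check=True)
        else:
            subprocess.run(cmd, check=True)


steps = build_commands("reference.fasta", "isolate01_R1.fastq",
                       "isolate01_R2.fastq", "isolate01")
run_pipeline(steps)  # dry run: prints the commands only
```

Keeping command construction separate from execution makes the step list easy to inspect, log or hand to a workflow system such as Galaxy.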
Paired-end FASTQ files (each record: identifier, sequence, comment, quality)
File 1 (read 1):
@HISEQ:55:H80HWADXX:2:1101:1341:2054 1:N:0:
GCTCAATTCGAGCCGATGCACCAGGTGTTCCTAGGTGTGCGGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJHIJJJJJIJJHIIJJJJJ
@HISEQ:55:H80HWADXX:2:1101:1259:2056 1:N:0:
AGCCTGCTGGTTGCTGGGTCATTGCGCCATGCCTTCGAGAACA
+
CBCFFFFFHHHHHIJIIJIHIJJIJIJJJJJJJJJJJJIJJJJ
File 2 (mates of the reads above, same cluster coordinates):
@HISEQ:55:H80HWADXX:2:1101:1341:2054 3:N:0:
GGGGGATCGACCGCTCCCGGAATTCGGTGGAAGCTGCTGCGGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJHHD
@HISEQ:55:H80HWADXX:2:1101:1259:2056 3:N:0:
GGCGAGGGCCGCGTCATTGCGGCGTAGCGTGGACGCGATGTTG
+
CCCFFFFFHHHHHIIJJJJJJJJJIGHFFFDDDDDDDDDDDDD
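FASTQ records follow a four-line layout: identifier, sequence, comment ("+") and one quality character per base. A minimal reading and filtering sketch, assuming the Phred+33 encoding of Illumina 1.8+ output (the 'C' to 'J' characters in the reads above fit that range). The mean-quality filter and its threshold of 30 are hypothetical, since the slides do not name the quality-filter tool or its settings.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # header line without the leading "@"
    sequence: str
    comment: str     # the "+" line
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped, four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        comment = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not comment.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, comment, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode Phred+33 characters: 'C' -> 34, 'F' -> 37, 'J' -> 41."""
    return [ord(ch) - offset for ch in quality]


def passes_quality(record: FastqRecord, min_mean_q: float = 30.0) -> bool:
    """Hypothetical filter: keep reads whose mean Phred score is >= min_mean_q."""
    scores = phred_scores(record.quality)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


# Toy input (not real data): one high-quality and one low-quality read
toy = io.StringIO("@read1\nACGT\n+\nCCFJ\n@read2\nACGT\n+\n####\n")
kept = [r.identifier for r in read_fastq(toy) if passes_quality(r)]
print(kept)  # ['read1']
```

In paired-end data the two files must stay in step, so a real filter drops or keeps both mates of a pair together.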
[Figure: alignment of mapped reads against the reference genome, with coverage plotted along the reference genome position]
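Coverage at a reference position is the number of mapped reads overlapping it. Treating each alignment as a start/end interval (a simplification that ignores the insertions, deletions and clipping recorded in real BAM files, where a tool such as `samtools depth` would be used), per-position coverage can be computed in linear time with a difference array. The intervals below are made up for illustration.

```python
from typing import List, Sequence, Tuple


def coverage_from_alignments(intervals: Sequence[Tuple[int, int]],
                             genome_length: int) -> List[int]:
    """Per-position read depth on one reference sequence.

    intervals are 0-based, end-exclusive (start, end) spans of aligned reads.
    The difference array gets +1 where a read starts and -1 where it ends;
    the running sum over positions is then the depth at each position.
    """
    diff = [0] * (genome_length + 1)
    for start, end in intervals:
        diff[max(start, 0)] += 1
        diff[min(end, genome_length)] -= 1
    depth, running = [], 0
    for pos in range(genome_length):
        running += diff[pos]
        depth.append(running)
    return depth


depth = coverage_from_alignments([(0, 4), (2, 6), (2, 3)], genome_length=8)
print(depth)                    # [1, 1, 3, 2, 1, 1, 0, 0]
print(sum(depth) / len(depth))  # mean coverage 1.125
```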
[Figure: duplicate reads in the alignment]
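Duplicate reads arise when several copies of the same DNA fragment, for example from PCR amplification during library preparation, are sequenced. Picard MarkDuplicates recognises them from the 5' alignment positions and orientations of read pairs. The sketch below is a simplified illustration of that idea, not Picard's implementation; the `ReadPair` fields and the rule of keeping the pair with the highest summed base quality are assumptions made for the example.

```python
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class ReadPair(NamedTuple):
    name: str
    chrom: str
    pos_r1: int        # unclipped 5' position of read 1
    strand_r1: str     # "+" or "-"
    pos_r2: int        # unclipped 5' position of read 2
    strand_r2: str
    quality_sum: int   # summed base qualities of both reads


def mark_duplicates(pairs: Iterable[ReadPair]) -> Set[str]:
    """Return the names of pairs flagged as duplicates (simplified).

    Pairs with the same chromosome, 5' positions and orientations are treated
    as copies of one fragment. Mate order is normalised by sorting the two
    ends; the pair with the highest quality_sum is kept, the rest are flagged.
    """
    groups: Dict[Tuple, List[ReadPair]] = {}
    for pair in pairs:
        ends = tuple(sorted([(pair.pos_r1, pair.strand_r1),
                             (pair.pos_r2, pair.strand_r2)]))
        groups.setdefault((pair.chrom, ends), []).append(pair)
    duplicates: Set[str] = set()
    for group in groups.values():
        keeper = max(group, key=lambda p: p.quality_sum)
        duplicates.update(p.name for p in group if p is not keeper)
    return duplicates


pairs = [
    ReadPair("a", "chr", 100, "+", 350, "-", 3000),
    ReadPair("b", "chr", 100, "+", 350, "-", 3400),  # same fragment, best quality
    ReadPair("c", "chr", 350, "-", 100, "+", 2000),  # same fragment, mates swapped
    ReadPair("d", "chr", 120, "+", 350, "-", 3500),  # different start: kept
]
print(sorted(mark_duplicates(pairs)))  # ['a', 'c']
```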
[Figure: SNP in the aligned reads compared with the reference base]
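A SNP is a position where the aligned reads consistently show a base different from the reference base. VarScan mpileup2snp makes this decision with coverage, variant-frequency, base-quality and p-value thresholds; the sketch below keeps only the depth and frequency checks, with hypothetical default values, to show the idea. Because the bacterial genome is haploid, a genuine SNP is expected in nearly all reads covering the position, which motivates the high frequency cutoff chosen here.

```python
from collections import Counter
from typing import Iterable, Optional

VALID_BASES = {"A", "C", "G", "T"}


def call_snp(ref_base: str, pileup_bases: Iterable[str],
             min_coverage: int = 8, min_var_freq: float = 0.9) -> Optional[str]:
    """Simplified SNP decision at one reference position.

    Returns the most frequent non-reference base if at least min_coverage
    usable reads cover the position and that base reaches min_var_freq;
    otherwise None. Both thresholds are hypothetical.
    """
    bases = [b.upper() for b in pileup_bases if b.upper() in VALID_BASES]
    depth = len(bases)
    if depth < min_coverage:
        return None
    alt_counts = Counter(b for b in bases if b != ref_base.upper())
    if not alt_counts:
        return None
    alt, count = alt_counts.most_common(1)[0]
    return alt if count / depth >= min_var_freq else None


print(call_snp("C", "TTTTTTTTTT"))  # T    (all 10 reads support T)
print(call_snp("C", "CCCCCTTTTT"))  # None (mixed position, 50 %)
print(call_snp("C", "TTT"))         # None (depth below 8)
```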
METHODS – CLUSTERING OF ISOLATES
- A phylogenetic tree can be built from a custom-generated FASTA file containing only the identified SNP positions of each isolate
- The FastTree tool infers approximately-maximum-likelihood phylogenetic trees
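A SNP-only alignment keeps, for every isolate, just the bases at positions where at least one isolate differs from the reference. A minimal sketch with hypothetical positions and isolate names; giving an isolate the reference base wherever it has no SNP call is a simplifying assumption (a real pipeline would distinguish "reference" from "not covered").

```python
from typing import Dict


def snp_alignment(reference: Dict[int, str],
                  snps_by_isolate: Dict[str, Dict[int, str]]) -> str:
    """FASTA alignment restricted to positions that vary in any isolate.

    reference maps each SNP position to its reference base; snps_by_isolate
    maps isolate -> {position: alternative base}.
    """
    positions = sorted({pos for calls in snps_by_isolate.values() for pos in calls})
    records = []
    for isolate, calls in sorted(snps_by_isolate.items()):
        sequence = "".join(calls.get(pos, reference[pos]) for pos in positions)
        records.append(f">{isolate}\n{sequence}")
    return "\n".join(records) + "\n"


# Hypothetical positions and isolates, for illustration only
reference = {1200: "C", 45001: "G", 98765: "A"}
snps = {
    "isolate_B": {1200: "T", 98765: "G"},
    "isolate_A": {1200: "T"},
    "isolate_C": {45001: "A"},
}
fasta = snp_alignment(reference, snps)
print(fasta, end="")
# >isolate_A
# TGA
# >isolate_B
# TGG
# >isolate_C
# CAA
```

Written to a file, such an alignment can be passed to FastTree, for example `FastTree -nt -gtr snps.fasta > tree.nwk` (nucleotide data, GTR model).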
[Figure: phylogenetic tree of the isolates and a magnified view of the constructed tree]
METHODS – ANALYSIS OF INFECTION CHAIN
Minimum Spanning Tree
Available algorithms: Kruskal's algorithm, Prim's algorithm, reverse-delete algorithm
Applications: taxonomy, cluster analysis, constructing trees for broadcasting in computer networks
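For the infection chain, isolates can be linked by genetic distance. As an illustration, the sketch below applies Kruskal's algorithm (one of the three listed; the slides do not say which one was used) to pairwise SNP distances between SNP-only sequences of hypothetical isolates. Edges are taken in order of increasing distance, and a union-find structure rejects any edge that would close a cycle.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def snp_distance(a: str, b: str) -> int:
    """Number of positions at which two aligned SNP sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned (equal length)")
    return sum(x != y for x, y in zip(a, b))


def kruskal_mst(sequences: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Minimum spanning tree over isolates with Kruskal's algorithm."""
    names = sorted(sequences)
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Union-find root lookup with path halving
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    edges = sorted((snp_distance(sequences[a], sequences[b]), a, b)
                   for a, b in combinations(names, 2))
    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # accepting this edge creates no cycle
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree


# SNP-only sequences of three hypothetical isolates
mst = kruskal_mst({"isolate_A": "TGA", "isolate_B": "TGG", "isolate_C": "CAA"})
print(mst)  # [('isolate_A', 'isolate_B', 1), ('isolate_A', 'isolate_C', 2)]
```

When several edges share the same distance the minimum spanning tree is not unique, so ties between equally distant isolates should be interpreted with care when reading transmission links.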
METHODS – ANALYSIS OF RD4 REGION
[Figure: RD4 region in individual isolates, labelled by region: OA, OAL, RO, GAP]
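One way to examine a region of difference in WGS data is to look at read depth across it: a region present in an isolate is covered at roughly the genome-wide depth, while a deleted region shows little or no coverage. The sketch below assumes per-position depths in the three-column format written by `samtools depth -a` (contig, 1-based position, depth). The contig name, coordinates and thresholds are placeholders, not the actual RD4 locus or the criteria used in the thesis.

```python
from typing import Dict, Iterable


def parse_samtools_depth(lines: Iterable[str], chrom: str) -> Dict[int, int]:
    """Read tab-separated 'contig, 1-based position, depth' lines."""
    depths: Dict[int, int] = {}
    for line in lines:
        name, pos, depth = line.rstrip("\n").split("\t")[:3]
        if name == chrom:
            depths[int(pos)] = int(depth)
    return depths


def region_summary(depths: Dict[int, int], start: int, end: int,
                   min_depth: int = 1, max_missing_fraction: float = 0.5) -> dict:
    """Mean depth over [start, end] (1-based, inclusive) and a deletion flag.

    A position counts as missing if its depth is below min_depth; positions
    absent from depths count as depth 0. Both thresholds are hypothetical.
    """
    values = [depths.get(pos, 0) for pos in range(start, end + 1)]
    missing_fraction = sum(v < min_depth for v in values) / len(values)
    return {
        "mean_depth": sum(values) / len(values),
        "missing_fraction": missing_fraction,
        "possible_deletion": missing_fraction > max_missing_fraction,
    }


# Placeholder contig name and coordinates (NOT the real RD4 locus)
lines = [f"contig1\t{pos}\t{40 if pos <= 102 else 0}\n" for pos in range(101, 111)]
summary = region_summary(parse_samtools_depth(lines, "contig1"), 101, 110)
print(summary)
# {'mean_depth': 8.0, 'missing_fraction': 0.8, 'possible_deletion': True}
```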
[Figure: web interface views: Compare Strains, View SNPs, SNP plot, Phylogenetic Tree, Coverage Plot, Minimum Spanning Tree]
WEB INTEGRATED ENVIRONMENT
Illumina HiSeq sequencer -> Raw reads -> Demultiplexing -> FASTQ files -> Variant calling pipeline (Galaxy) -> Database -> Web server
Mycobacterium caprae Database and Analysis (MCDBA)
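The slides do not describe the MCDBA database engine or schema. As a minimal sketch, assuming SQLite and hypothetical table and column names, the variants identified per isolate could be stored alongside isolate metadata (host, country, region) and queried, for example to compare two strains. The isolate IDs, positions and depths below are invented for illustration.

```python
import sqlite3
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS isolate (
    isolate_id TEXT PRIMARY KEY,
    host       TEXT,
    country    TEXT,
    region     TEXT
);
CREATE TABLE IF NOT EXISTS variant (
    isolate_id TEXT NOT NULL REFERENCES isolate(isolate_id),
    position   INTEGER NOT NULL,   -- 1-based reference coordinate
    ref_base   TEXT NOT NULL,
    alt_base   TEXT NOT NULL,
    depth      INTEGER,
    PRIMARY KEY (isolate_id, position)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and ensure the tables exist."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def shared_snps(con: sqlite3.Connection, isolate_a: str, isolate_b: str) -> List[int]:
    """Positions where two isolates carry the same alternative base."""
    rows = con.execute(
        """SELECT a.position FROM variant AS a
           JOIN variant AS b ON a.position = b.position AND a.alt_base = b.alt_base
           WHERE a.isolate_id = ? AND b.isolate_id = ?
           ORDER BY a.position""",
        (isolate_a, isolate_b),
    ).fetchall()
    return [row[0] for row in rows]


con = open_db()
con.executemany("INSERT INTO isolate VALUES (?, ?, ?, ?)",
                [("iso01", "Cattle", "Germany", "OA"),
                 ("iso02", "Red Deer", "Austria", "Reutte")])
con.executemany("INSERT INTO variant VALUES (?, ?, ?, ?, ?)",
                [("iso01", 1200, "C", "T", 95), ("iso01", 98765, "A", "G", 80),
                 ("iso02", 1200, "C", "T", 110)])
print(shared_snps(con, "iso01", "iso02"))  # [1200]
```

Keying variants on (isolate, position) makes per-strain comparisons and SNP plots simple joins, and the same tables could feed the phylogenetic tree and minimum spanning tree views.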
QUESTIONS?
THANK YOU