Master Thesis Presentation


Upload: ashok-varadharajan

Post on 18-Aug-2015


Page 1: Master Thesis Presentation

Whole Genome Sequencing analysis of an outbreak of Mycobacterium caprae in alpine region affecting both domestic and wildlife animals: molecular epidemiological perspective

By Ashok Varadharajan
MSc Epidemiology, LMU

Supervisor: Dr. Helmut Blum
Laboratory for Functional Genome Analysis
Gene Center of the LMU
Feodor-Lynen-Strasse 25
81377 Munich

Page 2: Master Thesis Presentation

BACKGROUND

Tuberculosis: An infectious bacterial disease characterized by the growth of nodules (tubercles) in the tissues, especially the lungs.

Mycobacterium tuberculosis complex (MTC): A genetically related group of Mycobacterium species that cause tuberculosis in humans or other organisms.

- Mycobacterium tuberculosis
- Mycobacterium africanum
- Mycobacterium bovis
- Mycobacterium microti
- Mycobacterium canettii
- Mycobacterium caprae
- Mycobacterium pinnipedii
- Mycobacterium suricattae
- Mycobacterium mungi

Figure: tubercles

Page 3: Master Thesis Presentation

BACKGROUND

Mycobacterium caprae
- Formerly known as M. tuberculosis subsp. caprae and M. bovis subsp. caprae
- Designated as a new species in the MTC in 2003
- Predominantly found in Central and Western European countries
- Can infect both animals and humans

Differentiating features within the MTC
- A specific combination of polymorphisms in the genes oxyR, pncA, katG, gyrA and gyrB
- Region of Difference 4 (RD4)

Page 4: Master Thesis Presentation

BACKGROUND

Phylogeny of MTC (Rodriguez-Campos et al., 2014)

Differences between M. caprae and M. bovis

Page 5: Master Thesis Presentation

PROBLEMS

Increased prevalence of tuberculosis caused by M. caprae in the Alpine region

Several species are affected, including cattle, red deer, foxes and humans

Lack of knowledge about transmission of the infection between domestic animals and wildlife

Little knowledge about genomic features of M. caprae that could serve as genotyping markers

No complete genome sequence of M. caprae is available, which complicates differentiating M. caprae from M. bovis

Page 6: Master Thesis Presentation

OBJECTIVES

Analysis of the spatial spread of the disease by Whole Genome Sequencing analysis
1. Identification of variants
2. Clustering of isolates
3. Analysis of the infection chain
4. Analysis of the RD4 region

Development of a Web Integrated Environment
1. Automation of the variant calling pipeline
2. A tailored database to store the identified variants
3. Real-time generation of results such as phylogenetic tree, Minimum Spanning Tree, SNP plot, coverage plot, etc.

Page 7: Master Thesis Presentation

Distribution of sequenced M. caprae isolates

By host

Host          Number of isolates
Red deer      95
Cattle        180
Human         3
Sheep         1
Total         279

By location

Germany
Region        Number of isolates
GAP           9
Landshut      3
OA            122
OAL           21
Oldenburg     2
RO            1
TÖÄ           2
UA            7
UAL           4
WM            2
Unknown       15

Austria
Region        Number of isolates
BH Schwaz     7
Bludenz       18
Bregenz       2
Eranet        12
Kaisers       12
Reutte        15
Vorarlberg    1
Zillertal     23

Total         279

Page 8: Master Thesis Presentation

WHOLE GENOME SEQUENCING (WGS)

Sequencer: Illumina HiSeq 1500
Technique: Paired-end sequencing
Coverage: 100x
Read length: 100-150 base pairs

Principle of WGS

M. caprae genome -> Library preparation -> Sequenced reads -> Downstream analysis
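The coverage figure on this slide links read count, read length and genome size through coverage = reads x read length / genome size. A minimal sketch of that arithmetic follows. The genome size of 4.4 Mb is an assumption for illustration (roughly the size of MTC genomes); the slides do not state it.

```python
import math

# Assumption: approximate genome size, not taken from the slides.
GENOME_SIZE_BP = 4_400_000


def expected_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth if reads were spread evenly over the genome."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(coverage, genome_size_bp, read_length_bp):
    """Smallest number of reads that reaches the requested average depth."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    for read_length in (100, 150):
        n = reads_needed(100, GENOME_SIZE_BP, read_length)
        print(f"{read_length} bp reads: {n} reads for 100x")
```

For paired-end data each pair contributes two reads, so the number of sequenced fragments would be about half the read count.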

Page 9: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

Variant Calling Workflow

1. Paired-end FASTQ files
2. Quality filter
3. Alignment: BWA-MEM
4. Removing duplicates: Picard MarkDuplicates
5. Variant calling:
   • Samtools - Pileup
   • VarScan - mpileup2snp
6. SNPs (VCF)
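The workflow above could be chained roughly as in the following sketch, which only builds the command lines and does not run them. File names, jar paths and the `REMOVE_DUPLICATES=true` choice are assumptions for illustration; the quality-filter step is left out because the slides do not name the tool used, and the reference is assumed to be indexed already with `bwa index`.

```python
def variant_calling_commands(sample, ref="reference.fasta",
                             picard_jar="picard.jar",
                             varscan_jar="VarScan.jar"):
    """Return shell commands for one sample (hypothetical file layout)."""
    r1 = f"{sample}_R1.fastq"
    r2 = f"{sample}_R2.fastq"
    return [
        # Align paired-end reads to the reference
        f"bwa mem {ref} {r1} {r2} > {sample}.sam",
        # Coordinate-sort the alignment (required before duplicate marking)
        f"samtools sort -o {sample}.sorted.bam {sample}.sam",
        # Mark and remove duplicate read pairs
        f"java -jar {picard_jar} MarkDuplicates I={sample}.sorted.bam "
        f"O={sample}.dedup.bam M={sample}.metrics.txt REMOVE_DUPLICATES=true",
        # Build the pileup that VarScan reads
        f"samtools mpileup -f {ref} {sample}.dedup.bam > {sample}.mpileup",
        # Call SNPs and write them as VCF
        f"java -jar {varscan_jar} mpileup2snp {sample}.mpileup "
        f"--output-vcf 1 > {sample}.vcf",
    ]


if __name__ == "__main__":
    for command in variant_calling_commands("isolate01"):
        print(command)
```

Returning strings rather than executing them keeps the sketch safe to run anywhere; in practice each command could be passed to a workflow manager or to `subprocess.run`.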

Page 10: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

Paired-end FASTQ files -> Quality filter -> Alignment -> Removing duplicates -> Variant calling -> SNPs

Each FASTQ record has four lines: identifier, sequence, comment (+), quality.

@HISEQ:55:H80HWADXX:2:1101:1341:2054 1:N:0:
GCTCAATTCGAGCCGATGCACCAGGTGTTCCTAGGTGTGCGGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJHIJJJJJIJJHIIJJJJJ

@HISEQ:55:H80HWADXX:2:1101:1259:2056 1:N:0:
AGCCTGCTGGTTGCTGGGTCATTGCGCCATGCCTTCGAGAACA
+
CBCFFFFFHHHHHIJIIJIHIJJIJIJJJJJJJJJJJJIJJJJ

@HISEQ:55:H80HWADXX:2:1101:1341:2054 3:N:0:
GGGGGATCGACCGCTCCCGGAATTCGGTGGAAGCTGCTGCGGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJHHD

@HISEQ:55:H80HWADXX:2:1101:1259:2056 3:N:0:
GGCGAGGGCCGCGTCATTGCGGCGTAGCGTGGACGCGATGTTG
+
CCCFFFFFHHHHHIIJJJJJJJJJIGHFFFDDDDDDDDDDDDD
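The four-line record layout shown on this slide can be read with a few lines of code. The following is a minimal sketch, not the parser used in the thesis; it assumes unwrapped records (exactly four lines each) and Phred+33 quality encoding, which is the usual encoding for recent Illumina output.

```python
import io


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip()
        plus = handle.readline().rstrip()
        quality = handle.readline().rstrip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], sequence, quality


def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (assumes Phred+33)."""
    return [ord(char) - offset for char in quality]


example = io.StringIO(
    "@HISEQ:55:H80HWADXX:2:1101:1341:2054 1:N:0:\n"
    "GCTCAATTCGAGCCGATGCACCAGGTGTTCCTAGGTGTGCGGT\n"
    "+\n"
    "CCCFFFFFHHHHHJJJJJJJJJJJJHIJJJJJIJJHIIJJJJJ\n"
)
records = list(read_fastq(example))

if __name__ == "__main__":
    name, seq, qual = records[0]
    print(name, len(seq), min(phred_scores(qual)))
```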

Page 11: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS


Page 12: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

Figure (alignment step): mapped reads plotted against reference genome position, with the resulting coverage
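Coverage at a reference position is the number of mapped reads that overlap it. A minimal sketch of that computation follows, assuming reads are given as hypothetical 0-based, half-open `(start, end)` intervals on a single reference sequence; real pipelines would read these from a BAM file.

```python
def coverage_per_position(read_intervals, genome_length):
    """Return the read depth at every reference position.

    Uses a difference array: +1 where a read starts, -1 where it ends,
    then a running sum gives the depth.
    """
    diff = [0] * (genome_length + 1)
    for start, end in read_intervals:
        diff[start] += 1
        diff[end] -= 1
    depth = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


if __name__ == "__main__":
    reads = [(0, 5), (2, 7), (4, 10)]
    print(coverage_per_position(reads, 10))
```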

Page 13: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

Figure (duplicate removal step): duplicate reads
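The idea behind duplicate removal can be sketched as follows: read pairs that map to the same positions and strand are treated as copies of one fragment, and only the best-quality pair is kept. This is a simplified illustration, not Picard's actual implementation; the `ReadPair` fields are hypothetical.

```python
from collections import namedtuple

# Hypothetical, simplified record for one aligned read pair.
ReadPair = namedtuple("ReadPair", "name start mate_start strand base_quality_sum")


def find_duplicates(pairs):
    """Return names of pairs that duplicate a better-scoring pair."""
    best = {}
    for pair in pairs:
        key = (pair.start, pair.mate_start, pair.strand)
        # Keep the pair with the highest summed base quality per position key
        if key not in best or pair.base_quality_sum > best[key].base_quality_sum:
            best[key] = pair
    kept = {pair.name for pair in best.values()}
    return [pair.name for pair in pairs if pair.name not in kept]


if __name__ == "__main__":
    pairs = [
        ReadPair("a", 100, 300, "+", 900),
        ReadPair("b", 100, 300, "+", 950),
        ReadPair("c", 120, 310, "+", 800),
    ]
    print(find_duplicates(pairs))
```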

Page 14: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS

Figure (variant calling step): SNP highlighted against the reference base
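A SNP is reported where the bases observed in the mapped reads differ from the reference base often enough. The sketch below shows that decision for a single position. The thresholds `min_coverage` and `min_var_freq` are illustrative assumptions, not the settings used in the thesis or the defaults of any tool.

```python
from collections import Counter


def call_snp(ref_base, observed_bases, min_coverage=8, min_var_freq=0.75):
    """Return (alt_base, frequency) if the position looks like a SNP, else None."""
    ref_base = ref_base.upper()
    bases = [b.upper() for b in observed_bases if b.upper() in ("A", "C", "G", "T")]
    depth = len(bases)
    if depth < min_coverage:
        return None  # too few reads to decide
    alt_counts = Counter(b for b in bases if b != ref_base)
    if not alt_counts:
        return None  # every read agrees with the reference
    alt_base, alt_count = alt_counts.most_common(1)[0]
    frequency = alt_count / depth
    if frequency < min_var_freq:
        return None
    return alt_base, frequency


if __name__ == "__main__":
    print(call_snp("A", "GGGGGGGGGA"))
```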

Page 15: Master Thesis Presentation

METHODS - IDENTIFICATION OF VARIANTS


Page 16: Master Thesis Presentation

METHODS – CLUSTERING OF ISOLATES

- A phylogenetic tree can be built from a custom-generated FASTA sequence containing only the identified SNPs

- The FastTree tool generates approximately-maximum-likelihood phylogenetic trees

Figures: phylogenetic tree; magnified view of the constructed tree
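One way to produce a SNP-only FASTA is to concatenate, for every isolate, its base at each position that varies in at least one isolate. The sketch below assumes two hypothetical inputs: `ref_bases` (position -> reference base) and `calls` (isolate -> {position: alternative base}). It also assumes that a position without a call carries the reference base, which ignores low-coverage sites.

```python
def snp_fasta(ref_bases, calls):
    """Build a FASTA string with one SNP-only sequence per isolate."""
    # Every position that is variable in at least one isolate, in genome order
    sites = sorted({pos for snps in calls.values() for pos in snps})
    lines = []
    for isolate in sorted(calls):
        snps = calls[isolate]
        sequence = "".join(snps.get(pos, ref_bases[pos]) for pos in sites)
        lines.append(f">{isolate}")
        lines.append(sequence)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ref_bases = {10: "A", 25: "C", 40: "G"}
    calls = {"iso1": {10: "T"}, "iso2": {25: "G", 40: "A"}}
    print(snp_fasta(ref_bases, calls), end="")
```

The resulting file could then be passed to FastTree as a nucleotide alignment, for example `FastTree -nt snps.fasta > tree.nwk` (file names are placeholders).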

Page 17: Master Thesis Presentation

METHODS – ANALYSIS OF INFECTION CHAIN

Minimum Spanning Tree

Algorithms available:
- Kruskal's algorithm
- Prim's algorithm
- Reverse-delete algorithm

Applications:
- Taxonomy
- Cluster analysis
- Constructing trees for broadcasting in computer networks
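As an illustration of how a minimum spanning tree might be built over isolates, the sketch below applies Kruskal's algorithm to pairwise SNP distances. The slides do not say which algorithm or distance measure the thesis used; the input format (isolate name -> equal-length SNP sequence) and the use of the number of differing sites as edge weight are assumptions.

```python
def snp_distance(seq_a, seq_b):
    """Number of sites at which two equal-length SNP sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


def kruskal_mst(sequences):
    """Return MST edges (isolate_u, isolate_v, distance) via Kruskal's algorithm."""
    names = sorted(sequences)
    edges = sorted(
        (snp_distance(sequences[u], sequences[v]), u, v)
        for i, u in enumerate(names)
        for v in names[i + 1:]
    )
    parent = {name: name for name in names}

    def find(node):
        # Union-find lookup with path halving
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for distance, u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, distance))
    return tree


if __name__ == "__main__":
    isolates = {"A": "AAAA", "B": "AAAT", "C": "AATT", "D": "TTTT"}
    for edge in kruskal_mst(isolates):
        print(edge)
```

In an outbreak setting, short edges in such a tree connect isolates with few SNP differences, which is why it is a common way to sketch a possible infection chain.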

Page 18: Master Thesis Presentation

METHODS – ANALYSIS OF RD4 REGION

Figure: RD4 region across isolates, with tracks labelled by region code (e.g. OA, RO, GAP)

Page 19: Master Thesis Presentation

WEB INTEGRATED ENVIRONMENT

Mycobacterium caprae Database and Analysis (MCDBA)

Components:
- Illumina HiSeq Sequencer
- Raw Reads
- Demultiplexing
- Fastq Files
- Variant Calling Pipeline
- Galaxy
- Database
- Web Server

Views:
- Compare Strains
- View SNPs
- SNP plot
- Phylogenetic Tree
- Coverage Plot
- Minimum Spanning Tree

Page 20: Master Thesis Presentation

QUESTIONS?

Page 21: Master Thesis Presentation

THANK YOU