
From Metagenomic Sample to Useful Visual, Anna Shcherbina, 01/10/2013

Anna Shcherbina

Bioinformatics Challenge Day

02/02/2013

From Metagenomic Sample to Useful Visual

This work is sponsored by the Defense Threat Reduction Agency under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.

Distribution Statement A: Approved for public release; distribution is unlimited.


The Opportunity

• Next-generation sequencing (NGS) instruments have recently given us the ability to characterize the microbiomes that we live in and that live in us.

• We can get a step closer to that characterization by creating a visualization program that facilitates manual data curation by a human.


Your Mission

Invent novel visualization approaches to represent metagenomic data.

Subgoals:
• Pick out anomalies within a given dataset.
• Generate time series representation of multiple datasets.
• Compress data efficiently to allow visualization of huge datasets.


The Data (I)

Metagenomic datasets (FASTQ format) from clinical and environmental samples.

• Metagenome of the human oral cavity under healthy and diseased conditions, with a focus on supragingival dental plaque and cavities.
  – “oral_healthy” and “oral_diseased” datasets
  – Roche 454

• Nose/throat swab from Nicaraguan child with acute respiratory illness
  – “nicaragua” dataset
  – Illumina


The Data (II)

• Skin surface from the palm of a human hand
  – “palm” dataset
  – Roche 454

• Human abscess sample of unknown etiology
  – “abscess” dataset
  – Illumina

• Cultivated corn soil metagenome
  – “soil” dataset
  – Illumina
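The datasets are distributed in FASTQ format. As a minimal sketch, assuming the common four-line record layout (a header starting with `@`, the sequence, a `+` separator, and a quality string) with no line wrapping, reads could be iterated as follows. The function name `read_fastq` and the two sample reads are hypothetical and are not taken from the challenge data.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (a blank line also ends iteration)
        sequence = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored in this sketch
        quality = handle.readline().strip()
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:].strip(), sequence, quality


# Two invented reads, used only to exercise the parser.
SAMPLE = "@read_1\nACGT\n+\nIIII\n@read_2\nACGTAC\n+\nIIIIII\n"
records = list(read_fastq(io.StringIO(SAMPLE)))
mean_length = sum(len(seq) for _, seq, _ in records) / len(records)
print(len(records), mean_length)
```

A read-length summary like this is often a useful first check before any visualization, since Roche 454 and Illumina reads typically differ considerably in length.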


Our Processing Pipeline

Raw FASTA reads

BLAST against virus, bacteria, and archaea databases (from GenBank)

Data Processing
• Parsed CSV summary of BLAST hits
• BLAST hits sorted by species, FASTA format

Other BLAST parsers

Data is available from each stage of the processing pipeline


Parsed BLAST File Example for a Single Hit

Query Name: S62.141238_159200
Query Strand: +
Query Start: 1
Query End: 232
Query Organism: Neisseria meningitidis
Query Taxonomy: Bacteria; Proteobacteria; Betaproteobacteria;
Identities: 232
Percent: 100
Number Gaps: 0
Number Characters: 0
Target Name: GU561418
Target Strand: -
Target Start: 47
Target End: 278
Target Organism: Neisseria subflava
Target Taxonomy: Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria.
Query Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
Target Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
Analysis Program: BLASTN
Database: bacteria.gdna
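A first step toward a visual is usually a per-organism tally of hits. The sketch below assumes the parsed CSV summary has a header row whose column names match the field labels in the single-hit example (for instance `Target Organism`); the actual file layout is not shown in the slides, so treat the column names, the function name `count_hits_by_organism`, and the three sample rows as assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def count_hits_by_organism(csv_text, column="Target Organism"):
    """Count BLAST hits per organism from a CSV summary with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Rows with a missing or empty organism field are skipped.
    return Counter(row[column].strip() for row in reader if row.get(column))


# Invented rows; only the column labels follow the single-hit example.
SAMPLE = (
    "Query Name,Target Name,Target Organism,Percent\n"
    "read_1,target_a,Neisseria subflava,100\n"
    "read_2,target_a,Neisseria subflava,98\n"
    "read_3,target_b,Neisseria meningitidis,97\n"
)
counts = count_hits_by_organism(SAMPLE)
for organism, n in counts.most_common():
    print(organism, n)
```

The resulting counter can feed directly into a bar chart, a heat map row, or a hierarchical view once lineages are attached.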


Your Open-Source Toolkit

• MEGAN4

• IMG/M

• KRONA (included with PhymmBL)

• MG-RAST

• METAREP

• Mothur

• Feel free to use any additional tools you think are useful.


MEGAN4 – MEtaGenome ANalyzer

• A simple lowest common ancestor algorithm assigns reads to taxa.

• Taxonomic level reflects the degree of conservation of a sequence.

• Dissects large datasets without assembly or the targeting of specific phylogenetic markers.

• Graphical and statistical output for comparing different datasets.
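The lowest common ancestor idea can be illustrated with semicolon-delimited lineage strings like those in the parsed BLAST output: a read is assigned to the deepest taxon shared by all of its hits, so a read matching many distantly related organisms lands high in the tree. This is a simplified sketch and not MEGAN4's implementation (which, among other things, filters hits by score before assignment); the function name and the second lineage are hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of ranks across taxonomy lineages."""
    split = [
        [rank.strip() for rank in lineage.strip(". ").split(";") if rank.strip()]
        for lineage in lineages
    ]
    if not split:
        return []
    common = []
    # zip stops at the shortest lineage, so the prefix never overruns any hit.
    for ranks in zip(*split):
        if all(rank == ranks[0] for rank in ranks):
            common.append(ranks[0])
        else:
            break
    return common


hits = [
    # Lineage format taken from the single-hit example.
    "Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria.",
    # Invented second hit for illustration.
    "Bacteria; Proteobacteria; Gammaproteobacteria",
]
assignment = lowest_common_ancestor(hits)
print(assignment)
```

With these two hits the read would be placed at the phylum level, because the lineages diverge at the class rank.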


MEGAN4 – MEtaGenome ANalyzer

[Figure panels: Oral Diseased Bacteria; Oral Healthy Bacteria; Oral Diseased Virus; Oral Healthy Virus]


MEGAN4 – MEtaGenome ANalyzer

[Figure panels: Oral healthy vs. oral diseased, Bacteria; Oral healthy vs. oral diseased, Virus]


IMG/M – Integrated Microbial Genomes with Microbiome Samples

• Web interface: http://img.jgi.doe.gov/cgi-bin/m/main.cgi

source: http://img.jgi.doe.gov/m/doc/about_index.html


IMG/M Phylogenetic Distribution of Genes Based on Distribution of BLAST Hits

source: http://img.jgi.doe.gov/m/doc/about_index.html


IMG/M Abundance Profile Overview

source: http://img.jgi.doe.gov/m/doc/about_index.html


KRONA

• KRONA allows hierarchical data to be explored with zoomable pie charts.
  – Excel template or KRONA tools.
  – Support for several bioinformatics tools and raw data formats.

source: http://sourceforge.net/p/krona/home/krona/
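One way to get hierarchical counts into KRONA is a plain tab-separated text file. As far as the KronaTools documentation describes it, the `ktImportText` importer reads one line per entry: a quantity followed by the hierarchy levels from highest to lowest, separated by tabs. The sketch below writes that layout from per-lineage counts; the function name `to_krona_text` and the counts are hypothetical, and the exact importer behavior should be checked against the installed KronaTools version.

```python
def to_krona_text(lineage_counts):
    """Format {lineage: count} as 'count<TAB>level1<TAB>level2...' lines."""
    lines = []
    for lineage, count in sorted(lineage_counts.items()):
        # Split a semicolon-delimited lineage and drop a trailing period.
        ranks = [r.strip() for r in lineage.strip(". ").split(";") if r.strip()]
        lines.append("\t".join([str(count)] + ranks))
    return "\n".join(lines) + "\n"


# Invented counts; the lineage format follows the parsed BLAST example.
counts = {
    "Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria.": 12,
    "Bacteria; Proteobacteria; Betaproteobacteria;": 3,
}
krona_text = to_krona_text(counts)
print(krona_text, end="")
```

Writing `krona_text` to a file and passing that file to the text importer would then be expected to produce an interactive chart, under the assumptions stated above.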


MG-RAST

Oral Diseased

source: http://blog.metagenomics.anl.gov/


MG-RAST

Oral Healthy

source: http://blog.metagenomics.anl.gov/


MG-RAST

[Figure panels: Oral Diseased; Oral Healthy]

source: http://blog.metagenomics.anl.gov/


JCVI Metagenomics Reports (METAREP)

• A Web 2.0 application to analyze and compare annotated metagenomic datasets.

• Compare absolute and relative counts of multiple datasets at various functional and taxonomic levels.

• Statistical tests, multidimensional scaling, heatmap and hierarchical clustering plots.

source: http://blogs.jcvi.org/tag/metarep/

Heatmap Plot

Hierarchical Clustering Plot

METASTAT Results
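Comparing absolute and relative counts matters because samples are rarely sequenced to the same depth, so raw hit counts alone can mislead. The sketch below is a generic illustration of that comparison and not METAREP code; the function names, the dataset names `sample_a` and `sample_b`, and all counts are invented.

```python
def relative_abundance(counts):
    """Convert absolute counts to fractions of the dataset total."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def compare(datasets):
    """Return {taxon: {dataset: (absolute, relative)}} over the union of taxa."""
    relative = {name: relative_abundance(c) for name, c in datasets.items()}
    taxa = sorted(set().union(*datasets.values()))
    return {
        taxon: {
            name: (datasets[name].get(taxon, 0), relative[name].get(taxon, 0.0))
            for name in datasets
        }
        for taxon in taxa
    }


# Invented counts for two hypothetical datasets of different depth.
datasets = {
    "sample_a": {"Neisseria": 30, "Streptococcus": 70},
    "sample_b": {"Streptococcus": 50},
}
table = compare(datasets)
for taxon, row in table.items():
    print(taxon, row)
```

A table in this shape maps naturally onto a heat map, with taxa as rows, datasets as columns, and relative abundance as the color scale.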


Mothur: 16S rRNA Sequence Analysis

• A single platform for sequence alignment, pairwise distance calculation, distance matrix analysis.

• Venn diagrams, community trees, heat maps, sample-based rarefaction curves.
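To make the pairwise distance step concrete, the sketch below computes an uncorrected distance (the fraction of mismatched positions) between aligned sequences and collects the results into a distance matrix. This is a generic illustration rather than Mothur's own calculator; skipping positions where either sequence has a gap is an assumption of this sketch, and the function names and sequences are invented.

```python
def p_distance(a, b):
    """Fraction of mismatches between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: gap positions are ignored entirely
        compared += 1
        if x != y:
            mismatches += 1
    return mismatches / compared if compared else 0.0


def distance_matrix(sequences):
    """Return {(name_1, name_2): distance} for every unordered pair."""
    names = list(sequences)
    return {
        (m, n): p_distance(sequences[m], sequences[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


# Invented aligned sequences for illustration.
aligned = {"seq_1": "ACGT", "seq_2": "ACGA", "seq_3": "AC-T"}
matrix = distance_matrix(aligned)
print(matrix)
```

A matrix like this is the usual input to clustering, community trees, and heat maps.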
