Download - [2013.11.01] visualizing omics_data
Visualizing omics data
CENTER FOR MICROBIAL COMMUNITIES
Mads Albertsen
Introduction to community systems microbiology
2013
CENTER FOR MICROBIAL COMMUNITIES | AALBORG UNIVERSITY
Agenda
• Visualizing omics data
• Re-introduction to 16S analysis
• Hands-on 16S analysis in RStudio
• There is so much to learn. How do I start?
Visualizing data?
http://mkweb.bcgsc.ca/
Martin Krzywinski
Who - when, where and why?
Re-introduction to 16S analysis
Who - when, where and why?
Image sources:
http://phil.cdc.gov/phil/details.asp?pid=2226
http://en.wikipedia.org/wiki/File:EBPR_FISH_Floc.jpg
P. Larsen 2012
Accumulibacter Competibacter Bacillus anthracis
Taking advantage of evolution
"The affinities of all the beings of the same class have sometimes been represented by a great tree... The green and budding twigs may represent existing species; and those produced during former years may represent the long succession of extinct species."
C. Darwin, 1872
http://tolweb.org
"Nothing in biology makes sense, except in the light of evolution."
T. Dobzhansky, 1973
Why do we use the 16S gene?
Ribosomes are universal
rRNA = Structural RNA
http://www.rna.icmb.utexas.edu/SAE/2B/ConsStruc/Diagrams/cons.16.b.Bacteria.pdf
Why do we use the 16S gene?
http://www.rna.icmb.utexas.edu/SAE/2B/ConsStruc/Diagrams/cons.16.b.Bacteria.pdf
[Figure: 16S rRNA diagram annotated with the 8F universal primer]
Why do we use the 16S gene?
http://www.rna.icmb.utexas.edu/SAE/2B/ConsStruc/Diagrams/cons.16.b.Bacteria.pdf
Ashelford et al. AEM. 2005;71:7724-7736
• Advantages:
  • Universal gene (horizontal gene transfer is rare)
  • Conserved regions
  • Variable regions
  • Great databases and alignments
• Problems:
  • Variable copy number
  • No universal (unbiased) primers
  • (Not directly correlated with activity)
  • (Lack of functional information)
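To illustrate the variable copy number problem listed above, here is a minimal Python sketch of how read counts could be corrected when the number of 16S copies per genome is known. The taxon names and copy numbers are hypothetical values chosen for illustration, not measurements.

```python
def copy_number_corrected(read_counts, copy_numbers):
    """Estimate relative cell abundance from 16S read counts.

    Each read count is divided by the (assumed known) number of
    16S gene copies per genome, then rescaled to sum to 1.
    """
    per_genome = {taxon: count / copy_numbers[taxon]
                  for taxon, count in read_counts.items()}
    total = sum(per_genome.values())
    return {taxon: value / total for taxon, value in per_genome.items()}


# Hypothetical example: equal read counts, different copy numbers.
reads = {"taxon_1": 100, "taxon_2": 100}
copies = {"taxon_1": 1, "taxon_2": 4}  # assumed values, for illustration only
corrected = copy_number_corrected(reads, copies)
```

With equal read counts, the taxon with fewer 16S copies ends up with the larger estimated share of cells, which is why raw read fractions can mislead.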
Typical workflow
Sampling → Extraction → Sample prep → Sequencing → Bioinformatics
There are a lot of steps!
• Standardization, standardization, standardization!
• Use biological replicates and evaluate your variation!
• Design a good experiment with realistic expectations for the outcome (most studies fail here!)
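One simple way to evaluate variation between biological replicates is the coefficient of variation (standard deviation divided by the mean) per OTU. The sketch below assumes relative abundances in percent from three replicates; the OTU names and numbers are made up for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    m = mean(values)
    return stdev(values) / m if m else float("nan")


# Hypothetical relative abundances (%) in three biological replicates.
replicates = {
    "otu_1": [30.0, 32.0, 28.0],
    "otu_2": [5.0, 1.0, 9.0],
}
cv = {otu: coefficient_of_variation(v) for otu, v in replicates.items()}
```

A high value (as for otu_2 here) suggests that differences of that size between treatments could be replicate noise rather than a real effect.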
Typical workflow
AAU activated sludge standard @ midasfieldguide.org
[Figure: DNA extraction variables. Input (mg): 1, 2, 4, 9, 22. Bead beating: intensity (m s⁻¹) 4, 6; duration (s) 20, 40, 80, 160, 400. Storage: fresh, 24 h @ 4 °C, 24 h @ 20 °C. eDNA removal: PMA, 650 W, 10 min.]
Typical workflow
AAU activated sludge standard @ midasfieldguide.org
[Figure: mean frequency of the most common residue in a 50 bp window (y-axis, 0.6 to 1.0) along the 16S gene (x-axis, 0 to 1500 bp), with the variable regions V1 to V9 and the amplicon regions V1.3, V3.4 and V4 marked]
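The conservation measure in the figure can be sketched in Python as follows. This is a minimal illustration that assumes an alignment given as equal-length strings and treats gap characters like any other residue; the function names are hypothetical and it is not the code behind the published figure.

```python
from collections import Counter


def column_conservation(alignment):
    """Frequency of the most common residue in each alignment column."""
    n_sequences = len(alignment)
    return [Counter(column).most_common(1)[0][1] / n_sequences
            for column in zip(*alignment)]


def windowed_mean(values, window=50):
    """Mean over every run of `window` consecutive values (step of 1)."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Toy alignment of four sequences, four columns each.
toy_alignment = ["AAAA", "AAAT", "AACT", "AGCT"]
conservation = column_conservation(toy_alignment)
smoothed = windowed_mean(conservation, window=2)
```

Low values in the smoothed profile correspond to variable regions, high values to conserved regions where primers can be placed.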
Typical workflow
AAU activated sludge standard @ midasfieldguide.org
Ashelford et al. AEM. 2005;71:7724-7736
Typical workflow
PCR with modified 16S primers
Forward primer (Illumina adapter | pad | linker | 27F):
5’-AATGATACGGCGACCACCGAGATCTACAC GTACGTACG GT AGAGTTTGATCCTGGCTCAG-3’
Reverse primer (Illumina adapter | barcode | pad | linker | 534R):
5’-CAAGCAGAAGACGGCATACGAGAT TCCCTTGTCTCC ACGTACGTAC CCG ATTACCGCGGCTGCTGG-3’
[Figure: PCR cycles 1, 2 and 3 around the target region]
AAU activated sludge standard @ midasfieldguide.org
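The modified reverse primer is a concatenation of fixed parts and a barcode. A minimal Python sketch of assembling it, using the sequences shown on the slide; the function name and the idea of passing a different barcode per sample are illustrative assumptions.

```python
# Fixed parts of the reverse primer, copied from the slide (5' to 3').
ILLUMINA_ADAPTER = "CAAGCAGAAGACGGCATACGAGAT"
PAD = "ACGTACGTAC"
LINKER = "CCG"
PRIMER_534R = "ATTACCGCGGCTGCTGG"


def build_reverse_primer(barcode):
    """Concatenate adapter, barcode, pad, linker and 534R (hypothetical helper)."""
    if not barcode or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be a non-empty string of A, C, G, T")
    return ILLUMINA_ADAPTER + barcode + PAD + LINKER + PRIMER_534R


# The barcode shown on the slide.
slide_primer = build_reverse_primer("TCCCTTGTCTCC")
```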
Typical workflow
Mardis, 2008 (PMID 18576944)
≈ 500 bp target amplicon
Typical workflow
After sequencing:
[Figure: Read 1 (300 bp) and Read 2 (300 bp) sequenced from opposite ends of the ≈ 500 bp target amplicon, plus the barcode read]
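Two 300 bp reads over a ≈ 500 bp amplicon overlap by about 100 bp, which is what makes merging possible. The Python sketch below shows the arithmetic and a deliberately naive exact-match merge. It is an illustration only: real merging tools tolerate mismatches and use quality scores, and the function names and `min_overlap` default are assumptions.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def expected_overlap(read1_len, read2_len, amplicon_len):
    """Number of bases covered by both reads."""
    return read1_len + read2_len - amplicon_len


def merge_pair(read1, read2, min_overlap=10):
    """Merge a read pair on the longest exact overlap, or return None."""
    read2_rc = reverse_complement(read2)
    for overlap in range(min(len(read1), len(read2_rc)), min_overlap - 1, -1):
        if read1[-overlap:] == read2_rc[:overlap]:
            return read1 + read2_rc[overlap:]
    return None


overlap_bp = expected_overlap(300, 300, 500)
```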
Typical workflow
How many sequences are needed? It depends on your question! (Although 50,000 raw sequences per sample is usually fine.)
AAU raw kit and chemical costs (DKK)
                                       Cost      Cost v2
DNA extraction                         105       70 (a)
Library preparation                    40        40
Sequencing (min 100k reads / sample)   190 (b)   70 (c)
Total                                  335       180
(a) Kits discounted
(b) 50 samples per run
(c) 150 samples per run (can run up to 300)
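The per-sample totals in the table are simple sums, which makes project budgeting easy to script. A small Python sketch using the table's numbers; it assumes costs scale linearly with the number of samples and ignores partly filled sequencing runs.

```python
# Per-sample costs in DKK, copied from the table.
COSTS_DKK = {
    "v1": {"DNA extraction": 105, "Library preparation": 40, "Sequencing": 190},
    "v2": {"DNA extraction": 70, "Library preparation": 40, "Sequencing": 70},
}


def cost_per_sample(version):
    """Sum of all cost items for one sample."""
    return sum(COSTS_DKK[version].values())


def project_cost(n_samples, version):
    """Total cost, assuming linear scaling (a simplification)."""
    return n_samples * cost_per_sample(version)
```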
Typical workflow
Merge → Cluster → Assign taxonomy (compare to database) → OTU table
OTU counts after clustering: 3, 11, 3, 1
OTU table:
OTU                  Count
Accumulibacter       3
Unknown              11
Competibacter        3
Bacillus anthracis   1
Typical workflow
Merge → Cluster → Assign taxonomy (compare to database) → OTU table
Barcode: reads are assigned to sample A or B (AAAAAAAAA BBBBBBBBB)
OTU table:
OTU                  A   B
Accumulibacter       2   1
Unknown              3   8
Competibacter        3   0
Bacillus anthracis   1   0
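The step from barcoded, classified reads to an OTU table is a counting exercise. A minimal Python sketch that reproduces the toy table on this slide; the input format (one `(sample, otu)` pair per read) and the function name are assumptions made for illustration.

```python
from collections import Counter


def build_otu_table(assignments, samples):
    """Count reads per OTU and sample.

    `assignments` holds one (sample, otu) pair per read; the result maps
    each OTU to a list of counts in the order given by `samples`.
    """
    counts = Counter(assignments)
    otus = []
    for _, otu in assignments:
        if otu not in otus:  # keep OTUs in order of first appearance
            otus.append(otu)
    return {otu: [counts[(sample, otu)] for sample in samples] for otu in otus}


# Reads matching the slide: 9 from sample A and 9 from sample B.
reads = (
    [("A", "Accumulibacter")] * 2 + [("B", "Accumulibacter")] * 1
    + [("A", "Unknown")] * 3 + [("B", "Unknown")] * 8
    + [("A", "Competibacter")] * 3
    + [("A", "Bacillus anthracis")] * 1
)
otu_table = build_otu_table(reads, samples=["A", "B"])
```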
Typical workflow
Sequence errors, chimeras and weird stuff
The chance of a perfect read as a function of read length
Chimeras
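The chance of a perfect read can be approximated with a simple model: if every base has the same independent error probability e, a read of length L is error-free with probability (1 - e)^L. This uniform, independent error rate is an assumption for illustration; real error rates vary along the read.

```python
def perfect_read_probability(error_rate, read_length):
    """Probability of zero errors, assuming independent per-base errors."""
    return (1.0 - error_rate) ** read_length


# Hypothetical 1 % per-base error rate at a few read lengths.
for length in (100, 300, 500):
    p = perfect_read_probability(0.01, length)
    print(f"{length} bp: {p:.3f}")
```

Under this assumed 1 % error rate only about 5 % of 300 bp reads would be error-free, which is why erroneous reads show up as many low-count sequences.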
Typical workflow
Merge → Cluster → Assign taxonomy (compare to database) → OTU table
OTU counts after clustering: 3, 11, 3
OTU table:
OTU              Count
Accumulibacter   3
Unknown          11
Competibacter    3
Removing unique sequences makes the subsequent steps 10-100x faster and removes the majority of errors and chimeras.
The effect depends on sequencing depth and sample complexity, so be careful!
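Removing unique sequences amounts to counting identical reads and dropping those seen only once. A minimal Python sketch, assuming reads are plain strings that have already been merged; the function name and `min_count` parameter are illustrative.

```python
from collections import Counter


def remove_unique_sequences(reads, min_count=2):
    """Keep only reads whose exact sequence occurs at least `min_count` times."""
    counts = Counter(reads)
    return [read for read in reads if counts[read] >= min_count]


# Toy data: "ACGA" occurs once and is dropped.
toy_reads = ["ACGT", "ACGT", "ACGA", "TTGC", "TTGC", "TTGC"]
kept = remove_unique_sequences(toy_reads)
```

At low sequencing depth or in very diverse samples, genuine rare organisms also appear only once, so this filter can discard real diversity.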
AAU workflow
Find sample IDs on Google Drive:
16SAMP-145
16SAMP-146
16SAMP-147
16SAMP-148
16SAMP-149
16SAMP-150
16S.V13.workflow.sh
OTU table (+ R version)
Plain text file
OTU              A   B
Accumulibacter   2   1
Unknown          3   8
Competibacter    3   0
AAU workflow
What 16S.V13.workflow.sh does:
1. Find and unpack your samples
2. Optional subsampling
3. Remove potential phiX contamination (bowtie2)
4. Merge read 1 and read 2 (FLASH)
5. Remove reads outside length criteria
6. Optional removal of unique reads and subsampling to even depth
7. Format reads for QIIME
8. Cluster reads to OTUs (UCLUST, QIIME)
9. Assign taxonomy (RDP classifier, QIIME + database: MiDAS, Greengenes or Silva)
10. Generate OTU table (QIIME)
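Steps 5 and 6 of the list above can be sketched in a few lines of Python. This is not the code in 16S.V13.workflow.sh; the function names, the length limits and the fixed random seed are assumptions for illustration.

```python
import random


def filter_by_length(reads, min_len, max_len):
    """Keep reads whose length lies within [min_len, max_len]."""
    return [read for read in reads if min_len <= len(read) <= max_len]


def subsample_even_depth(samples, depth=None, seed=0):
    """Randomly subsample every sample to the same number of reads.

    `samples` maps sample name to a list of reads. If `depth` is None,
    the size of the smallest sample is used. Samples with fewer reads
    than `depth` are dropped.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    if depth is None:
        depth = min(len(reads) for reads in samples.values())
    return {name: rng.sample(reads, depth)
            for name, reads in samples.items() if len(reads) >= depth}
```

Subsampling to even depth keeps samples comparable, at the price of discarding reads from the deeper samples.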
Where do I start?
• Get online (Twitter, blogs, seqanswers.com)
• Learn basic multivariate statistics
• Learn R (with RStudio)
• Analyzing Ecological Data (2007) by Zuur, Ieno & Smith