Metagenomics 2015 Module 3 lecture - bioinformatics · 2018-11-21 · Module 3 bioinformatics.ca...
Canadian Bioinformatics Workshops
www.bioinformatics.ca
Module 3
Metagenomic Taxonomic Composition
Morgan Langille
Module 3 bioinformatics.ca
Learning Objectives of Module
• Understand the pros and cons of 16S versus metagenomic sequencing
• Understand different approaches for determining the taxonomic composition of a metagenomics sample
• Be able to run MetaPhlAn2 on one or more samples
• Be able to determine statistically significant differences in taxonomic abundance across sample groups using STAMP
16S vs Metagenomics
• 16S is targeted sequencing of a single gene which acts as a marker for identification
• Pros
– Well established
– Sequencing costs are relatively cheap (~10,000 reads/sample)
– Only amplifies what you want (no host contamination)
• Cons
– Primer choice can bias results towards certain organisms
– Usually not enough resolution to identify to the strain level
– Usually need different primers for archaea and eukaryotes (18S)
– Doesn’t identify viruses
16S vs Metagenomics
• Metagenomics: sequencing ALL the DNA in a sample
• Pros
– Less bias from sequencing
– Can identify all microbes (eukaryotes, viruses, etc.)
– Provides functional information (“What are they doing?”)
• Cons
– Host/site contamination can be significant
– Expensive (more sequencing depth is required)
– May not be able to sequence “rare” microbes
– Complex bioinformatics
Metagenomics: Who is there?
• Goal: Identify the relative abundance of different microbes in a given sample using metagenomics
• Problems:
– Reads are all mixed together
– Reads can be short (~100 bp)
– Lateral gene transfer
• Two broad approaches
1. Binning based
2. Marker based
Binning Based
• Attempts to “bin” reads into the genome from which they originated
• Composition-based
– Uses GC composition or k-mers (e.g. Naïve Bayes Classifier)
– Generally not very precise and not recommended
• Sequence-based
– Compare reads to a large reference database using BLAST (or some other similarity search method)
– Reads are assigned based on a “best-hit” or “Lowest Common Ancestor” approach
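The composition-based idea can be illustrated with a short Python sketch. It only computes the kind of features such a classifier would consume (GC fraction and overlapping k-mer counts); no classifier is trained, and the function names and the toy read are hypothetical.

```python
from collections import Counter


def gc_fraction(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def kmer_counts(seq, k=4):
    """Count every overlapping k-mer in seq (a simple composition profile)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical 12 bp read; real reads are closer to ~100 bp.
read = "ATGCGCGCATTA"
gc = gc_fraction(read)
profile = kmer_counts(read, k=2)
```

With reads this short, the profile is built from very few k-mers, which is one intuition for why composition alone tends to be imprecise.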
LCA: Lowest Common Ancestor
• Use all BLAST hits above a threshold and assign taxonomy at the lowest level in the tree which covers these taxa.
• Notable Examples:
– MEGAN: http://ab.inf.uni-tuebingen.de/software/megan5/
  • One of the first metagenomic tools
  • Does functional profiling too!
– MG-RAST: https://metagenomics.anl.gov/
  • Web-based pipeline (might need to wait awhile for results)
– Kraken: https://ccb.jhu.edu/software/kraken/
  • Fastest binning approach to date and very accurate.
  • Large computing requirements (e.g. >128GB RAM)
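The LCA rule can be sketched in a few lines of Python. This is a minimal illustration, not how MEGAN, MG-RAST, or Kraken implement it: the bit-score cutoff, the function names, and the example hits are all assumptions, and each hit is represented as a root-to-leaf list of taxon names.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of several root-to-leaf lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        # Stop at the first rank where the hits disagree.
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


def assign_read(hits, min_bitscore=50.0):
    """Assign one read from (lineage, bitscore) hits kept above a threshold."""
    kept = [lineage for lineage, score in hits if score >= min_bitscore]
    return lowest_common_ancestor(kept)


# Hypothetical hits for a single read (the cutoff of 50 is an assumed value).
enterics = ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
            "Enterobacterales", "Enterobacteriaceae"]
hits = [
    (enterics + ["Escherichia"], 180.0),
    (enterics + ["Shigella"], 175.0),
    (["Bacteria", "Firmicutes", "Bacilli"], 30.0),  # below threshold, ignored
]
assignment = assign_read(hits)
```

Here the two strong hits disagree at the genus level, so the read is assigned to the family they share. Lowering the threshold so the weak Firmicutes hit is kept pushes the assignment all the way up to Bacteria.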
Marker Based
• Single Gene
  • Identify and extract reads hitting a single marker gene (e.g. 16S, cpn60, or other “universal” genes)
  • Use existing bioinformatics pipeline (e.g. QIIME, etc.)
• Multiple Gene
  • Several universal genes
    – PhyloSift (Darling et al, 2014)
      » Uses 37 universal single-copy genes
  • Clade-specific markers
    – MetaPhlAn (Segata et al, 2012)
Marker or Binning?
• Binning approaches
– May be too computationally intensive
– May not adequately reflect organism abundances due to genome size
• Marker approaches
– Don’t allow functions to be linked directly to organisms
– Genome reconstruction is not possible
– Very sensitive to choice of markers
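The genome-size point for binning approaches can be made concrete with a small worked example in Python. It assumes, as a simplification, that the number of reads an organism contributes is proportional to its cell count times its genome size; the organism names, cell counts, and genome sizes are made up.

```python
def read_shares(cells, genome_sizes):
    """Expected fraction of reads per organism: cells x genome size, normalised."""
    dna = {org: cells[org] * genome_sizes[org] for org in cells}
    total = sum(dna.values())
    return {org: amount / total for org, amount in dna.items()}


def size_corrected_shares(shares, genome_sizes):
    """Divide each read share by genome size, then renormalise to sum to 1."""
    per_bp = {org: shares[org] / genome_sizes[org] for org in shares}
    total = sum(per_bp.values())
    return {org: value / total for org, value in per_bp.items()}


# Two hypothetical organisms present in equal cell numbers.
cells = {"org_small": 1000, "org_large": 1000}
genome_sizes = {"org_small": 2_000_000, "org_large": 6_000_000}  # bp

raw = read_shares(cells, genome_sizes)
corrected = size_corrected_shares(raw, genome_sizes)
```

Although both organisms are equally abundant, the one with the 6 Mb genome receives three quarters of the reads. Counting binned reads therefore overstates it unless genome size is taken into account.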
Why MetaPhlAn?
• Fast (marker database is considerably smaller)
• Markers for bacteria, archaea, eukaryotes, and viruses (since MetaPhlAn2 was released)
• Being continuously updated and supported
• Used by the Human Microbiome Project
• Generally accepted as a robust method for taxonomy assignment
• Main Disadvantage: not all reads are assigned a taxonomic label
MetaPhlAn
• Uses “clade-specific” gene markers
• A clade represents a set of genomes that can be as broad as a phylum or as specific as a species
• Uses ~1 million markers derived from 17,000 genomes
– ~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic
• Can identify down to the species level (and possibly even strain level)
• Can handle millions of reads on a standard computer within a few minutes
MetaPhlAn
• Open-source:
– https://bitbucket.org/biobakery/metaphlan2
MetaPhlAn Marker Selection
Using MetaPhlAn
• MetaPhlAn uses Bowtie2 for sequence similarity searching (nucleotide sequences vs. nucleotide database)
• Paired-end data can be used directly
• Each sample is processed individually and then multiple samples can be combined together at the last step
• Output is relative abundances at different taxonomic levels
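The "process each sample, then combine" step can be sketched in Python as a merge of per-sample tables. This is not MetaPhlAn's own merge utility, only an illustration of the idea; the function name, sample names, abundance values, and clade labels are hypothetical.

```python
def merge_profiles(profiles):
    """Combine per-sample {clade: relative abundance} dicts into one table.

    Returns (sample_names, rows), where rows maps each clade to a list of
    abundances, one per sample, using 0.0 when a clade is absent from a sample.
    """
    sample_names = sorted(profiles)
    clades = sorted({clade for prof in profiles.values() for clade in prof})
    rows = {
        clade: [profiles[name].get(clade, 0.0) for name in sample_names]
        for clade in clades
    }
    return sample_names, rows


# Hypothetical per-sample outputs (percent relative abundance).
profiles = {
    "sampleA": {"k__Bacteria|p__Firmicutes": 60.0,
                "k__Bacteria|p__Bacteroidetes": 40.0},
    "sampleB": {"k__Bacteria|p__Firmicutes": 25.0,
                "k__Bacteria|p__Proteobacteria": 75.0},
}
samples, table = merge_profiles(profiles)
```

The result is a clade-by-sample matrix in which clades missing from a sample are filled with zero, which is the shape that downstream statistics tools generally expect.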
Absolute vs. Relative Abundance
• Absolute abundance: Numbers represent the real abundance of the thing being measured (e.g. the actual quantity of a particular gene or organism)
• Relative abundance: Numbers represent the proportion of the thing being measured within a sample
• In almost all cases microbiome studies are measuring relative abundance
– This is due to DNA amplification during sequencing library preparation not being quantitative
Relative Abundance Use Case
• Sample A:
– Has 10^8 bacterial cells (but we don’t know this from sequencing)
– 25% of the microbiome from this sample is classified as Shigella
• Sample B:
– Has 10^6 bacterial cells (but we don’t know this from sequencing)
– 50% of the microbiome from this sample is classified as Shigella
• “Sample B contains twice as much Shigella as Sample A”
– WRONG! (If we quantified it we would find Sample A has more Shigella)
• “Sample B contains a greater proportion of Shigella compared to Sample A”
– Correct!
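The arithmetic behind this use case can be checked directly. The Python sketch below takes the cell counts to be 10^8 for Sample A and 10^6 for Sample B, and treats the classified fraction as a fraction of cells, which is a simplifying assumption; the function name is hypothetical.

```python
def absolute_count(total_cells, relative_abundance):
    """Absolute number of cells of a taxon, given the total and its proportion."""
    return total_cells * relative_abundance


# Sample A: 10^8 cells, 25% Shigella. Sample B: 10^6 cells, 50% Shigella.
shigella_a = absolute_count(1e8, 0.25)
shigella_b = absolute_count(1e6, 0.50)

# How many times more Shigella Sample A holds in absolute terms.
fold_difference = shigella_a / shigella_b
```

Sample A works out to 2.5 x 10^7 Shigella cells and Sample B to 5 x 10^5, so Sample A holds 50 times more Shigella even though its proportion is half that of Sample B.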
Visualization and Statistics
• Various tools are available to determine statistically significant taxonomic differences across groups of samples
– Excel
– SigmaPlot
– R
– MeV (MultiExperiment Viewer)
– Python (matplotlib)
– LEfSe & GraPhlAn (Huttenhower Group)
– STAMP
STAMP
STAMP Plots
STAMP
• Input
1. “Profile file”: Table of features (samples by OTUs, samples by functions, etc.)
  • Features can form a hierarchy (e.g. Phylum, Class, Order, etc.) to allow data to be collapsed within the program
2. “Group file”: Contains different metadata for grouping samples
  • Can be two groups (e.g. Healthy vs Sick) or multiple groups (e.g. water depth at 2 m, 4 m, and 6 m)
• Output
– PCA, heatmap, box, and bar plots
– Tables of significantly different features
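Collapsing a hierarchical profile to a higher level can be sketched in Python as follows. This is not STAMP's implementation, just an illustration of the operation; the in-memory layout (lineage tuple mapped to per-sample values), the function name, and the numbers are assumptions.

```python
RANKS = ["Phylum", "Class", "Order"]


def collapse(profile, level, ranks=RANKS):
    """Sum per-sample values of all features that share a lineage down to level."""
    depth = ranks.index(level) + 1
    collapsed = {}
    for lineage, values in profile.items():
        key = lineage[:depth]
        if key in collapsed:
            # Add this feature's values to the running total, sample by sample.
            collapsed[key] = [a + b for a, b in zip(collapsed[key], values)]
        else:
            collapsed[key] = list(values)
    return collapsed


# Hypothetical profile: each row is a lineage with values for two samples.
profile = {
    ("Firmicutes", "Bacilli", "Lactobacillales"): [10.0, 5.0],
    ("Firmicutes", "Clostridia", "Clostridiales"): [30.0, 20.0],
    ("Proteobacteria", "Gammaproteobacteria", "Enterobacterales"): [60.0, 75.0],
}
by_phylum = collapse(profile, "Phylum")
```

The two Firmicutes orders are summed into a single phylum-level row per sample, while the Proteobacteria row is carried through unchanged.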
Questions?
We are on a Coffee Break & Networking Session