Metagenomics 2015 Module 3 Lecture - bioinformatics · 2018-11-21

Canadian Bioinformatics Workshops www.bioinformatics.ca


Module 3

Metagenomic Taxonomic Composition

Morgan Langille

Module 3 bioinformatics.ca

Learning Objectives of Module

•  Understand the pros and cons of 16S versus metagenomic sequencing
•  Understand different approaches for determining the taxonomic composition of a metagenomics sample
•  Be able to run MetaPhlAn2 on one or more samples
•  Be able to determine statistically significant differences in taxonomic abundance across sample groups using STAMP


16S vs Metagenomics

•  16S is targeted sequencing of a single gene which acts as a marker for identification
•  Pros
   –  Well established
   –  Sequencing costs are relatively cheap (~10,000 reads/sample)
   –  Only amplifies what you want (no host contamination)
•  Cons
   –  Primer choice can bias results towards certain organisms
   –  Usually not enough resolution to identify to the strain level
   –  Usually need different primers for archaea & eukaryotes (18S)
   –  Doesn’t identify viruses

16S vs Metagenomics

•  Metagenomics: sequencing ALL the DNA in a sample
•  Pros
   –  Less bias from sequencing
   –  Can identify all microbes (euks, viruses, etc.)
   –  Provides functional information (“What are they doing?”)
•  Cons
   –  Host/site contamination can be significant
   –  Expensive (more sequencing depth is required)
   –  May not be able to sequence “rare” microbes
   –  Complex bioinformatics


Metagenomics: Who is there?

•  Goal: Identify the relative abundance of different microbes in a given sample using metagenomics
•  Problems:
   –  Reads are all mixed together
   –  Reads can be short (~100bp)
   –  Lateral gene transfer
•  Two broad approaches
   1.  Binning Based
   2.  Marker Based

Binning Based

•  Attempts to “bin” reads into the genome from which they originated
•  Composition-based
   –  Uses GC composition or k-mers (e.g. Naïve Bayes Classifier)
   –  Generally not very precise and not recommended
•  Sequence-based
   –  Compare reads to large reference database using BLAST (or some other similarity search method)
   –  Reads are assigned based on “Best-hit” or “Lowest Common Ancestor” approach


LCA: Lowest Common Ancestor

•  Use all BLAST hits above a threshold and assign taxonomy at the lowest level in the tree which covers these taxa.
•  Notable Examples:
   –  MEGAN: http://ab.inf.uni-tuebingen.de/software/megan5/
      •  One of the first metagenomic tools
      •  Does functional profiling too!
   –  MG-RAST: https://metagenomics.anl.gov/
      •  Web-based pipeline (might need to wait awhile for results)
   –  Kraken: https://ccb.jhu.edu/software/kraken/
      •  Fastest binning approach to date and very accurate.
      •  Large computing requirements (e.g. >128GB RAM)
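The LCA rule described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how MEGAN, MG-RAST or Kraken implement it: the function names, the bit-score cutoff of 50 and the example hits are all hypothetical, and each hit is assumed to already carry a root-to-leaf lineage.

```python
from typing import List, Sequence, Tuple


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest root-to-leaf prefix shared by every lineage."""
    if not lineages:
        return []
    shared = []
    # zip(*lineages) walks the ranks in parallel, stopping at the shortest lineage.
    for names_at_rank in zip(*lineages):
        if all(name == names_at_rank[0] for name in names_at_rank):
            shared.append(names_at_rank[0])
        else:
            break
    return shared


def assign_read(hits: Sequence[Tuple[float, Sequence[str]]],
                min_bitscore: float = 50.0) -> List[str]:
    """Keep hits at or above the (hypothetical) score cutoff, then take their LCA."""
    kept = [lineage for score, lineage in hits if score >= min_bitscore]
    return lowest_common_ancestor(kept)


# Made-up hits for one read: (bit score, lineage from domain to genus).
hits = [
    (210.0, ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae", "Escherichia"]),
    (205.0, ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
             "Enterobacterales", "Enterobacteriaceae", "Shigella"]),
    (30.0, ["Bacteria", "Firmicutes", "Bacilli",
            "Lactobacillales", "Streptococcaceae", "Streptococcus"]),
]

assignment = assign_read(hits)
print(" > ".join(assignment))
```

With the low-scoring third hit filtered out, the two remaining hits disagree only at the genus level, so the read is assigned to the family they share. Lowering the cutoff so that all three hits count would push the assignment up to the domain.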

Marker Based

•  Single Gene
   •  Identify and extract reads hitting a single marker gene (e.g. 16S, cpn60, or other “universal” genes)
   •  Use existing bioinformatics pipeline (e.g. QIIME, etc.)
•  Multiple Gene
   •  Several universal genes
      –  PhyloSift (Darling et al., 2014)
         »  Uses 37 universal single-copy genes
   •  Clade specific markers
      –  MetaPhlAn (Segata et al., 2012)


Marker or Binning?

•  Binning approaches
   –  May be too computationally intensive
   –  May not adequately reflect organism abundances due to genome size
•  Marker approaches
   –  Do not allow functions to be linked directly to organisms
   –  Genome reconstruction is not possible
   –  Very sensitive to choice of markers
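The genome-size point can be made concrete with a small calculation. In shotgun data the number of reads from an organism scales roughly with cell count times genome size, so raw read fractions over-represent large genomes. The sketch below uses two hypothetical organisms present in equal cell numbers; the names, read counts and genome sizes are invented for illustration.

```python
def read_fraction(read_counts):
    """Fraction of binned reads per organism (no correction)."""
    total = sum(read_counts.values())
    return {name: count / total for name, count in read_counts.items()}


def size_normalized_fraction(read_counts, genome_sizes_bp):
    """Divide reads by genome length first, then convert to fractions."""
    per_bp = {name: read_counts[name] / genome_sizes_bp[name] for name in read_counts}
    total = sum(per_bp.values())
    return {name: value / total for name, value in per_bp.items()}


# Hypothetical community: equal numbers of cells, but a 2 Mb and an 8 Mb genome.
read_counts = {"organism_small": 2000, "organism_large": 8000}
genome_sizes_bp = {"organism_small": 2_000_000, "organism_large": 8_000_000}

raw = read_fraction(read_counts)
corrected = size_normalized_fraction(read_counts, genome_sizes_bp)
print(raw)
print(corrected)
```

The raw fractions suggest a 20/80 split, while dividing by genome length recovers the 50/50 split of cells. The correction assumes the genome sizes are known, which is often not the case for organisms missing from the reference database.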

Why MetaPhlAn?

•  Fast (marker database is considerably smaller)
•  Markers for bacteria, archaea, eukaryotes, and viruses (since MetaPhlAn2 was released)
•  Being continuously updated and supported
•  Used by the Human Microbiome Project
•  Generally accepted as a robust method for taxonomy assignment
•  Main Disadvantage: not all reads are assigned a taxonomic label


MetaPhlAn

•  Uses “clade-specific” gene markers
•  A clade represents a set of genomes that can be as broad as a phylum or as specific as a species
•  Uses ~1 million markers derived from 17,000 genomes
   –  ~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic
•  Can identify down to the species level (and possibly even strain level)
•  Can handle millions of reads on a standard computer within a few minutes

MetaPhlAn

•  Open-source:
   –  https://bitbucket.org/biobakery/metaphlan2

MetaPhlAn Marker Selection

[Two figure slides; images not included in this transcript]

Using MetaPhlAn

•  MetaPhlAn uses Bowtie2 for sequence similarity searching (nucleotide sequences vs. nucleotide database)
•  Paired-end data can be used directly
•  Each sample is processed individually and then multiple samples can be combined together at the last step
•  Output is relative abundances at different taxonomic levels
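The "process each sample, then combine" step can be illustrated with a short Python sketch. MetaPhlAn2 ships its own utility for this (merge_metaphlan_tables.py), so the code below is only a picture of what merging means, not a replacement. It assumes each per-sample profile is a tab-separated table of clade name and relative abundance with "#" comment lines; the sample names and abundance values are made up.

```python
import io


def read_profile(handle):
    """Parse a two-column, tab-separated profile: clade name, relative abundance."""
    profile = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        clade, abundance = line.split("\t")[:2]
        profile[clade] = float(abundance)
    return profile


def merge_profiles(profiles):
    """Combine {sample: {clade: abundance}} into one clade-by-sample table."""
    samples = sorted(profiles)
    clades = sorted({clade for profile in profiles.values() for clade in profile})
    # A clade that was not detected in a sample gets an abundance of 0.0.
    rows = [[clade] + [profiles[s].get(clade, 0.0) for s in samples] for clade in clades]
    return samples, rows


# Made-up per-sample profiles standing in for two output files.
sample_a = io.StringIO(
    "#clade\tabundance\n"
    "k__Bacteria\t100.0\n"
    "k__Bacteria|p__Firmicutes\t60.0\n"
    "k__Bacteria|p__Proteobacteria\t40.0\n"
)
sample_b = io.StringIO(
    "#clade\tabundance\n"
    "k__Bacteria\t100.0\n"
    "k__Bacteria|p__Firmicutes\t100.0\n"
)

profiles = {"sample_a": read_profile(sample_a), "sample_b": read_profile(sample_b)}
samples, rows = merge_profiles(profiles)
print("\t".join(["clade"] + samples))
for row in rows:
    print("\t".join(str(value) for value in row))
```

The merged table has one row per clade and one column per sample, which is the shape that downstream statistics tools generally expect.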

Absolute vs. Relative Abundance

•  Absolute abundance: Numbers represent the real abundance of the thing being measured (e.g. the actual quantity of a particular gene or organism)
•  Relative abundance: Numbers represent the proportion of the thing being measured within a sample
•  In almost all cases microbiome studies are measuring relative abundance
   –  This is due to DNA amplification during sequencing library preparation not being quantitative


Relative Abundance Use Case

•  Sample A:
   –  Has 10^8 bacterial cells (but we don’t know this from sequencing)
   –  25% of the microbiome from this sample is classified as Shigella
•  Sample B:
   –  Has 10^6 bacterial cells (but we don’t know this from sequencing)
   –  50% of the microbiome from this sample is classified as Shigella
•  “Sample B contains twice as much Shigella as Sample A”
   –  WRONG! (If we quantified it, we would find Sample A has more Shigella)
•  “Sample B contains a greater proportion of Shigella compared to Sample A”
   –  Correct!
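The arithmetic behind this use case is worth spelling out. Reading the totals in the example as 10^8 and 10^6 cells, multiplying each total by its Shigella proportion gives the absolute counts that sequencing alone cannot see. The function name below is just for illustration.

```python
def absolute_count(total_cells, relative_abundance):
    """Absolute abundance = total cell count x proportion of the community."""
    return total_cells * relative_abundance


shigella_a = absolute_count(10**8, 0.25)  # Sample A: 25% of 10^8 cells
shigella_b = absolute_count(10**6, 0.50)  # Sample B: 50% of 10^6 cells
fold_difference = shigella_a / shigella_b

print(f"Sample A Shigella cells: {shigella_a:,.0f}")
print(f"Sample B Shigella cells: {shigella_b:,.0f}")
print(f"Sample A has {fold_difference:.0f}x more Shigella than Sample B")
```

Sample A holds 25,000,000 Shigella cells against 500,000 in Sample B, a 50-fold difference in the opposite direction to what the proportions alone suggest.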

Visualization and Statistics

•  Various tools are available to determine statistically significant taxonomic differences across groups of samples
   –  Excel
   –  SigmaPlot
   –  R
   –  MeV (MultiExperiment Viewer)
   –  Python (matplotlib)
   –  LEfSe & GraPhlAn (Huttenhower Group)
   –  STAMP

STAMP

[Figure slide; image not included in this transcript]

STAMP Plots

[Figure slide; image not included in this transcript]

STAMP

•  Input
   1.  “Profile file”: Table of features (samples by OTUs, samples by functions, etc.)
       •  Features can form a hierarchy (e.g. Phylum, Class, Order, etc.) to allow data to be collapsed within the program
   2.  “Group file”: Contains different metadata for grouping samples
       •  Can be two groups (e.g. Healthy vs Sick) or multiple groups (e.g. Water depth at 2M, 4M, and 6M)
•  Output
   –  PCA, heatmap, box, and bar plots
   –  Tables of significantly different features
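As a rough picture of the two inputs, the sketch below writes a small profile table and a matching group table as tab-separated text. The layout (hierarchy columns followed by one column per sample, and a group table keyed by sample ID) and every column name, sample ID and value are assumptions made for illustration; the STAMP documentation is the authority on the exact format it accepts.

```python
import csv
import io


def write_profile(handle, hierarchy_levels, sample_ids, rows):
    """Write hierarchy columns followed by one abundance column per sample."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(list(hierarchy_levels) + list(sample_ids))
    for lineage, values in rows:
        writer.writerow(list(lineage) + [values[s] for s in sample_ids])


def write_groups(handle, sample_groups, column="Status"):
    """Write one row per sample: sample ID, then its group label."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["SampleID", column])
    for sample_id, group in sample_groups.items():
        writer.writerow([sample_id, group])


# Hypothetical data: two samples, two features, a two-level hierarchy.
sample_ids = ["S1", "S2"]
rows = [
    (["Firmicutes", "Lactobacillus"], {"S1": 60.0, "S2": 10.0}),
    (["Proteobacteria", "Shigella"], {"S1": 40.0, "S2": 90.0}),
]
sample_groups = {"S1": "Healthy", "S2": "Sick"}

profile_out = io.StringIO()
groups_out = io.StringIO()
write_profile(profile_out, ["Phylum", "Genus"], sample_ids, rows)
write_groups(groups_out, sample_groups)

profile_text = profile_out.getvalue()
groups_text = groups_out.getvalue()
print(profile_text)
print(groups_text)
```

Keeping the sample IDs identical between the two tables is what lets the grouping metadata be matched to the abundance columns.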


Questions?

We are on a Coffee Break & Networking Session