#ievobio keynote - june 26, 2013

Post on 29-Nov-2014


Visualizing biodiversity in the era of high-throughput sequencing

Holly Bik, UC Davis @Dr_Bik

Our ability to visualize high-throughput sequencing data is as bad as my title slide


$250k, 1 year

“A Research-Driven Data Visualization Framework for High-Throughput Environmental Sequence Data”

http://pitchinteractive.com @pitchinc

“Pitch Interactive dissects large data sets in search of meaningful and often hidden patterns that serve to determine the shape and form that best tells a story.”

Diverse marine community!

EASY! EASY! EASY!

VERY Difficult!!

Mark Rothko, “No. 14, 1960”

“rectangles of orange and purple with soft edges”

http://pippascabinet.blogspot.com/2012/11/on-true-love.html

Challenge 1: Environmental data is terrible at revealing fine-scale taxonomic patterns

Figure: ordination plot of samples. Axes: PC1 (13.03%), PC2 (12.21%), PC3 (10.54%). Sample labels: ShallowGulf, ShallowCalif, Atlantic22#1, Atlantic25#2, Atlantic29, Atlantic43, Atlantic45, Pacific128, Pacific237, Pacific321, Pacific422, Pacific528.

Overarching Community Patterns!

Bik et al. 2012, Molecular Ecology, 21(5):1048-59

Figure: proportional abundance chart (axis from 0 to 1). Labels: Pre-spill, Nematode Dominance; Post-spill, Fungal Dominance.

Bik et al. 2012, PLoS ONE, 7(6):e38550

Figure: taxonomic composition chart, Grand Isle, Louisiana (callout: Fungi). Legend: Algae; Environmental; Fungi; Metazoa (Acanthocephala, Annelida, Arthropoda, Brachiopoda, Bryozoa, Chordata, Cnidaria, Echiura, Entoprocta, Gastrotricha, Mollusca, Nematoda, Platyhelminthes); No Match; Stramenopiles; Unicellular Eukaryotes.

Bik et al. 2012, PLoS ONE, 7(6):e38550

Exploring Trees

Ecologically, what are these reference taxa doing?

Pertinent info for biological interpretations of DNA data!

Challenge 2: Taxonomic, phylogenetic, and ecological knowledge is imperative for making meaningful interpretations of high-throughput sequence datasets

Enoplus spp.

Daptonema spp.

Robbea spp.

Caenorhabditis elegans

Actinomyces spp.

Clostridium spp.

Listeria spp.

Synechococcus spp.

Challenge 3: Extreme bioinformatics bottleneck for microbial eukaryote data

rDNA copy number & genome size in eukaryotes

Prokopowich CD, Gregory TR, Crease TJ. (2003) Genome, 46(1):48–50.

Bik et al., in revision

…and in ONE genus of nematodes

Caenorhabditis brenneri ~323 rRNA gene copies

Caenorhabditis briggsae ~56 rRNA gene copies
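To make the consequence of these copy numbers concrete, here is a minimal sketch. It assumes, purely for illustration, that rRNA reads scale linearly with individuals times gene copies per genome, with no PCR or extraction bias; the slides do not state this model, and the function name `expected_read_share` is hypothetical.

```python
# Approximate rRNA gene copy numbers quoted on the slide.
COPIES = {
    "Caenorhabditis brenneri": 323,
    "Caenorhabditis briggsae": 56,
}

def expected_read_share(individuals, copies):
    """Expected fraction of rRNA reads per species.

    Assumption (illustrative only): reads are proportional to
    individuals x rRNA gene copies, with no other source of bias.
    """
    weights = {sp: individuals[sp] * copies[sp] for sp in individuals}
    total = sum(weights.values())
    return {sp: w / total for sp, w in weights.items()}

# One individual of each species, i.e. equal true abundance.
share = expected_read_share({sp: 1 for sp in COPIES}, COPIES)
for species, fraction in share.items():
    print(f"{species}: {fraction:.1%} of reads")
```

Under these assumptions, two equally abundant species would show up at roughly 85% and 15% of the reads, which is why read counts alone are a shaky proxy for abundance.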

OCTU  Reads  OCTU Length  Bit Score  E-Value   Match bp  Total bp  % Similarity  Chimera  DB match
27    63     266          525        e-146     265       265       100           -1       B. seani 175
12    9      265          500        e-138     261       264       98.86         -1       B. seani 175
170   8      264          496        e-137     261       264       98.86         0        B. seani 175
513   1      264          494        e-136     259       262       98.85         -2       B. seani 175
579   2      263          492        e-136     258       261       98.85         -2       B. seani 175
570   1      262          492        e-136     258       261       98.85         -1       B. seani 175
394   1      263          490        e-135     260       264       98.48         1        B. seani 175
19    2      269          488        e-135     264       269       98.14         0        B. seani 175
658   1      266          486        e-134     260       265       98.11         -1       B. seani 175
412   2      264          480        e-132     260       265       98.11         1        B. seani 175
465   9      254          478        e-132     251       254       98.82         0        B. seani 175
1164  1      268          478        e-132     261       267       97.75         -1       B. seani 175
304   1      261          474        e-130     255       260       98.08         -1       B. seani 175
868   1      244          460        e-126     242       245       98.78         1        B. seani 175
514   2      274          458        e-126     263       272       96.69         -2       B. seani 175
683   1      250          426        e-116     241       249       96.79         -1       B. seani 175
627   1      230          422        e-115     223       226       98.67         -4       B. seani 175
171   3      212          400        e-108     209       211       99.05         -1       B. seani 175
1223  1      202          355        5.00E-95  198       204       97.06         2        B. seani 175

Porazinska et al. 2010 Zootaxa

Intragenomic variation in Eukaryotic rRNA

Tail!

Head!

Artificial control community containing known nematode species, all with corresponding full length reference 18S sequences

Head-Tail Pattern in Nematode OTUs

99% cutoff

OTUs as ‘Clouds’

97% cutoff

How to correlate OTUs with biological species?
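One way to picture the head-tail pattern and the effect of the cutoff is a toy greedy clustering. This is only a sketch: it is not the pipeline used in the cited studies, it assumes pre-aligned sequences of equal length, and the reads and helper names (`identity`, `greedy_otus`, `mutate`) are made up for illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs, cutoff):
    """Assign each sequence to the first centroid it matches at >= cutoff.

    Sequences are processed in the given order; a sequence that matches
    no existing centroid becomes the centroid of a new OTU.
    """
    centroids = []  # representative sequence for each OTU
    members = []    # member lists, parallel to centroids
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= cutoff:
                members[i].append(s)
                break
        else:
            centroids.append(s)
            members.append([s])
    return members

def mutate(seq, positions):
    """Return seq with a guaranteed substitution at each position."""
    out = list(seq)
    for p in positions:
        out[p] = "A" if out[p] != "A" else "C"
    return "".join(out)

# Toy data: a 200-bp "true" sequence plus noisy variants of it.
ref = "ACGT" * 50
reads = [
    ref, ref, ref,
    mutate(ref, [10]),            # 1 mismatch  -> 99.5% identity to ref
    mutate(ref, [50]),            # 1 mismatch  -> 99.5%
    mutate(ref, [10, 50, 90]),    # 3 mismatches -> 98.5%
    mutate(ref, [20, 70, 110]),   # 3 mismatches -> 98.5%
]

for cutoff in (0.97, 0.99):
    sizes = [len(m) for m in greedy_otus(reads, cutoff)]
    print(f"{cutoff:.0%} cutoff: {len(sizes)} OTUs, sizes {sizes}")
```

In this toy case a single species collapses into one OTU at 97% but splits into one large "head" OTU and two singleton "tail" OTUs at 99%, so the OTU count depends on the cutoff rather than on the number of species.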

Sparse Databases for Eukaryotes

SILVA 108 Ref rRNA Database (16S/18S)

Bacteria: 530,197

Archaea: 25,658

Eukaryotes: 62,587

Ambiguous Taxonomy

Taxa                     Region 1 95%  Region 2 95%  Region 1 99%  Region 2 99%
Metazoa (20 Phyla)       1360          1461          43255         25668
  Nematoda               765           879           27020         15518
  Annelida               217           197           7073          3869
  Arthropoda             128           178           2280          2323
Unicellular eukaryotes   738           1257          15198         22020
Environmental isolates   774           686           12687         9775
No match                 480           354           11345         1868
Fungi                    225           163           9984          2445
Stramenopiles            137           146           1771          1583
Algae                    111           96            975           861
Total (all taxa)         3825          4163          95215         64220

Deep sea and shallow water marine sediment; 1.2 million reads, 454 GS FLX Titanium

Bik et al. 2012, Molecular Ecology, 21(5):1048-59
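The jump between cutoffs in the table above is easier to see as a fold change. A small sketch, assuming the table entries are cluster (OTU) counts at each similarity cutoff, which the slide does not state explicitly; only three rows are copied here and `fold_change` is a hypothetical helper.

```python
# (Region 1 95%, Region 2 95%, Region 1 99%, Region 2 99%), copied from the table.
COUNTS = {
    "Nematoda": (765, 879, 27020, 15518),
    "Fungi": (225, 163, 9984, 2445),
    "Total (all taxa)": (3825, 4163, 95215, 64220),
}

def fold_change(row):
    """Return the 99%-to-95% ratio for Region 1 and Region 2."""
    r1_95, r2_95, r1_99, r2_99 = row
    return r1_99 / r1_95, r2_99 / r2_95

for taxon, row in COUNTS.items():
    f1, f2 = fold_change(row)
    print(f"{taxon}: Region 1 x{f1:.1f}, Region 2 x{f2:.1f}")
```

For the totals, the count rises roughly 25-fold in Region 1 and 15-fold in Region 2 when the cutoff moves from 95% to 99%.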

Goal 1: A web-based, scalable visualization framework for standard data formats

Tier One

Standard outputs from bioinformatic pipelines

•  BIOM (json) files – OTU tables, metagenome datasets

•  Tab-delimited metadata files
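As a rough idea of what reading these two inputs involves, the sketch below parses a tiny sparse table in the BIOM 1.0 JSON layout and a tab-delimited metadata file using only the Python standard library. It is not the framework's own loader: the OTU and sample IDs, the `Depth_m` column, and the function names are invented for illustration.

```python
import csv
import io
import json

# Tiny hand-written example in the BIOM 1.0 JSON layout (sparse matrix).
# All IDs and values here are made up for illustration.
BIOM_TEXT = json.dumps({
    "id": None,
    "format": "Biological Observation Matrix 1.0.0",
    "format_url": "http://biom-format.org",
    "type": "OTU table",
    "generated_by": "example",
    "date": "2013-06-26T00:00:00",
    "rows": [{"id": "OTU_1", "metadata": None},
             {"id": "OTU_2", "metadata": None}],
    "columns": [{"id": "SampleA", "metadata": None},
                {"id": "SampleB", "metadata": None}],
    "matrix_type": "sparse",
    "matrix_element_type": "int",
    "shape": [2, 2],
    "data": [[0, 0, 5], [0, 1, 1], [1, 1, 7]],  # [row, column, value]
})

# Hypothetical tab-delimited metadata file contents.
METADATA_TEXT = "SampleID\tDepth_m\nSampleA\t10\nSampleB\t2500\n"

def load_counts(biom_text):
    """Return {sample_id: {otu_id: count}} from sparse BIOM 1.0 JSON."""
    table = json.loads(biom_text)
    if table["matrix_type"] != "sparse":
        raise ValueError("this sketch only handles sparse tables")
    otus = [r["id"] for r in table["rows"]]
    samples = [c["id"] for c in table["columns"]]
    counts = {s: {} for s in samples}
    for row, col, value in table["data"]:
        counts[samples[col]][otus[row]] = value
    return counts

def load_metadata(tsv_text, key="SampleID"):
    """Return {sample_id: {column: value}} from tab-delimited text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {rec[key]: rec for rec in reader}

counts = load_counts(BIOM_TEXT)
meta = load_metadata(METADATA_TEXT)
for sample, otu_counts in counts.items():
    print(sample, meta[sample]["Depth_m"], sum(otu_counts.values()))
```

Joining the two structures on the sample ID is what lets a visualization color or group OTU abundances by any metadata column.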

http://explore.climbsf.com

Goal 2: Destroy biologists’ addiction to pie charts

A pie chart is not the most informative way to interpret biodiversity data!

Tier Two

Figure: bubble-chart mockup with groups labeled Bacteria, Archaea, Nematodes, Ciliates, Crustaceans.

Circle size = species abundance

Circle color = metadata (sample, temperature, pH, etc.)

Mockup example taken from http://www.wefeelfine.org/
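The mockup only says that circle size encodes abundance. One common choice, shown here as an assumption rather than the project's actual design, is to make circle area (not radius) proportional to abundance so that large groups are not visually exaggerated. The abundances and the `circle_radius` helper below are made up.

```python
import math

def circle_radius(abundance, max_abundance, max_radius=40.0):
    """Radius such that circle AREA is proportional to abundance."""
    if max_abundance <= 0:
        raise ValueError("max_abundance must be positive")
    return max_radius * math.sqrt(abundance / max_abundance)

# Made-up abundances for the groups named on the mockup.
abundance = {"Bacteria": 400, "Archaea": 25, "Nematodes": 100,
             "Ciliates": 64, "Crustaceans": 4}
top = max(abundance.values())
for taxon, n in abundance.items():
    print(f"{taxon}: radius {circle_radius(n, top):.1f}")
```

With this scaling a group with a quarter of the top abundance gets half the radius, so its drawn area is a quarter of the largest circle.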

Goal 4: Find intuitive ways to visualize new data outputs

Explicitly Phylogenetic Approaches

Aligned environmental sequences

Guide Tree

Evolutionary placement of short reads

http://phylosift.wordpress.com

Figure: PhyloSift workflow diagram. Labels recovered from the slide:

•  Input Sequences: each input sequence scanned against both workflows (rRNA workflow, protein workflow)

•  Search input against references: LAST fast candidate search

•  Profile HMMs used to align candidates to reference alignment: hmmalign multiple alignment, Infernal multiple alignment (branches labeled <600 bp and >600 bp)

•  pplacer phylogenetic placement (parallel option)

•  Taxonomic Summaries: Krona plots, number of reads placed for each marker gene

•  Sample Analysis & Comparison: Edge PCA, tree visualization, Bayes factor tests

Probability Distributions: when a pie chart is not a pie chart

Great!

Not Bad!

Getting Tricky…

Marine Metagenome

Tree Placement, Sing Tree - Guppy

Goal 5: Pester other people Solicit case study participants

Goal 6: (Phase 2) Build a user and developer community

Acknowledgements

Jonathan Eisen, Aaron Darling, Guillaume Jospin, Dongying Wu, David Coil

Further Information

•  hbik@ucdavis.edu

•  @Dr_Bik – updates posted to Twitter

•  Grant proposal now posted on Figshare!
