What should Bioinformatics do for EvoDevo?

Insights into the evolution and development of planarian regeneration from the genome of the flatworm Girardia tigrina SUJAI KUMAR 2014-07-24 VIENNA EURO EVODEVO WHAT SHOULD BIOINFORMATICS DO FOR EVODEVO?

Post on 10-May-2015

DESCRIPTION

Presented at Euro Evo Devo 2014 in Vienna

TRANSCRIPT

Page 1: What should Bioinformatics do for EvoDevo?

Insights into the evolution and development of planarian regeneration from the genome of the flatworm Girardia tigrina

SUJAI KUMAR

2014-07-24 VIENNA EURO EVODEVO

WHAT SHOULD BIOINFORMATICS DO FOR EVODEVO?

Page 2: What should Bioinformatics do for EvoDevo?

EVODEVO

SUJAI KUMAR

Page 3: What should Bioinformatics do for EvoDevo?

SUJAI KUMAR

"Winkel triple projection SW" by Strebe - Own workLicensed under Creative Commons Attribution-Share Alike 3.0 via Wikimedia Commons http://commons.wikimedia.org/wiki/File:Winkel_triple_projection_SW.jpg

Cartoonist and mathematics teacher in New Delhi

Page 4: What should Bioinformatics do for EvoDevo?

SUJAI KUMAR

Finding patterns in sequences: TIMSS 1999 video study

MS in Educational Psychology at the University of Illinois

Page 5: What should Bioinformatics do for EvoDevo?

SUJAI KUMAR

Self-organising systems research in New Delhi

Page 6: What should Bioinformatics do for EvoDevo?

SUJAI KUMAR

Sequenced four nematode genomes for PhD in Blaxter Lab, Edinburgh

Page 7: What should Bioinformatics do for EvoDevo?

SUJAI KUMAR

Planarian regeneration genomics in Aboobaker Lab, Oxford

Page 8: What should Bioinformatics do for EvoDevo?

Outline of this talk

1. Regeneration, planarian flatworms, and Girardia tigrina

2. Creating G tigrina genomic resources

3. Using these resources to understand regeneration

4. What should bioinformatics do for EvoDevo

Page 9: What should Bioinformatics do for EvoDevo?

1. Regeneration, planarian flatworms, and Girardia tigrina

Bely and Nyberg, 2010 DOI:10.1016/j.tree.2009.08.005

Page 10: What should Bioinformatics do for EvoDevo?

1. Regeneration, planarian flatworms, and Girardia tigrina

Kao, 2014. PhD Thesis “Transcriptome assembly and analysis of the freshwater planarian Schmidtea mediterranea”

[Figure: phylogeny of Platyhelminthes. Taxa shown: Cestoda, Monogenea, Trematoda, Rhabditophora, Turbellaria, Tricladida, Macrostomorpha, Lecithoepitheliata, Rhabdocoela, Polycladida. Several taxa are marked "T"; two species are marked "G":]

Girardia tigrina (G): aboobakerlab.com/genomes

Schmidtea mediterranea (G): smedgd.neuro.utah.edu

Page 11: What should Bioinformatics do for EvoDevo?

1. Regeneration, planarian flatworms, and Girardia tigrina

• What we know already

• Some genes and pathways that are essential for whole-body regeneration (WBR)

• Some transcription expression profiles

• No transgenics in any planarian

Page 12: What should Bioinformatics do for EvoDevo?

2. Creating G tigrina genomic resources

Sequencing > Assembly > Annotation > Delivery

Page 13: What should Bioinformatics do for EvoDevo?

2. Creating G tigrina genomic resources

Sequencing > Assembly > Annotation > Delivery

Illumina HiSeq: Workhorse
Short paired reads
~$£€ 1,000 / 100 MegaBase
Mate pairs essential

PacBio: expensive
High quality fly genome
~$£€ 10,000 / 100 MegaBase

Nanopore – not a game changer just yet

Page 14: What should Bioinformatics do for EvoDevo?

2. Creating G tigrina genomic resources

Sequencing > Assembly > Annotation > Delivery

• Quality Control

• Raw data QC: fastqc

• Preliminary assembly: Blobology

• Separate components: contaminants / endosymbionts / mitochondrial

• Assess insert sizes: bad mate pair libraries confound scaffolding

Page 15: What should Bioinformatics do for EvoDevo?

Each point is a contig from a preliminary assembly (Caenorhabditis sp. 5)

Taxon-annotated GC-Coverage (TAGC) plots, a.k.a. “Blobology”

Page 16: What should Bioinformatics do for EvoDevo?

[Figure: TAGC plot for Girardia tigrina; axes: GC content and read coverage]
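A TAGC plot places each contig of a preliminary assembly at a (GC content, read coverage) coordinate and labels it by taxon. The following is a minimal sketch of how those coordinates could be computed, not the Blobology pipeline's own code. The input tuple layout (name, sequence, aligned bases, taxon) and the function names are assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (contig name, sequence, aligned read bases, taxon label)
Contig = Tuple[str, str, int, str]


def gc_content(seq: str) -> float:
    """Fraction of G/C among unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def mean_coverage(aligned_bases: int, contig_length: int) -> float:
    """Average read depth: total aligned read bases divided by contig length."""
    if contig_length <= 0:
        raise ValueError("contig length must be positive")
    return aligned_bases / contig_length


def tagc_points(contigs: Iterable[Contig]) -> List[Tuple[str, float, float, str]]:
    """Return one (name, GC fraction, mean coverage, taxon) point per contig."""
    points = []
    for name, seq, aligned_bases, taxon in contigs:
        points.append(
            (name, gc_content(seq), mean_coverage(aligned_bases, len(seq)), taxon)
        )
    return points
```

Contigs from contaminants or endosymbionts tend to form separate clusters in this space, which is what makes the plot useful for separating components before the final assembly.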

Page 17: What should Bioinformatics do for EvoDevo?

2. Creating G tigrina genomic resources

Sequencing > Assembly > Annotation > Delivery

• Quality Control

• Raw data QC: fastqc

• Preliminary assembly: Blobology

• Separate components: contaminants / endosymbionts / mitochondrial

• Assess insert sizes: bad mate pair libraries confound scaffolding

• Generate many assemblies

• ABySS, CLC, MaSuRCA, SGA, SPAdes, ALLPATHS-LG

• Evaluate assemblies

• FRCbam, REAPR, CGAL

• CEGMA, alignments to known sequences

• Freeze and release

Page 18: What should Bioinformatics do for EvoDevo?

2. Creating G tigrina genomic resources

Sequencing > Assembly > Annotation > Delivery

• NOT a great assembly

• But it was GoodEnough™

• Next version with long-insert mate pairs

• Diploid, but high heterozygosity

Raw read data: ~500M short read pairs, 160 Gbases

Consolidating near-identical contigs

Assembly version        nGt.0.3        nGt.0.5
Total span (Gbases)     1.898          1.500
Num contigs             581,558        422,617
Span contigs >10kb      541,653,308    536,575,093
Num contigs >10kb       29,050         27,495
N50                     5,751          6,827
CEGMA                   45%            56%
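The N50 reported above is the contig length at which, going from the longest contig downwards, the running total first reaches half of the total assembly span. A minimal sketch of that calculation follows; the function name and the toy lengths in the usage note are illustrative and are not taken from the G tigrina assembly.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L cover at least half the total span."""
    ordered = sorted(lengths, reverse=True)
    half_span = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_span:
            return length
    # Only reached when no contig lengths were supplied
    raise ValueError("cannot compute N50 of an empty assembly")
```

For example, contigs of lengths 2, 3, 4, 5 and 6 span 20 bases in total; the two longest (6 + 5 = 11) already cover half, so the N50 is 5.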

Page 19: What should Bioinformatics do for EvoDevo?

2. Creating G tigrina genomic resources

Sequencing > Assembly > Annotation > Delivery

• Gene prediction

• RNA-seq

• Predictors: Augustus, SNAP, GeneMark

• Consolidators: MAKER, EVM, ENSEMBL genebuild

• Evaluate: use Annotation Edit Distance (AED) as a metric

• Functional annotation

• InterProScan, Trinotate, Blast2GO

• Community annotation

• WebApollo, Community Annotation Portal

Annotation version                        nGt.0.5.1
Num of genes                              39,119
Num of genes with AED>0.5                 35,061
Mean aa length                            268
Num of genes with InterPro annotations    22,747
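For readers unfamiliar with AED, the sketch below assumes the commonly cited definition AED = 1 - (SN + SP) / 2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotation bases supported by the evidence. It is an illustration under that assumption, not MAKER's implementation. The exon representation (1-based inclusive intervals) and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: an exon is a 1-based inclusive (start, end) interval
Exon = Tuple[int, int]


def covered_positions(exons: Iterable[Exon]) -> Set[int]:
    """Set of nucleotide positions covered by a list of exons."""
    covered: Set[int] = set()
    for start, end in exons:
        covered.update(range(start, end + 1))
    return covered


def aed(annotation_exons: Iterable[Exon], evidence_exons: Iterable[Exon]) -> float:
    """Annotation Edit Distance: 0.0 is perfect agreement, 1.0 is no overlap."""
    annotation = covered_positions(annotation_exons)
    evidence = covered_positions(evidence_exons)
    if not annotation or not evidence:
        return 1.0  # nothing to compare, so treat as unsupported
    overlap = len(annotation & evidence)
    sensitivity = overlap / len(evidence)
    specificity = overlap / len(annotation)
    return 1.0 - (sensitivity + specificity) / 2
```

Under this definition a gene model that matches its evidence exactly scores 0, and a model with no overlapping evidence scores 1, so lower values indicate better-supported annotations.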

Page 20: What should Bioinformatics do for EvoDevo?

2. Creating G tigrina genomic resources

Sequencing > Assembly > Annotation > Delivery

• Genome Browser

• Blast server

• Bulk data downloads

• Interface

• Badger, Tripal, InterMine, Ensembl

Page 21: What should Bioinformatics do for EvoDevo?

3. Using these resources to understand regeneration

• Individual genes and pathways

• Transgenics

• Protein ortholog analysis

• 4 triclads, 1 other platyhelminth, 2 ecdysozoa, 4 deuterostomes

• 14k out of 40k G tigrina proteins in strict ortholog clusters

• ~8000 triclad-specific clusters

• ~800 triclad-specific clusters with all 4 species represented
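The two triclad-specific counts above correspond to two filters over the ortholog clusters: clusters whose members all come from triclads, and the subset of those in which all four triclads are present. A minimal sketch of such filters is shown here. The species labels (triclad_1 to triclad_4), the cluster layout, and the function names are placeholders for illustration, and the clustering tool actually used is not stated on the slide.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: cluster id -> list of (species label, protein id)
Clusters = Dict[str, List[Tuple[str, str]]]

# Placeholder labels for the four triclad species in the analysis
TRICLADS = frozenset({"triclad_1", "triclad_2", "triclad_3", "triclad_4"})


def species_in(members: List[Tuple[str, str]]) -> Set[str]:
    """Distinct species labels represented in one cluster."""
    return {species for species, _protein in members}


def triclad_specific(clusters: Clusters) -> Clusters:
    """Clusters that are non-empty and contain proteins from triclads only."""
    return {
        cid: members
        for cid, members in clusters.items()
        if members and species_in(members) <= TRICLADS
    }


def with_all_triclads(clusters: Clusters) -> Clusters:
    """Triclad-specific clusters in which every triclad species is represented."""
    return {
        cid: members
        for cid, members in triclad_specific(clusters).items()
        if species_in(members) == TRICLADS
    }
```

Clusters passing the second, stricter filter are the natural starting candidates for lineage-specific genes, since they are conserved across all sampled triclads but absent from the outgroups.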

• Cis-regulatory analysis

• Neoblast specific regulatory regions

Page 22: What should Bioinformatics do for EvoDevo?

4. What should bioinformatics do for EvoDevo

• What should I do for an experimental EvoDevo lab

• Visual > Text

• View additional information in place

• Plot everything vs everything

• Create gene models visually

• Routine analyses should not require bioinformatician

• Clear explanations of how a resource was created

• Not too many versions

• Minimum standards

Page 23: What should Bioinformatics do for EvoDevo?

4. What should bioinformatics do for EvoDevo

• What should the bioinformatics community do for me as an EvoDevo bioinformatician

• Best practice documentation for analyses

• Easy to install tools

• Minimum standards for assembly, metadata, annotation, and delivery

• Grants for coordination, tools, resources

Page 24: What should Bioinformatics do for EvoDevo?

Summary

• Please use the resources at aboobakerlab.com/genomes

• Tell us what other resources you’d like to see as standard

• Fund technology development and training

Page 25: What should Bioinformatics do for EvoDevo?

Acknowledgements

• AboobakerLab.com

• Aziz Aboobaker

• Natalia Pouchkina-Stantcheva

• Damian Kao

• Yuliana Mihaylova

• Aphrodite Zhao

• Blaxter Lab (nematodes.org)

• Ben Elsworth (Badger)

• Sequencing

• Edinburgh Genomics

• Funding

• BBSRC

• BSDB / Company of Biologists travel grant