Field talk at iEVOBIO 2011
iEVOBIO 2011
The role of grass-roots data sharing communities, standards and
megasequencing projects in the genomics revolution
Dawn Field, NERC Centre for Ecology and Hydrology
Opportunities and Challenges
The era of genomics is just beginning...
...how will we cope with the data?
...how will we gain the most knowledge from this investment in data?
PARADIGM SHIFT
1960-1990: 16S RNA
1990-2010: Genomes
2010-2020: Pangenomes
Nikos Kyrpides
GREAT CHALLENGES
            1995-2009   2010-2015
Finished    1000        3000
Draft       1000        10000
P. Chain et al. Science, 2009
[Figure: Genome Sequencing Projects on GOLD, September 2009, 5643 projects; bar chart of Incomplete and Complete projects, axis 0 to 6000]
Nikos Kyrpides
Culturable
Unculturable
Nikos Kyrpides
The trend is now increasingly geared towards
ever more ambitious megasequencing
projects...
And democratization of access to sequencing
power...
Just one example....
Metagenomics: Putting data generating capacity into perspective with an example from Bergen
(1) 1 metagenome, Sargasso Sea, Sanger sequencing (Venter et al, 2005)
(~80) 41 metagenomes, “Global Ocean Survey”, Sanger sequencing (Rusch et al, 2007)
(~120) 4 metagenomes & 4 metatranscriptomes, Bergen mesocosm experiment, pyrosequencing (Gilbert et al, 2008)
Gilbert JA, Field D, Huang Y, Edwards R, Li W, Gilna P, Joint I. (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS ONE. Aug 22;3(8):e3042.
The Bergen ocean acidification study produced 19% of the reads produced in the GOS study and 5% of the total
basepairs of sequence.
Further evidence for the “Unknown Genome” and the
Dark Matter of the Tree of Life
The Data
- Flood
- Tsunami
- Deluge
?
the data bonanza
To exploit fully the promise of these data we need both scientific
innovation and community agreement on how to provide
appropriate stewardship of these resources for the benefit of all.
This requires the evolution of our scientific, technological and sociological thinking.
SuperMarket
The Genome Catalogue
DataMarket
Norman Morrison
Packaging data
Labels for data
<phenotype>
<environmental context>
Standards
Principles:
Not everything should be ‘standardized’
Aggregation of data, information, and knowledge requires standard ways of doing things
Standards provide foundations; standards should drive innovation (think of electrical plugs or the internet)
Pick the right concepts to standardize – at the right time, with the right people
Requires good ‘group think’ – or ‘systems thinking’
Community-driven solutions:
The Common Path:
• Identify the problem
• Define a community to address it
• Define scope of the solution
• Implement solution
• Gain adoption of solution
The Genomic Standards Consortium
Innovation through Collaboration
GSC 10, Argonne, 2010
GSC 11, Hinxton, 2010
GSC 12, Bremen, 2011
GSC 13, BGI, 2012
The GSC’s Mission
• the implementation of new genomic standards
• methods of capturing and exchanging metadata
• harmonization of metadata collection and analysis efforts across the wider genomics community
The GSC fulfills its mission by
• Organizing meetings
• Forming working groups
• Creating Consensus Products
Pelin Yilmaz et al 2011
Use of MIGS/MIMS/MIENS
Please provide this minimum information when you publish
• a genome
• a metagenome
• a gene marker study (e.g. ribosomal genes)
GenBank, EMBL and DDBJ now accept this information and encourage its submission to their public DNA databases
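As a rough illustration of what checking a record against a minimum-information checklist could look like, here is a minimal Python sketch. The field names in `REQUIRED_FIELDS` and the example record are hypothetical placeholders, not the official MIGS/MIMS/MIENS items; the GSC checklist itself defines what is actually required.

```python
# Hypothetical subset of checklist fields; NOT the official MIGS/MIMS/MIENS list.
REQUIRED_FIELDS = {
    "project_name",
    "collection_date",
    "lat_lon",
    "environment",
    "sequencing_method",
}


def is_filled(value):
    """Treat None and blank strings as 'not provided'."""
    return value is not None and str(value).strip() != ""


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the sorted required fields that are absent or empty in a record."""
    return sorted(f for f in required if not is_filled(record.get(f)))


def is_compliant(record, required=REQUIRED_FIELDS):
    """A record counts as compliant here when no required field is missing."""
    return not missing_fields(record, required)


# Illustrative record with made-up values.
example = {
    "project_name": "Example mesocosm metagenome",
    "collection_date": "2006-05-10",
    "lat_lon": "",
    "environment": "marine",
}
print(missing_fields(example))
print(is_compliant(example))
```

A check of this kind could run at submission time, so that gaps are reported to the author before the record reaches a public database.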
Labels for data
<MIGS>
<MIMS>
The Microbial Earth Project
Goal: International effort to sequence a reference genome for every cultured Archaeal and Bacterial organism (~9,000 microbes)
Phase I: Sequence one representative from every characterized microbial type species
GEBA, HMP
Source: Jack A. Gilbert, Argonne National Labs
http://earthmicrobiome.org
Field et al unpublished work on a Metadata Coverage Index (MCI)
MCI > 50
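The slide does not define the MCI. One plausible reading, assumed here, is the percentage of expected metadata fields that are actually filled in for a record, averaged over a collection. The Python sketch below follows that assumption only; the field list and the records are made up for illustration.

```python
def record_mci(record, fields):
    """Percentage (0-100) of the expected fields holding a non-empty value.

    Assumed definition, not taken from the slide.
    """
    if not fields:
        raise ValueError("fields must not be empty")
    filled = sum(
        1
        for f in fields
        if record.get(f) is not None and str(record.get(f)).strip()
    )
    return 100.0 * filled / len(fields)


def collection_mci(records, fields):
    """Mean per-record MCI over a collection (assumed aggregation)."""
    if not records:
        raise ValueError("records must not be empty")
    return sum(record_mci(r, fields) for r in records) / len(records)


# Hypothetical field list and records.
FIELDS = ["habitat", "lat_lon", "collection_date", "isolation_source"]
records = [
    {"habitat": "marine", "lat_lon": "60.27 N 5.22 E", "collection_date": "2006-05"},
    {"habitat": "soil"},
]

scores = [record_mci(r, FIELDS) for r in records]
print(scores)                    # [75.0, 25.0]
print([s > 50 for s in scores])  # [True, False]
print(collection_mci(records, FIELDS))
```

Under this reading, a threshold such as MCI > 50 selects records with more than half of the expected fields populated.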
GSC 5 at the EBI, 2008
Top ten journals publishing genome reports
[Figure: total genome publications (1995 - 2011) per journal: J Bacteriology, PNAS, Nature, Science, SIGS, PLoS ONE, Genome Research, PLoS Genetics, Nat Biotech, BMC Genomics]
Total 1160 genome publications in 60 peer reviewed publications
Source - GenomesOnline Database, May 28, 2011
Incentives for compliance
MIGS compliant marine phage genomes
GSC 9 at the JCVI – April 2010
Darwin Core vs GSC MIxS standard
Peter Dawyndt
[Diagram comparing Darwin Core and the GSC MIxS standard; labels: Taxon, Identification, Occurrence, IPR related info, Event, Location, GeologicalContext, SamplingProtocol, EnvironmentalConditions]
Preliminary (first) conclusions
•DC & GSC checklist more complementary than overlapping
how can we make these standards completely orthogonal?
http://gensc.org
More Information about the GSC...
Feast of the Mind
Labels for data
<soil>
<water>
http://environmentontology.org
Member of OBO Foundry http://obofoundry.org
Ontogrator: building on ontologies
Users:
1) Pick terms
2) View hits
3) Browse
4) Follow links to primary data
http://ontogrator.org (Morrison et al, 2011, SIGS)
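The four user steps can be pictured with a small Python sketch. This is not Ontogrator's actual implementation: the records, resource names, term labels and URLs below are hypothetical, and the sketch matches terms literally rather than walking an ontology hierarchy.

```python
# Hypothetical records from different data resources, each tagged with ontology terms.
RECORDS = [
    {"id": "rec1", "source": "resourceA", "terms": {"marine biome", "water"},
     "url": "http://example.org/rec1"},
    {"id": "rec2", "source": "resourceB", "terms": {"soil"},
     "url": "http://example.org/rec2"},
    {"id": "rec3", "source": "resourceB", "terms": {"marine biome", "sediment"},
     "url": "http://example.org/rec3"},
]


def view_hits(records, picked_terms):
    """Steps 1 and 2: return the records tagged with every picked term."""
    picked = set(picked_terms)
    return [r for r in records if picked <= r["terms"]]


def browse_by_source(hits):
    """Step 3: group the ids of the hits by the resource they come from."""
    grouped = {}
    for r in hits:
        grouped.setdefault(r["source"], []).append(r["id"])
    return grouped


hits = view_hits(RECORDS, ["marine biome"])
print(browse_by_source(hits))
# Step 4: follow links to the primary data.
print([r["url"] for r in hits])
```

The quality of the hits depends entirely on how consistently the records were tagged, which is why shared ontologies matter for this kind of search.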
The Ontogrator approach depends on the quality of the resources used:
• Data Resources
• Knowledge Organization Systems (KOS)
Can we use this approach to improve both?
Can we complete the virtuous cycle?
Field, et al 2009. Science. 326:234-236.
http://biosharing.org
Conclusions
• The era of genomics is just beginning…
• Self-organization by the scientific community can pay dividends (e.g. consensus building, large-scale co-ordination)
  – Standards are keys to unlocking data
  – Group thinking overcomes the tragedy of the commons
• Emerging key players from the molecular domain – “one stop shops”
  – Genomic Standards Consortium
  – BioSharing – driving cross-community collaborations
Feast of the Mind
Future
• Analysis – proof sharing is beneficial
• Making the field of data sharing more quantitative
  – Objective measures of consensus
  – Useful metrics, e.g. Metadata Coverage Index (MCI)
  – Modelling, e.g. how best to incentivize data sharing?
• Further shared concepts
  – Minimum Information about a Sampling Site (MISS)
  – Minimum Data Policy
  – PubData?
Acknowledgements
Bergen and L4 metagenomics
Jack Gilbert, Sue Huse, Ian Joint, Paul Swift, Paul Somerfield, Rob Knight
NEBC
Bela Tiwari, Tim Booth, Mesude Bicak
CEH
Norman Morrison, Dave Hancock
University of Manchester
Henning Hermjakob, Chris Taylor
European Bioinformatics Institute
Susanna Sansone, Philippe Rocca-Serra, Eamonn Maguire
Oxford University
Genomic Standards Consortium
Peter Sterk
Acknowledgements
Coordination, workshops, working groups,infrastructure and exchange visits
Additional workshop funds
Local Hosts of GSC workshops
Sponsors of GSC 9 and GSC 10
GSC Funding
RCN4GSC