Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

How do we make a standards-compliant data sharing culture functional and efficient? Several data management and sharing policies and plans have emerged; the number of data journals is growing and guidelines to authors for reporting data are being enriched; there are thousands of biological databases and a wealth of community standards. Although funders, journal editors, data producers, consumers and service providers agree in principle that shared, annotated research data and methods offer new discovery opportunities, compliance is challenging in practice. Starting from the genomics domain and extending to other areas of life science, we are looking to highlight the success stories and existing problems. Policies and standards for reproducible research: from theory to practice.

Upload: gigascience-bgi-hong-kong

Post on 27-Jan-2015

DESCRIPTION

Susanna-Assunta Sansone's talk at the Genomic Standards Consortium meeting in Shenzhen, March 6th 2012

TRANSCRIPT

Page 1: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

§  How do we make a standards-compliant data sharing culture functional and efficient?

•  Several data management and sharing policies and plans have emerged; the number of data journals is growing and guidelines to authors for reporting data are being enriched; there are thousands of biological databases and a wealth of community standards

•  Although funders, journal editors, data producers, consumers and service providers agree in principle that shared, annotated research data and methods offer new discovery opportunities, compliance is challenging in practice

§  Starting from the genomics domain and extending to other areas of life science, we are looking to highlight the success stories and existing problems

Policies and standards for reproducible research: from theory to practice

Page 2: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

§  Representatives from stakeholders involved in complete cycle of data

•  from funding and regulation, to production, release and re-use

§  Setting the scene:

•  Susanna-Assunta Sansone, University of Oxford, UK

•  Scott Edmunds, GigaScience BGI Shenzhen, China

§  Funders

•  Rita Colwell, University of Maryland, USA

•  Paula J. Olsiewski, Sloan Foundation

§  Service providers and/or data producers

•  Philippe Rocca-Serra, University of Oxford, UK

•  Folker Meyer, Argonne National Laboratory, USA

•  Srikrishna Subramanian, IMTECH, India

§  Editors

•  Clare Garvey, Genome Biology/BioMed Central

•  Craig Mak, Nature Biotechnology

About this session - speakers

Page 3: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

§  Data management, preservation and sharing policies – view points

•  formulation and enforcement, or

•  uptake and compliance

§  Reporting standards – experiences and challenges

•  evolution of standards, costs of compliance, reward for complying etc.

•  usability of standards when working across disciplines, which all have differing community norms

•  challenges in integrating data types and how standards can help

§  Tackling the challenges – approaches and lessons learned

•  balance needs and expectations (data producers, consumers, reviewers, service providers etc.)

•  potential role of each stakeholder

•  new ways forward

About this session - topics

Page 4: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

the evolving portfolio of data sharing enablers

Susanna-Assunta Sansone, PhD

University of Oxford,

Oxford e-Research Centre, Oxford, UK

http://uk.linkedin.com/in/sasansone

GSC13th, Shenzhen, China, March 5-7, 2012

Page 5: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project

From reusable data to reproducible research

To make the datasets comprehensible, interoperable and reusable, underpinning future investigations, we need common ways to report and share the experimental details and the associated results.

Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.

Page 6: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

A ‘general mobilization’ to develop standards, e.g.:

report the same core, essential information

use the same word and refer to the same ‘thing’

allow data to flow from one system to another
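
As a rough illustration of these three aims (not taken from the talk), the sketch below checks a record against a made-up checklist of required fields, maps free-text synonyms onto one agreed term, and serialises the result as JSON so that another system could read it. The field names, the synonym table and the example record are all hypothetical.

```python
import json

# Hypothetical checklist: the core fields every report must carry.
REQUIRED_FIELDS = {"organism", "sample_source", "assay_type"}

# Hypothetical synonym table: different words, same 'thing'.
PREFERRED_TERM = {
    "e. coli": "Escherichia coli",
    "escherichia coli": "Escherichia coli",
    "seawater": "sea water",
    "sea water": "sea water",
}


def missing_fields(record):
    """Return the required fields that are absent or empty, sorted by name."""
    return sorted(f for f in REQUIRED_FIELDS if not record.get(f))


def normalise(record):
    """Replace known synonyms with the preferred term; leave other values as-is."""
    return {k: PREFERRED_TERM.get(str(v).strip().lower(), v) for k, v in record.items()}


def export(record):
    """Serialise to JSON with sorted keys so any system parses it the same way."""
    return json.dumps(record, sort_keys=True)


# Hypothetical record that lacks one required field and uses informal wording.
report = {"organism": "E. coli", "sample_source": "Seawater"}
print(missing_fields(report))       # ['assay_type']
print(export(normalise(report)))
```

In practice each of the three roles is covered by a different kind of community standard, which is why a real pipeline would draw the checklist, the vocabulary and the exchange format from published resources rather than from inline tables like these.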

Page 7: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

A ‘general mobilization’ to develop standards…..BUT

§  Fragmentation of the standards is a major issue!

•  Being focused on particular communities’ interests, be they individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, the development of (largely arbitrarily) different standards

•  This severely hinders the interoperability of databases and tools and ultimately the integration of datasets

Page 8: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

[Figure: cloud of standard names]

MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE, REMARK, CONSORT, …

MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML, GELML, ISA-Tab, CML, MITAB, …

AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO, TEDDY, PRO, XAO, DO, VO, …

Growing number of reporting standards

Page 9: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Growing number of reporting standards

+ 130 (estimated)

+ 150 (source: MIBBI, EQUATOR)

+ 303 (source: BioPortal)

Databases, annotation, curation tools

[Figure: the cloud of standard names from the previous slide, repeated]

Page 10: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

But how much do we know about these standards?

[Figure: the cloud of standard names from Page 8, repeated]

Page 11: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Which ones are mature enough for me to use or recommend?

I work on plants; are these just for biomedical applications?

What are the criteria to evaluate their status and value?

How can I get involved to propose extensions or modifications?

Which tools and databases implement which standards?

I use high-throughput sequencing technologies; which ones are applicable to me?

But how much do we know about these standards?

Page 12: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Several policy documents and guidelines are inconsistent and/or unclear when recommending the use of standards, e.g.: “…recommend use of appropriate standards… where these exist… mature, stable efforts… MIAME format… standards from accredited standards organizations… deposition to public repositories, supporting these standards…”

Often not much …

Page 13: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing
Page 14: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

2009

Page 15: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Page 16: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

A coherent, curated and searchable catalogue of data sharing resources that (collaboratively) works to:

1. Centralize community-developed bioscience standards and make them discoverable, linking to:

•  data sharing, preservation and management policies

•  other portals, e.g. MIBBI, NCBO’s BioPortal, NIF, BioSiteMaps, OBO Foundry

•  related open access, published material, e.g. BioMed Central, Nature Precedings, F1000

•  tools and databases implementing the standards, e.g. collaboration with NAR Database

2. Identify and maintain a set of (implicit) criteria for assessing usability and popularity of the standards, including:

•  implementations by tools and databases

•  availability of standards-compliant, public datasets

•  relations among standards

3. Foster communication among groups, in particular to:

•  address overlaps and duplication of efforts and enhance interoperability of standards

•  produce ‘best practice’ guidelines for starting new efforts or contributing to existing ones

Ø Will allow stakeholders (funders, journals, service providers and researchers) to make informed decisions on standards
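
To make the three assessment criteria concrete, here is a minimal sketch of what one catalogue record and a simple evidence summary could look like. It is an illustration only, not the actual BioSharing data model: the class, the field names and the placeholder entries ("Standard A", "Database X" and so on) are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class StandardEntry:
    """One catalogue record; every field name here is illustrative."""
    name: str
    implemented_by: list = field(default_factory=list)   # tools and databases
    public_datasets: int = 0                              # standards-compliant public datasets
    related_to: list = field(default_factory=list)        # other standards it relates to


def evidence(entry):
    """Summarise the three listed criteria as simple yes/no evidence."""
    return {
        "has_implementations": bool(entry.implemented_by),
        "has_public_datasets": entry.public_datasets > 0,
        "has_relations": bool(entry.related_to),
    }


def rank(entries):
    """Order entries by how many criteria they meet (most first), then by name."""
    return sorted(entries, key=lambda e: (-sum(evidence(e).values()), e.name))


# Placeholder entries with invented values, not real catalogue content.
catalogue = [
    StandardEntry("Standard C"),
    StandardEntry("Standard B", related_to=["Standard A"]),
    StandardEntry("Standard A", implemented_by=["Database X"],
                  public_datasets=12, related_to=["Standard B"]),
]
print([e.name for e in rank(catalogue)])
```

Counting criteria is deliberately crude; it only shows how links to implementations, datasets and related standards could give a funder or journal something observable to base a recommendation on.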

Page 17: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Page 18: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Over 400 entries (public and in curation)

Page 19: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Smith et al, 2007

Page 20: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Smith et al, 2007

Taylor, Field, Sansone et al, 2008

Page 21: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

List of databases, linked to standards (a collaboration with the NAR Database Issue)

Page 22: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

List of databases, linked to standards (a collaboration with the NAR Database Issue)

Page 23: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

List of databases, linked to standards (a collaboration with the NAR Database Issue)

Page 24: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

The relationship among popular standard formats for pathway information. BioPAX and PSI-MI are designed for data exchange to and from databases and for pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems, and SBGN represents pathway diagrams.

CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.

Define groups and relations among standards
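
One simple way to record such groups, sketched here as an assumption rather than as how the catalogue stores them, is a mapping from each format to its stated purpose, which can then be inverted to list the formats in each group. The purposes are shortened from the caption above; the function names are hypothetical.

```python
# Purposes shortened from the figure caption (Demir et al., 2010).
DESIGNED_FOR = {
    "BioPAX": "data exchange and integration",
    "PSI-MI": "data exchange and integration",
    "SBML": "mathematical simulation",
    "CellML": "mathematical simulation",
    "SBGN": "pathway diagrams",
}


def group_by_purpose(designed_for):
    """Invert the format -> purpose mapping into purpose -> sorted list of formats."""
    groups = {}
    for fmt, purpose in designed_for.items():
        groups.setdefault(purpose, []).append(fmt)
    return {purpose: sorted(fmts) for purpose, fmts in groups.items()}


def same_group(a, b, designed_for=DESIGNED_FOR):
    """True when two formats share a stated purpose."""
    return designed_for[a] == designed_for[b]


for purpose, formats in group_by_purpose(DESIGNED_FOR).items():
    print(f"{purpose}: {', '.join(formats)}")
```

Formats that land in the same group are the natural place to look for overlap or for converters between them, while formats in different groups are more likely to be complementary.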

Page 25: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

E.g. in the genomics context: resources from GSC and other communities…

MIxS, EnvO, EnvO-light, OBI, etc…

GCDML, ISA-Tab, SRAxml, BIOM (data matrices)

INSDC, GOLD, MG-RAST, CAMERA, SILVA, etc…

Disclaimer: draft for illustrative purpose; this is a dynamic environment, work in progress…

Page 30: Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

Acknowledgements:

Philippe Rocca-Serra (University of Oxford)

Eamonn Maguire (University of Oxford)

Annapaola Santarsiero (University of Oxford)

Susanna Sansone (University of Oxford)

Chris Taylor (EMBL-EBI)

Dawn Field (NERC-NEBC)

with contributions from members of our communities and individuals.