ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012


Upload: susanna-assunta-sansone

Post on 27-Jan-2015


Page 1: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Toward interoperable bioscience data

Susanna-Assunta Sansone, PhD

Principal Investigator, Team Leader, University of Oxford e-Research Centre, Oxford, UK

@isatools @biosharing

ISMB 2012, Long Beach, California, USA, July 15-17

ISMB hashtag: #PP44

Highlights Track: Databases and Ontologies

Page 2: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

What is this presentation about?

§  ISA Commons, a grass-roots collaborative that works to facilitate collection, curation and sharing of experiments in an increasingly diverse set of life science domains, using a common, structured representation of the experiments that
•  transcends individual biological and technological domains,
•  follows the appropriate community norms and standards, many listed in the BioSharing catalogue, and
•  is implemented by several curation, storage and data sharing tools

www.biosharing.org

www.isacommons.org

TOWARDS INTEROPERABLE BIOSCIENCE DATA

Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.

Feb 2012 www.isacommons.org

doi:10.1038/ng.1054


Page 3: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project

From reusable data to reproducible research

To make the datasets comprehensible, interoperable and reusable, underpinning future investigations, we need common ways to report and share the experimental details and the associated results.

Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs.


Page 4: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Structured description of datasets

§  Capture all salient features of the experimental workflow

§  Make annotation explicit and discoverable

§  Structure the descriptions for consistency, tracking
•  independent variables
•  dependent variables
using cross references and resolvable identifiers
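As an illustration only, the structured description outlined on this slide could be sketched in Python roughly as follows. The class names, field names and example values are hypothetical and are not taken from the ISA tools; the only real identifier is the NCBI Taxonomy entry for Rattus norvegicus.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AnnotatedTerm:
    """A human-readable label paired with a resolvable identifier."""
    label: str
    identifier: str  # e.g. a CURIE or URI; empty string means "not annotated yet"


@dataclass
class DatasetDescription:
    """Hypothetical container for a structured dataset description."""
    title: str
    # Independent variables: factor name -> the levels used in the experiment.
    independent_variables: dict[str, list[str]] = field(default_factory=dict)
    # Dependent variables: what was measured.
    dependent_variables: list[str] = field(default_factory=list)
    annotations: list[AnnotatedTerm] = field(default_factory=list)
    cross_references: list[str] = field(default_factory=list)

    def unresolved_annotations(self) -> list[AnnotatedTerm]:
        """Return annotations that still lack an identifier."""
        return [a for a in self.annotations if not a.identifier.strip()]


description = DatasetDescription(
    title="Example toxicity study",  # invented example, not a real dataset
    independent_variables={"dose": ["control", "low", "high"]},
    dependent_variables=["transcript abundance", "metabolite concentration"],
    annotations=[
        AnnotatedTerm("Rattus norvegicus", "NCBITaxon:10116"),
        AnnotatedTerm("liver", ""),  # deliberately left without an identifier
    ],
)

missing = [a.label for a in description.unresolved_annotations()]
print(missing)
```

Keeping the identifier next to each label is what makes an annotation explicit and discoverable: a simple check can list the terms that still need one.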

Page 5: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Not too much, not too little, just ‘right’

§  We must strike a balance between
•  depth and breadth of information; and
•  sufficient information required to reuse the data

Page 6: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Example of experiments by InnoMed PredTox, an FP6 public-private consortium

sample characteristic(s)

experimental design

experimental variable(s)

technology(ies)

measurement(s)

protocol(s)

data file(s)

......

Page 7: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

report the same core, essential information

use the same word and refer to the same ‘thing’

allow data to flow from one system to another
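A minimal sketch of what "use the same word and refer to the same ‘thing’" can mean in practice: free-text labels from different sources are mapped to one shared identifier before records are compared. The synonym table and function names are hypothetical; the identifiers are NCBI Taxonomy entries written as CURIEs.

```python
from typing import Optional

# Hypothetical synonym table: free-text labels mapped to one shared identifier.
SYNONYMS = {
    "rat": "NCBITaxon:10116",
    "norway rat": "NCBITaxon:10116",
    "rattus norvegicus": "NCBITaxon:10116",
    "mouse": "NCBITaxon:10090",
    "mus musculus": "NCBITaxon:10090",
    "human": "NCBITaxon:9606",
    "homo sapiens": "NCBITaxon:9606",
}


def normalise(label: str) -> Optional[str]:
    """Map a free-text label to its identifier, ignoring case and extra spaces."""
    key = " ".join(label.lower().split())
    return SYNONYMS.get(key)


def same_thing(label_a: str, label_b: str) -> bool:
    """True only if both labels resolve to the same known identifier."""
    id_a, id_b = normalise(label_a), normalise(label_b)
    return id_a is not None and id_a == id_b


print(same_thing("Rat", "Rattus  norvegicus"))
print(same_thing("rat", "mouse"))
print(normalise("zebrafish"))
```

Labels that are not in the table resolve to nothing rather than to a guess, so two unknown terms are never treated as the same ‘thing’.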

Challenges: different communities, different norms and standards, lack of coordination, fragmentation and uneven coverage…

A ‘general mobilization’ to develop standards, e.g.:

Page 8: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Growing number of reporting standards

+ 130 (Source: MIBBI, EQUATOR): MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT

+ 150 (Estimated): MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB

+ 303 (Source: BioPortal): AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO

Each one focuses on a particular biological or technological domain


Page 9: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

A catalogue to map the landscape of standards: over 400 bio-standards (public and in curation)

Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598

Page 10: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Example of multi-assay study – how many ‘standards’ are applicable to this?


Page 14: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

user community


Page 15: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Metadata tracking framework, designed to support
•  the use of several standards (checklists, terminologies)
•  conversions to (a growing number of) other metadata formats used by public repositories, e.g. MAGE-Tab, Pride-xml, SRA-xml, SOFT

(Currently finalizing conversion to RDF to explore the growing Linked Data universe, in collaboration with the W3C HCLSIG)
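To make the idea of a tabular metadata record and its conversion concrete, here is a rough sketch that writes, reads back and converts a small tab-delimited table. The column names are modelled loosely on ISA-Tab conventions, but this is a simplification and not a spec-compliant ISA-Tab file; the function names, the row values and the target structure are all assumptions made for illustration, and none of this is the ISA software's own API.

```python
import csv
import io

# Column names loosely modelled on ISA-Tab conventions (simplified, assumed).
COLUMNS = [
    "Source Name",
    "Characteristics[organism]",
    "Term Source REF",
    "Term Accession Number",
    "Sample Name",
    "Factor Value[dose]",
]


def write_table(rows):
    """Serialise a list of row dictionaries as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_table(text):
    """Parse tab-delimited text back into a list of row dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter="\t")]


def to_records(rows):
    """Convert rows to a nested structure, standing in for a format conversion."""
    return [
        {
            "sample": row["Sample Name"],
            "organism": {
                "label": row["Characteristics[organism]"],
                "id": f'{row["Term Source REF"]}:{row["Term Accession Number"]}',
            },
            "dose": row["Factor Value[dose]"],
        }
        for row in rows
    ]


rows = [
    {
        "Source Name": "animal-1",  # invented example values
        "Characteristics[organism]": "Rattus norvegicus",
        "Term Source REF": "NCBITaxon",
        "Term Accession Number": "10116",
        "Sample Name": "animal-1.liver",
        "Factor Value[dose]": "high",
    }
]

text = write_table(rows)
parsed = read_table(text)
records = to_records(parsed)
print(text)
print(records)
```

Because the term source and accession travel with each value, a converter can rebuild a full identifier for the target format without consulting the original author.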

Page 16: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

(Rocca-Serra et al, 2010)

a collaborative effort of international research/service groups: University of Oxford, EBI, Harvard School of Public Health, NERC Environmental Bioinformatics Centre, Genomic Standards Consortium, US FDA Center for Bioinformatics, Leibniz Institute of Plant Biochemistry and more…

ISA software suite: supporting standards-compliant experimental annotation and enabling curation at the community level


Page 17: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

empowering researchers to use standards

To mint DOIs


Page 18: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Maguire E, Rocca-Serra P, Sansone SA, Davies J and Chen M. Taxonomy-based Glyph Design -- with a Case Study on Visualizing Workflows of Biological Experiments, IEEE Transactions on Visualization and Computer Graphics, volume 18, 2012 (in press)

Page 19: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Ontology Search and Tagging in Google Spreadsheets


Page 21: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:

•  environmental health
•  environmental genomics
•  metabolomics
•  metagenomics
•  nanotechnology
•  proteomics
•  stem cell discovery
•  systems biology
•  transcriptomics
•  toxicogenomics
•  also by communities working to build a library of cellular signatures

We aim to achieve a common representation of experimental content that transcends individual bioscience domains

Page 22: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Some of the internal projects and some of the public groups/resources (shown as logos), including:

•  Nanotechnology Informatics Working Group
•  Stem Cell Commons

Page 23: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Implementations at Harvard

ISA


Page 24: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Importance of a local community

Implementations at Harvard

Page 25: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Importance of a local community

Implementations at Harvard

data sharing in ISA-Tab


Page 26: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Implementation at the EBI

Page 27: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Data papers

Page 28: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

Nanotechnology Informatics Working Group

Extensions

Page 29: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012

@isatools @biosharing isacommons.org biosharing.org

Page 30: ISA Commons / BioSharing - Susanna-Assunta Sansone - ISMB 2012


Development timeline, 2007 to 2012

Tracks shown: Community involvement and uptake; Core developments; Publications

•  Strawman ISA-Tab spec
•  1st, 2nd and 3rd ISA-Tab workshops
•  Final ISA-Tab spec
•  Database instance at EBI
•  ISA software v1
•  1st public instance: Harvard Stem Cell Discovery Engine
•  RDF format starts
•  Conversions to Pride-XML/SRA-XML/MAGE-Tab and more
•  User workshops/visits start
•  Growing number of systems start to adopt ISA framework
•  Other tools implement ISA-Tab
•  Links to analysis tools start

Publications:

•  ‘Omics data sharing (Science)
•  ISA-Tab and ISA software suite (Bioinformatics)
•  Stem Cell Discovery Engine (NAR)
•  Workshop reports
•  ISA Commons (Nature Genetics)