Bioschemas presentation at ECCB 2016, The Hague


Upload: niall-beard

Post on 21-Jan-2017


Page 1: Bioschemas presentation at ECCB 2016, The Hague

Bioschemas.org

Structured data for Life Sciences using Schema.org

Niall Beard

Scientific Web Technologist, University of Manchester

Page 2: Bioschemas presentation at ECCB 2016, The Hague

ELIXIR: European infrastructure for biological information

Data infrastructure for Europe’s life-science research:

www.elixir-europe.org

@ELIXIREurope

Data

Interoperability

Tools

Compute

Training

Marine metagenomics

Human data

Crop and forest plants

Rare diseases

• 20 Members • 1 Observer

ELIXIR Hub based alongside EMBL-EBI in Hinxton

Page 3: Bioschemas presentation at ECCB 2016, The Hague

• 20 Members • 1 Observer

Page 4: Bioschemas presentation at ECCB 2016, The Hague

FAIR

Findable

Accessible

Interoperable

Reusable

Page 5: Bioschemas presentation at ECCB 2016, The Hague

Finding resources – Search engine index

Resource Resource Resource

Page 6: Bioschemas presentation at ECCB 2016, The Hague

Finding resources – Catalogues

bio.tools

tess.elixir-uk.org

Page 7: Bioschemas presentation at ECCB 2016, The Hague

Discover resources by filtering metadata

Page 8: Bioschemas presentation at ECCB 2016, The Hague

Finding resources – Content Integration platforms

Training Resource

Training Resource

Training Resource

Tool Resource

Tool Resource

Tool Resource

bio.tools

tess.elixir-uk.org

Programmatically aggregated

Page 9: Bioschemas presentation at ECCB 2016, The Hague

Bio.tools XSD

https://github.com/bio-tools/biotoolsxsd

Page 10: Bioschemas presentation at ECCB 2016, The Hague

Metadata model

i.e. Recipe type

Page 11: Bioschemas presentation at ECCB 2016, The Hague
Page 12: Bioschemas presentation at ECCB 2016, The Hague

<div itemscope itemtype="http://schema.org/Recipe">

  <div itemprop="nutrition" itemscope itemtype="http://schema.org/NutritionInformation">
    Nutrition facts: <span itemprop="calories">144 kcal</span>,
  </div>

  Ingredients:
  - <span itemprop="recipeIngredient">800g small new potato</span>
  - <span itemprop="recipeIngredient">3 shallot</span>
  . . .
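
To illustrate how an aggregator might read the microdata on this slide, here is a minimal sketch using only Python's standard library. The names `ItemPropCollector` and `extract_itemprops` are made up for this example, and the sample HTML closes the outer `div` that the slide leaves elided. It ignores `itemscope` nesting, so it is not a substitute for a real microdata parser.

```python
from html.parser import HTMLParser

# Sample markup modelled on the slide; the closing </div> is added here
# so that the snippet is complete (the slide elides the rest of the recipe).
RECIPE_HTML = """
<div itemscope itemtype="http://schema.org/Recipe">
  <div itemprop="nutrition" itemscope itemtype="http://schema.org/NutritionInformation">
    Nutrition facts: <span itemprop="calories">144 kcal</span>,
  </div>
  Ingredients:
  - <span itemprop="recipeIngredient">800g small new potato</span>
  - <span itemprop="recipeIngredient">3 shallot</span>
</div>
"""


class ItemPropCollector(HTMLParser):
    """Collects (itemprop, text) pairs from <span itemprop="..."> elements.

    Hypothetical helper: it does not track itemscope boundaries or nesting.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # itemprop of the span currently being read
        self._buffer = []     # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._current = dict(attrs).get("itemprop")
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.pairs.append((self._current, "".join(self._buffer).strip()))
            self._current = None


def extract_itemprops(html):
    """Return every (property, value) pair found in span elements."""
    collector = ItemPropCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs
```

Calling `extract_itemprops(RECIPE_HTML)` yields the calories value followed by the two ingredients, in document order.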

Page 13: Bioschemas presentation at ECCB 2016, The Hague

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Potato Salad",
  "nutrition": {
    "@type": "NutritionInformation",
    "calories": "144 kcal"
  },
  "recipeIngredient": [
    "800g small new potato",
    "3 shallot"
  ]
  . . .

Page 14: Bioschemas presentation at ECCB 2016, The Hague
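
The JSON-LD form of the recipe markup (page 13) can also be generated rather than hand-written. The sketch below is one possible way to do that in Python; `recipe_jsonld` and `as_script_tag` are hypothetical helper names. It uses schema.org's `nutrition` property with a nested `NutritionInformation` object and a list for `recipeIngredient`.

```python
import json


def recipe_jsonld(name, calories, ingredients):
    """Build a schema.org Recipe description as a plain dictionary."""
    return {
        "@context": "http://schema.org",
        "@type": "Recipe",
        "name": name,
        "nutrition": {
            "@type": "NutritionInformation",
            "calories": calories,
        },
        # A repeated property is expressed as a JSON array.
        "recipeIngredient": list(ingredients),
    }


def as_script_tag(doc):
    """Serialise the dictionary into an embeddable JSON-LD script element."""
    # Escape "</" so the payload cannot close the script element early;
    # "\/" is a legal JSON escape, so the content still parses unchanged.
    payload = json.dumps(doc, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


potato_salad = recipe_jsonld(
    "Potato Salad", "144 kcal", ["800g small new potato", "3 shallot"]
)
print(as_script_tag(potato_salad))
```

The printed element can be pasted into a page's HTML and checked with a structured data testing tool.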
Page 15: Bioschemas presentation at ECCB 2016, The Hague

Search engine readable = optimized

Content Content Content

Schema.org Schema.org Schema.org

Page 16: Bioschemas presentation at ECCB 2016, The Hague

Search engines favour websites containing schema.org markup in their search results

Page 17: Bioschemas presentation at ECCB 2016, The Hague

Content integration aggregation

Training Resource

Training Resource

Training Resource

Schema.org Schema.org Schema.org

tess.elixir-uk.org
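
As a rough illustration of this kind of aggregation, the following sketch pulls JSON-LD blocks out of already-downloaded pages using only the Python standard library. It is not how TeSS itself is implemented; `JsonLdExtractor`, `harvest`, the example URL and the sample training record are all assumptions made for the example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed content of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self.documents = []
        self._in_jsonld = False
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the harvest


def harvest(pages):
    """pages maps a source URL to its HTML; returns one record per JSON-LD object."""
    records = []
    for url, html in pages.items():
        extractor = JsonLdExtractor()
        extractor.feed(html)
        extractor.close()
        for doc in extractor.documents:
            if isinstance(doc, dict):
                records.append(
                    {"source": url, "type": doc.get("@type"), "name": doc.get("name")}
                )
    return records


# Hypothetical provider page carrying one training resource.
SAMPLE_PAGES = {
    "https://example.org/course": (
        '<html><head><script type="application/ld+json">'
        '{"@context": "http://schema.org", "@type": "CreativeWork", '
        '"name": "Introduction to RNA-seq"}'
        "</script></head><body>Course page</body></html>"
    )
}
```

The same `harvest` function works for every provider that embeds schema.org JSON-LD, which is the point of the slide: one parser instead of one scraper per site.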

Page 18: Bioschemas presentation at ECCB 2016, The Hague

Minimum information

Controlled vocabularies

Cardinality

Data model

New properties
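
One way to picture what minimum information, controlled vocabularies and cardinality add on top of plain schema.org is a small profile check. The sketch below is illustrative only: the `PROFILE` contents are invented for the example (including the `eventType` property, which stands in for a "new property") and are not an actual Bioschemas specification.

```python
# Hypothetical profile: which properties are required, whether a property
# may repeat, and which values a controlled vocabulary allows.
PROFILE = {
    "name": {"required": True, "many": False},
    "url": {"required": True, "many": False},
    "keywords": {"required": False, "many": True},
    "eventType": {
        "required": False,
        "many": False,
        "vocabulary": {"course", "workshop", "webinar"},
    },
}


def validate(record, profile):
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for prop, rule in profile.items():
        if prop not in record:
            if rule["required"]:
                problems.append(f"missing required property: {prop}")
            continue
        value = record[prop]
        values = value if isinstance(value, list) else [value]
        # Cardinality: "one" versus "many".
        if not rule["many"] and len(values) > 1:
            problems.append(f"{prop} allows one value, found {len(values)}")
        # Controlled vocabulary, when the profile defines one.
        vocabulary = rule.get("vocabulary")
        if vocabulary is not None:
            for item in values:
                if item not in vocabulary:
                    problems.append(f"{prop} value not in vocabulary: {item}")
    return problems
```

A record with a name, a URL and `eventType` set to `course` passes; a record with two names, no URL and an unknown event type produces three problems.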

Page 19: Bioschemas presentation at ECCB 2016, The Hague

BioSchemas.org

minimal, maximal, extensible

Training materials

Events Organizations

Data

Standards

Software

Minimum information for one content type

Training materials

Events Organizations

Data

Software

Standards

Common properties among content types

Page 20: Bioschemas presentation at ECCB 2016, The Hague

More depth to a broad-reach technology

Depth

DATS

Reach

Page 21: Bioschemas presentation at ECCB 2016, The Hague

Use case 1: TeSS, ELIXIR Training Portal - Aggregates Life Science Training Materials

Page 22: Bioschemas presentation at ECCB 2016, The Hague

Large Training Sites
• Well-formed APIs
• XML Dumps
• RSS feeds

Medium/Small Sites
• No structured data

Page 23: Bioschemas presentation at ECCB 2016, The Hague

The long tail, collections sets and small science

Slide courtesy of Todd Vision, Dryad

Page 24: Bioschemas presentation at ECCB 2016, The Hague

http://www.france-bioinformatique.fr/en/training_material

https://search.google.com/structured-data/testing-tool

Applied Drupal 7 schema.org extension

Took about 2 hours

Included in TeSS in an hour

Page 25: Bioschemas presentation at ECCB 2016, The Hague

Biosamples entry (Diabetic mouse strain)

Diabetes term EFO_0000400

Experimental Factor Ontology

Defined by

isAbout

Courtesy of Tony Burdett and Simon Jupp

Use case 2: Mapping data to ontologies

Page 26: Bioschemas presentation at ECCB 2016, The Hague

Organization
- name

MedicalEntity
- name
- description

MedicalCode
- codeValue
- codingSystem

MedicalCode
- name
- url
- alternateName
- description
- codeValue
- codingSystem
…

CreativeWork
- about
- name
- description
- url
- datePublished
…

Data Term Ontology

Courtesy of Tony Burdett and Simon Jupp

Use case 2: Mapping data to ontologies
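
A possible JSON-LD rendering of this mapping is sketched below in Python. It assumes the data entry is described as a `CreativeWork` whose `about` points to a `MedicalEntity`, which in turn carries a `MedicalCode` through schema.org's `code` property. The helper name `annotate_with_term` and the example URL are hypothetical; the term identifier and ontology name come from the slides.

```python
import json


def annotate_with_term(entry_name, entry_url, term_label, term_id, ontology_name):
    """Describe a data entry and the ontology term it is about."""
    return {
        "@context": "http://schema.org",
        "@type": "CreativeWork",
        "name": entry_name,
        "url": entry_url,
        "about": {
            "@type": "MedicalEntity",
            "name": term_label,
            "code": {
                "@type": "MedicalCode",
                "codeValue": term_id,           # the ontology term identifier
                "codingSystem": ontology_name,  # the ontology that defines it
            },
        },
    }


sample_entry = annotate_with_term(
    entry_name="Diabetic mouse strain",
    entry_url="https://example.org/samples/diabetic-mouse",  # placeholder URL
    term_label="Diabetes",
    term_id="EFO_0000400",
    ontology_name="Experimental Factor Ontology",
)
print(json.dumps(sample_entry, indent=2))
```

Keeping the term identifier and the coding system as separate fields lets an aggregator group entries by ontology term without parsing free text.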

Page 27: Bioschemas presentation at ECCB 2016, The Hague

Use case 3.1: Dataset Markup, Citation

• Dataset Citation
• Mapping to JATS Journal Article Tag Suite Data extension*
• Metadata for data citation

Google, Bing, Yahoo, Yandex

Training materials

Events Organizations

Data

Software

Standards

*Daniel Mietchen et al., Adapting JATS to support data citation, Journal Article Tag Suite Conference (JATS-Con) Proceedings 2015, Bethesda (MD): National Center for Biotechnology Information, 2015.

Page 28: Bioschemas presentation at ECCB 2016, The Hague

Use case 3.2: Dataset Markup, Samples

• Biobank Samples
• Limited number of simple key properties
• Disease, gender, age and sample type, data available
• Cross-walk MIABIS: Minimum Information About BIobank data Sharing

Google, Bing, Yahoo, Yandex

Training materials

Events Organizations

Data

Software

Standards

Cataloging 400 UK Biobanks

Page 29: Bioschemas presentation at ECCB 2016, The Hague

Value for content providers

• More exposure through search engines and portals
  • Favoured in search results

• Low barrier for adoption
  • Embedding schema.org in pages can be done with an off-the-shelf CMS
  • Tools for most frameworks and web scripting languages

• Longevity of Standard
  • Standard is open to the wider community and will survive past funding
  • Less chance of the schema being deprecated after implementation

Page 30: Bioschemas presentation at ECCB 2016, The Hague

Value for content integration platforms

• Good benefits to persuade providers to structure their data

• Lots of tooling available for parsing structured data
  • Many open RDFa, JSON-LD, and microdata parsers available on GitHub

• Wider community engaged in construction
  • Schema.org is a public forum so not limited to just the people you know

• Much more scalable than scraping
  • Bespoke scraping scripts accumulate technical debt

Page 31: Bioschemas presentation at ECCB 2016, The Hague

Development Process

Page 32: Bioschemas presentation at ECCB 2016, The Hague

Acknowledgements

Page 33: Bioschemas presentation at ECCB 2016, The Hague

Acknowledgments

• TeSS: Niall Beard

• BioSharing: SA Sansone, A Gonzalez-Beltran, P McQuilton, P Rocca-Serra

• NIH BD2K bioCADDIE: SA Sansone, A Gonzalez-Beltran, Jeff Grethe

• Community: Premysl Velek

• Event: Martin Cook

• Training materials: Aleksandra Nenadic & Gabriella Rustici

Organization representatives

Group chairs

BioSchemas community

• ELIXIR: Premysl Velek

• Pistoia Alliance: Richard Holland

• GOBLET: Terri Attwood

• BBMRI: Michaela Mayrhofer

• Organization: Richard Holland & Rafael C Jimenez

• Person: Niall Beard

• Standard: A Gonzalez-Beltran & P McQuilton

Page 34: Bioschemas presentation at ECCB 2016, The Hague

Contributors

• Aleksandra Nenadic
• Adam Hospital
• Gabriella Rustici
• Carlos Horro
• Martin Cook
• Niall Beard
• Rafael C Jimenez
• Andy Jenkinson
• Manuel Corpas
• Roberto Preste
• Richard Holland
• Alejandra Gonzalez-Beltran
• Andrew Lonie
• Carole Goble
• Peter McQuilton
• Premysl Velek
• Ian Dunlop
• Jeff Grethe
• Milo Thurston
• Niklas Blomberg
• Isabelle Perseil
• Jaap Heringa
• Jon Ison
• John Hancock
• Simon Jupp
• John (Jack) D. Van Horn
• Ivana Krenkova
• Laura Furlong
• Morris Swertz
• Mateusz Kuzak
• Mario Alberich
• Mark Thompson
• Maria Martin
• Mikael Borg
• Montserrat González
• Norman Morrison
• Núria Queralt-Rosinach
• Olivier Sallou
• Robert Pergl
• Pedro Fernandes
• Yasset Perez-Riverol
• Sarala Wimalaratne
• Nick Juty
• Jose Luis Ambite
• Brane Leskošek
• Celia van Gelder
• Christa Janko
• Christine Staiger
• Dan Brickley
• Daniel Faria
• Dmitry Repchevsky
• Daniel Sobral
• Daniel Vaughan
• Ian Fore
• Frederik Coppens
• Josep Ll. Gelpi
• ChuQiao Gong
• Hedi Peterson
• Hervé Ménager
• Nina Hrtonova
• Pierre Larmande
• Rob Finn
• Renzo Kottmann
• Rodrigo Lopez
• Sameer Velankar
• Sara Light
• Carol Shreffler
• Silvano Squizzato
• Susanna Sansone
• Tony Burdett
• Terri Attwood
• Cath Brooksbank
• Hedi Peterson
• Luc Deltombe
• Michaela Mayrhofer
• Philippe Rocca-Serra

Page 35: Bioschemas presentation at ECCB 2016, The Hague

Upcoming Bioschemas Activities

• Biosoftware description using bio.tools and schema.org - NETTAB, 24th October

• Bioschemas AGM on 8th-9th November in Rothamsted, UK
  • See: https://goo.gl/hu7uYK

• Implementation study proposal being drafted

• Develop more content types for life sciences:
  • Data repository
  • Dataset
  • Sample
  • Phenotype
  • Protein annotations

Page 36: Bioschemas presentation at ECCB 2016, The Hague

http://bioschemas.org

@BioSchemas

Thank you!

Mailing List: [email protected]