wikis, semantic data, and community curation

Post on 22-May-2022

Page 1: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

What are the benefits to scientific research?

Page 2: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Justin Preece

Faculty Research Assistant, Bioinformatics

Jaiswal Lab

Department of Botany and Plant Pathology

Oregon State University (USA)

Pankaj Jaiswal

Assistant Professor

Page 3: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Overview

How can we improve the processes and products of curation?

The wiki concept and the semantic web

Our solution

Plans for additional features

What kinds of curation are we talking about?

Page 4: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Plant Genomics: Functional Associations

Data aggregation in the following areas:

Functional Assignment

Gene Function

Gene Expression

Phenotypes

Gene/genetic Interactions

Literature References

Extended Associations

Alleles

Germplasm

Variation

Protein / DNA Modifications

Genetic Markers

There are challenges to working with large amounts of this data…

Page 5: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Problem No. 1 – The costs of curation

Manual curation (e.g. genomic annotation) can be an expensive, insular activity.

There is "too much data" to annotate effectively under current curatorial practices.

How can we make curation more efficient, more effective, cheaper, and community-driven?

Page 6: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Problem No. 2 – Underutilized semantic data

A cornerstone of semantic data in the –omics community is the construction and use of ontologies to describe data.

Ontology: Formally represents knowledge as a set of concepts within a domain, and the relationships between those concepts.

A lot of effort has been put into building ontologies and annotating genomic data with ontology terms.

Why?

To give research communities a common vocabulary

To enhance the computability of this data

The OBO Foundry

Page 7: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

And how are we doing with this data challenge?

Substantial progress on the creation and use of ontologies as vocabularies

Need more development on the use of ontological relationships to demonstrate computable meaning.

Creation and population of semantic data

Semantic inferencing and reasoning over this data

How do we get more semantic value out of the annotated data that is being associated with ontologies?

Page 8: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Our proposed solution for both problems: a semantic wiki

Key Features

Web application that uses familiar wiki functionality

Structured semantic page content; a more traditional "web form" interface

http://www.wikipedia.org/

Page 9: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

How do wikis enable community curation?

Ease of use

Long-tail theory

Participation rate?

Examples:

Gene Wiki

WikiPathways (http://www.wikipathways.org/index.php/WikiPathways)

In many existing scientific wikis, creating content is easy but interacting with that content as data is not necessarily easy.

http://en.wikipedia.org/wiki/Calreticulin


Page 10: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

How can a semantic wiki help us extract more meaning from our curated data?

Traditional Wiki Pages

Open content or loosely-structured data

Hard to parse for meaning

Semantic Wiki Pages

Data is available as RDF (Resource Description Framework)

Can be queried (“reasoned” over)

Page 11: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

A lightning-fast primer on semantic web theory

Semantic web: A web of data, as opposed to just a web of documents. When possible, the data should describe itself. http://www.w3.org/2001/sw/

Triple: A logical assertion in the form of subject-predicate-object (“Justin wears eyeglasses.” “Arabidopsis thaliana is a plant.”)

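URI (Uniform Resource Identifier): A standard way to identify a resource (a page, a site, or a concept). A URL is a URI that also gives the resource's location on the web. Plain values such as numbers and strings are “literals” and are not identified by URIs.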

Graph: A diagram of triples, or put another way, a chart of resources and their relationships

Reasoning: The act of running a computational algorithm over a graph, either to determine its logical consistency or to glean “truthful” statements from it.

TWO CRITICAL COMPONENTS: Self-descriptive data and its relationships to other data.
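The terms above can be made concrete with a small sketch. It is a toy in-memory graph, not an RDF library: the `Triple` type, the `match` helper, and the `ex:` identifiers are all made up for illustration and stand in for full URIs.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Triple(NamedTuple):
    """One subject-predicate-object assertion."""
    subject: str
    predicate: str
    obj: str


# A graph is simply a set of triples. The "ex:" names are hypothetical
# placeholders for real URIs.
GRAPH = {
    Triple("ex:Justin", "ex:wears", "ex:Eyeglasses"),
    Triple("ex:Arabidopsis_thaliana", "ex:is_a", "ex:Plant"),
    Triple("ex:Oryza_sativa", "ex:is_a", "ex:Plant"),
}


def match(
    graph: Iterable[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple that fits the pattern; None is a wildcard."""
    for triple in graph:
        if subject is not None and triple.subject != subject:
            continue
        if predicate is not None and triple.predicate != predicate:
            continue
        if obj is not None and triple.obj != obj:
            continue
        yield triple


# "Which resources are plants?" expressed as a pattern with one wildcard.
plants = sorted(t.subject for t in match(GRAPH, predicate="ex:is_a", obj="ex:Plant"))
print(plants)
```

A query language such as SPARQL generalizes this kind of pattern matching to several patterns joined by shared variables.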


Page 12: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

How could our semantically-structured data help the research community?

Short-term

Provide data in useful formats that both users and external tools (e.g. SPARQL engines, web services) can access and analyze.

Long-term

Enable direct semantic inferencing on the data: membership, transitivity, relatedness, suggestion, contradiction, etc.

Ok, that’s nice…but what kinds of data will you be hosting?
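As a rough illustration of the membership and transitivity inferences mentioned on this slide, the sketch below walks a tiny `is_a` hierarchy. The term names, gene names, and helper functions are assumptions chosen for the example; they are not taken from the Planteome wiki or from any specific ontology release.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical is_a hierarchy: child term -> set of direct parent terms.
IS_A: Dict[str, Set[str]] = {
    "leaf": {"phyllome"},
    "phyllome": {"plant organ"},
    "plant organ": {"plant structure"},
}

# Hypothetical annotations: gene -> ontology term it was annotated with.
ANNOTATIONS: Dict[str, str] = {
    "gene_A": "leaf",
    "gene_B": "plant organ",
}


def ancestors(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Return all terms reachable from `term` by following is_a links."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in is_a.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def genes_annotated_to(
    term: str, annotations: Dict[str, str], is_a: Dict[str, Set[str]]
) -> List[str]:
    """Genes annotated to `term` directly or to any of its descendants."""
    return sorted(
        gene
        for gene, annotated in annotations.items()
        if annotated == term or term in ancestors(annotated, is_a)
    )


print(genes_annotated_to("plant organ", ANNOTATIONS, IS_A))
```

Because `is_a` is transitive, a gene annotated to "leaf" is also found when asking for "plant organ", even though that statement was never entered by a curator.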

Page 13: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Plant genomics – represented semantically

Functional Assignment

Gene Function

Gene Expression

Phenotypes

Gene/genetic Interactions

Literature References

Extended Associations

Alleles

Germplasm

Variation

Protein / DNA Modifications

Genetic Markers

Ok, that’s also nice…but how are you building this thing?

Page 14: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Our implementation: Planteome Annotation Wiki

Platform: MediaWiki (same as Wikipedia)

Primary Extensions:

Semantic MediaWiki http://semantic-mediawiki.org/

Semantic Forms http://www.mediawiki.org/wiki/Extension:Semantic_Forms

Other extensions:

Data Transfer, External Data

Parser Functions

Semantic Internal Objects

Server prerequisites: LAMP architecture (Linux, Apache, MySQL, PHP)


Page 15: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Semantic MediaWiki & Semantic Forms

In true wiki fashion, everything is still a “page”:

Wiki Categories define page types (Annotations, Publications, etc.)

Semantic Properties are declared by creating a page with that property’s name.

Templates are used just as they are in wikis – to provide formatting instructions for viewing pages

Semantic Forms use wiki markup to define the layout of web forms for data entry (again, on wiki pages)
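To give a feel for what ends up on such a page, here is a small sketch that builds the wikitext for one annotation. It relies only on Semantic MediaWiki's inline `[[Property::value]]` syntax and the standard `[[Category:...]]` link; the function name, the property names ("Has gene symbol", "Has ontology term"), and the values are hypothetical. In practice this markup would usually live inside a template filled in by a form rather than be typed by hand.

```python
from typing import Dict


def render_annotation_page(category: str, properties: Dict[str, str]) -> str:
    """Build wikitext with one semantic annotation per property.

    Each property becomes [[Property name::value]]; the page is then
    assigned to a wiki category that acts as its page type.
    """
    lines = [f"[[{name}::{value}]]" for name, value in properties.items()]
    lines.append(f"[[Category:{category}]]")
    return "\n".join(lines)


# Hypothetical property names and values, for illustration only.
wikitext = render_annotation_page(
    "Annotations",
    {"Has gene symbol": "GeneA", "Has ontology term": "leaf"},
)
print(wikitext)
```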


Page 16: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

How to represent annotation data semantically

Semantic data is stored in additional MySQL tables alongside the native MediaWiki database structure.

Properties function as the predicates in our S-P-O construct described earlier.

Pages (URIs) function as subjects and as objects; literal values (numbers and strings) function as objects only.

The Semantic Internal Objects (SIO) extension allows pages to have other collections of data nested inside of them.

Data Structure Examples
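One way to picture the page, property, and nested-object structure described above is the sketch below. It is a simplified model and not the wiki's actual schema: the class names, the property names, the `Page#n` identifier scheme, and the "Is part of annotation" link property are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TripleTuple = Tuple[str, str, str]


@dataclass
class InternalObject:
    """A nested collection of property values stored inside a page."""
    properties: Dict[str, str]


@dataclass
class Page:
    title: str
    properties: Dict[str, str] = field(default_factory=dict)
    internal_objects: List[InternalObject] = field(default_factory=list)


def to_triples(page: Page, link_property: str = "Is part of annotation") -> List[TripleTuple]:
    """Flatten a page and its nested objects into S-P-O triples."""
    triples: List[TripleTuple] = [
        (page.title, prop, value) for prop, value in page.properties.items()
    ]
    for index, obj in enumerate(page.internal_objects, start=1):
        # Assumed identifier scheme: the page title plus a running number.
        obj_id = f"{page.title}#{index}"
        # Each nested object points back at the page that contains it.
        triples.append((obj_id, link_property, page.title))
        triples.extend((obj_id, prop, value) for prop, value in obj.properties.items())
    return triples


# Hypothetical annotation page with one nested ontology association.
page = Page(
    title="Annotation:GeneA",
    properties={"Has gene symbol": "GeneA"},
    internal_objects=[
        InternalObject({"Has ontology term": "leaf", "Has evidence code": "IEA"})
    ],
)
for triple in to_triples(page):
    print(triple)
```

Nesting keeps related values together (here, a term and its evidence code), which a flat list of page-level properties cannot express when a page carries several such associations.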


Page 17: WIKIS, SEMANTIC DATA, AND COMMUNITY CURATION

Thank you!

Credits

Justin Elser and Palitha Dharmawardhana (Jaiswal Lab): technical support and curatorial advice

Chris Sullivan and the OSU Center for Genome Research & Biocomputing: server hosting and support

The Gene Ontology, Plant Ontology Consortium, Gramene, and TAIR (The Arabidopsis Information Resource): data resources

Yaron Koren: author of Semantic Forms and several other SMW-related extensions

Semantic MediaWiki group (Markus Krötzsch, Jeroen De Dauw, et al.)

Funding for the Plant Ontology Consortium (POC) is provided by the U.S. National Science Foundation (Award #0822201).

Thanks again to Timothy Eyres and Syngenta!