choices, modelling and frankenstein ontologies


Upload: benosteen

Post on 06-May-2015






Page 1: Choices, modelling and Frankenstein Ontologies

BRII Project @ Bristol 21/09/2009

Frankenstein ontologies, and

the models that use them.

Page 2: Choices, modelling and Frankenstein Ontologies

First things first

- what the main issues are with this sort of information.

Which leads quickly to:

- how we internally represent Things and why we do it the way we do.

- the vocabularies we use

- and how we are helping the university to contribute back

Page 3: Choices, modelling and Frankenstein Ontologies

The Problem:

Information about research and the people involved in it changes all the time.

Otherwise, it wouldn't be research.

Page 5: Choices, modelling and Frankenstein Ontologies

Don't forget, Information about Things should always include context.

- Where did the information come from?
- From whom?
- How old is it?
- How valid is it?
- Who can see it?
- When is it valid?

Page 6: Choices, modelling and Frankenstein Ontologies

Ranges from the simple:

“Jane Smith (born Jane Doe) publishes under the name George Maxwell”

They are all names, after all; only the context lets us tell them apart – when to use them and when not to.


Validity and context

Page 7: Choices, modelling and Frankenstein Ontologies

To the precise:

“Jane Doe, 35, married July 1995 to Richard Smith... “

Validity and context

Page 8: Choices, modelling and Frankenstein Ontologies

How we cope

The system that holds the canonical version of the Things metadata does not provide the query technologies we are using at any given time.

This allows for a clear separation of concepts and information from the forms in which we wish to ask questions of them.

Page 9: Choices, modelling and Frankenstein Ontologies

How we cope

We are not bound to any one way of looking at or indexing our data.

Currently, we use RDF to serialise our information in the store, alongside Lucene (Solr) and quad-store (4store) indexes.
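To make the separation concrete, here is a minimal sketch in plain Python. It is not the BRII code: `CanonicalStore`, `TextIndex` and `QuadIndex` are hypothetical names, and the two toy indexes merely stand in for Solr and 4store. The point it illustrates is that the store only holds triples and hands them to whatever indexes are registered, so an index can be added, rebuilt or thrown away without touching the canonical data.

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Indexer = Callable[[str, List[Triple]], None]


class CanonicalStore:
    """Hypothetical canonical store: holds the triples, offers no queries."""

    def __init__(self) -> None:
        self._things: Dict[str, List[Triple]] = {}
        self._indexers: List[Indexer] = []

    def register_indexer(self, indexer: Indexer) -> None:
        self._indexers.append(indexer)

    def put(self, thing_id: str, triples: List[Triple]) -> None:
        # Save first, then tell every registered index about the change.
        self._things[thing_id] = list(triples)
        for indexer in self._indexers:
            indexer(thing_id, self._things[thing_id])

    def reindex(self, indexer: Indexer) -> None:
        # Build a brand-new index from the canonical data alone.
        for thing_id, triples in self._things.items():
            indexer(thing_id, triples)


class TextIndex:
    """Toy stand-in for a full-text index (word -> Thing ids)."""

    def __init__(self) -> None:
        self.words: Dict[str, Set[str]] = {}

    def __call__(self, thing_id: str, triples: List[Triple]) -> None:
        # Note: stale entries from earlier versions are not removed here.
        for _subject, _predicate, obj in triples:
            for word in obj.lower().split():
                self.words.setdefault(word, set()).add(thing_id)

    def search(self, word: str) -> Set[str]:
        return self.words.get(word.lower(), set())


class QuadIndex:
    """Toy stand-in for a quad store (triple plus the Thing it came from)."""

    def __init__(self) -> None:
        self.quads: Set[Tuple[str, str, str, str]] = set()

    def __call__(self, thing_id: str, triples: List[Triple]) -> None:
        for s, p, o in triples:
            self.quads.add((s, p, o, thing_id))

    def match(self, predicate: str) -> Set[Tuple[str, str]]:
        return {(s, o) for s, p, o, _g in self.quads if p == predicate}


store = CanonicalStore()
text_index = TextIndex()
store.register_indexer(text_index)
store.put("info:fedora/ora:1",
          [("info:fedora/ora:1", "foaf:name", "Jane Smith")])

# A second way of asking questions, added after the data was stored.
quad_index = QuadIndex()
store.reindex(quad_index)
```

Because `reindex` reads only from the canonical data, swapping one query technology for another is a rebuild rather than a migration.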

Page 10: Choices, modelling and Frankenstein Ontologies

The basic model (axioms)

● There are Things
● These Things can change over time
● These Things can hold information that is valid in certain contexts
  ● Not just valid at a point in time
● (Things can hold more than just metadata – it's a bag of “stuff”)

Page 11: Choices, modelling and Frankenstein Ontologies

Current Implementation

● Bag of “Stuff” → Object-based storage
  ● Fedora, or
  ● Pairtree FS-based, or
  ● 'Bucket' (Sun Honeycomb, OpenStorage, Amazon S3)
● Contains ROOT and MANIFEST (serialised graphs)
  ● ROOT contains the RDF triples that are globally true (identifiers, birth names, that sort of thing mainly)
  ● MANIFEST contains triples describing the other objects in the bag, and their relationships to other objects/resources in or out of the bag.
● Most importantly, the MANIFEST contains the context of the other parts of the bag.
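The bag layout can be sketched as a small data structure. This is an illustration under assumptions, not the actual storage code: the `Bag` class, its method names and the `thing_id/part_name` identifier scheme are all hypothetical, and triples are plain tuples rather than a real RDF graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class Bag:
    """Hypothetical in-memory picture of one Thing's bag of 'stuff'."""

    thing_id: str
    # ROOT: triples that are globally true about the Thing.
    root: List[Triple] = field(default_factory=list)
    # MANIFEST: triples describing the other parts and their context.
    manifest: List[Triple] = field(default_factory=list)
    # Everything else in the bag (serialised graphs, files, ...).
    parts: Dict[str, bytes] = field(default_factory=dict)

    def add_part(self, name: str, content: bytes,
                 context: Dict[str, str]) -> str:
        # Assumed naming scheme: the part id is the Thing id plus the name.
        part_id = f"{self.thing_id}/{name}"
        self.parts[name] = content
        for predicate, value in context.items():
            self.manifest.append((part_id, predicate, value))
        return part_id

    def context_of(self, name: str) -> Dict[str, str]:
        # The context of a part lives in the MANIFEST, not in the part.
        part_id = f"{self.thing_id}/{name}"
        return {p: o for s, p, o in self.manifest if s == part_id}


bag = Bag("info:fedora/ora:1")
bag.root.append(("info:fedora/ora:1", "dcterms:identifier", "ora:1"))
first_id = bag.add_part(
    "first",
    b"...serialised graph...",
    {"ov:validFrom": "2009-09-19T09:31:45.846982"},
)
```

Keeping the context in the MANIFEST means a part can be an opaque file and still carry provenance and validity information.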

Page 12: Choices, modelling and Frankenstein Ontologies

“Revisions, personas, etc”

● Things can have different information about them which is valid in different situations.
  ● Publication Thing (like an article) can have revisions
  ● Person can have personas (between 1995 and 2003 person X published under 'Dr Jones')
● All things are allowed this capability and the current implementation handles these using named graphs in the bag.
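As a rough illustration of the persona idea, the sketch below picks the names that apply on a given day. The `Persona` class and `names_valid_on` function are hypothetical, and the exact start and end days are assumptions: the slide only says "between 1995 and 2003".

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona: a name plus the period in which it applies."""

    name: str
    valid_from: Optional[date] = None   # None means "no known start"
    valid_until: Optional[date] = None  # None means "still valid"

    def covers(self, day: date) -> bool:
        after_start = self.valid_from is None or self.valid_from <= day
        before_end = self.valid_until is None or day <= self.valid_until
        return after_start and before_end


def names_valid_on(personas: List[Persona], day: date) -> List[str]:
    """Return the names that apply on the given day, in input order."""
    return [p.name for p in personas if p.covers(day)]


personas = [
    # Assumed boundaries for "between 1995 and 2003".
    Persona("Dr Jones", date(1995, 1, 1), date(2003, 12, 31)),
    # A name with no time limits, i.e. the globally true kind.
    Persona("Person X"),
]
```

In the implementation described above each persona would be a named graph in the bag, with its validity recorded in the MANIFEST rather than in Python objects.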

Page 13: Choices, modelling and Frankenstein Ontologies

Current Implementation

● Eg (from a MANIFEST describing a named graph)


<rdf:Description rdf:about="info:fedora/ora:1/first">
  <ov:validUntil rdf:datatype="">2009-09-20T09:31:45.847065</ov:validUntil>
  <ov:validFrom rdf:datatype="">2009-09-19T09:31:45.846982</ov:validFrom>
  <rdf:type rdf:resource=""/>
  <dcterms:created rdf:datatype="">2009-09-21T09:31:45.848289</dcterms:created>
  <foaf:primaryTopic rdf:resource="info:fedora/ora:1"/>
</rdf:Description>


Page 14: Choices, modelling and Frankenstein Ontologies

Current Implementation

● Some of the important context for the graph at “info:fedora/ora:1/first”:

<ov:validUntil [..] >2009-09-20T09:31:45.847065</ov:validUntil>

<ov:validFrom [..] >2009-09-19T09:31:45.846982</ov:validFrom>

<rdf:type rdf:resource=""/>

<foaf:primaryTopic rdf:resource="info:fedora/ora:1"/>
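A minimal sketch of how a client might read that context and decide whether the named graph applies at a given moment. Several things here are assumptions: the `rdf:RDF` wrapper and namespace declarations are added so the fragment parses on its own, the `ov` namespace URI is a guess at what the prefix stands for, the closed validity interval is a choice, and `graph_context` and `is_valid_at` are hypothetical helpers. It uses the standard-library XML parser as a shortcut; a real system would use an RDF library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    # Assumption: the slides do not say which namespace "ov" is bound to.
    "ov": "http://open.vocab.org/terms/",
}

# The values come from the slide; the wrapper element is added here.
MANIFEST_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:ov="http://open.vocab.org/terms/">
  <rdf:Description rdf:about="info:fedora/ora:1/first">
    <ov:validUntil>2009-09-20T09:31:45.847065</ov:validUntil>
    <ov:validFrom>2009-09-19T09:31:45.846982</ov:validFrom>
    <foaf:primaryTopic rdf:resource="info:fedora/ora:1"/>
  </rdf:Description>
</rdf:RDF>
"""


def graph_context(xml_text: str, graph_uri: str) -> dict:
    """Pull the validity window and primary topic for one named graph."""
    root = ET.fromstring(xml_text)
    about = "{%s}about" % NS["rdf"]
    resource = "{%s}resource" % NS["rdf"]
    for desc in root.findall("rdf:Description", NS):
        if desc.get(about) == graph_uri:
            return {
                "valid_from": datetime.fromisoformat(
                    desc.findtext("ov:validFrom", namespaces=NS)),
                "valid_until": datetime.fromisoformat(
                    desc.findtext("ov:validUntil", namespaces=NS)),
                "primary_topic": desc.find(
                    "foaf:primaryTopic", NS).get(resource),
            }
    raise KeyError(graph_uri)


def is_valid_at(context: dict, when: datetime) -> bool:
    # Assumption: both ends of the validity window are inclusive.
    return context["valid_from"] <= when <= context["valid_until"]


context = graph_context(MANIFEST_XML, "info:fedora/ora:1/first")
```

With this context in hand, a query layer can ignore the graph for any question asked about a time outside its one-day window.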

Page 15: Choices, modelling and Frankenstein Ontologies

Contexts to think about

Context is not restricted to serialised graphs!
● dcterms:source
● dcterms:creator
● foaf:depiction
● dcterms:subject
● Geo:*
● Evidence (who stated this assertion with what


Page 16: Choices, modelling and Frankenstein Ontologies

The Dr Frankenstein approach

● “A little from here, a little from there – making sure that the whole works...”
● Foaf
● Bio
● Bibo, Dcterms and DC
● RES – Researcher ontology – Ann Bowtell, Katie
● SKOS and taxonomies like LCSH (Library of Congress Subject Headings)
● Hartig and Zhao's provenance ontology -


Page 17: Choices, modelling and Frankenstein Ontologies

The Dr Frankenstein approach

Not forgetting Dr Frankenstein added bits and pieces of his own devising

● - a home for ontologies, taxonomies, software to be used with them, and information about them

● Activities are afoot to gather domain area taxonomies and to provide simple APIs to maintain these for normal researchers.
  ● Ehumanities
  ● Maths
  ● The HASSET thesaurus (we are in contact with them, but legal uncertainty on their part is holding things up)

Page 18: Choices, modelling and Frankenstein Ontologies

The bottom line is that CERIF is an interchange format – originally conceived to allow commercial management systems to interchange data en masse.

It does have certain design flaws due to its relational database legacy and lowest-common-denominator approach.

Unfortunately, I foresee many people saying 'we need a CERIF system' and contractors giving them just that – a system that uses an interchange format as its datastore format.

The CERIF question

Page 19: Choices, modelling and Frankenstein Ontologies

CERIF will allow data to be shared with a similar system (IMHO it will be like sharing a SQL dump between versions of WordPress)

Linked Data starts with the premise that it is sharing the data already with anyone with a


The CERIF question