The Semantic Web: New-style data-integration (and how it works for life-scientists too!)
DESCRIPTION
The Semantic Web: New-style data-integration (and how it works for life-scientists too!). Frank van Harmelen, AI Department, Vrije Universiteit Amsterdam. What’s the problem? (data-mess in bio-inf). Kenneth Griffiths and Richard Resnick, Tutorial at Intelligent Systems for Molecular Biology, 2003.
TRANSCRIPT
![Page 1: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/1.jpg)
The Semantic Web: New-style data-integration
(and how it works for life-scientists too!)
Frank van Harmelen
AI Department
Vrije Universiteit Amsterdam
![Page 2: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/2.jpg)
What’s the problem?
(data-mess in bio-inf)
![Page 3: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/3.jpg)
Life Science Data
Recent focus on genetic data
“genomics: the study of genes and their function. Recent advances in genomics are bringing about a revolution in our understanding of the molecular mechanisms of disease, including the complex interplay of genetic and environmental factors. Genomics is also stimulating the discovery of breakthrough healthcare products by revealing thousands of new biological targets for the development of drugs, and by giving scientists innovative ways to design new drugs, vaccines and DNA diagnostics. Genomics-based therapeutics include "traditional" small chemical drugs, protein drugs, and potentially gene therapy.”
The Pharmaceutical Research and Manufacturers of America - http://www.phrma.org/genomics/lexicon/g.html
Study of genes and their function
Understanding molecular mechanisms of disease
Development of drugs, vaccines, and diagnostics
Kenneth Griffiths and Richard Resnick, Tutorial at Intelligent Systems for Molecular Biology, 2003
![Page 4: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/4.jpg)
The Study of Genes...
• Chromosomal location
• Sequence
• Sequence Variation
• Splicing
• Protein Sequence
• Protein Structure
![Page 5: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/5.jpg)
… and Their Function
• Homology
• Motifs
• Publications
• Expression
• HTS
• In Vivo/Vitro Functional Characterization
![Page 6: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/6.jpg)
Understanding Mechanisms of Disease
Metabolic and regulatory pathway induction
![Page 7: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/7.jpg)
Development of Drugs, Vaccines, Diagnostics
Differing types of Drugs, Vaccines, and Diagnostics
• Small molecules
• Protein therapeutics
• Gene therapy
• In vitro, In vivo diagnostics
Development requires
• Preclinical research
• Clinical trials
• Long-term clinical research
All of which often feeds back into ongoing Genomics research and discovery.
![Page 8: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/8.jpg)
The Industry’s Problem
Too much unintegrated data:
– from a variety of incompatible sources
– no standard naming convention
– each with a custom browsing and querying mechanism (no common interface)
– and poor interaction with other data sources
![Page 9: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/9.jpg)
What are the Data Sources?
• Flat Files
• URLs
• Proprietary Databases
• Public Databases
• Data Marts
• Spreadsheets
• Emails
• …
![Page 10: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/10.jpg)
Sample Problem: Hyperprolactinemia
Overproduction of prolactin
– prolactin stimulates mammary gland development and milk production
Hyperprolactinemia is characterized by:
– inappropriate milk production
– disruption of menstrual cycle
– can lead to conception difficulty
![Page 11: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/11.jpg)
Understanding transcription factors for prolactin production
“Show me all genes in the public literature that are putatively related to hyperprolactinemia, have more than 3-fold expression differential between hyperprolactinemic and normal pituitary cells, and are homologous to known transcription factors.”
Q1 (SEQUENCE): “Show me all genes that are homologous to known transcription factors”
Q2 (EXPRESSION): “Show me all genes that have more than 3-fold expression differential between hyperprolactinemic and normal pituitary cells”
Q3 (LITERATURE): “Show me all genes in the public literature that are putatively related to hyperprolactinemia”
(Q1 ∩ Q2 ∩ Q3)
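The combined question on this slide is the conjunction of three sub-queries, each answered by a different kind of source (sequence, expression, literature). A minimal sketch of that final combination step, assuming each source has already returned a set of gene identifiers; the gene names and the `integrate` helper are hypothetical placeholders, not data or code from the talk:

```python
# Hypothetical result sets: each source answers one sub-query with gene symbols.
# The gene names are illustrative placeholders, not real findings.
q1_sequence = {"GENE_A", "GENE_B", "GENE_C"}    # homologous to known transcription factors
q2_expression = {"GENE_B", "GENE_C", "GENE_D"}  # more than 3-fold expression differential
q3_literature = {"GENE_C", "GENE_B", "GENE_E"}  # related to hyperprolactinemia in the literature


def integrate(*result_sets):
    """Return the genes present in every result set, sorted for stable output."""
    if not result_sets:
        return []
    common = set(result_sets[0])
    for results in result_sets[1:]:
        common &= set(results)
    return sorted(common)


candidates = integrate(q1_sequence, q2_expression, q3_literature)
print(candidates)
```

The intersection is only meaningful when all three sources name the same gene in the same way, which is exactly the "no standard naming convention" problem listed earlier in the deck.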
![Page 12: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/12.jpg)
The Complexity of Biological Data
![Page 13: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/13.jpg)
Source: PhRMA & FDA 2003
Pharmaceutical Productivity
![Page 14: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/14.jpg)
Stitching this all together by hand?
Source: Stephens et al. J Web Semantics 2006
![Page 15: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/15.jpg)
The Medical Tower of Babel
• MeSH: Medical Subject Headings, National Library of Medicine; 22.000 descriptions
• EMTREE: commercial (Elsevier), drugs and diseases; 45.000 terms, 190.000 synonyms
• UMLS: integrates 100 different vocabularies
• SNOMED: 200.000 concepts, College of American Pathologists
• Gene Ontology: 15.000 terms in molecular biology
• NCI Cancer Ontology: 17,000 classes (about 1M definitions)
![Page 16: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/16.jpg)
Problem with the Current WWW
![Page 17: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/17.jpg)
Why would Semantic Web
technology help?
![Page 18: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/18.jpg)
machine accessible meaning (What it’s like to be a machine)
<name>
<symptoms>
<drug>
<drugadministration>
<disease>
<treatment>
IS-A
alleviates
META-DATA
![Page 19: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/19.jpg)
What is meta-data?
• it's just data
• it's data describing other data
• it's meant for machine consumption
disease
name
symptoms
drug
administration
![Page 20: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/20.jpg)
Required are:
1. one or more standard vocabularies, so search engines, producers and consumers all speak the same language
2. a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached
mechanisms for attribution and trust
– is this page really about Pamela Anderson?
![Page 21: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/21.jpg)
no shared understanding
Conceptual and terminological confusion
Actors: both humans and machines
Agree on a conceptualization
Make it explicit in some language.
world
concept
language
What are ontologies & what are they used for
![Page 22: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/22.jpg)
standard vocabularies (“Ontologies”)
• Identify the key concepts in a domain
• Identify a vocabulary for these concepts
• Identify relations between these concepts
• Make these precise enough so that they can be shared between
– humans and humans
– humans and machines
– machines and machines
![Page 23: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/23.jpg)
Shared content-vocabularies: Ontologies
“Formal, explicit specification of a shared conceptualisation”
• Formal: machine processable
• explicit specification: concepts, properties, relations, functions
• shared: consensual knowledge
• conceptualisation: abstract model of some domain
![Page 24: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/24.jpg)
Real life examples
• handcrafted
– music: CDnow (2410/5), MusicMoz (1073/7)
– biomedical: SNOMED (200k), GO (15k), Emtree (45k+190k), Systems biology
• ranging from lightweight (Yahoo, UNSPSC, Open directory (400k)) to heavyweight (Cyc (300k))
• ranging from small (METAR) to large (UNSPSC)
![Page 25: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/25.jpg)
Biomedical ontologies (a few..)
• MeSH: Medical Subject Headings, National Library of Medicine; 22.000 descriptions
• EMTREE: commercial (Elsevier), drugs and diseases; 45.000 terms, 190.000 synonyms
• UMLS: integrates 100 different vocabularies
• SNOMED: 200.000 concepts, College of American Pathologists
• Gene Ontology: 15.000 terms in molecular biology
• NCI Cancer Ontology: 17,000 classes (about 1M definitions)
![Page 26: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/26.jpg)
What’s inside an ontology?
• terms + specialisation hierarchy
• classes + class-hierarchy
• instances
• slots/values
• inheritance (multiple? defaults?)
• restrictions on slots (type, cardinality)
• properties of slots (symm., trans., …)
• relations between classes (disjoint, covers)
• reasoning tasks: classification, subsumption
Increasing semantic “weight”
![Page 27: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/27.jpg)
NB: we’re not doing philosophy
Ontologies are not
– definitive descriptions of what exists in the world (= philosophy)
Ontologies are
– models of the world
– constructed to facilitate communication
Yes, ontologies exist (because we build them)
![Page 28: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/28.jpg)
Remember “required are”:
one or more standard vocabularies, so search engines, producers and consumers all speak the same language
2. a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached
![Page 29: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/29.jpg)
Stack of languages
![Page 30: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/30.jpg)
Stack of languages
• XML: Surface syntax, no semantics
• XML Schema: Describes structure of XML documents
• RDF: Datamodel for “relations” between “things”
• RDF Schema: RDF Vocabulary Definition Language
• OWL: A more expressive Vocabulary Definition Language
![Page 31: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/31.jpg)
RDF Triples in Life Sciences
![Page 32: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/32.jpg)
Bluffer’s guide to RDF (1)
• Object --Attribute-> Value triples
• objects are web-resources
• Value is again an Object:
– triples can be linked
– data-model = graph
pers05 --Author-of-> ISBN...
ISBN... --Publ-by-> MIT
pers05 --Author-of-> ISBN... --Publ-by-> MIT
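Because the value of one triple can be the object of another, a set of triples forms a graph that can be walked. A small sketch of that idea in plain Python, using the identifiers from this slide; the `values` and `follow` helpers are illustrative names, not part of any RDF library:

```python
# Triples as (object, attribute, value) tuples; identifiers follow the slide.
# "ISBN..." is left elided exactly as in the source.
triples = [
    ("pers05", "Author-of", "ISBN..."),
    ("ISBN...", "Publ-by", "MIT"),
]


def values(graph, subject, attribute):
    """All values v such that (subject, attribute, v) is in the graph."""
    return [v for (s, a, v) in graph if s == subject and a == attribute]


def follow(graph, start, path):
    """Follow a chain of attributes from start, returning every node reached."""
    frontier = [start]
    for attribute in path:
        frontier = [v for node in frontier for v in values(graph, node, attribute)]
    return frontier


publishers = follow(triples, "pers05", ["Author-of", "Publ-by"])
print(publishers)
```

Following `Author-of` and then `Publ-by` from `pers05` answers "who publishes what this person wrote" without either triple mentioning the other.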
![Page 33: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/33.jpg)
Bluffer’s guide to RDF (2)
• Every identifier is a URL = world-wide unique naming!
• Has XML syntax
• Any statement can be an object
– graphs can be nested
NYT --claims-> (pers05 --Author-of-> ISBN...)
<rdf:Description rdf:about="#pers05">
  <authorOf>ISBN...</authorOf>
</rdf:Description>
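To make the XML syntax concrete, here is a rough sketch that reads a description of this shape back into triples using only the Python standard library. It assumes the snippet is wrapped in an `rdf:RDF` root with namespace declarations; the `example.org` namespace and the helper names are hypothetical, and this handles only this one simple pattern rather than RDF/XML in general:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's snippet wrapped in an rdf:RDF root. The example.org namespace
# is a hypothetical placeholder; "ISBN..." stays elided as in the source.
DOCUMENT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/terms#">
  <rdf:Description rdf:about="#pers05">
    <authorOf>ISBN...</authorOf>
  </rdf:Description>
</rdf:RDF>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_triples(xml_text):
    """Return (subject, property, value) for each child of each rdf:Description."""
    root = ET.fromstring(xml_text.strip())
    found = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            found.append((subject, local_name(prop.tag), (prop.text or "").strip()))
    return found


print(extract_triples(DOCUMENT))
```

A dedicated RDF parser would also resolve relative identifiers such as `#pers05` against the document URL and handle nested and typed nodes, which this sketch deliberately skips.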
![Page 34: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/34.jpg)
What does RDF Schema add?
• Defines vocabulary for RDF
• Organizes this vocabulary in a typed hierarchy
• Class, subClassOf, type
• Property, subPropertyOf
• domain, range
[Diagram: class hierarchy with Teacher subClassOf Person and Student subClassOf Person; property supervises with a domain and a range; instances Frank and Marta, each with a type, linked by supervises]
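What the schema vocabulary buys is inference: from a single data triple plus the schema, a machine can derive types nobody wrote down. The sketch below applies three RDFS-style rules (domain, range, subClassOf) until nothing new appears. Which of Frank and Marta is the Teacher or the Student, and which classes serve as domain and range of `supervises`, are assumptions made for the illustration, since the flattened diagram does not say:

```python
# Schema and data as triples. Which of Frank/Marta is Teacher/Student, and the
# domain/range of "supervises", are assumptions made for this illustration.
schema = {
    ("Teacher", "subClassOf", "Person"),
    ("Student", "subClassOf", "Person"),
    ("supervises", "domain", "Teacher"),
    ("supervises", "range", "Student"),
}
data = {("Frank", "supervises", "Marta")}


def rdfs_closure(triples):
    """Apply three RDFS-style rules until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # domain: (p domain C), (x p y)  =>  (x type C)
                if p2 == "domain" and s2 == p:
                    new.add((s, "type", o2))
                # range: (p range C), (x p y)  =>  (y type C)
                if p2 == "range" and s2 == p:
                    new.add((o, "type", o2))
                # subClassOf: (x type C), (C subClassOf D)  =>  (x type D)
                if p == "type" and p2 == "subClassOf" and s2 == o:
                    new.add((s, "type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


facts = rdfs_closure(schema | data)
print(sorted(t for t in facts if t[1] == "type"))
```

Under these assumptions the only stated fact is that Frank supervises Marta, yet the closure also contains that Frank is a Teacher, Marta is a Student, and both are Persons.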
![Page 35: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/35.jpg)
Stack of languages
• XML: Surface syntax, no semantics
• XML Schema: Describes structure of XML documents
• RDF: Datamodel for “relations” between “things”
• RDF Schema: RDF Vocabulary Definition Language
• OWL: A more expressive Vocabulary Definition Language
![Page 36: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/36.jpg)
OWL: things RDF Schema can’t do
• equality
• enumeration
• number restrictions
– Single-valued/multi-valued
– Optional/required values
• inverse, symmetric, transitive
• boolean algebra
– Union, complement…
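Three of the items above (inverse, symmetric, transitive) are declarations about properties, and their effect is again extra derived triples. A minimal sketch of how a reasoner might saturate a triple set under such declarations; the function, the property names and the individuals are all hypothetical and chosen only for illustration, and this is not how any particular OWL reasoner is implemented:

```python
def apply_property_axioms(triples, symmetric=(), transitive=(), inverse_of=None):
    """Saturate a set of triples under symmetric, transitive and inverse axioms."""
    inverse_of = dict(inverse_of or {})
    # An inverse declaration works in both directions.
    inverse_of.update({q: p for p, q in list(inverse_of.items())})
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in transitive:
                for (s2, p2, o2) in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts
        facts |= new


# Hypothetical property names and individuals, chosen only for illustration.
base = {
    ("geneA", "homologousTo", "geneB"),
    ("nucleus", "partOf", "cell"),
    ("cell", "partOf", "tissue"),
    ("drugX", "alleviates", "diseaseY"),
}
closed = apply_property_axioms(
    base,
    symmetric={"homologousTo"},
    transitive={"partOf"},
    inverse_of={"alleviates": "alleviatedBy"},
)
print(sorted(closed - base))
```

Each declaration contributes one derived triple here: the reversed homology, the part-of link that skips the intermediate cell, and the `alleviatedBy` statement read from the disease's side.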
![Page 37: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/37.jpg)
OWL: more expressivity
Full, DL, Lite
• OWL Full
– Allow meta-classes etc
• OWL DL
– Negation
– Disjunction
– Full Cardinality
– Enumerated types
• OWL Lite
– (sub)classes, individuals
– (sub)properties, domain, range
– conjunction
– (in)equality
– cardinality 0/1
– datatypes
– inverse, transitive, symmetric
– hasValue
– someValuesFrom
– allValuesFrom
• RDF Schema
![Page 38: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/38.jpg)
Remember “required are”:
one or more standard vocabularies, so search engines, producers and consumers all speak the same language
a standard syntax, so meta-data can be recognised as such
3. lots of resources with meta-data attached
![Page 39: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/39.jpg)
Question: who writes the ontologies?
• Professional bodies, scientific communities, companies, publishers, …
– See previous slide on Biomedical ontologies
– Same developments in many other fields
• Good old fashioned Knowledge Engineering
• Convert from DB-schema, UML, etc.
![Page 40: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/40.jpg)
Question: Who writes the meta-data?
- Automated learning
- shallow natural language analysis
- Concept extraction
Example: Encyclopedia Britannica on “Amsterdam”
[Figure: extracted concepts: amsterdam, trade, antwerp, europe, merchant, city, town, center, netherlands]
![Page 41: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/41.jpg)
Question: Who writes the meta-data?
• exploit existing legacy-data
– Amazon
– Lab equipment?
• side-effect from user interaction
– MIT Lab photo-annotator
• NOT from manual effort
• Web 2.0 community/social interaction
![Page 42: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/42.jpg)
Remember “required are”:
one or more standard vocabularies, so search engines, producers and consumers all speak the same language
a standard syntax, so meta-data can be recognised as such
lots of resources with meta-data attached
![Page 43: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/43.jpg)
Some working examples?
• DOPE
• HCLS (http://www.w3.org/2001/sw/hcls/)
![Page 44: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/44.jpg)
DOPE: Background
• Vertical Information Provision
– Buy a topic instead of a Journal!
– Web provides new opportunities
• Business driver: drug development
– Rich, information-hungry market
– Good thesaurus (EMTREE)
![Page 45: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/45.jpg)
The Data
• Document repositories:
– ScienceDirect: approx. 500.000 fulltext articles
– MEDLINE: approx. 10.000.000 abstracts
• Extracted Metadata
– The Collexis Metadata Server: concept-extraction ("semantic fingerprinting")
• Thesauri and Ontologies
– EMTREE: 60.000 preferred terms, 200.000 synonyms
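Concept extraction against a thesaurus of preferred terms and synonyms can be pictured as mapping every surface form found in a text to its preferred term and counting the hits. The sketch below is a deliberately naive illustration of that idea, not the Collexis algorithm; the thesaurus entries are hypothetical and are not taken from EMTREE:

```python
import re
from collections import Counter

# Toy thesaurus: preferred term -> synonyms. The entries are hypothetical
# examples and are not taken from EMTREE.
THESAURUS = {
    "prolactin": ["lactotropin", "luteotropic hormone"],
    "hyperprolactinemia": ["hyperprolactinaemia"],
    "pituitary gland": ["hypophysis", "pituitary"],
}


def build_index(thesaurus):
    """Map every surface form (preferred term or synonym) to its preferred term."""
    index = {}
    for preferred, synonyms in thesaurus.items():
        for form in [preferred, *synonyms]:
            index[form.lower()] = preferred
    return index


def fingerprint(text, thesaurus):
    """Count occurrences of thesaurus concepts in text, trying longer forms first."""
    index = build_index(thesaurus)
    forms = sorted(index, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE)
    return Counter(index[m.group(1).lower()] for m in pattern.finditer(text))


abstract = ("Hyperprolactinaemia reflects excess lactotropin secreted by the "
            "hypophysis; prolactin levels in pituitary cells were elevated.")
print(fingerprint(abstract, THESAURUS))
```

Because synonyms collapse onto preferred terms, two documents that use different words for the same concept end up with comparable fingerprints, which is what makes the extracted metadata usable for search.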
![Page 46: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/46.jpg)
Architecture:
[Diagram labels: Query interface; RDF Schema; EMTREE; RDF Datasource 1 … RDF Datasource n]
![Page 47: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/47.jpg)
Architecture:
[Diagram labels: GUI: Spectacle (Aduna); Java Client; http requests; Mediator: Sesame (Aduna); SOAP; Metadata Server (Collexis); EMTREE Thesaurus (RDFS); Document Model (RDFS); Source Model (RDF); SeRQL; Additional Source of Data; Source Model (RDF); SeRQL; Gene Thesaurus (RDFS)]
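The general shape of this architecture (one mediator, a shared thesaurus, several RDF sources) can be sketched in a few lines. This is only a toy model under stated assumptions: it does not use Sesame or SeRQL, and the thesaurus fragment, the source contents and the `expand` and `mediate` helpers are all hypothetical:

```python
# Hypothetical thesaurus fragment: broader term -> narrower terms.
NARROWER = {
    "hormone": ["prolactin", "insulin"],
    "prolactin": [],
    "insulin": [],
}

# Two hypothetical sources, each a list of (document, "about", concept) triples.
SOURCE_1 = [("doc1", "about", "prolactin"), ("doc2", "about", "aspirin")]
SOURCE_N = [("doc3", "about", "insulin"), ("doc4", "about", "hormone")]


def expand(term, narrower):
    """Return the term plus all of its transitive narrower terms."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(narrower.get(current, []))
    return seen


def mediate(term, sources, narrower):
    """Answer one concept query over every source, using thesaurus expansion."""
    concepts = expand(term, narrower)
    hits = set()
    for source in sources:
        hits.update(doc for (doc, prop, concept) in source
                    if prop == "about" and concept in concepts)
    return sorted(hits)


print(mediate("hormone", [SOURCE_1, SOURCE_N], NARROWER))
```

The point of the design is that adding another data source means adding one more list of triples that uses the same thesaurus terms; the query side does not change.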
![Page 48: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/48.jpg)
![Page 49: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/49.jpg)
![Page 50: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/50.jpg)
![Page 51: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/51.jpg)
![Page 52: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/52.jpg)
![Page 53: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/53.jpg)
![Page 54: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/54.jpg)
![Page 55: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/55.jpg)
![Page 56: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/56.jpg)
![Page 57: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/57.jpg)
![Page 58: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/58.jpg)
![Page 59: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/59.jpg)
Summarising…
• Data integration on the Web:
– machine processable data besides human processable data
• Syntax for meta-data
– XML (not much meaning)
– RDF (some meaning)
– RDF Schema (some meaning)
– OWL (more meaning)
• Vocabularies for meta-data
– Lots of them in bio-inf.
• Actual meta-data:
– Lots in bio-inf.
• Will enable:
– Better search engines (recall, precision, concepts)
– Combining information across pages (inference)
– …
![Page 60: The Semantic Web: New-style data-integration (and how it works for life-scientists too!)](https://reader030.vdocuments.us/reader030/viewer/2022032805/568133ef550346895d9ae2f2/html5/thumbnails/60.jpg)
Things to do for you
• Practical: Use existing software to construct new use-scenarios
• Conceptual: Create an ontology for some area of bio-medical expertise
– from scratch
– as a refinement of an existing ontology
• Technical: Transform an existing data-set into meta-data format, and provide a query interface (for humans and machines)