www.eagle-i.org
eagle-i: a national network of biomedical research resources
Cambridge Semantic Web Meetup, June 2011
Daniela Bourges-Waldegg eagle-i system architect, on behalf of the eagle-i Consortium
Outline
Introduction and motivation
• The eagle-i consortium and network
• Why eagle-i?
The eagle-i architecture and software stack
• Layered ontology model
• Ontology-driven development
Challenges of producing and consuming linked data
Concluding remarks
eagle-i Consortium: a national network
9 institutions diverse in geography, culture and resources
Why eagle-i? the problem
Researcher A: Starting new project
needs:
1. Expertise (technical skill set) ✔
2. Knowledge (understanding of domain) ✔
3. Material Resources (plasmids, antibodies, organisms, equipment, services…)
• Create: start now, control quality; costs time and money
• Purchase: fast and easy; costly, may not be available
• Borrow/Collaborate: free, faster than remaking, collaborative; uncertainty
Why eagle-i? the problem
Researcher B: Finishing a project
Has produced:
1. Expertise
2. Knowledge
3. Material Resources
Next Project, Publications
Options for the material resources:
1. Deep Freeze
• always have it
• never know where to find it
2. Toss
• reduce clutter
• save on space and energy
• gone for good, may need it again
3. Organize
• always have it
• always find it
• easily share/collaborate
• save time and money in the long run
• takes time in the short run
The goal of eagle-i
Provide a mechanism that connects researchers who need resources with researchers who have them.
Reduce redundancy in resource development.
Connect researchers with resources they don’t yet know they need.
The eagle-i architecture
[Diagram components: JSU Data Center; eagle-i ontology; Search Application; Federated Network (SPIN); Repository (RDF); Data Tools; NIF, PubMed, Entrez Gene, etc.]
eagle-i design principles
Ontology-centric architecture
• Data collection and search user interfaces driven by ontology
• Repository performs certain types of ontology-based reasoning
• ETL components transform data to ontology-conformant instances
Why?
• Applications can seamlessly adapt to ontology evolution without code changes
Data is stored as RDF and follows Linked Open Data principles
• Query any eagle-i repository via a SPARQL endpoint
• All eagle-i resource instances are linkable (an instance is simply a URI)
Why?
• Storage model best-adapted to ontology-conformant data
• Flexibility, extensibility
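To make the SPARQL endpoint principle concrete, here is a minimal sketch of how a client might build a query request for a repository. The endpoint URL and the class URI are placeholders, not real eagle-i addresses, and the sketch assumes the endpoint accepts GET requests with a `query` parameter as described in the SPARQL 1.1 Protocol. It only builds the request URL and does not contact any server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL; the real path of an eagle-i repository may differ.
ENDPOINT = "https://repository.example.org/sparql"

# Hypothetical class URI standing in for an ontology type such as "antibody".
ANTIBODY_TYPE = "http://example.org/ontology#Antibody"


def build_query(type_uri: str, limit: int = 10) -> str:
    """Return a SPARQL SELECT listing instances of type_uri with their labels."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?resource ?label WHERE {\n"
        f"  ?resource a <{type_uri}> ;\n"
        "            rdfs:label ?label .\n"
        "}\n"
        f"LIMIT {limit}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request URL with a 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = build_query(ANTIBODY_TYPE, limit=5)
url = build_request_url(ENDPOINT, query)

# Round-trip check: decoding the URL yields exactly the query text that was built.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(decoded == query)
```

Because every instance is a URI, each `?resource` value returned by such a query could itself be dereferenced or used as the subject of a follow-up query.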
The eagle-i software stack
[Diagram components: Data collection clients; Data tools; Search Application; eagle-i ontology; REST API; Sesame RDF store]
The eagle-i software stack
[Diagram components, detailed view]
• Data tools: Data collection webapp (GWT); Data management webapp (GWT); ETL
• Search Application: Lucene; Search UI (GWT)
• eagle-i ontology
o Application-specific Ontologies: eagle-i-app-dataTools.owl, eagle-i-app.owl
o Ontology Memory Model: EIOntModel API, Jena/Pellet
o Domain Ontologies: ero.owl, mesh-diseases.owl, ro.owl, iao.owl, Bfo.owl, etc.
• Sesame RDF store
eagle-i data collection tool
[Screenshot callouts]
• Type browser: allows navigation of an ontology branch
• eagle-i primary types
• Object property: ontology term
• Object property: instance list
• Embedded instance
• Required property
• Datatype property
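The callouts above distinguish several kinds of property (datatype, object property pointing at an ontology term, object property pointing at an instance list, required). As a rough illustration of what "driven by ontology" can mean for a form, the sketch below derives form fields from property descriptions instead of hard-coding them. It is not the eagle-i implementation: `PropertySpec`, the property names, and the widget names are all hypothetical, and the property list is written by hand where a real tool would read it from the ontology.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PropertySpec:
    """One property of an ontology class, as a UI layer might see it (hypothetical)."""
    name: str
    kind: str          # "datatype", "object_term", or "object_instance"
    required: bool = False


# Hand-written stand-in for what would be read from the ontology (assumption).
ANTIBODY_PROPERTIES: List[PropertySpec] = [
    PropertySpec("label", "datatype", required=True),
    PropertySpec("target antigen", "object_term"),
    PropertySpec("located in laboratory", "object_instance", required=True),
]

# Mapping from property kind to the widget a form would render (assumption).
WIDGET_FOR_KIND: Dict[str, str] = {
    "datatype": "text box",
    "object_term": "ontology term picker",
    "object_instance": "instance list",
}


def build_form(properties: List[PropertySpec]) -> List[str]:
    """Derive form field descriptions from property specs, required fields first."""
    ordered = sorted(properties, key=lambda p: not p.required)
    return [
        f"{p.name}: {WIDGET_FOR_KIND[p.kind]}" + (" (required)" if p.required else "")
        for p in ordered
    ]


def missing_required(properties: List[PropertySpec], values: Dict[str, str]) -> List[str]:
    """Return the names of required properties that have no value yet."""
    return [p.name for p in properties if p.required and not values.get(p.name)]


for field in build_form(ANTIBODY_PROPERTIES):
    print(field)
```

Adding a property to the list changes the generated form without touching `build_form`, which is the kind of adaptation to ontology evolution the design principles aim for.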
eagle-i data collection tool
Workflow support
eagle-i search
Faceted search
Autocomplete from instances and ontology
eagle-i search
Instance pages with materialized properties
Layered ontology model
Modeling dichotomy
• eagle-i ontology is a domain model aimed at capturing biological knowledge
• Application needs a model from which to derive behavior
Complexity
• eagle-i ontology is interoperable; it builds on an upper ontology and imports numerous terms
• Not all ontology constructs translate into user-level constructs
Layered ontology model
• Application ontologies annotate domain ontologies with application-specific information and restrictions
Example
[Diagram: class hierarchy with node labels Thing; Entity; Occurrent; Processual entity; Planned process; Research Project; Human Study; Epidemiological study; Qualitative human study; Quantitative human study; GWAS; with Property 1 and Property 2 attached]
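The layering can be sketched in a few lines: a domain layer that only records the class hierarchy, and a separate application layer that annotates some of those classes with UI-related flags. Everything here is illustrative. The parent/child links are an assumed reading of the node labels on the slide (intermediate upper-ontology classes are left out), and the flag names `primary_type` and `hidden` are hypothetical, not the annotation properties used in the eagle-i application ontologies.

```python
from typing import Dict, List, Optional

# Domain layer: class -> parent. Assumed hierarchy, for illustration only.
DOMAIN_PARENT: Dict[str, Optional[str]] = {
    "Thing": None,
    "Research Project": "Thing",
    "Human Study": "Research Project",
    "Epidemiological study": "Human Study",
    "Qualitative human study": "Human Study",
    "Quantitative human study": "Human Study",
    "GWAS": "Quantitative human study",
}

# Application layer: annotations kept apart from the domain model (hypothetical flags).
APP_ANNOTATIONS: Dict[str, Dict[str, bool]] = {
    "Thing": {"hidden": True},              # upper-level class, not shown to users
    "Human Study": {"primary_type": True},  # entry point in the type browser
}


def descendants(cls: str) -> List[str]:
    """Return all transitive subclasses of cls in the domain layer."""
    result: List[str] = []
    for child, parent in DOMAIN_PARENT.items():
        if parent == cls:
            result.append(child)
            result.extend(descendants(child))
    return result


def has_flag(cls: str, flag: str) -> bool:
    """Look up an application-layer annotation; unannotated classes default to False."""
    return APP_ANNOTATIONS.get(cls, {}).get(flag, False)


def primary_types() -> List[str]:
    """Classes the application layer marks as entry points."""
    return [c for c in DOMAIN_PARENT if has_flag(c, "primary_type")]


def type_browser(root: str) -> List[str]:
    """Classes shown when browsing from root: root and its subclasses, minus hidden ones."""
    return [c for c in [root] + descendants(root) if not has_flag(c, "hidden")]


print(primary_types())
print(type_browser("Human Study"))
```

The domain dictionary never mentions the UI, so the same domain model could be paired with a different annotation layer for a different application.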
Ontology-driven development: process observations
Developing ontology-driven applications requires close collaboration between software developers and ontologists
• Separation of concerns principle
• Process for owning, editing and annotating ontology files
• Annotations with a pure UI goal that require domain knowledge can be problematic
The applications provide ontology developers with a mechanism to rapidly test and refine their models for different usage scenarios
• Data collection
• Data retrieval
Challenges of producing and consuming linked data
Producing Linked Data
Need to enforce ontology constraints
ETL: in addition to producing ontology-conforming class instances, ETL processes need to inter-link them
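The two producer-side points above (enforce constraints, and inter-link the instances an ETL run creates) can be illustrated with a small sketch. It is a simplification under stated assumptions: the namespaces, the `locatedIn` property, the `Laboratory` class, and the input column names are hypothetical, and triples are plain string tuples with no distinction between URIs and literals.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

BASE = "http://example.org/i/"            # hypothetical instance namespace
ONT = "http://example.org/ontology#"      # hypothetical ontology namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

REQUIRED_COLUMNS = ("name", "type", "lab")  # assumed constraint: all three must be present


def mint_uri(kind: str, key: str) -> str:
    """Build a stable URI from a kind and a human-readable key."""
    return f"{BASE}{kind}/{key.strip().lower().replace(' ', '-')}"


def transform(rows: List[Dict[str, str]]) -> Tuple[List[Triple], List[str]]:
    """Turn tabular rows into triples, linking each resource to one shared lab instance."""
    triples: List[Triple] = []
    errors: List[str] = []
    labs: Dict[str, str] = {}  # lab name -> lab URI, so each lab is created only once

    for index, row in enumerate(rows):
        missing = [col for col in REQUIRED_COLUMNS if not row.get(col)]
        if missing:
            # Constraint enforcement: reject the row instead of emitting partial data.
            errors.append(f"row {index}: missing {', '.join(missing)}")
            continue

        lab_name = row["lab"]
        if lab_name not in labs:
            labs[lab_name] = mint_uri("lab", lab_name)
            triples.append((labs[lab_name], RDF_TYPE, ONT + "Laboratory"))
            triples.append((labs[lab_name], RDFS_LABEL, lab_name))

        resource = mint_uri("resource", row["name"])
        triples.append((resource, RDF_TYPE, ONT + row["type"]))
        triples.append((resource, RDFS_LABEL, row["name"]))
        # Inter-linking: the resource points at the lab instance, not at a text label.
        triples.append((resource, ONT + "locatedIn", labs[lab_name]))

    return triples, errors


rows = [
    {"name": "Antibody A", "type": "Antibody", "lab": "Lab A"},
    {"name": "Plasmid B", "type": "Plasmid", "lab": "Lab A"},
    {"name": "Orphan C", "type": "Antibody", "lab": ""},
]
triples, errors = transform(rows)
print(len(triples), errors)
```

Both accepted rows end up linked to the same lab URI, while the row without a lab is reported rather than stored.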
Consuming Linked Data
• Need to view the data through an ontology lens
• Filter out administrative and non-conforming triples
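On the consumer side, viewing data "through an ontology lens" can be approximated by keeping only triples whose predicate the ontology knows about and dropping anything in an administrative namespace. The sketch below shows that idea with plain tuples; the namespaces, the `workflowState` predicate, and the set of known predicates are assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

ONT = "http://example.org/ontology#"      # hypothetical ontology namespace
ADMIN = "http://example.org/admin#"       # hypothetical namespace for administrative data
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Predicates the consumer treats as ontology-conformant (assumption for this sketch).
KNOWN_PREDICATES: Set[str] = {RDF_TYPE, RDFS_LABEL, ONT + "locatedIn"}


def ontology_view(triples: Iterable[Triple], known: Set[str]) -> List[Triple]:
    """Keep triples with a known predicate that is not in the administrative namespace."""
    return [
        t for t in triples
        if not t[1].startswith(ADMIN) and t[1] in known
    ]


data: List[Triple] = [
    ("http://example.org/i/resource/antibody-a", RDF_TYPE, ONT + "Antibody"),
    ("http://example.org/i/resource/antibody-a", RDFS_LABEL, "Antibody A"),
    ("http://example.org/i/resource/antibody-a", ADMIN + "workflowState", "draft"),
    ("http://example.org/i/resource/antibody-a", "http://other.example/legacyField", "x"),
]

for triple in ontology_view(data, KNOWN_PREDICATES):
    print(triple)
```

Only the type and label triples survive; the workflow triple and the unknown predicate are filtered out before the data reaches a user-facing page.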
Concluding remarks
eagle-i is a proof-of-concept system
• A software suite
• A network of institutions
• An operational system with curated data
The eagle-i software and know-how are applicable to other problem spaces and domains
• Ontology-driven framework goal: instantiate software stack for any ontology
• No code changes to core framework
• Annotate new domain ontology with eagle-i application ontology
eagle-i coming soon to open.med.harvard.edu
Demo scenarios
Overview
o Scenario description
o Entry of data into the Web Tool
o Curation and publishing of data
o Searching on data in the repository
o How ontology integration makes resources visible
Scenario
Primary Scenario: Relapsing Fever – Host-Pathogen Interactions & Human Exposure
Dr. Olivier Lucas studies mechanisms of and ecological risk for infection with Borrelia hermsii, the tick-borne Relapsing Fever agent. He believes he has identified a role for IL-17 in disease resolution in a mouse model and would like to examine contributing immune cell populations. He’s also hoping to begin a study assessing B. hermsii exposure/seroconversion within rural populations in Montana. Lastly, he has received some departmental funds to support a work-study position in his lab.
Dr. Lucas wants to…
1. Advertise his vacant work-study research opportunity.
2. Obtain an IL-17 receptor antibody for his mouse work.
3. Locate a source of human biospecimens from MT for his seroconversion study.
Supporting Scenarios
Supporting Scenario A: Mucosal Immunity and Th17 Populations
Dr. David Pascual studies mucosal immunity and contributing T cell populations. He has developed a monoclonal antibody for the IL-17 receptor and, now that this work has been published, would like to share it.
Dr. Pascual wants to…
1. Advertise his IL-17 receptor mAb to potential collaborators.
Supporting Scenario B: Lipid Profiles and Cardiovascular Disease Risk
Dr. Donna Williams is a human health researcher studying cardiovascular disease risk factors in rural, geographically isolated Montana communities. Some time ago she completed a study in which blood draws were obtained to assess total lipid profiles. Sera from these individuals were collected and frozen for a potential analysis of inflammatory mediators, but she has since shifted her research focus.
Dr. Williams wants to…
1. Put these frozen sera to good use.