Model-Based Mediation with Domain Maps
Bertram Ludäscher* Amarnath Gupta*
Maryann E. Martone+
* San Diego Supercomputer Center (SDSC)
+ National Center for Microscopy and Imaging Research (NCMIR)
University of California, San Diego (UCSD)
Overview
• Motivation
– Problem with current Mediator Architecture
– Complex Scientific Multiple-World Scenarios
• Model-Based Mediation Architecture
– Lifting from XML to level of Conceptual Models (CMs)
• Formal Framework
– Domain Maps (DMs)
– Generic Conceptual Model (GCM)
– Integrated View Definition
• Example Query Evaluation
• Open Issues
A Standard Mediator Architecture (MIX -- Mediation of Information using XML, SDSC/UCSD)
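[Figure: MIX architecture. A USER-Query is posed against the INTEGRATED VIEW of the MIX MEDIATOR; the view is specified by an XML Integrated View Definition in XMAS/XQuery. The mediator exchanges XML queries and answers (XML Q/A) with one Wrapper per source. Data Sources: DB, Files, WWW (Lab1, Lab2, Lab3).]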
The Problem: Complex Multiple-World Scenarios
• Current Integration Issues
– Structural/Schema Conflicts
  • common semistructured data model (XML)
  • schema transformations/integration (XML queries & transforms)
– Limited Query Capabilities
  • capability based rewriting (e.g., TSIMMIS)
– ...
• BUT scenarios are “one-world” (amazon.com vs. bn.com) or simple multiple world (home buyer)
• Problem: No Support for Semantic Mediation
– “complex multiple-world” scenarios (Neuroscience, Geoscience):
  • complex, disjoint, seemingly unrelated data
  • “hidden semantics” in complex, indirect relationships
A Neuroscience Question
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity?
How about other rodents?
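[Figure: three wrapped sources, protein localization (NCMIR), neurotransmission (SENSELAB), and morphometry (SYNAPSE), sit below a mediator. The mediator, the integrated view, and the integrated view definition are all marked with question marks: ??? Mediator ???, ??? Integrated View ???, ??? Integrated View Definition ???]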
Hidden Semantics: Protein Localization (NCMIR)
<protein_localization>
  <neuron type="purkinje cell" />
  <protein channel="red">
    <name>RyR</name>
    ….
  </protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name>
        <amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name>
        <amount name="RyR">30</amount>
      </structure>
Figure labels:
Molecular layer of Cerebellar Cortex
Purkinje Cell layer of Cerebellar Cortex
Fragment of dendrite
Hidden Semantics: Morphometry (SYNAPSE)
<neuron name="purkinje cell">
  <branch level="10">
    <shaft>
      …
    </shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7" />
      <length>12.348</length>
      <min_section>1.93</min_section>
      <max_section>4.47</max_section>
      <surface_area>9.884</surface_area>
      <volume>7.930</volume>
      <head>
        <width>4.47</width>
        <length>1.79</length>
      </head>
    </spine>
    …
Hidden semantics (not stated in the data):
Branch level beyond 4 is a branchlet
Must be dendritic because Purkinje cells don’t have somatic spines
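The two unstated rules on this slide (a branch level beyond 4 is a branchlet; a Purkinje cell spine must be dendritic) can be made explicit in a few lines. The following is a minimal Python sketch, not the authors' implementation: the XML is a shortened, well-formed completion of the SYNAPSE fragment, and the function name, output keys, and the "branch"/"unknown" fallback labels are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed completion of the SYNAPSE fragment (closing tags
# added so it parses; elided parts of the slide are simply left out).
SYNAPSE_XML = """
<neuron name="purkinje cell">
  <branch level="10">
    <spine number="1">
      <length>12.348</length>
      <volume>7.930</volume>
    </spine>
  </branch>
</neuron>
"""

BRANCHLET_MIN_LEVEL = 4  # slide: "Branch level beyond 4 is a branchlet"


def classify_spines(xml_text):
    """Attach the 'hidden' anatomical context to every spine in the source."""
    neuron = ET.fromstring(xml_text)
    # Slide: Purkinje cells don't have somatic spines, so a spine is dendritic.
    is_purkinje = neuron.get("name") == "purkinje cell"
    results = []
    for branch in neuron.iter("branch"):
        level = int(branch.get("level"))
        structure = "branchlet" if level > BRANCHLET_MIN_LEVEL else "branch"
        for spine in branch.findall("spine"):
            results.append({
                "spine": spine.get("number"),
                "structure": structure,
                "compartment": "dendrite" if is_purkinje else "unknown",
            })
    return results


print(classify_spines(SYNAPSE_XML))
```

Once the spine is labeled as sitting on a branchlet of a dendrite, it shares vocabulary ("spine", "branchlet") with the NCMIR protein localization data, which is the kind of link the sources do not state themselves.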
Approach: Model-Based Mediation
• Complex Multiple Worlds Integration Problem
– terms not directly joinable
– complex, indirect associations
– unstated, “hidden” semantics (not just schema conflicts)
• Missing “Semantic Link”
=> how to define complex, indirect semantic links?
=> lift mediation to the level of conceptual models (CMs)
=> domain expert’s knowledge formalized as rules over CMs
=> Model-Based Mediation
XML-Based vs. Model-Based Mediation
[Figure: side-by-side comparison of the two mediation styles, each built on top of raw data.]
XML-Based Mediation:
– XML Models over XML Elements
– Structural Constraints (DTDs), Parent, Child, Sibling, ... (e.g. A = (B*|C),D  B = ...)
– No Domain Constraints
– Integrated-DTD := XQuery(Src1-DTD,...)
Model-Based Mediation:
– Conceptual Models over (XML) Objects
– Classes, Relations, is-a, has-a, ... (classes C1, C2, C3 and relation R in the figure)
– DOMAIN MAP
– Logical Domain Constraints (IF ... THEN ... rules)
– Integrated-CM := CM-QL(Src1-CM,...)
Extended Mediator Architecture
• Wrappers export Conceptual Models (CMs)
– facts & rules for classes, relationships, ICs, ...
– source data is “put into context” (“aboutness” index) by linking to domain maps (DMs)
• Mediator employs CMs and DMs
– ... to define complex semantic relationships on the formalized domain knowledge
• Generic Conceptual Model (GCM)
– as a common target CM
– minimal requirements/core expressions:
  • instance(O,C), subclass(C1,C2)
  • method_type(C,M,C’), method_value(O,M,R)
  • relation_type(R,A1/C1,...,An/Cn)
  • relation_value(R,a1,...,an)
• Expressiveness, Extensibility
– allow inductive properties (inheritance, closures, ...)
– employ a declarative rule language (e.g. F-Logic)
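To make the GCM core expressions concrete, here is a minimal Python sketch that stores instance and subclass facts as tuples and derives one inductive property, inherited class membership. It is not the GCM or F-logic implementation; the fact values (s1, mushroom_spine, dendritic_compartment) and the function names are hypothetical, chosen only to mirror the predicate names on the slide.

```python
# Hypothetical facts, written in the shape of the GCM core expressions.
subclass = {                      # subclass(C1, C2)
    ("mushroom_spine", "spine"),
    ("spine", "dendritic_compartment"),
}
instance = {("s1", "mushroom_spine")}     # instance(O, C)
method_value = {("s1", "volume", 7.93)}   # method_value(O, M, R)


def subclass_closure(pairs):
    """Transitive closure of subclass(C1, C2), computed as a naive fixpoint."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new -= closure
        if not new:
            return closure
        closure |= new


def all_instances(instance_facts, subclass_facts):
    """Inheritance: an instance of C is also an instance of every superclass of C."""
    closure = subclass_closure(subclass_facts)
    derived = set(instance_facts)
    for obj, cls in instance_facts:
        derived |= {(obj, sup) for (sub, sup) in closure if sub == cls}
    return derived


print(sorted(all_instances(instance, subclass)))
```

In a declarative rule language the two functions collapse into two short rules; the sketch only spells out what such rules would derive.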
Model-Based Mediator Architecture
[Figure: model-based mediator architecture. Sources S1, S2, S3 each sit behind an XML-Wrapper and a CM-Wrapper; the CM-Wrappers export the source conceptual models CM S1, CM S2, CM S3 in GCM (with a CM Plug-In). The Mediator Engine (XSB Engine with FL rule proc., LP rule proc., Graph proc.) uses the Domain Map (DM) and the Integrated View Definition (IVD) to provide the CM (Integrated View) to the USER/Client through a Logic API (capabilities). CM Queries & Results are exchanged in XML.]
Formalizing Domain Knowledge:Domain Map for SYNAPSE and NCMIR
A domain map comprises
• Description Logic facts ...
- concepts ("classes")
- roles ("associations")
• derived properties ...
• ... expressed as logic rules
- (e.g. F-logic)
domain expert knowledge:
Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).
[Figure: domain map, shown next to the equivalent Description Logic facts]
Domain Map Refinement
In addition to registering (“hanging off”) data, a source may also refine the mediator’s domain map...
... source can register new concepts at the mediator ...
Definition of Integrated Views (Deja Vu?) ...
• XML/CM-2-FL Translators
<!ELEMENT Studies (Study)*>
<!ELEMENT Study (study_id, … animal, experiments, experimenters)>
<!ELEMENT experiments (experiment)*>
<!ELEMENT experiment (description, instrument, parameters)>

studyDB[studies =>> study].
study[study_id => string; … animal => animal; experiments =>> experiment; experimenters =>> string].
…
• Specification of Domain Knowledge
  • Subclasses
mushroom_spine :: spine
  • Data Classification
DERIVE S:mushroom_spine FROM S:spine[head -> _; neck -> _].
  • Integrity Constraints
ic1(S):ALERT[type -> “invalid spine”; object -> S] IF S:spine[undef ->> {head, neck}].
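The classification rule and the integrity constraint above can be read procedurally. The sketch below is a plain Python rendering under two stated assumptions: a spine is represented as a dictionary of its defined properties, and the constraint fires only when both head and neck are undefined (one reading of the set-valued `undef` attribute). The function names and sample values are hypothetical.

```python
def classify(spine):
    """Data classification: a spine with both a head and a neck is a mushroom spine."""
    classes = {"spine"}
    if spine.get("head") is not None and spine.get("neck") is not None:
        classes.add("mushroom_spine")  # mushroom_spine :: spine
    return classes


def check_ic1(spine_id, spine):
    """Integrity constraint ic1: alert if head and neck are both undefined.

    Returns an alert record shaped like ALERT[type; object], or None.
    """
    undefined = {prop for prop in ("head", "neck") if spine.get(prop) is None}
    if undefined == {"head", "neck"}:
        return {"type": "invalid spine", "object": spine_id}
    return None


# Hypothetical spine objects for illustration.
good = {"head": {"width": 4.47, "length": 1.79}, "neck": {"length": 1.0}}
bad = {}

print(classify(good))          # includes 'mushroom_spine'
print(check_ic1("s2", bad))    # alert record for the invalid spine
```

The declarative version has the advantage that the same rule serves both as a class definition and as a query; the procedural version needs one function per use.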
... Definition of Integrated Views (Multiple Sources)
• Integrated View Definition
DERIVE
  protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
FROM
  I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
      anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]}],   % from PROLAB
  AS..segments..features[name -> Feature_name; value -> Value],
  NAE:neuro_anatomic_entity[name -> Anatom;   % from ANATOM
      located_in ->> {Brain_region}].
• Schema Reasoning & Dynamic Classes
TAXON DB Schema:
taxon[subspecies => string; species => string; genus => string; … phylum => string; kingdom => string; superkingdom => string].
TAXON Rank Hierarchy:
subspecies::species::genus:: … kingdom::superkingdom
Create Classes from TAXON data:
DERIVE T:TR, TR::TR1 FROM T:‘TAXON’.taxon[Taxon_Rank -> TR, Taxon_Rank1 -> TR1], Taxon_Rank::Taxon_Rank1.
Query Evaluation Example
push selection
@SENSELAB: X1 := select output from parallel fiber;
determine source context
@MEDIATOR: X2 := “hang off” X1 from Domain Map;
compute region of interest (here: downward closure)
@MEDIATOR: X3 := subregion-closure(X2);
push selection
@NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);
compute protein distribution
@MEDIATOR: X5 := compute aggregate(X4);
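The five plan steps X1 to X5 can be traced end to end with stand-in data. The following Python sketch is illustrative only: the three "sites" are local functions, every table (the SENSELAB answer, the anchor table, the has_a fragment, the NCMIR rows) is invented for the example, and only the RyR amounts 0 and 30 echo the NCMIR fragment shown earlier.

```python
# Hypothetical domain map fragment: concept -> direct sub-parts (has_a).
HAS_A = {
    "purkinje_cell": ["dendrite"],
    "dendrite": ["branchlet"],
    "branchlet": ["spine"],
}
# Hypothetical "aboutness" index: source term -> domain map concept.
ANCHORS = {"purkinje cell": "purkinje_cell"}
# Hypothetical NCMIR rows.
NCMIR_ROWS = [
    {"protein": "RyR", "structure": "spine", "amount": 0},
    {"protein": "RyR", "structure": "branchlet", "amount": 30},
    {"protein": "other", "structure": "spine", "amount": 7},
]


def senselab_select_output(neuron_part):             # X1 (@SENSELAB)
    return {"parallel fiber": ["purkinje cell"]}.get(neuron_part, [])


def hang_off(items, anchors):                        # X2 (@MEDIATOR)
    return {anchors[item] for item in items if item in anchors}


def subregion_closure(concepts, has_a):              # X3 (@MEDIATOR)
    seen, stack = set(), list(concepts)
    while stack:                                     # downward closure
        concept = stack.pop()
        if concept not in seen:
            seen.add(concept)
            stack.extend(has_a.get(concept, []))
    return seen


def ncmir_select(regions, protein, rows):            # X4 (@NCMIR)
    return [r for r in rows if r["protein"] == protein and r["structure"] in regions]


def aggregate(rows):                                 # X5 (@MEDIATOR)
    totals = {}
    for r in rows:
        totals[r["structure"]] = totals.get(r["structure"], 0) + r["amount"]
    return totals


def run_plan():
    x1 = senselab_select_output("parallel fiber")
    x2 = hang_off(x1, ANCHORS)
    x3 = subregion_closure(x2, HAS_A)
    x4 = ncmir_select(x3, "RyR", NCMIR_ROWS)
    return aggregate(x4)


print(run_plan())
```

The point of the ordering is that the selection at NCMIR (X4) is parameterized by the region set computed from the domain map (X3), so the two sources are joined through the map rather than through any shared key.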
"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
ANATOM Domain Map with Registered Data [figure: ANATOM DATA]
Deductive Closure of “has_a” with “tc(is_a)” (YES -- Real Recursive Views!! ;-) [figure: ANATOM CLOSURE]
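A recursive view of this kind can be computed as a fixpoint over the domain map's edges. The Python sketch below assumes one particular reading of the closure, which the slide does not spell out: a concept inherits the parts of its (transitive) superclasses, and has_a is itself transitive. The edge sets are hypothetical, loosely following the domain expert statements about Purkinje cells, dendrites, branches, and spines.

```python
def transitive_closure(edges):
    """tc(R): naive fixpoint over a set of (source, target) pairs."""
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        new -= closure
        if not new:
            return closure
        closure |= new


def has_a_closure(is_a, has_a):
    """Close has_a under tc(is_a) inheritance and under its own transitivity.

    Assumed rules:
      has_a*(X, Z) <- has_a(X, Z)
      has_a*(X, Z) <- tc(is_a)(X, Y), has_a*(Y, Z)
      has_a*(X, Z) <- has_a*(X, Y), has_a*(Y, Z)
    """
    isa_tc = transitive_closure(is_a)
    derived = set(has_a)
    while True:
        new = {(x, z) for (x, y) in isa_tc for (y2, z) in derived if y == y2}
        new |= {(x, z) for (x, y) in derived for (y2, z) in derived if y == y2}
        new -= derived
        if not new:
            return derived
        derived |= new


# Hypothetical domain map fragment.
IS_A = {("purkinje_cell", "neuron")}
HAS_A = {("neuron", "dendrite"), ("dendrite", "branch"), ("branch", "spine")}

closure = has_a_closure(IS_A, HAS_A)
print(("purkinje_cell", "spine") in closure)
```

A deductive engine evaluates such rules directly; the loop here is the naive bottom-up strategy and is only meant to show what the recursive view contains.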
Interactive Queries [figure: KIND01]
Resulting Sub DOMAIN MAP “Browser” [figure: PROTLOC]
Computed Protein Localization Data [figure: PROTLOC]
Client-Side Result Visualization (using AxioMap Viewer: Ilya Zaslavsky) [figure: PROTLOC-AxioMap]
Comparison & Summary: Model-Based Mediation
Columns: (A) (Complex) Single World / Simple Multiple World vs. (B) Complex Multiple World
• Integration target
– A: global schema (common / shared)
– B: 1..n shared domain maps
• Example scenario
– A: suppliers’ catalogs / home buyer
– B: complex scientific data (neuroscience, geoscience, …)
• Schema level overlap
– A: large / small
– B: none … small
• Instance level overlap
– A: large / none
– B: none
• Source correlation
– A: direct, instance / schema level
– B: indirect, conceptual (knowledge) level
• Techniques
– A: schema transformations, schema integration => “structural” integration
– B: domain maps, formalized domain knowledge (“semantic bridges”) => model-based (“semantic”) mediation
• Integration languages, Expressiveness
– A: relational, semistructured, queries & transformations (e.g., SQL, XQuery, XSLT)
– B: conceptual (description logics), object-oriented, deductive features (e.g., GCM, F-logic)
• Integrators
– A: DB expert
– B: domain expert + KRDB expert
Conclusions and Outlook
• Model-based Mediation Architecture
– for complex multiple worlds scenarios (Neuroscience, ...)
– sources export CMs (data “lifted” to conceptual level)
– mediator employs DMs (“semantic road map”)
• Simple Prototype based on XSB/FLORA
– source and result data situated in DM context
– domain scientists are excited ...
• Some Open Issues
– striking the right balance between complexity and expressiveness of DMs (e.g. subsumption and satisfiability of DMs should be decidable)
– query processing/optimization
– modeling query capabilities
– semantic annotation tools for “dumb” sources
– re-implement ... *sigh* ...
– ...
ADDITIONAL MATERIAL STARTS HERE
ANATOM Domain Map [figure: ANATOM]
Model-Based Mediation with DOMAIN MAPS (DMs)
Integrated-CM(Z1,...) :=
  get X1,... from Src1;
  get X2,... from Src2;
  LINK (Xi, Yj);
  Zj = CM-QL(X1,...,Y1,...)
LINK(X,Y):
  X.zip = Y.zip
  X.addr in Y.zip
  X.zip overlaps Y.county
  ...
• “Semantic Road Maps” for situating source data
=> navigational aid (browsing source classes at the conceptual level)
=> basis for integrated views across multiple worlds
=> link points (concepts) and labeled arcs (roles)
=> formal semantics (in FL and/or DLs)
Example: ANATOM DM
= anatomical entities (concepts) + is_a, has_a, overlaps, ... (roles)
=> from syntactic equality to semantic joins
Example Query Evaluation (I)
• Example: protein_distribution
– given: organism, protein, brain_region
– ANATOM DM:
  • recursively traverse the has_a_star paths under brain_region; collect all anatomical_entities
– Source PROLAB:
  • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
– Mediator:
  • aggregate over all parents up to brain_region
  • report distribution
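The mediator's last step, aggregating over all parents up to brain_region, amounts to rolling per-entity amounts up a has_a hierarchy. Below is a minimal Python sketch of that roll-up. It assumes the relevant part of the domain map is a tree given as a child-to-parent table; the hierarchy, the function name, and the choice of summation as the aggregate are assumptions, and the two amounts reuse the RyR values from the NCMIR fragment.

```python
def roll_up(amounts, parent, root):
    """Add each entity's amount to the entity itself and to every ancestor up to `root`.

    amounts: entity -> measured protein amount
    parent:  entity -> its parent in the (assumed tree-shaped) has_a hierarchy
    """
    totals = {}
    for entity, amount in amounts.items():
        node = entity
        while node is not None:
            totals[node] = totals.get(node, 0) + amount
            if node == root:
                break
            node = parent.get(node)
    return totals


# Hypothetical hierarchy below the queried brain region.
PARENT = {
    "spine": "dendrite",
    "branchlet": "dendrite",
    "dendrite": "molecular_layer",
    "molecular_layer": "cerebellar_cortex",
}
AMOUNTS = {"spine": 0, "branchlet": 30}   # RyR amounts from the NCMIR fragment

distribution = roll_up(AMOUNTS, PARENT, root="cerebellar_cortex")
print(distribution)
```

Reporting a total at every level of the hierarchy is what lets a client browse the resulting distribution at whatever anatomical granularity it needs.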
Interactive Queries [figure: KIND]
Summary & Outlook: Federation of Brain Data
[Figure: MODEL-BASED Mediation federating brain data sources: CCB, Montana SU; Surface atlas, Van Essen Lab; NCMIR, UCSD; stereotaxic atlas, LONI; MCell, CNL, Salk. Further labels in the figure: ANATOM, PROTLOC, Result (VML), Result (XML/XSLT).]