Ameet N. Chitnis, Abir Qasem and Jeff Heflin, 11 November 2007
Talk Organization
• Motivation (a.k.a. why yet another benchmark?) and Influences
• The Workload
  • Domain ontologies, map ontologies, data sources, queries
• The Metrics
• How do we generate things?
  • Domain ontology generation
  • Map ontology generation
    • Parameters & Relationships
    • Map Generator Algorithm
  • Data source generation
  • Query generation
• Sample Workload
• Conclusion & Future Work
Motivation
As the Semantic Web matures …
OWL Ontologies and data from various organizations will gain commercial value
Alignment of different ontologies and integration of data that commit to them will be a viable business enterprise
Quite possibly we will have post-development alignments between ontologies (alignment tools, third parties, etc.)
Currently DBpedia and Hawkeye provide some form of third-party alignment (non-commercial)
We wanted to develop a benchmark that reflects the above reality
Influences
Lehigh University Benchmark (LUBM) by Y. Guo, Z. Pan, and J. Heflin. (ISWC 2004)
Extended LUBM (can support both OWL Lite and OWL DL) by L. Ma, Y. Yang, Z. Qiu, G. Xie and Y. Pan. (ESWC 2006)
Statistical Analysis of the available Semantic Web ontologies by Tempich, C. and Volz, R. (ISWC 2003)
Benchmarking DL systems by I. Horrocks and P. Patel-Schneider. (DL Workshop 1998)
Internet topology generator by J. Winick and S. Jamin. (University of Michigan)
The Workload (1)
Domain ontologies: “Simple” ontologies. We can control the number of classes, the number of properties, and the branching factor of the hierarchies
Data sources: We can control the number of data sources that commit to a given ontology, the number of classes that will have individuals, the number of properties that will connect those individuals, and the number of triples
Queries: Extensional queries in SPARQL. We can control the mix of classes, properties, and individuals. We can control selectivity
The Workload (2)
Map ontologies: Main focus of this work
In our work a map ontology consists solely of “mapping” axioms that establish alignment between two domain ontologies
This is just for convenience of generation and analysis. Semantically they are not much different from the domain ontologies
Macro level: We generate a directed acyclic graph of domain ontologies. Every edge represents a map ontology
Micro level: We can control the type of axioms that are used to map two domain ontologies
Metrics
Metric | Systems with Centralized Approach | Systems with Distributed Approach
Initialization Time | Time taken to load the knowledge base | Time taken to read the index (e.g. meta-data)
Query Response Time | Reasoning time | Load time + reasoning time
Query Completeness | Consider only queries that entail at least one answer; completeness is determined relative to a reference set (both approaches)
Repository Size | Number of triples | N/A
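The query completeness metric can be illustrated with a small sketch. It assumes that completeness is the fraction of reference answers a system returns, and that per-query scores are averaged; the slides do not spell out either detail, and the function names are hypothetical.

```python
def relative_completeness(system_answers, reference_answers):
    """Fraction of the reference answers that the system also returned."""
    reference = set(reference_answers)
    if not reference:
        # The metric only considers queries that entail at least one answer.
        raise ValueError("query must entail at least one answer")
    return len(reference & set(system_answers)) / len(reference)


def mean_completeness(results):
    """Average completeness over (system, reference) pairs.

    Pairs with an empty reference set are skipped (assumption).
    Returns None when no pair qualifies.
    """
    scores = [relative_completeness(s, r) for s, r in results if r]
    return sum(scores) / len(scores) if scores else None
```

Answers that the system returns but the reference set lacks are ignored here, so this sketch measures completeness only, not soundness.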
Domain Ontology Generation
Simple taxonomy
The number of terms to generate follows a normal distribution with a user-supplied mean
Given a branching factor and number of terms we generate a balanced tree
Complex axioms are left for map ontologies
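A minimal sketch of the taxonomy step, assuming a level-by-level fill (heap-style indexing) as the way to obtain a balanced tree. The standard deviation parameter and the `C0`, `C1`, … naming are assumptions made for illustration; the slides only mention a user-supplied mean.

```python
import random


def sample_term_count(mean, stddev, rng):
    """Draw a term count from a normal distribution, clamped to at least 1."""
    return max(1, round(rng.gauss(mean, stddev)))


def balanced_taxonomy(num_terms, branching_factor, prefix="C"):
    """Return {term: parent} for a balanced tree filled level by level.

    The root maps to None. Term i (for i >= 1) is placed under term
    (i - 1) // branching_factor, so every level is filled before the next.
    """
    if num_terms < 1 or branching_factor < 1:
        raise ValueError("num_terms and branching_factor must be >= 1")
    names = [f"{prefix}{i}" for i in range(num_terms)]
    parent = {names[0]: None}
    for i in range(1, num_terms):
        parent[names[i]] = names[(i - 1) // branching_factor]
    return parent
```

For example, `balanced_taxonomy(sample_term_count(20, 5, random.Random(0)), 3)` would produce one class hierarchy; each child/parent pair corresponds to a subclass axiom.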
Map Ontology Generation
Inputs
No. of ontologies we want in the workload
Average out-degree (referred to as out below)
Diameter
The number of maps created is approximately
maps ≈ (total onts − terminal onts) × out
However, we do not have terminal onts as a parameter. A reasonable approximation is
terminal onts ≈ (onts × out) / (diameter + out)
Thus we have
maps ≈ (onts × out × diameter) / (diameter + out)
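The two approximations can be checked numerically. Substituting the terminal estimate into the first relation gives onts − (onts × out)/(diameter + out) = (onts × diameter)/(diameter + out), which multiplied by out yields the closed form. The function names below are hypothetical.

```python
def estimate_terminal_ontologies(onts, out, diameter):
    """Approximate number of terminal ontologies (those with no outgoing maps)."""
    return (onts * out) / (diameter + out)


def estimate_map_count(onts, out, diameter):
    """Approximate number of map ontologies: non-terminal ontologies times out-degree."""
    terminal = estimate_terminal_ontologies(onts, out, diameter)
    return (onts - terminal) * out
```

With 100 ontologies, an average out-degree of 2 and a diameter of 6, the estimate is 25 terminal ontologies and 150 maps, matching (100 × 2 × 6) / (6 + 2).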
Map Generator Algorithm
1. Determine and mark the number of terminal nodes
2. Create a path of diameter length
3. Choose targets for every non-terminal ontology.
Constraints:
a. No cycles
b. No path greater than diameter
c. Non-terminal nodes should not become terminal
Create the corresponding map ontologies by generating mapping axioms
4. Update the parameters of the source and the target
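One way to satisfy the three constraints is sketched below. It is a simplification rather than the procedure from the talk: instead of updating source and target parameters after each edge, it assigns every ontology a fixed level between 0 and the diameter and only allows edges from a lower level to a strictly higher one. That rules out cycles and paths longer than the diameter by construction. The function name, the integer out-degree and the seed parameter are assumptions.

```python
import random


def generate_map_graph(onts, out, diameter, seed=0):
    """Return (edges, terminals) for a DAG over ontologies 0..onts-1."""
    if diameter < 1 or out < 1 or onts < diameter + 1:
        raise ValueError("need out >= 1, diameter >= 1, onts >= diameter + 1")
    rng = random.Random(seed)

    # Step 1: estimate the number of terminal nodes and mark the last ones.
    n_terminal = round(onts * out / (diameter + out))
    n_terminal = max(1, min(n_terminal, onts - diameter))
    terminals = set(range(onts - n_terminal, onts))

    # Step 2: a path with `diameter` edges, ending in a terminal node.
    last = onts - 1
    level = {i: i for i in range(diameter)}
    level[last] = diameter
    edges = {(i, i + 1) for i in range(diameter - 1)}
    edges.add((diameter - 1, last))

    # Remaining nodes get a random level; non-terminals stay below `diameter`
    # so that each of them always has at least one possible target.
    for node in range(onts):
        if node not in level:
            if node in terminals:
                level[node] = rng.randint(1, diameter)
            else:
                level[node] = rng.randint(0, diameter - 1)

    # Step 3: choose targets for every non-terminal, up to `out` per node.
    for src in range(onts):
        if src in terminals:
            continue
        have = sum(1 for s, _ in edges if s == src)
        pool = [t for t in range(onts)
                if level[t] > level[src] and (src, t) not in edges]
        want = max(0, min(out - have, len(pool)))
        for tgt in rng.sample(pool, want):
            edges.add((src, tgt))
    return sorted(edges), terminals
```

Each returned edge stands for one map ontology between a source and a target domain ontology. In this sketch a terminal ontology may end up with no incoming edge, which a fuller generator would probably want to avoid.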
Mapping axioms
Given two domain ontologies and a desired distribution of OWL constructors and restrictions
We choose terms from the domain ontologies and create an axiom that connects them
We can generate fairly complex axioms, e.g. O1:A ⊔ O1:B ⊑ ∃O2:P.O2:C ⊓ ∀O2:Q.O2:D
Currently the algorithm is restricted to generating axioms that keep the ontology within OWLII (a subset of OWL used by OBII; Qasem et al. 2007, ISWC NFR workshop)
But this is NOT a limitation of our approach
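The idea of drawing constructors from a desired distribution can be sketched as follows. The constructor labels (`class`, `union`, `intersection`, `some`, `all`), the string output and the function names are assumptions for illustration, and the sketch does not enforce the OWLII restrictions mentioned above. Each ontology is assumed to have at least one class and, if restrictions are enabled, at least one property.

```python
import random


def random_expression(onto, classes, properties, weights, rng, depth=1):
    """Build a class expression over one ontology from weighted constructors."""
    kinds = list(weights)
    kind = rng.choices(kinds, weights=[weights[k] for k in kinds])[0]
    if depth == 0 or kind == "class":
        return f"{onto}:{rng.choice(classes)}"
    if kind in ("union", "intersection"):
        op = "⊔" if kind == "union" else "⊓"
        left = random_expression(onto, classes, properties, weights, rng, depth - 1)
        right = random_expression(onto, classes, properties, weights, rng, depth - 1)
        return f"({left} {op} {right})"
    # Existential ("some") or universal ("all") restriction with a named filler.
    quantifier = "∃" if kind == "some" else "∀"
    return f"{quantifier}{onto}:{rng.choice(properties)}.{onto}:{rng.choice(classes)}"


def generate_axiom(o1, o2, weights, rng):
    """Subclass axiom whose left side uses O1 terms and right side O2 terms."""
    lhs = random_expression("O1", o1["classes"], o1["properties"], weights, rng)
    rhs = random_expression("O2", o2["classes"], o2["properties"], weights, rng)
    return f"{lhs} ⊑ {rhs}"
```

A call such as `generate_axiom(o1, o2, {"class": 2, "union": 1, "some": 1}, random.Random(0))` would favour plain class names while still producing some unions and existential restrictions.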
Source Generation
Choose an ontology
Choose the number of classes for which to create individuals
Generate triples
We can either generate random individuals or use the domain and range information to connect the individuals with properties
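The domain-and-range variant might look like the sketch below. The tuple representation of triples, the `ind0`, `ind1`, … naming, the even split between type triples and property triples, and the function signature are all assumptions; `properties` is taken to map each property name to a (domain class, range class) pair.

```python
import random


def generate_source(classes, properties, n_classes, n_triples, seed=0):
    """Generate (subject, predicate, object) triples for one data source."""
    rng = random.Random(seed)
    chosen = rng.sample(classes, min(n_classes, len(classes)))
    if not chosen:
        raise ValueError("at least one class must be chosen")
    members = {c: [] for c in chosen}  # individuals created per chosen class
    triples = []
    while len(triples) < n_triples:
        # Properties whose domain and range classes both have individuals.
        ready = [(p, d, r) for p, (d, r) in properties.items()
                 if members.get(d) and members.get(r)]
        if ready and rng.random() < 0.5:
            p, d, r = rng.choice(ready)
            triple = (rng.choice(members[d]), p, rng.choice(members[r]))
            if triple not in triples:
                triples.append(triple)
                continue
        # Otherwise create a new individual and assert its class.
        cls = rng.choice(chosen)
        individual = f"ind{sum(len(m) for m in members.values())}"
        members[cls].append(individual)
        triples.append((individual, "rdf:type", cls))
    return triples
```

Because subjects are drawn from the property's domain class and objects from its range class, every property triple is consistent with the ontology's domain and range declarations.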
Query Generation
SPARQL Queries (SELECT)
1. Choose the first predicate from the classes of an ontology.
2. We bias the next predicate with a 75% chance of being one of the properties from the ontology.
3. We make use of shared variables in order to implement “joins”. A shared variable is equally likely to be in the subject as well as the object position.
4. For single-predicate queries all the variables are distinguished. For others, on average two thirds of the variables are distinguished and the rest are non-distinguished.
5. There is a 10% chance of a constant.
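The five rules can be sketched as a small generator. Several details are assumptions: class and property names are passed in as ready-made SPARQL terms (for example IRIs in angle brackets), the 10% constant applies to the non-shared position of a property pattern, and at least one variable is always selected. The function name and signature are hypothetical.

```python
import random


def generate_query(classes, properties, individuals, n_patterns, seed=0):
    """Build a SPARQL SELECT query with n_patterns triple patterns."""
    rng = random.Random(seed)
    variables = ["?v0"]

    def new_term():
        # Rule 5: a constant with 10% chance, otherwise a fresh variable.
        if individuals and rng.random() < 0.10:
            return rng.choice(individuals)
        variables.append(f"?v{len(variables)}")
        return variables[-1]

    # Rule 1: the first predicate is a class ("a" abbreviates rdf:type).
    patterns = [f"?v0 a {rng.choice(classes)} ."]
    for _ in range(n_patterns - 1):
        shared = rng.choice(variables)  # Rule 3: join on a shared variable.
        if properties and rng.random() < 0.75:  # Rule 2: 75% property bias.
            other = new_term()
            # The shared variable is equally likely to be subject or object.
            s, o = (shared, other) if rng.random() < 0.5 else (other, shared)
            patterns.append(f"{s} {rng.choice(properties)} {o} .")
        else:
            patterns.append(f"{shared} a {rng.choice(classes)} .")

    # Rule 4: all variables distinguished for one pattern, about 2/3 otherwise.
    if n_patterns == 1:
        selected = list(variables)
    else:
        selected = [v for v in variables if rng.random() < 2 / 3] or [variables[0]]
    return "SELECT " + " ".join(selected) + " WHERE { " + " ".join(patterns) + " }"
```

Variables left out of the SELECT clause play the role of non-distinguished variables. Selectivity control, which the workload also supports, is not modelled here.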
A Sample Workload
We used the benchmark to evaluate OBII, a distributed query answering system
We compared it with a “baseline” system which was essentially a KAON2 wrapper
Some characteristics of the workload
50% of classes had individuals
On average we generated 75 triples in a source
Generated configurations as large as 100 domain ontologies with about 1000 data sources
Conclusion and Future Work
A focus on workload that accounts for post-development alignments
Micro level - controlling mapping axioms
Macro level - controlling how ontologies are mapped
Domain ontology synthesis can be expanded to support complex axioms
Experiment with different characteristics
Hubs and Authorities (different in-degree / out-degree pattern)