instance-based ontological knowledge acquisition

Instance-Based Ontological Knowledge Acquisition The Graduate University for Advanced Studies (SOKENDAI) National Institute of Informatics Lihua Zhao & Ryutaro Ichise ESWC2013, Montpellier, France, 28th May, 2013

Upload: lihua-zhao

Post on 11-May-2015


DESCRIPTION

The Linked Open Data (LOD) cloud contains tremendous amounts of interlinked instances, from which we can retrieve abundant knowledge. However, because the ontologies are large and heterogeneous, it is time-consuming to learn all of them manually, and it is difficult to see which properties are important for describing instances of a specific class. To construct an ontology that helps users easily access various data sets, we propose a semi-automatic ontology integration framework that can reduce the heterogeneity of ontologies and retrieve frequently used core properties for each class. The framework consists of three main components: graph-based ontology integration, machine-learning-based ontology schema extraction, and an ontology merger. By analyzing the instances of the linked data sets, this framework acquires ontological knowledge and constructs a high-quality integrated ontology, which is easily understandable and effective for knowledge acquisition from various data sets using simple SPARQL queries.

TRANSCRIPT

Page 1: Instance-Based Ontological Knowledge Acquisition

Instance-Based Ontological Knowledge Acquisition

The Graduate University for Advanced Studies (SOKENDAI)

National Institute of Informatics

Lihua Zhao & Ryutaro Ichise

ESWC2013, Montpellier, France, 28th May, 2013

Page 2: Instance-Based Ontological Knowledge Acquisition

Introduction Related Work Semi-automatic Ontology Integration Framework Experiments Conclusion and Future Work

Outline

Introduction

Related Work

Semi-automatic Ontology Integration Framework
- Graph-Based Ontology Integration
- Machine-Learning-Based Ontology Schema Extraction
- Ontology Merger

Experiments

Conclusion and Future Work

Lihua Zhao & Ryutaro Ichise | Instance-Based Ontological Knowledge Acquisition | 2

Page 3: Instance-Based Ontological Knowledge Acquisition


Introduction

Linked Open Data (LOD)
- Machine-readable and interlinked at instance-level.
- 295 data sets, 31 billion RDF triples (as of Sep. 2011).
- Around 504 million owl:sameAs links.
- 7 domains (cross-domain, geographic, media, life sciences, government, user-generated content, and publications).

Figure: LOD cloud diagram of interlinked data sets (e.g., DBpedia, GeoNames, Freebase, LinkedMDB, New York Times), grouped by domain: media, geographic, publications, user-generated content, government, cross-domain, and life sciences. As of September 2011.

Page 4: Instance-Based Ontological Knowledge Acquisition

Motivation

Figure: Interlinked Instances of “France”.

Problems when accessing several data sets:

Ontology Heterogeneity Problem
- Map related ontology classes and properties.
- Ontology similarity matching on the SameAs graph patterns.

Difficulty in Identifying Core Ontology Schemas
- Retrieve frequently used core ontology classes and properties.
- Machine learning for core ontology schema extraction.

Page 5: Instance-Based Ontological Knowledge Acquisition

Related Work

Find useful attributes from frequent graph patterns using a supervised machine learning method. [Le, 2010]
- Only for geographic data, with no discussion of the features.

A debugging method for mapping lightweight ontologies with a machine learning method. [Meilicke, 2008]
- Limited to expressive lightweight ontologies.

Construct an intermediate-layer ontology by analyzing concept coverings. [Parundekar, 2012]
- Only for specific domains and limited to two resources.

Page 6: Instance-Based Ontological Knowledge Acquisition

Semi-automatic Ontology Integration Framework

Construct a global ontology by integrating heterogeneous ontologies of the Linked Open Data.

Graph-Based Ontology Integration [Zhao, et al., 2012]
- Group related classes and properties.

Machine-Learning-Based Ontology Schema Extraction
- Extract frequent core classes and properties.

Ontology Merger
- Merge extracted ontology classes and properties.

Page 8: Instance-Based Ontological Knowledge Acquisition

Graph-based Ontology Integration

Extract graph patterns from interlinked instances to discover related ontology classes and predicates.

Page 9: Instance-Based Ontological Knowledge Acquisition

STEP 1: Graph Pattern Extraction

SameAs Graph: an undirected SameAs graph $SG = (V, E, I)$, where

$V$: a set of vertices (the labels of data sets).

$E \subseteq V \times V$: a set of sameAs edges.

$I$: a set of URIs of the interlinked SameAs instances.

Example: $SG_{France} = (V_{France}, E_{France}, I_{France})$.

$V_{France}$ = {M, D, G, N}

$E_{France}$ = {(D, G), (D, N), (G, M), (G, N)}

$I_{France}$ = {mdb-country:FR, db:France, geo:3017382, nyt:67...21}
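The definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function names, the prefix-to-label mapping, and the pairing of individual URIs in `france_links` are assumptions chosen to reproduce the France example (the truncated NYTimes URI is kept exactly as it appears on the slide).

```python
def build_sameas_graph(links, dataset_of):
    """Return (V, E, I) for one group of owl:sameAs links.

    links: iterable of (uri_a, uri_b) sameAs pairs.
    dataset_of: function mapping a URI to its data-set label.
    """
    vertices, edges, instances = set(), set(), set()
    for a, b in links:
        label_a, label_b = dataset_of(a), dataset_of(b)
        vertices.update((label_a, label_b))
        # The graph is undirected, so each edge is stored once in sorted order.
        edges.add(tuple(sorted((label_a, label_b))))
        instances.update((a, b))
    return vertices, edges, instances


# Hypothetical mapping from URI prefix to the one-letter labels used on the slide.
PREFIX_TO_LABEL = {"mdb-country": "M", "db": "D", "geo": "G", "nyt": "N"}


def label_from_prefix(uri):
    return PREFIX_TO_LABEL[uri.split(":", 1)[0]]


# Assumed sameAs pairs that are consistent with E_France on the slide.
france_links = [
    ("db:France", "geo:3017382"),
    ("db:France", "nyt:67...21"),
    ("geo:3017382", "mdb-country:FR"),
    ("geo:3017382", "nyt:67...21"),
]
V, E, I = build_sameas_graph(france_links, label_from_prefix)
```

The sketch handles one already-connected group of links; splitting a full sameAs dump into connected groups is left out.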

Page 10: Instance-Based Ontological Knowledge Acquisition

STEP 2: <Predicate, Object> Collection

<Predicate, Object> (PO) pairs and types for SGFrance

| Predicate | Object | Type |
|---|---|---|
| rdf:type | db-onto:Country | Class |
| rdfs:label | "France"@en | String |
| foaf:name | "France"@en | String |
| foaf:name | "République française"@en | String |
| db-onto:wikiPageExternalLink | http://us.franceguide.com/ | URI |
| db-prop:populationEstimate | 65447374 | Number |
| ... | ... | ... |
| geo-onto:name | France | String |
| geo-onto:alternateName | "France"@en | String |
| geo-onto:featureCode | geo-onto:A.PCLI | Class |
| geo-onto:population | 64768389 | Number |
| ... | ... | ... |
| rdf:type | mdb:country | Class |
| mdb:country_name | France | String |
| mdb:country_population | 64094000 | Number |
| rdfs:label | France (Country) | String |
| ... | ... | ... |
| rdf:type | skos:Concept | Class |
| skos:inScheme | nyt:nytd_geo | Class |
| skos:prefLabel | "France"@en | String |
| nyt-prop:first_use | 2004-09-01 | Date |
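One way the Type column could be assigned is sketched below. The slides do not say how types are detected, so the heuristics here (a fixed set of class-valued predicates, then URI, number, and ISO date checks) and the names `CLASS_PREDICATES` and `object_type` are assumptions made for illustration.

```python
from datetime import date

# Assumption: predicates whose objects are treated as classes in the table above.
CLASS_PREDICATES = {"rdf:type", "geo-onto:featureCode", "skos:inScheme"}


def object_type(predicate, obj):
    """Classify the object of a <predicate, object> pair given as strings."""
    if predicate in CLASS_PREDICATES:
        return "Class"
    if obj.startswith(("http://", "https://")):
        return "URI"
    try:
        float(obj)
        return "Number"
    except ValueError:
        pass
    try:
        date.fromisoformat(obj)  # accepts strings such as 2004-09-01
        return "Date"
    except ValueError:
        return "String"
```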

Page 11: Instance-Based Ontological Knowledge Acquisition

STEP 3: Related Classes and Properties Grouping

Related Classes Grouping (Leaf nodes)
- Tracking subsumption relations from SameAs graphs.
  <C1 rdfs:subClassOf C2>
  <C1 skos:inScheme C2>

Example: $SG_{France}$

Related Classes: {db-onto:Country, geo-onto:A.PCLI, mdb:country, nyt:nytd_geo}

Page 12: Instance-Based Ontological Knowledge Acquisition

STEP 3: Related Classes and Properties Grouping

Related Properties Grouping
- Exact matching for creating initial sets of PO pairs $S_1, S_2, \ldots, S_k$.
- Similarity matching on the initial sets of PO pairs.

$$Sim(PO_i, PO_j) = \frac{ObjSim(PO_i, PO_j) + PreSim(PO_i, PO_j)}{2}$$

$$ObjSim(PO_i, PO_j) = \begin{cases} 1 - \dfrac{|O_{PO_i} - O_{PO_j}|}{O_{PO_i} + O_{PO_j}} & \text{if } O_{PO} \text{ is Number} \\ StrSim(O_{PO_i}, O_{PO_j}) & \text{if } O_{PO} \text{ is String} \end{cases}$$

$$PreSim(PO_i, PO_j) = WNSim(T_{PO_i}, T_{PO_j})$$

$StrSim(O_{PO_i}, O_{PO_j})$: average of 3 string-based similarity measures.

$WNSim(T_{PO_i}, T_{PO_j})$: average of 9 WordNet-based similarity measures.

- Refine sets of PO pairs according to rdfs:domain.
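A minimal sketch of the similarity formulas, under stated assumptions: the slides do not name the 3 string measures or the 9 WordNet measures, so `difflib.SequenceMatcher` stands in for StrSim and the predicate similarity is passed in as a caller-supplied function. The names `obj_sim` and `sim` are illustrative, and numeric objects are assumed to be non-negative.

```python
from difflib import SequenceMatcher


def obj_sim(o_i, o_j):
    """ObjSim: relative closeness for numbers, string similarity otherwise."""
    if isinstance(o_i, (int, float)) and isinstance(o_j, (int, float)):
        total = o_i + o_j
        if total == 0:  # both values are zero, so they are identical
            return 1.0
        return 1 - abs(o_i - o_j) / total
    # Stand-in for StrSim (the slides average 3 unnamed string measures).
    return SequenceMatcher(None, str(o_i), str(o_j)).ratio()


def sim(po_i, po_j, pre_sim):
    """Sim: mean of object similarity and predicate similarity.

    po_i, po_j: (predicate, object) pairs.
    pre_sim: function(predicate_i, predicate_j) -> value in [0, 1],
             a placeholder for the WordNet-based WNSim.
    """
    return (obj_sim(po_i[1], po_j[1]) + pre_sim(po_i[0], po_j[0])) / 2
```

For the two population values in the France example, `obj_sim(65447374, 64768389)` evaluates 1 - 678985 / 130215763, so the two objects are treated as nearly identical.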

Page 13: Instance-Based Ontological Knowledge Acquisition

STEP 4: Aggregation of All Graph Patterns

Aggregate the integrated classes and properties from all the extracted graph patterns.

Select a Term for Each Set
- ex-onto:ClassTerm
- ex-onto:propTerm

Construct Relations
- ex-prop:hasMemberClasses: <ex-onto:ClassTerm, ex-prop:hasMemberClasses, class>
- ex-prop:hasMemberDataTypes: <ex-onto:propTerm, ex-prop:hasMemberDataTypes, property>

Construct A Preliminary Integrated Ontology

Page 14: Instance-Based Ontological Knowledge Acquisition

STEP 5: Manual Revision

Manually revise the preliminary integrated ontology.

Terms of the integrated classes and properties:
- Choose a proper term for each group of classes or properties.

Groups of related classes or properties:
- Correct wrong grouping.

Page 16: Instance-Based Ontological Knowledge Acquisition

Machine-Learning-Based Ontology Schema Extraction

Top-level classes and core properties are necessary.

Decision Table: retrieves core properties in each data set.
- Belongs to rule-based machine learning with a simple hypothesis.
- Retrieves a subset of properties that are important for describing instances in a data set.

Apriori: retrieves core properties in the instances of a specific top-level class.
- Belongs to association rule mining.
- Finds a set of properties whose support is greater than the user-defined minimum support.

Page 17: Instance-Based Ontological Knowledge Acquisition

Decision Table

Retrieve top-level classes and core properties that are important for describing instances in a data set.

Collect top-level classes.

Filter out infrequent properties.

Convert each instance for the Decision Table algorithm:
weight(prop_1, inst), weight(prop_2, inst), ..., weight(prop_n, inst), class

PF-IIF (Property Frequency - Inverse Instance Frequency)

$$weight(prop, inst) = pf(prop, inst) \times iif(prop, D)$$

$pf(prop, inst)$ = the frequency of $prop$ in $inst$.

$$iif(prop, D) = \log \frac{|D|}{|inst_{prop}|}$$

$inst_{prop}$: an instance that contains the property $prop$.
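The PF-IIF weighting can be sketched as follows. This is an illustration under assumptions: pf is taken as the raw count of a property within an instance, the logarithm is the natural log (the slides do not state the base), and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def pf_iif_weights(instances):
    """Compute weight(prop, inst) = pf(prop, inst) * log(|D| / |inst_prop|).

    instances: dict mapping an instance id to the list of properties it
               uses (a property may appear several times).
    """
    n = len(instances)  # |D|
    containing = Counter()  # number of instances that contain each property
    for props in instances.values():
        containing.update(set(props))

    weights = {}
    for inst, props in instances.items():
        pf = Counter(props)  # assumed: raw occurrence count within the instance
        weights[inst] = {p: pf[p] * math.log(n / containing[p]) for p in pf}
    return weights


# Toy data: "name" occurs in every instance, so its weight is 0 everywhere.
toy = {
    "a": ["name", "name", "population"],
    "b": ["name"],
    "c": ["name", "director"],
}
toy_weights = pf_iif_weights(toy)
```

As with TF-IDF, a property used by every instance gets weight 0, while a property found in few instances is weighted up.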

Page 18: Instance-Based Ontological Knowledge Acquisition

Apriori

Retrieve top-level classes and frequent core properties that are important for describing instances in a specific class.

Collect top-level classes.

Filter out infrequent properties.

Convert each instance of top-level class c for the Apriori algorithm:
[prop_1, prop_2, ..., prop_n]

Define minimum support and confidence metric.
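The frequent-set part of Apriori can be sketched in plain Python. This is a textbook-style illustration, not the tool used in the experiments: it covers only the support threshold (the confidence metric for rules is omitted), it keeps sets whose support is at least `min_support`, and the example transactions are invented.

```python
from itertools import combinations


def frequent_property_sets(transactions, min_support):
    """Return {frozenset of properties: support} for all frequent sets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {p for t in transactions for p in t}
    current = {frozenset([p]) for p in items
               if support(frozenset([p])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Invented property lists for four instances of one top-level class.
person_instances = [
    ["foaf:givenName", "foaf:surname", "db-onto:birthDate"],
    ["foaf:givenName", "foaf:surname"],
    ["foaf:givenName", "foaf:surname", "db-onto:birthDate"],
    ["foaf:givenName"],
]
core_sets = frequent_property_sets(person_instances, min_support=0.5)
```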

Page 20: Instance-Based Ontological Knowledge Acquisition

Ontology Merger

Graph-Based Ontology Integration outputs a Preliminary Integrated Ontology.

For the ontology classes and properties retrieved from the Machine-Learning-Based Approach:
- If Class $c \notin$ Preliminary Integrated Ontology, add <ex-onto:ClassTerm_new, ex-prop:hasMemberClasses, c>.
- For each Property prop retrieved from top-level class c using Apriori, add a triple <prop, rdfs:domain, c>.
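The two merger rules can be sketched as a function that returns the triples to add. The function names, the way a new term is derived from the class's local name, and the example inputs are assumptions for illustration; the slides do not say how ex-onto:ClassTerm_new is named.

```python
def new_term_for(cls):
    # Hypothetical naming rule: reuse the local name under the ex-onto prefix.
    return "ex-onto:" + cls.split(":", 1)[1]


def merge(preliminary_classes, ml_classes, apriori_props):
    """Return triples to add to the preliminary integrated ontology.

    preliminary_classes: classes already grouped by graph-based integration.
    ml_classes: classes retrieved by the machine-learning step.
    apriori_props: dict mapping a top-level class to its Apriori properties.
    """
    triples = []
    # Rule 1: a class missing from the preliminary ontology gets a new term.
    for c in sorted(ml_classes):
        if c not in preliminary_classes:
            triples.append((new_term_for(c), "ex-prop:hasMemberClasses", c))
    # Rule 2: each Apriori property gets the top-level class as its domain.
    for c in sorted(apriori_props):
        for prop in sorted(apriori_props[c]):
            triples.append((prop, "rdfs:domain", c))
    return triples


added = merge(
    preliminary_classes={"db-onto:Country"},
    ml_classes={"db-onto:Country", "db:Species"},
    apriori_props={"db:Species": ["db-onto:kingdom", "db-onto:family"]},
)
```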

Page 21: Instance-Based Ontological Knowledge Acquisition

Experiments

Data Sets

Graph-Based Ontology Integration

Decision Table

Apriori

Comparison of Integrated Ontology

Case Studies

Page 22: Instance-Based Ontological Knowledge Acquisition

Data Sets

DBpedia (v3.6): cross-domain, 3.5 million things, 8.9 million URIs.

Geonames (v2.2.1): geographical domain, 7 million URIs.

NYTimes: media domain, 10,467 subject news.

LinkedMDB: media domain, 0.5 million entities.

Page 23: Instance-Based Ontological Knowledge Acquisition

Data Sets - Machine Learning

| Data Set | Instances | Selected Instances | Class | Top-level Class | Property | Selected Property |
|---|---|---|---|---|---|---|
| DBpedia | 3,708,696 | 64,460 | 241 | 28 | 1385 | 840 |
| Geonames | 7,480,462 | 45,000 | 428 | 9 | 31 | 21 |
| NYTimes | 10,441 | 10,441 | 5 | 4 | 8 | 7 |
| LinkedMDB | 694,400 | 50,000 | 53 | 10 | 107 | 60 |

Selected Instances
- Randomly select instances per class: DBpedia (5000), Geonames (3000), NYTimes (All), LinkedMDB (3000).

Top-level Classes
- Ontology-based data set: use subsumption relations.
- Without ontology: use categories.

Selected Properties
- With frequency threshold $\theta$ as $\sqrt{n}$, where $n$ is the total number of instances in the data set.
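The property filter can be sketched as below. It assumes that a property's frequency is the number of instances using it and that a property is kept when its frequency is at least the square root of n; the slides give only the threshold itself, and the function name and toy data are hypothetical.

```python
import math
from collections import Counter


def select_properties(instances):
    """Keep properties used by at least sqrt(n) of the n instances."""
    n = len(instances)
    theta = math.sqrt(n)
    # Count each property once per instance that uses it.
    frequency = Counter(p for props in instances for p in set(props))
    return {p for p, count in frequency.items() if count >= theta}


# Toy data set with n = 16 instances, so the threshold is 4.
toy_instances = [{"a", "b"}] * 4 + [{"a", "c"}] * 3 + [{"a"}] * 9
kept = select_properties(toy_instances)
```

Here "a" (16 instances) and "b" (4 instances) pass the threshold, while "c" (3 instances) is filtered out.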

Page 24: Instance-Based Ontological Knowledge Acquisition

Graph-Based Ontology Integration

13 graph patterns:
- Frequent graph patterns: GP1, GP2, GP3
- N, G, D: GP4, GP5, GP7, GP8
- N, M, D: GP6
- M, G, D: GP9
- M, D, N, G: GP10, GP11, GP12, GP13

97 classes into 49 groups.

357 properties into 38 groups.

Retrieved related classes and properties by analyzing graph patterns. [Zhao, I-Semantics2012]

Page 25: Instance-Based Ontological Knowledge Acquisition

Evaluation of Machine Learning Approaches

Evaluate the Decision Table and Apriori algorithms.

Evaluation of Decision Table
- Evaluate whether the retrieved sets of properties are important for describing instances by testing whether they can be used to distinguish different types of instances in the data set.

Evaluation of Apriori
- Analyze the performance of the Apriori algorithm in each data set with examples of retrieved sets of properties.

Page 26: Instance-Based Ontological Knowledge Acquisition

Decision Table

| Data Set | Precision | Recall | F-Measure | Selected Properties |
|---|---|---|---|---|
| DBpedia | 0.892 | 0.821 | 0.837 | 53 |
| Geonames | 0.472 | 0.4 | 0.324 | 10 |
| NYTimes | 0.795 | 0.792 | 0.785 | 5 |
| LinkedMDB | 1 | 1 | 1 | 11 |

Core properties are evaluated by predicting classes of instances (10-fold cross-validation).
- 11 properties from LinkedMDB can correctly identify the class of an instance.
- DBpedia and NYTimes perform well with the selected properties.
- The 10 properties from Geonames are commonly used for all types of classes.

Examples of retrieved core properties.
- DBpedia: db-prop:city, db-prop:debut, db-onto:formationYear, etc.
- Geonames: geo-onto:alternateName, geo-onto:countryCode, etc.
- NYTimes: nyt:latest_use, nyt:topicPage, wgs84_pos:long, etc.
- LinkedMDB: mdb:director_directorid, mdb:writer_writerid, etc.

Retrieved top-level classes and core properties in each data set.

Page 27: Instance-Based Ontological Knowledge Acquisition


Apriori

Examples of retrieved core properties with Apriori Algorithm.

| Data Set | Class | Properties |
|---|---|---|
| DBpedia | db:Event | db-onto:place, db-prop:date, db-onto:related/geo |
| DBpedia | db:Species | db-onto:kingdom, db-onto:class, db-onto:family |
| DBpedia | db:Person | foaf:givenName, foaf:surname, db-onto:birthDate |
| Geonames | geo:P | geo-onto:alternateName, geo-onto:countryCode |
| Geonames | geo:R | wgs84_pos:alt, geo-onto:name, geo-onto:countryCode |
| NYTimes | nyt:nytd_geo | wgs84_pos:long |
| NYTimes | nyt:nytd_des | skos:scopeNote |
| LinkedMDB | mdb:actor | mdb:performance, mdb:actor_name, mdb:actor_netflix_id |
| LinkedMDB | mdb:film | mdb:director, mdb:performance, mdb:actor, dc:date |

- DBpedia and LinkedMDB: retrieved unique properties.
- Geonames and NYTimes: retrieved commonly used properties only.
- Automatically added missing domain information: <prop, rdfs:domain, class_top>.

Retrieved frequent core properties in each top-level class.

Page 28: Instance-Based Ontological Knowledge Acquisition


Comparison of Integrated Ontology

| | Previous Work: Graph-Based Integration | Machine Learning: Decision Table | Machine Learning: Apriori | Current Work: Integrated Ontology |
|---|---|---|---|---|
| Class | 97 | 50 (38 new) | 50 (38 new) | 135 (38 new) |
| Property | 357 | 79 (49 new) | 119 (80 new) | 453 (96 new) |

Previous Work: 97 classes in 49 groups, 357 properties in 38 groups.

Current Work: 135 classes in 87 groups, 453 properties in 97 groups.

Apriori retrieves more properties than Decision Table.

33 new properties are found with both Apriori and Decision Table.
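The counts on this slide are mutually consistent, which a short check makes explicit. The numbers are taken from the slide; the variable names are illustrative, and the reading that the 33 shared properties are counted once (inclusion-exclusion) is an interpretation.

```python
# New properties found by each method, and by both (from the slide).
decision_table_new = 49
apriori_new = 80
found_by_both = 33

# Inclusion-exclusion: properties found by both methods are counted once.
new_properties = decision_table_new + apriori_new - found_by_both

# Totals after merging into the previous graph-based result.
total_properties = 357 + new_properties
total_classes = 97 + 38
```

This gives 96 new properties, 453 properties in total, and 135 classes, matching the Integrated Ontology column.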

Page 29: Instance-Based Ontological Knowledge Acquisition

Case Studies I

Find Missing Links of Islands with Integrated Ontology

SELECT DISTINCT ?geo ?db ?string
WHERE {
  ?geo geo-onto:featureCode geo-onto:T.ISL .
  ?geo ?gname ?string .
  ex-onto:name ex-prop:hasMemberDataTypes ?gname .
  ?db rdf:type db-onto:Island .
  ex-onto:name ex-prop:hasMemberDataTypes ?dname .
  ?db ?dname ?string .
}

Retrieved 509 links, including 218 existing SameAs links:
- 97 existing links from DBpedia to Geonames.
- 211 links from Geonames to DBpedia.
- 90 bidirectional links between DBpedia and Geonames.

Discovered 291 missing links with the integrated ontology using exact matching on the labels of instances.
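The link counts follow from simple set arithmetic, shown here as a quick check. The figures come from the slide; the variable names are illustrative.

```python
retrieved_links = 509
dbpedia_to_geonames = 97
geonames_to_dbpedia = 211
bidirectional = 90

# A bidirectional link appears in both directed counts, so subtract it once.
existing_links = dbpedia_to_geonames + geonames_to_dbpedia - bidirectional

# Links retrieved by the query that are not yet stated as owl:sameAs.
missing_links = retrieved_links - existing_links
```

The result is 218 existing links and 291 missing links, as reported.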

Page 30: Instance-Based Ontological Knowledge Acquisition

Case Studies II

Predicates grouped in ex-prop:birthDate

| Property | Number of Instances | rdfs:domain |
|---|---|---|
| db-onto:birthDate | 287,327 | db-onto:Person |
| db-prop:datebirth | 1,675 | N/A |
| db-prop:dateofbirth | 87,364 | N/A |
| db-prop:dateOfBirth | 163,876 | N/A |
| db-prop:born | 34,832 | N/A |
| db-prop:birthdate | 70,630 | N/A |
| db-prop:birthDate | 101,121 | N/A |

Suggest "db-onto:birthDate" as the standard property because it
- has an rdfs:domain definition, and
- has the highest usage in the DBpedia instances.
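The selection criterion can be expressed as a small ranking function. This is a sketch of one possible rule (prefer a property with an rdfs:domain, then the highest usage); the slides state the two reasons but not a tie-breaking order, and the function name is hypothetical.

```python
# (property, number of instances, rdfs:domain or None), from the table above.
birth_date_group = [
    ("db-onto:birthDate", 287327, "db-onto:Person"),
    ("db-prop:datebirth", 1675, None),
    ("db-prop:dateofbirth", 87364, None),
    ("db-prop:dateOfBirth", 163876, None),
    ("db-prop:born", 34832, None),
    ("db-prop:birthdate", 70630, None),
    ("db-prop:birthDate", 101121, None),
]


def suggest_standard_property(group):
    """Prefer a property with a defined domain, then the most used one."""
    best = max(group, key=lambda row: (row[2] is not None, row[1]))
    return best[0]


standard = suggest_standard_property(birth_date_group)
```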

Page 31: Instance-Based Ontological Knowledge Acquisition

Case Studies III

Give me all the cities with more than 10,000,000 inhabitants.

Standard Query

SELECT DISTINCT ?uri ?string
WHERE {
  ?uri rdf:type db-onto:City .
  ?uri db-prop:populationTotal ?inhabitants .
  FILTER (?inhabitants > 10000000) .
  OPTIONAL { ?uri rdfs:label ?string .
             FILTER (lang(?string) = 'en') }
}

Query with the Integrated Ontology

SELECT DISTINCT ?uri ?string
WHERE {
  ?uri rdf:type db-onto:City .
  ex-onto:population ex-prop:hasMemberDataTypes ?prop .
  ?uri ?prop ?inhabitants .
  FILTER (?inhabitants > 10000000) .
  OPTIONAL { ?uri rdfs:label ?string .
             FILTER (lang(?string) = 'en') }
}

A SPARQL example from QALD-1 Open Challenge.

Standard query: 9 cities.

Query with the integrated ontology: 20 cities.

Helps QA systems find more related answers with simple queries.

Page 32: Instance-Based Ontological Knowledge Acquisition

Conclusion and Future Work

Conclusion

Semi-automatic ontology integration framework
- Graph-Based Ontology Integration: ontology similarity matching on SameAs graph patterns; retrieve related ontology classes and properties.
- Machine-Learning-Based Ontology Schema Extraction: Decision Table and Apriori; extract top-level classes and core properties.
- Ontology Merger

Find missing links, detect misuses of ontologies, and access various data sets with the integrated ontology.

Future Work

Automatically detect and revise mistakes in ontology merger.

Automatically detect ranges and domains of properties.

Test our framework with more LOD data sets.

Page 33: Instance-Based Ontological Knowledge Acquisition

Thank you! Questions?

Lihua Zhao, [email protected]

Ryutaro Ichise, [email protected]