OntoQA: Metric-Based Ontology Quality Analysis. Samir Tartir, I. Budak Arpinar, Michael Moore, Amit P. Sheth, Boanerges Aleman-Meza. IEEE Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources, Houston, Texas, November 27, 2005.
![Page 1: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/1.jpg)
OntoQA: Metric-Based Ontology Quality Analysis
Samir Tartir, I. Budak Arpinar, Michael Moore, Amit P. Sheth, Boanerges Aleman-Meza
IEEE Workshop on Knowledge Acquisition from Distributed, Autonomous, Semantically Heterogeneous Data and Knowledge Sources
Houston, Texas, November 27, 2005
![Page 2: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/2.jpg)
The Semantic Web
• The current web is intended for human use.
• The semantic web is for humans and computers.
• The semantic web uses ontologies as a knowledge-sharing vehicle.
• Many ontologies currently exist: GO, OBO, SWETO, TAP, GlycO, PropreO, etc.
![Page 3: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/3.jpg)
Motivation
• With several ontologies to choose from, users often face the problem of selecting the one best suited to their needs.
![Page 4: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/4.jpg)
OntoQA
• Metric-Based Ontology Quality Analysis
• Describes ontology schemas and instance bases (IBs) through different sets of metrics
• OntoQA is implemented as part of the SemDis project.
[Figure labels: open/proprietary heterogeneous data sources (documents, databases, HTML, XML, feeds, emails); Ontology Schema; Populated Ontology]
![Page 5: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/5.jpg)
Contributions
• Defining the quality of ontologies in terms of:
  – Schema
  – Instances
    · IB metrics
    · Class-extent metrics
• Providing metrics to quantitatively describe each group
![Page 6: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/6.jpg)
I. Schema Metrics
• Schema metrics address the design of the ontology schema.
• Schema quality can be hard to measure: it depends on domain-expert consensus, subjectivity, etc.
• Three metrics:
  – Relationship richness
  – Attribute richness
  – Inheritance richness
![Page 7: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/7.jpg)
I.1 Relationship Richness
• How close is the schema structure to a pure taxonomy?
• Diversity of relations is a good indication of schema richness.
$$RR = \frac{|P|}{|IsA| + |P|}$$
|P|: Number of non-IsA relationships
|IsA|: Number of IsA relationships
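As a minimal sketch, assuming the metric is the share of non-IsA relationships among all relationships in the schema, relationship richness could be computed as follows. The function name and the example counts are hypothetical and are not taken from OntoQA itself.

```python
def relationship_richness(num_non_isa: int, num_isa: int) -> float:
    """RR = |P| / (|IsA| + |P|); 0.0 for a schema with no relationships."""
    total = num_non_isa + num_isa
    if total == 0:
        return 0.0
    return num_non_isa / total


# Hypothetical schema: 30 non-IsA relationships and 10 IsA relationships.
print(relationship_richness(30, 10))  # 0.75
```

A value near 0 would describe a schema that is almost a pure taxonomy, while a value near 1 would describe a schema dominated by non-IsA relationships.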
![Page 8: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/8.jpg)
I.2 Attribute Richness
• How much information do classes contain?
$$AR = \frac{|A|}{|C|}$$
|A|: Number of literal attributes
|C|: Number of classes
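The sketch below illustrates attribute richness as the average number of literal attributes per class. It assumes the schema is available as a mapping from class name to its literal attributes and that an attribute declared on two classes is counted once per class; both are assumptions made for illustration, and the class names are hypothetical.

```python
from typing import Dict, List


def attribute_richness(class_attributes: Dict[str, List[str]]) -> float:
    """AR = |A| / |C|: literal attributes divided by number of classes."""
    num_classes = len(class_attributes)
    if num_classes == 0:
        return 0.0
    num_attributes = sum(len(attrs) for attrs in class_attributes.values())
    return num_attributes / num_classes


# Hypothetical schema with three classes and three literal attributes.
schema = {
    "Person": ["name", "birthDate"],
    "City": ["name"],
    "Event": [],
}
print(attribute_richness(schema))  # 1.0
```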
![Page 9: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/9.jpg)
I.3 Inheritance Richness (Fan-out)
• General (e.g., spanning various domains) vs. specific
$$IR_S = \frac{\sum_{C_i \in C} |H^C(C_j, C_i)|}{|C|}$$
|Hc(cj, ci)|: Number of subclasses of class Ci
|C|: Number of classes
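A small sketch of schema-level inheritance richness, read here as the average number of subclasses per class. It assumes that only direct subclasses are counted and that the hierarchy is given as (subclass, superclass) pairs; the class names are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def inheritance_richness(classes: Set[str],
                         subclass_of: List[Tuple[str, str]]) -> float:
    """Average number of direct subclasses per class in the schema."""
    if not classes:
        return 0.0
    # Count how many direct subclasses each superclass has.
    subclass_counts = Counter(parent for _child, parent in subclass_of)
    return sum(subclass_counts[c] for c in classes) / len(classes)


classes = {"Thing", "Person", "Place", "City", "Researcher"}
hierarchy = [
    ("Person", "Thing"),
    ("Place", "Thing"),
    ("City", "Place"),
    ("Researcher", "Person"),
]
print(inheritance_richness(classes, hierarchy))  # 0.8
```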
![Page 10: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/10.jpg)
II. Instance Metrics
• Deal with the size and distribution of the instance data.
• Instance metrics are grouped into two subcategories:
  1. IB metrics: describe the IB as a whole
  2. Class metrics: describe the way each class that is defined in the schema is being utilized in the IB
![Page 11: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/11.jpg)
II.1.a Class Richness
• How much does the IB utilize the classes defined in the schema?
• How many classes (in the schema) are actually populated?
$$CR = \frac{|C'|}{|C|}$$
|C’|: Number of used classes
|C|: Number of defined classes
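A minimal sketch of class richness, assuming the IB is available as a mapping from instance identifier to its class and that a class counts as "used" when at least one instance belongs to it. All identifiers below are hypothetical.

```python
from typing import Dict, Set


def class_richness(defined_classes: Set[str],
                   instance_types: Dict[str, str]) -> float:
    """CR = |C'| / |C|: fraction of defined classes that have instances."""
    if not defined_classes:
        return 0.0
    # Ignore instances typed with a class that the schema does not define.
    used = {cls for cls in instance_types.values() if cls in defined_classes}
    return len(used) / len(defined_classes)


defined = {"Publication", "City", "Bank", "Airport"}
instances = {"paper1": "Publication", "paper2": "Publication", "athens": "City"}
print(class_richness(defined, instances))  # 0.5
```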
![Page 12: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/12.jpg)
II.1.b Average Population
• How well is the IB “filled”?
$$P = \frac{|I|}{|C|}$$
|I|: Number of instances
|C|: Number of defined classes
![Page 13: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/13.jpg)
II.1.c Cohesion
• Is the IB graph connected or disconnected?
$$Coh = |CC|$$
|CC|: Number of connected components
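Counting connected components can be sketched with a plain depth-first search. The version below treats relationships between instances as undirected edges, which is an assumption made for this illustration; the instance names are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def count_connected_components(instances: Iterable[str],
                               relationships: List[Tuple[str, str]]) -> int:
    """Coh = |CC|: number of connected components in the instance graph."""
    adjacency: Dict[str, Set[str]] = {i: set() for i in instances}
    for a, b in relationships:
        # Treat every relationship as an undirected edge.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen: Set[str] = set()
    components = 0
    for start in adjacency:
        if start in seen:
            continue
        components += 1
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return components


edges = [("a", "b"), ("b", "c"), ("d", "e")]
print(count_connected_components(["a", "b", "c", "d", "e"], edges))  # 2
```

A result of 1 would mean the whole IB forms a single connected graph; larger values indicate isolated islands of instances.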
![Page 14: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/14.jpg)
II.2.a Importance
• How much focus was paid to each class during instance population?
$$Imp_{C_i} = \frac{|C_i(I)|}{|I|}$$
|Ci(I)|: Number of instances defined for class Ci
|I|: Number of instances
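The following sketch computes importance for every class at once, as the share of all instances that belong to each class. It assumes each instance has exactly one class; the instance and class names are hypothetical.

```python
from collections import Counter
from typing import Dict


def class_importance(instance_types: Dict[str, str]) -> Dict[str, float]:
    """Imp(Ci) = |Ci(I)| / |I| for every class that has instances."""
    total = len(instance_types)
    counts = Counter(instance_types.values())
    # An empty IB yields an empty result, so no division by zero occurs.
    return {cls: n / total for cls, n in counts.items()}


instances = {
    "paper1": "Publication",
    "paper2": "Publication",
    "paper3": "Publication",
    "athens": "City",
}
print(class_importance(instances))  # {'Publication': 0.75, 'City': 0.25}
```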
![Page 15: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/15.jpg)
II.2.b Connectivity
• What classes are central and what are on the boundary?
$$Conn_{C_i} = \left|\{ I_j : P(I_i, I_j) \wedge I_i \in C_i(I) \wedge I_j \in C_j(I) \wedge C_j \in C \}\right|$$
P(Ii,Ij): Relationships between instances Ii and Ij.
Ci(I): Instances of class Ci.
C: Defined classes.
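One possible reading of connectivity is the number of distinct instances that are reached by a relationship starting at an instance of the class. The sketch below follows that reading; the (subject, predicate, object) triple layout, the direction of the relationship, and all names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject instance, relationship, object instance)


def class_connectivity(cls: str,
                       instance_types: Dict[str, str],
                       relationships: List[Triple]) -> int:
    """Count distinct instances linked from instances of the given class."""
    targets = {
        obj
        for subj, _pred, obj in relationships
        if instance_types.get(subj) == cls and obj in instance_types
    }
    return len(targets)


types = {
    "paper1": "Publication",
    "paper2": "Publication",
    "alice": "Researcher",
    "conf1": "Conference",
    "org1": "Organization",
}
triples = [
    ("paper1", "author", "alice"),
    ("paper1", "publishedIn", "conf1"),
    ("paper2", "author", "alice"),
    ("alice", "affiliation", "org1"),
]
print(class_connectivity("Publication", types, triples))  # 2
print(class_connectivity("Conference", types, triples))   # 0
```

Under this reading, classes with high values sit at the center of the instance graph and classes with values near zero sit on its boundary.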
![Page 16: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/16.jpg)
II.2.c Fullness
• Is the number of instances close to the expected?
$$F = \frac{|C_i(I)|}{|C_i'(I)|}$$
|Ci(I)|: Number of instances of class Ci.
|Ci’(I)|: Number of expected instances of class Ci.
![Page 17: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/17.jpg)
II.2.d Relationship Richness
• How well does the IB utilize relationships defined in the schema?
$$RR_{C_i} = \frac{\left|\mathrm{Distinct}\{ P(I_i, I_j) : I_i \in C_i(I), I_j \in C_j(I), C_j \in C \}\right|}{|P(C_i, C_j)|}$$
P(Ii,Ij): Relationships between instances Ii and Ij.
Ci(I): Instances of class Ci.
Cj(I): Instances of class Cj.
C: Defined classes.
P(Ci,Cj): Relationships between classes Ci and Cj.
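As a sketch, class-level relationship richness can be read as the fraction of the relationships defined for a class in the schema that its instances actually use. The data layout (triples plus a per-class set of schema relationships) and every name below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject instance, relationship, object instance)


def class_relationship_richness(cls: str,
                                instance_types: Dict[str, str],
                                relationships: List[Triple],
                                schema_relationships: Dict[str, Set[str]]) -> float:
    """Distinct relationships used by instances of cls / relationships defined for cls."""
    defined = schema_relationships.get(cls, set())
    if not defined:
        return 0.0
    used = {
        pred
        for subj, pred, obj in relationships
        if instance_types.get(subj) == cls
        and obj in instance_types
        and pred in defined
    }
    return len(used) / len(defined)


types = {"paper1": "Publication", "alice": "Researcher", "conf1": "Conference"}
triples = [
    ("paper1", "author", "alice"),
    ("paper1", "publishedIn", "conf1"),
]
schema = {"Publication": {"author", "publishedIn", "cites", "editor"}}
print(class_relationship_richness("Publication", types, triples, schema))  # 0.5
```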
![Page 18: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/18.jpg)
II.2.e Inheritance Richness
• Is the class general or specific?
$$IR_{C_i} = \frac{\sum_{C_j \in C'} |H^C(C_k, C_j)|}{|C'|}$$
C’: Classes belonging to the subtree rooted at Ci
|Hc(ck, cj)|: Number of subclasses of class Cj
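A sketch of class-level inheritance richness: the average number of direct subclasses over all classes in the subtree rooted at the given class. It assumes the hierarchy is a tree (no cycles, no multiple inheritance) stored as a mapping from class to its direct subclasses; the class names are hypothetical.

```python
from typing import Dict, List


def class_inheritance_richness(root: str,
                               children: Dict[str, List[str]]) -> float:
    """Average number of direct subclasses per class in the subtree at root."""
    # Collect every class in the subtree, including the root itself.
    subtree: List[str] = []
    stack = [root]
    while stack:
        cls = stack.pop()
        subtree.append(cls)
        stack.extend(children.get(cls, []))
    total_subclasses = sum(len(children.get(cls, [])) for cls in subtree)
    return total_subclasses / len(subtree)


hierarchy = {
    "Publication": ["Scientific_Publication", "Book"],
    "Scientific_Publication": ["Journal_Article"],
}
print(class_inheritance_richness("Publication", hierarchy))  # 0.75
print(class_inheritance_richness("Book", hierarchy))         # 0.0
```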
![Page 19: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/19.jpg)
Implementation
• Written in Java
• Processes ontology schema and IB files written in OWL, RDF, or RDFS.
• Uses Sesame to process the ontology schema and IB files.
![Page 20: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/20.jpg)
Testing
• SWETO: LSDIS’ general-purpose ontology that covers domains including publications, affiliations, geography, and terrorism.
• TAP: Stanford’s general-purpose ontology. It is divided into 43 domains, including publications, sports, and geography.
• GlycO: LSDIS’ ontology for glycan expression
• OBO: Open Biomedical Ontologies
![Page 21: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/21.jpg)
Results – Class Metrics
| Ontology | # of Classes | # of Instances | Inheritance Richness | Class Richness | Average Population |
|---|---|---|---|---|---|
| SWETO | 44 | 1,003,021 | 0.9 | 56.8% | 22,795.9 |
| TAP | 3,230 | 71,487 | 1.2 | 9.4% | 22.1 |
| GlycO | 356 | 387 | 1.3 | 18.0% | 1.1 |
| PropreO | 244 | 0 | 1.0 | 0.0% | 0.0 |
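The Average Population column can be recomputed from the class and instance counts in the results above, taking average population as the number of instances divided by the number of defined classes. The sketch below does this for the four ontologies; the function name is hypothetical, and rounding to one decimal place is assumed to match the reported figures.

```python
def average_population(num_instances: int, num_classes: int) -> float:
    """P = |I| / |C|: average number of instances per defined class."""
    if num_classes == 0:
        return 0.0
    return num_instances / num_classes


# (ontology, number of classes, number of instances) as reported in the results.
reported = [
    ("SWETO", 44, 1_003_021),
    ("TAP", 3_230, 71_487),
    ("GlycO", 356, 387),
    ("PropreO", 244, 0),
]
for name, num_classes, num_instances in reported:
    print(name, round(average_population(num_instances, num_classes), 1))
# SWETO 22795.9
# TAP 22.1
# GlycO 1.1
# PropreO 0.0
```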
![Page 22: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/22.jpg)
Results – Class Importance
[Figure: three bar charts titled "Class Importance", one each for SWETO, TAP, and GlycO; x-axis: Class, y-axis: importance.]
SWETO classes shown: Publication, Scientific_Publication, Computer_Science_Researcher, Organization, Company, Conference, Place, City, Bank, Airport, Terrorist_Attack, Event, ACM_Subject_Descriptors
TAP classes shown: Musician, Athlete, Author, Actor, Movie, PersonalComputerGame, Book, ProductType, UnitedStatesCity, University, City, Fortune1000Company, Astronaut, ComicStrip
GlycO classes shown: N-glycan, glycan_moiety, N-glycan_residue, carbohydrate_residue_property, N-glycan_alpha-D-Manp, alpha-D-mannopyranosyl_residue, N-glycan_beta-D-GlcpNAc, N-acetyl-beta-D-glucopyranosaminyl_residue, molecular_fragment, sugar_configuration, beta-D-galactopyranosyl_residue, N-glycan_beta-D-Galp, N-glycan_alpha-Neu5Ac, sugar_structural_variant
![Page 23: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/23.jpg)
Results – Class Connectivity
[Figure: three bar charts titled "Class Connectivity", one each for SWETO, TAP, and GlycO; x-axis: Class, y-axis: connectivity.]
SWETO classes shown: Terrorist_Attack, Bank, Airport, ACM_Second_level_Classification, ACM_Third_level_Classification, City, State, ACM_Subject_Descriptors, ACM_Top_level_Classification, Computer_Science_Researcher, Scientific_Publication, Company, Terrorist_Organization
TAP classes shown: CMUFaculty, Person, ResearchProject, MailingList, CMUGraduateStudent, CMUPublication, CMU_RAD, W3CSpecification, W3CPerson, W3CWorkingDraft, ComputerScientist, CMUCourse, BaseballTeam, W3CNote
GlycO classes shown: N-glycan_beta-D-GalpNAc, N-glycan_beta-D-GlcpNAc, N-glycan_alpha-Neu5Ac, N-glycan_alpha-D-Galp, N-glycan_alpha-L-Fucp, N-glycan_alpha-Neu5Gc, N-glycan_beta-D-Xylp, N-glycan_beta-D-Galp, N-glycan_alpha-D-Glcp, N-glycan_alpha-D-Manp, N-glycan_beta-D-Manp, N-acetyl-glucosaminyl_transferase_V, N-glycan_alpha-D-GlcpNAc, N-glycan_D-GlcNAc-ol
![Page 24: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/24.jpg)
BioMedical Ontologies

| Ontology | No. of Terms (Instances) | Average No. of Subterms | Connectivity |
|---|---|---|---|
| Protein-protein Interaction | 195 | 4.6 | 1.1 |
| MGED | 228 | 5.1 | 0.3 |
| Biological Imaging Methods | 260 | 5.2 | 1.0 |
| Physico-chemical Process | 550 | 2.7 | 1.3 |
| Cereal Plant Trait | 692 | 3.7 | 1.1 |
| BRENDA | 2,222 | 3.3 | 1.2 |
| Human Disease | 19,137 | 5.5 | 1.0 |
| Gene Ontology | 20,002 | 4.1 | 1.4 |
![Page 25: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/25.jpg)
Conclusions
• More ontologies are being introduced as the semantic web gains momentum.
• There is no easy way for users to choose the most suitable ontology for their applications.
• OntoQA offers 3 categories of metrics to describe the quality and nature of an ontology.
![Page 26: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/26.jpg)
Future Work
• Calculation of domain-dependent metrics that make use of a standard ontology in a given domain.
• Making OntoQA a web service where users can enter the paths of their ontology files and use OntoQA to measure the quality of the ontology.
![Page 27: OntoQA: Metric-Based Ontology Quality Analysis](https://reader035.vdocuments.us/reader035/viewer/2022062321/56812f7f550346895d95017e/html5/thumbnails/27.jpg)
Questions