What is concept drift and how to measure it?

Shenghui Wang, Stefan Schlobach, Michel Klein
Vrije Universiteit Amsterdam

EKAW 2010, Lisbon


Outline

1 Introduction

2 A theory of concept drift

3 Case studies
   Concept drift in political communication
   Concept drift in DBpedia
   Concept drift in LKIF-Core

4 Summary and future work


Introduction

Knowledge organisation systems (KOS) play a crucial role in providing semantic interoperability

formal ontologies (modelled in OWL)
thesauri or taxonomies (described in SKOS)
other term classification schemes

Concepts are the central constructs

However, it is also recognised that concepts drift: the meaning of a concept changes over time, location, or culture


Example 1: Follow the Fashion?


Example 2: Women’s role?

Suffragettes said that women’s role in society is unacceptable

Pope says that women’s role in society is unacceptable


Example 3: European Union

(1979) The European Community is a common denominator for the European Economic Community (EEC), the European Coal and Steel Community (ECSC), and the European Atomic Energy Community (EAEC). – DTV Atlas

(1999) The European Community is the new stage in the implementation of increasing the Union of the European people. – Brockhaus: Europaeische Gemeinschaft

(2003) The European Union or EU is an international organisation of European states, established by the Treaty on European Union. – Wikipedia 2003

(2006) The European Union (EU) is a supranational and intergovernmental union of 25 independent, democratic member states. – Wikipedia 2006

(2010) The European Union is an international organisation comprising 27 European countries and governing common economic, social, and security policies. – Encyclopedia Britannica


Research questions

1 What is concept drift, and how to formalise it?

2 Can we identify the impact of concept-drift?


The meaning of a concept

We consider the intension, extension and label as three components of the meaning of a concept:

Definition

The meaning $C^t$ of a concept $C$ at some moment in time $t$ is a triple $(label_t(C), int_t(C), ext_t(C))$, where $label_t(C)$ is a string, $int_t(C)$ a set of properties (the intension of $C$), and $ext_t(C)$ a subset of the universe (the extension of $C$).


Identity

Identity allows us to compare two variants of the same concept at different moments in time even if the meaning (either label, extension or the non-rigid part of its intension) has changed.

Definition

Two concepts $C_1$ and $C_2$ are considered identical if and only if their rigid intensions are equivalent, i.e., $int_r(C_1) = int_r(C_2)$.


Concept drift

This definition of drift is based on the idea that a concept retains its identity over time, i.e., remains the same at least temporarily.

Definition

A concept $C$ has extensionally drifted between time $t_i$ and $t_j$ if and only if $sim_{ext}(C^{t_i}, C^{t_j}) \neq 1$. Intensional and label drift are defined similarly. The meaning of a concept has drifted if one of the aspects has drifted.
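
The drift test can be sketched in a few lines. This is only an illustration, assuming Jaccard overlap as the similarity for the set-valued aspects and exact match for labels; the `Variant` class, the helper names and the sample data are hypothetical and not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One variant C^t of a concept: label, intension and extension at time t."""
    label: str
    intension: frozenset
    extension: frozenset


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def drifted_aspects(old, new):
    """Return the aspects whose similarity between the two variants is not 1."""
    sims = {
        "label": 1.0 if old.label == new.label else 0.0,  # assumed exact-match similarity
        "intension": jaccard(old.intension, new.intension),
        "extension": jaccard(old.extension, new.extension),
    }
    return {aspect for aspect, sim in sims.items() if sim != 1.0}


def has_drifted(old, new):
    """The meaning has drifted if at least one aspect has drifted."""
    return bool(drifted_aspects(old, new))


# Hypothetical data: same label and intension, but 25 versus 27 members.
eu_2006 = Variant("European Union", frozenset({"union", "member states"}),
                  frozenset(f"state{i}" for i in range(25)))
eu_2010 = Variant("European Union", frozenset({"union", "member states"}),
                  frozenset(f"state{i}" for i in range(27)))
print(drifted_aspects(eu_2006, eu_2010))  # {'extension'}
```

Any similarity function with values in [0, 1] could be swapped in; the test only asks whether the value equals 1.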


Concept shift

Definition

The meaning of a concept $C$ extensionally shifts between two of its variants $C^{t_i}$ and $C^{t_j}$ if the extension of $C^{t_j}$ is more similar to the extension of a non-identical concept than to the extension of $C^{t_i}$. Intensional and label shift are defined similarly.

[Figure: the variants $C_1^{t_1}$, $C_1^{t_2}$ and $C_2^{t_2}$ placed along a time axis from $t_1$ to $t_2$]
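
A possible way to test for extensional shift, assuming Jaccard overlap as the extension similarity. The function names and the sentence identifiers are made up for illustration, and the caller decides which non-identical concepts are offered as candidates.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def extensional_shift(earlier_ext, later_ext, other_exts):
    """Names of non-identical concepts whose extension is more similar to the
    later variant than the earlier variant of the same concept is."""
    own_sim = jaccard(earlier_ext, later_ext)
    return sorted(name for name, ext in other_exts.items()
                  if jaccard(later_ext, ext) > own_sim)


# Hypothetical annotated sentences (s1, s2, ...) for two variants of one concept.
childcare_2003 = {"s1", "s2", "s3", "s4"}
childcare_2006 = {"s3", "s5", "s6"}
candidates = {
    "Free Childcare": {"s5", "s6", "s7"},
    "Military": {"s9"},
}
print(extensional_shift(childcare_2003, childcare_2006, candidates))  # ['Free Childcare']
```

An empty result means the later variant is still closest to its own earlier variant, so the concept may have drifted but has not shifted.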


(In)stability

The more the meaning of a concept drifts, the more unstable it becomes.

We put the variants of one concept at different moments into a chain, i.e., $chain(C, t_1, t_n) = C^{t_1} \rightarrow C^{t_2} \rightarrow \ldots \rightarrow C^{t_n}$

We take the average similarity of all steps along this chain as the stability measure

As a relative measure, it tells whether one concept is more stable than another over a certain period of time


Applying the framework

To apply our framework for concept drift in a specific use-case, the following steps are required:

1 to define intension, extension and a labelling function

2 to define the identity of concepts

3 to define similarity functions over intension, extension and labels
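
The three steps can be read as a small configuration object that every use-case has to fill in. The sketch below is one way to organise this; `DriftSetup`, its field names and the dictionary-based variants are illustrative assumptions, not an interface defined by the framework.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DriftSetup:
    """Bundles the choices a use-case has to make (all names are illustrative)."""
    # Step 1: how to read label, intension and extension off a concept variant.
    label: Callable[[Any], str]
    intension: Callable[[Any], frozenset]
    extension: Callable[[Any], frozenset]
    # Step 2: when two variants count as the same concept.
    identical: Callable[[Any, Any], bool]
    # Step 3: one similarity function per aspect, each returning a value in [0, 1].
    sim_label: Callable[[str, str], float]
    sim_int: Callable[[frozenset, frozenset], float]
    sim_ext: Callable[[frozenset, frozenset], float]

    def compare(self, old, new):
        """Per-aspect similarity of two variants of the same concept."""
        if not self.identical(old, new):
            raise ValueError("variants do not belong to the same concept")
        return {
            "label": self.sim_label(self.label(old), self.label(new)),
            "intension": self.sim_int(self.intension(old), self.intension(new)),
            "extension": self.sim_ext(self.extension(old), self.extension(new)),
        }


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)


# Example configuration: variants are plain dicts and identity is URI equality.
setup = DriftSetup(
    label=lambda v: v["label"],
    intension=lambda v: frozenset(v["int"]),
    extension=lambda v: frozenset(v["ext"]),
    identical=lambda a, b: a["uri"] == b["uri"],
    sim_label=lambda a, b: 1.0 if a == b else 0.0,
    sim_int=jaccard,
    sim_ext=jaccard,
)

v1 = {"uri": "ex:City", "label": "city", "int": {"ex:Place"}, "ext": {"a", "b"}}
v2 = {"uri": "ex:City", "label": "city", "int": {"ex:Place"}, "ext": {"a", "b", "c", "d"}}
print(setup.compare(v1, v2))  # {'label': 1.0, 'intension': 1.0, 'extension': 0.5}
```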


Case studies

Political communication (a political vocabulary described in SKOS)

DBpedia (a general-purpose ontology modelled in RDF(S))

LKIF-core (a legal ontology modelled in OWL)


Concept drift in political communication

Concept drift in political vocabularies

Communication scientists use certain vocabularies to annotate newspapers, so that they can do content analysis.

We studied five variants of a SKOS vocabulary of political concepts used during five recent Dutch national election campaigns, which took place in 1994, 1998, 2002, 2003 and 2006.

We also collected all newspaper articles which were manually annotated with the concepts from the particular variant of that year.

Manual mappings are used as the identities.


Intension, extension and label of political concepts

The label of a concept is obtained using the SKOS Core labelling property skos:prefLabel.

The extension $ext(C_t)$ of a concept $C_t \in V_t$ at time $t$ is the set of all sentences annotated by $C_t$, i.e.,

$ext_s(C_t) = \{ s \in \Delta_t \mid s \text{ is annotated by } C_t \}$.

The intension $int(C_t)$ of a concept is determined by the most associated concepts. For each concept $C$, its intension is the set of concepts which co-occur with it most often in the sentences they code at one moment in time.
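
These two definitions can be sketched over a toy annotation table. The data layout (sentence id mapped to a set of concept names), the raw co-occurrence count and the cutoff `k` are assumptions made for this example; the slides do not state how many associated concepts are kept or how the association is weighted.

```python
from collections import Counter


def extension(annotations, concept):
    """Sentences annotated by `concept`; `annotations` maps sentence id -> set of concepts."""
    return {sid for sid, concepts in annotations.items() if concept in concepts}


def intension(annotations, concept, k=3):
    """The k concepts that co-occur most often with `concept` in the sentences it codes."""
    counts = Counter()
    for sid in extension(annotations, concept):
        counts.update(annotations[sid] - {concept})
    return {c for c, _ in counts.most_common(k)}


# Hypothetical annotations for one election year.
annotations_2003 = {
    "s1": {"Democracy", "Referendum"},
    "s2": {"Democracy", "Referendum", "Bureaucracy"},
    "s3": {"Democracy", "Bureaucracy"},
    "s4": {"Employers", "Unions"},
    "s5": {"Democracy", "Referendum", "High Incomes"},
}
print(sorted(extension(annotations_2003, "Democracy")))       # ['s1', 's2', 's3', 's5']
print(sorted(intension(annotations_2003, "Democracy", k=2)))  # ['Bureaucracy', 'Referendum']
```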


Similarity measures

Edit distance between concept labels

Jaccard similarity between concept intensions

Instance-matching based similarity between concept extensions


Stability of political concepts

[Figure: concepts associated with Democracy in 2002, 2003 and 2006. Node labels in the figure: EnvironmentalActivist, Moroccans, Rechtsstaat, High Incomes, Referendum, Bureaucracy, Islam, VotingComputers, Sharia; link weights range from 0.01 to 0.04.]

Figure: Intension of the concept Democracy in 3 years, with an average stability of $S_{int} = 0.02$


Stability of political concepts

[Figure: concepts associated with Employers in 2002, 2003 and 2006. Node labels in the figure include unions, employees, Socio-Economic Council, social pact, work migration and discrimination; link weights range from 0.032 to 0.266.]

Figure: Intension of the concept Employers in 3 years, with an average stability of $S_{int} = 0.15$


Concept shifts of political concepts

[Figure, panel (a) Label shift: the concepts Military and Dutch military deployment over 1994, 1998 and 2006. Panel (b) Extensional shift: the concepts Childcare and Free Childcare over 2003 and 2006.]

Figure: Example of label shift and extensional shift, where the red links indicate that the two concepts are identical according to our domain experts, while the blue links connect the most similar concepts in terms of the corresponding aspect.


Concept drift in DBpedia

Concept drift in DBpedia

We studied 4 versions of DBpedia: 3.2, 3.3, 3.4 and 3.5

We use URI references as identities of concepts


RDF(S) concepts and their meaning

Definition

Let $O$ be the DBpedia ontology, i.e., a set of triples $(s, p, o)$, and $O^*$ the semantic closure of $O$.

The rdf-label $lab_r(C)$ of $C$ is defined as the object $o$ of the triple $(C, \text{rdfs:label}, o)$.

The rdf-extension $ext_r(C)$ of $C$ is defined as the set of resources $r$ such that $(r, \text{rdf:type}, C) \in O^*$.

The rdf-intension $int_r(C)$ of $C$ is defined as the set of all triples $(C, p, o) \in O^*$ where $p = \text{rdfs:subClassOf}$, and all triples $(s, p, C)$ where $p \in \{\text{rdfs:subClassOf}, \text{rdfs:domain}, \text{rdfs:range}\}$.
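
The three definitions translate almost directly into set comprehensions. The sketch below assumes the closure has already been computed and is given as a set of `(s, p, o)` string tuples with prefixed names; it uses `rdfs:subClassOf`, the RDFS name of the subclass property, and the `ex:` resources are invented for the example.

```python
RDFS_LABEL = "rdfs:label"
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
INCOMING = {SUBCLASS, "rdfs:domain", "rdfs:range"}


def rdf_label(closure, c):
    """Object of the (c, rdfs:label, o) triple, or None if c has no label."""
    return next((o for s, p, o in closure if s == c and p == RDFS_LABEL), None)


def rdf_extension(closure, c):
    """All resources r with (r, rdf:type, c) in the closure."""
    return {s for s, p, o in closure if p == RDF_TYPE and o == c}


def rdf_intension(closure, c):
    """Superclass triples leaving c plus subclass/domain/range triples pointing at c."""
    outgoing = {(s, p, o) for s, p, o in closure if s == c and p == SUBCLASS}
    incoming = {(s, p, o) for s, p, o in closure if o == c and p in INCOMING}
    return outgoing | incoming


# Hypothetical closure; the ex: names do not refer to real DBpedia resources.
closure = {
    ("ex:City", "rdfs:label", "city"),
    ("ex:City", "rdfs:subClassOf", "ex:Settlement"),
    ("ex:City", "rdfs:subClassOf", "ex:Place"),
    ("ex:Capital", "rdfs:subClassOf", "ex:City"),
    ("ex:mayor", "rdfs:domain", "ex:City"),
    ("ex:Lisbon", "rdf:type", "ex:City"),
    ("ex:Amsterdam", "rdf:type", "ex:City"),
    ("ex:Lisbon", "rdf:type", "ex:Settlement"),
}
print(rdf_label(closure, "ex:City"))              # city
print(sorted(rdf_extension(closure, "ex:City")))  # ['ex:Amsterdam', 'ex:Lisbon']
print(len(rdf_intension(closure, "ex:City")))     # 4
```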


Stability ranking of DBpedia concepts

Rank   Extensional        Intensional
1      Planet             SportsEvent
2      Road               FormulaOneRacer
3      Infrastructure     WineRegion
4      Cyclist            Cleric
5      LunarCrater        WrestlingEvent
...    ...                ...
163    OfficeHolder       Vein
164    Politician         BasketballPlayer
165    City               EthnicGroup
166    College            Band
167    ChemicalCompound   BritishRoyalty

Table: The 5 most stable and 5 least stable DBpedia concepts in terms of their extension and intension (of the 167 concepts present in all four versions)


Concept shifts in DBpedia

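[Figure: for five concepts of DBpedia 3.2, the concept linked to it in each later version, with the value printed on the link]

dbpedia32          dbpedia33                 dbpedia34                 dbpedia35
SportsEvent        SportsEvent (0.98)        SportsEvent (0.98)        SportsEvent (0.97)
Protista           Protista (0.89)           Fungus (0.77)             Fungus (0.89)
City               City (0.99)               City (0.84)               Settlement (0.60)
River              River (0.99)              River (0.78)              Stream (0.62)
ChemicalCompound   ChemicalCompound (0.64)   ChemicalCompound (0.47)   ChemicalCompound (0.71)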


Concept drift in LKIF-Core

LKIF-Core ontology

The Legal Knowledge Interchange Format (LKIF) Core Ontology is a core ontology of basic legal concepts, developed by the ESTRELLA consortium

We study 4 major versions of LKIF-Core: 1.0, 1.0.2, 1.0.3 and 1.1.

Unfortunately, rdfs:label was rarely used; only 4 concepts specify labels, and these stay constant across all variants.

There are no instances associated with these legal concepts


The meaning of OWL concepts

Definition

Let $O$ be the OWL ontology and $O^*$ denote the OWLIM-inferred semantic closure. The owl-label $lab_o(C)$ of $C$ is defined as the object $o$ of the triple $(C, \text{rdfs:label}, o)$. The owl-intension $int_o(C)$ of $C$ is defined as:

1 all triples $(C, p, o) \in O^*$ and $(s, p, C) \in O^*$

2 all triples in chains $\{(C, p_1, o_1) \circ (s_2, p_2, o_2) \circ \ldots \circ (s_n, p_n, o_n)\}$ where $s_k = o_{k-1}$, plus

3 all triples in chains $\{(s_1, p_1, o_1) \circ (s_2, p_2, o_2) \circ \ldots \circ (s_n, p_n, C)\}$ where $s_{k+1} = o_k$, being blank nodes.
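
One reading of this definition is a traversal that starts at the concept and keeps following triples as long as the connecting node is a blank node, in both directions. The sketch below implements that reading only; the `_:` prefix for blank nodes, the tuple representation and the `ex:` names are assumptions of the example, and the closure is taken as given.

```python
def is_blank(node):
    """Blank nodes are assumed to be strings starting with '_:'."""
    return isinstance(node, str) and node.startswith("_:")


def owl_intension(closure, c):
    """Triples touching c, extended along chains that pass through blank nodes."""
    result = set()

    # Forward: triples leaving c, continued while the object is a blank node.
    frontier, seen = [c], {c}
    while frontier:
        node = frontier.pop()
        for s, p, o in closure:
            if s == node:
                result.add((s, p, o))
                if is_blank(o) and o not in seen:
                    seen.add(o)
                    frontier.append(o)

    # Backward: triples arriving at c, continued while the subject is a blank node.
    frontier, seen = [c], {c}
    while frontier:
        node = frontier.pop()
        for s, p, o in closure:
            if o == node:
                result.add((s, p, o))
                if is_blank(s) and s not in seen:
                    seen.add(s)
                    frontier.append(s)

    return result


# Hypothetical closure with one anonymous restriction (_:r1) attached to ex:Mandate.
closure = {
    ("ex:Mandate", "rdfs:subClassOf", "_:r1"),
    ("_:r1", "owl:onProperty", "ex:actor"),
    ("_:r1", "owl:someValuesFrom", "ex:Agent"),
    ("ex:Delegation", "rdfs:subClassOf", "ex:Mandate"),
    ("ex:Agent", "rdfs:subClassOf", "ex:Thing"),
}
print(len(owl_intension(closure, "ex:Mandate")))  # 4
```

The triple about `ex:Agent` is left out because the chain stops at the first named node.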


Stable and unstable concepts

Most stable concepts               Most unstable concepts
norm.owl#Custom                    legal-action.owl#Mandate
expression.owl#Promise             legal-action.owl#Public Law
norm.owl#Potestative Expression    legal-action.owl#Asignment
norm.owl#Hohfeldian Power          legal-action.owl#Act of Law
relative-places.owl#Place          legal-action.owl#Delegation

Table: The 5 most stable and the 5 most unstable concepts.


Intensional shift in LKIF-Core

lkif1.0:action.owl#Speech Act             lkif1.0.2:expression.owl#Speech Act
lkif1.0:action.owl#Termination            lkif1.0.2:process.owl#Termination
lkif1.0.2:lkif-top.owl#Mental Concept     lkif1.0.3:lkif-top.owl#Mental Entity
lkif1.0.2:lkif-top.owl#Physical Concept   lkif1.0.3:lkif-top.owl#Physical Entity

Table: Examples of confirmed intensional shift in LKIF-Core


Summary

We proposed a general theory to study concept drift based on concept identity.

We introduced a theoretical foundation for the notions of drift, shift and stability.

We applied the general mechanism in three practical applications modelled in SKOS, RDFS and OWL respectively.


Future work

Investigate alternative theories for concept drift, such as those based on morphing

Develop systematic evaluation methods

Develop applications which leverage the detected concept drift