semantic relations: new (terminological) challenges in a world of linked data

27
Semantic relations: new challenges in a world of linked data Nathalie Aussenac-Gilles (IRIT – CNRS, Toulouse, France) [email protected]

Upload: nathalie-aussenac-gilles

Post on 18-Jul-2015

222 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Semantic relations: new challenges in a world of

linked data

Nathalie Aussenac-Gilles

(IRIT – CNRS, Toulouse, France)

[email protected]

Outline of the talk

• Semantic relations

– what do we mean?

– why do we need them for?

• Finding semantic relations

– what are the issues?

– what can large corpora and machine learning do for you?

– how can terminology contribute to Linked Data?

2Semantic relations - Aussenac09/01/2015

Semantic relations,what do we mean?

Research field

• Linguistics: semantic relations, semantic roles, discourse relations

• Terminology– Weak structure

– Stored in DB or SKOS models

• Information extraction

What is a relation

A tree comprises at least a trunk, roots and branches.

A tree [Plants] comprises [meronymy] at least a trunk, roots and branches.

tree has_parts trunk, roots, branches(tree, has parts, trunk) … in a gardening terminology

looks for relations between instances

09/01/2015 Semantic relations - Aussenac 3

tree Plantation year Species Branches

Tree1 1990 Oak > 20

Tree2 1995 Oak 15

whole

parts

Semantic relations,what do we mean?

Research field• Domain Ontology engineering

– Formal (logic, RDF, OWL …) and may lead to infer new knowledge

– The relation is part of a network– May be shared or not

• Semantic web– Independent triples– Publically available in data

repositories with W3C Standard format

– Connect triples with existing ones, with web ontologies

What is a relation

bot:Tree bot:has_part bot:Branch

09/01/2015 Semantic relations - Aussenac 4

Trunk

Has-part

Root

Plant

Fonguscereals

Has-part

Root

is_a

TreeHas-part

Branch

bot:myTree

bot:has-part

bot:MyTreeRoots

bot:Treebot:has-

partbot:Branch

rdf:Type

Example: tree in DBPedia

5

dbpedia-owl:tree

dbpedia-owl:Species

dbpedia-owl:Place

Semantic relations - Aussenac09/01/2015

dbpedia-owl:PhysicalEntity

rdfs:subClassOf

dbpedia-owl:Organism

Example: Plants in DbPedia

6

owl:SameAsyago:WordNet_Plant_

100017222dbpedia-owl:Plant

Semantic web specificities- New types of relations: mappings- Focus 1: resources, either classes or instances- Focus 2: relations between resources - Focus 3: give a type to resources

dbpedia-owl:Acer_Stone

bergae

dbpedia-owl:Alopecurus_ca

rolinianus

dbpedia-owl:Alsmithia_long

ipesdbpedia-owl:…

rdf:typerdf:typerdf:typerdf:type

Semantic relations - Aussenac09/01/2015

dbpedia-owl:PhysicalEntity

rdfs:subClassOf

dbpedia-owl:Organism

Example: Plants in DbPedia

7

owl:SameAsyago:WordNet_Plant_

100017222dbpedia-owl:Plant

Semantic web specificities- New types of relations: mappings- Focus 1: resources, either classes or instances- Focus 2: relations between resources - Focus 3: give a type to resources

dbpedia-owl:Acer_Stone

bergae

dbpedia-owl:Alopecurus_ca

rolinianus

dbpedia-owl:Alsmithia_long

ipesdbpedia-owl:…

rdf:typerdf:typerdf:typerdf:type

Semantic relations - Aussenac09/01/2015

dbpedia-owl:PhysicalEntity

rdfs:subClassOf

dbpedia-owl:Organism

Example: Plants in DbPedia

8

owl:SameAsyago:WordNet_Plant_

100017222dbpedia-owl:Plant

Semantic web specificities- New types of relations: mappings- Focus 1: resources, either classes or instances- Focus 2: relations between resources - Focus 3: give a type to resources

dbpedia-owl:Acer_Stone

bergae

dbpedia-owl:Alopecurus_ca

rolinianus

dbpedia-owl:Alsmithia_long

ipesdbpedia-owl:…

rdf:typerdf:typerdf:typerdf:type

Semantic relations - Aussenac09/01/2015

Semantic relations: why do we need them for?

Research field

• Linguistics: to understand how language produces meaning

• Terminology: to capture domain specific terms and meanings

• Information extraction: to collect structured data – the schema (classes and relations) is known

Relations contribute to• semantics and language

interpretation• formal semantics and

discourse semantics

• Structure terminologies and make browsing easier

• Connect terms• Give meaning to “concepts”

• Find related entities (values) to build DB and then mine these data (statistics)

09/01/2015 Semantic relations - Aussenac 9

Semantic relations: why do we need them for?

Research field

• Formal ontologies: to define axioms and inferences associated with relations

• Domain ontologies: human and machine “understanding”

• Linked data: interoperability, data connection from various sites or applications

Relations contribute to• Inferences and reasoning• Ex: sub-classes inherit of some class

properties• Ex : transitivity (cf. some parthood rel.),

symmetry • Ex: cardinality …

• Relations define concepts by differences and similarities

• Relations have labels and are human readable ; each one can be processed in a specific way

• Relations connect resources (data)• Their semantics is defined in ontologies

or produce behaviors in inference engines (rdf:subClassOf)

09/01/2015 Semantic relations - Aussenac 10

09/01/2015 Semantic relations - Aussenac 11

Erarht Rahm http://dbs.uni-leipzig.de/file/paris-Octob2014.pdf

Semantic web specificities- Relations connect web data called resources- Relations connect data with ontology classes: importance of hypernymy- Relations may map ontology classes

Finding semantic relations, what are the issues?

• Knowledge sources: – where can we find relations?

• Extraction techniques– How can we identify them?

• Representation – Which way do I represent this information?

• Validation– What makes a relation representation valild? Relevant?

09/01/2015 Semantic relations - Aussenac 12

Finding semantic relations, what are the issues?

• Knowledge sources – text, human experts, existing “semantic” resources (lexicon,

terminologies, ontologies, Linked Data vocabularies)– Domain specific vs general knowledge

• Extraction techniques– “obvious” language regularities, known relations and classes (or

entities) -> Patterns• Issues : domain dependence, domain coverage, variation and

flexibility, rigidity (need to be regularly updated) • Research issues: automatic building by machine learning

– “more implicit” language regularities, medium size corpora, open list of classes/entities -> supervised learning

– Very large corpora, unexpected relations -> unsupervised learning

13Semantic relations - Aussenac09/01/2015

Pattern based relation extraction, an issue: variation

• A tree comprises at least a trunk, roots and branches.

• With branches reaching the ground, the willow is an ornamental tree.

• The tree of the neighbor has been delimed.

• He climbs on the branches of the tree.

• This tree is wonderful. Its branches reach the ground.

• Contains: very systematic pattern; the parts may be difficult to spot; enumeration > various parts

• With: meronymy pattern only in some genres (such as catalogs, biology documents)

• Delimed : Term and pattern are in the same word; requires background knowledge: delimed -> has_partbranches (and branches are cut)

• Of : Very ambiguous pattern; polysemy reduced in [verb N1 of N2]

• Its : very ambiguous pattern; necessity to take into account two sentences

14Semantic relations - Aussenac09/01/2015

Pattern based relation extraction, learning patterns (1)

• Patterns are seen (and stored) as lexicalizations of ontology properties

• Patterns are “extracted” from syntactic dependencies between related entities (in triples)

• Assumes that patterns are structured around ONE lexical entry

• Lemon format for lexical ontologies

• Entries can be frames

09/01/2015 Semantic relations - Aussenac 15

ATOLL—A framework for the automatic induction of ontology lexica S. Walter, C. Unger, P. Cimiano, DKE (94), 148-162 (2014)

Pattern based relation extraction, learning patterns (1)

• Patterns are seen (and stored) as lexicalizations of ontology properties

• Patterns are “extracted” from syntactic dependencies between related entities (in triples)

• Assumes that patterns are structured around ONE lexical entry

• Lemon format for lexical ontologies

• Entries can be frames

09/01/2015 Semantic relations - Aussenac 16

ATOLL—A framework for the automatic induction of ontology lexica S. Walter, C. Unger, P. Cimiano, DKE (94), 148-162 (2014)

Pattern based relation extraction, learning patterns (2)

09/01/2015 Semantic relations - Aussenac 17

Michelle Obama is the wife of Barack Obama, the current president.Michelle Obama allegedly told her husband, Barack Obama, to ..Michelle Obama, the 44th first lady and wife of President Barack

Dbpedia:spouse

Find all lexicalizations of the entities: Michelle Obama, Mrs. Obama, Michelle Robinson …

Pattern based relation extraction, learning patterns (3)

09/01/2015 Semantic relations - Aussenac 18

• Pattern = shortest path btw the 2 entities in the dependency graph

[MichelleObama (subject), wife (root), of (preposition), BarackObama (object)]

• Lexical entry in the ontology

Relation extraction:learning relations from enumerative structures

09/01/2015 Semantic relations - Aussenac 19

IS_A

IS_A

Learning relations from an parallel enumerative structure = - classification task to identify the relation (IS_A, part_Of, other)- Term extraction to identify the primer and the items

Relation extraction:learning relations from enumerative structures

• Corpus– 745 enumerative structures from

Wikipedia pages– 3 relation types: taxonomic,

ontological_non_taxonomic, non_ontological

• Classification task– Feature definition– Automatic evaluation of features– 3 algorithms are compared : SVM,

MaxEntropy and baseline (majority)– Training of the 2 algorithms

• Results– 82% f-measure for SVM– Best result with a 2 step process

(ontological yes/no -> feature and then taxonomic yes/no)

09/01/2015 Semantic relations - Aussenac 20

From intepretation to representation

• A tree comprises at least a trunk, roots and branches.

• With branches reaching the ground, the willow is an ornamental tree.

• The tree of the neighbor has been delimed.

Tree

Trunk

Branches

Has-part Roots

Ornamental Tree

Willow Tree Has-part Branches

Has-part Branches

Has-part Branches

Semantic relations - Aussenac 2109/01/2015

From intepretation to representation

• A tree comprises at least a trunk, roots and branches.

• With branches reaching the ground, the willow is an ornamental tree.

• The tree of the neighbor has been delimed.

• He’s climbing on the branches ofthe tree.

• This tree is wonderful. Itsbranches reach the ground.

Tree

Trunk

Branches

Has-part Roots

Has-part Branches

Has-part Branches

Semantic relations - Aussenac 2209/01/2015

NeighborTree

Instance _of

Finding semantic relations:what can large corpora and machine

learning do for you ?

• Learning patterns– Poor results

– Requires very large data sets

– Reasonable for general knowledge

• Learning relations– Much more relevant

– Large variety of approaches in the state of the art

– Key step = select feature

– more features, better the results

09/01/2015 Semantic relations - Aussenac 23

Finding semantic relations, what can terminology do for the semantic web?

• Terminology can play a key role – change its practices – contribute to enrich the LOD with QUALITY data

• Contribute to define /evaluate tools– evaluate machine learning tools – Test and experiment them on various textual genre, in

various domains– Propose more features

• Propose terminology tools – … to improve these approaches with the studies carried

out during the last 20 years to build terminologies and terminological knowledge bases.

– term extractors, relation extractors, patterns

24Semantic relations - Aussenac09/01/2015

Finding semantic relations, what can terminology do for the semantic web?

• Contribute to enrich the LOD with terminologies

– Terminologies as structuring networked vocabularies

– Publish your ontologies as public web resources

– Conform to linked data requirements

– Use standards format like SKOS (Simple KnowledgeOrganization System) the W3C’s OWL ontology for creating

thesauruses, taxonomies, and controlled vocabularies

• Contribute to the multilingual LOD

– Add lexicon to ontologies

– Multilingual efforts: LEMON, MONNET projects DID NOT include terminologists

09/01/2015 Semantic relations - Aussenac 25

The LOD and the linguistic LOD

need you!

http://linguistics.okfn.org/resources/llod/

09/01/2015 Semantic relations - Aussenac 26

Some landmark KOS LD implementations

• Many Libraries

– Swedish National Library’s Libris catalogue and thesaurus http://libris.kb.se/

– Library of Congress’ vocabularies, including LCSH http://id.loc.gov/

– DNB’s Gemeinsame Normdatei (incl. SWD subject headings) http://dnb.info/gnd/• Documentation at https://wiki.dnb.de/display/LDS

– BnF’s RAMEAU subject headings http://stitch.cs.vu.nl/

– OCLC’s DDC classification http://dewey.info/ and VIAF http://viaf.org/

– STW economy thesaurus http://zbw.eu/stw

– National Library of Hungary’s catalogue and thesauri

– http://oszkdk.oszk.hu/resource/DRJ/404(example)

• Other fields

– Wikipedia categories through Dbpedia http://dbpedia.org/

– New York Times subject headings http://data.nytimes.com/

– IVOA astronomy vocabularies http://www.ivoa.net/Documents/latest/Vocabularies.html

– GEMET environmental thesaurus http://eionet.europa.eu/gemet

– Agrovoc http://aims.fao.org/

– Linked Life Data http://linkedlifedata.com/

– Taxonconcept http://www.taxonconcept.org/

– UK Public sector vocabularies http://standards.esd.org.uk/ (e.g., http://id.esd.org.uk/lifeEvent/7)

09/01/2015 Semantic relations - Aussenac 27