Porting terminologies to the Semantic Web
aka : the Semiotic Web. Presentation at ISKO UK Linked Data Event, London, 2010-09-14
making sense of content TM
ISKO Linked Data Event - London - 2010-09-14 1
Porting terminologies to the Semantic Web (aka : the Semiotic Web)
Mondeca at a glance
Facts and figures
- Established : 1999
- Founder and CEO : Jean Delahousse
- Staff 2010 : 22
- Bernard Vatant has been Senior Consultant for Mondeca since 2000
Products
- Intelligent Topic Manager (Vocabularies and Knowledge base management)
- CA Manager (Content integration through semantic annotation)
Services
- Consulting and training in Semantic Web technologies deployment
- Modeling, data and vocabulary migration and integration
References
- Publication, territorial management, tourism, public sector, health
- Lexis Nexis, Wolters Kluwer, Thomson, BnF, Documentation Française, OPOCE …
- Participation in many national and European research projects
  - Including DataLift http://datalift.org/ (just about to kick off)
- Ongoing participation in Semantic Web standards and linked data community
  - From Topic Maps (2000-2001) to OWL, SKOS, …
  - In the Cloud : geonames.org, lingvoj.org ontologies
Summary
A semiotic view of terminology
- « Every sign is a thing » : signs (terms) are resources (business objects)
- The semiotic triangle : terms, concepts and referents
Current approaches to term representations
- SKOS-XL, BS 8723, ISO 25964
- The Eurovoc model : a term is a denotation of a concept
- Lexvo.org : a term is a sign defined by string + language
- ISO TC-37 standards (LMF) : only XML schemas, no ontology
Moving forward
- Limits of current approaches
- A strawman « Simple Term System »
- Introducing explicit « meaning » objects (aka : references or significations)
The pervasive Web – quick reminder
Internet (ca. 1970)
- Network of identified, connected and addressable computers
- Technical support : IP addresses
Web 1.0 (ca. 1990)
- Network of identified, connected and addressable resources
- Technical support : URLs, http
Semantic Web (ca. 2010)
- Network of identified, connected and addressable representations
- Technical support : URIs, RDF, content negotiation
- Just about anything can be represented and connected
  - People (Social Web), Devices (Web of Things), Places (GeoSemantic Web), Concepts (Web of Vocabularies) …
« Everything is a Thing »
Everything? Even signs?
Every sign is a thing (& vice versa)
http://fr.wikipedia.org/wiki/Fichier:Impasse_%C3%A0_sens_unique.jpg
Impasse Saint-Quentin
The semiotic triangle : road signs
impasse, cul-de-sac, voie sans issue, no through road, dead end, 死路 …
you have to get out by the path you came in … sometimes there is no way out at all
« signifiant »
« signifié »
« référent »
denotation
representation
The semiotic triangle : lexical signs (terms)
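L’Arctique est la région entourant le pôle Nord de la Terre à l’intérieur et aux abords du cercle polaire nord (Wikipédia)
[English : the Arctic is the region surrounding the Earth's North Pole, inside and around the Arctic Circle]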
‘Arctique’@fr
« signifiant »
« signifié »
« référent »
denotation
representation
Sorting out Terms, Concepts and Things
Terms are lexical entities (signifiants)
- Generally used as denotations for concepts or things
- If possible qualified by terminologists
- Expressed in some identified natural language
  - Devil in the details : encoding system, writing system
Concepts are specific representations of « things »
- In a certain view of the world
- For a specific functional purpose
  - Indexing, classification, search, inference
Things are ... just things
- What users care about at the end of the day (people, places, products, ideas …)
Terms, Concepts and Things should all be first-class citizens in the Semantic Web
- Switching from a term-centric to a concept-centric view …
  - As in SKOS and ISO 25964
- … does not mean that terms and terminology are out of the picture!
  - They simply need to be defined and managed at a different level
Translation into Semantic Web languages
[Diagram : the semiotic triangle expressed with Semantic Web vocabularies]
Something (« référent ») : owl:Thing, http://dbpedia.org/resource/Arctic
Concept (« signifié ») : skos:Concept, http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m
Term (« signifiant ») : skosxl:Label, http://lexvo.org/id/term/fra/Arctique, with skosxl:literalForm ‘Arctique’@fr
Edge labels : denotes, represents, foaf:focus, lvont:means, skosxl:prefLabel
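The Arctic example above can be written down as plain subject-predicate-object triples. What follows is only a minimal sketch in Python using tuples, not a real RDF library. The helper names are illustrative, and the direction chosen for lvont:means (from the term to the DBpedia resource) is an assumption, since the slide only gives the edge labels.

```python
# Resources named on the slide
THING = "http://dbpedia.org/resource/Arctic"
CONCEPT = "http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m"
TERM = "http://lexvo.org/id/term/fra/Arctique"

# Triples as (subject, predicate, object); a literal is a (text, language) pair.
triples = [
    (THING, "rdf:type", "owl:Thing"),
    (CONCEPT, "rdf:type", "skos:Concept"),
    (TERM, "rdf:type", "skosxl:Label"),
    (TERM, "skosxl:literalForm", ("Arctique", "fr")),
    (CONCEPT, "skosxl:prefLabel", TERM),
    (CONCEPT, "foaf:focus", THING),   # the concept represents the thing
    (TERM, "lvont:means", THING),     # assumed direction: the term denotes the thing
]

def objects(subject, predicate):
    """Return the objects of all triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def literal_forms_of_concept(concept):
    """Follow skosxl:prefLabel, then skosxl:literalForm."""
    return [
        literal
        for label in objects(concept, "skosxl:prefLabel")
        for literal in objects(label, "skosxl:literalForm")
    ]

print(literal_forms_of_concept(CONCEPT))
```

The point of the sketch is that the term is a resource in its own right: it has a URI, a type and properties, instead of being a bare string attached to the concept.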
Concept-centric approach to terms (SKOS)
The concept-centric approach puts concepts at the center of discourse
- Terms are denotations of concepts
- Standalone terms can be considered in theory, but not in practice
Minimal, shallow level of description of terms
- Basic properties : lexical form + language
- No support for proper lexical properties
  - Part of speech, lemma, tokenization, variant
- Basic expressivity for term-to-term relationships
  - skosxl:labelRelation is just an abstract superproperty
Good expressivity of the term-to-concept relationships
- But clearly asserted from a concept viewpoint
No support for context
- Implicit context : the term-concept relationship inside a given concept scheme
Similar approach used by BS 8723 and ISO 25964
- Also used in the EUROVOC model with customized extensions
Concept-centric approach to homographs
A term can denote more than one concept
- aka : the homography, ambiguity … issue
Q : Are homograph terms (denoting different concepts) the same resource, or not?
- In other words : should they be given the same URI?
The SKOS-XL approach
- SKOS-XL statement : if two instances of the class skosxl:Label have the same literal form, they are not necessarily the same resource.
  - IOW : the existence of distinct terms (distinct URIs) bearing the same literal form in the same language is not forbidden.
  - « table@en » can be the literal form of different terms (different URIs), e.g., denoting different concepts such as « table (furniture) », « table (database) » …
- SKOS-XL does not enforce this distinction, either
  - Using the same term (same URI) for different concepts is not forbidden
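Since SKOS-XL allows several label resources to share one literal form, a data set may need a check that finds such homographs. The following is a minimal sketch in Python; the example.org URIs and the row layout are hypothetical and only serve to illustrate the « table@en » case.

```python
from collections import defaultdict

# Rows of (label URI, literal form, language, concept URI).
# All URIs below are hypothetical illustrations.
labels = [
    ("http://example.org/term/table-1", "table", "en",
     "http://example.org/concept/table-furniture"),
    ("http://example.org/term/table-2", "table", "en",
     "http://example.org/concept/table-database"),
    ("http://example.org/term/chair-1", "chair", "en",
     "http://example.org/concept/chair-furniture"),
]

def homographs(rows):
    """Group label URIs by (literal form, language); keep groups with several URIs."""
    groups = defaultdict(set)
    for uri, form, lang, _concept in rows:
        groups[(form, lang)].add(uri)
    return {key: uris for key, uris in groups.items() if len(uris) > 1}

print(homographs(labels))
```

Here the two « table » labels are reported as homographs because they are distinct resources with the same literal form and language; a model that reused one URI for both concepts would produce no such group.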
Concept-centric model : EUROVOC
The EUROVOC model is built as an extension of SKOS
Subclasses of skosxl:Label
- eu:ThesaurusTerm, eu:PreferredTerm, eu:SimpleNonPreferredTerm …
  - Type of term defined by the type of relationship to a concept
  - No « standalone definition » of a term : a term is attached to a single concept
Specific relationships between terms
- Translation, Permuted lexical form
- Full name/short name, Acronym/expansion
No lexical (grammatical) level properties
- Neither POS, lemma, nor variants …
Homographs are distinct terms
- Hence homographs attached to different concepts
  - Have different URIs …
  - … are not linked whatsoever, except appearing as sibling results of a query …
  - … should not occur since EUROVOC should be a unique name space
A concept representation in EUROVOC as seen in Mondeca back-office (ITM)
[Screenshot callouts]
- pref label in current language
- concept attributes
- preferred term in current language
- preferred terms in other languages
- User language choice (25 languages available)
- concept schemes hierarchy (domains and microthesauri)
- related concepts
A concept representation (continued)
[Screenshot callouts]
- non-preferred terms in various languages
- broader-narrower hierarchy (display uses terms in current user language)
Term representation level
[Screenshot callouts]
- lexical form
- term type
- term attributes
- The term « meaning » concept (display uses the preferred term in current user language)
- relationships between terms
- User language choice (25 languages available)
The term-centric (semiotic) approach
As used by Lexvo.org
A term is uniquely defined by a string and a language
- This definition is made functional in the URI structure
- Example : http://lexvo.org/id/term/fra/Arctique
A term can have zero or more declared « meanings »
- Values of the « lvont:means » property
The URI is functional whether there is zero, one or more declared « meanings »
Simple approach, but the number of meanings is anyone's guess
1. http://www.lexvo.org/id/term/eng/hubject
- No meaning found in the database, but the world is open
2. http://www.lexvo.org/id/term/eng/photosphere
- Two meanings found, linked by a lvont:nearlySameAs relationship
3. http://www.lexvo.org/id/term/eng/table
- How many meanings?
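The « string + language » definition means a term URI can be computed without looking anything up. The following is a minimal sketch in Python that follows the pattern of the example URIs above; the function name is illustrative, and the percent-encoding of characters outside the unreserved set is an assumption about how such URIs would be escaped, not a documented Lexvo.org rule.

```python
from urllib.parse import quote

LEXVO_TERM_BASE = "http://lexvo.org/id/term/"

def term_uri(language_code: str, string: str) -> str:
    """Build a term URI from a three-letter language code and a string.

    Assumption: reserved and non-ASCII characters are percent-encoded as UTF-8.
    """
    return f"{LEXVO_TERM_BASE}{language_code}/{quote(string, safe='')}"

print(term_uri("fra", "Arctique"))
print(term_uri("eng", "table"))
```

Because the URI depends only on the string and the language, the same function yields a usable identifier whether the term has zero, one or many declared meanings.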
What « table@en » means
many more of the same…
ISO TC-37 terminology standards
Built on top of various other (ISO) standards
Define a lot of data models or schemas
- Either UML or XML schemas
Delve into deep, complex lexical details
- Addressing fine-grained terminology management issues
But provide no interoperability with the Semantic Web universe
- Not even as informative annexes
Example : Lexical Markup Framework
- An attempt to produce an OWL representation of the LMF model
- Neither normative nor even OWL-conformant
- Has been sitting unused on the LMF website for two years
  - Any feedback? Does anyone really care? http://www.lexicalmarkupframework.org/
Even if published in Semantic Web formats
- Chances of mainstream adoption are weak
- Due to their sheer complexity…
Adding context to the semiotic triangle
http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture
‘table’@en
« signifiant »
« signifié »
« référent »
denotation
representation
Furniture
« context »
Context of meaning in existing approaches
In SKOS and concept-centric models
- The context of the meaning is the Concept Scheme
    <http://id.loc.gov/authorities/sh85131792#concept> a skos:Concept ;
        skos:prefLabel "Table"@en ;
        skos:inScheme <http://id.loc.gov/authorities#topicalTerms> .
- Reads from the viewpoint of the term
  - ‘Table’ is the English preferred term for concept ‘#sh85131792’ in the context of LCSH topical terms
In the purely semiotic approach of Lexvo.org
- The only context is the declared language
- Ambiguity is assumed, but not resolved
- A term description is a bag of possible meanings and translations
- Useful, but not enough
In a nutshell, regarding context
- Concept-centric approach is too restrictive …
- Lexvo.org approach is too open …
Trying to capture context
Context can be more than an implicit skos:ConceptScheme
- A language
- A country, a community
- A document or corpus lexical context
- Any combination of the above …
Actually a context might be any kind of relevant resource
- Including a list of resources
Neither term nor concept should be linked directly to a context
- Need to define « reference » or « meaning » resources
- Linking one term to one concept and one context
- Allowing attachment of metadata (e.g., Dublin Core)
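A « meaning » resource of this kind can be pictured as a small record that ties one term to one concept in one context. The following is a minimal sketch in Python; the class, the field names, the resolve helper and the example.org URIs are all hypothetical, while the Lexvo and OpenCyc URIs are the ones quoted elsewhere in these slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Meaning:
    term: str             # URI of the term (signifiant)
    concept: str          # URI of the concept (signifié)
    context: str          # URI of any resource acting as context
    metadata: tuple = ()  # optional (property, value) pairs, e.g. Dublin Core

TABLE = "http://lexvo.org/id/term/eng/table"

meanings = [
    Meaning(
        term=TABLE,
        concept="http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture",
        context="http://example.org/context/furniture",   # hypothetical
        metadata=(("dcterms:creator", "example editor"),),
    ),
    Meaning(
        term=TABLE,
        concept="http://example.org/concept/table-database",  # hypothetical
        context="http://example.org/context/databases",       # hypothetical
    ),
]

def resolve(all_meanings, term, context):
    """Return the concepts a term points to in the given context."""
    return [m.concept for m in all_meanings
            if m.term == term and m.context == context]

print(resolve(meanings, TABLE, "http://example.org/context/furniture"))
```

Neither the term nor the concept carries the context: the ambiguity of « table@en » is resolved only by picking the meaning record whose context matches.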
Requirements for « STS »
STS = « Simple Terminology System »
- aka : « Simple Terminology Semiotics »
As simple as SKOS is for representation of concepts
- And as extensible
Based on core classes of LMF or any relevant ISO TC-37 model
- Simpler than LMF but extensible to capture all LMF subtleties
Interoperable with concept layer formats (SKOS and SKOS-XL)
As open and robust as the semiotic approach of Lexvo.org
Including representation of context/meanings/references
And of course recommended by a relevant standards body
- Food for another W3C recommendation track?
STS draft model (built upon lexvo ontology)
[Diagram : STS draft model]
Classes : lvont:Term, skosxl:Label, sts:Meaning, sts:Context, skos:Concept
Edge labels around sts:Meaning : sts:signifier, sts:signified, sts:inContext
Other edges, with their targets
- sts:contextProperty : anything
- sts:spatialContext : geo:SpatialThing
- sts:timeContext : time:Period
- lvont:language : lvont:Language
- skosxl:literalForm : rdf:Literal
- sts:lexicalProperty : anything
- dcterms:* : Dublin Core metadata
Extensions to fit e.g., TC-37 LMF schemas or EUROVOC management specifics …
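To give a feel for how the draft property names might be used, here is a minimal sketch in Python that emits the triples for one meaning resource. It assumes that sts:signifier, sts:signified and sts:inContext all start from the sts:Meaning node, which the diagram suggests but this transcript cannot confirm; the function name and the example.org URIs are hypothetical.

```python
def meaning_triples(meaning_id, term, concept, context, metadata=None):
    """Return (subject, predicate, object) triples for one sts:Meaning.

    Assumption: the three sts: links all have the meaning as subject.
    """
    triples = [
        (meaning_id, "rdf:type", "sts:Meaning"),
        (meaning_id, "sts:signifier", term),
        (meaning_id, "sts:signified", concept),
        (meaning_id, "sts:inContext", context),
    ]
    # Optional Dublin Core metadata attached to the meaning itself
    for prop, value in (metadata or {}).items():
        triples.append((meaning_id, prop, value))
    return triples

example = meaning_triples(
    "http://example.org/meaning/table-furniture",   # hypothetical
    "http://lexvo.org/id/term/eng/table",
    "http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture",
    "http://example.org/context/furniture",         # hypothetical
    metadata={"dcterms:source": "http://example.org/corpus/catalogue"},
)
for triple in example:
    print(triple)
```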
Ready for a standardization track ?