BT Exact Technologies - Adastral Park, Ipswich
July - October 2003

Linguistic Web Services for Semantic Web

Dr. Vassil T. Vassilev
London Metropolitan University

BT Short Term Research Fellowship



Part I

Semantic Web and Linguistic Data Processing

Contents

1 Project Background: Semantic Web and NLP
2 RDF – Lingua Franca of Semantic Web
3 The need for linguistic support of Semantic Web
4 WordNet: Universal Linguistic Resource
   WordNet as a model of word semantics
   WordNet as an online thesaurus
   WordNet as a relational database
5 Step One: Putting WordNet on the Web
6 Step Two: Extending WordNet
7 Step Three: LinguaShare
8 Problems and Directions

1 Project Background: Semantic Web and NLP

Semantic Web: Model-driven framework for semantically rich data processing over the Web
RDF – Dublin Core (1999), W3C (1999)
DAML – DARPA (2000); OIL – FP5 (2000)
http://www.w3c.org/2001/sw/
http://www.dublincore.org/documents/dces/

Semantic Thesaurus: Linguistic database containing word meanings and semantic relations
WordNet – George Miller, Princeton Univ. (1990)
EuroWordNet – FP4 (1997); BalkaNet – FP5 (2000)
http://www.cogsci.princeton.edu/~wn/
http://www.hum.uva.nl/~ewn#EuroWordnet

1.1 Semantic data processing over the Web

Syntactic markup of the data (RDF, Topic Maps)

Using a kind of meta-language (schema) to provide the intended semantics of the data represented (RDFS, DAML)

Specifying domain ontologies that represent the restrictions, dependencies, regularities and rules for inference (KIF, OIL, OWL)

Figure: Semantic Web "Layer Cake" (McGuiness, 2002)
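To make the layering concrete, here is a minimal Python sketch, assuming triples are plain (subject, predicate, object) string tuples and using a hypothetical ex: namespace. The data layer only says that a page is an ex:HomePage; the schema layer (RDFS) relates classes; two standard RDFS entailment rules for rdfs:subClassOf (transitivity and type propagation) then derive facts that were never written down. Ontology languages such as OWL add far richer constructs that this toy fixpoint loop does not cover.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_subclass_closure(triples):
    """Naive forward chaining with two RDFS rules until nothing new is derived:
    (A subClassOf B), (B subClassOf C) -> (A subClassOf C)
    (A subClassOf B), (x type A)       -> (x type B)
    """
    facts = set(triples)
    while True:
        derived = set()
        for a, p, b in facts:
            if p != SUBCLASS_OF:
                continue
            for s, q, o in facts:
                if q == SUBCLASS_OF and s == b:
                    derived.add((a, SUBCLASS_OF, o))
                if q == RDF_TYPE and o == a:
                    derived.add((s, RDF_TYPE, b))
        if derived <= facts:
            return facts
        facts |= derived


# Data layer (markup) plus schema layer (RDFS); the ex: terms are hypothetical.
data = {("ex:page1", RDF_TYPE, "ex:HomePage")}
schema = {
    ("ex:HomePage", SUBCLASS_OF, "ex:WebPage"),
    ("ex:WebPage", SUBCLASS_OF, "ex:Document"),
}
inferred = rdfs_subclass_closure(data | schema) - (data | schema)
for triple in sorted(inferred):
    print(triple)
```

The inferred set contains ex:page1 typed as ex:WebPage and ex:Document, plus ex:HomePage as a subclass of ex:Document: meaning carried by the schema, not by the markup itself.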

1.2 Computer-based semantic thesaurus

Explaining the meaning of words

Finding other words with the same meaning (synonyms)

Finding other words with a similar meaning in the same context (synonymous usage)

Finding semantically independent, related or dependent word forms (semantic referencing)

EXAMPLE: Type inference through analysis of the argument structure of verb phrases and their syntactic appearance in texts. The varieties of argument structure for EVENT-verbs suggest seven major subtypes: PHENOMENON, ASPECTUAL, STATE, ACT, PSYCHOLOGICAL_EVENT, CHANGE and CAUSE_CHANGE.

Based on them, we can differentiate COGNITIVE_EVENT (the experiencer is the syntactic subject, e.g. fear) from ACT (the experiencer is the syntactic object, e.g. frighten).

Determining ontological information using lexical information
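The subject/object distinction above can be sketched as a small rule in Python. This is a toy illustration, not the author's method: it assumes clauses have already been parsed and annotated with subject, object and experiencer (the annotation format and example clauses are hypothetical), and it takes a majority vote over where the experiencer surfaces.

```python
from collections import Counter

# Event subtypes from the example, keyed by the syntactic position of the experiencer.
TYPE_BY_POSITION = {"subject": "COGNITIVE_EVENT", "object": "ACT"}


def experiencer_position(clauses):
    """Majority vote over annotated clauses: is the experiencer the subject or the object?"""
    votes = Counter()
    for clause in clauses:
        if clause["experiencer"] == clause["subject"]:
            votes["subject"] += 1
        elif clause["experiencer"] == clause.get("object"):
            votes["object"] += 1
    return votes.most_common(1)[0][0] if votes else None


def event_type(clauses):
    """Map a verb's observed argument structure to an event subtype (None if undecided)."""
    return TYPE_BY_POSITION.get(experiencer_position(clauses))


# Hypothetical corpus annotations for the two verbs in the example.
fear = [{"subject": "John", "object": "storms", "experiencer": "John"}]
frighten = [{"subject": "storms", "object": "John", "experiencer": "John"}]
print(event_type(fear), event_type(frighten))
```

Voting over many clauses, rather than trusting a single one, is a simple way to tolerate noisy annotations.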

1.3 Project definition

Aims:

utilizing the full potential of WordNet multilingual thesauri as a universal linguistic ontology for semantic verification of specialist terminology

embedding it in applications for semantic data processing over the Web

using contemporary Semantic Web Services technologies and tools

Methodology:

Analytical research (WordNet)
Modeling (relational models, UML)
Software prototyping (Tomcat, MySQL)

2 RDF – Lingua Franca of Semantic Web

Language to describe resources, primarily on the Web (it has semantics); it can also be used outside the Web, e.g. Dublin Core for library catalogues

Uses XML as the syntactic representation of RDF statements (serialization syntax); there are alternative serializations (e.g. triples), but XML is the most popular

The language can formulate statements about the language itself (meta-description): RDF Schema, or RDFS

The statements can be stored, processed and transported over the Web (data persistence)

2.1 RDF Model

Resources – things being described by RDF expressions. Resources are named by URIs.

Examples: HTML document, XML element within the document, collection of pages, book

Properties – specific attributes or relations used to describe a resource. Attributes and relations can also be used as resources.

Examples: Creator, Title, Name

Values – literals or references to resources

Statements: each statement links a Subject (Resource) through a Predicate (Property) to an Object (Value)

Example

“Vassil Vassilev, whose e-mail is v.vassilev@londonmet.ac.uk, is the creator of the web page http://www.lgu.ac.uk/~vassil/index.html”

Subject (Resource): ‘http://www.lgu.ac.uk/~vassil/index.html’
Predicate (Property): ‘Creator’
Object (Value): ‘Vassil Vassilev’

Graphical representation: the resource http://www.lgu.ac.uk/~vassil/index.html is linked by a Creator arc to a node with the properties Name (Vassil Vassilev) and Email (v.vassilev@londonmet.ac.uk)

Serialized representation in XML

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:vcard="http://imc.org/vCard/3.0#">
  <rdf:Description rdf:about="http://www.lgu.ac.uk/~vassil/index.html">
    <dc:creator>
      <rdf:Description>
        <vcard:FN>Vassil Vassilev</vcard:FN>
        <vcard:EMAIL>v.vassilev@londonmet.ac.uk</vcard:EMAIL>
      </rdf:Description>
    </dc:creator>
  </rdf:Description>
</rdf:RDF>

2.2 Semantic Web Applications

Context-based Information Retrieval (search by semantic patterns)

Personalized Information Delivery (data presentation based on user profiles)

User tracking (dynamic construction of user profiles based on log analysis)

Document summarizing (text generation based on models of the meaning)

Automatic translation (text transformation which uses meaning models)

2.3 Semantic Web Tools

Persistent storage and query interpreters (XML databases/XQuery, RDF repositories/RQL)

Ontology visualizers and editors (OntoEdit, Protégé, etc.)

Ontology navigators and semantic search engines (AskJeeves, RDF Quiz, OntoSearch)

Ontology-based inference engines (Cyc, Kaon, OMM)

Some observations

Layer separation (data storage, data communication, information description, terminology definition, fact inference)

Layer isolation (syntactic wrapping vs. semantic mapping)

Information processing concentrated on the most abstract level (ontology)

Hierarchy of languages

SQL → XML → RDF → RDFS → OWL

3 The Need for Linguistic Support of Semantic Web

Why:

For combining multiple namespaces and reconciling syntactic names

For word disambiguation in text analysis

For semantic indexing of text corpora

For resolving semantic inaccuracies in texts (esp. similarity, alternatives, exclusion, generalization, etc.)

For representing text meaning in transformations which use an intermediate model of the meaning
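Name reconciliation across namespaces can be sketched as follows. This is a minimal Python illustration, assuming a tiny hypothetical synset table in place of WordNet and a hypothetical http://example.org/terms# vocabulary: two terms are paired when their local names (namespace stripped) share at least one synset.

```python
import re

# Toy stand-in for thesaurus synsets (hypothetical ids and members, not WordNet data).
SYNSETS = {
    "s1": {"creator", "author", "writer"},
    "s2": {"title", "heading"},
}


def local_name(term):
    """Strip the namespace: 'dc:creator' or 'http://x.org/terms#author' -> lower-case local name."""
    return re.split(r"[#/:]", term)[-1].lower()


def synsets_of(word):
    return {sid for sid, members in SYNSETS.items() if word in members}


def reconcile(vocab_a, vocab_b):
    """Pair terms from two vocabularies whose local names share at least one synset."""
    return [(a, b) for a in vocab_a for b in vocab_b
            if synsets_of(local_name(a)) & synsets_of(local_name(b))]


print(reconcile(["dc:creator", "dc:title"],
                ["http://example.org/terms#author", "http://example.org/terms#heading"]))
```

With a real thesaurus most words belong to several synsets, so this naive matching over-generates; that is exactly where the word disambiguation listed above becomes necessary.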

4 WordNet as Universal Linguistic Resource

Word forms (nouns, verbs, adjectives and adverbs) and lexical relations between them

Synsets and meaning relations (synonymy, antonymy, hyponymy, meronymy, troponymy, etc.)

Lexical database (set of indexed files or a database)

Command language interface (originally Tcl/Tk scripts for direct file manipulation, but APIs for Java and other languages are also available)

Multilingual thesauri (a network of WordNet databases for many languages)

4.1 WordNet semantics

Relational model with both standard (ATTRIBUTE, ANTONYM, ENTAILMENT, CAUSE) and transitive relations (HYPERNYM, HOLONYM, MERONYM)

Formally, it can be interpreted in first-order relational structures (Kripke structures), which requires modal logic

For an adequate representation of the relations, either an object-relational database or a relational database with additional indexing of the transitive relations (transitive closure) is necessary
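What "indexing the transitive closure" precomputes can be shown with a minimal in-memory Python sketch: starting from direct relation pairs (here a few hypothetical HYPERNYM edges, not real WordNet data), a depth-first search from every node collects all pairs connected by one or more edges. Storing these pairs lets a query such as "is laptop a kind of machine?" be answered by a single lookup instead of a recursive walk.

```python
from collections import defaultdict


def transitive_closure(pairs):
    """All (x, z) such that z is reachable from x through one or more (x, y) edges."""
    succ = defaultdict(set)
    for x, y in pairs:
        succ[x].add(y)
    closure = set()
    for start in list(succ):
        stack, seen = list(succ[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:  # guards against revisiting (and against cycles)
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(succ.get(node, ()))
    return closure


# Hypothetical HYPERNYM edges (word -> more general word), for illustration only.
hypernym = {("laptop", "PC"), ("PC", "computer"), ("computer", "machine")}
for pair in sorted(transitive_closure(hypernym) - hypernym):
    print(pair)
```

The closure grows much faster than the base relation (three edges become six pairs here), which is the storage cost paid for constant-time transitive lookups.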

Fig. 1 WordNet Relations

Diagram legend (relation paths labelled a–n):
a Sem equivalent
b Sem related
c Sem rel equivalent
d Transitively sem rel equivalent
e Lexrelated
f Lexrel sem equivalent
g Lexrel sem related
h Lexrel sem rel equivalent
i Lexrel trans sem rel equivalent
j Lexrel lexrelated
k Lexrel lexrel sem equivalent
l Lexrel lexrel sem related
m Lexrel lexrel sem equivalent
n Lexrel lexrel trans sem equivalent

Sets of related words: synset equivalent; transitively synset equivalent; lexically related; semantically related

Types of word relations, illustrated with the example terms: computer (workstation, server, cluster, etc.); PC (desktop, laptop, PDA, etc.); Mac (iMac, notebook, iPod, etc.); x86 (Intel-compatible); 80x86 (Intel); non-x86 (non-Intel)

4.2 Relational schema of the original WordNet thesaurus

word – represents the syntactic word forms, divided into four main categories: noun phrases, verb phrases, adjectives and adverbs

synset – defines the different meaning sets used for giving semantic interpretation of the word forms

sense – many-to-many relationship between word forms and synsets

lexrel – purely lexical relationships which hold between the word forms

semrel – semantic relationships between synsets (and, through them, between the word forms), which constitute the semantic thesaurus
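How the word, synset and sense tables combine can be sketched with Python's built-in sqlite3 module standing in for MySQL. The column names follow the schema described here, but the types are simplified and the rows are hypothetical toy data, not WordNet content. Synonyms of a word are the other words that share a synset with it through the sense link table.

```python
import sqlite3

# Subset of the WordNet tables; column types simplified for SQLite.
SCHEMA = """
CREATE TABLE word   (wordno INTEGER PRIMARY KEY, lemma VARCHAR(70));
CREATE TABLE synset (synsetno INTEGER PRIMARY KEY, lexno INTEGER, definition TEXT);
CREATE TABLE sense  (wordno   INTEGER REFERENCES word(wordno),
                     synsetno INTEGER REFERENCES synset(synsetno),
                     tagcnt   INTEGER,
                     PRIMARY KEY (wordno, synsetno));
"""

# word -> sense -> synset <- sense <- word: a self-join on the many-to-many link.
SYNONYMS_SQL = """
SELECT DISTINCT w2.lemma
FROM word w1
JOIN sense s1 ON s1.wordno = w1.wordno
JOIN sense s2 ON s2.synsetno = s1.synsetno
JOIN word w2 ON w2.wordno = s2.wordno
WHERE w1.lemma = ? AND w2.wordno <> w1.wordno
ORDER BY w2.lemma
"""


def demo_db():
    """In-memory database with hypothetical toy rows (not real WordNet data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO word VALUES (?, ?)",
                     [(1, "car"), (2, "automobile"), (3, "auto"), (4, "railcar")])
    conn.executemany("INSERT INTO synset VALUES (?, ?, ?)",
                     [(100, None, "motor vehicle"), (200, None, "railway car")])
    conn.executemany("INSERT INTO sense VALUES (?, ?, ?)",
                     [(1, 100, 0), (2, 100, 0), (3, 100, 0), (1, 200, 0), (4, 200, 0)])
    return conn


def synonyms(conn, lemma):
    """Words sharing at least one synset with `lemma`, via the sense link table."""
    return [row[0] for row in conn.execute(SYNONYMS_SQL, (lemma,))]


print(synonyms(demo_db(), "car"))
```

Because "car" belongs to two synsets, its synonyms mix two meanings; grouping the result by synsetno would keep the senses apart.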

Fig. 2 Relational schema of WordNet

Columns marked * take part in foreign keys.

word (wordno: INTEGER, lemma: VARCHAR(70)); indexes word_pk_idx, word_lemma_idx
synset (synsetno: INTEGER, lexno*: INTEGER, definition: TEXT = NULL); indexes synset_pk_idx, synset_lexname_fk_idx; FK synset_lexname_fk
sense (wordno*: INTEGER, synsetno*: INTEGER, tagcnt: INTEGER); indexes sense_pk_idx, sense_word_fk_idx, sense_synset_fk_idx, sense_tagcnt_idx; FKs sense_word_fk, sense_synset_fk
lexname (lexno: INTEGER, lextyp: VARCHAR(30), description: VARCHAR(80)); indexes lexname_pk_idx, lexname_lextyp_idx
sample (sampleno: INTEGER, synsetno*: INTEGER, definition: TEXT); indexes sample_pk_idx, sample_synset_fk_idx; FK sample_synset_fk
frame (frameno: INTEGER, description: VARCHAR(50)); index frame_pk_idx
verbframe (wordno*: INTEGER, synsetno*: INTEGER); indexes verbframe_pk_idx, verbframe_frame_fk_idx, verbframe_sense_fk_idx; FKs verbframe_frame_fk, verbframe_sense_fk
adjmod (wordno*: INTEGER, synsetno*: INTEGER, modifier: CHAR(2)); indexes adjmod_pk_idx, adjmod_sense_fk_idx, adjmod_modifier_idx; FK adjmod_sense_fk
reltype (reltypeno: INTEGER, reltyp: CHAR(1), description: VARCHAR(80)); indexes reltype_pk_idx, reltype_reltyp_idx
lexrel (word1*: INTEGER, word2*: INTEGER, synset1*: INTEGER, synset2*: INTEGER, reltypeno*: INTEGER); indexes lexrel_pk_idx, lexrel_sense_fk1_idx, lexrel_sense_fk2_idx, lexrel_reltype_fk_idx; FKs lexrel_sense_fk1, lexrel_sense_fk2, lexrel_reltype_fk
semrel (synset1*: INTEGER, synset2*: INTEGER, reltypeno*: INTEGER); indexes semrel_pk_idx, semrel_synset_fk1_idx, semrel_synset_fk2_idx, semrel_reltype_fk_idx; FKs semrel_synset_fk1, semrel_synset_fk2, semrel_reltype_fk
trans (synset1*, synset2*, reltypeno*, transno, synset0*); indexes trans_pk_idx, trans_reltype_fk_idx, trans_semrel_fk_idx, trans_semrel_fk0_idx, trans_transno_idx; FKs trans_semrel_fk1, trans_semrel_fk0

Used to calculate the closure of transiti...
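One way to fill a closure table from semrel inside the database itself is a recursive query. The sketch below uses SQLite's WITH RECURSIVE via Python's sqlite3 as a stand-in; it writes to a simplified, hypothetical closure table (synset1, synset2, reltypeno) rather than the trans table above, whose transno and synset0 columns are not reproduced because their exact role is not given here. The semrel rows are toy data.

```python
import sqlite3

# UNION (not UNION ALL) removes duplicate pairs, so the recursion also stops on cycles.
CLOSURE_SQL = """
WITH RECURSIVE reach(synset1, synset2) AS (
    SELECT synset1, synset2 FROM semrel WHERE reltypeno = :rel
    UNION
    SELECT r.synset1, s.synset2
    FROM reach AS r
    JOIN semrel AS s ON s.synset1 = r.synset2 AND s.reltypeno = :rel
)
INSERT INTO closure (synset1, synset2, reltypeno)
SELECT synset1, synset2, :rel FROM reach
"""


def build_closure(conn, reltypeno):
    """(Re)compute all transitive pairs for one relation type into the closure table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS closure (
                        synset1 INTEGER, synset2 INTEGER, reltypeno INTEGER,
                        PRIMARY KEY (synset1, synset2, reltypeno))""")
    conn.execute("DELETE FROM closure WHERE reltypeno = ?", (reltypeno,))
    conn.execute(CLOSURE_SQL, {"rel": reltypeno})
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE semrel (synset1 INTEGER, synset2 INTEGER, reltypeno INTEGER)")
# Toy rows: relation type 1 forms a chain 1 -> 2 -> 3 -> 4; type 2 is unrelated.
conn.executemany("INSERT INTO semrel VALUES (?, ?, ?)",
                 [(1, 2, 1), (2, 3, 1), (3, 4, 1), (1, 9, 2)])
build_closure(conn, 1)
rows = conn.execute("SELECT synset1, synset2 FROM closure ORDER BY synset1, synset2").fetchall()
print(rows)
```

After the closure is built, a transitive test becomes an indexed lookup on the primary key instead of a recursive query at request time.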

5 Putting WordNet on the Web

Synchronous query/response model (CGI calls)

Purely relational database for storing the thesaurus (MySQL)

Front-end implemented as a set of servlets which query the thesaurus on behalf of other applications

XML format for the data returned as a result of the queries

Separated from the applications and hosted on an independent server (Tomcat)
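The request cycle of one such front-end component can be sketched in Python (not the original Java servlets): the handler receives a CGI-style query string, looks the word up, and synchronously returns an XML document. The in-memory THESAURUS dictionary stands in for the MySQL query, and the XML element and attribute names are assumptions, since the actual response format is not specified here.

```python
from urllib.parse import parse_qs
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the MySQL lookup: lemma -> {synsetno: member lemmas}.
THESAURUS = {
    "car": {100: ["car", "automobile", "auto"], 200: ["car", "railcar"]},
}


def synsets_response(query_string):
    """Answer one CGI-style request (e.g. 'word=car') with an XML document."""
    params = parse_qs(query_string)
    word = params.get("word", [""])[0]
    root = ET.Element("synsets", word=word)
    for synsetno, members in sorted(THESAURUS.get(word, {}).items()):
        synset = ET.SubElement(root, "synset", synsetno=str(synsetno))
        for lemma in members:
            ET.SubElement(synset, "word").text = lemma
    return ET.tostring(root, encoding="unicode")


print(synsets_response("word=car"))
```

Keeping the handler a pure function from query string to XML makes it independent of the calling applications, in line with running the service on its own server.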

Servlet                 CGI parameters
Synsets                 wordno, word
Synonyms                wordno, word
Semrels                 relno, reltype, relname, wordno, word
Lexrels                 relno, reltype, relname, wordno, word
LexrelSynonyms          relno, reltype, relname, wordno, word
SemrelSynonyms          relno, reltype, relname, wordno, word
LexrelSemrels           lexrelation, semrelation, wordno, word
SemrelSemrels           semrelation, semrelation, wordno, word
LexrelSemrelSynonyms    lexrelation, semrelation, wordno, word
SemrelSemrelSynonyms    semrelation, semrelation, wordno, word

Table 1 Servlets to explore word relations
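A client-side view of the servlet table: the Python sketch below builds GET URLs for the listed servlets and rejects parameter names a servlet does not take. The BASE_URL deployment address and the relation values ("hypernym", "hyponym") are hypothetical. Parameters are passed as a list of pairs rather than keyword arguments because servlets such as SemrelSemrels take the name semrelation twice.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8080/wordnet/"  # hypothetical deployment address

# CGI parameter names per servlet, as listed in the servlet table.
SERVLET_PARAMS = {
    "Synsets": {"wordno", "word"},
    "Synonyms": {"wordno", "word"},
    "Semrels": {"relno", "reltype", "relname", "wordno", "word"},
    "Lexrels": {"relno", "reltype", "relname", "wordno", "word"},
    "LexrelSynonyms": {"relno", "reltype", "relname", "wordno", "word"},
    "SemrelSynonyms": {"relno", "reltype", "relname", "wordno", "word"},
    "LexrelSemrels": {"lexrelation", "semrelation", "wordno", "word"},
    "SemrelSemrels": {"semrelation", "wordno", "word"},
    "LexrelSemrelSynonyms": {"lexrelation", "semrelation", "wordno", "word"},
    "SemrelSemrelSynonyms": {"semrelation", "wordno", "word"},
}


def servlet_url(servlet, pairs):
    """Build a GET URL; `pairs` is a list so a name such as semrelation may repeat."""
    allowed = SERVLET_PARAMS[servlet]
    unknown = {name for name, _ in pairs} - allowed
    if unknown:
        raise ValueError(f"{servlet} does not take {sorted(unknown)}")
    return BASE_URL + servlet + "?" + urlencode(pairs)


print(servlet_url("SemrelSemrels",
                  [("semrelation", "hypernym"), ("semrelation", "hyponym"), ("word", "car")]))
```

Validating names on the client catches typos before a round trip to the server; urlencode keeps the pairs in order, so a chained query like SemrelSemrels preserves which relation is applied first.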

Part II

LinguaShare: Linguistic Web Service for Semantic Web