Post on 17-Jan-2017

Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an application)

Marta Villegas

Universitat Pompeu Fabra

Cercedilla, June 2015

IULA-UPF scenario

OLAC Language Resource Catalogue

OAI-PMH SERVER

Dublin Core Metashare OLAC

Metadata Formats

.....METADATA HARVESTING....

IULA-UPF moving to LOD

Objectives:
- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit

Triple store (Virtuoso): http://lodserver.iula.upf.edu
SPARQL server (Virtuoso): http://lodserver.iula.upf.edu/sparql
Web browser (RoR + SPARQL): http://lod.iula.upf.edu/

RDF

When the focus shifts from growing the cloud to deploying applications

• Complex types (identity resolution)
• Simple types (as instances)
• Linking data (linking vs. reusing)
• Data enrichment

• Approach: an incremental process, with a first batch followed by a curation process

RDFying – index

RDFying – complex instances

<Document>

<Person>

<Organisation>

<Project>

<LangResourceInfo> <identificationInfo>

<distributionInfo>

<contactPerson>

<metadataInfo>

<validationInfo>

<resourceDocumentationInfo>

<resourceCreationInfo>

<resourceComponentType>

</LangResourceInfo>

RDFying – complex instances

<langResource-URI-1>

<langResource-URI-2>

<langResource-URI-3>

<langResource-URI-n>

<person-URI-1>

<person-URI-2>

<person-URI-3>

=?

Identity resolution

<contactPerson>
  <surname>Monachini</surname>
  <givenName>Monica</givenName>
  <communicationInfo>
    <email>monica.monachini@ilc.cnr.it</email>
    <email>risorse@ilc.cnr.it</email>
    <url>http://www.ilc.cnr.it/</url>
    <address>Via Giuseppe Moruzzi</address>
    <zipCode>56124</zipCode>
    <city>Pisa</city>
    <country>Italy</country>
  </communicationInfo>
  <affiliation>
    <organizationName>………</organizationName>
    <departmentName>Istituto …</departmentName>
    <communicationInfo>…</communicationInfo>
  </affiliation>
</contactPerson>

http://…/Monica_Monachini

<fundingProject>
  <projectName>Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Languages Technologies</projectName>
  <projectShortName>PANACEA</projectShortName>
  <url>http://panacea-lr.eu/</url>
  <fundingType>euFunds</fundingType>
  <funder>European Union</funder>
</fundingProject>
<organizationInfo>
  <organizationName>Consiglio Nazionale delle Ricerche. Istituto di Linguistica Computazionale “Antonio Zampolli”</organizationName>
  <organizationShortName>CNR</organizationShortName>
  …

For each embedded Project/Person/Organization:

1. Generate a "subject property URI" triple for the backwards relation. The URI is built as follows:
– If Person, use “name_givenName”
– If a “short name” exists, use “shortname”
– Else use the first 20 characters of the “long name”

2. Generate "URI property object" triples as the union of all local declarations (where union removes duplicate triples).
– This requires a final curation task that agrees on node values in case they differ.
– The preliminary version needs further curation (we used SPARQL SELECT DISTINCT to identify oddities)
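The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the base URI, the dictionary keys (`shortName`, `longName`), the reading of “name_givenName” as surname plus given name, and the conflicting `country` value are all hypothetical and not taken from the actual IULA-UPF code.

```python
import re
from collections import defaultdict

BASE = "http://example.org/resources/"  # hypothetical base URI (assumption)


def slug(text):
    """Collapse every run of non-alphanumeric characters into one underscore."""
    return re.sub(r"[^0-9A-Za-z]+", "_", text).strip("_")


def mint_uri(kind, record):
    """Build the entity URI following the three naming rules of step 1."""
    if kind == "Person":
        # "name_givenName" is read here as surname + "_" + givenName (assumption)
        local = record["surname"] + "_" + record["givenName"]
    elif record.get("shortName"):
        local = record["shortName"]
    else:
        local = record["longName"][:20]  # first 20 characters of the long name
    return BASE + slug(local)


def union_of_declarations(uri, declarations):
    """Step 2: merge all local declarations; a set drops duplicate triples."""
    return {(uri, prop, value)
            for decl in declarations
            for prop, value in decl.items()}


def needs_curation(triples):
    """Return the properties that ended up with more than one distinct value."""
    values = defaultdict(set)
    for _, prop, value in triples:
        values[prop].add(value)
    return {prop: sorted(vals) for prop, vals in values.items() if len(vals) > 1}


# Two embedded declarations of the same contact person.
# The "IT" variant is a made-up disagreement, for illustration only.
decl_a = {"surname": "Monachini", "givenName": "Monica", "city": "Pisa", "country": "Italy"}
decl_b = {"surname": "Monachini", "givenName": "Monica", "city": "Pisa", "country": "IT"}

person_uri = mint_uri("Person", decl_a)
merged = union_of_declarations(person_uri, [decl_a, decl_b])
to_review = needs_curation(merged)
```

Here `to_review` lists the properties on which the local declarations disagree, which is the input for the final curation task.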

RDFying Documents:

- DBLP to get full RDF descriptions
- Google Scholar to get BibTeX descriptions
- For a small dataset this effort is affordable. For big datasets this needs a lot of work (some automatic tasks may be defined)

<document>Quochi V, Frontini F, Rubino F. A MWE Acquisition and Lexicon Builder Web Service. COLING 2012, 24th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 8-15 December 2012, Mumbai, India</document>

RDFying - Where to stop?

BIBTEX:
@inproceedings{quochi2012mwe,
  title={A MWE Acquisition and Lexicon Builder Web Service.},
  author={Quochi, Valeria and Frontini, Francesca and Rubino, Francesco},
  booktitle={COLING},
  year={2012}
}

DBLP:
<http://dblp.uni-trier.de/rec/conf/coling/QuochiFR12>
    owl:sameAs <http://dblp.org/rec/conf/coling/QuochiFR12> ;
    dblp:title "A MWE Acquisition and Lexicon Builder Web Service" ;
    dblp:authoredBy <http://dblp.uni-trier.de/pers/q/Quochi:Valeria> ;
    dblp:authoredBy <http://dblp.uni-trier.de/pers/f/Frontini:Francesca> ;
    dblp:authoredBy <http://dblp.uni-trier.de/pers/r/Rubino:Francesco> ;
    dblp:publishedAsPartOf <http://dblp.uni-trier.de/rec/conf/coling/2012> ;
    dblp:yearOfPublication "2012" .

Article
  title
  creator: Mikel Forcada
  subject: discourse analysis, question answering
  keywords: NER, LMF, ...
  references: FreeLing, TreeBank, PANACEA ...
  language: English

RDFying - simple types

<subject>Gender Studies</subject>
<usage>NER</usage>
<format>XCES</format>
<standard>LMF</standard>

Not only enumerations but also string elements!

RDFying - simple types as instances

Language codes in MS central node

Value   Value counter   Resource counter
eng     518             476
en      215             174
EN      120             120
Spa     390             376
es      77              71
ES      10              10

Enumerations: object property + Class + instances + checking existing vocabularies

‘Free strings’:
1) Generate data type property + string value.
2) Curation process that:
   a) identifies ‘enumeration-like’ candidates (e.g. language) and chooses an appropriate vocabulary
   b) matches value strings to relevant URIs (DBpedia)
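The curation of free strings can be sketched as below, using the value counters reported for the MS central node. The lookup table from string variants to DBpedia URIs and the 5% threshold for spotting enumeration-like properties are assumptions made for this example, not part of the original workflow.

```python
from collections import Counter

# Value counters from the slide "Language codes in MS central node".
value_counts = {"eng": 518, "en": 215, "EN": 120, "Spa": 390, "es": 77, "ES": 10}

ENGLISH = "http://dbpedia.org/resource/English_language"
SPANISH = "http://dbpedia.org/resource/Spanish_language"

# Hand-written lookup (assumption): lower-cased variant -> DBpedia URI.
VARIANT_TO_URI = {"eng": ENGLISH, "en": ENGLISH, "spa": SPANISH, "es": SPANISH}


def looks_like_enumeration(counts, max_ratio=0.05):
    """Few distinct values over many occurrences suggests a hidden enumeration.

    The 5% threshold is an arbitrary choice for this sketch.
    """
    total = sum(counts.values())
    return total > 0 and len(counts) / total <= max_ratio


def match_values(counts):
    """Map each string variant to a URI; collect the ones left for manual review."""
    matched = Counter()
    unmatched = []
    for value, n in counts.items():
        uri = VARIANT_TO_URI.get(value.strip().lower())
        if uri is None:
            unmatched.append(value)
        else:
            matched[uri] += n
    return matched, unmatched


matched, unmatched = match_values(value_counts)
```

With this lookup, the six string variants collapse into two URIs, and anything not covered by the table ends up in `unmatched` for a human to decide.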

RDFying - simple types as instances

SELECT DISTINCT ?language WHERE { ?s ms:languageId ?language }

(eng, en, EN …)
DELETE { ?s ms:language "EN" . }
INSERT { ?s ms:language <http://.../English> . }
WHERE { ?s ms:language "EN" . }
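Since one update is needed per string variant, the statements can be generated rather than typed by hand. The sketch below only builds the SPARQL 1.1 Update text (DELETE, then INSERT, then WHERE); it does not contact any server. The `ms:` namespace URI is a placeholder and the DBpedia target URI is an assumption, since the slides elide both.

```python
MS_PREFIX = "PREFIX ms: <http://example.org/ms#>"  # placeholder namespace (assumption)
ENGLISH = "http://dbpedia.org/resource/English_language"  # assumed target URI


def escape_literal(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def curation_update(prop, literal, uri):
    """Build one update that swaps a string value for a URI on property `prop`."""
    lit = escape_literal(literal)
    return "\n".join([
        MS_PREFIX,
        f'DELETE {{ ?s {prop} "{lit}" . }}',
        f'INSERT {{ ?s {prop} <{uri}> . }}',
        f'WHERE  {{ ?s {prop} "{lit}" . }}',
    ])


# One update per variant found with SELECT DISTINCT.
VARIANTS = ["eng", "en", "EN"]
updates = [curation_update("ms:language", v, ENGLISH) for v in VARIANTS]
```

Each string in `updates` can then be reviewed and submitted to the SPARQL endpoint with whatever client is in use.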

Curation using SPARQL

RDFying - simple types as instances

Linking data !!

Local data: Person, Organization, Document, Project, Enumerations, String valued

External datasets: VIAF, ORCID, DBLP, Vocabularies, DBpedia

Linking data !! – linking vs reusing

[Diagram: linking (documentation, then sameAs) vs. reusing (documentation only)]

http://lod.iula.upf.edu/resources/PAN_metadata_MW_ENV_IT
http://lod.iula.upf.edu/resources/doc_37

local URIs

external URIs
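The difference between linking and reusing can be made concrete with a small sketch. The two local URIs come from the slide; the `documentation` property URI is a placeholder, and pairing `doc_37` with the DBLP record shown earlier is an assumption made only to have an external URI to point at.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DOCUMENTATION = "http://example.org/ms#documentation"  # placeholder property URI (assumption)

RESOURCE = "http://lod.iula.upf.edu/resources/PAN_metadata_MW_ENV_IT"
LOCAL_DOC = "http://lod.iula.upf.edu/resources/doc_37"
# Assumed pairing: the slides do not say which document doc_37 is.
EXTERNAL_DOC = "http://dblp.uni-trier.de/rec/conf/coling/QuochiFR12"


def linking(resource, local_doc, external_doc):
    """Mint a local URI for the document and link it to the external one."""
    return [
        (resource, DOCUMENTATION, local_doc),
        (local_doc, OWL_SAME_AS, external_doc),
    ]


def reusing(resource, external_doc):
    """Point straight at the external URI; no local node is created."""
    return [(resource, DOCUMENTATION, external_doc)]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


linked = linking(RESOURCE, LOCAL_DOC, EXTERNAL_DOC)
reused = reusing(RESOURCE, EXTERNAL_DOC)
```

Linking costs one extra URI and one extra triple per entity but keeps a local node to attach data to; reusing is leaner but depends entirely on the third-party URI.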

Core concepts which belong to some ‘local’ Class.

Instances which belong to some ‘external’ Class:

• Person (FOAF)
• Document (BIBO)
• Organisation (FOAF)
• …

But, some functional reasons:

Why all this? Is it worth it?

- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit

<usage>NER</usage>
<format>XCES</format>
<standard>LMF</standard>

Any good article or tool?

NER

Projects

Services Articles

Reports

Named Entity

SELECT * WHERE { ?s ?p ms:NER }

IULA?

10!

Why all this ? – IULA at MS central node

IULA?

104

[Diagram: relational schema with entity tables PERSON, ORGANISATION, RESOURCE, LICENSE, DOCUMENT and PROJECT (each with ID, name, description, ...), connected through Has_ join tables holding pairs of IDs]

SELECT * FROM WHERE { … ...} HELP!!

Everything about IULA?

HELP!!

SELECT * WHERE { ?s ?p "IULA" }
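Why a single triple pattern answers "everything about IULA" can be shown with a tiny in-memory matcher. The triples below are invented sample data with made-up identifiers; only the idea of leaving subject and predicate unbound comes from the query above.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every bound position (None = variable)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Invented sample data (assumption): identifiers and properties are hypothetical.
TRIPLES = [
    ("ex:resource_1", "ex:creator", "IULA"),
    ("ex:project_1", "ex:partner", "IULA"),
    ("ex:person_1", "ex:affiliation", "IULA"),
    ("ex:resource_2", "ex:creator", "CNR"),
]

# Equivalent of: SELECT * WHERE { ?s ?p "IULA" }
about_iula = match(TRIPLES, o="IULA")
```

In the relational schema every one of these relations would live in its own join table and need its own query; with triples, the relation name is just another column that can be left unbound.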

sample data (855 records)

Why all this ? – data Mashups

Backwards relations

• LOD opens new possibilities and SPARQL is a powerful tool

BUT

• The curation task is crucial and costly in effort and time. You can address it as an incremental process.

Publishing LOD vs. deploying LOD applications

• Until now, the LOD community seems to focus on “growing the cloud”

• In this scenario, creating new URIs and mapping to existing URIs is OK, but

• when the focus shifts from growing the cloud to developing applications, new problems arise: massive redundancy of URIs, trust in third-party servers/data, …

Conclusions & reflections
