
Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an application)

Marta Villegas

Universitat Pompeu Fabra
Cercedillas, June 2015

IULA-UPF scenario

OLAC Language Resource Catalogue, exposed by an OAI-PMH server in three metadata formats (Dublin Core, Metashare, OLAC) for metadata harvesting.

IULA-UPF moving to LOD

Objectives:
- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit

Triple store (Virtuoso): http://lodserver.iula.upf.edu
SPARQL server (Virtuoso): http://lodserver.iula.upf.edu/sparql
Web browser (RoR, i.e. Ruby on Rails, + SPARQL): http://lod.iula.upf.edu/

When the focus shifts from growing the cloud to deploying applications

• Complex types (identity resolution)
• Simple types (as instances)
• Linking data (linking vs. reusing)
• Data enrichment

• Approach: an incremental process, with a first batch conversion followed by a curation process

RDFying – index

RDFying – complex instances

[Figure: a <LangResourceInfo> record converted into resource instances (<langResource-URI-1> … <langResource-URI-n>) and person instances (<person-URI-1>, <person-URI-2>, <person-URI-3>)]

Identity resolution

<contactPerson>
  <surname>Monachini</surname>
  <givenName>Monica</givenName>
  <communicationInfo>
    <email>monica.monachini@ilc.cnr.it</email>
    <email>risorse@ilc.cnr.it</email>
    <url>http://www.ilc.cnr.it/</url>
    <address>Via Giuseppe Moruzzi</address>
    <zipCode>56124</zipCode>
    <city>Pisa</city>
    <country>Italy</country>
  </communicationInfo>
  <affiliation>
    <organizationName>………</organizationName>
    <departmentName>Istituto …</departmentName>
    <communicationInfo>…</communicationInfo>
  </affiliation>
</contactPerson>

http://…/Monica_Monachini


<fundingProject>
  <projectName>Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Languages Technologies</projectName>
  <projectShortName>PANACEA</projectShortName>
  <url>http://panacea-lr.eu/</url>
  <fundingType>euFunds</fundingType>
  <funder>European Union</funder>
</fundingProject>

<organizationInfo>
  <organizationName>Consiglio Nazionale delle Ricerche. Istituto di Linguistica Computazionale “Antonio Zampolli”</organizationName>
  <organizationShortName>CNR</organizationShortName>
  …
</organizationInfo>

For each embedded Project/Person/Organization:

1. Generate a <subject> <property> <URI> triple for the backwards relation, minting the URI as follows:
– If Person, use “name_givenName”
– If a “short name” exists, use the short name
– Else use the first 20 characters of the “long name”

2. Generate <URI> <property> <object> triples as the union of all local declarations (where union removes duplicate triples).
– This requires a final curation task that agrees on node values when they differ.
– The preliminary version needs further curation (we used SPARQL SELECT DISTINCT to identify oddities).
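The two steps above can be sketched in plain Python, assuming triples are simple (subject, property, object) string tuples. The helper names and the `ex:` prefix are hypothetical. The slide states the person rule as “name_givenName”, but its example URI reads Monica_Monachini, so the sketch follows the example (given name, then surname); swap the order if the rule is meant literally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]


def mint_local_name(kind: str, given_name: str = "", surname: str = "",
                    short_name: Optional[str] = None, long_name: str = "") -> str:
    """Hypothetical helper: local part of the URI for an embedded entity (step 1)."""
    if kind == "Person":
        # Follows the example URI .../Monica_Monachini (given name, then surname).
        raw = f"{given_name} {surname}"
    elif short_name:
        raw = short_name
    else:
        # No short name: fall back to the first 20 characters of the long name.
        raw = long_name[:20]
    # Collapse whitespace into underscores so the name can sit in a URI path.
    return "_".join(raw.split())


def union_declarations(declarations: Iterable[Iterable[Triple]]) -> Set[Triple]:
    """Step 2: union of the triples each record declares; identical triples collapse."""
    merged: Set[Triple] = set()
    for triples in declarations:
        merged.update(triples)
    return merged


def conflicting_values(triples: Iterable[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """(subject, property) pairs with several distinct values, to review during curation."""
    values: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for s, p, o in triples:
        values[(s, p)].add(o)
    return {key: objs for key, objs in values.items() if len(objs) > 1}
```

If two records both declare the city "Pisa" for the same person, the union keeps one triple; if one says "Pisa " with a trailing space, `conflicting_values` reports the pair, which is the kind of oddity the curation step has to settle. Legitimately multi-valued properties (such as the two e-mail addresses) are reported too, so the output is a review list, not an error list.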

RDFying Documents:

- DBLP to get full RDF descriptions
- Google Scholar to get BibTeX descriptions
- For a small dataset this effort is affordable; for big datasets it takes a lot of work (some tasks could be automated)

<document>Quochi V, Frontini F, Rubino F. A MWE Acquisition and Lexicon Builder Web Service. COLING 2012, 24th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 8-15 December 2012, Mumbai, India</document>

RDFying - Where to stop?

BibTeX:
@inproceedings{quochi2012mwe,
  title={A MWE Acquisition and Lexicon Builder Web Service.},
  author={Quochi, Valeria and Frontini, Francesca and Rubino, Francesco},
  booktitle={COLING},
  year={2012}
}

DBLP:
<http://dblp.uni-trier.de/rec/conf/coling/QuochiFR12>
    owl:sameAs <http://dblp.org/rec/conf/coling/QuochiFR12> ;
    dblp:title "A MWE Acquisition and Lexicon Builder Web Service" ;
    dblp:authoredBy <http://dblp.uni-trier.de/pers/q/Quochi:Valeria> ;
    dblp:authoredBy <http://dblp.uni-trier.de/pers/f/Frontini:Francesca> ;
    dblp:authoredBy <http://dblp.uni-trier.de/pers/r/Rubino:Francesco> ;
    dblp:publishedAsPartOf <http://dblp.uni-trier.de/rec/conf/coling/2012> ;
    dblp:yearOfPublication "2012" .

Article
  title
  creator: Mikel Forcada
  subject: discourse analysis, question answering
  keywords: NER, LMF, ...
  references: FreeLing, TreeBank, PANACEA ...
  language: English
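Between a bare citation string and a full DBLP description, one middle ground is to turn a few BibTeX fields into triples. The sketch below is an illustration under assumptions: the Dublin Core property names and the `ex:` namespace are choices made here, not the project's schema, and splitting on " and " is a simplification of BibTeX name parsing (it ignores braces and case variants).

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def split_bibtex_authors(field: str) -> List[str]:
    """Split a BibTeX author field on the 'and' keyword; 'Last, First' becomes 'First Last'."""
    names = []
    for name in field.split(" and "):
        name = name.strip()
        if "," in name:
            last, first = name.split(",", 1)
            name = f"{first.strip()} {last.strip()}"
        names.append(name)
    return names


def bibtex_to_triples(key: str, title: str, author_field: str, year: str) -> List[Triple]:
    """Hypothetical mapping of a few BibTeX fields onto Dublin Core terms."""
    doc = f"ex:{key}"  # hypothetical local namespace for documents
    triples = [(doc, "dcterms:title", title), (doc, "dcterms:issued", year)]
    for person in split_bibtex_authors(author_field):
        # Person URIs reuse the given-name_surname pattern of the identity-resolution step.
        triples.append((doc, "dcterms:creator", "ex:" + person.replace(" ", "_")))
    return triples
```

Each author becomes a person URI, which is exactly where the identity-resolution work starts again: the same person may already exist locally, in DBLP, or in ORCID.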

RDFying - simple types

<subject>Gender Studies</subject> <usage>NER</usage> <format>XCES</format> <standard>LMF</standard>

Not only enumerations but also free-string elements!!!

RDFying - simple types as instances

Language codes in MS central node:

Value   Value counter   Resource counter
eng     518             476
en      215             174
EN      120             120
Spa     390             376
es       77              71
ES       10              10
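The two counters in the language-code table read as "how many times the string occurs" and "in how many distinct resources it occurs"; that reading is an assumption here. Under it, both figures can be computed from (resource, value) pairs, one pair per occurrence of the element:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def value_and_resource_counts(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """For each value: (occurrences, number of distinct resources using it)."""
    occurrences: Counter = Counter()
    resources: Dict[str, Set[str]] = defaultdict(set)
    for resource, value in pairs:
        occurrences[value] += 1
        resources[value].add(resource)
    return {value: (occurrences[value], len(resources[value])) for value in occurrences}
```

A value counter larger than the resource counter (eng: 518 vs. 476) then means some records repeat the same code, which is itself worth a look during curation.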

Enumerations: object property + class + instances + a check against existing vocabularies.
‘Free strings’:
1) generate a datatype property with a string value;
2) run a curation process that:
a) identifies ‘enumeration-like’ candidates (e.g. language) and chooses an appropriate vocabulary;
b) matches value strings to relevant URIs (DBpedia).
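Step 2b can be sketched for the language example (eng, en, EN, Spa, es, ES): normalise case, map known variants to one code, then to a vocabulary URI, and leave the rest for manual review. The alias table is illustrative, and the DBpedia URIs are picked here for the example; they are not necessarily the targets used in the project, whose URI is elided on the slides.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Illustrative alias table: lower-cased variants found in the data -> one ISO 639-1 code.
LANGUAGE_ALIASES: Dict[str, str] = {"eng": "en", "en": "en", "spa": "es", "es": "es"}

# Illustrative targets: DBpedia resources for each language (an assumption).
LANGUAGE_URIS: Dict[str, str] = {
    "en": "http://dbpedia.org/resource/English_language",
    "es": "http://dbpedia.org/resource/Spanish_language",
}


def match_language(value: str) -> Optional[str]:
    """Map a free-string language value to a vocabulary URI, or None if unknown."""
    code = LANGUAGE_ALIASES.get(value.strip().lower())
    return LANGUAGE_URIS.get(code) if code else None


def curate_languages(values: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split distinct values into matched (value -> URI) and unmatched ones for manual review."""
    matched: Dict[str, str] = {}
    unmatched: List[str] = []
    for value in sorted(set(values)):
        uri = match_language(value)
        if uri is None:
            unmatched.append(value)
        else:
            matched[value] = uri
    return matched, unmatched
```

The unmatched list is the incremental part: each review round extends the alias table, and the next run matches more values.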

SELECT DISTINCT ?language WHERE { ?s ms:languageId ?language }

(eng, en, EN …)
DELETE { ?s ms:language "EN" . }
INSERT { ?s ms:language <http://.../English> . }
WHERE { ?s ms:language "EN" . }

Curation using SPARQL
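When many string variants map to the same URI, the clean-up that the slide shows for "EN" can be generated rather than written by hand. This sketch builds SPARQL 1.1 Update text with a DELETE/INSERT ... WHERE operation per variant, joined with ";" so they can be sent as one request. As on the slides, the `ms:` prefix declaration is omitted, and the target URI in the usage is a placeholder, not the project's.

```python
from typing import Dict


def sparql_literal(text: str) -> str:
    """Quote a plain string as a SPARQL literal, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_update(prop: str, old_value: str, new_uri: str) -> str:
    """One SPARQL 1.1 DELETE/INSERT ... WHERE operation replacing a string with a URI."""
    lit = sparql_literal(old_value)
    return (f"DELETE {{ ?s {prop} {lit} . }}\n"
            f"INSERT {{ ?s {prop} <{new_uri}> . }}\n"
            f"WHERE {{ ?s {prop} {lit} . }}")


def build_updates(prop: str, mapping: Dict[str, str]) -> str:
    """Several operations in one request; SPARQL Update separates them with ';'."""
    return " ;\n".join(build_update(prop, old, new) for old, new in mapping.items())
```

For example, `build_updates("ms:language", {"eng": uri, "en": uri, "EN": uri})` yields three operations, each deleting only the triples it matched in its own WHERE clause.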

Linking data !!

[Diagram: Person, Organization, Document and Project instances, enumerations and string-valued properties linked to VIAF, ORCID, DBLP, DBpedia and existing vocabularies]

Linking data !! – linking vs reusing

[Diagram: a local documentation resource linked via sameAs to an external documentation resource]

http://lod.iula.upf.edu/resources/PAN_metadata_MW_ENV_IT
http://lod.iula.upf.edu/resources/doc_37

local URIs

external URIs

Core concepts which belong to some ‘local’ Class.

Instances which belong to some ‘external’ Class:

• Person (FOAF)
• Document (BIBO)
• Organisation (FOAF)
• …
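The two options in "linking vs reusing" can be contrasted in a few lines, using the common reading of the terms (an interpretation, not spelled out on the slides): linking keeps a local URI typed with an external class and adds owl:sameAs to the external URI; reusing describes the external URI directly. Both example URIs in the tests are hypothetical placeholders.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"


def linking(local_uri: str, external_uri: str, name: str) -> List[Triple]:
    """Keep a local URI, type it with an external class (FOAF) and point to the external URI."""
    return [(local_uri, RDF_TYPE, FOAF_PERSON),
            (local_uri, FOAF_NAME, name),
            (local_uri, OWL_SAME_AS, external_uri)]


def reusing(external_uri: str, name: str) -> List[Triple]:
    """Describe the external URI directly; no local identifier is minted."""
    return [(external_uri, RDF_TYPE, FOAF_PERSON),
            (external_uri, FOAF_NAME, name)]
```

Linking keeps the application in control of its own identifiers at the price of one more URI per entity (the redundancy raised in the conclusions); reusing avoids that redundancy but makes the application depend on third-party servers and data.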

But there are some functional reasons:

Why all this? Is it worth it?

- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit

Any good article or tool?

[Diagram: Projects, Services, Articles and Reports connected to the Named Entity (NER) concept]

SELECT * WHERE { ?s ?p ms:NER }
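The query above follows relations backwards: it asks for everything that points at ms:NER, whatever the property. A triple store answers this from its own indexes; the in-memory sketch below only shows what the pattern `?s ?p <obj>` computes. The sample property names in the tests are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def index_by_object(triples: Iterable[Triple]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (subject, predicate) pairs under their object to follow relations backwards."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        index[o].append((s, p))
    return index


def backwards(index: Dict[str, List[Tuple[str, str]]], obj: str) -> List[Tuple[str, str]]:
    """In-memory analogue of SELECT * WHERE { ?s ?p <obj> }."""
    return list(index.get(obj, []))
```

This only works because NER is an instance with its own URI; while it was a free string, the same question required matching every spelling of the literal.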

Why all this? – IULA at MS central node

[Diagram: Person, Organisation, Resource, License, Document, Person and Project descriptions]

SELECT * FROM WHERE { … ...} HELP!!

Everything about IULA?

HELP!!

SELECT * WHERE { ?s ?p "IULA" }

sample data (855 records)

Why all this? – data mashups

Backwards relations

• LOD opens new possibilities and SPARQL is a powerful tool.

BUT

• The curation task is crucial and effort/time consuming. You can address it as an incremental process.

Publishing LOD vs. deploying LOD applications

• Until now, the LOD community seems to focus on “growing the cloud”

• In this scenario, creating new URIs and mapping them to existing URIs is OK, but

• when the focus shifts from growing the cloud to developing applications, new problems will arise: massive redundancy of URIs, trust in third-party servers/data, …

Conclusions & reflections
