Post on 17-Jan-2017
Publishing and Consuming Linked Data. (Lessons learnt when using LOD in an application)
Marta Villegas
Universitat Pompeu Fabra
Cercedilla, June 2015
IULA-UPF scenario
OLAC Language Resource Catalogue
OAI-PMH SERVER
Dublin Core Metashare OLAC
Metadata Formats
.....METADATA HARVESTING....
IULA-UPF moving to LOD
Objectives:
- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit
Triple store (Virtuoso): http://lodserver.iula.upf.edu
SPARQL server (Virtuoso): http://lodserver.iula.upf.edu/sparql
Web browser (RoR + SPARQL): http://lod.iula.upf.edu/
RDF
When the focus shifts from growing the cloud to deploying applications
• Complex types (identity resolution)
• Simple types (as instances)
• Linking data (linking vs. reusing)
• Data enrichment
• Approach: an incremental process, with a first batch conversion followed by a curation process
RDFying – index
RDFying – complex instances
<Document>
<Person>
<Organisation>
<Project>
<LangResourceInfo> <identificationInfo>
<distributionInfo>
<contactPerson>
<metadataInfo>
<validationInfo>
<resourceDocumentationInfo>
<resourceCreationInfo>
<resourceComponentType>
</LangResourceInfo>
RDFying – complex instances
<langResource-URI-1>
<langResource-URI-2>
<langResource-URI-3>
<langResource-URI-n>
<person-URI-1>
<person-URI-2>
<person-URI-3>
=?
Identity resolution
<contactPerson>
  <surname>Monachini</surname>
  <givenName>Monica</givenName>
  <communicationInfo>
    <email>monica.monachini@ilc.cnr.it</email>
    <email>risorse@ilc.cnr.it</email>
    <url>http://www.ilc.cnr.it/</url>
    <address>Via Giuseppe Moruzzi</address>
    <zipCode>56124</zipCode>
    <city>Pisa</city>
    <country>Italy</country>
  </communicationInfo>
  <affiliation>
    <organizationName>………</organizationName>
    <departmentName>Istituto …</departmentName>
    <communicationInfo>…</communicationInfo>
  </affiliation>
</contactPerson>
http://…/Monica_Monachini
<fundingProject>
  <projectName>Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Languages Technologies</projectName>
  <projectShortName>PANACEA</projectShortName>
  <url>http://panacea-lr.eu/</url>
  <fundingType>euFunds</fundingType>
  <funder>European Union</funder>
</fundingProject>
<organizationInfo>
  <organizationName>Consiglio Nazionale delle Ricerche. Istituto di Linguistica Computazionale “Antonio Zampolli”</organizationName>
  <organizationShortName>CNR</organizationShortName>
  …
For each embedded Project/Person/Organization:
1. Generate a "subject property URI" triple for the backwards relation, building the URI as follows:
– If Person, use “name_givenName”
– If a “short name” exists, use “shortname”
– Else use the first 20 characters of the “long name”
2. Generate "URI property object" triples as the union of all local declarations (where union removes duplicate triples).
– This requires a final curation task that agrees on node values in case they differ.
– The preliminary version needs further curation (we used SPARQL SELECT DISTINCT to identify oddities)
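The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the conversion code used at IULA: the base URI, the dictionary keys (`surname`, `givenName`, `shortName`, `name`) and the helper names are all hypothetical, and “name_givenName” is read here as surname plus given name. A plain `set` of tuples stands in for the triple store, so the union in step 2 drops duplicate triples automatically.

```python
import re
from collections import defaultdict

BASE = "http://example.org/resources/"  # hypothetical base URI


def mint_uri(kind, record):
    """Build a local URI from the naming rules listed above."""
    if kind == "Person":
        # Assumption: "name_givenName" means surname + "_" + given name.
        label = f"{record['surname']}_{record['givenName']}"
    elif record.get("shortName"):
        label = record["shortName"]
    else:
        label = record["name"][:20]  # first 20 characters of the long name
    # Collapse anything outside a conservative ASCII set into "_".
    return BASE + re.sub(r"[^A-Za-z0-9_-]+", "_", label.strip())


def rdfy_embedded(graph, resource_uri, prop, kind, record):
    """Add one embedded record to graph, a set of (s, p, o) tuples."""
    uri = mint_uri(kind, record)
    graph.add((resource_uri, prop, uri))   # step 1: backwards relation
    for key, value in record.items():      # step 2: local declarations
        graph.add((uri, key, value))       # the set drops duplicate triples
    return uri


def conflicting_values(graph):
    """Return {(subject, property): values} where several values coexist."""
    seen = defaultdict(set)
    for s, p, o in graph:
        seen[(s, p)].add(o)
    return {key: vals for key, vals in seen.items() if len(vals) > 1}


graph = set()
monica = {"surname": "Monachini", "givenName": "Monica", "city": "Pisa"}
uri_a = rdfy_embedded(graph, BASE + "res_1", "contactPerson", "Person", monica)
uri_b = rdfy_embedded(graph, BASE + "res_2", "contactPerson", "Person", monica)
# A third record that disagrees on the city becomes a curation candidate.
variant = dict(monica, city="Pisa (PI)")
rdfy_embedded(graph, BASE + "res_3", "contactPerson", "Person", variant)
to_curate = conflicting_values(graph)
```

Both identical records resolve to the same URI, while the third one leaves two competing `city` values on that node. `conflicting_values` only flags such nodes; deciding which value wins remains a manual curation decision.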
RDFying Documents:
- DBLP to get full RDF descriptions
- Google Scholar to get BibTeX descriptions
- For a small dataset this is manageable. For big datasets it needs a lot of work (some automatic tasks may be defined)
<document>Quochi V, Frontini F, Rubino F. A MWE Acquisition and Lexicon Builder Web Service. COLING 2012, 24th International Conference on Computational Linguistics, Proceedings of the Conference: Technical Papers, 8-15 December 2012, Mumbai, India</document>
RDFying - Where to stop?
BIBTEX:
@inproceedings{quochi2012mwe,
  title={A MWE Acquisition and Lexicon Builder Web Service.},
  author={Quochi, Valeria and Frontini, Francesca and Rubino, Francesco},
  booktitle={COLING},
  year={2012}
}
DBLP:
<http://dblp.uni-trier.de/rec/conf/coling/QuochiFR12>
  owl:sameAs <http://dblp.org/rec/conf/coling/QuochiFR12> ;
  dblp:title "A MWE Acquisition and Lexicon Builder Web Service" ;
  dblp:authoredBy <http://dblp.uni-trier.de/pers/q/Quochi:Valeria> ;
  dblp:authoredBy <http://dblp.uni-trier.de/pers/f/Frontini:Francesca> ;
  dblp:authoredBy <http://dblp.uni-trier.de/pers/r/Rubino:Francesco> ;
  dblp:publishedAsPartOf <http://dblp.uni-trier.de/rec/conf/coling/2012> ;
  dblp:yearOfPublication "2012" .
Article
title
creator: Mikel Forcada
subject: discourse analysis, question answering
keywords: NER, LMF, ...
references: FreeLing, TreeBank, PANACEA ...
language: English
RDFying - simple types
<subject>Gender Studies</subject>
<usage>NER</usage>
<format>XCES</format>
<standard>LMF</standard>
Not only Enumerations but also string elements !!!
RDFying - simple types as instances
Value   Value counter   Resource counter
eng     518             476
en      215             174
EN      120             120
Spa     390             376
es      77              71
ES      10              10
Language codes in MS central node
Enumerations: object property + Class + instances + checking existing vocabularies
‘Free strings’:
1) Generate data type property + string value.
2) Curation process that:
a) identifies ‘enumeration-like’ candidates (e.g. language) and chooses an appropriate vocabulary
b) matches value strings to relevant URIs (DBpedia)
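A rough way to picture the curation step for free strings is a heuristic that flags properties whose values are few and heavily reused, followed by a lookup that maps each variant to a URI. The sketch below is an assumption-laden illustration: the thresholds, the function names and the lookup table are hypothetical, the DBpedia URIs are given only as plausible targets, and the value counts are borrowed from the language-code table for the MS central node.

```python
# Illustrative lookup only; a real mapping would come out of the curation step.
LANGUAGE_URIS = {
    "eng": "http://dbpedia.org/resource/English_language",
    "en": "http://dbpedia.org/resource/English_language",
    "spa": "http://dbpedia.org/resource/Spanish_language",
    "es": "http://dbpedia.org/resource/Spanish_language",
}


def is_enumeration_like(values, max_distinct=25, max_ratio=0.05):
    """Heuristic: few distinct strings that are reused many times."""
    if not values:
        return False
    distinct = len(set(values))
    return distinct <= max_distinct and distinct / len(values) <= max_ratio


def match_uri(value, lookup=LANGUAGE_URIS):
    """Map a raw string to a URI, ignoring case and surrounding spaces."""
    return lookup.get(value.strip().lower())


language_values = ["eng"] * 518 + ["en"] * 215 + ["EN"] * 120 + ["Spa"] * 390
descriptions = [f"free text description number {i}" for i in range(200)]

language_is_candidate = is_enumeration_like(language_values)
description_is_candidate = is_enumeration_like(descriptions)
# Values with no URI yet are what a curator would still need to look at.
unmatched = sorted({v for v in language_values if match_uri(v) is None})
```

Case folding alone already merges `en`/`EN` and `es`/`ES`; merging `eng` with `en` still needs the explicit table, which is where choosing a vocabulary comes in.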
RDFying - simple types as instances
SELECT DISTINCT ?language WHERE { ?s ms:languageId ?language }
(eng, en, EN …)
DELETE { ?s ms:language "EN" . }
INSERT { ?s ms:language <http://.../English> . }
WHERE { ?s ms:language "EN" . }
Curation using SPARQL
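When there are many string variants, the update statements can be generated rather than typed by hand. The following is a minimal sketch that only builds the SPARQL Update text; it does not contact any endpoint. The `ms:` namespace URI, the target URIs under `http://example.org/lang/` and the reading of `Spa`/`es`/`ES` as Spanish are assumptions made for the example.

```python
MS_PREFIX = "PREFIX ms: <http://example.org/ms#>"  # hypothetical namespace


def build_update(prop, literal, uri):
    """Return a SPARQL Update that swaps one literal value for a URI."""
    # Escape backslashes and double quotes so the literal stays well-formed.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        MS_PREFIX,
        f'DELETE {{ ?s {prop} "{escaped}" . }}',
        f"INSERT {{ ?s {prop} <{uri}> . }}",
        f'WHERE  {{ ?s {prop} "{escaped}" . }}',
    ])


# Variants as they appear in the data, mapped to an assumed target name.
variants = {"eng": "English", "en": "English", "EN": "English",
            "Spa": "Spanish", "es": "Spanish", "ES": "Spanish"}
updates = [
    build_update("ms:language", value, f"http://example.org/lang/{name}")
    for value, name in variants.items()
]
print(updates[2])
```

Each generated statement follows the DELETE / INSERT / WHERE order that SPARQL 1.1 Update expects, so the old literal is removed in the same operation that adds the URI.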
Linking data !!
Person Organization Document Project
Enumerations String valued
VIAF ORCID DBLP Vocabularies
DBpedia
Linking data !! – linking vs reusing
documentation sameAs
documentation
Linking data !! – linking vs reusing
http://lod.iula.upf.edu/resources/PAN_metadata_MW_ENV_IT
http://lod.iula.upf.edu/resources/doc_37
local URIs
external URIs
Core concepts which belong to some ‘local’ Class.
Instances which belong to some ‘external’ Class:
• Person (FOAF)
• Document (BIBO)
• Organisation (FOAF)
• …
But, some functional reasons:
Why all this? Is it worth it?
- Displaying data to the user in a comprehensive way
- Aggregating external data in a sensible manner
- Making hidden implicit relations explicit
<usage>NER</usage>
<format>XCES</format>
<standard>LMF</standard>
Any good article or tool ?
NER
Projects
Services Articles
Reports
Named Entity
SELECT * WHERE { ?s ?p ms:NER }
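The point of that query is that, once NER is a node rather than a string, everything pointing at it can be found with a single pattern, whatever the property. A toy in-memory version of the same pattern match, written as a sketch with hypothetical sample triples (the `ex:` identifiers and the property names other than those on the slides are placeholders, not IULA data):

```python
def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None plays the role of a variable."""
    return [
        (ts, tp, to)
        for ts, tp, to in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    ]


# Hypothetical sample data; identifiers are placeholders, not IULA records.
graph = [
    ("ex:tool_1", "ms:usage", "ms:NER"),
    ("ex:tool_1", "ms:format", "ms:XCES"),
    ("ex:article_7", "dc:subject", "ms:NER"),
    ("ex:project_3", "ms:standard", "ms:LMF"),
]

about_ner = match(graph, o="ms:NER")  # same shape as: ?s ?p ms:NER
for subject, prop, _ in about_ner:
    print(subject, prop)
```

Leaving both subject and property unbound is what lets tools, articles and projects surface together without knowing in advance how each of them refers to NER.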
IULA?
10!
Why all this ? – IULA at MS central node
IULA?
104
[Diagram: relational schema with entity tables PERSON, ORGANISATION, RESOURCE, LICENSE, DOCUMENT and PROJECT (each with ID, name, description, ...), linked through Has_ join tables that hold pairs of IDs.]
SELECT * FROM WHERE { … ...} HELP!!
Everything about IULA?
HELP!!
SELECT * WHERE { ?s ?p "IULA" }
sample data (855 records)
Why all this ? – data Mashups
Backwards relations
• LOD opens new possibilities and SPARQL is a powerful tool
BUT
• The curation task is crucial and effort/time consuming. You can address it as an incremental process.
Publishing LOD vs. deploying LOD applications
• Until now, the LOD community seems to have focused on “growing the cloud”
• In this scenario, creating new URIs and mapping to existing URIs is OK, but
• when the focus shifts from growing the cloud to developing applications, new problems arise: massive redundancy of URIs, trust in third-party servers/data, …
Conclusions & reflections