2014.05.08 1 Presenter name
Linked Data and Language Technologies: The LIDER project
A. Gómez-Pérez (UPM)
Project Coordinator
CSA (Coordination and Support Action) budget: €1,482,000
Starting date: 1 November 2013
Duration: 2 years
Effort: 163 person-months (PM)
The LIDER consortium
Universidad Politécnica de Madrid (UPM, Spain) [COORDINATOR]
Trinity College Dublin (Ireland)
DFKI (Germany)
National University of Ireland, Galway (Ireland)
Institut für Angewandte Informatik e.V. (INFAI, Germany)
University of Bielefeld (Germany)
Università degli Studi di Roma La Sapienza (Italy)
GEIE ERCIM (France)
An example: “Red” (computer network)
http://es.wiktionary.org
http://rae.es
http://www.wikilengua.org/index.php/Terminesp:red
http://es.wikipedia.org
http://www.wordreference.com/sinonimos/
Complex queries using data from heterogeneous sources
*Picture attribution: http://commons.wikimedia.org/wiki/User:Gugerell
“Red”
Etymology: from Latin “rete”
Gender: “f”
Definition: “Conjunto de ordenadores o de equipos informáticos conectados entre sí…” (“set of computers or computer equipment connected to each other…”)
“Red”
Synonyms: “sistema”, “malla”, “distribución”
“Red”
Standard: UNE 21302-131
English: network
German: Netzwerk
“Red”
Pronunciation: [red]
Grammatical category: sustantivo femenino (feminine noun)
Singular: “red”
Plural: “redes”
“Red_de_computadores”
Category: redes informáticas (computer networks)
Image
Complementary but not connected
Heterogeneity of Linguistic Resources
• Ecosystem of
  – Open and closed resources
  – Complementary resources
    • Lexicons • Corpora • Dictionaries • …
  – Heterogeneous formats
    • e.g., for lexicons: Lexinfo, LMF, LIR, Lemon, …
  – Language resources available on the web
    • META-SHARE, ELDA, ELRA, CLARIN, FLaReNet, MultiJedi
  – Properties
    • Mature • Curated • Clear liability
Limitations when using LRs
Finding and reusing LRs in third-party applications is manual and time-consuming
Linked Data allows the integration of linguistic metadata and linguistic data
LD allows linguistic data integration
[Figure: the “Red” data from the separate sources merged into one linked graph. The entry “Red” has forms (written form “red”, gender feminine; a singular form with phonetic form [RED]; a plural form with phonetic form [REDES]), a sense marked as equivalent to a sense of “malla”, a link to an image, and an es-en translation link from “red” to “network”.]
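As a minimal sketch of what the linked version of the “red” data could look like, the Python below writes the entry as N-Triples using only the standard library. The vocabulary is an assumption: the slide does not name one, so the sketch borrows the OntoLex-Lemon core, its vartrans module and LexInfo for gender and number. The example.org IRIs and the helper names (iri, lit, red_entry, to_ntriples) are hypothetical. The synonym “malla” and the image link are left out to keep it short.

```python
# Minimal sketch: the Spanish entry "red" from the figure, written as N-Triples.
# ASSUMPTIONS: OntoLex-Lemon, vartrans and LexInfo as vocabularies, and all
# example.org IRIs, are illustrative choices; the slide does not name them.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
VARTRANS = "http://www.w3.org/ns/lemon/vartrans#"
LEXINFO = "http://www.lexinfo.net/ontology/2.0/lexinfo#"
ES = "http://example.org/lexicon/es/"  # hypothetical namespace for Spanish entries
EN = "http://example.org/lexicon/en/"  # hypothetical namespace for English senses

Triple = tuple[str, str, str]


def iri(value: str) -> str:
    return f"<{value}>"


def lit(text: str, lang: str) -> str:
    # Language-tagged literal; the sample strings need no escaping.
    return f'"{text}"@{lang}'


def red_entry() -> list[Triple]:
    entry, sense = ES + "red", ES + "red_sense_network"
    sg, pl = ES + "red_form_sg", ES + "red_form_pl"
    en_sense = EN + "network_sense"
    tr = ES + "red_network_translation"
    return [
        (iri(entry), iri(RDF_TYPE), iri(ONTOLEX + "LexicalEntry")),
        (iri(entry), iri(LEXINFO + "gender"), iri(LEXINFO + "feminine")),
        # Singular (canonical) and plural forms, each with written and phonetic representation
        (iri(entry), iri(ONTOLEX + "canonicalForm"), iri(sg)),
        (iri(sg), iri(ONTOLEX + "writtenRep"), lit("red", "es")),
        (iri(sg), iri(ONTOLEX + "phoneticRep"), lit("red", "es-fonipa")),
        (iri(sg), iri(LEXINFO + "number"), iri(LEXINFO + "singular")),
        (iri(entry), iri(ONTOLEX + "otherForm"), iri(pl)),
        (iri(pl), iri(ONTOLEX + "writtenRep"), lit("redes", "es")),
        (iri(pl), iri(ONTOLEX + "phoneticRep"), lit("redes", "es-fonipa")),
        (iri(pl), iri(LEXINFO + "number"), iri(LEXINFO + "plural")),
        # One sense, plus a translation link from it to an English sense of "network"
        (iri(entry), iri(ONTOLEX + "sense"), iri(sense)),
        (iri(tr), iri(RDF_TYPE), iri(VARTRANS + "Translation")),
        (iri(tr), iri(VARTRANS + "source"), iri(sense)),
        (iri(tr), iri(VARTRANS + "target"), iri(en_sense)),
    ]


def to_ntriples(triples: list[Triple]) -> str:
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(red_entry()))
```

Each output line is one triple, so the result can be read by RDF tools that accept N-Triples; the translation link is what would connect this entry to an English resource describing “network”.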
Linked Open Data and Language
LOD interconnects resources
– In many domains
– In many languages
– LOD is increasingly multilingual
[Figure: LOD cloud by domain: Music, Geographic, Life Sciences, Publications, E-Gov, On-line activities, Cross-domain]
How many Linguistic Resources are exposed in RDF?
Linguistic Linked (Open) Data
• Subset of LOD
• Linguistic domain
• Open license
• Resources in RDF
• Interconnected with other LD resources
Requirements:
• Keep track of the license (open or closed) information
• Keep track of the provenance of the resource
• Keep track of the use of the resource
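The three requirements map onto existing vocabularies. The sketch below is one possible encoding, not a LIDER specification: it assumes Dublin Core terms (dct:license) for the license, W3C PROV-O (prov:wasDerivedFrom, prov:used) for provenance and use, and a hypothetical LanguageResource class with example.org IRIs.

```python
# Sketch: recording license, provenance and use of a language resource as RDF triples.
# ASSUMPTIONS: Dublin Core terms and W3C PROV-O are one possible vocabulary choice;
# the LanguageResource class and all example.org IRIs are hypothetical.
from dataclasses import dataclass, field

DCT = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class LanguageResource:
    iri: str
    license_iri: str         # open or closed license, identified by an IRI
    derived_from: list[str]  # provenance: sources this resource was built from
    used_by: list[str] = field(default_factory=list)  # activities that used it

    def triples(self) -> list[tuple[str, str, str]]:
        out = [(self.iri, DCT + "license", self.license_iri)]
        out += [(self.iri, PROV + "wasDerivedFrom", src) for src in self.derived_from]
        for activity in self.used_by:
            # Each use is modeled as a PROV activity that used the resource
            out.append((activity, RDF_TYPE, PROV + "Activity"))
            out.append((activity, PROV + "used", self.iri))
        return out


lexicon = LanguageResource(
    iri="http://example.org/lr/spanish-lexicon",
    license_iri="http://creativecommons.org/licenses/by-sa/3.0/",
    derived_from=["http://example.org/lr/source-dictionary"],
    used_by=["http://example.org/activity/train-pos-tagger-1"],
)
for s, p, o in lexicon.triples():
    print(f"<{s}> <{p}> <{o}> .")
```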
Linked Data and Language Resources
• Uniform access to Language Resources
  – Agree on vocabularies for describing LR metadata and content
  – Unified and standardized language for describing resources (RDF(S))
  – Unified and standardized query language (SPARQL)
  – Standardized non-proprietary APIs
• Links to other resources
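Uniform access through SPARQL means that a client needs only the standard SPARQL 1.1 Protocol, whichever resource it targets. The sketch below builds (but does not send) such a GET request with the Python standard library. The endpoint URL is hypothetical, and the query assumes the data is modeled with OntoLex-Lemon.

```python
# Sketch: querying a language resource through a SPARQL endpoint (SPARQL 1.1 Protocol, GET).
# ASSUMPTIONS: the endpoint URL is hypothetical; the data is assumed to use OntoLex-Lemon.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

# Spanish written forms of lexical entries, via a SPARQL 1.1 property path
QUERY = """PREFIX ontolex: <http://www.w3.org/ns/lemon/ontolex#>
SELECT ?entry ?written WHERE {
  ?entry a ontolex:LexicalEntry ;
         ontolex:canonicalForm/ontolex:writtenRep ?written .
  FILTER(lang(?written) = "es")
}
LIMIT 10"""


def build_request(endpoint: str, query: str) -> Request:
    # The protocol passes the query in the 'query' parameter; the Accept header
    # asks for the standard SPARQL JSON results format.
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(ENDPOINT, QUERY)
print(req.full_url)
# Sending it would be urllib.request.urlopen(req), then json.load on the response.
```

Because the request shape is fixed by the protocol, the same client code would work against any conforming endpoint; only ENDPOINT and the query change.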
What is 3LD?
3LD: Linguistic Linked Licensed Data
Language resources such as:
- Lexica
- Corpora
- Dictionaries
- …
Using RDF and standard data models (vocabularies):
- Lexica
- Corpora
- NIF: NLP Interchange Format
Published along with a machine-readable license:
- ODRL: Open Digital Rights Language
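NIF identifies any span of text by character offsets in its IRI, which lets annotations from different NLP tools refer to the same string. A minimal sketch under stated assumptions: the document IRI and the sample sentence are hypothetical, offsets follow the RFC 5147 “#char=begin,end” convention, and the properties come from the NIF 2.0 core ontology. Attaching the ODRL license is not shown.

```python
# Sketch: NIF 2.0-style annotation of a word inside a short Spanish text.
# ASSUMPTIONS: the document IRI and sentence are hypothetical; span IRIs use the
# RFC 5147 "#char=begin,end" convention.

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_NNI = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"
DOC = "http://example.org/doc1"  # hypothetical document IRI


def offset_iri(begin: int, end: int) -> str:
    return f"{DOC}#char={begin},{end}"


def typed_int(n: int) -> str:
    return f'"{n}"^^<{XSD_NNI}>'


def annotate(text: str, word: str) -> list[str]:
    # Assumes text and word contain no quotes or backslashes (no literal escaping).
    begin = text.index(word)        # first occurrence only
    end = begin + len(word)
    ctx = offset_iri(0, len(text))  # the context string covers the whole text
    span = offset_iri(begin, end)
    rows = [
        (ctx, RDF_TYPE, f"<{NIF}Context>"),
        (ctx, NIF + "isString", f'"{text}"'),
        (ctx, NIF + "beginIndex", typed_int(0)),
        (ctx, NIF + "endIndex", typed_int(len(text))),
        (span, RDF_TYPE, f"<{NIF}RFC5147String>"),
        (span, NIF + "anchorOf", f'"{word}"'),
        (span, NIF + "beginIndex", typed_int(begin)),
        (span, NIF + "endIndex", typed_int(end)),
        (span, NIF + "referenceContext", f"<{ctx}>"),
    ]
    return [f"<{s}> <{p}> {o} ." for s, p, o in rows]


for line in annotate("La red conecta ordenadores.", "red"):
    print(line)
```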
Industry use cases
Technical activities
Community building and networking
Technical activities
• Which extensions to the LOD are needed to support a new generation of large-scale content analytics applications that will overcome language barriers?
  – Vocabularies
  – Expose Linguistic Resources in LD format with license information
    • Metadata
    • Content
  – Guidelines for Linguistic Linked Licensed Data (3LD)
  – Specification of a new generation of 3LD-aware NLP services
  – Reference architecture
  – Roadmap
Technical activities
1. Roadmap on 3LD for Content Analytics
2. Guidelines for 3LD
3. 3LD Reference Architecture
Community building and networking
LD4LT, BP-MLOD W3C-CG, OntoLex W3C-CG
Community building and networking
• Creation of an open and sustainable worldwide community around Linguistic Linked Data for content analytics
• Means
  – Community building activities
  – Open community events
  – Community portal
LD4LT, BP-MLOD, OntoLex
W3C LD4LT community group
• Reach agreement on core metadata ontologies for LRs
  – Input: META-SHARE, CLARIN, LRE Map, …
  – 62 participants
  – Started in April 2014
• Guidelines for
  – migrating existing LR metadata into RDF
  – publishing LR content (lexica, corpora, …) as linked data
http://www.w3.org/community/ld4lt/
http://www.w3.org/community/bpmlod/
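Migrating existing LR metadata into RDF is largely a field-to-property mapping. The sketch below illustrates the idea under loudly stated assumptions: the input field names are hypothetical and do not reproduce the real META-SHARE schema, and DCAT plus Dublin Core terms stand in for the core metadata ontology the group is meant to agree on. Language codes are turned into Lexvo IRIs.

```python
# Sketch: migrating a flat LR metadata record into RDF with a fixed field-to-property mapping.
# ASSUMPTIONS: input field names are hypothetical (not the actual META-SHARE schema);
# DCAT and Dublin Core terms stand in for the agreed core metadata ontology.

DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
LEXVO = "http://lexvo.org/id/iso639-3/"  # language IRIs built from ISO 639-3 codes

# field name -> (RDF property, whether the value becomes an IRI rather than a literal)
MAPPING = {
    "resourceName": (DCT + "title", False),
    "description": (DCT + "description", False),
    "languageId": (DCT + "language", True),
    "licence": (DCT + "license", True),
}


def migrate(subject: str, record: dict[str, str]) -> list[str]:
    # Assumes literal values contain no quotes or backslashes (no escaping done).
    lines = [f"<{subject}> <{RDF_TYPE}> <{DCAT}Dataset> ."]
    for key, value in record.items():
        if key not in MAPPING:
            continue  # unmapped fields are skipped; a real migration would report them
        prop, is_iri = MAPPING[key]
        if key == "languageId":
            value = LEXVO + value
        obj = f"<{value}>" if is_iri else f'"{value}"'
        lines.append(f"<{subject}> <{prop}> {obj} .")
    return lines


record = {
    "resourceName": "Spanish computing lexicon",
    "languageId": "spa",
    "licence": "http://creativecommons.org/licenses/by/4.0/",
    "internalNote": "not part of the mapping",
}
for line in migrate("http://example.org/lr/es-computing-lexicon", record):
    print(line)
```

Keeping the mapping in one table makes it easy to revise once a shared metadata ontology is agreed, without touching the conversion code.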
Community Building
1. Surveys to the localization industry and general Web companies
https://www.w3.org/2002/09/wbs/68293/LD4LT-1/
2. Open community events
  – Roadmapping workshops 2014
    • 21 March, EDF (Athens)
    • 7-8 May, MultilingualWeb WS (Madrid)
    • 26-27 May, WS on Emotions (LREC, Reykjavik)
    • 27 May, WS on LD and Linguistics (LREC, Reykjavik)
    • 4-6 June, WS at Localization World (Dublin)
    • 2 September, WS at Semantics Conference (Leipzig)
  – Tutorial at LREC on LD and LT
  – Hackathon on LD and LT in September, Semantics Conference (Leipzig)
3. W3C community groups for the publication of best practices
  – Linked Data and Language Technologies (LD4LT)
  – Best Practices for Multilingual Linked Open Data (BP-MLOD W3C-CG)
  – OntoLex W3C-CG
Industry use cases (WP1)
- Surveys
- Requirements
Technical activities (WP2, 3)
1. Roadmap on 3LD for Content Analytics
2. Guidelines for 3LD
3. 3LD Reference Architecture
Community building and networking (WP4)
LD4LT, BP-MLOD W3C-CG, OntoLex W3C-CG
Draft Classification of 3LD Use Cases
• Use of NLP
  – Use 3LD to tune/train NLP
  – Use NLP to curate and/or leverage multilingual & multimedia content annotated with 3LD
  – Enrich 3LD with NLP
• Sell NLP services
  – Access to more/better/cheaper data for client services
  – Make it easier for clients to train their NLP
• Leverage content + metadata
  – Convert to 3LD for publishing/sale
  – Link to other resources for added value/attribution