Linguistic Linked Open Data: What’s in for (Deep) Machine Translation?
Christian Chiarcos (chiarcos@informatik.uni-frankfurt.de)
DeepMT, Sep 4th, 2015, Prague
Linked Open Data
Basic Concepts
Linked Open Data (LOD)
Plenty of resources, linked with each other ;) (Aug 2014)
Linked Open Data (LOD)
• However, LOD pertains not so much to a resource (or a bundle of resources), but to a philosophy
• Best practices for publishing data on the web
– Goals
• reusability
• accessibility
• transparent and explicit semantics
– esp. for links
Linked (Open) Data, informally
• use URIs as names for things (1)
– links to external URIs (links) allow us to retrieve more information from these sites
• if they can be resolved via HTTP (2)
• and provide information via SPARQL/RDF* (3)
• and they include links to other URIs (4)
⇒ then, this is Linked Data
http://www.w3.org/DesignIssues/LinkedData.html
Linked Open Data: The 5 star plan
From Tables …
PHOnetics Information Base and LExicon (PHOIBLE)
Moran, S. 2012. Using Linked Data to Create a Typological Knowledge Base. In Chiarcos, C., Nordhoff, S., and Hellmann, S. (eds), Linked Data in Linguistics: Representing and Connecting Language Data and Language Metadata. Springer, Heidelberg.
From Tables to RDF …
1. Decompose tables into triples, i.e.,
– entity, attribute, value, resp.
– Subject, Property, Object
Subject (primary key): tha
Property („Relation“): glyph
Object: u:
We chose “hasSegment” for the property corresponding to column “glyph”: tha hasSegment u:
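The decomposition step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key column name `language`, the helper `table_to_triples`, and the two sample rows are hypothetical, and the property name `hasSegment` is the one chosen on the slide.

```python
# Hypothetical rows in the spirit of the PHOIBLE table; values are illustrative only.
rows = [
    {"language": "tha", "glyph": "u:"},
    {"language": "khm", "glyph": "u:"},
]

# Column names are mapped to property names; "hasSegment" follows the slide.
COLUMN_TO_PROPERTY = {"glyph": "hasSegment"}


def table_to_triples(rows, key_column, column_to_property):
    """Turn every non-key cell into a (subject, property, object) triple."""
    triples = []
    for row in rows:
        subject = row[key_column]  # the primary key becomes the subject
        for column, value in row.items():
            if column == key_column:
                continue
            # Fall back to the column name if no property name was chosen.
            prop = column_to_property.get(column, column)
            triples.append((subject, prop, value))
    return triples


triples = table_to_triples(rows, "language", COLUMN_TO_PROPERTY)
```

Each row yields one triple per non-key column, so wide tables simply produce more triples for the same subject.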
From Tables to RDF …
1. Decompose tables into triples
2. Multiple triples constitute a graph
3. A graph can aggregate triples from other sources, as well
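Points 2 and 3 can be illustrated with plain Python sets. This is a minimal sketch, not an RDF library: a graph is modelled as a set of tuples, the helper `describe` is hypothetical, and the two triples are the ones used in the Turtle example elsewhere in this deck.

```python
# A graph is modelled here as a set of (subject, property, object) tuples.
phoible_graph = {
    ("phoible:khm", "phoible:hasSegment", "u:"),
}

# Triples from another source, here a link to an external identifier.
links_graph = {
    ("phoible:khm", "owl:sameAs", "http://lexvo.org/id/iso639-3/khm"),
}

# Aggregating graphs is set union; identical triples collapse automatically.
merged = phoible_graph | links_graph


def describe(graph, subject):
    """Return all (property, object) pairs known about a subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)
```

Because both sources use the same name for the subject, the union immediately yields a richer description of it. This is the reason for agreeing on shared identifiers.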
From Tables to RDF …
Graphs can be represented in other ways, but RDF allows us to
1. Provide explicit semantics (RDF Schema, Ontology)
2. Check consistency and infer implicit information
3. Merge (not only syntactically, but semantically)
4. Query
5. Link (enrich with external data)
RDFS, OWL
URIs & SPARQL
Uniform Resource Identifiers (URIs)
● Agree on a common vocabulary and names for entities
● URIs provide globally unique identifiers
“hasSegment” (plain string, ambiguous)
vs.
<http://mlode.nlp2rdf.org/resource/phoible/hasSegment> (URI)
vs.
@prefix phoible: <http://mlode.nlp2rdf.org/resource/phoible/> .
... phoible:hasSegment ... (URI, abbreviated with a prefix)
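The relation between the prefixed form and the full URI can be sketched as follows. The helper `expand` is hypothetical and covers only simple `prefix:local` names; the namespace is the one declared on the slide.

```python
# Prefix table, mirroring the @prefix declaration on the slide.
PREFIXES = {"phoible": "http://mlode.nlp2rdf.org/resource/phoible/"}


def expand(prefixed_name, prefixes):
    """Expand 'prefix:local' into a full URI by concatenation."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


full_uri = expand("phoible:hasSegment", PREFIXES)
```

The prefixed name is only a shorthand: both notations denote the same globally unique identifier.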
Turtle
• Simple triple notation
Subject-URI Property-URI Object-URI .
Subject-URI Property-URI “Literal value” .
e.g., phoible:khm phoible:hasSegment "u:" .
SPARQL
Merge data and query it using the W3C standard SPARQL (SPARQL Protocol and RDF Query Language)
“the SQL of the Semantic Web”
PREFIX phoible: <http://mlode.nlp2rdf.org/resource/phoible/>
SELECT DISTINCT ?language
WHERE {
  ?language phoible:hasSegment ?segment .
  ?segment phoible:hasFeature phoible:delayed_release .
}
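To make the semantics of the query concrete, the sketch below evaluates the same two triple patterns over an in-memory set of triples in Python. It is not a SPARQL engine: the function `languages_with_feature`, the segment identifiers (`seg:c`, `seg:u`), the feature `phoible:round`, and all feature assignments are invented for illustration.

```python
# Toy data: segment identifiers and feature assignments are invented.
triples = {
    ("phoible:khm", "phoible:hasSegment", "seg:c"),
    ("phoible:tha", "phoible:hasSegment", "seg:u"),
    ("seg:c", "phoible:hasFeature", "phoible:delayed_release"),
    ("seg:u", "phoible:hasFeature", "phoible:round"),
}


def languages_with_feature(triples, feature):
    """Join the two triple patterns of the query on the shared ?segment variable."""
    # Pattern 2: ?segment phoible:hasFeature <feature>
    segments = {
        s for s, p, o in triples
        if p == "phoible:hasFeature" and o == feature
    }
    # Pattern 1: ?language phoible:hasSegment ?segment, restricted to matching segments.
    # Returning a set mirrors SELECT DISTINCT.
    return {
        s for s, p, o in triples
        if p == "phoible:hasSegment" and o in segments
    }


result = languages_with_feature(triples, "phoible:delayed_release")
```

The shared variable `?segment` acts as the join key between the two patterns, much like a join column in SQL.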
From Tables to RDF to Linked Data
• use URIs as names for things (1)
– links to external URIs (links) allow us to retrieve more information from these sites
• if they can be resolved via HTTP (2)
• and provide information via SPARQL/RDF* (3)
• and they include links to other URIs (4)
⇒ then, this is Linked Data
Turtle notation:
@prefix phoible: <http://mlode.nlp2rdf.org/resource/phoible/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
phoible:khm phoible:hasSegment "u:" .
phoible:khm owl:sameAs <http://lexvo.org/id/iso639-3/khm> .
http://www.w3.org/DesignIssues/LinkedData.html
Linked Open Data: The 5 star plan
Linked Open Data (LOD, Aug 2014)
Linguistic Linked Open Data
Linguistic Linked Open Data (LLOD)
Linguistic Motivations
A brief History
Language Resources, 2010 AD
• used in natural language processing, scientific research, language documentation, ...
• accessibility challenges
– different formats, schemes
– distributed
– dispersed metadata collections
⇒ require common specifications to represent, share, access and register language resources
Linguistic Linked Open Data (LLOD) & LLOD cloud
• by that time, this had long been recognized as a problem; hence, several proposals to address it
• Independently, different groups considered using RDF/OWL as a (local) solution, e.g., for
– terminology (GOLD, ISOcat, OLiA) (Farrar & Langendoen 2003, Ide & Wright 2004, Schmidt et al. 2006)
– integrating typological data bases (TDS) (Saulwick et al. 2005, Dimitriades et al. 2010)
– modelling and querying multi-layer corpora (Cassidy 2010, Chiarcos et al. 2008, Rehm et al. 2008)
– NLP pipelines (Buyko et al. 2008, Ribieira et al. 2012, Hellmann et al. 2013)
– interfacing corpus and dictionary data (Burchardt et al. 2008, Mazziotta et al. 2010)
• lexical resources long provided by the SW (Gangemi et al. 2003, Buitelaar et al. 2006)
• But these activities were not coordinated
– and in particular, RDF was used, but resources were rarely linked to other resources in the web of data
⇒ Interest in RDF, barely any links
⇒ need for establishing communication channels
Community Building
OKFN Open Linguistics Working Group (OWLG)
• founded in Oct 2010 in Berlin, Germany
– Working group of the Open Knowledge Foundation
• open network of individuals interested in
– linguistic resources and/or
– their publication under open licenses
• multi-disciplinary
– NLP/CL, typology/language documentation, SW, …
• infrastructure
– mailing list, web site/blog, wiki
– http://linguistics.okfn.org
OWLG activities (http://linguistics.okfn.org)
– promoting open linguistic resources
⇒ raising awareness, collecting metadata (datahub.io)
– facilitating wide-range community activities
• workshops, mailing list, publications
– Linked Data in Linguistics (LDL)
– Multilingual Linked Open Data for Enterprises (MLODE)
– Linked Data in Linguistic Typology (LDLT)
• facilitating exchange between and among more specialized community groups
– W3C OntoLex, BP-MLOD, LD4LT, ...
– ACL SIGs (SIGLEX, SIGANN), ...
– DGfS, MPI-EVA, ...
LLOD cloud
• a collection of linguistic resources
– published under open licenses
– as linked data
– developed and maintained in a decentralized fashion
– metadata at http://datahub.io
⇒ cloud diagram
– developed as a community effort in the context of the Open Linguistics Working Group of the Open Knowledge Foundation
Building the LLOD Cloud
2011: a sketch on a table napkin
Mar 2012: Chiarcos et al. (2012), LDL book
Sep 2012: MLODE hackathon to produce first diagram from original (meta) data
2012-2014: more data, more rigid quality constraints; emergence of related W3C Community Groups
Aug 2014: top-level category in the LOD diagram
Workshop series
Linked Data in Linguistics (LDL, annually)
Multilingual Linked Open Data for Enterprises (MLODE, bi-annually)
Linked Data in Linguistic Typology (LDLT, 2013)
linguistic-lod.org
Recent developments
• 9th International Conference on Language Resources and Evaluation (LREC-2014)
– „the new hot topic in our community“ (Nicoletta Calzolari, Pres. ELRA)
• Selected LLOD events in the last 3 months
– 4th Multilingual Semantic Web (Portoroz, Slovenia, June 2015)
– 1st Summer Datathon on LLOD (Cercedilla, Spain, June 2015)
– EUROLAN-2015 Summer School on “Linguistic Linked Open Data” (Sibiu, Romania, July 2015)
– LLOD-LSA Workshop at the Summer Institute of the Linguistic Society of America (Chicago, IL, July 2015)
– 4th Linked Data in Linguistics (Beijing, PRC, July 2015)
– Ontology session at ESSLLI-2015 (Barcelona, Spain, Aug 2015)
– 2nd Workshop on NLP&LOD (Hissar, Bulgaria, Sep 2015)
Linked Open Data for Linguists
Possible applications
Linked Open Data for Linguists
• Linked Data
– rules of best practice for publishing data on the web
• protocols and standards
• links between data sets
⇒ improved access to distributed resources
⇒ improved (re-)usability of language resources
⇒ improved visibility of language resources
⇒ Information integration
– Structural interoperability
• comparable formats and protocols to access data
⇒ use the same query language for different data sets
– Conceptual interoperability
• develop and (re-)use shared vocabularies for equivalent concepts
⇒ the same query on different data sets
– Federation
• data published on the web
– with a query interface (SPARQL end point)
⇒ use a single query to query different datasets
SPARQL 1.1
Achievable with any graph-based data model
The “killer application”, e.g., for annotated corpora
Conceptual Interoperability: Monolingual
• When language resources for a low-resource language are developed, different people have different ideas, e.g., for English (by the mid-1990s)
Token    Susanne   Penn
The      AT        DT
Fulton   NP1s      NNP
County   NNL1cb    NNP
Grand    JJ        NNP
Jury     NN1c      NNP
said     VVDv      VBD
Friday   NPD1      NNP
Susanne: 395 tags (word classes, morphological features, syntactic features, lexical classes)
Penn: 57 tags (word classes, number and degree)
• Integrating both resources allows us to
– apply more wide-scale statistical analyses
– increase training data for supervised POS tagging
– increase test data for unsupervised POS tagging
Conceptual Interoperability: Multilingual
• with interoperable POS tags used across different languages, …
– we can apply the same unlexicalized NLP tools (e.g., parsers, cf. McDonald et al. 2013)
– we can perform comparative corpus studies
– we simplify multilingual annotation projection
![Page 57: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/57.jpg)
57
Conceptual Interoperability
• Multiple terminology repositories exist
  – available over the web
  – RDF representation
• Are linked with each other
  – for language IDs: Glottolog & lexvo.org
  – for lexical senses: WordNets (ILI)
  – for grammatical categories & features: GOLD, ISOcat, OLiA
![Page 58: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/58.jpg)
58
Linked Terminologies
[Figure: the Ontologies of Linguistic Annotation (OLiA) link language resources to external terminology repositories]
• Language resources: dictionaries, corpora, NLP tools
• (Resource-specific) annotation models: Penn, Brown, Susanne (English); STTS, TIGER, TüBa-D/Z (German); Connexor; EAGLES; MULTEXT/East (15 (mostly) Eastern European languages; 11 European languages); etc.
• OLiA Reference Model
• External reference models (terminology repositories): GOLD, ISOcat (morpho-syntax), OntoTag (morpho-syntax), TDS ontology
![Page 59: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/59.jpg)
59
Conceptual Interoperability
| Token | Penn | OLiA description (Penn) | Susanne | OLiA description (Susanne) |
|---|---|---|---|---|
| The | DT | Determiner, PronounOrDeterminer | AT | DefiniteArticle, Article, Determiner, PronounOrDeterminer |
| Fulton | NNP | ProperNoun, Noun, hasNumber.Singular | NP1s | Surname, ProperNoun, Noun, hasNumber.Singular |
| County | NNP | ProperNoun, Noun, hasNumber.Singular | NNL1cb | TopographicalNoun, ProperNoun, Noun, hasNumber.Singular |
| Grand | NNP | ProperNoun, Noun, hasNumber.Singular | JJ | Adjective, hasDegree.Positive |
| Jury | NNP | ProperNoun, Noun, hasNumber.Singular | NN1c | CommonNoun, Noun, hasNumber.Singular |
| said | VBD | (MainVerb or StrictAuxiliaryVerb), hasTense.Past [sic!] | VVDv | MainVerb, hasTense.Past |
| Friday | NNP | ProperNoun, Noun, hasNumber.Singular | NPD1 | TemporalNoun, ProperNoun, Noun, hasNumber.Singular |

Atomic statements are mostly identical, with just a few more from Susanne.
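The comparison on this slide can be sketched as plain set operations. The following is a minimal illustration, assuming that each tag is represented as a set of atomic statements transcribed from the slide; the dictionaries `PENN` and `SUSANNE` and the function `compare` are hypothetical names, not part of OLiA.

```python
# Atomic OLiA-style statements per tag, transcribed from the slide (subset only).
PENN = {
    "DT": {"Determiner", "PronounOrDeterminer"},
    "NNP": {"ProperNoun", "Noun", "hasNumber.Singular"},
}
SUSANNE = {
    "AT": {"DefiniteArticle", "Article", "Determiner", "PronounOrDeterminer"},
    "NP1s": {"Surname", "ProperNoun", "Noun", "hasNumber.Singular"},
    "NPD1": {"TemporalNoun", "ProperNoun", "Noun", "hasNumber.Singular"},
}


def compare(susanne_tag, penn_tag):
    """Split the statements of two tags into shared and tagset-specific ones."""
    s, p = SUSANNE[susanne_tag], PENN[penn_tag]
    return {"shared": s & p, "susanne_only": s - p, "penn_only": p - s}


if __name__ == "__main__":
    # "The": AT (Susanne) vs. DT (Penn)
    print(compare("AT", "DT"))
    # "Fulton": NP1s (Susanne) vs. NNP (Penn)
    print(compare("NP1s", "NNP"))
```

For these pairs, everything Penn states is also stated by Susanne, and Susanne only adds finer-grained classes such as `Surname` or `DefiniteArticle`.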
![Page 60: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/60.jpg)
What’s in for (Deep) MT?
• Entity Linking: A Special Track for Proper Names
• Re-Using Lexical Resources
• Addressing Lexical Gaps
• Bootstrapping Dictionaries
• Improved Deep Analysis
![Page 61: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/61.jpg)
61
Deep MT (© Jan Hajic, yesterday)
![Page 62: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/62.jpg)
62
Entity Linking: A Special Track for Proper Names
![Page 63: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/63.jpg)
63
Translating Proper Names
• Normally not directly translated, but maintained
  – Differences in inflection
  – Different writing systems (Cyrillic vs. Latin vs. Arabic, etc.)
  – SMT: make sure your Language Model doesn’t override the Translation Model!
![Page 64: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/64.jpg)
64
Translating Proper Names
• Make sure your Language Model doesn’t override the Translation Model!
My grandfather's grandfather came to Germany in 1905.
  – "Il nonno di mio nonno arrivò in Canada nel 1905." (Google Translate, Jul 2010)
  – "Nonno di mio nonno è venuto in Germania nel 1905." (Google Translate, Sep 2012)
![Page 65: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/65.jpg)
65
Translating Proper Names
• Make sure your Language Model doesn’t override the Translation Model!
"Recentemente", conferma Maria Serena Balestracci, "mi ha telefonato un signore da Bologna, che aveva sentito parlare del libro alla radio
  – "... a gentleman from London ..." (Google Translate, Oct 2010)
  – "... a gentleman from Bologna ..." (Google Translate, Sep 2012)
![Page 66: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/66.jpg)
66
Translating Proper Names
• Make sure your Language Model doesn’t override the Translation Model !
These errors stopped after Google bought Freebase
![Page 67: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/67.jpg)
67
Trivial Entity Linking
• Named Entity Recognition -> treat NEs in a special way
![Page 68: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/68.jpg)
68
Trivial Entity Linking
• Named Entity Recognition
• Entity Linking -> Link with an ontology, which may provide multilingual labels
![Page 69: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/69.jpg)
![Page 70: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/70.jpg)
70
Trivial Entity Linking
• Named Entity Recognition
• Entity Linking -> Link with an ontology, which may provide multilingual labels
  – E.g., Entity Linking via DBpedia Spotlight
  – Follow DBpedia-JRCNames linking
  – Use multilingual label from JRCNames instead of translating yourself
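The idea of looking up a label instead of translating a name can be sketched as follows. This is a minimal illustration under stated assumptions: the knowledge base `KB` is a hypothetical in-memory stand-in with made-up `ex:` identifiers, it does not call DBpedia Spotlight or JRC-Names, and linking is reduced to exact string matching.

```python
# Hypothetical mini knowledge base: entity identifier -> label per language.
KB = {
    "ex:Germany": {"en": "Germany", "it": "Germania", "de": "Deutschland"},
    "ex:Bologna": {"en": "Bologna", "it": "Bologna", "de": "Bologna"},
    "ex:London": {"en": "London", "it": "Londra", "de": "London"},
}


def link(mention, lang):
    """Trivial entity linking: exact match of the mention against labels in `lang`."""
    for entity, labels in KB.items():
        if labels.get(lang) == mention:
            return entity
    return None


def translate_name(mention, src_lang, tgt_lang):
    """Return the target-language label of the linked entity.

    Unknown names (or entities without a target label) are kept unchanged,
    so the language model never gets a chance to replace them.
    """
    entity = link(mention, src_lang)
    if entity is None:
        return mention
    return KB[entity].get(tgt_lang, mention)


if __name__ == "__main__":
    print(translate_name("Germania", "it", "en"))
    print(translate_name("Bologna", "it", "en"))
```

A real system would replace `link` with a proper entity linker and `KB` with labels retrieved from the linked resources.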
![Page 71: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/71.jpg)
71
Problems with inflecting languages
• Just knowing about (one) possible label in another language doesn’t help much if you need to inflect a name – or, any other string label from a knowledge base
![Page 72: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/72.jpg)
72
Problems with inflecting languages
• Listing all forms helps with entity linking, but not with machine translation
=> We need linguistic LOD (LLOD)
Systematic inclusion of grammatical information
=> LLOD vocabularies and conventions
![Page 73: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/73.jpg)
73
(Re-) Using lexical resources
![Page 74: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/74.jpg)
74
Lemon: Lexicon Model for Ontologies
• Developed by the W3C Ontology-Lexica Community Group (OntoLex)
![Page 75: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/75.jpg)
75
Lemon: Lexicon Model for Ontologies
• Developed by the W3C Ontology-Lexica Community Group (OntoLex)
• Provides a data model for adding linguistic information to ontologies
• Widely used within the LLOD cloud
  – Also by colleagues not participating in OntoLex
    • E.g., PanLex (Long Now Foundation)
  – “Abused” for any kind of lexical resource
    • Even beyond the original ontology lexicalization use case
![Page 76: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/76.jpg)
lemon Core
![Page 77: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/77.jpg)
lemon Sample (Moran and Brümmer 2013)
![Page 78: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/78.jpg)
Open World Assumption
• Unless explicitly stated, information is per se incomplete
  – Additional information can be expressed
  – E.g., using linguistic categories and features from terminology repositories
=> Grammatical information can be described in a reusable way
Recommended vocabularies: lexinfo, OLiA, GOLD
![Page 79: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/79.jpg)
Vocabularies for Lexical-Conceptual Resources
• lemon provides data structures, but
  – for content and metadata, it relies on external vocabularies
• Interoperability depends on a bundle of vocabularies
  – WordNet, DBpedia, any ontology (lexical senses)
  – lexvo (language identifiers)
  – glottolog (languoid identifiers from linguistic typology)
  – PHOIBLE (phoneme inventories and phonological structures)
  – lexinfo (grammatical features for lexical resources)
  – OLiA (annotations)
  – ISOcat (resource metadata)
  – GOLD (grammatical concepts)
![Page 80: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/80.jpg)
Vocabularies for Lexical-Conceptual Resources
Providing (lexical) resources in accordance with these vocabularies improves their reusability
![Page 81: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/81.jpg)
Vocabularies for Lexical-Conceptual Resources
This effect can be extended to NLP tools
![Page 82: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/82.jpg)
82
Addressing lexical gaps
![Page 83: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/83.jpg)
Addressing lexical gaps
• Subsumption inference can partially compensate for the lack of lexical resources/coverage
=> If no counterpart for the target language is found, try hypernyms
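The hypernym fallback can be sketched as a walk up a taxonomy until a label in the target language is found. The following is a minimal illustration, assuming a toy single-inheritance taxonomy; `HYPERNYM`, `LABELS` and `translate_with_fallback` are hypothetical, and the label entries are toy data rather than the contents of any real WordNet.

```python
# Toy taxonomy: concept -> direct hypernym (hypothetical, single inheritance).
HYPERNYM = {"gaggle": "flock", "flock": "group"}

# Toy multilingual labels per concept (hypothetical coverage).
LABELS = {
    "flock": {"es": "banda"},
    "group": {"fr": "groupe", "de": "Gruppe"},
}


def translate_with_fallback(concept, lang):
    """Return (label, generalisation_steps) or None if no ancestor has a label.

    Step 0 is a direct translation; each further step climbs one hypernym,
    so the result gets more general (and less precise) as the count grows.
    """
    steps = 0
    current = concept
    while current is not None:
        label = LABELS.get(current, {}).get(lang)
        if label is not None:
            return label, steps
        current = HYPERNYM.get(current)
        steps += 1
    return None


if __name__ == "__main__":
    print(translate_with_fallback("gaggle", "de"))
    print(translate_with_fallback("gaggle", "es"))
```

Returning the number of generalisation steps lets a caller decide how much loss of precision is acceptable.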
![Page 84: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/84.jpg)
A gaggle of photographers …
• Idea:
  – translate something that doesn’t exist in the target language
• Assume you know that it refers to the English WordNet term gaggle-n. What would it be in German?
  – http://wordnet-rdf.princeton.edu/ provides multilingual labels
![Page 85: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/85.jpg)
A gaggle of photographers …
• http://wordnet-rdf.princeton.edu/wn31/gaggle-n
wn31:gaggle-n wn:translation ?gaggle-n-de FILTER (lang(?gaggle-n-de) = "de")
Basque, Finnish, Japanese, no German -> check hypernym
![Page 86: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/86.jpg)
A gaggle of photographers …
• http://wordnet-rdf.princeton.edu/wn31/gaggle-n
wn31:gaggle-n lemon:sense/lemon:reference/wn:hypernym/wn:synset_member/wn:translation ?gaggle-n-de FILTER (lang(?gaggle-n-de) = "de")
Port. branco, Span. banda, … no German -> check indirect hypernyms
![Page 87: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/87.jpg)
A gaggle of photographers …
• http://wordnet-rdf.princeton.edu/wn31/gaggle-n
wn31:gaggle-n lemon:sense/lemon:reference/wn:hypernym*/wn:synset_member/wn:translation ?gaggle-n-de FILTER (lang(?gaggle-n-de) = "de")
French groupe, Cat. grup, Gal. grupo, … still no German -> check an external resource, say lemonUby (which we know to contain some German)
![Page 88: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/88.jpg)
A gaggle of photographers …
• http://wordnet-rdf.princeton.edu/wn31/gaggle-n
wn31:gaggle-n …/wn:synset_member/owl:sameAs ?gaggleUby
FILTER regex(str(?gaggleUby), "http://lemon-model.net/lexica/uby/wn/.*") .
?gaggleUby … FILTER (lang(?gaggle-n-de) = "de")
Traverse different resources (in the same end point) according to their structure, until something is found …
![Page 89: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/89.jpg)
A gaggle of photographers …
• http://wordnet-rdf.princeton.edu/wn31/gaggle-n
Possible because the structure of these resources is lemon-conformant
![Page 90: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/90.jpg)
A gaggle of photographers …
• http://wordnet-rdf.princeton.edu/wn31/gaggle-n
For resources out of the current end point, another SERVICE can be addressed -> federation
![Page 91: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/91.jpg)
A gaggle of photographers …
• http://wordnet-rdf.princeton.edu/wn31/gaggle-n
Quite slow, though, but can be used to pre-compile word lists with generalisation
![Page 92: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/92.jpg)
92
Bootstrapping dictionaries
![Page 93: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/93.jpg)
Bootstrapping dictionaries
By transitivity
• Goal: Translate from Czech to Farsi
• No dictionary, but
  – Czech-English, English-Farsi
• Quite noisy, though, hence check multiple paths
  – Czech-Russian, Russian-Farsi
  – …
-> Limit to forms with high confidence (by the number of paths, alternatives, etc.)
• Slow, again, but can be used for precompiling
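Bootstrapping by transitivity can be sketched as composing bilingual dictionaries over several pivot languages and keeping only pairs that are supported by enough paths. This is a minimal illustration, assuming dictionaries are plain Python mappings from a word to a set of translations; the function names, the `min_paths` threshold and the transliterated toy entries are assumptions for illustration, not data from an actual LLOD resource.

```python
from collections import defaultdict


def pivot_translations(src_to_pivot, pivot_to_tgt):
    """Compose two bilingual dictionaries: source -> pivot -> target."""
    result = defaultdict(set)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, ()):
                result[src].add(tgt)
    return result


def bootstrap(paths, min_paths=2):
    """Keep (source, target) pairs supported by at least `min_paths` pivot languages."""
    support = defaultdict(int)
    for src_to_pivot, pivot_to_tgt in paths:
        composed = pivot_translations(src_to_pivot, pivot_to_tgt)
        for src, tgts in composed.items():
            for tgt in tgts:
                support[(src, tgt)] += 1
    return {pair: n for pair, n in support.items() if n >= min_paths}


# Toy entries (transliterated): English "spring" is ambiguous, Russian "vesna" is not.
CS_EN = {"jaro": {"spring"}}
EN_FA = {"spring": {"bahar", "fanar", "cheshmeh"}}
CS_RU = {"jaro": {"vesna"}}
RU_FA = {"vesna": {"bahar"}}

if __name__ == "__main__":
    print(bootstrap([(CS_EN, EN_FA), (CS_RU, RU_FA)]))
```

In this toy case the English pivot alone proposes three candidates, and the second path through Russian confirms only one of them, which is the effect the confidence filter is meant to achieve.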
![Page 94: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/94.jpg)
94
Improved Deep Analysis
![Page 95: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/95.jpg)
95
Improved Deep Analysis
• Re-using externally provided tools (and data)
  – Structural Interoperability
    • The output of NLP tools can be represented in RDF
      – NLP Interchange Format (NIF, nlp2rdf.org)
        » If only one layer of analysis is considered
        » For more complicated annotations and actual corpora, additional means are necessary, cf. POWLA (purl.org/powla)
  – Conceptual Interoperability
    • Represent and integrate the output of NLP tools with reference to LLOD repositories, e.g., the Ontologies of Linguistic Annotation (OLiA)
![Page 96: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/96.jpg)
![Page 97: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/97.jpg)
98
Comparing and combining heterogeneous linguistic analyses
diese nicht neue Erkenntnis
this not new insight
'this well-known insight'
Connexor*: PRON Dem FEM SG NOM
RFTagger**: PRO.Dem.Attr.-3.Acc.Sg.Fem
* P. Tapanainen and T. Järvinen. 1997. A nonprojective dependency parser. In Proceedings of the 5th Conference on Applied Natural Language Processing, pages 64–71, Washington, DC, April 1997.
** H. Schmid and F. Laws. 2008. Estimation of conditional probabilities with decision trees and an application to fine-grained POS tagging. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008), Manchester, UK, August 2008.
![Page 98: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/98.jpg)
99
Comparing and combining heterogeneous linguistic analyses
OLiA Reference Model descriptions
Connexor: PRON Dem FEM SG NOM
• rdf:type(olia:PronounOrDeterminer)
• rdf:type(olia:Pronoun)
• rdf:type(olia:DemonstrativePronoun)
• olia:hasNumber(olia:Singular)
• olia:hasGender(olia:Feminine)
• olia:hasCase(olia:Nominative)
RFTagger: PRO.Dem.Attr.-3.Acc.Sg.Fem
• rdf:type(olia:PronounOrDeterminer)
• rdf:type(olia:Determiner)
• rdf:type(olia:DemonstrativeDeterminer)
• olia:hasNumber(olia:Singular)
• olia:hasGender(olia:Feminine)
• olia:hasCase(olia:Accusative)
![Page 99: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/99.jpg)
100
Comparing and combining heterogeneous linguistic analyses
Confidence ranking (simple voting) of the OLiA Reference Model descriptions
• predicted by both tools: rdf:type(olia:PronounOrDeterminer), olia:hasNumber(olia:Singular), olia:hasGender(olia:Feminine)
• predicted by one tool: rdf:type(olia:Pronoun), rdf:type(olia:Determiner), rdf:type(olia:DemonstrativePronoun), rdf:type(olia:DemonstrativeDeterminer), olia:hasCase(olia:Accusative), olia:hasCase(olia:Nominative)
![Page 100: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/100.jpg)
101
Comparing and combining heterogeneous linguistic analyses
disambiguation: create the maximal consistent set S of descriptions
1. S is empty
2. process descriptions with decreasing confidence
  a) if the current description is consistent with all descriptions in S, then add it to S
  b) if not, skip it
  c) iterate until all descriptions are processed
![Page 101: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/101.jpg)
102
Comparing and combining heterogeneous linguistic analyses
identify incompatible annotations: check consistency conditions in the ontology
![Page 102: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/102.jpg)
103
Comparing and combining heterogeneous linguistic analyses
[Figure: is-a hierarchy]
• olia_top:MorphosyntacticCategory
  – olia:PronounOrDeterminer
    • olia:Pronoun
      – olia:DemonstrativePronoun
    • olia:Determiner
      – olia:DemonstrativeDeterminer
siblings are inconsistent: rdf:type(olia:Pronoun) vs. rdf:type(olia:Determiner)
cousins are inconsistent: rdf:type(olia:DemonstrativePronoun) vs. rdf:type(olia:DemonstrativeDeterminer)
aunts/nieces, etc. are inconsistent: rdf:type(olia:Determiner) vs. rdf:type(olia:DemonstrativePronoun)
A is consistent with B iff A ⊑ B or B ⊑ A
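The consistency condition can be sketched as a subsumption test over the is-a hierarchy shown on this slide. This is a minimal illustration, assuming a single-parent hierarchy stored as a dictionary; `PARENT`, `subsumed_by` and `consistent` are hypothetical names, and a real implementation would query the ontology instead.

```python
# is-a hierarchy from the slide: concept -> direct superclass.
PARENT = {
    "PronounOrDeterminer": "MorphosyntacticCategory",
    "Pronoun": "PronounOrDeterminer",
    "Determiner": "PronounOrDeterminer",
    "DemonstrativePronoun": "Pronoun",
    "DemonstrativeDeterminer": "Determiner",
}


def subsumed_by(a, b):
    """True if a is b or a (transitive) subclass of b."""
    current = a
    while current is not None:
        if current == b:
            return True
        current = PARENT.get(current)
    return False


def consistent(a, b):
    """A is consistent with B iff A is subsumed by B or B is subsumed by A."""
    return subsumed_by(a, b) or subsumed_by(b, a)


if __name__ == "__main__":
    print(consistent("DemonstrativePronoun", "PronounOrDeterminer"))
    print(consistent("Pronoun", "Determiner"))
```

Under this definition, siblings, cousins and aunt/niece pairs all fail the test, while any concept is consistent with its own ancestors and descendants.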
![Page 103: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/103.jpg)
104
Comparing and combining heterogeneous linguistic analyses
=> from every equally-ranked pair of inconsistent descriptions: first come, first served (simple voting with random tie resolution)
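The voting and disambiguation steps can be sketched together as a greedy selection. This is a minimal sketch under stated assumptions: descriptions are modelled as `(property, value)` pairs, two values of the same feature (e.g. case) are treated as inconsistent unless they are equal, and ties are resolved by input order as a deterministic stand-in for random tie resolution. All names are hypothetical.

```python
from collections import Counter

# is-a hierarchy for rdf:type values: concept -> direct superclass.
PARENT = {
    "Pronoun": "PronounOrDeterminer",
    "Determiner": "PronounOrDeterminer",
    "DemonstrativePronoun": "Pronoun",
    "DemonstrativeDeterminer": "Determiner",
}


def subsumed_by(a, b):
    """True if a is b or a (transitive) subclass of b."""
    while a is not None:
        if a == b:
            return True
        a = PARENT.get(a)
    return False


def consistent(d1, d2):
    (prop1, val1), (prop2, val2) = d1, d2
    if prop1 != prop2:
        return True  # different properties never clash
    if prop1 == "rdf:type":
        return subsumed_by(val1, val2) or subsumed_by(val2, val1)
    return val1 == val2  # assumption: one value per feature


def disambiguate(tool_outputs):
    """Greedily build a maximal consistent set, most-voted descriptions first."""
    votes = Counter(d for output in tool_outputs for d in output)
    # sorted() is stable, so equally ranked descriptions keep their input order.
    ranked = sorted(votes, key=lambda d: -votes[d])
    selected = []
    for description in ranked:
        if all(consistent(description, s) for s in selected):
            selected.append(description)
    return selected


CONNEXOR = [("rdf:type", "PronounOrDeterminer"), ("rdf:type", "Pronoun"),
            ("rdf:type", "DemonstrativePronoun"), ("olia:hasNumber", "Singular"),
            ("olia:hasGender", "Feminine"), ("olia:hasCase", "Nominative")]
RFTAGGER = [("rdf:type", "PronounOrDeterminer"), ("rdf:type", "Determiner"),
            ("rdf:type", "DemonstrativeDeterminer"), ("olia:hasNumber", "Singular"),
            ("olia:hasGender", "Feminine"), ("olia:hasCase", "Accusative")]

if __name__ == "__main__":
    for description in disambiguate([CONNEXOR, RFTAGGER]):
        print(description)
```

With only two tools, the descriptions they disagree on are tied at one vote each, so the result depends on which tool is listed first; adding further tools is what makes the vote informative.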
![Page 104: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/104.jpg)
105
Experiments
• we know that ensemble combination improves accuracy
  – if so, we should observe an increase of accuracy for (at least some) combinations of tools
  – but accuracy may be the wrong criterion
    • inaccurate if the target annotation is less rich than one of the source annotations
=> measure recall, not accuracy
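Why recall fits this setting can be sketched with a set-based definition. The slides do not spell out the exact formula, so the following assumes the standard one: the share of reference descriptions that also appear in the predicted set. The function name and the example sets are hypothetical.

```python
def recall(predicted, reference):
    """Share of reference descriptions that are also predicted."""
    reference = set(reference)
    if not reference:
        return 1.0  # nothing to recover
    return len(reference & set(predicted)) / len(reference)


# A less rich reference annotation (word class only) ...
REFERENCE = {"PronounOrDeterminer", "Determiner", "DemonstrativeDeterminer"}
# ... versus a richer prediction that adds number and gender.
RICH_PREDICTION = REFERENCE | {"hasNumber.Singular", "hasGender.Feminine"}
WRONG_PREDICTION = {"PronounOrDeterminer", "Pronoun", "DemonstrativePronoun"}

if __name__ == "__main__":
    print(recall(RICH_PREDICTION, REFERENCE))
    print(recall(WRONG_PREDICTION, REFERENCE))
```

The richer prediction is not penalised for the extra features that the reference annotation cannot express, whereas a measure that counts every unmatched prediction as an error would penalise it.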
![Page 105: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/105.jpg)
106
Experiments (Chiarcos 2010)
• German newspaper corpora
• 10,000 tokens from each of the following newspaper corpora
  – NEGRA (Skut et al. 1998)
  – TIGER (Brants et al. 2002)
  – Potsdam Commentary Corpus (PCC, Stede 2004)
• TIGER/NEGRA-style target annotation
![Page 106: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/106.jpg)
107
[Figure: experimental pipeline]
• Input: corpus file with reference annotation in TIGER format; the plain, tokenized text is passed to the tools
• Tools and annotation models: RFTagger (RFTagger annotation model); TreeTagger, StanfordTagger, StanfordParser, BerkeleyParser (STTS annotation model); Morphisto morphology (Morphisto annotation model); Connexor (Connexor annotation model)
• Tool output -> set of OLiA reference model descriptions -> maximal consistent description
• Reference annotation -> TIGER annotation model -> reference description
• Evaluation: comparison of the maximal consistent description with the reference description
![Page 107: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/107.jpg)
Experiments
Morphosyntax: Example
Diese nicht neue Erkenntnis ("this not new insight")
⇒ PronounOrDeterminer & Determiner & DemonstrativeDeterminer
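A description like the one above can be read as a concept together with all of its superclasses. The following sketch shows that expansion. The hierarchy excerpt and the link from the STTS tag `PDAT` to `DemonstrativeDeterminer` are assumptions made for illustration, and `SUPERCLASS`, `TAG_TO_CONCEPT` and `describe` are hypothetical names.

```python
# Hypothetical excerpt of a concept hierarchy: child -> parent.
SUPERCLASS = {
    "DemonstrativeDeterminer": "Determiner",
    "Determiner": "PronounOrDeterminer",
}

# Hypothetical tag-to-concept link, standing in for an annotation model.
TAG_TO_CONCEPT = {"PDAT": "DemonstrativeDeterminer"}

def describe(tag):
    """Return the concept for `tag` and all its ancestors, most general first."""
    chain = []
    concept = TAG_TO_CONCEPT.get(tag)
    while concept is not None:
        chain.append(concept)
        concept = SUPERCLASS.get(concept)  # None once the top of the excerpt is reached
    return list(reversed(chain))

description = describe("PDAT")
print(" & ".join(description))
# prints "PronounOrDeterminer & Determiner & DemonstrativeDeterminer"
```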
![Page 108: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/108.jpg)
Experiments
Morphosyntax: Recall
* StanfordTagger was trained on NEGRA
![Page 109: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/109.jpg)
Experiments
Morphosyntax: Recall
• continuous increase of (avg.) recall
• combination of 5-6 tools outperforms best-performing single tool
– except StanfordTagger on NEGRA
• trained on NEGRA
⇒ table for individual combinations
![Page 110: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/110.jpg)
Experiments
Morphosyntax: Results
• best-performing combinations (NEGRA)
1. Stanford Tagger (98.97% recall)
2. -“- + Stanford Parser (98.71% recall)
3. -“- + TreeTagger (99.00% recall)
4. Stanford Tagger + Stanford Parser + Morphisto + RFTagger (98.87% recall)
5. Stanford Tagger + Stanford Parser + TreeTagger + RFTagger + Connexor (98.29% recall)
⇒ marginal decrease of performance of best-performing tools
![Page 111: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/111.jpg)
Experiments
Morphosyntax: Results
• worst-performing combinations (NEGRA) (Berkeley Parser excluded)
1. Morphisto (70.06% recall)
2. -“- + Connexor (86.05% recall)
3. -“- + TreeTagger (91.90% recall)
4. -“- + RFTagger (94.29% recall)
5. -“- + StanfordTagger (96.10% recall)
⇒ rapid increase of performance for worst-performing tools
![Page 112: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/112.jpg)
Findings
• result is a consistent set of ontological descriptions
– no loss of detail when trained/evaluated against a target annotation
• with different granularity
⇒ can be evaluated against corpora with different target annotation
– natural handling of different granularities
• hierarchical structures
– string-based representation can be generated
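As an illustration of the last two points, the sketch below renders one and the same ontological description as a string tag for a fine-grained and for a coarse-grained target tagset. The tagsets, the selection rule (most specific tag whose required concepts are all present) and the names `TARGET_TAGS` and `to_tag` are assumptions, not part of the original work.

```python
# Hypothetical fine-grained target tagset: tag -> concepts the tag requires.
TARGET_TAGS = {
    "DET": {"Determiner"},
    "PDAT": {"Determiner", "DemonstrativeDeterminer"},
    "NN": {"Noun", "CommonNoun"},
}

def to_tag(description, tagset):
    """Pick the most specific tag whose required concepts are all in `description`."""
    candidates = [(len(req), tag) for tag, req in tagset.items() if req <= description]
    # Ties between equally specific tags are broken by tag name.
    return max(candidates)[1] if candidates else None

description = {"PronounOrDeterminer", "Determiner", "DemonstrativeDeterminer"}

fine = to_tag(description, TARGET_TAGS)
coarse = to_tag(description, {"DET": {"Determiner"}, "N": {"Noun"}})
print(fine, coarse)
```

The same description yields `PDAT` against the fine-grained tagset and `DET` against the coarse one, so no detail has to be discarded before choosing the evaluation target.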
![Page 113: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/113.jpg)
More Experiments
Comparable results for German morphology (Chiarcos 2010, 3 tools)
Similar results for German dependency/edge labels (Chiarcos, unpublished, 5 tools, labels only)
Different use case, similar methodology: Pareja et al. (2010), Spanish particle se
![Page 114: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/114.jpg)
Even more experiments
• Instead of combining existing tools, we can also train tools directly on ontological representations of annotations
– Even if these originate from different annotations
• With ontology-based pruning, these yield ontologically consistent descriptions
– Chiarcos & Sukhareva (NLP&LOD2, next week)
• Trained a neural network, encoded and decoded with OLiA representations
• Increased granularity (depth of analysis), stable accuracy
– Replicate for discourse parsing
• Based on Chiarcos (2014)
![Page 115: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/115.jpg)
Summary
• LLOD
– provides resources
– facilitates interoperability
• data, tools, annotations, lexical resources
– facilitates access/integration of heterogeneous/distributed information
• is a community effort
– depends on your input
![Page 116: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/116.jpg)
Want to stay/get involved?
• Join our discussions / meetings
– present your resources, interests, questions, etc.
• Open Linguistics Working Group
– http://linguistics.okfn.org/
• mailing list, telcos, meetings and events
– also consider the relevant W3C CGs, e.g.,
• OntoLex => lexical-conceptual resources
• BP-MLOD => best practice guidelines
• LD4LT => NLP applications
![Page 117: Linguistic Linked Open Data: What’s in for (Deep) Machine Translation? Christian Chiarcos chiarcos@informatik.uni-frankfurt.de DeepMT, Sep 4 th, 2015,](https://reader035.vdocuments.us/reader035/viewer/2022062422/56649e805503460f94b84661/html5/thumbnails/117.jpg)
Thanks a lot!