Sharing and Browsing Linguistic Data. EMELD Arizona: Terry Langendoen, Scott Farrar
TRANSCRIPT
![Page 1: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/1.jpg)
Sharing and Browsing Linguistic Data
EMELD Arizona:
Terry Langendoen
Scott Farrar
![Page 2: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/2.jpg)
Since Santa Barbara
Focus on morpho-syntax
Decided to build ontology (to be discussed later in this talk)
Decided to build supporting tools
– smart search engine (Hedwig)
– editor
Some work on XML markup
![Page 3: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/3.jpg)
The Problem
Currently there is no general way for researchers in the endangered languages community to electronically share information.
The Web is the most likely tool that could provide a solution.
The current WWW is not adequate.
An Example from the WWW:
![Page 4: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/4.jpg)
![Page 5: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/5.jpg)
![Page 6: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/6.jpg)
![Page 7: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/7.jpg)
![Page 8: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/8.jpg)
Further Complications
What about other data formats?
– lexicons
– grammatical descriptions
– (comparative) word lists
– paradigms
– etc.
![Page 9: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/9.jpg)
Warumungu Description
'Grammatical case suffixes' are those which express grammatical relations (subject, object, indirect object), like /karriny-ji/ in (4). A noun without a case suffix is interpreted as having Absolutive case - /nanttu/ in (4) and /wangarri/ in (5) - or as being the main predicator, or as agreeing with some argument with Absolutive case - /kumppu/ and /pulyurrulyurru/ in (5).
(from J. Simpson 1998)
![Page 10: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/10.jpg)
(4) Karriny-ji +ajjul nyirri-njina nanttu, ngapa-kajji.
    people-ERG +3pl.S put-PAST.CONT humpy, water-LEST
    'The people were erecting humpies for fear of the rain.' [JS:PND:RS]
(5) Nyirri-nyi +ama wangarri kumppu pulyurrulyurru.
    place-PAST.PUN +he rock ABS big.ABS red.ABS
    'He placed a big red hill.' [JS:PND:RS]
![Page 11: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/11.jpg)
Chichewa Description
Other elements that appear as verbal prefixes include modals – for instance, -ngo- 'just, merely' – as well as directional elements -ka- 'go' and -dza- 'come'. These are placed in the immediate pre-OM position, after the tense. This is shown by the following:
(from Mchombo 1998)
![Page 12: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/12.jpg)
(8a) Mkângo s-ú-ná-ngo-wá-phwány-a maûngu . . .
     3-lion NEG-3SM-past-just-6OM-smash-fv 6-pumpkins . . .
     'The lion did not just smash them, the pumpkins . . .'
(8b) Mkângo u-ku-ká-phwány-á máûngu.
     3SM-pres.-go-smash-fv 6-pumpkins
     'The lion is going to smash some pumpkins.'
![Page 13: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/13.jpg)
A Solution
Take advantage of new Web technology
Build a community of practice on the Semantic Web
What is the Semantic Web?
![Page 14: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/14.jpg)
The Semantic Web
New markup: <xml>, <rdf>, <owl>
New tools: smart search engines, ontologies, new editors
Meaning is encoded explicitly.
Pages are interpreted by a reasoner.
![Page 15: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/15.jpg)
An Example from the Semantic Web
New markup adds functionality to existing <html> documents.
Example:
<rdf:Description rdf:about="#A110604">
  <rdf:type rdf:resource="#State" />
  <NS0:name>Tennessee</NS0:name>
</rdf:Description>
<rdf:Description rdf:about="#876555">
  <rdf:type rdf:resource="#Language" />
  <EMELD:name>Navajo</EMELD:name>
</rdf:Description>
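One way to see what the explicit typing buys: a program can pick out resources by their declared type instead of guessing from the text. The sketch below is only an illustration, not EMELD's tooling. It wraps the two descriptions from the slide in an `rdf:RDF` root, and the namespace URIs given for `NS0` and `EMELD` are placeholders assumed for the example, since the slide does not show them.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The two descriptions from the slide, wrapped in an rdf:RDF root.
# The NS0 and EMELD namespace URIs are placeholders (assumptions).
DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:NS0="http://example.org/ns0#"
         xmlns:EMELD="http://example.org/emeld#">
  <rdf:Description rdf:about="#A110604">
    <rdf:type rdf:resource="#State" />
    <NS0:name>Tennessee</NS0:name>
  </rdf:Description>
  <rdf:Description rdf:about="#876555">
    <rdf:type rdf:resource="#Language" />
    <EMELD:name>Navajo</EMELD:name>
  </rdf:Description>
</rdf:RDF>
"""


def names_by_type(xml_text):
    """Map each declared rdf:type (e.g. '#Language') to the names given with it."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        type_el = desc.find(f"{{{RDF_NS}}}type")
        if type_el is None:
            continue
        rdf_type = type_el.get(f"{{{RDF_NS}}}resource")
        for child in desc:
            # Accept any child whose local name is 'name', whatever its namespace.
            if child.tag.rsplit("}", 1)[-1] == "name":
                result.setdefault(rdf_type, []).append(child.text)
    return result


types = names_by_type(DOC)
print(types.get("#Language"))
```

A search restricted to `#Language` would return Navajo and skip Tennessee, which plain keyword matching over HTML cannot guarantee.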
![Page 16: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/16.jpg)
Aardvark
nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata
WordNet for 'aardvark'
Nouns:
1. nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata Synonyms: aardvark,ant_bear,anteater,Orycteropus_afer
Verbs:
Adjectives:
Adverbs:
![Page 17: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/17.jpg)
<html>
<head>
<rdf:RDF…
<Word rdf:about="aardvark">
  <hasSense rdf:resource="9385"/>
</Word>
<SynSet rdf:about="9385">
  <type rdf:resource="noun"/>
  <rdfs:comment>nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata</rdfs:comment>
  <hasElement rdf:resource="aardvark"/>
  <hasElement rdf:resource="ant_bear"/>
  <hasElement rdf:resource="anteater"/>
  <hasElement rdf:resource="Orycteropus_afer"/>
</SynSet>
</rdf:RDF>
</head>
<body>
WordNet for 'aardvark'<br><br>
Nouns:<br><br>
 1. nocturnal burrowing mammal of the grasslands of Africa that feeds on termites; sole extant representative of the order Tubulidentata<br>
 Synonyms: aardvark,ant_bear,anteater,Orycteropus_afer<br><br>
Verbs:<br><br>
Adjectives:<br><br>
Adverbs:<br><br>
</body>
</html>
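To illustrate how a tool might read the embedded RDF rather than the visible page text, here is a minimal sketch that follows the `hasSense` link from a word to its synset and lists the synset's members. It works on the RDF portion alone. The default namespace URI is a placeholder assumed for the example (the slide elides the `rdf:RDF` attributes), and the gloss is shortened.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
WN = "http://example.org/wordnet#"  # placeholder default namespace (assumption)

# RDF portion of the slide's page; namespace declarations are assumed.
RDF_ISLAND = f"""\
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns="{WN}">
  <Word rdf:about="aardvark">
    <hasSense rdf:resource="9385"/>
  </Word>
  <SynSet rdf:about="9385">
    <type rdf:resource="noun"/>
    <rdfs:comment>nocturnal burrowing mammal of the grasslands of Africa</rdfs:comment>
    <hasElement rdf:resource="aardvark"/>
    <hasElement rdf:resource="ant_bear"/>
    <hasElement rdf:resource="anteater"/>
    <hasElement rdf:resource="Orycteropus_afer"/>
  </SynSet>
</rdf:RDF>
"""


def synonyms(rdf_text, word):
    """Return the members of every synset that `word` points to via hasSense."""
    root = ET.fromstring(rdf_text)
    about = f"{{{RDF}}}about"
    resource = f"{{{RDF}}}resource"
    # Index synsets by their identifier so senses can be looked up directly.
    synsets = {s.get(about): s for s in root.findall(f"{{{WN}}}SynSet")}
    found = []
    for entry in root.findall(f"{{{WN}}}Word"):
        if entry.get(about) != word:
            continue
        for sense in entry.findall(f"{{{WN}}}hasSense"):
            synset = synsets.get(sense.get(resource))
            if synset is None:
                continue
            found.extend(e.get(resource) for e in synset.findall(f"{{{WN}}}hasElement"))
    return found


print(synonyms(RDF_ISLAND, "aardvark"))
```

The visible `<body>` text carries the same synonyms, but only as a comma-separated string that a program would have to guess how to split.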
![Page 18: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/18.jpg)
The Ontology
Crucial component of the Semantic Web
A resource that explicitly defines what entities can exist in a domain, i.e., the endangered languages community
A resource that defines what relations hold between entities
demo
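As a rough picture of "entities" and "relations between entities", the sketch below stores a handful of concepts with a subclass relation and answers simple is-a questions, the kind of inference a reasoner performs. The concept names are hypothetical and chosen only for illustration. They are not taken from the EMELD ontology, which these slides do not reproduce.

```python
# Hypothetical concepts and subclass links (assumptions, not the EMELD ontology).
SUBCLASS_OF = {
    "ErgativeCase": "GrammaticalCase",
    "AbsolutiveCase": "GrammaticalCase",
    "GrammaticalCase": "MorphosyntacticFeature",
    "PastTense": "Tense",
    "Tense": "MorphosyntacticFeature",
}


def ancestors(concept):
    """Return every superclass of `concept`, nearest first."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def is_a(concept, candidate):
    """True if `concept` is `candidate` or one of its subclasses, directly or not."""
    return concept == candidate or candidate in ancestors(concept)


print(ancestors("ErgativeCase"))
print(is_a("PastTense", "GrammaticalCase"))
```

With such a relation in place, a query for any grammatical case can match data marked only as ergative, without the author having stated that connection.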
![Page 19: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/19.jpg)
OWL Web Ontology Language
Plays a role analogous to that of <html> on the WWW
The most current “standard” Semantic Web language
Under development at the W3C:
www.w3c.org
![Page 20: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/20.jpg)
Facilitating Tools
Search tools for the Semantic Web
Editors for composing Semantic Web pages
Reasoning engines
An extensible data model
![Page 21: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/21.jpg)
A Search Engine
EMELD Arizona’s prototype (Hedwig)
http://emeld.douglass.arizona.edu:8080/searchindex.html (temporarily out of service)
demo on Sunday
![Page 22: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/22.jpg)
An Editor
EMELD Arizona’s prototype (name?)
demo on Sunday
![Page 23: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/23.jpg)
A Good Data Model for Creating a Community of Practice
Language data should be searchable and comparable: broad access (centralized).
Authors or communities want control over their data (local/distributed).
Local control should be balanced with data interoperability (Semantic Web).
![Page 24: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/24.jpg)
Centralized Model
Warumungu
Wari
Mocovi
Biao Min
Archi
Hopi
Community
![Page 25: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/25.jpg)
Local Control with Broad Access
Semantic Web
ontology
Wari<xml>
Hopi<xml>
Archi<xml>
Community
tools
tools
tools
![Page 26: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/26.jpg)
Community Requirements
No need to standardize your terminology or abandon tradition.
No need to learn <xml> (though it doesn’t hurt!)
Use EMELD tools to put your data on the Semantic Web
Maintain your data
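The first requirement can be pictured as a mapping layer: each description keeps its own labels, and a per-language table links them to shared concepts. The sketch below uses gloss lines from the Warumungu and Chichewa examples earlier in the talk. The shared concept names and the mapping tables are hypothetical, invented for illustration rather than drawn from the EMELD ontology or tools.

```python
import re

# Per-language tables from local gloss labels to shared concepts.
# Concept names and mappings are assumptions made for this sketch.
LOCAL_TO_SHARED = {
    "Warumungu": {"ERG": "ErgativeCase", "ABS": "AbsolutiveCase", "PAST": "PastTense"},
    "Chichewa": {"past": "PastTense", "NEG": "Negation"},
}

# Gloss lines taken from examples (4) and (8a) in the slides.
GLOSSES = {
    "Warumungu": "people-ERG +3pl.S put-PAST.CONT humpy, water-LEST",
    "Chichewa": "3-lion NEG-3SM-past-just-6OM-smash-fv 6-pumpkins",
}


def concepts_in(language, gloss):
    """Shared concepts found in a gloss line, using that language's own labels."""
    labels = re.split(r"[\s\-.,+]+", gloss)
    mapping = LOCAL_TO_SHARED[language]
    return {mapping[label] for label in labels if label in mapping}


def languages_with(concept):
    """Languages whose sample gloss expresses the given shared concept."""
    return sorted(
        lang for lang, gloss in GLOSSES.items()
        if concept in concepts_in(lang, gloss)
    )


print(languages_with("PastTense"))
```

Warumungu writes `PAST` and Chichewa writes `past`, yet a single query for the shared concept reaches both, and neither author has to change notation.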
![Page 27: Sharing and Browsing Linguistic Data EMELD Arizona: Terry Langendoen Scott Farrar](https://reader034.vdocuments.us/reader034/viewer/2022042717/56649e115503460f94afcbb3/html5/thumbnails/27.jpg)
Contact Info
Terry Langendoen, Scott Farrar
[email protected]@[email protected]@u.arizona.edu
See our website:
http://emeld.douglass.arizona.edu:8080