Laying the Foundations for a Diachronic Dictionary of Tunis Arabic
A First Glance at an Evolving New Language Resource
Karlheinz Mörth¹, Stephan Procházka², Ines Dallaji²
¹ Institute of Corpus Linguistics and Text Technology (Austrian Academy of Sciences)
² Department of Oriental Studies (University of Vienna)
Introduction: Two projects
Vienna Corpus of Arabic Varieties (VICAV)
Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach (TUNICO)
Text technology + Linguistics
Introduction: VICAV
==> Vienna Corpus of Arabic Varieties
Digital language resources of a wide range of spoken Arabic varieties: dictionaries, corpora, bibliographies, language profiles, best practices
Cooperation of University of Vienna and the Austrian Academy of Sciences
http://corpus3.aac.oeaw.ac.at/vicav2/
Introduction: TUNICO
==> Linguistic Dynamics in the Greater Tunis Area: A Corpus-based Approach
Funded by the Austrian Science Fund (FWF, P 25706-G23)
Main objectives:
Exploration of spoken, contemporary Arabic
Two digital language resources:
• Corpus of spoken youth language
• Dictionary of Tunis Arabic
Arabic dialect lexicography
No comprehensive dictionary of the Arabic dialect of Tunis
Basis for diachronic research:
• Nicolas, A. (1911). Dictionnaire français-arabe
• Beaussier, M. (2006). Dictionnaire pratique arabe-français (arabe maghrébin)
• Quéméneur, J. (1961). “Notes sur quelques vocables du parler Tunisien”
• Quéméneur, J. (1962). “Glossaire de dialectal”
• Abdellatif, K. (2010). Dictionnaire «le Karmous» du Tunisien
• Marçais, W., Guîga, A. (1958-61). Textes arabes de Takroûna. II: Glossaire
Dictionary of Tunis Arabic
- micro-diachronic and machine-readable
- up-to-date and easily accessible lexical information
- incorporation of:
  a) contemporary data from a digital corpus
  b) various historical sources (e.g. Stumme, H.)
- information added is kept traceable to its origin
- basis: data taken from didactic materials
- 3 other main sources: newly created corpus, interviews and historical publications
Dictionary of Tunis Arabic: Contemporary sources
1) Corpus of spoken youth language (dialogues, narratives):
An uncommon approach in Arabic dialectology: dialectologists have mostly been interested in the language of older people, so only older forms of particular varieties are known.
Focus on modern language, contemporary usage and lexical neologisms.
2) Additional interviews to complete the data gained from corpus and historical sources
Dictionary of Tunis Arabic: Historical sources
- 800-page grammar of the Arabic dialect of the Medina of Tunis by Hans-Rudolf Singer (1984): evaluation of the data and integration of the excerpted lexicographic data into the dictionary
- Verification and completion of collected data with other historical resources
- Diachronic dimension helps to understand processes in the development of the lexicon
- The material gathered will allow analysis of recent developments (migration of parents from rural areas, influence of other Arabic varieties, influence of the revolution, foreign elements)
Dictionary of Tunis Arabic: Technical issues
Modelling the data
Interoperability
TEI P5
Dictionary of Tunis Arabic: Technical issues
Using the TEI dictionary module to encode digitised print dictionaries is a fairly standard procedure in digital humanities.
The TEI dictionary module needs to be further constrained:
• to enhance interoperability
• to reduce alternate constructs
• to achieve a high degree of compliance with LMF (ISO 24613)
Such constraints are easy to impose when creating born-digital dictionaries.
Dictionary of Tunis Arabic: Basic schema
<TEI>
  <teiHeader> ... </teiHeader>
  <text>
    <body>
      <div type="entries">
        <entry>...</entry>
        <entry>...</entry>
        <entry>...</entry>
        ... ... ...
      </div>
    </body>
  </text>
</TEI>
Dictionary of Tunis Arabic: Basic schema
<body>
  <div type="entries">
    <entry>...</entry>
    <entry>...</entry>
    <entry>...</entry>
    ... ... ...
  </div>
  <div type="examples">
    <cit type="example">...</cit>
    <cit type="example">...</cit>
    <cit type="example">...</cit>
    ... ... ...
  </div>
</body>
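The idea of constraining the TEI dictionary module can be illustrated with a small structural check. This is only a sketch under stated assumptions: the constraint table `ALLOWED`, the function `check_body` and the sample document are hypothetical, the TEI namespace is left out as on the slides, and a real project would more likely express such rules in a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mirroring the body layout shown on the slide.
SAMPLE_BODY = """
<body>
  <div type="entries">
    <entry/>
    <entry/>
  </div>
  <div type="examples">
    <cit type="example"/>
  </div>
</body>
"""

# Assumed constraint table: div type -> (allowed child tag, required @type or None).
ALLOWED = {
    "entries": ("entry", None),
    "examples": ("cit", "example"),
}


def check_body(body):
    """Return a list of problems; an empty list means the layout is as expected."""
    problems = []
    for div in body:
        div_type = div.get("type")
        if div.tag != "div" or div_type not in ALLOWED:
            problems.append(f"unexpected child of body: <{div.tag}> type={div_type!r}")
            continue
        tag, required_type = ALLOWED[div_type]
        for child in div:
            wrong_tag = child.tag != tag
            wrong_type = required_type is not None and child.get("type") != required_type
            if wrong_tag or wrong_type:
                problems.append(f"unexpected <{child.tag}> in div type={div_type!r}")
    return problems


body = ET.fromstring(SAMPLE_BODY)
problems = check_body(body)
# Number of direct children per div, keyed by the div's type.
counts = {div.get("type"): len(div) for div in body}
print(problems, counts)
```

Keeping entries and examples in separate divisions means each division can be checked, counted and queried on its own.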
Dictionary of Tunis Arabic: Basic schema
<entry id="ktaab_001">
  <form type="lemma">
    <orth lang="ar-aeb-x-tunis-vicav">ktāb</orth>
  </form>
  <form type="inflected" ana="#n_pl">
    <orth lang="ar-aeb-x-tunis-vicav">ktub</orth>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="root" lang="ar-aeb-x-tunis-vicav">ktb</gram>
  </gramGrp>
  <sense>
    <cit type="translation" lang="en">
      <quote>book</quote>
    </cit>
    <cit type="translation" lang="de">
      <quote>Buch</quote>
    </cit>
    <cit type="translation" lang="fr">
      <quote>livre</quote>
    </cit>
  </sense>
</entry>
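To show how an entry of this shape can be consumed by software, the following sketch reads the sample entry with Python's standard library. The function name `summarise_entry` and the shape of the returned dictionary are assumptions made for illustration; the attribute names follow the slide as written (plain `id` and `lang`, no TEI namespace).

```python
import xml.etree.ElementTree as ET

# The sample entry from the slide, reproduced as a string.
ENTRY_XML = """
<entry id="ktaab_001">
  <form type="lemma">
    <orth lang="ar-aeb-x-tunis-vicav">ktāb</orth>
  </form>
  <form type="inflected" ana="#n_pl">
    <orth lang="ar-aeb-x-tunis-vicav">ktub</orth>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="root" lang="ar-aeb-x-tunis-vicav">ktb</gram>
  </gramGrp>
  <sense>
    <cit type="translation" lang="en"><quote>book</quote></cit>
    <cit type="translation" lang="de"><quote>Buch</quote></cit>
    <cit type="translation" lang="fr"><quote>livre</quote></cit>
  </sense>
</entry>
"""


def summarise_entry(entry):
    """Collect lemma, inflected forms, grammar and translations from one entry."""
    return {
        "id": entry.get("id"),
        "lemma": entry.findtext("form[@type='lemma']/orth"),
        # Inflected forms keyed by their @ana pointer (e.g. "#n_pl").
        "inflected": {
            form.get("ana"): form.findtext("orth")
            for form in entry.findall("form[@type='inflected']")
        },
        # Grammatical information keyed by @type (pos, root).
        "grammar": {
            gram.get("type"): gram.text
            for gram in entry.findall("gramGrp/gram")
        },
        # Translations keyed by target language code.
        "translations": {
            cit.get("lang"): cit.findtext("quote")
            for cit in entry.findall("sense/cit[@type='translation']")
        },
    }


summary = summarise_entry(ET.fromstring(ENTRY_XML))
print(summary)
```

Because every piece of information sits in a typed element, each lookup is a short path expression rather than a text-parsing step.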
Dictionary of Tunis Arabic: Representing diachrony
…
<bibl>
  <author>Ritt-Benmimoun</author>
  <date>2014</date>
</bibl>
…
<bibl>
  <author>Singer</author>
  <date>1958</date>
  <biblScope unit="page">56</biblScope>
</bibl>
…
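One way such source references could support diachronic queries is sketched below: collect the `<bibl>` records attached to an element and order them by year. The slide elides the surrounding markup, so the `<sense>` wrapper, the function name `attestations` and the record layout are hypothetical; the sketch also assumes that each `<date>` holds a plain four-digit year.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the slide shows only the <bibl> elements, so the
# <sense> container used here is an assumption for illustration.
SENSE_XML = """
<sense>
  <bibl>
    <author>Ritt-Benmimoun</author>
    <date>2014</date>
  </bibl>
  <bibl>
    <author>Singer</author>
    <date>1958</date>
    <biblScope unit="page">56</biblScope>
  </bibl>
</sense>
"""


def attestations(element):
    """Return the source records found below `element`, oldest first."""
    records = []
    for bibl in element.iter("bibl"):
        records.append({
            "author": bibl.findtext("author"),
            "year": int(bibl.findtext("date")),  # assumes a plain year
            "page": bibl.findtext("biblScope[@unit='page']"),  # None if absent
        })
    return sorted(records, key=lambda record: record["year"])


sources = attestations(ET.fromstring(SENSE_XML))
earliest, latest = sources[0], sources[-1]
print(earliest["author"], earliest["year"], "->", latest["author"], latest["year"])
```

Sorting by year gives the earliest and latest attestation of an item directly, which is the kind of question a micro-diachronic dictionary is meant to answer.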
Dictionary of Tunis Arabic: Tools
Viennese Lexicographic Editor (VLE)
XML editor providing functionality typically needed in compiling lexicographic data
Web-based standalone application
Designed to process standards-based lexicographic and terminological data such as LMF, TBX, RDF or TEI
Automating procedures
Freely configurable visualisation (via XSLT)
Validation: MSXML Schema
Client-server architecture (PHP + MySQL)
Freely available and easy to set up
Dictionary of Tunis Arabic: Tools
Corpus – Dictionary interface
Dictionary of Tunis Arabic: Tools
corpus_shell: a modular framework of reusable software components to access and publish heterogeneous and distributed language resources such as language corpora, dictionaries, encyclopaedic databases, prosopographic databases, bibliographies, metadata, and schemata.
Language Resources Portal: clarin.oeaw.ac.at/ccv/corpus_shell
clarin.oeaw.ac.at/ccv/
Dictionary of Tunis Arabic: Status and outlook
CLARIN-ERIC (Common Language Resources and Technology Infrastructure).
Open access and open source.
~5000 entries
شكرًا لانتباهكم!
Thank you for your attention!