Typological Database of the Ugric Languages · ugortip.elte.hu/Vegyes/TDbUgLEngl.pdf


Typological Database of the Ugric Languages
An introduction

Helsinki, 16.9.2013
Nikolett F. Gulyás, Eötvös Loránd University


Overview
- Introduction
- New approaches (and old questions) in Uralistics
- Linguistic typology and Uralistics
- The Uralic Typological Database Project
- Databases and other online sources
- UgTDB as a pilot project
- Summary

Typological Database of the Ugric Languages (OTKA 104249)


About our Department
- Since 1872
- Majors in Finnish, Estonian and Uralistics
- Ca. 100 students (BA, MA, PhD)
- Languages taught:
  - Finnish and Estonian
  - Veps, Karelian
  - Erzya
  - Mari
  - Komi, Permyak, Udmurt
  - Surgut-Khanty, Northern Mansi


Main research areas
- Literature, Archeology, Ethnography, History etc.
- Comparative linguistics:
  - historical comparative morphology and syntax
  - synchronic comparative morphology and syntax
  - language contacts, areal linguistics
- Sociolinguistics:
  - code-switching
  - attitudes
- Phonetics:
  - comparative research on vowel features
  - accent of non-native speakers
- Linguistic typology:
  - diachronic typology
  - TAM, alignment patterns, information structure, syntax-semantics interface
  - databases


New approaches (and old questions) in Uralistics
- The gap in research:
  - early research focused on etymology
  - first descriptions: rich data without conventional descriptions, problems with transcriptions etc.
  - from the ’30s there was no fieldwork
  - there is a gap between the development of modern linguistic descriptions and the new fieldwork carried out systematically from the ’80s
- We already have:
  - the major FU languages investigated within different frameworks
  - new (text) corpora on minor FU languages
  - precise descriptions of the phonological and (to some extent) morphological features of minor FU languages


New approaches (and old questions) in Uralistics
- Fields emerging in present-day research:
  - creating new text corpora (parallel and colloquial corpora)
  - systematic syntactic description
  - systematic phonetic description
  - pragmatics
  - discourse analysis
  - L2 acquisition
  - etc.
- New(er) approaches:
  - generative grammar
  - functional-cognitive grammar
  - construction grammar
  - variational typology
  - etc.


Linguistic typology and Uralistics
- From Uralistics to Linguistic Typology:
  - updated and comparable data
  - detailed explanations
  - new data provided by natives (more informants)
  - possibly new questions
- From Linguistic Typology to Uralistics:
  - methods
  - new (-old) questions and research areas
  - the place of our language family within the world’s languages


The Uralic Typological Database Project
- Goals:
  - an online database containing all the possible morphosyntactic parameters of a given language
  - values of the parameters with examples and descriptions
  - new parameters (and languages) can be added to the database
- Research questions:
  - What are the typologically defined morphological and syntactic features of the analyzed Uralic languages?
  - How do these languages fit into the typology of the world’s languages?
  - How can a database compatible with already existing digital analysis (parameters, parameter values, combinative contrastive mechanisms) be created in theory and practice for researchers to analyze relevant relationships between languages?
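
The goals above (parameters, their values with descriptions and examples, and the option of adding new parameters later) could be modelled roughly as follows. This is only a minimal sketch: the class names `ParameterValue`, `Parameter` and `Database` are hypothetical, the contents are placeholders, and nothing here reflects the project's actual programming framework.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterValue:
    """One possible value of a parameter, with its description and examples."""
    label: str
    description: str
    examples: list[str] = field(default_factory=list)


@dataclass
class Parameter:
    """A morphosyntactic parameter and the values it can take."""
    name: str
    description: str
    values: dict[str, ParameterValue] = field(default_factory=dict)

    def add_value(self, value: ParameterValue) -> None:
        # Values are stored under their label so they can be looked up later.
        self.values[value.label] = value


class Database:
    """Hypothetical in-memory store; new parameters can be added at any time."""

    def __init__(self) -> None:
        self.parameters: dict[str, Parameter] = {}

    def add_parameter(self, parameter: Parameter) -> None:
        # Refuse to silently overwrite an existing parameter.
        if parameter.name in self.parameters:
            raise KeyError(f"parameter already exists: {parameter.name}")
        self.parameters[parameter.name] = parameter


db = Database()
# Placeholder content: names, descriptions and examples are invented.
param = Parameter("example parameter", "placeholder description")
param.add_value(
    ParameterValue("value A", "placeholder description", ["placeholder glossed example"])
)
db.add_parameter(param)
print(sorted(db.parameters))
```

Keeping descriptions and examples attached to each value, rather than to the parameter as a whole, is one way to let every value carry its own documentation.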


UTDB
- Once again, the reasons:
  - intragenetic typology analyzes related languages in terms of their shared features and differences
  - Uralic languages have not been analyzed from this point of view before
- History of the project:
  - 2005: CIFU, Joshkar-Ola: the aim emerged
  - 2008: Vienna: conference, pilot projects on different typological questions
  - 2010: CIFU, Piliscsaba: symposium dedicated to the typological description of the Finno-Ugric languages
  - 2013-: The Typological Database of the Ugric Languages (pilot project)


UTDB
- The database:
  - it provides all the basic material for further typological collection
  - it will be accessible not only to Finno-Ugric linguists but also to researchers of linguistic typology
  - it will both offer a complete typological analysis of the Uralic languages in terms of morphology and syntax and make comparative analysis with the rest of the world’s languages possible
  - the parameters and programming framework of the database can offer an example and starting point for similar future projects


UTDB
- Some examples:
  - word order
  - marking of semantic roles
  - possession
  - case marking
  - alignment patterns
  - argument structure
  - negation
  - the structure of the NP, VP
  - person marking
  - numerals


Databases and other online sources
- Typological databases:
  - The World Atlas of Language Structures
  - Matthew Dryer's Typological Database
  - Surrey Morphology Group


WALS
- the most complex typological database of the world’s languages
- data presented from 2678 languages
- main categories:
  - phonology
  - nominal categories
  - verbal categories
  - nominal syntax
  - word order
  - simple clauses
  - complex sentences
  - lexicon, sign languages, other


WALS
- http://wals.info/
- Uralic languages:
  - 27 languages
  - traditional classification, but: Mari and the Mordvinic languages are separated
- Parameters: ca. 160
  - the most for Hungarian: 140
  - the fewest for Izhorian and Livonian: 1
- Notes:
  - Yazva-Komi is classified as a Finnic language
  - Khanty, Mansi, Karelian, Estonian presented without separation
  - only 3 Saami languages
  - some values are questionable


Matthew Dryer's Typological Database
- only word order parameters for areal patterns
- 34 different parameters
- 225 languages including 34 Uralic languages
- not easy to search data and compare the results
- search by language (family) is not possible
- the Ob-Ugric languages have only a few values
- http://linguistics.buffalo.edu/people/faculty/dryer/dryer/database


Surrey Morphology Group
- investigation of grammatical categories with the use of explicit formal and statistical frameworks
- different databases on canonical typology
- a very good source for particular questions
- Hungarian is included but not in all cases
- Ob-Ugric languages are neglected
- http://www.surrey.ac.uk/englishandlanguages/research/smg/webresources/index.htm


Databases and other online sources
- Languages and language families:
  - Ethnologue
- Online etymological dictionary:
  - Uralonet
- On Ob-Ugric languages:
  - Ob-Babel project
- Morphologic analyzers
- the Hungarian National Corpus & the Mazsola:
  - HNC
  - Mazsola


Ethnologue
- General information about the world’s languages:
  - ISO
  - Alternate Names
  - Population
  - Location
  - Language Maps
  - Language Status
  - Classification
  - Dialects
  - Typology (only basic word order)
  - Language Development
  - Language Resources
  - Writing
  - Place in Language Cloud
- Notes:
  - there is no Ugric subgroup
  - the dialectal grouping at least for Khanty is controversial
- http://www.ethnologue.com/


Uralonet
- Digitized version of the Uralisches Etymologisches Wörterbuch (UEW)
- some updated etymologies
- new comments can be added
- meanings are provided in Hungarian, German and English
- search not only by languages but by semantic fields as well
- some exercises according to the comparative method can be found
- http://www.uralonet.nytud.hu/


Ob-Babel project
- a collection of former data about the Ob-Ugric languages
- http://www.babel.gwi.uni-muenchen.de/
- Content:
  - text corpora in four different dialects of Khanty and Mansi
  - innovative e-grammars
  - e-dictionaries
  - an e-library
  - ethnographic and folklore material
- Notes:
  - the morphologic analyzer is useful but there are some problems with the tool in use
  - typological data is not provided
  - NB: its tasks differ from those of the UTDB


Morphologic analyzers
- Analyzers for Nganasan, Permic and Ob-Ugric languages
- Text corpora are included
- Structural analysis
- Task: complex morphologic analyzer
- The tool works well with its own corpus but cannot deal with other transcriptions (at least in the case of Khanty)
- http://www.morphologic.hu/urali/


The Hungarian National Corpus & the Mazsola
- HNC:
  - the largest digital corpus for Hungarian
  - based on written sources
  - different dialects, materials from the whole Carpathian Basin
  - different genres: press, literature, science, official, personal
  - more than 187 million tokens
- Mazsola:
  - on the basis of the HNC
  - for the research on argument structures: verbal prefixes, adjectives etc.
  - textual frequency


UgTDB as a pilot project
- objective: an online database for the Ugric languages
- content:
  - parameters
  - description of the parameters
  - languages: Northern Mansi, Surgut-Khanty, Synja-Khanty and Hungarian
  - description of the languages and different subgroups
  - features
  - description of the features
  - values
  - description of the values
  - glossed examples for a given value in all languages
  - citations
  - search tool (dealing with 5 different parameters at the same time)
  - the database works in English, in Russian and in Hungarian
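
A search that combines up to five parameters, as described above, might look roughly like the sketch below. It assumes a hypothetical `LanguageRecord` structure and a hypothetical `search` function. The parameter names (`P1`, `P2`) and value labels are invented placeholders, not real entries of the database. Only the four language names come from the slide.

```python
from dataclasses import dataclass, field

# The slide mentions a search tool dealing with 5 parameters at the same time.
MAX_SEARCH_PARAMETERS = 5


@dataclass
class LanguageRecord:
    """One language with its parameter values (hypothetical structure)."""
    name: str
    # Maps a parameter name to the value label recorded for this language.
    values: dict[str, str] = field(default_factory=dict)


def search(records, criteria):
    """Return the names of languages matching every (parameter, value) pair."""
    if len(criteria) > MAX_SEARCH_PARAMETERS:
        raise ValueError("at most 5 parameters can be combined in one search")
    return [
        record.name
        for record in records
        if all(record.values.get(p) == v for p, v in criteria.items())
    ]


# Placeholder data: parameter names and values are invented for illustration.
records = [
    LanguageRecord("Northern Mansi", {"P1": "a", "P2": "x"}),
    LanguageRecord("Surgut-Khanty", {"P1": "a", "P2": "y"}),
    LanguageRecord("Synja-Khanty", {"P1": "a", "P2": "x"}),
    LanguageRecord("Hungarian", {"P1": "b", "P2": "x"}),
]

matches = search(records, {"P1": "a", "P2": "x"})
print(matches)
```

With the placeholder data, the query returns the languages that share both requested values. An empty set of criteria matches every language.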


Summary
- New approaches and research questions in recent times
- Cross-linguistically relevant questions have remained unanswered in Uralistics
- Some endangered languages are well documented, minor FU languages are not
- A database can provide a possible solution for this problem
- UTDB > UgTDB
- There are a lot of databases to follow, none of which is completely suitable for our goals > UgTDB


Next time
- Our project
- Tasks
- Target languages
- Sources
- Methods
- Parameters
- Values
- Recent results
- Questions
- More questions


KIITOS! KÖSZÖNÖM! THANK YOU!