LANGUAGE RESEARCH IN SERVICE TO THE NATION
Creating a dual-use pandialectal Pashto grammar
AF-PAK LEARN, Omaha, May 17, 2010
Corey Miller ([email protected]), Anne David, Michael Maxwell, Alina Twist, Claudia Brugman,
Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues and Tristan Purvis
Motivation
• Pashto is an indispensable Afghan language critical to our nation’s security
• Pashto is difficult for English speakers
• Updated, comprehensive, learner-oriented Pashto materials are needed
  – Grammar
  – Easy-access dictionary
What makes Pashto difficult?
• Ergativity
• Up to four cases: direct, oblique, ablative, and vocative
• Multiple noun and adjective declension classes
• Variety of adpositions: prepositions, postpositions, and circumpositions
• Retroflex consonants
• Variety of verbal structures
Project components
• Fieldwork
• Descriptive Grammar
• Dictionary
• Formal Grammar
• Parser
Parser enables easy access to dictionary
Fieldwork
• Identified native speakers of Pashto from Afghanistan and Pakistan living in the US
  – Peshawar, Quetta, Pakistan
  – Kabul, Kandahar, Afghanistan
• Create and run elicitation guides highlighting range of grammatical features
• Review all paradigms and example sentences, note dialect variation
• Digitally record all sessions
Motivation for descriptive grammar
• Existing materials suffer from liabilities
  – dated
  – cover single dialect
    • Tegey and Robson 1996: Kabul
    • Penzl 1955: Kandahar
    • Shafeev 1964: Kandahar
  – lack Pashto script (T&R has it)
Goals for descriptive grammar
• Contemporary data and presentation
• Use of Pashto script and transcription throughout
• Cover dialect variation wherever it applies
Descriptive grammar
• Pashto language, orthography, phonology
• Adpositions
• Pronouns
• Nouns
• Adjectives
• Verbs
• Dialectology
• Miscellaneous
Pronoun paradigm: incorporation of dialect information
Morphological parsing
• Inputs
  – Formal grammar
  – Dictionary (Lexicon)
• Output capability
  – Analysis: given an inflected form, produce possible headwords
  – Generation: given a headword, produce possible inflected forms
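The two output capabilities can be illustrated with a minimal Python sketch. It is not the project's parser: it swaps the formal grammar for a hand-written table of full forms, and it lists only the two inflected forms of ويشتل that appear in this presentation. The names FULL_FORMS, analyze, and generate are hypothetical.

```python
from collections import defaultdict

# Toy stand-in for "formal grammar + lexicon": a table of full forms.
# Assumption: only the two forms shown in this presentation are listed.
FULL_FORMS = [
    # (inflected form, headword)
    ("ولم", "ويشتل"),
    ("ويشتلم", "ويشتل"),
]

# Index the table in both directions so each capability is one lookup.
BY_FORM = defaultdict(set)
BY_HEADWORD = defaultdict(set)
for form, headword in FULL_FORMS:
    BY_FORM[form].add(headword)
    BY_HEADWORD[headword].add(form)


def analyze(form):
    """Analysis: given an inflected form, return possible headwords."""
    return sorted(BY_FORM.get(form, ()))


def generate(headword):
    """Generation: given a headword, return possible inflected forms."""
    return sorted(BY_HEADWORD.get(headword, ()))
```

A rule-based parser would derive these pairs from the grammar instead of enumerating them, but the interface (form to headwords, headword to forms) stays the same.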
Uses of morphological parser
• Analysis capability enables dictionary lookup of inflected forms
• Generation has pedagogical uses including self-testing
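As a rough illustration of the self-testing use, the Python sketch below checks a learner's typed answer against an expected inflected form. It assumes the expected forms would come from the parser's generation mode; here they are hard-coded from the two examples in this presentation, and DRILL_ITEMS, check_answer, and score are hypothetical names.

```python
import unicodedata

# Assumption: in a real tool these items would be produced by generation.
# (headword, prompt shown to the learner, expected inflected form)
DRILL_ITEMS = [
    ("ويشتل", "I am shooting", "ولم"),
    ("ويشتل", "I was shooting", "ويشتلم"),
]


def normalize(text):
    """Trim whitespace and apply Unicode NFC so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text.strip())


def check_answer(item, answer):
    """Return True if the learner's answer matches the expected form."""
    _headword, _prompt, expected = item
    return normalize(answer) == normalize(expected)


def score(items, answers):
    """Count how many answers in a drill session are correct."""
    return sum(check_answer(item, ans) for item, ans in zip(items, answers))
```

Normalizing before comparison is a design choice for typed input; it does not accept alternative dialect forms, which would need additional expected answers per item.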
How morphological analysis aids lookup
• Inflected forms may differ substantially from citation forms
• Experts can work around this problem, but non-experts often can’t
Translation | Transcription | Pashto
I am shooting | wə́lə́m | ولم
I was shooting | wiʃtə́lə́m | ويشتلم
The parser maps inflected forms to citation forms (headwords)
What does this Pashto word mean? ولم
Citation form: ويشتل [wishtə́l] (verb) to shoot
Grammatical info: first person singular present imperfective
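The lookup flow on this slide can be sketched in Python as follows. This is an illustrative sketch, not the project's implementation: the dictionary holds only the single entry shown above, the analysis step is a hard-coded table standing in for the parser, and DICTIONARY, ANALYSES, and lookup are hypothetical names.

```python
# Assumption: a one-entry dictionary keyed by citation form (headword).
DICTIONARY = {
    "ويشتل": {"transcription": "wishtə́l", "pos": "verb", "gloss": "to shoot"},
}

# Assumption: stand-in for parser output, mapping an inflected form to
# (headword, grammatical info) pairs.
ANALYSES = {
    "ولم": [("ويشتل", "first person singular present imperfective")],
}


def lookup(word):
    """Return (headword, grammatical info, entry) triples for a word.

    A citation form is found directly; an inflected form is first mapped
    to its headword(s), and each headword is then looked up.
    """
    if word in DICTIONARY:
        return [(word, None, DICTIONARY[word])]
    results = []
    for headword, info in ANALYSES.get(word, []):
        entry = DICTIONARY.get(headword)
        if entry is not None:
            results.append((headword, info, entry))
    return results
```

Calling lookup("ولم") finds no direct entry, falls back to the analysis table, and returns the entry for ويشتل together with the grammatical information, which is the behavior the slide describes.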
Conclusion
• Updated descriptive grammar based on fieldwork
• Formal grammar and lexicon feed parser
• Parser enables simplified dictionary lookup
• Faster, more informed processing of Pashto
References
• David, Anne and Michael Maxwell. 2008. Joint grammar development by linguists and computer scientists. Workshop on NLP for Less Privileged Languages, Third International Joint Conference on Natural Language Processing, Hyderabad, India.
• Maxwell, Michael and Anne David. 2008. Interoperable Grammars. First International Conference on Global Interoperability for Language Resources, Hong Kong.
• Maxwell, Michael. 2010. Standardization as a means to Sustainability. LREC (to appear).
• Penzl, Herbert. 1955. A Grammar of Pashto. Washington, DC: American Council of Learned Societies.
• Tegey, Habibullah and Barbara Robson. 1996. A Reference Grammar of Pashto. Washington, DC: Center for Applied Linguistics.
• Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. International Journal of American Linguistics 30.