LANGUAGE RESEARCH IN SERVICE TO THE NATION

Creating a dual-use pandialectal Pashto grammar

AF-PAK LEARN, Omaha, May 17, 2010

Corey Miller, Anne David, Michael Maxwell, Alina Twist, Claudia Brugman, Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues and Tristan Purvis

Motivation

• Pashto is an indispensable Afghan language critical to our nation’s security

• Pashto is difficult for English speakers

• Updated, comprehensive, learner-oriented Pashto materials are needed
  – Grammar
  – Easy-access dictionary

What makes Pashto difficult?

• Ergativity

• Up to four cases: direct, oblique, ablative, and vocative

• Multiple noun and adjective declension classes

• Variety of adpositions: prepositions, postpositions, and circumpositions

• Retroflex consonants

• Variety of verbal structures

Project components

• Fieldwork

• Descriptive Grammar

• Dictionary

• Formal Grammar

• Parser

Parser enables easy access to dictionary

Fieldwork

• Identified native speakers of Pashto from Afghanistan and Pakistan living in the US
  – Peshawar and Quetta, Pakistan
  – Kabul and Kandahar, Afghanistan

• Create and run elicitation guides highlighting range of grammatical features

• Review all paradigms and example sentences, note dialect variation

• Digitally record all sessions

Motivation for descriptive grammar

• Existing materials suffer from liabilities
  – Dated
  – Cover a single dialect
    • Tegey and Robson 1996: Kabul
    • Penzl 1955: Kandahar
    • Shafeev 1964: Kandahar
  – Lack Pashto script (Tegey and Robson 1996 has it)

Goals for descriptive grammar

• Contemporary data and presentation

• Use of Pashto script and transcription throughout

• Cover dialect variation wherever it applies

Descriptive grammar

• Pashto language, orthography, phonology

• Adpositions

• Pronouns

• Nouns

• Adjectives

• Verbs

• Dialectology

• Miscellaneous

Pashto dialects

Pronoun paradigm: incorporation of dialect information

Interlinear example sentences

Adjective paradigm

Formal grammar of inflectional affix

Stem allomorphy in nouns

Formal grammar of phonological rule

Morphological parsing

• Inputs
  – Formal grammar
  – Dictionary (Lexicon)

• Output capability
  – Analysis: given an inflected form, produce possible headwords
  – Generation: given a headword, produce possible inflected forms
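The two output capabilities can be pictured with a minimal Python sketch. It assumes a hand-written table of forms in place of a real formal grammar, so it shows only the direction of each mapping. The names `LEXICON`, `generate`, and `analyze` are hypothetical and are not part of the project. The only data are the two forms of ويشتل that appear in these slides.

```python
from collections import defaultdict

# Toy lexicon (assumption): headword -> {description: inflected form}.
# A real parser would derive these forms from the formal grammar.
LEXICON = {
    "ويشتل": {
        "first person singular present imperfective": "ولم",
        "glossed 'I was shooting'": "ويشتلم",
    },
}


def generate(headword):
    """Generation: given a headword, return its known inflected forms."""
    return dict(LEXICON.get(headword, {}))


def build_index(lexicon):
    """Invert the lexicon so inflected forms point back to headwords."""
    index = defaultdict(list)
    for headword, forms in lexicon.items():
        for description, form in forms.items():
            index[form].append((headword, description))
    return index


INDEX = build_index(LEXICON)


def analyze(form):
    """Analysis: given an inflected form, return possible headwords."""
    return list(INDEX.get(form, []))
```

Analysis returns a list because one surface form may correspond to more than one headword; generation returns every form recorded for the headword.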

Uses of morphological parser

• Analysis capability enables dictionary lookup of inflected forms

• Generation has pedagogical uses including self-testing

How morphological analysis aids lookup

• Inflected forms may differ substantially from citation forms

• Experts can work around this problem, but non-experts often can’t

Translation | Transcription | Pashto
I am shooting | wə́lə́m | ولم
I was shooting | wiʃtə́lə́m | ويشتلم

The parser maps inflected forms to citation forms (headwords)

What does this Pashto word mean?

ولم

ويشتل [wishtə́l] (verb) to shoot

Grammatical info: first person singular present imperfective

Citation form: ويشتل
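A lookup of this kind could be sketched in Python as follows. This is an illustration under stated assumptions, not the project's parser: `DICTIONARY`, `ANALYSES`, and `lookup` are hypothetical names, and the analysis step is a hard-coded table standing in for a parser driven by the formal grammar.

```python
# Hypothetical dictionary keyed by citation form (headword).
DICTIONARY = {
    "ويشتل": {"pos": "verb", "gloss": "to shoot"},
}

# Hypothetical stand-in for parser output:
# inflected form -> list of (citation form, grammatical info).
ANALYSES = {
    "ولم": [("ويشتل", "first person singular present imperfective")],
}


def lookup(word):
    """Return dictionary hits for a word, trying it as a headword first."""
    if word in DICTIONARY:
        # The word is already a citation form, so no analysis is needed.
        return [{"citation_form": word, "grammatical_info": None,
                 **DICTIONARY[word]}]
    hits = []
    for headword, info in ANALYSES.get(word, []):
        entry = DICTIONARY.get(headword)
        if entry is not None:
            hits.append({"citation_form": headword,
                         "grammatical_info": info, **entry})
    return hits
```

Without the analysis table, a lookup of ولم finds nothing, because only the citation form ويشتل is a dictionary key.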

Conclusion

• Updated descriptive grammar based on fieldwork

• Formal grammar and lexicon feed parser

• Parser enables simplified dictionary lookup

• Faster, more informed processing of Pashto

References

• David, Anne and Michael Maxwell. 2008. Joint grammar development by linguists and computer scientists. Workshop on NLP for Less Privileged Languages, Third International Joint Conference on Natural Language Processing, Hyderabad, India.

• Maxwell, Michael and Anne David. 2008. Interoperable Grammars. First International Conference on Global Interoperability for Language Resources, Hong Kong.

• Maxwell, Michael. 2010. Standardization as a means to Sustainability. LREC (to appear).

• Penzl, Herbert. 1955. A Grammar of Pashto. Washington, DC: American Council of Learned Societies.

• Tegey, Habibullah and Barbara Robson. 1996. A Reference Grammar of Pashto. Washington, DC: Center for Applied Linguistics.

• Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. International Journal of American Linguistics 30.
