Computer-aided learning of transitive non-locative constructions with a concrete direct object in Modern Greek

Kyriaki Ioannidou, Eleni Tziafa, Rania Voskaki

Laboratory of Translation and Language Processing

Aristotle University of Thessaloniki, GREECE

7th International Technology, Education & Development Conference, Valencia 4–6/3/2013



Presentation Plan

Theoretical and methodological framework

Linguistic resources used

Linguistic resources created

Modern Greek CDO lexicon-grammar table

Parameterised finite state automaton (FSA)

FSA for Greek noun phrases

In the context of CLIL and CEFR

• Research aim: computer-aided learning of transitive non-locative constructions with a concrete direct object (CDO) for learners of Modern Greek

• Theoretical framework adopted: Transformational Grammar as defined by Z. S. Harris (1968) [1]

• Methodological model adopted: Lexicon-Grammar, developed by M. Gross (1975) [2]

Linguistic resources used in our research

• a corpus of 7,000,000 words (journalistic and educational discourse):

▫ 3,000,000 words from the Makedonia newspaper

▫ 2,000,000 words from the TA NEA newspaper

▫ 2,000,000 words from school books published by the Pedagogical Institute

• a morphological dictionary of 2,000,000 inflected forms, belonging to the Laboratory of Translation and Language Processing of the Aristotle University of Thessaloniki

Linguistic resources created

•a syntactic-semantic lexicon in the form of a table (Lexicon-Grammar Table comprising 300 verbs).

• a parameterised FSA that makes use of the above lexicon.

• a set of 382 chunking FSA describing noun phrase structure.

CDO in Modern Greek

• the constructions are transitive

Η Μαίρη έσπασε το βάζο (Mary broke the vase)

• there is exactly one complement, and it is obligatory

Ο Γιάννης επιδιορθώνει το ψυγείο (John is repairing the fridge)

*Ο Γιάννης επιδιορθώνει (*John is repairing)


• the complement is not prepositional

Η μητέρα της καθάρισε τα τζάμια (Her mother cleaned the windows)

X Πάει στο πάρκο (X He goes to the park)

• the complement is a concrete direct object

Ο Ερρίκος σιδέρωσε το πουκάμισό του (Eric ironed his shirt)

X Η γραμματέας ενημέρωσε το αφεντικό της (X The secretary informed her boss)
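The four criteria above can be read as a conjunction: a construction belongs to the CDO class only if every property holds. A minimal sketch, assuming a hypothetical boolean encoding of each criterion (the field names and feature values below are illustrative, not the column headings or values of the authors' table):

```python
from dataclasses import dataclass


@dataclass
class Construction:
    # Hypothetical encoding of the four CDO criteria; these are not the
    # column names of the actual lexicon-grammar table.
    verb: str
    transitive: bool
    n_complements: int
    complement_obligatory: bool
    prepositional: bool
    concrete_object: bool


def is_cdo(c: Construction) -> bool:
    """True only if the construction satisfies all four criteria."""
    return (
        c.transitive
        and c.n_complements == 1
        and c.complement_obligatory
        and not c.prepositional
        and c.concrete_object
    )


# Illustrative feature values for three of the examples above.
examples = [
    Construction("επιδιορθώνω", True, 1, True, False, True),   # ... το ψυγείο
    Construction("πάω", False, 1, False, True, False),         # ... στο πάρκο
    Construction("ενημερώνω", True, 1, True, False, False),    # ... το αφεντικό της
]
for c in examples:
    print(c.verb, is_cdo(c))
```

Only the first construction passes; the other two fail on the prepositional and the concrete-object criteria respectively, matching the X marks above.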

Lexicon-Grammar Table constructed

Application to corpora

Use of a parameterised FSA

• Parameterised FSA are meta-graphs that allow a set of graphs to be generated automatically from a lexicon-grammar table.

• They refer to the columns of the lexicon-grammar table by means of parameters (variables).

• They allow certain syntactic terms to be recognised; in our study, direct objects.

• They describe all possible constructions of the verbs studied.

• They have the format of an FSA (Hopcroft, Motwani & Ullman 2001) [3].

• They are created with the Unitex platform (Paumier 2003) [4].
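The principle of one meta-graph specialised once per table row can be sketched in a few lines. This is a simplified Python model, not Unitex: the graph is reduced to a list of paths, `@V@` is an assumed placeholder notation for a lexical column, and `Pass` is a hypothetical boolean column whose path is kept only when the cell contains `+`. Unitex has its own mechanism for this, whose syntax is not reproduced here.

```python
# Hypothetical mini lexicon-grammar table (illustrative values only).
# "V" is a lexical column; "Pass" is a boolean column ("+" accepted, "-" not).
TABLE = [
    {"V": "σπάω", "Pass": "+"},
    {"V": "αγγίζω", "Pass": "-"},
]

# Simplified reference graph: a list of (condition_column, path) pairs.
# A condition of None means the path is kept for every verb.
# "@V@" stands for the cell in column V (notation assumed for this sketch).
REFERENCE_GRAPH = [
    (None, ["<N0>", "@V@", "<N1>"]),                    # active: N0 V N1
    ("Pass", ["<N1>", "@V@[passive]", "από", "<N0>"]),  # passive variant
]


def instantiate(row, graph):
    """Specialise the reference graph for one table row."""
    paths = []
    for condition, path in graph:
        if condition is not None and row[condition] != "+":
            continue  # the verb does not accept this construction: drop it
        new_path = []
        for token in path:
            # Replace every @COLUMN@ placeholder with the row's cell value.
            for column, value in row.items():
                token = token.replace(f"@{column}@", value)
            new_path.append(token)
        paths.append(new_path)
    return paths


# One specialised graph per row, as the meta-graph generates automatically.
for row in TABLE:
    print(row["V"], instantiate(row, REFERENCE_GRAPH))
```

The first verb yields both the active and the passive path; the second keeps only the active path because its `Pass` cell is `-`.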

Parameterised FSA

Parameterised FSA enrichment

• Parameterised FSA focus on recognising:

▫ the syntactic role of constituents

▫ all possible transformations described in the lexicon-grammar table

• Parameterised FSA do not recognise:

▫ complex structures of syntactic terms

A detailed description of noun phrase structure is therefore needed.

Noun phrase description

• Based on the approach of Ramshaw & Marcus (1995) [5]:

▫ Base noun phrases: non-recursive noun phrases, i.e. noun phrases that contain no nested noun phrases

Αποστείρωσε το μικρό μπουκάλι (He sterilised the small bottle)

▫ Maximal-length noun phrases: base noun phrases modified by other base noun phrases

Πλένω την κούκλα και τα ρούχα της (I am washing the doll and its clothes)

Noun phrase description

• Structures recognised:

▫ all base noun phrases (nouns, pronouns, nominals)

Άρπαξε το ποτήρι (She grabbed the glass)

▫ maximal-length noun phrases using the genitive case

Οι εργάτες έκαψαν το σπίτι του εργοδότη τους (The workers burnt the house of their employer) (Lit. transl.) (The workers burnt their employer's house)

▫ maximal-length noun phrases with coordination

Πληκτρολόγησε το βιογραφικό της και τη συνοδευτική επιστολή (She typed her CV and the cover letter)
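Since a regular expression over part-of-speech tags defines a finite-state automaton, the three structures above can be approximated compactly. A minimal sketch, assuming a hypothetical one-character tag set; Python's `re` module stands in for the Unitex chunking graphs, so this only illustrates the base versus maximal-length distinction and is not the authors' set of 382 graphs:

```python
import re

# Hypothetical one-character tag set (an assumption for this sketch):
#   D determiner, A adjective, N noun (non-genitive), P pronoun,
#   g genitive determiner, n genitive noun, p genitive clitic pronoun,
#   C coordinating conjunction, V verb
BASE = r"(?:D?A*Np?|P)"      # non-genitive base NP, e.g. "το μικρό μπουκάλι"
GEN = r"(?:g?A*np?)"         # genitive base NP, e.g. "του εργοδότη τους"
UNIT = rf"{BASE}{GEN}*"      # base NP followed by any genitive modifiers
MAXIMAL = re.compile(rf"{UNIT}(?:C{UNIT})*")  # units joined by coordination
BASE_ANY = re.compile(rf"{BASE}|{GEN}")       # every base NP, genitive or not


def chunks(tokens, pattern):
    """Return the word sequences matched by `pattern` over the tag string."""
    tags = "".join(tag for _, tag in tokens)  # one character per token
    words = [word for word, _ in tokens]
    return [" ".join(words[m.start():m.end()]) for m in pattern.finditer(tags)]


sentence = [("Οι", "D"), ("εργάτες", "N"), ("έκαψαν", "V"), ("το", "D"),
            ("σπίτι", "N"), ("του", "g"), ("εργοδότη", "n"), ("τους", "p")]

print(chunks(sentence, BASE_ANY))  # three separate base NPs
print(chunks(sentence, MAXIMAL))   # the genitive NP is absorbed by its head
```

With one character per token, match offsets in the tag string map directly onto token positions, which keeps the sketch short; a real tag set would need a less compact encoding.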

Noun phrase description

• In NLP, noun phrase description can be considered equivalent to noun phrase chunking (Abney 1991 [6]; Voutilainen 1993 [7]; Tjong Kim Sang 2000 [8]; Bai, Li, Kim & Lee 2006 [9])

• The description is implemented with FSA (chunking graphs) (Brill 1993 [10]; Roche 1993 [11]; Abney 1996 [12]; Blanc et al. 2007 [13]; Mokrane et al. 2008 [14])

• The FSA were created with the Unitex platform (Paumier 2003 [4])

Chunking FSA

Sample concordances

[…] και αγγίξαμε [όλες τις επιφάνειες των σχημάτων]NP1

([…] we touched [all the surfaces of the shapes]NP1) (Lit. transl.)

([…] we touched [all the shapes' surfaces]NP1)

Κάθε μέρα πήγαινε εκεί, άνοιγε [την κάνουλα]NP1 […]

(Every day he went there and opened [the faucet]NP1 […])

Έτρωγα [το πρωινό και το μεσημεριανό]NP1 που’φερνε […]

(I ate [the breakfast and the lunch]NP1 […]) (Lit. transl.)

(I ate [breakfast and lunch]NP1 […])
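The concordances combine the two resources: a verb listed in the lexicon-grammar table, followed by a noun phrase recognised by the chunking graphs, which is labelled NP1. A rough sketch of that combination, assuming a hypothetical set of verb lemmas and hypothetical one-character tags (including `Q` for a quantifier such as όλες); regular expressions again stand in for the Unitex graphs:

```python
import re

# Hypothetical subset of verb lemmas from the CDO lexicon-grammar table.
CDO_VERBS = {"αγγίζω", "ανοίγω", "τρώω"}

# Hypothetical one-character tags: Q quantifier, D determiner, A adjective,
# N noun, p genitive clitic, g genitive determiner, n genitive noun,
# C coordinating conjunction, V verb.
UNIT = r"D?A*Np?(?:g?A*np?)*"
NP = re.compile(rf"Q?{UNIT}(?:C{UNIT})*")


def mark_np1(tokens):
    """Bracket the noun phrase that directly follows a CDO verb as NP1."""
    tags = "".join(tag for _, _, tag in tokens)
    words = [word for word, _, _ in tokens]
    for i, (_, lemma, tag) in enumerate(tokens):
        if tag == "V" and lemma in CDO_VERBS:
            m = NP.match(tags, i + 1)  # the NP must start right after the verb
            if m:
                words[m.start()] = "[" + words[m.start()]
                words[m.end() - 1] += "]NP1"
    return " ".join(words)


sentence = [("και", "και", "C"), ("αγγίξαμε", "αγγίζω", "V"),
            ("όλες", "όλος", "Q"), ("τις", "ο", "D"),
            ("επιφάνειες", "επιφάνεια", "N"), ("των", "ο", "g"),
            ("σχημάτων", "σχήμα", "n")]
print(mark_np1(sentence))
```

A verb outside the hypothetical lemma set (such as φέρνω in the third concordance) is simply skipped, so only objects of verbs listed in the table are labelled.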

In the context of CLIL

The corpus used comprises the following thematic units: (i) economics, (ii) book presentation, (iii) visual arts, (iv) arts and culture, (v) biology, (vi) history, (vii) translated Ancient Greek texts, (viii) physics, (ix) chemistry, (x) economic theory, (xi) Modern Greek learning, (xii) religion, (xiii) mathematics,

and the following text types: (i) sports news, (ii) reportage, (iii) gastronomy, (iv) interviews, (v) advertisements, (vi) science news, (vii) artistic reviews, (viii) biographies, (ix) curricula vitae, (x) short stories, (xi) stories, (xii) forecasts, (xiii) dialogues, (xiv) research, (xv) e-mail.

In the context of CEFR

The proposed method could serve as a complementary tool for learners at levels B2, C1 and C2, since at:

• B2 level: occasional structural oversights are allowed,

• C1 level: learners are expected to use complex structures with only a few structural errors,

• C2 level: correct use of complex structures is strongly required.

In sample tests and past papers (from the last five years) of the Certificate of Attainment in Modern Greek (Center for the Greek Language), we observed activities that require the correct word order.

Perspectives

• Enrichment of the existing corpora

• Improvement of the FSA to eliminate ambiguities

• Syntactic tagging of part of the corpus, in order to evaluate the results obtained (in terms of recall and precision)

• Recognition of other types of direct objects (e.g. 'human object', 'body part object', etc.)
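Once part of the corpus is syntactically tagged, recall and precision can be computed by comparing the NP1 spans produced by the graphs with the manually tagged ones: precision = correct / system spans, recall = correct / gold spans. A minimal sketch, assuming exact span matching and spans represented as hypothetical (sentence_id, start, end) tuples; the spans below are toy data, not results:

```python
def precision_recall(system, gold):
    """Exact-match precision and recall over sets of spans."""
    system, gold = set(system), set(gold)
    correct = len(system & gold)  # spans found with exactly the right limits
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy spans: (sentence_id, start_token, end_token), end exclusive.
gold = {(1, 2, 7), (2, 5, 7), (3, 1, 6)}
system = {(1, 2, 7), (2, 5, 6), (3, 1, 6), (4, 0, 2)}

p, r = precision_recall(system, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four proposed spans are exact matches, giving a precision of 0.50 and a recall of 0.67; a truncated span such as (2, 5, 6) counts as an error under exact matching, and a partial-match scheme would score it differently.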

References

[1] Harris, Z. S. (1968). Mathematical Structures of Language. New York: Wiley.

[2] Gross, M. (1975). Méthodes en syntaxe. Régime des constructions complétives. Paris : Hermann.

[3] Hopcroft, J. E., Motwani, R., & Ullman, J. D. (2006). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley.

[4] Paumier, S. (2003). De la reconnaissance de formes linguistiques à l'analyse syntaxique. Thèse de doctorat, Paris, Université de Marne-la-Vallée.

[5] Ramshaw, L. A. & Marcus, M. P. (1995). "Text Chunking using Transformation-Based Learning". ACL Third Workshop on Very Large Corpora, pp. 82-94.

[6] Abney, S. (1991). Parsing by Chunks. In R. Berwick, S. Abney & C. Tenny (Eds.), Principle-Based Parsing. Dordrecht: Kluwer Academic Publishers.

[7] Voutilainen, A. (1993). NPTool, a detector of English noun phrases. Proceedings of the Workshop on Very Large Corpora, ACL, pp. 48-57.

[8] Tjong Kim Sang, E. F. (2000). "Noun Phrase Recognition by System Combination". Proceedings of the 1st North American chapter of the Association for Computational Linguistics Conference (pp. 50-55). Stroudsburg, PA: Association for Computational Linguistics.

[9] Bai, X.-M., Li, J.-J., Kim, D.-I. & Lee, J.-H. (2006). "Identification of Maximal-Length Noun Phrases Based on Expanded Chunks and Classified Punctuations in Chinese". 21st International Conference on the Computer Processing of Oriental Languages 2006, pp. 268-276. Berlin: Springer-Verlag.

[10] Brill, E. (1993). A Corpus-Based Approach to Language Learning. University of Pennsylvania.

[11] Roche, E. (1993). Analyse syntaxique transformationnelle du français par transducteurs et lexique-grammaire. Université Paris 7.

[12] Abney, S. (1996). Chunk stylebook. Technical report, SfS, University of Tübingen.

[13] Blanc, O., Constant, M., & Watrin, P. (2007). "Segmentation en super-chunks." Actes de TALN 2007. Toulouse: ATALA.

[14] Mokrane, A., Friburger, N., & Antoine, J.-Y. (2008). "Cascades de transducteurs pour le chunking de la parole conversationnelle : l'utilisation de la plateforme CasSys dans le projet EPAC." TALN 2008. Avignon.