discourse annotation for arabic 3
Post on 11-Jun-2015
Discourse Annotation for Arabic
Imam University
College of Computer and Information Systems
Prepared by: Al-harbi.A, Al-Gumlas.H, Al-Otaibi.E
• Introduction
. Discourse usually refers to a form of written text or spoken language.
. A text is not merely a sequence of sentences or clauses; it is a coherent object whose units (words, clauses and sentences) are linked by many cohesive devices.
• Discourse Relations
. Discourse relations are semantic relations.
. There are two types of discourse relations:
(i) Relations that are signalled explicitly via so-called discourse connectives.
(ii) Relations that can be inferred from the context without any signalling.
• Discourse Connectives
Types:
. Simple connectives. Ex: لأن (because), بعدما (after), و (and).
. Paired connectives. Ex: اذا .. ف (if .. then), بالرغم .. (although), ما لبث .. حتى.
. Modified connectives. Ex: حتى لو (even if), و أيضا (and also).
. Combined connectives. Ex: ولكن (and but), الا بعد (except after).
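The four connective types can be pictured as a small lookup table. The sketch below is illustrative only: the name `CONNECTIVE_LEXICON`, the function `connective_type`, and the choice to store a paired connective as a tuple of its two parts are assumptions, and the entries are limited to the glossed examples on the slide (with the Arabic spellings as read from it).

```python
# Hypothetical mini-lexicon: entries and glosses come only from the slide examples.
CONNECTIVE_LEXICON = {
    "simple": {"لأن": "because", "بعدما": "after", "و": "and"},
    # A paired connective is stored as a tuple of its two parts (an assumption).
    "paired": {("اذا", "ف"): "if .. then"},
    "modified": {"حتى لو": "even if", "و أيضا": "and also"},
    "combined": {"ولكن": "and but", "الا بعد": "except after"},
}


def connective_type(surface):
    """Return the type label of a connective, or None if it is not in the lexicon."""
    for type_name, entries in CONNECTIVE_LEXICON.items():
        if surface in entries:
            return type_name
    return None


def gloss(surface):
    """Return the English gloss of a connective, or None if it is not in the lexicon."""
    for entries in CONNECTIVE_LEXICON.values():
        if surface in entries:
            return entries[surface]
    return None
```

A real lexicon would be far larger and would have to deal with clitic attachment and ambiguity, which a plain dictionary lookup like this one ignores.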
Related Work:
. Several textual corpora of Arabic exist.
. Some of them are available with part-of-speech and syntactic annotation, such as the Arabic Treebank (ATB).
. The Prague Arabic Dependency Treebank (PADT), which is smaller in scale than the ATB, contains multilevel annotations, including morphological and analytical levels of linguistic representation.
. Also, a recent effort by Dukes and Habash (2010) has produced the Quranic Arabic Corpus, a free annotated linguistic resource which provides morphological annotation and syntactic analysis of the Holy Quran.
• Collecting Arabic Connectives
. A large set of Arabic discourse connectives was collected using text analysis and corpus-based techniques.
Example:
A. [السيارة متطورة جدا.] Arg1 [لكنها] DC [باهظة الثمن] Arg2
B. [al-syārh mtṭwrh ǧdān.] Arg1 [lknhā] DC [bāhẓah alṯmn] Arg2
C. [The car is so modern.] Arg1 [but] DC [it is too expensive] Arg2
• Annotation Scheme
. Annotation is based on lexicalized grammar theory.
1. The anchor of the annotation is the lexical item – a discourse connective (DC).
2. The Arg2 label is assigned to the argument with which the connective is syntactically associated.
3. The Arg1 label can refer to an abstract object at any distance from the connective.
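The three points above can be sketched as a small record anchored on the connective. This is only an illustration: the class names `Span` and `ConnectiveAnnotation`, the helper `find_span`, and the use of character offsets are assumptions, not the format of any actual annotation tool. The sample text is the English rendering of the car example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset


@dataclass(frozen=True)
class ConnectiveAnnotation:
    connective: Span  # the anchor: the discourse connective (DC)
    arg2: Span        # the argument syntactically associated with the DC
    arg1: Span        # the other argument; may lie at any distance from the DC

    def texts(self, text):
        """Return the labelled substrings of `text` covered by this annotation."""
        def cut(span):
            return text[span.start:span.end]
        return {"Arg1": cut(self.arg1), "DC": cut(self.connective), "Arg2": cut(self.arg2)}


def find_span(text, piece, start=0):
    """Locate `piece` in `text` at or after `start` (raises ValueError if absent)."""
    index = text.index(piece, start)
    return Span(index, index + len(piece))


text = "The car is so modern. but it is too expensive"
arg1 = find_span(text, "The car is so modern.")
dc = find_span(text, "but", arg1.end)
arg2 = find_span(text, "it is too expensive", dc.end)
annotation = ConnectiveAnnotation(connective=dc, arg2=arg2, arg1=arg1)
print(annotation.texts(text))
```

Storing offsets rather than copied strings keeps Arg1 free to point anywhere in the text, which matches the statement that it can be at any distance from the connective.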
Theories of Discourse Structure
. Linguists have attempted to produce reasonable, generalized theories to represent discourse structure.
. Theories of discourse structure differ in their focus according to the type of discourse, such as written text or dialogue, and the type of organization, such as intentional organization (the speaker's plan).
. One of the most popular discourse theories is:
Rhetorical Structure Theory (RST)
. RST is a theory of how coherence in text is achieved.
. RST was originally developed as part of studies of computer-based text generation.
. RST is designed to explain the coherence of texts, seen as a kind of function linking parts of a text to each other.
RST Relations
Relation name: Nucleus / Satellite
. Background: text whose understanding is being facilitated / text for facilitating understanding
. Elaboration: basic information / additional information
. Preparation: text to be presented / text which prepares the reader to expect and interpret the text to be presented
RST Example
. With just those relations, we can illustrate the analysis of a text.
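A minimal sketch of how a nucleus-satellite analysis could be held in code, assuming a made-up three-segment text. The class names `Segment` and `Relation`, the function `nucleus_text`, and the sample sentences are all hypothetical; only the relation names Preparation and Elaboration come from the slides.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Segment:
    text: str  # an elementary unit of the text


@dataclass
class Relation:
    name: str                              # e.g. "Background", "Elaboration", "Preparation"
    nucleus: Union["Relation", Segment]    # the more central part
    satellite: Union["Relation", Segment]  # the supporting part


def nucleus_text(node):
    """Follow nucleus links down to a segment and return its text."""
    while isinstance(node, Relation):
        node = node.nucleus
    return node.text


# Hypothetical example: a title prepares the reader for a body whose second
# sentence elaborates on the first.
tree = Relation(
    name="Preparation",
    nucleus=Relation(
        name="Elaboration",
        nucleus=Segment("Arabic has many discourse connectives."),
        satellite=Segment("Some are single words and others are paired."),
    ),
    satellite=Segment("Discourse connectives"),
)
print(nucleus_text(tree))
```

Following only the nucleus links reaches the most central segment of the text, which is one reason this kind of structure is used in summarization.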
• Applications
. Question-Answering and Information Extraction systems.
. Speech Recognition.
. Text Generation.
. Essay Scoring.
. Text Summarization.
Discourse Annotation Tool for English and Arabic
Thank you