discourse annotation for arabic 3

Post on 11-Jun-2015


Discourse Annotation for Arabic

Imam University

College of Computer and Information Systems

Prepared by: Al-harbi.A, Al-Gumlas.H, Al-Otaibi.E

• Introduction: Discourse usually refers to a form of written text or spoken language.

A text is not only a sequence of sentences or clauses; it is a coherent object with many cohesive devices linking its units (words, clauses and sentences).

• Discourse Relations

There are two types of discourse relations:

(i) Relations that are signalled explicitly via so-called discourse connectives.

(ii) Relations that can be inferred from the context without any explicit signal.

Discourse relations are semantic relations.


• Discourse Connectives Types:

. Simple Connectives. Ex: لأن (because), بعدما (after), و (and)

. Paired Connectives. Ex: إذا ... ف (if ... then), بالرغم ... (although), ما لبث ... حتى

. Modified Connectives. Ex: حتى لو (even if), و أيضا (and also)

. Combined Connectives. Ex: ولكن (and but), إلا بعد (except after)
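The four connective types can be pictured as a small typed lexicon. The sketch below is only an illustration: the names `ConnectiveType`, `Connective`, `LEXICON` and `glosses_by_type` are hypothetical, and the entries are limited to examples from this slide whose Arabic form and English gloss are both legible.

```python
from dataclasses import dataclass
from enum import Enum


class ConnectiveType(Enum):
    SIMPLE = "simple"
    PAIRED = "paired"
    MODIFIED = "modified"
    COMBINED = "combined"


@dataclass(frozen=True)
class Connective:
    parts: tuple   # one surface form, or two for a paired connective
    gloss: str     # English gloss as given on the slide
    ctype: ConnectiveType


# Hypothetical mini-lexicon, not a complete list of Arabic connectives.
LEXICON = [
    Connective(("لأن",), "because", ConnectiveType.SIMPLE),
    Connective(("بعدما",), "after", ConnectiveType.SIMPLE),
    Connective(("و",), "and", ConnectiveType.SIMPLE),
    Connective(("إذا", "ف"), "if ... then", ConnectiveType.PAIRED),
    Connective(("حتى لو",), "even if", ConnectiveType.MODIFIED),
    Connective(("و أيضا",), "and also", ConnectiveType.MODIFIED),
    Connective(("ولكن",), "and but", ConnectiveType.COMBINED),
    Connective(("إلا بعد",), "except after", ConnectiveType.COMBINED),
]


def glosses_by_type(ctype):
    """Return the English glosses of all lexicon entries of one type."""
    return [c.gloss for c in LEXICON if c.ctype is ctype]


print(glosses_by_type(ConnectiveType.MODIFIED))
```

Storing `parts` as a tuple lets a paired connective keep both halves together while the other types use a single element.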

Related Work:

. Several textual corpora of Arabic exist.

. Some of them are available with part-of-speech and syntactic annotation, such as the Arabic Treebank (ATB).

. The Prague Arabic Dependency Treebank (PADT), which is smaller in scale than the ATB, contains multilevel annotations, including morphological and analytical levels of linguistic representation.

. Also, a recent effort by Dukes and Habash (2010) has produced the Quranic Arabic Corpus, a free annotated linguistic resource which provides morphological annotation and syntactic analysis of the Holy Quran.

• Collecting Arabic Connectives

. A large set of Arabic discourse connectives was collected using text analysis and corpus-based techniques.

Example:

A. [السيارة متطورة جدا.] Arg1 [لكنها] DC [باهظة الثمن] Arg2

B. [al-syārh mtṭwrh ǧdān.] Arg1 [lknhā] DC [bāhẓah alṯmn] Arg2

C. [The car is so modern.] Arg1 [but] DC [it is too expensive] Arg2.
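The bracketed notation in the example can be read mechanically. The following is a minimal sketch, assuming each segment is written as `[text]` followed by one of the labels `Arg1`, `DC` or `Arg2`; the function name `parse_bracketed` and the regular expression are illustrative and not part of any annotation tool mentioned in these slides.

```python
import re

# A segment is "[...]" followed by its label (assumed notation).
SEGMENT = re.compile(r"\[([^\]]*)\]\s*(Arg1|Arg2|DC)")


def parse_bracketed(line):
    """Map each label (Arg1, DC, Arg2) to the text inside its brackets."""
    return {label: text.strip() for text, label in SEGMENT.findall(line)}


example = "[The car is so modern.] Arg1 [but] DC [it is too expensive] Arg2."
parsed = parse_bracketed(example)
print(parsed["DC"])
```

The same function works on the Arabic and transliterated lines, since it only relies on the brackets and the labels.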

• Annotation Scheme

. Annotation is based on lexicalized grammar theory.

1. The anchor of the annotation is the lexical item – a discourse connective (DC).

2. The Arg2 label is assigned to the argument with which the connective was syntactically associated.

3. The Arg1 label can refer to an abstract object at any distance from the connective.
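The three points above suggest a simple record: one connective span as the anchor, plus two argument spans. The sketch below stores character offsets; `Span`, `ConnectiveAnnotation` and `span_of` are hypothetical names chosen for illustration, and the sample sentence is the English gloss of the car example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset

    def text(self, source):
        return source[self.start:self.end]


@dataclass(frozen=True)
class ConnectiveAnnotation:
    connective: Span  # the anchor of the annotation (DC)
    arg1: Span        # may lie at any distance from the connective
    arg2: Span        # the argument syntactically associated with the DC


def span_of(source, fragment):
    """Locate the first occurrence of fragment and return it as a Span."""
    start = source.index(fragment)
    return Span(start, start + len(fragment))


text = "The car is so modern. But it is too expensive."
ann = ConnectiveAnnotation(
    connective=span_of(text, "But"),
    arg1=span_of(text, "The car is so modern."),
    arg2=span_of(text, "it is too expensive"),
)
print(ann.connective.text(text), "|", ann.arg2.text(text))
```

Offsets rather than copied strings keep the annotation tied to the source text, so an Arg1 several sentences away from the connective is represented the same way as an adjacent one.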

Theories of Discourse Structure

. Linguists have attempted to produce reasonably general theories to represent discourse structure.

. Theories of discourse structure differ in their focus according to the type of discourse (such as written text or dialogue) and the type of organization (such as intentional organization, i.e. the speaker’s plan).

. One of the most popular discourse theories is Rhetorical Structure Theory (RST).

RST

. RST is a theory of how coherence in text is achieved.

. RST was originally developed as part of studies of computer-based text generation.

. RST is designed to explain the coherence of texts, seen as a kind of function, linking parts of a text to each other.

RST Relation Name | Nucleus | Satellite

Background | text whose understanding is being facilitated | text for facilitating understanding

Elaboration | basic information | additional information

Preparation | text to be presented | text which prepares the reader to expect and interpret the text to be presented
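The nucleus and satellite roles in the table can be modelled as a small tree. This is a sketch under stated assumptions: the class `Relation`, the helper `nucleus_text` and the sample text units are hypothetical, and only the three relations from the table are used.

```python
from dataclasses import dataclass
from typing import Union

# Role descriptions copied from the table: (nucleus, satellite).
ROLES = {
    "Background": ("text whose understanding is being facilitated",
                   "text for facilitating understanding"),
    "Elaboration": ("basic information", "additional information"),
    "Preparation": ("text to be presented",
                    "text which prepares the reader to expect and "
                    "interpret the text to be presented"),
}


@dataclass
class Relation:
    name: str
    nucleus: Union["Relation", str]    # a sub-relation or a text unit
    satellite: Union["Relation", str]


def nucleus_text(node):
    """Follow nucleus links down to a leaf text unit."""
    while isinstance(node, Relation):
        node = node.nucleus
    return node


# Hypothetical three-unit text: a title, a main claim, and a detail.
tree = Relation(
    name="Preparation",
    nucleus=Relation(
        name="Elaboration",
        nucleus="The car is so modern.",
        satellite="It has a rear camera.",
    ),
    satellite="About the car",
)
print(nucleus_text(tree))
```

Following only nucleus links picks out the unit the rest of the text supports, which is one reason RST trees are used in summarization.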

RST Example

With just those relations, we can illustrate the analysis of a text.

• Applications

. Question-Answering and Information Extraction systems.

. Speech Recognition.

. Text Generation.

. Essay Scoring.

. Text Summarization.

Discourse Annotation Tool for English and Arabic

Thank you
