Pragmatic Annotation & Analysis in DART

Pragmatic Annotation & Analysis in DART Martin Weisser School of English & Education Guangdong University of Foreign Studies [email protected] martinweisser.org

Post on 22-Feb-2016

TRANSCRIPT

Page 1: Pragmatic Annotation & Analysis in DART

Pragmatic Annotation & Analysis in DART

Martin Weisser

School of English & Education

Guangdong University of Foreign Studies

[email protected]

martinweisser.org

Page 2: Pragmatic Annotation & Analysis in DART

Outline

• Getting DART
• Design Background
• DART Annotation Scheme
• Basic Automated Annotation
• Speech-Act Analysis
• N-Gram Analysis
• Creating & Editing Resources

Page 3: Pragmatic Annotation & Analysis in DART

Getting DART

• go to http://martinweisser.org/ling_soft.html#DART

• download & run installer (currently 64-bit Windows only)

Page 4: Pragmatic Annotation & Analysis in DART

Design Background (1)

• 1997–1998: Expert Advisory Group on Language Engineering Standards (EAGLES) WP4
– guidelines for the representation and annotation of dialogue
• 2001–2002: SPAAC (A Speech-Act Annotated Corpus of Dialogues) Project
– annotation of some 1,200 task-oriented dialogue files (Trainline + BT)
– need to annotate and post-edit corpus efficiently and consistently on multiple levels → SPAACy

Page 5: Pragmatic Annotation & Analysis in DART

Design Background (2)

colour coding helps to identify syntactic patterns

post-processing constrained through fixed options

resources loaded automatically

Page 6: Pragmatic Annotation & Analysis in DART

Design Background (3)

• flaws in SPAACy
– monolithic, i.e. no separation of ‘linguistic intelligence’ & output display → hard to improve linguistic analysis
– processing & editing of single files only
– other interface issues, e.g. too many buttons, etc.

→ development of DART
– modularisation
– strict separation of processing and linguistic analysis routines
– enhanced options for analysis and creation of resources

Page 7: Pragmatic Annotation & Analysis in DART

DART Annotation Scheme (1) – Basic Input Format

optional stylesheet reference

text with optional punctuation ‘tags’ or embedded comments

basic skeleton can be created via ‘File→New’ (Ctrl + n)

Page 8: Pragmatic Annotation & Analysis in DART

DART Annotation Scheme (2) – Output Format

syntactic category

mode = semantico-pragmatic markers/‘IFIDs’

topic = semantic info

(surface) polarity

speech act(s)

speech act generally inferred from combination of syntax + mode

Page 9: Pragmatic Annotation & Analysis in DART

Basic Automated Annotation

input files workspace

output files workspace

to load single file, press Ctrl + a (for whole directory, Ctrl + d)

single file loaded; to pre-edit, click hyperlink; to annotate pragmatically, press Ctrl + a

debugging output; ignore if annotation completes successfully

single file processed; to post-edit, click hyperlink

Page 10: Pragmatic Annotation & Analysis in DART

Speech-Act Analysis

• generate frequency list of syntactic category + speech act(s) from ‘Analysis→Speech act stats’

• click hyperlinked speech act (combination) to prime concordancer

• investigate results
• if necessary, correct speech act tag by clicking the hyperlink to the file and editing it

Page 11: Pragmatic Annotation & Analysis in DART

N-Gram Analysis

• useful for determining formulaic expressions for modes or topic patterns (or in general)

• predefined options for uni- to tri-grams
• optionally also freely definable n-grams
• frequency lists display abs. & rel. frequencies
• hyperlink again primes concordancer
– for all n>1 with interpolated optional fillers
– due to accommodating mixed-case data, sometimes ‘case insensitive’ flag required
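The n-gram workflow above can be illustrated with a minimal Python sketch. This is not DART's own code: the function names `ngram_frequencies` and `concordance_pattern`, the whitespace tokenisation, and the way optional fillers are interpolated (up to `max_fillers` arbitrary words between adjacent n-gram words) are all assumptions made for illustration.

```python
import re
from collections import Counter


def ngram_frequencies(tokens, n, case_insensitive=False):
    """Return {ngram: (absolute, relative)} for all n-grams of length n.

    Hypothetical helper; relative frequency is the share of all n-grams.
    """
    if case_insensitive:
        tokens = [t.lower() for t in tokens]
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    counts = Counter(ngrams)
    total = len(ngrams)
    # most_common() orders entries by descending absolute frequency
    return {gram: (c, c / total) for gram, c in counts.most_common()}


def concordance_pattern(ngram, max_fillers=1, case_insensitive=False):
    """Build a regex that finds the n-gram's words in order, allowing up to
    max_fillers extra words between each adjacent pair (an assumed reading
    of 'interpolated optional fillers')."""
    words = [re.escape(w) for w in ngram.split()]
    filler = r"(?:\s+\S+){0,%d}\s+" % max_fillers
    flags = re.IGNORECASE if case_insensitive else 0
    return re.compile(r"\b" + filler.join(words) + r"\b", flags)


if __name__ == "__main__":
    tokens = "I would like to book I would like".split()
    for gram, (abs_f, rel_f) in ngram_frequencies(tokens, 2).items():
        print(f"{gram}\t{abs_f}\t{rel_f:.3f}")
```

The `case_insensitive` switch mirrors the flag mentioned on the slide: with mixed-case data, a search primed from a lower-cased n-gram only finds capitalised occurrences if matching ignores case.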

Page 12: Pragmatic Annotation & Analysis in DART

Creating & Editing Resources (1)

• mostly done via ‘Edit resources’ menu…
• … apart from creating new files
• to create new corpus
– choose ‘Edit configuration’
– click ‘Add corpus entry’
– fill in corpus, lexicon, and topic file name (usually identical, apart from extension)
– click ‘Save configuration’
• new resources created
– data folder for corpus
– three subfolders: ‘info’, ‘notes’, and ‘stats’
– dummy lexicon & topics files (in relevant program folders)

Page 13: Pragmatic Annotation & Analysis in DART

Creating & Editing Resources (2)

• existing resources can be edited…
– generally via relevant entry in the ‘Edit resources’ menu
– lexica & topic files via hyperlinks in configuration editor
• safest to edit only dialogue, lexica & topic files…
• … unless you really know what you’re doing
• lexica can also be ‘synthesised’ from corpus data

Page 14: Pragmatic Annotation & Analysis in DART

Creating & Editing Resources (3) – Lexica

• very simple format
– word (base form) + space + tag + optional comment (preceded by #)
– special DART tagset
• allows for lexical polysemy
– uppercase tag name = unambiguous
– lowercase tag name = predominantly tag X
• tooltips on tag buttons provide explanations while editing
• synthesising lexicon works by
– creating word list from corpus
– ‘subtracting’ items from general lexicon
– suggesting possible candidates after morphological analysis
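As a rough illustration of the lexicon format and the ‘subtraction’ step, here is a minimal Python sketch. It is not DART's implementation: the function names are hypothetical, the tag names `NOUN` and `verb` are placeholders rather than members of the DART tagset, and the morphological analysis step is left out entirely.

```python
def parse_lexicon(lines):
    """Parse 'word tag # optional comment' lines into
    {word: (tag, unambiguous)}.

    An uppercase tag is read as unambiguous, a lowercase tag as
    'predominantly this tag', following the convention on the slide.
    """
    lexicon = {}
    for raw in lines:
        entry = raw.partition("#")[0].strip()  # drop optional comment
        if not entry:
            continue  # blank or comment-only line
        parts = entry.split()
        if len(parts) != 2:
            raise ValueError(f"malformed lexicon line: {raw!r}")
        word, tag = parts
        lexicon[word] = (tag, tag.isupper())
    return lexicon


def candidate_words(corpus_tokens, general_lexicon):
    """Word list from the corpus minus items already in the lexicon.

    Lower-casing tokens is an assumption; no morphological analysis here.
    """
    wordlist = sorted({t.lower() for t in corpus_tokens})
    return [w for w in wordlist if w not in general_lexicon]


if __name__ == "__main__":
    sample = [
        "ticket NOUN # placeholder tag, not from the DART tagset",
        "book verb",
        "# comment-only line",
    ]
    lex = parse_lexicon(sample)
    print(lex)
    print(candidate_words(["Book", "a", "ticket", "tomorrow"], lex))
```

Raising an error on malformed lines, rather than skipping them silently, is a design choice for this sketch: in a hand-edited resource file, a silently dropped entry is harder to notice than a failed load.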

Page 15: Pragmatic Annotation & Analysis in DART

Creating & Editing Resources (4) – Topic Files

• syntax more complex than for lexica
• combination of topic labels, space, double colon, space, associated (representative) patterns
• patterns expressed as
– regexes
– individual sub-patterns separated by 3 underscores
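The topic file syntax described above can be read with a few lines of Python. This is a sketch under stated assumptions, not DART's own parser: the function names and the sample entry are invented, Python's regex dialect may differ from the one DART uses, and treating the sub-patterns as alternatives (any one match assigns the topic) is a guess, since the slide does not say how they combine.

```python
import re


def parse_topic_line(line):
    """Split 'label :: sub1___sub2' into (label, [compiled sub-patterns])."""
    label, sep, patterns = line.partition(" :: ")
    if not sep:
        raise ValueError(f"no ' :: ' separator in: {line!r}")
    # sub-patterns are separated by exactly three underscores
    subs = [re.compile(p) for p in patterns.strip().split("___") if p]
    return label.strip(), subs


def topics_for(text, topic_table):
    """Return labels whose sub-patterns match the text.

    Assumption: sub-patterns are alternatives, so one match is enough.
    """
    return [label for label, subs in topic_table
            if any(p.search(text) for p in subs)]


if __name__ == "__main__":
    # Hypothetical entry, not taken from a real DART topic file
    entry = r"time :: \b(?:morning|afternoon)\b___\bo'clock\b"
    table = [parse_topic_line(entry)]
    print(topics_for("at five o'clock please", table))
```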