Discourse Level Software Current Status and Future Directions Nov. 16, 2004 Lars Huttar ([email protected]) Knowledge Management Services


Discourse Level Software

Current Status and Future Directions

Nov. 16, 2004

Lars Huttar ([email protected])

Knowledge Management Services

Abstract (I)

• Discourse analysis (DA, a.k.a. textlinguistics) is a task frequently cited as needing computer-assisted tools.

• Some tools are currently available for certain tasks, but as yet there are no user-ready applications specifically for the discourse charting commonly used in the field.

Abstract (II)

• This presentation will review a few of the existing tools most pertinent to DA on the field, and software that is planned or under development.

• I will also mention the conceptual model for constituent charting described in my thesis, which uses XML encoding of text and analysis, from which a chart is rendered via XSL.

Overview

• The need for discourse analysis software

• What’s already out there?

• What’s coming down the pike?

Need for Discourse Software

The task:

• Help the user produce charts, diagrams, and summaries of texts in such a way as to facilitate discovery of discourse patterns and to expedite testing of hypotheses.

Major features desired

• Import (interlinear) text

• Segment and move pieces into chart columns

• Mark genre(s)

• Configurable auto-highlighting, e.g. color by POS.

• Toggle highlighting of certain features

• Manual annotation of features incl. coherence and prominence

• Search text, IT, and annotations

• Chart/summary of results, hyperlinked to data

• Accessible to MTTs/OTTs

» Geoffrey Hunt

» Kent Spielmann

Example constituent chart

Current Practice

• Pencil & paper

• MS Word

• MS Excel

• A few brave souls use other tools

The Right Tools?

Specialized tools could make it quicker and easier!

How to Address the Need?

• Use existing software

• SIL FieldWorks DA tool(s)

• Extend existing tools?

What’s already here?

• MDA

• BART

• RSTTool

• MATE

• CiCaDA

Multilinear Discourse Analysis

• Generate statistics and diagrams relating to span analysis, topic continuity statistics, and other issues

• Input is an SFM-marked-up text (e.g. from Shoebox)

• In Beta 2

• More info: [email protected]
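
As a rough illustration of what SFM (Standard Format Marker) input looks like to a program, here is a minimal Python sketch that splits backslash-marked lines into fields and groups them into records. It is not MDA's code. The marker names (`\ref`, `\tx`, `\ft`) and the sample text are assumptions for illustration; each Shoebox project defines its own markers.

```python
# Minimal sketch: read SFM-marked text into records.
# Marker names below are illustrative assumptions, not a fixed standard.

def parse_sfm(text):
    """Split SFM text into a list of (marker, value) pairs."""
    fields = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            # "\tx some words" -> marker "tx", value "some words"
            marker, _, value = line[1:].partition(" ")
            fields.append((marker, value.strip()))
        elif fields:
            # A line without a marker continues the previous field.
            marker, value = fields[-1]
            fields[-1] = (marker, (value + " " + line).strip())
    return fields


def group_records(fields, record_marker="ref"):
    """Start a new record at each record marker; collect values per marker."""
    records = []
    for marker, value in fields:
        if marker == record_marker or not records:
            records.append({})
        records[-1].setdefault(marker, []).append(value)
    return records


SAMPLE = r"""\ref Story 001
\tx The man went
home
\ft The man went home.

\ref Story 002
\tx He ate
"""

if __name__ == "__main__":
    for record in group_records(parse_sfm(SAMPLE)):
        print(record)
```

Once the text is in records like these, span or topic-continuity counts become ordinary loops over the list.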

Biblical Analysis Research Tool

• BART has features supporting discourse analysis of biblical texts

• Comes with extensive built-in morphosyntax markup; supports customizable tagging and complex queries.

• Only for biblical texts; can’t enter vernacular texts.

• Part of TW, or available from WordSearch Corp.

• www.sil.org/translation/bart.htm

RSTTool

• Lets user diagram relations between text “chunks.”

• Free download from http://www.wagsoft.com/RSTTOOL

• User can define own set of relations, schemas, etc. such as SSA or Longacre’s propositional relations.

• Can generate statistics based on the tree structures built by the user.

• File format is XML-based.

• Text can be edited even after structuring has begun.
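
To make "statistics based on the tree structures" concrete, here is a small Python sketch that counts relation labels in a user-built tree and measures its depth. The `Node` class, the relation names, and the sample tree are hypothetical; this does not read RSTTool's file format or reproduce its own statistics.

```python
# Minimal sketch: simple statistics over a relation tree.
# The Node structure is an assumption for illustration only.
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str = ""        # text of a leaf chunk; empty for a span
    relation: str = ""    # relation this node bears to its nucleus; empty for a nucleus
    children: list = field(default_factory=list)


def count_relations(node):
    """Count how often each relation label occurs in the tree."""
    counts = Counter()
    if node.relation:
        counts[node.relation] += 1
    for child in node.children:
        counts.update(count_relations(child))
    return counts


def depth(node):
    """Number of levels in the tree, counting the root as 1."""
    return 1 + max((depth(child) for child in node.children), default=0)


TREE = Node(children=[
    Node("Although he was tired,", relation="concession"),
    Node(children=[
        Node("he kept walking"),
        Node("because home was near.", relation="cause"),
    ]),
])

if __name__ == "__main__":
    print(dict(count_relations(TREE)))
    print("depth:", depth(TREE))
```

A user-defined relation set (for example SSA or Longacre's relations, as mentioned above) would only change the labels stored in `relation`.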

MATE Workbench

• Tool “to aid in the display, editing and querying of annotated speech corpora”

• Encodes data in XML and displays via XSL-like stylesheets; could be programmed to produce various displays.

• In “early demo” version (2001). Looks like it has potential, but I can’t get it to run on my machine.

• http://mate.nis.sdu.dk/

CiCaDA

• Produce fairly feature-complete constituent charts from XML data using XSLT stylesheets.

• Encode text, column assignments, and chart configuration in XML; chart is produced automatically.

• Open standards promote modification/reuse of data.

• There is no “application;” no user-friendly way to enter the XML data.
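
The idea of encoding text, column assignments, and chart configuration as data and generating the chart automatically can be sketched as follows. This sketch uses Python's standard `xml.etree.ElementTree` rather than XSLT, and the element and attribute names (`chart`, `columns`, `column`, `clause`, `seg`, `col`) are invented for illustration; they are not CiCaDA's actual schema.

```python
# Minimal sketch: render a plain-text constituent chart from XML.
# The XML vocabulary here is hypothetical, not CiCaDA's schema.
import xml.etree.ElementTree as ET

SAMPLE = """<chart>
  <columns>
    <column id="conn">Connective</column>
    <column id="subj">Subject</column>
    <column id="verb">Verb</column>
    <column id="obj">Object</column>
  </columns>
  <clause n="1"><seg col="subj">The man</seg><seg col="verb">saw</seg><seg col="obj">a dog</seg></clause>
  <clause n="2"><seg col="conn">Then</seg><seg col="subj">he</seg><seg col="verb">ran</seg></clause>
</chart>"""


def render_chart(xml_text):
    """Return the chart as text: one row per clause, one cell per column."""
    root = ET.fromstring(xml_text)
    # Chart configuration: column order and labels come from the data.
    columns = [(c.get("id"), c.text or "") for c in root.find("columns")]
    rows = [["#"] + [label for _, label in columns]]
    for clause in root.findall("clause"):
        # Column assignment: each segment names the column it belongs to.
        cells = {seg.get("col"): (seg.text or "") for seg in clause.findall("seg")}
        rows.append([clause.get("n", "")] + [cells.get(cid, "") for cid, _ in columns])
    widths = [max(len(row[i]) for row in rows) for i in range(len(rows[0]))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    )


if __name__ == "__main__":
    print(render_chart(SAMPLE))
```

Because the column list lives in the data, adding or reordering chart columns needs no change to the rendering code, which is the same separation a stylesheet-based approach aims for.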

Helps available

• LinguaLinks Library has several items, including:

• Analyzing Discourse: a Manual of Basic Concepts – Dooley & Levinsohn (avail. on the web as well as in LLL). Very practical.

Do you know of others?

• Please let me know if you are aware of other useful discourse-level software tools!

What’s coming?

• TCC

• AGTK

• FieldWorks DA tools

TCC

• “A tool for drawing syntax trees” – could also be used for discourse “chunking” and highlighting

• Looks very easy to use. Collapsible tree makes it easy to browse large text structures.

• Supports Latin-1 charset.

• Author taking feedback to make TCC more useful for SIL’s work.

• Still in beta. No release schedule.

• Info: http://ulrikp.org/

Annotation Graph ToolKit

• AGTK is a toolkit for annotating texts

• TreeTrans – edit syntactic trees; charting & chunking possible

• InterTrans – interlinearize text (very beta)

• Saves in an abstract XML format; potential good basis for “Lego” solution

• Not ready for end users.

SIL FieldWorks DA Tool(s)

• FW DA software is still on the drawing board but is a high priority.

• Would leverage the huge benefits of all the work that has gone into FieldWorks!

• FW tools already support interlinear text, text annotations/tagging and highlighting.

• Preliminary work has begun on design of constituent charting features.

• Wish list for DA features exists but requirements not yet prioritized. Guidance team has not yet been formed.

Conclusion

• There are some good tools already out there for certain tasks related to DA. Unfortunately, they don’t interoperate much, and there are no domain-aware applications for constituent charting.

• SIL FieldWorks tools, as they become available, should cover certain DA tasks well, such as constituent charting.

Questions?

Comments?