
©2012 Paula Matuszek

CSC 9010: Text Mining Applications

Fall, 2012

Introduction to GATE

Dr. Paula Matuszek

Paula.Matuszek@gmail.com

Taken partially from a presentation by Lin Lin. http://iwayan.info/Research/Interoperability/Tutor_Workshop/AmitShethGlobalInfInfrastucture/Presentation/GATE.ppt


What is GATE?

Stands for General Architecture for Text Engineering.

Developed at the University of Sheffield.

Component-based architecture with data separated from applications; many discrete capabilities are included as plugins.


Who Uses GATE?

Scientists performing experiments that involve processing human language

Developers developing applications with language processing components

Teachers and students of courses about language and language computation

Us :-)


How Can GATE Help?

Specify an architecture, or organizational structure, for language processing software.

Provide a framework that implements the architecture and can be used to embed language processing capabilities in applications.

Provide a development environment, built on top of the framework, made up of convenient tools for developing components (plugins).


Really? Yeah, really.

It’s been under development for 15 years and is still under very active development.

Open-source, with dozens of developers, some of whom have been involved since the beginning.

Active community that provides good support
– Mailing list: lists.sourceforge.net/lists/listinfo/gate-users
– Twitter: twitter.com/#!/GateAcUk
– LinkedIn: http://www.linkedin.com/groups/GATE-2230077

Many other text mining capabilities have been integrated with it.

An almost overwhelming amount of documentation


GATE Architecture Overview

http://gate.ac.uk/overview.html


GATE Product Family

GATE Developer: IDE for language processing, with information extraction and other plugins.

GATE Embedded: object library which can be included in applications

GATE Teamware: collaborative annotation environment

GATE Mimir: a “multiparadigm index” which supports semantic indexing and search

GATE Wiki: “controllable wiki” based on Grails and Subversion

GATE Cloud: GATE Embedded running on supercomputer hardware


GATE Components

We will deal primarily with GATE Developer. It has four components:

– Applications: groups of processes to be run on a document or corpus.

– LanguageResources (LRs): entities such as lexicons, documents, corpora, annotation schemas, ontologies.

– ProcessingResources (PRs): tools that operate on unstructured text, such as parsers and tokenizers. These are mostly plugins.

– DataStores: saved processed documents and resources.


Overview of GATE Developer

GATE Developer Resources Pane
– applications: groups of processes to run on a document or corpus
– language resources: corpus, ontologies, schemas
– processing resources: tools that operate on unstructured text
– datastores: saved documents and resources

Display Pane: whatever you’re currently working with.


Setup Options

Configuration
– Appearance: font, skin
– Advanced:
  – add space on markup (to make HTML and XML more readable)
  – Save options and session on exit
  – Insert append or prepend (for annotations)
  – default browser (for user guide)

Input (?)
– default language


Language Resources

Language Resources can be of four kinds:
– Documents are modeled as content plus annotations plus features.
– A Corpus is a Java Set whose members are Documents.
– Annotations are organized in graphs, which are modeled as Java sets of Annotation.
– Schemas are XML schemas describing allowable annotations and features
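
To make the "content plus annotations plus features" model concrete, here is a minimal Python sketch. It is an analogy only, not the GATE API: the class names, the `annotate` and `text_of` helpers, and the `Location` type are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """A typed span over the document content, with its own features."""
    start: int            # offset of the first character covered
    end: int              # offset just past the last character covered
    type: str             # illustrative type name, e.g. "Location"
    features: tuple = ()  # (name, value) pairs; a tuple keeps the annotation hashable


@dataclass(eq=False)      # identity-based hashing, so documents can live in a set
class Document:
    """Content plus a set of annotations plus document-level features."""
    content: str
    annotations: set = field(default_factory=set)
    features: dict = field(default_factory=dict)

    def annotate(self, start, end, type_, **features):
        ann = Annotation(start, end, type_, tuple(sorted(features.items())))
        self.annotations.add(ann)
        return ann

    def text_of(self, ann):
        return self.content[ann.start:ann.end]


doc = Document("GATE was developed at Sheffield.", features={"source": "example"})
loc = doc.annotate(22, 31, "Location", kind="city")
corpus = {doc}  # a corpus modeled as a set whose members are documents
```

Note that the annotation does not change the text; it only points at offsets 22 to 31, which is what lets several annotation sets coexist over the same content.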


Document Processing in GATE

Document:
– Formats including XML, RTF, email, HTML, SGML, and plain text.
– Identified and converted into GATE annotation format.
– Processed by Processing Resources.
– Results stored in a serial data store (based on Java serialization) or indexed in a Lucene database.
– Can also be exported as XML.


New Document

Documents are converted to GATE format; can be saved for future use or exported.

Language Resources --> New --> Document

Name: can leave blank and it will be created automatically (no spaces) from filename+UniqueID

Checkmarks: required.
– just leave defaults
– sourceURL
  – can be a file (click the folder icon for browse)
  – or actual URL (GATE will fetch it)
  – or set to stringContent to put content in directly.

Encoding will probably be utf-8.

markupAware: process XML and HTML tags


Document Display

Double-click document
– Text (minus annotations if you chose markupAware)
– Annotation Sets
  – from XML, HTML, previous annotation work
  – different colors for different categories
– Annotations list
  – annotations chosen in Sets pane


Creating a Corpus

To import new documents we name the corpus and create it without any documents.

Language Resources --> New --> Corpus

Right-click and populate
– choose directory, extensions, encoding

This will create the corpus and show the corpus and the individual documents in the Resources Pane.
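
What "populate" does can be pictured with a short Python sketch: walk a directory, keep the files with the chosen extensions, and read each one with the chosen encoding. This is not GATE code; `populate_corpus` and its parameters are hypothetical names that mirror the three choices in the dialog.

```python
import tempfile
from pathlib import Path


def populate_corpus(directory, extensions=(".txt",), encoding="utf-8"):
    """Return {document name: text} for matching files directly inside directory."""
    corpus = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in extensions:
            corpus[path.name] = path.read_text(encoding=encoding)
    return corpus


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("first document", encoding="utf-8")
    Path(tmp, "b.xml").write_text("<doc/>", encoding="utf-8")
    demo = populate_corpus(tmp, extensions=(".txt",))
```

Only `a.txt` ends up in `demo`, because `b.xml` does not match the extension filter.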


GATE Corpus

Corpus Display Pane:

– Add documents to a corpus with + button which appears when a corpus is displayed.

– Remove with -. (Note: this removes them from corpus, not from Developer)

Documents can be included in multiple corpora.

A corpus can be created from a single concatenated file by specifying the documentRootElement. This makes sense for XML documents, for instance.
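
The idea behind documentRootElement can be sketched in Python: every element with the given name inside one large file becomes its own document. This only illustrates the concept with the standard library; the `split_documents` function and the `collection` and `article` element names are assumptions, not part of GATE.

```python
import xml.etree.ElementTree as ET


def split_documents(xml_text, document_root_element):
    """Return one serialized XML string per element named document_root_element."""
    root = ET.fromstring(xml_text)
    return [
        ET.tostring(element, encoding="unicode").strip()
        for element in root.iter(document_root_element)
    ]


combined = (
    "<collection>"
    "<article><title>One</title></article>"
    "<article><title>Two</title></article>"
    "</collection>"
)
docs = split_documents(combined, "article")
```

Here `docs` holds two strings, one per `article` element, each of which could be loaded as a separate document.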


CREOLE

A Collection of REusable Objects for Language Engineering

The set of resources integrated with GATE

All the resources are packaged as Java Archive (or ‘JAR’) files, plus some XML configuration data.

Managed in the Creole Plugin Manager


Processing Resources: ANNIE

A family of Processing Resources for language analysis included with GATE

Stands for A Nearly-New Information Extraction system.

Uses finite state techniques to implement various tasks: tokenization, semantic tagging, verb phrase chunking, and so on.


ANNIE IE Modules

http://gate.ac.uk/sale/tao/splitch6.html#chap:annie


Some ANNIE Components

Tokenizer

Gazetteer: lists of entities

Sentence Splitter

Part of Speech Tagger
– produces a part-of-speech tag as an annotation on each word or symbol.

Semantic Tagger


ANNIE Component: Tokenizer

Token Types
– word, number, symbol, punctuation, and spaceToken.

A tokenizer rule has a left hand side and a right hand side.


Tokenizer Rule

Operations used on the LHS:
– | (or)
– * (0 or more occurrences)
– ? (0 or 1 occurrences)
– + (1 or more occurrences)

The RHS uses ’;’ as a separator, and has the following format:

{LHS} > {Annotation type};{attribute1}={value1};...;{attribute n}={value n}
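
As a rough illustration of the rule format, the Python sketch below splits a rule string at ">" and then breaks the right hand side on ";" into an annotation type and attribute=value pairs. The `parse_rule` helper is hypothetical and is not how GATE itself reads rule files; the rule text is the one used as the example in this deck.

```python
def parse_rule(rule):
    """Split a rule of the form LHS > Type;attr=value;... into its three parts."""
    lhs, rhs = rule.split(">", 1)
    parts = [part.strip() for part in rhs.split(";") if part.strip()]
    annotation_type, attribute_parts = parts[0], parts[1:]
    attributes = dict(part.split("=", 1) for part in attribute_parts)
    return lhs.strip(), annotation_type, attributes


parsed = parse_rule(
    '"UPPERCASE_LETTER" "LOWERCASE_LETTER"* > Token;orth=upperInitial;kind=word;'
)
```

The result is a triple: the pattern to match, the annotation type to create, and a dictionary of the attributes to set on it.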


Example Tokenizer Rule

"UPPERCASE_LETTER" "LOWERCASE_LETTER"* > Token;orth=upperInitial;kind=word;

The sequence must begin with an uppercase letter, followed by zero or more lowercase letters. This sequence will then be annotated as type “Token”. The attribute “orth” (orthography) has the value “upperInitial”; the attribute “kind” has the value “word”.
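
The effect of this rule can be approximated with a regular expression. The sketch below is a simplification under stated assumptions: it treats only ASCII A-Z and a-z as letters and uses word boundaries so that all-capitals words such as "GATE" are not matched; the function name and the dictionary layout are invented for illustration.

```python
import re

# One uppercase letter followed by zero or more lowercase letters, as a whole word.
UPPER_INITIAL = re.compile(r"\b[A-Z][a-z]*\b")


def tokenize_upper_initial(text):
    """Return Token annotations (as dicts) for every upperInitial word in text."""
    return [
        {
            "start": match.start(),
            "end": match.end(),
            "type": "Token",
            "orth": "upperInitial",
            "kind": "word",
        }
        for match in UPPER_INITIAL.finditer(text)
    ]


sample_text = "Sheffield hosts GATE"
tokens = tokenize_upper_initial(sample_text)
```

Only "Sheffield" produces a Token here; "hosts" starts with a lowercase letter and "GATE" continues with uppercase letters, so neither fits the pattern.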


ANNIE Component: Gazetteer

The gazetteer lists used are plain text files, with one entry per line.

Each list represents a set of names, such as names of cities, organizations, days of the week, etc.


Example Gazetteer List

A small section of the list for units of currency:

……
Ecu
European Currency Units
FFr
Fr
German mark
German marks
New Taiwan dollar
New Taiwan dollars
NT dollar
NT dollars
……
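
A gazetteer lookup over such a list can be sketched in a few lines of Python. This is a naive illustration, not ANNIE's implementation: it tries the longest entries first and only accepts matches on word boundaries. The `gazetteer_lookup` function and the `currency_unit` label are assumptions made for the example.

```python
CURRENCY_UNITS = [
    "Ecu", "European Currency Units", "FFr", "Fr",
    "German mark", "German marks",
    "New Taiwan dollar", "New Taiwan dollars",
    "NT dollar", "NT dollars",
]


def gazetteer_lookup(text, entries, list_name):
    """Return (start, end, list_name) for each longest whole-word match in text."""
    by_length = sorted(entries, key=len, reverse=True)
    matches = []
    pos = 0
    while pos < len(text):
        for entry in by_length:
            end = pos + len(entry)
            starts_word = pos == 0 or not text[pos - 1].isalnum()
            ends_word = end >= len(text) or not text[end].isalnum()
            if text.startswith(entry, pos) and starts_word and ends_word:
                matches.append((pos, end, list_name))
                pos = end
                break
        else:
            pos += 1  # no entry starts here; move on one character
    return matches


sample = "It cost 20 German marks or 300 NT dollars."
hits = gazetteer_lookup(sample, CURRENCY_UNITS, "currency_unit")
```

Trying longer entries first is what makes "German marks" win over "German mark" when both are in the list.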


ANNIE Component: Semantic Tagger

Based on the JAPE language; its rules act on annotations assigned in earlier phases.

Produces annotated entities as output.
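
The key idea is that a rule matches over annotations produced by earlier components rather than over raw text. The Python sketch below imitates one such rule: a number Token directly followed by a currency-unit Lookup becomes a Money entity. It is not JAPE syntax, and the `Money` type, the `list` key, and the `tag_money` function are hypothetical.

```python
def tag_money(annotations):
    """Combine a number Token followed by a currency Lookup into a Money annotation."""
    ordered = sorted(annotations, key=lambda a: a["start"])
    money = []
    for first, second in zip(ordered, ordered[1:]):
        is_number = first["type"] == "Token" and first.get("kind") == "number"
        is_unit = second["type"] == "Lookup" and second.get("list") == "currency_unit"
        # Allow at most one character (a space) between the two annotations.
        adjacent = 0 <= second["start"] - first["end"] <= 1
        if is_number and is_unit and adjacent:
            money.append({"start": first["start"], "end": second["end"], "type": "Money"})
    return money


# Annotations as they might look after tokenizing and gazetteer lookup
# on the text "It cost 20 German marks".
earlier = [
    {"start": 8, "end": 10, "type": "Token", "kind": "number"},
    {"start": 11, "end": 23, "type": "Lookup", "list": "currency_unit"},
]
entities = tag_money(earlier)
```

The new Money annotation spans both inputs (offsets 8 to 23), which is how later phases build larger entities out of smaller ones.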


ANNIE Component: Sentence Splitter

Segments the text into sentences.

This module is required for the tagger.

The splitter uses a gazetteer list of abbreviations to help distinguish sentence-marking full stops from other kinds.
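
The role of the abbreviation list can be seen in a toy splitter. The Python sketch below ends a sentence at a word that finishes with ".", "!" or "?" unless that word is a known abbreviation. The abbreviation set and the `split_sentences` function are illustrative assumptions and far simpler than the real splitter.

```python
ABBREVIATIONS = {"Dr.", "Mr.", "e.g.", "etc."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split on sentence-final punctuation, skipping full stops in abbreviations."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word.endswith((".", "!", "?")) and word not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing words without final punctuation
        sentences.append(" ".join(current))
    return sentences


sentences = split_sentences("Dr. Matuszek teaches text mining. GATE is used in the lab.")
```

Without the abbreviation check, the full stop in "Dr." would wrongly end the first sentence after one word.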


Example Using ANNIE

http://services.gate.ac.uk/annie/

More next week.


Viewing and Editing Annotations

We have looked at annotations, both added by ANNIE and extracted from tags in the document.

It is sometimes useful to examine closely and edit these annotations
– you are using a small corpus and want them correct before you proceed with other tools
– you have a sample set that will be used for training or for quality assurance and they need to be accurate
– you are still developing the resources being used to tag documents.


Unrestricted Annotation Editing

We can change to an arbitrary different annotation type.

The process is:
– choose text to be annotated
– hover over it or right click. The annotation editor pops up.
– if you’re changing it, delete existing annotation
– add new annotation, by choosing or typing it in


Restricted Annotation Editing

Typically we want better consistency and control for our editing.

Use a schema to specify allowable annotation types and features.

GATE includes many predefined schemas

Located at <root>/plugins/ANNIE/resources/schema
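
What a schema buys us can be shown with a small Python check: an annotation is accepted only if its type, feature names, and feature values are all listed. GATE's schemas are XML files; the dictionary layout, the `validate` function, and the `Person` and `Location` entries below are purely hypothetical stand-ins.

```python
# Allowed annotation types, their features, and the values each feature may take.
SCHEMA = {
    "Person": {"gender": {"male", "female", "unknown"}},
    "Location": {"kind": {"city", "country"}},
}


def validate(annotation_type, features, schema=SCHEMA):
    """Return a list of problems; an empty list means the annotation is allowed."""
    if annotation_type not in schema:
        return [f"type {annotation_type!r} is not allowed"]
    allowed = schema[annotation_type]
    problems = []
    for name, value in features.items():
        if name not in allowed:
            problems.append(f"feature {name!r} is not allowed on {annotation_type}")
        elif value not in allowed[name]:
            problems.append(f"value {value!r} is not allowed for {name!r}")
    return problems


ok = validate("Location", {"kind": "city"})
bad_type = validate("Planet", {})
bad_value = validate("Location", {"kind": "village"})
```

Restricting editors to the schema is what keeps several annotators from inventing slightly different type and feature names for the same thing.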


Schema Annotation Editor

CREOLE resource to let us use the schema for annotation editing

Enable in Manage CREOLE Plugins window (under File menu)

Select an annotation, hover or right-click

Different editor window, specifying allowable types and features

Choose new type or feature.


More on Schemas and Editing

You can also initiate editing by right-clicking on an annotation in the annotations list.

You can use multiple schemata in processing one document.


Create an Application with Processing Resources (PRs)

Applications model a control strategy for the execution of PRs.

Simple pipelines: group a set of PRs together in order and execute them in turn.

Corpus pipelines: open each document in the corpus in turn, set that document as a runtime parameter on each PR, run all the PRs on the corpus, then close the document

We will do this during lab.
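
The two control strategies can be sketched in Python with plain functions standing in for PRs. This is a conceptual model only: the two example PRs, the dictionary-based documents, and the function names are invented here and do not correspond to GATE classes.

```python
def lowercase_pr(document):
    """Toy PR: normalize the text to lowercase."""
    document["text"] = document["text"].lower()


def count_words_pr(document):
    """Toy PR: record a word count as a document feature."""
    document["features"]["words"] = len(document["text"].split())


def run_pipeline(prs, document):
    """Simple pipeline: execute the PRs in order on one document."""
    for pr in prs:
        pr(document)


def run_corpus_pipeline(prs, corpus):
    """Corpus pipeline: hand each document in turn to every PR."""
    for document in corpus:
        run_pipeline(prs, document)


corpus = [
    {"text": "GATE is Rich", "features": {}},
    {"text": "ANNIE", "features": {}},
]
run_corpus_pipeline([lowercase_pr, count_words_pr], corpus)
```

Order matters: each PR sees whatever the PRs before it left on the document, which is why a tokenizer has to run before anything that consumes tokens.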


Saving GATE Language Resources and Applications

Data Stores:
– save processed documents for additional use
– specialized folder on a hard drive
– Lucene database
  – improve processing times for large collections of documents


Types of Data Store

Serial Data Store:
– based on Java’s serialization system.
– store in a directory

Lucene Data Store (Lucene is an open-source indexing and search tool.)
– searchable repository
– Lucene-based indexing
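
As an analogy for the serial data store, the Python sketch below writes each resource to its own file in a directory using `pickle`, Python's counterpart to Java serialization. The file naming, the `.ser` extension, and both helper functions are assumptions; GATE's actual on-disk layout is different.

```python
import pickle
import tempfile
from pathlib import Path


def save_resource(store_dir, name, resource):
    """Serialize one resource into the store directory and return its path."""
    path = Path(store_dir) / f"{name}.ser"
    with path.open("wb") as handle:
        pickle.dump(resource, handle)
    return path


def load_resource(store_dir, name):
    """Read a previously saved resource back from the store directory."""
    with (Path(store_dir) / f"{name}.ser").open("rb") as handle:
        return pickle.load(handle)


# Round trip in a throwaway directory standing in for the datastore folder.
with tempfile.TemporaryDirectory() as store:
    original = {"text": "GATE was developed at Sheffield.",
                "annotations": [(22, 31, "Location")]}
    save_resource(store, "doc1", original)
    restored = load_resource(store, "doc1")
```

The point of the round trip is that processed documents, annotations included, come back exactly as they were saved, so the processing does not have to be repeated.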


Saving in a datastore

Create a folder.

Right-click to get Create Datastore menu

This only creates the store. Save corpora or documents in the Language Resources pane.

Once saved, they can be


Saving as XML

Individual documents can also be saved directly.
– Special GATE XML format
  – annotations are appended to the document, locations for tags are embedded in body
– Preserve original format
  – use for XML or HTML.
  – will save all original tags and everything selected in the annotations
  – For a plain text file, embeds inline tags.
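
Embedding inline tags in a plain text file amounts to turning offset-based annotations back into markup. The Python sketch below does this for annotations that are assumed not to overlap, and it does not escape special XML characters; the `embed_inline_tags` function and the `Software` and `Location` types are illustrative only.

```python
def embed_inline_tags(text, annotations):
    """Wrap each (start, end, type) span in <type>...</type> tags.

    Spans are applied from the end of the text backwards so that
    inserting a tag never shifts offsets that are still to be used.
    """
    out = text
    for start, end, type_ in sorted(annotations, reverse=True):
        out = f"{out[:start]}<{type_}>{out[start:end]}</{type_}>{out[end:]}"
    return out


tagged = embed_inline_tags(
    "GATE was developed at Sheffield.",
    [(22, 31, "Location"), (0, 4, "Software")],
)
```

The result is `<Software>GATE</Software> was developed at <Location>Sheffield</Location>.`, with the original text unchanged between the tags.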


Saving Applications

Save a set of processing resources and their parameters.
– Right-click, save application state.
– Append .xgapp for name

To export as a standalone, export as teamware
– bundles all needed files
– intended for teamware but can be used for sharing directly.


And LOTS more

GATE is an extraordinarily rich system. Some of the other CREOLE resources included in the standard distribution:
– Annotation Merging, Quality assurance summarizer for comparing annotations
– Web crawler, Information Retrieval, Key Phrase Extraction
– Machine learning
– Domain-specific taggers (e.g., chemistry)
– Resources for many languages

CREOLE plugins for integrating with many other systems. E.g.
– UIMA
– WordNet
– Penn BioTagger
– OpenCalais
– OpenNLP
– LingPipe

More details at http://gate.ac.uk/gate/doc/plugins.html


Some Links

Home page is http://gate.ac.uk/

Some good short tutorial videos for getting started: http://gate.ac.uk/demos/developer-videos/ . These are only a few minutes each, so they’re fast. They cover version 6, but the current version doesn’t seem to be very different.

User Guide: http://gate.ac.uk/sale/tao/index.html . This is apparently for version 7.1, which is a development build, but again it seems to be fine.

Lots of documentation (“acres” of it): http://gate.ac.uk/documentation.html

The wiki: http://gate.ac.uk/wiki/

Some very nice course materials, with a lot more detail than we will cover, including a unit on sentiment analysis: http://gate.ac.uk/wiki/training-materials-2011.html


What Next?

In lab we will create a simple application and use it.

Next week we will go into a lot more detail on using ANNIE for information extraction.

Homework. (You knew that was coming...)

I’m not going to get into programming in GATE or the more advanced applications. This might be the best tool for some of your projects, though.