Practical NLP with Lisp
Vsevolod Dyomkin, Grammarly

Posted on 29-Nov-2014

TRANSCRIPT

Page 1: Practical NLP with Lisp

Practical NLP with Lisp

Vsevolod Dyomkin, Grammarly

Page 2

Topics

* Overview of NLP practice
* Getting Data
* Using Lisp: pros & cons
* A couple of examples

Page 3

A bit about Grammarly

(c) xkcd

Page 4

An example of what we deal with

Page 5

NLP practice

R - research work: set a goal → devise an algorithm → train the algorithm → test its accuracy

D - development work: implement the algorithm as an API with sufficient performance and scaling characteristics

Page 7

Research

1. Set a goal

Business goal:

* Develop a spellchecker that is the best / good enough / better than Word's / etc.

* Develop a set of grammar rules that will catch errors according to MLA Style

* Develop a thesaurus that will produce synonyms relevant to the context

Page 8

Translate it into a measurable goal

* On a test corpus of 10,000 sentences with common errors, achieve a smaller number of FNs (and FPs) than other spellcheckers / the Word spellchecker / etc.

* On a corpus of example sentences with each kind of error (and similar sentences without this kind of error), find all sentences with errors and flag no errors in correct sentences

* On a test corpus of 1,000 sentences, suggest synonyms for all meaningful words that human linguists consider relevant in 90% of cases
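Measuring goals like these comes down to comparing a checker's flags with human annotations. A minimal sketch in Python, assuming (hypothetically) that each annotated error is identified by a (sentence_id, token_index) pair; the sample data is made up:

```python
from typing import Set, Tuple

# Hypothetical representation: an error location is (sentence_id, token_index).
Location = Tuple[int, int]


def count_outcomes(gold: Set[Location], flagged: Set[Location]) -> Tuple[int, int, int]:
    """Compare a checker's flags with human-annotated errors.

    Returns (true positives, false negatives, false positives).
    """
    tp = len(gold & flagged)  # real errors the checker found
    fn = len(gold - flagged)  # real errors the checker missed
    fp = len(flagged - gold)  # correct text the checker flagged anyway
    return tp, fn, fp


gold = {(0, 3), (1, 0), (2, 5)}
flagged = {(0, 3), (2, 1)}
print(count_outcomes(gold, flagged))  # (1, 2, 1)
```

Running the same comparison on another spellchecker's output over the same corpus gives the FN and FP counts to compare against.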

Page 9

A Note on Terminology

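FN and FP instead of precision (P) and recall (R)

FN = 1 - R, FP = 1 - P, treating FN and FP as rates: FN is the share of real errors that are missed, and FP is the share of reported errors that are not errors (strictly the false discovery rate; the usual false positive rate, FP/(FP + TN), is a different quantity).

$$F_1 = \frac{2PR}{P + R} = \frac{2(1 - FN - FP + FN \cdot FP)}{2 - (FN + FP)}$$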
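To make the relation concrete, here is a small Python sketch (not from the talk) that derives P, R, the FN and FP rates, and F1 = 2PR/(P + R) from raw counts, then checks that F1 can be computed from the two error rates alone. The counts are invented for illustration:

```python
def metrics(tp: int, fn: int, fp: int) -> dict:
    """Precision, recall, the FN/FP rates and F1 from raw counts.

    Assumes tp > 0 so that no denominator is zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fn_rate = 1 - recall     # share of real errors that were missed
    fp_rate = 1 - precision  # share of flags that were wrong
    f1 = 2 * precision * recall / (precision + recall)
    # The same F1, written only in terms of the two error rates.
    f1_via_rates = (2 * (1 - fn_rate - fp_rate + fn_rate * fp_rate)
                    / (2 - (fn_rate + fp_rate)))
    return {"P": precision, "R": recall, "FN": fn_rate, "FP": fp_rate,
            "F1": f1, "F1_via_rates": f1_via_rates}


m = metrics(tp=80, fn=20, fp=10)
print(round(m["F1"], 4), round(m["F1_via_rates"], 4))  # 0.8421 0.8421
```

In count form this is the familiar F1 = 2TP/(2TP + FN + FP), here 160/190.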

Page 10

Research contd.

2. Devise an algorithm
3. Train & improve the algorithm

http://nlp-class.org

Page 12

4. Test its performance

ML: one corpus, divided into training, development and test sets

Often, different corpora:
* for training some part (not the whole) of the algorithm
* for testing the whole system
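A minimal sketch of the single-corpus ML split, in Python. The 80/10/10 proportions and the fixed shuffle seed are assumptions chosen for illustration, not recommendations from the talk:

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_corpus(items: Sequence[T], dev: float = 0.1, test: float = 0.1,
                 seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then cut into train/dev/test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n_test = int(len(shuffled) * test)
    n_dev = int(len(shuffled) * dev)
    test_set = shuffled[:n_test]
    dev_set = shuffled[n_test:n_test + n_dev]
    train_set = shuffled[n_test + n_dev:]
    return train_set, dev_set, test_set


corpus = [f"sentence {i}" for i in range(100)]
train_set, dev_set, test_set = split_corpus(corpus)
print(len(train_set), len(dev_set), len(test_set))  # 80 10 10
```

Fixing the seed keeps the test set stable between experiments, so results stay comparable while the algorithm is tuned on the development set.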

Page 14

Theoretical maxima

Theoretical maxima are rarely achievable. Why?

* Because you would need the original authors' data (and data is key)

* Domains might differ

Page 17

Pre/post-processing

What ultimately matters is not raw performance, but...

Acceptance by users (much harder to measure, and domain-dependent).

The real world is messier than any lab set-up.

Page 20

Examples of pre-processing

For spellcheck:

* some people tend to join words with slashes, as in: spell/grammar check

* handling of abbreviations
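A rough Python sketch of these two pre-processing steps, assuming whitespace-separated input and a hypothetical, hand-written abbreviation list (a production system would derive such a list from data):

```python
from typing import List

# Hypothetical abbreviation list; in practice it would be collected from data.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "Dr.", "Mr."}


def preprocess(text: str) -> List[str]:
    """Whitespace tokenization that splits slash-joined words and
    keeps known abbreviations (with their final period) intact."""
    tokens: List[str] = []
    for raw in text.split():
        word = raw.rstrip(",;:!?")  # drop trailing punctuation except the period
        if word in ABBREVIATIONS:
            tokens.append(word)
            continue
        word = word.rstrip(".")  # any other final period is treated as sentence-final
        # "spell/grammar" -> "spell", "grammar": check each alternative separately
        tokens.extend(part for part in word.split("/") if part)
    return tokens


print(preprocess("Run a spell/grammar check, e.g. on essays."))
# ['Run', 'a', 'spell', 'grammar', 'check', 'e.g.', 'on', 'essays']
```

Abbreviations missing from the list (say, "U.S.") would lose their final period here, which is exactly the kind of case that a data-driven list is meant to cover.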

Page 21

Where to get data?

Well-known sources:
* Penn Treebank
* WordNet
* Web1T Google N-gram Corpus
* Linguistic Data Consortium (http://www.ldc.upenn.edu/)
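The Web1T corpus is essentially a very large table of n-grams with their frequencies, counted over web text. A toy Python sketch of what such counts are, computed over a made-up sentence (the real corpus is distributed precomputed, so this is only for intuition):

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count every contiguous sequence of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


bigrams = ngram_counts("the cat saw the cat".split(), 2)
print(bigrams.most_common(1))  # [(('the', 'cat'), 2)]
```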

Page 22

More data

Also well-known sources, but with a twist:
* Wikipedia & Wiktionary, DBpedia
* OpenWeb Common Crawl (updated: 2010)
* Public APIs of some services: Twitter, Wordnik

Page 23

Obscure corpora

Academic resources:
* Stanford
* CoNLL
* Oxford (http://www.ota.ox.ac.uk/)
* CMU, MIT, ...
* LingPipe, OpenNLP, NLTK, ...

Page 24

Human-powered?

http://goo.gl/hs4qB

Page 25

Beyond corpora?

* Bootstrapping
* Seeding

Page 26

And remember...

“Data is ten times more powerful than algorithms.”

-- Peter Norvig, “The Unreasonable Effectiveness of Data.” http://youtu.be/yvDCzhbjYWs

Page 27

Using Lisp for NLP

(c) xkcd

Page 28

Why Lisp?

Lisp is a carefully crafted tool for:

* Engineers
* Practical researchers
* Artists
* Entrepreneurs

Page 29

Some examples
* Piano.aero
* ITA Software
* Secure Outcomes
* Impromptu
* Land of Lisp: http://youtu.be/HM1Zb3xmvMc

Page 30

Research requirements

* Interactivity
* Mathematical basis
* Expressiveness
* Agility, malleability
* Advanced tools

Page 31

Specific NLP requirements

* Good support for statistics & number-crunching (matrices): Statistical AI
* Good support for working with trees & symbols: Symbolic AI
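To illustrate the tree side: in Lisp a parse tree is simply a nested list, such as `(S (NP (DT the) (NN cat)) ...)`, so walking it needs no extra machinery. The sketch below imitates that in Python with nested tuples; the tree and its Penn Treebank-style labels are made up for illustration:

```python
from typing import List, Tuple, Union

# S-expression-style parse tree: (label, child, child, ...); a leaf child is a str.
Tree = Union[str, Tuple]

TREE: Tree = ("S",
              ("NP", ("DT", "the"), ("NN", "cat")),
              ("VP", ("VBD", "saw"),
                     ("NP", ("DT", "a"), ("NN", "dog"))))


def leaves(tree: Tree) -> List[str]:
    """Return the words of the sentence, left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [word for child in children for word in leaves(child)]


def subtrees(tree: Tree, label: str) -> List[Tuple]:
    """Collect every subtree whose root has the given label (pre-order)."""
    if isinstance(tree, str):
        return []
    found = [tree] if tree[0] == label else []
    for child in tree[1:]:
        found.extend(subtrees(child, label))
    return found


print(leaves(TREE))  # ['the', 'cat', 'saw', 'a', 'dog']
print([leaves(np) for np in subtrees(TREE, "NP")])  # [['the', 'cat'], ['a', 'dog']]
```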

Page 32

Production requirements

* Scalability
* Maintainability
* Integrability
* ...

Page 33

...eventually

* Speed
* Speed
* Speed

Page 36

Heterogeneous systems

You have to split the system and communicate:

“Java” way vs. “Unix” way

* Sockets, Redis, ZeroMQ, etc. for communication
* JSON, SEXPs, etc. for data
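When a Lisp component talks to components in other languages, the payload can go as JSON or as S-expressions. A minimal Python sketch of a sending side producing both; the property-list encoding of dictionaries (keyword keys, t/nil for booleans and null) is an assumption of this sketch, not a fixed standard:

```python
import json
from typing import Any


def to_sexp(value: Any) -> str:
    """Render JSON-like data as a Lisp-readable S-expression.

    Assumed encoding: dicts become property lists (:key value ...),
    lists become (a b c), True -> t, False/None -> nil.
    Keys are assumed to be simple identifiers.
    """
    if isinstance(value, dict):
        return "(" + " ".join(f":{k} {to_sexp(v)}" for k, v in value.items()) + ")"
    if isinstance(value, (list, tuple)):
        return "(" + " ".join(to_sexp(v) for v in value) + ")"
    if value is True:
        return "t"
    if value is False or value is None:
        return "nil"
    if isinstance(value, str):
        # Lisp strings only need backslash and double quote escaped.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(value)  # ints and floats


message = {"id": 7, "text": "spell/grammar check", "tokens": ["spell", "grammar", "check"]}
print(json.dumps(message))
# {"id": 7, "text": "spell/grammar check", "tokens": ["spell", "grammar", "check"]}
print(to_sexp(message))
# (:id 7 :text "spell/grammar check" :tokens ("spell" "grammar" "check"))
```

On the Lisp side, the second form can be consumed with the standard reader, while the JSON form suits components written in other languages.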

Page 37

Lisp drawbacks

There's no OpenNLP or SciPy, and generally there are fewer libraries.

But...
* github: eslick/cl-langutils
* github: mathematical-systems/clml
* github: tpapp/lla
* github: blindglobe/common-lisp-stat
* … and http://quicklisp.org

Page 39

But #2

Porter stemmer: http://tartarus.org/~martin/PorterStemmer & http://www.cliki.net/PorterStemmer

or Soundex: http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/0.html

are irrelevant with good data

Page 40

More drawbacks

Lisp is a fringe language

Not a special-purpose language (like R, J or Octave)

Page 41

Example #1

API interaction

Page 42

Example #2

Page 43

Lisp FTW

* truly interactive environment
* very flexible => DSLs
* native tree support
* fast and solid

Page 44

Take-aways

* Take nlp-class

* Data is key: collect it, and build tools to work with it easily and efficiently

* A good language for R&D should be, first of all, interactive & malleable, with as few barriers as possible

* ... it also helps if you don't need to port your code for production

* Lisp is one good example

Page 45

Thanks!

Vsevolod Dyomkin, @vseloved