Computational Language Andrew Hippisley

Post on 17-Jan-2016






Page 1: Computational Language

Computational Language

Andrew Hippisley

Page 2: Computational Language

Computational Language

• Computational language and AI
• Language engineering: applied computational language
• Case study: spell checkers

Page 3: Computational Language

Computational language & AI

Artificial Intelligence: “the simulation on computer of distinctly human mental functions.” Wilks (1993)

Page 4: Computational Language

Computational language & AI

Language integral to intelligent systems

• Artificial Intelligence
• Turing Test
• ELIZA

Page 5: Computational Language

ELIZA

Eliza: Hello. I am ELIZA. How can I help you?
Me: I am feeling a bit depressed today
Eliza: Do you think coming here will help you not to be depressed?
Me: yes
Eliza: You are sure?
Me: a friend told me I should talk to you
Eliza: Do you wish to you should talk to me?
Me: I am willing to try
Eliza: How long have you been willing to try?
Me: I have been willing to try for about two weeks
Eliza: Can you elaborate on that?
Me: a friend said I should try to talk to you about my problem
Eliza: Why do you say your problem?
Me: I have a problem with depression
Eliza: Can you elaborate on that?
Me: no, I don't wish to discuss it
Eliza: You are being a bit negative

Page 6: Computational Language

Computational language & AI

Why language engineering?

Language integral to intelligent systems

• Artificial Intelligence
• Turing Test
• ELIZA
• Expert systems: natural language interface, natural language database

Page 7: Computational Language

Computational language & AI

Methods shared across systems

• Finite State Transition Networks (FSTN)
• Logic
• Formal rules
• Probability
• Data: you know it!

Page 8: Computational Language

Applied computational language

History of the field

• Machine Translation: 1960, 1966, post 1966
• Database access
• Text interpretation
• Information retrieval
• Text categorisation

Page 9: Computational Language

Language engineering

• Information overload
• Need a way of automatically processing text documents
• Information extraction

Page 10: Computational Language

Language engineering

• Information extraction
• GIDA: system for automatically monitoring financial market sentiment

Page 11: Computational Language













[Figure: actual closing % change and calculated % change, plotted by trading day (1 to 10). Output obtained for the period 1st to 12th July 2002.]

Page 12: Computational Language

Language engineering

• Information overload
• Need a way of automatically processing text documents
• Information extraction
• Summarisation

Page 13: Computational Language

Automatic summarisation (courtesy of Paulo FERNANDES de OLIVEIRA, PhD)

• Get information source;

• Extract some content from it;

• Present the most important part to the user.

Page 14: Computational Language

Lexical Cohesion

Sentence 23:

J&J's stock added 83 cents to $65.49.

Sentence 26:

Flagging stock markets kept merger activity and new stock offerings on the wane, the firm said.

Sentence 42:

Lucent, the most active stock on the New York Stock Exchange, skidded 47 cents to $4.31, after falling to a low at $4.30.

Sentence 15:

"For the stock market this move was so deeply discounted that I don't think it will have a major impact".

Links Example

Text title: U.S. stocks hold some gains.

Collected from Reuters’ Website on 20 March 2002.

Page 15: Computational Language

Lexical Cohesion

17. In other news, Hewlett-Packard said preliminary estimates showed shareholders had approved its purchase of Compaq Computer -- a result unconfirmed by voting officials.



19. In a related vote, Compaq shareholders are expected on Wednesday to back the deal, catapulting HP into contention against International Business Machines for the title of No. 1 computer company.

Bonds Example

Text title: U.S. stocks hold some gains.

Collected from Reuters’ Website on 20 March 2002.
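The slides show examples of links and bonds without defining them. The following is a minimal sketch under stated assumptions: a link is taken to be a content word repeated across two sentences, and a bond is taken to be a pair of sentences sharing at least three links. The stop-word list, the threshold `BOND_THRESHOLD`, and the function names are all illustrative choices, not taken from the slides.

```python
import re

# Small illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "to", "for", "by", "its",
              "are", "had", "into", "against", "no"}

BOND_THRESHOLD = 3  # hypothetical cut-off: this many links make a bond


def content_words(sentence):
    """Lower-case the sentence and return its set of non-stop-word tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def links(sentence_a, sentence_b):
    """Words repeated across the two sentences (simple repetition only)."""
    return content_words(sentence_a) & content_words(sentence_b)


s17 = ("In other news, Hewlett-Packard said preliminary estimates showed "
       "shareholders had approved its purchase of Compaq Computer -- a result "
       "unconfirmed by voting officials.")
s19 = ("In a related vote, Compaq shareholders are expected on Wednesday to "
       "back the deal, catapulting HP into contention against International "
       "Business Machines for the title of No. 1 computer company.")

shared = links(s17, s19)
is_bond = len(shared) >= BOND_THRESHOLD
print(sorted(shared), is_bond)
```

Exact string matching misses related forms such as "voting" and "vote", or "Hewlett-Packard" and "HP"; a fuller treatment would need stemming and some handling of abbreviations.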

Page 16: Computational Language

Language engineering

• Information overload
• Need a way of automatically processing text documents
• Information extraction
• Summarisation
• Translation
• Retrieve only relevant documents
• Voice processing

Page 17: Computational Language

Language engineering

Two main approaches

• Symbolic
• Stochastic

Page 18: Computational Language

Case study: spell checkers

Page 19: Computational Language

Spelling dictionaries

Aim: given a sequence of symbols,

1. identify misspelled strings
2. generate a list of possible ‘candidate’ correct strings
3. select the most probable candidate from the list

Page 20: Computational Language

Spelling dictionaries

Implementation: probabilistic framework

• Bayesian rule
• noisy channel model

Page 24: Computational Language

Spelling dictionaries

Types of spelling error

• actual word errors (the misspelling is itself a real word)
  • /piece/ instead of /peace/
  • /there/ instead of /their/
• non-word errors
  • /graffe/ instead of /giraffe/

Of all errors in typewritten texts, 80% are non-word errors.

Page 25: Computational Language

Spelling dictionaries

Non-word errors

Cognitive errors

• /seperate/ instead of /separate/
• a phonetically equivalent sequence of symbols has been substituted due to lack of knowledge about spelling


Page 26: Computational Language

Spelling dictionaries

Non-word errors

• Cognitive errors
• Typographic (‘typo’) errors
  • influenced by the keyboard
  • e.g. substitution of /w/ for /e/ because the two keys are adjacent on the keyboard: /thw/ instead of /the/
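The keyboard-adjacency idea can be sketched as a simple check. This is an illustration only: the adjacency map below covers a handful of QWERTY keys, and the names `QWERTY_NEIGHBOURS` and `is_adjacent_substitution` are hypothetical.

```python
# Partial QWERTY adjacency map (illustrative; only a few keys are listed).
QWERTY_NEIGHBOURS = {
    "e": {"w", "r", "s", "d"},
    "w": {"q", "e", "a", "s"},
    "t": {"r", "y", "f", "g"},
    "h": {"g", "j", "y", "u", "b", "n"},
}


def is_adjacent_substitution(typo, intended):
    """True if typo differs from intended by exactly one substituted
    character and the typed key sits next to the intended key."""
    if len(typo) != len(intended):
        return False
    diffs = [(t, c) for t, c in zip(typo, intended) if t != c]
    if len(diffs) != 1:
        return False
    typed, meant = diffs[0]
    return typed in QWERTY_NEIGHBOURS.get(meant, set())


print(is_adjacent_substitution("thw", "the"))  # /w/ typed for /e/
print(is_adjacent_substitution("thx", "the"))  # /x/ is not next to /e/
```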

Page 27: Computational Language

Spelling dictionaries

Non-word errors: noisy channel model

The actual word has been passed through a noisy communication channel

This has distorted the word, thereby changing it in some way

The misspelled word is the distorted version of the actual word

Aim: recover the actual word by hypothesising about the possible ways in which it could have been distorted

Page 28: Computational Language

Spelling dictionaries

Non-word errors: noisy channel model

What are the possible distortions?

• insertion
• deletion
• substitution
• transposition

All of these are viewed as transformations that take place in the noisy channel.
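The four distortions can be illustrated by applying each one, once, to an intended word and collecting the strings the channel could produce. This is a sketch, assuming a lowercase a to z alphabet; the function name `distortions` is hypothetical.

```python
import string


def distortions(word, alphabet=string.ascii_lowercase):
    """Group every string reachable from word by one channel transformation."""
    # Every way of cutting the word into a left part and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    return {
        # an extra character appears between left and right
        "insertion": {l + ch + r for l, r in splits for ch in alphabet},
        # the first character of right is dropped
        "deletion": {l + r[1:] for l, r in splits if r},
        # the first character of right is replaced by a different one
        "substitution": {l + ch + r[1:]
                         for l, r in splits if r
                         for ch in alphabet if ch != r[0]},
        # the first two characters of right swap places
        "transposition": {l + r[1] + r[0] + r[2:]
                          for l, r in splits if len(r) > 1},
    }


ops = distortions("the")
print("thw" in ops["substitution"], "teh" in ops["transposition"])
print("graffe" in distortions("giraffe")["deletion"])
```

A corrector runs the same operations in reverse, from the typo back towards candidate words; insertion and deletion undo each other, while substitution and transposition are their own inverses.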

Page 29: Computational Language

Spelling dictionaries

Implementing a spelling identification and correction algorithm

Page 30: Computational Language

Spelling dictionaries

Implementing a spelling identification and correction algorithm

STAGE 1: compare each string in the document with a list of legal strings; if there is no corresponding string in the list, mark it as misspelled

STAGE 2: generate a list of candidates
• Apply any single transformation to the typo string
• Filter the list by checking against a dictionary

STAGE 3: assign probability values to each candidate in the list

STAGE 4: select the best candidate
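The four stages can be sketched end to end as follows. This is a minimal illustration, not the implementation behind the slides: the lexicon and its frequency counts are made up, and Stage 3 is stood in for by raw word frequency rather than a full probability model.

```python
import string

# Hypothetical toy lexicon with made-up frequency counts (not from the slides).
LEXICON = {"the": 500, "a": 300, "two": 40, "saw": 10,
           "tow": 3, "thaw": 2, "giraffe": 1}


def single_edits(s):
    """All strings one insertion, deletion, substitution or transposition from s."""
    edits = set()
    for i in range(len(s) + 1):
        left, right = s[:i], s[i:]
        if right:
            edits.add(left + right[1:])                        # deletion
        if len(right) > 1:
            edits.add(left + right[1] + right[0] + right[2:])  # transposition
        for ch in string.ascii_lowercase:
            if right:
                edits.add(left + ch + right[1:])               # substitution
            edits.add(left + ch + right)                       # insertion
    edits.discard(s)
    return edits


def find_misspellings(tokens):
    """STAGE 1: flag every token that is not in the list of legal strings."""
    return [t for t in tokens if t not in LEXICON]


def generate_candidates(typo):
    """STAGE 2: apply any single transformation, then filter by the lexicon."""
    return {c for c in single_edits(typo) if c in LEXICON}


def rank_candidates(typo):
    """STAGE 3 (stand-in): order candidates by frequency count, highest first."""
    return sorted(generate_candidates(typo), key=LEXICON.get, reverse=True)


def correct(typo):
    """STAGE 4: pick the top-ranked candidate, or keep the typo if none exists."""
    ranked = rank_candidates(typo)
    return ranked[0] if ranked else typo


tokens = "the graffe saw a thw".split()
for typo in find_misspellings(tokens):
    print(typo, "->", correct(typo))
```

Because Stage 1 is a plain lookup, this sketch only catches non-word errors; an actual word error such as /piece/ for /peace/ passes through unflagged.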

Page 31: Computational Language

Spelling dictionaries

STAGE 3

Prior probability
• Given all the words in English, is this candidate more likely to be what the typist meant than that candidate?
• P(c) = count(c) / N, where count(c) is the number of times candidate c occurs in a corpus and N is the number of words in the corpus

Likelihood
• Given the possible errors, or transformations, how likely is it that error y has operated on candidate x to produce the typo?
• P(t | c), calculated using a corpus of errors, or transformations

Bayesian rule
• Get the product of the prior probability and the likelihood: P(c) × P(t | c)
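A worked example of the Stage 3 scoring, for the typo /thw/. Every number here is invented for illustration: the corpus size, the word counts and the channel probabilities are assumptions, and a real system would estimate them from a text corpus and a corpus of errors.

```python
# All figures below are hypothetical, chosen only to make the arithmetic visible.
CORPUS_SIZE = 1_000_000                        # N: number of words in the corpus
WORD_COUNTS = {"the": 60_000, "thaw": 20, "tow": 50}

# P(t | c): probability that the channel turns candidate c into typo t.
CHANNEL = {
    ("thw", "the"): 0.0005,    # substitution of /w/ for /e/
    ("thw", "thaw"): 0.0002,   # deletion of /a/
    ("thw", "tow"): 0.0001,    # substitution of /h/ for /o/
}


def prior(candidate):
    """P(c): relative frequency of the candidate in the corpus."""
    return WORD_COUNTS[candidate] / CORPUS_SIZE


def likelihood(typo, candidate):
    """P(t | c): looked up from the (hypothetical) error table."""
    return CHANNEL.get((typo, candidate), 0.0)


def score(typo, candidate):
    """Product of prior and likelihood, as in the Bayesian rule."""
    return prior(candidate) * likelihood(typo, candidate)


best = max(WORD_COUNTS, key=lambda c: score("thw", c))
for c in WORD_COUNTS:
    print(c, score("thw", c))
print("best:", best)
```

The product is not a normalised probability, since the denominator P(t) is the same for every candidate and can be ignored when only the ranking matters.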

Page 32: Computational Language

Spelling dictionaries

Non-word errors: implementing a spelling identification and correction algorithm

STAGE 1: identify misspelled words
STAGE 2: generate list of candidates
STAGE 3a: rank candidates for probability
STAGE 3b: select best candidate

Implement:
• noisy channel model
• Bayesian Rule

Page 33: Computational Language

Next week

Finite state machines and regular expressions