computational language
DESCRIPTION
Computational Language. Andrew Hippisley. Computational Language. Computational language and AI Language engineering: applied computational language Case study: spell checkers. Computational language & AI. Artificial Intelligence: - PowerPoint PPT PresentationTRANSCRIPT
Computational Language
Andrew Hippisley
Computational Language Computational language and AI Language engineering: applied
computational language Case study: spell checkers
Computational language & AI Artificial Intelligence:
“the simulation on computer of distinctly human mental functions.” Wilks (1993)
Computational language & AI Language integral to intelligent
systems Artificial Intelligence Turing Test ELIZA
ELIZA Eliza: Hello. I am ELIZA. How can I help you? Me: I am feeling a bit depressed today Eliza: Do you think coming here will help you not to be depressed? Me: yes Eliza: You are sure? Me: a friend told me I should talk to you Eliza: Do you wish to you should talk to me? Me: I am willing to try Eliza: How long have you been willing to try? Me: I have been willing to try for about two weeks Eliza: Can you elaborate on that? Me: a friend said I should try to talk to you about my problem Eliza: Why do you say your problem? Me: I have a problem with depression Eliza: Can you elaborate on that? Me: no, I don't wish to discuss it Eliza: You are being a bit negative
Computational language & AI Why language engineering? Language integral to intelligent
systems Artifiicial Intelligence Turing Test ELIZA Expert systems: natural language
interface, natural language database
Computational language & AI Methods shared across systems
Finite State Transition Networks (FSTN)
Logic Formal rules Probability Data: you know it!
Applied computational language
History of the field Machine Translation: 1960, 1966, post 1966 Database access Text interpretation Information retrieval Text categorisation
Language engineering
Information overload Need a way of automatically
processing text documents Information extraction
Language engineering
Information extraction GIDA: system for automatically
monitoring financial market sentiment
GIDA
-5
-4
-3
-2
-1
0
1
2
3
4
5
1 2 3 4 5 6 7 8 9 1 0
T ra d in g d a y
% C
hang
e
A c tua l C lo s ing% C ha ng eC a lc u la te d % c ha ng e
O u tp u t o b ta in e d fo r t h e p e r io d 1 st t o 1 2 th Ju ly 2 0 0 2 .
Language engineering
Information overload Need a way of automatically
processing text documents Information extraction Summarisation
Automatic summarisation(courtesy of Paulo FERNANDES de OLIVEIRA, PhD)
• Get information source;
• Extract some content from it;
• Present the most importantmost important part to the userxx xxx xxxx x xx xxxx xxx xx xxx xx xxxxx xxxx xx xxx xx x xxx xx xx xxx x xxx xx xxx x xx x xxxx xxxx xxxx xxxx xxxx xxxxxx xx xx xxxx x xxxxx x xx xx xxxxx x x xxxxx xxxxxx xxxxxx x xxxxxxxx xx x xxxxxxxxxx xx xx xxxxx xxx xx x xxxx xxxx xxx xxxx xx
xxx xx xxx xxxx xxxxx x xxxx x xx xxxxxx xxx xxxx xx x xxxxxx xxxx x xxx x xxxxx xx xxxxx x x xxxxxxxxx xx x xxxxxxxxxx xx xx xxxxx xxx xxxxx xx xxxx x xxxxxxx xxxxx x
Lexical CohesionSentence 23:
J&J's stock added 83 cents to $65.49.
Sentence 26:
Flagging stock markets kept merger activity and new stock offerings on the wane, the firm said.
Sentence 42:
Lucent, the most active stock on the New York Stock Exchange, skidded 47 cents to $4.31, after falling to a low at $4.30.
Sentence 15:
"For the stock market this move was so deeply discounted that I don't think it will have a major impact".
Links Example
Text title: U.S. stocks hold some gains.
Collected from Reuters’ Website on 20 March 2002.
Lexical Cohesion
17. In other news, Hewlett-Packard said preliminary estimates showed shareholders had approved its purchase of Compaq Computer -- a result unconfirmed by voting officials.
19. In a related vote, Compaq shareholders are expected on Wednesday to back the deal, catapulting HP into contention against International Business Machines for the title of No. 1 computer company.
Bonds Example
Text title: U.S. stocks hold some gains.
Collected from Reuters’ Website on 20 March 2002.
Language engineering
Information overload Need a way of automatically
processing text documents Information extraction Summarisation Translation Retrieve only relevant documents Voice processing
Language engineering
Two main approaches Symbolic Stochastic
Case study spell checkers
Spelling dictionaries aim? given a sequence of symbols:
1. identify misspelled strings 2. generate a list of possible ‘candidate’
correct strings 3. select most probable candidate from
the list
Spelling dictionaries Implementation:
Probabilistic framework bayesian rule noisy channel model
Spelling dictionaries Types of spelling error
actual word errors non-word errors
Spelling dictionaries Types of spelling error
actual word errors /piece/ instead of /peace/ /there/ instead of /their/
non-word errors
Spelling dictionaries Types of spelling error
actual word errors /piece/ instead of /peace/ /there/ instead of /their/
non-word errors /graffe/ instead of /giraffe/
Spelling dictionaries Types of spelling error
actual word errors /piece/ instead of /peace/ /there/ instead of /their/
non-word errors /graffe/ instead of /giraffe/
of all errors in type written texts, 80% are non-word errors
Spelling dictionaries non-word errors
Cognitive errors /seperate/ instead of /separate/ phonetically equivalent sequence of symbols
has been substituted due to lack of knowledge about spelling
conventions
Spelling dictionaries non-word errors
Cognitive errors Typographic (‘typo’) errors
influenced by keyboard e.g. substitution of /w/ for /e/ due to its
adjacency on the keyboard /thw/ instead of /the/
Spelling dictionaries non-word errors noisy channel model
The actual word has been passed through a noisy communication channel
This has distorted the word, thereby changing it in some way
The misspelled word is the distorted version of the actual word
Aim: recover the actual word by hypothesising about the possible ways in which it could have been distorted
Spelling dictionaries non-word errors noisy channel model What are the possible distortions?
insertion deletion substitution transposition all of these viewed as transformations that
take place in the noisy channel
Spelling dictionaries Implementing spelling identification
and correction algorithm
Spelling dictionaries Implementing spelling identification and
correction algorithm STAGE 1: compare each string in document with a
list of legal strings; if no corresponding string in list mark as misspelled
STAGE 2: generate list of candidates Apply any single transformation to the typo string Filter the list by checking against a dictionary
STAGE 3: assign probability values to each candidate in the list
STAGE 4: select best candidate
Spelling dictionaries STAGE 3
prior probability given all the words in English, is this candidate more
likely to be what the typist meant than that candidate? P(c) = c/N where N is the number of words in a corpus
likelihood Given, the possible errors, or transformation, how likely
is it that error y has operated on candidate x to produce the typo?
P(t/c), calculated using a corpus of errors, or transformations
Bayesian rule: get the product of the prior probability and the
likelihood P(c) X P(t/c)
Spelling dictionaries non-word errors Implementing spelling identification
and correction algorithm STAGE 1: identify misspelled words STAGE 2: generate list of candidates STAGE 3a: rank candidates for probability STAGE 3b: select best candidate Implement:
noisy channel model Bayesian Rule
Next week
Finite state machines and regular expressions