The Hercules Parser
Patrick A. Cameron
Waikato University Student
Abstract
The Hercules parser provides a simple program for the investigation and application
of patterns to discourse. The parser provides flexibility through scripting, whilst
allowing an operator to understand the frames and concepts without a great
deal of knowledge of computer programming languages and scripts. The SQL and
XML integrations allow well-known query languages to form the basis for data
extraction and manipulation. Wordnet is used as a basis for forming concepts and
patterns thereof, allowing a vast number of already established relationships to be
explored using techniques such as self organizing maps and cluster analysis.
Semantic roles, attributes, and hyponym hierarchies play a key role in word sense
disambiguation where pattern recognition forms underlying concepts which are
reinforced by logical statements with the aim of reaching a true Artificial Intelligence,
simulated by a computer.
Contents
Abstract
List of Figures
List of Tables
Acknowledgements
Section 1: Introduction
1.1 Context
1.2 Exposition Goals
1.3 Motivation
1.4 Report Chapters
Section 2: Background
2.1 Other attempts
2.2 A Conceptual Parser for Natural Language
2.3 Conceptual Dependency and Montague Grammar: A step toward conciliation
2.4 Schank/Riesbeck vs. Norman/Rumelhart: What’s the difference?
2.5 How a Neural Net Grows Symbols
2.6 A hybrid Approach to Word Sense Disambiguation: Neural Clustering with Class Labelling
2.7 A Generative Model for Semantic Role Labelling
2.8 Unsupervised Semantic Role Labelling
Section 3: System Overview
3.1 The Hercules AI Parser
3.2 Input / Output
3.3 Wordnet 1.17 Searches
3.4 The Hercules Data Interface
3.5 Sentence Structures and Base Node Hierarchies
3.51 Sentence Structures Created from User Input
3.52 The Frame Engine and Node Hierarchies
3.6 Query Engine
3.61 Frame Queries and Statistics
3.62 Frame Queries with Formulas
3.7 The AI Mind of Hercules
3.8 Forming basic concepts
3.9 Abstraction of Concepts
3.10 Concepts, purpose, reason and goals
3.11 Concepts forming reason of a living organism
3.12 Learning through abstract concepts
3.13 Underlying conceptual schemas and schema limitations
3.14 Schemas based upon CD theory
3.15 Critical Reasoning
3.16 Hierarchy for reasoning
3.17 Database Structures for reasoning
3.18 Script Formula examples for Critical Reasoning
3.19 Fact and Truth Corrections of the Databases
3.20 Setting the Database Data
3.21 Abstractions of real concepts for Analogy
3.22 Limitations on Abstract Concepts
3.23 Forming Analogies
3.24 Corrections and limitations on Analogy
3.25 Truth and Weight in analogy
Section 4: Experimentation, Results and Analysis
Section 5: Future Work
5.1 Algorithm Design
5.2 Hercules Parser Enhancements
5.3 Reaching the goal of True Artificial Intelligence
Section 6: Concluding Remarks
List of Figures
Figure 3.1: The Hercules AI Parser components and subsystems
Figure 3.2: Relationships between the Hercules Data Interface (HDI) and the data collected from Wordnet
Figure 3.3: sWord class object, pointers, attributes and metadata relationships
Figure 3.4: Sentence node hierarchies tokenized using white spaces populated with resultant HDI search data
Figure 3.5: Frame sWord object node hierarchies with frame metadata including formula data structures
Figure 3.6: Query creation process and commands
Figure 3.7: Flow chart for concept wave / fragment section boundaries
Figure 3.8: sWord data structure updates and construction using the query stack execution processes for testing and setting bit defined data attributes
Figure 3.8: Objects, Attributes, Actions, Distance, Time, Position, Actor, and Witness
Figure 3.9: Witnessing events in conversation assist in experience, learning and expectation
Figure 3.10: Wordnet Hyponym hierarchies of the statement in figure 3.9
Figure 3.12: Hercules is able to link existing frames to create a new pattern based on user input and store for later reference
Figure 3.13: Hercules will add weight to patterns recognised in prior communications such as that of figure 3.12
Figure 3.14: Hercules fills a basic concept container for objects and actions by recognizing the subject matter of the discourse
Figure 3.15.1: The hyponym hierarchy of Socrates for premise A
Figure 3.15.2: The hyponym node hierarchy premise A joined by relationship to the hyponym node hierarchy premise B
Figure 3.15.3: The hyponym and node hierarchy of premise A and B
Figure 3.15.4: The hyponym and node hierarchy of premise A and B and C
Figure 3.16: Socrates Node Hierarchy of Wordnet data using relationships
Figure 3.17: Example script for using the Hercules ISA method for testing the node hierarchy of Socrates
Figure 3.18: Socrates node Hierarchies and Mortal definition can be traced through node relationships
Figure 3.19: Script, data and methods for finding what mortal means for Socrates, or what anything means for anything if given the context
Figure 3.20: Hercules simulates an interesting and engaging manner in communications with others
Figure 3.21: The subjects removed from a sentence create a frame
Figure 3.22: The hyponym hierarchies forming the abstracted concept with the definition metadata
Figure 3.24: Shows the (is a) node relationships created by Hercules between the table elements of table 3.4
Figure 3.25: Shows the overall categorised and ranged concept in a reduced and understandable way
Figure 3.26: Illustrates the relationships created by Hercules using hyponym data and critical reasoning
Figure 3.27: The distinction made to the concept category where the concept becomes too abstract
Figure 3.28: A frame and concept and abstraction within a given range
Figure 3.29: The hyponym data for Cleopatra
Figure 3.30: The hyponym data for Socrates
Figure 3.31: An analogy where first subject of comparable discourse is abstracted
Figure 3.32: Hyponym hierarchies provided by Wordnet for person, rock, and cat
Figure 3.33: Heuristic substitution and abstraction using the hyponym hierarchy of a particular word sense
Figure 3.34: Shows the comparison of hyponym hierarchies of Cleopatra, Socrates, and a Rock
Figure 3.35: Is the frame of the concept analogy of figure 3.31
Figure 3.36: Syntax structures of concepts and frames combined
Figure 3.37: Syntax for concepts and frames with category information included
Figure 3.38: Statement of fact provided by a person
Figure 3.39: Rock hyponym hierarchy with the object category of distinction
Figure 3.40: The expanded concept frame to a table or array of data
Figure 3.40: Illustrates the distinction drawn from the user input of figure 3.38 will deactivate categories of the analogy and concept
Figure 3.41: The upper and lower limits of analogy in context with relationships and attributes
Figure 3.42: The Analogy Upper Limit
Figure 3.43: The Analogy Lower Limit
List of Tables
Table 3.1: Concept score analysis Table
Table 3.2: The binary signature of a concept signature of Table 3.1
Table 3.3: The hyponym hierarchies for “Socrates is a man” using frame “* is a *”
Table 3.4: Shows the 2 x 7 Matrix of concept combinations of table 3.3
Acknowledgements
I would like to thank my supervisor Dr. Tony C. Smith for his help in guiding my
studies and helping me to explain my project to others. Tony has inspired,
encouraged and challenged my views, whilst providing me with guidance in
explaining my research. Tony’s optimism and, need I say, at times devil’s
advocacy have made for me an interesting philosophical journey into Artificial
Intelligence research and design.
Section 1: Introduction
This report is a general exposition of the workings and theory behind the
Hercules parser. The Hercules parser has been under development for 3 years
and is still being developed. Numerous features and functions have been
integrated into the parser to provide a basis for a computer to learn
from and communicate with people.
This section provides a brief introduction to the context, goals, motivation, and
chapter overviews of this expositional report on the Hercules Parser.
1.1 Context
Since the invention of the computer, there has been a lasting fascination with the
idea of Artificial Intelligence. The idea that a person can communicate with a
computer and have the computer understand and respond has applications too
numerous to describe. With the general view that having a computer understand and
assist people with their lives will be beneficial for those concerned, I have created the
Hercules parser to investigate how this may be achieved. This report sets out the
steps, processes, and theory I investigated in providing such a system.
1.2 Exposition Goals
The aim of this exposition is to describe the workings of the Hercules Parser, the
theories underlying the Hercules Parser, and how the parser can be used
with Wordnet and other databases, so that further research may be carried out using
the Hercules Parser as the platform for reaching conclusive scientific findings.
The goals are to:
• Describe the compositional structures of the parser
• Describe the data structures of the parser
• Describe the execution of script operations by the parser
• Describe the Databases of the parser
• Describe the theory behind using wordnet hyponym hierarchies in concept
design
• Describe the theory behind using frames with a parser to create concept
objects using wordnet hyponym hierarchies
• Describe how critical reasoning can be used to supplement the node
hierarchies of wordnet
• Describe how analogy may be formed from concepts derived from the
Wordnet hyponym hierarchies
• Describe in general how the Hercules parser can assist in making a hypothesis
surrounding the meanings of conversation where an algorithm can be applied
to control program flow for interpretation of communications
• Describe future work that can flow on from the exposition of the Hercules
Parser
1.3 Motivation
The motivation for creating the Hercules parser began with the attempt to
leverage the information stored within Wordnet so that a computer may talk to a
person.
The aim was to allow a person to ask Hercules questions and have Hercules
respond in an interesting way. Because of the large amount of information in
Wordnet and the availability of the code in C++, Wordnet became the logical starting
point for beginning investigation into Artificial Intelligence. Because C++ was the
default language in the code libraries of Wordnet, it was an attractive starting point
from the perspective that processing and memory overhead would be reduced due to
the nature of the C++ language, where raw power, flexibility, and direct hardware
access may be required. Due to constraints in time and the complexities of interfacing
with .Net databases and libraries, managed class objects, XML, and Windows Forms
have been integrated into the previously command line based application. The
Wordnet 1.17 code has been altered significantly to incorporate the class objects of
the Hercules parser. Further task-specific, class object based engines have been
designed for handling core components and the functionality they provide.
With the construction of the Hercules parser prototype nearly complete, what
remains is to explore the relationships of data in communications, to identify the
patterns used for intelligent communications between individuals, and to apply them to
form an artificial intelligence within a computer for the benefit of assisting a person.
1.4 Report Chapters
Section 1: This section provides an overview of the report.
Section 2: Provides a general background into the research documents that have
contributed to the ideas and concepts that the Hercules parser is based upon. Some of
the material has been considered in the construction of the parser so that the theories
or findings of those articles may be explored with a functional parser and databases
for a statistical repository.
Section 3: Discusses and expands on the goals listed in section 1.2. The goals are not
set out individually, but are interrelated and addressed in the subsections under each
topic.
Section 4: Discusses experimentation, results and analysis. Since the
Hercules parser has been designed to run the experiments, limited work has been
carried out in experimentation so far; research will continue once the
prototype has been completed.
Section 5: Identifies future work to be done in the areas of algorithms, parser
enhancements, and the final goal of true Artificial Intelligence.
Section 6: Discusses concluding remarks and observations surrounding the Hercules
Parser and the exposition within this report.
Section 2: Background
The systems and reference documents listed below have assisted in creating and
supporting the underlying concepts that the Hercules Parser attempts to encompass.
2.1 Other attempts
Earlier attempts at designing an artificially intelligent machine have been numerous.
They include the “CYC Project” by Douglas Lenat, “A.L.I.C.E.” by Dr. Richard S.
Wallace and “Eliza” by Joseph Weizenbaum. More recent attempts include
“Jabberwacky” by Rollo Carpenter, which competed well for the Loebner prize in an
attempt to pass the Turing test.
2.2 A Conceptual Parser for Natural Language
“A conceptual parser for natural language” by Roger C. Schank and Lawrence G.
Tesler describes an operable automatic parser for natural language. It is a conceptual
parser, concerned with determining the underlying meaning of the input utilizing a
network of concepts explicating the beliefs inherent in a piece of discourse.
2.3 Conceptual Dependency and Montague Grammar: A step toward
conciliation
“Conceptual Dependency and Montague Grammar: A step toward conciliation” by
Mark A. Jones and David S. Warren, contrasts and reconciles the CD theory of
Schank’s conceptual parser in section 2.2 with the logic system of Montague
Grammar using a sorted hierarchy and typed lambda calculus.
2.4 Schank/Riesbeck vs. Norman/Rumelhart: What’s the difference?
“Schank/Riesbeck vs. Norman/Rumelhart: What’s the difference?” explores the
fundamental differences between two sentence parsers and how keywords, frames and
expectations are handled by each. The paper focuses more specifically on the
operational level, but is thought provoking where similarities are shared with the
Hercules Parser.
2.5 How a Neural Net Grows Symbols
“How a neural net grows symbols” by James Franklin illustrates how clustering may
be used in conjunction with a neural net for data reduction, a combination that is
well suited to AI implementations.
2.6 A hybrid Approach to Word Sense Disambiguation: Neural Clustering with
Class Labelling
“A hybrid approach to word sense disambiguation: Neural Clustering with class
labelling” by Steve Legrand and JRG Pulido combines a neural algorithm with the
Wordnet lexical database to be able to semi-automatically label groups of items
clustered in a multi-branched hierarchy, illustrating the use of neural algorithms
together with ontological knowledge in word sense disambiguation tasks.
2.7 A Generative Model for Semantic Role Labelling
“A Generative Model for Semantic Role Labelling” by Cynthia Thompson, Roger
Levy, and Christopher Manning uses the FrameNet semantic role and frame ontology
for identifying semantic roles. To quote from it, “the paper attempts the task of learning
to automatically assign such roles. Identifying such roles and the relationships
between them can in turn serve as support for inference about a sentence’s meaning,
for antecedent resolution, or for other understanding or parsing tasks such as
prepositional phrase attachment or word sense disambiguation. FrameNet corpus and
apply it to the task of automatic semantic role and frame identification. This paper
develops a generative model from which one can infer role labels, given sentence
constituents and a word from that sentence that is a predicator, which takes semantic
role arguments”
2.8 Unsupervised Semantic Role Labelling
“Unsupervised Semantic Role Labelling” is by Robert Swier and Suzanne Stevenson.
To quote from it, they “present an unsupervised method for labelling the arguments of
verbs with their semantic roles using an algorithm which makes initial unambiguous
role assignments, and then iteratively updates the probability model on which future
assignments are based.”
Section 3: System Overview
3.1 The Hercules AI Parser
The Hercules AI parser has been created to allow a person to converse with a
computer. Figure 3.1 illustrates an overview of the Hercules AI Parser components
and subsystems. Hercules uses basic concepts, critical reasoning and analogy to form
a calculated hypothesis about what is being said. Hercules is pre-programmed with
sufficient concepts and rules that allow meanings of conversation to be explored.
Hercules is also a goal oriented parser, where Hercules is able to assist people with
tasks that people wish to complete. The Hercules parser is divided into a number of
components that assist in understanding communications and tasks.
Figure 3.1: The Hercules AI Parser components and subsystems
3.2 Input / Output
Hercules receives input from a person and responds to the person in an intelligent way.
The communications between Hercules and the person provide an experience that
Hercules can learn from. Hercules uses Microsoft Windows Narrator to read Hercules’
output from a command prompt. Microsoft Windows Speech Recognition or a keyboard
allows a user to provide text information to Hercules via the command prompt.
[Figure 3.1 labels: Hercules Parser; WordNet 1.17 C++; Input / Output; Hercules Data
Interface; Frame Engine; Query Engine; Critical Reasoning Database; Concept Database;
Memory Database; Query Database; Analogy Database]
3.3 Wordnet 1.17 Searches
Wordnet 1.17 provides information to Hercules regarding:
• Ontologies of hyponymy (Is A – relationship)
• Ontologies of meronymy (Has A - relationship)
• Word sense information including the definitions of those senses
• Part of speech information
• Synonyms
The Wordnet 1.17 C++ program code has been modified to run multiple searches to
provide the information Hercules requires. Hercules can be modified to use any of
the Wordnet searches to retrieve information from the Wordnet databases. Normally
Wordnet runs a search on a single word and returns specific search data depending on
the search type. Wordnet code libraries have been modified for Hercules to run five
searches per word instead of one. The information normally outputted to the user for
each separate search is collected in a customised data structure called the Hercules
Data Interface.
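As a minimal sketch of the collection step described above, the following Python fragment runs five searches for one word and gathers the results in a single container. The `HerculesDataInterface` class, the `SEARCHES` tuple, the `run_search` function, and the one-entry stub lexicon are hypothetical stand-ins with made-up data; they are not the modified Wordnet C++ code.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the Wordnet databases: one made-up entry only.
STUB_LEXICON = {
    "man": {
        "hyponym": ["man", "adult", "person", "organism", "entity"],
        "meronym": ["body", "head", "arm"],
        "senses": ["an adult male person"],
        "pos": ["noun"],
        "synonyms": ["adult male"],
    }
}

# The five kinds of information listed in section 3.3.
SEARCHES = ("hyponym", "meronym", "senses", "pos", "synonyms")


@dataclass
class HerculesDataInterface:
    """Container for everything the searches return for one word."""
    word: str
    results: dict = field(default_factory=dict)


def run_search(word, search_type):
    """Run one search; unknown words simply return an empty list."""
    return list(STUB_LEXICON.get(word.lower(), {}).get(search_type, []))


def collect(word):
    """Run all five searches for a word and store the output in one HDI."""
    hdi = HerculesDataInterface(word)
    for search_type in SEARCHES:
        hdi.results[search_type] = run_search(word, search_type)
    return hdi


hdi = collect("man")
print(hdi.results["pos"])  # ['noun']
```

Keeping the five result sets in one object per word means the later parsing stages can attach a single container to each word node rather than five separate outputs.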
3.4 The Hercules Data Interface
Figure 3.2 illustrates the hierarchical relationships and flow of data between the
Hercules Data Interface (HDI) and the data collected from Wordnet. The HDI is the
container structure for all of the information retrieved from the Wordnet searches.
When each search is run using the Wordnet libraries, custom modifications to the
code populate the HDI with the Wordnet output data. Once the data has been
collected for the words of the sentence using section 3.3, the data is attached directly
to the words of the sentence as described in section 3.5.
Figure 3.2: Relationships between the Hercules Data Interface (HDI) and the data
collected from Wordnet
[Figure 3.2 labels: Wordnet; Hercules Data Interface; Meronym Tree; Hyponym Tree;
Word Senses; Part of Speech; Synonyms]
3.5 Sentence Structures and Base Node Hierarchies
The parsing functions of Hercules use sWord class objects as their default container
objects. The sWord class object allows a number of linked lists to be formed in
node hierarchies. Figure 3.3 illustrates the relationships of the metadata to the sWord
node. Instead of having multiple objects of differing types, extensions to the class
attributes are added as pointers to other data structures which then define the types.
The presence of a particular pointer determines the parsing function that may be used.
Parsing functions or methods are based upon set theory and predicate logic; the
resulting formulas use attributes to identify the super and sub sets, and logical
assertions. Node objects can then be parsed according to a given formula, where
mathematical symbols are mapped to the processes to be carried out on data,
including the relationships between data. As the sWord data structure can be used in
many ways, a general overview is provided below. Figure 3.3 shows a general
overview of the node pointer types that can be used to order the hierarchies.
Figure 3.3: sWord class object, pointers, attributes and metadata relationships
[Figure 3.3 content]
sWord: Word-Use, Word-Options, Tenses, Hyponym, Meronym, Senses, POS, Synonyms,
Next-sWord, Next-Lists, Phrase-List, Next Phrase-Lists, Frame Metadata, Wordnet Metadata
Frame-Meta: Categories, Weights, Thresholds, […] Formulas
Wordnet-Meta: Meronym Tree, Hyponym Tree, Definitions, Synonyms, Senses
sWord-Next: Link to next sWord “node” at same level
sWord-Next-List: Link to next sWord “list” at same level
sWord-Phrase: Link to next sWord “Phrase-list” at same level
sWord-Next-Phrase-List: Link to next sWord “list” at the next node level
sWord-Data: sWord data resulting from searches and metadata
3.51 Sentence Structures Created from User Input
Figure 3.4 illustrates the sWord node and data hierarchies where Hercules receives
text input from the user via a command prompt. This hierarchy is also used for any
text or phrase that Hercules parses, including the text loaded from databases.
1. Input is received from the user
2. A first sWord class object node is created and contains the full text of the
phrase. The words of the sentence in the char buffer are separated by white
spaces in natural language.
3. Each word of the discourse of the first node’s char buffer will then be
separated by Hercules into separate sWord class objects’ char buffers using
the white spaces as the tokens or delimiters.
4. The first word of the new linked list of separated words is linked to the first
node.
5. A search is run consecutively for each word in the list using section 3.3 to
provide the data of section 3.4
6. The resultant search data of section 3.4 is attached to each word of the linked
list after each search of section 3.3
Figure 3.4: Sentence node hierarchies tokenized using white spaces populated with
resultant HDI search data
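The node structure and the six steps of section 3.51 can be sketched in Python as follows. This is an illustrative simplification, not the Hercules C++ class: the `SWord` dataclass keeps only a few of the pointers named in Figure 3.3, and the `search` function is a hypothetical placeholder that returns empty results for the five searches of section 3.3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SWord:
    """Simplified sWord node: a char buffer plus optional pointers."""
    text: str
    next: Optional["SWord"] = None        # next node at the same level
    phrase: Optional["SWord"] = None      # first node of the word list below
    wordnet_meta: Optional[dict] = None   # present only after a search
    frame_meta: Optional[dict] = None     # present only on frame nodes


def search(word):
    """Hypothetical placeholder for the five Wordnet searches of section 3.3."""
    return {"hyponym": [], "meronym": [], "senses": [], "pos": [], "synonyms": []}


def build_sentence(text):
    """Follow steps 2 to 6: phrase node, whitespace split, list, search data."""
    root = SWord(text)                     # step 2: node holding the full phrase
    previous = None
    for token in text.split():             # step 3: white space as the delimiter
        node = SWord(token)
        node.wordnet_meta = search(token)  # steps 5 and 6: attach search data
        if previous is None:
            root.phrase = node             # step 4: link first word to the root
        else:
            previous.next = node
        previous = node
    return root


def words(root):
    """Walk the linked list of word nodes under a phrase node."""
    node = root.phrase
    while node is not None:
        yield node
        node = node.next


sentence = build_sentence("Socrates is a man")
print([w.text for w in words(sentence)])  # ['Socrates', 'is', 'a', 'man']
```

Because the optional pointers default to `None`, a parsing function can check which pointers are present before deciding whether it applies to a node, in the spirit of section 3.5.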
3.52 The Frame Engine and Node Hierarchies
Hercules comprises a frame engine which loads frames of text to the memory of the
computer. Figure 3.5 illustrates the partial node hierarchies and metadata structures
resulting from loading the frame tables of the database. The frame engine loads the
data from a database; the frames of text and any associated Frame-Metadata for each
frame are then available to compare against sentence information.
1. Hercules checks the Word-Sets Table of the main database to know which
tables are to be treated as frame tables
2. A node hierarchy of sWords is created; first by the Table name, second by
the full text frames as a phrase
3. The full text frames are then separated into a linked list of sWords as in section
3.51, except that the Wordnet-Metadata may optionally be omitted and the
Frame-Metadata attached instead
4. The Frame-Metadata remains attached at a higher level sWord node with the
formula to be run if conditions dictate.
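The loading steps above can be sketched in Python with an in-memory stand-in for the database. The `DATABASE` dictionary, the `load_frames` and `matches` functions, and the rule that `*` stands for exactly one word are assumptions made for illustration; the frame texts, the category value, and the formula string are taken from Figure 3.5.

```python
# Hypothetical in-memory stand-in for the Hercules main database.
DATABASE = {
    "Word-Sets": ["ISA", "HASA"],   # tables to be treated as frame tables
    "ISA": [
        {"frame": "All * are *",
         "meta": {"Category": 20, "Weight": 0, "Formula": "SETISA 2 is 4"}},
    ],
    "HASA": [
        {"frame": "Has a * a *", "meta": {"Category": 20, "Weight": 0}},
    ],
    "Queries": [],                  # not listed in Word-Sets, so not loaded
}


def load_frames(database):
    """Build a table -> frame -> token hierarchy for every frame table."""
    hierarchy = {}
    for table in database["Word-Sets"]:            # step 1: find frame tables
        hierarchy[table] = [
            {
                "text": row["frame"],              # step 2: full-text frame
                "tokens": row["frame"].split(),    # step 3: separated words
                "meta": row["meta"],               # step 4: kept at frame level
            }
            for row in database[table]
        ]
    return hierarchy


def matches(frame, sentence):
    """Assumption: '*' stands for exactly one word; other tokens must be equal."""
    sentence_words = sentence.split()
    tokens = frame["tokens"]
    if len(sentence_words) != len(tokens):
        return False
    return all(t == "*" or t.lower() == w.lower()
               for t, w in zip(tokens, sentence_words))


frames = load_frames(DATABASE)
isa_frame = frames["ISA"][0]
print(matches(isa_frame, "All men are mortal"))  # True
```

Keeping the metadata on the frame entry rather than on each token mirrors step 4, where the formula stays at the higher-level node until a match makes it relevant.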
[Figure 3.4 content: the phrase node “Socrates is a man” links to one sWord node for
each of the words Socrates, is, a, and man; each word node carries its Hyponym,
Meronym, Senses, POS, and Synonyms data.]
Figure 3.5: Frame sWord object node hierarchies with frame metadata including
formula data structures
[Figure 3.5 content]
Hercules Databases: HASA, ISA, Ability, Hercules-Memory, Concepts, Analogy,
Critical-Reasoning, Word-Sets, Queries...
Check Word-Sets: ISA, HASA, Ability
Frame Table: ISA
Frame: “All * are *”, separated into the nodes All, *, are, *
Frame Metadata: Reference:0, Category:20, Order:0, weight:0, … (node values 0 20 0 0 0)
Formula: “SETISA 2 is 4”
Frame Table: HASA
Frame: “Has a * a *”
3.6 Query Engine
Hercules comprises a query engine which processes text based scripts into a chain of
query objects. The query objects then allow Hercules to test conditions and carry out
operations against the sentence data in a particular sequence. The query returns
success if the conditions are met. Figure 3.6 illustrates the query creation process,
where the query information is read from the Pattern-Query database table.
1. The Query is read from the Pattern-Query Table of the database
2. The scripted operators are converted into a bit-category to signify the
operations to be carried out on particular data
3. The scripted conditions are set in the query class objects to signify the
particular data to be tested for
4. Each section of the query creates a query object to be stacked for execution of
the operations against the conditions being tested for
Figure 3.6: Query creation process and commands
[Figure 3.6 content]
Hercules Databases: HASA, ISA, Ability, Hercules-Memory, Concepts, Analogy,
Critical-Reasoning, Word-Sets, Pattern-Query, ...
Database Table: Pattern-Query:
“wordcount 2, word 1 is solid, word 1 is instance, word 2 is solid, word 2 maybe
noun, Set word 2 guess, Set word 2 noun”
Metadata: Rank:0, Weight:0, Threshold:0, Link:0, Category:20, Active:0
Check for Operators
Standard Query Operator Actions: Is, Not, Maybe, Not-Maybe, And, Or, Like, Starts,
Ends, Contains, If, Then, Break, Go-to, Link, Last, Set, Finished
Frame Query Operator Actions: IsA, HasA, InA, Formula, Frame-Item
Set Conditions
Query Conditions: Adjective, Adverb, Instance, Verb, Noun, Tense, Frame, Solid,
Guess, Present, ID
Create Query Stack:
Word 1 Is: Solid
Word 2 Is: Instance
Word 2 Maybe: Noun
Word 2 Set: Guess
Word 2 Set: Noun
Once the query is created and added to the stack, it can be executed against
the sentence. A success is returned if the whole query executes; Hercules then
continues executing the remaining queries in the stack. Standard logical, set, or
mathematical calculations are then performed as processes of the query. As Hercules
is able to perform all manner of processes on almost any type of information,
pattern recognition using statistics must be investigated in order to determine
how best to use the English language to communicate. Hercules is able to use the
patterns existing in the Wordnet domain categories as the basis for concept
recognition in text. The domain categories can assist in identifying a concept, whilst
the sense of a word or words provides the definitive value of any expression. This is
because any expression of a word by a person in its sense usually has a determinate
meaning, even if the determinate meaning is subjective between individuals. Hence
the actual meaning of a word such as “enjoy” is provided once the correct sense is
discovered.
Therefore in order to discover the correct meaning, queries and statistics are used to
discover concepts and the sense of a word.
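The query creation and execution process of section 3.6 can be sketched in Python using the Pattern-Query script shown in Figure 3.6. Everything beyond the script text is an assumption made for illustration: the `Attr` flag names are treated as opaque bit labels, operators are kept as plain strings rather than bit-categories, "is" is assumed to test attributes already known for a word, "maybe" to test attributes that are only possible, and "Set" to assert an attribute.

```python
from enum import IntFlag


class Attr(IntFlag):
    """Bit-defined word attributes (names taken from the query conditions)."""
    SOLID = 1
    INSTANCE = 2
    NOUN = 4
    GUESS = 8


SCRIPT = ("wordcount 2, word 1 is solid, word 1 is instance, word 2 is solid, "
          "word 2 maybe noun, Set word 2 guess, Set word 2 noun")


def parse_query(script):
    """Turn a comma-separated script into an ordered list of query steps."""
    steps = []
    for clause in script.split(","):
        parts = clause.lower().split()
        if parts[0] == "wordcount":
            steps.append(("wordcount", int(parts[1]), None))
        elif parts[0] == "set":                    # e.g. "Set word 2 noun"
            steps.append(("set", int(parts[2]), Attr[parts[3].upper()]))
        else:                                      # e.g. "word 1 is solid"
            steps.append((parts[2], int(parts[1]), Attr[parts[3].upper()]))
    return steps


def execute(steps, words):
    """Run every step in order; return True only if the whole query executes.

    `words` is a list of dicts holding 'known' and 'possible' attribute bits.
    """
    for op, index, attr in steps:
        if op == "wordcount":
            if len(words) != index:
                return False
            continue
        word = words[index - 1]                    # scripts count words from 1
        if op == "is":
            passed = bool(word["known"] & attr)
        elif op == "maybe":
            passed = bool(word["possible"] & attr)
        elif op == "set":
            word["known"] |= attr
            passed = True
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not passed:
            return False
    return True


sentence = [
    {"known": Attr.SOLID | Attr.INSTANCE, "possible": Attr(0)},
    {"known": Attr.SOLID, "possible": Attr.NOUN},
]
print(execute(parse_query(SCRIPT), sentence))  # True
```

In this sketch the conditions come before the "Set" steps, so a failed test returns before any attribute is changed; a script that interleaved them would need to undo earlier changes on failure.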
3.61 Frame Queries and Statistics
Where a frame indicating the possibility of a concept is discovered in a sentence, a
statistic can be generated to assist with understanding the context of the subjects and
therefore the sense of the words. As frames have already been categorised in
Hercules to a particular concept, the frames can help identify the probability of the
word senses and subjects of the sentence. The columns of Table 3.1 below show that
Personal, Tense, Movement, Ability, and Accomplish are some of the concept
categories available in Hercules. Concept categories provide a very rough basis for
understanding a sentence using statistics. The discourse “Socrates found and ingested
an antidote to save his life” produces Table 3.1 when 5 concept categories using
non-specific discourse concept identifiers are applied to the whole sentence. Later on,
with further development in pattern recognition, surrounding sentences can also
assist in identifying the context of the words. However, for the purposes of
illustration, a smaller context is explored in Table 3.1.
“Socrates found and ingested an antidote to save his life”
Personal  | # | Tense    | # | Movement  | # | Ability    | # | Accomplish | #
Socrates  | 1 | * was *  | 0 | * ingest* | 1 | * *ed to * | 1 | * made *   | 0
Cleopatra | 0 | * is *   | 0 | * to *    | 1 | * can *    | 0 | * did *    | 0
his       | 1 | * will * | 0 | * *ed *   | 1 | to *       | 1 | * can      | 0
her       | 0 | *ed      | 1 | * went *  | 0 | * ate *    | 0 | found * to | 1
Table 3.1: Concept score analysis Table
In Table 3.1, each time a match is discovered under a particular concept category
indicator, a score is generated. Summing the matches in each column gives the
concept scores Personal: 2, Tense: 1, Movement: 3, Ability: 2, Accomplish: 1.
This statistic of the frequency of a concept possibly being present can be represented
in a flow chart. The boundaries of the concept are then established in figure 3.7 as a
section of a wave.
[Figure 3.7 chart: Concept Score (axis from 0 to 5) plotted against Concept Category
(Personal, Tense, Movement, Ability, Accomplish)]
Figure 3.7: Flow chart for concept wave / fragment section boundaries
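The concept scores plotted in figure 3.7 are the column sums of Table 3.1. The short Python sketch below reproduces that arithmetic from the match flags in the table; the `TABLE_3_1` dictionary layout and the `concept_scores` function are hypothetical names chosen for illustration, and the flags are copied from the table rather than computed by pattern matching.

```python
# Match flags copied from Table 3.1: 1 where the indicator matched the sentence
# "Socrates found and ingested an antidote to save his life", otherwise 0.
TABLE_3_1 = {
    "Personal":   {"Socrates": 1, "Cleopatra": 0, "his": 1, "her": 0},
    "Tense":      {"* was *": 0, "* is *": 0, "* will *": 0, "*ed": 1},
    "Movement":   {"* ingest*": 1, "* to *": 1, "* *ed *": 1, "* went *": 0},
    "Ability":    {"* *ed to *": 1, "* can *": 0, "to *": 1, "* ate *": 0},
    "Accomplish": {"* made *": 0, "* did *": 0, "* can": 0, "found * to": 1},
}


def concept_scores(table):
    """Sum the match flags of every indicator within each concept category."""
    return {category: sum(flags.values()) for category, flags in table.items()}


scores = concept_scores(TABLE_3_1)
print(scores)
# {'Personal': 2, 'Tense': 1, 'Movement': 3, 'Ability': 2, 'Accomplish': 1}
```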
As figure 3.7 provides the boundaries of the possible concepts of a particular category,
it is possible to compare concepts to each other, or to measure the concept based on
probability where subjects are used in successful communication. Successful
communication is established later by re-communicating learned information
back to a person and then establishing relationships between the data. The boundaries
or wave of a concept can represent a fragment of a concept, where categories are
included or excluded from the query run against the discourse.
Table 3.1 also allows a frame signature to be established. The frame signature of table
3.1 is not the score, but rather the binary representation of the presence of the frames
it has found in a particular category. There may be many identifiers within a section
of discourse of what may indicate a concept; however, the presence of this identifier
in a category allows a binary concept signature to be formed and used in conjunction
with the concept wave or concept fragment boundaries of figure 3.7. Also the scores
of Table 3.1 may weight a signature where signatures appear to be the same in a
binary representation, but differ in score, and therefore weight. This extra score
allows a pattern to further distinguish concepts in order to appropriately weight and
distinguish the overall concept during pattern recognition in related discourse.
Personal | Tense | Movement | Ability | Accomplish
1        | 0     | 1        | 1       | 0
0        | 0     | 1        | 0       | 0
1        | 0     | 1        | 1       | 0
0        | 1     | 0        | 0       | 1
Table 3.2: The binary signature of a concept signature of Table 3.1
Table 3.2 illustrates that for each concept identifier of Table 3.1 that is matched, a
bit is set to 1 for that category. If no indicator of a particular concept category is
present, the cell for that category is set to 0.
Where many concepts within discourse are tested for using a query, the signatures
may be stacked atop each other and re-ordered by category, score, and presence,
using an algorithm for sorting the categories. The algorithm is discussed later in
section 5 as future work. It would also be interesting to use a neural network to
identify the patterns present in communications where a score and binary signature
can be identified.
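The scoring and signature steps of tables 3.1 and 3.2 can be sketched in a few lines of
Python. This is an illustration only, not Hercules' own code: the match values are
copied from table 3.1, and the names MATCHES, concept_scores and signature are
hypothetical.

```python
# Illustrative sketch only; all names here are hypothetical, not part of Hercules.
CATEGORIES = ["Personal", "Tense", "Movement", "Ability", "Accomplish"]

# Match results copied from table 3.1: one row per table row,
# one column per concept category (1 = the identifier matched the sentence).
MATCHES = [
    [1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
]


def concept_scores(matches):
    """Sum each column to get the score of each concept category."""
    return [sum(column) for column in zip(*matches)]


def signature(row):
    """Render one row of presence bits as a binary signature string."""
    return "".join("1" if bit else "0" for bit in row)


scores = dict(zip(CATEGORIES, concept_scores(MATCHES)))
signatures = [signature(row) for row in MATCHES]
print(scores)
print(signatures)
```

Rows 1 and 3 produce the same signature, which is the situation where the text
suggests using the scores as an additional weight to tell signatures apart.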
Where repeated patterns are identified in communications, and the senses of the
words making up the pattern are discovered, a Hidden Markov model can be used to
identify the concept category, semantic role, or other delineable class or category of
a word. As repeated patterns indicate a probability, those patterns must be tabled,
theoretically into a Bayesian network where the probability can be deduced from the
statistical relationships of the words in the discourse. Hercules provides a platform
for flexible algorithms, identifiable patterns, tables of probabilities of expected
relationships, concept categories, weighting scores, concept fragments and
signatures. In theory, these can help Hercules to identify in this example that a
person is moving to perform an ability, which in turn helps to determine the senses
of the subjects of the discourse. To provide such a flexible platform for exploring the
meanings of communications, Hercules uses formulas and procedures that can be
executed when a particular frame is identified within the discourse.
3.62 Frame Queries with Formulas
Where more complex data operations are required, a frame allows a formula to be
executed. The formula allows the sWord node hierarchies to be traversed, and query
operations to be carried out on the nodes returned. The formulas can be constructed
to test any attribute or node within any database or the memory of Hercules. It is
logical to use well known and established formula notations such as those found in set
theory and predicate logic, as mathematicians and linguists are familiar with the
symbols and what they represent. Executing a procedure of Hercules by parsing a
script that follows a common notation for grouping data simplifies the creation,
explanation, and implementation of established formulas. Other common formulas,
which are actually processes, have been created to access specific data. The current
processes for accessing the nodes reside in the formula section of the metadata.
Formulas and processes are closely related in Hercules because in reality the
processes represent the return or manipulation of a subset of data in the node
hierarchy. Formulas can also be attached to a frame so that appropriate algorithms
can balance the weight of the data where a frame is matched. This means that where
a statistic is set at a particular level for a context, that statistic is demoted or promoted
in weight based upon that formula. Given a different context, the same statistic is
treated differently according to that context.
Formulas for returning a member of a subset of nodes in Hercules are ISA, HASA,
INA and MEANS. The formulas will also be extended to include running SQL
commands to retrieve and manipulate the nodes and associated metadata.
Figure 3.8 illustrates the processes carried out by Hercules when the discourse “All
men” is recognised using a query.
1. A user provides the words “All men”
2. The sentence list is created as in section 3.51
3. Wordnet is searched as in section 3.3
4. The HDI is updated as in section 3.4
5. The HDI data is transferred to the sentence list as indicated in sections 3.4 and
3.51 (a relationship created by assigning the HDI sWord pointers to the
sentence sWord Wordnet metadata structures as described in section 3.5), and
#Defined bit flag information is available for the Word-Use-Options for the
Part of Speech, setting bits 3 and 4 to indicate an adjective and adverb
respectively
6. Determiners and other information, such as tenses, are identified within the
discourse so that a determinate use may be attributed to a word of
the discourse (e.g. All men: “all” is the determiner, and is set as an Instance
Object by running a frame query for instance objects using Instance
frames as in section 3.52, which sets bits 5 and 10 in the “Word-Use”)
7. The query stack is then executed as described in section 3.6 (this example uses
the query example of section 3.6 to illustrate how the operations and
conditions are executed and tested respectively). The query roughly translates
in lay terms to “if word 1 is a determiner or instance and the next word has the
option of being a noun, then set the next word after the determiner to a noun”
8. As the query stack is executed, a linked list of query objects is executed and
tested against the discourse provided by the user. Because the discourse has
been populated with data from Wordnet and other databases, the query allows
the data to be tested depending on which operations and data members have
been specified within the query objects during their construction at runtime.
An appendix can be provided in future work explaining the defined operations
and data members operated on, including how and why Hercules uses them.
9. In this example Hercules matches the bit defined data within the sWord data
structures to test and set the conditions of other data members according to the
rule put forward in the script, and returns a success for the chain of query
objects of a specific query in the stack if all conditions are tested successfully
Figure 3.8: sWord data structure updates and construction using the query stack
execution processes for testing and setting bit defined data attributes
[Figure 3.8 flow diagram.
Processing steps: Input Sentence “All men” -> Search Wordnet -> Update HDI ->
Create Sentence List (sWord 1: All, sWord 2: Men) -> Link HDI sWord Data to
Sentence (Hyponym, Meronym, Senses, POS, Synonyms for each sWord) ->
ID determiners from Frames, e.g. “All *” = Word-Use(5, 10) -> Execute Query
Stack -> Return Success or Fail -> Execute next Query in the Stack.
sWord data before the query:
  All:1  U:5,10  O:3,4   Word-Use: Solid(5), Instance(10); Use Options: Adj(3), Adv(4)
  Men:2  U:0     O:1     Word-Use: None(0); Use Options: Noun(1)
Query objects and their bit operations:
  sWord 1 IS Solid(5)       1:U & bit(5)    1000010000 & 0000010000 = 0000010000
  sWord 1 IS Instance(10)   1:U & bit(10)   1000010000 & 1000000000 = 1000000000
  sWord 2 MAYBE Noun(1)     2:O & bit(1)    0000000001 & 0000000001 = 0000000001
  sWord 2 SET Guess(9)      2:U |= bit(9)   0000000000 | 0100000000 = 0100000000
  sWord 2 SET Noun(1)       2:U |= bit(1)   0100000000 | 0000000001 = 0100000001]
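The test and set operations of figure 3.8 can be sketched with ordinary integer bit
masks. The sketch below is an assumption about how such a query might look, not the
Hercules implementation: the SWord class, the bit() helper and run_query() are
hypothetical names, and only the bit numbers and flag labels come from the figure.

```python
# Illustrative sketch only; the SWord class and helper names are hypothetical.
from dataclasses import dataclass


def bit(n):
    """Return a mask with bit n set, counting from 1 as in figure 3.8."""
    return 1 << (n - 1)


# Flag labels and bit numbers as they appear in figure 3.8.
NOUN, ADJ, ADV = bit(1), bit(3), bit(4)
SOLID, GUESS, INSTANCE = bit(5), bit(9), bit(10)


@dataclass
class SWord:
    text: str
    use: int = 0      # "Word-Use" bits (U)
    options: int = 0  # "Use Options" bits (O)


def run_query(first, second):
    """Test the three conditions, then set Guess and Noun on the second word.

    Returns True (success) only if every condition in the chain holds.
    """
    conditions = (
        first.use & SOLID,      # sWord 1 IS Solid(5)
        first.use & INSTANCE,   # sWord 1 IS Instance(10)
        second.options & NOUN,  # sWord 2 MAYBE Noun(1)
    )
    if not all(conditions):
        return False
    second.use |= GUESS  # sWord 2 SET Guess(9)
    second.use |= NOUN   # sWord 2 SET Noun(1)
    return True


all_word = SWord("All", use=SOLID | INSTANCE, options=ADJ | ADV)
men = SWord("men", use=0, options=NOUN)
success = run_query(all_word, men)
print(success, format(men.use, "010b"))
```

Keeping each flag as a single bit means a whole chain of conditions reduces to a few
AND operations, and the final Word-Use value for "men" matches the last line of the
figure.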
3.7 The AI Mind of Hercules
As Hercules is being designed to be the platform of an artificial mind, there must be
some formation of basic concepts in order to understand and respond to a person
intelligently. It is anticipated that Hercules will utilize Neural Network style learning
for pattern recognition, using techniques such as clustering as described by James
Franklin in “How a neural net grows symbols”, so that large data volumes can be
recognised and processed. However, I would theorise that concept fragments and
their troughs and peaks, in conjunction with a bit-mask filter, will assist in identifying
and distinguishing a concept; that is, using waves instead of symbols, or waves
together with symbols, so that the neural net can be understood at a schema level.
Also, “A Hybrid Approach to Word Sense Disambiguation: Neural Clustering with
Class Labelling” illustrates a Self Organizing Map, which may be used to re-organize
concept categories using the concept signatures and pattern techniques of section
3.61. Combined with clustering, reorganising the categories of stacked signatures
would allow the discerned categories to assist in word sense disambiguation by
identifying real patterns of concepts.
3.8 Forming basic concepts
Forming basic concepts allows communications to be understood. Hercules has
some basic concepts pre-programmed so that a hypothesis can be formed about what
is being said, even if the hypothesis is incorrect. The basic concepts are formed
around the theoretical maxim that for every action there is an equal and opposite
reaction. This requires a subjective view of metaphysics, and a consideration of the
reaction within a person’s mind when witnessing an event. Beginning to form
concepts in Hercules requires a binary view of the physical world.
For example: Object X has Attribute Y. Object X at position A moved to position B.
[Figure 3.8 diagram: Object X with attribute Y, positions A and B, time T, distance D,
and Object Z with attribute Y]
Figure 3.8: Objects, Attributes, Actions, Distance, Time, Position, Actor, and Witness
We are able to heuristically recognise these subtleties in our environment. From the
example of objects, actions, attributes, time and position, we are able to determine
the core concepts of objects (X), actions (D/T), locations (A or B), distances
(D = B-A), and times (T). Metaphysical concepts are established to build from and
form a simple schema for the node hierarchies.
As more is known about Object Z, it is attributed to Object Z; such as where Object Z
is called Hercules, and Attribute Y indicates Hercules is a computer; and so on for any
additional attribute. So the node hierarchies are similar to Wordnet’s categories of
ISA and HASA; and we can understand and externally build upon Wordnet’s
databases to include Object Z ISA computer, Object Z HASA Attribute Y, Object
Z’s Attribute Y ISA name, Object Z’s name IS Hercules.
Actor and witness form a binary view of observations in the real world. For example,
Actor Object Z with attribute Y witnessed Actor Object X with attribute Y move from
position A to position B. This binary perspective can be applied to real
world situations. A Hercules object Z witnessed a computer object X move: Hercules
witnessed the computer move to a new subnet of the network.
Figure 3.9: Witnessing events in conversation assist in experience, learning and
expectation
Actor and witness also allow a binary perspective to distinguish communications.
Actor Object Z witnessed Actor Object X communicate A, B and C. Hercules
witnessed Sally say “I like eating Chicken, Salmon, and Turkey.”
Concepts are derived from an analogous abstraction of a sentence. Consider
dissecting the statement above. We can make many assumptions about the statement.
The assumptions we make are based upon what we expect or have experienced.
People innately expect what they have experienced. A person’s mind will draw a
conclusion about the statement simply by reading or witnessing it. This can be
applied to the learning of Hercules where patterns are recognised within discourse.
In witnessing the statement above, concepts are in fact required to supplement
an understanding or hypothesis about what is being said, and to correctly identify the
word sense witnessed in the statement made by the other. Pattern recognition can
occur by witnessing a statement, then making a generalization about the structure of
the sentence. Where generalizations are made, such as about the semantic role of a
word’s sense or about the domain or concept category, the pattern can then be used to
predict that where the repeated sentence structures are recognized, similar concepts
underpin the subjects.
To truly recognize the concepts underpinning a sentence would require some
experience. The initial experience of the computer is pre-coded to a basic level; so far
it has been my own experience of what may indicate a concept within a sentence,
represented using a familiar frame of English as the reference, that achieves this
initial recognition. Fundamental frames of concepts have been pre-written for
Hercules and are used to explore the meanings of communications as described in this
exposition. These fundamental and core concepts allow Hercules to
explore the meanings of subjects in a logical and analogous way, using the logical
relationships established in Wordnet and by collecting information through
communication back to the user.
[Figure 3.9 diagram: Sally (Y) says to Hercules (Y): “I like eating Chicken (A),
Salmon (B), and Turkey (C).”]
3.9 Abstraction of Concepts
Considering Sally’s statement again from figure 3.9, we can determine concepts from
the subjects. Wordnet Hyponym Hierarchies can be used to create abstract concepts.
A simple approach can be taken with the discourse: starting with “I” (Sally), then “I
like”, then “I like eating”, then “I like eating A, B, and C”. The concepts and the
subjects are completely related. An abstraction of the concept information can be
made, presuming the senses are correctly identified for the statement, as shown in
figure 3.10, which displays the Wordnet Hyponym hierarchies for the words of the
statement in figure 3.9.
=0>I
=1> not you
=2> person, individual, someone, somebody, mortal, soul
=3> organism, being
=0>like
=1> see, consider, reckon, view, regard
=2> think, believe, consider, conceive
=3> evaluate, pass judgment, judge
=4> think, cogitate, cerebrate
=0>eat
=1> eat
=2> consume, ingest, take in, take, have
=1> consume, ingest, take in, take, have
=0>[A: Chicken, B: Salmon, C: Turkey]
=(1-5)> …
=6> animal, animate being, beast, brute, creature, fauna
=7> organism, being
=8> living thing, animate thing
=9> object, physical object
=10> physical entity
=11> entity
Figure 3.10: Wordnet Hyponym hierarchies of the statement in figure 3.9
Taking nodes of figure 3.10 at a lower level from the initial nodes of the sentence in
figure 3.9, we can choose a pattern which may or may not be useful; for the purposes
of this example a concept can be abstracted to lower nodes and placed in a sequence
for a database to store the abstract concept as:
person:2 [sally] evaluate:3 [enjoys] ingest:2 [eating] animal(s):6 [A, B and C]
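A minimal sketch of this abstraction step follows. It assumes the hierarchies of
figure 3.10 are already available as plain lists (no Wordnet library is called), and
the names HIERARCHIES and abstract are hypothetical. The chosen term must be
stated explicitly, because the text picks “ingest” rather than the first term of that
level.

```python
# Illustrative sketch only; data copied from figure 3.10, names are hypothetical.
HIERARCHIES = {
    "I": {
        1: ["not you"],
        2: ["person", "individual", "someone", "somebody", "mortal", "soul"],
        3: ["organism", "being"],
    },
    "like": {
        1: ["see", "consider", "reckon", "view", "regard"],
        2: ["think", "believe", "consider", "conceive"],
        3: ["evaluate", "pass judgment", "judge"],
        4: ["think", "cogitate", "cerebrate"],
    },
    "eat": {
        1: ["eat"],
        2: ["consume", "ingest", "take in", "take", "have"],
    },
    "A, B and C": {
        6: ["animal", "animate being", "beast", "brute", "creature", "fauna"],
        7: ["organism", "being"],
    },
}


def abstract(word, level, term):
    """Return 'term:level' after checking the term exists at that level."""
    if term not in HIERARCHIES[word][level]:
        raise ValueError(f"{term!r} is not at level {level} of {word!r}")
    return f"{term}:{level}"


choices = [
    ("I", 2, "person"),
    ("like", 3, "evaluate"),
    ("eat", 2, "ingest"),
    ("A, B and C", 6, "animal"),
]
concept = " ".join(abstract(word, level, term) for word, level, term in choices)
print(concept)
```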
The abstract concept is formed by Hercules from the Hyponym hierarchies and can
then be stored in a concept database. The concept may or may not include ranges of
lower nodes, e.g. [person:1-2] [evaluate:1-4] [ingest:1-2] [animals:6-11], and can
then be limited later on when forming analogies. The concept may range across any
of the category domains, from the highest nodes down to the bottom of the
hierarchies. The syntax and order of the words form the frame for the new concept,
and the frame can then be used to compare it with other statements. The comparison
may be done using the techniques of section 3.61, and can be made against statements
having similar Hyponym hierarchies. Concept Frames can be derived from any
statement, though an understanding of the purpose of the statement is determined
later through experience. Figure 3.12 shows where:
1. Hercules witnesses the statement made by Sally
2. The frames are checked using a query as in section 3.62. The query may or
may not check any or all of the frames Hercules has in memory; in this
example Hercules has “I like *” and “* and *” (and others), and links those
frames to create a larger frame
3. The larger frame combination is then stored in the frame database with other
metadata, such as the subjects and other node metadata, for later use in
recognizing speech patterns and expected subjects
Figure 3.12: Hercules is able to link existing frames to create a new pattern based on
user input and store for later reference
Patterns can be recognised after experiencing communications, where repeated
patterns in communications point to valid statements. Valid statements can then be
used to communicate back to a person or to identify correct speech in communications.
[Figure 3.12 diagram: Sally (Y) says to Hercules (Y): “I like eating Chicken (A),
Salmon (B), and Turkey (C).” Check Frames: the statement “I like eating Chicken,
Salmon, and Turkey” is checked against the frames “I like *” and “* and *”.
Create new pattern: “I like * * * * and *”. Store in Database.]
3.10 Concepts, purpose, reason and goals
Any concept requires a purpose, because without an understanding of purpose the
concept is meaningless. There is no use for a meaningless concept; purpose therefore
provides meaning. A meaningful explanation of an event, for Hercules and others,
requires basic reasons to supplement the core-concepts of Hercules. The most basic
concepts are those for understanding the needs of a living organism, which allow a
purpose to be speculated by Hercules. Even if the purpose is misinterpreted, there is
opportunity for correction later via further communications, and it may also be that
many purposes are fulfilled by one action or communication. The person being
communicated with should provide a correction, or show some deviation in their
communication, if perturbed or perplexed by a miscommunication from Hercules. If
no correction is provided, the communication and concepts appear valid but may be
challenged later.
Purpose and meaning also require that Hercules apply itself to assist in the goals of a
person. Assisting persons with their goals allows for learning and for
meaningful exchanges of information by experience. The needs of a living organism
form the lowest nodes of the reason hierarchy, and it must be assumed that any goal
of a person fits at a higher level on the way to achieving the end purpose. A goal can
therefore be listed in an ordered hierarchy of process and procedure.
Another achievement hierarchy would be accomplishing a goal with a person,
which should be rewarding to those concerned, including Hercules, and could be
implemented by simulating a rewarding state of identifiable successes in its
environments. This may accord with social interaction, where the needs of others
must be weighed to achieve a purpose. Hercules may advance to this at some
later stage; at the current stage of development Hercules will carry out any
function requested, given the means and based on fact, e.g. a person may command
Hercules to add 2 and 2, eject the CD from the CD-ROM drive, tell a joke, or answer
a question.
Hercules may be able to learn that when someone says “I need to put the CD in” or
“OK Herc, CD!”, then says “you dumb computer” and manually pushes the eject
button, the next time Hercules hears something about a CD it should ask the user if
they wish Hercules to open the tray. This is open speculation, but quite
possible and not too far-fetched.
3.11 Concepts forming reason of a Living organism
The fundamental concepts for understanding goals lie in the 7 traits of living
organisms, as discussed briefly in section 3.10. Without living organisms, the
universe would be objects or energy confined in movement by physics. Sentient
living organisms use a higher mental process to achieve their goals. Understanding
the goals of a sentient organism allows purpose to be formed, and therefore allows
valid reasoning. Valid reasoning is the explanation of actions and events in achieving
a purpose. Goals, purpose and reasoning allow Hercules to explore the meanings of
actions. The exploration of the meaning of actions allows expectations to be formed
of oneself and the environment. Nutrition, Respiration, Movement, Excretion,
Growth, Reproduction and Sensitivity are the core motivations of every living being;
therefore everything understood by Hercules will relate to one or more of these
motivations. Why else would we do anything but to satisfy our needs, even out of
instinct or subconscious action? It is the goals of the living organism that form the
processes leading to the 7 traits, and such goals may be abundant in variety and
colourful in nature. Take the male peacock, with feathers and plumage, expanding his
tail to attract a mate for reproduction. This does not explain much by itself, but in a
node hierarchy it could be seen as “Expand Tail -> Attract Mate -> Reproduce”.
3.12 Learning through abstract concepts
An abstraction of concepts can be formed from discourse as described in section
3.9. Repeated patterns in discourse reinforce valid communication structures, and
valid communication structures can be observed. Figure 3.13 shows that if Sally
were to say a statement to Hercules similar to that of figure 3.12, extra weight
would be added to the frame remembered earlier where similarities exist. Using that
frame later is a matter of discovery, such as where eggs, bacon, toast, salmon,
chicken and turkey may be classified as food Sally likes, or that if Sally is eating eggs,
she is likely eating bacon and toast too. It is social interaction and
communication that will ultimately indicate the real probabilities of what a particular
person is trying to communicate, given that a context can be established using patterns.
Figure 3.13: Hercules will add weight to patterns recognised in prior
communications such as that of figure 3.12
[Figure 3.13 diagram: Sally (Y) says to Hercules (Y): “I like eating Eggs, Bacon,
and Toast too!” Check Frames: the statement matches the stored frame
“I like * * * * and *”. Hercules: “Hmm… I better remember this one, I’ve seen it
before!!” Add weight to repeated patterns.]
Because the concept categories and expected subjects of the sentence are determined
by probability, a Bayesian Network of both abstract concepts and frame patterns is
constructed where ambiguity exists. Where the node hierarchies have set
relationships, a table is constructed, and the probability of the frame fitting the
subjects is considered where patterns are repeated; further weight is added or
shifted according to the algorithms initiated by the script or query. Weight is added to
possible abstract concepts, making them more probable when script formulae are
later applied to control communications, such as where concept patterns have
discernible attributes and a formula can be constructed to appropriately weight the
frame in reference. When Hercules uses the communication patterns successfully,
they become more probable again and are recorded as such. Unsuccessful
communication patterns become demoted or redundant, and the weights and
thresholds can be adjusted accordingly using the formula section of the frame or
frames in reference. The concept fragment, wave or signature of section 3.61 can
distinguish a pattern in order to adjust its weight if this is found necessary during
recognition. The contexts and goals of the communications are also relevant to
pattern recognition and weighting. The goals of the communications need to be
established, and will assist in providing a context and an actual understanding of the
communications. Actual understanding will allow a correct interpretation, and
exploration of the communications may continue given the means.
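The promotion and demotion of frame weights can be sketched as follows. This is a
simplified illustration under stated assumptions, not the Hercules algorithm: frames
are matched with shell-style wildcards from Python's fnmatch module (where “*”
matches any run of characters), each match adds a fixed weight of 1, and the
FrameStore class is a hypothetical name.

```python
# Illustrative sketch only; FrameStore and its weighting rule are hypothetical.
from fnmatch import fnmatchcase


class FrameStore:
    """Keeps a weight per frame and adjusts it as statements are witnessed."""

    def __init__(self, frames):
        self.weights = {frame: 0 for frame in frames}

    def witness(self, statement):
        """Add weight to every stored frame that the statement matches."""
        matched = [f for f in self.weights if fnmatchcase(statement, f)]
        for frame in matched:
            self.weights[frame] += 1
        return matched

    def demote(self, frame, amount=1):
        """Reduce the weight of a frame after an unsuccessful communication."""
        self.weights[frame] = max(0, self.weights[frame] - amount)


store = FrameStore(["I like *", "* and *", "* was *"])
store.witness("I like eating Chicken, Salmon, and Turkey.")
store.witness("I like eating Eggs, Bacon, and Toast too!")
print(store.weights)

store.demote("* and *")
print(store.weights)
```

A probability table or Bayesian network, as proposed in the text, would replace the
simple counters used here.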
3.13 Underlying conceptual schemas and schema limitations
The underlying conceptual schemas form a generic means to build upon the core-
concepts described in section 3.8: movement from A to B, Object X at position A or
B, Object X has attribute Y, and Object X’s reason for moving was Z. Also,
reasons are attached as attributes to an object, formed from the 7 basic needs of living
organisms described in section 3.11.
Limitations exist within the schema, and are actualised in the restrictions of general
physics and observations or descriptions of movement and actions. Further
limitations on underlying concepts are attached by attribute when identified, such as
when discovered in conversations or as a matter of fact, such as being told the height
of a building or parsing and reading the colour of the sky in an encyclopaedia.
Core-conceptual schemas are built using Object, Action, Attributes, Tense and
Reason. Concept subject objects result from the observations. Formulae and
algorithms can later manipulate the resulting statistics regarding the information
recorded.
Figure 3.14 gives an overview of the basic concept container, which holds enough
subject attributes and can be linked to other containers if required. An object or action
is not usually described by more than 3 consecutive adjectives or adverbs in everyday
conversation. Also, a tense is either presumed or apparent, and a reason for the
communication is also presumed or apparent. Because each object is related to an
action, the object or action may be linked to other objects or actions. It is then the
relationships in the node hierarchies that define the context or other interrelations.
The relationships are complex and directly related to the purpose. The use of these
object and action containers is left open for exploration in future implementations
using scripts, formulae and algorithms; they are nonetheless required as subject
containers for data that has been extracted where a pattern has been recognized.
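The concept container described above can be sketched as a pair of small record
types. The field names follow the labels of figure 3.14 (“A red ball rolling quickly
from A to B”), but the classes ConceptObject and ConceptAction, and the limit
constant, are hypothetical and only illustrate one possible layout.

```python
# Illustrative sketch only; class and field names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional

MAX_MODIFIERS = 3  # at most 3 adjectives or adverbs, as suggested in the text


def _check_modifiers(modifiers):
    if len(modifiers) > MAX_MODIFIERS:
        raise ValueError(f"at most {MAX_MODIFIERS} modifiers are supported")


@dataclass
class ConceptObject:
    concept_id: int
    noun: str
    adjectives: List[str] = field(default_factory=list)
    tense: Optional[str] = None
    reason: Optional[str] = None
    action_link: Optional[int] = None  # id of the linked ConceptAction

    def __post_init__(self):
        _check_modifiers(self.adjectives)


@dataclass
class ConceptAction:
    concept_id: int
    verb: str
    adverbs: List[str] = field(default_factory=list)
    tense: Optional[str] = None
    reason: Optional[str] = None
    object_link: Optional[int] = None  # id of the linked ConceptObject

    def __post_init__(self):
        _check_modifiers(self.adverbs)


# Values taken from figure 3.14.
ball = ConceptObject(3, "Ball", ["Red"], tense="Present", action_link=4)
rolling = ConceptAction(4, "Rolling", ["Quickly"], object_link=3)
print(ball)
print(rolling)
```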
Figure 3.14: Hercules fills a basic concept container for objects and actions by
recognizing the subject matter of the discourse
3.14 Schemas based upon CD theory
Schank created a parser based on Conceptual Dependency (CD) Theory, described in
“A conceptual parser for natural language”, which identifies the relationships of
concepts with each other and how a parser functions using concepts. The
relationships of concepts are described by elements, where the elements are derived
from rules common to all languages and concepts. Similar subjects sharing concepts
can be interchanged, much like a semantic role in a sentence, and much like my
description of concepts having subjects in frames. In Schank’s Parser, the concepts
and the relationship elements form a generic concept; however, frames of English
language can also form an ordered concept schema. The graphical representation of
concepts is no different to an ordered frame of English where words such as “to”,
“the”, “and”, “will”, “has” etc. can illustrate similar relationships when attributed to a
particular category of concept. Schank attempted to simplify the concept creation
process, and his work may be more relevant where used as an underlying conceptual
schema to build upon. Analogous concepts can be formed using plain English with a
context, for example “movement: the * was *.” The context is provided as a
contextual predicator, and the components may be explained using CD Theory. There
may be a limitation to CD theory where subjects cannot be distinguished from each
other in a broad concept. This requires the attributes of subjects to be the
distinguishing factor over the basic concept, using a node hierarchy and a unique
identifier for that node and attribute per category to achieve this through re-ordering
relationships. Nonetheless, Schank’s Parser and CD theory provide strong evidence
of the importance of semantic roles in concept identification.
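[Figure 3.14 diagram: Object X (attribute Y) with positions A and B, time T and
distance D, and Object Z (attribute Y). The discourse “A red ball rolling quickly from
A to B” -> Run Queries and Frame Formulas -> Create Concept Subjects, giving
two linked containers:
  Concept Object: 3   Noun: Ball; Adjective 1: Red; Adjective 2: (empty);
                      Adjective 3: (empty); Tense: Present; Reason: (empty);
                      Action Link: 4
  Concept Action: 4   Verb: Rolling; Adverb 1: Quickly; Adverb 2: (empty);
                      Adverb 3: (empty); Tense: (empty); Reason: (empty);
                      Object Link: 3]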
3.15 Critical Reasoning
Critical reasoning is a necessary part of sentient reasoning, where logical steps,
relationships and assertions must be considered. Critical Reasoning forms the basis of
making a logical assumption or providing a reasonable expectation. Inductive and
deductive reasoning are used as a model to form a hierarchy of expected statements
within Hercules. The core concepts within Hercules’ frame category database tables
provide the foundations for critical reasoning in the parser.
1. A category is required to be determined for the subject of the frame.
2. The category then provides the context of the subject in order to distinguish
the sense of a word or phrase.
3. In order to reason, the first premise is the first distinguished subject, and a
second distinguished subject indicates a relationship with the first subject.
4. The relationship of the first and second subject is recorded in the Critical
Reasoning database of Hercules.
The core-concept components of Object, Action, Attribute, Tense, and Reason are
assigned and attributed with the new relationships. All assertions are assumed logical
and true by Hercules, except where limitations can be applied because of conflicts in
truth discovered in communication.
3.16 Hierarchy for reasoning
Hercules has methods for testing a hierarchy. Methods such as IsA() and HasA()
traverse the hierarchy of a particular database.
Figure 3.15.1 shows the hyponym node hierarchy of “Socrates is a man” (premise A).
=1>Socrates
=2> man
Figure 3.15.1: The hyponym hierarchy of Socrates for premise A
In Figure 3.15.2 the statement “All men are mortal” creates a relationship with
premise A shown in figure 3.15.1. The need to test the premise occurs only when
challenged.
=1>Socrates
=2> man
+ (new relationship formed)
=1>(Men, man)
=2>mortal
Figure 3.15.2: The hyponym node hierarchy premise A joined by relationship to the
hyponym node hierarchy premise B
When a premise is challenged, Hercules will call a method by formula to traverse the
new hierarchy, for example to test whether Socrates is mortal. Figure 3.15.3 shows
the hyponym and attribute hierarchy of “Socrates is a man, All men are mortal”.
Within the node hierarchy there are 3 types of attribute at the same level for each
node: ISA, HASA, and MEANS. The information being requested determines
which relationship is created and what information is returned. The ISA() method is
capable of finding out whether Socrates is mortal.
Figure 3.15.3: The hyponym and node hierarchy of premise A and B
Once the node hierarchy is established by relationships stored in a database external
to Wordnet, any information can be added at a particular level. Say that premise
C was that “Socrates is a red-head”; figure 3.15.4 shows that this relationship must be
created at the correct level, in this case probably at level 2 of the hyponym
hierarchy, as it is a new distinguished attribute of Socrates. If the new node was
placed under mortal, a mistake may be made where all mortals are believed to be
red-heads.
IsA (Socrates, Mortal)
=1>[Socrates]
=2>man
=3>[mortal]
=4>Red-Head
=2>Red-Head
=3>[Has red hair]
Figure 3.15.4: The hyponym and node hierarchy of premise A and B and C
The response of the test is “yes”.
[Figure 3.15.3 diagram: node attribute boxes reading, in order: ISA: Socrates;
ISA: Man; HASA: head, body; MEANS: Male Person; ISA: Mortal, Organism;
HASA: Adjective; MEANS: Subject to death]
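The traversal behind a test such as IsA(Socrates, Mortal) can be sketched as a walk
up the stored ISA relationships. The code below is an assumption about how such a
method might work, not the Hercules IsA() implementation: the ISA dictionary and
the is_a function are hypothetical, and the relationships are premises A, B and C from
this section, with red-head attached to Socrates rather than to mortal.

```python
# Illustrative sketch only; the ISA table and is_a function are hypothetical.
ISA = {
    "Socrates": ["man", "red-head"],  # premise A and premise C
    "man": ["mortal"],                # premise B: all men are mortal
}


def is_a(subject, category, relations=ISA):
    """Return True if category is reachable from subject via ISA links."""
    seen = set()
    stack = [subject]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guards against cycles in the stored relationships
        seen.add(node)
        for parent in relations.get(node, []):
            if parent == category:
                return True
            stack.append(parent)
    return False


print(is_a("Socrates", "mortal"))
print(is_a("man", "red-head"))
```

Because red-head is recorded against Socrates only, the second test fails, which
avoids the mistake of concluding that all men or all mortals are red-heads.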
3.17 Database Structures for reasoning
Wordnet provides Hyponym trees (ISA), Meronym trees (HASA), and sense
definitions (MEANS). The Wordnet information provides the template for the
creation of basic Objects and concepts.
A Hyponym tree is the ontology of category or domain for a word sense. A Meronym
tree is the ontology of composition of a word sense. A premise of a category can be
tested against the Hyponym tree or a premise of composition can be tested against the
Meronym tree. Hercules uses Wordnet to test the premises of what a thing can
comprise or what kind something is. When considering the structures for reasoning,
the attributes are related directly to IsA() and HasA() functions. The data must be
tagged and attributed correctly within the database. Also the data must be added to
the database with the correct attributes in the correct hierarchy.
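The premise tests described above can be sketched as follows. This is a minimal illustration only, assuming the hyponym and meronym trees are held as small in-memory dictionaries; the data is abridged from figure 3.16, and the names HYPERNYM, EXTRA_ISA, HAS_PART, is_a and has_a are hypothetical stand-ins rather than the actual Hercules methods.

```python
# Hypothetical in-memory stand-ins for the Wordnet trees (one sense only).
# Each word maps to the category directly above it in the hyponym tree.
HYPERNYM = {
    "Socrates": "man",
    "man": "male person",
    "male person": "person",
    "person": "organism",
    "organism": "living thing",
    "living thing": "object",
    "object": "physical entity",
    "physical entity": "entity",
}

# ISA attributes held in the external database, attached at a specific node.
EXTRA_ISA = {"man": {"mortal"}}

# Meronym (HASA) data, abridged from figure 3.16.
HAS_PART = {
    "man": {"adult male body", "beard", "mustache"},
    "male person": {"male body", "male reproductive system"},
}


def ancestors(word):
    """Yield the word itself, then each category above it."""
    while word is not None:
        yield word
        word = HYPERNYM.get(word)


def is_a(subject, category):
    """Test a premise of category against the hyponym tree and extra attributes."""
    for node in ancestors(subject):
        if node == category or category in EXTRA_ISA.get(node, ()):
            return True
    return False


def has_a(subject, part):
    """Test a premise of composition against the meronym data."""
    return any(part in HAS_PART.get(node, ()) for node in ancestors(subject))
```

Because “mortal” is attached at the node for man rather than at a more general node, the test succeeds for Socrates without implying anything about objects in general.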
Figure 3.16 shows a hierarchical representation of the collection of nodes for Socrates
and the attachment of the new attribute “Mortal” is maintained at the correct level
with its metadata. Where mortal is a new attribute, the node hierarchies for mortal are
also maintained, though unused until a method may require the information of mortal.
=1>Socrates
=1> Sense 1: ISA
man, adult male
=1> male, male person
=2> person, individual, someone, somebody, mortal, soul
=3> organism, being
=4> living thing, animate thing
=5> object, physical object
=6> physical entity
=7> entity
=1>New Attribute + [Metadata] : man + Isa: Mortal
=1> Sense 1: MEANS: Definition:
(1437) man, adult male -- (an adult person who is male (as opposed to
a woman); "there were two women and six men on the bus")
=1> Sense 1: HASA
man, adult male
HAS PART(1): adult male body, man's body
HAS PART(2): beard, face fungus, whiskers
HAS PART(3): mustache, moustache
=1>male, male person
HAS PART(1): male body
HAS PART(2): male reproductive system
Figure 3.16: Socrates Node Hierarchy of Wordnet data using relationships
Hercules requires that a database separate from Wordnet be created for critical
reasoning. The database is a collection of premises and their relationships, referenced
back to the original Wordnet hierarchies. The premises can then be referenced as a
hierarchy of nodes where new relationships are attributed to each premise.
3.18 Script Formula examples for Critical Reasoning
A premise and attribute have previously been added within the Socrates node
hierarchy as shown in section 3.16. The test of the reasoning to which the premise
pertains can be called via a formula attached to a frame. The context and category of
the following frame are assumed to be ISA for this example. Figure 3.17 shows the
query in red.
1. Discourse is provided to Hercules of “Is Socrates mortal?”
2. The processes of Section 3.62 are carried out with the data below the script in
red
3. Hercules is designed so that the script simplifies the entry of data. Rather than a
person entering large amounts of complex data, all that needs to be entered is the
script in red, which handles any question such as “is Socrates mortal?” or “is
Cleopatra female?” or any combination of “is [something] [something]”, that is,
“is * *?”
Script: “Wordcount 3, Frame ISA id 1 is present”
Discourse: is Socrates mortal?
Frame: Is * *
Frame Unique ID:1
Frame Category: ISA
Frame Formula: ISA 3 a 2
Method Called: BOOL IsA( [sWord:Char:Socrates], [sWord:Char:Mortal] );
Method Action: Call Parse Method on Socrates node hierarchy for “ISA: mortal”
Method Return: TRUE (when attribute/node [Mortal] is located)
Method Response: printf("Hercules can see %s is a %s", [sWord:Char:Socrates],
[sWord:Char:Mortal] );
Hercules Response Output: “Hercules can see Socrates is a mortal”
Figure 3.17: Example script for using the Hercules ISA method for testing the node
hierarchy of Socrates
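The flow of figure 3.17 can be sketched in a few lines. This is only an illustration under stated assumptions: the numbers in the formula “ISA 3 a 2” are read here as 1-based word positions in the discourse (word 2 is the subject, word 3 the category), and match_frame, run_isa_formula and the KNOWN set are hypothetical names introduced for the example, not part of Hercules.

```python
def match_frame(frame, discourse):
    """Return the discourse words if they fit the frame, else None.

    A '*' in the frame matches any single word; other words must match
    case-insensitively. Trailing punctuation is ignored.
    """
    words = discourse.rstrip("?.!").split()
    pattern = frame.split()
    if len(words) != len(pattern):
        return None
    for expected, word in zip(pattern, words):
        if expected != "*" and expected.lower() != word.lower():
            return None
    return words


def run_isa_formula(formula, words, is_a):
    """Interpret a formula such as 'ISA 3 a 2' (assumed 1-based word positions)."""
    op, category_pos, _, subject_pos = formula.split()
    if op != "ISA":
        raise ValueError("only ISA formulas are handled in this sketch")
    subject = words[int(subject_pos) - 1]
    category = words[int(category_pos) - 1]
    if is_a(subject, category):
        return "Hercules can see %s is a %s" % (subject, category)
    return None


# Stand-in for the node hierarchy lookup; a real test would parse the hierarchy.
KNOWN = {("socrates", "mortal")}


def stub_is_a(subject, category):
    return (subject.lower(), category.lower()) in KNOWN


words = match_frame("Is * *", "is Socrates mortal?")
response = run_isa_formula("ISA 3 a 2", words, stub_is_a)
```

The same frame and formula handle “is Cleopatra female?” without any change, which is the point of keeping the subjects out of the script.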
Providing the context is still that of Socrates, the question is really “What does mortal
mean for Socrates?” Hercules uses scripts and searches Wordnet and the external
databases for discourse metadata to provide the information below in the following
hierarchy. Figure 3.18 illustrates that in a node hierarchy, a relationship can be
created that supports Hercules in retrieving data stored through reasoning provided in
conversation.
=1>Socrates ← External Database Term
=1> Sense 1: ISA ← Wordnet Hyponym tree
man, adult male
=1> male, male person
=2> […]
ISA: =1>Mortal ← External Database Relationship to:
MEANS=1> Sense 1: Definition: Mortal ← Wordnet Definition
(3) mortal -- (subject to death; "mortal beings")
Figure 3.18: Socrates node Hierarchies and Mortal definition can be traced through
node relationships
A script, frame and formula can be used to retrieve specifically the meaning or
definition relating to Socrates being mortal. Provided that the correct
meaning of Mortal is attributed to Socrates, we are able to use the simple script
shown in red in figure 3.19 to receive a definitive answer to the question, “What
does mortal mean?” Please note that this script can be used to find out what anything
means for anything where “What does * mean in *” is used. This allows us to ask
Hercules “what does an eagle mean in golf?” or “what does think mean in person?”
Script: “Wordcount 4, Frame MEAN id 1 is present, or, Wordcount 6, Frame
MEAN id 2 is present”
Discourse: “what does mortal mean?”
Context Subject Predicator: Socrates
Frame1: What does * mean
Frame Unique ID: 1
Frame Category: MEANS
Frame SUBJECT: Socrates
Frame Formula: MEANS INA 3 in SUBJECT
or
Frame2: What does * mean in *
Frame Unique ID: 2
Frame Category: MEANS
Frame SUBJECT: SPECIFIED
Frame Formula: MEANS INA 3 in 6
Method(s) Called:
1. sWord InA( [sWord:Char:Socrates], [sWord:Char:Mortal] ) and
2. sWord Means([sWord:Mortal:Char:Definition])
Method(s) Action: Call Parse Method 1 on Socrates node hierarchy for return of
object “sWord: ISA: mortal”, then call method 2 to return the mortal definition for
output
Method 1 Return: Return sWord Node (when attribute/node [Mortal] is located)
Method 2 Return: Return definition char array for use in output or response
Method Response: printf("%s means %s", [sWord:Char:Mortal],
[sWord:Mortal:Definition:Char:Mortal] );
Hercules Response Output: “Mortal means subject to death”
Figure 3.19: Script, data and methods for finding what mortal means for Socrates, or
what anything means for anything if given the context
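The two-step lookup of figure 3.19 can be sketched as below. This is a simplified illustration, assuming the node hierarchy is a plain dictionary; NODES, in_a, means and what_does_mean are hypothetical names, and the single definition stored for “mortal” is the one quoted in figure 3.18.

```python
# Hypothetical node store: external ISA relationships plus Wordnet definitions.
NODES = {
    "Socrates": {"ISA": ["man", "mortal"]},
    "mortal": {"MEANS": "subject to death"},
}


def in_a(subject, attribute):
    """Method 1: return the attribute node name if it is attached to the subject."""
    if attribute in NODES.get(subject, {}).get("ISA", []):
        return attribute
    return None


def means(node):
    """Method 2: return the stored definition of a node, or None if absent."""
    return NODES.get(node, {}).get("MEANS")


def what_does_mean(word, subject):
    """Chain both methods and format the response as in figure 3.19."""
    node = in_a(subject, word)
    if node is None:
        return None
    definition = means(node)
    if definition is None:
        return None
    return "%s means %s" % (node.capitalize(), definition)
```

Calling what_does_mean("mortal", "Socrates") follows the first frame of the figure, where the subject comes from the conversation context; the second frame would pass word 6 of the discourse as the subject instead.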
In figure 3.19 Hercules’ response conforms to a format set by the formula attached to
frame “* means *”. It should however be noted that any number of scripted
operations may be performed on the Hercules response data, provided a method is
constructed and the location of the data is known. Through scripts, the data can be
grouped into sets, compared, and used in statistical and mathematical calculations or
analysis.
Initially the scripts have been hand written. The writing of scripts is partly
automated in the GUI of Hercules so a person can easily write the scripts without
needing to know a computer programming language. This automation relies on
hard-coded methods; however, once sufficient methods are
constructed, access to the methods is then provided to Hercules. This automation of
scripts via methods ultimately allows Hercules to write its own scripts. A separate
scripting database for learning will assist Hercules to re-write its own scripts based
on pattern recognition, knowledge acquired via conversation, and corrections to fact.
Current scripts for Hercules are located in the Pattern-Query Table of the databases.
3.19 Fact and Truth Corrections of the Databases
A correction may be made to information via learning, for example where the wrong sense
definition is communicated to a person. Either the user informs Hercules of the
error, or Hercules asks for a correction when a discrepancy is
encountered. The correction is asked for in a manner appropriate to simulate how a
person may discover the actual truth of the circumstances. Any manner of simulation
will disguise the actual methods used. Hercules will personalise the communications
using familiar and accepted terminology. For example, alternate and random phrases
will request the correct information possibly using enthusiastic sounding statements
and humour to assist in engaging the user to provide correct information. Figure 3.20
illustrates how a statement such as “How interesting!” or “hehe!” may emotionally
engage the user to continue discussions. The words in the response of Hercules can
provoke an emotional state in the user where the appearance of emotion is perceived
in Hercules’ statements. It would be interesting to measure the responses by
individuals to different statements made in this manner.
Figure 3.20: Hercules simulates an interesting and engaging manner in
communications with others
3.20 Setting the Database Data
Method SetIsA() is called to set data for * is a *. This allows for a database external
to Wordnet to create a relationship between word 1 and 4 of the statement. The script
for the frame to set the data is “Wordcount 4, Frame SETISA is PRESENT” with a
formula attached as “SETISA 1 is 4.” Please note that any SQL statement may also
set the data; however XML SQL Table Data-adapters are used by the methods.
Methods such as SetHasA() and SetMeans() are constructed in a similar fashion
ensuring the correct information is updated.
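A sketch of how SetIsA() might store a relationship outside Wordnet is given below. It is illustrative only: an in-memory SQLite table stands in for the XML SQL Table Data-adapters mentioned above, and the table name isa and the functions open_store, set_isa and stored_isa are assumptions made for the example.

```python
import sqlite3


def open_store():
    """Create a throwaway in-memory database with one ISA relationship table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE isa (subject TEXT, category TEXT, UNIQUE(subject, category))"
    )
    return conn


def set_isa(conn, discourse):
    """Apply the frame '* is a *': relate word 1 to word 4 of a 4-word statement."""
    words = discourse.rstrip(".").split()
    if len(words) != 4 or [w.lower() for w in words[1:3]] != ["is", "a"]:
        return False  # frame not present, nothing is stored
    conn.execute("INSERT OR IGNORE INTO isa VALUES (?, ?)", (words[0], words[3]))
    conn.commit()
    return True


def stored_isa(conn, subject):
    """Return the categories recorded for a subject, in alphabetical order."""
    rows = conn.execute(
        "SELECT category FROM isa WHERE subject = ? ORDER BY category", (subject,)
    )
    return [row[0] for row in rows]


conn = open_store()
first = set_isa(conn, "Socrates is a man.")
second = set_isa(conn, "Socrates is a man.")  # repeated premise, no duplicate row
```

SetHasA() and SetMeans() could follow the same shape with their own tables and frames.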
3.21 Abstractions of real concepts for Analogy
For an analogy to occur, Hercules must generalise the concepts. The concepts and
subjects are required to be abstracted to a greater degree. Parsing the Hyponym node
hierarchies for category information provides the basis for an abstraction of the
concepts and subjects in question. To make an abstraction of concepts the discourse
is generalised for all senses and hierarchies. Core concept frames allow for this to
happen. Take the statement “Socrates is a man, all men are mortal, therefore Socrates
is mortal.” Figure 3.21 shows the frame that underlies the sentence subjects.
* is a *, all * are *, therefore * is *
Figure 3.21: The subjects removed from a sentence create a frame
It is clear by examination of the frame in figure 3.21 that the subjects have been
removed. Smaller frames already within Hercules are matched to the syntax of the
statement. Please note that the Method SetIsA() as described in section 3.20 is called
to set data for * is a * using scripts similarly to section 3.18. This allows for a
database external to Wordnet to create a relationship between word 1 and 4 of the
statement above.
Relationships are created through critical reasoning and pattern recognition; however
subjects are abstracted using the Wordnet hyponym Hierarchies. The level of
abstraction of a concept depends on the possible abstractions made from the nodes of
the hyponym hierarchy.
[Figure 3.20 content, speaker labels “Hercules” and “Y”: “How interesting!… Do you really like Bacon and Eggs?”; “I like dogs, hehe!”]
Wordnet provides us with the following information:
Socrates
=1> man
man, adult male
=1> male, male person
=2> person, individual, someone, somebody, mortal, soul
=3> organism, being
=4> living thing, animate thing
=5> object, physical object
=6> physical entity
=7> entity
Mortal
=1> Adjective: Definition – “Subject to Death”
Figure 3.22: The hyponym hierarchies forming the abstracted concept with the
definition metadata
A table of possible concepts is created by factorizing the domain category nodes,
where each node of the category of each sense is considered in probability for the
least ambiguity. This means each word of each possible sense is factorized into an
array and each lower node belongs to a new row of the column. Table 3.3 illustrates
the elements the table can comprise, where each column’s category elements are
multiplied by the elements in the next column, providing a 2 x 7 matrix of concept
combinations as shown in figure 3.23. Please note that the senses have not yet been
factorized in Table 3.3, as it is presumed that the correct senses of the words have been
identified for this example using the frame “* is a *” and that the categories are correct for
the sense. An actual implementation would be much more complex than this small
example, requiring the factors of table 3.3 to include the senses of ambiguous discourse
and the probabilities and adjustments for correction based on past, present and
expected input of communications.
=1>Socrates    =1>Male
=2>Man         =2>Person
               =3>Organism
               =4>Living Thing
               =5>Object
               =6>Physical Entity
               =7>Entity
Table 3.3: The hyponym hierarchies for “Socrates is a man” using frame “* is a *”
Table 3.4 shows table 3.3 expanded as a 2 x 7 matrix of concept combinations. For
larger frames, the number of concept combinations grows exponentially. The table can be
reduced to a simplified form where the associated node levels are represented by the
ranges of the nodes in that category.
[0010010101, 10101010, 1-7]
[0100101010, 10101010, 1-7]
[Socrates, sense1, isa, 1-7]
[Man, sense 1, isa, 1-7]
Socrates: 0 Male: 1
Socrates: 0 Person: 2
Socrates: 0 Organism: 3
Socrates: 0 Living Thing: 4
Socrates: 0 Object: 5
Socrates: 0 Physical Entity: 6
Socrates: 0 Entity: 7
man: 1 Male: 1
man: 1 Person: 2
man: 1 Organism: 3
man: 1 Living Thing: 4
man: 1 Object: 5
man: 1 Physical Entity: 6
man: 1 Entity: 7
Table 3.4: Shows the 2 x 7 Matrix of concept combinations of table 3.3
[Socrates: 0] (is a) [male person: 1]
[Socrates: 0] (is a) [person: 2]
[Socrates: 0] (is a) [Organism: 3]
[Socrates: 0] (is a) [Living thing: 4]
[Socrates: 0] (is a) [Object: 5]
[Socrates: 0] (is a) [Physical entity: 6]
[Socrates: 0] (is a) [entity: 7]
[man: 1] (is a) [male person: 1]
[man: 1] (is a) [person: 2]
[man: 1] (is a) [Organism, being: 3]
[man: 1] (is a) [Living thing, animate thing: 4]
[man: 1] (is a) [Object: 5]
[man: 1] (is a) [Physical entity: 6]
[man: 1] (is a) [entity: 7]
Figure 3.24: Shows the (is a) node relationships created by Hercules between the
table elements of table 3.4
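The expansion from table 3.3 to the relationships of figure 3.24 is a Cartesian product of the two columns. A minimal sketch follows, assuming each column is a list of (label, level) pairs; the variable and function names are hypothetical.

```python
from itertools import product

# Column 1 and column 2 of table 3.3 as (label, level) pairs.
subjects = [("Socrates", 0), ("man", 1)]
categories = [
    ("male person", 1),
    ("person", 2),
    ("organism", 3),
    ("living thing", 4),
    ("object", 5),
    ("physical entity", 6),
    ("entity", 7),
]


def concept_combinations(left, right):
    """Pair every element of the first column with every element of the second."""
    return [
        f"[{s}: {i}] (is a) [{c}: {j}]"
        for (s, i), (c, j) in product(left, right)
    ]


combos = concept_combinations(subjects, categories)
for line in combos:
    print(line)
```

With two columns of 2 and 7 elements this yields 14 relationships. A frame with k wildcard positions of n elements each would yield n to the power k, which is why the reduced range form is needed for larger frames.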
The concept ranges can be reduced as shown in figure 3.25, where unique concept
category hierarchies can be recognised. It is likely that a binary representation of the
remaining hierarchy from the second node down in each subject is used to abstract the
concept, depending on the relationship. Otherwise the next node down from the
subject will be the first point of abstraction of the sense. The range of what the
concept will cover in analogy is limited later by experience.
Figure 3.25: Shows the overall categorised and ranged concept in a reduced and
understandable way
Figure 3.26 shows the relationships between the data of figure 3.22. The concept
categories are derived from Wordnet and have had relationships established by
Hercules in prior conversations, where premise A, “Socrates is a man”, and premise
B, “All men are mortal”, have formed the relationships. Abstractions can then be made
once the relationships have been established, supporting the creation of concept
abstractions by Hercules.
“Socrates is a man” “all men are mortal”
(* is a *) (all * are *)
=0>Socrates (is a) =0> man, adult male (is a) =0>Mortal
=1>man =1> male, male person =1>Subject to death
=2> person, individual,
someone, somebody, mortal, soul
=3> organism, being
=4> living thing, animate thing
=5> object, physical object
=6> physical entity
=7> entity
Figure 3.26: Illustrates the relationships created by Hercules using hyponym data
and critical reasoning
Assuming that all statements are logical and true, figure 3.26 shows the
first and second premises accepted and grouped with the hyponym data of Socrates and
man, with the attribute definition of Mortal. However, abstracted concepts of a
premise may not apply in all circumstances, as the abstraction can become too vague.
The validity of an abstracted concept is tested through the request or provision
of information in communications. Given the opportunity, Hercules may say to a
person, “a male person is subject to death, yes?” or “Organism is subject to death,
yes?” and so on. However, where an abstraction of a concept is too general, the
analogy may not apply. A distinction is drawn by Hercules during conversation
where a conflict is noticed.
The conclusion to premises A and B is drawn in “therefore Socrates is Mortal”,
which is taken as true and correct; it is in the abstractions that a distinction may
apply. A person may state “an object is not mortal”, but the truth of premise A must
be maintained for Socrates whilst remaining applicable to other circumstances. The
analogy must be distinguished by category as shown by figure 3.27, where we
understand that not all objects are subject to death. Hercules does not yet have the
experience to know that information until it is proposed in another statement.
* (is a) [male person: 1] [subject to death:1]
* (is a) [person: 2] [subject to death:1]
* (is a) [Organism: 3] [subject to death:1]
* (is a) [Living thing: 4] [subject to death:1]
* (is a) [Object: 5] [subject to death:1] ← Object becomes too abstract to be certain
* (is a) [Physical entity: 6] [subject to death:1]
* (is a) [entity: 7] [subject to death:1]
Figure 3.27: The distinction made to the concept category where the concept
becomes too abstract
The abstraction can be limited above at Object: 5 subject to death when considering a
likely statement by a real person such as “Not all objects are subject to death” or
“[object: rock] is not subject to death.” At this point of the conversation, with the
provision of new information that conflicts with the concept, a distinguished
category may be considered: not all objects are living things subject to death.
Whilst communications do not diverge from the premises, there is no need to consider
any limitation, for example if a person states “My [object: cat] died.” The statement of
the cat dying displays no divergence from or limitation to the established abstractions or
premises in categories 1-7 above. It is only when a conflict arises that a new premise
can be established or limited.
3.22 Limitations on Abstract Concepts
It is noted here that the combination of abstractions allows Hercules to predict and
recognise what other concepts or subjects may be forthcoming. Experience of
communications allows a predicted conclusion to be drawn. The clause “therefore”
is one indication of a predicted conclusion drawn in conversation.
For the example using figure 3.28, please assume the framed concepts of ingestion,
therefore, and died have been firmly established in the core-concepts using predicate
logic, set theory, tenses and the 7 traits of living organisms. The frame of a concept is
shown without the subjects, and below the frame is a collection of concepts that have
been abstracted to a lower level in the concept category node hierarchy for each
subject. As experiences in communication will form and limit the perception of
concepts and the conclusion drawn, consider the “therefore” statement “Socrates
ingested poison therefore he died.” Consider now the probability of Hercules
understanding the statement “Cleopatra ingested poison therefore she died” after
hearing about Socrates ingesting poison. Figure 3.28 shows that where a concept has
been abstracted, the semantic role of the subject is preserved. This allows Hercules to
compare similar sentences more easily for established patterns, thereby
establishing a probability for an expected subject sense or type.
* ingested * therefore * died
[person] ingested [poison] therefore [person] died
Figure 3.28: A frame and concept and abstraction within a given range
Given that the structure of the sentence is similar for both statements about Socrates
and Cleopatra, the subjects can be abstracted towards concepts in common, which
determine both the expected type of data to fit the frame and the limit on the
abstraction of the concept.
The Wordnet Hyponym data for Woman is:
=0> Cleopatra =0> woman, adult female
=1> Woman =1> female, female person
=2> person, individual,
someone, somebody, mortal, soul
=3> organism, being
=4> living thing, animate thing
=5> object, physical object
=6> physical entity
=7> entity
Figure 3.29: The hyponym data for Cleopatra
=0>Socrates (is a) =0> man, adult male
=1>man =1> male, male person
=2> person, individual,
someone, somebody, mortal, soul
=3> organism, being
=4> living thing, animate thing
=5> object, physical object
=6> physical entity
=7> entity
Figure 3.30: The hyponym data for Socrates
When comparing the hyponym data of Cleopatra in figure 3.29 with that of Socrates
in figure 3.30, it can be deduced that the closest point of abstraction for man and
woman is at position 2 of the man and woman Hyponym Data. This position is
related to the shared category of Person, where person also shares the remaining
hyponym categories of a particular word sense. The position is not the indicator;
rather the existence in a particular category is the indicator. In this example [Person]
is the common attribute and shared from [Person] to [Entity] and may indicate a
shared sense if sufficiently identified by a detailed category node hierarchy.
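The comparison can be sketched as a search for the first category present in both hierarchies. This is an illustration only, assuming each hierarchy is an ordered list from most specific to most general (figures 3.29 and 3.30, one label per node); the function names are hypothetical.

```python
MAN = ["man", "male person", "person", "organism", "living thing",
       "object", "physical entity", "entity"]
WOMAN = ["woman", "female person", "person", "organism", "living thing",
         "object", "physical entity", "entity"]


def closest_common_category(chain_a, chain_b):
    """Return the most specific category found in both chains, or None.

    Membership is tested, not position, so the chains may differ in length.
    """
    in_b = set(chain_b)
    for node in chain_a:
        if node in in_b:
            return node
    return None


def shared_categories(chain_a, chain_b):
    """Return every category of chain_a that also occurs in chain_b, in order."""
    in_b = set(chain_b)
    return [node for node in chain_a if node in in_b]


common = closest_common_category(MAN, WOMAN)
shared = shared_categories(MAN, WOMAN)
```

Testing membership rather than comparing indexes reflects the point made above that the existence in a category, not the position, is the indicator.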
To revisit the statements of “Socrates ingested poison therefore he died” and
“Cleopatra ingested poison therefore she died”, the assumption made by Hercules
from the abstraction above provides both an analogy shown in figure 3.31 and also an
expectation and probability that when matching the frame “* who ingest poison will
die” to another sentence, it is likely that the first word of the frame will be of the type
[persons].
“[persons] who ingest poison will die.”
Figure 3.31: An analogy where first subject of comparable discourse is abstracted
Hercules will now expect a [person] will be described in discourse characterized by
common attributes and the Hyponym hierarchy. Hercules is also able to apply the
resulting analogy to other discourse. Other factors can later be explored through
experience and patterns to recognise what the other probabilities of any other types
being present may be.
3.23 Forming Analogies
In order to form an analogy, we must first abstract the concepts of a particular kind as
described in section 3.22. In conversation, we are able to heuristically use
abstractions of a kind as synonyms in language, and still remember what the synonym
pertains to. Consider the following types of thing in figure 3.32: a person, a rock and a cat.
Each type of thing has a hierarchy of the kind of thing.
“Person”            “Rock”                “Cat”
[person: 0]         [rock: 0]             [cat: 0]
[organism: 1]       [natural object: 1]   [feline: 1]
[living thing: 2]   [whole, unit: 2]      [carnivore: 2]
[object: 3]         [object: 3]           [placental: 3]
                                          [mammal: 4]
                                          [vertebrate: 5]
                                          [chordate: 6]
                                          [animal: 7]
                                          [organism: 8]
                                          [living thing: 9]
                                          [object: 10]
Figure 3.32: Hyponym hierarchies provided by Wordnet for person, rock, and cat
In figure 3.32 we can see a type of “person” is both a kind of [organism: 1] and a kind
of [living thing: 2]. A type of “cat” is a kind of [mammal: 4], and a kind of [animal:
7].
We can also heuristically substitute a kind of thing in conversation and still
understand what type the kind relates to. I can talk about the cat as a feline, then the
feline as an animal. Now consider “the animal was fuzzy and her name was sally”. You
can guess that I am still talking about the cat (unless you are tired). Now consider the
frame of that sentence in figure 3.33.
“The * was fuzzy and her name was sally”
“The [cat: 0] was fuzzy and her name was sally”
“The [feline: 1] was fuzzy and her name was sally”
“The [animal: 7] was fuzzy and her name was sally”
Figure 3.33: Heuristic substitution and abstraction using the hyponym hierarchy of a
particular word sense
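The substitution in figure 3.33 can be sketched by filling the wildcard with each level of the hyponym hierarchy in turn. This is a minimal illustration using the cat hierarchy of figure 3.32; substitutions and resolve are hypothetical names.

```python
CAT_CHAIN = ["cat", "feline", "carnivore", "placental", "mammal", "vertebrate",
             "chordate", "animal", "organism", "living thing", "object"]


def substitutions(frame, chain):
    """Fill the first '*' of the frame with each kind in the chain, tagged by level."""
    return [
        frame.replace("*", f"[{kind}: {level}]", 1)
        for level, kind in enumerate(chain)
    ]


def resolve(kind, chain):
    """Map a substituted kind back to the type at level 0, if the kind is in the chain."""
    return chain[0] if kind in chain else None


subs = substitutions("The * was fuzzy and her name was sally", CAT_CHAIN)
```

resolve("animal", CAT_CHAIN) returns "cat", which mirrors the listener still understanding that the animal in the sentence is the cat.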
Analogy is made through the abstraction of concepts. The point of abstraction in a
frame of discourse is similar to a particular semantic role of the subject in the
discourse. Presuming people converse rationally, the word provided by a person at
the point of abstraction should make sense with the remaining discourse. The
Hyponym hierarchy of a new subject at the point of abstraction is presumed to be
valid if provided by a person, and can then be abstracted. The analogy of figure 3.31
resulting from an earlier example of abstraction of concepts involved Socrates and
Cleopatra. The analogy was that:
“[persons] who ingest poison will die.”
The broader the abstraction of concepts, the more subjective the analogy becomes.
The depth of analogy is most sensible at the closest of the more established premises.
That is to say, we are more able to make sense of an analogy where the categories are
more specific. It is easier to distinguish [Socrates: 0] than [organism: 2] or [living
thing: 3], even if terms 2 and 3 both in theory refer to Socrates.
Those words sharing the same categories will qualify the analogy as valid. Those
subjects sharing the same categories can be recognised as the same kind at particular
levels of abstraction using the node hierarchies. If no similarities exist in a category
of subject, there is no relationship with the analogy category, and the analogy
cannot be applied. Figure 3.34 shows the comparison of the hyponym hierarchies,
where Cleopatra and Socrates share a common hierarchy at person, whereas a rock at
first appears to be unrelated in category until the node of object is compared.
=0> Cleopatra   =0> Woman      =0>Socrates   =0> Man        =0>Rock
=1> Woman       =1> Female     =1>Man        =1> Male       =1>Natural Object
                =2> Person                   =2> Person     =2>Whole
                =3>Organism                  =3>Organism    =3>Object
                =4>Living …                  =4>Living …
                =5>Object                    =5>Object
Figure 3.34: Shows the comparison of hyponym hierarchies of Cleopatra, Socrates,
and a Rock
A rock does not share any attribute in the hierarchy above at the level of
[Person: 0], therefore the analogy of person cannot apply to the rock. This is because
the kinds of concept categories shared by Cleopatra and Socrates and their hyponym
hierarchies are too dissimilar from the hyponym hierarchy of the rock at the level of
[Person: 0]. By
taking the abstraction of the analogy further in comparison to [Object: 3], the Rock
may mistakenly fit the analogy by sharing a kind in common. Therefore, in this
example, the analogy may be incorrect. The analogy may be corrected by a statement
from a person or reference to fact.
3.24 Corrections and limitations on Analogy
Below is an example of how Hercules would limit the scope of the analogy. The
analogy templates remain Active (True) until challenged. Consider the frame in
figure 3.35 below which is the frame of the concept analogy of figure 3.31 where the
concept category has been replaced with a wild card to substitute for any word and
sense.
“* who ingest poison will die.”
Figure 3.35: Is the frame of the concept analogy of figure 3.31
Hercules made the higher level analogy resulting in this frame in the earlier example
shown by figure 3.31 using an abstraction of concepts for Socrates and Cleopatra.
Hercules has this information in its memory.
Given that attributes and data structures exist in Hercules similar to the
already reasoned “Socrates is a man, all men are mortal” example, we may broadly
understand the syntax structures of figure 3.36 using the frame and analogy. Plain
text syntax combining concepts and frames can provide a sufficiently understandable
format to be stored within a database to represent frame data. Even if the data stored
in the database is not in exactly the format required, when an administrator of
Hercules is interpreting the concepts and data, they should be presented to the
administrator in a sufficiently understandable format such as in figure 3.36 or 3.37.
* who ingest poison [[will die] : subject to death]
Figure 3.36: Syntax structures of concepts and frames combined
The frame of figure 3.36 and any attribute, such as the definition of a word, may be
further abstracted using the category information as well as ranges, as described in
section 3.21 and by figure 3.25. The result is shown by figure 3.37.
[[Category]: 0-5][who ingest poison [[will die: 1[Attribute: Definition: subject to
death]]].
Figure 3.37: Syntax for concepts and frames with category information included
Having a general concept outline as described by figure 3.37, the concept can now be
limited by another statement of fact. A statement is provided by a person or from
parsing an encyclopaedia. The statement is:
“A rock is not subject to death.”
Figure 3.38: Statement of fact provided by a person
This statement of figure 3.38 can now be used to distinguish the limits of the analogy
concept of figure 3.37. To limit the analogy, consider a distinction in the analogy of
figure 3.37 wherein the Hyponym hierarchy for “Rock” shown in figure 3.39 allows
us to determine the category and kind of distinction. The distinction must be drawn at
the most recognisable level which is at node 3 of figure 3.39 and [object: 3] of figure
3.40. There is a category and kind in common between the rock and the person.
=0> rock, stone
=1> natural object
=2> whole, unit
=3> object, physical object ← Category of distinction in common
=4> physical entity
=5> entity
Figure 3.39: Rock hyponym hierarchy with the object category of distinction
[person: 0] [who ingest poison will die: 1] (Active)
[Organism: 1] [who ingest poison will die: 1] (Active)
[Living thing: 2] [who ingest poison will die: 1] (Active)
[Object: 3] [who ingest poison will die: 1] (Active) ← Analogy is limited
[Physical entity: 4] [who ingest poison will die: 1] (Active)
[entity: 5] [who ingest poison will die: 1] (Active)
Figure 3.40: The expanded concept frame to a table or array of data
This is the limit at which a distinction cannot be drawn for differing subjects of the
analogy sharing a category in common. Where the distinction cannot be drawn, the
analogy hierarchy from then on is not viable and must be Inactive. The entire analogy must
not be removed, as the analogy is correct depending on the subjects. It is only the
limitation through distinction that is the mechanism for applying the analogy and
determining an understanding of the statement.
By traversing down the Hyponym hierarchy forming the abstractions of the analogy,
we are able to determine the actual category, either verbatim or via synonyms.
The point of distinction, when discovered, then limits the scope of the analogy.
Given that only one sense of the word “Rock” is of the particular
category intended by the person, and that this is known, with a distinguished word-sense and
unique Hyponym hierarchy Hercules can draw a distinction limited to living things: 2
in the concept table shown in figure 3.40. The analogy is now limited to the correct
level of abstraction for it to remain valid.
IsA IsA
[Person: 0] [who ingest poison will die: 1] (Active)(Weight: x%)
[Organism: 1] [who ingest poison will die: 1] (Active) (Weight: x%)
[Living thing: 2] [who ingest poison will die: 1] (Active) (Weight: x%)
[Object: 3] [who ingest poison will die: 1] (Inactive) ← no longer viable analogy
[Physical entity: 4] [subject to death: 1] (Inactive) ← no longer viable analogy
[Entity: 5] [who ingest poison will die: 1] (Inactive) ← no longer viable analogy
Figure 3.40: Illustrates the distinction drawn from the user input of figure 3.38 will
deactivate categories of the analogy and concept
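The deactivation step can be sketched as below. This is an illustration under assumptions: the analogy table is a list of rows ordered from specific to general, the conflicting subject is described by its own hyponym chain (figure 3.39), and new_analogy_table and limit_analogy are hypothetical names. Weights are omitted.

```python
ANALOGY = "who ingest poison will die"
PERSON_CHAIN = ["person", "organism", "living thing", "object",
                "physical entity", "entity"]
ROCK_CHAIN = ["rock", "natural object", "whole", "object",
              "physical entity", "entity"]


def new_analogy_table(chain, analogy=ANALOGY):
    """Expand the analogy over every category level; all rows start Active."""
    return [
        {"category": category, "level": level, "analogy": analogy, "active": True}
        for level, category in enumerate(chain)
    ]


def limit_analogy(table, conflicting_chain):
    """Deactivate rows from the first category shared with a conflicting subject.

    Rows are kept rather than deleted, since the analogy stays valid for
    subjects in the more specific categories.
    """
    conflicting = set(conflicting_chain)
    limit = next(
        (row["level"] for row in table if row["category"] in conflicting), None
    )
    if limit is not None:
        for row in table:
            if row["level"] >= limit:
                row["active"] = False
    return table


# "A rock is not subject to death" conflicts with the analogy.
table = limit_analogy(new_analogy_table(PERSON_CHAIN), ROCK_CHAIN)
active = [row["category"] for row in table if row["active"]]
```

After the call, the rows for person, organism and living thing remain Active, matching the limit at living thing: 2 described in the text.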
Figure 3.41 below illustrates the overall picture of the upper and lower limits of
analogy in context with relationships and attributes. ISA relationships are shared in
common between Socrates and Cleopatra allowing the upper limit to be established.
The lower limit is distinguished when facts limit the level before the limit allows
extraneous attributes to conflict, indicating a different kind of concept below the limit.
Analogy Upper Limit
=2> person, individual,
someone, somebody, mortal, soul
=3> organism, being
=4> living thing, animate thing
Analogy Analogy Lower Limit
Frame is:
[ [[Person(s)] to [Living Organisms] except [Object]] ingest Poison will die ]
Figure 3.41: The upper and lower limits of analogy in context with relationships and
attributes
[Figure 3.41 labels]
=0>Socrates =1>Man IS: Mortal
=0>Cleopatra =1>Woman IS: Mortal
=0>Rock =1>Object NOT: Mortal
Ideally the analogy frame’s upper limit in figure 3.41 can be illustrated by figure 3.42, and
can be determined by repeated patterns recognised in sentence discourse, which then
establish an analogy abstraction. Given that subject X and subject Y are of Type I,
any element E not shared by X or Y above I establishes the upper limit of the analogy
below E.
Analogy Upper Limit
Figure 3.42: The Analogy Upper Limit
Ideally the analogy frame’s lower limit in figure 3.41 can be illustrated by figure 3.43, and
can be determined by using the Wordnet hyponym hierarchies (HH) in making a
comparison of a new subject Y with an established analogy abstraction. Given that
subject X accepts element E at all levels, and subject Y refutes E at a common
intersection I, the lower limit of the analogy is established above I, and the analogy
remains valid in the section indicated by the “pass”.
Figure 3.43: The Analogy Lower Limit
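A minimal sketch of the lower limit follows, under the same assumption that each hierarchy is a list ordered from the most specific category to the most general. The function name and the chains are hypothetical. The hierarchy of the accepting subject X is split at the first category it shares with the refuting subject Y: everything more specific than that intersection I stays active, while I and every more general category is deactivated.

```python
from typing import List, Sequence, Tuple


def analogy_lower_limit(chain_x: Sequence[str],
                        chain_y: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Split chain_x at the first category shared with the refuting chain_y.

    Returns (active, inactive). Categories of chain_x that are more
    specific than the intersection I remain active; I itself and all
    more general categories become inactive.
    """
    refuting_categories = set(chain_y)
    for index, category in enumerate(chain_x):
        if category in refuting_categories:
            return list(chain_x[:index]), list(chain_x[index:])
    return list(chain_x), []  # no intersection, nothing is deactivated


# Hypothetical chains following HH1 and HH2 of figure 3.43.
hh1 = ["person", "organism", "living thing",
       "object", "physical entity", "entity"]
hh2 = ["rock", "object"]

active, inactive = analogy_lower_limit(hh1, hh2)
print("active:", active)
print("inactive:", inactive)
```

With these chains the intersection I is "object", so "person", "organism" and "living thing" stay active, which matches the Active and Inactive rows of figure 3.40.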
3.25 Truth and Weight in analogy
To reach an understanding of the discourse, a weight must be added to the analogy
data. The weight supports a determination through probability, and it is increased
each time a pattern is identified as being repeated. Repetition confirms that a
communication structure is valid even if its meaning is unknown: because a person
has provided the sentence, it is assumed that there is valid logic behind the pattern.
The probability then provides the context once a meaning can be extracted via
experience, statistics, and the frame context signatures described in section 3.61.
The truth weightings are attached as metadata to the frame as circumstances dictate.
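The idea that a weight grows as a pattern is repeated can be sketched as follows. The formulas Hercules will use for weighting have not been designed, so the relative-frequency measure here is purely an assumption made for illustration, as are the class name and the example patterns.

```python
from collections import Counter


class PatternWeights:
    """Tracks how often each pattern has been observed in the discourse."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def observe(self, pattern: str) -> None:
        """Record one more occurrence of a pattern."""
        self._counts[pattern] += 1

    def weight(self, pattern: str) -> float:
        """Return the pattern's share of all observations, as a percentage.

        Assumption: relative frequency stands in for the undefined
        Hercules weighting formula.
        """
        total = sum(self._counts.values())
        if total == 0:
            return 0.0
        return 100.0 * self._counts[pattern] / total


weights = PatternWeights()
for _ in range(3):
    weights.observe("[Person] ingest poison will die")
weights.observe("[Object] ingest poison will die")

print(weights.weight("[Person] ingest poison will die"))
```

A pattern seen three times out of four observations receives a weight of 75%, and a pattern that has never been seen receives 0%.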
[Figure 3.42 content]
=X> Socrates; =E1> Man; =I> Person
=Y> Cleopatra; =E2> Woman; =I> Person
=I> Person; =O> Organism; =S> …
[Figure 3.43 content]
HH1 (IS: Mortal): =X2> person; =X3> organism, being; =X4> living thing; =X5> I: Object; =X6> physical entity; =X7> entity
HH2 (E: NOT: Mortal): =Y0> Rock; =Y1> I: Object
Analogy Lower Limit
Other factors that assist with the weighting of the frame are also stored in the
database tables; however, the formulas that use these statistics have not yet been
designed. The following frame table information can be used in any script or
algorithm. A more detailed explanation of its use is left to future work, but the
table information already provides a framework for expanding the capabilities of the
parser.
Frame-Data: contains the Text Frame or Analogy Frame
Reference: allows a pointer memory address to be stored in the database
Category: is the bit defined category of the context or domain
Order: is the order of the frame in the linked list
Weight: is the weight attributed to the frame
Transform: is the category level at which the frame transforms to a new category
Threshold: is the limit to where the frame is active and relevant
Relationship: is a flexible attribute with no specific definition, used as required
Concepts: provides bit defined categories of the underlying and interrelated concepts
in the concept database, which may fit to the frame
Concept-Mask: assists as a filter to incoming concepts to the frame
Tense: assists in identifying bit defined Tense categories, and the tense related
concepts
Tense-Mask: assists as a filter to incoming tense concepts to the frame
Extensions: acts as a bit defined placeholder for expansion of the frame table
information
Extension-Mask: assists as a filter for the bit defined extensions
Data-link: provides a link to external resources of data relevant to the frame
Reversion: tracks the earlier versions from which the current frame has been
transformed
Exclusions: provides a list of exceptions to the current frame being activated for a
particular category
Links: allows the frame to be linked to another frame
Formula: is the formula or script attached to the frame
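As an illustration only, a subset of the frame table fields listed above could be modelled as a record like the one below. The field types, the bit flag values, and the two helper methods are assumptions rather than the Hercules schema. In particular, treating a frame as active when its weight reaches its threshold, and treating the concept mask as a bitwise AND filter, are guesses at how these fields might be used.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical bit-defined concept categories.
CONCEPT_MORTALITY = 0b001
CONCEPT_INGESTION = 0b010
CONCEPT_GEOLOGY = 0b100


@dataclass
class FrameRecord:
    """A partial, hypothetical model of one row of the frame table."""
    frame_data: str                # Text Frame or Analogy Frame
    category: int = 0              # bit-defined context or domain
    order: int = 0                 # position in the linked list
    weight: float = 0.0            # weight attributed to the frame
    threshold: float = 0.0         # limit below which the frame is inactive
    concepts: int = 0              # bit-defined related concepts
    concept_mask: int = 0          # filter for incoming concepts
    exclusions: List[int] = field(default_factory=list)
    formula: Optional[str] = None  # formula or script attached to the frame

    def is_active(self) -> bool:
        """Assumed rule: active once the weight reaches the threshold."""
        return self.weight >= self.threshold

    def accepts_concept(self, concept_bits: int) -> bool:
        """Assumed rule: accept a concept if any of its bits pass the mask."""
        return (concept_bits & self.concept_mask) != 0


frame = FrameRecord(
    frame_data="[[Person(s)] to [Living Organisms] except [Object]] "
               "ingest Poison will die",
    weight=60.0,
    threshold=50.0,
    concepts=CONCEPT_MORTALITY | CONCEPT_INGESTION,
    concept_mask=CONCEPT_MORTALITY | CONCEPT_INGESTION,
)

print(frame.is_active())
print(frame.accepts_concept(CONCEPT_INGESTION))
print(frame.accepts_concept(CONCEPT_GEOLOGY))
```

Storing categories as bit fields keeps the mask test to a single integer operation, which is presumably why the table pairs Concepts, Tense and Extensions with a matching mask column.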
Section 4: Experimentation, Results and Analysis
As Hercules is still at the prototype stage, most experiments, analysis and results
are left to future work. Hercules has been designed to run experiments on patterns
of discourse and to store the resulting data in a database. The results of those
experiments will then supplement the context provided by the user, within which the
discourse is to be understood. It is anticipated that neural-network-style learning
using the scores, signatures, concept fragments and wave analysis will supplement
this understanding. Because of the breadth and flexibility Hercules provides, it is
outside the scope of this report to examine specific formulae and the results they
may produce. It is sufficient to say that Hercules allows user-defined functions and
symbols to be mapped to software #defined values, which gives flexibility in
executing hard-coded methods to support the experiments envisaged.
Section 5: Future Work
5.1 Algorithm Design
As described earlier in section 3.61, Frame Queries and Statistics, the approach of
A hybrid Approach to Word Sense Disambiguation: Neural Clustering with Class
Labelling (section 2.6) may be used in conjunction with A Generative Model for
Semantic Role Labelling (section 2.7). Role labelling would be supplemented by the
speech patterns identified by the Frame Queries and Statistics of section 3.61, for
the abstraction of concepts in section 3.9, to assist in the understanding of shared
communications and goals such as those in section 3.10.
5.2 Hercules Parser Enhancements
Hercules must be augmented to handle many of the logic functions and elements
used in Set Theory and Predicate Logic. This will allow formal logic algorithms to
be used alongside mathematical equations on subsets of data, to control the
interpretations and responses of the Hercules Parser. The graphical user interface
must be enhanced to streamline manual enhancements.
The uses of Frame and Table metadata must be further defined and applied alongside
the script functions and operations, which will assist in reaching the goal of true
artificial intelligence.
5.3 Reaching the goal of True Artificial Intelligence
Ultimately, the Hercules Parser will likely construct its own scripts automatically, to
deal with new communications and concepts.
The accuracy of the hypotheses Hercules forms about the communications, and of its
replies to them, may be the true test of the intelligence of the machine.
Section 6: Concluding Remarks
The Hercules parser has been created with so much flexibility in mind that it is
difficult to discuss the entire program and all functions it is capable of. I have aimed
this exposition at the most interesting and important components of the parser used
with Wordnet. The Hercules parser provides a simple way to design, test, and
implement Artificial Intelligence algorithms, formulae and scripts through a
graphical interface, without requiring its users to have a detailed knowledge of
computer programming.