distributed representations in ai: building the world model and analogical reasoning
1
Distributed representations in AI: Building the world model and analogical reasoning
The slide number appears in the lower right corner.
Dr. Dmitri Rachkovskij
Dept. of Neural Information Processing Technologies
International Research and Training Center of Information Technologies and Systems, Kiev, Ukraine
National Academy of Sciences of Ukraine
2
Distributed representations in AI: Building the world model and analogical reasoning: Plan
1. Intro: The world model, its content and organization
   The world model: models of attributes, objects, relations, episodes
   Part-whole hierarchy
   Classification hierarchy
2. The world model based on distributed representations
   Symbolic, local, and distributed representations
   General architecture of the world model
   Representation and processing of simple structures
   Representations of sequences and episodes
3. Analogical reasoning
   Analogy and its research and modeling by cognitive psychologists
   Modeling of analogy with distributed representations
3
The Agent's internal world model is the system of knowledge about the domain and about the Agent itself. It is necessary for the organization of intelligent behavior.
The world model stores:
- episodes and situations encountered by the Agent
- reactions to situations, evaluations of results, etc.
The world model is used for recognition, analysis, prediction, reaction, etc.
[Diagram: the AGENT's BRAIN contains the World MODEL; the Agent interacts with the WORLD over time (t0 < t1 < t2)]
4
Models of attributes, objects, relations, situations, ...
Content of models: the appearance, structure, behavior, etc. of objects
Models of attributes: black, furry, barking, big, four-legged, ...
Objects, real or ideal:
- physical bodies (table)
- animals (dog)
- unreal (centaur)
Episodes and situations: many objects and relations (hunting, war, ...)
Relations:
- spatial (above), temporal (after), part-whole (part-of)
- a relation R(X, Y, ...) requires several objects X, Y, ...
- undirected (X and Y are neighbors)
- directed (X above Y; X sold Y a book)
- arguments of directed relations have roles (agent X, object Y, etc.)
5
Compositional structure of the world model: part-whole relations and hierarchy
Models are interrelated. The model of an object (car) is associated with the models of its parts (body, motor, wheels) and of its attributes (color, etc.).
The division into objects and attributes is not absolute. The model of a part (e.g., wheel) can have its own model-parts (tire, rim, cap) and attributes (shape: ring; texture: "protector" tread; color: black; material: rubber).
[Diagram: part-whole hierarchy. The model-whole "car" links to model-parts (body, motor, wheel, ...); the model-whole "wheel" links to model-parts (tire, rim, cap, ...) and to attribute models, both structural attributes and attributes of appearance: color (black), texture ("protector" tread), shape (ring)]
6
Model of a situation
[Diagram: a situation consists of events; events are built from objects and relations]
Approach(dog, ball)
Bit(dog, ball)
Explode(ball)
"A dog approached a ball, the dog bit the ball, the ball exploded"
A model-whole may include (associate) model-parts of goals, actions, costs, evaluations, feelings, etc.
The part-whole hierarchy is also called "meronymy-holonymy", "aggregate", "modular", "compositional", or "structural".
7
Examples of hierarchical structures
- Logical or symbolic propositions
- Patterns in structural or syntactic recognition
- Complex chemical substances
- Proteins in molecular biology
- Computer programs
- Knowledge bases
- etc.
[Diagram: examples include the nested expression F(a, f(y), f(y, F(a, b))); a chemical structure with N atoms, an NH2 group, and substituents X1, X2, X3; and strings over the symbols a and b]
8
Classification structure of the world model: is-a relations and hierarchy
Models of classes are combinations of attribute models.
The is-a relation (a cat is an animal) is also hierarchical: there exist more abstract (general) and less abstract (specific) classes.
[Diagram: classification tree. ANIMALS branch into DOGS and CATS; DOGS branch into CHOW-CHOWS and SPANIELS; the spaniel FIDO is an instance]
9
Operations with class models
Classes allow transferring experience to new objects and situations and making predictions. E.g., similar-looking objects often behave similarly.
[Diagram: a model of an object or class links observable attribute models (appearance) with unobservable ones (structure, behavior, ...). Recognition goes from observable attributes to the model; prediction goes from the model to the unobservable attributes]
10
Available "world models"
World models ~ knowledge bases ~ ontologies are beginning to be used in diverse applications:
- CYC, a general ontology for commonsense knowledge (Lenat and Guha)
- UMLS (Unified Medical Language System), an ontology of medical concepts (Humphreys and Lindberg)
- WORDNET, one of the most comprehensive lexical ontologies (Miller)
- etc.
11
Representation of information in world models
Representation schemes used in ontologies:
Conceptual graphs
Semantic networks
Frames
Prolog predicates
Other special representation languages
All these representation schemes are based on traditional symbolic and local representations of information, and they share those representations' drawbacks.
12
Distributed representations
(1) immediately reflect the degree of similarity
(2) provide high information capacity
and allow
(3) the use of associative memory
(4) the formation of part-whole hierarchies
(5) the modeling of analogical reasoning
13
Local and symbolic implementations of models

Representation of information in a computer (a table of records):
address | field1  | field2 | field3   | field4   | ...
1       | name_A  | "A"    | --       |          | ...
2       | name_B  | "B"    | --       |          | ...
3       | name_AB | "AB"   | address1 | address2 | ...
...     | ...     | ...    | ...      | ...      | ...
N       |         |        |          |          |

Local representations ("1 of N" coding):
A 100000...0...
B 010000...0...
C 001000...0...
...
Z 000000...1...

Pyramidal networks:
[Diagram: a resource pool of units #1 ... #z; each entity A, B, C, ..., Z and each combination (AB, AC, ...) is assigned its own resource unit, so a codevector activates a single unit]

Bracketed representations: ((A B) (A C D)) ... (...)

Symbolic representations:
entities: (sun planet)
expressions: ((mass sun) :name mass-sun), ((mass planet) :name mass-planet), ((greater mass-sun mass-planet) :name ...)
14
Problems with symbolic and local representations
All-or-none similarity: models are either identical or dissimilar.
A is identical to A; A is non-identical to B, C, ..., Z.
A is a pointer to ((B C) ((B D) (X Y Z))) ...
Information capacity: with "1 of N" coding, N units represent at most N models.
[Diagram: local units A, B, C and their combinations (AB, AC, BC) require a separate unit for every conjunction: AB&AC, AB&BC, AB&BA, AB&CA, AC&AB, AC&BA, AC&CA, AC&BC, BC&BA, BC&CA, BC&AB, BC&AC, ...]
15
Problems with symbolic and local representations
Comparison and estimation of the similarity of complex structured models is very complex: comparing graphs requires finding a partial isomorphism.
[Diagram: two graphs X and Y to be matched]
16
Problems with symbolic and local representations
Graph isomorphism is not enough to justify the similarity estimates made by humans:
(1) The fascists invaded France, causing the people to flee France
(2) Rats infested the apartment, causing the people to leave the apartment
(3) The game show host kissed the contestant, inviting the audience to applaud the contestant
Abstract structure of the 3 sentences: Relation12[Relation1(X, Y), Relation2(Z, Y)]
Correspondences due to the abstract scheme:
- the fascists / the game show host (X) ??
- France / the contestant (Y) ??
- invaded / kissed (Relation1) ??
- the people / the audience (Z) ??
- to flee / to applaud (Relation2) ??
17
Distributed representations
In distributed representations, any object is represented by a distributed pattern of activity over the units.
Resource pool / codevectors:
A 010101...0
B 010001...0
C 001000...1
...
Z 001000...0
Long binary sparse stochastic codevectors:
- binary (elements 0 and 1)
- number of elements N ~ 100,000
- number of 1s M ~ 1,000 << N, i.e., density p = M/N = 1%
- the 1s occupy (pseudo)random positions
- expected overlap of two random codevectors: O = p*p*N = M*M/N = 10 elements (1% of M)
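The codevector parameters above can be checked with a minimal sketch (illustrative only, not the APNN implementation): generate two sparse binary stochastic codevectors and compare their overlap with the expected value O = M*M/N.

```python
# Sketch: long sparse binary codevectors and their chance overlap.
# A codevector is represented as the set of positions of its 1s.
import random

N = 100_000   # codevector length (from the slide)
M = 1_000     # number of 1s; density p = M/N = 1%

def random_codevector(n=N, m=M):
    """Return m (pseudo)random 1-positions of an n-element binary vector."""
    return set(random.sample(range(n), m))

a = random_codevector()
b = random_codevector()

overlap = len(a & b)      # number of coinciding 1s
expected = M * M / N      # O = p*p*N = M*M/N = 10
print(overlap, expected)  # the measured overlap fluctuates around 10
```

Representing a codevector as the set of its 1-positions is a natural choice for such sparse vectors: set intersection directly gives the overlap.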
18
Advantages of distributed representations
Efficient use of resources for information representation: up to C(N, M) items (compare with N items for local representations).
Natural representation and estimation of similarity. The similarity of X and Y is calculated by the dot product:
S(X, Y) = |X & Y| = SUM_i Xi*Yi, i = 1, ..., N
For binary vectors, the dot product equals the number of overlapping 1s.
[Diagram: example binary codevectors A and B; dissimilar pairs overlap at S ≈ 0.01, similar pairs at S ≈ 0.5, and identical vectors give S = 1]
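The dot-product similarity measure from this slide can be written directly (a small sketch with short example vectors, not the full-length codevectors):

```python
# S(X, Y) = |X & Y| = sum_i X_i * Y_i: for binary vectors this is
# exactly the number of overlapping 1s.
def similarity(x, y):
    """Dot product of two binary vectors given as 0/1 lists."""
    return sum(xi * yi for xi, yi in zip(x, y))

A = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]
B = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0]

print(similarity(A, A))   # 3: a vector overlaps itself completely
print(similarity(A, B))   # 2: A and B share two 1s
```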
19
Distributed representations
(1) immediately reflect the degree of similarity
(2) provide high information capacity
and allow
(3) the use of associative memory
(4) the formation of part-whole hierarchies
(5) the modeling of analogical reasoning
20
Some history
Academician N.M. Amosov founded the Dept. of Biocybernetics at the Institute of Cybernetics, Kiev, in the 1960s.
Modeling of Thinking and the Mind, Spartan Books, USA, 1967.
M-networks:
Amosov, N.M., Basilevsky, E.B., Kasatkin, A.M., Kasatkina, L.M., Luk, A.N., Kussul, E.M., & Talayev, S.A. (1972). M-network as a possible basis for construction of heuristic models. Cybernetica, 3, pp. 169-186.
1974: the first-in-the-world autonomous vehicle controlled by neural networks; the vehicle could move in a natural environment.
Amosov, N.M., Kasatkin, A.M., & Kasatkina, L.M. (1975). Active semantic networks in robots with an autonomous control. Proc. of the Fourth Intern. Joint Conference on Artificial Intelligence, v. 9, pp. 11-20.
Amosov, N.M., Kussul, E.M., & Fomenko, V.D. (1975). Transport robot with a neural network control system. Advance papers of the Fourth Intern. Joint Conference on Artificial Intelligence, v. 9, pp. 1-10.
21
Associative-Projective Neural Networks
APNNs were proposed by Dr. Kussul in 1983.
E. Kussul (1992). Associative neuron-like structures.
T. Baidyk (2002). Neural networks and problems of Artificial Intelligence.
Amosov, Baidyk, Goltsev, Kasatkin, Kasatkina, Kussul, Rachkovskij. Neurocomputers and Intelligent Robots (a Spanish translation is in preparation).
Donald Hebb (1949). Organization of Behavior.
22
The APNN architecture of the world model based on distributed representations
[Diagram: a parallel scheme of modules. Modules for sensory attributes (shape, texture, color, size), fed by vision, hearing, touch, olfaction, and taste, are connected to object and name modules through buffer fields (BUF) and an associative field (ASC)]
A module M includes buffer fields (BUF) and an associative field (ASC). BUF fields perform bitwise operations; ASC returns the most similar codevector from its memory.
23
The APNN architecture: associative field
[Diagram: auto-associative memory. Storage: vectors A, B, C, ..., XY (L vectors in all) are written into the memory. Retrieval: a distorted vector A' retrieves the stored vector A]
24
The APNN architecture: associative field
Local version: the L stored vectors A, B, C, ..., XY are kept as separate N-element rows. Retrieval returns the stored vector with the maximal dot product with the query A'.
Distributed version (Hopfield-type): storage accumulates the L vectors in an N x N matrix. Retrieval of A from the query A' is a vector-matrix product followed by thresholding.
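The "local version" of the associative field can be sketched in a few lines. This is a minimal illustration under assumed toy sizes, not the APNN code: stored codevectors are kept as separate items, and retrieval picks the one with the maximal overlap with a distorted query.

```python
# Sketch of retrieval by maximal dot product from an associative field.
import random

N, M = 1000, 30   # toy sizes for the sketch

def random_cv():
    """A sparse binary codevector as the set of its M 1-positions."""
    return set(random.sample(range(N), M))

memory = [random_cv() for _ in range(50)]   # the stored codevectors

def retrieve(query):
    """Return the stored codevector with the maximal overlap with the query."""
    return max(memory, key=lambda v: len(v & query))

A = memory[7]
a_prime = set(random.sample(sorted(A), 25))  # A' = A with 5 of its 1s lost
restored = retrieve(a_prime)
print(restored == A)   # the distorted query still retrieves A
```

Because chance overlap between random codevectors is only about M*M/N (here under one bit), even a substantially distorted query remains far more similar to its original than to any other stored vector.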
25
A simple parallel scheme of construction and processing of part-whole structures
[Diagram: modules M1, M2, M3 hold the codevectors A, alpha, and the whole <A v alpha>]
Codevectors: A, B, C, ... and alpha, beta, gamma, ...
Construction by superposition (disjunction) & thinning:
|<A v alpha> & A| = 0.5|A| = 0.5|alpha|
<...> denotes thinning (grouping).
Search for the most similar model: A' is similar to A; alpha' is similar to alpha.
Decoding of the model-whole through its parts.
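A hedged sketch of superposition & thinning: the codevector of a whole is the disjunction (union of 1s) of its parts' codevectors, thinned back to the base density M. The random thinning used here is a simplification; APNNs use Context-Dependent Thinning, which selects structure-dependent subsets rather than random ones.

```python
# Sketch: <A v alpha> as thinned superposition; each part then keeps
# about half of its 1s in the whole, so |<A v alpha> & A| ~ 0.5|A|.
import random

N, M = 10_000, 100

def random_cv():
    return set(random.sample(range(N), M))

def thin(v, m=M):
    """Keep a random m-element subset, restoring the base density."""
    return set(random.sample(sorted(v), min(m, len(v))))

A, alpha = random_cv(), random_cv()
whole = thin(A | alpha)            # <A v alpha>

print(len(whole & A) / M)          # ~0.5, as in |<A v alpha> & A| = 0.5|A|
```

With two parts, the superposition has about 2M active bits; thinning it back to M keeps each part's bits with probability about 1/2, which is what keeps the whole similar to each of its parts while the density stays constant.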
26
A simple parallel scheme of construction and processing of part-whole structures
[Diagram: modules M1, M2, M3 exchange the codevectors A and alpha]
Retrieval of the model-whole by its model-part.
Association between two models-parts.
Interpretations of the pair A-alpha: attribute-attribute, part-part, object-object, variable-value, situation-reaction, situation-evaluation.
27
Architecture of the world model using distributed representations
[Diagram: a "parallel" module scheme for sensory attributes (shape, texture, color, size; vision, hearing, touch, olfaction, taste), object and name modules, plus a "vertical" module scheme]
28
Representation of sequences and hierarchical episodes
( ,(),(,(,)) ) is a labeled ordered acyclic graph.
Order is marked by binding each element to its position: AB vs BA; X>>n (n is the position number); A>>1 B>>2 vs B>>1 A>>2.
[Diagram: hierarchical episode graph: cause over bite and flee; spot (dog, id_spot) and jane (human, id_jane)]
spot = dog id_spot; jane = human id_jane
BITE = 1bite spot jane spot>>agent jane>>object
FLEE = 1flee spot jane jane>>agent spot>>object
P = 2cause BITE FLEE BITE>>cause_antc FLEE>>cause_cnsq
"Spot bit Jane, causing Jane to flee from Spot"
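The slide distinguishes AB from BA by tagging each element with its position (A>>1 B>>2 vs B>>1 A>>2). A minimal sketch, assuming the ">>n" order marker can be modeled as a cyclic shift of the codevector by n positions; this is one common way to implement such markers, and the actual APNN scheme may differ in detail.

```python
# Sketch: order markers via cyclic shift of sparse binary codevectors.
import random

N, M = 10_000, 100

def random_cv():
    return set(random.sample(range(N), M))

def shift(v, n):
    """Cyclically shift every 1-position by n: the '>>n' position marker."""
    return {(i + n) % N for i in v}

A, B = random_cv(), random_cv()

AB = shift(A, 1) | shift(B, 2)   # codevector of the sequence A, B
BA = shift(B, 1) | shift(A, 2)   # codevector of the sequence B, A

# Without markers the two sequences would be indistinguishable (A|B == B|A);
# with markers, AB and BA overlap only at the chance level.
print(len((A | B) & (B | A)))    # full overlap: order is lost
print(len(AB & BA))              # near-chance overlap: order is preserved
```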
29
Properties of distributed representations of complex hierarchical models
- Codevectors of parts and wholes have the same dimensionality.
- The codevector of a model-whole is similar to the codevectors of its model-parts.
- Similar (with respect to objects and relations) hierarchical models have similar codevectors.
- Associations (between parts and wholes, attributes and objects, attributes and classes, objects and situations, etc.) are made by the similarity of codevectors, not by connections or pointers as in local and symbolic representations.
30
Distributed representations
(1) immediately reflect the degree of similarity
(2) provide high information capacity
and allow
(3) the use of associative memory
(4) the formation of part-whole hierarchies
(5) the modeling of analogical reasoning
31
Comparison of hierarchical structures and analogical reasoning
The ability to easily estimate the similarity of complex structured representations is essential for many AI problems.
One interesting problem is the modeling of analogical reasoning:
Gentner & Markman (1995, 1997, 2003); Hummel & Holyoak (1997); Eliasmith & Thagard (2001)
Analogy and metaphor: a comparison process that allows considering one domain from the point of view of a different domain (Gentner & Markman 1995, 2003).
Rutherford: the solar system as a model of the atom.
32
Similarity of analogs
Analogs are hierarchically structured episodes or situations.
Analogs are compared not only by "surface similarity" (common or similar elements: objects, relations, etc.).
"Structural similarity" is also very important: how the elements are grouped within the analogs ("structural consistency", "isomorphism").
[Diagram: analog X matched against analog Y]
33
3 stages of analogy processing
1. Access (retrieval, recall): finding in memory the base analog most similar to a given target (input, cue, probe) episode.
2. Mapping: finding the correspondences between the elements of the two analogs.
3. Inference: drawing inferences about the target analog based on information from the base.
[Diagram: the target episode is compared against the base episodes 1...N stored in memory; episodes X and Y are candidates for retrieval]
34
Analogy in everyday life
- Solution of problems: the base analog is used as a source of ideas about the target problem.
- Explanations: the base analog is used for understanding the target analog.
- Formation and evaluation of hypotheses.
- Justification of a point of view (in political, historical, etc. discussions).
- In literature. Etc.
35
Analogy in solving problems
Target analog: heat flow. Base analog: water flow.
[Diagram (Gentner's water-flow/heat-flow example): in the base, GREATER(pressure(beaker), pressure(vial)) CAUSEs FLOW(beaker, vial, water, pipe); attributes such as the liquid's flat top and the diameters do not map. In the target: hot coffee, an ice cube, and heat flow. Legend: relation or attribute, function, entity, match hypothesis]
36
Study of analogy in humans: various types of analog similarity. Episodes with animals.
Example episodes adapted from Thagard, Holyoak, Nelson & Gochfeld (1990).
General scheme: R0(R1(X, Y), R2(Y, X)). The episodes have the same relations as the Probe, but various types of similarity.

Similarity type                 | Episode
Probe (P)                       | Spot bit Jane; Jane fled from Spot
Literal Similarity (LS)         | Fido bit John; John fled from Fido
Cross-Mapping (CM)              | Fred bit Rover; Rover fled from Fred
Surface Features (SF)           | John fled from Fido; Fido bit John
Analogy (AN)                    | Mort bit Felix; Felix fled from Mort
1st-Order Relations only (FOR)  | Mort fled from Felix; Felix bit Mort
37
Modeling analogical reasoning with distributed representations
Access to analogical episodes: similar structures have similar codevectors, and the total similarity of structures is evaluated by the overlap of their codevectors.
Access to an analogical episode is done by finding in long-term memory the codevector most similar to the codevector of the input episode.
[Diagram: the codevector of the target episode is compared against the codevectors of the base episodes 1...N in memory; episodes X and Y are candidates]
38
Access to analogical episodes: episodes with animals
Humans demonstrate the following pattern when retrieving analogs from long-term memory: LS > CM, SF > AN > FOR
(Forbus, Gentner & Law 1995; Ross 1989; Wharton, Holyoak et al. 1994)

Similarity values between the codevectors of episodes (our model):
Type | Episode                                | Similarity
P    | Spot bit Jane; Jane fled from Spot     | 1.00
LS   | Fido bit John; John fled from Fido     | 0.40
CM   | Fred bit Rover; Rover fled from Fred   | 0.30
SF   | John fled from Fido; Fido bit John     | 0.24
AN   | Mort bit Felix; Felix fled from Mort   | 0.14
FOR  | Mort fled from Felix; Felix bit Mort   | 0.09
Chance overlap is in the range 0.005-0.008.
39
Mapping analogs with distributed representations
Mapping (interpretation) of an analogy means finding the corresponding elements of the analogs.
In our model, many analogs can be mapped by the direct similarity of the codevectors of their elements.
40
[Diagram: hierarchical episode graph: cause over bite and flee; spot (dog, id_spot) and jane (human, id_jane)]
spot = dog id_spot; jane = human id_jane
BITE = 1bite spot jane spot>>agent jane>>object
FLEE = 1flee spot jane jane>>agent spot>>object
P = 2cause BITE FLEE BITE>>cause_antc FLEE>>cause_cnsq

Mapping by similarity: overlaps of the AN episode's element codevectors (rows) with the Probe's element codevectors (columns); corresponding elements (the diagonal) have the highest overlap.

                 | Probe | Antc | Cnsq | Bite | Flee | bite_a | bite_o | flee_a | flee_o
E_AN   (level#4) | 0.25  | 0.19 | 0.19 | 0.08 | 0.08 | 0.06   | 0.07   | 0.07   | 0.07
Antc   (level#3) | 0.19  | 0.33 | 0.02 | 0.11 | 0.03 | 0.10   | 0.10   | 0.02   | 0.02
Cnsq   (level#3) | 0.20  | 0.02 | 0.35 | 0.02 | 0.12 | 0.02   | 0.03   | 0.11   | 0.11
Bite   (level#2) | 0.07  | 0.11 | 0.03 | 0.16 | 0.03 | 0.15   | 0.14   | 0.02   | 0.02
Flee   (level#2) | 0.07  | 0.02 | 0.11 | 0.02 | 0.16 | 0.02   | 0.02   | 0.14   | 0.14
bite_a (level#1) | 0.06  | 0.10 | 0.02 | 0.15 | 0.02 | 0.26   | 0.02   | 0.02   | 0.01
bite_o (level#1) | 0.06  | 0.09 | 0.02 | 0.13 | 0.02 | 0.02   | 0.25   | 0.02   | 0.02
flee_a (level#1) | 0.06  | 0.02 | 0.10 | 0.02 | 0.14 | 0.02   | 0.02   | 0.26   | 0.02
flee_o (level#1) | 0.07  | 0.02 | 0.09 | 0.02 | 0.13 | 0.01   | 0.03   | 0.02   | 0.25
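Mapping by similarity can be sketched as follows. The episode structure follows the slides, but the element encoding (a role codevector superposed with a filler codevector) is a simplifying assumption, not the exact APNN encoding: it is just enough to make corresponding elements of two analogs overlap more than non-corresponding ones.

```python
# Sketch: map each element of the probe to the most similar element of the
# analog by pairwise codevector overlap.
import random

N, M = 10_000, 100

def random_cv():
    return set(random.sample(range(N), M))

roles = {r: random_cv() for r in ("bite_agt", "bite_obj", "flee_agt", "flee_obj")}

def element(role, filler):
    """Element codevector: superposition of its role and filler codevectors."""
    return roles[role] | filler

spot, jane = random_cv(), random_cv()    # Probe: Spot bit Jane; Jane fled
mort, felix = random_cv(), random_cv()   # AN analog: Mort bit Felix; Felix fled

probe = {r: element(r, f) for r, f in
         [("bite_agt", spot), ("bite_obj", jane),
          ("flee_agt", jane), ("flee_obj", spot)]}
analog = {r: element(r, f) for r, f in
          [("bite_agt", mort), ("bite_obj", felix),
           ("flee_agt", felix), ("flee_obj", mort)]}

# Each probe element maps to the analog element with the maximal overlap:
mapping = {r: max(analog, key=lambda s: len(probe[r] & analog[s]))
           for r in probe}
print(mapping)   # each role maps to the same role in the analog
```

The shared role component dominates the overlap (the fillers differ), so the pairwise dot products alone recover the correct correspondences, as in the similarity matrix above.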
41
Mapping analogs with distributed representations: a more sequential scheme
[Diagram: the Probe (Probe = 4cause Cnsq Antc) and the FOR episode (E_FOR = 4cause Cnsq Antc) are decoded and mapped level by level, from lev#4 down through lev#3 (Antc, Cnsq), lev#2 (Bite = 2bite_a bite_o, Flee = 2flee_a flee_o), and lev#1 (bite_a = 1spot bite_agt, bite_o = 1jane bite_obj, flee_a = 1jane flee_agt, flee_o = 1spot flee_obj; for E_FOR: bite_a = 1felix bite_agt, bite_o = 1mort bite_obj, etc.), to the base level of characters (spot, jane, mort, felix) and role codevectors (agent roles bite_agt, flee_agt; object roles bite_obj, flee_obj). At each step the decoded constituents are matched by codevector similarity; the annotated values (e.g., 1.0, 0.68, 0.57, 0.51, with secondary values in parentheses) mark accepted correspondences, while low values (0.01-0.03) mark rejected ones]
42
Distributed representations in APNNs
(1) immediately reflect the degree of similarity
(2) provide high information capacity
and allow
(3) the use of distributed associative memory
(4) the formation of part-whole hierarchies
(5) the modeling of analogical reasoning
43