distributed representations in ai: building the world model and analogical reasoning
1
Distributed representations in AI: Building the world model and analogical reasoning
The slide number appears in the lower right corner.
Dr. Dmitri Rachkovskij
Dept. of Neural Information Processing Technologies
International Research and Training Center of Information Technologies and Systems, Kiev, Ukraine
National Academy of Sciences of Ukraine
2
Distributed representations in AI: Building the world model and analogical reasoning: Plan
1. Intro: The world model, its content and organization
   The world model: models of attributes, objects, relations, episodes
   Part-whole hierarchy
   Classification hierarchy
2. The world model based on distributed representations
   Symbolic, local, and distributed representations
   General architecture of the world model
   Representation and processing of simple structures
   Representations of sequences and episodes
3. Analogical reasoning
   Analogy and its research and modeling by cognitive psychologists
   Modeling of analogy with distributed representations
3
The Agent's internal world model is the system of knowledge about the domain and about the Agent itself. It is necessary for the organization of intelligent behavior.
The world model stores:
- episodes and situations encountered by the Agent
- reactions to situations, evaluations of results, etc.
The world model is used for recognition, analysis, prediction, reaction, etc.
[Diagram: the AGENT's BRAIN contains the World MODEL; the Agent interacts with the WORLD over time (t0 < t1 < t2)]
4
Models of attributes, objects, relations, situations, ...
Content of models: the appearance, structure, behavior, etc. of objects
Models of attributes: black, furry, barking, big, four-legged, ...
Objects, real or ideal:
- physical bodies (table)
- animals (dog)
- unreal (centaur)
Episodes and situations: many objects and relations (hunting, war, ...)
Relations:
- spatial (above), temporal (after), part-whole (part-of)
- a relation R(X, Y, ...) requires several objects X, Y, ...
- undirected (X and Y are neighbors)
- directed (X above Y; X sold Y a book)
- arguments of directed relations have roles (agent X, object Y, etc.)
5
Compositional structure of the world model: part-whole relations and hierarchy
Models are interrelated. The model of an object (car) is associated with the models of its parts (body, motor, wheels) and of its attributes (color, etc.).
The division into objects and attributes is not absolute. The model of a part (e.g., wheel) can have its own model-parts (tire, rim, cap) and attributes (shape: ring; texture: "protector" tread; color: black; material: rubber).
[Diagram: part-whole hierarchy. The model-whole "car" links to model-parts (body, motor, wheel, ...); the model-whole "wheel" links to model-parts (tire, rim, cap, ...) and to attribute models, both structural attributes and attributes of appearance: color (black), texture ("protector" tread), shape (ring)]
6
Model of a situation
[Diagram: a situation consists of events; events are built from objects and relations]
Approach(dog, ball)
Bit(dog, ball)
Explode(ball)
"A dog approached a ball, the dog bit the ball, the ball exploded"
A model-whole may include (associate) model-parts of goals, actions, costs, evaluations, feelings, etc.
The part-whole hierarchy is also called "meronymy-holonymy", "aggregate", "modular", "compositional", or "structural".
7
Examples of hierarchical structures
- Logical or symbolic propositions
- Patterns in structural or syntactic recognition
- Complex chemical substances
- Proteins in molecular biology
- Computer programs
- Knowledge bases
- etc.
[Diagram: examples include the nested expression F(a, f(y), f(y, F(a, b))); a chemical structure with N atoms, an NH2 group, and substituents X1, X2, X3; and strings over the symbols a and b]
8
Classification structure of the world model: is-a relations and hierarchy
Models of classes are combinations of attribute models.
The is-a relation (a cat is an animal) is also hierarchical: there exist more abstract (general) and less abstract (specific) classes.
[Diagram: classification tree. ANIMALS branch into DOGS and CATS; DOGS branch into CHOW-CHOWS and SPANIELS; the spaniel FIDO is an instance]
9
Operations with class models
Classes allow transferring experience to new objects and situations and making predictions. E.g., similar-looking objects often behave similarly.
[Diagram: a model of an object or class links observable attribute models (appearance) with unobservable ones (structure, behavior, ...). Recognition goes from observable attributes to the model; prediction goes from the model to the unobservable attributes]
10
Available "world models"
World models ~ knowledge bases ~ ontologies are beginning to be used in diverse applications:
- CYC, a general ontology for commonsense knowledge (Lenat and Guha)
- UMLS (Unified Medical Language System), an ontology of medical concepts (Humphreys and Lindberg)
- WORDNET, one of the most comprehensive lexical ontologies (Miller)
- etc.
11
Representation of information in world models
Representation schemes used in ontologies:
Conceptual graphs
Semantic networks
Frames
Prolog predicates
Other special representation languages
All these representation schemes are based on traditional symbolic and local representations of information, and they share those representations' drawbacks.
12
Distributed representations
(1) immediately reflect the degree of similarity
(2) provide high information capacity
and allow
(3) the use of associative memory
(4) the formation of part-whole hierarchies
(5) the modeling of analogical reasoning
13
Local and symbolic implementations of models

Representation of information in a computer (a table of records):
address | field1  | field2 | field3   | field4   | ...
1       | name_A  | "A"    | --       |          | ...
2       | name_B  | "B"    | --       |          | ...
3       | name_AB | "AB"   | address1 | address2 | ...
...     | ...     | ...    | ...      | ...      | ...
N       |         |        |          |          |

Local representations ("1 of N" coding):
A 100000...0...
B 010000...0...
C 001000...0...
...
Z 000000...1...

Pyramidal networks:
[Diagram: a resource pool of units #1 ... #z; each entity A, B, C, ..., Z and each combination (AB, AC, ...) is assigned its own resource unit, so a codevector activates a single unit]

Bracketed representations: ((A B) (A C D)) ... (...)

Symbolic representations:
entities: (sun planet)
expressions: ((mass sun) :name mass-sun), ((mass planet) :name mass-planet), ((greater mass-sun mass-planet) :name ...)
14
Problems with symbolic and local representations
All-or-none similarity: models are either identical or dissimilar.
A is identical to A; A is non-identical to B, C, ..., Z.
A is a pointer to ((B C) ((B D) (X Y Z))) ...
Information capacity: with "1 of N" coding, N units represent at most N models.
[Diagram: local units A, B, C and their combinations (AB, AC, BC) require a separate unit for every conjunction: AB&AC, AB&BC, AB&BA, AB&CA, AC&AB, AC&BA, AC&CA, AC&BC, BC&BA, BC&CA, BC&AB, BC&AC, ...]
15
Problems with symbolic and local representations
Comparison and estimation of the similarity of complex structured models is very complex: comparing graphs requires finding a partial isomorphism.
[Diagram: two graphs X and Y to be matched]
16
Problems with symbolic and local representations
Graph isomorphism is not enough to justify the similarity estimates made by humans:
(1) The fascists invaded France, causing the people to flee France
(2) Rats infested the apartment, causing the people to leave the apartment
(3) The game show host kissed the contestant, inviting the audience to applaud the contestant
Abstract structure of the 3 sentences: Relation12[Relation1(X, Y), Relation2(Z, Y)]
Correspondences due to the abstract scheme:
- the fascists / the game show host (X) ??
- France / the contestant (Y) ??
- invaded / kissed (Relation1) ??
- the people / the audience (Z) ??
- to flee / to applaud (Relation2) ??
17
Distributed representations
In distributed representations, any object is represented by a distributed pattern of activity over the units.
Resource pool / codevectors:
A 010101...0
B 010001...0
C 001000...1
...
Z 001000...0
Long binary sparse stochastic codevectors:
- binary (elements 0 and 1)
- number of elements N ~ 100,000
- number of 1s M ~ 1,000 << N, i.e., density p = M/N = 1%
- the 1s occupy (pseudo)random positions
- expected overlap of two random codevectors: O = p*p*N = M*M/N = 10 elements (1% of M)
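The codevector parameters above can be checked with a minimal sketch (illustrative only, not the APNN implementation): generate two sparse binary stochastic codevectors and compare their overlap with the expected value O = M*M/N.

```python
# Sketch: long sparse binary codevectors and their chance overlap.
# A codevector is represented as the set of positions of its 1s.
import random

N = 100_000   # codevector length (from the slide)
M = 1_000     # number of 1s; density p = M/N = 1%

def random_codevector(n=N, m=M):
    """Return m (pseudo)random 1-positions of an n-element binary vector."""
    return set(random.sample(range(n), m))

a = random_codevector()
b = random_codevector()

overlap = len(a & b)      # number of coinciding 1s
expected = M * M / N      # O = p*p*N = M*M/N = 10
print(overlap, expected)  # the measured overlap fluctuates around 10
```

Representing a codevector as the set of its 1-positions is a natural choice for such sparse vectors: set intersection directly gives the overlap.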
18
Advantages of distributed representations
Efficient use of resources for information representation: up to C(N, M) items (compare with N items for local representations).
Natural representation and estimation of similarity. The similarity of X and Y is calculated by the dot product:
S(X, Y) = |X & Y| = SUM_i Xi*Yi, i = 1, ..., N
For binary vectors, the dot product equals the number of overlapping 1s.
[Diagram: example binary codevectors A and B; dissimilar pairs overlap at S ≈ 0.01, similar pairs at S ≈ 0.5, and identical vectors give S = 1]
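The dot-product similarity measure from this slide can be written directly (a small sketch with short example vectors, not the full-length codevectors):

```python
# S(X, Y) = |X & Y| = sum_i X_i * Y_i: for binary vectors this is
# exactly the number of overlapping 1s.
def similarity(x, y):
    """Dot product of two binary vectors given as 0/1 lists."""
    return sum(xi * yi for xi, yi in zip(x, y))

A = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]
B = [0, 1, 1, 0, 0, 0, 0, 1, 0, 0]

print(similarity(A, A))   # 3: a vector overlaps itself completely
print(similarity(A, B))   # 2: A and B share two 1s
```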
19
Distributed representations
(1) immediately reflect the degree of similarity
(2) provide high information capacity
and allow
(3) the use of associative memory
(4) the formation of part-whole hierarchies
(5) the modeling of analogical reasoning
20
Some history
Academician N.M. Amosov founded the Dept. of Biocybernetics at the Institute of Cybernetics, Kiev, in the 1960s.
Modeling of Thinking and the Mind, Spartan Books, USA, 1967.
M-networks:
Amosov, N.M., Basilevsky, E.B., Kasatkin, A.M., Kasatkina, L.M., Luk, A.N., Kussul, E.M., & Talayev, S.A. (1972). M-network as a possible basis for construction of heuristic models. Cybernetica, 3, pp. 169-186.
1974: the first-in-the-world autonomous vehicle controlled by neural networks; the vehicle could move in a natural environment.
Amosov, N.M., Kasatkin, A.M., & Kasatkina, L.M. (1975). Active semantic networks in robots with an autonomous control. Proc. of the Fourth Intern. Joint Conference on Artificial Intelligence, v. 9, pp. 11-20.
Amosov, N.M., Kussul, E.M., & Fomenko, V.D. (1975). Transport robot with a neural network control system. Advance papers of the Fourth Intern. Joint Conference on Artificial Intelligence, v. 9, pp. 1-10.
21
Associative-Projective Neural Networks
APNNs were proposed by Dr. Kussul in 1983.
E. Kussul (1992). Associative neuron-like structures.
T. Baidyk (2002). Neural networks and problems of Artificial Intelligence.
Amosov, Baidyk, Goltsev, Kasatkin, Kasatkina, Kussul, Rachkovskij. Neurocomputers and Intelligent Robots (a Spanish translation is in preparation).
Donald Hebb (1949). Organization of Behavior.
22
The APNN architecture of the world model based on distributed representations
[Diagram: a parallel scheme of modules. Modules for sensory attributes (shape, texture, color, size), fed by vision, hearing, touch, olfaction, and taste, are connected to object and name modules through buffer fields (BUF) and an associative field (ASC)]
A module M includes buffer fields (BUF) and an associative field (ASC). BUF fields perform bitwise operations; ASC returns the most similar codevector from its memory.
23
The APNN architecture: associative field
[Diagram: auto-associative memory. Storage: vectors A, B, C, ..., XY (L vectors in all) are written into the memory. Retrieval: a distorted vector A' retrieves the stored vector A]
24
The APNN architecture: associative field
Local version: the L stored vectors A, B, C, ..., XY are kept as separate N-element rows. Retrieval returns the stored vector with the maximal dot product with the query A'.
Distributed version (Hopfield-type): storage accumulates the L vectors in an N x N matrix. Retrieval of A from the query A' is a vector-matrix product followed by thresholding.
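The "local version" of the associative field can be sketched in a few lines. This is a minimal illustration under assumed toy sizes, not the APNN code: stored codevectors are kept as separate items, and retrieval picks the one with the maximal overlap with a distorted query.

```python
# Sketch of retrieval by maximal dot product from an associative field.
import random

N, M = 1000, 30   # toy sizes for the sketch

def random_cv():
    """A sparse binary codevector as the set of its M 1-positions."""
    return set(random.sample(range(N), M))

memory = [random_cv() for _ in range(50)]   # the stored codevectors

def retrieve(query):
    """Return the stored codevector with the maximal overlap with the query."""
    return max(memory, key=lambda v: len(v & query))

A = memory[7]
a_prime = set(random.sample(sorted(A), 25))  # A' = A with 5 of its 1s lost
restored = retrieve(a_prime)
print(restored == A)   # the distorted query still retrieves A
```

Because chance overlap between random codevectors is only about M*M/N (here under one bit), even a substantially distorted query remains far more similar to its original than to any other stored vector.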
25
A simple parallel scheme of construction and processing of part-whole structures
[Diagram: modules M1, M2, M3 hold the codevectors A, alpha, and the whole <A v alpha>]
Codevectors: A, B, C, ... and alpha, beta, gamma, ...
Construction by superposition (disjunction) & thinning:
|<A v alpha> & A| = 0.5|A| = 0.5|alpha|
<...> denotes thinning (grouping).
Search for the most similar model: A' is similar to A; alpha' is similar to alpha.
Decoding of the model-whole through its parts.
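A hedged sketch of superposition & thinning: the codevector of a whole is the disjunction (union of 1s) of its parts' codevectors, thinned back to the base density M. The random thinning used here is a simplification; APNNs use Context-Dependent Thinning, which selects structure-dependent subsets rather than random ones.

```python
# Sketch: <A v alpha> as thinned superposition; each part then keeps
# about half of its 1s in the whole, so |<A v alpha> & A| ~ 0.5|A|.
import random

N, M = 10_000, 100

def random_cv():
    return set(random.sample(range(N), M))

def thin(v, m=M):
    """Keep a random m-element subset, restoring the base density."""
    return set(random.sample(sorted(v), min(m, len(v))))

A, alpha = random_cv(), random_cv()
whole = thin(A | alpha)            # <A v alpha>

print(len(whole & A) / M)          # ~0.5, as in |<A v alpha> & A| = 0.5|A|
```

With two parts, the superposition has about 2M active bits; thinning it back to M keeps each part's bits with probability about 1/2, which is what keeps the whole similar to each of its parts while the density stays constant.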
26
A simple parallel scheme of construction and processing of part-whole structures
[Diagram: modules M1, M2, M3 exchange the codevectors A and alpha]
Retrieval of the model-whole by its model-part.
Association between two models-parts.
Interpretations of the pair A-alpha: attribute-attribute, part-part, object-object, variable-value, situation-reaction, situation-evaluation.
27
Architecture of the world model using distributed representations
[Diagram: a "parallel" module scheme for sensory attributes (shape, texture, color, size; vision, hearing, touch, olfaction, taste), object and name modules, plus a "vertical" module scheme]
28
Representation of sequences and hierarchical episodes
( ,(),(,(,)) ) is a labeled ordered acyclic graph.
Order is marked by binding each element to its position: AB vs BA; X>>n (n is the position number); A>>1 B>>2 vs B>>1 A>>2.
[Diagram: hierarchical episode graph: cause over bite and flee; spot (dog, id_spot) and jane (human, id_jane)]
spot = dog id_spot; jane = human id_jane
BITE = 1bite spot jane spot>>agent jane>>object
FLEE = 1flee spot jane jane>>agent spot>>object
P = 2cause BITE FLEE BITE>>cause_antc FLEE>>cause_cnsq
"Spot bit Jane, causing Jane to flee from Spot"
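The slide distinguishes AB from BA by tagging each element with its position (A>>1 B>>2 vs B>>1 A>>2). A minimal sketch, assuming the ">>n" order marker can be modeled as a cyclic shift of the codevector by n positions; this is one common way to implement such markers, and the actual APNN scheme may differ in detail.

```python
# Sketch: order markers via cyclic shift of sparse binary codevectors.
import random

N, M = 10_000, 100

def random_cv():
    return set(random.sample(range(N), M))

def shift(v, n):
    """Cyclically shift every 1-position by n: the '>>n' position marker."""
    return {(i + n) % N for i in v}

A, B = random_cv(), random_cv()

AB = shift(A, 1) | shift(B, 2)   # codevector of the sequence A, B
BA = shift(B, 1) | shift(A, 2)   # codevector of the sequence B, A

# Without markers the two sequences would be indistinguishable (A|B == B|A);
# with markers, AB and BA overlap only at the chance level.
print(len((A | B) & (B | A)))    # full overlap: order is lost
print(len(AB & BA))              # near-chance overlap: order is preserved
```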
29
Properties of distributed representations of complex hierarchical models
- Codevectors of parts and wholes have the same dimensionality.
- The codevector of a model-whole is similar to the codevectors of its model-parts.
- Similar (with respect to objects and relations) hierarchical models have similar codevectors.
- Associations (between parts and wholes, attributes and objects, attributes and classes, objects and situations, etc.) are made by the similarity of codevectors, not by connections or pointers as in local and symbolic representations.
30
Distributed representations
(1) immediately reflect the degree of similarity
(2) provide high information capacity
and allow
(3) the use of associative memory
(4) the formation of part-whole hierarchies
(5) the modeling of analogical reasoning
31
Comparison of hierarchical structures and analogical reasoning
The ability to easily estimate the similarity of complex structured representations is essential for many AI problems.
One interesting problem is the modeling of analogical reasoning:
Gentner & Markman (1995, 1997, 2003); Hummel & Holyoak (1997); Eliasmith & Thagard (2001)
Analogy and metaphor: a comparison process that allows considering one domain from the point of view of a different domain (Gentner & Markman 1995, 2003).
Rutherford: the solar system as a model of the atom.
32
Similarity of analogs
Analogs are hierarchically structured episodes or situations.
Analogs are compared not only by "surface similarity" (common or similar elements: objects, relations, etc.).
"Structural similarity" is also very important: how the elements are grouped within the analogs ("structural consistency", "isomorphism").
[Diagram: analog X matched against analog Y]
33
3 stages of analogy processing
1. Access (retrieval, recall): finding in memory the base analog most similar to a given target (input, cue, probe) episode.
2. Mapping: finding the correspondences between the elements of the two analogs.
3. Inference: drawing inferences about the target analog based on information from the base.
[Diagram: the target episode is compared against the base episodes 1...N stored in memory; episodes X and Y are candidates for retrieval]
34
Analogy in everyday life
- Solution of problems: the base analog is used as a source of ideas about the target problem.
- Explanations: the base analog is used for understanding the target analog.
- Formation and evaluation of hypotheses.
- Justification of a point of view (in political, historical, etc. discussions).
- In literature. Etc.
35
Analogy in solving problems
Target analog: heat flow. Base analog: water flow.
[Diagram (Gentner's water-flow/heat-flow example): in the base, GREATER(pressure(beaker), pressure(vial)) CAUSEs FLOW(beaker, vial, water, pipe); attributes such as the liquid's flat top and the diameters do not map. In the target: hot coffee, an ice cube, and heat flow. Legend: relation or attribute, function, entity, match hypothesis]
36
Study of analogy in humans: various types of analog similarity. Episodes with animals.
Example episodes adapted from Thagard, Holyoak, Nelson & Gochfeld (1990).
General scheme: R0(R1(X, Y), R2(Y, X)). The episodes have the same relations as the Probe, but various types of similarity.

Similarity type                 | Episode
Probe (P)                       | Spot bit Jane; Jane fled from Spot
Literal Similarity (LS)         | Fido bit John; John fled from Fido
Cross-Mapping (CM)              | Fred bit Rover; Rover fled from Fred
Surface Features (SF)           | John fled from Fido; Fido bit John
Analogy (AN)                    | Mort bit Felix; Felix fled from Mort
1st-Order Relations only (FOR)  | Mort fled from Felix; Felix bit Mort
37
Modeling analogical reasoning with distributed representations
Access to analogical episodes: similar structures have similar codevectors, and the total similarity of structures is evaluated by the overlap of their codevectors.
Access to an analogical episode is done by finding in long-term memory the codevector most similar to the codevector of the input episode.
[Diagram: the codevector of the target episode is compared against the codevectors of the base episodes 1...N in memory; episodes X and Y are candidates]
38
Access to analogical episodes: episodes with animals
Humans demonstrate the following pattern when retrieving analogs from long-term memory: LS > CM, SF > AN > FOR
(Forbus, Gentner & Law 1995; Ross 1989; Wharton, Holyoak et al. 1994)

Similarity values between the codevectors of episodes (our model):
Type | Episode                                | Similarity
P    | Spot bit Jane; Jane fled from Spot     | 1.00
LS   | Fido bit John; John fled from Fido     | 0.40
CM   | Fred bit Rover; Rover fled from Fred   | 0.30
SF   | John fled from Fido; Fido bit John     | 0.24
AN   | Mort bit Felix; Felix fled from Mort   | 0.14
FOR  | Mort fled from Felix; Felix bit Mort   | 0.09
Chance overlap is in the range 0.005-0.008.
39
Mapping analogs with distributed representations
Mapping (interpretation) of an analogy means finding the corresponding elements of the analogs.
In our model, many analogs can be mapped by the direct similarity of the codevectors of their elements.
40
[Diagram: hierarchical episode graph: cause over bite and flee; spot (dog, id_spot) and jane (human, id_jane)]
spot = dog id_spot; jane = human id_jane
BITE = 1bite spot jane spot>>agent jane>>object
FLEE = 1flee spot jane jane>>agent spot>>object
P = 2cause BITE FLEE BITE>>cause_antc FLEE>>cause_cnsq

Mapping by similarity: overlaps of the AN episode's element codevectors (rows) with the Probe's element codevectors (columns); corresponding elements (the diagonal) have the highest overlap.

                 | Probe | Antc | Cnsq | Bite | Flee | bite_a | bite_o | flee_a | flee_o
E_AN   (level#4) | 0.25  | 0.19 | 0.19 | 0.08 | 0.08 | 0.06   | 0.07   | 0.07   | 0.07
Antc   (level#3) | 0.19  | 0.33 | 0.02 | 0.11 | 0.03 | 0.10   | 0.10   | 0.02   | 0.02
Cnsq   (level#3) | 0.20  | 0.02 | 0.35 | 0.02 | 0.12 | 0.02   | 0.03   | 0.11   | 0.11
Bite   (level#2) | 0.07  | 0.11 | 0.03 | 0.16 | 0.03 | 0.15   | 0.14   | 0.02   | 0.02
Flee   (level#2) | 0.07  | 0.02 | 0.11 | 0.02 | 0.16 | 0.02   | 0.02   | 0.14   | 0.14
bite_a (level#1) | 0.06  | 0.10 | 0.02 | 0.15 | 0.02 | 0.26   | 0.02   | 0.02   | 0.01
bite_o (level#1) | 0.06  | 0.09 | 0.02 | 0.13 | 0.02 | 0.02   | 0.25   | 0.02   | 0.02
flee_a (level#1) | 0.06  | 0.02 | 0.10 | 0.02 | 0.14 | 0.02   | 0.02   | 0.26   | 0.02
flee_o (level#1) | 0.07  | 0.02 | 0.09 | 0.02 | 0.13 | 0.01   | 0.03   | 0.02   | 0.25
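Mapping by similarity can be sketched as follows. The episode structure follows the slides, but the element encoding (a role codevector superposed with a filler codevector) is a simplifying assumption, not the exact APNN encoding: it is just enough to make corresponding elements of two analogs overlap more than non-corresponding ones.

```python
# Sketch: map each element of the probe to the most similar element of the
# analog by pairwise codevector overlap.
import random

N, M = 10_000, 100

def random_cv():
    return set(random.sample(range(N), M))

roles = {r: random_cv() for r in ("bite_agt", "bite_obj", "flee_agt", "flee_obj")}

def element(role, filler):
    """Element codevector: superposition of its role and filler codevectors."""
    return roles[role] | filler

spot, jane = random_cv(), random_cv()    # Probe: Spot bit Jane; Jane fled
mort, felix = random_cv(), random_cv()   # AN analog: Mort bit Felix; Felix fled

probe = {r: element(r, f) for r, f in
         [("bite_agt", spot), ("bite_obj", jane),
          ("flee_agt", jane), ("flee_obj", spot)]}
analog = {r: element(r, f) for r, f in
          [("bite_agt", mort), ("bite_obj", felix),
           ("flee_agt", felix), ("flee_obj", mort)]}

# Each probe element maps to the analog element with the maximal overlap:
mapping = {r: max(analog, key=lambda s: len(probe[r] & analog[s]))
           for r in probe}
print(mapping)   # each role maps to the same role in the analog
```

The shared role component dominates the overlap (the fillers differ), so the pairwise dot products alone recover the correct correspondences, as in the similarity matrix above.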
41
Mapping analogs with distributed representations: a more sequential scheme
[Diagram: the Probe (Probe = 4cause Cnsq Antc) and the FOR episode (E_FOR = 4cause Cnsq Antc) are decoded and mapped level by level, from lev#4 down through lev#3 (Antc, Cnsq), lev#2 (Bite = 2bite_a bite_o, Flee = 2flee_a flee_o), and lev#1 (bite_a = 1spot bite_agt, bite_o = 1jane bite_obj, flee_a = 1jane flee_agt, flee_o = 1spot flee_obj; for E_FOR: bite_a = 1felix bite_agt, bite_o = 1mort bite_obj, etc.), to the base level of characters (spot, jane, mort, felix) and role codevectors (agent roles bite_agt, flee_agt; object roles bite_obj, flee_obj). At each step the decoded constituents are matched by codevector similarity; the annotated values (e.g., 1.0, 0.68, 0.57, 0.51, with secondary values in parentheses) mark accepted correspondences, while low values (0.01-0.03) mark rejected ones]
42
Distributed representations in APNNs
(1) immediately reflect the degree of similarity
(2) provide high information capacity
and allow
(3) the use of distributed associative memory
(4) the formation of part-whole hierarchies
(5) the modeling of analogical reasoning
43