
CONTROLLED SEARCH OVER FINITE PERMUTATION GROUPS

by

MARIO A. ARANHA, B.Tech. in M.E.

A THESIS

IN

COMPUTER SCIENCE

Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of

MASTER OF SCIENCE

Approved

May, 1984


ACKNOWLEDGMENTS

I sincerely thank Dr. Erol Emre for his direction and helpful criticism of this work. I am also indebted to Dr. John Walkup and Dr. Martin Hardwick for their encouragement and technical advice.


ABSTRACT

A penetrance learning system is implemented to mechanically find a heuristic to perform a heuristically controlled search over finite permutation group graphs. The learning system is based on probabilistic analysis. A sample solution is evaluated in detail. Reasonably good results were obtained.


CONTENTS

ACKNOWLEDGMENTS ii

ABSTRACT iii

CHAPTER

I. PRELIMINARIES 1

   Introduction 1
   Definitions 2
   Problem Statement 5

II. SOLUTION METHODOLOGY 8

   Introduction 8
   Definitions 9
   Penetrance Learning System 13
   Solver 14
   Differentiator 16
   Regressor 18
   Penetrance Normalization 19

III. RESULTS 22

   Implementation 22
   Experimental Tests 23
   Conclusions 27

BIBLIOGRAPHY 29

APPENDIX

A. ERROR ESTIMATES 31

B. TRAINING PROBLEM SETS 33

C. COMPUTER IMPLEMENTATION 34


LIST OF TABLES

1. Chosen Features 24

2. Learning Phase Results 24

3. Solving Phase Results 25

4. Test for Local Optimality 26

5. Training Problems 33


LIST OF FIGURES

1. Group Transformations of a Hexagon 7

2. Breadth-first Search Tree 11

3. Penetrance Evaluation 12

4. Penetrance Learning System 14

5. Heuristic Search Procedure 15

6. Region Splitting Procedure 17


CHAPTER I

PRELIMINARIES

Introduction

The essential nature of Artificial Intelligence (AI) is that of "symbols and search" [Newell,1981]. However, the central concern of most AI research publications to date may be summarized by the following problem:

"Given a large state-space (possibly infinite), a finite set of operators, initial states and final states, find a sequence of operators (possibly optimal) from the initial state to a final state" [Nilsson,1971]. This problem occurs in different forms in areas like large production systems, grammars, theorem-proving, puzzles, games, database and knowledge base systems.

A systematic exhaustive search for a solution to the above problem is invariably combinatorially explosive and thus highly prodigal of computer time and memory. A concern of problem-solving research in AI has been to devise heuristics that control the direction of the search, thus yielding solutions with substantial reduction in search effort. Heuristics are 'rules of thumb' incorporating certain problem-specific information, which may be mechanically discovered [Ernst,1982] or supplied by human experts in the problem domain [Simon,1980]. Often such methods do not guarantee a solution, or if they do it may not be optimal. Thus such controlled search methods may be at best quasi-algorithms. But there is yet no practical alternative to this approach.

The main concern herein is to apply such heuristically controlled search techniques to permutation groups. A probabilistic learning system is developed to obtain a suitable heuristic for the search method. A similar penetrance learning system has been successfully implemented for the fifteen puzzle [Rendell,1983].

Definitions

The following definitions pertain to state-space search problems [Georgeff,1983].

Definition. A state-space problem, P, is a sextuple,

P := <S, O, Y, C, I, F>,

where, S = set of states (state space),
O = finite set of operator (input) symbols,
Y = state transition partial function, Y: S*O --> S,
C = cost partial function, C: S*O --> R+
    (R+ = the set of positive real numbers),
I = finite set of initial states (I in S),
F = finite set of final states or goals (F in S).

Definition. s ==> s' iff there exists o in O, such that o(s) = s'. ==>+ represents the transitive closure of ==>.

Definition. If i ==>+ f, for i in I and f in F, then the corresponding operator sequence is a solution to the problem. If the operator sequence is of minimal total cost the solution is optimal.

Definition. A heuristic (partial) function, H, is

H : S --> R+.

H will hereinafter be referred to as a heuristic. H serves to evaluate the potential of an operator to yield a final state when applied to a given state. It is thus an estimate of the minimal cost of an operator sequence from a given state to a final state.

The following definitions pertain to group theory [Herstein,1964; Stone,1973].

Definition. A group, <G,.>, is an algebraic structure where G is a set, . is the product operation and

i) . : G*G --> G,

ii) g.(h.k) = (g.h).k for all g,h,k in G,

iii) there exists e in G, such that e.g = g.e = g, for all g in G,

iv) for all g in G there exists an inverse, g' in G, such that g.g' = g'.g = e.

<G,.> is finite if the cardinality (order) of G is finite. The order of a group element, g, is the least positive integer, r, such that g^r = e.

Definition. A subset of elements of a group, G, is called a generator set of G iff every element of G can be expressed as a product of the subset elements and their inverses.

Definition. For a group, G, an equality of a product of generators and generator inverses to the group identity is called a generator relation.

Definition. A group graph is a directed graph in which:

1. The vertices are labelled in one-to-one correspondence with the elements of the group.

2. If x is a group generator then for each vertex, y, there is an edge from y to z, where z = y.x.

Definition. A permutation of a set, X, is a bijection from X onto X. The degree of the permutation is the cardinality (number of elements) of X.

Definition. A cycle of a permutation, P, of a set, X, is an ordered set,

(x, xP, xP^2, ..., xP^(m-1)),

where x is in X, and m is the least positive integer such that xP^m = x.

Definition. A permutation group is a set of permutations that forms a group under function composition (i.e., a subgroup of the set of all bijections on a set, X).

Problem Statement

Given a finite generator set of a permutation group of finite degree, the problem of concern herein is to obtain a product sequence of generator permutations that yields a given permutation in the group.

This is clearly a state-space problem as defined above. The initial state (I) corresponds to the identity permutation and the operator set (O) corresponds to the generator set or to any permutation derived as a product of generators. Thus states and operators are indistinguishable here. The cost function may be taken as C(s,o) := l(o), where l(o) = the length of the generator sequence represented by operator, o. The state transition function is defined by the functional composition of permutations and is thus total.
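To make this formulation concrete, the short sketch below (my illustration, not part of the thesis implementation) uses the array representation described in chapter III, where position i holds the image of i. Applying an operator is just composing the current state with a generator permutation, and the cost accumulates as the length of the generator sequence; the composition convention (right factor applied first) is an assumption made only for this example.

#include <stdio.h>

#define DEG 4

/* compose: r = q o p, i.e., apply p first, then q (r[i] = q[p[i]]) */
void compose(int r[], const int q[], const int p[], int n)
{
    for (int i = 0; i < n; i++)
        r[i] = q[p[i]];
}

int main(void)
{
    int a[DEG] = {1, 0, 2, 3};      /* generator A: swap 0 and 1 */
    int b[DEG] = {1, 2, 3, 0};      /* generator B: cyclic shift */
    int s[DEG] = {0, 1, 2, 3};      /* initial state: identity   */
    int t[DEG];
    int cost = 0;

    compose(t, b, s, DEG); cost++;  /* state := B applied to state */
    compose(s, a, t, DEG); cost++;  /* state := A applied to state */

    for (int i = 0; i < DEG; i++) printf("%d ", s[i]);
    printf("  cost = %d\n", cost);  /* C(s,o) = length of the generator sequence */
    return 0;
}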


This problem is non-trivial when large permutation groups are considered. In general it can be extended to any group, as it can be shown that any group is isomorphic to some permutation group [Stone,1973]. The execution time of an exhaustive search algorithm for such a problem is exponential in the cardinality of the generator set. However, it is typical of those problems for which the controlled search described above is appropriate.

The solution of this kind of problem could be useful in areas such as robot task planning, geometrical transformations, memory interconnection design [Wu,1981] and the solution of several combinatorial puzzles having a group structure [Stone,1973].

For example, figure 1 shows a simple application to geometrical transformations using the dihedral group [Stone,1973]. There are two generators, rotation through 60 degrees (g) and reflection about a vertical axis (h). Thus from the identity several configurations may be obtained by finding the appropriate sequence of generators. This could be a far more complicated problem for more complex geometrical configurations.

Similarly we could consider a robot manipulator with a set of primitive movements (generators) which could be concatenated to achieve a certain resultant motion (goal).

Any particular manipulator configuration is thus considered a group element.

[Figure 1 shows two hexagon configurations, labelled "Identity, e" and "Rotated 180 degrees and reflected about the vertical axis, g^3.h".]

Figure 1: Group Transformations of a Hexagon
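A minimal sketch of the hexagon example, assuming the vertices are numbered 0 to 5 counter-clockwise and the reflection axis passes through vertex 0 (both numbering choices are mine, made only for illustration): the generators g and h become permutations of the vertex labels, and the configuration labelled g^3.h in figure 1 is reached by composing g three times and then h.

#include <stdio.h>

#define NV 6

/* apply p first, then q: r[i] = q[p[i]] */
static void compose(int r[], const int q[], const int p[], int n)
{
    for (int i = 0; i < n; i++)
        r[i] = q[p[i]];
}

int main(void)
{
    /* vertices 0..5 counter-clockwise */
    int g[NV] = {1, 2, 3, 4, 5, 0};   /* rotation through 60 degrees                */
    int h[NV] = {0, 5, 4, 3, 2, 1};   /* reflection about the axis through vertex 0 */
    int e[NV] = {0, 1, 2, 3, 4, 5};   /* identity configuration                     */
    int t[NV], u[NV];
    int i;

    compose(t, g, e, NV);             /* g                                    */
    compose(u, g, t, NV);             /* g.g                                  */
    compose(t, g, u, NV);             /* g.g.g = rotation through 180 degrees */
    compose(u, h, t, NV);             /* rotate 180 degrees, then reflect     */

    for (i = 0; i < NV; i++) printf("%d ", u[i]);
    printf("\n");
    return 0;
}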

CHAPTER II

SOLUTION METHODOLOGY

Introduction

Several different kinds of state-space search procedures using heuristics have been mathematically analysed [Nilsson,1971]. But in practice it is extremely difficult to find heuristics that satisfy specific mathematical properties [Bagchi,1983]. Examples of such properties are 'admissibility' and the 'monotone restriction' [Nilsson,1980]. So invariably an empirical approach has to be adopted. Thus in order to solve the problem at hand the main concern is to find some method of obtaining a useful heuristic. A learning system is employed to obtain a heuristic by statistically analysing several completed solutions. The program developed has a capability of learning from experience and improving its performance with time. Thus the main objective is to develop an automated learning system which needs little human intervention.


Definitions

The following are definitions of terminology used to describe the learning system that has been developed.

Definition. A problem instance, P, is a particular problem in the class (domain) of problems under consideration.

Definition. A search tree, T(H,P,M), is the tree obtained from problem instance, P, by repeatedly applying all operators to the currently best state (according to the heuristic, H) until either a solution is found or until a maximum of M states have been generated.

Definition. A feature is a property that serves as a measure of the difference (difference metric) between each state, S, and the final state, G, of a problem instance.

Definition. A feature vector for a given state is an n-vector (for n chosen features) whose components are the feature values of the state.

Definition. An n-dimensional feature space is an n-dimensional vector space of feature vectors.

In practice the features are chosen based on problem-specific information. They may be mechanically determined [Ernst,1982], or based on expert human knowledge of the problem domain. Figure 2 shows a simple search tree with constant heuristic (breadth-first) for the symmetric group of degree four. The feature vector components are evaluated for the states on the solution path (darkened). The two chosen features are: f1 = absolute difference of products of cycle lengths, f2 = number of misplaced elements. Terminal states are parenthesized. For state S1 = 0 1 2 3 the cyclic representation is (0)(1)(2)(3) and for the goal, G = 2 0 1 3, it is (1 0 2)(3). Therefore, we have f1 = |(1.1.1.1) - (3.1)| = 2, where the factors correspond to the cycle lengths, and we have f2 = 3 because the first three symbols are misplaced. The array representation of permutations is explained in chapter III. This use of features is similar to the means-end analysis used in GPS [Nilsson,1980; Ernst,1969].
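The two features can be computed directly from the array representation. The sketch below is mine (the function names are arbitrary, not those of appendix C); it reproduces the worked values f1 = 2 and f2 = 3 for S1 = 0 1 2 3 against the goal 2 0 1 3.

#include <stdio.h>
#include <stdlib.h>

/* product of the cycle lengths of permutation p of degree n */
static int cycle_product(const int p[], int n)
{
    int seen[32] = {0}, prod = 1;
    for (int i = 0; i < n; i++) {
        if (seen[i]) continue;
        int len = 0;
        for (int j = i; !seen[j]; j = p[j]) { seen[j] = 1; len++; }
        prod *= len;
    }
    return prod;
}

/* number of positions where s and the goal disagree */
static int misplaced(const int s[], const int g[], int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (s[i] != g[i]) count++;
    return count;
}

int main(void)
{
    int s1[4]   = {0, 1, 2, 3};
    int goal[4] = {2, 0, 1, 3};

    int f1 = abs(cycle_product(s1, 4) - cycle_product(goal, 4));
    int f2 = misplaced(s1, goal, 4);
    printf("f1 = %d, f2 = %d\n", f1, f2);   /* prints f1 = 2, f2 = 3 */
    return 0;
}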

Definition. A region, r, is a particular subset (volume) of the feature space. For simplicity we assume that for a 2-D feature space the regions are rectangles, for 3-D cuboids, and so on. Regions may be represented by their upper and lower limits -- for example, rectangular regions are represented by diagonally opposite corners of the corresponding rectangles.

S1=0 1 2 3

S2=1 0 2 3    S3=1 2 3 0

(S4=0 1 2 3)  S5=2 1 3 0   S6=0 2 3 1   S7=2 3 0 1

(S8=2 0 3 1)  S9=3 2 0 1   S10=1 3 0 2

(S11=3 2 1 0) (S12=3 1 2 0)  S13=0 3 1 2  (S14=2 0 1 3)

GENERATOR SET:  A = 1 0 2 3   B = 1 2 3 0

GOAL = 2 0 1 3   SOLUTION = BBAB (backward chained sequence)

State   S1   S3   S6   S10   S14
f1       2    1    0     1     0
f2       3    4    4     4     0

Figure 2: Breadth-first Search Tree

Definition. The penetrance, p, of a search tree, T, in region, r, is,

p(r,T) := g(r,T)/t(r,T),

where g(r,T) = number of expanded states in r which are on a solution path and t(r,T) = total number of expanded states in r.
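A literal transcription of this definition, assuming each expanded state has already been tagged with its region index and an on-solution-path flag (the data layout here is an illustrative assumption, not the representation used in appendix C):

#include <stdio.h>

struct point { int region; int on_solution_path; };

/* penetrance of region r: good expanded states / total expanded states */
double penetrance(const struct point pts[], int npts, int r)
{
    int good = 0, total = 0;
    for (int i = 0; i < npts; i++) {
        if (pts[i].region != r) continue;
        total++;
        if (pts[i].on_solution_path) good++;
    }
    return total > 0 ? (double)good / total : 0.0;
}

int main(void)
{
    struct point pts[] = { {1,1}, {2,1}, {2,0}, {3,1}, {3,0}, {3,0} };
    printf("p(r2,T) = %.2f\n", penetrance(pts, 6, 2));   /* prints 0.50 */
    return 0;
}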

Figure 3 shows a search tree, T, and a corresponding two-dimensional feature space, F, divided into 'rectangular' regions. The problem domain is the symmetric group of degree four. The penetrances in the various regions are as follows: p(r1,T) = 1/1 = 1.0, p(r2,T) = 1/2 = 0.5, and p(r3,T) = 1/3 = 0.3 [Rendell,1983].

[Figure 3: a search tree, T, drawn beside the corresponding two-dimensional feature space divided into rectangular regions, with the goal state marked.]

Figure 3: Penetrance Evaluation

Penetrance could be used as a measure of the worthiness of states for expansion. For any state, S, in tree, T, p(r,T) is an estimate of the conditional probability that S is on a solution path, given that the feature vector for S lies in region, r. Thus if we can estimate the penetrance at any point in the feature space we could use this to estimate the relative merit of each selected feature. So, as will be seen later, an estimate of p(r,T) for each region, r, of search tree, T, is as good as a heuristic.

Penetrance Learning System

The concept of penetrance is used to develop a learning system which 'learns' a useful heuristic. Figure 4 shows a Penetrance Learning System (PLS). It consists of three main components: a SOLVER, a DIFFERENTIATOR and a REGRESSOR. This system is described below. Further details may be obtained from [Rendell,1983].

The penetrance learning system is used to 'learn' a heuristic on a probabilistic basis by solving several sets of training problems. To get the system started a 'booting' mechanism is incorporated.

Set of problem instances, P
          |
          v
       SOLVER
          |
Solution trees, T(b,P)
          |
          v
   DIFFERENTIATOR
          |
Cumulative region set, {(r,p,e)}
          |
          v
     REGRESSOR
          |
Feature weight vector, b

Figure 4: Penetrance Learning System

Solver

The SOLVER essentially consists of some heuristic graph search procedure similar to A* [Nilsson,1980]. Such a graph search is used to select a node for expansion based on the value of the corresponding evaluation function or heuristic. Details of the particular search procedure used here are given in figure 5. Expanding a state means generating the next states by applying all the relevant operators. Here OPEN and CLOSED are bi-directional linked lists [Knuth,1968; Standish,1980]. We assume a heuristic function of the form,

H := b0 + b1.f1 + b2.f2 + b3.f3 + ... + bn.fn,

where f1, f2, .., fn are the values of a set of n chosen features and b0, b1, b2, .., bn are the feature weights.

A1. Place the initial states on OPEN.

A2. If OPEN is empty, exit with failure. Otherwise continue.

A3. Transfer from OPEN to CLOSED the state, s, such that H(s) is minimum over all s in OPEN (resolve ties arbitrarily but always in favour of a final state).

A4. If s is a final state, exit with the solution sequence obtained by tracing backwards through the pointers (see A5). Otherwise continue.

A5. Expand s. If there are no next states go to A2. If any next state, i, is in OPEN or CLOSED ignore it. Otherwise compute H(i) and set up a pointer from i to s.

A6. Go to A2.

Figure 5: Heuristic Search Procedure
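The full SOLVER appears in appendix C. The sketch below is only a compact illustration of steps A1 to A6: it replaces the OPEN and CLOSED lists by a flat array with a closed flag, uses the number of misplaced elements as a stand-in heuristic, and searches the degree-four example of figure 2; these simplifications are mine, not the thesis design.

#include <stdio.h>
#include <string.h>

#define DEG  4
#define MAXN 1000

struct nd { int st[DEG]; int parent; int closed; };
static struct nd nodes[MAXN];
static int nnodes;

static int h(const int s[], const int g[])        /* stand-in heuristic: misplaced elements */
{ int i, c = 0; for (i = 0; i < DEG; i++) if (s[i] != g[i]) c++; return c; }

static int lookup(const int s[])                  /* A5: ignore regenerated states */
{ int i; for (i = 0; i < nnodes; i++) if (!memcmp(nodes[i].st, s, sizeof(int)*DEG)) return i; return -1; }

int main(void)
{
    int gens[2][DEG] = { {1,0,2,3}, {1,2,3,0} };  /* generators A and B */
    int goal[DEG]    = {2,0,1,3};
    int start[DEG]   = {0,1,2,3};
    memcpy(nodes[0].st, start, sizeof start);     /* A1: initial state on OPEN */
    nodes[0].parent = -1; nnodes = 1;

    for (;;) {
        int best = -1, i, k;
        for (i = 0; i < nnodes; i++)              /* A3: open node of minimum H */
            if (!nodes[i].closed && (best < 0 || h(nodes[i].st, goal) < h(nodes[best].st, goal)))
                best = i;
        if (best < 0) { printf("failure\n"); return 1; }       /* A2: OPEN empty */
        nodes[best].closed = 1;
        if (h(nodes[best].st, goal) == 0) {                     /* A4: final state reached */
            for (i = best; i >= 0; i = nodes[i].parent)         /* trace pointers backwards */
                printf("%d %d %d %d\n", nodes[i].st[0], nodes[i].st[1], nodes[i].st[2], nodes[i].st[3]);
            return 0;
        }
        for (k = 0; k < 2 && nnodes < MAXN; k++) {              /* A5: expand with every operator */
            int ns[DEG];
            for (i = 0; i < DEG; i++) ns[i] = gens[k][nodes[best].st[i]];
            if (lookup(ns) >= 0) continue;
            memcpy(nodes[nnodes].st, ns, sizeof ns);
            nodes[nnodes].parent = best;
            nodes[nnodes].closed = 0;
            nnodes++;
        }
    }
}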

For the heuristic to be defined we must determine the feature weights b0, b1, b2, .., bn. To start with, all the bi values are set to zero and we consider the entire feature space within the bounds of each feature for the particular problem domain (i.e., initially the feature space consists of a single region). We then give the SOLVER a set of problems to solve, resulting in a set of search trees. Since, initially, H := 0, the search trees are breadth-first as we have a constant heuristic. From the set of search trees we get a corresponding set of feature space points and we mark those that lie on a solution path -- this constitutes the output of the SOLVER.

Differentiator

The main function of the DIFFERENTIATOR is to partition the feature space into representative clusters (regions). The output of the SOLVER is used to calculate the penetrance in each region, r, of the feature space. Associated with each penetrance value, p, is an error estimate, e (see appendix A). In practice the set of regions is stored as triples (r,p,e) on a 'blackboard' (a globally modifiable data structure) and is thus accessible to different components of the PLS. The DIFFERENTIATOR systematically splits each region of the feature space into two by inserting hyperplanes (infinite dividing planes) parallel to the feature axes at regular intervals and determining which split gives the maximum difference in penetrance between the corresponding two sub-regions. The process of splitting is repeated recursively until the feature space cannot be differentiated any further (see figure 6). The newly obtained region set now replaces what was previously on the blackboard. This differentiation procedure is similar to the clustering of large data sets for statistical analysis [Zupan,1982], except that here the clustering is done in reverse.

Let r be a region in the cumulative region set.

Exhaustively insert hyperplanes parallel to the feature axes.

While any hyperplane boundaries remain untried do
   Select a hyperplane creating two sub-regions r1 and r2.
   Find the penetrances and error estimates for r1 and r2.
   If this dichotomy gives a distance, d (see appendix A),
   greater than any previous, note the hyperplane.
Endwhile

If the best d was greater than some minimum, replace r by the corresponding r1 and r2.

Repeat the above procedure for all regions in the cumulative region set until no more splitting occurs.

Figure 6: Region Splitting Procedure
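A sketch of a single split decision in a one-dimensional feature space (the thesis works in n dimensions and recurses over the whole region set). The penetrance, error factor and distance follow the definitions of appendix A, and the minimum-distance threshold of -0.2 follows the listing in appendix C; the sample points are invented.

#include <stdio.h>
#include <math.h>

struct pt { int f; int good; };                 /* feature value, on-solution-path flag */

/* penetrance p and error factor e of the points with lo <= f < hi */
static void penet(const struct pt v[], int n, int lo, int hi, double *p, double *e)
{
    double g = 0, t = 0;
    for (int i = 0; i < n; i++)
        if (v[i].f >= lo && v[i].f < hi) { t += 1; g += v[i].good; }
    *p = (t > 0 && g > 0) ? g / t : 1e-5;
    *e = (g > 0 ? 1 + 1/sqrt(g) : 100) * (t > 0 ? 1 + 1/sqrt(t) : 100);
}

/* distance between sub-regions, as in appendix A */
static double dist(double p1, double e1, double p2, double e2)
{
    if (p1 < p2) { double tp = p1, te = e1; p1 = p2; e1 = e2; p2 = tp; e2 = te; }
    return log(p1/e1) - log(p2*e2);
}

int main(void)
{
    struct pt v[] = { {0,1}, {1,1}, {1,0}, {2,0}, {3,0}, {3,0}, {4,0} };
    int n = 7, lo = 0, hi = 5, best_cut = -1;
    double best_d = -1e9;

    for (int cut = lo + 1; cut < hi; cut++) {   /* try every axis-parallel hyperplane */
        double p1, e1, p2, e2;
        penet(v, n, lo, cut, &p1, &e1);
        penet(v, n, cut, hi, &p2, &e2);
        double d = dist(p1, e1, p2, e2);
        if (d > best_d) { best_d = d; best_cut = cut; }
    }
    if (best_d > -0.2)                          /* split only if the dichotomy is worthwhile */
        printf("split at f = %d (d = %.3f)\n", best_cut, best_d);
    return 0;
}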

Regressor

Using the region set information as input, the REGRESSOR performs a multiple linear regression with the center points of each region as the independent variables and the penetrance as the response variable [Draper,1981]. The center of each region is representative of that region. The centroid could be used for better results. This regression model is simple and straightforward and was proved successful for the fifteen puzzle [Rendell,1983]. Several efficient computerized statistical packages are available for regression analysis. The present system uses the routine RLSEP of the IMSL Library [IMSL,1982]. Thus we obtain the relative merit of each chosen feature, giving the feature weight vector components, bi. In practice it may be useful to use a transformation function before the regression. We use ln(p) instead of the penetrance, p, as the response variable, as there tends to be very little difference in penetrance values. For the heuristic we use,

H := exp(b0 + b1.f1 + b2.f2 + ... + bn.fn).

The new feature weight values, bi, now replace the previous values on the corresponding blackboard.
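The stepwise multiple regression itself is delegated to the IMSL routine RLSEP and is not reproduced here. The sketch below only illustrates the transformation involved: it fits ln(p) against the region centers of a single feature by ordinary weighted least squares, with the weights 1/ln(e) described in the next section. The numbers are invented and the routine is mine, not a substitute for RLSEP.

#include <stdio.h>
#include <math.h>

/* weighted least squares fit of y = b0 + b1*x with weights w */
static void wls(const double x[], const double y[], const double w[], int n,
                double *b0, double *b1)
{
    double sw = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; i++) {
        sw += w[i]; sx += w[i]*x[i]; sy += w[i]*y[i];
        sxx += w[i]*x[i]*x[i]; sxy += w[i]*x[i]*y[i];
    }
    *b1 = (sw*sxy - sx*sy) / (sw*sxx - sx*sx);
    *b0 = (sy - *b1*sx) / sw;
}

int main(void)
{
    /* per region: center of feature f1, penetrance p, error factor e (made-up values) */
    double c[] = {0.5, 1.5, 2.5, 3.5};
    double p[] = {0.60, 0.30, 0.12, 0.05};
    double e[] = {1.4, 1.5, 1.8, 2.2};
    double x[4], y[4], w[4], b0, b1;

    for (int i = 0; i < 4; i++) {
        x[i] = c[i];
        y[i] = log(p[i]);          /* response is ln(p), not p      */
        w[i] = 1.0 / log(e[i]);    /* weight each region by 1/ln(e) */
    }
    wls(x, y, w, 4, &b0, &b1);
    printf("H = exp(%.3f + %.3f * f1)\n", b0, b1);
    return 0;
}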

The PLS is thus given successive problem sets until the feature space becomes undifferentiable and so cannot be further partitioned. At this stage we expect the bi values to become more or less constant. The PLS has then, hopefully, 'learned' a heuristic that will solve most problems in the problem domain under consideration, expanding substantially fewer states than an exhaustive search would. This hope is based on the probability that the chosen features are relevant towards evaluating the merits of a state of the problem and that the system is fairly stable. The fundamental design aspect of the PLS is therefore to accumulate knowledge iteratively by improving the active heuristic between successive problem set solutions.

Penetrance Normalization

Various measures need to be taken to stabilize the performance of the PLS. Associated with the penetrance, p, in each region is an error estimate, e, which is used as a weighting factor before the regression is performed. Thus instead of just ln(p) as the response variable we use ln(p)/ln(e). For error estimates see Appendix A. Besides this stability measure we also have to correct for bias in penetrance values.

In practice, for non-trivial problem instances, realizable search trees are either breadth-first, for easier problem instances, or the result of using good heuristics, for harder problem instances. But as better heuristics are used in tree searching, the localized penetrance within regions of the feature space tends to be biased upward relative to the overall (true) penetrance of a breadth-first search tree. A perfect heuristic would thus yield the largest possible penetrance of unity. In order to stabilize the PLS performance we therefore need to standardize all penetrance values by removing this bias.

Penetrance normalization is done empirically as follows. Let p be the actual localized penetrance and p' the true penetrance in some region r of the feature space. Let s = p/p'. If r gets split into several sub-regions and if ri is one such sub-region with localized penetrance, pi, then the normalized penetrance for ri is given by pi/si, where,

ln(si) = ln(s) + (1 - 1/h).b.(cr - ci).

cr and ci are the centers of r and ri, respectively, and h is obtained by a simple linear regression, assuming the model p' = p^h. This regression is performed using the set of parent regions before they are differentiated. For unsplit parent regions we just use p' = p^h, as this is found to be the general trend from values obtained in practice. The p' values are those obtained from a breadth-first search (initially), or from previous penetrance normalization (subsequently). For split regions we have to account for bias within a region and therefore an additional correction is applied. It may be noted that b.(cr - ci) is the expected logarithmic true penetrance difference between r and ri, and the factor 1/h converts this to a biased value, so that the entire correction term counters the logarithmic bias due to the heuristic, H. There is no perfect rationale however behind this normalization procedure; but it is found to work well in practice.
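A sketch of the correction applied to one sub-region, taking the bias ratio s, the regression slope h, the feature weights b and the region centers as given; all numerical values below are invented for illustration.

#include <stdio.h>
#include <math.h>

#define NF 2   /* number of features used in the correction */

/* normalized penetrance pi/si for sub-region ri of a split parent r,
   where ln(si) = ln(s) + (1 - 1/h) * b.(cr - ci) and s = p/p' for the parent */
static double normalize(double pi, double s, double h,
                        const double b[], const double cr[], const double ci[])
{
    double dot = 0.0;
    for (int k = 0; k < NF; k++)
        dot += b[k] * (cr[k] - ci[k]);
    double ln_si = log(s) + (1.0 - 1.0/h) * dot;
    return pi / exp(ln_si);
}

int main(void)
{
    double b[NF]  = {0.65, -0.28};     /* feature weights (illustrative)          */
    double cr[NF] = {2.0, 3.0};        /* center of the parent region r           */
    double ci[NF] = {1.5, 2.5};        /* center of the sub-region ri             */
    double pi = 0.40;                  /* localized penetrance of ri              */
    double s  = 1.8;                   /* parent bias ratio p/p'                  */
    double h  = 2.0;                   /* slope from the simple regression above  */

    printf("normalized penetrance = %.4f\n", normalize(pi, s, h, b, cr, ci));
    return 0;
}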

CHAPTER III

RESULTS

Implementation

A PLS as described in chapter II was implemented in the C programming language [Kernighan,1978; Whitesmiths,1983] on a VAX-11/780 running VMS. This PLS could be used to develop a heuristic to search any group graph as explained in chapter I. Although a significant amount of list processing is involved, a procedural language like C is appropriate because of the nature of the mathematical computations used. There is also some motivation nowadays to use procedural languages in AI research because they are more universally available and understood.

Permutations are internally represented as one-dimensional arrays [Knuth,1968]. For example, the permutation 0-->2, 2-->1, 1-->0, 3-->3 may be conveniently represented as 2 0 1 3, where the domain elements are implicitly represented by the positions (array indices) of their corresponding images. The data resulting from solving a set of problem instances is stored in a large array, which is sorted with a shell sort in order of feature vector values. Other implementation details can be found in appendix C.


Experimental Tests

Several tests were run with the developed system and its performance was found to be generally satisfactory. In particular, a detailed performance evaluation is presented for the following permutation group:

Generator Set:
A = 1 0 2 3 4 5
B = 1 2 3 4 5 0

It can be shown [Stone,1973] that this generator set generates the symmetric group of degree six, and order 6!. This particular example was chosen for illustration as it is not so small as to be trivial, yet not too large to compare the results obtained with corresponding breadth-first solutions. For example, trial problem instances with the symmetric group of degree 8 took several hours of CPU time for a breadth-first search. Hence for extensive tests the above problem domain is most practical.

Four apparently relevant features were chosen (see table 1). A total of 45 training problems (see appendix B) were used. The initial iteration consisted of 10 problems and thereafter 5 problems were used in each iteration. After about 7 iterations the program had rejected two of the features by setting their weights to zero.

TABLE 1

Chosen Features

f1 = absolute difference of products of cycle lengths between a given state and the final state.
f2 = number of misplaced elements.
f3 = sum of distances apart of equal elements in a given state and the final state.
f4 = number of pair reversals between a given state and the final state.

The feature weights apparently converged to constant values after the cumulative region set became undifferentiable. The results of the 'learning' phase are summarized in table 2.

TABLE 2

Learning Phase Results

                                  Feature Weights
Iter-   Number of
ation   regions      b0       b1      b2      b3      b4
  1         2      -0.67     0.0     0.0     0.0     0.0
  2         4      -0.78     0.05    0.0     0.0     0.0
  3        15      -0.58    -0.01    0.0     0.0     0.0
  4        25      -1.86    -0.32    0.0     0.0     0.0
  5        27      -1.93     0.0     0.0    -0.33    0.0
  6        27      -5.24     0.48    0.0    -0.24    0.0
  7        27      -7.61     0.67    0.0    -0.29    0.0
  8        27      -7.54     0.64    0.0    -0.28    0.0

Taking the mean of the last two rows we obtain the heuristic,

H = exp(-7.575 + 0.655 f1 - 0.285 f3).

The solver was then given 41 randomly selected problem instances and on the aggregate about 14% fewer states were expanded compared with breadth-first solutions of the same problem instances. However, of these 41 problem instances, if we consider only those 32 for which more than 100 states were expanded in the breadth-first solutions, the above heuristic proved far superior (see table 3). Approximately 40% fewer states were expanded with the learned heuristic. The mean solution length, however, was approximately three times that for breadth-first search. It must be noted that a breadth-first search always results in the shortest solution sequence.

TABLE 3

Solving Phase Results

Number of problem instances = 32.

                           Breadth-first    Heuristic
                              Search          Search

States expanded (mean)        368.56          229.34
Solution length (mean)         11.13           31.38

[Rendell,1983] claims that using a similar PLS for the well-known fifteen puzzle the learned heuristic was found to be locally optimal, both in terms of the mean number of states expanded and in terms of the mean solution length. However, in this example the learned heuristic was not exactly locally optimal, as can be seen from table 4 -- the same 32 problem instances as in table 3 were used and the feature weight, b1, was perturbed slightly in either direction.

TABLE 4

Test for Local Optimality

                        States Expanded    Solution Length
                            (mean)             (mean)

b1 = b1(opt)                229.34             31.38
b1 = 1.25 b1(opt)           232.56             32.81
b1 = 0.75 b1(opt)           225.03             34.94
b1 = 2.0 b1(opt)            314.81             42.06
b1 = 0.5 b1(opt)            234.81             33.69

Conclusions

Since the methods used are largely empirical in nature they are not subject to rigorous mathematical analysis. However, certain calculated guesses can be made based on estimation and experience. Performance evaluation can thus be only on the basis of experiment. From the experimental results obtained several conclusions may be drawn. The corresponding exhaustive search (breadth-first) has been used as a yardstick in the performance evaluation.

Although heuristic search was better than a breadth-first search in terms of the number of states expanded, the results are relatively not so good for problem instances solvable by the breadth-first search within 100 state expansions. This is probably due to the fact that the training problem instances were arranged in order of the number of states expanded by a breadth-first search. Hence, if a solution is not obtained after a certain maximum number of states are expanded by breadth-first search, we can then employ a heuristic search. This will invariably be the case for large problem domains.

Although the solution lengths are sub-optimal, this is not a very serious drawback. A few mechanically discovered generator relations may be used as patterns in order to shorten the solution. The Knuth-Morris-Pratt pattern matching algorithm [Standish,1980] could be used for this purpose. Hence, in general, less restrictive heuristics could be used [Bagchi,1983].
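For instance, with the generator set of chapter III the transposition A is its own inverse, so the relation AA = e can be used as a pattern to delete any adjacent pair 'AA' from a solution word. The sketch below uses simple repeated scanning rather than the Knuth-Morris-Pratt algorithm, and the input word is made up.

#include <stdio.h>
#include <string.h>

/* repeatedly delete every occurrence of a relation (a word equal to the
   identity, e.g. "AA") from the solution word, shortening it in place */
static void shorten(char word[], const char *relation)
{
    size_t rl = strlen(relation);
    int changed = 1;
    while (changed) {
        changed = 0;
        char *hit = strstr(word, relation);
        if (hit) {
            memmove(hit, hit + rl, strlen(hit + rl) + 1);
            changed = 1;
        }
    }
}

int main(void)
{
    char solution[] = "BAABBAB";       /* hypothetical solution word */
    shorten(solution, "AA");           /* generator relation AA = e  */
    printf("%s\n", solution);          /* prints BBBAB               */
    return 0;
}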

Although the learned heuristic was found to be optimal for the fifteen puzzle, this claim is not quite valid for permutation group graphs. This is probably because the operators in the fifteen puzzle problem produce relatively less perturbation in the configuration of any given state. Hence there is greater inherent stability in the fifteen puzzle. It might be worthwhile to consider the group generators in computing difference metrics, as was done for GPS [Ernst,1969].

In spite of the measures taken to produce a convergent result we cannot ensure convergence. The results obtained depend on several factors like the specific problem domain, the features selected, the significance level of the regression coefficients, and so on. But the method at least succeeds in eliminating a large amount of human trial and error in obtaining a heuristic.


BIBLIOGRAPHY

Bagchi, A. and Mahanti, A., Search Algorithms Under Different Kinds of Heuristics - A Comparative Study, J.ACM 30, 1, pp 1-21, Jan 1983.

Draper, N.R. and Smith, H., Applied Regression Analysis, John Wiley and Sons, 1981.

Ernst, G.W. and Goldstein, M.M., Mechanical Discovery of Classes of Problem-Solving Strategies, J.ACM 29, 1, pp 1-23, Jan 1982.

Ernst, G.W. and Newell, A., GPS: A Case Study in Generality and Problem Solving, Academic Press, 1969.

Georgeff, M.P., Strategies in Heuristic Search, Artificial Intelligence 20, pp 393-425, 1983.

Herstein, I.N., Topics in Algebra, Blaisdell Publishing Co., 1964.

Hopcroft, J.E. and Ullman, J.D., Introduction to Automata Theory, Languages and Computation, Addison-Wesley Publishing Co., 1979.

IMSL, Library Reference Manual, Edn. 9, Vol. 4, Ch. R, June 1982.

Kernighan, B.W. and Ritchie, D.M., The C Programming Language, Prentice-Hall, Inc., 1978.

Knuth, D.E., The Art of Computer Programming, Vol. 1: Fundamental Algorithms, Addison-Wesley, 1968.

Lenat, D.B., The Nature of Heuristics, Artificial Intelligence 19, pp 189-249, Oct 1982.

Newell, A. and Simon, H.A., Computer Science as Empirical Inquiry: Symbols and Search, Mind Design, Bradford Books Publishers, pp 35-66, 1981.

Nilsson, N.J., Problem-Solving Methods in Artificial Intelligence, McGraw-Hill Book Co., 1971.

Nilsson, N.J., Principles of Artificial Intelligence, Tioga Publishing Co., 1980.

Rendell, L., A New Basis for State-Space Learning Systems and a Successful Implementation, Artificial Intelligence 20, pp 369-392, 1983.

Sedgewick, R., Permutation Generation Methods, Computing Surveys 9, 2, pp 137-164, June 1977.

Shapiro, S.C., Techniques of Artificial Intelligence, Van Nostrand Co., 1979.

Simon, H.A., Lessons for AI from Human Problem-Solving, Computer Science Research Review, Department of Computer Science, Carnegie-Mellon University, 1980.

Slagle, J.R., Artificial Intelligence: The Heuristic Programming Approach, McGraw-Hill Book Co., 1971.

Standish, T.A., Data Structure Techniques, Addison-Wesley Publishing Co., 1980.

Stone, H.S., Discrete Mathematical Structures and Their Applications, Science Research Associates, Inc., 1973.

Topor, R.W., Fundamental Solutions of the Eight Queens Problem, BIT 22, pp 42-52, 1982.

Whitesmiths, Ltd., C Interface Manual for VAX-11, March 1983.

Wu, C. and Feng, T., The Universality of the Shuffle-Exchange Network, IEEE Trans. on Computers C-30, 5, pp 324-332, May 1981.

Zupan, J., Clustering of Large Data Sets, Research Studies Press, 1982.

APPENDIX A

ERROR ESTIMATES

For the purpose of achieving stability in the PLS each penetrance, p, is associated with an error factor, e. Hence a region, r, is coded as the triple, (r,p,e), representing a penetrance as small as p/e or as large as p.e.

As the criterion to split a region, r, into the sub-regions r1 and r2, the distance, d, is given by,

d = ln(p1/e1) - ln(p2.e2),

where p1, e1 are the penetrance and error estimate for r1, and p2, e2 those for r2.

Whenever region splitting occurs, the sub-regions inherit the error factor of the parent region. This is multiplied by two other factors. The first is (1 + sqrt(g)/g).(1 + sqrt(t)/t), where g and t are the same as in the definition of penetrance in chapter II. The second multiplying factor is a quantity inversely proportional to the penetrance, p, namely (1 + 1/sqrt(p)). These expressions are based on the reasoning that the accuracy decreases as sample size decreases.

Whenever regression is performed on the cumulative region set, each response variable value, ln(p), is weighted by multiplying by 1/ln(e).


If p1 and p2 are true penetrance estimates of a region, r, and e1 and e2 are the corresponding error factors, the new estimate, p, is obtained from,

ln p = (ln p1/ln e1 + ln p2/ln e2)/(1/ln e1 + 1/ln e2),

where ln e = 1/(1/ln e1 + 1/ln e2).
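A direct transcription of these two formulas as a small sketch (the sample penetrances and error factors are invented):

#include <stdio.h>
#include <math.h>

/* combine two penetrance estimates (p1,e1) and (p2,e2) of the same region
   into a new estimate (p,e), weighting each by the reciprocal of its log error */
static void combine(double p1, double e1, double p2, double e2,
                    double *p, double *e)
{
    double w1 = 1.0 / log(e1), w2 = 1.0 / log(e2);
    double ln_p = (w1*log(p1) + w2*log(p2)) / (w1 + w2);
    double ln_e = 1.0 / (w1 + w2);
    *p = exp(ln_p);
    *e = exp(ln_e);
}

int main(void)
{
    double p, e;
    combine(0.40, 1.5, 0.20, 2.0, &p, &e);
    printf("p = %.3f  e = %.3f\n", p, e);
    return 0;
}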


APPENDIX B

TRAINING PROBLEM SETS

The following are the problem sets used in each iteration of the learning phase (only final states shown).

TABLE 5

Training Problems

Iteration 1
3 4 5 0 2 1   3 5 0 2 1 4   5 1 0 3 2 4   0 1 3 2 4 5   5 2 0 1 3 4

Iteration 3
3 1 4 0 5 2   3 4 0 5 2 1   3 5 4 0 2 1   3 2 1 4 0 5   3 1 4 0 5 2

Iteration 6
1 2 5 0 3 4   5 4 2 0 1 3   3 0 4 1 2 5   5 4 3 1 0 2   1 3 0 5 2 4

5 0 1 3 4 1 5 2 0 3 0 1 3 2 5 1 4 2 0 3 5 1 3 4 2

Iteration 4
5 3 4 2 0 0 4 3 1 5 2 4 0 5 3 3 1 4 0 2 4 1 3 5 2
2 4 4 0 1 2 1 5 0

Iteration 7
2 4 0 3 1 4 2 0 1 3 4 0 1 3 5 0 1 2 5 4 3 1 0 4 2
5 5 2 3 5

Iteration 2
0 5 2 4 3 1   0 2 4 3 1 5   2 1 0 4 3 5   2 0 4 3 1 5   2 1 0 4 5 3

Iteration 5
4 3 5 2 0 1   4 5 2 0 1 3   4 1 3 5 2 0   4 3 5 2 0 1   4 5 2 0 1 3

Iteration 8
3 5 2 1 4 0   2 4 1 3 0 5   1 0 5 3 2 4   0 3 1 4 2 5   0 2 5 4 3 1


APPENDIX C

COMPUTER IMPLEMENTATION

The following is the complete listing of a computer implementation of the PLS in the C programming language [Kernighan,1978; Whitesmiths,1983], with FORTRAN interfacing.

/*********************************************************
 * HEADER DECLARATIONS (pls.h)                           *
 *********************************************************/

#include <std.h>
#include <stdtyp.h>
#include <pascal.h>
#include <vms.h>
#include <stdio.h>

#define MAXDEG   10
#define MAXGEN   10
#define MAXPOINT 20000
#define MAXNODE  2001
#define MAXCOEF  5
#define YES      1
#define NO       0

struct node {
    int state[MAXDEG];      /* permutation array */
    float fvalue;           /* state fvalue */
    int ident;              /* node number */
    int gen;                /* generator index */
    struct node *parent;    /* points to parent */
    struct node *lnode;     /* points to left neighbour */
    struct node *rnode;     /* points to right neighbour */
};

struct gener {
    int state[MAXDEG];
    char symbol;
};

struct cell {
    int lo[MAXCOEF];
    int hi[MAXCOEF];
    float cp[MAXCOEF];
    float p, e, ep, s;
    struct cell *link;
};

/*********************************************************
 * PENETRANCE LEARNING SYSTEM                            *
 *********************************************************/

#include "pls.h"

main()
{
    static struct cell *head;
    static struct gener genset[MAXGEN];
    static int goal[MAXDEG], points[MAXPOINT][MAXCOEF];
    static int ndeg, ngens, nprob, pmax, niter, nreg, i;
    static float b[MAXCOEF];
    FILE *fopen(), *fp, *fs, *fb, *fr, *fd;

    fp = fopen("dra1:[wkg24]plsp.dat", "r");
    fs = stdout;

    getprob(fp, fs, genset, &ndeg, &ngens, &nprob, &niter);
    fprintf(fs, "main: iteration # %d\n", niter);

    fb = fopen("dra1:[wkg24]plsb.dat", "r");
    getb(fb, "%e", b, MAXCOEF);
    fclose(fb);

    pmax = 0;
    while (nprob-- > 0) {
        geta(fp, "%d ", goal, ndeg);
        solver(genset, goal, points[pmax], &pmax, ndeg, ngens, b, fs);
    }

    fr = fopen("dra1:[wkg24]plsr.dat", "r");
    head = getcell(fr, MAXCOEF, &nreg);
    clust(head, b, &nreg, points, MAXCOEF, pmax, niter);
    fclose(fr);

    fr = fopen("dra1:[wkg24]plsr.dat", "w");
    fprintf(fr, "%d\n", nreg);
    putcell(fr, head, MAXCOEF);
    fclose(fr);

    fb = fopen("dra1:[wkg24]plsb.dat", "w");
    regres(head, b, MAXCOEF, nreg);
    putb(fb, "%6.2f ", b, MAXCOEF);
    fclose(fb);
    putb(fs, "%8.2f ", b, MAXCOEF);
    fclose(fs);
}

/*********************************************************
 * SOLVER FUNCTIONS                                      *
 *********************************************************/

s o l v e r ( g s e t , g c a l , v,pmax, ndeg, ngen, b,fp) s t r u c t gener g s e t [ ] ; i n t goa l [ ] , v [ ][HAXCCEF],*paax; i n t ndeg,ngen; f l o a t b [ ] ; FILE • f p ;

{ ex tern char * a l l o t b u f ; s t r u c t node * o p e n , • c l o s e d , • p , ^ g ; s t r u c t node • s e t l i s ( ) , • s e l e c t () , ^ o n l i s ( ) ; i n t ns[MAXEEG],f[ MAXCOEF ],compa () ; i n t 1 , j , f cund,memo,pid ,nopen,nc losed ,nregen; f l o a t fl 0 , f va l ; f pr in t f (f p," ^\n") ; f p r i n t f ( fp ,"So lver : goal = »*) ; puta(fp,"! lc " ,goa l ,ndeg) ; nopen = nc losed = nregen = 0; pid = 0; / • i n i t i a l i z e • / open = s e t l i s O ; / • s e t - u p l i s t s • / c l o s e d = s e t l i s () ; f or ( i = 0 ; i < ngen; i++) { / • put generators on open • /

f v a l = H ( f , h , g s € t [ i ] . s t a t e , g o a l , n d e g ) ; for (j=1; j < MAXCOEF; j**)

^CFi3]Cj] = ^1335 / * copy • * / f ixncde (open, gset[i ].state,ndeg,fval,pid,i,NULL) ; pid**; nopen**;

} found = NO; aeao = YES; whi le( (opeB->rnode != open) S5 found==NC 85 Beao==YES) {

p = s e l e c t (open); / • choose node with ain fva lue • / t r a n s f e r (p->lnode ,c losed) ; nopen—; nc losed**; f o r ( i = 0 ; i < ngen; i**) {

prod ( n s , p - > s t a t e , g s e t [ i ] . s t a t e , n d e g ) ; i f ( o n l i s (open,ns,ndeg) | | o n l i s ( c lo sed ,ns ,ndeg) )

nregen**; / • ignore regenerated nodes • / e l s e { fval = H(f,b,ns,goal,ndeg): q = fiinode(open->lnode,ns,adeg^fval,pid,i,p) ;


if (q == NULL |1 pid = MAXNCDE) [ fprintf(fp,"solver: memory exhausted n"); memo = NO; break;

} f o r ( j = 1 ; j < MAXCOEF; j**)

• C F i d ] [ j ] = f [ j ] ; / • copy ! • / p i d * * ; nopen**; if(compa(goal,g->state,nd€g) == 0) {

•pmax *= pid; found = YES; fprintf (fp, "Solution (reversed) = *•) ; putS€q(fp,q,gset,v) ; break;

} } } } if (found == NO) fprintf(fp," nsolution not foundXn") ;

fprintf (f p,"States:E=%d 0=5(d R=«d\n", nclosed,nopen,nregen) ;

freeit (open) ; } /*.-... •/

getprob ( f i , f o , g s , d g , n g , n p , n i ) FILE • f i , ^ f o ; s t r u c t {

i n t state[MAXDEG]; char symbol;

} *gs ; i a t •dg ,^ng ,^np ,^ni ; {

i n t i ; f s canf (fi,'»ird %d %d %d n", dg ,ng ,np ,n i ) ; f o r ( i = 0 ; i < •ng; i**) {

f s c a n f ( f !,"%•€ %c %*c %*c !l^c",figs[ i ] . symbol) ; geta(f i ,"%d " , g s [ i ] . s t a t e , •dg) ;

}

fprintf (fc,"Generator Set: \n") ; for(i=0; i < •ng; i**) [

fprintf(fo," %c = ",gs[i].symbol); puta(fo,"^d ",gs[i].state,•dg);

} } /* V


float H(f, b, s, g, n)         /* heuristic evaluation function */
int f[], s[], g[], n;          /* using 5 features */
float b[];
{
    IMPORT DOUBLE exp();
    int cyclp(), nmis(), sdist(), sdif(), npairs(), i;
    float value;

    f[0] = 1.0;                               /* constant feature   */
    f[1] = abs(cyclp(s,n) - cyclp(g,n));      /* cycle difference   */
    f[2] = nmis(s,g,n);                       /* mismatches         */
    f[3] = sdist(s,g,n);                      /* sum of distances   */
    f[4] = npairs(s,g,n);                     /* pair reversals     */
    value = 0;
    for (i = 0; i < MAXCOEF; i++)
        value += b[i] * f[i];
    return (exp(value));
}

/*********************************************************
 * REGION SET DIFFERENTIATION & PENETRANCE NORMALIZATION *
 *********************************************************/

c lus t (head ,b ,nr ,v ,m,n ,n i ) / • c lus ters feature space • / s t r u c t c e l l •head; / • by d i f ferent ia t ion • / f l o a t b[ ] ; i n t •nr,v[ ][HAXCOEF],o,n,ni; { s truct c e l l • r ; f l o a t h,power(); i n t i ;

pr intf ("clust: \o") ; s o r t a ( v , a , n ) ; i f (ni > 1)

h=power(head,^nr,v,m,n); SDlit(head->link,v,ff l ,n); i f (ni > 1 )

norm (head,b,h ,v, a,n) ; *nr = 0 ; / • s i z e of new region se t • / r=head->link; whi le(r != NULI) {

r = r ->l ink; (•nr) **;

} } / • * /


s p l i t ( r e g i o n , v , a , n ) / • s p l i t s r e g i o n s e t • / s t r u c t c e l l • r e g i o n ; i n t v[ ][MAXC0EI3, a, n; [

IMPORT DOUBLE s g r t () ; s t r u c t c e l l • r ; i n t i , a x i s , d i v , n d i v ; i n t lo[MAXCOEF],hi[f!AXCOEF],lo IHAXCCEF3,hi [MAXCOEF]; f l o a t p [ 2 ] , p _ [ 2 ] , e [ 2 ] , e ^ [ 2 ] , d , d _ , d i s t ( ) ;

c o p y a ( l o _ , r e g i o n - > l o , a ) ; / * i n i t i a l i z e b e s t s p l i t i n f o • / c o p y a ( h i , r e g i o n - > h i , a ) ; d_ = - 1 0 0 0 0 0 0 . C O ; f o r ( a x i s = 1 ; a x i s < a; a x i s * * ) { / • t r y a l l f e a t u r e a x e s • /

c o p y a ( l o , r e g i o n - > l o , B ) ; copya ( h i , r e g i o n - > h i , a ) ; nd iv= r e g i o n - > h i [ a x i s ] - r e g i o n - > l o [ a x i s ] ; f o r ( d i v = 1 ; d i v < n d i v ; div**) { / • t r y a l l h y p e r p l a n e s • /

l o [ a x i s ] * * ; / • on t h i s a x i s • / h i [ a x i s ] = l o [ a x i s ] ; pene t (&p[C] ,8e [0 ] , v , r e g i o n - > l o , h i , a, n) ; penet (8p[ 1 ] , S e [ 1 ] , v , l o , r e g i o n - > h i , a ,n) ; d = d i s t (p ,e ) ; i f (d > d ) { / • save b e s t s p l i t so f a r • /

d_ = d; copya (lc_,lo,B) ; copya (hi ,hi,a) ; for(i=G;~i<2; i**) {

P^Ci]=P[i3; e.[i]=e[i]; }

} }

}

3

if(d > -C.2) ( /• create two sub-regions •/ r="(struct cell •) allot (sizeof (struct cell)) ; copya (r->lo,lo_,B) ; copya (r->hi,region->hi,m); copya (r->cp,region->cp, m) ; r->p = p j 1 ] : jc->€ = e_[ 1]+(region->ep)^(1*1/sgrt(50.0^p_[1])) r->ep = region->ep; r->s = region->s; r->l ink = region->link; copya (region->hi,hi_,m) ; region-)p = p_£0]; region->€ = e_[0] • (region->ep)

• (1*1/sqrt(5C.0*p_[0])) ; region->link = r;


i f (d_ > C) s p l i t (region,v,m,n);

e l s e i f {(region = region->link) != NULL)

s p l i t (region,V,a,n); } / * . • V

s t ruc t c e l l • / • inputs region se t • / g e t c e l l ( f p , a , n ) FILE *fp; i n t a, •n; [

struct cell •r, •t, •head; int i,j;

f scanf (f p,*'1d " ,n) ; r=head=(struct cell •) allot(sizeof(struct cell)) ; for(i=0; i < ^n; i**) {

t= ( s t ruc t c e l l •) a l l o t ( s i z e o f ( s t r u c t c e l l ) ) ; g e t a ( f p , "^d ", t ->lo ,m); g e t a ( f p , "^d ", t - > h i , a ) ; for ( j=0; j<a; j**)

t - > c p [ j ] = 0.5^(t->loCJ] * t - > h i [ j ] ) ; fscanf (f p, "5SeJ{e", 8t->p, 8t->e) ; t->ep = t ->e; r->l ink = t ; t ->l ink = NULL; r=t;

} return (head);

} / *

p u t c e l l (fp,head,a) / • outputs region set • / FILE • f p ; s truct c e l l •head; i n t a; {

s truct c e l l • t ; t=head->link; while (t > SULL) [

puta(fp, "%d ", t -> lo ,a ) ; puta( fp , "̂ d̂ ", t->hi,m) ; f p r i n t f ( f p , "«12.9f 5512.9f\n", t ->p , t->e) ; t= t -> l ink;

}

= /


/* V

p e n e t ( p , e , v , l c , h i , m , n ) / • f i n d s p e n e t r a n c e = p , e rror=e • / f l o a t • p , ^ e ; / • wi th p o i n t s = v [ n ] [ m ] , b c u n d s = l o , h i • / i n t v[ ] £ H A X C 0 1 F : , 1 O [ ] , h i £ ] , a , n ;

IMPORT DCOELE s g r t ( ) ; i n t i , t i n a r y ( ) , c o B p a ( ) ; f l o a t g , t ;

g = t = 0 . 0 ; i = 0 ;

w h i l e ( c o a p a ( S l o [ 1 ] , S v [ i ] [ 1 ] , a - 1 ) > 0) i * * ;

w h i l e { c o B i a ( S h i [ 1 ] , S v £ i ] [ 1 ] ,B-1) > 0 8S i < n) { i^ ( • [ i l E O ] ) g *= 1 .0; / • i n c good count • / t *= 1.C; / • i n c t o t a l count • / !+•;

)

• p = (t>O.C S8 g > 0 . 0 ? g / t : 1 . 0 e - 5 ) ; / • pene trance • / • e = (g>O.C ? 1 . 0 * 1 . 0 / s q r t ( g ) : 100 .0)

• (t>O.C ? 1 . 0 * 1 . 0 / s q r t (t) : 1 0 0 . 0 ) ; / • e r r o r • / } / * . . . • V

f l o a t d i s t ( p , e ) / • d i s t a n c e between s u b - r e g i o n s • / f l o a t p [ ] , e [ ] ; {

II! PORT DC U EL I In () ; i n t i , j ;

i = 0 ; j = 1 ; i f ( P [ 0 ] < p [ 1 ] ) {

j=0 ; i = 1 ;

r e t u r n (In < p [ i ] / e [ i ] ) - In (p[ j ]^e[ j ] ) ) ; } / • . . . . V

float power(head,nr,v,m,n) /• search power 8 •/ struct cell •head; /• uncorrected search factor •/ int v[ ]£MAXCOEF',nr,B,n; C

IMPORT DCOELE ln() ; struct cell •r; float p,e,b,xy£500]; int i;

printf ("power: \E") ;


r=head->l ink; i = 0 ; while (r > 5ULL) {

penet (6p ,Se , v , r - > l o , r - > h i , a, n) ; r ->s = p/r->p; x y £ i ] = l n ( p ) ; x y £ i * n r : = l n ( r - > p ) ; i**; r = r - > l i n k ;

} r s l f o r (5b,xy,8nr) ; return(b>0 ? b : 1 e - 5 ) ;

/* V nora (head,b,h,v,a,n) /• fine penetrance normalization •/ struct cell •head; /• of split regions •/ float b[ ], h; int m; {

IMPORT DCOELE exp () , l n () , s g r () , sqrt () ; s t r u c t c e l l • r ; i n t i , c o i n ( ) ; f l o a t p , e , e € , f a c t ;

p r i n t f ( " n o r a : \ n " ) ; r = head->l ink; whi le (r != NULL) {

i f ( c o i n ( r - > l o , r - > h i , r - > c p , m ) ) { / • u n s p l i t region • / penet (8p,Se,v,r->lo,r->hi,m,n) ; p = €xp(h^ln (p)) ; ee = 1.0/sqr (ln(r->e)) * 1.0/sgr (ln(e)) ; r->p = exp((ln (r->p)/sgr (ln(r->e))

* In (p)/sqr(ln(e)))/ee) ; r->e = exp (sgrt(1.0/ee)) ;

else { /• region was split •/ fact = 0; for (1=1; i < B; i**)

fact *= b[i]^(r->cp£i] - 0.5^(r->hi[i] * r->lo[i]));

r->p /= exp ((ln(r->s) * (1.0 - 1.0/h) • fact)); if (r->p < 1e-5) r->p = 1e-5;

} r = r->link; /• next region •/


/ • - V

int coin (x,y,c,i) /• is c centre of region (x,y) ? •/ lot x[ ],y[ ],m; float c£ ]; {

int i;

i=1; while{i<B 88 ( a b s ( c [ i ] - (x [ i ]*y[ i ] ) •O.S) < 1e-5) )

1 * * ; return (i==i ? 1 : 0);

/ * , _ • /

regres (head,b,B,nr) / • au l t ip le l inear regression • / s t ruct c e l l •head; f l o a t b[ ] ; iftt m,nr; C

IMPORT DOOELE In () ; s t r u c t c e l l * r ; f l o a t x y [ 1 C 0 0 ] , b x y [ 1 0 0 ] ; i n t i , j ;

p r i n t f ( " r e g r e s : n") ; r = h e a d - > l i E k ; i = 0 ; w h i l e (r > HULL) {

f o r ( j = 1 ; j < a ; j * * ) ^ y f i • ( j - 1 ) * n r ] = 0 . 5 ^ ( r - > l o [ j ] * r - > h i [ j ] ) ;

x y [ i * (a-1) • n r ] = l n ( r - > p ) / l n (r->e) ;

r = r - > l i D k ; } r a l f o r ( b , b x y , x y , 8 m , 6 n r ) ;

}

/*********************************************************
 * LIST MANAGEMENT FUNCTIONS                             *
 *********************************************************/

struct node • /• sets up list with header node •/ setlis 0 {

struct node •p;

p = (struct node •) allot(sizeof(struct node));


} / * .

p->lnode = p->rnode = p; return (p) ;

y transfer(p ,q) / • transfers node after p to after q • / s truct node • p , •q; {

s t r u c t node • r ; i f (p->rnod€ == p)

return (-1); r = p->rncde; r->rnode->lnode = p; p->rnode = r->rnode; i n s e r t (q,r) ;

} / * • /

inser t (p,q) / • i n s e r t s node q after node p • / s t r u c t node •p,«q; / • in a l i s t of nodes • / {

q->rnode = p->rnode; q->lnode = p->rnode->lnode; p->mode = p->rnode->lnode = g;

} / * . . . - . . - . . . . . • / s truct node • / • creates new node and • / fixnode ( g n , s , n , f , o , g , p ) / • returns pointer to i t • / s t ruc t node •qn; / * pointer to predecessor node • / i n t s[ ] , n; / • permutation of degree n • / f l o a t f; / • fvalue of node • / i n t o; / • n o d e number • / i n t g; / • generator index • / s t ruc t node •p ; / • pointer to parent node • / {

s t r u c t node •pn;

pn = (s truct node •) a l l o t ( s i z e o f ( s t r u c t node)) i f (pn == KOLL) return (NULL) ; copya(pn->state,s,n); pn->fvalue = f; pn->ident = o; pn->gen = g; pn->parent = p; insert (qn,f n) ; return (pn);


} /• V

struct node • onlis(qn,x,n) /• returns pointer to node with •/ struct node •qn; /• state x on list pointed to by qn •/ int x[ ],n; [

struct node •p;

f or (p=qn->rnode; p != qn; p=p->mode) i f (coapa ( p - > s t a t e , x , n ) == 0)

return (p) ; return(NOII) ;

} / • V

putseq ( f p , s , g , v ) / • pr in t s s o l u t i o n seguence • / FILE • f p ; / • from node s backwards • / s t r u c t node • s ; s t r u c t gener • g ; i n t v£ ][ MAXCOEF :; {

s t r u c t node • p ; i n t l e n ;

l e n = 0; f o r ( p = s ; p 1= NULL; p=p->parent) {

l e n * * ; fpr in t f ( fp ,"5c" ,g [F->gen] . symbol ) ; v [ p - > i d e n t ] [ 0 ] = 1 ; / • good node • /

f p r i n t f ( f p , " nso lu t ion length = *«d\n",len); }

/ * - • • • V

s t r u c t node • ^. ^ ^ .^v • ^ , *^ s e l e c t (qn) / * s e l e c t s f i r s t node with mm fvalue • / s t r u c t node^ qn; / • from queue gn • / i

s t r u c t node • p , •pmm; f l o a t minf; p = qn->rnode; a i n f = p->fvalu€; for (pa in=p; p != qn; p=p->rnode)

i f ( p - > f v a l u e < ainf) { ainf = p->fvalu€; pain = p;


} return (pain);

} /*.. V •define NULL 0 /• pointer value for error report •/ •define ALL0TSI2E 4000000 /• size of available space •/

static char allctbuf[ALLCTSIZB] = {»0*); /• allot storage •/ static char •allotp = allotbuf; /• next free position •/

char •allot(n) /• returns pointer to n bytes •/ int n; /• general byte storage allocator •/ {

if (allctp • n <= allotbuf * ALLOTSIZE) { /• fits •/ allotp *= n; return(allotp - n); /• old p •/

} else return (NOLL) ;

} /*.......... •/

freeit (p) /• free storage pointed to by p •/ struct node •p; {

if (p >= allctbuf 85 p < allotbuf * ALLOTSIZE) allotp = p;

}
/*********************************************************
 * ARRAY MANIPULATIONS                                   *
 *********************************************************/

prod(p,x,y,n) /• product of two perautations •/ int p[ ], x[], y£], n; {

int i;

for(i=0 ; i < n; i**)

P[i] = i C ^ E i ] ] ;

^. V i n t c o a p a ( x , y , n ) / • returns <0 i f x<y, • / i n t x[ ] , y [ ] , n; / • 0 i f x=y, >0 i f x>y • / {

int i; i=0;


while ( x [ i ] = y [ i ] 85 i<n) i**;

return(i==n ? 0 : x [ i ] - y [ i ] ) ; } / * • • • • - - - . . . V copya(x,y,n) / • copy array y to array x • / i a t x£ ] , y [ ] , n; C

int i ;

for( i=0; i<n; i**) ^ i ] = l £ i ] ;

} / * - . - . . . . . • /

geta(fp,ffflt,x,n) / • inputs array • / FILE •fp; char • fat ; int x[ ] , n; {

in t i ; for (1=0; i < n; i**) {

fscanf (fp,fat ,Sx[i]) ; }

} / * . V

puta (fp,fmt,x,D) /• outputs array •/ FILE •fp; char •fat; int x£ ], n; {

in t i ; for(i=0; i < n; i**)

fprintf (fp,fmt,x[i]) ; f printf (fp,"\n") ;

3 / * . - . . ^ V getb (fp,fBt,x,n) / • inputs f loat array • / FILE •fp; char • fa t ; f loa t x[ ] ; int n; {

int i ;


for(i=0; i < n; i**) { fscanf (fp,fat,8x[i]) ;

3

putb FILE c h a r

( f p , f a t , x * f p ; • f a t ;

f l o a t x[ ] ; i n t n; {

i n t i ;

,n)

• /

/• outputs float array •/

3 / *

f or ( i= 0 ; i < n; i**) fpr int f ( f p , f a t , x [ i ] ) ;

f printf (fp,"\n") ;

* /

/• orbit lengths product of •/ /• perautation x of degree n •/

i n t cyc lp(x ,n) i n t x£ ] , n; { i n t prod,count,i,3,teap,y[MAXDEG]; f o r ( i = 0 ; i < n; i**)

y [ i ] = x [ i ] ; prod = 1; f o r ( i = 0 ; i < n; i**) {

ifCyCi] < C) continue; count = 1; j = i ; while ( y [ j ] != i) {

teap = y £ j ] ; yCj] = - 1 ; j = teap; count**;

3 yC j ] = - n prod •= count;

3 return (prod) ; 3 / *

* /

i n t nmis(jc,y,n) / • nuaber of misaatches of arrays x and y • / i n t x£ ] , y[ ] , n; { i n t c o u n t , i ;


CO un t = 0 ; f o r ( i = 0 ; i < n; i**)

i f ( x [ i 3 •= y£ i ] ) count**; return (count) ; 3 / * * /

i n t s d i s t ( x , y , n ) i n t x£ ] ,y[ 3 ,n; £ i n t c o u n t , i , j ;

/ • SUB of d i s tances • / / * between equal elements • /

count=0; f o r ( 1 = 0 ; i < n; i**) {

j = 0 ; w h i l e ( j < n '65 x [ i ] ! =

count *= a b s ( i - j ) ; 3 re turn (count) ; 3 / * . . . . . . . . . . . - . . . . . . ,

y l j ] )

V

i n t npairs ( x , y , n ) i n t x[ ] , y£ ] , n; { i n t c o u n t , i , j ;

/ • number of pair r e v e r s a l s • /

count=0; f o r ( i = 0 ; i < n; i**)

for ( j = i * 1 ; j < n ; j**) i f ( x [ i ] = = y [ j ] 88 y [ i ] = = x [ j ] )

count**; return (count) ; 3 / * . . . . . . . • * /

s o r t a ( v , B , n ) / • s h e l l s o r t v£0 ] . . .v£n-1 ] in i n c order • / i n t v£ ]£flAXCOEF', m, n; {

i n t g a p , t e a p , i , j ,k ,compa() ;

for (gap = n /2; gap > 0; gap / = 2) f o r ( i = gap; i < n; i**)

f o r ( j = i - g a p ; j>=0; j-=gap) { i f (compa(fiv[ j ] £ 1 ] , 8 v [ j * g a p ] [ 1],m-1) <= 0)

break; f cr (k=0; k<m; k**) {


3

temp = v£j ]£k3; • [ J l C k ] = v£j*gap]£k] ; vCJ*gap3£k] = temp;

3 / * • /


The following are FORTRAN subroutines used for regression.

C**********************************************************
C * REGRESSION ROUTINES                                    *
C**********************************************************

SUBROUTINE RSLFOR (B,XY,N) C SIMPLE LINEAR REGRESSION

INTEGER N REAL XT(N,2) , B INTEGER IX,IMOD,IPRED,IP,NN,IER REAL ALEAP(3) ,DES(5) ,AN0VA(14) ,STAT(9) ,PflED(1,7)

IX = N IMOD = 1 IPRED = 0 ALBAP (1) = 0.05 IP=1 CALL RLCHE(XY,IX,N,IMOD,IPRED,ALBAP,DES,

• ANOVA,STAT,PBED,IP,NN,IEB) B = STAT (1) RETURN END

SUBROUTINE RMLFOR (B,XYB, XY, M, N) FORWARD STEPWISE MULTIPLE LINEAR REGRESSION

INTEGER REAL INTEGER REAL

B,N E(H) ,XYB(M,5),XY(N,M) IX,IJ0B(2) ,IND(9) ,IB,IER ALFA(2) ,AN0VA(16) ,VABB(15)

MX = H-1 IX = N ALFA(1) ALFA (2) IJOB(1) IJ0B(2) IB = H IN0(1) IND(2) IND (3) IND(4)

0.1 0.15 0

= 1

= 1 = 0 = 0 = 0

CALL RLSEP (XY,N,MX,IX,ALFA,iaOB, IND,AHOVA,XYB,IB,VARB,IER)

B(1) = XTE(M,2)


10

K=l!-1 DO 10 1=1,K B ( I * 1 ) = XYB(I ,2 ) RETURN END
