ai complete notes

CHAPTER 1 : AI And Internal Representation

INTRODUCTION

The problem of defining artificial intelligence becomes one of defining intelligence itself. What is intelligence? Is it a single faculty, or a collection of properties? Is it learned, or is it innate? Can it be observed in behavior, or do we have some scale to measure it? These questions, whose answers may be diverse in nature, help in understanding and exploring the limits of AI.

HISTORY

Intelligence forms the foundation of all human technology and, in fact, of all human civilization. Yet there has long been a feeling that the human effort to gain knowledge is a transgression against the laws of God or nature, e.g. Eve in the Bible and Prometheus in Greek mythology.

The logical starting point of the history of AI dates back to Aristotle. Through careful analysis, he turned the insights, wonders and fears about nature into disciplined thought. For him, the study of thought itself was the basis of all knowledge. In his ‘Logic’, he investigated whether certain propositions can be said to be “true” because they are related to other things that are known to be true. Gottlob Frege, Bertrand Russell, Kurt Gödel, Alan Turing and others followed this school of thought.

A major development that drastically changed the world view was the discovery by Copernicus that the earth is not the center of the universe but just a component of it. Though it went against practiced dogmas and revered religious beliefs, it led to a new realization that:
o Our ideas about the world may be fundamentally different from its appearance.
o There is a gap between the human mind and its surrounding realities.
o There is a gap between our ideas about things and the things themselves.

The argument is that:
o We could separate the mind and the physical world.
o It is necessary to find a way to connect these.

The accepted view is that, though they are separate, they are not fundamentally different. Mental processes, like physical processes, can be characterized using formal mathematics or logic.

LOGIC BASED INTELLIGENCE

Once thinking came to be viewed as a form of computation, formalizing and mechanizing it became necessary. In the 17th century, Leibniz introduced a system of formal logic and constructed a machine for automating its calculation. Euler's invention of graph theory, through the Königsberg bridge problem, introduced the concept of state space representation. George Boole's Boolean algebra formalized the laws of logic mathematically through AND, OR and NOT. Gottlob Frege created a language, ‘First Order Predicate Calculus’, which was later used as a representation technique in AI. Bertrand Russell, in ‘Principia Mathematica’ (written with Alfred North Whitehead), developed a theoretical foundation for AI by treating mathematics as a formal system.

of 94

Alfred Tarski created a ‘Theory of Reference’, which shows how well-formed formulae can be said to refer to objects in the world. Science and mathematics were prerequisites for the formal study of AI, but the invention of the digital computer changed the picture. Because computers provide memory and processing power, we can treat:
o Intelligence as a form of information processing.
o Search as a problem-solving methodology.
o Knowledge as something that can be put in a representational form and manipulated using algorithms.

THINKING MACHINES

Can machines think? Are humans machines? Can machines think like humans? How would we know whether a computer is thinking?

CAN COMPUTERS THINK? Some common answers:

o No (dualist/mystic): Computers lack "mental stuff". They don't have intuitions, feelings, phenomenology. (Soul?)

o No (neurophysiology critical): Even if their behavior is arbitrarily close, biology is essential.

o No (beyond our capabilities): Not impossible in principle, but too complex. There are limits on our self-knowledge which will prevent us from creating a thinking computer.

o Yes (but not in our lifetimes): Too complex. Practical obstacles may be insurmountable. But with better science and technology, maybe.

o Yes (functionalist): Programs/computers will become smarter without a clear limit. For all practical purposes, they will 'think' because they will perform the FUNCTIONS of thinking. (The standard AI answer.)

o Yes (extreme functionalist; back to mystic): Computers already think. All matter exhibits mind in different aspects and degrees.

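TURING TEST

In 1950, Alan Turing considered the question of whether a machine could actually be made to think. The Turing test measures the performance of an allegedly intelligent machine against that of a human. An interrogator exchanges typed messages with a hidden human and a hidden machine. If the interrogator cannot reliably tell which is which, the machine passes the test.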


LOGIC BASED INTELLIGENCE VS. AGENT BASED INTELLIGENCE

We have looked at intelligence as logical inference, and at logic as a knowledge representation technique. This is a typical Western line of thought that starts with Aristotle. Of late, this school of thought has been questioned. The objections rest on simple facts like these:

o The formation of language involves no logical reasoning. It has much to do with cultural and social situations.

o The working of the brain is based on the inputs of neurons (so we have Artificial Neural Networks).

o Species adapt to an environment (so we have Genetic Algorithms).

o The working of social systems is based on the performance of autonomous individual agents (so we have Intelligent Agents).

These examples show two things:
o Intelligence emerges from the process.
o Intelligence is reflected by the collective behavior of agents.

The agent-oriented and emergent view of intelligence has the following properties:
o Agents are autonomous or semi-autonomous. Each agent has a specified responsibility to undertake and is ignorant of what the others are doing.
o Each agent is sensitive to its own surrounding environment and has no knowledge of the full domain.
o Agents interact with one another. The society of agents is structured, which helps in solving the global problem.
o The cooperative interaction of the agents results in the emergence of intelligence.

FOUNDATIONS OF AI Different fields have contributed to AI in the form of ideas, viewpoints and techniques.

o Philosophy: Logic, reasoning, mind as a physical system, foundations of learning, language and rationality.

o Mathematics: Formal representation and proof algorithms, computation, (un)decidability, (in)tractability, probability.

o Psychology: adaptation, phenomena of perception and motor control.
o Economics: formal theory of rational decisions, game theory.
o Linguistics: knowledge representation, grammar.
o Neuroscience: physical substrate for mental activities.
o Control theory: homeostatic systems, stability, and optimal agent design.

A BRIEF HISTORY

What happened after WWII?

o 1943: Warren McCulloch and Walter Pitts proposed a model of artificial Boolean neurons that perform computations.
  First steps toward connectionist computation and learning (Hebbian learning).
  Marvin Minsky and Dean Edmonds (1951) constructed the first neural network computer.

o 1950: Alan Turing's “Computing Machinery and Intelligence”.
  The first complete vision of AI.

The birth of AI (1956)

o The Dartmouth Workshop brought together top minds on automata theory, neural nets and the study of intelligence.
  Allen Newell and Herbert Simon: the Logic Theorist (the first non-numerical thinking program, used for theorem proving).
  For the next 20 years the field was dominated by these participants.

o Great expectations (1952-1969)
  Newell and Simon introduced the General Problem Solver, which imitated human problem solving.
  Arthur Samuel (1952) investigated game playing (checkers) with great success.
  John McCarthy (1958): inventor of Lisp (the second-oldest high-level language); logic oriented; the Advice Taker (separation between knowledge and reasoning).

Great expectations continued

o Marvin Minsky (1958): introduction of microworlds that appear to require intelligence to solve, e.g. the blocks world. Anti-logic orientation, society of the mind.

o Collapse in AI research (1966-1973)
  Progress was slower than expected.
  Predictions were unrealistic.
  Some systems lacked scalability.
  Combinatorial explosion in search.
  Fundamental limitations on techniques and representations.
  Minsky and Papert (1969): Perceptrons.

AI revival through knowledge-based systems (1969-1979)

o General-purpose vs. domain-specific systems, e.g. the DENDRAL project, the first successful knowledge-intensive system.

o Expert systems
  MYCIN, to diagnose blood infections; introduction of uncertainty in reasoning.

o Increase in knowledge representation research: logic, frames, semantic nets, …

AI becomes an industry (1980-present)

Connectionist revival (1986-present)

o Parallel distributed processing (Rumelhart and McClelland, 1986); back-propagation.

AI becomes a science (1987-present)

o In speech recognition: hidden Markov models.
o In neural networks.
o In uncertain reasoning and expert systems: the Bayesian network formalism.

The emergence of intelligent agents (1995-present)

o The whole agent problem: “How does an agent act/behave when embedded in real environments with continuous sensory inputs?”

STATE OF THE ART

Deep Blue defeated the reigning world chess champion Garry Kasparov in 1997.

ALVINN: No hands across America (driving autonomously 98% of the time from Pittsburgh to San Diego).

DART: During the 1991 Gulf War, US forces deployed an AI logistics planning and scheduling program that involved up to 50,000 vehicles, cargo, and people.

NASA's on-board autonomous planning program controlled the scheduling of operations for a spacecraft.

Proverb solves crossword puzzles better than most humans.

AI IN LOGIC PERSPECTIVE

AI is the study of mental faculties through the use of computational models. It rests on the premise that what the brain does may be thought of as a kind of computation. Yet what the brain does easily can take enormous effort for a machine, e.g. vision.

INTERNAL REPRESENTATION

In order to act intelligently, a computer must have knowledge about the domain of interest. Knowledge is the body of facts and principles gathered, or the act, fact, or state of knowing. This knowledge needs to be presented in a form the machine understands. This format is called an internal representation. Thus plain English sentences can be translated into an internal representation, which can then be used to answer questions based on those sentences.

PROPERTIES OF INTERNAL REPRESENTATION

Internal representation must remove all referential ambiguity.
o Referential ambiguity is ambiguity about what a sentence refers to.
o E.g. ‘Raj said that Ram was not well. He must be lying.’
o Who does ‘he’ refer to?

Internal representation should avoid word-sense ambiguity.
o Word-sense ambiguities arise because words have multiple meanings.
o E.g. ‘Raj caught a pen. Raj caught a train. Raj caught fever.’

Internal representation must explicitly mention functional structure.
o Functional structure is who did what to whom. A language expresses it through word order, and the same idea can be expressed with different word orders.
o E.g. ‘Ram killed Ravan.’ and ‘Ravan was killed by Ram.’ have the same functional structure.
o Thus internal representation need not follow the word order of the original sentence.

Internal representation should be able to handle complex sentences without losing the meaning attached to them.

PREDICATE CALCULUS

Predicate calculus is an internal representation methodology that helps us deduce more results from given propositions (statements). It represents a proposition by accessing the proposition's individual components. For example, the sentence ‘Raj came late on Sunday’ can be represented in predicate calculus as:

(came-late Raj Sunday)

Here ‘came-late’ is a predicate that describes the relation between a person and a day. ‘Raj came late on a rainy Sunday’ can be represented as:

(and (came-late Raj Sunday) (inst Sunday rainy))

Predicate calculus lets us break a statement down into its component parts, namely objects, characteristics of objects, or assertions about objects.

SYNTAX OF PREDICATE CALCULUS


Predicate and Arguments
o In predicate calculus, a proposition is divided into two parts:
  Arguments (or objects)
  Predicate (or assertion)
o The arguments are the individuals or objects an assertion is made about. The predicate is the assertion made about them.
o In an English sentence, the objects are the nouns that serve as subject and object of the sentence, and the predicate is the verb or part of the verb.
o For example, the proposition ‘Vinod likes apple’ would be stated as: (likes Vinod apple)
o Here ‘likes’ is the predicate and Vinod and apple are the arguments.
o In some cases, the sentence has no verb that can serve as the predicate. For example, ‘Anita is a woman’ uses the predicate inst: (inst Anita woman).

Constants
o Constants are fixed-value terms that belong to a given domain.
o They are denoted by numbers and words, e.g. 123, abc.

Variables
o In predicate calculus, letters may be substituted for the arguments.
o The symbols x or y could be used to designate some object or individual.
o The example ‘Vinod likes apple’ could be expressed in variable form with x = Vinod and y = apple. Then the proposition becomes: (likes x y)
o If variables are used, then the stated proposition must be true for any names substituted for the variables.

Instantiation
o Instantiation is the process of assigning the name of a specific individual or object to a variable.
o That object or individual becomes an “instance” of that variable.
o In the previous example, supplying Vinod for x and apple for y is a case of instantiation.

Connectives
o There are four connectives used in predicate calculus.
o They are ‘not’, ‘and’, ‘or’ and ‘if’.
o If p and q are formulas, then (and p, q), (or p, q), (not p) and (if p, q) are also formulas.
o They can be expressed in truth tables.

(not p):
p | (not p)
T | F
F | T

(and p, q):
p | q | (and p, q)
T | T | T
T | F | F
F | T | F
F | F | F

(or p, q):
p | q | (or p, q)
T | T | T
T | F | T
F | T | T
F | F | F

(if p, q):
p | q | (if p, q)
T | T | T
T | F | F
F | T | T
F | F | T
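As a check on the tables above, the following Common Lisp sketch computes each table from the connective's definition. It is not part of the original notes, and the names IMPLIES, TRUTH-TABLE and SHOW-TABLE are our own. Two assumptions shape it. Lisp's own IF is a special operator rather than a truth function, so (if p, q) is written as the function IMPLIES. AND and OR are macros, so they are wrapped in lambdas before being passed to FUNCALL.

```lisp
;; Truth tables for the binary connectives of predicate calculus.
;; (not p) needs no helper: Lisp's NOT already maps T to NIL and NIL to T.

(defun implies (p q)
  "The connective (if p, q): false only when P is true and Q is false."
  (or (not p) q))

(defun truth-table (connective)
  "Return rows (P Q RESULT) for a two-argument CONNECTIVE, RESULT as T or NIL."
  (loop for p in '(t nil)
        append (loop for q in '(t nil)
                     collect (list p q (if (funcall connective p q) t nil)))))

(defun show-table (label connective)
  "Print the table for CONNECTIVE using T and F, as in the tables above."
  (format t "~&p q ~a~%" label)
  (dolist (row (truth-table connective))
    ;; ~:[F~;T~] prints F for NIL and T for anything else
    (format t "~{~:[F~;T~]~^ ~}~%" row)))

(show-table "(and p, q)" (lambda (p q) (and p q)))
(show-table "(or p, q)" (lambda (p q) (or p q)))
(show-table "(if p, q)" #'implies)
```

Each printed row lists p, q and the result, in the same order as the tables above.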

Quantifiers
o A quantifier is a symbol that lets us state the range or scope of the variables in a predicate logic expression.
o Two quantifiers are used in logic:
  The universal quantifier, ‘forall’, e.g. (forall (x) f) for a formula f.
  The existential quantifier, ‘exists’, e.g. (exists (x) f) for a formula f.

Function applications
o A function application consists of a function applied to zero or more arguments.
o E.g. (friend-of x).

“All Maharashtrians are Indian citizens” could be expressed as:
o (forall (x) (if (Maharashtrian x) (Indian-citizen x)))

“Every car has a wheel” could be expressed as:
o (forall (x) (if (Car x) (exists (y) (wheel-of x y))))
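Over a finite domain, forall behaves like "check every object" and exists behaves like "find at least one". The sketch below assumes a small, made-up model with two cars and three wheels; the domain, the facts and the helper HOLDS-P are all hypothetical. Under that model, the car formula can be checked in Common Lisp with the standard functions EVERY and SOME. Real domains are usually infinite or unknown, so this only illustrates what the quantifiers mean. It is not a theorem prover.

```lisp
;; A hypothetical finite model: the domain and facts are invented for this sketch.
(defparameter *domain* '(car1 car2 w1 w2 w3))

(defparameter *facts*
  '((car car1) (car car2)
    (wheel-of car1 w1) (wheel-of car1 w2) (wheel-of car2 w3)))

(defun holds-p (&rest atomic-formula)
  "True if the ground atomic formula, e.g. (wheel-of car1 w1), is among *FACTS*."
  (member atomic-formula *facts* :test #'equal))

(defun every-car-has-a-wheel-p ()
  "Check (forall (x) (if (car x) (exists (y) (wheel-of x y)))) over *DOMAIN*."
  (every (lambda (x)
           ;; (if p, q) is true when p is false or q is true
           (or (not (holds-p 'car x))
               ;; exists: some object y in the domain is a wheel of x
               (some (lambda (y) (holds-p 'wheel-of x y)) *domain*)))
         *domain*))
```

If a third car with no wheel is added to the model, EVERY-CAR-HAS-A-WHEEL-P returns NIL.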

THE PREDICATE CALCULUS CONSISTS OF:
o A set of constant terms.
o A set of variables.
o A set of predicates, each with a specified number of arguments.
o A set of functions, each with a specified number of arguments.
o The connectives ‘if’, ‘and’, ‘or’ and ‘not’.
o The quantifiers ‘exists’ and ‘forall’.

The terms used in predicate calculus are:
o Constant terms.
o Variables.
o Functions applied to the correct number of terms.

The formulas used in predicate calculus are:
o A predicate applied to the correct number of terms.
o If p and q are formulas, then (if p, q), (and p, q), (or p, q) and (not p).
o If x is a variable and p is a formula, then (exists (x) p) and (forall (x) p).

In predicate calculus, the initial facts from which we derive more facts are called axioms. The facts we deduce from the axioms are called theorems. The set of axioms is not fixed. It changes over time as new information (new axioms) arrives.

INFERENCE RULES

From a given set of axioms, we can deduce more facts using inference rules. The important inference rules are:
o Modus ponens: From p and (if p, q) infer q.
o Chain rule: From (if p, q) and (if q, r) infer (if p, r).
o Substitution: If p is a valid axiom, then a statement derived from it by consistent substitution of propositions is also valid.
o Simplification: From (and p, q) infer p.
o Conjunction: From p and q infer (and p q).
o Transposition: From (if p, q) infer (if (not q) (not p)).
o Universal instantiation: If something is true of everything, then it is true of any particular thing.
o Abduction: From q and (if p, q) infer p. (Abduction can lead to wrong conclusions. Still, it is very important because it produces explanations, e.g. in medical diagnosis.)
o Induction: From (P a), (P b), … infer (forall (x) (P x)). (Induction leads to learning.)
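Modus ponens is easy to mechanize. The sketch below is a minimal, hypothetical forward chainer in Common Lisp. It handles only ground propositions (no variables or unification), writes rules as (if p q) lists, and keeps applying modus ponens until nothing new can be derived. The function names and the example propositions are illustrative assumptions. Because it applies modus ponens repeatedly, it also follows a chain of rules, which gives the same result as using the chain rule.

```lisp
;; Ground propositions only (no variables, no unification): a minimal sketch.
;; A rule is the list (if p q); here IF is just a symbol in the data,
;; not Lisp's IF operator.

(defun modus-ponens-step (facts rules)
  "New conclusions q such that p is a known fact and (if p q) is a rule."
  (remove-duplicates
   (loop for (nil p q) in rules            ; NIL skips the IF symbol
         when (and (member p facts :test #'equal)
                   (not (member q facts :test #'equal)))
           collect q)
   :test #'equal))

(defun forward-chain (facts rules)
  "Apply modus ponens repeatedly until no new fact can be derived."
  (loop for new = (modus-ponens-step facts rules)
        while new
        do (setf facts (append facts new)))
  facts)
```

For example, start with the fact (rainy sunday) and the rules (if (rainy sunday) (wet road)) and (if (wet road) (drive slowly)). FORWARD-CHAIN then returns all three propositions, in the order they were derived.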

EXERCISE: EXPRESS THE FOLLOWING IN PREDICATE CALCULUS:

Roses are red.
o (forall (x) (if (inst x rose) (color x red)))

Violets are blue.
o (forall (x) (if (inst x violet) (color x blue)))

Every chicken hatched from an egg.
o (forall (x) (if (chicken x) (exists (y) (and (egg y) (hatched-from x y)))))

Some language is spoken by everyone in this class.
o (exists (y) (forall (x) (if (belong-to-class x) (speak-language x y))))

If you push anything hard enough, it will fall over.
o (forall (x) (if (push-hard x) (fall-over x)))

Everybody loves somebody sometime.
o (forall (x) (exists (y) (loves-sometime x y)))

Anyone with two or more spouses is a bigamist.
o (forall (x) (if (have-two-or-more-spouses x) (inst x bigamist)))

ALTERNATIVE NOTATIONS

Knowledge represented in predicate calculus can also be represented in a number of alternative notations. The important ones are:

o Semantic networks.
One of the oldest and easiest-to-understand knowledge representation schemes is the semantic network. Semantic networks are graphical depictions of knowledge that show hierarchical relationships between objects. For example, ‘Sachin is a cricketer’, i.e. (inst Sachin cricketer), can be represented in an associative network as:


[Figure: node Sachin linked to node Cricketer by an arc labeled inst]

A semantic network is made up of a number of ovals or circles called nodes. Nodes represent objects and descriptive information about those objects. Objects can be any physical item, concept, event or action. The nodes are interconnected by links called arcs. These arcs show the relationships between the various objects and descriptive factors. The arrows on the lines point from an object to its value along the corresponding arc. From the viewpoint of predicate calculus, associative networks replace terms with nodes and relations with labeled directed arcs.

The semantic network is a very flexible method of knowledge representation, and there are no hard rules about knowledge in this form. Semantic networks can show inheritance: they can explain how elements of specific classes inherit attributes and values from the more general classes in which they are included.

The instance relation corresponds to the relation element-of. Sachin is an element of the set of cricketers, so he is also an element of every superset of the set of cricketers. The ‘isa’ relation corresponds to the relation subset-of. Cricketers are a subset of sportsmen, e.g. (isa cricketer sportsman), and hence cricketers inherit all the properties of sportsmen.


[Figure: node Sachin linked to Cricketer by an inst arc; Cricketer linked to Sportsman by an isa arc]

Plain predicate calculus lacks backward pointers (from an object to the assertions made about it), so retrieving information can require a long search. Predicate calculus combined with an indexing (pointing) scheme is therefore a much better internal representation scheme than semantic networks, because it also has connectives and quantifiers.
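Inheritance along inst and isa arcs can be sketched in a few lines of Common Lisp. This illustration is not the notes' own code. Each arc is stored as a list (from label to). The inst and isa arcs come from the Sachin example, while the TRAINS and USES properties are hypothetical. INHERITED-VALUE first looks at the node itself and then climbs to more general classes. ARC-VALUE scans the whole arc list on every lookup, which is the kind of long search an indexing scheme is meant to avoid.

```lisp
;; Arcs are stored as (from label to). The inst/isa arcs follow the Sachin
;; example; the USES and TRAINS properties are hypothetical, for illustration.
(defparameter *arcs*
  '((sachin inst cricketer)
    (cricketer isa sportsman)
    (cricketer uses bat)
    (sportsman trains regularly)))

(defun arc-value (node label)
  "The node reached from NODE along an arc labeled LABEL, or NIL if none."
  (third (find-if (lambda (arc)
                    (and (eq (first arc) node) (eq (second arc) label)))
                  *arcs*)))

(defun inherited-value (node label)
  "Look for LABEL on NODE itself, then climb inst/isa arcs to more general classes."
  (or (arc-value node label)
      (let ((parent (or (arc-value node 'inst) (arc-value node 'isa))))
        (and parent (inherited-value parent label)))))
```

With these arcs, (inherited-value 'sachin 'uses) returns BAT from the cricketer class. (inherited-value 'sachin 'trains) climbs one level further and returns REGULARLY from sportsman.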

o Slot Assertion Notation.
In slot assertion notation, the various arguments of a predicate, called slots, are expressed as separate assertions. Slot assertion notation is a special type of predicate calculus representation. For example, (catch-object Sachin ball) can be expressed as:

(inst catch1 catch-object) ; catch1 is one instance of catching.
(catcher catch1 Sachin)    ; Sachin did the catching.
(caught catch1 ball)       ; He caught the ball.

o Frame notation.
Frame notation combines the different slots of the slot assertion notation. Thus we have:

(catch-object catch1 (catcher Sachin) (caught ball))

Here we have constructed a single structure, called a frame, that includes all the information.
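Going from slot assertions to a frame is mechanical: gather every assertion about the same instance. Below is a minimal Common Lisp sketch. It assumes that assertions are stored as (slot instance value) lists and that the inst assertion supplies the frame's type. BUILD-FRAME is a name chosen for this illustration.

```lisp
;; Slot assertions for (catch-object Sachin ball), as given in the text.
(defparameter *slot-assertions*
  '((inst catch1 catch-object)
    (catcher catch1 sachin)
    (caught catch1 ball)))

(defun build-frame (id assertions)
  "Collect the assertions about ID into one frame: (TYPE ID (SLOT VALUE) ...)."
  (let ((frame-type nil)
        (slots '()))
    (dolist (assertion assertions)
      (destructuring-bind (slot-name subject value) assertion
        (when (eq subject id)
          (if (eq slot-name 'inst)
              (setf frame-type value)                ; inst gives the frame's type
              (push (list slot-name value) slots)))))
    ;; PUSH collected the slots in reverse, so restore their original order
    (list* frame-type id (nreverse slots))))
```

(build-frame 'catch1 *slot-assertions*) returns (CATCH-OBJECT CATCH1 (CATCHER SACHIN) (CAUGHT BALL)), which is the frame shown above.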

EXERCISE: CONVERT THE FOLLOWING TO FIRST-ORDER PREDICATE LOGIC USING THE PREDICATES INDICATED:

swimming_pool(X)  steamy(X)  large(X)  unpleasant(X)  noisy(X)  place(X)

All large swimming pools are noisy and steamy places.
All noisy and steamy places are unpleasant.
All noisy and steamy places except swimming pools are unpleasant.
The swimming pool is small and quiet.

ANSWERS:

All large swimming pools are noisy and steamy places.
o (forall (x) (if (and (large x) (swimming_pool x)) (and (noisy x) (and (steamy x) (place x)))))

All noisy and steamy places are unpleasant.
o (forall (x) (if (and (noisy x) (and (steamy x) (place x))) (unpleasant x)))

All noisy and steamy places except swimming pools are unpleasant.
o (forall (x) (if (and (not (swimming_pool x)) (and (noisy x) (and (steamy x) (place x)))) (unpleasant x)))

The swimming pool is small and quiet.
o (exists (x) (and (swimming_pool x) (and (not (large x)) (not (noisy x)))))

CHAPTER 2 : LISP

BRIEF HISTORY

Lisp (the name stands for LISt Processor) is the second-oldest programming language still in use (after FORTRAN). John McCarthy invented it at MIT in 1958. For many years it could only be run on special-purpose and rather expensive hardware. Until the mid '80s, Lisp was more a family of dialects than a single language. In 1986 an ANSI subcommittee was formed to standardize these dialects into a single Common Lisp. The result, in 1994, was the first object-oriented language to become standardized. Famous book: ANSI Common Lisp by Paul Graham, Prentice Hall 1996.

LIST

Lists are surrounded by parentheses. Anything surrounded by parentheses is a list. Here are some examples of things that are lists:
o (1 2 3 4 5)
o (a b c)
o (cat 77 dog 89)

What if I put parentheses around nothing? What if I put parentheses around another list? In both cases the answer is the same: you still have a list.

Some things appear between the parentheses but are not themselves lists. In our examples so far they are words and numbers, and they are called atoms. Atoms are separated by white space or parentheses. Accordingly, these words and numbers are all atoms:
o 1
o 25
o 342
o mouse
o factorial
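Lisp can answer the "list or atom?" question itself through the standard predicates LISTP and ATOM. The wrapper function below only bundles the checks for this illustration, and its name is our own. Each comment says what the corresponding call tests, and the function returns (T T T T T NIL). One subtlety the notes have not reached yet is that the empty list is also an atom. ( ) is the same object as NIL, so (atom '()) returns T as well.

```lisp
(defun atom-and-list-checks ()
  "Apply the standard predicates LISTP and ATOM to the kinds of objects above."
  (list (listp '(cat 77 dog 89))   ; a list of atoms
        (listp '((a b) c))          ; a list inside a list: still a list
        (listp '())                 ; parentheses around nothing: the empty list
        (atom 'mouse)               ; a symbol is an atom
        (atom 342)                  ; so is a number
        (atom '(a b c))))           ; a non-empty list is not an atom
```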


FORM

A form is meant to be evaluated. A form can be either an atom or a list. The important thing is that the form is meant to be evaluated. A number is an atom, and its value is constant for obvious reasons. Lisp does not store a value for a number; the number is said to be self-evaluating. We now introduce a new term without a complete definition: for now, think of a symbol as an atom that can have a value. If a form is a list, then its first element must be either a symbol or a lambda expression. The symbol must name a function, or one of the special forms and macros (such as SETQ and LET) that we meet later.

In Lisp, the symbols +, -, *, and / name the four common arithmetic operations: addition, subtraction, multiplication, and division. Each of these symbols has an associated function that performs the arithmetic operation. So when Lisp evaluates the form (+ 2 3), it applies the function for addition to the arguments 2 and 3, giving the expected result 5.

FUNCTION

A function is applied to its arguments. When Lisp is given a list to evaluate, it treats the form as a function call. For example:
o (+ 4 9) => 13

We can apply arithmetic operations to numbers. The syntax for doing so is:
o Left parenthesis
o Name of operator
o Various arguments to that operator, each preceded by white space
o Right parenthesis

For example:
o (+ 1 2 3) => 6

NESTED CALCULATIONS

We can nest one function call within another one, for example:
o (+ (* 2 3) (* 4 5 6))

There is an unambiguous rule for evaluating function calls, as follows:
o Process the arguments to the function, in order ("from left to right").
o Evaluate each argument in turn.
o Once all the arguments have been evaluated, call the original function with these values.
o Return the result.

So, to evaluate (+ (* 2 3) (* 4 5 6)):
o We start by noting that we have a call to the function + with arguments (* 2 3) and (* 4 5 6).
o We evaluate the first argument, namely (* 2 3):
  We note that this is itself a function call (the function is * and its arguments are 2 and 3).
  We must therefore evaluate the first argument to *, the number 2, which evaluates to itself.
  We next evaluate the second argument to *, the number 3, which evaluates to itself.
  We can now call the function * with arguments 2 and 3. The result of this function call is 6.
o The value of the first argument to the function + is therefore 6.
o Similarly, the value of the second argument to the function + is 120.
o We can now call the function + with arguments 6 and 120 and finally return the value 126.

Lisp always does the same thing to evaluate a list form:
o Evaluate the arguments, from left to right.
o Get the function associated with the first element.
o Apply the function to the arguments.
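The evaluation rule above is simple enough to write down as a program. Below is a toy evaluator in Common Lisp, sketched for illustration. TINY-EVAL is our own name, and it handles only numbers and calls to named functions, not variables, special forms or macros. A number evaluates to itself. For a list, it evaluates the arguments from left to right and then applies the function named by the first element.

```lisp
(defun tiny-eval (form)
  "Evaluate numbers and calls to named functions by the rule described above."
  (cond ((numberp form) form)                          ; a number evaluates to itself
        ((consp form)
         (let ((args (mapcar #'tiny-eval (rest form)))) ; evaluate arguments, left to right
           ;; get the function named by the first element and apply it
           (apply (symbol-function (first form)) args)))
        (t (error "TINY-EVAL handles only numbers and function calls: ~s" form))))
```

(tiny-eval '(+ (* 2 3) (* 4 5 6))) returns 126, following the same steps traced above.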

Remember that an atom can also be a Lisp form. When given an atom to evaluate, Lisp simply returns its value:
o 17.95 => 17.95

Here are a few more examples:
o (atom 123) => T
o (numberp 123) => T
o (atom :foo) => T

ATOM and NUMBERP are predicates. Predicates return a true or false value. NIL is the only false value in Lisp; everything else is true.

A function can return any number of values, and sometimes we would like a function to return several. For now, let's see what happens when Lisp evaluates a VALUES form:
o (values 1 2 3 :hi "Hello") => 1 2 3 :HI "Hello"
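A common way to receive several values is the standard macro MULTIPLE-VALUE-BIND. In this sketch, DIVIDE is a hypothetical helper that simply returns the two values produced by the built-in FLOOR (quotient and remainder). Where only one value is expected, as in (+ (divide 17 5) 1), Lisp uses just the first value.

```lisp
(defun divide (dividend divisor)
  "Return two values, the quotient and the remainder, using the standard FLOOR."
  (floor dividend divisor))

;; Bind both returned values to variables and print them.
(multiple-value-bind (quotient remainder) (divide 17 5)
  (format t "~&quotient ~a, remainder ~a~%" quotient remainder))
```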

SETQ

SETQ assigns a value to a variable. Afterwards, evaluating the symbol as a form retrieves that variable value.
o (setq his-name "Rahul") => "Rahul"
o his-name => "Rahul"
o (setq a-variable 57) => 57
o a-variable => 57

SETQ's first argument is a symbol, and it is not evaluated. The second argument is evaluated, and its value becomes the variable's value. SETQ returns the value of its last argument.

The SETQ form can actually take any even number of arguments, which should be alternating symbols and values:
o (setq month "July" day 12 year 2005) => 2005
o month => "July"
o day => 12
o year => 2005

SETQ performs the assignments from left to right, and returns the rightmost value.

LET

The LET form looks a little more complicated than what we have seen so far. It uses nested lists, but because it is a special form, only certain elements get evaluated.
o (let ((a 3) (b 4) (c 5)) (* (+ a b) c)) => 35
o a => Error: Unbound variable.

The above LET form defines values for the symbols A, B, and C, then uses these as variables in an arithmetic calculation. In general, LET looks like this:
o (let (bindings) forms)
o Here, bindings is any number of two-element lists (each containing a symbol and a value), and forms is any number of Lisp forms.

The forms are evaluated, in order, using the values established by the bindings. If you define a variable using SETQ and then name the same variable in a LET form, the value defined by LET supersedes the other value during evaluation of the LET:
o (setq a 89) => 89
o a => 89
o (let ((a 3)) (+ a 2)) => 5
o a => 89

Unlike SETQ, which assigns values in left-to-right order, LET binds variables all at the same time:
o (setq w 77) => 77
o (let ((w 8) (x w)) (+ w x)) => 85

LET binds w to 8 and x to w. Because these bindings happen at the same time, w still has its old value of 77 when x is bound, so the sum is 8 + 77 = 85.
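Comparing LET with LET*, the standard variant that binds its variables one at a time, shows the difference between simultaneous and sequential binding. The function below is an illustration, and its name is our own. It returns (85 16). With LET, X sees the outer W of 77. With LET*, X sees the new W of 8.

```lisp
(defun compare-let-forms ()
  "Return the results of LET and LET* with the same bindings, while W is 77 outside."
  (let ((w 77))
    (list (let ((w 8) (x w))   ; X gets the outer W (77): bindings happen together
            (+ w x))
          (let* ((w 8) (x w))  ; X gets the new W (8): bindings happen in sequence
            (+ w x)))))
```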


COND

The COND macro lets us evaluate Lisp forms conditionally. Like LET, COND uses parentheses to delimit different parts of the form. Consider this example:
o (let ((a 1) (b 2) (c 1) (d 1))
    (cond ((eql a b) 1)
          ((eql a c) "First form" 2)
          ((eql a d) 3)))
  => 2

EQL returns T if its two arguments are identical, or the same. Only two of the three tests are executed. The first, (EQL A B), returned NIL, so the rest of that clause (containing the number 1 as its only form) was skipped. The second clause tested (EQL A C), which was true. Because this test returned a non-NIL value, the remainder of the clause (the two atomic forms, "First form" and 2) was evaluated. The value of the last form became the value of the COND, which in turn was returned as the value of the enclosing LET.

The third clause was never tested, since an earlier clause had already been chosen; clauses are tested in order.

Conventionally, the final clause of a COND uses T as its test form. This guarantees that the body forms of the final clause get evaluated if the tests fail in all of the other clauses. You can use the last clause to return a default value or perform some appropriate operation.

Here's an example:
o (let ((a 32))
    (cond ((eql a 13) "An unlucky number")
          ((eql a 99) "A lucky number")
          (t "Nothing special about this number")))
  => "Nothing special about this number"

Sometimes we would like to suppress Lisp's normal evaluation rules. One such case is when we would like a symbol that appears as an argument of a function call to stand for itself, rather than for its value:
o (setq a 97) => 97
o a => 97
o (setq b 23) => 23
o (setq a b) => 23
o a => 23
o (setq a (quote b)) => B
o a => B

The difference is that B's value is used in (SETQ A B), whereas B stands for itself in (SETQ A (QUOTE B)). The QUOTE form is so commonly used that Lisp provides a shorthand notation:
o (QUOTE form) = 'form
The = sign here means that the two Lisp forms are equivalent.
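A single call to LIST makes the contrast between evaluated and quoted symbols easy to see. In this small illustration (QUOTE-DEMO is our own name), LET binds B to 23. The function returns (23 B B): the plain B is evaluated, while 'B and (QUOTE B) are not.

```lisp
(defun quote-demo ()
  (let ((b 23))
    (list b           ; B is evaluated: its value, 23
          'b          ; B stands for itself
          (quote b)))) ; the long form of 'b, also the symbol B
```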

CONS

CONS is the most basic constructor of lists. It is a function, so it evaluates both of its arguments. In the examples here, the second argument is always a list or NIL. (CONS also accepts other values as its second argument, producing a dotted pair, but we will not need that here.)
o (cons 1 nil) => (1)
o (cons 2 (cons 1 nil)) => (2 1)
o (cons 3 (cons 2 (cons 1 nil))) => (3 2 1)

CONS adds a new item to the beginning of a list. The empty list is equivalent to NIL:
( ) = NIL
So we could also have written:
o (cons 1 ( )) => (1)
o (cons 2 (cons 1 ( ))) => (2 1)
o (cons 3 (cons 2 (cons 1 ( )))) => (3 2 1)

NIL is one of two symbols in Lisp that is not a keyword but still has itself as its constant value. T is the other symbol that works like this.

Because NIL evaluates to itself and ( ) = NIL, you can write ( ) rather than (QUOTE ( )). Otherwise, Lisp would have to make an exception to its evaluation rule to handle the empty list.
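CONS always adds to the front. So if you cons the elements of a list one by one onto an accumulator, you build that list in reverse order. Here is a short Common Lisp sketch. REVERSE-BY-CONSING is a name chosen for this example; Common Lisp already provides REVERSE.

```lisp
(defun reverse-by-consing (items)
  "Return a reversed copy of ITEMS, built one CONS at a time."
  (let ((result nil))                 ; start from the empty list, NIL
    (dolist (item items result)       ; RESULT is returned when the loop ends
      (setq result (cons item result)))))
```

(reverse-by-consing '(1 2 3)) returns (3 2 1).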

LIST

As we have noticed, building a list out of nested CONS forms can be a bit tedious. The LIST form does the same thing in a more perspicuous manner:
o (list 1 2 3) => (1 2 3)

LIST can take any number of arguments. Because LIST is a function, it evaluates its arguments:
o (list 1 2 :hello "there" 3) => (1 2 :HELLO "there" 3)
o (let ((a :this) (b :and) (c :that)) (list a 1 b c 2)) => (:THIS 1 :AND :THAT 2)

FIRST AND REST

Think of a list as made up of two parts: the first element and everything else. Then we can retrieve any individual element of a list using the two operations FIRST and REST.
o (setq my-list (quote (1 2 3 4 5))) => (1 2 3 4 5)
o (first my-list) => 1
o (rest my-list) => (2 3 4 5)
o (first (rest my-list)) => 2
o (rest (rest my-list)) => (3 4 5)
o (first (rest (rest my-list))) => 3
o (rest (rest (rest my-list))) => (4 5)
o (first (rest (rest (rest my-list)))) => 4
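The pattern of nested FIRST and REST calls generalizes. To get the element at position n (counting from 0), take REST n times and then FIRST. The recursive sketch below makes that explicit. NTH-ELEMENT is our own name; the standard function NTH does the same job.

```lisp
(defun nth-element (n items)
  "Element N (counting from 0) of ITEMS, using only FIRST and REST."
  (if (zerop n)
      (first items)                         ; position 0 is the first element
      (nth-element (- n 1) (rest items))))  ; otherwise drop one element and recurse
```

(nth-element 3 my-list) gives 4, the same as (first (rest (rest (rest my-list)))).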

NAMING AND IDENTITY
A symbol is just a name
o It can stand for itself.
o This makes it easy to write certain kinds of programs in Lisp.
o For example, if we want our program to represent relationships in a family tree, we can make a database that keeps relationships like this:
(father Ram Arun) (son Ram Dev) (father Ram Sangita) (mother Lakshmi Arun) (mother Lakshmi Sangita)
o Each relationship is a list.
o (father Ram Arun) means that Ram is Arun's father.
o Every element of every list in our database is a symbol.
o Our Lisp program can compare symbols in this database to determine, for example, that Dev is Arun's grandfather.
o If we try to write a program like this in another language (a language without symbols), we have to decide how to represent the names of family members and relationships, and then write code to perform all the needed operations: reading, printing, comparison, assignment, etc.
o This is all built into Lisp, because symbols are a data type distinct from the objects they might be used to name.
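The grandfather deduction described above needs nothing but symbol comparisons. This is a minimal sketch. The names *FAMILY*, FATHER-OF and GRANDFATHER-OF are hypothetical. Reading (SON X Y) as "X is Y's son" is an assumption, chosen to follow the pattern of (FATHER X Y).

```lisp
;; The family-tree facts from the notes; the reader upcases RAM, ARUN, etc.
(defparameter *family*
  '((father ram arun) (son ram dev) (father ram sangita)
    (mother lakshmi arun) (mother lakshmi sangita)))

;; Hypothetical helper: find CHILD's father by comparing symbols with EQ.
;; (FATHER X Y) says X is Y's father; (SON X Y) says Y is X's father.
(defun father-of (child)
  (dolist (fact *family*)
    (destructuring-bind (relation a b) fact
      (cond ((and (eq relation 'father) (eq b child)) (return a))
            ((and (eq relation 'son) (eq a child)) (return b))))))

(defun grandfather-of (person)
  (let ((father (father-of person)))
    (and father (father-of father))))

(grandfather-of 'arun)   ; => DEV
```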

A symbol is always unique
o Every time our program uses a symbol, that symbol is identical to every other symbol with the same name. We can use the EQ test to compare symbols:
(eq 'a 'a) => T
(eq 'david 'a) => NIL
(eq 'Lakshmi 'LAKSHMI) => T
(setq zzz 'sleeper) => SLEEPER
(eq zzz 'sleeper) => T
o Notice that it does not matter whether we use uppercase or lowercase letters in our symbol names.
o Internally, Lisp translates every alphabetic character in a symbol name to a common case, usually upper.

A symbol can name a value

o Although the ability for a Lisp symbol to stand for itself is sometimes useful, a more common use is for the symbol to name a value.

o This is the role played by variable and function names in other programming languages.


o A Lisp symbol most commonly names a value or – when used as the first element of a function call form – a function.

o What's unusual about Lisp is that a symbol can have a value as a function and a variable at the same time:

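(setq first 'number-one) => NUMBER-ONE
(first (list 3 2 1)) => 3
first => NUMBER-ONE
o Note how FIRST is used as a variable in the first and last cases, and as a function (predefined by Lisp, in this example) in the second case.
o (In ANSI Common Lisp, giving a variable value to a symbol of the COMMON-LISP package such as FIRST has undefined consequences, and some implementations signal an error, so this example only illustrates the idea.)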
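Assigning to FIRST is not portable, so here is a minimal sketch of the same idea using a symbol of our own, SQUARE (a hypothetical name). DEFPARAMETER gives it a variable value and DEFUN gives it a function value. The *...* naming convention is broken on purpose, so that one symbol carries both meanings. Note that DEFPARAMETER also makes SQUARE globally special.

```lisp
;; SQUARE is a hypothetical symbol; it deliberately lacks the *...* earmuffs
;; so that the same name serves as both a variable and a function.
(defparameter square 'number-two
  "Documentation for SQUARE as a variable.")

(defun square (x)
  "Documentation for SQUARE as a function."
  (* x x))

square                              ; => NUMBER-TWO  (variable value)
(square 4)                          ; => 16          (function value)
(symbol-value 'square)              ; => NUMBER-TWO
;; Each meaning has its own documentation, although implementations
;; are permitted to discard documentation strings.
(documentation 'square 'function)
```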

o Lisp decides which of these values to use based on where the symbol appears.
o When the evaluation rule requires a value, Lisp looks for the variable value of the symbol.
o When a function is called for, Lisp looks for the symbol's function.
o A symbol can have other values besides those it has as a variable or function.
o A symbol can also have values for its documentation, property list, and print name.
o A symbol's documentation is text that we create to describe a symbol.
o We can create it with SETF of the DOCUMENTATION form, or as part of certain forms (such as DEFUN and DEFVAR) which define a symbol's value.
o Because a symbol can have multiple meanings, we can assign documentation to each of several meanings, for example as a function and as a variable.

A value can have more than one name

o A value can have more than one name.
o That is, more than one symbol can share a value.
o Other languages have pointers that work this way.
o Lisp does not expose pointers to the programmer, but does have shared objects.
o An object is considered identical when it passes the EQ test. Consider the following:
(setq L1 (list 'a 'b 'c)) => (A B C)
(setq L2 L1) => (A B C)
(eq L1 L2) => T
(setq L3 (list 'a 'b 'c)) => (A B C)
(eq L3 L1) => NIL
o Here, L1 is EQ to L2 because L1 names the same value as L2.
o In other words, the value created by the (LIST 'A 'B 'C) form has two names, L1 and L2.
o The (SETQ L2 L1) form says, "Make the value of L2 be the value of L1." Not a copy of the value, but the value itself.
o So L1 and L2 share the same value: the list (A B C), which was first assigned as the value of L1.
o L3 also has a list (A B C) as its value, but it is a different list from the one shared by L1 and L2.
o Even though the value of L3 looks the same as the value of L1 and L2, it is a different list because it was created by a different LIST form.
o So (EQ L3 L1) returns NIL because their values are different lists, each made of the symbols A, B, and C.
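The difference between sharing and copying becomes visible when the shared list is modified. A change made through one name is seen through the other, but not through a separately built list. This is a minimal sketch. *L1*, *L2* and *L3* are hypothetical stand-ins for L1, L2 and L3 above (DEFPARAMETER avoids undefined-variable warnings), and EQUAL is the standard structural comparison.

```lisp
(defparameter *l1* (list 'a 'b 'c))
(defparameter *l2* *l1*)              ; a second name for the same list object
(defparameter *l3* (list 'a 'b 'c))   ; a different list with the same contents

(eq *l1* *l2*)      ; => T    same object
(eq *l3* *l1*)      ; => NIL  different objects
(equal *l3* *l1*)   ; => T    EQUAL compares structure, not identity

;; Modify the shared list through one of its names.
(setf (first *l1*) 'z)
*l2*                ; => (Z B C)  the change is visible through *L2*
*l3*                ; => (A B C)  the separate list is unaffected
```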

ESSENTIAL FUNCTION DEFINITIONS
DEFUN
o DEFUN defines named functions.
o We can define a named function using the DEFUN form:
(defun secret-number (the-number)
  (let ((the-secret 37))
    (cond ((= the-number the-secret) 'that-is-the-secret-number)
          ((< the-number the-secret) 'too-low)
          ((> the-number the-secret) 'too-high))))
=> SECRET-NUMBER
o The DEFUN form has three arguments:
  The name of the function: SECRET-NUMBER.
  A list of argument names: (THE-NUMBER), which will be bound to the function's parameters when it is called.
  The body of the function: (LET ...).
o (secret-number 11) => TOO-LOW
o (secret-number 99) => TOO-HIGH
o (secret-number 37) => THAT-IS-THE-SECRET-NUMBER
o Of course, we can define a function of more than one argument:
(defun my-calculation (a b c x) (+ (* a (* x x)) (* b x) c)) => MY-CALCULATION
(my-calculation 3 2 7 5) => 92

LAMBDA

o LAMBDA defines anonymous functions.
o Sometimes the function we need is so trivial or so obvious that we don't want to have to invent a name or worry about whether the name might be in use somewhere else.
o For situations like this, Lisp lets us create an unnamed, or anonymous, function using the LAMBDA form. A LAMBDA form looks like a DEFUN form without the name:
(lambda (a b c x) (+ (* a (* x x)) (* b x) c))
o In ANSI Common Lisp, evaluating a LAMBDA form on its own does not call anything. LAMBDA is a macro that expands into (FUNCTION (LAMBDA ...)), usually written #'(lambda ...), so the result is a function object. To call the function directly, the LAMBDA form goes where Lisp expects to find a function, normally as the first element of a form:
(lambda (a b c x) (+ (* a (* x x)) (* b x) c)) => #<FUNCTION ...> (a function object; the printed form varies by implementation)
((lambda (a b c x) (+ (* a (* x x)) (* b x) c)) 3 2 7 5) => 92
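Because a LAMBDA form evaluates to a function object, that object can be stored and passed around like any other value. This is a minimal sketch using the standard FUNCALL and MAPCAR functions; the variable name *QUADRATIC* is hypothetical.

```lisp
;; Store the anonymous function from the notes in a variable.
(defparameter *quadratic*
  (lambda (a b c x) (+ (* a (* x x)) (* b x) c)))

;; FUNCALL calls a function object with the given arguments.
(funcall *quadratic* 3 2 7 5)              ; => 92

;; Anonymous functions are handy as arguments to other functions.
(mapcar (lambda (x) (* x x)) '(1 2 3 4))   ; => (1 4 9 16)
```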

DEFMACRO
o DEFMACRO defines named macros.
o The macro body returns a form to be evaluated. In other words, we need to write the body of the macro so that it returns a form, not a value.
o Here is a simple macro that illustrates most of what we need to know:
(defmacro setq-literal (place literal)
  `(setq ,place ',literal))
=> SETQ-LITERAL
(setq-literal a b) => B
a => B
o SETQ-LITERAL works like SETQ, except that neither argument is evaluated.
o So in our call to (SETQ-LITERAL A B) above, here's what happens:
  Bind PLACE to the symbol A.
  Bind LITERAL to the symbol B.
  Evaluate the body `(SETQ ,PLACE ',LITERAL), following these steps:
    Evaluate PLACE to get the symbol A.
    Evaluate LITERAL to get the symbol B.
    Return the form (SETQ A 'B).
  Evaluate the form (SETQ A 'B).
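MACROEXPAND-1 shows the form a macro call turns into, which is a convenient way to check the steps above. This is a minimal sketch. *TARGET* is a hypothetical variable, introduced so that the assignment does not touch an undeclared variable.

```lisp
(defmacro setq-literal (place literal)
  ;; Build the form (SETQ place 'literal); the arguments themselves are not evaluated.
  `(setq ,place ',literal))

(macroexpand-1 '(setq-literal a b))   ; => (SETQ A 'B), i.e. (SETQ A (QUOTE B))

(defparameter *target* nil)
(setq-literal *target* hello)         ; HELLO is used as a literal symbol
*target*                              ; => HELLO
```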

o Most forms create, and return, only one value.
o Lisp has only a small number of forms which create or receive multiple values.

VALUES
o VALUES creates multiple (or no) values.
o The VALUES form creates zero or more values:
(values :this) => :THIS
(values :this :that) => :THIS, :THAT
(values) => no values at all
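Multiple values are received with forms such as MULTIPLE-VALUE-BIND and MULTIPLE-VALUE-LIST. This is a minimal sketch using the built-in FLOOR, which returns a quotient and a remainder; the function name QUOTIENT-AND-REMAINDER is hypothetical.

```lisp
;; FLOOR returns two values: the quotient and the remainder.
(defun quotient-and-remainder (n d)
  (multiple-value-bind (q r) (floor n d)
    (list q r)))

(quotient-and-remainder 17 5)                ; => (3 2)
(multiple-value-list (values :this :that))   ; => (:THIS :THAT)
(multiple-value-list (values))               ; => NIL, no values at all
;; Where only one value is expected, the extra values are silently dropped.
(+ (floor 17 5) 1)                           ; => 4
```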

DATA TYPES
Lisp almost always does the right thing with numbers:
o (/ 1 3) => 1/3
o (float (/ 1 3)) => 0.33333334
o (float (/ 1 3) 1d0) => 0.3333333333333333d0
(FLOAT converts to a single-float unless a prototype such as 1d0 asks for a double-float.)

Characters give Lisp something to read and write. Basic Lisp I/O uses characters. The READ and WRITE functions turn characters into Lisp objects and vice versa. READ-CHAR and WRITE-CHAR read and write single characters:
o (read) with input a => A
o (read) with input #\a => #\a
o (read-char) with input a => #\a
o (write 'a) prints A and returns A
o (write #\a) prints #\a and returns #\a
o (write-char #\a) prints a and returns #\a
o (write-char 'a) signals an error, because the symbol A is not a character

Notice that a newline terminates READ input. This is because READ collects characters trying to form a complete Lisp expression. In the example, READ collects a symbol, which is terminated by the newline. The symbol could also have been terminated by a space, a parenthesis or any other character that can't be part of a symbol. In contrast, READ-CHAR reads exactly one character from the input. As soon as that character is consumed, READ-CHAR completes executing and returns the character.
Lisp represents a single character using the notation #\char, where char is a literal character or the name of the character:

Character          Lisp
Space              #\Space
Newline            #\Newline
Backspace          #\Backspace
Tab                #\Tab
Linefeed           #\Linefeed
Formfeed           #\Page
Carriage return    #\Return
Rubout or DEL      #\Rubout
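The READ versus READ-CHAR distinction can be tried without typing at the console by reading from a string stream. This is a minimal sketch using the standard WITH-INPUT-FROM-STRING, NAME-CHAR and READ-FROM-STRING; the helper name READ-VS-READ-CHAR is hypothetical.

```lisp
;; Read the same text twice: once as a Lisp object, once as a single character.
(defun read-vs-read-char (text)
  (list (with-input-from-string (in text) (read in))         ; a whole symbol, upcased
        (with-input-from-string (in text) (read-char in))))  ; just the first character

(read-vs-read-char "abc def")    ; => (ABC #\a)

;; Character names from the table map to character objects.
(name-char "Space")              ; => #\Space
(read-from-string "#\\Newline")  ; => #\Newline
```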

Arrays
o If we need to organize data in tables of two, three, or more dimensions, we can create an array. (:INITIAL-ELEMENT NIL guarantees that every element starts out as NIL; reading an element that was never initialized has undefined consequences.)
(setq a1 (make-array '(3 4) :initial-element nil))
=> #2A((NIL NIL NIL NIL) (NIL NIL NIL NIL) (NIL NIL NIL NIL))
(setf (aref a1 0 0) (list 'element 0 0)) => (ELEMENT 0 0)
(setf (aref a1 1 0) (list 'element 1 0)) => (ELEMENT 1 0)
(setf (aref a1 2 0) (list 'element 2 0)) => (ELEMENT 2 0)
a1 => #2A(((ELEMENT 0 0) NIL NIL NIL) ((ELEMENT 1 0) NIL NIL NIL) ((ELEMENT 2 0) NIL NIL NIL))
(aref a1 0 0) => (ELEMENT 0 0)
(setf (aref a1 0 1) pi) => 3.141592653589793
(setf (aref a1 0 2) "hello") => "hello"
(aref a1 0 2) => "hello"
o An array's rank is the same as its number of dimensions.
o We created a rank-2 array in the above example.
o Lisp prints an array using the notation #rankA(...).
o The contents of the array appear as nested lists, with the first dimension appearing as the outermost grouping, and the last dimension appearing as the elements of the innermost grouping.

o To retrieve an element of an array, use AREF. AREF's first argument is the array; the remaining arguments specify the index along each dimension.
o The number of indices must match the rank of the array.
o Vectors are one-dimensional arrays.
o We can create a vector using MAKE-ARRAY, and access its elements using AREF:
o (setq v1 (make-array '(3) :initial-element nil)) => #(NIL NIL NIL)
o (make-array 3 :initial-element nil) => #(NIL NIL NIL)
o (setf (aref v1 0) :zero) => :ZERO
o (setf (aref v1 1) :one) => :ONE
o (aref v1 0) => :ZERO
o v1 => #(:ZERO :ONE NIL)
o Lisp prints vectors using the slightly abbreviated form #(...), rather than #1A(...).
o We can use either a single-element list or a number to specify the vector dimensions to MAKE-ARRAY; the effect is the same.
o We can create a vector from a list of values, using the VECTOR form:
(vector 34 22 30) => #(34 22 30)
o This is similar to the LIST form, except that the result is a vector instead of a list.
o We can use AREF to access the elements of a vector, or we can use the sequence-specific function ELT:
(setf v2 (vector 34 22 30 99 66 77)) => #(34 22 30 99 66 77)
(setf (elt v2 3) :radio) => :RADIO
v2 => #(34 22 30 :RADIO 66 77)

Strings

o Strings are vectors that contain only characters.
o We already know how to write a string using the "..." syntax.
o Since a string is a vector, we can apply the array and vector functions to access elements of a string.
o We can also create strings using the MAKE-STRING function, or change characters or symbols to strings using the STRING function.
o Modifying a literal string such as "hello, there." has undefined consequences, so we copy it with COPY-SEQ first:
o (setq s1 (copy-seq "hello, there.")) => "hello, there."
o (setf (elt s1 0) #\H) => #\H
o (setf (elt s1 12) #\!) => #\!
o s1 => "Hello, there!"
o (string 'a-symbol) => "A-SYMBOL"
o (string #\G) => "G"
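This is a short sketch pulling the array, vector and string operations together. The variable names are hypothetical; ARRAY-RANK, ARRAY-DIMENSIONS, CHAR and COPY-SEQ are standard Common Lisp functions.

```lisp
;; A 3x4 table whose elements are guaranteed to start out as NIL.
(defparameter *grid* (make-array '(3 4) :initial-element nil))
(setf (aref *grid* 1 2) :x)          ; row 1, column 2

(array-rank *grid*)                  ; => 2
(array-dimensions *grid*)            ; => (3 4)

;; Vectors are one-dimensional arrays; ELT and AREF both work on them.
(defparameter *v* (vector 34 22 30))
(elt *v* 1)                          ; => 22

;; Strings are character vectors; copy a literal before modifying it.
(defparameter *greeting* (copy-seq "hello, there."))
(setf (char *greeting* 0) #\H
      (char *greeting* 12) #\!)
*greeting*                           ; => "Hello, there!"
```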

Symbols
o Symbols are unique, but they have many values.
o We know that a symbol has a unique identity.
o A symbol is identical to any other symbol spelled the same way.
o We also know that a symbol can have values as a variable and a function, and for documentation, print name, and properties.
o A symbol's property list is like a miniature database which associates a number of key/value pairs with the symbol.
o For example, if our program represented and manipulated objects, we could store information about an object on its property list:
(setf (get 'object-1 'color) 'red) => RED
(setf (get 'object-1 'size) 'large) => LARGE
(setf (get 'object-1 'shape) 'round) => ROUND
(setf (get 'object-1 'position) '(on table)) => (ON TABLE)
(setf (get 'object-1 'weight) 15) => 15
(symbol-plist 'object-1) => (WEIGHT 15 POSITION (ON TABLE) SHAPE ROUND SIZE LARGE COLOR RED)
(the order of the pairs may vary between implementations)
(get 'object-1 'color) => RED
object-1 => Error: the variable OBJECT-1 is unbound
o Note that OBJECT-1 doesn't have a value -- all of the useful information is in two places: the identity of the symbol, and the symbol's properties.
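Here are a few more property-list operations, as a minimal sketch. GET accepts an optional default that it returns when the property is absent, and REMPROP removes a property. All functions used are standard; the HEIGHT property is a hypothetical example of a missing key.

```lisp
(setf (get 'object-1 'color) 'red)
(setf (get 'object-1 'size) 'large)
(get 'object-1 'color)            ; => RED
(get 'object-1 'height 'unknown)  ; => UNKNOWN  (default: no HEIGHT property)
(remprop 'object-1 'color)        ; removes the COLOR property
(get 'object-1 'color)            ; => NIL
(boundp 'object-1)                ; => NIL  the symbol still has no variable value
```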

o The same information can be stored more conveniently using structures.

Structures

o Structures let us store related data.
o A Lisp structure gives us a way to create an object which stores related data in named slots.
o (defstruct struct-1 color size shape position weight) => STRUCT-1
o (setq object-2 (make-struct-1 :size 'small :color 'green :weight 10 :shape 'square))
=> #S(STRUCT-1 :COLOR GREEN :SIZE SMALL :SHAPE SQUARE :POSITION NIL :WEIGHT 10)
o (struct-1-shape object-2) => SQUARE
o (struct-1-position object-2) => NIL
o (setf (struct-1-position object-2) '(under table)) => (UNDER TABLE)
o (struct-1-position object-2) => (UNDER TABLE)
o In the example, we defined a structure type named STRUCT-1 with slots named COLOR, SIZE, SHAPE, POSITION, and WEIGHT.
o Then we created an instance of the STRUCT-1 type, and assigned the instance to the variable OBJECT-2.
o The rest of the example shows how to access slots of a structure instance using accessor functions named for the structure type and the slot name.
o Example:
(defstruct point x y z) => POINT
(defun distance-from-origin (point)
  (let* ((x (point-x point))
         (y (point-y point))
         (z (point-z point)))
    (sqrt (+ (* x x) (* y y) (* z z)))))
=> DISTANCE-FROM-ORIGIN
(defun reflect-in-y-axis (point)
  (setf (point-y point) (- (point-y point))))
=> REFLECT-IN-Y-AXIS
(setf my-point (make-point :x 3 :y 4 :z 12)) => #S(POINT :X 3 :Y 4 :Z 12)
(type-of my-point) => POINT
(distance-from-origin my-point) => 13.0
(reflect-in-y-axis my-point) => -4
my-point => #S(POINT :X 3 :Y -4 :Z 12)
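DEFSTRUCT defines more than the constructor and accessors. It also defines a type predicate and a copier, and slots can be given default values. This is a minimal sketch; the structure BALL and its slots are hypothetical.

```lisp
;; Hypothetical structure; each slot spec is (name default-value).
(defstruct ball
  (color 'red)   ; used when :COLOR is not supplied to MAKE-BALL
  (radius 1))

(defparameter *b1* (make-ball :radius 3))   ; COLOR takes its default, RED
(defparameter *b2* (copy-ball *b1*))        ; DEFSTRUCT also defines COPY-BALL
(setf (ball-color *b2*) 'blue)              ; changes the copy only

(ball-p *b1*)        ; => T    DEFSTRUCT also defines the predicate BALL-P
(ball-color *b1*)    ; => RED
(ball-color *b2*)    ; => BLUE
```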

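Type Information
o Type information is apparent at runtime.
o A symbol can be associated with any type of value at runtime. For cases where it matters, Lisp lets us query the type of a value:
o (type-of 123) => FIXNUM
o (type-of 123456789000) => BIGNUM
o (type-of "hello, world") => (SIMPLE-BASE-STRING 12)
o (type-of 'toolbar) => SYMBOL
o (type-of '(a b c)) => CONS
o TYPE-OF returns a symbol or a list indicating the type of its argument. The exact answer is implementation-dependent: for example, 123456789000 is still a FIXNUM on typical 64-bit implementations, and a string may be reported as (SIMPLE-ARRAY CHARACTER (12)).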
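Because the exact output of TYPE-OF varies, portable code usually asks a yes/no question with TYPEP or dispatches with TYPECASE. This is a minimal sketch; DESCRIBE-TYPE is a hypothetical name.

```lisp
;; Dispatch on the type of X; clauses are tried in order, so the
;; catch-all T clause must come last.
(defun describe-type (x)
  (typecase x
    (integer :integer)   ; fixnums and bignums alike
    (string  :string)
    (symbol  :symbol)    ; note that NIL is a symbol (and also a list)
    (cons    :cons)
    (t       :other)))

(describe-type 123456789000)    ; => :INTEGER
(describe-type '(a b c))        ; => :CONS
(typep "hello, world" 'string)  ; => T
```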

ESSENTIAL INPUT AND OUTPUT
Read
o READ accepts Lisp data.
o READ turns characters into Lisp data.
o So far, we've seen a printed representation of several kinds of Lisp data:
  symbols and numbers;
  strings, characters, lists, arrays, vectors, and structures.
o The Lisp reader does its job according to a classification of characters.
o The standard classifications are shown below:

Standard constituent characters:
  a b c d e f g h i j k l m n o p q r s t u v w x y z
  A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
  0 1 2 3 4 5 6 7 8 9
  ! $ % & * + - . / : < = > ? @ [ ] ^ _ { } ~ <backspace> <rubout>
Standard terminating macro characters: " ' ( ) , ; `
Standard non-terminating macro characters: #
Standard single escape characters: \
Standard multiple escape characters: |
Standard whitespace characters: <tab> <space> <page> <newline> <return> <linefeed>

o If READ starts with a constituent character, it begins accumulating a symbol or number.
o When READ encounters a terminating macro character or a whitespace character, it tries to interpret the collected constituent characters first as a number, then as a symbol. If a numeric interpretation is possible, READ returns the number.
o Otherwise, READ changes the alphabetical characters to a standard case (normally upper case), interns the name as a symbol, and returns the symbol.
o Escape characters play a special role.
o A single escape character forces the following character to be treated exactly as a constituent character.
o In this way, characters that are normally treated as whitespace or terminating macro characters can be part of a symbol.
o If READ encounters an escape character, it never attempts to interpret the resulting constituents as a number, even if only digits were escaped.
o If READ starts with a macro character, the character determines the next step:
  " – Read a string.
  ' – Read a form and return it wrapped in (QUOTE ...).
  ( – Read a list.
  ; – Ignore everything up to the end of the line.
  # – Decide what to do based on the next character.
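READ-FROM-STRING runs the same reader on the characters of a string, which makes the rules above easy to try out. This is a minimal sketch; CLASSIFY-TOKEN is a hypothetical helper. Inside a Lisp string, "\\" stands for a single backslash, so "\\123" is the text \123.

```lisp
;; Read one object from TEXT and report whether the reader produced
;; a number, a symbol (with its exact name), or something else.
(defun classify-token (text)
  (let ((object (read-from-string text)))
    (typecase object
      (number (list :number object))
      (symbol (list :symbol (symbol-name object)))
      (t      (list :other object)))))

(classify-token "123")     ; => (:NUMBER 123)     digits read as a number
(classify-token "abc")     ; => (:SYMBOL "ABC")   case converted to upper
(classify-token "\\123")   ; => (:SYMBOL "123")   an escape prevents a number
(classify-token "|a b|")   ; => (:SYMBOL "a b")   multiple escape keeps space and case
(classify-token "'x")      ; => (:OTHER (QUOTE X))
```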

Print
o PRINT writes Lisp data for us and for READ.
o The PRINT function changes a Lisp object into the sequence of characters that READ would need to reconstruct it:
(print 'abc) prints ABC and returns ABC
(print (list 1 2 3)) prints (1 2 3) and returns (1 2 3)
(print "A String") prints "A String" and returns "A String"
(print 387.9532) prints 387.9532 and returns 387.9532
o PRINT always begins its output with a newline character (#\Newline), and follows its output with a space (#\Space).
o This ensures that the PRINT output stands apart from any surrounding output, since newline and space are both treated as whitespace, and cannot be part of the printed representation of a Lisp object (unless escaped).
o Other variations of PRINT have different uses. PRIN1 behaves as PRINT, but does not surround its output with whitespace.
o This might be useful if we are building up a name from successive pieces, for example. PRINC behaves as PRIN1, but generates output intended for display rather than for READ; for example, PRINC omits the quotes around a string, and does not print escape characters.
o (print 'a\ bc) prints |A BC|
o (prin1 'a\ bc) prints |A BC|
o (princ '|A BC|) prints A BC
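PRIN1-TO-STRING and PRINC-TO-STRING return, as a string, exactly what PRIN1 and PRINC would print, so the difference can be checked directly. This is a minimal sketch using standard functions only.

```lisp
;; PRIN1 output is meant for READ: strings keep their quotes, odd symbols get |...|.
(prin1-to-string "A String")   ; => "\"A String\""
(prin1-to-string '|A BC|)      ; => "|A BC|"

;; PRINC output is meant for people: no quotes, no escapes.
(princ-to-string "A String")   ; => "A String"
(princ-to-string '|A BC|)      ; => "A BC"

;; Round trip: what PRIN1 writes, READ can read back as an EQUAL object.
(read-from-string (prin1-to-string '(1 "two" |A BC|)))   ; => (1 "two" |A BC|)
```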

If
o IF allows the execution of a form to be dependent on a single test-form.
o First the test-form is evaluated. If the result is true, then-form is selected; otherwise else-form is selected. Whichever form is selected is then evaluated.
o Examples:
(if t 5) => 5
(if nil 1 2) => 2
o The syntax for the operator IF is (if predicate then-form [else-form]), where:
  predicate is always evaluated;
  then-form says what to do if the predicate is true, i.e. if the predicate does not evaluate to NIL (this argument is not evaluated if the predicate is false);
  else-form says what to do if the predicate is false, i.e. if the predicate evaluates to NIL (this argument is not evaluated if the predicate is true).


o Suppose that we have been asked to write a function which prompts the user for a value, reads it in and prints the result. A session might look like this:
(fahrenheit-to-celsius)
Please give a value in degrees F: 32
32 degrees F is 0.0 degrees C.
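The notes show only the interaction, not the function. Here is a minimal sketch of one possible definition, assuming the usual conversion C = (F - 32) * 5/9. Only the name and the messages come from the session above; the body is our own.

```lisp
;; Prompt for a Fahrenheit value, convert it, print a sentence, return the result.
(defun fahrenheit-to-celsius ()
  (format t "~&Please give a value in degrees F: ")
  (finish-output)                    ; make sure the prompt appears before READ waits
  (let* ((fahrenheit (read))
         ;; Exact rational arithmetic first, then FLOAT for a value like 0.0.
         (celsius (float (* (- fahrenheit 32) 5/9))))
    (format t "~a degrees F is ~a degrees C.~%" fahrenheit celsius)
    celsius))
```

Reading with READ means the user can type any number (integer, ratio or float); a more careful version would check the input with NUMBERP before doing arithmetic.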

Format
o FORMAT is a function for generating formatted output. Its first argument is a destination. Specify:
  NIL for output to a string (like sprintf in C), which FORMAT generates and returns;
  T for output to the listener (like printf in C), in which case FORMAT returns NIL.
o The next argument is known as the format string. In the same way that printf handles specially any occurrence of the character %, FORMAT handles specially any occurrence of the character ~ (pronounced tilde). In particular,
  ~& means: output a fresh line (i.e. if we weren't already at the start of a line, output a newline).
  ~a means: take the next of the arguments to FORMAT and insert its printed representation here.
o Example: (let ((name "Alok")) (format nil "Hello, ~a." name)) => "Hello, Alok."

LET AND LET*
LET and LET* create new variable bindings and execute a series of forms that use these bindings. LET performs the bindings in parallel and LET* does them sequentially.
The form (let ((var1 init-form-1) (var2 init-form-2) ...) body-form*) first evaluates the expressions init-form-1, init-form-2, and so on, in that order, saving the resulting values. Then all of the variables varj are bound to the corresponding values; each binding is lexical unless there is a special declaration to the contrary. The body forms are then evaluated in order; the values of all but the last are discarded.
LET* is similar to LET, but the bindings of variables are performed sequentially rather than in parallel. The init-form for a var can refer to vars previously bound in the LET*. The form (let* ((var1 init-form-1) (var2 init-form-2) ...) body-form*) first evaluates init-form-1, then binds var1 to that value; then it evaluates init-form-2 and binds var2, and so on. The body forms are then evaluated in order; the values of all but the last are discarded.
For both LET and LET*, if there is no init-form associated with a var, var is initialized to NIL.
The special form LET has the property that the scope of each name binding does not include any initial value form. For LET*, a variable's scope also includes the remaining initial value forms for subsequent variable bindings.
Examples:
o (setq a 'big) => BIG
o (defun dummy-function () a) => DUMMY-FUNCTION
o (let ((a 'small) (b a)) (format nil "~S ~S ~S" a b (dummy-function))) => "SMALL BIG BIG"
o (let* ((a 'small) (b a)) (format nil "~S ~S ~S" a b (dummy-function))) => "SMALL SMALL BIG"
(DUMMY-FUNCTION sees the global value BIG in both cases, because these bindings of A are lexical and do not reach code defined elsewhere.)
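Swapping two values is a compact way to see the difference between parallel and sequential binding. This is a minimal sketch; the function names are hypothetical.

```lisp
;; LET evaluates both init forms before binding anything,
;; so each init form sees the OLD values of X and Y.
(defun swap-with-let (x y)
  (let ((x y)
        (y x))
    (list x y)))

;; LET* binds X first, so the init form for Y already sees the NEW X.
(defun swap-with-let* (x y)
  (let* ((x y)
         (y x))
    (list x y)))

(swap-with-let 1 2)    ; => (2 1)
(swap-with-let* 1 2)   ; => (2 2)
```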


FURTHER LOGIC
And
o Takes any number of arguments.
o Evaluates them in order until one returns NIL, at which point AND stops evaluating and returns NIL.
o If every argument returns non-NIL, then AND returns the value of the last one.
o Examples:
(and 1 2 3 4 5) => 5
(and 1 2 nil 4 5) => NIL
Or
o Takes any number of arguments.
o Evaluates them in order until one returns a non-NIL value, at which point OR stops evaluating and returns that value.
o If every argument returns NIL, then OR returns NIL.
o Examples:
(or 1 2 3 4 5) => 1
(or nil nil nil 4 nil 5 6 7) => 4
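Because AND and OR stop as soon as the answer is known, they are often used for control as well as for logic. This is a minimal sketch; FIRST-OR-DEFAULT is a hypothetical helper.

```lisp
;; AND only calls FIRST when X is a cons, so non-lists are safe.
;; OR then falls back to DEFAULT when the AND form yields NIL.
(defun first-or-default (x default)
  (or (and (consp x) (first x))
      default))

(first-or-default '(a b) 'none)   ; => A
(first-or-default 42 'none)       ; => NONE
```

One caveat of this idiom: a list whose first element is NIL also yields the default.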

Setf
o As in the case of vectors and arrays, we can change the value of a variable using the SETF macro:
o (defun look-at-setf (thing)
    (format t "~&Value supplied was ~a" thing)
    (setf thing 99)
    (format t "~&Value has been changed to ~a" thing)
    thing)
  => LOOK-AT-SETF
o (look-at-setf 55)
  Value supplied was 55
  Value has been changed to 99
  => 99

TOP LOOP
We interact with the Lisp system through a built-in piece of code called the toploop, which repeats three simple steps for as long as we run the Lisp system:
o Read an expression (we provide the expression).
o Evaluate the expression just read.
o Print the result(s) of the evaluation.
This is also called the "read-eval-print" loop. The toploop also provides a minimal user interface -- a prompt to indicate that it's ready to read a new expression -- and a way to gracefully catch any errors we might make.
If we were to write the Lisp code for a toploop, it would look something like this:
o (loop (terpri) (princ 'ready>) (print (eval (read))))
  (terpri) starts a new line of output.
  (loop ...) executes its forms in order, then repeats.
  (eval ...) returns the result of evaluating a form.
The system's prompt has been replaced with READY>. Every valid Lisp form we type will be read, evaluated, and printed by our toploop.
Example:
o READY> (cons 1 (cons 2 (cons 3 nil)))
  (1 2 3)
o READY>
We can get out of this prompt using 'abort' (for example, through the abort restart offered by the debugger). In Lisp, the debugger is accessed via a "break loop". This behaves just like a toploop, but accepts additional commands to inspect or alter the state of the "broken" computation.
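The toploop above reads from the keyboard forever. Here is a minimal, testable variant. It assumes that input can come from any stream and that end of file should stop the loop; the EOF handling and the name MINI-TOPLOOP are our additions, not part of the notes.

```lisp
;; Read forms from STREAM, evaluate and print each one, and stop at end of file.
;; STREAM itself serves as the EOF marker, since no form read from the
;; stream can be EQ to the stream object.
(defun mini-toploop (stream)
  (loop for form = (read stream nil stream)   ; next form, or STREAM at EOF
        until (eq form stream)                ; stop at end of input
        collect (print (eval form))))         ; evaluate, print, keep the result

(with-input-from-string (in "(cons 1 (cons 2 (cons 3 nil))) (+ 1 2)")
  (mini-toploop in))
;; prints (1 2 3) and 3, and returns ((1 2 3) 3)
```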

FUNCTION THAT TAKES ONE OR MORE OPTIONAL ARGUMENTS
If we want to make a function that takes one or more optional arguments, we use the &OPTIONAL keyword followed by a list of parameter names, like this:
o (defun silly-list (p1 p2 &optional p3 p4) (list p1 p2 p3 p4)) => SILLY-LIST
o (silly-list 'f 'b) => (F B NIL NIL)
o (silly-list 'f 'b 'ba) => (F B BA NIL)
o (silly-list 'f 'b 'ba 'hi) => (F B BA HI)
The optional parameters default to NIL when the call does not supply a value.
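An optional parameter can also be written as a list. The list gives a default value other than NIL and, optionally, a supplied-p variable that tells whether the caller passed the argument. This is a minimal sketch; MAKE-GREETING is a hypothetical function.

```lisp
;; GREETING defaults to "Hello"; GREETING-SUPPLIED-P is true only when
;; the caller actually passed a second argument.
(defun make-greeting (name &optional (greeting "Hello" greeting-supplied-p))
  (list (format nil "~a, ~a!" greeting name)
        greeting-supplied-p))

(make-greeting "Alok")         ; => ("Hello, Alok!" NIL)
(make-greeting "Alok" "Hi")    ; => ("Hi, Alok!" T)
```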

RECURSIVE FUNCTIONS
A function that calls itself is recursive. The recursive call may be direct (the function calls itself) or indirect (the function calls another function which -- perhaps after calling still more functions -- calls the original function).
Suppose we want to find the factorial of a number:
o (defun factorial (n) (if (eql n 0) 1 (* n (factorial (- n 1)))))
This can be done alternatively as follows:
o (defun factorial (n) (cond ((= n 0) 1) (t (* n (factorial (- n 1))))))
This function has two cases, corresponding to the two branches of the COND. The first case says that the factorial of zero is just one -- no recursive call is needed. The second case says that the factorial of some number is the number multiplied by the factorial of one less than the number -- this is a recursive call which reduces the amount of work remaining because it brings the number closer to the terminating condition of the first COND clause.
The following function calculates the length of a list. We call it MY-LENGTH, because LENGTH is a built-in function of the COMMON-LISP package and redefining it has undefined consequences:
o (defun my-length (list) (cond ((null list) 0) (t (1+ (my-length (rest list)))))) => MY-LENGTH
o (my-length '(a b c d)) => 4
NULL is true for an empty list, so the first COND clause returns zero for the empty list. The second COND clause gets evaluated (if the first clause is skipped) because its condition is T; it adds one to the result of the recursive call on a list which is one element shorter (a list consists of its FIRST element and the REST of the list).
Note the similarities between FACTORIAL and MY-LENGTH.

The base case is always the first in the COND because it must be tested before the recursive case -- otherwise, the recursive function calls would never end.
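The same base-case-first pattern also works with an accumulator, which carries the partial result along so that the recursive call is the last thing the function does. This tail-recursive variant is our own illustration, not part of the notes. FACTORIAL-ACC is a hypothetical name, and whether the call is optimized into a jump depends on the implementation.

```lisp
;; ACC holds the product accumulated so far; it starts at 1.
(defun factorial-acc (n &optional (acc 1))
  (if (= n 0)
      acc                                  ; base case, tested first
      (factorial-acc (- n 1) (* n acc))))  ; recursive case moves N toward 0

(factorial-acc 5)   ; => 120
(factorial-acc 0)   ; => 1
```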

Suppose we are trying to write a function which takes two arguments (a list and a number) and, if the number is = to any element of the list, returns the position of that element in the list. (So if the number matches the first element in the list return 0, if it matches the second return 1, and so on.) If the number isn't found in the list, we'll return NIL. Here's one solution to the problem:
o (defun position-one (list number)
    (if list
        (if (= (first list) number)
            0
            (let ((pos (position-one (rest list) number)))
              (if pos (1+ pos) nil)))))
  => POSITION-ONE
o (position-one '(4 5 6) 6) => 2
o (position-one '(4 5 6) 9) => NIL

LOOPS
A simple loop looks like the following:
o (loop (print "How are we?") (return 1) (print "I am fine."))
  prints "How are we?" and returns 1 (RETURN exits the loop, so "I am fine." is never printed)
RETURN is normally used in a conditional form, like this:
o (let ((n 0)) (loop (when (> n 10) (return)) (print n) (prin1 (* n n)) (incf n)))
  0 0
  1 1
  2 4
  3 9
  4 16
  5 25
  6 36
  7 49
  8 64
  9 81
  10 100
  => NIL
INCF and DECF are used for incrementing and decrementing the value of a variable, respectively.
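RETURN can also hand a value back from the loop, which makes LOOP, WHEN, RETURN and INCF enough for a small search. This is a minimal sketch; FIRST-SQUARE-ABOVE is a hypothetical function that finds the smallest N whose square exceeds LIMIT.

```lisp
(defun first-square-above (limit)
  (let ((n 0))
    (loop
      (when (> (* n n) limit)
        (return n))          ; RETURN exits the LOOP and supplies its value
      (incf n))))            ; otherwise try the next N

(first-square-above 50)   ; => 8, since 7 * 7 = 49 but 8 * 8 = 64
```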

DOTIMES FOR A COUNTED LOOP
To simply loop for some fixed number of iterations, the DOTIMES form is the best choice. The previous example simplifies to:
o (dotimes (n 11) (print n) (prin1 (* n n)))
  prints the same eleven lines, 0 0 through 10 100, and returns NIL
DOTIMES returns NIL unless its variable list contains an optional third element, a result form, whose value is then returned. For example, (let ((sum 0)) (dotimes (i 5 sum) (incf sum i))) => 10.


DOLIST TO PROCESS ELEMENTS OF A LIST
Another common use for iteration is to process each element of a list. DOLIST supports this:
o (dolist (item '(1 2 4 5 9 17 25))
    (format t "~&~D is~:[n't~;~] a perfect square.~%"
            item (= item (expt (isqrt item) 2))))
  1 is a perfect square.
  2 isn't a perfect square.
  4 is a perfect square.
  5 isn't a perfect square.
  9 is a perfect square.
  17 isn't a perfect square.
  25 is a perfect square.
  => NIL
(The test uses ISQRT, the integer square root, because SQRT returns a float such as 2.0 in many implementations, so (INTEGERP (SQRT ITEM)) is not a reliable test.)

FORMATTING
Lisp's FORMAT implements a programming language in its own right, designed expressly for the purposes of formatting textual output. FORMAT can print data of many types, using various decorations and embellishments. It can print numbers as words or -- for the movie buffs among us -- as Roman numerals. We can even make portions of the output appear differently depending upon the formatted variables.
FORMAT expects a destination argument, a format control string, and a list of zero or more arguments to be used by the control string to produce formatted output. Output goes to a location determined by the destination argument. If the destination is T, output goes to *STANDARD-OUTPUT*. The destination can also be a specific output stream.
There are two ways FORMAT can send output to a string. One is to specify NIL for the destination: FORMAT will return a string containing the formatted output. The other way is to specify a string for the destination; the string must have a fill pointer.
(defparameter *s* (make-array 0 :element-type 'character :adjustable t :fill-pointer 0)) => *S*
(format *s* "Hello~%") => NIL
*s* => "Hello
"
(format *s* "Goodbye") => NIL
*s* => "Hello
Goodbye"
(setf (fill-pointer *s*) 0) => 0
*s* => ""
(format *s* "A new beginning") => NIL
*s* => "A new beginning"
The call to MAKE-ARRAY with options as shown above creates an empty string that can expand to accommodate new output. Formatting additional output to this string appends the new output to whatever is already there. To empty the string, we can either reset its fill pointer (as shown) or create a new empty string.
FORMAT returns NIL except when the destination is NIL. The format control string contains literal text and formatting directives. Directives are always introduced with a ~ character.

Directive   Interpretation
~%          New line
~&          Fresh line
~|          Page break
~T          Tab stop
~<          Justification
~>          Terminate ~<
~C          Character
~(          Case conversion
~)          Terminate ~(
~D          Decimal integer
~B          Binary integer
~O          Octal integer
~X          Hexadecimal integer
~bR         Base-b integer
~R          Spell an integer
~P          Plural
~F          Floating point
~E          Scientific notation
~$          Monetary
~A          Legibly, without escapes
~S          Readably, with escapes
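A few of the directives from the table are shown below as a minimal sketch. FORMAT is called with a NIL destination so that each call returns a string. ~@R (the @ variant of ~R) prints Roman numerals, and ~:P (the colon variant of ~P) reuses the previous argument to decide whether to add the plural "s".

```lisp
(format nil "~R" 42)                 ; => "forty-two"
(format nil "~@R" 1999)              ; => "MCMXCIX"
(format nil "~D item~:P" 1)          ; => "1 item"
(format nil "~D item~:P" 3)          ; => "3 items"
(format nil "~B" 10)                 ; => "1010"
(format nil "~(~A~)" "HELLO")        ; => "hello"
(format nil "~$" 3.14159)            ; => "3.14"
(format nil "~A / ~S" "text" "text") ; => "text / \"text\""
```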

DYNAMIC AND GLOBAL VARIABLES
Common Lisp provides two ways to create global variables: DEFVAR and DEFPARAMETER. Both forms take a variable name, an initial value, and an optional documentation string. After it has been DEFVARed or DEFPARAMETERed, the name can be used anywhere to refer to the current binding of the global variable. Global variables are conventionally named with names that start and end with *. Examples of DEFVAR and DEFPARAMETER look like this:
o (defvar *count* 0)
o (defparameter *sum* 0.001)
The difference between the two forms is that DEFPARAMETER always assigns the initial value to the named variable, while DEFVAR does so only if the variable is unbound.
Practically speaking, we should use DEFVAR to define variables that will contain data we would want to keep even if we made a change to the source code that uses the variable.
After defining a variable with DEFVAR or DEFPARAMETER, we can refer to it from anywhere. For instance,
o (defun countplus () (incf *count*))
Sometimes, though, we want to change the value of a global variable only temporarily, for the duration of some piece of code, with the old value restored automatically afterwards. It turns out that that's exactly what Common Lisp's other kind of variable -- dynamic variables -- lets us do.

When we bind a dynamic variable -- for example, with a LET form or a function parameter -- the binding that's created on entry to the binding form replaces the global binding for the duration of the binding form.
And it turns out that all global variables are, in fact, dynamic variables. A simple example shows how this works:
o (defvar *x* 10)
o (defun fun () (format t "X: ~d~%" *x*))
The DEFVAR creates a global binding for the variable *x* with the value 10. The reference to *x* in fun will look up the current binding dynamically. If we call fun from the top level, the global binding created by the DEFVAR is the only binding available, so it prints 10.
o (fun)
  X: 10
But we can use LET to create a new binding that temporarily shadows the global binding, and fun will print a different value.
o (let ((*x* 20)) (fun))
  X: 20
Now call fun again, with no LET, and it again sees the global binding.
o (fun)
  X: 10
Now define another function:
o (defun bar () (fun) (let ((*x* 20)) (fun)) (fun))
Note that the middle call to fun is wrapped in a LET that binds *x* to the new value 20. When we run bar, we get this result:
o (bar)
  X: 10
  X: 20
  X: 10
As we can see, the first call to fun sees the global binding, with its value of 10. The middle call, however, sees the new binding, with the value 20. But after the LET, fun once again sees the global binding.
As with lexical bindings, assigning a new value affects only the current binding. To see this, we can redefine fun to include an assignment to *x*.
o (defun fun ()
    (format t "Before assignment~18tX: ~d~%" *x*)
    (setf *x* (+ 1 *x*))
    (format t "After assignment~18tX: ~d~%" *x*))
Now fun prints the value of *x*, increments it, and prints it again. If we just run fun, we will see this:
o (fun)
  Before assignment X: 10
  After assignment  X: 11
Not too surprising. Now run bar.
o (bar)
  Before assignment X: 11
  After assignment  X: 12
  Before assignment X: 20
  After assignment  X: 21
  Before assignment X: 12
  After assignment  X: 13
Notice that *x* started at 11 -- the earlier call to fun really did change the global value. The first call to fun from bar increments the global binding to 12. The middle call doesn't see the global binding because of the LET. Then the last call can see the global binding again and increments it from 12 to 13.
The name of every variable defined with DEFVAR and DEFPARAMETER is automatically declared globally special. This means that whenever we use such a name in a binding form -- in a LET or as a function parameter or any other construct that creates a new variable binding -- the binding that's created will be a dynamic binding.
This is why the * naming convention is so important -- it'd be bad news if we used a name for what we thought was a lexical variable and that variable happened to be globally special.
If we always name global variables according to the * naming convention, we'll never accidentally use a dynamic binding where we intend to establish a lexical binding.
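Both behaviors described in this section can be checked in a few lines: DEFVAR versus DEFPARAMETER on re-evaluation, and a LET binding of a special variable that is visible inside another function and undone on exit. This is a minimal sketch; *DEPTH*, *LIMIT*, CURRENT-DEPTH and NESTED-DEPTH are hypothetical names.

```lisp
(defvar *depth* 0)
(defvar *depth* 100)        ; no effect: *DEPTH* is already bound, so it stays 0

(defparameter *limit* 1)
(defparameter *limit* 2)    ; DEFPARAMETER always assigns: *LIMIT* is now 2

(defun current-depth () *depth*)

(defun nested-depth ()
  ;; *DEPTH* is special, so this LET creates a dynamic binding that
  ;; CURRENT-DEPTH sees even though it is defined elsewhere.
  (let ((*depth* (1+ *depth*)))
    (current-depth)))

(current-depth)   ; => 0
(nested-depth)    ; => 1
(current-depth)   ; => 0, the global binding is back once the LET exits
```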

DO
While DOLIST and DOTIMES are convenient and easy to use, they aren't flexible enough to use for all loops. The general form of DO is:
(do (variable-definition*) (end-test-form result-form*) statement*)
Example:
o (do ((i 0 (1+ i))) ((>= i 4)) (print i))
  prints 0, 1, 2 and 3 on separate lines and returns NIL
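A DO loop can step several variables at once and return a value through its result forms. This is a minimal sketch; SUM-BELOW is a hypothetical function that adds the integers from 0 up to, but not including, N.

```lisp
(defun sum-below (n)
  ;; Both variables are stepped in parallel, so (+ SUM I) uses the value
  ;; of I from before the step. The body is empty: all work is in the steps.
  (do ((i 0 (1+ i))
       (sum 0 (+ sum i)))
      ((>= i n) sum)))    ; end test, then the result form SUM is returned

(sum-below 4)    ; => 6, i.e. 0 + 1 + 2 + 3
```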

DEFMACRO
DEFMACRO stands for DEFine MACRO. The basic skeleton of a DEFMACRO is quite similar to the skeleton of a DEFUN:
(defmacro name (parameter*)
  "Optional documentation string."
  body-form*)
Like a function, a macro consists of a name, a parameter list, an optional documentation string, and a body of Lisp expressions. However, the job of a macro isn't to do anything directly -- its job is to generate code that will later do what we want.
o (defmacro mac1 (a b) `(+ ,a (* ,b 3))) => MAC1
o (mac1 4 5) => 19
(The call (mac1 4 5) expands into (+ 4 (* 5 3)), which evaluates to 19.)

APPEND

The function APPEND takes any number of list arguments and returns a new list containing the elements of all its arguments. For instance:

o (append (list 1 2) (list 3 4))
  (1 2 3 4)

REDUCING A SEQUENCE

The function reduce takes (in the simplest case) a function and a sequence. It uses this function first to combine the first two elements of the sequence, then to combine the result with the third element, then to combine this latest result with the fourth element, and so on until the whole sequence has been processed. Example:

o (reduce '+ '(1 2 3 4 5 6 7))
  28

CAR AND CDR OF A LIST

car returns the first element of a list; cdr returns the rest of the list. Example:

o (car '(1 2 3 4))
  1
o (cdr '(1 2 3 4))
  (2 3 4)

RPLACA, RPLACD, SETF CIRCULARITY

A list is constructed of CONS cells. Each CONS has two parts, a CAR and a CDR. The CAR holds the data for one element of the list, and the CDR holds the CONS that makes up the head of the rest of the list.

By using RPLACA and RPLACD to change the two fields of a CONS, we can alter the normal structure of a list. For example, we could splice out the second element of a list like this:

o (defparameter *my-list* (list 1 2 3 4))
  *MY-LIST*
o (rplacd *my-list* (cdr (cdr *my-list*)))
  (1 3 4)
o *my-list*
  (1 3 4)

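CONTRAST EXAMPLE: PUSH AND DELETE

Here's an example showing DELETE and PUSH. In a typical implementation:

o (defparameter *my-list* (list 1 2 3 4))
  *MY-LIST*
o (delete 3 *my-list*)
  (1 2 4)
o *my-list*
  (1 2 4)
o (delete 1 *my-list*)
  (2 4)
o *my-list*
  (1 2 4)

DELETE returns the modified list, but it cannot update the variable that holds the list. When the first element is deleted, the result typically just starts at the second cons. *my-list* still points to the old first cons. The exact contents of *my-list* after a DELETE are implementation-dependent. To keep the change, store the result, e.g. (setf *my-list* (delete 1 *my-list*)).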

But some macros, for example PUSH and POP, take a place as an argument and arrange to update the place with the correct value.

For example:

o (defparameter *stack* ())
  *STACK*
o (push 3 *stack*)
  (3)
o (push 2 *stack*)
  (2 3)
o (push 1 *stack*)
  (1 2 3)
o *stack*
  (1 2 3)
o (pop *stack*)
  1
o *stack*
  (2 3)

COMPARISONS

Not All Comparisons are Equal. Lisp has a core set of comparison functions that work on virtually any kind of object. These are:

o EQ
o EQL
o EQUAL
o EQUALP

The tests with the shorter names support stricter definitions of equality. The tests with the longer names implement less restrictive, perhaps more intuitive, definitions of equality.

EQ

o EQ is true for identical symbols.
o In fact, it’s true for any identical object. In other words, an object is EQ to itself. Even a composite object, such as a list, is EQ to itself. (But two lists are not EQ just because they look the same when printed; they must truly be the same list to be EQ.) Under the covers, EQ just compares the memory addresses of objects.
o EQ is not guaranteed to be true for identical characters or numbers.
o This is because most Lisp systems don’t assign a unique memory address to a particular number or character; numbers and characters are generally created as needed and stored temporarily in the hardware registers of the processor.

EQL

o EQL is also true for identical numbers and characters.
o EQL retains EQ’s notion of equality, and extends it to identical numbers and characters.
o Numbers must agree in value and type; thus 0.0 is not EQL to 0. Characters must be truly identical; EQL is case sensitive.

EQUAL

o EQUAL is usually true for things that print the same.
o EQ and EQL are not generally true for lists that print the same.
o Lists that are not EQ but have the same structure will be indistinguishable when printed; they will also be EQUAL.
o Strings are also considered EQUAL if they print the same. Like EQL, the comparison of characters within strings is case-sensitive.

EQUALP

o EQUALP ignores number type and character case.
o EQUALP is the most permissive of the core comparison functions. Everything that is EQUAL is also EQUALP. But EQUALP ignores case distinctions between characters, and applies the (typeless) mathematical concept of equality to numbers; thus 0.0 is EQUALP to 0.
o Furthermore, EQUALP is true if corresponding elements are EQUALP in the following composite data types:
  o Arrays
  o Structures
  o Hash Tables

Longer tests are slower; know what we are comparing.

The generality of the above longer-named tests comes with a price. They must test the types of their arguments to decide what kind of equality is applicable; this takes time. EQ is blind to the type of an object: either the objects are the same object, or they're not. This kind of test typically compiles into one or two machine instructions and is very fast. We can avoid unnecessary runtime overhead by using the most restrictive (shortest-named) test that meets our needs.

CHAP 3: NEURAL NETWORK & FUZZY SYSTEMS

WHAT DO WE SEE ON THE FOLLOWING PICTURE?

We ‘recognize’ an illusory bright cross. But technically, there is no cross, only four squares within a square. Such a cross exists only in our brain, not on the screen. The real-time interaction of millions of neurons in our brain is behind this cross-like perception. The asynchronous, nonlinear, massively parallel, distributed neurons perform such recognition under uncertainty.

We reason with vague concepts, beliefs, estimates, guesses, etc. This inexactness is called fuzziness. Though we use exact scientific tools in day-to-day decision making, the final control remains fuzzy, e.g. in medical diagnosis. We may casually refer to this as experience, judgment, sixth sense, etc.

LOGIC

Bivalent: On or Off. True or False. Present or Absent. A or Not A.

BIVALENT LOGIC CREATES PARADOXES

o A man says “Don’t trust me”. Can we trust him?
o One side of a card says “The sentence on the other side is true” and the other side of the card says “The sentence on the other side is not true”. Which side is true?
o A speaker says “I lie”. Does he tell the truth?
o A liar says all his friends are liars. Does he lie?
o A barber shaves everybody who does not shave themselves. Does the barber shave himself?

BIVALENT PARADOXES AS FUZZY MID POINTS

The paradoxes share the same property: a statement S and its negation have the same truth value, t(S) = t(not S). Since t(not S) = 1 - t(S), this gives t(S) = 1 - t(S).

o If S is true, t(S) = 1, then 1 = 0.
o If S is false, t(S) = 0, then 0 = 1.

The fuzzy interpretation accepts 2 t(S) = 1, giving t(S) = 1/2. Thus paradoxes reduce to literal half truths, or, mathematically, to the midpoint of the interval [0, 1].

Thus, fuzziness means multi-valuedness or multivalence. Three-valued fuzziness corresponds to true, false and indeterminate, or to present, absent and ambiguous. The ‘heap of sand’ (sorites) problem can also be tackled using the fuzzy approach. Thus fuzziness reduces the ‘black and white’ rigidity of bivalent logic to multi-valued ‘gray areas’ between black and white. TRUE and FALSE become two limiting cases of a band of indeterminacy.

FUZZY SYSTEMS: HISTORY

In the fourth century BC, Aristotle came up with binary logic involving the numbers 0 and 1. It came down to one law: True or False. Plato, however, had proposed a third region beyond True and False. Buddha supported this argument by stating that the world is filled with contradictions, with things and not-things. Philosophers like Hegel, Marx and Engels supported the school of thought of many-valued logic.

In the early 1900s, Łukasiewicz first proposed a three-valued logic along with the mathematics to accompany it. The third value he proposed was “possible”, and he assigned it a numerical value between True and False.

In 1965 Dr. Lotfi A. Zadeh published his work “Fuzzy Sets”, which described the mathematics of fuzzy set theory. This theory lets the membership function range over the whole interval [0, 1] of real numbers, in place of only the two values 0 and 1 used by Boolean set theory.

The indicator function of a non-fuzzy set A is given by:

$$I_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}$$

Zadeh extended this function to a multi-valued membership function $m_A : X \to [0, 1]$. This membership function measures the degree to which an element x belongs to the set A:

$$m_A(x) = \text{Degree}(x \in A)$$

$m_A(x) = 0$ denotes that x is not a member of the set; $m_A(x) = 1$ denotes that x is definitely a member; all other values denote degrees of membership.

FUZZY SYSTEMS: EXAMPLE

Let us consider the example of TALL to illustrate a fuzzy set. We can assign a degree of membership to each person in the fuzzy set TALL as follows:

Tall(x) = 0,                       if height(x) < 5'
        = (height(x) - 5') / 2',   if 5' <= height(x) <= 7'
        = 1,                       if height(x) > 7'

The heights and degrees of membership of each person, computed from this formula, are:

Person   Height   Degree of Membership
A        5' 2"    1/12 ≈ 0.08
B        5' 3"    1/8 = 0.125
C        5' 5"    5/24 ≈ 0.21
D        5' 7"    7/24 ≈ 0.29
E        6' 1"    13/24 ≈ 0.54
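The short sketch below evaluates the piecewise TALL membership function and reproduces the degrees in the table. It assumes heights are given in inches (5' = 60", 7' = 84"). It uses exact fractions so the values can be compared with the table directly.

```python
from fractions import Fraction

def tall(height_inches):
    """Degree of membership in the fuzzy set TALL (piecewise formula above).

    Heights are in inches; the formula itself works in feet.
    """
    feet = Fraction(height_inches, 12)
    if feet < 5:
        return Fraction(0)
    if feet > 7:
        return Fraction(1)
    return (feet - 5) / 2          # linear ramp between 5 ft and 7 ft

people = {"A": (5, 2), "B": (5, 3), "C": (5, 5), "D": (5, 7), "E": (6, 1)}
for name, (ft, inch) in people.items():
    m = tall(12 * ft + inch)
    print(f"{name}  {ft}' {inch}\"  {m}  ({float(m):.3f})")
```

A crisp set would instead return only 0 or 1, for example `1 if height_inches >= 72 else 0`. The ramp is what makes TALL a fuzzy set.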

SOME APPLICATIONS / PRODUCTS

o The subway in Sendai, Japan, where the trains’ movements are controlled by fuzzy control systems.
o Omron: camera aiming for the telecast of sporting events.
o Hitachi: washing machine with single-button control.
o Sony: pocket computers with handwriting recognition.

FUZZY SYSTEMS & NEURAL NETWORKS

Both process inexact information inexactly, and they share the same mathematical foundation. Each neuron emits a bounded signal, just like a bounded set (membership) value. A set of n neurons defines a family of n-dimensional continuous or fuzzy sets. At each instant, the n-vector neural output defines a fuzzy unit, or fit vector. Each fit value indicates the degree to which the corresponding neuron belongs to the n-dimensional fuzzy set. The neuronal state space (the set of all n-dimensional possibilities) equals the set of all n-dimensional fit vectors (the fuzzy power set), given by

$$I^n = [0, 1] \times [0, 1] \times \cdots \times [0, 1]$$

This fuzzy power set is the n-dimensional unit cube, which has $2^n$ vertices.

NON FUZZY SET TO FUZZY SET: 1 DIMENSION

Consider a non-fuzzy set X = {x1} that contains only one element. The power set of X is {Ø, {x1}}, where Ø = 0 and {x1} = 1, the binary bits. The corresponding fuzzy power set contains all the values from 0 to 1. Thus the two crisp subsets are the endpoints of the unit interval [0, 1], and every point in between is a fuzzy subset.

NON FUZZY SET TO FUZZY SET: 2 DIMENSIONS

Consider a non-fuzzy set X = {x1, x2} that contains 2 elements. The power set of X is {Ø, {x1}, {x2}, {x1, x2}}, where Ø = (0, 0), {x1} = (1, 0), {x2} = (0, 1) and {x1, x2} = (1, 1). These points correspond to the vertices of a unit square. Thus the fuzzy subsets of X fill the whole unit square $[0, 1]^2$.

NON FUZZY SET TO FUZZY SET: 3 DIMENSIONS

Consider a non-fuzzy set X = {x1, x2, x3} that contains 3 elements. The power set of X is {Ø, {x1}, {x2}, {x3}, {x1, x2}, {x1, x3}, {x2, x3}, {x1, x2, x3}}, where Ø = (0, 0, 0), {x1} = (1, 0, 0), {x2} = (0, 1, 0), {x3} = (0, 0, 1), {x1, x2} = (1, 1, 0), {x1, x3} = (1, 0, 1), {x2, x3} = (0, 1, 1) and {x1, x2, x3} = (1, 1, 1).

These points correspond to the vertices of a unit cube. Thus the fuzzy subsets of X fill the whole unit cube $[0, 1]^3$.

NON FUZZY SET TO FUZZY SET: N DIMENSIONS

In general, each subset of an n-element non-fuzzy set corresponds to a vertex of the n-cube $I^n = [0, 1] \times [0, 1] \times \cdots \times [0, 1]$, and each fuzzy subset corresponds to an n-dimensional fit vector inside this cube. The midpoint (1/2, 1/2, …, 1/2) of the n-cube corresponds to the paradoxes of logic, where truth and falsity have the same value.
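The 1-, 2- and 3-dimensional cases above generalize mechanically. The sketch below lists the crisp subsets of an n-element set as the $2^n$ bit-vector vertices of the unit n-cube. It then checks that the "paradox" midpoint is a valid fit vector but not a vertex. The function names are illustrative only.

```python
from itertools import product

def crisp_power_set(n):
    """Vertices of the unit n-cube: every crisp subset of an n-element set,
    written as a 0/1 bit vector (1 = element present)."""
    return list(product((0, 1), repeat=n))

def is_fit_vector(v):
    """A fuzzy subset (fit vector) is any point of the cube [0, 1]^n."""
    return all(0 <= m <= 1 for m in v)

vertices = crisp_power_set(3)
print(len(vertices))   # 8, i.e. 2**3 vertices
print(vertices)        # all 8 bit vectors, from (0, 0, 0) to (1, 1, 1)

midpoint = (0.5, 0.5, 0.5)   # the "paradox" point of the 3-cube
print(is_fit_vector(midpoint), midpoint in vertices)   # True False
```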

FIT VECTOR: EXAMPLE

o 1-dimension: 1/3
o 2-dimension: (1/3, 2/5), where mA(x1) = 1/3 and mA(x2) = 2/5.
o 3-dimension: (1/3, 2/5, 3/4), where mA(x1) = 1/3, mA(x2) = 2/5 and mA(x3) = 3/4.

SUBSETHOOD THEOREM

Subsethood measures the degree to which set A is a subset of set B, and is denoted by S(A, B):

$$S(A, B) = \text{Degree}(A \subseteq B) = \frac{M(A \cap B)}{M(A)} = P(B \mid A)$$

where M(A) denotes the fuzzy count of the fit vector A, i.e. if A = (a1, a2, …, an) then M(A) = a1 + a2 + … + an, and 0 <= S(A, B) <= 1.

Question: Apply the subsethood theorem in three dimensions with A = (3/4, 1/3, 1/6) and B = (1/4, 1/2, 1/3).

Answer: Let X = {x1, x2, x3}, which contains 3 elements, so that A and B are points of the unit cube $I^3$.


The power set of X = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)}. Consider the fuzzy subsets A = (3/4, 1/3, 1/6) and B = (1/4, 1/2, 1/3). If A and B are fuzzy sets, then for each element x, $m_{A \cap B}(x) = \min(m_A(x), m_B(x))$ and $m_{A \cup B}(x) = \max(m_A(x), m_B(x))$. Hence:

o A ∩ B = min(A, B) = (1/4, 1/3, 1/6)
o M(A ∩ B) = 1/4 + 1/3 + 1/6 = 3/4
o M(A) = 3/4 + 1/3 + 1/6 = 5/4
o S(A, B) = M(A ∩ B) / M(A) = (3/4) / (5/4) = 3/5 = 60%
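The worked answer can be checked with a few lines of code. This is a minimal sketch that represents fit vectors as tuples of exact fractions and implements the fuzzy count, the min-intersection and the subsethood ratio exactly as defined above. It assumes M(A) > 0.

```python
from fractions import Fraction as F

def fuzzy_count(a):
    """M(A): the sum of the fit values of the fit vector A."""
    return sum(a)

def intersection(a, b):
    """Fuzzy intersection: element-wise minimum of the fit values."""
    return tuple(min(x, y) for x, y in zip(a, b))

def subsethood(a, b):
    """S(A, B) = M(A ∩ B) / M(A); assumes M(A) > 0."""
    return fuzzy_count(intersection(a, b)) / fuzzy_count(a)

A = (F(3, 4), F(1, 3), F(1, 6))
B = (F(1, 4), F(1, 2), F(1, 3))
print(intersection(A, B))   # (Fraction(1, 4), Fraction(1, 3), Fraction(1, 6))
print(subsethood(A, B))     # 3/5
print(subsethood(B, A))     # 9/13
```

Subsethood is not symmetric. S(B, A) = (3/4) / (13/12) = 9/13, because the denominator is M(B) = 13/12 instead of M(A).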

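PROBABILITY AS SUBSETHOOD

Consider a statistical experiment X of n trials. Suppose A defines the subset of successful trials, with nA successes out of n trials. Let 1 denote success and 0 denote failure. Then X = (1, 1, …, 1), and A is a bit vector with nA ones. Since every trial belongs to X, X ∩ A = A and M(X) = n. Measuring the degree to which the whole experiment X is contained in the success set A gives:

S(X, A) = M(X ∩ A) / M(X) = M(A) / M(X) = nA / n = P(A)

Thus probability reduces to subsethood.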
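As a small illustration of this reduction, the sketch below computes subsethood for a hypothetical run of 8 trials. Each trial is an element, X is the all-ones vector and A marks the successes. The relative frequency of success comes out as S(X, A).

```python
from fractions import Fraction

def subsethood(a, b):
    """S(A, B) = M(A ∩ B) / M(A), with M the sum of fit values."""
    return Fraction(sum(min(x, y) for x, y in zip(a, b)), sum(a))

# Hypothetical outcomes of n = 8 trials: 1 = success, 0 = failure.
trials = (1, 0, 1, 1, 0, 0, 1, 0)
X = (1,) * len(trials)    # the whole experiment: every trial belongs to X
A = trials                # the crisp subset of successful trials

print(subsethood(X, A))   # 1/2, i.e. 4 successes out of 8 trials
```

The argument order matters. S(A, X) is always 1, since the success set A lies entirely inside X.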

DYNAMIC SYSTEMS APPROACH

Dynamic systems are common:

o Electrical engineering: signal processing, filtering …
o Computer science: algorithms, robotics …
o Maths: functions, statistical estimation …
o Philosophy: thinking, action …
o Biology: neuroscience, evolution …
o Economics: market equilibrium, game theory …
o Anthropology: culture …

All receive stimuli and adapt to give responses. Thus, Brain = A Dynamic System.

BRAIN BIOLOGICAL NEURAL SYSTEMS

The brain is composed of approximately 100 billion ($10^{11}$) neurons. A typical neuron collects signals from other neurons through a host of fine structures called dendrites. The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons. When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.

BRAIN = A DYNAMIC SYSTEM

We can try to duplicate the working of the brain in machines. Such machines are expected to work smarter, and this ‘smartness’ is referred to as machine intelligence. Artificial neural networks and fuzzy systems are two adaptive machine-intelligence systems.

NEURAL AND FUZZY SYSTEMS AS FUNCTION ESTIMATORS

Neural networks and fuzzy systems estimate I/O functions. Both are model-free in I/O analysis, so the same architecture can be used for different problems. Both are trainable dynamical systems that learn from sample data, and this sample information is encoded in a parallel, distributed framework.

Neural networks have the property of ‘recognition without definition’ and learn from previous experiences, e.g. a child. This property helps in generalizing and in better learning, e.g. NL development. Distributed encoding in neural networks helps in recognizing partial patterns, and provides fault tolerance and graceful degradation.

Neural networks contain a collection of processing units called neurons. Neurons work as I/O functions and synapses (joints) work as adjustable weights. Thus neural networks behave as adaptive function estimators.

NEURAL NETWORKS AS TRAINABLE DYNAMICAL SYSTEMS

Network activity in a neural network follows a trajectory in the state space of all possibilities. Each point in the state space is a possible neural network configuration. An input starts a trajectory, and the trajectory ends at a solution, e.g. in pattern recognition. Meanwhile the synaptic values gradually change to learn new patterns.

ARTIFICIAL NEURAL NETWORKS

Artificial neurons are analogous to their biological inspirers. Here the neuron is actually a processing unit. It calculates the weighted sum of its input signals to generate the activation signal a, given by

$$a = \sum_{i=1}^{n} w_i x_i$$

where $w_i$ is the strength of the synapse connected to the neuron and $x_i$ is an input feature to the neuron. The activation signal is passed through a transform function f to produce the output of the neuron, given by

$$y = f(a)$$

The transform function can be linear or non-linear, such as a threshold or sigmoid function.
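A single artificial neuron is small enough to write out directly. The sketch below computes the activation a = Σ w_i x_i and passes it through either a binary threshold or a logistic sigmoid transform. The weights and inputs are hypothetical values chosen only for illustration.

```python
import math

def activation(weights, inputs):
    """Weighted sum a = sum_i w_i * x_i of the input signals."""
    return sum(w * x for w, x in zip(weights, inputs))

def threshold(a, theta=0.0):
    """Binary threshold transform: 1 if a >= theta, else 0."""
    return 1 if a >= theta else 0

def sigmoid(a):
    """Logistic sigmoid transform, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))

def neuron_output(weights, inputs, transform=sigmoid):
    """y = f(a): pass the activation through the chosen transform function."""
    return transform(activation(weights, inputs))

w = [0.5, -1.0, 2.0]   # hypothetical synaptic strengths
x = [1.0, 1.0, 0.5]    # hypothetical input features
print(activation(w, x))                 # 0.5
print(neuron_output(w, x, threshold))   # 1
print(round(neuron_output(w, x), 4))    # 0.6225
```

Swapping the `transform` argument changes only the output stage. This matches the text's point that the same weighted-sum neuron can use a linear, threshold or sigmoid transform.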

CLASSIFICATION OF NEURAL NETWORK MODELS

We can classify neural networks based on whether they learn with supervision and whether they contain feedback:

                 Feed Forward               Feed Back
Supervised       Perceptron,                Recurrent Back Propagation
                 LMS,
                 Back Propagation
Unsupervised     Self-Organizing Map,       Boltzmann Learning,
                 Data Clustering            Hopfield Network

INTELLIGENT BEHAVIOR AS ADAPTIVE MODEL FREE ESTIMATION

Intelligent systems adaptively estimate functions from data without a model for the I/O processing. Living creatures respond to stimuli; in other words, we map stimuli to responses, like f: X → Y.

Intelligent systems associate similar responses with similar stimuli. They produce only minor changes in the response if the inputs are changed slightly.

INTELLIGENT SYSTEMS GENERALIZE

Let S and R be the spaces of stimuli and responses. Consider a ball Bx of similar stimuli around a stimulus x in S, and a ball By of similar responses around y = f(x) in R, with f(Bx) = By.

For every similar response y’ in By, we can find some similar stimulus x’ in Bx such that y’ = f(x’). That is, f maps Bx onto By.

INTELLIGENT SYSTEMS ARE CREATIVE

The measure of creativity of f on the ball Bx is given by $C_{B_x}(f) = V(B_y) / V(B_x)$, where V denotes volume.

Case 1: $C_{B_x}(f) = 0$.

o => V(By) = 0 or V(Bx) = infinity.
o => By shrinks to a ball of zero radius (a single point), or Bx has infinite volume.
o => f is a constant function on Bx, or Bx is of infinite radius.
o => f is “dull”, or the stimuli overwhelm the responses.

Case 2: $C_{B_x}(f)$ = infinity.

o => By has infinite radius.
o => f emits infinite responses.

Small variations in the input provide the simplest novel stimuli. The responses to them manifest as creativity.

LEARNING AS CHANGE

Intelligent systems learn or adapt. Learning or adaptation means just parameter change. The parameter may be:

o Average transmission rate at synaptic junctions.
o Gene frequency at the locus of a chromosome.

Practically, learning means change, so learning laws can describe a dynamic system. We can make any system encode or decode information, e.g. the mowing of a lawn of grass, where the lawn plays the role of the brain.

Supervised learning uses class-membership information: it ‘knows’ which samples belong to a class and which do not. E.g. a speech recognition system at an airport, supervised using a ‘carrot and stick’ policy. Unsupervised systems adaptively cluster like patterns with like patterns. Biological synapses learn without supervision.

SYMBOLS VS NUMBERS: RULES VS PRINCIPLES

We cannot define our behavior mathematically, e.g. emotions. Natural languages follow this approach. Some languages were developed as articulated languages, e.g. ‘Esperanto’ and ‘Interlingua’.

Computer programming languages follow articulation. For example, Lisp and Prolog process symbols within a framework of bivalent logic and propositional rules.

EXPERT SYSTEM KNOWLEDGE AS RULE TREES

AI systems store and process propositional rules. The rules have the form:

o IF (CONDITION) THEN (ACTION)

E.g. in the water jug problem, IF (4, 0) THEN (4, 3). The collection of all such rules for a given problem is called the knowledge base. The knowledge base can be put in a tree format.

By searching through the knowledge tree we can perform inference (the process of finding a solution). There are two forms of inference:

o Forward chaining
o Backward chaining
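To make "searching through the knowledge tree" concrete, here is a minimal breadth-first search over the water jug rules. It assumes a 4-gallon jug A and a 3-gallon jug B, matching the (4, 3) notation in the rule above. The goal of 2 gallons in the 4-gallon jug is a hypothetical choice, since the notes do not state one. Each state is a pair (a, b), and each IF-THEN rule is one way of filling, emptying or pouring.

```python
from collections import deque

CAP_A, CAP_B = 4, 3   # jug capacities, as in the (4, 3) example

def successors(state):
    """Apply every IF-THEN rule whose condition matches `state`."""
    a, b = state
    pour_ab = min(a, CAP_B - b)   # amount that fits when pouring A into B
    pour_ba = min(b, CAP_A - a)   # amount that fits when pouring B into A
    candidates = [
        (CAP_A, b),                   # fill A
        (a, CAP_B),                   # fill B
        (0, b),                       # empty A
        (a, 0),                       # empty B
        (a - pour_ab, b + pour_ab),   # pour A into B
        (a + pour_ba, b - pour_ba),   # pour B into A
    ]
    return [s for s in candidates if s != state]

def search(start, is_goal):
    """Breadth-first search of the rule tree; returns the path of states."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parents:    # skip states already in the tree
                parents[nxt] = state
                queue.append(nxt)
    return None

# Hypothetical goal: exactly 2 gallons in the 4-gallon jug.
print(search((0, 0), lambda s: s[0] == 2))
# [(0, 0), (4, 0), (1, 3), (1, 0), (0, 1), (4, 1), (2, 3)]
```

The `parents` dictionary stops the search from re-expanding states such as (0, 0) that reappear deeper in the tree.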

FORWARD CHAINING

Forward chaining starts with some initial information and works forward, attempting to match that information with a rule. Once a fact has been matched to the IF part of a rule, the rule is fired. The action could produce new knowledge or a new fact that is stored in the knowledge base. This new fact may then be used to search out the next appropriate rule. This searching and matching process continues until a final conclusion rule is fired.
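The forward-chaining loop just described can be sketched in a few lines. Rules are (premises, conclusion) pairs, and a rule fires when all its IF premises are known facts. The fired conclusion is stored as a new fact, and the loop repeats until the goal appears or nothing new can be derived. The animal rules are a hypothetical toy knowledge base.

```python
def forward_chain(facts, rules, goal=None):
    """Data-driven inference over IF-THEN rules.

    facts: iterable of initial facts (strings).
    rules: list of (premises, conclusion) pairs.
    Returns (all known facts, conclusions in the order they were fired).
    """
    known = set(facts)
    fired = []
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)        # store the new fact
                fired.append(conclusion)
                changed = True
                if conclusion == goal:       # final conclusion reached
                    return known, fired
    return known, fired

# Hypothetical toy knowledge base, for illustration only.
RULES = [
    (("has feathers",), "is a bird"),
    (("is a bird", "swims", "cannot fly"), "is a penguin"),
    (("has fur",), "is a mammal"),
]
known, fired = forward_chain({"has feathers", "swims", "cannot fly"},
                             RULES, goal="is a penguin")
print(fired)   # ['is a bird', 'is a penguin']
```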

BACKWARD CHAINING

Backward chaining also starts with a fact in the database, but this time the fact is the hypothesis to be proved. The rule interpreter then begins examining the THEN parts of rules for a match. The inference engine searches for evidence to support the hypothesis originally stated. If a match is found, the database is updated, recording the conditions or premises that the rule stated as necessary for supporting the matched conclusion.

The chaining process continues with the system repeatedly attempting to match the right-hand side of the rules against the current status of the system. The corresponding IF sides of the matched rules are used to generate new intermediate hypotheses or goal states, which are recorded in the database. Again, this backward chaining continues until the hypothesis is proved.
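For contrast, here is a goal-driven sketch over the same kind of (premises, conclusion) rules. To prove a hypothesis, it looks for rules whose THEN part matches. It then tries to prove each IF premise as a new intermediate hypothesis. It does not record premises in a database as the text describes; it only reports whether the hypothesis can be proved. The rules are the same hypothetical toy base.

```python
def backward_chain(goal, facts, rules, seen=None):
    """Goal-driven inference: can `goal` be proved from `facts`?

    rules: list of (premises, conclusion) pairs. Each premise of a rule whose
    conclusion matches the goal becomes a new sub-goal.
    """
    if goal in facts:
        return True
    seen = set() if seen is None else seen
    if goal in seen:                 # avoid looping on circular rules
        return False
    seen = seen | {goal}             # goals on the current chain
    for premises, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(p, facts, rules, seen) for p in premises
        ):
            return True
    return False

# Hypothetical toy knowledge base, for illustration only.
RULES = [
    (("has feathers",), "is a bird"),
    (("is a bird", "swims", "cannot fly"), "is a penguin"),
    (("has fur",), "is a mammal"),
]
FACTS = {"has feathers", "swims", "cannot fly"}
print(backward_chain("is a penguin", FACTS, RULES))   # True
print(backward_chain("is a mammal", FACTS, RULES))    # False
```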

[Figure: water jug knowledge tree. Root (0, 0); second level (4, 0) and (0, 3); third level (0, 0), (4, 3), (3, 0), (0, 0), (1, 3) and (4, 3).]

REMARK

The choice of inference strategy, forward or backward chaining, is determined by the design of the system and the nature of the problem being solved.

In large systems with many rules, the forward chaining (data-driven) approach may be too slow, as it generates many sequences of rules. As a result, the search can go off in undesired directions, exploring alternatives that do not fit the problem. In such cases a backward chaining (goal-driven) approach may be advantageous.

On the other hand, a backward chaining process could become fixated on a particular hypothesis and continue to explore it even though the data to support it may not be there. The system does not know when to switch the emphasis or context to a more appropriate search sequence.

Some expert systems incorporate both forward and backward chaining. This can speed up the process, and the concurrent forward-backward search can converge rapidly on an answer.

BRAIN = COMPUTER

Consider the brain as a computer:

o Language strings thoughts.
o Programming learning.
o Logical inference evolution.
o Feed forward search through knowledge feed back from previous experience.

DIFFERENT MODEL FREE ESTIMATORS

Model free = no mathematical model specifies how the outputs depend on the inputs.

FRAMEWORK

Knowledge       Symbolic             Numeric
Structured      AI Expert Systems    Fuzzy Systems
Unstructured    -----                Neural Networks

FUZZY SYSTEMS AS STRUCTURED NUMERIC ESTIMATORS

Fuzzy systems encode structured knowledge in a numeric framework. We can enter a fuzzy association like (TALL, HEIGHT) as an entry in a FAM (Fuzzy Associative Memory) rule matrix. A FAM rule is an I/O map.

FAM RULES: E.G. FUZZY CONTROL OF A PENDULUM

Let Θ be the angle of the pendulum, δΘ the angular velocity of the pendulum, and v the current to the motor control that adjusts the pendulum. All variables are fuzzy; v is the output and the others are inputs. Each variable has 5 fuzzy set values:

o Negative Medium (NM)
o Negative Small (NS)
o Zero (ZE)
o Positive Small (PS)
o Positive Medium (PM)

FAM RULES OF PENDULUM

Output v for each (Θ, δΘ) pair (blank cells have no rule):

Θ \ δΘ   NM   NS   ZE   PS   PM
NM                 PM
NS                 PS
ZE       PM   PS   ZE   NS   NM
PS                 NS
PM                 NM

Usually fuzzy set values are defined as trapezoids. E.g. the angle value 0 belongs to the fuzzy value ZE to degree 1, while the angle value 3 may belong to ZE only to a degree of, say, 0.6.
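A trapezoidal fuzzy set is fixed by four breakpoints a < b <= c < d. Its membership is 0 outside [a, d], 1 on the flat top [b, c], and rises or falls linearly on the two sides. The breakpoints for ZE below (-6, -1, 1, 6, in the same angle units as the text) are hypothetical. They were chosen only so that the example values above come out: ZE(0) = 1 and ZE(3) = 0.6.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership with corners a < b <= c < d.

    0 outside [a, d], 1 on [b, c], linear ramps on [a, b] and [c, d].
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge

# Hypothetical breakpoints for the fuzzy value ZE of the angle.
ZE = (-6.0, -1.0, 1.0, 6.0)
print(trapezoid(0, *ZE))   # 1.0
print(trapezoid(3, *ZE))   # 0.6
```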

GENERATING FUZZY RULES WITH PRODUCT SPACE CLUSTERING

In the pendulum case there are 2 inputs and 1 output, so the product space is $R^3$, and each I/O triplet (Θ, δΘ, v) is a point in $R^3$.

The movement of the pendulum defines a trajectory in this space. The FAM cell (ZE, ZE, ZE) contains the point (0, 0, 0). Each fuzzy variable has 5 fuzzy subsets along its x, y or z coordinate. The Cartesian product of these subsets gives 5 · 5 · 5 = 125 FAM cells. Most systems pass through only a few of these cells. Each FAM cell corresponds to a FAM rule.
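Here is a simplified sketch of turning trajectory samples into FAM rules. Instead of a proper clustering algorithm, it quantizes each coordinate to the nearest of 5 hypothetical fuzzy-set centers and counts how many samples fall into each of the 125 cells. For each input pair (Θ, δΘ), it keeps the output set whose cell was visited most often. Both the centers and the samples are made up for illustration.

```python
from collections import Counter

LABELS = ["NM", "NS", "ZE", "PS", "PM"]
CENTERS = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical centers of the 5 sets

def label(value):
    """Quantize a crisp value to the nearest of the 5 fuzzy-set centers."""
    i = min(range(len(CENTERS)), key=lambda k: abs(value - CENTERS[k]))
    return LABELS[i]

def fam_rules_from_samples(samples):
    """Count samples per FAM cell (one of 5*5*5 = 125 cells) and keep, for
    each input pair (theta, dtheta), the most visited output set."""
    cells = Counter((label(t), label(dt), label(v)) for t, dt, v in samples)
    best = {}
    for (t, dt, v), count in cells.items():
        if (t, dt) not in best or count > cells[(t, dt, best[(t, dt)])]:
            best[(t, dt)] = v
    return best

# Hypothetical (theta, dtheta, v) samples from a pendulum trajectory.
samples = [(0.1, 0.0, 0.05), (-0.1, 0.1, 0.0), (1.1, 0.0, -0.9),
           (0.9, -0.1, -1.2), (1.0, 0.1, -0.2), (0.0, 1.2, -1.0)]
print(fam_rules_from_samples(samples))
# {('ZE', 'ZE'): 'ZE', ('PS', 'ZE'): 'NS', ('ZE', 'PS'): 'NS'}
```

Only 4 of the 125 cells are visited here, which illustrates why most systems yield only a few FAM rules.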

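FUZZY SYSTEMS AS PARALLEL ASSOCIATORS

Fuzzy systems store and process FAM rules in parallel. The output fuzzy set B is the weighted sum of the outputs $B_j$ of the individual FAM rules:

$$B = \sum_j w_j B_j$$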

Adaptive Fuzzy Systems use sample data and neural / statistical algorithms to choose the coefficients.
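As a sketch of this parallel, additive style of inference, the controller below fires every rule of the pendulum FAM table at once. Each rule's weight w_j is its firing degree, the minimum of its two antecedent memberships. B_j is the rule's output set scaled by w_j. The triangular sets with half-width 1 and centers -2, ..., 2 are hypothetical choices. Because all output sets then have the same shape, the centroid of B = Σ w_j B_j reduces to the weighted average of the output-set centers.

```python
CENTER = {"NM": -2.0, "NS": -1.0, "ZE": 0.0, "PS": 1.0, "PM": 2.0}  # hypothetical

# FAM bank from the pendulum table: (theta set, dtheta set) -> v set.
FAM = {("NM", "ZE"): "PM", ("NS", "ZE"): "PS", ("PS", "ZE"): "NS",
       ("PM", "ZE"): "NM", ("ZE", "NM"): "PM", ("ZE", "NS"): "PS",
       ("ZE", "ZE"): "ZE", ("ZE", "PS"): "NS", ("ZE", "PM"): "NM"}

def mu(label, x):
    """Triangular membership of x in `label` (half-width 1, hypothetical)."""
    return max(0.0, 1.0 - abs(x - CENTER[label]))

def control(theta, dtheta):
    """Fire every FAM rule in parallel and defuzzify B = sum_j w_j B_j.

    w_j is the rule's firing degree (min of its antecedent memberships).
    With equal-shaped output sets, the centroid of B is the w-weighted
    average of the output-set centers.
    """
    num = den = 0.0
    for (t_set, dt_set), v_set in FAM.items():
        w = min(mu(t_set, theta), mu(dt_set, dtheta))
        num += w * CENTER[v_set]
        den += w
    return num / den if den > 0 else 0.0

print(control(0.0, 0.0))   # 0.0: upright and still, no corrective current
print(control(0.5, 0.0))   # -0.5: leaning in the positive direction, push back
```

Adaptive versions would tune the weights w_j (or the set shapes) from sample data instead of fixing them by hand.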

If the inputs of a fuzzy system are points in the unit hypercube $I^n$ and its outputs are points in the unit hypercube $I^p$, then the transformation $S: I^n \to I^p$ defines a fuzzy system.

S defines an adaptive fuzzy system if it changes with time, i.e. $dS/dt \neq 0$.

FUZZY SYSTEMS AS PRINCIPLE BASED SYSTEMS

AI expert systems work through rules. Inference is performed by traversing the decision tree. The tree can be shallow or deep, e.g. shallow for chess and deep for the water jug. A shallow tree uses only a small proportion of the stored knowledge in each inference; in that sense, such systems are non-interactive. Fuzzy systems are shallow but interactive: every inference fires every FAM rule to some degree.

AI expert systems use a rule-based approach, but fuzzy systems use a principle-based approach, e.g. an AI judge vs. a fuzzy judge. Rules apply in an “all-or-none” fashion, whereas principles have a dimension of weight or importance. Principles evolve, but rules are static. Adaptive fuzzy systems use neural techniques to abstract fuzzy principles from samples. This is similar to our acquisition of knowledge.

NEURONS AS FUNCTIONS

Neurons transform an activation x(t) into a bounded output signal S(x(t)). Usually a sigmoid function is used for this purpose.

EFFECT OF SIGNAL FUNCTION

Where Θ = $w_{n+1}$.

Usually signal functions are monotone non-decreasing, i.e. $S' = dS/dx \ge 0$. By the chain rule, $dS/dt = (dS/dx)(dx/dt)$, i.e. the signal velocity depends on the activation velocity.

CHAP 4: NEURAL NETWORK THEORY


NEURON FIELDS

A field of neurons is a topological grouping, e.g. by closeness or proximity. In humans, volume proximity defines a field. We denote fields of neurons as Fx, Fy, Fz, etc.

NEURONAL STATE SPACE

Consider a network with only two fields: the input field Fx of dimension n and the output field Fy of dimension p. Then the state of the system is given by:

o X(t) = (x1(t), x2(t), …, xn(t))
o Y(t) = (y1(t), y2(t), …, yp(t))

Thus the state space of X(t) is $R^n$, the state space of Y(t) is $R^p$, and the state space of the neural network is $R^n \times R^p$.

SIGNAL STATE SPACE AS HYPERCUBES

The signal state S(X) of field Fx is given by:

$$S(X(t)) = \left(S_1^X(x_1(t)),\ S_2^X(x_2(t)),\ \ldots,\ S_n^X(x_n(t))\right)$$

where $S_i^X$ denotes the signal function of the i-th neuron in the field Fx.

The signal state space consists of all possible signal states S(X). Since signal functions are bounded, the signal state space is an n-dimensional hypercube. If the range of each signal function is [0, 1], then the signal state space is $I^n = [0, 1]^n$.

The unit hypercube $I^n$ also defines the fuzzy power set $F(2^X)$ of a set X of n elements.

COMMON SIGNAL FUNCTIONS

In each case below, c > 0 is a constant.

Logistic signal function

o $S(x) = 1 / (1 + e^{-cx})$
o $S' = c S (1 - S)$
o S' > 0, so S is monotonic increasing.

Hyperbolic-tangent signal function

o $S(x) = \tanh(cx)$
o $S' = c (1 - S^2)$
o S' > 0, so S is monotonic increasing.

Threshold linear signal function

o It is a linear function clipped to the range [0, 1]:
o S(x) = 1, if cx >= 1
o S(x) = 0, if cx < 0
o S(x) = cx, otherwise
o S' = c > 0 between the two thresholds (and S' = 0 outside them), so S is monotonic non-decreasing.

Linear signal function

o S(x) = cx
o S' = c > 0, so S is monotonic increasing.

Threshold exponential signal function

o $S(x) = \min(1, e^{cx})$
o $S' = c e^{cx} > 0$, if $e^{cx} < 1$ (and S' = 0 once S reaches 1), so S is monotonic non-decreasing.

Exponential distribution signal function

o $S(x) = \max(0, 1 - e^{-cx})$
o $S' = c e^{-cx} > 0$, if x > 0, so S is monotonic increasing for x > 0.
o And $S'' = -c^2 e^{-cx} < 0$, so S is strictly concave for x > 0.

Ratio polynomial signal function

o $S(x) = \max\left(0, \frac{x^n}{c + x^n}\right)$
o $S' = \frac{c\, n\, x^{n-1}}{(c + x^n)^2} > 0$, for x > 0, so S is monotonic increasing.
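The monotonicity claims above can be spot-checked numerically. The sketch below implements each signal function with a hypothetical gain c = 2 and checks that it never decreases on a grid over [-3, 3]. It also confirms the logistic derivative identity S' = cS(1 - S) with a central difference, and the strict concavity of the exponential distribution function with a midpoint test. For the ratio polynomial it assumes n = 2 and S(x) = 0 for x <= 0, where the formula above is not used.

```python
import math

C = 2.0  # hypothetical gain constant c > 0 used by every function below

def logistic(x, c=C):
    return 1.0 / (1.0 + math.exp(-c * x))

def tanh_signal(x, c=C):
    return math.tanh(c * x)

def threshold_linear(x, c=C):
    return min(1.0, max(0.0, c * x))        # cx clipped to [0, 1]

def threshold_exponential(x, c=C):
    return min(1.0, math.exp(c * x))

def exponential_distribution(x, c=C):
    return max(0.0, 1.0 - math.exp(-c * x))

def ratio_polynomial(x, c=C, n=2):
    # Assumption: S(x) = 0 for x <= 0.
    return x ** n / (c + x ** n) if x > 0 else 0.0

def is_non_decreasing(f, lo=-3.0, hi=3.0, steps=600):
    """Numerically check f(x_k) <= f(x_{k+1}) on an evenly spaced grid."""
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    ys = [f(x) for x in xs]
    return all(y0 <= y1 for y0, y1 in zip(ys, ys[1:]))

SIGNALS = [logistic, tanh_signal, threshold_linear,
           threshold_exponential, exponential_distribution, ratio_polynomial]
for f in SIGNALS:
    print(f.__name__, is_non_decreasing(f))   # each prints True

# Logistic identity S' = c S (1 - S), checked with a central difference.
x0, h = 0.3, 1e-6
numeric = (logistic(x0 + h) - logistic(x0 - h)) / (2 * h)
print(abs(numeric - C * logistic(x0) * (1 - logistic(x0))) < 1e-6)   # True

# Strict concavity for x > 0: the midpoint value lies above the chord.
a, b = 0.5, 1.5
mid = exponential_distribution((a + b) / 2)
chord = (exponential_distribution(a) + exponential_distribution(b)) / 2
print(mid > chord)   # True
```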

CHAP 5: A GENTLE INTRODUCTION TO GENETIC ALGORITHMS

Genetic Algorithms are search and optimization techniques based on Darwin’s Principle of Natural Selection.

DARWIN’S PRINCIPLE OF NATURAL SELECTION

IF there are organisms that reproduce, and
IF offspring inherit traits from their progenitors, and
IF there is variability of traits, and
IF the environment cannot support all members of a growing population,
THEN those members of the population with less-adaptive traits (determined by the environment) will die out, and
THEN those members with more-adaptive traits (determined by the environment) will thrive.

The result is the evolution of species.

EVOLUTION

The context of evolution is a population (of organisms, objects, agents ...) that survives for a limited time (usually) and then dies. Some produce offspring for succeeding generations, and the ‘fitter’ ones tend to produce more. Over many generations, the make-up of the population changes. Without the need for any individual to change, over successive generations the ‘species’ changes and, in some sense, (usually) adapts to the conditions.

REQUIREMENTS

Heredity
o Offspring are (roughly) identical to their parents.

Variability
o Except not exactly the same: there is some significant variation.

Selection
o The ‘fitter’ ones are likely to have more offspring.

Variability is usually random and undirected. Selection is usually non-random and directed. In natural evolution the ‘direction’ of selection does not imply a conscious director. In artificial evolution, often we are the director.

BASIC IDEA OF PRINCIPLE OF NATURAL SELECTION “Select The Best, Discard The Rest”.

AN EXAMPLE OF NATURAL SELECTION

Giraffes have long necks.

o Giraffes with slightly longer necks could feed on leaves of higher branches when all lower ones had been eaten off. They had a better chance of survival. The favorable characteristic propagated through generations of giraffes. Now the evolved species has long necks.
o Longer necks may have been a deviant characteristic (mutation) initially, but since the trait was favorable, it was propagated over generations.
o Now it is an established trait.
o So, some mutations are beneficial.

NATURE TO COMPUTER MAPPING

Nature          Computer
Population      Set of solutions
Individual      Solution to a problem
Fitness         Quality of a solution
Chromosome      Encoding of a solution
Gene            Part of the encoding of a solution
Reproduction    Crossover, mutation

EVOLUTION THROUGH NATURAL SELECTION

Initial population of animals
-> Struggle for existence: survival of the fittest
-> Surviving individuals reproduce and propagate favorable characteristics
-> (over millions of years) Evolved species, in which the favorable characteristics are now a trait of the species

CLASSES OF SEARCH TECHNIQUES

o Calculus-based techniques
  o Direct methods, e.g. Fibonacci
  o Indirect methods, e.g. Newton
o Guided random techniques
  o Evolutionary algorithms
    o Evolutionary strategies
    o Genetic algorithms: parallel (centralized or distributed) or sequential (steady-state or generational)
  o Simulated annealing
o Enumerative techniques
  o Dynamic programming

SEARCH METHODS

Blind random search does not use acquired information in deciding on the future direction of the search. Hill climbing and gradient descent use acquired information; however, they are prone to becoming trapped on local optima.

THE GENETIC ALGORITHM

o Directed search algorithms based on the mechanics of biological evolution.
o Developed by John Holland, University of Michigan (1970s):
  o To understand the adaptive processes of natural systems.
  o To design artificial systems software that retains the robustness of natural systems.
o Provide efficient, effective techniques for optimization and machine learning applications.
o Widely used today in business, scientific and engineering circles.

GENETIC ALGORITHMS VS. CONVENTIONAL OPTIMIZATION TECHNIQUES

Genetic algorithms differ from conventional optimization techniques in several ways:

o Direct manipulation of a coding.
o Search from a population, not from a single point.
o Search via blind search.
o Search using probabilistic, not deterministic, rules.
o Does not use a knowledge base.
o Biologically inspired.

COMPONENTS OF A GENETIC ALGORITHM

A problem to solve, and ...

o Encoding technique (gene, chromosome)
o Initialization procedure (creation)
o Evaluation function (environment)
o Selection of parents (reproduction)
o Genetic operators (mutation, recombination / reproduction)
o Parameter settings (practice and art)

WORKING MECHANISM OF GENETIC ALGORITHMS

[Figure: flowchart. Begin -> Initialize population -> Evaluate solution -> Optimum solution? If Y: Stop. If N: Selection -> Crossover -> Mutation -> back to Evaluate solution. A generation counter starts at T = 0 and is updated by T = T + 1 on each pass.]

SIMPLE GENETIC ALGORITHM

Simple_Genetic_Algorithm()
{
    Initialize the Population;
    Calculate Fitness Function;

    While (Fitness Value != Optimal Value)
    {
        Selection;   // Natural Selection, Survival Of Fittest
        Crossover;   // Reproduction, Propagate favorable characteristics
        Mutation;    // Mutation

        Calculate Fitness Function;
    }
}
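A runnable version of the loop above is sketched below for a hypothetical toy problem, OneMax, where the fitness is the number of 1 bits. It uses fitness-proportionate selection, single-point crossover and bit-flip mutation, and replaces the whole population each generation. The population size, chromosome length and mutation rate are illustrative choices. A MAX_GENERATIONS cap is added because the pseudocode's while loop has no guaranteed exit.

```python
import random

N_BITS = 12                  # chromosome length (hypothetical)
POP_SIZE = 20                # population size (hypothetical)
P_MUTATION = 1.0 / N_BITS    # per-bit mutation probability
MAX_GENERATIONS = 200        # safety cap on the while loop

def fitness(chrom):
    """OneMax toy problem: fitness = number of 1 bits (optimum = N_BITS)."""
    return sum(chrom)

def select(population):
    """Fitness-proportionate (roulette-wheel) choice of one parent."""
    weights = [fitness(c) + 1 for c in population]   # +1 avoids all-zero weights
    return random.choices(population, weights=weights, k=1)[0]

def crossover(p1, p2):
    """Single-point crossover producing two children."""
    point = random.randint(1, N_BITS - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chrom):
    """Flip each bit independently with probability P_MUTATION."""
    return [1 - g if random.random() < P_MUTATION else g for g in chrom]

def simple_ga(seed=0):
    random.seed(seed)
    population = [[random.randint(0, 1) for _ in range(N_BITS)]
                  for _ in range(POP_SIZE)]                 # initialize
    t = 0
    best = max(population, key=fitness)                     # evaluate
    while fitness(best) != N_BITS and t < MAX_GENERATIONS:
        children = []
        while len(children) < POP_SIZE:
            c1, c2 = crossover(select(population), select(population))
            children += [mutate(c1), mutate(c2)]
        population = children[:POP_SIZE]                    # generational
        best = max(population, key=fitness)                 # re-evaluate
        t += 1
    return best, t

best, generations = simple_ga()
print(best, fitness(best), generations)
```

This sketch has no elitism, so the best chromosome of one generation can be lost in the next. Many implementations copy the best individual forward unchanged to avoid this.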

BASIC GENETIC ALGORITHM

[Figure: cycle. Current population -> Evaluate all fitness -> Select fitter for parents -> Produce offspring from parents -> new current population.]

THE GENETIC ALGORITHM CYCLE OF REPRODUCTION

[Figure: parents taken from the population undergo reproduction to give children; modification turns them into modified children; evaluation gives evaluated children, which join the population; discarded members leave the population as deleted members.]

POPULATION

Chromosomes could be:

o Bit strings (0101 ... 1100)
o Real numbers (43.2 -33.1 ... 0.0 89.2)
o Permutations of elements (E11 E3 E7 ... E1 E15)
o Lists of rules (R1 R2 R3 ... R22 R23)
o Program elements (genetic programming)
o ... any data structure ...

REPRODUCTION

Parents are selected at random with selection chances biased in relation to chromosome evaluations.

CHROMOSOME MODIFICATION

Modifications are stochastically triggered. Operator types are:

o Mutation
o Crossover
o Reproduction (recombination)

EVALUATION

The evaluator decodes a chromosome and assigns it a fitness measure. The evaluator is the only link between a classical Genetic Algorithm and the problem it is solving.

[Figure: the evaluation step of the reproduction cycle: modified children are evaluated and returned to the population.]

DELETION

Generational Genetic Algorithm:
o The entire population is replaced with each iteration.

Steady-state Genetic Algorithm:
o A few members are replaced each generation.
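The two deletion policies can be contrasted in a few lines. This is a minimal sketch with hypothetical function names; it assumes the steady-state version deletes the least fit members, which is one common choice (the notes do not say which members are replaced).

```python
from typing import Callable, List


def generational(population: List[str], offspring: List[str]) -> List[str]:
    """Generational GA: the entire population is replaced by the offspring."""
    return list(offspring)


def steady_state(population: List[str], offspring: List[str],
                 fitness: Callable[[str], float]) -> List[str]:
    """Steady-state GA: only len(offspring) members are replaced.

    Assumption: the members deleted are the least fit ones."""
    keep = max(0, len(population) - len(offspring))
    survivors = sorted(population, key=fitness, reverse=True)[:keep]
    return survivors + list(offspring)


f = lambda s: int(s, 2) ** 2          # illustrative fitness f(x) = x^2
print(steady_state(["00001", "11111", "00011"], ["10000"], f))
```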

MUTATION: LOCAL MODIFICATION

Causes movement in the search space (local or global). Restores lost information to the population.

CROSSOVER

Crossover is a critical feature of genetic algorithms:
o It greatly accelerates search early in the evolution of a population.
o It leads to effective combination of schemata (sub-solutions on different chromosomes).

AN ABSTRACT EXAMPLE

[Figure: distribution of individuals in generation 0.]

Mutation (illustration):
Before: (1 0 1 1 0 1 1 0)
After:  (1 0 1 0 0 1 1 0)
Before: (1.38 -69.4 326.44 0.1)
After:  (1.38 -67.5 326.44 0.1)

Crossover (illustration):
P1 (0 1 1 0 1 0 0 0)    P2 (1 1 0 1 1 0 1 0)
C1 (0 1 0 0 1 0 0 0)    C2 (1 1 1 1 1 0 1 0)

[Figure: distribution of individuals in generation N.]

A SIMPLE EXAMPLE

GENETIC ALGORITHM OPERATORS AND PARAMETERS

Encoding

o The process of representing the solution in the form of a string that conveys the necessary information.

o Just as in a chromosome, each gene controls a particular characteristic of the individual; similarly, each bit in the string represents a characteristic of the solution.

Encoding Methods

o Binary Encoding
Most common method of encoding. Chromosomes are strings of 1s and 0s, and each position in the chromosome represents a particular characteristic of the problem.

Chromosome A 10110010110011100101
Chromosome B 11111110000000011111

o Permutation Encoding
Useful in ordering problems such as the Traveling Salesman Problem (TSP). For example, in TSP every chromosome is a string of numbers, each of which represents a city to be visited.

Chromosome A 1 5 3 2 6 4 7 9 8
Chromosome B 8 5 6 7 2 3 1 4 9

o Value Encoding
Used in problems where complicated values, such as real numbers, are used and where binary encoding would not suffice. Good for some problems, but it is often necessary to develop specific crossover and mutation techniques for these chromosomes.

Chromosome A 1.235 5.323 0.454 2.321 2.454
Chromosome B (left), (back), (left), (right), (forward)
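With binary encoding, a chromosome must be decoded before it can be evaluated. A minimal sketch, assuming the bits are read as an unsigned integer and mapped linearly onto a real interval [lo, hi] (one common convention; the name decode is hypothetical):

```python
def decode(chrom: str, lo: float, hi: float) -> float:
    """Map a bit string linearly onto [lo, hi]: 00...0 -> lo, 11...1 -> hi."""
    k = int(chrom, 2)                # unsigned integer value of the bits
    max_k = 2 ** len(chrom) - 1      # value of the all-ones string
    return lo + (hi - lo) * k / max_k


# With 5 bits on [0, 31] every integer is represented exactly
print(decode("01101", 0, 31))   # 13.0
print(decode("11111", -1, 1))   # 1.0
```

The number of bits fixes the resolution: l bits give 2^l evenly spaced values across the interval.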

FITNESS FUNCTION A fitness function quantifies the optimality of a solution (chromosome) so that that particular solution may be ranked against all the other solutions. A fitness value is assigned to each solution depending on how close it actually is to solving the problem. An ideal fitness function correlates closely with the goal and is quickly computable. Example: in the TSP, f(x) is the sum of the distances between successive cities in the tour. The smaller the value, the fitter the solution.
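A sketch of the TSP measure described above, assuming cities are points in the plane, distances are Euclidean and the tour returns to its starting city (the coordinates and the name tour_length are hypothetical):

```python
import math
from typing import Dict, List, Tuple


def tour_length(tour: List[int], coords: Dict[int, Tuple[float, float]]) -> float:
    """Sum of distances between successive cities, returning to the start."""
    total = 0.0
    for i in range(len(tour)):
        a = coords[tour[i]]
        b = coords[tour[(i + 1) % len(tour)]]   # wrap around to the first city
        total += math.dist(a, b)
    return total


square = {1: (0.0, 0.0), 2: (0.0, 1.0), 3: (1.0, 1.0), 4: (1.0, 0.0)}
print(tour_length([1, 2, 3, 4], square))   # 4.0 (around the edge)
print(tour_length([1, 3, 2, 4], square))   # longer: crosses both diagonals
```

Since a smaller tour is fitter, this value has to be turned into something to maximize (or minimized directly) before it drives selection.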

ROULETTE WHEEL SELECTION Each current string in the population has a slot assigned to it whose size is proportional to its fitness. We spin the weighted roulette wheel thus defined n times (where n is the total number of solutions). Each time the roulette wheel stops, a copy of the string corresponding to that slot is placed in the new population. Strings that are fitter are assigned a larger slot and hence have a better chance of appearing in the new population.

EXAMPLE OF ROULETTE WHEEL SELECTION

No.   String   Fitness   % of Total
1     01101      169       14.4
2     11000      576       49.2
3     01000       64        5.5
4     10011      361       30.9
      Total     1170      100.0
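The percentages in the table are each string's share of the total fitness, i.e. the size of its slot on the wheel. A short sketch that recomputes them (the name slot_shares is hypothetical) and also prints n times each share, the average number of copies a string receives from n spins:

```python
def slot_shares(fitness):
    """Fraction of the wheel given to each string: f_i / sum(f)."""
    total = sum(fitness)
    return [f / total for f in fitness]


strings = ["01101", "11000", "01000", "10011"]
fitness = [int(s, 2) ** 2 for s in strings]    # f(x) = x^2 -> 169, 576, 64, 361
shares = slot_shares(fitness)
n = len(strings)
for s, f, p in zip(strings, fitness, shares):
    # n spins of the wheel give each string n * p copies on average
    print(s, f, f"{100 * p:.1f}%", f"expected copies {n * p:.2f}")
```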

CROSSOVER It is the process in which two chromosomes (strings) combine their genetic material (bits) to produce new offspring that possess characteristics of both. Two strings are picked from the mating pool at random to cross over. The method chosen depends on the encoding method.

CROSSOVER METHODS


Single Point Crossover
o A random point is chosen on the individual chromosomes (strings) and the genetic material is exchanged at this point.

Chromosome 1   11011 | 00100110110
Chromosome 2   11011 | 11000011110

Offspring 1    11011 | 11000011110
Offspring 2    11011 | 00100110110

Two Point Crossover
o Two random points are chosen on the individual chromosomes (strings) and the genetic material is exchanged at these points.

Chromosome 1   11011 | 00100 | 110110
Chromosome 2   10101 | 11000 | 011110

Offspring 1    10101 | 00100 | 011110
Offspring 2    11011 | 11000 | 110110

NOTE: These chromosomes are different from the last example.
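Single-point and two-point crossover are the same operation with one or two cut points. A minimal sketch (the name n_point_crossover is hypothetical) that reproduces both examples above; it returns the two offspring in the order (head of parent 1, head of parent 2), which may differ from the numbering used in the examples:

```python
from typing import List, Tuple


def n_point_crossover(p1: str, p2: str, points: List[int]) -> Tuple[str, str]:
    """Swap every second segment between the cut points.

    A cut at k falls after the k-th gene (1 <= k <= len - 1).
    One point gives single-point crossover, two points give two-point crossover."""
    cuts = [0] + sorted(points) + [len(p1)]
    c1, c2 = [], []
    for i in range(len(cuts) - 1):
        a, b = p1[cuts[i]:cuts[i + 1]], p2[cuts[i]:cuts[i + 1]]
        if i % 2 == 1:          # odd-numbered segments are exchanged
            a, b = b, a
        c1.append(a)
        c2.append(b)
    return "".join(c1), "".join(c2)


print(n_point_crossover("1101100100110110", "1101111000011110", [5]))
print(n_point_crossover("1101100100110110", "1010111000011110", [5, 10]))
```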

Uniform Crossover
o Each gene (bit) is selected randomly from one of the corresponding genes of the parent chromosomes.

Chromosome 1   11011 | 00100 | 110110
Chromosome 2   10101 | 11000 | 011110

Offspring      10111 | 00000 | 110110

NOTE: Uniform Crossover yields only 1 offspring.
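Uniform crossover can be driven by a mask: where the mask has 1 the gene comes from the first parent, otherwise from the second. A minimal sketch (hypothetical names; a random mask is drawn when none is given):

```python
import random
from typing import Optional


def uniform_crossover(p1: str, p2: str, mask: Optional[str] = None,
                      rng: Optional[random.Random] = None) -> str:
    """One offspring: gene i comes from p1 where mask[i] == '1', else from p2."""
    if mask is None:
        rng = rng or random.Random()
        mask = "".join(rng.choice("01") for _ in p1)   # each parent equally likely
    return "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))


print(uniform_crossover("1101100100110110", "1010111000011110", "1001111011111111"))
# 1011100000110110, the offspring shown above
```

The mask 1001111011111111 is only one of several masks that give this offspring, since the parents agree at several positions.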

Crossover between two good solutions may not always yield a better, or even an equally good, solution. However, since the parents are good, the probability of the child being good is high. If an offspring is not good (a poor solution), it will be removed in the next iteration during “Selection”.


ELITISM When creating a new population by crossover or mutation, the best chromosome might be lost. Elitism is a method that copies the best chromosome (or a few best chromosomes) to the new offspring population before crossover and mutation, forcing the Genetic Algorithm to retain some number of the best individuals at each generation. Elitism has been found to improve performance significantly.
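A minimal elitism sketch with a hypothetical name. It assumes the k best members of the old population bypass crossover and mutation and take the places of the k weakest new members, so the population size stays constant.

```python
from typing import Callable, List


def apply_elitism(old_pop: List[str], new_pop: List[str],
                  fitness: Callable[[str], float], k: int = 1) -> List[str]:
    """Carry the k best members of old_pop unchanged into the next generation."""
    elites = sorted(old_pop, key=fitness, reverse=True)[:k]
    rest = sorted(new_pop, key=fitness, reverse=True)[:len(new_pop) - k]
    return elites + rest


f = lambda s: int(s, 2) ** 2          # illustrative fitness f(x) = x^2
print(apply_elitism(["11111", "00001"], ["00010", "00100"], f))   # ['11111', '00100']
```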

MUTATION It is the process by which a string is deliberately changed so as to maintain diversity in the population set. We saw in the giraffes’ example that mutations could be beneficial.

Mutation Probability
o Determines how often the parts of a chromosome will be mutated.
o Mutation is applied after an offspring has been produced from two parents (in a sexual Genetic Algorithm) or from one parent (in an asexual Genetic Algorithm).
o Mutate at randomly chosen loci with some probability, where a locus is a position on the genotype.
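Bit-flip mutation with a per-locus probability pm can be sketched as follows (hypothetical name; the caller supplies the random generator so runs can be repeated):

```python
import random


def mutate(chrom: str, pm: float, rng: random.Random) -> str:
    """Visit every locus and flip its bit with probability pm."""
    out = []
    for bit in chrom:
        if rng.random() < pm:
            bit = "1" if bit == "0" else "0"
        out.append(bit)
    return "".join(out)


rng = random.Random(0)
print(mutate("10110110", 0.1, rng))   # usually identical or one bit different
```

With a string of length l, the expected number of flipped bits is l * pm, so small values of pm change few loci per generation.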

A SIMPLE OPTIMIZATION EXAMPLE

Optimization of $f(x) = x^2$, with $x \in [0, 31]$.

Problem representation:
o Encoding of the variable x as a binary vector.
o [0, 31] → [00000, 11111]

A GENETIC ALGORITHM BY HAND

String  Initial     x      Fitness      % of Total  Selection
No.     Population  Value  f(x) = x^2   Fitness     Probability
1       01101       13     169          14.4        0.144
2       11000       24     576          49.2        0.492
3       01000        8      64           5.5        0.055
4       10011       19     361          30.9        0.309

After      Mate  Crossover  Mutation  New         Fitness
Selection        Point                Population  f(x) = x^2
0110|1     2     4          -         01100       144
1100|0     1     4          3         11101       841
11|000     4     2          -         11011       729
10|011     3     2          -         10000       256
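The hand calculation can be replayed in a few lines of Python. The mating pairs, crossover sites and the single mutation are copied from the table rather than drawn at random; the mutation at "bit 3" is read as the third bit from the left, which is the reading that turns 11001 into 11101. The function names are hypothetical.

```python
def f(s: str) -> int:
    """Fitness f(x) = x^2 of a 5-bit string."""
    return int(s, 2) ** 2


def ga_by_hand():
    pool = ["01101", "11000", "11000", "10011"]   # strings after selection
    matings = [(0, 1, 4), (2, 3, 2)]               # (string, mate, crossover site), 0-based
    new_pop = list(pool)
    for i, j, site in matings:
        new_pop[i] = pool[i][:site] + pool[j][site:]
        new_pop[j] = pool[j][:site] + pool[i][site:]
    # Mutation at bit 3 (counted from the left) of the second offspring
    bits = list(new_pop[1])
    bits[2] = "1" if bits[2] == "0" else "0"
    new_pop[1] = "".join(bits)
    return new_pop, [f(s) for s in new_pop]


initial = ["01101", "11000", "01000", "10011"]
total = sum(f(s) for s in initial)                 # 1170
print([round(f(s) / total, 3) for s in initial])   # [0.144, 0.492, 0.055, 0.309]
print(ga_by_hand())   # (['01100', '11101', '11011', '10000'], [144, 841, 729, 256])
```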


BUILDING BLOCKS (SCHEMAS) How can we characterize the evolution of a population in a Genetic Algorithm? Goal:
o Identify the basic building blocks of Genetic Algorithms.
o Describe families of individuals.
o Consider (11101) with fitness 841 and (11011) with fitness 729, the two fittest strings in the new population.
o Both share the structure (11***), where * can be 1 or 0; this common structure is very powerful.

SCHEMA: DEFINITION A schema is a string over the alphabet {0, 1, *}, where * means “don’t care”. A typical schema is 10**0* for strings of length 6; instances of this schema include 101101, 100000, …. There are 3^6 = 729 schemata of length 6 and, in general, 3^l schemata for strings of length l. Highly fit schemata of short defining length could dominate the evolutionary process, with crossover combining them to create fitter ones. Mutation has little effect; it is an insurance policy against the loss of genetic material.
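Schema membership is a position-by-position check. A small sketch with hypothetical helper names that tests membership and enumerates the instances of a schema:

```python
from itertools import product


def matches(schema: str, s: str) -> bool:
    """s is an instance of the schema if it agrees at every fixed (non-*) position."""
    return len(schema) == len(s) and all(h in ("*", c) for h, c in zip(schema, s))


def instances(schema: str):
    """Yield every string represented by the schema (each * replaced by 0 or 1)."""
    for bits in product("01", repeat=schema.count("*")):
        it = iter(bits)
        yield "".join(next(it) if h == "*" else h for h in schema)


print(matches("10**0*", "101101"), matches("10**0*", "111101"))   # True False
print(len(list(instances("10**0*"))), 3 ** 6)                       # 8 729
```

A schema with k stars has 2^k instances, which is why schemata with many fixed positions describe small families of strings.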

BENEFITS OF GENETIC ALGORITHMS
o Concept is easy to understand.
o Modular, separate from application.
o Supports multi-objective optimization.
o Good for “noisy” environments.
o Always an answer; answer gets better with time.
o Inherently parallel; easily distributed.
o Many ways to speed up and improve a Genetic Algorithm-based application as knowledge about the problem domain is gained.
o Easy to exploit previous or alternate solutions.
o Flexible building blocks for hybrid applications.
o Substantial history and range of use.

WHEN TO USE A GENETIC ALGORITHM
o Alternate solutions are too slow or overly complicated.
o Need an exploratory tool to examine new approaches.
o Problem is similar to one that has already been successfully solved by using a Genetic Algorithm.
o Want to hybridize with an existing solution.
o Benefits of the Genetic Algorithm technology meet key problem requirements.

MAIN DIFFICULTIES OF GENETIC ALGORITHMS Adjustment of the Genetic Algorithm control parameters:
o Population Size
o Crossover Probability
o Mutation Probability

Specification of the termination condition. Representation of the problem solutions.


SOME GENETIC ALGORITHM APPLICATION TYPES

Domain                      Application Types
Control                     Gas Pipeline, Pole Balancing, Missile Evasion, Pursuit
Design                      Semiconductor Layout, Aircraft Design, Keyboard Configuration, Communication Networks
Scheduling                  Manufacturing, Facility Scheduling, Resource Allocation
Robotics                    Trajectory Planning
Machine Learning            Designing Neural Networks, Improving Classification Algorithms, Classifier Systems
Signal Processing           Filter Design
Game Playing                Poker, Checkers, Prisoner’s Dilemma
Combinatorial Optimization  Set Covering, Traveling Salesman, Routing, Bin Packing, Graph Coloring, Partitioning

6 : GENETIC ALGORITHMS REVISITED: MATHEMATICAL FOUNDATIONS

SCHEMA Consider strings constructed over the binary alphabet V = {0, 1}. A string can then be represented as:

o A = 0101101 = a1 a2 a3 … a7, where each ai is called a gene.

Each gene can take the value 1 or 0; we call these values of ai alleles. Consider a population of strings A(t) at time t, and a schema H taken from the three-letter alphabet V+ = {0, 1, *}. E.g. the string (*11*0*1) is a representation of a schema H.

SCHEMATA: PROPERTIES Not all schemata are equal. They differ in two measures:

o Order: the order of a schema H, denoted by O(H), is the number of fixed positions present in the template of the schema. E.g. O(1*11*0*) = 4.

o Defining length: the defining length of a schema H, denoted by δ(H), is the distance between the first and the last fixed string positions. E.g. δ(1*11*0*) = 6 − 1 = 5 and δ(**1****) = 0.
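Both measures follow directly from the positions of the fixed symbols. A small sketch (hypothetical function names) that reproduces the examples above:

```python
def order(schema: str) -> int:
    """O(H): number of fixed (non-*) positions."""
    return sum(1 for h in schema if h != "*")


def defining_length(schema: str) -> int:
    """delta(H): distance between the first and last fixed positions (0 if fewer than two)."""
    fixed = [i for i, h in enumerate(schema) if h != "*"]
    return fixed[-1] - fixed[0] if fixed else 0


print(order("1*11*0*"), defining_length("1*11*0*"))   # 4 5
print(order("**1****"), defining_length("**1****"))   # 1 0
```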

SCHEMA DIFFERENCE EQUATION Suppose at a given time t there are m examples of a particular schema H in the population A(t), denoted m = m(H, t). During reproduction, a string is selected with probability p_i = f_i / Σ f_j, where f_i is its fitness. Suppose a completely new generation of n strings is created from the population by reproduction. Then the expected number of instances of H at time t+1 is

$$m(H, t+1) = m(H, t)\,\frac{f(H)}{f'}, \qquad f' = \frac{\sum_j f_j}{n}$$

where f(H) is the average fitness of the strings representing schema H at time t, and f’ is the average fitness of the population. Thus a schema grows or decays according to the ratio of its average fitness to the average fitness of the population: schemata with fitness above the population average receive an increasing number of samples in the next generation.

If a particular schema H remains above average by an amount cf’ (c a constant), then

$$m(H, t+1) = m(H, t)\,\frac{f' + cf'}{f'} = (1 + c)\,m(H, t)$$

When t = 0, m(H, 1) = (1+c) m(H, 0); when t = 1, m(H, 2) = (1+c)^2 m(H, 0); …. In general,

$$m(H, t) = (1 + c)^t\, m(H, 0)$$

The equation is similar to compound interest (a geometric progression): reproduction allocates exponentially increasing (or decreasing) numbers of instances to above-average (or below-average) schemata in future generations.

EFFECT OF CROSSOVER ON SCHEMATA Consider the string A = (0111100) and two schemata it represents, H1 = (*1****0) and H2 = (***11**). Let the random crossover site be 3, i.e. A = (011|1100), H1 = (*1*|***0) and H2 = (***|11**). Here schema H1 will be destroyed (unless the mate happens to carry the same fixed bits), whereas H2 will survive, because both of its fixed positions lie on the same side of the cut. Since the crossover site is chosen uniformly among the l − 1 = 6 possible sites, and the defining length of H1 is large, H1 has less chance to survive: the probability that H1 is destroyed is δ(H1)/(l − 1) = 5/6.

In general, the probabilities that crossover destroys a schema H and that H survives are

$$p_d = \frac{\delta(H)}{l-1}, \qquad p_s = 1 - p_d.$$

If crossover is itself performed with probability p_c, then

$$p_s \ge 1 - p_c\,\frac{\delta(H)}{l-1}.$$

So the combined effect of reproduction and crossover is

$$m(H, t+1) \ge m(H, t)\,\frac{f(H)}{f'}\left[1 - p_c\,\frac{\delta(H)}{l-1}\right].$$

Thus, schemata with above-average fitness and short defining length will grow exponentially during evolution.
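The 5/6 and 1/6 figures can be checked by listing the crossover sites that fall between a schema's first and last fixed positions, since only those cuts can separate its fixed bits. A small sketch (hypothetical name; a cut at site k falls after position k):

```python
def disrupting_sites(schema: str) -> list:
    """Crossover sites k (1 <= k <= l-1) lying between the first and last fixed positions."""
    fixed = [i + 1 for i, h in enumerate(schema) if h != "*"]   # 1-based positions
    l = len(schema)
    return [k for k in range(1, l) if fixed and fixed[0] <= k < fixed[-1]]


for H in ("*1****0", "***11**"):
    sites = disrupting_sites(H)
    print(H, sites, f"p_d = {len(sites)}/{len(H) - 1}")
# *1****0 [2, 3, 4, 5, 6] p_d = 5/6
# ***11** [4] p_d = 1/6
```

The count of such sites is exactly δ(H), which is where p_d = δ(H)/(l − 1) comes from.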

EFFECT OF MUTATION ON SCHEMA Mutation is the random alteration of a single position with probability p_m. For a schema to survive, each of its fixed positions must survive mutation. The survival probability for a single position is 1 − p_m, so if O(H) is the number of fixed positions in the schema,

$$p(\text{survival of mutation}) = (1 - p_m)^{O(H)} \approx 1 - O(H)\,p_m \qquad \text{for small } p_m.$$

SCHEMA THEOREM So, the combined effect of reproduction, crossover and mutation is

$$m(H, t+1) \ge m(H, t)\,\frac{f(H)}{f'}\left[1 - p_c\,\frac{\delta(H)}{l-1} - O(H)\,p_m\right]$$

This is called the schema theorem, or the Fundamental Theorem of Genetic Algorithms: short, low-order, above-average schemata grow exponentially during evolution.
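The bound is easy to evaluate numerically. A minimal sketch (hypothetical function name), applied to the schema H1 = 1**** from the x^2 example below, with p_c = 1.0 assumed and p_m = 0.001:

```python
def schema_theorem_bound(m, f_H, f_avg, delta, l, o_H, pc, pm):
    """Lower bound on the expected count m(H, t+1) given by the schema theorem."""
    survival = 1 - pc * delta / (l - 1) - o_H * pm
    return m * (f_H / f_avg) * survival


# H1 = 1****: m = 2, f(H1) = 468.5, f' = 293, delta = 0, O(H1) = 1, l = 5
print(round(schema_theorem_bound(2, 468.5, 293, 0, 5, 1, pc=1.0, pm=0.001), 2))   # 3.19
```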

SCHEMA PROCESSING: AN EXAMPLE

String  Initial     x      Fitness      % of Total  Selection
No.     Population  Value  f(x) = x^2   Fitness     Probability
1       01101       13     169          14.4        0.144
2       11000       24     576          49.2        0.492
3       01000        8      64           5.5        0.055
4       10011       19     361          30.9        0.309

After      Mate  Crossover  Mutation  New         Fitness
Selection        Point                Population  f(x) = x^2
0110|1     2     4          -         01100       144
1100|0     1     4          3         11101       841
11|000     4     2          -         11011       729
10|011     3     2          -         10000       256

Reproduction on H1
o Consider three schemata: H1 = 1****, H2 = *10** and H3 = 1***0.
o Strings 2 and 4 are representations of H1, so m(H1, t) = 2.
o After reproduction, there are 3 copies of H1.
o To check the schema theorem: f(H1) = (576 + 361) / 2 = 468.5 and f’ = 1170 / 4 ≈ 293.
o m(H1, t+1) = (f(H1) / f’) * m(H1, t) = (468.5 / 293) * 2 ≈ 3.20, which agrees with the 3 copies observed.

Crossover on H1
o Crossover cannot disrupt H1, since δ(H1) = 0.

Mutation on H1
o If pm = 0.001, the expected number of bits of H1 changed by mutation is m * O(H1) * pm = 3 * 1 * 0.001 = 0.003 ≈ 0, i.e. no bits of the schema are expected to change.
o Thus, we obtain the expected number of schemata as prescribed by the schema theorem.
o A similar analysis applies to H2 and H3.

Two Armed And K – Armed Bandit Problem Schemata of low order, short defining length and above-average fitness receive exponentially increasing trials in future generations. Why should trials be allocated this way? This can be explained using the two-armed bandit problem.

2 Armed Bandit Problem A slot machine has two arms, L and R. The arms pay awards with means μ1 and μ2 and variances σ1^2 and σ2^2, where μ1 > μ2, but we do not know which arm is the better one. We want two things:

o Make a decision about which arm to play.
o Collect information about which is the better arm.

The first is called exploitation and the second is called exploration. The trade-off between the exploration and exploitation of knowledge is a characteristic of adaptive systems. A near-optimal strategy gives an exponentially increasing number of trials to the observed best arm. This is similar to the exponential allocation of trials to better schemata.

COMPETING SCHEMATA

Two schemata A and B with individual positions ai and bi are competing if their fixed positions are in the same places (at every position i = 1, 2, …, l, either ai = bi = *, or ai ≠ * and bi ≠ *) and ai ≠ bi for at least one position i.

For example, consider the following set of 8 schemata:
o *00*0**
o *00*1**
o *01*0**
o *01*1**
o *10*0**
o *10*1**
o *11*0**
o *11*1**

These schemata, with fixed locations 2, 3 and 5, compete to be in the next population (similar to an 8-armed bandit).

There are C(7, 3) = 35 different choices of three fixed locations, each giving 2^3 = 8 competing schemata. In general, for schemata of order j over strings of length l, there are C(l, j) different sets of 2^j competing schemata. Summed over all orders there are Σj C(l, j) = 2^l such competitions, and usually not all of them are played out.
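The competing set for a given choice of fixed positions can be generated mechanically, and the counts above checked with binomial coefficients. A small sketch (hypothetical name competing_set):

```python
from itertools import product
from math import comb


def competing_set(template: str):
    """All schemata with the same fixed positions as template (2**j of them for order j)."""
    pos = [i for i, h in enumerate(template) if h != "*"]
    out = []
    for bits in product("01", repeat=len(pos)):
        s = ["*"] * len(template)
        for p, b in zip(pos, bits):
            s[p] = b
        out.append("".join(s))
    return out


print(competing_set("*00*0**"))                          # the 8 schemata listed above
l, j = 7, 3
print(comb(l, j), 2 ** j)                                 # 35 8
print(sum(comb(l, k) for k in range(l + 1)) == 2 ** l)    # True
```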

NUMBER OF SCHEMATA PROCESSED Consider a population of n binary strings. Many long, high-order schemata are destroyed by crossover and mutation. Still, a Genetic Algorithm processes O(n^3) schemata in each generation. This result, due to Holland, is known as Implicit Parallelism.

BUILDING BLOCK HYPOTHESIS

Schemata are building blocks. E.g. maximize the function f(x) = x^2 on [0, 31]. Let H1 = 1****. This order-1 schema corresponds to the right half of the domain, x ≥ 16. The schema H2 = 0**** corresponds to the left half, x < 16. The schema H3 = ****1 corresponds to the odd values of x (1, 3, 5, …), and the schema H4 = ****0 to the even values (0, 2, 4, …). Thus, each order-1 schema covers half of the full space. The schema H5 = 10*** corresponds to the domain between 16 & 24 (16 ≤ x ≤ 23). The schema H6 = **1*1 corresponds to x = 5, 7, 13, 15, … (between 5 & 6, between 7 & 8, between 13 & 14, …).
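The region each schema covers can be listed by decoding every 5-bit value and keeping those that match. A small sketch (hypothetical name covered):

```python
def covered(schema: str):
    """Values x in [0, 2**l - 1] whose l-bit binary code is an instance of the schema."""
    l = len(schema)
    return [x for x in range(2 ** l)
            if all(h in ("*", b) for h, b in zip(schema, format(x, f"0{l}b")))]


print(covered("10***"))   # [16, 17, 18, 19, 20, 21, 22, 23]
print(covered("**1*1"))   # [5, 7, 13, 15, 21, 23, 29, 31]
```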

SCHEMATA: GEOMETRIC REPRESENTATION Consider strings of length 3. All of them can be represented by the vertices of a cube. The order-2 schemata (two fixed bits, e.g. 1*0) can be represented by the edges (lines), and the order-1 schemata (one fixed bit, e.g. 1**) by the faces (planes). In general, schemata are represented by hyperplanes in a hypercube.

GENETIC ALGORITHM HARD AND GENETIC ALGORITHM DECEPTIVE Schemata, or building blocks, lead to a better population. But not all problems can be solved in the Genetic Algorithm way. Problems that are difficult to solve using Genetic Algorithm techniques are called Genetic Algorithm hard problems.

GENETIC ALGORITHM DECEPTIVE Genetic Algorithm hard problems may have difficulties in coding: possible solutions may not be amenable to the genetic functions (operators). Such a coding-function combination for a Genetic Algorithm hard problem is called Genetic Algorithm deceptive.

GENETIC ALGORITHM DECEPTIVE: CHARACTERISTICS Genetic Algorithm deceptive problems tend to have remote, isolated optima, i.e. a best point surrounded by a huge collection of poor points. Finding such an optimum is similar to finding a needle in a haystack. Many techniques, not only Genetic Algorithms, have difficulty in such cases. Consolation: such problems are rare in the real world.

MINIMUM DECEPTIVE PROBLEM (MDP) What is the smallest problem that can be deceptive or misleading? For this, we consider low-order, short schemata that lead to incorrect higher-order schemata. We can show that the 2-bit problem is the smallest MDP.

2-BIT PROBLEM IS MDP Consider four order-2 schemata over two defining positions, with attached fitness values:

o ***0*****0* f00

o ***0*****1* f01

o ***1*****0* f10

o ***1*****1* f11

(The fitness values are schema averages.) Suppose f11 is the global maximum, so f11 > f00, f01, f10. We introduce an element of deception to make the problem Genetic Algorithm hard: we assume that one or both of the sub-optimal order-1 schemata are better than the optimal order-1 schemata, i.e.

$$f(0{*}) > f(1{*}) \iff \frac{f_{00} + f_{01}}{2} > \frac{f_{10} + f_{11}}{2}, \qquad f({*}0) > f({*}1) \iff \frac{f_{00} + f_{10}}{2} > \frac{f_{01} + f_{11}}{2}.$$

Both cannot be true at the same time: adding the two inequalities gives f00 > f11, and then f11 could not be the global optimum. So only one is true; without loss of generality, assume that f(0*) > f(1*) and f(*0) < f(*1). Normalize the fitness values by f00 and label them:

$$r = \frac{f_{11}}{f_{00}}, \quad c = \frac{f_{01}}{f_{00}}, \quad c' = \frac{f_{10}}{f_{00}}, \qquad r > c,\; r > 1,\; r > c'.$$

The deception condition f(0*) > f(1*) in normalized form is

$$r < 1 + c - c'.$$

These results give c’ < 1 and c’ < c. Thus there are two types of deceptive 2-bit problems:

o Type 1: f01 > f00 (c > 1)

o Type 2: f00 ≥ f01 (c ≤ 1)

Both are deceptive, so the 2-bit problem can be deceptive. Similarly, we can prove that a 1-bit problem cannot be deceptive. So the 2-bit problem is the MDP.
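The normalized conditions can be checked mechanically. A minimal sketch (hypothetical name), assuming as above that the optimum is 11, that the deception is on the first bit (f(0*) > f(1*)), and that f00 > 0; the fitness values in the examples are made up for illustration:

```python
def classify_two_bit(f00, f01, f10, f11):
    """Return 'Type 1', 'Type 2' or None using r, c, c' from the text."""
    r, c, c_ = f11 / f00, f01 / f00, f10 / f00
    optimum_at_11 = r > c and r > 1 and r > c_
    deceptive = optimum_at_11 and r < 1 + c - c_     # normalized f(0*) > f(1*)
    if not deceptive:
        return None
    return "Type 1" if c > 1 else "Type 2"


print(classify_two_bit(1.0, 1.05, 0.0, 1.1))   # Type 1
print(classify_two_bit(1.0, 0.9, 0.5, 1.1))    # Type 2
print(classify_two_bit(1.0, 2.0, 3.0, 4.0))    # None (not deceptive)
```

In the Type 2 example, f(0*) = 0.95 exceeds f(1*) = 0.8 even though 11 is the best string, which is exactly the misleading situation described above.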

EXTENDED SCHEMA ANALYSIS OF 2-BIT PROBLEM We have a 2-bit problem that seems misleading. By the schema theorem,

$$m(H, t+1) \ge m(H, t)\,\frac{f(H)}{f'}\left[1 - p_c\,\frac{\delta(H)}{l-1} - O(H)\,p_m\right]$$

When pm = 0, crossover becomes more important, so we take a closer look at crossover. The crossover yield table for the 2-bit problem (s means the crossed pair is unchanged):

×     00      01      10      11
00    s       s       s       01, 10
01    s       s       00, 11  s
10    s       00, 11  s       s
11    01, 10  s       s       s

On crossover, complementary schemata (00 with 11, and 01 with 10) lose genetic material. This loss is compensated by the gain to the other complementary pair of schemata. We have to account for the expected loss and gain of schemata due to crossover. Assuming proportionate reproduction and crossover (with pm = 0), the proportion P^t of each schema evolves as

$$P^{t+1}_{11} = P^{t}_{11}\,\frac{f_{11}}{f'}\left[1 - p_c'\,\frac{f_{00}}{f'}\,P^{t}_{00}\right] + p_c'\,\frac{f_{01} f_{10}}{f'^2}\,P^{t}_{01} P^{t}_{10}$$

$$P^{t+1}_{10} = P^{t}_{10}\,\frac{f_{10}}{f'}\left[1 - p_c'\,\frac{f_{01}}{f'}\,P^{t}_{01}\right] + p_c'\,\frac{f_{00} f_{11}}{f'^2}\,P^{t}_{00} P^{t}_{11}$$

$$P^{t+1}_{01} = P^{t}_{01}\,\frac{f_{01}}{f'}\left[1 - p_c'\,\frac{f_{10}}{f'}\,P^{t}_{10}\right] + p_c'\,\frac{f_{00} f_{11}}{f'^2}\,P^{t}_{00} P^{t}_{11}$$

$$P^{t+1}_{00} = P^{t}_{00}\,\frac{f_{00}}{f'}\left[1 - p_c'\,\frac{f_{11}}{f'}\,P^{t}_{11}\right] + p_c'\,\frac{f_{01} f_{10}}{f'^2}\,P^{t}_{01} P^{t}_{10}$$

where f’ is the average population fitness and pc’ = p(crossover between the two defining bits) = pc * δ(H) / (l-1). These equations predict the expected proportions of the four schemata. A necessary condition for the Genetic Algorithm to be successful is that the sequence ⟨P^t_11⟩ converges to 1.

Thus, the Genetic Algorithm refuses to be misled by initial conditions.
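The four recurrences can be iterated to watch the proportions evolve. A minimal sketch (hypothetical names), assuming f’ is computed as the proportion-weighted average of the four schema fitness values; the Type 1 fitness values and pc’ = 0.5 are made up for illustration, and no particular limit is claimed here.

```python
def step(P, f, pc_):
    """Advance the schema proportions P['00'..'11'] by one generation.

    f holds the schema fitness values f00..f11 and pc_ is p_c'."""
    fbar = sum(P[k] * f[k] for k in P)                          # average fitness f'
    comp = {"00": "11", "11": "00", "01": "10", "10": "01"}     # complement (loss term)
    makers = {"00": ("01", "10"), "11": ("01", "10"),           # pair whose cross
              "01": ("00", "11"), "10": ("00", "11")}           # recreates the schema
    new = {}
    for k in P:
        c = comp[k]
        a, b = makers[k]
        survive = P[k] * f[k] / fbar * (1 - pc_ * f[c] / fbar * P[c])
        gain = pc_ * f[a] * f[b] / fbar ** 2 * P[a] * P[b]
        new[k] = survive + gain
    return new


f = {"00": 1.0, "01": 1.05, "10": 0.0, "11": 1.1}   # hypothetical Type 1 values
P = {k: 0.25 for k in f}
for t in range(100):
    P = step(P, f, pc_=0.5)
print({k: round(v, 3) for k, v in P.items()})
```

The loss and gain terms cancel when summed, so the four proportions always add up to 1; this is a useful sanity check on the equations.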

7 : COMPUTER IMPLEMENTATION OF A GENETIC ALGORITHM

A genetic algorithm can be disturbing at first, because of:

o Coding.
o Working with a population, not individuals.
o Randomness giving direction.

GENETIC ALGORITHM IMPLEMENTATION Data Structure

o Strings represented by arrays, …
o Specify the population size, the string size, the probability of mutation and the probability of crossover.

Reproduction
o Through the roulette wheel selection method.
o Take partial sums of the fitness values.
o Generate a random number to specify the location where the wheel has stopped.
o Select the corresponding string into the population.

Crossover
o Take two parents randomly.
o Generate a random number between 1 and l-1.
o This is the crossover site.
o Exchange bits from that site onwards.
o Two offspring are generated.

Mutation
o Generate a random number to select a string for mutation.
o Generate a random number to select a bit for mutation.
o Change the bit at that position.
o This generates a new offspring.

Fitness Function
o Choose a proper fitness function.
o A string’s suitability for the next population is judged on this basis.
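The reproduction steps above (partial sums, then a random stopping point) can be sketched with the standard bisect module. The function names are hypothetical; the fitness values are those of the x^2 example.

```python
import bisect
import random
from itertools import accumulate


def roulette_select(fitness, r):
    """Index of the string whose slot contains the stopping point r, 0 <= r < sum(fitness)."""
    partial = list(accumulate(fitness))          # partial sums = right edges of the slots
    return min(bisect.bisect_right(partial, r), len(partial) - 1)


def spin(fitness, rng):
    """Generate a random stopping point on the wheel, then select that string."""
    return roulette_select(fitness, rng.random() * sum(fitness))


fitness = [169, 576, 64, 361]                    # partial sums 169, 745, 809, 1170
print([roulette_select(fitness, r) for r in (100, 500, 800, 1000)])   # [0, 1, 2, 3]
rng = random.Random(1)
counts = [0, 0, 0, 0]
for _ in range(10000):
    counts[spin(fitness, rng)] += 1
print(counts)                                    # roughly proportional to fitness
```

Binary search over the partial sums makes each spin O(log n) instead of a linear scan; the min() guard only matters if floating-point rounding puts the stopping point exactly on the total.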

GOOD GENETIC ALGORITHM: EXPERIMENTS SHOW Choose high crossover probability. Choose low mutation probability. Choose moderate population size say 30.

GENETIC ALGORITHM DRAWBACK: PREMATURE CONVERGENCE The procedure may find an optimum, but not the global one: the population converges prematurely to a suboptimal point.

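MAPPING OBJECTIVE FUNCTION TO FITNESS FUNCTION The objective of some problems is to minimize a function, say a cost g(x). Then take f(x) = Cmax – g(x) when g(x) < Cmax (and f(x) = 0 otherwise), where Cmax may be taken as the largest cost observed so far. For a maximization problem whose utility u(x) may be negative, take f(x) = u(x) + Cmin when u(x) + Cmin > 0 (and f(x) = 0 otherwise), so that fitness is never negative.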
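Both mappings in a short sketch (hypothetical function names; the costs are made-up values, and Cmax is taken as the largest cost in the list):

```python
def cost_to_fitness(g, c_max):
    """Minimization: f(x) = Cmax - g(x) when g(x) < Cmax, else 0."""
    return c_max - g if g < c_max else 0.0


def utility_to_fitness(u, c_min):
    """Maximization with possibly negative utility: f(x) = u(x) + Cmin, floored at 0."""
    return u + c_min if u + c_min > 0 else 0.0


costs = [12.0, 7.5, 20.0]
c_max = max(costs)                                   # largest cost observed so far
print([cost_to_fitness(g, c_max) for g in costs])    # [8.0, 12.5, 0.0]
print(utility_to_fitness(-3.0, 5.0))                 # 2.0
```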

FITNESS SCALING At the start of a Genetic Algorithm run there are typically a few extraordinarily fit strings among inferior ones. Within a few generations, these superior strings will become dominant, which may lead to premature convergence. Later in the run, although the population is still diverse, almost all strings may have almost the same fitness, so average and superior strings get about the same number of copies in the next population. Survival of the fittest then becomes a random walk, which needs to be avoided. Fitness scaling is a strategy to avoid both problems.

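LINEAR SCALING The scaled fitness is f’ = a f + b, where a and b are to be found.

Choose a and b so that:

o f’avg = favg, so that average members still receive about one copy each in the next population.

o f’max = Cmult . favg, where Cmult is the expected number of copies of the best member in the next population (e.g. Cmult = 2 gives f’max = 2 * f’avg).

The scaled minimum f’min then follows from a and b.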

BECAUSE OF SCALING The number of copies given to extraordinarily good strings is restricted, and the lowly ones receive relatively more. In a mature run, a few bad strings may have fitness far below the population average. If scaling is applied in this case, their scaled fitness values can go negative, which is undesirable. In that case, choose a and b so that f’min = 0 (while keeping f’avg = favg). Scaling helps to prevent the early dominance of a few best strings and encourages healthy competition among near-equals.
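A sketch of linear scaling under the two conditions above, with the fallback that maps fmin to 0 when normal scaling would make f’min negative. The closed-form a and b are obtained by solving the two linear conditions; the function name and test values are hypothetical, and Cmult > 1 is assumed.

```python
def linear_scale(f, c_mult=2.0):
    """Scaled fitness a*f + b with f'avg = favg and f'max = c_mult * favg.

    If that would make f'min negative, map fmin to 0 instead (still f'avg = favg)."""
    n = len(f)
    f_avg, f_max, f_min = sum(f) / n, max(f), min(f)
    if f_max == f_avg:                            # all values equal: nothing to scale
        return list(f)
    if f_min > (c_mult * f_avg - f_max) / (c_mult - 1.0):
        delta = f_max - f_avg                     # normal scaling
        a = (c_mult - 1.0) * f_avg / delta
        b = f_avg * (f_max - c_mult * f_avg) / delta
    else:
        delta = f_avg - f_min                     # stretch as far as possible: f'min = 0
        a = f_avg / delta
        b = -f_min * f_avg / delta
    return [a * x + b for x in f]


print(linear_scale([1, 2, 3, 10]))    # average stays 4, best becomes 2 * 4 = 8
print(linear_scale([1, 9, 10, 10]))   # normal scaling would go negative; min mapped to 0
```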

CODING IN GENETIC ALGORITHM There are different coding methods. Which to select …? Two guidelines:

o Principle of meaningful building blocks: the user should select a coding so that short, low-order schemata are relevant to the underlying problem and relatively unrelated to schemata over other fixed positions.

o Principle of minimal alphabets: the user should select the smallest alphabet that permits a natural expression of the problem.

CODING: BINARY VS. NON-BINARY

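E.g. maximize f(x) = x^2 on [0, 31].

Binary Representation   32-Alphabet Representation
01101                   N
11000                   Y
01000                   I
10011                   T

The binary coding has similarities to be exploited (for example, the two fittest strings, 11000 and 10011, share the schema 1****); the non-binary coding has no such coding similarities.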

DISCRETIZATION OF CONTINUUM Many optimization problems require functions over a continuum. Such a problem can be converted into a finite collection of discrete problems, which can then be solved using a Genetic Algorithm.

CONSTRAINTS Genetic Algorithms are generally used for unconstrained problems, but they can be used for constrained problems too. This can be done by incorporating a penalty in the objective function as and when the constraints are violated.
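A minimal penalty sketch. The quadratic form of the penalty, the coefficient r = 1000 and the example problem are all assumptions for illustration; the notes only say that a penalty is added when constraints are violated.

```python
def penalized_objective(cost, violations, r=1000.0):
    """Minimization with constraints g_j(x) <= 0: add r times the squared violations.

    violations lists the values g_j(x); only positive (violated) values are penalized."""
    return cost + r * sum(max(0.0, g) ** 2 for g in violations)


# Hypothetical problem: minimize x^2 subject to x >= 2, written as g(x) = 2 - x <= 0
def objective(x):
    return penalized_objective(x ** 2, [2 - x])


print(objective(3.0), objective(1.0))   # 9.0 1001.0
```

Feasible points are scored by the cost alone, while infeasible points become unattractive to selection without being forbidden outright.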


8 : SOME APPLICATIONS OF GENETIC ALGORITHMS

HISTORY OF GENETIC ALGORITHM Initially, biologists used computers to simulate natural genetics, with the aim of understanding natural phenomena. Pioneer: Fraser considered a phenotype function of the form phenotype = a + q a |a| + c a^3.

Chromosome = 15-bit string, with 5 bits each for a, q and c. The interaction of these groups of bits produced a diverse population for selection. Strings with phenotype values between specified limits, say 0 and 1, were chosen, and future generations evolved with acceptable string structures. This is similar to typical optimization.

HOLLAND, THE FATHER OF GENETIC ALGORITHM Introduced the Genetic Algorithm as a computational technique, though his aim was entirely different: he wanted to create general programs and machines that could adapt to a changing environment. He recognized:

o The importance of selection.
o The effectiveness of a population over individuals.
o Initially, he gave less importance to crossover, mutation, …

BAGLEY AND ADAPTIVE GAME PLAYING PROGRAM First coined the term Genetic Algorithm. Used it to play hexapawn, with reproduction, crossover and mutation. Reduced selection at the beginning to limit the dominance of a few strings, and increased selection later to allow competition. The technique was called ‘Adaptive Genetic Algorithm’.

ROSENBERG AND BIOLOGICAL CELL SIMULATION Simulated a population of single-celled organisms. Defined a finite-length string with a pair of chromosomes (diploid): a string length of 20 with a maximum of 16 alleles. Introduced the ‘Offspring generation function (OGF)’, which fixes the number of offspring, to control selection. Introduced p(crossover site).

CAVICCHIO AND PATTERN RECOGNITION Applied the Genetic Algorithm to the design of detectors for pattern recognition. An image is digitized as a 25 x 25 binary pixel grid, and a detector is a subset of the pixels. During training, known images are presented and the list of detector states is stored. During the recognition phase, an unknown image is presented and the matches are counted. Allowed reproduction, crossover and mutation. Used pre-selection, in which an offspring always replaces one of its parents, to maintain diversity.

WEINBERG AND CELL SIMULATION Computer simulation of a living cell. Proposed a multilevel Genetic Algorithm: the lower one is an adaptive Genetic Algorithm and the upper one is non-adaptive.

The lower one is meant to find parameters of the Genetic Algorithm. These parameters are given to the upper-level Genetic Algorithm, which tests the fitness of the population strings. The fitter ones are sent back to the lower level, and the process continues. The upper one functions like a supreme judge.

HOLLSTEIN AND FUNCTION OPTIMIZATION Used five selection methods:

o Progeny testing: the fitness of offspring controls a parent’s further breeding.
o Individual selection: the fitness of an individual decides its future as a parent.
o Family selection: the fitness of a family controls the use of all family members as parents.
o Within-family selection: the fitness of individuals within a family controls selection within that family.
o Combined selection: a combination of selection methods.

Used eight mating preferences:
o Random mating: all matings are equally likely.
o Inbreeding: related individuals mate intentionally.
o Line breeding: a unique individual is identified and mated with standard ones, and the offspring are selected.
o Outbreeding: contrasting individuals are chosen as parents.
o Self fertilization: an individual breeds with itself.
o Clonal propagation: a copy of an individual is formed.
o Positive assortative mating: like individuals mate with like ones.
o Negative assortative mating: unlike individuals are bred.

Used populations of 16 strings. Concluded that inbreeding and outbreeding are better.

FRANTZ AND POSITIONAL EFFECT Used a larger population size (100) and string size (25), with:

o Roulette wheel selection
o Simple crossover
o Mutation

Looked for a correlation between positional effects and the rate of improvement. Introduced:

o Inversion
o Partial complement (migration)
o Multiple point crossover

Migration:
o Select a few strings.
o Complement about one third of the bits of these strings.
o The new strings are called immigrants.
o Helps in maintaining diversity, but reduces performance.

BOSWORTH, FOO, ZEIGLER – REAL GENES Used a maximalist coding (real-valued genes) rather than the minimalist, binary-like approach. Thought mutation needed a change, so introduced five variations of it. Not a Genetic Algorithm in the pure sense.

BOX AND EVOLUTIONARY OPERATION More of a management technique for workers to execute a plan than an algorithm. Followed nature’s mechanism:

o Genetic variability
o Selection

Loose application of mutation, as anything that changes structure. Not a Genetic Algorithm in the modern sense.

FOGEL, OWENS AND WALSH – EVOLUTIONARY PROGRAMMING Consider the state diagram of a three-state machine: 0 and 1 are the inputs, A, B, C are the states and α, β, γ are the outputs. Two operators are used:

o Selection: choose the better of parent and child.
o Mutation: change a machine by altering an output symbol, a state transition, the number of states or the initial state.

Drawback: limited to small problem spaces. The transition description is given by:

Present State  Input Symbol  Next State  Output Symbol
C              0             B           β
B              1             C           α
C              1             A           γ
A              1             A           β
A              0             B           β
B              1             C           α

DE JONG & FUNCTION OPTIMIZATION Mainly used the Genetic Algorithm as a function optimizer. Used test functions with contrasting properties:

o Continuous / discrete
o Convex / non-convex
o Unimodal / multimodal
o Quadratic / non-quadratic
o Low dimensionality / high dimensionality
o Deterministic / stochastic

Devised two different performance measures:
o Offline (convergence) performance
o Online (ongoing) performance

Offline: many alternatives can be tried off-line and the best so far is saved for subsequent use. The offline performance x_e*(s) of strategy s on environment e is the average of the best-so-far values:

$$x_e^*(s) = \frac{1}{T}\sum_{t=1}^{T} f_e^*(t), \qquad f_e^*(t) = \text{best}\{f_e(1), f_e(2), \dots, f_e(t)\}.$$

Online: every trial counts, so acceptable performance is needed throughout the run. The online performance x_e(s) of strategy s on environment e is the average of all trials:

$$x_e(s) = \frac{1}{T}\sum_{t=1}^{T} f_e(t),$$

where f_e(t) is the objective function value for environment e on trial t.
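Both measures are running averages over the trials. A small sketch with hypothetical names and made-up trial values; best=max assumes a maximization problem, and min can be passed for minimization.

```python
from itertools import accumulate


def online_performance(values):
    """x_e(s): the average of every objective value f_e(1), ..., f_e(T)."""
    return sum(values) / len(values)


def offline_performance(values, best=max):
    """x_e*(s): the average of the best-so-far values f_e*(t)."""
    best_so_far = list(accumulate(values, best))
    return sum(best_so_far) / len(best_so_far)


trials = [3, 1, 4, 1, 5]          # hypothetical objective values on trials 1..5
print(online_performance(trials), offline_performance(trials))   # 2.8 3.8
```

Offline performance ignores poor trials once something better has been found, which is why it rewards convergence, while online performance penalizes every poor trial.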


De Jong called his algorithm reproductive plan R1. In R1, three operations were used:

o Roulette wheel selection
o Simple crossover
o Simple mutation

R1 is a family of plans using 4 parameters: n, pc, pm and G (the generation gap). G = 1 for non-overlapping populations, and 0 < G < 1 for overlapping populations. In overlapping populations, n x G individuals are selected for genetic operations. He observed that a larger population size leads to better offline performance, and a smaller population size leads to more rapid initial change. He investigated five variations of plan R1:

o R2 - Elitist Model
o R3 - Expected Value Model
o R4 - Elitist Expected Value Model
o R5 - Crowding Factor Model
o R6 - Generalized Crossover Model

R2 - Elitist Model
o Let a*(t) be the best individual generated up to time t. After generating A(t+1) in the usual fashion, if a*(t) is not in A(t+1), then include a*(t) in A(t+1) as the (N+1)th member.
o It improves local search at the expense of global perspective.

R3 - Expected Value Model
o Each string in the population is given an expected number of offspring f / f’.
o Thereafter, each time a string is selected for crossover or mutation, its offspring count is reduced by 0.5.
o When an individual is selected for reproduction without crossover or mutation, its offspring count is reduced by 1.
o If the offspring count falls below 0, the string is no longer available for selection.

R4 - Elitist Expected Value Model
o Combination of R2 and R3.
o Much better performance.

R5 - Crowding Factor Model
o In nature, like individuals dominate a niche in the population.
o Then increased competition for limited resources decreases life expectancy and birth rate.
o De Jong enforced a crowding pressure by forcefully replacing older strings with newer offspring.
o For that, he considered an overlapping population with G = 0.1.
o He defined a parameter, the crowding factor (CF).
o When an offspring is born, a string is selected to die: CF strings are picked at random, and the one that most resembles the new offspring (bit-by-bit similarity) dies.
o The process is similar to Cavicchio's preselection.
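Crowding replacement can be sketched in Python using Hamming distance as the bit-by-bit similarity measure. The function names are hypothetical, and the population in the usage line is made up.

```python
import random


def hamming(a, b):
    """Bit-by-bit dissimilarity of two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def crowding_replace(population, child, cf, rng):
    """Sample cf strings at random; the one most similar to the child dies."""
    candidates = rng.sample(range(len(population)), cf)
    victim = min(candidates, key=lambda i: hamming(population[i], child))
    population[victim] = child
    return victim


pop = ["00000", "11111", "00001"]
print(crowding_replace(pop, "00011", cf=3, rng=random.Random(0)), pop)
```

With cf equal to the population size, every string is a candidate, so the child always replaces its nearest neighbour.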

R6 - Generalized Crossover Model
o Uses a new parameter: the number of crossover points (CP).
o When CP = 1, it is simple crossover.
o If l is the length of the string, then there are C(l, CP) crossover operators for multiple-point crossover.
o As CP increases, each operator has less chance of being picked during a particular cross, and less structure is preserved. Effectively, the process becomes a random shuffle, and fewer important schemata survive.
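Multi-point crossover with CP cut points can be sketched in Python as follows. The sketch draws the cut points from the l − 1 positions between bits, which is our own assumption. Each cut switches which parent a segment is copied from, so a child of complementary parents has exactly CP bit changes. That is why large CP makes the child look like a shuffle.

```python
import random


def multipoint_crossover(a, b, cp, rng):
    """Cut both parents at cp distinct points and swap alternate segments."""
    points = sorted(rng.sample(range(1, len(a)), cp))   # l - 1 possible cut sites
    child1, child2, prev, swap = [], [], 0, False
    for p in points + [len(a)]:
        seg_a, seg_b = a[prev:p], b[prev:p]
        child1.append(seg_b if swap else seg_a)
        child2.append(seg_a if swap else seg_b)
        swap, prev = not swap, p
    return "".join(child1), "".join(child2)


rng = random.Random(4)
print(multipoint_crossover("00000000", "11111111", cp=1, rng=rng))  # simple crossover
print(multipoint_crossover("00000000", "11111111", cp=5, rng=rng))  # closer to a shuffle
```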

IMPROVEMENT IN BASIC TECHNIQUES
Since De Jong, there have been improvements to the basic Genetic Algorithm. They concern:

o Selection
o Scaling
o Ranking

ALTERNATIVE SELECTION SCHEMES: BRINDLE
Deterministic Sampling

o Find pselection = fi / Σ fi.
o The expected number of copies of each string is ei = pselection × n; each string first receives int(ei) copies.
o The strings are then sorted by the fractional parts of ei.
o The remaining slots of the population are filled from the top of the sorted list.
o Example:

String No. | Initial Population | x Value | Fitness f(x) = x^2 | % of Total Fitness | pselection
1 | 01101 | 13 | 169 | 14.4 | 0.144
2 | 11000 | 24 | 576 | 49.2 | 0.492
3 | 01000 | 8 | 64 | 5.5 | 0.055
4 | 10011 | 19 | 361 | 30.9 | 0.309

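o With n = 4, the expected counts are e = 0.58, 1.97, 0.22 and 1.23, so strings 2 and 4 are selected initially.
o Then sort the fractional parts: 0.97 (string 2), 0.58 (string 1), 0.23 (string 4) and 0.22 (string 3).
o The two remaining slots go to strings 2 and 1, which have the largest fractional parts, 0.97 and 0.58.
o Thus the new population = {2, 4, 2, 1}.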
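The worked example can be reproduced with a short Python sketch of deterministic sampling. It returns 1-based string numbers so that the output matches the table. The function name is hypothetical.

```python
def deterministic_sampling(fitness):
    """Brindle's deterministic sampling; returns 1-based string numbers."""
    n = len(fitness)
    total = sum(fitness)
    e = [f / total * n for f in fitness]          # expected copies e_i = p_i * n
    chosen = []
    for i, ei in enumerate(e):                    # sure copies: integer parts
        chosen += [i + 1] * int(ei)
    # Remaining slots: strings sorted by the fractional parts of e_i, largest first.
    by_fraction = sorted(range(n), key=lambda i: e[i] - int(e[i]), reverse=True)
    for i in by_fraction[: n - len(chosen)]:
        chosen.append(i + 1)
    return chosen


print(deterministic_sampling([169, 576, 64, 361]))   # [2, 4, 2, 1]
```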

Remainder Stochastic Sampling Without Replacement
o As in deterministic sampling, the integer parts of ei are allocated first.
o The fractional parts of ei are taken as probabilities.
o Bernoulli trials are conducted with probability of success equal to the fractional parts.
o E.g. ei = 2.5 gives 2 sure copies and a third with probability 0.5.

Remainder Stochastic Sampling With Replacement
o As in deterministic sampling, the integer parts of ei are allocated first.
o The fractional parts of ei are used as weights in a roulette wheel selection procedure to fill the remaining slots.
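Both remainder schemes can be sketched side by side in Python. These choices are our own reading of the notes rather than a reference implementation:

- In the "without replacement" variant, each string can win at most one extra copy, and the Bernoulli trials repeat over the strings until the population is full.
- In the "with replacement" variant, `random.choices` acts as the roulette wheel.

The functions return 0-based indices, and their names are hypothetical.

```python
import random


def expected_counts(fitness):
    total = sum(fitness)
    return [f / total * len(fitness) for f in fitness]


def remainder_without_replacement(fitness, rng):
    """Sure copies from int(e_i); fractional parts decided by Bernoulli trials."""
    n, e = len(fitness), expected_counts(fitness)
    chosen = [i for i, ei in enumerate(e) for _ in range(int(ei))]
    frac = [ei - int(ei) for ei in e]
    while len(chosen) < n:
        if not any(p > 0 for p in frac):     # numerical safety net
            frac = [1.0] * n
        for i, p in enumerate(frac):
            if len(chosen) < n and p > 0 and rng.random() < p:
                chosen.append(i)
                frac[i] = 0.0                 # at most one extra copy per string
    return chosen


def remainder_with_replacement(fitness, rng):
    """Sure copies from int(e_i); fractional parts weight a roulette wheel."""
    n, e = len(fitness), expected_counts(fitness)
    chosen = [i for i, ei in enumerate(e) for _ in range(int(ei))]
    frac = [ei - int(ei) for ei in e]
    if len(chosen) < n:
        chosen += rng.choices(range(n), weights=frac, k=n - len(chosen))
    return chosen


rng = random.Random(0)
print(remainder_without_replacement([169, 576, 64, 361], rng))   # 0-based indices
print(remainder_with_replacement([169, 576, 64, 361], rng))
```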

Stochastic Sampling With Replacement
o Typical roulette wheel selection.

Stochastic Sampling Without Replacement
o This is De Jong's expected value model R3.
o Each string in the population is given an expected number of offspring f / f̄.
o Thereafter, each time a string is selected for crossover or mutation, its offspring count is reduced by 0.5.
o When an individual is selected for reproduction without crossover or mutation, its offspring count is reduced by 1.
o If the offspring count falls below 0, the string is no longer available for selection.

Stochastic Tournament

o Selection probabilities are calculated as normal.
o Successive pairs of individuals are drawn using roulette wheel selection.
o Of each pair, the string with the higher fitness is taken into the population.
o A new pair is drawn, and the process continues until the population is full.
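A stochastic tournament can be sketched in Python in a few lines. Pairs are drawn with `random.choices`, which plays the role of the roulette wheel. The function name is hypothetical.

```python
import random


def stochastic_tournament(population, fitness, rng):
    """Draw pairs by roulette wheel; the fitter of each pair enters the new population."""
    new_population = []
    while len(new_population) < len(population):
        i, j = rng.choices(range(len(population)), weights=fitness, k=2)
        new_population.append(population[i] if fitness[i] >= fitness[j] else population[j])
    return new_population


pop = ["01101", "11000", "01000", "10011"]
print(stochastic_tournament(pop, [169, 576, 64, 361], random.Random(5)))
```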

These selection procedures show many drawbacks, largely because of the inferiority of roulette wheel selection. Of all these, R3 is considered the best.

SCALING MECHANISM
Without scaling, a few highly fit strings may dominate the population from the beginning. Important scaling procedures are:

o Linear scaling: f' = a f + b, where a and b are chosen so that f'avg = favg (an average string still receives one expected copy) and f'max = Cmult · favg. Here Cmult is the expected number of copies of the best string in the next population. If this would make f'min negative, a and b are chosen instead so that f'min = 0, while f'avg = favg is kept.

o Sigma truncation: f' = f - (f̄ - cσ), where f̄ is the population average, σ is the population standard deviation and c is a constant between 1 and 3. Any negative results are set to 0, so negative values never occur.

o Power law scaling: f' = f^k for some k.
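The three scaling rules can be sketched in Python using the standard `statistics` module. The default Cmult = 2 is only an illustrative choice, and the function names are hypothetical. In the normal case, a and b come from solving the two conditions f'avg = favg and f'max = Cmult · favg. In the fallback case, the conditions are f'min = 0 and f'avg = favg.

```python
import statistics


def linear_scaling(f, c_mult=2.0):
    """Return (a, b) for f' = a f + b with f'avg = favg and, if possible, f'max = c_mult * favg."""
    f_avg, f_max, f_min = statistics.fmean(f), max(f), min(f)
    if f_max == f_avg:                                   # all equal: leave unscaled
        return 1.0, 0.0
    if f_min > (c_mult * f_avg - f_max) / (c_mult - 1.0):
        delta = f_max - f_avg                            # normal case
        a = (c_mult - 1.0) * f_avg / delta
        b = f_avg * (f_max - c_mult * f_avg) / delta
    else:
        delta = f_avg - f_min                            # would go negative: map fmin to 0
        a = f_avg / delta
        b = -f_min * f_avg / delta
    return a, b


def sigma_truncation(f, c=2.0):
    """f' = f - (favg - c * sigma), with negative results set to 0."""
    f_avg, sigma = statistics.fmean(f), statistics.pstdev(f)
    return [max(0.0, x - (f_avg - c * sigma)) for x in f]


def power_law(f, k):
    """f' = f^k."""
    return [x ** k for x in f]


fitness = [169, 576, 64, 361]
a, b = linear_scaling(fitness)
print([round(a * x + b, 1) for x in fitness])
print(sigma_truncation(fitness))
```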

RANKING PROCEDURES
Selection of strings is based on the ranks of their fitness values. The population is sorted according to the fitness values, and individuals are assigned an offspring count based on their rank.
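The notes do not say how a rank is turned into an offspring count. The Python sketch below assumes a linear rule: the worst string gets 2 − max_count expected copies, the best gets max_count, and the counts in between are interpolated so that they sum to n. Both `max_count` and the function name are hypothetical.

```python
def linear_rank_counts(fitness, max_count=1.5):
    """Expected offspring from rank: worst gets 2 - max_count, best gets max_count."""
    n = len(fitness)
    order = sorted(range(n), key=lambda i: fitness[i])     # worst first
    counts = [1.0] * n
    if n > 1:
        for rank, i in enumerate(order):
            counts[i] = (2.0 - max_count) + (2.0 * max_count - 2.0) * rank / (n - 1)
    return counts


print(linear_rank_counts([169, 576, 64, 361]))
```

Because only ranks matter, a single very fit string can never receive more than max_count copies, however large its raw fitness.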

APPLICATIONS OF GENETIC ALGORITHM
Medical image registration with Genetic Algorithm

o A Genetic Algorithm is used to perform image registration in Digital Subtraction Angiography (DSA).
o DSA examines the interior of an artery by comparing two X-rays, taken before and after injecting a dye.
o The images are digitized and subtracted pixel by pixel.
o The difference image shows the interior of the artery.
o The pre-injection image is transformed by a bilinear map x'(x, y) = a0 + a1x + a2y + a3xy and y'(x, y) = b0 + b1x + b2y + b3xy, where ai and bi are unknown.
o A Genetic Algorithm is used to find these ai and bi by minimizing the mean absolute difference between the images.

Iterated Prisoner's Dilemma

Payoffs are listed as (Player 1, Player 2):

Player 1 \ Player 2 | Co-operate | Defect
Co-operate | (R = 6, R = 6) | (S = 0, T = 10)
Defect | (T = 10, S = 0) | (P = 2, P = 2)

o The game is played repeatedly, building up a history of C (co-operate) and D (defect) moves.
o The iterated game can be played well with the 'tit for tat' strategy.
o Axelrod showed that a Genetic Algorithm can find better strategies, using a 64-bit string that gives a move for each of the 4^3 = 64 possible histories of the last three moves of both prisoners.
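The payoff table and tit-for-tat can be checked with a small Python simulation. The `history_index` helper shows how the last three joint moves give 64 cases. Its ordering of moves is our own assumption. A 64-bit strategy string would store one move per index. The GA search itself is not shown, and all names are hypothetical.

```python
# Payoffs from the table: (player 1, player 2).
PAYOFF = {("C", "C"): (6, 6), ("C", "D"): (0, 10),
          ("D", "C"): (10, 0), ("D", "D"): (2, 2)}


def tit_for_tat(my_moves, their_moves):
    """Co-operate first, then copy the opponent's previous move."""
    return their_moves[-1] if their_moves else "C"


def always_defect(my_moves, their_moves):
    return "D"


def play(strategy1, strategy2, rounds):
    """Iterate the game and return the total payoff of each player."""
    moves1, moves2, score1, score2 = [], [], 0, 0
    for _ in range(rounds):
        m1, m2 = strategy1(moves1, moves2), strategy2(moves2, moves1)
        p1, p2 = PAYOFF[(m1, m2)]
        moves1.append(m1)
        moves2.append(m2)
        score1, score2 = score1 + p1, score2 + p2
    return score1, score2


def history_index(last_three):
    """Map the last three joint moves, e.g. [("C", "D"), ...], to 0..63 (ordering assumed)."""
    code = {("C", "C"): 0, ("C", "D"): 1, ("D", "C"): 2, ("D", "D"): 3}
    index = 0
    for pair in last_three:
        index = index * 4 + code[pair]
    return index


print(play(tit_for_tat, tit_for_tat, 10))    # (60, 60)
print(play(tit_for_tat, always_defect, 10))  # (18, 28)
```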

9 : ADVANCED OPERATORS AND TECHNIQUES IN GENETIC SEARCH

Until now, we considered Genetic Algorithm with the genetic operators:
o Selection
o Mutation
o Crossover

DOMINANCE, DIPLOIDY, ABEYANCE
Nature offers:

o Diploidy (i.e. pairs of chromosomes)
o Dominance (as shown by Mendel on pea plants)

Until now we considered only haploid (i.e. single-stranded) chromosomes, such as (1011110001).

NATURE’S WAYS
Most of the complex and difficult life forms in nature are diploid, i.e. they have double-stranded chromosomes. In diploid form, a genotype carries pairs of chromosomes. They are called homologous chromosomes, and they carry information for the same function.

DIPLOID CHROMOSOMES
Each cell has a nucleus. The rod-shaped particles inside it are chromosomes, which we think of in pairs. The number differs between species: human (46), tobacco (48), goldfish (94), chimpanzee (48). Chromosomes are usually paired up. The sex chromosomes are X and Y:

o Humans: Male (XY), Female (XX)
o Birds: Male (XX), Female (XY)

E.g. a pair of chromosomes:

o AbCDe
o aBCde

Each pair contains upper case and lower case characters. Each allele represents a particular characteristic, e.g.:

o a: blue eyes
o A: green eyes

But the phenotype can express only one of them at a time. This is made possible by the genetic operator dominance.

DOMINANCE
An allele is said to be dominant if it is expressed when paired with some other allele. E.g. upper case letters are dominant and lower case letters are recessive. The phenotype expressed by the chromosome pair AbCDe and aBCde will be ABCDe, i.e. a dominant gene is expressed when heterozygous (Aa → A) or homozygous (CC → C). A recessive gene is expressed only when homozygous (ee → e).

Diploidy 'remembers' alleles. Dominance protects a 'remembered' allele from harmful selection in a hostile environment. In a changed environment, the 'remembered' but 'neglected' allele can be selected. The organism becomes more effective because it is better prepared to face a changing environment. Diploidy permits carrying along multiple possibilities while only one is expressed: old lessons learned from experience but not currently used.
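The letter example can be checked with a one-function Python sketch. Upper case stands for the dominant allele, as in the notes, and the function name is hypothetical.

```python
def express(chrom1, chrom2):
    """Phenotype of a diploid pair: an upper case (dominant) allele wins at each locus;
    a lower case (recessive) allele shows only when both copies are lower case."""
    return "".join(a if a.isupper() else b for a, b in zip(chrom1, chrom2))


print(express("AbCDe", "aBCde"))   # ABCDe
```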

DIPLOIDY AND DOMINANCE IN GENETIC ALGORITHM
Two evolving dominance mechanisms.

First Scheme:
o Each binary gene is described by two genes: a modifier gene and a functional gene.
o The functional gene takes 1 or 0.
o The modifier gene takes either 'M' or 'm'.
o Here 0 is taken as dominant.
o A dominance expression map is constructed as follows (rows and columns are the two genes of a pair; entries are the expressed bit):

   | 0M | 0m | 1M | 1m
0M | 0 | 0 | 0 | 0
0m | 0 | 0 | 0 | 1
1M | 0 | 0 | 1 | 1
1m | 0 | 1 | 1 | 1

o Similarly, a single-locus tri-allelic dominance map was introduced as follows:

  | 0 | 1 | 2
0 | 0 | 0 | 1
1 | 0 | 1 | 1
2 | 1 | 1 | 1

o Later, Holland studied this further and represented the tri-allelic scheme as {0, 1₀, 1}, where 1₀ is a recessive 1.
o Here the most effective allele becomes dominant and shields the other.
o A mutation-like operator is needed to map 1 to 1₀, 1₀ to 0, and so on.
o Results: better population diversity, but no improvement in average and final performance in comparison with haploid simulation.
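Both dominance maps can be used as plain lookup tables. This Python sketch encodes them exactly as tabulated. The list layout and function names are our own.

```python
# First scheme: functional bit plus modifier (M or m); entries are the expressed bit.
GENES = ["0M", "0m", "1M", "1m"]
MODIFIER_TABLE = [[0, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 1],
                  [0, 1, 1, 1]]

# Single-locus tri-allelic map: alleles 0, 1, 2.
TRIALLELIC = [[0, 0, 1],
              [0, 1, 1],
              [1, 1, 1]]


def express_modifier(g1, g2):
    """Expressed bit for one locus under the modifier-gene map."""
    return MODIFIER_TABLE[GENES.index(g1)][GENES.index(g2)]


def express_triallelic(chrom1, chrom2):
    """Expressed bit string of a diploid pair under the tri-allelic map."""
    return [TRIALLELIC[a][b] for a, b in zip(chrom1, chrom2)]


print(express_modifier("0m", "1m"))                        # 1
print(express_triallelic([0, 1, 2, 0], [1, 1, 0, 0]))      # [0, 1, 1, 0]
```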

BRINDLE’S STUDY ON DOMINANCE
Introduced six schemes:

o Random, Fixed, Global Dominance
  Dominance of the binary alleles is determined for all loci, for all time, at the beginning. At each locus an unbiased coin is tossed, and a single dominance map is recorded. The dominant allele is expressed if it is heterozygous or homozygous; the recessive allele is expressed only when it is homozygous.
o Variable, Global Dominance
  The probability that 0 or 1 is dominant at a locus equals the proportion of 0's and 1's at that locus. The expressed allele at a heterozygous locus is chosen by a Bernoulli trial.
o Deterministic, Variable, Global Dominance
  The proportion of 0's and 1's at each locus is calculated, and the allele with the greater proportion is declared dominant.
o Choose A Random Chromosome
  A chromosome is selected from the pair at random, and its alleles are taken as dominant. This amounts to selecting and using one of the heterozygous pair at random.
o Dominance Of The Better Chromosome
  The fitness of each chromosome is compared, and the better one is chosen as dominant.
o Haploid Controls Diploid Adaptive Dominance
  A third (haploid) chromosome carries an adaptive dominance map that determines the expression of the normal diploid pair. The dominance map can be constructed dynamically by a Genetic Algorithm.

Conclusion:
o Many objected to Brindle's proposals on dominance.
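Three of the global schemes differ only in how the dominant allele at each locus is chosen. The Python sketch below assumes these representations:

- chromosomes are 0/1 lists;
- the population is a list of chromosome pairs;
- a tie in the deterministic scheme goes to 0.

Homozygous loci always express their shared allele. All names are hypothetical.

```python
import random


def proportion_of_ones(population_pairs, locus):
    alleles = [c[locus] for pair in population_pairs for c in pair]
    return sum(alleles) / len(alleles)


def express_with_map(pair, dominant):
    """Homozygous loci express their allele; heterozygous loci express dominant[locus]."""
    c1, c2 = pair
    return [a if a == b else d for a, b, d in zip(c1, c2, dominant)]


def random_fixed_global(length, rng):
    """Random, fixed, global dominance: one unbiased coin toss per locus, made once."""
    return [rng.randint(0, 1) for _ in range(length)]


def deterministic_variable_global(population_pairs, length):
    """The allele with the greater proportion at each locus is dominant (ties -> 0, assumed)."""
    return [1 if proportion_of_ones(population_pairs, k) > 0.5 else 0 for k in range(length)]


def variable_global(population_pairs, length, rng):
    """Bernoulli trial per locus with P(1 dominant) = proportion of 1's at that locus."""
    return [1 if rng.random() < proportion_of_ones(population_pairs, k) else 0
            for k in range(length)]


population = [([1, 1, 0], [1, 0, 0]), ([1, 1, 1], [0, 1, 0])]
pair = ([1, 0, 1], [0, 1, 0])
print(express_with_map(pair, deterministic_variable_global(population, 3)))
print(express_with_map(pair, variable_global(population, 3, random.Random(1))))
```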

AN ANALYSIS OF DOMINANCE AND DIPLOIDY IN GENETIC ALGORITHM SEARCH
We can add dominance and diploidy to the schema theorem. Let He be the expressed schema and H the physical schema. Then

$$m(H, t+1) \ge m(H, t)\,\frac{f(H_e)}{\bar{f}}\left[1 - p_c\,\frac{\delta(H)}{l-1} - o(H)\,p_m\right]$$

For a fully dominant schema H, f(H) = f(He). It is expected that f(He) >= f(H).

Suppose there are only two alternative, competing schemata: one dominant and the other recessive. The dominant one is expressed when it is heterozygous or homozygous, and the recessive one is expressed only when it is homozygous. Let fd and fr be their expected fitness values.

Then the proportion of recessives in the next generation is given by:

$$p_{t+1} = K\,p_t\,\frac{p_t + r(1 - p_t)}{(1 - r)\,p_t^2 + r}, \qquad r = \frac{f_d}{f_r}$$

where K is the crossover-mutation loss constant. Draw a graph of p_{t+1} against p_t.

Conclusion:
o When the recessive is less fit (r > 1), the haploid case always destroys more recessives than the corresponding diploid case.
o Under diploidy and dominance, mutation plays a lesser role.
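To see the conclusion numerically, the Python sketch below iterates the diploid recursion. For comparison it also iterates a haploid version, p_{t+1} = K p_t / (p_t + r(1 − p_t)), which we derived under the same fitness assumptions; it is not taken from the notes. With r > 1, the recessive proportion falls more slowly in the diploid case, because heterozygotes shield the recessive allele. The values r = 2 and p0 = 0.5 are illustrative.

```python
def diploid_next(p, r, K=1.0):
    """Proportion of recessives after one generation with diploidy and dominance."""
    return K * p * (p + r * (1 - p)) / ((1 - r) * p * p + r)


def haploid_next(p, r, K=1.0):
    """Same selection without diploidy: every allele is expressed (derived, assumed)."""
    return K * p / (p + r * (1 - p))


r = 2.0          # dominant form twice as fit as the recessive (illustrative)
p_dip = p_hap = 0.5
for t in range(1, 6):
    p_dip, p_hap = diploid_next(p_dip, r), haploid_next(p_hap, r)
    print(t, round(p_dip, 4), round(p_hap, 4))
```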

10 : INTRODUCTION TO GENETICS BASED MACHINE LEARNING

Introduced by Holland, who suggested a language called the Broadcast Language. It consists of production rules (called broadcast units) over a 10-letter alphabet that includes bits and a wild card. Later, the first GBML system, Cognitive System Level One (CS-1), was implemented. GBML has applications in different fields.

CLASSIFIER SYSTEMS
The most popular form of GBML. A classifier system is a machine learning system that learns syntactically simple string rules to guide its performance in an arbitrary environment. It consists of:

o A rule & message system
o An apportionment of credit system (modeled after an information-based service economy)
o A Genetic Algorithm

RULE & MESSAGE SYSTEM
A kind of production system. Production rules have the format:

o if <condition> then <action>
o E.g. if <(0,0)> then <(4,0)>

Each rule has a fixed length and is amenable to genetic operations. Rules can act in parallel, instead of one at a time as in the usual expert system. The relative value of each rule has to be learned rather than fixed in advance. Competition ensures that good rules survive: a classifier's bank balance is taken as its fitness, and an internal currency is introduced.

<message> ::= {0, 1}^l is the basic token of information exchange.
<condition> ::= {0, 1, #}^l

A condition is matched by a message if every 0 and 1 in the condition equals the corresponding bit of the message; # matches either bit. E.g. #10# matches 1100 but not 1000. Once matched, the classifier becomes a candidate to post its message to the message list. Example:

No. | Classifier
1 | 01## : 0000
2 | 00#0 : 1100
3 | 11## : 1000
4 | ##00 : 0001

o Suppose the environment sends message 0111. Only classifier 1 (01##) matches it, so it posts 0000. In the next cycle, 0000 matches classifiers 2 (00#0) and 4 (##00), which post 1100 and 0001.
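The matching rule and the example cycles can be traced in Python. For simplicity, the sketch assumes that every matched classifier posts its message, with no auction and no limit on the message list. The names are hypothetical.

```python
# Classifiers from the example: (condition, message).
CLASSIFIERS = [("01##", "0000"), ("00#0", "1100"), ("11##", "1000"), ("##00", "0001")]


def matches(condition, message):
    """A condition matches when every 0/1 agrees with the message; # matches anything."""
    return all(c == "#" or c == m for c, m in zip(condition, message))


def cycle(messages):
    """One matching cycle: 1-based numbers of matched classifiers and the messages they post."""
    matched = [k + 1 for k, (cond, _) in enumerate(CLASSIFIERS)
               if any(matches(cond, m) for m in messages)]
    return matched, [CLASSIFIERS[k - 1][1] for k in matched]


print(matches("#10#", "1100"), matches("#10#", "1000"))   # True False
print(cycle(["0111"]))   # ([1], ['0000'])
print(cycle(["0000"]))   # ([2, 4], ['1100', '0001'])
```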

APPORTIONMENT OF CREDIT ALGORITHM
Bucket Brigade Algorithm: ranks the individual classifiers according to their efficiency in achieving reward from the environment. It is an 'information economy', where the right to trade information is bought and sold by classifiers. They form a chain of middlemen from the information manufacturer (the environment) to the information consumers (the effectors). It has two components:

o Auction
  When a classifier's condition is matched, it qualifies for the auction. Each classifier maintains a record of its net worth, called its strength. Each matched classifier makes a bid B proportional to its strength, so highly fit classifiers are given preference.
o Clearing House
  When a classifier wins the auction, it clears its payment through the clearing house. A matched and activated classifier sends its bid to the classifiers responsible for sending the messages that matched its conditions.

Classifiers make bids (Bi) during the auction. Winning classifiers turn over their bids to the clearing house as payments (Pi). A classifier may have receipts Ri from its previous message-sending activity or from the environment, and it pays a tax Ti. Thus the strength of the ith classifier is Si(t+1) = Si(t) - Pi(t) - Ti(t) + Ri(t).

Bids are proportional to strength: Bi = Cbid × Si. If there is noise, the effective bid is EBi = Bi + N(σbid), where N(σbid) is zero-mean noise with standard deviation σbid. The winners pay their bids Bi, not EBi, to the clearing house. The tax of the classifier is Ti = Ctax × Si. So the difference equation is S(t+1) = S(t) - Cbid S(t) - Ctax S(t) + R(t), i.e. S(t+1) = (1 - k) S(t) + R(t), where k = Cbid + Ctax.

The system is stable only when R(t) is bounded. Leaving out R(t), S(t+1) = (1 - k) S(t), so S(n) = (1 - k)^n S(0) for an active classifier.
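The strength difference equation can be iterated directly in Python. The sketch assumes a classifier that is active on every step. The steady state R / k follows from setting S(t+1) = S(t) in S(t+1) = (1 − k) S(t) + R. The parameter values and function names are illustrative.

```python
import random


def strength_trajectory(s0, c_bid, c_tax, receipts):
    """Iterate S(t+1) = (1 - k) S(t) + R(t), k = Cbid + Ctax, for a classifier active every step."""
    k = c_bid + c_tax
    strengths = [s0]
    for r in receipts:
        strengths.append((1 - k) * strengths[-1] + r)
    return strengths


def bids(strength, c_bid, sigma_bid, rng):
    """Return (bid B paid to the clearing house, noisy effective bid EB used in the auction)."""
    b = c_bid * strength
    return b, b + rng.gauss(0.0, sigma_bid)


# With no receipts the strength decays as (1 - k)^n S(0).
print(strength_trajectory(100.0, 0.1, 0.0, [0.0] * 3)[-1])
# With a constant receipt R the strength approaches R / k.
print(strength_trajectory(0.0, 0.1, 0.01, [5.0] * 500)[-1], 5.0 / 0.11)
```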

GENETIC ALGORITHM
In order to inject new and better rules into the bucket brigade system, a Genetic Algorithm is used. Here it differs from the optimization case: a generation gap is used, so only part of the rule population is replaced at a time. Selection is through roulette wheel selection and De Jong's crowding procedure. Mutation changes a symbol to one of the other two symbols, e.g. 0 → {1, #}, with a given probability.
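Mutation over the ternary alphabet {0, 1, #} can be sketched in Python as follows. The function name is hypothetical.

```python
import random


def mutate_condition(condition, pm, rng):
    """Replace each symbol, with probability pm, by one of the other two in {0, 1, #}."""
    out = []
    for c in condition:
        if rng.random() < pm:
            c = rng.choice([s for s in "01#" if s != c])
        out.append(c)
    return "".join(out)


print(mutate_condition("01#0", 0.25, random.Random(7)))
```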

SIMPLE CLASSIFIER SYSTEM (SCS)
The SCS, a simple version of the classifier system, was developed. Experimental results show that the SCS with a Genetic Algorithm outperforms both the SCS without a Genetic Algorithm and random guessing.

11 : APPLICATIONS OF GENETIC BASED MACHINE LEARNING


GBML systems discover better computer programs by applying selection, recombination and other genetic operators.

Holland pioneered the theoretical foundation of GBML. This was followed by:

o The proposal of the Broadcast Language.
o The implementation of the first classifier system (Cognitive System Level One, CS-1).

Proposal to construct complex machines built from fixed components with schemata property.

CS-1
Classifier conditions are constructed over {0, 1, #}. There are several resources, e.g. hunger and thirst, and a separate reservoir is maintained for each resource. Current resource levels are used to determine current demand, which is then used to determine which rules to activate.

An epochal algorithm is used instead of the bucket brigade algorithm. An epoch is the time period between two payoff events. The parameters are the predicted payoff values ui. Let di be the current demand (i.e. how low the corresponding reservoir level is). Then the appropriation value is α = Σ di ui. The roulette wheel is weighted with αM, where M is a match score that increases with rule specificity, and roulette wheel selection is used to select the winner.

The epochal apportionment algorithm tracks the accuracy of a classifier's predicted payoff using three parameters:

o Age
o Frequency
o Attenuation

Results show the procedure outperforms other methods.
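CS-1's rule selection can be sketched in Python as a roulette wheel weighted by αM. The match score here counts the non-# positions of a condition. That is only one possible specificity measure, chosen as an assumption. The rules, demands and function names are hypothetical.

```python
import random


def appropriation(demands, predicted_payoffs):
    """alpha = sum_i d_i * u_i over the resources (e.g. hunger, thirst)."""
    return sum(d * u for d, u in zip(demands, predicted_payoffs))


def match_score(condition):
    """Assumed specificity measure: number of non-# positions."""
    return sum(c != "#" for c in condition)


def select_rule(matched_rules, demands, rng):
    """Roulette wheel weighted by alpha * M over the matched rules; returns an index."""
    weights = [appropriation(demands, u) * match_score(cond) for cond, u in matched_rules]
    return rng.choices(range(len(matched_rules)), weights=weights, k=1)[0]


demands = [0.8, 0.2]                                  # very hungry, slightly thirsty
rules = [("01#1", [10.0, 0.0]), ("0###", [0.0, 50.0])]
print(select_rule(rules, demands, random.Random(3)))
```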

LS-1 SYSTEM
Smith's LS-1 system has a different architecture. Holland's CS treats rules as individuals, whereas LS-1 treats rule sets as individuals in a population. It has four genetic operators:

o Reproduction
o Mutation
o Modified crossover
o Inversion (i.e. r1:r2:r3 → r3:r2:r1)

Results: good, but they cannot be compared with CS-1 because the performance measures are different.

BOOKER’S FOOD AND POISON LEARNER
Studies:

o The connection between classifier systems and cognitive science.
o Modifications to the Genetic Algorithm that support machine learning.
o Application of classifier systems to the problem of finding food and avoiding poison.

Uses the split architecture of CS-1. Introduces two mechanisms: sharing and marriage restriction. Sharing: classifiers whose conditions match the same pattern share the payoff. Marriage restriction: restricts mating of complementary patterns.

EYE-EYE COORDINATION
Wilson's classifier system for the sensory-motor coordination of a video camera. It learns to center an object in the video camera by moving the camera in the right direction. It uses a modified CS-1 architecture with a complex retina-cortex mapping. Instead of 1-dimensional strings, 4 by 4 arrays are used as rules, with 2-dimensional crossover.

ANIMAT CLASSIFIER SYSTEM
Wilson's roaming classifier system, which searches two-dimensional woods, seeking food and avoiding trees. It uses an 18 by 58 rectangular grid containing trees (T) and food (F). The ANIMAT, represented by *, knows only its immediate surroundings. E.g.

B T T
B * F
B B B

This generates the environmental message TTFBBBBB (the eight neighbors read clockwise, starting from the cell to the north). With T = 01, F = 11 and B = 00, the bit representation is 0101110000000000. There are eight classifiers to recognize this. There are four operators:

o Match set: the set of matching classifiers.
o Create operator: used when no classifier matches.
o Partial intersection operator: two rules with the same action are aligned by replacing mismatched positions with #.
o Time-to-payoff estimation.

PIPELINE OPERATIONS CLASSIFIER SYSTEM
Due to Goldberg. It has two parts:

o Optimization of pipeline operations by a Genetic Algorithm.
o Learning control of pipeline operations by classifier systems.

BOOLE
Due to Wilson. A classifier system that learns difficult Boolean functions. Applied to a function that uses NOT, AND and OR.

CL-ONE
Parallel semantic networks in a classifier framework, by Forrest. Developed a compiler to translate code written in the semantic network language KL-ONE into classifier system format, thus connecting symbolic Artificial Intelligence to classifier systems. It has four components:

o Parser and classifier generator
o Symbol table manager
o External command processor
o Classifier system

LEARNING SIMPLE AND SEQUENTIAL PROGRAMS
Due to Cramer. Shows that GAs can be used with programs that are not in production-rule format. Worked with the PL language and converted it to a simpler one called PL. Devised two coding schemes: JB and TB.

GENETIC PROGRAMMING
Due to Koza. Genetic Algorithms can be used to generate programs. Highly successful.

12 : INTRODUCTION

Challenge: how to manage ever-increasing amounts of information.
Solution: Data Mining and Knowledge Discovery in Databases (KDD).

INFORMATION AS A PRODUCTION FACTOR
Most international organizations produce more information in a week than many people could read in a lifetime. The human ability to learn and interpret is not sufficient, so mechanizing the filtering, selection and interpretation of data is important. E.g. the stock market.

COMPUTER SYSTEMS THAT CAN LEARN
Adaptation to the environment is natural, e.g. in plants, animals and human beings. Learning is a form of adaptation. Machines could be programmed to learn from mistakes, turning expert systems into learning systems.

DATA MINING MOTIVATION
The mechanical production of data creates the need for mechanical consumption of data. Large databases hold vast amounts of information, but the difficulty lies in accessing it.

KDD AND DATA MINING
KDD

o Extraction of knowledge from data.
o Official definition: “non-trivial extraction of implicit, previously unknown & potentially useful knowledge from data”.

Data Mining

o Discovery stage of the KDD process.
o Process of discovering patterns, automatically or semi-automatically, in large quantities of data.
o Patterns discovered must be useful: meaningful in that they lead to some advantage, usually economic.

Data mining is a multi-disciplinary field

DATA MINING VS. QUERY TOOLS
SQL: when you know exactly what you are looking for.
Data Mining: when you only vaguely know what you are looking for.

PRACTICAL APPLICATIONS
KDD is more complicated than initially thought:

o 80% preparing data
o 20% mining data

DATA MINING TECHNIQUES
Not so much a single technique; more the idea that there is more knowledge hidden in the data than shows itself on the surface.

13 : WHAT IS LEARNING?

LEARNING

An individual learns how to carry out a certain task by making a transition from a situation in which the task cannot be carried out to a situation in which the same task can be carried out under the same circumstances.

SELF-LEARNING COMPUTER SYSTEMS
A self-learning computer can generate programs itself, enabling it to carry out new tasks.

MACHINE LEARNING AND THE METHODOLOGY OF SCIENCE

[Figure: Empirical cycle of scientific research]

MACHINE LEARNING

Theory Formation

Theory Falsification
The patterns that machine learning programs find can never be definitive theories. They are only hypotheses with temporary value, so the results of machine learning programs need to be checked for their statistical relevance.

CONCEPT LEARNING
Recognition by experience. Classification accuracy. Transparency. Statistical relevance. Information content. Complexity of search space.

A KANGAROO IN MIST

Complexity of Search Spaces

So it is important to know the complexity of the search space beforehand. We can use learning algorithms for search:

o Supervised vs. unsupervised
o Background knowledge
o Bias (constraint)
o Batch learning vs. incremental learning
o Noise and redundancy

14: DATA MINING AND THE DATA WAREHOUSE

WHAT IS A DATA WAREHOUSE?
Defined in many different ways, but not rigorously.

o A decision support database that is maintained separately from the organization’s operational database.

o Support information processing by providing a solid platform of consolidated, historical data for analysis.

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” - W. H. Inmon.

Data Warehousing:


o The process of constructing and using data warehouses.

DATA WAREHOUSE
Subject-Oriented

o Organized around major subjects, such as customer, product and sales.
o Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
o Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process.

Integrated

o Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.
o Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among the different data sources. E.g. hotel price: currency, tax, whether breakfast is covered, etc.
o When data is moved to the warehouse, it is converted.

Time Variant

o The time horizon for the data warehouse is significantly longer than that of operational systems. An operational database holds current-value data, while data warehouse data provide information from a historical perspective (e.g. the past 5-10 years).
o Every key structure in the data warehouse contains an element of time, explicitly or implicitly, but the key of operational data may or may not contain a “time element”.

Non-volatile

o A physically separate store of data transformed from the operational environment.
o Operational update of data does not occur in the data warehouse environment, so it does not require transaction processing, recovery and concurrency control mechanisms.
o It requires only two operations in data accessing: initial loading of data and access of data.

DESIGNING DECISION SUPPORT SYSTEM
Requirements of the user. Hardware and software requirements. Integration with data mining.

CLIENT / SERVER AND DATA WAREHOUSE
Top down or bottom up. Requirement of data marts.

COST JUSTIFICATION
Speed. Complexity. Repetition. Comparison with expert systems.


15 : THE KNOWLEDGE DISCOVERY PROCESS

Pre-processing
o Data selection
o Cleaning
o Coding

Data Mining
o Select a model
o Apply the model

Analysis of results and assimilation
o Take action and measure the results

THE KDD PROCESS

[Figure: The KDD process: data selection (operational and external data, information requirements) → cleaning (domain consistency, de-duplication, disambiguation) → enrichment → coding → data mining (clustering, segmentation, prediction) → reporting → action, with feedback]

DATA PREPROCESSING
Data Selection

o Identify the relevant data, both internal and external to the organization.
o Select the subset of the data appropriate for the particular data mining application.
o Store the data in a database separate from the operational systems.

Cleaning
o Domain consistency: replace certain values with null.
o De-duplication: customers are often added to the DB on each purchase transaction.
o Disambiguation: highlight ambiguities for a decision by the user, e.g. if names differ slightly but the addresses are the same.

Enrichment
o Additional fields from external sources, which may be vital in establishing relationships, are added to the records.

Coding
o E.g. take addresses and replace them with regional codes.
o E.g. transform birth dates into age ranges. It is often necessary to convert continuous data into range data for categorization purposes.
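The birth-date coding step can be sketched in Python with the standard `datetime` module. The 10-year band width and the function names are hypothetical choices.

```python
from datetime import date


def age_on(birth, today):
    """Completed years between two dates."""
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))


def age_range(birth, today, width=10):
    """Code a birth date as an age band such as '30-39' (band width assumed)."""
    low = age_on(birth, today) // width * width
    return f"{low}-{low + width - 1}"


print(age_range(date(1990, 6, 15), date(2024, 6, 14)))   # age 33 -> 30-39
```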

DATA MINING
Preliminary Analysis

o Much interesting information can be found by querying the data set.
o This may be supported by a visualization of the data set.

Choose one or more modeling approaches. There are two styles of data mining:

o Hypothesis testing
o Knowledge discovery

The styles and approaches are not mutually exclusive.

DATA MINING TECHNIQUES
Not so much a single technique; more the idea that there is more knowledge hidden in the data than shows itself on the surface. Any technique that helps to extract more out of data is useful:

o Query Tools
  About 80% of the interesting information can be obtained with query tools, but the remaining 20% is more vital to the business. Query tools can provide a naïve prediction; any better algorithm has to give better results than this.
o Statistical Techniques
o Visualization
  Gives a better idea about data sets and possible patterns, e.g. a scatter diagram.
  Likelihood & distance: records become points in a multidimensional data space, and similar records are close to each other in that space. Thus we can find clusters in the space, e.g. possible customers of a product.
o On-Line Analytical Processing (OLAP)
  Assumes that multidimensional data are available, so information can be accessed according to business requirements.
  OLAP vs. Data Mining: OLAP tools do not learn and produce no knowledge; they cannot search for solutions. Data mining is more powerful.
o Case-Based Learning (k-nearest neighbor)
  Records that are close to each other live in each other's neighborhood. To predict the behavior of an individual, look at the behavior of its neighbors and take their average; that is the prediction for the individual. K is the number of neighbors.
  For classification, the training set includes classes. Examine the K items nearest to the item to be classified; the new item is placed in the class with the most close items.

[Figure: KNN algorithm]
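Both uses of the neighbourhood idea can be sketched in Python: a majority vote for classification, and an average for prediction. Euclidean distance (`math.dist`) is an assumed choice of metric. The training records are made up, and the names are hypothetical.

```python
import math
from collections import Counter


def k_nearest(training, item, k):
    """The k records closest to item; training is a list of (feature_vector, value)."""
    return sorted(training, key=lambda rec: math.dist(rec[0], item))[:k]


def knn_classify(training, item, k):
    """Place the item in the class with the most records among its k nearest neighbours."""
    votes = Counter(label for _, label in k_nearest(training, item, k))
    return votes.most_common(1)[0][0]


def knn_predict(training, item, k):
    """Predict a numeric value as the average over the k nearest neighbours."""
    return sum(value for _, value in k_nearest(training, item, k)) / k


training = [((1, 1), "A"), ((1, 2), "A"), ((5, 5), "B"), ((6, 5), "B"), ((5, 6), "B")]
print(knn_classify(training, (1.5, 1.5), k=3))   # A
print(knn_classify(training, (5, 5.5), k=3))     # B
```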

o Decision Trees
  Tree representation of data. Can identify the conditions, e.g. car ownership depending on age.
o Association Rules
  Identify matching habits, e.g. blue jeans and white T-shirts.
  Example: market basket data. Items frequently purchased together: bread and butter.
  Uses:
  o Placement
  o Advertising
  o Sales
  o Coupons
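Association rules are commonly scored by support (the fraction of baskets that contain the items) and confidence (how often the consequent appears when the antecedent does). These are the standard measures, although the notes do not name them. The Python sketch below computes both on made-up baskets and lists the frequent item pairs. The function names are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of baskets containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent+consequent divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def frequent_pairs(transactions, min_support):
    """All item pairs bought together in at least min_support of the baskets."""
    items = sorted({i for t in transactions for i in t})
    return [(a, b) for a, b in combinations(items, 2)
            if support(transactions, {a, b}) >= min_support]


baskets = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread", "jam"}, {"milk"}]
print(frequent_pairs(baskets, 0.5))                   # [('bread', 'butter')]
print(confidence(baskets, {"bread"}, {"butter"}))     # bread -> butter
```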

  Objective: increase sales and reduce costs.
o Neural Networks
  Mainly 3 models:
  Perceptrons
  o Perceptron is one of the simplest neural networks.
  o No hidden layers.
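A single-layer perceptron can be trained with the classic perceptron learning rule. That rule is an assumption here, since the notes do not describe training. The Python sketch below learns the AND and OR functions, which are linearly separable and therefore need no hidden layer. The names are hypothetical.

```python
def predict(w, b, x):
    """Threshold unit: output 1 if the weighted sum plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """samples: list of (inputs, target) with target 0 or 1. Returns (weights, bias)."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)       # -1, 0 or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print(w, b, [predict(w, b, x) for x, _ in AND])
```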


  Back propagation networks
  Kohonen Self Organizing Maps
  o The brain has different areas, such as visual maps and maps of spatial possibilities.
  o Initially, a SOM has a random assignment of vectors to each unit.
  o During training, these vectors are incrementally adjusted to give better coverage of the input space.
  o Competitive unsupervised learning.
  o Observe how neurons work in the brain:
    Firing impacts the firing of nearby neurons.
    Neurons far apart inhibit each other.
    Neurons have specific non-overlapping tasks.
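A one-dimensional self-organizing map can be sketched in Python to show the two ideas in the list: competition, where the closest unit wins, and cooperation, where the winner's grid neighbours also move toward the input. The linearly shrinking learning rate and neighbourhood radius are assumed choices. This simplification does not model inhibition between distant units. The data and names are hypothetical.

```python
import random


def train_som(data, n_units, epochs, rng, lr0=0.5):
    """1-D Kohonen map: units start as random vectors and are pulled toward the data."""
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = n_units / 2
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs               # shrink learning rate and neighbourhood
        lr, radius = lr0 * frac, max(1.0, radius0 * frac)
        for x in data:
            # Competition: the best-matching unit is the closest vector.
            bmu = min(range(n_units),
                      key=lambda j: sum((u - v) ** 2 for u, v in zip(units[j], x)))
            # Cooperation: the winner and its grid neighbours move toward x.
            for j in range(n_units):
                if abs(j - bmu) <= radius:
                    units[j] = [u + lr * (v - u) for u, v in zip(units[j], x)]
    return units


rng = random.Random(0)
data = [[rng.random(), rng.random()] for _ in range(50)]
print(train_som(data, n_units=4, epochs=10, rng=rng))
```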

o Genetic Algorithms (Refer Previous Notes)

16 : SETTING UP A KDD ENVIRONMENT

DIFFERENT FORMS OF KNOWLEDGE
Shallow: easily retrievable using SQL.
Multidimensional: can use OLAP.
Hidden: can use pattern recognition algorithms.
Deep: can be discovered only with clues, e.g. decryption is possible only through keys.

SIX STAGES OF KDD
o Data selection
o Cleaning
o Enrichment
o Coding
o Data mining
o Reporting

DATA MINING
Different types of tasks can be tackled by suitable techniques:

o Classification tasks: association rules, k-nearest neighbor, decision trees.
o Problem solving tasks: Genetic Algorithm.
o Knowledge engineering tasks: inductive logic algorithms.

A data mining algorithm is selected based on:
o Quality of input
o Quality of output
o Performance

10 RULES FOR SETTING UP A GOOD DATA MINING ENVIRONMENT
o Support extremely large data sets.
o Support hybrid learning: classification …
o Establish a data warehouse.
o Introduce data cleaning facilities.
o Facilitate working with dynamic coding.
o Integrate with DSS.
o Choose an extendable architecture.
o Support heterogeneous databases.
o Introduce client-server architecture.
o Introduce cache optimization.

17 : SOME REAL-LIFE APPLICATIONS

o Customer profiling for a large bank.
o CAPTAIN: a career planner for pilots at KLM airlines.
o Discovering foreign key relationships.

18 : SOME FORMAL ASPECTS OF LEARNING ALGORITHMS

LEARNING AS A COMPRESSION OF DATA
Learning reduces the search space, i.e. learning is similar to data compression. Not all compressions are easily learnable, e.g. decryption.

LEARNING ALGORITHM AS A BLACK BOX
The algorithm takes an input message file and produces an output message file. Types of messages:

o Unstructured or random messages
o Highly structured messages with patterns that are easy to find
o Highly structured data sets that are difficult to decipher
o Partially structured data sets
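The link between learning and compression can be illustrated with Python's `zlib` module, used here as a stand-in for a learner; that stand-in is only an analogy. Three kinds of message are compared:

- A random message does not compress.
- A message with an easy pattern compresses very well.
- The same structured message XOR-ed with a random key (a simple encryption) has a hidden pattern that zlib cannot find, so it looks random.

The message sizes and the seed are arbitrary.

```python
import random
import zlib


def compression_ratio(message: bytes) -> float:
    """Compressed size / original size (zlib, maximum compression level)."""
    return len(zlib.compress(message, 9)) / len(message)


rng = random.Random(42)
random_msg = bytes(rng.getrandbits(8) for _ in range(10_000))             # unstructured
structured_msg = b"ABCD" * 2_500                                            # easy pattern
encrypted_msg = bytes(a ^ b for a, b in zip(structured_msg, random_msg))   # hidden pattern

for name, msg in [("random", random_msg), ("structured", structured_msg),
                  ("encrypted", encrypted_msg)]:
    print(name, round(compression_ratio(msg), 3))
```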
