CS7015 (Deep Learning) : Lecture 2
McCulloch Pitts Neuron, Thresholding Logic, Perceptrons, Perceptron Learning Algorithm and Convergence, Multilayer Perceptrons (MLPs), Representation Power of MLPs

Mitesh M. Khapra
Department of Computer Science and Engineering
Indian Institute of Technology Madras



Module 2.1: Biological Neurons


Artificial Neuron

[Figure: an artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, an aggregation σ, and output y]

The most fundamental unit of a deep neural network is called an artificial neuron

Why is it called a neuron? Where does the inspiration come from?

The inspiration comes from biology (more specifically, from the brain)

biological neurons = neural cells = neural processing units

We will first see what a biological neuron looks like ...


Biological Neurons∗

∗Image adapted from https://cdn.vectorstock.com/i/composite/12,25/neuron-cell-vector-81225.jpg

dendrite: receives signals from other neurons

synapse: point of connection to other neurons

soma: processes the information

axon: transmits the output of this neuron


Let us see a very cartoonish illustration of how a neuron works

Our sense organs interact with the outside world

They relay information to the neurons

The neurons (may) get activated and produce a response (laughter in this case)


Of course, in reality, it is not just a single neuron which does all this

There is a massively parallel interconnected network of neurons

The sense organs relay information to the lowest layer of neurons

Some of these neurons may fire (in red) in response to this information and in turn relay information to other neurons they are connected to

These neurons may also fire (again, in red) and the process continues, eventually resulting in a response (laughter in this case)

An average human brain has around 10^11 (100 billion) neurons!


A simplified illustration

This massively parallel network also ensures that there is division of work

Each neuron may perform a certain role or respond to a certain stimulus


The neurons in the brain are arranged in a hierarchy

We illustrate this with the help of the visual cortex (the part of the brain which deals with processing visual information)

Starting from the retina, the information is relayed to several layers (follow the arrows)

We observe that the layers V1, V2 to AIT form a hierarchy (from identifying simple visual forms to high level objects)


Sample illustration of hierarchical processing∗

∗Idea borrowed from Hugo Larochelle’s lecture slides


Disclaimer

I understand very little about how the brain works!

What you saw so far is an overly simplified explanation of how the brain works!

But this explanation suffices for the purpose of this course!


Module 2.2: McCulloch Pitts Neuron


[Figure: MP neuron with inputs x1, x2, ..., xn ∈ {0, 1}, an aggregation function g, a decision function f, and output y ∈ {0, 1}]

McCulloch (neuroscientist) and Pitts (logician) proposed a highly simplified computational model of the neuron (1943)

g aggregates the inputs and the function f takes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) = ∑_{i=1}^{n} xi

y = f(g(x)) = 1 if g(x) ≥ θ
            = 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic
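The definitions above can be sketched directly in code. This is a minimal illustration, not code from the lecture; the names mp_neuron, theta, and inhibitory are my own choices.

```python
# Minimal sketch of the McCulloch Pitts neuron described above.
# Inputs are binary (0/1); g sums them, f thresholds the sum at theta.
# An active inhibitory input forces the output to 0.

def mp_neuron(inputs, theta, inhibitory=()):
    """inputs: list of values in {0, 1}; theta: threshold;
    inhibitory: indices of inhibitory inputs."""
    if any(inputs[i] == 1 for i in inhibitory):
        return 0                       # y = 0 if any inhibitory input fires
    g = sum(inputs)                    # g(x) = sum_i x_i
    return 1 if g >= theta else 0      # f(g(x)) thresholds g(x) at theta

print(mp_neuron([1, 1, 0], theta=2))                   # g(x) = 2 >= theta, so 1
print(mp_neuron([1, 1, 0], theta=2, inhibitory=(0,)))  # x1 is inhibitory and on, so 0
```

Note that the MP neuron has a single free parameter, θ; there are no learned weights yet.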

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 41: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1

x2 .. .. xn

∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs

and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 42: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2

.. .. xn

∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs

and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 43: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2 .. ..

xn

∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs

and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 44: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2 .. .. xn

∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs

and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 45: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2 .. .. xn ∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs

and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 46: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2 .. .. xn ∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs

and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 47: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2 .. .. xn ∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 48: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2 .. .. xn ∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 49: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2 .. .. xn ∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 50: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2 .. .. xn ∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 51: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

12/69

x1 x2 .. .. xn ∈ {0, 1}

y ∈ {0, 1}

g

f

McCulloch (neuroscientist) and Pitts (logi-cian) proposed a highly simplified computa-tional model of the neuron (1943)

g aggregates the inputs and the function ftakes a decision based on this aggregation

The inputs can be excitatory or inhibitory

y = 0 if any xi is inhibitory, else

g(x1, x2, ..., xn) = g(x) =

n∑i=1

xi

y = f(g(x)) = 1 if g(x) ≥ θ= 0 if g(x) < θ

θ is called the thresholding parameter

This is called Thresholding Logic

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

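The thresholding logic above can be sketched directly in code. This is a minimal illustration, not from the lecture; the function name and argument layout are my own:

```python
def mp_neuron(inputs, inhibitory, theta):
    """McCulloch-Pitts unit: binary inputs, a threshold theta,
    and per-input inhibitory flags that force the output to 0."""
    # If any inhibitory input fires, the output is 0.
    if any(x == 1 for x, inh in zip(inputs, inhibitory) if inh):
        return 0
    # g aggregates the inputs; f thresholds the aggregate at theta.
    g = sum(inputs)
    return 1 if g >= theta else 0

# Example: three excitatory inputs, theta = 2
print(mp_neuron([1, 1, 0], [False, False, False], theta=2))  # fires: 1
```

Note that the unit has no learned parameters: the only knob is θ, which must be hand-set per function.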

13/69

Let us implement some boolean functions using this McCulloch Pitts (MP) neuron ...


14/69

A McCulloch Pitts unit: inputs x1 x2 x3, threshold θ, output y ∈ {0, 1}

AND function: inputs x1 x2 x3, θ = 3, output y ∈ {0, 1}

OR function: inputs x1 x2 x3, θ = 1, output y ∈ {0, 1}

x1 AND !x2∗: input x1, inhibitory input x2, θ = 1, output y ∈ {0, 1}

NOR function: inhibitory inputs x1 x2, θ = 0, output y ∈ {0, 1}

NOT function: inhibitory input x1, θ = 0, output y ∈ {0, 1}

∗circle at the end indicates inhibitory input: if any inhibitory input is 1 the output will be 0
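Setting θ as on the slide reproduces each gate. A quick check (my own sketch; `mp` is a hypothetical helper, with inhibitory inputs given by index):

```python
def mp(inputs, theta, inhibitory=()):
    # Inhibitory inputs (by index) force the output to 0 when they are 1.
    if any(inputs[i] == 1 for i in inhibitory):
        return 0
    return 1 if sum(inputs) >= theta else 0

# AND of 3 inputs: theta = 3 (all inputs must fire)
assert mp([1, 1, 1], theta=3) == 1 and mp([1, 1, 0], theta=3) == 0
# OR of 3 inputs: theta = 1 (any one input suffices)
assert mp([0, 0, 1], theta=1) == 1 and mp([0, 0, 0], theta=1) == 0
# NOR of 2 inputs: both inhibitory, theta = 0
assert mp([0, 0], theta=0, inhibitory=(0, 1)) == 1
assert mp([1, 0], theta=0, inhibitory=(0, 1)) == 0
# NOT: single inhibitory input, theta = 0
assert mp([0], theta=0, inhibitory=(0,)) == 1
assert mp([1], theta=0, inhibitory=(0,)) == 0
# x1 AND !x2: x2 inhibitory, theta = 1
assert mp([1, 0], theta=1, inhibitory=(1,)) == 1
assert mp([1, 1], theta=1, inhibitory=(1,)) == 0
```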

15/69

Can any boolean function be represented using a McCulloch Pitts unit ?

Before answering this question let us first see the geometric interpretation of an MP unit ...


Page 75: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

16/69

x1 x2

y ∈ {0, 1}

1

OR functionx1 + x2 =

∑2i=1 xi ≥ 1

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

x1 + x2 = θ = 1

A single MP neuron splits the input points (4points for 2 binary inputs) into two halves

Points lying on or above the line∑n

i=1 xi−θ =0 and points lying below this line

In other words, all inputs which produce anoutput 0 will be on one side (

∑ni=1 xi < θ)

of the line and all inputs which produce anoutput 1 will lie on the other side (

∑ni=1 xi ≥

θ) of this line

Let us convince ourselves about this with afew more examples (if it is not already clearfrom the math)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 76: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

16/69

x1 x2

y ∈ {0, 1}

1

OR functionx1 + x2 =

∑2i=1 xi ≥ 1

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

x1 + x2 = θ = 1

A single MP neuron splits the input points (4points for 2 binary inputs) into two halves

Points lying on or above the line∑n

i=1 xi−θ =0 and points lying below this line

In other words, all inputs which produce anoutput 0 will be on one side (

∑ni=1 xi < θ)

of the line and all inputs which produce anoutput 1 will lie on the other side (

∑ni=1 xi ≥

θ) of this line

Let us convince ourselves about this with afew more examples (if it is not already clearfrom the math)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 77: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

17/69

x1 x2

y ∈ {0, 1}

2

AND functionx1 + x2 =

∑2i=1 xi ≥ 2

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 78: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

17/69

x1 x2

y ∈ {0, 1}

2

AND functionx1 + x2 =

∑2i=1 xi ≥ 2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

x1 + x2 = θ = 2

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 79: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

17/69

x1 x2

y ∈ {0, 1}

2

AND functionx1 + x2 =

∑2i=1 xi ≥ 2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

x1 + x2 = θ = 2

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 80: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

17/69

x1 x2

y ∈ {0, 1}

2

AND functionx1 + x2 =

∑2i=1 xi ≥ 2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

x1 + x2 = θ = 2

x1 x2

y ∈ {0, 1}

Tautology (always ON)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 81: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

17/69

x1 x2

y ∈ {0, 1}

2

AND functionx1 + x2 =

∑2i=1 xi ≥ 2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

x1 + x2 = θ = 2

x1 x2

y ∈ {0, 1}

0

Tautology (always ON)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 82: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

17/69

x1 x2

y ∈ {0, 1}

2

AND functionx1 + x2 =

∑2i=1 xi ≥ 2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

x1 + x2 = θ = 2

x1 x2

y ∈ {0, 1}

0

Tautology (always ON)

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

x1 + x2 = θ = 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

17/69

AND function: inputs x1 x2, θ = 2, output y ∈ {0, 1}
x1 + x2 = ∑_{i=1}^{2} x_i ≥ 2

[Plot: the four input points (0, 0), (0, 1), (1, 0), (1, 1) with the line x1 + x2 = θ = 2]

Tautology (always ON): inputs x1 x2, θ = 0, output y ∈ {0, 1}

[Plot: the four input points (0, 0), (0, 1), (1, 0), (1, 1) with the line x1 + x2 = θ = 0]


18/69

OR function: inputs x1 x2 x3, θ = 1, output y ∈ {0, 1}

[Plot: the eight input points (0, 0, 0), (0, 0, 1), ..., (1, 1, 1) in the x1-x2-x3 space, with the plane x1 + x2 + x3 = θ = 1]

What if we have more than 2 inputs?

Well, instead of a line we will have a plane

For the OR function, we want a plane such that the point (0,0,0) lies on one side and the remaining 7 points lie on the other side of the plane
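The three-input split can be verified by checking all 8 corners of the unit cube against the plane (a small sketch of my own):

```python
from itertools import product

theta = 1  # OR function over three binary inputs
for x in product([0, 1], repeat=3):
    side = "on/above" if sum(x) >= theta else "below"
    print(x, "->", side)  # only (0, 0, 0) falls below the plane
```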

19/69

The story so far ...

A single McCulloch Pitts Neuron can be used to represent boolean functions which are linearly separable

Linear separability (for boolean functions) : There exists a line (plane) such that all inputs which produce a 1 lie on one side of the line (plane) and all inputs which produce a 0 lie on the other side of the line (plane)
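For an MP neuron with only excitatory unit-weight inputs, the separating line is fully determined by θ, so representability can be checked by brute force over all thresholds. This is my own sketch (it tests the restricted unit-weight case, not linear separability in general):

```python
from itertools import product

def mp_representable(f, n):
    """Return a threshold theta such that f(x) = 1 exactly when
    sum(x) >= theta for all binary inputs x, or None if no theta works."""
    for theta in range(n + 2):  # theta in {0, ..., n+1} covers all cases
        if all(f(x) == (1 if sum(x) >= theta else 0)
               for x in product([0, 1], repeat=n)):
            return theta
    return None

print(mp_representable(lambda x: x[0] & x[1], 2))  # AND: theta = 2
print(mp_representable(lambda x: x[0] | x[1], 2))  # OR: theta = 1
print(mp_representable(lambda x: x[0] ^ x[1], 2))  # XOR: None
```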

20/69

Module 2.3: Perceptron

21/69

The story ahead ...

What about non-boolean (say, real) inputs ?

Do we always need to hand code the threshold ?

Are all inputs equal ? What if we want to assign more weight (importance) to some inputs ?

What about functions which are not linearly separable ?

22/69

Inputs x1 x2 .. .. xn with weights w1 w2 .. .. wn, output y

Frank Rosenblatt, an American psychologist, proposed the classical perceptron model (1958)

A more general computational model than McCulloch–Pitts neurons

Main differences: introduction of numerical weights for inputs and a mechanism for learning these weights

Inputs are no longer limited to boolean values

Refined and carefully analyzed by Minsky and Papert (1969) - their model is referred to as the perceptron model here


Page 112: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

23/69

x1 x2 .. .. xnx0 = 1

y

w1 w2 .. .. wnw0 = −θ

A more accepted convention,

y = 1 if

n∑i=0

wi ∗ xi ≥ 0

= 0 if

n∑i=0

wi ∗ xi < 0

where, x0 = 1 and w0 = −θ

y = 1 if

n∑i=1

wi ∗ xi ≥ θ

= 0 if

n∑i=1

wi ∗ xi < θ

Rewriting the above,

y = 1 if

n∑i=1

wi ∗ xi − θ ≥ 0

= 0 if

n∑i=1

wi ∗ xi − θ < 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
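The convention above maps directly to code. A minimal sketch (the function name and the particular weight values are illustrative, not from the slides):

```python
def perceptron_output(w, x):
    """Perceptron with the accepted convention: w[0] = -theta and x[0] = 1.

    w and x are equal-length sequences; returns 1 if the weighted sum
    crosses the threshold (i.e. is >= 0), else 0.
    """
    total = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if total >= 0 else 0

# Example with theta = 1 (so w0 = -1) and w1 = w2 = 1.1:
w = [-1, 1.1, 1.1]
print(perceptron_output(w, [1, 0, 0]))  # x1 = x2 = 0: sum = -1 < 0 -> 0
print(perceptron_output(w, [1, 1, 0]))  # x1 = 1: sum = 0.1 >= 0 -> 1
```

Folding −θ into the weight vector as w0 is what lets a learning algorithm treat the threshold like any other weight.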

Page 113: CS7015 (Deep Learning) : Lecture 2

24/69

We will now try to answer the following questions:

Why are we trying to implement boolean functions?

Why do we need weights?

Why is w0 = −θ called the bias?

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 114: CS7015 (Deep Learning) : Lecture 2

25/69

[Perceptron diagram: inputs x0 = 1, x1, x2, x3 with weights w0 = −θ, w1, w2, w3 feeding output y]

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan

Consider the task of predicting whether we would like a movie or not

Suppose we base our decision on 3 inputs (binary, for simplicity)

Based on our past viewing experience (data), we may give a high weight to isDirectorNolan as compared to the other inputs

Specifically, even if the actor is not Matt Damon and the genre is not thriller, we would still want to cross the threshold θ by assigning a high weight to isDirectorNolan

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


Page 118: CS7015 (Deep Learning) : Lecture 2

25/69

w0 is called the bias as it represents the prior (prejudice)

A movie buff may have a very low threshold and may watch any movie irrespective of the genre, actor or director [θ = 0]

On the other hand, a selective viewer may only watch thrillers starring Matt Damon and directed by Nolan [θ = 3]

The weights (w1, w2, ..., wn) and the bias (w0) will depend on the data (viewer history in this case)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

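To make the role of the bias concrete, here is a small sketch of the movie example. The feature weights are assumed values chosen to match the story (high weight on isDirectorNolan); only the thresholds θ = 0 and θ = 3 come from the slides:

```python
def will_watch(theta, w, x):
    """Returns 1 if the weighted feature sum crosses the threshold theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else 0

# Features: [isActorDamon, isGenreThriller, isDirectorNolan]
movie = [0, 0, 1]            # only the director matches
w = [0.2, 0.2, 1.0]          # assumed weights: isDirectorNolan dominates

print(will_watch(0, w, movie))   # movie buff (theta = 0) watches anything -> 1
print(will_watch(3, w, movie))   # selective viewer (theta = 3) -> 0
print(will_watch(1, w, movie))   # Nolan's weight alone crosses theta = 1 -> 1
```

The same weights give different decisions purely because the bias shifts how much evidence is needed before the neuron fires.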

Page 122: CS7015 (Deep Learning) : Lecture 2

26/69

What kind of functions can be implemented using the perceptron? Any difference from McCulloch Pitts neurons?

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 123: CS7015 (Deep Learning) : Lecture 2

27/69

McCulloch Pitts Neuron (assuming no inhibitory inputs)

y = 1 if ∑_{i=1}^{n} x_i ≥ θ
  = 0 if ∑_{i=1}^{n} x_i < θ

Perceptron

y = 1 if ∑_{i=0}^{n} w_i ∗ x_i ≥ 0
  = 0 if ∑_{i=0}^{n} w_i ∗ x_i < 0

From the equations it should be clear that even a perceptron separates the input space into two halves

All inputs which produce a 1 lie on one side and all inputs which produce a 0 lie on the other side

In other words, a single perceptron can only be used to implement linearly separable functions

Then what is the difference? The weights (including threshold) can be learned and the inputs can be real valued

We will first revisit some boolean functions and then see the perceptron learning algorithm (for learning weights)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

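The contrast between the two models can be sketched directly. A minimal illustration: the AND gate with threshold 2 for the MP neuron, and assumed real-valued weights for the perceptron version of the same function:

```python
def mp_neuron(x, theta):
    """McCulloch Pitts neuron (no inhibitory inputs): fires iff enough binary inputs are on."""
    return 1 if sum(x) >= theta else 0

def perceptron(w, x):
    """Perceptron with w[0] = -theta, x[0] = 1; weights (and inputs) may be real valued."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

# MP neuron: AND of two boolean inputs (theta = 2, fixed by hand)
print(mp_neuron([1, 1], 2))                   # -> 1
print(mp_neuron([1, 0], 2))                   # -> 0

# Perceptron: same AND, but via real-valued weights that could be learned
print(perceptron([-2, 1.5, 1.5], [1, 1, 1]))  # -2 + 1.5 + 1.5 = 1 >= 0 -> 1
print(perceptron([-2, 1.5, 1.5], [1, 1, 0]))  # -2 + 1.5 = -0.5 < 0 -> 0
```

Both compute the same boolean function here; the practical difference is that the perceptron's parameters are continuous and adjustable, which is what the learning algorithm exploits.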

Page 130: CS7015 (Deep Learning) : Lecture 2

28/69

x1  x2  OR
0   0   0    w0 + ∑_{i=1}^{2} w_i ∗ x_i < 0
1   0   1    w0 + ∑_{i=1}^{2} w_i ∗ x_i ≥ 0
0   1   1    w0 + ∑_{i=1}^{2} w_i ∗ x_i ≥ 0
1   1   1    w0 + ∑_{i=1}^{2} w_i ∗ x_i ≥ 0

w0 + w1 · 0 + w2 · 0 < 0  =⇒  w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0  =⇒  w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0  =⇒  w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 ≥ 0  =⇒  w1 + w2 ≥ −w0

One possible solution to this set of inequalities is w0 = −1, w1 = 1.1, w2 = 1.1 (and various other solutions are possible)

[Plot: the four inputs (0, 0), (0, 1), (1, 0), (1, 1) in the x1–x2 plane, separated by the line −1 + 1.1x1 + 1.1x2 = 0]

Note that we can come up with a similar set of inequalities and find the value of θ for a McCulloch Pitts neuron also (Try it!)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

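The proposed solution can be checked against the full OR truth table in a few lines:

```python
# Verify that w0 = -1, w1 = 1.1, w2 = 1.1 satisfies all four inequalities,
# i.e. the perceptron output matches boolean OR on every input.
w0, w1, w2 = -1, 1.1, 1.1

for x1 in (0, 1):
    for x2 in (0, 1):
        y = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
        assert y == (x1 or x2)   # perceptron output equals OR(x1, x2)

print("all four OR cases satisfied")
```

Any other weights satisfying w0 < 0, w1 ≥ −w0, w2 ≥ −w0 and w1 + w2 ≥ −w0 would pass the same check, which is why the slide notes that many solutions exist.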

Page 148: CS7015 (Deep Learning) : Lecture 2

29/69

Module 2.4: Errors and Error Surfaces

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 149: CS7015 (Deep Learning) : Lecture 2

30/69

Let us fix the threshold (−w0 = 1) and try different values of w1, w2

Say, w1 = −1, w2 = −1

What is wrong with this line? We make an error on 3 out of the 4 inputs

Let's try some more values of w1, w2 and note how many errors we make

w1    w2    errors
-1    -1    3
1.5   0     1
0.45  0.45  3

We are interested in those values of w0, w1, w2 which result in 0 error

Let us plot the error surface corresponding to different values of w0, w1, w2

[Plot: the four inputs (0, 0), (0, 1), (1, 0), (1, 1) in the x1–x2 plane, together with the candidate lines −1 + 1.1x1 + 1.1x2 = 0, −1 + (−1)x1 + (−1)x2 = 0, −1 + (1.5)x1 + (0)x2 = 0 and −1 + (0.45)x1 + (0.45)x2 = 0]

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


Lets try some more values of w1, w2 and notehow many errors we make

w1 w2 errors

-1 -1 31.5 0 10.45 0.45 3

We are interested in those values of w0, w1, w2

which result in 0 error

Let us plot the error surface corresponding todifferent values of w0, w1, w2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

−1 + 1.1x1 + 1.1x2 = 0

−1 + (−1)x1 + (−1)x2 = 0

−1 + (1.5)x1 + (0)x2 = 0

−1 + (0.45)x1 + (0.45)x2 = 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 154: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

30/69

Let us fix the threshold (−w0 = 1) and trydifferent values of w1, w2

Say, w1 = −1, w2 = −1

What is wrong with this line? We make anerror on 1 out of the 4 inputs

Lets try some more values of w1, w2 and notehow many errors we make

w1 w2 errors

-1 -1 3

1.5 0 10.45 0.45 3

We are interested in those values of w0, w1, w2

which result in 0 error

Let us plot the error surface corresponding todifferent values of w0, w1, w2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

−1 + 1.1x1 + 1.1x2 = 0

−1 + (−1)x1 + (−1)x2 = 0

−1 + (1.5)x1 + (0)x2 = 0

−1 + (0.45)x1 + (0.45)x2 = 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 155: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

30/69

Let us fix the threshold (−w0 = 1) and trydifferent values of w1, w2

Say, w1 = −1, w2 = −1

What is wrong with this line? We make anerror on 1 out of the 4 inputs

Lets try some more values of w1, w2 and notehow many errors we make

w1 w2 errors

-1 -1 31.5 0 1

0.45 0.45 3

We are interested in those values of w0, w1, w2

which result in 0 error

Let us plot the error surface corresponding todifferent values of w0, w1, w2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

−1 + 1.1x1 + 1.1x2 = 0

−1 + (−1)x1 + (−1)x2 = 0

−1 + (1.5)x1 + (0)x2 = 0

−1 + (0.45)x1 + (0.45)x2 = 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 156: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

30/69

Let us fix the threshold (−w0 = 1) and trydifferent values of w1, w2

Say, w1 = −1, w2 = −1

What is wrong with this line? We make anerror on 1 out of the 4 inputs

Lets try some more values of w1, w2 and notehow many errors we make

w1 w2 errors

-1 -1 31.5 0 10.45 0.45 3

We are interested in those values of w0, w1, w2

which result in 0 error

Let us plot the error surface corresponding todifferent values of w0, w1, w2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

−1 + 1.1x1 + 1.1x2 = 0

−1 + (−1)x1 + (−1)x2 = 0

−1 + (1.5)x1 + (0)x2 = 0

−1 + (0.45)x1 + (0.45)x2 = 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 157: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

30/69

Let us fix the threshold (−w0 = 1) and trydifferent values of w1, w2

Say, w1 = −1, w2 = −1

What is wrong with this line? We make anerror on 1 out of the 4 inputs

Lets try some more values of w1, w2 and notehow many errors we make

w1 w2 errors

-1 -1 31.5 0 10.45 0.45 3

We are interested in those values of w0, w1, w2

which result in 0 error

Let us plot the error surface corresponding todifferent values of w0, w1, w2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

−1 + 1.1x1 + 1.1x2 = 0

−1 + (−1)x1 + (−1)x2 = 0

−1 + (1.5)x1 + (0)x2 = 0

−1 + (0.45)x1 + (0.45)x2 = 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 158: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

30/69

Let us fix the threshold (−w0 = 1) and trydifferent values of w1, w2

Say, w1 = −1, w2 = −1

What is wrong with this line? We make anerror on 1 out of the 4 inputs

Lets try some more values of w1, w2 and notehow many errors we make

w1 w2 errors

-1 -1 31.5 0 10.45 0.45 3

We are interested in those values of w0, w1, w2

which result in 0 error

Let us plot the error surface corresponding todifferent values of w0, w1, w2

x1

x2

(0, 0)

(0, 1)

(1, 0)

(1, 1)

−1 + 1.1x1 + 1.1x2 = 0

−1 + (−1)x1 + (−1)x2 = 0

−1 + (1.5)x1 + (0)x2 = 0

−1 + (0.45)x1 + (0.45)x2 = 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 159: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

31/69

For ease of analysis, we will keep w0 fixed (−1) and plot the error for different values of w1, w2

For a given w0, w1, w2 we will compute w0 + w1 ∗ x1 + w2 ∗ x2 for all combinations of (x1, x2) and note down how many errors we make

For the OR function, an error occurs if (x1, x2) = (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 ≥ 0, or if (x1, x2) ≠ (0, 0) but w0 + w1 ∗ x1 + w2 ∗ x2 < 0

We are interested in finding an algorithm which finds the values of w1, w2 which minimize this error
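The error count described above is easy to verify mechanically. Below is a minimal sketch (not from the lecture; the helper name `count_errors` is our own) that fixes w0 = −1 and counts, for the OR function, how many of the four inputs a given (w1, w2) misclassifies:

```python
# Count how many of the four boolean inputs the line
# w0 + w1*x1 + w2*x2 = 0 misclassifies for the OR function,
# with the threshold fixed at w0 = -1 (as in the slides).

def count_errors(w1, w2, w0=-1.0):
    errors = 0
    for x1 in (0, 1):
        for x2 in (0, 1):
            predicted = 1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0
            target = 1 if (x1 or x2) else 0  # OR function
            if predicted != target:
                errors += 1
    return errors

# Reproduces the table: 3, 1 and 3 errors, and 0 for the working line
for w1, w2 in [(-1, -1), (1.5, 0), (0.45, 0.45), (1.1, 1.1)]:
    print((w1, w2), "->", count_errors(w1, w2), "errors")
```

Running this reproduces the error table, and confirms that (1.1, 1.1) makes 0 errors.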


Module 2.5: Perceptron Learning Algorithm


We will now see a more principled approach for learning these weights and threshold, but before that let us answer this question...

Apart from implementing boolean functions (which does not look very interesting) what can a perceptron be used for?

Our interest lies in the use of the perceptron as a binary classifier. Let us see what this means...


inputs: x0 = 1, x1, x2, .., xn    output: y    weights: w0 = −θ, w1, w2, .., wn

x1 = isActorDamon
x2 = isGenreThriller
x3 = isDirectorNolan
x4 = imdbRating (scaled to 0 to 1)
... ...
xn = criticsRating (scaled to 0 to 1)

Let us reconsider our problem of deciding whether to watch a movie or not

Suppose we are given a list of m movies and a label (class) associated with each movie indicating whether the user liked this movie or not: a binary decision

Further, suppose we represent each movie with n features (some boolean, some real-valued)

We will assume that the data is linearly separable and we want a perceptron to learn how to make this decision

In other words, we want the perceptron to find the equation of this separating plane (or find the values of w0, w1, w2, .., wn)
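To make the setup concrete, here is a small sketch with one hypothetical movie encoded using features of the kind listed above; the feature values and weights are invented for illustration (the weights are what the perceptron would have to learn):

```python
# One hypothetical movie: x0 = 1, then isActorDamon, isGenreThriller,
# isDirectorNolan, imdbRating (scaled to [0, 1]). Values are made up.
x = [1, 1, 0, 1, 0.83]

# Invented weights for illustration; w0 = -theta plays the role of the threshold.
w = [-1.0, 0.3, 0.2, 0.8, 1.0]

s = sum(wi * xi for wi, xi in zip(w, x))
decision = "watch" if s >= 0 else "skip"
print(s, decision)
```

Here the weighted sum clears the threshold, so the perceptron outputs 1 ("watch").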


Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N;
    if x ∈ P and ∑_{i=0}^{n} wi ∗ xi < 0 then
        w = w + x;
    end
    if x ∈ N and ∑_{i=0}^{n} wi ∗ xi ≥ 0 then
        w = w − x;
    end
end
// the algorithm converges when all the inputs are classified correctly

Why would this work?

To understand why this works we will have to get into a bit of Linear Algebra and a bit of geometry...
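The pseudocode above translates almost line for line into Python. The sketch below (our own, with a maximum-iteration cap added so the loop always terminates) trains a perceptron on the OR data, with x0 = 1 prepended to each input so that w0 stands in for −θ:

```python
import random

def train_perceptron(P, N, max_iters=10000, seed=0):
    """Perceptron learning algorithm. P: inputs with label 1,
    N: inputs with label 0. Each input has x0 = 1 prepended."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(len(P[0]))]
    dot = lambda w, x: sum(wi * xi for wi, xi in zip(w, x))
    for _ in range(max_iters):
        # converged when every input is classified correctly
        if all(dot(w, p) >= 0 for p in P) and all(dot(w, q) < 0 for q in N):
            break
        x = rng.choice(P + N)
        if x in P and dot(w, x) < 0:
            w = [wi + xi for wi, xi in zip(w, x)]  # move w towards x
        elif x in N and dot(w, x) >= 0:
            w = [wi - xi for wi, xi in zip(w, x)]  # move w away from x
    return w

# OR function: label 1 for every input except (0, 0)
P = [[1, 0, 1], [1, 1, 0], [1, 1, 1]]
N = [[1, 0, 0]]
w = train_perceptron(P, N)
predict = lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
print([predict(x) for x in P + N])
```

Since the OR data is linearly separable, the convergence theorem (coming up) guarantees only finitely many updates are needed, so the loop exits early with all four inputs classified correctly.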


Consider two vectors w and x

w = [w0, w1, w2, ..., wn]
x = [1, x1, x2, ..., xn]

w · x = wTx = ∑_{i=0}^{n} wi ∗ xi

We can thus rewrite the perceptron rule as

y = 1 if wTx ≥ 0
y = 0 if wTx < 0

We are interested in finding the line wTx = 0 which divides the input space into two halves

Every point (x) on this line satisfies the equation wTx = 0

What can you tell about the angle (α) between w and any point (x) which lies on this line?

The angle is 90° (∵ cos α = wTx / (||w|| ||x||) = 0)

Since the vector w is perpendicular to every point on the line, it is actually perpendicular to the line itself
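This perpendicularity is easy to check numerically. A small sketch (values chosen for illustration) uses the line −1 + 1.1x1 + 1.1x2 = 0 from the OR example, generates points x = [1, x1, x2] on it, and confirms that cos α = wTx / (||w|| ||x||) vanishes:

```python
import math

w = [-1.0, 1.1, 1.1]  # the line -1 + 1.1*x1 + 1.1*x2 = 0 from the OR example

def point_on_line(t):
    # x = [1, x1, x2] with x1 = t and x2 chosen so that wTx = 0
    return [1.0, t, 1.0 / 1.1 - t]

def cos_angle(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda z: math.sqrt(sum(a * a for a in z))
    return dot / (norm(u) * norm(v))

# cos(alpha) = 0 (up to floating point), so alpha = 90 degrees everywhere
for t in (0.0, 0.3, 2.0, -1.5):
    print(t, cos_angle(w, point_on_line(t)))
```

Every printed cosine is zero up to rounding error, which is exactly the geometric fact used in the convergence argument.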

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 188: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

36/69

Consider two vectors w and x

w = [w0, w1, w2, ..., wn]

x = [1, x1, x2, ..., xn]

w · x = wTx =n∑i=0

wi ∗ xi

We can thus rewrite the perceptronrule as

We are interested in finding the linewTx = 0 which divides the inputspace into two halves

Every point (x) on this line satisfiesthe equation wTx = 0

What can you tell about the angle (α)between w and any point (x) whichlies on this line ?

The angle is 90◦ (∵ cosα = wT x||w||||x|| =

0)

Since the vector w is perpendicular toevery point on the line it is actuallyperpendicular to the line itself

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 189: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

36/69

Consider two vectors w and x

w = [w0, w1, w2, ..., wn]

x = [1, x1, x2, ..., xn]

w · x = wTx =

n∑i=0

wi ∗ xi

We can thus rewrite the perceptronrule as

We are interested in finding the linewTx = 0 which divides the inputspace into two halves

Every point (x) on this line satisfiesthe equation wTx = 0

What can you tell about the angle (α)between w and any point (x) whichlies on this line ?

The angle is 90◦ (∵ cosα = wT x||w||||x|| =

0)

Since the vector w is perpendicular toevery point on the line it is actuallyperpendicular to the line itself

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 190: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

36/69

Consider two vectors w and x

w = [w0, w1, w2, ..., wn]

x = [1, x1, x2, ..., xn]

w · x = wTx =

n∑i=0

wi ∗ xi

We can thus rewrite the perceptronrule as

We are interested in finding the linewTx = 0 which divides the inputspace into two halves

Every point (x) on this line satisfiesthe equation wTx = 0

What can you tell about the angle (α)between w and any point (x) whichlies on this line ?

The angle is 90◦ (∵ cosα = wT x||w||||x|| =

0)

Since the vector w is perpendicular toevery point on the line it is actuallyperpendicular to the line itself

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 191: CS7015 (Deep Learning) : Lecture 2

36/69

Consider two vectors w and x:

w = [w0, w1, w2, ..., wn]

x = [1, x1, x2, ..., xn]

w · x = wᵀx = Σ_{i=0}^{n} w_i x_i

We can thus rewrite the perceptron rule as

y = 1 if wᵀx ≥ 0
  = 0 if wᵀx < 0

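Once the input is prefixed with the constant 1 (so that w0 plays the role of the negative threshold), this rule is a one-liner. A minimal sketch, with illustrative weights chosen here (not from the lecture):

```python
import numpy as np

def perceptron_output(w, x):
    """y = 1 if w^T x >= 0 else 0, where x already includes the leading 1
    (so w[0] absorbs the bias / negative threshold)."""
    return 1 if w @ x >= 0 else 0

# Illustrative weights (assumed): behaves like an OR gate on two inputs
w = np.array([-0.5, 1.0, 1.0])                    # w0 = -0.5 acts as -threshold
print(perceptron_output(w, np.array([1, 0, 0])))  # 0
print(perceptron_output(w, np.array([1, 1, 0])))  # 1
```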
37/69

Consider some points (vectors) which lie in the positive half space of this line (i.e., wᵀx ≥ 0).

What will be the angle between any such vector and w? Obviously, less than 90°.

What about points (vectors) which lie in the negative half space of this line (i.e., wᵀx < 0)?

What will be the angle between any such vector and w? Obviously, greater than 90°.

Of course, this also follows from the formula cos α = wᵀx / (||w|| ||x||).

Keeping this picture in mind, let us revisit the algorithm.

[Figure: positive points p1, p2, p3 and negative points n1, n2, n3 in the (x1, x2) plane, together with the vector w and the line wᵀx = 0]
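This sign relationship is easy to confirm numerically — a small sketch with an assumed w and two hand-picked points, one in each half space:

```python
import numpy as np

def cos_alpha(w, x):
    # cos of the angle between w and x, as in the slide's formula
    return (w @ x) / (np.linalg.norm(w) * np.linalg.norm(x))

w = np.array([1.0, 1.0])        # assumed weight vector; the line is x1 + x2 = 0

p = np.array([2.0, 1.0])        # w @ p = 3 >= 0  -> positive half space
n = np.array([-1.0, -2.0])      # w @ n = -3 < 0  -> negative half space

print(cos_alpha(w, p) > 0)  # True: angle less than 90 degrees
print(cos_alpha(w, n) < 0)  # True: angle greater than 90 degrees
```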

38/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N;
    if x ∈ P and w · x < 0 then
        w = w + x;
    end
    if x ∈ N and w · x ≥ 0 then
        w = w − x;
    end
end
// the algorithm converges when all the inputs are classified correctly

cos α = wᵀx / (||w|| ||x||)

For x ∈ P, if w · x < 0 then it means that the angle (α) between this x and the current w is greater than 90° (but we want α to be less than 90°).

What happens to the new angle (α_new) when w_new = w + x?

cos(α_new) ∝ w_newᵀ x
           ∝ (w + x)ᵀ x
           ∝ wᵀx + xᵀx
           ∝ cos α + xᵀx

cos(α_new) > cos α

Thus α_new will be less than α, and this is exactly what we want.
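The quantity the update directly changes is the dot product wᵀx (the numerator of cos α): adding x to w increases it by exactly xᵀx > 0. A minimal check with randomly drawn vectors (note the slide's proportionality argument glosses over the change in ||w_new||; only the numerator is tracked here):

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)   # current (possibly wrong) weights
x = rng.normal(size=3)   # stand-in for a misclassified positive input

before = w @ x
after = (w + x) @ x      # dot product after the update w_new = w + x

# The increase is exactly x^T x, which is strictly positive for x != 0.
print(np.isclose(after - before, x @ x))  # True
```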

Page 215: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

38/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ P if w.x < 0 then itmeans that the angle (α) betweenthis x and the current w isgreater than 90◦ (but we want αto be less than 90◦)

What happens to the new angle(αnew) when wnew = w + x

cos(αnew) ∝ wnewTx

∝ (w + x)Tx

∝ wTx + xTx

∝ cosα+ xTx

cos(αnew) > cosα

Thus αnew will be less than α andthis is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 216: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦

(but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 217: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦

(but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 218: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦ (but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 219: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦ (but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 220: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦ (but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 221: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦ (but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 222: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦ (but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 223: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦ (but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 224: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦ (but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 225: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

39/69

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;Initialize w randomly;while !convergence do

Pick random x ∈ P ∪N ;if x ∈ P and w.x < 0 then

w = w + x ;endif x ∈ N and w.x ≥ 0 then

w = w − x ;end

end//the algorithm converges when all theinputs are classified correctly

cosα =wTx

||w||||x||

For x ∈ N if w.x ≥ 0 then itmeans that the angle (α) betweenthis x and the current w is lessthan 90◦ (but we want α to begreater than 90◦)

What happens to the new angle(αnew) when wnew = w − x

cos(αnew) ∝ wnewTx

∝ (w − x)Tx

∝ wTx− xTx

∝ cosα− xTx

cos(αnew) < cosα

Thus αnew will be greater than αand this is exactly what we want

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

40/69

We will now see this algorithm in action for a toy dataset.
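The pseudocode above can be sketched in Python as follows. This is an epoch-based variant (each pass shuffles and visits every point, so convergence is easy to detect), and the toy dataset is made up for illustration; each input is prefixed with the bias input 1:

```python
import numpy as np

def perceptron_learn(P, N, max_epochs=1000, seed=0):
    """Perceptron learning algorithm: add misclassified positive inputs,
    subtract misclassified negative inputs, until all are classified correctly."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=len(P[0]))
    data = [(x, 1) for x in P] + [(x, 0) for x in N]
    for _ in range(max_epochs):
        converged = True
        for i in rng.permutation(len(data)):
            x, label = data[i]
            if label == 1 and w @ x < 0:
                w = w + x          # pull w towards a misclassified positive
                converged = False
            elif label == 0 and w @ x >= 0:
                w = w - x          # push w away from a misclassified negative
                converged = False
        if converged:
            return w
    raise RuntimeError("did not converge (data may not be linearly separable)")

# A linearly separable toy dataset (invented for illustration)
P = [np.array([1.0, 2.0, 2.0]), np.array([1.0, 3.0, 1.0]), np.array([1.0, 2.5, 3.0])]
N = [np.array([1.0, -2.0, -1.0]), np.array([1.0, -1.0, -2.0]), np.array([1.0, -3.0, -1.5])]
w = perceptron_learn(P, N)
print(all(w @ x >= 0 for x in P) and all(w @ x < 0 for x in N))  # True
```

Rosenblatt's convergence theorem guarantees that this loop terminates whenever the data are linearly separable, which is exactly the setting of the toy example below.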

41/69

[Figure: positive points p1, p2, p3 and negative points n1, n2, n3 in the (x1, x2) plane, with w redrawn after each correction]

We initialized w to a random value.

We observe that currently, w · x < 0 (∵ angle > 90°) for all the positive points and w · x ≥ 0 (∵ angle < 90°) for all the negative points (the situation is exactly opposite of what we actually want it to be).

We now run the algorithm by randomly going over the points.

Randomly pick a point (say, p1); apply the correction w = w + x ∵ w · x < 0 (you can check the angle visually).

Randomly pick a point (say, p2); apply the correction w = w + x ∵ w · x < 0 (you can check the angle visually).

The algorithm has converged.

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n1), apply correc-tion w = w − x ∵ w · x ≥ 0 (you can checkthe angle visually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 235: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n1), apply correc-tion w = w − x ∵ w · x ≥ 0 (you can checkthe angle visually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 236: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n3), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 237: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n3), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 238: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n2), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 239: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n2), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 240: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, p3), apply correc-tion w = w + x ∵ w · x < 0 (you can checkthe angle visually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 241: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, p3), apply correc-tion w = w + x ∵ w · x < 0 (you can checkthe angle visually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 242: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, p1), no correctionneeded ∵ w · x ≥ 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 243: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, p1), no correctionneeded ∵ w · x ≥ 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 244: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, p2), no correctionneeded ∵ w · x ≥ 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 245: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, p2), no correctionneeded ∵ w · x ≥ 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 246: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n1), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 247: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n1), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 248: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n3), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 249: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n3), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 250: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n2), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 251: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, n2), no correctionneeded ∵ w · x < 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 252: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

Randomly pick a point (say, p3), no correctionneeded ∵ w · x ≥ 0 (you can check the anglevisually)

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 253: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

41/69

x1

x2

p1

p2

p3

n1

n2 n3

We initialized w to a random value

We observe that currently, w · x < 0 (∵ angle> 90◦) for all the positive points and w ·x ≥ 0(∵ angle < 90◦) for all the negative points(the situation is exactly oppsite of what weactually want it to be)

We now run the algorithm by randomly goingover the points

The algorithm has converged

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
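The walkthrough above can be sketched in code. A minimal sketch, not from the slides: the same update rule run on hypothetical coordinates for p1–p3 and n1–n3 (any linearly separable choice works), sweeping over the points cyclically instead of picking them at random.

```python
import numpy as np

# Hypothetical coordinates for the points in the figure (assumption:
# any linearly separable configuration behaves the same way).
positives = [np.array([2.0, 1.0]), np.array([1.5, 2.0]), np.array([2.5, 1.5])]   # p1, p2, p3
negatives = [np.array([-1.0, -2.0]), np.array([-2.0, -1.0]), np.array([-1.5, -1.5])]  # n1, n2, n3

rng = np.random.default_rng(0)
w = rng.standard_normal(2)   # initialize w to a random value

converged = False
while not converged:
    converged = True
    for x in positives:
        if w @ x < 0:        # positive point at angle > 90° to w: wrong side
            w = w + x        # apply correction w = w + x
            converged = False
    for x in negatives:
        if w @ x >= 0:       # negative point at angle < 90° to w: wrong side
            w = w - x        # apply correction w = w - x
            converged = False

# At convergence, w · x >= 0 for all positive points and w · x < 0
# for all negative points.
```

Since the data is linearly separable, the loop is guaranteed to terminate by the convergence theorem proved next.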

Page 254:

42/69

Module 2.6: Proof of Convergence

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 255:

43/69

Now that we have some faith and intuition about why the algorithm works, we will see a more formal proof of convergence ...

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 256–265: (44/69)

Theorem

Definition: Two sets P and N of points in an n-dimensional space are called absolutely linearly separable if n + 1 real numbers w0, w1, ..., wn exist such that every point (x1, x2, ..., xn) ∈ P satisfies ∑ᵢ₌₁ⁿ wi ∗ xi > w0 and every point (x1, x2, ..., xn) ∈ N satisfies ∑ᵢ₌₁ⁿ wi ∗ xi < w0.

Proposition: If the sets P and N are finite and linearly separable, the perceptron learning algorithm updates the weight vector wt a finite number of times. In other words: if the vectors in P and N are tested cyclically one after the other, a weight vector wt is found after a finite number of steps t which can separate the two sets.

Proof: On the next slide

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
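The shape of the argument can be previewed before the slides' own proof. A sketch of the classical Novikoff-style bound, not text from the slides, assuming unit-norm inputs and a unit-norm separating vector w∗ with margin δ > 0:

```latex
% Each correction w_{t+1} = w_t + p increases alignment with w^* by at least \delta:
w_t \cdot w^* \;\ge\; w_0 \cdot w^* + t\,\delta
% while each correction grows the squared norm by at most 1
% (since \|p\| = 1 and the update fires only when w_t \cdot p < 0):
\|w_t\|^2 \;\le\; \|w_0\|^2 + t
% Combining these with w_t \cdot w^* \le \|w_t\| gives t\,\delta \lesssim \sqrt{t},
% so the number of corrections is bounded, roughly t \le 1/\delta^2.
```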

Page 266: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

45/69

Setup:

If x ∈ N then -x ∈ P (∵wTx < 0 =⇒ wT (−x) ≥ 0)

We can thus consider a singleset P ′ = P ∪ N− and forevery element p ∈ P ′ ensurethat wT p ≥ 0

Further we will normalize allthe p’s so that ||p|| = 1 (no-tice that this does not affectthe solution ∵ if wT p

||p|| ≥0 then wT p ≥ 0)

Let w∗ be the normalizedsolution vector (we know oneexists as the data is linearlyseparable)

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;N−contains negations of all points in N;P ′ ← P ∪N−;Initialize w randomly;while !convergence do

Pick random p ∈ P ′ ;p← p

||p|| (so now,||p|| = 1) ;

if w.p < 0 then

w = w + p ;

end

end//the algorithm converges when all the inputs areclassified correctly

//notice that we do not need the other if conditionbecause by construction we want all points in P ′ tolie in the positive half space w.p ≥ 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 267: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

45/69

Setup:

If x ∈ N then -x ∈ P (∵wTx < 0 =⇒ wT (−x) ≥ 0)

We can thus consider a singleset P ′ = P ∪ N− and forevery element p ∈ P ′ ensurethat wT p ≥ 0

Further we will normalize allthe p’s so that ||p|| = 1 (no-tice that this does not affectthe solution ∵ if wT p

||p|| ≥0 then wT p ≥ 0)

Let w∗ be the normalizedsolution vector (we know oneexists as the data is linearlyseparable)

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;N−contains negations of all points in N;P ′ ← P ∪N−;Initialize w randomly;while !convergence do

Pick random p ∈ P ′ ;p← p

||p|| (so now,||p|| = 1) ;

if w.p < 0 then

w = w + p ;

end

end//the algorithm converges when all the inputs areclassified correctly

//notice that we do not need the other if conditionbecause by construction we want all points in P ′ tolie in the positive half space w.p ≥ 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 268: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

45/69

Setup:

If x ∈ N then -x ∈ P (∵wTx < 0 =⇒ wT (−x) ≥ 0)

We can thus consider a singleset P ′ = P ∪ N− and forevery element p ∈ P ′ ensurethat wT p ≥ 0

Further we will normalize allthe p’s so that ||p|| = 1 (no-tice that this does not affectthe solution ∵ if wT p

||p|| ≥0 then wT p ≥ 0)

Let w∗ be the normalizedsolution vector (we know oneexists as the data is linearlyseparable)

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;

N ← inputs with label 0;N−contains negations of all points in N;P ′ ← P ∪N−;Initialize w randomly;while !convergence do

Pick random p ∈ P ′ ;p← p

||p|| (so now,||p|| = 1) ;

if w.p < 0 then

w = w + p ;

end

end//the algorithm converges when all the inputs areclassified correctly

//notice that we do not need the other if conditionbecause by construction we want all points in P ′ tolie in the positive half space w.p ≥ 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 269: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

45/69

Setup:

If x ∈ N then -x ∈ P (∵wTx < 0 =⇒ wT (−x) ≥ 0)

We can thus consider a singleset P ′ = P ∪ N− and forevery element p ∈ P ′ ensurethat wT p ≥ 0

Further we will normalize allthe p’s so that ||p|| = 1 (no-tice that this does not affectthe solution ∵ if wT p

||p|| ≥0 then wT p ≥ 0)

Let w∗ be the normalizedsolution vector (we know oneexists as the data is linearlyseparable)

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;

N−contains negations of all points in N;P ′ ← P ∪N−;Initialize w randomly;while !convergence do

Pick random p ∈ P ′ ;p← p

||p|| (so now,||p|| = 1) ;

if w.p < 0 then

w = w + p ;

end

end//the algorithm converges when all the inputs areclassified correctly

//notice that we do not need the other if conditionbecause by construction we want all points in P ′ tolie in the positive half space w.p ≥ 0

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 270: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

45/69

Setup:

If x ∈ N then -x ∈ P (∵wTx < 0 =⇒ wT (−x) ≥ 0)

We can thus consider a singleset P ′ = P ∪ N− and forevery element p ∈ P ′ ensurethat wT p ≥ 0

Further we will normalize allthe p’s so that ||p|| = 1 (no-tice that this does not affectthe solution ∵ if wT p

||p|| ≥0 then wT p ≥ 0)

Let w∗ be the normalizedsolution vector (we know oneexists as the data is linearlyseparable)

Algorithm: Perceptron Learning Algorithm

P ← inputs with label 1;N ← inputs with label 0;N−contains negations of all points in N;

P ′ ← P ∪N−;Initialize w randomly;while !convergence do

Pick random p ∈ P ′ ;p← p

||p|| (so now,||p|| = 1) ;

if w.p < 0 then

w = w + p ;

end

end//the algorithm converges when all the inputs areclassified correctly

//notice that we do not need the other if conditionbecause by construction we want all points in P ′ tolie in the positive half space w.p ≥ 0


46/69

Observations:

w∗ is some optimal solution which exists but we don't know what it is

We do not make a correction at every time-step

We make a correction only if wT · pi ≤ 0 at that time step

So at time-step t we would have made only k (≤ t) corrections

Every time we make a correction a quantity δ gets added to the numerator

So by time-step t, a quantity kδ gets added to the numerator

Proof:

Now suppose at time step t we inspected the point pi and found that wT · pi ≤ 0

We make a correction wt+1 = wt + pi

Let β be the angle between w∗ and wt+1

cos β = (w∗ · wt+1) / ||wt+1||

Numerator = w∗ · wt+1 = w∗ · (wt + pi)
          = w∗ · wt + w∗ · pi
          ≥ w∗ · wt + δ          (δ = min{w∗ · pi | ∀i})
          ≥ w∗ · (wt−1 + pj) + δ
          ≥ w∗ · wt−1 + w∗ · pj + δ
          ≥ w∗ · wt−1 + 2δ
          ≥ w∗ · w0 + kδ         (by induction)
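The inductive bound on the numerator can be checked numerically. The toy data below is an illustrative assumption: unit points clustered around a known normalized solution w∗, so that δ = min over i of w∗ · pi is strictly positive. Each correction adds w∗ · p ≥ δ to the numerator, so after k corrections w∗ · wk ≥ w∗ · w0 + kδ.

```python
import numpy as np

rng = np.random.default_rng(1)
w_star = np.array([1.0, 1.0]) / np.sqrt(2.0)    # normalized solution w*
# unit points p ∈ P' at angles within π/4 - 0.1 of w*, so w* . p > 0 for all
angles = rng.uniform(-np.pi / 4 + 0.1, np.pi / 4 - 0.1, size=20) + np.pi / 4
P_prime = np.stack([np.cos(angles), np.sin(angles)], axis=1)
delta = np.min(P_prime @ w_star)                # δ = min{w* . pi | ∀i} > 0

w = rng.standard_normal(2)                      # w0, random initialization
k, num0 = 0, w_star @ w
for _ in range(10000):
    p = P_prime[rng.integers(len(P_prime))]
    if w @ p < 0:                               # a correction step
        w = w + p
        k += 1
        # numerator bound from the slide: w* . w_k >= w* . w_0 + k*delta
        assert w_star @ w >= num0 + k * delta - 1e-9
```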


47/69

Proof (continued):

So far we have wT · pi ≤ 0 (and hence we made the correction)

cos β = (w∗ · wt+1) / ||wt+1||   (by definition)

Numerator ≥ w∗ · w0 + kδ   (proved by induction)

Denominator² = ||wt+1||²
             = (wt + pi) · (wt + pi)
             = ||wt||² + 2 wt · pi + ||pi||²
             ≤ ||wt||² + ||pi||²   (∵ wt · pi ≤ 0)
             ≤ ||wt||² + 1         (∵ ||pi||² = 1)
             ≤ (||wt−1||² + 1) + 1
             ≤ ||wt−1||² + 2
             ≤ ||w0||² + k         (by the same induction over the k corrections)
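The denominator bound can also be checked numerically. It needs no separability at all, only that points are unit norm and that a correction happens exactly when w · p < 0, since then ||w + p||² = ||w||² + 2 w · p + 1 ≤ ||w||² + 1. The random (deliberately non-separable) data below is an illustrative assumption, so corrections keep occurring and the bound is exercised.

```python
import numpy as np

rng = np.random.default_rng(2)
P_prime = rng.standard_normal((20, 3))
P_prime /= np.linalg.norm(P_prime, axis=1, keepdims=True)  # ||p|| = 1

w = rng.standard_normal(3)                  # w0
norm0_sq, k = w @ w, 0
for _ in range(5000):
    p = P_prime[rng.integers(len(P_prime))]
    if w @ p < 0:                           # correction: w . p < 0, ||p|| = 1
        w = w + p
        k += 1
        # denominator bound from the slide: ||w_k||^2 <= ||w_0||^2 + k
        assert w @ w <= norm0_sq + k + 1e-9
```

Together the two bounds give cos β ≥ (w∗ · w0 + kδ) / √(||w0||² + k), which grows like √k and cannot exceed 1, so the number of corrections k must be finite — the convergence argument these slides are building toward.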

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 299: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

47/69

Proof (continued:)

So far we have, wT · pi ≤ 0 (and hence we made the correction)

cosβ =w∗ · wt+1

||wt+1||(by definition)

Numerator ≥ w∗ · w0 + kδ (proved by induction)

Denominator2 = ||wt+1||2

= (wt + pi) · (wt + pi)

= ||wt||2 + 2wt · pi + ||pi||2)≤ ||wt||2 + ||pi||2 (∵ wt · pi ≤ 0)

≤ ||wt||2 + 1 (∵ ||pi||2 = 1)

≤ (||wt−1||2 + 1) + 1

≤ ||wt−1||2 + 2

≤ ||w0||2 + (k) (By same observation that we made about δ)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 300: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

47/69

Proof (continued:)

So far we have, wT · pi ≤ 0 (and hence we made the correction)

cosβ =w∗ · wt+1

||wt+1||(by definition)

Numerator ≥ w∗ · w0 + kδ (proved by induction)

Denominator2 = ||wt+1||2

= (wt + pi) · (wt + pi)

= ||wt||2 + 2wt · pi + ||pi||2)≤ ||wt||2 + ||pi||2 (∵ wt · pi ≤ 0)

≤ ||wt||2 + 1 (∵ ||pi||2 = 1)

≤ (||wt−1||2 + 1) + 1

≤ ||wt−1||2 + 2

≤ ||w0||2 + (k) (By same observation that we made about δ)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 301: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

47/69

Proof (continued:)

So far we have, wT · pi ≤ 0 (and hence we made the correction)

cosβ =w∗ · wt+1

||wt+1||(by definition)

Numerator ≥ w∗ · w0 + kδ (proved by induction)

Denominator2 = ||wt+1||2

= (wt + pi) · (wt + pi)

= ||wt||2 + 2wt · pi + ||pi||2)≤ ||wt||2 + ||pi||2 (∵ wt · pi ≤ 0)

≤ ||wt||2 + 1 (∵ ||pi||2 = 1)

≤ (||wt−1||2 + 1) + 1

≤ ||wt−1||2 + 2

≤ ||w0||2 + (k) (By same observation that we made about δ)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


48/69

Proof (continued):

So far we have, w^T · p_i ≤ 0 (and hence we made the correction)

cos β = (w* · w_{t+1}) / ||w_{t+1}||  (by definition)

Numerator ≥ w* · w_0 + kδ  (proved by induction)

Denominator² ≤ ||w_0||² + k  (by the same observation that we made about δ)

cos β ≥ (w* · w_0 + kδ) / √(||w_0||² + k)

cos β thus grows proportional to √k

As k (the number of corrections) increases, cos β can become arbitrarily large

But since cos β ≤ 1, k must be bounded by a maximum number

Thus, there can only be a finite number of corrections (k) to w and the algorithm will converge!

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
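As a concrete illustration of this convergence result, here is a minimal sketch of the perceptron learning algorithm; the function name and the AND training set are my own choices, not from the slides. On linearly separable data it stops after a finite number of corrections:

```python
import numpy as np

def perceptron_train(X, y, max_epochs=100):
    """Perceptron learning algorithm with the bias folded in as w0 (x0 = 1).

    Returns the learned weights and the total number of corrections made.
    """
    X = np.hstack([np.ones((len(X), 1)), X])    # prepend x0 = 1 for the bias
    w = np.zeros(X.shape[1])
    corrections = 0
    for _ in range(max_epochs):
        changed = False
        for xi, yi in zip(X, y):
            pred = 1 if w @ xi >= 0 else 0
            if pred != yi:                      # misclassified: correct w
                w += xi if yi == 1 else -xi
                corrections += 1
                changed = True
        if not changed:                         # a full pass with no mistakes
            return w, corrections
    return w, corrections

# AND is linearly separable, so the algorithm converges.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, k = perceptron_train(X, y)
preds = [1 if w @ np.r_[1, x] >= 0 else 0 for x in X]
print(preds)   # [0, 0, 0, 1] after finitely many corrections k
```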


49/69

Coming back to our questions ...

What about non-boolean (say, real) inputs? Real-valued inputs are allowed in a perceptron

Do we always need to hand-code the threshold? No, we can learn the threshold

Are all inputs equal? What if we want to assign more weight (importance) to some inputs? A perceptron allows weights to be assigned to inputs

What about functions which are not linearly separable? Not possible with a single perceptron, but we will see how to handle this ..

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
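The three answers above fit in one line of code: a perceptron over real-valued inputs with a per-input weight and the threshold folded in as a bias. The weights below are arbitrary illustrative values of my own, not from the lecture:

```python
def perceptron(x, w, b):
    """Fire (1) iff the weighted sum of real-valued inputs reaches the threshold -b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

# Real-valued inputs with unequal importance: x1 counts twice as much as x2 here.
print(perceptron([0.9, 0.3], w=[2.0, 1.0], b=-1.5))   # 1.8 + 0.3 - 1.5 = 0.6 >= 0 -> 1
print(perceptron([0.2, 0.4], w=[2.0, 1.0], b=-1.5))   # 0.4 + 0.4 - 1.5 = -0.7 < 0 -> 0
```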



50/69

Module 2.7: Linearly Separable Boolean Functions

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

51/69

So what do we do about functions which are not linearly separable?

Let us see one such simple boolean function first.

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


52/69

x1 x2 | XOR | condition
 0  0 |  0  | w0 + Σ_{i=1}^{2} w_i x_i < 0
 1  0 |  1  | w0 + Σ_{i=1}^{2} w_i x_i ≥ 0
 0  1 |  1  | w0 + Σ_{i=1}^{2} w_i x_i ≥ 0
 1  1 |  0  | w0 + Σ_{i=1}^{2} w_i x_i < 0

w0 + w1 · 0 + w2 · 0 < 0 =⇒ w0 < 0
w0 + w1 · 0 + w2 · 1 ≥ 0 =⇒ w2 ≥ −w0
w0 + w1 · 1 + w2 · 0 ≥ 0 =⇒ w1 ≥ −w0
w0 + w1 · 1 + w2 · 1 < 0 =⇒ w1 + w2 < −w0

The fourth condition contradicts conditions 2 and 3

Hence we cannot have a solution to this set of inequalities

[Figure: the four points (0, 0), (0, 1), (1, 0), (1, 1) plotted in the x1-x2 plane] And indeed you can see that it is impossible to draw a line which separates the red points from the blue points

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
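The contradiction can also be confirmed mechanically. In this brute-force sketch (the helper and the weight grid are my own illustration), a search over candidate weights finds settings that realise AND but none that realise XOR:

```python
import itertools

def realises(w0, w1, w2, truth):
    """Check whether y = [w0 + w1*x1 + w2*x2 >= 0] reproduces the truth table."""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return all((1 if w0 + w1 * x1 + w2 * x2 >= 0 else 0) == t
               for (x1, x2), t in zip(inputs, truth))

grid = [x / 2 for x in range(-4, 5)]   # candidate weights in {-2, -1.5, ..., 2}
AND = (0, 0, 0, 1)
XOR = (0, 1, 1, 0)

and_ok = any(realises(w0, w1, w2, AND)
             for w0, w1, w2 in itertools.product(grid, repeat=3))
xor_ok = any(realises(w0, w1, w2, XOR)
             for w0, w1, w2 in itertools.product(grid, repeat=3))
print(and_ok, xor_ok)   # True False: AND is found, XOR never is
```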


53/69

Most real world data is not linearly separable and will always contain some outliers

In fact, sometimes there may not be any outliers but still the data may not be linearly separable

We need computational units (models) which can deal with such data

While a single perceptron cannot deal with such data, we will show that a network of perceptrons can indeed deal with such data

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


54/69

Before seeing how a network of perceptrons can deal with linearly inseparable data, we will discuss boolean functions in some more detail ...

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

55/69

How many boolean functions can you design from 2 inputs?

Let us begin with some easy ones which you already know ..

x1 x2 | f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16
 0  0 |  0  0  0  0  0  0  0  0  1   1   1   1   1   1   1   1
 0  1 |  0  0  0  0  1  1  1  1  0   0   0   0   1   1   1   1
 1  0 |  0  0  1  1  0  0  1  1  0   0   1   1   0   0   1   1
 1  1 |  0  1  0  1  0  1  0  1  0   1   0   1   0   1   0   1

Of these, how many are linearly separable? (turns out all except XOR and !XOR - feel free to verify)

In general, how many boolean functions can you have for n inputs? 2^(2^n)

How many of these 2^(2^n) functions are not linearly separable? For the time being, it suffices to know that at least some of these may not be linearly separable (I encourage you to figure out the exact answer :-) )

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
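The claim about the 16 two-input functions can be checked exhaustively. In this sketch (the helper and weight grid are my own, not from the slides), a separating weight setting is searched for each truth table:

```python
import itertools

def separable(truth):
    """Return True if some rule y = [w0 + w1*x1 + w2*x2 >= 0] reproduces truth."""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    grid = [x / 2 for x in range(-4, 5)]        # weights in {-2, -1.5, ..., 2}
    return any(all((1 if w0 + w1 * a + w2 * b >= 0 else 0) == t
                   for (a, b), t in zip(inputs, truth))
               for w0, w1, w2 in itertools.product(grid, repeat=3))

tables = list(itertools.product([0, 1], repeat=4))   # all 2^(2^2) = 16 functions
sep = [t for t in tables if separable(t)]
print(len(sep))                                      # 14
print([t for t in tables if t not in sep])           # [(0, 1, 1, 0), (1, 0, 0, 1)]
```

The two functions left over are exactly XOR and !XOR, matching the slide's claim that the other 14 are linearly separable.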

Page 337: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

55/69

How many boolean functions can you design from 2 inputs ?

Let us begin with some easy ones which you already know ..

x1 x2 f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f160 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 10 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 11 0 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 11 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

Of these, how many are linearly separable ?

(turns out all except XOR and!XOR - feel free to verify)

In general, how many boolean functions can you have for n inputs ?

22n

How many of these 22n

functions are not linearly separable ?

For the time being,it suffices to know that at least some of these may not be linearly inseparable(I encourage you to figure out the exact answer :-) )

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 338: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

55/69

How many boolean functions can you design from 2 inputs ?

Let us begin with some easy ones which you already know ..

x1 x2

f1 f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16

0 0

0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

0 1

0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

1 0

0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

Of these, how many are linearly separable ?

(turns out all except XOR and!XOR - feel free to verify)

In general, how many boolean functions can you have for n inputs ?

22n

How many of these 22n

functions are not linearly separable ?

For the time being,it suffices to know that at least some of these may not be linearly inseparable(I encourage you to figure out the exact answer :-) )

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 339: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

55/69

How many boolean functions can you design from 2 inputs ?

Let us begin with some easy ones which you already know ..

x1 x2 f1

f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15 f16

0 0 0

0 0 0 0 0 0 0 1 1 1 1 1 1 1 1

0 1 0

0 0 0 1 1 1 1 0 0 0 0 1 1 1 1

1 0 0

0 1 1 0 0 1 1 0 0 1 1 0 0 1 1

1 1 0

1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

Of these, how many are linearly separable ?

(turns out all except XOR and!XOR - feel free to verify)

In general, how many boolean functions can you have for n inputs ?

22n

How many of these 22n

functions are not linearly separable ?

For the time being,it suffices to know that at least some of these may not be linearly inseparable(I encourage you to figure out the exact answer :-) )

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 340: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

55/69

How many boolean functions can you design from 2 inputs ?

Let us begin with some easy ones which you already know ..

x1 x2 f1

f2 f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15

f160 0 0

0 0 0 0 0 0 0 1 1 1 1 1 1 1

10 1 0

0 0 0 1 1 1 1 0 0 0 0 1 1 1

11 0 0

0 1 1 0 0 1 1 0 0 1 1 0 0 1

11 1 0

1 0 1 0 1 0 1 0 1 0 1 0 1 0

1

Of these, how many are linearly separable ?

(turns out all except XOR and!XOR - feel free to verify)

In general, how many boolean functions can you have for n inputs ?

22n

How many of these 22n

functions are not linearly separable ?

For the time being,it suffices to know that at least some of these may not be linearly inseparable(I encourage you to figure out the exact answer :-) )

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 341: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

55/69

How many boolean functions can you design from 2 inputs ?

Let us begin with some easy ones which you already know ..

x1 x2 f1 f2

f3 f4 f5 f6 f7 f8 f9 f10 f11 f12 f13 f14 f15

f160 0 0 0

0 0 0 0 0 0 1 1 1 1 1 1 1

10 1 0 0

0 0 1 1 1 1 0 0 0 0 1 1 1

11 0 0 0

1 1 0 0 1 1 0 0 1 1 0 0 1

11 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0

1

Of these, how many are linearly separable ?

(turns out all except XOR and!XOR - feel free to verify)

In general, how many boolean functions can you have for n inputs ?

22n

How many of these 22n

functions are not linearly separable ?

For the time being,it suffices to know that at least some of these may not be linearly inseparable(I encourage you to figure out the exact answer :-) )

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 342: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

55/69

How many boolean functions can you design from 2 inputs ?

Let us begin with some easy ones which you already know ..

x1 x2 f1 f2

f3 f4 f5 f6 f7

f8

f9 f10 f11 f12 f13 f14 f15

f160 0 0 0

0 0 0 0 0

0

1 1 1 1 1 1 1

10 1 0 0

0 0 1 1 1

1

0 0 0 0 1 1 1

11 0 0 0

1 1 0 0 1

1

0 0 1 1 0 0 1

11 1 0 1

0 1 0 1 0

1

0 1 0 1 0 1 0

1

Of these, how many are linearly separable ?

(turns out all except XOR and!XOR - feel free to verify)

In general, how many boolean functions can you have for n inputs ?

22n

How many of these 22n

functions are not linearly separable ?

For the time being,it suffices to know that at least some of these may not be linearly inseparable(I encourage you to figure out the exact answer :-) )

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 343: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

55/69

How many boolean functions can you design from 2 inputs ?

Let us begin with some easy ones which you already know ..

x1 x2 f1 f2 f3

f4 f5 f6 f7

f8

f9 f10 f11 f12 f13 f14 f15

f160 0 0 0 0

0 0 0 0

0

1 1 1 1 1 1 1

10 1 0 0 0

0 1 1 1

1

0 0 0 0 1 1 1

11 0 0 0 1

1 0 0 1

1

0 0 1 1 0 0 1

11 1 0 1 0

1 0 1 0

1

0 1 0 1 0 1 0

1

Of these, how many are linearly separable ?

(turns out all except XOR and!XOR - feel free to verify)

In general, how many boolean functions can you have for n inputs ?

22n

How many of these 22n

functions are not linearly separable ?

For the time being,it suffices to know that at least some of these may not be linearly inseparable(I encourage you to figure out the exact answer :-) )

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2

Page 344: CS7015 (Deep Learning) : Lecture 2miteshk/CS7015/Slides/... · The sense organs relay information to the low-est layer of neurons Some of these neurons may re (in red) in re-sponse

55/69

How many boolean functions can you design from 2 inputs ?

Let us begin with some easy ones which you already know ..

x1 x2 f1 f2 f3 f4

f5 f6 f7

f8

f9 f10 f11 f12 f13 f14 f15

f160 0 0 0 0 0

0 0 0

0

1 1 1 1 1 1 1

10 1 0 0 0 0

1 1 1

1

0 0 0 0 1 1 1

11 0 0 0 1 1

0 0 1

1

0 0 1 1 0 0 1

11 1 0 1 0 1

0 1 0

1

0 1 0 1 0 1 0

1

Of these, how many are linearly separable ?

(turns out all except XOR and!XOR - feel free to verify)

In general, how many boolean functions can you have for n inputs ?

22n

How many of these 22n

functions are not linearly separable ?

For the time being,it suffices to know that at least some of these may not be linearly inseparable(I encourage you to figure out the exact answer :-) )

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


Module 2.8: Representation Power of a Network of Perceptrons


We will now see how to implement any boolean function using a network of perceptrons ...


[Figure: a network with two inputs x1, x2, each connected to four perceptrons (bias = -2), which in turn connect to an output perceptron y by weights w1, w2, w3, w4; a red edge indicates w = -1, a blue edge indicates w = +1]

For this discussion, we will assume True = +1 and False = -1

We consider 2 inputs and 4 perceptrons

Each input is connected to all the 4 perceptrons with specific weights

The bias (w0) of each perceptron is -2 (i.e., each perceptron will fire only if the weighted sum of its inputs is ≥ 2)

Each of these perceptrons is connected to an output perceptron by weights (which need to be learned)

The output of this perceptron (y) is the output of this network
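The firing rule just described can be checked directly. A small sketch (plain Python; the function name is mine): a hidden perceptron whose incoming weights are a pattern (p1, p2) in {-1,+1} with bias -2 fires only when the input equals that pattern, since the weighted sum is 2 exactly then, and 0 or -2 otherwise.

```python
def fires(x1, x2, p1, p2):
    # Perceptron with weights (p1, p2) and bias -2:
    # fires iff x1*p1 + x2*p2 >= 2.
    return x1 * p1 + x2 * p2 >= 2

inputs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
for x in inputs:
    # Exactly one True per row, at the matching pattern.
    print(x, [fires(*x, *p) for p in inputs])
```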


[Figure: the same network, with the outputs of the four hidden perceptrons labelled h1, h2, h3, h4; bias = -2; a red edge indicates w = -1, a blue edge indicates w = +1]

Terminology:

This network contains 3 layers

The layer containing the inputs (x1, x2) is called the input layer

The middle layer containing the 4 perceptrons is called the hidden layer

The final layer containing one output neuron is called the output layer

The outputs of the 4 perceptrons in the hidden layer are denoted by h1, h2, h3, h4

The red and blue edges are called layer 1 weights

w1, w2, w3, w4 are called layer 2 weights


[Figure: the same network, with each hidden perceptron labelled by the input pattern it fires for: (-1,-1), (-1,1), (1,-1), (1,1)]

We claim that this network can be used to implement any boolean function (linearly separable or not)!

In other words, we can find w1, w2, w3, w4 such that the truth table of any boolean function can be represented by this network

Astonishing claim! Well, not really, if you understand what is going on

Each perceptron in the middle layer fires only for a specific input (and no two perceptrons fire for the same input)

Let us see why this network works by taking an example of the XOR function
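The construction can be sketched end to end (plain Python; the choice of output threshold 1 and the particular w vector for XOR are my own, consistent with the slide's setup of True = +1, False = -1 and hidden bias -2). Exactly one hidden unit fires per input, so setting w_i = 1 precisely for the patterns where the target function is True makes the output perceptron reproduce any truth table:

```python
PATTERNS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def hidden(x1, x2):
    # Hidden perceptron i (bias -2) fires only when the input
    # equals its pattern: the dot product is 2 exactly then.
    return [1 if x1 * p1 + x2 * p2 >= 2 else 0 for p1, p2 in PATTERNS]

def network(x1, x2, w):
    # Output perceptron fires iff the weighted sum of hidden
    # activations reaches 1; since exactly one h_i is 1, the
    # network outputs True precisely on the patterns with w_i = 1.
    h = hidden(x1, x2)
    return 1 if sum(wi * hi for wi, hi in zip(w, h)) >= 1 else -1

# XOR is True for (-1,1) and (1,-1), so set w2 = w3 = 1:
w_xor = [0, 1, 1, 0]
for x1, x2 in PATTERNS:
    print(x1, x2, network(x1, x2, w_xor))
```

Swapping in a different w vector (one entry per row of the truth table) yields any of the 16 boolean functions of 2 inputs.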

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


61/69

[Figure: the same network]

Let w0 be the bias of the output neuron (i.e., it fires if ∑_{i=1}^{4} wi hi ≥ w0)

x1  x2  XOR  h1  h2  h3  h4  ∑_{i=1}^{4} wi hi
0   0   0    1   0   0   0   w1
0   1   1    0   1   0   0   w2
1   0   1    0   0   1   0   w3
1   1   0    0   0   0   1   w4

This results in the following four conditions to implement XOR: w1 < w0, w2 ≥ w0, w3 ≥ w0, w4 < w0

Unlike before, there are no contradictions now and the system of inequalities can be satisfied

Essentially each wi is now responsible for one of the 4 possible inputs and can be adjusted to get the desired output for that input

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
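A concrete choice of second-layer weights satisfying the four inequalities makes the argument tangible. The values below (w0 = 1, weights 0/1) are one of many valid choices, not the lecture's canonical ones, and the 0/1 → -1/+1 input mapping is an assumption bridging the truth table's 0/1 convention with the hidden layer's ±1 convention.

```python
# XOR via the 4-hidden-perceptron network, with an illustrative weight choice
# satisfying w1 < w0, w2 >= w0, w3 >= w0, w4 < w0.
HIDDEN_WEIGHTS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
W0 = 1            # output threshold (bias of the output neuron)
W = [0, 1, 1, 0]  # w1 < w0, w2 >= w0, w3 >= w0, w4 < w0

def xor_net(x1, x2):
    s1, s2 = 2 * x1 - 1, 2 * x2 - 1  # map 0/1 inputs to -1/+1 (an assumption)
    h = [int(a * s1 + b * s2 - 2 >= 0) for (a, b) in HIDDEN_WEIGHTS]
    return int(sum(w * hi for w, hi in zip(W, h)) >= W0)

for x1 in (0, 1):
    for x2 in (0, 1):
        assert xor_net(x1, x2) == (x1 ^ x2)
```

Because h is one-hot, the weighted sum collapses to a single wi per input, which is exactly why each inequality involves only one weight and no contradiction can arise.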


62/69

[Figure: the same network]

It should be clear that the same network can be used to represent the remaining 15 boolean functions also

Each boolean function will result in a different set of non-contradicting inequalities which can be satisfied by appropriately setting w1, w2, w3, w4

Try it!

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
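As a quick check of the "Try it!" exercise, the one-hot hidden layer lets us read the second-layer weights straight off any truth table: set wi to the desired output for the i-th input and threshold at w0 = 1. The sketch below (same conventions and assumptions as before) verifies this for all 16 boolean functions of two variables.

```python
# All 16 boolean functions of two variables on the same fixed network.
from itertools import product

HIDDEN_WEIGHTS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
INPUTS = list(product([-1, 1], repeat=2))

def net(x, second_layer_w, w0=1):
    h = [int(a * x[0] + b * x[1] - 2 >= 0) for (a, b) in HIDDEN_WEIGHTS]
    return int(sum(w * hi for w, hi in zip(second_layer_w, h)) >= w0)

# Each 4-bit tuple is one truth table over the 4 inputs; use it as weights.
for truth_table in product([0, 1], repeat=4):
    W = list(truth_table)  # w_i = desired output for the i-th input pattern
    assert all(net(x, W) == t for x, t in zip(INPUTS, truth_table))
```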



63/69

What if we have more than 2 inputs ?


64/69

[Figure: a network with inputs x1, x2, x3 feeding 8 hidden perceptrons, which feed an output perceptron y with bias = -3 and layer 2 weights w1, w2, w3, w4, w5, w6, w7, w8]

Again, each of the 8 perceptrons will fire only for one of the 8 inputs

Each of the 8 weights in the second layer is responsible for one of the 8 inputs and can be adjusted to produce the desired output for that input

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


65/69

What if we have n inputs ?


66/69

Theorem

Any boolean function of n inputs can be represented exactly by a network of perceptrons containing 1 hidden layer with 2^n perceptrons and one output layer containing 1 perceptron

Proof (informal): We just saw how to construct such a network

Note: A network of 2^n + 1 perceptrons is not necessary but sufficient. For example, we already saw how to represent the AND function with just 1 perceptron

Catch: As n increases, the number of perceptrons in the hidden layer obviously increases exponentially

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
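The construction behind the theorem generalizes directly: one hidden perceptron per input pattern (2^n of them, bias -n, weights equal to the pattern in ±1), with second-layer weights read off the truth table. This is a sketch under the same ±1 conventions as before, demonstrated on 3-input parity, a function that is not linearly separable.

```python
# General construction: any boolean function of n inputs with 2^n hidden units.
from itertools import product

def build_network(truth_table, n):
    """truth_table maps each n-tuple over {-1, +1} to 0/1."""
    patterns = list(product([-1, 1], repeat=n))
    second = [truth_table[p] for p in patterns]  # 2^n second-layer weights

    def net(x):
        # Hidden perceptron for pattern p fires iff x == p (dot product = n).
        h = [int(sum(w * xi for w, xi in zip(p, x)) - n >= 0)
             for p in patterns]
        return int(sum(w * hi for w, hi in zip(second, h)) >= 1)

    return net

# 3-input parity needs 2^3 = 8 hidden perceptrons, matching the figure.
parity = {p: int(p.count(1) % 2 == 1) for p in product([-1, 1], repeat=3)}
net = build_network(parity, 3)
assert all(net(p) == parity[p] for p in parity)
```

The exponential catch is visible in the code: `patterns` has 2^n entries, so the hidden layer doubles with every additional input.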


67/69

Again, why do we care about boolean functions ?

How does this help us with our original problem, which was to predict whether we like a movie or not?

Let us see!

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


68/69

[Figure: the 3-input network from before, with bias = -3 and layer 2 weights w1, ..., w8]

We are given this data about our past movie experience:

p1 : x11 x12 . . . x1n   y1 = 1
p2 : x21 x22 . . . x2n   y2 = 1
...
n1 : xk1 xk2 . . . xkn   yk = 0
n2 : xj1 xj2 . . . xjn   yj = 0
...

For each movie, we are given the values of the various factors (x1, x2, . . . , xn) that we base our decision on, and we are also given the value of y (like/dislike)

pi's are the points for which the output was 1 and ni's are the points for which it was 0

The data may or may not be linearly separable

The proof that we just saw tells us that it is possible to have a network of perceptrons and learn the weights in this network such that for any given pi or nj the output of the network will be the same as yi or yj (i.e., we can separate the positive and the negative points)

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2


69/69

The story so far ...

Networks of the form that we just saw (containing an input layer, an output layer and one or more hidden layers) are called Multilayer Perceptrons (MLP, in short)

More appropriate terminology would be "Multilayered Network of Perceptrons" but MLP is the more commonly used name

The theorem that we just saw gives us the representation power of an MLP with a single hidden layer

Specifically, it tells us that an MLP with a single hidden layer can represent any boolean function

Mitesh M. Khapra CS7015 (Deep Learning) : Lecture 2
